Recent advances in large language models focus on enhancing reasoning, interpretability, and safety. One line of work surveys four methods for building reasoning models, ranging from inference-time scaling to reinforcement learning and supervised fine-tuning, and notes that targeted fine-tuning can yield impressive results even on a limited budget. Another work investigates feature interpretability in crosscoder models, identifying polysemantic exclusive features and mitigating them through shared-feature strategies. Finally, a new defense mechanism against AI jailbreaks demonstrates robustness to attacks while also addressing overrefusal rates and compute overhead, with ongoing efforts to adapt quickly to novel threats and further improve AI safety.
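
As a rough illustration of the simplest of these approaches, the sketch below shows one common form of inference-time scaling: self-consistency, where several answers are sampled and the most frequent one is returned. It is not drawn from any of the works summarized above; `sample_completion` and `fake_sampler` are hypothetical placeholders standing in for a real model call.

```python
import random
from collections import Counter
from typing import Callable


def majority_vote_answer(
    prompt: str,
    sample_completion: Callable[[str], str],  # hypothetical wrapper around any LLM sampling call
    n_samples: int = 8,
) -> str:
    """Inference-time scaling via self-consistency: sample several
    completions for the same prompt and return the most common answer."""
    answers = [sample_completion(prompt) for _ in range(n_samples)]
    most_common_answer, _count = Counter(answers).most_common(1)[0]
    return most_common_answer


if __name__ == "__main__":
    # Stub sampler so the example runs without a model or API access;
    # it returns a noisy answer to mimic sampling variability.
    def fake_sampler(prompt: str) -> str:
        return random.choice(["42", "42", "42", "41", "43"])

    print(majority_vote_answer("What is 6 * 7?", fake_sampler))
```

The appeal of this family of methods is that accuracy can improve simply by spending more compute at inference time, without retraining the model.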