LLM Research Papers: The 2025 List (January to June)
A topic-organized collection of 200+ LLM research papers from 2025
This blog post shares a curated list of recent research papers focused on improving reasoning abilities in Large Language Models (LLMs). The papers are organized into categories such as training strategies, inference-time scaling, and general understanding/evaluation. Many of the training strategies revolve around reinforcement learning with verifiable rewards. The list is split into twice-yearly updates to stay digestible and timely. The author also offers free access to their "Machine Learning Q and AI" book for the summer.
The Big LLM Architecture Comparison
From DeepSeek-V3 to Kimi K2: A Look At Modern LLM Architecture Design
This article explores the architectural developments that define today's open large language models (LLMs) in 2025, moving beyond benchmark numbers to focus on structural changes. Key models such as DeepSeek V3, OLMo 2, Gemma 3, and Kimi K2 are examined for innovations including Multi-Head Latent Attention (MLA), Mixture-of-Experts (MoE), sliding window attention, and normalization layer placement. These architectural choices affect computational efficiency, training stability, and overall performance, highlighting the ongoing evolution and optimization in the field.
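To make one of these ideas concrete, here is a minimal NumPy sketch of top-k Mixture-of-Experts routing, the mechanism that lets models like DeepSeek V3 and Kimi K2 activate only a few experts per token. The function name, toy sizes, and dense expert matrices are illustrative assumptions, not the routing code of any of the models discussed.

```python
import numpy as np

def topk_moe_forward(x, gate_w, expert_ws, k=2):
    """Route each token to its top-k experts and mix their outputs.

    x:         (n_tokens, d_model) token activations
    gate_w:    (d_model, n_experts) router weights
    expert_ws: list of (d_model, d_model) per-expert weight matrices
    """
    logits = x @ gate_w                            # (n_tokens, n_experts) router scores
    topk = np.argsort(logits, axis=-1)[:, -k:]     # indices of the k highest-scoring experts
    # Softmax over the selected experts only, so their mixing weights sum to 1.
    sel = np.take_along_axis(logits, topk, axis=-1)
    weights = np.exp(sel - sel.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)

    out = np.zeros_like(x)
    for token in range(x.shape[0]):
        for slot in range(k):
            e = topk[token, slot]
            out[token] += weights[token, slot] * (x[token] @ expert_ws[e])
    return out

# Toy usage: 4 tokens, model width 8, 4 experts, 2 active per token.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
gate_w = rng.normal(size=(8, 4))
expert_ws = [rng.normal(size=(8, 8)) for _ in range(4)]
print(topk_moe_forward(x, gate_w, expert_ws).shape)   # (4, 8)
```

The key design point the sketch highlights is that the router only pays the compute cost of k experts per token, even though the full layer may hold many more.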
Synthetic and federated: Privacy-preserving domain adaptation with LLMs for mobile applications
This blog post explores using privacy-preserving synthetic data in federated learning to improve both small and large language models (LLMs), particularly for mobile typing applications like Gboard. By leveraging LLMs to generate synthetic data that mimics user input, the approach avoids the privacy risks of collecting real user data. The post details methods for creating domain-adaptive synthetic data, including a 'buttress module' trained with differential privacy to guide data generation. The approach improves model accuracy and simplifies the development process while maintaining user privacy.
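As a rough illustration of the differential-privacy ingredient, the sketch below shows a generic DP-FedAvg-style aggregation step: clip each client's update to bound its influence, add Gaussian noise, then average. This is not the post's actual 'buttress module' pipeline, and the function and parameter names are assumptions for illustration only.

```python
import numpy as np

def dp_federated_average(client_updates, clip_norm=1.0, noise_multiplier=1.0, seed=0):
    """Aggregate flattened client model updates with clipping plus Gaussian noise.

    client_updates: list of 1-D arrays, one flattened update per client.
    """
    rng = np.random.default_rng(seed)
    clipped = []
    for u in client_updates:
        norm = np.linalg.norm(u)
        # Scale the update down if its norm exceeds clip_norm, bounding each client's influence.
        clipped.append(u * min(1.0, clip_norm / (norm + 1e-12)))
    total = np.sum(clipped, axis=0)
    # Noise scaled to the clip norm masks any single client's contribution.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=total.shape)
    return (total + noise) / len(client_updates)

# Toy usage: three simulated clients with 16-dimensional updates.
updates = [np.random.default_rng(i).normal(size=16) for i in range(3)]
print(dp_federated_average(updates).shape)   # (16,)
```

The point of the pattern is that the server only ever sees a clipped, noised aggregate, which is what makes the downstream synthetic-data generation privacy-preserving.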
Simulating large systems with Regression Language Models
This blog post introduces Regression Language Models (RLMs) for predicting numerical outcomes from complex, unstructured data by treating it as text. RLMs read string representations of inputs and emit numbers as structured text, demonstrated by predicting resource efficiency in Google's compute clusters. The approach avoids feature engineering, adapts to new tasks with few-shot learning, and approximates output probability distributions, offering density estimates and uncertainty quantification. The reported results show accurate predictions and strong adaptability, paving the way for universal system simulators and sophisticated reward mechanisms.
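One way to picture the text-in, numbers-out interface and the sampling-based uncertainty estimate is the sketch below: serialize a structured record to text, sample several numeric completions, and summarize them. The record fields, the stand-in sampler, and the helper names are assumptions, not the paper's actual pipeline.

```python
import json
import statistics

def encode_example(record):
    """Serialize an arbitrary structured input as the text the model reads."""
    return json.dumps(record, sort_keys=True)

def decode_samples(text_samples):
    """Parse numeric strings emitted by the model and summarize them."""
    values = [float(s) for s in text_samples]
    return {"mean": statistics.fmean(values),
            "stdev": statistics.pstdev(values)}   # spread serves as a rough uncertainty signal

def fake_model_sample(prompt, n=5):
    """Stand-in for the trained RLM: in practice you would sample n completions
    from the model for the same encoded input."""
    return ["0.71", "0.69", "0.74", "0.70", "0.72"]

prompt = encode_example({"job": "batch-etl", "cpus": 32, "priority": "low"})
print(decode_samples(fake_model_sample(prompt)))   # {'mean': 0.712, 'stdev': ...}
```

Sampling multiple completions and reading off their spread is what gives the approach its density estimates without any hand-built feature pipeline.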
MedGemma: Our most capable open models for health AI development
Google Research introduces MedGemma, a collection of open models for health AI development that emphasizes efficiency and privacy. The collection adds MedGemma 27B Multimodal, which extends the existing models with multimodal interpretation of electronic health records (EHRs), and MedSigLIP, a lightweight image and text encoder. These models are designed as starting points for medical research and product development, offering adaptability for specific tasks. MedGemma models have shown high accuracy in chest X-ray report generation and competitive performance on medical knowledge benchmarks. Because the models are open, they can be customized and run locally, addressing privacy concerns and providing stable performance for medical applications.
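For developers treating MedGemma as a starting point, a loading sketch along the following lines is plausible, assuming the weights are published on Hugging Face and accepted under their license. The model ID and task string below are assumptions, so check the official model cards before relying on them.

```python
# Minimal sketch of loading a MedGemma checkpoint with the transformers pipeline API.
# The model ID "google/medgemma-4b-it" is an assumption; verify it on the model card.
from transformers import pipeline

pipe = pipeline(
    "text-generation",              # text-only use; multimodal variants also accept images
    model="google/medgemma-4b-it",
)

out = pipe("Summarize the key findings of a normal chest X-ray report.",
           max_new_tokens=128)
print(out[0]["generated_text"])
```

Because the weights are open, this same checkpoint can be fine-tuned on in-house data and served locally, which is the privacy and stability argument the post makes.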