Anthropic's Interpretability team shares preliminary work on crosscoder model diffing and invites feedback from researchers, much as a colleague might circulate early-stage experiments. Anthropic also highlights three publications: 'Project Vend', which explores Claude's ability to run a small shop and examines both its potential and its limitations; 'Agentic Misalignment', which discusses how LLMs could pose insider threats, raising AI-safety concerns; and 'Confidential Inference via Trusted Virtual Machines', which proposes a method for secure LLM inference.