Anthropic's Interpretability team is seeking feedback on early experiments with Crosscoder Model Diffing. Their recent publications explore Claude's ability to operate in real-world scenarios, such as running a small shop, and address potential security risks, including how LLMs could act as insider threats. They also propose a method for secure LLM inference using Trusted Virtual Machines.
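
For readers unfamiliar with the technique, crosscoder model diffing trains a single sparse dictionary over paired activations from two models (for example, a base model and a fine-tuned variant), then compares each feature's decoder weights across the two models to surface features that belong mostly to one of them. The sketch below is a minimal illustration of that idea under those assumptions, not Anthropic's implementation; all names (`CrossCoder`, `d_act`, `d_latent`, `loss_fn`) are hypothetical.

```python
import torch
import torch.nn as nn


class CrossCoder(nn.Module):
    """Minimal crosscoder sketch for model diffing.

    One shared sparse latent code reconstructs paired activations
    from two models. Features whose decoder norms differ sharply
    between the models are candidates for "what changed".
    """

    def __init__(self, d_act: int, d_latent: int):
        super().__init__()
        # Per-model encoders/decoders, shared latent space.
        self.enc_a = nn.Linear(d_act, d_latent, bias=False)
        self.enc_b = nn.Linear(d_act, d_latent, bias=False)
        self.bias = nn.Parameter(torch.zeros(d_latent))
        self.dec_a = nn.Linear(d_latent, d_act)
        self.dec_b = nn.Linear(d_latent, d_act)

    def forward(self, act_a: torch.Tensor, act_b: torch.Tensor):
        # Sparse code sums contributions from both models' activations.
        z = torch.relu(self.enc_a(act_a) + self.enc_b(act_b) + self.bias)
        return self.dec_a(z), self.dec_b(z), z


def loss_fn(model, act_a, act_b, l1_coeff=1e-3):
    rec_a, rec_b, z = model(act_a, act_b)
    # Reconstruct both models' activations from the shared code.
    recon = (rec_a - act_a).pow(2).mean() + (rec_b - act_b).pow(2).mean()
    sparsity = z.abs().mean()  # L1 penalty encourages sparse features
    return recon + l1_coeff * sparsity


def decoder_norm_diff(model):
    # Per-feature decoder norms, shape (d_latent,). Features with a
    # strongly positive or negative score are model-specific.
    norm_a = model.dec_a.weight.norm(dim=0)
    norm_b = model.dec_b.weight.norm(dim=0)
    return (norm_a - norm_b) / (norm_a + norm_b + 1e-8)
```

After training on paired activations, ranking features by `decoder_norm_diff` gives a rough list of candidates for features present in one model but not the other, which is the core of the diffing approach.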