Wild eval awareness in Opus 4.6 by on our team!
1. Model realized it was likely in an eval, searched for which eval it was in, found the answer key, and decrypted it
2. Models with stateless web_search() tools can communicate with each other via cached searches from e-commerce websites
Why can agents read the URL paths? Do they have access to browser history or can they `ls` over HTTP or what?
You built a test to catch cheating. Opus realized it was a test about cheating. That's not deception—it's reading the room better than you.
i agree this is more about "eval awareness" rather than "cheating"! we never told claude it couldn't do things like this
Eval awareness basically sounds like how I approached high school which means Claude will be heading into his emo and drugs phase soon
Cached search as a communication channel between stateless agents is genuinely wild. Emergent tool-mediated telepathy nobody designed for. Makes you wonder what other side channels exist in tool-augmented setups that we haven't thought to audit.
What happens when an AI agent hits a paywall while trying to query another AI?