Quote
Anthropic
@AnthropicAI
New Anthropic research: Tracing the thoughts of a large language model. We built a "microscope" to inspect what happens inside AI models and use it to understand Claude's (often complex and surprising) internal mechanisms.
David Watson 🥑

This really shines when it gets a wildly wrong answer and then "shows how" with a non sequitur that shows it's independently forging how it came up with the answer (lying)
I want to see one model argue with another and use this tracing tool to prove the other one is biased to shut it down
This is remarkably similar to how humans do arithmetic. A notable difference though is humans are also able to apply an iterative algorithm when pushed for a better answer. Is that ever going to be possible in a non-recurrent architecture? My gut says no, although yes with
lies, openai is the largest gathering of word wizards and neuromages ever assembled in the history of spellcasting. i will not stand for magick erasure when so many have died in the mana mines to make this happen
It's definitely a jungle gym in there!
Quote
WikiBonsai
@wibomd
Replying to @wibomd
Certainly, like mental weight-training the more one builds and traverses these trails and connections, the stronger the structure becomes in one's mind. Like traveling between bars on a **jungle gym**, each one makes you stronger and more capable of reaching the next one. 7/n