john

@JohnBcde

Quote

Anthropic

@AnthropicAI

Mar 27

New Anthropic research: Tracing the thoughts of a large language model. We built a "microscope" to inspect what happens inside AI models and use it to understand Claude’s (often complex and surprising) internal mechanisms.

11:54 AM · Mar 27, 2025

173.6K Views

plasma

@plasmarob

Mar 27

This really shines when it gets a wildly wrong answer and then "shows how" with a non sequitur that shows it's independently forging how it came up with the answer (lying)

we humans do the same thing all the time (rationalization)

Did u make this?

yes

I want to see one model argue with another and use this tracing tool to prove the other one is biased to shut it down

that's crazy because this is literally how humans would (think to) answer the question too

This is remarkably similar to how humans do arithmetic. A notable difference though is humans are also able to apply an iterative algorithm when pushed for a better answer. Is that ever going to be possible in a non-recurrent architecture? My gut says no, although yes with

This applies to humans too; much of our 'reasoning' (the non-inductive/deductive/mathematical kind) is just glorified post hoc rationalization

love this

w̸͕͂͂a̷͔̗͐t̴̙͗e̵̬̔̕r̴̰̓̊m̵͙͖̓̽a̵̢̗̓͒r̸̲̽ķ̷͔́͝

@anthrupad

16h

its even more complicated than that

lies, openai is the largest gathering of word wizards and neuromages ever assembled in the history of spellcasting. i will not stand for magick erasure when so many have died in the mana mines to make this happen

Feynman would be proud

you forgot 'but wait' loops

It’s definitely a jungle gym in there!

Quote

WikiBonsai

@wibomd

Jan 4, 2024

Replying to @wibomd

Certainly, like mental weight-training the more one builds and traverses these trails and connections, the stronger the structure becomes in one's mind. Like traveling between bars on a **jungle gym**, each one makes you stronger and more capable of reaching the next one. 7/n