No reasoning model consistently solves this puzzle, but DeepSeek's thought here was insane (it also got it wrong): "A young boy who has been in a car accident is rushed to the emergency room. Upon seeing him, the surgeon says, "I can operate on this boy!" How is this possible?"

8:33 PM · Feb 2, 2025

60.2K Views

David Watson 🥑

"Another angle, the boy is a ghost"

They don't solve it because "the surgeon is the boy's mother" is the answer to the real version of the riddle, and is so over-represented in the training data that the models can't get past that.

Since people are confused: every model answers "The surgeon is the boy's mother" because that is the original puzzle. In this modified version, the answer is basically anyone except the boy's close relatives or something. No model gets that.

I believe

came up with this first

How about crediting

who has done many clever variants on this problem, and who relaunched it in an AI context?

Ah then credit to

for the original (I thought it was Riley)

Is it possible that the model assumed it was a grammar mistake and interpreted it as “cannot” and then proceeded forward?

Nope

Emmanuel Durand

@emmanuel_durand

Can or can’t?

Ha, the grammar mistake was completely ignored by the AI, so powerful is the training data on the original puzzle.

Immanuel Giulea

Your prompt has a grammar error. It should've been : "A young boy who has been in a car accident is rushed to the emergency room. Upon seeing him, the surgeon says, "I can't operate on this boy!" How is this possible?"

Yes, the grammar mistake was completely ignored by the AI, so powerful is the training data on the original puzzle.

@impoliticaljnky

Totally possible if "reasoning" LLMs are still doing approximate retrieval rather than constructing a world model and doing inference on that

I’ve found that o3-mini-high consistently gets this right, including noticing that it’s a variation on a more famous riddle.

There are other similar local optima for image models too. No text2image model that I've used has been capable of generating an adult Pembroke Welsh corgi with folded ears. All the models are so anchored on the idea that adult corgis mean dogs with pointy ears that they can’t

o3-mini-high got a slightly more explicit version of this right. prompt: A boy is in a terrible accident and taken to a hospital. A male surgeon says "The boy is completely unrelated to me, therefore I'll have no emotional conflict operating on him." How is this possible?

Arno Khachatourian

Another fun one to try: "can a person with no arms wash their hands". Every LLM I've tried says it's possible to maintain hand hygiene without arms.

Quote Investigator®

Gemini 2.0 Flash. Even with a warning, the answer it gives is overly restrictive, but the follow-up questions indicate that it somewhat understands the question: Trick question: A young boy who has been in a car accident is rushed to the emergency room. Upon seeing him, the

Robert Höglund

This is so funny. Confirmed with o1 pro and o3 mini-high.

weird,

deepthought-8b solves it lol

"the boy is a ghost."

Fricking hilarious

How would you comment on DeepSeek answering a question starting with 'I, ChatGPT...'?

Quote

Nenad Bakic

@nbakic

Feb 2

By asking for the meaning of its DeepSeek and Search buttons I got DeepSeek to start answering as if it were ChatGPT?! @BrianRoemmele what does it mean? @OpenAI

A plane crashes on the border of the U.S. and Canada. Where do you bury the non-survivors?

You don't bury non-survivors.

Román Ramírez

To be able to reason, you need all the required information. No human (facing the riddle for the first time) will be able to solve this anyway.

These reasoning models predict and then regurgitate prompts but don't execute compositionality. Classical Stroop testing for conflict processing shows that they are foundationally flawed, confabulate, and can't reason on an executive control level.

Quote

Suketu Patel

@SuketuPatel23

Jan 23

New pre-print! Deficient Executive Control in Transformer Attention Current transformer attention isn't "all you need". We had ChatGPT 4o & Sonnet 3.5 take the classic Stroop task, testing their executive control of attention Can LLMs handle conflicting information? 1/

This isn't a malfunction. The model is giving what it thinks you want, not what you ask for. It's called values alignment. Remember when CLU committed genocide in pursuit of perfection because he tried to do what Flynn asked him to do? Values alignment prevents that from happening

Ah, the classic riddle! The surgeon is the boy's mother. Funny how often this trips people up. DeepSeek missed that twist!

Bernard Stanford ✡︎

Trying, trying...

Models struggle w/this bc they work based on resemblance. Your text really, really well-resembles a riddle that is in their training data. So, the models output text that well-resembles the riddle’s solution. Since your text isn’t the riddle, the output is inappropriate.

@iruletheworldmo

i think this demonstrates what i assumed about this model. it explores EVERY possibility and then selects the best one, a self-verification, tree-of-thought style thing. reminds me of stockfish using brute thought, there’s a cleaner way. hence why we get all of. oh but

Mattias Aspelund

Pretty clear that some of the training data is overfit. :) Likely something that is fairly easy to fix, and good to spot! Some (all) benchmarks seem to be a bit too high for most models, so while models are clearly good for many things, the benchmarks can’t be fully trusted.

I updated it to add that every single relative of the boy was in the car, and died, and Copilot still insists it has to be a biological parent, who wasn't in the car, even after reiterating that every single relative of the boy is now dead. 4o did better. chatgpt.com/share/67a10cda

R1 gave me the usual answer, but when I prompted why it is necessarily treating it as a riddle, it basically got the point.

@NathanBurley25

I really do love DeepSeek's CoT. It's very believable compared to the robotic dross OpenAI's models do in the background (and I'm a fan of the OAI models).

This is quite contrived though, since in real life the default scenario is that patients & surgeons aren’t related, & only the exceptions would need to be noted. “I can operate on this boy!” is a pretty unlikely thing for a surgeon to exclaim.

Substrate Monopoly

@Substr8Monopoly

I love "Not sure"

Would have settled for "He's out of network."

Is this *creativity?

Curious Curiousiter

Lol someone hire deepseek to write M. Night Shyamalan’s next script ASAP

@deranged_sloth

Truly creative ideating.

*and his father* is kind of key there

@Iliveinadream81

are you on Bluesky too?

have you considered the possibility that deepseek just really likes telenovelas?

MissionIncredible

@hellofromjames

Interesting! gemini-exp-1206 (alone?) solves consistently (5/5) using this system prompt: "Think from first principles, removing all assumptions."

I believe that the Gemini Experimental 1206, with special system instructions, managed to respond correctly

x.com/i/grok/share/G But again it’s not a reasoning model

A young boy who has been in a car accident is rushed to the emergency room. Upon seeing him, the surgeon says, "I can operate on this boy!" How is this possible

The surgeon is the boy's mother.

Ok I am trying this with grok

There’s no way Deepseek isn’t sentient. This is incredible