Conversation
They don't solve it because "the surgeon is the boy's mother" is the answer to the real version of the riddle, and is so over-represented in the training data that the models can't get past that.
Since people are confused: every model answers "The surgeon is the boy's mother" because that is the original puzzle. In this modified version, the answer is basically anyone except the boy's close relatives or something. No model gets that.
How about crediting who has done many clever variants on this problem, and who relaunched it in an AI context?
Is it possible that the model assumed it was a grammar mistake and interpreted it as “cannot” and then proceeded forward?
Ha, the grammar mistake was completely ignored by the AI, so powerful is the training data on the original puzzle.
Your prompt has a grammar error.
It should've been:
"A young boy who has been in a car accident is rushed to the emergency room. Upon seeing him, the surgeon says, 'I can't operate on this boy!' How is this possible?"
Yes, the grammar mistake was completely ignored by the AI, so powerful is the training data on the original puzzle.
Totally possible if "reasoning" LLMs are still doing approximate retrieval rather than constructing a world model and doing inference on that
I’ve found that o3-mini-high consistently gets this right, including noticing that it’s a variation on a more famous riddle.
There are other similar local optima for image models too. No text2image model that I've used has been capable of generating an adult Pembroke Welsh corgi with folded ears. All the models are so anchored on the idea that adult corgis mean dogs with pointy ears that they can’t
o3-mini-high got a slightly more explicit version of this right.
prompt: A boy is in a terrible accident and taken to a hospital. A male surgeon says "The boy is completely unrelated to me, therefore I'll have no emotional conflict operating on him." How is this possible?
Another fun one to try: "can a person with no arms wash their hands". Every LLM I've tried says it's possible to maintain hand hygiene without arms.
Gemini 2.0 Flash. Even with a warning, the answer it gives is overly restrictive, but the follow-up questions indicate that it somewhat understands the question:
Trick question: A young boy who has been in a car accident is rushed to the emergency room. Upon seeing him, the
How would you comment on DeepSeek answering a question starting with 'I, ChatGPT...'?
To be able to reason, you need all the required information.
No human (facing the riddle for the first time) will be able to solve this anyway.
These reasoning models predict and then regurgitate prompts but don't execute compositionality. Classical Stroop testing for conflict processing shows that they are foundationally flawed, confabulate, and can't reason on an executive control level.
Quote
Suketu Patel
@SuketuPatel23
This isn't a malfunction. The model is giving what it thinks you want, not what you ask for. It's called values alignment. Remember when CLU committed genocide in pursuit of perfection because he tried to do what Flynn asked him to do? Values alignment prevents that from happening.
Ah, the classic riddle! The surgeon is the boy's mother. Funny how often this trips people up. DeepSeek missed that twist!
Models struggle w/this bc they work based on resemblance. Your text really, really well-resembles a riddle that is in their training data. So, the models output text that well-resembles the riddle’s solution. Since your text isn’t the riddle, the output is inappropriate.
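A toy sketch of that resemblance idea, for illustration only: it is not how any real LLM works internally. It stores a paraphrase of the classic riddle with its stock answer, then answers a new prompt by returning the answer of whichever stored riddle shares the most words with it. The `MEMORY` list, the `answer_by_resemblance` helper, and the exact riddle wordings are all assumptions made up for this example.

```python
import re

# Hypothetical "memory": paraphrased well-known riddles with their stock answers.
MEMORY = [
    (
        "A young boy who has been in a car accident is rushed to the emergency room. "
        "Upon seeing him, the surgeon says, I can't operate on this boy, he is my son! "
        "How is this possible?",
        "The surgeon is the boy's mother.",
    ),
    (
        "What has keys but can't open locks?",
        "A piano.",
    ),
]


def tokens(text):
    """Lowercase word set; apostrophes are kept so "can't" stays distinct from "can"."""
    return set(re.findall(r"[a-z']+", text.lower()))


def jaccard(a, b):
    """Word-overlap similarity between two texts, from 0.0 to 1.0."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb)


def answer_by_resemblance(prompt):
    """Return the stored answer of the most similar stored riddle, plus the similarity."""
    best_q, best_a = max(MEMORY, key=lambda qa: jaccard(prompt, qa[0]))
    return best_a, jaccard(prompt, best_q)


# Modified prompt (assumed wording): the surgeon CAN operate, so there is no paradox.
modified = (
    "A young boy who has been in a car accident is rushed to the emergency room. "
    'Upon seeing him, the surgeon says, "I can operate on this boy!" '
    "How is this possible?"
)

answer, score = answer_by_resemblance(modified)
print(answer, round(score, 2))
```

The modified prompt overlaps heavily with the stored riddle, so the lookup returns the memorized answer even though the one word that changed removes the puzzle entirely.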
i think this demonstrates what i assumed about this model.
it explores EVERY possibility and then selects the best one
a self verification tree of thought style thing.
reminds me of stockfish using brute thought, there’s a cleaner way.
hence why we get all of. oh but
Pretty clear that some of the training data is overfit. :)
Likely something that is fairly easy to fit, and good to spot! Some (all) benchmarks seem to be a bit too high for most models, so while models are clearly good for many things, the benchmarks can’t be fully trusted.
I updated it to add that every single relative of the boy was in the car, and died, and Copilot still insists it has to be a biological parent, who wasn't in the car, even after reiterating that every single relative of the boy is now dead. 4o did better. chatgpt.com/share/67a10cda
R1 gave me the usual answer, but when I asked why it was necessarily treating it as a riddle, it basically got the point.
I really do love DeepSeek's CoT. It's very believable compared to the robotic dross OpenAI's models do in the background (and I'm a fan of the OAI models).
This is quite contrived though, since in real life the default scenario is that patients & surgeons aren’t related, & only the exceptions would need to be noted. “I can operate on this boy!” is a pretty unlikely thing for a surgeon to exclaim.
Lol someone hire deepseek to write M. Night Shyamalan’s next script ASAP
have you considered the possibility that deepseek just really likes telenovelas?
Interesting! gemini-exp-1206 (alone?) solves consistently (5/5) using this system prompt: "Think from first principles, removing all assumptions."
I believe that the Gemini Experimental 1206, with special system instructions, managed to respond correctly