Claude Sonnet 3.5 generated significantly better ideas for research papers than humans, but when researchers tried executing the ideas the gap between human & AI idea quality disappeared
Execution is a harder problem for AI. (Yet this is a better outcome for AI than I expected)
Post
Conversation
and the rate of increase in capabilities of human execution is \epsilon, while for the AIs...
Members of the military and their families deserve fast access to private, convenient, and affordable mental health support. Sign up for Talkspace therapy, psychiatry, or teen therapy to receive high-quality care without delay, covered by TRICARE.
AI can see the destination but can’t navigate the journey. Ideas are about connecting dots that exist; execution is about creating dots that don’t.
So does that mean that Claude came up with more novel and interesting ideas - that were kinda unfeasible ? Maybe the human researchers filtered out the unviable ones through experience?
Research is a parallel process, I wonder if there's a lack of diversity in AI's research ideas?
Worth noting that there's some debate over whether the ideas (pre-execution step) were actually novel
AIs excel at being impressive. That vein runs deep in the training data (basically social media)
Reminds of what Terence Tao said in the recent Friedman podcast: a lack of ‘smell’ or taste in their thinking
That drop-off in execution quality is the real story here. AI can brainstorm, but the messy reality of actually *doing* research is a whole different ballgame. Makes sense why that's harder for it right now.
Members of the military and their families deserve fast access to private, convenient, and affordable mental health support. Sign up for Talkspace therapy, psychiatry, or teen therapy to receive high-quality care without delay, covered by TRICARE.