Ok, this thread is long overdue. Now that everyone sees that GPT-4.5 is a disappointment - How has OpenAI underperformed so much on the intelligence of their core models? They've been faring worse and worse since the day GPT-4 came out. Buckle in for the tale:
David Watson 🥑

First, ChatGPT in Nov 2022 and GPT-4 in Mar 2023 were amazing, world-changing innovations. Full credit there. Shortly after GPT-4, forecasts of GPT-5's release began. The initial view was mid 2024. It has crept up and up and up:
[Image]
OpenAI did release GPT-4-Turbo in Dec 2023, and GPT-4o in May 2024. Way faster and cheaper, but hardly better. Fine - for almost a full year, GPT-4 was king. And so OpenAI was still king. But Anthropic released Claude-3-Opus on March 4, 2024 and took the crown.
Claude-3-Opus was immediately and clearly smarter than GPT-4 class models. I mean real intelligence here, not usability or whatnot. (We at futuresearch.ai switched our most important operations over, despite the cost.) As we know, Anthropic was just getting started:
3 months later: Claude-3.5-sonnet, big intelligence jump
4 months later: Claude-3.6-sonnet, big intelligence jump
4 months later: Claude-3.7-sonnet, big intelligence jump
Even Google caught up and surpassed OpenAI in core LLMs during this time. Google!!
But that was a preview. o1 was the king of thinking models, right? The huge cost and latency increase over Sonnet makes it usable only in niche cases. Who actually used it in production? Nobody I know. Claude-3.7-Sonnet-Thinking might have dethroned it anyway (too soon to say).
But, but, but - you say - what about other modalities? What about OpenAI Whisper for audio, or Dalle-3 for images, and SORA for video? None of them are state of the art! Take SORA for example, its own story of disappointment and delay:
SORA was teased on x.com in Feb 2024. In March, Mira Murati said release "possibly before summer". More teasers. In Sept it was still in private beta. It was finally released in Dec (almost 10 months after the teaser!), and by then it was not a top 3 video model!
As I keep saying: OpenAI lost the mandate of heaven. Why? A thread for another day. Maybe because most of their top researchers left, first to found Anthropic, then again after the board coup. Disagree? I'll take bets on whether OpenAI will ever reclaim the top spot.
