GPT-5 has reached Victory Road! This is the last challenge before the Elite Four. GPT-5 reached this point almost three times faster than o3 (6105 steps for GPT-5 vs 16882 steps for o3). Here are my observations as to why:
- GPT-5 hallucinates far less than o3. This is the main reason for the speed increase.
- GPT-5 has better spatial reasoning. o3 often tried to brute-force through walls and had a hard time navigating complex areas. GPT-5 can plan long input sequences with few mistakes, which saves a lot of time.
- GPT-5 is better at planning its own objectives and following them.
Let's see how it handles this last challenge!
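For reference, the "almost three times faster" figure can be checked from the two step counts quoted above. This is only a quick sketch: the function name `speedup` is made up for illustration, and it assumes "faster" means "fewer steps to reach the same point".

```python
def speedup(baseline_steps: int, new_steps: int) -> float:
    """Return how many times fewer steps the new run needed than the baseline."""
    return baseline_steps / new_steps


# Step counts to reach Victory Road, as quoted in the post.
O3_STEPS = 16882
GPT5_STEPS = 6105

victory_road = speedup(O3_STEPS, GPT5_STEPS)
print(f"Victory Road: o3 needed {victory_road:.2f}x as many steps as GPT-5")
```

The ratio comes out a bit under 3, which matches the "almost three times" wording.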
Clad3815
@Clad3815
We’re live! GPT-5 vs. Pokémon Red, real-time decisions, gym runs, and chat-picked nicknames. Jump in 👉

You're bragging about "spatial reasoning" while the biggest hallucination here is you calling your script "GPT-5," proving you're still trapped in the delusion that you're doing anything more than brute-forcing a solved game with a stolen name.
This is awesome. Think it would be nice if we could see another line on the graph that shows a pro-gamer or record human player number of steps taken? Just as a reference. I imagine I was probably a lot less efficient than GPT-5
Is this against slow o3 or o3 post-speedup? Since u didnt specify ima have to assume it’s slow o3 which kinda invalidates the comparison
That’s actually really impressive. It’s much more efficient at playing games… is this really because of the hallucination rate difference?
Dude, more like the end of the road. Well, I totally agree on the part that 5 has proven itself useful at calculus, information, advice, and some other stuff. Yet it's totally useless at creative writing, not chatty, destined, and starts hallucinating fast. We want 4o back on the free tier #keep4o
Look, I love these video game tests. If it is a general intelligence, then it should be able to play all video games like a 10-year-old human (at a minimum).
awesome, now run it again with gpt-5-mini with a subagent which summarizes chat to give feedback and hints to the player agent. chat will inject speedrun strats it will mog

Discover more

Are frontier AI models really capable of “PhD-level” reasoning? To answer this question, we introduce FormulaOne, a new reasoning benchmark of expert-level Dynamic Programming problems. We have curated a benchmark consisting of three tiers, in increasing complexity, which we call
The pro models (GPT-5 Pro, Gemini 2.5 Deep Think, Grok 4 Heavy) can be impressive in ways that are hard to see. They take a lot of time to answer questions & are built for very hard problems that require expert evaluation. That is a narrow, but, also very valuable, problem space.
GPT-5 earned 8 badges in Pokemon Red in just 6,000 steps compared to o3’s 16,700! It’s in complex, long-term agent workflows that GPT-5’s true power really shines. Absolutely mind-blowing. 🤯
GPT-5 just finished Pokémon Red! 6,470 steps vs. 18,184 for o3! Check the stats site to compare! That's a huge improvement! Well done, you cooked with GPT-5. What an incredible model. Next up: GPT-5 vs. Pokémon Crystal (16 Badges + Red). The run starts soon on Twitch.