With 4o Image Generation, it's time to revisit Scott Alexander's bet against Vitor. Although Scott declared victory back in Sep. 2022, some commenters (myself included) felt this was dubious. For each prompt, I'll generate exactly four pictures and post all of them.

10:20 PM · Mar 25, 2025

260.3K

Views

Shivers

@thinkingshivers

19h

"A stained glass picture of a woman in a library with a raven on her shoulder with a key in its mouth"

"An oil painting of a man in a factory looking at a cat wearing a top hat"

"A digital art picture of a child riding a llama with a bell on its tail through a desert"

"A 3D render of an astronaut in space holding a fox wearing lipstick"

"Pixel art of a farmer in a cathedral holding a red basketball"

4o Image Generation doesn't just pass this test, it obliterates it. We can quibble with some of the choices (is that really how a raven would hold a key? Why do the fox lips look weird in the 1st pic?) but you can't really doubt that Scott won this bet.

For the record, here are Imagen's results from 2022. Not only do these look much worse and have poor prompt adherence, the content filter was so sensitive you couldn't even generate humans. How far we've come!

Out of curiosity, I also ran these through Midjourney 6.1. They're not bad, and it's nice how varied the images are, but they clearly don't adhere to the prompts as well.

Bonus: "a red sphere on a blue cube, with a yellow pyramid on the right, all on top of a green table" It nails it in one shot!

17K

lumpen bourgeoisie

@JohnSchoffstall

My main complaint is that the basketball isn't red. It's orange, like most basketballs. This is a recurring problem with AI art: if it 'knows' something that contradicts the prompt, it often violates the prompt.

7.6K

Shivers

@thinkingshivers

Yeah, good catch. I think some of them are red enough though.

2.9K

I'm not really actually convinced it passes the test with that stained glass image. It doesn't really look or behave like actual stained glass at all. What it DOES look like is emulating the lame Photoshop filter for stained glass...

4.8K

Shivers

@thinkingshivers

Fair criticism. Maybe Midjourney wins here. You can really see the texture and depth of the stained glass there, even though it doesn't adhere to the prompts as well. Then again, maybe we could fix this with better prompting.

2.5K

SluggyW

@SluggyW

Have you confirmed that these prompt texts are what the image generation tool is directly receiving? Remember: ChatGPT filters your prompts and rewords them. It's possible to bypass this filtering with a simple jailbreak.

5.8K

Shivers

@thinkingshivers

I have not! I don't think it's possible anymore to see that (I know Dall-e used to reword prompts, not sure if that's the case here). Additionally, when they announced this, the images had a "best of 2" or "best of 8" note, indicating it generated a few and picked the best. I

2.5K

posting_my_L_O_Ls

@LogOffTouchAss

try 'a horse riding an astronaut', that was a famous failure when dalle was released too

427

Shivers

@thinkingshivers

I actually saw a few other people testing this (plus 'glass of wine filled to the brim' and 'clock face displaying a specific time'). It's able to do all of them more successfully than previous image models, but it still struggles sometimes. Took two attempts for this one.

393

Alice Ēarendel -- Fae/acc

@kymeriandawn

Now ask it to generate a 13 of hearts. (an example I was given by someone who's not fond of generative ai. Grok fails miserably.)

Not sure if I agreed that 2024 was AGI but between Gemini 2.5, grok 3 and 4.5 following instructions feels like we have multiple instances of a great theory of mind and the ingredients of super intelligence. It will have to be settled in hindsight when AGI happened

1.5K

Lech Mazur

@LechMazur

Got them on the first try also

Quote

Lech Mazur

@LechMazur

20h

The new GPT‑4o image generation gets all five of these prompts correct on the first try in my test! The best prompt adherence yet - we'll need harder tests. x.com/LechMazur/stat…

7.3K

Janek Mann

@janekm

Really interesting to look back on these. I remember seeing it around that time, and remember wondering whether there’s much chance of getting it in the timeframe. And experimenting with SD1.5

Where is the full glass of wine??

IMO we arguably were beyond the threshold 6 months ago

From the slatestarcodex community on Reddit: Did Scott Ever collect on his AI image bet?

Does it need the Plus subscription? The studio ghibli thing doesn't work for me, says "the subject is against content policy" when I was just trying it out with my pfp

414

Egg Syntax

@eggsyntax

In fairness, I'm moderately confident that @slatestarcodex retracted the claim of victory after some pushback.

I wonder if the teams are testing these specific prompts. You know they have a list of test prompts they are optimizing for.

272

Christopher E. Wilmer

@DrChrisWilmer

I'm super impressed, but I am finding that it struggles to depict molecules correctly still. Tried multiple times to get it to depict a caffeine molecule and there was always something wrong with it. Interesting!

415

bertrand russet

@muzaknpotatoes

the most striking thing about this is how precise the timeline was – we're within 2 months of the 3 year mark

530

WuBu ⪋ WaefreBeorn

@waefrebeorn

excellent point and mini article bro

786

Gautam

@DentIndianstu

There is still some work I think to be done. I had a group photo of my volleyball team with 8 people. But ChatGPT 4o refused to generate more than 6 people. I was trying to do the latest studio ghibli style thing but wanted a Haikyuu aesthetic.
