Less than a year from announcement to near saturation.
(On to ARC-AGI-3)
François Chollet
@fchollet
Replying to @fchollet
Unlike ARC-AGI-1, this new version is not easily brute-forced. Current top AI approaches score 0-4%.
All base LLMs (GPT-4.5, Claude 3.7 Sonnet, Gemini 2, etc.) score 0%. Single-CoT reasoning models (Claude Thinking, R1, o3-mini…) score 0-1%.
So you can't solve these tasks via