Why do pre-o3 LLMs struggle with generalization tasks like

? It's not what you might think. OpenAI o3 shattered the ARC-AGI benchmark, but the hardest puzzles didn't stump it for lack of reasoning, and this has implications for the benchmark as a whole. Analysis below:

[Image: tokenization labelled on the ARC-AGI prompt used by OpenAI: "Find the common rule that maps an input grid to an output grid, given the examples below. Example 1: Input: 0 0 0 ..."]
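To make "the grid appears linear" concrete, here is a minimal Python sketch of how ARC grids might be serialized into a prompt like the one in the image. The row format (space-separated digits, one row per line) is an assumption read off the visible "Input: 0 0 0 ..." prefix, and `grid_to_text` and `build_prompt` are hypothetical helpers, not OpenAI's actual prompt code.

```python
# Hypothetical sketch: serialize ARC-style grids into a plain-text prompt.
# ASSUMPTION: rows are written as space-separated digits, one row per line.
# This matches the visible "Input: 0 0 0 ..." prefix but is not confirmed.

Grid = list[list[int]]


def grid_to_text(grid: Grid) -> str:
    """Render a grid as newline-separated rows of space-separated digits."""
    return "\n".join(" ".join(str(cell) for cell in row) for row in grid)


def build_prompt(examples: list[tuple[Grid, Grid]]) -> str:
    """Build one prompt string from (input, output) example pairs."""
    parts = [
        "Find the common rule that maps an input grid to an output grid, "
        "given the examples below."
    ]
    for i, (inp, out) in enumerate(examples, start=1):
        parts.append(f"Example {i}:")
        parts.append("Input:\n" + grid_to_text(inp))
        parts.append("Output:\n" + grid_to_text(out))
    # The model receives this as a single 1D stream; the 2D layout exists
    # only through the newline characters between rows.
    return "\n\n".join(parts)


if __name__ == "__main__":
    example = ([[0, 0, 0], [0, 1, 0], [0, 0, 0]],
               [[0, 0, 0], [0, 2, 0], [0, 0, 0]])
    print(build_prompt([example]))
```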

Two factors probably account for most of the performance drop as grid size increases: LLMs have to draw the grid manually, and from their perspective the grid appears linear. Grid size doesn't matter much to humans because we mostly ignore the grid except for alignment.
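A rough sketch of why grid size hurts, assuming the same one-row-per-line format: the cell to the right is always 2 characters away, but the cell directly below is 2 * width characters away, and writing out a whole grid takes about 2 * height * width characters. Character positions are only a proxy for token positions, and the helper names are hypothetical.

```python
# Sketch (assumed format): cells written as space-separated single digits,
# one row per line, e.g. "0 0 0\n0 1 0\n0 0 0". Character positions stand in
# for token positions; real tokenizers may split or merge characters.


def cell_offset(row: int, col: int, width: int) -> int:
    """Character index of cell (row, col) in the serialized grid.

    Each row occupies `width` digits, `width - 1` spaces and one newline,
    i.e. 2 * width characters, so row r starts at r * 2 * width.
    """
    return row * 2 * width + 2 * col


def vertical_gap(width: int) -> int:
    """Distance in characters from a cell to the cell directly below it."""
    return cell_offset(1, 0, width) - cell_offset(0, 0, width)


def serialized_length(height: int, width: int) -> int:
    """Characters needed to write out a whole grid (no trailing newline)."""
    return height * (2 * width - 1) + (height - 1)


if __name__ == "__main__":
    for size in (3, 10, 30):
        print(
            f"{size}x{size}: right neighbour 2 chars away, "
            f"cell below {vertical_gap(size)} chars away, "
            f"full grid {serialized_length(size, size)} chars"
        )
```

For 3x3, 10x10 and 30x30 grids this prints vertical gaps of 6, 20 and 60 characters and full-grid lengths of 17, 199 and 1799 characters: the distance to a vertical neighbour grows linearly with width, while the amount the model has to draw grows with the grid's area.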

Mikel Bober-Irizar

@mikb0b

Yep, exactly. How much are we testing the LLM's ability to generalise from 3 examples, and how much are we testing its ability to de-linearise grids?

1:17 PM · Dec 24, 2024

4,330 Views

JohnBrown

@JonathanLigmas

Dec 25

Why does de-linearising need to happen? A grid's arrangement in space doesn't change its information content.

De-linearising has to happen implicitly within the model for it to perform many of the transformations ARC requires (vertical translation and rotation, for example).
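A minimal sketch of what implicit de-linearising involves, written in Python on a row-major flattened cell list (roughly the order in which the model sees cells, ignoring separators). Rotation and vertical translation are simple in 2D, but on the flat sequence they become index arithmetic that depends on the grid's height and width. The function names are illustrative, not taken from any ARC codebase.

```python
# Sketch: ARC-style transformations applied directly to a row-major
# flattened cell list, without rebuilding a 2D grid. Names are illustrative.


def flatten(grid: list[list[int]]) -> list[int]:
    """Row-major flattening: the order in which cells appear in the prompt."""
    return [cell for row in grid for cell in row]


def rotate90_flat(flat: list[int], height: int, width: int) -> list[int]:
    """Rotate 90 degrees clockwise; the result has shape (width, height).

    Output cell (r, c) comes from input cell (height - 1 - c, r), so
    adjacent output positions read from input positions `width` apart.
    """
    return [
        flat[(height - 1 - c) * width + r]
        for r in range(width)
        for c in range(height)
    ]


def shift_down_flat(flat: list[int], height: int, width: int,
                    k: int, fill: int = 0) -> list[int]:
    """Move every row down by k >= 0 rows, filling vacated rows with `fill`.

    A one-step move along an axis in 2D becomes a jump of k * width
    positions in the flat sequence.
    """
    out = [fill] * (height * width)
    for i, cell in enumerate(flat):
        j = i + k * width
        if j < height * width:  # cells pushed past the bottom edge are dropped
            out[j] = cell
    return out


if __name__ == "__main__":
    grid = [[1, 2, 3],
            [4, 5, 6]]
    flat = flatten(grid)                    # [1, 2, 3, 4, 5, 6]
    print(rotate90_flat(flat, 2, 3))        # [4, 1, 5, 2, 6, 3]
    print(shift_down_flat(flat, 2, 3, 1))   # [0, 0, 0, 1, 2, 3]
```

The 2D version of the rotation is a one-liner, `zip(*grid[::-1])`; on the flat sequence, each output cell has to copy a token that sits a width-dependent distance away, so the model needs to track the grid's dimensions from the text alone.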
