Post

Conversation

Finally filled out my forecasts. This was excellently designed!

Loved having the context laid out since I'm too lazy to read the papers. TL;DR I think everything saturates except FrontierMath, which merely decuples.

Talk about it at Minifest tomorrow!

Quote

Sage

@sage_future_

Dec 5

Is AGI just around the corner or is AI scaling hitting a wall? To make this discourse more concrete, we’ve created a survey for forecasting concrete AI capabilities by the end of 2025. Fill it out and share your predictions by end of year! bit.ly/ai-2025

“With long timeline people like these, who needs short timeline people”

I never felt like a "long timelines person" tbh, since 2019 my attitude has been "Holy crap this could be soon & we should prep." Left tails matter more than medians! And IME a lot of people who have "shorter" timelines than me forecast much weaker endpoints.

FWIW communication wise, my understanding of your views 1-2 years ago from skimming BioAnchors and reading summaries + your update + a talk you gave + LW debate, would absolutely not have made me think you would be expecting this for 2025. I am highly surprised.

A lot of what's going on here is that I think we've seen repeatedly that benchmark performance moves faster than anyone thinks, while real-world adoption and impact moves slower than most bulls think.

I have been surprised every year the last four years at how little AI has impacted my life and the lives of ordinary people. So I'm still confused how saturating these benchmarks translates to real-world impacts.

As soon as you have a well-defined benchmark that gains significance, AI developers tend to optimize for it, so it gets saturated way faster than expected — but not in a way that generalizes perfectly to everything else.

Ajeya Cotra

@ajeya_cotra

In the last round of benchmarks we had basically few-minute knowledge recall tasks (e.g. bar exam). Humans that can do those tasks well also tend to do long-horizon tasks that draw on that knowledge well (e.g. be a lawyer). But that's not the case for AIs.

9:54 AM · Dec 16, 2024

Ajeya Cotra

@ajeya_cotra

Dec 16

This round of benchmarks is few-hour programming and math tasks. Humans who do those tasks very well can also handle much longer tasks (being a SWE for many years). But I expect AI agents to solve them in a way that generalizes worse to those longer tasks.

How much worse? Still, annoyingly, a very wide range in my mind.

In this specific case, my assumption is that the model has been pre-trained on a corpus of e.g. hundreds of practice questions, making the Bar exam a benchmark perfectly crafted to overestimate LLMs relative to humans.
