People underrate how big a bottleneck inference compute will be, especially if you have short timelines. There are currently about 10 million H100-equivalents in the world, and by some estimates the human brain has the same FLOPS as an H100. So even if we could train an AGI as inference-efficient as a human, we couldn't sustain a very large population of AIs. Not to mention that a large fraction of AI compute will continue to be used for training, not inference. And while AI compute has been growing 2.25x per year so far, by 2028 you'd be pushing against TSMC's overall wafer production limits, which grow only 1.25x per year according to the AI 2027 Compute Forecast. ht: "Can AI Scaling Continue Through 2030?", AI 2027 Compute Forecast
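The arithmetic behind the post can be sanity-checked with a short script. All inputs are the post's own estimates (fleet size, growth rates), not measurements:

```python
# Back-of-envelope: demand for AI compute has grown ~2.25x/yr, but wafer
# supply (the hard ceiling) grows only ~1.25x/yr per the AI 2027 forecast.
BASE_FLEET = 10e6        # ~10M H100-equivalents today (post's estimate)
DEMAND_GROWTH = 2.25     # historical AI compute growth per year
SUPPLY_GROWTH = 1.25     # TSMC wafer capacity growth per year

def fleet_after(years: int, growth: float, base: float = BASE_FLEET) -> float:
    """H100-equivalents after `years` of compounding at `growth` per year."""
    return base * growth ** years

# Three years out (~2028): demand-side trend vs. supply-side ceiling.
print(f"trend:  {fleet_after(3, DEMAND_GROWTH) / 1e6:.0f}M H100-equivalents")
print(f"wafers: {fleet_after(3, SUPPLY_GROWTH) / 1e6:.0f}M H100-equivalents")
```

The ~6x gap between the two trajectories after just three years is the whole argument: once fabs are the binding constraint, fleet growth drops to the wafer rate.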
Quote
Dean W. Ball
@deanwball
cue the @ohlennart laser eyes meme

If you think in those terms, it seems the corresponding prediction is that AI starts to have a real impact only after passing the 98th percentile of human intelligence, rather than average human intelligence.
I wouldn't put it mainly in terms of intelligence. I would put it in terms of the economic value of their work. Long term coherence, efficient+online learning, advanced multimodality seem like much bigger bottlenecks to the value of these models than their intelligence.
I think there's room to be more inference-efficient than humans in some cases. For example, NVIDIA claims as much as 30k tok/s for R1 on 8xB200, i.e. ~2k tok/s per H100-equivalent. Probably only a fraction of that is achievable in practice, but 500 tok/s per H100 is well
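Assuming, as the "~2k tok/s per H100 equiv" figure implies, that one B200 counts as roughly two H100-equivalents (a rough conversion, not an official one), the arithmetic works out like this:

```python
# NVIDIA's claimed aggregate DeepSeek-R1 throughput on one 8xB200 node.
node_tok_per_s = 30_000
b200_per_node = 8
h100_equiv_per_b200 = 2   # rough assumption: 1 B200 ≈ 2 H100s

tok_per_s_per_h100 = node_tok_per_s / (b200_per_node * h100_equiv_per_b200)
print(tok_per_s_per_h100)   # 1875.0, i.e. "~2k tok/s per H100-equivalent"
```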
Humans can generate a few tokens per second on 20W of brain. H100 can generate >1000 Llama-70B tokens per second on 700W of electricity. AI as inference-efficient as humans is already here. And you can sustain a lot of 20W AGIs on gigawatts of datacenter power!
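The power math in this reply, spelled out (20 W is the commonly cited brain-power figure; 1 GW is an illustrative datacenter scale, not a specific facility):

```python
BRAIN_WATTS = 20          # rough power draw of a human brain
DATACENTER_WATTS = 1e9    # a 1 GW datacenter, for illustration

# How many 20 W "human-efficiency AGIs" a 1 GW datacenter could power,
# ignoring cooling, networking, and storage overhead:
agi_population = DATACENTER_WATTS / BRAIN_WATTS
print(f"{agi_population:,.0f}")   # 50,000,000
```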
a very simple botec: the plot shows 100m h100s by eoy 2027. how many humans will be smarter than the smartest model that fits in a h100 at that time? 10m, if that? so then that's at least a 10x multiple on the knowledge work!
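The botec above, written out; both inputs are the reply's own guesses (the projected fleet from the plot, and a rough count of humans above the model's level):

```python
h100s_eoy_2027 = 100e6    # projected fleet by end of 2027 (from the plot)
smarter_humans = 10e6     # guess: humans smarter than the best 1-H100 model

multiple = h100s_eoy_2027 / smarter_humans
print(multiple)   # 10.0, i.e. "at least a 10x multiple" on that tier of work
```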
100%. Total brainpower is still ~1000x ahead, though: 10B humans at 1 PFLOP each vs. 10M H100s at 1 PFLOP each. By ~2050, maybe 100M z100s at 1 EFLOP, still running at ~1kW each. *Then* total brainpower will be similar. But still mostly orthogonal, i.e. Moravecian.
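The 1000x figure follows directly from the stated per-unit estimates (1 PFLOP per brain and per H100 is this reply's assumption; "z100" is a hypothetical future chip):

```python
PFLOP, EFLOP = 1e15, 1e18

humans_total = 10e9 * PFLOP    # 10B brains -> 1e25 FLOP/s
h100s_total = 10e6 * PFLOP     # 10M H100s  -> 1e22 FLOP/s
z100s_total = 100e6 * EFLOP    # 100M hypothetical z100s -> 1e26 FLOP/s (~2050)

print(humans_total / h100s_total)   # ~1000x: humanity still ahead today
print(z100s_total / humans_total)   # ~10x: comparable within an OOM by then
```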
I’m not entirely sure how true this is in the near term. The total global daily consumption of DeepSeek V3 on OpenRouter can be satisfied with $3M worth of compute. Maybe this changes drastically, but even a two-order-of-magnitude change wouldn’t be that painful.
Measuring FLOPs is not the right metric, as it ignores the efficiency/utility of the AI that runs on those FLOPs. By NVIDIA's own numbers, the real rate of token output has grown much, much more than FLOPs. Utility needs to be measured at the end use case, as it is for humans.
I'm genuinely surprised that or hasn't had a major acquisition offer yet: Microsoft, Google, Amazon, or xAI just taking it off the board. Their progress is pretty remarkable, and the sheer delta in performance is mind-boggling.
Currently compute is the main bottleneck for AI services, but eventually the bottleneck will be electricity (the amount, price, and reliability of the infrastructure under extremely heavy load). China is constrained in its access to AI chips due to US sanctions. It also cannot purchase the
“compute will be 95% inference 5% training soon enough” or something along those lines, from Jim Keller
I’m not sure we’d want 10 billion AGIs. Probably even a few will do, and then they merge into ASI anyway. General problem - for now - is we’re brute forcing the progress somewhat. But this will change too.
I think a reasonable take is that “improving AI” is substantially also “improving the real FLOPS efficiency of AI”; as such I’m not sure how literally to take this type of analysis.
I don't get it: 10M today at 2.25x/yr gets you to ~100M in 3 years, which is very close to the ~500M knowledge workers globally, considering H100s don't sleep or take breaks (easily a 5x factor).
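Checking this reply's compounding (2.25x/yr is the growth rate from the original post; the 500M knowledge-worker count and 5x duty-cycle factor are the reply's own rough figures):

```python
fleet_2028 = 10e6 * 2.25 ** 3      # 3 years of 2.25x growth from ~10M today
duty_cycle_factor = 5              # H100s don't sleep or take breaks
knowledge_workers = 500e6          # rough global count from the reply

print(fleet_2028 / 1e6)            # ~114M H100-equivalents
print(fleet_2028 * duty_cycle_factor / knowledge_workers)   # ~1.1x coverage
```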
Growth is energy-constrained. If ASI were here today, it would focus all its efforts on expanding primary energy and electricity generation.
This is part of why I've been researching completely different paradigms. I think that analog reservoirs combined with igpus, tpus, or similar devices could be a good alternative, especially as diffusion-like text generation becomes more reliable
-> "By some estimates, human brain has the same FLOPS as an H100": OK, not an expert here, but this looks very sus to me. Also consider that Huawei chip production is ramping up in spite of US trade controls, and that AI companies may reprovision training compute for inference.
Except for the wildly incorrect comparison of the H100 to the human brain in terms of FLOPs, this post is very accurate. Also, there is some investment advice hidden in this post.
Thanks for sharing. A couple of questions: 1. Do you still think we'll need all that compute if models stop scaling with more compute? 2. Will all this compute be fully utilized in the next few years, and in what applications?
You don't need a very high AI population (measured in roughly-human-equivalent units of compute?) to trigger an intelligence explosion. AI labs seem to be doing fine pushing things forward despite the number of AI researchers on Earth being under 100,000, and the distribution
yeah, in the short run the supply of smart humans (say, ~100M top-99th-percentile thinkers) absolutely dwarfs AIs (~millions of H100 equivalents). even in the extremely optimistic AI 2027 model, the top lab only has 200k brains doing AI research. scaling up will take a lot of time.
Question: how much low-hanging fruit do you think exists in making inference substantially more computationally efficient? In compute-intensive efforts I have been involved in, the first-phase focus is on capability, not efficiency.
Counterpoint: 10 AIs with 10x normal human intelligence might be able to completely change the world.
A note on scale: a human kid has ~1e15 synapses; GPT-4 has ~1e12 floats, 1000x fewer. If one believes a synapse can compute more information than a float, there's 1000x of scale left. Increasing compute 1000x, from 10M to 10B H100s, would cost $30,000 × 10B = $300T.
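This reply's scale and cost numbers check out under its own assumptions (the synapse count, the GPT-4 parameter count, and the ~$30k H100 price are all the reply's estimates):

```python
synapses_child = 1e15     # ~1e15 synapses in a human child's brain
gpt4_floats = 1e12        # ~1e12 parameters in GPT-4, per the reply
scale_left = synapses_child / gpt4_floats   # 1000x headroom

h100_price = 30_000       # rough per-unit price in dollars
h100s_needed = 10e9       # 1000x the current ~10M fleet
total_cost = h100_price * h100s_needed      # 3e14 dollars

print(scale_left)                  # 1000.0
print(f"${total_cost / 1e12:.0f}T")   # $300T
```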
If you put it like this, you also have to account for knowledge spread. In society all the knowledge is spread out, but currently we are trying to condense it into one NN. Also, the H100 is roughly 10^7 times less efficient than the Landauer limit. If you want more human-like behav..
Nonsense. Available compute will rise 100x in the next six or seven years (the historical rate for decades now), which is faster than most businesses will be able to integrate LLMs.
🧠 The brain may process much more, since the computing unit is the ion channel. On that basis, current supercomputer clusters approach the processing power of a single "frozen" brain (no plasticity). o3 verification below.
[image: o3 verification]
Totally agree—compute bottlenecks are becoming the real story. Curious to see which markets move quickest on fabrication and infrastructure. Who’s best positioned to scale? 🚀
why do short-timeline ppl need a large population of AIs, and why do you think humans are near the compute-efficiency limit?
How is inference compute different from other compute? Physically, is inference compute made of anything exotic?
One of the most prescient things from sci-fi is the increasingly insatiable demand for computronium and energy
ASICs and eventually thermodynamic compute. The inference problem is going to get reduced by at least 2 OOMs.
Einstein doesn't need more inference than the average human. So while 10 million humans can't do much, 10 million Einsteins would be world-changing. Better yet, why have 10 million and not just one big superhuman who can make breakthroughs every few minutes?
Human brains are efficient; human beings are not. Current LLMs are already several OOMs more efficient than humans at inference.
aye
Quote
Dishwasher
@DishwasherTag
Replying to @firstadopter
There is going to be an AI in every washing machine, every car radio, every dishwasher. The GPU demand over the next decade is like Mount Everest, and right now, we’ve just stepped out of the tent at base camp. We’ve got a long climb ahead.
Great points on the inference compute bottleneck. It's definitely a huge factor for scaling. We're thinking about this at jenova ai too: for instance, our intelligent model router aims to optimize which model handles a query, and users can build Custom AI Agents choosing models that
AGI isn’t sci-fi anymore. It’s the inflection point where AI stops needing us to write code, breaks labor markets, and rewires intelligence itself. Compute limits matter—but AGI’s self-improving loop will force new bottlenecks we can’t yet predict. What bottlenecks scare you