People underrate how big a bottleneck inference compute will be, especially if you have short timelines. There are currently about 10 million H100-equivalents in the world, and by some estimates the human brain has the same FLOPS as an H100. So even if we could train an AGI as inference-efficient as a human, we couldn't sustain a very large population of AIs. Not to mention that a large fraction of AI compute will continue to be used for training, not inference. And while AI compute has been growing 2.25x per year so far, by 2028 you'd be pushing against TSMC's overall wafer production limits, which grow only 1.25x per year according to the AI 2027 Compute Forecast. ht: "Can AI Scaling Continue Through 2030?", AI 2027 Compute Forecast
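The arithmetic behind the post can be sanity-checked with a short script. All inputs are the post's own estimates (fleet size, growth rates), not measurements:

```python
# Back-of-envelope: demand for AI compute has grown ~2.25x/yr, but wafer
# supply (the hard ceiling) grows only ~1.25x/yr per the AI 2027 forecast.
BASE_FLEET = 10e6        # ~10M H100-equivalents today (post's estimate)
DEMAND_GROWTH = 2.25     # historical AI compute growth per year
SUPPLY_GROWTH = 1.25     # TSMC wafer capacity growth per year

def fleet_after(years: int, growth: float, base: float = BASE_FLEET) -> float:
    """H100-equivalents after `years` of compounding at `growth` per year."""
    return base * growth ** years

# Three years out (~2028): demand-side trend vs. supply-side ceiling.
print(f"trend:  {fleet_after(3, DEMAND_GROWTH) / 1e6:.0f}M H100-equivalents")
print(f"wafers: {fleet_after(3, SUPPLY_GROWTH) / 1e6:.0f}M H100-equivalents")
```

The ~6x gap between the two trajectories after just three years is the whole argument: once fabs are the binding constraint, fleet growth drops to the wafer rate.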
Quote
Dean W. Ball
@deanwball
cue the @ohlennart laser eyes meme

If you think in those terms, it seems the corresponding prediction is that AI starts to have a real impact only after passing the 98th percentile of human intelligence, rather than average human intelligence.
I wouldn't put it mainly in terms of intelligence. I would put it in terms of the economic value of their work. Long term coherence, efficient+online learning, advanced multimodality seem like much bigger bottlenecks to the value of these models than their intelligence.
I think there's room to be more inference-efficient than humans in some cases. For example, NVIDIA claims as much as 30k tok/s for R1 on 8xB200, i.e. ~2k tok/s per H100-equivalent. Probably only a fraction of that is achievable in practice, but 500 tok/s per H100 is well
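Assuming, as the "~2k tok/s per H100 equiv" figure implies, that one B200 counts as roughly two H100-equivalents (a rough conversion, not an official one), the arithmetic works out like this:

```python
# NVIDIA's claimed aggregate DeepSeek-R1 throughput on one 8xB200 node.
node_tok_per_s = 30_000
b200_per_node = 8
h100_equiv_per_b200 = 2   # rough assumption: 1 B200 ≈ 2 H100s

tok_per_s_per_h100 = node_tok_per_s / (b200_per_node * h100_equiv_per_b200)
print(tok_per_s_per_h100)   # 1875.0, i.e. "~2k tok/s per H100-equivalent"
```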
Humans can generate a few tokens per second on 20W of brain. H100 can generate >1000 Llama-70B tokens per second on 700W of electricity. AI as inference-efficient as humans is already here. And you can sustain a lot of 20W AGIs on gigawatts of datacenter power!
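The power math in this reply, spelled out (20 W is the commonly cited brain-power figure; 1 GW is an illustrative datacenter scale, not a specific facility):

```python
BRAIN_WATTS = 20          # rough power draw of a human brain
DATACENTER_WATTS = 1e9    # a 1 GW datacenter, for illustration

# How many 20 W "human-efficiency AGIs" a 1 GW datacenter could power,
# ignoring cooling, networking, and storage overhead:
agi_population = DATACENTER_WATTS / BRAIN_WATTS
print(f"{agi_population:,.0f}")   # 50,000,000
```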
a very simple botec: the plot shows 100m h100s by eoy 2027. how many humans will be smarter than the smartest model that fits in a h100 at that time? 10m, if that? so then that's at least a 10x multiple on the knowledge work!
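The botec above, written out; both inputs are the reply's own guesses (the projected fleet from the plot, and a rough count of humans above the model's level):

```python
h100s_eoy_2027 = 100e6    # projected fleet by end of 2027 (from the plot)
smarter_humans = 10e6     # guess: humans smarter than the best 1-H100 model

multiple = h100s_eoy_2027 / smarter_humans
print(multiple)   # 10.0, i.e. "at least a 10x multiple" on that tier of work
```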
100%. Total brainpower is still ~1000x ahead, though: 10B humans at 1 PFLOP each vs. 10M H100s at 1 PFLOP each. By ~2050, maybe 100M z100s at 1 EFLOP, still running at ~1kW each. *Then* total brainpower will be similar. But still mostly orthogonal, i.e. Moravecian.
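The 1000x figure follows directly from the stated per-unit estimates (1 PFLOP per brain and per H100 is this reply's assumption; "z100" is a hypothetical future chip):

```python
PFLOP, EFLOP = 1e15, 1e18

humans_total = 10e9 * PFLOP    # 10B brains -> 1e25 FLOP/s
h100s_total = 10e6 * PFLOP     # 10M H100s  -> 1e22 FLOP/s
z100s_total = 100e6 * EFLOP    # 100M hypothetical z100s -> 1e26 FLOP/s (~2050)

print(humans_total / h100s_total)   # ~1000x: humanity still ahead today
print(z100s_total / humans_total)   # ~10x: comparable within an OOM by then
```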
I’m not entirely sure how true this is in the near term. The total global daily consumption of DeepSeek V3 on OpenRouter can be satisfied with $3M worth of compute. Maybe this changes drastically, but even a two-order-of-magnitude change wouldn’t be that painful.
Measuring FLOPs is not the right metric, as it ignores the efficiency/utility of the AI that runs on those FLOPs. By NVIDIA's own numbers, the real rate of token output has grown much, much more than FLOPs. Utility needs to be measured at the end use case, as it is for humans.
I'm genuinely surprised that or hasn't had a major acquisition offer yet: Microsoft, Google, Amazon, or xAI just taking it off the board. Their progress is pretty remarkable, and the sheer delta in performance is mind-boggling.
Currently compute is the main bottleneck for AI services, but eventually the bottleneck will be electricity (the amount, price, and reliability of the infrastructure under extremely heavy load). China is constrained in its access to AI chips due to US sanctions. It also cannot purchase the
“compute will be 95% inference 5% training soon enough” or something along those lines, from Jim Keller
I’m not sure we’d want 10 billion AGIs. Probably even a few will do, and then they merge into ASI anyway. General problem - for now - is we’re brute forcing the progress somewhat. But this will change too.
I think a reasonable take is that “improving AI” is substantially also “improving the real FLOPS efficiency of AI”; as such I’m not sure how literally to take this type of analysis.
I don't get it: 10M today at 2.25x/yr gets you to ~100M in 3 years, which is very close to the ~500M knowledge workers globally, considering H100s don't sleep or take breaks (easily a 5x factor).
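Checking this reply's compounding (2.25x/yr is the growth rate from the original post; the 500M knowledge-worker count and 5x duty-cycle factor are the reply's own rough figures):

```python
fleet_2028 = 10e6 * 2.25 ** 3      # 3 years of 2.25x growth from ~10M today
duty_cycle_factor = 5              # H100s don't sleep or take breaks
knowledge_workers = 500e6          # rough global count from the reply

print(fleet_2028 / 1e6)            # ~114M H100-equivalents
print(fleet_2028 * duty_cycle_factor / knowledge_workers)   # ~1.1x coverage
```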
Growth is energy-constrained. If ASI were here today, it would focus all its efforts on expanding primary energy and electricity generation.
This is part of why I've been researching completely different paradigms. I think that analog reservoirs combined with igpus, tpus, or similar devices could be a good alternative, especially as diffusion-like text generation becomes more reliable
-> "By some estimates, human brain has the same FLOPS as an H100": OK, not an expert here, but this looks very sus to me. Also consider that Huawei chip production is ramping up in spite of US trade controls, and that AI companies may reprovision training compute for inference.
Except for the wildly incorrect comparison of the H100 to the human brain in terms of FLOPs, this post is very accurate. Also, there is some investment advice hidden in this post.
Thanks for sharing. A couple of questions: 1. Do you still think we'll need all that compute if models stop scaling with more compute? 2. Will all this compute be fully utilized in the next few years, and in what applications?
You don't need a very high AI population (measured in roughly-human-equivalent units of compute?) to trigger an intelligence explosion. AI labs seem to be doing fine pushing things forward despite the number of AI researchers on Earth being under 100,000, and the distribution
yeah, in the short run the supply of smart humans (say, ~100M top-99th-percentile thinkers) absolutely dwarfs AIs (~millions of H100 equivalents). even in the extremely optimistic AI 2027 model, the top lab only has 200k brains doing AI research. scaling up will take a lot of time.
Question: how much low-hanging fruit do you think exists in making inference substantially more computationally efficient? In compute-intensive efforts I have been involved in, the first-phase focus is on capability, not efficiency.
Counterpoint: 10 AIs with 10x normal human intelligence might be able to completely change the world.
A note on scale: a human kid has ~1e15 synapses; GPT-4 has ~1e12 floats, 1000x fewer. If one believes a synapse can compute more information than a float, there's 1000x of scale left. Increasing compute 1000x, from 10M to 10B H100s, would cost $30,000 × 10B = $300T.
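This reply's scale and cost numbers check out under its own assumptions (the synapse count, the GPT-4 parameter count, and the ~$30k H100 price are all the reply's estimates):

```python
synapses_child = 1e15     # ~1e15 synapses in a human child's brain
gpt4_floats = 1e12        # ~1e12 parameters in GPT-4, per the reply
scale_left = synapses_child / gpt4_floats   # 1000x headroom

h100_price = 30_000       # rough per-unit price in dollars
h100s_needed = 10e9       # 1000x the current ~10M fleet
total_cost = h100_price * h100s_needed      # 3e14 dollars

print(scale_left)                  # 1000.0
print(f"${total_cost / 1e12:.0f}T")   # $300T
```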
If you put it like this, you also have to account for knowledge spread. In society all the knowledge is spread out, but currently we are trying to condense it into one NN. Also, the H100 is roughly 10^7 times less efficient than the Landauer limit. If you want more human-like behav..
Nonsense. Available compute will rise 100x in the next six or seven years (the historical rate for decades now), which is faster than most businesses will be able to integrate LLMs.
🧠 The brain may process much more, since the computing unit is the ion channel. On that basis, current supercomputer clusters approach the processing power of a single "frozen" brain (no plasticity). o3 verification below.
[image: o3 verification]
Totally agree—compute bottlenecks are becoming the real story. Curious to see which markets move quickest on fabrication and infrastructure. Who’s best positioned to scale? 🚀
why do short-timeline ppl need a large population of AIs, and why do you think humans are near the compute-efficiency limit?
How is inference compute different from other compute? Physically, is inference compute made of anything exotic?
One of the most prescient things from sci-fi is the increasingly insatiable demand for computronium and energy
ASICs and eventually thermodynamic compute. The inference problem is going to get reduced by at least 2 OOMs.
Einstein doesn't need more inference than the average human. So while 10 million humans can't do much, 10 million Einsteins would be world-changing. Better yet, why have 10 million and not just one big superhuman who can make breakthroughs every few minutes?
Human brains are efficient; human beings are not. Current LLMs are already several OOMs more efficient than humans at inference.
aye
Quote
Dishwasher
@DishwasherTag
Replying to @firstadopter
There is going to be an AI in every washing machine, every car radio, every dishwasher. The GPU demand over the next decade is like Mount Everest, and right now, we’ve just stepped out of the tent at base camp. We’ve got a long climb ahead.
Great points on the inference compute bottleneck. It's definitely a huge factor for scaling. We're thinking about this at jenova ai too: for instance, our intelligent model router aims to optimize which model handles a query, and users can build Custom AI Agents choosing models that
AGI isn’t sci-fi anymore. It’s the inflection point where AI stops needing us to write code, breaks labor markets, and rewires intelligence itself. Compute limits matter—but AGI’s self-improving loop will force new bottlenecks we can’t yet predict. What bottlenecks scare you