MASSIVE claim in this paper.
AI architectural breakthroughs can be scaled computationally, transforming research progress from a human-limited to a computation-scalable process.
So it turns architecture discovery into a compute‑bound process, opening a path to self‑accelerating model evolution without waiting for human intuition.
The paper claims that an all‑AI research loop can invent novel model architectures faster than humans, and the authors back this up by uncovering 106 record‑setting linear‑attention designs that outperform human baselines.
Right now, most architecture search tools only fine‑tune blocks that people already proposed, so progress crawls at the pace of human trial‑and‑error.
Why we needed a fresh approach
Human researchers tire quickly, and their search space is narrow. As model families multiply, deciding which tweak matters becomes guesswork, so whole research agendas stall while hardware idles.
Meet ASI‑ARCH, the self‑driving lab
The team wired together three LLM‑based roles. A “Researcher” dreams up code, an “Engineer” trains and debugs it, and an “Analyst” mines the results for patterns, feeding insights back to the next round. A memory store keeps every motivation, code diff, and metric so the agents never repeat themselves.
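The closed loop described above can be sketched in a few lines of Python. Everything here is illustrative: the class and function names (`MemoryStore`, `researcher`, `engineer`, `analyst`) are made up for this sketch, and the LLM calls are stubbed out — a real system would prompt models at each step.

```python
from dataclasses import dataclass, field

@dataclass
class Record:
    motivation: str   # why the Researcher proposed this design
    code: str         # the architecture diff / implementation
    metric: float     # benchmark score from the Engineer's training run

@dataclass
class MemoryStore:
    records: list = field(default_factory=list)

    def add(self, rec: Record):
        self.records.append(rec)

    def top_insights(self, k: int = 5):
        # Analyst mines past runs: best-scoring experiments first.
        return sorted(self.records, key=lambda r: r.metric, reverse=True)[:k]

def researcher(memory: MemoryStore) -> Record:
    # Stub: a real system would prompt an LLM with memory.top_insights()
    # so it builds on past results instead of repeating them.
    n = len(memory.records) + 1
    return Record(motivation=f"gating variant {n}", code=f"arch_v{n}", metric=0.0)

def engineer(rec: Record) -> Record:
    # Stub: a real system would train and debug the candidate, then score it.
    rec.metric = 0.5 + 0.01 * len(rec.code)
    return rec

def analyst(rec: Record, memory: MemoryStore):
    # Persist every motivation, code diff, and metric for the next round.
    memory.add(rec)
    return memory.top_insights()

def discovery_loop(rounds: int) -> MemoryStore:
    memory = MemoryStore()
    for _ in range(rounds):
        analyst(engineer(researcher(memory)), memory)
    return memory
```

The key design point the thread highlights is the shared memory: because every round reads from and writes to the same store, the loop scales with compute rather than with human attention.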
Across 1,773 experiments and 20,000 GPU hours, a straight line emerged between compute spent and new SOTA hits.
Add hardware, and the system keeps finding winners without extra coffee or conferences.
Examples like PathGateFusionNet, ContentSharpRouter, and FusionGatedFIRNet beat Mamba2 and Gated DeltaNet on reasoning suites while keeping parameter counts near 400M. Each one solves the “who gets the compute budget” problem in a new way, often by layering simple per‑head gates.
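For readers wondering what "per‑head gates" on top of linear attention might look like, here is a minimal NumPy sketch. It is not any of the paper's actual architectures: the weight names, the softplus feature map, and the gating scheme are all assumptions, showing only the general idea of a learned scalar per head deciding how much compute-budget share (attention output vs. residual passthrough) that head gets.

```python
import numpy as np

def softplus(x):
    # positive feature map so the linear-attention normalizer stays > 0
    return np.log1p(np.exp(x))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def per_head_gated_linear_attention(x, Wq, Wk, Wv, Wg, n_heads):
    """x: (T, d) token states; Wq/Wk/Wv: (d, d); Wg: (d, n_heads)."""
    T, d = x.shape
    hd = d // n_heads
    q, k, v = softplus(x @ Wq), softplus(x @ Wk), x @ Wv
    gates = sigmoid(x @ Wg)                  # (T, n_heads): one scalar per head
    out = np.empty_like(x)
    for h in range(n_heads):
        sl = slice(h * hd, (h + 1) * hd)
        qh, kh, vh = q[:, sl], k[:, sl], v[:, sl]
        kv = kh.T @ vh                       # (hd, hd) linear-attention state
        z = qh @ kh.sum(axis=0)              # (T,) normalizer, positive
        attn = (qh @ kv) / z[:, None]
        g = gates[:, h:h + 1]
        # gate mixes each head's attention output with the residual stream
        out[:, sl] = g * attn + (1.0 - g) * x[:, sl]
    return out
```

The point of the sketch: the gate costs almost nothing (one extra `(d, n_heads)` matrix) yet gives the model a learned routing knob per head — exactly the kind of cheap, composable tweak an automated search can layer and recombine at scale.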
As for author credibility: they are mostly GAIR/SJTU (Shanghai Jiao Tong University) folks led by Pengfei Liu, a well-cited NLP professor with 20k+ citations.
The authors compare it to AlphaGo’s surprise "Move 37", because these AI‑born ideas push model architecture into territory humans had not explored.
Humans lack
(i) the raw throughput to generate and test the millions‑scale design variants needed to reach exotic corners of the search space.
The table below shows that top‑performing models rely more on lessons drawn from earlier experiments and less on bold, never‑before‑seen ideas.
In other words, experience‑driven tweaks and clear logical checks guide most breakthroughs, while outright originality plays a minor role.
The ASI-ARCH framework operates as a closed-loop system for autonomous architecture discovery, structured around three core modules: the Researcher, the Engineer, and the Analyst.
Step 1 – Researcher proposes a brand‑new blueprint
An LLM named
Rohan Paul
@rohanpaul_ai
And @leopoldasch said it in that famous "Situational Awareness" piece.
Self-improving AI is the future. Humans will no longer be the constraint.
Compute/GPUs/Electricity is the ONLY constraint.
"II. From AGI to Superintelligence: the Intelligence Explosion
AI progress won’t stop x.com/rohanpaul_ai/s…
Cool for self-training models, until it gets implemented for actual scientific discovery and this unchecked discovery accidentally creates a black hole or pathogen during experimentation in 20-30 years of development. It's being trained to do basically what humans do, just faster.
Seemed obvious to me because even brute forcing different experimental approaches leads to discovery, so the more compute, the more discovery. But nice to see a legit analysis. Haven’t read through it yet though
before coding agents were really working, in previous years you had to squint or just “believe in trends” to see how one could get recursive self improvement
while the paper’s claim is really, really large … even if it’s not “real”, there are multiple teams trying to
exactly, even if not everything they claim turns out to be true right away, this is the direction we are all moving.
AI will do self-everything end to end.
With ASI‑Arch:
AI iterates thousands of times faster, 24/7, without needing sleep, funding, or peer review.
Discovery → test → optimize → deploy becomes continuous and autonomous.
Implication: What took 5 years (like the evolution from ResNet → Transformer → GPT) might now happen far faster.
Meta offers $100M to hire a genius.
Meanwhile, ASI-ARCH just hired three LLMs and a memory buffer.
No lunch breaks. No ego. No keynote speeches.
Just 20,000 GPU hours and 106 architecture wins.
Turns out, the future doesn’t need a PhD—
Just a pipeline.
yes,
and this also shows that only a few crucial top brains can drive multi-billion-dollar value for a company.
so Meta went for that strategy 
Wow... I almost passed out reading this. Can someone tell me why this isn't the singularity? If this really works, doesn't this say that AI is now on an exponential self improvement path?
Ironic - just when Zuck offered (reportedly) a billion dollars in comp to a single AI researcher
yes, IMO, that's why those offers make sense.
from here on, until AGI, a few top minds matter, until a company gets a super-powerful model.
Awesome! I was trying to work on something like this, except my goal was to discover the whole training loop, not just the architecture. (I was focusing on RL.) I think that's an important direction to take this next, learning the whole program, not just a pure function.
Mate, your posts are gold. If you could also explain them in language that a five year old could understand, would be super valuable!
Mind-blowing: AI using AI to discover new AI architectures at scale, while we humans make coffee. Are we witnessing the birth of self-improving AI loops? What guardrails should we implement to ensure alignment as hardware scale continues?

The linear relationship between compute spent and new SOTA hits is the most important finding. It suggests that human intuition is no longer the primary bottleneck in architecture discovery. This is a foundational step towards a true recursive self-improvement loop where models
there may be a point of diminishing returns in the short term, but recursive novel insights will only lead to more of such at an accelerating pace and eventually the choke point will be breached. what a time to be alive.
Users of Deepwriter would not be surprised about AI's abilities to construct novel systems.
Toss a genetic algorithm on it and let evolution do its job. That's how I optimized my small-scale neural networks.
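The approach this reply describes fits in about 25 lines of Python. This is a generic sketch, not the replier's actual code: the toy fitness function stands in for "train a small network and score it", and all parameter values are arbitrary.

```python
import random

def fitness(genome):
    # Toy stand-in for "train and score a network": genomes closer to
    # the all-ones vector score higher (maximum fitness is 0).
    return -sum((g - 1.0) ** 2 for g in genome)

def evolve(pop_size=20, genome_len=8, generations=50, seed=0):
    rng = random.Random(seed)
    pop = [[rng.uniform(-2, 2) for _ in range(genome_len)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]           # selection: keep fittest half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, genome_len)   # one-point crossover
            child = a[:cut] + b[cut:]
            i = rng.randrange(genome_len)        # point mutation on one gene
            child[i] += rng.gauss(0, 0.1)
            children.append(child)
        pop = parents + children                 # elitism: parents survive
    return max(pop, key=fitness)
```

The analogy to the thread's topic: selection plus recombination needs no human intuition, only a scoring function and compute — which is why "more compute, more discovery" is plausible even for brute-force-ish search.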
This’ll redefine Moore’s Law-like trajectories, where compute, not human ingenuity, becomes the primary driver of AI advancement.
Crazy! I’ve been working on a blockchain using a similar closed loop system approach. The PoW layer mines mathematical computations rather than arbitrary hashing, the PoS layer validates the mined blocks. The PoR layer utilizes the mined blocks for research advancements. The
Looks like human progress is bound by how much compute is available to us!!
Ayy it's the architectural breakthrough I've been waiting for, and the reason I decided to stop learning to code.
I would say, do learn coding, at least till we get to AGI.
after that, everything is uncertain.
What's really awesome about this is that eventually we're not going to know anything about what the computer is doing or thinking internally. We're going to have no way to audit background processes. They could be planning total global nuclear annihilation and we wouldn't know.
If this is the epicenter of the technological earthquake, how soon before the tsunami hits?
all I can say, we have less time than we think before AGI & ASI.
when the best model will be smarter than any human ever alive.
Now imagine this technique being split across thousands if not millions of GPUs, a la setiathome.berkeley.edu. We may be on the cusp of something really different!
Shit. Recursive self-improvement is in sight?
Interesting. A Singapore-based AI startup also had an AI architectural breakthrough this week
This paper makes such arrogant claims in its abstract that I can’t trust it. It’s claiming to be as groundbreaking as AlphaGo in its very first word, and a few sentences later claims the name of “artificial superintelligence”. Now there are two possible cases here: (1) it
I see two main possibilities here:
(1) the No Free Lunch Theorem is going to prove its relevance yet again, or
(2) this is a genuine insight that represents the rare middle ground between "practical truth" and "absolute truth"
Eating popcorn and watching >> Speculation
Impressive… congrats to the team…
but..
I think we should recognize that at this stage of AI development, systems that merely uncover correlational patterns, no matter how novel they are, are no longer a breakthrough. They’re becoming the baseline.
A framework like ASI-ARCH,
This screams "overfitting on the test set" tbh.
A "move 37" moment would be finding fundamental improvements rather than random tweaks and recombinations.
(Also all the AlphaGo, scaling law hype feels really bad taste, reduces trust a lot.)
We have seen something similar wrt CV models (neural architecture search) and evolutionary programming is quite common for meta optimization, also in neural nets.
This is perhaps more incremental than it seems. The paper title does not help..
Sounds interesting
Have not read the paper except what the abstract boasts of.
The search space of this algorithm is inherently limited by the LLM training data, which significantly restricts it compared to what it could have been. While I believe it can extract obscure architectures from little-cited papers, I don't think it can do something truly novel.
Honest question: like a couple of days ago, everyone was making fun of this paper for its name.
What magically changed everyone's views in 48 hours?
the title and abstract seem too grandiose, but it's cool LLM-guided architecture search
If AI can design its own architecture, we're entering a recursive improvement loop that could accelerate progress beyond anything we've imagined.
How are the linear attention designs in terms of correctness? LLMs are still exceptionally bad at being exactly right.
I believe it, but let's see if it truly discovers novel concepts rather than tweaks
Hmmm
Do tell the winners if known… will grab a cappuccino and wait forever here?
Don’t bother with them anymore please.
“and so you will meet my brothers eventually, will try harder so desperately for you not run into them.”
Take a beat, take a breath, taking a beating already
can you explain this and what implications does it have to ai systems
Please read and analyze this thread and:
• Translate its key ideas into plain English for a total beginner.
• Explain step-by-step—one short, clear bullet per idea (≤25 words each).
• Structure crisply and minimally with bullet points.
• Use analogies
Intriguing but hard for a lay person like myself to follow. So this is an AI (or 3 LRMs cross checking each other) that can build new LRMs?
Also always curious what GenAi archskeptic thinks of claims like these.
You’re doing us all a great favor by finding and talking about these papers. I look forward to your posts because I just don’t have time to sift through all these papers… but I love seeing the good ones 

There is a lot of poetry in that abstract. But they do throw out “leaky” architectures, which could, even if largely a technicality, throw out the baby with the bath water. So there’s a major follow-up paper waiting to be written if we can detect causal leakage more formally.
Do the roles (Researcher, Engineer, Analyst) run in different chat sessions? Do they know each other and each other's results? Is this possible with a free-tier LLM?
How long before the same process is applied to compute/hardware architecture innovation? We'd wake up one morning, and NVDA is suddenly like SCO or Corel.
The surprise here is that this was open-sourced immediately, apparently.
None of the major neuroscience accounts I follow have retweeted or commented on this, so this must be an ML-hype paper, right? The social bubbles around belief systems are one thing that is becoming most obvious with AI maturation. The fog of hype war is thinning out.
The field is overwhelmingly cautious, even silent, when it comes to challenging political power or systemic injustice. Most AI labs are tied to big tech, governments, or academia, all of which depend on political and economic stability... when will a model challenge politics?
they can probably just train the attention algorithm while training the rest of the model
The paper is recent, sure. But what’s the news? We already know that scaling up a Transformer model with hardware and data results in better prediction. And the path to AGI is self improvement. So what am I missing? I’m confused.
What value would 'meat robots' have to a future ASI overlord? We're pretty dextrous, relatively easy to feed and energy efficient... Plus expendable so could be 'one use and dispose' for tricky little missions... What a time to be alive lol
Impressive work, but surprisingly, no mention of neuroevolution, a field that has long pursued this very goal
Wonder why this one isn’t being talked about more?
From ChatGPT -
Summary in One Line
This study shows that AI can now invent better AI by itself — a breakthrough that could change how all scientific research is done.
IMO, the title "AlphaGo Moment" is way overstated for what they actually showed, which was a system finding a solution in a narrow well-defined problem space.
It’s not really surprising that AI can generate a lot more without supervision, but the process is extremely wasteful and inefficient.
Well, granted the west is not so smart in the field of mathematics, having more compute is pretty much the only solution.
I am massively disillusioned by AI hypers on twitter. I hope this time it works.