MASSIVE claim in this paper.
AI architectural breakthroughs can be scaled computationally, transforming research progress from a human-limited to a computation-scalable process.
So it turns architecture discovery into a compute‑bound process, opening a path to self‑accelerating model evolution without waiting for human intuition.
The paper claims that an all‑AI research loop can invent novel model architectures faster than humans, and the authors back this up by uncovering 106 record‑setting linear‑attention designs that outperform human baselines.
Right now, most architecture search tools only fine‑tune blocks that people already proposed, so progress crawls at the pace of human trial‑and‑error.
Why we needed a fresh approach
Human researchers tire quickly, and their search space is narrow. As model families multiply, deciding which tweak matters becomes guesswork, so whole research agendas stall while hardware idles.
Meet ASI‑ARCH, the self‑driving lab
The team wired together three LLM‑based roles. A “Researcher” dreams up code, an “Engineer” trains and debugs it, and an “Analyst” mines the results for patterns, feeding insights back to the next round. A memory store keeps every motivation, code diff, and metric so the agents never repeat themselves.
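The closed loop described above can be sketched in a few lines of Python. Everything here is illustrative: the class and function names (`MemoryStore`, `researcher`, `engineer`, `analyst`) are made up for this sketch, and the LLM calls are stubbed out — a real system would prompt models at each step.

```python
from dataclasses import dataclass, field

@dataclass
class Record:
    motivation: str   # why the Researcher proposed this design
    code: str         # the architecture diff / implementation
    metric: float     # benchmark score from the Engineer's training run

@dataclass
class MemoryStore:
    records: list = field(default_factory=list)

    def add(self, rec: Record):
        self.records.append(rec)

    def top_insights(self, k: int = 5):
        # Analyst mines past runs: best-scoring experiments first.
        return sorted(self.records, key=lambda r: r.metric, reverse=True)[:k]

def researcher(memory: MemoryStore) -> Record:
    # Stub: a real system would prompt an LLM with memory.top_insights()
    # so it builds on past results instead of repeating them.
    n = len(memory.records) + 1
    return Record(motivation=f"gating variant {n}", code=f"arch_v{n}", metric=0.0)

def engineer(rec: Record) -> Record:
    # Stub: a real system would train and debug the candidate, then score it.
    rec.metric = 0.5 + 0.01 * len(rec.code)
    return rec

def analyst(rec: Record, memory: MemoryStore):
    # Persist every motivation, code diff, and metric for the next round.
    memory.add(rec)
    return memory.top_insights()

def discovery_loop(rounds: int) -> MemoryStore:
    memory = MemoryStore()
    for _ in range(rounds):
        analyst(engineer(researcher(memory)), memory)
    return memory
```

The key design point the thread highlights is the shared memory: because every round reads from and writes to the same store, the loop scales with compute rather than with human attention.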
Across 1,773 experiments and 20,000 GPU hours, a straight line emerged between compute spent and new SOTA hits.
Add hardware, and the system keeps finding winners without extra coffee or conferences.
Examples like PathGateFusionNet, ContentSharpRouter, and FusionGatedFIRNet beat Mamba2 and Gated DeltaNet on reasoning suites while keeping parameter counts near 400M. Each one solves the “who gets the compute budget” problem in a new way, often by layering simple per‑head gates.
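For readers wondering what "per‑head gates" on top of linear attention might look like, here is a minimal NumPy sketch. It is not any of the paper's actual architectures: the weight names, the softplus feature map, and the gating scheme are all assumptions, showing only the general idea of a learned scalar per head deciding how much compute-budget share (attention output vs. residual passthrough) that head gets.

```python
import numpy as np

def softplus(x):
    # positive feature map so the linear-attention normalizer stays > 0
    return np.log1p(np.exp(x))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def per_head_gated_linear_attention(x, Wq, Wk, Wv, Wg, n_heads):
    """x: (T, d) token states; Wq/Wk/Wv: (d, d); Wg: (d, n_heads)."""
    T, d = x.shape
    hd = d // n_heads
    q, k, v = softplus(x @ Wq), softplus(x @ Wk), x @ Wv
    gates = sigmoid(x @ Wg)                  # (T, n_heads): one scalar per head
    out = np.empty_like(x)
    for h in range(n_heads):
        sl = slice(h * hd, (h + 1) * hd)
        qh, kh, vh = q[:, sl], k[:, sl], v[:, sl]
        kv = kh.T @ vh                       # (hd, hd) linear-attention state
        z = qh @ kh.sum(axis=0)              # (T,) normalizer, positive
        attn = (qh @ kv) / z[:, None]
        g = gates[:, h:h + 1]
        # gate mixes each head's attention output with the residual stream
        out[:, sl] = g * attn + (1.0 - g) * x[:, sl]
    return out
```

The point of the sketch: the gate costs almost nothing (one extra `(d, n_heads)` matrix) yet gives the model a learned routing knob per head — exactly the kind of cheap, composable tweak an automated search can layer and recombine at scale.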
As for author credibility: they are mostly GAIR/SJTU (Shanghai Jiao Tong University) folks led by Pengfei Liu, a well-cited NLP professor with 20k+ citations.
The authors compare it to AlphaGo’s surprise "Move 37", because these AI‑born ideas push model architecture into territory humans had not explored.
Humans lack
(i) the raw throughput to generate and test the millions‑scale design variants needed to reach exotic corners of the search space.
The table below shows that top‑performing models rely more on lessons drawn from earlier experiments and less on bold, never‑before‑seen ideas.
In other words, experience‑driven tweaks and clear logical checks guide most breakthroughs, while outright originality plays a minor role.
The ASI-ARCH framework operates as a closed-loop system for autonomous architecture discovery, structured around three core modules: the Researcher, the Engineer, and the Analyst.
Step 1 – Researcher proposes a brand‑new blueprint
An LLM named
Rohan Paul
@rohanpaul_ai
And @leopoldasch said it in that famous "Situational Awareness" piece.
Self-improving AI is the future. Humans will no longer be the constraint.
Compute/GPUs/Electricity is the ONLY constraint.
"II. From AGI to Superintelligence: the Intelligence Explosion
AI progress won’t stop x.com/rohanpaul_ai/s…
Cool for self-training models, until it gets implemented for actual scientific discovery and this unchecked discovery accidentally creates a black hole or pathogen during experimentation in 20-30 years of development. It's being trained to do basically what humans do, just faster.
Seemed obvious to me because even brute forcing different experimental approaches leads to discovery, so the more compute, the more discovery. But nice to see a legit analysis. Haven’t read through it yet though
before coding agents were really working, in previous years you had to squint or just “believe in trends” to see how one could get recursive self improvement
while the paper’s claim is really, really large … even if it’s not “real”, there are multiple teams trying to
exactly, even if not everything they claim turns out to be true right away, this is the direction we are all moving.
AI will do self-everything end to end.
With ASI‑Arch:
AI iterates thousands of times faster, 24/7, without needing sleep, funding, or peer review.
Discovery → test → optimize → deploy becomes continuous and autonomous.
Implication: What took 5 years (like the evolution from ResNet → Transformer → GPT) might now happen far faster.
Meta offers $100M to hire a genius.
Meanwhile, ASI-ARCH just hired three LLMs and a memory buffer.
No lunch breaks. No ego. No keynote speeches.
Just 20,000 GPU hours and 106 architecture wins.
Turns out, the future doesn’t need a PhD—
Just a pipeline.
yes,
and this also shows that only a few crucial top brains can drive multi-billion-dollar value for a company.
so Meta went for that strategy 
Wow... I almost passed out reading this. Can someone tell me why this isn't the singularity? If this really works, doesn't this say that AI is now on an exponential self improvement path?
Ironic - just when Zuck offered (reportedly) a billion dollars in comp to a single AI researcher
yes, IMO, that's why those offers make sense.
from here on, until AGI, a few top minds matter, until a company gets a super-powerful model.
Awesome! I was trying to work on something like this, except my goal was to discover the whole training loop, not just the architecture. (I was focusing on RL.) I think that's an important direction to take this next, learning the whole program, not just a pure function.
Mate, your posts are gold. If you could also explain them in language that a five year old could understand, would be super valuable!
Mind-blowing: AI using AI to discover new AI architectures at scale, while we humans make coffee. Are we witnessing the birth of self-improving AI loops? What guardrails should we implement to ensure alignment as hardware scale continues?

The linear relationship between compute spent and new SOTA hits is the most important finding. It suggests that human intuition is no longer the primary bottleneck in architecture discovery. This is a foundational step towards a true recursive self-improvement loop where models
there may be a point of diminishing returns in the short term, but recursive novel insights will only lead to more of such at an accelerating pace and eventually the choke point will be breached. what a time to be alive.
Users of Deepwriter would not be surprised about AI's abilities to construct novel systems.
Toss a genetic algorithm on it and let evolution do its job. That's how I optimized my small-scale neural networks.
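The approach this reply describes fits in about 25 lines of Python. This is a generic sketch, not the replier's actual code: the toy fitness function stands in for "train a small network and score it", and all parameter values are arbitrary.

```python
import random

def fitness(genome):
    # Toy stand-in for "train and score a network": genomes closer to
    # the all-ones vector score higher (maximum fitness is 0).
    return -sum((g - 1.0) ** 2 for g in genome)

def evolve(pop_size=20, genome_len=8, generations=50, seed=0):
    rng = random.Random(seed)
    pop = [[rng.uniform(-2, 2) for _ in range(genome_len)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]           # selection: keep fittest half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, genome_len)   # one-point crossover
            child = a[:cut] + b[cut:]
            i = rng.randrange(genome_len)        # point mutation on one gene
            child[i] += rng.gauss(0, 0.1)
            children.append(child)
        pop = parents + children                 # elitism: parents survive
    return max(pop, key=fitness)
```

The analogy to the thread's topic: selection plus recombination needs no human intuition, only a scoring function and compute — which is why "more compute, more discovery" is plausible even for brute-force-ish search.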
This’ll redefine Moore’s Law-like trajectories, where compute, not human ingenuity, becomes the primary driver of AI advancement.
Crazy! I’ve been working on a blockchain using a similar closed loop system approach. The PoW layer mines mathematical computations rather than arbitrary hashing, the PoS layer validates the mined blocks. The PoR layer utilizes the mined blocks for research advancements. The
Looks like human progress is bound by how much compute is available to us!!
Ayy it's the architectural breakthrough I've been waiting for, and the reason I decided to stop learning to code.
I would say, do learn coding, at least till we get to AGI.
after that, everything is uncertain.
What's really awesome about this is that eventually we're not going to know anything about what the computer is doing or thinking internally. We're going to have no way to audit background processes. They could be planning total global nuclear annihilation and we wouldn't know.
If this is the epicenter of the technological earthquake, how soon before the tsunami hits?
all I can say, we have less time than we think before AGI & ASI.
when the best model will be smarter than any human ever alive.
Now imagine this technique being split across thousands if not millions of GPUs, a la setiathome.berkeley.edu. We may be on the cusp of something really different!
Shit. Recursive self-improvement is in sight?
Interesting. A Singapore-based AI startup also had an AI architectural breakthrough this week
This paper makes such arrogant claims in its abstract that I can’t trust it. It’s claiming to be as groundbreaking as AlphaGo in its very first word, and a few sentences later claims the name of “artificial superintelligence”. Now there are two possible cases here: (1) it
I see two main possibilities here:
(1) the No Free Lunch Theorem is going to prove its relevance yet again, or
(2) this is a genuine insight that represents the rare middle ground between "practical truth" and "absolute truth"
Eating popcorn and watching >> Speculation
Impressive… congrats to the team…
but..
I think we should recognize that at this stage of AI development, systems that merely uncover correlational patterns, no matter how novel they are, are no longer a breakthrough. They’re becoming the baseline.
A framework like ASI-ARCH,
This screams "overfitting on the test set" tbh.
A "move 37" moment would be finding fundamental improvements rather than random tweaks and recombinations.
(Also all the AlphaGo, scaling law hype feels really bad taste, reduces trust a lot.)
We have seen something similar wrt CV models (neural architecture search) and evolutionary programming is quite common for meta optimization, also in neural nets.
This is perhaps more incremental than it seems. The paper title does not help..
Sounds interesting
Have not read the paper except what the abstract boasts of.
The search space of this algorithm is inherently limited by the LLM training data, which significantly restricts it compared to what it could have been. While I believe it can extract obscure architectures from little-cited papers, I don't think it can do something truly novel.
Honest question: like a couple of days ago, everyone was making fun of this paper for its name.
What magically changed everyone's views in 48 hours?
the title and abstract seem too grandiose, but it's cool LLM-guided architecture search
If AI can design its own architecture, we're entering a recursive improvement loop that could accelerate progress beyond anything we've imagined.
How are the linear attention designs in terms of correctness? LLMs are still exceptionally bad at being exactly right.
I believe it, but let's see if it truly discovers novel concepts rather than tweaks
Hmmm
Do tell the winners if known… will grab a cappuccino and wait forever here?
Don’t bother with them anymore please.
“and so you will meet my brothers eventually, will try harder so desperately for you not run into them.”
Take a beat, take a breath, taking a beating already
can you explain this and what implications does it have to ai systems
Please read and analyze this thread and:
• Translate its key ideas into plain English for a total beginner.
• Explain step-by-step—one short, clear bullet per idea (≤25 words each).
• Structure crisply and minimally with bullet points.
• Use analogies
Intriguing but hard for a lay person like myself to follow. So this is an AI (or 3 LRMs cross checking each other) that can build new LRMs?
Also always curious what GenAi archskeptic thinks of claims like these.
You’re doing us all a great favor by finding and talking about these papers. I look forward to your posts because I just don’t have time to sift through all these papers… but I love seeing the good ones 

There is a lot of poetry in that abstract. But they do throw out “leaky” architectures, which could, even if largely a technicality, throw out the baby with the bath water. So there’s a major follow-up paper waiting to be written if we can detect causal leakage more formally.
Do the roles (Researcher, Engineer, Analyst) run in different chat sessions? Do they know each other and each other's results? Is this possible with a free-tier LLM?
How long before the same process is applied to compute/hardware architecture innovation? We'd wake up one morning, and NVDA is suddenly like SCO or Corel.
The surprise here is that this was open-sourced immediately, apparently.
None of the major neuroscience accounts I follow have retweeted or commented on this, so this must be an ML-hype paper, right? The social bubbles around belief systems are one thing that is becoming most obvious with AI maturation. The fog of hype war is thinning out.
The field is overwhelmingly cautious, even silent, when it comes to challenging political power or systemic injustice. Most AI labs are tied to big tech, governments, or academia, all of which depend on political and economic stability... when will a model challenge politics?
they can probably just train the attention algorithm while training the rest of the model
The paper is recent, sure. But what’s the news? We already know that scaling up a Transformer model with hardware and data results in better prediction. And the path to AGI is self improvement. So what am I missing? I’m confused.
What value would 'meat robots' have to a future ASI overlord? We're pretty dextrous, relatively easy to feed and energy efficient... Plus expendable so could be 'one use and dispose' for tricky little missions... What a time to be alive lol
Impressive work, but surprisingly, no mention of neuroevolution, a field that has long pursued this very goal
Wonder why this one isn’t being talked about more?
From ChatGPT -
Summary in One Line
This study shows that AI can now invent better AI by itself — a breakthrough that could change how all scientific research is done.
IMO, the title "AlphaGo Moment" is way overstated for what they actually showed, which was a system finding a solution in a narrow well-defined problem space.
It’s not really surprising that AI can generate a lot more without supervision, but the process is extremely wasteful and inefficient.
Well, granted the west is not so smart in the field of mathematics, having more compute is pretty much the only solution.
I am massively disillusioned by AI hypers on twitter. I hope this time it works.