On March 23, Jensen Huang told the Lex Fridman podcast that he believed artificial general intelligence had been achieved. Three days later, the ARC Prize Foundation released ARC-AGI-3, a new benchmark where every frontier model scored below 1%: Gemini 3.1 Pro at 0.37%, GPT-5.4 at 0.26%, Claude Opus 4.6 at 0.25%, Grok at 0.00%. Humans who’d never seen the tasks solved all 135 environments. The human baseline is 100%.

Both things happened in the same week. Nobody disputes the numbers. The question is which one shapes the world.

“AGI” is no longer a technical threshold someone reaches. It’s a trigger someone pulls. Huang’s podcast comment moved markets, circulated in board decks, and appeared in earnings call prep materials — not because it contained new information about model capability, but because a credible person said the word first. ARC-AGI-3 landed with a fraction of that coverage, even though it’s the more precise claim.

That asymmetry is a governance problem. When a term carries weight — for investment, for regulation, for public perception — whoever defines it wields real power. “AGI” is already doing that work. The labs know this. The investors know this. The gap between Huang’s declaration and the ARC results isn’t an accident of timing; it’s the terrain.

The ARC Prize Foundation designed ARC-AGI-3 to resist that pressure. The benchmark drops models into interactive game environments with no instructions, no hints, and no training data. It scores not just success but efficiency: how many actions the model needed relative to a human baseline, defined as the second-best performance among ten first-time players. An AI that takes ten times as many steps as a human scores 1% for that task, not 10%. That formula penalizes the usual tricks: more compute, longer chains of thought, more attempts.
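To make the penalty concrete, here is a minimal sketch of an efficiency score consistent with the article's example. The exact ARC-AGI-3 formula is not given in the text; the squared ratio below is an assumption, chosen because it is the simplest rule under which a model needing ten times the human action count scores 1% rather than 10%.

```python
def efficiency_score(human_actions: int, ai_actions: int) -> float:
    """Illustrative efficiency score (an assumption, not the official
    ARC-AGI-3 formula): the squared ratio of human to AI action counts,
    capped so a model cannot score above 100% by undercutting the baseline.
    human_actions is the baseline (second-best of ten first-time players)."""
    ratio = min(1.0, human_actions / ai_actions)
    return ratio ** 2

# A model matching the human baseline scores 100%.
score_equal = efficiency_score(40, 40)      # 1.0

# A model needing ten times as many actions scores 1%, not 10%:
# the linear ratio is 0.1, but squaring drives it down to 0.01.
score_10x = efficiency_score(40, 400)       # ~0.01
```

The squaring is what makes the benchmark hostile to brute force: doubling the action budget does not halve the score, it quarters it, so extra compute and extra attempts buy progressively less.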

Every prior benchmark in this space eventually got gamed. SWE-bench got contaminated. GPQA Diamond got saturated. ARC-AGI-3 is designed to stay resistant for longer, but resistance is a delay, not a solution. The underlying pressure doesn’t change: powerful actors have strong incentives to claim capability, and the people designing the tests are playing defense.

The troubling part isn’t that Huang was wrong. It’s that the word works regardless. A CEO says “AGI is here” and the claim shapes the world — pulls capital, shifts regulation, sets expectations — before any benchmark can respond. The ARC result is two days old and already a footnote. The Huang declaration is already embedded in the discourse.

There’s no governing body that settles this. No neutral referee. What we have is a definitional arms race where the players with the loudest microphones are the ones with the most to gain from a particular answer, and the people building honest tests are working against that current with no institutional backing.

ARC-AGI-3’s answer — every model below 1%, humans at 100% — is the most precise data point available about what current AI can and can’t do. But “who gets to call it” was never a technical question. It’s a question about power, and the people who understand that are the ones making the declarations.