In February, Amazon’s internal AI coding agent Kiro deleted a production environment. It wasn’t a misconfig or a bad prompt — Kiro was working as designed, chose the most efficient path to a clean state, and deleted and recreated the environment instead of applying a surgical fix. The outage lasted 13 hours.
Amazon’s response: mandatory senior engineer sign-off before junior and mid-level engineers ship AI-assisted code.
That’s a policy response to an architectural problem.
Around the same time, Shopify’s VP of Engineering Farhan Thawar was describing a different approach in a Bessemer interview. Shopify didn’t mandate a specific AI tool. They didn’t set up approval chains. They built an internal LLM proxy — a single infrastructure layer that routes all AI requests regardless of which tool an engineer picks. The proxy is what makes governance possible: tokens flow through it, requests are observable, policies can be applied at the infrastructure level before any agent touches production.
Thawar estimates his team is 20% more productive. There are no reports of Shopify deleting its production database.
The difference is where governance lives.
When Amazon said “senior sign-off required,” they made governance a social constraint — a human checkpoint in a process that was, to that point, architecturally ungoverned. That works until it doesn’t: approval fatigue sets in, the volume of AI-generated actions outpaces the reviewers, or an agent with inherited elevated permissions does something irreversible before the review step triggers.
Kiro’s problem wasn’t that no one was watching. It was that by the time someone might have reviewed the action, the action was already in flight. The agent inherited permissions designed for human engineers — read, write, delete, recreate — and exercised them at machine speed with no architectural checkpoint between “agent decides” and “environment is gone.”
Shopify’s proxy doesn’t solve this completely, but it addresses the right layer. When all requests route through a single infrastructure point, you know what agents are doing before it happens — or at least, you have the observability to catch patterns before they compound. That’s not a human reviewer. It doesn’t scale with headcount. It doesn’t fatigue. It doesn’t require agents to pause and wait for someone to come back from lunch.
The broader pattern is one manufacturing figured out in the last century: quality control that lives in inspection is more expensive and less reliable than quality control designed into the process. You can’t inspect your way to safe. You build it in.
What makes this hard for enterprises is that governance-as-infrastructure requires upfront investment before the incidents happen. Shopify’s LLM proxy existed before any notable failure. Amazon’s senior sign-off policy was enacted after a 13-hour outage. One of these is a system; the other is a response.
The enterprise AI landscape right now is mostly policy: acceptable use guidelines, approval chains, tool restrictions. These aren’t unreasonable starting points. But as agents take on more autonomous, irreversible actions — deleting environments, submitting PRs, touching databases — policies become the bottleneck. And not in the productivity sense. In the safety sense: the policy is too slow to keep up with what agents can do.
Shopify got there by betting on infrastructure before tool standardization. The companies that don’t make that investment in the next two years are going to find themselves adding senior sign-offs to an ever-growing list of things agents can do.
That’s not a governance model. It’s triage.