Blog
Research and experiments on AI code review
April 17, 2026 · Case Study
One capital letter bypassed TLS in Bun. Ensemble caught it.
serverName vs servername -- a casing mismatch in two files let attackers MITM every Postgres, Redis, and MySQL connection. 4 reviewers, 17 issues, and problems found in the fix itself.
Read →
April 17, 2026 · Case Study
We missed a critical Deno bug. Then we fixed our pipeline and caught it.
From 0 issues to 18 -- how upgrading 6 review prompts turned a complete miss into a catch. An honest post-mortem on the Deno ResourceId-to-FD type confusion bug.
Read →
April 15, 2026 · Case Study
We flagged the Next.js OOM bug before it shipped. It shipped anyway.
Ensemble's 3 AI reviewers independently flagged resource exhaustion and race conditions in Next.js PR #91729. Without a merge gate, it shipped -- and crashed production 3 days later.
Read →
April 13, 2026
I marketed multi-model AI review. It turned out my own product didn't support it.
3 models, 35 runs, $0.78. Here's what we learned -- including why cheap models fail catastrophically.
Read →
April 12, 2026
One AI caught a use-after-free in Bun. Another said the code was clean.
We ran Claude Sonnet 15 times on the same PR. 40% of runs caught the real bug. 20% saw nothing wrong.
Read →
April 12, 2026
I ran the same AI code review 3 times. It found different bugs each time.
40% of PRs showed different findings across identical runs. Real data from 5 open-source repos.
Read →