agentic-coding

Should tests in a codebase be tripwires?

Zohaib Rauf · Apr 18, 2026 · 1 min read

I've been thinking about AI-generated tests and the friction around them. AI can generate a lot of tests, but figuring out which ones are "good" requires human review that doesn't scale. The reframe I keep coming back to: stop judging them on quality and start treating them as tripwires. Their job isn't to validate correctness; it's to alert reviewers when something significant changed. If a PR trips them, look closely. If nothing trips, the change likely didn't touch core logic.
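To make the framing concrete, here's a minimal sketch of what a tripwire-style test looks like (the `apply_discount` function and its test values are hypothetical). The assertions pin the function's *current observed* output rather than a spec; a failure doesn't mean the code is wrong, it means behavior changed and the PR deserves a closer look.

```python
def apply_discount(price: float, tier: str) -> float:
    """Hypothetical core-logic function a tripwire test would pin down."""
    rates = {"gold": 0.20, "silver": 0.10}
    return round(price * (1 - rates.get(tier, 0.0)), 2)


def test_apply_discount_tripwire():
    # Pinned outputs captured from current behavior, not derived from a spec.
    # Any change to the rates table or the rounding trips this test.
    assert apply_discount(100.0, "gold") == 80.0
    assert apply_discount(100.0, "silver") == 90.0
    assert apply_discount(100.0, "bronze") == 100.0  # unknown tier: no discount


test_apply_discount_tripwire()
```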

If that framing holds, you could use techniques like mutation testing to find gaps where code changes would go unnoticed, then generate tests targeting those gaps at scale. The bar shifts from "is this a good test?" to "would this catch a change?", which is much easier to clear with far less review overhead. It also directly accelerates code review by telling reviewers where to focus. Still thinking through this, but it feels like an interesting direction to explore.
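The mutation-testing loop can be sketched in a few lines (real tools like mutmut or cosmic-ray do this properly; the `bulk_price` function and its deliberately weak test suite here are made up). We mutate the source's comparison operator, rerun the tests, and check whether the mutant "survives" — a surviving mutant marks exactly the gap where a generated tripwire test should go.

```python
import ast

SOURCE = """
def bulk_price(price, qty):
    if qty >= 10:          # bulk discount threshold
        return price * qty - 5
    return price * qty
"""


class SwapGtE(ast.NodeTransformer):
    """One classic mutation family: relax >= to > in comparisons."""

    def visit_Compare(self, node):
        self.generic_visit(node)
        swapped = {ast.GtE: ast.Gt}
        node.ops = [swapped.get(type(op), type(op))() for op in node.ops]
        return node


def run_tests(bulk_price):
    # A deliberately weak suite: it never probes the qty == 10 boundary.
    return bulk_price(2, 20) == 35 and bulk_price(2, 1) == 2


# Baseline: the original code passes the suite.
original = {}
exec(compile(ast.parse(SOURCE), "<orig>", "exec"), original)
assert run_tests(original["bulk_price"])

# Mutate, recompile, rerun. If the suite still passes, the mutant survived:
# a code change slipped through untripped, so generate a test pinning the
# boundary (e.g. bulk_price(2, 10)) to close the gap.
tree = SwapGtE().visit(ast.parse(SOURCE))
ast.fix_missing_locations(tree)
mutant = {}
exec(compile(tree, "<mutant>", "exec"), mutant)
survived = run_tests(mutant["bulk_price"])
print("mutant survived:", survived)
```

Here the mutant survives because the suite never tests `qty == 10`, where the original returns 15 and the mutant returns 20 — precisely the kind of blind spot you'd then target with a generated test.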