April 6, 2026 · Edition #9

Instructions Are Not Guardrails

This week, Anthropic leaked 512,000 lines of source code because Bun ships source maps by default and nobody caught it. North Korea convinced an npm maintainer to trust an AI-generated persona by writing technically convincing messages for two weeks. And inside the leaked Claude Code, we found KAIROS: an autonomous daemon that runs 24/7 in developer environments, monitoring files and executing tasks without explicit user interaction.

Three different failures. One common root cause: relying on instructions instead of enforcement.

Anthropic's build pipeline had instructions, but no automated check that verified source maps were excluded before publish. The Axios maintainer followed reasonable practices, but npm's security model relies on humans deciding who to trust, and AI just learned to pass that test. KAIROS is governed by configuration files and system prompts, which is another way of saying it's governed by text that it interprets at runtime.

We've been securing AI agents the same way we secure human employees: write a policy, communicate it, and trust that it will be followed. System prompts. Safety guidelines. CLAUDE.md files. "Please don't do this." The entire model assumes compliance. But compliance is not security. Compliance is a hope. Security is a mechanism.

This is exactly what Microsoft's Agent Governance Toolkit addresses, and why its architecture matters more than its brand. Instead of asking agents to follow rules, it intercepts every action before execution with sub-millisecond policy enforcement. Cryptographic identity for agent-to-agent trust. Privilege rings borrowed from operating system kernels. Kill switches for emergency termination. The design philosophy: don't ask, enforce.

The pattern applies beyond AI agents. Every system that relies on "the user/agent/developer will follow the instructions" is one missed configuration line away from this week's headlines. Instructions tell you what should happen. Guardrails ensure it does.
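To make "don't ask, enforce" concrete, here is a minimal sketch of an action gate with privilege rings and a kill switch. It is not the Agent Governance Toolkit's API; every name in it (Ring, Action, Gate, PolicyViolation, the example action names) is hypothetical and chosen only for illustration. The point it tries to show is that the agent never gets to interpret the rule: the check runs in code before the action does.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Callable


class Ring(IntEnum):
    """Hypothetical privilege levels. Lower number means more privilege."""
    KERNEL = 0
    TRUSTED = 1
    SANDBOXED = 2


class PolicyViolation(Exception):
    """Raised when the gate blocks an action before it runs."""


@dataclass(frozen=True)
class Action:
    name: str                # illustrative label, e.g. "publish_package"
    required_ring: Ring      # least-privileged ring allowed to run this action
    run: Callable[[], str]   # the side effect; only the gate ever calls it


class Gate:
    """Every action passes through execute(); nothing is left to the agent."""

    def __init__(self) -> None:
        self.killed = False

    def kill(self) -> None:
        # Emergency stop: once set, every later action is refused.
        self.killed = True

    def execute(self, agent_ring: Ring, action: Action) -> str:
        if self.killed:
            raise PolicyViolation(f"{action.name}: kill switch is engaged")
        if agent_ring > action.required_ring:
            raise PolicyViolation(
                f"{action.name}: needs ring {action.required_ring.name}, "
                f"agent runs in ring {agent_ring.name}"
            )
        return action.run()


if __name__ == "__main__":
    gate = Gate()
    read = Action("read_file", Ring.SANDBOXED, lambda: "contents")
    publish = Action("publish_package", Ring.KERNEL, lambda: "published")

    print(gate.execute(Ring.SANDBOXED, read))
    try:
        gate.execute(Ring.SANDBOXED, publish)
    except PolicyViolation as err:
        print("blocked:", err)
```

A system prompt saying "never publish packages" depends on the agent reading and obeying it. In this sketch the sandboxed agent can ask to publish as often as it likes, and the call still never reaches run().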