The Cyber Leader - Balanced Security

The Cyber Leader - Balanced Security

A Security Guide for Building Agentic AI Applications

Jeffery Moore's avatar
Jeffery Moore
Apr 17, 2026
∙ Paid

I’ve recently been spending time reading about agentic AI security frameworks such as MITRE ATLAS, MAESTRO, and the OWASP Agentic Top 10 to better understand how to build agentic systems more securely.

There are two specific guides that help answer that question more directly. The first is the OWASP Securing Agentic Applications Guide (80 pages, July 2025), an engineering manual from the same team behind the Agentic Top 10. The second is Casaba Security’s Agentic AI Security Guide (v1.2, April 2026), written by a penetration testing firm based on findings from actual engagements.

Between the two, you get both the framework and the field report. Here’s what I think matters, organized around the risks that show up in practice and the architectural decisions that address them.

A useful starting point

Before getting into specifics, one concept from the OWASP guide is worth mentioning first. The guide decomposes “an agent” into six Key Components (KC1 through KC6): the language model (KC1), orchestration and control flow (KC2), reasoning and planning (KC3), memory (KC4), tool integration (KC5), and the operational environment (KC6). Each has its own attack surface, and the risks below target specific components. This matters because you can’t secure a system you haven’t decomposed. I’m betting that teams mapping their agent to these six components will find gaps in KC4 (memory) and KC6 (operational environment), the components that existing threat models don’t cover well.

Untrusted Data Reaching the Control Plane

The risk that underlies almost everything else in agentic security is indirect prompt injection, what the research community calls XPIA. Most people think of prompt injection as a user typing something malicious into a chat box. The indirect version is harder to spot. The injection comes from the data the agent processes, not from the user: documents in RAG indices, tool outputs, emails, web pages, API responses, CRM records. Anywhere the agent reads untrusted data, an attacker can plant instructions.

Casaba breaks XPIA into four attack surfaces. Perception-layer injection hides instructions in content the agent ingests, but humans can’t see (e.g., CSS display: none, HTML comments, aria-label attributes). Research shows these alter agent outputs in 15-29% of tested cases. Instead of injecting explicit commands, the attacker fills the source content with confident, authoritative language that leans in a particular direction. The agent isn’t being told what to say. But when most of what it reads carries the same framing, its synthesis reflects that framing. There’s no payload to detect because the attack is in the aggregate rather than in any single document.

Memory and learning attacks corrupt stored context, so the compromise persists across sessions. Action-layer attacks embed explicit instruction sequences in external resources that, when ingested, override safety alignment.

The architectural response: separate the data plane from the control plane. This is the single most important design decision. The OWASP guide highlights Google’s CaMeL as the cleanest conceptual model. A privileged LLM receives only trusted inputs and generates control flow (which tools to call, in what order). A quarantined LLM processes untrusted data (web content, email bodies, retrieved documents) and has no access to tools. Prompt injection in a retrieved document hits the quarantined LLM, which can’t invoke tools. The injection has nowhere to go. CaMeL also isolates memory: the quarantined LLM’s context doesn’t leak into the privileged LLM’s memory, which prevents poisoned data from influencing future control flow decisions.

CaMeL remains a research architecture. A follow-up paper (May 2025) adds prompt screening, tiered-risk access, and output auditing, but no production deployments have been published. What is shipping in production is the underlying principle: external enforcement layers that sit between the agent and its tools. Check Points, Zenity’s runtime agent monitor, Lasso Security’s MCP Gateway, and Airia’s model-agnostic control plane all enforce the same boundary: untrusted content can’t directly trigger tool invocations. They do it through runtime policy engines and gateways rather than a second LLM, but the design principle is identical.

What to watch for: Any workflow where an agent retrieves or processes content from sources outside your direct control. Email summarizers, web research agents, document analyzers, RAG-based assistants. All are at high risk for XPIA.

User's avatar

Continue reading this post for free, courtesy of Jeffery Moore.

Or purchase a paid subscription.
© 2026 Jeffery Moore · Privacy ∙ Terms ∙ Collection notice
Start your SubstackGet the app
Substack is the home for great culture