Prompt Injection Is an Authority-Promotion Failure, Not Just a Bad Prompt

- Share:





2938 Members
Prompt injection is often described as an input-sanitization or prompt-engineering issue. That framing is incomplete. In modern agentic systems, prompt injection becomes dangerous when untrusted data crosses an authority boundary and gets promoted into something that can change state, access secrets, or execute tools. The failure is not just “the model read malicious text.” The failure is that the runtime treated that text as authorized intent.
This boundary-centered framing echoes the architectural lens in Microsoft’s discussion of hidden system boundaries in AI, where capability depends on where boundaries are drawn and enforced, not just on model behavior in isolation.
If a model can only generate text, prompt injection mostly corrupts text output. If a model can call tools, write files, trigger workflows, or invoke MCP servers, prompt injection authorization becomes a runtime security problem. The model is now part of a control plane, and every “maybe do X” string can become “system did X” unless there is a gate.
A useful analogy is an airport. A stranger can hand you a note saying “open that secure door,” but the note is not a boarding pass, badge, and biometric check. In many AI stacks, we accidentally treat the note as all three. That is the authority-promotion failure: confusing suggestion with permission.
An authority boundary is the point where data stops being descriptive and starts being operative. On one side, strings are content: user text, retrieved chunks, logs, tool docs, memory snippets. On the other side, they are intent that can cause action: tool invocations, policy exceptions, state mutations, external API calls.
The same token sequence can occupy multiple roles along an authority gradient: content, instruction, memory, evidence, policy input, tool intent, or executable action. That is why context governance matters more than clever prompting. You are not securing words; you are securing role transitions across boundaries.
In practical terms, authority promotion happens across surfaces: raw text in prompts, vectors selected by retrieval, conversational memory objects, tool metadata, and finally tool call envelopes. Each hop needs explicit checks, because each hop increases effective power.
Teams often collapse these steps into one fuzzy “the agent reasoned” phase, but security depends on separating them. Retrieval is just search over candidate evidence. Context promotion is a decision to let selected artifacts influence model behavior in a privileged window. Tool execution is a state-changing action request sent to a real system.
Similarity scores only rank semantic closeness; they do not establish trust, provenance, or authorization. Prompt rules are behavior hints to a probabilistic model, not enforceable guards. Tool descriptions improve usability, but they are not tool metadata security controls and cannot substitute for runtime policy decisions.
When these steps are distinct, the architecture can enforce distinct controls: relevance for retrieval, trust and policy for context promotion, and authorization for execution. When they are fused, prompt injection can ride a single path from “found text” to “executed command.”

RAG authorization starts after retrieval, not before it. Retrieved content should not automatically enter privileged context just because it scored high on similarity. A robust promotion gate evaluates provenance, tenant scope, freshness checks, data classification, and policy compatibility before context insertion.
Provenance asks where the chunk came from and whether that source is trusted for this use case. Tenant scope asks whether this actor is allowed to see or act on this tenant’s data. Freshness checks ask whether stale instructions are being treated as current policy. Classification asks whether the data contains regulated or high-risk content. Policy asks whether this material may influence answer generation only, or also action planning.
This is the core reframing for RAG: retrieval finds candidates, but promotion decides authority. A retrieved paragraph can suggest an action; it cannot authorize that action.

Policy enforcement belongs at runtime, outside the model, at the point where proposals become operations. The model can propose; the enforcement layer decides. This is where PEP/PDP architecture matters: a Policy Enforcement Point intercepts context promotions and tool calls, while a Policy Decision Point evaluates identity, resource, action, environment, delegator context, and risk.
This is also where Permit.io fits: not as another prompt rule, but as externalized authorization infrastructure for agent runtimes. The model remains a reasoning engine, while Permit.io-backed runtime policy evaluates whether a proposed promotion or tool call is allowed, denied, or requires step-up approval. That separation keeps control deterministic and auditable even when model outputs are probabilistic.
The governing principle is simple and non-negotiable: untrusted content may suggest an action but cannot authorize that action.
MCP makes tool ecosystems composable, but it also widens authority boundaries. The CSA MCP STDIO RCE research note describes how configuration and transport design choices can become command-execution pathways in agentic infrastructure. Separately, this MCP transport security analysis highlights authentication and session-binding gaps that can expose tool execution surfaces if deployments assume “internal means trusted.”
For defenders, MCP tool poisoning is not only about malicious binaries; it is also about poisoned metadata, registry trust, and argument shaping that passes weak allowlists. Action gates therefore must evaluate the tool identity, normalized arguments, actor identity, delegator identity, declared intent, and current trust level before dispatch. If any element is unverified or policy-incompatible, the runtime blocks or downgrades the request.
In other words, MCP security is not solved by documenting tools better. It is solved by enforcing authorization at the promotion edge where model-proposed calls become executable operations.

Without a high-fidelity AI agent audit trail, teams cannot explain incidents, prove control efficacy, or tune false positives. The audit record should capture each boundary crossing: artifact retrieved, artifact promoted, policy evaluated, tool proposed, tool executed or blocked. It should preserve correlation IDs across turns and sub-agents so investigators can reconstruct causal chains.
For promotion events, log source provenance, tenant scope resolution, freshness verdict, classification tags, and policy decision rationale. For tool events, log requested tool and arguments, argument canonicalization, actor and delegator, trust score inputs, PDP decision, and enforcement outcome at the PEP. For denials, log machine-readable reasons and the exact policy version used, so engineers can reproduce the decision.
This turns security from guesswork into engineering. You can answer not only “what happened,” but “which authority promotion was attempted, why it was denied, and how to fix legitimate workflows without opening blast radius.”
Prompt injection is no longer just output corruption when the system can execute tool calls or mutate external systems. At that point, injected text can be promoted from content into action intent. Authorization is the control that decides whether that promotion is allowed.
An authority boundary is the transition where data changes role from informative context to operative command. Crossing that boundary increases power, so it requires explicit checks. Treating all text as equivalent is what creates authority-promotion failures.
RAG systems should apply a promotion policy after retrieval and before context insertion. That policy should evaluate provenance, tenant scope, freshness, classification, and use-case-specific risk. Content that fails those checks can remain visible for debugging but must not steer policy-relevant behavior.
Retrieval is candidate selection, context promotion is trust and authority assignment, and tool execution is state-changing action. They are separate lifecycle stages with different risk profiles. Collapsing them removes the places where controls should be enforced.
Policy enforcement should happen at runtime, outside the model, using explicit PEP/PDP patterns. The model proposes actions, but an external decision layer authorizes or denies them. This keeps governance deterministic, inspectable, and consistent across models and frameworks.
They help quality and ergonomics, but they are not security controls by themselves. Similarity estimates relevance, prompts influence behavior probabilistically, and tool descriptions are documentation. None of them can reliably enforce least privilege or block unauthorized execution.
MCP action gates are authorization checks performed before dispatching a tool call. They evaluate tool identity, argument safety and normalization, actor and delegator identity, intent alignment, and trust level. If any check fails, the call is blocked, sandboxed, or escalated for approval.

Co-Founder / CEO at Permit.io