Coding Agent Sandboxes Don't Solve Credential Authorization

A lot of teams are having the same uncomfortable conversation right now: we moved our coding agent into a locked-down container, so why does security still feel fragile? The answer is that container hardening and VM isolation solve one class of problem—the host compromise problem—but not the authority problem. If your agent can still push to protected branches, publish packages, deploy infra, read inboxes, or call high-power MCP tools, the blast radius is defined by those privileges, not by namespace isolation.
That's the architectural tension behind secure coding agents today. We're very good at talking about runtime boundaries, but still too casual about coding-agent credentials. You can run Claude Code security controls perfectly at the OS layer and still hand the agent effective admin authority through GitHub tokens, cloud credentials, and browser-backed sessions. That gap is where most serious failures will happen.
Host isolation and authority isolation are different security properties that happen to get discussed as if they were one. Host isolation is about containing code execution: containers, microVMs, seccomp profiles, egress controls, read-only mounts, and process-level guardrails. It answers, "if this process is malicious, how far can it move on this machine?"
Authority isolation answers a different question: "what can this process do through legitimate APIs and trusted control planes?" An agent doesn't need a kernel escape if it already has a token that can merge to main, trigger a production deploy, or read secrets from a vault. In practice, most damaging incidents won't look like classic host compromise; they'll look like authorized misuse.
This is the industry mistake worth naming in plain language: coding-agent sandboxing is necessary, but it addresses a different security dimension than coding agent authorization. You can be perfect on one axis and dangerously weak on the other. A fully isolated agent with a high-scope GitHub PAT and cloud CLI creds is still a high-authority operator.

Most teams underestimate the credential surface because they reason only about what they explicitly passed to the agent in environment variables. But modern coding workflows leak authority through many side channels: local CLIs already authenticated, browser sessions already warm, CI tokens in repo secrets, and MCP servers that proxy additional capabilities. Least privilege for coding agents has to model all of that, not just .env.
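As a rough illustration of modeling the credential surface beyond .env, the sketch below lists secret-looking environment variable names and checks for cached CLI credential files under a home directory. The name patterns and the file list are illustrative assumptions, not an exhaustive inventory, and the function name `inventory_credential_surface` is hypothetical. Browser sessions, CI secrets, and MCP servers are out of scope for this sketch.

```python
import re
from pathlib import Path

# Assumption: a heuristic for secret-looking variable names, not a complete rule set.
SECRET_NAME = re.compile(r"(TOKEN|SECRET|PASSWORD|API_KEY|ACCESS_KEY)", re.IGNORECASE)

# Default locations where some common CLIs cache credentials (illustrative subset).
CLI_CREDENTIAL_FILES = [
    ".aws/credentials",
    ".config/gh/hosts.yml",
    ".npmrc",
    ".docker/config.json",
    ".kube/config",
]


def inventory_credential_surface(env, home):
    """Return secret-looking env var names and credential files found under home.

    Only names and relative paths are reported; secret values are never read.
    """
    env_hits = sorted(name for name in env if SECRET_NAME.search(name))
    file_hits = [rel for rel in CLI_CREDENTIAL_FILES if (Path(home) / rel).is_file()]
    return {"env": env_hits, "files": file_hits}
```

Calling `inventory_credential_surface(os.environ, Path.home())` inside the agent's runtime gives a first approximation of what the agent inherits before any explicit grant is made.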
The discussion around Claude Code security has already surfaced this reality in public practitioner forums. The Claude Fable Hacker News thread has multiple engineers explicitly calling out Gmail access, password-reset abuse, browser-profile exposure, and .env/MCP pathways as the real concern—not merely "can it delete local files." That framing is directionally correct: the dangerous path is often credential reuse, not sandbox breakout.
The credential inventory that matters in practice looks like this:
repo:read and metadata are very different from contents:write, pull_requests:write, or org-admin permissions. GitHub agent permissions should be task-scoped and repo-bounded by default.
The risk is not theoretical. Obsidian Security's LiteLLM privilege escalation research demonstrates a path from low-privilege access to admin-level control and then forged downstream tool execution against agents like Claude Code, including MCP-related execution paths. And Antimetal's automation implementation writeup makes the opposite but equally important point: autonomous systems increasingly do need direct authenticated production access to deliver value. This is architectural reality, not fearmongering.

A tiered model exists because "tool call" is not a meaningful risk unit. cat README.md and "merge to main + deploy prod" are not siblings; treating them as equivalent is how teams end up with catastrophic default approvals. Tool trust levels are the bridge between policy language and operational controls.
This is especially relevant for Cursor MCP security and Claude Code security deployments where users expect fluid interaction. If you put the same friction on every call, people disable controls. If you put no friction on high-impact calls, incidents become inevitable. Risk-tiered controls are the only workable middle path.
| Tier | Examples | Risk profile | Typical control |
|---|---|---|---|
| Read/List | git clone, git log, grep, ls, cat | Low; observational; no direct side effects | Auto-allow with logging |
| Edit/Write | file write, git commit, branch push | Medium; reversible but defect-introducing | Policy allow + scope checks |
| PR/Review | open PR, request reviewers, issue-state changes | Elevated; org/social surface | Conditional allow, stronger identity binding |
| Merge/Deploy | merge protected branches, trigger CI/CD, deploy envs | High; business impact, potentially irreversible | Human approval for agent actions |
| Secret/Credential access | read secret manager, write .env, rotate keys | Critical; privilege amplification | Explicit approval + JIT grant |
| Destructive shell | rm -rf, DROP DATABASE, infra teardown | Critical/irreversible | Default deny or break-glass approval |
Once tiers exist, policy becomes legible: read paths stay fast; write paths enforce resource scoping; merge/deploy and secret/destructive paths require explicit, invocation-specific authorization. This is what coding agent authorization should look like in production: proportional control, not blanket paranoia and not blanket trust.
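One way to make the tier table executable is a lookup from tool name to tier and from tier to control. The sketch below is a minimal version under stated assumptions: the tool names, the control labels, and the choice to treat unknown tools as the most restrictive tier are all illustrative, not taken from any particular product.

```python
from enum import Enum


class Tier(Enum):
    READ_LIST = "read_list"
    EDIT_WRITE = "edit_write"
    PR_REVIEW = "pr_review"
    MERGE_DEPLOY = "merge_deploy"
    SECRET_ACCESS = "secret_access"
    DESTRUCTIVE = "destructive"


# Control per tier, mirroring the "Typical control" column of the table.
CONTROL = {
    Tier.READ_LIST: "auto_allow_with_logging",
    Tier.EDIT_WRITE: "allow_with_scope_checks",
    Tier.PR_REVIEW: "conditional_allow",
    Tier.MERGE_DEPLOY: "human_approval",
    Tier.SECRET_ACCESS: "explicit_approval_plus_jit_grant",
    Tier.DESTRUCTIVE: "default_deny",
}

# Hypothetical tool names; a real deployment would register its own.
TOOL_TIERS = {
    "git.log": Tier.READ_LIST,
    "fs.write_file": Tier.EDIT_WRITE,
    "github.open_pull_request": Tier.PR_REVIEW,
    "github.merge_pull_request": Tier.MERGE_DEPLOY,
    "vault.read_secret": Tier.SECRET_ACCESS,
    "shell.rm_rf": Tier.DESTRUCTIVE,
}


def control_for(tool):
    """Return the control for a tool; unregistered tools fail closed."""
    tier = TOOL_TIERS.get(tool, Tier.DESTRUCTIVE)
    return CONTROL[tier]
```

Failing closed on unregistered tools is a design choice in this sketch: a newly added MCP tool gets the strictest treatment until someone classifies it.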
Zero standing permissions for AI agents means exactly what it says: the agent starts with no persistent authority. At provisioning time, it gets identity, telemetry hooks, and policy context—but not long-lived GitHub PATs, cloud keys, registry publish tokens, or always-on SaaS credentials. Capability is granted only when needed.
In a mature setup, access is just-in-time, narrowly scoped to the declared task, and auto-expired by default. If an agent is assigned "prepare PR for bugfix in repo X," then rights should be limited to that repo, that branch pattern, that operation class, and a short validity window. If it stalls or deviates, grants should time out and revoke automatically.
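The "repo, branch pattern, operation class, validity window" shape of a grant can be sketched as a small immutable record with an expiry check. This is a minimal illustration, assuming a simple branch-prefix match and an in-process clock; `Grant` and `issue_grant` are hypothetical names, and a real system would issue and verify grants server-side.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    repo: str
    branch_prefix: str
    operations: frozenset
    expires_at: float  # Unix timestamp after which the grant is dead

    def permits(self, repo, branch, operation, now=None):
        """True only if the call matches every dimension and the grant is unexpired."""
        now = time.time() if now is None else now
        return (
            now < self.expires_at
            and repo == self.repo
            and branch.startswith(self.branch_prefix)
            and operation in self.operations
        )


def issue_grant(repo, branch_prefix, operations, ttl_seconds, now=None):
    """Issue a grant that expires ttl_seconds from now."""
    now = time.time() if now is None else now
    return Grant(repo, branch_prefix, frozenset(operations), now + ttl_seconds)
```

For the "prepare PR for bugfix in repo X" task, a grant such as `issue_grant("org/repo-x", "bugfix/", {"push"}, ttl_seconds=900)` would stop permitting calls once the window closes, with no cleanup step required.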
Delegated access is the other half. The agent acts on behalf of a human principal, not as a superuser shadow account. That means effective rights must be the intersection of: the human's permissions, declared task scope, tool tier policy, and temporal window. If the delegating engineer cannot merge to production, the agent must not be able to merge either. If the task is read-only triage, write pathways stay closed.
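The intersection rule can be written down almost literally. The sketch below assumes permissions are represented as plain strings and that the temporal window has already been reduced to a boolean; both are simplifications for illustration, and `effective_rights` is a hypothetical helper.

```python
def effective_rights(human_permissions, task_scope, tier_allowed, window_open):
    """Rights the agent may exercise: the intersection of all constraints.

    Outside the temporal window the result is empty regardless of the other inputs.
    """
    if not window_open:
        return frozenset()
    return frozenset(human_permissions) & frozenset(task_scope) & frozenset(tier_allowed)
```

Because the result is an intersection, adding a permission to any single input can never widen the agent's rights beyond what the delegating human already holds.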
This is the opposite of today's common anti-pattern: stuffing durable secrets into agent environment variables and calling it enablement. That model quietly creates standing privilege, turns prompt injection into privilege abuse, and makes post-incident attribution ambiguous. Zero standing permissions plus delegated authorization gives you a principled default: no authority without explicit, contextual, time-bounded intent.

At this point, policy has to move from documentation to runtime decisions. Permit.io is useful here as an enforcement plane because each tool invocation can be evaluated in real time by a policy decision point (PDP) using the full context: human identity, agent identity, session/task metadata, requested operation, target resource, and risk tier.
That is the key shift from static secrets to live authorization. Instead of "token present => action allowed," the system asks, "is this exact call allowed now, by this agent, for this delegated human, on this resource, under this task scope?" For secure coding agents, this is the difference between binary trust and continuous policy.
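To make the per-call question concrete, here is a toy decision function evaluated against the full call context. It is a local stand-in for a policy decision point, not the Permit.io API: the in-memory policy tables, the identifiers, and the reason strings are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallContext:
    human_id: str
    agent_id: str
    task_id: str
    operation: str
    resource: str
    tier: str


# Hypothetical policy data; a real PDP would load this from a policy store.
HUMAN_PERMISSIONS = {
    "user_12345": {("push", "github://org/payment-service")},
    "user_67890": {
        ("push", "github://org/payment-service"),
        ("merge", "github://org/payment-service"),
    },
}
TASK_SCOPES = {"task_fix_checkout_timeout": "github://org/payment-service"}
APPROVAL_TIERS = {"merge_deploy", "secret_access", "destructive"}


def decide(ctx):
    """Return (result, reason) for one tool call; nothing is allowed by default."""
    scope = TASK_SCOPES.get(ctx.task_id)
    in_scope = scope is not None and (
        ctx.resource == scope or ctx.resource.startswith(scope + "/")
    )
    if not in_scope:
        return ("deny", "resource_outside_task_scope")
    if (ctx.operation, scope) not in HUMAN_PERMISSIONS.get(ctx.human_id, set()):
        return ("deny", "human_principal_missing_permission")
    if ctx.tier in APPROVAL_TIERS:
        return ("require_approval", "tier_requires_human_approval")
    return ("allow", "within_delegated_scope")
```

Note that the presence of a token never appears in the decision: the outcome depends only on who delegated, what the task covers, and how risky the operation is.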
High-risk operations should also support invocation-specific human approval flows. The agent pauses on merge/deploy, destructive, or secret-tier actions and requests approval for that one action with full parameters visible. This avoids dangerous blanket session consent and gives reviewers a chance to reject suspicious deltas like "merge + deploy + secret read" chained together.
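One way to bind an approval to a single invocation is to fingerprint the exact tool, resource, and parameters the reviewer saw, then refuse execution if the fingerprint no longer matches. The sketch below assumes JSON-serializable parameters and an in-process clock; the function names and the approval record shape are hypothetical.

```python
import hashlib
import json
import time


def request_fingerprint(tool, resource, parameters):
    """Stable SHA-256 over the canonical JSON form of the request."""
    canonical = json.dumps(
        {"tool": tool, "resource": resource, "parameters": parameters},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def approve(tool, resource, parameters, approver, ttl_seconds, now=None):
    """Record a time-bound approval for exactly this request."""
    now = time.time() if now is None else now
    return {
        "fingerprint": request_fingerprint(tool, resource, parameters),
        "approver": approver,
        "expires_at": now + ttl_seconds,
    }


def approval_covers(approval, tool, resource, parameters, now=None):
    """True only if the approval is unexpired and the request is unchanged."""
    now = time.time() if now is None else now
    same_request = approval["fingerprint"] == request_fingerprint(tool, resource, parameters)
    return now < approval["expires_at"] and same_request
```

With this shape, an approval for a squash merge does not carry over if the agent later changes the merge method or the target, so a blanket "approve session" has no equivalent.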
A credible implementation also needs forensic-grade auditability. Every meaningful call should produce a record binding principal, delegation chain, tool input, policy decision, and outcome. A blocked merge example might look like:
{
  "event_id": "evt_9f3b2c1d",
  "timestamp": "2026-06-12T18:14:22Z",
  "human_principal": {
    "id": "user_12345",
    "email": "dev@example.com",
    "role": "engineer"
  },
  "agent": {
    "id": "agent_claude_code_07",
    "session_id": "sess_a81d",
    "task_id": "task_fix_checkout_timeout"
  },
  "request": {
    "tool": "github.merge_pull_request",
    "operation": "merge",
    "resource": "github://org/payment-service/pull/842",
    "tier": "merge_deploy",
    "parameters": {
      "merge_method": "squash",
      "target_branch": "main"
    }
  },
  "policy_decision": {
    "result": "deny",
    "reason": "human_principal_missing_permission_for_protected_branch",
    "matched_rule": "merge_requires_maintainer_and_change_window",
    "policy_version": "2026-06-12.4"
  },
  "jit_grant": {
    "status": "not_issued",
    "ttl_seconds": 0
  },
  "outcome": "operation_blocked_before_execution"
}
When tied to JIT grants keyed by session ID, expiring automatically and revocable mid-task, this model gives teams practical least privilege for coding agents without killing developer velocity. Read calls stay smooth, write calls require scope integrity, and secret/destructive calls require explicit human intent.
Security teams and platform engineers are converging on a shared vocabulary, but implementation details are still uneven. The questions below are the ones that usually determine whether a program is genuinely safe or just cosmetically hardened.
Sandboxing limits host-level compromise paths, but it does not limit what the agent can do with valid credentials. If the agent holds high-scope tokens, it can still perform high-impact actions through trusted APIs without any host escape. Effective security requires both host isolation and authority isolation—treating them as the same thing is how teams build a hardened container that still holds production keys.
Only short-lived, task-scoped credentials issued just in time for specific operations. Long-lived PATs, static cloud access keys, and persistent registry publish tokens should not live inside agent environments. The right model is zero standing permissions at provisioning time, with grants issued narrowly by tier and task and revoked automatically when the task completes or times out.
MCP connections can significantly expand effective authority because they proxy external capabilities into the agent's toolset. If an MCP path is compromised—as Obsidian Security demonstrated with LiteLLM—the agent may execute actions it was never intended to perform directly, including forged tool calls against downstream systems. That is why MCP server security and per-tool authorization are core controls, not optional add-ons for high-risk deployments.
Zero standing permissions means the agent has no durable privileged credentials at rest between tasks—no always-on PATs, no persistent cloud keys, no live registry publish tokens in the environment. Access is granted at runtime, bounded in scope and time, then revoked automatically. This approach reduces credential theft impact and eliminates the standing privilege that makes prompt injection so dangerous.
Teams should use invocation-level approvals rather than blanket "approve session" toggles. The reviewer needs to see the exact tool name, parameters, target resource, and policy context before approving—not just "the agent wants to do something." Approvals should be logged, time-bound, and invalidated if request parameters change between approval and execution.
A strong audit trail binds human principal, agent identity, session and task identifier, requested operation, resource scope, policy result, matched rule, policy version, timestamp, and execution outcome into a single queryable record. Missing any of these fields weakens attribution during incident response. The goal is to be able to answer "who authorized what, under which policy, for which task, with what outcome?" for every meaningful tool call.
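A completeness check along these lines can run before a record is written. The sketch below assumes records are nested dictionaries using the same field names as the blocked-merge example earlier in the article; the required-field list is one reading of the fields named above, and `missing_audit_fields` is a hypothetical helper.

```python
# Dotted paths every audit record is expected to carry (assumed field layout).
REQUIRED_FIELDS = [
    ("timestamp",),
    ("human_principal", "id"),
    ("agent", "id"),
    ("agent", "session_id"),
    ("agent", "task_id"),
    ("request", "operation"),
    ("request", "resource"),
    ("policy_decision", "result"),
    ("policy_decision", "matched_rule"),
    ("policy_decision", "policy_version"),
    ("outcome",),
]


def missing_audit_fields(record):
    """Return dotted paths of required fields that are absent from the record."""
    missing = []
    for path in REQUIRED_FIELDS:
        node = record
        for key in path:
            if not isinstance(node, dict) or key not in node:
                missing.append(".".join(path))
                break
            node = node[key]
    return missing
```

Rejecting or flagging records with a non-empty result keeps attribution gaps from being discovered for the first time during incident response.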
Agents execute at machine speed and can chain many actions in seconds, so over-permissioned access compounds far more quickly than it does for humans. Human developers rely on judgment pauses and friction; agents rely on policy gates enforced at the runtime layer. This means least privilege for agents requires tighter operation scoping, shorter credential lifetimes, and stronger runtime authorization checks than typical human access controls—the margin for error is smaller because the execution velocity is higher.

Co-Founder / CEO at Permit.io