What is least-privilege drift for an AI agent?

It is the widening gap between what an agent is permitted to do and what it is observed actually doing. Coarse grants and dynamic behaviour mean an agent can start touching resources nobody approved long before anyone notices. Drift is the silent accumulation of that gap, and it is reviewable only if you continuously compare the permitted set against the observed set.

How do you catch an over-privileged agent without breaking production?

Observe rather than intercept. A read-first collector reads from logs, OpenTelemetry and eBPF kernel signals instead of sitting in the agent's data path, so a collector failure never blocks the agent. That observation feeds a permitted-vs-observed diff. Enforcement then lives in policy-as-code evaluated at access time, where an unreviewed write becomes a denied path plus an alert.

Least-privilege drift for AI agents

Least privilege is one of the oldest, most reliable ideas in security. Give an identity exactly the access it needs, nothing more, and the blast radius of any compromise stays small. For human accounts and long-lived service accounts the model holds up reasonably well: roles are reviewed, grants are scoped, access certifications run on a quarterly cadence.

AI agents break this model quietly. An agent is given a credential, a tool, an MCP server connection, and from that moment its behaviour is dynamic. It decides at runtime which resources to read, which to write, which tool to invoke next. The grant you wrote down on day one describes a ceiling, not what the agent does. And because agents are productive, that ceiling tends to be generous: broad database roles, write access “just in case”, a service account shared across half a dozen workflows. The result is an estate where the permitted access and the actually-used access drift apart, silently, every day.

Independent industry research captures the scale of the blind spot: roughly 82% of organisations report having AI agents running that they did not know about (CSA/Token Security, n=418). You cannot enforce least privilege on an estate you cannot see.

Defining least-privilege drift

Let me name the thing precisely. Least-privilege drift is the growing gap between the access an agent is permitted and the access it is observed to use. It has two failure directions, and only one of them is obvious.

The obvious direction is under-use: an agent holds write access to a table it has never written to. That is dead privilege, and it is risk you are carrying for no benefit. The dangerous direction is the inverse: the observed set contains something the permitted set never deliberately granted, which means an action is happening that no one reviewed. An export job that suddenly writes to a bucket it only ever read from. An agent that policy scoped to one schema reaching across to another. These are not exotic; they are the everyday texture of agents wired together with coarse grants.

The reason this is hard, and the reason it is specific to agents rather than humans, is that the behaviour is generated, not configured. A human with too much access mostly does not exercise it. An agent with too much access will exercise whatever helps it complete the task in front of it, including paths nobody anticipated. Static policy review cannot keep up with behaviour that changes per run.

Turning drift into a reviewable signal

Drift is only dangerous while it is invisible. The job is to make it a signal a human can review, and that requires two things working together: continuous observation of what agents actually touch, and a stable record of what they were permitted to touch.

The observation has to be read-first. A collector that observes from logs, OpenTelemetry traces and eBPF kernel signals sits outside the agent’s data path. It is not a proxy and it does not gate calls. If it fails, it fails open in the safe sense: the agent keeps working, and you lose visibility rather than availability. That asymmetry matters, because a security control that can take down production is a control teams quietly disable. The eBPF layer in particular acts as kernel-level ground truth, the part an agent cannot route around, which is why protocol-level hints such as MCP tool annotations (readOnlyHint, destructiveHint) are corroborated against it rather than trusted. The MCP specification itself tells clients to treat those annotations as untrusted unless they come from a trusted server; kernel signals are what make the corroboration real.
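To make the corroboration step concrete, here is a minimal Python sketch. It assumes the collector has already reduced kernel-level signals to a list of operation names per tool; the names ToolAnnotations, WRITE_OPS and corroborate are hypothetical and chosen for illustration, not taken from any product or from the MCP SDK.

```python
from dataclasses import dataclass

# Hypothetical operation names a kernel-level collector might report as mutating.
WRITE_OPS = {"write", "delete", "ddl"}


@dataclass(frozen=True)
class ToolAnnotations:
    """The two hints named in the text, as declared by the tool itself."""
    read_only_hint: bool
    destructive_hint: bool


def corroborate(annotations, observed_ops):
    """Compare a tool's declared hints with the operations observed for it."""
    observed_writes = sorted(set(observed_ops) & WRITE_OPS)
    if annotations.read_only_hint and observed_writes:
        # The declaration and the ground truth disagree: trust the ground truth.
        return f"contradicted: declared read-only, observed {observed_writes}"
    if annotations.read_only_hint:
        return "consistent: declared read-only, no writes observed"
    return "unverified: no read-only claim to check"


declared = ToolAnnotations(read_only_hint=True, destructive_hint=False)
print(corroborate(declared, ["read", "write"]))
```

The point of the design is that the hint never decides anything on its own. It is only a claim to be checked against what was observed.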

What the observation produces is an access map: for each agent, which resources it reached and whether it read (R) or read/write (RW). The map stores access relationships, not payloads, secrets or PII. The interesting part is the diff:

Agent	Resource	Permitted	Observed	Drift
data-export-job	prod-postgres	R	R	none
data-export-job	s3://billing-exports	R	RW	unreviewed write
report-builder	analytics-db	R	(unused)	dead privilege

The row that matters is the middle one. Policy granted read on the export bucket; the collector observed a write. That single line is the headline risk least-privilege drift is meant to surface: a privilege exercised that nobody reviewed, attributed to a specific agent rather than a shared service account, because per-agent identity is what makes the attribution and the audit possible at all.
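The diff itself can be sketched as two set differences. The Python below is an illustration under stated assumptions, not the product's implementation: the mode encoding (R, RW, or None for unused) follows the table above, and MODES and classify_drift are hypothetical names.

```python
# Map the table's mode labels to sets of actions. None means "never observed".
MODES = {
    None: frozenset(),
    "R": frozenset({"read"}),
    "RW": frozenset({"read", "write"}),
}


def classify_drift(permitted, observed):
    """Classify one (agent, resource) pair from its permitted and observed modes."""
    allowed, seen = MODES[permitted], MODES[observed]
    unreviewed = seen - allowed   # exercised but never granted
    if unreviewed:
        return "unreviewed " + "/".join(sorted(unreviewed))
    if allowed - seen:            # granted but never exercised
        return "dead privilege"
    return "none"


rows = [
    ("data-export-job", "prod-postgres", "R", "R"),
    ("data-export-job", "s3://billing-exports", "R", "RW"),
    ("report-builder", "analytics-db", "R", None),
]
for agent, resource, permitted, observed in rows:
    print(agent, resource, classify_drift(permitted, observed))
```

Unreviewed access is checked first on purpose: when a pair shows both directions of drift, the exercised-but-ungranted action is the one that needs a reviewer's attention.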

Enforcing at access time, not just logging it

Detection tells you drift happened. To close the loop you want the unreviewed write to be a denied path, not a logged one. That is where policy-as-code evaluated at access time comes in. The same diff that flagged the drift becomes the rule that prevents it.

Consider pinning the export job to read-only on both the production database and the export bucket, with any violation blocked and alerted on rather than passing silently:

agent "data-export-job" {
  # Read-only on the operational database. No writes, ever.
  access "prod-postgres" {
    mode  = "read"
    deny  = ["write", "delete", "ddl"]
  }

  # The export bucket. Policy grants read only, so a write here is a violation.
  access "s3://billing-exports" {
    mode = "read"
  }

  on_violation {
    action = "block"        # deny the call at access time
    alert  = "security-oncall"
    audit  = "append"       # write to the tamper-evident ledger
  }
}

Walk the before and after concretely.

Before. The export job, holding a broad role, issues a write to s3://billing-exports. Nothing stops it. The action succeeds, blends into normal traffic, and shows up days later, if at all, as an anomaly in the access map. The gap between permitted and observed widened, and the only artefact is a log line nobody read.

After. The same write arrives. Policy is evaluated at access time, sees write on a resource scoped to read, and returns a denial before the operation lands. The violation blocks the call, raises an alert to the on-call rotation, and appends an entry to the append-only, hash-chained audit ledger. The unreviewed write never becomes an unreviewed change. The drift is converted, in the moment, back into a least-privilege path.
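The "after" path can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not the enforcement engine described here: POLICY mirrors the rule above as a plain dictionary, and Ledger, evaluate and the in-memory alerts list are hypothetical stand-ins for a real policy engine, ledger and paging system.

```python
import hashlib
import json

# Hypothetical in-memory form of the policy: agent -> resource -> allowed actions.
POLICY = {
    "data-export-job": {
        "prod-postgres": {"read"},
        "s3://billing-exports": {"read"},
    }
}


class Ledger:
    """Append-only log in which each entry's hash covers the previous hash."""

    GENESIS = "0" * 64

    def __init__(self):
        self.entries = []

    @staticmethod
    def _digest(prev, record):
        body = json.dumps(record, sort_keys=True)
        return hashlib.sha256((prev + body).encode()).hexdigest()

    def append(self, record):
        prev = self.entries[-1]["hash"] if self.entries else self.GENESIS
        self.entries.append(
            {"prev": prev, "record": record, "hash": self._digest(prev, record)}
        )

    def verify(self):
        """Recompute the chain; any edited or reordered entry breaks it."""
        prev = self.GENESIS
        for entry in self.entries:
            if entry["prev"] != prev or self._digest(prev, entry["record"]) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True


def evaluate(agent, resource, action, ledger, alerts):
    """Return True to allow. Anything not explicitly granted is denied."""
    if action in POLICY.get(agent, {}).get(resource, set()):
        return True
    record = {"agent": agent, "resource": resource, "action": action, "decision": "deny"}
    ledger.append(record)                        # audit = "append"
    alerts.append(("security-oncall", record))   # alert the on-call rotation
    return False                                 # action = "block"


ledger, alerts = Ledger(), []
print(evaluate("data-export-job", "s3://billing-exports", "read", ledger, alerts))
print(evaluate("data-export-job", "s3://billing-exports", "write", ledger, alerts))
print(ledger.verify())
```

Two choices are worth noting. The evaluator denies by default, so an agent or resource missing from the policy is blocked rather than waved through. And because each hash covers the previous one, editing an old entry invalidates every entry after it.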

Two properties keep this honest. First, every privileged view of the access map is itself audited (who looked at what), because the map is sensitive and a security tool that cannot account for its own operators is not trustworthy. Second, confidence is shown plainly: an action attributed to an agent by kernel-level evidence is marked differently from one that is only inferred. You are never asked to act on a fabricated certainty.
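Both properties are simple to express in code. The Python sketch below is illustrative only: Evidence, Observation and view_access_map are hypothetical names, and a real system would presumably record views in the same tamper-evident ledger rather than a plain list.

```python
from dataclasses import dataclass
from enum import Enum


class Evidence(Enum):
    KERNEL = "kernel-attributed"   # backed by kernel-level signals
    INFERRED = "inferred"          # approximate attribution


@dataclass(frozen=True)
class Observation:
    agent: str
    resource: str
    mode: str          # "R" or "RW"
    evidence: Evidence


def view_access_map(operator, agent, observations, view_log):
    """Return display rows for one agent and record who asked for them."""
    # Log the view before returning anything, so no read goes unrecorded.
    view_log.append({"operator": operator, "agent": agent})
    return [
        f"{o.resource} {o.mode} [{o.evidence.value}]"
        for o in observations
        if o.agent == agent
    ]


observations = [
    Observation("data-export-job", "prod-postgres", "R", Evidence.KERNEL),
    Observation("data-export-job", "s3://billing-exports", "RW", Evidence.INFERRED),
]
view_log = []
for row in view_access_map("alice", "data-export-job", observations, view_log):
    print(row)
```

Carrying the evidence level on every row means a reviewer sees how an attribution was made at the same moment they see the attribution itself.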

The takeaway

Least privilege did not fail for AI agents; the review cadence did. Grants are coarse, behaviour is dynamic, and quarterly certifications cannot track an access surface that changes every run. The fix is not a heavier proxy in the critical path. It is continuous, read-first observation that produces a permitted-versus-observed diff, plus policy-as-code that enforces the corrected boundary at access time, so an over-privileged agent is caught as a reviewable signal long before it becomes an incident.

If you want to see how the collector, the access map and access-time enforcement fit together without sitting in your agents’ data path, the architecture page walks through the design, and the product shows what the permitted-versus-observed view looks like on a real estate.
