Auditing Claude Code and MCP servers in self-hosted environments

By Olivares AI · 6 min read

A platform team rolls out Claude Code to a dozen engineers. To make it useful, they wire it to a few MCP servers: one that reads the monorepo, one that queries a reporting replica, one that files tickets. Within a week, an agent is reading source, touching a database, and calling internal tools — and nobody can answer the question an auditor will eventually ask: which agent did what, to which resource, and can you prove it wasn’t altered afterwards?

This is not a hypothetical gap. Independent industry research (CSA/Token, n=418) found that around 82% of organisations have AI agents running that they do not know about, while only about 21% keep a real-time inventory of them. Claude Code and MCP are exactly the kind of capability that lands fast and outpaces the audit story. The good news: a defensible trail is achievable entirely inside your own perimeter, with no payloads or secrets leaving it.

Why the obvious setup fails an audit

The usual first deployment shares one credential. Every Claude Code instance authenticates to the MCP layer and to downstream databases as the same service account — mcp-runner, ci-bot, something generic. It works, and it quietly destroys attribution.

When ten agents act as one principal, the database audit log shows ten writes from mcp-runner and nothing more. You cannot tell which engineer’s session issued the UPDATE, whether it came from an interactive Claude Code prompt or an unattended job, or which MCP tool was in the path. Independent research (Optro) puts the share of organisations that can trace an agent’s action back to a person at roughly 28%. A shared service account is one structural reason attribution collapses: the audit log can show what happened, but never which agent did it.

The second failure is integrity. Even teams that log per-action usually write to a mutable store. If the logs live in the same system an attacker (or a buggy agent) can reach, “we have logs” is not the same as “we have evidence”. An auditor distinguishes the two sharply.

Per-agent identity is the foundation

Everything downstream depends on attribution, so fix identity first. Instead of one shared token, issue a distinct, short-lived identity per agent — ideally per session. The identity travels with the request into the MCP server and into any resource the agent reaches, so the principal recorded at the database is the agent, not a generic runner.
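
As a rough illustration of a per-session identity, the sketch below mints an HMAC-signed token carrying the agent name, a random session ID and an expiry, using only the Python standard library. Everything here is an assumption made for illustration: `SIGNING_KEY`, `issue_identity`, `verify_identity` and the 15-minute default lifetime are hypothetical, and a real deployment would more likely lean on an existing workload-identity or token service than on hand-rolled tokens.

```python
import base64
import hashlib
import hmac
import json
import secrets
import time
from typing import Optional

# Hypothetical signing key; in practice it would be held in a KMS or HSM.
SIGNING_KEY = secrets.token_bytes(32)


def issue_identity(agent: str, ttl_seconds: int = 900,
                   now: Optional[float] = None) -> str:
    """Mint a signed, short-lived identity for one agent session."""
    issued = int(time.time() if now is None else now)
    claims = {
        "agent": agent,
        "session": secrets.token_hex(4),  # fresh value for every session
        "exp": issued + ttl_seconds,
    }
    body = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode())
    sig = hmac.new(SIGNING_KEY, body, hashlib.sha256).hexdigest()
    return f"{body.decode()}.{sig}"


def verify_identity(token: str, now: Optional[float] = None) -> Optional[dict]:
    """Return the claims if the signature is valid and unexpired, else None."""
    try:
        body, sig = token.rsplit(".", 1)
    except ValueError:
        return None
    expected = hmac.new(SIGNING_KEY, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return None
    claims = json.loads(base64.urlsafe_b64decode(body))
    current = time.time() if now is None else now
    return claims if current < claims["exp"] else None
```

The MCP server or database proxy would call something like `verify_identity` on each request and record `claims["agent"]` and `claims["session"]` as the principal, so the short lifetime bounds how long a leaked token is useful.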

This is what makes least-privilege meaningful. A reporting agent can be pinned to read-only on the reporting replica; a release agent gets write only to the artefacts it owns. Now a single observed write outside that envelope is a precise, attributable signal instead of noise in a shared account. Expressed as policy, the intent is blunt:

agent "reporting-assistant" {
  # Claude Code session via the reporting MCP server
  resource "prod-postgres/reporting_replica" {
    access     = "read"      # SELECT only
    deny       = ["write", "ddl"]
  }
  resource "s3://billing-exports" {
    access = "read"
  }
  on_violation {
    action = "block_and_alert"   # refuse at access time, not just log
  }
}

The point is not the syntax — it’s that the policy names the agent, names the resource, and separates read from read/write. That separation is the whole game.
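
To make the read versus read/write separation concrete, here is a minimal sketch of how a policy like the one above could be evaluated at access time. The `Grant` class, the `POLICY` table and the `decide` function are hypothetical names chosen for this example, not the platform's API; the only behaviour assumed is default deny plus an explicit deny list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    access: str                     # "read" or "read_write"
    deny: frozenset = frozenset()   # operations refused outright


# Hypothetical in-memory mirror of the policy shown above.
POLICY = {
    "reporting-assistant": {
        "prod-postgres/reporting_replica": Grant("read", frozenset({"write", "ddl"})),
        "s3://billing-exports": Grant("read"),
    },
}


def decide(agent: str, resource: str, op: str) -> str:
    """Return "allow" or "deny" for op in {"read", "write", "ddl"}."""
    grant = POLICY.get(agent, {}).get(resource)
    if grant is None or op in grant.deny:
        return "deny"   # unknown agent or resource, or explicitly denied
    if op == "read":
        return "allow"
    if op == "write" and grant.access == "read_write":
        return "allow"
    return "deny"       # anything not explicitly granted is refused
```

With this table, `decide("reporting-assistant", "prod-postgres/reporting_replica", "write")` returns `"deny"`, which is the case the `on_violation` rule is meant to block and alert on.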

Treat MCP annotations as untrusted signals

MCP tools can describe themselves with annotations such as readOnlyHint and destructiveHint. They are genuinely useful for triage: a tool that declares itself destructive deserves a tighter policy. But the MCP specification is explicit that these are hints, not guarantees of behaviour, and that clients must treat them as untrusted unless they come from a trusted server. They originate from the server, which is the very thing you are auditing. A tool can declare readOnlyHint: true and still issue a write, whether through a bug, a misconfiguration, or deliberate evasion.

So the correct posture is corroboration, not trust. Take the annotation as a claim, then check it against ground truth from a layer the tool does not control:

  • Database audit logs (for example PostgreSQL pgAudit) tell you whether a SELECT or an UPDATE actually ran.
  • OpenTelemetry spans from the MCP server and downstream services show the call graph and the operations performed.
  • eBPF kernel signals are the anti-evasion backstop: a write syscall to a file or socket is observable at the kernel regardless of what the tool claimed in user space.

When the annotation says read-only and the kernel saw a write, that contradiction is the headline finding — a readOnlyHint tool that wrote to prod-postgres is precisely the least-privilege drift worth waking someone for. The diff between what policy permitted and what the collectors observed is where the real risk lives.
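
The corroboration step can be sketched as a comparison between what each tool declared and what the collectors saw. In this example the `Observation` record, the source labels and `find_contradictions` are illustrative assumptions; real pgAudit, OpenTelemetry and eBPF events would first need normalising into some common shape like this one.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Observation:
    source: str    # collector label, e.g. "pgaudit", "otel", "ebpf"
    tool: str      # MCP tool the operation was traced back to
    resource: str
    op: str        # "R" for read, "W" for write


def find_contradictions(annotations: Dict[str, dict],
                        observations: List[Observation]) -> List[Observation]:
    """Return observed writes made by tools that declared readOnlyHint: true."""
    return [
        obs for obs in observations
        if obs.op == "W"
        and annotations.get(obs.tool, {}).get("readOnlyHint") is True
    ]


# Hypothetical data: sql-read claims to be read-only, yet a write was observed.
annotations = {
    "sql-read": {"readOnlyHint": True},
    "sql-write": {"readOnlyHint": False},
}
observations = [
    Observation("pgaudit", "sql-read", "prod-postgres/reporting_replica", "R"),
    Observation("ebpf", "sql-read", "prod-postgres/customers", "W"),
    Observation("pgaudit", "sql-write", "prod-postgres/customers", "W"),
]
findings = find_contradictions(annotations, observations)
```

Only the second observation ends up in `findings`: the declared hint and the observed write disagree, and the write was seen by a layer the tool does not control.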

The ledger an auditor will accept

A trail is only evidence if it is tamper-evident. Write each event to an append-only, hash-chained ledger: every record includes the hash of the previous one, so editing or deleting any record breaks every link after it and is detectable. Each line attributes the action to the specific agent, names the resource, records read versus read/write, and carries a confidence level: attributed when identity and outcome are both corroborated, approximate when something had to be inferred. Confidence is shown honestly; an approximate match is never dressed up as a proven one.

ts=2026-06-08T09:14:02Z agent=reporting-assistant@s3f1 tool=sql-read     resource=prod-postgres/reporting_replica  op=R   outcome=allow   conf=attributed   prev=8a1c…
ts=2026-06-08T09:14:05Z agent=reporting-assistant@s3f1 tool=export-writer resource=s3://billing-exports          op=R   outcome=allow   conf=attributed   prev=2f90…
ts=2026-06-08T09:17:48Z agent=data-export-job@b22e     tool=sql-write     resource=prod-postgres/customers          op=RW  outcome=DENY    conf=attributed   prev=c7d3…  policy=read_only_violation

That third line is the one an auditor cares about: a write attempt on a table the agent had no write grant for, denied at access time, attributed to a named agent, and anchored in the chain. Because policy is enforced when the access happens — not merely logged after — the deny is a control, not a post-mortem.
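
The chaining itself is simple enough to sketch in a few lines. This is a minimal illustration under stated assumptions, not the product's ledger format: records are plain dictionaries, the hash is SHA-256 over canonical JSON, and `GENESIS`, `record_hash`, `append` and `verify` are names invented for the example.

```python
import hashlib
import json
from typing import List, Optional

GENESIS = "0" * 64  # assumed placeholder "prev" value for the first record


def record_hash(record: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the record."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(payload).hexdigest()


def append(ledger: List[dict], event: dict) -> dict:
    """Append an event, linking it to the hash of the previous record."""
    prev = record_hash(ledger[-1]) if ledger else GENESIS
    record = {**event, "prev": prev}
    ledger.append(record)
    return record


def verify(ledger: List[dict]) -> Optional[int]:
    """Return the index of the first broken link, or None if the chain is intact."""
    prev = GENESIS
    for i, record in enumerate(ledger):
        if record.get("prev") != prev:
            return i
        prev = record_hash(record)
    return None
```

Editing a record changes its hash, so the next record's `prev` no longer matches and `verify` reports that index. A chain on its own cannot reveal a modified or truncated tail, which is why the latest hash also has to be anchored somewhere the writer cannot reach.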

Two more properties make the trail hold up. Privileged views are themselves audited: looking at the access map is recorded, so “who saw what” is answerable. And evidence export is tamper-evident — you hand an auditor a signed slice of the chain they can verify independently, rather than a CSV they have to take on faith.

| Property | Shared service account | Per-agent identity + hash-chained ledger |
|---|---|---|
| Attribution | One principal for all agents | Action tied to the specific agent/session |
| Read vs write | Conflated | Distinguished and policy-checked |
| Integrity | Mutable logs | Append-only, chain breaks on edit |
| Auditor-ready export | CSV taken on trust | Signed, independently verifiable slice |

Why self-hosted matters here

All of this works without anything leaving your network. The collector observes — logs, OpenTelemetry, eBPF as a kernel-level backstop — rather than sitting in the agent’s data path, so if it fails it never breaks the agent or the production request behind it. The ledger stores access relationships, not payloads: it records that reporting-assistant read reporting_replica, not the rows it returned. Inputs that might carry secrets or PII are redacted and secret-scanned before anything is written. For air-gapped, GDPR-bound or data-residency-constrained estates, that is the difference between an audit story you can defend and one you cannot: nothing about your Claude Code and MCP usage phones home, because the system never sees the data in the first place.
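
A rough sketch of redacting before anything is written might look like the following. The three patterns are deliberately simplistic placeholders and `redact` and `to_ledger_event` are hypothetical helpers; a real secret scanner would use a far larger ruleset, and the 80-character summary is an arbitrary choice for the example.

```python
import re

# Illustrative patterns only; not an exhaustive or production ruleset.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                       # AWS access key ID shape
    re.compile(r"(?i)(password|secret|token)\s*=\s*\S+"),  # key=value credentials
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),               # email addresses (PII)
]


def redact(text: str) -> str:
    """Replace anything matching a known secret or PII pattern."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text


def to_ledger_event(agent: str, resource: str, op: str, raw_input: str) -> dict:
    """Build a ledger event from the access relationship and a redacted input.

    Result rows are never passed in, so they cannot end up in the ledger.
    """
    return {
        "agent": agent,
        "resource": resource,
        "op": op,
        # Redact first, then truncate, so a cut can never expose a partial secret.
        "input_summary": redact(raw_input)[:80],
    }
```

The design choice that matters is ordering: scanning happens before the write, so the stored record never contained the secret in the first place rather than being scrubbed afterwards.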

Claude Code and MCP are worth deploying. They just need an audit trail built the same way you’d build one for any other privileged automation — identity first, outcomes corroborated, evidence anchored. If you want to see how the access map and the tamper-evident ledger fit together, the security model and the product overview go deeper on each.

Frequently asked

Can I trust an MCP tool's readOnlyHint to prove an action was read-only?

No. The MCP specification states that tool annotations such as readOnlyHint and destructiveHint are hints and must be treated as untrusted unless they come from a trusted server. In an audit, the server is the thing being verified, not an enforcement layer. Use the annotations to triage and to flag contradictions, but prove the actual read/write outcome with telemetry or kernel-level signals (OpenTelemetry spans, database audit logs, eBPF). Trust the corroborated outcome, not the annotation.

How do I attribute a Claude Code action to a specific agent instead of a shared service account?

Issue a distinct, short-lived identity per agent or per session rather than one shared token. When the agent reaches a database or an MCP server, that identity travels with the request, so the audit ledger records which agent did what. A shared service account collapses every agent into one principal, which makes per-agent least privilege unenforceable and leaves attribution impossible to reconstruct after the fact.
