Most data-protection conversations about AI governance start in the wrong place. They ask which certifications a vendor holds, which sub-processors it lists, which region its cloud sits in. Those are real questions. But for the specific category of tooling that watches your AI agents — discovering every agent, session, model and MCP server on your infrastructure and mapping what each one can reach — there is a more fundamental question hiding underneath: does the governance tool itself ever receive your data?
If the answer is yes, you have just created a new processor, a new copy of sensitive material, a new place that can be breached, subpoenaed, or made to phone home. If the answer is no — structurally, by design — then most of the downstream GDPR questions get much smaller. This is the case for self-hosting an AI platform, and it is worth making precisely, without overclaiming.
The privacy guarantee that matters is structural, not a certificate
A SOC 2 report or an ISO 27001 certificate tells you a vendor has processes around the data it holds. That is useful, but it is a statement about governance of access to your data. A far stronger guarantee is not holding the data in the first place. You cannot leak, mis-handle, or be compelled to disclose what you never received.
Self-hosting delivers exactly that. When the control plane runs inside your own hosts, clusters or clouds — including fully air-gapped, with no egress — the sensitive material it observes never crosses your perimeter. The vendor is not a sub-processor of your operational data because the vendor never sees it. That is an architectural fact, not a policy promise you have to audit.
To be clear about where this product stands: Olivares AI is pre-release. It is not certified under SOC 2, ISO/IEC 27001, the EU AI Act or any other framework, and no audit is in progress. The product is designed toward the control objectives those frameworks examine — audit logging, access control, integrity, encryption, change management — so it is ready to be audited when the time comes. The residency argument below does not depend on any certification, which is exactly the point.
Edges, not payloads
The core design decision is what gets stored. An AI governance tool has to understand who can touch what. It does not need the contents of the queries, the prompt bodies, the secrets or the personal data flowing through those touches.
So the graph stores edges, not payloads: the access relationship between an agent and a resource, and whether that access is read (R) or read/write (RW). data-export-job → prod-postgres (RW) is an edge. The rows that job read are not stored. The map records that an agent reached an object in s3://billing-exports; it does not copy the export.
| Stored (the access map) | Not stored |
|---|---|
| Agent identity (role / application name) | Credential values, tokens, keys |
| Resource reached (prod-postgres) | Query bodies, result rows |
| Access type — R or RW | Prompt and response payloads |
| Timestamp, outcome, confidence level | Secrets, PII in transit |
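To make the stored/not-stored split concrete, here is a minimal sketch of what an edge record could look like. It is an illustration under assumptions, not the product's actual schema: the names `AccessEdge`, `Access`, `edge_from_event`, the event keys, and the example outcome and confidence values are all hypothetical. The point it shows is that the record type has no field that could hold a query body, result rows, a prompt or a credential.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class Access(Enum):
    R = "R"    # read
    RW = "RW"  # read/write


@dataclass(frozen=True)
class AccessEdge:
    """One edge in the access map: who reached what, and how.

    Hypothetical schema. There is deliberately no field for query text,
    result rows, prompt/response payloads or credential values.
    """
    agent: str             # role / application name
    resource: str          # e.g. "prod-postgres"
    access: Access         # R or RW
    observed_at: datetime  # timestamp of the observation
    outcome: str           # assumed values, e.g. "allowed" / "denied"
    confidence: str        # assumed values, e.g. "high" / "low"


def edge_from_event(event: dict) -> AccessEdge:
    """Build an edge by copying only allow-listed keys from a raw observation.

    Any other key in `event` (query, rows, prompt, token) is simply never read.
    """
    return AccessEdge(
        agent=event["agent"],
        resource=event["resource"],
        access=Access(event["access"]),
        observed_at=datetime.fromisoformat(event["ts"]),
        outcome=event.get("outcome", "unknown"),
        confidence=event.get("confidence", "low"),
    )
```

Using an allow-list rather than a deny-list is the design choice worth noting here: a new, unexpected payload field in an observation is dropped by default instead of stored by default.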
Inputs that might carry secrets or personal data are redacted and secret-scanned before anything is written, so the redaction happens at the edge of collection rather than as a later cleanup. What you don’t store, you can’t leak — and what you can’t leak doesn’t expand your GDPR processing footprint.
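As a rough sketch of redaction before write, the example below scrubs every string field before it can reach a store. The patterns, the `[REDACTED]` marker and the function names `redact` and `write_record` are assumptions made for illustration; the product's real scanner rules are not described here, and a production rule set would need to be far broader than three regular expressions.

```python
import re

# Illustrative patterns only (assumed, not the product's rules).
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                                # AWS access key ID shape
    re.compile(r"(?i)bearer\s+[a-z0-9._\-]+"),                      # bearer tokens
    re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),  # email addresses
]

REDACTED = "[REDACTED]"


def redact(text: str) -> str:
    """Replace every match of every pattern with a fixed marker."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub(REDACTED, text)
    return text


def write_record(store: list, fields: dict) -> None:
    """Redact all string fields, then append the cleaned record.

    The only code path that appends to `store` goes through redact(),
    so unredacted input is never written.
    """
    clean = {k: redact(v) if isinstance(v, str) else v for k, v in fields.items()}
    store.append(clean)
```

Putting the scrub inside the single write path, rather than running it as a later batch job, is what "at the edge of collection" means in practice: there is no window in which the raw value exists at rest.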
How the data stays inside the perimeter
Three properties keep this honest in operation:
Read-first observation. The collector observes through signals you already produce — application and audit logs, OpenTelemetry, and eBPF as a kernel-level ground-truth backstop. It is not a proxy in the agent’s data path, so it sees the shape of access, not the contents, and if it fails it never breaks production. There is no mandatory man-in-the-middle copying your traffic.
No telemetry home. Secure-by-default means no phone-home. Vendor telemetry is off unless you explicitly turn it on. Nothing about your estate — not the agent names, not the access map, not usage counts — is sent back to the vendor by default.
Air-gapped with zero egress. In disconnected, regulated or classified networks the control plane runs entirely locally, with licensing validated offline. There is no path out, full stop. For a data-residency requirement that says EU data must remain on EU infrastructure under your control, an air-gapped self-hosted deployment is the most literal possible answer: the data never moves because there is nowhere for it to move to.
Retention and purge are configurable, so you control how long even the access map persists.
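A retention policy of this kind can be sketched in a few lines. This is an assumption-laden illustration: the function name `purge_expired`, the `retention_days` parameter and the `observed_at` key are hypothetical, and the product's actual configuration surface is not specified here.

```python
from datetime import datetime, timedelta, timezone


def purge_expired(edges: list, retention_days: int, now: datetime) -> list:
    """Return only the edges observed within the retention window.

    `edges` is a list of dicts with a timezone-aware "observed_at" datetime
    (an assumed shape). Anything older than the cutoff is dropped.
    """
    cutoff = now - timedelta(days=retention_days)
    return [e for e in edges if e["observed_at"] >= cutoff]


# Example: keep 30 days of access-map history.
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
edges = [
    {"agent": "data-export-job", "observed_at": datetime(2024, 5, 25, tzinfo=timezone.utc)},
    {"agent": "data-export-job", "observed_at": datetime(2024, 1, 1, tzinfo=timezone.utc)},
]
kept = purge_expired(edges, retention_days=30, now=now)
```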
Mapping to GDPR Article 28 — honestly
GDPR Article 28 governs the controller–processor relationship and what a Data Processing Agreement must cover. The relevant observation is that in a self-hosted deployment, the usual vendor-as-processor relationship for your operational data largely dissolves: because the tool runs in your infrastructure and never receives that data, in most deployments you remain the controller and processor of your own data within your own environment.
That does not make a DPA pointless. A commercial relationship still benefits from formalising responsibilities: for the software supply chain, for support access, for any future managed component. A Data Processing Agreement under Article 28 is available on request for enterprise procurement. What changes is the scope: there is no list of places your personal data has been shipped, because it was never shipped. That is a much shorter, much more defensible conversation with a DPO or procurement team than “trust our sub-processor list.”
This is a structural argument, so treat the boundaries with the same honesty. Self-hosting moves the residency and processing responsibility to you; it does not remove it. You still secure the host, control retention, and govern who can read the access map — and that map is itself sensitive, which is why every privileged view of it is audited and components authenticate to each other with mutual TLS. The product reduces the vendor surface to near zero; it does not absolve the operator.
The takeaway
If a regulator, a DPO, or your own security team asks “where does our data go when we adopt this AI governance tool,” the strongest possible answer is “nowhere — it never leaves, and the tool never sees it.” That answer comes from architecture: self-hosted execution, edges-not-payloads storage, redaction before write, no telemetry home, and air-gapped operation with zero egress. A certificate can corroborate good process; it cannot match the guarantee of data that was never received.
If you want the full, honest version of this posture — including the not-yet-certified compliance stance and how a GDPR Art. 28 DPA fits — see /security. If you would rather read the code that backs the claim, the complete product is self-hostable under AGPL-3.0 on the /open-source page.