You already run the dashboards for everything else on the estate: hosts, queues, data stores, the request path. Then a dozen AI agents show up — copilots, MCP servers, scheduled jobs, a Claude Code instance someone wired to production — and you have no single place that answers the operator’s first question: what is running, and what can each thing actually reach?
This is the same pattern as unmanaged shadow infrastructure, except the workloads hold credentials and act on their own. Of organizations that suffered an AI-related breach, around 97% lacked proper AI access controls (IBM Cost of a Data Breach 2025, Ponemon). The industry has a word for the cause now — agent sprawl — and Gartner expects more than 40% of agentic-AI projects to be cancelled by the end of 2027 (Gartner), often because no one could operate them safely once they multiplied.
What an operator gets
Olivares AI is an open, self-hostable platform that runs where your agents run. It discovers them, builds a read/write access map of what each one touches, and gives you the controls to govern and audit that access. For an SRE or sysadmin, five properties matter more than the feature list.
Passive — not in the request path
Discovery is built from telemetry and source-native audit out of band. The health and reliability view is a consumer of the event bus, not a prober that opens sockets into your infrastructure — liveness is inferred from observed activity, and an agent that stops emitting is itself a signal. Because Olivares is not a proxy or a sidecar in front of your agents, an Olivares outage cannot take them down. The only surfaces that ever sit inline are the optional actuation gates you choose to wire in, and those fail closed.
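A minimal sketch of what "liveness is inferred from observed activity" can look like in practice. This is illustrative Python, not Olivares code: the `Event` shape, the `infer_liveness` name, and the 5-minute and 30-minute thresholds are all assumptions made for the example. The point is that nothing here opens a connection to an agent; status is derived only from events that have already arrived.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Event:
    """Hypothetical bus event: which agent emitted it, and when it was observed."""
    agent_id: str
    observed_at: datetime


def infer_liveness(events, now,
                   quiet_after=timedelta(minutes=5),
                   silent_after=timedelta(minutes=30)):
    """Classify each agent from its most recent observed event.

    No probing happens: an agent that stops emitting simply ages
    from "active" to "quiet" to "silent". Thresholds are assumed values.
    """
    last_seen = {}
    for ev in events:
        prev = last_seen.get(ev.agent_id)
        if prev is None or ev.observed_at > prev:
            last_seen[ev.agent_id] = ev.observed_at

    status = {}
    for agent_id, seen in last_seen.items():
        age = now - seen
        if age >= silent_after:
            status[agent_id] = "silent"
        elif age >= quiet_after:
            status[agent_id] = "quiet"
        else:
            status[agent_id] = "active"
    return status
```

An agent that has never emitted does not appear in the result at all, which is a limit of any purely passive approach: you can only age out what you have seen at least once.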
One self-hosted static binary
It deploys as a single static binary with the web console embedded — no agent fleet to push, no control-plane Kubernetes operator to babysit. The store is pure-Go SQLite (Postgres for multi-tenant), so there is no C toolchain to fight. You install one file, run it as a dedicated service user, and you have the cockpit. See install & self-host.
The control plane runs inside your perimeter and can run air-gapped — your governance and observation data never leaves. One caveat operators should hold onto: this is not offline AI. Hosted models like Claude still reach the provider’s API; air-gapped means your estate data stays home, not that the model runs locally. Only genuinely self-hostable models (vLLM, Ollama) run fully offline. More in security and architecture.
OTel-native ingest
The ingest path pins the OpenTelemetry GenAI semantic conventions alongside OCSF, the SIEM formats and W3C Trace Context, so the telemetry you already emit is the telemetry it reads. One honest boundary: Olivares correlates your ledger of governed events by W3C trace_id; it does not store full OTel spans. Durations and status on the trace view are ledger-event windows, not reconstructed span data. The full spans stay in your own OTLP collector, where they belong. Olivares tells you who touched what and whether it was permitted; it is not a replacement for your tracing backend. (See the compare page for where it sits next to LiteLLM/Langfuse.)
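To make the "ledger-event window" boundary concrete, here is a small sketch under stated assumptions. The `traceparent` header layout (version, 32-hex trace id, 16-hex parent id, flags) comes from the W3C Trace Context specification; everything else is hypothetical: the function names, the event dictionaries with `trace_id`, `ts` and `decision` keys, and the summary shape are invented for illustration and are not Olivares's schema.

```python
import re
from collections import defaultdict

# Version-00 traceparent: "00-<32 hex trace-id>-<16 hex parent-id>-<2 hex flags>"
_TRACEPARENT = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def trace_id_from_traceparent(header):
    """Return the trace id from a version-00 traceparent header, or None if invalid."""
    match = _TRACEPARENT.match(header.strip())
    if not match:
        return None
    version, trace_id, parent_id, _flags = match.groups()
    # Simplification: only version 00 is handled; all-zero ids are invalid per the spec.
    if version != "00" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return trace_id


def ledger_windows(events):
    """Group hypothetical ledger events by trace_id and summarise each group.

    The window is last-event time minus first-event time. It is not a span
    duration: work before the first or after the last governed event is invisible.
    """
    grouped = defaultdict(list)
    for ev in events:
        grouped[ev["trace_id"]].append(ev)

    windows = {}
    for trace_id, evs in grouped.items():
        first = min(e["ts"] for e in evs)
        last = max(e["ts"] for e in evs)
        windows[trace_id] = {
            "events": len(evs),
            "window_s": last - first,
            "any_denied": any(e["decision"] == "deny" for e in evs),
        }
    return windows
```

A trace with a single governed event gets a window of zero seconds, which is exactly why these numbers should not be read as latency. For real durations, query the tracing backend with the same trace id.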
Probes and health, like any service you run
The binary exposes the endpoints your orchestrator already expects:
- GET /livez: liveness (is the process up)
- GET /readyz: readiness (is it serving, and not a cold standby)
- GET /healthz: setup-exempt liveness for scrapers
- GET /metrics: Prometheus exposition
/readyz returns 503 when the backing store is unreachable rather than hanging, so a load balancer drains the node instead of black-holing it. On top of that, module XXII tracks the health, SLA and uptime of the agents and MCP servers themselves (what is healthy, what is degraded, and what depends on what) and emits down/degraded/recovered/SLA-breach findings to your existing rail (Slack, PagerDuty, SIEM). It produces the signal; it does not try to be your notifier.
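The "503 rather than hanging" behaviour is worth seeing as a pattern, because it is the difference between a node that drains and one that black-holes traffic. The sketch below is an assumption-laden illustration in Python, not the binary's actual handler: `readyz`, `ping_store` and the one-second default are hypothetical names and values. It bounds the store check with a timeout and maps both failure and slowness to 503.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def readyz(ping_store, timeout_s=1.0):
    """Return (status_code, message) for a readiness probe.

    ping_store is any zero-argument callable that raises if the backing
    store cannot be reached. The check runs in a worker thread so a hung
    store cannot hang the probe itself.
    """
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        future = pool.submit(ping_store)
        try:
            future.result(timeout=timeout_s)
        except FutureTimeout:
            # Slow is treated the same as down: answer promptly with 503.
            return 503, "store check timed out"
        except Exception as exc:
            return 503, f"store unreachable: {exc}"
        return 200, "ready"
    finally:
        # Do not wait for a stuck check; the probe has already answered.
        pool.shutdown(wait=False)
```

The design choice is that the probe's own latency is capped by `timeout_s`, so the caller always gets a definite answer it can act on.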
Read-first, with a real estate-wide stop
Olivares is detective by default and deny-closed. It observes and governs broadly; it does not broadly actuate. But when an agent goes wrong at 3am, an operator needs one lever that works without ceremony — so the kill switch is real and built to be used:
- Engaging is cheap. Admin-tier, one mandatory reason, one click. There is no approval gate and no step-up on engage — an emergency stop that waits for quorum is not a stop. The estate-wide scope denies every governed actuation surface for the tenant; the agent scope stops one agent across every surface it can be named on.
- Re-enabling is not. Lifting a stop requires dual-control (two distinct humans) and is followed by a mandatory post-review that a third, uninvolved person must sign. “The estate stays stopped” is the safe state.
- It fails closed. Every actuation gate consults the live stop state per call and denies on a read error — an unreadable stop never means “go”.
Be honest with yourself about scope: the stop is only as wide as the gates you have wired in. It cannot freeze a surface Olivares has no seam into. That is the deliberate trade of a read-first system — and it is why discovery and the map come first.
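The asymmetry described above (cheap to engage, expensive to lift, deny on any doubt) can be sketched in a few lines. This is a hedged illustration only: `StopState`, `gate_allows`, `engage` and `lift_estate_stop` are hypothetical names, the state lives in memory, and the mandatory post-review by a third person is not modelled.

```python
class StopState:
    """Hypothetical in-memory stop state for one tenant."""

    def __init__(self):
        self.estate_stopped = False
        self.stopped_agents = set()


def gate_allows(read_stop_state, agent_id):
    """Consult the live stop state on every call; deny if it cannot be read."""
    try:
        state = read_stop_state()
    except Exception:
        return False  # fail closed: an unreadable stop never means "go"
    if state.estate_stopped or agent_id in state.stopped_agents:
        return False
    return True


def engage(state, reason, agent_id=None):
    """Engage a stop with one mandatory reason and no approval step."""
    if not reason.strip():
        raise ValueError("a reason is mandatory")
    if agent_id is None:
        state.estate_stopped = True  # estate-wide scope
    else:
        state.stopped_agents.add(agent_id)  # single-agent scope


def lift_estate_stop(state, approvers):
    """Lift the estate-wide stop only with two distinct approvers."""
    if len(set(approvers)) < 2:
        raise PermissionError("lifting a stop needs two distinct humans")
    state.estate_stopped = False
```

Note that `gate_allows` takes a reader function rather than a cached state object. That mirrors the per-call consultation in the text: a gate that caches "not stopped" can keep actuating after the stop is engaged.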
What this is — and what it isn’t
- It is a passive, self-hosted cockpit: discover the agents, map their read/write access, watch their health, and keep an estate-wide stop one click away.
- It isn’t an inline proxy, a tracing backend, or a system that silently rewires your agents. Actuation is opt-in, on-demand, and deny-closed — you wire each seam deliberately.
- It is pre-1.0 and open-core. The catalog lists 23 capability modules; roughly twenty are wired today, the rest are design-stage or post-v1. The modules reference marks what is live.
- It isn’t certified. Olivares is designed toward SOC 2, ISO/IEC 42001 and the EU AI Act — it does not hold those certifications, and it will not claim to. See security for the honest posture.
- Fidelity is tiered and shown as such. Read-vs-write coverage is clean on stores with native audit, lossy on some document/vector stores, and unknown where it cannot be reconstructed passively (Redis, SQLite, D1); per-agent attribution is firm only when the signal supports it, approximate behind a shared account. Nothing is guessed. See fidelity.
Where to start
- Stand it up: self-host, then the quickstart for a synthetic estate.
- Wire your first signal: connect a source.
- Understand the model: permitted vs observed and the access map.
- Operator-adjacent personas: platform engineering and security leaders.