
For SRE & sysadmins

The cockpit for the AI agents running on your infrastructure

Discover the AI agents on your hosts, see what each can read and write, and keep an estate-wide stop within reach — from one self-hosted binary that never sits in the request path.

You already run the dashboards for everything else on the estate: hosts, queues, data stores, the request path. Then a dozen AI agents show up — copilots, MCP servers, scheduled jobs, a Claude Code instance someone wired to production — and you have no single place that answers the operator’s first question: what is running, and what can each thing actually reach?

This is the same pattern as unmanaged shadow infrastructure, except the workloads hold credentials and act on their own. Of organizations that suffered an AI-related breach, around 97% lacked proper AI access controls (IBM Cost of a Data Breach 2025, Ponemon). The industry now has a name for the cause, agent sprawl, and Gartner expects more than 40% of agentic-AI projects to be cancelled by the end of 2027, citing escalating costs, unclear business value and inadequate risk controls (Gartner). The last of those is where operators come in: agents multiply faster than anyone can operate them safely.

What an operator gets

Olivares AI is an open, self-hostable platform that runs where your agents run. It discovers them, builds a read/write access map of what each one touches, and gives you the controls to govern and audit that access. For an SRE or sysadmin, five properties matter more than the feature list.

Passive — not in the request path

Discovery is built from telemetry and source-native audit out of band. The health and reliability view is a consumer of the event bus, not a prober that opens sockets into your infrastructure — liveness is inferred from observed activity, and an agent that stops emitting is itself a signal. Because Olivares is not a proxy or a sidecar in front of your agents, an Olivares outage cannot take them down. The only surfaces that ever sit inline are the optional actuation gates you choose to wire in, and those fail closed.

One self-hosted static binary

It deploys as a single static binary with the web console embedded — no agent fleet to push, no control-plane Kubernetes operator to babysit. The store is pure-Go SQLite (Postgres for multi-tenant), so there is no C toolchain to fight. You install one file, run it as a dedicated service user, and you have the cockpit. See install & self-host.

The control plane runs inside your perimeter and can run air-gapped, so your governance and observation data never leaves your network. One caveat operators should hold onto: this is not offline AI. Hosted models like Claude still reach the provider’s API; air-gapped means your estate data stays home, not that the model runs locally. Only models you serve yourself (for example with vLLM or Ollama) run fully offline. More in security and architecture.

OTel-native ingest

The ingest path pins the OpenTelemetry GenAI semantic conventions alongside OCSF, the SIEM formats and W3C Trace Context — so the telemetry you already emit is the telemetry it reads. One honest boundary: Olivares correlates your ledger of governed events by W3C trace_id; it does not store full OTel spans. Durations and status on the trace view are ledger-event windows, not reconstructed span data. The full spans stay in your own OTLP collector, where they belong. Olivares tells you who touched what and whether it was permitted — it is not a replacement for your tracing backend. (See the compare page for where it sits next to LiteLLM/Langfuse.)

Probes and health, like any service you run

The binary exposes the endpoints your orchestrator already expects:

GET /livez     liveness  — is the process up
GET /readyz    readiness — is it serving (and not a cold standby)
GET /healthz   setup-exempt liveness for scrapers
GET /metrics   Prometheus exposition

/readyz returns 503 when the backing store is unreachable rather than hanging, so a load balancer drains the node instead of black-holing it. On top of that, module XXII tracks the health, SLA and uptime of the agents and MCP servers themselves: what is healthy, what is degraded, and what depends on what. It emits down/degraded/recovered/SLA-breach findings to your existing rail (Slack, PagerDuty, SIEM); it produces the signal and does not try to be your notifier.

Read-first, with a real estate-wide stop

Olivares is detective by default and deny-closed. It observes and governs broadly; it does not broadly actuate. But when an agent goes wrong at 3am, an operator needs one lever that works without ceremony — so the kill switch is real and built to be used:

  • Engaging is cheap. Admin-tier, one mandatory reason, one click. There is no approval gate and no step-up on engage — an emergency stop that waits for quorum is not a stop. The estate-wide scope denies every governed actuation surface for the tenant; the agent scope stops one agent across every surface it can be named on.
  • Re-enabling is not. Lifting a stop requires dual-control (two distinct humans) and is followed by a mandatory post-review that a third, uninvolved person must sign. “The estate stays stopped” is the safe state.
  • It fails closed. Every actuation gate consults the live stop state per call and denies on a read error — an unreadable stop never means “go”.
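The fail-closed gate and the two-person lift can be sketched in a few functions. This is a hypothetical model, not Olivares' implementation. `StopReader`, `Allow`, `CheckLift` and `CheckPostReview` are invented names, roles such as admin-tier are not modelled, and the store is in memory. The sketch shows the asymmetry:

- The gate re-reads state on every call and turns any read error into a deny.
- Lifting a stop requires two distinct people.
- The post-review requires a third person who took no part in the lift.

```go
package main

import (
	"errors"
	"fmt"
)

// StopState is the live stop state a gate reads (hypothetical shape).
type StopState struct {
	EstateStopped bool            // tenant-wide: deny every governed surface
	StoppedAgents map[string]bool // agent-scoped: deny one agent everywhere
}

// StopReader is wherever that state lives; Read may fail.
type StopReader interface {
	Read(tenant string) (StopState, error)
}

// Allow is called by an actuation gate on every call, never cached.
// A read error denies: an unreadable stop never means "go".
func Allow(r StopReader, tenant, agent string) (bool, string) {
	st, err := r.Read(tenant)
	if err != nil {
		return false, "deny: stop state unreadable"
	}
	if st.EstateStopped {
		return false, "deny: estate-wide stop"
	}
	if st.StoppedAgents[agent] {
		return false, "deny: agent stop"
	}
	return true, "allow"
}

// CheckLift enforces dual control on re-enable: two distinct, named humans.
func CheckLift(first, second string) error {
	if first == "" || second == "" || first == second {
		return errors.New("lift needs two distinct approvers")
	}
	return nil
}

// CheckPostReview requires a signer who took no part in the lift.
func CheckPostReview(first, second, reviewer string) error {
	if reviewer == "" || reviewer == first || reviewer == second {
		return errors.New("post-review needs a third, uninvolved signer")
	}
	return nil
}

// memStore is an in-memory StopReader for the demo. It ignores the tenant
// (single-tenant demo); failing simulates an unreachable backing store.
type memStore struct {
	state   StopState
	failing bool
}

func (m *memStore) Read(string) (StopState, error) {
	if m.failing {
		return StopState{}, errors.New("store unreachable")
	}
	return m.state, nil
}

func main() {
	s := &memStore{state: StopState{StoppedAgents: map[string]bool{}}}
	fmt.Println(Allow(s, "acme", "claude-code-prod"))
	s.state.StoppedAgents["claude-code-prod"] = true
	fmt.Println(Allow(s, "acme", "claude-code-prod"))
	s.failing = true
	fmt.Println(Allow(s, "acme", "copilot-ci"))
	fmt.Println(CheckLift("alice", "alice"))
}
```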

Be honest with yourself about scope: the stop is only as wide as the gates you have wired in. It cannot freeze a surface Olivares has no seam into. That is the deliberate trade of a read-first system — and it is why discovery and the map come first.

What this is — and what it isn’t

  • It is a passive, self-hosted cockpit: discover the agents, map their read/write access, watch their health, and keep an estate-wide stop one click away.
  • It isn’t an inline proxy, a tracing backend, or a system that silently rewires your agents. Actuation is opt-in, on-demand, and deny-closed — you wire each seam deliberately.
  • It is pre-1.0 and open-core. The catalog lists 23 capability modules; roughly twenty are wired today and the rest are design-stage or post-v1. The modules reference marks what is live.
  • It isn’t certified. Olivares is designed toward SOC 2, ISO/IEC 42001 and the EU AI Act — it does not hold those certifications, and it will not claim to. See security for the honest posture.
  • Fidelity is tiered and shown as such. Read-vs-write coverage is clean on stores with native audit, lossy on some document/vector stores, and unknown where it cannot be reconstructed passively (Redis, SQLite, D1); per-agent attribution is firm only when the signal supports it, approximate behind a shared account. Nothing is guessed. See fidelity.
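One way to keep "nothing is guessed" enforceable is to carry fidelity as explicit labels on every access edge, so no view can silently drop it. The sketch below works under that assumption. `Signal`, `AccessEdge`, `Label` and the store names in `main` are hypothetical. The rule that more than one agent on a credential makes attribution approximate is a simplification.

```go
package main

import "fmt"

// Coverage: how well read vs write can be reconstructed for a store.
type Coverage string

const (
	Clean   Coverage = "clean"   // source-native audit
	Lossy   Coverage = "lossy"   // partial signal, e.g. some document/vector stores
	Unknown Coverage = "unknown" // not reconstructable passively
)

// Attribution: how firmly an access is tied to one agent.
type Attribution string

const (
	Firm        Attribution = "firm"
	Approximate Attribution = "approximate" // several agents share one credential
)

// Signal is what the source actually provides (hypothetical fields).
type Signal struct {
	NativeAudit  bool
	PartialAudit bool
	AgentsOnCred int // distinct agents observed using the credential
}

// AccessEdge is one entry of an access map; fidelity travels with it.
type AccessEdge struct {
	Agent, Store, Mode string
	Coverage           Coverage
	Attribution        Attribution
}

// Label attaches fidelity to an observed access. When coverage is Unknown it
// records the mode as "unknown" instead of guessing read or write.
func Label(agent, store, observedMode string, s Signal) AccessEdge {
	cov := Unknown
	if s.NativeAudit {
		cov = Clean
	} else if s.PartialAudit {
		cov = Lossy
	}
	mode := observedMode
	if cov == Unknown {
		mode = "unknown"
	}
	attr := Firm
	if s.AgentsOnCred > 1 {
		attr = Approximate
	}
	return AccessEdge{Agent: agent, Store: store, Mode: mode, Coverage: cov, Attribution: attr}
}

func main() {
	fmt.Printf("%+v\n", Label("copilot-ci", "orders-db", "write", Signal{NativeAudit: true, AgentsOnCred: 1}))
	fmt.Printf("%+v\n", Label("ops-agent", "session-cache", "read", Signal{AgentsOnCred: 3}))
}
```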


Questions

Does Olivares AI sit in my agents' request path?

No. Discovery and the access map are built from telemetry and source-native audit out of band, so an Olivares outage cannot take your agents down. The only in-path surfaces are the optional, deny-closed actuation gates you explicitly wire in — and an unreadable stop state there fails closed, never open.

Can it run air-gapped?

The Olivares control plane can. Your governance and observation data stays inside your perimeter. The honest caveat is model inference: hosted models like Claude still reach their provider's API; only models you serve yourself (for example with vLLM or Ollama) run fully offline.

What does the kill switch actually stop?

The estate-wide stop denies every governed actuation surface for the tenant; an agent-scoped stop denies one agent. It is read-first by design — it cannot reach a surface you have not wired a gate into. Engaging is cheap and one-click; re-enabling requires dual-control and a mandatory post-review.

Try it on your own infrastructure

Olivares AI is open-core (AGPL-3.0) and self-hosted. Deploy it and see what your agents can reach.