Connect a source

Wire a real observation source into Olivares AI, understand the read-first connector model, and configure pgaudit and s3cloudtrail with the correct source kinds

A source observes one external system and emits normalized observations — it never sits in the data path, proxies traffic, or reads payloads. This page covers the connector model and how to wire a real source through OLIVARES_SOURCES_CONFIG. If you only want to connect a coding agent, start with Connect Claude Code; that is one source on the cooperative path, and this is the model underneath it.

What a source does

A source observes a system and reports what it saw as typed observations. The read/write access map is built from what the source reports, not from intercepting what flows. The engine owns scheduling: a streaming source (a log tail) blocks until cancelled; a batch source does its work and returns, and the engine decides when to run it again.
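The scheduling split described above can be pictured with a minimal Python sketch. It is not the engine's scheduler: `run_batch`, `run_streaming`, and the use of `threading.Event` as the cancel signal are hypothetical names and choices made for illustration only.

```python
import queue
import threading

def run_batch(run_once, poll_seconds, cancel):
    """Run a batch source once, or again on an interval until cancelled.

    Returns the number of runs. The scheduler, not the source, owns the loop.
    """
    runs = 0
    while not cancel.is_set():
        run_once()                  # the source does its work and returns
        runs += 1
        if poll_seconds is None:    # no interval configured: run once
            break
        cancel.wait(poll_seconds)   # sleeps, but wakes early on cancel
    return runs

def run_streaming(lines, emit, cancel):
    """Block on a line queue (standing in for a log tail) until cancelled."""
    while not cancel.is_set():
        try:
            line = lines.get(timeout=0.05)
        except queue.Empty:
            continue                # nothing new yet; keep waiting
        emit(line)
```

In this sketch a batch source is just a callable, and the only thing a streaming source needs from its caller is the cancel signal.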

An observation carries only identifiers and a read/write classification — never SQL bodies, request payloads, secrets, or PII. That is a property of the wire vocabulary the connector speaks, not a setting you can toggle. See Permitted vs observed for how these observations land on the map.

Every edge records which source produced it and a confidence level, and the product shows both. Attribution is firm when the access is tied to a per-agent identity and approximate when it is inferred or lossy (a shared service account, a pooled connection). The access mode is one of unknown, read, write, or readwrite; unknown is explicit and never guessed. See Fidelity.
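As a rough illustration of what an edge carries, here is a Python sketch of the shape described above. The class and field names (`Edge`, `origin`, `target`, `parse_mode`) are hypothetical and are not the connector wire format; only the four mode values and the firm/approximate distinction come from this page.

```python
from dataclasses import dataclass
from enum import Enum

class Mode(str, Enum):
    UNKNOWN = "unknown"
    READ = "read"
    WRITE = "write"
    READWRITE = "readwrite"

class Confidence(str, Enum):
    FIRM = "firm"
    APPROXIMATE = "approximate"

@dataclass(frozen=True)
class Edge:
    """Identifiers plus a classification; no SQL bodies, payloads, or secrets."""
    source: str                                   # which source produced the edge
    origin: str                                   # who accessed (identifier only)
    target: str                                   # what was accessed (identifier only)
    mode: Mode = Mode.UNKNOWN                     # unknown unless the source reported a mode
    confidence: Confidence = Confidence.APPROXIMATE

def parse_mode(raw):
    """Map a reported value to a Mode; anything unrecognized stays unknown."""
    try:
        return Mode(raw)
    except ValueError:
        return Mode.UNKNOWN
```

Defaulting to `unknown` and `approximate` in the sketch mirrors the rule that a mode or attribution is only upgraded when the source actually reported it.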

Where it runs

The collectors that run these sources always run on your infrastructure. The control plane that ingests them can be a single self-hosted binary, a distributed deployment, or air-gapped — the observed estate’s data never leaves your boundary. See Self-host and the architecture overview.

The config file

Real (non-demo) sources are wired from a single operator config file named by the OLIVARES_SOURCES_CONFIG environment variable, read before the engine starts. It is a JSON document that declares a list of sources. Each entry selects a connector by kind, names the tenant its observations belong to, gives the source a name, and carries the connector’s own config. An optional poll_seconds re-runs a batch source on an interval; a streaming source ignores it.

The source kinds are registered names in the engine. The two clean-tier file observers are pgaudit (PostgreSQL) and s3cloudtrail (AWS S3). Use those exact strings — earlier docs that wrote pg_audit or cloudtrail were wrong and those strings do not resolve.
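Before pointing the engine at a config file, a quick lint can catch a misspelled kind. The following Python sketch is not part of Olivares AI: `lint_sources` is a hypothetical helper, the kind list holds only the two kinds documented on this page, and treating name, kind, tenant, and config as all required is an assumption drawn from the description above.

```python
import json

# Only the two kinds documented on this page; a real registry holds more.
DOCUMENTED_KINDS = {"pgaudit", "s3cloudtrail"}
# Assumption: every entry needs all four of these keys.
ENTRY_KEYS = ("name", "kind", "tenant", "config")

def lint_sources(text):
    """Return a list of human-readable problems found in a sources config."""
    problems = []
    doc = json.loads(text)
    for i, entry in enumerate(doc.get("sources", [])):
        for key in ENTRY_KEYS:
            if key not in entry:
                problems.append(f"sources[{i}]: missing {key!r}")
        kind = entry.get("kind")
        if kind is not None and kind not in DOCUMENTED_KINDS:
            problems.append(f"sources[{i}]: kind {kind!r} is not a documented kind")
    return problems
```

Run against a file that still says pg_audit, the sketch reports the entry index and the offending string, which is easier to act on than a kind that silently fails to resolve.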

A real pgaudit source

The pgAudit source tails PostgreSQL’s structured audit log and emits one edge per audited data access. The read/write mode is taken verbatim from pgAudit’s class (READ, WRITE, DDL) — never inferred from the SQL text. It is read-only over the log file and never connects to the database.

{
  "sources": [
    {
      "name": "prod-postgres",
      "kind": "pgaudit",
      "tenant": "acme",
      "config": {
        "log_path": "/var/log/postgresql/postgresql.json",
        "format": "jsonlog",
        "follow": "true",
        "shared_accounts": "app_pool,reporting"
      }
    }
  ]
}

The config values are strings. The keys above are owned by the pgAudit connector:

  • log_path (required) — path to the PostgreSQL log file to read.
  • format — csvlog or jsonlog; defaults to csvlog.
  • follow — tail continuously. This applies to jsonlog only; a csvlog file is read as a batch because its records can span newlines.
  • shared_accounts — comma-separated roles or application_names that are pooled or shared. Access attributed to one of these is marked approximate, deliberately, because the trail cannot separate the real callers behind a shared identity.

A distinguishing application_name is the per-agent bridge that earns a firm edge. If many agents share one role or connection pool, every access collapses onto that identity and attribution becomes approximate — the product says so rather than pretending it can tell the agents apart.
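To make the attribution rule concrete, here is a Python sketch of how one jsonlog line could become an edge. It is not the connector's code. It assumes the PostgreSQL jsonlog keys user, application_name, and message, and the comma-separated "AUDIT: ..." payload documented in the pgAudit README; the function names, the returned dict, the fallback to unknown for classes other than READ and WRITE, and treating a missing application_name as approximate are all assumptions of the sketch.

```python
import csv
import json

# Only READ and WRITE are mapped here. How the real connector classifies other
# pgAudit classes (for example DDL) is not covered, so this sketch uses "unknown".
CLASS_TO_MODE = {"READ": "read", "WRITE": "write"}
AUDIT_PREFIX = "AUDIT: "

def parse_shared_accounts(raw):
    """Split the comma-separated shared_accounts config string into a set."""
    return {part.strip() for part in raw.split(",") if part.strip()}

def observe(json_line, shared):
    """Turn one jsonlog line into an edge-like dict, or None if it is not an audit entry."""
    rec = json.loads(json_line)
    message = rec.get("message", "")
    if not message.startswith(AUDIT_PREFIX):
        return None
    # pgAudit payload order: audit type, statement id, substatement id, class,
    # command, object type, object name, statement, parameters.
    fields = next(csv.reader([message[len(AUDIT_PREFIX):]]))
    if len(fields) < 7:
        return None
    role = rec.get("user")
    app = rec.get("application_name") or ""
    # A shared role or application_name cannot separate the callers behind it.
    approximate = role in shared or app in shared or not app
    return {
        "origin": app or role,
        "object": fields[6],  # identifier only; the statement text is dropped
        "mode": CLASS_TO_MODE.get(fields[3], "unknown"),
        "confidence": "approximate" if approximate else "firm",
    }
```

The mode comes from the class field alone; the statement text is parsed past but never read or kept, which is the property the sketch is meant to show.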

A real s3cloudtrail source

The CloudTrail source reads AWS CloudTrail log files and emits one edge per S3 event, taking read/write verbatim from CloudTrail’s readOnly field. The origin is the IAM principal; an assumed role shared across callers is marked approximate.

{
  "sources": [
    {
      "name": "prod-s3",
      "kind": "s3cloudtrail",
      "tenant": "acme",
      "config": {
        "path": "/var/log/cloudtrail/",
        "shared_accounts": "shared-pipeline-role"
      }
    }
  ]
}

The path key (required) is a CloudTrail log file or a directory of *.json / *.json.gz files. shared_accounts behaves as it does for pgAudit.
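The same idea for CloudTrail, as a Python sketch rather than the connector's implementation. It relies on standard CloudTrail record fields (a top-level Records array, eventSource, readOnly, userIdentity, requestParameters.bucketName); the function names, the returned dict, and matching shared_accounts against the assumed role's sessionIssuer.userName are assumptions made for illustration.

```python
import gzip
import json
from pathlib import Path

def iter_records(path):
    """Yield records from one CloudTrail log file or a directory of *.json / *.json.gz files."""
    root = Path(path)
    if root.is_dir():
        files = sorted(f for f in root.iterdir() if f.name.endswith((".json", ".json.gz")))
    else:
        files = [root]
    for f in files:
        opener = gzip.open if f.name.endswith(".gz") else open
        with opener(f, "rt", encoding="utf-8") as fh:
            # CloudTrail log files wrap events in a top-level "Records" array.
            yield from json.load(fh).get("Records", [])

def observe_s3(record, shared):
    """Map one record to an edge-like dict, or None for non-S3 events."""
    if record.get("eventSource") != "s3.amazonaws.com":
        return None
    read_only = record.get("readOnly")
    if read_only is True:
        mode = "read"
    elif read_only is False:
        mode = "write"
    else:
        mode = "unknown"  # field absent: stay explicit rather than guess
    identity = record.get("userIdentity", {})
    # For an assumed role, sessionIssuer.userName carries the role name.
    role = identity.get("sessionContext", {}).get("sessionIssuer", {}).get("userName")
    return {
        "origin": identity.get("arn"),
        "bucket": (record.get("requestParameters") or {}).get("bucketName"),
        "mode": mode,
        "approximate": role in shared,
    }
```

Only identifiers leave the function: the principal ARN, the bucket name, and the read/write flag. Object contents are never touched because CloudTrail records do not contain them.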

When nothing is wired, the engine warns

The engine fails safe rather than aborting:

  • If OLIVARES_SOURCES_CONFIG is unset, the engine starts with no sources.
  • If the file is missing, unreadable, or not valid JSON, the engine warns and continues with no sources — it does not crash on boot.
  • If the source list is empty, it warns that no connector will ingest and that the estate is running on no live traffic.

In every case the boot log tells you plainly that nothing real is wired. An empty access map should never look like a clean one.
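The three cases above can be checked from a deploy script before the engine starts. This Python sketch only mirrors the documented behavior; `preflight` and its messages are hypothetical, not the engine's loader or its actual log lines.

```python
import json
import os

def preflight(environ=None):
    """Return (sources, warning). A warning of None means sources are wired."""
    environ = os.environ if environ is None else environ
    path = environ.get("OLIVARES_SOURCES_CONFIG")
    if not path:
        return [], "OLIVARES_SOURCES_CONFIG is unset: no sources will run"
    try:
        with open(path, encoding="utf-8") as fh:
            doc = json.load(fh)
    except (OSError, ValueError) as exc:
        # Missing, unreadable, or invalid JSON: warn and continue with nothing.
        return [], f"cannot load {path}: {exc}"
    sources = doc.get("sources") if isinstance(doc, dict) else None
    if not sources:
        return [], "source list is empty: no connector will ingest"
    return sources, None
```

A script might print the warning and exit non-zero when it is not None, so that an unwired estate is caught before anyone reads an empty access map as a clean one.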

Not every in-tree connector is wired into stock serve

Olivares AI ships more connectors in-tree than a stock serve binary registers as selectable source kinds. Wired into stock serve are the file observers pgaudit and s3cloudtrail, the kernel backstop ebpf, the host runtime reader, and mcp introspection, alongside a set of data-platform, secrets, network, and identity observers. Other connectors exist in the tree but are not yet wired into the stock serve source registry; that is a tracked follow-up, not a claim that everything is selectable today. If a kind you expect does not resolve, treat it as not yet wired rather than misconfigured, and confirm against the connector’s own descriptor before relying on it.

This page describes only keys that are verified against the connectors above. The exact config keys for any other connector are owned by that connector; read its descriptor rather than copy an unverified schema.