
Provenance Graphs

Provenance graphs are causal decision DAGs that capture why an agent made each change, not just what changed. Every tool call is classified, linked to the agent's goal, and stored as a content-addressed artifact alongside the changes it explains.

What Is a Provenance Graph?​

When an agent works on a task, it follows a pattern: understand the goal, read code to orient, make edits, then verify. Atomic captures this pattern as a directed acyclic graph with typed nodes and causal edges.

Goal: "Fix the authentication bug"
│
├──led_to──▶ Exploration: read src/auth.rs
├──led_to──▶ Exploration: grep "verify_token"
│       │
│       ├──explored_via──▶ Commitment: edit src/auth.rs
│       │
│       ├──verified_by──▶ Verification: bash "cargo test"
│       │
│       └──committed_via──▶ PatchProposal: Change XMJZ3IPF (2 files)
│
└──led_to──▶ Goal: "Add test coverage" (next turn)

Each node has a timestamp, tool name, duration, and summary. Each edge has a kind that describes the causal relationship. The graph is built incrementally as tool calls arrive and saved at the end of each turn.

Node Types​

Tool calls are classified into node types by a rule-based classifier that examines the tool name, input, and output:

| Node Type | Description | Example Tools |
| --- | --- | --- |
| Goal | Human prompt that starts a turn | User message |
| Exploration | Read-only operations to understand code | read, grep, list_directory, glob |
| Commitment | File-modifying operations | edit, write, edit_file, create_file |
| Verification | Test or validation operations | bash (with test, check, lint in command) |
| Execution | Non-test shell commands | bash (with install, build, run in command) |
| Error | Failed operations | Any tool with error status |
| HumanGate | Permission requested from user | Approval prompts |
| PatchProposal | A recorded Atomic change | Created when record_turn() succeeds |
| Decision | Consolidated reasoning node | Created by post-hoc consolidation |

Classification Rules​

The classifier uses the tool name as the primary signal, with input/output inspection for disambiguation:

  • read, grep, glob, list_directory → always Exploration
  • edit, write, edit_file, create_file → always Commitment
  • bash / terminal → inspects the command string:
    • Contains test, check, lint, clippy, pytest, jest, cargo test → Verification
    • Contains install, build, compile, run, start → Execution
    • Otherwise → Exploration (read-only shell command)
  • Error status on any tool → Error
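The rules above can be sketched as a single function. This is a minimal illustration, not the actual classifier: the `classify` signature and the trimmed-down `NodeKind` enum are hypothetical, and the rule order (error status first, verification keywords before execution keywords, unknown tools treated as read-only) is an assumption the rules do not spell out.

```rust
/// Node kinds the rule-based classifier can produce (subset of the node type table).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NodeKind {
    Exploration,
    Commitment,
    Verification,
    Execution,
    Error,
}

// Keyword lists copied from the classification rules.
const VERIFY_WORDS: [&str; 7] = ["test", "check", "lint", "clippy", "pytest", "jest", "cargo test"];
const EXEC_WORDS: [&str; 5] = ["install", "build", "compile", "run", "start"];

/// Hypothetical entry point. `command` is the shell command for bash/terminal
/// tools and is ignored for every other tool.
pub fn classify(tool_name: &str, command: &str, is_error: bool) -> NodeKind {
    // Assumption: an error status overrides every other rule.
    if is_error {
        return NodeKind::Error;
    }
    match tool_name {
        "read" | "grep" | "glob" | "list_directory" => NodeKind::Exploration,
        "edit" | "write" | "edit_file" | "create_file" => NodeKind::Commitment,
        "bash" | "terminal" => {
            // Assumption: verification keywords are checked before execution keywords.
            if VERIFY_WORDS.iter().any(|w| command.contains(*w)) {
                NodeKind::Verification
            } else if EXEC_WORDS.iter().any(|w| command.contains(*w)) {
                NodeKind::Execution
            } else {
                // Any other shell command counts as read-only.
                NodeKind::Exploration
            }
        }
        // Assumption: tools not named in the rules are treated as read-only.
        _ => NodeKind::Exploration,
    }
}
```

Because the sketch matches plain substrings, a command such as `cargo test` is already caught by the `test` keyword, and a command containing both a verification and an execution keyword lands on Verification under the assumed ordering.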

Edge Types​

Edges are inferred automatically from the sequence of events and the cursor state (current goal, pending explorations, last commitment):

| Edge Kind | Meaning | When Created |
| --- | --- | --- |
| LedTo | Goal initiated this action | Goal → Exploration; Goal → Commitment (when no explorations precede it) |
| ExploredVia | Explorations informed this commitment | Exploration → Commitment |
| VerifiedBy | Commitment was validated | Commitment → Verification |
| CommittedVia | Commitments became this patch | Commitment → PatchProposal |
| FailedWith | Previous action caused this error | Any node → Error |
| BlockedBy | Action was blocked by human gate | Any node → HumanGate |
| ResumedAfter | Goal resumed after a gate was resolved | HumanGate → Goal |

Edge Inference Example​

append_goal("Fix the auth bug")          → Goal node created
append_tool_call("read", "src/auth.rs")  → Exploration, edge: Goal --led_to--> Exploration
append_tool_call("grep", "verify_token") → Exploration, edge: Goal --led_to--> Exploration
append_tool_call("edit", "src/auth.rs")  → Commitment, edges: Exploration --explored_via--> Commitment (×2)
append_tool_call("bash", "cargo test")   → Verification, edge: Commitment --verified_by--> Verification
append_patch_proposal("XMJZ3IPF", ...)   → PatchProposal, edge: Commitment --committed_via--> PatchProposal

The pending explorations list is cleared when a commitment arrives, so each commitment captures exactly which explorations informed it.
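A minimal sketch of how the cursor state might drive edge inference is shown here. The `Cursor` type, its field names, and the `append` method are hypothetical stand-ins for the accumulator's internals. Only the four edge kinds from the example are modelled, and what a new goal resets beyond the current goal is not specified in the text, so the sketch leaves the rest untouched.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Kind {
    Goal,
    Exploration,
    Commitment,
    Verification,
    PatchProposal,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EdgeKind {
    LedTo,
    ExploredVia,
    VerifiedBy,
    CommittedVia,
}

/// Cursor state named in the text: current goal, pending explorations, last commitment.
#[derive(Default)]
pub struct Cursor {
    pub next_id: usize,
    pub current_goal: Option<usize>,
    pub pending_explorations: Vec<usize>,
    pub last_commitment: Option<usize>,
    /// Inferred edges as (from, to, kind).
    pub edges: Vec<(usize, usize, EdgeKind)>,
}

impl Cursor {
    /// Append a node of `kind`, infer its incoming edges, and return its ID.
    pub fn append(&mut self, kind: Kind) -> usize {
        let id = self.next_id;
        self.next_id += 1;
        match kind {
            Kind::Goal => self.current_goal = Some(id),
            Kind::Exploration => {
                if let Some(goal) = self.current_goal {
                    self.edges.push((goal, id, EdgeKind::LedTo));
                }
                self.pending_explorations.push(id);
            }
            Kind::Commitment => {
                if self.pending_explorations.is_empty() {
                    // No explorations precede this commitment: link it to the goal.
                    if let Some(goal) = self.current_goal {
                        self.edges.push((goal, id, EdgeKind::LedTo));
                    }
                }
                // Draining clears the pending list, so the next commitment starts fresh.
                for exploration in self.pending_explorations.drain(..) {
                    self.edges.push((exploration, id, EdgeKind::ExploredVia));
                }
                self.last_commitment = Some(id);
            }
            Kind::Verification => {
                if let Some(commit) = self.last_commitment {
                    self.edges.push((commit, id, EdgeKind::VerifiedBy));
                }
            }
            Kind::PatchProposal => {
                if let Some(commit) = self.last_commitment {
                    self.edges.push((commit, id, EdgeKind::CommittedVia));
                }
            }
        }
        id
    }
}
```

Replaying the sequence from the example (goal, two explorations, commitment, verification, patch proposal) through `append` yields two LedTo edges, two ExploredVia edges, one VerifiedBy edge, and one CommittedVia edge.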

How Provenance Graphs Are Built​

The ProvenanceAccumulator maintains an in-memory graph for each session. Because each hook invocation is a separate process, the accumulator is persisted to disk between invocations:

.atomic/sessions/{session_id}/graph.json

Lifecycle​

  1. session-start: Session created, accumulator initialized (empty graph)
  2. user-prompt (TurnStart): Accumulator loaded from disk, Goal node appended, saved back
  3. after-tool (PostToolUse): Accumulator loaded, tool call node appended (classified), saved back
  4. stop (TurnEnd): If a change was recorded:
    • Accumulator loaded
    • PatchProposal node appended
    • Graph converted to content-addressed ProvenanceGraph
    • Saved to repository via repo.save_provenance_graph()
    • last_provenance_hash updated for chaining
    • Accumulator saved back to disk
  5. session-end: Attestation created (provenance graph data is already saved)
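Every hook in this lifecycle follows the same load, modify, save cycle. The sketch below illustrates that cycle using only the standard library. `graph_path` and `update_persisted` are hypothetical helpers, and the state is kept as an opaque `String` where the real accumulator would be deserialized from and serialized back to JSON. Writing through a temporary file is an added safeguard in this sketch, not something the lifecycle describes.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Path layout from the text: .atomic/sessions/{session_id}/graph.json
pub fn graph_path(repo_root: &Path, session_id: &str) -> PathBuf {
    repo_root
        .join(".atomic")
        .join("sessions")
        .join(session_id)
        .join("graph.json")
}

/// Load the persisted state (or start empty), let `update` modify it, then save it back.
/// Each hook invocation would call this once before the process exits.
pub fn update_persisted<F>(path: &Path, update: F) -> io::Result<()>
where
    F: FnOnce(&mut String),
{
    let mut state = match fs::read_to_string(path) {
        Ok(contents) => contents,
        // First invocation of the session: nothing on disk yet.
        Err(e) if e.kind() == io::ErrorKind::NotFound => String::new(),
        Err(e) => return Err(e),
    };
    update(&mut state);
    if let Some(parent) = path.parent() {
        fs::create_dir_all(parent)?;
    }
    // Write to a sibling file, then rename over the target so a crash
    // mid-write does not leave a truncated graph.json behind.
    let tmp = path.with_extension("json.tmp");
    fs::write(&tmp, state)?;
    fs::rename(&tmp, path)
}
```

Two consecutive calls against the same path accumulate their changes, which mirrors how separate hook processes build up one session graph.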

Multi-Turn Chaining​

Each turn's ProvenanceGraph is a self-contained artifact with a previous field pointing to the prior turn's graph hash. This creates a chain:

Turn 1 graph (hash: ABC123)  ←  Turn 2 graph (hash: DEF456, previous: ABC123)  ←  Turn 3 graph (...)

The accumulator maintains the full session graph across turns. Each turn's saved ProvenanceGraph contains the complete graph up to that point, not just the delta.
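Walking the chain backwards from the latest graph could look like the following sketch. `SavedGraph` and `chain_from` are hypothetical names, hashes are represented as plain strings, and an in-memory map stands in for the repository lookup.

```rust
use std::collections::HashMap;

/// Minimal stand-in for a saved graph: only the chaining field is modelled.
pub struct SavedGraph {
    pub previous: Option<String>,
}

/// Follow `previous` links from `head` back to the first turn.
/// Returns hashes newest-first and stops early if a hash is missing from `store`.
pub fn chain_from(store: &HashMap<String, SavedGraph>, head: &str) -> Vec<String> {
    let mut out = Vec::new();
    let mut cursor = Some(head.to_string());
    while let Some(hash) = cursor {
        let Some(graph) = store.get(&hash) else { break };
        cursor = graph.previous.clone();
        out.push(hash);
    }
    out
}
```

Since each saved graph already contains the complete session graph up to its turn, a reader interested only in the final state needs just the head of the chain; the walk is useful for recovering per-turn snapshots.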

Storage​

On Disk​

Provenance graphs are stored alongside changes in the two-level directory structure:

.atomic/changes/
├── AB/
│   ├── ABCDEF1234567890.change      # A change file
│   ├── ABCDEF1234567890.attest      # An attestation
│   └── AB9876FEDCBA5432.provenance  # A provenance graph
└── XM/
    └── XMJZ3IPF...........provenance  # Another provenance graph

The .provenance extension distinguishes them from .change and .attest files.

Content Addressing​

Like changes and attestations, provenance graphs are content-addressed:

hash = blake3(serialized_graph)
path = .atomic/changes/{hash[0:2]}/{hash}.provenance

The graph is serialized with postcard for compact binary representation.
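The path rule can be illustrated with a small helper. This sketch covers only the path derivation and takes the hash as an already-encoded string; computing the hash itself would need the blake3 and postcard crates, which are left out to keep the example dependency-free. `provenance_path` is a hypothetical name.

```rust
use std::path::PathBuf;

/// Build the on-disk path for a provenance graph from its hash string.
/// Returns None when the hash is too short to supply the two-character directory.
pub fn provenance_path(hash: &str) -> Option<PathBuf> {
    // The first two characters of the hash name the first-level directory.
    let prefix = hash.get(0..2)?;
    Some(
        PathBuf::from(".atomic")
            .join("changes")
            .join(prefix)
            .join(format!("{hash}.provenance")),
    )
}
```

For the hash `AB9876FEDCBA5432` from the directory listing, this produces `.atomic/changes/AB/AB9876FEDCBA5432.provenance`.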

Push and Pull​

Provenance graphs travel with the changes they explain. When you push:

  1. Atomic uploads the changes
  2. For each pushed change, finds provenance graphs that reference it
  3. Uploads provenance graphs where all explained changes have been pushed

$ atomic push origin

✓ Pushed 2 changes
✓ XMJZ3IPF provenance (7 nodes, 1 change)
✓ R3KQP7YN provenance (12 nodes, 1 change)
✓ ABCDEF12 attestation ($0.12, 2 covered)

The server stores them and serves them to the web UI for visualization.
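The selection rule in step 3 of the push sequence can be sketched as a filter. `GraphRef` and `graphs_to_upload` are hypothetical names, and the sketch assumes `pushed` holds every change hash known to be on the remote once the upload in step 1 has finished.

```rust
use std::collections::HashSet;

/// Minimal view of a stored provenance graph for push selection.
pub struct GraphRef {
    pub hash: String,
    pub changes_explained: Vec<String>,
}

/// Return the hashes of graphs whose explained changes are all present in `pushed`.
/// Graphs that explain no change at all are skipped.
pub fn graphs_to_upload<'a>(graphs: &'a [GraphRef], pushed: &HashSet<String>) -> Vec<&'a str> {
    graphs
        .iter()
        .filter(|g| {
            !g.changes_explained.is_empty()
                && g.changes_explained.iter().all(|c| pushed.contains(c))
        })
        .map(|g| g.hash.as_str())
        .collect()
}
```

A graph that explains one pushed and one unpushed change is held back, so the remote never receives a graph that points at a change it does not have.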

Viewing Provenance Graphs​

CLI​

Provenance data is embedded in changes and visible through existing commands:

# See provenance metadata on each change
atomic log --verbose

# Inspect a specific change's provenance
atomic change <hash> --show-provenance

# List provenance graphs for a session
# (via the attestation which references covered changes)
atomic agent attest --hash <prefix> --verbose

Web UI​

The Atomic web UI renders provenance graphs as interactive visualizations on the Attestations tab. Each node is clickable, showing tool details, duration, and the causal chain that led to each change.

Data Model​

ProvenanceGraph (atomic-core)​

The content-addressed artifact stored in the repository:

| Field | Type | Description |
| --- | --- | --- |
| session_id | String | Session this graph belongs to |
| agent_name | String | Agent registry key (e.g., opencode) |
| agent_display_name | String | Human-readable name (e.g., OpenCode) |
| agent_vendor | String | Provider (e.g., anthropic) |
| nodes | Vec<ProvenanceNode> | All nodes in the graph |
| edges | Vec<ProvenanceEdge> | All causal edges |
| changes_explained | Vec<Hash> | Change hashes this graph explains |
| previous | Option<Hash> | Hash of prior graph in this session (for chaining) |

ProvenanceNode​

| Field | Type | Description |
| --- | --- | --- |
| id | String | Unique node ID (session prefix + counter) |
| kind | NodeKind | Goal, Exploration, Commitment, Verification, etc. |
| timestamp | i64 | Unix timestamp |
| summary | String | Human-readable description |
| tool_name | Option<String> | Tool that produced this node |
| tool_call_id | Option<String> | Unique tool invocation ID |
| duration_ms | Option<u64> | Tool execution time |
| change_hash | Option<Hash> | For PatchProposal nodes |
| detail | Option<String> | JSON detail (files, command, etc.) |

ProvenanceEdge​

| Field | Type | Description |
| --- | --- | --- |
| from | String | Source node ID |
| to | String | Target node ID |
| kind | EdgeKind | LedTo, ExploredVia, VerifiedBy, CommittedVia, etc. |
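Taken together, the three tables correspond roughly to the Rust definitions below. This is a sketch reconstructed from the field tables, not the atomic-core source: `Hash` is aliased to `String` as a placeholder, the derive lists are guesses, and `ProvenanceNode::new` is a hypothetical convenience constructor.

```rust
/// Placeholder for the repository's hash type (assumption).
pub type Hash = String;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NodeKind {
    Goal,
    Exploration,
    Commitment,
    Verification,
    Execution,
    Error,
    HumanGate,
    PatchProposal,
    Decision,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EdgeKind {
    LedTo,
    ExploredVia,
    VerifiedBy,
    CommittedVia,
    FailedWith,
    BlockedBy,
    ResumedAfter,
}

#[derive(Debug, Clone, PartialEq)]
pub struct ProvenanceNode {
    pub id: String,
    pub kind: NodeKind,
    pub timestamp: i64,
    pub summary: String,
    pub tool_name: Option<String>,
    pub tool_call_id: Option<String>,
    pub duration_ms: Option<u64>,
    pub change_hash: Option<Hash>,
    pub detail: Option<String>,
}

impl ProvenanceNode {
    /// Hypothetical constructor: fills the required fields, leaves the optional ones empty.
    pub fn new(id: &str, kind: NodeKind, timestamp: i64, summary: &str) -> Self {
        Self {
            id: id.to_string(),
            kind,
            timestamp,
            summary: summary.to_string(),
            tool_name: None,
            tool_call_id: None,
            duration_ms: None,
            change_hash: None,
            detail: None,
        }
    }
}

#[derive(Debug, Clone, PartialEq)]
pub struct ProvenanceEdge {
    pub from: String,
    pub to: String,
    pub kind: EdgeKind,
}

#[derive(Debug, Clone, PartialEq)]
pub struct ProvenanceGraph {
    pub session_id: String,
    pub agent_name: String,
    pub agent_display_name: String,
    pub agent_vendor: String,
    pub nodes: Vec<ProvenanceNode>,
    pub edges: Vec<ProvenanceEdge>,
    pub changes_explained: Vec<Hash>,
    pub previous: Option<Hash>,
}
```

Edges refer to nodes by string ID rather than by index, which keeps a serialized graph readable on its own.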

Compaction Context​

When OpenCode compacts a conversation to fit the context window, the provenance graph is injected as a structured summary. This preserves the agent's decision history across compaction boundaries:

## Session Provenance (12 nodes)

### Goals
- Fix the authentication bug
- Add test coverage

### Decisions
- Read src/auth.rs, grep verify_token → edit src/auth.rs
- Run cargo test → passed

### Patches
- Change XMJZ3IPF: src/auth.rs, src/auth/tests.rs

This keeps the agent oriented about what it has already explored and committed, even after the raw conversation is compacted away.
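A partial sketch of how such a summary might be rendered is given below. It covers only the header, Goals, and Patches sections; the Decisions section depends on post-hoc consolidation and is omitted. `Entry` and `render_summary` are hypothetical names, not part of Atomic's API.

```rust
/// Simplified view of a graph node for summary rendering.
pub enum Entry {
    Goal(String),
    Patch { change: String, files: Vec<String> },
    /// Any node kind that this sketch does not render.
    Other,
}

/// Render the header, Goals, and Patches sections of the compaction summary.
pub fn render_summary(entries: &[Entry]) -> String {
    let mut out = format!("## Session Provenance ({} nodes)\n", entries.len());
    out.push_str("\n### Goals\n");
    for entry in entries {
        if let Entry::Goal(text) = entry {
            out.push_str(&format!("- {text}\n"));
        }
    }
    out.push_str("\n### Patches\n");
    for entry in entries {
        if let Entry::Patch { change, files } = entry {
            out.push_str(&format!("- Change {change}: {}\n", files.join(", ")));
        }
    }
    out
}
```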

See Also​