Manifest Nodes: Sapling-Inspired Query Optimization

Status: 🔍 Investigating
Proposed: December 2025
Goal: Address scaling challenges with many small operations by adding Manifest nodes as first-class DAG citizens

Problem Statement

Atomic's pure patch-based model provides mathematical correctness but faces scaling challenges with frequent, small incremental changes:

Query Performance: File existence/content queries require O(changes) traversal
Context Calculation: O(contexts × edges) complexity per hunk
Write Amplification: Copy-on-write B-tree overhead (2-5×) for many small writes
Transaction Commits: Each change requires commit overhead (1-10ms)

While local database access eliminates network latency, the fundamental algorithmic complexity remains.

Proposed Solution: Manifest Nodes

Extend Atomic's unified DAG model to include Manifest nodes as a third first-class node type alongside Changes and Tags, inspired by Sapling's manifest system while preserving Atomic's mathematical foundations.

Three-Node DAG Architecture

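// PartialEq is required for comparisons such as `node_type == NodeType::Manifest`
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
pub enum NodeType {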
    Change = 0,     // Semantic patch with hunks
    Tag = 1,        // Consolidating snapshot (dependency consolidation)
    Manifest = 2,  // File tree snapshot (query optimization)
}

Manifest Node Structure

/// Manifest node: File tree snapshot for O(1) queries
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq)]
pub struct Manifest {
    /// Hash of the manifest (content-addressed)
    pub manifest_hash: Hash,
    
    /// Merkle hash representing the channel state
    pub state: Merkle,
    
    /// Channel this manifest belongs to
    pub channel: String,
    
    /// File tree snapshot: path -> file metadata
    pub file_tree: HashMap<String, FileManifestEntry>,
    
    /// Optional: Reference to tag that created this manifest
    pub created_from_tag: Option<Hash>,
    
    /// Optional: Reference to change that created this manifest
    pub created_from_change: Option<Hash>,
    
    /// Timestamp when manifest was created
    pub created_at: u64,
    
    /// AI attestation metadata (if AI generated this file tree)
    pub ai_attestation: Option<AIAttestation>,
}

#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq)]
pub struct FileManifestEntry {
    /// Content hash of the file
    pub file_hash: Hash,
    
    /// Inode reference in the DAG
    pub inode: Inode,
    
    /// File size in bytes
    pub size: u64,
    
    /// Whether this is a directory
    pub is_directory: bool,
    
    /// Optional: Merkle hash of directory contents
    pub directory_state: Option<Merkle>,
}
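
Because `file_tree` is a `HashMap`, its iteration order is unspecified, so a content-addressed `manifest_hash` has to be computed over the entries in some canonical order. The sketch below shows one way this might look. It assumes a simplified `Entry` type and uses 64-bit FNV-1a as a stand-in for Atomic's real `Hash`; both are illustrative assumptions, not the project's implementation.

```rust
use std::collections::HashMap;

/// Simplified stand-in for `FileManifestEntry` (hypothetical; the proposed
/// type also carries an inode and an optional directory state).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Entry {
    pub file_hash: u64,
    pub size: u64,
    pub is_directory: bool,
}

const FNV_OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// 64-bit FNV-1a, used only so the sketch needs no external crates.
fn fnv1a(state: u64, bytes: &[u8]) -> u64 {
    bytes
        .iter()
        .fold(state, |h, b| (h ^ u64::from(*b)).wrapping_mul(FNV_PRIME))
}

/// Hash the tree in sorted-path order so that equal trees produce equal
/// hashes regardless of HashMap insertion or iteration order.
pub fn manifest_hash(file_tree: &HashMap<String, Entry>) -> u64 {
    let mut paths: Vec<&String> = file_tree.keys().collect();
    paths.sort();
    let mut h = FNV_OFFSET;
    for path in paths {
        let e = &file_tree[path];
        // Length prefix keeps adjacent fields from running together.
        h = fnv1a(h, &(path.len() as u64).to_le_bytes());
        h = fnv1a(h, path.as_bytes());
        h = fnv1a(h, &e.file_hash.to_le_bytes());
        h = fnv1a(h, &e.size.to_le_bytes());
        h = fnv1a(h, &[u8::from(e.is_directory)]);
    }
    h
}
```

A production design would use a cryptographic hash; the point here is only the canonical ordering step.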

Key Benefits

1. O(1) Query Performance

Without manifest (current):

// Query: "Does file X exist at state S?"
// Requires: Apply all changes up to state S, traverse DAG
// Complexity: O(changes)

With manifest (proposed):

// Query: "Does file X exist at manifest M?"
// Requires: HashMap lookup
// Complexity: O(1) expected (average-case hash lookup)
fn file_exists_at_manifest(manifest: &Manifest, file_path: &str) -> bool {
    manifest.file_tree.contains_key(file_path)
}
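
To make the contrast concrete, the sketch below answers the same existence query two ways: by replaying history and by consulting a snapshot built once. `FileOp`, `exists_by_replay`, and `snapshot` are hypothetical simplifications; Atomic's real changes are graph patches, not path-level add/remove operations.

```rust
use std::collections::HashSet;

/// Hypothetical path-level simplification of a change.
pub enum FileOp {
    Add(String),
    Remove(String),
}

/// Without a manifest: replay every operation up to the queried state.
/// Each query costs O(changes).
pub fn exists_by_replay(history: &[FileOp], path: &str) -> bool {
    let mut exists = false;
    for op in history {
        match op {
            FileOp::Add(p) if p.as_str() == path => exists = true,
            FileOp::Remove(p) if p.as_str() == path => exists = false,
            _ => {}
        }
    }
    exists
}

/// With a manifest: pay O(changes) once to build the snapshot, then
/// answer each query with a single set lookup.
pub fn snapshot(history: &[FileOp]) -> HashSet<String> {
    let mut files = HashSet::new();
    for op in history {
        match op {
            FileOp::Add(p) => {
                files.insert(p.clone());
            }
            FileOp::Remove(p) => {
                files.remove(p);
            }
        }
    }
    files
}
```

The snapshot only pays off when the same state is queried more than once, which is the case the proposal targets (release states and AI verification).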

2. Independent Creation

Manifests can be created independently of tags:

# Create manifest from current state (not tied to tag)
atomic manifest create --from-state <merkle>

# Create manifest from tag (for production releases)
atomic manifest create --from-tag v1.0.0

# Create manifest from change (for development snapshots)
atomic manifest create --from-change <hash>

3. AI Attestation on File Trees

AI agents can attest to file tree correctness:

pub struct AIAttestation {
    // ... existing fields ...
    
    /// Node type this attestation applies to
    pub node_type: NodeType,  // Can be Change, Tag, or Manifest
    
    /// Node hash (change_hash, tag_hash, or manifest_hash)
    pub node_hash: Hash,
    
    /// For manifests: Attestation that file tree is correct
    pub manifest_attestation: Option<ManifestAttestation>,
}

pub struct ManifestAttestation {
    /// AI verified that this file tree is correct
    pub verified_file_tree: bool,
    
    /// AI verified that all files are syntactically valid
    pub verified_syntax: bool,
    
    /// AI verified that dependencies are correct
    pub verified_dependencies: bool,
}
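
A consumer of these flags would presumably decide which checks it insists on before trusting a manifest. The sketch below illustrates one such gate. The `Policy` type and `accepts` function are hypothetical and not part of the proposal; only the three `verified_*` fields come from the struct above.

```rust
/// Mirrors the `ManifestAttestation` fields from the proposal.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ManifestAttestation {
    pub verified_file_tree: bool,
    pub verified_syntax: bool,
    pub verified_dependencies: bool,
}

/// Hypothetical acceptance policy: which optional checks a consumer requires.
#[derive(Debug, Clone, Copy)]
pub struct Policy {
    pub require_syntax: bool,
    pub require_dependencies: bool,
}

/// Returns true when the attestation satisfies the policy. The file tree
/// check is always required; the others only if the policy asks for them.
/// A manifest with no attestation is never accepted.
pub fn accepts(policy: Policy, attestation: Option<ManifestAttestation>) -> bool {
    match attestation {
        None => false,
        Some(a) => {
            a.verified_file_tree
                && (!policy.require_syntax || a.verified_syntax)
                && (!policy.require_dependencies || a.verified_dependencies)
        }
    }
}
```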

4. DAG Integration

All three node types are first-class DAG nodes:

impl Node {
    /// Create a manifest node
    pub fn manifest(hash: Hash, state: Merkle) -> Self {
        Self {
            hash,
            node_type: NodeType::Manifest,
            state,
        }
    }
    
    /// Check if this node is a manifest
    pub fn is_manifest(&self) -> bool {
        self.node_type == NodeType::Manifest
    }
}

# All operations work on any node type
atomic apply <manifest-hash>     # Apply manifest (restore file tree)
atomic dependencies <manifest-hash>  # Get manifest dependencies
atomic log <manifest-hash>       # Show manifest in history

Performance Comparison

Operation	Without Manifest	With Manifest Node	Expected Improvement (not yet benchmarked)
File exists query	O(changes)	O(1)	100-1000× faster
File content lookup	O(changes + log n)	O(log n)	10-100× faster
Directory listing	O(changes + files)	O(files)	10-100× faster
AI attestation	Change-level only	Change + Manifest	More granular
Manifest creation	N/A	O(files)	One-time cost
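
The O(files) directory-listing figure corresponds to a scan over a flat, unordered `file_tree`. If paths were kept in a sorted map instead, a listing could touch only the entries under the requested directory. The sketch below assumes a `BTreeMap` keyed by `/`-separated paths; that is an alternative to the `HashMap` in the proposal, shown for comparison only.

```rust
use std::collections::BTreeMap;

/// List every path under `dir` (recursively) from a sorted path map.
/// Costs O(log n + k) for k matching entries, versus O(n) for a scan
/// over an unordered map.
pub fn list_directory<'a, V>(tree: &'a BTreeMap<String, V>, dir: &str) -> Vec<&'a str> {
    // Normalise to exactly one trailing slash so "src" does not match "src-old".
    let prefix = format!("{}/", dir.trim_end_matches('/'));
    tree.range(prefix.clone()..)
        .take_while(|(path, _)| path.starts_with(prefix.as_str()))
        .map(|(path, _)| path.as_str())
        .collect()
}
```

The trade-off is that point lookups become O(log n) rather than O(1) expected, so the choice depends on which query dominates.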

Use Cases

1. Production Release Manifests

# Create tag for release
atomic tag create v1.0.0 -m "Production release"

# Create manifest for fast queries
atomic manifest create --from-tag v1.0.0 -m "v1.0.0 file tree"

# O(1) queries on production state
atomic manifest query v1.0.0 --file src/index.ts
# Returns: file exists, hash, size, inode

2. AI Agent File Tree Verification

# AI agent creates manifest and attests to correctness
atomic manifest create --ai-attest \
  --ai-provider anthropic \
  --ai-model claude-sonnet-4 \
  --verify-syntax \
  --verify-dependencies

# Manifest includes AI attestation
# Other agents can verify: "This file tree is correct"

3. Development Snapshots

# Create manifest from current state (not tied to tag)
atomic manifest create --from-state <merkle> -m "Dev snapshot"

# Fast queries during development
atomic manifest query <manifest-hash> --list-files

DAG Structure Example

Change A [root]
  ↓
Change B [A]
  ↓
Change C [A, B]
  ↓
Tag v1.0 [A, B, C]  ← Consolidates dependencies
  ↓
Manifest M1 [Tag v1.0]  ← File tree snapshot
  ↓
Change D [Tag v1.0]  ← New change depends on tag
  ↓
Manifest M2 [Change D]  ← New manifest from change

Key Insight: Manifests can depend on:

Tags: Production release manifests
Changes: Development snapshots
Other Manifests: Incremental updates
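
One possible reading of the "Other Manifests" case is that a new manifest is derived from a parent manifest plus a small path-level delta. The sketch below illustrates that idea. `ManifestParent`, `DerivedManifest`, and `derive_from_manifest` are hypothetical names, and hashes are shortened to `u64` for brevity.

```rust
use std::collections::HashMap;

/// What a manifest was derived from (hypothetical; hashes shortened to u64).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ManifestParent {
    Tag(u64),
    Change(u64),
    Manifest(u64),
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct DerivedManifest {
    pub parent: ManifestParent,
    /// path -> content hash
    pub files: HashMap<String, u64>,
}

/// Build a child manifest from a parent manifest and a delta.
/// In the delta, `Some(hash)` adds or updates a path and `None` deletes it.
pub fn derive_from_manifest(
    parent_hash: u64,
    parent_files: &HashMap<String, u64>,
    delta: &[(String, Option<u64>)],
) -> DerivedManifest {
    // The clone is still O(files); only the update work is O(|delta|).
    let mut files = parent_files.clone();
    for (path, change) in delta {
        match change {
            Some(hash) => {
                files.insert(path.clone(), *hash);
            }
            None => {
                files.remove(path);
            }
        }
    }
    DerivedManifest {
        parent: ManifestParent::Manifest(parent_hash),
        files,
    }
}
```

Avoiding the O(files) copy would need a structurally shared (persistent) map, which bears on the storage-strategy question raised later in this document.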

Comparison with Sapling

Aspect	Sapling Manifest	Atomic Manifest Node
DAG Integration	Separate from DAG	First-class DAG node
Content-Addressed	No	Yes (manifest_hash)
AI Attestation	No	Yes (built-in)
Dependencies	Per changeset	Can depend on any node
Mathematical Guarantees	None	Merkle hash verification
Query Speed	O(1)	O(1)
Creation Cost	O(files) per changeset	O(files) on-demand

Key Advantage: Atomic's manifest nodes are:

DAG-integrated: First-class nodes with dependencies
Content-addressed: Same file tree = same manifest hash
AI-attestable: Can verify file tree correctness
Flexible: Can depend on changes, tags, or other manifests

Implementation Strategy

Phase 1: Add Manifest NodeType

// atomic-core/src/pristine/mod.rs
pub enum NodeType {
    Change = 0,
    Tag = 1,
    Manifest = 2,  // NEW
}

impl NodeType {
    pub fn from_u8(val: u8) -> Option<Self> {
        match val {
            0 => Some(NodeType::Change),
            1 => Some(NodeType::Tag),
            2 => Some(NodeType::Manifest),  // NEW
            _ => None,
        }
    }
}
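
The same decoding can also be expressed through the standard `TryFrom` trait, which reports the unrecognised byte instead of collapsing it to `None`. This is a sketch of an alternative shape, not a claim about how `atomic-core` is written; using `u8` as the error type is an assumption made for brevity.

```rust
use std::convert::TryFrom;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
pub enum NodeType {
    Change = 0,
    Tag = 1,
    Manifest = 2,
}

impl TryFrom<u8> for NodeType {
    /// The unrecognised discriminant is handed back to the caller.
    type Error = u8;

    fn try_from(val: u8) -> Result<Self, Self::Error> {
        match val {
            0 => Ok(NodeType::Change),
            1 => Ok(NodeType::Tag),
            2 => Ok(NodeType::Manifest),
            other => Err(other),
        }
    }
}
```

Because `Change` and `Tag` keep discriminants 0 and 1, records written before the new variant existed still decode unchanged.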

Phase 2: Manifest Storage

// atomic-core/src/pristine/manifest.rs
pub struct Manifest {
    // ... fields ...
}

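// Store in a redb table. Sketch only: the key/value encoding below
// (raw manifest_hash bytes -> serialized Manifest, including state and
// the other fields) is an assumption, not a settled design.
use redb::TableDefinition;

const MANIFESTS: TableDefinition<&[u8], &[u8]> = TableDefinition::new("manifests");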

Phase 3: AI Attestation Extension

// Extend AIAttestation to support manifests
pub struct AIAttestation {
    node_type: NodeType,  // Change, Tag, or Manifest
    node_hash: Hash,
    // ... existing fields ...
    manifest_attestation: Option<ManifestAttestation>,
}

Phase 4: CLI Commands

# Create manifest
atomic manifest create [--from-tag <tag>] [--from-change <hash>] [--from-state <merkle>]

# Query manifest
atomic manifest query <manifest-hash> [--file <path>] [--list-files] [--directory <path>]

# List manifests
atomic manifest list [--channel <channel>]

# Apply manifest (restore file tree)
atomic manifest apply <manifest-hash>

Trade-offs

Benefits

O(1) Query Performance: File existence queries become a single hash-map lookup; content and directory queries no longer replay changes
AI Attestation: Verify file tree correctness at manifest level
DAG Integration: First-class nodes maintain mathematical guarantees
Flexibility: Independent of tags, can be created on-demand
Content-Addressed: Same file tree = same manifest hash (deduplication)

Costs

Storage Overhead: Each manifest stores full file tree (O(files) space)
Creation Cost: Generating manifest requires O(files) traversal
Maintenance: Manifests must be kept in sync with channel state
Complexity: Additional node type increases system complexity

Mitigation Strategies

Optional Manifests: Only create when needed (production releases, AI verification)
Lazy Generation: Generate manifests on-demand, not per change
Incremental Updates: Update manifests incrementally when possible
Content-Addressed Deduplication: Same file tree = same manifest (automatic deduplication)
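
The deduplication point can be illustrated with a small in-memory store keyed by content hash. `ManifestStore` and `FileTree` are hypothetical names. The sketch uses `DefaultHasher` from the standard library, whose output is not guaranteed stable across Rust releases, so it suits an in-process illustration only and not persistent storage.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// A file tree as (path, content hash) pairs; hashes shortened to u64.
pub type FileTree = Vec<(String, u64)>;

/// In-memory stand-in for a manifest table (hypothetical).
#[derive(Default)]
pub struct ManifestStore {
    by_hash: HashMap<u64, FileTree>,
}

impl ManifestStore {
    /// Store a tree under its content hash. Returns the hash and whether a
    /// new record was written (`false` means the tree was already present).
    pub fn put(&mut self, mut tree: FileTree) -> (u64, bool) {
        tree.sort(); // canonical order, so equal trees hash equally
        let mut hasher = DefaultHasher::new();
        tree.hash(&mut hasher);
        let key = hasher.finish();
        let is_new = !self.by_hash.contains_key(&key);
        if is_new {
            self.by_hash.insert(key, tree);
        }
        (key, is_new)
    }

    /// Number of distinct manifests stored.
    pub fn len(&self) -> usize {
        self.by_hash.len()
    }
}
```

Storing the same tree twice, even with entries in a different order, leaves a single record, which is the behaviour the mitigation relies on.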

Research Questions

When to create manifests?
- On-demand vs. automatic?
- Per tag vs. per change?
- Production releases only?
Storage strategy?
- Store full file tree in database?
- Store references to DAG vertices?
- Hybrid approach?
Update frequency?
- Recreate on every change?
- Incremental updates?
- On-demand only?
Query optimization?
- Cache frequently accessed manifests?
- Index file paths for faster lookup?
- Lazy loading of manifest data?

Next Steps

Prototype: Implement basic Manifest node type in atomic-core
Benchmark: Measure query performance improvement
Evaluate: Assess storage overhead vs. performance gain
Design: Finalize manifest creation/update strategy
Implement: Full integration with DAG and AI attestation

References

Sapling SCM - Facebook's scalable VCS with manifest system
Hunks: Edit and Replacement Calculations - Mathematical foundations of Atomic's patch model
