Change Identity
Overview
Change identity is a fundamental concept in Atomic VCS that distinguishes it from Git and other version control systems. In Atomic, a change is identified by its content (its hunks and the changes it depends on), not by metadata such as author, timestamp, message, or parent commit.
What is Change Identity?
A change in Atomic is a set of patch hunks that describe transformations to files. The identity of a change is determined by cryptographically hashing only the patch content, that is, the actual code modifications together with the hashes of the changes they depend on.
Change {
hunks: Vec<Hunk>, // The actual code changes
dependencies: Vec<Hash>, // What this change depends on
}
// Change hash = hash(hunks + dependencies)
// NOT hash(hunks + author + timestamp + message + parent)
Git vs Atomic: A Critical Difference
Git Commits
In Git, commit identity covers the whole commit object, context included:
Git Commit Hash = hash(
tree, // File snapshots
parent, // Previous commit
author, // Who wrote it
committer, // Who committed it
timestamp, // When it was committed
message, // Commit message
)
Result: The same code change gets different hashes in different contexts.
Atomic Changes
In Atomic, change identity includes only content:
Atomic Change Hash = hash(
hunks, // The actual code changes
dependencies, // What it depends on
)
Result: The same code change gets the same hash everywhere.
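The contrast between the two hashing schemes can be sketched in a few lines. This is a minimal illustration, not Atomic's or Git's real code: `Recorded`, `context_id`, and `content_id` are hypothetical names, and a small FNV-1a function stands in for the real cryptographic hash so the sketch runs without external crates.

```rust
/// FNV-1a (64-bit): a tiny, non-cryptographic stand-in for a real
/// cryptographic hash, used only so this sketch has no dependencies.
fn fnv1a(chunks: &[&[u8]]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325;
    for chunk in chunks {
        // Mix in the chunk length first so field boundaries stay unambiguous.
        let len = (chunk.len() as u64).to_le_bytes();
        for byte in len.iter().chain(chunk.iter()) {
            hash ^= u64::from(*byte);
            hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
        }
    }
    hash
}

/// Hypothetical, simplified record of one code change plus its context.
struct Recorded<'a> {
    hunks: &'a str,        // the code modification itself
    dependencies: &'a str, // what the change builds on
    parent: &'a str,       // context: previous commit
    author: &'a str,       // context: who wrote it
    message: &'a str,      // context: description
}

/// Git-style identity: context is part of the hashed input.
fn context_id(r: &Recorded<'_>) -> u64 {
    fnv1a(&[
        r.hunks.as_bytes(),
        r.parent.as_bytes(),
        r.author.as_bytes(),
        r.message.as_bytes(),
    ])
}

/// Atomic-style identity: only hunks and dependencies are hashed.
fn content_id(r: &Recorded<'_>) -> u64 {
    fnv1a(&[r.hunks.as_bytes(), r.dependencies.as_bytes()])
}
```

Two `Recorded` values with the same `hunks` and `dependencies` but different `author`, `parent`, or `message` get equal `content_id` values, while their `context_id` values differ.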
Why This Matters
1. No Lost Work from Duplicate Fixes
Git Problem:
# Developer A fixes bug
git commit -m "Fix null pointer bug" # Commit: abc123
# Developer B independently fixes same bug
git commit -m "Fix null pointer bug" # Commit: def456
# Different hashes, so Git treats them as unrelated commits
# History records the fix twice, even though the patches are identical
Atomic Solution:
# Developer A fixes bug
atomic record -m "Fix null pointer bug" # Change: XYZ789
# Developer B independently fixes same bug
atomic record -m "Fix null pointer bug" # Change: XYZ789 (same hash!)
# Atomic detects: "This change already exists"
# No duplicate work, no conflicts
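How "this change already exists" could be detected is easy to sketch once changes are keyed by their hash. The store below is a hypothetical in-memory model (`ChangeStore`, `RecordOutcome`, and `record` are illustrative names, not Atomic's API); it assumes the hash has already been computed from the change content.

```rust
use std::collections::HashMap;

/// Result of attempting to record a change (illustrative names).
#[derive(Debug, PartialEq)]
enum RecordOutcome {
    Created,
    AlreadyExists,
}

/// Hypothetical in-memory change store keyed by content hash.
#[derive(Default)]
struct ChangeStore {
    // hash -> hunks; the first recorded copy is kept
    changes: HashMap<String, String>,
}

impl ChangeStore {
    /// Store the change unless one with the same hash is already present.
    fn record(&mut self, hash: &str, hunks: &str) -> RecordOutcome {
        if self.changes.contains_key(hash) {
            return RecordOutcome::AlreadyExists;
        }
        self.changes.insert(hash.to_string(), hunks.to_string());
        RecordOutcome::Created
    }
}
```

Recording `XYZ789` twice yields `Created` the first time and `AlreadyExists` the second, and the store still holds a single entry.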
2. Change References Never Break
Git Problem:
# Create PR referencing commit
PR #123: "See commit abc123 for implementation"
# Someone rebases the branch
git rebase main # Commit abc123 is rewritten as def456
# Reference is now stale: abc123 is no longer on the branch and may be garbage-collected
Atomic Solution:
# Create discussion referencing change
Discussion: "See change XYZ789 for implementation"
# Someone updates the change
atomic unrecord XYZ789
atomic record -m "Updated implementation" # New change: XYZ789'
# Original XYZ789 still exists in change store
# Reference still valid, can view both versions
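The behaviour described here relies on keeping two things separate: the set of changes that are currently applied, and the store that holds every change ever recorded. A minimal sketch under that assumption follows; `Repo`, `stack`, `store`, and `lookup` are hypothetical names chosen for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical repository: a change store plus a stack listing the
/// changes that are currently applied.
#[derive(Default)]
struct Repo {
    store: HashMap<String, String>, // hash -> hunks, never deleted in this sketch
    stack: Vec<String>,             // hashes currently applied, in order
}

impl Repo {
    /// Add a change to the store (if new) and apply it.
    fn record(&mut self, hash: &str, hunks: &str) {
        self.store
            .entry(hash.to_string())
            .or_insert_with(|| hunks.to_string());
        self.stack.push(hash.to_string());
    }

    /// Remove a change from the stack only; the store keeps its content.
    fn unrecord(&mut self, hash: &str) {
        self.stack.retain(|h| h.as_str() != hash);
    }

    /// Resolve a reference such as "see change XYZ789".
    fn lookup(&self, hash: &str) -> Option<&str> {
        self.store.get(hash).map(String::as_str)
    }
}
```

After `unrecord`, the hash disappears from `stack`, but `lookup` still resolves it, which is what keeps an old reference viewable.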
3. Cross-Repository Deduplication
Git Problem:
# Repository A
git commit -m "Add feature" # Commit: aaa111
# Repository B (fork)
git cherry-pick aaa111 # Creates new commit: bbb222
# Same code, different hashes
# The commit hashes alone cannot tell you they are the same change
# (comparing them needs a separate tool such as git patch-id)
Atomic Solution:
# Repository A
atomic record -m "Add feature" # Change: CCC333
# Repository B (fork)
atomic pull CCC333 # Same change, same hash: CCC333
# Atomic knows these are identical
# Automatic deduplication
How Change Hashing Works
Step 1: Normalize Hunks
fn normalize_hunk(hunk: &Hunk) -> Vec<u8> {
// Extract only the content changes
let mut data = Vec::new();
// File path
data.extend(hunk.file_path.as_bytes());
// Line numbers
data.extend(&hunk.start_line.to_le_bytes());
data.extend(&hunk.end_line.to_le_bytes());
// Operation (add/delete/modify)
data.push(hunk.operation as u8);
// Content
data.extend(hunk.content.as_bytes());
data
}
Step 2: Hash Hunks and Dependencies
fn hash_change(hunks: &[Hunk], dependencies: &[Hash]) -> Hash {
let mut hasher = blake3::Hasher::new(); // incremental hasher from the `blake3` crate
// Hash all hunks
for hunk in hunks {
let normalized = normalize_hunk(hunk);
hasher.update(&normalized);
}
// Hash dependencies (sorted for consistency)
let mut deps_sorted = dependencies.to_vec();
deps_sorted.sort();
for dep in deps_sorted {
hasher.update(dep.as_bytes());
}
Hash::from(hasher.finalize())
}
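One detail worth illustrating is how variable-length fields are framed before hashing. If fields are simply concatenated, different field splits can produce identical bytes and therefore identical hashes. The sketch below shows one common remedy, a fixed-width length prefix per field, together with the dependency sorting used above. It is an assumption about how a canonical encoding could look, not a description of Atomic's actual format, and the helper names are hypothetical.

```rust
/// Append one field with a fixed-width length prefix so that field
/// boundaries survive concatenation.
fn put_field(buf: &mut Vec<u8>, field: &[u8]) {
    buf.extend_from_slice(&(field.len() as u64).to_le_bytes());
    buf.extend_from_slice(field);
}

/// Plain concatenation, shown only for contrast.
fn concat(fields: &[&str]) -> Vec<u8> {
    fields.concat().into_bytes()
}

/// Length-prefixed encoding of the same fields.
fn framed(fields: &[&str]) -> Vec<u8> {
    let mut buf = Vec::new();
    for field in fields {
        put_field(&mut buf, field.as_bytes());
    }
    buf
}

/// Dependencies are sorted before encoding so that their order cannot
/// change the bytes that get hashed.
fn encode_dependencies(deps: &[&str]) -> Vec<u8> {
    let mut sorted = deps.to_vec();
    sorted.sort_unstable();
    framed(&sorted)
}
```

With plain concatenation, the field lists `["ab", "c"]` and `["a", "bc"]` yield the same bytes; with length prefixes they do not, and `encode_dependencies` returns the same bytes for `["B", "A"]` and `["A", "B"]`.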
Step 3: Base32 Encoding
// Hash is 256-bit Blake3 digest
// Encoded as base32 for human readability
let hash = hash_change(&hunks, &dependencies);
let base32 = hash.to_base32(); // Example: "XYZ789ABC123DEF456..."
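For readers curious what the encoding step involves, here is a small base32 encoder. It assumes the RFC 4648 alphabet without padding; the alphabet Atomic actually uses is not stated here and may differ, so treat `ALPHABET` and this free-standing `to_base32` function as illustrative.

```rust
/// RFC 4648 base32 alphabet (an assumption; Atomic's alphabet may differ).
const ALPHABET: &[u8; 32] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZ234567";

/// Encode bytes as unpadded base32, 5 bits per output character.
fn to_base32(bytes: &[u8]) -> String {
    let mut out = String::with_capacity((bytes.len() * 8 + 4) / 5);
    let mut buffer: u32 = 0; // holds bits not yet emitted
    let mut bits: u32 = 0;   // number of valid low bits in `buffer`
    for &byte in bytes {
        buffer = (buffer << 8) | u32::from(byte);
        bits += 8;
        while bits >= 5 {
            bits -= 5;
            out.push(ALPHABET[((buffer >> bits) & 0x1f) as usize] as char);
        }
    }
    if bits > 0 {
        // Left-align the remaining bits in a final 5-bit group.
        out.push(ALPHABET[((buffer << (5 - bits)) & 0x1f) as usize] as char);
    }
    out
}
```

A 32-byte (256-bit) digest encodes to 52 characters this way, compared with 64 characters in hexadecimal.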
Change Identity in Practice
Same Change, Different Messages
# Developer A
atomic record src/fix.rs -m "Fix bug"
# Output: Change ABC123
# Developer B (independently)
atomic record src/fix.rs -m "Fixed the null pointer issue"
# Output: Change ABC123 (same hash!)
# Atomic: "This change already exists with different message"
# Uses existing change, no duplicate
Same Change, Different Authors
# Alice records change
atomic record src/feature.rs -m "New feature"
# Change: DEF456, Author: alice@example.com
# Bob makes identical change
atomic record src/feature.rs -m "New feature"
# Change: DEF456 (same hash!), Author: bob@example.com
# Atomic detects duplicate, suggests using existing change
Same Change, Different Timestamps
# Create change today
atomic record src/code.rs -m "Implementation"
# Change: GHI789, Date: 2025-01-01
# Create identical change tomorrow
atomic record src/code.rs -m "Implementation"
# Change: GHI789 (same hash!), Date: 2025-01-02
# Same hash despite different dates
Change Metadata vs Identity
While identity is content-only, changes still carry metadata:
pub struct ChangeHeader {
pub hash: Hash, // Identity (content-based)
pub message: String, // NOT part of identity
pub authors: Vec<Author>, // NOT part of identity
pub timestamp: DateTime, // NOT part of identity
pub dependencies: Vec<Hash>, // Part of identity
}
Key Point: Metadata can differ, but identity stays the same.
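In Rust terms, this split can be expressed by implementing equality and hashing over the identity field only. The sketch below uses a simplified, hypothetical `Change` struct (not the `ChangeHeader` above) to show that two records with different metadata collapse to one entry in a set.

```rust
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Simplified, hypothetical change record for illustration.
#[derive(Debug, Clone)]
struct Change {
    hash: String,    // identity (content-based)
    message: String, // metadata
    author: String,  // metadata
}

// Equality and hashing look at `hash` only, mirroring the rule that
// metadata is not part of identity.
impl PartialEq for Change {
    fn eq(&self, other: &Self) -> bool {
        self.hash == other.hash
    }
}

impl Eq for Change {}

impl Hash for Change {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.hash.hash(state);
    }
}

/// Count distinct changes, ignoring metadata differences.
fn distinct_changes(changes: &[Change]) -> usize {
    changes.iter().collect::<HashSet<_>>().len()
}
```

Given Alice's and Bob's records of change `DEF456`, `distinct_changes` returns 1 even though their `author` and `message` fields differ.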
Benefits of Content-Based Identity
✓ Deduplication Across Repositories
When two repositories independently create the same fix:
- Git: Two commits, potential merge conflict
- Atomic: One change, automatic deduplication
✓ Stable References
Links to changes never break:
- Git: Commit hashes change with rebase
- Atomic: Change hashes are stable
✓ Provable Correctness
Mathematical properties:
- Same input → Same output (deterministic)
- Content equality → Identity equality (content-addressed)
- Commutative merges (order-independent)
✓ Cross-Team Collaboration
Teams can share changes knowing:
- Same hash = exact same code change
- Dependencies are explicit
- No hidden context required
Comparison Table
| Aspect | Git Commits | Atomic Changes |
|---|---|---|
| Hash Includes | Tree, parent, author, committer, date, message | Hunks, dependencies only |
| Stability | Changes on rebase/amend | Immutable |
| Deduplication | Manual (requires coordination) | Automatic |
| References | Break on rewrite | Always valid |
| Identity | Context-dependent | Content-dependent |
| Merge Detection | By commit hash | By change hash |
Common Questions
Q: What if I want to update a change?
A: Create a new change. The old change remains in the store.
atomic unrecord ABC123 # Remove from stack
atomic record src/fix.rs -m "Updated fix" # New change: ABC124
# ABC123 still exists in change store, can be referenced
Q: Can two changes have the same hash by accident?
A: Astronomically unlikely. The Blake3 digest used here is 256 bits (2^256 possible hashes), so by the birthday bound an accidental collision only becomes likely after roughly 2^128 distinct changes.
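To put a number on "astronomically unlikely", the standard birthday-bound approximation can be evaluated directly. The function below is a back-of-the-envelope sketch (the name `collision_probability` is illustrative) and assumes hash outputs behave like uniformly random values.

```rust
/// Birthday-bound approximation: with `n` random hashes of `bits` bits,
/// P(at least one collision) is roughly n * (n - 1) / 2^(bits + 1).
fn collision_probability(n: f64, bits: i32) -> f64 {
    n * (n - 1.0) / 2f64.powi(bits + 1)
}
```

For one trillion changes (`n = 1e12`) and a 256-bit hash, this gives a probability on the order of 10^-54.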
Q: What if dependencies differ?
A: Different dependencies = different hash.
# Change A depends on X
atomic record -m "Feature" --deps X
# Hash: AAA111
# Change B depends on Y
atomic record -m "Feature" --deps Y
# Hash: BBB222 (different!)
# Even if hunks are identical, dependencies differ
Q: How does this affect performance?
A: Positively! Content-based hashing enables:
- Automatic deduplication (saves space)
- Faster conflict detection
- Better caching strategies
Q: Can I see the raw hash?
A: Yes!
atomic show ABC123 --format json
# Output:
# {
# "hash": "ABC123DEF456...",
# "blake3": "f8e9a7b6c5d4e3f2a1b0...",
# "hunks": [...],
# "dependencies": [...]
# }
Advanced: Change Algebra
Change identity enables mathematical operations:
Commutative Property
A + B = B + A
Same changes in different order produce the same result.
Associative Property
(A + B) + C = A + (B + C)
Grouping doesn't matter.
Identity Element
A + ∅ = A
Empty change doesn't affect result.
Inverse Element
A + A⁻¹ = ∅
Change and its inverse cancel out.
Why this matters: These properties enable conflict-free merges and parallel development at scale.
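These laws can be checked on a deliberately simplified model in which a repository state is nothing more than the set of change hashes applied to it. Real changes carry hunks and dependencies, so this is only a toy illustration of the algebra; `State`, `apply`, and `unapply` are hypothetical names.

```rust
use std::collections::BTreeSet;

/// Toy model: repository state is just the set of applied change hashes.
type State = BTreeSet<&'static str>;

/// Apply a change (the `+` in the formulas above).
fn apply(mut state: State, change: &'static str) -> State {
    state.insert(change);
    state
}

/// Apply the inverse of a change, removing it from the state.
fn unapply(mut state: State, change: &'static str) -> State {
    state.remove(change);
    state
}
```

In this model, applying `A` then `B` gives the same state as applying `B` then `A`, and applying `A` followed by its inverse returns the empty state.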
Implementation Notes
Hash Algorithm
Atomic uses Blake3 for change hashing:
- Fast (faster than SHA-1 and SHA-2 in the Blake3 authors' published benchmarks)
- Secure (cryptographically strong)
- Parallel (tree-based hashing)
- 256-bit output by default (the digest size used for change hashes)
Encoding
Hashes are encoded as base32 for human readability:
- Case-insensitive
- No ambiguous characters (0/O, 1/l)
- URL-safe
- Shorter than hex for the same information (52 characters instead of 64 for a 256-bit hash, unpadded)
See Also
- Virtual Working Copies - How sessions use change identity
- Stacked Diffs Guide - Building on change identity
- Comparison with Git - Why change identity matters
Summary
Change identity is what makes Atomic different:
- ✓ Changes identified by content only
- ✓ Same code = same hash everywhere
- ✓ No lost work from duplicate fixes
- ✓ References never break
- ✓ Automatic deduplication
- ✓ Mathematical correctness
This foundation enables Atomic's advanced features: commutative merges, stacked workflows, and AI agent collaboration at scale.
Key Takeaway: In Git, commits are snapshots in time. In Atomic, changes are timeless mathematical objects identified solely by their transformations.