Hunks: Mathematical Foundations of Edit and Replacement Operations
Abstractβ
This document provides a rigorous mathematical treatment of hunk operations in Atomic VCS. Hunks are the fundamental atomic operations in Atomic's patch-based directed acyclic graph (DAG) model. We prove the correctness properties of Edit and Replacement hunks, showing how they maintain consistency in the version control system's mathematical model.
1. Preliminaries: The Patch-Based DAG Modelβ
1.1 Mathematical Frameworkβ
Atomic VCS is built on a patch-based DAG where:
- Vertices represent immutable blocks of content
- Edges represent ordering relationships between vertices
- Changes are collections of hunks that introduce new vertices and edges
- Position is defined as a point in this graph:
P = (change_id, byte_offset)
Definition 1.1 (Vertex): A vertex V = (c, s, e) where:
c: Change identifier (NodeId)s: Start position (ChangePosition)e: End position (ChangePosition)- The vertex represents the immutable byte range
[s, e)within changec
Definition 1.2 (Position): A position P = (c, p) where:
c: Change identifierp: Byte offset (ChangePosition) within changec- Represents a unique point in the DAG
Definition 1.3 (Ordering Relation): Vertices are ordered via edges. If vertex Vβ has an edge to vertex Vβ, then Vβ comes before Vβ in the linear ordering of the file.
1.2 Context Dependenciesβ
When creating a new vertex, we must specify its position in the ordering relation via context:
Definition 1.4 (Up Context): A set of positions U = {Pβ, Pβ, ..., Pβ} such that the new vertex must appear after all vertices containing these positions in the linear ordering.
Definition 1.5 (Down Context): A set of positions D = {Pβ, Pβ, ..., Pβ} such that the new vertex must appear before all vertices containing these positions in the linear ordering.
Axiom 1.1 (Consistency): For a valid change, up context and down context must define a consistent ordering. That is, there exists a linear ordering of all vertices such that:
- All up context positions appear before the new vertex
- All down context positions appear after the new vertex
2. Hunks as Graph Operationsβ
2.1 Edit Hunkβ
Definition 2.1 (Edit Hunk): An Edit hunk E = (path, line_new, pos, content, context) where:
path: File path (identifies the inode in the DAG)line_new: Line number in the NEW file state (after applying all previous hunks)pos: Position in the change graph (merkle hash identifier)content: Byte content to add (if addition) or remove (if deletion)context: Up and down context positions for ordering
Mathematical Property 2.1: The line_new field represents the target position in the transformed file after all previous operations have been applied. This is not a reference to the old file state, but to the new file state being constructed.
Theorem 2.1 (Edit Hunk Correctness): Given an Edit hunk E with up context U and down context D, if:
- All positions in
Uexist in the current DAG state - All positions in
Dexist in the current DAG state - The ordering constraint
U < E < Dis satisfiable
Then applying E produces a valid new DAG state.
Proof Sketch:
- The new vertex is created at positions defined by
content - Edges are created from vertices containing positions in
Uto the new vertex - Edges are created from the new vertex to vertices containing positions in
D - The DAG property (acyclicity) is maintained because we insert between existing vertices
- Consistency is maintained because contexts are validated before edge creation
2.2 Replacement Hunkβ
Definition 2.2 (Replacement Hunk): A Replacement hunk R = (path, line_new, pos_old, pos_new, content_old, content_new, context) where:
path: File pathline_new: Line number in the NEW file state (same position for both old and new content)pos_old: Position identifier for the old content (deletion)pos_new: Position identifier for the new content (addition)content_old: Byte content being removedcontent_new: Byte content being addedcontext: Shared up and down context for both operations
Critical Insight: In a Replacement, both the deletion and addition occur at the same position line_new in the NEW file state. This is the fundamental property that distinguishes Replacement from two separate Edit hunks.
Mathematical Property 2.2: For a Replacement hunk R:
- The old content (deletion) and new content (addition) both reference position
line_new - This means: in the NEW file, at line
line_new, we removecontent_oldand insertcontent_new - The old content existed at some position in the OLD file state, but we reference it by its NEW file position
Theorem 2.2 (Replacement Hunk Correctness): A Replacement hunk R is mathematically equivalent to:
- An Edit hunk
E_delthat deletescontent_oldat positionline_new - An Edit hunk
E_addthat addscontent_newat positionline_new - With the constraint that
E_del.line_new = E_add.line_new
Proof:
From libatomic/src/diff/replace.rs:85-109, we see that a Replacement is created when:
if old_len > 0 {
match self.actions.pop() {
Some(Hunk::Edit { change: c, local, .. }) => {
if local.line == from_new + 1 { // Same NEW file position
// Convert to Replacement
self.actions.push(Hunk::Replacement { ... });
}
}
}
}
This proves that Replacements are created precisely when two Edits would occur at the same line_new position. The Replacement is a notational convenience that groups these operations, but the mathematical semantics are: both operations target the same position in the NEW file state.
2.3 Line Number Semantics: The Critical Propertyβ
Axiom 2.1 (Line Number Invariant): For any hunk H, the field H.line represents the line number in the NEW file state after applying all hunks that precede H in the change sequence.
Corollary 2.1: For a Replacement hunk R:
R.lineis the NEW file line number- The deletion removes content that, after previous hunks, would appear at line
R.line - The addition inserts content that will appear starting at line
R.line - Both operations target the same position because they occur in the same transformed file state
Proof of Corollary 2.1:
From libatomic/src/diff/replace.rs:110-119, when creating an Edit hunk:
self.actions.push(Hunk::Edit {
local: LocalByte {
line: from_new + 1, // NEW file line number
path: diff.path.clone(),
inode,
byte: Some(bytes_pos(lines_b, from_new)),
},
...
});
The line field is set to from_new + 1, where from_new is the index in the NEW file (lines_b array). This proves that line always refers to the NEW file state.
For Replacements, the same local.line value is used for both the deletion and addition operations, proving they target the same NEW file position.
3. Position Calculation in the DAGβ
3.1 Computing Up Contextβ
Definition 3.1 (Up Context Calculation): Given a deletion range [old, old + old_len) in the old file, the up context is computed by:
- Finding the vertex containing the byte position at
oldin the old file - Mapping this to a Position
P = (change_id, byte_offset)in the DAG - This Position becomes part of the up_context set
From libatomic/src/diff/replace.rs:123-193, the get_up_context function:
- Finds the vertex containing the byte position in the old file state
- Returns the Position that marks where the new content should appear after in the ordering
3.2 Computing Down Contextβ
Definition 3.2 (Down Context Calculation): Given a deletion range [old, old + old_len) and addition range [from_new, from_new + len) in the new file, the down context is computed by:
- Finding the vertex containing the byte position after
old + old_lenin the old file - Mapping this to a Position
Pin the DAG - This Position becomes part of the down_context set, marking where the new content should appear before in the ordering
From libatomic/src/diff/replace.rs:238-315, the get_down_context function computes positions that the new vertex must appear before.
3.3 The Replacement Position Invariantβ
Theorem 3.1 (Replacement Position Invariant): For a Replacement hunk R at NEW file line line_new:
- The up_context for both deletion and addition is computed from the same old file position range
- The down_context for both deletion and addition is computed from positions after the old file range
- Both operations share the same context because they target the same NEW file position
Proof:
When local.line == from_new + 1 in replace.rs:90, the system creates a Replacement. The local structure (which contains line) is reused for both the deletion (change) and addition (replacement) operations. Since they share the same local.line, they target the same NEW file position, and thus have the same context requirements.
4. Correctness Propertiesβ
4.1 Commutativityβ
Theorem 4.1 (Hunk Commutativity): Two hunks Hβ and Hβ commute if they operate on disjoint line ranges in the NEW file state. That is:
- If
Hβ.line_new + Hβ.length < Hβ.line_neworHβ.line_new + Hβ.length < Hβ.line_new - Then applying
(Hβ, Hβ)produces the same result as(Hβ, Hβ)
Proof: Since the hunks operate on disjoint ranges, the context dependencies don't overlap. The DAG edges created are independent, so the order of application doesn't affect the final state.
4.2 Consistency Preservationβ
Theorem 4.2 (DAG Consistency): Applying a valid hunk to a consistent DAG state produces a consistent DAG state.
Proof Sketch:
-
Acyclicity: New edges are only created from up_context vertices to the new vertex, and from the new vertex to down_context vertices. Since up_context < down_context by definition, no cycles are introduced.
-
Ordering: The new vertex is inserted between existing vertices, preserving the linear ordering property.
-
Uniqueness: Each new vertex has a unique
(change_id, start, end)tuple, ensuring no collisions.
4.3 Replacement Semantics Correctnessβ
Theorem 4.3 (Replacement Semantics): A Replacement hunk R correctly models the operation "replace content at position line_new" if:
- The old content exists at position
line_newafter applying previous hunks - The new content is inserted at position
line_new - Both operations use the same context dependencies
Proof:
From the Replacement creation logic, when local.line == from_new + 1 for both operations, they target the same position. The deletion removes vertices containing the old content, and the addition creates new vertices at the same position. The shared context ensures both operations see the same DAG state, making the replacement atomic and consistent.
5. Implementation Verificationβ
5.1 Edit Hunk Creationβ
From libatomic/src/diff/replace.rs:110-119:
self.actions.push(Hunk::Edit {
local: LocalByte {
line: from_new + 1, // NEW file line number
path: diff.path.clone(),
inode,
byte: Some(bytes_pos(lines_b, from_new)),
},
change: Atom::NewVertex(change),
encoding: encoding.clone(),
});
Verification: from_new is the index in lines_b (the NEW file array). Setting line = from_new + 1 correctly represents the NEW file line number (1-indexed).
5.2 Replacement Hunk Creationβ
From libatomic/src/diff/replace.rs:85-97:
if old_len > 0 {
match self.actions.pop() {
Some(Hunk::Edit { change: c, local, .. }) => {
if local.line == from_new + 1 { // Same NEW file position
self.actions.push(Hunk::Replacement {
change: c, // Deletion (old Edit)
local, // Same local.line (NEW file position)
replacement: Atom::NewVertex(change), // Addition
encoding: encoding.clone(),
});
}
}
}
}
Verification:
- An Edit hunk
Eexists withE.local.line = L - A new Edit would be created with
line = from_new + 1 - If
L == from_new + 1, both target the same NEW file position - They are combined into a Replacement, preserving
local.line = Lfor both operations
This proves that Replacements are created precisely when two operations target the same NEW file position, confirming our mathematical model.
5.3 Position Serializationβ
From libatomic/src/change/printable.rs:471-493:
Replace {
path,
line, // Serialized as: "Replacement in path:line pos encoding"
pos,
encoding,
change, // Old content (deletion)
replacement, // New content (addition)
change_contents, // "-" prefixed
replacement_contents,// "+" prefixed
} => {
writeln!(w, "Replacement in {}:{} {} {}",
Escaped(&path),
line, // This is local.line (NEW file line number)
pos,
Escaped(encoding_label(encoding))
)?;
print_contents(w, "-", change_contents, encoding)?; // Old
print_contents(w, "+", replacement_contents, encoding)?; // New
}
Verification: The serialized format shows a single line value, confirming that both deletion and addition reference the same NEW file position.
6. Common Misconceptions and Correctionsβ
β Misconception 1: "Line numbers refer to old file positions"β
Correction: By Axiom 2.1 and Theorem 2.1, H.line always refers to the NEW file line number after applying previous hunks. The old file positions are used only for computing context dependencies, not for the target position.
β Misconception 2: "Replacements have separate positions for deletion and addition"β
Correction: By Theorem 2.2 and the Replacement creation logic, both operations target the same line_new position. The Replacement is a grouping of two operations at the same position, not two operations at different positions.
β Misconception 3: "Line numbers are computed from old file state"β
Correction: From replace.rs:112, line = from_new + 1 where from_new is an index into lines_b (the NEW file). This is computed from the diff algorithm's result, which produces indices into the new file array.
7. Summary: Mathematical Foundationsβ
The mathematical model of hunks rests on these foundations:
- Position Semantics:
H.linerepresents the NEW file line number (Axiom 2.1) - Replacement Invariant: Replacements target a single NEW file position for both deletion and addition (Theorem 2.2)
- Context Dependencies: Up/down context positions ensure correct ordering in the DAG (Definitions 1.4, 1.5)
- Correctness: Valid hunks preserve DAG consistency properties (Theorems 4.1, 4.2, 4.3)
These properties ensure that:
- Changes can be applied in any order that respects dependencies
- The DAG remains acyclic and consistent
- Replacements correctly model atomic "replace at position" operations
- The mathematical model is sound and implementable
8. Concrete Example: Hello World Diffβ
To illustrate the mathematical concepts, we present a complete example showing how Edit and Replacement hunks represent file changes.
8.1 Original File Stateβ
File: hello.ts
1: function greet(name: string): string {
2: return `Hello, ${name}!`;
3: }
4:
5: function main(): void {
6: const message = greet("World");
7: console.log(message);
8: }
DAG State: The file consists of vertices representing each line:
Vβ = (changeβ, 0, len(line1))- "function greet..."Vβ = (changeβ, len(line1), len(line1)+len(line2))- " return..."Vβ = (changeβ, ...)- "}"- ... and so on
8.2 Modified File Stateβ
File: hello.ts (after changes)
1: function greet(name: string): string {
2: const redName = `\x1b[31m${name}\x1b[0m`;
3: return `Hello, ${redName}!`;
4: }
5:
6: function main(): void {
7: const name = getName();
8: const message = greet(name);
9: console.log(message);
10: }
11:
12: function getName(): string {
13: const args = process.argv.slice(2);
14: const name = args.join(" ").trim();
15: return name || "World";
16: }
8.3 Diff Analysisβ
The diff algorithm compares the old file (lines_a) and new file (lines_b) and produces the following hunks:
Hunk 1: Edit (Addition)β
Operation: Add a new line after line 1
Hunk Structure:
Edit in "hello.ts":2 1.234 "UTF-8"
+ const redName = `\x1b[31m${name}\x1b[0m`;
Mathematical Interpretation:
line = 2: This is the NEW file line number where the addition occurs- After applying this hunk, the new file will have the added line at position 2
- Up Context: Position pointing to the end of line 1 (
function greet...) - Down Context: Position pointing to the start of the old line 2 (
return...)
DAG Operation:
- Creates new vertex
V_new = (changeβ, start, end)containing the new line - Creates edge from
Vβ(line 1) toV_new(new line 2) - Creates edge from
V_newto the vertex that was at old line 2
Resulting File State (after Hunk 1):
1: function greet(name: string): string {
2: const redName = `\x1b[31m${name}\x1b[0m`; β NEW
3: return `Hello, ${name}!`; β OLD line 2, now at NEW line 3
4: }
Hunk 2: Replacementβ
Operation: Replace line 3 (old line 2) with a modified version
Hunk Structure:
Replacement in "hello.ts":3 2.473 "UTF-8"
- return `Hello, ${name}!`;
+ return `Hello, ${redName}!`;
Mathematical Interpretation:
line = 3: This is the NEW file line number where BOTH operations occur- The deletion removes the old content that is now at NEW line 3
- The addition inserts new content starting at NEW line 3
- Critical: Both operations target the same NEW file position (line 3)
Why it's a Replacement:
- The diff algorithm found old content to remove (
old_len > 0) - The previous Edit hunk was at
line = 2 - The new operation would be at
line = 3(after the addition) - But there's old content at the position that maps to NEW line 3
- Since
local.line == from_new + 1(both reference NEW line 3), they're combined into a Replacement
DAG Operation:
- Deletion: Removes vertex containing the code: return
Hello, ${name}!; - Addition: Creates new vertex
V_replacement = (changeβ, start, end)containing the code: returnHello, ${redName}!; - Both operations use the same up_context (end of line 2) and down_context (start of line 4)
- The new vertex is inserted at the same position where the old vertex was removed
Resulting File State (after Hunk 2):
1: function greet(name: string): string {
2: const redName = `\x1b[31m${name}\x1b[0m`;
3: return `Hello, ${redName}!`; β REPLACED (was: return `Hello, ${name}!`;)
4: }
Hunk 3: Replacementβ
Operation: Replace the main() function body
Hunk Structure:
Replacement in "hello.ts":7 3.891 "UTF-8"
- const message = greet("World");
+ const name = getName();
+ const message = greet(name);
Mathematical Interpretation:
line = 7: Both deletion and addition occur at NEW file line 7- The old line
const message = greet("World");is removed - Two new lines are added starting at NEW line 7
- Note: Even though multiple lines are added, they all start at the same NEW file position (line 7)
DAG Operation:
- Deletion: Removes vertex containing the old line
- Addition: Creates new vertices for the two new lines
- All new vertices share the same up_context (end of line 6) and down_context (start of line 9)
Resulting File State (after Hunk 3):
6: function main(): void {
7: const name = getName(); β NEW (first addition)
8: const message = greet(name); β NEW (second addition)
9: console.log(message);
10: }
Hunk 4: Edit (Addition)β
Operation: Add a new function at the end
Hunk Structure:
Edit in "hello.ts":12 4.567 "UTF-8"
+ function getName(): string {
+ const args = process.argv.slice(2);
+ const name = args.join(" ").trim();
+ return name || "World";
+ }
Mathematical Interpretation:
line = 12: NEW file line number where the addition starts- Multiple lines are added, but they all start at position 12 in the NEW file
- Up Context: Position pointing to the end of line 11 (empty line)
- Down Context: None (end of file)
8.4 Complete Transformationβ
Step-by-step application:
- Initial State: Original file with 8 lines
- After Hunk 1 (Edit): 9 lines (added line 2)
- After Hunk 2 (Replacement): Still 9 lines (replaced line 3)
- After Hunk 3 (Replacement): 10 lines (replaced 1 line with 2 lines)
- After Hunk 4 (Edit): 16 lines (added 5 lines)
Key Observations:
-
Line numbers are cumulative: Each hunk's
linefield refers to the NEW file state after all previous hunks have been applied. -
Replacements preserve position: In Hunk 2, the replacement occurs at NEW line 3. The old content was at old line 2, but after Hunk 1 added a line, it moved to NEW line 3. The replacement correctly targets this NEW position.
-
Context dependencies: Each hunk's up_context and down_context ensure the new vertices are inserted in the correct position in the DAG, maintaining the file's logical structure.
-
Mathematical correctness: The DAG remains acyclic and consistent throughout all operations. Each new vertex has well-defined ordering relationships with existing vertices.
8.5 DAG Structure Visualizationβ
After all hunks are applied, the DAG structure looks like:
Vβ (line 1: function greet...)
β Vβ (line 2: const redName...) [NEW from Hunk 1]
β Vβ (line 3: return Hello...) [REPLACED in Hunk 2]
β Vβ (line 4: })
β Vβ
(line 5: empty)
β Vβ (line 6: function main...)
β Vβ (line 7: const name...) [NEW from Hunk 3]
β Vβ (line 8: const message...) [NEW from Hunk 3]
β Vβ (line 9: console.log...)
β Vββ (line 10: })
β Vββ (line 11: empty)
β Vββ (line 12: function getName...) [NEW from Hunk 4]
β ... (rest of getName function)
Each edge represents an ordering constraint: the source vertex must appear before the target vertex in the linear file ordering.
8.6 Why This Model is Correctβ
Theorem 8.1 (Example Correctness): The hunks in this example correctly transform the file because:
- Position Accuracy: Each
linefield correctly identifies the NEW file position after previous hunks - Context Validity: All up_context and down_context positions exist in the DAG at the time of application
- Ordering Consistency: The DAG edges create a valid linear ordering matching the file structure
- Replacement Correctness: Replacements target the same NEW file position for both deletion and addition
Proof: By inspection:
- Hunk 1 adds at line 2, correctly placing it between lines 1 and 3
- Hunk 2 replaces at line 3, correctly removing the old line 3 and inserting the new line 3
- Hunk 3 replaces at line 7, correctly expanding one line into two
- Hunk 4 adds at line 12, correctly appending to the end of the file
All operations maintain DAG consistency and produce the expected final file state.
Referencesβ
libatomic/src/diff/replace.rs- Replacement creation and position calculationlibatomic/src/change/printable.rs- Hunk serialization formatlibatomic/src/change/parse.rs- Hunk parsing logiclibatomic/src/pristine/vertex.rs- Vertex and Position definitionslibatomic/src/apply/vertex.rs- DAG application logic with context validation