Every code review starts with a diff. Green lines added, red lines removed. It looks simple, but the algorithm that decides which lines to mark as changed, and how to group them, is solving a surprisingly subtle computer science problem.

Understanding how diff works helps you write better commits, review code faster, and troubleshoot confusing merge conflicts.

1. The Core Problem: Longest Common Subsequence

At its heart, a diff algorithm finds the Longest Common Subsequence (LCS) of two sequences, here the lines of two files. The LCS is the longest sequence of lines that appears in both files in the same order, without needing to be contiguous.

Consider two versions of a file:

Version A:        Version B:
1. import foo     1. import foo
2. import bar     2. import baz
3.                3. import bar
4. function x()   4.
5.   return 1     5. function x()
6. }              6.   return 2
                  7. }

The LCS here is five lines long: import foo, import bar, the blank line, function x(), and }. Everything not in the LCS is either an insertion (import baz, return 2) or a deletion (return 1). The diff algorithm's job is to find this LCS as efficiently as possible.

2. The Myers Algorithm: Git's Default

Eugene Myers published his diff algorithm in 1986, and it's still the default in Git. It works by exploring an "edit graph" where moving right represents deleting a line from file A, moving down represents inserting a line from file B, and moving diagonally represents a matching line (no change).

The algorithm finds the shortest edit script: the minimum number of insertions and deletions needed to transform file A into file B. This is what makes Git diffs feel "minimal"—they show the fewest changes possible.
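To make the edit graph concrete, here is a minimal JavaScript sketch of the greedy forward search described in Myers' paper. It is a simplification: it only computes D, the length of the shortest edit script, not the script itself, and the name myersDistance is our own. For each edit count d, it records the furthest x reachable on each diagonal k = x - y, then slides along matching lines for free.

```javascript
// Sketch of Myers' greedy search: returns D, the minimum number of
// line insertions plus deletions that turn `a` into `b`.
// (Hypothetical helper name; computes the distance only, not the edit script.)
function myersDistance(a, b) {
  const n = a.length;
  const m = b.length;
  const max = n + m;
  // v[offset + k] = furthest x reached so far on diagonal k = x - y.
  const offset = max;
  const v = new Array(2 * max + 2).fill(0);

  for (let d = 0; d <= max; d++) {
    for (let k = -d; k <= d; k += 2) {
      let x;
      if (k === -d || (k !== d && v[offset + k - 1] < v[offset + k + 1])) {
        // Step down from diagonal k + 1: insert a line from b.
        x = v[offset + k + 1];
      } else {
        // Step right from diagonal k - 1: delete a line from a.
        x = v[offset + k - 1] + 1;
      }
      let y = x - k;
      // Follow the diagonal while lines match (matches cost nothing).
      while (x < n && y < m && a[x] === b[y]) {
        x++;
        y++;
      }
      v[offset + k] = x;
      if (x >= n && y >= m) return d;
    }
  }
  return max; // not reached: d = n + m always suffices
}

const versionA = ["import foo", "import bar", "", "function x()", "  return 1", "}"];
const versionB = ["import foo", "import baz", "import bar", "", "function x()", "  return 2", "}"];
console.log(myersDistance(versionA, versionB)); // 3
```

The result matches the LCS view: 6 + 7 lines minus twice the 5-line LCS leaves 3 edits.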

Why "Minimal" Matters

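There are often multiple valid minimal diffs for the same change. Imagine two functions separated by one blank line, and you add a second blank line between them. Should the diff show the new blank line right after the first function's closing brace, or right before the second function's opening line? Both versions contain exactly one insertion, so minimality alone cannot decide. Myers returns whichever one its search reaches first, and Git then applies a post-processing pass (the indent heuristic, enabled by default since Git 2.14) that slides ambiguous hunks like this one to the position most likely to read naturally.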

3. Patience Diff: Better for Code

Myers is optimal for minimizing edit distance, but it can produce confusing results when large blocks of code are moved or when files have many similar lines (like closing braces in C-style languages).

Patience diff takes a different approach:

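  1. Find all lines that appear exactly once in each file. These are "anchor" candidates: a line that is unique in both files is almost certainly the same logical line.
  2. Among those unique lines, keep the largest set that appears in the same order in both files (a longest increasing subsequence, computed with the patience-sorting technique that gives the algorithm its name). This establishes a skeleton.
  3. Recursively apply the same procedure to the gaps between anchors, falling back to an ordinary diff when a gap contains no unique lines.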

The result is diffs that align on semantically meaningful lines (function signatures, class definitions) rather than on braces and blank lines. You can enable it in Git:

git diff --diff-algorithm=patience
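Steps 1 and 2 are where patience diff departs from Myers, so the sketch below covers just that part: it finds lines unique to both files and keeps the longest subset that appears in the same order in both. The names countLines and patienceAnchors are hypothetical, and the recursion into gaps (step 3) is omitted to keep it short.

```javascript
// Steps 1-2 of patience diff: return [indexInA, indexInB] pairs for lines
// that occur exactly once in each file, keeping only the longest subset
// whose order agrees in both files. Hypothetical helper names.
function countLines(lines) {
  const counts = new Map();
  for (const line of lines) counts.set(line, (counts.get(line) ?? 0) + 1);
  return counts;
}

function patienceAnchors(a, b) {
  const countA = countLines(a);
  const countB = countLines(b);

  // Step 1: record where each line that is unique in both files sits in b.
  const posInB = new Map();
  b.forEach((line, j) => {
    if (countA.get(line) === 1 && countB.get(line) === 1) posInB.set(line, j);
  });
  // Candidate pairs, in file A order.
  const pairs = [];
  a.forEach((line, i) => {
    if (posInB.has(line)) pairs.push([i, posInB.get(line)]);
  });

  // Step 2: longest increasing subsequence of the b positions,
  // via patience sorting with a binary search over pile tops.
  const tops = []; // tops[k] = index into pairs of the top card of pile k
  const prev = new Array(pairs.length).fill(-1);
  for (let p = 0; p < pairs.length; p++) {
    const j = pairs[p][1];
    let lo = 0;
    let hi = tops.length;
    while (lo < hi) {
      const mid = (lo + hi) >> 1;
      if (pairs[tops[mid]][1] < j) lo = mid + 1;
      else hi = mid;
    }
    if (lo > 0) prev[p] = tops[lo - 1];
    tops[lo] = p;
  }

  // Walk back from the top of the last pile to recover the anchors.
  const anchors = [];
  let cur = tops.length > 0 ? tops[tops.length - 1] : -1;
  while (cur !== -1) {
    anchors.push(pairs[cur]);
    cur = prev[cur];
  }
  return anchors.reverse();
}

const oldFile = ["import foo", "import bar", "", "function x()", "  return 1", "}"];
const newFile = ["import foo", "import baz", "import bar", "", "function x()", "  return 2", "}"];
console.log(JSON.stringify(patienceAnchors(oldFile, newFile)));
// [[0,0],[1,2],[2,3],[3,4],[5,6]]
```

In this tiny example even the blank line and } are unique. In a real source file they repeat many times, so they never become anchors, which is why patience diff stops aligning on braces.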

4. Unified vs. Side-by-Side: Choosing a View

The algorithm produces a list of changes. How you display them is a separate design decision.

Unified Diff

The classic format you see in git diff output and GitHub PRs. Both files are interleaved in a single column, with - for removed lines and + for added lines.

 import foo
+import baz
 import bar

 function x()
-  return 1
+  return 2
 }

Best for: Small, focused changes. Quick scanning. Terminal output. Patch files.
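Once an algorithm has produced a list of operations, rendering the unified view is mostly bookkeeping. The sketch below assumes each operation is an object like { type: "equal" | "delete" | "insert", line } and emits a single hunk covering the whole file, with the @@ -start,count +start,count @@ header that unified diffs use. Git shows three lines of context by default and splits distant changes into separate hunks; that step is left out here, and toUnifiedHunk is a hypothetical name.

```javascript
// Render {type, line} operations ("equal", "delete", "insert") as one
// unified-diff hunk. Hypothetical helper: real tools split the output
// into several hunks, each with a few lines of surrounding context.
function toUnifiedHunk(ops) {
  const prefix = { equal: " ", delete: "-", insert: "+" };
  // Old side = unchanged + deleted lines; new side = unchanged + inserted lines.
  const oldCount = ops.filter((op) => op.type !== "insert").length;
  const newCount = ops.filter((op) => op.type !== "delete").length;
  // By convention an empty side is reported as starting at line 0.
  const oldStart = oldCount === 0 ? 0 : 1;
  const newStart = newCount === 0 ? 0 : 1;
  const header = `@@ -${oldStart},${oldCount} +${newStart},${newCount} @@`;
  return [header, ...ops.map((op) => prefix[op.type] + op.line)].join("\n");
}

console.log(toUnifiedHunk([
  { type: "equal", line: "import foo" },
  { type: "insert", line: "import baz" },
  { type: "equal", line: "import bar" },
]));
// @@ -1,2 +1,3 @@
//  import foo
// +import baz
//  import bar
```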

Side-by-Side Diff

The old file and new file are displayed in parallel columns. Changed lines are aligned horizontally so you can compare them directly.

Best for: Large refactors. Comparing config files. Reviewing changes where context matters (seeing what a line was alongside what it became).
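Aligning changed lines horizontally takes one extra step: pairing each run of deletions with the run of insertions that replaced it. A minimal sketch, using the same assumed { type, line } operation format; toSideBySide is a hypothetical name, and null marks an empty cell.

```javascript
// Turn {type, line} operations into [left, right] rows for a two-column view.
// null means "no line on this side". Hypothetical helper name.
function toSideBySide(ops) {
  const rows = [];
  let deleted = [];
  let inserted = [];
  // Pair a pending run of deletions with the insertions that replaced them.
  const flush = () => {
    const len = Math.max(deleted.length, inserted.length);
    for (let k = 0; k < len; k++) {
      rows.push([deleted[k] ?? null, inserted[k] ?? null]);
    }
    deleted = [];
    inserted = [];
  };
  for (const op of ops) {
    if (op.type === "delete") deleted.push(op.line);
    else if (op.type === "insert") inserted.push(op.line);
    else {
      flush();
      rows.push([op.line, op.line]); // unchanged line appears on both sides
    }
  }
  flush();
  return rows;
}

const rows = toSideBySide([
  { type: "equal", line: "function x()" },
  { type: "delete", line: "  return 1" },
  { type: "insert", line: "  return 2" },
  { type: "equal", line: "}" },
]);
for (const [left, right] of rows) {
  console.log(`${(left ?? "").padEnd(16)}| ${right ?? ""}`);
}
```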

Inline Highlighting

The most useful enhancement to either view is word-level or character-level highlighting within changed lines. Instead of marking the entire line as changed, the diff highlights only the specific characters that differ. This is invaluable when a line has a small typo fix buried in a long string.
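A full character-level diff runs an LCS over the characters of the two lines. A much cheaper approximation, sketched below, trims the longest common prefix and suffix and highlights only what remains in the middle. It handles single edits like typo fixes well but is not a true character diff; the function name and return shape are assumptions for illustration.

```javascript
// Approximate in-line highlighting: strip the common prefix and suffix,
// and treat whatever is left in the middle as the changed region.
// Hypothetical helper; not a full character-level LCS diff.
function highlightChange(oldLine, newLine) {
  let start = 0;
  const maxStart = Math.min(oldLine.length, newLine.length);
  while (start < maxStart && oldLine[start] === newLine[start]) start++;

  let endOld = oldLine.length;
  let endNew = newLine.length;
  // Walk back from the end, never crossing the common prefix.
  while (endOld > start && endNew > start && oldLine[endOld - 1] === newLine[endNew - 1]) {
    endOld--;
    endNew--;
  }
  return {
    prefix: oldLine.slice(0, start),
    removed: oldLine.slice(start, endOld),
    added: newLine.slice(start, endNew),
    suffix: oldLine.slice(endOld),
  };
}

const change = highlightChange("  return 1", "  return 2");
console.log(change.removed, change.added); // 1 2
```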

5. Writing Diff-Friendly Code

Understanding how diffs work helps you write code that produces cleaner code reviews:

  • Trailing commas: In multi-line arrays and objects, a trailing comma means that appending an item shows up as one added line, instead of a modified line (the previous last item gaining its comma) plus an added line.
  • One import per line: Sorting imports alphabetically and putting each on its own line produces minimal diffs when adding or removing dependencies.
  • Separate refactoring from logic changes: A commit that renames a variable across 50 files makes it impossible to spot the one-line bug fix buried inside it. Split them into separate commits.
  • Avoid reformatting in the same commit: Running a code formatter on an entire file in the same commit as a bug fix creates noise. Format first, then fix.
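The trailing-comma advice is easy to check by counting the lines that fall outside the LCS. The sketch below (lcsLength and changedLineCount are hypothetical names) computes that count for the same list written with and without a trailing comma.

```javascript
// Length of the longest common subsequence of two line arrays,
// using a single rolling row of the usual LCS table.
function lcsLength(a, b) {
  let prev = new Array(b.length + 1).fill(0);
  for (let i = 1; i <= a.length; i++) {
    const cur = new Array(b.length + 1).fill(0);
    for (let j = 1; j <= b.length; j++) {
      cur[j] = a[i - 1] === b[j - 1] ? prev[j - 1] + 1 : Math.max(prev[j], cur[j - 1]);
    }
    prev = cur;
  }
  return prev[b.length];
}

// Every line outside the LCS is either deleted (old side) or inserted (new side).
function changedLineCount(oldLines, newLines) {
  return oldLines.length + newLines.length - 2 * lcsLength(oldLines, newLines);
}

const noTrailingBefore = ["const colors = [", '  "red",', '  "green"', "];"];
const noTrailingAfter = ["const colors = [", '  "red",', '  "green",', '  "blue"', "];"];
const trailingBefore = ["const colors = [", '  "red",', '  "green",', "];"];
const trailingAfter = ["const colors = [", '  "red",', '  "green",', '  "blue",', "];"];

console.log(changedLineCount(noTrailingBefore, noTrailingAfter)); // 3
console.log(changedLineCount(trailingBefore, trailingAfter)); // 1
```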

6. Beyond Text: Structured Diffing

Traditional diff algorithms operate on lines of text. They don't understand that your file is JSON, or HTML, or a programming language with syntax. This leads to diffs that are technically correct but semantically confusing.

Structured diff tools parse the file into an AST (Abstract Syntax Tree) and compare the trees instead of the text. This means:

  • Reordering JSON keys doesn't show as a deletion + insertion.
  • Moving a function to a different location shows as a "move," not a delete + add.
  • Changing indentation style doesn't generate any diff at all, at least in languages where whitespace carries no meaning (unlike Python).
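To illustrate the JSON case, here is a small structural comparison sketch. It is far simpler than a real AST diff tool: it parses both documents with JSON.parse, compares objects by key (so key order is ignored) and arrays by index, and reports the path of each differing value. The function name jsonDiff and the $.key path notation are assumptions for illustration.

```javascript
// Compare two parsed JSON values structurally and list what changed.
// Objects are compared by key, arrays by index. Hypothetical helper.
function jsonDiff(a, b, path = "$") {
  if (a === b) return []; // identical primitives (or both null)
  const bothObjects = a !== null && b !== null && typeof a === "object" && typeof b === "object";
  if (!bothObjects || Array.isArray(a) !== Array.isArray(b)) {
    return [{ path, from: a, to: b }];
  }
  const changes = [];
  if (Array.isArray(a)) {
    // Arrays are ordered: compare element by element.
    const len = Math.max(a.length, b.length);
    for (let i = 0; i < len; i++) changes.push(...jsonDiff(a[i], b[i], `${path}[${i}]`));
    return changes;
  }
  // Objects are unordered: compare by key, so reordering keys changes nothing.
  const keys = new Set([...Object.keys(a), ...Object.keys(b)]);
  for (const key of keys) changes.push(...jsonDiff(a[key], b[key], `${path}.${key}`));
  return changes;
}

const before = JSON.parse('{"name": "app", "version": "1.0.0", "tags": ["a"]}');
const after = JSON.parse('{"tags": ["a"], "version": "1.0.1", "name": "app"}');
console.log(JSON.stringify(jsonDiff(before, after)));
// [{"path":"$.version","from":"1.0.0","to":"1.0.1"}]
```

A line-based diff of those two strings would flag every line because the keys moved; the structural comparison reports only the version bump.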

These tools are still maturing, but they point toward code review that compares meaning rather than text.

7. Three-Way Merge: How Git Resolves Conflicts

A two-way diff compares two files — the old version and the new version. A three-way merge compares three: the common ancestor, your version, and their version. This is how git handles merging two branches that have diverged from a shared commit.

     Base (common ancestor)
        |         |
   Your changes  Their changes
        |         |
        v         v
      Merged result (or conflict)

If you and a colleague both modified the same lines in different ways, git cannot automatically determine which version is correct, so it marks the conflict and asks you to resolve it. If you modified different regions of the file, git can merge both changes without conflict, although edits to directly adjacent lines are still reported as a conflict. If only one side modified a region, that change is applied automatically.
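The decision rule in that paragraph can be sketched line by line. To stay short, this JavaScript sketch assumes no lines were inserted or deleted, so line i in each version corresponds to line i in the others. Git's real merge first aligns the three versions with diffs and groups changes into hunks, which this skips. The function name mergeAligned and the ours/theirs marker labels are illustrative.

```javascript
// Line-by-line three-way merge, ASSUMING all three versions have the same
// number of lines (no insertions or deletions). Hypothetical helper.
function mergeAligned(base, ours, theirs) {
  if (ours.length !== base.length || theirs.length !== base.length) {
    throw new Error("sketch assumes all three versions have the same line count");
  }
  const merged = [];
  const conflictLines = [];
  for (let i = 0; i < base.length; i++) {
    const [b, o, t] = [base[i], ours[i], theirs[i]];
    if (o === t) {
      merged.push(o); // unchanged, or both sides made the same edit
    } else if (o === b) {
      merged.push(t); // only their side changed this line
    } else if (t === b) {
      merged.push(o); // only our side changed this line
    } else {
      // Both sides changed the line differently: emit conflict markers.
      conflictLines.push(i);
      merged.push("<<<<<<< ours", o, "=======", t, ">>>>>>> theirs");
    }
  }
  return { merged, conflictLines };
}

const result = mergeAligned(
  ["a", "b", "c"],
  ["A", "b", "c"], // we edited line 0
  ["a", "b", "C"], // they edited line 2
);
console.log(result.merged.join(",")); // A,b,C
```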

The conflict markers git inserts look like this:

<<<<<<< HEAD
function greet(name) {
    return `Hello, ${name}!`;
=======
function greet(name, greeting = "Hello") {
    return `${greeting}, ${name}!`;
>>>>>>> feature/customizable-greeting

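The lines between <<<<<<< HEAD and ======= are your version (the branch you have checked out). The lines between ======= and >>>>>>> are their version, labeled with the name of the branch being merged in. You resolve the conflict by editing the file to contain the desired final state, removing the markers, and then staging the resolved file with git add.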
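Because the markers follow a fixed layout, tooling can pull the two sides apart mechanically. A hedged sketch, assuming markers sit on their own lines exactly as shown above and conflicts are not nested; parseConflicts and its field names are hypothetical.

```javascript
// Extract both sides of each conflict from text containing git's default
// conflict markers. Hypothetical helper; it does not handle the diff3
// style, which adds a "|||||||" section showing the base version.
function parseConflicts(text) {
  const conflicts = [];
  let current = null;
  let side = null; // null outside a conflict, otherwise "ours" or "theirs"
  for (const line of text.split("\n")) {
    if (side === null && line.startsWith("<<<<<<< ")) {
      current = { oursLabel: line.slice(8), ours: [], theirs: [], theirsLabel: "" };
      side = "ours";
    } else if (side === "ours" && line === "=======") {
      side = "theirs";
    } else if (side === "theirs" && line.startsWith(">>>>>>> ")) {
      current.theirsLabel = line.slice(8);
      conflicts.push(current);
      side = null;
    } else if (side !== null) {
      current[side].push(line);
    }
  }
  return conflicts;
}

const conflicted = [
  "<<<<<<< HEAD",
  "function greet(name) {",
  "=======",
  'function greet(name, greeting = "Hello") {',
  ">>>>>>> feature/customizable-greeting",
].join("\n");
const [first] = parseConflicts(conflicted);
console.log(first.oursLabel, "vs", first.theirsLabel); // HEAD vs feature/customizable-greeting
```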

8. Writing a Minimal Diff in JavaScript

The best way to understand a diff algorithm from first principles is to implement one. Here is a minimal, readable implementation of a line-by-line diff using the Longest Common Subsequence approach, the conceptual core of line-based diff algorithms:

function simpleDiff(oldLines, newLines) {
    const m = oldLines.length, n = newLines.length;

    // Build LCS table
    const dp = Array.from({length: m + 1}, () => new Array(n + 1).fill(0));
    for (let i = 1; i <= m; i++) {
        for (let j = 1; j <= n; j++) {
            if (oldLines[i-1] === newLines[j-1]) {
                dp[i][j] = dp[i-1][j-1] + 1;
            } else {
                dp[i][j] = Math.max(dp[i-1][j], dp[i][j-1]);
            }
        }
    }

    // Traceback from the bottom-right corner to produce diff output.
    // Entries are collected back to front and reversed once at the end,
    // which avoids the quadratic cost of repeated unshift() calls.
    const result = [];
    let i = m, j = n;
    while (i > 0 || j > 0) {
        if (i > 0 && j > 0 && oldLines[i-1] === newLines[j-1]) {
            result.push({ type: 'equal', line: oldLines[i-1] });
            i--; j--;
        } else if (j > 0 && (i === 0 || dp[i][j-1] >= dp[i-1][j])) {
            result.push({ type: 'insert', line: newLines[j-1] });
            j--;
        } else {
            result.push({ type: 'delete', line: oldLines[i-1] });
            i--;
        }
    }
    return result.reverse();
}

const diff = simpleDiff(
    ["Hello", "World"],
    ["Hello", "JavaScript", "World"]
);
// [
//   { type: "equal", line: "Hello" },
//   { type: "insert", line: "JavaScript" },
//   { type: "equal", line: "World" }
// ]

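This implementation is clear, but it takes O(mn) time and memory for files of m and n lines, which is too slow for large files. Production algorithms do better in practice: Myers runs in O((m+n)·D) time, where D is the size of the edit script, so it is fast when the two files are similar, and Git's histogram algorithm extends the patience approach. The core logic is the same, though: find the longest sequence of lines the two files share, and everything else is either an insertion or a deletion.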

Frequently Asked Questions

Why does git sometimes show a change as a deletion + addition instead of a modification?

Text diff algorithms operate at the line level by default — they see whole lines as atomic units. If a line changed, the algorithm sees it as: the old line was deleted, and a new line was inserted. There is no "modified line" concept in the output. Some diff tools add a second pass that highlights the specific characters within a changed line (word-level or character-level diff), but the underlying diff operation is still delete + insert. Git's --word-diff flag enables word-level highlighting within changed lines.

How does git handle binary files in diffs?

Git does not attempt to diff binary files by default — it reports "Binary files X and Y differ" and shows the change as a single binary blob modification. For images, PDFs, and other binary formats, this is usually the right behaviour since line-based diffs are meaningless. You can configure git to use external diff drivers (e.g., ExifTool for images, or custom scripts for proprietary formats) via the .gitattributes file, which enables meaningful diffs for specific binary types.
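As an illustration, the Pro Git book's example for images maps PNG files to a diff driver (named exif here, an arbitrary label) whose textconv command is ExifTool, so git diffs the metadata text that ExifTool prints. This assumes exiftool is installed and on your PATH.

```
# In .gitattributes:
*.png diff=exif

# Then, in your shell:
git config diff.exif.textconv exiftool
```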

Conclusion

Diff algorithms are a fascinating intersection of theoretical computer science and practical developer tooling. Myers gives you minimal diffs, Patience gives you semantic diffs, and the display format you choose affects how quickly reviewers can understand your changes.

Need to compare two files or API responses? Use our Diff Checker to see changes highlighted side-by-side or in unified view, with character-level diffing.