How to Compare Two Texts and Find Differences - Complete Guide

EveryTool Editorial

The ability to compare two versions of a text and immediately see what changed is one of the most fundamental operations in software development. Git shows you what changed since your last commit. Code review tools highlight what your colleague modified. Document collaboration platforms track every edit. Under the hood of all of these features is a diff algorithm - and understanding how it works, and how to use diff tools effectively, makes you a more productive developer and a more careful collaborator.

How Text Diff Works

At its core, a text diff algorithm answers a simple question: what is the minimum set of changes needed to transform text A into text B? This is formally known as the shortest edit script problem. The algorithm compares the two texts line by line (and optionally word by word within lines) and identifies which lines are the same in both versions (equal), which exist only in A (deleted), and which exist only in B (inserted). The output is called a diff - short for difference.
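The equal / deleted / inserted classification can be sketched with Python's standard `difflib` module. This is only an illustration: the function name `classify_lines` is made up for this example, and `difflib` uses its own matching heuristic, so its output is a valid edit script but not guaranteed to be the shortest one.

```python
from difflib import SequenceMatcher


def classify_lines(a_text: str, b_text: str) -> list[tuple[str, str]]:
    """Label every line as 'equal', 'delete' (only in A) or 'insert' (only in B)."""
    a_lines = a_text.splitlines()
    b_lines = b_text.splitlines()
    result: list[tuple[str, str]] = []
    matcher = SequenceMatcher(None, a_lines, b_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            result += [("equal", line) for line in a_lines[i1:i2]]
        else:
            # 'delete', 'insert' and 'replace' all reduce to removed A lines
            # followed by added B lines (either slice may be empty).
            result += [("delete", line) for line in a_lines[i1:i2]]
            result += [("insert", line) for line in b_lines[j1:j2]]
    return result


for kind, line in classify_lines("a\nb\nc", "a\nx\nc"):
    print(f"{kind:7} {line}")
```

Here the middle line is reported as one deletion (`b`) followed by one insertion (`x`), while `a` and `c` are equal.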

The Myers Diff Algorithm

The Myers diff algorithm, published by Eugene Myers in 1986, is the default algorithm in Git and the basis of GNU diff and many other diff tools. It works by finding the shortest path through a conceptual edit graph in which a diagonal move represents a line that is equal in both texts, a move to the right represents deleting a line from A, and a move down represents inserting a line from B. The algorithm runs in O((N+M)D) time, where N and M are the lengths of the two texts and D is the size of the shortest edit script. In practice this means it is very fast when the texts are similar (small D), which is the common case when comparing two versions of the same file, and slows down only as the texts diverge.
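To make the role of D concrete, here is a minimal sketch of the greedy search from Myers' paper that computes only the length of the shortest edit script, not the script itself. The function name `myers_edit_distance` is an assumption for this example, and production implementations add refinements (such as a linear-space variant) that are omitted here.

```python
def myers_edit_distance(a: list[str], b: list[str]) -> int:
    """Return D: the number of insertions plus deletions in a shortest edit script."""
    n, m = len(a), len(b)
    max_d = n + m
    # v[k] is the furthest position x in `a` reached so far on diagonal k = x - y.
    v = {1: 0}
    for d in range(max_d + 1):
        for k in range(-d, d + 1, 2):
            if k == -d or (k != d and v[k - 1] < v[k + 1]):
                x = v[k + 1]        # arrive from diagonal k+1: insert a line from b
            else:
                x = v[k - 1] + 1    # arrive from diagonal k-1: delete a line from a
            y = x - k
            # Follow the diagonal for free while the lines are equal.
            while x < n and y < m and a[x] == b[y]:
                x += 1
                y += 1
            v[k] = x
            if x >= n and y >= m:
                return d
    return max_d


print(myers_edit_distance(list("ABCABBA"), list("CBABAC")))
```

The outer loop tries D = 0, 1, 2, ... and stops at the first value that reaches the end of both texts, which is why the running time depends on how different the inputs are rather than only on their size. For the `ABCABBA` / `CBABAC` pair used as the example in Myers' paper, the result is 5.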

Git uses the Myers diff algorithm by default. You can see it in action with `git diff`. The `--word-diff` flag enables word-level highlighting - the same feature that this tool provides automatically.

Unified Diff vs Side-by-Side Diff

  • **Unified diff** - the traditional format used by Git and patch files. Shows both versions in a single column. Deleted lines are prefixed with `-` and inserted lines with `+`. Compact and easy to copy/share but requires more mental processing to visualize the change.
  • **Side-by-side diff** - shows the original and modified versions in two parallel, aligned columns. Much easier to compare visually because corresponding lines sit next to each other. Offered as an alternative view by GitHub, GitLab, and most modern code review tools.
  • **Inline diff** - shows only the changed sections, collapsing identical parts. Best for quickly scanning what changed without reading through unchanged content.
  • **Word-level diff** - a secondary diff run on individual words within changed lines. Shows exactly which words were added or removed. Essential for reviewing prose edits and spotting small changes in long lines.
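As a rough illustration of the word-level mode, the sketch below diffs two versions of a single line word by word using `difflib`. The function name `word_diff` is hypothetical, and the `[-removed-]` / `{+added+}` markers are loosely modelled on the plain output of `git diff --word-diff`; how any particular tool tokenizes words is not specified here, so splitting on whitespace is an assumption.

```python
from difflib import SequenceMatcher


def word_diff(old_line: str, new_line: str) -> str:
    """Mark removed words as [-word-] and added words as {+word+}."""
    old_words = old_line.split()   # assumption: words are whitespace-separated
    new_words = new_line.split()
    out: list[str] = []
    matcher = SequenceMatcher(None, old_words, new_words, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.extend(old_words[i1:i2])
        else:
            out.extend(f"[-{w}-]" for w in old_words[i1:i2])
            out.extend(f"{{+{w}+}}" for w in new_words[j1:j2])
    return " ".join(out)


print(word_diff("the quick brown fox", "the slow brown fox"))
# the [-quick-] {+slow+} brown fox
```

A line-level diff would report the whole sentence as deleted and re-inserted; the word-level pass narrows that down to the one word that changed.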

Comparing JSON, Config Files, and Code

Different content types benefit from different comparison strategies. For JSON, the most useful comparison normalizes formatting first - pretty-printing both sides with consistent indentation - so that differences in whitespace do not create noise. For code, syntax-aware diffing can be more meaningful than raw line comparison. For config files like YAML, TOML, or .env, the key concern is usually spotting added or changed keys rather than formatting. For prose documents, word-level diffing is more valuable than line-level diffing: a paragraph is often stored as one long line, so editing a single word makes the entire line show up as changed.
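The JSON strategy can be sketched as follows: parse both inputs, re-serialize them with the same indentation, then run an ordinary line diff. The function name `normalized_json_diff` and the file labels are assumptions for illustration. Sorting keys is an extra choice made here so that key order is also treated as noise; leave `sort_keys` out if order matters to you.

```python
import json
from difflib import unified_diff


def normalized_json_diff(a_json: str, b_json: str) -> list[str]:
    """Diff two JSON documents after pretty-printing both the same way."""
    def normalize(text: str) -> list[str]:
        # Same indentation and key order on both sides removes formatting noise.
        return json.dumps(json.loads(text), indent=2, sort_keys=True).splitlines()

    return list(unified_diff(normalize(a_json), normalize(b_json),
                             "a.json", "b.json", lineterm=""))


minified = '{"name":"demo","port":8080}'
pretty = '{\n    "port": 8081,\n    "name": "demo"\n}'
print("\n".join(normalized_json_diff(minified, pretty)))
```

Only the `port` line appears as changed, even though the two inputs share no identical raw line.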

Reading a Diff Effectively

  • **Red / minus lines** - these lines existed in the original but are gone in the modified version. They were deleted.
  • **Green / plus lines** - these lines are new in the modified version. They were inserted.
  • **Unchanged lines** - context lines shown in both versions to help you understand where the change sits.
  • **Word highlights within lines** - darker red words were removed from that line, darker green words were added. These secondary highlights tell you precisely what changed within a line that was largely preserved.
  • **Change blocks** - consecutive changed lines are grouped into a change block. Each block represents one logical edit.

A diff showing many changed lines does not necessarily mean a large change - it might just mean whitespace was reformatted. Always check the Ignore whitespace option before concluding that a lot changed.
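A quick way to test for a whitespace-only change is to normalize whitespace on both sides and compare again. The sketch below is one possible definition, not how any specific tool implements its Ignore whitespace option: it collapses runs of whitespace, trims line ends, and drops blank lines. The function name `only_whitespace_changed` is hypothetical.

```python
def only_whitespace_changed(a_text: str, b_text: str) -> bool:
    """True if the texts differ, but only in spacing, indentation or blank lines."""
    def squash(text: str) -> list[str]:
        # Collapse whitespace runs inside each line and skip blank lines.
        return [" ".join(line.split()) for line in text.splitlines() if line.strip()]

    return a_text != b_text and squash(a_text) == squash(b_text)


print(only_whitespace_changed("x = 1\n    y = 2\n", "x  =  1\n\ny = 2\n"))  # True
print(only_whitespace_changed("x = 1\n", "x = 2\n"))                        # False
```

Keep in mind that in whitespace-sensitive formats such as Python or YAML, an indentation change can alter meaning, so a whitespace-only diff is not automatically a harmless one.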

Practical Diff Workflows

  • Before deploying a config change: diff the current production config against the new one to verify only intended changes are present
  • During code review: use a diff tool to understand what a PR actually changes before approving
  • When debugging a regression: diff the last known-good version of a file against the current broken version
  • When merging document edits: use a diff tool to see what your collaborator changed before accepting their version
  • For data validation: diff two CSV exports to verify a migration or transformation produced the expected result
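For the CSV validation workflow, a plain line diff can be noisy when rows are reordered, so one alternative is to compare rows by a key column. The sketch below assumes both exports have a header row and a unique key column; the function name `diff_csv_by_key` and the sample data are made up for illustration.

```python
import csv
import io


def diff_csv_by_key(old_csv: str, new_csv: str, key: str) -> dict[str, list[str]]:
    """Report which keys were added, removed, or changed between two CSV texts."""
    def load(text: str) -> dict[str, dict[str, str]]:
        # Index every row by its value in the key column.
        return {row[key]: row for row in csv.DictReader(io.StringIO(text))}

    old_rows, new_rows = load(old_csv), load(new_csv)
    shared = old_rows.keys() & new_rows.keys()
    return {
        "added": sorted(new_rows.keys() - old_rows.keys()),
        "removed": sorted(old_rows.keys() - new_rows.keys()),
        "changed": sorted(k for k in shared if old_rows[k] != new_rows[k]),
    }


before = "id,name\n1,Ann\n2,Bob\n"
after = "id,name\n2,Rob\n1,Ann\n3,Cy\n"
print(diff_csv_by_key(before, after, key="id"))
# {'added': ['3'], 'removed': [], 'changed': ['2']}
```

Row 1 moved but is not reported, because only its position changed, not its content.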

Frequently Asked Questions

What is a unified diff format?

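Unified diff is the standard format used by Git and patch files. Deleted lines are prefixed with `-`, inserted lines with `+`, and context lines with a space. Each block of changes, called a hunk, starts with a `@@ -start,count +start,count @@` header showing where the hunk begins and how many lines it covers in the original file (the `-` pair) and in the modified file (the `+` pair).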
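The format is easy to see by generating one. The sketch below uses `difflib.unified_diff` from Python's standard library and a small regular expression to read the header numbers back. The names `HUNK_HEADER` and `parse_hunk_header` and the file labels are assumptions for this example. The regex treats an omitted count as 1, which is how the format abbreviates single-line ranges.

```python
import re
from difflib import unified_diff

HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_header(line: str) -> tuple[int, int, int, int]:
    """Return (old_start, old_count, new_start, new_count) from an @@ header."""
    match = HUNK_HEADER.match(line)
    if match is None:
        raise ValueError(f"not a hunk header: {line!r}")
    old_start, old_count, new_start, new_count = match.groups()
    # A missing count means the range covers exactly one line.
    return (int(old_start), int(old_count or 1), int(new_start), int(new_count or 1))


old = ["alpha", "beta", "gamma"]
new = ["alpha", "BETA", "gamma", "delta"]
diff = list(unified_diff(old, new, "old.txt", "new.txt", lineterm=""))
print("\n".join(diff))
# --- old.txt
# +++ new.txt
# @@ -1,3 +1,4 @@
#  alpha
# -beta
# +BETA
#  gamma
# +delta
print(parse_hunk_header(diff[2]))  # (1, 3, 1, 4)
```

The header reads: the hunk covers 3 lines starting at line 1 of the original and 4 lines starting at line 1 of the modified file.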

How do I compare two JSON files?

Use a diff tool with JSON mode enabled. This normalizes both JSONs to the same formatting before comparing so you see real data differences instead of whitespace noise. Without normalization, a pretty-printed vs minified version of the same JSON looks completely different.

What does similarity percentage mean in diff tools?

It measures how much of the content is shared between the two texts. 100% means identical. 0% means no overlap. A score of 85% means the texts are mostly the same with small changes - useful for quickly gauging how much was changed. The exact formula varies from tool to tool, so scores from different tools are not directly comparable.
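One common way to compute such a score, shown here only as an example since each tool defines its own, is the ratio from Python's `difflib`: twice the number of matching lines divided by the total number of lines in both texts. The function name `similarity_percent` is hypothetical.

```python
from difflib import SequenceMatcher


def similarity_percent(a_text: str, b_text: str) -> float:
    """Line-based similarity: 2 * matching lines / total lines, as a percentage."""
    a_lines, b_lines = a_text.splitlines(), b_text.splitlines()
    ratio = SequenceMatcher(None, a_lines, b_lines, autojunk=False).ratio()
    return round(100 * ratio, 1)


print(similarity_percent("a\nb\nc\nd", "a\nb\nc\nx"))  # 75.0
```

Three of the four lines match on each side, so the score is 2 * 3 / 8 = 75%.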

Can I use diff to compare binary files?

Standard text diff does not work well for binary files because they are not line-oriented. Specialized tools exist for binary diffing. For common formats like PDFs or Word documents, extract the text first, then run a text diff on the result.

What is the difference between diff and merge?

Diff compares two versions and shows differences. Merge combines three versions - original, version A, and version B - and produces a single merged output incorporating changes from both. Merge tools flag conflicts where A and B changed the same section differently and require manual resolution.
