Interactive tutorial

Levenshtein Distance, Step by Step

This page explains edit distance in plain language and lets you run a fixed reference implementation like a debugger. Change the inputs, then watch the matrix and the code move together.

What Levenshtein distance measures

Levenshtein distance is the minimum number of single-character edits needed to transform one string into another. It answers questions like, "How many edits turn cat into cut?" and gives a numeric measure of similarity.

Allowed operations and costs

  • Insert a character into the first string (cost 1).
  • Delete a character from the first string (cost 1).
  • Substitute one character for another (cost 0 if they match, else 1).

If you allow only these moves, the distance is the cheapest sequence of edits that turns string A into string B.

Why dynamic programming is used

Trying every possible edit sequence is expensive. Dynamic programming (DP) stores answers to smaller subproblems so each prefix pair is computed once, then reused.

DP matrix definition

Let dp[i][j] be the edit distance between the first i characters of a and the first j characters of b. The final answer is dp[m][n] where m = a.length and n = b.length.

Small DP grid with labeled axes
a \\ b '' c u t
'' 0 1 2 3
c 1 0 1 2
a 2 1 1 2
t 3 2 2 1

Initialization rules

The first row and column compare a string with the empty prefix. It takes i deletions to turn a length-i prefix into empty, and j insertions to build a length-j prefix from empty.

Recurrence formula

For each cell dp[i][j], pick the cheapest of deleting, inserting, or substituting the last character. That minimum becomes the new value in the matrix.


dp[i][j] = min(
  dp[i-1][j] + 1,                      // delete
  dp[i][j-1] + 1,                      // insert
  dp[i-1][j-1] + (a[i-1] == b[j-1] ? 0 : 1)  // substitute
)
          

Animated intuition

The next cell depends on three neighbors. The small animation below highlights the delete, insert, and substitute candidates feeding the minimum.

delete insert substitute
Each step compares three candidate costs and writes the minimum into the current cell.

Debugger-style simulator

Change the inputs, then step through the fixed algorithm. The code listing does not change; only the data does.

Hint: dp[i][j] compares prefixes a[0..i) and b[0..j).
Focus:
Advanced controls
Stop points

Developer checks

Run a small set of correctness checks in your browser console or by clicking the button below.


            

Simulator code (JavaScript)

Line highlights map to the steps shown in the matrix.


            

DP matrix

Current cell and dependencies are highlighted.

Current Dependencies Edit path
Values update as the code advances. Empty cells have not been computed yet.

Watch + explanation

Key variables for the current step.

Step 0 of 0 | Phase: start

The simulator executes the JavaScript above while following the same algorithm as the C99 reference listing below.

Time and space complexity

The DP algorithm fills an (m + 1) x (n + 1) matrix. That is O(mn) time and O(mn) space. There are faster variants for special cases, but the classic method is reliable and easy to reason about.

Common pitfalls

  • Off-by-one errors when mapping indices to characters.
  • Forgetting to initialize the first row or column.
  • Mixing up i/j with character positions.
  • Ignoring how empty strings behave (they are valid inputs).

Examples

cat -> cut has distance 1 (one substitution: a -> u).

kitten -> sitting has distance 3 (substitute k -> s, substitute e -> i, insert g).

Optional edit script

The simulator can show one valid sequence of edits by tracing backward through the matrix. Click "Show edits" once the final answer is reached.

Note: JavaScript strings are indexed by UTF-16 code units. Some emoji or accented characters count as two positions. This simulator follows that standard for consistency with the reference code.

C99 reference implementation

This listing is a clean C99 version of the same algorithm. The simulator uses JavaScript for execution so it can run in the browser, but the structure mirrors this reference.

C99 reference code (read-only)

#include 
#include 

size_t levenshtein(const char *a, const char *b) {
  size_t m = 0;
  size_t n = 0;
  while (a[m] != '\0') m++;
  while (b[n] != '\0') n++;

  size_t **dp = malloc((m + 1) * sizeof(size_t *));
  for (size_t i = 0; i <= m; i++) {
    dp[i] = malloc((n + 1) * sizeof(size_t));
  }

  dp[0][0] = 0;
  for (size_t i = 1; i <= m; i++) dp[i][0] = i;
  for (size_t j = 1; j <= n; j++) dp[0][j] = j;

  for (size_t i = 1; i <= m; i++) {
    for (size_t j = 1; j <= n; j++) {
      size_t cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
      size_t del = dp[i - 1][j] + 1;
      size_t ins = dp[i][j - 1] + 1;
      size_t sub = dp[i - 1][j - 1] + cost;
      size_t min = del < ins ? del : ins;
      if (sub < min) min = sub;
      dp[i][j] = min;
    }
  }

  size_t result = dp[m][n];
  for (size_t i = 0; i <= m; i++) free(dp[i]);
  free(dp);
  return result;
}