🔁 Seminal Paper of the Week — a foundational classic chosen to anchor the May 17–23, 2026 weekly digest. Area rotated to Databases this week.

Authors: C. Mohan, Don Haderle, Bruce Lindsay, Hamid Pirahesh, Peter Schwarz (IBM Almaden, 1992)
Venue: ACM Transactions on Database Systems, Vol. 17, No. 1
DOI: 10.1145/128765.128770

TL;DR

ARIES is the reference recovery algorithm for disk-based transactional systems. It combines write-ahead logging (WAL), steal + no-force buffer management, physiological logging, and a three-pass restart (Analysis → Redo → Undo) with compensation log records (CLRs) that make undo itself crash-safe. The result: fine-granularity record-level locking, partial rollbacks via savepoints, and a recovery story that scales to multi-GB buffer pools without the prior generation’s restrictions. Three decades later, DB2 and SQL Server still follow ARIES closely, and the WAL-plus-LSN skeleton it standardized shows up, in modified form, in Postgres and in the storage layers of most cloud OLTP systems (Aurora, AlloyDB, Spanner’s tablet servers, CockroachDB’s Pebble, Yugabyte).

Why the paper still matters

Phil Bernstein’s retrospective in this same week’s digest nominates ARIES as the canonical example of algorithmic optimization of an existing mechanism: the kind of work that quieted the transaction-research community in the early 1990s by appearing to close the problem. The “quiet” lasted maybe ten years before cloud computing reopened it, but the WAL + redo/undo skeleton ARIES standardized never went away. It got distributed (Aurora’s “log is the database”), partitioned (Spanner’s per-tablet logs replicated by Paxos), and tiered into object storage (Socrates and other cloud-native designs), while the basic algorithm inside each unit stayed recognizably ARIES.

If you’re building anything durable — a database, a transactional cache, a workflow engine, an event-sourced log — understanding ARIES is table stakes. It’s also a surprisingly short read once you know what to look for.

The setup: what ARIES needs to support

The paper’s introduction lays out the requirements that distinguish ARIES from its predecessors:

  • Fine-granularity locking (record-level, not page-level), so two transactions can update different records on the same page concurrently.
  • Partial rollbacks via savepoints — a transaction must be able to undo back to an intermediate point and continue.
  • Steal buffer management — the buffer manager may flush dirty uncommitted pages to disk to free memory.
  • No-force buffer management — committed pages need not be flushed to disk before the commit returns; only the log must be durable.
  • Operation logging — log records describe logical operations (e.g. “insert tuple in slot 7”), not full before-images of pages.

The combination of steal + no-force + record-level locking is what makes high throughput possible, and it’s also what makes recovery hard. Steal means uncommitted updates can be on disk at crash time (so undo is needed). No-force means committed updates may not yet be on disk at crash time (so redo is needed). Record-level locking means undo can’t just restore whole pages (something else may have committed on the same page in the meantime).
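
The policy-to-pass mapping in the paragraph above fits in a few lines. This is only an illustrative sketch; the function name `recovery_needs` is made up for this note and does not come from the paper.

```python
def recovery_needs(steal: bool, force: bool) -> dict:
    """Which restart work a buffer-management policy makes necessary."""
    return {
        # Steal lets uncommitted changes reach disk, so they may need undoing.
        "undo": steal,
        # No-force lets a commit return before its pages reach disk,
        # so committed changes may need redoing.
        "redo": not force,
    }

for steal in (False, True):
    for force in (True, False):
        needs = recovery_needs(steal, force)
        print(f"steal={steal!s:<5} force={force!s:<5} -> {needs}")
```

ARIES sits in the steal=True, force=False corner, the only one that needs both passes.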

The five invariants ARIES enforces

Everything in the algorithm flows from a small set of structural invariants:

1. Write-Ahead Logging (WAL)

A log record describing a change is forced to stable storage before the corresponding data page can be flushed. This is the single invariant that makes crash recovery possible: at recovery time the log is authoritative.
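
A minimal sketch of how a buffer manager might enforce the rule, assuming a toy in-memory log whose LSNs are simply 1, 2, 3, and so on. The `Log`, `Page`, and `write_page_to_disk` names are hypothetical, and `disk` is a plain dict standing in for stable storage.

```python
from dataclasses import dataclass, field

@dataclass
class Log:
    records: list = field(default_factory=list)
    flushed_lsn: int = 0          # highest LSN known to be on stable storage

    def append(self, payload) -> int:
        self.records.append(payload)
        return len(self.records)  # toy LSN: position in the log

    def flush(self, up_to_lsn: int) -> None:
        # Pretend to force the log tail to disk up to the given LSN.
        self.flushed_lsn = max(self.flushed_lsn, min(up_to_lsn, len(self.records)))

@dataclass
class Page:
    page_id: int
    page_lsn: int = 0
    data: dict = field(default_factory=dict)

def write_page_to_disk(page: Page, log: Log, disk: dict) -> None:
    # WAL rule: the log must be durable up to page_lsn before the page goes out.
    if log.flushed_lsn < page.page_lsn:
        log.flush(page.page_lsn)
    assert log.flushed_lsn >= page.page_lsn
    disk[page.page_id] = (page.page_lsn, dict(page.data))

log, disk = Log(), {}
page = Page(page_id=1)
page.page_lsn = log.append(("set", 1, "k", "v"))  # log first, in memory
page.data["k"] = "v"                              # then change the page
write_page_to_disk(page, log, disk)               # forces the log, then writes
```

The page write never overtakes the log: whatever reaches `disk` is already described by a durable log record.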

2. Per-page LSN

Every page header carries the Log Sequence Number of the last log record that updated it. This is the magic ingredient that makes redo idempotent: during recovery, ARIES inspects each page’s pageLSN before re-applying a log record, and skips re-application if the page already reflects it (or anything later). Crashes during recovery are therefore safe — restart just runs the same algorithm again.
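
The gate matters most for operations that are not idempotent on their own. The sketch below uses a hypothetical "add delta to balance" record; the dict layout is an assumption for illustration, not the paper's page format.

```python
def redo_if_needed(page: dict, record: dict) -> bool:
    """Re-apply `record` unless the page already reflects it."""
    if page["page_lsn"] >= record["lsn"]:
        return False                      # page is at or past this record
    page["balance"] += record["delta"]    # applying this twice would be wrong
    page["page_lsn"] = record["lsn"]      # stamp the page with the record's LSN
    return True

page = {"page_lsn": 0, "balance": 100}
record = {"lsn": 10, "delta": 5}

first = redo_if_needed(page, record)    # applied
second = redo_if_needed(page, record)   # skipped: pageLSN is already 10
print(first, second, page)  # True False {'page_lsn': 10, 'balance': 105}
```

Without the pageLSN check, a second restart would add the delta again; with it, replaying the same record any number of times leaves the page unchanged.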

3. Physiological logging

Log records are logical within a page (e.g., “insert tuple at slot 7 with these bytes”) but physical across pages (they name the specific page they affect). This is the sweet spot — small log records, fast redo, no whole-page snapshots — and it’s what lets record-level locking coexist with page-level I/O.
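
To make "physical across pages, logical within a page" concrete, here is a toy slotted page. The record names the page and the slot, never a byte offset, so redo still works when the page's internal layout differs from what it was at run time. `InsertRecord` and `SlottedPage` are hypothetical names for this sketch.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class InsertRecord:
    lsn: int
    page_id: int     # physical part: the exact page affected
    slot: int        # logical part: a slot number, not a byte offset
    payload: bytes

class SlottedPage:
    def __init__(self, size: int = 64):
        self.page_lsn = 0
        self.heap = bytearray(size)
        self.free = size          # tuples are packed from the end of the page
        self.slots = {}           # slot -> (offset, length)

    def insert(self, slot: int, payload: bytes) -> None:
        self.free -= len(payload)
        self.heap[self.free:self.free + len(payload)] = payload
        self.slots[slot] = (self.free, len(payload))

    def read(self, slot: int) -> bytes:
        offset, length = self.slots[slot]
        return bytes(self.heap[offset:offset + length])

def redo(pages: dict, rec: InsertRecord) -> None:
    page = pages[rec.page_id]
    if page.page_lsn < rec.lsn:           # pageLSN gate
        page.insert(rec.slot, rec.payload)
        page.page_lsn = rec.lsn

pages = {3: SlottedPage()}
pages[3].insert(0, b"bob")                # existing tuple shifts the free pointer
rec = InsertRecord(lsn=10, page_id=3, slot=7, payload=b"alice")
redo(pages, rec)
redo(pages, rec)                          # second call is a no-op
```

The log record carries only the tuple bytes plus a page ID and slot number; where the bytes land inside the page is decided by the page code at redo time.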

4. Three-pass restart

Recovery walks the log three times: once forward to work out what was happening at the time of the crash, once forward to repeat history exactly, and once backward to undo the losers. The order (Analysis, then Redo, then Undo) is the most important sequencing decision in the paper, and it took years of community debate before it was accepted as the right one.

5. Compensation Log Records (CLRs)

Every undo step is itself logged with a CLR. CLRs carry an UndoNxtLSN field that points to the next log record still to be undone for the transaction. This makes undo redo-only on a recrash: if the system crashes mid-undo, the next restart re-runs Analysis + Redo (which replays the CLRs already written), then resumes undo from wherever the last CLR pointed. Undo never has to be undone.

The three-pass restart algorithm

This is the algorithmic heart of the paper. The diagram below summarizes the structure; the prose below it walks each pass.

ARIES three-pass restart: Analysis runs forward from the last checkpoint to rebuild the dirty-page and transaction tables, Redo runs forward from the earliest dirty-page recLSN to replay all logged updates (idempotently, gated by pageLSN), and Undo runs backward over losers, writing CLRs so undo itself is recoverable.

Pass 1 — Analysis (forward, from the last checkpoint)

The analysis pass starts at the most recent checkpoint and scans the log forward to the end. Its job is to reconstruct two tables that the checkpoint captured but that have evolved since:

  • The Transaction Table — every active transaction at crash time, with the LSN of its most recent log record. Transactions that committed before the crash are removed (they’re winners); the survivors are the losers that Undo must roll back.
  • The Dirty Page Table — every page modified since it was last written to disk, with the LSN of the earliest log record that dirtied it (recLSN). The minimum recLSN across this table is the RedoLSN — the log position where Pass 2 must begin.

Crucially, Analysis does not modify any pages. It is read-only on the log, and its output is the two reconstructed tables plus the RedoLSN.
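
A sketch of the Analysis pass as described above. It is simplified on purpose: the `Rec` layout is hypothetical, a transaction is dropped as soon as its commit record is seen, and the checkpoint is a single record holding both tables.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Rec:
    lsn: int
    type: str                            # "update", "clr", "commit", "checkpoint"
    txn: Optional[int] = None
    page: Optional[int] = None
    txn_table: Optional[dict] = None     # checkpoint only: txn -> lastLSN
    dirty_pages: Optional[dict] = None   # checkpoint only: page -> recLSN

def analysis(log, checkpoint_lsn):
    start = next(i for i, r in enumerate(log) if r.lsn == checkpoint_lsn)
    txn_table = dict(log[start].txn_table)
    dirty_pages = dict(log[start].dirty_pages)
    for r in log[start + 1:]:            # forward scan; no pages are touched
        if r.type in ("update", "clr"):
            txn_table[r.txn] = r.lsn                 # track lastLSN per txn
            dirty_pages.setdefault(r.page, r.lsn)    # keep the earliest recLSN
        elif r.type == "commit":
            txn_table.pop(r.txn, None)               # winner: nothing to undo
    redo_lsn = min(dirty_pages.values(), default=None)
    return txn_table, dirty_pages, redo_lsn

log = [
    Rec(10, "update", txn=1, page=5),
    Rec(20, "checkpoint", txn_table={1: 10}, dirty_pages={5: 10}),
    Rec(30, "update", txn=2, page=7),
    Rec(40, "update", txn=1, page=5),
    Rec(50, "commit", txn=1),
    Rec(60, "update", txn=2, page=9),
]
losers, dirty_pages, redo_lsn = analysis(log, checkpoint_lsn=20)
print(losers, dirty_pages, redo_lsn)  # {2: 60} {5: 10, 7: 30, 9: 60} 10
```

Transaction 2 is the only loser, and RedoLSN comes out as 10, earlier than the checkpoint at LSN 20, because page 5 was already dirty when the checkpoint was taken.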

Pass 2 — Redo (forward, from RedoLSN)

The redo pass starts at RedoLSN — which can be earlier than the checkpoint, because some dirty pages from before the checkpoint may not have been flushed — and re-applies every logged update, including updates by transactions that will later be undone. This is the “repeat history” doctrine, and it is the most subtle design choice in the paper. The reason it works:

  • Redo is idempotent because of per-page LSNs: before re-applying log record L, ARIES checks pageLSN ≥ L.LSN and skips if so.
  • Replaying loser updates means that after Redo, the database is restored to the exact state it had at crash time — including dirty uncommitted writes that the steal policy allowed onto disk.
  • Undo can then operate from a known state, using the regular logical-undo operators it would use for an in-flight transaction rollback at runtime. It does not need a special “recover” path.

This unified treatment — recovery undo is the same as runtime rollback undo — is what allows record-level locking to coexist with crash recovery. The prior generation’s recovery algorithms had to take page-level locks during restart precisely because they could not repeat history.
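
The Redo pass can be sketched as below, assuming dict-shaped log records with a hypothetical `key`/`after` payload. Two details go beyond the text above but follow standard ARIES practice: records for pages absent from the dirty-page table, or older than the page's recLSN, are skipped without reading the page.

```python
def redo_pass(log, dirty_pages, redo_lsn, pages):
    """Repeat history from redo_lsn; returns the LSNs actually re-applied."""
    applied = []
    for r in log:
        if r["lsn"] < redo_lsn or r["type"] not in ("update", "clr"):
            continue
        rec_lsn = dirty_pages.get(r["page"])
        if rec_lsn is None or r["lsn"] < rec_lsn:
            continue                       # change is known to be on disk
        page = pages.setdefault(r["page"], {"page_lsn": 0, "data": {}})
        if page["page_lsn"] >= r["lsn"]:
            continue                       # pageLSN gate: already reflected
        page["data"][r["key"]] = r["after"]   # winners and losers alike
        page["page_lsn"] = r["lsn"]
        applied.append(r["lsn"])
    return applied

log = [
    {"lsn": 10, "type": "update", "txn": 1, "page": 5, "key": "a", "after": 1},
    {"lsn": 30, "type": "update", "txn": 2, "page": 7, "key": "b", "after": 2},
    {"lsn": 40, "type": "update", "txn": 1, "page": 5, "key": "a", "after": 3},
    {"lsn": 50, "type": "commit", "txn": 1},
    {"lsn": 60, "type": "update", "txn": 2, "page": 9, "key": "c", "after": 4},
]
dirty_pages = {5: 10, 7: 30, 9: 60}
pages = {5: {"page_lsn": 10, "data": {"a": 1}}}   # page 5 reached disk at LSN 10

first = redo_pass(log, dirty_pages, 10, pages)    # [30, 40, 60]
second = redo_pass(log, dirty_pages, 10, pages)   # [] (nothing left to do)
```

Transaction 2 never committed, yet its updates at LSNs 30 and 60 are replayed like any others; removing them is left to the Undo pass.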

Pass 3 — Undo (backward, losers only)

The undo pass walks the log backward in LSN order, undoing operations of loser transactions one at a time. For each operation undone, it writes a CLR to the log. The CLR records (a) the logical undo of the operation and (b) the UndoNxtLSN pointing to the next log record of that transaction still to be undone.

Two consequences:

  • A recrash during undo is harmless. The next restart’s Redo pass will replay the CLRs that were written, advancing the losers’ state to where they were when the crash interrupted undo. Undo then resumes from the UndoNxtLSN of the last CLR.
  • Multiple losers can be undone in interleaved LSN order, not transaction-by-transaction. This matters because their operations are interleaved on the same pages, and the backward walk must respect that order.

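When every loser has been undone back to its first log record, Undo is complete and the database is open for new work. (Stopping at a savepoint applies only to partial rollbacks at runtime; at restart, losers are rolled back completely.)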
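
The sketch below shows the Undo loop and a simulated recrash. The record layout (`prev`, `undo_next`, `before`, `after`) is hypothetical, pageLSN updates are omitted for brevity, and `max_steps` exists only to fake a crash partway through.

```python
def undo_pass(log, losers, pages, max_steps=None):
    """losers: txn -> LSN of its last log record. Appends CLRs to log."""
    by_lsn = {r["lsn"]: r for r in log}
    to_undo = dict(losers)
    steps = 0
    while to_undo and (max_steps is None or steps < max_steps):
        txn = max(to_undo, key=to_undo.get)      # largest LSN first
        rec = by_lsn[to_undo[txn]]
        if rec["type"] == "clr":
            nxt = rec["undo_next"]               # jump over work already undone
        else:
            pages[rec["page"]][rec["key"]] = rec["before"]
            clr = {"lsn": log[-1]["lsn"] + 10, "type": "clr", "txn": txn,
                   "page": rec["page"], "key": rec["key"],
                   "after": rec["before"], "undo_next": rec["prev"]}
            log.append(clr)
            by_lsn[clr["lsn"]] = clr
            nxt = rec["prev"]
            steps += 1
        if nxt is None:
            del to_undo[txn]                     # reached the txn's first record
        else:
            to_undo[txn] = nxt
    return to_undo

log = [
    {"lsn": 10, "type": "update", "txn": 2, "prev": None,
     "page": 7, "key": "b", "before": 0, "after": 2},
    {"lsn": 20, "type": "update", "txn": 3, "prev": None,
     "page": 7, "key": "c", "before": 0, "after": 5},
    {"lsn": 30, "type": "update", "txn": 2, "prev": 10,
     "page": 9, "key": "d", "before": 1, "after": 4},
]
pages = {7: {"b": 2, "c": 5}, 9: {"d": 4}}       # state left by Redo

interrupted = undo_pass(log, {2: 30, 3: 20}, pages, max_steps=1)  # "crash"
# After restart, Analysis would report the CLR at LSN 40 as txn 2's last record.
remaining = undo_pass(log, {2: 40, 3: 20}, pages)
clrs = [(r["lsn"], r["undo_next"]) for r in log if r["type"] == "clr"]
```

Three updates produce exactly three CLRs: the update at LSN 30 is undone once, before the simulated crash, and the second run skips straight past it through the CLR's `undo_next` pointer.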

What lives in a log record

The paper specifies the per-record fields with care. The structure is worth memorizing:

| Field | Purpose |
| --- | --- |
| LSN | Monotonically increasing log sequence number (often = byte offset) |
| Type | update / commit / abort / CLR / checkpoint |
| TransID | Owning transaction |
| PrevLSN | Previous log record of the same transaction; a per-transaction backward chain through the log, and what Undo walks |
| PageID | Affected page |
| UndoNxtLSN | (CLRs only) Next log record of this transaction still to undo |
| Data | Redo info and (for non-CLR updates) undo info |

The per-transaction PrevLSN chain is what lets Undo find a transaction’s log records efficiently in reverse without scanning the whole log for each one. Combined with UndoNxtLSN in CLRs, the entire undo trajectory of every transaction is a linked list in the log, and a crash mid-traversal just means starting from a CLR that’s slightly further along.
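
As a sketch, the fields and the linked-list traversal might look like this. The `LogRecord` dataclass mirrors the fields listed above; `pending_undo` is a hypothetical helper, and the log is modelled as a dict keyed by LSN.

```python
from dataclasses import dataclass
from typing import Iterator, Optional

@dataclass(frozen=True)
class LogRecord:
    lsn: int
    type: str                           # "update", "commit", "abort", "CLR", ...
    trans_id: int
    prev_lsn: Optional[int]             # previous record of the same transaction
    page_id: Optional[int] = None
    undo_nxt_lsn: Optional[int] = None  # CLRs only
    data: bytes = b""

def pending_undo(log: dict, last_lsn: Optional[int]) -> Iterator[int]:
    """Yield LSNs of one transaction's updates that still need undoing."""
    lsn = last_lsn
    while lsn is not None:
        rec = log[lsn]
        if rec.type == "CLR":
            lsn = rec.undo_nxt_lsn      # skip everything already compensated
        else:
            if rec.type == "update":
                yield rec.lsn
            lsn = rec.prev_lsn          # follow the per-transaction chain

log = {
    10: LogRecord(10, "update", 1, None, page_id=4),
    20: LogRecord(20, "update", 2, None, page_id=4),   # another transaction
    30: LogRecord(30, "update", 1, 10, page_id=6),
    50: LogRecord(50, "update", 1, 30, page_id=4),
    60: LogRecord(60, "CLR", 1, 50, page_id=4, undo_nxt_lsn=30),  # undoes LSN 50
}
before_clr = list(pending_undo(log, 50))   # [50, 30, 10]
after_clr = list(pending_undo(log, 60))    # [30, 10]
```

Starting from the CLR at LSN 60, the traversal never visits LSN 50 again and never touches transaction 2's record at LSN 20.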

Checkpoints: bounding the work

Without checkpoints, Analysis would have to scan the log from the beginning. ARIES uses fuzzy checkpoints: the checkpoint writes the current dirty-page table and transaction table to the log (the paper brackets them between a begin_chkpt and an end_chkpt record) but does not require any pages to be flushed. Normal processing, including log writes, continues while the checkpoint is taken. At restart, Analysis starts at the most recent complete checkpoint, rebuilds the tables from there, and Redo uses the minimum recLSN as its starting point. The cost of a checkpoint is essentially the cost of writing the two tables to the log; it needs no data-page I/O, however large the buffer pool is.
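
The write side of a fuzzy checkpoint can be sketched as a snapshot of the two tables appended to the log. This collapses the checkpoint into a single record for brevity, and `take_fuzzy_checkpoint` is a hypothetical name.

```python
import copy

def take_fuzzy_checkpoint(log, txn_table, dirty_pages):
    """Append a checkpoint record holding both tables; flush no data pages."""
    lsn = log[-1]["lsn"] + 10 if log else 10
    log.append({
        "lsn": lsn,
        "type": "checkpoint",
        "txn_table": copy.deepcopy(txn_table),      # snapshot, not a reference
        "dirty_pages": copy.deepcopy(dirty_pages),
    })
    return lsn

log = [{"lsn": 10, "type": "update", "txn": 1, "page": 5}]
txn_table = {1: 10}        # txn -> lastLSN
dirty_pages = {5: 10}      # page -> recLSN

ckpt_lsn = take_fuzzy_checkpoint(log, txn_table, dirty_pages)

# Normal processing carries on; the live tables move past the snapshot.
log.append({"lsn": 30, "type": "update", "txn": 1, "page": 8})
txn_table[1] = 30
dirty_pages[8] = 30

snapshot = log[1]
earliest_rec_lsn = min(snapshot["dirty_pages"].values())
```

Page 5 is still dirty after the checkpoint, which is why the earliest recLSN in the snapshot (10) sits before the checkpoint's own LSN (20) and Redo may have to start there.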

Why this design has outlasted everything around it

Two things make ARIES durable in the literal sense:

  1. It separates correctness from performance cleanly. WAL + per-page LSN + repeat-history-then-undo is the correctness skeleton; everything else (checkpoint frequency, log-write batching, group commit, asynchronous I/O, log-structured page stores) is performance tuning around that skeleton. New platforms change the performance tuning but rarely the skeleton.
  2. It makes crash recovery composable. Because recovery undo uses the same logical operators as runtime undo, and because CLRs make undo itself crash-safe, you can layer almost any concurrency-control scheme on top — 2PL, MVCC, snapshot isolation, OCC — without reworking the recovery story.

The current generation of cloud OLTP databases shows this clearly:

  • Aurora famously says “the log is the database”, but its storage layer is still ARIES-shaped: replicated log records with LSNs, per-segment redo to materialize pages on demand. The log-as-source-of-truth idea is older than Aurora (it goes back at least to Hyder, by Bernstein, Reid and Das, CIDR 2011; see the retrospective), but ARIES is the substrate that makes it work.
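  • Spanner / CockroachDB / Yugabyte wrap a WAL-based local store (Pebble, RocksDB, custom) per shard, then replicate the log via Paxos/Raft. The replication layer is new; per-shard recovery is still a forward replay of a write-ahead log, although LSM-based stores such as Pebble and RocksDB replay the log into memory and have no undo pass.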
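  • Postgres is a close relative rather than a copy: its WAL and page LSNs map directly onto the paper, but crash recovery is a redo-only replay from the last checkpoint. There is no undo pass, because MVCC leaves the tuple versions written by aborted transactions in place as invisible rows for vacuum to reclaim.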

Read alongside

  • Fifty Years of Transaction Processing Research — Bernstein’s 2026 retrospective and the perfect modern companion. It explicitly walks the post-ARIES lineage through the cloud era.
  • Transaction Processing: Concepts and Techniques (Gray & Reuter, 1992) — the textbook companion to ARIES; same era, same problem space, more breadth.
  • The ARIES family of follow-ups by Mohan et al.: ARIES/KVL (key-value locking for B+ tree indexes), ARIES/IM (index management with structural modifications), ARIES/CSA (client-server architectures), ARIES/NT (nested transactions). All are worth knowing about; you’ll meet them in the references of any modern transaction paper.
  • ARIES versus the alternatives: repeating history is ARIES’s own paradigm, so the useful comparison is with methods that do not repeat it. The paper’s discussion of selective redo and of other WAL-based recovery methods helps explain why redo-before-undo and physiological logging won out.

📄 ACM Digital Library — ARIES (DOI: 10.1145/128765.128770)


Part of the Weekly CS Paper Digest series. Written from background knowledge of the paper; the diagram above is original.