Verifying integrity

Verification proves — at a chosen moment in time — that no row in a chain has been altered, removed or inserted since it was written.

The service

Drupal\audit_trail\AuditTrailVerifier exposes three core methods for discovery and full verification:

$verifier = \Drupal::service('audit_trail.verifier');

// Discover.
$chain_ids = $verifier->listChains();
// → ['notarial', 'webdav', 'finance', …]

// Per chain.
$result = $verifier->verifyChain('notarial');
// → [
//     'ok' => TRUE,
//     'count' => 4827,
//     'first_broken_id' => NULL,
//     'message' => 'Chain "notarial" verified: 4827 entries intact.',
//   ]

// All chains in one call.
$all = $verifier->verifyAll();
// → ['notarial' => […], 'webdav' => […], …]

verifyChain walks the chain in id order. For each row it checks three layers, in order:

1. Chain link. Confirms that row.previous_hash matches the previous row's hash column. Catches inserted, removed, or reordered rows.
2. Public hash. Recomputes SHA-256(canonicalize(payload)) and compares it with row.hash in constant time. The canonical payload is built from the row's channel, chain, severity, action, resource, context_permanent, context_transient_hash, created, secret_id, and previous_hash columns. The raw context_transient is not part of the canonical payload (only its write-time SHA-256 hash is), which is what lets the cron purge worker NULL the transient column at retention without breaking verification. This layer is publicly verifiable: no secret is required, and anyone with read access to the row can reproduce the check.
3. Operator HMAC. Loads the secret keyed by row.secret_id from the configured SecretRepository (Key-module backend), recomputes HMAC-SHA-256(row.hash, secret), and compares it with row.hmac in constant time (hash_equals). Catches rows inserted directly into the DB by an attacker who has table write access but lacks the signing secret.

The split between layers 2 and 3 is deliberate: layer 2 surfaces tampering even when the row's secret is unavailable (rotated out or deleted), while layer 3 detects unsigned forgeries. A mismatch at layer 1 or 2 surfaces as a structural break; a mismatch at layer 3 (or a missing secret) surfaces as an authentication break. The verifier reports both flags independently on the verdict.
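The three layers can be illustrated with plain PHP hashing functions. This is a minimal sketch, not the module's code: the sketch_* function names, the JSON-based canonicalization, the all-zero genesis value, and the sample row are assumptions made for illustration. Only hash(), hash_hmac(), and hash_equals() are real PHP functions.

```php
<?php
declare(strict_types=1);

// Hypothetical canonicalization: the ten canonical columns in a fixed order,
// JSON-encoded. The module's actual canonical form may differ.
function sketch_canonicalize(array $row): string {
  $columns = [
    'channel', 'chain', 'severity', 'action', 'resource',
    'context_permanent', 'context_transient_hash', 'created',
    'secret_id', 'previous_hash',
  ];
  $payload = [];
  foreach ($columns as $column) {
    $payload[$column] = $row[$column] ?? NULL;
  }
  return json_encode($payload, JSON_THROW_ON_ERROR);
}

// Demo helper (not part of verification): fills in hash and hmac.
function sketch_sign_row(array $row, string $secret): array {
  $row['hash'] = hash('sha256', sketch_canonicalize($row));
  $row['hmac'] = hash_hmac('sha256', $row['hash'], $secret);
  return $row;
}

// Returns NULL when all three layers pass, else the first failing layer.
function sketch_verify_row(array $row, string $expected_previous, ?string $secret): ?string {
  // Layer 1: chain link.
  if (!hash_equals($expected_previous, $row['previous_hash'])) {
    return 'chain_link';
  }
  // Layer 2: public hash, reproducible without any secret.
  if (!hash_equals(hash('sha256', sketch_canonicalize($row)), $row['hash'])) {
    return 'public_hash';
  }
  // Layer 3: operator HMAC over the public hash.
  if ($secret === NULL) {
    return 'secret_unavailable';
  }
  if (!hash_equals(hash_hmac('sha256', $row['hash'], $secret), $row['hmac'])) {
    return 'operator_hmac';
  }
  return NULL;
}

$genesis = str_repeat('0', 64); // Assumed genesis value for previous_hash.
$row = sketch_sign_row([
  'channel' => 'audit', 'chain' => 'notarial', 'severity' => 6,
  'action' => 'deed_signed', 'resource' => 'node:12',
  'context_permanent' => '{}',
  'context_transient_hash' => hash('sha256', 'ip=203.0.113.7'),
  'created' => 1700000000, 'secret_id' => 1, 'previous_hash' => $genesis,
], 'demo-secret');

// Editing a canonical column breaks layer 2.
$tampered = $row;
$tampered['action'] = 'deed_revoked';

// NULLing the raw transient column does not: it is outside the payload.
$purged = $row;
$purged['context_transient'] = NULL;

var_export([
  'intact' => sketch_verify_row($row, $genesis, 'demo-secret'),
  'tampered' => sketch_verify_row($tampered, $genesis, 'demo-secret'),
  'purged' => sketch_verify_row($purged, $genesis, 'demo-secret'),
  'no_secret' => sketch_verify_row($row, $genesis, NULL),
]);
```

In this sketch a missing secret is reported only after layers 1 and 2 have passed, which mirrors the point that structural tampering stays detectable without the key.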

The first mismatch wins — the result reports the row id and a human-readable diagnostic. Subsequent rows are not walked (the chain is broken from that point downstream anyway).

Per-row secret_id dispatch means a single chain can span any number of rotated secrets without breaking verification — older rows verify under their original signing secret, newer rows under the current one. A row referencing a secret id that no longer exists (retired secret, or a forged value) is reported as "secret #N not available" at the row's id, which the operator investigates as either a legitimate retirement (cross-check WORM archive) or a forgery attempt.
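The first-mismatch rule and the per-row secret lookup can be sketched as a simple loop. The code below is an illustration under assumptions: sketch_walk_chain() and sketch_make_row() are hypothetical names, the secret store is a plain array instead of the SecretRepository, and the public-hash layer is omitted to keep the walk short.

```php
<?php
declare(strict_types=1);

// Walks rows in id order; stops at the first broken row.
// $secrets maps secret_id => secret (stand-in for the SecretRepository).
function sketch_walk_chain(array $rows, array $secrets, string $genesis_hash): array {
  $previous_hash = $genesis_hash;
  $count = 0;
  foreach ($rows as $row) {
    $problem = NULL;
    if (!hash_equals($previous_hash, $row['previous_hash'])) {
      $problem = 'previous_hash mismatch';
    }
    elseif (!isset($secrets[$row['secret_id']])) {
      $problem = sprintf('secret #%d not available', $row['secret_id']);
    }
    elseif (!hash_equals(hash_hmac('sha256', $row['hash'], $secrets[$row['secret_id']]), $row['hmac'])) {
      $problem = 'HMAC mismatch';
    }
    if ($problem !== NULL) {
      // First mismatch wins: later rows are not walked.
      return ['ok' => FALSE, 'count' => $count, 'first_broken_id' => $row['id'], 'message' => $problem];
    }
    $previous_hash = $row['hash'];
    $count++;
  }
  return ['ok' => TRUE, 'count' => $count, 'first_broken_id' => NULL, 'message' => 'intact'];
}

// Demo helper: builds a signed row with a stand-in public hash.
function sketch_make_row(int $id, string $previous_hash, int $secret_id, string $secret): array {
  $hash = hash('sha256', $id . '|' . $previous_hash);
  return [
    'id' => $id,
    'previous_hash' => $previous_hash,
    'hash' => $hash,
    'secret_id' => $secret_id,
    'hmac' => hash_hmac('sha256', $hash, $secret),
  ];
}

$genesis = str_repeat('0', 64); // Assumed genesis value.
$secrets = [1 => 'old-secret', 2 => 'new-secret'];

// Rows 1 and 2 were signed before a rotation, row 3 after it.
$rows = [];
$rows[] = sketch_make_row(1, $genesis, 1, $secrets[1]);
$rows[] = sketch_make_row(2, $rows[0]['hash'], 1, $secrets[1]);
$rows[] = sketch_make_row(3, $rows[1]['hash'], 2, $secrets[2]);

$clean = sketch_walk_chain($rows, $secrets, $genesis);
// Same chain after secret 1 has been removed from the store.
$retired = sketch_walk_chain($rows, [2 => 'new-secret'], $genesis);

var_export(['clean' => $clean, 'retired' => $retired]);
```

With both secrets present the mixed chain verifies end to end. Once secret 1 is gone, the walk stops at row 1 with the "secret #1 not available" diagnostic.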

What a clean walk proves

No row between the genesis row and the chain head has been edited (editing any column changes the recomputed hash, and rewriting the stored hash to match would in turn break the HMAC computed over it).
No row has been inserted in the middle (a new row's previous_hash would mismatch the surrounding rows).
No row has been removed (the row after the deletion would have a previous_hash pointing to a vanished hash).

Verification does NOT prove:

That a row at the chain head wasn't WRITTEN by a forger who has the secret. (The signature is symmetric; possession of the secret + database write access lets you produce a chain that validates.) Mitigation: external WORM export, RFC 3161 timestamps on batch boundaries — see security.
That every audit-worthy action made it into the chain. (A bug in the consumer that silently drops calls, or a code path that bypasses \Drupal::logger(), is invisible to AuditTrailVerifier.)

Segment-event cross-reference

For each segment_* event the verifier walks (segment_archived, segment_transient_purged, segment_live_purged, segment_file_purged), it confirms a mutual reference with the segment row the event names:

The chain event's resource field must be segment:<id>.
The matching audit_trail_segment.<transition>_event_id column must point back at the event's id.

A mismatch surfaces in broken_ranges with the message "segment row may have been rolled back to a pre-transition state". This catches the canonical rollback tamper: an attacker with DB write access but no operator secret who tries to undo a lifecycle transition by clearing <transition>_event_id back to 0.
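The mutual reference can be expressed as a small predicate. This is a sketch under assumptions: sketch_cross_reference_ok() is a hypothetical name, and the column name is derived by stripping the segment_ prefix from the action (so segment_archived maps to archived_event_id), which is a guess at the <transition>_event_id naming rather than a confirmed schema.

```php
<?php
declare(strict_types=1);

// TRUE when the event and the segment row point at each other.
function sketch_cross_reference_ok(array $event, array $segment): bool {
  // Assumed naming: segment_live_purged -> live_purged_event_id, and so on.
  $column = substr($event['action'], strlen('segment_')) . '_event_id';
  return $event['resource'] === 'segment:' . $segment['id']
    && ($segment[$column] ?? 0) === $event['id'];
}

$event = ['id' => 55, 'action' => 'segment_archived', 'resource' => 'segment:7'];

$segment = ['id' => 7, 'archived_event_id' => 55];
// The rollback tamper: the pointer has been cleared back to 0.
$rolled_back = ['id' => 7, 'archived_event_id' => 0];

var_export([
  'intact' => sketch_cross_reference_ok($event, $segment),
  'rolled_back' => sketch_cross_reference_ok($event, $rolled_back),
]);
```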

Live-purge supersession exemption. Restore is a legitimate reversal of a prior live-purge (see architecture.md). When the verifier sees a segment_live_purged event whose id doesn't match the segment's live_purged_event_id, it does one extra O(1) primary-key lookup on the value the segment points at. The mismatch is accepted only when the referenced event:

Exists in audit_trail (rules out rollback-to-zero).
Lives on the same chain as the current walk (rules out cross-chain pointer forging).
Has resource = 'segment:<same_id>' (rules out pointers at events for a different segment).
Has action segment_live_purged OR segment_restored (rules out pointers at archive / file-purge / unrelated events).
Has an id strictly greater than the event under verification (rules out pointers at an older live-purge, which would let an attacker hide a more recent purge).
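The five conditions above can be read as one predicate over the referenced event. The following is an illustrative sketch: sketch_supersession_accepts() is a hypothetical name, and the referenced row is passed in as an array (or NULL when the primary-key lookup finds nothing) rather than loaded from the database.

```php
<?php
declare(strict_types=1);

// $event: the segment_live_purged event under verification.
// $referenced: the row that segment.live_purged_event_id points at, or NULL.
function sketch_supersession_accepts(array $event, int $segment_id, ?array $referenced): bool {
  // 1. The referenced event must exist (rules out rollback-to-zero).
  if ($referenced === NULL) {
    return FALSE;
  }
  // 2. Same chain as the current walk.
  if ($referenced['chain'] !== $event['chain']) {
    return FALSE;
  }
  // 3. Same segment.
  if ($referenced['resource'] !== 'segment:' . $segment_id) {
    return FALSE;
  }
  // 4. Only a live-purge or a restore may supersede.
  if (!in_array($referenced['action'], ['segment_live_purged', 'segment_restored'], TRUE)) {
    return FALSE;
  }
  // 5. Strictly newer than the event under verification.
  return $referenced['id'] > $event['id'];
}

$event = ['id' => 100, 'chain' => 'notarial', 'resource' => 'segment:7', 'action' => 'segment_live_purged'];
$restore = ['id' => 140, 'chain' => 'notarial', 'resource' => 'segment:7', 'action' => 'segment_restored'];

$older = ['id' => 90] + $restore;                    // Points at an older event.
$archive = ['action' => 'segment_archived'] + $restore; // Wrong action.
$foreign = ['chain' => 'webdav'] + $restore;         // Wrong chain.

var_export([
  'restore' => sketch_supersession_accepts($event, 7, $restore),
  'missing' => sketch_supersession_accepts($event, 7, NULL),
  'older' => sketch_supersession_accepts($event, 7, $older),
  'archive' => sketch_supersession_accepts($event, 7, $archive),
  'foreign' => sketch_supersession_accepts($event, 7, $foreign),
]);
```

Only the first case is accepted; each of the others trips exactly one of the five conditions.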

Forged segment_restored events can't exploit this exemption: the row's HMAC is checked at verifyRow() before the cross-reference helper runs, so an event without the operator secret never reaches the exemption code.

Other segment transitions (segment_archived, segment_file_purged, segment_transient_purged) are not reversible by restore, so their strict-equality check stays in force -- any rollback tamper on those columns is still surfaced.

Mid-restore transitional states verify cleanly. Restore is not atomic: the segment_restored chain event commits in Step 1 BEFORE the rows are replayed (Step 2) and BEFORE the segment row is updated (Step 3). The verifier accepts every intermediate state without special-casing:

Between Step 1 and Step 2: the chain has a segment_restored event referencing the segment, but no rows have re-appeared in [from_id, to_id]. The verifier walks visible rows + the archive bridge as it would for any fully-purged segment; segment_restored is non-cross-checkable so the event passes through. The segment_live_purged event still strict-matches segment.live_purged_event_id (Step 3 has not moved the pointer yet).
Between Step 2 and Step 3: rows are now back in the live table. The verifier walks them in id order and validates each row's previous_hash linkage; the archive bridge does not fire because there is no gap. The segment_live_purged event still strict-matches segment.live_purged_event_id.
After Step 3: segment.live_purged_event_id now references the segment_restored event id. The strict-equality check on the segment_live_purged event fails; the supersession exemption above accepts the mismatch.

The transitional acceptance is a property of the existing rules, not a separate exemption -- no rule was added or relaxed to support the narrowed restore design.

Running verification periodically

The standard pattern is a cron-driven verification job that calls drush audit_trail:verify and alerts (via Drupal watchdog → external monitor, or via email, or via a status report block) on any non-ok result. The drush command exits non-zero when any chain breaks, so a one-liner crontab covers the integration:

0 * * * * drush audit_trail:verify || mail -s "audit_trail alert" admin@example.test

For embedded use (e.g., a custom alerting script that talks to PagerDuty), call the verifier service directly:

$results = \Drupal::service('audit_trail.verifier')->verifyAll();
$bad = array_filter($results, fn ($r) => !$r['ok']);
if ($bad !== []) {
  $msg = "AUDIT_TRAIL INTEGRITY BREAK:\n";
  foreach ($bad as $chain => $r) {
    $msg .= "  - {$chain}: {$r['message']}\n";
  }
  fwrite(STDERR, $msg);
  exit(1);
}
echo "All chains verified.\n";

Performance: incremental verification + checkpoints

A full walk costs O(chain length) — fine for hundreds of entries, intolerable for the multi-million-entry chains a long-lived audit-worthy install accumulates. The module keeps verification cheap with per-chain checkpoints:

Every time verifyChainIncremental() walks a chain cleanly to its current head, it mints a row in audit_trail_checkpoint recording (chain, last_id, last_hash, created) plus an hmac column signing the tuple.
The next call reads the most recent checkpoint for that chain, starts the walk after last_id, expecting last_hash as the genesis-equivalent of previous_hash.
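The two steps above can be sketched as a walk that resumes from a checkpoint. This is a simplified illustration, not the module's verifyChainIncremental(): sketch_walk_incremental() is a hypothetical name, only the chain-link layer is checked, and the checkpoint is a plain array without the created and hmac columns.

```php
<?php
declare(strict_types=1);

// Walks only rows after the checkpoint and returns a refreshed checkpoint.
function sketch_walk_incremental(array $rows, ?array $checkpoint, string $genesis_hash): array {
  // Without a checkpoint, start from genesis.
  $last_id = $checkpoint['last_id'] ?? 0;
  $expected = $checkpoint['last_hash'] ?? $genesis_hash;
  $count = 0;
  foreach ($rows as $row) {
    if ($row['id'] <= $last_id) {
      continue; // Already covered by the checkpoint.
    }
    if (!hash_equals($expected, $row['previous_hash'])) {
      // Broken: keep the old checkpoint, do not mint a new one.
      return ['ok' => FALSE, 'count' => $count, 'first_broken_id' => $row['id'], 'checkpoint' => $checkpoint];
    }
    $expected = $row['hash'];
    $last_id = $row['id'];
    $count++;
  }
  return [
    'ok' => TRUE,
    'count' => $count,
    'first_broken_id' => NULL,
    'checkpoint' => ['last_id' => $last_id, 'last_hash' => $expected],
  ];
}

// Demo helper: appends rows with a stand-in hash.
function sketch_append_rows(array $rows, int $how_many, string $genesis_hash): array {
  for ($i = 0; $i < $how_many; $i++) {
    $last = end($rows);
    $previous = $last === FALSE ? $genesis_hash : $last['hash'];
    $id = $last === FALSE ? 1 : $last['id'] + 1;
    $rows[] = ['id' => $id, 'previous_hash' => $previous, 'hash' => hash('sha256', "entry $id|$previous")];
  }
  return $rows;
}

$genesis = str_repeat('0', 64); // Assumed genesis value.
$rows = sketch_append_rows([], 5, $genesis);
$first = sketch_walk_incremental($rows, NULL, $genesis);

// Two more entries arrive; the next walk covers only those.
$rows = sketch_append_rows($rows, 2, $genesis);
$second = sketch_walk_incremental($rows, $first['checkpoint'], $genesis);

var_export(['first' => $first['count'], 'second' => $second['count']]);
```

The first walk covers all five rows and the second only the two new ones, which is where the cost saving comes from.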

A typical operational pattern:

Cron hourly: verifyAll() (default: incremental). Each chain walks only the rows since its last checkpoint — minutes-to-hours of activity, hundreds to thousands of entries at most. Sub-second.
Cron weekly (or on-demand): verifyAll(full: TRUE). Full cold walk from genesis to head. Useful as a belt-and-braces check even though incremental walks already validate checkpoint signatures (see below): a full walk re-derives every HMAC from the secret and ignores checkpoints entirely.

Checkpoints are themselves signed with the row's signing secret: each checkpoint row carries an hmac column over (chain || last_id || last_hash || created), keyed by the secret_id that signed the audit row at last_id. verifyChainIncremental() validates the checkpoint's signature before trusting it. If a checkpoint has been forged or modified, the check fails, the verifier falls back to a full walk from genesis, and the result is flagged with checkpoint_forged => TRUE plus a warning in the message, so operators can investigate the forgery itself as a security event.
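Checkpoint signing and the fallback decision might look roughly like the following. This is a sketch under assumptions: the sketch_* names are hypothetical, and joining the tuple fields with "|" is a stand-in for whatever encoding the module uses for (chain || last_id || last_hash || created).

```php
<?php
declare(strict_types=1);

// Assumed serialization of the signed tuple.
function sketch_checkpoint_message(array $checkpoint): string {
  return implode('|', [
    $checkpoint['chain'],
    $checkpoint['last_id'],
    $checkpoint['last_hash'],
    $checkpoint['created'],
  ]);
}

function sketch_sign_checkpoint(array $checkpoint, string $secret): array {
  $checkpoint['hmac'] = hash_hmac('sha256', sketch_checkpoint_message($checkpoint), $secret);
  return $checkpoint;
}

function sketch_checkpoint_is_valid(array $checkpoint, string $secret): bool {
  $expected = hash_hmac('sha256', sketch_checkpoint_message($checkpoint), $secret);
  return hash_equals($expected, $checkpoint['hmac']);
}

// Decides where the walk starts: after last_id when the checkpoint is
// trusted, from genesis (0) with checkpoint_forged set when it is not.
function sketch_plan_walk(array $checkpoint, string $secret): array {
  $valid = sketch_checkpoint_is_valid($checkpoint, $secret);
  return [
    'start_after_id' => $valid ? $checkpoint['last_id'] : 0,
    'checkpoint_forged' => !$valid,
  ];
}

$secret = 'demo-secret';
$checkpoint = sketch_sign_checkpoint([
  'chain' => 'notarial',
  'last_id' => 4754,
  'last_hash' => hash('sha256', 'row 4754'),
  'created' => 1700000000,
], $secret);

// An attacker moves last_id forward to hide rows but cannot re-sign.
$forged = $checkpoint;
$forged['last_id'] = 4800;

var_export([
  'trusted' => sketch_plan_walk($checkpoint, $secret),
  'forged' => sketch_plan_walk($forged, $secret),
]);
```

Because the HMAC covers last_id and last_hash together, moving the resume point without the secret invalidates the checkpoint and forces the full walk.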

Checkpoints are an optimization, not the source of truth. They speed up the common case (cron polling), but the chain itself is the authoritative record. Lose all checkpoints and a full walk still verifies the chain end-to-end.

For chains expected to outgrow what a single-process walk handles even with weekly full verification (multi-million row, multi-year archives), the roadmap plans for chain rotation (yearly closure + fresh chain) and external WORM export + qualified TSA timestamping — at which point old chains live in a write-once archive and are verified once, against their qualified timestamp, rather than re-walked from the DB.

API

$verifier = \Drupal::service('audit_trail.verifier');

// Incremental — fast, default.
$verifier->verifyChainIncremental('notarial');
// → ['ok' => TRUE, 'count' => 73, 'checkpoint_minted' => TRUE,
//    'message' => 'Chain "notarial" verified incrementally:
//                  73 new entries since last checkpoint at id 4754.
//                  Checkpoint refreshed.']

// Manually mint a checkpoint (independent of verification).
$verifier->mintCheckpoint('notarial');

// Full cold walk — slow, on demand.
$verifier->verifyChain('notarial');

// All chains: incremental by default, pass full: TRUE for cold.
$verifier->verifyAll();
$verifier->verifyAll(full: TRUE);

What a broken chain means

If verifyChain returns ok => FALSE, treat it as a security incident:

The break could be benign — a developer ran an ad-hoc SQL UPDATE in dev to fix a typo, a database migration altered the table. Verify the timestamp on the broken row against the local change log.
The break could be adversarial — someone with database access edited a row to cover an unauthorized action. Treat as a compromise: rotate the master secret, audit other systems, follow the incident response plan for the deployment.

Either way the chain stays usable for new entries from the break onwards (previous_hash of the next row records the broken row's hash so the chain "heals" from there). But every row from the break to the verification point is now considered unverified — annotate the incident in your audit log, preferably in the same chain (with a chain: TRUE event explaining the discrepancy), so the trace stays self-describing.