Preview

Site Audit

Crawl a site into a timestamped snapshot of every URL (response code, headers, and body), then diff a BEFORE snapshot against an AFTER snapshot to prove a migration changed nothing it shouldn’t have, or to flag exactly what it did change.

Built on the same crawler as Site Search, but a distinct product: where Site Search indexes a site to make it findable, Site Audit freezes a site to make it comparable.

First use case: a portable, replayable snapshot

Point the crawler at a site and Site Audit captures, per URL, the full response state into a standard HAR file — stamped with the crawl time so two crawls of the same site coexist instead of overwriting each other. The snapshot is an immutable, labelled baseline (pre-migration 2026-06-28), not a search index that gets upserted in place.

Because it is a HAR — not a bespoke report format — the snapshot is an executable artifact, not a dead document. It diffs, it converts to HURL, and it replays. Everything below is built on that one capture.

Captured per URL	What it is	Why it matters in a diff
Response code	`status` + reason (`200 OK`, `404`, `503`…)	Catches newly broken, newly redirected, or newly erroring pages
Response headers	All of them, incl. `Strict-Transport-Security` (HSTS), `Content-Security-Policy` (CSP), `X-Frame-Options`, `Cache-Control`, `Content-Type`, `Server`	Catches dropped security headers and tech-stack / cache regressions
Request headers	As sent on the captured request	Makes the snapshot faithfully replayable
Body size	Bytes returned	Sharp shrink = content loss / broken render; bloat = regression
HTTP version + latency	`HTTP/1.1`, `h2`, or `h3`, plus response time (ms)	Catches protocol downgrades and performance cliffs
Content hash	SHA-256 of the body	Detects silent content drift even when status and size are unchanged
Crawl timestamp	When the URL was captured	Anchors the baseline; lets you say exactly what AFTER is compared to
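To make the table concrete, here is a minimal sketch of how one captured response could be laid out as a HAR 1.2 style entry. The function name `har_entry`, its parameters, and the custom `_contentSha256` field are assumptions for illustration, not the product's actual schema. HAR has no standard slot for a content hash, so the sketch uses an underscore-prefixed custom field, which the HAR format permits.

```python
import hashlib
import json
from datetime import datetime, timezone


def har_entry(url, status, reason, http_version, response_headers, body,
              elapsed_ms, request_headers=None, captured_at=None):
    """Build one HAR 1.2 style entry for a captured GET response (sketch)."""
    captured_at = captured_at or datetime.now(timezone.utc)
    return {
        # Crawl timestamp: anchors the baseline.
        "startedDateTime": captured_at.isoformat(),
        # Latency in milliseconds.
        "time": elapsed_ms,
        "request": {
            "method": "GET",
            "url": url,
            "httpVersion": http_version,
            "headers": [{"name": k, "value": v}
                        for k, v in (request_headers or {}).items()],
        },
        "response": {
            "status": status,
            "statusText": reason,
            "httpVersion": http_version,
            "headers": [{"name": k, "value": v}
                        for k, v in response_headers.items()],
            "content": {
                "size": len(body),
                "mimeType": response_headers.get("Content-Type", ""),
            },
            "bodySize": len(body),
        },
        # Hypothetical custom field: SHA-256 of the raw body bytes.
        "_contentSha256": hashlib.sha256(body).hexdigest(),
    }


entry = har_entry(
    "https://example.com/", 200, "OK", "h2",
    {"Content-Type": "text/html", "Cache-Control": "max-age=60"},
    b"<html>hello</html>", elapsed_ms=42,
)
snapshot = {"log": {"version": "1.2",
                    "creator": {"name": "sketch", "version": "0"},
                    "entries": [entry]}}
print(json.dumps(snapshot, indent=2)[:200])
```

Keeping the hash next to the standard fields means a diff can compare content without storing or re-reading the full body.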

How a migration audit runs

1 Snapshot BEFORE

Crawl the live site and freeze it as a labelled baseline. This takes minutes, and the expensive half, the crawl itself, is the part that already runs in production behind Site Search.

2 Make the change

Replatform, CMS or domain move, redesign, redirect-map rollout, infra cutover — whatever the migration is. Site Audit doesn’t care how you change it.

3 Snapshot AFTER

Crawl again into a second labelled snapshot. Now two frozen states of the same site exist side by side, ready to compare.

4 Diff + score

Per-URL deltas roll up into one migration-health score, a human-readable report, and a machine-readable JSON diff you can gate a release on.
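The four steps reduce to comparing two URL-keyed indexes. The following is a minimal sketch of that comparison, assuming each snapshot is a parsed HAR dict and that the content hash lives in a hypothetical `_contentSha256` custom field. The function names, the output keys, and the choice of security headers are illustrative assumptions, not the shipped diff format.

```python
import json

SECURITY_HEADERS = (
    "strict-transport-security",
    "content-security-policy",
    "x-frame-options",
)


def index_har(har):
    """Map URL -> flat record from a parsed HAR dict (first entry per URL wins)."""
    index = {}
    for entry in har["log"]["entries"]:
        resp = entry["response"]
        index.setdefault(entry["request"]["url"], {
            "status": resp["status"],
            "headers": {h["name"].lower(): h["value"] for h in resp["headers"]},
            "size": resp.get("bodySize", -1),
            "sha256": entry.get("_contentSha256"),
        })
    return index


def diff_snapshots(before, after):
    """Compare two URL indexes and return a JSON-serialisable diff."""
    diff = {
        "lost": sorted(before.keys() - after.keys()),   # BEFORE only
        "new": sorted(after.keys() - before.keys()),    # AFTER only
        "changed": {},
    }
    for url in sorted(before.keys() & after.keys()):
        b, a = before[url], after[url]
        delta = {}
        if b["status"] != a["status"]:
            delta["status"] = [b["status"], a["status"]]
        dropped = [h for h in SECURITY_HEADERS
                   if h in b["headers"] and h not in a["headers"]]
        if dropped:
            delta["dropped_security_headers"] = dropped
        if b["size"] != a["size"]:
            delta["size"] = [b["size"], a["size"]]
        if b["sha256"] != a["sha256"]:
            delta["content_changed"] = True
        if delta:
            diff["changed"][url] = delta
    return diff


before = {
    "https://example.com/": {"status": 200, "size": 100, "sha256": "aa",
                             "headers": {"strict-transport-security": "max-age=1"}},
    "https://example.com/old": {"status": 200, "size": 50, "sha256": "bb",
                                "headers": {}},
}
after = {
    "https://example.com/": {"status": 200, "size": 100, "sha256": "aa",
                             "headers": {}},
}
print(json.dumps(diff_snapshots(before, after), indent=2))
```

Because the result is plain JSON, the same structure can back both the human-readable report and a CI gate.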

What the diff surfaces

Status transitions

200→404 (broken), 200→3xx (newly redirected), any→5xx (error introduced), 404→200 (fixed).

Dropped security headers

An HSTS, CSP, or X-Frame-Options header that was present BEFORE and is missing AFTER — a silent, easy-to-miss migration regression.

Body-size & content drift

Sharp size deltas flag content loss or broken renders; the SHA-256 hash catches changes even when status and size look identical.

Redirect-chain changes

A clean 301 that became a redirect loop, or a chain that grew an extra hop, shows up per URL.

URL coverage

Present BEFORE but missing AFTER = lost pages. Present AFTER only = net-new pages. The two lists are usually the first thing a migration owner wants.

One health score

The deltas roll into a single number so “did the migration regress anything?” has a yes/no answer, with the detail one click away.
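One way such a score could be computed is as a weighted penalty over the deltas, normalised by site size. This is a sketch under stated assumptions: the weights, the 0 to 100 scale, the threshold, and the diff shape (a dict with `lost` and `changed` keys) are all hypothetical, since the real scoring model is not specified here.

```python
# Hypothetical weights: how much each kind of regression costs.
DEFAULT_WEIGHTS = {
    "lost": 10.0,                     # URL present BEFORE, missing AFTER
    "broken": 10.0,                   # status moved from <400 to >=400
    "dropped_security_header": 5.0,   # per dropped header
    "content_changed": 1.0,           # hash differs
}


def health_score(diff, total_urls, weights=DEFAULT_WEIGHTS):
    """Return a 0-100 score; 100 means no regression was detected."""
    if total_urls <= 0:
        return 100.0
    penalty = weights["lost"] * len(diff.get("lost", []))
    for delta in diff.get("changed", {}).values():
        status = delta.get("status")
        if status and status[0] < 400 <= status[1]:
            penalty += weights["broken"]
        penalty += weights["dropped_security_header"] * len(
            delta.get("dropped_security_headers", []))
        if delta.get("content_changed"):
            penalty += weights["content_changed"]
    # Normalise against every URL incurring the heaviest single penalty.
    worst_case = total_urls * max(weights.values())
    return max(0.0, 100.0 - 100.0 * penalty / worst_case)


def passes_gate(diff, total_urls, threshold=99.0):
    """Turn the score into the yes/no answer a release gate needs."""
    return health_score(diff, total_urls) >= threshold


example = {
    "lost": ["/a"],
    "changed": {
        "/b": {"status": [200, 404]},
        "/c": {"dropped_security_headers": ["content-security-policy"]},
    },
}
print(health_score(example, total_urls=100), passes_gate(example, 100))
```

The threshold is what converts a single number into a yes/no answer; where to set it is a judgment call for the migration owner.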

Essential & well-known files — presence and status

A site’s behaviour at the edges is defined by a set of conventional files at fixed paths. They stay invisible until one is missing, stale, or answering with the wrong status — a manifest.json that 404s, a redirect rule that quietly swallows the /.well-known/acme-challenge/ path and breaks certificate renewal, or a sitemap.xml that returns 200 with an HTML error body (a “soft 404”) instead of XML. Every snapshot inventories the full idiomatic set and records, per file, its status code, content-type, and basic validity — then diffs that inventory across snapshots like everything else.

Crawlers & SEO

robots.txt (and does it point at a sitemap?), sitemap.xml / sitemap_index.xml, humans.txt, the OpenSearch description, and the emerging llms.txt / /.well-known/ai.txt AI-crawler hints.

TLS & certificates

/.well-known/acme-challenge/ (ACME HTTP-01; a redirect that swallows this path can silently break Let’s Encrypt renewal), /.well-known/pki-validation/, and /.well-known/mta-sts.txt (the MTA-STS policy, served from the mta-sts subdomain) for mail transport security.

PWA & installability

manifest.json / manifest.webmanifest (referenced from <link rel="manifest">?), the service worker (sw.js / service-worker.js and its scope), favicon.ico, apple-touch-icon, and browserconfig.xml.

Mobile deep-linking

/.well-known/apple-app-site-association (iOS Universal Links — must be JSON, no redirect) and /.well-known/assetlinks.json (Android App Links / Digital Asset Links).

Security & privacy

/.well-known/security.txt (RFC 9116 — present and not past its Expires date?), /.well-known/change-password, /.well-known/gpc.json, and OIDC /.well-known/openid-configuration where it applies.

Ads & monetisation

ads.txt, the mobile-app app-ads.txt, and sellers.json — presence and parse-validity across the IAB authorised-sellers chain.

Status, not just presence. For each path the audit separates present (200) from missing (404), redirected (a 3xx that is sometimes itself the bug), wrong content-type (a manifest served as text/html), soft-404 (200 with an error body), and stale (e.g. an expired security.txt). The report also flags legacy files that should usually be gone: crossdomain.xml, clientaccesspolicy.xml, and any surviving *.appcache AppCache manifest (long deprecated and dropped by browsers).

Across a migration this is where the quiet regressions hide: a robots.txt that lost its Sitemap: line, a manifest that started 404ing on the new host, or an apple-app-site-association that now redirects and silently broke every Universal Link into the app.
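The status categories above can be sketched as a small classifier over one probe result. Everything named here is an assumption for illustration: the `EXPECTED_TYPES` table (a real inventory would be far longer), the HTML-sniffing heuristic for soft 404s, and the extra `error` bucket for non-200 responses that the list above does not name.

```python
from datetime import datetime, timezone

# Hypothetical expectations: acceptable media types per well-known path.
EXPECTED_TYPES = {
    "/sitemap.xml": ("application/xml", "text/xml"),
    "/manifest.json": ("application/json", "application/manifest+json"),
    "/.well-known/security.txt": ("text/plain",),
}


def looks_like_html(body):
    """Crude soft-404 heuristic: the body starts like an HTML page."""
    head = body.lstrip()[:64].lower()
    return head.startswith(b"<!doctype html") or head.startswith(b"<html")


def security_txt_expired(body, now):
    """True if the Expires field is missing, unparsable, or in the past."""
    for line in body.decode("utf-8", "replace").splitlines():
        if line.lower().startswith("expires:"):
            value = line.split(":", 1)[1].strip().replace("Z", "+00:00")
            try:
                return datetime.fromisoformat(value) < now
            except (ValueError, TypeError):
                return True
    return True  # RFC 9116 makes the Expires field mandatory


def classify(path, status, content_type, body, now=None):
    now = now or datetime.now(timezone.utc)
    if status == 404:
        return "missing"
    if 300 <= status < 400:
        return "redirected"
    if status != 200:
        return "error"
    expected = EXPECTED_TYPES.get(path)
    media_type = content_type.split(";")[0].strip().lower()
    # An HTML page where a non-HTML file belongs is treated as a soft 404.
    if expected and "text/html" not in expected and looks_like_html(body):
        return "soft-404"
    if expected and media_type not in expected:
        return "wrong-content-type"
    if path == "/.well-known/security.txt" and security_txt_expired(body, now):
        return "stale"
    return "present"


print(classify("/sitemap.xml", 200, "text/html; charset=utf-8",
               b"<!DOCTYPE html><html><body>Not found</body></html>"))
```

Checking for an HTML body before checking the media type is what separates a soft 404 from a valid file served with the wrong header.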

The snapshot is executable — replay it with `zforce`

Because the snapshot is a standard HAR, it feeds straight into zforce, our multi-backend HTTP scanner / load-tester / fuzzer. The same captured requests that formed your BEFORE baseline become a smoke test, a load test, or a fuzz corpus you can fire at the AFTER environment — no hand-written test suite, just the snapshot you already have.

# Replay every captured request against the migrated environment
# (10 concurrent; pass -q for runs over ~100 requests).
zforce --har-file pre-migration-2026-06-28.har -r 10

# Convert the snapshot to a HURL suite you can check into CI and
# run on every deploy, asserting status codes and headers stay put.
zforce --har-file pre-migration-2026-06-28.har --convert-har-hurl ./suite

# Fuzz the captured endpoints with edge-value headers / payloads
# to see how the new stack handles what the old one shrugged off.
zforce --har-file pre-migration-2026-06-28.har --fuzz -q

That closes the loop: capture → diff → replay. The snapshot proves what the site was; the diff proves what changed; zforce proves the new environment still answers the same requests the same way.
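To illustrate what replaying a snapshot involves, independent of how zforce does it internally, here is a minimal sketch that turns a parsed HAR into a list of requests rebased onto the AFTER environment, each paired with the status the BEFORE crawl recorded. The function name `replay_plan` and the `target_origin` parameter are hypothetical; no network call is made.

```python
from urllib.parse import urlsplit, urlunsplit


def replay_plan(har, target_origin):
    """Return (method, url, expected_status) tuples with each captured URL
    rebased onto target_origin, keeping the original path and query."""
    target = urlsplit(target_origin)
    plan = []
    for entry in har["log"]["entries"]:
        req = entry["request"]
        src = urlsplit(req["url"])
        rebased = urlunsplit(
            (target.scheme, target.netloc, src.path, src.query, ""))
        plan.append((req["method"], rebased, entry["response"]["status"]))
    return plan


har = {"log": {"entries": [
    {"request": {"method": "GET", "url": "https://old.example.com/pricing?x=1"},
     "response": {"status": 200}},
    {"request": {"method": "GET", "url": "https://old.example.com/gone"},
     "response": {"status": 404}},
]}}

for method, url, expected in replay_plan(har, "https://staging.example.com"):
    print(method, url, "expect", expected)
```

A smoke test then only has to send each request and compare the live status with the expected one.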

Where this goes — one engine, several audits

Migration QA is the sharpest first use case, but the snapshot/diff/score machinery is general. Each of these is a layer on the same capture, shipped independently:

Drift & uptime monitoring

Run the snapshot on a schedule instead of only around a migration, and any divergence from the baseline becomes a drift or uptime signal.

Compliance & hygiene score

Scan robots.txt, sitemap.xml, http→https redirects, TLS, and security headers into a single audit / compliance score with a report.

API contract audit

Slurp a Swagger / OpenAPI spec and fuzz the endpoints with edge-value headers and payloads, asserting contract shape, status codes, and error codes hold.

Honest status

Preview, not GA

This is an idea-stage product direction. There is no public capture / diff endpoint yet — the examples above describe the intended workflow, not a live API to call today.

The crawl already exists

The expensive half — crawling and capturing per-URL response state — runs in production behind Site Search. Site Audit is the snapshot, diff, and scoring layer on top of it, which is the part under active development.

Shaped by real buyers

We’d rather build the diff that a real migration owner needs than guess at a feature matrix. If you have a migration coming up, tell us what would make you trust it shipped clean.

Have a migration coming up?

Tell us about the site and the change. We’re looking for early design partners to point the snapshot/diff at a real, high-stakes migration.

Get in touch

Email info@loxal.net. This page is a preview; the snapshot/diff service is not generally available yet.