
Analytics Documentation That Never Goes Stale

Every project starts with a tagging spec. Six months later, nobody trusts it. Here's how I built a system where the documentation writes itself — and stays true forever.

tagging-spec.xlsx vs measurement_plan.html

Manual approach

  1. Sprint: write the tagging spec
  2. Dev: implement the tracking
  3. Weeks later: the product changes
  Result: the docs are stale. "What does this event actually track?"

Automated approach

  1. Code: data-track attributes in the HTML
  2. Script: Puppeteer captures the evidence
  3. Build: a standalone HTML document
  Result: always current. Code IS the documentation.

The documentation problem nobody talks about

If you've worked in digital analytics long enough, you've been in this exact meeting. Someone asks which parameters the generate_lead event collects. You open the tagging spec. It says "email, subject, budget". You check GA4. Only "subject" and "budget" are showing up. You open the HTML. There it is — email was hashed and renamed to email_sha256 three sprints ago. The spec was never updated.

The document is three months old, two product iterations behind, and now actively misleading. And nobody feels at fault — because this is just how analytics documentation works in most projects.

The real cost: teams waste hours reconciling what the spec says with what's actually in GA4. Analysts make decisions on data they don't fully trust. Audits reveal the same inconsistencies every time.

The root cause isn't negligence. It's that documentation is treated as a deliverable, not a system. You write it once, ship it, and it starts drifting from reality the moment the first product change lands.

A different model: make the code document itself

The approach I've taken in this project flips the model. Instead of writing documentation separately and trying to keep it in sync, I designed a system where the tracking implementation and the documentation are the same artifact.

Three principles drive it:

  1. The tracking payload lives in the markup itself, so the implementation and the spec cannot diverge
  2. Privacy filtering is the default behavior of the code, not a configuration someone can forget
  3. Visual evidence is generated automatically, so the documentation can prove itself

How it works in practice

Declarative tracking via data attributes

Rather than writing JavaScript event handlers for each element, every trackable link in the site carries its analytics payload directly in the markup:

<!-- The tracking parameters live here, in the HTML itself -->
<a href="blog.html"
   data-track='{"content_type":"nav_link",
              "content_id":"blog",
              "content_name":"Blog",
              "item_list_name":"nav"}'>
  Blog
</a>

A single delegation listener in analytics.js intercepts all clicks on elements with a data-track attribute, parses the JSON, and pushes to the dataLayer. No per-element event handlers. No risk of parameters drifting between the HTML and some external document — they are the HTML.

Adding tracking to a new element doesn't require writing JavaScript. You add the attribute with the right parameters, and the delegation listener handles the rest. The spec updates itself.
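
For illustration, here is a minimal sketch of what such a delegation listener could look like. This is not the actual analytics.js source; the select_content event name is taken from the event registry below, and the guard logic is my assumption:

// Hypothetical sketch — not the actual analytics.js implementation
document.addEventListener('click', (event) => {
  const el = event.target.closest('[data-track]');
  if (!el) return;
  try {
    const params = JSON.parse(el.dataset.track);
    window.dataLayer = window.dataLayer || [];
    window.dataLayer.push({ event: 'select_content', ...params });
  } catch (err) {
    // A malformed data-track attribute should never break navigation
    console.warn('Invalid data-track payload', el, err);
  }
});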

Privacy by architecture, not by configuration

The central analytics script handles PII filtering at the system level, before any data touches the dataLayer. This isn't a checkbox in GTM — it's the default behavior of the code:

// Email → hashed with SHA-256 before it ever reaches the dataLayer
async function hashEmail(email) {
  const buf = await crypto.subtle.digest(
    'SHA-256',
    new TextEncoder().encode(email.trim().toLowerCase())
  );
  return Array.from(new Uint8Array(buf))
    .map(b => b.toString(16).padStart(2, '0')).join('');
}

// Free-text fields → never captured raw, only metadata
// message content → message_word_count, message_char_count, message_filled

// Fields named name, phone, address, etc. → automatically excluded
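
A hedged sketch of how that free-text metadata could be derived; the helper name is mine, not the project's:

// Hypothetical helper illustrating metadata-only capture of free text
function messageMetadata(raw) {
  const text = (raw || '').trim();
  return {
    message_filled: text.length > 0,
    message_word_count: text ? text.split(/\s+/).length : 0,
    message_char_count: text.length
  };
}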

The dataLayer push for a form submission looks like this — the raw message is never in there:

window.dataLayer.push({
  event:               'generate_lead',
  form_name:           'contact',
  subject:             'analytics audit',
  budget:              '5k-10k',
  message_word_count:  47,
  message_char_count:  214,
  message_filled:      true,
  email_sha256:        '3a7bd3e2...'   // hashed, never raw
});

Automated visual evidence with Puppeteer

The part that makes this system genuinely different from a well-maintained spreadsheet is the visual layer. A Puppeteer script (capture_measurement.mjs) runs through the entire site and produces proof of what's being measured: it visits each page, finds every tracked element, draws a red box around it, and captures a screenshot.

The output is a folder of 30+ screenshots, each showing exactly which element fires which event. When a designer moves a button, you re-run the script and the screenshots regenerate. No manual annotation, no outdated arrows on a slide deck.

This is auditable documentation: not "the spec says this button tracks X", but "here is a screenshot of that exact button, with a red box around it, taken automatically from the live site."
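
The actual script isn't reproduced in this post, but a minimal sketch of the idea looks like this. The local URL and output paths are assumptions:

// Hypothetical sketch of the capture step — not the real capture_measurement.mjs
import puppeteer from 'puppeteer';

const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto('http://localhost:8080/', { waitUntil: 'networkidle0' });

// Highlight each tracked element, screenshot it in context, then clean up
const tracked = await page.$$('[data-track]');
for (const [i, el] of tracked.entries()) {
  await el.evaluate(node => { node.style.outline = '3px solid red'; });
  await el.screenshot({ path: `measurement_plan/screenshots/el_${i}.png` });
  await el.evaluate(node => { node.style.outline = ''; });
}

await browser.close();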

The measurement plan as a build artifact

All of this feeds into an interactive HTML document: the measurement plan. Each event type gets its own tab, with a parameter table, example payloads, and the Puppeteer screenshots embedded inline.

Keeping it up to date requires no manual effort beyond the tracking implementation itself. When something changes:

  1. Update the data-track attribute (or inline JS) in the HTML
  2. Run node capture_measurement.mjs — screenshots regenerate
  3. Run node build_standalone_measurement.mjs — the standalone file rebuilds with all images embedded as base64

That standalone file — a single self-contained HTML — can be sent to any stakeholder, opened without a server, and read without any setup. It's the shareable artifact of record.
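
A sketch of the base64 inlining idea behind that build step; the file paths here are assumptions, not the project's actual layout:

// Hypothetical sketch of the inlining pass in build_standalone_measurement.mjs
import { readFileSync, writeFileSync } from 'node:fs';

let html = readFileSync('measurement_plan/measurement_plan.html', 'utf8');

// Swap every screenshot reference for an embedded data URI
html = html.replace(/src="(screenshots\/[^"]+\.png)"/g, (_match, file) => {
  const b64 = readFileSync(`measurement_plan/${file}`).toString('base64');
  return `src="data:image/png;base64,${b64}"`;
});

writeFileSync('measurement_plan/measurement_plan_standalone.html', html);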


What's currently tracked

The measurement plan for this project documents six event types across 28 parameters:

Event registry — 6 events · 28 parameters

  page_view · every page load · page_name, page_category, page_type, content_group
  select_content · click on any data-track link · content_type, content_id, content_name, item_list_name
  generate_lead · contact form submit · form_name, subject, budget, message_word_count, email_sha256
  orbit_interaction · hover on the skills orbit section · orbit_pause, skill_hover
  search · blog search (debounced 500ms, min 2 chars) · search_term, search_results_count, search_active_filter
  post_engagement · like or comment on a blog post · content_id, content_name, action (like/unlike/comment)
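
To show how a registry row maps to code, here is a hedged sketch of the debounced search tracking from the table above. The selector, countResults, and activeFilter are hypothetical stand-ins for the real blog search state:

// Hypothetical sketch of the debounced search event in the registry above
const searchInput = document.querySelector('#blog-search'); // assumed selector
const activeFilter = 'all';                                 // hypothetical state
const countResults = (term) => 0;                           // hypothetical stub

let searchTimer;
searchInput.addEventListener('input', (e) => {
  clearTimeout(searchTimer);
  searchTimer = setTimeout(() => {
    const term = e.target.value.trim();
    if (term.length < 2) return;                 // min 2 chars
    window.dataLayer.push({
      event: 'search',
      search_term: term,
      search_results_count: countResults(term),
      search_active_filter: activeFilter
    });
  }, 500);                                       // debounced 500 ms
});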

Semantic versioning for the measurement plan

The measurement plan itself is versioned. Before any significant update — new event added, existing event removed, parameter renamed — the current version is archived to measurement_plan/archive/measurement_plan_vN.html. The main document gets a version bump and a changelog entry.

Minor fixes (typos, refreshed screenshots, link corrections) don't trigger a version bump. Structural changes always do. This gives you a full audit trail: you can open the v1 archive and see exactly what was being tracked on launch day.
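
The archiving step itself can be a one-liner; a hypothetical Node example, with the version number hard-coded purely for illustration:

// Hypothetical: freeze the current plan as v3 before a structural change
import { copyFileSync } from 'node:fs';

copyFileSync(
  'measurement_plan/measurement_plan.html',
  'measurement_plan/archive/measurement_plan_v3.html'
);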

What this changes in practice

The immediate benefit is obvious: the documentation is always right. But the second-order effect is more interesting.

When updating the spec takes one command instead of an afternoon in a spreadsheet, people actually do it. The friction disappears. Documentation stops being a chore that competes with delivery, and becomes a natural side effect of doing the implementation work.

The audit trail also changes how conversations about data quality happen. Instead of "I think the event tracks X", you open the standalone HTML, navigate to the event tab, and show the screenshot of the exact element, taken from the live site, with the parameter table next to it. There's no ambiguity to argue about.

The goal isn't better documentation. It's making the cost of accurate documentation so low that it's never worth skipping.


This system was built as part of the TNK portfolio project — the same site you're reading this on. Every event described above is live and currently tracking. The measurement plan is available as a standalone HTML that reflects the current implementation, not the one from the launch sprint.
