
Analytics Documentation That Never Goes Stale

Every project starts with a tagging spec. Six months later, nobody trusts it. Here's how I built a system where the documentation writes itself — and stays true forever.

tagging-spec.xlsx vs measurement_plan.html

Manual approach

  1. Sprint: write the tagging spec
  2. Dev: implement the tracking
  3. Weeks later: the product changes
  Result: the docs are stale. "What does this event actually track?"

Automated approach

  1. Code: data-track attributes in the HTML
  2. Script: Puppeteer captures the evidence
  3. Build: a standalone HTML document
  Result: always current. Code IS the documentation.

The documentation problem nobody talks about

If you've worked in digital analytics long enough, you've been in this exact meeting. Someone asks which parameters the generate_lead event collects. You open the tagging spec. It says "email, subject, budget". You check GA4. Only "subject" and "budget" are showing up. You open the HTML. There it is — email was hashed and renamed to email_sha256 three sprints ago. The spec was never updated.

The document is three months old, two product iterations behind, and now actively misleading. And nobody feels at fault — because this is just how analytics documentation works in most projects.

The real cost: teams waste hours reconciling what the spec says with what's actually in GA4. Analysts make decisions on data they don't fully trust. Audits reveal the same inconsistencies every time.

The root cause isn't negligence. It's that documentation is treated as a deliverable, not a system. You write it once, ship it, and it starts drifting from reality the moment the first product change lands.

A different model: make the code document itself

The approach I've taken in this project flips the model. Instead of writing documentation separately and trying to keep it in sync, I designed a system where the tracking implementation and the documentation are the same artifact.

Three principles drive it:

  1. The tracking payload lives in the markup itself, so the implementation and the spec cannot diverge
  2. Privacy filtering is the default behavior of the code, not a configuration someone can forget
  3. Visual evidence is generated automatically, so the documentation can prove itself

How it works in practice

Declarative tracking via data attributes

Rather than writing JavaScript event handlers for each element, every trackable link in the site carries its analytics payload directly in the markup:

<!-- The tracking parameters live here, in the HTML itself -->
<a href="blog.html"
   data-track='{"content_type":"nav_link",
              "content_id":"blog",
              "content_name":"Blog",
              "item_list_name":"nav"}'>
  Blog
</a>

A single delegation listener in analytics.js intercepts all clicks on elements with a data-track attribute, parses the JSON, and pushes to the dataLayer. No per-element event handlers. No risk of parameters drifting between the HTML and some external document — they are the HTML.

Adding tracking to a new element doesn't require writing JavaScript. You add the attribute with the right parameters, and the delegation listener handles the rest. The spec updates itself.
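
For illustration, here is a minimal sketch of what such a delegation listener could look like. This is not the actual analytics.js source; the select_content event name is taken from the event registry below, and the guard logic is my assumption:

// Hypothetical sketch — not the actual analytics.js implementation
document.addEventListener('click', (event) => {
  const el = event.target.closest('[data-track]');
  if (!el) return;
  try {
    const params = JSON.parse(el.dataset.track);
    window.dataLayer = window.dataLayer || [];
    window.dataLayer.push({ event: 'select_content', ...params });
  } catch (err) {
    // A malformed data-track attribute should never break navigation
    console.warn('Invalid data-track payload', el, err);
  }
});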

Privacy by architecture, not by configuration

The central analytics script handles PII filtering at the system level, before any data touches the dataLayer. This isn't a checkbox in GTM — it's the default behavior of the code:

// Email → hashed with SHA-256 before it ever reaches the dataLayer
async function hashEmail(email) {
  const buf = await crypto.subtle.digest(
    'SHA-256',
    new TextEncoder().encode(email.trim().toLowerCase())
  );
  return Array.from(new Uint8Array(buf))
    .map(b => b.toString(16).padStart(2, '0')).join('');
}

// Free-text fields → never captured raw, only metadata
// message content → message_word_count, message_char_count, message_filled

// Fields named name, phone, address, etc. → automatically excluded
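
A hedged sketch of how that free-text metadata could be derived; the helper name is mine, not the project's:

// Hypothetical helper illustrating metadata-only capture of free text
function messageMetadata(raw) {
  const text = (raw || '').trim();
  return {
    message_filled: text.length > 0,
    message_word_count: text ? text.split(/\s+/).length : 0,
    message_char_count: text.length
  };
}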

The dataLayer push for a form submission looks like this — the raw message is never in there:

window.dataLayer.push({
  event:               'generate_lead',
  form_name:           'contact',
  subject:             'analytics audit',
  budget:              '5k-10k',
  message_word_count:  47,
  message_char_count:  214,
  message_filled:      true,
  email_sha256:        '3a7bd3e2...'   // hashed, never raw
});

Automated visual evidence with Puppeteer

The part that makes this system genuinely different from a well-maintained spreadsheet is the visual layer. A Puppeteer script (capture_measurement.mjs) runs through the entire site and produces proof of what's being measured: it visits each page, finds every tracked element, draws a red box around it, and captures a screenshot.

The output is a folder of 30+ screenshots, each showing exactly which element fires which event. When a designer moves a button, you re-run the script and the screenshots regenerate. No manual annotation, no outdated arrows on a slide deck.

This is auditable documentation: not "the spec says this button tracks X", but "here is a screenshot of that exact button, with a red box around it, taken automatically from the live site."
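
The actual script isn't reproduced in this post, but a minimal sketch of the idea looks like this. The local URL and output paths are assumptions:

// Hypothetical sketch of the capture step — not the real capture_measurement.mjs
import puppeteer from 'puppeteer';

const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto('http://localhost:8080/', { waitUntil: 'networkidle0' });

// Highlight each tracked element, screenshot it in context, then clean up
const tracked = await page.$$('[data-track]');
for (const [i, el] of tracked.entries()) {
  await el.evaluate(node => { node.style.outline = '3px solid red'; });
  await el.screenshot({ path: `measurement_plan/screenshots/el_${i}.png` });
  await el.evaluate(node => { node.style.outline = ''; });
}

await browser.close();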

The measurement plan as a build artifact

All of this feeds into an interactive HTML document: the measurement plan. Each event type gets its own tab, with a parameter table, example payloads, and the Puppeteer screenshots embedded inline.

Keeping it up to date requires no manual effort beyond the tracking implementation itself. When something changes:

  1. Update the data-track attribute (or inline JS) in the HTML
  2. Run node capture_measurement.mjs — screenshots regenerate
  3. Run node build_standalone_measurement.mjs — the standalone file rebuilds with all images embedded as base64

That standalone file — a single self-contained HTML — can be sent to any stakeholder, opened without a server, and read without any setup. It's the shareable artifact of record.
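
A sketch of the base64 inlining idea behind that build step; the file paths here are assumptions, not the project's actual layout:

// Hypothetical sketch of the inlining pass in build_standalone_measurement.mjs
import { readFileSync, writeFileSync } from 'node:fs';

let html = readFileSync('measurement_plan/measurement_plan.html', 'utf8');

// Swap every screenshot reference for an embedded data URI
html = html.replace(/src="(screenshots\/[^"]+\.png)"/g, (_match, file) => {
  const b64 = readFileSync(`measurement_plan/${file}`).toString('base64');
  return `src="data:image/png;base64,${b64}"`;
});

writeFileSync('measurement_plan/measurement_plan_standalone.html', html);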


What's currently tracked

The measurement plan for this project documents six event types across 28 parameters:

Event registry — 6 events · 28 parameters

  page_view · every page load · page_name, page_category, page_type, content_group
  select_content · click on any data-track link · content_type, content_id, content_name, item_list_name
  generate_lead · contact form submit · form_name, subject, budget, message_word_count, email_sha256
  orbit_interaction · hover on the skills orbit section · orbit_pause, skill_hover
  search · blog search (debounced 500ms, min 2 chars) · search_term, search_results_count, search_active_filter
  post_engagement · like or comment on a blog post · content_id, content_name, action (like/unlike/comment)
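
To show how a registry row maps to code, here is a hedged sketch of the debounced search tracking from the table above. The selector, countResults, and activeFilter are hypothetical stand-ins for the real blog search state:

// Hypothetical sketch of the debounced search event in the registry above
const searchInput = document.querySelector('#blog-search'); // assumed selector
const activeFilter = 'all';                                 // hypothetical state
const countResults = (term) => 0;                           // hypothetical stub

let searchTimer;
searchInput.addEventListener('input', (e) => {
  clearTimeout(searchTimer);
  searchTimer = setTimeout(() => {
    const term = e.target.value.trim();
    if (term.length < 2) return;                 // min 2 chars
    window.dataLayer.push({
      event: 'search',
      search_term: term,
      search_results_count: countResults(term),
      search_active_filter: activeFilter
    });
  }, 500);                                       // debounced 500 ms
});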

Semantic versioning for the measurement plan

The measurement plan itself is versioned. Before any significant update — new event added, existing event removed, parameter renamed — the current version is archived to measurement_plan/archive/measurement_plan_vN.html. The main document gets a version bump and a changelog entry.

Minor fixes (typos, refreshed screenshots, link corrections) don't trigger a version bump. Structural changes always do. This gives you a full audit trail: you can open the v1 archive and see exactly what was being tracked on launch day.
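
The archiving step itself can be a one-liner; a hypothetical Node example, with the version number hard-coded purely for illustration:

// Hypothetical: freeze the current plan as v3 before a structural change
import { copyFileSync } from 'node:fs';

copyFileSync(
  'measurement_plan/measurement_plan.html',
  'measurement_plan/archive/measurement_plan_v3.html'
);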

What this changes in practice

The immediate benefit is obvious: the documentation is always right. But the second-order effect is more interesting.

When updating the spec takes one command instead of an afternoon in a spreadsheet, people actually do it. The friction disappears. Documentation stops being a chore that competes with delivery, and becomes a natural side effect of doing the implementation work.

The audit trail also changes how conversations about data quality happen. Instead of "I think the event tracks X", you open the standalone HTML, navigate to the event tab, and show the screenshot of the exact element, taken from the live site, with the parameter table next to it. There's no ambiguity to argue about.

The goal isn't better documentation. It's making the cost of accurate documentation so low that it's never worth skipping.


This system was built as part of the TNK portfolio project — the same site you're reading this on. Every event described above is live and currently tracking. The measurement plan is available as a standalone HTML that reflects the current implementation, not the one from the launch sprint.
