The Pipeline That Pays for Itself

I’ve built UI testing pipelines for a long time. Visual regression, accessibility gates, component isolation — the concepts aren’t new to me. But the tools are. And every few years it’s worth clearing the workbench, starting from zero, and seeing how far the state of the art has moved.

This is that check-in.

I picked the simplest possible surface — color tokens — and wired them end-to-end from Figma to tested production. Not because colors are the hard part, but because they’re the easiest thing to trace through every layer. If you can prove the pipeline works for nine color values, you can prove it works for spacing, typography, and full components. Part 2 will do exactly that. This is the foundation.

Here’s what I found: today’s tools make this absurdly fast. What used to be a multi-sprint initiative is now an evening.

The stack

Seven tools, each with one job:

  • Figma Variables — the source of truth. Primitive colors (raw hex values like #d14b3d) and semantic tokens (intent-based names like color-accent that alias the primitives).
  • Tokens Studio — Figma plugin that syncs your variables to a tokens.json file in GitHub. Push a button, get a commit.
  • Style Dictionary v5 — Node.js build tool that reads tokens.json and transforms it into CSS custom properties: primitives become declarations like --charcoal: #2a2a28, and semantic tokens become aliases like --color-text-primary: var(--charcoal).
  • Astro — the site framework. Imports the generated variables.css into global.css. Components only reference semantic tokens, never raw hex.
  • Storybook (html-vite) — component development and visual catalog. Loads the same global.css, so stories render with real tokens.
  • Vitest — unit-level token validation. Confirms every semantic variable references a primitive, no hardcoded hex sneaking back in.
  • Playwright — visual regression screenshots and accessibility testing via axe-core. Catches what unit tests can’t: does it actually look right?
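
To make the primitive/semantic split concrete, here is a rough sketch of what a Tokens Studio export can look like — the group names and metadata keys vary by version, and only charcoal, cream, and color-text-primary come from the article; color-background is an illustrative assumption. Note the short {charcoal} reference style, which matters later:

```json
{
  "Primitives/Mode 1": {
    "charcoal": { "value": "#2a2a28", "type": "color" },
    "cream": { "value": "#f1f1ed", "type": "color" }
  },
  "Semantic/Mode 1": {
    "color-text-primary": { "value": "{charcoal}", "type": "color" },
    "color-background": { "value": "{cream}", "type": "color" }
  }
}
```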

How they connect

This is the part most guides skip. The tools aren’t a list — they’re a chain, and the order matters.

Design → Code: Figma Variables → Tokens Studio → tokens/tokens.json → Style Dictionary → src/styles/tokens/variables.css → global.css → Astro components

Dev → Test: Storybook loads global.css → stories render with real tokens → Vitest validates token structure (node environment) → Playwright screenshots pages against baselines → axe-playwright runs accessibility audits against Storybook stories

Push → Deploy: GitHub Actions runs the full chain: build tokens → validate with Vitest → build Astro → build Storybook → Playwright visual regression → Playwright a11y → if all green, Cloudflare Pages deploys

Change a color in Figma, push, and every layer verifies itself. The token test catches structural errors. The screenshot test catches visual errors. The axe test catches contrast and landmark errors. If something slips through all three, it probably isn’t wrong.
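
For reference, the Style Dictionary link in that chain can be sketched as a config like this — the paths match the article, but the file itself is my reconstruction, not the project's actual config:

```javascript
// style-dictionary.config.js — sketch (v5-style ESM config).
export default {
  source: ['tokens/tokens.json'],
  platforms: {
    css: {
      transformGroup: 'css',
      buildPath: 'src/styles/tokens/',
      files: [
        {
          destination: 'variables.css',
          format: 'css/variables',
          // Keep aliases as var(--charcoal) instead of inlining raw hex
          options: { outputReferences: true },
        },
      ],
    },
  },
};
```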

What actually broke

Style Dictionary v5 and Tokens Studio don’t quite speak the same language. Tokens Studio exports Figma’s variable groups as Primitives/Mode 1 with short references like {charcoal}. Style Dictionary expects dot-path references like {primitives.charcoal} and chokes on the slashes and spaces. Nine references, nine failures, one cryptic error: “Some token references (9) could not be found.”

The fix was a Style Dictionary preprocessor — a function that flattens the Tokens Studio structure into a single namespace before reference resolution runs. It’s twelve lines of JavaScript. I didn’t find it in any tutorial.
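
For anyone hitting the same error, here is a minimal sketch of that flattening step — the function name, the metadata-skipping detail, and the assumption that hoisting tokens to the root is enough are mine, not a verbatim copy of the original twelve lines:

```javascript
// Sketch of a Tokens Studio flattener (hypothetical names). Tokens Studio
// exports groups like "Primitives/Mode 1"; hoisting every token to the
// root namespace lets short references like {charcoal} resolve.
function flattenTokensStudio(tokens) {
  const flat = {};
  for (const [group, value] of Object.entries(tokens)) {
    if (group.startsWith('$')) continue; // skip $themes / $metadata keys
    Object.assign(flat, value);          // hoist the group's tokens to the root
  }
  return flat;
}

// Registered with Style Dictionary v5 (sketch):
//   StyleDictionary.registerPreprocessor({
//     name: 'tokens-studio-flatten',
//     preprocessor: (tokens) => flattenTokensStudio(tokens),
//   });
```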

Storybook 10 tried to parse .astro files as JavaScript when the default story glob matched src/**/*. Astro components are server-rendered templates, not client JS. The fix: point stories at a dedicated stories/ directory instead of scanning src/.
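
A sketch of that fix, assuming a conventional .storybook/main.js — the glob is illustrative:

```javascript
// .storybook/main.js — sketch. Scan a dedicated stories/ directory
// so the story glob never matches .astro files under src/.
export default {
  framework: '@storybook/html-vite',
  stories: ['../stories/**/*.stories.@(js|ts)'],
};
```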

Playwright’s visual regression tests timed out because stale Node processes from a previous npm run dev session were holding ports 4321 and 6006. reuseExistingServer: true told Playwright to use them. They were ghosts. Kill the processes, tests pass. For CI, the config builds and serves static files instead of using the dev server — more reliable, closer to production.
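
A sketch of what that config split can look like — the serve commands and output directories (dist, storybook-static) are assumptions, not the project's exact setup:

```javascript
// playwright.config.js — sketch. Fresh dev servers locally;
// static builds in CI, which is more reliable and closer to production.
import { defineConfig } from '@playwright/test';

const CI = !!process.env.CI;

export default defineConfig({
  webServer: [
    {
      command: CI ? 'npx serve dist -l 4321' : 'npm run dev',
      port: 4321,
      reuseExistingServer: !CI, // never trust a possibly-stale server in CI
    },
    {
      command: CI ? 'npx serve storybook-static -l 6006' : 'npm run storybook',
      port: 6006,
      reuseExistingServer: !CI,
    },
  ],
});
```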

And the a11y test caught a legitimate violation on the first run — but it was Storybook’s own wrapper div (sb-nopreview) failing the region landmark rule, not my component. Excluding Storybook’s chrome from the axe audit fixed it.

What the test suite actually validates

Four things, at three different layers:

Token integrity (Vitest, node environment):

  • Generated variables.css exists
  • All nine semantic tokens are present
  • Every semantic token uses var() — no hardcoded hex
  • All ten primitive tokens contain valid hex values
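
The no-hardcoded-hex rule reduces to a small check. A sketch of the helper behind it — the function name is mine; the article's suite runs it under Vitest against the generated file:

```javascript
// Returns true when a custom property is defined via var(), not raw hex.
function usesVarReference(css, tokenName) {
  const match = css.match(new RegExp('--' + tokenName + ':\\s*([^;]+);'));
  return match !== null && /var\(--[\w-]+\)/.test(match[1]);
}

// In a Vitest spec (sketch):
//   const css = readFileSync('src/styles/tokens/variables.css', 'utf8');
//   expect(usesVarReference(css, 'color-text-primary')).toBe(true);
```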

Visual correctness (Playwright):

  • Homepage screenshot matches baseline within 1% pixel tolerance
  • Computed background-color on <html> matches rgb(241, 241, 237) — the cream token
  • Computed color on <html> matches rgb(42, 42, 40) — the charcoal token
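
One detail worth noting: browsers report computed colors in rgb() form, so the test compares against converted hex values. A sketch of the conversion (the helper name is mine):

```javascript
// Convert a #rrggbb token value to the rgb(...) string a browser reports.
function hexToRgb(hex) {
  const n = parseInt(hex.slice(1), 16);
  return `rgb(${(n >> 16) & 255}, ${(n >> 8) & 255}, ${n & 255})`;
}

// In the Playwright spec (sketch):
//   const bg = await page.evaluate(() =>
//     getComputedStyle(document.documentElement).backgroundColor);
//   expect(bg).toBe(hexToRgb('#f1f1ed')); // cream → rgb(241, 241, 237)
```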

Accessibility (axe-playwright via Storybook):

  • Token color swatch story passes axe-core with zero violations

Build verification (GitHub Actions):

  • Token build completes
  • Astro production build completes
  • Storybook build completes
  • All of the above pass before deploy
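
A sketch of that workflow's shape — the script names and action versions are assumptions; only the gate order comes from the article:

```yaml
# .github/workflows/ci.yml — sketch of the gate order, not the real file.
name: ci
on: push
jobs:
  verify:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
      - run: npm ci
      - run: npm run build:tokens     # Style Dictionary
      - run: npx vitest run           # token integrity
      - run: npm run build            # Astro production build
      - run: npm run build-storybook
      - run: npx playwright test      # visual regression + a11y
  # Cloudflare Pages deploys only after every gate above is green
```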

Where this goes next

This is Part 1 — colors only, deliberately simple. The pipeline is proven. Every link in the chain works: design tool to code to test to deploy, with automated verification at each step.

Part 2 adds a real component. A carousel, designed in Figma using the semantic tokens, built as an Astro web component with full ARIA support, tested with visual regression across breakpoints and accessibility audits. That’s where the pipeline earns its keep — when a design change in Figma flows through to a tested, accessible, deployed component without anyone manually checking if the border radius is right.

The tools are ready. It’s a good time to be building this stuff.
