The Scorecard: Grading Design Systems for the AI Era
Fluent UI scores 25 out of 80. Material Design scores 22. Neither system has any vocabulary for expressing AI confidence to users — both score 1 out of 10 on trust expression.
Those aren’t opinions. They’re scores against a methodology you can apply to any design system yourself. Here’s how it works.
The Methodology
The Eight Dimensions of AI-Native Design defined what a design system needs to handle in the AI era. The scorecard turns that framework into numbers. Each dimension is scored 1–10:
| Score | What It Means |
|---|---|
| 1–2 | No support. The system doesn’t address this dimension. |
| 3–4 | Building blocks exist but no coherent vision. Developers assemble it themselves. |
| 5–6 | Intentional support with gaps. The system has features for this but they’re incomplete. |
| 7–8 | Strong support. The dimension is a design priority with real primitives. |
| 9–10 | Leading. The system defines the state of the art for this dimension. |
Eight dimensions, ten points each, eighty total. No design system is close to 80. The interesting question isn’t the total — it’s the shape of the score. Where does a system lead? Where is it asleep?
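Since the interesting output is the shape rather than the total, the rubric is easy to mechanize. Here is a small TypeScript sketch; the dimension keys and the sample column are illustrative stand-ins, not the actual per-dimension splits from the scorecards below.

```typescript
// The eight dimensions, scored 1-10 each; 80 is the ceiling.
type Dimension =
  | "contextualAdaptation"
  | "fluidMotion"
  | "progressiveDisclosure"
  | "naturalLanguageIntegration"
  | "multiModalHarmony"
  | "trustSignaling"
  | "collaborativePresence"
  | "gracefulDegradation";

type Scorecard = Record<Dimension, number>;

// Total is just the sum.
function total(card: Scorecard): number {
  return Object.values(card).reduce((sum, s) => sum + s, 0);
}

// The "shape": where a system leads (intentional support or better)
// and where it is asleep (no support at all).
function shape(card: Scorecard): { leads: Dimension[]; asleep: Dimension[] } {
  const entries = Object.entries(card) as [Dimension, number][];
  return {
    leads: entries.filter(([, s]) => s >= 5).map(([d]) => d),
    asleep: entries.filter(([, s]) => s <= 2).map(([d]) => d),
  };
}

// A hypothetical column (invented numbers, not a real system's split).
const example: Scorecard = {
  contextualAdaptation: 5,
  fluidMotion: 2,
  progressiveDisclosure: 3,
  naturalLanguageIntegration: 2,
  multiModalHarmony: 6,
  trustSignaling: 1,
  collaborativePresence: 4,
  gracefulDegradation: 2,
};
```

Running `shape(example)` surfaces the asymmetry directly: two dimensions lead, four are asleep, and the total (25) tells you none of that.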
Here’s what “good” looks like for each dimension — the difference between a 3 and a 7.
Contextual Adaptation. If your system has a Spinner and a Toast, you’re at a 2. If it has compositional primitives for the input → processing → response → confirmation cycle — streaming content components, typing indicators, progressive block rendering — you’re at a 5+. A 7 means the conversation choreography is a first-class design concern, not something each product team invents.
Fluid Motion. Discrete state switches (default → hover → active) keep you at a 2. Data-driven continuous parameters — spring physics, shape morphing, animation driven by confidence values rather than boolean flags — push toward a 7. The question is whether visual properties can be blended or only switched.
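The switched-versus-blended distinction is concrete enough to sketch. Below, a hypothetical component's "guessing" and "confident" styles are linearly interpolated by a continuous confidence value; all names and style values are invented for illustration.

```typescript
// Discrete switching -- a 2 on the rubric. Properties jump between fixed states.
function discreteOpacity(state: "default" | "hover" | "active"): number {
  return { default: 0.6, hover: 0.8, active: 1.0 }[state];
}

// Continuous blending -- confidence in [0, 1] drives the properties directly.
const lerp = (a: number, b: number, t: number) => a + (b - a) * t;

interface Style {
  opacity: number;
  blurPx: number;
  saturation: number;
}

// Two endpoint styles (values invented): hazy when guessing, crisp when sure.
const guessing: Style = { opacity: 0.55, blurPx: 1.5, saturation: 0.4 };
const confident: Style = { opacity: 1.0, blurPx: 0.0, saturation: 1.0 };

// Every intermediate confidence value produces an intermediate style --
// blended, not switched.
function styleForConfidence(confidence: number): Style {
  const t = Math.min(1, Math.max(0, confidence)); // clamp to [0, 1]
  return {
    opacity: lerp(guessing.opacity, confident.opacity, t),
    blurPx: lerp(guessing.blurPx, confident.blurPx, t),
    saturation: lerp(guessing.saturation, confident.saturation, t),
  };
}
```

A system stuck at `discreteOpacity` can only express three states; `styleForConfidence` can express as many as the model can emit.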
Progressive Disclosure. Fixed layouts for known content are a 3. Responsive containers that negotiate with content of unknown shape — ten words or five hundred, code or prose, with or without images — that’s a 7. No major system is there yet.
Natural Language Integration. If the design system lives as a Storybook site and a Figma kit, it’s a 2. If the design artifact is the runtime artifact — the animation you author is the animation that ships — it’s a 5+. A 7 means there’s no translation layer between design intent and deployed behavior.
Multi-Modal Harmony. Per-component ARIA compliance is a 4. System-level focus management that works across dynamically rendered trees — where content streams in and interactive elements appear at runtime — that’s a 6+. A 7 adds cognitive load modeling, reading level adaptation, and motion sensitivity beyond a binary toggle.
Trust & Confidence Signaling. If the system has color roles for error and success but nothing for confidence, uncertainty, or provenance, it’s a 1. Anything above a 3 means the system has some vocabulary for epistemic state — tokens or components that distinguish “I’m confident” from “I’m guessing.” Currently, no major system scores above 1.
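To make "some vocabulary for epistemic state" concrete, here is a minimal sketch of what confidence tokens could look like. The token names, thresholds, colors, and labels are invented for illustration; no shipping system defines these.

```typescript
// Hypothetical token vocabulary for epistemic state -- the thing that
// distinguishes "I'm confident" from "I'm guessing" in the UI.
type EpistemicState = "confident" | "uncertain" | "speculative";

interface TrustTokens {
  borderColor: string; // visual treatment of the AI-generated region
  icon: string;        // glyph shown alongside the content
  label: string;       // what the UI actually tells the user
}

const trustTokens: Record<EpistemicState, TrustTokens> = {
  confident:   { borderColor: "#1a7f37", icon: "check",    label: "High confidence" },
  uncertain:   { borderColor: "#9a6700", icon: "question", label: "May contain errors" },
  speculative: { borderColor: "#cf222e", icon: "warning",  label: "Best guess, verify before use" },
};

// Map a raw model confidence score onto the vocabulary.
// Thresholds are arbitrary placeholders a real system would calibrate.
function epistemicState(confidence: number): EpistemicState {
  if (confidence >= 0.8) return "confident";
  if (confidence >= 0.5) return "uncertain";
  return "speculative";
}
```

The point of the sketch is the shape, not the values: once epistemic state is a token role like `error` or `success`, every component in the system can consume it consistently.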
Collaborative Presence. Tokens that cover color, typography, and spacing but only for a single surface keep you at a 4. A 7 means the design system defines how the AI behaves on every surface — web, mobile, voice, email, embedded panels — with motion tokens, behavioral tokens, and voice persona tokens that translate across contexts.
Graceful Degradation. A component library with documentation is a 2. A schema that generative tools can consume as behavioral constraints — so Copilot, Claude, or Cursor can generate components that comply with the design system’s rules — that’s a 7+. The design system becomes a machine-readable spec, not just a human-readable site.
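As a sketch of what such a machine-readable constraint schema might look like, here is a hypothetical per-component rule plus the compliance check a generation pipeline could run on emitted components. Every field name is invented; real systems would need far richer constraints.

```typescript
// Hypothetical constraint schema -- the artifact a generative tool would
// consume before emitting components, instead of a human-readable docs site.
interface ComponentConstraint {
  name: string;
  allowedVariants: string[];   // only these variants exist in the system
  requiredProps: string[];     // props every instance must supply
  forbiddenInlineStyles: boolean; // inline overrides break the brand
}

const buttonConstraint: ComponentConstraint = {
  name: "Button",
  allowedVariants: ["primary", "secondary", "subtle"],
  requiredProps: ["label", "onPress"],
  forbiddenInlineStyles: true,
};

// What a coding tool emits, reduced to the fields the check cares about.
interface GeneratedComponent {
  name: string;
  variant: string;
  props: Record<string, unknown>;
  inlineStyles?: Record<string, string>;
}

// A generated component either satisfies the schema or is a brand violation.
function complies(c: GeneratedComponent, rule: ComponentConstraint): boolean {
  if (c.name !== rule.name) return false;
  if (!rule.allowedVariants.includes(c.variant)) return false;
  if (rule.requiredProps.some((p) => !(p in c.props))) return false;
  if (rule.forbiddenInlineStyles && c.inlineStyles !== undefined) return false;
  return true;
}
```

The design choice worth noting: the schema encodes behavior ("no inline styles", "these props are mandatory"), not just appearance, which is exactly what a token file alone cannot express.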
The Scorecards
Three columns tell the real story. Fluent UI on web, Material Design 3 on web, and Material Design 3 on native (Jetpack Compose). The native column matters because Google’s best AI-native design thinking lives there — and nowhere else.
| System | Total |
|---|---|
| Fluent UI (Web) | 25/80 |
| Material Design 3 (Web) | 22/80 |
| Material Design 3 (Compose/Native) | 29/80 |

Fluent UI — 25/80. Accessibility is the bright spot — 6 out of 10, the only score above 5 in the entire matrix. Tabster and system-level focus management put it ahead of everything else for dynamic content. The 1 on trust expression is the blind spot. Fluent can tell you a button is primary. It can’t tell you the AI is guessing.
Material Design 3 (Web) — 22/80. The lowest total. M3 Expressive was designed for AI — but only shipped on mobile. On web, Material is in maintenance mode. The GitHub issue requesting Expressive support has heart reactions and zero maintainer responses. Community forks are stepping in, but this is fragmented and risky for enterprise adoption. The strategic blunder is real.
Material Design 3 (Compose) — 29/80. The highest total, and the only system where Fluid Motion scores above a 3. Spring physics, shape morphing, continuous state-driven animation — this is what AI-native motion looks like. But it’s Kotlin-only. If you’re building for the web, this column is a window into the future, not a tool you can use today.
The Patterns
Three things jump out of the scores.
Trust expression is unclaimed territory. Both systems score 1/10. This is the dimension that will make or break AI adoption in regulated industries — healthcare, finance, government. The first design system to ship confidence tokens, uncertainty indicators, and provenance components will have a structural advantage in enterprise AI sales. It’s the most important dimension, and it’s completely empty.
The web is underserved. Google’s best AI-native design thinking is on Compose. Microsoft’s Copilot products all build their own conversation layer on top of Fluent. The door is open for whoever ships these patterns as web components first. Fluent could leapfrog Material on web by adopting the AI-native patterns that M3 Expressive pioneered on native — continuous motion, shape morphing, conversational components — and shipping them where enterprise actually lives.
Graceful Degradation is the sleeper. Design-system-as-constraint-system sounds abstract until you realize every AI coding tool is already generating UI components. If the design system can’t be consumed as a machine-readable behavioral spec, every generated component is a brand violation. The team that turns their design system into a schema that LLMs can follow has solved a problem every enterprise will have within 18 months.
Score Your Own
Take the design system you use every day. Score it against the eight dimensions. You don’t need a rubric — the calibration above is enough. Be honest about what “3” really means (“building blocks exist, developers assemble it themselves”).
You’ll find the same pattern: strong on the established dimensions — accessibility, theming, composition — empty on the new ones. Trust, spectrums, constraints. The shape of the score is more interesting than the total.
The gap is the opportunity.