Your system is in genuinely good shape where it counts most — the token architecture is close to best-in-class and adoption across the monorepo is broad and active.
The two things holding it back are governance (no documented contribution or deprecation process, despite 17 contributors) and documentation (Storybook coverage is strong, but usage guidelines and anti-patterns are almost entirely missing). AI readiness and platform maturity are thin — but that's expected at this stage, and you're sitting on unusually high-quality raw material to fix them cheaply.
What this is. A code-side health assessment of the @lawhive/ui design system — three-tier tokens, ~60–77 components, Storybook, a published package, four consuming apps. The full seven-dimension template applies. Figma, npm/CodeArtifact analytics, and design-side data were not inspected.
Overall health
Seven dimensions, scored by direct inspection. Three are strong, one functional, three weak — and the weak ones are the low-effort, high-leverage ones.
- … `token-names.gen.ts`, 508 tokens).
- `combobox` vs `combobox-next` duplication awaiting cleanup.
- `workspace:*`: no semver, changelog, or release cadence that consumers can plan around.

Maturity stage: Managed, with the token layer already operating at Systematic.
The token architecture is enforced by a generated typed contract that fails CI on drift (a Systematic-to-Measured capability), and component APIs follow a consistent contract. But contribution, deprecation and decision processes are informal and undocumented, there's no quantitative adoption or drift tracking, and no recurring review cadence — which keeps the system as a whole below Systematic. To get there, it needs a documented contribution and deprecation process and a per-component documentation standard beyond Storybook.
A note on scoring. The system-health command asks for numeric scores out of 35. The skill's own output-discipline rule prohibits invented numeric scores — they imply a measurement instrument that doesn't exist — so this report uses the calibrated status labels the skill defines (Strong / Functional / Weak / Absent) with factual measurements behind each.
Dimension findings
Tokens
Strong
- All three tiers are present and well-structured: primitive palettes (`--color-grey-*`, `--color-lawrence-purple-*`, full type scale, radii), a semantic tier via `@theme inline` (`--color-background`, `--primary`, `--muted-foreground`, `--destructive`, `--border`, `--ring`), and semantic tokens that describe intent rather than appearance.
- The component tier is deliberately thin (essentially only `--card`). This follows the shadcn convention, in which components bind to the semantic tier directly. It is not a gap.
- Single source of truth with real enforcement. `tokens.css` is the source. `scripts/generate-design-tokens.mjs` emits a committed typed union (`token-names.gen.ts`), and `pnpm tokens:check` / `typecheck` fail CI if the contract drifts. That is stronger enforcement than most Systematic-stage systems have.
- Excellent inline intent documentation on the z-index band and motion tokens: the why is written down where a future reader needs it.
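The "generated typed contract" idea can be made concrete with a small sketch. A drift check of this kind does three things: it collects custom-property names from a stylesheet, diffs them against a committed list, and renders a string-literal union. This is not the real `scripts/generate-design-tokens.mjs`. The function names (`extractTokenNames`, `diffTokenContract`, `hasDrift`, `renderTokenUnion`) and the regex-based parsing are assumptions made for illustration.

```typescript
// Hypothetical sketch of a token-contract drift check (not the real generator).

/** Collect every custom-property declaration name, e.g. "--primary". */
function extractTokenNames(css: string): string[] {
  const names = new Set<string>();
  // Matches declarations such as "--color-grey-100: #f5f5f5" but not
  // references such as "var(--primary)", which are not followed by a colon.
  const declaration = /(--[A-Za-z0-9-]+)\s*:/g;
  let match: RegExpExecArray | null;
  while ((match = declaration.exec(css)) !== null) {
    names.add(match[1]);
  }
  return Array.from(names).sort();
}

interface TokenDrift {
  added: string[]; // declared in CSS but missing from the committed contract
  removed: string[]; // in the committed contract but no longer declared
}

/** Compare the names declared in CSS with the committed contract. */
function diffTokenContract(css: string, committed: readonly string[]): TokenDrift {
  const current = extractTokenNames(css); // already sorted
  const previous = new Set(committed);
  const currentSet = new Set(current);
  return {
    added: current.filter((name) => !previous.has(name)),
    removed: committed.filter((name) => !currentSet.has(name)).sort(),
  };
}

/** A CI step in this style would fail when this returns true. */
function hasDrift(drift: TokenDrift): boolean {
  return drift.added.length > 0 || drift.removed.length > 0;
}

/** Render the committed typed union, one string-literal member per token. */
function renderTokenUnion(names: readonly string[]): string {
  const members = names.map((name) => `  | "${name}"`).join("\n");
  return `export type TokenName =\n${members};\n`;
}
```

Failing on any non-empty `added` or `removed` list is what keeps the committed contract honest. A production version would more likely use a CSS parser than a regular expression.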
Components
Strong
- ~77 source components (60 with stories), spanning foundational (`button`, `input`, `checkbox`, `badge`, `separator`), compound (`dialog-form`, `sidebar`, `comment-thread`) and feature-level (`data-table`, `combobox-next`, `editable-popover`). Distribution is healthy, weighted toward foundations.
- Consistent API contract: `cva` / `tailwind-variants` + `VariantProps` typing + Radix primitives + `cn`. A new component slots into a predictable shape.
- State coverage is good: `button` alone covers hover, focus-visible, active, disabled and nested-svg defaults.
- Two divergence signals are worth tracking. First, `combobox` and `combobox-next` coexist: a migration is in flight, and the old component becomes a deprecation candidate once consumers move. Second, `button` carries a tracked TODO about splitting icon size variants. Both are tracked debt, which is healthy, but a Linear ticket is not a deprecation process.
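The shape this API contract gives every component can be sketched without dependencies. The snippet below is a hypothetical, hand-rolled stand-in for what `cva` provides: base classes, named variants, defaults, and derived prop types. It is neither `cva`'s implementation nor its API, and `createVariants` is an illustrative name.

```typescript
// Hypothetical, dependency-free stand-in for the cva pattern.
type VariantMap = Record<string, Record<string, string>>;

interface VariantConfig<V extends VariantMap> {
  base: string;
  variants: V;
  defaultVariants: { [K in keyof V]: keyof V[K] };
}

// Props derived from the config, analogous in spirit to cva's VariantProps.
type VariantSelection<V extends VariantMap> = { [K in keyof V]?: keyof V[K] };

function createVariants<V extends VariantMap>(config: VariantConfig<V>) {
  return (selection: VariantSelection<V> = {}): string => {
    const classes: string[] = [config.base];
    for (const key of Object.keys(config.variants) as Array<keyof V>) {
      const options: Record<string, string> = config.variants[key];
      // An explicit prop wins; otherwise fall back to the declared default.
      const chosen = selection[key] ?? config.defaultVariants[key];
      classes.push(options[String(chosen)]);
    }
    return classes.filter(Boolean).join(" ");
  };
}

// Example: a button-like component with two variant axes.
const buttonClasses = createVariants({
  base: "inline-flex items-center",
  variants: {
    intent: { primary: "bg-primary text-primary-foreground", destructive: "bg-destructive" },
    size: { sm: "h-8 px-3", md: "h-9 px-4" },
  },
  defaultVariants: { intent: "primary", size: "md" },
});
// buttonClasses() returns "inline-flex items-center bg-primary text-primary-foreground h-9 px-4"
```

In shadcn-style setups, `cn` typically also resolves conflicting Tailwind classes. This sketch only concatenates them.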
… `combobox` so the duplication doesn't become permanent.

Documentation
Functional
- Storybook coverage is strong: 60 stories for 77 components (~78%), with `addon-docs` and `addon-a11y` wired in. Contributor conventions (styling, icons, forms, accessibility, Storybook standards) are well documented in `docs/agent/react-and-frontend.md`.
- Usage guidelines are the gap. Only 2 `.docs.mdx` files exist. Most components have a visual/API story but no when-to-use / when-not-to-use / anti-pattern prose. That prose is the documentation that prevents misuse, and it is also the content that unlocks AI readiness.
- The package README is two lines, so a new consumer who lands there gets no orientation.
- Code-side only: if usage guidance lives in Zeroheight / Notion / Figma, it wasn't in scope.
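Tracking the usage-guideline gap is mostly a file-inventory question. The minimal sketch below assumes a flat naming convention of `<name>.tsx`, `<name>.stories.tsx` and `<name>.docs.mdx`; the real layout of `packages/ui` may differ. It computes story and usage-doc coverage and lists the components that still lack prose guidance. `docCoverage` is a hypothetical name.

```typescript
// Hypothetical sketch: per-component story and usage-doc coverage from a
// flat list of file paths. The naming convention is an assumption.

interface DocCoverage {
  components: number;
  withStories: number;
  withUsageDocs: number;
  missingUsageDocs: string[]; // components with no .docs.mdx usage guidance
}

/** "src/button.stories.tsx" -> "button"; "src/combobox-next.tsx" -> "combobox-next". */
function baseName(path: string): string {
  const file = path.split("/").pop() ?? path;
  return file.split(".")[0];
}

function docCoverage(paths: readonly string[]): DocCoverage {
  const components = new Set<string>();
  const stories = new Set<string>();
  const docs = new Set<string>();
  for (const p of paths) {
    // Check the more specific suffixes first, since stories also end in .tsx.
    if (p.endsWith(".stories.tsx")) stories.add(baseName(p));
    else if (p.endsWith(".docs.mdx")) docs.add(baseName(p));
    else if (p.endsWith(".tsx") && !p.endsWith(".test.tsx")) components.add(baseName(p));
  }
  const names = Array.from(components).sort();
  return {
    components: names.length,
    withStories: names.filter((n) => stories.has(n)).length,
    withUsageDocs: names.filter((n) => docs.has(n)).length,
    missingUsageDocs: names.filter((n) => !docs.has(n)),
  };
}
```

Run against the real inventory, a report like this turns the story and `.docs.mdx` counts above into a per-component backlog for the usage-guideline work.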
Adoption
Strong
- Broad, real consumption: `@lawhive/ui` is a dependency of `legal-os`, `admin-app`, `client-os`, `outlook-addin` and `react-hook-form-fields`. Docs explicitly position it as the migration target ("check the package before creating local components").
- Very active: 159 commits to `packages/ui` in the last 90 days, and 17 distinct contributors over the last 180 days.
- One known parallel system: `admin-app` uses a vendored copy of Tremor Raw (a separate house style). This is a legitimate, documented divergence rather than accidental drift, but it means adoption isn't uniform across the org.
- Caveat: this measures coverage / availability, not usage. No quantitative adoption tracking exists, so "available everywhere" and "used everywhere" can't be distinguished. The commit and dependency signals are strong proxies.
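Closing the "available vs used" gap requires an actual count. The sketch below tallies named imports from `@lawhive/ui`, including subpath imports, across a set of source files passed in as a path-to-text map. Running it once per consuming app gives a per-app usage profile. It is regex-based and ignores namespace and default imports, so treat it as an approximation; a real tool would more likely use the TypeScript compiler API. `countUiImports` and `mostUsed` are hypothetical names.

```typescript
// Hypothetical sketch: count named imports from @lawhive/ui (and subpaths)
// across one app's source files, given as path -> source text.

function countUiImports(files: Record<string, string>, pkg = "@lawhive/ui"): Map<string, number> {
  const counts = new Map<string, number>();
  for (const source of Object.values(files)) {
    // Matches `import { A, B as C } from "pkg"` and `import type { T } from 'pkg'`,
    // including specifier lists that span several lines.
    const importRe = /import\s*(?:type\s+)?\{([^}]*)\}\s*from\s*["']([^"']+)["']/g;
    let m: RegExpExecArray | null;
    while ((m = importRe.exec(source)) !== null) {
      const from = m[2];
      if (from !== pkg && !from.startsWith(pkg + "/")) continue; // not the design system
      for (const specifier of m[1].split(",")) {
        // "type Foo" -> "Foo"; "Input as TextInput" -> "Input" (count the export, not the alias)
        const name = specifier.trim().replace(/^type\s+/, "").split(/\s+as\s+/)[0].trim();
        if (name) counts.set(name, (counts.get(name) ?? 0) + 1);
      }
    }
  }
  return counts;
}

/** Rank exports by import count (ties broken alphabetically). */
function mostUsed(counts: Map<string, number>, limit = 10): Array<[string, number]> {
  return Array.from(counts.entries())
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, limit);
}
```

A ranking like `mostUsed` would also give a data-backed list of the "~10 most-used" components that the documentation standard proposes to start with.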
Governance
Weak
- No documented contribution process. With 17 contributors landing changes, there's no written bar for what belongs in the system vs. what stays local, and no visible proposal/decision record.
- No deprecation process and no CHANGELOG. The `combobox` → `combobox-next` migration is happening without a formal deprecation notice or migration path in the package.
- No decision records specific to the design system. Conventions live in `AGENTS.md`, and some decisions live in Linear.
- What does exist: an implied owning team, strong convention docs and CI enforcement on tokens. Decision-making happens. It's just not legible to a newcomer or defensible six months on.
- Calibration: because consumption is `workspace:*` (atomic monorepo changes, no external version negotiation), some governance overhead is genuinely unnecessary. But at this contributor count, the absence of a contribution bar and deprecation process is a real trust risk, not premature process.
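A deprecation process needs a visible signal in code as well as a written policy. One possible pattern combines TypeScript's `@deprecated` JSDoc tag, which editors surface as a strikethrough, with a one-time runtime warning that points to the migration path. The names used here (`deprecate`, `DeprecationNotice`, `LegacyCombobox`) are illustrative, not anything the package exports. Components are modelled as plain functions to keep the sketch dependency-free.

```typescript
// Hypothetical sketch: wrap a deprecated component so consumers get one
// clear notice with a migration path. Names and fields are illustrative.

interface DeprecationNotice {
  component: string; // e.g. "Combobox"
  replacement: string; // e.g. "ComboboxNext"
  migration: string; // where the migration steps are written down
}

type Component<P> = (props: P) => unknown;

function formatNotice(n: DeprecationNotice): string {
  return `[@lawhive/ui] ${n.component} is deprecated; use ${n.replacement}. Migration: ${n.migration}`;
}

function deprecate<P>(
  component: Component<P>,
  notice: DeprecationNotice,
  warn: (message: string) => void = console.warn,
): Component<P> {
  let warned = false;
  return (props: P) => {
    // Warn on the first render only, so logs are not flooded.
    if (!warned) {
      warned = true;
      warn(formatNotice(notice));
    }
    return component(props);
  };
}

// Example: the old entry point keeps working but announces its replacement.
const LegacyCombobox = (props: { label: string }) => `combobox:${props.label}`;

/** @deprecated Use ComboboxNext; see the migration notes. */
const Combobox = deprecate(LegacyCombobox, {
  component: "Combobox",
  replacement: "ComboboxNext",
  migration: "migration notes in the package (hypothetical location)",
});
```

The JSDoc tag reaches developers at edit time, and the runtime warning reaches anyone who missed it. Both should point to the same written migration path.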
AI readiness
Weak
- No machine-readable surface: no component manifest, no `.ai/` directory, no structured JSON metadata, no six-section descriptions. An agent selecting a component today must read source and infer.
- But the latent material is unusually good. Every component has TypeScript prop types, the token layer is a parseable set of CSS custom properties plus a committed typed union, and the cva API is consistent enough to extract from. Only purpose, anti-patterns and composition must be authored by hand.
- Semantic tokens are partially self-documenting — z-index and motion have excellent intent comments; colour aliases have naming but no prose intent.
- Calibration: for a Managed system, weak AI readiness is expected. Flagged because the engineering bar is high enough that a manifest would pay for itself quickly, and the documentation work above produces most of the AI-readiness material as a by-product.
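Props, variants and tokens can all be extracted, so a manifest generator mainly has to merge them with the hand-authored fields and flag what is missing. The types and functions below (`buildManifestEntry`, `toManifestJson`) describe a hypothetical shape, not an established schema. They do not model the six-section descriptions mentioned above. They only show the split between extracted and authored fields.

```typescript
// Hypothetical manifest shape: extracted fields plus hand-authored fields.

interface ExtractedFields {
  name: string;
  props: Record<string, string>; // prop name -> TypeScript type text
  variants: Record<string, string[]>; // variant name -> allowed values
  tokens: string[]; // semantic tokens referenced by the component
}

interface AuthoredFields {
  purpose?: string;
  antiPatterns?: string[];
  composition?: string;
}

interface ManifestEntry extends ExtractedFields, AuthoredFields {
  missing: Array<keyof AuthoredFields>; // authored fields still to write
}

function buildManifestEntry(extracted: ExtractedFields, authored: AuthoredFields = {}): ManifestEntry {
  const missing: Array<keyof AuthoredFields> = [];
  if (!authored.purpose?.trim()) missing.push("purpose");
  if (!authored.antiPatterns || authored.antiPatterns.length === 0) missing.push("antiPatterns");
  if (!authored.composition?.trim()) missing.push("composition");
  return { ...extracted, ...authored, missing };
}

/** Serialise entries as a single JSON document an agent could read. */
function toManifestJson(entries: readonly ManifestEntry[]): string {
  return JSON.stringify({ components: entries }, null, 2);
}
```

The `missing` list also works as a progress measure for the per-component documentation standard, since both draw on the same authored content.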
Platform maturity
Weak
- The package is real infrastructure in embryo: published to CodeArtifact, versioned (0.0.2), built via `tsc`, with tokens generated on build.
- But it's consumed as source, not as versioned infrastructure. All five consumers depend on `workspace:*`, which means no version pinning, no changelog, no release cadence to plan around, no semver discipline, and no documented breaking-change blast-radius analysis. Changes land atomically via monorepo PRs.
- This is a legitimate monorepo pattern. For internal-only consumption it caps platform maturity by design rather than through neglect. The CI-enforced token contract is a genuinely platform-grade capability inside an otherwise workspace-linked model.
- Calibration: this dimension rates Weak only against the ambition of independent / external consumption. If internal `workspace:*` use is permanent, it is closer to "appropriate" and shouldn't drive investment.
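Whichever platform model is chosen, it helps to make the current state checkable. The sketch below uses hypothetical names (`classifyPin`, `summarisePins`). It classifies each consumer's `@lawhive/ui` specifier from a parsed `package.json` as a `workspace:` link, a versioned specifier, or absent. It assumes that any non-`workspace:` specifier is a version range, which would not hold for `file:` or git specifiers.

```typescript
// Hypothetical sketch: summarise how consumers depend on @lawhive/ui,
// given already-parsed package.json objects.

interface PackageJson {
  name: string;
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
}

type PinKind = "workspace" | "versioned" | "absent";

function classifyPin(pkg: PackageJson, target = "@lawhive/ui"): PinKind {
  const spec = pkg.dependencies?.[target] ?? pkg.devDependencies?.[target];
  if (spec === undefined) return "absent";
  // Assumption: anything that is not a workspace link is a semver range.
  return spec.startsWith("workspace:") ? "workspace" : "versioned";
}

function summarisePins(consumers: readonly PackageJson[]): Record<PinKind, string[]> {
  const summary: Record<PinKind, string[]> = { workspace: [], versioned: [], absent: [] };
  for (const consumer of consumers) summary[classifyPin(consumer)].push(consumer.name);
  return summary;
}
```

If the internal model is kept, a check like this can assert that every consumer stays on `workspace:*`. If the package goes external, the same report tracks consumers as they move to semver ranges.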
Prioritised actions
Ordered by leverage. The immediate two are cheap and close the biggest trust gaps; the near-term three compound (documentation feeds AI readiness); the last two are strategic decisions, not backlog items.
- Write a one-page contribution + deprecation process: what belongs in the system, who decides, and how a component is retired. It closes the biggest trust gap at 17 contributors, and it's cheap. · Governance
- Publish the `combobox` deprecation path so the duplication resolves rather than calcifies. · Components / Governance
- Adopt a per-component doc standard (purpose, anti-patterns, one edge-case example), starting with the ~10 most-used components. Feeds AI readiness directly. · Documentation
- Generate a component manifest from existing types + the token contract. Most fields extract automatically. · AI readiness
- Flesh out the README into real orientation for new consumers. · Documentation
- Decide the platform model explicitly: internal `workspace:*` forever, or versioned / external. If external, add changelog + semver. If internal, mark Platform maturity as intentionally scoped. · Platform
- Instrument adoption quantitatively (import/usage counts) to move from Managed toward Measured. · Adoption
Scope of this assessment
Inspected
- `packages/ui` source
- `tokens.css`, the token generator + generated contract
- Component / story / test inventory
- `package.json` and consumer `package.json` files
- `docs/agent/react-and-frontend.md`
Not inspected
- Figma / design-side data
- CodeArtifact / npm download analytics
- Storybook rendered output
- Docs hosted outside the repo (Notion, Zeroheight)
- Runtime accessibility behaviour — ARIA/role usage counted in source (38 of 77 files) but not tested
A note on context. This assessment sees the system's artefacts — not the history, constraints or trade-offs behind them. Several findings (the thin component-token tier, the workspace:* consumption model, admin-app's separate Tremor style) look like deliberate decisions rather than oversights, and are flagged as such where discernible. If any finding describes an intentional decision or known limitation, that's worth recording so future assessments focus on real problems rather than a generic ideal.