Industry standard audit
Benchmarked Bupa's system against published patterns from IBM Carbon, Shopify Polaris, Google Material, Adobe Spectrum, Atlassian and GitHub Primer. Named the gaps honestly, prioritised the fixes.
Engaged through Torii Consulting for a six-month independent audit of Bupa Australia's design system.
Three brands (Bupa, Blua, Mindplace), five Figma libraries, three platforms (web, iOS, Android), and 1,714 design tokens that had grown faster than anyone could keep track of.
An outside set of eyes, six months on the clock. Close enough to work alongside the team, independent enough to call the system what it was.
My job was to see the whole picture, name what was broken, and hand back a plan. Audit the system against industry standards, define the target architecture, ship one reference automation as proof, and leave the team a backlog of epics and tickets they could own from day one.
Token architecture, contribution governance, and the Power Automate intake pipeline. All built to be owned by the team after my engagement ended.
Delivered one canonical token system in three formats: Figma Variables, Excel SSoT, and W3C DTCG JSON. Designers, engineers and stakeholders drew from the same data.
Built an end-to-end Power Automate intake pipeline, live and tested. Microsoft Forms trigger to Teams notification to triage. Not a slide, a working system.
Mapped epics and tickets for each workstream, so the internal team had a clear plan to work from the day I left.
Design systems accumulate debt. Stripping it back is half the job.
The other half is the governance and automation that stop it coming back.
The audit produced fourteen ranked findings: nine across the wider design system, five inside the token files. Critical issues would break production or block any consolidation work. High-severity issues caused drift and friction, but did not break what was already shipped.
The audit ran across four fronts: library inventory, token system structure, governance practices, and automation coverage. Libraries told the loudest story first. Initial scoping suggested 43; the audit found 131 across 8 workspaces. Around 80 working files had been accidentally published as libraries. Archived libraries still showed 100+ weekly inserts. Teams had independently created and published libraries to solve immediate needs.
There was no foundation team, no single source of truth, no intake process, and no formal contribution path. The question “who decides?” produced a different answer depending on which team you asked.
Teams shipped fast. Governance came later. Organic drift was inevitable.
Nobody did anything wrong; everyone solved their own problem.
Nine ranked observations spanning library sprawl, deprecated assets, cross-platform drift, and governance gaps.
Sprawl: The same primary button lived in four libraries, each subtly different. Designers could not tell which was canonical.
Risk: Retired libraries showed 100+ weekly inserts from production files. Pulling them without a migration plan would have broken live pages.
Drift: Web and native used different names for the same token. A design signed off in Figma shipped looking different on iOS.
Bloat: Around 80 working files had been accidentally published as libraries. No separation between draft and source of truth, so unfinished work was consumed as if canonical.
Naming: cool-grey, coolgrey, and coolGrey all existed. Build tools treated them as different tokens.
Owner: Orphaned styles sat on the shelf with no roadmap and nobody accountable for cleanup.
Stale: Teams could not tell which components were retired, so the inventory only grew. Deprecation became a guessing game.
Docs: Rationale stayed in DMs or nowhere at all. New contributors kept rehashing the same debates.
Backlog: Requests came in through eight different channels. Nothing was trackable or triaged.
Token files were assessed against the W3C Design Tokens Community Group (DTCG) specification. Critical issues would cause silent UI failures if not resolved first.
Silent fail: Legacy tokens used a 0 to 100 scale where alpha-100 meant solid. The new system used 0 to 1000 where alpha-100 meant 10% opacity. A direct name migration would make UI elements all but invisible with no error raised.
Cross-platform: Web defined 7 transparency stops per colour (100% referenced). Native defined 21 stops (only 62% referenced). A 1:1 cross-platform mapping was impossible.
Build: cool-grey vs coolgrey vs coolGrey. Build tools treated these as different tokens. Scripts failed silently when names did not match.
Typos: Spelling inconsistencies in collection names persisted across multiple files. Migration scripts could not find their source files. No review process existed for token file structure.
Bloat: 244 of the 415 alpha tokens carried a colour (4 colours times 20+ stops, multiplied across brands). A single black and white overlay approach could replace all of them.
The audit's findings became the brief for the architecture. The first job was to define one target the team could build toward, instead of three that kept pulling apart. The architecture is where the system is headed. Migrating the existing 1,714 references is the team's route there.
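One hazard on that migration route is the alpha-100 collision from the token findings: the same name means solid on the legacy 0 to 100 scale and 10% opacity on the new 0 to 1000 scale. A minimal sketch of a value-preserving rename, assuming tokens follow the `alpha-N` naming pattern; the function names are illustrative and not taken from the audit's scripts.

```python
import re

LEGACY_MAX = 100   # legacy scale: alpha-100 meant fully solid
NEW_MAX = 1000     # new scale: alpha-100 means 10% opacity


def migrate_alpha_name(legacy_name: str) -> str:
    """Rename a legacy alpha token so it keeps the same opacity on the new scale."""
    match = re.fullmatch(r"alpha-(\d+)", legacy_name)
    if match is None:
        raise ValueError(f"not a legacy alpha token: {legacy_name!r}")
    stop = int(match.group(1))
    if not 0 <= stop <= LEGACY_MAX:
        raise ValueError(f"stop {stop} is outside the legacy 0-{LEGACY_MAX} scale")
    return f"alpha-{stop * NEW_MAX // LEGACY_MAX}"


def opacity(name: str, scale_max: int) -> float:
    """Opacity (0.0 to 1.0) that a token name implies on a given scale."""
    return int(name.split("-")[1]) / scale_max


# Copying the name across unchanged silently drops opacity from 100% to 10%.
print(opacity("alpha-100", LEGACY_MAX), opacity("alpha-100", NEW_MAX))
print(migrate_alpha_name("alpha-100"))
```

A real migration would also need to snap each result to the nearest stop that exists in the consolidated set, since not every legacy stop has an exact counterpart.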
Same expressive range with a 74% smaller surface. The new canonical set powers Figma Variables, the SSoT spreadsheet, and the W3C DTCG JSON downstream.
Single black-and-white overlay model replaces per-colour alpha scales.
Cool-grey and Warm-grey, aligned across web and native.
The new system is a single source of truth, organised in a three-tier model of primitives, semantics, and components, with 235 primitives and 215 semantic tokens. Brand differences across Bupa, Blua, and Mindplace are handled as modes inside one library rather than as separate token files. The same canonical data flows into Figma Variables, the SSoT spreadsheet, and the W3C DTCG JSON.
Tokens authored to the W3C Design Tokens Community Group format.
Every primitive carries $value, $type and $description.
One JSON file, multiple toolchains downstream.
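The rule that every primitive carries `$value`, `$type` and `$description` can be checked mechanically. The sketch below walks a DTCG-style JSON tree and reports tokens with missing fields. The token names and descriptions are hypothetical, and the hex-string colour value follows earlier DTCG drafts, so treat the value shape as an assumption.

```python
import json

REQUIRED_FIELDS = ("$value", "$type", "$description")

# Hypothetical token names and descriptions, for illustration only.
TOKENS_JSON = """
{
  "color": {
    "cool-grey": {
      "950": {
        "$value": "#121517",
        "$type": "color",
        "$description": "Darkest cool grey."
      },
      "500": {
        "$value": "#21272a",
        "$type": "color"
      }
    }
  }
}
"""


def missing_fields(node: dict, path: str = "") -> dict[str, list[str]]:
    """Walk nested groups and report, per token path, which required fields are absent."""
    problems: dict[str, list[str]] = {}
    if "$value" in node:  # a DTCG token is any object that has a $value
        absent = [f for f in REQUIRED_FIELDS if f not in node]
        if absent:
            problems[path] = absent
        return problems
    for key, child in node.items():
        if key.startswith("$") or not isinstance(child, dict):
            continue  # skip group-level properties such as $type or $description
        child_path = f"{path}.{key}" if path else key
        problems.update(missing_fields(child, child_path))
    return problems


report = missing_fields(json.loads(TOKENS_JSON))
print(report)
```

The DTCG format itself lets a token inherit `$type` from its group, so this check is deliberately stricter than the format requires.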
Raw, context-free values. The source material: hex codes, pixel measurements, unitless scales.
Purpose-driven aliases. Names describe intent, not appearance; values reference primitives.
Part-level tokens scoped to one component. Composed entirely from semantic aliases.
Primitives are the raw material, the hex codes and pixel values nothing else should hardcode. Semantics name those by their job, the colour body text uses or the space inside a card. Components compose the semantics into finished parts. The semantic layer took the most work: four libraries had four names for the same thing, now one shared vocabulary.
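The three tiers can be sketched as a chain of aliases that always ends at a primitive. The token names and the `16px` value below are hypothetical; only the `{group.token}` alias syntax comes from the DTCG format.

```python
import re

# Hypothetical token names; each value is either raw or a {curly.brace} alias.
TOKENS = {
    # Tier 1: primitives hold raw values.
    "color.grey.950": "#121517",
    "spacing.200": "16px",
    # Tier 2: semantics name a job and point at a primitive.
    "text.body": "{color.grey.950}",
    "space.card.inset": "{spacing.200}",
    # Tier 3: component tokens are composed only from semantic aliases.
    "card.title.color": "{text.body}",
    "card.padding": "{space.card.inset}",
}

ALIAS = re.compile(r"^\{(.+)\}$")


def resolve(name: str, tokens: dict[str, str], _seen: tuple[str, ...] = ()) -> str:
    """Follow aliases until a raw value is reached; fail loudly on cycles or gaps."""
    if name in _seen:
        raise ValueError("alias cycle: " + " -> ".join(_seen + (name,)))
    if name not in tokens:
        raise KeyError(f"unknown token: {name}")
    match = ALIAS.match(tokens[name])
    if match is None:
        return tokens[name]
    return resolve(match.group(1), tokens, _seen + (name,))


print(resolve("card.title.color", TOKENS))
```

Because component tokens never hold raw values, changing one primitive re-points everything above it without touching the component layer.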
Every raw colour, type, and dimension value the rest of the system points at, 235 in total.
The hardest tradeoff was alpha tokens. The legacy system had grown to 415 of them on the assumption every primary colour needed its own opacity scale. Four colours, twenty stops, three brand variants. Unsustainable, and worse with every new hue.
Only 118 of the 415 alpha primitives were actually referenced by a semantic token. The other 72% were maintenance overhead with no real benefit. Consolidating wasn't taste, it was hygiene.
The decision was to consolidate. A single black and white overlay system handles every state where alpha actually appears: hover, disabled, scrim, selection. At those opacities, users do not perceive the difference.
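To illustrate why one overlay pair can stand in for per-colour alpha scales, here is a minimal sketch of standard source-over compositing on an opaque base. The base colour used in the example is a placeholder, not a Bupa brand value.

```python
def hex_to_rgb(hex_colour: str) -> tuple[int, int, int]:
    """Parse a #rrggbb string into integer channels."""
    h = hex_colour.lstrip("#")
    return int(h[0:2], 16), int(h[2:4], 16), int(h[4:6], 16)


def apply_overlay(base_hex: str, overlay_hex: str, alpha: float) -> str:
    """Composite an overlay at the given opacity over an opaque base colour."""
    base = hex_to_rgb(base_hex)
    overlay = hex_to_rgb(overlay_hex)
    mixed = tuple(round(o * alpha + b * (1 - alpha)) for o, b in zip(overlay, base))
    return "#{:02x}{:02x}{:02x}".format(*mixed)


# The same black overlay darkens any base colour, so hover or pressed states
# need no per-colour alpha token. "#0079c8" is a placeholder base colour.
print(apply_overlay("#0079c8", "#000000", 0.2))
```

The overlay token stays the same for every brand mode; only the base colour underneath changes.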
Same expressive range. 95% fewer tokens, smaller files, faster audits, fewer naming collisions.
The same overlay works on every brand colour.
The simpler system takes 415 alpha tokens down to 22 shared stops. Less for designers to hold in their heads, smaller token files for engineering, and one fewer thing to argue about when a new component arrives. It also matches where the industry is heading on alpha: token discipline over per-colour fidelity.
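The headline percentages can be reproduced from the counts quoted in this case study. This is a quick arithmetic check; it assumes the 74% figure compares the 1,714 legacy tokens with the 450 canonical ones (235 primitives plus 215 semantics).

```python
def reduction(before: int, after: int) -> float:
    """Percentage reduction from `before` to `after`."""
    return (before - after) / before * 100


alpha_cut = reduction(415, 22)            # alpha tokens: 415 down to 22 shared stops
unreferenced = reduction(415, 118)        # alpha primitives no semantic token referenced
surface_cut = reduction(1714, 235 + 215)  # assumed basis for the 74% smaller surface

print(f"{alpha_cut:.1f}% {unreferenced:.1f}% {surface_cut:.1f}%")
```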
Smaller files, faster audits, naming collisions resolved.
Maintainability won over pixel fidelity.
Hand-off needed an honest scorecard, not a victory lap. Working with the design and engineering leads, I walked the team through thirteen foundational decisions and recommended a status for each: locked and ready to ship, parked for governance to resolve, or open to refine through real use. The foundation is solid. Some decisions are deliberately unfinished, because the team will sharpen them better than I can on my own.
Before my engagement, the team had no central intake form. Requests arrived through eight scattered channels, including Teams DMs, Figma comments, Confluence pages, hallway conversations, phone calls, and meeting action items. Nothing was tracked. Nothing was triaged.
I designed a contribution framework around a five-step process. One door in, one path through, one clear owner at every stage.
A team identifies a reusable problem worth bringing into the system.
The form captures enough context to assess the work without another meeting.
Each request is sized as light, medium, or heavy based on effort, risk, reach, and impact.
The request lands with the team or decision-maker best placed to move it forward.
Approve, backlog, escalate, or close with a reason the requester can see.
Light governance that enables speed.
Too little creates chaos. Too much kills adoption.
We're not gatekeepers, we're a service.
Each request is routed into one of three tiers based on effort, risk, reach, and impact. The framework creates a lightweight path for maintenance, a review path for reusable improvements, and a governance path for strategic system change. Small fixes move quickly. Larger system changes get the right level of ownership, review and approval.
Light: Low-risk maintenance where the intent is already clear and no new design decision is required.
Medium: Reusable improvements that need prioritisation, a named owner and a quick design system review before release.
Heavy: System-level change with broader product, platform, accessibility or release impact.
Triage answers "how big is this." Decision authority answers "who signs it off." The two stack: a Light request lands at L1 with the component owner, a Heavy request climbs to L3 or L4. The point of the map is that most contributions resolve at L1 with self-serve documentation, the way a service is meant to work.
L1: Bug fixes, typo corrections, single-token swaps. The component owner approves and ships, no committee needed.
L2: New variants, prop changes, accessibility uplift, doc rework. Reviewed by the design system lead at weekly triage.
L3: Net-new components, deprecations, API breaks, brand-mode work. Requires a short RFC and council review.
L4: Tooling change, vendor swap, multi-brand strategy, budget. Goes to exec sponsorship at cycle review.
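How triage size and decision authority stack can be sketched in a few lines. The 1 to 3 scoring scale and the thresholds are assumptions made for illustration; the governance report defines the real criteria. Light to L1 and Heavy to L3 or L4 follow the description above, while Medium to L2 is inferred.

```python
from dataclasses import dataclass


@dataclass
class Request:
    # Each factor scored 1 (low) to 3 (high); the scale is an assumption.
    effort: int
    risk: int
    reach: int
    impact: int


def triage_tier(req: Request) -> str:
    """Size a request as Light, Medium, or Heavy from its four factor scores."""
    score = req.effort + req.risk + req.reach + req.impact  # ranges from 4 to 12
    if score <= 5:
        return "Light"
    if score <= 8:
        return "Medium"
    return "Heavy"


# Lowest decision level able to sign off each tier (Medium -> L2 is inferred).
DECISION_LEVEL = {"Light": "L1", "Medium": "L2", "Heavy": "L3"}


def route(req: Request, needs_exec_sponsor: bool = False) -> tuple[str, str]:
    """Return the (tier, decision level) pair for a request."""
    tier = triage_tier(req)
    level = DECISION_LEVEL[tier]
    if tier == "Heavy" and needs_exec_sponsor:
        level = "L4"  # tooling, vendor, or budget decisions escalate further
    return tier, level


print(route(Request(effort=1, risk=1, reach=1, impact=1)))
```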
Tiers and SLAs answer the timing question. RACI answers the accountability one. The full matrix lives in the governance report with rows per request type, but the four roles boil down to this.
Worked example, a new component request across all three platforms:
Three columns, ranked by effort. V1 is the quick wins, what the team can ship in weeks with tools already in their hands. V2 is the next layer, more refined automation that needs real engineering investment but pays back in throughput. Dream State is the longer-horizon work: high effort, mature systems, and integrations beyond today's scope.
Excerpt from the full maturity roadmap.
The complete version in the governance report covers the full capability set across the design system lifecycle.
For an enterprise this size, design system intake is either automated and auditable or it's chaos. Three brands, three platforms, dozens of squads. Manual triage was never going to scale, and a paper-trail-by-email approach was never going to hold.
Requests didn't stop coming in from everywhere. They never will. What changed is that there's finally a place to direct them, and an automated flow behind it that catches what used to slip through the cracks.
One Microsoft Form triggers 31 Power Automate actions across 6 connectors. Every submission produces five simultaneous outputs: an ADO work item, a SharePoint list entry, a Teams Adaptive Card, and two formatted emails.
Zero manual steps from submission to ticket. Triage itself stays human, a weekly Friday review by the governance team, with the full automation roadmap mapped as tickets and epics for the team to execute.
Requests came from everywhere: DMs, channels, hallway chats, Figma comments, calls. There was nowhere to direct them, and each one was captured manually by whoever happened to catch it. Things slipped through the cracks.
One intake form, one process. Every request flows through the intake form, automatically routed, fanned out to the surfaces the team already lives in, and tracked end to end with SLA visibility and clear ownership.
The flow tree below shows the actual Power Automate structure: 31 actions, 14 variables, 6 connectors, and a Switch block handling four urgency tiers.
A single cloud flow triggered by Microsoft Forms. It initialises 14 variables, runs an urgency Switch, loops through links and attachments, creates an ADO work item, logs the request to a SharePoint list, posts a Teams Adaptive Card, and sends two formatted emails, all in one run.
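The shape of that run can be modelled outside Power Automate. This is a plain Python sketch of the logic, not the flow definition itself: the urgency labels, SLA days, and field names are hypothetical, since the case study states only that the Switch handles four urgency tiers.

```python
from dataclasses import dataclass, field

# Hypothetical urgency labels and SLA days; only the count of four tiers is known.
SLA_DAYS = {"Critical": 1, "High": 3, "Medium": 5, "Low": 10}


@dataclass
class Submission:
    requester: str
    title: str
    urgency: str
    links: list[str] = field(default_factory=list)


def run_flow(sub: Submission) -> list[dict]:
    """Model one run: urgency Switch, link loop, then the five outputs."""
    sla = SLA_DAYS.get(sub.urgency, SLA_DAYS["Low"])         # Switch with a default branch
    link_block = "\n".join(f"- {url}" for url in sub.links)  # loop over submitted links
    summary = f"[{sub.urgency}] {sub.title} (SLA {sla}d)"
    return [
        {"surface": "ado_work_item", "title": summary, "body": link_block},
        {"surface": "sharepoint_list", "title": summary, "requester": sub.requester},
        {"surface": "teams_adaptive_card", "title": summary, "links": sub.links},
        {"surface": "email_team", "subject": summary},
        {"surface": "email_requester", "subject": f"Received: {sub.title}", "to": sub.requester},
    ]


outputs = run_flow(Submission("dev@example.com", "Button hover", "High", ["https://example.com/x"]))
print([o["surface"] for o in outputs])
```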
A single Power Automate flow handles conditional routing, variable manipulation, and parallel branches. Every submission fans out across the surfaces the team already lives in.
A nine-field Microsoft Form replaces eight scattered intake channels. Two minutes from "I have a thing" to a structured ticket on the right board.
An Adaptive Card hits the DS Intake channel within seconds. Requester, type, urgency, links surfaced inline so triage starts before someone has to ask.
A structured email sent to the DS team's distribution list alongside the Teams notification. Two channels means nothing slips through.
A matching receipt to the person who filled out the form. Same Bupa chrome, different message: confirms what was captured and surfaces an escalation path if it can't wait for triage.
Power Automate creates the ADO work item immediately on submit. The triage owner opens it on the right board with platform, urgency, and team tags applied.
The primary button does not show a hover state on mobile Safari. When tapping, there is no visual feedback that the button has been pressed. Affecting checkout flow on iOS devices.
1. Open the checkout page on iOS Safari.
2. Tap the primary CTA.
3. Observe no hover or active visual feedback.
The architecture is hub and spoke: each automation flow attaches its own SharePoint list to capture historical data, and all spokes feed a single Power BI dashboard. The intake pipeline shipped as the first spoke. The remaining spokes and the dashboard itself are mapped as tickets and epics for the team to execute against the same pattern.
A single Power BI dashboard fed by the Figma Library Analytics API, webhook logs, and the SharePoint hub. Each metric turns the day-to-day governance work into numbers leadership can act on, so those conversations run on evidence instead of opinion. The team owns the build, with tickets and epics mapped per metric, so the dashboard lights up bit by bit as each spoke comes online.
Component popularity: most and least used components across the org
Adoption rates: which teams use which libraries and how frequently
Library health: component counts, last publish dates, approved status
Token coverage: semantic tokens vs raw values across libraries
Illustrative layout, sample data shown
Six months of consolidation, governance, and automation. Here is what changed, in numbers you can check.
Tokens consolidated across platforms into one canonical source of truth.
Single source of truth established
Libraries audited, then consolidated into a roadmap target of three workspaces.
Alpha tokens rationalised into a single overlay model.
Scattered intake channels consolidated into one Microsoft Forms pipeline.
Brands and platforms unified on a single token library.
Excel, Figma Variables, and W3C DTCG JSON.
Designer and engineer hours per week reclaimed through the live intake pipeline.
The new architecture, governance framework, and intake pipeline all shipped inside the six months and went straight to the team. For the legacy component re-tokenisation I recommended a phased rollout, with tickets and epics mapped per workstream so the team has a clear backlog to work through. The semantic tokens are platform-agnostic, with platform-specific transforms at the output layer, so adding a brand is a configuration change, not a rebuild.
Six structural workstreams surfaced by the audit, each mapped to a target in the consolidation roadmap.
Five source files were maintained independently, with duplicate tokens and conflicting values across platforms.
Consolidated into one source of truth covering all three brands.
The same token name held different values across platforms. grey.500 was #929ba2 on native and #21272a on web.
Web values adopted as the baseline, platform exceptions documented for accessibility.
Mixed spelling and case (cool-grey, coolgrey, coolGrey) that build tools read as three different tokens.
Standardised to kebab-case, which stays within the naming rules of the W3C DTCG format, with a migration map for legacy names.
Native and web carried 27 overlapping colour families, many of them near-duplicates.
Consolidated to 13 canonical families. 14 deprecated, including Cyan, Indigo, Tangerine and Evergreen.
Web and native used 15 and 16 scale stops, several of them non-standard.
Standardised to 11 stops (50 to 950), symmetric for dark mode inversion.
Three disconnected grey ramps drifted in value between platforms.
Unified to two intentional scales, cool grey and warm grey.
A separate Ink token (#111c24) sat almost on top of grey-950 (#121517), adding maintenance cost.
Deprecated and migrated to grey-950.
Native carried 21 transparency stops and web only 7, with a naming collision where the legacy alpha-100 meant solid and the new alpha-100 meant 10 percent opacity.
Consolidated to 22 tokens, 11 standardised stops each for black and white.
149 typography, dimension and component tokens lived only in code and docs, invisible to designers.
All 149 migrated to Figma Variables, with brand differences handled as modes.
Semantic names were fragmented: native used iOS-derived labels, both platforms used tool-derived stroke names.
Adopted a foreground, background, border vocabulary aligned with Carbon, Polaris and Primer.
Brand-specific intent tokens limited design flexibility when onboarding a new brand.
Generic, expandable accent colour slots added, ready for any brand.
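The naming workstream above pairs a kebab-case rule with a migration map. A minimal sketch of how the two fit together: case and separator differences can be fixed by rule, but run-together spellings such as coolgrey need an explicit entry. The alias entries here are illustrative, not the audit's actual map.

```python
import re

# Run-together legacy spellings cannot be split by rule, so they need explicit entries.
LEGACY_ALIASES = {"coolgrey": "cool-grey", "warmgrey": "warm-grey"}


def to_kebab(name: str) -> str:
    """Normalise camelCase, snake_case, or spaced names to kebab-case."""
    split = re.sub(r"([a-z0-9])([A-Z])", r"\1-\2", name)  # coolGrey -> cool-Grey
    kebab = re.sub(r"[\s_]+", "-", split).lower()
    return LEGACY_ALIASES.get(kebab, kebab)


def build_migration_map(legacy_names: list[str]) -> dict[str, str]:
    """Map every legacy name that changes to its canonical kebab-case form."""
    return {n: to_kebab(n) for n in legacy_names if to_kebab(n) != n}


print(build_migration_map(["cool-grey", "coolgrey", "coolGrey"]))
```

Keeping the map as data means the same file can drive both the rename script and a lint check that rejects legacy spellings in new work.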
Beyond the token architecture and governance framework, I left the team a full set of artefacts built to keep working after I did.
Editable master with all 450 tokens, mappings, and migration notes.
Style Dictionary-ready JSON. Direct pipeline to code.
Three reports: 1,714 tokens with cross-platform comparison, 131 libraries audited, and governance maturity scored against industry standards.
Browsable colour families, scales, and alpha variants with live swatches and token metadata.
WCAG AA/AAA pass-fail testing across every foreground-background pair in the token set.
Every token inconsistency captured as a tracked work item with platform tag, severity, and owner.
Contribution model, triage framework, and maturity roadmap.
Conditional routing, variable manipulation, and parallel branches powering the intake infrastructure.
I set up the governance infrastructure so any team member can run it. Every automated flow, every service account, every shared channel was configured with team continuity in mind. The intake pipeline runs whether I am there or not.
Not everything needs to be settled before V1 ships. The 7-3-3 split (7 locked, 3 deferred, 3 evolving) let stakeholders trust that the foundation was solid, while being honest that some decisions need real use before anyone can call them final.
The token consolidation was also a reminder to build accessibility in at the foundation, not bolt it on later. I checked WCAG 2.1 AA contrast across every semantic surface and text token pairing.
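For reference, the WCAG 2.1 contrast check behind that kind of pairing test can be sketched as below, using the relative luminance and contrast ratio formulas from the WCAG definitions. The function names are my own, and the colours in the example are two of the greys mentioned in this case study, used purely as sample inputs.

```python
def _linear(channel: int) -> float:
    """Convert an 8-bit sRGB channel to its linear value (WCAG 2.x definition)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_colour: str) -> float:
    h = hex_colour.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(fg: str, bg: str) -> float:
    """Contrast ratio between two opaque colours, from 1.0 up to 21.0."""
    lighter, darker = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa(fg: str, bg: str, large_text: bool = False) -> bool:
    """AA needs 4.5:1 for normal text and 3:1 for large text."""
    return contrast_ratio(fg, bg) >= (3.0 if large_text else 4.5)


print(passes_aa("#121517", "#ffffff"), passes_aa("#929ba2", "#ffffff"))
```

Running every foreground and background pair through a check like this at the token level catches failures before any component is built on top of them.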
Wiring up the Figma Enterprise API webhooks would be a strong next step. Watching for new libraries the moment they are created would stop sprawl creeping back in while the consolidation is still underway.
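As a rough sketch of that next step, a webhook receiver could compare each library publish event against an approved list. The payload field names (`event_type`, `file_key`, `file_name`, `passcode`) and the `LIBRARY_PUBLISH` event are my understanding of Figma's webhook payloads and should be confirmed against the current Figma documentation. The file keys and passcode are placeholders.

```python
from typing import Optional

# File keys of the libraries the team has approved; placeholder values.
APPROVED_LIBRARY_KEYS = {"abc123", "def456"}


def check_library_publish(payload: dict, expected_passcode: str) -> Optional[str]:
    """Return an alert message when an unapproved file publishes a library, else None."""
    if payload.get("passcode") != expected_passcode:
        raise PermissionError("webhook passcode mismatch")
    if payload.get("event_type") != "LIBRARY_PUBLISH":
        return None  # ignore other event types
    if payload.get("file_key") in APPROVED_LIBRARY_KEYS:
        return None  # publishing from a known library is expected
    return f"Unapproved library published: {payload.get('file_name')} ({payload.get('file_key')})"


sample = {
    "event_type": "LIBRARY_PUBLISH",
    "file_key": "zzz999",
    "file_name": "Checkout working file",
    "passcode": "secret",
}
print(check_library_publish(sample, expected_passcode="secret"))
```

The returned message could feed the same Teams channel the intake pipeline already posts to, so new sprawl is flagged when it appears.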
I would also spend more time early on agreeing a shared vocabulary with the native development team. Sorting out semantic naming together, sooner, saves a lot of back-and-forth at handoff.
Design systems grow faster than the rules that govern them. Teams ship. Libraries multiply. Before you know it, there are more than anyone expected. This is normal. It happens to every maturing system.
The team focused on shipping and supporting the business. Process and documentation lagged behind, normal for a young team doing their job well.
How things work lived in people's heads, not documentation. The answer you got depended on who you asked.
Eight scattered intake channels with no formal process. Requests captured manually, nothing tracked, nothing triaged.
“Who decides?” had a different answer depending on which team you asked. Conflicts resolved by whoever was loudest.
1,714 tokens, 415 alpha variants, 27 colour families. Each brand and platform built its own layer instead of sharing primitives.
Friday meetings burning hours of senior time. Requests routed by memory, not process, with no SLA tracking.