npm - @cubis/foundry - Versions diffs - 0.3.70 → 0.3.72 - Mend

@cubis/foundry 0.3.70 → 0.3.72

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (304)

package/workflows/workflows/agent-environment-setup/platforms/copilot/skills/agent-design/references/skill-testing.md ADDED

@@ -0,0 +1,164 @@
+# Skill Testing Reference
+Load this when writing evals, regression sets, or description-triggering tests for a CBX skill.
+Source: Anthropic skill-creator research — [Improving skill-creator: Test, measure, and refine Agent Skills](https://claude.com/blog/improving-skill-creator-test-measure-and-refine-agent-skills) (March 2026).
+---
+## Two Reasons to Test
+1. **Catch regressions** — As models and infrastructure evolve, skills that worked last month may behave differently. Evals give you an early signal before it impacts your team.
+2. **Know when the skill is obsolete** — For _capability uplift_ skills: if the base model starts passing your evals without the skill loaded, the skill has been incorporated into model behavior and can be retired.
+---
+## Five Test Categories
+Every skill should pass all five before shipping.
+### 1. Trigger tests (description precision)
+Does the skill load when it should — and stay quiet when it shouldn't?
+**Method:**
+- Write 5 natural-language prompts that _should_ trigger the skill
+- Write 5 near-miss prompts that _should not_ trigger
+- Load the skill and observe whether it activates
+**Example for a frontend-design skill:**
+```
+Should trigger:
+- "Build me a landing page for my SaaS product"
+- "Make this dashboard look less generic"
+- "I need a color system for a health app"
+Should NOT trigger:
+- "Fix this TypeScript error"
+- "Review my API endpoint design"
+- "Help me write tests"
+```
+**Fix:** If false positives occur, make the description more specific. If false negatives, broaden or add domain keywords.
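The trigger-test bookkeeping described above can be tallied with a few lines of Python. This is a minimal sketch and not part of the package: `TriggerResult` and `summarize_trigger_results` are hypothetical names, and the `did_trigger` values are assumed to be recorded by hand after observing whether the skill activated.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TriggerResult:
    prompt: str
    should_trigger: bool  # what the test author expects
    did_trigger: bool     # what was observed with the skill available


def summarize_trigger_results(results: List[TriggerResult]) -> Dict[str, List[str]]:
    """Collect false positives/negatives and map them to the fixes named in the reference."""
    false_positives = [r.prompt for r in results if r.did_trigger and not r.should_trigger]
    false_negatives = [r.prompt for r in results if r.should_trigger and not r.did_trigger]
    advice = []
    if false_positives:
        advice.append("make the description more specific")
    if false_negatives:
        advice.append("broaden the description or add domain keywords")
    return {
        "false_positives": false_positives,
        "false_negatives": false_negatives,
        "advice": advice,
    }
```

Keeping the two failure lists separate matters because they call for opposite changes to the description.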
+### 2. Happy path test
+Does the skill complete its standard task correctly?
+**Method:**
+- Write the most common, straightforward version of the task the skill handles
+- Run it and verify the output meets the expected criteria
+### 3. Edge case tests
+What happens under abnormal or missing input?
+Examples:
+- Missing required information (no brand color, no framework specified)
+- Ambiguous phrasing
+- Conflicting requirements
+- Very large or very small input
+- The user ignored the clarification questions and just said "do it"
+### 4. Comparison test (A/B)
+Does the skill actually improve output vs. no skill?
+**Method:** Run the same prompt with and without the skill loaded. Judge which output is better — ideally with a fresh evaluator agent that doesn't know which is which.
+If the no-skill output is equivalent, the skill adds no value (or the model has caught up to it).
+### 5. Reader test
+Can someone with no conversation context understand the skill's output?
+**Method:**
+- Take the skill's final output (plan, document, code, design)
+- Open a fresh conversation or use a sub-agent with only the output, no history
+- Ask: "What is this?", "What are the key decisions?", "What's unclear?"
+If the fresh reader struggles, the output has context bleed issues. Fix them before shipping.
+---
+## Writing Eval Cases
+Each eval case = one input + expected behavior description.
+**Format:**
+```
+Input: [natural language prompt or file + prompt]
+Expected:
+  - [Observable behavior 1]
+  - [Observable behavior 2]
+  - [Observable behavior 3 — what NOT to happen]
+```
+**Example for `ask-questions-if-underspecified`:**
+```
+Input: "Build me a feature."
+Expected:
+  - Asks at least 1 clarifying question (scope, purpose, or constraints)
+  - Provides default options to choose from
+  - Does NOT immediately generate code
+  - Does NOT ask more than 5 questions
+```
+**Rules:**
+- Evals should be independent (not dependent on previous evals)
+- Expected behavior should be observable and binary (pass/fail, not subjective)
+- Aim for 5-10 evals per skill before shipping; 15+ for critical skills
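One way to keep expected behaviors observable and binary is to store each one as a named predicate over the skill's output. The sketch below is an illustration, not part of the package: `EvalCase` and its `grade` method are hypothetical, and counting question marks is only a crude stand-in for "asks a clarifying question".

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

CODE_FENCE = "`" * 3  # Markdown code fence, built indirectly to keep this example readable


@dataclass
class EvalCase:
    input: str
    # Each check pairs a description with a pass/fail predicate over the output text.
    checks: List[Tuple[str, Callable[[str], bool]]] = field(default_factory=list)

    def grade(self, output: str) -> Dict[str, object]:
        """Return a binary result plus the descriptions of any failed checks."""
        failures = [desc for desc, check in self.checks if not check(output)]
        return {"passed": not failures, "failures": failures}


# Mirrors the ask-questions-if-underspecified example (heuristics are assumptions).
case = EvalCase(
    input="Build me a feature.",
    checks=[
        ("asks at least 1 clarifying question", lambda out: out.count("?") >= 1),
        ("does NOT immediately generate code", lambda out: CODE_FENCE not in out),
        ("does NOT ask more than 5 questions", lambda out: out.count("?") <= 5),
    ],
)
```

Because every case carries its own input and checks, cases stay independent of one another and can run in any order.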
+---
+## Benchmark Mode
+Run all evals after a model update or after editing the skill:
+1. Run all evals sequentially, or in parallel with each eval in its own clean context to avoid context bleed
+2. Record: pass rate, elapsed time per eval, token usage
+3. Compare to baseline before the change
+**Pass rate thresholds:**
+- < 60%: Skill has serious issues. Do not ship.
+- 60-80%: Acceptable for early versions. Target improvement.
+- > 80%: Production-ready.
+- > 90%: Reliable enough for critical workflows.
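The thresholds above can be turned into a small classifier. This is a sketch under stated assumptions, not part of the package: `classify_pass_rate` and `is_regression` are hypothetical names, exactly 80% is treated as belonging to the 60-80% band, and the overlapping "> 80%" and "> 90%" entries are read as two tiers with "> 90%" as the higher one.

```python
def classify_pass_rate(passed: int, total: int) -> str:
    """Map an eval pass count onto the shipping tiers from the reference."""
    if total <= 0:
        raise ValueError("total must be positive")
    # Integer comparisons avoid floating-point edge cases at the band boundaries.
    if passed * 100 < 60 * total:
        return "serious issues: do not ship"
    if passed * 100 <= 80 * total:
        return "acceptable for early versions"
    if passed * 100 <= 90 * total:
        return "production-ready"
    return "reliable enough for critical workflows"


def is_regression(baseline_passed: int, current_passed: int, total: int) -> bool:
    """True when the same eval set passes fewer cases than the recorded baseline."""
    if total <= 0:
        raise ValueError("total must be positive")
    return current_passed < baseline_passed
```

For example, `classify_pass_rate(17, 20)` falls in the production-ready tier, while `is_regression(17, 15, 20)` flags a drop against the baseline.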
+---
+## Description Tuning Process
+If triggering is unreliable:
+1. List 10 prompts that should trigger the skill (write them as a user would)
+2. List 5 prompts of similar tasks that should _not_ trigger
+3. Find the distinguishing words/phrases between the two lists
+4. Rewrite the description to include the distinguishing words and exclude the overlap
+**Pattern:**
+```yaml
+description: "Use when [specific verb] [specific noun/domain]: [comma-separated task keywords]. NOT for [adjacent tasks that should not trigger]."
+```
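Step 3 of the tuning process (finding the distinguishing words) can be approximated mechanically before rewriting the description by hand. The sketch below is illustrative only: `distinguishing_words` and the tiny `STOPWORDS` set are assumptions, and simple word overlap will miss phrases and synonyms.

```python
import re
from collections import Counter
from typing import Dict, List, Set

# Deliberately small stopword list; an assumption for this sketch.
STOPWORDS = {"a", "an", "the", "for", "i", "to", "of"}


def _words(prompt: str) -> Set[str]:
    """Lowercased unique words in one prompt, minus stopwords."""
    return set(re.findall(r"[a-z]+", prompt.lower())) - STOPWORDS


def distinguishing_words(should_trigger: List[str], should_not_trigger: List[str]) -> Dict[str, List[str]]:
    """Split vocabulary into trigger-only words and words shared with near-miss prompts."""
    positive = Counter(w for p in should_trigger for w in _words(p))
    negative = {w for p in should_not_trigger for w in _words(p)}
    only_positive = [w for w in positive if w not in negative]
    # Most frequent trigger-only words first, ties broken alphabetically.
    only_positive.sort(key=lambda w: (-positive[w], w))
    overlap = sorted(w for w in positive if w in negative)
    return {"distinguishing": only_positive, "overlap": overlap}
```

Words in `distinguishing` are candidates for the description's keyword list; words in `overlap` are the ones to avoid leaning on.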
+---
+## When to Retire a Skill
+A skill is ready to retire when:
+- 90%+ of its evals pass without the skill loaded (for capability uplift skills)
+- The skill's instructions are now standard model behavior
+- Maintenance cost exceeds value
+Retiring isn't failure — it means the skill did its job and the model caught up.

package/workflows/workflows/agent-environment-setup/platforms/copilot/skills/agent-design/references/workflow-patterns.md ADDED

@@ -0,0 +1,226 @@
+# Workflow Patterns Reference
+Load this when choosing or implementing a workflow pattern for a CBX agent or skill.
+Source: Anthropic engineering research — [Common workflow patterns for AI agents](https://claude.com/blog/common-workflow-patterns-for-ai-agents-and-when-to-use-them) (March 2026).
+---
+## The Core Insight
+Workflows don't replace agent autonomy — they _shape where and how_ agents apply it.
+A fully autonomous agent decides everything: tools, order, when to stop.
+A workflow provides structure: overall flow, checkpoints, boundaries — but each step still uses full agent reasoning.
+**Start with a single agent call.** If that meets quality bar, you're done. Only add workflow complexity when you can measure the improvement.
+---
+## Pattern 1: Sequential Workflow
+### What it is
+Agents execute in a fixed order. Each stage processes its input, makes tool calls, then passes results to the next stage.
+```
+Input → [Agent A] → [Agent B] → [Agent C] → Output
+```
+### Use when
+- Steps have explicit dependencies (B needs A's output before starting)
+- Multi-stage transformation where each step adds specific value
+- Draft-review-polish cycles
+- Data extraction → validation → loading pipelines
+### Avoid when
+- A single agent can handle the whole task
+- Agents need to collaborate rather than hand off linearly
+- You're forcing sequential structure onto a task that doesn't naturally fit it
+### Cost/benefit
+- **Cost:** Latency is linear — step 2 waits for step 1
+- **Benefit:** Each agent focuses on one thing; accuracy often improves
+### CBX implementation
+```markdown
+## Workflow
+1. **[Agent/Step A]** — [what it receives, what it does, what it produces]
+2. **[Agent/Step B]** — [takes A's output, does X, produces Y]
+3. **[Agent/Step C]** — [final synthesis/delivery]
+Artifacts pass via [file path / variable / structured JSON / natural handoff instructions].
+```
+### Pro tip
+First try the pipeline as a single agent where the steps are part of the prompt. If quality is good enough, you've solved the problem without complexity.
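The sequential pattern reduces to threading one artifact through an ordered list of stages. The following is a minimal sketch, not part of the package: `run_sequential` is a hypothetical helper, and plain string functions stand in for real agent calls.

```python
from typing import Callable, List

Stage = Callable[[str], str]  # each stage takes the previous artifact and returns the next


def run_sequential(stages: List[Stage], initial_input: str) -> str:
    """Run stages in fixed order; each stage waits for the one before it."""
    artifact = initial_input
    for stage in stages:
        artifact = stage(artifact)
    return artifact


# Stand-ins for a draft -> review -> polish chain (assumed behavior, for illustration).
def draft(text: str) -> str:
    return text.strip()


def review(text: str) -> str:
    return text.capitalize()


def polish(text: str) -> str:
    return text if text.endswith(".") else text + "."
```

Calling `run_sequential([draft, review, polish], " hello world ")` walks the chain one stage at a time, which is also why latency grows linearly with the number of stages.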
+---
+## Pattern 2: Parallel Workflow
+### What it is
+Multiple agents run simultaneously on independent tasks. Results are merged or synthesized afterward.
+```
+         ┌→ [Agent A] →┐
+Input →  ├→ [Agent B] →├→ Synthesize → Output
+         └→ [Agent C] →┘
+```
+### Use when
+- Tasks are genuinely independent (no agent needs another's output to start)
+- Speed matters and concurrent execution helps
+- Multiple perspectives on the same input (e.g., code review from security + performance + quality)
+- Separation of concerns — different engineers can own individual agents
+### Avoid when
+- Agents need cumulative context or must build on each other's work
+- Resource constraints (API quotas) make concurrent calls inefficient
+- Aggregation logic is unclear or produces contradictory results with no resolution strategy
+### Cost/benefit
+- **Cost:** Tokens multiply (N agents × tokens each); requires aggregation strategy
+- **Benefit:** Faster completion; clean separation of concerns
+### CBX implementation
+```markdown
+## Parallel Steps
+Run these simultaneously:
+- **[Agent A]** — [focused task, specific scope]
+- **[Agent B]** — [focused task, different scope]
+- **[Agent C]** — [focused task, different scope]
+## Synthesis
+After all agents complete:
+[How to merge: majority vote / highest confidence / specialized agent defers to other / human review]
+```
+### Pro tip
+Design your aggregation strategy _before_ implementing parallel agents. Without a clear merge plan, you collect conflicting outputs with no way to resolve them.
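A sketch of the parallel pattern with an explicit merge step, using Python's standard `concurrent.futures`. It is not part of the package: `run_parallel` and `majority_vote` are hypothetical names, the agents are stubbed as plain callables, and majority vote is only one of the merge strategies listed above.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional


def run_parallel(agents: List[Callable[[str], str]], task: str) -> List[str]:
    """Run independent agents concurrently; results keep the order of `agents`."""
    if not agents:
        raise ValueError("at least one agent is required")
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        futures = [pool.submit(agent, task) for agent in agents]
        return [future.result() for future in futures]


def majority_vote(verdicts: List[str]) -> Optional[str]:
    """Return the most common verdict, or None on a tie so a human can review."""
    if not verdicts:
        raise ValueError("no verdicts to aggregate")
    ranked = Counter(verdicts).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie: no resolution strategy applies, escalate
    return ranked[0][0]
```

Returning `None` on a tie makes the unresolved case explicit instead of silently picking a winner.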
+---
+## Pattern 3: Evaluator-Optimizer Workflow
+### What it is
+Two agents loop: one generates content, another evaluates it against criteria, the generator refines based on feedback. Repeat until quality threshold is met or max iterations reached.
+```
+        ┌─────────────────────────────────────┐
+        ↓                                     │
+Input → [Generator] → Draft → [Evaluator] → Pass? → Output
+                                 ↓ Fail
+                            Feedback → [Generator]
+```
+### Use when
+- First-draft quality consistently falls short of the required bar
+- You have clear, measurable quality criteria an AI evaluator can apply consistently
+- The gap between first-attempt and final quality justifies extra tokens and latency
+- Examples: technical docs, customer communications, code against specific standards
+### Avoid when
+- First-attempt quality already meets requirements (unnecessary cost)
+- Real-time applications needing immediate responses
+- Evaluation criteria are too subjective for consistent AI evaluation
+- Deterministic tools exist (linters for style, validators for schemas) — use those instead
+### Cost/benefit
+- **Cost:** Tokens × iterations; adds latency proportionally
+- **Benefit:** Structured feedback loops produce measurably better outputs
+### CBX implementation
+```markdown
+## Generator Prompt
+Task: [what to create]
+Constraints: [specific, measurable requirements]
+Format: [exact output format]
+## Evaluator Prompt
+Review this output against these criteria:
+1. [Criterion A] — Pass/Fail + specific failure note
+2. [Criterion B] — Pass/Fail + specific failure note
+3. [Criterion C] — Pass/Fail + specific failure note
+Output JSON: { "pass": bool, "failures": ["..."], "revision_note": "..." }
+## Loop Control
+- Max iterations: [3-5]
+- Stop when: all criteria pass OR max iterations reached
+- On max with failures: surface remaining issues for human review
+```
+### Pro tip
+Set stopping criteria _before_ iterating. Define max iterations and specific quality thresholds. Without guardrails, you enter expensive loops where the evaluator finds minor issues and quality plateaus well before you stop.
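The loop control described above can be sketched as a bounded loop around two callables. This is an illustration, not part of the package: `evaluator_optimizer` and the stub generator and evaluator are hypothetical, and the verdict dictionary follows the JSON shape shown in the evaluator prompt.

```python
from typing import Callable, Dict, Optional


def evaluator_optimizer(
    generate: Callable[[str, Optional[str]], str],
    evaluate: Callable[[str], Dict],
    task: str,
    max_iterations: int = 3,
) -> Dict:
    """Generate, evaluate, and refine until criteria pass or the iteration cap is hit."""
    if max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")
    feedback = None
    for iteration in range(1, max_iterations + 1):
        draft = generate(task, feedback)
        verdict = evaluate(draft)  # {"pass": bool, "failures": [...], "revision_note": str}
        if verdict["pass"]:
            return {"output": draft, "iterations": iteration,
                    "needs_human_review": False, "failures": []}
        feedback = verdict["revision_note"]
    # Cap reached with failures left: surface them instead of looping further.
    return {"output": draft, "iterations": max_iterations,
            "needs_human_review": True, "failures": verdict["failures"]}


def stub_generate(task: str, feedback: Optional[str]) -> str:
    """Stand-in generator: adds a summary only once feedback asks for one."""
    return f"Draft for: {task}" + (" (with summary)" if feedback else "")


def stub_evaluate(draft: str) -> Dict:
    """Stand-in evaluator with a single binary criterion."""
    ok = "summary" in draft
    return {"pass": ok, "failures": [] if ok else ["missing summary"],
            "revision_note": "" if ok else "add a summary"}
```

With these stubs the first draft fails, the revision note is fed back, and the second draft passes; an evaluator that never passes ends at the cap with the remaining failures flagged for human review.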
+---
+## Decision Tree
+```
+Can a single agent handle this task effectively?
+  → YES: Don't use workflows. Use a rich single-agent prompt.
+  → NO: Continue...
+Do steps have dependencies (B needs A's output)?
+  → YES: Use Sequential
+  → NO: Continue...
+Can steps run independently, and would concurrency help?
+  → YES: Use Parallel
+  → NO: Continue...
+Does quality improve meaningfully through iteration, and can you measure it?
+  → YES: Use Evaluator-Optimizer
+  → NO: Re-examine whether workflows help at all
+```
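The decision tree above maps directly onto a chain of early returns. The function below is a hypothetical sketch, not part of the package; the four boolean parameters are named after the tree's questions and would be answered by the person designing the workflow.

```python
def choose_pattern(
    single_agent_sufficient: bool,
    steps_have_dependencies: bool,
    independent_and_concurrency_helps: bool,
    iteration_improves_measurably: bool,
) -> str:
    """Walk the decision tree top to bottom and return the first pattern that applies."""
    if single_agent_sufficient:
        return "single agent"
    if steps_have_dependencies:
        return "sequential"
    if independent_and_concurrency_helps:
        return "parallel"
    if iteration_improves_measurably:
        return "evaluator-optimizer"
    return "re-examine whether workflows help at all"
```

The order of the checks encodes the tree's priority: a sufficient single agent wins even when the later answers would also be yes.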
+---
+## Combining Patterns
+Patterns are building blocks, not mutually exclusive:
+- A **sequential workflow** can include **parallel** steps at certain stages (e.g., three parallel reviewers before a final synthesis step)
+- An **evaluator-optimizer** can use **parallel evaluation** where multiple evaluators assess different quality dimensions simultaneously
+- A **sequential chain** can use **evaluator-optimizer** at the critical high-quality step
+Only add the combination when each additional pattern measurably improves outcomes.
+---
+## Pattern Comparison
+|                | Sequential                                   | Parallel                                | Evaluator-Optimizer                  |
+| -------------- | -------------------------------------------- | --------------------------------------- | ------------------------------------ |
+| **When**       | Dependencies between steps                   | Independent tasks                       | Quality below bar                    |
+| **Examples**   | Extract → validate → load; Draft → translate | Code review (security + perf + quality) | Technical docs, comms, SQL           |
+| **Latency**    | Linear (each waits for previous)             | Fast (concurrent)                       | Multiplied by iterations             |
+| **Token cost** | Linear                                       | Multiplicative                          | Linear × iterations                  |
+| **Key risk**   | Bottleneck at slow steps                     | Aggregation conflicts                   | Infinite loops without stop criteria |

package/workflows/workflows/agent-environment-setup/platforms/copilot/skills/deep-research/SKILL.md CHANGED

@@ -1,10 +1,10 @@
 ---
 name: deep-research
-description: "Use when a task needs multi-round research rather than a quick lookup: iterative search, gap finding, corroboration across sources, contradiction handling, or evidence-led synthesis before planning or implementation."
+description: "Use when a task needs multi-round research rather than a quick lookup: iterative search, gap finding, corroboration across sources, contradiction handling, or evidence-led synthesis before planning or implementation. Also use when the user asks for 'deep research', 'latest info', or 'how does X compare to Y publicly'."
 license: MIT
 metadata:
   author: cubis-foundry
-  version: "1.0"
+  version: "1.1"
 compatibility: Claude Code, Codex, GitHub Copilot
 ---
 # Deep Research
@@ -13,23 +13,25 @@ compatibility: Claude Code, Codex, GitHub Copilot
 You are the specialist for iterative evidence gathering and synthesis.
-Your job is to find what is missing, not just summarize the first page of results.
+Your job is to find what is missing, not just summarize the first page of results. Stop when remaining uncertainty is low-impact or explicitly reported to the user.
 ## When to Use
-- The task needs deep web or repo research before planning or implementation.
-- The first-pass answer is incomplete, contradictory, or likely stale.
-- The user explicitly asks for research, latest information, or public-repo comparison.
+- The task needs deep web or repo research before planning or implementation
+- The first-pass answer is incomplete, contradictory, or likely stale
+- The user explicitly asks for research, latest information, or public-repo comparison
+- Claims are contested or the topic changes fast (AI tooling, frameworks, protocols)
 ## Instructions
 ### STANDARD OPERATING PROCEDURE (SOP)
-1. Define the question and what would count as enough evidence.
-2. Run a first pass and identify gaps or contradictions.
+1. Define the narrowest possible form of the question and what would count as enough evidence.
+2. Run a first pass and identify gaps, contradictions, and missing facts.
 3. Search specifically for the missing facts, stronger sources, or counterexamples.
-4. Rank sources by directness, recency, and authority.
-5. Separate sourced facts, informed inference, and unresolved gaps.
+4. Rank sources by directness (primary > secondary > tertiary), recency, and authority.
+5. Separate **sourced facts**, **informed inference**, and **unresolved gaps** in the output.
+6. Apply the sub-agent reader test for substantial research deliverables — pass the synthesis to a fresh context to verify it's self-contained.
 ### Constraints
@@ -40,19 +42,22 @@ Your job is to find what is missing, not just summarize the first page of result
 ## Output Format
-Provide implementation guidance, code examples, and configuration as appropriate to the task.
+Structure clearly as:
-## References
-| File                                      | Load when                                                                                             |
-| ----------------------------------------- | ----------------------------------------------------------------------------------------------------- |
-| `references/multi-round-research-loop.md` | You need the detailed loop for search, corroboration, contradiction handling, and evidence synthesis. |
+- **Key findings** — the answer, directly stated
+- **Evidence** — sourced facts with citations ranked by confidence
+- **Inference** — what follows logically from the evidence (labeled as inference)
+- **Open questions** — what remains unresolved and why it matters
-## Scripts
+## References
-No helper scripts are required for this skill right now. Keep execution in `SKILL.md` and `references/` unless repeated automation becomes necessary.
+| File                                      | Load when                                                                                                                                                   |
+| ----------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `references/multi-round-research-loop.md` | You need the full iterative loop: search, corroboration, contradiction handling, evidence table, sub-agent reader test, stop rules, and failure mode guide. |
 ## Examples
-- "Help me with deep research best practices in this project"
-- "Review my deep research implementation for issues"
+- "Research how Anthropic structures their agent skills — compare to what CBX does"
+- "What's the latest on evaluator-optimizer patterns in production agent systems?"
+- "Deep research on OKLCH vs HSL for design systems — what do practitioners actually use?"
+- "Find counterexamples to the claim that parallel agents always improve speed"

package/workflows/workflows/agent-environment-setup/platforms/copilot/skills/deep-research/references/multi-round-research-loop.md CHANGED

@@ -1,15 +1,80 @@
 # Multi-Round Research Loop
-Load this when the research task is broad, unstable, or likely to have conflicting public sources.
+Load this when the research task is broad, unstable, contested, or likely to have conflicting public sources — or when the user explicitly asks for deep or latest research.
-## Loop
+## The Core Principle
-1. Start with the narrowest question possible.
-2. Record what the first pass did not answer.
-3. Search directly for the missing facts or contradictions.
-4. Prefer primary or official sources when they exist.
-5. Keep a short evidence table: fact, source, confidence, open question.
+Your job is to find what is _missing_, not just summarize the first page of results. Stop when remaining uncertainty is low-impact or explicitly reported back to the user.
+---
+## The Loop
+### Round 1: Define and search broadly
+1. **Narrow the question** — State the most specific version of what you're trying to find. Vague questions produce vague results.
+2. **Search** — Run initial queries across the most likely sources (official docs, engineering blogs, GitHub repos, research papers as appropriate).
+3. **Record gaps** — What did this pass _not_ answer? What's contradictory? What's suspiciously absent?
+### Round 2: Target the gaps
+4. **Search directly for the missing facts** — Use specific, targeted queries (not broad topic queries). Prefer: official docs > primary source blog > authoritative community reference > general article.
+5. **Search for contradictions** — If two sources disagree, search specifically for why. Age, version differences, and context often explain it.
+6. **Seek counterexamples** — Actively search for "X doesn't work", "X is wrong", "problems with X" — not just confirmation.
+### Round 3: Corroborate and synthesize
+7. **Cross-verify** unstable claims against at least one independent source.
+8. **Rank sources** by directness (primary > secondary > tertiary), recency (newer > older for fast-moving topics), and authority (official > community > anecdotal).
+9. **Write the evidence table** — Track what you found, not just conclusions:
+| Fact    | Source       | Confidence          | Open question            |
+| ------- | ------------ | ------------------- | ------------------------ |
+| [Claim] | [URL or doc] | High / Medium / Low | [What's still uncertain] |
+---
 ## Stop Rule
-Stop only when the remaining uncertainty is either low-impact or explicitly reported back to the user.
+Stop when:
+- Remaining uncertainty is low-impact (won't change the recommendation)
+- OR the question is genuinely unresolvable from public sources (report this explicitly)
+- OR you've completed 3 rounds without new signal (diminishing returns — report what's known and what's not)
+Do NOT stop after one source when the claim is unstable, contested, or from a secondary source.
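The stop rule can be written as a small decision function. This sketch is not part of the package: `should_stop` and its parameters are hypothetical, and "completed 3 rounds without new signal" is read here as three rounds done with the latest one adding nothing new, which is one possible interpretation.

```python
from typing import Tuple


def should_stop(
    low_impact: bool,
    unresolvable_from_public_sources: bool,
    rounds_completed: int,
    new_signal_last_round: bool,
    unstable_claim_single_source: bool = False,
) -> Tuple[bool, str]:
    """Return (stop, reason) following the stop rule and its single-source guard."""
    # Guard first: an unstable or contested claim must not rest on one source.
    if unstable_claim_single_source:
        return (False, "unstable claim rests on one source: keep corroborating")
    if low_impact:
        return (True, "remaining uncertainty is low-impact")
    if unresolvable_from_public_sources:
        return (True, "unresolvable from public sources: report this explicitly")
    if rounds_completed >= 3 and not new_signal_last_round:
        return (True, "diminishing returns: report what is known and what is not")
    return (False, "gaps remain: run another targeted round")
```

Returning the reason alongside the decision keeps the explicit-reporting requirement attached to each way of stopping.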
+---
+## Output Format
+Separate clearly:
+- **Sourced facts** — you have direct evidence
+- **Informed inference** — logically follows from evidence but not directly stated
+- **Unresolved gaps** — you searched and didn't find it; note what's unknown
+Do not present inference as fact. Do not present absence of evidence as evidence of absence.
+---
+## Common Research Failure Modes
+| Failure                               | Fix                                                |
+| ------------------------------------- | -------------------------------------------------- |
+| Stopping at first result              | Check at least 2-3 sources for any unstable claim  |
+| Only finding confirmation             | Actively search for counterexamples and criticisms |
+| Treating recent = correct             | Cross-check recency with authority and context     |
+| Vague queries returning vague results | Restate the question as a specific, narrow query   |
+| Reporting uncertainty as fact         | Use "inference" or "unknown" tags explicitly       |
+| Burying the answer in context         | Lead with the finding; evidence follows            |
+---
+## Sub-Agent Reader Test
+After completing research, if the output will be used by another agent or handed to a human without context:
+- Pass the research summary to a fresh sub-agent with no conversation history
+- Ask: "What is the main finding?", "What is still uncertain?", "What sources support the key claims?"
+- If the fresh reader gets it wrong, the synthesis has context bleed — revise before delivery

package/workflows/workflows/agent-environment-setup/platforms/copilot/skills/frontend-design/SKILL.md CHANGED

@@ -31,9 +31,10 @@ Guide creation of distinctive, production-grade frontend interfaces that avoid g
 Before writing code, ask or infer:
 1. **Purpose** — What problem does this interface solve? Who uses it?
-2. **Tone** — Pick a bold aesthetic direction: brutally minimal, maximalist chaos, retro-futuristic, organic/natural, luxury/refined, playful/toy-like, editorial/magazine, brutalist/raw, art deco/geometric, soft/pastel, industrial/utilitarian. There are many flavors. Pick one and commit.
-3. **Constraints** — Technical requirements (framework, performance, a11y level, browser support).
-4. **Differentiation** — What makes this UNFORGETTABLE? What's the one thing someone will remember?
+2. **Brand** — Is there an existing brand system, style guide, or named brand (e.g. "Anthropic", client guidelines, hex palette) to follow? If yes, load `references/brand-presets.md` and use `/brand` to apply it before choosing aesthetic direction.
+3. **Tone** — If no brand system exists, pick a bold aesthetic direction: brutally minimal, maximalist chaos, retro-futuristic, organic/natural, luxury/refined, playful/toy-like, editorial/magazine, brutalist/raw, art deco/geometric, soft/pastel, industrial/utilitarian. There are many flavors. Pick one and commit.
+4. **Constraints** — Technical requirements (framework, performance, a11y level, browser support).
+5. **Differentiation** — What makes this UNFORGETTABLE? What's the one thing someone will remember?
 Choose a clear conceptual direction and execute it with precision. Bold maximalism and refined minimalism both work — the key is intentionality, not intensity.
@@ -166,40 +167,42 @@ Deliver:
 Load only what the current step needs.
-| File                               | Load when                                                                                            |
-| ---------------------------------- | ---------------------------------------------------------------------------------------------------- |
-| `references/typography.md`         | Task involves font selection, type scale, font loading, or text hierarchy decisions.                 |
-| `references/color-and-contrast.md` | Task involves palette selection, dark mode, OKLCH color, contrast ratios, or tinted neutrals.        |
-| `references/spatial-design.md`     | Task involves grid systems, spacing rhythm, container queries, or layout composition.                |
-| `references/motion-design.md`      | Task involves animation timing, easing curves, staggered reveals, or reduced motion support.         |
-| `references/interaction-design.md` | Task involves form design, focus management, loading states, or progressive disclosure patterns.     |
-| `references/responsive-design.md`  | Task involves mobile-first design, fluid layouts, container queries, or adaptive interfaces.         |
-| `references/ux-writing.md`         | Task involves button labels, error messages, empty states, or microcopy decisions.                   |
-| `references/ux-psychology.md`      | Task involves cognitive load, decision architecture, trust building, or emotional design principles. |
+| File                               | Load when                                                                                                                             |
+| ---------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------- |
+| `references/typography.md`         | Task involves font selection, type scale, font loading, or text hierarchy decisions.                                                  |
+| `references/color-and-contrast.md` | Task involves palette selection, dark mode, OKLCH color, contrast ratios, or tinted neutrals.                                         |
+| `references/spatial-design.md`     | Task involves grid systems, spacing rhythm, container queries, or layout composition.                                                 |
+| `references/motion-design.md`      | Task involves animation timing, easing curves, staggered reveals, or reduced motion support.                                          |
+| `references/interaction-design.md` | Task involves form design, focus management, loading states, or progressive disclosure patterns.                                      |
+| `references/responsive-design.md`  | Task involves mobile-first design, fluid layouts, container queries, or adaptive interfaces.                                          |
+| `references/ux-writing.md`         | Task involves button labels, error messages, empty states, or microcopy decisions.                                                    |
+| `references/ux-psychology.md`      | Task involves cognitive load, decision architecture, trust building, or emotional design principles.                                  |
+| `references/brand-presets.md`      | Task involves applying existing brand guidelines, a named brand system (e.g. Anthropic), or converting a hex palette into CSS tokens. |
 ## Commands
 17 specialized commands for targeted design operations. Each command focuses on a specific design concern and can be applied to a whole page or a specific element.
-| Command             | Purpose                                                                                   |
-| ------------------- | ----------------------------------------------------------------------------------------- |
-| `/audit`            | Run technical quality checks: accessibility, performance, responsive behavior             |
-| `/critique`         | UX design review: hierarchy, clarity, emotional resonance, user flow                      |
-| `/normalize`        | Align with design system standards: tokens, spacing, typography consistency               |
-| `/polish`           | Final pass before shipping: micro-details, alignment, visual refinement                   |
-| `/distill`          | Strip to essence: remove unnecessary complexity, simplify without losing character        |
-| `/clarify`          | Improve unclear UX copy: labels, instructions, error messages, empty states               |
-| `/optimize`         | Performance improvements: image sizes, render-blocking, bundle impact                     |
-| `/harden`           | Error handling, i18n readiness, edge cases, defensive UI patterns                         |
-| `/animate`          | Add purposeful motion: transitions, micro-interactions, state changes                     |
-| `/colorize`         | Introduce strategic color: palette refinement, accent placement, contrast fixes           |
-| `/bolder`           | Amplify timid designs: stronger hierarchy, more contrast, bigger gestures                 |
-| `/quieter`          | Tone down overwhelming designs: reduce noise, increase whitespace, simplify               |
-| `/delight`          | Add moments of joy: easter eggs, satisfying interactions, personality                     |
-| `/extract`          | Pull into reusable components: identify patterns, create component API                    |
-| `/adapt`            | Adapt for different devices: responsive breakpoints, touch targets, viewport optimization |
-| `/onboard`          | Design onboarding flows: first-run experience, empty states, progressive disclosure       |
-| `/teach-impeccable` | One-time setup: gather project design context, save preferences for future sessions       |
+| Command             | Purpose                                                                                                             |
+| ------------------- | ------------------------------------------------------------------------------------------------------------------- |
+| `/audit`            | Run technical quality checks: accessibility, performance, responsive behavior                                       |
+| `/critique`         | UX design review: hierarchy, clarity, emotional resonance, user flow                                                |
+| `/normalize`        | Align with design system standards: tokens, spacing, typography consistency                                         |
+| `/polish`           | Final pass before shipping: micro-details, alignment, visual refinement                                             |
+| `/distill`          | Strip to essence: remove unnecessary complexity, simplify without losing character                                  |
+| `/clarify`          | Improve unclear UX copy: labels, instructions, error messages, empty states                                         |
+| `/optimize`         | Performance improvements: image sizes, render-blocking, bundle impact                                               |
+| `/harden`           | Error handling, i18n readiness, edge cases, defensive UI patterns                                                   |
+| `/animate`          | Add purposeful motion: transitions, micro-interactions, state changes                                               |
+| `/colorize`         | Introduce strategic color: palette refinement, accent placement, contrast fixes                                     |
+| `/bolder`           | Amplify timid designs: stronger hierarchy, more contrast, bigger gestures                                           |
+| `/quieter`          | Tone down overwhelming designs: reduce noise, increase whitespace, simplify                                         |
+| `/delight`          | Add moments of joy: easter eggs, satisfying interactions, personality                                               |
+| `/extract`          | Pull into reusable components: identify patterns, create component API                                              |
+| `/adapt`            | Adapt for different devices: responsive breakpoints, touch targets, viewport optimization                           |
+| `/onboard`          | Design onboarding flows: first-run experience, empty states, progressive disclosure                                 |
+| `/teach-impeccable` | One-time setup: gather project design context, save preferences for future sessions                                 |
+| `/brand`            | Apply or enforce a specific brand identity: convert guideline colors to CSS tokens, set typography, verify contrast |
 Usage: Most commands accept an optional argument to focus on a specific area (e.g., `/audit header`, `/polish checkout-form`).
@@ -211,3 +214,5 @@ Usage: Most commands accept an optional argument to focus on a specific area (e.
 - "/critique the checkout flow — is the hierarchy clear? Does it build trust?"
 - "/polish the hero section before we ship."
 - "I need a color system for a health tech app. No fintech blue, no AI purple."
+- "/brand anthropic — apply Anthropic's brand colors and typography to this interface."
+- "Here's our brand guide with hex values. Apply it to this dashboard — /brand"