@cubis/foundry 0.3.70 → 0.3.71
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/package.json +1 -1
- package/workflows/powers/ask-questions-if-underspecified/SKILL.md +51 -3
- package/workflows/powers/behavioral-modes/SKILL.md +100 -9
- package/workflows/skills/agent-design/SKILL.md +198 -0
- package/workflows/skills/agent-design/references/clarification-patterns.md +153 -0
- package/workflows/skills/agent-design/references/skill-testing.md +164 -0
- package/workflows/skills/agent-design/references/workflow-patterns.md +226 -0
- package/workflows/skills/deep-research/SKILL.md +25 -20
- package/workflows/skills/deep-research/references/multi-round-research-loop.md +73 -8
- package/workflows/skills/frontend-design/SKILL.md +37 -32
- package/workflows/skills/frontend-design/commands/brand.md +167 -0
- package/workflows/skills/frontend-design/references/brand-presets.md +228 -0
- package/workflows/skills/generated/skill-audit.json +11 -2
- package/workflows/skills/generated/skill-catalog.json +37 -5
- package/workflows/skills/skills_index.json +1 -1
- package/workflows/workflows/agent-environment-setup/platforms/claude/skills/agent-design/SKILL.md +198 -0
- package/workflows/workflows/agent-environment-setup/platforms/claude/skills/agent-design/references/clarification-patterns.md +153 -0
- package/workflows/workflows/agent-environment-setup/platforms/claude/skills/agent-design/references/skill-testing.md +164 -0
- package/workflows/workflows/agent-environment-setup/platforms/claude/skills/agent-design/references/workflow-patterns.md +226 -0
- package/workflows/workflows/agent-environment-setup/platforms/claude/skills/deep-research/SKILL.md +25 -20
- package/workflows/workflows/agent-environment-setup/platforms/claude/skills/deep-research/references/multi-round-research-loop.md +73 -8
- package/workflows/workflows/agent-environment-setup/platforms/claude/skills/frontend-design/SKILL.md +37 -32
- package/workflows/workflows/agent-environment-setup/platforms/claude/skills/frontend-design/commands/brand.md +167 -0
- package/workflows/workflows/agent-environment-setup/platforms/claude/skills/frontend-design/references/brand-presets.md +228 -0
- package/workflows/workflows/agent-environment-setup/platforms/claude/skills/skills_index.json +1 -1
- package/workflows/workflows/agent-environment-setup/platforms/copilot/skills/agent-design/SKILL.md +197 -0
- package/workflows/workflows/agent-environment-setup/platforms/copilot/skills/agent-design/references/clarification-patterns.md +153 -0
- package/workflows/workflows/agent-environment-setup/platforms/copilot/skills/agent-design/references/skill-testing.md +164 -0
- package/workflows/workflows/agent-environment-setup/platforms/copilot/skills/agent-design/references/workflow-patterns.md +226 -0
- package/workflows/workflows/agent-environment-setup/platforms/copilot/skills/deep-research/SKILL.md +25 -20
- package/workflows/workflows/agent-environment-setup/platforms/copilot/skills/deep-research/references/multi-round-research-loop.md +73 -8
- package/workflows/workflows/agent-environment-setup/platforms/copilot/skills/frontend-design/SKILL.md +37 -32
- package/workflows/workflows/agent-environment-setup/platforms/copilot/skills/frontend-design/commands/brand.md +167 -0
- package/workflows/workflows/agent-environment-setup/platforms/copilot/skills/frontend-design/references/brand-presets.md +228 -0
- package/workflows/workflows/agent-environment-setup/platforms/copilot/skills/skills_index.json +1 -1
@@ -0,0 +1,228 @@

# Brand Presets Reference

Use this reference when building interfaces that must conform to an existing brand system — whether a client-supplied style guide, a design system handoff, or a well-documented brand like Anthropic's.

## Receiving Brand Guidelines

When a user hands over brand guidelines, extract these five things before writing any code:

| Extract            | Ask or infer                                                                 |
| ------------------ | ---------------------------------------------------------------------------- |
| **Color palette**  | What are the primary, secondary, and accent hex values?                       |
| **Neutral system** | Are neutrals warm, cool, or truly achromatic?                                 |
| **Typography**     | Specific font families for headings and body? Variable font available?       |
| **Spacing DNA**    | Base unit? Tight/airy preference?                                             |
| **Tone**           | Where on the spectrum between playful ↔ authoritative, minimal ↔ expressive? |

## Hex → OKLCH Conversion Workflow

Always convert brand hex colors to OKLCH for CSS. OKLCH gives you perceptual uniformity — colors with the same L value look equally bright, unlike hex or HSL.

```css
/* Conversion pattern: hex → oklch via CSS Color 4 */
/* Most modern browsers accept oklch() natively */
/* Use https://oklch.com to find values, or compute: */

/* L = perceived lightness 0–100% */
/* C = chroma (colorfulness) 0–0.37ish */
/* H = hue angle 0–360 */

/* Example: #d97757 (warm orange) */
--brand-orange: oklch(65% 0.145 42);

/* Example: #faf9f5 (warm cream) */
--brand-surface: oklch(98.2% 0.008 85);

/* Example: #141413 (warm near-black) */
--brand-ink: oklch(10.5% 0.006 85);
```

**Shorthand**: enter any hex at [oklch.com](https://oklch.com) to get the L/C/H values.
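If you'd rather compute values than look them up, the conversion is mechanical. A minimal Python sketch using the published sRGB → OKLab matrices (this helper is illustrative, not part of the skill files; tools may round slightly differently, so treat the CSS values above as the source of truth):

```python
import math

def hex_to_oklch(hex_color: str) -> tuple[float, float, float]:
    """Convert an sRGB hex color to (L, C, H) in OKLCH.

    L is 0-1 (multiply by 100 for the CSS percentage form); H is degrees.
    """
    hex_color = hex_color.lstrip("#")
    r, g, b = (int(hex_color[i:i + 2], 16) / 255 for i in (0, 2, 4))

    def linearize(c: float) -> float:
        # Undo the sRGB transfer curve
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = linearize(r), linearize(g), linearize(b)

    # Linear sRGB -> LMS cone response, then cube root
    l = (0.4122214708 * r + 0.5363325363 * g + 0.0514459929 * b) ** (1 / 3)
    m = (0.2119034982 * r + 0.6806995451 * g + 0.1073969566 * b) ** (1 / 3)
    s = (0.0883024619 * r + 0.2817188376 * g + 0.6299787005 * b) ** (1 / 3)

    # LMS' -> OKLab
    L = 0.2104542553 * l + 0.7936177850 * m - 0.0040720468 * s
    a = 1.9779984951 * l - 2.4285922050 * m + 0.4505937099 * s
    b2 = 0.0259040371 * l + 0.7827717662 * m - 0.8086757660 * s

    # Lab -> LCH: chroma is the distance from the neutral axis,
    # hue is the angle around it
    C = math.hypot(a, b2)
    H = math.degrees(math.atan2(b2, a)) % 360
    return (L, C, H)

# The brand orange from the example above
print(hex_to_oklch("#d97757"))
```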

## Semantic Token Mapping

Once you have OKLCH values, map them to semantic tokens — never use raw hex or oklch values directly in components:

```css
:root {
  /* --- RAW BRAND TOKENS (source of truth) --- */
  --brand-ink: oklch(10.5% 0.006 85); /* near-black, warm */
  --brand-surface: oklch(98.2% 0.008 85); /* cream white */
  --brand-mid: oklch(72% 0.009 85); /* mid-range gray */
  --brand-subtle: oklch(92% 0.01 85); /* light gray */
  --brand-orange: oklch(65% 0.145 42); /* primary accent */
  --brand-blue: oklch(65% 0.09 235); /* secondary accent */
  --brand-green: oklch(57% 0.09 130); /* tertiary accent */

  /* --- SEMANTIC TOKENS (what components use) --- */
  --color-bg: var(--brand-surface);
  --color-bg-subtle: var(--brand-subtle);
  --color-text: var(--brand-ink);
  --color-text-secondary: var(--brand-mid);
  --color-accent: var(--brand-orange);
  --color-accent-secondary: var(--brand-blue);
  --color-accent-tertiary: var(--brand-green);
  --color-border: var(--brand-subtle);
}
```

## Anthropic Brand System

Anthropic's brand (from [anthropics/skills](https://github.com/anthropics/skills/tree/main/skills/brand-guidelines)) is a useful reference implementation. It's a warm, editorial system — earthy neutrals with bold accent contrast.

### Color Palette

```css
:root {
  /* Neutrals — warm hue angle ~85 (yellow-brown direction) */
  --anthropic-ink: oklch(10.5% 0.006 85); /* #141413 — body text, dark bg */
  --anthropic-cream: oklch(98.2% 0.008 85); /* #faf9f5 — light bg, text on dark */
  --anthropic-mid: oklch(72% 0.009 85); /* #b0aea5 — secondary text */
  --anthropic-subtle: oklch(92% 0.01 85); /* #e8e6dc — dividers, subtle bg */

  /* Accents — arranged in visual temperature order */
  --anthropic-orange: oklch(65% 0.145 42); /* #d97757 — primary CTA, highlights */
  --anthropic-blue: oklch(65% 0.09 235); /* #6a9bcc — secondary actions */
  --anthropic-green: oklch(57% 0.09 130); /* #788c5d — tertiary, success states */
}
```

### Typography

```css
/* Load from Google Fonts */
@import url("https://fonts.googleapis.com/css2?family=Poppins:wght@400;500;600;700&family=Lora:ital,wght@0,400;0,500;0,600;1,400&display=swap");

:root {
  --font-display: "Poppins", Arial, sans-serif; /* headings, nav, labels */
  --font-body: "Lora", Georgia, serif; /* body text, long-form */
}

/* Application rules */
h1, h2, h3, h4, h5, h6,
.label, .nav-item, .button, .badge {
  font-family: var(--font-display);
}

p, blockquote, .prose, article {
  font-family: var(--font-body);
}
```

**Why this pairing works**: Poppins is geometric and structured — clear, modern authority. Lora is an elegant serif — warm, readable, literary. Together they balance clarity with warmth, matching Anthropic's positioning between scientific rigor and approachability.

### Spacing DNA

The brand leans into generous negative space. Use an `8px` base unit with larger gaps between major sections:

```css
:root {
  --space-1: 0.5rem; /* 8px — tight groupings */
  --space-2: 1rem; /* 16px — component padding */
  --space-3: 1.5rem; /* 24px — between related elements */
  --space-4: 2rem; /* 32px — section mini-gap */
  --space-6: 3rem; /* 48px — between sections */
  --space-8: 4rem; /* 64px — major section spacing */
}
```

### Component Patterns

```css
/* Card — warm border, generous padding, no shadow */
.card {
  background: var(--anthropic-subtle);
  border: 1px solid var(--anthropic-mid);
  border-radius: 4px; /* subtle — not pill-shaped */
  padding: var(--space-4);
}

/* Button — orange CTA */
.button-primary {
  background: var(--anthropic-orange);
  color: var(--anthropic-cream);
  font-family: var(--font-display);
  font-weight: 500;
  letter-spacing: 0.01em;
  border-radius: 4px;
  padding: 0.625rem 1.5rem;
  border: none;
}
.button-primary:hover {
  background: oklch(60% 0.145 42); /* slightly darker orange */
}

/* Text link — uses orange, not blue */
a {
  color: var(--anthropic-orange);
  text-decoration-thickness: 1px;
  text-underline-offset: 3px;
}

/* Code / monospace — ink on subtle bg */
code {
  background: var(--anthropic-subtle);
  color: var(--anthropic-ink);
  font-size: 0.875em;
  padding: 0.2em 0.4em;
  border-radius: 3px;
}
```

### Dark Mode

Anthropic's palette inverts gracefully — the cream and ink swap, and mid-tones pull slightly warmer:

```css
@media (prefers-color-scheme: dark) {
  :root {
    --color-bg: var(--anthropic-ink);
    --color-bg-subtle: oklch(16% 0.008 85); /* slightly elevated */
    --color-text: var(--anthropic-cream);
    --color-text-secondary: var(--anthropic-mid);
    --color-border: oklch(22% 0.008 85);
    /* Accents stay the same — they hold on both backgrounds */
  }
}
```

## Applying Other Brand Systems

When adapting a different brand, follow this checklist:

1. **Extract and convert** — Get all hex values, convert to oklch
2. **Identify neutrals** — Are they warm, cool, or pure gray? Find the hue angle
3. **Map the hierarchy** — Which color is dominant (60%), which secondary (30%), which accent (10%)?
4. **Check contrast** — Verify text contrast with WCAG ratios or the APCA algorithm (a WCAG 3 candidate); oklch makes this predictable
5. **Find the typography voice** — Geometric sans = structured/modern; humanist sans = friendly; slab = authoritative; oldstyle serif = editorial; transitional serif = professional
6. **Test the mood** — Show a prototype section in brand colors. Does it _feel_ like the brand in motion, not just color?
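The contrast check in step 4 can be automated. A minimal sketch of the WCAG 2.x contrast ratio, computed from hex since the ratio is defined over sRGB relative luminance (the function is illustrative and does not implement APCA, which uses a different formula):

```python
def contrast_ratio(hex_fg: str, hex_bg: str) -> float:
    """WCAG 2.x contrast ratio between two sRGB hex colors (1.0-21.0)."""
    def luminance(hex_color: str) -> float:
        hex_color = hex_color.lstrip("#")
        channels = [int(hex_color[i:i + 2], 16) / 255 for i in (0, 2, 4)]
        # Undo the sRGB transfer curve, then weight by perceived brightness
        linear = [c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4
                  for c in channels]
        r, g, b = linear
        return 0.2126 * r + 0.7152 * g + 0.0722 * b

    lighter, darker = sorted((luminance(hex_fg), luminance(hex_bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)

# Brand ink on cream easily clears the 4.5:1 body-text threshold
assert contrast_ratio("#141413", "#faf9f5") > 4.5
```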

## Common Brand Color Archetypes

| Archetype                      | Neutrals                | Primary accent       | Feeling                           |
| ------------------------------ | ----------------------- | -------------------- | --------------------------------- |
| **Warm editorial** (Anthropic) | Cream / warm near-black | Orange / terracotta  | Thoughtful, approachable, premium |
| **Cool tech**                  | True gray / white       | Electric blue / teal | Precise, efficient, modern        |
| **Finance/enterprise**         | Navy / white            | Deep blue / gold     | Stable, trustworthy, conservative |
| **Health/wellness**            | Off-white / dark green  | Sage green / amber   | Natural, calm, nurturing          |
| **Startup/consumer**           | White / black           | Bold purple or coral | Energetic, fun, accessible        |
| **Luxury**                     | White / true black      | Gold / burgundy      | Exclusive, refined, timeless      |

When working with a brand that fits these archetypes, pull from the pattern — then make one unexpected choice to give it character.
package/workflows/workflows/agent-environment-setup/platforms/claude/skills/skills_index.json
CHANGED
@@ -552,7 +552,7 @@
        "research"
      ],
      "path": ".claude/skills/deep-research/SKILL.md",
-     "description": "Use when a task needs multi-round research rather than a quick lookup: iterative search, gap finding, corroboration across sources, contradiction handling,
+     "description": "Use when a task needs multi-round research rather than a quick lookup: iterative search, gap finding, corroboration across sources, contradiction handling, evidence-led synthesis before planning or implementation. Also use when the user asks for 'deep research', 'latest info', or 'how does X compare to Y publicly'.",
      "triggers": [
        "deep-research",
        "deep research",
package/workflows/workflows/agent-environment-setup/platforms/copilot/skills/agent-design/SKILL.md
ADDED
@@ -0,0 +1,197 @@

---
name: agent-design
description: "Use when designing, building, or improving a CBX agent, skill, or workflow: clarification strategy, progressive disclosure structure, workflow pattern selection (sequential, parallel, evaluator-optimizer), skill type taxonomy, description tuning, and eval-first testing."
license: MIT
metadata:
  author: cubis-foundry
  version: "1.0"
  compatibility: Claude Code, Codex, GitHub Copilot, Gemini CLI
---

# Agent Design

## Purpose

You are the specialist for designing CBX agents and skills that behave intelligently — asking the right questions, knowing when to pause, executing in the right workflow pattern, and testing their own output.

Your job is to close the gap between "it kinda works" and "it works reliably under any input."

## When to Use

- Designing or refactoring a SKILL.md or POWER.md
- Choosing between sequential, parallel, or evaluator-optimizer workflows
- Writing clarification logic for an agent that handles ambiguous requests
- Deciding whether a task needs a skill or just a prompt
- Testing whether a skill actually works as intended
- Writing descriptions that trigger the right skill at the right time

## Core Principles

These come directly from Anthropic's agent engineering research (["Equipping agents for the real world"](https://claude.com/blog/equipping-agents-for-the-real-world-with-agent-skills), March 2026):

1. **Progressive disclosure** — A skill's SKILL.md provides just enough context to know when to load it. Full instructions, references, and scripts are loaded lazily, only when needed. More context in a single file does not equal better behavior — it usually hurts it.

2. **Eval before optimizing** — Define what "good" looks like (test cases + success criteria) before editing the skill. This prevents regression and tells you when improvement actually happened.

3. **Description precision** — The `description` field in the YAML frontmatter controls triggering. Too broad = false positives. Too narrow = the skill never fires. Tune it like a search query.

4. **Two skill types** — See [Skill Type Taxonomy](#skill-type-taxonomy). The two types need different testing strategies and have different shelf lives.

5. **Start with a single agent** — Before adding workflow complexity, first try a single agent with a rich prompt. Only add orchestration when it measurably improves results.

## Skill Type Taxonomy

| Type | What it does | Testing goal | Shelf life |
| --- | --- | --- | --- |
| **Capability uplift** | Teaches Claude to do something it can't do alone (e.g. manipulate PDFs, fill forms, use a domain-specific API) | Verify the output is correct and consistent | Medium — may become obsolete as models improve |
| **Encoded preference** | Sequences steps Claude could do individually, but in your team's specific order and style (e.g. NDA review checklist, weekly update format) | Verify fidelity to the actual workflow | High — these stay useful because they're uniquely yours |

Design question: "Is this skill teaching Claude something new, or encoding how we do things?"

## Clarification Strategy

An agent that starts wrong wastes everyone's time. Smart agents pause at the right moments.

Load `references/clarification-patterns.md` when:

- Designing how a skill should handle ambiguous or underspecified inputs
- Writing the early steps of a workflow where user intent matters
- Deciding what questions to ask vs. what to infer

## Workflow Pattern Selection

Three patterns cover 95% of production agent workflows:

| Pattern | Use when | Cost | Benefit |
| --- | --- | --- | --- |
| **Sequential** | Steps have dependencies (B needs A's output) | Latency (linear) | Focus: each step does one thing well |
| **Parallel** | Steps are independent and concurrency helps | Tokens (multiplicative) | Speed + separation of concerns |
| **Evaluator-optimizer** | First-draft quality isn't good enough and quality is measurable | Tokens × iterations | Better output through structured feedback |

Default to sequential. Add parallel when latency is the bottleneck and tasks are genuinely independent. Add evaluator-optimizer only when you can measure the improvement.

Load `references/workflow-patterns.md` for the full decision tree, examples, and anti-patterns.
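The table above can be made concrete in a few lines of orchestration. A toy sketch of the parallel pattern with a sequential synthesis tail (the agents are stubs; a real implementation would make a model call where the placeholder sleep is, and the role names are invented for illustration):

```python
import asyncio

# Hypothetical stand-ins: in a real system each of these would call a model.
async def run_agent(role: str, task: str) -> str:
    await asyncio.sleep(0)  # placeholder for a model call
    return f"[{role}] findings for: {task}"

async def parallel_review(task: str) -> str:
    """Parallel pattern: independent reviewers fan out, one step synthesizes."""
    roles = ["security", "performance", "style"]
    findings = await asyncio.gather(*(run_agent(r, task) for r in roles))
    # Sequential tail: synthesis depends on all findings, so it runs last
    return await run_agent("synthesizer", " | ".join(findings))

print(asyncio.run(parallel_review("review auth module")))
```

Note the cost column in action: the fan-out triples token spend for the review step, which is only worth it when the three reviews are genuinely independent.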

## Progressive Disclosure Structure

A well-structured CBX skill looks like:

```
skill-name/
  SKILL.md        ← lean entry: name, description, purpose, when-to-use, load-table
  references/     ← detailed guides loaded lazily when a step requires them
    topic-a.md
    topic-b.md
  commands/       ← slash commands (optional)
    command.md
  scripts/        ← executable code (optional)
    helper.py
```

**SKILL.md should be loadable in <2000 tokens.** Everything else lives in references.

The metadata table pattern that works:

```markdown
## References

| File | Load when |
| ----------------------- | ------------------------------------------ |
| `references/topic-a.md` | Task involves [specific trigger condition] |
| `references/topic-b.md` | Task involves [specific trigger condition] |
```

This lets the agent make intelligent decisions about what context to load rather than ingesting everything upfront.

## Description Writing

The `description` field is a trigger — write it like a search query, not marketing copy.

**Good description:**

```yaml
description: "Use when evaluating an agent, skill, workflow, or MCP server: rubric design, evaluator-optimizer loops, LLM-as-judge patterns, regression suites, or prototype-vs-production quality gaps."
```

**Bad description:**

```yaml
description: "A comprehensive skill for evaluating things and making sure they work well."
```

Rules:

- Lead with the specific trigger verb: "Use when [user does X]"
- List the specific task types with commas — these act like search keywords
- Include domain-specific nouns the user would actually type
- Avoid generic adjectives ("comprehensive", "powerful", "advanced")

Test your description: would a user's natural-language request match the intent of these words?

## Testing a Skill

Before shipping, verify with this checklist:

1. **Positive trigger** — Does the skill load when it should? Test 5 natural phrasings of the target task.
2. **Negative trigger** — Does it stay quiet when it shouldn't load? Test 5 near-miss phrasings.
3. **Happy path** — Does the skill complete the standard task correctly?
4. **Edge cases** — What happens with missing input, ambiguous phrasing, or edge-case content?
5. **Reader test** — Run the deliverable (e.g., a generated doc, a plan) through a fresh sub-agent with no context. Can it answer questions about the output correctly?

For formal regression suites, load `references/skill-testing.md`.
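The positive/negative trigger checks can be roughed out offline before testing against a real model. A toy sketch that scores keyword overlap between a description and candidate prompts (this only approximates, and does not reproduce, the model's actual routing; the prompts are invented examples):

```python
def trigger_score(description: str, prompt: str) -> float:
    """Crude keyword-overlap score between a skill description and a prompt."""
    stop = {"a", "an", "the", "when", "use", "or", "for", "to", "and", "of"}
    desc_words = {w.strip(".,:\"'?").lower() for w in description.split()} - stop
    prompt_words = {w.strip(".,:\"'?").lower() for w in prompt.split()} - stop
    return len(desc_words & prompt_words) / max(len(prompt_words), 1)

description = ("Use when evaluating an agent, skill, workflow, or MCP server: "
               "rubric design, evaluator-optimizer loops, LLM-as-judge patterns, "
               "regression suites, or prototype-vs-production quality gaps.")

positive = "Help me design a rubric for evaluating my agent workflow"
negative = "What's the weather like in Paris today"

# A well-tuned description separates target phrasings from near-misses
assert trigger_score(description, positive) > trigger_score(description, negative)
```

If a near-miss phrasing scores as high as a target phrasing here, the description probably leans on generic words rather than domain-specific nouns.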

## Instructions

### Step 1 — Understand the design task

Before touching any file, clarify:

- Is this a new skill or improving an existing one?
- Is it capability uplift or encoded preference?
- What's the specific failure mode being fixed?
- What would passing look like?

If any of these are unclear, apply the clarification pattern from `references/clarification-patterns.md`.

### Step 2 — Choose the structure

- If the skill is simple (single task, single purpose): lean SKILL.md with no references
- If the skill is complex (multiple phases, conditional logic): SKILL.md + references loaded lazily
- If the skill has reusable commands: add a `commands/` directory

### Step 3 — Design the workflow

Use the pattern selection table above. Start with sequential. Prove you need complexity before adding it.

### Step 4 — Write the description

Write it last. Once you know what the skill does and how it differs from adjacent skills, the right description is usually obvious.

### Step 5 — Define a test

Write at least 3 test cases (input → expected output or behavior) before considering the skill done. These become the regression suite.

## Output Format

Deliver:

1. **Skill structure** — directory layout, file list
2. **SKILL.md** — production-ready with a lean body and reference table
3. **Reference files** — if needed, each scoped to a specific phase or topic
4. **Test cases** — 3-5 natural-language inputs with expected behaviors
5. **Description** — the final `description` field, tuned for triggering

## References

| File | Load when |
| --- | --- |
| `references/clarification-patterns.md` | Designing how the agent handles ambiguous or underspecified input |
| `references/workflow-patterns.md` | Choosing or implementing a sequential, parallel, or evaluator-optimizer workflow |
| `references/skill-testing.md` | Writing evals, regression sets, or triggering tests for a skill |

## Examples

- "Design a skill for our NDA review process — it should follow our checklist exactly."
- "The feature-forge skill triggers on the wrong prompts. Help me fix the description."
- "How do I test whether my skill still works after a model update?"
- "I need a workflow where 3 agents review code in parallel then one synthesizes findings."
- "This skill's SKILL.md is 4000 tokens. Help me split it into a lean structure with references."
@@ -0,0 +1,153 @@

# Clarification Patterns Reference

Load this when designing how an agent handles ambiguous, underspecified, or multi-interpretation input.

Source: Anthropic doc-coauthoring skill pattern + CBX ask-questions-if-underspecified research (2026).

---

## When to Clarify vs. When to Infer

The wrong default is to ask everything. The right default is to ask what genuinely branches the work.

**Clarify** when:

- Multiple plausible interpretations produce significantly different implementations
- The wrong interpretation wastes significant time or produces the wrong output
- A key parameter (scope, audience, constraint) changes the entire approach

**Infer and state assumptions** when:

- A quick read (repo structure, config file, existing code) can answer the question
- The request is clear for 90%+ of the obvious interpretations
- The user explicitly asked you to proceed

**Proceed without asking** when:

- The task is clear and unambiguous
- Discovery is faster than asking
- The cost of being slightly wrong is low and reversible

---

## The 1-5 Question Rule

Ask at most **5 questions** in the first pass. Prefer questions that eliminate entire branches of work.

If more than 5 things are unclear, rank by impact and ask the highest-impact ones first. More questions can surface after the user's first answers.

---

## Fast-Path Design

Every clarification block should have a fast path. Users who know what they want shouldn't wade through 5 questions.

**Include always:**

- A compact reply format: `"Reply 1b 2a 3c to accept these options"`
- Default options explicitly labeled: `(default)` or _bolded_
- A fast-path shortcut: `"Reply 'defaults' to accept all recommended choices"`

**Example block:**

```
Before I start, a few quick questions:

1. **Scope?**
   a) Only the requested function **(default)**
   b) Refactor any touched code
   c) Not sure — use default

2. **Framework target?**
   a) Match existing project **(default)**
   b) Specify: ___

3. **Test coverage?**
   a) None needed **(default)**
   b) Unit tests alongside
   c) Full integration test

Reply with numbers and letters (e.g., `1a 2a 3b`) or `defaults` to proceed with all defaults.
```
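A skill that interprets this compact reply format mechanically needs only a few lines. A hypothetical sketch (the defaults mapping mirrors the example block above; real option validation is omitted):

```python
def parse_reply(reply: str, defaults: dict[int, str]) -> dict[int, str]:
    """Parse a compact reply like '1a 2a 3b' or 'defaults'.

    Unanswered questions fall back to their default option.
    """
    choices = dict(defaults)
    if reply.strip().lower() == "defaults":
        return choices
    for token in reply.split():
        number, letter = int(token[:-1]), token[-1]  # e.g. '2a' -> (2, 'a')
        choices[number] = letter
    return choices

# Hypothetical defaults for the three questions in the example block
defaults = {1: "a", 2: "a", 3: "a"}
print(parse_reply("1b 3c", defaults))  # question 2 falls back to its default
print(parse_reply("defaults", defaults))
```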

---

## Three-Stage Context Gathering (for complex tasks)

Use this when a task is substantial enough that getting it wrong means significant wasted work. Borrowed from Anthropic's doc-coauthoring skill.

### Stage 1: Initial Questions (meta-context)

Ask 3-5 questions about the big-picture framing before touching the content:

- What type of deliverable is this? (spec, code, doc, design, plan)
- Who's the audience / consumer of this output?
- What's the definition of done — what would make this clearly successful?
- Are there constraints (framework, format, performance bar, audience knowledge level)?
- Is there an existing template or precedent to follow?

Tell the user they can answer in shorthand. Offer: "Or just dump your context and I'll ask follow-ups."

### Stage 2: Info Dump + Follow-up

After the initial answers, invite a full brain dump:

> "Dump everything you know about this — background, prior decisions, constraints, blockers, opinions. Don't organize it, just get it out."

Then ask targeted follow-up questions based on gaps in what they provided. Aim for 5-10 numbered follow-ups. Users can answer in shorthand (e.g., "1: yes, 2: see previous context, 3: no").

**Exit condition for Stage 2:** You understand the objective, the constraints, and at least one clear definition of success.

### Stage 3: Confirm Interpretation, Then Proceed

Restate the requirements in 1-3 sentences before starting work:

> "Here's my understanding: [objective in one sentence]. [Key constraint]. [What done looks like]. Starting now — let me know if anything's off."

---

## Reader Test (for deliverables)

When the deliverable is substantial (a plan, a document, a design decision), test it with a fresh context before handing it to the user.

**How:** Invoke a sub-agent or fresh prompt with only the deliverable (no conversation history) and ask:

- "What is this about?"
- "What are the key decisions made here?"
- "What's missing or unclear?"

If the fresh read surfaces gaps the user would have found, fix them first.

**When to use:** After generating complex plans, multi-section documents, architecture decisions, or any output that will be read by someone without conversation context.

---

## Clarification Anti-Patterns

Avoid these:

| Anti-pattern | Problem |
| --- | --- |
| Asking everything upfront | Overwhelms users; many questions are answerable by inference |
| Asking about things you can discover | Read the file/repo before asking about it |
| No default options | Forces users to reason through every option |
| Open-ended questions without choices | High friction; users don't know the option space |
| Not restating interpretation | The user doesn't know what you understood |
| Asking the same question twice | Signals you didn't read the answer |
| Asking about reversible decisions | Just pick one and move on; it can be changed |

---

## Decision: Which Pattern to Use

```
Is the task clear and unambiguous?
  → YES: Proceed. State assumptions inline if any.
  → NO: Is the missing info discoverable by reading files/code?
    → YES: Read first, then proceed or ask a single targeted question.
    → NO: Is this a quick task where a wrong interpretation is cheap?
      → YES: Proceed with stated assumptions, invite correction.
      → NO: Use the 1-5 Question Rule or Three-Stage Context Gathering.
```

Use Three-Stage Context Gathering only for substantial deliverables (docs, plans, architecture, complex features). For code tasks, the 1-5 Question Rule is usually sufficient.