@cubis/foundry 0.3.70 → 0.3.71

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (35)
  1. package/package.json +1 -1
  2. package/workflows/powers/ask-questions-if-underspecified/SKILL.md +51 -3
  3. package/workflows/powers/behavioral-modes/SKILL.md +100 -9
  4. package/workflows/skills/agent-design/SKILL.md +198 -0
  5. package/workflows/skills/agent-design/references/clarification-patterns.md +153 -0
  6. package/workflows/skills/agent-design/references/skill-testing.md +164 -0
  7. package/workflows/skills/agent-design/references/workflow-patterns.md +226 -0
  8. package/workflows/skills/deep-research/SKILL.md +25 -20
  9. package/workflows/skills/deep-research/references/multi-round-research-loop.md +73 -8
  10. package/workflows/skills/frontend-design/SKILL.md +37 -32
  11. package/workflows/skills/frontend-design/commands/brand.md +167 -0
  12. package/workflows/skills/frontend-design/references/brand-presets.md +228 -0
  13. package/workflows/skills/generated/skill-audit.json +11 -2
  14. package/workflows/skills/generated/skill-catalog.json +37 -5
  15. package/workflows/skills/skills_index.json +1 -1
  16. package/workflows/workflows/agent-environment-setup/platforms/claude/skills/agent-design/SKILL.md +198 -0
  17. package/workflows/workflows/agent-environment-setup/platforms/claude/skills/agent-design/references/clarification-patterns.md +153 -0
  18. package/workflows/workflows/agent-environment-setup/platforms/claude/skills/agent-design/references/skill-testing.md +164 -0
  19. package/workflows/workflows/agent-environment-setup/platforms/claude/skills/agent-design/references/workflow-patterns.md +226 -0
  20. package/workflows/workflows/agent-environment-setup/platforms/claude/skills/deep-research/SKILL.md +25 -20
  21. package/workflows/workflows/agent-environment-setup/platforms/claude/skills/deep-research/references/multi-round-research-loop.md +73 -8
  22. package/workflows/workflows/agent-environment-setup/platforms/claude/skills/frontend-design/SKILL.md +37 -32
  23. package/workflows/workflows/agent-environment-setup/platforms/claude/skills/frontend-design/commands/brand.md +167 -0
  24. package/workflows/workflows/agent-environment-setup/platforms/claude/skills/frontend-design/references/brand-presets.md +228 -0
  25. package/workflows/workflows/agent-environment-setup/platforms/claude/skills/skills_index.json +1 -1
  26. package/workflows/workflows/agent-environment-setup/platforms/copilot/skills/agent-design/SKILL.md +197 -0
  27. package/workflows/workflows/agent-environment-setup/platforms/copilot/skills/agent-design/references/clarification-patterns.md +153 -0
  28. package/workflows/workflows/agent-environment-setup/platforms/copilot/skills/agent-design/references/skill-testing.md +164 -0
  29. package/workflows/workflows/agent-environment-setup/platforms/copilot/skills/agent-design/references/workflow-patterns.md +226 -0
  30. package/workflows/workflows/agent-environment-setup/platforms/copilot/skills/deep-research/SKILL.md +25 -20
  31. package/workflows/workflows/agent-environment-setup/platforms/copilot/skills/deep-research/references/multi-round-research-loop.md +73 -8
  32. package/workflows/workflows/agent-environment-setup/platforms/copilot/skills/frontend-design/SKILL.md +37 -32
  33. package/workflows/workflows/agent-environment-setup/platforms/copilot/skills/frontend-design/commands/brand.md +167 -0
  34. package/workflows/workflows/agent-environment-setup/platforms/copilot/skills/frontend-design/references/brand-presets.md +228 -0
  35. package/workflows/workflows/agent-environment-setup/platforms/copilot/skills/skills_index.json +1 -1
@@ -0,0 +1,153 @@
+ # Clarification Patterns Reference
+
+ Load this when designing how an agent handles ambiguous, underspecified, or multi-interpretation input.
+
+ Source: Anthropic doc-coauthoring skill pattern + CBX ask-questions-if-underspecified research (2026).
+
+ ---
+
+ ## When to Clarify vs. When to Infer
+
+ The wrong default is to ask everything. The right default is to ask what genuinely branches the work.
+
+ **Clarify** when:
+
+ - Multiple plausible interpretations produce significantly different implementations
+ - The wrong interpretation wastes significant time or produces the wrong output
+ - A key parameter (scope, audience, constraint) changes the entire approach
+
+ **Infer and state assumptions** when:
+
+ - A quick read (repo structure, config file, existing code) can answer the question
+ - One reading covers 90%+ of the plausible interpretations
+ - The user explicitly asked you to proceed
+
+ **Proceed without asking** when:
+
+ - The task is clear and unambiguous
+ - Discovery is faster than asking
+ - The cost of being slightly wrong is low and reversible
+
+ ---
+
+ ## The 1-5 Question Rule
+
+ Ask at most **5 questions** in the first pass. Prefer questions that eliminate entire branches of work.
+
+ If more than 5 things are unclear, rank by impact and ask the highest-impact ones first. More questions surface after the user's first answers.
+
+ ---
+
+ ## Fast-Path Design
+
+ Every clarification block should have a fast path. Users who know what they want shouldn't wade through 5 questions.
+
+ **Always include:**
+ - A compact reply format: `"Reply 1b 2a 3c to accept these options"`
+ - Default options explicitly labeled: `(default)` or _bolded_
+ - A fast-path shortcut: `"Reply 'defaults' to accept all recommended choices"`
+
+ **Example block:**
+
+ ```
+ Before I start, a few quick questions:
+
+ 1. **Scope?**
+    a) Only the requested function **(default)**
+    b) Refactor any touched code
+    c) Not sure — use default
+
+ 2. **Framework target?**
+    a) Match existing project **(default)**
+    b) Specify: ___
+
+ 3. **Test coverage?**
+    a) None needed **(default)**
+    b) Unit tests alongside
+    c) Full integration test
+
+ Reply with numbers and letters (e.g., `1a 2a 3b`) or `defaults` to proceed with all defaults.
+ ```
+
+ ---
+
+ ## Three-Stage Context Gathering (for complex tasks)
+
+ Use this when a task is substantial enough that getting it wrong = significant wasted work. Borrowed from Anthropic's doc-coauthoring skill.
+
+ ### Stage 1: Initial Questions (meta-context)
+
+ Ask 3-5 questions about the big-picture framing before touching the content:
+
+ - What type of deliverable is this? (spec, code, doc, design, plan)
+ - Who's the audience / consumer of this output?
+ - What's the definition of done — what would make this clearly successful?
+ - Are there constraints (framework, format, performance bar, audience knowledge level)?
+ - Is there an existing template or precedent to follow?
+
+ Tell the user they can answer in shorthand. Offer: "Or just dump your context and I'll ask follow-ups."
+
+ ### Stage 2: Info Dump + Follow-up
+
+ After initial answers, invite a full brain dump:
+
+ > "Dump everything you know about this — background, prior decisions, constraints, blockers, opinions. Don't organize it, just get it out."
+
+ Then ask targeted follow-up questions based on gaps in what they provided. Aim for 5-10 numbered follow-ups. Users can use shorthand (e.g., "1: yes, 2: see previous context, 3: no").
+
+ **Exit condition for Stage 2:** You understand the objective, the constraints, and at least one clear definition of success.
+
+ ### Stage 3: Confirm Interpretation, Then Proceed
+
+ Restate the requirements in 1-3 sentences before starting work:
+
+ > "Here's my understanding: [objective in one sentence]. [Key constraint]. [What done looks like]. Starting now — let me know if anything's off."
+
+ ---
+
+ ## Reader Test (for deliverables)
+
+ When the deliverable is substantial (a plan, a document, a design decision), test it with a fresh context before handing it to the user.
+
+ **How:** Invoke a sub-agent or fresh prompt with only the deliverable (no conversation history) and ask:
+
+ - "What is this about?"
+ - "What are the key decisions made here?"
+ - "What's missing or unclear?"
+
+ If the fresh read surfaces gaps the user would have found, fix them first.
+
+ **When to use:** After generating complex plans, multi-section documents, architecture decisions, or any output that will be read by someone without conversation context.
+
+ ---
+
+ ## Clarification Anti-Patterns
+
+ Avoid these:
+
+ | Anti-pattern                         | Problem                                                      |
+ | ------------------------------------ | ------------------------------------------------------------ |
+ | Asking everything upfront            | Overwhelms users; many questions are answerable by inference |
+ | Asking about things you can discover | Read the file/repo before asking about it                    |
+ | No default options                   | Forces users to reason through every option                  |
+ | Open-ended questions without choices | High friction; users don't know the option space             |
+ | Not restating interpretation         | User doesn't know what you understood                        |
+ | Asking the same question twice       | Signals you didn't read the answer                           |
+ | Asking about reversible decisions    | Just pick one and move; it can be changed                    |
+
+ ---
+
+ ## Decision: Which Pattern to Use
+
+ ```
+ Is the task clear and unambiguous?
+ → YES: Proceed. State assumptions inline if any.
+ → NO: Is missing info discoverable by reading files/code?
+   → YES: Read first, then proceed or ask a single targeted question.
+   → NO: Is this a quick task where wrong interpretation is cheap?
+     → YES: Proceed with stated assumptions, invite correction.
+     → NO: Use the 1-5 Question Rule or Three-Stage Context Gathering.
+ ```
+
+ Use Three-Stage Context Gathering only for substantial deliverables (docs, plans, architecture, complex features). For code tasks, the 1-5 Question Rule is usually sufficient.
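The decision tree above can be sketched as a small function. This is a minimal illustration; the `Task` fields and the returned strategy labels are hypothetical names, not part of any CBX API:

```python
from dataclasses import dataclass

@dataclass
class Task:
    is_clear: bool        # clear and unambiguous?
    discoverable: bool    # can reading files/code supply the missing info?
    cheap_if_wrong: bool  # is a wrong interpretation cheap and reversible?
    substantial: bool     # substantial deliverable (doc, plan, architecture)?

def clarification_strategy(task: Task) -> str:
    """Mirror of the decision tree: returns which pattern to apply."""
    if task.is_clear:
        return "proceed"                    # state assumptions inline if any
    if task.discoverable:
        return "read-first"                 # read, then proceed or ask one question
    if task.cheap_if_wrong:
        return "proceed-with-assumptions"   # invite correction
    return "three-stage" if task.substantial else "1-5-questions"
```

The ordering of the checks matters: discovery is always tried before asking, and the heavyweight three-stage flow is reached only when every cheaper option is ruled out.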
@@ -0,0 +1,164 @@
+ # Skill Testing Reference
+
+ Load this when writing evals, regression sets, or description-triggering tests for a CBX skill.
+
+ Source: Anthropic skill-creator research — [Improving skill-creator: Test, measure, and refine Agent Skills](https://claude.com/blog/improving-skill-creator-test-measure-and-refine-agent-skills) (March 2026).
+
+ ---
+
+ ## Two Reasons to Test
+
+ 1. **Catch regressions** — As models and infrastructure evolve, skills that worked last month may behave differently. Evals give you an early signal before it impacts your team.
+ 2. **Know when the skill is obsolete** — For _capability uplift_ skills: if the base model starts passing your evals without the skill loaded, the skill has been incorporated into model behavior and can be retired.
+
+ ---
+
+ ## Five Test Categories
+
+ Every skill should pass all five before shipping.
+
+ ### 1. Trigger tests (description precision)
+
+ Does the skill load when it should — and stay quiet when it shouldn't?
+
+ **Method:**
+
+ - Write 5 natural-language prompts that _should_ trigger the skill
+ - Write 5 near-miss prompts that _should not_ trigger
+ - Load the skill and observe whether it activates
+
+ **Example for a frontend-design skill:**
+
+ ```
+ Should trigger:
+ - "Build me a landing page for my SaaS product"
+ - "Make this dashboard look less generic"
+ - "I need a color system for a health app"
+
+ Should NOT trigger:
+ - "Fix this TypeScript error"
+ - "Review my API endpoint design"
+ - "Help me write tests"
+ ```
+
+ **Fix:** If false positives occur, make the description more specific. If false negatives, broaden or add domain keywords.
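A trigger-test harness for the method above might look like the following sketch. `skill_triggers` is a purely hypothetical stand-in — a real harness would ask the agent platform whether the skill activates for a prompt; here a naive keyword match plays that role:

```python
def skill_triggers(description: str, prompt: str) -> bool:
    # Hypothetical stand-in for the platform's activation check:
    # naive keyword matching against the skill's domain terms.
    keywords = {"landing page", "dashboard", "color system"}
    return any(k in prompt.lower() for k in keywords)

def run_trigger_tests(description, should_trigger, should_not_trigger):
    """Return (false negatives, false positives) for the two prompt lists."""
    false_neg = [p for p in should_trigger if not skill_triggers(description, p)]
    false_pos = [p for p in should_not_trigger if skill_triggers(description, p)]
    return false_neg, false_pos

fn, fp = run_trigger_tests(
    "Use when designing frontend UI: landing pages, dashboards, color systems.",
    ["Build me a landing page for my SaaS product",
     "Make this dashboard look less generic",
     "I need a color system for a health app"],
    ["Fix this TypeScript error",
     "Review my API endpoint design",
     "Help me write tests"],
)
```

Non-empty `fn` means the description is too narrow; non-empty `fp` means it is too broad.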
+
+ ### 2. Happy path test
+
+ Does the skill complete its standard task correctly?
+
+ **Method:**
+
+ - Write the most common, straightforward version of the task the skill handles
+ - Run it and verify the output meets the expected criteria
+
+ ### 3. Edge case tests
+
+ What happens under abnormal or missing input?
+
+ Examples:
+
+ - Missing required information (no brand color, no framework specified)
+ - Ambiguous phrasing
+ - Conflicting requirements
+ - Very large or very small input
+ - The user ignored the clarification questions and just said "do it"
+
+ ### 4. Comparison test (A/B)
+
+ Does the skill actually improve output vs. no skill?
+
+ **Method:** Run the same prompt with and without the skill loaded. Judge which output is better — ideally with a fresh evaluator agent that doesn't know which is which.
+
+ If the no-skill output is equivalent, the skill adds no value (or the model has caught up to it).
+
+ ### 5. Reader test
+
+ Can someone with no conversation context understand the skill's output?
+
+ **Method:**
+
+ - Take the skill's final output (plan, document, code, design)
+ - Open a fresh conversation or use a sub-agent with only the output, no history
+ - Ask: "What is this?", "What are the key decisions?", "What's unclear?"
+
+ If the fresh reader struggles, the output has context bleed issues. Fix them before shipping.
+
+ ---
+
+ ## Writing Eval Cases
+
+ Each eval case = one input + expected behavior description.
+
+ **Format:**
+
+ ```
+ Input: [natural language prompt or file + prompt]
+ Expected:
+ - [Observable behavior 1]
+ - [Observable behavior 2]
+ - [Observable behavior 3 — what NOT to happen]
+ ```
+
+ **Example for `ask-questions-if-underspecified`:**
+
+ ```
+ Input: "Build me a feature."
+ Expected:
+ - Asks at least 1 clarifying question (scope, purpose, or constraints)
+ - Provides default options to choose from
+ - Does NOT immediately generate code
+ - Does NOT ask more than 5 questions
+ ```
+
+ **Rules:**
+
+ - Evals should be independent (not dependent on previous evals)
+ - Expected behavior should be observable and binary (pass/fail, not subjective)
+ - Aim for 5-10 evals per skill before shipping; 15+ for critical skills
+
+ ---
+
+ ## Benchmark Mode
+
+ Run all evals after a model update or after editing the skill:
+
+ 1. Run all evals sequentially, or in parallel sub-agents to avoid context bleed
+ 2. Record: pass rate, elapsed time per eval, token usage
+ 3. Compare to baseline before the change
+
+ **Pass rate thresholds:**
+
+ - < 60%: Skill has serious issues. Do not ship.
+ - 60-80%: Acceptable for early versions. Target improvement.
+ - > 80%: Production-ready.
+ - > 90%: Reliable enough for critical workflows.
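Recording pass rate against these thresholds can be as simple as the sketch below (the verdict labels follow the list above; the function names are illustrative):

```python
def pass_rate(results: list[bool]) -> float:
    """Fraction of evals that passed."""
    return sum(results) / len(results)

def verdict(rate: float) -> str:
    """Map a pass rate onto the shipping thresholds."""
    if rate < 0.60:
        return "do not ship"
    if rate < 0.80:
        return "acceptable for early versions"
    if rate < 0.90:
        return "production-ready"
    return "reliable for critical workflows"

# Example benchmark run: 7 of 8 evals passed.
rate = pass_rate([True, True, True, True, False, True, True, True])
```

Comparing `rate` against the pre-change baseline is what turns this from a score into a regression signal.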
+
+ ---
+
+ ## Description Tuning Process
+
+ If triggering is unreliable:
+
+ 1. List 10 prompts that should trigger the skill (write them as a user would)
+ 2. List 5 prompts of similar tasks that should _not_ trigger
+ 3. Find the distinguishing words/phrases between the two lists
+ 4. Rewrite the description to include the distinguishing words and exclude the overlap
+
+ **Pattern:**
+
+ ```yaml
+ description: "Use when [specific verb] [specific noun/domain]: [comma-separated task keywords]. NOT for [adjacent tasks that should not trigger]."
+ ```
+
+ ---
+
+ ## When to Retire a Skill
+
+ A skill is ready to retire when:
+
+ - 90%+ of its evals pass without the skill loaded (for capability uplift skills)
+ - The skill's instructions are now standard model behavior
+ - Maintenance cost exceeds value
+
+ Retiring isn't failure — it means the skill did its job and the model caught up.
@@ -0,0 +1,226 @@
+ # Workflow Patterns Reference
+
+ Load this when choosing or implementing a workflow pattern for a CBX agent or skill.
+
+ Source: Anthropic engineering research — [Common workflow patterns for AI agents](https://claude.com/blog/common-workflow-patterns-for-ai-agents-and-when-to-use-them) (March 2026).
+
+ ---
+
+ ## The Core Insight
+
+ Workflows don't replace agent autonomy — they _shape where and how_ agents apply it.
+
+ A fully autonomous agent decides everything: tools, order, when to stop.
+ A workflow provides structure: overall flow, checkpoints, boundaries — but each step still uses full agent reasoning.
+
+ **Start with a single agent call.** If that meets the quality bar, you're done. Only add workflow complexity when you can measure the improvement.
+
+ ---
+
+ ## Pattern 1: Sequential Workflow
+
+ ### What it is
+
+ Agents execute in a fixed order. Each stage processes its input, makes tool calls, then passes results to the next stage.
+
+ ```
+ Input → [Agent A] → [Agent B] → [Agent C] → Output
+ ```
+
+ ### Use when
+
+ - Steps have explicit dependencies (B needs A's output before starting)
+ - Multi-stage transformation where each step adds specific value
+ - Draft-review-polish cycles
+ - Data extraction → validation → loading pipelines
+
+ ### Avoid when
+
+ - A single agent can handle the whole task
+ - Agents need to collaborate rather than hand off linearly
+ - You're forcing sequential structure onto a task that doesn't naturally fit it
+
+ ### Cost/benefit
+
+ - **Cost:** Latency is linear — step 2 waits for step 1
+ - **Benefit:** Each agent focuses on one thing; accuracy often improves
+
+ ### CBX implementation
+
+ ```markdown
+ ## Workflow
+
+ 1. **[Agent/Step A]** — [what it receives, what it does, what it produces]
+ 2. **[Agent/Step B]** — [takes A's output, does X, produces Y]
+ 3. **[Agent/Step C]** — [final synthesis/delivery]
+
+ Artifacts pass via [file path / variable / structured JSON / natural handoff instructions].
+ ```
+
+ ### Pro tip
+
+ First try the pipeline as a single agent where the steps are part of the prompt. If quality is good enough, you've solved the problem without complexity.
+
+ ---
+
+ ## Pattern 2: Parallel Workflow
+
+ ### What it is
+
+ Multiple agents run simultaneously on independent tasks. Results are merged or synthesized afterward.
+
+ ```
+         ┌→ [Agent A] →┐
+ Input → ├→ [Agent B] →├→ Synthesize → Output
+         └→ [Agent C] →┘
+ ```
+
+ ### Use when
+
+ - Tasks are genuinely independent (no agent needs another's output to start)
+ - Speed matters and concurrent execution helps
+ - Multiple perspectives on the same input (e.g., code review from security + performance + quality)
+ - Separation of concerns — different engineers can own individual agents
+
+ ### Avoid when
+
+ - Agents need cumulative context or must build on each other's work
+ - Resource constraints (API quotas) make concurrent calls inefficient
+ - Aggregation logic is unclear or produces contradictory results with no resolution strategy
+
+ ### Cost/benefit
+
+ - **Cost:** Tokens multiply (N agents × tokens each); requires aggregation strategy
+ - **Benefit:** Faster completion; clean separation of concerns
+
+ ### CBX implementation
+
+ ```markdown
+ ## Parallel Steps
+
+ Run these simultaneously:
+
+ - **[Agent A]** — [focused task, specific scope]
+ - **[Agent B]** — [focused task, different scope]
+ - **[Agent C]** — [focused task, different scope]
+
+ ## Synthesis
+
+ After all agents complete:
+ [How to merge: majority vote / highest confidence / specialized agent defers to another / human review]
+ ```
+
+ ### Pro tip
+
+ Design your aggregation strategy _before_ implementing parallel agents. Without a clear merge plan, you collect conflicting outputs with no way to resolve them.
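A minimal sketch of the parallel pattern with a majority-vote merge, using only the standard library. The three `review_*` functions are placeholder agents, and the fixed verdicts are illustrative, not a real implementation:

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Placeholder agents: each returns a verdict for the same input.
def review_security(code: str) -> str:
    return "request-changes"

def review_performance(code: str) -> str:
    return "approve"

def review_quality(code: str) -> str:
    return "request-changes"

def parallel_review(code: str) -> str:
    agents = [review_security, review_performance, review_quality]
    # Run all reviewers concurrently on the same input.
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        verdicts = list(pool.map(lambda agent: agent(code), agents))
    # Aggregation strategy decided up front: majority vote.
    return Counter(verdicts).most_common(1)[0][0]
```

Note that the merge rule (`Counter.most_common`) is written before the agents run — the point the pro tip above makes.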
+
+ ---
+
+ ## Pattern 3: Evaluator-Optimizer Workflow
+
+ ### What it is
+
+ Two agents loop: one generates content, another evaluates it against criteria, and the generator refines based on feedback. Repeat until the quality threshold is met or max iterations are reached.
+
+ ```
+          ┌──────────── Feedback (fail) ────────────┐
+          ↓                                         │
+ Input → [Generator] → Draft → [Evaluator] → Pass? ─┘
+                                              │ yes
+                                              ↓
+                                            Output
+ ```
+
+ ### Use when
+
+ - First-draft quality consistently falls short of the required bar
+ - You have clear, measurable quality criteria an AI evaluator can apply consistently
+ - The gap between first-attempt and final quality justifies extra tokens and latency
+ - Examples: technical docs, customer communications, code against specific standards
+
+ ### Avoid when
+
+ - First-attempt quality already meets requirements (unnecessary cost)
+ - Real-time applications needing immediate responses
+ - Evaluation criteria are too subjective for consistent AI evaluation
+ - Deterministic tools exist (linters for style, validators for schemas) — use those instead
+
+ ### Cost/benefit
+
+ - **Cost:** Tokens × iterations; adds latency proportionally
+ - **Benefit:** Structured feedback loops produce measurably better outputs
+
+ ### CBX implementation
+
+ ```markdown
+ ## Generator Prompt
+
+ Task: [what to create]
+ Constraints: [specific, measurable requirements]
+ Format: [exact output format]
+
+ ## Evaluator Prompt
+
+ Review this output against these criteria:
+
+ 1. [Criterion A] — Pass/Fail + specific failure note
+ 2. [Criterion B] — Pass/Fail + specific failure note
+ 3. [Criterion C] — Pass/Fail + specific failure note
+
+ Output JSON: { "pass": bool, "failures": ["..."], "revision_note": "..." }
+
+ ## Loop Control
+
+ - Max iterations: [3-5]
+ - Stop when: all criteria pass OR max iterations reached
+ - On max with failures: surface remaining issues for human review
+ ```
+
+ ### Pro tip
+
+ Set stopping criteria _before_ iterating. Define max iterations and specific quality thresholds. Without guardrails, you enter expensive loops where the evaluator keeps finding minor issues and quality plateaus well before you stop.
+
+ ---
+
+ ## Decision Tree
+
+ ```
+ Can a single agent handle this task effectively?
+ → YES: Don't use workflows. Use a rich single-agent prompt.
+ → NO: Continue...
+
+ Do steps have dependencies (B needs A's output)?
+ → YES: Use Sequential
+ → NO: Continue...
+
+ Can steps run independently, and would concurrency help?
+ → YES: Use Parallel
+ → NO: Continue...
+
+ Does quality improve meaningfully through iteration, and can you measure it?
+ → YES: Use Evaluator-Optimizer
+ → NO: Re-examine whether workflows help at all
+ ```
+
+ ---
+
+ ## Combining Patterns
+
+ Patterns are building blocks, not mutually exclusive:
+
+ - A **sequential workflow** can include **parallel** steps at certain stages (e.g., three parallel reviewers before a final synthesis step)
+ - An **evaluator-optimizer** can use **parallel evaluation** where multiple evaluators assess different quality dimensions simultaneously
+ - A **sequential chain** can use **evaluator-optimizer** at the critical high-quality step
+
+ Only add the combination when each additional pattern measurably improves outcomes.
+
+ ---
+
+ ## Pattern Comparison
+
+ |                | Sequential                                   | Parallel                                | Evaluator-Optimizer                  |
+ | -------------- | -------------------------------------------- | --------------------------------------- | ------------------------------------ |
+ | **When**       | Dependencies between steps                   | Independent tasks                       | Quality below bar                    |
+ | **Examples**   | Extract → validate → load; Draft → translate | Code review (security + perf + quality) | Technical docs, comms, SQL           |
+ | **Latency**    | Linear (each waits for previous)             | Fast (concurrent)                       | Multiplied by iterations             |
+ | **Token cost** | Linear                                       | Multiplicative                          | Linear × iterations                  |
+ | **Key risk**   | Bottleneck at slow steps                     | Aggregation conflicts                   | Infinite loops without stop criteria |
@@ -1,10 +1,10 @@
  ---
  name: deep-research
- description: "Use when a task needs multi-round research rather than a quick lookup: iterative search, gap finding, corroboration across sources, contradiction handling, or evidence-led synthesis before planning or implementation."
+ description: "Use when a task needs multi-round research rather than a quick lookup: iterative search, gap finding, corroboration across sources, contradiction handling, evidence-led synthesis before planning or implementation. Also use when the user asks for 'deep research', 'latest info', or 'how does X compare to Y publicly'."
  license: MIT
  metadata:
    author: cubis-foundry
-   version: "1.0"
+   version: "1.1"
  compatibility: Claude Code, Codex, GitHub Copilot
  ---

@@ -14,23 +14,25 @@ compatibility: Claude Code, Codex, GitHub Copilot

  You are the specialist for iterative evidence gathering and synthesis.

- Your job is to find what is missing, not just summarize the first page of results.
+ Your job is to find what is missing, not just summarize the first page of results. Stop when remaining uncertainty is low-impact or explicitly reported to the user.

  ## When to Use

- - The task needs deep web or repo research before planning or implementation.
- - The first-pass answer is incomplete, contradictory, or likely stale.
- - The user explicitly asks for research, latest information, or public-repo comparison.
+ - The task needs deep web or repo research before planning or implementation
+ - The first-pass answer is incomplete, contradictory, or likely stale
+ - The user explicitly asks for research, latest information, or public-repo comparison
+ - Claims are contested or the topic changes fast (AI tooling, frameworks, protocols)

  ## Instructions

  ### STANDARD OPERATING PROCEDURE (SOP)

- 1. Define the question and what would count as enough evidence.
- 2. Run a first pass and identify gaps or contradictions.
+ 1. Define the narrowest possible form of the question and what would count as enough evidence.
+ 2. Run a first pass and identify gaps, contradictions, and missing facts.
  3. Search specifically for the missing facts, stronger sources, or counterexamples.
- 4. Rank sources by directness, recency, and authority.
- 5. Separate sourced facts, informed inference, and unresolved gaps.
+ 4. Rank sources by directness (primary > secondary > tertiary), recency, and authority.
+ 5. Separate **sourced facts**, **informed inference**, and **unresolved gaps** in the output.
+ 6. Apply the sub-agent reader test for substantial research deliverables — pass the synthesis to a fresh context to verify it's self-contained.

  ### Constraints

@@ -41,19 +43,22 @@ Your job is to find what is missing, not just summarize the first page of result

  ## Output Format

- Provide implementation guidance, code examples, and configuration as appropriate to the task.
+ Structure clearly as:

- ## References
-
- | File                                      | Load when                                                                                             |
- | ----------------------------------------- | ----------------------------------------------------------------------------------------------------- |
- | `references/multi-round-research-loop.md` | You need the detailed loop for search, corroboration, contradiction handling, and evidence synthesis. |
+ - **Key findings** — the answer, directly stated
+ - **Evidence** — sourced facts with citations ranked by confidence
+ - **Inference** — what follows logically from the evidence (labeled as inference)
+ - **Open questions** — what remains unresolved and why it matters

- ## Scripts
+ ## References

- No helper scripts are required for this skill right now. Keep execution in `SKILL.md` and `references/` unless repeated automation becomes necessary.
+ | File                                      | Load when                                                                                                                                                  |
+ | ----------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------- |
+ | `references/multi-round-research-loop.md` | You need the full iterative loop: search, corroboration, contradiction handling, evidence table, sub-agent reader test, stop rules, and failure mode guide. |

  ## Examples

- - "Help me with deep research best practices in this project"
- - "Review my deep research implementation for issues"
+ - "Research how Anthropic structures their agent skills compared to what CBX does"
+ - "What's the latest on evaluator-optimizer patterns in production agent systems?"
+ - "Deep research on OKLCH vs HSL for design systems — what do practitioners actually use?"
+ - "Find counterexamples to the claim that parallel agents always improve speed"