oh-my-customcode 0.37.2 → 0.38.0
- package/README.md +20 -5
- package/dist/cli/index.js +1189 -99
- package/dist/index.js +4 -1
- package/package.json +3 -1
- package/templates/.claude/hooks/hooks.json +23 -11
- package/templates/.claude/hooks/scripts/context-budget-advisor.sh +1 -1
- package/templates/.claude/hooks/scripts/eval-core-batch-save.sh +23 -0
- package/templates/.claude/hooks/scripts/session-env-check.sh +20 -0
- package/templates/.claude/hooks/scripts/stuck-detector.sh +1 -1
- package/templates/.claude/hooks/scripts/task-outcome-recorder.sh +13 -1
- package/templates/.claude/rules/MAY-optimization.md +12 -0
- package/templates/.claude/rules/MUST-agent-design.md +45 -7
- package/templates/.claude/rules/MUST-completion-verification.md +81 -0
- package/templates/.claude/rules/SHOULD-memory-integration.md +81 -0
- package/templates/.claude/skills/de-lead-routing/SKILL.md +8 -92
- package/templates/.claude/skills/deep-plan/SKILL.md +55 -4
- package/templates/.claude/skills/dev-lead-routing/SKILL.md +9 -21
- package/templates/.claude/skills/dev-refactor/SKILL.md +34 -1
- package/templates/.claude/skills/evaluator-optimizer/SKILL.md +53 -0
- package/templates/.claude/skills/qa-lead-routing/SKILL.md +7 -242
- package/templates/.claude/skills/research/SKILL.md +74 -7
- package/templates/.claude/skills/sauron-watch/SKILL.md +81 -0
- package/templates/.claude/skills/secretary-routing/SKILL.md +3 -18
- package/templates/.claude/skills/structured-dev-cycle/SKILL.md +20 -3
- package/templates/guides/claude-code/index.yaml +5 -0
- package/templates/manifest.json +3 -3
- package/templates/.claude/hooks/scripts/session-compliance-report.sh +0 -65
@@ -5,14 +5,14 @@ scope: core
 version: 1.0.0
 user-invocable: true
 argument-hint: "<topic-or-issue>"
-
+teams-compatible: true
 ---
 
 # Deep Plan Skill
 
 Research-validated planning that eliminates the gap between research assumptions and actual code. Orchestrates a 3-phase cycle: Discovery Research → Reality-Check Planning → Plan Verification.
 
-**
+**Teams-compatible** — works both from the main conversation (R010) and inside Agent Teams members. When used in Teams, the member directly executes the 3-phase workflow without Skill tool invocation.
 
 ## Usage
 
@@ -50,7 +50,11 @@ Phase 1: Discovery Research
 └── Output: research report (artifact)
 ```
 
-**Execution**:
+**Execution**:
+- **Orchestrator mode**: Delegates to `/research` skill via `Skill(research, args="<topic>")`.
+- **Teams mode**: Executes the research workflow inline (see Teams Mode section). The member spawns research teams directly as sub-agents.
+
+The executor waits for completion before proceeding to Phase 2.
 
 **Output**: Full research report with ADOPT/ADAPT/AVOID taxonomy.
 
@@ -251,7 +255,7 @@ Phase 1 delegation to `/research` means Agent Teams decisions are handled by the
 
 | Component | Integration |
 |-----------|-------------|
-| `/research` | Phase 1 full invocation + Phase 3 reduced invocation pattern |
+| `/research` | Phase 1 full invocation (via Skill tool or inline in Teams mode) + Phase 3 reduced invocation pattern |
 | EnterPlanMode/ExitPlanMode | Phase 2 plan creation and user approval |
 | Explore agents | Phase 2 codebase verification (up to 3 parallel) |
 | R009 | Phase 1 (10 teams batched), Phase 2 (3 Explore), Phase 3 (3 teams) |
@@ -271,6 +275,53 @@ Phase 1 delegation to `/research` means Agent Teams decisions are handled by the
 | Explore agent failure | Reduce parallel count, retry with remaining |
 | Partial team failure | Synthesize from available results, note gaps |
 
+## Teams Mode
+
+When running inside an Agent Teams member (not via Skill tool), the deep-plan workflow operates identically but with these adaptations:
+
+### How It Works
+
+The orchestrator reads this SKILL.md and includes the deep-plan instructions directly in the Teams member's prompt. The member then:
+
+1. Phase 1: Executes research workflow inline (not via `Skill(research)`) — spawns 10 research teams as sub-agents
+2. Phase 2: Uses EnterPlanMode/ExitPlanMode and Explore agents normally
+3. Phase 3: Spawns 3 verification teams as sub-agents
+4. Delivers final verified plan via `SendMessage` to team lead
+
+### Prompt Embedding Pattern
+
+```
+# When spawning a Teams member for deep-plan:
+Agent(
+  name: "planner-1",
+  team_name: "my-team",
+  prompt: """
+  You are a deep-plan agent. Follow the deep-plan skill workflow below:
+  {contents of deep-plan/SKILL.md}
+
+  Also follow this research workflow for Phase 1:
+  {contents of research/SKILL.md}
+
+  Topic: {user's planning topic}
+  Deliver verified plan via SendMessage to team lead when complete.
+  """
+)
+```
+
+### Differences from Orchestrator Mode
+
+| Aspect | Orchestrator Mode | Teams Mode |
+|--------|------------------|------------|
+| Invocation | `Skill(deep-plan)` | Prompt embedding |
+| Phase 1 research | `Skill(research)` | Inline execution |
+| Result delivery | Return to main conversation | `SendMessage` to team lead |
+| Plan approval | User via ExitPlanMode | Team lead via SendMessage |
+| Context isolation | Previously used `context: fork` | Standard context (no fork) |
+
+### Why No context: fork
+
+`context: fork` was removed to enable Teams compatibility. Fork blocks sub-agent spawning, which is essential for Phase 1 (10 research teams) and Phase 3 (3 verification teams). Without fork, deep-plan operates in the standard context, which is required for both orchestrator and Teams usage.
+
 ## Artifact Persistence
 
 Phase 1 research artifact is persisted by the `/research` skill.
@@ -18,6 +18,7 @@ context: fork
 | Tooling | tool-npm-expert, tool-optimizer, tool-bun-expert |
 | Database | db-supabase-expert, db-postgres-expert, db-redis-expert |
 | Architect | arch-documenter, arch-speckit-agent |
+| Security | sec-codeql-expert |
 | Infra | infra-docker-expert, infra-aws-expert |
 
 ## File Extension Mapping
@@ -67,6 +68,7 @@ context: fork
 | supabase, rls, edge function | db-supabase-expert |
 | docker, dockerfile, container, compose | infra-docker-expert |
 | aws, cloudformation, vpc, iam, s3, lambda, cdk, terraform | infra-aws-expert |
+| security, codeql, cve, vulnerability, sarif, sast, security audit | sec-codeql-expert |
 | architecture, adr, openapi, swagger, diagram | arch-documenter |
 | spec, specification, tdd, requirements | arch-speckit-agent |
 
@@ -97,10 +99,11 @@ Check if Agent Teams is available (`CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1` or T
 For **new file creation**, **boilerplate**, or **test code generation**:
 
 1. Check `/tmp/.claude-env-status-*` for codex availability
-2. If codex available →
+2. If codex available AND task involves new file creation → automatically delegate to `/codex-exec` for scaffolding:
+   - Display: `[Codex Hybrid] Delegating to codex-exec...`
    - codex-exec generates initial code (strength: fast generation)
-   - Claude expert reviews and refines (strength: reasoning, quality)
-3. If codex unavailable → use Claude expert directly
+   - Selected Claude expert reviews and refines codex output (strength: reasoning, quality)
+3. If codex unavailable → display `[Codex] Unavailable — proceeding with {expert} directly` and use Claude expert directly
 
 **Suitable**: New file creation, boilerplate, scaffolding, test code
 **Unsuitable**: Existing code modification, architecture decisions, bug fixes
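The availability gate in the hunk above can be sketched as a small shell helper. The marker-file contents (`codex: ok`) and the helper names (`codex_available`, `route`) are illustrative assumptions; the diff only names the `/tmp/.claude-env-status-*` glob, not its format.

```shell
# Sketch of the codex-hybrid routing gate described above (assumptions noted).
codex_available() {
  # True if any env-status marker file in $1 mentions codex.
  grep -qs "codex" "$1"/.claude-env-status-* 2>/dev/null
}

route() {
  local dir="$1" expert="$2"
  if codex_available "$dir"; then
    echo "[Codex Hybrid] Delegating to codex-exec..."
  else
    echo "[Codex] Unavailable — proceeding with ${expert} directly"
  fi
}

# Demo against a scratch directory instead of /tmp itself.
tmp=$(mktemp -d)
route "$tmp" "dev-python-expert"               # no marker → Claude expert path
echo "codex: ok" > "$tmp/.claude-env-status-1"
route "$tmp" "dev-python-expert"               # marker present → hybrid path
rm -rf "$tmp"
```

Both display strings match the ones the updated skill text specifies, so either branch is visible in the transcript.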
@@ -110,26 +113,11 @@ Route to appropriate language/framework expert based on file extension and keywo
 
 ### Step 4: Ontology-RAG Enrichment (R019)
 
-
+If `get_agent_for_task` MCP tool is available, call it with the original query and inject `suggested_skills` into the agent prompt. Skip silently on failure.
 
-
-2. Extract `suggested_skills` from response
-3. If `suggested_skills` non-empty, prepend to spawned agent prompt:
-   `"Ontology context suggests these skills may be relevant: {suggested_skills}"`
-4. On MCP failure: skip silently, proceed with unmodified prompt
+### Step 5: Soul Injection (R006)
 
-
-
-### Step 5: Soul Injection
-
-If the selected agent has `soul: true` in its frontmatter:
-
-1. Read `.claude/agents/souls/{agent-name}.soul.md`
-2. If file exists, prepend soul content to the agent's prompt:
-   `"Identity context:\n{soul content}\n\n---\n\n"`
-3. If file doesn't exist → skip silently (no error, no injection)
-
-**This step runs after ontology-RAG enrichment. Soul content is identity context, not capability instructions.**
+If the selected agent has `soul: true` in frontmatter, read and prepend `.claude/agents/souls/{agent-name}.soul.md` content to the prompt. Skip silently if file doesn't exist.
 
 ## Routing Rules
 
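The condensed soul-injection step above can be sketched in shell; the souls path and the `Identity context:` prefix come from the skill text, while `inject_soul` itself is a hypothetical helper name.

```shell
# Sketch of soul injection, assuming the path layout the skill text gives.
inject_soul() {
  local souls_dir="$1" agent="$2" prompt="$3"
  local soul_file="${souls_dir}/${agent}.soul.md"
  if [ -f "$soul_file" ]; then
    # Prepend identity context exactly as the skill describes.
    printf 'Identity context:\n%s\n\n---\n\n%s' "$(cat "$soul_file")" "$prompt"
  else
    # Missing soul file: skip silently, prompt unchanged.
    printf '%s' "$prompt"
  fi
}

dir=$(mktemp -d)
echo "Calm, methodical reviewer." > "$dir/qa-engineer.soul.md"
inject_soul "$dir" "qa-engineer" "Run the integration tests."
rm -rf "$dir"
```

The silent-skip branch is what makes the step safe to run unconditionally after ontology-RAG enrichment.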
@@ -2,7 +2,7 @@
 name: dev-refactor
 description: Refactor code for better structure and patterns
 scope: core
-argument-hint: "<file-or-directory> [--lang <language>]"
+argument-hint: "<file-or-directory> [--lang <language>] [--spec]"
 ---
 
 # Code Refactoring Skill
@@ -198,3 +198,36 @@ Summary:
 
 Recommendation: Run tests to verify changes.
 ```
+
+## Spec Mode (`--spec`)
+
+When the `--spec` flag is present, refactoring is guided by the target's canonical specification:
+
+### Workflow
+
+1. **Load spec**: Read `.claude/specs/<agent-name>.spec.md` (generated by `/omcustom:takeover`)
+   - If spec doesn't exist, run takeover first: `/omcustom:takeover <name>`
+2. **Extract invariants**: Parse the spec's `## Invariants` section as pre-flight guard constraints
+3. **Refactor**: Perform normal refactoring (per existing workflow)
+4. **Verify invariants**: After refactoring, check each invariant still holds:
+   ```
+   [Spec Verification]
+   ├── ✓ Invariant 1: {description} — PASS
+   ├── ✓ Invariant 2: {description} — PASS
+   └── ✗ Invariant 3: {description} — FAIL (reason)
+   ```
+5. **Regenerate spec**: If refactoring changed the contract, run `/omcustom:takeover <name>` to update
+
+### When to Use
+
+| Scenario | Use `--spec`? |
+|----------|--------------|
+| Refactoring agent internals | Yes — preserves declared invariants |
+| Renaming/restructuring skill | Yes — ensures contract stability |
+| Simple code cleanup | No — overhead not justified |
+| Adding new capability | No — spec will change anyway |
+
+### Prerequisites
+
+- `.claude/specs/<name>.spec.md` must exist (or will be auto-generated via takeover)
+- Target must be an agent or skill (not arbitrary code)
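Steps 1–2 of the spec-mode workflow above (load spec, extract invariants) can be sketched with a tiny section parser. The awk logic and the `extract_invariants` name are assumptions for illustration; only the `## Invariants` heading and spec path come from the diff.

```shell
# Sketch: pull guard constraints out of a spec's "## Invariants" section.
extract_invariants() {
  # Print bullet lines under "## Invariants" until the next "## " heading.
  awk '/^## Invariants/{f=1;next} /^## /{f=0} f && /^- /' "$1"
}

spec=$(mktemp)   # stand-in for .claude/specs/<name>.spec.md
cat > "$spec" <<'EOF'
## Invariants
- Frontmatter keeps `scope: core`
- Output format is unchanged
## Notes
- unrelated
EOF
extract_invariants "$spec"
rm -f "$spec"
```

Each extracted bullet would then be re-checked after the refactor to produce the PASS/FAIL report shown above.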
@@ -46,6 +46,14 @@ evaluator-optimizer:
 | `quality_gate.threshold` | No | `0.8` | Score threshold (for `score_threshold` type) |
 | `max_iterations` | No | `3` | Max refinement loops (hard cap: 5) |
 
+### Model Selection Guidance
+
+For model selection within the evaluator-optimizer loop, follow the [reasoning-sandwich](/skills/reasoning-sandwich) pattern:
+
+- **Generator**: Use `sonnet` (default) — optimized for content generation
+- **Evaluator**: Use `opus` (default) — benefits from stronger reasoning for quality assessment
+- **Override**: For simpler domains, `sonnet`/`sonnet` is acceptable; for critical domains, consider `opus`/`opus`
+
 ## Quality Gate Types
 
 | Type | Behavior |
@@ -148,6 +156,7 @@ The evaluator MUST return a structured verdict in this format:
 | Documentation | `arch-documenter` | opus reviewer | Completeness, clarity, accuracy |
 | Architecture | Plan agent | opus reviewer | No SPOFs, no circular deps |
 | Test plans | `qa-planner` | `qa-engineer` | Coverage, edge cases, feasibility |
+| Test coverage | `qa-writer` | `qa-engineer` + coverage tool | `coverage >= target%` |
 | Agent creation | `mgr-creator` | opus reviewer | Frontmatter validity, R006 compliance |
 | Security audit | `sec-codeql-expert` | opus reviewer | Vulnerability coverage, false positive rate |
 
@@ -208,6 +217,50 @@ evaluator-optimizer:
   max_iterations: 3
 ```
 
+### Domain: Test Coverage Optimization
+
+```yaml
+evaluator-optimizer:
+  generator:
+    agent: qa-writer
+    model: sonnet
+  evaluator:
+    agent: qa-engineer
+    model: sonnet
+  rubric:
+    - criterion: line_coverage
+      weight: 0.4
+      description: "Percentage of code lines exercised by tests"
+    - criterion: branch_coverage
+      weight: 0.3
+      description: "Percentage of conditional branches tested"
+    - criterion: edge_cases
+      weight: 0.2
+      description: "Critical edge cases explicitly tested"
+    - criterion: test_quality
+      weight: 0.1
+      description: "Tests are meaningful, not just hitting lines"
+  quality_gate:
+    type: score_threshold
+    threshold: 0.8
+  max_iterations: 5
+  parameters:
+    target_coverage: 80  # Minimum coverage percentage
+    max_iterations: 5    # Hard cap (matches skill-level cap)
+```
+
+**Workflow**:
+1. qa-writer generates test cases targeting uncovered code
+2. qa-engineer runs tests and measures coverage
+3. If coverage < target: qa-writer generates additional tests for uncovered paths
+4. Repeat until target reached or max_iterations exhausted
+
+**Parameters**:
+| Parameter | Default | Description |
+|-----------|---------|-------------|
+| `target_coverage` | 80% | Minimum acceptable coverage |
+| `max_iterations` | 5 | Hard cap on refinement loops |
+
 ## Integration
 
 | Rule | Integration |
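The four-step workflow added in the hunk above can be sketched as a shell loop over stub scores. The scores (scaled ×100 to stay in integer arithmetic) and the `run_loop` name are illustrative stand-ins for real qa-engineer verdicts; only the gate (0.8) and the iteration cap come from the config.

```shell
# Sketch of the refinement-loop semantics: stop on gate pass or iteration cap.
run_loop() {
  local threshold="$1" max_iter="$2"; shift 2
  local i=0 s
  for s in "$@"; do                 # one stub score per refinement round
    i=$((i + 1))
    if [ "$s" -ge "$threshold" ]; then
      echo "converged after $i iteration(s), score=$s"
      return 0
    fi
    [ "$i" -ge "$max_iter" ] && break
  done
  echo "gate not met after $i iteration(s)"
  return 1
}

run_loop 80 5 55 70 85   # rising stub scores: gate met on round 3
```

The cap check sits after the gate check, so a run that passes on exactly the last allowed round still converges.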
@@ -47,261 +47,26 @@ full_qa_cycle → all agents (sequential)
 
 ### Ontology-RAG Enrichment (R019)
 
-
+If `get_agent_for_task` MCP tool is available, call it with the original query and inject `suggested_skills` into the agent prompt. Skip silently on failure.
 
-
-2. Extract `suggested_skills` from response
-3. If `suggested_skills` non-empty, prepend to spawned agent prompt:
-   `"Ontology context suggests these skills may be relevant: {suggested_skills}"`
-4. On MCP failure: skip silently, proceed with unmodified prompt
+### Step 5: Soul Injection (R006)
 
-
+If the selected agent has `soul: true` in frontmatter, read and prepend `.claude/agents/souls/{agent-name}.soul.md` content to the prompt. Skip silently if file doesn't exist.
 
-
+## Sequential Workflow Ordering
 
-
-
-1. Read `.claude/agents/souls/{agent-name}.soul.md`
-2. If file exists, prepend soul content to the agent's prompt:
-   `"Identity context:\n{soul content}\n\n---\n\n"`
-3. If file doesn't exist → skip silently (no error, no injection)
-
-**This step runs after ontology-RAG enrichment. Soul content is identity context, not capability instructions.**
-
-## Routing Rules
-
-### 1. Test Planning
-
-```
-User: "Create test plan for feature X"
-
-Route:
-  Agent(qa-planner role → create test plan, model: "sonnet")
-
-Output:
-- Test scenarios
-- Coverage targets
-- Acceptance criteria
-- Risk assessment
-```
-
-### 2. Test Documentation
-
-```
-User: "Document test cases for API"
-
-Route:
-  Agent(qa-writer role → document test cases, model: "sonnet")
-
-Output:
-- Test case specifications
-- Test data requirements
-- Expected results
-- Test templates
-```
-
-### 3. Test Execution
-
-```
-User: "Execute tests for module Y"
-
-Route:
-  Agent(qa-engineer role → execute tests, model: "sonnet")
-
-Output:
-- Test execution results
-- Pass/fail metrics
-- Defect reports
-- Coverage reports
-```
-
-### 4. Quality Analysis
-
-When analysis is needed (parallel execution):
-
-```
-User: "Analyze quality metrics"
-
-Route (parallel):
-  Agent(qa-planner role → analyze strategy, model: "sonnet")
-  Agent(qa-engineer role → analyze results, model: "sonnet")
-
-Aggregate:
-  Strategy insights + execution data
-```
-
-### 5. Full QA Cycle (Sequential)
-
-For complete quality assurance workflow:
-
-```
-User: "Run full QA cycle for feature Z"
-
-Route (sequential):
-  1. Agent(qa-planner role → create test plan, model: "sonnet")
-  2. Agent(qa-writer role → document test cases, model: "sonnet")
-  3. Agent(qa-engineer role → execute tests, model: "sonnet")
-  4. Agent(qa-writer role → generate report, model: "sonnet")
-
-Aggregate and present final report
-```
-
-## Full QA Cycle Workflow
-
-```
-1. Planning Phase (qa-planner)
-   - Analyze requirements
-   - Define test scenarios
-   - Set acceptance criteria
-   - Identify risks
-
-2. Documentation Phase (qa-writer)
-   - Write test cases
-   - Define test data
-   - Document expected results
-   - Create templates
-
-3. Execution Phase (qa-engineer)
-   - Execute test cases
-   - Record results
-   - Report defects
-   - Calculate coverage
-
-4. Reporting Phase (qa-writer)
-   - Aggregate results
-   - Generate reports
-   - Document findings
-   - Provide recommendations
-
-5. Aggregation (qa-lead routing)
-   - Combine all phases
-   - Present unified status
-   - Highlight critical issues
-```
-
-## Sequential vs Parallel Execution
-
-### Sequential (typical for QA workflow)
-
-QA workflow is typically sequential because each phase depends on the previous:
-- Planning must complete before documentation
-- Documentation must complete before execution
-- Execution must complete before reporting
+Full QA cycle follows sequential phases (each depends on the previous):
 
 ```
 qa-planner → qa-writer → qa-engineer → qa-writer
   (plan)      (document)    (execute)     (report)
 ```
 
-
-
-Only when tasks are truly independent:
-- Quality analysis (strategy + results)
-- Multi-module testing (independent modules)
-
-```
-Example:
-  Agent(qa-engineer role → test module A, model: "sonnet")
-  Agent(qa-engineer role → test module B, model: "sonnet")
-  Agent(qa-engineer role → test module C, model: "sonnet")
-```
+Parallel execution only for independent analyses (e.g., multi-module testing). See R009.
 
 ## Sub-agent Model Selection
 
-
-
-| Agent | Recommended Model | Reason |
-|-------|-------------------|--------|
-| qa-planner | `sonnet` | Strategy requires balanced reasoning |
-| qa-writer | `sonnet` | Documentation quality matters |
-| qa-engineer | `sonnet` | Test execution needs accuracy |
-
-All QA agents typically use `sonnet` for balanced quality output.
-
-### Agent Call Examples
-
-```
-# Test planning
-Agent(
-  subagent_type: "general-purpose",
-  prompt: "Create comprehensive test plan for authentication feature following qa-planner guidelines",
-  model: "sonnet"
-)
-
-# Test documentation
-Agent(
-  subagent_type: "general-purpose",
-  prompt: "Document test cases for API endpoints following qa-writer guidelines",
-  model: "sonnet"
-)
-
-# Test execution
-Agent(
-  subagent_type: "general-purpose",
-  prompt: "Execute integration tests and report results following qa-engineer guidelines",
-  model: "sonnet"
-)
-```
-
-## Display Format
-
-### Full QA Cycle
-
-```
-[Planning] Delegating to qa-planner...
-  → Test plan created (15 scenarios)
-
-[Documentation] Delegating to qa-writer...
-  → 15 test cases documented
-
-[Execution] Delegating to qa-engineer...
-  → 13 passed, 2 failed
-
-[Report] Generating summary...
-  Coverage: 85%
-  Pass Rate: 87%
-  Defects: 2 (1 High, 1 Medium)
-
-[Done] QA cycle completed
-```
-
-### Parallel Quality Analysis
-
-```
-[Analyzing] Spawning parallel analysis...
-
-[Instance 1] strategy-analysis:sonnet → qa-planner
-[Instance 2] results-analysis:sonnet → qa-engineer
-
-[Progress] ████████████ 2/2
-
-[Summary]
-  Strategy: Coverage targets met, add edge cases
-  Results: 85% pass rate, 2 critical defects
-
-Analysis completed.
-```
-
-## Integration with Other Agents
-
-- **Receives requirements from**: arch-speckit-agent (sw-architect)
-- **Reports quality status to**: dev-lead
-- **Coordinates with**: Language experts for automated tests
-- **Provides feedback to**: Development team via dev-lead
-
-## Metrics Tracking
-
-QA lead routing should aggregate these metrics:
-
-```yaml
-metrics:
-  test_coverage: percentage
-  pass_rate: percentage
-  defect_count: number
-  defect_severity: [critical, high, medium, low]
-  execution_time: duration
-  test_case_count: number
-```
+All QA agents use `sonnet` by default for balanced quality output.
 
 ## No Match Fallback
 
@@ -3,13 +3,14 @@ name: research
 description: 10-team parallel deep analysis with cross-verification for any topic, repository, or technology. Use when user invokes /research or asks for comprehensive research.
 scope: core
 user-invocable: true
+teams-compatible: true
 ---
 
 # Research Skill
 
 Orchestrates 10 parallel research teams for comprehensive deep analysis of any topic, GitHub repository, or technology. Produces a structured report with ADOPT/ADAPT/AVOID taxonomy.
 
-**
+**Teams-compatible** — works both from the main conversation (R010) and inside Agent Teams members. When used in Teams, the member directly executes the research workflow without Skill tool invocation.
 
 ## Usage
 
@@ -162,8 +163,21 @@ Batch 3: T9, T10 (Innovation)
 
 ### Phase 2: Cross-Verification Loop (min 2, max 30 rounds)
 
+#### Codex Availability Check
+
+Before starting verification rounds, check codex availability:
+
+```bash
+# Run this check once before Phase 2 begins
+which codex &>/dev/null && [ -n "$OPENAI_API_KEY" ]
+# Exit 0 → codex available: enable dual-model verification (opus + codex)
+# Exit 1 → codex unavailable: display notice and proceed with opus-only
+```
+
+If unavailable, display: `[Phase 2] Codex unavailable — opus-only verification`
+
 ```
-Team findings ──→ opus 4.6 verification ──→ codex-exec xhigh verification
+Team findings ──→ opus 4.6 verification ──→ codex-exec xhigh verification (if available)
        │                                         │
        └── Contradiction detected? ── YES ──→ Round N+1
                                      NO ──→ Consensus reached → Phase 3
@@ -171,7 +185,8 @@ Team findings ──→ opus 4.6 verification ──→ codex-exec xhigh verific
 
 Each round:
 1. **opus 4.6**: Deep reasoning verification — checks logical consistency, identifies gaps, challenges assumptions
-2. **codex-exec xhigh** (
+2. **codex-exec xhigh** (when available): Independent code-level verification — validates technical claims, tests feasibility
+   - If unavailable: display `[Phase 2] Round {N}: Codex unavailable, proceeding with opus verification only`
 3. **Contradiction resolution**: Reconcile divergent findings between teams and verifiers
 4. **Convergence check**: All major claims verified with no outstanding contradictions → proceed
 
@@ -271,10 +286,16 @@ Round N:
 - Internal consistency (breadth ↔ depth alignment)
 - Cross-domain consistency (security ↔ architecture)
 - Evidence quality (claims without backing)
-Step 2: codex-exec validates technical claims:
-  -
-
-
+Step 2: codex-exec validates technical claims (when available):
+  a. Invoke: /codex-exec with findings from all teams
+  b. Prompt: "Validate technical claims: {findings}.
+     Check code patterns, benchmark reproducibility,
+     dependency resolution."
+  c. Effort: --effort xhigh
+  d. Parse: contradictions → merge with opus findings
+  e. On timeout/error: log "[Phase 2] Round {N}: codex-exec error — {reason},
+     continuing with opus results only"
+  If unavailable: log "[Phase 2] Round {N}: Codex unavailable, proceeding with opus verification only"
 Step 3: Compile contradiction list
 - 0 contradictions → CONVERGED
 - >0 contradictions → feedback to relevant teams → Round N+1
@@ -385,6 +406,52 @@ Progress:
 └── T9-T10: ○ Pending
 ```
 
+## Teams Mode
+
+When running inside an Agent Teams member (not via Skill tool), the research workflow operates identically but with these adaptations:
+
+### How It Works
+
+The orchestrator reads this SKILL.md and includes the research instructions directly in the Teams member's prompt. The member then:
+
+1. Executes Phase 1-4 autonomously using its own Agent tool access
+2. Spawns research teams as sub-agents (Teams members CAN spawn sub-agents)
+3. Delivers results via `SendMessage` to the team lead instead of returning to orchestrator
+
+### Prompt Embedding Pattern
+
+```
+# When spawning a Teams member for research:
+Agent(
+  name: "researcher-1",
+  team_name: "my-team",
+  prompt: """
+  You are a research agent. Follow the research skill workflow below:
+  {contents of research/SKILL.md}
+
+  Topic: {user's research topic}
+  Deliver results via SendMessage to team lead when complete.
+  """
+)
+```
+
+### Differences from Orchestrator Mode
+
+| Aspect | Orchestrator Mode | Teams Mode |
+|--------|------------------|------------|
+| Invocation | `Skill(research)` | Prompt embedding |
+| Result delivery | Return to main conversation | `SendMessage` to team lead |
+| Artifact persistence | Teams member writes artifact | Same |
+| GitHub issue creation | Orchestrator handles | Teams member handles directly |
+| Phase management | Orchestrator manages phases | Member manages phases autonomously |
+
+### Constraints
+
+- Each Teams member running research still respects R009 (max 4 concurrent sub-agents)
+- Batching order remains: T1-T4 → T5-T8 → T9-T10
+- Cost is identical to orchestrator mode (~$8-15 per research invocation)
+- Multiple Teams members running research simultaneously will multiply costs proportionally
+
 ## Integration
 
 | Rule | Integration |
|