guild-agents 1.5.0 → 2.1.0

Files changed (51)
  1. package/README.md +71 -67
  2. package/bin/guild.js +4 -85
  3. package/package.json +1 -1
  4. package/src/commands/doctor.js +11 -33
  5. package/src/commands/init.js +1 -1
  6. package/src/templates/skills/build-feature/SKILL.md +7 -38
  7. package/src/templates/skills/build-feature/evals/evals.json +2 -2
  8. package/src/templates/skills/council/SKILL.md +4 -14
  9. package/src/templates/skills/council/evals/evals.json +3 -13
  10. package/src/templates/skills/create-pr/SKILL.md +2 -5
  11. package/src/templates/skills/guild-specialize/SKILL.md +2 -5
  12. package/src/templates/skills/qa-cycle/SKILL.md +0 -7
  13. package/src/templates/skills/re-specialize/SKILL.md +0 -3
  14. package/src/templates/skills/session-end/SKILL.md +77 -30
  15. package/src/templates/skills/session-start/SKILL.md +51 -20
  16. package/src/utils/eval-runner.js +2 -8
  17. package/src/utils/generators.js +3 -4
  18. package/src/utils/skill-parser.js +83 -0
  19. package/src/utils/trigger-runner.js +1 -1
  20. package/src/commands/logs.js +0 -63
  21. package/src/commands/reset-learnings.js +0 -44
  22. package/src/commands/run.js +0 -105
  23. package/src/commands/stats.js +0 -147
  24. package/src/templates/agents/learnings-extractor.md +0 -49
  25. package/src/templates/skills/dev-flow/SKILL.md +0 -81
  26. package/src/templates/skills/dev-flow/evals/evals.json +0 -36
  27. package/src/templates/skills/dev-flow/evals/triggers.json +0 -16
  28. package/src/templates/skills/new-feature/SKILL.md +0 -119
  29. package/src/templates/skills/new-feature/evals/evals.json +0 -41
  30. package/src/templates/skills/new-feature/evals/triggers.json +0 -16
  31. package/src/templates/skills/review/SKILL.md +0 -97
  32. package/src/templates/skills/review/evals/evals.json +0 -43
  33. package/src/templates/skills/review/evals/triggers.json +0 -16
  34. package/src/templates/skills/status/SKILL.md +0 -100
  35. package/src/templates/skills/status/evals/evals.json +0 -40
  36. package/src/templates/skills/status/evals/triggers.json +0 -16
  37. package/src/templates/skills/verify/SKILL.md +0 -114
  38. package/src/templates/skills/verify/evals/triggers.json +0 -16
  39. package/src/utils/accounting.js +0 -139
  40. package/src/utils/dispatch-protocol.js +0 -71
  41. package/src/utils/dispatch.js +0 -172
  42. package/src/utils/executor.js +0 -293
  43. package/src/utils/learnings-io.js +0 -76
  44. package/src/utils/learnings.js +0 -204
  45. package/src/utils/orchestrator-io.js +0 -356
  46. package/src/utils/orchestrator.js +0 -590
  47. package/src/utils/pricing.js +0 -28
  48. package/src/utils/providers/claude-code.js +0 -43
  49. package/src/utils/skill-loader.js +0 -83
  50. package/src/utils/trace.js +0 -400
  51. package/src/utils/workflow-parser.js +0 -225
package/README.md CHANGED
@@ -7,7 +7,7 @@
7
7
 
8
8
  **Guild makes Claude Code think before it builds.**
9
9
 
10
- Guild is a spec-driven development CLI for Claude Code. It installs structured design and development workflows as `.claude/` markdown files in any project. Before code is written, features are evaluated, debated by independent AI perspectives, and specified in a design doc. Everything is markdown, tracked by git, works offline, zero infrastructure.
10
+ Without Guild, Claude Code writes code immediately. No evaluation, no design, no review. With Guild, every feature goes through structured phases -- evaluated by an advisor, designed by a tech lead, reviewed by a code reviewer, validated by QA -- before anything ships. Everything is markdown in `.claude/`, tracked by git, works offline, zero infrastructure.
11
11
 
12
12
  ## The Problem
13
13
 
@@ -20,10 +20,11 @@ Without structure, Claude Code:
20
20
 
21
21
  ## How Guild Solves It
22
22
 
23
- - **Spec before code**: every feature starts with a design doc
24
- - **Structured deliberation**: `/council` runs parallel independent analysis -- multiple perspectives evaluate independently, then synthesize
25
- - **Decisions that persist**: design docs, session state, and project context live in git-tracked markdown
26
- - **Zero infrastructure**: no servers, no APIs, just markdown files and Claude Code
23
+ - **Spec before code**: `/build-feature` enforces evaluation, design, and review phases — code comes after the design doc
24
+ - **Independent perspectives**: `/council` spawns parallel agents that each analyze your idea independently, then synthesize into a decision
25
+ - **Session continuity**: `/session-start` and `/session-end` combine SESSION.md with Claude Code's memory system — you never lose context between sessions
26
+ - **Behavioral discipline**: `/tdd` and `/debug` prevent the most common LLM anti-patterns: code before tests, fixes before root cause analysis
27
+ - **Quality you can measure**: `guild eval` validates skill structure, trigger accuracy, and description quality with automated benchmarks
27
28
 
28
29
  ## Quick Start
29
30
 
@@ -43,46 +44,56 @@ Then use skills as slash commands in Claude Code:
43
44
  ## The Pipeline
44
45
 
45
46
  ```text
46
- You ──> /council "Add JWT auth"
47
+ You ──> /build-feature "Add JWT auth"
47
48
 
48
49
 
49
- ┌──────────┐ ┌──────────────┐ ┌──────────┐
50
- │ Evaluate │────>│ Design Doc │────>│ Build │
51
- │ debate │ │ spec │ │ implement │
52
- └──────────┘ └──────────────┘ └────┬─────┘
53
-
54
- ┌─────┴─────┐
55
- ▼ ▼
56
- ┌──────────┐┌──────────┐
57
- │ Review ││ QA │
58
- └──────────┘└──────────┘
50
+ ┌──────────┐ ┌──────────┐ ┌──────────┐
51
+ │ Evaluate │────>│ Design │────>│ Build │
52
+ │ advisor │ │ tech-lead│ │developer │
53
+ └──────────┘ └──────────┘ └────┬─────┘
54
+
55
+ ┌─────┴─────┐
56
+ ▼ ▼
57
+ ┌──────────┐┌──────────┐
58
+ │ Review ││ QA │
59
+ └──────────┘└──────────┘
59
60
  ```
60
61
 
61
62
  Five phases: **evaluate**, **design**, **implement**, **review**, **validate**. Phases 1-2 happen before any code is written.
62
63
 
63
- ## Skills Reference
64
-
65
- All 15 skills, grouped by function:
66
-
67
- | Skill | Group | Description |
68
- | --- | --- | --- |
69
- | `/build-feature` | Pipeline | Full pipeline: evaluate, spec, implement, review, QA |
70
- | `/new-feature` | Pipeline | Create branch and scaffold for a new feature |
71
- | `/create-pr` | Pipeline | Create a structured pull request from current branch |
72
- | `/council` | Decision | Multi-perspective deliberation on a decision or feature |
73
- | `/review` | Quality | Code review on the current diff |
74
- | `/qa-cycle` | Quality | QA and bugfix loop until clean |
75
- | `/tdd` | Discipline | TDD red-green-refactor cycle |
76
- | `/debug` | Discipline | Systematic 4-phase debugging |
77
- | `/verify` | Discipline | Evidence-before-claims verification |
78
- | `/guild-specialize` | Context | Explore codebase, enrich CLAUDE.md with real conventions |
79
- | `/re-specialize` | Context | Incremental update of auto-generated CLAUDE.md zones |
80
- | `/session-start` | Context | Load context and resume work |
81
- | `/session-end` | Context | Save state to SESSION.md |
82
- | `/status` | Context | Project and session state overview |
83
- | `/dev-flow` | Context | Show current pipeline phase and next step |
84
-
85
- ## CLI Commands
64
+ ## Skills
65
+
66
+ 10 skills, available as slash commands in Claude Code:
67
+
68
+ | Skill | What it does |
69
+ | --- | --- |
70
+ | `/build-feature` | Full pipeline: evaluate, design, implement, review, QA |
71
+ | `/council` | Multi-perspective deliberation -- 3 agents debate independently, then synthesize |
72
+ | `/create-pr` | Structured pull request from current branch |
73
+ | `/qa-cycle` | QA and bugfix loop until clean |
74
+ | `/tdd` | TDD red-green-refactor -- no code without a failing test |
75
+ | `/debug` | Systematic 4-phase debugging -- no fixes without root cause |
76
+ | `/guild-specialize` | Explore your codebase, enrich CLAUDE.md with real conventions |
77
+ | `/re-specialize` | Incremental update of CLAUDE.md when your stack changes |
78
+ | `/session-start` | Resume work from SESSION.md + Claude Code memory |
79
+ | `/session-end` | Save state to SESSION.md + durable learnings to memory |
80
+
81
+ ## Agents
82
+
83
+ 6 specialized roles that give Claude Code distinct perspectives:
84
+
85
+ | Agent | Role |
86
+ | --- | --- |
87
+ | advisor | Evaluates ideas and provides strategic direction. First gate before any work begins |
88
+ | tech-lead | Breaks features into tasks. Defines technical approach and architecture |
89
+ | developer | Implements features following project conventions. Writes tests, makes atomic commits |
90
+ | code-reviewer | Reviews quality, patterns, and technical debt |
91
+ | qa | Testing, edge cases, regression. Validates the implementation meets acceptance criteria |
92
+ | bugfix | Diagnosis and bug resolution. Isolates root causes and applies targeted fixes |
93
+
94
+ Each agent is a flat `.md` file with identity, responsibilities, and boundaries. Claude Code reads them via its native Agent tool and assumes the role.
95
+
96
+ ## CLI
86
97
 
87
98
  ```bash
88
99
  guild init # Interactive project onboarding
@@ -90,45 +101,38 @@ guild new-agent <name> # Create a custom agent
90
101
  guild status # Show project status
91
102
  guild doctor # Diagnose setup
92
103
  guild list # List agents and skills
93
- guild run <skill> # Preview a skill's execution plan (dry-run)
94
- guild logs # View execution traces
95
- guild logs clean # Remove old traces (--days N, --all)
96
- guild stats # Token usage and cost estimates
97
104
  guild eval # Run structural skill evaluations
98
- guild eval --triggers # Run trigger accuracy tests (keyword matcher)
99
- guild eval --semantic # Run trigger tests with LLM semantic matcher
100
- guild eval --suggest # Show description improvement suggestions
101
- guild workspace init <name> <members...> # Create a workspace
102
- guild workspace add <path> # Add a member repo
103
- guild workspace status # Show workspace state
105
+ guild eval --triggers # Run trigger accuracy tests
106
+ guild eval --semantic # LLM-based trigger tests (requires ANTHROPIC_API_KEY)
107
+ guild eval --suggest # Description improvement suggestions
108
+ guild workspace init # Create a multi-repo workspace
104
109
  ```
105
110
 
106
111
  ## Skill Evaluations
107
112
 
108
- Guild includes a built-in evaluation framework for validating skill quality:
113
+ Guild includes a built-in framework for measuring skill quality:
109
114
 
110
- - **Structural evals** (`guild eval`) -- assert workflow structure: steps exist, roles are correct, gates are present
111
- - **Trigger tests** (`guild eval --triggers`) -- verify that user prompts route to the correct skill using keyword overlap scoring
112
- - **Semantic matcher** (`guild eval --semantic`) -- optional LLM-based scoring via Anthropic Haiku for higher-fidelity trigger testing (requires `ANTHROPIC_API_KEY`)
113
- - **Description suggestions** (`guild eval --suggest`) -- analyzes keyword gaps in skill descriptions based on failed triggers
115
+ - **Structural evals** -- assert workflow structure: steps exist, roles are correct, gates are present
116
+ - **Trigger tests** -- verify that user prompts route to the correct skill
117
+ - **Semantic matcher** -- LLM-based scoring for higher-fidelity trigger testing
118
+ - **Benchmarks** -- rolling history with per-skill accuracy, precision, recall, and regression detection
114
119
 
115
- Every trigger run automatically records results to `benchmarks/benchmark.json` (rolling 30-entry history) and generates `benchmarks/benchmark.md` with per-skill accuracy, precision, recall, and delta vs previous run. Regressions (>5% accuracy drop with 2+ tests flipped) are flagged automatically.
120
+ ## How It Works
116
121
 
117
- ## Under the Hood
122
+ Guild installs agent definitions and skill workflows as markdown files in your project's `.claude/` directory. Claude Code discovers and executes them natively — no custom runtime, no extra process, no API calls. When you type `/build-feature`, Claude Code reads the skill, follows the phases, and spawns agents using its own Agent tool.
118
123
 
119
- Guild coordinates 7 specialized agents through the pipeline. Each agent handles one phase.
124
+ Guild defines **what** happens. Claude Code decides **how** to execute it.
120
125
 
121
- | Agent | Role |
122
- | --- | --- |
123
- | advisor | Evaluates ideas and provides strategic direction |
124
- | tech-lead | Defines technical approach, tasks, and architecture |
125
- | developer | Implements features following project conventions |
126
- | code-reviewer | Reviews quality, patterns, and technical debt |
127
- | qa | Testing, edge cases, regression validation |
128
- | bugfix | Bug diagnosis and resolution |
129
- | learnings-extractor | Extracts compound learnings from pipeline executions |
126
+ ## Session Continuity
127
+
128
+ Claude Code's native memory system remembers who you are, lessons learned, and project context -- knowledge that lasts months. But it explicitly does not store ephemeral work state: what you were building, which branch, what phase, what's next. That's the gap Guild fills.
129
+
130
+ `/session-end` writes to **both layers**:
131
+
132
+ - **SESSION.md** -- where you stopped: task, branch, phase, next steps (overwritten each session)
133
+ - **Claude Code memory** -- what you learned: decisions, lessons, references (persists across sessions)
130
134
 
131
- Agents are flat `.md` files with identity and expertise. Skills orchestrate agents through structured pipelines. Everything lives in `.claude/`, readable by humans, tracked by git.
135
+ `/session-start` reads from **both** and presents a unified summary. You resume exactly where you left off, with full context of what you know and what you were doing.
132
136
 
133
137
  ## Guild Builds Itself
134
138
 
package/bin/guild.js CHANGED
@@ -1,14 +1,15 @@
1
1
  #!/usr/bin/env node
2
2
 
3
3
  /**
4
- * Guild v1 — CLI entry point
4
+ * Guild v2 — CLI entry point
5
5
  * Usage:
6
- * guild init — interactive onboarding v1
6
+ * guild init — interactive onboarding
7
7
  * guild new-agent — create a new agent
8
8
  * guild status — view project status
9
9
  * guild doctor — verify setup and report issues
10
10
  * guild list — list installed agents and skills
11
- * guild stats -- view token usage and cost stats
11
+ * guild eval -- run skill structural evaluations
12
+ * guild workspace — manage multi-repo workspaces
12
13
  */
13
14
 
14
15
  import { program } from 'commander';
@@ -106,69 +107,6 @@ program
106
107
  }
107
108
  });
108
109
 
109
- // guild reset-learnings
110
- program
111
- .command('reset-learnings')
112
- .description('Reset the compound learnings file')
113
- .option('-f, --force', 'Skip confirmation prompt')
114
- .action(async (options) => {
115
- try {
116
- const { runResetLearnings } = await import('../src/commands/reset-learnings.js');
117
- await runResetLearnings(options);
118
- } catch (err) {
119
- console.error(err.message);
120
- process.exit(1);
121
- }
122
- });
123
-
124
- // guild run
125
- program
126
- .command('run')
127
- .description('Execute a skill workflow')
128
- .argument('<skill>', 'Skill name to run')
129
- .argument('[input]', 'Input text for the skill', '')
130
- .option('--profile <profile>', 'Model profile (max, pro)', 'max')
131
- .option('--dry-run', 'Display the execution plan without running it')
132
- .action(async (skill, input, options) => {
133
- try {
134
- const { runRun } = await import('../src/commands/run.js');
135
- await runRun(skill, input, options);
136
- } catch (err) {
137
- console.error(err.message);
138
- process.exit(1);
139
- }
140
- });
141
-
142
- // guild logs (list traces)
143
- const logsCmd = program
144
- .command('logs')
145
- .description('View and manage execution traces')
146
- .action(async (options) => {
147
- try {
148
- const { runLogs } = await import('../src/commands/logs.js');
149
- await runLogs('list', options);
150
- } catch (err) {
151
- console.error(err.message);
152
- process.exit(1);
153
- }
154
- });
155
-
156
- // guild logs clean
157
- logsCmd
158
- .command('clean')
159
- .description('Remove old execution traces')
160
- .option('--days <days>', 'Remove traces older than N days', '30')
161
- .option('--all', 'Remove all traces')
162
- .action(async (options) => {
163
- try {
164
- const { runLogs } = await import('../src/commands/logs.js');
165
- await runLogs('clean', options);
166
- } catch (err) {
167
- console.error(err.message);
168
- process.exit(1);
169
- }
170
- });
171
-
172
110
  // guild eval
173
111
  program
174
112
  .command('eval')
@@ -195,25 +133,6 @@ program
195
133
  }
196
134
  });
197
135
 
198
- // guild stats
199
- program
200
- .command('stats')
201
- .description('View token usage stats and cost estimates')
202
- .option('--period <period>', 'Filter by period: today, week, month, all', 'month')
203
- .option('--compare', 'Compare cost across model profiles')
204
- .option('--reset', 'Delete all usage history')
205
- .option('-f, --force', 'Skip confirmation prompt (for --reset)')
206
- .option('--export <format>', 'Export data (csv)')
207
- .action(async (options) => {
208
- try {
209
- const { runStats } = await import('../src/commands/stats.js');
210
- await runStats(options);
211
- } catch (err) {
212
- console.error(err.message);
213
- process.exit(1);
214
- }
215
- });
216
-
217
136
  // guild workspace
218
137
  const workspaceCmd = program
219
138
  .command('workspace')
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "guild-agents",
3
- "version": "1.5.0",
3
+ "version": "2.1.0",
4
4
  "description": "Specification-driven development CLI for Claude Code — think before you build",
5
5
  "type": "module",
6
6
  "files": [
package/src/commands/doctor.js CHANGED
@@ -4,10 +4,10 @@
4
4
 
5
5
  import * as p from '@clack/prompts';
6
6
  import chalk from 'chalk';
7
- import { existsSync, readdirSync } from 'fs';
7
+ import { existsSync, readdirSync, readFileSync } from 'fs';
8
8
  import { join } from 'path';
9
9
  import { resolveProjectRoot } from '../utils/files.js';
10
- import { loadAllSkills } from '../utils/skill-loader.js';
10
+ import { parseSkill } from '../utils/skill-parser.js';
11
11
 
12
12
  export async function runDoctor() {
13
13
  const root = resolveProjectRoot();
@@ -86,27 +86,26 @@ export async function runDoctor() {
86
86
 
87
87
  // Check workflow validation in skills
88
88
  if (existsSync(skillsDir)) {
89
- const skills = loadAllSkills(skillsDir);
89
+ const skillDirs = readdirSync(skillsDir, { withFileTypes: true })
90
+ .filter(d => d.isDirectory())
91
+ .filter(d => existsSync(join(skillsDir, d.name, 'SKILL.md')));
92
+
90
93
  let workflowCount = 0;
91
94
  let workflowErrors = 0;
92
95
  const errorDetails = [];
93
96
 
94
- for (const [name, skill] of skills) {
97
+ for (const dir of skillDirs) {
98
+ const content = readFileSync(join(skillsDir, dir.name, 'SKILL.md'), 'utf8');
99
+ const skill = parseSkill(content);
100
+
95
101
  if (skill.workflow) {
96
102
  workflowCount++;
97
- if (skill.errors.length > 0) {
98
- workflowErrors++;
99
- errorDetails.push(`${name}: ${skill.errors.join('; ')}`);
100
- }
101
- }
102
103
 
103
- // Check that agent references exist
104
- if (skill.workflow) {
105
104
  for (const step of skill.workflow.steps) {
106
105
  if (step.role !== 'system' && step.role !== 'dynamic') {
107
106
  const agentPath = join(agentsDir, `${step.role}.md`);
108
107
  if (!existsSync(agentPath)) {
109
- errorDetails.push(`${name}: step "${step.id}" references agent "${step.role}" — agent not found`);
108
+ errorDetails.push(`${dir.name}: step "${step.id}" references agent "${step.role}" — agent not found`);
110
109
  workflowErrors++;
111
110
  }
112
111
  }
@@ -124,27 +123,6 @@ export async function runDoctor() {
124
123
  });
125
124
  healthy = false;
126
125
  }
127
- // If workflowCount === 0, don't add a check (no workflows to validate)
128
-
129
- // Check for dual-format skills (workflow frontmatter + body step/phase headings)
130
- // Matches "### Step 1", "## Phase 2", etc. — requires digit after Step/Phase
131
- const STEP_PHASE_RE = /^#{1,3}\s+(Step|Phase)\s+\d/im;
132
- const dualFormatWarnings = [];
133
-
134
- for (const [name, skill] of skills) {
135
- if (skill.workflow && skill.body && STEP_PHASE_RE.test(skill.body)) {
136
- dualFormatWarnings.push(name);
137
- }
138
- }
139
-
140
- if (dualFormatWarnings.length > 0) {
141
- checks.push({
142
- name: `Dual-format skills (${dualFormatWarnings.length} warning(s))`,
143
- pass: true,
144
- warn: true,
145
- detail: `Skills with both workflow frontmatter and body step/phase headings: ${dualFormatWarnings.join(', ')}. Workflow steps take precedence — consider removing prose steps from body.`,
146
- });
147
- }
148
126
  }
149
127
 
150
128
  // Display results
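The doctor.js change above replaces the `loadAllSkills` map with a direct directory scan followed by a per-file parse. A minimal sketch of that flow, assuming a stand-in parser: `parseSkillStub` and `findMissingAgents` are hypothetical names introduced here for illustration, because the body of `parseSkill` in `src/utils/skill-parser.js` is not shown in this diff. The sketch uses CommonJS `require` so it runs as a standalone script; the package itself uses ES module imports.

```javascript
// Sketch only: parseSkillStub is a hypothetical stand-in for parseSkill,
// whose real implementation is not part of the visible diff.
const { existsSync, readdirSync, readFileSync,
        mkdtempSync, mkdirSync, writeFileSync } = require('node:fs');
const { join } = require('node:path');
const { tmpdir } = require('node:os');

// Hypothetical parser: collects "- id:" / "role:" pairs from the frontmatter.
function parseSkillStub(content) {
  const steps = [];
  for (const line of content.split('\n')) {
    const id = line.match(/^\s*-\s+id:\s*(\S+)/);
    const role = line.match(/^\s*role:\s*(\S+)/);
    if (id) steps.push({ id: id[1], role: null });
    else if (role && steps.length > 0) steps[steps.length - 1].role = role[1];
  }
  return { workflow: steps.length > 0 ? { steps } : null };
}

// Mirrors the 2.1.0 loop: scan directories that contain SKILL.md, parse each
// file, and verify that every non-system, non-dynamic role has an agent file.
function findMissingAgents(skillsDir, agentsDir) {
  const errors = [];
  const skillDirs = readdirSync(skillsDir, { withFileTypes: true })
    .filter(d => d.isDirectory())
    .filter(d => existsSync(join(skillsDir, d.name, 'SKILL.md')));

  for (const dir of skillDirs) {
    const content = readFileSync(join(skillsDir, dir.name, 'SKILL.md'), 'utf8');
    const skill = parseSkillStub(content);
    if (!skill.workflow) continue;
    for (const step of skill.workflow.steps) {
      if (step.role === 'system' || step.role === 'dynamic') continue;
      if (!existsSync(join(agentsDir, `${step.role}.md`))) {
        errors.push(`${dir.name}: step "${step.id}" references agent "${step.role}"`);
      }
    }
  }
  return errors;
}

// Throwaway fixture: one skill, one installed agent, one missing agent.
const root = mkdtempSync(join(tmpdir(), 'guild-doctor-'));
const skillsDir = join(root, 'skills');
const agentsDir = join(root, 'agents');
mkdirSync(join(skillsDir, 'demo'), { recursive: true });
mkdirSync(agentsDir, { recursive: true });
writeFileSync(join(agentsDir, 'developer.md'), '# developer\n');
writeFileSync(join(skillsDir, 'demo', 'SKILL.md'), [
  '---', 'workflow:', '  steps:',
  '    - id: implement', '      role: developer',
  '    - id: qa-phase', '      role: qa',
  '    - id: gate-final', '      role: system',
  '---', ''].join('\n'));

const missing = findMissingAgents(skillsDir, agentsDir);
console.log(missing);
```

In this fixture only the `qa-phase` step is reported, since `developer.md` exists and `system` steps are skipped, which matches the role filter visible in the diff.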
package/src/commands/init.js CHANGED
@@ -135,7 +135,7 @@ export async function runInit() {
135
135
 
136
136
  const relevantSkills = projectData.hasExistingCode
137
137
  ? ['/guild-specialize', '/council', '/build-feature']
138
- : ['/council', '/build-feature', '/new-feature'];
138
+ : ['/council', '/build-feature'];
139
139
  p.log.info(`Start with: ${relevantSkills.join(' ')}`);
140
140
 
141
141
  const quickStart = projectData.hasExistingCode
package/src/templates/skills/build-feature/SKILL.md CHANGED
@@ -11,14 +11,12 @@ workflow:
11
11
  requires: [feature-description]
12
12
  produces: [evaluation-report, verdict]
13
13
  model-tier: reasoning
14
- on-failure: abort
15
14
  - id: design
16
15
  role: tech-lead
17
16
  intent: "Break the feature into concrete tasks with acceptance criteria. Define implementation approach: files to modify, patterns to follow, interfaces, and technical risks."
18
17
  requires: [feature-description, evaluation-report]
19
18
  produces: [task-list, acceptance-criteria, technical-plan]
20
19
  model-tier: reasoning
21
- condition: step.evaluate.verdict != rejected
22
20
  - id: implement
23
21
  role: developer
24
22
  intent: "Implement the feature following the technical plan. Write unit tests. Make atomic commits."
@@ -28,13 +26,11 @@ workflow:
28
26
  - id: gate-pre-review
29
27
  role: system
30
28
  intent: "Run project tests and lint. Both must pass before review."
31
- commands: [npm test, npm run lint]
32
29
  gate: true
33
30
  produces: [gate-pre-review-result]
34
- on-failure: goto:implement
35
- - id: checkpoint-phase4
31
+ - id: checkpoint
36
32
  role: system
37
- intent: "Create checkpoint commit and write partial pipeline trace (phases 1-4) to spec file."
33
+ intent: "Create checkpoint commit and write partial pipeline trace to spec file."
38
34
  requires: [implementation, gate-pre-review-result]
39
35
  produces: [checkpoint-commit]
40
36
  gate: true
@@ -44,56 +40,29 @@ workflow:
44
40
  requires: [implementation, gate-pre-review-result]
45
41
  produces: [review-report]
46
42
  model-tier: reasoning
47
- retry:
48
- max: 2
49
- on: has-blockers
50
43
  - id: fix-review-blockers
51
44
  role: developer
52
45
  intent: "Fix blocker findings from code review. Run tests after fixing."
53
46
  requires: [review-report]
54
47
  produces: [implementation]
55
48
  model-tier: execution
56
- condition: step.review.has-blockers
57
- on-failure: goto:review
58
49
  - id: qa-phase
59
- role: system
60
- intent: "Run QA validation with bugfix cycles."
61
- delegates-to: qa-cycle
50
+ role: qa
51
+ intent: "Validate acceptance criteria, test edge cases, run bugfix cycles if needed."
62
52
  requires: [acceptance-criteria, implementation]
63
53
  produces: [qa-report]
64
- retry:
65
- max: 2
66
- on: has-bugs
67
- - id: post-qa-review
68
- role: code-reviewer
69
- intent: "Review changes introduced during QA bugfix cycles."
70
- requires: [qa-report, implementation]
71
- produces: [post-qa-review-report]
72
- model-tier: reasoning
73
- condition: step.qa-phase.had-significant-changes
74
- retry:
75
- max: 2
76
- on: has-blockers
54
+ model-tier: execution
77
55
  - id: gate-final
78
56
  role: system
79
57
  intent: "Run project tests and lint as final verification. Both must pass."
80
- commands: [npm test, npm run lint]
81
58
  gate: true
82
59
  produces: [final-gate-result]
83
- on-failure: goto:qa-phase
84
60
  - id: completion
85
61
  role: system
86
- intent: "Write complete pipeline trace to spec file. Update SESSION.md. Present summary to user."
62
+ intent: "Update SESSION.md. Present summary to user."
87
63
  requires: [final-gate-result, review-report, qa-report]
88
- produces: [pipeline-trace, session-update]
64
+ produces: [session-update]
89
65
  gate: true
90
- - id: extract-learnings
91
- role: learnings-extractor
92
- intent: "Extract compound learnings from this pipeline execution."
93
- requires: [pipeline-trace]
94
- produces: [updated-learnings]
95
- model-tier: routine
96
- blocking: false
97
66
  ---
98
67
 
99
68
  # Build Feature
package/src/templates/skills/build-feature/evals/evals.json CHANGED
@@ -43,9 +43,9 @@
43
43
  },
44
44
  {
45
45
  "id": "bf-minimum-steps",
46
- "description": "Plan has at least 10 steps",
46
+ "description": "Plan has at least 8 steps",
47
47
  "expectations": [
48
- { "text": "At least 10 steps", "assertion": "step-count:10" }
48
+ { "text": "At least 8 steps", "assertion": "step-count:8" }
49
49
  ]
50
50
  }
51
51
  ]
package/src/templates/skills/council/SKILL.md CHANGED
@@ -11,33 +11,24 @@ workflow:
11
11
  requires: [user-question]
12
12
  produces: [council-type, participant-roles]
13
13
  gate: true
14
- - id: workspace-context
15
- role: system
16
- intent: "Detect workspace membership. If in a workspace, collect context from sibling repos (CLAUDE.md, PROJECT.md, SESSION.md) and build workspace context block."
17
- requires: [council-type]
18
- produces: [workspace-context]
19
- condition: in-workspace
20
14
  - id: agent-1
21
15
  role: dynamic
22
- intent: "Analyze the question from specialized perspective. State position with concrete arguments."
23
- requires: [user-question, council-type, workspace-context]
16
+ intent: "Analyze the question from specialized perspective. State position with concrete arguments. Spawn via Agent tool IN PARALLEL with agent-2 and agent-3."
17
+ requires: [user-question, council-type]
24
18
  produces: [perspective-1]
25
19
  model-tier: reasoning
26
- parallel: [agent-2, agent-3]
27
20
  - id: agent-2
28
21
  role: dynamic
29
22
  intent: "Analyze the question from specialized perspective. State position with concrete arguments."
30
- requires: [user-question, council-type, workspace-context]
23
+ requires: [user-question, council-type]
31
24
  produces: [perspective-2]
32
25
  model-tier: reasoning
33
- parallel: [agent-1, agent-3]
34
26
  - id: agent-3
35
27
  role: dynamic
36
28
  intent: "Analyze the question from specialized perspective. State position with concrete arguments."
37
- requires: [user-question, council-type, workspace-context]
29
+ requires: [user-question, council-type]
38
30
  produces: [perspective-3]
39
31
  model-tier: reasoning
40
- parallel: [agent-1, agent-2]
41
32
  - id: synthesize
42
33
  role: system
43
34
  intent: "Synthesize debate: points of agreement, disagreement, risks. Present options to user."
@@ -49,7 +40,6 @@ workflow:
49
40
  intent: "After user decides, write spec document to docs/specs/."
50
41
  requires: [synthesis, user-decision]
51
42
  produces: [spec-document]
52
- condition: user-wants-spec
53
43
  gate: true
54
44
  ---
55
45
 
package/src/templates/skills/council/evals/evals.json CHANGED
@@ -2,15 +2,12 @@
2
2
  "skill": "council",
3
3
  "evals": [
4
4
  {
5
- "id": "council-three-parallel-agents",
6
- "description": "Council has 3 agent steps in parallel",
5
+ "id": "council-three-agents",
6
+ "description": "Council has 3 agent steps",
7
7
  "expectations": [
8
8
  { "text": "Agent-1 exists", "assertion": "step-exists:agent-1" },
9
9
  { "text": "Agent-2 exists", "assertion": "step-exists:agent-2" },
10
- { "text": "Agent-3 exists", "assertion": "step-exists:agent-3" },
11
- { "text": "Agent-1 is parallel", "assertion": "step-parallel:agent-1" },
12
- { "text": "Agent-2 is parallel", "assertion": "step-parallel:agent-2" },
13
- { "text": "Agent-3 is parallel", "assertion": "step-parallel:agent-3" }
10
+ { "text": "Agent-3 exists", "assertion": "step-exists:agent-3" }
14
11
  ]
15
12
  },
16
13
  {
@@ -29,13 +26,6 @@
29
26
  { "text": "Synthesize step exists", "assertion": "step-exists:synthesize" },
30
27
  { "text": "Synthesize has gate", "assertion": "gate-exists:synthesize" }
31
28
  ]
32
- },
33
- {
34
- "id": "council-workspace-context",
35
- "description": "Workspace context step exists with condition",
36
- "expectations": [
37
- { "text": "Workspace-context step exists", "assertion": "step-exists:workspace-context" }
38
- ]
39
29
  }
40
30
  ]
41
31
  }
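The eval files above express expectations as `kind:argument` strings such as `step-exists:agent-1`, `gate-exists:synthesize`, and `step-count:8`. A minimal sketch of how such strings could be checked against a parsed workflow follows. `checkAssertion` is a hypothetical helper, not the logic in `src/utils/eval-runner.js`, and reading `step-count:N` as "at least N steps" is an assumption taken from the accompanying description text ("At least 8 steps").

```javascript
// Sketch only: checkAssertion is a hypothetical evaluator, not Guild's eval-runner.
function checkAssertion(workflow, assertion) {
  const sep = assertion.indexOf(':');
  const kind = assertion.slice(0, sep);
  const arg = assertion.slice(sep + 1);
  // For step-* and gate-* assertions the argument is a step id.
  const step = workflow.steps.find(s => s.id === arg);

  switch (kind) {
    case 'step-exists':
      return step !== undefined;
    case 'gate-exists':
      return step !== undefined && step.gate === true;
    case 'step-count':
      // Assumed semantics: the workflow has at least N steps.
      return workflow.steps.length >= Number(arg);
    default:
      throw new Error(`Unknown assertion kind: ${kind}`);
  }
}

// Illustrative workflow using only step ids that appear in the council diff.
const councilWorkflow = {
  steps: [
    { id: 'agent-1', role: 'dynamic' },
    { id: 'agent-2', role: 'dynamic' },
    { id: 'agent-3', role: 'dynamic' },
    { id: 'synthesize', role: 'system', gate: true },
  ],
};

const results = [
  'step-exists:agent-1',
  'step-exists:agent-3',
  'gate-exists:synthesize',
  'step-exists:workspace-context',
  'step-count:8',
].map(a => [a, checkAssertion(councilWorkflow, a)]);

console.log(results);
```

Against this four-step fixture the `workspace-context` and `step-count:8` checks fail, which illustrates why removing a step from a SKILL.md in this release goes together with removing or relaxing the matching eval.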
package/src/templates/skills/create-pr/SKILL.md CHANGED
@@ -8,12 +8,10 @@ workflow:
8
8
  - id: verify-branch
9
9
  role: system
10
10
  intent: "Verify not on main/develop, check for uncommitted changes, get commits ahead of main."
11
- commands: [git branch --show-current, git status, git log main..HEAD --oneline]
12
11
  produces: [branch-name, branch-state, commit-list]
13
12
  - id: gather-context
14
13
  role: system
15
14
  intent: "Collect diff stats, run tests and lint for PR description context."
16
- commands: [git diff main..HEAD --stat, npm test, npm run lint]
17
15
  requires: [branch-state]
18
16
  produces: [diff-summary, test-result, lint-result]
19
17
  - id: generate-description
@@ -25,7 +23,6 @@ workflow:
25
23
  - id: create-pr
26
24
  role: system
27
25
  intent: "Push branch to origin and create PR via gh CLI."
28
- commands: [git push -u origin, gh pr create]
29
26
  requires: [pr-description, pr-title, branch-name]
30
27
  produces: [pr-url]
31
28
  - id: post-creation
@@ -102,7 +99,7 @@ Build a structured PR description:
102
99
  1. Display the PR URL
103
100
  2. Suggest next steps:
104
101
  - "Request review from a teammate"
105
- - "Run `/review` for an AI code review"
102
+ - "Run `/code-review` for an AI code review"
106
103
  - "Merge when ready with `gh pr merge [number]`"
107
104
 
108
105
  ## Example Session
@@ -121,7 +118,7 @@ https://github.com/org/repo/pull/42
121
118
 
122
119
  Next steps:
123
120
  - Request review from a teammate
124
- - Run /review for AI code review
121
+ - Run /code-review for AI code review
125
122
  - Merge when ready
126
123
  ```
127
124