guild-agents 1.5.0 → 2.1.0
- package/README.md +71 -67
- package/bin/guild.js +4 -85
- package/package.json +1 -1
- package/src/commands/doctor.js +11 -33
- package/src/commands/init.js +1 -1
- package/src/templates/skills/build-feature/SKILL.md +7 -38
- package/src/templates/skills/build-feature/evals/evals.json +2 -2
- package/src/templates/skills/council/SKILL.md +4 -14
- package/src/templates/skills/council/evals/evals.json +3 -13
- package/src/templates/skills/create-pr/SKILL.md +2 -5
- package/src/templates/skills/guild-specialize/SKILL.md +2 -5
- package/src/templates/skills/qa-cycle/SKILL.md +0 -7
- package/src/templates/skills/re-specialize/SKILL.md +0 -3
- package/src/templates/skills/session-end/SKILL.md +77 -30
- package/src/templates/skills/session-start/SKILL.md +51 -20
- package/src/utils/eval-runner.js +2 -8
- package/src/utils/generators.js +3 -4
- package/src/utils/skill-parser.js +83 -0
- package/src/utils/trigger-runner.js +1 -1
- package/src/commands/logs.js +0 -63
- package/src/commands/reset-learnings.js +0 -44
- package/src/commands/run.js +0 -105
- package/src/commands/stats.js +0 -147
- package/src/templates/agents/learnings-extractor.md +0 -49
- package/src/templates/skills/dev-flow/SKILL.md +0 -81
- package/src/templates/skills/dev-flow/evals/evals.json +0 -36
- package/src/templates/skills/dev-flow/evals/triggers.json +0 -16
- package/src/templates/skills/new-feature/SKILL.md +0 -119
- package/src/templates/skills/new-feature/evals/evals.json +0 -41
- package/src/templates/skills/new-feature/evals/triggers.json +0 -16
- package/src/templates/skills/review/SKILL.md +0 -97
- package/src/templates/skills/review/evals/evals.json +0 -43
- package/src/templates/skills/review/evals/triggers.json +0 -16
- package/src/templates/skills/status/SKILL.md +0 -100
- package/src/templates/skills/status/evals/evals.json +0 -40
- package/src/templates/skills/status/evals/triggers.json +0 -16
- package/src/templates/skills/verify/SKILL.md +0 -114
- package/src/templates/skills/verify/evals/triggers.json +0 -16
- package/src/utils/accounting.js +0 -139
- package/src/utils/dispatch-protocol.js +0 -71
- package/src/utils/dispatch.js +0 -172
- package/src/utils/executor.js +0 -293
- package/src/utils/learnings-io.js +0 -76
- package/src/utils/learnings.js +0 -204
- package/src/utils/orchestrator-io.js +0 -356
- package/src/utils/orchestrator.js +0 -590
- package/src/utils/pricing.js +0 -28
- package/src/utils/providers/claude-code.js +0 -43
- package/src/utils/skill-loader.js +0 -83
- package/src/utils/trace.js +0 -400
- package/src/utils/workflow-parser.js +0 -225
package/README.md
CHANGED
@@ -7,7 +7,7 @@
 
 **Guild makes Claude Code think before it builds.**
 
-Guild
+Without Guild, Claude Code writes code immediately. No evaluation, no design, no review. With Guild, every feature goes through structured phases — evaluated by an advisor, designed by a tech lead, reviewed by a code reviewer, validated by QA — before anything ships. Everything is markdown in `.claude/`, tracked by git, works offline, zero infrastructure.
 
 ## The Problem
 
@@ -20,10 +20,11 @@ Without structure, Claude Code:
 
 ## How Guild Solves It
 
-- **Spec before code**:
-- **
-- **
-- **
+- **Spec before code**: `/build-feature` enforces evaluation, design, and review phases — code comes after the design doc
+- **Independent perspectives**: `/council` spawns parallel agents that each analyze your idea independently, then synthesize into a decision
+- **Session continuity**: `/session-start` and `/session-end` combine SESSION.md with Claude Code's memory system — you never lose context between sessions
+- **Behavioral discipline**: `/tdd` and `/debug` prevent the most common LLM anti-patterns: code before tests, fixes before root cause analysis
+- **Quality you can measure**: `guild eval` validates skill structure, trigger accuracy, and description quality with automated benchmarks
 
 ## Quick Start
 
@@ -43,46 +44,56 @@ Then use skills as slash commands in Claude Code:
 ## The Pipeline
 
 ```text
-You ──> /
+You ──> /build-feature "Add JWT auth"
      │
      ▼
-┌──────────┐
-│ Evaluate │────>│ Design
-│
-└──────────┘
-
-
-
-
-
-
+┌──────────┐     ┌──────────┐     ┌──────────┐
+│ Evaluate │────>│  Design  │────>│  Build   │
+│ advisor  │     │ tech-lead│     │developer │
+└──────────┘     └──────────┘     └────┬─────┘
+                                       │
+                                 ┌─────┴─────┐
+                                 ▼           ▼
+                            ┌──────────┐┌──────────┐
+                            │  Review  ││    QA    │
+                            └──────────┘└──────────┘
 ```
 
 Five phases: **evaluate**, **design**, **implement**, **review**, **validate**. Phases 1-2 happen before any code is written.
 
-## Skills
-
-
-
-| Skill |
-| --- | --- |
-| `/build-feature` |
-| `/
-| `/create-pr` |
-| `/
-| `/
-| `/
-| `/
-| `/
-| `/
-| `/
-
-
-
-
-
-
-
+## Skills
+
+10 skills, available as slash commands in Claude Code:
+
+| Skill | What it does |
+| --- | --- |
+| `/build-feature` | Full pipeline: evaluate, design, implement, review, QA |
+| `/council` | Multi-perspective deliberation — 3 agents debate independently, then synthesize |
+| `/create-pr` | Structured pull request from current branch |
+| `/qa-cycle` | QA and bugfix loop until clean |
+| `/tdd` | TDD red-green-refactor — no code without a failing test |
+| `/debug` | Systematic 4-phase debugging — no fixes without root cause |
+| `/guild-specialize` | Explore your codebase, enrich CLAUDE.md with real conventions |
+| `/re-specialize` | Incremental update of CLAUDE.md when your stack changes |
+| `/session-start` | Resume work from SESSION.md + Claude Code memory |
+| `/session-end` | Save state to SESSION.md + durable learnings to memory |
+
+## Agents
+
+6 specialized roles that give Claude Code distinct perspectives:
+
+| Agent | Role |
+| --- | --- |
+| advisor | Evaluates ideas and provides strategic direction. First gate before any work begins |
+| tech-lead | Breaks features into tasks. Defines technical approach and architecture |
+| developer | Implements features following project conventions. Writes tests, makes atomic commits |
+| code-reviewer | Reviews quality, patterns, and technical debt |
+| qa | Testing, edge cases, regression. Validates the implementation meets acceptance criteria |
+| bugfix | Diagnosis and bug resolution. Isolates root causes and applies targeted fixes |
+
+Each agent is a flat `.md` file with identity, responsibilities, and boundaries. Claude Code reads them via its native Agent tool and assumes the role.
+
+## CLI
 
 ```bash
 guild init # Interactive project onboarding
@@ -90,45 +101,38 @@ guild new-agent <name> # Create a custom agent
 guild status # Show project status
 guild doctor # Diagnose setup
 guild list # List agents and skills
-guild run <skill> # Preview a skill's execution plan (dry-run)
-guild logs # View execution traces
-guild logs clean # Remove old traces (--days N, --all)
-guild stats # Token usage and cost estimates
 guild eval # Run structural skill evaluations
-guild eval --triggers # Run trigger accuracy tests
-guild eval --semantic #
-guild eval --suggest #
-guild workspace init
-guild workspace add <path> # Add a member repo
-guild workspace status # Show workspace state
+guild eval --triggers # Run trigger accuracy tests
+guild eval --semantic # LLM-based trigger tests (requires ANTHROPIC_API_KEY)
+guild eval --suggest # Description improvement suggestions
+guild workspace init # Create a multi-repo workspace
 ```
 
 ## Skill Evaluations
 
-Guild includes a built-in
+Guild includes a built-in framework for measuring skill quality:
 
-- **Structural evals**
-- **Trigger tests**
-- **Semantic matcher**
-- **
+- **Structural evals** -- assert workflow structure: steps exist, roles are correct, gates are present
+- **Trigger tests** -- verify that user prompts route to the correct skill
+- **Semantic matcher** -- LLM-based scoring for higher-fidelity trigger testing
+- **Benchmarks** -- rolling history with per-skill accuracy, precision, recall, and regression detection
 
-
+## How It Works
 
-
+Guild installs agent definitions and skill workflows as markdown files in your project's `.claude/` directory. Claude Code discovers and executes them natively — no custom runtime, no extra process, no API calls. When you type `/build-feature`, Claude Code reads the skill, follows the phases, and spawns agents using its own Agent tool.
 
-Guild
+Guild defines **what** happens. Claude Code decides **how** to execute it.
 
-
-
-
-
-
-
-
-
-| learnings-extractor | Extracts compound learnings from pipeline executions |
+## Session Continuity
+
+Claude Code's native memory system remembers who you are, lessons learned, and project context — knowledge that lasts months. But it explicitly does not store ephemeral work state: what you were building, which branch, what phase, what's next. That's the gap Guild fills.
+
+`/session-end` writes to **both layers**:
+
+- **SESSION.md** — where you stopped: task, branch, phase, next steps (overwritten each session)
+- **Claude Code memory** — what you learned: decisions, lessons, references (persists across sessions)
 
-
+`/session-start` reads from **both** and presents a unified summary. You resume exactly where you left off, with full context of what you know and what you were doing.
 
 ## Guild Builds Itself
 
package/bin/guild.js
CHANGED
@@ -1,14 +1,15 @@
 #!/usr/bin/env node
 
 /**
- * Guild
+ * Guild v2 — CLI entry point
  * Usage:
- * guild init — interactive onboarding
+ * guild init — interactive onboarding
  * guild new-agent — create a new agent
  * guild status — view project status
  * guild doctor — verify setup and report issues
  * guild list — list installed agents and skills
- * guild
+ * guild eval — run skill structural evaluations
+ * guild workspace — manage multi-repo workspaces
  */
 
 import { program } from 'commander';
@@ -106,69 +107,6 @@ program
     }
   });
 
-// guild reset-learnings
-program
-  .command('reset-learnings')
-  .description('Reset the compound learnings file')
-  .option('-f, --force', 'Skip confirmation prompt')
-  .action(async (options) => {
-    try {
-      const { runResetLearnings } = await import('../src/commands/reset-learnings.js');
-      await runResetLearnings(options);
-    } catch (err) {
-      console.error(err.message);
-      process.exit(1);
-    }
-  });
-
-// guild run
-program
-  .command('run')
-  .description('Execute a skill workflow')
-  .argument('<skill>', 'Skill name to run')
-  .argument('[input]', 'Input text for the skill', '')
-  .option('--profile <profile>', 'Model profile (max, pro)', 'max')
-  .option('--dry-run', 'Display the execution plan without running it')
-  .action(async (skill, input, options) => {
-    try {
-      const { runRun } = await import('../src/commands/run.js');
-      await runRun(skill, input, options);
-    } catch (err) {
-      console.error(err.message);
-      process.exit(1);
-    }
-  });
-
-// guild logs (list traces)
-const logsCmd = program
-  .command('logs')
-  .description('View and manage execution traces')
-  .action(async (options) => {
-    try {
-      const { runLogs } = await import('../src/commands/logs.js');
-      await runLogs('list', options);
-    } catch (err) {
-      console.error(err.message);
-      process.exit(1);
-    }
-  });
-
-// guild logs clean
-logsCmd
-  .command('clean')
-  .description('Remove old execution traces')
-  .option('--days <days>', 'Remove traces older than N days', '30')
-  .option('--all', 'Remove all traces')
-  .action(async (options) => {
-    try {
-      const { runLogs } = await import('../src/commands/logs.js');
-      await runLogs('clean', options);
-    } catch (err) {
-      console.error(err.message);
-      process.exit(1);
-    }
-  });
-
 // guild eval
 program
   .command('eval')
@@ -195,25 +133,6 @@ program
     }
   });
 
-// guild stats
-program
-  .command('stats')
-  .description('View token usage stats and cost estimates')
-  .option('--period <period>', 'Filter by period: today, week, month, all', 'month')
-  .option('--compare', 'Compare cost across model profiles')
-  .option('--reset', 'Delete all usage history')
-  .option('-f, --force', 'Skip confirmation prompt (for --reset)')
-  .option('--export <format>', 'Export data (csv)')
-  .action(async (options) => {
-    try {
-      const { runStats } = await import('../src/commands/stats.js');
-      await runStats(options);
-    } catch (err) {
-      console.error(err.message);
-      process.exit(1);
-    }
-  });
-
 // guild workspace
 const workspaceCmd = program
   .command('workspace')
package/package.json
CHANGED
package/src/commands/doctor.js
CHANGED
@@ -4,10 +4,10 @@
 
 import * as p from '@clack/prompts';
 import chalk from 'chalk';
-import { existsSync, readdirSync } from 'fs';
+import { existsSync, readdirSync, readFileSync } from 'fs';
 import { join } from 'path';
 import { resolveProjectRoot } from '../utils/files.js';
-import {
+import { parseSkill } from '../utils/skill-parser.js';
 
 export async function runDoctor() {
   const root = resolveProjectRoot();
@@ -86,27 +86,26 @@ export async function runDoctor() {
 
   // Check workflow validation in skills
   if (existsSync(skillsDir)) {
-    const
+    const skillDirs = readdirSync(skillsDir, { withFileTypes: true })
+      .filter(d => d.isDirectory())
+      .filter(d => existsSync(join(skillsDir, d.name, 'SKILL.md')));
+
     let workflowCount = 0;
     let workflowErrors = 0;
     const errorDetails = [];
 
-    for (const
+    for (const dir of skillDirs) {
+      const content = readFileSync(join(skillsDir, dir.name, 'SKILL.md'), 'utf8');
+      const skill = parseSkill(content);
+
       if (skill.workflow) {
         workflowCount++;
-        if (skill.errors.length > 0) {
-          workflowErrors++;
-          errorDetails.push(`${name}: ${skill.errors.join('; ')}`);
-        }
-      }
 
-      // Check that agent references exist
-      if (skill.workflow) {
         for (const step of skill.workflow.steps) {
           if (step.role !== 'system' && step.role !== 'dynamic') {
             const agentPath = join(agentsDir, `${step.role}.md`);
             if (!existsSync(agentPath)) {
-              errorDetails.push(`${name}: step "${step.id}" references agent "${step.role}" — agent not found`);
+              errorDetails.push(`${dir.name}: step "${step.id}" references agent "${step.role}" — agent not found`);
               workflowErrors++;
             }
           }
@@ -124,27 +123,6 @@ export async function runDoctor() {
       });
       healthy = false;
     }
-    // If workflowCount === 0, don't add a check (no workflows to validate)
-
-    // Check for dual-format skills (workflow frontmatter + body step/phase headings)
-    // Matches "### Step 1", "## Phase 2", etc. — requires digit after Step/Phase
-    const STEP_PHASE_RE = /^#{1,3}\s+(Step|Phase)\s+\d/im;
-    const dualFormatWarnings = [];
-
-    for (const [name, skill] of skills) {
-      if (skill.workflow && skill.body && STEP_PHASE_RE.test(skill.body)) {
-        dualFormatWarnings.push(name);
-      }
-    }
-
-    if (dualFormatWarnings.length > 0) {
-      checks.push({
-        name: `Dual-format skills (${dualFormatWarnings.length} warning(s))`,
-        pass: true,
-        warn: true,
-        detail: `Skills with both workflow frontmatter and body step/phase headings: ${dualFormatWarnings.join(', ')}. Workflow steps take precedence — consider removing prose steps from body.`,
-      });
-    }
   }
 
   // Display results
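The agent-reference check kept by the doctor.js hunks above can be read in isolation. The following is a minimal sketch under stated assumptions, not the package's code: `findMissingAgents`, `exampleSkills`, and the in-memory input shape are hypothetical, and the real command looks for `${step.role}.md` on disk with `existsSync` instead of consulting a set of names.

```javascript
// Hypothetical helper mirroring the loop in runDoctor(): flag workflow steps
// whose role has no matching agent definition.
// `skills` maps a skill directory name to its parsed skill (assumed shape).
// `agentNames` stands in for the `<role>.md` files in the agents directory.
function findMissingAgents(skills, agentNames) {
  const errorDetails = [];
  for (const [name, skill] of Object.entries(skills)) {
    if (!skill.workflow) continue; // skills without a workflow are not validated
    for (const step of skill.workflow.steps) {
      // 'system' and 'dynamic' roles are not backed by an agent file
      if (step.role === 'system' || step.role === 'dynamic') continue;
      if (!agentNames.has(step.role)) {
        errorDetails.push(
          `${name}: step "${step.id}" references agent "${step.role}" (agent not found)`
        );
      }
    }
  }
  return errorDetails;
}

// Illustrative data only; the step ids are borrowed from the workflow hunks in this diff.
const exampleSkills = {
  'build-feature': {
    workflow: {
      steps: [
        { id: 'evaluate', role: 'advisor' },
        { id: 'gate-final', role: 'system' },
        { id: 'qa-phase', role: 'qa' },
      ],
    },
  },
  notes: { workflow: null },
};

console.log(findMissingAgents(exampleSkills, new Set(['advisor'])));
```

With only `advisor` available, the sketch reports the `qa-phase` step, because its `qa` role has no agent definition in the set.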
package/src/commands/init.js
CHANGED
@@ -135,7 +135,7 @@ export async function runInit() {
 
   const relevantSkills = projectData.hasExistingCode
     ? ['/guild-specialize', '/council', '/build-feature']
-    : ['/council', '/build-feature'
+    : ['/council', '/build-feature'];
   p.log.info(`Start with: ${relevantSkills.join(' ')}`);
 
   const quickStart = projectData.hasExistingCode
package/src/templates/skills/build-feature/SKILL.md
CHANGED
@@ -11,14 +11,12 @@ workflow:
       requires: [feature-description]
       produces: [evaluation-report, verdict]
       model-tier: reasoning
-      on-failure: abort
     - id: design
       role: tech-lead
       intent: "Break the feature into concrete tasks with acceptance criteria. Define implementation approach: files to modify, patterns to follow, interfaces, and technical risks."
       requires: [feature-description, evaluation-report]
       produces: [task-list, acceptance-criteria, technical-plan]
       model-tier: reasoning
-      condition: step.evaluate.verdict != rejected
     - id: implement
       role: developer
       intent: "Implement the feature following the technical plan. Write unit tests. Make atomic commits."
@@ -28,13 +26,11 @@ workflow:
     - id: gate-pre-review
       role: system
       intent: "Run project tests and lint. Both must pass before review."
-      commands: [npm test, npm run lint]
       gate: true
       produces: [gate-pre-review-result]
-
-    - id: checkpoint-phase4
+    - id: checkpoint
       role: system
-      intent: "Create checkpoint commit and write partial pipeline trace
+      intent: "Create checkpoint commit and write partial pipeline trace to spec file."
       requires: [implementation, gate-pre-review-result]
       produces: [checkpoint-commit]
       gate: true
@@ -44,56 +40,29 @@ workflow:
       requires: [implementation, gate-pre-review-result]
       produces: [review-report]
       model-tier: reasoning
-      retry:
-        max: 2
-        on: has-blockers
     - id: fix-review-blockers
       role: developer
       intent: "Fix blocker findings from code review. Run tests after fixing."
       requires: [review-report]
       produces: [implementation]
       model-tier: execution
-      condition: step.review.has-blockers
-      on-failure: goto:review
     - id: qa-phase
-      role:
-      intent: "
-      delegates-to: qa-cycle
+      role: qa
+      intent: "Validate acceptance criteria, test edge cases, run bugfix cycles if needed."
       requires: [acceptance-criteria, implementation]
       produces: [qa-report]
-
-        max: 2
-        on: has-bugs
-    - id: post-qa-review
-      role: code-reviewer
-      intent: "Review changes introduced during QA bugfix cycles."
-      requires: [qa-report, implementation]
-      produces: [post-qa-review-report]
-      model-tier: reasoning
-      condition: step.qa-phase.had-significant-changes
-      retry:
-        max: 2
-        on: has-blockers
+      model-tier: execution
     - id: gate-final
       role: system
       intent: "Run project tests and lint as final verification. Both must pass."
-      commands: [npm test, npm run lint]
       gate: true
       produces: [final-gate-result]
-      on-failure: goto:qa-phase
     - id: completion
       role: system
-      intent: "
+      intent: "Update SESSION.md. Present summary to user."
       requires: [final-gate-result, review-report, qa-report]
-      produces: [
+      produces: [session-update]
       gate: true
-    - id: extract-learnings
-      role: learnings-extractor
-      intent: "Extract compound learnings from this pipeline execution."
-      requires: [pipeline-trace]
-      produces: [updated-learnings]
-      model-tier: routine
-      blocking: false
 ---
 
 # Build Feature
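The `requires` and `produces` lists in the workflow frontmatter above describe how artifacts flow from step to step. A minimal sketch of one way to sanity-check that flow follows. It is an illustration only: `findUnmetRequires` is a hypothetical helper, nothing in this diff shows Guild performing this check, and the `evaluate` step id is inferred from the removed `step.evaluate.verdict` condition.

```javascript
// Hypothetical check: every artifact a step `requires` should either be an
// initial input or appear in the `produces` list of an earlier step.
function findUnmetRequires(steps, initialInputs) {
  const available = new Set(initialInputs);
  const problems = [];
  for (const step of steps) {
    for (const artifact of step.requires ?? []) {
      if (!available.has(artifact)) {
        problems.push(`${step.id}: requires "${artifact}" before it is produced`);
      }
    }
    // Artifacts become available only after the step has run.
    for (const artifact of step.produces ?? []) available.add(artifact);
  }
  return problems;
}

// Abridged from the frontmatter above; the implement step is deliberately left out.
const abridgedSteps = [
  { id: 'evaluate', requires: ['feature-description'], produces: ['evaluation-report', 'verdict'] },
  { id: 'design', requires: ['feature-description', 'evaluation-report'], produces: ['task-list', 'acceptance-criteria', 'technical-plan'] },
  { id: 'qa-phase', requires: ['acceptance-criteria', 'implementation'], produces: ['qa-report'] },
];

console.log(findUnmetRequires(abridgedSteps, ['feature-description']));
```

Because the abridged list omits the `implement` step, the sketch flags `qa-phase` for requiring `implementation`; with the full step list from SKILL.md that artifact would already be available.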
package/src/templates/skills/build-feature/evals/evals.json
CHANGED
@@ -43,9 +43,9 @@
     },
     {
       "id": "bf-minimum-steps",
-      "description": "Plan has at least
+      "description": "Plan has at least 8 steps",
       "expectations": [
-        { "text": "At least
+        { "text": "At least 8 steps", "assertion": "step-count:8" }
       ]
     }
   ]
package/src/templates/skills/council/SKILL.md
CHANGED
@@ -11,33 +11,24 @@ workflow:
       requires: [user-question]
       produces: [council-type, participant-roles]
       gate: true
-    - id: workspace-context
-      role: system
-      intent: "Detect workspace membership. If in a workspace, collect context from sibling repos (CLAUDE.md, PROJECT.md, SESSION.md) and build workspace context block."
-      requires: [council-type]
-      produces: [workspace-context]
-      condition: in-workspace
     - id: agent-1
       role: dynamic
-      intent: "Analyze the question from specialized perspective. State position with concrete arguments."
-      requires: [user-question, council-type
+      intent: "Analyze the question from specialized perspective. State position with concrete arguments. Spawn via Agent tool IN PARALLEL with agent-2 and agent-3."
+      requires: [user-question, council-type]
       produces: [perspective-1]
       model-tier: reasoning
-      parallel: [agent-2, agent-3]
     - id: agent-2
       role: dynamic
       intent: "Analyze the question from specialized perspective. State position with concrete arguments."
-      requires: [user-question, council-type
+      requires: [user-question, council-type]
       produces: [perspective-2]
       model-tier: reasoning
-      parallel: [agent-1, agent-3]
     - id: agent-3
       role: dynamic
       intent: "Analyze the question from specialized perspective. State position with concrete arguments."
-      requires: [user-question, council-type
+      requires: [user-question, council-type]
       produces: [perspective-3]
       model-tier: reasoning
-      parallel: [agent-1, agent-2]
     - id: synthesize
       role: system
       intent: "Synthesize debate: points of agreement, disagreement, risks. Present options to user."
@@ -49,7 +40,6 @@ workflow:
       intent: "After user decides, write spec document to docs/specs/."
       requires: [synthesis, user-decision]
       produces: [spec-document]
-      condition: user-wants-spec
       gate: true
 ---
 
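In the council workflow above, the `parallel:` keys are gone and the parallelism is now stated in the step intent for Claude Code to carry out. The fan-out and fan-in shape it describes can be sketched as below. This is an analogy under stated assumptions, not Guild code: `askAgent` and `runCouncil` are hypothetical, and a real run goes through Claude Code's Agent tool, which the sketch does not call.

```javascript
// Hypothetical stand-in for spawning one agent with a given perspective.
async function askAgent(role, question) {
  return { role, position: `${role} view on: ${question}` };
}

// Fan out to every role at once, then hand the collected perspectives to a
// single synthesis step, mirroring agent-1..agent-3 followed by synthesize.
async function runCouncil(question, roles) {
  // All perspectives start before any of them finishes.
  const perspectives = await Promise.all(
    roles.map((role) => askAgent(role, question))
  );
  // Synthesis only sees completed, independently produced perspectives.
  return { question, perspectives, count: perspectives.length };
}

runCouncil('Should we add JWT auth?', ['advisor', 'tech-lead', 'qa'])
  .then((result) => console.log(result.count, 'perspectives collected'));
```

`Promise.all` keeps the results in the order of the input roles, so the synthesis step can attribute each position to its role without extra bookkeeping.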
package/src/templates/skills/council/evals/evals.json
CHANGED
@@ -2,15 +2,12 @@
   "skill": "council",
   "evals": [
     {
-      "id": "council-three-
-      "description": "Council has 3 agent steps
+      "id": "council-three-agents",
+      "description": "Council has 3 agent steps",
       "expectations": [
         { "text": "Agent-1 exists", "assertion": "step-exists:agent-1" },
         { "text": "Agent-2 exists", "assertion": "step-exists:agent-2" },
-        { "text": "Agent-3 exists", "assertion": "step-exists:agent-3" }
-        { "text": "Agent-1 is parallel", "assertion": "step-parallel:agent-1" },
-        { "text": "Agent-2 is parallel", "assertion": "step-parallel:agent-2" },
-        { "text": "Agent-3 is parallel", "assertion": "step-parallel:agent-3" }
+        { "text": "Agent-3 exists", "assertion": "step-exists:agent-3" }
       ]
     },
     {
@@ -29,13 +26,6 @@
         { "text": "Synthesize step exists", "assertion": "step-exists:synthesize" },
         { "text": "Synthesize has gate", "assertion": "gate-exists:synthesize" }
       ]
-    },
-    {
-      "id": "council-workspace-context",
-      "description": "Workspace context step exists with condition",
-      "expectations": [
-        { "text": "Workspace-context step exists", "assertion": "step-exists:workspace-context" }
-      ]
     }
   ]
 }
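The eval files above express expectations as `kind:argument` strings such as `step-exists:agent-3`, `gate-exists:synthesize`, and `step-count:8`. A minimal sketch of how such strings could be evaluated against a parsed workflow follows. It is an assumption-laden illustration: `checkAssertion` is hypothetical, the contents of `eval-runner.js` are not shown in this diff, and reading `step-count:8` as "at least 8" is taken from the expectation text "At least 8 steps".

```javascript
// Hypothetical evaluator for the assertion strings seen in evals.json.
// `workflow.steps` is assumed to be an array of { id, gate? } objects.
function checkAssertion(workflow, assertion) {
  const sep = assertion.indexOf(':');
  const kind = assertion.slice(0, sep);
  const arg = assertion.slice(sep + 1);
  const step = workflow.steps.find((s) => s.id === arg);
  switch (kind) {
    case 'step-exists':
      return step !== undefined;
    case 'gate-exists':
      return step !== undefined && step.gate === true;
    case 'step-count':
      // Interpreted as a minimum, per the "At least N steps" wording.
      return workflow.steps.length >= Number(arg);
    default:
      throw new Error(`Unknown assertion kind: ${kind}`);
  }
}

// Illustrative workflow using step ids from the council skill above.
const councilWorkflow = {
  steps: [
    { id: 'agent-1' },
    { id: 'agent-2' },
    { id: 'agent-3' },
    { id: 'synthesize', gate: true },
  ],
};

console.log(checkAssertion(councilWorkflow, 'step-exists:agent-3'));
console.log(checkAssertion(councilWorkflow, 'gate-exists:synthesize'));
```

Under this reading, the removed `step-exists:workspace-context` expectation would fail against the 2.1.0 council workflow, which is consistent with that eval being dropped together with the step.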
package/src/templates/skills/create-pr/SKILL.md
CHANGED
@@ -8,12 +8,10 @@ workflow:
     - id: verify-branch
       role: system
       intent: "Verify not on main/develop, check for uncommitted changes, get commits ahead of main."
-      commands: [git branch --show-current, git status, git log main..HEAD --oneline]
       produces: [branch-name, branch-state, commit-list]
     - id: gather-context
       role: system
       intent: "Collect diff stats, run tests and lint for PR description context."
-      commands: [git diff main..HEAD --stat, npm test, npm run lint]
       requires: [branch-state]
       produces: [diff-summary, test-result, lint-result]
     - id: generate-description
@@ -25,7 +23,6 @@ workflow:
     - id: create-pr
       role: system
       intent: "Push branch to origin and create PR via gh CLI."
-      commands: [git push -u origin, gh pr create]
       requires: [pr-description, pr-title, branch-name]
       produces: [pr-url]
     - id: post-creation
@@ -102,7 +99,7 @@ Build a structured PR description:
 1. Display the PR URL
 2. Suggest next steps:
    - "Request review from a teammate"
-   - "Run `/review` for an AI code review"
+   - "Run `/code-review` for an AI code review"
    - "Merge when ready with `gh pr merge [number]`"
 
 ## Example Session
@@ -121,7 +118,7 @@ https://github.com/org/repo/pull/42
 
 Next steps:
 - Request review from a teammate
-- Run /review for AI code review
+- Run /code-review for AI code review
 - Merge when ready
 ```
 