guild-agents 2.0.0 → 2.1.1
- package/README.md +100 -97
- package/package.json +1 -1
package/README.md
CHANGED
@@ -1,152 +1,155 @@
 # Guild

 [](https://www.npmjs.com/package/guild-agents)
+[](https://www.npmjs.com/package/guild-agents)
 [](https://github.com/guild-agents/guild/actions/workflows/ci.yml)
 [](LICENSE)
-[](https://nodejs.org)

 **Guild makes Claude Code think before it builds.**

-Without Guild, Claude Code writes code immediately. No evaluation, no design, no review. With Guild, every feature goes through structured phases — evaluated by an advisor, designed by a tech lead, reviewed
+Without Guild, Claude Code writes code immediately. No evaluation, no design, no review. With Guild, every feature goes through structured phases — evaluated by an advisor, designed by a tech lead, reviewed, and validated — before anything ships.

-
+```bash
+npx guild-agents init
+```

-
+## The Problem

-
-- Has no design phase and no review gate
-- Loses decisions between sessions
-- Produces results that vary with every conversation
+Claude Code is powerful but unstructured. Ask it to "add authentication" and it starts writing code immediately. No one evaluated whether the approach makes sense. No design doc captures the trade-offs. No review gate catches issues before they compound. Next session, all that context is gone.

 ## How Guild Solves It

-- **
-- **Independent perspectives**: `/council` spawns parallel agents that each analyze your idea independently, then synthesize into a decision
-- **Session continuity**: `/session-start` and `/session-end` combine SESSION.md with Claude Code's memory system — you never lose context between sessions
-- **Behavioral discipline**: `/tdd` and `/debug` prevent the most common LLM anti-patterns: code before tests, fixes before root cause analysis
-- **Quality you can measure**: `guild eval` validates skill structure, trigger accuracy, and description quality with automated benchmarks
+Guild installs **spec-first workflows** and **specialized role definitions** as `.claude/` markdown files in your project. Claude Code reads them natively as slash commands. The process enforces quality gates — you can't skip evaluation, you can't ship without review.

-
+- **`/guild:build-feature`** — 5-phase pipeline: evaluate, design, implement, review, QA. Code comes after the design doc.
+- **`/guild:council`** — 3 agents analyze your idea in parallel with different perspectives, then synthesize into a decision with a spec document.
+- **`/guild:tdd`** — No production code without a failing test first. Enforces red-green-refactor.
+- **`/guild:debug`** — No fixes without root cause investigation. Systematic 4-phase process.
+- **`/guild:session-start`** / **`/guild:session-end`** — SESSION.md captures where you stopped. Claude Code memory captures what you learned. You resume with full context.
+
+## Quality You Can Measure
+
+Most agent frameworks can't tell you if their skills actually fire when they should. Guild ships a benchmark suite.

 ```bash
-
-guild
+guild eval # Structural assertions: steps exist, roles correct, gates present
+guild eval --triggers # Trigger accuracy: do user prompts route to the right skill?
+guild eval --semantic # LLM-based scoring via Haiku for higher-fidelity testing
+guild eval --suggest # Keyword gap analysis with improvement suggestions
 ```

-
+Every trigger run records results to a rolling benchmark with per-skill accuracy, precision, recall, and delta vs previous run. Regressions are caught before they ship.
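
The per-skill metrics named in the added line above can be illustrated with a minimal Python sketch. This is not Guild's implementation: the `(expected, predicted)` record shape, the function names, and the sample routing results are all assumptions made for the example.

```python
from collections import defaultdict


def per_skill_metrics(results):
    """Compute precision and recall per skill.

    `results` is a list of (expected_skill, predicted_skill) pairs,
    one per test prompt. This record shape is hypothetical.
    """
    tp = defaultdict(int)  # prompt routed to the skill, correctly
    fp = defaultdict(int)  # prompt routed to the skill, wrongly
    fn = defaultdict(int)  # prompt that should have reached the skill but did not
    for expected, predicted in results:
        if expected == predicted:
            tp[expected] += 1
        else:
            fp[predicted] += 1
            fn[expected] += 1
    metrics = {}
    for skill in set(tp) | set(fp) | set(fn):
        routed = tp[skill] + fp[skill]
        relevant = tp[skill] + fn[skill]
        metrics[skill] = {
            "precision": tp[skill] / routed if routed else 0.0,
            "recall": tp[skill] / relevant if relevant else 0.0,
        }
    return metrics


def recall_delta(current, previous):
    """Change in recall per skill versus a previous run (0.0 if the skill is new)."""
    return {
        skill: m["recall"] - previous.get(skill, m)["recall"]
        for skill, m in current.items()
    }


# Hypothetical trigger run: the second prompt should have routed to tdd.
run = [("tdd", "tdd"), ("tdd", "debug"), ("debug", "debug"), ("council", "council")]
metrics = per_skill_metrics(run)
print(metrics["tdd"])    # precision 1.0, recall 0.5
print(metrics["debug"])  # precision 0.5, recall 1.0
```

A negative value from `recall_delta` for a skill is the kind of regression a rolling benchmark would flag between runs.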
+
+## The Pipeline

 ```text
-/guild-
-
-
+/guild:build-feature "Add JWT auth"
+      |
+      v
++-----------+     +-----------+     +-----------+
+| Evaluate  |---->|  Design   |---->|   Build   |
+|  advisor  |     | tech-lead |     | developer |
++-----------+     +-----------+     +-----+-----+
+                                          |
+                                    +-----+-----+
+                                    v           v
+                              +-----------++-----------+
+                              |  Review   ||    QA     |
+                              +-----------++-----------+
 ```

-
+Five phases. Phases 1-2 happen before any code is written. Gates between phases can't be skipped.
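
The gating rule in the added line above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how Guild or Claude Code executes the pipeline: the phase names, `run_pipeline`, `GateFailed`, and the handler callables are hypothetical, and the sketch runs review and QA one after the other even though the diagram draws them side by side.

```python
# Hypothetical phase names taken from the diagram's five boxes.
PHASES = ["evaluate", "design", "build", "review", "qa"]


class GateFailed(Exception):
    """Raised when a phase's gate does not pass."""


def run_pipeline(feature, handlers):
    """Run phases in order and stop at the first gate that fails.

    `handlers` maps each phase name to a callable that takes the feature
    description and returns True when the gate passes. A missing handler
    counts as a failed gate, so no phase can be skipped.
    """
    completed = []
    for phase in PHASES:
        handler = handlers.get(phase)
        if handler is None or not handler(feature):
            raise GateFailed(f"{phase} gate not passed; completed: {completed}")
        completed.append(phase)
    return completed


def approve(feature):
    """Stand-in gate that always passes."""
    return True


handlers = {phase: approve for phase in PHASES}
print(run_pipeline("Add JWT auth", handlers))
# ['evaluate', 'design', 'build', 'review', 'qa']
```

Treating a missing handler as a failure, rather than silently continuing, is what makes the gate non-skippable in this sketch.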

-
-
-
-
-
-
-│ advisor │ │ tech-lead│ │developer │
-└──────────┘ └──────────┘ └────┬─────┘
-│
-┌─────┴─────┐
-▼ ▼
-┌──────────┐┌──────────┐
-│ Review ││ QA │
-└──────────┘└──────────┘
+## Install
+
+**As a Claude Code plugin** (recommended):
+
+```
+/plugin install Guild-Agents/guild
 ```

-
+All 10 skills and 6 roles are available immediately as `/guild:*` commands.

-
+**As an npm package** (for the eval CLI):
+
+```bash
+npm install -g guild-agents
+guild init
+```

-
+## Skills

 | Skill | What it does |
 | --- | --- |
-| `/build-feature` | Full pipeline
-| `/council` | Multi-perspective deliberation — 3
-| `/create-pr` | Structured pull request from current branch |
-| `/qa-cycle` | QA and bugfix loop until clean |
-| `/tdd` | TDD red-green-refactor — no code without a failing test |
-| `/debug` | Systematic 4-phase debugging — no fixes without root cause |
-| `/guild-specialize` | Explore your codebase, enrich CLAUDE.md with real conventions |
-| `/re-specialize` | Incremental update
-| `/session-start` | Resume
-| `/session-end` | Save state
-
-##
-
-6
-
-|
+| `/guild:build-feature` | Full 5-phase pipeline with quality gates |
+| `/guild:council` | Multi-perspective deliberation — 3 roles debate, then synthesize |
+| `/guild:create-pr` | Structured pull request from current branch |
+| `/guild:qa-cycle` | QA and bugfix loop until clean |
+| `/guild:tdd` | TDD red-green-refactor — no code without a failing test |
+| `/guild:debug` | Systematic 4-phase debugging — no fixes without root cause |
+| `/guild:guild-specialize` | Explore your codebase, enrich CLAUDE.md with real conventions |
+| `/guild:re-specialize` | Incremental update when your stack changes |
+| `/guild:session-start` | Resume from SESSION.md + Claude Code memory |
+| `/guild:session-end` | Save state + durable learnings to memory |
+
+## Roles
+
+6 role definitions that give Claude Code distinct perspectives:
+
+| Role | Perspective |
 | --- | --- |
-| advisor |
-| tech-lead | Breaks features into tasks
-| developer |
-| code-reviewer | Reviews
-| qa |
-| bugfix | Diagnosis
+| advisor | Strategic direction. First gate — evaluates ideas before work begins |
+| tech-lead | Technical approach. Breaks features into tasks with acceptance criteria |
+| developer | Implementation. Follows conventions, writes tests, makes atomic commits |
+| code-reviewer | Quality. Reviews patterns, security, and technical debt |
+| qa | Validation. Tests edge cases, regressions, acceptance criteria |
+| bugfix | Diagnosis. Isolates root causes, applies targeted fixes |

-Each
+Each role is a `.md` file with identity, responsibilities, and boundaries. Claude Code reads them via its native Agent tool.

-##
+## Session Continuity

-
-guild init # Interactive project onboarding
-guild new-agent <name> # Create a custom agent
-guild status # Show project status
-guild doctor # Diagnose setup
-guild list # List agents and skills
-guild eval # Run structural skill evaluations
-guild eval --triggers # Run trigger accuracy tests
-guild eval --semantic # LLM-based trigger tests (requires ANTHROPIC_API_KEY)
-guild eval --suggest # Description improvement suggestions
-guild workspace init # Create a multi-repo workspace
-```
+Claude Code's memory system stores long-term knowledge (who you are, lessons learned). But it explicitly excludes ephemeral work state — what you were building, which branch, what phase. That's the gap Guild fills.

-
+`/guild:session-end` writes to **both layers**:
+- **SESSION.md** — where you stopped: task, branch, phase, next steps (overwritten each session)
+- **Claude Code memory** — what you learned: decisions, lessons, references (persists across sessions)

-
+`/guild:session-start` reads from **both** and presents a unified summary.
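
The two-layer behavior described in the added lines above (one file overwritten each session, one store that accumulates) can be sketched in Python. Everything here is an assumption for illustration: `memory.md` is a hypothetical stand-in for Claude Code's memory system, and the function names and file contents are invented, not Guild's actual format.

```python
import tempfile
from pathlib import Path


def session_end(root, state, learnings):
    """Overwrite SESSION.md with work state; append learnings to a memory file."""
    root = Path(root)
    lines = [f"- {key}: {value}" for key, value in state.items()]
    # Ephemeral layer: replaced on every call, so only the latest state survives.
    (root / "SESSION.md").write_text("\n".join(lines) + "\n")
    # Durable layer: memory.md is a hypothetical stand-in for Claude Code memory.
    with open(root / "memory.md", "a") as memory:
        for item in learnings:
            memory.write(f"- {item}\n")


def session_start(root):
    """Read both layers and return one combined summary string."""
    root = Path(root)
    session = (root / "SESSION.md").read_text()
    memory = (root / "memory.md").read_text()
    return f"Where you stopped:\n{session}\nWhat you learned:\n{memory}"


with tempfile.TemporaryDirectory() as root:
    session_end(root, {"task": "JWT auth", "phase": "design"}, ["Use RS256 keys"])
    session_end(root, {"task": "JWT auth", "phase": "build"}, ["Rotate keys monthly"])
    summary = session_start(root)
    print(summary)
```

After the second `session_end` call, the summary shows only the `build` phase but both learnings, which mirrors the overwrite-versus-persist split.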

-
-- **Trigger tests** -- verify that user prompts route to the correct skill
-- **Semantic matcher** -- LLM-based scoring for higher-fidelity trigger testing
-- **Benchmarks** -- rolling history with per-skill accuracy, precision, recall, and regression detection
+## When NOT to Use Guild

-
+Guild is overkill for: throwaway scripts, exploratory prototypes, single-file utilities, or anything where you'd rather ship fast than ship right. The 5-phase pipeline has cost — use it when that cost buys you something.

-
+## How Is This Different?

-Guild
+Guild isn't an editor (Cursor), isn't a pair-programmer (Aider), isn't an autonomous agent (Devin). It's a **methodology layer** on top of Claude Code. If you already use Claude Code and want structured spec-first workflows with quality gates, Guild adds the discipline. If you don't use Claude Code, Guild isn't for you.

-##
+## How It Works

-
+Guild installs role definitions and skill workflows as markdown files in `.claude/`. Claude Code discovers and executes them natively — no custom runtime, no extra process, no API calls. When you type `/guild:build-feature`, Claude Code reads the skill, follows the phases, and spawns agents using its own Agent tool.

-
+Guild defines **what** happens. Claude Code decides **how** to execute it.

-
-- **Claude Code memory** — what you learned: decisions, lessons, references (persists across sessions)
+## CLI

-
+```bash
+guild init # Interactive project onboarding
+guild new-agent <name> # Create a custom role
+guild status # Show project status
+guild doctor # Diagnose setup
+guild list # List roles and skills
+guild eval # Run structural skill evaluations
+guild workspace init # Create a multi-repo workspace
+```

 ## Guild Builds Itself

-Every feature in Guild goes through the same spec-first pipeline that Guild installs in your project.
-
-## Requirements
-
-- Node.js >= 20
-- Claude Code
-- `gh` CLI (optional, for GitHub integration)
+Every feature in Guild goes through the same spec-first pipeline that Guild installs in your project.

 ## Contributing

-See [CONTRIBUTING.md](.github/CONTRIBUTING.md)
+See [CONTRIBUTING.md](.github/CONTRIBUTING.md).

 ## License
