guild-agents 2.0.0 → 2.1.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (2) hide show
  1. package/README.md +100 -97
  2. package/package.json +1 -1
package/README.md CHANGED
@@ -1,152 +1,155 @@
1
1
  # Guild
2
2
 
3
3
  [![npm version](https://img.shields.io/npm/v/guild-agents)](https://www.npmjs.com/package/guild-agents)
4
+ [![npm downloads](https://img.shields.io/npm/dm/guild-agents)](https://www.npmjs.com/package/guild-agents)
4
5
  [![CI](https://github.com/guild-agents/guild/actions/workflows/ci.yml/badge.svg)](https://github.com/guild-agents/guild/actions/workflows/ci.yml)
5
6
  [![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
6
- [![Node.js >= 20](https://img.shields.io/badge/node-%3E%3D20-brightgreen)](https://nodejs.org)
7
7
 
8
8
  **Guild makes Claude Code think before it builds.**
9
9
 
10
- Without Guild, Claude Code writes code immediately. No evaluation, no design, no review. With Guild, every feature goes through structured phases — evaluated by an advisor, designed by a tech lead, reviewed by a code reviewer, validated by QA — before anything ships. Everything is markdown in `.claude/`, tracked by git, works offline, zero infrastructure.
10
+ Without Guild, Claude Code writes code immediately. No evaluation, no design, no review. With Guild, every feature goes through structured phases — evaluated by an advisor, designed by a tech lead, reviewed, and validated — before anything ships.
11
11
 
12
- ## The Problem
12
+ ```bash
13
+ npx guild-agents init
14
+ ```
13
15
 
14
- Without structure, Claude Code:
16
+ ## The Problem
15
17
 
16
- - Writes code before understanding the problem
17
- - Has no design phase and no review gate
18
- - Loses decisions between sessions
19
- - Produces results that vary with every conversation
18
+ Claude Code is powerful but unstructured. Ask it to "add authentication" and it starts writing code immediately. No one evaluated whether the approach makes sense. No design doc captures the trade-offs. No review gate catches issues before they compound. Next session, all that context is gone.
20
19
 
21
20
  ## How Guild Solves It
22
21
 
23
- - **Spec before code**: `/build-feature` enforces evaluation, design, and review phasescode comes after the design doc
24
- - **Independent perspectives**: `/council` spawns parallel agents that each analyze your idea independently, then synthesize into a decision
25
- - **Session continuity**: `/session-start` and `/session-end` combine SESSION.md with Claude Code's memory system — you never lose context between sessions
26
- - **Behavioral discipline**: `/tdd` and `/debug` prevent the most common LLM anti-patterns: code before tests, fixes before root cause analysis
27
- - **Quality you can measure**: `guild eval` validates skill structure, trigger accuracy, and description quality with automated benchmarks
22
+ Guild installs **spec-first workflows** and **specialized role definitions** as `.claude/` markdown files in your project. Claude Code reads them natively as slash commands. The process enforces quality gates you can't skip evaluation, you can't ship without review.
28
23
 
29
- ## Quick Start
24
+ - **`/guild:build-feature`** — 5-phase pipeline: evaluate, design, implement, review, QA. Code comes after the design doc.
25
+ - **`/guild:council`** — 3 agents analyze your idea in parallel with different perspectives, then synthesize into a decision with a spec document.
26
+ - **`/guild:tdd`** — No production code without a failing test first. Enforces red-green-refactor.
27
+ - **`/guild:debug`** — No fixes without root cause investigation. Systematic 4-phase process.
28
+ - **`/guild:session-start`** / **`/guild:session-end`** — SESSION.md captures where you stopped. Claude Code memory captures what you learned. You resume with full context.
29
+
30
+ ## Quality You Can Measure
31
+
32
+ Most agent frameworks can't tell you if their skills actually fire when they should. Guild ships a benchmark suite.
30
33
 
31
34
  ```bash
32
- npm install -g guild-agents
33
- guild init
35
+ guild eval # Structural assertions: steps exist, roles correct, gates present
36
+ guild eval --triggers # Trigger accuracy: do user prompts route to the right skill?
37
+ guild eval --semantic # LLM-based scoring via Haiku for higher-fidelity testing
38
+ guild eval --suggest # Keyword gap analysis with improvement suggestions
34
39
  ```
35
40
 
36
- Then use skills as slash commands in Claude Code:
41
+ Every trigger run records results to a rolling benchmark with per-skill accuracy, precision, recall, and delta vs previous run. Regressions are caught before they ship.
42
+
43
+ ## The Pipeline
37
44
 
38
45
  ```text
39
- /guild-specialize # Learn your codebase, enrich CLAUDE.md
40
- /council "Add JWT auth" # Spec a feature through structured deliberation
41
- /build-feature # Implement from spec through the full pipeline
46
+ /guild:build-feature "Add JWT auth"
47
+ |
48
+ v
49
+ +-----------+ +-----------+ +-----------+
50
+ | Evaluate |---->| Design |---->| Build |
51
+ | advisor | | tech-lead | | developer |
52
+ +-----------+ +-----------+ +-----+-----+
53
+ |
54
+ +-----+-----+
55
+ v v
56
+ +-----------++-----------+
57
+ | Review || QA |
58
+ +-----------++-----------+
42
59
  ```
43
60
 
44
- ## The Pipeline
61
+ Five phases. Phases 1-2 happen before any code is written. Gates between phases can't be skipped.
45
62
 
46
- ```text
47
- You ──> /build-feature "Add JWT auth"
48
-
49
-
50
- ┌──────────┐ ┌──────────┐ ┌──────────┐
51
- Evaluate │────>│ Design │────>│ Build │
52
- │ advisor │ │ tech-lead│ │developer │
53
- └──────────┘ └──────────┘ └────┬─────┘
54
-
55
- ┌─────┴─────┐
56
- ▼ ▼
57
- ┌──────────┐┌──────────┐
58
- │ Review ││ QA │
59
- └──────────┘└──────────┘
63
+ ## Install
64
+
65
+ **As a Claude Code plugin** (recommended):
66
+
67
+ ```
68
+ /plugin install Guild-Agents/guild
60
69
  ```
61
70
 
62
- Five phases: **evaluate**, **design**, **implement**, **review**, **validate**. Phases 1-2 happen before any code is written.
71
+ All 10 skills and 6 roles are available immediately as `/guild:*` commands.
63
72
 
64
- ## Skills
73
+ **As an npm package** (for the eval CLI):
74
+
75
+ ```bash
76
+ npm install -g guild-agents
77
+ guild init
78
+ ```
65
79
 
66
- 10 skills, available as slash commands in Claude Code:
80
+ ## Skills
67
81
 
68
82
  | Skill | What it does |
69
83
  | --- | --- |
70
- | `/build-feature` | Full pipeline: evaluate, design, implement, review, QA |
71
- | `/council` | Multi-perspective deliberation — 3 agents debate independently, then synthesize |
72
- | `/create-pr` | Structured pull request from current branch |
73
- | `/qa-cycle` | QA and bugfix loop until clean |
74
- | `/tdd` | TDD red-green-refactor — no code without a failing test |
75
- | `/debug` | Systematic 4-phase debugging — no fixes without root cause |
76
- | `/guild-specialize` | Explore your codebase, enrich CLAUDE.md with real conventions |
77
- | `/re-specialize` | Incremental update of CLAUDE.md when your stack changes |
78
- | `/session-start` | Resume work from SESSION.md + Claude Code memory |
79
- | `/session-end` | Save state to SESSION.md + durable learnings to memory |
80
-
81
- ## Agents
82
-
83
- 6 specialized roles that give Claude Code distinct perspectives:
84
-
85
- | Agent | Role |
84
+ | `/guild:build-feature` | Full 5-phase pipeline with quality gates |
85
+ | `/guild:council` | Multi-perspective deliberation — 3 roles debate, then synthesize |
86
+ | `/guild:create-pr` | Structured pull request from current branch |
87
+ | `/guild:qa-cycle` | QA and bugfix loop until clean |
88
+ | `/guild:tdd` | TDD red-green-refactor — no code without a failing test |
89
+ | `/guild:debug` | Systematic 4-phase debugging — no fixes without root cause |
90
+ | `/guild:guild-specialize` | Explore your codebase, enrich CLAUDE.md with real conventions |
91
+ | `/guild:re-specialize` | Incremental update when your stack changes |
92
+ | `/guild:session-start` | Resume from SESSION.md + Claude Code memory |
93
+ | `/guild:session-end` | Save state + durable learnings to memory |
94
+
95
+ ## Roles
96
+
97
+ 6 role definitions that give Claude Code distinct perspectives:
98
+
99
+ | Role | Perspective |
86
100
  | --- | --- |
87
- | advisor | Evaluates ideas and provides strategic direction. First gate before any work begins |
88
- | tech-lead | Breaks features into tasks. Defines technical approach and architecture |
89
- | developer | Implements features following project conventions. Writes tests, makes atomic commits |
90
- | code-reviewer | Reviews quality, patterns, and technical debt |
91
- | qa | Testing, edge cases, regression. Validates the implementation meets acceptance criteria |
92
- | bugfix | Diagnosis and bug resolution. Isolates root causes and applies targeted fixes |
101
+ | advisor | Strategic direction. First gate — evaluates ideas before work begins |
102
+ | tech-lead | Technical approach. Breaks features into tasks with acceptance criteria |
103
+ | developer | Implementation. Follows conventions, writes tests, makes atomic commits |
104
+ | code-reviewer | Quality. Reviews patterns, security, and technical debt |
105
+ | qa | Validation. Tests edge cases, regressions, acceptance criteria |
106
+ | bugfix | Diagnosis. Isolates root causes, applies targeted fixes |
93
107
 
94
- Each agent is a flat `.md` file with identity, responsibilities, and boundaries. Claude Code reads them via its native Agent tool and assumes the role.
108
+ Each role is a `.md` file with identity, responsibilities, and boundaries. Claude Code reads them via its native Agent tool.
95
109
 
96
- ## CLI
110
+ ## Session Continuity
97
111
 
98
- ```bash
99
- guild init # Interactive project onboarding
100
- guild new-agent <name> # Create a custom agent
101
- guild status # Show project status
102
- guild doctor # Diagnose setup
103
- guild list # List agents and skills
104
- guild eval # Run structural skill evaluations
105
- guild eval --triggers # Run trigger accuracy tests
106
- guild eval --semantic # LLM-based trigger tests (requires ANTHROPIC_API_KEY)
107
- guild eval --suggest # Description improvement suggestions
108
- guild workspace init # Create a multi-repo workspace
109
- ```
112
+ Claude Code's memory system stores long-term knowledge (who you are, lessons learned). But it explicitly excludes ephemeral work state — what you were building, which branch, what phase. That's the gap Guild fills.
110
113
 
111
- ## Skill Evaluations
114
+ `/guild:session-end` writes to **both layers**:
115
+ - **SESSION.md** — where you stopped: task, branch, phase, next steps (overwritten each session)
116
+ - **Claude Code memory** — what you learned: decisions, lessons, references (persists across sessions)
112
117
 
113
- Guild includes a built-in framework for measuring skill quality:
118
+ `/guild:session-start` reads from **both** and presents a unified summary.
114
119
 
115
- - **Structural evals** -- assert workflow structure: steps exist, roles are correct, gates are present
116
- - **Trigger tests** -- verify that user prompts route to the correct skill
117
- - **Semantic matcher** -- LLM-based scoring for higher-fidelity trigger testing
118
- - **Benchmarks** -- rolling history with per-skill accuracy, precision, recall, and regression detection
120
+ ## When NOT to Use Guild
119
121
 
120
- ## How It Works
122
+ Guild is overkill for: throwaway scripts, exploratory prototypes, single-file utilities, or anything where you'd rather ship fast than ship right. The 5-phase pipeline has cost — use it when that cost buys you something.
121
123
 
122
- Guild installs agent definitions and skill workflows as markdown files in your project's `.claude/` directory. Claude Code discovers and executes them natively — no custom runtime, no extra process, no API calls. When you type `/build-feature`, Claude Code reads the skill, follows the phases, and spawns agents using its own Agent tool.
124
+ ## How Is This Different?
123
125
 
124
- Guild defines **what** happens. Claude Code decides **how** to execute it.
126
+ Guild isn't an editor (Cursor), isn't a pair-programmer (Aider), isn't an autonomous agent (Devin). It's a **methodology layer** on top of Claude Code. If you already use Claude Code and want structured spec-first workflows with quality gates, Guild adds the discipline. If you don't use Claude Code, Guild isn't for you.
125
127
 
126
- ## Session Continuity
128
+ ## How It Works
127
129
 
128
- Claude Code's native memory system remembers who you are, lessons learned, and project contextknowledge that lasts months. But it explicitly does not store ephemeral work state: what you were building, which branch, what phase, what's next. That's the gap Guild fills.
130
+ Guild installs role definitions and skill workflows as markdown files in `.claude/`. Claude Code discovers and executes them natively no custom runtime, no extra process, no API calls. When you type `/guild:build-feature`, Claude Code reads the skill, follows the phases, and spawns agents using its own Agent tool.
129
131
 
130
- `/session-end` writes to **both layers**:
132
+ Guild defines **what** happens. Claude Code decides **how** to execute it.
131
133
 
132
- - **SESSION.md** — where you stopped: task, branch, phase, next steps (overwritten each session)
133
- - **Claude Code memory** — what you learned: decisions, lessons, references (persists across sessions)
134
+ ## CLI
134
135
 
135
- `/session-start` reads from **both** and presents a unified summary. You resume exactly where you left off, with full context of what you know and what you were doing.
136
+ ```bash
137
+ guild init # Interactive project onboarding
138
+ guild new-agent <name> # Create a custom role
139
+ guild status # Show project status
140
+ guild doctor # Diagnose setup
141
+ guild list # List roles and skills
142
+ guild eval # Run structural skill evaluations
143
+ guild workspace init # Create a multi-repo workspace
144
+ ```
136
145
 
137
146
  ## Guild Builds Itself
138
147
 
139
- Every feature in Guild goes through the same spec-first pipeline that Guild installs in your project. Guild's own design decisions live in `docs/specs/`.
140
-
141
- ## Requirements
142
-
143
- - Node.js >= 20
144
- - Claude Code
145
- - `gh` CLI (optional, for GitHub integration)
148
+ Every feature in Guild goes through the same spec-first pipeline that Guild installs in your project.
146
149
 
147
150
  ## Contributing
148
151
 
149
- See [CONTRIBUTING.md](.github/CONTRIBUTING.md) for setup, branching, and contribution guidelines.
152
+ See [CONTRIBUTING.md](.github/CONTRIBUTING.md).
150
153
 
151
154
  ## License
152
155
 
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "guild-agents",
3
- "version": "2.0.0",
3
+ "version": "2.1.1",
4
4
  "description": "Specification-driven development CLI for Claude Code — think before you build",
5
5
  "type": "module",
6
6
  "files": [