@the-bearded-bear/claude-craft 5.6.0-next.8bfdffa → 5.7.0-next.bf058ea

@@ -171,12 +171,62 @@ Summary:
171
171
  - Metrics exported: .ralph/sessions/.../metrics-export.json
172
172
  ```
173
173
 
174
+ ## Agent Teams Coordination Mode
175
+
176
+ When operating in Agent Teams mode (activated via `--use-teams` on `/common:ralph-sprint`), the conductor takes on the role of **team lead** and coordinates a dev teammate through the Claude Code Agent Teams API instead of bash process management.
177
+
178
+ ### Prerequisites
179
+
180
+ - `CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1` environment variable
181
+ - Claude Code v2.1.32+
182
+ - Adapter library: `Tools/AgentTeams/lib/ralph-teams-adapter.sh`
183
+
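The `v2.1.32+` prerequisite amounts to a numeric, component-wise version comparison. A minimal sketch follows, assuming plain dot-separated numeric versions; `version_at_least` is a hypothetical helper, not part of the adapter library, and how the installed version string is obtained is left out.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from the package): succeeds when <version> >= <minimum>.
# Components are compared numerically, left to right; missing components count as 0.
# Usage: version_at_least <version> <minimum>
version_at_least() {
  local have="${1#v}" need="${2#v}" h n
  while [ -n "$have" ] || [ -n "$need" ]; do
    h="${have%%.*}"; n="${need%%.*}"
    h="${h:-0}"; n="${n:-0}"
    if [ "$h" -gt "$n" ]; then return 0; fi
    if [ "$h" -lt "$n" ]; then return 1; fi
    # Drop the component that was just compared.
    case "$have" in *.*) have="${have#*.}" ;; *) have="" ;; esac
    case "$need" in *.*) need="${need#*.}" ;; *) need="" ;; esac
  done
  return 0
}
```

A numeric comparison matters here: a plain string comparison would rank `2.1.9` above `2.1.32`.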
184
+ ### Coordination via Task System
185
+
186
+ In Agent Teams mode, the conductor replaces PID-based tracking with the shared task system:
187
+
188
+ | Bash Mode (current) | Agent Teams Mode |
189
+ |---------------------|-----------------|
190
+ | `spawn_ralph_for_story()` with bash `&` | `TaskCreate` + `SendMessage` to dev teammate |
191
+ | `kill -0 $pid` polling | `TaskList` / `TaskCompleted` hook |
192
+ | PID-based completion detection | `TaskUpdate(status=completed)` by dev |
193
+ | `kill -9` for stuck processes | `SendMessage(type=shutdown_request)` + watchdog fallback |
194
+ | `yq` writes to `batch-queue.yaml` | Shared `TaskList` (built-in coordination) |
195
+
196
+ ### Story Processing Flow
197
+
198
+ 1. **Claim story**: Conductor reads `sprint-status.yaml`, claims next `ready-for-dev` story
199
+ 2. **Create task**: `TaskCreate` with story details, acceptance criteria, and TDD instructions
200
+ 3. **Assign to dev**: `SendMessage(type=message, recipient=dev-1)` with the story prompt
201
+ 4. **Monitor progress**: Poll `TaskList` for status updates from the dev teammate
202
+ 5. **Handle completion**: When dev marks task as `completed`, conductor transitions story to `review`
203
+ 6. **Handle failure**: If dev reports failure or watchdog detects a stall, conductor applies recovery strategy
204
+ 7. **Next story**: Assign next ready story or send `shutdown_request` if sprint is complete
205
+
206
+ ### Watchdog Integration
207
+
208
+ The conductor runs periodic health checks through the adapter's `teams_watchdog()`:
209
+
210
+ - **Check interval**: Every 60 seconds (configurable via `TEAMS_WATCHDOG_INTERVAL`)
211
+ - **Timeout threshold**: 5 minutes of no activity (configurable via `TEAMS_WATCHDOG_TIMEOUT`)
212
+ - **Stall action**: Mark teammate as stalled, trigger `teams_fallback_sequential()`, reprocess story through existing `execute_story_with_ralph()`
213
+
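The stall check described above reduces to comparing the time since the last observed activity against the timeout. The sketch below is an illustration only: `is_stalled` is a hypothetical helper, and treating both variables as seconds (60 and 300) is an assumption based on the documented "60 seconds" and "5 minutes" defaults. The real `teams_watchdog()` may work differently.

```bash
#!/usr/bin/env bash
# Defaults mirror the documented values; seconds as the unit is an assumption.
TEAMS_WATCHDOG_INTERVAL="${TEAMS_WATCHDOG_INTERVAL:-60}"
TEAMS_WATCHDOG_TIMEOUT="${TEAMS_WATCHDOG_TIMEOUT:-300}"

# Hypothetical helper: exit 0 when the teammate counts as stalled.
# Usage: is_stalled <last_activity_epoch> [now_epoch]
is_stalled() {
  local last_activity="$1"
  local now="${2:-$(date +%s)}"
  [ $(( now - last_activity )) -ge "$TEAMS_WATCHDOG_TIMEOUT" ]
}
```

A conductor loop would call such a check once per `TEAMS_WATCHDOG_INTERVAL` and, on a stall, hand the story to `teams_fallback_sequential()`.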
214
+ ### Keeping Bash Mode Intact
215
+
216
+ All existing bash-mode orchestration remains unchanged. Agent Teams mode is activated only when all of the following conditions hold:
217
+ 1. The `--use-teams` flag is passed to `/common:ralph-sprint`
218
+ 2. The `CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1` env var is set
219
+ 3. The adapter library is available
220
+
221
+ If any of these conditions is not met, the conductor operates exactly as before.
222
+
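The three activation conditions can be read as a single gate. The following is a sketch under stated assumptions: `teams_mode_enabled` and the `USE_TEAMS` variable are hypothetical names chosen for illustration, and only the environment variable and adapter path come from the text above.

```bash
#!/usr/bin/env bash
# Hypothetical gate (not the adapter's API): succeeds only if all three
# documented conditions hold.
# Usage: teams_mode_enabled <use_teams_flag: 0|1> <adapter_path>
teams_mode_enabled() {
  local use_teams_flag="$1"
  local adapter_path="$2"

  # 1. --use-teams was passed (represented here as a 0/1 flag)
  [ "$use_teams_flag" = "1" ] || return 1
  # 2. The experimental environment variable is set to 1
  [ "${CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS:-0}" = "1" ] || return 1
  # 3. The adapter library is readable
  [ -r "$adapter_path" ] || return 1
  return 0
}

# USE_TEAMS is an assumed stand-in for the parsed --use-teams flag.
if teams_mode_enabled "${USE_TEAMS:-0}" "Tools/AgentTeams/lib/ralph-teams-adapter.sh"; then
  echo "mode: agent-teams"
else
  echo "mode: bash"
fi
```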
174
223
  ## Integration Points
175
224
 
176
225
  - Works with `/common:ralph-run` command
177
226
  - Integrates with Claude Code 2.1.23+ hooks
178
227
  - Compatible with `/project:sprint-dev` workflow
179
228
  - Uses `@tdd-coach` principles
229
+ - Agent Teams mode via `/common:ralph-sprint --use-teams`
180
230
 
181
231
  ## When to Stop
182
232
 
@@ -30,6 +30,50 @@ For each detected technology:
30
30
  1. Load rules from `.claude/rules/`
31
31
  2. Apply specific audit
32
32
 
33
+ ### Step 1.5: Prepare Isolated Output Directories
34
+
35
+ When **2 or more technologies** are detected, each audit agent MUST write its results to an isolated directory to prevent file write conflicts during parallel execution:
36
+
37
+ ```
38
+ .audit-output/
39
+ {technology}-{category}/
40
+ result.json # Structured audit result
41
+ tool-output.log # Raw tool output
42
+ ```
43
+
44
+ For example, a Symfony + React project creates:
45
+ ```
46
+ .audit-output/
47
+ symfony-architecture/result.json
48
+ symfony-code-quality/result.json
49
+ symfony-testing/result.json
50
+ symfony-security/result.json
51
+ react-architecture/result.json
52
+ react-code-quality/result.json
53
+ react-testing/result.json
54
+ react-security/result.json
55
+ ```
56
+
57
+ Each `result.json` follows this schema:
58
+ ```json
59
+ {
60
+ "tech": "symfony",
61
+ "category": "architecture",
62
+ "score": 22,
63
+ "max": 25,
64
+ "findings": [
65
+ {
66
+ "severity": "warning",
67
+ "file": "src/Controller/UserController.php",
68
+ "message": "Direct repository access from controller",
69
+ "rule": "clean-architecture-layer-violation"
70
+ }
71
+ ]
72
+ }
73
+ ```
74
+
75
+ **Single-technology projects**: When only 1 technology is detected, isolation is not required. Results can be written directly without the `.audit-output/` directory structure.
76
+
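The directory layout in Step 1.5 could be prepared with a short loop. This is a minimal sketch: `prepare_audit_dirs` is a hypothetical helper, and the four category names are taken from the Symfony + React example above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: create one isolated directory per technology/category pair.
# Usage: prepare_audit_dirs <output_root> <tech>...
prepare_audit_dirs() {
  local root="$1"; shift
  local tech category

  # Single-technology projects do not need isolation.
  if [ "$#" -lt 2 ]; then
    return 0
  fi

  for tech in "$@"; do
    for category in architecture code-quality testing security; do
      mkdir -p "$root/$tech-$category"
    done
  done
}
```

Calling `prepare_audit_dirs .audit-output symfony react` would create the eight directories listed in the example, while a single technology leaves the file system untouched.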
33
77
  ### Step 2: Audit by Technology
34
78
 
35
79
  For EACH detected technology, verify:
@@ -84,6 +128,28 @@ docker compose exec node npm run lint
84
128
  docker compose exec node npm run test -- --coverage
85
129
  ```
86
130
 
131
+ ### Step 3.5: Merge Audit Results
132
+
133
+ When using isolated output directories (2+ technologies), collect and merge all results before scoring:
134
+
135
+ 1. **Read all `result.json` files** from `.audit-output/*/result.json`
136
+ 2. **Group by technology**: Combine the 4 category results per technology
137
+ 3. **Deduplicate findings**: Remove duplicate findings that appear across categories (e.g., a file flagged in both architecture and code quality)
138
+ 4. **Resolve conflicts**: If the same file is scored differently by two categories, use the lower (more critical) score
139
+ 5. **Produce merged result** for each technology with all 4 category scores
140
+
141
+ Merge can be done using the result aggregator script if available:
142
+ ```bash
143
+ # If result-aggregator.sh is available
144
+ Tools/AgentTeams/lib/result-aggregator.sh \
145
+ --input-dir .audit-output \
146
+ --output-file .audit-output/merged-report.json
147
+ ```
148
+
149
+ Or manually by reading each `result.json` and aggregating in memory.
150
+
151
+ **Single-technology projects**: Skip this step (no merge needed).
152
+
87
153
  ### Step 4: Calculate Scores
88
154
 
89
155
  For each technology, calculate:
@@ -1,6 +1,6 @@
1
1
  ---
2
2
  description: Run autonomous sprint conductor for overnight/unattended sprint execution
3
- argument-hint: <sprint-name> [--overnight|--parallel N|--supervised|--max-stories N]
3
+ argument-hint: <sprint-name> [--overnight|--parallel N|--supervised|--max-stories N|--use-teams]
4
4
  ---
5
5
 
6
6
  # Ralph Sprint - Autonomous Sprint Conductor (ASC)
@@ -17,6 +17,7 @@ Execute an entire sprint autonomously with minimal human intervention. The Auton
17
17
  - `--supervised`: Pause before each story for confirmation
18
18
  - `--max-stories N`: Maximum stories to process (default: 10)
19
19
  - `--timeout H`: Maximum runtime in hours (default: 12)
20
+ - `--use-teams`: Use Agent Teams mode (2-agent prototype: conductor + 1 dev)
20
21
 
21
22
  ## Key Features
22
23
 
@@ -163,6 +164,51 @@ parallel:
163
164
  mem_percent: 80 # Don't spawn if memory > 80%
164
165
  ```
165
166
 
167
+ ## Agent Teams Mode
168
+
169
+ When `--use-teams` is specified, the ASC uses Claude Code Agent Teams (v2.1.32+) instead of bash-based parallel processing. This is a prototype mode limited to a 2-agent team (conductor + 1 dev).
170
+
171
+ ### Prerequisites
172
+
173
+ - Claude Code v2.1.32 or later
174
+ - Environment variable: `CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1`
175
+ - The adapter library: `Tools/AgentTeams/lib/ralph-teams-adapter.sh`
176
+
177
+ ### How It Works
178
+
179
+ 1. **Conductor as Team Lead**: The `@ralph-conductor` agent acts as the team lead, coordinating work via the shared task system
180
+ 2. **Dev Teammate**: A single `@dev` agent receives story assignments via `SendMessage` and reports completion via `TaskUpdate`
181
+ 3. **Shared Tasks**: Stories are managed through `TaskCreate`/`TaskUpdate` instead of PID-based tracking
182
+ 4. **Watchdog Recovery**: If the dev teammate is unresponsive for 5 minutes, the adapter triggers a fallback to sequential processing
183
+
184
+ ```
185
+ Sprint Conductor (Team Lead)
186
+ |
187
+ +-- TaskCreate: US-001 story task
188
+ +-- SendMessage: assign to dev-1
189
+ |
190
+ +-- dev-1 implements US-001 (TDD: Red -> Green -> Refactor)
191
+ +-- dev-1 marks TaskUpdate: completed
192
+ |
193
+ +-- Conductor transitions story to review
194
+ +-- Conductor assigns next story (or shuts down teammate)
195
+ ```
196
+
197
+ ### Fallback Behavior
198
+
199
+ - Without `--use-teams`: behavior is unchanged (bash-based parallel with `--parallel N` or sequential)
200
+ - With `--use-teams` but without `CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1`: the adapter falls back to sequential processing automatically
201
+ - If the dev teammate stalls (no response for 5 minutes): the watchdog triggers `teams_fallback_sequential()` and the story is reprocessed sequentially
202
+
203
+ ### Limitations (Prototype)
204
+
205
+ | Constraint | Detail |
206
+ |-----------|--------|
207
+ | Team size | Fixed at 2 agents (conductor + 1 dev) |
208
+ | Concurrency | Processes 1 story at a time through the dev teammate |
209
+ | API stability | Agent Teams is a Research Preview feature |
210
+ | No persistent teams | Teams are per-session only |
211
+
166
212
  ## Quick Start Examples
167
213
 
168
214
  ```bash
@@ -180,6 +226,12 @@ parallel:
180
226
 
181
227
  # Limited run (5 stories, 4 hours)
182
228
  /common:ralph-sprint "Sprint 3" --max-stories 5 --timeout 4
229
+
230
+ # Agent Teams mode (2-agent prototype)
231
+ /common:ralph-sprint "Sprint 3" --use-teams
232
+
233
+ # Agent Teams overnight
234
+ /common:ralph-sprint "Sprint 3" --use-teams --overnight
183
235
  ```
184
236
 
185
237
  ## Configuration
@@ -0,0 +1,310 @@
1
+ ---
2
+ description: Full Audit Team - Parallel multi-technology audit using Agent Teams
3
+ argument-hint: [--techs=auto|tech1,tech2] [--max-workers=4] [--output-dir=<path>] [--dry-run] [--skip-aggregation]
4
+ ---
5
+
6
+ # Full Audit Team - Parallel Multi-Technology Audit
7
+
8
+ Orchestrate a parallel full-audit across multiple technology stacks using Claude Code Agent Teams (v2.1.32+). Spawns a lead agent (opus) plus N stack-auditor workers (haiku), one per detected technology stack, up to a configurable maximum.
9
+
10
+ ## Arguments
11
+
12
+ $ARGUMENTS
13
+
14
+ - `--techs=auto`: Auto-detect technologies (default). Or specify comma-separated: `--techs=symfony,react`
15
+ - `--max-workers=4`: Maximum parallel auditor workers (default: 4, max: 4)
16
+ - `--output-dir=<path>`: Custom output directory for audit results
17
+ - `--dry-run`: Show team composition and estimated cost without executing
18
+ - `--skip-aggregation`: Output per-stack results without merging
19
+
20
+ ## Prerequisites
21
+
22
+ - Claude Code v2.1.32+ with Agent Teams support
23
+ - `CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1` environment variable set
24
+ - Project with 2+ detected technology stacks (single-stack projects should use sequential `/common:full-audit`)
25
+ - `Tools/AgentTeams/lib/compatibility-check.sh` available
26
+ - `Tools/AgentTeams/lib/result-aggregator.sh` available
27
+ - `Tools/AgentTeams/lib/cost-estimator.sh` available
28
+
29
+ ## When to Use (vs. Sequential Audit)
30
+
31
+ | Condition | Use Team Audit | Use Sequential `/common:full-audit` |
32
+ |-----------|---------------|--------------------------------------|
33
+ | 1 technology stack | No | Yes |
34
+ | 2+ technology stacks | Yes | Also valid (simpler, cheaper) |
35
+ | Time-sensitive | Yes (~1.6-2.5x speedup) | No |
36
+ | Budget-constrained | No (+20-35% token overhead) | Yes |
37
+
38
+ **Break-even**: Parallelization benefits emerge at 2+ stacks. For a single stack, coordination overhead exceeds time saved.
39
+
40
+ ## Process
41
+
42
+ ### Step 1: Technology Detection
43
+
44
+ ```
45
+ Audit Leader (opus)
46
+ |
47
+ v
48
+ Scan project root for technology markers:
49
+ composer.json + symfony/* -> Symfony
50
+ pubspec.yaml + flutter: -> Flutter
51
+ pyproject.toml / requirements -> Python
52
+ package.json + react -> React
53
+ package.json + react-native -> React Native
54
+ package.json + @angular/core -> Angular
55
+ package.json + vue -> Vue.js
56
+ artisan + laravel/* -> Laravel
57
+ *.csproj + dotnet -> C#/.NET
58
+ composer.json (no symfony) -> PHP
59
+ ```
60
+
61
+ If `--techs=auto`, detect all. If explicit, validate specified stacks exist.
62
+
63
+ **Decision gate**: If only 1 technology detected, fall back to sequential `/common:full-audit` (no team overhead needed).
64
+
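Marker-based detection and the decision gate can be sketched as follows. This covers only a subset of the markers in Step 1 (Laravel, Angular, Vue.js and .NET are omitted). The function names, the `grep` patterns, and reading "requirements" as `requirements.txt` are all assumptions made for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print one detected technology per line for a project root.
detect_techs() {
  local root="$1"

  if [ -f "$root/composer.json" ]; then
    if grep -q '"symfony/' "$root/composer.json"; then
      echo "symfony"
    else
      echo "php"   # composer.json without symfony/* packages
    fi
  fi

  if [ -f "$root/pubspec.yaml" ] && grep -q 'flutter:' "$root/pubspec.yaml"; then
    echo "flutter"
  fi

  # Assumption: "requirements" in the marker list means requirements.txt.
  if [ -f "$root/pyproject.toml" ] || [ -f "$root/requirements.txt" ]; then
    echo "python"
  fi

  if [ -f "$root/package.json" ]; then
    # Check react-native first, since such projects also depend on react.
    if grep -q '"react-native"' "$root/package.json"; then
      echo "react-native"
    elif grep -q '"react"' "$root/package.json"; then
      echo "react"
    fi
  fi
}

# Decision gate: 0 stacks aborts, 1 falls back to sequential, 2+ uses the team.
choose_audit_mode() {
  local count
  count="$(detect_techs "$1" | wc -l | tr -d ' ')"
  if [ "$count" -eq 0 ]; then
    echo "abort"
  elif [ "$count" -eq 1 ]; then
    echo "sequential"
  else
    echo "team"
  fi
}
```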
65
+ ### Step 2: Compatibility Check
66
+
67
+ Before spawning workers, validate each auditor agent against role requirements:
68
+
69
+ ```bash
70
+ # For each detected stack, verify the reviewer agent has required tools
71
+ Tools/AgentTeams/lib/compatibility-check.sh \
72
+ --agent Dev/i18n/en/<Tech>/agents/<tech>-reviewer.md \
73
+ --require-tools Read,Glob,Grep,Bash \
74
+ --require-model haiku
75
+ ```
76
+
77
+ If any agent fails compatibility, log a warning and exclude that stack from parallel execution (fall back to leader handling it sequentially).
78
+
79
+ ### Step 3: Cost Estimation
80
+
81
+ Before spawning the team, estimate token costs:
82
+
83
+ ```bash
84
+ Tools/AgentTeams/lib/cost-estimator.sh \
85
+ --team-size <N+1> \
86
+ --lead-model opus \
87
+ --worker-model haiku \
88
+ --task-type audit \
89
+ --stacks <detected_count>
90
+ ```
91
+
92
+ Display estimated cost to user. In `--dry-run` mode, stop here.
93
+
94
+ ### Step 4: Team Spawn (Fan-Out)
95
+
96
+ ```
97
+ Audit Leader (opus) — coordinates via TaskCreate/SendMessage
98
+ |
99
+ +-- [Parallel Workers - max 4] --------+
100
+ | stack-auditor-1 (haiku): Symfony |
101
+ | stack-auditor-2 (haiku): React |
102
+ | stack-auditor-3 (haiku): Python |
103
+ | stack-auditor-4 (haiku): Angular |
104
+ +---------------------------------------+
105
+ ```
106
+
107
+ **Team creation pattern:**
108
+
109
+ 1. Leader creates isolated output directories per worker (one per stack)
110
+ 2. Leader creates tasks via `TaskCreate` for each stack audit:
111
+ - Task subject: `Audit <TechName> stack`
112
+ - Task description: includes check-architecture, check-code-quality, check-testing, check-security, check-compliance instructions
113
+ - Each task specifies its isolated output path
114
+ 3. Workers claim tasks via `TaskUpdate` (status: in_progress)
115
+ 4. Workers write results to their isolated directory only
116
+
117
+ **Worker instructions** (per stack):
118
+
119
+ Each worker executes the 4 audit categories sequentially within its stack:
120
+
121
+ | Category (points) | What to Check |
121
+ |-------------------|---------------|
123
+ | Architecture (25pts) | Layer separation, dependency direction, folder conventions, no framework coupling |
124
+ | Code Quality (25pts) | Naming standards, linting, type hints, documentation, complexity < 10 |
125
+ | Testing (25pts) | Coverage >= 80%, unit tests, integration tests, E2E tests, test pyramid |
126
+ | Security (25pts) | No secrets, input validation, OWASP, encryption, dependency CVEs |
127
+
128
+ Workers run Docker-based diagnostic commands per stack:
129
+
130
+ ```bash
131
+ # Symfony
132
+ docker compose exec php php bin/console lint:container
133
+ docker compose exec php vendor/bin/phpstan analyse
134
+ docker compose exec php vendor/bin/phpunit --coverage-text
135
+ docker compose exec php composer audit
136
+
137
+ # React
138
+ docker compose exec node npm run lint
139
+ docker compose exec node npm run test -- --coverage
140
+ docker compose exec node npm audit
141
+
142
+ # Python
143
+ docker compose exec app ruff check .
144
+ docker compose exec app mypy .
145
+ docker compose exec app pytest --cov
146
+ docker compose exec app pip-audit
147
+
148
+ # Flutter
149
+ docker run --rm -v "$(pwd)":/app -w /app dart dart analyze
150
+ docker run --rm -v "$(pwd)":/app -w /app dart flutter test --coverage
151
+ ```
152
+
153
+ Each worker writes `result.json` to its isolated output directory:
154
+
155
+ ```json
156
+ {
157
+ "tech": "symfony",
158
+ "score": 82,
159
+ "architecture": { "score": 22, "findings": [...] },
160
+ "code_quality": { "score": 20, "findings": [...] },
161
+ "testing": { "score": 18, "findings": [...] },
162
+ "security": { "score": 22, "findings": [...] }
163
+ }
164
+ ```
165
+
166
+ ### Step 5: Sync Barrier
167
+
168
+ Leader waits for all worker tasks to reach `completed` status via `TaskList` polling. If a worker exceeds its timeout (5 minutes per stack), leader marks it as failed and proceeds with partial results.
169
+
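The barrier is a poll-until-done loop with a deadline. The sketch below shows only that pattern, using a generic check command. The source polls `TaskList`, which is not a shell command, so `wait_for` and the idea of checking for a worker's `result.json` are illustrative assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll a check command until it succeeds or the timeout elapses.
# Usage: wait_for <timeout_seconds> <poll_interval_seconds> <command> [args...]
wait_for() {
  local timeout="$1" interval="$2"; shift 2
  local deadline=$(( $(date +%s) + timeout ))
  while ! "$@"; do
    if [ "$(date +%s)" -ge "$deadline" ]; then
      return 1   # timed out: the caller marks this worker as failed
    fi
    sleep "$interval"
  done
  return 0
}
```

For example, `wait_for 300 5 test -f "$worker_dir/result.json"` would wait up to 5 minutes for one worker's output (`worker_dir` being an assumed variable). A non-zero return lets the leader continue with partial results.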
170
+ ### Step 6: Result Aggregation
171
+
172
+ Leader runs the result aggregator:
173
+
174
+ ```bash
175
+ Tools/AgentTeams/lib/result-aggregator.sh \
176
+ --input-dir <isolated-output-root> \
177
+ --output-file audit-report.json
178
+ ```
179
+
180
+ The aggregator:
181
+ - Collects all `result.json` files from isolated directories
182
+ - Deduplicates findings (same file + same message = duplicate)
183
+ - Resolves score conflicts via weighted average
184
+ - Produces unified report
185
+
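The deduplication rule (same file plus same message counts as a duplicate) can be illustrated on a flattened findings list. This is a sketch, not the behavior of `result-aggregator.sh`: the tab-separated input format and the `dedupe_findings` name are assumptions, and keeping the first occurrence is one possible policy.

```bash
#!/usr/bin/env bash
# Hypothetical helper: drop duplicate findings read from stdin.
# Assumed input: one finding per line as "file<TAB>message<TAB>category".
# Two findings are duplicates when file and message match; the first one is kept.
dedupe_findings() {
  awk -F '\t' '!seen[$1 FS $2]++'
}

printf 'src/A.php\tDirect repository access\tarchitecture\nsrc/A.php\tDirect repository access\tcode_quality\n' \
  | dedupe_findings
```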
186
+ ### Step 7: Report Generation
187
+
188
+ Leader generates the formatted multi-technology audit report:
189
+
190
+ ```
191
+ ================================================================
192
+ MULTI-TECHNOLOGY AUDIT (Agent Teams) - Global Score: XX/100
193
+ ================================================================
194
+
195
+ Detected technologies: [list]
196
+ Team size: 1 leader + N workers
197
+ Execution mode: Parallel
198
+ Date: YYYY-MM-DD
199
+
200
+ ----------------------------------------------------------------
201
+ SYMFONY - Score: XX/100
202
+ ----------------------------------------------------------------
203
+
204
+ Architecture (XX/25)
205
+ [PASS] Clean Architecture respected
206
+ [PASS] CQRS implemented correctly
207
+ [WARN] 2 services directly access Repository
208
+
209
+ Code Quality (XX/25)
210
+ [PASS] PHPStan level 8 - 0 errors
211
+ [WARN] 5 methods > 20 lines
212
+
213
+ Testing (XX/25)
214
+ [PASS] Coverage: 85%
215
+ [WARN] No Panther E2E tests
216
+
217
+ Security (XX/25)
218
+ [PASS] No secrets in code
219
+ [WARN] Dependency with minor CVE
220
+
221
+ ----------------------------------------------------------------
222
+ REACT - Score: XX/100
223
+ ----------------------------------------------------------------
224
+
225
+ [Same structure per technology]
226
+
227
+ ================================================================
228
+ GLOBAL SUMMARY
229
+ ================================================================
230
+
231
+ | Technology | Architecture | Code | Tests | Security | Total |
232
+ |------------|-------------|------|-------|----------|-------|
233
+ | Symfony | XX/25 | XX/25| XX/25 | XX/25 | XX/100|
234
+ | React | XX/25 | XX/25| XX/25 | XX/25 | XX/100|
235
+ | AVERAGE | XX/25 | XX/25| XX/25 | XX/25 | XX/100|
236
+
237
+ ================================================================
238
+ TOP 5 PRIORITY ACTIONS
239
+ ================================================================
240
+
241
+ 1. [CRITICAL] Action description
242
+ -> Impact: +X points | Effort: Low/Medium/High
243
+
244
+ 2. [HIGH] Action description
245
+ -> Impact: +X points | Effort: Low/Medium/High
246
+
247
+ ================================================================
248
+ EXECUTION METRICS
249
+ ================================================================
250
+
251
+ | Metric | Value |
252
+ |--------|-------|
253
+ | Total time | Xs (vs ~Ys sequential) |
254
+ | Speedup | ~X.Xx |
255
+ | Total tokens | ~XK |
256
+ | Token overhead vs sequential | +XX% |
257
+ | Workers spawned | N |
258
+ | Workers completed | N |
259
+ | Workers failed | 0 |
260
+ ```
261
+
262
+ ### Step 8: Cleanup
263
+
264
+ Leader sends `shutdown_request` to all workers and cleans up isolated output directories (unless `--keep-artifacts` is specified).
265
+
266
+ ## Scoring Rules
267
+
268
+ Same as `/common:full-audit`:
269
+
270
+ | Violation | Points Lost |
271
+ |-----------|-------------|
272
+ | Architectural pattern violated | -5 |
273
+ | Framework/domain coupling | -3 |
274
+ | Critical linting error | -2 |
275
+ | Linting warning | -1 |
276
+ | Method > 30 lines | -1 |
277
+ | Coverage < 80% | -5 |
278
+ | No domain unit tests | -5 |
279
+ | Secret in code | -10 |
280
+ | Critical CVE vulnerability | -10 |
281
+ | High CVE vulnerability | -5 |
282
+
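As a worked example of the penalty table, each category starts at 25 points and loses the listed points per violation, and a stack total is the sum of its four categories. The helper names are hypothetical, and flooring a category at 0 is an assumption, since the source does not say how totals below zero are handled.

```bash
#!/usr/bin/env bash
# Hypothetical helper: subtract penalties from a category maximum (floored at 0).
# Usage: category_score <max> <penalty>...
category_score() {
  local score="$1"; shift
  local penalty
  for penalty in "$@"; do
    score=$(( score - penalty ))
  done
  if [ "$score" -lt 0 ]; then score=0; fi
  echo "$score"
}

# Hypothetical helper: sum the four category scores of one stack.
stack_total() {
  local total=0 s
  for s in "$@"; do
    total=$(( total + s ))
  done
  echo "$total"
}

# Architecture with one framework/domain coupling violation (-3).
category_score 25 3
# Category scores from the result.json example in Step 4.
stack_total 22 20 18 22
```

The second call reproduces the `"score": 82` shown in the per-stack `result.json` example.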
283
+ ## Performance Expectations
284
+
285
+ | Stacks | Sequential Estimate | Team Estimate | Speedup | Token Overhead |
286
+ |--------|--------------------|--------------:|---------|----------------|
287
+ | 2 | ~4 min | ~2.5 min | ~1.6x | +20% |
288
+ | 3 | ~6 min | ~3 min | ~2x | +25% |
289
+ | 4 | ~8 min | ~3.5 min | ~2.3x | +30% |
290
+ | 5+ | ~10+ min | ~4 min | ~2.5x | +35% |
291
+
292
+ **Note**: These are realistic estimates accounting for coordination overhead (agent spawn ~5-10s, task assignment, result aggregation). Do not expect linear speedup.
293
+
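The speedup column is the sequential estimate divided by the team estimate. A small sketch that recomputes it from the table's time estimates (`speedup` is a hypothetical helper):

```bash
#!/usr/bin/env bash
# Hypothetical helper: speedup = sequential time / team time, one decimal place.
# Usage: speedup <sequential_minutes> <team_minutes>
speedup() {
  LC_ALL=C awk -v seq="$1" -v team="$2" 'BEGIN { printf "%.1f\n", seq / team }'
}

speedup 4 2.5    # 2 stacks
speedup 6 3      # 3 stacks
speedup 8 3.5    # 4 stacks
speedup 10 4     # 5+ stacks
```

The ratio grows more slowly than the stack count, which is the sub-linear behavior the note describes.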
294
+ ## Error Handling
295
+
296
+ | Error | Recovery |
297
+ |-------|----------|
298
+ | Worker timeout (>5min) | Leader marks failed, proceeds with partial results |
299
+ | Worker crash | Leader logs error, excludes stack from report |
300
+ | Docker not available | Worker reports error, leader falls back to source-only analysis |
301
+ | No technologies detected | Abort with clear message |
302
+ | Single technology only | Fall back to sequential `/common:full-audit` |
303
+ | Compatibility check fails | Exclude stack from parallel, leader handles sequentially |
304
+
305
+ ## Limitations
306
+
307
+ - Maximum 4 parallel workers (coordination overhead dominates beyond this)
308
+ - Token cost is ~20-35% higher than sequential due to context duplication per worker
309
+ - Requires Agent Teams Research Preview (API may change)
310
+ - Each worker loads project context independently (~10-20K tokens overhead each)