@the-bearded-bear/claude-craft (npm) 5.6.0 → 5.7.0-next.bf058ea

Files changed (14)

package/Dev/i18n/en/Common/agents/ralph-conductor.md +50 -0
package/Dev/i18n/en/Common/commands/full-audit.md +66 -0
package/Dev/i18n/en/Common/commands/ralph-sprint.md +53 -1
package/Dev/i18n/en/Common/commands/team-audit.md +310 -0
package/Dev/i18n/en/Common/commands/team-security.md +325 -0
package/Dev/i18n/en/Common/commands/team-sprint.md +243 -0
package/README.md +10 -0
package/Tools/AgentTeams/lib/compatibility-check.sh +883 -0
package/Tools/AgentTeams/lib/cost-dashboard.sh +353 -0
package/Tools/AgentTeams/lib/cost-estimator.sh +528 -0
package/Tools/AgentTeams/lib/ralph-teams-adapter.sh +705 -0
package/Tools/AgentTeams/lib/result-aggregator.sh +427 -0
package/Tools/Ralph/lib/parallel-manager.sh +204 -18
package/package.json +1 -1

package/Dev/i18n/en/Common/agents/ralph-conductor.md CHANGED

@@ -171,12 +171,62 @@ Summary:
 - Metrics exported: .ralph/sessions/.../metrics-export.json
 ```
+## Agent Teams Coordination Mode
+When operating in Agent Teams mode (activated via `--use-teams` on `/common:ralph-sprint`), the conductor takes on the role of **team lead** and coordinates a dev teammate through the Claude Code Agent Teams API instead of bash process management.
+### Prerequisites
+- `CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1` environment variable
+- Claude Code v2.1.32+
+- Adapter library: `Tools/AgentTeams/lib/ralph-teams-adapter.sh`
+### Coordination via Task System
+In Agent Teams mode, the conductor replaces PID-based tracking with the shared task system:
+| Bash Mode (current) | Agent Teams Mode |
+|---------------------|-----------------|
+| `spawn_ralph_for_story()` with bash `&` | `TaskCreate` + `SendMessage` to dev teammate |
+| `kill -0 $pid` polling | `TaskList` / `TaskCompleted` hook |
+| PID-based completion detection | `TaskUpdate(status=completed)` by dev |
+| `kill -9` for stuck processes | `SendMessage(type=shutdown_request)` + watchdog fallback |
+| `yq` writes to `batch-queue.yaml` | Shared `TaskList` (built-in coordination) |
+### Story Processing Flow
+1. **Claim story**: Conductor reads `sprint-status.yaml`, claims next `ready-for-dev` story
+2. **Create task**: `TaskCreate` with story details, acceptance criteria, and TDD instructions
+3. **Assign to dev**: `SendMessage(type=message, recipient=dev-1)` with the story prompt
+4. **Monitor progress**: Poll `TaskList` for status updates from the dev teammate
+5. **Handle completion**: When dev marks task as `completed`, conductor transitions story to `review`
+6. **Handle failure**: If dev reports failure or watchdog detects a stall, conductor applies recovery strategy
+7. **Next story**: Assign next ready story or send `shutdown_request` if sprint is complete
+### Watchdog Integration
+The conductor runs periodic health checks through the adapter's `teams_watchdog()`:
+- **Check interval**: Every 60 seconds (configurable via `TEAMS_WATCHDOG_INTERVAL`)
+- **Timeout threshold**: 5 minutes of no activity (configurable via `TEAMS_WATCHDOG_TIMEOUT`)
+- **Stall action**: Mark teammate as stalled, trigger `teams_fallback_sequential()`, reprocess story through existing `execute_story_with_ralph()`
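A minimal sketch of the stall test such a watchdog might apply. It assumes `TEAMS_WATCHDOG_TIMEOUT` is expressed in seconds (the diff does not state the unit) and that the last-activity time is available as an epoch timestamp. `is_stalled` is a hypothetical helper, not a function from `ralph-teams-adapter.sh`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: not part of ralph-teams-adapter.sh.
# Assumption: TEAMS_WATCHDOG_TIMEOUT is in seconds (default 300 = 5 minutes).
TEAMS_WATCHDOG_TIMEOUT="${TEAMS_WATCHDOG_TIMEOUT:-300}"

# is_stalled LAST_ACTIVITY_EPOCH [NOW_EPOCH]
# Succeeds (exit 0) when the teammate has been idle for at least the timeout.
is_stalled() {
  local last_activity="$1"
  local now="${2:-$(date +%s)}"
  (( now - last_activity >= TEAMS_WATCHDOG_TIMEOUT ))
}

# Example: last activity was 301 seconds before "now".
if is_stalled 1000 1301; then
  echo "stalled: fall back to sequential processing"
fi
```

Passing the current time as an optional second argument keeps the check easy to exercise without waiting for real time to pass.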
+### Keeping Bash Mode Intact
+All existing bash-mode orchestration remains unchanged. The Agent Teams mode is activated only when:
+1. The `--use-teams` flag is passed to `/common:ralph-sprint`
+2. The `CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1` env var is set
+3. The adapter library is available
+Without these conditions, the conductor operates exactly as before.
 ## Integration Points
 - Works with `/common:ralph-run` command
 - Integrates with Claude Code 2.1.23+ hooks
 - Compatible with `/project:sprint-dev` workflow
 - Uses `@tdd-coach` principles
+- Agent Teams mode via `/common:ralph-sprint --use-teams`
 ## When to Stop

package/Dev/i18n/en/Common/commands/full-audit.md CHANGED

@@ -30,6 +30,50 @@ For each detected technology:
 1. Load rules from `.claude/rules/`
 2. Apply specific audit
+### Step 1.5: Prepare Isolated Output Directories
+When **2 or more technologies** are detected, each audit agent MUST write its results to an isolated directory to prevent file write conflicts during parallel execution:
+```
+.audit-output/
+  {technology}-{category}/
+    result.json          # Structured audit result
+    tool-output.log      # Raw tool output
+```
+For example, a Symfony + React project creates:
+```
+.audit-output/
+  symfony-architecture/result.json
+  symfony-code-quality/result.json
+  symfony-testing/result.json
+  symfony-security/result.json
+  react-architecture/result.json
+  react-code-quality/result.json
+  react-testing/result.json
+  react-security/result.json
+```
+Each `result.json` follows this schema:
+```json
+{
+  "tech": "symfony",
+  "category": "architecture",
+  "score": 22,
+  "max": 25,
+  "findings": [
+    {
+      "severity": "warning",
+      "file": "src/Controller/UserController.php",
+      "message": "Direct repository access from controller",
+      "rule": "clean-architecture-layer-violation"
+    }
+  ]
+}
+```
+**Single-technology projects**: When only 1 technology is detected, isolation is not required. Results can be written directly without the `.audit-output/` directory structure.
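The layout described in Step 1.5 could be prepared along these lines. This is a sketch only: `prepare_audit_dirs` is a hypothetical helper, and the four category names are taken from the Symfony + React example rather than from any script in the package.

```bash
#!/usr/bin/env bash
# Hypothetical helper: creates one directory per technology/category pair.
# Category names are assumed from the Symfony + React example in Step 1.5.
# Usage: prepare_audit_dirs ROOT TECH [TECH...]
prepare_audit_dirs() {
  local root="$1"; shift
  # Single-technology projects do not need isolation.
  if (( $# < 2 )); then
    return 0
  fi
  local tech category
  for tech in "$@"; do
    for category in architecture code-quality testing security; do
      mkdir -p "$root/.audit-output/$tech-$category"
    done
  done
}
```

For example, `prepare_audit_dirs . symfony react` would create the eight directories listed in the example, while `prepare_audit_dirs . symfony` creates nothing.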
 ### Step 2: Audit by Technology
 For EACH detected technology, verify:
@@ -84,6 +128,28 @@ docker compose exec node npm run lint
 docker compose exec node npm run test -- --coverage
 ```
+### Step 3.5: Merge Audit Results
+When using isolated output directories (2+ technologies), collect and merge all results before scoring:
+1. **Read all `result.json` files** from `.audit-output/*/result.json`
+2. **Group by technology**: Combine the 4 category results per technology
+3. **Deduplicate findings**: Remove duplicate findings that appear across categories (e.g., a file flagged in both architecture and code quality)
+4. **Resolve conflicts**: If the same file is scored differently by two categories, use the lower (more critical) score
+5. **Produce merged result** for each technology with all 4 category scores
+Merge can be done using the result aggregator script if available:
+```bash
+# If result-aggregator.sh is available
+Tools/AgentTeams/lib/result-aggregator.sh \
+  --input-dir .audit-output \
+  --output-file .audit-output/merged-report.json
+```
+Or manually by reading each `result.json` and aggregating in memory.
+**Single-technology projects**: Skip this step (no merge needed).
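The "group by technology" part of Step 3.5 might look like the sketch below when done by hand. Both functions are hypothetical and are not taken from `result-aggregator.sh`. The technology name is recovered by stripping a known category suffix, which assumes the four category names from Step 1.5.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; not the logic of result-aggregator.sh.

# tech_of_dir NAME: prints the technology part of "{technology}-{category}".
tech_of_dir() {
  local name="$1" category
  for category in architecture code-quality testing security; do
    if [[ "$name" == *"-$category" ]]; then
      echo "${name%-"$category"}"
      return 0
    fi
  done
  return 1   # not an audit output directory
}

# list_techs ROOT: prints each technology found under ROOT/.audit-output once.
list_techs() {
  local dir
  for dir in "$1"/.audit-output/*/; do
    [[ -d "$dir" ]] || continue
    tech_of_dir "$(basename "$dir")" || true
  done | sort -u
}
```

Stripping the suffix rather than splitting on the first hyphen keeps hyphenated technology names such as `react-native` intact.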
 ### Step 4: Calculate Scores
 For each technology, calculate:

package/Dev/i18n/en/Common/commands/ralph-sprint.md CHANGED

@@ -1,6 +1,6 @@
 ---
 description: Run autonomous sprint conductor for overnight/unattended sprint execution
-argument-hint: <sprint-name> [--overnight|--parallel N|--supervised|--max-stories N]
+argument-hint: <sprint-name> [--overnight|--parallel N|--supervised|--max-stories N|--use-teams]
 ---
 # Ralph Sprint - Autonomous Sprint Conductor (ASC)
@@ -17,6 +17,7 @@ Execute an entire sprint autonomously with minimal human intervention. The Auton
 - `--supervised`: Pause before each story for confirmation
 - `--max-stories N`: Maximum stories to process (default: 10)
 - `--timeout H`: Maximum runtime in hours (default: 12)
+- `--use-teams`: Use Agent Teams mode (2-agent prototype: conductor + 1 dev)
 ## Key Features
@@ -163,6 +164,51 @@ parallel:
     mem_percent: 80      # Don't spawn if memory > 80%
 ```
+## Agent Teams Mode
+When `--use-teams` is specified, the ASC uses Claude Code Agent Teams (v2.1.32+) instead of bash-based parallel processing. This is a prototype mode limited to a 2-agent team (conductor + 1 dev).
+### Prerequisites
+- Claude Code v2.1.32 or later
+- Environment variable: `CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1`
+- The adapter library: `Tools/AgentTeams/lib/ralph-teams-adapter.sh`
+### How It Works
+1. **Conductor as Team Lead**: The `@ralph-conductor` agent acts as the team lead, coordinating work via the shared task system
+2. **Dev Teammate**: A single `@dev` agent receives story assignments via `SendMessage` and reports completion via `TaskUpdate`
+3. **Shared Tasks**: Stories are managed through `TaskCreate`/`TaskUpdate` instead of PID-based tracking
+4. **Watchdog Recovery**: If the dev teammate is unresponsive for 5 minutes, the adapter triggers a fallback to sequential processing
+```
+Sprint Conductor (Team Lead)
+  |
+  +-- TaskCreate: US-001 story task
+  +-- SendMessage: assign to dev-1
+  |
+  +-- dev-1 implements US-001 (TDD: Red -> Green -> Refactor)
+  +-- dev-1 marks TaskUpdate: completed
+  |
+  +-- Conductor transitions story to review
+  +-- Conductor assigns next story (or shuts down teammate)
+```
+### Fallback Behavior
+- Without `--use-teams`: behavior is unchanged (bash-based parallel with `--parallel N` or sequential)
+- With `--use-teams` but without `CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1`: the adapter falls back to sequential processing automatically
+- If the dev teammate stalls (no response for 5 minutes): the watchdog triggers `teams_fallback_sequential()` and the story is reprocessed sequentially
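The first two fallback rules can be read as a small decision function. The sketch below is illustrative only: `select_sprint_mode` is a hypothetical name, and the case of a missing adapter library is not covered because this section does not specify it.

```bash
#!/usr/bin/env bash
# Hypothetical helper illustrating the fallback rules; not the adapter's code.
# Usage: select_sprint_mode USE_TEAMS_FLAG
#   USE_TEAMS_FLAG is "1" when --use-teams was passed, anything else otherwise.
select_sprint_mode() {
  local use_teams="$1"
  if [[ "$use_teams" != "1" ]]; then
    echo "bash"          # unchanged behavior: --parallel N or sequential
  elif [[ "${CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS:-}" != "1" ]]; then
    echo "sequential"    # flag given, but the feature is not enabled
  else
    echo "teams"         # conductor + 1 dev teammate
  fi
}
```

The third rule (a stalled teammate) is a runtime event rather than a startup decision, so it is left out of this function.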
+### Limitations (Prototype)
+| Constraint | Detail |
+|-----------|--------|
+| Team size | Fixed at 2 agents (conductor + 1 dev) |
+| Stories per run | Processes 1 story at a time through the dev teammate |
+| API stability | Agent Teams is a Research Preview feature |
+| No persistent teams | Teams are per-session only |
 ## Quick Start Examples
 ```bash
@@ -180,6 +226,12 @@ parallel:
 # Limited run (5 stories, 4 hours)
 /common:ralph-sprint "Sprint 3" --max-stories 5 --timeout 4
+# Agent Teams mode (2-agent prototype)
+/common:ralph-sprint "Sprint 3" --use-teams
+# Agent Teams overnight
+/common:ralph-sprint "Sprint 3" --use-teams --overnight
 ```
 ## Configuration

package/Dev/i18n/en/Common/commands/team-audit.md ADDED

@@ -0,0 +1,310 @@
+---
+description: Full Audit Team - Parallel multi-technology audit using Agent Teams
+argument-hint: [--techs=auto|tech1,tech2] [--max-workers=4]
+---
+# Full Audit Team - Parallel Multi-Technology Audit
+Orchestrate a parallel full-audit across multiple technology stacks using Claude Code Agent Teams (v2.1.32+). Spawns a lead agent (opus) plus N stack-auditor workers (haiku), one per detected technology stack, up to a configurable maximum.
+## Arguments
+$ARGUMENTS
+- `--techs=auto`: Auto-detect technologies (default). Or specify comma-separated: `--techs=symfony,react`
+- `--max-workers=4`: Maximum parallel auditor workers (default: 4, max: 4)
+- `--output-dir=<path>`: Custom output directory for audit results
+- `--dry-run`: Show team composition and estimated cost without executing
+- `--skip-aggregation`: Output per-stack results without merging
+## Prerequisites
+- Claude Code v2.1.32+ with Agent Teams support
+- `CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1` environment variable set
+- Project with 2+ detected technology stacks (single-stack projects should use sequential `/common:full-audit`)
+- `Tools/AgentTeams/lib/compatibility-check.sh` available
+- `Tools/AgentTeams/lib/result-aggregator.sh` available
+- `Tools/AgentTeams/lib/cost-estimator.sh` available
+## When to Use (vs. Sequential Audit)
+| Condition | Use Team Audit | Use Sequential `/common:full-audit` |
+|-----------|---------------|--------------------------------------|
+| 1 technology stack | No | Yes |
+| 2+ technology stacks | Yes | Also valid (simpler, cheaper) |
+| Time-sensitive | Yes (2-3x speedup) | No |
+| Budget-constrained | No (+20-35% token overhead) | Yes |
+**Break-even**: Parallelization benefits emerge at 2+ stacks. For a single stack, coordination overhead exceeds time saved.
+## Process
+### Step 1: Technology Detection
+```
+Audit Leader (opus)
+  |
+  v
+Scan project root for technology markers:
+  composer.json + symfony/*      -> Symfony
+  pubspec.yaml + flutter:        -> Flutter
+  pyproject.toml / requirements  -> Python
+  package.json + react           -> React
+  package.json + react-native    -> React Native
+  package.json + @angular/core   -> Angular
+  package.json + vue             -> Vue.js
+  artisan + laravel/*            -> Laravel
+  *.csproj + dotnet              -> C#/.NET
+  composer.json (no symfony)     -> PHP
+```
+If `--techs=auto`, detect all. If explicit, validate specified stacks exist.
+**Decision gate**: If only 1 technology detected, fall back to sequential `/common:full-audit` (no team overhead needed).
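A rough sketch of marker-based detection followed by the decision gate. It covers only four of the listed markers, and the `grep` patterns are assumptions about how each marker might be matched. The actual detection is performed by the audit leader and is not shown in this diff. `detect_stacks` and `choose_audit_mode` are hypothetical names.

```bash
#!/usr/bin/env bash
# Hypothetical sketch covering four of the listed markers.
# Usage: detect_stacks PROJECT_ROOT   (prints one detected stack per line)
detect_stacks() {
  local root="$1"
  if [[ -f "$root/composer.json" ]] && grep -q '"symfony/' "$root/composer.json"; then
    echo "symfony"
  fi
  if [[ -f "$root/pubspec.yaml" ]] && grep -q 'flutter:' "$root/pubspec.yaml"; then
    echo "flutter"
  fi
  if [[ -f "$root/pyproject.toml" ]] || compgen -G "$root/requirements*.txt" > /dev/null; then
    echo "python"
  fi
  if [[ -f "$root/package.json" ]] && grep -q '"react"' "$root/package.json"; then
    echo "react"
  fi
}

# Decision gate: team mode is only used with 2 or more stacks.
choose_audit_mode() {
  local count
  count="$(detect_stacks "$1" | wc -l | tr -d ' ')"
  if (( count == 0 )); then
    echo "abort"         # no technologies detected
  elif (( count == 1 )); then
    echo "sequential"    # fall back to /common:full-audit
  else
    echo "team"
  fi
}
```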
+### Step 2: Compatibility Check
+Before spawning workers, validate each auditor agent against role requirements:
+```bash
+# For each detected stack, verify the reviewer agent has required tools
+Tools/AgentTeams/lib/compatibility-check.sh \
+  --agent Dev/i18n/en/<Tech>/agents/<tech>-reviewer.md \
+  --require-tools Read,Glob,Grep,Bash \
+  --require-model haiku
+```
+If any agent fails compatibility, log a warning and exclude that stack from parallel execution (fall back to leader handling it sequentially).
+### Step 3: Cost Estimation
+Before spawning the team, estimate token costs:
+```bash
+Tools/AgentTeams/lib/cost-estimator.sh \
+  --team-size <N+1> \
+  --lead-model opus \
+  --worker-model haiku \
+  --task-type audit \
+  --stacks <detected_count>
+```
+Display estimated cost to user. In `--dry-run` mode, stop here.
+### Step 4: Team Spawn (Fan-Out)
+```
+Audit Leader (opus) — coordinates via TaskCreate/SendMessage
+  |
+  +-- [Parallel Workers - max 4] --------+
+  |   stack-auditor-1 (haiku): Symfony     |
+  |   stack-auditor-2 (haiku): React       |
+  |   stack-auditor-3 (haiku): Python      |
+  |   stack-auditor-4 (haiku): Angular     |
+  +---------------------------------------+
+```
+**Team creation pattern:**
+1. Leader creates isolated output directories per worker (one per stack)
+2. Leader creates tasks via `TaskCreate` for each stack audit:
+   - Task subject: `Audit <TechName> stack`
+   - Task description: includes check-architecture, check-code-quality, check-testing, check-security, check-compliance instructions
+   - Each task specifies its isolated output path
+3. Workers claim tasks via `TaskUpdate` (status: in_progress)
+4. Workers write results to their isolated directory only
+**Worker instructions** (per stack):
+Each worker executes the 4 audit categories sequentially within its stack:
+| Category | Points | What to Check |
+|----------|--------|---------------|
+| Architecture | 25 | Layer separation, dependency direction, folder conventions, no framework coupling |
+| Code Quality | 25 | Naming standards, linting, type hints, documentation, complexity < 10 |
+| Testing | 25 | Coverage >= 80%, unit tests, integration tests, E2E tests, test pyramid |
+| Security | 25 | No secrets, input validation, OWASP, encryption, dependency CVEs |
+Workers run Docker-based diagnostic commands per stack:
+```bash
+# Symfony
+docker compose exec php php bin/console lint:container
+docker compose exec php vendor/bin/phpstan analyse
+docker compose exec php vendor/bin/phpunit --coverage-text
+docker compose exec php composer audit
+# React
+docker compose exec node npm run lint
+docker compose exec node npm run test -- --coverage
+docker compose exec node npm audit
+# Python
+docker compose exec app ruff check .
+docker compose exec app mypy .
+docker compose exec app pytest --cov
+docker compose exec app pip-audit
+# Flutter
+docker run --rm -v "$(pwd)":/app -w /app dart dart analyze
+docker run --rm -v "$(pwd)":/app -w /app dart flutter test --coverage
+```
+Each worker writes `result.json` to its isolated output directory:
+```json
+{
+  "tech": "symfony",
+  "score": 82,
+  "architecture": { "score": 22, "findings": [...] },
+  "code_quality": { "score": 20, "findings": [...] },
+  "testing": { "score": 18, "findings": [...] },
+  "security": { "score": 22, "findings": [...] }
+}
+```
+### Step 5: Sync Barrier
+Leader waits for all worker tasks to reach `completed` status via `TaskList` polling. If a worker exceeds its timeout (5 minutes per stack), leader marks it as failed and proceeds with partial results.
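The barrier itself is a poll-until-deadline loop. The sketch below swaps in a different completion signal: instead of polling `TaskList`, which is not a shell API, it treats the presence of `result.json` in a worker directory as "completed". That substitution and the name `wait_for_results` are assumptions made for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for the TaskList-based barrier: completion is inferred
# from the presence of result.json in each worker directory.
# Usage: wait_for_results TIMEOUT_SECONDS POLL_SECONDS DIR [DIR...]
# On timeout, prints the directories still missing result.json and returns 1.
wait_for_results() {
  local timeout="$1" poll="$2"; shift 2
  local deadline=$(( $(date +%s) + timeout ))
  local dir pending
  while true; do
    pending=()
    for dir in "$@"; do
      [[ -f "$dir/result.json" ]] || pending+=("$dir")
    done
    if (( ${#pending[@]} == 0 )); then
      return 0                        # all workers finished
    fi
    if (( $(date +%s) >= deadline )); then
      printf '%s\n' "${pending[@]}"   # timed-out workers; continue with partial results
      return 1
    fi
    sleep "$poll"
  done
}
```

With the 5-minute per-stack timeout from this step, a call might look like `wait_for_results 300 5 .audit-output/symfony .audit-output/react`, where the directory names are placeholders.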
+### Step 6: Result Aggregation
+Leader runs the result aggregator:
+```bash
+Tools/AgentTeams/lib/result-aggregator.sh \
+  --input-dir <isolated-output-root> \
+  --output-file audit-report.json
+```
+The aggregator:
+- Collects all `result.json` files from isolated directories
+- Deduplicates findings (same file + same message = duplicate)
+- Resolves score conflicts via weighted average
+- Produces unified report
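The deduplication rule (same file + same message = duplicate) can be illustrated on a flattened list of findings. This sketch assumes the findings have already been extracted to tab-separated lines of severity, file, and message. `dedupe_findings` is a hypothetical name, and `result-aggregator.sh` itself is not shown in this section.

```bash
#!/usr/bin/env bash
# Hypothetical sketch; not the implementation in result-aggregator.sh.
# Input (stdin): tab-separated lines "severity<TAB>file<TAB>message".
# Output: the first occurrence of each (file, message) pair, in input order.
dedupe_findings() {
  awk -F '\t' '!seen[$2 FS $3]++'
}
```

Because the key ignores severity, a finding reported as both a warning and an error for the same file and message keeps whichever severity appeared first.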
+### Step 7: Report Generation
+Leader generates the formatted multi-technology audit report:
+```
+================================================================
+MULTI-TECHNOLOGY AUDIT (Agent Teams) - Global Score: XX/100
+================================================================
+Detected technologies: [list]
+Team size: 1 leader + N workers
+Execution mode: Parallel
+Date: YYYY-MM-DD
+----------------------------------------------------------------
+SYMFONY - Score: XX/100
+----------------------------------------------------------------
+Architecture (XX/25)
+  [PASS] Clean Architecture respected
+  [PASS] CQRS implemented correctly
+  [WARN] 2 services directly access Repository
+Code Quality (XX/25)
+  [PASS] PHPStan level 8 - 0 errors
+  [WARN] 5 methods > 20 lines
+Testing (XX/25)
+  [PASS] Coverage: 85%
+  [WARN] No Panther E2E tests
+Security (XX/25)
+  [PASS] No secrets in code
+  [WARN] Dependency with minor CVE
+----------------------------------------------------------------
+REACT - Score: XX/100
+----------------------------------------------------------------
+[Same structure per technology]
+================================================================
+GLOBAL SUMMARY
+================================================================
+| Technology | Architecture | Code | Tests | Security | Total |
+|------------|-------------|------|-------|----------|-------|
+| Symfony    | XX/25       | XX/25| XX/25 | XX/25    | XX/100|
+| React      | XX/25       | XX/25| XX/25 | XX/25    | XX/100|
+| AVERAGE    | XX/25       | XX/25| XX/25 | XX/25    | XX/100|
+================================================================
+TOP 5 PRIORITY ACTIONS
+================================================================
+1. [CRITICAL] Action description
+   -> Impact: +X points | Effort: Low/Medium/High
+2. [HIGH] Action description
+   -> Impact: +X points | Effort: Low/Medium/High
+================================================================
+EXECUTION METRICS
+================================================================
+| Metric | Value |
+|--------|-------|
+| Total time | Xs (vs ~Ys sequential) |
+| Speedup | ~X.Xx |
+| Total tokens | ~XK |
+| Token overhead vs sequential | +XX% |
+| Workers spawned | N |
+| Workers completed | N |
+| Workers failed | 0 |
+```
+### Step 8: Cleanup
+Leader sends `shutdown_request` to all workers and cleans up isolated output directories (unless `--keep-artifacts` is specified).
+## Scoring Rules
+Same as `/common:full-audit`:
+| Violation | Points Lost |
+|-----------|-------------|
+| Architectural pattern violated | -5 |
+| Framework/domain coupling | -3 |
+| Critical linting error | -2 |
+| Linting warning | -1 |
+| Method > 30 lines | -1 |
+| Coverage < 80% | -5 |
+| No domain unit tests | -5 |
+| Secret in code | -10 |
+| Critical CVE vulnerability | -10 |
+| High CVE vulnerability | -5 |
+## Performance Expectations
+| Stacks | Sequential Estimate | Team Estimate | Speedup | Token Overhead |
+|--------|--------------------|--------------:|---------|----------------|
+| 2 | ~4 min | ~2.5 min | ~1.6x | +20% |
+| 3 | ~6 min | ~3 min | ~2x | +25% |
+| 4 | ~8 min | ~3.5 min | ~2.3x | +30% |
+| 5+ | ~10+ min | ~4 min | ~2.5x | +35% |
+**Note**: These are realistic estimates accounting for coordination overhead (agent spawn ~5-10s, task assignment, result aggregation). Do not expect linear speedup.
+## Error Handling
+| Error | Recovery |
+|-------|----------|
+| Worker timeout (>5min) | Leader marks failed, proceeds with partial results |
+| Worker crash | Leader logs error, excludes stack from report |
+| Docker not available | Worker reports error, leader falls back to source-only analysis |
+| No technologies detected | Abort with clear message |
+| Single technology only | Fall back to sequential `/common:full-audit` |
+| Compatibility check fails | Exclude stack from parallel, leader handles sequentially |
+## Limitations
+- Maximum 4 parallel workers (coordination overhead dominates beyond this)
+- Token cost is ~20-35% higher than sequential due to context duplication per worker
+- Requires Agent Teams Research Preview (API may change)
+- Each worker loads project context independently (~10-20K tokens overhead each)