npm - @yemi33/squad - Versions diffs - 0.1.0 - Mend

@yemi33/squad 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (32) hide show

package/LICENSE +21 -0
package/README.md +483 -0
package/TODO.md +60 -0
package/agents/dallas/charter.md +55 -0
package/agents/lambert/charter.md +66 -0
package/agents/ralph/charter.md +44 -0
package/agents/rebecca/charter.md +56 -0
package/agents/ripley/charter.md +46 -0
package/bin/squad.js +164 -0
package/config.template.json +53 -0
package/dashboard.html +1680 -0
package/dashboard.js +886 -0
package/docs/auto-discovery.md +414 -0
package/docs/blog-first-successful-dispatch.md +127 -0
package/docs/self-improvement.md +277 -0
package/engine/ado-mcp-wrapper.js +49 -0
package/engine/spawn-agent.js +98 -0
package/engine.js +3416 -0
package/package.json +46 -0
package/playbooks/build-and-test.md +155 -0
package/playbooks/explore.md +63 -0
package/playbooks/fix.md +57 -0
package/playbooks/implement.md +84 -0
package/playbooks/plan-to-prd.md +74 -0
package/playbooks/review.md +68 -0
package/playbooks/test.md +75 -0
package/playbooks/work-item.md +74 -0
package/routing.md +29 -0
package/skills/README.md +72 -0
package/skills/ado-pr-status-fetch.md +18 -0
package/squad.js +300 -0
package/team.md +19 -0

package/docs/auto-discovery.md ADDED Viewed

@@ -0,0 +1,414 @@
+# Auto-Discovery & Execution Pipeline
+How the squad engine finds work and dispatches agents automatically.
+## The Tick Loop
+The engine runs a tick every 60 seconds (configurable via `config.json` → `engine.tickInterval`). Each tick:
+```
+tick()
+  1. checkTimeouts()       Kill stale/hung agents (>heartbeatTimeout)
+  2. consolidateInbox()    Merge learnings into notes.md (at 5+ inbox files)
+  2.5 runCleanup()         Periodic cleanup (every 10 ticks ≈ 5min)
+  2.6 pollPrStatus()       Poll ADO for build, review, merge status (every 6 ticks ≈ 3min)
+  3. discoverWork()        Scan ALL linked projects for new tasks
+  4. updateSnapshot()      Write identity/now.md
+  5. dispatch              Spawn agents for pending items (up to maxConcurrent)
+```
+## Work Discovery
+`discoverWork()` iterates every project in `config.projects[]` and runs three core discovery sources. Results are prioritized: fixes > reviews > implements > work-items.
+Before scanning, the engine materializes plans and specs into project work items (side-effect writes to `work-items.json`), so they're picked up by the work items source below.
+### Source 1: Pull Requests (`discoverFromPrs`)
+**Reads:** `<project>/.squad/pull-requests.json`
+| PR State | Action | Dispatch Type |
+|----------|--------|---------------|
+| Squad review pending/waiting | Queue a code review | `review` |
+| Squad review `changes-requested` | Route back to author for fixes | `fix` |
+| `buildStatus: "failing"` | Route to any agent for build fix | `fix` |
+| No `buildTested` flag | Queue build & test verification | `test` |
+Skips PRs where `status !== "active"`.
+**Build & Test auto-dispatch:** When a PR is first created (synced from agent output), it has no `buildTested` field. The engine dispatches a test agent to check out the branch, build, run tests, and if it's a webapp, start a local dev server. If build/tests fail, the agent auto-files a high-priority fix work item. The PR is marked `buildTested: 'dispatched'` to prevent re-dispatch.
+### Source 2: PRD Gap Analysis (`discoverFromPrd`)
+**Reads:** `<project>/docs/prd-gaps.json` (or custom path from `workSources.prd.path`)
+| Item State | Action | Dispatch Type |
+|------------|--------|---------------|
+| `status: "missing"` or `"planned"` | Queue implementation | `implement` |
+| `estimated_complexity: "large"` | Routes to `implement:large` (prefers Rebecca) | `implement:large` |
+### Source 3: Per-Project Work Items (`discoverFromWorkItems`)
+**Reads:** `<project>/.squad/work-items.json`
+| Item State | Action | Dispatch Type |
+|------------|--------|---------------|
+| `status: "queued"` or `"pending"` | Queue based on item's `type` field | Item's `type` (default: `implement`) |
+After dispatching, the engine writes `status: "dispatched"` back to the item. The agent is scoped to the specific project directory.
+### Source 4: Central Work Items (`discoverCentralWorkItems`)
+**Reads:** `~/.squad/work-items.json` (central, project-agnostic)
+These are tasks where the agent decides which project to work in. The engine builds a prompt that includes a list of all linked projects with their paths and repo config. The agent then navigates to the appropriate project directory based on the task.
+| Item State | Action | Dispatch Type |
+|------------|--------|---------------|
+| `status: "queued"` or `"pending"` | Queue based on item's `type` field | Item's `type` (default: `implement`) |
+**How it differs from per-project work items:**
+- No `cwd` is set — agent starts in the squad directory and navigates itself
+- The prompt includes all project paths and **descriptions** so the agent can choose
+- No branch/worktree is pre-created — agent handles this
+- Useful for cross-project tasks, exploratory work, or when you don't know which repo is relevant
+**Project descriptions drive routing.** Each project in `config.json` has a `description` field. The engine injects these into the central work item prompt:
+```
+### MyProject
+- **Path:** C:/Users/you/MyProject
+- **Repo:** org/project/MyProject
+- **What it is:** Description from config.json...
+```
+Better descriptions → better agent routing. Describe what each repo contains, what kind of work happens there, and what technologies it uses.
+**Cross-repo tasks.** Central work items can span multiple repositories. The agent's prompt instructs it to:
+1. Determine all repos affected by the task
+2. Work on each sequentially (worktree → commit → push → PR per repo)
+3. Note cross-repo dependencies in PR descriptions (e.g., "Requires MyProject PR #456")
+4. Use the correct repo config (org, project, repoId) for each repo
+5. Document which repos were touched in the learnings file
+This means a single work item like "Add telemetry to the document creation pipeline" can result in PRs across multiple repos if the agent determines the change touches shared modules in one repo and the frontend in another.
+**Adding central work items:**
+- Dashboard Command Center → type your intent (no `#project` = central queue)
+- CLI: `node engine.js work "task title"` (defaults to central queue)
+- Direct edit: `~/.squad/work-items.json`
+### Materialization: Specs and Plans → Work Items
+Before the 3 core sources run, the engine materializes indirect sources into work items:
+**Specs** (`materializeSpecsAsWorkItems`): When a PR merges that added/modified `.md` files under `docs/` (configurable via `workSources.specs.filePatterns`), the engine reads each doc and checks for `type: spec` in its frontmatter. Only docs with this marker are treated as actionable specs — regular documentation is ignored. For matching docs, it extracts title/summary/priority and creates implementation work items with `createdBy: 'engine:spec-discovery'`. State tracked in `.squad/spec-tracker.json` to avoid re-processing merged PRs.
+Example spec frontmatter:
+```markdown
+---
+type: spec
+title: Add user authentication
+priority: high
+---
+# Add user authentication
+...
+```
+**Plans** (`materializePlansAsWorkItems`): Scans `~/.squad/plans/*.json` for plan files with `missing`/`planned` items. Creates work items in the target project's queue with `createdBy: 'engine:plan-discovery'`. Deduped via `sourcePlan` + `sourcePlanItem` fields.
+Both write to `work-items.json` and are picked up by Source 3 on the same or next tick.
+## ADO PR Status Polling (`pollPrStatus`)
+**Runs:** Every 6 ticks (≈ 3 minutes), independently of work discovery. Replaces the retired agent-based `pr-sync`.
+The engine directly polls the Azure DevOps REST API for **all** PR metadata: build/CI status, human review votes, and completion state. Two API calls per PR — no agent dispatch needed.
+**Per PR:**
+1. `GET pullrequests/{id}` → `status` (active/completed/abandoned), `mergeStatus`, `reviewers[].vote`
+2. `GET pullrequests/{id}/statuses` → CI pipeline results
+**Fields updated in `pull-requests.json`:**
+| Field | Source | Values |
+|-------|--------|--------|
+| `status` | PR details | `active` / `merged` / `abandoned` |
+| `reviewStatus` | `reviewers[].vote` | `approved` (vote ≥ 5) / `changes-requested` (-10) / `waiting` (-5) / `pending` (0) |
+| `buildStatus` | PR statuses (codecoverage/deploy/build/ci contexts) | `passing` / `failing` / `running` / `none` |
+| `buildFailReason` | Failed status description | Set on failure, cleared otherwise |
+**Auth:** Bearer token via `azureauth ado token --output token` (cached 30 minutes).
+This feeds `discoverFromPrs` — when `buildStatus` flips to `"failing"`, the next discovery tick dispatches a fix agent. When `status` becomes `"merged"`, the PR drops out of active polling.
+## Discovery Gates
+Every discovered item passes through three gates before being queued:
+```
+Item found
+  │
+  ├─ isAlreadyDispatched(key)?  → skip if already in pending or active queue
+  │    Key format: <source>-<projectName>-<itemId>
+  │    e.g., "prd-MyProject-M001", "review-MyRepo-PR-123"
+  │
+  ├─ isOnCooldown(key)?         → skip if dispatched within cooldown window
+  │    Default: 30min for PRD/PRs, 0 for work-items
+  │    Cooldowns are in-memory (reset on engine restart)
+  │
+  └─ resolveAgent(workType)?    → skip if no idle agent available
+       Checks routing.md: preferred → fallback → any idle agent
+```
+## Agent Routing
+`resolveAgent()` parses `routing.md` to pick the right agent:
+```
+routing.md table:
+  implement       → dallas  (fallback: ralph)
+  implement:large → rebecca (fallback: dallas)
+  review          → ripley  (fallback: lambert)
+  fix             → _author_ (fallback: dallas)   ← routes to PR author
+  analyze         → lambert (fallback: rebecca)
+  explore         → ripley  (fallback: rebecca)
+  test            → dallas  (fallback: ralph)
+```
+Resolution order:
+1. Check if **preferred** agent is idle → use it
+2. Check if **fallback** agent is idle → use it
+3. Check **any** agent that's idle → use it
+4. If all busy → return null, item stays undiscovered until next tick
+## Dispatch Queue
+Discovered items land in `engine/dispatch.json`:
+```json
+{
+  "pending": [ ... ],    // Waiting to be spawned
+  "active": [ ... ],     // Currently running
+  "completed": [ ... ]   // Finished (last 100 kept)
+}
+```
+Each tick, the engine checks available slots:
+```
+slotsAvailable = maxConcurrent (3) - activeCount
+```
+It takes up to `slotsAvailable` items from pending and spawns them. Items are processed in discovery-priority order (fixes first, then reviews, then implements, then work-items).
+## Execution: spawnAgent()
+When an item is dispatched:
+### 1. Resolve Project Context
+```
+meta.project.localPath → rootDir (the repo on disk)
+```
+### 2. Create Git Worktree (if task has a branch)
+```
+git worktree add <rootDir>/../worktrees/<branch> -b <branch> <mainBranch>
+```
+- Branch names are sanitized (alphanumeric, dots, hyphens, slashes only, max 200 chars)
+- If worktree fails for implement/fix → item marked as error, moved to completed
+- If worktree fails for review/analyze → falls back to rootDir (read-only tasks)
+### 3. Render Playbook
+```
+playbooks/<type>.md  →  substitute {{variables}}  →  append notes.md  →  append learnings requirement
+```
+Variables injected from config and item metadata:
+- `{{project_name}}`, `{{ado_org}}`, `{{ado_project}}`, `{{repo_name}}` — from project config
+- `{{agent_name}}`, `{{agent_id}}`, `{{agent_role}}` — from agent roster
+- `{{item_id}}`, `{{item_name}}`, `{{branch_name}}`, `{{repo_id}}` — from work item
+- `{{team_root}}` — path to central `.squad/` directory
+### 4. Build System Prompt
+Combines:
+- Agent identity (name, role, skills)
+- Agent charter (`agents/<name>/charter.md`)
+- Project context (repo name, repo host config, main branch)
+- Critical rules (worktrees, MCP tools, PowerShell, learnings)
+- Full `notes.md` content
+### 5. Spawn Claude CLI
+```bash
+claude -p <prompt-file> --system-prompt <sysprompt-file> \
+  --output-format json --max-turns 100 --verbose \
+  --allowedTools Edit,Write,Read,Bash,Glob,Grep,Agent,WebFetch,WebSearch
+```
+- Process runs in the worktree directory (or rootDir for reviews)
+- stdout/stderr captured (capped at 1MB each)
+- CLAUDECODE env vars stripped to allow nested sessions
+### 6. Track State
+- Agent status → `working` in `agents/<name>/status.json`
+- Dispatch item → moved from `pending` to `active` in `dispatch.json`
+- Process tracked in `activeProcesses` Map for timeout monitoring
+## Post-Completion
+When the claude process exits:
+```
+proc.on('close')
+  │
+  ├─ Save output to agents/<name>/output.log
+  │
+  ├─ Set agent status to "done" (exit 0) or "error" (non-zero)
+  │
+  ├─ Move dispatch item: active → completed
+  │
+  ├─ Sync PRs from output (scan for PR URLs → pull-requests.json)
+  │
+  ├─ Post-completion hooks:
+  │    review     → update PR squadReview in pull-requests.json, vote on ADO
+  │    fix        → set PR squadReview back to "waiting"
+  │    build-test → (agent auto-files fix work items on failure)
+  │
+  ├─ Check for learnings in notes/inbox/
+  │    (warns if agent didn't write findings)
+  │
+  ├─ Update agent history and metrics
+  │
+  └─ Clean up temp prompt files from engine/
+```
+## Command Center (Dashboard Input)
+The dashboard exposes a unified input box at `http://localhost:7331` that parses natural-language intent into structured work items, decisions, or PRD items.
+**Syntax:**
+| Token | Effect |
+|-------|--------|
+| `@agent` | Assigns to a specific agent (sets `item.agent`) |
+| `@everyone` | Fan-out to all agents (sets `scope: 'fan-out'`) |
+| `!high` / `!low` | Sets priority (default: medium) |
+| `/decide` | Creates a decision (appended to `notes.md`) |
+| `/prd` | Creates a PRD item (appended to `prd-gaps.json`) |
+| `#project` | Targets a specific project queue |
+Work type is auto-detected from keywords (fix, explore, test, review → implement as fallback). The `@agent` assignment flows through to the engine: `item.agent || resolveAgent(workType, config)`.
+## Data Flow Diagram
+```
+                     Dashboard Command Center
+                     (unified intent input)
+                              │
+                   ┌──────────┼──────────┐
+                   ▼          ▼          ▼
+             work-items   decisions   prd-gaps
+             .json        .md         .json
+Per-project sources:              Central engine:              Agents:
+work-items.json ──┐
+prd-gaps.json ────┤
+pull-requests.json┤  discoverWork()   dispatch.json
+docs/**/*.md (specs)┤  (each tick)      ┌──────────┐
+                  │       │           │ pending   │
+~/.squad/         │       │           │ active    │
+  work-items.json ┤       │           │ completed │
+  plans/*.md ─────┘       ▼           └─────┬────┘
+                     addToDispatch()─────────┘
+                                            │
+ADO REST API ─── pollPrBuildStatus() ──► pull-requests.json
+(every 3min)      (buildStatus field)       │
+                                       spawnAgent()
+                                            │
+                               ┌────────────┼────────────┐
+                               ▼            ▼            ▼
+                          worktree     claude CLI    status.json
+                          (in project   (max 100      (working)
+                           repo dir)     turns)
+                                            │
+                                        on exit:
+                                            │
+                        ┌───────────┬───────┼───────┬──────────┐
+                        ▼           ▼       ▼       ▼          ▼
+                   output.log  notes/  PRs    work-items  localhost
+                   (per agent) inbox/*.md  .json  .json       (if webapp,
+                                    │             (auto-filed  from build
+                          consolidateInbox()       on failure)  & test)
+                          (at 5+ files)
+                                    │
+                                    ▼
+                              notes.md
+                              (injected into
+                               all future
+                               playbooks)
+```
+## Timeout & Stale Detection
+Two layers of protection:
+**Agent timeout** (`engine.agentTimeout`, default 10min):
+- Checks `activeProcesses` Map for elapsed time
+- Sends SIGTERM, then SIGKILL after 5s
+**Stale detection** (`engine.staleThreshold`, default 30min):
+- Scans `dispatch.active` for items where `started_at` exceeds threshold
+- Catches cases where the process exited but dispatch wasn't cleaned up
+- Kills process if still tracked, marks dispatch as error, resets agent to idle
+## Cooldown Behavior
+| Source | Default Cooldown | Behavior |
+|--------|-----------------|----------|
+| PRD items | 30 minutes | After dispatching M001, won't re-discover it for 30min |
+| Pull requests | 30 minutes | After dispatching a review, won't re-queue for 30min |
+| Work items | 0 (immediate) | No cooldown, but item status changes to "dispatched" |
+Cooldowns are **in-memory only**. On engine restart, `isAlreadyDispatched()` still prevents duplicates by checking the pending/active queue in `dispatch.json`. The cooldown just prevents rapid re-discovery within a single engine session.
+## Configuration Reference
+All discovery behavior is controlled via `config.json`:
+```json
+{
+  "engine": {
+    "tickInterval": 60000,       // ms between ticks
+    "maxConcurrent": 3,          // max agents running at once
+    "agentTimeout": 600000,      // 10min — kill hung processes
+    "staleThreshold": 1800000,   // 30min — kill stale dispatches
+    "maxTurns": 100              // max claude CLI turns per agent
+  },
+  "projects": [
+    {
+      "name": "MyProject",
+      "localPath": "C:/Users/you/MyProject",
+      "workSources": {
+        "prd": {
+          "enabled": true,
+          "path": "docs/prd-gaps.json",
+          "itemFilter": { "status": ["missing", "planned"] },
+          "cooldownMinutes": 30
+        },
+        "pullRequests": {
+          "enabled": true,
+          "path": ".squad/pull-requests.json",
+          "cooldownMinutes": 30
+        },
+        "workItems": {
+          "enabled": true,
+          "path": ".squad/work-items.json",
+          "cooldownMinutes": 0
+        }
+      }
+    }
+  ]
+}
+```
+To disable a work source for a project, set `"enabled": false`. To change where the engine looks for PRD or PR files, change the `path` field (resolved relative to `localPath`).

package/docs/blog-first-successful-dispatch.md ADDED Viewed

@@ -0,0 +1,127 @@
+# Blog: First Successful Agent Dispatch — The Spawn Debugging Saga
+**Date:** 2026-03-12
+**Author:** Squad Team + Claude Opus 4.6
+**Result:** PR #4959092 created by Dallas on office-bohemia
+## The Task
+Simple ask: add a `/cowork` route to the bebop webapp, referencing an existing commit as a pattern. Created via the dashboard Command Center as a central work item (auto-route, agent decides which project).
+## What Went Wrong (7 Failed Attempts)
+### Attempt 1-2: Shell Metacharacter Explosion
+The engine spawned agents via bash:
+```bash
+claude -p "$(cat 'prompt.md')" --system-prompt "$(cat 'sysprompt.md')" ...
+```
+The prompt contained parentheses, backticks, and special characters:
+```
+Each repo has its own ADO config (org, project, repoId) — use the correct one per repo
+```
+Bash interpreted `(org, project, repoId)` as a subshell command. **Silent failure** — no error logged, agent process died instantly.
+### Attempt 3-4: Direct Spawn Without Shell
+Switched to `spawn('claude', args)` — no bash wrapper. But on Windows, `claude` is a POSIX shell script (`#!/bin/sh`), not a binary. Node's `spawn` without `shell: true` couldn't execute it. **Process returned no PID**, error callback never fired.
+### Attempt 5: Shell True With Content Args
+Added `shell: true` to the spawn. But now the `--system-prompt` arg (3KB of text with special chars) got shell-interpreted again. Same metacharacter problem, different location.
+### Attempt 6: Node Wrapper Script
+Created `spawn-agent.js` — a Node wrapper that reads files and calls claude. Engine spawns `node spawn-agent.js <files> <args>`. Wrapper uses `shell: true` to find claude binary. **Same problem** — system prompt content as a CLI arg still gets shell-expanded.
+### Attempt 7: The Actual Fix
+Realized the claude shell wrapper at `/c/.tools/.npm-global/claude` just calls:
+```sh
+exec node "$basedir/node_modules/@anthropic-ai/claude-code/cli.js" "$@"
+```
+**Solution:** Resolve the actual `cli.js` entry point and spawn it directly via `process.execPath` (the current node binary). No shell involved at all:
+```javascript
+// spawn-agent.js
+const proc = spawn(process.execPath, [claudeBin, ...cliArgs], {
+  stdio: ['pipe', 'pipe', 'pipe'],
+  env  // CLAUDECODE vars already stripped
+});
+proc.stdin.write(prompt);  // prompt via stdin — no shell interpretation
+proc.stdin.end();
+```
+## Other Issues Discovered Along the Way
+### Output Format: json vs stream-json
+`--output-format json` produces **one JSON blob at exit**. No streaming output during execution. This broke:
+- Live output in dashboard (nothing to show until agent finishes)
+- Heartbeat monitoring (no file writes to check mtime against)
+Fix: switched to `--output-format stream-json` — streams events as they happen.
+### Permission Mode
+Agents would hang waiting for permission prompts (invisible in headless mode). Added `--permission-mode bypassPermissions`.
+### CLAUDECODE Environment Variable
+Claude Code sets `CLAUDECODE` env var to prevent nested sessions. Spawned agents inherit it and refuse to start. The engine strips it from `childEnv`, but the wrapper script was using `process.env` (which re-inherits from the parent). Fixed by stripping in both places.
+### Heartbeat vs Stale Detection
+Original approach: kill agents after a fixed time threshold (staleThreshold). Problem: agents can legitimately run for hours on complex tasks.
+New approach: heartbeat based on `live-output.log` mtime. As long as the agent produces output, it's alive. If silent for 5 minutes → declared dead. Catches orphaned processes (engine restart loses process handles) and hung agents.
+### Engine Restart Orphan Problem
+When the engine restarts, the in-memory `activeProcesses` Map is lost. Active dispatch items stay in `dispatch.json` but the engine has no process handle. Old stale detection (6h threshold) was too slow to catch this. The heartbeat check catches it in 5 minutes.
+## The Successful Run
+**Dispatch 1773292681199** — Dallas, central work item, auto-route:
+1. Engine spawns `node spawn-agent.js prompt.md sysprompt.md --output-format stream-json --verbose --permission-mode bypassPermissions --mcp-config mcp-servers.json`
+2. spawn-agent.js resolves `cli.js`, spawns `node cli.js -p --system-prompt <content> ...`
+3. Prompt piped via stdin — no shell interpretation
+4. MCP servers connect (azure-ado, azure-kusto, mobile, DevBox)
+5. Dallas reads the reference commit, explores office-bohemia's route structure
+6. Creates `apps/bebop/src/routes/_mainLayout.cowork.tsx`
+7. Updates route tree
+8. Creates git worktree, commits, pushes
+9. Creates **PR #4959092** via `mcp__azure-ado__repo_create_pull_request`
+10. Posts implementation notes as PR thread comment
+11. Writes learnings to `decisions/inbox/dallas-2026-03-12.md`
+12. Exits with code 0
+**Time:** ~8 minutes from dispatch to PR created
+**Output:** 70KB of stream-json events captured in `live-output.log`
+## Architecture Lessons
+1. **Never pass user content through shell expansion.** Use stdin or direct args via Node's `spawn` (without shell).
+2. **On Windows, npm-installed CLI tools are shell wrappers.** Resolve the actual `.js` entry point and spawn via `node`.
+3. **Streaming output format is essential** for monitoring long-running agents. One-shot JSON is useless for heartbeats.
+4. **Environment variable inheritance is tricky** with nested spawns. Strip at every level.
+5. **Heartbeats > timeouts** for agents that can run for hours. Check liveness, not elapsed time.
+## The Spawn Chain (Final Working Version)
+```
+engine.js (tick loop)
+  → spawn(process.execPath, ['spawn-agent.js', prompt.md, sysprompt.md, ...args])
+    → spawn-agent.js reads files, resolves cli.js path
+      → spawn(process.execPath, ['cli.js', '-p', '--system-prompt', content, ...args])
+        → claude-code runs with prompt via stdin
+          → agent works, streams JSON events to stdout
+            → engine captures to live-output.log (heartbeat)
+            → dashboard polls /api/agent/:id/live (3s refresh)
+```
+No bash. No shell. No metacharacter interpretation. Just Node spawning Node.