@yemi33/squad 0.1.1 → 0.1.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -172,7 +172,7 @@ The web dashboard at `http://localhost:7331` provides:
 
  ## Project Config
 
- When you run `squad.js add <dir>`, it prompts for project details and saves them to `config.json`. Each project entry looks like:
+ When you run `squad add <dir>`, it prompts for project details and saves them to `config.json`. Each project entry looks like:
 
  ```json
  {
@@ -205,7 +205,7 @@ The init script also creates `<project>/.squad/` with empty `work-items.json` an
 
  ### Auto-Discovery
 
- When you run `squad.js add`, the tool automatically detects what it can from the repo:
+ When you run `squad add`, the tool automatically detects what it can from the repo:
 
  | What | How |
  |------|-----|
@@ -227,7 +227,7 @@ Agents need MCP tools to interact with your repo host (create PRs, post review c
 
  **Example:** If you use Azure DevOps, configure the `azure-ado` MCP server in your Claude Code settings. If you use GitHub, configure the `github` MCP server. Agents will discover and use whichever tools are available.
 
- Manually refresh with `node engine.js mcp-sync`.
+ Manually refresh with `squad mcp-sync`.
 
  ## Work Items
 
@@ -235,7 +235,7 @@ All work items use the shared `playbooks/work-item.md` template, which provides
 
  **Per-project** — scoped to one repo. Select a project in the Command Center dropdown.
 
- **Central (auto-route)** — agent gets all project descriptions and decides where to work. Use "Auto (agent decides)" in the dropdown, or `node engine.js work "title"`. Can span multiple repos.
+ **Central (auto-route)** — agent gets all project descriptions and decides where to work. Use "Auto (agent decides)" in the dropdown, or `squad work "title"`. Can span multiple repos.
 
  ### Fan-Out (Parallel Multi-Agent)
 
@@ -328,9 +328,10 @@ Routing rules in `routing.md`. Charters in `agents/{name}/charter.md`. Both are
  | `implement.md` | Build a PRD item in a git worktree, create PR |
  | `review.md` | Review a PR, post findings to repo host |
  | `fix.md` | Fix review feedback on existing PR branch |
- | `analyze.md` | Generate PRD gap analysis in a worktree |
  | `explore.md` | Read-only codebase exploration |
  | `test.md` | Run tests and report results |
+ | `build-and-test.md` | Build project and run test suite |
+ | `plan-to-prd.md` | Convert a plan into PRD gap items |
 
  All playbooks use `{{template_variables}}` filled from project config. The `work-item.md` playbook uses `{{scope_section}}` to inject project-specific or multi-project context. Playbooks are fully customizable — edit them to match your workflow.
 
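The README only says that playbooks use `{{template_variables}}` filled from project config. A minimal sketch of that kind of substitution, assuming a flat object of values: `fillTemplate` and the `repo_name` variable are hypothetical (only `scope_section` appears in the source), and leaving unknown placeholders untouched is an assumed policy, not documented behavior.

```javascript
// Hypothetical helper, not the package's implementation: replaces {{name}}
// placeholders with values from a flat config object.
function fillTemplate(template, vars) {
  // \w+ matches identifiers such as scope_section.
  return template.replace(/\{\{(\w+)\}\}/g, (match, key) =>
    // Unknown placeholders are kept as-is so missing config stays visible.
    Object.prototype.hasOwnProperty.call(vars, key) ? String(vars[key]) : match
  );
}

// Usage: repo_name is an illustrative variable name.
const prompt = fillTemplate(
  "Work in {{repo_name}}.\n{{scope_section}}\nOwner: {{unknown}}",
  { repo_name: "my-repo", scope_section: "Scope: one project" }
);
console.log(prompt);
```

Keeping unresolved placeholders in the output (rather than replacing them with an empty string) makes a missing config key show up in the rendered prompt instead of silently disappearing.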
@@ -355,7 +356,7 @@ Agents can run for hours as long as they're producing output. The `heartbeatTime
  | Orphaned worktrees | >24 hours old, no active dispatch references them |
  | Zombie processes | In memory but no matching dispatch |
 
- Manual cleanup: `node engine.js cleanup`
+ Manual cleanup: `squad cleanup`
 
  ## Self-Improvement Loop
 
@@ -374,7 +375,7 @@ When a reviewer flags issues, the engine creates `feedback-<author>-from-<review
  `engine/metrics.json` tracks per agent: tasks completed, errors, PRs created/approved/rejected, reviews done. Visible in CLI (`status`) and dashboard with color-coded approval rates.
 
  ### 5. Skills
- Agents save repeatable workflows to `skills/<name>.md` with Claude Code-compatible frontmatter. Engine builds an index injected into all prompts. Skills can also be stored per-project at `<project>/.claude/skills/<name>/SKILL.md` (requires a PR). Visible in dashboard alongside decisions.
+ Agents save repeatable workflows to `skills/<name>.md` with Claude Code-compatible frontmatter. Engine builds an index injected into all prompts. Skills can also be stored per-project at `<project>/.claude/skills/<name>/SKILL.md` (requires a PR). Visible in the dashboard Skills section.
 
  See `docs/self-improvement.md` for the full breakdown.
 
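The Skills paragraph above says the engine builds an index from each skill's frontmatter and injects it into prompts. A minimal sketch of such an index, assuming the frontmatter contains simple one-line `name:` and `description:` fields between `---` markers; `parseFrontmatter`, `buildSkillIndex` and the index line format are hypothetical, and anything beyond flat `key: value` pairs would need a real YAML parser.

```javascript
// Hypothetical sketch: read flat "key: value" lines from a leading --- block.
function parseFrontmatter(markdown) {
  const m = /^---\r?\n([\s\S]*?)\r?\n---/.exec(markdown);
  if (!m) return {};
  const fields = {};
  for (const line of m[1].split(/\r?\n/)) {
    const idx = line.indexOf(":");
    if (idx > 0) fields[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }
  return fields;
}

// Build one index line per skill file; falls back to the file name.
function buildSkillIndex(files) {
  return Object.entries(files)
    .map(([file, text]) => {
      const fm = parseFrontmatter(text);
      const name = fm.name || file.replace(/\.md$/, "");
      return `- ${name}: ${fm.description || "(no description)"}`;
    })
    .join("\n");
}

// Usage with in-memory files (a real engine would read skills/*.md from disk).
const index = buildSkillIndex({
  "deploy.md": "---\nname: deploy\ndescription: Ship a build\n---\nSteps...",
  "notes.md": "No frontmatter here",
});
console.log(index);
```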
@@ -408,9 +409,9 @@ Engine behavior is controlled via `config.json`. Key settings:
 
  The engine and all spawned agents use the Node binary that started the engine (`process.execPath`). After upgrading Node, restart the engine:
 
- ```powershell
- node ~/.squad/engine.js stop
- node ~/.squad/engine.js
+ ```bash
+ squad stop
+ squad start
  ```
 
  ## Portability
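The restart note above hinges on agents being spawned with `process.execPath` rather than whatever `node` resolves to on `PATH`. A minimal sketch of that pattern with Node's built-in `child_process`; the inline `-e` script is a stand-in for the real spawn wrapper (`spawn-agent.js`), whose contents are not shown in this diff, and `runWithSameNode` is a hypothetical name.

```javascript
const { spawnSync } = require("node:child_process");

// Run a child with the exact Node binary running this process, so a Node
// upgrade on PATH does not change what already-running engines spawn.
function runWithSameNode(args) {
  const result = spawnSync(process.execPath, args, { encoding: "utf8" });
  if (result.error) throw result.error;
  return result.stdout.trim();
}

// The child prints its own binary path, which should match the parent's.
const childExecPath = runWithSameNode(["-e", "process.stdout.write(process.execPath)"]);
console.log(childExecPath === process.execPath);
```

Because the path is captured once when the engine starts, upgrading Node only takes effect for agents after the engine itself is restarted.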
@@ -418,41 +419,45 @@ node ~/.squad/engine.js
  **Portable (works on any machine):** Engine, dashboard, playbooks, charters, routing, notes, skills, docs, work items.
 
  **Machine-specific (reconfigure per machine):**
- - `config.json` — contains absolute paths to project directories. Re-link via `node squad.js add <dir>`.
+ - `config.json` — contains absolute paths to project directories. Re-link via `squad add <dir>`.
  - `mcp-servers.json` — auto-synced from `~/.claude.json` on engine start.
 
- To move to a new machine: clone `~/.squad/`, delete `engine/control.json`, re-run `node squad.js add` for each project.
+ To move to a new machine: `npm install -g @yemi33/squad && squad init --force`, then re-run `squad add` for each project.
 
  ## File Layout
 
  ```
  ~/.squad/
- squad.js <- CLI: init, add, remove, list projects
+ bin/
+ squad.js <- Unified CLI entry point (npm package)
+ squad.js <- Project management: init, add, remove, list
  engine.js <- Engine daemon
  engine/
  spawn-agent.js <- Agent spawn wrapper (resolves claude cli.js)
- control.json <- running/paused/stopped
- dispatch.json <- pending/active/completed queue
- log.json <- Audit trail (capped at 500)
- metrics.json <- Per-agent quality metrics
+ ado-mcp-wrapper.js <- ADO MCP authentication wrapper
+ control.json <- running/paused/stopped (runtime)
+ dispatch.json <- pending/active/completed queue (runtime)
+ log.json <- Audit trail, capped at 500 (runtime)
+ metrics.json <- Per-agent quality metrics (runtime)
  dashboard.js <- Web dashboard server
- dashboard.html <- Dashboard UI
+ dashboard.html <- Dashboard UI (single-file)
  config.json <- projects[], agents, engine, claude settings
- config.template.json <- Template for reference
+ config.template.json <- Template for new installs
+ package.json <- npm package definition
  mcp-servers.json <- MCP servers (auto-synced, gitignored)
  routing.md <- Dispatch rules table (editable)
  team.md <- Team roster
- notes.md <- Team rules + consolidated learnings
- work-items.json <- Central work queue (agent decides which project)
- TODO.md <- Future improvements roadmap
+ notes.md <- Team rules + consolidated learnings (runtime)
+ work-items.json <- Central work queue (runtime)
  playbooks/
  work-item.md <- Shared work item template
  implement.md <- Build a PRD item
  review.md <- Review a PR
  fix.md <- Fix review feedback
- analyze.md <- Generate new PRD
  explore.md <- Codebase exploration
  test.md <- Run tests
+ build-and-test.md <- Build project and run test suite
+ plan-to-prd.md <- Convert plan to PRD gap items
  skills/
  README.md <- Skill format guide
  <name>.md <- Agent-created reusable workflows
@@ -460,7 +465,7 @@ To move to a new machine: clone `~/.squad/`, delete `engine/control.json`, re-ru
  {name}/
  charter.md <- Agent identity and boundaries (editable)
  status.json <- Current state (runtime)
- history.md <- Task history (last 20, runtime)
+ history.md <- Task history, last 20 (runtime)
  live-output.log <- Streaming output while working (runtime)
  output.log <- Final output after completion (runtime)
  identity/
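The file layout above describes `engine/log.json` as an audit trail capped at 500 entries. A minimal sketch of a capped append, assuming the file holds a JSON array and that the newest entries are the ones kept; `appendCapped`, the `ts` field and the temp-file-then-rename write are illustrative assumptions, not the engine's code.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

const LOG_CAP = 500; // cap stated in the README

// Hypothetical helper: append one entry, keep only the newest `cap` entries.
function appendCapped(file, entry, cap = LOG_CAP) {
  let entries = [];
  try {
    entries = JSON.parse(fs.readFileSync(file, "utf8"));
  } catch {
    entries = []; // missing or unreadable file: start a fresh log (assumption)
  }
  if (!Array.isArray(entries)) entries = [];
  entries.push({ ts: new Date().toISOString(), ...entry });
  if (entries.length > cap) entries = entries.slice(-cap);
  // Write a temp file, then rename over the original; rename within one
  // directory is atomic on POSIX filesystems.
  const tmp = `${file}.tmp`;
  fs.writeFileSync(tmp, JSON.stringify(entries, null, 2));
  fs.renameSync(tmp, file);
  return entries.length;
}

// Usage in a throwaway directory: 510 appends leave the newest 500.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "squad-log-"));
const logFile = path.join(dir, "log.json");
for (let i = 0; i < 510; i++) appendCapped(logFile, { event: "tick", i });
```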
package/dashboard.html CHANGED
@@ -387,7 +387,7 @@
  <section class="cmd-center">
  <h2>Command Center</h2>
  <div class="cmd-input-wrap" id="cmd-input-wrap">
- <textarea id="cmd-input" rows="1" placeholder='What do you need? e.g. "Fix the auth bug @dallas" or "/decide always use feature flags"'
+ <textarea id="cmd-input" rows="1" placeholder='What do you need? e.g. "Fix the auth bug @dallas" or "/note always use feature flags"'
  oninput="cmdInputChanged()" onkeydown="cmdKeyDown(event)"></textarea>
  <button class="cmd-send-btn" id="cmd-send-btn" onclick="cmdSubmit()">Send <kbd>Ctrl+Enter</kbd></button>
  </div>
@@ -0,0 +1,92 @@
+ # Engine Restart & Agent Survival
+
+ ## The Problem
+
+ When the engine restarts, it loses its in-memory process handles (`activeProcesses` Map). Claude CLI agents spawned before the restart are still running as OS processes, but the engine can't monitor their stdout, detect exit codes, or manage their lifecycle. Without protection, the heartbeat check (5-min default) would kill these agents as "orphans."
+
+ ## What's Persisted vs Lost
+
+ | State | Storage | Survives Restart |
+ |-------|---------|-----------------|
+ | Dispatch queue (pending/active/completed) | `engine/dispatch.json` | Yes |
+ | Agent status (working/idle/error) | `agents/*/status.json` | Yes |
+ | Agent live output | `agents/*/live-output.log` | Yes (mtime used as heartbeat) |
+ | Process handles (`ChildProcess`) | In-memory Map | **No** |
+ | Cooldown timestamps | In-memory Map | **No** (repopulated from `engine/cooldowns.json`) |
+
+ ## Protection Mechanisms
+
+ ### 1. Grace Period on Startup (20 min default)
+
+ When the engine starts and finds active dispatches from a previous session, it sets `engineRestartGraceUntil` to `now + 20 minutes`. During this window, orphan detection is completely suppressed — agents won't be killed even if the engine has no process handle for them.
+
+ Configurable via `config.json`:
+ ```json
+ {
+ "engine": {
+ "restartGracePeriod": 1200000
+ }
+ }
+ ```
+
+ ### 2. Blocking Tool Detection
+
+ Even after the grace period expires, the engine scans each agent's `live-output.log` for the most recent `tool_use` call. If the agent is in a known blocking tool:
+
+ - **`TaskOutput` with `block: true`** — timeout extended to the task's own timeout + 1 min
+ - **`Bash` with long timeout (>5 min)** — timeout extended to the bash timeout + 1 min
+
+ This works for both tracked processes and orphans (no process handle).
+
+ ### 3. Stop Warning
+
+ `engine.js stop` checks for active dispatches and warns:
+ ```
+ WARNING: 2 agent(s) are still working:
+ - Dallas: [office-bohemia] Build & test PR PR-4959092
+ - Rebecca: [office-bohemia] Review PR PR-4964594
+
+ These agents will continue running but the engine won't monitor them.
+ On next start, they'll get a 20-min grace period before being marked as orphans.
+ To kill them now, run: node engine.js kill
+ ```
+
+ ### 4. Exponential Backoff on Failures
+
+ If an agent is killed as an orphan and the work item retries, cooldowns use exponential backoff (2^failures, max 8x) to prevent spam-retrying broken tasks.
+
+ ## Safe Restart Pattern
+
+ ```bash
+ node engine.js stop # Check the warning — are agents working?
+ # If yes, decide: wait for them to finish, or accept the grace period
+ # Make your code changes
+ node engine.js start # Grace period kicks in for surviving agents
+ ```
+
+ ## What the Engine Cannot Do
+
+ - **Reattach to processes** — Node.js `child_process` doesn't support adopting external PIDs. Once the process handle is lost, the engine can only observe the agent indirectly via file output.
+ - **Guarantee completion** — An agent that finishes during a restart will have its output saved to `live-output.log`, but the engine won't run post-completion hooks (PR sync, metrics update, learnings check). These are picked up on the next tick via output file scanning.
+ - **Resume mid-task** — If an agent is killed (by orphan detection or timeout), the work item is marked failed. It can be retried but starts from scratch.
+
+ ## Timeline of a Restart
+
+ ```
+ T+0s engine.js stop (warns about active agents)
+ Engine process exits. Agents keep running as OS processes.
+
+ T+30s Code changes made. engine.js start.
+ Engine reads dispatch.json — finds 2 active items.
+ Sets grace period: 20 min from now.
+ Logs: "2 active dispatch(es) from previous session"
+
+ T+0-20m Ticks run. Orphan detection skipped (grace period).
+ If an agent finishes, output is written to live-output.log.
+ Engine detects completed output on next tick via file scan.
+
+ T+20m Grace period expires.
+ Heartbeat check resumes. Blocking tool detection still active.
+ Agent in TaskOutput block:true gets extended timeout.
+ Agent with no output for 5min+ and no blocking tool → orphaned.
+ ```
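The restart document above describes several rules that combine into one per-tick decision: the startup grace window, the heartbeat check, the blocking-tool extensions and the retry backoff. A minimal sketch of that decision, assuming the 5-minute heartbeat, 20-minute grace period, +1 minute slack and 2^failures (max 8x) cooldown stated in the document. The function names, the `{ name, input }` shape of `lastToolUse`, millisecond units for `input.timeout`, and the base cooldown value are all hypothetical; the engine's real logic may differ.

```javascript
const MIN = 60 * 1000;

// Defaults from the document, expressed in milliseconds.
const HEARTBEAT_TIMEOUT = 5 * MIN;     // heartbeat check, "5-min default"
const RESTART_GRACE_PERIOD = 20 * MIN; // engine.restartGracePeriod: 1200000

// On startup: open a grace window only if the previous session left active dispatches.
function graceUntil(now, activeDispatchCount, gracePeriod = RESTART_GRACE_PERIOD) {
  return activeDispatchCount > 0 ? now + gracePeriod : 0;
}

// Allowed silence for one agent. `lastToolUse` is a hypothetical { name, input }
// for the latest tool_use in live-output.log; input.timeout is assumed to be ms.
function allowedSilence(lastToolUse, heartbeat = HEARTBEAT_TIMEOUT) {
  if (!lastToolUse) return heartbeat;
  const { name, input = {} } = lastToolUse;
  const timeout = Number(input.timeout) || 0;
  if (name === "TaskOutput" && input.block === true && timeout > 0) {
    // Task's own timeout + 1 min, never shorter than the normal heartbeat.
    return Math.max(heartbeat, timeout + MIN);
  }
  if (name === "Bash" && timeout > 5 * MIN) {
    return timeout + MIN; // long bash command: its timeout + 1 min
  }
  return heartbeat;
}

// Orphan check for one active dispatch on one tick, using the log mtime as heartbeat.
function isOrphan({ now, graceUntilTs, lastOutputMtime, lastToolUse }) {
  if (now < graceUntilTs) return false; // grace period: detection suppressed
  return now - lastOutputMtime > allowedSilence(lastToolUse);
}

// Retry cooldown: base * 2^failures, capped at 8x the base (base is hypothetical).
function cooldownMs(baseMs, failures) {
  return baseMs * Math.min(2 ** failures, 8);
}

// Usage: engine restarted at t0 with two dispatches still active.
const t0 = 1_700_000_000_000;
const grace = graceUntil(t0, 2);
const early = isOrphan({ now: t0 + 10 * MIN, graceUntilTs: grace, lastOutputMtime: t0, lastToolUse: null });
const silent = isOrphan({ now: t0 + 25 * MIN, graceUntilTs: grace, lastOutputMtime: t0 + 19 * MIN, lastToolUse: null });
const blocked = isOrphan({
  now: t0 + 25 * MIN, graceUntilTs: grace, lastOutputMtime: t0 + 19 * MIN,
  lastToolUse: { name: "TaskOutput", input: { block: true, timeout: 30 * MIN } },
});
console.log(early, silent, blocked); // false true false
```

Checking the grace window first mirrors the document's statement that orphan detection is completely suppressed during that period, so blocking-tool parsing only matters once the window has expired.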
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@yemi33/squad",
- "version": "0.1.1",
+ "version": "0.1.3",
  "description": "Multi-agent AI dev team that runs from ~/.squad/ — five autonomous agents share a single engine, dashboard, and knowledge base",
  "bin": {
  "squad": "bin/squad.js"