squad-station 0.6.9 → 0.7.1

@@ -0,0 +1,11 @@
+ # Git Rules — BMad Method
+
+ ## Branching
+ - NEVER commit to `develop` or `master` directly.
+ - Create branch before code changes: `feat/epic-<N>-<name>`, `feat/story-<epic>-<story>-<name>`, `fix/<desc>`, `quick/<desc>`.
+ - Merge via PR only. Run `/bmad-code-review` before every PR.
+
+ ## Commits
+ - NEVER auto-commit. Wait for orchestrator/user instruction.
+ - Convention: `feat(epic-3/story-2):`, `fix:`, `refactor:`, `test:`, `docs:`
+ - Run tests before committing. One story = one atomic commit.
@@ -0,0 +1,11 @@
+ # Git Rules — GSD
+
+ ## Branching
+ - NEVER commit to `develop` or `master` directly.
+ - Create branch before code changes: `milestone/<name>`, `phase/<N>-<desc>`, `fix/<desc>`, `quick/<desc>`.
+ - Merge via PR only. Use `/gsd:ship` for PRs.
+
+ ## Commits
+ - NEVER auto-commit. Wait for orchestrator/user instruction.
+ - Convention: `feat(phase-3):`, `fix(phase-1):`, `refactor:`, `test:`, `docs:`
+ - Run tests before committing.
@@ -0,0 +1,11 @@
+ # Git Rules — OpenSpec
+
+ ## Branching
+ - NEVER commit to `develop` or `master` directly.
+ - Each change = one branch: `feat/<change-name>`, `fix/<change-name>`.
+ - Merge via PR only. Run `/opsx:verify` before creating PR.
+
+ ## Commits
+ - NEVER auto-commit. Wait for orchestrator/user instruction.
+ - Convention: `feat(<change-name>):`, `fix:`, `refactor:`, `test:`, `docs:`
+ - Verify specs before committing. Only stage files for current change.
@@ -0,0 +1,11 @@
+ # Git Rules — Superpowers
+
+ ## Branching
+ - NEVER commit to `develop` or `master` directly.
+ - Create branch before code changes: `feat/<feature>`, `fix/<bug>`. Use `using-git-worktrees` skill.
+ - Merge via PR only. Use `finishing-a-development-branch` skill for merge decisions.
+
+ ## Commits
+ - NEVER auto-commit. Wait for orchestrator/user instruction.
+ - Convention: `feat(<feature>):`, `fix:`, `test:`, `refactor:`, `docs:`
+ - TDD flow: `test:` → `feat:` → `refactor:`. All tests must pass before commit.
package/CHANGELOG.md CHANGED
@@ -2,6 +2,252 @@
 
  All notable changes to Squad Station are documented in this file.
 
+ ## v0.7.0 — SDD Git Workflow Rules (2026-03-24)
+
+ Auto-install SDD git workflow rules during squad initialization, plus three watchdog reliability fixes.
+
+ ### Added
+
+ - **SDD git workflow rules auto-install** — During `squad-station init`, for each SDD entry in squad.yml, copies the matching rule template from `.squad/rules/git-workflow-<name>.md` into provider-specific rules directories (`.claude/rules/`, `.gemini/rules/`). Ships with 4 built-in rule templates: get-shit-done, bmad-method, openspec, superpowers.
+ - **SDD templates versioned** — `.squad/rules/`, `.squad/sdd/`, and `.squad/examples/` are now tracked in git. Runtime files (station.db, logs, PID) remain ignored via `.gitignore` whitelisting.
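
The auto-install step described above can be pictured with a small sketch. This is illustrative JavaScript, not the project's implementation; the helper name `planRuleCopies`, its arguments, and the assumption that the destination keeps the template's file name are all made up for the example. It only builds a copy plan and does not touch the filesystem.

```javascript
var path = require('path');

// Hypothetical helper: pair each SDD's rule template under .squad/rules/
// with a destination in every provider rules directory. Planning only, no I/O.
function planRuleCopies(sddNames, providerDirs) {
  var plan = [];
  sddNames.forEach(function (name) {
    var file = 'git-workflow-' + name + '.md';
    providerDirs.forEach(function (dir) {
      plan.push({
        from: path.posix.join('.squad/rules', file),
        to: path.posix.join(dir, file) // assumed: same file name at the destination
      });
    });
  });
  return plan;
}

var plan = planRuleCopies(['openspec'], ['.claude/rules', '.gemini/rules']);
console.log(plan);
```

Separating the plan from the copy keeps the mapping easy to test; a real installer would then copy each `from` to its `to`.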
+
+ ### Fixed
+
+ - **Orphan busy state reset** — Reconcile and watchdog now detect agents marked "busy" in DB with zero processing messages (signal completed the task but failed to reset status). Resets to idle immediately without heuristics.
+ - **Pane capture window 5→20 lines** — `pane_looks_idle()` captured only 5 lines, missing Claude Code's prompt behind 4-5 status bar lines. Switched from `-l 5` to `-S -20` for broader tmux version compatibility.
+ - **Signal completes all processing messages on stop** — When orchestrator rapid-fires N tasks, agent processes all in one turn but only one Stop hook fires. Now uses `complete_all_processing()` to prevent N-1 orphaned "processing" messages.
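
The first and third fixes above interact, which a toy model may make clearer. The sketch below uses in-memory arrays as stand-ins for the database; the record shapes, the `'completed'` status name, and both function names are assumptions for illustration, not the project's Rust code.

```javascript
// Complete every processing message for one agent (one Stop hook, N tasks).
function completeAllProcessing(messages, agent) {
  var completed = 0;
  messages.forEach(function (m) {
    if (m.agent === agent && m.status === 'processing') {
      m.status = 'completed';
      completed += 1;
    }
  });
  return completed;
}

// An agent is an "orphan busy" when it is marked busy but has no
// processing messages left; such agents are reset to idle.
function resetOrphanBusy(agents, messages) {
  var reset = [];
  agents.forEach(function (a) {
    var pending = messages.some(function (m) {
      return m.agent === a.name && m.status === 'processing';
    });
    if (a.status === 'busy' && !pending) {
      a.status = 'idle';
      reset.push(a.name);
    }
  });
  return reset;
}

var messages = [
  { agent: 'dev', status: 'processing' },
  { agent: 'dev', status: 'processing' },
  { agent: 'qa', status: 'processing' }
];
var agents = [{ name: 'dev', status: 'busy' }, { name: 'qa', status: 'busy' }];

var done = completeAllProcessing(messages, 'dev');
var reset = resetOrphanBusy(agents, messages);
console.log(done, reset);
```

Here `qa` stays busy because it still has a processing message, so only `dev` is reset.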
+
+ ---
+
+ ## v0.6.9 — Remove Idle Nudge (2026-03-23)
+
+ Simplifies watchdog by removing idle nudge notifications. Watchdog now focuses solely on stuck-agent detection with tiered escalation.
+
+ ### Changed
+
+ - **Watchdog simplified** — Removed idle nudge (Pass 2) that sent "System idle for Xm" notifications to the orchestrator. Watchdog now only monitors for stuck agents: log-only at 10m, auto-heal at 30m, orchestrator alert at 60m.
+ - **`--stall-threshold` hidden** — CLI arg kept for backwards compatibility but hidden from help output (no longer functional).
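
As a rough illustration of the three thresholds named above, the tiering could be expressed as a single function. The function name, the returned labels, and the choice of inclusive boundaries are assumptions; the changelog only states the 10m, 30m, and 60m marks.

```javascript
// Map how long an agent has been busy to the watchdog action for that tier.
// Boundaries are treated as inclusive here, which is an assumption.
function stuckAgentAction(busyMinutes) {
  if (busyMinutes >= 60) return 'alert-orchestrator';
  if (busyMinutes >= 30) return 'auto-heal';
  if (busyMinutes >= 10) return 'log-only';
  return 'none';
}

[5, 10, 45, 60].forEach(function (m) {
  console.log(m + 'm -> ' + stuckAgentAction(m));
});
```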
+
+ ### Fixed
+
+ - **Docs updated** — SYSTEM-DESIGN.md now includes `watch` and `doctor` in CLI reference, `watch.rs` in architecture modules, and updated release history. README watchdog description updated to reflect tiered stuck-agent detection.
+
+ ---
+
+ ## v0.6.8 — Lean SDD Playbooks (2026-03-23)
+
+ Trims SDD playbooks to agent-essential content and adds OpenSpec as a supported SDD framework.
+
+ ### Changed
+
+ - **SDD playbooks trimmed 84%** — Removed installation guides, troubleshooting, mermaid diagrams, verbose prose, and external links from bmad-playbook.md, gsd-playbook.md, and superpowers-playbook.md. Each file now contains only workflow sequences, command reference tables, and critical rules.
+
+ ### Added
+
+ - **OpenSpec SDD playbook** — New `openspec-playbook.md` (74 lines) supporting OpenSpec's spec-driven workflow (propose → apply → archive) with core and expanded profiles
+ - **OpenSpec in example configs** — Added OpenSpec as a commented-out SDD option in orchestrator-claude.yml and orchestrator-gemini.yml examples
+
+ ---
+
+ ## v0.6.7 — Hook Log Redirect Fix (2026-03-23)
+
+ Fixes hook command failures caused by relative shell redirects resolving against the wrong working directory.
+
+ ### Fixed
+
+ - **Hook stderr redirect path** — `squad-station init` generated hook commands with `2>>.squad/log/signal.log` which fails when Claude Code's hook runner CWD differs from the project root. Replaced with `2>/dev/null` since `signal.rs` handles logging internally via `log_signal()`. Gemini hooks similarly updated from `>>.squad/log/signal.log 2>&1` to `>/dev/null 2>&1`.
+ - **notify.rs hook safety** — `notify` command used `anyhow::bail!` on errors (non-zero exit), which could break provider hook contracts. Now always exits 0 with best-effort logging via `log_notify()`, matching the `signal.rs` pattern.
+
+ ### Changed
+
+ - Updated SYSTEM-DESIGN.md GUARD flowchart and prose to reflect `/dev/null` redirect pattern
+
+ ---
+
+ ## v0.6.6 — Stale Busy Fix (2026-03-23)
+
+ Fixes false positive watchdog warnings caused by orphaned processing messages and missed idle detection.
+
+ ### Fixed
+
+ - **current_task overwrite in send.rs** — when a second task was sent while the first was still processing, `set_current_task` blindly overwrote the FK to the newer message. Signal then completed the wrong message, orphaning the original in `processing` forever and leaving the agent stuck in `busy` state. Now only sets `current_task` if no task is currently assigned; queued tasks are picked up by signal's remaining-processing check.
+ - **Idle pane detection in reconcile.rs** — `pane_looks_idle` only checked the last non-empty line for the `❯` prompt pattern. Claude Code's TUI renders a status bar below the prompt, so the last line was always status info, never the prompt. Now scans all 5 captured lines for idle patterns.
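
The `current_task` fix above is easier to follow with a toy model. The sketch below is a hypothetical in-memory version: `send`, `signal`, the record shapes, and the rule that signal moves on to the oldest remaining processing message are assumptions for illustration, not the project's actual database logic.

```javascript
// send: queue the message, but only claim current_task when the slot is free.
function send(agent, messages, id) {
  messages.push({ id: id, status: 'processing' });
  if (agent.currentTask === null) {
    agent.currentTask = id;
  }
}

// signal: complete the current task, then point at the oldest message
// still processing (assumed behaviour of the remaining-processing check).
function signal(agent, messages) {
  var finished = agent.currentTask;
  messages.forEach(function (m) {
    if (m.id === finished) m.status = 'completed';
  });
  var next = messages.filter(function (m) {
    return m.status === 'processing';
  })[0];
  agent.currentTask = next ? next.id : null;
  return finished;
}

var agent = { currentTask: null };
var queue = [];
send(agent, queue, 1);
send(agent, queue, 2); // must not overwrite current_task = 1
var first = signal(agent, queue);
console.log(first, agent.currentTask, queue);
```

With the pre-fix behaviour, the second `send` would have overwritten the slot and the first message would never have been completed.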
+
+ ### Added
+
+ - Regression test `test_second_send_does_not_overwrite_current_task` reproducing the exact production incident
+
+ ---
+
+ ## v0.6.5 — Async Pattern Fixes and Batch DB Queries (2026-03-23)
+
+ Fixes blocking async patterns in tmux operations and optimizes the status command with a batch database query.
+
+ ### Fixed
+
+ - **Async sleep in tmux.rs** — converted 3 instances of `std::thread::sleep()` to `tokio::time::sleep().await` in `send_keys_literal()`, `inject_single()`, and `inject_body()`. These were blocking the Tokio executor for 2–5 seconds per call, preventing other async tasks from making progress.
+ - **Clippy warnings** — resolved 7 clippy lints: empty doc comment line, `push_str` → `push` for single char, `match` → `matches!` macro (3×), needless borrows (2×)
+
+ ### Changed
+
+ - **Batch DB query in status command** — added `count_processing_per_agent()`, a single `GROUP BY` aggregate query replacing N sequential `list_messages()` calls (one per agent). The status command now issues one query regardless of agent count, instead of N.
+ - Updated 9 callers across 5 command files (`send.rs`, `notify.rs`, `signal.rs`, `reconcile.rs`, `watch.rs`) to await the now-async tmux functions
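
To illustrate the batch-query idea, here is a sketch of what a single grouped count computes. The SQL string is hypothetical (table and column names are assumed), and the JavaScript function is an in-memory equivalent for illustration rather than the project's `count_processing_per_agent()`.

```javascript
// Hypothetical shape of the aggregate query; names are assumed.
var SQL =
  "SELECT agent, COUNT(*) FROM messages WHERE status = 'processing' GROUP BY agent";

// In-memory equivalent: one pass over all messages instead of one
// lookup per agent.
function countProcessingPerAgent(messages) {
  var counts = {};
  messages.forEach(function (m) {
    if (m.status === 'processing') {
      counts[m.agent] = (counts[m.agent] || 0) + 1;
    }
  });
  return counts;
}

var counts = countProcessingPerAgent([
  { agent: 'dev', status: 'processing' },
  { agent: 'dev', status: 'processing' },
  { agent: 'qa', status: 'processing' },
  { agent: 'qa', status: 'completed' }
]);
console.log(SQL);
console.log(counts);
```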
+
+ ### Added
+
+ - 3 new unit tests for `count_processing_per_agent()` covering empty DB, single agent, and multiple agents
+
+ ---
+
+ ## v0.6.4 — Smart PATH Detection for npm Installer (2026-03-23)
+
+ npm installer now picks install directories already in PATH, adds cross-platform PATH instructions, and adds Windows support.
+
+ ### Added
+
+ - **Smart PATH detection** — `findBestInstallDir()` scans PATH for writable directories (`/usr/local/bin`, `~/.local/bin`, `~/bin`) before falling back to `~/.squad-station/bin`, eliminating manual PATH configuration on most systems
+ - **Cross-platform PATH instructions** — when the install directory is not in PATH, prints shell-specific instructions (bash/zsh/fish/PowerShell) for adding it
+ - **Windows support** — npm installer handles `.exe` suffix, uses PowerShell for downloads, and supports Windows PATH directories
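
A minimal sketch of the PATH-scanning idea follows. It is not the installer's `findBestInstallDir()`; the name `pickInstallDir`, its parameters, and the injected `isWritable` predicate are assumptions chosen so the logic can be exercised without touching the filesystem.

```javascript
var path = require('path');

// Return the first candidate that is both on PATH and writable,
// otherwise the fallback directory.
function pickInstallDir(pathEnv, candidates, fallback, isWritable) {
  var onPath = pathEnv.split(path.delimiter).filter(Boolean);
  for (var i = 0; i < candidates.length; i++) {
    if (onPath.indexOf(candidates[i]) !== -1 && isWritable(candidates[i])) {
      return candidates[i];
    }
  }
  return fallback;
}

var pathEnv = ['/usr/bin', '/home/u/.local/bin'].join(path.delimiter);
var chosen = pickInstallDir(
  pathEnv,
  ['/usr/local/bin', '/home/u/.local/bin', '/home/u/bin'],
  '/home/u/.squad-station/bin',
  function (dir) { return dir.indexOf('/home/u') === 0; } // stand-in check
);
console.log(chosen);
```

In a real installer the predicate could wrap `fs.accessSync(dir, fs.constants.W_OK)` in a try/catch.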
+
+ ### Changed
+
+ - **Release process codified** — added `/release` slash command (`.claude/commands/release.md`) documenting the full 7-step release checklist with lessons learned
+
+ ---
+
+ ## v0.6.3 — npm Installer Binary Fix (2026-03-23)
+
+ Fixes the npm installer so the downloaded binary is executable and the npm package works correctly out of the box.
+
+ ### Fixed
+
+ - **chmod +x bin/run.js** — npm entry point was not executable after install
+ - **npm package fixes** — corrected package configuration for reliable `npx squad-station install` flow
+
+ ---
+
+ ## v0.6.2 — Post-Init Health Check, Autonomous Orchestrator, Doctor Command (2026-03-23)
+
+ Adds a comprehensive post-init health check that validates 9 components, a standalone `doctor` diagnostic command, autonomous orchestrator mode with clear decision authority boundaries, and fixes the watchdog self-detection race condition.
+
+ ### Added
+
+ - **Post-init health check** — validates 9 components after `squad-station init`: database, log directory, signal hooks, notify hooks (per provider), orchestrator context file, tmux sessions (orchestrator + each agent), and watchdog daemon. Prints pass/fail/warn summary with actionable remediation steps.
+ - **`squad-station doctor` command** — standalone health check for diagnosing squad operational issues without re-running init. Exits with code 1 if any checks fail.
+ - **Autonomous orchestrator mode** — new "Autonomous Mode" section in generated `squad-orchestrator.md`:
+   - **Decision authority** — orchestrator makes routing, implementation, testing, and technical trade-off decisions without asking the user
+   - **Escalation criteria** — only escalate for ambiguous requirements, destructive actions, external dependencies, or scope conflicts
+   - **Driving to completion** — orchestrator dispatches follow-up tasks on errors, answers agent questions, and verifies work before reporting done
+ - **11 E2E lifecycle tests** (`tests/test_e2e_lifecycle.rs`) — covers watchdog daemon lifecycle (start/stop/duplicate/stale PID/logging), init artifact creation, init idempotency, doctor exit codes, and watchdog self-detection regression
+
+ ### Fixed
+
+ - **Watchdog self-detection race condition** — daemon was killing itself immediately after start because it read its own PID from the PID file and treated it as a duplicate. Now compares PID file contents against `std::process::id()` and skips the duplicate check when they match.
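
The self-detection fix boils down to one extra comparison, sketched below. The function name, its parameters, and the injected `isAlive` predicate are hypothetical; the project's check is written in Rust against `std::process::id()`.

```javascript
// Decide whether another watchdog is already running, given the PID file text.
function isDuplicateDaemon(pidFileText, ownPid, isAlive) {
  var pid = parseInt(String(pidFileText).trim(), 10);
  if (isNaN(pid)) return false;     // empty or corrupt PID file
  if (pid === ownPid) return false; // the file holds our own PID: not a duplicate
  return isAlive(pid);              // a stale PID is not a duplicate either
}

var alwaysAlive = function () { return true; };
console.log(isDuplicateDaemon(String(process.pid) + '\n', process.pid, alwaysAlive));
console.log(isDuplicateDaemon('456', 123, alwaysAlive));
```

In Node, a liveness predicate could be built on `process.kill(pid, 0)` inside a try/catch, since signal 0 only tests whether the process exists.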
+
+ ### Changed
+
+ - **QA Gate instructions refined** — error handling now instructs orchestrator to analyze and fix errors autonomously; technical questions answered from project context; only genuinely ambiguous requirements escalated to user
+
+ ---
+
+ ## v0.6.1 — Signal Hook Fix & Watchdog Self-Healing (2026-03-22)
+
+ Fixes the critical signal hook failure where `$SQUAD_AGENT_NAME` was never available in hook subprocess contexts, causing silent signal drops. Adds tiered watchdog self-healing that auto-recovers stuck agents.
+
+ **168 tests passing** (89 lib + 79 integration).
+
+ ### Fixed
+
+ - **Signal hook agent name resolution** (BUG-01, CRITICAL) — switched from `$SQUAD_AGENT_NAME`/`tmux list-panes` to `tmux display-message -p '#S'`, a tmux server-side query that works reliably in all hook subprocess contexts (Claude Code Stop hooks, Gemini CLI AfterAgent). The env var approach from v0.6.0 never worked because hook subprocesses don't inherit the shell's exported variables.
+ - **GUARD-1 silent failure** (BUG-02) — empty agent name now logged to `.squad/log/signal.log` + stderr before exit 0, instead of silently swallowed with zero forensic evidence
+ - **Watchdog daemon dies silently** (BUG-06) — new `ensure_watchdog()` health check called opportunistically from `signal` and `send`; detects dead PID and respawns daemon
+ - **Watchdog stderr sent to /dev/null** (BUG-07) — daemon stderr now redirected to `.squad/log/watch-stderr.log` for crash diagnostics
+ - **Watchdog Pass 3 observe-only** (BUG-08) — prolonged busy detection upgraded from log-only to tiered escalation with corrective actions
+ - **Watchdog killed by terminal close** (BUG-10) — daemon now calls `setsid()` to create a new session, surviving SIGHUP from parent terminal
+
+ ### Added
+
+ - **Tiered watchdog busy detection** — 4-level escalation for stuck agents:
+   - 10-30min: log only (long tasks are normal)
+   - 30min+: auto-heal if pane is idle (complete stuck tasks, reset to idle, notify orchestrator)
+   - 60min+: alert orchestrator with WARNING (10min per-agent cooldown)
+   - 120min+: escalate to URGENT prefix
+ - **Pane content snapshot logging** — Tier 2 idle detection logs last 5 lines of pane content to `watch.log` for diagnosing false positives
+ - **`complete_all_processing()` DB function** — batch-completes all processing messages for an agent; used by watchdog self-healing
+ - **`BusyAlertState`** — per-agent notification cooldown to prevent orchestrator notification spam
+ - **`spawn_watchdog_daemon()` shared helper** — extracted from `watch --daemon` and `ensure_watchdog` to eliminate duplication; configures setsid, stderr-to-log, stdin/stdout null
+ - **Unit tests** — `BusyAlertState` (4 tests), `complete_all_processing` (2 tests), `test_guard1_logs_empty_agent_name`
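
The per-agent cooldown mentioned above (10 minutes between orchestrator alerts) can be sketched as a small class. `BusyAlertCooldown` and `shouldAlert` are hypothetical names and this is not the project's `BusyAlertState`; the sketch only shows the general technique of remembering the last alert time per agent.

```javascript
function BusyAlertCooldown(cooldownMs) {
  this.cooldownMs = cooldownMs;
  this.lastAlert = Object.create(null); // agent name -> time of last alert (ms)
}

// Returns true (and records the alert) only when the agent's cooldown has elapsed.
BusyAlertCooldown.prototype.shouldAlert = function (agent, nowMs) {
  var last = this.lastAlert[agent];
  if (last !== undefined && nowMs - last < this.cooldownMs) {
    return false;
  }
  this.lastAlert[agent] = nowMs;
  return true;
};

var TEN_MIN = 10 * 60 * 1000;
var cooldown = new BusyAlertCooldown(TEN_MIN);
var results = [
  cooldown.shouldAlert('dev', 0),
  cooldown.shouldAlert('dev', 5 * 60 * 1000), // suppressed: within cooldown
  cooldown.shouldAlert('qa', 5 * 60 * 1000),  // other agents are independent
  cooldown.shouldAlert('dev', TEN_MIN)        // cooldown elapsed
];
console.log(results);
```

Passing the clock in as `nowMs` keeps the behaviour deterministic under test.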
+
+ ### Changed
+
+ - **Hook commands simplified** — removed intermediate `$AGENT` variable, `[ -n "$AGENT" ]` shell guard, `$SQUAD_AGENT_NAME`, `$TMUX_PANE`, and `tmux list-panes` fallback. Signal.rs GUARD-1 handles empty names with logging — no shell-level guard needed.
+ - **`agent_resolve_snippet()` renamed to `agent_name_subshell()`** — returns `$(tmux display-message -p '#S' 2>/dev/null)` instead of the old multi-stage resolution
+ - **`pane_looks_idle()` visibility** — widened from private `fn` to `pub(crate)` for watchdog Tier 2 access
+ - **`capture_pane()` visibility** — widened from private `fn` to `pub(crate)` for watchdog pane snapshot logging
+ - **`// SAFETY:` comments** — added to all `unsafe` blocks (`setsid`, `kill`, `signal`)
+
+ ### Removed
+
+ - **Vendored GSD plugin files** — removed ~104k lines of `.claude/agents/`, `.claude/commands/gsd/`, `.claude/get-shit-done/`, `.gemini/`, `.planning/` directories
+ - **`$SQUAD_AGENT_NAME` env var** — never available in hook contexts; removed entirely
+
+ ### Documentation
+
+ - **SYSTEM-DESIGN.md** — updated section 5.2 (guard chain) and 5.3 (hook commands) to reflect new `tmux display-message` pattern with stderr redirection
+ - **PLAYBOOK.md** — added section 4 (Watchdog) documenting tiered escalation, resilience features, and log files; updated troubleshooting
+ - **Change analysis archived** — `docs/changes/archive/bug-analysis-signal-watchdog.md` and `solution-signal-watchdog.md`
+
+ ---
+
+ ## v0.6.0 — Signal Reliability (2026-03-20)
+
+ Three-layer defense against lost agent completion signals. Zero-config hook setup, project-scoped logging, and a self-healing watchdog daemon.
+
+ **233 tests passing** (84 lib + 149 integration). E2E validated on kindle-ai-export with 3 running Claude Code agents.
+
+ ### Added
+
+ - **`squad-station reconcile` command** — reconcile agent statuses against live tmux sessions; supports `--dry-run` and `--json` output
+ - **`squad-station watch` daemon** — background watchdog with 3-pass detection: individual agent reconcile, global stall detection with orchestrator nudge, and prolonged busy warnings; auto-starts on `init`
+ - **`clean --all` flag** — deletes logs in addition to DB and sessions
+ - **`providers.rs` module** — centralized provider metadata (idle patterns, hook events, settings paths, fire-and-forget prefixes, alternate buffer detection)
+ - **`$SQUAD_AGENT_NAME` environment variable** — set in each tmux session at launch for reliable hook identification; eliminates fragile `tmux display-message` in subprocess contexts
+ - **Project-scoped signal logging** — all signal events logged to `.squad/log/signal.log` with RFC3339 timestamps, level (OK/WARN/GUARD), agent name, and structured context
+ - **Watchdog logging** — daemon activity logged to `.squad/log/watch.log` with nudge tracking and stall detection
+ - **Log rotation** — signal.log auto-truncates to 500 lines when exceeding 1MB
+ - **Signal uses `current_task` FK** — targeted completion of the exact task being worked on, with FIFO fallback safety net when `current_task` is NULL
+ - **DB layer functions** — `set_current_task`, `clear_current_task`, `complete_by_id`, `last_completed_id`, `complete_message_by_id`, `count_processing_all`, `total_count`, `last_activity_timestamp`
+ - **Hook templates upgraded** — Claude Code: Stop + Notification + PostToolUse with `$SQUAD_AGENT_NAME` and stderr-to-log redirection; Gemini CLI: AfterAgent + Notification with JSON stdout compliance and 30s timeout
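
The log rotation rule in the list above (truncate to 500 lines once the file exceeds 1MB) can be illustrated on a string. The function name and parameters are hypothetical, and keeping the newest lines is an assumption, since the changelog does not say which lines survive truncation.

```javascript
// Return the log text unchanged while it is within maxBytes; otherwise
// keep only the last keepLines lines (assumed: the newest entries survive).
function rotateLog(text, maxBytes, keepLines) {
  if (Buffer.byteLength(text, 'utf8') <= maxBytes) return text;
  var lines = text.split('\n');
  if (lines[lines.length - 1] === '') lines.pop(); // ignore the trailing newline
  return lines.slice(-keepLines).join('\n') + '\n';
}

var big = '';
for (var i = 0; i < 1000; i++) big += 'line ' + i + '\n';

var rotated = rotateLog(big, 100, 500); // tiny byte limit to force rotation
var kept = rotated.trim().split('\n');
console.log(kept.length, kept[0], kept[kept.length - 1]);
```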
+
+ ### Changed
+
+ - **Signal flow rewritten** — primary path uses `current_task` FK for targeted completion instead of FIFO queue; FIFO retained as fallback with WARN-level logging
+ - **Fire-and-forget commands** (`/clear`) no longer set `current_task` — prevents corruption when `/clear` overlaps an in-flight real task
+ - **Hook resolution** — switched from `tmux display-message -p '#S'` to `$SQUAD_AGENT_NAME` env var with `tmux list-panes` fallback
+ - **Init command** — now creates `.squad/log/` directory, auto-starts watchdog, installs hooks for all providers in the squad (not just orchestrator)
+ - **Clean command** — stops watchdog daemon before deleting DB to prevent crash loops
+
+ ### Fixed
+
+ - **current_task corruption** when `/clear` overlaps in-flight task — current_task now correctly reverts to the real task (v0.5.8)
+ - **Signal race condition** — `/clear` followed by a real task no longer leaves the real task stuck at `processing` (v0.5.7)
+ - **Shell injection in session names** — `sanitize_session_name` now strips all shell metacharacters (`' ` `` ` `` `$ ; () | & <> \ /` space newline null), not just `.` `:` `"` (PR #2)
+ - **Unquoted model value in launch commands** — model values validated against `[a-zA-Z0-9._-:]` before shell interpolation (PR #2)
+ - **Clean command misses sessions** — `compute_session_names` now calls `sanitize_session_name` to match init naming (PR #2)
+ - **Signal exit-0 violation** — `get_agent` error no longer propagates as non-zero exit; uses soft guard matching the rest of the function (PR #2)
+ - **Antigravity agents marked dead** — `reconcile_agent_statuses` now skips `tool="antigravity"` agents that never have tmux sessions (PR #2)
+ - **inject_body corrupts task text** — `&&` splitting now only triggers when ALL parts are slash commands; plain text like "check if A && B" is sent as-is (PR #2)
+ - **Orphan WAL/SHM files** — `delete_db_file` now removes `station.db-wal` and `station.db-shm` alongside the main DB (PR #2)
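
The `inject_body` rule from the list above (split on `&&` only when every part is a slash command) can be sketched as follows. `splitInjectBody` is a hypothetical name, and treating "starts with `/`" as the test for a slash command is an assumption made for the example.

```javascript
// Split on "&&" only when every part looks like a slash command;
// otherwise return the body untouched as a single item.
function splitInjectBody(body) {
  var parts = body.split('&&').map(function (p) { return p.trim(); });
  var allSlash = parts.length > 1 && parts.every(function (p) {
    return p.charAt(0) === '/';
  });
  return allSlash ? parts : [body];
}

console.log(splitInjectBody('/clear && /gsd:ship'));
console.log(splitInjectBody('check if A && B'));
console.log(splitInjectBody('/clear && fix the bug'));
```

Only the first input is split; the other two contain plain text, so they are passed through unchanged.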
+
+ ### Security
+
+ - **sanitize_session_name hardened** — prevents shell injection via crafted session names in `sh -c` view commands
+ - **Model value validation** — blocks injection via malicious `model` field in `squad.yml`
+
+ ### Removed
+
+ - Raw SQL queries outside `src/db/` — all moved to db layer functions
+
  ## v0.5.8 - 2026-03-20
 
  ### 🐛 Bug Fixes
package/bin/run.js CHANGED
@@ -43,7 +43,7 @@ function install() {
 
  function installBinary() {
    // Binary version — may differ from npm package version
-   var VERSION = '0.6.9';
+   var VERSION = '0.7.0';
    var REPO = 'thientranhung/squad-station';
 
    var isWindows = process.platform === 'win32';
@@ -221,6 +221,24 @@ function scaffoldProject(force) {
      }
    });
 
+   // Copy rules/ (git workflow rules)
+   var rulesSrc = path.join(srcSquad, 'rules');
+   if (fs.existsSync(rulesSrc)) {
+     var rulesDest = path.join(destSquad, 'rules');
+     fs.mkdirSync(rulesDest, { recursive: true });
+
+     var rulesFiles = fs.readdirSync(rulesSrc).filter(function(f) { return f.endsWith('.md'); });
+     rulesFiles.forEach(function(file) {
+       var dest = path.join(rulesDest, file);
+       if (fs.existsSync(dest) && !force) {
+         console.log(' \x1b[33m–\x1b[0m .squad/rules/' + file + ' \x1b[2m(exists, use --force to overwrite)\x1b[0m');
+       } else {
+         fs.copyFileSync(path.join(rulesSrc, file), dest);
+         console.log(' \x1b[32m✓\x1b[0m .squad/rules/' + file);
+       }
+     });
+   }
+
    // Copy examples/
    var exSrc = path.join(srcSquad, 'examples');
    var exDest = path.join(destSquad, 'examples');
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "squad-station",
-   "version": "0.6.9",
+   "version": "0.7.1",
    "description": "Message routing and orchestration for AI agent squads",
    "repository": {
      "type": "git",
  "type": "git",
@@ -11,6 +11,7 @@
    },
    "files": [
      "bin/",
+     ".squad/rules/",
      ".squad/sdd/",
      ".squad/examples/",
      "CHANGELOG.md"