@yemi33/squad 0.1.0 → 0.1.2

@@ -0,0 +1,92 @@
1
+ # Engine Restart & Agent Survival
2
+
3
+ ## The Problem
4
+
5
+ When the engine restarts, it loses its in-memory process handles (`activeProcesses` Map). Claude CLI agents spawned before the restart are still running as OS processes, but the engine can't monitor their stdout, detect exit codes, or manage their lifecycle. Without protection, the heartbeat check (5-min default) would kill these agents as "orphans."
6
+
7
+ ## What's Persisted vs Lost
8
+
9
+ | State | Storage | Survives Restart |
10
+ |-------|---------|-----------------|
11
+ | Dispatch queue (pending/active/completed) | `engine/dispatch.json` | Yes |
12
+ | Agent status (working/idle/error) | `agents/*/status.json` | Yes |
13
+ | Agent live output | `agents/*/live-output.log` | Yes (mtime used as heartbeat) |
14
+ | Process handles (`ChildProcess`) | In-memory Map | **No** |
15
+ | Cooldown timestamps | In-memory Map | **No** (the Map is lost, then repopulated from `engine/cooldowns.json` on start) |
16
+
17
+ ## Protection Mechanisms
18
+
19
+ ### 1. Grace Period on Startup (20 min default)
20
+
21
+ When the engine starts and finds active dispatches from a previous session, it sets `engineRestartGraceUntil` to `now + 20 minutes`. During this window, orphan detection is completely suppressed — agents won't be killed even if the engine has no process handle for them.
22
+
23
+ Configurable via `config.json`:
24
+ ```json
25
+ {
26
+ "engine": {
27
+ "restartGracePeriod": 1200000
28
+ }
29
+ }
30
+ ```
31
+
32
+ ### 2. Blocking Tool Detection
33
+
34
+ Even after the grace period expires, the engine scans each agent's `live-output.log` for the most recent `tool_use` call. If the agent is in a known blocking tool:
35
+
36
+ - **`TaskOutput` with `block: true`** — timeout extended to the task's own timeout + 1 min
37
+ - **`Bash` with long timeout (>5 min)** — timeout extended to the bash timeout + 1 min
38
+
39
+ This works for both tracked processes and orphans (no process handle).
40
+
41
+ ### 3. Stop Warning
42
+
43
+ `engine.js stop` checks for active dispatches and warns:
44
+ ```
45
+ WARNING: 2 agent(s) are still working:
46
+ - Dallas: [office-bohemia] Build & test PR PR-4959092
47
+ - Rebecca: [office-bohemia] Review PR PR-4964594
48
+
49
+ These agents will continue running but the engine won't monitor them.
50
+ On next start, they'll get a 20-min grace period before being marked as orphans.
51
+ To kill them now, run: node engine.js kill
52
+ ```
53
+
54
+ ### 4. Exponential Backoff on Failures
55
+
56
+ If an agent is killed as an orphan and the work item retries, cooldowns use exponential backoff (2^failures, max 8x) to prevent spam-retrying broken tasks.
57
+
58
+ ## Safe Restart Pattern
59
+
60
+ ```bash
61
+ node engine.js stop # Check the warning — are agents working?
62
+ # If yes, decide: wait for them to finish, or accept the grace period
63
+ # Make your code changes
64
+ node engine.js start # Grace period kicks in for surviving agents
65
+ ```
66
+
67
+ ## What the Engine Cannot Do
68
+
69
+ - **Reattach to processes** — Node.js `child_process` doesn't support adopting external PIDs. Once the process handle is lost, the engine can only observe the agent indirectly via file output.
70
+ - **Guarantee completion** — An agent that finishes while the engine is down will have its output saved to `live-output.log`, but no engine process is running to fire the post-completion hooks (PR sync, metrics update, learnings check) at that moment. They run late rather than never: the next tick after startup picks them up via output file scanning.
71
+ - **Resume mid-task** — If an agent is killed (by orphan detection or timeout), the work item is marked failed. It can be retried but starts from scratch.
72
+
73
+ ## Timeline of a Restart
74
+
75
+ ```
76
+ T+0s engine.js stop (warns about active agents)
77
+ Engine process exits. Agents keep running as OS processes.
78
+
79
+ T+30s Code changes made. engine.js start.
80
+ Engine reads dispatch.json — finds 2 active items.
81
+ Sets grace period: 20 min from now.
82
+ Logs: "2 active dispatch(es) from previous session"
83
+
84
+ T+0-20m Ticks run. Orphan detection skipped (grace period).
85
+ If an agent finishes, output is written to live-output.log.
86
+ Engine detects completed output on next tick via file scan.
87
+
88
+ T+20m Grace period expires.
89
+ Heartbeat check resumes. Blocking tool detection still active.
90
+ Agent in TaskOutput block:true gets extended timeout.
91
+ Agent with no output for 5min+ and no blocking tool → orphaned.
92
+ ```
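Section 4 of the added doc describes retry cooldowns as `2^failures`, capped at 8x. The arithmetic can be sketched as below. This is a minimal illustration, not code from engine.js: `backoffCooldownMs`, `baseMs`, and `failures` are hypothetical names, and a base cooldown in milliseconds is an assumption.

```javascript
// Hypothetical helper (not part of engine.js): scale a base cooldown by
// 2^failures, capped at 8x, as described in "Exponential Backoff on Failures".
function backoffCooldownMs(baseMs, failures) {
  // Treat negative or fractional failure counts defensively.
  const safeFailures = Math.max(0, Math.floor(failures));
  const multiplier = Math.min(2 ** safeFailures, 8);
  return baseMs * multiplier;
}
```

With a one-minute base, the wait doubles for each failure up to the third and stays at eight minutes after that, so a broken task keeps retrying but at a bounded, slow rate.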
@@ -1,49 +1,49 @@
1
- #!/usr/bin/env node
2
- /**
3
- * Wrapper for @azure-devops/mcp that fetches an ADO token via azureauth
4
- * broker (no browser popup) and sets AZURE_DEVOPS_EXT_PAT before launching
5
- * the MCP server.
6
- */
7
- const { execSync, spawn } = require('child_process');
8
- const path = require('path');
9
-
10
- // Fetch token via azureauth broker (corp tool, no browser)
11
- let token;
12
- try {
13
- token = execSync('azureauth ado token --mode broker --output token --timeout 1', {
14
- encoding: 'utf8',
15
- timeout: 30000,
16
- windowsHide: true,
17
- }).trim();
18
- } catch (e) {
19
- // Fallback: try with web mode (may open browser as last resort)
20
- try {
21
- token = execSync('azureauth ado token --mode web --output token --timeout 5', {
22
- encoding: 'utf8',
23
- timeout: 120000,
24
- windowsHide: true,
25
- }).trim();
26
- } catch (e2) {
27
- process.stderr.write('ado-mcp-wrapper: Failed to get ADO token: ' + e2.message + '\n');
28
- process.exit(1);
29
- }
30
- }
31
-
32
- // Launch the actual MCP server with the token in env
33
- const args = process.argv.slice(2);
34
- const child = spawn(process.platform === 'win32' ? 'npx.cmd' : 'npx', [
35
- '-y',
36
- '--registry=https://registry.npmjs.org/',
37
- '@azure-devops/mcp@latest',
38
- ...args
39
- ], {
40
- stdio: 'inherit',
41
- env: { ...process.env, AZURE_DEVOPS_EXT_PAT: token },
42
- windowsHide: true,
43
- });
44
-
45
- child.on('exit', (code) => process.exit(code || 0));
46
- child.on('error', (err) => {
47
- process.stderr.write('ado-mcp-wrapper: ' + err.message + '\n');
48
- process.exit(1);
49
- });
1
+ #!/usr/bin/env node
2
+ /**
3
+ * Wrapper for @azure-devops/mcp that fetches an ADO token via azureauth
4
+ * broker (no browser popup) and sets AZURE_DEVOPS_EXT_PAT before launching
5
+ * the MCP server.
6
+ */
7
+ const { execSync, spawn } = require('child_process');
8
+ const path = require('path');
9
+
10
+ // Fetch token via azureauth broker (corp tool, no browser)
11
+ let token;
12
+ try {
13
+ token = execSync('azureauth ado token --mode broker --output token --timeout 1', {
14
+ encoding: 'utf8',
15
+ timeout: 30000,
16
+ windowsHide: true,
17
+ }).trim();
18
+ } catch (e) {
19
+ // Fallback: try with web mode (may open browser as last resort)
20
+ try {
21
+ token = execSync('azureauth ado token --mode web --output token --timeout 5', {
22
+ encoding: 'utf8',
23
+ timeout: 120000,
24
+ windowsHide: true,
25
+ }).trim();
26
+ } catch (e2) {
27
+ process.stderr.write('ado-mcp-wrapper: Failed to get ADO token: ' + e2.message + '\n');
28
+ process.exit(1);
29
+ }
30
+ }
31
+
32
+ // Launch the actual MCP server with the token in env
33
+ const args = process.argv.slice(2);
34
+ const child = spawn(process.platform === 'win32' ? 'npx.cmd' : 'npx', [
35
+ '-y',
36
+ '--registry=https://registry.npmjs.org/',
37
+ '@azure-devops/mcp@latest',
38
+ ...args
39
+ ], {
40
+ stdio: 'inherit',
41
+ env: { ...process.env, AZURE_DEVOPS_EXT_PAT: token },
42
+ windowsHide: true,
43
+ });
44
+
45
+ child.on('exit', (code) => process.exit(code || 0));
46
+ child.on('error', (err) => {
47
+ process.stderr.write('ado-mcp-wrapper: ' + err.message + '\n');
48
+ process.exit(1);
49
+ });
package/engine.js CHANGED
@@ -48,6 +48,44 @@ const IDENTITY_DIR = path.join(SQUAD_DIR, 'identity');
48
48
  // "projects": [ { ... }, ... ] — multi-project (central .squad)
49
49
  // Each project must have "localPath" pointing to the repo root.
50
50
 
51
+ function validateConfig(config) {
52
+ let errors = 0;
53
+ // Agents
54
+ if (!config.agents || Object.keys(config.agents).length === 0) {
55
+ console.error('FATAL: No agents defined in config.json');
56
+ errors++;
57
+ }
58
+ // Projects
59
+ const projects = config.projects || [];
60
+ if (projects.length === 0) {
61
+ console.error('FATAL: No projects configured');
62
+ errors++;
63
+ }
64
+ for (const p of projects) {
65
+ if (!p.localPath || !fs.existsSync(path.resolve(p.localPath))) {
66
+ console.error(`WARN: Project "${p.name}" path not found: ${p.localPath}`);
67
+ }
68
+ if (!p.repositoryId) {
69
+ console.warn(`WARN: Project "${p.name}" missing repositoryId — PR operations will fail`);
70
+ }
71
+ }
72
+ // Playbooks
73
+ const requiredPlaybooks = ['implement', 'review', 'fix', 'work-item'];
74
+ for (const pb of requiredPlaybooks) {
75
+ if (!fs.existsSync(path.join(PLAYBOOKS_DIR, `${pb}.md`))) {
76
+ console.error(`WARN: Missing playbook: playbooks/${pb}.md`);
77
+ }
78
+ }
79
+ // Routing
80
+ if (!fs.existsSync(ROUTING_PATH)) {
81
+ console.error('WARN: routing.md not found — agent routing will use fallbacks only');
82
+ }
83
+ if (errors > 0) {
84
+ console.error(`\n${errors} fatal config error(s) — exiting.`);
85
+ process.exit(1);
86
+ }
87
+ }
88
+
51
89
  function getProjects(config) {
52
90
  if (config.projects && Array.isArray(config.projects)) {
53
91
  return config.projects;
@@ -525,6 +563,7 @@ function sanitizeBranch(name) {
525
563
  // ─── Agent Spawner ──────────────────────────────────────────────────────────
526
564
 
527
565
  const activeProcesses = new Map(); // dispatchId → { proc, agentId, startedAt }
566
+ let engineRestartGraceUntil = 0; // timestamp — suppress orphan detection until this time
528
567
 
529
568
  function spawnAgent(dispatchItem, config) {
530
569
  const { id, agent: agentId, prompt: taskPrompt, type, meta } = dispatchItem;
@@ -1268,6 +1307,77 @@ async function pollPrStatus(config) {
1268
1307
  }
1269
1308
  }
1270
1309
 
1310
+ // ─── Post-Merge / Post-Close Hooks ───────────────────────────────────────────
1311
+
1312
+ async function handlePostMerge(pr, project, config, newStatus) {
1313
+ const prNum = (pr.id || '').replace('PR-', '');
1314
+
1315
+ // 1. Worktree cleanup
1316
+ if (pr.branch) {
1317
+ const root = path.resolve(project.localPath);
1318
+ const wtRoot = path.resolve(root, config.engine?.worktreeRoot || '../worktrees');
1319
+ const wtPath = path.join(wtRoot, pr.branch);
1320
+ const btPath = path.join(wtRoot, `bt-${prNum}`); // build-and-test worktree
1321
+ for (const p of [wtPath, btPath]) {
1322
+ if (fs.existsSync(p)) {
1323
+ try {
1324
+ execSync(`git worktree remove "${p}" --force`, { cwd: root, stdio: 'pipe', timeout: 15000 });
1325
+ log('info', `Cleaned up worktree: ${p}`);
1326
+ } catch (e) { log('warn', `Failed to remove worktree ${p}: ${e.message}`); }
1327
+ }
1328
+ }
1329
+ }
1330
+
1331
+ // Only run remaining hooks for merged PRs (not abandoned)
1332
+ if (newStatus !== 'merged') return;
1333
+
1334
+ // 2. Update PRD item status to 'implemented'
1335
+ if (pr.prdItems?.length > 0) {
1336
+ const root = path.resolve(project.localPath);
1337
+ const prdSrc = project.workSources?.prd || {};
1338
+ const prdPath = path.resolve(root, prdSrc.path || 'docs/prd-gaps.json');
1339
+ const prd = safeJson(prdPath);
1340
+ if (prd?.missing_features) {
1341
+ let updated = 0;
1342
+ for (const itemId of pr.prdItems) {
1343
+ const feature = prd.missing_features.find(f => f.id === itemId);
1344
+ if (feature && feature.status !== 'implemented') {
1345
+ feature.status = 'implemented';
1346
+ updated++;
1347
+ }
1348
+ }
1349
+ if (updated > 0) {
1350
+ safeWrite(prdPath, prd);
1351
+ log('info', `Post-merge: marked ${updated} PRD item(s) as implemented for ${pr.id}`);
1352
+ }
1353
+ }
1354
+ }
1355
+
1356
+ // 3. Update agent metrics
1357
+ const agentId = (pr.agent || '').toLowerCase();
1358
+ if (agentId && config.agents?.[agentId]) {
1359
+ const metricsPath = path.join(ENGINE_DIR, 'metrics.json');
1360
+ const metrics = safeJson(metricsPath) || {};
1361
+ if (!metrics[agentId]) metrics[agentId] = { tasksCompleted:0, tasksErrored:0, prsCreated:0, prsApproved:0, prsRejected:0, prsMerged:0, reviewsDone:0, lastTask:null, lastCompleted:null };
1362
+ metrics[agentId].prsMerged = (metrics[agentId].prsMerged || 0) + 1;
1363
+ safeWrite(metricsPath, metrics);
1364
+ }
1365
+
1366
+ // 4. Teams notification
1367
+ const teamsUrl = process.env.TEAMS_PLAN_FLOW_URL;
1368
+ if (teamsUrl) {
1369
+ try {
1370
+ await fetch(teamsUrl, {
1371
+ method: 'POST',
1372
+ headers: { 'Content-Type': 'application/json' },
1373
+ body: JSON.stringify({ text: `PR ${pr.id} merged: ${pr.title} (${project.name}) by ${pr.agent || 'unknown'}` })
1374
+ });
1375
+ } catch (e) { log('warn', `Teams post-merge notify failed: ${e.message}`); }
1376
+ }
1377
+
1378
+ log('info', `Post-merge hooks completed for ${pr.id}`);
1379
+ }
1380
+
1271
1381
  function checkForLearnings(agentId, agentInfo, taskDesc) {
1272
1382
  const today = dateStamp();
1273
1383
  const inboxFiles = getInboxFiles();
@@ -1621,16 +1731,43 @@ function updateSnapshot(config) {
1621
1731
  safeWrite(path.join(IDENTITY_DIR, 'now.md'), snapshot);
1622
1732
  }
1623
1733
 
1734
+ // ─── Idle Alert ─────────────────────────────────────────────────────────────
1735
+
1736
+ let _lastActivityTime = Date.now();
1737
+ let _idleAlertSent = false;
1738
+
1739
+ function checkIdleThreshold(config) {
1740
+ const thresholdMs = (config.engine?.idleAlertMinutes || 15) * 60 * 1000;
1741
+ const agents = Object.keys(config.agents || {});
1742
+ const allIdle = agents.every(id => isAgentIdle(id));
1743
+ const dispatch = getDispatch();
1744
+ const hasPending = (dispatch.pending || []).length > 0;
1745
+
1746
+ if (!allIdle || hasPending) {
1747
+ _lastActivityTime = Date.now();
1748
+ _idleAlertSent = false;
1749
+ return;
1750
+ }
1751
+
1752
+ const idleMs = Date.now() - _lastActivityTime;
1753
+ if (idleMs > thresholdMs && !_idleAlertSent) {
1754
+ const mins = Math.round(idleMs / 60000);
1755
+ log('warn', `All agents idle for ${mins} minutes — no work sources producing items`);
1756
+ _idleAlertSent = true;
1757
+ }
1758
+ }
1759
+
1624
1760
  // ─── Timeout Checker ────────────────────────────────────────────────────────
1625
1761
 
1626
1762
  function checkTimeouts(config) {
1627
1763
  const timeout = config.engine?.agentTimeout || 18000000; // 5h default
1628
1764
  const heartbeatTimeout = config.engine?.heartbeatTimeout || 300000; // 5min — no output = dead
1629
1765
 
1630
- // 1. Check tracked processes for hard timeout
1766
+ // 1. Check tracked processes for hard timeout (supports per-item deadline from fan-out)
1631
1767
  for (const [id, info] of activeProcesses.entries()) {
1768
+ const itemTimeout = info.meta?.deadline ? Math.max(0, info.meta.deadline - new Date(info.startedAt).getTime()) : timeout;
1632
1769
  const elapsed = Date.now() - new Date(info.startedAt).getTime();
1633
- if (elapsed > timeout) {
1770
+ if (elapsed > itemTimeout) {
1634
1771
  log('warn', `Agent ${info.agentId} (${id}) hit hard timeout after ${Math.round(elapsed / 1000)}s — killing`);
1635
1772
  try { info.proc.kill('SIGTERM'); } catch {}
1636
1773
  setTimeout(() => {
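The hunk above swaps the global hard timeout for a per-item budget whenever a dispatch carries `meta.deadline`. A standalone sketch of that calculation follows, mirroring the expression in the diff. The function name `hardTimeoutBudgetMs` and its parameter names are hypothetical, introduced only for illustration.

```javascript
// Hypothetical standalone version of the per-item timeout expression.
// deadlineMs is an absolute epoch timestamp; startedAt is anything Date accepts.
function hardTimeoutBudgetMs(startedAt, deadlineMs, defaultTimeoutMs) {
  // No deadline on the item: fall back to the engine-wide hard timeout.
  if (!deadlineMs) return defaultTimeoutMs;
  const startedMs = new Date(startedAt).getTime();
  // A deadline that is already behind the start time yields a zero budget.
  return Math.max(0, deadlineMs - startedMs);
}
```

Because the budget is measured from the start time, an item whose deadline was stamped before its agent actually started has less time left than the nominal timeout suggests.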
@@ -1707,9 +1844,10 @@ function checkTimeouts(config) {
1707
1844
 
1708
1845
  // Check if agent is in a blocking tool call (TaskOutput block:true, Bash with long timeout, etc.)
1709
1846
  // These tools produce no stdout for extended periods — don't kill them prematurely
1847
+ // Check for BOTH tracked and untracked processes (orphan case after engine restart)
1710
1848
  let isBlocking = false;
1711
1849
  let blockingTimeout = heartbeatTimeout;
1712
- if (hasProcess && silentMs > heartbeatTimeout) {
1850
+ if (silentMs > heartbeatTimeout) {
1713
1851
  try {
1714
1852
  const liveLog = safeRead(liveLogPath);
1715
1853
  if (liveLog) {
@@ -1747,9 +1885,9 @@ function checkTimeouts(config) {
1747
1885
 
1748
1886
  const effectiveTimeout = isBlocking ? blockingTimeout : heartbeatTimeout;
1749
1887
 
1750
- if (!hasProcess && silentMs > heartbeatTimeout) {
1751
- // No tracked process AND no recent output → orphaned
1752
- log('warn', `Orphan detected: ${item.agent} (${item.id}) — no process tracked, no output for ${silentSec}s`);
1888
+ if (!hasProcess && silentMs > effectiveTimeout && Date.now() > engineRestartGraceUntil) {
1889
+ // No tracked process AND no recent output past effective timeout AND grace period expired → orphaned
1890
+ log('warn', `Orphan detected: ${item.agent} (${item.id}) — no process tracked, silent for ${silentSec}s${isBlocking ? ' (blocking timeout exceeded)' : ''}`);
1753
1891
  deadItems.push({ item, reason: `Orphaned — no process, silent for ${silentSec}s` });
1754
1892
  } else if (hasProcess && silentMs > effectiveTimeout) {
1755
1893
  // Has process but no output past effective timeout → hung
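Taken together, the changes in this hunk make the liveness decision depend on three things: whether a process handle exists, whether the silence exceeds the effective timeout (extended when a blocking tool call is detected), and whether the restart grace period has passed. A minimal sketch of that decision as a pure function is shown below. `classifyDispatch`, its parameter names, and the returned labels are hypothetical and only approximate the branch structure visible in the diff.

```javascript
// Hypothetical pure-function sketch of the decision made in checkTimeouts().
function classifyDispatch({ hasProcess, silentMs, heartbeatTimeoutMs, blockingTimeoutMs, nowMs, graceUntilMs }) {
  // A detected blocking tool call replaces the heartbeat window with its own.
  const effectiveTimeoutMs = blockingTimeoutMs ?? heartbeatTimeoutMs;
  if (silentMs <= effectiveTimeoutMs) return 'alive';
  // Tracked process that has been silent too long.
  if (hasProcess) return 'hung';
  // No handle: only call it an orphan once the restart grace period is over.
  return nowMs > graceUntilMs ? 'orphaned' : 'grace';
}
```

Note that in this sketch the grace period only shields dispatches with no process handle; a tracked process that goes silent is still reported as hung, which matches the `else if (hasProcess && ...)` branch in the diff.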
@@ -2041,15 +2179,16 @@ function discoverFromPrd(config, project) {
2041
2179
  const statusFilter = src.itemFilter?.status || ['missing', 'planned'];
2042
2180
  const items = (prd.missing_features || []).filter(f => statusFilter.includes(f.status));
2043
2181
  const newWork = [];
2182
+ const skipped = { dispatched: 0, cooldown: 0, noAgent: 0 };
2044
2183
 
2045
2184
  for (const item of items) {
2046
2185
  const key = `prd-${project?.name || 'default'}-${item.id}`;
2047
- if (isAlreadyDispatched(key)) continue;
2048
- if (isOnCooldown(key, cooldownMs)) continue;
2186
+ if (isAlreadyDispatched(key)) { skipped.dispatched++; continue; }
2187
+ if (isOnCooldown(key, cooldownMs)) { skipped.cooldown++; continue; }
2049
2188
 
2050
2189
  const workType = item.estimated_complexity === 'large' ? 'implement:large' : 'implement';
2051
2190
  const agentId = resolveAgent(workType, config);
2052
- if (!agentId) continue;
2191
+ if (!agentId) { skipped.noAgent++; continue; }
2053
2192
 
2054
2193
  const branchName = `feature/${item.id.toLowerCase()}-${item.name.toLowerCase().replace(/[^a-z0-9]+/g, '-').slice(0, 40)}`;
2055
2194
  const vars = {
@@ -2090,6 +2229,11 @@ function discoverFromPrd(config, project) {
2090
2229
  setCooldown(key);
2091
2230
  }
2092
2231
 
2232
+ const skipTotal = skipped.dispatched + skipped.cooldown + skipped.noAgent;
2233
+ if (skipTotal > 0) {
2234
+ log('debug', `PRD discovery (${project?.name}): skipped ${skipTotal} items (${skipped.dispatched} dispatched, ${skipped.cooldown} cooldown, ${skipped.noAgent} no agent)`);
2235
+ }
2236
+
2093
2237
  return newWork;
2094
2238
  }
2095
2239
 
@@ -2369,20 +2513,20 @@ function discoverFromWorkItems(config, project) {
2369
2513
  const items = safeJson(path.resolve(root, src.path)) || [];
2370
2514
  const cooldownMs = (src.cooldownMinutes || 0) * 60 * 1000;
2371
2515
  const newWork = [];
2516
+ const skipped = { gated: 0, noAgent: 0 };
2372
2517
 
2373
2518
  for (const item of items) {
2374
2519
  if (item.status !== 'queued' && item.status !== 'pending') continue;
2375
2520
 
2376
2521
  const key = `work-${project?.name || 'default'}-${item.id}`;
2377
- if (isAlreadyDispatched(key) || isOnCooldown(key, cooldownMs)) continue;
2522
+ if (isAlreadyDispatched(key) || isOnCooldown(key, cooldownMs)) { skipped.gated++; continue; }
2378
2523
 
2379
2524
  let workType = item.type || 'implement';
2380
- // Route large items to architecture agents, matching PRD/plan behavior
2381
2525
  if (workType === 'implement' && (item.complexity === 'large' || item.estimated_complexity === 'large')) {
2382
2526
  workType = 'implement:large';
2383
2527
  }
2384
2528
  const agentId = item.agent || resolveAgent(workType, config);
2385
- if (!agentId) continue;
2529
+ if (!agentId) { skipped.noAgent++; continue; }
2386
2530
 
2387
2531
  const branchName = item.branch || `work/${item.id}`;
2388
2532
  const vars = {
@@ -2443,6 +2587,11 @@ function discoverFromWorkItems(config, project) {
2443
2587
  safeWrite(workItemsPath, items);
2444
2588
  }
2445
2589
 
2590
+ const skipTotal = skipped.gated + skipped.noAgent;
2591
+ if (skipTotal > 0) {
2592
+ log('debug', `Work item discovery (${project?.name}): skipped ${skipTotal} items (${skipped.gated} gated, ${skipped.noAgent} no agent)`);
2593
+ }
2594
+
2446
2595
  return newWork;
2447
2596
  }
2448
2597
 
@@ -2695,7 +2844,10 @@ function discoverCentralWorkItems(config) {
2695
2844
  agentRole: agent.role,
2696
2845
  task: `[fan-out] ${item.title} → ${agent.name}${assignedProject ? ' → ' + assignedProject.name : ''}`,
2697
2846
  prompt,
2698
- meta: { dispatchKey: fanKey, source: 'central-work-item-fanout', item, parentKey: key }
2847
+ meta: {
2848
+ dispatchKey: fanKey, source: 'central-work-item-fanout', item, parentKey: key,
2849
+ deadline: item.timeout ? Date.now() + item.timeout : Date.now() + (config.engine?.fanOutTimeout || config.engine?.agentTimeout || 18000000)
2850
+ }
2699
2851
  });
2700
2852
  }
2701
2853
 
@@ -2838,8 +2990,9 @@ async function tickInner() {
2838
2990
  const config = getConfig();
2839
2991
  tickCount++;
2840
2992
 
2841
- // 1. Check for timed-out agents
2993
+ // 1. Check for timed-out agents and idle threshold
2842
2994
  checkTimeouts(config);
2995
+ checkIdleThreshold(config);
2843
2996
 
2844
2997
  // 2. Consolidate inbox
2845
2998
  consolidateInbox(config);
@@ -2931,9 +3084,24 @@ const commands = {
2931
3084
  }
2932
3085
  }
2933
3086
 
3087
+ // Validate config before starting
3088
+ validateConfig(config);
3089
+
2934
3090
  // Load persistent state
2935
3091
  loadCooldowns();
2936
3092
 
3093
+ // Grace period for agents that survived a restart
3094
+ const dispatch = getDispatch();
3095
+ const activeOnStart = (dispatch.active || []);
3096
+ if (activeOnStart.length > 0) {
3097
+ const gracePeriod = config.engine?.restartGracePeriod || 1200000; // 20min default
3098
+ engineRestartGraceUntil = Date.now() + gracePeriod;
3099
+ console.log(` ${activeOnStart.length} active dispatch(es) from previous session — ${gracePeriod / 60000}min grace period before orphan detection`);
3100
+ for (const item of activeOnStart) {
3101
+ console.log(` - ${item.agentName || item.agent}: ${(item.task || '').slice(0, 70)}`);
3102
+ }
3103
+ }
3104
+
2937
3105
  // Initial tick
2938
3106
  tick();
2939
3107
 
@@ -2944,6 +3112,18 @@ const commands = {
2944
3112
  },
2945
3113
 
2946
3114
  stop() {
3115
+ // Warn if agents are actively working
3116
+ const dispatch = getDispatch();
3117
+ const active = (dispatch.active || []);
3118
+ if (active.length > 0) {
3119
+ console.log(`\n WARNING: ${active.length} agent(s) are still working:`);
3120
+ for (const item of active) {
3121
+ console.log(` - ${item.agentName || item.agent}: ${(item.task || '').slice(0, 80)}`);
3122
+ }
3123
+ console.log('\n These agents will continue running but the engine won\'t monitor them.');
3124
+ console.log(' On next start, they\'ll get a 20-min grace period before being marked as orphans.');
3125
+ console.log(' To kill them now, run: node engine.js kill\n');
3126
+ }
2947
3127
  safeWrite(CONTROL_PATH, { state: 'stopped', stopped_at: ts() });
2948
3128
  log('info', 'Engine stopped');
2949
3129
  console.log('Engine stopped.');
package/package.json CHANGED
@@ -1,46 +1,46 @@
1
- {
2
- "name": "@yemi33/squad",
3
- "version": "0.1.0",
4
- "description": "Multi-agent AI dev team that runs from ~/.squad/ — five autonomous agents share a single engine, dashboard, and knowledge base",
5
- "bin": {
6
- "squad": "bin/squad.js"
7
- },
8
- "keywords": [
9
- "ai",
10
- "agents",
11
- "claude",
12
- "dev-team",
13
- "automation",
14
- "multi-agent",
15
- "cli"
16
- ],
17
- "author": "yemi33",
18
- "license": "MIT",
19
- "repository": {
20
- "type": "git",
21
- "url": "https://github.com/yemi33/squad.git"
22
- },
23
- "homepage": "https://github.com/yemi33/squad#readme",
24
- "engines": {
25
- "node": ">=18"
26
- },
27
- "files": [
28
- "bin/",
29
- "agents/*/charter.md",
30
- "config.template.json",
31
- "dashboard.html",
32
- "dashboard.js",
33
- "docs/",
34
- "engine.js",
35
- "engine/spawn-agent.js",
36
- "engine/ado-mcp-wrapper.js",
37
- "playbooks/",
38
- "routing.md",
39
- "skills/README.md",
40
- "skills/ado-pr-status-fetch.md",
41
- "squad.js",
42
- "team.md",
43
- "README.md",
44
- "TODO.md"
45
- ]
46
- }
1
+ {
2
+ "name": "@yemi33/squad",
3
+ "version": "0.1.2",
4
+ "description": "Multi-agent AI dev team that runs from ~/.squad/ — five autonomous agents share a single engine, dashboard, and knowledge base",
5
+ "bin": {
6
+ "squad": "bin/squad.js"
7
+ },
8
+ "keywords": [
9
+ "ai",
10
+ "agents",
11
+ "claude",
12
+ "dev-team",
13
+ "automation",
14
+ "multi-agent",
15
+ "cli"
16
+ ],
17
+ "author": "yemi33",
18
+ "license": "MIT",
19
+ "repository": {
20
+ "type": "git",
21
+ "url": "https://github.com/yemi33/squad.git"
22
+ },
23
+ "homepage": "https://github.com/yemi33/squad#readme",
24
+ "engines": {
25
+ "node": ">=18"
26
+ },
27
+ "files": [
28
+ "bin/",
29
+ "agents/*/charter.md",
30
+ "config.template.json",
31
+ "dashboard.html",
32
+ "dashboard.js",
33
+ "docs/",
34
+ "engine.js",
35
+ "engine/spawn-agent.js",
36
+ "engine/ado-mcp-wrapper.js",
37
+ "playbooks/",
38
+ "routing.md",
39
+ "skills/README.md",
40
+ "skills/ado-pr-status-fetch.md",
41
+ "squad.js",
42
+ "team.md",
43
+ "README.md",
44
+ "TODO.md"
45
+ ]
46
+ }