tiger-agent 0.3.1 → 0.3.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/.env.example CHANGED
@@ -4,6 +4,13 @@
4
4
  # NEVER commit .env to git!
5
5
  # ============================================
6
6
 
7
+ # Provider Routing (default: MiniMax)
8
+ ACTIVE_PROVIDER=minimax
9
+ PROVIDER_ORDER=minimax,claude,kimi,moonshot,zai
10
+ MINIMAX_API_KEY=your_minimax_api_key_here
11
+ MINIMAX_BASE_URL=https://api.minimax.io/v1
12
+ MINIMAX_MODEL=MiniMax-M2.5
13
+
7
14
  # Google Gemini (for image generation)
8
15
  GEMINI_API_KEY=your_gemini_api_key_here
9
16
 
@@ -12,6 +19,8 @@ TELEGRAM_BOT_TOKEN=your_telegram_bot_token_here
12
19
  TELEGRAM_CHAT_ID=8172556270
13
20
  # Swarm agent step timeout in ms (0 = no extra swarm timeout)
14
21
  SWARM_AGENT_TIMEOUT_MS=0
22
+ # Start Telegram swarm routing enabled or disabled (true/false)
23
+ SWARM_ENABLED=false
15
24
  # Swarm-only provider failover on timeout/network/API error (true/false)
16
25
  SWARM_ROUTE_ON_PROVIDER_ERROR=false
17
26
  # Swarm default flow for new Telegram tasks (auto|design|research_build)
@@ -7,6 +7,9 @@ MOONSHOT_API_KEY=
7
7
  # Kimi Code API key (used when KIMI_PROVIDER=code)
8
8
  KIMI_CODE_API_KEY=
9
9
 
10
+ # MiniMax API key
11
+ MINIMAX_API_KEY=
12
+
10
13
  # Backward-compatible alias (optional)
11
14
  KIMI_API_KEY=
12
15
 
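Both `.env` examples above pair `ACTIVE_PROVIDER` with `PROVIDER_ORDER`, and the README describes the order as the list tried when the active provider is rate-limited. The actual routing code is not part of this diff, so the following is only a rough sketch. It assumes the active provider is tried first, followed by the remaining `PROVIDER_ORDER` entries in order with duplicates dropped. `resolveProviderChain` is a hypothetical name.

```javascript
// Hypothetical helper (not part of tiger-agent): combine ACTIVE_PROVIDER and
// PROVIDER_ORDER into one ordered, de-duplicated list of providers to try.
function resolveProviderChain(env) {
  const active = String(env.ACTIVE_PROVIDER || '').trim().toLowerCase();
  const order = String(env.PROVIDER_ORDER || '')
    .split(',')
    .map((s) => s.trim().toLowerCase())
    .filter(Boolean);
  // Active provider first, then the rest of PROVIDER_ORDER; Set keeps first occurrence.
  const chain = active ? [active, ...order] : order;
  return [...new Set(chain)];
}

const chain = resolveProviderChain({
  ACTIVE_PROVIDER: 'minimax',
  PROVIDER_ORDER: 'minimax,claude,kimi,moonshot,zai'
});
console.log(chain.join(' -> '));
```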
package/README.md CHANGED
@@ -54,6 +54,32 @@ Made by **AI Research Group, Department of Civil Engineering, KMUTT**
54
54
  | **Channels** | CLI + Telegram simultaneously | Single channel only |
55
55
  | **Execution** | Chains multiple skills autonomously | Single command only |
56
56
 
57
+ ## 📊 Dimension Comparison
58
+
59
+ | Dimension | Tiger v0.3.1 🐯 | OpenClaw 🔧 | NanoClaw 🪐 |
60
+ |---|---|---|---|
61
+ | Language | JS + Python | TypeScript | TypeScript |
62
+ | Platform | Linux + Docker | macOS/Linux/Win | macOS/Linux/Win |
63
+ | Install | `npm install -g tiger-agent` | `npm install -g openclaw` | `git clone` + Claude Code |
64
+ | LLM Providers | 5 (Kimi, Z.ai, MiniMax, Claude, Moonshot) | OpenAI + Claude | Claude only |
65
+ | Multi-provider Failover | ✅ Auto on 429/403 | ✅ | ❌ |
66
+ | Token Budgeting | ✅ Per-provider daily limits | ❌ | ❌ |
67
+ | Predefined Agents | ✅ Role-based, customizable via Markdown files | ✅ Built-in typed agents | ❌ User-defined only |
68
+ | Swarm Architecture | ✅ YAML configurable | ❌ | ❌ |
69
+ | Parallel Execution | ✅ Fault-tolerant `min_success` threshold | ✅ | ✅ |
70
+ | Judgment Matrix | ✅ Weighted criteria + review-revise loop | ❌ | ❌ |
71
+ | Task Resume | ✅ `/task continue <id>` | ❌ | ❌ |
72
+ | Crash Detection | ✅ 60s heartbeat; 5-min stale -> restart worker | ❌ | ✅ 5-min -> reclaim tasks |
73
+ | Container Isolation | ✅ Docker hardened (`cap_drop: ALL`, read-only FS) | Optional Docker | ✅ Docker default |
74
+ | Memory Persistence | ✅ Cross-session SQLite + 30-day backup | Session only | Team lifetime only |
75
+ | Self-learning | ✅ 12h reflection + 24h regeneration | ❌ | ❌ |
76
+ | Vector Retrieval | ✅ sqlite-vec / cosine fallback | ❌ | ❌ |
77
+ | Audit Logging | ✅ | ❌ | ❌ |
78
+ | Voice / Browser | ❌ / ❌ | ✅ / ✅ | ❌ / ❌ |
79
+ | Channel Coverage | Telegram, WhatsApp, CLI | All + iMessage + Teams | Most major |
80
+ | Core Strength | Cost control + YAML swarm + self-learning | Channel breadth + voice + sync A2A | Security + formal swarm lifecycle |
81
+ | Core Weakness | Linux-primary; no cross-task DAG | High complexity; app-layer security | Single-provider lock-in |
82
+
57
83
  ---
58
84
 
59
85
  ## 📋 Requirements
@@ -72,16 +98,60 @@ npm install -g tiger-agent
72
98
 
73
99
  All config and runtime data is stored in `~/.tiger/` — nothing written to the npm global directory.
74
100
 
101
+ ## 🐳 Docker (Safer Runtime Isolation)
102
+
103
+ Run Tiger in a hardened container with:
104
+ - non-root user (`node`)
105
+ - dropped Linux capabilities (`cap_drop: [ALL]`)
106
+ - `no-new-privileges`
107
+ - read-only root filesystem
108
+ - persistent writable volume only for `TIGER_HOME` (`/home/node/.tiger`)
109
+
110
+ Build the image:
111
+
112
+ ```bash
113
+ docker build -t tiger-agent:local .
114
+ ```
115
+
116
+ Run in CLI mode:
117
+
118
+ ```bash
119
+ docker run --rm -it \
120
+ --env-file .env \
121
+ --read-only \
122
+ --tmpfs /tmp \
123
+ --security-opt no-new-privileges:true \
124
+ --cap-drop ALL \
125
+ -e TIGER_HOME=/home/node/.tiger \
126
+ -v tiger_home:/home/node/.tiger \
127
+ tiger-agent:local start
128
+ ```
129
+
130
+ Run in Telegram mode via Compose:
131
+
132
+ ```bash
133
+ docker compose up -d
134
+ docker compose logs -f tiger
135
+ ```
136
+
137
+ The default Compose command is `telegram`; change `command:` in `docker-compose.yml` if you want `start` instead.
138
+
75
139
  ---
76
140
 
77
141
  ## 🚀 Quick Start
78
142
 
79
- ### 1. Run the setup wizard
143
+ ### 1. Run the setup wizard (`npm`, not `npn`)
80
144
 
81
145
  ```bash
82
146
  tiger onboard
83
147
  ```
84
148
 
149
+ If you cloned this repo and run it locally (without a global install), use:
150
+
151
+ ```bash
152
+ npm run onboard
153
+ ```
154
+
85
155
  The wizard will ask for:
86
156
  - **Active provider** — which LLM to use by default (e.g. `zai`, `claude`)
87
157
  - **Fallback order** — comma-separated list tried when the active provider is rate-limited
@@ -92,25 +162,45 @@ The wizard will ask for:
92
162
 
93
163
  Config is saved to `~/.tiger/.env` (mode 600).
94
164
 
165
+ **MiniMax starter (quick setup):**
166
+ ```bash
167
+ # during onboard: choose active provider = minimax
168
+ tiger onboard
169
+ # local repo alternative
170
+ # npm run onboard
171
+ ```
172
+ Set at least:
173
+ - `ACTIVE_PROVIDER=minimax`
174
+ - `MINIMAX_API_KEY=...`
175
+
95
176
  ### 2. Start
96
177
 
97
178
  **CLI chat:**
98
179
  ```bash
99
180
  tiger start
181
+ # local repo
182
+ npm run start
100
183
  ```
101
184
  Exit with `/exit` or `/quit`.
102
185
 
103
186
  **Telegram bot (foreground):**
104
187
  ```bash
105
188
  tiger telegram
189
+ # local repo
190
+ npm run telegram
106
191
  ```
192
+ Use foreground mode only for testing or watching logs in the current terminal session.
107
193
 
108
194
  **Telegram bot (background daemon):**
109
195
  ```bash
110
196
  tiger telegram --background # start
111
197
  tiger status # check if running
112
198
  tiger stop # stop
199
+ # local repo
200
+ npm run telegram:bg # start
201
+ npm run telegram:stop # stop
113
202
  ```
203
+ Recommended for daily use: run background mode so Tiger keeps running after you close the terminal.
114
204
 
115
205
  **Restart background bot (after editing `.env` in this repo):**
116
206
  ```bash
@@ -134,6 +224,11 @@ Logs: `~/.tiger/logs/telegram.out.log`
134
224
  | **Status** | `tiger status` | Check daemon status |
135
225
  | **Onboard** | `tiger onboard` | Re-run setup wizard |
136
226
 
227
+ Background crash detection:
228
+ - The Telegram worker now emits a heartbeat every 60 seconds.
229
+ - The supervisor watchdog checks the heartbeat every minute.
230
+ - If the heartbeat is stale for 5 minutes, the supervisor force-restarts the worker.
231
+
137
232
  ---
138
233
 
139
234
  ## 🔧 Setup Wizard Details
@@ -186,7 +281,7 @@ Tiger supports **5 providers** with automatic fallback and daily token limits.
186
281
  | Kimi Code | `kimi` | `k2p5` | `KIMI_CODE_API_KEY` |
187
282
  | Kimi Moonshot | `moonshot` | `kimi-k1` | `MOONSHOT_API_KEY` |
188
283
  | Z.ai (Zhipu) | `zai` | `glm-4.7` | `ZAI_API_KEY` (format: `id.secret`) |
189
- | MiniMax | `minimax` | `abab6.5s-chat` | `MINIMAX_API_KEY` |
284
+ | MiniMax | `minimax` | `MiniMax-M2.5` | `MINIMAX_API_KEY` |
190
285
  | Claude (Anthropic) | `claude` | `claude-sonnet-4-6` | `CLAUDE_API_KEY` |
191
286
 
192
287
  ### `.env` Example
@@ -218,6 +313,10 @@ SWARM_AGENT_TIMEOUT_MS=120000
218
313
  # Swarm only: on timeout/network/API error, retry via next provider
219
314
  SWARM_ROUTE_ON_PROVIDER_ERROR=true
220
315
 
316
+ # Swarm execution resilience
317
+ SWARM_STEP_MAX_RETRIES=2
318
+ SWARM_CONTINUE_ON_ERROR=true
319
+
221
320
  # Swarm task entry policy
222
321
  SWARM_DEFAULT_FLOW=auto
223
322
  SWARM_FIRST_AGENT_POLICY=auto
@@ -267,17 +366,20 @@ SWARM_FIRST_AGENT=designer
267
366
 
268
367
  Tiger v0.3.1 includes an internal agent swarm for Telegram message routing.
269
368
 
270
- - **Default:** swarm is **ON** when the Telegram bot starts
369
+ - **Default:** swarm is **OFF** when the Telegram bot starts (`SWARM_ENABLED=false`)
271
370
  - **`/swarm on`**: regular user messages are routed through the YAML architecture in `swarm/architecture/*.yaml` (selected by `tasks/styles/default.yaml`)
272
371
  - **`/swarm off`**: regular user messages skip the swarm and go directly to the standard Tiger agent reply path
273
372
  - **Scope:** this toggle affects only **normal chat messages** (not admin commands like `/api`, `/tokens`, `/limit`)
274
- - **Current persistence:** the `/swarm` toggle is currently **in-memory only** and resets to **ON** after bot restart
373
+ - **Current persistence:** the `/swarm` toggle is currently **in-memory only** and resets to the `SWARM_ENABLED` value after a bot restart
275
374
  - **Task resume:** use `/task continue <task_id>` (or `/task retry <task_id>`) to continue a failed timeout/API-error task without starting over
276
375
 
277
376
  ### Swarm Timeout / Failover (`.env`)
278
377
 
279
378
  - `SWARM_AGENT_TIMEOUT_MS`: timeout per swarm worker step (e.g. one `designer` turn). `0` disables the extra swarm timeout.
379
+ - `SWARM_ENABLED=true|false`: default `/swarm` state at bot startup.
280
380
  - `SWARM_ROUTE_ON_PROVIDER_ERROR=true|false`: swarm-only provider failover on timeout/network/API errors.
381
+ - `SWARM_STEP_MAX_RETRIES`: retries per failed worker/stage before giving up.
382
+ - `SWARM_CONTINUE_ON_ERROR=true|false`: if `true`, the swarm continues on a degraded path after retries are exhausted (instead of failing hard).
281
383
  - Provider timeouts are separate and provider-specific, for example `KIMI_TIMEOUT_MS`, `ZAI_TIMEOUT_MS`, `CLAUDE_TIMEOUT_MS`.
282
384
 
283
385
  ### Swarm Entry Policy (`.env`)
@@ -364,6 +466,14 @@ Default architecture behavior:
364
466
  - Stage 3: selected designer revises based on reviewer feedback (loop until approved)
365
467
  - Stage 4: `spec_writer` writes final output in two sections: **Calculation Report** and **Executive Summary**
366
468
 
469
+ Resilient execution behavior:
470
+
471
+ - Parallel stages are fault-tolerant: one failed role does not abort the whole stage.
472
+ - `type: parallel` now supports `min_success` (default `1`) to define how many successful role outputs are required.
473
+ - Failed parallel-role errors are stored in context as `<store_as>_errors`.
474
+ - Worker/stage retries are controlled by `SWARM_STEP_MAX_RETRIES`.
475
+ - If retries are exhausted and `SWARM_CONTINUE_ON_ERROR=true`, the swarm continues on a degraded path instead of failing hard.
476
+
367
477
  Example `swarm/architecture/tiger_parallel_design.yaml`:
368
478
 
369
479
  ```yaml
@@ -394,6 +504,7 @@ stages:
394
504
  - designer_a
395
505
  - designer_b
396
506
  - designer_c
507
+ min_success: 2
397
508
  store_as: design_candidates
398
509
  next: review_best
399
510
  - id: review_best
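The README advertises per-provider daily token limits, where the wizard states that `0` means unlimited and a breach triggers an auto-switch. The budget-tracking code is not in this diff. The sketch below only illustrates that rule under stated assumptions: walk the fallback order and take the first provider still under its limit. Treating a missing limit as unlimited is a choice made for this sketch. `pickProvider` is hypothetical, and the separate 429/403 failover is not modelled.

```javascript
// Hypothetical sketch: pick the first provider in the fallback order whose
// daily token usage is still below its limit. A limit of 0 (or no limit entry)
// is treated as unlimited, following the "0 = unlimited" *_TOKEN_LIMIT convention.
function pickProvider(order, usage, limits) {
  for (const id of order) {
    const limit = Number(limits[id] || 0);
    const used = Number(usage[id] || 0);
    if (limit === 0 || used < limit) return id;
  }
  return null; // every provider in the order has exhausted its daily budget
}

const order = ['minimax', 'claude', 'kimi'];
const limits = { minimax: 100000, claude: 500000, kimi: 100000 };
const usage = { minimax: 100000, claude: 120000 };
console.log(pickProvider(order, usage, limits));
```

In this example MiniMax has used exactly its 100000-token budget, so the sketch moves on to `claude`.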
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "tiger-agent",
3
- "version": "0.3.1",
3
+ "version": "0.3.2",
4
4
  "description": "Cognitive AI agent with persistent memory, multi-provider LLM, and Telegram bot",
5
5
  "type": "commonjs",
6
6
  "main": "src/cli.js",
@@ -60,6 +60,19 @@ function envLine(k, v) {
60
60
  return `${k}=${s}`;
61
61
  }
62
62
 
63
+ const KNOWN_PROVIDERS = ['minimax', 'zai', 'claude', 'kimi', 'moonshot'];
64
+
65
+ function parseProviderList(input, fallback = []) {
66
+ const raw = String(input || '').trim();
67
+ const values = (raw ? raw : fallback.join(','))
68
+ .split(',')
69
+ .map((s) => s.trim().toLowerCase())
70
+ .filter(Boolean);
71
+ const unique = [...new Set(values)];
72
+ const invalid = unique.filter((p) => !KNOWN_PROVIDERS.includes(p));
73
+ return { providers: unique, invalid };
74
+ }
75
+
63
76
  // ─── Daemon helpers ───────────────────────────────────────────────────────────
64
77
 
65
78
  function nodeBin() {
@@ -180,32 +193,87 @@ Config will be saved to: ${TIGER_HOME}
180
193
  if (!yn(ow, false)) { console.log('Cancelled.'); rl.close(); return; }
181
194
  }
182
195
 
183
- // ── Active provider ────────────────────────────────────────────────────────
184
- console.log('\nAvailable providers: kimi, zai (Zhipu GLM-4.7), minimax, claude, moonshot');
185
- const activeProv = (await ask('Active provider (zai): ')).trim() || 'zai';
186
- const provOrder = (await ask(`Provider fallback order (${activeProv},claude,kimi,minimax,moonshot): `)).trim()
187
- || `${activeProv},claude,kimi,minimax,moonshot`;
196
+ // ── Provider selection / routing ──────────────────────────────────────────
197
+ console.log('\nAvailable providers: minimax, zai (Zhipu GLM-4.7), claude, kimi, moonshot');
198
+ console.log('Choose only providers you want to configure. Others will be omitted from .env.');
199
+
200
+ let selectedProviders = [];
201
+ while (!selectedProviders.length) {
202
+ const picked = await ask('Providers to configure (comma-separated, default: minimax): ');
203
+ const parsed = parseProviderList(picked, ['minimax']);
204
+ if (parsed.invalid.length) {
205
+ console.log(`Invalid provider(s): ${parsed.invalid.join(', ')}. Try again.`);
206
+ continue;
207
+ }
208
+ if (!parsed.providers.length) {
209
+ console.log('Pick at least one provider.');
210
+ continue;
211
+ }
212
+ selectedProviders = parsed.providers;
213
+ }
214
+
215
+ const activeDefault = selectedProviders[0];
216
+ let activeProv = '';
217
+ while (!activeProv) {
218
+ const candidate = (await ask(`Active provider (${activeDefault}): `)).trim().toLowerCase() || activeDefault;
219
+ if (!selectedProviders.includes(candidate)) {
220
+ console.log(`Active provider must be one of: ${selectedProviders.join(', ')}`);
221
+ continue;
222
+ }
223
+ activeProv = candidate;
224
+ }
188
225
 
189
- // ── API keys ───────────────────────────────────────────────────────────────
190
- console.log('\nEnter API keys (press Enter to skip a provider):');
226
+ const orderDefault = [activeProv, ...selectedProviders.filter((p) => p !== activeProv)].join(',');
227
+ let provOrder = '';
228
+ while (!provOrder) {
229
+ const input = await ask(`Provider fallback order (${orderDefault}): `);
230
+ const parsed = parseProviderList(input, [activeProv, ...selectedProviders.filter((p) => p !== activeProv)]);
231
+ if (parsed.invalid.length) {
232
+ console.log(`Invalid provider(s): ${parsed.invalid.join(', ')}. Try again.`);
233
+ continue;
234
+ }
235
+ const outsideSelection = parsed.providers.filter((p) => !selectedProviders.includes(p));
236
+ if (outsideSelection.length) {
237
+ console.log(`Order can only include selected providers: ${selectedProviders.join(', ')}`);
238
+ continue;
239
+ }
240
+ if (!parsed.providers.includes(activeProv)) {
241
+ console.log(`Order must include active provider: ${activeProv}`);
242
+ continue;
243
+ }
244
+ provOrder = parsed.providers.join(',');
245
+ }
191
246
 
192
- const kimiKey = (await askHidden(' KIMI_CODE_API_KEY : ')).trim();
193
- const moonshotKey= (await askHidden(' MOONSHOT_API_KEY : ')).trim();
194
- const zaiKey = (await askHidden(' ZAI_API_KEY : ')).trim();
195
- const minimaxKey = (await askHidden(' MINIMAX_API_KEY : ')).trim();
196
- const claudeKey = (await askHidden(' CLAUDE_API_KEY : ')).trim();
247
+ // ── API keys ───────────────────────────────────────────────────────────────
248
+ console.log('\nEnter API keys for selected providers:');
249
+ const kimiKey = selectedProviders.includes('kimi') ? (await askHidden(' KIMI_CODE_API_KEY : ')).trim() : '';
250
+ const moonshotKey = selectedProviders.includes('moonshot') ? (await askHidden(' MOONSHOT_API_KEY : ')).trim() : '';
251
+ const zaiKey = selectedProviders.includes('zai') ? (await askHidden(' ZAI_API_KEY : ')).trim() : '';
252
+ const minimaxKey = selectedProviders.includes('minimax') ? (await askHidden(' MINIMAX_API_KEY : ')).trim() : '';
253
+ const claudeKey = selectedProviders.includes('claude') ? (await askHidden(' CLAUDE_API_KEY : ')).trim() : '';
197
254
 
198
255
  // ── Telegram ───────────────────────────────────────────────────────────────
199
256
  console.log('');
200
257
  const tgToken = (await askHidden(' TELEGRAM_BOT_TOKEN : ')).trim();
201
258
 
202
259
  // ── Token limits ───────────────────────────────────────────────────────────
203
- console.log('\nDaily token limits per provider (0 = unlimited, auto-switch on breach):');
204
- const kimiLimit = (await ask(' KIMI_TOKEN_LIMIT (100000): ')).trim() || '100000';
205
- const moonshotLimit= (await ask(' MOONSHOT_TOKEN_LIMIT(100000): ')).trim() || '100000';
206
- const zaiLimit = (await ask(' ZAI_TOKEN_LIMIT (100000): ')).trim() || '100000';
207
- const minimaxLimit = (await ask(' MINIMAX_TOKEN_LIMIT (100000): ')).trim() || '100000';
208
- const claudeLimit = (await ask(' CLAUDE_TOKEN_LIMIT (500000): ')).trim() || '500000';
260
+ const tokenLimits = {};
261
+ console.log('\nDaily token limits for selected providers (0 = unlimited, auto-switch on breach):');
262
+ if (selectedProviders.includes('kimi')) {
263
+ tokenLimits.kimi = (await ask(' KIMI_TOKEN_LIMIT (100000): ')).trim() || '100000';
264
+ }
265
+ if (selectedProviders.includes('moonshot')) {
266
+ tokenLimits.moonshot = (await ask(' MOONSHOT_TOKEN_LIMIT(100000): ')).trim() || '100000';
267
+ }
268
+ if (selectedProviders.includes('zai')) {
269
+ tokenLimits.zai = (await ask(' ZAI_TOKEN_LIMIT (100000): ')).trim() || '100000';
270
+ }
271
+ if (selectedProviders.includes('minimax')) {
272
+ tokenLimits.minimax = (await ask(' MINIMAX_TOKEN_LIMIT (100000): ')).trim() || '100000';
273
+ }
274
+ if (selectedProviders.includes('claude')) {
275
+ tokenLimits.claude = (await ask(' CLAUDE_TOKEN_LIMIT (500000): ')).trim() || '500000';
276
+ }
209
277
 
210
278
  // ── Misc ───────────────────────────────────────────────────────────────────
211
279
  const allowShell = yn(await ask('\nEnable shell tool? (y/N): '), false);
@@ -215,51 +283,80 @@ Config will be saved to: ${TIGER_HOME}
215
283
  const lines = [
216
284
  '# Tiger Agent config — generated by `tiger onboard`',
217
285
  '',
218
- '# ── Legacy Kimi compat (used if ACTIVE_PROVIDER=kimi)',
219
- envLine('KIMI_PROVIDER', 'code'),
220
- envLine('KIMI_CODE_API_KEY', kimiKey),
221
- envLine('KIMI_BASE_URL', 'https://api.kimi.com/coding/v1'),
222
- envLine('KIMI_CHAT_MODEL', 'kimi-coding/k2p5'),
223
- envLine('KIMI_EMBED_MODEL', ''),
224
- envLine('KIMI_USER_AGENT', 'KimiCLI/0.77'),
225
- envLine('KIMI_ENABLE_EMBEDDINGS', 'false'),
226
- envLine('KIMI_TIMEOUT_MS', '30000'),
227
- '',
228
286
  '# ── Multi-provider',
229
287
  envLine('ACTIVE_PROVIDER', activeProv),
230
288
  envLine('PROVIDER_ORDER', provOrder),
231
- '',
232
- '# ── Z.ai (Zhipu GLM)',
233
- envLine('ZAI_API_KEY', zaiKey),
234
- envLine('ZAI_BASE_URL', 'https://api.z.ai/api/coding/paas/v4'),
235
- envLine('ZAI_MODEL', 'glm-4.7'),
236
- envLine('ZAI_TIMEOUT_MS', '30000'),
237
- '',
238
- '# ── MiniMax',
239
- envLine('MINIMAX_API_KEY', minimaxKey),
240
- envLine('MINIMAX_BASE_URL', 'https://api.minimax.chat/v1'),
241
- envLine('MINIMAX_MODEL', 'abab6.5s-chat'),
242
- envLine('MINIMAX_TIMEOUT_MS', '30000'),
243
- '',
244
- '# ── Claude (Anthropic)',
245
- envLine('CLAUDE_API_KEY', claudeKey),
246
- envLine('CLAUDE_MODEL', 'claude-sonnet-4-6'),
247
- envLine('CLAUDE_TIMEOUT_MS', '60000'),
248
- '',
249
- '# ── Moonshot',
250
- envLine('MOONSHOT_API_KEY', moonshotKey),
251
- envLine('MOONSHOT_BASE_URL', 'https://api.moonshot.cn/v1'),
252
- envLine('MOONSHOT_MODEL', 'kimi-k1'),
253
- '',
254
- '# ── Token limits (daily, 0 = unlimited)',
255
- envLine('KIMI_TOKEN_LIMIT', kimiLimit),
256
- envLine('MOONSHOT_TOKEN_LIMIT', moonshotLimit),
257
- envLine('ZAI_TOKEN_LIMIT', zaiLimit),
258
- envLine('MINIMAX_TOKEN_LIMIT', minimaxLimit),
259
- envLine('CLAUDE_TOKEN_LIMIT', claudeLimit),
289
+ ''
290
+ ];
291
+
292
+ if (selectedProviders.includes('kimi')) {
293
+ lines.push(
294
+ '# ── Legacy Kimi compat (used if ACTIVE_PROVIDER=kimi)',
295
+ envLine('KIMI_PROVIDER', 'code'),
296
+ envLine('KIMI_CODE_API_KEY', kimiKey),
297
+ envLine('KIMI_BASE_URL', 'https://api.kimi.com/coding/v1'),
298
+ envLine('KIMI_CHAT_MODEL', 'kimi-coding/k2p5'),
299
+ envLine('KIMI_EMBED_MODEL', ''),
300
+ envLine('KIMI_USER_AGENT', 'KimiCLI/0.77'),
301
+ envLine('KIMI_ENABLE_EMBEDDINGS', 'false'),
302
+ envLine('KIMI_TIMEOUT_MS', '30000'),
303
+ ''
304
+ );
305
+ }
306
+
307
+ if (selectedProviders.includes('zai')) {
308
+ lines.push(
309
+ '# ── Z.ai (Zhipu GLM)',
310
+ envLine('ZAI_API_KEY', zaiKey),
311
+ envLine('ZAI_BASE_URL', 'https://api.z.ai/api/coding/paas/v4'),
312
+ envLine('ZAI_MODEL', 'glm-4.7'),
313
+ envLine('ZAI_TIMEOUT_MS', '30000'),
314
+ ''
315
+ );
316
+ }
317
+
318
+ if (selectedProviders.includes('minimax')) {
319
+ lines.push(
320
+ '# ── MiniMax (Coding / OpenAI-compatible)',
321
+ envLine('MINIMAX_API_KEY', minimaxKey),
322
+ envLine('MINIMAX_BASE_URL', 'https://api.minimax.io/v1'),
323
+ envLine('MINIMAX_MODEL', 'MiniMax-M2.5'),
324
+ envLine('MINIMAX_TIMEOUT_MS', '30000'),
325
+ ''
326
+ );
327
+ }
328
+
329
+ if (selectedProviders.includes('claude')) {
330
+ lines.push(
331
+ '# ── Claude (Anthropic)',
332
+ envLine('CLAUDE_API_KEY', claudeKey),
333
+ envLine('CLAUDE_MODEL', 'claude-sonnet-4-6'),
334
+ envLine('CLAUDE_TIMEOUT_MS', '60000'),
335
+ ''
336
+ );
337
+ }
338
+
339
+ if (selectedProviders.includes('moonshot')) {
340
+ lines.push(
341
+ '# ── Moonshot',
342
+ envLine('MOONSHOT_API_KEY', moonshotKey),
343
+ envLine('MOONSHOT_BASE_URL', 'https://api.moonshot.cn/v1'),
344
+ envLine('MOONSHOT_MODEL', 'kimi-k1'),
345
+ ''
346
+ );
347
+ }
348
+
349
+ lines.push('# ── Token limits (daily, 0 = unlimited)');
350
+ if (tokenLimits.kimi != null) lines.push(envLine('KIMI_TOKEN_LIMIT', tokenLimits.kimi));
351
+ if (tokenLimits.moonshot != null) lines.push(envLine('MOONSHOT_TOKEN_LIMIT', tokenLimits.moonshot));
352
+ if (tokenLimits.zai != null) lines.push(envLine('ZAI_TOKEN_LIMIT', tokenLimits.zai));
353
+ if (tokenLimits.minimax != null) lines.push(envLine('MINIMAX_TOKEN_LIMIT', tokenLimits.minimax));
354
+ if (tokenLimits.claude != null) lines.push(envLine('CLAUDE_TOKEN_LIMIT', tokenLimits.claude));
355
+ lines.push(
260
356
  '',
261
357
  '# ── Telegram',
262
358
  envLine('TELEGRAM_BOT_TOKEN', tgToken),
359
+ envLine('SWARM_ENABLED', 'false'),
263
360
  '',
264
361
  '# ── Permissions',
265
362
  envLine('ALLOW_SHELL', allowShell ? 'true' : 'false'),
@@ -280,7 +377,7 @@ Config will be saved to: ${TIGER_HOME}
280
377
  'MEMORY_INGEST_EVERY_TURNS=2',
281
378
  'MEMORY_INGEST_MIN_CHARS=140',
282
379
  ''
283
- ];
380
+ );
284
381
 
285
382
  fs.writeFileSync(ENV_PATH, lines.join('\n'), { mode: 0o600 });
286
383
  console.log(`\n✅ Config written to ${ENV_PATH}`);
@@ -307,7 +404,7 @@ Setup complete! 🐯
307
404
  Start CLI: tiger start
308
405
  Start Telegram: tiger telegram
309
406
  Background daemon: tiger telegram --background
310
- Switch provider: /api claude (in Telegram chat)
407
+ Switch provider: /api <provider_id> (in Telegram chat)
311
408
  Token usage: /tokens (in Telegram chat)
312
409
  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━`);
313
410
 
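The new onboarding flow re-prompts until the fallback order passes three checks: every entry is a known provider, every entry was selected, and the active provider is included. The sketch below copies `parseProviderList` and `KNOWN_PROVIDERS` from the hunk above. It then folds the interactive `while` loop into a hypothetical, non-interactive `validateOrder` so the rules can be exercised directly.

```javascript
const KNOWN_PROVIDERS = ['minimax', 'zai', 'claude', 'kimi', 'moonshot'];

// Same logic as parseProviderList in the onboarding diff above.
function parseProviderList(input, fallback = []) {
  const raw = String(input || '').trim();
  const values = (raw ? raw : fallback.join(','))
    .split(',')
    .map((s) => s.trim().toLowerCase())
    .filter(Boolean);
  const unique = [...new Set(values)];
  const invalid = unique.filter((p) => !KNOWN_PROVIDERS.includes(p));
  return { providers: unique, invalid };
}

// Hypothetical, non-interactive version of the wizard's fallback-order checks.
function validateOrder(input, selected, active) {
  const fallback = [active, ...selected.filter((p) => p !== active)];
  const { providers, invalid } = parseProviderList(input, fallback);
  if (invalid.length) return { ok: false, error: `Invalid provider(s): ${invalid.join(', ')}` };
  if (providers.some((p) => !selected.includes(p))) {
    return { ok: false, error: `Order can only include selected providers: ${selected.join(', ')}` };
  }
  if (!providers.includes(active)) return { ok: false, error: `Order must include active provider: ${active}` };
  return { ok: true, order: providers.join(',') };
}

const selected = ['minimax', 'claude'];
console.log(validateOrder('', selected, 'minimax')); // empty input falls back to the default order
console.log(validateOrder('Claude, MINIMAX', selected, 'minimax')); // case and spaces are normalized
console.log(validateOrder('claude,openai', selected, 'minimax')); // unknown provider is rejected
```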
@@ -177,8 +177,8 @@ function buildProviders(env) {
177
177
  minimax: {
178
178
  id: 'minimax',
179
179
  name: 'MiniMax',
180
- baseUrl: (env.MINIMAX_BASE_URL || 'https://api.minimax.chat/v1').replace(/\/$/, ''),
181
- chatModel: env.MINIMAX_MODEL || 'abab6.5s-chat',
180
+ baseUrl: (env.MINIMAX_BASE_URL || 'https://api.minimax.io/v1').replace(/\/$/, ''),
181
+ chatModel: env.MINIMAX_MODEL || 'MiniMax-M2.5',
182
182
  embedModel: env.MINIMAX_EMBED_MODEL || '',
183
183
  apiKey: env.MINIMAX_API_KEY || '',
184
184
  userAgent: '',
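The `minimax` entry in `buildProviders` uses a simple precedence rule. Environment values win, the new 0.3.2 defaults apply only when a variable is empty or unset, and one trailing slash is stripped from the base URL. The helper below is a hypothetical extraction of just those three fields, which makes the precedence easy to check.

```javascript
// Sketch of the env-override pattern in the minimax entry of buildProviders:
// env values take precedence; otherwise the 0.3.2 defaults are used.
function minimaxSettings(env) {
  return {
    // Strip a single trailing slash so paths can be appended safely.
    baseUrl: (env.MINIMAX_BASE_URL || 'https://api.minimax.io/v1').replace(/\/$/, ''),
    chatModel: env.MINIMAX_MODEL || 'MiniMax-M2.5',
    apiKey: env.MINIMAX_API_KEY || ''
  };
}

console.log(minimaxSettings({}));
console.log(minimaxSettings({ MINIMAX_BASE_URL: 'https://api.minimax.chat/v1/', MINIMAX_MODEL: 'abab6.5s-chat' }));
```

One consequence follows from the code itself. The 0.3.1 `tiger onboard` wrote `MINIMAX_BASE_URL=https://api.minimax.chat/v1` and `MINIMAX_MODEL=abab6.5s-chat` explicitly into `.env`. An upgraded install therefore keeps those old values until the user edits `.env` or re-runs the wizard.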
package/src/cli.js CHANGED
@@ -16,6 +16,7 @@ const srcRoot = path.resolve(__dirname, '..');
16
16
  // Runtime root — inside TIGER_HOME when installed globally, otherwise project root
17
17
  const rootDir = process.env.TIGER_HOME || process.cwd();
18
18
  const supervisorPidPath = path.resolve(rootDir, 'tiger-telegram.pid');
19
+ const workerHeartbeatPath = path.resolve(rootDir, 'tiger-telegram-worker.heartbeat');
19
20
 
20
21
  process.on('unhandledRejection', (reason) => {
21
22
  const msg = reason && reason.stack ? reason.stack : String(reason);
@@ -34,6 +35,14 @@ function hasFlag(argv, flag) {
34
35
  return argv.includes(flag);
35
36
  }
36
37
 
38
+ function writeWorkerHeartbeat() {
39
+ try {
40
+ fs.writeFileSync(workerHeartbeatPath, `${Date.now()}\n`, 'utf8');
41
+ } catch (err) {
42
+ // Heartbeat is best-effort and must not crash the worker.
43
+ }
44
+ }
45
+
37
46
  function isPidRunning(pid) {
38
47
  if (!pid) return false;
39
48
  try {
@@ -87,7 +96,9 @@ function stopTelegramBackground() {
87
96
  process.kill(pid, 'SIGTERM');
88
97
  fs.unlinkSync(supervisorPidPath);
89
98
  const workerPidPath = path.resolve(rootDir, 'tiger-telegram-worker.pid');
99
+ const workerHeartbeatPath = path.resolve(rootDir, 'tiger-telegram-worker.heartbeat');
90
100
  if (fs.existsSync(workerPidPath)) fs.unlinkSync(workerPidPath);
101
+ if (fs.existsSync(workerHeartbeatPath)) fs.unlinkSync(workerHeartbeatPath);
91
102
  process.stdout.write(`Stopped Telegram background bot (supervisor PID ${pid}).\n`);
92
103
  }
93
104
 
@@ -162,6 +173,7 @@ async function runCli() {
162
173
 
163
174
  async function main() {
164
175
  const argv = process.argv.slice(2);
176
+ const isWorkerProcess = hasFlag(argv, '--worker');
165
177
  if (hasFlag(argv, '--telegram-stop')) {
166
178
  stopTelegramBackground();
167
179
  return;
@@ -178,6 +190,11 @@ async function main() {
178
190
  startReflectionScheduler();
179
191
  const vectorStatus = initVectorMemory();
180
192
  printVectorMemoryStatus(vectorStatus);
193
+ if (isWorkerProcess) {
194
+ // NanoClaw-style heartbeat: worker emits liveness every minute.
195
+ writeWorkerHeartbeat();
196
+ setInterval(writeWorkerHeartbeat, 60 * 1000);
197
+ }
181
198
  startTelegramBot();
182
199
  process.stdout.write('Telegram bot started.\n');
183
200
  return;
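This hunk only adds the worker side of the heartbeat: `writeWorkerHeartbeat()` writes `${Date.now()}\n` to `tiger-telegram-worker.heartbeat` once at startup and then every 60 seconds. The supervisor watchdog described in the README (restart after 5 stale minutes) is not shown. The staleness check below is therefore a hypothetical sketch. Treating a missing file as "not stale" is an assumption that gives the worker a startup grace period, and a real watchdog may behave differently.

```javascript
const fs = require('fs');

const STALE_AFTER_MS = 5 * 60 * 1000; // README: stale for 5 minutes -> restart worker

// Pure check on the file contents: a millisecond timestamp plus a newline,
// matching what writeWorkerHeartbeat() writes.
function isHeartbeatStale(contents, now = Date.now(), maxAgeMs = STALE_AFTER_MS) {
  const ts = Number(String(contents || '').trim());
  if (!Number.isFinite(ts) || ts <= 0) return true; // empty or garbled heartbeat
  return now - ts > maxAgeMs;
}

// Hypothetical wrapper a supervisor could call once a minute.
function readHeartbeatStale(heartbeatPath, now = Date.now()) {
  let contents;
  try {
    contents = fs.readFileSync(heartbeatPath, 'utf8');
  } catch (err) {
    return false; // no heartbeat file yet: treated as startup grace in this sketch
  }
  return isHeartbeatStale(contents, now);
}

const now = Date.now();
console.log(isHeartbeatStale(`${now - 30 * 1000}\n`, now)); // 30 s old
console.log(isHeartbeatStale(`${now - 6 * 60 * 1000}\n`, now)); // 6 min old
```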
package/src/config.js CHANGED
@@ -107,6 +107,11 @@ const swarmRouteOnProviderError =
107
107
  const swarmDefaultFlow = cleanEnvValue(process.env.SWARM_DEFAULT_FLOW || 'auto').toLowerCase() || 'auto';
108
108
  const swarmFirstAgentPolicy = cleanEnvValue(process.env.SWARM_FIRST_AGENT_POLICY || 'auto').toLowerCase() || 'auto';
109
109
  const swarmFirstAgent = cleanEnvValue(process.env.SWARM_FIRST_AGENT || '').toLowerCase();
110
+ const swarmStepMaxRetries = Math.max(0, Number(process.env.SWARM_STEP_MAX_RETRIES || 2));
111
+ const swarmContinueOnError =
112
+ ['1', 'true', 'yes', 'on'].includes(cleanEnvValue(process.env.SWARM_CONTINUE_ON_ERROR || 'true').toLowerCase());
113
+ const swarmEnabled =
114
+ ['1', 'true', 'yes', 'on'].includes(cleanEnvValue(process.env.SWARM_ENABLED || 'false').toLowerCase());
110
115
 
111
116
  module.exports = {
112
117
  kimiProvider,
@@ -138,6 +143,9 @@ module.exports = {
138
143
  swarmDefaultFlow,
139
144
  swarmFirstAgentPolicy,
140
145
  swarmFirstAgent,
146
+ swarmStepMaxRetries,
147
+ swarmContinueOnError,
148
+ swarmEnabled,
141
149
  dbPath: path.resolve(process.env.DB_PATH || './db/agent.json'),
142
150
  maxMessages: Number(process.env.MAX_MESSAGES || 200),
143
151
  recentMessages: Number(process.env.RECENT_MESSAGES || 40)
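As written, `Math.max(0, Number(process.env.SWARM_STEP_MAX_RETRIES || 2))` lets a non-numeric value through as `NaN`, because `Math.max(0, NaN)` is `NaN`. The worker later compares `attempt <= maxRetries`, and that comparison is always false against `NaN`. A typo in this variable would therefore silently disable retries. The helpers below are a hypothetical, more defensive sketch. Plain trimming stands in for the project's `cleanEnvValue()`, whose exact behaviour this diff does not show.

```javascript
const TRUTHY = ['1', 'true', 'yes', 'on']; // same truthy set as config.js

// Hypothetical: parse a boolean env value, using the fallback when empty/unset.
function envBool(value, fallback) {
  const v = String(value ?? '').trim().toLowerCase();
  if (!v) return fallback;
  return TRUTHY.includes(v);
}

// Hypothetical: parse a non-negative integer, falling back on empty or invalid input.
function envNonNegativeInt(value, fallback) {
  const raw = String(value ?? '').trim();
  if (!raw) return fallback;
  const n = Number(raw);
  return Number.isInteger(n) && n >= 0 ? n : fallback;
}

console.log(Math.max(0, Number('abc'))); // NaN: the current config.js pattern lets this through
console.log(envNonNegativeInt('abc', 2)); // falls back to the default instead
console.log(envBool(undefined, false), envBool('YES', false));
```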
@@ -8,7 +8,9 @@ const {
8
8
  swarmRouteOnProviderError,
9
9
  swarmDefaultFlow,
10
10
  swarmFirstAgentPolicy,
11
- swarmFirstAgent
11
+ swarmFirstAgent,
12
+ swarmStepMaxRetries,
13
+ swarmContinueOnError
12
14
  } = require('../config');
13
15
  const {
14
16
  AGENTS_DIR,
@@ -85,6 +87,34 @@ function getTaskContext(task) {
85
87
  return task.metadata.swarm_ctx;
86
88
  }
87
89
 
90
+ function getRetryState(task) {
91
+ if (!task.metadata || typeof task.metadata !== 'object') task.metadata = {};
92
+ if (!task.metadata.retry_state || typeof task.metadata.retry_state !== 'object') {
93
+ task.metadata.retry_state = {};
94
+ }
95
+ const state = task.metadata.retry_state;
96
+ if (!state.workers || typeof state.workers !== 'object') state.workers = {};
97
+ if (!state.stages || typeof state.stages !== 'object') state.stages = {};
98
+ return state;
99
+ }
100
+
101
+ function markRetryAttempt(task, scope, key) {
102
+ const state = getRetryState(task);
103
+ const store = scope === 'stage' ? state.stages : state.workers;
104
+ const k = String(key || '').trim();
105
+ if (!k) return 1;
106
+ store[k] = Number(store[k] || 0) + 1;
107
+ return store[k];
108
+ }
109
+
110
+ function clearRetryAttempts(task, scope, key) {
111
+ const state = getRetryState(task);
112
+ const store = scope === 'stage' ? state.stages : state.workers;
113
+ const k = String(key || '').trim();
114
+ if (!k) return;
115
+ delete store[k];
116
+ }
117
+
88
118
  function resolveRoleMap(architecture) {
89
119
  const out = {};
90
120
  const agents = Array.isArray(architecture && architecture.agents) ? architecture.agents : [];
@@ -281,7 +311,7 @@ async function runArchitectureStage(task, architecture, stage) {
281
311
 
282
312
  if (stageType === 'parallel') {
283
313
  const roles = Array.isArray(stage.roles) ? stage.roles.map((x) => String(x || '').trim()).filter(Boolean) : [];
284
- const outputs = await Promise.all(roles.map(async (roleId) => {
314
+ const roleRuns = await Promise.allSettled(roles.map(async (roleId) => {
285
315
  const role = roleMap[roleId] || { runtimeAgent: roleId };
286
316
  const text = await runRoleStep(
287
317
  task,
@@ -293,11 +323,37 @@ async function runArchitectureStage(task, architecture, stage) {
293
323
  );
294
324
  return { role: roleId, runtime_agent: role.runtimeAgent, text };
295
325
  }));
326
+ const outputs = [];
327
+ const failures = [];
328
+ for (let i = 0; i < roleRuns.length; i += 1) {
329
+ const roleId = roles[i];
330
+ const outcome = roleRuns[i];
331
+ if (outcome.status === 'fulfilled') {
332
+ outputs.push(outcome.value);
333
+ } else {
334
+ failures.push({
335
+ role: roleId,
336
+ error: String(outcome.reason && outcome.reason.message ? outcome.reason.message : outcome.reason || 'unknown error')
337
+ });
338
+ }
339
+ }
340
+
296
341
  for (const out of outputs) {
297
342
  appendThread(task, out.role, out.text || `${out.role} completed step`);
298
343
  }
344
+ for (const fail of failures) {
345
+ appendThread(task, fail.role, `error: ${fail.error}`);
346
+ }
347
+
299
348
  const key = String(stage.store_as || `${stage.id}_outputs`).trim();
300
349
  ctx[key] = outputs;
350
+ ctx[`${key}_errors`] = failures;
351
+ const minSuccess = Math.max(1, Number(stage.min_success || 1));
352
+ if (outputs.length < minSuccess) {
353
+ throw new Error(
354
+ `parallel stage "${stage.id}" produced ${outputs.length}/${roles.length} successful outputs (min_success=${minSuccess})`
355
+ );
356
+ }
301
357
  appendThread(task, 'tiger', `stage ${stage.id} completed with ${outputs.length} parallel outputs`);
302
358
  task.next_agent = stage.next ? stageRef(stage.next) : 'tiger';
303
359
  task.status = 'pending';
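The switch from `Promise.all` to `Promise.allSettled` is what makes parallel stages fault-tolerant. Every role runs to completion, successes and failures are split apart, and the stage throws only when fewer than `min_success` roles succeeded. The sketch below isolates that logic. `runParallelStage` and `fakeRole` are hypothetical stand-ins for `runArchitectureStage` and `runRoleStep`.

```javascript
// Sketch of the fault-tolerant parallel stage: collect successes and failures
// separately and only throw when fewer than min_success roles succeeded.
async function runParallelStage(roles, runRole, minSuccessRaw) {
  const settled = await Promise.allSettled(roles.map((role) => runRole(role)));
  const outputs = [];
  const errors = [];
  settled.forEach((outcome, i) => {
    if (outcome.status === 'fulfilled') {
      outputs.push({ role: roles[i], text: outcome.value });
    } else {
      const reason = outcome.reason;
      errors.push({ role: roles[i], error: String(reason && reason.message ? reason.message : reason || 'unknown error') });
    }
  });
  const minSuccess = Math.max(1, Number(minSuccessRaw || 1)); // default 1, as in the diff
  if (outputs.length < minSuccess) {
    throw new Error(`${outputs.length}/${roles.length} successful outputs (min_success=${minSuccess})`);
  }
  return { outputs, errors };
}

// Mirrors tiger_parallel_design.yaml: three designers with min_success: 2.
const fakeRole = async (role) => {
  if (role === 'designer_b') throw new Error('timeout');
  return `${role} draft`;
};

runParallelStage(['designer_a', 'designer_b', 'designer_c'], fakeRole, 2)
  .then(({ outputs, errors }) => console.log(outputs.length, errors));
```

With the YAML example's `min_success: 2`, one failed designer is tolerated, but a second failure fails the stage. A `min_success` larger than the number of listed roles can never be met, so such a stage would always throw.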
@@ -448,17 +504,49 @@ async function runWorkerTurn(agentName) {
448
504
  let { task, filePath } = claim;
449
505
  try {
450
506
  task = await processWorkerTask(agentName, task);
507
+ clearRetryAttempts(task, 'worker', agentName);
451
508
  const out = releaseTask(task, filePath, task.status === 'failed' ? 'failed' : 'pending');
452
509
  return { ok: true, idle: false, agent: agentName, task: out.task };
453
510
  } catch (err) {
454
- appendThread(task, agentName, `error: ${err.message}`);
511
+ const errorMsg = String(err && err.message ? err.message : 'unknown error');
512
+ appendThread(task, agentName, `error: ${errorMsg}`);
513
+ const attempt = markRetryAttempt(task, 'worker', agentName);
514
+ const maxRetries = Math.max(0, Number(process.env.SWARM_STEP_MAX_RETRIES || swarmStepMaxRetries || 0));
515
+
516
+ if (attempt <= maxRetries) {
517
+ task.status = 'pending';
518
+ task.next_agent = agentName;
519
+ appendThread(task, 'tiger', `retry scheduled for ${agentName} (${attempt}/${maxRetries})`);
520
+ const out = releaseTask(task, filePath, 'pending');
521
+ return {
522
+ ok: true,
523
+ idle: false,
524
+ retrying: true,
525
+ agent: agentName,
526
+ error: errorMsg,
527
+ task: out.task
528
+ };
529
+ }
530
+
455
531
  if (!task.metadata || typeof task.metadata !== 'object') task.metadata = {};
456
532
  task.metadata.last_failed_agent = agentName;
457
- task.metadata.last_error = String(err && err.message ? err.message : 'unknown error');
533
+ task.metadata.last_error = errorMsg;
534
+
535
+ if (swarmContinueOnError) {
536
+ appendThread(task, 'tiger', `continuing after ${agentName} failure (retries exhausted)`);
537
+ task.status = 'pending';
538
+ task.next_agent = 'tiger';
539
+ if (!task.result) {
540
+ task.result = `Task completed with degraded path: ${agentName} failed after ${attempt - 1} retries. Last error: ${errorMsg}`;
541
+ }
542
+ const out = releaseTask(task, filePath, 'pending');
543
+ return { ok: true, idle: false, degraded: true, agent: agentName, error: errorMsg, task: out.task };
544
+ }
545
+
458
546
  task.status = 'failed';
459
547
  task.next_agent = 'tiger';
460
548
  const out = releaseTask(task, filePath, 'failed');
461
- return { ok: false, idle: false, agent: agentName, error: err.message, task: out.task };
549
+ return { ok: false, idle: false, agent: agentName, error: errorMsg, task: out.task };
462
550
  }
463
551
  }
464
552
 
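The new catch block in `runWorkerTurn` makes a three-way decision:

- While `attempt <= maxRetries`, it retries the same agent.
- Otherwise, when `swarmContinueOnError` is set, it hands the task back to `tiger` with a degraded result.
- Otherwise, it marks the task failed.

`markRetryAttempt` and `clearRetryAttempts` are not part of this diff. The counters below are therefore a hypothetical implementation that keeps per-step attempt counts in `task.metadata.retry_attempts`, which is an assumed field name. `decideAfterFailure` is likewise illustrative. Because of the `||` chain in the diff, a non-empty `SWARM_STEP_MAX_RETRIES` environment variable takes precedence over the `swarmStepMaxRetries` config value.

```javascript
// Hypothetical retry bookkeeping; the real helpers are not shown in the diff.
function retryKey(kind, id) {
  return `${kind}:${id}`;
}

// Increments and returns the attempt count for one worker or stage.
function markRetryAttempt(task, kind, id) {
  if (!task.metadata || typeof task.metadata !== 'object') task.metadata = {};
  const counts = task.metadata.retry_attempts || (task.metadata.retry_attempts = {});
  const key = retryKey(kind, id);
  counts[key] = (counts[key] || 0) + 1;
  return counts[key];
}

// Called after a successful step so later failures start from attempt 1.
function clearRetryAttempts(task, kind, id) {
  const counts = task.metadata && task.metadata.retry_attempts;
  if (counts) delete counts[retryKey(kind, id)];
}

// Mirrors the branch order in runWorkerTurn's catch block.
function decideAfterFailure(attempt, maxRetries, continueOnError) {
  if (attempt <= maxRetries) return 'retry';
  return continueOnError ? 'degraded' : 'failed';
}
```

With `SWARM_STEP_MAX_RETRIES=2`, the first two failures reschedule the agent and the third is final. The degraded message's `attempt - 1` therefore reports 2 retries.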
@@ -525,9 +613,58 @@ async function runTaskToTiger(taskId, opts = {}) {
525
613
  const stage = getStageById(architecture, stageId);
526
614
  if (!stage) return { ok: false, task, error: `Unknown stage: ${stageId}` };
527
615
  if (onProgress) onProgress({ phase: 'worker_start', agent: stage.id, task });
528
- const updated = await runArchitectureStage(task, architecture, stage);
529
- saveTaskInPlace(filePath, updated);
530
- if (onProgress) onProgress({ phase: 'worker_done', agent: stage.id, task: updated, turn: { ok: true } });
616
+ try {
617
+ const updated = await runArchitectureStage(task, architecture, stage);
618
+ clearRetryAttempts(updated, 'stage', stageId);
619
+ saveTaskInPlace(filePath, updated);
620
+ if (onProgress) onProgress({ phase: 'worker_done', agent: stage.id, task: updated, turn: { ok: true } });
621
+ } catch (err) {
622
+ const errorMsg = String(err && err.message ? err.message : 'unknown error');
623
+ appendThread(task, 'tiger', `stage ${stageId} error: ${errorMsg}`);
624
+ const attempt = markRetryAttempt(task, 'stage', stageId);
625
+ const maxRetries = Math.max(0, Number(process.env.SWARM_STEP_MAX_RETRIES || swarmStepMaxRetries || 0));
626
+
627
+ if (attempt <= maxRetries) {
628
+ task.status = 'pending';
629
+ task.next_agent = stageRef(stageId);
630
+ appendThread(task, 'tiger', `retry scheduled for stage ${stageId} (${attempt}/${maxRetries})`);
631
+ saveTaskInPlace(filePath, task);
632
+ if (onProgress) {
633
+ onProgress({
634
+ phase: 'worker_done',
635
+ agent: stage.id,
636
+ task,
637
+ turn: { ok: true, retrying: true, error: errorMsg }
638
+ });
639
+ }
640
+ continue;
641
+ }
642
+
643
+ if (swarmContinueOnError) {
644
+ const fallbackNext = stage.fail_next || stage.next || 'tiger';
645
+ task.status = 'pending';
646
+ task.next_agent = fallbackNext === 'tiger' ? 'tiger' : stageRef(fallbackNext);
647
+ appendThread(task, 'tiger', `continuing after stage ${stageId} failure (retries exhausted)`);
648
+ if (!task.result && task.next_agent === 'tiger') {
649
+ task.result = `Task completed with degraded path: stage ${stageId} failed after ${attempt - 1} retries. Last error: ${errorMsg}`;
650
+ }
651
+ saveTaskInPlace(filePath, task);
652
+ if (onProgress) {
653
+ onProgress({
654
+ phase: 'worker_done',
655
+ agent: stage.id,
656
+ task,
657
+ turn: { ok: true, degraded: true, error: errorMsg }
658
+ });
659
+ }
660
+ continue;
661
+ }
662
+
663
+ task.status = 'failed';
664
+ task.next_agent = 'tiger';
665
+ saveTaskInPlace(filePath, task);
666
+ return { ok: false, task, error: errorMsg };
667
+ }
531
668
  continue;
532
669
  }
533
670
 
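When a stage's retries are exhausted and `swarmContinueOnError` is on, the stage hunk routes the task to `stage.fail_next`, then `stage.next`, then `tiger`. It writes a degraded `task.result` only if the task is returning to `tiger`. Because the catch branch ends in `continue`, the loop in `runTaskToTiger` picks up the rewritten `next_agent` on its next iteration.

`stageRef` is not defined in this diff. The stand-in below, which prefixes the id with `stage:`, is an assumption made only so the routing rule can be run in isolation.

```javascript
// Assumed stand-in for stageRef: builds a routable agent id from a stage id.
const stageRef = (stageId) => `stage:${stageId}`;

// Fallback routing after a stage has exhausted its retries.
function routeAfterExhaustedStage(task, stage, errorMsg, attempt) {
  const fallbackNext = stage.fail_next || stage.next || 'tiger';
  task.status = 'pending';
  task.next_agent = fallbackNext === 'tiger' ? 'tiger' : stageRef(fallbackNext);
  // Only a task heading back to the coordinator gets a degraded result;
  // a task routed to another stage keeps running and may still produce one.
  if (!task.result && task.next_agent === 'tiger') {
    task.result = `Task completed with degraded path: stage ${stage.id} failed after ${attempt - 1} retries. Last error: ${errorMsg}`;
  }
  return task;
}
```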
@@ -1,7 +1,7 @@
1
1
  'use strict';
2
2
 
3
3
  const TelegramBot = require('node-telegram-bot-api');
4
- const { telegramBotToken } = require('../config');
4
+ const { telegramBotToken, swarmEnabled } = require('../config');
5
5
  const { handleMessage } = require('../agent/mainAgent');
6
6
  const tokenManager = require('../tokenManager');
7
7
  const { getProvider } = require('../apiProviders');
@@ -170,7 +170,7 @@ function startTelegramBot() {
170
170
  ensureSwarmLayout();
171
171
  ensureSwarmConfigLayout();
172
172
  const bot = new TelegramBot(telegramBotToken, { polling: true });
173
- let swarmEnabled = true;
173
+ let swarmRoutingEnabled = swarmEnabled;
174
174
 
175
175
  // Register commands so Telegram shows the list when user types /
176
176
  bot.setMyCommands([
@@ -228,16 +228,16 @@ function startTelegramBot() {
228
228
  if (text.startsWith('/swarm')) {
229
229
  const arg = text.slice(6).trim().toLowerCase();
230
230
  if (!arg) {
231
- await safeSend(bot, chatId, `🐯 Swarm is currently *${swarmEnabled ? 'ON' : 'OFF'}*.\nUse \`/swarm on\` or \`/swarm off\`.`, MD);
231
+ await safeSend(bot, chatId, `🐯 Swarm is currently *${swarmRoutingEnabled ? 'ON' : 'OFF'}*.\nUse \`/swarm on\` or \`/swarm off\`.`, MD);
232
232
  return;
233
233
  }
234
234
  if (arg === 'on') {
235
- swarmEnabled = true;
235
+ swarmRoutingEnabled = true;
236
236
  await safeSend(bot, chatId, '✅ Swarm routing is now *ON*', MD);
237
237
  return;
238
238
  }
239
239
  if (arg === 'off') {
240
- swarmEnabled = false;
240
+ swarmRoutingEnabled = false;
241
241
  await safeSend(bot, chatId, '✅ Swarm routing is now *OFF*\\.\nNew messages will go to the regular Tiger agent\\.', { parse_mode: 'MarkdownV2' });
242
242
  return;
243
243
  }
@@ -509,7 +509,7 @@ function startTelegramBot() {
509
509
  try {
510
510
  await safeSendTyping(bot, chatId);
511
511
  typingTimer = setInterval(() => safeSendTyping(bot, chatId), 4500);
512
- if (!swarmEnabled) {
512
+ if (!swarmRoutingEnabled) {
513
513
  const reply = await handleMessage({ platform: 'telegram', userId, text });
514
514
  clearInterval(typingTimer);
515
515
  await safeSend(bot, chatId, reply);
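Before this change, the Telegram bot always started with swarm routing on (`let swarmEnabled = true`). It now starts from the `swarmEnabled` config value, which matches the new `SWARM_ENABLED=false` line in `.env.example`.

The local rename to `swarmRoutingEnabled` is required, not cosmetic. `let swarmEnabled = swarmEnabled;` would shadow the imported binding and throw a `ReferenceError` from the temporal dead zone. `/swarm on` and `/swarm off` change only the in-memory copy, so a restart returns to the configured value.

How `../config` parses the variable is not shown. `parseBoolEnv`, its accepted spellings, and the `false` default are assumptions.

```javascript
// Hypothetical boolean parser; the config module's real logic is not in this diff.
function parseBoolEnv(value, fallback) {
  if (value === undefined || value === null || String(value).trim() === '') return fallback;
  const v = String(value).trim().toLowerCase();
  if (['1', 'true', 'yes', 'on'].includes(v)) return true;
  if (['0', 'false', 'no', 'off'].includes(v)) return false;
  return fallback;
}

// Startup value comes from config; /swarm on|off only mutates this closure.
function createSwarmToggle(env) {
  let swarmRoutingEnabled = parseBoolEnv(env.SWARM_ENABLED, false);
  return {
    isEnabled: () => swarmRoutingEnabled,
    handleCommand(arg) {
      if (arg === 'on') swarmRoutingEnabled = true;
      else if (arg === 'off') swarmRoutingEnabled = false;
      return swarmRoutingEnabled ? 'ON' : 'OFF';
    },
  };
}
```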
@@ -11,10 +11,15 @@ const logsDir = path.resolve(runtimeDir, 'logs');
11
11
  const botLogPath = path.resolve(logsDir, 'telegram.out.log');
12
12
  const supervisorPidPath = path.resolve(runtimeDir, 'tiger-telegram.pid');
13
13
  const workerPidPath = path.resolve(runtimeDir, 'tiger-telegram-worker.pid');
14
+ const workerHeartbeatPath = path.resolve(runtimeDir, 'tiger-telegram-worker.heartbeat');
14
15
  const restartDelayMs = 5000;
16
+ const heartbeatCheckMs = 60 * 1000;
17
+ const heartbeatTimeoutMs = 5 * 60 * 1000;
15
18
 
16
19
  let worker = null;
17
20
  let stopping = false;
21
+ let heartbeatTimer = null;
22
+ let restartPending = false;
18
23
 
19
24
  function appendLog(line) {
20
25
  fs.appendFileSync(botLogPath, `[${new Date().toISOString()}] ${line}\n`, 'utf8');
@@ -24,6 +29,46 @@ function writeBufferToLog(buffer) {
24
29
  fs.appendFileSync(botLogPath, buffer);
25
30
  }
26
31
 
32
+ function getHeartbeatAgeMs() {
33
+ if (!fs.existsSync(workerHeartbeatPath)) return Number.POSITIVE_INFINITY;
34
+ const raw = fs.readFileSync(workerHeartbeatPath, 'utf8').trim();
35
+ const ts = Number(raw);
36
+ if (!Number.isFinite(ts) || ts <= 0) return Number.POSITIVE_INFINITY;
37
+ return Date.now() - ts;
38
+ }
39
+
40
+ function scheduleRestart(reason) {
41
+ if (stopping || restartPending) return;
42
+ restartPending = true;
43
+ appendLog(`${reason}, restarting in 5s`);
44
+ setTimeout(() => {
45
+ restartPending = false;
46
+ startWorker();
47
+ }, restartDelayMs);
48
+ }
49
+
50
+ function stopWorker(reason) {
51
+ if (!worker || !worker.pid) return;
52
+ appendLog(reason);
53
+ try {
54
+ process.kill(worker.pid, 'SIGTERM');
55
+ } catch (err) {
56
+ // Worker may already be dead.
57
+ }
58
+ }
59
+
60
+ function startHeartbeatMonitor() {
61
+ if (heartbeatTimer) clearInterval(heartbeatTimer);
62
+ heartbeatTimer = setInterval(() => {
63
+ if (stopping || !worker || !worker.pid) return;
64
+ const ageMs = getHeartbeatAgeMs();
65
+ if (ageMs > heartbeatTimeoutMs) {
66
+ stopWorker(`heartbeat stale (${Math.round(ageMs / 1000)}s > ${Math.round(heartbeatTimeoutMs / 1000)}s), force restarting worker`);
67
+ scheduleRestart('worker restart requested by heartbeat watchdog');
68
+ }
69
+ }, heartbeatCheckMs);
70
+ }
71
+
27
72
  function startWorker() {
28
73
  if (stopping) return;
29
74
 
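The supervisor's watchdog treats a missing, empty, or non-numeric heartbeat file as infinitely old. It seeds the file with `Date.now()` when it spawns a worker, so a leftover file from a previous run cannot trigger an immediate restart. Every `heartbeatCheckMs` (60 s), it checks whether the heartbeat's age exceeds `heartbeatTimeoutMs` (5 min). A worker that stops updating the file is therefore stopped roughly 5 to 6 minutes after its last write.

The hunks shown here do not include the worker side that refreshes the file. The `writeHeartbeat` helper below is a hypothetical counterpart. If a worker never refreshes the file, the watchdog would restart it about every five minutes even when it is healthy. The `now` parameter is added only to make the age computation testable.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical worker-side writer (not shown in the diff).
function writeHeartbeat(filePath, now = Date.now()) {
  fs.writeFileSync(filePath, `${now}\n`, 'utf8');
}

// Same rules as getHeartbeatAgeMs in the diff: a missing or malformed
// file counts as infinitely old.
function heartbeatAgeMs(filePath, now = Date.now()) {
  if (!fs.existsSync(filePath)) return Number.POSITIVE_INFINITY;
  const ts = Number(fs.readFileSync(filePath, 'utf8').trim());
  if (!Number.isFinite(ts) || ts <= 0) return Number.POSITIVE_INFINITY;
  return now - ts;
}

function isStale(filePath, timeoutMs, now = Date.now()) {
  return heartbeatAgeMs(filePath, now) > timeoutMs;
}
```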
@@ -35,7 +80,9 @@ function startWorker() {
35
80
  });
36
81
 
37
82
  fs.writeFileSync(workerPidPath, `${worker.pid}\n`, 'utf8');
83
+ fs.writeFileSync(workerHeartbeatPath, `${Date.now()}\n`, 'utf8');
38
84
  appendLog(`worker started (PID ${worker.pid})`);
85
+ startHeartbeatMonitor();
39
86
 
40
87
  if (worker.stdout) {
41
88
  worker.stdout.on('data', writeBufferToLog);
@@ -47,8 +94,7 @@ function startWorker() {
47
94
  worker.on('exit', (code, signal) => {
48
95
  if (fs.existsSync(workerPidPath)) fs.unlinkSync(workerPidPath);
49
96
  if (stopping) return;
50
- appendLog(`worker exited (code=${code}, signal=${signal || 'none'}), restarting in 5s`);
51
- setTimeout(startWorker, restartDelayMs);
97
+ scheduleRestart(`worker exited (code=${code}, signal=${signal || 'none'})`);
52
98
  });
53
99
  }
54
100
 
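Both restart triggers now go through `scheduleRestart`:

- the worker's `exit` handler;
- the heartbeat watchdog, which calls `stopWorker` (SIGTERM) and then schedules a restart itself.

The SIGTERM normally produces an `exit` event shortly afterwards. The `restartPending` flag turns that second request into a no-op, so one stale heartbeat yields one new worker.

The sketch below isolates that single-flight guard. `createRestartScheduler` and the injected `setTimer` are hypothetical names chosen so the behavior can be exercised without spawning processes. The sketch also derives the logged delay from `delayMs` instead of hard-coding "5s" as the diff does.

```javascript
// Single-flight restart guard, modeled on scheduleRestart in the diff.
function createRestartScheduler({ start, delayMs, setTimer = setTimeout, isStopping = () => false }) {
  let restartPending = false;
  return function scheduleRestart(reason, log = () => {}) {
    // Ignore requests during shutdown or while a restart is already queued.
    if (isStopping() || restartPending) return false;
    restartPending = true;
    log(`${reason}, restarting in ${Math.round(delayMs / 1000)}s`);
    setTimer(() => {
      restartPending = false;
      start();
    }, delayMs);
    return true;
  };
}
```

The flag does not appear to cover one case. If the terminated worker takes longer than `restartDelayMs` to exit, the flag has already been cleared when its `exit` event arrives, so a second replacement could be scheduled.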
@@ -65,7 +111,12 @@ function shutdown(signal) {
65
111
  }
66
112
  }
67
113
 
114
+ if (heartbeatTimer) {
115
+ clearInterval(heartbeatTimer);
116
+ heartbeatTimer = null;
117
+ }
68
118
  if (fs.existsSync(workerPidPath)) fs.unlinkSync(workerPidPath);
119
+ if (fs.existsSync(workerHeartbeatPath)) fs.unlinkSync(workerHeartbeatPath);
69
120
  if (fs.existsSync(supervisorPidPath)) fs.unlinkSync(supervisorPidPath);
70
121
  process.exit(0);
71
122
  }