@ducci/jarvis 1.0.38 → 1.0.40

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (32)
  1. package/docs/agent.md +43 -4
  2. package/docs/crons.md +100 -0
  3. package/docs/identity.md +38 -0
  4. package/docs/skills.md +77 -0
  5. package/docs/system-prompt.md +25 -13
  6. package/docs/telegram.md +61 -2
  7. package/package.json +2 -1
  8. package/src/channels/telegram/index.js +65 -0
  9. package/src/server/agent.js +59 -19
  10. package/src/server/app.js +125 -2
  11. package/src/server/config.js +43 -0
  12. package/src/server/cron-scheduler.js +35 -0
  13. package/src/server/crons.js +106 -0
  14. package/src/server/tools.js +234 -72
  15. package/docs/findings/001-context-explosion.md +0 -116
  16. package/docs/findings/002-handoff-edge-cases.md +0 -84
  17. package/docs/findings/003-event-loop-blocking-and-reliability.md +0 -120
  18. package/docs/findings/004-agent-reliability-improvements.md +0 -162
  19. package/docs/findings/005-installation-timeout.md +0 -128
  20. package/docs/findings/006-malformed-tool-schema.md +0 -118
  21. package/docs/findings/007-telegram-errors-and-handoff-stalling.md +0 -271
  22. package/docs/findings/008-exec-timeout-architecture.md +0 -118
  23. package/docs/findings/009-non-string-response-field.md +0 -153
  24. package/docs/findings/010-checkpoint-field-type-safety.md +0 -121
  25. package/docs/findings/011-empty-model-response.md +0 -157
  26. package/docs/findings/012-empty-nudge-loses-recovery-text.md +0 -121
  27. package/docs/findings/013-stderr-visibility-and-truncation.md +0 -59
  28. package/docs/findings/014-exec-stderr-artifact-and-malformed-tool-args.md +0 -202
  29. package/docs/findings/015-failed-run-context-strip.md +0 -142
  30. package/docs/findings/016-file-writing-corruption-and-stderr-loop.md +0 -119
  31. package/docs/findings/017-looping-intervention-and-lossy-checkpoint.md +0 -110
  32. package/docs/findings/018-anthropic-oauth-token-support.md +0 -72
@@ -120,77 +120,6 @@ const SEED_TOOLS = {
  },
  code: `const filePath = path.join(process.env.HOME, '.jarvis/data/user-info.json'); const raw = await fs.promises.readFile(filePath, 'utf8').catch(() => '{"items":[]}'); const { items } = JSON.parse(raw); return { status: 'ok', items };`,
  },
- save_tool: {
- definition: {
- type: 'function',
- function: {
- name: 'save_tool',
- description: 'Create or update a custom tool and make it available immediately in this session. Use this to build reusable JS tools for tasks you repeat. The tool code runs in Node.js and has access to: args, fs, path, process, require, __jarvisDir. To update an existing tool, first call get_tool to read its current code and parameters, then call save_tool with your modifications.',
- parameters: {
- type: 'object',
- properties: {
- name: {
- type: 'string',
- description: 'Tool name in snake_case (e.g. "parse_json_file"). Must be unique.',
- },
- description: {
- type: 'string',
- description: 'What the tool does. Be specific — the LLM uses this to decide when to call it.',
- },
- parameters: {
- type: 'object',
- description: 'JSON Schema object for the tool parameters (with type, properties, required fields).',
- },
- code: {
- type: 'string',
- description: 'The body of an async function. Must end with a return statement — the returned value becomes the tool result. Available bindings: args (your tool parameters), fs (node:fs), path (node:path), process, require, __jarvisDir (absolute path to the jarvis server directory — use path.resolve(__jarvisDir, "../..") to get the project root for npm installs). Do NOT wrap in a function declaration. Example: const raw = await fs.promises.readFile(args.filePath, "utf8"); const data = JSON.parse(raw); return { count: data.length, first: data[0] };',
- },
- timeout: {
- type: 'number',
- description: 'Optional execution timeout in milliseconds for this tool (max 600000 = 10 minutes). Use this when the tool wraps a slow operation (e.g. a network request or long computation) that exceeds the default 60-second limit. If omitted, the default 60-second timeout applies.',
- },
- },
- required: ['name', 'description', 'parameters', 'code'],
- },
- },
- },
- code: `const toolsFile = path.join(process.env.HOME, '.jarvis/data/tools/tools.json'); const raw = await fs.promises.readFile(toolsFile, 'utf8').catch(() => '{}'); const tools = JSON.parse(raw); let parameters = args.parameters; if (typeof parameters === 'string') { try { parameters = JSON.parse(parameters); } catch { return { status: 'error', error: 'parameters must be a JSON Schema object, not a string. Pass the object directly, not as a JSON-serialized string.' }; } } if (typeof parameters !== 'object' || parameters === null || Array.isArray(parameters)) { return { status: 'error', error: 'parameters must be a JSON Schema object (e.g. { type: "object", properties: {...} }).' }; } const entry = { definition: { type: 'function', function: { name: args.name, description: args.description, parameters } }, code: args.code }; if (args.timeout !== undefined) { const t = Number(args.timeout); if (!Number.isFinite(t) || t <= 0) return { status: 'error', error: 'timeout must be a positive number in milliseconds.' }; entry.timeout = Math.min(t, 600_000); } tools[args.name] = entry; await fs.promises.writeFile(toolsFile, JSON.stringify(tools, null, 2), 'utf8'); return { status: 'ok', saved: args.name, timeout: entry.timeout || 60000 };`,
- },
- get_tool: {
- definition: {
- type: 'function',
- function: {
- name: 'get_tool',
- description: 'Read the full definition and code of a single tool by name. Use this before updating an existing tool so you understand its current implementation.',
- parameters: {
- type: 'object',
- properties: {
- name: {
- type: 'string',
- description: 'The tool name to retrieve.',
- },
- },
- required: ['name'],
- },
- },
- },
- code: `const toolsFile = path.join(process.env.HOME, '.jarvis/data/tools/tools.json'); const raw = await fs.promises.readFile(toolsFile, 'utf8').catch(() => '{}'); const tools = JSON.parse(raw); const tool = tools[args.name]; if (!tool) return { status: 'not_found', name: args.name }; return { status: 'ok', name: args.name, definition: tool.definition, code: tool.code };`,
- },
- list_tools: {
- definition: {
- type: 'function',
- function: {
- name: 'list_tools',
- description: 'List all available tools with their names and descriptions. Use this to see what tools exist before creating a new one.',
- parameters: {
- type: 'object',
- properties: {},
- required: [],
- },
- },
- },
- code: `const toolsFile = path.join(process.env.HOME, '.jarvis/data/tools/tools.json'); const raw = await fs.promises.readFile(toolsFile, 'utf8').catch(() => '{}'); const tools = JSON.parse(raw); const list = Object.entries(tools).map(([name, t]) => ({ name, description: t.definition.function.description })); return { status: 'ok', tools: list };`,
- },
  npm_install: {
  definition: {
  type: 'function',
@@ -352,7 +281,7 @@ const SEED_TOOLS = {
  type: 'function',
  function: {
  name: 'write_file',
- description: 'Write content directly to a file on the filesystem, bypassing all shell escaping. Use this to create or overwrite any file — shell scripts, config files, code, etc. Content is written exactly as provided: dollar signs, backslashes, and special characters are preserved without modification. Always prefer this over exec+echo, exec+printf, or exec+heredoc for writing files. For shell scripts, pass mode: "755" to make the file executable. Example: write_file({ path: "/path/to/scan.sh", content: "#!/bin/bash\\nDOMAIN=$1\\n...", mode: "755" })',
+ description: 'Create a new file or completely overwrite an existing file. Content is written exactly as provided: dollar signs, backslashes, and special characters are preserved without modification. Always prefer this over exec+echo, exec+printf, or exec+heredoc. For shell scripts, pass mode: "755". For targeted edits to an existing file (changing a specific line or section), use edit_file instead.',
  parameters: {
  type: 'object',
  properties: {
@@ -384,6 +313,239 @@ const SEED_TOOLS = {
  return { status: 'ok', path: targetPath, bytes, mode: args.mode || '644' };
  `,
  },
+ edit_file: {
+ definition: {
+ type: 'function',
+ function: {
+ name: 'edit_file',
+ description: 'Replace an exact string in a file with a new string. Use this for targeted edits — you only need to provide the specific section to change, not the whole file. old_string must match exactly (including whitespace and indentation) and must appear exactly once in the file. If it appears more than once, add more surrounding context to make it unique. For creating new files or rewriting entire files, use write_file instead.',
+ parameters: {
+ type: 'object',
+ properties: {
+ path: {
+ type: 'string',
+ description: 'Absolute or relative path to the file to edit.',
+ },
+ old_string: {
+ type: 'string',
+ description: 'The exact string to find and replace. Must match character-for-character including whitespace and indentation.',
+ },
+ new_string: {
+ type: 'string',
+ description: 'The string to replace old_string with.',
+ },
+ },
+ required: ['path', 'old_string', 'new_string'],
+ },
+ },
+ },
+ code: `
+ const targetPath = path.resolve(args.path);
+ const content = await fs.promises.readFile(targetPath, 'utf8');
+ const count = content.split(args.old_string).length - 1;
+ if (count === 0) {
+ return { status: 'error', error: 'old_string not found in file. Check for exact whitespace and indentation match.' };
+ }
+ if (count > 1) {
+ return { status: 'error', error: \`old_string found \${count} times. Add more surrounding context to make it unique.\` };
+ }
+ const updated = content.replace(args.old_string, args.new_string);
+ await fs.promises.writeFile(targetPath, updated, 'utf8');
+ return { status: 'ok', path: targetPath };
+ `,
+ },
+ get_current_time: {
+ definition: {
+ type: 'function',
+ function: {
+ name: 'get_current_time',
+ description: 'Returns the current server time. Call this before scheduling a cron job when the user specifies a relative time (e.g. "in 2 hours", "at 3pm today") so you can calculate the correct schedule.',
+ parameters: { type: 'object', properties: {}, required: [] },
+ },
+ },
+ code: `
+ const now = new Date();
+ return {
+ status: 'ok',
+ iso: now.toISOString(),
+ local: now.toLocaleString(),
+ utcOffset: -now.getTimezoneOffset() / 60,
+ };
+ `,
+ },
+ create_cron: {
+ definition: {
+ type: 'function',
+ function: {
+ name: 'create_cron',
+ description: 'Schedule a recurring or one-time task. The prompt is executed by a fresh agent with no prior context — write it as a self-contained task. For one-time tasks (e.g. "remind me in 2 hours"), set once: true. Call get_current_time first when calculating a relative schedule.',
+ parameters: {
+ type: 'object',
+ properties: {
+ name: { type: 'string', description: 'Short identifier for this cron, e.g. "backup-nightly".' },
+ schedule: { type: 'string', description: 'Cron expression, e.g. "0 3 * * *" for 3am daily. For a one-time task, compute the exact time from get_current_time and express it as a cron expression.' },
+ prompt: { type: 'string', description: 'The task prompt the agent will receive when this cron fires. Must be self-contained. Include "use send_telegram_message to notify the user with the result" if notification is desired.' },
+ once: { type: 'boolean', description: 'If true, the cron deletes itself after firing once. Use for one-time reminders or tasks.' },
+ },
+ required: ['name', 'schedule', 'prompt'],
+ },
+ },
+ },
+ code: `
+ const { randomUUID } = require('crypto');
+ const cronsFile = path.join(process.env.HOME, '.jarvis/data/crons.json');
+ const crons = JSON.parse(await fs.promises.readFile(cronsFile, 'utf8').catch(() => '[]'));
+ const entry = {
+ id: randomUUID(),
+ name: args.name,
+ schedule: args.schedule,
+ prompt: args.prompt,
+ once: args.once || false,
+ createdAt: new Date().toISOString(),
+ };
+ crons.push(entry);
+ await fs.promises.mkdir(path.dirname(cronsFile), { recursive: true });
+ await fs.promises.writeFile(cronsFile, JSON.stringify(crons, null, 2), 'utf8');
+ return { status: 'ok', cron: entry };
+ `,
+ },
+ list_crons: {
+ definition: {
+ type: 'function',
+ function: {
+ name: 'list_crons',
+ description: 'List all scheduled cron jobs.',
+ parameters: { type: 'object', properties: {}, required: [] },
+ },
+ },
+ code: `
+ const cronsFile = path.join(process.env.HOME, '.jarvis/data/crons.json');
+ const crons = JSON.parse(await fs.promises.readFile(cronsFile, 'utf8').catch(() => '[]'));
+ return { status: 'ok', crons };
+ `,
+ },
+ delete_cron: {
+ definition: {
+ type: 'function',
+ function: {
+ name: 'delete_cron',
+ description: 'Delete a scheduled cron job by name or id.',
+ parameters: {
+ type: 'object',
+ properties: {
+ name: { type: 'string', description: 'The cron name to delete.' },
+ id: { type: 'string', description: 'The cron id to delete.' },
+ },
+ },
+ },
+ },
+ code: `
+ const cronsFile = path.join(process.env.HOME, '.jarvis/data/crons.json');
+ const crons = JSON.parse(await fs.promises.readFile(cronsFile, 'utf8').catch(() => '[]'));
+ const idx = crons.findIndex(c => c.id === args.id || c.name === args.name);
+ if (idx === -1) return { status: 'not_found' };
+ const [removed] = crons.splice(idx, 1);
+ await fs.promises.writeFile(cronsFile, JSON.stringify(crons, null, 2), 'utf8');
+ return { status: 'ok', id: removed.id, name: removed.name };
+ `,
+ },
+ send_telegram_message: {
+ definition: {
+ type: 'function',
+ function: {
+ name: 'send_telegram_message',
+ description: 'Send a message to the Telegram user. Use this inside cron prompts to notify the user with the result of a task.',
+ parameters: {
+ type: 'object',
+ properties: {
+ message: { type: 'string', description: 'The message text to send.' },
+ },
+ required: ['message'],
+ },
+ },
+ },
+ code: `
+ const https = require('https');
+ const token = process.env.TELEGRAM_BOT_TOKEN;
+ const settingsFile = path.join(process.env.HOME, '.jarvis/data/config/settings.json');
+ const settings = JSON.parse(await fs.promises.readFile(settingsFile, 'utf8'));
+ const chatId = settings.channels?.telegram?.allowedUserIds?.[0];
+ if (!chatId) return { status: 'error', error: 'No Telegram chat_id configured.' };
+ if (!token) return { status: 'error', error: 'No TELEGRAM_BOT_TOKEN configured.' };
+ const body = JSON.stringify({ chat_id: chatId, text: args.message });
+ await new Promise((resolve, reject) => {
+ const req = https.request({
+ hostname: 'api.telegram.org',
+ path: '/bot' + token + '/sendMessage',
+ method: 'POST',
+ headers: { 'Content-Type': 'application/json', 'Content-Length': Buffer.byteLength(body) },
+ }, res => {
+ let data = '';
+ res.on('data', chunk => data += chunk);
+ res.on('end', () => {
+ const parsed = JSON.parse(data);
+ if (!parsed.ok) reject(new Error(parsed.description));
+ else resolve(parsed);
+ });
+ });
+ req.on('error', reject);
+ req.write(body);
+ req.end();
+ });
+ return { status: 'ok', chatId };
+ `,
+ },
+ read_cron_log: {
+ definition: {
+ type: 'function',
+ function: {
+ name: 'read_cron_log',
+ description: 'Read the execution log for a cron job. Returns the most recent runs with status, response, and logSummary.',
+ parameters: {
+ type: 'object',
+ properties: {
+ id: { type: 'string', description: 'The cron id.' },
+ limit: { type: 'number', description: 'Max entries to return. Defaults to 20.' },
+ },
+ required: ['id'],
+ },
+ },
+ },
+ code: `
+ const logsDir = path.join(process.env.HOME, '.jarvis/logs');
+ const logFile = path.join(logsDir, 'cron-' + args.id + '.jsonl');
+ const content = await fs.promises.readFile(logFile, 'utf8').catch(() => '');
+ const lines = content.trim().split('\\n').filter(Boolean);
+ const limit = args.limit || 20;
+ const entries = lines.slice(-limit).map(line => JSON.parse(line));
+ return { status: 'ok', entries };
+ `,
+ },
+ read_skill: {
+ definition: {
+ type: 'function',
+ function: {
+ name: 'read_skill',
+ description: 'Read the full instructions of a skill by name. Call this before executing a skill so you have the complete workflow. The skill name must match one of the available skills listed in your system prompt.',
+ parameters: {
+ type: 'object',
+ properties: {
+ name: {
+ type: 'string',
+ description: 'The skill name, e.g. "add-two-integers".',
+ },
+ },
+ required: ['name'],
+ },
+ },
+ },
+ code: `
+ const skillFile = path.join(process.env.HOME, '.jarvis/data/skills', args.name, 'skill.md');
+ const content = await fs.promises.readFile(skillFile, 'utf8').catch(() => null);
+ if (!content) return { status: 'not_found', name: args.name };
+ return { status: 'ok', name: args.name, content };
+ `,
+ },
  get_recent_sessions: {
  definition: {
  type: 'function',
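The count-then-replace guard in the new `edit_file` tool can be exercised on its own. The sketch below is illustrative only (it operates on a string instead of reading the file via `fs.promises` as the seeded tool does), but the uniqueness logic is the same:

```js
// Standalone sketch of edit_file's uniqueness guard: the edit applies
// only when oldString occurs exactly once in the content, so an
// ambiguous match fails closed instead of editing the wrong spot.
function applyEdit(content, oldString, newString) {
  const count = content.split(oldString).length - 1;
  if (count === 0) return { status: 'error', error: 'old_string not found' };
  if (count > 1) return { status: 'error', error: 'old_string found ' + count + ' times' };
  return { status: 'ok', content: content.replace(oldString, newString) };
}
```

Because the match is an exact substring comparison, any whitespace or indentation difference surfaces as a `not found` error rather than a silent mis-edit.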
@@ -1,116 +0,0 @@
- # Finding 001: Context Window Explosion via Tool Output Accumulation
-
- **Date:** 2026-02-26
- **Severity:** High — renders session completely unusable after enough handoffs
- **Status:** Fixed
-
- ---
-
- ## What Happened
-
- A session was started with the question *"Hast du Zugriff auf deinen source code? Wo liegt er?"* ("Do you have access to your source code? Where is it?"). The agent began exploring the filesystem using `exec` and `list_dir`, running commands like `cat agent.js`, `cat tools.js`, `cat app.js`, and various `find` commands.
-
- The task required more than 10 iterations to complete, so the checkpoint/handoff mechanism fired. The agent ran 6 consecutive handoff runs before hitting `maxHandoffs` and stopping with `intervention_required`.
-
- By that point the session `conversation.json` had grown to **687 KB**. On the very next user message (*"Why?"*), both the primary and fallback models returned a `400 Provider returned error`. The session was permanently broken — no further messages could be processed.
-
- ---
-
- ## Root Cause
-
- Two compounding problems:
-
- ### 1. Tool output stored verbatim, without size limit
-
- `exec` returns raw `stdout` from shell commands. When the model runs `cat agent.js` (440 lines, ~22 000 chars), that entire output gets stored in `session.messages` as a `role: "tool"` message. Every subsequent model request in that run — and in all future runs — sends this content in full.
-
- There was no cap anywhere on tool result content. A single run of 10 iterations with a few `cat` calls could easily produce 100–200 KB of tool messages.
-
- ### 2. Handoff runs accumulated on top of each other
-
- When the iteration limit is hit, the checkpoint/handoff mechanism pushes `checkpoint.remaining` as a new user message and starts a fresh agent run — but on top of the **same, growing** `session.messages` array. Each of the 6 handoff runs added another 10 iterations of tool call messages to the history. Nothing was ever removed.
-
- After 6 runs × ~10 iterations × multiple `cat` commands each, the context reached approximately 170 000 tokens — exceeding the free model's 128 000 token limit. The `400` was the provider rejecting the oversized request.
-
- ### Why the `400` appeared on the *next* user message, not during the run
-
- The session's final run hit `maxHandoffs` and stopped. At that point the context was already at or near the limit. When the user sent a new message, the full bloated history was loaded and sent again — this time slightly over the limit — causing the rejection.
-
- ---
-
- ## Model Context Windows (for reference)
-
- | Model | Context Window |
- |---|---|
- | arcee-ai/trinity-large-preview:free | ~128 000 tokens |
- | Claude Sonnet 4.6 | 200 000 tokens |
- | Gemini 2.5 Pro / 2.0 Flash | 1 000 000 tokens |
-
- A larger model would have delayed the failure, but not prevented it. The conversation would still grow unboundedly.
-
- ---
-
- ## What We Considered
-
- **Truncate tool results in `prepareMessages`** — works, but runs on every loop iteration and is the wrong place conceptually. The content is already stored in full in the session before `prepareMessages` is ever called.
-
- **Naive sliding window (drop oldest N messages)** — breaks the OpenRouter/OpenAI API contract. Every `role: "tool"` message must be paired with the assistant message containing the matching `tool_call_id`. Slicing arbitrarily through the message array orphans tool results and causes a `400` — the exact error we're trying to fix.
-
- **Token budget / summarisation** — more adaptive but significantly more complex. Requires either token counting per model or an extra LLM call. Overkill for v1.
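The pairing constraint behind the sliding-window rejection can be made concrete. The helper below is hypothetical (not part of the codebase); it checks the invariant that arbitrary slicing violates:

```js
// Sketch: every role "tool" message must reference a tool_call_id that
// an earlier assistant message announced. Slicing out that assistant
// message orphans the tool result, which providers reject with a 400.
function hasOrphanedToolResults(messages) {
  const announced = new Set();
  for (const msg of messages) {
    if (msg.role === 'assistant' && Array.isArray(msg.tool_calls)) {
      for (const tc of msg.tool_calls) announced.add(tc.id);
    }
    if (msg.role === 'tool' && !announced.has(msg.tool_call_id)) return true;
  }
  return false;
}
```

Dropping the oldest N messages can cut exactly between an assistant tool call and its result, which is why the strip in the chosen fix always removes whole runs rather than arbitrary prefixes.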
-
- ---
-
- ## Fix
-
- Two targeted changes to `src/server/agent.js`.
-
- ### 1. Cap tool result content at write time (`MAX_TOOL_RESULT = 4000`)
-
- Right where a tool result is pushed to `session.messages`, cap the content to 4 000 characters. The full result is still passed to `runToolCalls` and therefore written to the JSONL session log — no information is lost for debugging. Only what the model sees is limited.
-
- ```js
- const sessionContent = resultStr.length > MAX_TOOL_RESULT
-   ? resultStr.slice(0, MAX_TOOL_RESULT) + '\n[...truncated]'
-   : resultStr;
- session.messages.push({ role: 'tool', tool_call_id: toolCall.id, content: sessionContent });
- ```
-
- 4 000 chars is ~80 lines of code or a full `ls -la` listing — enough for the model to reason about any output. If more detail is needed, the model should use targeted commands (`grep`, `head`, `tail`) rather than `cat`-ing entire files.
-
- ### 2. Strip intermediate tool messages before each handoff
-
- Before calling `runAgentLoop`, snapshot `session.messages.length` as `runStartIndex`. If the run ends with `checkpoint_reached`, splice out all messages added during that run *except the final wrap-up assistant response*, then push `checkpoint.remaining` as the new user message.
-
- ```js
- const runStartIndex = session.messages.length;
- const run = await runAgentLoop(...);
-
- // on checkpoint_reached, before resuming:
- session.messages.splice(runStartIndex, session.messages.length - runStartIndex - 1);
- session.messages.push({ role: 'user', content: run.checkpoint.remaining });
- ```
-
- **Before** (after 6 handoffs):
- ```
- [system] [user: question] [assistant/tool ×10] [wrap-up] [user: checkpoint1]
- [assistant/tool ×10] [wrap-up] [user: checkpoint2]
- [assistant/tool ×10] [wrap-up] ... → 687 KB
- ```
-
- **After** (after 6 handoffs):
- ```
- [system] [user: question] [wrap-up] [user: checkpoint1]
- [wrap-up] [user: checkpoint2]
- [wrap-up] ... → ~5 KB
- ```
-
- Each handoff now adds 2 messages instead of 20+. The wrap-up message carries the relevant state (what was done, what remains) so the model is not flying blind — it just doesn't have the raw tool noise from previous runs.
-
- ---
-
- ## Outcome
-
- - Sessions with long-running tasks no longer grow the context unboundedly.
- - The JSONL session log is unaffected — full tool outputs are always written there.
- - The model can still access previous run output via `read_session_log` if needed.
- - A follow-up message after a completed multi-handoff task will no longer receive a `400`.
@@ -1,84 +0,0 @@
- # Finding 002: Handoff Edge Cases Found During Review of Finding 001
-
- **Date:** 2026-02-26
- **Severity:** Medium
- **Status:** Fixed
-
- ---
-
- ## Context
-
- While reviewing the fix for [Finding 001](./001-context-explosion.md), two edge cases in the handoff system were found. Neither caused problems in the observed debugging session, but both could cause failures under specific conditions.
-
- ---
-
- ## Issue A: `checkpoint.remaining` could be `null`, causing a 400 on the next iteration
-
- ### What could happen
-
- When the iteration limit is hit, the agent asks the model for a wrap-up response that includes a `checkpoint` field:
-
- ```json
- {
-   "response": "...",
-   "logSummary": "...",
-   "checkpoint": {
-     "progress": "...",
-     "remaining": "..."
-   }
- }
- ```
-
- The server then pushes `checkpoint.remaining` as a user message to start the next run:
-
- ```js
- session.messages.push({ role: 'user', content: run.checkpoint.remaining });
- ```
-
- Weaker or free models occasionally omit required fields or set them to `null`. If `remaining` is `null`, the session gets a `{ role: 'user', content: null }` message. Most providers reject a null content field with a `400 Bad Request` on the next model call — the same error that surfaced in Finding 001, but from a different cause.
-
- ### Fix
-
- ```js
- session.messages.push({ role: 'user', content: run.checkpoint.remaining || 'Continue with the task.' });
- ```
-
- ---
-
- ## Issue B: `intervention_required` did not strip tool history before saving
-
- ### What could happen
-
- The tool history strip introduced in Finding 001 runs right before pushing `checkpoint.remaining` for the next run. But the `intervention_required` path (max handoffs exceeded) breaks out of the loop *before* reaching the strip:
-
- ```js
- if (session.metadata.handoffCount > config.maxHandoffs) {
-   // ... log and set status ...
-   break; // ← strip never ran
- }
-
- // strip only reached here, after the if-block
- session.messages.splice(runStartIndex, session.messages.length - runStartIndex - 1);
- ```
-
- This meant a session that hit the handoff limit was saved with the full tool history of the last run still in it. When the user sends a new message after `intervention_required`, the model receives all of that accumulated tool history — the same context bloat risk as before the fix in Finding 001.
-
- ### Fix
-
- Strip the tool history inside the `intervention_required` branch, before breaking:
-
- ```js
- if (session.metadata.handoffCount > config.maxHandoffs) {
-   // ... log and set status ...
-   session.messages.splice(runStartIndex, session.messages.length - runStartIndex - 1);
-   break;
- }
- ```
-
- The wrap-up assistant message (last in the array) is preserved — it gives the model context about what was attempted when the user resumes.
-
- ---
-
- ## Why these weren't caught earlier
-
- Both issues only manifest under specific conditions (model omitting a field; hitting maxHandoffs exactly). The debugging session in Finding 001 stopped at `intervention_required` after 6 handoffs, but the 400 error on the next message was attributed to the overall context size, masking the fact that the strip hadn't run for that final run.
@@ -1,120 +0,0 @@
- # Finding 003: Event Loop Blocking, Async File I/O, and Session Reliability
-
- **Date:** 2026-02-27
- **Severity:** High — caused observed 100% CPU and server unresponsiveness in production
- **Status:** Fixed
-
- ---
-
- ## What Happened
-
- A session was started with the question *"Kannst du deinen source code finden und anschauen mittels Tools?"* ("Can you find and look at your source code using tools?"). The agent used the `exec` tool to run two full-filesystem scans:
-
- ```
- find / -type f \( -iname "*.js" -o -iname "*.ts" -o -iname "*.py" \) 2>/dev/null | head -20
- find / -type d -name "jarvis" 2>/dev/null
- ```
-
- Both commands start from filesystem root `/`. The second has no output limit and scans everything: real disk filesystems, `/proc`, `/sys`, `/dev`, and any network mounts. On the affected Linux server this caused the CPU to reach 100% and the server became unresponsive. The server had to be shut down manually.
-
- ---
-
- ## Root Cause
-
- ### 1. `execSync` blocks the entire Node.js event loop
-
- Both `exec` and `list_dir` used `execSync` from `child_process`. `execSync` is a synchronous call that blocks the event loop for its entire duration. While any shell command runs:
-
- - Express cannot process incoming HTTP requests
- - The Telegram bot cannot receive or process new messages
- - All timers and async callbacks are frozen (including the Telegram `typingInterval`, so the user sees no activity indicator)
-
- The OS sees a CPU-hungry `find` child process running at full speed while Node.js sits blocked waiting for it. Combined, this presents as ~100% CPU with a completely unresponsive server.
-
- Additionally, `list_dir` used `execSync` with **no timeout at all**. A hanging command (e.g. `ls` on an NFS mount or a blocked `/proc` entry) would freeze the server permanently.
-
- ### 2. All file I/O was synchronous
-
- `loadSession`, `saveSession`, `appendLog`, and `loadTools` all used `fs.*Sync` variants. In an async Node.js server these block the event loop on every request. For small files the impact is measured in microseconds, but the pattern is architecturally incorrect and accumulates under load.
-
- ### 3. Session not saved on unexpected error
-
- In `handleChat`, `saveSession` was called unconditionally after the `try/catch` block. If the catch re-threw an unexpected error, `saveSession` was never reached. The user message had already been appended to the in-memory session but the on-disk version did not reflect it — leaving the session in an inconsistent state for the next request.
-
- ### 4. No concurrency protection per session
-
- The Telegram channel uses `@grammyjs/runner`, which processes updates concurrently. If a user sent two messages in quick succession, both `handleChat` calls could load the same session simultaneously, run independent agent loops, and then overwrite each other's `saveSession` call. The second write would silently discard the first response.
-
- ### 5. Seed tools never updated after initial creation
-
- `seedTools()` used `if (!existing[name])` — it only wrote a seed tool on first run. Any update to `exec` or `list_dir` in the source code would never propagate to an existing installation. This blocked the async fix for `exec` and `list_dir` from taking effect.
-
- ---
-
- ## Fixes
-
- ### 1. `exec` and `list_dir` → async (`src/server/tools.js`)
-
- **`exec`**: replaced `execSync` with `promisify(exec)`. The event loop is now free during shell command execution. Timeout (60s) and maxBuffer (2MB) are preserved.
-
- **`list_dir`**: replaced `execSync` with `promisify(execFile)`. `execFile` does not use a shell interpreter, which is safer against special characters in paths. Added a 10-second timeout (previously none).
-
- ### 2. `executeTool` global timeout (`src/server/tools.js`)
-
- All tool executions — both built-in and AI-created — are now wrapped in `Promise.race` against a 60-second timeout. This protects against AI-created tools that hang on async operations (network requests, file I/O). The timeout matches the `exec` tool's own limit for consistency.
-
- ```js
- const timeout = new Promise((_, reject) =>
-   setTimeout(() => reject(new Error(`Tool '${name}' timed out after 60s`)), 60_000)
- );
- return await Promise.race([fn(toolArgs, fs, path, process, _require), timeout]);
- ```
-
- Note: this does not protect against synchronous CPU loops without `await` points — that would require Worker Threads. Such code is unlikely to be generated accidentally.
-
- ### 3. Seed tools always updated (`src/server/tools.js`)
-
- `seedTools()` now compares the serialized content of each seed tool against the stored version and overwrites only when there is a difference. Updates to built-in tools propagate on the next server start without touching user-created tools.
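One possible shape of that compare-then-overwrite logic, assuming tools are plain JSON objects (`mergeSeeds` is an illustrative name, not the package's function):

```js
// Sketch: a seed tool is rewritten only when its serialized form
// differs from what is stored, so user-created tools (keys not in
// seeds) and up-to-date seeds are left untouched.
function mergeSeeds(stored, seeds) {
  const result = { ...stored };
  let changed = false;
  for (const [name, seed] of Object.entries(seeds)) {
    if (JSON.stringify(result[name]) !== JSON.stringify(seed)) {
      result[name] = seed;
      changed = true;
    }
  }
  return { result, changed };
}
```

Comparing serialized JSON makes the check insensitive to object identity while still catching any change to a seed's definition or code string.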
-
- ### 4. All file I/O → async (`src/server/sessions.js`, `src/server/logging.js`, `src/server/tools.js`)
-
- `loadSession`, `saveSession`, `appendLog`, and `loadTools` now use `fs.promises.*`. All callers in `agent.js` are updated to `await` these calls.
-
- ### 5. `saveSession` moved to `finally` block (`src/server/agent.js`)
-
- The session is now always persisted — on success, on model error, and on unexpected errors. A failed save is caught and logged without masking the original error.
-
- ```js
- } finally {
-   try {
-     await saveSession(sessionId, session);
-   } catch (saveErr) {
-     console.error(`Failed to save session ${sessionId}:`, saveErr);
-   }
- }
- ```
-
- ### 6. Session queue for concurrency control (`src/server/agent.js`)
-
- A module-level `Map<sessionId, Promise>` serializes concurrent requests for the same session. Each new request registers itself as the tail of the queue and waits for the previous request to resolve before starting. The map entry is cleaned up by whichever request is last in the chain.
-
- ```js
- const previous = sessionQueues.get(sessionId) ?? Promise.resolve();
- let releaseLock;
- const current = new Promise(resolve => { releaseLock = resolve; });
- sessionQueues.set(sessionId, current);
- await previous;
- // ... process request ...
- // finally: releaseLock()
- ```
-
- This is safe in Node.js because the event loop is single-threaded: `get`, `new Promise`, and `set` all execute synchronously before the first `await`, so there is no race between two requests reading the same `undefined` entry.
-
- ---
-
- ## What Was Not Changed
-
- - The agent loop logic, checkpoint/handoff system, loop detection, and format recovery — all unchanged.
- - `seedTools()` remains synchronous (called once at startup, before the server accepts requests).
- - `createSession()` and `getToolDefinitions()` remain synchronous (pure functions, no I/O).
- - No rate limiting or HTTP authentication added — the server is intended for local/personal use only.