npm: @ducci/jarvis 1.0.95 → 1.0.97

Files changed (6)

package/docs/telegram.md +45 -10
package/package.json +1 -1
package/src/channels/telegram/index.js +80 -0
package/src/server/agent.js +7 -6
package/src/server/config.js +1 -0
package/src/server/tools.js +34 -3

package/docs/telegram.md CHANGED

@@ -248,7 +248,7 @@ await bot.api.setMyCommands([
 ## Photo Support
-The bot handles incoming photos (`message:photo`) in addition to text. When a user sends a photo, the adapter selects the best resolution under 800px wide to keep token usage reasonable, then passes the image URL and optional caption to the agent as a multimodal content block.
+The bot handles incoming photos (`message:photo`) in addition to text. When a user sends a photo, the adapter selects the best resolution under 800px wide to keep token usage reasonable.
 ### Photo selection
@@ -265,30 +265,65 @@ This gives the highest quality image below the 800px threshold. Sending the full
 The image is downloaded immediately at receive time using the Telegram file URL (`https://api.telegram.org/file/bot<token>/<file_path>`) and converted to a base64 data URL (`data:image/jpeg;base64,...`). The data URL is stored directly in the session message, so the image remains available across handoffs and future conversation turns without depending on a Telegram URL that would expire after ~1 hour. Base64 encoding does not cost more tokens than a URL — image token cost is based on pixel dimensions, not transport format.
-### Agent call
+### Image processing paths
-Photos are passed to the agent as a multimodal content array instead of a plain string:
+How the image reaches the model depends on whether a dedicated vision model is configured:
+**Path 1 — `visionModel` configured** (`settings.json: visionProvider + visionModel`):
+Before the main agent call, the adapter calls `describeImage()` — a separate, one-shot API call to the vision model. The result (a text description of the image) is injected into the user turn as plain text. The main agent never sees the image itself; it only sees the description. This allows a cheap non-multimodal main model to handle image conversations.
+**Path 2 — No `visionModel`, multimodal main model**:
+The base64 data URL is passed directly to the main model as an `image_url` content block alongside any caption. The model processes the image natively.
 ```js
 const content = [
-  { type: 'image_url', url: fileUrl },
+  { type: 'image_url', image_url: { url: 'data:image/jpeg;base64,...' } },
+  { type: 'text', text: caption },
 ];
-if (caption) content.push({ type: 'text', text: caption });
 ```
-The agent layer must support receiving `content` as either a string or a content array and pass it through to the model accordingly.
+**Fallback — model rejects image input**:
+If the main model returns an error indicating it does not support image input (`isImageUnsupportedError`), the agent responds with a clear message ("This model does not support image input…") and strips the image from the session so subsequent messages are not permanently broken. A text placeholder is inserted in its place so the model retains context.
 ### Caption
-If the user attaches a caption to the photo (`ctx.message.caption`), it is included as a text block alongside the image. If there is no caption, only the image block is sent.
+If the user attaches a caption to the photo (`ctx.message.caption`), it is included alongside the image (as a text block in multimodal mode, or appended to the vision description in Path 1). If there is no caption, only the image content is sent.
+### Unsupported incoming media types
+Documents, audio files, video, stickers, and other non-photo non-voice media types sent by the user are not handled — the bot silently ignores them.
+## Outgoing Files
+The agent can send files from the server to the Telegram chat using the `send_file` seed tool. This complements the text-only `send_telegram_message` tool for cases where the agent has produced or located a file the user needs.
+### Tool interface
+```js
+send_file({ path: '/absolute/or/~/path/to/file', caption: 'Optional caption' })
+```
+The tool resolves `~` to the home directory, checks that the file exists, and calls the channel-provided `sendFile` callback. It returns `{ status: 'error', error: '...' }` if the file is not found or the channel does not support file sending.
+### Channel integration
+The Telegram adapter passes an `onSendFile` callback to `handleChat`:
+```js
+handleChat(config, sessionId, userText, attachments, onCheckpoint, async (filePath, caption) => {
+  await api.sendDocument(chatId, new InputFile(filePath), caption ? { caption } : {});
+});
+```
+`InputFile(filePath)` streams the file from disk — no in-memory buffering of the full file. The callback is threaded through `handleChat → _runHandleChat → runAgentLoop → executeTool` and injected into the tool's `AsyncFunction` as the `sendFile` parameter.
-### Unsupported media types
+### Channel support
-Documents, audio, video, stickers, and other non-photo media types are not handled — the bot silently ignores them (same as unauthorized messages).
+`send_file` only works in channels that register an `onSendFile` callback (currently: Telegram). In other contexts (web UI, cron runs), the tool returns an error immediately rather than silently succeeding.
 ## Non-Goals (v1)
-- No support for documents, audio, video, or other non-photo media types
+- No support for receiving documents, audio files, video, or other non-photo non-voice media from the user
 - No inline keyboards or callback queries
 - No group chat support (only private chats)
 - No message editing or deletion handling
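
Note on the hunk above: the two image paths can be sketched as one pure function. This is an illustration only. `buildUserContent` and its parameter names are hypothetical and do not appear in the package, and the way the caption is joined to the vision description in Path 1 is an assumption. The sketch follows the "Caption" paragraph, adding the text block only when a caption exists.

```js
// Hypothetical helper (not from the package): builds the user turn for a photo.
// `description` is the text a vision model would have produced (Path 1);
// when it is null, the image is passed as an image_url block (Path 2).
function buildUserContent({ dataUrl, caption = '', description = null }) {
  if (description !== null) {
    // Path 1: the main model only ever sees text.
    // Assumption: the caption goes on its own line after the description.
    return caption ? `${description}\n${caption}` : description;
  }
  // Path 2: multimodal content array; text block only when a caption exists.
  const content = [{ type: 'image_url', image_url: { url: dataUrl } }];
  if (caption) content.push({ type: 'text', text: caption });
  return content;
}
```

With a description the function returns a plain string, so a non-multimodal main model never receives an image block; without one it returns the content array shown in the documentation hunk.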

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@ducci/jarvis",
-  "version": "1.0.95",
+  "version": "1.0.97",
   "description": "A fully automated agent system that lives on a server.",
   "main": "./src/index.js",
   "type": "module",

package/src/channels/telegram/index.js CHANGED

@@ -658,6 +658,9 @@ export async function startTelegramChannel(config) {
           lastCheckpointSent = prefixed;
           await appendTelegramChatLog(chatId, getSessionId(chatId, slot) || null, 'JARVIS', prefixed);
           await sendMessage(api, chatId, prefixed, getSessionId(chatId, slot) || null);
+        }, async (filePath, caption) => {
+          await api.sendDocument(chatId, new InputFile(filePath), caption ? { caption } : {});
+          console.log(`[telegram] file sent chat_id=${chatId} slot=${slot} path=${filePath}`);
         });
       } catch (e) {
         console.error(`[telegram] agent error chat_id=${chatId} slot=${slot}: ${e.message}`);
@@ -850,6 +853,83 @@ export async function startTelegramChannel(config) {
     }
   });
+  bot.on('message:document', async (ctx) => {
+    const userId = ctx.from?.id;
+    if (!allowedUserIds.includes(userId)) return;
+    const chatId = ctx.chat.id;
+    const ts = new Date().toISOString();
+    const doc = ctx.message.document;
+    const MAX_BYTES = 20 * 1024 * 1024; // 20MB — Telegram bot API getFile limit
+    if (doc.file_size && doc.file_size > MAX_BYTES) {
+      await ctx.reply(`File too large (${Math.round(doc.file_size / 1024 / 1024)}MB). Telegram bot API limit is 20MB.`).catch(() => {});
+      return;
+    }
+    console.log(`[telegram] incoming document chat_id=${chatId} name=${doc.file_name} size=${doc.file_size}`);
+    let savedPath;
+    try {
+      const file = await ctx.api.getFile(doc.file_id);
+      const fileUrl = `https://api.telegram.org/file/bot${token}/${file.file_path}`;
+      const response = await fetch(fileUrl);
+      const buffer = Buffer.from(await response.arrayBuffer());
+      fs.mkdirSync(PATHS.uploadsDir, { recursive: true });
+      const safeName = (doc.file_name || 'file').replace(/[^a-zA-Z0-9._-]/g, '_');
+      savedPath = path.join(PATHS.uploadsDir, `${Date.now()}-${safeName}`);
+      fs.writeFileSync(savedPath, buffer);
+    } catch (e) {
+      console.error(`[telegram] document download error chat_id=${chatId}: ${e.message}`);
+      await ctx.reply('Sorry, could not download the file.').catch(() => {});
+      return;
+    }
+    const sizeKb = doc.file_size ? `${Math.round(doc.file_size / 1024)} KB` : 'unknown size';
+    const mimeType = doc.mime_type || 'application/octet-stream';
+    const fileInfo = `[User sent a file: ${savedPath} (${mimeType}, ${sizeKb})]`;
+    const userText = ctx.message.caption ? `${fileInfo}\n${ctx.message.caption}` : fileInfo;
+    const entry = { text: userText, attachments: [], ts };
+    const slot = getActiveSlot(chatId);
+    const key = slotKey(chatId, slot);
+    if (isRunning.has(key)) {
+      if (!pendingMessages.has(key)) pendingMessages.set(key, []);
+      pendingMessages.get(key).push(entry);
+      console.log(`[telegram] buffered document chat_id=${chatId} slot=${slot} pending=${pendingMessages.get(key).length}`);
+      return;
+    }
+    isRunning.add(key);
+    runStartTimes.set(key, new Date());
+    await ctx.api.sendChatAction(chatId, 'typing');
+    const typingInterval = setInterval(() => {
+      ctx.api.sendChatAction(chatId, 'typing').catch(() => {});
+    }, 4000);
+    try {
+      await processQueue(ctx.api, chatId, slot, [entry]);
+    } finally {
+      clearInterval(typingInterval);
+      isRunning.delete(key);
+      runStartTimes.delete(key);
+    }
+  });
+  bot.on('message:audio', async (ctx) => {
+    const userId = ctx.from?.id;
+    if (!allowedUserIds.includes(userId)) return;
+    await ctx.reply("I can't process audio files. Send a voice message for speech input, or send the file using \"Send as file\" if you want me to read it.").catch(() => {});
+  });
+  bot.on('message:video', async (ctx) => {
+    const userId = ctx.from?.id;
+    if (!allowedUserIds.includes(userId)) return;
+    await ctx.reply("I can't process video files. Use \"Send as file\" if you want me to access the file.").catch(() => {});
+  });
   bot.on('message:text', async (ctx) => {
     const userId = ctx.from?.id;
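
Note on the `message:document` handler above: its pure steps (size guard, filename sanitization, and the text handed to the agent) can be isolated as below. The helper names `isTooLarge`, `toSafeName`, and `buildUserText` are hypothetical; the expressions inside them are copied from the hunk. The download and the queueing are left out.

```js
const MAX_BYTES = 20 * 1024 * 1024; // same 20MB ceiling the handler checks

// Hypothetical helper: mirrors the handler's size check.
// A missing file_size passes the guard, as it does in the handler.
function isTooLarge(fileSize) {
  return Boolean(fileSize) && fileSize > MAX_BYTES;
}

// Hypothetical helper: mirrors the handler's filename sanitization.
// Every character outside [a-zA-Z0-9._-] becomes "_", including "/" and spaces.
function toSafeName(fileName) {
  return (fileName || 'file').replace(/[^a-zA-Z0-9._-]/g, '_');
}

// Hypothetical helper: mirrors how the handler builds the user turn.
function buildUserText(savedPath, doc, caption) {
  const sizeKb = doc.file_size ? `${Math.round(doc.file_size / 1024)} KB` : 'unknown size';
  const mimeType = doc.mime_type || 'application/octet-stream';
  const fileInfo = `[User sent a file: ${savedPath} (${mimeType}, ${sizeKb})]`;
  return caption ? `${fileInfo}\n${caption}` : fileInfo;
}
```

Because path separators are replaced, a name such as `../../etc/passwd` is flattened into a single path segment before it is joined with `PATHS.uploadsDir`.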

package/src/server/agent.js CHANGED

@@ -399,7 +399,7 @@ export async function runAgentLoop(client, config, session, prepareMessages, usa
             if (toolName === 'spawn_subagent') {
               result = await runSubagent(client, config, toolArgs, config._sessionId);
             } else {
-              result = await executeTool(tools, toolName, toolArgs);
+              result = await executeTool(tools, toolName, toolArgs, { sendFile: config._sendFile ?? null });
             }
           } catch (e) {
             result = { status: 'error', error: e.message };
@@ -709,7 +709,7 @@ export async function withSessionLock(sessionId, fn) {
  * Main entry point: handles a single POST /api/chat request.
  * Manages the handoff loop across multiple agent runs.
  */
-export async function handleChat(config, requestSessionId, userMessage, attachments = [], onCheckpoint = null) {
+export async function handleChat(config, requestSessionId, userMessage, attachments = [], onCheckpoint = null, onSendFile = null) {
   const sessionId = requestSessionId || crypto.randomUUID();
   // Serialize concurrent requests for the same session. Each request registers
@@ -723,7 +723,7 @@ export async function handleChat(config, requestSessionId, userMessage, attachme
   await previous;
   try {
-    return await _runHandleChat(config, sessionId, userMessage, attachments, onCheckpoint);
+    return await _runHandleChat(config, sessionId, userMessage, attachments, onCheckpoint, onSendFile);
   } finally {
     releaseLock();
     // Clean up only if no one else has queued behind us
@@ -737,7 +737,7 @@ export async function handleChat(config, requestSessionId, userMessage, attachme
  * The actual chat logic, extracted so handleChat can wrap it cleanly with the
  * session lock.
  */
-async function _runHandleChat(config, sessionId, userMessage, attachments = [], onCheckpoint = null) {
+async function _runHandleChat(config, sessionId, userMessage, attachments = [], onCheckpoint = null, onSendFile = null) {
   const client = createClient(config);
   const systemPromptTemplate = loadSystemPrompt();
@@ -782,6 +782,7 @@ async function _runHandleChat(config, sessionId, userMessage, attachments = [],
   } else {
     userContent = userMessageWithContext;
   }
+  const interactionStartIndex = session.messages.length;
   session.messages.push({ role: 'user', content: userContent });
   session.metadata.handoffCount = 0;
   session.metadata.failedApproaches = [];
@@ -820,7 +821,7 @@ async function _runHandleChat(config, sessionId, userMessage, attachments = [],
       // Safety check: if the last two assistant messages are both model_error
       // synthetic notes, we are in a confirmed failure loop. Escalate immediately
       // rather than burning more iterations on a stuck session.
-      if (hasConsecutiveModelErrors(session.messages)) {
+      if (hasConsecutiveModelErrors(session.messages.slice(interactionStartIndex))) {
         finalResponse = 'The model has failed twice in a row. This is likely due to the conversation being too long for the model to process. Please start a new session or switch to a model with a larger context window.';
         finalLogSummary = 'Consecutive model_error detected: session escalated to intervention_required without running another agent loop.';
         finalStatus = 'intervention_required';
@@ -837,7 +838,7 @@ async function _runHandleChat(config, sessionId, userMessage, attachments = [],
       }
       const runStartIndex = session.messages.length;
-      const run = await runAgentLoop(client, { ...config, _sessionId: sessionId }, session, prepareMessages, usageAccum);
+      const run = await runAgentLoop(client, { ...config, _sessionId: sessionId, _sendFile: onSendFile }, session, prepareMessages, usageAccum);
       allToolCalls.push(...run.runToolCalls);
       if (run.status !== 'checkpoint_reached') {
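
Note on the `interactionStartIndex` change above: slicing the history limits the consecutive-error check to messages from the current interaction, so error notes left by an earlier request presumably no longer trip the escalation on the next one. The sketch below illustrates that effect. The body of `hasConsecutiveModelErrors` is not part of this diff, so the version here is a hypothetical stand-in, and the `modelError` flag is an assumed message shape.

```js
// Hypothetical stand-in for the package's check: true when the last two
// assistant messages are both flagged as synthetic model_error notes.
// The `modelError` flag is an assumption made for this illustration.
function hasConsecutiveModelErrors(messages) {
  const lastTwo = messages.filter((m) => m.role === 'assistant').slice(-2);
  return lastTwo.length === 2 && lastTwo.every((m) => m.modelError === true);
}

// History left behind by an earlier, failed interaction.
const messages = [
  { role: 'user', content: 'first request' },
  { role: 'assistant', content: 'model_error note', modelError: true },
  { role: 'assistant', content: 'model_error note', modelError: true },
];

// A new interaction starts: record the index, then push the user turn.
const interactionStartIndex = messages.length;
messages.push({ role: 'user', content: 'second request' });

// Old behavior: the stale errors are still the last two assistant messages.
const wholeSession = hasConsecutiveModelErrors(messages);
// New behavior: only messages from the current interaction are inspected.
const currentInteraction = hasConsecutiveModelErrors(messages.slice(interactionStartIndex));
```

Under these assumptions `wholeSession` is `true` while `currentInteraction` is `false`, which is the difference the one-line change makes.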

package/src/server/config.js CHANGED

@@ -23,6 +23,7 @@ export const PATHS = {
   identityFile: path.join(JARVIS_DIR, 'data', 'identity.md'),
   skillsDir: path.join(JARVIS_DIR, 'data', 'skills'),
   cronsFile: path.join(JARVIS_DIR, 'data', 'crons.json'),
+  uploadsDir: path.join(JARVIS_DIR, 'uploads'),
   systemPromptFile: path.join(__dirname, '..', '..', 'docs', 'system-prompt.md'),
 };

package/src/server/tools.js CHANGED

@@ -379,6 +379,37 @@ const SEED_TOOLS = {
       };
     `,
   },
+  send_file: {
+    definition: {
+      type: 'function',
+      function: {
+        name: 'send_file',
+        description: 'Send a file from disk to the user in the current chat (e.g. Telegram). Supports any file type: images, PDFs, text files, archives, etc. Use a caption to describe the file.',
+        parameters: {
+          type: 'object',
+          properties: {
+            path: {
+              type: 'string',
+              description: 'Absolute or ~ path to the file to send.',
+            },
+            caption: {
+              type: 'string',
+              description: 'Optional caption displayed with the file.',
+            },
+          },
+          required: ['path'],
+        },
+      },
+    },
+    code: `
+      const _p = args.path;
+      const targetPath = path.resolve(_p === '~' || _p.startsWith('~/') ? require('os').homedir() + _p.slice(1) : _p);
+      if (!fs.existsSync(targetPath)) return { status: 'error', error: 'File not found: ' + targetPath };
+      if (typeof sendFile !== 'function') return { status: 'error', error: 'send_file is not supported in this channel.' };
+      await sendFile(targetPath, args.caption || '');
+      return { status: 'ok', path: targetPath };
+    `,
+  },
   create_cron: {
     definition: {
       type: 'function',
@@ -793,13 +824,13 @@ export function getToolDefinitions(tools) {
   return defs;
 }
-export async function executeTool(tools, name, toolArgs) {
+export async function executeTool(tools, name, toolArgs, { sendFile = null } = {}) {
   const tool = tools[name];
   if (!tool) {
     throw new Error(`Unknown tool: ${name}`);
   }
-  const fn = new AsyncFunction('args', 'fs', 'path', 'process', 'require', '__jarvisDir', tool.code);
+  const fn = new AsyncFunction('args', 'fs', 'path', 'process', 'require', '__jarvisDir', 'sendFile', tool.code);
   // Tools can declare their own timeout (e.g. system_install needs 5 min).
   // Falls back to the global default of 60s.
@@ -812,5 +843,5 @@ export async function executeTool(tools, name, toolArgs) {
     )
   );
-  return await Promise.race([fn(toolArgs, fs, path, process, _require, __jarvisDir), timeout]);
+  return await Promise.race([fn(toolArgs, fs, path, process, _require, __jarvisDir, sendFile), timeout]);
 }
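
Note on the `executeTool` change above: the callback reaches tool code as an extra named parameter of the compiled `AsyncFunction`. A reduced sketch of that mechanism follows. `runToolCode`, its `timeoutMs` option, and the trimmed `toolCode` string are hypothetical simplifications; the real function also injects `fs`, `path`, `process`, `require`, and `__jarvisDir`, and the real tool checks that the file exists first.

```js
// AsyncFunction is not a global binding; derive it from an async function.
const AsyncFunction = Object.getPrototypeOf(async function () {}).constructor;

// Hypothetical, simplified runner: compiles `code` with `args` and `sendFile`
// as parameter names, then races the call against a timeout.
async function runToolCode(code, args, { sendFile = null, timeoutMs = 1000 } = {}) {
  const fn = new AsyncFunction('args', 'sendFile', code);
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error('Tool timed out')), timeoutMs);
  });
  try {
    return await Promise.race([fn(args, sendFile), timeout]);
  } finally {
    clearTimeout(timer); // do not keep the process alive after the race settles
  }
}

// Trimmed version of the send_file body (file-existence check omitted).
const toolCode = `
  if (typeof sendFile !== 'function') return { status: 'error', error: 'send_file is not supported in this channel.' };
  await sendFile(args.path, args.caption || '');
  return { status: 'ok', path: args.path };
`;
```

Calling `runToolCode(toolCode, { path: '/tmp/a.txt' })` without a callback resolves to the error object, which matches the documented behavior for channels that do not register `onSendFile`.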