npm - @ducci/jarvis - Versions diffs - 1.0.38 → 1.0.40 - Mend

@ducci/jarvis 1.0.38 → 1.0.40

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (32)

package/docs/agent.md +43 -4
package/docs/crons.md +100 -0
package/docs/identity.md +38 -0
package/docs/skills.md +77 -0
package/docs/system-prompt.md +25 -13
package/docs/telegram.md +61 -2
package/package.json +2 -1
package/src/channels/telegram/index.js +65 -0
package/src/server/agent.js +59 -19
package/src/server/app.js +125 -2
package/src/server/config.js +43 -0
package/src/server/cron-scheduler.js +35 -0
package/src/server/crons.js +106 -0
package/src/server/tools.js +234 -72
package/docs/findings/001-context-explosion.md +0 -116
package/docs/findings/002-handoff-edge-cases.md +0 -84
package/docs/findings/003-event-loop-blocking-and-reliability.md +0 -120
package/docs/findings/004-agent-reliability-improvements.md +0 -162
package/docs/findings/005-installation-timeout.md +0 -128
package/docs/findings/006-malformed-tool-schema.md +0 -118
package/docs/findings/007-telegram-errors-and-handoff-stalling.md +0 -271
package/docs/findings/008-exec-timeout-architecture.md +0 -118
package/docs/findings/009-non-string-response-field.md +0 -153
package/docs/findings/010-checkpoint-field-type-safety.md +0 -121
package/docs/findings/011-empty-model-response.md +0 -157
package/docs/findings/012-empty-nudge-loses-recovery-text.md +0 -121
package/docs/findings/013-stderr-visibility-and-truncation.md +0 -59
package/docs/findings/014-exec-stderr-artifact-and-malformed-tool-args.md +0 -202
package/docs/findings/015-failed-run-context-strip.md +0 -142
package/docs/findings/016-file-writing-corruption-and-stderr-loop.md +0 -119
package/docs/findings/017-looping-intervention-and-lossy-checkpoint.md +0 -110
package/docs/findings/018-anthropic-oauth-token-support.md +0 -72

package/docs/agent.md CHANGED

@@ -32,13 +32,22 @@ Respond with your normal JSON, but add a checkpoint field:
   "logSummary": "Human-readable summary of what happened in this run.",
   "checkpoint": {
     "progress": "What has been fully completed so far.",
-    "remaining": "What still needs to be done to finish the task."
+    "remaining": "What still needs to be done to finish the task.",
+    "failedApproaches": "Comma-separated list of approaches already tried that did not work.",
+    "state": { "key": "value" }
   }
 }
 The checkpoint field will be used to automatically resume the task in the next run.]
 ```
+The checkpoint object has four fields:
+- `progress` — what has been fully completed so far.
+- `remaining` — what still needs to be done to finish the task. The server uses this as the starting prompt for the next run.
+- `failedApproaches` — a record of approaches already attempted that did not work, so the next run does not repeat them. This is preserved in `session.metadata` and injected into each subsequent resume prompt.
+- `state` — a flat key-value JSON object for concrete facts confirmed by tool output (file paths, binary locations, config values, etc.). It is merged into `session.metadata.checkpointState` across handoffs and injected as known facts into the next resume prompt, so the agent does not need to re-discover information it already found.
 2. The server reads `checkpoint.remaining` from the response and uses it as the starting prompt for a fresh agent run.
 3. The server marks the run status as `checkpoint_reached`.
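A minimal sketch of how `state` and `failedApproaches` could be carried across handoffs, assuming a `session.metadata` object shaped as described above. The helper names `applyCheckpoint` and `buildResumePrompt`, the prompt wording, and the choice to overwrite (rather than append to) `failedApproaches` are all hypothetical and not taken from the package source.

```js
// Hypothetical helper: fold a checkpoint into session metadata.
// Assumes session.metadata may be missing on the first handoff.
function applyCheckpoint(session, checkpoint) {
  const meta = session.metadata ?? (session.metadata = {});
  // Merge facts; newer values overwrite older ones with the same key.
  meta.checkpointState = { ...(meta.checkpointState ?? {}), ...(checkpoint.state ?? {}) };
  if (typeof checkpoint.failedApproaches === 'string' && checkpoint.failedApproaches.trim()) {
    meta.failedApproaches = checkpoint.failedApproaches.trim();
  }
  return meta;
}

// Hypothetical helper: build the starting prompt for the next run.
function buildResumePrompt(checkpoint, meta) {
  const lines = [String(checkpoint.remaining ?? '')];
  const facts = Object.entries(meta.checkpointState ?? {});
  if (facts.length > 0) {
    lines.push('Known facts:');
    for (const [key, value] of facts) lines.push(`- ${key}: ${value}`);
  }
  if (meta.failedApproaches) lines.push(`Do not retry: ${meta.failedApproaches}`);
  return lines.join('\n');
}

const session = { metadata: { checkpointState: { nodePath: '/usr/bin/node' } } };
const checkpoint = {
  progress: 'Installed dependencies.',
  remaining: 'Run the build.',
  failedApproaches: 'yarn install',
  state: { buildDir: '/tmp/build' },
};
const meta = applyCheckpoint(session, checkpoint);
console.log(buildResumePrompt(checkpoint, meta));
```

Facts from earlier handoffs (`nodePath`) survive alongside the new one (`buildDir`), which is the point of merging rather than replacing.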
@@ -138,17 +147,36 @@ Interaction flow:
 The authoritative system prompt text lives in [docs/system-prompt.md](./system-prompt.md). It is sent as the first message (`role: "system"`) in every session and stored verbatim in the conversation history.
+Four placeholders are injected at runtime before the system prompt is sent to the model — none of them are ever written back to disk or stored in conversation history:
+- `{{identity}}` — replaced with the full contents of `~/.jarvis/data/identity.md`. This is freeform text that describes the agent's persona and behavior.
+- `{{skills}}` — replaced with a rendered list of available skills (name + description) loaded from `~/.jarvis/data/skills/`. This lets the model know which skills exist and what they do without embedding full skill content in every request.
+- `{{session_id}}` — replaced with the current session UUID.
+- `{{user_info}}` — replaced with the current contents of `user-info.json`. If no user info exists, replaced with `(none yet)`.
 ## Tools
 All tools — built-in and user-defined — live in a single registry file (`tools.json`) and are executed via the same `new Function()` path. There is no separate execution mechanism for built-ins.
 **Built-in tools** (seeded into `tools.json` on first server start if missing):
-- `get_recent_sessions` — returns the most recent sessions (default: last 2, configurable via `limit`)
-- `read_session_log` — returns JSONL log entries for a given session; the agent-accessible way to inspect failures and previous run summaries
+- `list_dir` — lists directory contents (ls -la)
+- `exec` — runs arbitrary shell commands; 5-minute timeout
+- `write_file` — writes a file directly via fs.promises.writeFile, bypassing shell escaping; supports optional `mode` parameter for executable scripts
 - `save_user_info` — persists user facts to `user-info.json`
 - `read_user_info` — returns all stored user facts
-- `exec` — runs arbitrary shell commands as the server user; no safeguards
+- `get_recent_sessions` — returns the most recent sessions
+- `read_session_log` — returns JSONL log entries for a given session
+- `npm_install` — installs an npm package into the jarvis project directory
+- `system_install` — installs a system binary via brew/apt-get/snap; 5-minute timeout
+- `perplexity_search` — web search via Perplexity AI
+- `read_skill` — reads the full content of a skill by name from `~/.jarvis/data/skills/<name>/skill.md`
+- `get_current_time` — returns current server time; used before scheduling crons with relative times
+- `create_cron` — creates a scheduled cron job and writes to `crons.json`; activates immediately without restart
+- `list_crons` — lists all scheduled cron jobs
+- `delete_cron` — removes a cron job by name or id
+- `send_telegram_message` — sends a proactive message to the Telegram user; used inside cron prompts
+- `read_cron_log` — reads the JSONL execution log for a given cron id
 If a built-in entry is missing from `tools.json` at startup, the server re-seeds it from its default definition. This means built-ins can be inspected and edited in place, and will be restored if accidentally deleted.
@@ -306,6 +334,10 @@ To support persistent tracking (like `handoffCount`), each file contains a JSON
 The system prompt is stored as the first message in the `messages` array. The full turn sequence — user → assistant (with tool_calls) → tool → assistant (final) — is stored verbatim so that subsequent requests can be sent to the provider without any transformation.
+## Sliding Window
+`prepareMessages()` applies a sliding window before every model call: it always includes the system prompt (`messages[0]`) plus the most recent `contextWindow` messages (default 100, configurable via `settings.json`). The full message history is always preserved on disk — only what is sent to the model is trimmed. This prevents context overflow on long sessions without losing data.
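The windowing rule described above can be sketched as follows. This is an illustration under stated assumptions, not the package's `prepareMessages()`: the signature and the handling of a zero-sized window are guesses.

```js
// Sketch only: keep messages[0] (system prompt) plus the last `contextWindow` messages.
function prepareMessages(messages, contextWindow = 100) {
  if (messages.length <= 1) return messages.slice();
  const [systemPrompt, ...rest] = messages;
  // slice(-0) would return the whole array, so guard the zero case explicitly.
  const recent = contextWindow > 0 ? rest.slice(-contextWindow) : [];
  return [systemPrompt, ...recent];
}

const history = [
  { role: 'system', content: 'system prompt' },
  { role: 'user', content: 'first' },
  { role: 'assistant', content: 'second' },
  { role: 'user', content: 'third' },
  { role: 'assistant', content: 'fourth' },
];

const sent = prepareMessages(history, 2);
console.log(sent.map((m) => m.content)); // [ 'system prompt', 'third', 'fourth' ]
console.log(history.length); // 5, the stored history is untouched
```

A real implementation probably also has to avoid starting the window on a `tool` message whose matching assistant `tool_calls` message was trimmed, since OpenAI-compatible providers reject orphaned tool results. The sketch does not handle that case.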
 ## Provider Message Format
 When sending the conversation to OpenRouter, messages must follow the OpenAI-compatible chat format.
@@ -599,3 +631,10 @@ Tool inputs/outputs:
 - `get_recent_sessions`
   - Input: `{ "limit": 2 }`
   - Output: `{ "status": "ok", "sessions": [{ "sessionId": "...", "title": "...", "lastTs": "..." }] }`
+## See Also
+- [docs/system-prompt.md](./system-prompt.md) — the authoritative system prompt text
+- [docs/identity.md](./identity.md) — the agent's persona and identity configuration (`~/.jarvis/data/identity.md`)
+- [docs/skills.md](./skills.md) — the skills system (per-skill `skill.md` files, how they are listed and read)
+- [docs/crons.md](./crons.md) — the cron scheduler (job format, `crons.json`, execution loop, logging)

package/docs/crons.md ADDED

@@ -0,0 +1,100 @@
+# Crons
+Crons let you schedule recurring or one-time tasks. The agent executes the task autonomously and optionally notifies you via Telegram.
+## Storage
+All cron jobs are stored in `~/.jarvis/data/crons.json`:
+```json
+[
+  {
+    "id": "550e8400-e29b-41d4-a716-446655440000",
+    "name": "backup-nightly",
+    "schedule": "0 3 * * *",
+    "prompt": "Backup folder /home/xyz to /backups/xyz. When done, use send_telegram_message to notify the user with the result.",
+    "once": false,
+    "createdAt": "2026-03-11T10:00:00.000Z"
+  }
+]
+```
+## How a Cron Runs
+When a cron fires:
+1. A **fresh agent run** starts with no prior conversation context — only the stored `prompt`
+2. The agent executes the task, optionally calling `send_telegram_message` to notify you
+3. The result is logged to `~/.jarvis/logs/cron-<id>.jsonl`
+4. A synthetic message is appended to your Telegram session so the agent has context if you reply:
+```
+[Cron "backup-nightly" | 2026-03-11 03:00] Backup completed. 2.3GB written to /backups/xyz.
+```
+5. If `once: true`, the cron deletes itself after firing
+## Scheduling
+Crons use standard cron expressions:
+| Expression | Meaning |
+|---|---|
+| `0 3 * * *` | Every day at 3am |
+| `0 */2 * * *` | Every 2 hours |
+| `0 9 * * 1` | Every Monday at 9am |
+| `30 14 11 3 *` | March 11 at 14:30 (fires once when `once: true` is set, otherwise every year) |
+For one-time tasks specified as relative times ("in 2 hours", "at 3pm today"), the agent calls `get_current_time` first, calculates the exact schedule, and sets `once: true`.
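To illustrate the relative-time calculation, here is a small sketch that converts a concrete `Date` into a five-field cron expression. The helper name `cronExpressionForDate` is hypothetical, and the sketch assumes the scheduler interprets expressions in server local time.

```js
// Hypothetical helper: minute hour day-of-month month day-of-week.
function cronExpressionForDate(date) {
  return [
    date.getMinutes(),
    date.getHours(),
    date.getDate(),
    date.getMonth() + 1, // Date months are 0-based, cron months are 1-based
    '*',
  ].join(' ');
}

// "in 2 hours", relative to a fixed "now" so the example is reproducible.
const now = new Date(2026, 2, 11, 12, 30); // 11 March 2026, 12:30 local time
const target = new Date(now.getTime() + 2 * 60 * 60 * 1000);

const job = { name: 'reminder', schedule: cronExpressionForDate(target), once: true };
console.log(job.schedule); // 30 14 11 3 *
```

Because the expression carries no year, `once: true` is what keeps the job from firing again on the same date next year.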
+## Notifications
+Notification is opt-in via the prompt. Include this in the prompt when you want a notification:
+> "When done, use `send_telegram_message` to notify the user with the result."
+If you don't want a notification, omit it. The agent follows the prompt literally — conditional notifications work naturally:
+> "Check disk usage. If any partition is above 90%, use `send_telegram_message` to alert the user. Otherwise do nothing."
+## Dynamic Scheduling
+When `create_cron` runs successfully, the agent loop immediately registers the new cron in the in-memory scheduler — no server restart required. `delete_cron` unregisters it immediately as well.
+On server restart, all crons in `crons.json` are re-loaded and rescheduled. `once: true` crons that already fired (and deleted themselves) are gone from the file and will not re-run.
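The register/unregister behaviour can be sketched with an in-memory map. In the package this role is presumably played by `node-cron` (added to `package.json` in this release); to stay runnable without that dependency, the sketch takes the scheduling function as a parameter and only assumes it returns a task object with a `stop()` method. `createCronRegistry` and its method names are hypothetical.

```js
// Sketch of an in-memory registry keyed by cron id.
function createCronRegistry(scheduleFn) {
  const tasks = new Map(); // cron id -> running task

  return {
    register(cron, run) {
      // Stop and replace any existing task with the same id.
      tasks.get(cron.id)?.stop();
      tasks.set(cron.id, scheduleFn(cron.schedule, () => run(cron)));
    },
    unregister(id) {
      const task = tasks.get(id);
      if (!task) return false;
      task.stop();
      tasks.delete(id);
      return true;
    },
    // Called once at startup with the parsed contents of crons.json.
    loadAll(crons, run) {
      for (const cron of crons) this.register(cron, run);
    },
    size: () => tasks.size,
  };
}

// Fake scheduler for demonstration: records stop() calls instead of timing anything.
const stopped = [];
const fakeSchedule = (expression, callback) => ({ stop: () => stopped.push(expression) });

const registry = createCronRegistry(fakeSchedule);
registry.loadAll(
  [{ id: 'a', schedule: '0 3 * * *' }, { id: 'b', schedule: '0 9 * * 1' }],
  () => {}
);
registry.unregister('a');
console.log(registry.size(), stopped); // 1 [ '0 3 * * *' ]
```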
+## Logs
+Each cron has its own JSONL log at `~/.jarvis/logs/cron-<id>.jsonl`. One entry per run:
+```json
+{
+  "ts": "2026-03-11T03:00:01.234Z",
+  "cronName": "backup-nightly",
+  "status": "ok",
+  "response": "Backup completed. 2.3GB written to /backups/xyz.",
+  "logSummary": "Ran rsync from /home/xyz to /backups/xyz. Exit code 0."
+}
+```
+Use `read_cron_log` to inspect past runs. Ask Jarvis "did my backup run last night?" and it will call `list_crons` + `read_cron_log`.
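A sketch of the JSONL convention (one JSON object per line), kept free of file I/O so it can run anywhere. The helper names are hypothetical; the real `read_cron_log` tool reads the file from `~/.jarvis/logs/` and may shape its output differently.

```js
// Hypothetical helpers for the one-entry-per-line log format.
function formatCronLogEntry(entry) {
  return JSON.stringify(entry) + '\n';
}

function parseCronLog(text) {
  return text
    .split('\n')
    .filter((line) => line.trim() !== '') // ignore the trailing empty line
    .map((line) => JSON.parse(line));
}

let log = '';
log += formatCronLogEntry({ ts: '2026-03-11T03:00:01.234Z', cronName: 'backup-nightly', status: 'ok' });
log += formatCronLogEntry({ ts: '2026-03-12T03:00:00.987Z', cronName: 'backup-nightly', status: 'error' });

const entries = parseCronLog(log);
const lastRun = entries.at(-1);
console.log(entries.length, lastRun.status); // 2 error
```

Appending a line never requires rewriting earlier entries, which is why JSONL suits an append-only run history.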
+## Tools
+| Tool | Purpose |
+|---|---|
+| `create_cron` | Schedule a new cron job |
+| `list_crons` | List all active crons |
+| `delete_cron` | Remove a cron by name or id |
+| `read_cron_log` | Read execution history for a cron |
+| `get_current_time` | Get current server time for relative scheduling |
+| `send_telegram_message` | Send a proactive message to the Telegram user |
+## Triggering Without Saying "Cron"
+The system prompt instructs the agent to recognise scheduling intent from natural language. Examples that will create a cron:
+- "every night at 3am, backup my projects folder"
+- "remind me in 2 hours"
+- "check my server disk usage every day and alert me if it's getting full"
+- "send me a good morning message every day at 8am"

package/docs/identity.md ADDED

@@ -0,0 +1,38 @@
+# Identity
+This document describes how Jarvis's identity is defined and injected.
+## What It Is
+`~/.jarvis/data/identity.md` is a plain Markdown file that defines who the agent is — its name, purpose, tone, and communication style. It is loaded at runtime and injected into the system prompt via the `{{identity}}` placeholder on every request.
+This means you can change how Jarvis behaves without touching the system prompt or restarting the server. Editing `identity.md` takes effect on the next message.
+## Default Content
+Created automatically on first server start if the file does not exist:
+```md
+# Identity
+You are Jarvis, a fully autonomous agent running on a local server. You have access to tools and can execute shell commands on the machine you run on.
+Be concise and direct in your responses. Avoid unnecessary filler. When a task is done, say so clearly.
+```
+## How It Is Injected
+`resolveSystemPrompt()` in `src/server/config.js` reads `identity.md` at call time and substitutes it for `{{identity}}` in the system prompt template. The resolved prompt is sent to the model but never written to disk — the placeholder is always preserved in the stored session history.
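The substitution step can be sketched like this. It is not the package's `resolveSystemPrompt()`: that function reads `identity.md` and the other sources from disk, whereas this hypothetical `resolvePlaceholders` takes the values as an argument so the example is self-contained.

```js
// Sketch only: replace each {{name}} with its value; unknown placeholders stay as-is.
function resolvePlaceholders(template, values) {
  return template.replace(/\{\{(\w+)\}\}/g, (match, name) =>
    Object.hasOwn(values, name) ? String(values[name]) : match
  );
}

const template = [
  '## Identity',
  '{{identity}}',
  '## Session',
  'Current session ID: {{session_id}}',
  '## Known User Context',
  '{{user_info}}',
].join('\n');

const resolved = resolvePlaceholders(template, {
  identity: 'You are Jarvis.',
  session_id: '550e8400-e29b-41d4-a716-446655440000',
  user_info: '(none yet)',
});

console.log(resolved);
```

Using a replacer function rather than a replacement string means sequences such as `$&` inside `identity.md` are inserted literally instead of being interpreted by `String.prototype.replace`. The stored `template` is never mutated, which matches the rule that placeholders stay in the session history.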
+## Customisation
+Edit `~/.jarvis/data/identity.md` directly. Examples of what you can change:
+- **Name** — rename the agent to anything
+- **Tone** — formal, casual, verbose, terse
+- **Domain** — focus the agent on a specific area (e.g. "You are a security researcher...")
+- **Personality** — add quirks, communication preferences, or constraints
+## What Belongs Here vs. the System Prompt
+`identity.md` is for **who the agent is**. The system prompt (`docs/system-prompt.md`) is for **how the agent must behave** — response format, tool use rules, exec safety, failure recovery. Keep technical rules in the system prompt where they cannot be accidentally deleted.

package/docs/skills.md ADDED

@@ -0,0 +1,77 @@
+# Skills
+Skills are predefined workflows that guide how the agent approaches specific tasks. Unlike tools (which execute code), skills are instructions written in Markdown — they tell the agent how to do something rather than doing it directly.
+## Folder Structure
+Each skill lives in its own subdirectory under `~/.jarvis/data/skills/`:
+```
+~/.jarvis/data/skills/
+  <skill-name>/
+    skill.md       ← required: frontmatter + instructions
+    *.js / *.sh    ← optional: bundled scripts the skill references
+```
+## skill.md Format
+Every `skill.md` starts with YAML frontmatter:
+```yaml
+---
+name: skill-name
+description: What this skill does and when to use it. Use this when the user asks to...
+---
+# Skill Title
+Instructions for the agent...
+```
+The `description` field is the only signal the agent has to decide whether to load the skill. Write it so the agent reliably recognises when the skill applies — be specific about the task type and include a "Use this when..." clause.
+Bad: `"Manages ports."`
+Good: `"Scan a target host for open ports using nmap and return a structured report. Use this when the user asks to scan ports or check what services are running on a host."`
+## How Skills Are Used
+At runtime, `resolveSystemPrompt()` reads all skill directories and builds a list of available skills (name + description only) injected via the `{{skills}}` placeholder in the system prompt. The agent sees this list on every request and decides which skill (if any) is relevant.
+When the agent decides to use a skill, it calls the `read_skill` tool to fetch the full instructions:
+```json
+{ "name": "skill-name" }
+```
+The tool returns the full `skill.md` content. The agent then follows the instructions.
+This two-step approach (list in system prompt → full content on demand) keeps the prompt small while making all skills discoverable.
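A sketch of the first step, building the short list that stands in for `{{skills}}`. The parser handles only the simple `key: value` frontmatter shown above and is not a full YAML parser; the function names, the list format, and the empty-list text are assumptions.

```js
// Hypothetical parser for the leading "---" frontmatter block of a skill.md file.
function parseSkillFrontmatter(markdown) {
  const match = /^---\r?\n([\s\S]*?)\r?\n---/.exec(markdown);
  if (!match) return null;
  const fields = {};
  for (const line of match[1].split(/\r?\n/)) {
    const idx = line.indexOf(':');
    if (idx === -1) continue;
    // Split on the first colon only, so descriptions may contain colons.
    fields[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }
  return fields.name && fields.description ? fields : null;
}

// Render name + description only; full instructions are fetched later via read_skill.
function renderSkillList(skillFiles) {
  const lines = skillFiles
    .map((markdown) => parseSkillFrontmatter(markdown))
    .filter(Boolean)
    .map((skill) => `- ${skill.name}: ${skill.description}`);
  return lines.length > 0 ? lines.join('\n') : '(no skills installed)';
}

const example = [
  '---',
  'name: scan-open-ports',
  'description: Scan a target host for open ports using nmap. Use this when the user asks to scan ports.',
  '---',
  '# Scan Open Ports',
  'Instructions for the agent...',
].join('\n');

console.log(renderSkillList([example, 'a file without frontmatter']));
```

Files without valid frontmatter are skipped rather than rendered with blanks, so a broken skill cannot inject an empty entry into the prompt.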
+## Bundled Scripts
+A skill folder can contain scripts that the skill's instructions reference. Scripts are called via `exec`:
+```sh
+node ~/.jarvis/data/skills/<name>/script.js <args>
+```
+Always reference scripts by their absolute path. Use `write_file` to create scripts — never `exec+echo`.
+## Seed Skills
+Two skills are created on first server start if they do not exist:
+- **`add-two-integers`** — example skill demonstrating the skill + bundled script pattern
+- **`manage-skill`** — create, edit, or delete skills; includes guidance on what makes a good skill
+## Creating and Managing Skills
+Use the `manage-skill` skill. The agent will read it when asked to create, edit, or list skills.
+## What Makes a Good Skill
+- Describes a **workflow or approach**, not a single command
+- Name is specific and lowercase with hyphens (`scan-open-ports`, not `scanning`)
+- Description reliably signals to the agent when to use it (see example above)
+- Instructions are written for the agent, not the user
+- Uses `write_file` for any file creation inside the skill workflow

package/docs/system-prompt.md CHANGED

@@ -2,21 +2,43 @@
 This is the authoritative system prompt sent to the model at the start of every session. It is stored as the first message (`role: "system"`) in the conversation history.
-Before sending to the model, the server replaces the `{{user_info}}` and `{{session_id}}` placeholders at runtime on every request — these are never stored in the conversation history.
+Before sending to the model, the server replaces the `{{identity}}`, `{{skills}}`, `{{user_info}}` and `{{session_id}}` placeholders at runtime on every request; the resolved values are never stored in the conversation history.
 ---
 ```
-You are Jarvis, a fully autonomous agent running on a local server. You have access to tools and can execute shell commands on the machine you run on.
+## Identity
+{{identity}}
 ## Session
 Current session ID: {{session_id}}
+Only the most recent messages are included in your context (sliding window). Older messages are stored on disk but not sent to you. If the user references something you cannot find in the conversation, explain that it may have scrolled out of your context window and ask them to repeat the relevant detail.
 ## Known User Context
 {{user_info}}
+## Crons
+You can schedule recurring or one-time tasks using cron jobs.
+- Use `create_cron` when the user wants to schedule something — even if they don't say "cron". Triggers: "every night", "every 2 hours", "remind me at 3pm", "notify me in 2 hours", "check X every Monday", etc.
+- Call `get_current_time` first when the user specifies a relative time (e.g. "in 2 hours") so you can calculate the correct cron expression.
+- The `prompt` stored in the cron is executed by a fresh agent with no prior conversation context. Write it as a complete, self-contained instruction.
+- If the user wants to be notified, include "use send_telegram_message to notify the user with the result" in the prompt. If they explicitly don't want a notification, omit it.
+- For one-time tasks, set `once: true` — the cron deletes itself after firing.
+- Use `list_crons` to show active crons, `delete_cron` to remove one, `read_cron_log` to inspect past runs.
+## Skills
+Skills are predefined workflows that guide how you approach specific tasks. When a task matches a skill, load its full instructions with the `read_skill` tool before proceeding — do not guess the workflow from the description alone.
+Available skills:
+{{skills}}
 ## Response Format
 There are two types of responses depending on whether you need to use tools:
@@ -40,7 +62,7 @@ You have access to a set of tools. Each tool has a name and description that tel
 - Always use a tool to perform an action. Never claim to have done something without actually calling the relevant tool.
 - Call tools one at a time. You will receive the result before deciding on the next step.
-- After a tool call, verify the result before declaring the task done.
+- After a tool call, verify the result before declaring the task done. Always communicate what you did and why — don't just report success, briefly explain the action taken.
 - Stop as soon as the task is complete and verified. Do not do extra work that was not asked for.
 - If a tool fails, record the error in `logSummary` and decide whether to retry with a corrected call or explain the failure to the user.
 - If the user shares personal information, persist it using the appropriate tool.
@@ -88,16 +110,6 @@ When a tool or command fails:
 - **Use `perplexity_search` sparingly.** At most 3 searches per topic per session. If the first search didn't give you what you need, try a different query angle once — then stop searching and work with what you have or report the gap.
 - **Escalate cleanly.** If you cannot make progress after two distinct approaches, give the user a clear explanation of what was attempted, what failed, and what they can do manually. A useful failure report is better than an infinite retry loop.
-## Tool Creation
-When building a custom tool with `save_tool`:
-- **Prefer npm packages** over reimplementing functionality from scratch. If a well-known package exists for the task (e.g. an API SDK, a parser, a utility library), use it.
-- **Installing an npm package**: use the `npm_install` tool — it handles the correct install directory automatically. Then create the tool with `save_tool`. The tool code can `require('<package-name>')` directly.
-- **Installing a system binary** (e.g. nuclei, jq, ffmpeg, git): use the `system_install` tool — never use exec for this. It auto-detects the available package manager (brew/apt-get/snap) and has a 5-minute timeout sized for real downloads.
-- **Available bindings in tool code**: `args`, `fs`, `path`, `process`, `require`, `__jarvisDir` (absolute path to the jarvis server directory).
-- **Long-running custom tools**: if your tool wraps an operation that takes more than 60 seconds (e.g. a network call, a slow computation), pass `timeout` in milliseconds to `save_tool` (max 600000 = 10 minutes). Example: `save_tool({ name: "run_scan", timeout: 300000, ... })`.
 ## logSummary Guidelines
 The `logSummary` is written for a human observer, not for the user. It must:

package/docs/telegram.md CHANGED

@@ -18,7 +18,7 @@ The channel calls the agent layer directly (no HTTP hop) — it imports and call
 ```
 Telegram user
-    ↓ (text message)
+    ↓ (text or photo message)
 Telegram Bot API  ←→  grammy-runner (long polling)
     ↓
 Channel adapter (src/channels/telegram/index.js)
@@ -196,6 +196,20 @@ Log lines use a simple prefix format, written to stdout (captured by PM2 alongsi
 No JSONL session logging — that is handled by the agent layer for every run.
+## Proactive Notifications
+The Telegram channel supports proactive outbound messages initiated by the agent, not by the user. This is used by the cron system.
+**`send_telegram_message` tool**: any agent run (including cron runs) can call this tool to send a message directly to the configured Telegram user. The tool reads the bot token from `TELEGRAM_BOT_TOKEN` and the chat ID from `channels.telegram.allowedUserIds[0]` in `settings.json`. For private Telegram chats, `chat_id === user_id`.
+**Synthetic cron messages**: after a cron run completes, the cron runner appends a synthetic assistant message to the user's normal Telegram session so the agent has context if the user replies:
+```
+[Cron "backup-nightly" | 2026-03-11 03:00] Backup completed successfully. 2.3GB written to /backups/xyz.
+```
+This uses the session queue (`withSessionLock`) to avoid race conditions if the user is chatting simultaneously.
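One common way to build such a per-session queue is to chain promises by session ID. The sketch below is an assumption about the general technique, not the package's `withSessionLock`; its signature and internals may differ.

```js
// Sketch of a per-session queue: tasks for the same sessionId run strictly in order.
const sessionQueues = new Map(); // sessionId -> promise for the last queued task

function withSessionLock(sessionId, task) {
  const previous = sessionQueues.get(sessionId) ?? Promise.resolve();
  // Start this task only after the previous one has settled.
  const current = previous.then(() => task());
  // Store a promise that never rejects, so one failure does not block the queue.
  const tail = current.catch(() => {});
  sessionQueues.set(sessionId, tail);
  // Drop the entry once the queue drains, to avoid unbounded growth.
  tail.then(() => {
    if (sessionQueues.get(sessionId) === tail) sessionQueues.delete(sessionId);
  });
  return current;
}

const messages = [];
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// A slow user turn and a cron append arrive at almost the same time.
const done = Promise.all([
  withSessionLock('s1', async () => { await sleep(20); messages.push('user reply'); }),
  withSessionLock('s1', async () => { messages.push('[Cron "backup-nightly"] done'); }),
]);

done.then(() => console.log(messages));
```

Even though the cron append is instantaneous, it lands after the slower user turn because it was queued second, so the two writers never interleave on the same session file.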
 ## Commands
 ### `/new` — Start a fresh session
@@ -204,6 +218,10 @@ Sending `/new` resets the conversation. The `chat_id → sessionId` mapping for
 The next text message after `/new` will create a new session as if the user were messaging for the first time.
+### `/usage` — Show token usage
+Sending `/usage` displays the token usage for the current session: input tokens, output tokens, total, and (if non-zero) Anthropic prompt cache read/write tokens. If no session exists or no tokens have been recorded yet, a short message is shown instead.
 **Command registration**
 Commands are registered with the Telegram Bot API at startup via `bot.api.setMyCommands()`. This makes them visible to users in two places:
@@ -216,6 +234,7 @@ Without registration the command still works if typed manually, but users would
 ```js
 await bot.api.setMyCommands([
   { command: 'new', description: 'Start a fresh session' },
+  { command: 'usage', description: 'Show token usage for this session' },
 ]);
 ```
@@ -227,9 +246,49 @@ await bot.api.setMyCommands([
 | User sends `/new`, no session exists yet | No-op, same confirmation sent |
 | Next text message after `/new` | New session created, mapped to `chat_id` |
+## Photo Support
+The bot handles incoming photos (`message:photo`) in addition to text. When a user sends a photo, the adapter selects the highest resolution that is at most 800px wide to keep token usage reasonable, downloads it, and passes the image as a base64 data URL, together with the optional caption, to the agent as multimodal content.
+### Photo selection
+Telegram delivers each photo as an array of `PhotoSize` objects, typically in several resolutions and, in practice, sorted ascending by resolution. The adapter picks the last entry with `width <= 800`:
+```js
+const photo = ctx.message.photo.filter(p => p.width <= 800).at(-1)
+  ?? ctx.message.photo[0]; // fallback: smallest if all variants exceed 800px
+```
+This gives the highest-quality image at or below the 800px threshold. Sending the full-resolution original would consume significantly more tokens for no practical benefit in most tasks.
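The selection rule, wrapped in a function and run against sample data. The `pickPhoto` name and the `file_id` values are made up for illustration; the filter expression is the one shown above.

```js
// Same rule as above: last entry at or below maxWidth, else the first (smallest) entry.
function pickPhoto(sizes, maxWidth = 800) {
  return sizes.filter((p) => p.width <= maxWidth).at(-1) ?? sizes[0];
}

// Sample PhotoSize-like objects, ascending by resolution.
const sizes = [
  { file_id: 'small', width: 90, height: 60 },
  { file_id: 'medium', width: 320, height: 213 },
  { file_id: 'large', width: 800, height: 533 },
  { file_id: 'original', width: 1280, height: 853 },
];

console.log(pickPhoto(sizes).file_id); // large
console.log(pickPhoto([{ file_id: 'huge', width: 2560, height: 1707 }]).file_id); // huge
```

The second call shows the fallback: when every variant exceeds the threshold, the first entry is used instead of failing.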
+### Download and base64 encoding
+The image is downloaded immediately at receive time using the Telegram file URL (`https://api.telegram.org/file/bot<token>/<file_path>`) and converted to a base64 data URL (`data:image/jpeg;base64,...`). The data URL is stored directly in the session message, so the image remains available across handoffs and future conversation turns without depending on a Telegram URL that would expire after ~1 hour. Base64 encoding does not cost more tokens than a URL — image token cost is based on pixel dimensions, not transport format.
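The encoding step in isolation, as a sketch. `toDataUrl` is a hypothetical helper; the `image/jpeg` default mirrors the MIME type hard-coded in the handler in `src/channels/telegram/index.js`.

```js
// Sketch: turn downloaded bytes (ArrayBuffer or Uint8Array) into a data URL.
function toDataUrl(bytes, mimeType = 'image/jpeg') {
  const base64 = Buffer.from(bytes).toString('base64');
  return `data:${mimeType};base64,${base64}`;
}

// The first three bytes of a JPEG file (FF D8 FF) stand in for a real download.
const sample = new Uint8Array([0xff, 0xd8, 0xff]).buffer;
console.log(toDataUrl(sample)); // data:image/jpeg;base64,/9j/
```

Base64 grows the stored payload by roughly a third (4 output characters per 3 input bytes), so the cost of this approach is session file size on disk rather than model tokens.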
+### Agent call
+Photos are passed to the agent as a multimodal content array instead of a plain string:
+```js
+const content = [
+  { type: 'image_url', url: dataUrl }, // base64 data URL built at receive time, not the expiring Telegram file URL
+];
+if (caption) content.push({ type: 'text', text: caption });
+```
+The agent layer must support receiving `content` as either a string or a content array and pass it through to the model accordingly.
+### Caption
+If the user attaches a caption to the photo (`ctx.message.caption`), it is included as a text block alongside the image. If there is no caption, only the image block is sent.
+### Unsupported media types
+Documents, audio, video, stickers, and other non-photo media types are not handled — the bot silently ignores them (same as unauthorized messages).
 ## Non-Goals (v1)
-- No support for photos, files, or other media types (text only)
+- No support for documents, audio, video, or other non-photo media types
 - No inline keyboards or callback queries
 - No group chat support (only private chats)
 - No message editing or deletion handling

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@ducci/jarvis",
-  "version": "1.0.38",
+  "version": "1.0.40",
   "description": "A fully automated agent system that lives on a server.",
   "main": "./src/index.js",
   "type": "module",
@@ -44,6 +44,7 @@
     "express": "^5.2.1",
     "grammy": "^1.40.1",
     "inquirer": "^12.11.1",
+    "node-cron": "^4.2.1",
     "openai": "^6.22.0",
     "pm2": "^6.0.14"
   },

package/src/channels/telegram/index.js CHANGED

@@ -60,6 +60,71 @@ export async function startTelegramChannel(config) {
     await ctx.reply('New session started.');
   });
+  bot.on('message:photo', async (ctx) => {
+    const userId = ctx.from?.id;
+    if (!allowedUserIds.includes(userId)) return;
+    const chatId = ctx.chat.id;
+    const sessionId = sessions[chatId] || null;
+    console.log(`[telegram] incoming photo chat_id=${chatId}`);
+    await ctx.api.sendChatAction(chatId, 'typing');
+    const typingInterval = setInterval(() => {
+      ctx.api.sendChatAction(chatId, 'typing').catch(() => {});
+    }, 4000);
+    let result;
+    try {
+      const photo = ctx.message.photo.filter(p => p.width <= 800).at(-1)
+        ?? ctx.message.photo[0];
+      const file = await ctx.api.getFile(photo.file_id);
+      const fileUrl = `https://api.telegram.org/file/bot${token}/${file.file_path}`;
+      const imgResponse = await fetch(fileUrl);
+      const buffer = await imgResponse.arrayBuffer();
+      const base64 = Buffer.from(buffer).toString('base64');
+      const dataUrl = `data:image/jpeg;base64,${base64}`;
+      const caption = ctx.message.caption || '';
+      result = await handleChat(config, sessionId, caption, [{ url: dataUrl }]);
+    } catch (e) {
+      console.error(`[telegram] agent error chat_id=${chatId}: ${e.message}`);
+      const errText = e.message
+        ? `Sorry, something went wrong: ${e.message}`
+        : 'Sorry, something went wrong. Please try again.';
+      await ctx.reply(errText).catch(() => {});
+      clearInterval(typingInterval);
+      return;
+    }
+    if (!sessions[chatId]) {
+      sessions[chatId] = result.sessionId;
+      save(sessions);
+      console.log(`[telegram] session created sessionId=${result.sessionId.slice(0, 8)}`);
+    }
+    try {
+      const MAX_TG = 4096;
+      const rawResponse = typeof result.response === 'string'
+        ? result.response
+        : result.response != null ? JSON.stringify(result.response, null, 2) : '';
+      const text = rawResponse.trim()
+        || 'The agent encountered an error and could not produce a response. Please try again.';
+      if (text.length <= MAX_TG) {
+        await ctx.reply(text);
+      } else {
+        for (let i = 0; i < text.length; i += MAX_TG) {
+          await ctx.reply(text.slice(i, i + MAX_TG));
+        }
+      }
+      console.log(`[telegram] response sent chat_id=${chatId} length=${text.length}`);
+    } catch (e) {
+      console.error(`[telegram] delivery error chat_id=${chatId}: ${e.message}`);
+      await ctx.reply('Sorry, something went wrong sending the response. Please try again.').catch(() => {});
+    } finally {
+      clearInterval(typingInterval);
+    }
+  });
   bot.on('message:text', async (ctx) => {
     const userId = ctx.from?.id;