npm - arisa - Versions diffs - 3.1.2 → 3.1.4 - Mend

arisa 3.1.2 → 3.1.4


Files changed (17)

package/AGENTS.md +59 -14
package/README.md +3 -3
package/package.json +1 -1
package/src/core/agent/agent-manager.js +56 -37
package/src/core/agent/runtime-context.js +2 -1
package/src/core/artifacts/normalize-for-reasoning.js +5 -4
package/src/core/skills/skill-registry.js +71 -0
package/src/core/tasks/task-store.js +1 -1
package/src/core/tools/daemon-runtime.js +2 -2
package/src/core/tools/tool-registry.js +38 -4
package/src/runtime/bootstrap.js +1 -1
package/src/runtime/paths.js +17 -8
package/src/transport/telegram/bot.js +95 -31
package/src/transport/telegram/media.js +20 -0
package/tools/openai-transcribe/index.js +1 -1
package/tools/openai-transcribe/tool.manifest.json +2 -2
package/docs/async-event-queue-flow.md +0 -68

package/AGENTS.md CHANGED

@@ -3,10 +3,21 @@
 ## Architecture
 - Telegram transport handles inbound and outbound messaging.
 - Pi Agent keeps one session per authorized chat.
-- Every incoming or generated message or file becomes an artifact.
+- Incoming messages and files (text, voice, photo, document) and generated files become artifacts.
 - A tool registry handles tool discovery, help lookup, config writes, and execution.
 - Tools are isolated and each one has its own manifest, entrypoint, and config defaults.
+## Runtime directory rules
+Do not build runtime paths by hand. Use `src/runtime/paths.js`:
+- `getToolDir(toolName)`: installed user tool package only; no runtime data here.
+- `getToolStateDir(toolName)`: global tool infrastructure only: daemons, queues, shared browser sessions, model caches.
+- `getChatToolStateDir(chatId, toolName)`: persistent user/chat data: tool DBs, indexes, inboxes, generated sites, vaults.
+- `getChatArtifactsDir(chatId)` / `getChatArtifactsIndexFile(chatId)`: chat artifacts and artifact index. Artifacts are never global.
+- `getChatToolConfigPath(chatId, toolName)`: chat-scoped config overrides.
+- `getToolTmpDir(toolName)` / `getChatToolTmpDir(chatId, toolName)`: ephemeral scratch. Create only while a request runs; remove when empty.
+Tools receive `chatId` from the registry. Any persisted or indexed user content must be scoped by chat. Avoid ad hoc roots like `~/.arisa/state/<toolName>`, `~/.arisa/state/chats`, or runtime data inside `~/.arisa/tools/<toolName>`.
 ## Main rule: everything is piped through artifacts
 A pipe transforms one input artifact into one output artifact.
 Examples:
@@ -18,6 +29,7 @@ Each tool declares in `tool.manifest.json`:
 - `input`: supported input types
 - `output`: produced output types
 - `configSchema`: required config fields
+- `skillHints`: optional skills to apply when using or editing the tool
 ## Conceptual pipe model
 There are two different moments where pipes can happen:
@@ -34,12 +46,11 @@ There are two different moments where pipes can happen:
    - Pi Agent may decide to chain tools to achieve a user goal.
    - Example: text -> TTS audio, or future multi-step workflows.
-This distinction is critical. Not every pipe should be decided by Pi Agent at runtime. Some pipes are part of the transport/input normalization layer and must happen before reasoning.
+Not every pipe should be decided by Pi Agent at runtime. Some pipes are part of the transport/input normalization layer and must happen before reasoning.
 ## Telegram inbound pipeline
-Current conceptual behavior:
 - text -> send directly to Pi Agent
-- audio/voice -> transcribe first -> send transcript to Pi Agent
+- voice -> transcribe first -> send transcript to Pi Agent
 - image/document/other media -> keep as artifacts, and add normalization pipes when needed
 If inbound media was normalized before reasoning, Pi Agent should use the normalized result as the actual message content.
@@ -50,23 +61,23 @@ Before using a tool, inspect its help:
 - via the custom tool: `tool_help`
 - or by running the CLI with `--help`
-Every CLI must support:
+Every CLI must support (the entrypoint comes from `manifest.entry`, currently always `index.js`):
 - `node index.js --help`
 - `node index.js run --request-file <json>`
 ### Tools that need daemons
-Some tools need a persistent process, for example to keep a browser session alive or a local model warm.
-Implement these tools with the shared daemon runtime instead of custom ad hoc process management:
+A future tool may need a persistent process, for example to keep a browser session alive or a local model warm. The shared daemon runtime exists for this, but no bundled tool uses it yet.
+When such a tool is built, implement it with the shared daemon runtime instead of custom ad hoc process management:
 - use `src/core/tools/daemon-runtime.js`
-- keep runtime files under the tool state directory (`stateDir/<toolName>`)
+- keep runtime files under the tool state directory (`~/.arisa/state/tools/<toolName>`)
 - expose normal CLI behavior through `run --request-file`; callers should not manage daemon internals
 - use the runtime for `daemon.pid`, `daemon.log`, `status.json`, and `commands/*.request|processing|result.json`
 - keep one daemon owner per tool/session and avoid opening a second client over the same resource
 - use `beforeStart` only for tool-specific cleanup such as stale browser locks, without deleting persistent session/model data
 - keep daemon tools headless/server-safe by default when they are meant to run on VPS machines
-## Pipe behavior in V1
-V1 does not have a full automatic planner yet. The agent should:
+## Manual pipe behavior
+To run a pipe, the agent should:
 1. understand whether the needed pipe belongs to pre-reasoning normalization or post-reasoning tool chaining
 2. use `list_tools`
 3. use `tool_help` when it needs operational details
@@ -76,7 +87,28 @@ V1 does not have a full automatic planner yet. The agent should:
 Example manual pipe:
 1. `run_tool(openai-transcribe, artifact audio)`
 2. take the returned text `artifactId`
-3. `run_tool(openai-tts, artifact text)` or `send_audio_reply(text)`
+3. `run_tool(openai-tts, artifact text)` or `send_media_reply(text)`
+## Async event queue flow
+Beyond time-based scheduling, tools can drive an event queue that wakes the agent only when there is something to evaluate. Everything goes through the `asyncTask` (single) or `asyncTasks` (array) field the pipeline already supports; no new Pi tools are needed. The 1s poller drains tasks by `kind`:
+- `agent_task`: a scheduled prompt. The poller delivers it as a prompt for Pi to fulfill (time-based work).
+- `poll_tool`: a recurring checker the poller **runs directly as a tool** (no agent turn spent). The poller materializes its output with the same logic as `run_tool`, so any `agent_event` the checker emits is enqueued for the next tick. Its `recurrence` reschedules the next poll.
+- `agent_event`: an incoming event. The poller delivers it as a prompt so Pi evaluates it and decides the next action (it may stay silent).
+Tasks without a `runAt` fire immediately, so `agent_event` and the first `poll_tool` run on the next tick.
+The poller dispatches all three kinds, but only `agent_task` is exercised by a bundled tool today (`schedule-agent-task`). The following is the pattern to follow when a checker tool is built:
+How a tool wires its own polling:
+1. From any tool `run`, start the poll by returning an `asyncTask` (or several in `asyncTasks`):
+   `{ kind: "poll_tool", payload: { toolName, args }, recurrence: { type: "interval", everySeconds: N } }`.
+2. On each poll the checker tool (`toolName`) runs headless. It keeps its own cursor of seen state in its config/tmp per chat, so it knows what is new.
+3. When the checker finds something new, it emits an event from its `run`:
+   `{ kind: "agent_event", payload: { prompt: "<content to evaluate>" } }`.
+4. The agent reasons over the `agent_event` and decides what to do.
+`list_scheduled_tasks`, `cancel_scheduled_task`, and `cancel_all_scheduled_tasks` are kind-agnostic, so they already work to inspect or cancel active polls.
 ## Missing config flow
 If `run_tool` returns `missingConfig`, the agent should:
@@ -101,13 +133,26 @@ The default attitude is:
 - propose or start creating the needed tool
 When creating or editing tools:
-- use the shared path helpers and the runtime paths provided in the prompt instead of assuming fixed locations
-- consult the local skill for that workflow when building new tools
+- use the path helpers in `src/runtime/paths.js`
+- follow the existing bundled tools under `tools/` as the reference pattern for new tools
 - keep all help text, usage instructions, manifests, and user-facing operational strings in English
 - follow the One Thing Rule: each function or method should do one thing well; if it mixes low-level operations with high-level policy, split it into smaller focused units
+### Tool skill hints
+Tools may declare skills in `tool.manifest.json`:
+```json
+{
+  "skillHints": [
+    { "name": "stop-slop", "when": "writing public page copy" }
+  ]
+}
+```
+The tool registry resolves these from the installed skills directory and injects them into the tool request as `skills`. `list_tools` exposes the hints and `tool_help` shows their resolution status. Skills are guidance for the agent/tool; they are not separate runtime dependencies.
 ## Dependency installation
-Arisa installs tool dependencies itself.
+Tool dependencies are installed as part of building or running the tool, not delegated to the user.
 - Prefer `pnpm install`.
 - Fall back to `npm install`.
 - Do not ask the user to do it manually.
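
The async event queue flow described in the AGENTS.md changes above can be sketched as follows. This is a minimal illustration, not code from the package: `startPolling`, `emitEvent`, and `checkForNew` are hypothetical helper names, and the result shape (`output.text` plus `asyncTasks`) is assumed from the fields that `runTool` reads in the `agent-manager.js` diff. Only the task object shapes (`kind`, `payload`, `recurrence`) come from the documented flow.

```javascript
// Hypothetical helper: builds the poll_tool task a tool returns to start polling.
function startPolling(toolName, args, everySeconds) {
  return {
    kind: "poll_tool",
    payload: { toolName, args },
    recurrence: { type: "interval", everySeconds }
  };
}

// Hypothetical helper: builds the agent_event task a checker emits.
function emitEvent(prompt) {
  return { kind: "agent_event", payload: { prompt } };
}

// Hypothetical checker body: compares fetched items with a cursor of seen ids
// and emits one agent_event per unseen item.
function checkForNew(items, seenIds) {
  const fresh = items.filter((item) => !seenIds.has(item.id));
  return {
    output: { text: `${fresh.length} new item(s)` },
    asyncTasks: fresh.map((item) => emitEvent(`New item: ${item.title}`))
  };
}

const poll = startPolling("inbox-checker", { folder: "INBOX" }, 60);
console.log(poll.kind, poll.recurrence.everySeconds);

const result = checkForNew(
  [{ id: 1, title: "old" }, { id: 2, title: "new" }],
  new Set([1])
);
console.log(JSON.stringify(result.asyncTasks));
```

Persisting the cursor (here the `seenIds` set) per chat is left out; the documented flow places it in the checker's own per-chat config or tmp data.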

package/README.md CHANGED

@@ -145,13 +145,13 @@ node src/index.js --telegram.token <token>
 With this mode, Arisa creates `~/.arisa/state/config.json` without prompts and applies these defaults when not provided:
 - `pi.provider`: `openai-codex` when available, otherwise first provider from the current Pi provider list
-- `pi.model`: first model after bootstrap sorting (currently prioritizes `openai-codex/gpt-5.4`)
+- `pi.model`: first model after bootstrap sorting (currently prioritizes `openai-codex/gpt-5.5`)
 - `telegram.maxChatIds`: `1`
 Supported overrides:
 ```bash
-node src/index.js --telegram.token <token> --telegram.maxChatIds 3 --pi.provider openai-codex --pi.model gpt-5.4 --pi.apiKey <optional-provider-key>
+node src/index.js --telegram.token <token> --telegram.maxChatIds 3 --pi.provider openai-codex --pi.model gpt-5.5 --pi.apiKey <optional-provider-key>
 ```
 Notes:
@@ -171,7 +171,7 @@ For providers with internal Pi login support, such as Codex, leaving the API key
 For example, selecting:
-- `openai-codex/gpt-5.4`
+- `openai-codex/gpt-5.5`
 allows Arisa to authenticate through Pi's Codex OAuth flow instead of requiring a normal OpenAI API key.

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "arisa",
-  "version": "3.1.2",
+  "version": "3.1.4",
   "description": "Telegram + Pi Agent modular assistant",
   "type": "module",
   "main": "src/index.js",

package/src/core/agent/agent-manager.js CHANGED

@@ -132,6 +132,49 @@ export class AgentManager {
     return ctx;
   }
+  async runTool({ name, request, chatId }) {
+    await this.toolRegistry.load();
+    this.logger?.log("agent", `run_tool ${name}`);
+    const chatArtifactStore = this.artifactStore.forChat(chatId);
+    const result = await this.toolRegistry.run({ name, request, chatId });
+    if (result.output?.text) {
+      const outArtifact = await chatArtifactStore.createText({
+        text: result.output.text,
+        source: { type: "tool", toolName: name },
+        metadata: { tool: name }
+      });
+      result.output.artifactId = outArtifact.id;
+    }
+    if (result.output?.filePath) {
+      const generated = await chatArtifactStore.createFromFile({
+        originalPath: result.output.filePath,
+        fileName: result.output.fileName || path.basename(result.output.filePath),
+        kind: result.output.kind || "file",
+        mimeType: result.output.mimeType || "application/octet-stream",
+        source: { type: "tool", toolName: name },
+        metadata: { tool: name }
+      });
+      result.output.artifactId = generated.id;
+      await unlink(result.output.filePath).catch(() => {});
+    }
+    if (result.asyncTask || result.asyncTasks?.length) {
+      const scheduled = await this.taskStore.addMany(
+        result.asyncTasks || [result.asyncTask],
+        {
+          payload: { chatId },
+          source: { type: "tool", toolName: name, chatId }
+        }
+      );
+      result.asyncTasks = scheduled;
+      delete result.asyncTask;
+    }
+    return result;
+  }
   createTools(telegram, chatId) {
     const chatArtifactStore = this.artifactStore.forChat(chatId);
@@ -160,6 +203,18 @@ export class AgentManager {
           return { content: [{ type: "text", text: help }], details: { help } };
         }
       }),
+      defineTool({
+        name: "tool_skills",
+        label: "Tool skills",
+        description: "Show skills assigned to a CLI tool via its manifest skillHints.",
+        parameters: Type.Object({ name: Type.String() }),
+        execute: async (_id, params) => {
+          await this.toolRegistry.load();
+          const skills = await this.toolRegistry.resolveSkills(params.name);
+          const visible = skills.map(({ content, ...item }) => item);
+          return { content: [{ type: "text", text: JSON.stringify(visible, null, 2) }], details: visible };
+        }
+      }),
       defineTool({
         name: "set_tool_config",
         label: "Set tool config",
@@ -182,8 +237,6 @@ export class AgentManager {
           args: Type.Optional(Type.Record(Type.String(), Type.String()))
         }),
         execute: async (_id, params) => {
-          await this.toolRegistry.load();
-          this.logger?.log("agent", `run_tool ${params.name}`);
           let artifact = null;
           if (params.artifactId) {
             artifact = await chatArtifactStore.get(params.artifactId);
@@ -191,7 +244,7 @@ export class AgentManager {
               return { content: [{ type: "text", text: `Artifact not found: ${params.artifactId}` }], details: { ok: false } };
             }
           }
-          const result = await this.toolRegistry.run({
+          const result = await this.runTool({
             name: params.name,
             request: {
               artifact,
@@ -201,40 +254,6 @@ export class AgentManager {
             chatId
           });
-          if (result.output?.text) {
-            const outArtifact = await chatArtifactStore.createText({
-              text: result.output.text,
-              source: { type: "tool", toolName: params.name },
-              metadata: { tool: params.name }
-            });
-            result.output.artifactId = outArtifact.id;
-          }
-          if (result.output?.filePath) {
-            const generated = await chatArtifactStore.createFromFile({
-              originalPath: result.output.filePath,
-              fileName: result.output.fileName || path.basename(result.output.filePath),
-              kind: result.output.kind || "file",
-              mimeType: result.output.mimeType || "application/octet-stream",
-              source: { type: "tool", toolName: params.name },
-              metadata: { tool: params.name }
-            });
-            result.output.artifactId = generated.id;
-            await unlink(result.output.filePath).catch(() => {});
-          }
-          if (result.asyncTask || result.asyncTasks?.length) {
-            const scheduled = await this.taskStore.addMany(
-              result.asyncTasks || [result.asyncTask],
-              {
-                payload: { chatId },
-                source: { type: "tool", toolName: params.name, chatId }
-              }
-            );
-            result.asyncTasks = scheduled;
-            delete result.asyncTask;
-          }
           return {
             content: [{ type: "text", text: JSON.stringify(result, null, 2) }],
             details: result

package/src/core/agent/runtime-context.js CHANGED

@@ -1,5 +1,5 @@
 import { fileURLToPath } from "node:url";
-import { arisaHomeDir, chatsDir, stateDir, toolsDir } from "../../runtime/paths.js";
+import { arisaHomeDir, chatsDir, stateDir, toolStateDir, toolsDir } from "../../runtime/paths.js";
 export const arisaInstallDir = fileURLToPath(new URL("../../..", import.meta.url));
 export const bundledToolsDir = fileURLToPath(new URL("../../../tools", import.meta.url));
@@ -10,6 +10,7 @@ export function buildAgentRuntimeContext() {
     `arisaInstallDir: ${arisaInstallDir}`,
     `bundledToolsDir: ${bundledToolsDir}`,
     `userToolsDir: ${toolsDir}`,
+    `toolStateDir: ${toolStateDir}`,
     `chatsDir: ${chatsDir}`,
     `stateDir: ${stateDir}`
   ].join("\n");

package/src/core/artifacts/normalize-for-reasoning.js CHANGED

@@ -19,8 +19,9 @@ function looksLikeAudioTranscriptionTool(tool) {
   return /transcri|whisper|speech.?to.?text|audio.?to.?text/i.test(`${tool.name} ${tool.description || ""}`);
 }
-function shouldNormalizeAudioToText(artifact, desiredMimeType) {
-  return artifact?.mimeType?.startsWith("audio/") && desiredMimeType === "text/plain";
+export function shouldNormalizeArtifactToText(artifact, desiredMimeType = "text/plain") {
+  return desiredMimeType === "text/plain"
+    && (artifact?.mimeType?.startsWith("audio/") || artifact?.mimeType?.startsWith("video/"));
 }
 export function selectPipeTool({ toolRegistry, artifact, desiredMimeType }) {
@@ -28,7 +29,7 @@ export function selectPipeTool({ toolRegistry, artifact, desiredMimeType }) {
     .filter((tool) => toolSupportsArtifact(tool, artifact))
     .filter((tool) => toolProduces(tool, desiredMimeType));
-  if (shouldNormalizeAudioToText(artifact, desiredMimeType)) {
+  if (shouldNormalizeArtifactToText(artifact, desiredMimeType)) {
     return tools.find(looksLikeAudioTranscriptionTool) || null;
   }
@@ -44,7 +45,7 @@ export async function normalizeArtifactForReasoning({
 }) {
   if (!artifact) return { normalizedArtifact: null, toolResult: null, toolName: "" };
-  if (!shouldNormalizeAudioToText(artifact, desiredMimeType)) {
+  if (!shouldNormalizeArtifactToText(artifact, desiredMimeType)) {
     return { normalizedArtifact: null, toolResult: null, toolName: "" };
   }
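
To make the effect of the renamed predicate concrete, the snippet below copies `shouldNormalizeArtifactToText` from the added lines (with `export` dropped so it runs standalone) and calls it on a few sample artifacts. The sample artifacts are illustrative. Note that the function can return `undefined` rather than `false` when the artifact has no `mimeType`, so the sketch wraps the result in `Boolean` for display.

```javascript
// Copied from the added lines of the diff; `export` removed for a standalone run.
function shouldNormalizeArtifactToText(artifact, desiredMimeType = "text/plain") {
  return desiredMimeType === "text/plain"
    && (artifact?.mimeType?.startsWith("audio/") || artifact?.mimeType?.startsWith("video/"));
}

// Illustrative artifacts; only mimeType matters to the predicate.
const samples = [
  { mimeType: "audio/ogg" },
  { mimeType: "video/mp4" },
  { mimeType: "image/png" },
  null
];

for (const artifact of samples) {
  const label = artifact ? artifact.mimeType : "no artifact";
  console.log(label, Boolean(shouldNormalizeArtifactToText(artifact)));
}
// audio/ogg true
// video/mp4 true
// image/png false
// no artifact false
```

Video artifacts therefore take the same transcription path that audio artifacts took in 3.1.2.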

package/src/core/skills/skill-registry.js ADDED

@@ -0,0 +1,71 @@
+import os from "node:os";
+import path from "node:path";
+import { readFile } from "node:fs/promises";
+const defaultSkillsDir = path.join(os.homedir(), ".agents", "skills");
+function parseFrontmatter(source = "") {
+  if (!source.startsWith("---")) return {};
+  const end = source.indexOf("\n---", 3);
+  if (end === -1) return {};
+  const block = source.slice(3, end).trim();
+  const data = {};
+  for (const line of block.split("\n")) {
+    const match = line.match(/^([A-Za-z0-9_-]+):\s*(.*)$/);
+    if (match) data[match[1]] = match[2].replace(/^['"]|['"]$/g, "");
+  }
+  return data;
+}
+function normalizeSkillHint(value) {
+  if (typeof value === "string") return { name: value, when: "" };
+  if (value && typeof value === "object" && value.name) {
+    return { name: String(value.name), when: String(value.when || "") };
+  }
+  return null;
+}
+export class SkillRegistry {
+  constructor({ skillsDir = defaultSkillsDir } = {}) {
+    this.skillsDir = skillsDir;
+    this.cache = new Map();
+  }
+  async get(name) {
+    const key = String(name || "").trim();
+    if (!key) return null;
+    if (this.cache.has(key)) return this.cache.get(key);
+    const file = path.join(this.skillsDir, key, "SKILL.md");
+    try {
+      const content = await readFile(file, "utf8");
+      const metadata = parseFrontmatter(content);
+      const skill = {
+        name: metadata.name || key,
+        description: metadata.description || "",
+        path: file,
+        content
+      };
+      this.cache.set(key, skill);
+      return skill;
+    } catch {
+      this.cache.set(key, null);
+      return null;
+    }
+  }
+  normalizeHints(manifest = {}) {
+    const raw = manifest.skillHints || manifest.skills || [];
+    if (!Array.isArray(raw)) return [];
+    return raw.map(normalizeSkillHint).filter(Boolean);
+  }
+  async resolveHints(hints = []) {
+    const resolved = [];
+    for (const hint of hints) {
+      const skill = await this.get(hint.name);
+      resolved.push({ ...hint, found: Boolean(skill), skill });
+    }
+    return resolved;
+  }
+}
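
The two pure helpers in the new file can be exercised without touching the filesystem. The snippet below copies `parseFrontmatter` and `normalizeSkillHint` from the added lines and runs them on a hypothetical `SKILL.md` body; the skill description text is made up for illustration.

```javascript
// Copied from the added file.
function parseFrontmatter(source = "") {
  if (!source.startsWith("---")) return {};
  const end = source.indexOf("\n---", 3);
  if (end === -1) return {};
  const block = source.slice(3, end).trim();
  const data = {};
  for (const line of block.split("\n")) {
    const match = line.match(/^([A-Za-z0-9_-]+):\s*(.*)$/);
    if (match) data[match[1]] = match[2].replace(/^['"]|['"]$/g, "");
  }
  return data;
}

// Copied from the added file.
function normalizeSkillHint(value) {
  if (typeof value === "string") return { name: value, when: "" };
  if (value && typeof value === "object" && value.name) {
    return { name: String(value.name), when: String(value.when || "") };
  }
  return null;
}

// Hypothetical SKILL.md content.
const skillFile = [
  "---",
  "name: stop-slop",
  'description: "Avoid filler phrasing"',
  "---",
  "# Body"
].join("\n");

console.log(parseFrontmatter(skillFile));
// { name: 'stop-slop', description: 'Avoid filler phrasing' }

const hints = ["stop-slop", { name: "stop-slop", when: "writing public page copy" }, 42];
console.log(hints.map(normalizeSkillHint).filter(Boolean));
// [ { name: 'stop-slop', when: '' },
//   { name: 'stop-slop', when: 'writing public page copy' } ]
```

The parser only reads single-line `key: value` pairs, so nested or multi-line YAML in the frontmatter is ignored rather than parsed.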

package/src/core/tasks/task-store.js CHANGED

@@ -27,7 +27,7 @@ function normalizeTask(task, defaults = {}) {
     createdAt: task.createdAt || new Date().toISOString(),
     updatedAt: new Date().toISOString(),
     kind: task.kind,
-    runAt: task.runAt,
+    runAt: task.runAt || new Date().toISOString(),
     payload: {
       ...(defaults.payload || {}),
       ...(task.payload || {})
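
The `runAt` default is what lets tasks without a schedule fire on the next poller tick. A minimal sketch of that behavior, assuming a due check that compares `runAt` with the current time: `withDefaultRunAt` and `isDue` are hypothetical names, and the actual poller logic is not part of this diff.

```javascript
// Mirrors the changed line: a missing runAt falls back to "now".
function withDefaultRunAt(task, now = new Date()) {
  return { ...task, runAt: task.runAt || now.toISOString() };
}

// Hypothetical due check; the real poller may differ.
function isDue(task, now = new Date()) {
  return Date.parse(task.runAt) <= now.getTime();
}

const now = new Date("2024-01-01T00:00:00.000Z");
const immediate = withDefaultRunAt({ kind: "agent_event" }, now);
const later = withDefaultRunAt({ kind: "agent_task", runAt: "2030-01-01T00:00:00.000Z" }, now);

console.log(immediate.runAt, isDue(immediate, now)); // 2024-01-01T00:00:00.000Z true
console.log(later.runAt, isDue(later, now));         // 2030-01-01T00:00:00.000Z false
```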

package/src/core/tools/daemon-runtime.js CHANGED

@@ -3,10 +3,10 @@ import { spawn } from "node:child_process";
 import { openSync } from "node:fs";
 import { mkdir, readFile, readdir, rename, rm, unlink, writeFile } from "node:fs/promises";
 import path from "node:path";
-import { stateDir } from "../../runtime/paths.js";
+import { getToolStateDir } from "../../runtime/paths.js";
 export function daemonPaths(toolName) {
-  const root = path.join(stateDir, toolName);
+  const root = getToolStateDir(toolName);
   return {
     root,
     commandsDir: path.join(root, "commands"),

package/src/core/tools/tool-registry.js CHANGED

@@ -1,10 +1,11 @@
-import { mkdir, readdir, readFile, unlink, writeFile } from "node:fs/promises";
+import { mkdir, readdir, readFile, rmdir, unlink, writeFile } from "node:fs/promises";
 import path from "node:path";
 import { spawn } from "node:child_process";
 import { fileURLToPath } from "node:url";
 import { getToolConfigPath, getToolTmpDir, getChatToolTmpDir, toolsDir as userToolsRoot } from "../../runtime/paths.js";
 import { loadToolConfig, parseConfigModule, writeToolConfig } from "./tool-config.js";
 import { normalizeToolResult } from "./tool-result.js";
+import { SkillRegistry } from "../skills/skill-registry.js";
 const bundledToolsRoot = fileURLToPath(new URL("../../../tools", import.meta.url));
 const toolRoots = [
@@ -27,6 +28,7 @@ export class ToolRegistry {
   constructor({ logger } = {}) {
     this.logger = logger;
     this.tools = new Map();
+    this.skillRegistry = new SkillRegistry();
   }
   async load() {
@@ -52,8 +54,10 @@ export class ToolRegistry {
           const configSource = await readFile(configPath, "utf8");
           const defaults = parseConfigModule(configSource);
           const config = await loadToolConfig(manifest.name, defaults);
+          const skillHints = this.skillRegistry.normalizeHints(manifest);
           this.tools.set(manifest.name, {
             ...manifest,
+            skillHints,
             dir: toolDir,
             entry: path.join(toolDir, manifest.entry || "index.js"),
             localConfigPath: configPath,
@@ -77,7 +81,8 @@ export class ToolRegistry {
       description: tool.description,
       input: tool.input,
       output: tool.output,
-      configSchema: tool.configSchema || {}
+      configSchema: tool.configSchema || {},
+      skillHints: tool.skillHints || []
     }));
   }
@@ -89,7 +94,29 @@ export class ToolRegistry {
     const tool = this.get(name);
     if (!tool) throw new Error(`Tool not found: ${name}`);
     const result = await runProcess("node", [tool.entry, "--help"], { cwd: tool.dir, env: process.env });
-    return result.stdout || result.stderr;
+    const help = result.stdout || result.stderr;
+    const skills = await this.resolveSkills(name);
+    if (!skills.length) return help;
+    const skillHelp = skills.map((item) => [
+      `- ${item.name}${item.when ? ` (${item.when})` : ""}`,
+      item.description ? `  ${item.description}` : null,
+      item.found ? `  path: ${item.path}` : "  warning: skill not found"
+    ].filter(Boolean).join("\n")).join("\n");
+    return `${help}\n\nAssigned skills:\n${skillHelp}\n`;
+  }
+  async resolveSkills(name) {
+    const tool = this.get(name);
+    if (!tool) throw new Error(`Tool not found: ${name}`);
+    const hints = await this.skillRegistry.resolveHints(tool.skillHints || []);
+    return hints.map((hint) => ({
+      name: hint.name,
+      when: hint.when,
+      found: hint.found,
+      description: hint.skill?.description || "",
+      path: hint.skill?.path || "",
+      content: hint.skill?.content || ""
+    }));
   }
   async resolveConfigForChat(name, chatId) {
@@ -121,12 +148,19 @@ export class ToolRegistry {
     const tmpDir = chatId != null ? getChatToolTmpDir(chatId, name) : getToolTmpDir(name);
     await mkdir(tmpDir, { recursive: true });
     const requestFile = path.join(tmpDir, `.request-${Date.now()}.json`);
-    await writeFile(requestFile, `${JSON.stringify(request, null, 2)}\n`, "utf8");
+    const skills = await this.resolveSkills(name);
+    const enrichedRequest = { ...request, chatId, skills };
+    await writeFile(requestFile, `${JSON.stringify(enrichedRequest, null, 2)}\n`, "utf8");
     const result = await runProcess("node", [tool.entry, "run", "--request-file", requestFile], {
       cwd: tool.dir,
       env: process.env
     });
     await unlink(requestFile).catch(() => {});
+    await rmdir(tmpDir).catch(() => {});
+    if (chatId != null) {
+      await rmdir(path.dirname(tmpDir)).catch(() => {});
+      await rmdir(path.dirname(path.dirname(tmpDir))).catch(() => {});
+    }
     try {
       const parsed = JSON.parse(result.stdout || result.stderr);
       const normalized = normalizeToolResult(name, parsed);
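
The "Assigned skills" section that `help()` now appends can be previewed in isolation. The sketch below lifts the formatting expression from the added lines into a standalone function; the name `formatSkillHelp` is hypothetical, and the sample entries are illustrative values in the shape returned by `resolveSkills`.

```javascript
// Formatting logic copied from the added lines of help(); the wrapper name is hypothetical.
function formatSkillHelp(skills) {
  return skills.map((item) => [
    `- ${item.name}${item.when ? ` (${item.when})` : ""}`,
    item.description ? `  ${item.description}` : null,
    item.found ? `  path: ${item.path}` : "  warning: skill not found"
  ].filter(Boolean).join("\n")).join("\n");
}

// Illustrative entries: one resolved skill, one missing.
const resolved = [
  { name: "stop-slop", when: "writing public page copy", found: true,
    description: "Avoid filler phrasing", path: "/home/user/.agents/skills/stop-slop/SKILL.md" },
  { name: "missing-skill", when: "", found: false, description: "", path: "" }
];

console.log(`Assigned skills:\n${formatSkillHelp(resolved)}`);
// Assigned skills:
// - stop-slop (writing public page copy)
//   Avoid filler phrasing
//   path: /home/user/.agents/skills/stop-slop/SKILL.md
// - missing-skill
//   warning: skill not found
```

A missing skill produces a warning line instead of an error, which matches `SkillRegistry.get` returning `null` when `SKILL.md` cannot be read.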

package/src/runtime/bootstrap.js CHANGED

@@ -90,7 +90,7 @@ function sortBootstrapProviders(providers) {
 function sortBootstrapModels(provider, models) {
   const preferred = {
-    "openai-codex": ["gpt-5.4"]
+    "openai-codex": ["gpt-5.5"]
   };
   const priority = preferred[provider] || [];

package/src/runtime/paths.js CHANGED

@@ -10,6 +10,7 @@ export const serviceLogFile = path.join(stateDir, "arisa.log");
 export const tasksFile = path.join(stateDir, "tasks.json");
 export const toolsDir = path.join(arisaHomeDir, "tools");
 export const chatsDir = path.join(arisaHomeDir, "chats");
+export const toolStateDir = path.join(stateDir, "tools");
 export function getChatDir(chatId) {
   return path.join(chatsDir, String(chatId));
@@ -23,6 +24,10 @@ export function getChatArtifactsIndexFile(chatId) {
   return path.join(getChatDir(chatId), "state", "artifacts.json");
 }
+export function getChatToolStateDir(chatId, toolName) {
+  return path.join(getChatDir(chatId), "state", "tools", toolName);
+}
 export function getChatPiSessionsDir(chatId) {
   return path.join(getChatDir(chatId), "state", "pi-sessions");
 }
@@ -35,24 +40,28 @@ export function getToolConfigPath(toolName) {
   return path.join(getToolDir(toolName), "config.js");
 }
-export function getChatToolConfigPath(chatId, toolName) {
-  return path.join(getChatDir(chatId), "tools", toolName, "config.js");
+export function getChatConfigDir(chatId) {
+  return path.join(getChatDir(chatId), "config");
 }
-export function getToolRuntimeDir(toolName) {
-  return getToolDir(toolName);
+export function getChatTmpDir(chatId) {
+  return path.join(getChatDir(chatId), "tmp");
+}
+export function getChatToolConfigPath(chatId, toolName) {
+  return path.join(getChatConfigDir(chatId), "tools", toolName, "config.js");
 }
-export function getToolOutDir(toolName) {
-  return path.join(getToolRuntimeDir(toolName), "out");
+export function getToolStateDir(toolName) {
+  return path.join(toolStateDir, toolName);
 }
 export function getToolTmpDir(toolName) {
-  return path.join(getToolRuntimeDir(toolName), "tmp");
+  return path.join(getToolStateDir(toolName), "tmp");
 }
 export function getChatToolTmpDir(chatId, toolName) {
-  return path.join(getChatDir(chatId), "tools", toolName, "tmp");
+  return path.join(getChatTmpDir(chatId), "tools", toolName);
 }
 export async function ensureArisaHome() {
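
The resulting directory layout is easier to see with concrete values. The sketch below re-implements the changed helpers under two labeled assumptions: the home directory is a hypothetical `/home/user/.arisa`, and `stateDir` is `<home>/state` (consistent with the `~/.arisa/state/...` paths quoted in AGENTS.md and README.md, but not shown in this hunk). A plain `join` stands in for `path.join` so the output is POSIX-style on any platform.

```javascript
// Stand-in for path.join so the printed paths are identical on every OS.
const join = (...parts) => parts.join("/");

// Assumptions: hypothetical home directory; stateDir location inferred from the docs.
const arisaHomeDir = "/home/user/.arisa";
const stateDir = join(arisaHomeDir, "state");

// From the diff.
const chatsDir = join(arisaHomeDir, "chats");
const toolStateDir = join(stateDir, "tools");

const getChatDir = (chatId) => join(chatsDir, String(chatId));
const getChatToolStateDir = (chatId, toolName) => join(getChatDir(chatId), "state", "tools", toolName);
const getChatConfigDir = (chatId) => join(getChatDir(chatId), "config");
const getChatTmpDir = (chatId) => join(getChatDir(chatId), "tmp");
const getChatToolConfigPath = (chatId, toolName) => join(getChatConfigDir(chatId), "tools", toolName, "config.js");
const getToolStateDir = (toolName) => join(toolStateDir, toolName);
const getToolTmpDir = (toolName) => join(getToolStateDir(toolName), "tmp");
const getChatToolTmpDir = (chatId, toolName) => join(getChatTmpDir(chatId), "tools", toolName);

console.log(getToolStateDir("openai-tts"));           // /home/user/.arisa/state/tools/openai-tts
console.log(getToolTmpDir("openai-tts"));             // /home/user/.arisa/state/tools/openai-tts/tmp
console.log(getChatToolStateDir(42, "openai-tts"));   // /home/user/.arisa/chats/42/state/tools/openai-tts
console.log(getChatToolConfigPath(42, "openai-tts")); // /home/user/.arisa/chats/42/config/tools/openai-tts/config.js
console.log(getChatToolTmpDir(42, "openai-tts"));     // /home/user/.arisa/chats/42/tmp/tools/openai-tts
```

With this layout, the three `rmdir` calls added to `tool-registry.js` walk up from `chats/<chatId>/tmp/tools/<toolName>` to `chats/<chatId>/tmp`, removing each level only if it is empty.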

package/src/transport/telegram/bot.js CHANGED

@@ -3,7 +3,7 @@ import path from "node:path";
 import { authorizeChat } from "./auth.js";
 import { captureIncomingArtifact } from "./media.js";
 import { renderTelegramHtml } from "./text-format.js";
-import { normalizeArtifactForReasoning } from "../../core/artifacts/normalize-for-reasoning.js";
+import { normalizeArtifactForReasoning, shouldNormalizeArtifactToText } from "../../core/artifacts/normalize-for-reasoning.js";
 function quotedMessageSummary(message) {
   if (!message) return [];
@@ -63,11 +63,11 @@ function buildPrompt({ ctx, artifact, transcript, toolResult }) {
   if (transcript) {
     parts.push(`transcriptArtifactId: ${transcript.id}`);
     parts.push(`transcriptText: ${transcript.text}`);
-    parts.push(`Important: the incoming audio has already been transcribed. Use the transcript as the user message content. Do not answer with a raw transcription unless the user explicitly asked for one.`);
+    parts.push(`Important: the incoming media has already been transcribed. Use the transcript as the user message content. Do not answer with a raw transcription unless the user explicitly asked for one.`);
   }
-  if (artifact?.kind === "audio" && !transcript && toolResult) {
-    parts.push(`audioNormalizationResult: ${JSON.stringify(toolResult)}`);
-    parts.push(`Important: pre-reasoning audio normalization could not be completed, so you do not have a transcript for this voice/audio message.`);
+  if (shouldNormalizeArtifactToText(artifact) && !transcript && toolResult) {
+    parts.push(`mediaNormalizationResult: ${JSON.stringify(toolResult)}`);
+    parts.push(`Important: pre-reasoning media normalization could not be completed, so you do not have a transcript for this audio/video message.`);
   }
   parts.push(`If you need a CLI tool, use list_tools/tool_help/run_tool.`);
@@ -114,10 +114,10 @@ async function buildAsyncTaskPrompt({ task, artifactStore, toolRegistry, logger
         logger?.log("tasks", `artifact ${artifact.id} normalized to ${normalizedArtifact.id}`);
         parts.push(`transcriptArtifactId: ${normalizedArtifact.id}`);
         parts.push(`transcriptText: ${normalizedArtifact.text}`);
-        parts.push("Important: the attached audio artifact has already been normalized for reasoning. Use the transcript as the message content.");
-      } else if (artifact.kind === "audio" && toolResult) {
-        parts.push(`audioNormalizationResult: ${JSON.stringify(toolResult)}`);
-        parts.push("Important: pre-reasoning audio normalization could not be completed, so you do not have a transcript for this audio artifact.");
+        parts.push("Important: the attached media artifact has already been normalized for reasoning. Use the transcript as the message content.");
+      } else if (shouldNormalizeArtifactToText(artifact) && toolResult) {
+        parts.push(`mediaNormalizationResult: ${JSON.stringify(toolResult)}`);
+        parts.push("Important: pre-reasoning media normalization could not be completed, so you do not have a transcript for this audio/video artifact.");
       }
     } else {
       parts.push(`artifactId: ${task.payload.artifactId}`);
@@ -130,6 +130,18 @@ async function buildAsyncTaskPrompt({ task, artifactStore, toolRegistry, logger
   return parts.filter(Boolean).join("\n");
 }
+function buildAsyncEventPrompt(task) {
+  return [
+    "External event arrived.",
+    `taskId: ${task.id}`,
+    `chatId: ${task.payload.chatId}`,
+    task.payload.prompt ? `event: ${task.payload.prompt}` : null,
+    "A polling checker detected this external event. Evaluate it and decide the next action.",
+    "If it warrants no action, you may stay silent.",
+    "If needed, use tools."
+  ].filter(Boolean).join("\n");
+}
 async function normalizeIncomingArtifact({ artifact, toolRegistry, chatArtifactStore, chatId }) {
   if (!artifact) return { transcript: null, toolResult: null };
   const { normalizedArtifact, toolResult } = await normalizeArtifactForReasoning({
@@ -194,9 +206,9 @@ export async function createTelegramBot({ config, artifactStore, toolRegistry, t
     const artifact = await captureIncomingArtifact(ctx, artifactStore);
     if (artifact) logger?.log("telegram", `captured artifact ${artifact.kind}${artifact.id ? ` ${artifact.id}` : ""}`);
     const { transcript, toolResult } = await normalizeIncomingArtifact({ artifact, toolRegistry, chatArtifactStore, chatId });
-    if (transcript) logger?.log("telegram", `audio transcribed to artifact ${transcript.id}`);
-    if (artifact?.kind === "audio" && !transcript) {
-      logger?.log("telegram", `audio normalization unavailable for chat ${ctx.chat.id}: ${toolResult?.error || toolResult?.missingConfig?.join(", ") || "unknown error"}`);
+    if (transcript) logger?.log("telegram", `media transcribed to artifact ${transcript.id}`);
+    if (shouldNormalizeArtifactToText(artifact) && !transcript) {
+      logger?.log("telegram", `media normalization unavailable for chat ${ctx.chat.id}: ${toolResult?.error || toolResult?.missingConfig?.join(", ") || "unknown error"}`);
     }
     return buildPrompt({ ctx, artifact, transcript, toolResult });
   }
@@ -310,6 +322,73 @@ export async function createTelegramBot({ config, artifactStore, toolRegistry, t
     });
   }
+  async function dispatchTask(task) {
+    const chatId = task.payload?.chatId;
+    if (!chatId) {
+      await taskStore.fail(task.id, `Task missing chatId: ${task.kind}`);
+      return;
+    }
+    if (task.kind === "agent_task") {
+      if (!task.payload.prompt) {
+        await taskStore.fail(task.id, "agent_task missing prompt");
+        return;
+      }
+      logger?.log("tasks", `running task ${task.id} for chat ${chatId}`);
+      await enqueuePrompt({
+        chatId,
+        prompt: await buildAsyncTaskPrompt({ task, artifactStore, toolRegistry, logger }),
+        label: `scheduled task ${task.id}`
+      });
+      await taskStore.complete(task.id);
+      return;
+    }
+    if (task.kind === "agent_event") {
+      logger?.log("tasks", `agent event ${task.id} for chat ${chatId}`);
+      await enqueuePrompt({
+        chatId,
+        prompt: buildAsyncEventPrompt(task),
+        label: `agent event ${task.id}`
+      });
+      await taskStore.complete(task.id);
+      return;
+    }
+    if (task.kind === "poll_tool") {
+      const toolName = task.payload?.toolName;
+      if (!toolName) {
+        await taskStore.fail(task.id, "poll_tool missing toolName");
+        return;
+      }
+      logger?.log("tasks", `polling tool ${toolName} (task ${task.id}) for chat ${chatId}`);
+      try {
+        await agentManager.runTool({
+          name: toolName,
+          request: { args: task.payload.args || {} },
+          chatId
+        });
+      } catch (error) {
+        logger?.log("tasks", `poll_tool ${toolName} failed: ${error instanceof Error ? error.message : String(error)}`);
+      }
+      await taskStore.complete(task.id);
+      return;
+    }
+    await taskStore.fail(task.id, `Unsupported task: ${task.kind}`);
+  }
+  async function dispatchDueTasks() {
+    const tasks = await taskStore.claimDue(10);
+    for (const task of tasks) {
+      try {
+        await dispatchTask(task);
+      } catch (error) {
+        await taskStore.fail(task.id, error instanceof Error ? error.message : String(error));
+      }
+    }
+  }
   async function handleNewCommand(ctx) {
     agentManager.resetSession(ctx.chat.id);
     perChatState.set(ctx.chat.id, { processing: false, nextPrompt: "" });
@@ -381,25 +460,10 @@ export async function createTelegramBot({ config, artifactStore, toolRegistry, t
       await bot.api.setMyCommands([
         { command: "new", description: "Start a new chat context" }
       ]);
-      setInterval(async () => {
-        const tasks = await taskStore.claimDue(10);
-        for (const task of tasks) {
-          try {
-            if (task.kind !== "agent_task" || !task.payload?.chatId || !task.payload?.prompt) {
-              await taskStore.fail(task.id, `Unsupported task: ${task.kind}`);
-              continue;
-            }
-            logger?.log("tasks", `running task ${task.id} for chat ${task.payload.chatId}`);
-            await enqueuePrompt({
-              chatId: task.payload.chatId,
-              prompt: await buildAsyncTaskPrompt({ task, artifactStore, toolRegistry, logger }),
-              label: `scheduled task ${task.id}`
-            });
-            await taskStore.complete(task.id);
-          } catch (error) {
-            await taskStore.fail(task.id, error instanceof Error ? error.message : String(error));
-          }
-        }
+      setInterval(() => {
+        dispatchDueTasks().catch((error) => {
+          logger?.error("tasks", `dispatch failed: ${error instanceof Error ? error.message : String(error)}`);
+        });
       }, 1000).unref();
       if (webhookUrl && setHttpRequestHandler) {
         const webhookPath = `/telegram-${config.telegram.token.slice(-8)}`;
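
Reviewer note: the new `dispatchTask` routes each claimed task by `kind` and validates its payload before acting. The sketch below restates that branching as a pure function so the routing can be read and tested in isolation. `planDispatch` and the shape of the objects it returns are hypothetical and not part of the package; the error strings and labels are copied from the hunk above.

```javascript
// Hypothetical helper (not in the package): mirrors the branching of
// dispatchTask, but returns a description of the action instead of
// calling taskStore, enqueuePrompt, or agentManager.
function planDispatch(task) {
  const chatId = task.payload?.chatId;
  if (!chatId) {
    return { action: "fail", reason: `Task missing chatId: ${task.kind}` };
  }
  switch (task.kind) {
    case "agent_task":
      if (!task.payload.prompt) {
        return { action: "fail", reason: "agent_task missing prompt" };
      }
      return { action: "enqueue", chatId, label: `scheduled task ${task.id}` };
    case "agent_event":
      // The event prompt is optional, so no prompt check here.
      return { action: "enqueue", chatId, label: `agent event ${task.id}` };
    case "poll_tool":
      if (!task.payload.toolName) {
        return { action: "fail", reason: "poll_tool missing toolName" };
      }
      return {
        action: "runTool",
        chatId,
        toolName: task.payload.toolName,
        args: task.payload.args || {}
      };
    default:
      return { action: "fail", reason: `Unsupported task: ${task.kind}` };
  }
}

// Usage with made-up task objects:
console.log(planDispatch({ id: "t1", kind: "agent_event", payload: { chatId: 7 } }));
console.log(planDispatch({ id: "t2", kind: "poll_tool", payload: { chatId: 7 } }));
```

Keeping the decision separate from the side effects is only one possible design. The package performs both inside `dispatchTask`, which also completes a `poll_tool` task when the tool run throws.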

package/src/transport/telegram/media.js CHANGED

@@ -33,6 +33,26 @@ export async function captureIncomingArtifact(ctx, artifactStore) {
     });
   }
+  if (ctx.message?.video) {
+    const video = ctx.message.video;
+    const fileName = video.file_name || `${chatId}-${ctx.msg.message_id}.mp4`;
+    const content = await downloadToBuffer(ctx, video.file_id);
+    return store.createGeneratedFile({
+      fileName,
+      content,
+      kind: "video",
+      mimeType: video.mime_type || "video/mp4",
+      source: baseSource,
+      metadata: {
+        duration: video.duration,
+        width: video.width,
+        height: video.height,
+        fileSize: video.file_size,
+        ...incomingCaptionMetadata(ctx)
+      }
+    });
+  }
   if (ctx.message?.document) {
     const fileName = ctx.message.document.file_name || `${chatId}-${ctx.msg.message_id}`;
     const content = await downloadToBuffer(ctx, ctx.message.document.file_id);
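
Reviewer note: the new video branch falls back to a generated `.mp4` file name and to `video/mp4` when Telegram omits `file_name` or `mime_type`. The sketch below isolates that field mapping as a pure function. `describeIncomingVideo` is a hypothetical name, and it leaves out the download, the `source` field, and the caption metadata that the real branch also handles.

```javascript
// Hypothetical helper (not in the package): derives the file name, kind,
// MIME type, and metadata that the video branch passes on, given a
// Telegram message object that may carry a `video` field.
function describeIncomingVideo(message, chatId) {
  const video = message.video;
  if (!video) return null;
  return {
    // Fall back to "<chatId>-<messageId>.mp4" when no file name is sent.
    fileName: video.file_name || `${chatId}-${message.message_id}.mp4`,
    kind: "video",
    mimeType: video.mime_type || "video/mp4",
    metadata: {
      duration: video.duration,
      width: video.width,
      height: video.height,
      fileSize: video.file_size
    }
  };
}

// Usage with a made-up message that has no file_name or mime_type:
const described = describeIncomingVideo(
  { message_id: 1001, video: { file_id: "abc", duration: 12, width: 640, height: 360 } },
  42
);
console.log(described.fileName, described.mimeType);
```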

package/tools/openai-transcribe/index.js CHANGED

@@ -9,7 +9,7 @@ const toolName = "openai-transcribe";
 const config = await loadToolConfig(toolName, defaults);
 function printHelp() {
-  console.log(`openai-transcribe\n\nUsage:\n  node index.js --help\n  node index.js run --request-file <json>\n\nExpected input:\n  {\n    "artifact": { "path": "/abs/audio.ogg", "mimeType": "audio/ogg" },\n    "args": {}\n  }\n\nConfig at ${getToolConfigPath(toolName)}:\n  OPENAI_API_KEY\n  MODEL\n`);
+  console.log(`openai-transcribe\n\nUsage:\n  node index.js --help\n  node index.js run --request-file <json>\n\nExpected input:\n  {\n    "artifact": { "path": "/abs/media.ogg", "mimeType": "audio/ogg" },\n    "args": {}\n  }\n\nConfig at ${getToolConfigPath(toolName)}:\n  OPENAI_API_KEY\n  MODEL\n`);
 }
 async function run(requestFile) {

package/tools/openai-transcribe/tool.manifest.json CHANGED

@@ -1,8 +1,8 @@
 {
   "name": "openai-transcribe",
-  "description": "Transcribe audio files with OpenAI audio transcription API.",
+  "description": "Transcribe audio files and video audio tracks with OpenAI audio transcription API.",
   "entry": "index.js",
-  "input": ["audio/ogg", "audio/mpeg", "audio/wav", "audio/mp4"],
+  "input": ["audio/ogg", "audio/mpeg", "audio/wav", "audio/mp4", "video/mp4"],
   "output": ["text/plain"],
   "configSchema": {
     "OPENAI_API_KEY": {
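
Reviewer note: adding `video/mp4` to the manifest's `input` list is what would let a video artifact be matched to this tool, assuming the registry selects tools by exact MIME type membership. That lookup is not shown in this diff, so `toolAcceptsMimeType` below is a hypothetical illustration of the effect, not the registry's code.

```javascript
// Hypothetical matcher (assumption: exact membership in manifest.input).
function toolAcceptsMimeType(manifest, mimeType) {
  return Array.isArray(manifest.input) && manifest.input.includes(mimeType);
}

// The `input` arrays are copied from the removed and added manifest lines.
const manifestBefore = { input: ["audio/ogg", "audio/mpeg", "audio/wav", "audio/mp4"] };
const manifestAfter = { input: ["audio/ogg", "audio/mpeg", "audio/wav", "audio/mp4", "video/mp4"] };

console.log(toolAcceptsMimeType(manifestBefore, "video/mp4")); // false
console.log(toolAcceptsMimeType(manifestAfter, "video/mp4")); // true
```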

package/docs/async-event-queue-flow.md DELETED

@@ -1,68 +0,0 @@
-# Generic async event flow for tools
-> Status: proposal / not implemented. Kept for reference.
-> The current implementation (timer) stays in place; this document describes a possible evolution.
-## Problem
-Today the only asynchronous re-entry into the agent is time-based: a tool returns an `asyncTask` with `runAt`, and the 1s poller in `src/transport/telegram/bot.js` fires it as a prompt. That forces timer-based solutions (raw polling, fixed latency, a tool re-spawn, and a full agent turn on every check). What is missing is an **incoming event queue** that wakes the agent only when there is something to evaluate.
-## Solution (queue-ordered polling, reusing TaskStore)
-Two new task `kind` values, drained by the same poller into the same `enqueuePrompt`:
-- `poll_tool`: a recurring task that the poller **runs directly as a tool** (it does not spend an agent turn). The checker keeps its own state cursor in its per-chat config/tmp. If there is something new, it emits an `agent_event`.
-- `agent_event`: an incoming event that fires immediately. The poller delivers it as a prompt so that Pi can evaluate it and decide.
-```mermaid
-flowchart LR
-  Tool[Tool run normal] -->|asyncTask poll_tool| TS[TaskStore]
-  TS --> Poller[1s poller dispatcher]
-  Poller -->|kind poll_tool| Run[agentManager.runTool checker]
-  Run -->|if something new: asyncTask agent_event| TS
-  Poller -->|kind agent_event| EP[enqueuePrompt]
-  Poller -->|kind agent_task| EP
-  EP --> Pi[Pi evaluates and decides]
-```
-## Changes
-### 1. TaskStore: events/polls without a time fire immediately
-`src/core/tasks/task-store.js` - in `normalizeTask`, default `runAt` to `now` when it is not provided (`agent_event` tasks and the first run of a `poll_tool` must be immediate; `computeNextRunAt` already reschedules `poll_tool` from its `recurrence`). A one-line change that does not break `agent_task` (which always carries `runAt`).
-### 2. AgentManager: extract "run + materialize" (DRY)
-`src/core/agent/agent-manager.js` - today the `execute` of `run_tool` (lines ~184-242) runs the tool, converts `output.text`/`output.filePath` into artifacts, and sends the `asyncTask(s)` to the `TaskStore` with the `chatId`. Extract that into a reusable method `runTool({ name, request, chatId })`. The Pi tool `run_tool` then calls it. That way the poller can run tools with the **same** materialization logic (including registering any `agent_event` the checker emits).
-### 3. Poller -> dispatcher by kind
-`src/transport/telegram/bot.js` - replace the single-kind handler inside the `setInterval` (lines ~361-380) with a dispatcher:
-- `agent_task` -> `enqueuePrompt(buildAsyncTaskPrompt(task))` + `complete` (same as today).
-- `agent_event` -> `enqueuePrompt(buildAsyncEventPrompt(task))` + `complete`.
-- `poll_tool` -> `agentManager.runTool({ name: task.payload.toolName, request: { args: task.payload.args || {} }, chatId })`; any `agent_event` the checker emits is queued for the next tick; then `complete` (the `recurrence` reschedules the poll). If the tool fails: log + `complete` so the poll is not killed.
-Add `buildAsyncEventPrompt(task)` next to `buildAsyncTaskPrompt` (line ~82), framed as "an external event arrived, evaluate it and decide the next action". If the branch gets dense, extract `dispatchDueTasks(...)` into a function to keep `bot.js` as transport.
-### 4. Document the flow
-`AGENTS.md` - new section (in English) explaining: how a tool sets up its own polling by returning an `asyncTask` of kind `poll_tool` with a `recurrence`, how it reports something new with an `asyncTask` of kind `agent_event`, that the checker stores its cursor in its per-chat config/tmp, and that the agent reasons about the `agent_event` to decide. `list_scheduled_tasks`/`cancel_scheduled_task` already work (they are kind-agnostic) for viewing/cancelling polls.
-## Checker tool contract (no new Pi tools)
-Everything goes through the `asyncTasks` field that the pipeline already supports:
-- Starting the poll (from the `run` of any tool): `asyncTasks: [{ kind: "poll_tool", payload: { toolName, args }, recurrence: { type: "interval", everySeconds: N } }]`.
-- Something new (from the checker's `run`): `asyncTasks: [{ kind: "agent_event", payload: { prompt: "<content to evaluate>" } }]`.
-## Non-goals (for now)
-- No persistent listener (`node index.js listen`) or background process with IPC is added.
-- No inbound HTTP endpoint for events is added.
-- The sustained-connection case (e.g. a logged-in client) is not addressed: checkers are one-shot and persist their cursor between runs.
-## Alternatives considered (rejected for this version)
-- **Listener tools**: the tool runs as a long-lived process (`node index.js listen`) and emits events on stdout that Arisa drains into the queue. More general and real-time, but it adds process lifecycle management to the service, plus IPC.
-- **Inbound webhook**: Arisa exposes an internal HTTP endpoint where external systems POST events. Good for callbacks; it does not help with sources that require holding a connection open.
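
Reviewer note: the checker contract in the deleted document can be sketched as two small result builders. `startPolling`, `checkForNews`, the numeric `id` cursor, and the `summary` field are all hypothetical; only the `asyncTasks` entry shapes (`poll_tool` with `recurrence`, `agent_event` with `payload.prompt`) come from the document. The sketch omits `chatId` because, per the document, AgentManager attaches it when it forwards the tasks to the TaskStore.

```javascript
// Hypothetical: result a tool could return to start its own polling.
function startPolling(toolName, args, everySeconds) {
  return {
    asyncTasks: [
      {
        kind: "poll_tool",
        payload: { toolName, args },
        recurrence: { type: "interval", everySeconds }
      }
    ]
  };
}

// Hypothetical one-shot checker: compares fetched items against a stored
// cursor and emits one agent_event per unseen item. Persisting the returned
// cursor between runs is left to the caller.
function checkForNews(items, cursor) {
  const fresh = items.filter((item) => item.id > cursor);
  if (fresh.length === 0) {
    return { cursor, asyncTasks: [] };
  }
  return {
    cursor: Math.max(...fresh.map((item) => item.id)),
    asyncTasks: fresh.map((item) => ({
      kind: "agent_event",
      payload: { prompt: item.summary }
    }))
  };
}

// Usage with made-up data: item 3 was already seen, items 4 and 5 are new.
const result = checkForNews(
  [
    { id: 3, summary: "old item" },
    { id: 4, summary: "new reply from Alice" },
    { id: 5, summary: "build finished" }
  ],
  3
);
console.log(result.cursor, result.asyncTasks.length);
```

Returning an empty `asyncTasks` array when nothing changed matches the document's intent that a quiet poll does not spend an agent turn.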