arisa 3.1.2 → 3.1.4

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/AGENTS.md CHANGED
@@ -3,10 +3,21 @@
3
3
  ## Architecture
4
4
  - Telegram transport handles inbound and outbound messaging.
5
5
  - Pi Agent keeps one session per authorized chat.
6
- - Every incoming or generated message or file becomes an artifact.
6
+ - Incoming messages and files (text, voice, photo, document) and generated files become artifacts.
7
7
  - A tool registry handles tool discovery, help lookup, config writes, and execution.
8
8
  - Tools are isolated and each one has its own manifest, entrypoint, and config defaults.
9
9
 
10
+ ## Runtime directory rules
11
+ Do not build runtime paths by hand. Use `src/runtime/paths.js`:
12
+ - `getToolDir(toolName)`: installed user tool package only; no runtime data here.
13
+ - `getToolStateDir(toolName)`: global tool infrastructure only: daemons, queues, shared browser sessions, model caches.
14
+ - `getChatToolStateDir(chatId, toolName)`: persistent user/chat data: tool DBs, indexes, inboxes, generated sites, vaults.
15
+ - `getChatArtifactsDir(chatId)` / `getChatArtifactsIndexFile(chatId)`: chat artifacts and artifact index. Artifacts are never global.
16
+ - `getChatToolConfigPath(chatId, toolName)`: chat-scoped config overrides.
17
+ - `getToolTmpDir(toolName)` / `getChatToolTmpDir(chatId, toolName)`: ephemeral scratch. Create only while a request runs; remove when empty.
18
+
19
+ Tools receive `chatId` from the registry. Any persisted or indexed user content must be scoped by chat. Avoid ad hoc roots like `~/.arisa/state/<toolName>`, `~/.arisa/state/chats`, or runtime data inside `~/.arisa/tools/<toolName>`.
20
+
10
21
  ## Main rule: everything is piped through artifacts
11
22
  A pipe transforms one input artifact into one output artifact.
12
23
  Examples:
@@ -18,6 +29,7 @@ Each tool declares in `tool.manifest.json`:
18
29
  - `input`: supported input types
19
30
  - `output`: produced output types
20
31
  - `configSchema`: required config fields
32
+ - `skillHints`: optional skills to apply when using or editing the tool
21
33
 
22
34
  ## Conceptual pipe model
23
35
  There are two different moments where pipes can happen:
@@ -34,12 +46,11 @@ There are two different moments where pipes can happen:
34
46
  - Pi Agent may decide to chain tools to achieve a user goal.
35
47
  - Example: text -> TTS audio, or future multi-step workflows.
36
48
 
37
- This distinction is critical. Not every pipe should be decided by Pi Agent at runtime. Some pipes are part of the transport/input normalization layer and must happen before reasoning.
49
+ Not every pipe should be decided by Pi Agent at runtime. Some pipes are part of the transport/input normalization layer and must happen before reasoning.
38
50
 
39
51
  ## Telegram inbound pipeline
40
- Current conceptual behavior:
41
52
  - text -> send directly to Pi Agent
42
- - audio/voice -> transcribe first -> send transcript to Pi Agent
53
+ - voice -> transcribe first -> send transcript to Pi Agent
43
54
  - image/document/other media -> keep as artifacts, and add normalization pipes when needed
44
55
 
45
56
  If inbound media was normalized before reasoning, Pi Agent should use the normalized result as the actual message content.
@@ -50,23 +61,23 @@ Before using a tool, inspect its help:
50
61
  - via the custom tool: `tool_help`
51
62
  - or by running the CLI with `--help`
52
63
 
53
- Every CLI must support:
64
+ Every CLI must support (the entrypoint comes from `manifest.entry`, currently always `index.js`):
54
65
  - `node index.js --help`
55
66
  - `node index.js run --request-file <json>`
56
67
 
57
68
  ### Tools that need daemons
58
- Some tools need a persistent process, for example to keep a browser session alive or a local model warm.
59
- Implement these tools with the shared daemon runtime instead of custom ad hoc process management:
69
+ A future tool may need a persistent process, for example to keep a browser session alive or a local model warm. The shared daemon runtime exists for this, but no bundled tool uses it yet.
70
+ When such a tool is built, implement it with the shared daemon runtime instead of custom ad hoc process management:
60
71
  - use `src/core/tools/daemon-runtime.js`
61
- - keep runtime files under the tool state directory (`stateDir/<toolName>`)
72
+ - keep runtime files under the tool state directory (`~/.arisa/state/tools/<toolName>`)
62
73
  - expose normal CLI behavior through `run --request-file`; callers should not manage daemon internals
63
74
  - use the runtime for `daemon.pid`, `daemon.log`, `status.json`, and `commands/*.request|processing|result.json`
64
75
  - keep one daemon owner per tool/session and avoid opening a second client over the same resource
65
76
  - use `beforeStart` only for tool-specific cleanup such as stale browser locks, without deleting persistent session/model data
66
77
  - keep daemon tools headless/server-safe by default when they are meant to run on VPS machines
67
78
 
68
- ## Pipe behavior in V1
69
- V1 does not have a full automatic planner yet. The agent should:
79
+ ## Manual pipe behavior
80
+ To run a pipe, the agent should:
70
81
  1. understand whether the needed pipe belongs to pre-reasoning normalization or post-reasoning tool chaining
71
82
  2. use `list_tools`
72
83
  3. use `tool_help` when it needs operational details
@@ -76,7 +87,28 @@ V1 does not have a full automatic planner yet. The agent should:
76
87
  Example manual pipe:
77
88
  1. `run_tool(openai-transcribe, artifact audio)`
78
89
  2. take the returned text `artifactId`
79
- 3. `run_tool(openai-tts, artifact text)` or `send_audio_reply(text)`
90
+ 3. `run_tool(openai-tts, artifact text)` or `send_media_reply(text)`
91
+
92
+ ## Async event queue flow
93
+ Beyond time-based scheduling, tools can drive an event queue that wakes the agent only when there is something to evaluate. Everything goes through the `asyncTask` (single) or `asyncTasks` (array) field the pipeline already supports; no new Pi tools are needed. The 1s poller drains tasks by `kind`:
94
+
95
+ - `agent_task`: a scheduled prompt. The poller delivers it as a prompt for Pi to fulfill (time-based work).
96
+ - `poll_tool`: a recurring checker the poller **runs directly as a tool** (no agent turn spent). The poller materializes its output with the same logic as `run_tool`, so any `agent_event` the checker emits is enqueued for the next tick. Its `recurrence` reschedules the next poll.
97
+ - `agent_event`: an incoming event. The poller delivers it as a prompt so Pi evaluates it and decides the next action (it may stay silent).
98
+
99
+ Tasks without a `runAt` fire immediately, so `agent_event` and the first `poll_tool` run on the next tick.
100
+
101
+ The poller dispatches all three kinds, but only `agent_task` is exercised by a bundled tool today (`schedule-agent-task`). The following is the pattern to follow when a checker tool is built:
102
+
103
+ How a tool wires its own polling:
104
+ 1. From any tool `run`, start the poll by returning an `asyncTask` (or several in `asyncTasks`):
105
+ `{ kind: "poll_tool", payload: { toolName, args }, recurrence: { type: "interval", everySeconds: N } }`.
106
+ 2. On each poll the checker tool (`toolName`) runs headless. It keeps its own cursor of seen state in its config/tmp per chat, so it knows what is new.
107
+ 3. When the checker finds something new, it emits an event from its `run`:
108
+ `{ kind: "agent_event", payload: { prompt: "<content to evaluate>" } }`.
109
+ 4. The agent reasons over the `agent_event` and decides what to do.
110
+
111
+ `list_scheduled_tasks`, `cancel_scheduled_task`, and `cancel_all_scheduled_tasks` are kind-agnostic, so they already work to inspect or cancel active polls.
80
112
 
81
113
  ## Missing config flow
82
114
  If `run_tool` returns `missingConfig`, the agent should:
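The async event queue flow added to AGENTS.md above can be sketched as a single poller tick. This is a minimal in-memory illustration, not Arisa's task store or poller. `createTaskQueue`, `tick`, `runTool`, `deliverPrompt`, and the `inbox-checker` tool are hypothetical. Times are plain millisecond numbers rather than ISO `runAt` strings, and the artifact materialization that the real poller shares with `run_tool` is omitted. Only the task kinds and the `kind`, `runAt`, `payload`, `recurrence`, and `asyncTask`/`asyncTasks` fields come from the text.

```javascript
// Hypothetical in-memory task queue. Times are milliseconds, not ISO strings.
function createTaskQueue() {
  const tasks = [];
  let nextId = 1;
  return {
    tasks,
    add(task, now) {
      // As in 3.1.4, a task without runAt is due immediately.
      const stored = { ...task, id: nextId++, runAt: task.runAt ?? now };
      tasks.push(stored);
      return stored;
    },
    remove(id) {
      const index = tasks.findIndex((t) => t.id === id);
      if (index !== -1) tasks.splice(index, 1);
    }
  };
}

// One hypothetical poller tick: collect due tasks first, then dispatch each by kind.
async function tick(queue, now, { runTool, deliverPrompt }) {
  const due = queue.tasks.filter((t) => t.runAt <= now);
  for (const task of due) {
    queue.remove(task.id);
    if (task.kind === "agent_task" || task.kind === "agent_event") {
      // Both kinds cost an agent turn: Pi gets a prompt and decides what to do.
      await deliverPrompt(task);
    } else if (task.kind === "poll_tool") {
      // The checker runs directly as a tool; no agent turn is spent.
      const { toolName, args, chatId } = task.payload;
      const result = await runTool(toolName, args, chatId);
      const emitted = result.asyncTasks || (result.asyncTask ? [result.asyncTask] : []);
      for (const next of emitted) {
        // Emitted events inherit chatId and, having no runAt, run on the next tick.
        queue.add({ ...next, payload: { chatId, ...next.payload } }, now);
      }
      if (task.recurrence?.type === "interval") {
        // Reschedule the same poll; add() assigns it a fresh id.
        queue.add({ ...task, runAt: now + task.recurrence.everySeconds * 1000 }, now);
      }
    }
  }
}
```

The point the sketch tries to capture is that a `poll_tool` tick costs no agent turn. Pi is prompted only when the checker emits an `agent_event`, and that event waits for the following tick because due tasks are collected before new ones are added.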
@@ -101,13 +133,26 @@ The default attitude is:
101
133
  - propose or start creating the needed tool
102
134
 
103
135
  When creating or editing tools:
104
- - use the shared path helpers and the runtime paths provided in the prompt instead of assuming fixed locations
105
- - consult the local skill for that workflow when building new tools
136
+ - use the path helpers in `src/runtime/paths.js`
137
+ - follow the existing bundled tools under `tools/` as the reference pattern for new tools
106
138
  - keep all help text, usage instructions, manifests, and user-facing operational strings in English
107
139
  - follow the One Thing Rule: each function or method should do one thing well; if it mixes low-level operations with high-level policy, split it into smaller focused units
108
140
 
141
+ ### Tool skill hints
142
+ Tools may declare skills in `tool.manifest.json`:
143
+
144
+ ```json
145
+ {
146
+ "skillHints": [
147
+ { "name": "stop-slop", "when": "writing public page copy" }
148
+ ]
149
+ }
150
+ ```
151
+
152
+ The tool registry resolves these from the installed skills directory and injects them into the tool request as `skills`. `list_tools` exposes the hints and `tool_help` shows their resolution status. Skills are guidance for the agent/tool; they are not separate runtime dependencies.
153
+
109
154
  ## Dependency installation
110
- Arisa installs tool dependencies itself.
155
+ Tool dependencies are installed as part of building or running the tool, not delegated to the user.
111
156
  - Prefer `pnpm install`.
112
157
  - Fall back to `npm install`.
113
158
  - Do not ask the user to do it manually.
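The dependency rule above (prefer `pnpm install`, fall back to `npm install`, never hand the step to the user) could look like the following. `installToolDependencies` is a hypothetical helper, not code from the package. Its `run` parameter defaults to Node's `spawnSync` so the fallback can be exercised with a fake runner, and no extra flags such as lockfile options are passed because AGENTS.md does not specify any.

```javascript
import { spawnSync } from "node:child_process";

// Hypothetical helper: try pnpm first, then npm, inside the tool's directory.
// Returns the package manager that succeeded, or throws if both fail.
function installToolDependencies(toolDir, run = spawnSync) {
  for (const manager of ["pnpm", "npm"]) {
    const result = run(manager, ["install"], { cwd: toolDir, stdio: "inherit" });
    // spawnSync reports a missing binary via result.error (e.g. ENOENT) instead of throwing.
    if (!result.error && result.status === 0) return manager;
  }
  throw new Error(`Dependency install failed in ${toolDir} with both pnpm and npm`);
}
```

Checking both `error` and `status` matters here: a machine without pnpm produces a spawn error with a `null` status, and that case should fall through to npm just like a failed pnpm install.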
package/README.md CHANGED
@@ -145,13 +145,13 @@ node src/index.js --telegram.token <token>
145
145
  With this mode, Arisa creates `~/.arisa/state/config.json` without prompts and applies these defaults when not provided:
146
146
 
147
147
  - `pi.provider`: `openai-codex` when available, otherwise first provider from the current Pi provider list
148
- - `pi.model`: first model after bootstrap sorting (currently prioritizes `openai-codex/gpt-5.4`)
148
+ - `pi.model`: first model after bootstrap sorting (currently prioritizes `openai-codex/gpt-5.5`)
149
149
  - `telegram.maxChatIds`: `1`
150
150
 
151
151
  Supported overrides:
152
152
 
153
153
  ```bash
154
- node src/index.js --telegram.token <token> --telegram.maxChatIds 3 --pi.provider openai-codex --pi.model gpt-5.4 --pi.apiKey <optional-provider-key>
154
+ node src/index.js --telegram.token <token> --telegram.maxChatIds 3 --pi.provider openai-codex --pi.model gpt-5.5 --pi.apiKey <optional-provider-key>
155
155
  ```
156
156
 
157
157
  Notes:
@@ -171,7 +171,7 @@ For providers with internal Pi login support, such as Codex, leaving the API key
171
171
 
172
172
  For example, selecting:
173
173
 
174
- - `openai-codex/gpt-5.4`
174
+ - `openai-codex/gpt-5.5`
175
175
 
176
176
  allows Arisa to authenticate through Pi's Codex OAuth flow instead of requiring a normal OpenAI API key.
177
177
 
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "arisa",
3
- "version": "3.1.2",
3
+ "version": "3.1.4",
4
4
  "description": "Telegram + Pi Agent modular assistant",
5
5
  "type": "module",
6
6
  "main": "src/index.js",
@@ -132,6 +132,49 @@ export class AgentManager {
132
132
  return ctx;
133
133
  }
134
134
 
135
+ async runTool({ name, request, chatId }) {
136
+ await this.toolRegistry.load();
137
+ this.logger?.log("agent", `run_tool ${name}`);
138
+ const chatArtifactStore = this.artifactStore.forChat(chatId);
139
+ const result = await this.toolRegistry.run({ name, request, chatId });
140
+
141
+ if (result.output?.text) {
142
+ const outArtifact = await chatArtifactStore.createText({
143
+ text: result.output.text,
144
+ source: { type: "tool", toolName: name },
145
+ metadata: { tool: name }
146
+ });
147
+ result.output.artifactId = outArtifact.id;
148
+ }
149
+
150
+ if (result.output?.filePath) {
151
+ const generated = await chatArtifactStore.createFromFile({
152
+ originalPath: result.output.filePath,
153
+ fileName: result.output.fileName || path.basename(result.output.filePath),
154
+ kind: result.output.kind || "file",
155
+ mimeType: result.output.mimeType || "application/octet-stream",
156
+ source: { type: "tool", toolName: name },
157
+ metadata: { tool: name }
158
+ });
159
+ result.output.artifactId = generated.id;
160
+ await unlink(result.output.filePath).catch(() => {});
161
+ }
162
+
163
+ if (result.asyncTask || result.asyncTasks?.length) {
164
+ const scheduled = await this.taskStore.addMany(
165
+ result.asyncTasks || [result.asyncTask],
166
+ {
167
+ payload: { chatId },
168
+ source: { type: "tool", toolName: name, chatId }
169
+ }
170
+ );
171
+ result.asyncTasks = scheduled;
172
+ delete result.asyncTask;
173
+ }
174
+
175
+ return result;
176
+ }
177
+
135
178
  createTools(telegram, chatId) {
136
179
  const chatArtifactStore = this.artifactStore.forChat(chatId);
137
180
 
@@ -160,6 +203,18 @@ export class AgentManager {
160
203
  return { content: [{ type: "text", text: help }], details: { help } };
161
204
  }
162
205
  }),
206
+ defineTool({
207
+ name: "tool_skills",
208
+ label: "Tool skills",
209
+ description: "Show skills assigned to a CLI tool via its manifest skillHints.",
210
+ parameters: Type.Object({ name: Type.String() }),
211
+ execute: async (_id, params) => {
212
+ await this.toolRegistry.load();
213
+ const skills = await this.toolRegistry.resolveSkills(params.name);
214
+ const visible = skills.map(({ content, ...item }) => item);
215
+ return { content: [{ type: "text", text: JSON.stringify(visible, null, 2) }], details: visible };
216
+ }
217
+ }),
163
218
  defineTool({
164
219
  name: "set_tool_config",
165
220
  label: "Set tool config",
@@ -182,8 +237,6 @@ export class AgentManager {
182
237
  args: Type.Optional(Type.Record(Type.String(), Type.String()))
183
238
  }),
184
239
  execute: async (_id, params) => {
185
- await this.toolRegistry.load();
186
- this.logger?.log("agent", `run_tool ${params.name}`);
187
240
  let artifact = null;
188
241
  if (params.artifactId) {
189
242
  artifact = await chatArtifactStore.get(params.artifactId);
@@ -191,7 +244,7 @@ export class AgentManager {
191
244
  return { content: [{ type: "text", text: `Artifact not found: ${params.artifactId}` }], details: { ok: false } };
192
245
  }
193
246
  }
194
- const result = await this.toolRegistry.run({
247
+ const result = await this.runTool({
195
248
  name: params.name,
196
249
  request: {
197
250
  artifact,
@@ -201,40 +254,6 @@ export class AgentManager {
201
254
  chatId
202
255
  });
203
256
 
204
- if (result.output?.text) {
205
- const outArtifact = await chatArtifactStore.createText({
206
- text: result.output.text,
207
- source: { type: "tool", toolName: params.name },
208
- metadata: { tool: params.name }
209
- });
210
- result.output.artifactId = outArtifact.id;
211
- }
212
-
213
- if (result.output?.filePath) {
214
- const generated = await chatArtifactStore.createFromFile({
215
- originalPath: result.output.filePath,
216
- fileName: result.output.fileName || path.basename(result.output.filePath),
217
- kind: result.output.kind || "file",
218
- mimeType: result.output.mimeType || "application/octet-stream",
219
- source: { type: "tool", toolName: params.name },
220
- metadata: { tool: params.name }
221
- });
222
- result.output.artifactId = generated.id;
223
- await unlink(result.output.filePath).catch(() => {});
224
- }
225
-
226
- if (result.asyncTask || result.asyncTasks?.length) {
227
- const scheduled = await this.taskStore.addMany(
228
- result.asyncTasks || [result.asyncTask],
229
- {
230
- payload: { chatId },
231
- source: { type: "tool", toolName: params.name, chatId }
232
- }
233
- );
234
- result.asyncTasks = scheduled;
235
- delete result.asyncTask;
236
- }
237
-
238
257
  return {
239
258
  content: [{ type: "text", text: JSON.stringify(result, null, 2) }],
240
259
  details: result
@@ -1,5 +1,5 @@
1
1
  import { fileURLToPath } from "node:url";
2
- import { arisaHomeDir, chatsDir, stateDir, toolsDir } from "../../runtime/paths.js";
2
+ import { arisaHomeDir, chatsDir, stateDir, toolStateDir, toolsDir } from "../../runtime/paths.js";
3
3
 
4
4
  export const arisaInstallDir = fileURLToPath(new URL("../../..", import.meta.url));
5
5
  export const bundledToolsDir = fileURLToPath(new URL("../../../tools", import.meta.url));
@@ -10,6 +10,7 @@ export function buildAgentRuntimeContext() {
10
10
  `arisaInstallDir: ${arisaInstallDir}`,
11
11
  `bundledToolsDir: ${bundledToolsDir}`,
12
12
  `userToolsDir: ${toolsDir}`,
13
+ `toolStateDir: ${toolStateDir}`,
13
14
  `chatsDir: ${chatsDir}`,
14
15
  `stateDir: ${stateDir}`
15
16
  ].join("\n");
@@ -19,8 +19,9 @@ function looksLikeAudioTranscriptionTool(tool) {
19
19
  return /transcri|whisper|speech.?to.?text|audio.?to.?text/i.test(`${tool.name} ${tool.description || ""}`);
20
20
  }
21
21
 
22
- function shouldNormalizeAudioToText(artifact, desiredMimeType) {
23
- return artifact?.mimeType?.startsWith("audio/") && desiredMimeType === "text/plain";
22
+ export function shouldNormalizeArtifactToText(artifact, desiredMimeType = "text/plain") {
23
+ return desiredMimeType === "text/plain"
24
+ && (artifact?.mimeType?.startsWith("audio/") || artifact?.mimeType?.startsWith("video/"));
24
25
  }
25
26
 
26
27
  export function selectPipeTool({ toolRegistry, artifact, desiredMimeType }) {
@@ -28,7 +29,7 @@ export function selectPipeTool({ toolRegistry, artifact, desiredMimeType }) {
28
29
  .filter((tool) => toolSupportsArtifact(tool, artifact))
29
30
  .filter((tool) => toolProduces(tool, desiredMimeType));
30
31
 
31
- if (shouldNormalizeAudioToText(artifact, desiredMimeType)) {
32
+ if (shouldNormalizeArtifactToText(artifact, desiredMimeType)) {
32
33
  return tools.find(looksLikeAudioTranscriptionTool) || null;
33
34
  }
34
35
 
@@ -44,7 +45,7 @@ export async function normalizeArtifactForReasoning({
44
45
  }) {
45
46
  if (!artifact) return { normalizedArtifact: null, toolResult: null, toolName: "" };
46
47
 
47
- if (!shouldNormalizeAudioToText(artifact, desiredMimeType)) {
48
+ if (!shouldNormalizeArtifactToText(artifact, desiredMimeType)) {
48
49
  return { normalizedArtifact: null, toolResult: null, toolName: "" };
49
50
  }
50
51
 
@@ -0,0 +1,71 @@
1
+ import os from "node:os";
2
+ import path from "node:path";
3
+ import { readFile } from "node:fs/promises";
4
+
5
+ const defaultSkillsDir = path.join(os.homedir(), ".agents", "skills");
6
+
7
+ function parseFrontmatter(source = "") {
8
+ if (!source.startsWith("---")) return {};
9
+ const end = source.indexOf("\n---", 3);
10
+ if (end === -1) return {};
11
+ const block = source.slice(3, end).trim();
12
+ const data = {};
13
+ for (const line of block.split("\n")) {
14
+ const match = line.match(/^([A-Za-z0-9_-]+):\s*(.*)$/);
15
+ if (match) data[match[1]] = match[2].replace(/^['"]|['"]$/g, "");
16
+ }
17
+ return data;
18
+ }
19
+
20
+ function normalizeSkillHint(value) {
21
+ if (typeof value === "string") return { name: value, when: "" };
22
+ if (value && typeof value === "object" && value.name) {
23
+ return { name: String(value.name), when: String(value.when || "") };
24
+ }
25
+ return null;
26
+ }
27
+
28
+ export class SkillRegistry {
29
+ constructor({ skillsDir = defaultSkillsDir } = {}) {
30
+ this.skillsDir = skillsDir;
31
+ this.cache = new Map();
32
+ }
33
+
34
+ async get(name) {
35
+ const key = String(name || "").trim();
36
+ if (!key) return null;
37
+ if (this.cache.has(key)) return this.cache.get(key);
38
+
39
+ const file = path.join(this.skillsDir, key, "SKILL.md");
40
+ try {
41
+ const content = await readFile(file, "utf8");
42
+ const metadata = parseFrontmatter(content);
43
+ const skill = {
44
+ name: metadata.name || key,
45
+ description: metadata.description || "",
46
+ path: file,
47
+ content
48
+ };
49
+ this.cache.set(key, skill);
50
+ return skill;
51
+ } catch {
52
+ this.cache.set(key, null);
53
+ return null;
54
+ }
55
+ }
56
+
57
+ normalizeHints(manifest = {}) {
58
+ const raw = manifest.skillHints || manifest.skills || [];
59
+ if (!Array.isArray(raw)) return [];
60
+ return raw.map(normalizeSkillHint).filter(Boolean);
61
+ }
62
+
63
+ async resolveHints(hints = []) {
64
+ const resolved = [];
65
+ for (const hint of hints) {
66
+ const skill = await this.get(hint.name);
67
+ resolved.push({ ...hint, found: Boolean(skill), skill });
68
+ }
69
+ return resolved;
70
+ }
71
+ }
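The new `SkillRegistry` resolves a hint by reading `~/.agents/skills/<name>/SKILL.md` and parsing its frontmatter with a deliberately small reader. The sketch below copies `parseFrontmatter` and `normalizeSkillHint` from the file above and inlines `normalizeHints` as a plain function (in the package it is a method) to show what they accept. The sample skill text and the `seo` hint are made up. Only flat `key: value` lines are read, so a YAML folded or multi-line `description` is not expanded.

```javascript
// Copied from skill-registry.js: reads flat `key: value` lines between `---` fences.
function parseFrontmatter(source = "") {
  if (!source.startsWith("---")) return {};
  const end = source.indexOf("\n---", 3);
  if (end === -1) return {};
  const block = source.slice(3, end).trim();
  const data = {};
  for (const line of block.split("\n")) {
    const match = line.match(/^([A-Za-z0-9_-]+):\s*(.*)$/);
    // Strips one leading and one trailing quote character, if present.
    if (match) data[match[1]] = match[2].replace(/^['"]|['"]$/g, "");
  }
  return data;
}

// Copied: a hint is either a bare skill name or an object with a name.
function normalizeSkillHint(value) {
  if (typeof value === "string") return { name: value, when: "" };
  if (value && typeof value === "object" && value.name) {
    return { name: String(value.name), when: String(value.when || "") };
  }
  return null;
}

// Method body from the class, as a plain function; also accepts a legacy `skills` key.
function normalizeHints(manifest = {}) {
  const raw = manifest.skillHints || manifest.skills || [];
  if (!Array.isArray(raw)) return [];
  return raw.map(normalizeSkillHint).filter(Boolean);
}

const simple = parseFrontmatter('---\nname: stop-slop\ndescription: "Trim filler words"\n---\n# Body');
// { name: "stop-slop", description: "Trim filler words" }

const folded = parseFrontmatter("---\nname: x\ndescription: >\n  wrapped text\n---\n");
// { name: "x", description: ">" }: the indented continuation line is ignored

const hints = normalizeHints({
  skillHints: ["stop-slop", { name: "seo", when: "public pages" }, { when: "no name" }, 42]
});
// [{ name: "stop-slop", when: "" }, { name: "seo", when: "public pages" }]
```

As written, `get()` also caches misses (`this.cache.set(key, null)`), and `ToolRegistry` creates its `SkillRegistry` once in the constructor, so a skill installed after its first lookup appears to stay unresolved until a new `ToolRegistry` is constructed.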
@@ -27,7 +27,7 @@ function normalizeTask(task, defaults = {}) {
27
27
  createdAt: task.createdAt || new Date().toISOString(),
28
28
  updatedAt: new Date().toISOString(),
29
29
  kind: task.kind,
30
- runAt: task.runAt,
30
+ runAt: task.runAt || new Date().toISOString(),
31
31
  payload: {
32
32
  ...(defaults.payload || {}),
33
33
  ...(task.payload || {})
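The one-line `runAt` change is what lets `agent_event` and first `poll_tool` tasks fire on the next tick. The sketch below reduces `normalizeTask` to the fields visible in this hunk (the real function builds more than shown here) and adds a hypothetical `isDue` check. `now` is injected for testability, whereas the package calls `new Date()` inline.

```javascript
// Hypothetical reduction of normalizeTask() to the fields shown in the hunk.
function normalizeTask(task, defaults = {}, now = new Date()) {
  const stamp = now.toISOString();
  return {
    createdAt: task.createdAt || stamp,
    updatedAt: stamp,
    kind: task.kind,
    // 3.1.4: a missing runAt now means "due immediately" instead of undefined.
    runAt: task.runAt || stamp,
    payload: {
      ...(defaults.payload || {}),
      ...(task.payload || {})
    }
  };
}

// Hypothetical due check a 1-second poller could apply.
function isDue(task, now = new Date()) {
  return Date.parse(task.runAt) <= now.getTime();
}

const now = new Date("2025-01-01T00:00:00.000Z");
const event = normalizeTask(
  { kind: "agent_event", payload: { prompt: "New invoice in inbox" } },
  { payload: { chatId: 42 } },
  now
);
console.log(event.runAt);       // 2025-01-01T00:00:00.000Z
console.log(isDue(event, now)); // true
```

With the old `runAt: task.runAt`, `Date.parse(undefined)` is `NaN`, so a comparison like this sketch's `isDue` would never be true. Because the task's own `payload` is spread after `defaults.payload`, a field a tool sets itself (for example `chatId`) takes precedence over the default the registry passes in.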
@@ -3,10 +3,10 @@ import { spawn } from "node:child_process";
3
3
  import { openSync } from "node:fs";
4
4
  import { mkdir, readFile, readdir, rename, rm, unlink, writeFile } from "node:fs/promises";
5
5
  import path from "node:path";
6
- import { stateDir } from "../../runtime/paths.js";
6
+ import { getToolStateDir } from "../../runtime/paths.js";
7
7
 
8
8
  export function daemonPaths(toolName) {
9
- const root = path.join(stateDir, toolName);
9
+ const root = getToolStateDir(toolName);
10
10
  return {
11
11
  root,
12
12
  commandsDir: path.join(root, "commands"),
@@ -1,10 +1,11 @@
1
- import { mkdir, readdir, readFile, unlink, writeFile } from "node:fs/promises";
1
+ import { mkdir, readdir, readFile, rmdir, unlink, writeFile } from "node:fs/promises";
2
2
  import path from "node:path";
3
3
  import { spawn } from "node:child_process";
4
4
  import { fileURLToPath } from "node:url";
5
5
  import { getToolConfigPath, getToolTmpDir, getChatToolTmpDir, toolsDir as userToolsRoot } from "../../runtime/paths.js";
6
6
  import { loadToolConfig, parseConfigModule, writeToolConfig } from "./tool-config.js";
7
7
  import { normalizeToolResult } from "./tool-result.js";
8
+ import { SkillRegistry } from "../skills/skill-registry.js";
8
9
 
9
10
  const bundledToolsRoot = fileURLToPath(new URL("../../../tools", import.meta.url));
10
11
  const toolRoots = [
@@ -27,6 +28,7 @@ export class ToolRegistry {
27
28
  constructor({ logger } = {}) {
28
29
  this.logger = logger;
29
30
  this.tools = new Map();
31
+ this.skillRegistry = new SkillRegistry();
30
32
  }
31
33
 
32
34
  async load() {
@@ -52,8 +54,10 @@ export class ToolRegistry {
52
54
  const configSource = await readFile(configPath, "utf8");
53
55
  const defaults = parseConfigModule(configSource);
54
56
  const config = await loadToolConfig(manifest.name, defaults);
57
+ const skillHints = this.skillRegistry.normalizeHints(manifest);
55
58
  this.tools.set(manifest.name, {
56
59
  ...manifest,
60
+ skillHints,
57
61
  dir: toolDir,
58
62
  entry: path.join(toolDir, manifest.entry || "index.js"),
59
63
  localConfigPath: configPath,
@@ -77,7 +81,8 @@ export class ToolRegistry {
77
81
  description: tool.description,
78
82
  input: tool.input,
79
83
  output: tool.output,
80
- configSchema: tool.configSchema || {}
84
+ configSchema: tool.configSchema || {},
85
+ skillHints: tool.skillHints || []
81
86
  }));
82
87
  }
83
88
 
@@ -89,7 +94,29 @@ export class ToolRegistry {
89
94
  const tool = this.get(name);
90
95
  if (!tool) throw new Error(`Tool not found: ${name}`);
91
96
  const result = await runProcess("node", [tool.entry, "--help"], { cwd: tool.dir, env: process.env });
92
- return result.stdout || result.stderr;
97
+ const help = result.stdout || result.stderr;
98
+ const skills = await this.resolveSkills(name);
99
+ if (!skills.length) return help;
100
+ const skillHelp = skills.map((item) => [
101
+ `- ${item.name}${item.when ? ` (${item.when})` : ""}`,
102
+ item.description ? ` ${item.description}` : null,
103
+ item.found ? ` path: ${item.path}` : " warning: skill not found"
104
+ ].filter(Boolean).join("\n")).join("\n");
105
+ return `${help}\n\nAssigned skills:\n${skillHelp}\n`;
106
+ }
107
+
108
+ async resolveSkills(name) {
109
+ const tool = this.get(name);
110
+ if (!tool) throw new Error(`Tool not found: ${name}`);
111
+ const hints = await this.skillRegistry.resolveHints(tool.skillHints || []);
112
+ return hints.map((hint) => ({
113
+ name: hint.name,
114
+ when: hint.when,
115
+ found: hint.found,
116
+ description: hint.skill?.description || "",
117
+ path: hint.skill?.path || "",
118
+ content: hint.skill?.content || ""
119
+ }));
93
120
  }
94
121
 
95
122
  async resolveConfigForChat(name, chatId) {
@@ -121,12 +148,19 @@ export class ToolRegistry {
121
148
  const tmpDir = chatId != null ? getChatToolTmpDir(chatId, name) : getToolTmpDir(name);
122
149
  await mkdir(tmpDir, { recursive: true });
123
150
  const requestFile = path.join(tmpDir, `.request-${Date.now()}.json`);
124
- await writeFile(requestFile, `${JSON.stringify(request, null, 2)}\n`, "utf8");
151
+ const skills = await this.resolveSkills(name);
152
+ const enrichedRequest = { ...request, chatId, skills };
153
+ await writeFile(requestFile, `${JSON.stringify(enrichedRequest, null, 2)}\n`, "utf8");
125
154
  const result = await runProcess("node", [tool.entry, "run", "--request-file", requestFile], {
126
155
  cwd: tool.dir,
127
156
  env: process.env
128
157
  });
129
158
  await unlink(requestFile).catch(() => {});
159
+ await rmdir(tmpDir).catch(() => {});
160
+ if (chatId != null) {
161
+ await rmdir(path.dirname(tmpDir)).catch(() => {});
162
+ await rmdir(path.dirname(path.dirname(tmpDir))).catch(() => {});
163
+ }
130
164
  try {
131
165
  const parsed = JSON.parse(result.stdout || result.stderr);
132
166
  const normalized = normalizeToolResult(name, parsed);
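`ToolRegistry.run()` now removes the scratch directories it created, which is how the AGENTS.md rule "remove when empty" for tmp dirs is met. The cleanup relies on `rmdir` refusing non-empty directories, so each level is dropped only if nothing else is in it. The sketch below replays that sequence against a throwaway directory. `cleanupChatToolTmp` is a hypothetical name, and the tool names are borrowed from the AGENTS.md pipe example. It uses top-level `await`, so it must run as an ES module (the package itself declares `"type": "module"`).

```javascript
import { access, mkdir, mkdtemp, rmdir, unlink, writeFile } from "node:fs/promises";
import os from "node:os";
import path from "node:path";

// Same order as the 3.1.4 hunk: request file first, then each directory level.
// rmdir fails on non-empty directories and the error is swallowed, so a level
// still holding another tool's scratch space survives.
async function cleanupChatToolTmp(tmpDir, requestFile) {
  await unlink(requestFile).catch(() => {});
  await rmdir(tmpDir).catch(() => {});                             // chats/<id>/tmp/tools/<tool>
  await rmdir(path.dirname(tmpDir)).catch(() => {});               // chats/<id>/tmp/tools
  await rmdir(path.dirname(path.dirname(tmpDir))).catch(() => {}); // chats/<id>/tmp
}

const exists = (p) => access(p).then(() => true, () => false);

// Throwaway home with two tools holding scratch dirs in chat 42.
const home = await mkdtemp(path.join(os.tmpdir(), "arisa-sketch-"));
const chatTmp = path.join(home, "chats", "42", "tmp");
const ttsTmp = path.join(chatTmp, "tools", "openai-tts");
const transcribeTmp = path.join(chatTmp, "tools", "openai-transcribe");
await mkdir(ttsTmp, { recursive: true });
await mkdir(transcribeTmp, { recursive: true });
const ttsRequest = path.join(ttsTmp, `.request-${Date.now()}.json`);
await writeFile(ttsRequest, "{}\n", "utf8");

await cleanupChatToolTmp(ttsTmp, ttsRequest);
console.log(await exists(ttsTmp));        // false: emptied, then removed
console.log(await exists(transcribeTmp)); // true: not touched
console.log(await exists(chatTmp));       // true: tmp/tools is still not empty
```

One caveat, inferred from the hunk rather than observed: two concurrent runs of the same tool in the same chat share one `tmpDir`, and 3.1.4 added `await this.resolveSkills(name)` between `mkdir` and `writeFile`, so one run's `rmdir` could in principle remove the directory just before the other writes its request file.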
@@ -90,7 +90,7 @@ function sortBootstrapProviders(providers) {
90
90
 
91
91
  function sortBootstrapModels(provider, models) {
92
92
  const preferred = {
93
- "openai-codex": ["gpt-5.4"]
93
+ "openai-codex": ["gpt-5.5"]
94
94
  };
95
95
 
96
96
  const priority = preferred[provider] || [];
@@ -10,6 +10,7 @@ export const serviceLogFile = path.join(stateDir, "arisa.log");
10
10
  export const tasksFile = path.join(stateDir, "tasks.json");
11
11
  export const toolsDir = path.join(arisaHomeDir, "tools");
12
12
  export const chatsDir = path.join(arisaHomeDir, "chats");
13
+ export const toolStateDir = path.join(stateDir, "tools");
13
14
 
14
15
  export function getChatDir(chatId) {
15
16
  return path.join(chatsDir, String(chatId));
@@ -23,6 +24,10 @@ export function getChatArtifactsIndexFile(chatId) {
23
24
  return path.join(getChatDir(chatId), "state", "artifacts.json");
24
25
  }
25
26
 
27
+ export function getChatToolStateDir(chatId, toolName) {
28
+ return path.join(getChatDir(chatId), "state", "tools", toolName);
29
+ }
30
+
26
31
  export function getChatPiSessionsDir(chatId) {
27
32
  return path.join(getChatDir(chatId), "state", "pi-sessions");
28
33
  }
@@ -35,24 +40,28 @@ export function getToolConfigPath(toolName) {
35
40
  return path.join(getToolDir(toolName), "config.js");
36
41
  }
37
42
 
38
- export function getChatToolConfigPath(chatId, toolName) {
39
- return path.join(getChatDir(chatId), "tools", toolName, "config.js");
43
+ export function getChatConfigDir(chatId) {
44
+ return path.join(getChatDir(chatId), "config");
40
45
  }
41
46
 
42
- export function getToolRuntimeDir(toolName) {
43
- return getToolDir(toolName);
47
+ export function getChatTmpDir(chatId) {
48
+ return path.join(getChatDir(chatId), "tmp");
49
+ }
50
+
51
+ export function getChatToolConfigPath(chatId, toolName) {
52
+ return path.join(getChatConfigDir(chatId), "tools", toolName, "config.js");
44
53
  }
45
54
 
46
- export function getToolOutDir(toolName) {
47
- return path.join(getToolRuntimeDir(toolName), "out");
55
+ export function getToolStateDir(toolName) {
56
+ return path.join(toolStateDir, toolName);
48
57
  }
49
58
 
50
59
  export function getToolTmpDir(toolName) {
51
- return path.join(getToolRuntimeDir(toolName), "tmp");
60
+ return path.join(getToolStateDir(toolName), "tmp");
52
61
  }
53
62
 
54
63
  export function getChatToolTmpDir(chatId, toolName) {
55
- return path.join(getChatDir(chatId), "tools", toolName, "tmp");
64
+ return path.join(getChatTmpDir(chatId), "tools", toolName);
56
65
  }
57
66
 
58
67
  export async function ensureArisaHome() {
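Putting the `paths.js` changes together, the 3.1.4 layout can be reproduced as below. This is an illustrative re-derivation, not the module itself. `arisaPaths` is a hypothetical wrapper parameterized by the home directory, and `path.posix` is used so the output is platform-independent. `stateDir` is taken to be `<home>/state` (matching the README's `~/.arisa/state/config.json`), and `getToolDir` is assumed to be `<home>/tools/<tool>` because its body is not part of this diff.

```javascript
import path from "node:path";

// Hypothetical wrapper re-deriving the 3.1.4 helpers from a given home dir.
function arisaPaths(home) {
  const p = path.posix;
  const stateDir = p.join(home, "state");
  const toolsDir = p.join(home, "tools"); // assumed root of getToolDir()
  const chatsDir = p.join(home, "chats");
  const toolStateDir = p.join(stateDir, "tools");
  const getChatDir = (chatId) => p.join(chatsDir, String(chatId));
  const getToolStateDir = (tool) => p.join(toolStateDir, tool);
  return {
    // Installed tool package only; no runtime data.
    getToolDir: (tool) => p.join(toolsDir, tool),
    // Global tool infrastructure (daemons, queues, caches) and its scratch space.
    getToolStateDir,
    getToolTmpDir: (tool) => p.join(getToolStateDir(tool), "tmp"),
    // Per-chat persistent data, config overrides, and scratch space.
    getChatToolStateDir: (chatId, tool) => p.join(getChatDir(chatId), "state", "tools", tool),
    getChatToolConfigPath: (chatId, tool) => p.join(getChatDir(chatId), "config", "tools", tool, "config.js"),
    getChatToolTmpDir: (chatId, tool) => p.join(getChatDir(chatId), "tmp", "tools", tool)
  };
}

const paths = arisaPaths("/home/ana/.arisa");
for (const [name, fn] of Object.entries(paths)) {
  // Two-argument helpers are chat-scoped; one-argument helpers are global.
  const resolved = fn.length === 2 ? fn(42, "openai-tts") : fn("openai-tts");
  console.log(`${name.padEnd(22)} ${resolved}`);
}
```

Compared with 3.1.2, chat config overrides move from `chats/<id>/tools/<tool>/config.js` to `chats/<id>/config/tools/<tool>/config.js`, and tool state, including the daemon root, moves from `state/<tool>` to `state/tools/<tool>`. No migration of existing files is visible in the hunks shown here.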
@@ -3,7 +3,7 @@ import path from "node:path";
3
3
  import { authorizeChat } from "./auth.js";
4
4
  import { captureIncomingArtifact } from "./media.js";
5
5
  import { renderTelegramHtml } from "./text-format.js";
6
- import { normalizeArtifactForReasoning } from "../../core/artifacts/normalize-for-reasoning.js";
6
+ import { normalizeArtifactForReasoning, shouldNormalizeArtifactToText } from "../../core/artifacts/normalize-for-reasoning.js";
7
7
 
8
8
  function quotedMessageSummary(message) {
9
9
  if (!message) return [];
@@ -63,11 +63,11 @@ function buildPrompt({ ctx, artifact, transcript, toolResult }) {
63
63
  if (transcript) {
64
64
  parts.push(`transcriptArtifactId: ${transcript.id}`);
65
65
  parts.push(`transcriptText: ${transcript.text}`);
66
- parts.push(`Important: the incoming audio has already been transcribed. Use the transcript as the user message content. Do not answer with a raw transcription unless the user explicitly asked for one.`);
66
+ parts.push(`Important: the incoming media has already been transcribed. Use the transcript as the user message content. Do not answer with a raw transcription unless the user explicitly asked for one.`);
67
67
  }
68
- if (artifact?.kind === "audio" && !transcript && toolResult) {
69
- parts.push(`audioNormalizationResult: ${JSON.stringify(toolResult)}`);
70
- parts.push(`Important: pre-reasoning audio normalization could not be completed, so you do not have a transcript for this voice/audio message.`);
68
+ if (shouldNormalizeArtifactToText(artifact) && !transcript && toolResult) {
69
+ parts.push(`mediaNormalizationResult: ${JSON.stringify(toolResult)}`);
70
+ parts.push(`Important: pre-reasoning media normalization could not be completed, so you do not have a transcript for this audio/video message.`);
71
71
  }
72
72
 
73
73
  parts.push(`If you need a CLI tool, use list_tools/tool_help/run_tool.`);
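Across `normalize-for-reasoning.js` and the Telegram prompt builders, the audio-only check is replaced by the exported `shouldNormalizeArtifactToText`. It also routes `video/*` artifacts to pre-reasoning transcription, and it defaults `desiredMimeType` to `"text/plain"` so callers like `buildPrompt` can pass just the artifact. The predicate below is copied from the hunk; the sample artifact objects are hypothetical and only their `mimeType` is read.

```javascript
// Copied from normalize-for-reasoning.js (3.1.4): transcribe only when plain text
// is wanted and the artifact is audio or video.
function shouldNormalizeArtifactToText(artifact, desiredMimeType = "text/plain") {
  return desiredMimeType === "text/plain"
    && (artifact?.mimeType?.startsWith("audio/") || artifact?.mimeType?.startsWith("video/"));
}

// Hypothetical artifacts; only mimeType matters to the predicate.
const voice = { kind: "audio", mimeType: "audio/ogg" };
const clip = { kind: "video", mimeType: "video/mp4" };
const photo = { kind: "photo", mimeType: "image/jpeg" };

console.log(shouldNormalizeArtifactToText(voice));               // true
console.log(shouldNormalizeArtifactToText(clip));                // true
console.log(shouldNormalizeArtifactToText(photo));               // false
console.log(shouldNormalizeArtifactToText(voice, "audio/mpeg")); // false
console.log(shouldNormalizeArtifactToText(null));                // undefined (falsy)
```

Two details follow from the code. The function returns `undefined` rather than `false` when the artifact or its `mimeType` is missing, which is harmless in the `if` checks shown. And `selectPipeTool` still picks only tools that pass `toolSupportsArtifact` and match the transcription name regex, so whether a video actually gets transcribed presumably depends on a transcription tool whose manifest accepts video input.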
@@ -114,10 +114,10 @@ async function buildAsyncTaskPrompt({ task, artifactStore, toolRegistry, logger
114
114
  logger?.log("tasks", `artifact ${artifact.id} normalized to ${normalizedArtifact.id}`);
115
115
  parts.push(`transcriptArtifactId: ${normalizedArtifact.id}`);
116
116
  parts.push(`transcriptText: ${normalizedArtifact.text}`);
117
- parts.push("Important: the attached audio artifact has already been normalized for reasoning. Use the transcript as the message content.");
118
- } else if (artifact.kind === "audio" && toolResult) {
119
- parts.push(`audioNormalizationResult: ${JSON.stringify(toolResult)}`);
120
- parts.push("Important: pre-reasoning audio normalization could not be completed, so you do not have a transcript for this audio artifact.");
117
+ parts.push("Important: the attached media artifact has already been normalized for reasoning. Use the transcript as the message content.");
118
+ } else if (shouldNormalizeArtifactToText(artifact) && toolResult) {
119
+ parts.push(`mediaNormalizationResult: ${JSON.stringify(toolResult)}`);
120
+ parts.push("Important: pre-reasoning media normalization could not be completed, so you do not have a transcript for this audio/video artifact.");
121
121
  }
122
122
  } else {
123
123
  parts.push(`artifactId: ${task.payload.artifactId}`);
@@ -130,6 +130,18 @@ async function buildAsyncTaskPrompt({ task, artifactStore, toolRegistry, logger
130
130
  return parts.filter(Boolean).join("\n");
131
131
  }
132
132
 
133
+ function buildAsyncEventPrompt(task) {
134
+ return [
135
+ "External event arrived.",
136
+ `taskId: ${task.id}`,
137
+ `chatId: ${task.payload.chatId}`,
138
+ task.payload.prompt ? `event: ${task.payload.prompt}` : null,
139
+ "A polling checker detected this external event. Evaluate it and decide the next action.",
140
+ "If it warrants no action, you may stay silent.",
141
+ "If needed, use tools."
142
+ ].filter(Boolean).join("\n");
143
+ }
144
+
133
145
  async function normalizeIncomingArtifact({ artifact, toolRegistry, chatArtifactStore, chatId }) {
134
146
  if (!artifact) return { transcript: null, toolResult: null };
135
147
  const { normalizedArtifact, toolResult } = await normalizeArtifactForReasoning({
@@ -194,9 +206,9 @@ export async function createTelegramBot({ config, artifactStore, toolRegistry, t
194
206
  const artifact = await captureIncomingArtifact(ctx, artifactStore);
195
207
  if (artifact) logger?.log("telegram", `captured artifact ${artifact.kind}${artifact.id ? ` ${artifact.id}` : ""}`);
196
208
  const { transcript, toolResult } = await normalizeIncomingArtifact({ artifact, toolRegistry, chatArtifactStore, chatId });
197
- if (transcript) logger?.log("telegram", `audio transcribed to artifact ${transcript.id}`);
198
- if (artifact?.kind === "audio" && !transcript) {
199
- logger?.log("telegram", `audio normalization unavailable for chat ${ctx.chat.id}: ${toolResult?.error || toolResult?.missingConfig?.join(", ") || "unknown error"}`);
209
+ if (transcript) logger?.log("telegram", `media transcribed to artifact ${transcript.id}`);
210
+ if (shouldNormalizeArtifactToText(artifact) && !transcript) {
211
+ logger?.log("telegram", `media normalization unavailable for chat ${ctx.chat.id}: ${toolResult?.error || toolResult?.missingConfig?.join(", ") || "unknown error"}`);
200
212
  }
201
213
  return buildPrompt({ ctx, artifact, transcript, toolResult });
202
214
  }
@@ -310,6 +322,73 @@ export async function createTelegramBot({ config, artifactStore, toolRegistry, t
310
322
  });
311
323
  }
312
324
 
325
+ async function dispatchTask(task) {
326
+ const chatId = task.payload?.chatId;
327
+ if (!chatId) {
328
+ await taskStore.fail(task.id, `Task missing chatId: ${task.kind}`);
329
+ return;
330
+ }
331
+
332
+ if (task.kind === "agent_task") {
333
+ if (!task.payload.prompt) {
334
+ await taskStore.fail(task.id, "agent_task missing prompt");
335
+ return;
336
+ }
337
+ logger?.log("tasks", `running task ${task.id} for chat ${chatId}`);
338
+ await enqueuePrompt({
339
+ chatId,
340
+ prompt: await buildAsyncTaskPrompt({ task, artifactStore, toolRegistry, logger }),
341
+ label: `scheduled task ${task.id}`
342
+ });
343
+ await taskStore.complete(task.id);
344
+ return;
345
+ }
346
+
347
+ if (task.kind === "agent_event") {
348
+ logger?.log("tasks", `agent event ${task.id} for chat ${chatId}`);
349
+ await enqueuePrompt({
350
+ chatId,
351
+ prompt: buildAsyncEventPrompt(task),
352
+ label: `agent event ${task.id}`
353
+ });
354
+ await taskStore.complete(task.id);
355
+ return;
356
+ }
357
+
358
+ if (task.kind === "poll_tool") {
359
+ const toolName = task.payload?.toolName;
360
+ if (!toolName) {
361
+ await taskStore.fail(task.id, "poll_tool missing toolName");
362
+ return;
363
+ }
364
+ logger?.log("tasks", `polling tool ${toolName} (task ${task.id}) for chat ${chatId}`);
365
+ try {
366
+ await agentManager.runTool({
367
+ name: toolName,
368
+ request: { args: task.payload.args || {} },
369
+ chatId
370
+ });
371
+ } catch (error) {
372
+ logger?.log("tasks", `poll_tool ${toolName} failed: ${error instanceof Error ? error.message : String(error)}`);
373
+ }
374
+ await taskStore.complete(task.id);
375
+ return;
376
+ }
377
+
378
+ await taskStore.fail(task.id, `Unsupported task: ${task.kind}`);
379
+ }
380
+
381
+ async function dispatchDueTasks() {
382
+ const tasks = await taskStore.claimDue(10);
383
+ for (const task of tasks) {
384
+ try {
385
+ await dispatchTask(task);
386
+ } catch (error) {
387
+ await taskStore.fail(task.id, error instanceof Error ? error.message : String(error));
388
+ }
389
+ }
390
+ }
391
+
313
392
  async function handleNewCommand(ctx) {
314
393
  agentManager.resetSession(ctx.chat.id);
315
394
  perChatState.set(ctx.chat.id, { processing: false, nextPrompt: "" });
@@ -381,25 +460,10 @@ export async function createTelegramBot({ config, artifactStore, toolRegistry, t
381
460
  await bot.api.setMyCommands([
382
461
  { command: "new", description: "Start a new chat context" }
383
462
  ]);
384
- setInterval(async () => {
385
- const tasks = await taskStore.claimDue(10);
386
- for (const task of tasks) {
387
- try {
388
- if (task.kind !== "agent_task" || !task.payload?.chatId || !task.payload?.prompt) {
389
- await taskStore.fail(task.id, `Unsupported task: ${task.kind}`);
390
- continue;
391
- }
392
- logger?.log("tasks", `running task ${task.id} for chat ${task.payload.chatId}`);
393
- await enqueuePrompt({
394
- chatId: task.payload.chatId,
395
- prompt: await buildAsyncTaskPrompt({ task, artifactStore, toolRegistry, logger }),
396
- label: `scheduled task ${task.id}`
397
- });
398
- await taskStore.complete(task.id);
399
- } catch (error) {
400
- await taskStore.fail(task.id, error instanceof Error ? error.message : String(error));
401
- }
402
- }
463
+ setInterval(() => {
464
+ dispatchDueTasks().catch((error) => {
465
+ logger?.error("tasks", `dispatch failed: ${error instanceof Error ? error.message : String(error)}`);
466
+ });
403
467
  }, 1000).unref();
404
468
  if (webhookUrl && setHttpRequestHandler) {
405
469
  const webhookPath = `/telegram-${config.telegram.token.slice(-8)}`;
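The new `dispatchTask` routes on `task.kind` through an `if` chain and fails any kind it does not recognize. The same routing can also be written as a handler table. The sketch below is a hypothetical refactor, not Arisa's code. `createMemoryTaskStore` is an in-memory stand-in for `taskStore` and is assumed to expose async `claimDue`, `complete`, and `fail`, as the diff's `await` calls suggest. The handlers only record what they would have enqueued.

```javascript
// Hypothetical sketch: a Map of handlers replaces the if-chain in dispatchTask.
// createMemoryTaskStore is an in-memory stand-in, not Arisa's TaskStore.
function createMemoryTaskStore(pending) {
  const status = new Map();
  return {
    status,
    async claimDue(limit) {
      return pending.splice(0, limit); // claim up to `limit` tasks, oldest first
    },
    async complete(id) {
      status.set(id, "completed");
    },
    async fail(id, reason) {
      status.set(id, `failed: ${reason}`);
    }
  };
}

function createDispatcher(taskStore, handlers) {
  async function dispatchTask(task) {
    const chatId = task.payload?.chatId;
    if (!chatId) return taskStore.fail(task.id, `Task missing chatId: ${task.kind}`);
    const handler = handlers.get(task.kind);
    if (!handler) return taskStore.fail(task.id, `Unsupported task: ${task.kind}`);
    await handler(task, chatId); // a throwing handler is caught below and fails the task
    await taskStore.complete(task.id);
  }

  return async function dispatchDueTasks() {
    for (const task of await taskStore.claimDue(10)) {
      try {
        await dispatchTask(task);
      } catch (error) {
        await taskStore.fail(task.id, error instanceof Error ? error.message : String(error));
      }
    }
  };
}

// Usage: two valid tasks, an unknown kind, a missing chatId, and a missing prompt.
const delivered = [];
const store = createMemoryTaskStore([
  { id: "t1", kind: "agent_task", payload: { chatId: 7, prompt: "water the plants" } },
  { id: "t2", kind: "agent_event", payload: { chatId: 7, prompt: "new invoice" } },
  { id: "t3", kind: "webhook", payload: { chatId: 7 } },
  { id: "t4", kind: "agent_task", payload: {} },
  { id: "t5", kind: "agent_task", payload: { chatId: 7 } }
]);
const dispatchDueTasks = createDispatcher(store, new Map([
  ["agent_task", async (task, chatId) => {
    if (!task.payload.prompt) throw new Error("agent_task missing prompt");
    delivered.push(`${chatId}:${task.payload.prompt}`);
  }],
  ["agent_event", async (task, chatId) => {
    delivered.push(`${chatId}:event:${task.payload.prompt}`);
  }]
]));
```

The table keeps the "unknown kind fails the task" rule in one place. The `if` chain in the diff is equally correct, and it keeps each kind's validation next to its dispatch. The diff's `poll_tool` branch deliberately catches tool errors and still calls `complete`, so a failing checker does not end its recurring poll. A table-based version would need to keep that `try`/`catch` inside the `poll_tool` handler.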
@@ -33,6 +33,26 @@ export async function captureIncomingArtifact(ctx, artifactStore) {
33
33
  });
34
34
  }
35
35
 
36
+ if (ctx.message?.video) {
37
+ const video = ctx.message.video;
38
+ const fileName = video.file_name || `${chatId}-${ctx.msg.message_id}.mp4`;
39
+ const content = await downloadToBuffer(ctx, video.file_id);
40
+ return store.createGeneratedFile({
41
+ fileName,
42
+ content,
43
+ kind: "video",
44
+ mimeType: video.mime_type || "video/mp4",
45
+ source: baseSource,
46
+ metadata: {
47
+ duration: video.duration,
48
+ width: video.width,
49
+ height: video.height,
50
+ fileSize: video.file_size,
51
+ ...incomingCaptionMetadata(ctx)
52
+ }
53
+ });
54
+ }
55
+
36
56
  if (ctx.message?.document) {
37
57
  const fileName = ctx.message.document.file_name || `${chatId}-${ctx.msg.message_id}`;
38
58
  const content = await downloadToBuffer(ctx, ctx.message.document.file_id);
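The new video branch mirrors the document branch and adds two fallbacks: `<chatId>-<messageId>.mp4` when Telegram omits `file_name`, and `video/mp4` when it omits `mime_type`. The function below pulls that mapping out of the Telegram Bot API `Video` object as a pure function. The name `describeIncomingVideo` is hypothetical, and caption metadata and the download step are left out.

```javascript
// Hypothetical helper mirroring the fallbacks in the new `ctx.message?.video` branch.
// `video` follows the Telegram Bot API Video object; caption metadata is omitted here.
function describeIncomingVideo(video, chatId, messageId) {
  return {
    fileName: video.file_name || `${chatId}-${messageId}.mp4`,
    kind: "video",
    mimeType: video.mime_type || "video/mp4",
    metadata: {
      duration: video.duration,
      width: video.width,
      height: video.height,
      fileSize: video.file_size
    }
  };
}

const bare = describeIncomingVideo({ file_id: "abc", duration: 12, width: 640, height: 360 }, 42, 9);
console.log(bare.fileName, bare.mimeType); // 42-9.mp4 video/mp4
```

The `video/mp4` default matters here because it is the only video type listed in the transcription tool's `input` manifest in this release.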
@@ -9,7 +9,7 @@ const toolName = "openai-transcribe";
9
9
  const config = await loadToolConfig(toolName, defaults);
10
10
 
11
11
  function printHelp() {
12
- console.log(`openai-transcribe\n\nUsage:\n node index.js --help\n node index.js run --request-file <json>\n\nExpected input:\n {\n "artifact": { "path": "/abs/audio.ogg", "mimeType": "audio/ogg" },\n "args": {}\n }\n\nConfig at ${getToolConfigPath(toolName)}:\n OPENAI_API_KEY\n MODEL\n`);
12
+ console.log(`openai-transcribe\n\nUsage:\n node index.js --help\n node index.js run --request-file <json>\n\nExpected input:\n {\n "artifact": { "path": "/abs/media.ogg", "mimeType": "audio/ogg" },\n "args": {}\n }\n\nConfig at ${getToolConfigPath(toolName)}:\n OPENAI_API_KEY\n MODEL\n`);
13
13
  }
14
14
 
15
15
  async function run(requestFile) {
@@ -1,8 +1,8 @@
1
1
  {
2
2
  "name": "openai-transcribe",
3
- "description": "Transcribe audio files with OpenAI audio transcription API.",
3
+ "description": "Transcribe audio files and video audio tracks with OpenAI audio transcription API.",
4
4
  "entry": "index.js",
5
- "input": ["audio/ogg", "audio/mpeg", "audio/wav", "audio/mp4"],
5
+ "input": ["audio/ogg", "audio/mpeg", "audio/wav", "audio/mp4", "video/mp4"],
6
6
  "output": ["text/plain"],
7
7
  "configSchema": {
8
8
  "OPENAI_API_KEY": {
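Adding `video/mp4` to `input` is what makes MP4 videos eligible for this tool. This diff does not show how the registry uses `input`. The sketch below assumes the registry compares an artifact's base MIME type against the list. `toolAcceptsMimeType` is a hypothetical name.

```javascript
// Hypothetical sketch: the registry's real matching logic is not part of this diff.
// Assumption: MIME parameters (e.g. "; codecs=opus") are ignored and the base type must match exactly.
const transcribeManifest = {
  name: "openai-transcribe",
  input: ["audio/ogg", "audio/mpeg", "audio/wav", "audio/mp4", "video/mp4"],
  output: ["text/plain"]
};

function toolAcceptsMimeType(manifest, mimeType) {
  const baseType = String(mimeType || "").split(";")[0].trim().toLowerCase();
  return manifest.input.includes(baseType);
}

console.log(toolAcceptsMimeType(transcribeManifest, "video/mp4")); // true
console.log(toolAcceptsMimeType(transcribeManifest, "audio/ogg; codecs=opus")); // true
console.log(toolAcceptsMimeType(transcribeManifest, "video/quicktime")); // false
```

Under this assumption, a video that Telegram reports as `video/quicktime` would not match this tool, even though the capture path stores it with `kind: "video"`.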
@@ -1,68 +0,0 @@
1
- # Generic asynchronous event flow for tools
2
-
3
- > Status: proposal / not implemented. Kept for reference.
4
- > The current (timer-based) implementation stays in place; this document describes a possible evolution.
5
-
6
- ## Problem
7
-
8
- Today the only asynchronous re-entry into the agent is time-based: a tool returns an `asyncTask` with `runAt`, and the 1 s poller in `src/transport/telegram/bot.js` fires it as a prompt. That forces everything through a timer (raw polling, fixed latency, re-spawning the tool, and a full agent turn on every check). What is missing is an **incoming event queue** that wakes the agent only when there is something to evaluate.
9
-
10
- ## Solution (queue-ordered polling, reusing TaskStore)
11
-
12
- Two new task `kind`s, drained by the same poller into the same `enqueuePrompt`:
13
-
14
- - `poll_tool`: a recurring task that the poller **runs directly as a tool** (it does not spend an agent turn). The checker keeps its own state cursor in its per-chat config/tmp. If there is something new, it emits an `agent_event`.
15
- - `agent_event`: an incoming event that fires immediately. The poller delivers it as a prompt so that Pi can evaluate it and decide.
16
-
17
- ```mermaid
18
- flowchart LR
19
- Tool[Tool run normal] -->|asyncTask poll_tool| TS[TaskStore]
20
- TS --> Poller[1s poller dispatcher]
21
- Poller -->|kind poll_tool| Run[agentManager.runTool checker]
22
- Run -->|if something new: asyncTask agent_event| TS
23
- Poller -->|kind agent_event| EP[enqueuePrompt]
24
- Poller -->|kind agent_task| EP
25
- EP --> Pi[Pi evaluates and decides]
26
- ```
27
-
28
- ## Changes
29
-
30
- ### 1. TaskStore: events/polls without a time fire immediately
31
-
32
- `src/core/tasks/task-store.js` - in `normalizeTask`, default `runAt` to `now` when it is missing (`agent_event` tasks and the first run of a `poll_tool` must be immediate; `computeNextRunAt` already reschedules `poll_tool` according to its `recurrence`). This is a one-line change and does not break `agent_task`, which always carries `runAt`.
33
-
34
- ### 2. AgentManager: extract "run + materialize" (DRY)
35
-
36
- `src/core/agent/agent-manager.js` - today the `execute` of `run_tool` (lines ~184-242) runs the tool, turns `output.text`/`output.filePath` into artifacts, and sends any `asyncTask(s)` to the `TaskStore` with the `chatId`. Extract that into a reusable `runTool({ name, request, chatId })` method and have the `run_tool` Pi tool call it. The poller can then run tools with the **same** materialization logic (including registering any `agent_event` the checker emits).
37
-
38
- ### 3. Poller -> per-kind dispatcher
39
-
40
- `src/transport/telegram/bot.js` - replace the single-kind handler inside the `setInterval` (lines ~361-380) with a dispatcher:
41
-
42
- - `agent_task` -> `enqueuePrompt(buildAsyncTaskPrompt(task))` + `complete` (same as today).
43
- - `agent_event` -> `enqueuePrompt(buildAsyncEventPrompt(task))` + `complete`.
44
- - `poll_tool` -> `agentManager.runTool({ name: task.payload.toolName, request: { args: task.payload.args || {} }, chatId })`; any `agent_event` the checker emits is queued for the next tick; then `complete` (the `recurrence` reschedules the poll). If the tool fails: log + `complete`, so the poll is not killed.
45
-
46
- Add `buildAsyncEventPrompt(task)` next to `buildAsyncTaskPrompt` (line ~82), framed as "an external event arrived; evaluate it and decide the next action". If the branch gets dense, extract `dispatchDueTasks(...)` into its own function so that `bot.js` stays a transport layer.
47
-
48
- ### 4. Document the flow
49
-
50
- `AGENTS.md` - a new section (in English) explaining how a tool sets up its own polling by returning an `asyncTask` of kind `poll_tool` with a `recurrence`, how it reports news with an `asyncTask` of kind `agent_event`, that the checker stores its cursor in its per-chat config/tmp, and that the agent reasons about the `agent_event` to decide what to do. `list_scheduled_tasks`/`cancel_scheduled_task` already work for viewing and cancelling polls, since they are kind-agnostic.
51
-
52
- ## Checker tool contract (no new Pi tools)
53
-
54
- Everything goes through the `asyncTasks` field that the pipeline already supports:
55
-
56
- - Starting the poll (from any tool's `run`): `asyncTasks: [{ kind: "poll_tool", payload: { toolName, args }, recurrence: { type: "interval", everySeconds: N } }]`.
57
- - News (from the checker's `run`): `asyncTasks: [{ kind: "agent_event", payload: { prompt: "<content to evaluate>" } }]`.
58
-
59
- ## Non-goals (for now)
60
-
61
- - No persistent listener (`node index.js listen`) or background process with IPC is added.
62
- - No inbound HTTP endpoint for events is added.
63
- - The sustained-connection case (e.g. a logged-in client) is not addressed: checkers are one-shot and persist their cursor between runs.
64
-
65
- ## Alternatives considered (rejected for this version)
66
-
67
- - **Listener tools**: the tool runs as a long-lived process (`node index.js listen`) and emits events on stdout, which Arisa drains into the queue. More general and real-time, but it adds process lifecycle management and IPC to the service.
68
- - **Inbound webhook**: Arisa exposes an internal HTTP endpoint where external systems POST events. Good for callbacks, but it does not work for sources that require holding a connection open.
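The removed proposal's checker contract uses the same two kinds that the new `poll_tool` and `agent_event` branches of `dispatchTask` handle. It can be sketched as two plain result objects. Both helpers below are hypothetical. `startPolling` builds the `poll_tool` registration. `checkForNews` assumes a checker whose items have increasing numeric ids and whose cursor is the last id already reported. Persisting that cursor in per-chat state is left out.

```javascript
// Hypothetical checker-side helpers following the proposal's asyncTasks contract.
// Poll registration: the poller re-runs `toolName` every `everySeconds` seconds.
function startPolling(toolName, args, everySeconds) {
  return {
    asyncTasks: [{
      kind: "poll_tool",
      payload: { toolName, args },
      recurrence: { type: "interval", everySeconds }
    }]
  };
}

// One checker run: report only items newer than `cursor` (assumption: ids increase monotonically).
function checkForNews(cursor, items) {
  const fresh = items.filter((item) => item.id > cursor);
  if (fresh.length === 0) {
    return { nextCursor: cursor, result: { asyncTasks: [] } }; // nothing new: no agent turn
  }
  return {
    nextCursor: Math.max(...fresh.map((item) => item.id)),
    result: {
      asyncTasks: [{
        kind: "agent_event",
        payload: { prompt: fresh.map((item) => item.summary).join("\n") }
      }]
    }
  };
}

const poll = startPolling("inbox-checker", { folder: "INBOX" }, 300);
console.log(JSON.stringify(poll.asyncTasks[0].recurrence)); // {"type":"interval","everySeconds":300}

const first = checkForNews(0, [{ id: 1, summary: "Invoice from ACME" }, { id: 2, summary: "Meeting moved" }]);
console.log(first.nextCursor); // 2
const second = checkForNews(first.nextCursor, [{ id: 2, summary: "Meeting moved" }]);
console.log(second.result.asyncTasks.length); // 0
```

A quiet run emits no `agent_event`, so it costs only the tool spawn and not an agent turn. That is the point of separating `poll_tool` from `agent_event`.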