arisa 3.1.2 → 3.1.4
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/AGENTS.md +59 -14
- package/README.md +3 -3
- package/package.json +1 -1
- package/src/core/agent/agent-manager.js +56 -37
- package/src/core/agent/runtime-context.js +2 -1
- package/src/core/artifacts/normalize-for-reasoning.js +5 -4
- package/src/core/skills/skill-registry.js +71 -0
- package/src/core/tasks/task-store.js +1 -1
- package/src/core/tools/daemon-runtime.js +2 -2
- package/src/core/tools/tool-registry.js +38 -4
- package/src/runtime/bootstrap.js +1 -1
- package/src/runtime/paths.js +17 -8
- package/src/transport/telegram/bot.js +95 -31
- package/src/transport/telegram/media.js +20 -0
- package/tools/openai-transcribe/index.js +1 -1
- package/tools/openai-transcribe/tool.manifest.json +2 -2
- package/docs/async-event-queue-flow.md +0 -68
package/AGENTS.md
CHANGED
@@ -3,10 +3,21 @@
 ## Architecture
 - Telegram transport handles inbound and outbound messaging.
 - Pi Agent keeps one session per authorized chat.
--
+- Incoming messages and files (text, voice, photo, document) and generated files become artifacts.
 - A tool registry handles tool discovery, help lookup, config writes, and execution.
 - Tools are isolated and each one has its own manifest, entrypoint, and config defaults.
 
+## Runtime directory rules
+Do not build runtime paths by hand. Use `src/runtime/paths.js`:
+- `getToolDir(toolName)`: installed user tool package only; no runtime data here.
+- `getToolStateDir(toolName)`: global tool infrastructure only: daemons, queues, shared browser sessions, model caches.
+- `getChatToolStateDir(chatId, toolName)`: persistent user/chat data: tool DBs, indexes, inboxes, generated sites, vaults.
+- `getChatArtifactsDir(chatId)` / `getChatArtifactsIndexFile(chatId)`: chat artifacts and artifact index. Artifacts are never global.
+- `getChatToolConfigPath(chatId, toolName)`: chat-scoped config overrides.
+- `getToolTmpDir(toolName)` / `getChatToolTmpDir(chatId, toolName)`: ephemeral scratch. Create only while a request runs; remove when empty.
+
+Tools receive `chatId` from the registry. Any persisted or indexed user content must be scoped by chat. Avoid ad hoc roots like `~/.arisa/state/<toolName>`, `~/.arisa/state/chats`, or runtime data inside `~/.arisa/tools/<toolName>`.
+
 ## Main rule: everything is piped through artifacts
 A pipe transforms one input artifact into one output artifact.
 Examples:
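A minimal sketch of the directory layout these rules produce, mirroring the helpers that appear in the `paths.js` hunks of this diff. The `makeArisaPaths(homeDir)` factory is hypothetical (the real module derives everything from a fixed `arisaHomeDir`), `stateDir` as `<home>/state` is inferred from the README's `~/.arisa/state/config.json`, and `getToolDir` returning `<home>/tools/<toolName>` is an assumption.

```javascript
import path from "node:path";

// Hypothetical factory that mirrors the helpers in src/runtime/paths.js.
// path.posix keeps the demo output identical on every OS.
export function makeArisaPaths(homeDir) {
  const join = path.posix.join;
  const stateDir = join(homeDir, "state"); // inferred from ~/.arisa/state/config.json
  const toolsDir = join(homeDir, "tools");
  const chatsDir = join(homeDir, "chats");
  const toolStateDir = join(stateDir, "tools");

  const getChatDir = (chatId) => join(chatsDir, String(chatId));
  const getChatConfigDir = (chatId) => join(getChatDir(chatId), "config");
  const getChatTmpDir = (chatId) => join(getChatDir(chatId), "tmp");
  const getToolStateDir = (toolName) => join(toolStateDir, toolName);

  return {
    // Assumption: installed tool packages live directly under toolsDir.
    getToolDir: (toolName) => join(toolsDir, toolName),
    // Global tool infrastructure: daemons, queues, shared sessions, caches.
    getToolStateDir,
    // Persistent per-chat tool data: DBs, indexes, inboxes, vaults.
    getChatToolStateDir: (chatId, toolName) => join(getChatDir(chatId), "state", "tools", toolName),
    getChatArtifactsIndexFile: (chatId) => join(getChatDir(chatId), "state", "artifacts.json"),
    // Chat-scoped config overrides.
    getChatToolConfigPath: (chatId, toolName) => join(getChatConfigDir(chatId), "tools", toolName, "config.js"),
    // Ephemeral scratch, global and chat-scoped.
    getToolTmpDir: (toolName) => join(getToolStateDir(toolName), "tmp"),
    getChatToolTmpDir: (chatId, toolName) => join(getChatTmpDir(chatId), "tools", toolName)
  };
}
```

With `homeDir` set to `/home/me/.arisa`, `getChatToolStateDir(42, "notes")` yields `/home/me/.arisa/chats/42/state/tools/notes`, while `getToolStateDir("browser")` yields the global `/home/me/.arisa/state/tools/browser`.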
@@ -18,6 +29,7 @@ Each tool declares in `tool.manifest.json`:
 - `input`: supported input types
 - `output`: produced output types
 - `configSchema`: required config fields
+- `skillHints`: optional skills to apply when using or editing the tool
 
 ## Conceptual pipe model
 There are two different moments where pipes can happen:
@@ -34,12 +46,11 @@ There are two different moments where pipes can happen:
 - Pi Agent may decide to chain tools to achieve a user goal.
 - Example: text -> TTS audio, or future multi-step workflows.
 
-
+Not every pipe should be decided by Pi Agent at runtime. Some pipes are part of the transport/input normalization layer and must happen before reasoning.
 
 ## Telegram inbound pipeline
-Current conceptual behavior:
 - text -> send directly to Pi Agent
--
+- voice -> transcribe first -> send transcript to Pi Agent
 - image/document/other media -> keep as artifacts, and add normalization pipes when needed
 
 If inbound media was normalized before reasoning, Pi Agent should use the normalized result as the actual message content.
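The inbound rules read as a three-way routing decision. The sketch below is hypothetical: the `{ type, mimeType }` message shape and the route names are illustrative, and the audio/video test mirrors `shouldNormalizeArtifactToText` from the `normalize-for-reasoning.js` hunk, which also sends video through transcription.

```javascript
// Hypothetical router for the Telegram inbound pipeline described above.
// Returns "direct", "transcribe", or "keep-artifact".
function routeInbound(message) {
  if (message.type === "text") return "direct"; // text goes straight to Pi Agent

  const mime = message.mimeType || "";
  const isAudioOrVideo = mime.startsWith("audio/") || mime.startsWith("video/");
  if (message.type === "voice" || isAudioOrVideo) {
    return "transcribe"; // pre-reasoning pipe: the transcript becomes the message
  }

  return "keep-artifact"; // image/document/other: stored as an artifact for later pipes
}
```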
@@ -50,23 +61,23 @@ Before using a tool, inspect its help:
 - via the custom tool: `tool_help`
 - or by running the CLI with `--help`
 
-Every CLI must support:
+Every CLI must support (the entrypoint comes from `manifest.entry`, currently always `index.js`):
 - `node index.js --help`
 - `node index.js run --request-file <json>`
 
 ### Tools that need daemons
-
-
+A future tool may need a persistent process, for example to keep a browser session alive or a local model warm. The shared daemon runtime exists for this, but no bundled tool uses it yet.
+When such a tool is built, implement it with the shared daemon runtime instead of custom ad hoc process management:
 - use `src/core/tools/daemon-runtime.js`
-- keep runtime files under the tool state directory (
+- keep runtime files under the tool state directory (`~/.arisa/state/tools/<toolName>`)
 - expose normal CLI behavior through `run --request-file`; callers should not manage daemon internals
 - use the runtime for `daemon.pid`, `daemon.log`, `status.json`, and `commands/*.request|processing|result.json`
 - keep one daemon owner per tool/session and avoid opening a second client over the same resource
 - use `beforeStart` only for tool-specific cleanup such as stale browser locks, without deleting persistent session/model data
 - keep daemon tools headless/server-safe by default when they are meant to run on VPS machines
 
-##
-
+## Manual pipe behavior
+To run a pipe, the agent should:
 1. understand whether the needed pipe belongs to pre-reasoning normalization or post-reasoning tool chaining
 2. use `list_tools`
 3. use `tool_help` when it needs operational details
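A minimal tool entrypoint satisfying the two required invocations might look like the sketch below. It assumes what the `tool-registry.js` and `agent-manager.js` hunks show: the registry writes the request (including `chatId` and resolved `skills`) to a JSON file, parses the tool's stdout as JSON, and turns `output.text` into a chat text artifact. The tool itself (`upper-text`), the `args.text` input, and the `{ error }` shape are hypothetical.

```javascript
import { readFile } from "node:fs/promises";
import { pathToFileURL } from "node:url";

const HELP = [
  "upper-text (hypothetical example tool)",
  "Usage:",
  "  node index.js --help",
  "  node index.js run --request-file <json>",
  "Request: { args: { text }, artifact, chatId, skills }"
].join("\n");

// Returns what the CLI prints, so it can be tested without spawning a process.
export async function main(argv, { read = (file) => readFile(file, "utf8") } = {}) {
  if (argv[0] !== "run" || argv.includes("--help")) return HELP;

  const index = argv.indexOf("--request-file");
  const requestFile = index === -1 ? undefined : argv[index + 1];
  if (!requestFile) return JSON.stringify({ error: "Missing --request-file <json>" });

  const request = JSON.parse(await read(requestFile));
  // Assumed inputs: explicit args first, then the text of an input artifact.
  const text = String(request.args?.text ?? request.artifact?.text ?? "");
  // The registry parses stdout as JSON; output.text becomes a chat text artifact.
  return JSON.stringify({ output: { text: text.toUpperCase() } });
}

// Run only when invoked directly (node index.js ...), not when imported.
if (process.argv[1] && import.meta.url === pathToFileURL(process.argv[1]).href) {
  main(process.argv.slice(2)).then((out) => process.stdout.write(`${out}\n`));
}
```

Keeping `main` a pure function of `argv` plus an injectable reader makes the help and run paths testable without a temporary request file.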
@@ -76,7 +87,28 @@ V1 does not have a full automatic planner yet. The agent should:
 Example manual pipe:
 1. `run_tool(openai-transcribe, artifact audio)`
 2. take the returned text `artifactId`
-3. `run_tool(openai-tts, artifact text)` or `
+3. `run_tool(openai-tts, artifact text)` or `send_media_reply(text)`
+
+## Async event queue flow
+Beyond time-based scheduling, tools can drive an event queue that wakes the agent only when there is something to evaluate. Everything goes through the `asyncTask` (single) or `asyncTasks` (array) field the pipeline already supports; no new Pi tools are needed. The 1s poller drains tasks by `kind`:
+
+- `agent_task`: a scheduled prompt. The poller delivers it as a prompt for Pi to fulfill (time-based work).
+- `poll_tool`: a recurring checker the poller **runs directly as a tool** (no agent turn spent). The poller materializes its output with the same logic as `run_tool`, so any `agent_event` the checker emits is enqueued for the next tick. Its `recurrence` reschedules the next poll.
+- `agent_event`: an incoming event. The poller delivers it as a prompt so Pi evaluates it and decides the next action (it may stay silent).
+
+Tasks without a `runAt` fire immediately, so `agent_event` and the first `poll_tool` run on the next tick.
+
+The poller dispatches all three kinds, but only `agent_task` is exercised by a bundled tool today (`schedule-agent-task`). The following is the pattern to follow when a checker tool is built:
+
+How a tool wires its own polling:
+1. From any tool `run`, start the poll by returning an `asyncTask` (or several in `asyncTasks`):
+`{ kind: "poll_tool", payload: { toolName, args }, recurrence: { type: "interval", everySeconds: N } }`.
+2. On each poll the checker tool (`toolName`) runs headless. It keeps its own cursor of seen state in its config/tmp per chat, so it knows what is new.
+3. When the checker finds something new, it emits an event from its `run`:
+`{ kind: "agent_event", payload: { prompt: "<content to evaluate>" } }`.
+4. The agent reasons over the `agent_event` and decides what to do.
+
+`list_scheduled_tasks`, `cancel_scheduled_task`, and `cancel_all_scheduled_tasks` are kind-agnostic, so they already work to inspect or cancel active polls.
 
 ## Missing config flow
 If `run_tool` returns `missingConfig`, the agent should:
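A sketch of the polling pattern above, assuming a checker that has already fetched a list of `{ id, title }` items. `startPolling` and `checkForNew` are hypothetical helpers; the task shapes are copied from the steps above. The checker returns its updated cursor for the caller to persist, and when nothing is new it returns an empty `asyncTasks` array, which `run_tool` does not schedule because it checks `asyncTasks?.length`.

```javascript
// Hypothetical: what a tool's `run` returns to start a recurring check.
function startPolling(toolName, args, everySeconds) {
  return {
    asyncTask: {
      kind: "poll_tool",
      payload: { toolName, args },
      recurrence: { type: "interval", everySeconds }
    }
  };
}

// Hypothetical checker step: compare fetched items against a cursor of seen ids.
function checkForNew(items, seenIds = []) {
  const seen = new Set(seenIds);
  const fresh = items.filter((item) => !seen.has(item.id));
  return {
    // Updated cursor; the checker persists it per chat between polls.
    seenIds: [...seen, ...fresh.map((item) => item.id)],
    // One agent_event per new item; empty when nothing changed.
    result: {
      asyncTasks: fresh.map((item) => ({
        kind: "agent_event",
        payload: { prompt: `New item ${item.id}: ${item.title}` }
      }))
    }
  };
}
```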
@@ -101,13 +133,26 @@ The default attitude is:
 - propose or start creating the needed tool
 
 When creating or editing tools:
-- use the
--
+- use the path helpers in `src/runtime/paths.js`
+- follow the existing bundled tools under `tools/` as the reference pattern for new tools
 - keep all help text, usage instructions, manifests, and user-facing operational strings in English
 - follow the One Thing Rule: each function or method should do one thing well; if it mixes low-level operations with high-level policy, split it into smaller focused units
 
+### Tool skill hints
+Tools may declare skills in `tool.manifest.json`:
+
+```json
+{
+  "skillHints": [
+    { "name": "stop-slop", "when": "writing public page copy" }
+  ]
+}
+```
+
+The tool registry resolves these from the installed skills directory and injects them into the tool request as `skills`. `list_tools` exposes the hints and `tool_help` shows their resolution status. Skills are guidance for the agent/tool; they are not separate runtime dependencies.
+
 ## Dependency installation
-
+Tool dependencies are installed as part of building or running the tool, not delegated to the user.
 - Prefer `pnpm install`.
 - Fall back to `npm install`.
 - Do not ask the user to do it manually.
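The install policy can be expressed as an ordered fallback. This sketch assumes a hypothetical `installToolDeps(toolDir, run)` helper and treats a non-zero exit or a missing binary as failure; `spawnSync` reports a missing binary through `error` with a `null` status, so both cases fall through to the next installer. On Windows the binaries would likely need `pnpm.cmd`/`npm.cmd` or `shell: true`, which this sketch does not handle.

```javascript
import { spawnSync } from "node:child_process";

// Default runner: true only when the installer exits with status 0.
// A missing binary sets `error` and leaves `status` null, so it counts as failure.
function runInstaller(cmd, args, cwd) {
  const result = spawnSync(cmd, args, { cwd, stdio: "inherit" });
  return result.status === 0;
}

// Hypothetical helper: prefer pnpm, fall back to npm, never ask the user.
export function installToolDeps(toolDir, run = runInstaller) {
  for (const cmd of ["pnpm", "npm"]) {
    if (run(cmd, ["install"], toolDir)) return cmd;
  }
  throw new Error(`Could not install dependencies in ${toolDir} with pnpm or npm`);
}
```

Injecting `run` keeps the ordering logic testable without touching the network or a real package manager.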
package/README.md
CHANGED
@@ -145,13 +145,13 @@ node src/index.js --telegram.token <token>
 With this mode, Arisa creates `~/.arisa/state/config.json` without prompts and applies these defaults when not provided:
 
 - `pi.provider`: `openai-codex` when available, otherwise first provider from the current Pi provider list
-- `pi.model`: first model after bootstrap sorting (currently prioritizes `openai-codex/gpt-5.
+- `pi.model`: first model after bootstrap sorting (currently prioritizes `openai-codex/gpt-5.5`)
 - `telegram.maxChatIds`: `1`
 
 Supported overrides:
 
 ```bash
-node src/index.js --telegram.token <token> --telegram.maxChatIds 3 --pi.provider openai-codex --pi.model gpt-5.
+node src/index.js --telegram.token <token> --telegram.maxChatIds 3 --pi.provider openai-codex --pi.model gpt-5.5 --pi.apiKey <optional-provider-key>
 ```
 
 Notes:
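One way dotted override flags can map onto nested `config.json` keys is sketched below. `parseDottedFlags` is hypothetical, not Arisa's actual argument parser; it keeps every value as a string, whereas Arisa may coerce `telegram.maxChatIds` to a number.

```javascript
// Hypothetical parser: "--a.b value" pairs become nested keys ({ a: { b: "value" } }).
function parseDottedFlags(argv) {
  const config = {};
  for (let i = 0; i < argv.length; i += 1) {
    const flag = argv[i];
    const value = argv[i + 1];
    if (!flag.startsWith("--") || value === undefined || value.startsWith("--")) continue;
    const keys = flag.slice(2).split(".");
    let node = config;
    for (const key of keys.slice(0, -1)) {
      // Replace non-object intermediates so the assignment below cannot fail.
      if (typeof node[key] !== "object" || node[key] === null) node[key] = {};
      node = node[key];
    }
    node[keys[keys.length - 1]] = value; // values stay strings in this sketch
    i += 1; // skip the consumed value
  }
  return config;
}
```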
@@ -171,7 +171,7 @@ For providers with internal Pi login support, such as Codex, leaving the API key
 
 For example, selecting:
 
-- `openai-codex/gpt-5.
+- `openai-codex/gpt-5.5`
 
 allows Arisa to authenticate through Pi's Codex OAuth flow instead of requiring a normal OpenAI API key.
 
package/package.json
CHANGED
package/src/core/agent/agent-manager.js
CHANGED
@@ -132,6 +132,49 @@ export class AgentManager {
     return ctx;
   }
 
+  async runTool({ name, request, chatId }) {
+    await this.toolRegistry.load();
+    this.logger?.log("agent", `run_tool ${name}`);
+    const chatArtifactStore = this.artifactStore.forChat(chatId);
+    const result = await this.toolRegistry.run({ name, request, chatId });
+
+    if (result.output?.text) {
+      const outArtifact = await chatArtifactStore.createText({
+        text: result.output.text,
+        source: { type: "tool", toolName: name },
+        metadata: { tool: name }
+      });
+      result.output.artifactId = outArtifact.id;
+    }
+
+    if (result.output?.filePath) {
+      const generated = await chatArtifactStore.createFromFile({
+        originalPath: result.output.filePath,
+        fileName: result.output.fileName || path.basename(result.output.filePath),
+        kind: result.output.kind || "file",
+        mimeType: result.output.mimeType || "application/octet-stream",
+        source: { type: "tool", toolName: name },
+        metadata: { tool: name }
+      });
+      result.output.artifactId = generated.id;
+      await unlink(result.output.filePath).catch(() => {});
+    }
+
+    if (result.asyncTask || result.asyncTasks?.length) {
+      const scheduled = await this.taskStore.addMany(
+        result.asyncTasks || [result.asyncTask],
+        {
+          payload: { chatId },
+          source: { type: "tool", toolName: name, chatId }
+        }
+      );
+      result.asyncTasks = scheduled;
+      delete result.asyncTask;
+    }
+
+    return result;
+  }
+
   createTools(telegram, chatId) {
     const chatArtifactStore = this.artifactStore.forChat(chatId);
 
@@ -160,6 +203,18 @@ export class AgentManager {
           return { content: [{ type: "text", text: help }], details: { help } };
         }
       }),
+      defineTool({
+        name: "tool_skills",
+        label: "Tool skills",
+        description: "Show skills assigned to a CLI tool via its manifest skillHints.",
+        parameters: Type.Object({ name: Type.String() }),
+        execute: async (_id, params) => {
+          await this.toolRegistry.load();
+          const skills = await this.toolRegistry.resolveSkills(params.name);
+          const visible = skills.map(({ content, ...item }) => item);
+          return { content: [{ type: "text", text: JSON.stringify(visible, null, 2) }], details: visible };
+        }
+      }),
       defineTool({
         name: "set_tool_config",
         label: "Set tool config",
@@ -182,8 +237,6 @@ export class AgentManager {
           args: Type.Optional(Type.Record(Type.String(), Type.String()))
         }),
         execute: async (_id, params) => {
-          await this.toolRegistry.load();
-          this.logger?.log("agent", `run_tool ${params.name}`);
           let artifact = null;
           if (params.artifactId) {
             artifact = await chatArtifactStore.get(params.artifactId);
@@ -191,7 +244,7 @@ export class AgentManager {
               return { content: [{ type: "text", text: `Artifact not found: ${params.artifactId}` }], details: { ok: false } };
             }
           }
-          const result = await this.
+          const result = await this.runTool({
             name: params.name,
             request: {
               artifact,
@@ -201,40 +254,6 @@ export class AgentManager {
             chatId
           });
 
-          if (result.output?.text) {
-            const outArtifact = await chatArtifactStore.createText({
-              text: result.output.text,
-              source: { type: "tool", toolName: params.name },
-              metadata: { tool: params.name }
-            });
-            result.output.artifactId = outArtifact.id;
-          }
-
-          if (result.output?.filePath) {
-            const generated = await chatArtifactStore.createFromFile({
-              originalPath: result.output.filePath,
-              fileName: result.output.fileName || path.basename(result.output.filePath),
-              kind: result.output.kind || "file",
-              mimeType: result.output.mimeType || "application/octet-stream",
-              source: { type: "tool", toolName: params.name },
-              metadata: { tool: params.name }
-            });
-            result.output.artifactId = generated.id;
-            await unlink(result.output.filePath).catch(() => {});
-          }
-
-          if (result.asyncTask || result.asyncTasks?.length) {
-            const scheduled = await this.taskStore.addMany(
-              result.asyncTasks || [result.asyncTask],
-              {
-                payload: { chatId },
-                source: { type: "tool", toolName: params.name, chatId }
-              }
-            );
-            result.asyncTasks = scheduled;
-            delete result.asyncTask;
-          }
-
           return {
             content: [{ type: "text", text: JSON.stringify(result, null, 2) }],
             details: result
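One edge case in the extracted `runTool`: if a tool ever returns both fields, or an `asyncTask` alongside an empty `asyncTasks: []`, the expression `result.asyncTasks || [result.asyncTask]` keeps only the array (an empty array is truthy), so the single task is neither scheduled nor kept, since `asyncTask` is deleted afterwards. A hypothetical merge helper that avoids this, offered as an alternative rather than what the package does:

```javascript
// Hypothetical helper: merge `asyncTask` and `asyncTasks` without dropping either.
function collectAsyncTasks(result) {
  const tasks = [];
  if (result.asyncTask) tasks.push(result.asyncTask);
  if (Array.isArray(result.asyncTasks)) tasks.push(...result.asyncTasks);
  return tasks;
}
```

`runTool` could then call `this.taskStore.addMany(tasks, ...)` only when `tasks.length > 0`.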
package/src/core/agent/runtime-context.js
CHANGED
@@ -1,5 +1,5 @@
 import { fileURLToPath } from "node:url";
-import { arisaHomeDir, chatsDir, stateDir, toolsDir } from "../../runtime/paths.js";
+import { arisaHomeDir, chatsDir, stateDir, toolStateDir, toolsDir } from "../../runtime/paths.js";
 
 export const arisaInstallDir = fileURLToPath(new URL("../../..", import.meta.url));
 export const bundledToolsDir = fileURLToPath(new URL("../../../tools", import.meta.url));
@@ -10,6 +10,7 @@ export function buildAgentRuntimeContext() {
     `arisaInstallDir: ${arisaInstallDir}`,
     `bundledToolsDir: ${bundledToolsDir}`,
     `userToolsDir: ${toolsDir}`,
+    `toolStateDir: ${toolStateDir}`,
     `chatsDir: ${chatsDir}`,
     `stateDir: ${stateDir}`
   ].join("\n");
package/src/core/artifacts/normalize-for-reasoning.js
CHANGED
@@ -19,8 +19,9 @@ function looksLikeAudioTranscriptionTool(tool) {
   return /transcri|whisper|speech.?to.?text|audio.?to.?text/i.test(`${tool.name} ${tool.description || ""}`);
 }
 
-function
-  return
+export function shouldNormalizeArtifactToText(artifact, desiredMimeType = "text/plain") {
+  return desiredMimeType === "text/plain"
+    && (artifact?.mimeType?.startsWith("audio/") || artifact?.mimeType?.startsWith("video/"));
 }
 
 export function selectPipeTool({ toolRegistry, artifact, desiredMimeType }) {
@@ -28,7 +29,7 @@ export function selectPipeTool({ toolRegistry, artifact, desiredMimeType }) {
     .filter((tool) => toolSupportsArtifact(tool, artifact))
     .filter((tool) => toolProduces(tool, desiredMimeType));
 
-  if (
+  if (shouldNormalizeArtifactToText(artifact, desiredMimeType)) {
     return tools.find(looksLikeAudioTranscriptionTool) || null;
   }
 
@@ -44,7 +45,7 @@ export async function normalizeArtifactForReasoning({
 }) {
   if (!artifact) return { normalizedArtifact: null, toolResult: null, toolName: "" };
 
-  if (!
+  if (!shouldNormalizeArtifactToText(artifact, desiredMimeType)) {
     return { normalizedArtifact: null, toolResult: null, toolName: "" };
   }
 
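The two predicates that decide pre-reasoning normalization can be exercised on their own. The first two functions below are copied from the hunks above; `pickTranscriber` is a hypothetical stand-in for `selectPipeTool`, whose `toolSupportsArtifact`/`toolProduces` filtering is not shown here, so `candidates` is assumed to be already filtered, and the tool descriptions in the usage are made up. Note that `shouldNormalizeArtifactToText` returns `undefined` rather than `false` when the artifact has no `mimeType`, so callers should only test truthiness.

```javascript
// Copied from normalize-for-reasoning.js: only audio/video -> text/plain needs pre-reasoning normalization.
function shouldNormalizeArtifactToText(artifact, desiredMimeType = "text/plain") {
  return desiredMimeType === "text/plain"
    && (artifact?.mimeType?.startsWith("audio/") || artifact?.mimeType?.startsWith("video/"));
}

// Copied: name/description heuristic used to pick a transcription tool.
function looksLikeAudioTranscriptionTool(tool) {
  return /transcri|whisper|speech.?to.?text|audio.?to.?text/i.test(`${tool.name} ${tool.description || ""}`);
}

// Hypothetical stand-in for selectPipeTool; `candidates` are assumed pre-filtered.
function pickTranscriber(candidates, artifact, desiredMimeType = "text/plain") {
  if (!shouldNormalizeArtifactToText(artifact, desiredMimeType)) return null;
  return candidates.find(looksLikeAudioTranscriptionTool) || null;
}
```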
package/src/core/skills/skill-registry.js
ADDED
@@ -0,0 +1,71 @@
+import os from "node:os";
+import path from "node:path";
+import { readFile } from "node:fs/promises";
+
+const defaultSkillsDir = path.join(os.homedir(), ".agents", "skills");
+
+function parseFrontmatter(source = "") {
+  if (!source.startsWith("---")) return {};
+  const end = source.indexOf("\n---", 3);
+  if (end === -1) return {};
+  const block = source.slice(3, end).trim();
+  const data = {};
+  for (const line of block.split("\n")) {
+    const match = line.match(/^([A-Za-z0-9_-]+):\s*(.*)$/);
+    if (match) data[match[1]] = match[2].replace(/^['"]|['"]$/g, "");
+  }
+  return data;
+}
+
+function normalizeSkillHint(value) {
+  if (typeof value === "string") return { name: value, when: "" };
+  if (value && typeof value === "object" && value.name) {
+    return { name: String(value.name), when: String(value.when || "") };
+  }
+  return null;
+}
+
+export class SkillRegistry {
+  constructor({ skillsDir = defaultSkillsDir } = {}) {
+    this.skillsDir = skillsDir;
+    this.cache = new Map();
+  }
+
+  async get(name) {
+    const key = String(name || "").trim();
+    if (!key) return null;
+    if (this.cache.has(key)) return this.cache.get(key);
+
+    const file = path.join(this.skillsDir, key, "SKILL.md");
+    try {
+      const content = await readFile(file, "utf8");
+      const metadata = parseFrontmatter(content);
+      const skill = {
+        name: metadata.name || key,
+        description: metadata.description || "",
+        path: file,
+        content
+      };
+      this.cache.set(key, skill);
+      return skill;
+    } catch {
+      this.cache.set(key, null);
+      return null;
+    }
+  }
+
+  normalizeHints(manifest = {}) {
+    const raw = manifest.skillHints || manifest.skills || [];
+    if (!Array.isArray(raw)) return [];
+    return raw.map(normalizeSkillHint).filter(Boolean);
+  }
+
+  async resolveHints(hints = []) {
+    const resolved = [];
+    for (const hint of hints) {
+      const skill = await this.get(hint.name);
+      resolved.push({ ...hint, found: Boolean(skill), skill });
+    }
+    return resolved;
+  }
+}
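The frontmatter parser is line-based: it reads flat `key: value` pairs between the opening `---` and the next line that begins with `---`, stripping a leading and a trailing quote character. `normalizeHints` accepts either `skillHints` or `skills` in the manifest, and each hint may be a bare string or a `{ name, when }` object. The two private helpers are copied below for a standalone check. One behavior worth knowing, based on reading the regex: a `SKILL.md` saved with CRLF line endings appears to lose every key except the last, because `.` does not match `\r` and the `$` anchor then fails; `get()` still falls back to the directory name for `name`.

```javascript
// Copied from skill-registry.js for a standalone check.
function parseFrontmatter(source = "") {
  if (!source.startsWith("---")) return {};
  const end = source.indexOf("\n---", 3);
  if (end === -1) return {};
  const block = source.slice(3, end).trim();
  const data = {};
  for (const line of block.split("\n")) {
    // Flat "key: value" lines only; nested YAML is not understood.
    const match = line.match(/^([A-Za-z0-9_-]+):\s*(.*)$/);
    if (match) data[match[1]] = match[2].replace(/^['"]|['"]$/g, "");
  }
  return data;
}

// Copied: strings become { name, when: "" }; objects need a `name`; anything else is dropped.
function normalizeSkillHint(value) {
  if (typeof value === "string") return { name: value, when: "" };
  if (value && typeof value === "object" && value.name) {
    return { name: String(value.name), when: String(value.when || "") };
  }
  return null;
}
```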
package/src/core/tasks/task-store.js
CHANGED
@@ -27,7 +27,7 @@ function normalizeTask(task, defaults = {}) {
     createdAt: task.createdAt || new Date().toISOString(),
     updatedAt: new Date().toISOString(),
     kind: task.kind,
-    runAt: task.runAt,
+    runAt: task.runAt || new Date().toISOString(),
     payload: {
       ...(defaults.payload || {}),
       ...(task.payload || {})
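With `runAt` now defaulting to the creation time, a task without a schedule is due on the very next tick. The poller itself lives in the Telegram transport and is not fully shown in this diff, so the tick below is a hypothetical, synchronous sketch: `withDefaultRunAt`, `isDue`, `tick`, and the handler table are illustrative names, and only the defaulting rule is taken from `normalizeTask`.

```javascript
// Same defaulting rule as normalizeTask: a missing runAt means "now".
function withDefaultRunAt(task, now = new Date()) {
  return { ...task, runAt: task.runAt || now.toISOString() };
}

function isDue(task, now = new Date()) {
  return Date.parse(task.runAt) <= now.getTime();
}

// Hypothetical tick: dispatch every due task by `kind`, oldest runAt first.
// A real poller would await async handlers and persist state between ticks.
function tick(tasks, handlers, now = new Date()) {
  return tasks
    .filter((task) => isDue(task, now))
    .sort((a, b) => Date.parse(a.runAt) - Date.parse(b.runAt))
    .map((task) => {
      const handler = handlers[task.kind];
      if (!handler) throw new Error(`Unknown task kind: ${task.kind}`);
      return handler(task);
    });
}
```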
package/src/core/tools/daemon-runtime.js
CHANGED
@@ -3,10 +3,10 @@ import { spawn } from "node:child_process";
 import { openSync } from "node:fs";
 import { mkdir, readFile, readdir, rename, rm, unlink, writeFile } from "node:fs/promises";
 import path from "node:path";
-import {
+import { getToolStateDir } from "../../runtime/paths.js";
 
 export function daemonPaths(toolName) {
-  const root =
+  const root = getToolStateDir(toolName);
   return {
     root,
     commandsDir: path.join(root, "commands"),
package/src/core/tools/tool-registry.js
CHANGED
@@ -1,10 +1,11 @@
-import { mkdir, readdir, readFile, unlink, writeFile } from "node:fs/promises";
+import { mkdir, readdir, readFile, rmdir, unlink, writeFile } from "node:fs/promises";
 import path from "node:path";
 import { spawn } from "node:child_process";
 import { fileURLToPath } from "node:url";
 import { getToolConfigPath, getToolTmpDir, getChatToolTmpDir, toolsDir as userToolsRoot } from "../../runtime/paths.js";
 import { loadToolConfig, parseConfigModule, writeToolConfig } from "./tool-config.js";
 import { normalizeToolResult } from "./tool-result.js";
+import { SkillRegistry } from "../skills/skill-registry.js";
 
 const bundledToolsRoot = fileURLToPath(new URL("../../../tools", import.meta.url));
 const toolRoots = [
@@ -27,6 +28,7 @@ export class ToolRegistry {
   constructor({ logger } = {}) {
     this.logger = logger;
     this.tools = new Map();
+    this.skillRegistry = new SkillRegistry();
   }
 
   async load() {
@@ -52,8 +54,10 @@ export class ToolRegistry {
         const configSource = await readFile(configPath, "utf8");
         const defaults = parseConfigModule(configSource);
         const config = await loadToolConfig(manifest.name, defaults);
+        const skillHints = this.skillRegistry.normalizeHints(manifest);
         this.tools.set(manifest.name, {
           ...manifest,
+          skillHints,
           dir: toolDir,
           entry: path.join(toolDir, manifest.entry || "index.js"),
           localConfigPath: configPath,
@@ -77,7 +81,8 @@ export class ToolRegistry {
       description: tool.description,
       input: tool.input,
       output: tool.output,
-      configSchema: tool.configSchema || {}
+      configSchema: tool.configSchema || {},
+      skillHints: tool.skillHints || []
     }));
   }
 
@@ -89,7 +94,29 @@ export class ToolRegistry {
     const tool = this.get(name);
     if (!tool) throw new Error(`Tool not found: ${name}`);
     const result = await runProcess("node", [tool.entry, "--help"], { cwd: tool.dir, env: process.env });
-
+    const help = result.stdout || result.stderr;
+    const skills = await this.resolveSkills(name);
+    if (!skills.length) return help;
+    const skillHelp = skills.map((item) => [
+      `- ${item.name}${item.when ? ` (${item.when})` : ""}`,
+      item.description ? ` ${item.description}` : null,
+      item.found ? ` path: ${item.path}` : " warning: skill not found"
+    ].filter(Boolean).join("\n")).join("\n");
+    return `${help}\n\nAssigned skills:\n${skillHelp}\n`;
+  }
+
+  async resolveSkills(name) {
+    const tool = this.get(name);
+    if (!tool) throw new Error(`Tool not found: ${name}`);
+    const hints = await this.skillRegistry.resolveHints(tool.skillHints || []);
+    return hints.map((hint) => ({
+      name: hint.name,
+      when: hint.when,
+      found: hint.found,
+      description: hint.skill?.description || "",
+      path: hint.skill?.path || "",
+      content: hint.skill?.content || ""
+    }));
   }
 
   async resolveConfigForChat(name, chatId) {
@@ -121,12 +148,19 @@ export class ToolRegistry {
     const tmpDir = chatId != null ? getChatToolTmpDir(chatId, name) : getToolTmpDir(name);
     await mkdir(tmpDir, { recursive: true });
     const requestFile = path.join(tmpDir, `.request-${Date.now()}.json`);
-    await
+    const skills = await this.resolveSkills(name);
+    const enrichedRequest = { ...request, chatId, skills };
+    await writeFile(requestFile, `${JSON.stringify(enrichedRequest, null, 2)}\n`, "utf8");
     const result = await runProcess("node", [tool.entry, "run", "--request-file", requestFile], {
       cwd: tool.dir,
       env: process.env
     });
     await unlink(requestFile).catch(() => {});
+    await rmdir(tmpDir).catch(() => {});
+    if (chatId != null) {
+      await rmdir(path.dirname(tmpDir)).catch(() => {});
+      await rmdir(path.dirname(path.dirname(tmpDir))).catch(() => {});
+    }
     try {
       const parsed = JSON.parse(result.stdout || result.stderr);
       const normalized = normalizeToolResult(name, parsed);
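The new `rmdir` calls implement the "remove when empty" rule for scratch directories: `rmdir` only succeeds on an empty directory, so ignoring its errors removes `chats/<chatId>/tmp/tools/<toolName>`, then `tmp/tools`, then `tmp` only while nothing else lives there. For chat-scoped runs the registry climbs three levels; for global runs it removes only the tool's own tmp dir. The sketch isolates that behavior in a hypothetical `removeEmptyDirs(dir, levels)` helper.

```javascript
import { rmdir } from "node:fs/promises";
import path from "node:path";

// Mirrors the cleanup in ToolRegistry.run: rmdir() only removes empty
// directories, so ignoring its errors (ENOTEMPTY, ENOENT) leaves any
// directory that still has content in place.
export async function removeEmptyDirs(dir, levels) {
  let current = dir;
  for (let i = 0; i < levels; i += 1) {
    await rmdir(current).catch(() => {});
    current = path.dirname(current);
  }
}
```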
package/src/runtime/bootstrap.js
CHANGED
package/src/runtime/paths.js
CHANGED
@@ -10,6 +10,7 @@ export const serviceLogFile = path.join(stateDir, "arisa.log");
 export const tasksFile = path.join(stateDir, "tasks.json");
 export const toolsDir = path.join(arisaHomeDir, "tools");
 export const chatsDir = path.join(arisaHomeDir, "chats");
+export const toolStateDir = path.join(stateDir, "tools");
 
 export function getChatDir(chatId) {
   return path.join(chatsDir, String(chatId));
@@ -23,6 +24,10 @@ export function getChatArtifactsIndexFile(chatId) {
   return path.join(getChatDir(chatId), "state", "artifacts.json");
 }
 
+export function getChatToolStateDir(chatId, toolName) {
+  return path.join(getChatDir(chatId), "state", "tools", toolName);
+}
+
 export function getChatPiSessionsDir(chatId) {
   return path.join(getChatDir(chatId), "state", "pi-sessions");
 }
@@ -35,24 +40,28 @@ export function getToolConfigPath(toolName) {
   return path.join(getToolDir(toolName), "config.js");
 }
 
-export function
-  return path.join(getChatDir(chatId), "
+export function getChatConfigDir(chatId) {
+  return path.join(getChatDir(chatId), "config");
 }
 
-export function
-  return
+export function getChatTmpDir(chatId) {
+  return path.join(getChatDir(chatId), "tmp");
+}
+
+export function getChatToolConfigPath(chatId, toolName) {
+  return path.join(getChatConfigDir(chatId), "tools", toolName, "config.js");
 }
 
-export function
-  return path.join(
+export function getToolStateDir(toolName) {
+  return path.join(toolStateDir, toolName);
 }
 
 export function getToolTmpDir(toolName) {
-  return path.join(
+  return path.join(getToolStateDir(toolName), "tmp");
 }
 
 export function getChatToolTmpDir(chatId, toolName) {
-  return path.join(
+  return path.join(getChatTmpDir(chatId), "tools", toolName);
 }
 
 export async function ensureArisaHome() {
package/src/transport/telegram/bot.js
CHANGED
@@ -3,7 +3,7 @@ import path from "node:path";
 import { authorizeChat } from "./auth.js";
 import { captureIncomingArtifact } from "./media.js";
 import { renderTelegramHtml } from "./text-format.js";
-import { normalizeArtifactForReasoning } from "../../core/artifacts/normalize-for-reasoning.js";
+import { normalizeArtifactForReasoning, shouldNormalizeArtifactToText } from "../../core/artifacts/normalize-for-reasoning.js";
 
 function quotedMessageSummary(message) {
   if (!message) return [];
@@ -63,11 +63,11 @@ function buildPrompt({ ctx, artifact, transcript, toolResult }) {
   if (transcript) {
     parts.push(`transcriptArtifactId: ${transcript.id}`);
     parts.push(`transcriptText: ${transcript.text}`);
-    parts.push(`Important: the incoming
+    parts.push(`Important: the incoming media has already been transcribed. Use the transcript as the user message content. Do not answer with a raw transcription unless the user explicitly asked for one.`);
   }
-  if (artifact
-    parts.push(`
-    parts.push(`Important: pre-reasoning
+  if (shouldNormalizeArtifactToText(artifact) && !transcript && toolResult) {
+    parts.push(`mediaNormalizationResult: ${JSON.stringify(toolResult)}`);
+    parts.push(`Important: pre-reasoning media normalization could not be completed, so you do not have a transcript for this audio/video message.`);
   }
 
   parts.push(`If you need a CLI tool, use list_tools/tool_help/run_tool.`);
@@ -114,10 +114,10 @@ async function buildAsyncTaskPrompt({ task, artifactStore, toolRegistry, logger
       logger?.log("tasks", `artifact ${artifact.id} normalized to ${normalizedArtifact.id}`);
       parts.push(`transcriptArtifactId: ${normalizedArtifact.id}`);
       parts.push(`transcriptText: ${normalizedArtifact.text}`);
-      parts.push("Important: the attached
-    } else if (artifact
-      parts.push(`
-      parts.push("Important: pre-reasoning
+      parts.push("Important: the attached media artifact has already been normalized for reasoning. Use the transcript as the message content.");
+    } else if (shouldNormalizeArtifactToText(artifact) && toolResult) {
+      parts.push(`mediaNormalizationResult: ${JSON.stringify(toolResult)}`);
+      parts.push("Important: pre-reasoning media normalization could not be completed, so you do not have a transcript for this audio/video artifact.");
     }
   } else {
     parts.push(`artifactId: ${task.payload.artifactId}`);
```diff
@@ -130,6 +130,18 @@ async function buildAsyncTaskPrompt({ task, artifactStore, toolRegistry, logger
   return parts.filter(Boolean).join("\n");
 }
 
+function buildAsyncEventPrompt(task) {
+  return [
+    "External event arrived.",
+    `taskId: ${task.id}`,
+    `chatId: ${task.payload.chatId}`,
+    task.payload.prompt ? `event: ${task.payload.prompt}` : null,
+    "A polling checker detected this external event. Evaluate it and decide the next action.",
+    "If it warrants no action, you may stay silent.",
+    "If needed, use tools."
+  ].filter(Boolean).join("\n");
+}
+
 async function normalizeIncomingArtifact({ artifact, toolRegistry, chatArtifactStore, chatId }) {
   if (!artifact) return { transcript: null, toolResult: null };
   const { normalizedArtifact, toolResult } = await normalizeArtifactForReasoning({
```
```diff
@@ -194,9 +206,9 @@ export async function createTelegramBot({ config, artifactStore, toolRegistry, t
     const artifact = await captureIncomingArtifact(ctx, artifactStore);
     if (artifact) logger?.log("telegram", `captured artifact ${artifact.kind}${artifact.id ? ` ${artifact.id}` : ""}`);
     const { transcript, toolResult } = await normalizeIncomingArtifact({ artifact, toolRegistry, chatArtifactStore, chatId });
-    if (transcript) logger?.log("telegram", `
-    if (artifact
-      logger?.log("telegram", `
+    if (transcript) logger?.log("telegram", `media transcribed to artifact ${transcript.id}`);
+    if (shouldNormalizeArtifactToText(artifact) && !transcript) {
+      logger?.log("telegram", `media normalization unavailable for chat ${ctx.chat.id}: ${toolResult?.error || toolResult?.missingConfig?.join(", ") || "unknown error"}`);
     }
     return buildPrompt({ ctx, artifact, transcript, toolResult });
   }
```
```diff
@@ -310,6 +322,73 @@ export async function createTelegramBot({ config, artifactStore, toolRegistry, t
     });
   }
 
+  async function dispatchTask(task) {
+    const chatId = task.payload?.chatId;
+    if (!chatId) {
+      await taskStore.fail(task.id, `Task missing chatId: ${task.kind}`);
+      return;
+    }
+
+    if (task.kind === "agent_task") {
+      if (!task.payload.prompt) {
+        await taskStore.fail(task.id, "agent_task missing prompt");
+        return;
+      }
+      logger?.log("tasks", `running task ${task.id} for chat ${chatId}`);
+      await enqueuePrompt({
+        chatId,
+        prompt: await buildAsyncTaskPrompt({ task, artifactStore, toolRegistry, logger }),
+        label: `scheduled task ${task.id}`
+      });
+      await taskStore.complete(task.id);
+      return;
+    }
+
+    if (task.kind === "agent_event") {
+      logger?.log("tasks", `agent event ${task.id} for chat ${chatId}`);
+      await enqueuePrompt({
+        chatId,
+        prompt: buildAsyncEventPrompt(task),
+        label: `agent event ${task.id}`
+      });
+      await taskStore.complete(task.id);
+      return;
+    }
+
+    if (task.kind === "poll_tool") {
+      const toolName = task.payload?.toolName;
+      if (!toolName) {
+        await taskStore.fail(task.id, "poll_tool missing toolName");
+        return;
+      }
+      logger?.log("tasks", `polling tool ${toolName} (task ${task.id}) for chat ${chatId}`);
+      try {
+        await agentManager.runTool({
+          name: toolName,
+          request: { args: task.payload.args || {} },
+          chatId
+        });
+      } catch (error) {
+        logger?.log("tasks", `poll_tool ${toolName} failed: ${error instanceof Error ? error.message : String(error)}`);
+      }
+      await taskStore.complete(task.id);
+      return;
+    }
+
+    await taskStore.fail(task.id, `Unsupported task: ${task.kind}`);
+  }
+
+  async function dispatchDueTasks() {
+    const tasks = await taskStore.claimDue(10);
+    for (const task of tasks) {
+      try {
+        await dispatchTask(task);
+      } catch (error) {
+        await taskStore.fail(task.id, error instanceof Error ? error.message : String(error));
+      }
+    }
+  }
+
   async function handleNewCommand(ctx) {
     agentManager.resetSession(ctx.chat.id);
     perChatState.set(ctx.chat.id, { processing: false, nextPrompt: "" });
```
```diff
@@ -381,25 +460,10 @@ export async function createTelegramBot({ config, artifactStore, toolRegistry, t
   await bot.api.setMyCommands([
     { command: "new", description: "Start a new chat context" }
   ]);
-  setInterval(
-
-
-
-        if (task.kind !== "agent_task" || !task.payload?.chatId || !task.payload?.prompt) {
-          await taskStore.fail(task.id, `Unsupported task: ${task.kind}`);
-          continue;
-        }
-        logger?.log("tasks", `running task ${task.id} for chat ${task.payload.chatId}`);
-        await enqueuePrompt({
-          chatId: task.payload.chatId,
-          prompt: await buildAsyncTaskPrompt({ task, artifactStore, toolRegistry, logger }),
-          label: `scheduled task ${task.id}`
-        });
-        await taskStore.complete(task.id);
-      } catch (error) {
-        await taskStore.fail(task.id, error instanceof Error ? error.message : String(error));
-      }
-    }
+  setInterval(() => {
+    dispatchDueTasks().catch((error) => {
+      logger?.error("tasks", `dispatch failed: ${error instanceof Error ? error.message : String(error)}`);
+    });
   }, 1000).unref();
   if (webhookUrl && setHttpRequestHandler) {
     const webhookPath = `/telegram-${config.telegram.token.slice(-8)}`;
```
package/src/transport/telegram/media.js

```diff
@@ -33,6 +33,26 @@ export async function captureIncomingArtifact(ctx, artifactStore) {
     });
   }
 
+  if (ctx.message?.video) {
+    const video = ctx.message.video;
+    const fileName = video.file_name || `${chatId}-${ctx.msg.message_id}.mp4`;
+    const content = await downloadToBuffer(ctx, video.file_id);
+    return store.createGeneratedFile({
+      fileName,
+      content,
+      kind: "video",
+      mimeType: video.mime_type || "video/mp4",
+      source: baseSource,
+      metadata: {
+        duration: video.duration,
+        width: video.width,
+        height: video.height,
+        fileSize: video.file_size,
+        ...incomingCaptionMetadata(ctx)
+      }
+    });
+  }
+
   if (ctx.message?.document) {
     const fileName = ctx.message.document.file_name || `${chatId}-${ctx.msg.message_id}`;
     const content = await downloadToBuffer(ctx, ctx.message.document.file_id);
```
package/tools/openai-transcribe/index.js

```diff
@@ -9,7 +9,7 @@ const toolName = "openai-transcribe";
 const config = await loadToolConfig(toolName, defaults);
 
 function printHelp() {
-  console.log(`openai-transcribe\n\nUsage:\n node index.js --help\n node index.js run --request-file <json>\n\nExpected input:\n {\n "artifact": { "path": "/abs/
+  console.log(`openai-transcribe\n\nUsage:\n node index.js --help\n node index.js run --request-file <json>\n\nExpected input:\n {\n "artifact": { "path": "/abs/media.ogg", "mimeType": "audio/ogg" },\n "args": {}\n }\n\nConfig at ${getToolConfigPath(toolName)}:\n OPENAI_API_KEY\n MODEL\n`);
 }
 
 async function run(requestFile) {
```
package/tools/openai-transcribe/tool.manifest.json

```diff
@@ -1,8 +1,8 @@
 {
   "name": "openai-transcribe",
-  "description": "Transcribe audio files with OpenAI audio transcription API.",
+  "description": "Transcribe audio files and video audio tracks with OpenAI audio transcription API.",
   "entry": "index.js",
-  "input": ["audio/ogg", "audio/mpeg", "audio/wav", "audio/mp4"],
+  "input": ["audio/ogg", "audio/mpeg", "audio/wav", "audio/mp4", "video/mp4"],
   "output": ["text/plain"],
   "configSchema": {
     "OPENAI_API_KEY": {
```
package/docs/async-event-queue-flow.md

```diff
@@ -1,68 +0,0 @@
-# Generic async event flow for tools
-
-> Status: proposal / not implemented. Kept for reference.
-> The current (timer-based) implementation stays in place; this document describes a possible evolution.
-
-## Problem
-
-Today the only asynchronous re-entry into the agent is time-based: a tool returns an `asyncTask` with `runAt`, and the 1s poller in `src/transport/telegram/bot.js` fires it as a prompt. That forces a timer-based solution (raw polling, fixed latency, re-spawning the tool, and a full agent turn on every check). What is missing is an **incoming event queue** that wakes the agent only when there is something to evaluate.
-
-## Solution (queue-driven polling, reusing TaskStore)
-
-Two new task `kind`s, drained by the same poller into the same `enqueuePrompt`:
-
-- `poll_tool`: a recurring task that the poller **runs directly as a tool** (it does not spend an agent turn). The checker keeps its own state cursor in its per-chat config/tmp. If there is something new, it emits an `agent_event`.
-- `agent_event`: an incoming event that fires immediately. The poller delivers it as a prompt so Pi can evaluate it and decide.
-
-```mermaid
-flowchart LR
-Tool[Normal tool run] -->|asyncTask poll_tool| TS[TaskStore]
-TS --> Poller[1s poller dispatcher]
-Poller -->|kind poll_tool| Run[agentManager.runTool checker]
-Run -->|if something new: asyncTask agent_event| TS
-Poller -->|kind agent_event| EP[enqueuePrompt]
-Poller -->|kind agent_task| EP
-EP --> Pi[Pi evaluates and decides]
-```
-
```
```diff
-## Changes
-
-### 1. TaskStore: events/polls without a time fire immediately
-
-`src/core/tasks/task-store.js` - in `normalizeTask`, default `runAt` to `now` when it is absent (`agent_event`s and the first `poll_tool` run must be immediate; `computeNextRunAt` already reschedules `poll_tool` according to its `recurrence`). A one-line change; it does not break `agent_task` (which always carries `runAt`).
-
```
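The TaskStore change in step 1 can be sketched as below. This is a hypothetical illustration, not the package's `normalizeTask` or `computeNextRunAt` (their bodies are not part of this diff). It assumes `runAt` is stored as epoch milliseconds and handles only the `{ type: "interval", everySeconds }` recurrence shape named in the checker contract; the `mail-check` tool name is made up.

```javascript
// Hypothetical sketch: not the real normalizeTask/computeNextRunAt from
// src/core/tasks/task-store.js. Assumes runAt is epoch milliseconds.

// Tasks without runAt (agent_event, first poll_tool run) become due immediately.
function normalizeTaskSketch(task, now = Date.now()) {
  return { ...task, runAt: task.runAt ?? now };
}

// Returns the next run time for a recurring task, or null for one-shot tasks.
// Only the { type: "interval", everySeconds } shape is handled here.
function computeNextRunAtSketch(task, now = Date.now()) {
  const recurrence = task.recurrence;
  if (!recurrence || recurrence.type !== "interval") return null;
  if (!(recurrence.everySeconds > 0)) return null;
  return now + recurrence.everySeconds * 1000;
}

// Example: a poll_tool task without runAt, repeating every 60 seconds
// ("mail-check" is a hypothetical tool name).
const poll = normalizeTaskSketch(
  { kind: "poll_tool", payload: { toolName: "mail-check" }, recurrence: { type: "interval", everySeconds: 60 } },
  1000
);
console.log(poll.runAt, computeNextRunAtSketch(poll, poll.runAt)); // 1000 61000
```

An `agent_task` that already carries `runAt` keeps it, which is why the proposal treats the change as non-breaking for that kind.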
```diff
-### 2. AgentManager: extract "run + materialize" (DRY)
-
-`src/core/agent/agent-manager.js` - today the `execute` of `run_tool` (lines ~184-242) runs the tool, turns `output.text`/`output.filePath` into artifacts, and sends the `asyncTask(s)` to the `TaskStore` with the `chatId`. Extract that into a reusable `runTool({ name, request, chatId })` method, which the Pi tool `run_tool` then calls. That way the poller can run tools with the **same** materialization logic (including registering any `agent_event` the checker emits).
-
```
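Step 2 can be pictured as one function that runs a tool, materializes its output, and schedules any `asyncTasks` tagged with the originating chat. This is a hypothetical sketch: `runToolSketch` and the injected `deps` (`executeTool`, `createTextArtifact`, `createFileArtifact`, `scheduleTask`) are stand-ins, because the real `runTool` body and artifact-store API are not shown in this section. Stamping `chatId` onto each task payload matters because the new `dispatchTask` in `bot.js` fails any task without `payload.chatId`.

```javascript
// Hypothetical sketch of the proposed AgentManager.runTool({ name, request, chatId }).
// The deps object stands in for the real tool runner, artifact store and TaskStore.
async function runToolSketch({ name, request, chatId }, deps) {
  const result = await deps.executeTool(name, request);

  // Materialize tool output into artifacts (field names follow the proposal's wording).
  const artifacts = [];
  if (result.output?.text) {
    artifacts.push(await deps.createTextArtifact({ chatId, text: result.output.text }));
  }
  if (result.output?.filePath) {
    artifacts.push(await deps.createFileArtifact({ chatId, filePath: result.output.filePath }));
  }

  // Schedule follow-up tasks, tagging each payload with the originating chat
  // so the dispatcher can route it back (dispatchTask fails tasks without chatId).
  const scheduled = [];
  for (const asyncTask of result.asyncTasks || []) {
    const task = { ...asyncTask, payload: { ...asyncTask.payload, chatId } };
    scheduled.push(await deps.scheduleTask(task));
  }

  return { result, artifacts, scheduled };
}
```

Because both the Pi `run_tool` and the poller would go through the same function, an `agent_event` emitted by a checker lands in the TaskStore with the correct `chatId` regardless of who ran the tool.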
```diff
-### 3. Poller -> dispatcher by kind
-
-`src/transport/telegram/bot.js` - replace the single-kind handler inside the `setInterval` (lines ~361-380) with a dispatcher:
-
-- `agent_task` -> `enqueuePrompt(buildAsyncTaskPrompt(task))` + `complete` (same as today).
-- `agent_event` -> `enqueuePrompt(buildAsyncEventPrompt(task))` + `complete`.
-- `poll_tool` -> `agentManager.runTool({ name: task.payload.toolName, request: { args: task.payload.args || {} }, chatId })`; any `agent_event`s the checker emits stay queued for the next tick; then `complete` (the `recurrence` reschedules the poll). If the tool fails: log + `complete`, so the poll is not killed.
-
-Add `buildAsyncEventPrompt(task)` next to `buildAsyncTaskPrompt` (line ~82), framed as "an external event arrived; evaluate it and decide the next action". If the branch gets dense, extract `dispatchDueTasks(...)` into its own function to keep `bot.js` a pure transport.
-
```
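The dispatcher in step 3 can be exercised end to end with an in-memory store. This is a simplified simulation, not the package code: `createMemoryTaskStore`, `dispatchSketch`, `tick`, and the `runTool` stub are hypothetical. Completed tasks are simply marked done (the real TaskStore reschedules a `poll_tool` by its `recurrence`), the `agent_task` branch is omitted, and the prompt text is abbreviated compared with `buildAsyncEventPrompt`.

```javascript
// Hypothetical in-memory stand-ins for TaskStore, agentManager.runTool and
// enqueuePrompt. Recurrence is ignored: completed tasks are simply marked done.
function createMemoryTaskStore() {
  const tasks = [];
  let nextId = 1;
  return {
    tasks,
    add(task) {
      const stored = { ...task, id: nextId++, status: "pending" };
      tasks.push(stored);
      return stored;
    },
    async claimDue(limit) {
      // Snapshot pending tasks; anything added while dispatching waits a tick.
      const due = tasks.filter((t) => t.status === "pending").slice(0, limit);
      for (const t of due) t.status = "running";
      return due;
    },
    async complete(id) {
      tasks.find((t) => t.id === id).status = "done";
    },
    async fail(id, reason) {
      Object.assign(tasks.find((t) => t.id === id), { status: "failed", reason });
    }
  };
}

// Route one claimed task by kind (agent_task omitted for brevity).
async function dispatchSketch(task, { store, runTool, prompts }) {
  const chatId = task.payload?.chatId;
  if (!chatId) return store.fail(task.id, `Task missing chatId: ${task.kind}`);

  if (task.kind === "agent_event") {
    prompts.push({ chatId, prompt: `event: ${task.payload.prompt}` });
    return store.complete(task.id);
  }

  if (task.kind === "poll_tool") {
    try {
      const { asyncTasks = [] } = await runTool(task.payload.toolName, task.payload.args || {});
      for (const t of asyncTasks) store.add({ ...t, payload: { ...t.payload, chatId } });
    } catch {
      // A failing checker must not kill the poll, so it is still completed.
    }
    return store.complete(task.id);
  }

  return store.fail(task.id, `Unsupported task: ${task.kind}`);
}

// One poller tick: claim due tasks and isolate failures per task.
async function tick(deps) {
  for (const task of await deps.store.claimDue(10)) {
    try {
      await dispatchSketch(task, deps);
    } catch (error) {
      await deps.store.fail(task.id, error instanceof Error ? error.message : String(error));
    }
  }
}
```

With a checker stub that always reports one event, the first tick only runs the poll and queues the `agent_event`; the prompt is delivered on the second tick, which is the "queued for the next tick" behavior the proposal describes.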
```diff
-### 4. Document the flow
-
-`AGENTS.md` - new section (in English) explaining: how a tool sets up its own auto-polling by returning an `asyncTask` of kind `poll_tool` with a `recurrence`, how it reports news with an `asyncTask` of kind `agent_event`, that the checker stores its cursor in its per-chat config/tmp, and that the agent reasons about the `agent_event` to decide what to do. `list_scheduled_tasks`/`cancel_scheduled_task` already work for listing and cancelling polls (they are kind-agnostic).
-
-## Checker tool contract (no new Pi tools)
-
-Everything goes through the `asyncTasks` field that the pipeline already supports:
-
-- Starting the poll (from the `run` of any tool): `asyncTasks: [{ kind: "poll_tool", payload: { toolName, args }, recurrence: { type: "interval", everySeconds: N } }]`.
-- News (from the checker's `run`): `asyncTasks: [{ kind: "agent_event", payload: { prompt: "<content to evaluate>" } }]`.
-
```
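A checker following this contract might look like the sketch below. Everything here is hypothetical: `startPolling` and `checkForNews` are illustrative names, items are assumed to carry an increasing numeric `id` that serves as the cursor, and persisting `nextCursor` to the per-chat config/tmp location is left to the caller.

```javascript
// Hypothetical checker helpers following the proposed asyncTasks contract.
// Assumes items carry an increasing numeric id that serves as the cursor.

// Returned by any tool's run to start a recurring poll of `toolName`.
function startPolling(toolName, args, everySeconds) {
  return {
    asyncTasks: [
      { kind: "poll_tool", payload: { toolName, args }, recurrence: { type: "interval", everySeconds } }
    ]
  };
}

// Returned by the checker's run: emits an agent_event only when there is news.
// The caller is expected to persist nextCursor between one-shot runs.
function checkForNews(items, cursor) {
  const lastSeen = cursor ?? -Infinity;
  const fresh = items.filter((item) => item.id > lastSeen);
  if (fresh.length === 0) return { nextCursor: cursor ?? null, asyncTasks: [] };

  return {
    nextCursor: Math.max(...fresh.map((item) => item.id)),
    asyncTasks: [
      {
        kind: "agent_event",
        payload: { prompt: fresh.map((item) => `#${item.id}: ${item.title}`).join("\n") }
      }
    ]
  };
}
```

Returning an empty `asyncTasks` array when nothing changed keeps the poll silent, so the agent only spends a turn when the checker actually found something.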
```diff
-## Non-goals (for now)
-
-- No persistent listener (`node index.js listen`) or background process with IPC is added.
-- No incoming HTTP endpoint for events is added.
-- The sustained-connection case (e.g. a logged-in client) is not addressed: checkers are one-shot and persist their cursor between runs.
-
-## Alternatives considered (discarded for this version)
-
-- **Listener tools**: the tool runs as a long-lived process (`node index.js listen`) and emits events on stdout, which Arisa drains into the queue. More general and real-time, but it adds process lifecycle management and IPC to the service.
-- **Incoming webhook**: Arisa exposes an internal HTTP endpoint where external systems POST events. Good for callbacks; it does not help in cases that require holding a connection open.
```