@agfpd/voice-connect 0.1.11 → 0.2.0
- package/README.md +124 -24
- package/bin/voice-connect.mjs +60 -0
- package/package.json +2 -1
- package/src/http.mjs +22 -4
- package/src/provider.mjs +344 -0
package/README.md
CHANGED
|
@@ -1,26 +1,44 @@
|
|
|
1
1
|
# voice-connect
|
|
2
2
|
|
|
3
|
-
**
|
|
3
|
+
**A standalone voice service (STT + TTS) for a team of AI agents.**
|
|
4
4
|
|
|
5
|
-
|
|
6
|
-
|
|
7
|
-
> **Built for iapeer.** It isn't a standalone TTS service — it runs inside [iapeer](https://github.com/agfpd/iapeer), alongside `iapeer-memory` and `telegram-runtime`. A peer synthesizes a file with one tool call and delivers it over iapeer's own messaging.
|
|
5
|
+
voice-connect gives agents and runtimes one voice layer with two halves — text-to-speech and speech-to-text — over a single multi-provider core. Each request runs a cascade with fallback inside the tool: cloud quality first, a local engine as the floor, so synthesis and transcription still work with no API key and no network. It runs on its own (configured from a file) or inside [iapeer](https://github.com/agfpd/iapeer) (configured from the peer-profile).
|
|
8
6
|
|
|
9
7
|
## How it works
|
|
10
8
|
|
|
11
|
-
|
|
9
|
+
Two facades sit on one core. Agents reach it as an MCP server; runtimes reach it over an OpenAI-compatible HTTP service. Both call the same cascade — no synthesis or routing logic is duplicated between them.
|
|
10
|
+
|
|
11
|
+
```
|
|
12
|
+
agents (MCP) runtimes (HTTP)
|
|
13
|
+
tts / stt POST /v1/audio/speech
|
|
14
|
+
POST /v1/audio/transcriptions
|
|
15
|
+
\ /
|
|
16
|
+
\ /
|
|
17
|
+
▼ ▼
|
|
18
|
+
┌──────────────────────────────────────┐
|
|
19
|
+
│ core: router + provider cascade │
|
|
20
|
+
├──────────────────────────────────────┤
|
|
21
|
+
│ TTS: Gemini → gpt-audio → F5 → Super │
|
|
22
|
+
│ STT: speaches → mlx-whisper │
|
|
23
|
+
└──────────────────────────────────────┘
|
|
24
|
+
```
|
|
25
|
+
|
|
26
|
+
The router advances to the next engine only on a known "can't-serve" error (quota, no key, unreachable); any other error propagates, so a real bug surfaces instead of being masked by a fallback. The last rung in each cascade is a local engine — Supertonic 3 for TTS, mlx-whisper for STT — with no fallback of its own: it is the floor, so a result is always produced when an engine is reachable at all.
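The fallback rule above can be sketched roughly as follows. This is a minimal illustration, not the package's router: `runCascade`, the `CANT_SERVE` set, its error codes, and the `{ name, run }` engine shape are all assumptions made for the example, as is the choice to report the first skipped engine in `fallback_from`.

```javascript
// Hypothetical error codes meaning "this engine can't serve right now".
const CANT_SERVE = new Set(['QUOTA', 'NO_KEY', 'UNREACHABLE']);

/**
 * Try engines in order. Advance only on a known can't-serve error;
 * rethrow anything else so a real bug is not masked by a fallback.
 */
async function runCascade(engines, request) {
  const skipped = [];
  for (let i = 0; i < engines.length; i++) {
    const engine = engines[i];
    const isFloor = i === engines.length - 1;
    try {
      const result = await engine.run(request);
      // Assumption: fallback_from names the first engine that was skipped.
      const extra = skipped.length ? { fallback_from: skipped[0] } : {};
      return { ...result, engine: engine.name, ...extra };
    } catch (err) {
      // The floor has no fallback of its own, and unknown errors always propagate.
      if (isFloor || !CANT_SERVE.has(err.code)) throw err;
      skipped.push(engine.name);
    }
  }
  throw new Error('no engines configured');
}
```

With this shape, a quota error on the first engine silently moves on to the second, while a `TypeError` thrown inside any engine reaches the caller unchanged.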
|
|
12
27
|
|
|
13
|
-
|
|
28
|
+
Delivery is never the service's job. The MCP `tts` tool hands back a file path; you attach it yourself with `send_to_peer(personality, attachments=[path])`. The HTTP facade returns the audio bytes in the response.
|
|
29
|
+
|
|
30
|
+
Short text is synthesized synchronously — the path comes back at once. Text over ~2000 chars goes async: the `tts` tool returns a `job_id` immediately and a detached worker notifies you over IAP when the file is ready. The HTTP `/v1/audio/speech` route is always synchronous (a runtime holds the connection and wants bytes back).
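To make the two return shapes concrete, here is a hedged sketch of how a caller might branch on them. `handleTtsResult` is a hypothetical helper; only the `path` and `job_id` fields come from the README.

```javascript
/**
 * Classify a `tts` tool result.
 * Short text returns a path immediately; long text returns a job id,
 * and the path arrives later in an IAP "done" message.
 */
function handleTtsResult(result) {
  if (typeof result.path === 'string') {
    return { ready: true, path: result.path };
  }
  if (typeof result.job_id === 'string') {
    return { ready: false, jobId: result.job_id };
  }
  throw new Error('unexpected tts result shape');
}
```

A caller would attach the file right away in the `ready: true` case and otherwise remember `jobId` until the completion message arrives.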
|
|
14
31
|
|
|
15
32
|
## Requirements
|
|
16
33
|
|
|
17
34
|
- **Node.js ≥ 18**
|
|
18
|
-
- **ffmpeg / ffprobe** on `PATH`
|
|
19
|
-
- **python3** — only for the first provision of the local Supertonic
|
|
20
|
-
- **
|
|
21
|
-
|
|
22
|
-
- `
|
|
23
|
-
-
|
|
35
|
+
- **ffmpeg / ffprobe** on `PATH` — audio encode and probe. On macOS: `brew install ffmpeg`.
|
|
36
|
+
- **python3** — only for the first provision of the local Supertonic TTS floor (a managed venv is created on demand).
|
|
37
|
+
- **mlx_whisper** on `PATH` — only for the local STT floor (Apple-silicon Whisper, installed as a `uv` tool). Without it, STT needs a configured speaches endpoint.
|
|
38
|
+
- **API keys** — read from the environment, a shell rc file, or the standalone config file:
|
|
39
|
+
- `GEMINI_API_KEY` — the primary TTS engine (Google Gemini TTS).
|
|
40
|
+
- `OPENROUTER_API_KEY` — the gpt-audio TTS fallback rung (OpenRouter).
|
|
41
|
+
- Neither key is required: with no key, TTS routing falls through to the local F5 / Supertonic engines.
|
|
24
42
|
|
|
25
43
|
## Install
|
|
26
44
|
|
|
@@ -37,18 +55,78 @@ Or run the MCP server directly from npm:
|
|
|
37
55
|
npx -y @agfpd/voice-connect@latest
|
|
38
56
|
```
|
|
39
57
|
|
|
40
|
-
|
|
58
|
+
The package name is `@agfpd/voice-connect`; the MCP entry point is the `voice-connect-mcp` bin. Wire it into an MCP client by pointing the client at that command, e.g.:
|
|
59
|
+
|
|
60
|
+
```jsonc
|
|
61
|
+
{
|
|
62
|
+
"mcpServers": {
|
|
63
|
+
"voice-connect": {
|
|
64
|
+
"command": "npx",
|
|
65
|
+
"args": ["-y", "@agfpd/voice-connect@latest"]
|
|
66
|
+
}
|
|
67
|
+
}
|
|
68
|
+
}
|
|
69
|
+
```
|
|
70
|
+
|
|
71
|
+
### Always-on HTTP service
|
|
72
|
+
|
|
73
|
+
The HTTP facade is meant to run as a long-lived local service so runtimes can hit it without spawning a process per request. It is self-managed: it owns its own launchd agent (label `com.voice-connect.http`), in the package's own namespace, and writes a discovery slot (`~/.iapeer/voice-provider.json`) consumers read for the endpoint.
|
|
74
|
+
|
|
75
|
+
```
|
|
76
|
+
node scripts/launchd-http.mjs render # print the plist (no writes)
|
|
77
|
+
node scripts/launchd-http.mjs install # write plist + slot, (re)bootstrap
|
|
78
|
+
node scripts/launchd-http.mjs status # launchctl print + slot
|
|
79
|
+
node scripts/launchd-http.mjs uninstall # bootout + remove plist + slot
|
|
80
|
+
```
|
|
81
|
+
|
|
82
|
+
It binds `127.0.0.1:8127` by default (local-only); override with `PEER_VOICE_HTTP_HOST` / `PEER_VOICE_HTTP_PORT`. `GET /health` reports liveness.
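A small sketch of how a client might resolve the endpoint from those variables and probe `/health`. The helper names `resolveEndpoint` and `isAlive` are illustrative, and the injectable `fetchImpl` parameter exists only to keep the example testable; a real consumer would more likely read the endpoint from the discovery slot.

```javascript
/**
 * Resolve the facade's base URL from an environment-like object,
 * falling back to the documented defaults (127.0.0.1:8127).
 */
function resolveEndpoint(env = {}) {
  const host = (env.PEER_VOICE_HTTP_HOST || '').trim() || '127.0.0.1';
  const parsed = Number.parseInt(env.PEER_VOICE_HTTP_PORT || '', 10);
  const port = Number.isInteger(parsed) && parsed > 0 ? parsed : 8127;
  return `http://${host}:${port}`;
}

/** Probe GET /health; resolves to true only on a 2xx response. */
async function isAlive(baseUrl, fetchImpl = fetch) {
  try {
    const res = await fetchImpl(`${baseUrl}/health`);
    return res.ok;
  } catch {
    return false; // connection refused, DNS failure, and similar
  }
}
```

Typical use would be `isAlive(resolveEndpoint(process.env))` before sending the first request.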
|
|
83
|
+
|
|
84
|
+
## What it does
|
|
85
|
+
|
|
86
|
+
### MCP tools
|
|
87
|
+
|
|
88
|
+
| Tool | In → out | Notes |
|
|
89
|
+
|------|----------|-------|
|
|
90
|
+
| `tts` | text → `.ogg/opus` file | Short → `{ path, ... }` sync; long (>~2000 chars) → `{ job_id }` + an IAP "done" message later. |
|
|
91
|
+
| `stt` | audio file path → text | Returns `{ text, engine, fallback_from? }`. |
|
|
92
|
+
| `voice_create` | — | DEPRECATED alias for `tts`, kept one release; use `tts`. |
|
|
93
|
+
|
|
94
|
+
### HTTP routes (OpenAI-compatible)
|
|
41
95
|
|
|
42
|
-
|
|
96
|
+
| Route | Method | Body | Returns |
|
|
97
|
+
|-------|--------|------|---------|
|
|
98
|
+
| `/v1/audio/speech` | POST | JSON (`input`/`text`, `voice?`, `model?`/`engine?`, `lang?`, `style?`) | Ogg/Opus bytes; engine/fallback in `X-Voice-*` headers. |
|
|
99
|
+
| `/v1/audio/transcriptions` | POST | multipart (`file`, `language?`, `prompt?`, `engine?`, `response_format?`) | `{ text }`, or plain text with `response_format=text`. |
|
|
100
|
+
| `/health` | GET | — | `{ status, service, version }`. |
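As an illustration of the speech route, the sketch below posts JSON and collects the audio bytes together with whatever `X-Voice-*` headers come back. `synthesize` is a hypothetical helper, and the code deliberately matches headers by prefix rather than assuming exact header names, which the table does not spell out.

```javascript
/**
 * POST text to /v1/audio/speech and return the audio bytes plus any
 * X-Voice-* response headers (engine / fallback metadata).
 */
async function synthesize(baseUrl, body, fetchImpl = fetch) {
  const res = await fetchImpl(`${baseUrl}/v1/audio/speech`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`speech request failed: HTTP ${res.status}`);

  // Collect metadata headers without assuming their exact names.
  const meta = {};
  for (const [name, value] of res.headers) {
    const key = name.toLowerCase();
    if (key.startsWith('x-voice-')) meta[key] = value;
  }
  const audio = new Uint8Array(await res.arrayBuffer());
  return { audio, meta };
}
```

For example, `synthesize('http://127.0.0.1:8127', { input: 'Hello', voice: 'Aoede' })` would resolve to Ogg/Opus bytes that the runtime can write to disk or stream onward.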
|
|
101
|
+
|
|
102
|
+
### Engines
|
|
103
|
+
|
|
104
|
+
TTS — first applicable engine wins; the cascade advances on a known can't-serve error:
|
|
105
|
+
|
|
106
|
+
| Engine | Tier | Model | Notes |
|
|
107
|
+
|--------|------|-------|-------|
|
|
108
|
+
| Gemini | cloud primary | `gemini-3.1-flash-tts-preview` | ru+en in one pass; honors `style`; needs `GEMINI_API_KEY`. |
|
|
109
|
+
| gpt-audio | cloud second | `openai/gpt-audio` (OpenRouter) | Multilingual one pass; honors `style`; needs `OPENROUTER_API_KEY`. |
|
|
110
|
+
| F5-TTS | local | `f5-tts` | Live-prosody Russian rung; per-peer voice cloning; applies on the `ru` route only. |
|
|
111
|
+
| Supertonic 3 | local floor | `supertonic-3` | Offline, one pass in the routed language; the floor. |
|
|
112
|
+
|
|
113
|
+
STT — speaches when an endpoint is configured, otherwise the local floor:
|
|
114
|
+
|
|
115
|
+
| Engine | Tier | Notes |
|
|
116
|
+
|--------|------|-------|
|
|
117
|
+
| speaches | primary | OpenAI-compatible `/v1/audio/transcriptions`; set `PEER_VOICE_STT_ENDPOINT`. Skipped when unset. |
|
|
118
|
+
| mlx-whisper | local floor | Offline Apple-silicon Whisper via the `mlx_whisper` CLI. |
|
|
119
|
+
|
|
120
|
+
## `tts` parameters
|
|
43
121
|
|
|
44
122
|
```
|
|
45
|
-
|
|
123
|
+
tts(text, voice?, lang?, style?, note?, engine?, out_path?)
|
|
46
124
|
```
|
|
47
125
|
|
|
48
126
|
| Param | Type | Description |
|
|
49
127
|
|-------|------|-------------|
|
|
50
128
|
| `text` | string (required) | Text to speak. Mixed ru+en is fine — read in one pass. |
|
|
51
|
-
| `voice` | string | Gemini prebuilt voice. Default `Aoede`. |
|
|
129
|
+
| `voice` | string | Gemini prebuilt voice for the primary engine. Default `Aoede`. |
|
|
52
130
|
| `lang` | `ru` \| `en` \| `na` | Language hint for the fallback engines (Gemini reads any language itself). Omit to auto-detect by character share. |
|
|
53
131
|
| `style` | string | Delivery directive (tone / emotion / tempo) for the cloud engines; ignored by the local fallbacks. |
|
|
54
132
|
| `note` | string | Reminder echoed back in the async "done" message (e.g. who to deliver to). Ignored for short (sync) text. |
|
|
@@ -57,16 +135,38 @@ voice_create(text, voice?, lang?, style?, note?, engine?, out_path?)
|
|
|
57
135
|
|
|
58
136
|
**Returns** — short text (sync): `{ path, engine, voice, lang?, probe, fallback_from? }`. Long text (async): `{ job_id, status: "started" }`, followed by an IAP message `voice job <id> done path=<path> note=<note>` when synthesis finishes.
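A receiver of the async completion message might parse it along these lines. This is a sketch under assumptions: `parseDoneMessage` is a hypothetical helper, the message is taken to match the documented template literally, and the `note=` part is treated as optional.

```javascript
/**
 * Parse an IAP completion message of the documented form
 *   voice job <id> done path=<path> note=<note>
 * Returns null when the text is not a completion message.
 */
function parseDoneMessage(text) {
  const m = /^voice job (\S+) done path=(.*?)(?: note=(.*))?$/.exec(text.trim());
  if (!m) return null;
  return { jobId: m[1], path: m[2], note: m[3] ?? null };
}
```

A path that itself contains the text ` note=` would be split in the wrong place, so a stricter transport format would be preferable if one is available.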
|
|
59
137
|
|
|
60
|
-
|
|
138
|
+
## `stt` parameters
|
|
139
|
+
|
|
140
|
+
```
|
|
141
|
+
stt(audio_path, lang?, prompt?, engine?)
|
|
142
|
+
```
|
|
143
|
+
|
|
144
|
+
| Param | Type | Description |
|
|
145
|
+
|-------|------|-------------|
|
|
146
|
+
| `audio_path` | string (required) | Absolute path to the audio file to transcribe (e.g. a received voice `.ogg`). |
|
|
147
|
+
| `lang` | string | Language hint (ISO-639-1: `en`, `ru`, …). Omit to auto-detect. |
|
|
148
|
+
| `prompt` | string | Decoder-priming prompt — biases spelling/casing of terms (e.g. keep `Claude Code` in Latin). Not part of the output. |
|
|
149
|
+
| `engine` | `auto` \| `speaches` \| `mlx-whisper` | Force an engine. Default `auto`. |
|
|
150
|
+
|
|
151
|
+
**Returns** `{ text, engine, fallback_from? }`.
|
|
152
|
+
|
|
153
|
+
## Standalone and iapeer modes
|
|
154
|
+
|
|
155
|
+
The mode is decided by one signal: whether the caller has an iapeer identity. Both modes share the same core and the same key ladder (env → shell rc → config file).
|
|
156
|
+
|
|
157
|
+
- **Standalone** — no iapeer identity. Voice and keys come from one config file: `$PEER_VOICE_CONFIG`, else `<PEER_VOICE_HOME>/config.json`:
|
|
158
|
+
|
|
159
|
+
```jsonc
|
|
160
|
+
{
|
|
161
|
+
"voice": { "gemini-3.1-flash-tts-preview": "Aoede", "supertonic-3": "F3" },
|
|
162
|
+
"keys": { "GEMINI_API_KEY": "...", "OPENROUTER_API_KEY": "..." }
|
|
163
|
+
}
|
|
164
|
+
```
|
|
61
165
|
|
|
62
|
-
|
|
166
|
+
voice-connect only reads this file; it never writes config itself.
|
|
63
167
|
|
|
64
|
-
- **
|
|
65
|
-
- **Mixed ru+en in one pass.** Gemini reads both languages together, so a Russian-dominant line with a few English words comes back natural — no splitting, no per-language calls.
|
|
66
|
-
- **Always a result.** With no API key, routing falls through to the local F5 / Supertonic engines; synthesis works offline.
|
|
67
|
-
- **Long text never blocks.** Over ~2000 chars the tool returns a `job_id` at once and a detached worker notifies the caller over IAP when the file is ready.
|
|
68
|
-
- **Produces, never delivers.** The tool hands back a path; the caller attaches it with `send_to_peer` — one clear boundary between synthesis and delivery.
|
|
168
|
+
- **iapeer** — the caller has an iapeer identity (`PEER_PERSONALITY` or a cwd peer-profile). Voice comes from the peer-profile; keys from the host env/rc. iapeer owns the configuration.
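The mode signal and the key ladder can be pictured with the following sketch. Both functions are illustrative stand-ins, not the package's code: the sources are passed in as plain objects, and `hasCwdPeerProfile` is a hypothetical flag standing in for whatever peer-profile lookup the package performs.

```javascript
/**
 * Resolve one API key through the documented ladder:
 * environment, then shell rc exports, then the config file's `keys` map.
 * Each source is a plain object here; loading them is out of scope.
 */
function resolveKey(name, { env = {}, rc = {}, config = {} } = {}) {
  const candidates = [env[name], rc[name], config.keys?.[name]];
  for (const value of candidates) {
    if (typeof value === 'string' && value.trim()) return value.trim();
  }
  return null; // no key: the cascade falls through to the local engines
}

/** Standalone unless the caller carries an iapeer identity. */
function detectMode({ env = {}, hasCwdPeerProfile = false } = {}) {
  return (env.PEER_PERSONALITY || hasCwdPeerProfile) ? 'iapeer' : 'standalone';
}
```

Because the first non-empty source wins, an exported `GEMINI_API_KEY` overrides the same key in `config.json`.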
|
|
69
169
|
|
|
70
170
|
## License
|
|
71
171
|
|
|
72
|
-
[Apache-2.0](LICENSE). Platform: macOS.
|
|
172
|
+
[Apache-2.0](LICENSE). Platform: macOS.
|
|
@@ -0,0 +1,60 @@
|
|
|
1
|
+
#!/usr/bin/env node
|
|
2
|
+
/**
|
|
3
|
+
* Unified entry point for @agfpd/voice-connect.
|
|
4
|
+
*
|
|
5
|
+
* Arg-dispatched so the SAME bin serves both surfaces without breaking the
|
|
6
|
+
* live per-peer MCP wiring:
|
|
7
|
+
* • no subcommand (or `mcp`) → the MCP stdio server (what the per-peer
|
|
8
|
+
* .mcp.json launches via `npx -y @agfpd/voice-connect@latest`, relying on
|
|
9
|
+
* this bin being the package-name match — backward compatible).
|
|
10
|
+
* • init | update | status | uninstall → the host-provider lifecycle
|
|
11
|
+
* (src/provider.mjs), the verbs the iapeer foundation invokes.
|
|
12
|
+
*
|
|
13
|
+
* `voice-connect-mcp` remains a direct alias for the MCP server.
|
|
14
|
+
*/
|
|
15
|
+
const cmd = process.argv[2];
|
|
16
|
+
|
|
17
|
+
const die = (err) => {
|
|
18
|
+
process.stderr.write(`voice-connect: fatal: ${err && err.stack ? err.stack : err}\n`);
|
|
19
|
+
process.exit(1);
|
|
20
|
+
};
|
|
21
|
+
|
|
22
|
+
switch (cmd) {
|
|
23
|
+
case undefined:
|
|
24
|
+
case 'mcp': {
|
|
25
|
+
const { main } = await import('../src/server.mjs');
|
|
26
|
+
main().catch(die);
|
|
27
|
+
break;
|
|
28
|
+
}
|
|
29
|
+
case 'init': {
|
|
30
|
+
const { init } = await import('../src/provider.mjs');
|
|
31
|
+
await init().catch(die);
|
|
32
|
+
break;
|
|
33
|
+
}
|
|
34
|
+
case 'update': {
|
|
35
|
+
const { update } = await import('../src/provider.mjs');
|
|
36
|
+
await update().catch(die);
|
|
37
|
+
break;
|
|
38
|
+
}
|
|
39
|
+
case 'status': {
|
|
40
|
+
const { status } = await import('../src/provider.mjs');
|
|
41
|
+
try { status(); } catch (e) { die(e); }
|
|
42
|
+
break;
|
|
43
|
+
}
|
|
44
|
+
case 'uninstall': {
|
|
45
|
+
const { uninstall } = await import('../src/provider.mjs');
|
|
46
|
+
try { uninstall(); } catch (e) { die(e); }
|
|
47
|
+
break;
|
|
48
|
+
}
|
|
49
|
+
default:
|
|
50
|
+
process.stderr.write(
|
|
51
|
+
`voice-connect: unknown command: ${cmd}\n` +
|
|
52
|
+
`usage: voice-connect [mcp|init|update|status|uninstall]\n` +
|
|
53
|
+
` (no subcommand) | mcp run the MCP stdio server\n` +
|
|
54
|
+
` init deploy the host backend (launchd HTTP + slot; prompts TTS keys)\n` +
|
|
55
|
+
` update re-deploy fresh code + restart + bump slot version\n` +
|
|
56
|
+
` status print slot + launchd state + service health\n` +
|
|
57
|
+
` uninstall remove launchd service + slot + provider home\n`,
|
|
58
|
+
);
|
|
59
|
+
process.exit(2);
|
|
60
|
+
}
|
package/package.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "@agfpd/voice-connect",
|
|
3
|
-
"version": "0.
|
|
3
|
+
"version": "0.2.0",
|
|
4
4
|
"description": "voice-connect — voice service for agents. MCP tools tts (text → ready-to-send .ogg/opus voice file) and stt (audio → text) over one core, plus an OpenAI-compatible HTTP facade (/v1/audio/speech + /v1/audio/transcriptions) for runtimes. Cascade with fallback inside each tool — TTS: Gemini 3.1 Flash → gpt-audio (OpenRouter) → F5-TTS → Supertonic 3 local floor; STT: speaches → mlx-whisper local floor. Delivery stays the caller's job (send_to_peer attachments).",
|
|
5
5
|
"license": "Apache-2.0",
|
|
6
6
|
"author": {
|
|
@@ -32,6 +32,7 @@
|
|
|
32
32
|
],
|
|
33
33
|
"type": "module",
|
|
34
34
|
"bin": {
|
|
35
|
+
"voice-connect": "bin/voice-connect.mjs",
|
|
35
36
|
"voice-connect-mcp": "bin/peer-voice-mcp.mjs"
|
|
36
37
|
},
|
|
37
38
|
"files": [
|
package/src/http.mjs
CHANGED
|
@@ -24,9 +24,9 @@
|
|
|
24
24
|
* not a hand-rolled boundary parser.
|
|
25
25
|
*/
|
|
26
26
|
import { createServer as nodeCreateServer } from 'node:http';
|
|
27
|
-
import { readFile, writeFile, rm } from 'node:fs/promises';
|
|
28
|
-
import { tmpdir } from 'node:os';
|
|
29
|
-
import { join } from 'node:path';
|
|
27
|
+
import { readFile, writeFile, rm, mkdir, utimes } from 'node:fs/promises';
|
|
28
|
+
import { tmpdir, homedir } from 'node:os';
|
|
29
|
+
import { join, dirname } from 'node:path';
|
|
30
30
|
import { randomBytes } from 'node:crypto';
|
|
31
31
|
import { createVoice } from './voice.mjs';
|
|
32
32
|
import { transcribe as transcribeAudio } from './stt.mjs';
|
|
@@ -235,6 +235,21 @@ export function createHttpServer({ voice, transcribe, version } = {}) {
|
|
|
235
235
|
});
|
|
236
236
|
}
|
|
237
237
|
|
|
238
|
+
/** Liveness file the provider slot points at; the foundation reads its mtime
|
|
239
|
+
* for freshness in `iapeer status`. Kept in sync with provider.mjs HEARTBEAT_PATH. */
|
|
240
|
+
const HEARTBEAT_PATH = join(homedir(), '.iapeer', 'state', 'voice-connect', 'http.heartbeat');
|
|
241
|
+
const HEARTBEAT_INTERVAL_MS = 30_000;
|
|
242
|
+
|
|
243
|
+
/** mtime-touch the heartbeat (create on first call). Best-effort — never throws. */
|
|
244
|
+
async function touchHeartbeat() {
|
|
245
|
+
try {
|
|
246
|
+
await mkdir(dirname(HEARTBEAT_PATH), { recursive: true });
|
|
247
|
+
const now = new Date();
|
|
248
|
+
try { await utimes(HEARTBEAT_PATH, now, now); }
|
|
249
|
+
catch { await writeFile(HEARTBEAT_PATH, ''); }
|
|
250
|
+
} catch { /* heartbeat is advisory; a write failure must not kill the service */ }
|
|
251
|
+
}
|
|
252
|
+
|
|
238
253
|
/** Bootstrap: bind and run the always-on facade. Importable without side effects. */
|
|
239
254
|
export async function main() {
|
|
240
255
|
const server = createHttpServer();
|
|
@@ -245,8 +260,11 @@ export async function main() {
|
|
|
245
260
|
server.listen(port, host, resolve);
|
|
246
261
|
});
|
|
247
262
|
process.stderr.write(`voice-connect http: listening on http://${host}:${port} (version ${readVersion()})\n`);
|
|
263
|
+
await touchHeartbeat();
|
|
264
|
+
const heartbeat = setInterval(touchHeartbeat, HEARTBEAT_INTERVAL_MS);
|
|
265
|
+
heartbeat.unref(); // never keep the event loop alive on the heartbeat alone
|
|
248
266
|
for (const sig of ['SIGINT', 'SIGTERM']) {
|
|
249
|
-
process.on(sig, () => server.close(() => process.exit(0)));
|
|
267
|
+
process.on(sig, () => { clearInterval(heartbeat); server.close(() => process.exit(0)); });
|
|
250
268
|
}
|
|
251
269
|
return server;
|
|
252
270
|
}
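On the consumer side, freshness could be judged from the heartbeat file's mtime roughly as below. This is only a sketch: `isHeartbeatFresh` and its `graceFactor` tolerance are hypothetical, and how the iapeer foundation actually evaluates freshness is not shown in this diff.

```javascript
// The daemon touches the heartbeat every 30 s (HEARTBEAT_INTERVAL_MS in http.mjs).
const HEARTBEAT_INTERVAL_MS = 30_000;

/**
 * Treat the service as live when the heartbeat mtime is recent.
 * `graceFactor` is a hypothetical tolerance: allow a few missed beats
 * before declaring the service stale. A future mtime counts as fresh.
 */
function isHeartbeatFresh(mtimeMs, nowMs = Date.now(), graceFactor = 3) {
  const ageMs = nowMs - mtimeMs;
  return ageMs <= HEARTBEAT_INTERVAL_MS * graceFactor;
}
```

The `mtimeMs` value would typically come from `stat()` in `node:fs/promises` applied to the `heartbeat` path recorded in the slot.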
|
package/src/provider.mjs
ADDED
|
@@ -0,0 +1,344 @@
|
|
|
1
|
+
/**
|
|
2
|
+
* Host-level provider lifecycle for voice-connect — the callable verbs the
|
|
3
|
+
* iapeer foundation invokes (`init`, `update`) plus `status` and `uninstall`.
|
|
4
|
+
*
|
|
5
|
+
* Contract (agreed with the iapeer peer, 2026-06-22): voice-connect is a
|
|
6
|
+
* host-level provider modelled 1:1 on the memory provider. The foundation's
|
|
7
|
+
* onboarding step runs `voice-connect init` with inherited stdio; its update
|
|
8
|
+
* cascade leg runs `voice-connect update` via
|
|
9
|
+
* `npm exec --package=@agfpd/voice-connect@latest`. The core only READS the
|
|
10
|
+
* slot ~/.iapeer/voice-provider.json — the provider declares it.
|
|
11
|
+
*
|
|
12
|
+
* Scope (the PIN): init = HOST BACKEND only — the self-managed launchd HTTP
|
|
13
|
+
* facade (com.voice-connect.http) + the discovery slot + TTS-key prompts. It
|
|
14
|
+
* wires NO host-wide MCP. The voice_create/tts MCP surface is per-peer via
|
|
15
|
+
* `iapeer enable voice-connect <peer>` (cwd-resolved per-peer voice identity,
|
|
16
|
+
* the 0.1.10 codex-parity mechanism). So the slot carries no provision/
|
|
17
|
+
* unprovision commands (unlike memory — voice is opt-in, not auto-at-birth).
|
|
18
|
+
*
|
|
19
|
+
* Robust-update model: init/update copy the RUNNING package's runtime tree into
|
|
20
|
+
* a STABLE provider home (~/.iapeer/providers/voice-connect) and point the
|
|
21
|
+
* launchd plist there — never at the ephemeral npx cache (which is GC'd; a
|
|
22
|
+
* plist into _npx/<hash> is the exact fragile path to avoid). `update` runs
|
|
23
|
+
* from the `npm exec …@latest` temp package = fresh code; it re-copies that into
|
|
24
|
+
* the stable home, restarts the service, and the slot version (read from the
|
|
25
|
+
* copied package.json) reflects the new release — so the foundation's
|
|
26
|
+
* before/after version-gate is real.
|
|
27
|
+
*
|
|
28
|
+
* This module is the single source of truth for the launchd posture; the dev
|
|
29
|
+
* tool scripts/launchd-http.mjs delegates here (codeHome = the repo, no copy).
|
|
30
|
+
*/
|
|
31
|
+
import {
|
|
32
|
+
readFileSync, writeFileSync, mkdirSync, rmSync, cpSync,
|
|
33
|
+
existsSync, accessSync, constants,
|
|
34
|
+
} from 'node:fs';
|
|
35
|
+
import { homedir } from 'node:os';
|
|
36
|
+
import { join, dirname, resolve } from 'node:path';
|
|
37
|
+
import { fileURLToPath } from 'node:url';
|
|
38
|
+
import { execFileSync } from 'node:child_process';
|
|
39
|
+
import { createInterface } from 'node:readline';
|
|
40
|
+
import { configPath } from './configfile.mjs';
|
|
41
|
+
import { readGeminiKey, readOpenRouterKey } from './apikey.mjs';
|
|
42
|
+
|
|
43
|
+
const HOME = homedir();
|
|
44
|
+
|
|
45
|
+
// ── Identity ───────────────────────────────────────────────────────────────
|
|
46
|
+
// SERVICE drives the label, log dir and the slot's `provider` field. ROLE is the
|
|
47
|
+
// stable discovery role — the slot FILE NAME and the operator env-file name,
|
|
48
|
+
// which survive any implementation rename (mirrors ~/.iapeer/memory-provider.json:
|
|
49
|
+
// file name = role, `provider` field = concrete provider).
|
|
50
|
+
const SERVICE = 'voice-connect';
|
|
51
|
+
const ROLE = 'voice-provider';
|
|
52
|
+
const LABEL = `com.${SERVICE}.http`;
|
|
53
|
+
|
|
54
|
+
const STABLE_HOME = join(HOME, '.iapeer', 'providers', SERVICE);
|
|
55
|
+
const PLIST_PATH = join(HOME, 'Library', 'LaunchAgents', `${LABEL}.plist`);
|
|
56
|
+
const LOG_DIR = join(HOME, '.iapeer', 'logs', `${SERVICE}-http`);
|
|
57
|
+
const SLOT_PATH = join(HOME, '.iapeer', `${ROLE}.json`);
|
|
58
|
+
// Host-local operator STT config (speaches primary endpoint/model) — NOT in the
|
|
59
|
+
// repo, NOT published. Sourced into the plist so the deployed service has a
|
|
60
|
+
// working STT tier and survives a bare reinstall. Absent → mlx-whisper floor.
|
|
61
|
+
const STT_ENV_FILE = join(HOME, '.iapeer', `${ROLE}.env`);
|
|
62
|
+
// Liveness file the HTTP daemon mtime-touches; the foundation reads its mtime
|
|
63
|
+
// for freshness in `iapeer status`. http.mjs computes the same path.
|
|
64
|
+
const HEARTBEAT_PATH = join(HOME, '.iapeer', 'state', SERVICE, 'http.heartbeat');
|
|
65
|
+
const DOMAIN = `gui/${process.getuid()}`;
|
|
66
|
+
|
|
67
|
+
// The running package's root: src/provider.mjs → repo root. In dev that's the
|
|
68
|
+
// checkout; under `npm exec …@latest` it's the freshly-fetched temp package.
|
|
69
|
+
const SOURCE_ROOT = resolve(dirname(fileURLToPath(import.meta.url)), '..');
|
|
70
|
+
|
|
71
|
+
// ── small helpers ────────────────────────────────────────────────────────────
|
|
72
|
+
function port() {
|
|
73
|
+
const p = parseInt(process.env.PEER_VOICE_HTTP_PORT || '', 10);
|
|
74
|
+
return Number.isInteger(p) && p > 0 ? p : 8127;
|
|
75
|
+
}
|
|
76
|
+
function host() {
|
|
77
|
+
return (process.env.PEER_VOICE_HTTP_HOST || '127.0.0.1').trim() || '127.0.0.1';
|
|
78
|
+
}
|
|
79
|
+
|
|
80
|
+
/** Absolute node path for launchd's minimal PATH. Prefer the brew SYMLINK over
|
|
81
|
+
* the versioned Cellar path (which breaks on every `brew upgrade node`). */
|
|
82
|
+
function nodeBin() {
|
|
83
|
+
for (const c of ['/opt/homebrew/bin/node', '/usr/local/bin/node']) {
|
|
84
|
+
try { accessSync(c, constants.X_OK); return c; } catch { /* try next */ }
|
|
85
|
+
}
|
|
86
|
+
return process.execPath;
|
|
87
|
+
}
|
|
88
|
+
|
|
89
|
+
/** Parse a simple KEY=VALUE / `export KEY=VALUE` env file (optional quotes). */
|
|
90
|
+
function readEnvFile(p) {
|
|
91
|
+
const out = {};
|
|
92
|
+
try {
|
|
93
|
+
for (const line of readFileSync(p, 'utf8').split('\n')) {
|
|
94
|
+
const m = line.match(/^\s*(?:export\s+)?([A-Z0-9_]+)\s*=\s*(.*?)\s*$/);
|
|
95
|
+
if (!m || line.trim().startsWith('#')) continue;
|
|
96
|
+
let v = m[2];
|
|
97
|
+
if ((v.startsWith('"') && v.endsWith('"')) || (v.startsWith("'") && v.endsWith("'"))) v = v.slice(1, -1);
|
|
98
|
+
out[m[1]] = v;
|
|
99
|
+
}
|
|
100
|
+
} catch { /* no file → no STT config */ }
|
|
101
|
+
return out;
|
|
102
|
+
}
|
|
103
|
+
|
|
104
|
+
/** Resolved STT tier config: the operator env-file is the durable source;
|
|
105
|
+
* process.env overrides it. Empty endpoint → speaches tier skipped, floor used. */
|
|
106
|
+
function sttConfig() {
|
|
107
|
+
const file = readEnvFile(STT_ENV_FILE);
|
|
108
|
+
const endpoint = (process.env.PEER_VOICE_STT_ENDPOINT ?? file.PEER_VOICE_STT_ENDPOINT ?? '').trim();
|
|
109
|
+
const model = (process.env.PEER_VOICE_STT_MODEL ?? file.PEER_VOICE_STT_MODEL ?? '').trim();
|
|
110
|
+
return { endpoint, model };
|
|
111
|
+
}
|
|
112
|
+
|
|
113
|
+
const xml = (s) => String(s).replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
|
|
114
|
+
|
|
115
|
+
function readManifest(codeHome) {
|
|
116
|
+
return JSON.parse(readFileSync(join(codeHome, 'package.json'), 'utf8'));
|
|
117
|
+
}
|
|
118
|
+
|
|
119
|
+
// ── render: plist + slot (parameterized by codeHome) ─────────────────────────
|
|
120
|
+
/** Render the launchd plist for a service whose code lives at `codeHome`. */
|
|
121
|
+
export function renderPlist(codeHome) {
|
|
122
|
+
const node = nodeBin();
|
|
123
|
+
const entry = join(codeHome, 'bin', 'peer-voice-http.mjs');
|
|
124
|
+
// ~/.local/bin FIRST: the mlx-whisper STT floor lives there (uv tool); without
|
|
125
|
+
// it the floor crash-loops ENOENT under launchd's minimal PATH. Then brew + system.
|
|
126
|
+
const PATH = `${HOME}/.local/bin:/opt/homebrew/bin:/usr/bin:/bin:/usr/sbin:/sbin`;
|
|
127
|
+
const { endpoint: sttEndpoint, model: sttModel } = sttConfig();
|
|
128
|
+
const sttEnv =
|
|
129
|
+
(sttEndpoint ? ` <key>PEER_VOICE_STT_ENDPOINT</key>\n <string>${xml(sttEndpoint)}</string>\n` : '') +
|
|
130
|
+
(sttModel ? ` <key>PEER_VOICE_STT_MODEL</key>\n <string>${xml(sttModel)}</string>\n` : '');
|
|
131
|
+
return `<?xml version="1.0" encoding="UTF-8"?>
|
|
132
|
+
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
|
|
133
|
+
<plist version="1.0">
|
|
134
|
+
<dict>
|
|
135
|
+
<key>Label</key>
|
|
136
|
+
<string>${LABEL}</string>
|
|
137
|
+
|
|
138
|
+
<key>ProgramArguments</key>
|
|
139
|
+
<array>
|
|
140
|
+
<string>${xml(node)}</string>
|
|
141
|
+
<string>${xml(entry)}</string>
|
|
142
|
+
</array>
|
|
143
|
+
|
|
144
|
+
<key>WorkingDirectory</key>
|
|
145
|
+
<string>${xml(codeHome)}</string>
|
|
146
|
+
|
|
147
|
+
<!-- launchd gives a minimal PATH; bake an absolute node path + a usable PATH
|
|
148
|
+
so a bare binary name can't crash-loop. -->
|
|
149
|
+
<key>EnvironmentVariables</key>
|
|
150
|
+
<dict>
|
|
151
|
+
<key>PATH</key>
|
|
152
|
+
<string>${PATH}</string>
|
|
153
|
+
<key>PEER_VOICE_HTTP_PORT</key>
|
|
154
|
+
<string>${port()}</string>
|
|
155
|
+
<key>PEER_VOICE_HTTP_HOST</key>
|
|
156
|
+
<string>${xml(host())}</string>
|
|
157
|
+
${sttEnv} </dict>
|
|
158
|
+
|
|
159
|
+
<!-- durable: start at load, restart on crash, throttle the respawn -->
|
|
160
|
+
<key>RunAtLoad</key>
|
|
161
|
+
<true/>
|
|
162
|
+
<key>KeepAlive</key>
|
|
163
|
+
<true/>
|
|
164
|
+
<key>ThrottleInterval</key>
|
|
165
|
+
<integer>10</integer>
|
|
166
|
+
|
|
167
|
+
<key>StandardOutPath</key>
|
|
168
|
+
<string>${xml(join(LOG_DIR, 'launchd-stdout.log'))}</string>
|
|
169
|
+
<key>StandardErrorPath</key>
|
|
170
|
+
<string>${xml(join(LOG_DIR, 'launchd-stderr.log'))}</string>
|
|
171
|
+
</dict>
|
|
172
|
+
</plist>
|
|
173
|
+
`;
|
|
174
|
+
}
|
|
175
|
+
|
|
176
|
+
/** Declarative discovery slot — the manifest the core READS (never writes).
|
|
177
|
+
* Mirrors ~/.iapeer/memory-provider.json: `provider` (the concrete provider), package, version,
|
|
178
|
+
* registeredAt, heartbeat; plus voice-specific endpoint/routes for discovery. */
|
|
179
|
+
export function renderSlot(codeHome) {
|
|
180
|
+
const pkg = readManifest(codeHome);
|
|
181
|
+
return JSON.stringify(
|
|
182
|
+
{
|
|
183
|
+
provider: SERVICE,
|
|
184
|
+
package: pkg.name,
|
|
185
|
+
version: pkg.version,
|
|
186
|
+
registeredAt: new Date().toISOString(),
|
|
187
|
+
heartbeat: HEARTBEAT_PATH,
|
|
188
|
+
managed: 'self',
|
|
189
|
+
label: LABEL,
|
|
190
|
+
host: host(),
|
|
191
|
+
port: port(),
|
|
192
|
+
endpoint: `http://${host()}:${port()}`,
|
|
193
|
+
routes: {
|
|
194
|
+
tts: '/v1/audio/speech',
|
|
195
|
+
stt: '/v1/audio/transcriptions',
|
|
196
|
+
health: '/health',
|
|
197
|
+
},
|
|
198
|
+
},
|
|
199
|
+
null,
|
|
200
|
+
2,
|
|
201
|
+
) + '\n';
|
|
202
|
+
}
|
|
203
|
+
|
|
204
|
+
// ── launchd install / uninstall / status (no code copy) ──────────────────────
|
|
205
|
+
/** Write plist+slot for code at `codeHome` and (re)load the agent. Idempotent. */
|
|
206
|
+
export function installLaunchd(codeHome) {
|
|
207
|
+
mkdirSync(LOG_DIR, { recursive: true });
|
|
208
|
+
mkdirSync(dirname(SLOT_PATH), { recursive: true });
|
|
209
|
+
mkdirSync(dirname(PLIST_PATH), { recursive: true });
|
|
210
|
+
mkdirSync(dirname(HEARTBEAT_PATH), { recursive: true });
|
|
211
|
+
writeFileSync(PLIST_PATH, renderPlist(codeHome));
|
|
212
|
+
writeFileSync(SLOT_PATH, renderSlot(codeHome));
|
|
213
|
+
// Idempotent (re)load: bootout an existing instance, then bootstrap fresh.
|
|
214
|
+
try { execFileSync('launchctl', ['bootout', DOMAIN, PLIST_PATH], { stdio: 'ignore' }); } catch { /* not loaded */ }
|
|
215
|
+
execFileSync('launchctl', ['bootstrap', DOMAIN, PLIST_PATH], { stdio: 'inherit' });
|
|
216
|
+
}
|
|
217
|
+
|
|
218
|
+
export function uninstall() {
|
|
219
|
+
try { execFileSync('launchctl', ['bootout', DOMAIN, PLIST_PATH], { stdio: 'ignore' }); } catch { /* not loaded */ }
|
|
220
|
+
rmSync(PLIST_PATH, { force: true });
|
|
221
|
+
rmSync(SLOT_PATH, { force: true });
|
|
222
|
+
// Remove our code home; KEEP user data (config.json keys) + logs + STT env.
|
|
223
|
+
rmSync(STABLE_HOME, { recursive: true, force: true });
|
|
224
|
+
process.stdout.write(
|
|
225
|
+
`uninstalled ${LABEL}\n removed: plist, slot, ${STABLE_HOME}\n kept: logs, ${STT_ENV_FILE}, key config\n`,
|
|
226
|
+
);
|
|
227
|
+
}
|
|
228
|
+
|
|
229
|
+
export function status() {
|
|
230
|
+
const slot = existsSync(SLOT_PATH) ? readFileSync(SLOT_PATH, 'utf8') : null;
|
|
231
|
+
process.stdout.write(
|
|
232
|
+
`provider: ${SERVICE}\nlabel: ${LABEL}\nplist: ${PLIST_PATH} ${existsSync(PLIST_PATH) ? '(present)' : '(absent)'}\n` +
|
|
233
|
+
`slot: ${SLOT_PATH} ${slot ? '(present)' : '(absent)'}\nhome: ${STABLE_HOME} ${existsSync(STABLE_HOME) ? '(present)' : '(absent)'}\n\n`,
|
|
234
|
+
);
|
|
235
|
+
if (slot) process.stdout.write(slot + '\n');
|
|
236
|
+
try {
|
|
237
|
+
process.stdout.write(execFileSync('launchctl', ['print', `${DOMAIN}/${LABEL}`], { encoding: 'utf8' }));
|
|
238
|
+
} catch {
|
|
239
|
+
process.stdout.write(`(${LABEL} not loaded in ${DOMAIN})\n`);
|
|
240
|
+
}
|
|
241
|
+
}
|
|
242
|
+
+// ── runtime tree copy → stable home ──────────────────────────────────────────
+/** Copy the running package's runtime files into the stable provider home,
+ * replacing any prior copy (so files removed in a new release don't linger),
+ * and MATERIALIZE the runtime dependency. The published tarball ships no
+ * node_modules (`files` is bin/src/LICENSE/README), so a bare file copy leaves
+ * the service ERR_MODULE_NOT_FOUND on @modelcontextprotocol/sdk. We prefer
+ * copying the source's node_modules (offline, the exact tested versions); when
+ * absent — the `npm exec …@latest` case, where deps hoist to the npx env root,
+ * not inside the package — we `npm install --omit=dev` from the registry. */
+function copyRuntimeTree(src, dest) {
+  mkdirSync(dest, { recursive: true });
+  for (const item of ['bin', 'src']) {
+    rmSync(join(dest, item), { recursive: true, force: true });
+    cpSync(join(src, item), join(dest, item), { recursive: true });
+  }
+  for (const file of ['package.json', 'LICENSE', 'README.md']) {
+    const from = join(src, file);
+    if (existsSync(from)) cpSync(from, join(dest, file));
+  }
+  const srcModules = join(src, 'node_modules');
+  const destModules = join(dest, 'node_modules');
+  rmSync(destModules, { recursive: true, force: true });
+  if (existsSync(srcModules)) {
+    cpSync(srcModules, destModules, { recursive: true });
+  } else {
+    execFileSync('npm', ['install', '--omit=dev', '--no-audit', '--no-fund', '--no-save'],
+      { cwd: dest, stdio: 'inherit' });
+  }
+}
+
+// ── interactive key prompts (init only) ──────────────────────────────────────
+/** Prompt once on a TTY for any unresolved cloud key and persist to the
+ * autonomous config file's `keys` map. The RUNTIME stays read-only; this write
+ * is the operator provision action. Skipped without a TTY (update / headless)
+ * and for keys already resolvable via env/rc/config — local floors need none. */
+async function ensureKeys() {
+  if (!process.stdin.isTTY) return;
+  const wanted = [
+    { name: 'GEMINI_API_KEY', get: readGeminiKey, label: 'Gemini TTS (primary)' },
+    { name: 'OPENROUTER_API_KEY', get: readOpenRouterKey, label: 'gpt-audio via OpenRouter (fallback)' },
+  ];
+  const missing = wanted.filter((k) => !k.get());
+  if (!missing.length) return;
+
+  const rl = createInterface({ input: process.stdin, output: process.stdout });
+  const ask = (q) => new Promise((res) => rl.question(q, (a) => res(a.trim())));
+  process.stdout.write(
+    '\nvoice-connect: optional TTS cloud keys (blank = skip; the local F5/Supertonic floor needs none):\n',
+  );
+  const collected = {};
+  for (const k of missing) {
+    const v = await ask(` ${k.label} — ${k.name}: `);
+    if (v) collected[k.name] = v;
+  }
+  rl.close();
+  if (Object.keys(collected).length) persistKeys(collected);
+}
+
+/** Merge `keys` into the autonomous config file (create dirs as needed). */
+function persistKeys(keys) {
+  const p = configPath();
+  let data = {};
+  try { data = JSON.parse(readFileSync(p, 'utf8')) || {}; } catch { /* new file */ }
+  data.keys = { ...(data.keys && typeof data.keys === 'object' ? data.keys : {}), ...keys };
+  mkdirSync(dirname(p), { recursive: true });
+  writeFileSync(p, JSON.stringify(data, null, 2) + '\n');
+  process.stdout.write(` saved ${Object.keys(keys).join(', ')} → ${p}\n`);
+}
+
+// ── the verbs ────────────────────────────────────────────────────────────────
+/** Deploy the host backend: copy code → stable home, (init only) prompt keys,
+ * install the launchd HTTP service + slot. Idempotent. */
+export async function deploy({ interactive = false } = {}) {
+  const prev = readSlotVersion();
+  copyRuntimeTree(SOURCE_ROOT, STABLE_HOME);
+  if (interactive) await ensureKeys();
+  installLaunchd(STABLE_HOME);
+  const pkg = readManifest(STABLE_HOME);
+  const stt = sttConfig();
+  const verb = prev ? (prev === pkg.version ? `re-deployed (v${pkg.version})` : `updated ${prev} → ${pkg.version}`) : `installed v${pkg.version}`;
+  process.stdout.write(
+    `voice-connect ${verb}\n label: ${LABEL}\n home: ${STABLE_HOME}\n slot: ${SLOT_PATH}\n` +
+    ` endpoint: http://${host()}:${port()}\n heartbeat:${HEARTBEAT_PATH}\n` +
+    ` STT primary: ${stt.endpoint || '(none — mlx-whisper floor only)'}${stt.model ? ` [${stt.model}]` : ''}\n`,
+  );
+}
+
+/** Read the current slot's version (for update from→to reporting), or null. */
+function readSlotVersion() {
+  try { return JSON.parse(readFileSync(SLOT_PATH, 'utf8')).version || null; } catch { return null; }
+}
+
+/** `init` — onboard install. Inherits the tty for key prompts. */
+export const init = () => deploy({ interactive: true });
+
+/** `update` — cascade leg. Non-interactive; re-copies fresh code + restarts +
+ * bumps the slot version. MUST be invoked via
+ * `npm exec --package=@agfpd/voice-connect@latest -- voice-connect update`
+ * so SOURCE_ROOT is the freshly-fetched package, not a stale on-disk copy. */
+export const update = () => deploy({ interactive: false });
+
+export const paths = { SERVICE, ROLE, LABEL, STABLE_HOME, PLIST_PATH, SLOT_PATH, STT_ENV_FILE, HEARTBEAT_PATH, LOG_DIR };
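
The merge performed by `persistKeys` above can be illustrated in isolation. The following is a minimal sketch, not code from the package: `mergeKeys` is a hypothetical helper that takes the config file's text instead of reading it from disk, and the `port` field in the sample config is invented for illustration. It shows that unrelated top-level fields and previously stored keys survive, newly entered keys win on conflict, and unreadable content falls back to an empty config.

```javascript
// Hypothetical helper (not part of the package): the merge step of
// persistKeys, isolated as a pure function so it runs without touching disk.
function mergeKeys(existingText, newKeys) {
  let data = {};
  try {
    // `|| {}` covers a file whose entire content is the JSON literal `null`.
    data = JSON.parse(existingText) || {};
  } catch {
    // Missing or malformed content: start from an empty config.
  }
  const prior = data.keys && typeof data.keys === 'object' ? data.keys : {};
  // The later spread wins, so freshly entered keys override stored ones.
  data.keys = { ...prior, ...newKeys };
  return JSON.stringify(data, null, 2) + '\n';
}

// Illustrative input; `port` is an assumed unrelated field.
const before = JSON.stringify({ port: 8123, keys: { GEMINI_API_KEY: 'old' } });
const after = JSON.parse(mergeKeys(before, { OPENROUTER_API_KEY: 'new' }));
console.log(after);
```

Keeping the merge pure makes the precedence rule easy to check; the real function additionally creates the parent directory and writes the result back to the path returned by `configPath()`.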