pi-voice-input 0.1.2 → 0.2.0

package/AGENTS.md CHANGED
@@ -13,8 +13,8 @@ Development workflow for this repo.
13
13
 
14
14
  ## Secrets and local data
15
15
 
16
- - Never commit API keys, `.env`, recordings, logs, caches, or `node_modules`.
17
- - User credentials belong in `~/.pi/agent/voice-input.env`, usually written by `/voice key`.
16
+ - Never commit API keys, `.env`, local config JSON, recordings, logs, caches, or `node_modules`.
17
+ - User credentials and plugin settings belong in `~/.pi/agent/voice-input.config.json`, usually written by `/voice key` or `/voice init`.
18
18
  - Do not print or copy real API keys into commits, docs, tests, or command output.
19
19
  - The explicit VolcEngine API key URL that should be shown to users is:
20
20
  `https://console.volcengine.com/speech/new/setting/apikeys?projectName=default`
@@ -33,7 +33,6 @@ npm pack --dry-run
33
33
  Check that `npm pack --dry-run` includes only publishable files, normally:
34
34
 
35
35
  ```text
36
- .env.example
37
36
  AGENTS.md
38
37
  README.md
39
38
  extensions/voice-input.ts
@@ -50,7 +49,7 @@ Then check:
50
49
 
51
50
  ```bash
52
51
  git status --short
53
- rg -n "VOLC_API_KEY=|[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}" \
52
+ rg -n '"volcApiKey"\s*:\s*"[^"]+"|VOLC_API_KEY=|[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}' \
54
53
  --glob '!node_modules/**' --glob '!package-lock.json' . || true
55
54
  ```
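Review note: the check targets three kinds of leaked secret: a non-empty `volcApiKey` in JSON, a legacy `VOLC_API_KEY=` assignment, and any UUID-shaped token. The sketch below restates those patterns as TypeScript regular expressions so they can be unit-tested before relying on the shell command. `SECRET_PATTERNS` and `findSecretLines` are hypothetical names for illustration and are not part of the package.

```typescript
// Hypothetical helper mirroring the three patterns the rg check is meant to catch.
const SECRET_PATTERNS: RegExp[] = [
  /"volcApiKey"\s*:\s*"[^"]+"/, // non-empty key in the JSON config
  /VOLC_API_KEY=/, // legacy env-style assignment
  /[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/, // UUID-shaped token
];

// Returns the 1-based line numbers that match at least one pattern.
function findSecretLines(text: string): number[] {
  const hits: number[] = [];
  text.split(/\r?\n/).forEach((line, index) => {
    if (SECRET_PATTERNS.some((pattern) => pattern.test(line))) hits.push(index + 1);
  });
  return hits;
}
```

An empty `"volcApiKey": ""` is deliberately not flagged, so the default config written by `/voice init` passes the check.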
56
55
 
package/README.md CHANGED
@@ -23,6 +23,8 @@ pi extension: extensions/voice-input.ts
23
23
  ├─ parses the WAV container in TypeScript and extracts raw PCM
24
24
  ├─ sends PCM frames to the configured ASR provider via ws
25
25
  │ └─ current provider: VolcEngine /api/v3/sauc/bigmodel_nostream
26
+ ├─ optionally post-processes raw ASR text with a configured pi model
27
+ │ └─ default: deepseek/deepseek-v4-flash, no reasoning option
26
28
  └─ appends the final transcript to pi's editor with ctx.ui.setEditorText()
27
29
  ```
28
30
 
@@ -43,18 +45,6 @@ Install the published package with pi:
43
45
  pi install npm:pi-voice-input
44
46
  ```
45
47
 
46
- To pin a specific version:
47
-
48
- ```bash
49
- pi install npm:pi-voice-input@0.1.0
50
- ```
51
-
52
- If pi is already running, reload extensions after installation:
53
-
54
- ```text
55
- /reload
56
- ```
57
-
58
48
  ## Providers
59
49
 
60
50
  The extension is structured around a provider boundary: recording, editor insertion, and command handling are generic; ASR transport/protocol logic is provider-specific.
@@ -68,92 +58,49 @@ Planned provider direction:
68
58
  - add more ASR providers without changing the shortcut/user workflow
69
59
  - keep provider credentials and options isolated in config
70
60
 
71
- ## Configure credentials
61
+ ## Configure
72
62
 
73
- In pi, run:
63
+ All plugin settings live in one JSON file:
74
64
 
75
65
  ```text
76
- /voice key
66
+ ~/.pi/agent/voice-input.config.json
77
67
  ```
78
68
 
79
- Paste your VolcEngine Speech API key into the prompt. The extension saves it for future sessions and keeps it out of your project files.
69
+ Package-local and project-local env files are not read.
80
70
 
81
- The key URL is also shown inside pi when the key is missing, when you run `/voice key`, and in `/voice help`:
71
+ Create or normalize the file from inside pi:
82
72
 
83
73
  ```text
84
- https://console.volcengine.com/speech/new/setting/apikeys?projectName=default
74
+ /voice init
85
75
  ```
86
76
 
87
- Then verify:
77
+ Then set the VolcEngine Speech API key:
88
78
 
89
79
  ```text
90
- /voice config
80
+ /voice key
91
81
  ```
92
82
 
93
- You can get/manage the key here:
83
+ The key URL is also shown inside pi when the key is missing, when you run `/voice key`, and in `/voice help`:
94
84
 
95
85
  https://console.volcengine.com/speech/new/setting/apikeys?projectName=default
96
86
 
97
- If `VOLC_API_KEY` is missing, the extension does not silently fail. It shows an error notification explaining:
98
-
99
- - that the current provider API key is missing
100
- - to run `/voice key`
101
- - the VolcEngine API-key settings URL
102
- - that `/voice config` can be used to verify detection
87
+ The config file is plain JSON and can be edited directly:
103
88
 
104
- Manual fallback:
105
-
106
- ```bash
107
- mkdir -p ~/.pi/agent
108
- cp .env.example ~/.pi/agent/voice-input.env
109
- $EDITOR ~/.pi/agent/voice-input.env
89
+ ```json
90
+ {
91
+ "volcApiKey": "",
92
+ "polishModel": "deepseek/deepseek-v4-flash"
93
+ }
110
94
  ```
111
95
 
112
- ## Configuration reference
113
-
114
- Example:
115
-
116
- ```bash
117
- # Required for the current provider. Usually set by /voice key.
118
- VOLC_API_KEY=your_volcengine_speech_api_key
119
-
120
- # Current provider: VolcEngine WebSocket ASR endpoint and resource
121
- VOLC_WS_URL=wss://openspeech.bytedance.com/api/v3/sauc/bigmodel_nostream
122
- VOLC_STREAM_RESOURCE_ID=volc.seedasr.sauc.duration
123
-
124
- # Empty means auto-detect. Example: zh-CN.
125
- ASR_LANGUAGE=
126
-
127
- # Optional contextual prompt for ASR.
128
- ASR_PROMPT=
96
+ `polishModel` is resolved from pi's model registry, so any model shown by `pi --list-models` can be used. Leave it empty to disable polish. If polishing fails, the raw ASR transcript is inserted instead.
129
97
 
130
- # Faster for post-recording batch transcription. Use 200 for realtime-like packet size.
131
- STREAM_SEGMENT_MS=5000
132
- ASR_REQUEST_TIMEOUT_MS=90000
98
+ Verify the effective non-secret config:
133
99
 
134
- # Empty means use PipeWire's default source.
135
- RECORDER_TARGET=
136
- RECORDING_FINALIZE_DELAY=0.1
137
-
138
- # Storage for recordings, logs, and state.
139
- VOICE_INPUT_HOME=~/.pi/agent/voice-input
140
- RECORDINGS_DIR=recordings
141
- RECORDER_STATE=recording.json
142
- RECORDER_LOG_DIR=logs
143
-
144
- # Shortcut. Default: ctrl+shift+r
145
- VOICE_INPUT_SHORTCUT=ctrl+shift+r
100
+ ```text
101
+ /voice config
146
102
  ```
147
103
 
148
- Config loading order, later values override earlier ones:
149
-
150
- 1. `~/.pi/agent/voice-input.env`
151
- 2. package-local `.env`
152
- 3. current-working-directory `.env`
153
- 4. shell environment variables
154
-
155
- Do not commit real credentials. Prefer `/voice key`, or keep private local values in `.env` or `~/.pi/agent/voice-input.env`.
156
-
157
104
  ## Usage
158
105
 
159
106
  Shortcut:
@@ -171,6 +118,7 @@ Slash commands:
171
118
  /voice cancel # stop recording without transcribing
172
119
  /voice status # show recorder state
173
120
  /voice config # show effective non-secret config and whether API key is detected
121
+ /voice init # create or normalize ~/.pi/agent/voice-input.config.json
174
122
  /voice key # prompt for and save the current provider API key
175
123
  /voice help # show setup help, including the explicit VolcEngine API key URL
176
124
  ```
@@ -178,8 +126,10 @@ Slash commands:
178
126
  ## Notes
179
127
 
180
128
  - The extension uses post-recording WebSocket ASR: it records locally first, then sends the stopped recording in chunks. It is optimized for fast voice input, not live subtitles.
181
- - The default `STREAM_SEGMENT_MS=5000` is intentionally larger than realtime packet sizes because this workflow sends already-recorded audio.
129
+ - The default ASR segment size is intentionally larger than realtime packet sizes because this workflow sends already-recorded audio.
182
130
  - The transcript is inserted into the editor only; it is not submitted automatically.
131
+ - When `polishModel` is set, polishing uses the current editor content and recent session messages as context, but outputs only the refined user instruction.
132
+ - While recording, the status line and tool panel show `Recording with [device name]`.
183
133
 
184
134
  ## Development
185
135
 
@@ -209,7 +159,7 @@ After changing the extension while pi is open, run:
209
159
  /reload
210
160
  ```
211
161
 
212
- ## Volcengine links
162
+ ## Links
213
163
 
214
164
  - API key settings: https://console.volcengine.com/speech/new/setting/apikeys?projectName=default
215
165
  - ASR product page: https://www.volcengine.com/product/asr
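Review note: the README states that an empty `polishModel` disables polishing and that a failed polish step falls back to the raw ASR transcript. A minimal sketch of that contract follows, assuming the caller wraps the polish call in a try/catch. `polishOrFallback` and the `polish` callback are hypothetical names; the extension's actual call site is not shown in this diff.

```typescript
// Hypothetical callback standing in for the model call that refines the transcript.
type PolishFn = (rawText: string) => Promise<string>;

type PolishResult = { text: string; polished: boolean; error?: string };

async function polishOrFallback(rawText: string, polishModel: string, polish: PolishFn): Promise<PolishResult> {
  // An empty polishModel disables polishing entirely.
  if (!polishModel.trim()) return { text: rawText, polished: false };
  try {
    const result = (await polish(rawText)).trim();
    // An empty model reply is treated like a failure: keep the raw transcript.
    return result ? { text: result, polished: true } : { text: rawText, polished: false };
  } catch (error) {
    // Any error from the model call leaves the raw transcript untouched.
    return {
      text: rawText,
      polished: false,
      error: error instanceof Error ? error.message : String(error),
    };
  }
}
```

Returning the error message alongside the raw text lets the caller show a warning without losing the transcript.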
package/extensions/voice-input.ts CHANGED
@@ -1,4 +1,5 @@
1
1
  import type { ExtensionAPI, ExtensionContext } from "@earendil-works/pi-coding-agent";
2
+ import { completeSimple, type Api, type Model } from "@earendil-works/pi-ai";
2
3
  import { Key } from "@earendil-works/pi-tui";
3
4
  import { spawn, spawnSync } from "node:child_process";
4
5
  import { randomUUID } from "node:crypto";
@@ -15,15 +16,23 @@ import {
15
16
  } from "node:fs";
16
17
  import { homedir } from "node:os";
17
18
  import path from "node:path";
18
- import { fileURLToPath } from "node:url";
19
19
  import { gzipSync, gunzipSync } from "node:zlib";
20
20
  import WebSocket from "ws";
21
21
 
22
- const EXTENSION_DIR = path.dirname(fileURLToPath(import.meta.url));
23
- const PACKAGE_ROOT = path.resolve(EXTENSION_DIR, "..");
24
- const PRIVATE_CONFIG_PATH = path.join(homedir(), ".pi", "agent", "voice-input.env");
22
+ const CONFIG_PATH = path.join(homedir(), ".pi", "agent", "voice-input.config.json");
25
23
  const VOLC_API_KEY_URL = "https://console.volcengine.com/speech/new/setting/apikeys?projectName=default";
26
24
  const DEFAULT_SHORTCUT = Key.ctrlShift("r");
25
+ const DEFAULT_POSTPROCESS_MODEL = "deepseek/deepseek-v4-flash";
26
+ const POSTPROCESS_SYSTEM_PROMPT = `你是 pi 语音输入插件的语音识别后处理器。你的唯一任务是把原始 ASR 文本改写为可直接提交给编码智能体的用户指令。
27
+
28
+ 规则:
29
+ - 只输出优化后的用户指令正文,不要输出解释、标题、前后缀、引号、代码围栏或寒暄。
30
+ - 结合上下文理解省略指代、当前任务、文件/项目名称和用户意图;上下文仅用于理解,不要重复上下文内容,除非原始语音明确要求引用或修改它。
31
+ - 修正明显的语音识别错误、同音/近音错误、断句和标点错误;保留代码标识符、命令、路径、URL、模型名、包名和专有名词。
32
+ - 如果用户口误后自我更正(例如“不是……是……”“不对……”“算了改成……”),只保留更正后的正确指令,删除错误说法和更正过程。
33
+ - 让结果完整、符合逻辑、指令明确、有指导性;必要时拆成条目或步骤。
34
+ - 不要凭空添加原始语音没有表达的新需求;不确定时保留原意并用更清晰的措辞表达。
35
+ - 输出语言通常与原始语音一致。`;
27
36
 
28
37
  const MSG_TYPE_CLIENT_FULL_REQUEST = 0b0001;
29
38
  const MSG_TYPE_CLIENT_AUDIO_ONLY_REQUEST = 0b0010;
@@ -35,9 +44,15 @@ const SERIALIZATION_NONE = 0b0000;
35
44
  const SERIALIZATION_JSON = 0b0001;
36
45
  const COMPRESSION_GZIP = 0b0001;
37
46
 
38
- type EnvMap = Record<string, string>;
47
+ type JsonObject = Record<string, unknown>;
48
+
49
+ type VoiceInputConfigFile = {
50
+ volcApiKey: string;
51
+ polishModel: string;
52
+ };
39
53
 
40
54
  type VoiceConfig = {
55
+ configPath: string;
41
56
  apiKey: string;
42
57
  wsUrl: string;
43
58
  resourceId: string;
@@ -56,6 +71,11 @@ type VoiceConfig = {
56
71
  enablePunc: boolean;
57
72
  enableDdc: boolean;
58
73
  showUtterances: boolean;
74
+ postprocessEnabled: boolean;
75
+ postprocessModel: string;
76
+ postprocessTimeoutMs: number;
77
+ postprocessMaxTokens: number;
78
+ postprocessContextChars: number;
59
79
  };
60
80
 
61
81
  type RecordingState = {
@@ -64,6 +84,7 @@ type RecordingState = {
64
84
  logPath: string;
65
85
  startedAt: string;
66
86
  recorderTarget?: string;
87
+ deviceName?: string;
67
88
  };
68
89
 
69
90
  type DecodedFrame = {
@@ -85,139 +106,94 @@ type TranscriptionResult = {
85
106
  };
86
107
  };
87
108
 
88
- function parseEnvText(text: string): EnvMap {
89
- const env: EnvMap = {};
90
- for (const rawLine of text.split(/\r?\n/)) {
91
- const line = rawLine.trim();
92
- if (!line || line.startsWith("#")) continue;
93
- const match = line.match(/^([A-Za-z_][A-Za-z0-9_]*)\s*=\s*(.*)$/);
94
- if (!match) continue;
95
- const key = match[1];
96
- let value = match[2] ?? "";
97
- if ((value.startsWith('"') && value.endsWith('"')) || (value.startsWith("'") && value.endsWith("'"))) {
98
- value = value.slice(1, -1);
99
- }
100
- env[key] = value;
101
- }
102
- return env;
103
- }
104
-
105
- function loadEnvFiles(): EnvMap {
106
- const candidates = [
107
- PRIVATE_CONFIG_PATH,
108
- path.join(PACKAGE_ROOT, ".env"),
109
- path.join(process.cwd(), ".env"),
110
- ];
111
- const merged: EnvMap = {};
112
- for (const file of candidates) {
113
- if (!existsSync(file)) continue;
114
- Object.assign(merged, parseEnvText(readFileSync(file, "utf8")));
115
- }
116
- return merged;
117
- }
118
-
119
- function setting(env: EnvMap, name: string, fallback = ""): string {
120
- const value = process.env[name] ?? env[name];
121
- return value == null ? fallback : value;
109
+ function ensureDir(dir: string) {
110
+ mkdirSync(dir, { recursive: true });
122
111
  }
123
112
 
124
- function settingAny(env: EnvMap, names: string[], fallback = ""): string {
125
- for (const name of names) {
126
- const value = process.env[name] ?? env[name];
127
- if (value != null && value !== "") return value;
128
- }
129
- return fallback;
113
+ function defaultConfigFile(): VoiceInputConfigFile {
114
+ return {
115
+ volcApiKey: "",
116
+ polishModel: DEFAULT_POSTPROCESS_MODEL,
117
+ };
130
118
  }
131
119
 
132
- function boolSetting(env: EnvMap, name: string, fallback: boolean): boolean {
133
- const raw = setting(env, name, fallback ? "true" : "false").trim().toLowerCase();
134
- if (["1", "true", "yes", "on"].includes(raw)) return true;
135
- if (["0", "false", "no", "off"].includes(raw)) return false;
136
- return fallback;
120
+ function isObject(value: unknown): value is JsonObject {
121
+ return Boolean(value && typeof value === "object" && !Array.isArray(value));
137
122
  }
138
123
 
139
- function numberSetting(env: EnvMap, name: string, fallback: number): number {
140
- const raw = setting(env, name, String(fallback)).trim();
141
- const value = Number(raw);
142
- return Number.isFinite(value) ? value : fallback;
124
+ function stringField(source: JsonObject, name: string, fallback: string): string {
125
+ const value = source[name];
126
+ return typeof value === "string" ? value : fallback;
143
127
  }
144
128
 
145
- function clamp(value: number, min: number, max: number): number {
146
- return Math.min(max, Math.max(min, value));
129
+ function normalizeConfigFile(input: unknown): VoiceInputConfigFile {
130
+ const defaults = defaultConfigFile();
131
+ const root = isObject(input) ? input : {};
132
+ return {
133
+ volcApiKey: stringField(root, "volcApiKey", defaults.volcApiKey).trim(),
134
+ polishModel: stringField(root, "polishModel", defaults.polishModel).trim(),
135
+ };
147
136
  }
148
137
 
149
- function expandHome(value: string): string {
150
- if (value === "~") return homedir();
151
- if (value.startsWith("~/")) return path.join(homedir(), value.slice(2));
152
- return value;
138
+ function writeConfigFile(config: unknown) {
139
+ ensureDir(path.dirname(CONFIG_PATH));
140
+ writeFileSync(CONFIG_PATH, `${JSON.stringify(normalizeConfigFile(config), null, 2)}\n`, { mode: 0o600 });
141
+ chmodSync(CONFIG_PATH, 0o600);
153
142
  }
154
143
 
155
- function resolvePath(value: string, baseDir: string): string {
156
- const expanded = expandHome(value);
157
- return path.isAbsolute(expanded) ? expanded : path.resolve(baseDir, expanded);
144
+ function loadConfigFile(): VoiceInputConfigFile {
145
+ if (!existsSync(CONFIG_PATH)) return defaultConfigFile();
146
+ try {
147
+ return normalizeConfigFile(JSON.parse(readFileSync(CONFIG_PATH, "utf8")));
148
+ } catch (error) {
149
+ throw new Error(`Failed to read voice input config ${CONFIG_PATH}: ${error instanceof Error ? error.message : String(error)}`);
150
+ }
158
151
  }
159
152
 
160
153
  function getConfig(): VoiceConfig {
161
- const env = loadEnvFiles();
162
- const defaultHome = path.join(homedir(), ".pi", "agent", "voice-input");
163
- const voiceHome = resolvePath(setting(env, "VOICE_INPUT_HOME", defaultHome), process.cwd());
154
+ const fileConfig = loadConfigFile();
155
+ const voiceHome = path.join(homedir(), ".pi", "agent", "voice-input");
156
+ const polishModel = fileConfig.polishModel.trim();
164
157
 
165
158
  return {
166
- apiKey: settingAny(env, ["VOLC_API_KEY", "VOLCENGINE_API_KEY", "DOUBAO_ASR_API_KEY"]).trim(),
167
- wsUrl: setting(env, "VOLC_WS_URL", "wss://openspeech.bytedance.com/api/v3/sauc/bigmodel_nostream").trim(),
168
- resourceId: setting(env, "VOLC_STREAM_RESOURCE_ID", "volc.seedasr.sauc.duration").trim(),
169
- language: settingAny(env, ["ASR_LANGUAGE", "VOLC_ASR_LANGUAGE"], "").trim(),
170
- uid: setting(env, "ASR_UID", "pi-voice-input").trim(),
171
- prompt: setting(env, "ASR_PROMPT", "").trim(),
172
- segmentMs: clamp(Math.round(numberSetting(env, "STREAM_SEGMENT_MS", 5000)), 100, 20000),
173
- requestTimeoutMs: clamp(Math.round(numberSetting(env, "ASR_REQUEST_TIMEOUT_MS", 90000)), 1000, 10 * 60 * 1000),
174
- finalizeDelayMs: clamp(numberSetting(env, "RECORDING_FINALIZE_DELAY", 0.1) * 1000, 0, 5000),
175
- recorderTarget: setting(env, "RECORDER_TARGET", "").trim(),
176
- recordingsDir: resolvePath(setting(env, "RECORDINGS_DIR", "recordings"), voiceHome),
177
- statePath: resolvePath(setting(env, "RECORDER_STATE", "recording.json"), voiceHome),
178
- logDir: resolvePath(setting(env, "RECORDER_LOG_DIR", "logs"), voiceHome),
179
- shortcut: setting(env, "VOICE_INPUT_SHORTCUT", DEFAULT_SHORTCUT).trim() || DEFAULT_SHORTCUT,
180
- enableItn: boolSetting(env, "ENABLE_ITN", true),
181
- enablePunc: boolSetting(env, "ENABLE_PUNC", true),
182
- enableDdc: boolSetting(env, "ENABLE_DDC", false),
183
- showUtterances: boolSetting(env, "SHOW_UTTERANCES", false),
159
+ configPath: CONFIG_PATH,
160
+ apiKey: fileConfig.volcApiKey.trim(),
161
+ wsUrl: "wss://openspeech.bytedance.com/api/v3/sauc/bigmodel_nostream",
162
+ resourceId: "volc.seedasr.sauc.duration",
163
+ language: "",
164
+ uid: "pi-voice-input",
165
+ prompt: "",
166
+ segmentMs: 5000,
167
+ requestTimeoutMs: 90000,
168
+ finalizeDelayMs: 100,
169
+ recorderTarget: "",
170
+ recordingsDir: path.join(voiceHome, "recordings"),
171
+ statePath: path.join(voiceHome, "recording.json"),
172
+ logDir: path.join(voiceHome, "logs"),
173
+ shortcut: DEFAULT_SHORTCUT,
174
+ enableItn: true,
175
+ enablePunc: true,
176
+ enableDdc: false,
177
+ showUtterances: false,
178
+ postprocessEnabled: polishModel.length > 0,
179
+ postprocessModel: polishModel,
180
+ postprocessTimeoutMs: 30000,
181
+ postprocessMaxTokens: 2048,
182
+ postprocessContextChars: 6000,
184
183
  };
185
184
  }
186
185
 
187
- function ensureDir(dir: string) {
188
- mkdirSync(dir, { recursive: true });
189
- }
190
-
191
- function envValue(value: string): string {
192
- if (/^[A-Za-z0-9_./:@+-]*$/.test(value)) return value;
193
- return JSON.stringify(value);
186
+ function ensureConfigFile(): boolean {
187
+ const existed = existsSync(CONFIG_PATH);
188
+ writeConfigFile(loadConfigFile());
189
+ return !existed;
194
190
  }
195
191
 
196
- function writePrivateEnvValue(name: string, value: string) {
197
- if (/\r|\n/.test(value)) throw new Error(`${name} must be a single-line value`);
198
- ensureDir(path.dirname(PRIVATE_CONFIG_PATH));
199
-
200
- const original = existsSync(PRIVATE_CONFIG_PATH) ? readFileSync(PRIVATE_CONFIG_PATH, "utf8") : "";
201
- const lines = original ? original.split(/\r?\n/) : [];
202
- const replacement = `${name}=${envValue(value)}`;
203
- let replaced = false;
204
-
205
- const nextLines = lines.map((line) => {
206
- if (new RegExp(`^\\s*${name}\\s*=`).test(line)) {
207
- replaced = true;
208
- return replacement;
209
- }
210
- return line;
211
- });
212
-
213
- if (!replaced) {
214
- if (nextLines.length > 0 && nextLines[nextLines.length - 1] !== "") nextLines.push("");
215
- nextLines.push("# Managed by pi-voice-input. You can also update this with /voice key.");
216
- nextLines.push(replacement);
217
- }
218
-
219
- writeFileSync(PRIVATE_CONFIG_PATH, nextLines.join("\n").replace(/\n*$/, "\n"), { mode: 0o600 });
220
- chmodSync(PRIVATE_CONFIG_PATH, 0o600);
192
+ function writeConfigApiKey(apiKey: string) {
193
+ if (/\r|\n/.test(apiKey)) throw new Error("volcApiKey must be a single-line value");
194
+ const config = loadConfigFile();
195
+ config.volcApiKey = apiKey.trim();
196
+ writeConfigFile(config);
221
197
  }
222
198
 
223
199
  function timestampForFilename(): string {
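Review note: a few sample inputs show how `normalizeConfigFile` in the hunk above treats malformed config. The snippet copies the helpers into a standalone form (without the file I/O) so it runs on its own; the sample values are illustrative only.

```typescript
type JsonObject = Record<string, unknown>;
type VoiceInputConfigFile = { volcApiKey: string; polishModel: string };

const DEFAULT_POSTPROCESS_MODEL = "deepseek/deepseek-v4-flash";

function isObject(value: unknown): value is JsonObject {
  return Boolean(value && typeof value === "object" && !Array.isArray(value));
}

function stringField(source: JsonObject, name: string, fallback: string): string {
  const value = source[name];
  return typeof value === "string" ? value : fallback;
}

function normalizeConfigFile(input: unknown): VoiceInputConfigFile {
  const root = isObject(input) ? input : {};
  return {
    volcApiKey: stringField(root, "volcApiKey", "").trim(),
    polishModel: stringField(root, "polishModel", DEFAULT_POSTPROCESS_MODEL).trim(),
  };
}

// Non-object input (null, arrays, strings) falls back to the defaults.
const fromGarbage = normalizeConfigFile(["not", "an", "object"]);
// Wrong-typed fields are replaced by defaults; unknown keys are dropped.
const fromPartial = normalizeConfigFile({ volcApiKey: 42, extra: true });
// A whitespace-only polishModel normalizes to "", which disables polishing.
const polishOff = normalizeConfigFile({ volcApiKey: " key ", polishModel: "  " });
```

Note the asymmetry: a missing `polishModel` key yields the default model (polish enabled), while an explicit empty string turns polishing off.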
@@ -228,6 +204,12 @@ function commandExists(command: string): boolean {
228
204
  return spawnSync("sh", ["-lc", `command -v ${command}`], { stdio: "ignore" }).status === 0;
229
205
  }
230
206
 
207
+ function commandOutput(command: string, args: string[], timeoutMs = 1500): string {
208
+ const result = spawnSync(command, args, { encoding: "utf8", timeout: timeoutMs });
209
+ if (result.status !== 0) return "";
210
+ return (result.stdout || "").trim();
211
+ }
212
+
231
213
  function recorderCommand(config: VoiceConfig, outputPath: string): string[] {
232
214
  if (commandExists("pw-record")) {
233
215
  const cmd = ["pw-record", "--rate", "16000", "--channels", "1", "--format", "s16"];
@@ -241,6 +223,98 @@ function recorderCommand(config: VoiceConfig, outputPath: string): string[] {
241
223
  throw new Error("No recorder found. Install PipeWire tools (pw-record) or alsa-utils (arecord).");
242
224
  }
243
225
 
226
+ type PipeWireSource = {
227
+ id: string;
228
+ name: string;
229
+ description: string;
230
+ };
231
+
232
+ function parsePactlSources(text: string): PipeWireSource[] {
233
+ const sources: PipeWireSource[] = [];
234
+ let current: PipeWireSource | null = null;
235
+ for (const line of text.split(/\r?\n/)) {
236
+ const sourceMatch = line.match(/^Source #(\S+)/);
237
+ if (sourceMatch) {
238
+ if (current) sources.push(current);
239
+ current = { id: sourceMatch[1], name: "", description: "" };
240
+ continue;
241
+ }
242
+ if (!current) continue;
243
+ const nameMatch = line.match(/^\s*Name:\s*(.+)$/);
244
+ if (nameMatch) {
245
+ current.name = nameMatch[1].trim();
246
+ continue;
247
+ }
248
+ const descriptionMatch = line.match(/^\s*Description:\s*(.+)$/);
249
+ if (descriptionMatch) current.description = descriptionMatch[1].trim();
250
+ }
251
+ if (current) sources.push(current);
252
+ return sources;
253
+ }
254
+
255
+ function wpctlProperty(text: string, property: string): string {
256
+ const escaped = property.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
257
+ const match = text.match(new RegExp(`(?:^|\\n)\\s*\\*?\\s*${escaped}\\s*=\\s*"([^"]+)"`));
258
+ return match?.[1]?.trim() ?? "";
259
+ }
260
+
261
+ function inspectPipeWireSource(target: string): string {
262
+ if (!commandExists("wpctl")) return "";
263
+ const inspect = commandOutput("wpctl", ["inspect", target]);
264
+ return (
265
+ wpctlProperty(inspect, "node.description") ||
266
+ wpctlProperty(inspect, "node.nick") ||
267
+ wpctlProperty(inspect, "node.name")
268
+ );
269
+ }
270
+
271
+ function defaultPipeWireSourceFromStatus(): string {
272
+ if (!commandExists("wpctl")) return "";
273
+ const status = commandOutput("wpctl", ["status"]);
274
+ let inSources = false;
275
+ for (const line of status.split(/\r?\n/)) {
276
+ if (/Sources:/.test(line)) {
277
+ inSources = true;
278
+ continue;
279
+ }
280
+ if (inSources && /^\s*[├└]─/.test(line)) break;
281
+ if (!inSources) continue;
282
+ const match = line.match(/^\s*│\s+\*\s+\d+\.\s+(.+?)(?:\s+\[|$)/);
283
+ if (match) return match[1].trim();
284
+ }
285
+ return "";
286
+ }
287
+
288
+ function pipeWireSourceName(target: string): string {
289
+ const sources = commandExists("pactl") ? parsePactlSources(commandOutput("pactl", ["list", "sources"])) : [];
290
+
291
+ if (!target) {
292
+ const defaultName = commandExists("pactl") ? commandOutput("pactl", ["get-default-source"]) : "";
293
+ const source = sources.find((item) => item.name === defaultName);
294
+ return (
295
+ source?.description ||
296
+ source?.name ||
297
+ inspectPipeWireSource("@DEFAULT_SOURCE@") ||
298
+ defaultPipeWireSourceFromStatus() ||
299
+ defaultName ||
300
+ "default microphone"
301
+ );
302
+ }
303
+
304
+ const source = sources.find((item) => item.id === target || item.name === target || item.description === target);
305
+ return source?.description || source?.name || (/^\d+$/.test(target) ? inspectPipeWireSource(target) : "") || target;
306
+ }
307
+
308
+ function recordingDeviceName(config: VoiceConfig, recorderExecutable: string): string {
309
+ if (recorderExecutable === "pw-record") return pipeWireSourceName(config.recorderTarget);
310
+ if (recorderExecutable === "arecord") return "ALSA default microphone";
311
+ return config.recorderTarget || "default microphone";
312
+ }
313
+
314
+ function recordingStatusText(deviceName: string): string {
315
+ return `Recording with ${deviceName || "default microphone"}`;
316
+ }
317
+
244
318
  function readState(config: VoiceConfig): RecordingState | null {
245
319
  if (!existsSync(config.statePath)) return null;
246
320
  return JSON.parse(readFileSync(config.statePath, "utf8")) as RecordingState;
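Review note: the device-name lookup depends on how `parsePactlSources` splits `pactl list sources` output into records. The standalone copy below runs the parser against a shortened, made-up sample. Real `pactl` output carries many more fields per source, and the device names here are placeholders.

```typescript
type PipeWireSource = { id: string; name: string; description: string };

// Copy of the parser from the hunk above: one record per "Source #N" header.
function parsePactlSources(text: string): PipeWireSource[] {
  const sources: PipeWireSource[] = [];
  let current: PipeWireSource | null = null;
  for (const line of text.split(/\r?\n/)) {
    const sourceMatch = line.match(/^Source #(\S+)/);
    if (sourceMatch) {
      if (current) sources.push(current);
      current = { id: sourceMatch[1], name: "", description: "" };
      continue;
    }
    if (!current) continue;
    const nameMatch = line.match(/^\s*Name:\s*(.+)$/);
    if (nameMatch) {
      current.name = nameMatch[1].trim();
      continue;
    }
    const descriptionMatch = line.match(/^\s*Description:\s*(.+)$/);
    if (descriptionMatch) current.description = descriptionMatch[1].trim();
  }
  if (current) sources.push(current);
  return sources;
}

// Illustrative sample only; the names and IDs are invented.
const samplePactlOutput = [
  "Source #52",
  "\tState: SUSPENDED",
  "\tName: alsa_input.usb-example.mono-fallback",
  "\tDescription: Example USB Microphone",
  "",
  "Source #53",
  "\tState: RUNNING",
  "\tName: alsa_output.pci-example.analog-stereo.monitor",
  "\tDescription: Monitor of Built-in Audio",
].join("\n");

const parsedSources = parsePactlSources(samplePactlOutput);
```

With this input, `parsedSources` holds two records, and `pipeWireSourceName` would prefer each record's `description` over its `name` for display.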
@@ -457,8 +531,9 @@ function parseRecordedWav(filePath: string): { pcm: Buffer; durationMs: number }
457
531
 
458
532
  function missingCredentialsMessage(): string {
459
533
  return [
460
- "Missing VOLC_API_KEY for the current VolcEngine ASR provider.",
534
+ "Missing VolcEngine API key in the pi voice input config.",
461
535
  "Run /voice key and paste your VolcEngine Speech API key.",
536
+ `Config file: ${CONFIG_PATH}`,
462
537
  `Get/create the key here: ${VOLC_API_KEY_URL}`,
463
538
  "Run /voice config to verify whether the key is detected.",
464
539
  ].join("\n");
@@ -611,6 +686,196 @@ async function transcribePcm(pcm: Buffer, durationMs: number, config: VoiceConfi
611
686
  };
612
687
  }
613
688
 
689
+ function tailText(text: string, maxChars: number): string {
690
+ if (maxChars <= 0) return "";
691
+ if (text.length <= maxChars) return text;
692
+ return `…${text.slice(-maxChars)}`;
693
+ }
694
+
695
+ function truncateText(text: string, maxChars: number): string {
696
+ if (maxChars <= 0) return "";
697
+ if (text.length <= maxChars) return text;
698
+ return `${text.slice(0, maxChars)}…`;
699
+ }
700
+
701
+ function textFromContent(content: unknown): string {
702
+ if (typeof content === "string") return content;
703
+ if (!Array.isArray(content)) return "";
704
+ return content
705
+ .map((part) => {
706
+ if (!part || typeof part !== "object") return "";
707
+ const block = part as { type?: unknown; text?: unknown };
708
+ if (block.type === "text" && typeof block.text === "string") return block.text;
709
+ return "";
710
+ })
711
+ .filter(Boolean)
712
+ .join("\n");
713
+ }
714
+
715
+ function getEditorContext(ctx: ExtensionContext, maxChars: number): string {
716
+ if (maxChars <= 0) return "";
717
+ try {
718
+ return tailText(ctx.ui.getEditorText().trim(), maxChars);
719
+ } catch {
720
+ return "";
721
+ }
722
+ }
723
+
724
+ function getRecentSessionContext(ctx: ExtensionContext, maxChars: number): string {
725
+ if (maxChars <= 0) return "";
726
+ const lines: string[] = [];
727
+ for (const entry of ctx.sessionManager.getBranch()) {
728
+ if (entry.type !== "message") continue;
729
+ const message = entry.message as { role?: unknown; content?: unknown };
730
+ if (message.role !== "user" && message.role !== "assistant") continue;
731
+ const text = textFromContent(message.content).replace(/\s+/g, " ").trim();
732
+ if (!text) continue;
733
+ lines.push(`${message.role}: ${truncateText(text, 1200)}`);
734
+ }
735
+ return tailText(lines.slice(-8).join("\n"), maxChars);
736
+ }
737
+
738
+ function simplifyModelReference(value: string): string {
739
+ return value.toLowerCase().replace(/[^a-z0-9]+/g, "");
740
+ }
741
+
742
+ function stripThinkingSuffix(value: string): string {
743
+ return value.replace(/:(?:off|minimal|low|medium|high|xhigh)$/i, "");
744
+ }
745
+
746
+ function modelLabel(model: Model<Api>): string {
747
+ return `${model.provider}/${model.id}`;
748
+ }
749
+
750
+ function resolvePostprocessModel(ctx: ExtensionContext, reference: string): Model<Api> {
751
+ const requested = stripThinkingSuffix(reference.trim());
752
+ if (!requested) throw new Error("polishModel is empty in voice input config");
753
+
754
+ const models = ctx.modelRegistry.getAll();
755
+ const lower = requested.toLowerCase();
756
+ const simple = simplifyModelReference(requested);
757
+
758
+ const exactCanonical = models.filter((model) => modelLabel(model).toLowerCase() === lower);
759
+ if (exactCanonical.length === 1) return exactCanonical[0];
760
+
761
+ const exactBare = models.filter((model) => model.id.toLowerCase() === lower || model.name.toLowerCase() === lower);
762
+ if (exactBare.length === 1) return exactBare[0];
763
+ if (exactBare.length > 1) {
764
+ throw new Error(
765
+ `Ambiguous postprocess model "${reference}". Use provider/model, e.g. ${exactBare.map(modelLabel).slice(0, 5).join(", ")}`,
766
+ );
767
+ }
768
+
769
+ const exactSimple = models.filter(
770
+ (model) =>
771
+ simplifyModelReference(modelLabel(model)) === simple ||
772
+ simplifyModelReference(model.id) === simple ||
773
+ simplifyModelReference(model.name) === simple,
774
+ );
775
+ if (exactSimple.length === 1) return exactSimple[0];
776
+ if (exactSimple.length > 1) {
777
+ throw new Error(
778
+ `Ambiguous postprocess model "${reference}". Use provider/model, e.g. ${exactSimple.map(modelLabel).slice(0, 5).join(", ")}`,
779
+ );
780
+ }
781
+
782
+ const fuzzy = models.filter(
783
+ (model) =>
784
+ modelLabel(model).toLowerCase().includes(lower) ||
785
+ model.id.toLowerCase().includes(lower) ||
786
+ model.name.toLowerCase().includes(lower) ||
787
+ simplifyModelReference(modelLabel(model)).includes(simple) ||
788
+ simplifyModelReference(model.id).includes(simple) ||
789
+ simplifyModelReference(model.name).includes(simple),
790
+ );
791
+ if (fuzzy.length === 1) return fuzzy[0];
792
+ if (fuzzy.length > 1) {
793
+ throw new Error(
794
+ `Ambiguous postprocess model "${reference}". Use provider/model, e.g. ${fuzzy.map(modelLabel).slice(0, 5).join(", ")}`,
795
+ );
796
+ }
797
+
798
+ throw new Error(`Postprocess model "${reference}" not found. Run pi --list-models to see available models.`);
+ }
+
+ function extractAssistantText(message: { content: unknown }): string {
+ return textFromContent(message.content).trim();
+ }
+
+ function cleanPostprocessOutput(output: string): string {
+ let text = output.trim();
+ const fence = text.match(/^```[a-zA-Z0-9_-]*\s*\n([\s\S]*?)\n```$/);
+ if (fence) text = fence[1].trim();
+ text = text.replace(/^(?:优化后的(?:用户)?指令|整理后的(?:用户)?指令|改写后的(?:用户)?指令)\s*[::]\s*/u, "").trim();
+ return text;
+ }
+
+ function buildPostprocessPrompt(ctx: ExtensionContext, rawText: string, config: VoiceConfig): string {
+ const contextBudget = config.postprocessContextChars;
+ const editorContext = getEditorContext(ctx, Math.floor(contextBudget / 2));
+ const sessionContext = getRecentSessionContext(ctx, Math.ceil(contextBudget / 2));
+
+ return [
+ "请根据上下文优化下面的原始语音识别结果。",
+ "如果上下文为空,直接依据原始文本优化。",
+ "不要重复上下文本身;只输出原始语音对应的最终用户指令。",
+ "",
+ "--- 上下文:当前编辑器已有内容 ---",
+ editorContext || "(空)",
+ "",
+ "--- 上下文:最近会话 ---",
+ sessionContext || "(空)",
+ "",
+ "--- 原始语音识别结果 ---",
+ rawText.trim(),
+ ].join("\n");
+ }
+
+ async function postprocessTranscript(ctx: ExtensionContext, rawText: string, config: VoiceConfig): Promise<string> {
+ if (!config.postprocessEnabled) return rawText;
+
+ const raw = rawText.trim();
+ if (!raw) return rawText;
+
+ const model = resolvePostprocessModel(ctx, config.postprocessModel);
+ const auth = await ctx.modelRegistry.getApiKeyAndHeaders(model);
+ if (!auth.ok) {
+ throw new Error(`Postprocess model ${modelLabel(model)} is not ready: ${auth.error}`);
+ }
+
+ const response = await completeSimple(
+ model,
+ {
+ systemPrompt: POSTPROCESS_SYSTEM_PROMPT,
+ messages: [
+ {
+ role: "user",
+ content: buildPostprocessPrompt(ctx, raw, config),
+ timestamp: Date.now(),
+ },
+ ],
+ tools: [],
+ },
+ {
+ apiKey: auth.apiKey,
+ headers: auth.headers,
+ temperature: 0,
+ maxTokens: config.postprocessMaxTokens,
+ timeoutMs: config.postprocessTimeoutMs,
+ maxRetries: 0,
+ cacheRetention: "none",
+ signal: ctx.signal,
+ },
+ );
+
+ if (response.stopReason === "error" || response.stopReason === "aborted") {
+ throw new Error(response.errorMessage || `Postprocess model stopped with ${response.stopReason}`);
+ }
+
+ const polished = cleanPostprocessOutput(extractAssistantText(response));
+ return polished || rawText;
+ }
+
  function appendToEditor(ctx: ExtensionContext, text: string) {
  const trimmed = text.trim();
  if (!trimmed) return;
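
Note on the cleanup step added above: `cleanPostprocessOutput` unwraps a reply that is entirely one fenced block and then drops a leading label (roughly "optimized / organized / rewritten (user) instruction" followed by a colon). The following is a minimal standalone sketch of that behaviour, not the package's code. The name `cleanModelOutput` is hypothetical, the fence marker is built with `repeat` only to keep this example readable, and accepting both the ASCII and the full-width colon is an assumption.

```typescript
// Hypothetical standalone sketch of the cleanup step; names are illustrative.
const FENCE = "`".repeat(3);
const FENCE_RE = new RegExp(
  "^" + FENCE + "[a-zA-Z0-9_-]*\\s*\\n([\\s\\S]*?)\\n" + FENCE + "$",
);
// Assumption: the label may end with either an ASCII or a full-width colon.
const LABEL_RE =
  /^(?:优化后的(?:用户)?指令|整理后的(?:用户)?指令|改写后的(?:用户)?指令)\s*[::]\s*/u;

function cleanModelOutput(output: string): string {
  let text = output.trim();
  // Unwrap only when the whole reply is a single fenced block.
  const inner = text.match(FENCE_RE)?.[1];
  if (inner !== undefined) text = inner.trim();
  // Drop a leading label such as "优化后的指令:".
  return text.replace(LABEL_RE, "").trim();
}
```

In the diff, an empty result after cleanup falls back to the raw transcript (`polished || rawText`), so an over-eager cleanup cannot erase the user's input.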
@@ -628,8 +893,9 @@ async function startRecording(ctx: ExtensionContext) {
  const config = getConfig();
  const existing = readState(config);
  if (existing && pidAlive(existing.pid)) {
- ctx.ui.notify(`Already recording: pid=${existing.pid}`, "warning");
- ctx.ui.setStatus("voice-input", ctx.ui.theme.fg("error", "● recording"));
+ const deviceName = existing.deviceName || recordingDeviceName(config, commandExists("pw-record") ? "pw-record" : "arecord");
+ ctx.ui.notify(`Already recording: pid=${existing.pid}. ${recordingStatusText(deviceName)}`, "warning");
+ ctx.ui.setStatus("voice-input", ctx.ui.theme.fg("error", recordingStatusText(deviceName)));
  return;
  }
  if (existing) clearState(config);
@@ -639,6 +905,7 @@ async function startRecording(ctx: ExtensionContext) {
  const outputPath = path.join(config.recordingsDir, `recording-${timestampForFilename()}.wav`);
  const logPath = path.join(config.logDir, `recording-${timestampForFilename()}.log`);
  const cmd = recorderCommand(config, outputPath);
+ const deviceName = recordingDeviceName(config, cmd[0]);

  ctx.ui.setStatus("voice-input", ctx.ui.theme.fg("warning", "● starting mic"));
  const logFd = openSync(logPath, "a");
@@ -656,10 +923,11 @@ async function startRecording(ctx: ExtensionContext) {
  logPath,
  startedAt: new Date().toISOString(),
  recorderTarget: config.recorderTarget || undefined,
+ deviceName,
  });

- ctx.ui.setStatus("voice-input", ctx.ui.theme.fg("error", "● recording"));
- ctx.ui.notify("Voice recording started. Press Ctrl+Shift+R again to stop/transcribe.", "info");
+ ctx.ui.setStatus("voice-input", ctx.ui.theme.fg("error", recordingStatusText(deviceName)));
+ ctx.ui.notify(`${recordingStatusText(deviceName)}. Press Ctrl+Shift+R again to stop/transcribe.`, "info");
  }

  async function stopRecording(ctx: ExtensionContext, transcribe = true) {
@@ -691,9 +959,9 @@ async function stopRecording(ctx: ExtensionContext, transcribe = true) {
  const { pcm, durationMs } = parseRecordedWav(state.path);
  const decodeMs = Date.now() - decodeStart;
  const result = await transcribePcm(pcm, durationMs, config);
- ctx.ui.setStatus("voice-input", undefined);

  if (!result.text.trim()) {
+ ctx.ui.setStatus("voice-input", undefined);
  ctx.ui.notify(
  `Transcription finished but no text was returned. audio=${(durationMs / 1000).toFixed(2)}s total=${result.timings.totalMs}ms`,
  "warning",
@@ -701,9 +969,31 @@ async function stopRecording(ctx: ExtensionContext, transcribe = true) {
  return;
  }

- appendToEditor(ctx, result.text);
+ let finalText = result.text;
+ let postprocessMs = 0;
+ let postprocessUsed = false;
+ if (config.postprocessEnabled) {
+ ctx.ui.setStatus("voice-input", ctx.ui.theme.fg("warning", "● polishing"));
+ const postprocessStart = Date.now();
+ try {
+ finalText = await postprocessTranscript(ctx, result.text, config);
+ postprocessMs = Date.now() - postprocessStart;
+ postprocessUsed = finalText.trim() !== result.text.trim();
+ } catch (error) {
+ postprocessMs = Date.now() - postprocessStart;
+ ctx.ui.notify(
+ `Voice postprocess failed; inserting raw transcript. ${error instanceof Error ? error.message : String(error)}`,
+ "warning",
+ );
+ }
+ }
+
+ ctx.ui.setStatus("voice-input", undefined);
+ appendToEditor(ctx, finalText);
  ctx.ui.notify(
- `Voice text inserted. audio=${(durationMs / 1000).toFixed(2)}s decode=${decodeMs}ms asr=${result.timings.totalMs}ms packets=${result.packets}`,
+ `Voice text inserted. audio=${(durationMs / 1000).toFixed(2)}s decode=${decodeMs}ms asr=${result.timings.totalMs}ms${
+ config.postprocessEnabled ? ` postprocess=${postprocessMs}ms${postprocessUsed ? " polished" : ""}` : ""
+ } packets=${result.packets}`,
  "info",
  );
  }
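
Note on the hunk above: the polish step is best-effort. If `postprocessTranscript` throws, the raw transcript is still inserted and the elapsed time is still reported. The same pattern can be sketched in isolation as below. This is an illustrative sketch, not the package's code; `polishOrFallback` and `PolishOutcome` are hypothetical names, and the `polish` callback stands in for the real model call.

```typescript
// Hypothetical helper illustrating "polish, else fall back to raw".
interface PolishOutcome {
  text: string; // text to insert into the editor
  elapsedMs: number; // time spent in the polish attempt
  changed: boolean; // true only if polishing altered the transcript
  error?: string; // set when the polish step failed
}

async function polishOrFallback(
  rawText: string,
  polish: (raw: string) => Promise<string>,
): Promise<PolishOutcome> {
  const started = Date.now();
  try {
    const polished = await polish(rawText);
    return {
      text: polished,
      elapsedMs: Date.now() - started,
      changed: polished.trim() !== rawText.trim(),
    };
  } catch (error) {
    // Failure is non-fatal: keep the raw transcript and report the reason.
    return {
      text: rawText,
      elapsedMs: Date.now() - started,
      changed: false,
      error: error instanceof Error ? error.message : String(error),
    };
  }
}
```

Returning the error as data rather than rethrowing mirrors the diff's choice to show a warning and continue, so a model timeout never costs the user their dictated text.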
@@ -722,8 +1012,11 @@ function setupHelp(config = getConfig()): string {
  return [
  "pi Voice Input setup:",
  "- Current provider: VolcEngine WebSocket ASR",
+ `- Config file: ${config.configPath}`,
  `- API key: ${config.apiKey ? "set" : "missing"}`,
+ "- To create/update the JSON config file, run: /voice init",
  "- To save/update the key, run: /voice key",
+ `- Polish: ${config.postprocessEnabled ? config.postprocessModel : "disabled"}`,
  `- Get/create a VolcEngine Speech API key here: ${VOLC_API_KEY_URL}`,
  "- After saving the key, run: /voice config",
  ].join("\n");
@@ -734,12 +1027,12 @@ async function configureApiKey(ctx: ExtensionContext, providedKey = "") {

  if (!apiKey) {
  if (!ctx.hasUI) {
- ctx.ui.notify(`Run /voice key in interactive pi, or get a key from ${VOLC_API_KEY_URL} and set VOLC_API_KEY.`, "error");
+ ctx.ui.notify(`Run /voice key in interactive pi, or edit ${CONFIG_PATH}. Get a key from ${VOLC_API_KEY_URL}.`, "error");
  return;
  }
  ctx.ui.notify(`Get/create a VolcEngine Speech API key here:\n${VOLC_API_KEY_URL}`, "info");
  const current = getConfig().apiKey;
- const placeholder = current ? "Paste a new VolcEngine API key (current key is already set)" : "Paste VOLC_API_KEY";
+ const placeholder = current ? "Paste a new VolcEngine API key (current key is already set)" : "Paste VolcEngine API key";
  apiKey = (await ctx.ui.input("VolcEngine API key", placeholder))?.trim() ?? "";
  }

@@ -748,25 +1041,21 @@ async function configureApiKey(ctx: ExtensionContext, providedKey = "") {
  return;
  }

- writePrivateEnvValue("VOLC_API_KEY", apiKey);
- ctx.ui.notify("VolcEngine API key saved for pi voice input. Run /voice config to verify it is detected.", "info");
+ writeConfigApiKey(apiKey);
+ ctx.ui.notify(`VolcEngine API key saved in ${CONFIG_PATH}. Run /voice config to verify it is detected.`, "info");
  }

  function configSummary(config: VoiceConfig): string {
+ const recorderExecutable = commandExists("pw-record") ? "pw-record" : commandExists("arecord") ? "arecord" : "";
+ const currentDevice = recorderExecutable ? recordingDeviceName(config, recorderExecutable) : "no recorder found";
  return [
  "Voice input config:",
- `- api key: ${config.apiKey ? "set" : "missing"} (update with /voice key)`,
- `- ws url: ${config.wsUrl}`,
- `- resource id: ${config.resourceId}`,
- `- language: ${config.language || "auto"}`,
- `- recorder target: ${config.recorderTarget || "PipeWire/default"}`,
- `- segment: ${config.segmentMs}ms`,
- `- recordings: ${config.recordingsDir}`,
- `- state: ${config.statePath}`,
- `- shortcut: ${config.shortcut}`,
- "Run /voice key to save/update the current provider API key.",
+ `- config file: ${config.configPath}${existsSync(config.configPath) ? "" : " (missing; run /voice init to create it)"}`,
+ `- volcApiKey: ${config.apiKey ? "set" : "missing"} (update with /voice key)`,
+ `- polishModel: ${config.postprocessEnabled ? config.postprocessModel : "disabled"}`,
+ `- current recording device: ${currentDevice}`,
+ "Config keys: volcApiKey, polishModel. Leave polishModel empty to disable polish.",
  `VolcEngine API key URL: ${VOLC_API_KEY_URL}`,
- "Config files checked: ~/.pi/agent/voice-input.env, package .env, current .env; shell env overrides them.",
  ].join("\n");
  }

@@ -786,7 +1075,7 @@ export default function (pi: ExtensionAPI) {
  });

  pi.registerCommand("voice", {
- description: "Voice input: start | stop | status | toggle | cancel | config | key | help",
+ description: "Voice input: start | stop | status | toggle | cancel | config | init | key | help",
  handler: async (args, ctx) => {
  const input = (args || "toggle").trim();
  const action = (input.split(/\s+/, 1)[0] || "toggle").toLowerCase();
@@ -814,6 +1103,11 @@ export default function (pi: ExtensionAPI) {
  ctx.ui.notify(configSummary(getConfig()), "info");
  return;
  }
+ if (action === "init") {
+ const created = ensureConfigFile();
+ ctx.ui.notify(`${created ? "Created" : "Updated"} voice input config: ${CONFIG_PATH}`, "info");
+ return;
+ }
  if (["key", "api-key", "apikey", "setup", "configure"].includes(action)) {
  await configureApiKey(ctx, rest);
  return;
@@ -826,7 +1120,7 @@ export default function (pi: ExtensionAPI) {
  await toggleRecording(ctx);
  return;
  }
- ctx.ui.notify("Usage: /voice start | stop | status | toggle | cancel | config | key | help", "error");
+ ctx.ui.notify("Usage: /voice start | stop | status | toggle | cancel | config | init | key | help", "error");
  } catch (error) {
  ctx.ui.setStatus("voice-input", undefined);
  ctx.ui.notify(`Voice command error: ${error instanceof Error ? error.message : String(error)}`, "error");
@@ -842,7 +1136,8 @@ export default function (pi: ExtensionAPI) {
  ctx.ui.notify(
  [
  `Voice input loaded: ${startupConfig.shortcut} toggles recording.`,
- "API key is missing. Run /voice key to set it up.",
+ "API key is missing. Run /voice key to set it up, or edit the JSON config file.",
+ `Config file: ${startupConfig.configPath}`,
  `Get/create a VolcEngine Speech API key here: ${VOLC_API_KEY_URL}`,
  ].join("\n"),
  "warning",
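
Note on the JSON config introduced in this file: the summary text names two keys, `volcApiKey` and `polishModel`, and says an empty `polishModel` disables polish. A reader for such a file could look roughly like the sketch below. This is an assumption-laden illustration, not the package's loader: `parseVoiceConfig` and `ParsedVoiceConfig` are hypothetical names, and treating malformed JSON as an empty config is a choice made here, not something the diff confirms.

```typescript
// Hypothetical parser for the two documented keys: volcApiKey and polishModel.
interface ParsedVoiceConfig {
  volcApiKey: string;
  polishModel: string;
  polishEnabled: boolean; // derived: an empty polishModel disables polish
}

function parseVoiceConfig(jsonText: string): ParsedVoiceConfig {
  let parsed: unknown = {};
  try {
    parsed = JSON.parse(jsonText);
  } catch {
    // Assumption: malformed JSON is treated like an empty config.
  }
  const record: Record<string, unknown> =
    typeof parsed === "object" && parsed !== null
      ? (parsed as Record<string, unknown>)
      : {};
  // Non-string values are ignored rather than coerced.
  const asString = (value: unknown): string =>
    typeof value === "string" ? value.trim() : "";
  const volcApiKey = asString(record["volcApiKey"]);
  const polishModel = asString(record["polishModel"]);
  return { volcApiKey, polishModel, polishEnabled: polishModel !== "" };
}
```

Deriving `polishEnabled` from the model string keeps the file to two keys, which matches the "leave polishModel empty to disable polish" wording in `configSummary`.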
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "pi-voice-input",
- "version": "0.1.2",
+ "version": "0.2.0",
  "description": "provider-extensible voice input extension for pi",
  "type": "module",
  "keywords": [
@@ -12,9 +12,10 @@
  "asr"
  ],
  "license": "MIT",
+ "author": "tr-nc",
  "repository": {
  "type": "git",
- "url": "git+ssh://git@github.com/tr-nc/pi-voice-input.git"
+ "url": "git+https://github.com/tr-nc/pi-voice-input.git"
  },
  "bugs": {
  "url": "https://github.com/tr-nc/pi-voice-input/issues"
@@ -22,7 +23,6 @@
  "homepage": "https://github.com/tr-nc/pi-voice-input#readme",
  "files": [
  "extensions",
- ".env.example",
  "README.md",
  "AGENTS.md"
  ],
@@ -35,6 +35,7 @@
  "ws": "^8.20.1"
  },
  "devDependencies": {
+ "@earendil-works/pi-ai": "*",
  "@earendil-works/pi-coding-agent": "*",
  "@earendil-works/pi-tui": "*",
  "@types/node": "^25.8.0",
@@ -42,10 +43,14 @@
  "typescript": "^6.0.3"
  },
  "peerDependencies": {
+ "@earendil-works/pi-ai": "*",
  "@earendil-works/pi-coding-agent": "*",
  "@earendil-works/pi-tui": "*"
  },
  "peerDependenciesMeta": {
+ "@earendil-works/pi-ai": {
+ "optional": true
+ },
  "@earendil-works/pi-coding-agent": {
  "optional": true
  },
package/.env.example DELETED
@@ -1,27 +0,0 @@
- # Copy to ~/.pi/agent/voice-input.env or to this package as .env.
- # Do not commit real credentials.
-
- # Required for the current provider: VolcEngine speech API key.
- VOLC_API_KEY=
-
- # Optional ASR settings.
- VOLC_WS_URL=wss://openspeech.bytedance.com/api/v3/sauc/bigmodel_nostream
- VOLC_STREAM_RESOURCE_ID=volc.seedasr.sauc.duration
- ASR_LANGUAGE=
- ASR_PROMPT=
- STREAM_SEGMENT_MS=5000
- ASR_REQUEST_TIMEOUT_MS=90000
-
- # Optional recorder settings.
- # Leave empty to let PipeWire choose the default microphone.
- RECORDER_TARGET=
- RECORDING_FINALIZE_DELAY=0.1
-
- # Optional storage settings. Defaults to ~/.pi/agent/voice-input.
- VOICE_INPUT_HOME=~/.pi/agent/voice-input
- RECORDINGS_DIR=recordings
- RECORDER_STATE=recording.json
- RECORDER_LOG_DIR=logs
-
- # Optional shortcut. Default is Ctrl+Shift+R.
- VOICE_INPUT_SHORTCUT=ctrl+shift+r
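
Note on the removed `.env.example`: this diff does not show a migration step from the old env file to the JSON config. For users who still have a `voice-input.env`, a one-off conversion could look roughly like the sketch below. Everything here is hypothetical: `envToVoiceConfigJson` is not part of the package, only `VOLC_API_KEY` is carried over, and the output shape assumes the two keys named in `configSummary` (`volcApiKey`, `polishModel`).

```typescript
// Hypothetical one-off converter from the old env format to the JSON config.
function envToVoiceConfigJson(envText: string): string {
  let volcApiKey = "";
  for (const line of envText.split(/\r?\n/)) {
    const trimmed = line.trim();
    // Skip blank lines and comments.
    if (!trimmed || trimmed.startsWith("#")) continue;
    const eq = trimmed.indexOf("=");
    if (eq < 0) continue;
    const key = trimmed.slice(0, eq).trim();
    // Strip one pair of matching surrounding quotes, if present.
    const value = trimmed
      .slice(eq + 1)
      .trim()
      .replace(/^(["'])(.*)\1$/, "$2");
    if (key === "VOLC_API_KEY") volcApiKey = value;
  }
  // polishModel is left empty, which the summary text says disables polish.
  return JSON.stringify({ volcApiKey, polishModel: "" }, null, 2);
}
```

The result contains a real credential, so it belongs only in the private config path mentioned in AGENTS.md and should never be printed or committed.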