npm - pi-voice-input - Versions diffs - 0.2.12 → 0.2.13 - Mend

pi-voice-input 0.2.12 → 0.2.13

Files changed (3)

package/README.md +92 -128
package/extensions/voice-input.ts +38 -10
package/package.json +1 -1

package/README.md CHANGED

@@ -1,193 +1,158 @@
 # pi Voice Input
-A publishable, pure TypeScript [pi](https://pi.dev/) extension for Linux and macOS voice dictation into pi's editor.
+Voice dictation for [pi](https://pi.dev/). Press one shortcut, speak naturally, and insert the transcript into the editor without sending the prompt automatically.
-- Press `Ctrl+Shift+R` once to start recording.
-- Press `Ctrl+Shift+R` again to stop.
-- The extension sends the audio to VolcEngine WebSocket ASR.
-- The recognized text is inserted into pi's editor without submitting.
+## Why use it?
-Current scope:
+Typing long prompts can slow you down. `pi-voice-input` lets you:
-- Linux uses `pw-record` from PipeWire tools or `arecord` from alsa-utils.
-- macOS uses `afrecord` when present, otherwise `ffmpeg` with AVFoundation.
-- A VolcEngine Speech API key is required.
-- This is not a local/offline ASR engine.
+- capture ideas quickly while you are thinking out loud
+- dictate long instructions, notes, bug reports, or code review comments
+- speak naturally in Chinese, English, or a mix of both
+- keep your hands on the keyboard with a simple toggle shortcut
+- review or edit the inserted text before you submit it
+- optionally polish dictated text with one of your configured pi models
-The provider layer is intended to be extensible. **Current version supports only VolcEngine WebSocket ASR.**
+## Features
-No Python, `uv`, or upload service is required for normal shortcut usage. On macOS systems without `afrecord`, install `ffmpeg` for recording.
+- **One-key dictation**: `Ctrl+Shift+R` starts recording; press it again to stop and insert text.
+- **Editor-safe workflow**: transcription is pasted into the current editor only. It does not auto-submit.
+- **Chinese/English mixed input**: handles prompts that switch between Chinese, English, product names, and technical terms.
+- **Works on Linux and macOS**: uses common system recording tools.
+- **Lowers sound while you speak**: automatically turns down system audio during recording, then restores it afterwards.
+- **Optional transcript polish**: use a pi model to clean up punctuation and wording before insertion.
+- **Simple setup commands**: configure from inside pi with `/voice init` and `/voice key`.
-## Architecture
+Current speech provider: **VolcEngine Speech ASR**. A VolcEngine Speech API key is required.
-```text
-pi extension: extensions/index.ts → extensions/voice-input.ts
-  ├─ registers Ctrl+Shift+R and /voice commands
-  ├─ starts/stops a local recorder process
-  │    ├─ Linux preferred: pw-record
-  │    ├─ Linux fallback: arecord
-  │    └─ macOS: afrecord, or ffmpeg/AVFoundation fallback
-  ├─ ducks system output volume while the microphone is listening
-  ├─ records a temporary 16 kHz mono 16-bit WAV
-  ├─ parses the WAV container in TypeScript and extracts raw PCM
-  ├─ sends PCM frames to the configured ASR provider via ws
-  │    └─ current provider: VolcEngine /api/v3/sauc/bigmodel_nostream
-  ├─ optionally post-processes raw ASR text with a configured pi model
-  │    └─ default: disabled; set polishModel to enable it
-  └─ pastes the final transcript into pi's editor
-```
-Runtime package dependency:
-- `ws`
-System dependency, one of:
-- Linux: `pw-record` from PipeWire tools, preferred
-- Linux: `arecord` from alsa-utils, fallback
-- macOS: `afrecord` when present, or `ffmpeg` from Homebrew (`brew install ffmpeg`) as the AVFoundation fallback
-On macOS, grant Terminal, ffmpeg, or your pi host app microphone permission when prompted. If macOS has previously denied microphone access, enable it in System Settings → Privacy & Security → Microphone.
-## Install / Update
-Install the published package with pi:
+## Install
 ```bash
 pi install npm:pi-voice-input
 ```
-Update to the latest published version:
+Update later with:
 ```bash
 pi update npm:pi-voice-input
 ```
-If pi is already running, restart pi after installing or updating. `/reload` may not replace code that was already loaded by the current pi process.
+Restart pi after installing or updating.
-## Providers
+## First-time setup
-The extension is structured around a provider boundary: recording, editor insertion, and command handling are generic; ASR transport/protocol logic is provider-specific.
+1. Install the extension:
-Currently implemented provider:
+   ```bash
+   pi install npm:pi-voice-input
+   ```
-- VolcEngine WebSocket ASR (`bigmodel_nostream`)
+2. Restart pi.
-Planned provider direction:
+3. Create the local config:
-- add more ASR providers without changing the shortcut/user workflow
-- keep provider credentials and options isolated in config
+   ```text
+   /voice init
+   ```
-## Configure
+4. Add your VolcEngine Speech API key:
-All plugin settings live in one JSON file:
+   ```text
+   /voice key
+   ```
-```text
-~/.pi/agent/voice-input.config.json
-```
+   Get your key here:
-Package-local and project-local env files are not read.
+   https://console.volcengine.com/speech/new/setting/apikeys?projectName=default
-Create or normalize the file from inside pi:
+5. Check that pi sees your setup:
-```text
-/voice init
-```
+   ```text
+   /voice config
+   ```
-Then set the VolcEngine Speech API key:
+6. Press `Ctrl+Shift+R`, speak, then press `Ctrl+Shift+R` again to insert the transcript.
+## Use
+Press:
 ```text
-/voice key
+Ctrl+Shift+R
 ```
-The key URL is also shown inside pi when the key is missing, when you run `/voice key`, and in `/voice help`:
+Then speak naturally in Chinese, English, or both. Press `Ctrl+Shift+R` again to stop recording. The recognized text appears in the editor at your cursor.
-https://console.volcengine.com/speech/new/setting/apikeys?projectName=default
+Useful commands:
-The config file is plain JSON and can be edited directly:
-```json
-{
-  "volcApiKey": "",
-  "polishModel": "",
-  "duckSystemVolume": true,
-  "duckSystemVolumeFactor": 0.5,
-  "duckSystemVolumeFadeMs": 300
-}
+```text
+/voice start    start recording
+/voice stop     stop, transcribe, and insert text
+/voice toggle   start or stop recording
+/voice cancel   stop and discard the recording
+/voice status   show current recorder state
+/voice config   show non-secret configuration
+/voice key      set or replace the API key
+/voice help     show setup help
 ```
-`polishModel` is disabled by default. Set it to any model shown by `pi --list-models` to enable transcript polish. If polishing fails, the raw ASR transcript is inserted instead.
-`duckSystemVolume` is enabled by default. While recording, the extension lowers system output volume to `duckSystemVolumeFactor` of the original volume using a short ease-in/ease-out fade (`duckSystemVolumeFadeMs`), then restores the saved volume when recording stops or is cancelled. Linux uses `wpctl` or `pactl`; macOS uses `osascript`.
+## Optional: polish dictated text
-Verify the effective non-secret config:
+By default, pi inserts the raw transcript. To let a pi model clean up punctuation and wording, set `polishModel` in:
 ```text
-/voice config
+~/.pi/agent/voice-input.config.json
 ```
-## Usage
+Use any model name shown by:
-Shortcut:
-```text
-Ctrl+Shift+R
+```bash
+pi --list-models
 ```
-Slash commands:
+Example:
-```text
-/voice start    # start recording
-/voice stop     # stop, transcribe, insert text
-/voice toggle   # start if idle, stop if recording
-/voice cancel   # stop recording and discard local audio without transcribing
-/voice status   # show recorder state
-/voice config   # show effective non-secret config and whether API key is detected
-/voice init     # create or normalize ~/.pi/agent/voice-input.config.json
-/voice key      # prompt for and save the current provider API key
-/voice help     # show setup help, including the explicit VolcEngine API key URL
+```json
+{
+  "volcApiKey": "",
+  "polishModel": "your-model-name"
+}
 ```
-## Notes
+If polishing fails, the raw transcript is inserted instead.
-- The extension uses post-recording WebSocket ASR: it records locally to a per-run temporary WAV, sends the stopped recording in chunks, then deletes the temporary audio. It is optimized for fast voice input, not live subtitles.
-- The default ASR segment size is intentionally larger than realtime packet sizes because this workflow sends already-recorded audio.
-- The transcript is inserted into the editor only; it is not submitted automatically.
-- Recorder stdout/stderr is not logged to disk, to avoid retaining potentially sensitive runtime data.
-- On startup, legacy `~/.pi/agent/voice-input/recordings` and `~/.pi/agent/voice-input/logs` artifacts are cleaned up when they are not part of an active recording.
-- When `polishModel` is set, polishing uses the unsent editor draft and recent session messages as context, but outputs only the refined voice text to insert at the current cursor. It must not reconstruct the full draft; the final text is pasted without replacing existing editor content.
-- While recording, the status line shows `● Mic on: [device name] — press Ctrl+Shift+R again to stop/transcribe` in the current theme accent color; no separate popup is shown when recording starts.
-- By default, system output volume is ducked to 50% of its previous level with a 300 ms ease-in/ease-out fade while the microphone is listening, then restored after recording stops.
+## System requirements
-## Development
+Linux needs one recording tool:
-See [CONTRIBUTING.md](CONTRIBUTING.md) for contribution guidelines, validation commands, and pull request expectations.
+- `pw-record` from PipeWire tools, recommended
+- or `arecord` from alsa-utils
-Clone the repo and install dependencies:
+macOS uses the built-in recorder when available. If recording does not work, install ffmpeg:
 ```bash
-git clone git@github.com:tr-nc/pi-voice-input.git
-cd pi-voice-input
-npm install
+brew install ffmpeg
 ```
-Run directly from the package checkout:
+On macOS, allow microphone access for your terminal or pi host app when prompted. You can also check System Settings → Privacy & Security → Microphone.
-```bash
-pi -e .
-```
+## Privacy notes
-Or install the local checkout while developing:
+- Your API key is stored locally in `~/.pi/agent/voice-input.config.json`.
+- Recordings are temporary and are removed after use.
+- Transcribed text is inserted into the editor so you can review it before submitting.
-```bash
-pi install .
-```
+## Troubleshooting
-After changing the extension while pi is open, run:
+- Run `/voice status` to see whether recording is active.
+- Run `/voice config` to confirm the API key is detected.
+- Run `/voice key` again if the key was changed or expired.
+- On macOS, check microphone permission if recording immediately fails.
+- On Linux, make sure `pw-record` or `arecord` is installed and your microphone works in other apps.
-```text
-/reload
-```
+## Development
+See [CONTRIBUTING.md](CONTRIBUTING.md) for contribution guidelines.
 ## Roadmap
@@ -196,5 +161,4 @@ See [ROADMAP.md](ROADMAP.md) for planned user-visible work.
 ## Links
 - API key settings: https://console.volcengine.com/speech/new/setting/apikeys?projectName=default
-- ASR product page: https://www.volcengine.com/product/asr
-- WebSocket ASR docs: https://www.volcengine.com/docs/6561/1354869?lang=zh
+- VolcEngine ASR: https://www.volcengine.com/product/asr
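
Note on the README change: the 0.2.13 text keeps only `volcApiKey` and `polishModel` in its JSON example, while the 0.2.12 text also documented `duckSystemVolume`, `duckSystemVolumeFactor`, and `duckSystemVolumeFadeMs` with defaults of `true`, `0.5`, and `300`. A minimal sketch of how a shortened config file could be filled in from those documented defaults is shown below. The names `VoiceInputConfig`, `DEFAULT_CONFIG`, and `normalizeConfig` are hypothetical and do not come from the package; the extension's actual loader is not part of this diff and may behave differently.

```typescript
// Hypothetical config shape, taken from the JSON example in the 0.2.12 README.
interface VoiceInputConfig {
  volcApiKey: string;
  polishModel: string; // empty string: transcript polish stays disabled
  duckSystemVolume: boolean;
  duckSystemVolumeFactor: number; // fraction of the original output volume
  duckSystemVolumeFadeMs: number; // fade duration in milliseconds
}

// Defaults as documented in the 0.2.12 README (assumed to be unchanged in 0.2.13).
const DEFAULT_CONFIG: VoiceInputConfig = {
  volcApiKey: "",
  polishModel: "",
  duckSystemVolume: true,
  duckSystemVolumeFactor: 0.5,
  duckSystemVolumeFadeMs: 300,
};

// Parse the JSON text and let any keys that are present override the defaults.
function normalizeConfig(jsonText: string): VoiceInputConfig {
  const parsed = JSON.parse(jsonText) as Partial<VoiceInputConfig>;
  return { ...DEFAULT_CONFIG, ...parsed };
}
```

Under this sketch, the two-key example from the new README would still yield a complete config, with the ducking settings falling back to the documented defaults.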

package/extensions/voice-input.ts CHANGED

@@ -1091,22 +1091,46 @@ function cleanPostprocessOutput(output: string): string {
   return text;
 }
+const EXPLICIT_ENGLISH_MULTILINE_PATTERN =
+  /\b(?:new\s*line|newline|line break|next line|new paragraph|paragraph break|carriage return|press enter|separate lines?|multi[- ]line|multiple lines)\b/i;
+const EXPLICIT_CHINESE_MULTILINE_PATTERN = /(?:换行|新的一行|另起一行|下一行|回车|分行|多行|逐行|每行|空一行|新段落|另起一段|分段)/u;
+const CJK_LIKE_PATTERN = /[\p{Script=Han}\p{Script=Hiragana}\p{Script=Katakana}\p{Script=Hangul}]/u;
+const CJK_PUNCTUATION_PATTERN = /[，。！？、；：（）《》「」『』“”‘’]/u;
+const CLOSING_PUNCTUATION_PATTERN = /^[,.;:!?，。！？、；：）)\]}》」』”’]/u;
+const OPENING_PUNCTUATION_PATTERN = /[（([{\[《「『“‘]$/u;
 function rawTextRequestsMultiline(rawText: string): boolean {
-  return (
-    /\r|\n/.test(rawText) ||
-    /\b(?:new\s*line|newline|line break|next line|new paragraph|paragraph break|carriage return|press enter|separate lines?|multi[- ]line|multiple lines)\b/i.test(rawText) ||
-    /(?:换行|新的一行|另起一行|下一行|回车|分行|多行|逐行|每行|空一行|新段落|另起一段|分段)/u.test(rawText)
-  );
+  // Existing newlines in raw ASR are not reliable user intent: providers can
+  // insert segment or sentence breaks on their own. Treat only spoken layout
+  // commands as intentional multiline input.
+  return EXPLICIT_ENGLISH_MULTILINE_PATTERN.test(rawText) || EXPLICIT_CHINESE_MULTILINE_PATTERN.test(rawText);
+}
+function lineBreakJoiner(left: string, right: string): string {
+  if (!left || !right) return "";
+  if (CLOSING_PUNCTUATION_PATTERN.test(right) || OPENING_PUNCTUATION_PATTERN.test(left)) return "";
+  if (CJK_PUNCTUATION_PATTERN.test(left) || CJK_PUNCTUATION_PATTERN.test(right)) return "";
+  if (CJK_LIKE_PATTERN.test(left) && CJK_LIKE_PATTERN.test(right)) return "";
+  return " ";
 }
 function collapseUnexpectedLineBreaks(text: string): string {
-  return text
-    .replace(/\r\n?/g, "\n")
-    .replace(/[ \t\f\v]*\n+[ \t\f\v]*/g, " ")
+  const normalized = text.replace(/\r\n?/g, "\n");
+  return normalized
+    .replace(/[ \t\f\v]*\n+[ \t\f\v]*/g, (match, offset: number, source: string) => {
+      const left = source.slice(0, offset).replace(/[ \t\f\v]+$/g, "").at(-1) ?? "";
+      const right = source.slice(offset + match.length).replace(/^[ \t\f\v]+/g, "").at(0) ?? "";
+      return lineBreakJoiner(left, right);
+    })
     .replace(/[ \t\f\v]{2,}/g, " ")
     .trim();
 }
+function normalizeRawTextForPostprocess(rawText: string): string {
+  const raw = rawText.trim();
+  return rawTextRequestsMultiline(raw) ? raw : collapseUnexpectedLineBreaks(raw);
+}
 function preserveExpectedPostprocessLayout(rawText: string, output: string): string {
   if (rawTextRequestsMultiline(rawText)) return output.trim();
   return collapseUnexpectedLineBreaks(output);
@@ -1193,7 +1217,7 @@ async function postprocessTranscript(ctx: ExtensionContext, rawText: string, con
       messages: [
         {
           role: "user",
-          content: buildPostprocessPrompt(ctx, raw, config),
+          content: buildPostprocessPrompt(ctx, normalizeRawTextForPostprocess(raw), config),
           timestamp: Date.now(),
         },
       ],
@@ -1370,6 +1394,7 @@ async function stopRecording(ctx: ExtensionContext, transcribe = true) {
   let finalText = result.text;
   let postprocessMs = 0;
+  let postprocessSucceeded = false;
   let postprocessUsed = false;
   if (config.postprocessEnabled) {
     ctx.ui.setStatus("voice-input", ctx.ui.theme.fg("warning", "● polishing"));
@@ -1377,7 +1402,7 @@ async function stopRecording(ctx: ExtensionContext, transcribe = true) {
     try {
       finalText = await postprocessTranscript(ctx, result.text, config);
       postprocessMs = Date.now() - postprocessStart;
-      postprocessUsed = finalText.trim() !== result.text.trim();
+      postprocessSucceeded = true;
     } catch (error) {
       postprocessMs = Date.now() - postprocessStart;
       ctx.ui.notify(
@@ -1387,6 +1412,9 @@ async function stopRecording(ctx: ExtensionContext, transcribe = true) {
     }
   }
+  finalText = preserveExpectedPostprocessLayout(result.text, finalText);
+  postprocessUsed = postprocessSucceeded && finalText.trim() !== result.text.trim();
   ctx.ui.setStatus("voice-input", undefined);
   insertIntoEditor(ctx, finalText);
   ctx.ui.notify(
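
Note on the line-break handling in this file: in 0.2.12 every unexpected line break was replaced with a single space, which leaves stray spaces inside Chinese text. The new `lineBreakJoiner` chooses between an empty string and a space based on the characters on either side of the break. The sketch below reproduces that logic in standalone form so its effect can be tried on sample strings. It is an approximation, not the package code: the `\p{Script=...}` escapes are replaced with Basic Multilingual Plane ranges so the patterns work without the `u` flag, `.at()` is replaced with `slice`/`charAt`, and the constant names are shortened.

```typescript
// Simplified stand-ins for the patterns in the diff. The CJK_LIKE ranges
// (Han, Hiragana/Katakana, Hangul syllables) are an assumption that only
// approximates the \p{Script=...} escapes used by the package.
const CJK_LIKE = /[\u4e00-\u9fff\u3040-\u30ff\uac00-\ud7af]/;
const CJK_PUNCTUATION = /[，。！？、；：（）《》「」『』“”‘’]/;
const CLOSING_PUNCTUATION = /^[,.;:!?，。！？、；：）)\]}》」』”’]/;
const OPENING_PUNCTUATION = /[（(\[{《「『“‘]$/;

// Decide what replaces a line break, given the character on each side.
function lineBreakJoiner(left: string, right: string): string {
  if (!left || !right) return ""; // break at the start or end of the text
  if (CLOSING_PUNCTUATION.test(right) || OPENING_PUNCTUATION.test(left)) return "";
  if (CJK_PUNCTUATION.test(left) || CJK_PUNCTUATION.test(right)) return "";
  if (CJK_LIKE.test(left) && CJK_LIKE.test(right)) return "";
  return " "; // Latin-to-Latin and mixed-script boundaries keep a space
}

function collapseUnexpectedLineBreaks(text: string): string {
  const normalized = text.replace(/\r\n?/g, "\n");
  return normalized
    .replace(/[ \t\f\v]*\n+[ \t\f\v]*/g, (match: string, offset: number, source: string) => {
      // Nearest non-blank character before and after the matched break.
      const left = source.slice(0, offset).replace(/[ \t\f\v]+$/, "").slice(-1);
      const right = source.slice(offset + match.length).replace(/^[ \t\f\v]+/, "").charAt(0);
      return lineBreakJoiner(left, right);
    })
    .replace(/[ \t\f\v]{2,}/g, " ")
    .trim();
}

console.log(collapseUnexpectedLineBreaks("你好\n世界"));       // 你好世界
console.log(collapseUnexpectedLineBreaks("hello\nworld"));     // hello world
console.log(collapseUnexpectedLineBreaks("use pi\r\n模型"));   // use pi 模型
```

Because `stopRecording` now calls `preserveExpectedPostprocessLayout` after the `try`/`catch`, this collapsing also applies to the raw transcript when polishing is disabled or fails, unless the speaker used an explicit layout phrase such as "new line" or "换行".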

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "pi-voice-input",
-  "version": "0.2.12",
+  "version": "0.2.13",
   "description": "Press Ctrl+Shift+R to dictate prompts into Pi using VolcEngine ASR",
   "type": "module",
   "keywords": [