@kstonekuan/gemini-voice 0.0.1
- package/README.md +112 -0
- package/commands/voice.toml +7 -0
- package/dist/cli.d.ts +2 -0
- package/dist/cli.js +15 -0
- package/dist/cli.js.map +1 -0
- package/dist/commands/auth.d.ts +6 -0
- package/dist/commands/auth.js +45 -0
- package/dist/commands/auth.js.map +1 -0
- package/dist/commands/devices.d.ts +2 -0
- package/dist/commands/devices.js +18 -0
- package/dist/commands/devices.js.map +1 -0
- package/dist/commands/transcribe.d.ts +8 -0
- package/dist/commands/transcribe.js +255 -0
- package/dist/commands/transcribe.js.map +1 -0
- package/dist/components/TranscribeUI.d.ts +9 -0
- package/dist/components/TranscribeUI.js +56 -0
- package/dist/components/TranscribeUI.js.map +1 -0
- package/dist/config.d.ts +4 -0
- package/dist/config.js +38 -0
- package/dist/config.js.map +1 -0
- package/dist/index.d.ts +1 -0
- package/dist/index.js +2 -0
- package/dist/index.js.map +1 -0
- package/gemini-extension.json +5 -0
- package/package.json +52 -0
package/README.md
ADDED
@@ -0,0 +1,112 @@
# Gemini CLI Voice Extension

**Voice mode for [Gemini CLI](https://github.com/google-gemini/gemini-cli).** Talk to Gemini from your terminal, powered by the Gemini Live API.



This repo ships two things:

- **`gemini-voice` CLI**, a standalone voice real-time transcription tool in the terminal with an audio waveform display. It captures speech from your microphone, streams it to the Gemini Live API, and returns a transcript.
- **Gemini CLI Extension**, which adds a `/voice` command to Gemini CLI so you can speak instead of type.

The CLI was built first as the core transcription engine, and the extension wraps it to bring voice input into Gemini CLI. Think of it like voice mode for Claude Code, but for Gemini CLI.



### Current limitations

The extension approach works, but Gemini CLI's extension system has some constraints that limit the experience:

- **No push-to-talk.** You need to type `/voice` (or use your OS voice-to-text) to start listening. There's no hotkey to hold and talk.
- **No live feedback.** The standalone `gemini-voice` CLI shows a real-time audio waveform, but Gemini CLI doesn't support live output from extension subprocesses, so the interactive UI is suppressed when used as an extension.

These are platform limitations, not bugs. To get a true voice mode with push-to-talk, live waveforms, and tight integration, it needs to be built natively into Gemini CLI itself. I'm working on that, and this project is a stepping stone towards it, built on top of the Gemini Live API.

## Features

- Voice input for Gemini CLI via the `/voice` extension command
- Native microphone capture via a Rust addon (cpal + lock-free ring buffer)
- Real-time audio streaming to the Gemini Live API for transcription
- Server-side voice activity detection (VAD), no local VAD needed
- Automatic shutdown after speech ends
- Ink-based terminal UI with spinner and live audio level meter (standalone CLI)
- Standalone CLI with `transcribe` and `devices` subcommands
- Pre-built native binaries, no Rust toolchain needed for end users
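
Reviewer note: the live audio level meter in the feature list above needs some mapping from raw 16-bit samples to a 0..1 bar height. Below is a minimal sketch of one such mapping, assuming the dB-scaled RMS approach with a -50 dB floor and -6 dB ceiling that appears in `dist/commands/transcribe.js` later in this diff. The names `rmsToMeterLevel`, `floorDb`, and `ceilingDb` are illustrative and not part of the package.

```javascript
// Illustrative sketch (not the package's code): map 16-bit PCM samples
// to a 0..1 meter level on a decibel scale.
function rmsToMeterLevel(samples, floorDb = -50, ceilingDb = -6) {
    if (samples.length === 0) return 0;
    let sumOfSquares = 0;
    for (const sample of samples) {
        const normalized = sample / 32768; // scale int16 to [-1, 1)
        sumOfSquares += normalized * normalized;
    }
    const rms = Math.sqrt(sumOfSquares / samples.length);
    if (rms === 0) return 0; // digital silence: avoid log10(0)
    const db = 20 * Math.log10(rms);
    if (db < floorDb) return 0; // below the noise floor, show nothing
    return Math.min(1, (db - floorDb) / (ceilingDb - floorDb));
}

const silence = new Int16Array(160);
const quiet = new Int16Array(160).fill(328);   // roughly -40 dBFS
const loud = new Int16Array(160).fill(16384);  // roughly -6 dBFS
console.log(rmsToMeterLevel(silence), rmsToMeterLevel(quiet), rmsToMeterLevel(loud));
```

With these defaults, a signal at about -40 dBFS lands near a quarter of the scale and anything at or above -6 dBFS saturates the meter, so ordinary speech fills the bar well before the input clips.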

## How it works

The Gemini Live API is actually a speech-to-speech API designed for real-time voice conversations with the model. We're repurposing it here, only using its real-time input transcription and server-side voice activity detection to build a transcription tool. The model's audio responses are ignored entirely.

1. The native Rust addon captures 16kHz 16-bit PCM mono audio from the microphone using cpal
2. Audio samples are written to a lock-free ring buffer and drained on a dedicated thread
3. The drain thread pushes samples into Node.js via a NAPI ThreadsafeFunction (non-blocking)
4. TypeScript code base64-encodes the PCM chunks and sends them as `realtimeInput` over a WebSocket to the Gemini Live API
5. The server performs voice activity detection and streams back `inputTranscription` messages
6. Once transcription is complete (or a settle timeout elapses), the transcript is printed to stdout and the process exits
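
Reviewer note: steps 1 and 4 above fix the wire format, so the data rate can be worked out directly. Below is a minimal sketch of that arithmetic and of the base64 step, assuming 100 ms chunks purely for illustration (the chunk size delivered by the native addon is not stated in this diff). The helper names `pcmBytesPerSecond`, `base64Length`, and `encodeChunk` are hypothetical.

```javascript
// Illustrative arithmetic for 16 kHz, 16-bit, mono PCM.
const SAMPLE_RATE_HZ = 16000;
const BYTES_PER_SAMPLE = 2; // 16-bit samples
const CHANNELS = 1;         // mono

function pcmBytesPerSecond() {
    return SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS;
}

// Base64 emits 4 characters for every 3 input bytes, rounded up with padding.
function base64Length(byteCount) {
    return 4 * Math.ceil(byteCount / 3);
}

// Encode a chunk of samples as step 4 describes: raw bytes, then base64.
// byteOffset and byteLength are passed so a view into a larger buffer
// encodes only its own samples.
function encodeChunk(samples) {
    const bytes = Buffer.from(samples.buffer, samples.byteOffset, samples.byteLength);
    return bytes.toString("base64");
}

const chunkMs = 100; // assumed chunk duration, for illustration only
const chunk = new Int16Array((SAMPLE_RATE_HZ * chunkMs) / 1000);
const encoded = encodeChunk(chunk);
console.log(pcmBytesPerSecond());                            // 32000
console.log(chunk.byteLength);                               // 3200
console.log(encoded.length, base64Length(chunk.byteLength)); // 4268 4268
```

At 32,000 bytes per second of raw audio, base64 adds about one third on top, so the stream costs roughly 43 KB per second of text payload before WebSocket framing.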

## Prerequisites

- [Gemini CLI](https://github.com/google-gemini/gemini-cli)
- [Node.js](https://nodejs.org/) (v18+)
- A Gemini API key ([get one here](https://aistudio.google.com/apikey))

## Installation

### As a Gemini CLI extension

From GitHub:

```bash
gemini extensions install https://github.com/kstonekuan/gemini-cli-voice-extension
```

From npm:

```bash
gemini extensions install @kstonekuan/gemini-voice
```

Set up your API key:

```bash
gemini-voice auth
```

### Standalone CLI

```bash
npm install -g @kstonekuan/gemini-voice
gemini-voice auth
```

### Development

See [CONTRIBUTING.md](./CONTRIBUTING.md) for development setup.

## Usage

### Inside Gemini CLI

```
/voice
```

### Standalone CLI

```bash
# Transcribe speech from the default microphone
gemini-voice transcribe

# Transcribe from a specific audio device
gemini-voice transcribe --device 1

# Quiet mode -- only output the final transcript (no UI)
gemini-voice transcribe --quiet

# List available audio input devices
gemini-voice devices
```

> **Note:** When using `/voice` inside Gemini CLI, the `--quiet` flag is used automatically. Gemini CLI's `!{...}` syntax does not support live output from subprocesses, so the interactive UI is suppressed. The model will echo back the transcription before responding.

Press `Ctrl+C` to cancel at any time.
package/commands/voice.toml
ADDED
@@ -0,0 +1,7 @@
description = "Speak into your microphone and use the transcript as your next message"

prompt = """
The user provided the following voice input. First, quote the transcription back to the user, then process it as you would any typed input.

!{gemini-voice transcribe --quiet}
"""
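
Reviewer note: the `!{...}` placeholder in this prompt is filled with the output of the command it wraps before the prompt reaches the model, which is why the command runs with `--quiet` and prints nothing but the transcript. Below is a toy model of that substitution. It is not Gemini CLI's implementation, and `expandShellPlaceholders` and the `runCommand` callback are hypothetical names used only for illustration.

```javascript
// Toy model (not Gemini CLI's implementation): replace each !{command}
// placeholder in a prompt template with the output of that command.
function expandShellPlaceholders(template, runCommand) {
    return template.replace(/!\{([^}]*)\}/g, (_match, command) => runCommand(command.trim()));
}

const template = [
    "The user provided the following voice input. First, quote the transcription back to the user, then process it as you would any typed input.",
    "",
    "!{gemini-voice transcribe --quiet}",
].join("\n");

// Stand-in for real command execution; returns a canned transcript.
const fakeRun = (command) =>
    command === "gemini-voice transcribe --quiet" ? "what time is it" : "";

console.log(expandShellPlaceholders(template, fakeRun));
```

Because the whole command output is pasted into the prompt, any spinner frames or level bars written to the same stream would end up in the model's input, which fits the package's choice to render its UI on stderr.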
package/dist/cli.d.ts
ADDED
package/dist/cli.js
ADDED
@@ -0,0 +1,15 @@
#!/usr/bin/env node
import yargs from "yargs";
import { hideBin } from "yargs/helpers";
import { authCommand } from "./commands/auth.js";
import { devicesCommand } from "./commands/devices.js";
import { transcribeCommand } from "./commands/transcribe.js";
yargs(hideBin(process.argv))
    .command(authCommand)
    .command(transcribeCommand)
    .command(devicesCommand)
    .demandCommand(1, "Please specify a command")
    .strict()
    .help()
    .parse();
//# sourceMappingURL=cli.js.map
package/dist/cli.js.map
ADDED
@@ -0,0 +1 @@
{"version":3,"file":"cli.js","sourceRoot":"","sources":["../src/cli.ts"],"names":[],"mappings":";AACA,OAAO,KAAK,MAAM,OAAO,CAAC;AAC1B,OAAO,EAAE,OAAO,EAAE,MAAM,eAAe,CAAC;AACxC,OAAO,EAAE,WAAW,EAAE,MAAM,oBAAoB,CAAC;AACjD,OAAO,EAAE,cAAc,EAAE,MAAM,uBAAuB,CAAC;AACvD,OAAO,EAAE,iBAAiB,EAAE,MAAM,0BAA0B,CAAC;AAE7D,KAAK,CAAC,OAAO,CAAC,OAAO,CAAC,IAAI,CAAC,CAAC;KAC1B,OAAO,CAAC,WAAW,CAAC;KACpB,OAAO,CAAC,iBAAiB,CAAC;KAC1B,OAAO,CAAC,cAAc,CAAC;KACvB,aAAa,CAAC,CAAC,EAAE,0BAA0B,CAAC;KAC5C,MAAM,EAAE;KACR,IAAI,EAAE;KACN,KAAK,EAAE,CAAC"}
package/dist/commands/auth.js
ADDED
@@ -0,0 +1,45 @@
import { createInterface } from "node:readline/promises";
import { clearApiKey, getConfigFilePath, getStoredApiKey, storeApiKey, } from "../config.js";
async function promptForApiKey() {
    const rl = createInterface({
        input: process.stdin,
        output: process.stderr,
    });
    try {
        const apiKey = await rl.question("Enter your Gemini API key (from https://aistudio.google.com/apikey): ");
        return apiKey.trim();
    }
    finally {
        rl.close();
    }
}
export const authCommand = {
    command: "auth",
    describe: "Set up or clear your Gemini API key",
    builder: (argv) => argv.option("clear", {
        type: "boolean",
        default: false,
        describe: "Remove the stored API key",
    }),
    handler: async (argv) => {
        if (argv.clear) {
            clearApiKey();
            process.stderr.write("API key removed.\n");
            return;
        }
        const existing = getStoredApiKey();
        if (existing) {
            process.stderr.write(`API key already configured (stored in ${getConfigFilePath()}).\n`);
            process.stderr.write("Run 'gemini-voice auth --clear' to remove it.\n");
            return;
        }
        const apiKey = await promptForApiKey();
        if (!apiKey) {
            process.stderr.write("No API key provided.\n");
            process.exit(1);
        }
        storeApiKey(apiKey);
        process.stderr.write(`API key saved to ${getConfigFilePath()} (mode 600).\n`);
    },
};
//# sourceMappingURL=auth.js.map
package/dist/commands/auth.js.map
ADDED
@@ -0,0 +1 @@
{"version":3,"file":"auth.js","sourceRoot":"","sources":["../../src/commands/auth.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,eAAe,EAAE,MAAM,wBAAwB,CAAC;AAEzD,OAAO,EACN,WAAW,EACX,iBAAiB,EACjB,eAAe,EACf,WAAW,GACX,MAAM,cAAc,CAAC;AAEtB,KAAK,UAAU,eAAe;IAC7B,MAAM,EAAE,GAAG,eAAe,CAAC;QAC1B,KAAK,EAAE,OAAO,CAAC,KAAK;QACpB,MAAM,EAAE,OAAO,CAAC,MAAM;KACtB,CAAC,CAAC;IACH,IAAI,CAAC;QACJ,MAAM,MAAM,GAAG,MAAM,EAAE,CAAC,QAAQ,CAC/B,uEAAuE,CACvE,CAAC;QACF,OAAO,MAAM,CAAC,IAAI,EAAE,CAAC;IACtB,CAAC;YAAS,CAAC;QACV,EAAE,CAAC,KAAK,EAAE,CAAC;IACZ,CAAC;AACF,CAAC;AAMD,MAAM,CAAC,MAAM,WAAW,GAAoC;IAC3D,OAAO,EAAE,MAAM;IACf,QAAQ,EAAE,qCAAqC;IAC/C,OAAO,EAAE,CAAC,IAAI,EAAE,EAAE,CACjB,IAAI,CAAC,MAAM,CAAC,OAAO,EAAE;QACpB,IAAI,EAAE,SAAS;QACf,OAAO,EAAE,KAAK;QACd,QAAQ,EAAE,2BAA2B;KACrC,CAAC;IACH,OAAO,EAAE,KAAK,EAAE,IAAI,EAAE,EAAE;QACvB,IAAI,IAAI,CAAC,KAAK,EAAE,CAAC;YAChB,WAAW,EAAE,CAAC;YACd,OAAO,CAAC,MAAM,CAAC,KAAK,CAAC,oBAAoB,CAAC,CAAC;YAC3C,OAAO;QACR,CAAC;QAED,MAAM,QAAQ,GAAG,eAAe,EAAE,CAAC;QACnC,IAAI,QAAQ,EAAE,CAAC;YACd,OAAO,CAAC,MAAM,CAAC,KAAK,CACnB,yCAAyC,iBAAiB,EAAE,MAAM,CAClE,CAAC;YACF,OAAO,CAAC,MAAM,CAAC,KAAK,CAAC,iDAAiD,CAAC,CAAC;YACxE,OAAO;QACR,CAAC;QAED,MAAM,MAAM,GAAG,MAAM,eAAe,EAAE,CAAC;QACvC,IAAI,CAAC,MAAM,EAAE,CAAC;YACb,OAAO,CAAC,MAAM,CAAC,KAAK,CAAC,wBAAwB,CAAC,CAAC;YAC/C,OAAO,CAAC,IAAI,CAAC,CAAC,CAAC,CAAC;QACjB,CAAC;QAED,WAAW,CAAC,MAAM,CAAC,CAAC;QACpB,OAAO,CAAC,MAAM,CAAC,KAAK,CACnB,oBAAoB,iBAAiB,EAAE,gBAAgB,CACvD,CAAC;IACH,CAAC;CACD,CAAC"}
package/dist/commands/devices.js
ADDED
@@ -0,0 +1,18 @@
import { createRequire } from "node:module";
const require = createRequire(import.meta.url);
const { Recorder } = require("@kstonekuan/audio-capture");
export const devicesCommand = {
    command: "devices",
    describe: "List available audio input devices",
    handler: () => {
        const devices = Recorder.getAudioDevices();
        if (devices.length === 0) {
            process.stderr.write("No audio input devices found.\n");
            process.exit(1);
        }
        for (const device of devices) {
            process.stdout.write(`${device.index}: ${device.name}\n`);
        }
    },
};
//# sourceMappingURL=devices.js.map
package/dist/commands/devices.js.map
ADDED
@@ -0,0 +1 @@
{"version":3,"file":"devices.js","sourceRoot":"","sources":["../../src/commands/devices.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,aAAa,EAAE,MAAM,aAAa,CAAC;AAG5C,MAAM,OAAO,GAAG,aAAa,CAAC,MAAM,CAAC,IAAI,CAAC,GAAG,CAAC,CAAC;AAC/C,MAAM,EAAE,QAAQ,EAAE,GACjB,OAAO,CAAC,2BAA2B,CAA+C,CAAC;AAEpF,MAAM,CAAC,MAAM,cAAc,GAAkB;IAC5C,OAAO,EAAE,SAAS;IAClB,QAAQ,EAAE,oCAAoC;IAC9C,OAAO,EAAE,GAAG,EAAE;QACb,MAAM,OAAO,GAAG,QAAQ,CAAC,eAAe,EAAE,CAAC;QAC3C,IAAI,OAAO,CAAC,MAAM,KAAK,CAAC,EAAE,CAAC;YAC1B,OAAO,CAAC,MAAM,CAAC,KAAK,CAAC,iCAAiC,CAAC,CAAC;YACxD,OAAO,CAAC,IAAI,CAAC,CAAC,CAAC,CAAC;QACjB,CAAC;QACD,KAAK,MAAM,MAAM,IAAI,OAAO,EAAE,CAAC;YAC9B,OAAO,CAAC,MAAM,CAAC,KAAK,CAAC,GAAG,MAAM,CAAC,KAAK,KAAK,MAAM,CAAC,IAAI,IAAI,CAAC,CAAC;QAC3D,CAAC;IACF,CAAC;CACD,CAAC"}
package/dist/commands/transcribe.js
ADDED
@@ -0,0 +1,255 @@
import { appendFileSync } from "node:fs";
import { createRequire } from "node:module";
import { GoogleGenAI, Modality } from "@google/genai";
import { render } from "ink";
import React from "react";
import { TranscribeUI } from "../components/TranscribeUI.js";
import { getStoredApiKey } from "../config.js";
const require = createRequire(import.meta.url);
const { Recorder } = require("@kstonekuan/audio-capture");
const GEMINI_MODEL = "gemini-2.5-flash-native-audio-preview-12-2025";
const AUDIO_MIME_TYPE = "audio/pcm;rate=16000";
// After the first transcription arrives, wait this long for a `finished: true`
// signal or additional fragments before closing (ms).
const TRANSCRIPT_SETTLE_TIMEOUT_MS = 2000;
const DEBUG_LOG_PATH = "/tmp/gemini-debug.log";
function createDebugLogger(enabled) {
    if (!enabled)
        return () => { };
    return (message) => appendFileSync(DEBUG_LOG_PATH, message);
}
function getApiKey() {
    const apiKey = getStoredApiKey();
    if (!apiKey) {
        process.stderr.write("Error: No API key found. Run 'gemini-voice auth' to set one up.\n");
        process.exit(1);
    }
    return apiKey;
}
function int16ArrayToBase64(samples) {
    const bytes = new Uint8Array(samples.buffer, samples.byteOffset, samples.byteLength);
    return Buffer.from(bytes).toString("base64");
}
function computeRmsLevel(samples) {
    if (samples.length === 0)
        return 0;
    let sumOfSquares = 0;
    for (const sample of samples) {
        const normalized = sample / 32768;
        sumOfSquares += normalized * normalized;
    }
    const rms = Math.sqrt(sumOfSquares / samples.length);
    // Logarithmic (dB) scale for perceptually responsive metering.
    // Floor at -50dB to ignore background noise, ceiling at -6dB.
    const minDb = -50;
    const maxDb = -6;
    const db = 20 * Math.log10(rms || Number.MIN_VALUE);
    if (db < minDb)
        return 0;
    return Math.min(1, (db - minDb) / (maxDb - minDb));
}
const UI_UPDATE_INTERVAL_MS = 66; // ~15fps
function startMicrophoneCapture(state) {
    let lastUiUpdateTime = 0;
    state.recorder.start((error, samples) => {
        if (error) {
            process.stderr.write(`Audio capture error: ${error.message}\n`);
            return;
        }
        if (state.sessionPhase.phase !== "active")
            return;
        const audioLevel = computeRmsLevel(samples);
        const now = Date.now();
        if (now - lastUiUpdateTime >= UI_UPDATE_INTERVAL_MS) {
            lastUiUpdateTime = now;
            const runningTranscript = state.transcriptParts.join("").trim();
            state.updateUI("listening", audioLevel, runningTranscript);
        }
        const base64Audio = int16ArrayToBase64(samples);
        state.debugLog(`[AUDIO] len=${samples.length} rms=${audioLevel.toFixed(3)}\n`);
        state.sessionPhase.session.sendRealtimeInput({
            audio: { data: base64Audio, mimeType: AUDIO_MIME_TYPE },
        });
    });
}
function handleServerMessage(message, state) {
    const serverContent = message.serverContent;
    if (!serverContent)
        return;
    if (serverContent.inputTranscription?.text) {
        state.transcriptParts.push(serverContent.inputTranscription.text);
        const runningTranscript = state.transcriptParts.join("").trim();
        state.updateUI("listening", 0, runningTranscript);
        if (state.settleTimeoutId) {
            clearTimeout(state.settleTimeoutId);
        }
        if (serverContent.inputTranscription.finished) {
            shutdownGracefully(state);
            return;
        }
        state.settleTimeoutId = setTimeout(() => {
            shutdownGracefully(state);
        }, TRANSCRIPT_SETTLE_TIMEOUT_MS);
    }
    if (serverContent.turnComplete && !serverContent.modelTurn) {
        if (!state.settleTimeoutId && state.transcriptParts.length === 0) {
            state.settleTimeoutId = setTimeout(() => {
                shutdownGracefully(state);
            }, TRANSCRIPT_SETTLE_TIMEOUT_MS);
        }
    }
}
function shutdownGracefully(state) {
    if (state.sessionPhase.phase === "shutting_down")
        return;
    const session = state.sessionPhase.phase === "active" ? state.sessionPhase.session : null;
    state.sessionPhase = { phase: "shutting_down" };
    if (state.settleTimeoutId) {
        clearTimeout(state.settleTimeoutId);
        state.settleTimeoutId = undefined;
    }
    state.recorder.stop();
    const transcript = state.transcriptParts.join("").trim();
    if (transcript.length > 0) {
        state.updateUI("done", 0, transcript);
        process.stdout.write(transcript);
        process.stdout.write("\n");
    }
    else {
        state.updateUI("done", 0, "No speech detected.");
        process.stderr.write("No speech detected.\n");
    }
    session?.close();
}
async function runTranscribe(deviceIndex, debug, quiet) {
    const debugLog = createDebugLogger(debug ?? false);
    const apiKey = getApiKey();
    const ai = new GoogleGenAI({ apiKey });
    const liveConnectConfig = {
        responseModalities: [Modality.AUDIO],
        inputAudioTranscription: {},
        realtimeInputConfig: {
            automaticActivityDetection: {
                disabled: false,
                prefixPaddingMs: 100,
                silenceDurationMs: 500,
            },
        },
        systemInstruction: {
            parts: [
                {
                    text: "You are a transcription assistant. Respond with a single word 'ok' after each user message. Keep responses minimal.",
                },
            ],
        },
    };
    const recorder = new Recorder(deviceIndex ?? null);
    let updateUI;
    if (quiet) {
        updateUI = () => { };
    }
    else {
        let currentState = "connecting";
        let currentAudioLevel = 0;
        let currentTranscript = "";
        const { rerender, unmount } = render(React.createElement(TranscribeUI, {
            state: currentState,
            audioLevel: currentAudioLevel,
            transcript: currentTranscript,
        }), { stdout: process.stderr });
        updateUI = (newState, audioLevel, transcript) => {
            currentState = newState;
            currentAudioLevel = audioLevel;
            currentTranscript = transcript;
            rerender(React.createElement(TranscribeUI, {
                state: currentState,
                audioLevel: currentAudioLevel,
                transcript: currentTranscript,
            }));
            if (newState === "done") {
                unmount();
            }
        };
    }
    // connect() resolves after the WebSocket opens and the setup message is
    // sent. The official example starts streaming audio immediately after
    // connect() — no need to wait for setupComplete.
    process.on("unhandledRejection", (reason) => {
        debugLog(`[UNHANDLED] ${String(reason)}\n`);
    });
    debugLog("[START] connecting...\n");
    let resolveSetupComplete;
    const setupCompletePromise = new Promise((resolve) => {
        resolveSetupComplete = resolve;
    });
    const state = {
        sessionPhase: { phase: "connecting" },
        recorder,
        transcriptParts: [],
        settleTimeoutId: undefined,
        debugLog,
        updateUI,
    };
    const session = await ai.live.connect({
        model: GEMINI_MODEL,
        config: liveConnectConfig,
        callbacks: {
            onopen: () => { },
            onmessage: (message) => {
                debugLog(`[MSG] keys: ${JSON.stringify(Object.keys(message))}\n`);
                if (message.setupComplete) {
                    debugLog("[MSG] setupComplete\n");
                    resolveSetupComplete();
                    return;
                }
                if (message.serverContent) {
                    debugLog(`[MSG] serverContent: ${JSON.stringify(message.serverContent)}\n`);
                }
                handleServerMessage(message, state);
            },
            onerror: (error) => {
                debugLog(`[ERR] ${error.message}\n`);
                process.stderr.write(`Live API error: ${error.message}\n`);
                shutdownGracefully(state);
            },
            onclose: (event) => {
                debugLog(`[CLOSE] code=${event.code} reason=${event.reason}\n`);
            },
        },
    });
    state.sessionPhase = { phase: "active", session };
    debugLog("[START] connected, waiting for setupComplete\n");
    await setupCompletePromise;
    debugLog("[START] setupComplete received, starting mic\n");
    updateUI("listening", 0, "");
    startMicrophoneCapture(state);
    process.on("SIGINT", () => {
        process.stderr.write("\nInterrupted.\n");
        shutdownGracefully(state);
    });
}
export const transcribeCommand = {
    command: "transcribe",
    describe: "Capture microphone audio and transcribe via Gemini Live API",
    builder: (argv) => argv
        .option("device", {
        alias: "d",
        type: "number",
        describe: "Audio input device index (use 'devices' command to list)",
    })
        .option("debug", {
        type: "boolean",
        default: false,
        describe: "Write debug logs to /tmp/gemini-debug.log",
    })
        .option("quiet", {
        alias: "q",
        type: "boolean",
        default: false,
        describe: "Suppress all UI output, only write transcript to stdout",
    }),
    handler: async (argv) => {
        await runTranscribe(argv.device, argv.debug, argv.quiet);
    },
};
//# sourceMappingURL=transcribe.js.map
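
Reviewer note: the shutdown logic in `handleServerMessage` amounts to a resettable settle timer. Every transcript fragment restarts the 2000 ms countdown, and a `finished` flag short-circuits it. Below is a minimal sketch of that pattern in isolation, with injectable timer functions so it can be exercised without real delays. `createSettleTimer`, `onFragment`, and the `timers` parameter are hypothetical names, not exports of this package.

```javascript
// Illustrative sketch of a resettable "settle" timer: a debounce with an
// early-exit path. Not part of the package.
function createSettleTimer(onSettle, delayMs, timers = {
    set: (fn, ms) => setTimeout(fn, ms),
    clear: (id) => clearTimeout(id),
}) {
    let timeoutId;
    let settled = false;

    function settle() {
        if (settled) return; // idempotent, like the "shutting_down" guard
        settled = true;
        if (timeoutId !== undefined) {
            timers.clear(timeoutId);
            timeoutId = undefined;
        }
        onSettle();
    }

    return {
        // Call once per transcript fragment; `finished` settles immediately.
        onFragment(finished = false) {
            if (settled) return;
            if (timeoutId !== undefined) {
                timers.clear(timeoutId); // restart the countdown
                timeoutId = undefined;
            }
            if (finished) {
                settle();
                return;
            }
            timeoutId = timers.set(settle, delayMs);
        },
    };
}

// Usage with real timers: three fragments arrive, then the stream goes quiet.
const parts = [];
const timer = createSettleTimer(() => console.log(parts.join("").trim()), 50);
for (const text of ["hello ", "wor", "ld"]) {
    parts.push(text);
    timer.onFragment();
}
```

The sketch leaves out one branch of the real handler: when `turnComplete` arrives with no transcript at all, the package also starts the countdown so that a silent session still exits.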
package/dist/commands/transcribe.js.map
ADDED
@@ -0,0 +1 @@
{"version":3,"file":"transcribe.js","sourceRoot":"","sources":["../../src/commands/transcribe.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,cAAc,EAAE,MAAM,SAAS,CAAC;AACzC,OAAO,EAAE,aAAa,EAAE,MAAM,aAAa,CAAC;AAM5C,OAAO,EAAE,WAAW,EAAE,QAAQ,EAAE,MAAM,eAAe,CAAC;AACtD,OAAO,EAAE,MAAM,EAAE,MAAM,KAAK,CAAC;AAC7B,OAAO,KAAK,MAAM,OAAO,CAAC;AAG1B,OAAO,EAAE,YAAY,EAAE,MAAM,+BAA+B,CAAC;AAC7D,OAAO,EAAE,eAAe,EAAE,MAAM,cAAc,CAAC;AAE/C,MAAM,OAAO,GAAG,aAAa,CAAC,MAAM,CAAC,IAAI,CAAC,GAAG,CAAC,CAAC;AAC/C,MAAM,EAAE,QAAQ,EAAE,GACjB,OAAO,CAAC,2BAA2B,CAA+C,CAAC;AAEpF,MAAM,YAAY,GAAG,+CAA+C,CAAC;AACrE,MAAM,eAAe,GAAG,sBAAsB,CAAC;AAE/C,+EAA+E;AAC/E,sDAAsD;AACtD,MAAM,4BAA4B,GAAG,IAAI,CAAC;AAE1C,MAAM,cAAc,GAAG,uBAAuB,CAAC;AAE/C,SAAS,iBAAiB,CAAC,OAAgB;IAC1C,IAAI,CAAC,OAAO;QAAE,OAAO,GAAG,EAAE,GAAE,CAAC,CAAC;IAC9B,OAAO,CAAC,OAAe,EAAE,EAAE,CAAC,cAAc,CAAC,cAAc,EAAE,OAAO,CAAC,CAAC;AACrE,CAAC;AAED,SAAS,SAAS;IACjB,MAAM,MAAM,GAAG,eAAe,EAAE,CAAC;IACjC,IAAI,CAAC,MAAM,EAAE,CAAC;QACb,OAAO,CAAC,MAAM,CAAC,KAAK,CACnB,mEAAmE,CACnE,CAAC;QACF,OAAO,CAAC,IAAI,CAAC,CAAC,CAAC,CAAC;IACjB,CAAC;IACD,OAAO,MAAM,CAAC;AACf,CAAC;AAED,SAAS,kBAAkB,CAAC,OAAmB;IAC9C,MAAM,KAAK,GAAG,IAAI,UAAU,CAC3B,OAAO,CAAC,MAAM,EACd,OAAO,CAAC,UAAU,EAClB,OAAO,CAAC,UAAU,CAClB,CAAC;IACF,OAAO,MAAM,CAAC,IAAI,CAAC,KAAK,CAAC,CAAC,QAAQ,CAAC,QAAQ,CAAC,CAAC;AAC9C,CAAC;AAED,SAAS,eAAe,CAAC,OAAmB;IAC3C,IAAI,OAAO,CAAC,MAAM,KAAK,CAAC;QAAE,OAAO,CAAC,CAAC;IACnC,IAAI,YAAY,GAAG,CAAC,CAAC;IACrB,KAAK,MAAM,MAAM,IAAI,OAAO,EAAE,CAAC;QAC9B,MAAM,UAAU,GAAG,MAAM,GAAG,KAAK,CAAC;QAClC,YAAY,IAAI,UAAU,GAAG,UAAU,CAAC;IACzC,CAAC;IACD,MAAM,GAAG,GAAG,IAAI,CAAC,IAAI,CAAC,YAAY,GAAG,OAAO,CAAC,MAAM,CAAC,CAAC;IACrD,+DAA+D;IAC/D,8DAA8D;IAC9D,MAAM,KAAK,GAAG,CAAC,EAAE,CAAC;IAClB,MAAM,KAAK,GAAG,CAAC,CAAC,CAAC;IACjB,MAAM,EAAE,GAAG,EAAE,GAAG,IAAI,CAAC,KAAK,CAAC,GAAG,IAAI,MAAM,CAAC,SAAS,CAAC,CAAC;IACpD,IAAI,EAAE,GAAG,KAAK;QAAE,OAAO,CAAC,CAAC;IACzB,OAAO,IAAI,CAAC,GAAG,CAAC,CAAC,EAAE,CAAC,EAAE,GAAG,KAAK,CAAC,GAAG,CAAC,KAAK,GAAG,KAAK,CAAC,CAAC,CAAC;AACpD,CAAC;AAoBD,MAAM,qBAAqB,GAAG,EAAE,CAAC,CAAC,SAAS;AA
E3C,SAAS,sBAAsB,CAAC,KAAoB;IACnD,IAAI,gBAAgB,GAAG,CAAC,CAAC;IAEzB,KAAK,CAAC,QAAQ,CAAC,KAAK,CAAC,CAAC,KAAmB,EAAE,OAAmB,EAAE,EAAE;QACjE,IAAI,KAAK,EAAE,CAAC;YACX,OAAO,CAAC,MAAM,CAAC,KAAK,CAAC,wBAAwB,KAAK,CAAC,OAAO,IAAI,CAAC,CAAC;YAChE,OAAO;QACR,CAAC;QACD,IAAI,KAAK,CAAC,YAAY,CAAC,KAAK,KAAK,QAAQ;YAAE,OAAO;QAElD,MAAM,UAAU,GAAG,eAAe,CAAC,OAAO,CAAC,CAAC;QAE5C,MAAM,GAAG,GAAG,IAAI,CAAC,GAAG,EAAE,CAAC;QACvB,IAAI,GAAG,GAAG,gBAAgB,IAAI,qBAAqB,EAAE,CAAC;YACrD,gBAAgB,GAAG,GAAG,CAAC;YACvB,MAAM,iBAAiB,GAAG,KAAK,CAAC,eAAe,CAAC,IAAI,CAAC,EAAE,CAAC,CAAC,IAAI,EAAE,CAAC;YAChE,KAAK,CAAC,QAAQ,CAAC,WAAW,EAAE,UAAU,EAAE,iBAAiB,CAAC,CAAC;QAC5D,CAAC;QAED,MAAM,WAAW,GAAG,kBAAkB,CAAC,OAAO,CAAC,CAAC;QAChD,KAAK,CAAC,QAAQ,CACb,eAAe,OAAO,CAAC,MAAM,QAAQ,UAAU,CAAC,OAAO,CAAC,CAAC,CAAC,IAAI,CAC9D,CAAC;QACF,KAAK,CAAC,YAAY,CAAC,OAAO,CAAC,iBAAiB,CAAC;YAC5C,KAAK,EAAE,EAAE,IAAI,EAAE,WAAW,EAAE,QAAQ,EAAE,eAAe,EAAE;SACvD,CAAC,CAAC;IACJ,CAAC,CAAC,CAAC;AACJ,CAAC;AAED,SAAS,mBAAmB,CAC3B,OAA0B,EAC1B,KAAoB;IAEpB,MAAM,aAAa,GAAG,OAAO,CAAC,aAAa,CAAC;IAC5C,IAAI,CAAC,aAAa;QAAE,OAAO;IAE3B,IAAI,aAAa,CAAC,kBAAkB,EAAE,IAAI,EAAE,CAAC;QAC5C,KAAK,CAAC,eAAe,CAAC,IAAI,CAAC,aAAa,CAAC,kBAAkB,CAAC,IAAI,CAAC,CAAC;QAClE,MAAM,iBAAiB,GAAG,KAAK,CAAC,eAAe,CAAC,IAAI,CAAC,EAAE,CAAC,CAAC,IAAI,EAAE,CAAC;QAChE,KAAK,CAAC,QAAQ,CAAC,WAAW,EAAE,CAAC,EAAE,iBAAiB,CAAC,CAAC;QAElD,IAAI,KAAK,CAAC,eAAe,EAAE,CAAC;YAC3B,YAAY,CAAC,KAAK,CAAC,eAAe,CAAC,CAAC;QACrC,CAAC;QAED,IAAI,aAAa,CAAC,kBAAkB,CAAC,QAAQ,EAAE,CAAC;YAC/C,kBAAkB,CAAC,KAAK,CAAC,CAAC;YAC1B,OAAO;QACR,CAAC;QAED,KAAK,CAAC,eAAe,GAAG,UAAU,CAAC,GAAG,EAAE;YACvC,kBAAkB,CAAC,KAAK,CAAC,CAAC;QAC3B,CAAC,EAAE,4BAA4B,CAAC,CAAC;IAClC,CAAC;IAED,IAAI,aAAa,CAAC,YAAY,IAAI,CAAC,aAAa,CAAC,SAAS,EAAE,CAAC;QAC5D,IAAI,CAAC,KAAK,CAAC,eAAe,IAAI,KAAK,CAAC,eAAe,CAAC,MAAM,KAAK,CAAC,EAAE,CAAC;YAClE,KAAK,CAAC,eAAe,GAAG,UAAU,CAAC,GAAG,EAAE;gBACvC,kBAAkB,CAAC,KAAK,CAAC,CAAC;YAC3B,CAAC,EAAE,4BAA4B,CAAC,CAAC;QAClC,CAAC;IACF,CAAC;AACF,CAAC;AAED,SAAS,kBAAkB,CAAC,KAAoB;IAC/C,IAAI,KAAK,CAAC,YAAY,CAAC,KAAK,KAAK,eAAe;QAAE,OAAO;IACzD,MAAM,OA
AO,GACZ,KAAK,CAAC,YAAY,CAAC,KAAK,KAAK,QAAQ,CAAC,CAAC,CAAC,KAAK,CAAC,YAAY,CAAC,OAAO,CAAC,CAAC,CAAC,IAAI,CAAC;IAC3E,KAAK,CAAC,YAAY,GAAG,EAAE,KAAK,EAAE,eAAe,EAAE,CAAC;IAEhD,IAAI,KAAK,CAAC,eAAe,EAAE,CAAC;QAC3B,YAAY,CAAC,KAAK,CAAC,eAAe,CAAC,CAAC;QACpC,KAAK,CAAC,eAAe,GAAG,SAAS,CAAC;IACnC,CAAC;IAED,KAAK,CAAC,QAAQ,CAAC,IAAI,EAAE,CAAC;IAEtB,MAAM,UAAU,GAAG,KAAK,CAAC,eAAe,CAAC,IAAI,CAAC,EAAE,CAAC,CAAC,IAAI,EAAE,CAAC;IACzD,IAAI,UAAU,CAAC,MAAM,GAAG,CAAC,EAAE,CAAC;QAC3B,KAAK,CAAC,QAAQ,CAAC,MAAM,EAAE,CAAC,EAAE,UAAU,CAAC,CAAC;QACtC,OAAO,CAAC,MAAM,CAAC,KAAK,CAAC,UAAU,CAAC,CAAC;QACjC,OAAO,CAAC,MAAM,CAAC,KAAK,CAAC,IAAI,CAAC,CAAC;IAC5B,CAAC;SAAM,CAAC;QACP,KAAK,CAAC,QAAQ,CAAC,MAAM,EAAE,CAAC,EAAE,qBAAqB,CAAC,CAAC;QACjD,OAAO,CAAC,MAAM,CAAC,KAAK,CAAC,uBAAuB,CAAC,CAAC;IAC/C,CAAC;IAED,OAAO,EAAE,KAAK,EAAE,CAAC;AAClB,CAAC;AAQD,KAAK,UAAU,aAAa,CAC3B,WAAoB,EACpB,KAAe,EACf,KAAe;IAEf,MAAM,QAAQ,GAAG,iBAAiB,CAAC,KAAK,IAAI,KAAK,CAAC,CAAC;IACnD,MAAM,MAAM,GAAG,SAAS,EAAE,CAAC;IAC3B,MAAM,EAAE,GAAG,IAAI,WAAW,CAAC,EAAE,MAAM,EAAE,CAAC,CAAC;IAEvC,MAAM,iBAAiB,GAAsB;QAC5C,kBAAkB,EAAE,CAAC,QAAQ,CAAC,KAAK,CAAC;QACpC,uBAAuB,EAAE,EAAE;QAC3B,mBAAmB,EAAE;YACpB,0BAA0B,EAAE;gBAC3B,QAAQ,EAAE,KAAK;gBACf,eAAe,EAAE,GAAG;gBACpB,iBAAiB,EAAE,GAAG;aACtB;SACD;QACD,iBAAiB,EAAE;YAClB,KAAK,EAAE;gBACN;oBACC,IAAI,EAAE,qHAAqH;iBAC3H;aACD;SACD;KACD,CAAC;IAEF,MAAM,QAAQ,GAAG,IAAI,QAAQ,CAAC,WAAW,IAAI,IAAI,CAAC,CAAC;IAEnD,IAAI,QAIK,CAAC;IAEV,IAAI,KAAK,EAAE,CAAC;QACX,QAAQ,GAAG,GAAG,EAAE,GAAE,CAAC,CAAC;IACrB,CAAC;SAAM,CAAC;QACP,IAAI,YAAY,GAAoB,YAAY,CAAC;QACjD,IAAI,iBAAiB,GAAG,CAAC,CAAC;QAC1B,IAAI,iBAAiB,GAAG,EAAE,CAAC;QAE3B,MAAM,EAAE,QAAQ,EAAE,OAAO,EAAE,GAAG,MAAM,CACnC,KAAK,CAAC,aAAa,CAAC,YAAY,EAAE;YACjC,KAAK,EAAE,YAAY;YACnB,UAAU,EAAE,iBAAiB;YAC7B,UAAU,EAAE,iBAAiB;SAC7B,CAAC,EACF,EAAE,MAAM,EAAE,OAAO,CAAC,MAAM,EAAE,CAC1B,CAAC;QAEF,QAAQ,GAAG,CACV,QAAyB,EACzB,UAAkB,EAClB,UAAkB,EACX,EAAE;YACT,YAAY,GAAG,QAAQ,CAAC;YACxB,iBAAiB,GAAG,UAAU,CAAC;YAC/B,iBAAiB,GAAG,UAAU,CAAC;YAC/B,QAAQ,CACP,KAAK,CAAC,aAAa,CAAC,YAAY,EAAE;gBACjC,KAAK,EAAE,YAAY;gBACnB,UAA
U,EAAE,iBAAiB;gBAC7B,UAAU,EAAE,iBAAiB;aAC7B,CAAC,CACF,CAAC;YACF,IAAI,QAAQ,KAAK,MAAM,EAAE,CAAC;gBACzB,OAAO,EAAE,CAAC;YACX,CAAC;QACF,CAAC,CAAC;IACH,CAAC;IAED,wEAAwE;IACxE,sEAAsE;IACtE,iDAAiD;IACjD,OAAO,CAAC,EAAE,CAAC,oBAAoB,EAAE,CAAC,MAAM,EAAE,EAAE;QAC3C,QAAQ,CAAC,eAAe,MAAM,CAAC,MAAM,CAAC,IAAI,CAAC,CAAC;IAC7C,CAAC,CAAC,CAAC;IAEH,QAAQ,CAAC,yBAAyB,CAAC,CAAC;IACpC,IAAI,oBAAgC,CAAC;IACrC,MAAM,oBAAoB,GAAG,IAAI,OAAO,CAAO,CAAC,OAAO,EAAE,EAAE;QAC1D,oBAAoB,GAAG,OAAO,CAAC;IAChC,CAAC,CAAC,CAAC;IAEH,MAAM,KAAK,GAAkB;QAC5B,YAAY,EAAE,EAAE,KAAK,EAAE,YAAY,EAAE;QACrC,QAAQ;QACR,eAAe,EAAE,EAAE;QACnB,eAAe,EAAE,SAAS;QAC1B,QAAQ;QACR,QAAQ;KACR,CAAC;IAEF,MAAM,OAAO,GAAG,MAAM,EAAE,CAAC,IAAI,CAAC,OAAO,CAAC;QACrC,KAAK,EAAE,YAAY;QACnB,MAAM,EAAE,iBAAiB;QACzB,SAAS,EAAE;YACV,MAAM,EAAE,GAAG,EAAE,GAAE,CAAC;YAChB,SAAS,EAAE,CAAC,OAA0B,EAAE,EAAE;gBACzC,QAAQ,CAAC,eAAe,IAAI,CAAC,SAAS,CAAC,MAAM,CAAC,IAAI,CAAC,OAAO,CAAC,CAAC,IAAI,CAAC,CAAC;gBAClE,IAAI,OAAO,CAAC,aAAa,EAAE,CAAC;oBAC3B,QAAQ,CAAC,uBAAuB,CAAC,CAAC;oBAClC,oBAAoB,EAAE,CAAC;oBACvB,OAAO;gBACR,CAAC;gBACD,IAAI,OAAO,CAAC,aAAa,EAAE,CAAC;oBAC3B,QAAQ,CACP,wBAAwB,IAAI,CAAC,SAAS,CAAC,OAAO,CAAC,aAAa,CAAC,IAAI,CACjE,CAAC;gBACH,CAAC;gBACD,mBAAmB,CAAC,OAAO,EAAE,KAAK,CAAC,CAAC;YACrC,CAAC;YACD,OAAO,EAAE,CAAC,KAAiB,EAAE,EAAE;gBAC9B,QAAQ,CAAC,SAAS,KAAK,CAAC,OAAO,IAAI,CAAC,CAAC;gBACrC,OAAO,CAAC,MAAM,CAAC,KAAK,CAAC,mBAAmB,KAAK,CAAC,OAAO,IAAI,CAAC,CAAC;gBAC3D,kBAAkB,CAAC,KAAK,CAAC,CAAC;YAC3B,CAAC;YACD,OAAO,EAAE,CAAC,KAAiB,EAAE,EAAE;gBAC9B,QAAQ,CAAC,gBAAgB,KAAK,CAAC,IAAI,WAAW,KAAK,CAAC,MAAM,IAAI,CAAC,CAAC;YACjE,CAAC;SACD;KACD,CAAC,CAAC;IAEH,KAAK,CAAC,YAAY,GAAG,EAAE,KAAK,EAAE,QAAQ,EAAE,OAAO,EAAE,CAAC;IAElD,QAAQ,CAAC,gDAAgD,CAAC,CAAC;IAC3D,MAAM,oBAAoB,CAAC;IAC3B,QAAQ,CAAC,gDAAgD,CAAC,CAAC;IAC3D,QAAQ,CAAC,WAAW,EAAE,CAAC,EAAE,EAAE,CAAC,CAAC;IAC7B,sBAAsB,CAAC,KAAK,CAAC,CAAC;IAE9B,OAAO,CAAC,EAAE,CAAC,QAAQ,EAAE,GAAG,EAAE;QACzB,OAAO,CAAC,MAAM,CAAC,KAAK,CAAC,kBAAkB,CAAC,CAAC;QACzC,kBAAkB,CAAC,KAAK,CAAC,CAAC;IAC3B,CAAC,CAAC,CAAC;AACJ,CAAC;AAED,MAAM,CAAC,MAAM,iBAAiB,GAA0C;IACvE,OAA
O,EAAE,YAAY;IACrB,QAAQ,EAAE,6DAA6D;IACvE,OAAO,EAAE,CAAC,IAAI,EAAE,EAAE,CACjB,IAAI;SACF,MAAM,CAAC,QAAQ,EAAE;QACjB,KAAK,EAAE,GAAG;QACV,IAAI,EAAE,QAAQ;QACd,QAAQ,EAAE,0DAA0D;KACpE,CAAC;SACD,MAAM,CAAC,OAAO,EAAE;QAChB,IAAI,EAAE,SAAS;QACf,OAAO,EAAE,KAAK;QACd,QAAQ,EAAE,2CAA2C;KACrD,CAAC;SACD,MAAM,CAAC,OAAO,EAAE;QAChB,KAAK,EAAE,GAAG;QACV,IAAI,EAAE,SAAS;QACf,OAAO,EAAE,KAAK;QACd,QAAQ,EAAE,yDAAyD;KACnE,CAAC;IACJ,OAAO,EAAE,KAAK,EAAE,IAAI,EAAE,EAAE;QACvB,MAAM,aAAa,CAAC,IAAI,CAAC,MAAM,EAAE,IAAI,CAAC,KAAK,EAAE,IAAI,CAAC,KAAK,CAAC,CAAC;IAC1D,CAAC;CACD,CAAC"}
|
|
@@ -0,0 +1,9 @@
+import type React from "react";
+export type TranscribeState = "connecting" | "listening" | "done";
+interface TranscribeUIProps {
+    state: TranscribeState;
+    audioLevel: number;
+    transcript: string;
+}
+export declare function TranscribeUI({ state, audioLevel, transcript, }: TranscribeUIProps): React.ReactElement;
+export {};
@@ -0,0 +1,56 @@
+import { jsx as _jsx, jsxs as _jsxs } from "react/jsx-runtime";
+import { Box, Text } from "ink";
+import Spinner from "ink-spinner";
+import { useRef } from "react";
+import { match } from "ts-pattern";
+const BAR_COUNT = 16;
+// Block elements ▁▂▃▄▅▆▇█ growing upward from the bottom of the cell.
+const BLOCKS = [
+    " ",
+    "\u2581",
+    "\u2582",
+    "\u2583",
+    "\u2584",
+    "\u2585",
+    "\u2586",
+    "\u2587",
+    "\u2588",
+];
+const MIN_BLOCK_INDEX = 1;
+function levelToBlock(level) {
+    const clamped = Math.max(0, Math.min(1, level));
+    const index = MIN_BLOCK_INDEX +
+        Math.round(clamped * (BLOCKS.length - 1 - MIN_BLOCK_INDEX));
+    return BLOCKS[index];
+}
+// Exponential smoothing factors: bars rise fast but fall slowly.
+const ATTACK_FACTOR = 0.85;
+const DECAY_FACTOR = 0.15;
+function AudioLevelMeter({ level }) {
+    const smoothedRef = useRef(new Array(BAR_COUNT).fill(0));
+    const phaseRef = useRef(Math.random() * Math.PI * 2);
+    // Advance phase based on level so the wave moves when there's audio.
+    phaseRef.current += level * 0.6 + 0.02;
+    const smoothedLevels = Array.from({ length: BAR_COUNT }, (_, i) => {
+        // Two overlapping sine waves at different frequencies create organic,
+        // spatially correlated variation — adjacent bars have similar heights.
+        const wave1 = Math.sin(phaseRef.current + i * 0.7) * 0.3;
+        const wave2 = Math.sin(phaseRef.current * 1.3 + i * 1.1) * 0.2;
+        const targetLevel = level + (wave1 + wave2) * level;
+        const previous = smoothedRef.current[i];
+        const factor = targetLevel > previous ? ATTACK_FACTOR : DECAY_FACTOR;
+        const smoothed = previous + factor * (targetLevel - previous);
+        smoothedRef.current[i] = smoothed;
+        return smoothed;
+    });
+    const row = smoothedLevels.map((l) => levelToBlock(l)).join(" ");
+    return _jsx(Text, { children: row });
+}
+export function TranscribeUI({ state, audioLevel, transcript, }) {
+    return match(state)
+        .with("connecting", () => (_jsx(Box, { children: _jsxs(Text, { color: "yellow", children: [_jsx(Spinner, { type: "dots" }), " Connecting to Gemini Live API..."] }) })))
+        .with("listening", () => (_jsxs(Box, { flexDirection: "column", children: [_jsx(Box, { children: _jsxs(Text, { color: "green", children: [_jsx(Spinner, { type: "dots" }), " Listening... (silence ends recording)"] }) }), _jsxs(Box, { children: [_jsx(Text, { children: " " }), _jsx(AudioLevelMeter, { level: audioLevel })] }), transcript.length > 0 && (_jsx(Box, { children: _jsx(Text, { dimColor: true, children: transcript }) }))] })))
+        .with("done", () => (_jsx(Box, { children: _jsxs(Text, { color: "green", children: ["✓ ", transcript] }) })))
+        .exhaustive();
+}
+//# sourceMappingURL=TranscribeUI.js.map
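The meter logic in `TranscribeUI.js` combines two ideas: an asymmetric exponential smoother (fast attack, slow decay) and a mapping from a 0..1 level onto block glyphs. The following is a minimal standalone sketch of both, not part of the package. The constants are copied from the hunk above, while `smoothStep` and `simulate` are hypothetical helper names introduced only for illustration, and the glyph table is stored as a string rather than an array.

```typescript
// Illustrative sketch only: smoothStep and simulate are hypothetical helpers,
// not functions exported by the package.
const BLOCKS = " \u2581\u2582\u2583\u2584\u2585\u2586\u2587\u2588";
const MIN_BLOCK_INDEX = 1; // index 0 (blank) is never used, so silence still draws a bar
const ATTACK_FACTOR = 0.85;
const DECAY_FACTOR = 0.15;

// Map a level to a glyph: clamp to [0, 1], then scale onto indices 1..8.
function levelToBlock(level: number): string {
    const clamped = Math.max(0, Math.min(1, level));
    const index =
        MIN_BLOCK_INDEX +
        Math.round(clamped * (BLOCKS.length - 1 - MIN_BLOCK_INDEX));
    return BLOCKS.charAt(index);
}

// One smoothing step: move most of the way toward a higher target,
// but only a small fraction of the way toward a lower one.
function smoothStep(previous: number, target: number): number {
    const factor = target > previous ? ATTACK_FACTOR : DECAY_FACTOR;
    return previous + factor * (target - previous);
}

// Feed a sequence of target levels through the smoother, starting from 0.
function simulate(targets: readonly number[]): number[] {
    let value = 0;
    return targets.map((target) => {
        value = smoothStep(value, target);
        return value;
    });
}

// Two loud frames followed by two silent ones: the bar jumps up, then sags.
const trace = simulate([1, 1, 0, 0]);
console.log(trace.map((v) => `${levelToBlock(v)} ${v.toFixed(3)}`).join("\n"));
```

Because the attack factor is much larger than the decay factor, a single loud frame closes 85% of the gap to the new level, while each silent frame removes only 15% of the current height. That asymmetry is what the comment "bars rise fast but fall slowly" refers to.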
@@ -0,0 +1 @@
|
|
|
1
|
+
{"version":3,"file":"TranscribeUI.js","sourceRoot":"","sources":["../../src/components/TranscribeUI.tsx"],"names":[],"mappings":";AAAA,OAAO,EAAE,GAAG,EAAE,IAAI,EAAE,MAAM,KAAK,CAAC;AAChC,OAAO,OAAO,MAAM,aAAa,CAAC;AAElC,OAAO,EAAE,MAAM,EAAE,MAAM,OAAO,CAAC;AAC/B,OAAO,EAAE,KAAK,EAAE,MAAM,YAAY,CAAC;AAUnC,MAAM,SAAS,GAAG,EAAE,CAAC;AAErB,sEAAsE;AACtE,MAAM,MAAM,GAAG;IACd,GAAG;IACH,QAAQ;IACR,QAAQ;IACR,QAAQ;IACR,QAAQ;IACR,QAAQ;IACR,QAAQ;IACR,QAAQ;IACR,QAAQ;CACR,CAAC;AAEF,MAAM,eAAe,GAAG,CAAC,CAAC;AAE1B,SAAS,YAAY,CAAC,KAAa;IAClC,MAAM,OAAO,GAAG,IAAI,CAAC,GAAG,CAAC,CAAC,EAAE,IAAI,CAAC,GAAG,CAAC,CAAC,EAAE,KAAK,CAAC,CAAC,CAAC;IAChD,MAAM,KAAK,GACV,eAAe;QACf,IAAI,CAAC,KAAK,CAAC,OAAO,GAAG,CAAC,MAAM,CAAC,MAAM,GAAG,CAAC,GAAG,eAAe,CAAC,CAAC,CAAC;IAC7D,OAAO,MAAM,CAAC,KAAK,CAAC,CAAC;AACtB,CAAC;AAED,iEAAiE;AACjE,MAAM,aAAa,GAAG,IAAI,CAAC;AAC3B,MAAM,YAAY,GAAG,IAAI,CAAC;AAE1B,SAAS,eAAe,CAAC,EAAE,KAAK,EAAqB;IACpD,MAAM,WAAW,GAAG,MAAM,CAAW,IAAI,KAAK,CAAC,SAAS,CAAC,CAAC,IAAI,CAAC,CAAC,CAAC,CAAC,CAAC;IACnE,MAAM,QAAQ,GAAG,MAAM,CAAC,IAAI,CAAC,MAAM,EAAE,GAAG,IAAI,CAAC,EAAE,GAAG,CAAC,CAAC,CAAC;IAErD,qEAAqE;IACrE,QAAQ,CAAC,OAAO,IAAI,KAAK,GAAG,GAAG,GAAG,IAAI,CAAC;IAEvC,MAAM,cAAc,GAAG,KAAK,CAAC,IAAI,CAAC,EAAE,MAAM,EAAE,SAAS,EAAE,EAAE,CAAC,CAAC,EAAE,CAAC,EAAE,EAAE;QACjE,sEAAsE;QACtE,uEAAuE;QACvE,MAAM,KAAK,GAAG,IAAI,CAAC,GAAG,CAAC,QAAQ,CAAC,OAAO,GAAG,CAAC,GAAG,GAAG,CAAC,GAAG,GAAG,CAAC;QACzD,MAAM,KAAK,GAAG,IAAI,CAAC,GAAG,CAAC,QAAQ,CAAC,OAAO,GAAG,GAAG,GAAG,CAAC,GAAG,GAAG,CAAC,GAAG,GAAG,CAAC;QAC/D,MAAM,WAAW,GAAG,KAAK,GAAG,CAAC,KAAK,GAAG,KAAK,CAAC,GAAG,KAAK,CAAC;QAEpD,MAAM,QAAQ,GAAG,WAAW,CAAC,OAAO,CAAC,CAAC,CAAC,CAAC;QACxC,MAAM,MAAM,GAAG,WAAW,GAAG,QAAQ,CAAC,CAAC,CAAC,aAAa,CAAC,CAAC,CAAC,YAAY,CAAC;QACrE,MAAM,QAAQ,GAAG,QAAQ,GAAG,MAAM,GAAG,CAAC,WAAW,GAAG,QAAQ,CAAC,CAAC;QAC9D,WAAW,CAAC,OAAO,CAAC,CAAC,CAAC,GAAG,QAAQ,CAAC;QAClC,OAAO,QAAQ,CAAC;IACjB,CAAC,CAAC,CAAC;IAEH,MAAM,GAAG,GAAG,cAAc,CAAC,GAAG,CAAC,CAAC,CAAC,EAAE,EAAE,CAAC,YAAY,CAAC,CAAC,CAAC,CAAC,CAAC,IAAI,CAAC,GAAG,CAAC,CAAC;IAEjE,OAAO,KAAC,IAAI,cAAE,GAAG,GAAQ,CAA
C;AAC3B,CAAC;AAED,MAAM,UAAU,YAAY,CAAC,EAC5B,KAAK,EACL,UAAU,EACV,UAAU,GACS;IACnB,OAAO,KAAK,CAAC,KAAK,CAAC;SACjB,IAAI,CAAC,YAAY,EAAE,GAAG,EAAE,CAAC,CACzB,KAAC,GAAG,cACH,MAAC,IAAI,IAAC,KAAK,EAAC,QAAQ,aACnB,KAAC,OAAO,IAAC,IAAI,EAAC,MAAM,GAAG,yCACjB,GACF,CACN,CAAC;SACD,IAAI,CAAC,WAAW,EAAE,GAAG,EAAE,CAAC,CACxB,MAAC,GAAG,IAAC,aAAa,EAAC,QAAQ,aAC1B,KAAC,GAAG,cACH,MAAC,IAAI,IAAC,KAAK,EAAC,OAAO,aAClB,KAAC,OAAO,IAAC,IAAI,EAAC,MAAM,GAAG,8CACjB,GACF,EACN,MAAC,GAAG,eACH,KAAC,IAAI,oBAAS,EACd,KAAC,eAAe,IAAC,KAAK,EAAE,UAAU,GAAI,IACjC,EACL,UAAU,CAAC,MAAM,GAAG,CAAC,IAAI,CACzB,KAAC,GAAG,cACH,KAAC,IAAI,IAAC,QAAQ,kBAAE,UAAU,GAAQ,GAC7B,CACN,IACI,CACN,CAAC;SACD,IAAI,CAAC,MAAM,EAAE,GAAG,EAAE,CAAC,CACnB,KAAC,GAAG,cACH,MAAC,IAAI,IAAC,KAAK,EAAC,OAAO,aACjB,IAAI,EACJ,UAAU,IACL,GACF,CACN,CAAC;SACD,UAAU,EAAE,CAAC;AAChB,CAAC"}
|
package/dist/config.d.ts
ADDED
package/dist/config.js
ADDED
|
@@ -0,0 +1,38 @@
+import { existsSync, mkdirSync, readFileSync, writeFileSync } from "node:fs";
+import { homedir } from "node:os";
+import { join } from "node:path";
+const CONFIG_DIR = join(homedir(), ".config", "gemini-voice");
+const CONFIG_FILE = join(CONFIG_DIR, "config.json");
+function readConfig() {
+    if (!existsSync(CONFIG_FILE))
+        return {};
+    try {
+        return JSON.parse(readFileSync(CONFIG_FILE, "utf-8"));
+    }
+    catch {
+        return {};
+    }
+}
+function writeConfig(config) {
+    mkdirSync(CONFIG_DIR, { recursive: true });
+    writeFileSync(CONFIG_FILE, `${JSON.stringify(config, null, "\t")}\n`, {
+        mode: 0o600,
+    });
+}
+export function getStoredApiKey() {
+    return readConfig().apiKey;
+}
+export function storeApiKey(apiKey) {
+    const config = readConfig();
+    config.apiKey = apiKey;
+    writeConfig(config);
+}
+export function clearApiKey() {
+    const config = readConfig();
+    delete config.apiKey;
+    writeConfig(config);
+}
+export function getConfigFilePath() {
+    return CONFIG_FILE;
+}
+//# sourceMappingURL=config.js.map
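The read/modify/write pattern in `config.js` can be exercised without touching a real `~/.config/gemini-voice/config.json`. The sketch below is an assumption-laden illustration rather than the package's API: `createConfigStore` is a hypothetical factory that takes the directory as a parameter so the same logic can run against a temporary folder.

```typescript
import {
    existsSync,
    mkdirSync,
    mkdtempSync,
    readFileSync,
    rmSync,
    writeFileSync,
} from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

interface Config {
    apiKey?: string;
}

// Hypothetical factory (not in the package): same logic as config.js,
// but with the config directory injected instead of hard-coded.
function createConfigStore(configDir: string) {
    const configFile = join(configDir, "config.json");

    function read(): Config {
        if (!existsSync(configFile)) return {};
        try {
            return JSON.parse(readFileSync(configFile, "utf-8")) as Config;
        } catch {
            // A corrupt or unreadable file is treated as an empty config.
            return {};
        }
    }

    function write(config: Config): void {
        mkdirSync(configDir, { recursive: true });
        // mode 0o600 requests owner-only read/write when the file is created.
        writeFileSync(configFile, `${JSON.stringify(config, null, "\t")}\n`, {
            mode: 0o600,
        });
    }

    return {
        configFile,
        getStoredApiKey: (): string | undefined => read().apiKey,
        storeApiKey: (apiKey: string): void => write({ ...read(), apiKey }),
        clearApiKey: (): void => {
            const config = read();
            delete config.apiKey;
            write(config);
        },
    };
}

// Usage against a throwaway directory.
const demoDir = mkdtempSync(join(tmpdir(), "gemini-voice-demo-"));
const store = createConfigStore(join(demoDir, "gemini-voice"));
store.storeApiKey("example-key");
console.log(store.getStoredApiKey());
store.clearApiKey();
console.log(store.getStoredApiKey());
rmSync(demoDir, { recursive: true, force: true });
```

One caveat worth keeping in mind when reading the hunk above: the `mode` option of `writeFileSync` takes effect only when the file is created, so a pre-existing config file with looser permissions keeps them, and on Windows the mode has little effect.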
@@ -0,0 +1 @@
|
|
|
1
|
+
{"version":3,"file":"config.js","sourceRoot":"","sources":["../src/config.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,UAAU,EAAE,SAAS,EAAE,YAAY,EAAE,aAAa,EAAE,MAAM,SAAS,CAAC;AAC7E,OAAO,EAAE,OAAO,EAAE,MAAM,SAAS,CAAC;AAClC,OAAO,EAAE,IAAI,EAAE,MAAM,WAAW,CAAC;AAEjC,MAAM,UAAU,GAAG,IAAI,CAAC,OAAO,EAAE,EAAE,SAAS,EAAE,cAAc,CAAC,CAAC;AAC9D,MAAM,WAAW,GAAG,IAAI,CAAC,UAAU,EAAE,aAAa,CAAC,CAAC;AAMpD,SAAS,UAAU;IAClB,IAAI,CAAC,UAAU,CAAC,WAAW,CAAC;QAAE,OAAO,EAAE,CAAC;IACxC,IAAI,CAAC;QACJ,OAAO,IAAI,CAAC,KAAK,CAAC,YAAY,CAAC,WAAW,EAAE,OAAO,CAAC,CAAW,CAAC;IACjE,CAAC;IAAC,MAAM,CAAC;QACR,OAAO,EAAE,CAAC;IACX,CAAC;AACF,CAAC;AAED,SAAS,WAAW,CAAC,MAAc;IAClC,SAAS,CAAC,UAAU,EAAE,EAAE,SAAS,EAAE,IAAI,EAAE,CAAC,CAAC;IAC3C,aAAa,CAAC,WAAW,EAAE,GAAG,IAAI,CAAC,SAAS,CAAC,MAAM,EAAE,IAAI,EAAE,IAAI,CAAC,IAAI,EAAE;QACrE,IAAI,EAAE,KAAK;KACX,CAAC,CAAC;AACJ,CAAC;AAED,MAAM,UAAU,eAAe;IAC9B,OAAO,UAAU,EAAE,CAAC,MAAM,CAAC;AAC5B,CAAC;AAED,MAAM,UAAU,WAAW,CAAC,MAAc;IACzC,MAAM,MAAM,GAAG,UAAU,EAAE,CAAC;IAC5B,MAAM,CAAC,MAAM,GAAG,MAAM,CAAC;IACvB,WAAW,CAAC,MAAM,CAAC,CAAC;AACrB,CAAC;AAED,MAAM,UAAU,WAAW;IAC1B,MAAM,MAAM,GAAG,UAAU,EAAE,CAAC;IAC5B,OAAO,MAAM,CAAC,MAAM,CAAC;IACrB,WAAW,CAAC,MAAM,CAAC,CAAC;AACrB,CAAC;AAED,MAAM,UAAU,iBAAiB;IAChC,OAAO,WAAW,CAAC;AACpB,CAAC"}
|
package/dist/index.d.ts
ADDED
|
@@ -0,0 +1 @@
|
|
|
1
|
+
export {};
|
package/dist/index.js
ADDED
|
@@ -0,0 +1 @@
|
|
|
1
|
+
{"version":3,"file":"index.js","sourceRoot":"","sources":["../src/index.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,CAAC"}
|
package/package.json
ADDED
|
@@ -0,0 +1,52 @@
+{
+  "name": "@kstonekuan/gemini-voice",
+  "version": "0.0.1",
+  "type": "module",
+  "bin": {
+    "gemini-voice": "dist/cli.js"
+  },
+  "main": "dist/index.js",
+  "types": "dist/index.d.ts",
+  "files": [
+    "dist",
+    "commands",
+    "gemini-extension.json"
+  ],
+  "repository": {
+    "type": "git",
+    "url": "https://github.com/kstonekuan/gemini-cli-voice-extension"
+  },
+  "publishConfig": {
+    "access": "public"
+  },
+  "dependencies": {
+    "@google/genai": "^1.45.0",
+    "@kstonekuan/audio-capture": "link:audio-capture",
+    "ink": "^6.8.0",
+    "ink-spinner": "^5.0.0",
+    "react": "^19.2.4",
+    "ts-pattern": "^5.9.0",
+    "yargs": "^18.0.0"
+  },
+  "devDependencies": {
+    "@biomejs/biome": "2.4.7",
+    "@napi-rs/cli": "^3.5.1",
+    "@types/node": "25.5.0",
+    "@types/react": "^19.2.14",
+    "@types/yargs": "^17.0.35",
+    "tsx": "^4.21.0",
+    "typescript": "5.9.3"
+  },
+  "scripts": {
+    "build:audio": "cd audio-capture && napi build --platform --release",
+    "build:ts": "tsc",
+    "build": "pnpm run build:audio && pnpm run build:ts",
+    "dev": "pnpm run build:audio && tsx src/cli.ts",
+    "lint": "biome check --write",
+    "typecheck": "tsc --noEmit",
+    "cargo:clippy": "cargo clippy --manifest-path audio-capture/Cargo.toml --all-targets --all-features",
+    "cargo:fmt": "cargo fmt --manifest-path audio-capture/Cargo.toml",
+    "cargo": "pnpm run cargo:clippy && pnpm run cargo:fmt",
+    "check": "pnpm run lint && pnpm run typecheck && pnpm run cargo"
+  }
+}