shmakk 1.1.0

package/docs/voice.md ADDED
@@ -0,0 +1,181 @@
1
+ # shmakk voice
2
+
3
+ Always-on speech-to-speech mode for shmakk. Speak naturally — shmakk listens, transcribes, responds, and reads its answer aloud. No push-to-talk, no hotkeys.
4
+
5
+ ## How it works
6
+
7
+ - **STT** — Whisper-base ONNX via `@huggingface/transformers`. Runs fully in-process, no Python, no server, no API key. Model (~75MB) auto-downloads on first use.
8
+ - **VAD** — `sox` silence detection. Recording starts when you speak, stops automatically after 1 second of silence. No button to push.
9
+ - **TTS** — Kokoro-82M ONNX via `kokoro-js`. Runs fully in-process. Model (~165MB) auto-downloads on first use. Sentences stream sentence-by-sentence so the first words play immediately.
10
+ - **Voice rotation** — All 28 Kokoro voices rotate on a deterministic daily schedule (changes every 2–5 hours, varied per day). Feels random, fully reproducible.
11
+
12
+ ## Requirements
13
+
14
+ ### System packages
15
+
16
+ **Arch / EndeavourOS:**
17
+ ```bash
18
+ sudo pacman -S sox
19
+ ```
20
+
21
+ **Debian / Ubuntu:**
22
+ ```bash
23
+ sudo apt install sox
24
+ ```
25
+
26
+ **macOS:**
27
+ ```bash
28
+ brew install sox
29
+ ```
30
+
31
+ Sox provides the `rec` command used for VAD-based microphone capture. On Linux, a working PulseAudio or PipeWire setup is also required (standard on modern desktop distributions).
32
+
33
+ ### Node.js optional dependencies
34
+
35
+ Voice deps are optional — base shmakk works without them.
36
+
37
+ ```bash
38
+ npm install --include=optional
39
+ ```
40
+
41
+ Or use the setup script which installs deps and runs a full preflight check:
42
+
43
+ ```bash
44
+ npm run setup:voice
45
+ ```
46
+
47
+ ## Usage
48
+
49
+ ```bash
50
+ shmakk --sts # speech-to-speech: always-on mic + TTS responses
51
+ shmakk --stt # mic input only, text responses
52
+ shmakk --tts # text input, spoken responses
53
+ ```
54
+
55
+ Just speak. shmakk will:
56
+ 1. Detect your voice via VAD
57
+ 2. Transcribe it (shown in cyan on stderr)
58
+ 3. Send it as input
59
+ 4. Speak the response aloud, sentence by sentence
60
+
61
+ ## Interrupting
62
+
63
+ Say any of these to stop TTS mid-sentence:
64
+
65
+ > stop · quiet · shut up · silence · enough · cancel
66
+
67
+ The current playback stops immediately and shmakk goes back to listening.
68
+
69
+ ## Tuning VAD for your microphone
70
+
71
+ The default settings work well for USB headsets with a clean noise floor. If speech is cut off or recordings don't stop, tune these env vars:
72
+
73
+ | Variable | Default | Description |
74
+ |----------|---------|-------------|
75
+ | `SHMAKK_VOICE_SILENCE_SEC` | `1.0` | Seconds of silence before stopping |
76
+ | `SHMAKK_VOICE_SILENCE_THRESHOLD` | `1%` | Amplitude threshold for silence |
77
+ | `SHMAKK_VOICE_SILENCE_START_SEC` | `0.5` | Seconds of sound before starting |
78
+ | `SHMAKK_VOICE_PAD_START_SEC` | `0.3` | Padding added to start of recording |
79
+ | `SHMAKK_VOICE_MAX_SEC` | `30` | Hard maximum recording duration |
80
+
81
+ Add to your `.env`:
82
+ ```bash
83
+ SHMAKK_VOICE_SILENCE_SEC=1.5
84
+ SHMAKK_VOICE_SILENCE_THRESHOLD=2%
85
+ ```
86
+
87
+ To find your microphone's noise floor:
88
+ ```bash
89
+ rec -q -r 16000 -c 1 /tmp/silence.wav trim 0 3 && sox /tmp/silence.wav -n stat 2>&1 | grep RMS
90
+ ```
91
+
92
+ Set `SHMAKK_VOICE_SILENCE_THRESHOLD` to roughly 3× the RMS amplitude percentage.
93
+
94
+ ## Voice settings
95
+
96
+ | Variable | Default | Description |
97
+ |----------|---------|-------------|
98
+ | `SHMAKK_TTS_VOICE` | *(scheduled)* | Pin a specific voice (e.g. `am_michael`) |
99
+ | `SHMAKK_TTS_DTYPE` | `fp16` | Model precision: `fp32`, `fp16`, `q8`, `q4` |
100
+
101
+ **Available voices (28 total):**
102
+
103
+ | ID | Language | Gender |
104
+ |----|----------|--------|
105
+ | `af_bella`, `af_sarah`, `af_sky`, `af_nicole`, `af_heart`, `af_aoede`, `af_river` | American English | Female |
106
+ | `am_adam`, `am_michael`, `am_echo`, `am_liam` | American English | Male |
107
+ | `bf_emma`, `bf_isabella` | British English | Female |
108
+ | `bm_george`, `bm_lewis`, `bm_daniel` | British English | Male |
109
+ | `jf_alpha`, `jf_gongitsune`, `jf_nezumi`, `jf_tebukuro` | Japanese | Female |
110
+ | `jm_kumo` | Japanese | Male |
111
+ | `zf_xiaobei`, `zf_xiaoni`, `zf_xiaoxiao`, `zf_xiaoyi` | Chinese | Female |
112
+ | `zm_yunjian`, `zm_yunxia` | Chinese | Male |
113
+
114
+ To see today's voice schedule:
115
+ ```bash
116
+ node -e "
117
+ const tts = require('./src/services/tts');
118
+ tts.listVoices().then(voices => {
119
+ const now = new Date();
120
+ const day = now.getFullYear() * 10000 + (now.getMonth()+1)*100 + now.getDate();
121
+ const daySeed = (day * 2654435761) >>> 0;
122
+ let t = 0, b = 0, seed = daySeed;
123
+ const ids = voices.map(v => v.id);
124
+ console.log('Today schedule:');
125
+ while (t < 1440) {
126
+ seed = (seed * 1664525 + 1013904223) >>> 0;
127
+ const mins = 120 + (seed % 180);
128
+ const voiceSeed = (daySeed ^ (b * 2246822519)) >>> 0;
129
+ const v = ids[voiceSeed % ids.length];
130
+ const h = String(Math.floor(t/60)).padStart(2,'0');
131
+ const m = String(t%60).padStart(2,'0');
132
+ console.log(h+':'+m, '->', v, '('+Math.round(mins/60*10)/10+'h)');
133
+ t += mins; b++;
134
+ }
135
+ });
136
+ "
137
+ ```
138
+
139
+ ## Language
140
+
141
+ STT defaults to English. Override:
142
+ ```bash
143
+ shmakk --sts --voice-language sv # Swedish
144
+ shmakk --sts --voice-language de # German
145
+ ```
146
+
147
+ Or set permanently:
148
+ ```bash
149
+ export SHMAKK_VOICE_LANGUAGE=en
150
+ ```
151
+
152
+ ## Troubleshooting
153
+
154
+ **Voice not detected / recording doesn't start**
155
+ ```bash
156
+ # Check mic level
157
+ rec -q -r 16000 -c 1 /tmp/test.wav trim 0 3 && sox /tmp/test.wav -n stat 2>&1 | grep RMS
158
+ # Lower the start threshold if RMS is low
159
+ export SHMAKK_VOICE_SILENCE_THRESHOLD=0.5%
160
+ ```
161
+
162
+ **Recording doesn't stop**
163
+ ```bash
164
+ # Raise the stop threshold — background noise is above it
165
+ export SHMAKK_VOICE_SILENCE_THRESHOLD=3%
166
+ ```
167
+
168
+ **No TTS sound**
169
+ ```bash
170
+ # Check player
171
+ which paplay aplay
172
+ pactl info
173
+ ```
174
+
175
+ **Slow first response**
176
+ Models download on first use. After that they're cached in `~/.cache/huggingface` (set `SHMAKK_HF_CACHE` to use a different directory), and subsequent starts load from cache in seconds.
177
+
178
+ **Run the full preflight check:**
179
+ ```bash
180
+ npm run setup:voice
181
+ ```
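The VAD tuning section of `docs/voice.md` above suggests setting `SHMAKK_VOICE_SILENCE_THRESHOLD` to roughly 3× the RMS amplitude of a silent recording. A minimal sketch of that arithmetic follows. It assumes the `RMS amplitude:` line printed by `sox ... -n stat` is a fraction of full scale (0 to 1). The helper name `suggestThreshold` and the round-up to one decimal place are illustrative choices, not part of shmakk.

```javascript
// Hypothetical helper, not part of shmakk: turns `sox ... -n stat` output
// into a suggested SHMAKK_VOICE_SILENCE_THRESHOLD value.
function suggestThreshold(statOutput, factor = 3) {
  // sox prints e.g. "RMS     amplitude:     0.002345" (assumed fraction of full scale).
  const match = /RMS\s+amplitude:\s+([0-9.]+)/.exec(statOutput);
  if (!match) return null;
  const rmsPercent = parseFloat(match[1]) * 100;
  // Round up to one decimal so the threshold stays above the measured noise floor.
  const threshold = Math.ceil(rmsPercent * factor * 10) / 10;
  return `${threshold}%`;
}

// The two lines that `grep RMS` keeps from the stat output.
const sample = [
  'RMS     amplitude:     0.002345',
  'RMS     delta:         0.000345',
].join('\n');

console.log(suggestThreshold(sample));
```

The printed value can be used directly as the `SHMAKK_VOICE_SILENCE_THRESHOLD` entry in `.env`. If recordings still fail to stop, raising `factor` is the knob to try first.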
package/package.json ADDED
@@ -0,0 +1,58 @@
1
+ {
2
+ "name": "shmakk",
3
+ "version": "1.1.0",
4
+ "description": "AI-supervised terminal wrapper — command correction, tool-driven tasks, safety controls",
5
+ "license": "MIT",
6
+ "keywords": [
7
+ "terminal",
8
+ "ai",
9
+ "pty",
10
+ "developer-tools",
11
+ "cli",
12
+ "voice",
13
+ "speech-to-text",
14
+ "text-to-speech"
15
+ ],
16
+ "bin": {
17
+ "shmakk": "bin/shmakk.js"
18
+ },
19
+ "files": [
20
+ "bin/",
21
+ "src/",
22
+ "scripts/",
23
+ "docs/",
24
+ "README.md",
25
+ "LICENSE",
26
+ ".env.example"
27
+ ],
28
+ "main": "src/index.js",
29
+ "type": "commonjs",
30
+ "scripts": {
31
+ "postinstall": "node scripts/patch-onnxruntime.js",
32
+ "start": "node bin/shmakk.js",
33
+ "dev": "node bin/shmakk.js --debug",
34
+ "test": "node test/units.js",
35
+ "check": "node -e \"require('./src/index'); require('./src/agent'); require('./src/orchestrator'); console.log('check-ok')\"",
36
+ "mock-llm": "node test/mock-llm.js",
37
+ "global:setup": "node src/global-setup.js",
38
+ "global:link": "npm link && npm run global:setup",
39
+ "global:unlink": "npm unlink -g shmakk",
40
+ "global:install": "npm install -g . && npm run global:setup",
41
+ "global:reinstall": "npm uninstall -g shmakk && npm install -g . && npm run global:setup",
42
+ "setup": "npm install && npm run check && npm run test",
43
+ "setup:voice": "npm install --include=optional && node src/setup-voice.js",
44
+ "global:doctor": "node src/global-doctor.js"
45
+ },
46
+ "engines": {
47
+ "node": ">=18"
48
+ },
49
+ "dependencies": {
50
+ "node-pty": "^1.0.0",
51
+ "openai": "^4.77.0",
52
+ "wavefile": "^11.0.0"
53
+ },
54
+ "optionalDependencies": {
55
+ "@huggingface/transformers": "^4.2.0",
56
+ "kokoro-js": "^1.2.1"
57
+ }
58
+ }
package/scripts/patch-onnxruntime.js ADDED
@@ -0,0 +1,82 @@
1
+ #!/usr/bin/env node
2
+ /**
3
+ * Patches the kokoro-js nested onnxruntime-node so its SONAME doesn't conflict
4
+ * with the project-level onnxruntime-node (@huggingface/transformers).
5
+ *
6
+ * Problem:
7
+ * - @huggingface/transformers → onnxruntime-node 1.24.3 (napi-v6)
8
+ * - kokoro-js → @huggingface/transformers 3.x → onnxruntime-node 1.21.0 (napi-v3)
9
+ * - Both ship libonnxruntime.so.1 with the same SONAME
10
+ * - Whichever loads first "wins"; the second fails with symbol version errors
11
+ *
12
+ * Fix:
13
+ * - Rename SONAME of the napi-v3 lib to libkokoro_ort.so.1
14
+ * - Update the napi-v3 binding.node's NEEDED reference accordingly
15
+ */
16
+
17
+ const { execFileSync, execSync } = require('child_process');
18
+ const fs = require('fs');
19
+ const path = require('path');
20
+
21
+ const KOKORO_ORT_DIR = path.join(
22
+ __dirname, '..', 'node_modules', 'kokoro-js', 'node_modules',
23
+ 'onnxruntime-node', 'bin', 'napi-v3', 'linux', 'x64'
24
+ );
25
+
26
+ const ORIG_SO = 'libonnxruntime.so.1';
27
+ const NEW_SO = 'libkokoro_ort.so.1';
28
+
29
+ function patchelf(...args) {
30
+ return execFileSync('patchelf', args, { encoding: 'utf8', stdio: 'pipe' }); // no shell, so paths containing spaces stay intact
31
+ }
32
+
33
+ function main() {
34
+ // Check if patchelf is available
35
+ try {
36
+ execSync('which patchelf', { stdio: 'ignore' });
37
+ } catch {
38
+ console.error('[shmakk] patchelf not found. Install it for voice+TTS coexistence.');
39
+ console.error(' pacman -S patchelf # Arch');
40
+ console.error(' apt install patchelf # Debian/Ubuntu');
41
+ console.error(' brew install patchelf # macOS');
42
+ process.exit(0);
43
+ }
44
+
45
+ if (!fs.existsSync(KOKORO_ORT_DIR)) {
46
+ // kokoro-js or its onnxruntime-node not installed — nothing to patch
47
+ return;
48
+ }
49
+
50
+ const soPath = path.join(KOKORO_ORT_DIR, ORIG_SO);
51
+ const newSoPath = path.join(KOKORO_ORT_DIR, NEW_SO);
52
+ const bindingPath = path.join(KOKORO_ORT_DIR, 'onnxruntime_binding.node');
53
+
54
+ // Already patched?
55
+ if (fs.existsSync(newSoPath)) {
56
+ // Verify it was done correctly
57
+ const soname = execSync(`patchelf --print-soname "${newSoPath}"`, { encoding: 'utf8' }).trim();
58
+ if (soname === NEW_SO) {
59
+ return; // Already patched, nothing to do
60
+ }
61
+ // Otherwise, re-apply from scratch
62
+ fs.unlinkSync(newSoPath);
63
+ }
64
+
65
+ if (!fs.existsSync(soPath)) {
66
+ console.error('[shmakk] Expected onnxruntime library not found:', soPath);
67
+ process.exit(1);
68
+ }
69
+
70
+ // 1. Change SONAME of the .so file
71
+ patchelf('--set-soname', NEW_SO, soPath);
72
+
73
+ // 2. Rename the file
74
+ fs.renameSync(soPath, newSoPath);
75
+
76
+ // 3. Update the binding.node's NEEDED reference
77
+ patchelf('--replace-needed', ORIG_SO, NEW_SO, bindingPath);
78
+
79
+ console.log('[shmakk] Patched kokoro-js onnxruntime SONAME →', NEW_SO);
80
+ }
81
+
82
+ main();
package/src/agent.js ADDED
Binary file
package/src/audit.js ADDED
@@ -0,0 +1,18 @@
1
+ const fs = require('fs');
2
+ const os = require('os');
3
+ const path = require('path');
4
+
5
+ function logPath() {
6
+ const base = process.env.XDG_STATE_HOME || path.join(os.homedir(), '.local', 'state');
7
+ return path.join(base, 'shmakk', 'audit.log');
8
+ }
9
+
10
+ function append(entry) {
11
+ try {
12
+ const p = logPath();
13
+ fs.mkdirSync(path.dirname(p), { recursive: true });
14
+ fs.appendFileSync(p, JSON.stringify({ t: new Date().toISOString(), ...entry }) + '\n');
15
+ } catch { /* never let audit failures bubble */ }
16
+ }
17
+
18
+ module.exports = { append, logPath };
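`append` above writes one JSON object per line, each carrying an ISO timestamp in `t`. A sketch of reading such a log back is shown below. `readAudit` is a hypothetical helper that the package does not export, and the `kind` field in the demo entry is made up for illustration.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical reader for a JSON-lines audit log (one object per line).
function readAudit(file) {
  let text;
  try {
    text = fs.readFileSync(file, 'utf8');
  } catch {
    return []; // a missing or unreadable log is treated as empty
  }
  const entries = [];
  for (const line of text.split('\n')) {
    if (!line.trim()) continue;
    try {
      entries.push(JSON.parse(line));
    } catch {
      // skip a partially written or corrupt line instead of failing the whole read
    }
  }
  return entries;
}

// Demo against a throwaway file, written the same way audit.js writes entries.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), 'audit-'));
const demoFile = path.join(demoDir, 'audit.log');
fs.appendFileSync(demoFile, JSON.stringify({ t: new Date().toISOString(), kind: 'demo' }) + '\n');
fs.appendFileSync(demoFile, '{"truncated":\n'); // simulates an interrupted write
const demoEntries = readAudit(demoFile);
console.log(demoEntries.length, demoEntries[0].kind);
```

Skipping unparseable lines mirrors the writer's own policy of never letting audit failures bubble up. A stricter reader could collect the bad lines and report them instead.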
package/src/cli.js ADDED
@@ -0,0 +1,177 @@
1
+ function parseArgs(argv) {
2
+ const opts = {
3
+ review: false,
4
+ yesFiles: false,
5
+ updateGlossary: false,
6
+ help: false,
7
+ debug: false,
8
+ workspace: null,
9
+ noAi: false,
10
+ noCorrection: false,
11
+ printConfig: false,
12
+ status: false,
13
+ buildHistory: null,
14
+ stats: false,
15
+ compact: false,
16
+ loadSkill: null,
17
+ listSkills: false,
18
+ skillStatus: false,
19
+ unloadSkill: null,
20
+ installSkill: null,
21
+ resumeStatus: false,
22
+ exitNow: false,
23
+ restart: false,
24
+ profile: null,
25
+ profileSet: null,
26
+ colors: null,
27
+ endpoint: null,
28
+ voice: false,
29
+ stt: false,
30
+ tts: false,
31
+ sts: false,
32
+ voiceLanguage: null,
33
+ voiceMaxDuration: null,
34
+ voiceSilenceSec: null,
35
+ voiceSilenceThreshold: null,
36
+ voiceSilenceStartSec: null,
37
+ voicePadStartSec: null,
38
+ ttsVoice: null,
39
+ completion: null,
40
+ unknown: [],
41
+ };
42
+
43
+ for (let i = 0; i < argv.length; i++) {
44
+ const a = argv[i];
45
+ switch (a) {
46
+ case '--review': opts.review = true; break;
47
+ case '--yes-files': opts.yesFiles = true; break;
48
+ case '--update-command-glossary': opts.updateGlossary = true; break;
49
+ case '-h':
50
+ case '--help': opts.help = true; break;
51
+ case '--debug': opts.debug = true; break;
52
+ case '--no-ai': opts.noAi = true; break;
53
+ case '--no-correction': opts.noCorrection = true; break;
54
+ case '--print-config': opts.printConfig = true; break;
55
+ case '--workspace': opts.workspace = argv[++i] || null; break;
56
+ case '--status': opts.status = true; break;
57
+ case '--stats': opts.stats = true; break;
58
+ case '--compact': opts.compact = true; break;
59
+ case '--load-skill': opts.loadSkill = argv[++i] || null; break;
60
+ case '--list-skills': opts.listSkills = true; break;
61
+ case '--skill-status': opts.skillStatus = true; break;
62
+ case '--unload-skill': opts.unloadSkill = argv[++i] || null; break;
63
+ case '--install-skill': opts.installSkill = argv[++i] || null; break;
64
+ case '--resume-status': opts.resumeStatus = true; break;
65
+ case '--exit': opts.exitNow = true; break;
66
+ case '--restart': opts.restart = true; break;
67
+ case '--reset': opts.reset = true; break;
68
+ case '--profile': opts.profile = argv[++i] || null; break;
69
+ case '--profile-set': opts.profileSet = argv[++i] || null; break;
70
+ case '--build-history':
71
+ opts.buildHistory = [];
72
+ // Collect remaining args as file paths until next flag
73
+ while (i + 1 < argv.length && !argv[i + 1].startsWith('--')) {
74
+ opts.buildHistory.push(argv[++i]);
75
+ }
76
+ if (!opts.buildHistory.length) opts.buildHistory = null; // flag with no files = auto-detect
77
+ break;
78
+ case '--stt': opts.stt = true; opts.voice = true; break;
79
+ case '--tts': opts.tts = true; break;
80
+ case '--sts': opts.sts = true; opts.stt = true; opts.tts = true; opts.voice = true; break;
81
+ case '--voice': opts.stt = true; opts.voice = true; break;
82
+ case '--voice-language': opts.voiceLanguage = argv[++i] || null; break;
83
+ case '--voice-max-sec': opts.voiceMaxDuration = parseInt(argv[++i], 10) || null; break;
84
+ case '--voice-silence-sec': opts.voiceSilenceSec = argv[++i] || null; break;
85
+ case '--voice-silence-threshold': opts.voiceSilenceThreshold = argv[++i] || null; break;
86
+ case '--voice-silence-start-sec': opts.voiceSilenceStartSec = argv[++i] || null; break;
87
+ case '--voice-pad-start-sec': opts.voicePadStartSec = argv[++i] || null; break;
88
+ case '--tts-voice': opts.ttsVoice = argv[++i] || null; break;
89
+ case '--completion': opts.completion = argv[++i] || null; break;
90
+ case '--colors': opts.colors = argv[++i] || null; break;
91
+ case '--endpoint': opts.endpoint = argv[++i] || null; break;
92
+ default: opts.unknown.push(a);
93
+ }
94
+ }
95
+ return opts;
96
+ }
97
+
98
+ const HELP = `shmakk - AI-supervised terminal wrapper
99
+
100
+ Usage:
101
+ shmakk Launch in auto mode
102
+ shmakk --review Launch in review mode (confirm every AI action)
103
+ shmakk --yes-files Auto-accept AI file writes, edits, and directory creation
104
+ shmakk --update-command-glossary
105
+ Scan PATH and build local command glossary
106
+ shmakk --help Show this help
107
+ shmakk --build-history [files...]
108
+ Parse shell history files and build command
109
+ frequency map for better corrections.
110
+ Auto-detects bash/zsh/fish history if no
111
+ files given.
112
+
113
+ Control (run from inside an shmakk session):
114
+ shmakk --status Show whether this terminal is inside shmakk
115
+ shmakk --stats Show session/task stats (journal, audit, active skill)
116
+ shmakk --compact Compact context by clearing conversation + task journal
117
+ shmakk --load-skill <name> Load a Claude/Codex-style skill into shmakk workspace state
118
+ shmakk --list-skills List registered local skills
119
+ shmakk --skill-status Show active skill and registry status
120
+ shmakk --unload-skill <name> Remove skill from registry/local cache
121
+ shmakk --install-skill <url> Download skill markdown from URL, validate, and load
122
+ shmakk --resume-status Show task journal summary for resume continuity
123
+ shmakk --exit Cleanly exit the parent shmakk
124
+ shmakk --restart Restart the inner shell (preserves window)
125
+ shmakk --reset Clear the AI conversation history (keep session)
126
+ shmakk --profile-set <name> Switch profile and restart (tiny|balanced|deep|builder|large-app)
127
+ shmakk --colors <true|false> Enable or disable ANSI colors + code highlighting
128
+
129
+ Optional:
130
+ --no-ai Disable AI entirely (pure passthrough)
131
+ --no-correction Disable command correction
132
+ --yes-files Auto-accept write_file, edit_file, and make_dir in auto mode
133
+ --workspace <path> Override workspace root
134
+ --profile <name> Startup profile: tiny|balanced|deep|builder|large-app
135
+ --endpoint <name> Use endpoint preset from .shmakk/endpoints.json
136
+ --colors <true|false> Toggle colored logs and code-block highlighting
137
+ --debug Verbose logging to stderr
138
+ --print-config Print resolved configuration and exit
139
+
140
+ Speech-to-Text / Text-to-Speech (VAD-based, no hotkeys):
141
+ --sts Speech-to-Speech: always-on mic + TTS responses
142
+ --stt Speech-to-Text: mic → text input (no TTS)
143
+ --tts Text-to-Speech: text input → spoken responses
144
+ --voice-language <code> Language hint (e.g., en, es, fr)
145
+ --voice-max-sec <sec> Max recording duration (default: 30)
146
+ --voice-silence-sec <sec> VAD silence before stopping (default: 1.0)
147
+ --voice-silence-threshold <%> VAD amplitude threshold (default: 1%)
148
+ --voice-silence-start-sec <sec> Seconds of sound before starting (default: 0.5)
149
+ --voice-pad-start-sec <sec> Padding added to start of recording (default: 0.3)
150
+ --tts-voice <name> Override rotated voice schedule (default: af_heart)
151
+ --completion <bash|zsh|fish> Output shell tab-completion script
152
+
153
+ Voice uses Whisper-base ONNX in-process. No Python, no server, no API key.
154
+ Model auto-downloads on first use.
155
+
156
+ TTS uses kokoro-js (Kokoro-82M ONNX, ~165MB fp16). Model auto-downloads on first use.
157
+ Requires: aplay, paplay, or afplay for audio playback.
158
+ All 28 Kokoro voices rotate automatically on a daily schedule.
159
+
160
+ Voice environment:
161
+ SHMAKK_HF_CACHE HuggingFace cache directory override
162
+ SHMAKK_TTS_VOICE Pin a specific TTS voice (default: auto-rotated)
163
+ SHMAKK_TTS_DTYPE Kokoro dtype: fp32, fp16, q8, q4, q4f16 (default: fp16)
164
+ SHMAKK_VOICE_LANGUAGE Language hint for STT (e.g., en, es, fr)
165
+ SHMAKK_VOICE_MAX_SEC Max recording seconds (default: 30)
166
+ SHMAKK_VOICE_SILENCE_SEC VAD silence threshold seconds (default: 1.0)
167
+ SHMAKK_VOICE_SILENCE_THRESHOLD VAD amplitude threshold (default: 1%)
168
+ SHMAKK_VOICE_PAD_START_SEC Padding added to start of recording (default: 0.3)
169
+
170
+ Environment:
171
+ SHMAKK_BASE_URL OpenAI-compatible base URL
172
+ SHMAKK_API_KEY API key
173
+ SHMAKK_MODEL Default model
174
+ SHMAKK_HEADERS Comma-separated extra headers (k=v,k=v)
175
+ `;
176
+
177
+ module.exports = { parseArgs, HELP };
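`parseArgs` leaves the VAD flag values as strings (or `null` when a flag is absent), and the help text lists a matching environment variable for most of them. The diff does not show where the two are merged, so the sketch below is only one plausible way to do it. The precedence (flag, then environment, then documented default) is an assumption, and `resolveVoiceConfig`, `firstNumber`, and `VOICE_DEFAULTS` are hypothetical names.

```javascript
// Defaults as listed in the docs/voice.md tuning table.
const VOICE_DEFAULTS = {
  silenceSec: 1.0,
  silenceThreshold: '1%',
  silenceStartSec: 0.5,
  padStartSec: 0.3,
  maxSec: 30,
};

// Returns the first candidate that is set and parses as a finite number.
function firstNumber(...candidates) {
  for (const c of candidates) {
    if (c === null || c === undefined || c === '') continue;
    const n = Number(c);
    if (Number.isFinite(n)) return n;
  }
  return undefined;
}

// Hypothetical resolver: flag value first, then environment, then default (assumed order).
function resolveVoiceConfig(opts, env = process.env) {
  return {
    silenceSec: firstNumber(opts.voiceSilenceSec, env.SHMAKK_VOICE_SILENCE_SEC, VOICE_DEFAULTS.silenceSec),
    silenceThreshold: opts.voiceSilenceThreshold || env.SHMAKK_VOICE_SILENCE_THRESHOLD || VOICE_DEFAULTS.silenceThreshold,
    silenceStartSec: firstNumber(opts.voiceSilenceStartSec, env.SHMAKK_VOICE_SILENCE_START_SEC, VOICE_DEFAULTS.silenceStartSec),
    padStartSec: firstNumber(opts.voicePadStartSec, env.SHMAKK_VOICE_PAD_START_SEC, VOICE_DEFAULTS.padStartSec),
    maxSec: firstNumber(opts.voiceMaxDuration, env.SHMAKK_VOICE_MAX_SEC, VOICE_DEFAULTS.maxSec),
  };
}

// Shape mirrors what parseArgs returns for: --sts --voice-silence-sec 1.5
const demoOpts = { voiceSilenceSec: '1.5', voiceSilenceThreshold: null, voiceMaxDuration: null };
const demoConfig = resolveVoiceConfig(demoOpts, { SHMAKK_VOICE_SILENCE_THRESHOLD: '2%' });
console.log(demoConfig);
```

Converting to numbers in one place means a non-numeric value such as `--voice-silence-sec abc` falls through to the next source instead of being passed on as a string.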