twokey 1.0.2 → 1.0.4

Files changed (3)
  1. package/README.md +124 -145
  2. package/bin/twokey.js +73 -4
  3. package/package.json +3 -3
package/README.md CHANGED
@@ -1,209 +1,188 @@
  # TwoKey Linux AI Assistant
 
- TwoKey is a Linux-first desktop assistant prototype. The goal is a small always-available overlay that will later let users hold two keys, speak, and have the result processed by local or online AI providers without switching to a browser or chat window.
-
- Phase 1 implements the foundation only:
-
- - Tauri + React project structure
- - Minimal dark overlay pill
- - Dummy modes: Conversation, Edit Text, Dictation, Feedback
- - Clickable mode menu
- - Placeholder settings window
- - Initial architecture and product documentation
- - X11 Phase 2 prototype: hold `Ctrl+Space` to record audio, double tap `Ctrl+Space` to cycle modes
- - Phase 3 prototype: mock STT and external STT command adapter
- - Phase 4 prototype: local Ollama conversation mode with `qwen2.5:3b`
-
- No TTS or text injection is implemented yet.
-
- ## Installation
+ TwoKey is a Linux-first desktop AI assistant with a small floating pill overlay.
+ The core idea matches the video workflow:
+
+ - hold a global hotkey, speak, release
+ - audio is recorded and transcribed
+ - transcript is sent to the selected AI provider
+ - result is used directly in your current desktop workflow
+ - optional TTS reads the answer out loud
+
+ ## What Works Now
+
+ - Global hold hotkey on X11 for voice capture.
+ - Double-tap hotkey to cycle modes.
+ - Single-tap hotkey to open file-context picker.
+ - Voice-triggered toolchains for multi-step desktop actions.
+ - Four modes:
+   - Conversation
+   - Edit Text
+   - Dictation
+   - Feedback
+ - Conversation mode:
+   - transcript -> provider answer
+   - optional TTS playback
+ - Edit mode:
+   - read selected text
+   - apply spoken transform via AI
+   - replace selected text directly
+ - Dictation mode:
+   - insert transcript at cursor
+ - Feedback mode:
+   - local feedback persistence in history DB
+ - File context:
+   - TXT/MD/JSON/CSV/PDF content context
+   - image context routing to vision-capable providers
+ - Provider routing:
+   - local Ollama
+   - OpenAI-compatible
+   - OpenRouter-compatible
+   - secure API key storage via local keyring
+ - Local audit/history persistence in SQLite.
+ - Tray icon and settings window.
+ - GitHub release check from settings.
+ - In-app AppImage update download and launch.
+
+ ## Current Video Parity
+
+ Implemented from video behavior:
+
+ - Minimal floating pill UI.
+ - Hold-to-talk workflow.
+ - Mode switch via double tap.
+ - Text workflow without browser/chat-tab context switch.
+ - File attach + ask flow.
+ - Hybrid local/online providers.
+ - Optional TTS answer playback.
+
+ Still not fully equivalent to the video vision:
+
+ - Toolchains are implemented, but no visual workflow builder exists yet.
+ - Wayland still has compositor-specific limits for global hold hotkeys and full automation.
+ - Update install is available for AppImage, but no signed rollback-capable updater pipeline yet.
+
+ ## Install
 
  ```bash
  npm install twokey
  ```
 
- Start the tool directly:
+ Run:
 
  ```bash
  twokey
  ```
 
- Default behavior: start the native desktop app in background.
+ Default behavior:
 
- Useful CLI options:
+ - starts native desktop app in background
+ - if no native binary is installed, tries to download latest AppImage release
+
+ Useful options:
 
  ```bash
  twokey --help
  twokey --cli
- twokey --once "Erklaere kurz den Unterschied zwischen X11 und Wayland"
+ twokey --once "Erklaere X11 vs Wayland kurz"
  twokey --desktop
  ```
 
- ## Minimal Usage
-
- ```ts
- import { getPackageInfo } from "twokey";
-
- const info = getPackageInfo();
- console.log(info.name);
- console.log(info.runtimeStatus.waylandGlobalHotkeys);
- ```
-
- The package exports runtime status metadata. Current status includes planned/limited areas such as Wayland global hotkeys, TTS, tray menu, SQLite history/audit, and online provider execution until secure API-key storage is implemented.
+ ## Development
 
- ## Requirements
-
- - Linux desktop
- - Node.js 20+
- - npm
- - Rust toolchain with `cargo` for running the Tauri app
- - Tauri Linux system dependencies, for example on Ubuntu/Debian:
-
- ```bash
- sudo apt install libwebkit2gtk-4.1-dev build-essential curl wget file libxdo-dev libssl-dev libayatana-appindicator3-dev librsvg2-dev
- ```
-
- This workspace currently has Node/npm available. `cargo` is required before the native app can be started.
-
- ## Start Development
-
- Install JavaScript dependencies:
+ Install dependencies:
 
  ```bash
  npm install
  ```
 
- Run only the web UI:
-
- ```bash
- npm run dev
- ```
-
- Run the Linux desktop app:
+ Run desktop dev stack:
 
  ```bash
  npm run tauri:dev
  ```
 
- Phase 2 hotkey behavior on X11:
-
- - Hold `Ctrl+Space`: start recording after a short hold delay.
- - Release `Ctrl+Space`: stop recording and save a WAV file under `~/.cache/twokey-ai/recordings/`.
- - Double tap `Ctrl+Space`: cycle to the next mode.
- - Press `Escape`: cancel an active recording.
-
- On Wayland, generic global hold-hotkeys are reported as unavailable instead of failing silently.
-
- Phase 3 STT behavior:
-
- - Default STT provider is a deterministic mock transcriber.
- - To test a real local or custom STT command, set `TWOKEY_STT_COMMAND`.
- - The command must print the transcript to stdout and include `{audio}` as placeholder.
+ Important:
 
- Example:
-
- ```bash
- TWOKEY_STT_COMMAND='whisper-cli -f {audio} --language de --no-timestamps' npm run tauri:dev
- ```
+ - `target/debug/twokey-ai` alone in dev mode will fail if Vite is not running.
+ - For development, use `npm run tauri:dev` so frontend + Tauri run together.
 
- Phase 4 Ollama behavior:
-
- - Ollama runs as a systemd service on `127.0.0.1:11434`.
- - Default model: `qwen2.5:3b`.
- - Conversation mode sends finished transcripts to Ollama and displays the answer in the overlay.
- - Override model or endpoint with:
+ Build:
 
  ```bash
- TWOKEY_OLLAMA_MODEL='llama3.2:3b' TWOKEY_OLLAMA_URL='http://127.0.0.1:11434' npm run tauri:dev
+ npm run build
+ cd src-tauri && cargo check
  ```
 
- Phase 5 dictation behavior:
-
- - In dictation mode, a finished transcript is pasted at the active cursor position.
- - X11 uses `xclip` or `xsel` plus `xdotool`.
- - Clipboard content is read before insertion and restored afterward when possible.
- - Wayland insertion is deliberately reported as unsupported for now.
+ ## Hotkey Behavior
 
- Phase 6 text editing behavior:
+ Default hotkey can be changed in settings. Examples:
 
- - In edit mode, TwoKey reads the current X11 selection with `Ctrl+C`.
- - The spoken instruction and selected text are sent to Ollama.
- - The overlay shows a replacement preview.
- - The selected text is replaced only after pressing `Ersetzen`.
+ - `Ctrl+Space`
+ - `Ctrl+Super`
+ - mouse-button combinations except left/right mouse button in recorder UI
 
- Phase 7 settings behavior:
+ Runtime semantics on X11:
 
- - Settings are persisted at `~/.config/twokey-ai/settings.json`.
- - The settings window saves general, hotkey, STT, Ollama, privacy, and update-channel values.
- - Some settings are stored before they are fully applied at runtime; later phases will wire them into the helper/provider layers.
+ - hold hotkey: start recording
+ - release hotkey: stop recording -> transcribe -> mode action
+ - short single tap: open file picker
+ - short double tap: switch mode
+ - `Escape` while recording: cancel
 
- Phase 8 provider behavior:
+ Wayland fallback:
 
- - Ollama chat is routed through a provider abstraction.
- - Provider metadata is exposed to the settings UI.
- - OpenAI-compatible and OpenRouter-compatible providers are visible as planned online providers, but disabled until API-key storage and routing are implemented.
+ - if global hotkeys are blocked, manual start/stop recording is available from the menu
+ - this keeps the voice pipeline usable on restricted desktops
 
- Phase 9 file context behavior:
+ ## STT Providers
 
- - File context can be added from the overlay menu.
- - `txt`, `md`, `markdown`, `json`, and `csv` files are read directly.
- - PDFs are extracted with `pdftotext` from `poppler-utils`.
- - Images are registered as context metadata for future vision providers.
- - Extracted text is cached under `~/.cache/twokey-ai/file-contexts/`.
+ Configured in settings (`sttProvider`):
 
- Phase 10 packaging behavior:
+ - `mock`
+ - `local-whisper`
+ - `external-command`
+ - `openai-compatible`
 
- - `npm run tauri:build` creates AppImage and `.deb` bundles.
- - Settings autostart writes `~/.config/autostart/twokey-ai.desktop`.
- - Generated bundles live under `src-tauri/target/release/bundle/`.
+ `external-command` needs `TWOKEY_STT_COMMAND` with `{audio}` placeholder.
 
- Phase 11 update behavior:
-
- - Settings can check GitHub Releases for a newer version.
- - The app only reports availability; it does not auto-download or auto-install.
-
- Current stabilization status:
-
- - AppImage and `.deb` builds complete successfully.
- - Ollama runs locally as a systemd service with `qwen2.5:3b`.
- - Production dependency audit reports no vulnerabilities.
- - Wayland limitations, TTS, tray menu, SQLite history, and online provider execution remain future hardening work.
-
- Build the frontend:
+ Example:
 
  ```bash
- npm run build
+ TWOKEY_STT_COMMAND='whisper-cli -f {audio} -l de -nt' npm run tauri:dev
  ```
 
- Build the desktop bundle:
+ ## Linux Requirements
 
- ```bash
- npm run tauri:build
- ```
+ - Node.js 20+
+ - npm
+ - Rust + cargo
+ - Linux desktop dependencies for Tauri
 
- ## Project Layout
+ Ubuntu/Debian example:
 
- ```text
- docs/ Product, architecture, security, and status docs
- src/ React frontend
- src-tauri/ Rust/Tauri desktop shell
+ ```bash
+ sudo apt install libwebkit2gtk-4.1-dev build-essential curl wget file libxdo-dev libssl-dev libayatana-appindicator3-dev librsvg2-dev
  ```
 
- ## Repository
-
- GitHub: https://github.com/meinzeug/twokey
+ Optional runtime tools:
 
- ## License
+ - X11 automation: `xdotool`, `xclip` or `xsel`
+ - PDF extraction: `poppler-utils` (`pdftotext`)
+ - TTS backends: `spd-say` or `espeak-ng`/`espeak`
 
- MIT. See `LICENSE`.
-
- ## Linux Notes
-
- TwoKey is designed around Linux conventions:
+ ## Data Paths
 
  - Config: `~/.config/twokey-ai/`
  - Data: `~/.local/share/twokey-ai/`
  - Cache: `~/.cache/twokey-ai/`
- - Logs/state: `~/.local/state/twokey-ai/`
+ - History DB: `~/.local/share/twokey-ai/history.db`
+ - Toolchains: `~/.config/twokey-ai/toolchains.json`
+
+ ## Repo
+
+ https://github.com/meinzeug/twokey
+
+ ## License
 
- X11 and Wayland will be handled separately in later phases. Phase 1 only detects and displays the current session type.
+ MIT
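Note on the `external-command` STT provider in the README diff above: the README only states that `TWOKEY_STT_COMMAND` must contain an `{audio}` placeholder and, in the removed 1.0.2 text, that the command prints the transcript to stdout. A minimal sketch of how such a template could be expanded is shown here. The helper name `buildSttCommand` and the POSIX single-quote escaping are assumptions for illustration and are not taken from the package.

```javascript
// Hypothetical helper (not part of twokey): expand the {audio} placeholder
// in an STT command template such as the TWOKEY_STT_COMMAND example.
function buildSttCommand(template, audioPath) {
  if (typeof template !== "string" || !template.includes("{audio}")) {
    throw new Error("STT command template must contain the {audio} placeholder");
  }
  if (typeof audioPath !== "string" || audioPath.length === 0) {
    throw new Error("audioPath must be a non-empty string");
  }
  // Assumption: the command runs in a POSIX shell, so the path is wrapped in
  // single quotes and any embedded single quote is rewritten as '\''.
  const quoted = "'" + audioPath.replace(/'/g, "'\\''") + "'";
  // split/join replaces every occurrence without regex or "$" special cases.
  return template.split("{audio}").join(quoted);
}

const exampleTemplate = "whisper-cli -f {audio} -l de -nt";
console.log(buildSttCommand(exampleTemplate, "/tmp/rec 1.wav"));
```

With the template from the README example, the recording path is inserted as one quoted shell argument, so a path containing spaces does not split into several arguments.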
package/bin/twokey.js CHANGED
@@ -1,11 +1,17 @@
  #!/usr/bin/env node
 
  import { spawn } from "node:child_process";
+ import fs from "node:fs";
+ import os from "node:os";
+ import path from "node:path";
  import readline from "node:readline";
 
- const VERSION = "1.0.2";
+ const VERSION = "1.0.3";
  const DEFAULT_MODEL = process.env.TWOKEY_OLLAMA_MODEL || "qwen2.5:3b";
  const DEFAULT_OLLAMA_URL = process.env.TWOKEY_OLLAMA_URL || "http://127.0.0.1:11434";
+ const LATEST_RELEASE_API = "https://api.github.com/repos/meinzeug/twokey/releases/latest";
+ const APPIMAGE_DIR = path.join(os.homedir(), ".local", "share", "twokey", "bin");
+ const APPIMAGE_PATH = path.join(APPIMAGE_DIR, "twokey-ai.AppImage");
 
  const args = process.argv.slice(2);
 
@@ -55,8 +61,8 @@ if (onceIndex >= 0) {
  process.exit(0);
  }
 
- console.error("No native desktop binary found in PATH.");
- console.error("Install the .deb/.AppImage release and ensure 'twokey-ai' is available in PATH.");
+ console.error("Could not start desktop app.");
+ console.error("Tried system binaries and auto-download from GitHub Releases.");
  console.error("Use 'twokey --cli' to run terminal mode.");
  process.exit(1);
  });
@@ -159,7 +165,7 @@ async function launchDesktopApp() {
    if (process.env.TWOKEY_DESKTOP_CMD) {
      candidates.push(process.env.TWOKEY_DESKTOP_CMD);
    }
-   candidates.push("twokey-ai", "twokey-desktop");
+   candidates.push("twokey-ai", "twokey-desktop", APPIMAGE_PATH);
 
    for (const command of candidates) {
      const started = await spawnDetached(command);
@@ -168,9 +174,71 @@ async function launchDesktopApp() {
      }
    }
 
+   try {
+     const downloaded = await ensureLocalAppImage();
+     if (downloaded) {
+       return spawnDetached(APPIMAGE_PATH);
+     }
+   } catch {
+     return false;
+   }
+
    return false;
  }
 
+ async function ensureLocalAppImage() {
+   try {
+     await fs.promises.access(APPIMAGE_PATH, fs.constants.X_OK);
+     return true;
+   } catch {
+     // Not installed yet.
+   }
+
+   await fs.promises.mkdir(APPIMAGE_DIR, { recursive: true });
+   const assetUrl = await resolveLatestAppImageUrl();
+   if (!assetUrl) {
+     return false;
+   }
+
+   const response = await fetch(assetUrl, {
+     headers: {
+       "User-Agent": "twokey-cli",
+       Accept: "application/octet-stream",
+     },
+   });
+
+   if (!response.ok) {
+     return false;
+   }
+
+   const data = Buffer.from(await response.arrayBuffer());
+   await fs.promises.writeFile(APPIMAGE_PATH, data, { mode: 0o755 });
+
+   await fs.promises.chmod(APPIMAGE_PATH, 0o755);
+   return true;
+ }
+
+ async function resolveLatestAppImageUrl() {
+   const response = await fetch(LATEST_RELEASE_API, {
+     headers: {
+       "User-Agent": "twokey-cli",
+       Accept: "application/vnd.github+json",
+     },
+   });
+
+   if (!response.ok) {
+     return null;
+   }
+
+   const payload = await response.json();
+   const assets = Array.isArray(payload.assets) ? payload.assets : [];
+   const appImage = assets.find(
+     (asset) => typeof asset?.name === "string" && asset.name.endsWith(".AppImage") && asset.name.includes("amd64"),
+   ) || assets.find((asset) => typeof asset?.name === "string" && asset.name.endsWith(".AppImage"));
+
+   return appImage?.browser_download_url ?? null;
+ }
+
  function spawnDetached(command) {
    return new Promise((resolve) => {
      const child = spawn(command, [], {
@@ -201,4 +269,5 @@ function printHelp() {
    console.log(" --desktop Start native desktop app in background");
    console.log("");
    console.log("Without options, twokey starts the native desktop app in background.");
+   console.log("If no desktop binary is installed, twokey tries to download an AppImage from latest GitHub release.");
  }
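Note on `resolveLatestAppImageUrl` in the diff above: it prefers an asset whose name contains `amd64` regardless of the host architecture, then falls back to any `.AppImage`. A hedged sketch of an architecture-aware variant of the same selection follows. The function name `pickAppImageAsset`, the `ARCH_TOKENS` table, and the sample asset names and URLs are hypothetical and are not taken from the repository or its releases.

```javascript
// Hypothetical mapping from Node's process.arch values to substrings that
// release asset names might contain. The tokens are assumptions.
const ARCH_TOKENS = {
  x64: ["amd64", "x86_64"],
  arm64: ["aarch64", "arm64"],
};

// Pick the download URL of an AppImage asset, preferring the host architecture.
// `assets` has the shape of the `assets` array in a GitHub release payload.
function pickAppImageAsset(assets, arch = process.arch) {
  const appImages = (Array.isArray(assets) ? assets : []).filter(
    (asset) => typeof asset?.name === "string" && asset.name.endsWith(".AppImage"),
  );
  const tokens = ARCH_TOKENS[arch] ?? [];
  const match = appImages.find((asset) => tokens.some((token) => asset.name.includes(token)));
  // Same fallback as the diff: take any AppImage if none matches the architecture.
  return (match ?? appImages[0])?.browser_download_url ?? null;
}

// Sample payload with made-up asset names and URLs.
const sampleAssets = [
  { name: "TwoKey_aarch64.AppImage", browser_download_url: "https://example.invalid/arm64" },
  { name: "TwoKey_amd64.AppImage", browser_download_url: "https://example.invalid/x64" },
  { name: "TwoKey_amd64.deb", browser_download_url: "https://example.invalid/deb" },
];
console.log(pickAppImageAsset(sampleAssets, "arm64"));
```

Keeping the selection in a pure function that takes the asset list and the architecture as arguments makes it possible to check the choice without a network request.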
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "twokey",
-   "version": "1.0.2",
+   "version": "1.0.4",
    "description": "Linux-first desktop AI assistant built with Tauri, React, and TypeScript.",
    "license": "MIT",
    "repository": {
@@ -43,8 +43,8 @@
      "lint": "npm run typecheck",
      "test": "npm run typecheck",
      "prepublishOnly": "npm run build && npm pack --dry-run",
-     "tauri:dev": "tauri dev",
-     "tauri:build": "tauri build"
+     "tauri:dev": "GTK_MODULES='' LIBGL_ALWAYS_SOFTWARE=1 WEBKIT_DISABLE_DMABUF_RENDERER=1 tauri dev",
+     "tauri:build": "GTK_MODULES='' LIBGL_ALWAYS_SOFTWARE=1 WEBKIT_DISABLE_DMABUF_RENDERER=1 tauri build"
    },
    "dependencies": {
      "@tauri-apps/api": "^2.5.0",