twokey 1.0.3 → 1.0.4

Files changed (2)
  1. package/README.md +124 -146
  2. package/package.json +3 -3
package/README.md CHANGED
@@ -1,210 +1,188 @@
  # TwoKey Linux AI Assistant

- TwoKey is a Linux-first desktop assistant prototype. The goal is a small always-available overlay that will later let users hold two keys, speak, and have the result processed by local or online AI providers without switching to a browser or chat window.
-
- Phase 1 implements the foundation only:
-
- - Tauri + React project structure
- - Minimal dark overlay pill
- - Dummy modes: Conversation, Edit Text, Dictation, Feedback
- - Clickable mode menu
- - Placeholder settings window
- - Initial architecture and product documentation
- - X11 Phase 2 prototype: hold `Ctrl+Space` to record audio, double tap `Ctrl+Space` to cycle modes
- - Phase 3 prototype: mock STT and external STT command adapter
- - Phase 4 prototype: local Ollama conversation mode with `qwen2.5:3b`
-
- No TTS or text injection is implemented yet.
-
- ## Installation
+ TwoKey is a Linux-first desktop AI assistant with a small floating pill overlay.
+ The core idea matches the video workflow:
+
+ - hold a global hotkey, speak, release
+ - audio is recorded and transcribed
+ - transcript is sent to the selected AI provider
+ - result is used directly in your current desktop workflow
+ - optional TTS reads the answer out loud
+
+ ## What Works Now
+
+ - Global hold hotkey on X11 for voice capture.
+ - Double-tap hotkey to cycle modes.
+ - Single-tap hotkey to open file-context picker.
+ - Voice-triggered toolchains for multi-step desktop actions.
+ - Four modes:
+   - Conversation
+   - Edit Text
+   - Dictation
+   - Feedback
+ - Conversation mode:
+   - transcript -> provider answer
+   - optional TTS playback
+ - Edit mode:
+   - read selected text
+   - apply spoken transform via AI
+   - replace selected text directly
+ - Dictation mode:
+   - insert transcript at cursor
+ - Feedback mode:
+   - local feedback persistence in history DB
+ - File context:
+   - TXT/MD/JSON/CSV/PDF content context
+   - image context routing to vision-capable providers
+ - Provider routing:
+   - local Ollama
+   - OpenAI-compatible
+   - OpenRouter-compatible
+   - secure API key storage via local keyring
+ - Local audit/history persistence in SQLite.
+ - Tray icon and settings window.
+ - GitHub release check from settings.
+ - In-app AppImage update download and launch.
+
+ ## Current Video Parity
+
+ Implemented from video behavior:
+
+ - Minimal floating pill UI.
+ - Hold-to-talk workflow.
+ - Mode switch via double tap.
+ - Text workflow without browser/chat-tab context switch.
+ - File attach + ask flow.
+ - Hybrid local/online providers.
+ - Optional TTS answer playback.
+
+ Still not fully equivalent to the video vision:
+
+ - Toolchains are implemented, but no visual workflow builder exists yet.
+ - Wayland still has compositor-specific limits for global hold hotkeys and full automation.
+ - Update install is available for AppImage, but no signed rollback-capable updater pipeline yet.
+
+ ## Install

  ```bash
  npm install twokey
  ```

- Start the tool directly:
+ Run:

  ```bash
  twokey
  ```

- Default behavior: start the native desktop app in background.
- If no desktop binary is installed yet, `twokey` attempts to download an AppImage from the latest GitHub release into `~/.local/share/twokey/bin/` and starts it.
+ Default behavior:

- Useful CLI options:
+ - starts native desktop app in background
+ - if no native binary is installed, tries to download latest AppImage release
+
+ Useful options:

  ```bash
  twokey --help
  twokey --cli
- twokey --once "Erklaere kurz den Unterschied zwischen X11 und Wayland"
+ twokey --once "Erklaere X11 vs Wayland kurz"
  twokey --desktop
  ```

- ## Minimal Usage
-
- ```ts
- import { getPackageInfo } from "twokey";
-
- const info = getPackageInfo();
- console.log(info.name);
- console.log(info.runtimeStatus.waylandGlobalHotkeys);
- ```
-
- The package exports runtime status metadata. Current status includes planned/limited areas such as Wayland global hotkeys, TTS, tray menu, SQLite history/audit, and online provider execution until secure API-key storage is implemented.
+ ## Development

- ## Requirements
-
- - Linux desktop
- - Node.js 20+
- - npm
- - Rust toolchain with `cargo` for running the Tauri app
- - Tauri Linux system dependencies, for example on Ubuntu/Debian:
-
- ```bash
- sudo apt install libwebkit2gtk-4.1-dev build-essential curl wget file libxdo-dev libssl-dev libayatana-appindicator3-dev librsvg2-dev
- ```
-
- This workspace currently has Node/npm available. `cargo` is required before the native app can be started.
-
- ## Start Development
-
- Install JavaScript dependencies:
+ Install dependencies:

  ```bash
  npm install
  ```

- Run only the web UI:
-
- ```bash
- npm run dev
- ```
-
- Run the Linux desktop app:
+ Run desktop dev stack:

  ```bash
  npm run tauri:dev
  ```

- Phase 2 hotkey behavior on X11:
-
- - Hold `Ctrl+Space`: start recording after a short hold delay.
- - Release `Ctrl+Space`: stop recording and save a WAV file under `~/.cache/twokey-ai/recordings/`.
- - Double tap `Ctrl+Space`: cycle to the next mode.
- - Press `Escape`: cancel an active recording.
-
- On Wayland, generic global hold-hotkeys are reported as unavailable instead of failing silently.
-
- Phase 3 STT behavior:
-
- - Default STT provider is a deterministic mock transcriber.
- - To test a real local or custom STT command, set `TWOKEY_STT_COMMAND`.
- - The command must print the transcript to stdout and include `{audio}` as placeholder.
+ Important:

- Example:
-
- ```bash
- TWOKEY_STT_COMMAND='whisper-cli -f {audio} --language de --no-timestamps' npm run tauri:dev
- ```
+ - `target/debug/twokey-ai` alone in dev mode will fail if Vite is not running.
+ - For development, use `npm run tauri:dev` so frontend + Tauri run together.

- Phase 4 Ollama behavior:
-
- - Ollama runs as a systemd service on `127.0.0.1:11434`.
- - Default model: `qwen2.5:3b`.
- - Conversation mode sends finished transcripts to Ollama and displays the answer in the overlay.
- - Override model or endpoint with:
+ Build:

  ```bash
- TWOKEY_OLLAMA_MODEL='llama3.2:3b' TWOKEY_OLLAMA_URL='http://127.0.0.1:11434' npm run tauri:dev
+ npm run build
+ cd src-tauri && cargo check
  ```

- Phase 5 dictation behavior:
-
- - In dictation mode, a finished transcript is pasted at the active cursor position.
- - X11 uses `xclip` or `xsel` plus `xdotool`.
- - Clipboard content is read before insertion and restored afterward when possible.
- - Wayland insertion is deliberately reported as unsupported for now.
+ ## Hotkey Behavior

- Phase 6 text editing behavior:
+ Default hotkey can be changed in settings. Examples:

- - In edit mode, TwoKey reads the current X11 selection with `Ctrl+C`.
- - The spoken instruction and selected text are sent to Ollama.
- - The overlay shows a replacement preview.
- - The selected text is replaced only after pressing `Ersetzen`.
+ - `Ctrl+Space`
+ - `Ctrl+Super`
+ - mouse-button combinations except left/right mouse button in recorder UI

- Phase 7 settings behavior:
+ Runtime semantics on X11:

- - Settings are persisted at `~/.config/twokey-ai/settings.json`.
- - The settings window saves general, hotkey, STT, Ollama, privacy, and update-channel values.
- - Some settings are stored before they are fully applied at runtime; later phases will wire them into the helper/provider layers.
+ - hold hotkey: start recording
+ - release hotkey: stop recording -> transcribe -> mode action
+ - short single tap: open file picker
+ - short double tap: switch mode
+ - `Escape` while recording: cancel

- Phase 8 provider behavior:
+ Wayland fallback:

- - Ollama chat is routed through a provider abstraction.
- - Provider metadata is exposed to the settings UI.
- - OpenAI-compatible and OpenRouter-compatible providers are visible as planned online providers, but disabled until API-key storage and routing are implemented.
+ - if global hotkeys are blocked, manual start/stop recording is available from the menu
+ - this keeps the voice pipeline usable on restricted desktops

- Phase 9 file context behavior:
+ ## STT Providers

- - File context can be added from the overlay menu.
- - `txt`, `md`, `markdown`, `json`, and `csv` files are read directly.
- - PDFs are extracted with `pdftotext` from `poppler-utils`.
- - Images are registered as context metadata for future vision providers.
- - Extracted text is cached under `~/.cache/twokey-ai/file-contexts/`.
+ Configured in settings (`sttProvider`):

- Phase 10 packaging behavior:
+ - `mock`
+ - `local-whisper`
+ - `external-command`
+ - `openai-compatible`

- - `npm run tauri:build` creates AppImage and `.deb` bundles.
- - Settings autostart writes `~/.config/autostart/twokey-ai.desktop`.
- - Generated bundles live under `src-tauri/target/release/bundle/`.
+ `external-command` needs `TWOKEY_STT_COMMAND` with `{audio}` placeholder.

- Phase 11 update behavior:
-
- - Settings can check GitHub Releases for a newer version.
- - The app only reports availability; it does not auto-download or auto-install.
-
- Current stabilization status:
-
- - AppImage and `.deb` builds complete successfully.
- - Ollama runs locally as a systemd service with `qwen2.5:3b`.
- - Production dependency audit reports no vulnerabilities.
- - Wayland limitations, TTS, tray menu, SQLite history, and online provider execution remain future hardening work.
-
- Build the frontend:
+ Example:

  ```bash
- npm run build
+ TWOKEY_STT_COMMAND='whisper-cli -f {audio} -l de -nt' npm run tauri:dev
  ```

- Build the desktop bundle:
+ ## Linux Requirements

- ```bash
- npm run tauri:build
- ```
+ - Node.js 20+
+ - npm
+ - Rust + cargo
+ - Linux desktop dependencies for Tauri

- ## Project Layout
+ Ubuntu/Debian example:

- ```text
- docs/ Product, architecture, security, and status docs
- src/ React frontend
- src-tauri/ Rust/Tauri desktop shell
+ ```bash
+ sudo apt install libwebkit2gtk-4.1-dev build-essential curl wget file libxdo-dev libssl-dev libayatana-appindicator3-dev librsvg2-dev
  ```

- ## Repository
-
- GitHub: https://github.com/meinzeug/twokey
+ Optional runtime tools:

- ## License
+ - X11 automation: `xdotool`, `xclip` or `xsel`
+ - PDF extraction: `poppler-utils` (`pdftotext`)
+ - TTS backends: `spd-say` or `espeak-ng`/`espeak`

- MIT. See `LICENSE`.
-
- ## Linux Notes
-
- TwoKey is designed around Linux conventions:
+ ## Data Paths

  - Config: `~/.config/twokey-ai/`
  - Data: `~/.local/share/twokey-ai/`
  - Cache: `~/.cache/twokey-ai/`
- - Logs/state: `~/.local/state/twokey-ai/`
+ - History DB: `~/.local/share/twokey-ai/history.db`
+ - Toolchains: `~/.config/twokey-ai/toolchains.json`
+
+ ## Repo
+
+ https://github.com/meinzeug/twokey
+
+ ## License

- X11 and Wayland will be handled separately in later phases. Phase 1 only detects and displays the current session type.
+ MIT
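
The new README says the `external-command` STT provider needs `TWOKEY_STT_COMMAND` with an `{audio}` placeholder, but the diff does not show how TwoKey expands it. As a rough sketch only, assuming plain string replacement of every `{audio}` occurrence, a wrapper could build the final command like this. The function name `expand_stt_command` and the sample WAV path are hypothetical and do not come from the package.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not TwoKey's actual code): replace every {audio}
# placeholder in a command template with the path of the recorded file.
expand_stt_command() {
  local template="$1" audio_path="$2"
  local placeholder='{audio}'
  # ${var//pattern/replacement} replaces all matches; quoting the pattern
  # and the replacement makes bash treat both as literal text.
  printf '%s\n' "${template//"$placeholder"/"$audio_path"}"
}

# Template taken from the README example; the path is made up for the demo.
expand_stt_command 'whisper-cli -f {audio} -l de -nt' '/tmp/twokey-demo.wav'
# prints: whisper-cli -f /tmp/twokey-demo.wav -l de -nt
```

The sketch only prints the expanded command. Running it safely with paths that contain spaces or shell metacharacters would need additional quoting, which is left out here.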
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "twokey",
- "version": "1.0.3",
+ "version": "1.0.4",
  "description": "Linux-first desktop AI assistant built with Tauri, React, and TypeScript.",
  "license": "MIT",
  "repository": {
@@ -43,8 +43,8 @@
  "lint": "npm run typecheck",
  "test": "npm run typecheck",
  "prepublishOnly": "npm run build && npm pack --dry-run",
- "tauri:dev": "tauri dev",
- "tauri:build": "tauri build"
+ "tauri:dev": "GTK_MODULES='' LIBGL_ALWAYS_SOFTWARE=1 WEBKIT_DISABLE_DMABUF_RENDERER=1 tauri dev",
+ "tauri:build": "GTK_MODULES='' LIBGL_ALWAYS_SOFTWARE=1 WEBKIT_DISABLE_DMABUF_RENDERER=1 tauri build"
  },
  "dependencies": {
  "@tauri-apps/api": "^2.5.0",