autopreso 0.1.1 → 0.1.4

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -29,13 +29,11 @@ Stage a few seed elements, hit start, and present.
  ```sh
  $ npx autopreso # boots the server, opens the browser
  autopreso listening at http://127.0.0.1:3210
- whiteboard agent: openai gpt-5.5
- settings file: /Users/you/.config/autopreso/settings.json

  # In the browser:
  # 1. Drop reference materials onto the staging canvas (title, agenda, etc).
- # 2. Pick your microphone, pick a transcription model and an agent model.
- # 3. Click "Start preso" and start talking.
+ # 2. Pick your microphone, transcription model, agent model, and optional Agent instructions.
+ # 3. Click "Start Preso" and start talking.
  ```

  ## Install
@@ -80,10 +78,10 @@ npm start
  └────────────────┘
  ```

- - **Two modes** - "staging" lets you sketch seed content client-side; "live" hands the canvas over to the agent and starts streaming transcripts.
+ - **Two modes** - "staging" lets you sketch seed content client-side; "live" hands the canvas over to the agent, biases OpenAI Realtime transcription toward staging text and labels, and starts streaming transcripts.
  - **Local server, local network only** - the Express + WebSocket server binds to 127.0.0.1; nothing is exposed beyond your machine.
- - **Persistent settings** - models, API keys, and STT engine choices live in `~/.config/autopreso/settings.json` and survive restarts.
- - **Warmup loop** - after you hit start the agent primes itself against your staging content so the first sentence you say doesn't get a cold model.
+ - **Persistent settings** - models, API keys, STT engine choices, and Agent instructions live in `~/.config/autopreso/settings.json` and survive restarts.
+ - **Warmup loop** - after you hit start the agent primes itself against your staging content and Agent instructions so the first sentence you say doesn't get a cold model.

  ## CLI Reference

@@ -102,6 +100,7 @@ npm start
  ## Configuration

  Settings persist at `~/.config/autopreso/settings.json` and are managed from the in-app status panel.
+ Agent instructions are saved automatically from staging, can be up to 100,000 characters, and take effect on the next Start Preso.

  ### Defaults on first run

@@ -119,22 +118,24 @@ Auto-detection precedence: **Codex CLI auth wins over `OLLAMA_MODEL` wins over `

  ### Environment variables

- These only seed `settings.json` on first run. Once the file exists, they're ignored - edit the file or use the in-app panel.
+ Provider variables only seed `settings.json` on first run. Once the file exists, they're ignored - edit the file or use the in-app panel. Log path variables are read on each process start.

- | Variable | Purpose |
- | ---------------- | ----------------------------------------------------- |
- | `PORT` | Port to listen on. Default: `3210`. |
- | `OPENAI_API_KEY` | Seeds the OpenAI key for both agent and Realtime STT. |
- | `OPENAI_MODEL` | Seeds the OpenAI agent model. |
- | `CODEX_MODEL` | Seeds the Codex model. |
- | `OLLAMA_MODEL` | Seeds the Ollama model. |
+ | Variable | Purpose |
+ | ---------------------- | ----------------------------------------------------- |
+ | `PORT` | Port to listen on. Default: `3210`. |
+ | `OPENAI_API_KEY` | Seeds the OpenAI key for both agent and Realtime STT. |
+ | `OPENAI_MODEL` | Seeds the OpenAI agent model. |
+ | `CODEX_MODEL` | Seeds the Codex model. |
+ | `OLLAMA_MODEL` | Seeds the Ollama model. |
+ | `AUTOPRESO_CACHE_LOG` | Cache usage log path. Default: `~/.config/autopreso/logs/cache.log`. |
+ | `AUTOPRESO_DEBUG_LOG` | Agent debug log path. Default: `~/.config/autopreso/logs/debug.log`. |

  Local Moonshine transcription ships as an optional native sidecar for `darwin-arm64` and `darwin-x64`. On other platforms, choose OpenAI Realtime in the STT panel.

  ## Credits

  - [Excalidraw](https://github.com/excalidraw/excalidraw) - the whiteboard canvas, scene model, and rendering.
- - [Moonshine](https://github.com/usefulsensors/moonshine) by Useful Sensors - the local speech-to-text model that makes the offline path possible.
+ - [Moonshine](https://github.com/moonshine-ai/moonshine) the local speech-to-text model that makes the offline path possible.
  - [Vercel AI SDK](https://github.com/vercel/ai) - tool-calling agent loop and provider abstraction.

  ## Development
@@ -142,6 +143,7 @@ Local Moonshine transcription ships as an optional native sidecar for `darwin-ar
  ```sh
  npm install # install deps
  npm run dev # run the CLI from source
+ npm run typecheck # tsc --noEmit
  npm test # node --test
  npm run build:moonshine-sidecars # build the Python sidecar binaries
  ```
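The README's first-run rule above (provider env vars seed `settings.json` only when no settings file exists yet) can be sketched in plain JavaScript. This is a hypothetical illustration, not the package's actual code; the helper name and settings keys are assumptions.

```javascript
// Hypothetical sketch of the "env seeds settings only on first run" rule:
// an existing settings object always wins, and provider env vars are only
// consulted when there is no settings file yet. Key names are assumed.
function seedSettingsFromEnv(existingSettings, env) {
  if (existingSettings) return existingSettings; // file exists: env is ignored
  return {
    openaiApiKey: env.OPENAI_API_KEY ?? "",
    openaiModel: env.OPENAI_MODEL ?? "",
    codexModel: env.CODEX_MODEL ?? "",
    ollamaModel: env.OLLAMA_MODEL ?? "",
  };
}
```

This also matches the new README wording: only *provider* variables behave this way, while the log path variables (`AUTOPRESO_CACHE_LOG`, `AUTOPRESO_DEBUG_LOG`) are re-read on every process start.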
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "autopreso",
-   "version": "0.1.1",
+   "version": "0.1.4",
    "description": "Realtime speech to presentation. Let the whiteboard whiteboard itself.",
    "license": "MIT",
    "author": "Kun Chen <kun@kunchenguid.com>",
@@ -41,7 +41,8 @@
    "dev": "node ./src/cli.js",
    "prepare:release-packages": "node ./scripts/prepare-release-packages.js",
    "test": "node --test",
-   "start": "node ./src/cli.js"
+   "start": "node ./src/cli.js",
+   "typecheck": "tsc --noEmit"
    },
    "dependencies": {
    "@ai-sdk/openai": "^3.0.63",
@@ -55,5 +56,10 @@
    "@autopreso/moonshine-darwin-arm64": "0.1.1",
    "@autopreso/moonshine-darwin-x64": "0.1.1"
    },
-   "devDependencies": {}
+   "devDependencies": {
+     "@types/express": "^5.0.6",
+     "@types/node": "^25.6.2",
+     "@types/ws": "^8.18.1",
+     "typescript": "^6.0.3"
+   }
  }
package/public/app.js CHANGED
@@ -13,8 +13,27 @@ const MOONSHINE_MODELS = ["tiny", "small", "medium"];
  const MIC_STORAGE_KEY = "autopreso.mic";

  const STARTER_STAGING_ELEMENTS = [];
- // 1x1 transparent PNG used when the staging area is empty.
- const PLACEHOLDER_IMAGE = "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mNkYPhfDwAChwGA60e6kgAAAABJRU5ErkJggg==";
+
+ function fullscreenIcon(isFullscreen) {
+   const paths = isFullscreen
+     ? ["M3 6 H6 V3", "M10 3 V6 H13", "M13 10 H10 V13", "M6 13 V10 H3"]
+     : ["M3 6 V3 H6", "M10 3 H13 V6", "M13 10 V13 H10", "M6 13 H3 V10"];
+   return React.createElement(
+     "svg",
+     {
+       width: "1em",
+       height: "1em",
+       viewBox: "0 0 16 16",
+       fill: "none",
+       stroke: "currentColor",
+       strokeWidth: 1.8,
+       strokeLinecap: "round",
+       strokeLinejoin: "round",
+       "aria-hidden": "true",
+     },
+     ...paths.map((d, i) => React.createElement("path", { key: i, d })),
+   );
+ }

  function loadStoredMic() {
    try {
@@ -50,6 +69,7 @@ function App() {
  const [resetting, setResetting] = React.useState(false);
  // warmupState: { state: "idle"|"running"|"confirmed"|"exhausted"|"cancelled", attempt, maxAttempts }
  const [warmupState, setWarmupState] = React.useState({ state: "idle", attempt: 0, maxAttempts: 8 });
+ const [agentInstructions, setAgentInstructionsValue] = React.useState("");
  const audioSessionRef = React.useRef(null);
  const apiRef = React.useRef(null);
  const wsRef = React.useRef(null);
@@ -63,6 +83,11 @@ function App() {
  const userElementsSyncTimerRef = React.useRef(null);
  const lastSyncedElementsHashRef = React.useRef("");
  const listeningRef = React.useRef(false);
+ // Seed the textarea once from settings, then let the user own it locally so
+ // their keystrokes don't fight the WS settings broadcast we trigger on save.
+ const agentInstructionsSeededRef = React.useRef(false);
+ const agentInstructionsSaveTimerRef = React.useRef(null);
+ const agentInstructionsSavePromiseRef = React.useRef(Promise.resolve());

  React.useEffect(() => { listeningRef.current = listening; }, [listening]);
  const [isFullscreen, setIsFullscreen] = React.useState(false);
@@ -91,9 +116,34 @@ function App() {
      clearTimeout(captionTimerRef.current);
      clearTimeout(resetConfirmTimerRef.current);
      clearTimeout(userElementsSyncTimerRef.current);
+     clearTimeout(agentInstructionsSaveTimerRef.current);
    };
  }, []);

+ React.useEffect(() => {
+   if (agentInstructionsSeededRef.current) return;
+   if (!settings || typeof settings.agentInstructions !== "string") return;
+   setAgentInstructionsValue(settings.agentInstructions);
+   agentInstructionsSeededRef.current = true;
+ }, [settings]);
+
+ function handleAgentInstructionsChange(value) {
+   setAgentInstructionsValue(value);
+   clearTimeout(agentInstructionsSaveTimerRef.current);
+   agentInstructionsSaveTimerRef.current = setTimeout(() => {
+     agentInstructionsSaveTimerRef.current = null;
+     agentInstructionsSavePromiseRef.current = saveSettings({ agentInstructions: value }).catch((err) => setError(err.message));
+   }, 600);
+ }
+
+ async function flushAgentInstructionsSave() {
+   clearTimeout(agentInstructionsSaveTimerRef.current);
+   agentInstructionsSaveTimerRef.current = null;
+   await agentInstructionsSavePromiseRef.current;
+   agentInstructionsSavePromiseRef.current = saveSettings({ agentInstructions });
+   await agentInstructionsSavePromiseRef.current;
+ }
+
  function handleExcalidrawChange(elements) {
    // Only push user edits to the server while in live mode. In staging the
    // canvas is a client-side scratchpad; the server doesn't need to know.
@@ -322,6 +372,7 @@ function App() {
    setError("");
    setPresoStarting(true);
    try {
+     await flushAgentInstructionsSave();
      // Snapshot what the user has on the staging canvas right now.
      const stagingNative = excalidrawAPI.getSceneElements().map((el) => ({ ...el }));
      stagingSceneRef.current = stagingNative;
@@ -486,14 +537,15 @@ function App() {
    const canvas = document.querySelector("canvas.excalidraw__canvas.static");
    if (!canvas) return null;
    const blob = await canvasToBlob(canvas);
-   return await blobToDataUrl(blob);
+   const downscaled = await downscaleBlobByHalf(blob);
+   return await blobToDataUrl(downscaled);
  }

  async function captureStagingSceneAsImage(excalidrawAPI, elements) {
    if (!Array.isArray(elements) || elements.length === 0) {
-     // Empty staging - no scene to render. Use a 1x1 placeholder so the agent
-     // still gets a valid image part in the primer.
-     return PLACEHOLDER_IMAGE;
+     // Empty staging - no scene to render. Skip the image entirely; the
+     // server's primer already drops the image part when this is falsy.
+     return null;
    }
    try {
      const appState = excalidrawAPI.getAppState();
@@ -504,7 +556,8 @@ function App() {
      files,
      mimeType: "image/png",
    });
-   return await blobToDataUrl(blob);
+   const downscaled = await downscaleBlobByHalf(blob);
+   return await blobToDataUrl(downscaled);
  } catch (error) {
    console.warn("Failed to export staging scene, falling back to viewport canvas:", error);
    return captureCanvasDataUrl();
@@ -549,18 +602,6 @@ function App() {
  React.createElement(
    "aside",
    { className: "panel" },
-   isLive
-     ? React.createElement(
-         "button",
-         {
-           className: "fullscreen-toggle",
-           onClick: toggleFullscreen,
-           title: isFullscreen ? "Exit fullscreen (Esc)" : "Fullscreen for screen sharing",
-           "aria-label": isFullscreen ? "Exit fullscreen" : "Enter fullscreen",
-         },
-         isFullscreen ? "⤓" : "⤢",
-       )
-     : null,
    React.createElement(
      "div",
      { className: "brand" },
@@ -620,7 +661,7 @@ function App() {
        onClick: startPreso,
        disabled: presoStarting,
      },
-     presoStarting ? "Starting..." : "Start preso →",
+     presoStarting ? "Starting..." : "Start Preso →",
    )
  : null,
  isLive
@@ -628,26 +669,40 @@ function App() {
    "div",
    { className: "listen-controls" },
    React.createElement(
-     "button",
-     {
-       className: `record-toggle ${listening ? "recording" : ""}`,
-       onClick: toggleListening,
-       disabled: starting || (warmupState.state === "running" && !listening),
-       title: warmupState.state === "running"
-         ? "Waiting for prompt cache to warm up"
-         : warmupState.state === "exhausted"
-           ? "Cache didn't fully prime; first turn may be slower"
-           : undefined,
-     },
-     React.createElement("span", { className: "record-icon" }, listening ? "■" : "●"),
-     " ",
-     listening
-       ? "Stop"
-       : starting
-         ? "Starting..."
-         : warmupState.state === "running"
-           ? `Warming up... (${warmupState.attempt} / ${warmupState.maxAttempts})`
-           : "Start talking",
+     "div",
+     { className: "listen-row" },
+     React.createElement(
+       "button",
+       {
+         className: `record-toggle ${listening ? "recording" : ""}`,
+         onClick: toggleListening,
+         disabled: starting || (warmupState.state === "running" && !listening),
+         title: warmupState.state === "running"
+           ? "Waiting for prompt cache to warm up"
+           : warmupState.state === "exhausted"
+             ? "Cache didn't fully prime; first turn may be slower"
+             : undefined,
+       },
+       React.createElement("span", { className: "record-icon" }, listening ? "" : "●"),
+       " ",
+       listening
+         ? "Stop"
+         : starting
+           ? "Starting..."
+           : warmupState.state === "running"
+             ? `Warming up... (${warmupState.attempt} / ${warmupState.maxAttempts})`
+             : "Start Talking",
+     ),
+     React.createElement(
+       "button",
+       {
+         className: "fullscreen-toggle",
+         onClick: toggleFullscreen,
+         title: isFullscreen ? "Exit fullscreen (Esc)" : "Fullscreen for screen sharing",
+         "aria-label": isFullscreen ? "Exit fullscreen" : "Enter fullscreen",
+       },
+       fullscreenIcon(isFullscreen),
+     ),
    ),
    warmupState.state === "running" && !listening
      ? React.createElement(
@@ -657,7 +712,7 @@ function App() {
        onClick: startAnyway,
        title: "Skip warmup and start listening now. The first turn may be slower.",
      },
-     "Start anyway →",
+     "Start Anyway →",
    )
  : null,
  warmupState.state === "exhausted" && !listening
@@ -681,7 +736,7 @@ function App() {
    },
    resetting ? "Resetting..." : resetConfirming
      ? "Click again to reset"
-     : mode === "staging" ? "Reset staging" : "Reset session",
+     : mode === "staging" ? "Reset Staging" : "Reset Session",
  ),
  ),
  React.createElement(
@@ -734,6 +789,27 @@ function App() {
    }) : null,
  }),
  ),
+ mode === "staging"
+   ? React.createElement(
+       "div",
+       { className: "agent-instructions" },
+       React.createElement("label", { className: "agent-instructions-label", htmlFor: "agent-instructions-input" }, "Agent instructions"),
+       React.createElement("textarea", {
+         id: "agent-instructions-input",
+         className: "agent-instructions-input",
+         value: agentInstructions,
+         onChange: (e) => handleAgentInstructionsChange(e.target.value),
+         placeholder: "Optional. Tell the agent your preferences - e.g. 'Use a tight 4-color palette', 'Prefer drawings over text', 'Be funny'.",
+         rows: 4,
+         spellCheck: true,
+       }),
+       React.createElement(
+         "p",
+         { className: "agent-instructions-hint" },
+         "Saved automatically. Takes effect on next Start Preso.",
+       ),
+     )
+   : null,
  error ? React.createElement("div", { className: "error" }, error) : null,
  ),
  );
@@ -1229,4 +1305,27 @@ function canvasToBlob(canvas) {
  });
  }

+ // Halve each dimension before sending to the agent. ~4x fewer pixels means
+ // ~4x fewer image tokens and a smaller WS payload, while shapes and labels
+ // stay legible enough for the model to do visual sanity checks.
+ async function downscaleBlobByHalf(blob) {
+   try {
+     const bitmap = await createImageBitmap(blob);
+     const w = Math.max(1, Math.floor(bitmap.width / 2));
+     const h = Math.max(1, Math.floor(bitmap.height / 2));
+     const canvas = document.createElement("canvas");
+     canvas.width = w;
+     canvas.height = h;
+     const ctx = canvas.getContext("2d");
+     ctx.imageSmoothingEnabled = true;
+     ctx.imageSmoothingQuality = "high";
+     ctx.drawImage(bitmap, 0, 0, w, h);
+     bitmap.close?.();
+     return await canvasToBlob(canvas);
+   } catch (error) {
+     console.warn("Image downscale failed, sending original:", error);
+     return blob;
+   }
+ }
+
  createRoot(document.getElementById("app")).render(React.createElement(App));
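The Agent-instructions autosave added above pairs a 600 ms debounce with a flush before Start Preso, so the server always reads the latest text. Stripped of React refs, the pattern can be sketched as a small standalone helper (the names here are illustrative, not the package's API):

```javascript
// Sketch of the debounce-then-flush autosave pattern: schedule() saves a
// short while after the last edit; flush() cancels any pending timer, waits
// out an in-flight save, then persists the latest value synchronously with
// the caller's await.
function createDebouncedSaver(save, delayMs = 600) {
  let timer = null;
  let pending = Promise.resolve();
  return {
    schedule(value) {
      clearTimeout(timer);
      timer = setTimeout(() => {
        timer = null;
        pending = save(value);
      }, delayMs);
    },
    async flush(value) {
      clearTimeout(timer);
      timer = null;
      await pending;         // wait for any save already in flight
      pending = save(value); // then persist the freshest value
      await pending;
    },
  };
}
```

The flush step is why edits made right before clicking "Start Preso" still land on disk before the server snapshots them.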
package/public/style.css CHANGED
@@ -113,17 +113,63 @@ body {
    line-height: 1.4;
  }

+ .agent-instructions {
+   display: flex;
+   flex-direction: column;
+   gap: 6px;
+ }
+
+ .agent-instructions-label {
+   font-size: 13px;
+   font-weight: 600;
+   color: #1e1e1e;
+ }
+
+ .agent-instructions-input {
+   width: 100%;
+   resize: vertical;
+   min-height: 84px;
+   font-family: inherit;
+   font-size: 13px;
+   line-height: 1.45;
+   color: #1e1e1e;
+   background: #fffdf8;
+   border: 1px solid #dedbd2;
+   border-radius: 6px;
+   padding: 9px 10px;
+   outline: none;
+   transition: border-color 120ms ease, box-shadow 120ms ease;
+ }
+
+ .agent-instructions-input:focus {
+   border-color: #1f6feb;
+   box-shadow: 0 0 0 3px rgba(31, 111, 235, 0.18);
+ }
+
+ .agent-instructions-input::placeholder {
+   color: #a39b8e;
+ }
+
+ .agent-instructions-hint {
+   margin: 0;
+   font-size: 12px;
+   color: #6d675e;
+ }
+
  .controls {
    display: flex;
    flex-direction: column;
    gap: 8px;
  }

+ .controls button {
+   font-size: 15px;
+   padding: 13px 14px;
+ }
+
  .start-preso {
    background: #1f6feb;
    border-color: #1f6feb;
-   font-size: 15px;
-   padding: 13px 14px;
    letter-spacing: 0.01em;
  }

@@ -144,19 +190,25 @@ body {
    background: #f3f0e7;
  }

+ .listen-row {
+   display: flex;
+   gap: 6px;
+   align-items: stretch;
+ }
+
+ .listen-row .record-toggle {
+   flex: 1;
+   min-width: 0;
+ }
+
  .fullscreen-toggle {
-   position: absolute;
-   top: 14px;
-   right: 14px;
-   z-index: 10;
    background: transparent;
    color: #6d675e;
-   border: none;
-   padding: 4px 6px;
-   font-size: 16px;
+   border: 1px solid #dedbd2;
+   font-size: 22px;
    line-height: 1;
-   border-radius: 6px;
-   cursor: pointer;
+   flex: 0 0 auto;
+   aspect-ratio: 1;
  }

  .fullscreen-toggle:hover:not(:disabled) {
@@ -179,6 +231,7 @@ body {
  .mode-toggle {
    display: inline-flex;
    align-items: stretch;
+   margin-left: auto;
    padding: 2px;
    gap: 0;
    border-radius: 999px;
package/src/cli.js CHANGED
@@ -54,8 +54,6 @@ async function main() {
  });

  console.log(`autopreso listening at ${url}`);
- console.log(`whiteboard agent: ${agentProvider.provider} ${agentProvider.requestedModel ?? agentProvider.model}`);
- console.log(`settings file: ${SETTINGS_PATH}`);

  if (options.openBrowser) {
    await open(url);
@@ -84,6 +82,8 @@ Environment:
  CODEX_BASE_URL Seeds the Codex backend URL on first run
  OLLAMA_MODEL Seeds the Ollama model on first run
  OLLAMA_BASE_URL Seeds the Ollama base URL on first run
+ AUTOPRESO_CACHE_LOG Cache usage log path. Default: ~/.config/autopreso/logs/cache.log
+ AUTOPRESO_DEBUG_LOG Agent debug log path. Default: ~/.config/autopreso/logs/debug.log

  Models and providers are configured in the UI after launch. Settings persist at:
  ${SETTINGS_PATH}
package/src/openai-transcription.js CHANGED
@@ -1,5 +1,7 @@
  import { WebSocket } from "ws";

+ import { buildTranscriptionVocabularyPrompt } from "./whiteboard-keywords.js";
+
  const REALTIME_URL = "wss://api.openai.com/v1/realtime?intent=transcription";

  export function createOpenAITranscription({
@@ -8,6 +10,7 @@ export function createOpenAITranscription({
  options,
  env = process.env,
  createWebSocket = (url, protocols, init) => new WebSocket(url, protocols, init),
+ log = console,
  }) {
  let socket = null;
  let readyPromise = null;
@@ -17,6 +20,7 @@ export function createOpenAITranscription({
  let pendingAudio = [];
  let partialText = "";
  let bufferedSinceCommit = false;
+ let vocabularyPrompt = "";

  function ensureSocket() {
    if (socket) return socket;
@@ -39,6 +43,8 @@ export function createOpenAITranscription({

  socket.on("open", () => {
    configured = true;
+   const transcription = { model: options.openaiTranscriptionModel };
+   if (vocabularyPrompt) transcription.prompt = vocabularyPrompt;
    socket.send(JSON.stringify({
      type: "session.update",
      session: {
@@ -46,7 +52,7 @@ export function createOpenAITranscription({
        audio: {
          input: {
            format: { type: "audio/pcm", rate: 24000 },
-           transcription: { model: options.openaiTranscriptionModel },
+           transcription,
          },
        },
      },
@@ -115,6 +121,36 @@ export function createOpenAITranscription({
    connection.send(JSON.stringify({ type: "input_audio_buffer.append", audio }));
    bufferedSinceCommit = true;
  },
+ /** @param {{ keywords?: string[] | null }} [ctx] */
+ setSessionContext: (ctx) => {
+   const keywords = ctx?.keywords ?? [];
+   const prompt = buildTranscriptionVocabularyPrompt(keywords);
+   // Empty input + nothing to clear: bail. Empty input + a previously
+   // pushed prompt: fall through and emit a clearing session.update.
+   if (!prompt && !vocabularyPrompt) return;
+   if (prompt === vocabularyPrompt) return;
+   vocabularyPrompt = prompt;
+   if (prompt) {
+     log.debug?.(`[openai-transcription] vocabulary prompt set (${keywords.length} terms, ${prompt.length} chars)`);
+   } else {
+     log.debug?.(`[openai-transcription] vocabulary prompt cleared`);
+   }
+   if (!socket || !configured) return;
+   socket.send(JSON.stringify({
+     type: "session.update",
+     session: {
+       type: "transcription",
+       audio: {
+         input: {
+           transcription: {
+             model: options.openaiTranscriptionModel,
+             prompt: vocabularyPrompt,
+           },
+         },
+       },
+     },
+   }));
+ },
  stop: () => {
    if (!socket || !configured) return;
    // If server-side VAD already auto-committed (or no audio was sent), skip the manual
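The new `setSessionContext` path above calls `buildTranscriptionVocabularyPrompt` from `whiteboard-keywords.js`, whose implementation is not shown in this diff. As a rough illustration only (assumed name, behavior, and length cap, not the package's actual code), such a helper might join deduplicated staging keywords into a short prompt that biases the Realtime transcriber toward on-canvas vocabulary:

```javascript
// Hypothetical sketch: turn staging keywords into a transcription bias
// prompt. Dedupe, drop non-strings and blanks, and cap the prompt length
// so it stays a lightweight hint rather than a full transcript seed.
function buildVocabularyPromptSketch(keywords, maxChars = 600) {
  const terms = [...new Set(keywords.filter((k) => typeof k === "string" && k.trim()))];
  if (terms.length === 0) return ""; // empty prompt means "clear any prior bias"
  let prompt = `Vocabulary likely to be spoken: ${terms.join(", ")}.`;
  if (prompt.length > maxChars) prompt = prompt.slice(0, maxChars);
  return prompt;
}
```

Returning `""` for no keywords matters for the diff's clearing logic: an empty prompt after a previously pushed one triggers a `session.update` that removes the bias.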
package/src/server.js CHANGED
@@ -1,6 +1,7 @@
  import { createHash } from "node:crypto";
- import { appendFileSync } from "node:fs";
+ import { appendFileSync, mkdirSync } from "node:fs";
  import { createServer as createHttpServer } from "node:http";
+ import os from "node:os";
  import path from "node:path";
  import { fileURLToPath } from "node:url";

@@ -16,8 +17,10 @@ import {
  } from "./agent-provider.js";
  import { createMoonshineTranscription as createDefaultMoonshineTranscription } from "./moonshine-transcription.js";
  import { createOpenAITranscription as createDefaultOpenAITranscription } from "./openai-transcription.js";
+ import { validateAgentInstructions } from "./settings-store.js";
  import { broadcast, createWhiteboardSession } from "./whiteboard-session.js";
  import { detectMalformedLayoutWarnings, normalizeWhiteboardElements } from "./whiteboard-elements.js";
+ import { extractWhiteboardKeywords } from "./whiteboard-keywords.js";
  import { applyWhiteboardEditOperations, formatLineNumberedWhiteboard } from "./whiteboard-tools.js";

  const __dirname = path.dirname(fileURLToPath(import.meta.url));
@@ -66,17 +69,33 @@ export async function startServer(options) {

  app.post("/api/session/reset", (_req, res) => {
    state.reset();
+   transcription.setSessionContext({ keywords: [] });
    broadcast(wss, { type: "whiteboard:update", elements: state.elements });
    res.json({ ok: true });
  });

- app.post("/api/preso/start", (req, res) => {
+ app.post("/api/preso/start", async (req, res) => {
    const { stagingElements, stagingScreenshot } = req.body ?? {};
    if (!Array.isArray(stagingElements)) {
      return res.status(400).json({ error: "stagingElements (array) is required." });
    }
+   // Snapshot the user's free-form Agent instructions at start so the cached
+   // system-prompt prefix stays stable for the whole preso. Edits made to
+   // the textarea after Start Preso land on disk but only take effect on the
+   // next Start Preso.
+   let settings;
+   try {
+     settings = options.settingsStore ? await options.settingsStore.load() : null;
+     validateAgentInstructions(settings?.agentInstructions);
+   } catch (error) {
+     return res.status(400).json({ error: error.message });
+   }
+   const agentInstructions = typeof settings?.agentInstructions === "string" ? settings.agentInstructions : "";
    const primerMessage = buildStagingPrimerMessage({ stagingElements, stagingScreenshot });
-   state.startPreso({ primerMessage });
+   const keywords = extractWhiteboardKeywords(stagingElements);
+   console.log(`[autopreso] preso/start: ${keywords.length} staging keyword(s) for transcription bias`);
+   transcription.setSessionContext({ keywords });
+   state.startPreso({ primerMessage, agentInstructions });
    state.startWarmupLoop({
      runOnce: ({ attempt }) =>
        runWhiteboardWarmupOnce({
@@ -109,6 +128,7 @@ export async function startServer(options) {

  app.post("/api/preso/back-to-staging", (_req, res) => {
    state.backToStaging();
+   transcription.setSessionContext({ keywords: [] });
    broadcast(wss, { type: "mode", mode: state.mode });
    res.json({ ok: true });
  });
@@ -189,7 +209,7 @@ export async function startServer(options) {
    });
  });

- await new Promise((resolve) => httpServer.listen(options.port, options.host, resolve));
+ await new Promise((resolve) => httpServer.listen(options.port, options.host, () => resolve(undefined)));
  const address = httpServer.address();
  const port = typeof address === "object" && address ? address.port : options.port;
  return {
@@ -203,6 +223,8 @@ export async function startServer(options) {
  async function createTranscriptionManager({ options, wss, queueTranscript }) {
  let current = null;
  let label = "";
+ let sessionContext = null;
+ let hasSessionContext = false;

  const sendTranscript = (message) => broadcast(wss, message);

@@ -243,15 +265,16 @@ async function createTranscriptionManager({ options, wss, queueTranscript }) {
  const factoryOptions = buildOptionsForFactory(settings);
  const factory = pickFactory(settings);
  label = newLabel;
- options.onStatus?.(`Loading ${label} transcription model...`);
+ options.onStatus?.(`Preparing ${label} transcription model...`);
  current = factory({
    sendTranscript,
    queueTranscript,
    options: factoryOptions,
    env: factoryOptions.env,
  });
+ if (hasSessionContext) current.setSessionContext?.(sessionContext);
  await current.ready();
- options.onStatus?.(`${label} transcription model is ready.`);
+ options.onStatus?.(`${label} transcription model ready.`);
  }

  await applyCurrent();
@@ -260,6 +283,11 @@ async function createTranscriptionManager({ options, wss, queueTranscript }) {
  sendAudio: (audio) => current?.sendAudio(audio),
  stop: () => current?.stop(),
  close: () => current?.close(),
+ setSessionContext: (ctx) => {
+   sessionContext = ctx;
+   hasSessionContext = true;
+   current?.setSessionContext?.(ctx);
+ },
  getLabel: () => label,
  applyCurrent,
  };
@@ -306,7 +334,7 @@ export async function runWhiteboardAgent({ transcript, state, wss, options, gene
  // are text-only across these APIs. This keeps the staging context as a
  // first-class system instruction rather than a stale early user message.
  const primerText = extractPrimerText(state.agentHistory?.[0]);
- const effectiveSystem = buildEffectiveSystemPrompt(baseSystem, primerText);
+ const effectiveSystem = buildEffectiveSystemPrompt(baseSystem, primerText, state.agentInstructions);
  const messages = primerText ? reshapeMessagesForCodex(rawMessages) : rawMessages;
  options.onAgentEvent?.({ type: "model:start", transcript, system: effectiveSystem, messages, timestamp: new Date().toISOString() });
  const codexInstructions = agentProvider.provider === "codex" ? effectiveSystem : null;
@@ -475,7 +503,7 @@ export async function runWhiteboardWarmupOnce({ state, options, attempt = 1, gen
    ? resolveAgentProviderFromSettings({ settings: await options.settingsStore.load(), env: options.env ?? process.env })
    : defaultWhiteboardAgentProvider(options));
  const primerText = extractPrimerText(state.agentHistory[0]);
- const effectiveSystem = buildEffectiveSystemPrompt(baseSystem, primerText);
+ const effectiveSystem = buildEffectiveSystemPrompt(baseSystem, primerText, state.agentInstructions);

  // Each warmup attempt sends the IDENTICAL prefix [primer, WARMUP_USER_MESSAGE]
  // so attempt N hits the cache that attempt N-1 wrote. We must NOT mutate
@@ -566,8 +594,22 @@ function summarizeAgentResult(result) {
  );
  }

- const CACHE_USAGE_LOG_PATH = process.env.AUTOPRESO_CACHE_LOG ?? path.join(process.cwd(), "autopreso-cache.log");
- const DEBUG_LOG_PATH = process.env.AUTOPRESO_DEBUG_LOG ?? path.join(process.cwd(), "autopreso-debug.log");
+ const DEFAULT_LOG_DIR = path.join(os.homedir(), ".config", "autopreso", "logs");
+ const CACHE_USAGE_LOG_PATH = process.env.AUTOPRESO_CACHE_LOG ?? path.join(DEFAULT_LOG_DIR, "cache.log");
+ const DEBUG_LOG_PATH = process.env.AUTOPRESO_DEBUG_LOG ?? path.join(DEFAULT_LOG_DIR, "debug.log");
+
+ let logDirsEnsured = false;
+ function ensureLogDirs() {
+   if (logDirsEnsured) return;
+   for (const file of [CACHE_USAGE_LOG_PATH, DEBUG_LOG_PATH]) {
+     try {
+       mkdirSync(path.dirname(file), { recursive: true });
+     } catch {
+       // Best effort; the appendFileSync call below will surface a real failure.
+     }
+   }
+   logDirsEnsured = true;
+ }

  function summarizeMessageForDump(message) {
    if (typeof message?.content === "string") {
@@ -593,7 +635,9 @@ function summarizeMessageForDump(message) {
    return { role: message?.role, content: message?.content };
  }

- export function dumpAgentRequest(label, { system, messages, instructions, primerText } = {}) {
+ export function dumpAgentRequest(label, args) {
+   const { system, messages, instructions, primerText } = args ?? {};
+   ensureLogDirs();
  try {
    const record = {
      ts: new Date().toISOString(),
@@ -617,6 +661,7 @@ export function dumpAgentRequest(label, { system, messages, instructions, primer
  }

  export function dumpToolCall(toolName, input, sceneIds, result) {
+   ensureLogDirs();
  try {
    const record = {
      ts: new Date().toISOString(),
@@ -681,11 +726,7 @@ function toolDefinitionFingerprintInput(tools) {
  export function logAgentUsage(label, result, extras = {}) {
  const { input, cached, output, reasoning } = extractAgentUsage(result);
  const cachePct = input > 0 ? Math.round((cached / input) * 100) : 0;
- const fingerprintsSuffix = extras.fingerprints
-   ? ` system=${extras.fingerprints.system} primer=${extras.fingerprints.primer} tools=${extras.fingerprints.tools}`
-   : "";
- const line = `[cache] ${label.padEnd(7)} input=${input} cached=${cached} (${cachePct}%) output=${output}${reasoning ? ` reasoning=${reasoning}` : ""}${fingerprintsSuffix}`;
- console.log(line);
+ ensureLogDirs();
  try {
    const record = {
      ts: new Date().toISOString(),
@@ -695,6 +736,7 @@ export function logAgentUsage(label, result, extras = {}) {
736
  cachePct,
696
737
  output,
697
738
  reasoning,
739
+ rawUsage: result?.usage ?? null,
698
740
  ...extras,
699
741
  };
700
742
  appendFileSync(CACHE_USAGE_LOG_PATH, JSON.stringify(record) + "\n");
@@ -719,9 +761,16 @@ function createWhiteboardAgentProviderOptions(agentProvider, effectiveSystem) {
719
761
  };
720
762
  }
721
763
 
722
- export function buildEffectiveSystemPrompt(systemPrompt, primerText) {
723
- if (!primerText) return systemPrompt;
724
- return `${systemPrompt}\n\n${primerText}`;
764
+ export function buildEffectiveSystemPrompt(systemPrompt, primerText, userInstructions = "") {
765
+ let result = systemPrompt;
766
+ const trimmedUserInstructions = typeof userInstructions === "string" ? userInstructions.trim() : "";
767
+ if (trimmedUserInstructions) {
768
+ result = `${result}\n\nUser instructions:\n${trimmedUserInstructions}`;
769
+ }
770
+ if (primerText) {
771
+ result = `${result}\n\n${primerText}`;
772
+ }
773
+ return result;
725
774
  }
726
775
 
727
776
  export function extractPrimerText(primerMessage) {
@@ -762,7 +811,7 @@ function withTimeout(promise, timeoutMs, message) {
762
811
  return Promise.race([promise, timeoutPromise]).finally(() => clearTimeout(timeout));
763
812
  }
764
813
 
765
- export function buildWhiteboardAgentMessages({ agentHistory, elements, latestScreenshot, transcript }) {
814
+ export function buildWhiteboardAgentMessages({ agentHistory, elements, latestScreenshot = null, transcript }) {
766
815
  return [
767
816
  ...agentHistory,
768
817
  { role: "user", content: formatSpeakerTurn(transcript) },
@@ -872,6 +921,7 @@ CRITICAL: one tool call per turn.
872
921
  - If you only need to move the viewport (no edits), pass just viewport. If you only need to edit (no viewport change), pass just operations. If you need both, pass both.
873
922
 
874
923
  You receive a screenshot of the audience's CURRENT VIEWPORT (not the entire infinite canvas) on each turn. Use it to verify your edits actually rendered well: look for clipped labels, overlapping shapes, arrows that miss their targets, and check that the right region is visible. The line-numbered text content is authoritative for positions; the screenshot is for visual sanity checking.
924
+ Attached images (both the staging primer and the per-turn viewport screenshot) are downscaled 2x in each dimension (4x fewer pixels) to save tokens. Do NOT read pixel dimensions off the image as if they were the canvas's real size; trust the line-numbered text for coordinates and only use the image for visual sanity checks.
875
925
  The audience's viewport is whatever you last set it to. They cannot see anything outside it. So:
876
926
  - After every meaningful canvas update, pass viewport with action "scroll_to_content" AND a focus_ids list naming the 1-5 elements that represent the active talking point. The viewport will center on exactly those IDs. Pass the IDs of what the speaker is talking about RIGHT NOW, not the whole diagram.
877
927
  - When the speaker shifts topic to a different region of the canvas, send a new whiteboard_apply with viewport scroll_to_content and the new region's focus_ids.
@@ -3,6 +3,8 @@ import path from "node:path";

  import { readCodexCliAuthSync } from "./codex-auth.js";

+ export const MAX_AGENT_INSTRUCTIONS_CHARS = 100_000;
+
  export const DEFAULT_SETTINGS = Object.freeze({
    agent: {
      provider: "openai",
@@ -18,6 +20,7 @@ export const DEFAULT_SETTINGS = Object.freeze({
    apiKeys: {
      openai: "",
    },
+   agentInstructions: "",
  });

  export function createSettingsStore({ filePath, env = process.env, readCodexAuth = readCodexCliAuthSync }) {
@@ -56,6 +59,7 @@ export function createSettingsStore({ filePath, env = process.env, readCodexAuth

    async function save(partial) {
      if (!cached) await load();
+     validateAgentInstructions(partial?.agentInstructions);
      cached = deepMerge(cached, partial);
      await writeToDisk(cached);
      return cached;
@@ -135,3 +139,9 @@ function trimOrEmpty(value)
    if (typeof value !== "string") return "";
    return value.trim();
  }
+
+ export function validateAgentInstructions(value) {
+   if (typeof value === "string" && value.length > MAX_AGENT_INSTRUCTIONS_CHARS) {
+     throw new Error(`Agent instructions must be ${MAX_AGENT_INSTRUCTIONS_CHARS} characters or fewer.`);
+   }
+ }
@@ -1,4 +1,4 @@
- export function createTranscriptTurnQueue({ runTurn, debounceMs = 150, isReady = () => true }) {
+ export function createTranscriptTurnQueue({ runTurn, debounceMs = 150, isReady = (_text) => true }) {
    let running = false;
    let buffered = [];
    let current = Promise.resolve();
@@ -0,0 +1,43 @@
+ const MIN_TERM_LENGTH = 3;
+ const DEFAULT_MAX_PROMPT_CHARS = 500;
+ const PROMPT_PREFIX = "Domain vocabulary that may appear: ";
+ const PROMPT_SUFFIX = ".";
+
+ export function extractWhiteboardKeywords(elements) {
+   if (!Array.isArray(elements)) return [];
+   const seen = new Map();
+
+   for (const element of elements) {
+     if (!element || typeof element !== "object") continue;
+     const sources = [];
+     if (element.type === "text" && typeof element.text === "string") {
+       sources.push(element.text);
+     }
+     if (element.label && typeof element.label.text === "string") {
+       sources.push(element.label.text);
+     }
+     for (const source of sources) {
+       for (const line of source.split(/\r?\n/)) {
+         const term = line.trim();
+         if (term.length < MIN_TERM_LENGTH) continue;
+         if (!/[a-zA-Z]/.test(term)) continue;
+         const key = term.toLowerCase();
+         if (!seen.has(key)) seen.set(key, term);
+       }
+     }
+   }
+
+   return [...seen.values()].sort((a, b) => b.length - a.length);
+ }
+
+ export function buildTranscriptionVocabularyPrompt(keywords, { maxChars = DEFAULT_MAX_PROMPT_CHARS } = {}) {
+   if (!Array.isArray(keywords) || keywords.length === 0) return "";
+   let body = "";
+   for (const term of keywords) {
+     const next = body.length === 0 ? term : `${body}, ${term}`;
+     if (PROMPT_PREFIX.length + next.length + PROMPT_SUFFIX.length > maxChars) continue;
+     body = next;
+   }
+   if (!body) return "";
+   return `${PROMPT_PREFIX}${body}${PROMPT_SUFFIX}`;
+ }
@@ -40,6 +40,11 @@ export function createWhiteboardSession({ options, wss, runAgent }) {
    agentBusy: false,
    warmupBusy: false,
    latestScreenshot: undefined,
+   // Snapshot of the user's free-form "Agent instructions" textarea taken at
+   // /api/preso/start. Frozen for the duration of the preso so the cached
+   // system-prompt prefix the warmup loop primes stays stable; mid-preso edits
+   // to the textarea only take effect on the next Start Preso.
+   agentInstructions: "",
    warmupPromise: Promise.resolve(),
    // Snapshot of the warmup loop state, also broadcast to clients via WS.
    warmupState: { state: "idle", attempt: 0, maxAttempts: DEFAULT_WARMUP_MAX_ATTEMPTS },
@@ -102,11 +107,12 @@ export function createWhiteboardSession({ options, wss, runAgent }) {
    state.agentHistory = [];
    state.latestScreenshot = undefined;
  };
- state.startPreso = ({ primerMessage }) => {
+ state.startPreso = ({ primerMessage, agentInstructions = "" }) => {
    state.mode = "live";
    state.elements = seedElements();
    state.latestScreenshot = undefined;
    state.agentHistory = [primerMessage];
+   state.agentInstructions = typeof agentInstructions === "string" ? agentInstructions : "";
    state.warmupPromise = Promise.resolve();
    state.canvasDirtyForAgent = false;
    // Reset warmup state for this preso. The startWarmupLoop call that follows