osborn 0.9.54 → 0.9.58

@@ -0,0 +1,129 @@
1
+ ---
2
+ name: bug-reporter
3
+ description: |
4
+ File a bug report or feature request when the user describes a problem with
5
+ Osborn itself (voice glitches, agent freezes, audio echo, session crashes,
6
+ interrupt issues) or asks for a new Osborn feature. Posts to a local agent
7
+ endpoint that hands the report off to the frontend, which writes it to
8
+ Supabase. Use whenever the user describes something wrong with Osborn —
9
+ NOT for questions about their own project code.
10
+ ---
11
+
12
+ # Bug Reporter Skill
13
+
14
+ File bug reports and feature requests from inside a voice session, without
15
+ breaking the conversation. Reports land in the dev team's Supabase table for
16
+ triage from a separate Claude Code session.
17
+
18
+ ## When to use this
19
+
20
+ Trigger when the user describes any of these (or similar):
21
+
22
+ **Bugs:**
23
+ - Voice quality issues — "the audio cut out", "I can't hear you", "you keep echoing", "you interrupted yourself"
24
+ - Agent malfunctions — "the agent froze", "it crashed", "it stopped responding", "you're stuck"
25
+ - Session issues — "session disconnected", "the room keeps closing", "I had to restart"
26
+ - Memory/state issues — "you don't remember", "you lost context"
27
+ - Interrupt problems — "you keep cutting yourself off", "the interrupt isn't working"
28
+ - Direct asks — "this is a bug", "file this", "report this", "let me know when it's fixed"
29
+
30
+ **Feature requests:**
31
+ - "I wish Osborn could…", "can you add…", "it would be nice if…", "feature request:"
32
+
33
+ ## When NOT to use
34
+
35
+ - The user has a coding question about THEIR project — that's normal research/coding work
36
+ - The user mentions an error in code they're writing — not an Osborn bug
37
+ - The user is debugging their own logs — they're working, not reporting
38
+
39
+ ## How to file
40
+
41
+ ### Step 1 — confirm with the user
42
+
43
+ Don't silently file. Say something brief like:
44
+
45
+ > "Sounds like a real bug — want me to file it so the team can dig in? I'll
46
+ > include the recent logs."
47
+
48
+ If they say yes, proceed. If you're unsure whether it's an Osborn bug at all, ask the user whether it's worth filing.
49
+
50
+ ### Step 2 — POST to the local agent endpoint
51
+
52
+ ```bash
53
+ curl -sS -X POST http://localhost:8741/report-bug \
54
+ -H "Content-Type: application/json" \
55
+ -d @- <<'JSON'
56
+ {
57
+ "type": "bug",
58
+ "severity": "medium",
59
+ "title": "Voice cuts out mid-sentence in pipeline mode",
60
+ "description": "User reported that the agent stops speaking mid-sentence and resumes 5 minutes later. Happens repeatedly. Started after migrating to user_state_changed handler (May 21, 0.9.39).",
61
+ "reproduction_notes": "Speak to the agent, then go silent — audio cuts off at a sentence boundary and won't resume until mic is muted for ~2 seconds.",
62
+ "tags": ["voice-quality", "interrupt", "echo"]
63
+ }
64
+ JSON
65
+ ```
66
+
67
+ The agent endpoint:
68
+ - Generates a `reportId`
69
+ - Tails `/workspace/osborn.log` (last 500 lines)
70
+ - Pulls the last few turns from the current JSONL session
71
+ - Sends everything to the frontend via the LiveKit data channel
72
+ - Returns `{ reportId, status: "submitted" }` to you
73
+
74
+ You don't need to attach logs yourself — the agent does that automatically.
75
+
76
+ ### Step 3 — confirm to the user
77
+
78
+ Briefly:
79
+
80
+ > "Filed. Bug `f4a2…` — the team will look. Want me to log anything else?"
81
+
82
+ Use the first 4 chars of the returned `reportId` as a short reference.
83
+
84
+ ## Choosing severity
85
+
86
+ - `critical` — voice completely unusable, session crashes immediately, data loss
87
+ - `high` — major friction (voice keeps cutting, frequent crashes, can't connect)
88
+ - `medium` — annoying but workable (echo, occasional drops, minor UI glitches)
89
+ - `low` — nice-to-have polish, edge cases, documentation gaps, feature requests
90
+
91
+ Feature requests default to `low` unless the user describes a blocked workflow. Always set `severity` explicitly, because the endpoint falls back to `medium` when it is omitted.
92
+
93
+ ## Title writing
94
+
95
+ Short, present-tense, specific. 6–10 words.
96
+
97
+ Good:
98
+ - "Voice cuts out mid-sentence in pipeline mode"
99
+ - "Agent echoes own speech as user interrupt"
100
+ - "Session orphaned after machine OOM auto-restart"
101
+
102
+ Bad:
103
+ - "voice bug" (too vague)
104
+ - "When I was talking the agent stopped responding and I had to..." (use description)
105
+
106
+ ## What NOT to include in the description
107
+
108
+ - Don't dump the full transcript — the agent attaches a `transcript_excerpt` automatically
109
+ - Don't paste log lines — the agent attaches the `log_excerpt` automatically
110
+ - Don't speculate about the fix unless the user explicitly suggested one
111
+ - Don't include the user's API keys, OAuth tokens, or PII
112
+
113
+ ## Tags vocabulary
114
+
115
+ Pick from these rough buckets (one or more):
116
+ `echo, interrupt, crash, freeze, memory, voice-quality, audio, mode-specific,
117
+ direct, pipeline, realtime, ui, sessions, fly, recall, meeting, deepgram, tts, stt`
118
+
119
+ ## Reading existing reports
120
+
121
+ You don't query, list, or close reports from inside a voice session — that's
122
+ the dev team's job from their own Claude Code session. If the user asks "is
123
+ that bug fixed yet?", say "let me check" and use the same endpoint with `GET`:
124
+
125
+ ```bash
126
+ curl -sS "http://localhost:8741/report-bug?id=${REPORT_ID}"
127
+ ```
128
+
129
+ But typically the user won't ask, and you don't need to volunteer the status.
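The skill's authoring rules (severity levels, the tag vocabulary, the 6–10 word title guideline, the 4-character short reference) can be checked mechanically before the `curl` call. A minimal sketch in JavaScript, the language of `dist/index.js`; `lintReport` and `shortRef` are hypothetical helpers that do not exist in the package. The checks are deliberately stricter than the server, which only enforces `type`, a title of at least 3 characters, `description`, and a valid `severity`.

```javascript
// Hypothetical pre-flight check for a bug-reporter payload. The rules come
// from the skill text above, not from the /report-bug endpoint.
const SEVERITIES = ['low', 'medium', 'high', 'critical'];
const TAG_VOCAB = new Set([
  'echo', 'interrupt', 'crash', 'freeze', 'memory', 'voice-quality', 'audio',
  'mode-specific', 'direct', 'pipeline', 'realtime', 'ui', 'sessions', 'fly',
  'recall', 'meeting', 'deepgram', 'tts', 'stt',
]);

function lintReport(report) {
  const problems = [];
  if (report.type !== 'bug' && report.type !== 'feature') {
    problems.push('type must be "bug" or "feature"');
  }
  // The endpoint silently defaults a missing severity to "medium",
  // so require it to be set on purpose here.
  if (!SEVERITIES.includes(report.severity)) {
    problems.push('severity must be one of ' + SEVERITIES.join(', '));
  }
  if (typeof report.description !== 'string' || report.description.trim() === '') {
    problems.push('description required');
  }
  // Title guideline: short, specific, 6-10 words.
  const words = typeof report.title === 'string'
    ? report.title.trim().split(/\s+/).filter(Boolean)
    : [];
  if (words.length < 6 || words.length > 10) {
    problems.push(`title has ${words.length} words; aim for 6-10`);
  }
  for (const tag of report.tags ?? []) {
    if (!TAG_VOCAB.has(tag)) problems.push(`tag not in vocabulary: ${tag}`);
  }
  return problems;
}

// Spoken short reference: the first 4 characters of the returned reportId.
function shortRef(reportId) {
  return reportId.slice(0, 4);
}
```

A title like "voice bug" would be accepted by the server (it is longer than 3 characters), so a check like this is the only thing that flags it as too vague.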
package/dist/index.js CHANGED
@@ -1,7 +1,7 @@
1
1
  // Load environment variables FIRST before any other imports
2
2
  import 'dotenv/config';
3
3
  import { voice, initializeLogger } from '@livekit/agents';
4
- import { Room, RoomEvent, RemoteParticipant, } from '@livekit/rtc-node';
4
+ import { Room, RoomEvent, } from '@livekit/rtc-node';
5
5
  import { AccessToken } from 'livekit-server-sdk';
6
6
  // Initialize logger before anything else
7
7
  initializeLogger({ pretty: true, level: 'info' });
@@ -14,6 +14,7 @@ import { existsSync, readdirSync, readFileSync, mkdirSync, writeFileSync, mkdtem
14
14
  import { dirname, join } from 'node:path';
15
15
  import { fileURLToPath } from 'node:url';
16
16
  import { spawn } from 'node:child_process';
17
+ import { randomUUID } from 'node:crypto';
17
18
  import { homedir, tmpdir } from 'node:os';
18
19
  import { PassThrough } from 'node:stream';
19
20
  import { createGunzip } from 'node:zlib';
@@ -189,6 +190,14 @@ const livekitState = {
189
190
  let intentionalLeave = false;
190
191
  let connectRoomHook = null;
191
192
  let leaveRoomHook = null;
193
+ // Hook for the bug-reporter skill. The /report-bug HTTP endpoint validates the
194
+ // payload + generates the reportId in the module-level handler, then delegates
195
+ // to this hook which lives in main() (where sendToFrontend, currentVoiceMode,
196
+ // and currentSession are in scope). The frontend listens for the data channel
197
+ // message type 'bug_report' and writes the row to Supabase — same architecture
198
+ // as the existing fetch-log/save-log flow so we don't ship Supabase credentials
199
+ // to the Fly machine.
200
+ let bugReportHook = null;
192
201
  function startApiServer(workingDir, port) {
193
202
  const server = createServer(async (req, res) => {
194
203
  // CORS headers for cloud frontend
@@ -332,6 +341,67 @@ function startApiServer(workingDir, port) {
332
341
  }
333
342
  return;
334
343
  }
344
+ // POST /report-bug — invoked by the bug-reporter skill (running inside Claude
345
+ // Code on this same machine) when the user describes an Osborn bug or
346
+ // requests a feature. We validate the payload, generate a reportId, and emit
347
+ // a data channel message via bugReportHook → sendToFrontend. The frontend
348
+ // owns the actual Supabase write (it already has the keys for the log-upload
349
+ // flow, no need to ship them to the Fly machine).
350
+ if (req.method === 'POST' && url.pathname === '/report-bug') {
351
+ let body = '';
352
+ req.on('data', (chunk) => { body += chunk.toString(); });
353
+ req.on('end', () => {
354
+ try {
355
+ const parsed = JSON.parse(body || '{}');
356
+ const errors = [];
357
+ if (parsed.type !== 'bug' && parsed.type !== 'feature')
358
+ errors.push('type must be "bug" or "feature"');
359
+ if (!parsed.title || typeof parsed.title !== 'string' || parsed.title.length < 3)
360
+ errors.push('title required (>= 3 chars)');
361
+ if (!parsed.description || typeof parsed.description !== 'string')
362
+ errors.push('description required');
363
+ const sev = parsed.severity || 'medium';
364
+ if (!['low', 'medium', 'high', 'critical'].includes(sev))
365
+ errors.push('severity invalid');
366
+ if (errors.length) {
367
+ res.writeHead(400, { 'Content-Type': 'application/json' });
368
+ res.end(JSON.stringify({ error: 'invalid payload', details: errors }));
369
+ return;
370
+ }
371
+ const reportId = randomUUID();
372
+ const payload = {
373
+ type: parsed.type,
374
+ severity: sev,
375
+ title: parsed.title.trim().slice(0, 200),
376
+ description: parsed.description.trim().slice(0, 8000),
377
+ reproduction_notes: typeof parsed.reproduction_notes === 'string'
378
+ ? parsed.reproduction_notes.trim().slice(0, 4000)
379
+ : undefined,
380
+ tags: Array.isArray(parsed.tags)
381
+ ? parsed.tags.filter((t) => typeof t === 'string').slice(0, 20)
382
+ : undefined,
383
+ };
384
+ res.writeHead(200, { 'Content-Type': 'application/json' });
385
+ res.end(JSON.stringify({ reportId, status: 'submitted' }));
386
+ if (bugReportHook) {
387
+ try {
388
+ bugReportHook(reportId, payload);
389
+ }
390
+ catch (e) {
391
+ console.error('❌ bugReportHook threw:', e);
392
+ }
393
+ }
394
+ else {
395
+ console.warn('⚠️ /report-bug fired but no bugReportHook registered (frontend may not receive)');
396
+ }
397
+ }
398
+ catch (e) {
399
+ res.writeHead(400, { 'Content-Type': 'application/json' });
400
+ res.end(JSON.stringify({ error: 'invalid JSON', details: e.message }));
401
+ }
402
+ });
403
+ return;
404
+ }
335
405
  // POST /restart — graceful process restart (process manager will restart)
336
406
  if (req.method === 'POST' && url.pathname === '/restart') {
337
407
  res.writeHead(200, { 'Content-Type': 'application/json' });
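The `/report-bug` handler mixes HTTP plumbing with validation. A sketch of the same rules pulled into a pure function makes them testable without a server; the name `validateBugReport` is hypothetical, since the package keeps this logic inline. One deliberate difference: a body that parses to a non-object such as `null` is rejected explicitly here, whereas in the handler `parsed.type` throws and the `catch` reports it as `invalid JSON`.

```javascript
// Sketch: /report-bug validation and normalization as a pure function.
const ALLOWED_SEVERITIES = ['low', 'medium', 'high', 'critical'];

function validateBugReport(parsed) {
  if (parsed === null || typeof parsed !== 'object') {
    return { errors: ['payload must be a JSON object'] };
  }
  const errors = [];
  if (parsed.type !== 'bug' && parsed.type !== 'feature') {
    errors.push('type must be "bug" or "feature"');
  }
  // Same as the handler: the length check runs on the untrimmed title.
  if (typeof parsed.title !== 'string' || parsed.title.length < 3) {
    errors.push('title required (>= 3 chars)');
  }
  if (!parsed.description || typeof parsed.description !== 'string') {
    errors.push('description required');
  }
  const severity = parsed.severity || 'medium';
  if (!ALLOWED_SEVERITIES.includes(severity)) errors.push('severity invalid');
  if (errors.length) return { errors };

  // Trim and cap field lengths exactly as the handler does.
  return {
    errors: [],
    payload: {
      type: parsed.type,
      severity,
      title: parsed.title.trim().slice(0, 200),
      description: parsed.description.trim().slice(0, 8000),
      reproduction_notes: typeof parsed.reproduction_notes === 'string'
        ? parsed.reproduction_notes.trim().slice(0, 4000)
        : undefined,
      tags: Array.isArray(parsed.tags)
        ? parsed.tags.filter((t) => typeof t === 'string').slice(0, 20)
        : undefined,
    },
  };
}
```

Because the length check precedes trimming in both versions, a title of three spaces passes validation and is stored as an empty string; checking `parsed.title.trim().length` instead would close that gap.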
@@ -1205,16 +1275,6 @@ async function main() {
1205
1275
  // Session-level always-allow list: paths the user has approved for this session without prompting
1206
1276
  let sessionAlwaysAllowPaths = new Set();
1207
1277
  let userState = 'listening'; // Track user speech state for queue safety
1208
- // Self-echo guard for the TTS interrupt below. Updated by the
1209
- // ActiveSpeakersChanged listener registered near the other room.on(...) handlers.
1210
- // user_state_changed carries NO speaker identity (verified against the SDK type
1211
- // — UserStateChangedEvent has only oldState/newState/createdAt), so a separate
1212
- // remote-speaker timestamp is the only way to distinguish "real user spoke" from
1213
- // "agent's own TTS echoed through the mic". Independent producer: rtc-node
1214
- // emits activeSpeakersChanged from server WebRTC audio-level reports
1215
- // (room.js:213), with NO reference to AgentSession or STT — so there's no
1216
- // dependency loop with user_state_changed's STT-driven producer.
1217
- let lastRemoteSpeakerAt = 0;
1218
1278
  let currentVoiceMode = voiceMode; // Track active voice mode for data handlers
1219
1279
  let currentProvider = realtimeConfig.provider; // Track active realtime provider
1220
1280
  // Authenticated Supabase userId from participant metadata. Used to scope
@@ -2062,18 +2122,22 @@ async function main() {
2062
2122
  minDelay: 500, // Wait 500ms after STT commits before generating reply
2063
2123
  maxDelay: 2000, // Force end-of-turn after 2s to prevent hangs
2064
2124
  },
2065
- // Echo-driven false-interrupt protection at the SDK level (1.2.x has these knobs,
2066
- // we just never set them; the defaults are minDuration:500ms / minWords:0, which let
2067
- // through every short echo blip). Both knobs gate the SDK's internal
2068
- // interruptByAudioActivity() (agent_activity.js runs on Deepgram interim
2069
- // transcripts AND speechDuration updates), which is the path that was firing
2070
- // even after our user_state_changed handler skipped the trigger.
2125
+ // 0.9.57: bump falseInterruptionTimeout from default 2000ms → 3000ms.
2126
+ // This is the silence-after-interrupt window the SDK waits before
2127
+ // emitting agentFalseInterruption + resuming. Extending it gives the
2128
+ // user a fuller breath between low-level audio activity moments to
2129
+ // accumulate a clean silence, which helps when echo or ambient noise
2130
+ // keeps resetting the 2s window. Other tunables in this same block
2131
+ // (NOT changed yet — try the timeout first, escalate if needed):
2132
+ // - minDuration (default 500ms) — minimum sustained speech to count
2133
+ // - minWords (default 0) — minimum word count in interim transcript
2134
+ // - enabled (default true) — kept ON (auto-interrupt path active)
2135
+ // - resumeFalseInterruption (default true) — auto-resume kept ON
2136
+ // - discardAudioIfUninterruptible (default true)
2071
2137
  interruption: {
2072
- minDuration: 750, // 500 → 750: require 750ms of sustained audio activity
2073
- minWords: 2, // 0 → 2: require ≥2 transcript words (filters single-word echo blips)
2074
- // SDK defaults kept: enabled=true, resumeFalseInterruption=true,
2075
- // falseInterruptionTimeout=2000ms (that 2s timer is what resumed your audio
2076
- // exactly where it stopped — confirmed working as designed).
2138
+ falseInterruptionTimeout: 3000, // 2000 → 3000 (extra second of silence before resume)
2139
+ minDuration: 1000, // 500 → 1000 (need 1s sustained speech to count)
2140
+ minWords: 3, // 0 → 3 (interim transcript needs ≥3 words)
2077
2141
  },
2078
2142
  },
2079
2143
  });
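To make the effect of the new numbers concrete, here is a toy model of the three interruption knobs using the values from the removed (0.9.54) and added lines. It is an illustration only, not the LiveKit SDK's implementation: it assumes both `minDuration` and `minWords` must be met before audio activity counts as an interrupt, and that an interrupt followed by unbroken silence lasting `falseInterruptionTimeout` is treated as false and resumed. The function and constant names are hypothetical.

```javascript
// Toy model of the interruption thresholds (NOT the SDK's code).
const BEFORE = { minDuration: 750, minWords: 2, falseInterruptionTimeout: 2000 };
const AFTER = { minDuration: 1000, minWords: 3, falseInterruptionTimeout: 3000 };

// Assumption: an interrupt needs enough sustained speech AND enough words.
function countsAsInterrupt({ speechMs, interimWords }, opts) {
  return speechMs >= opts.minDuration && interimWords >= opts.minWords;
}

// Assumption: silence lasting the full timeout marks the interrupt as false,
// which lets the SDK resume the paused TTS.
function resumesAsFalseInterruption(silenceMs, opts) {
  return silenceMs >= opts.falseInterruptionTimeout;
}
```

In this model an 800 ms, two-word echo blip interrupts under the old settings but not the new ones, and a 2.5 s pause resumes speech under the old timeout but not the new one, so the new settings trade a longer wait before resuming for fewer noise-triggered interrupts.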
@@ -2612,20 +2676,6 @@ async function main() {
2612
2676
  // rather than hold it indefinitely. Cancelled in ParticipantConnected.
2613
2677
  armAloneTimer();
2614
2678
  });
2615
- // Self-echo guard producer. Server WebRTC audio-level reports drive this
2616
- // (rtc-node room.js:213, ~50-100ms latency from mic onset — faster than
2617
- // Deepgram STT classification, so by the time user_state_changed fires
2618
- // lastRemoteSpeakerAt is already current). Filter speakers to RemoteParticipant
2619
- // — LocalParticipant is the agent itself and including it would defeat the
2620
- // whole point (the echo we're guarding against IS the agent's local audio).
2621
- // This is the speaker-identity filter the removed ActiveSpeakersChanged
2622
- // handler had (May 21 / c345c98) — minus the interrupt() call, since the
2623
- // user_state_changed handler now owns interrupt firing.
2624
- room.on(RoomEvent.ActiveSpeakersChanged, (speakers) => {
2625
- if (speakers.some((s) => s instanceof RemoteParticipant)) {
2626
- lastRemoteSpeakerAt = Date.now();
2627
- }
2628
- });
2629
2679
  // NOTE: previously this section also had a RoomEvent.ActiveSpeakersChanged
2630
2680
  // handler that interrupted TTS on any sustained audio activity (~50ms after
2631
2681
  // mic onset). That fired too eagerly — coughs, paper rustles, the agent's
@@ -2991,25 +3041,15 @@ async function main() {
2991
3041
  userState = ev.newState;
2992
3042
  console.log(`👤 User state: ${prev} → ${ev.newState} (agent: ${agentState})`);
2993
3043
  if (ev.newState === 'speaking' && agentState === 'speaking' && sessionVoiceMode !== 'realtime') {
2994
- const now = Date.now();
2995
- // Self-echo guard. Reject this trigger entirely if no remote
2996
- // participant has been heard speaking in the last 500ms; at that
2997
- // point user_state=speaking is almost certainly TTS bleeding through
2998
- // the mic (Deepgram correctly identifies it as "speech", we add the
2999
- // identity filter the high-level event lacks). 500ms is wider than
3000
- // the ~50-300ms gap between ActiveSpeakersChanged and user_state_changed
3001
- // firing, so a real user is comfortably inside the window.
3002
- //
3003
- // The 1s leading-edge debounce that used to live here was removed in
3004
- // 0.9.54 — the SDK-side `turnHandling.interruption.minDuration:750` +
3005
- // `minWords:2` now do the heavy lifting on echo filtering, and stacking
3006
- // an extra cooldown on top risked masking the SDK's own resume timing.
3007
- if (now - lastRemoteSpeakerAt > 500) {
3008
- console.log('🔇 Skipping interrupt — no recent remote-speaker activity (self-echo guard)');
3009
- return;
3010
- }
3044
+ // Reverted to the simple post-May-22 (c345c98 / 0.9.39) shape in 0.9.56.
3045
+ // The self-echo guard via lastRemoteSpeakerAt was defeated by the same
3046
+ // physics it was trying to filter: TTS bleeds into the user's mic →
3047
+ // LiveKit registers their participant as a remote speaker → the guard
3048
+ // passes → we interrupt anyway. Verified in osbornojure logs 2026-06-16
3049
+ // (2 of 3 interrupts that session were from this handler firing on echo).
3050
+ // Echo prevention moved to browser AEC on the publisher side.
3011
3051
  try {
3012
- console.log('🎤 user_state_changed=speaking + agent speaking + remote-speaker confirmed → interrupting TTS');
3052
+ console.log('🎤 user_state_changed=speaking + agent speaking → interrupting TTS');
3013
3053
  currentSession?.interrupt();
3014
3054
  }
3015
3055
  catch (err) {
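The revert is easiest to see as two predicates over the same inputs. A sketch with hypothetical function names (the package inlines this logic in the `user_state_changed` handler): `shouldInterruptBefore` adds the removed 500 ms self-echo guard on top of the condition that survives in 0.9.56. The guard only helps if echo never refreshes `lastRemoteSpeakerAt`; per the new comment, TTS bleeding into the user's mic made LiveKit report the user as an active remote speaker, so the guard passed anyway.

```javascript
// Sketch of the TTS-interrupt decision before and after the 0.9.56 revert.
const ECHO_GUARD_WINDOW_MS = 500;

// After the revert: interrupt whenever the user starts speaking while the
// agent is speaking, except in realtime mode.
function shouldInterruptAfter(newState, agentState, voiceMode) {
  return newState === 'speaking' && agentState === 'speaking' && voiceMode !== 'realtime';
}

// Before the revert: same condition, plus the removed self-echo guard that
// skipped the interrupt unless a remote speaker was heard within 500 ms.
function shouldInterruptBefore(newState, agentState, voiceMode, now, lastRemoteSpeakerAt) {
  if (!shouldInterruptAfter(newState, agentState, voiceMode)) return false;
  return now - lastRemoteSpeakerAt <= ECHO_GUARD_WINDOW_MS;
}
```

If echo registers as remote speech 100 ms before the state change, `shouldInterruptBefore` still returns `true`, which is the failure mode the comment describes; that is why echo handling moved to browser AEC rather than to a tighter window.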
@@ -4282,6 +4322,40 @@ async function main() {
4282
4322
  console.error('leave-room room.disconnect failed:', e);
4283
4323
  }
4284
4324
  };
4325
+ // bug-reporter skill hook — forwards a validated bug payload to the frontend
4326
+ // via the LiveKit data channel. Frontend (which holds the Supabase keys for
4327
+ // the existing log-upload flow) is responsible for the actual Supabase write.
4328
+ // Enriches with the agent-side facts the frontend doesn't already have on
4329
+ // hand (voice_mode + sandbox_id from FLY_MACHINE_ID — version it can read
4330
+ // from /health, session_id it tracks via preSelectedSessionId).
4331
+ bugReportHook = (reportId, payload) => {
4332
+ const sandboxId = process.env.FLY_MACHINE_ID || null;
4333
+ let osbornVersion;
4334
+ try {
4335
+ for (const rel of ['../package.json', '../../package.json']) {
4336
+ try {
4337
+ const pkg = JSON.parse(readFileSync(join(__dirname, rel), 'utf8'));
4338
+ if (pkg.name === 'osborn' && pkg.version) {
4339
+ osbornVersion = pkg.version;
4340
+ break;
4341
+ }
4342
+ }
4343
+ catch { /* try next */ }
4344
+ }
4345
+ }
4346
+ catch { /* version optional */ }
4347
+ console.log(`🪲 Bug report ${reportId.slice(0, 8)} (${payload.type}/${payload.severity}): ${payload.title}`);
4348
+ sendToFrontend({
4349
+ type: 'bug_report',
4350
+ reportId,
4351
+ payload,
4352
+ context: {
4353
+ voice_mode: currentVoiceMode,
4354
+ sandbox_id: sandboxId,
4355
+ osborn_version: osbornVersion,
4356
+ },
4357
+ }).catch((e) => console.error('❌ bugReportHook sendToFrontend failed:', e));
4358
+ };
4285
4359
  // Fire and forget; the retry loop keeps the process alive on its own (so
4286
4360
  // we don't need the explicit `new Promise(() => {})` keepalive anymore).
4287
4361
  // Errors that escape the retry loop should never happen, but if they do,
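The `bugReportHook` re-reads `package.json` on every report and tolerates missing or unreadable files. A sketch with the file reader injected shows the first-match-wins lookup over the two candidate paths and the shape of the `bug_report` data-channel message. `findOsbornVersion`, `buildBugReportMessage`, and the `readText` callback are hypothetical names; in the package the reader is effectively `(rel) => readFileSync(join(__dirname, rel), 'utf8')`.

```javascript
// Candidate locations, relative to the compiled file's directory, in the
// order the hook tries them.
const PACKAGE_JSON_CANDIDATES = ['../package.json', '../../package.json'];

// Returns the version from the first package.json named "osborn",
// or undefined; the version is optional in the report context.
function findOsbornVersion(readText) {
  for (const rel of PACKAGE_JSON_CANDIDATES) {
    try {
      const pkg = JSON.parse(readText(rel));
      if (pkg.name === 'osborn' && pkg.version) return pkg.version;
    } catch {
      // Unreadable file or invalid JSON: try the next candidate.
    }
  }
  return undefined;
}

// Builds the message object the hook hands to sendToFrontend.
function buildBugReportMessage(reportId, payload, { voiceMode, sandboxId, readText }) {
  return {
    type: 'bug_report',
    reportId,
    payload,
    context: {
      voice_mode: voiceMode,
      // Mirrors `process.env.FLY_MACHINE_ID || null`: empty or missing -> null.
      sandbox_id: sandboxId || null,
      osborn_version: findOsbornVersion(readText),
    },
  };
}
```

Resolving the version once at startup instead of per report would be a reasonable alternative; it avoids repeated disk reads on a value that does not change for the running process.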
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "osborn",
3
- "version": "0.9.54",
3
+ "version": "0.9.58",
4
4
  "description": "Voice AI coding assistant - local agent that connects to Osborn frontend",
5
5
  "type": "module",
6
6
  "bin": {