@sesamespace/hivemind 0.10.0 → 0.11.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (52)
  1. package/.pnpmrc.json +1 -0
  2. package/AUTO-DEBUG-DESIGN.md +267 -0
  3. package/AUTOMATIC-MEMORY-MANAGEMENT.md +109 -0
  4. package/DASHBOARD-PLAN.md +206 -0
  5. package/MEMORY-ENHANCEMENT-PLAN.md +211 -0
  6. package/TOOL-USE-DESIGN.md +173 -0
  7. package/dist/{chunk-FBQBBAPZ.js → chunk-4C6B2AMB.js} +2 -2
  8. package/dist/{chunk-FK6WYXRM.js → chunk-4YXOQGQC.js} +2 -2
  9. package/dist/{chunk-IXBIAX76.js → chunk-K6KL2VD6.js} +2 -2
  10. package/dist/{chunk-IJRAVHQC.js → chunk-LWJCKTQP.js} +51 -11
  11. package/dist/chunk-LWJCKTQP.js.map +1 -0
  12. package/dist/{chunk-BHCDOHSK.js → chunk-LYL5GG2F.js} +3 -3
  13. package/dist/{chunk-M3A2WRXM.js → chunk-OB6OXLPC.js} +430 -2
  14. package/dist/chunk-OB6OXLPC.js.map +1 -0
  15. package/dist/{chunk-DPLCEMEC.js → chunk-ZA4NWNS6.js} +2 -2
  16. package/dist/commands/fleet.js +3 -3
  17. package/dist/commands/init.js +3 -3
  18. package/dist/commands/service.js +1 -1
  19. package/dist/commands/start.js +3 -3
  20. package/dist/commands/watchdog.js +3 -3
  21. package/dist/dashboard.html +100 -60
  22. package/dist/index.js +2 -2
  23. package/dist/main.js +7 -7
  24. package/dist/start.js +1 -1
  25. package/docs/TOOL-PARITY-PLAN.md +191 -0
  26. package/package.json +23 -24
  27. package/src/memory/dashboard-integration.ts +295 -0
  28. package/src/memory/index.ts +187 -0
  29. package/src/memory/performance-test.ts +208 -0
  30. package/src/memory/processors/agent-sync.ts +312 -0
  31. package/src/memory/processors/command-learner.ts +298 -0
  32. package/src/memory/processors/memory-api-client.ts +105 -0
  33. package/src/memory/processors/message-flow-integration.ts +168 -0
  34. package/src/memory/processors/research-digester.ts +204 -0
  35. package/test-caitlin-access.md +11 -0
  36. package/dist/chunk-IJRAVHQC.js.map +0 -1
  37. package/dist/chunk-M3A2WRXM.js.map +0 -1
  38. package/install.sh +0 -162
  39. package/packages/memory/Cargo.lock +0 -6480
  40. package/packages/memory/Cargo.toml +0 -21
  41. package/packages/memory/src/src/context.rs +0 -179
  42. package/packages/memory/src/src/embeddings.rs +0 -51
  43. package/packages/memory/src/src/main.rs +0 -887
  44. package/packages/memory/src/src/promotion.rs +0 -808
  45. package/packages/memory/src/src/scoring.rs +0 -142
  46. package/packages/memory/src/src/store.rs +0 -460
  47. package/packages/memory/src/src/tasks.rs +0 -321
  48. /package/dist/{chunk-FBQBBAPZ.js.map → chunk-4C6B2AMB.js.map} +0 -0
  49. /package/dist/{chunk-FK6WYXRM.js.map → chunk-4YXOQGQC.js.map} +0 -0
  50. /package/dist/{chunk-IXBIAX76.js.map → chunk-K6KL2VD6.js.map} +0 -0
  51. /package/dist/{chunk-BHCDOHSK.js.map → chunk-LYL5GG2F.js.map} +0 -0
  52. /package/dist/{chunk-DPLCEMEC.js.map → chunk-ZA4NWNS6.js.map} +0 -0
package/.pnpmrc.json ADDED
@@ -0,0 +1 @@
1
+ {"onlyBuiltDependencies":["esbuild","better-sqlite3"]}
@@ -0,0 +1,267 @@
1
+ # Auto-Debug Agent — Design Doc
2
+
3
+ ## Vision
4
+
5
+ A Hivemind background processor that watches agent logs, detects errors, diagnoses root causes, and submits fixes as PRs — creating a self-healing codebase. Any fleet agent (e.g. Caitlin) can run this processor to continuously improve the system while using it.
6
+
7
+ ## How It Works
8
+
9
+ ```
10
+ Logs → Detect → Deduplicate → Diagnose → Fix → PR → Verify
11
+ ```
12
+
13
+ ### 1. Log Watcher (`log-watcher` processor)
14
+
15
+ Tails all Hivemind log files:
16
+ - `/tmp/hivemind-agent.log`
17
+ - `/tmp/hivemind-error.log`
18
+ - `/tmp/hivemind-watchdog.log`
19
+ - `/tmp/hivemind-memory.log`
20
+ - `/tmp/hivemind-memory-error.log`
21
+
22
+ **Detection rules:**
23
+ - Stack traces (multiline, starting with `Error:` or `at ...`)
24
+ - Uncaught exceptions / unhandled rejections
25
+ - Repeated warnings (same message 3+ times in 5 min)
26
+ - Process crashes (exit codes ≠ 0)
27
+ - Health check failures logged by watchdog
28
+ - Memory daemon errors
29
+ - Sesame connection failures
30
+
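The repeated-warning rule above (same message 3+ times in 5 minutes) can be sketched as a small sliding-window counter. This is illustrative only; `WarningWindow` and its defaults are not the shipped implementation:

```typescript
// Sliding-window repeat detector: fires once the same message has
// appeared `threshold` times within `windowMs`.
class WarningWindow {
  private seen = new Map<string, number[]>(); // message -> timestamps (ms)

  constructor(
    private threshold = 3,
    private windowMs = 5 * 60 * 1000,
  ) {}

  // Returns true when this occurrence crosses the repeat threshold.
  record(message: string, now: number = Date.now()): boolean {
    const times = (this.seen.get(message) ?? []).filter(
      (t) => now - t < this.windowMs,
    );
    times.push(now);
    this.seen.set(message, times);
    return times.length >= this.threshold;
  }
}
```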
31
+ **Output:** `ErrorEvent` objects with:
32
+ ```typescript
33
+ interface ErrorEvent {
34
+ id: string; // hash of normalized stack trace
35
+ timestamp: Date;
36
+ source: "agent" | "watchdog" | "memory";
37
+ level: "error" | "crash" | "repeated-warning";
38
+ message: string;
39
+ stackTrace?: string;
40
+ logFile: string;
41
+ lineNumber: number;
42
+ occurrences: number; // count within dedup window
43
+ context: string[]; // surrounding log lines (±10)
44
+ }
45
+ ```
46
+
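One plausible reading of the `id` field's "hash of normalized stack trace": strip line/column numbers and directory prefixes so the same bug hashes identically across deploys and machines. The exact normalization scheme is an assumption here, not the shipped algorithm:

```typescript
import { createHash } from "node:crypto";

// Sketch: normalize a stack trace, then hash it into a stable error id.
// Normalization drops line:column suffixes and directory prefixes, so
// two occurrences of the same bug from different paths collide.
function errorId(stackTrace: string): string {
  const normalized = stackTrace
    .split("\n")
    .map((line) =>
      line
        .replace(/:\d+(?::\d+)?/g, "") // strip :line:col
        .replace(/(\/[\w.-]+)+\//g, "") // strip directory prefixes
        .trim(),
    )
    .join("\n");
  return createHash("sha256").update(normalized).digest("hex").slice(0, 16);
}
```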
47
+ ### 2. Deduplication & Prioritization
48
+
49
+ Not every error deserves a PR. The processor maintains a local error registry:
50
+
51
+ ```typescript
52
+ interface ErrorRegistry {
53
+ // key: error id (normalized stack hash)
54
+ errors: Map<string, {
55
+ firstSeen: Date;
56
+ lastSeen: Date;
57
+ totalOccurrences: number;
58
+ status: "new" | "investigating" | "fix-submitted" | "resolved" | "wont-fix";
59
+ prUrl?: string;
60
+ severity: number; // 0-10, auto-calculated
61
+ }>;
62
+ }
63
+ ```
64
+
65
+ **Severity scoring:**
66
+ - Crash/uncaught exception: +5
67
+ - Occurs on every startup: +3
68
+ - Frequency (>10/hour): +2
69
+ - Affects Sesame connectivity: +2
70
+ - Memory daemon related: +1
71
+ - Already has a PR open: -10 (skip)
72
+ - Marked wont-fix: -10 (skip)
73
+
74
+ **Threshold:** Only investigate errors with severity ≥ 3.
75
+
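The scoring rules above reduce to a small additive function. A sketch, with illustrative field names on `ErrorSignals` (the real registry may shape this differently):

```typescript
interface ErrorSignals {
  isCrash: boolean;        // crash or uncaught exception
  onEveryStartup: boolean;
  perHour: number;         // observed frequency
  affectsSesame: boolean;
  memoryRelated: boolean;
  hasOpenPr: boolean;
  wontFix: boolean;
}

// Additive severity per the table above, clamped to 0-10.
// Open-PR and wont-fix errors are skipped outright.
function scoreSeverity(s: ErrorSignals): number {
  if (s.hasOpenPr || s.wontFix) return 0;
  let score = 0;
  if (s.isCrash) score += 5;
  if (s.onEveryStartup) score += 3;
  if (s.perHour > 10) score += 2;
  if (s.affectsSesame) score += 2;
  if (s.memoryRelated) score += 1;
  return Math.min(score, 10);
}

const SEVERITY_THRESHOLD = 3; // only investigate at or above this
```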
76
+ ### 3. Diagnosis Engine
77
+
78
+ When an error crosses the threshold, the processor:
79
+
80
+ 1. **Locate source:** Parse stack trace → map to source files in the repo
81
+ 2. **Gather context:** Use `code-indexer` data to understand the file, function, and dependencies
82
+ 3. **Check git blame:** Was this recently changed? By whom?
83
+ 4. **Search for related:** Query `task-tracker` for related tasks, check if a fix is already in progress
84
+ 5. **Build diagnosis prompt:** Assemble all context into a structured prompt for the agent's LLM
85
+
86
+ The diagnosis is done by the agent itself (not a separate LLM call) — it feeds into the agent's normal message processing as an internal task.
87
+
88
+ ### 4. Fix Generation
89
+
90
+ Once diagnosed, the agent:
91
+
92
+ 1. Creates a feature branch: `auto-fix/{error-id-short}`
93
+ 2. Makes the code change
94
+ 3. Runs `tsup` to verify build
95
+ 4. Runs tests if available
96
+ 5. Commits with a structured message:
97
+ ```
98
+ fix(auto-debug): {short description}
99
+
100
+ Error: {original error message}
101
+ Occurrences: {count} over {timespan}
102
+ Source: {log file}
103
+
104
+ Root cause: {diagnosis}
105
+
106
+ Auto-generated by hivemind auto-debug processor
107
+ ```
108
+ 6. Pushes branch and opens PR
109
+
110
+ ### 5. PR Template
111
+
112
+ ```markdown
113
+ ## 🤖 Auto-Debug Fix
114
+
115
+ **Error:** `{error message}`
116
+ **Source:** `{file}:{line}`
117
+ **Frequency:** {count} occurrences over {timespan}
118
+ **Severity:** {score}/10
119
+
120
+ ### Diagnosis
121
+ {LLM-generated root cause analysis}
122
+
123
+ ### Fix
124
+ {Description of what changed and why}
125
+
126
+ ### Log Sample
127
+ ```
128
+ {relevant log lines}
129
+ ```
130
+
131
+ ### Verification
132
+ - [ ] Build passes (`tsup`)
133
+ - [ ] Error no longer reproduces (monitored for 30 min post-deploy)
134
+
135
+ ---
136
+ *Auto-generated by hivemind auto-debug processor. Review before merging.*
137
+ ```
138
+
139
+ ### 6. Post-Fix Monitoring
140
+
141
+ After a PR is merged and deployed:
142
+ - Watch for the same error ID in logs
143
+ - If it doesn't recur within 1 hour → mark as `resolved`
144
+ - If it recurs → reopen with additional context, bump severity
145
+
146
+ ## Architecture
147
+
148
+ ### Processor Class
149
+
150
+ ```typescript
151
+ class LogWatcher extends BackgroundProcess {
152
+ name = "log-watcher";
153
+ interval = 10_000; // check every 10s
154
+
155
+ private tailPositions: Map<string, number>; // track file offsets
156
+ private errorRegistry: ErrorRegistry;
157
+ private activeBranches: Set<string>;
158
+
159
+ async run(context: ProcessContext): Promise<ProcessResult> {
160
+ // 1. Read new log lines since last position
161
+ const newLines = this.readNewLines();
162
+
163
+ // 2. Detect errors
164
+ const errors = this.detectErrors(newLines);
165
+
166
+ // 3. Deduplicate and score
167
+ const actionable = this.processErrors(errors);
168
+
169
+ // 4. For high-severity new errors, queue investigation
170
+ for (const error of actionable) {
171
+ await this.queueInvestigation(error, context);
172
+ }
173
+
174
+ return { itemsProcessed: newLines.length, errors: [] };
175
+ }
176
+ }
177
+ ```
178
+
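The `tailPositions` / `readNewLines()` pair above amounts to per-file byte-offset tracking. A minimal sketch, with rotation handling and partial-line buffering simplified (a production tail would hold back the trailing partial line):

```typescript
import {
  openSync, readSync, fstatSync, closeSync,
  writeFileSync, appendFileSync, mkdtempSync,
} from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Remember a byte offset per file; each tick, read only what was
// appended since the last read. A shrinking file means rotation, so
// the offset resets to 0.
class LogTail {
  private offsets = new Map<string, number>();

  readNew(path: string): string[] {
    const fd = openSync(path, "r");
    try {
      const size = fstatSync(fd).size;
      let pos = this.offsets.get(path) ?? 0;
      if (size < pos) pos = 0; // truncated/rotated
      if (size === pos) return [];
      const buf = Buffer.alloc(size - pos);
      readSync(fd, buf, 0, buf.length, pos);
      this.offsets.set(path, size);
      return buf.toString("utf8").split("\n").filter((l) => l.length > 0);
    } finally {
      closeSync(fd);
    }
  }
}

// Demo against a throwaway file:
const file = join(mkdtempSync(join(tmpdir(), "tail-")), "agent.log");
writeFileSync(file, "line 1\n");
const tail = new LogTail();
tail.readNew(file); // -> ["line 1"]
appendFileSync(file, "line 2\n");
tail.readNew(file); // -> ["line 2"] only, not "line 1" again
```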
179
+ ### Integration Points
180
+
181
+ - **code-indexer:** Provides file/function context for diagnosis
182
+ - **task-tracker:** Tracks fix progress, prevents duplicate work
183
+ - **agent-sync:** In fleet mode, coordinates so multiple agents don't fix the same bug
184
+ - **Sesame:** Posts status updates to a designated channel (e.g. `#hivemind-debug`)
185
+ - **GitHub:** Opens PRs via `gh` CLI
186
+
187
+ ### Fleet Coordination
188
+
189
+ When multiple agents run this processor:
190
+ 1. Error registry is shared via Sesame vault or a shared memory namespace
191
+ 2. Lock mechanism: first agent to claim an error ID owns the investigation
192
+ 3. Other agents provide additional log samples if they see the same error
193
+ 4. Any agent can review/approve another agent's PR
194
+
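The first-claimant-wins lock in step 2 can be sketched as a compare-and-set over the shared registry. This in-memory version only shows the claim semantics; the real coordination would go through the Sesame vault or shared memory namespace:

```typescript
// First agent to claim an error id owns the investigation.
// Claims are idempotent for the existing owner.
class ErrorClaims {
  private owners = new Map<string, string>(); // errorId -> agent name

  // Returns true if `agent` now owns (or already owned) the error.
  claim(errorId: string, agent: string): boolean {
    const owner = this.owners.get(errorId);
    if (owner === undefined) {
      this.owners.set(errorId, agent);
      return true;
    }
    return owner === agent;
  }

  release(errorId: string, agent: string): void {
    if (this.owners.get(errorId) === agent) this.owners.delete(errorId);
  }
}
```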
195
+ ## Configuration
196
+
197
+ ```toml
198
+ [processors.log-watcher]
199
+ enabled = true
200
+ interval_ms = 10_000
201
+ severity_threshold = 3
202
+ log_files = [
203
+ "/tmp/hivemind-agent.log",
204
+ "/tmp/hivemind-error.log",
205
+ "/tmp/hivemind-watchdog.log",
206
+ "/tmp/hivemind-memory.log",
207
+ ]
208
+ auto_pr = true # false = diagnose only, don't submit PRs
209
+ repo = "baileydavis2026/hivemind"
210
+ branch_prefix = "auto-fix"
211
+ notify_channel = "hivemind-debug" # Sesame channel for status updates
212
+ max_concurrent_fixes = 2 # don't overwhelm with PRs
213
+ cooldown_minutes = 30 # min time between PRs for same error class
214
+ ```
215
+
216
+ ## Safety & Guardrails
217
+
218
+ 1. **Human review required:** PRs are never auto-merged. A human must approve.
219
+ 2. **Scope limits:** Auto-debug only touches files in the Hivemind repo, never system files or other projects.
220
+ 3. **Rate limiting:** Max 2 concurrent fix branches, 30-min cooldown per error class.
221
+ 4. **Severity gating:** Only acts on errors above threshold — ignores transient/cosmetic issues.
222
+ 5. **Rollback awareness:** If a fix introduces new errors, the processor detects this and comments on the PR.
223
+ 6. **No secrets in PRs:** Log context is sanitized (API keys, tokens stripped) before inclusion in PR descriptions.
224
+
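Guardrail 6 (secret stripping) can be sketched as a redaction pass over log lines before they reach a PR body. The patterns below are illustrative; a real list would need to cover every secret format the fleet actually handles:

```typescript
// Illustrative secret patterns: key=value assignments, bearer tokens,
// and OpenAI-style keys. Extend per deployment.
const SECRET_PATTERNS: RegExp[] = [
  /\b(?:api[_-]?key|token|secret|password)\s*[=:]\s*\S+/gi,
  /\bBearer\s+[A-Za-z0-9._-]+/g,
  /\bsk-[A-Za-z0-9]{16,}\b/g,
];

// Replace anything matching a secret pattern before the text is
// embedded in a PR description.
function sanitize(logLines: string[]): string[] {
  return logLines.map((line) =>
    SECRET_PATTERNS.reduce((acc, re) => acc.replace(re, "[REDACTED]"), line),
  );
}
```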
225
+ ## Implementation Plan
226
+
227
+ ### Phase 1: Log Watcher (detection only)
228
+ - Implement `LogWatcher` processor
229
+ - Error detection, deduplication, registry
230
+ - Sesame channel notifications for new errors
231
+ - Dashboard integration (error list view)
232
+
233
+ ### Phase 2: Diagnosis
234
+ - Stack trace → source mapping
235
+ - Integration with code-indexer for context
236
+ - LLM-powered root cause analysis
237
+ - Diagnosis reports posted to Sesame
238
+
239
+ ### Phase 3: Auto-Fix PRs
240
+ - Branch creation, code changes, PR submission
241
+ - Build verification
242
+ - Post-merge monitoring
243
+ - Fleet coordination (error locking)
244
+
245
+ ### Phase 4: Learning
246
+ - Track fix success rate per error pattern
247
+ - Build pattern library (common fixes for common errors)
248
+ - Skip LLM for known patterns → direct fix
249
+ - Feed insights back to MEMORY-ENHANCEMENT-PLAN
250
+
251
+ ## Success Metrics
252
+
253
+ - **Detection rate:** % of real errors caught vs. total errors in logs
254
+ - **False positive rate:** % of investigations that led to wont-fix
255
+ - **Fix success rate:** % of merged PRs that resolved the error
256
+ - **Time to fix:** From first error occurrence to PR merged
257
+ - **Regression rate:** % of fixes that introduced new errors
258
+
259
+ ## Why This Works
260
+
261
+ The fastest path to a bug-free system is having the system's own users (agents) fix bugs as they encounter them. Every agent running Hivemind becomes a contributor to Hivemind's stability. The more agents deployed, the faster bugs are found and fixed. It's a flywheel:
262
+
263
+ ```
264
+ More agents → More log coverage → More bugs found → More fixes → More stable → More agents
265
+ ```
266
+
267
+ This is the software equivalent of an immune system — the codebase develops antibodies to its own failure modes.
@@ -0,0 +1,109 @@
1
+ # Automatic Memory Management in Hivemind
2
+
3
+ ## Overview
4
+
5
+ Hivemind v0.8.13+ includes automatic memory management that runs in the background without requiring any cognitive load from agents. This system enriches every agent message with relevant context automatically.
6
+
7
+ ## Features
8
+
9
+ ### 1. Background Processors
10
+ - **Code Indexer** - Tracks file access patterns and maintains a "working set"
11
+ - **Task Tracker** - Monitors conversations for task patterns and progress
12
+ - **Research Digester** - Extracts key points from web pages and documents
13
+ - **Command Learner** - Tracks successful command patterns
14
+ - **Agent Sync** - Enables knowledge sharing between agents
15
+
16
+ ### 2. Auto-Debug System
17
+ - **Log Watcher** - Monitors agent/memory/watchdog logs for errors
18
+ - **Auto Debugger** - Automatically creates fixes for detected errors
19
+ - **Error Registry** - Tracks error patterns and prevents duplicate fixes
20
+
21
+ ### 3. Zero Configuration
22
+ All memory management features are automatically enabled when you install a new agent. No setup required.
23
+
24
+ ## How It Works
25
+
26
+ 1. **Background Processing** - Uses Mac Mini's local compute, not expensive LLM calls
27
+ 2. **Continuous Updates** - Memory is updated in real-time as the agent works
28
+ 3. **Context Building** - Relevant information is automatically included in each LLM request
29
+ 4. **Token Management** - Efficiently manages context size to stay within limits
30
+
31
+ ## Installation
32
+
33
+ The standard one-line installer sets up everything:
34
+
35
+ ```bash
36
+ npm install -g @sesamespace/hivemind
37
+ hivemind init <sesame-api-key>
38
+ hivemind service install
39
+ ```
40
+
41
+ This automatically:
42
+ - Installs the memory daemon with background processors
43
+ - Sets up the watchdog to monitor both agent and memory health
44
+ - Enables auto-debug features if configured
45
+ - Starts all services with proper launchd configuration
46
+
47
+ ## Configuration
48
+
49
+ Optional configuration in `config.toml`:
50
+
51
+ ```toml
52
+ [auto_debug]
53
+ enabled = true
54
+ log_files = ["/tmp/hivemind-agent.log", "/tmp/hivemind-memory.log"]
55
+ severity_threshold = "error"
56
+ auto_pr = true
57
+ repo = "https://github.com/baileydavis2026/hivemind"
58
+ notify_channel = "your-sesame-channel-id"
59
+ ```
60
+
61
+ ## Architecture
62
+
63
+ ```
64
+ ┌───────────────┐     ┌───────────────┐     ┌───────────────┐
65
+ │ Agent Process │────▶│ Memory Daemon │◀────│   Watchdog    │
66
+ └───────────────┘     └───────────────┘     └───────────────┘
67
+                               │
68
+                               ▼
69
+                   ┌───────────────────────┐
70
+                   │ Background Processors │
71
+                   ├───────────────────────┤
72
+                   │ • Code Indexer        │
73
+                   │ • Task Tracker        │
74
+                   │ • Research Digester   │
75
+                   │ • Command Learner     │
76
+                   │ • Agent Sync          │
77
+                   │ • Log Watcher         │
78
+                   │ • Auto Debugger       │
79
+                   └───────────────────────┘
80
+ ```
81
+
82
+ ## For Developers
83
+
84
+ To extend the memory management system:
85
+
86
+ 1. Create a new processor extending the `BackgroundProcessor` base class
87
+ 2. Register it in the `ProcessManager`
88
+ 3. The processor will automatically run in the background
89
+
90
+ Example:
91
+ ```typescript
92
+ export class MyProcessor extends BackgroundProcessor {
93
+ async initialize(): Promise<void> {
94
+ // Setup code
95
+ }
96
+
97
+ async process(): Promise<void> {
98
+ // Main processing logic (called periodically)
99
+ }
100
+ }
101
+ ```
102
+
103
+ ## Benefits
104
+
105
+ - **Zero Cognitive Load** - Agents don't need to think about memory
106
+ - **Better Context** - Relevant information is always available
107
+ - **Multi-Agent Collaboration** - Agents can share knowledge seamlessly
108
+ - **Self-Healing** - Auto-debug detects errors and proposes fixes automatically (PRs still require human review)
109
+ - **Efficient** - Uses local compute, not expensive API calls
@@ -0,0 +1,206 @@
1
+ # Hivemind Dashboard — Implementation Plan
2
+
3
+ **Goal:** Local web dashboard for debugging memory, context routing, and LLM request formation.
4
+ **Access:** `http://localhost:9485` on the Mac mini (local access only for now).
5
+ **Priority:** LLM Request Inspector first, then Memory Browser, then Context Overview.
6
+
7
+ ---
8
+
9
+ ## Phase 1: LLM Request Logger + Inspector UI
10
+
11
+ ### Backend: Request Logging
12
+
13
+ **Where:** Instrument `buildMessages()` in `prompt.ts` and `processMessage()` in `agent.ts`.
14
+
15
+ Each logged request captures:
16
+ ```typescript
17
+ interface RequestLog {
18
+ id: string; // uuid
19
+ timestamp: string; // ISO-8601
20
+ // Routing
21
+ context: string; // which context was used
22
+ contextSwitched: boolean; // explicit switch?
23
+ routingReason: string; // "pattern_match:X" | "inferred:X" | "active:X"
24
+ // Sender
25
+ channelId: string;
26
+ channelKind: "dm" | "group";
27
+ senderHandle: string;
28
+ rawMessage: string; // as received (with prefix)
29
+ // Prompt components (broken out for UI)
30
+ systemPrompt: {
31
+ identity: string; // workspace files section
32
+ l3Knowledge: string[]; // individual L3 entries
33
+ l2Episodes: Array<{
34
+ id: string;
35
+ content: string;
36
+ score: number;
37
+ timestamp: string;
38
+ context_name: string;
39
+ role: string;
40
+ }>;
41
+ contextInfo: string; // active context section
42
+ fullText: string; // complete system prompt as sent
43
+ };
44
+ conversationHistory: Array<{ role: string; content: string }>; // L1 turns included
45
+ userMessage: string; // final user message
46
+ // Response
47
+ response: {
48
+ content: string;
49
+ model: string;
50
+ latencyMs: number;
51
+ skipped: boolean; // was it __SKIP__?
52
+ };
53
+ // Config snapshot
54
+ config: {
55
+ topK: number;
56
+ model: string;
57
+ maxTokens: number;
58
+ temperature: number;
59
+ };
60
+ // Approximate token counts (char-based estimate: chars/4)
61
+ tokenEstimates: {
62
+ systemPrompt: number;
63
+ conversationHistory: number;
64
+ userMessage: number;
65
+ total: number;
66
+ };
67
+ }
68
+ ```
69
+
70
+ **Storage:** SQLite database at `data/dashboard.db`.
71
+ - Single `request_logs` table with JSON columns for complex fields.
72
+ - Auto-prune: keep the last 7 days of entries, capped at 10,000 rows (whichever limit is hit first).
73
+ - Why SQLite over ring buffer: survives restarts, queryable, minimal overhead.
74
+
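The auto-prune policy reduces to two filters: drop rows older than 7 days, then cap at the 10,000 most recent. A pure-array sketch for illustration; the real store would express this as two SQL `DELETE`s against `request_logs`:

```typescript
interface LogRow {
  id: string;
  timestamp: number; // epoch ms
}

// Keep rows no older than maxAgeMs, most recent first, at most maxRows.
function prune(
  rows: LogRow[],
  now: number,
  maxRows = 10_000,
  maxAgeMs = 7 * 24 * 3600 * 1000,
): LogRow[] {
  return rows
    .filter((r) => now - r.timestamp <= maxAgeMs)
    .sort((a, b) => b.timestamp - a.timestamp)
    .slice(0, maxRows);
}
```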
75
+ **Token estimation:** Use the chars/4 approximation. Good enough for relative sizing, and it avoids a tokenizer dependency.
76
+
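The chars/4 heuristic is a one-liner; fine for comparing section sizes, not for exact budget enforcement:

```typescript
// Rough token count: roughly 4 characters per token for English text.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);
```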
77
+ **Logging approach:** Eager logging. Serialize at request time. The overhead is minimal (~1ms for JSON.stringify) compared to LLM latency (~1-10s). Capturing the exact state at request time is more valuable than lazy reconstruction.
78
+
79
+ ### Backend: Dashboard HTTP Server
80
+
81
+ **Where:** New file `packages/runtime/src/dashboard.ts`.
82
+
83
+ Extend the existing health server (or create a sibling on port 9485):
84
+ - `GET /` — serve the SPA (single HTML file)
85
+ - `GET /api/requests` — list recent requests (paginated, filterable)
86
+ - `GET /api/requests/:id` — single request detail
87
+ - `GET /api/contexts` — proxy to memory daemon's context list
88
+ - `GET /api/contexts/:name/episodes` — proxy L2 episodes
89
+ - `GET /api/contexts/:name/l3` — proxy L3 knowledge
90
+ - `GET /api/stats` — memory stats (episode counts, last promotion, etc.)
91
+ - `DELETE /api/l3/:id` — delete a bad L3 entry (write op from day 1)
92
+ - `POST /api/l3/:id/edit` — edit L3 entry content
93
+
94
+ Bind to `127.0.0.1:9485` only.
95
+
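Express-free routing for the endpoints above needs only a small pattern matcher over `:param` segments. A sketch; the matcher, handler wiring, and response shape are illustrative, not the shipped code:

```typescript
import { createServer } from "node:http";

// Match "/api/requests/:id" style patterns; returns extracted params,
// or null when the path doesn't fit.
function matchRoute(
  pattern: string,
  path: string,
): Record<string, string> | null {
  const p = pattern.split("/");
  const u = path.split("/");
  if (p.length !== u.length) return null;
  const params: Record<string, string> = {};
  for (let i = 0; i < p.length; i++) {
    if (p[i].startsWith(":")) params[p[i].slice(1)] = decodeURIComponent(u[i]);
    else if (p[i] !== u[i]) return null;
  }
  return params;
}

const server = createServer((req, res) => {
  const url = new URL(req.url ?? "/", "http://127.0.0.1:9485");
  const detail = matchRoute("/api/requests/:id", url.pathname);
  if (req.method === "GET" && detail) {
    res.setHeader("content-type", "application/json");
    res.end(JSON.stringify({ id: detail.id })); // look up the real record here
    return;
  }
  res.statusCode = 404;
  res.end("not found");
});
// server.listen(9485, "127.0.0.1"); // loopback only, per the plan
```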
96
+ ### Frontend: Single-File SPA
97
+
98
+ **Why single file:** No build step, no React, no dependencies. Ship as one HTML file with embedded CSS/JS. Can always upgrade later.
99
+
100
+ **Layout:**
101
+ - Left sidebar: navigation (Requests, Memory, Contexts)
102
+ - Main area: content
103
+
104
+ **Request Inspector view:**
105
+ - Reverse-chronological list of requests
106
+ - Each row: timestamp, sender, context, model, latency, token estimate
107
+ - Click to expand → shows all sections:
108
+ - **Identity files** (collapsible, usually not interesting)
109
+ - **L3 Knowledge** (list of entries with metadata)
110
+ - **L2 Episodes** (with similarity scores, timestamps, source context)
111
+ - **L1 History** (conversation turns)
112
+ - **User Message** (raw with prefix)
113
+ - **Response** (with model, latency)
114
+ - **Config** (top_k, model, temperature)
115
+ - **Token breakdown** (bar chart showing proportion per section)
116
+ - Filters: by context, by sender, by time range
117
+ - Search: full-text search across messages
118
+
119
+ **Memory Browser view (Phase 2):**
120
+ - L2: searchable episode list, filterable by context/role/time
121
+ - L3: per-context knowledge entries with edit/delete buttons
122
+ - Promotion log (if we add logging for it)
123
+
124
+ **Context Overview (Phase 2):**
125
+ - List of contexts with episode counts, last active
126
+ - Active context highlighted
127
+ - Click to drill into episodes/L3
128
+
129
+ ---
130
+
131
+ ## Phase 2: Memory Browser + Context Overview
132
+
133
+ After Phase 1 is working and useful, add:
134
+ - Full L2 browsing with semantic search UI
135
+ - L3 management (view, edit, delete)
136
+ - Context explorer with stats
137
+ - Promotion history logging
138
+
139
+ ---
140
+
141
+ ## Implementation Steps (Phase 1)
142
+
143
+ ### Step 1: Request logging infrastructure
144
+ - [ ] Create `packages/runtime/src/request-logger.ts`
145
+ - SQLite setup (using better-sqlite3)
146
+ - `logRequest()` method
147
+ - `getRequests()` with pagination/filters
148
+ - `getRequest(id)` for detail view
149
+ - Auto-pruning on startup
150
+ - [ ] Add better-sqlite3 dependency
151
+
152
+ ### Step 2: Instrument the pipeline
153
+ - [ ] Modify `agent.ts` `processMessage()` to capture routing decision + timing
154
+ - [ ] Modify `prompt.ts` `buildSystemPrompt()` to return structured components (not just string)
155
+ - [ ] Log each request after LLM response arrives
156
+ - [ ] Capture config snapshot with each log entry
157
+
158
+ ### Step 3: Dashboard HTTP server
159
+ - [ ] Create `packages/runtime/src/dashboard.ts`
160
+ - Express-free: use Node's built-in `http` module (like health server)
161
+ - Serve SPA at `/`
162
+ - JSON APIs for request logs and memory proxy
163
+ - [ ] Wire into `pipeline.ts` startup
164
+
165
+ ### Step 4: Frontend SPA
166
+ - [ ] Single HTML file at `packages/runtime/src/dashboard.html`
167
+ - Vanilla JS, no framework
168
+ - CSS grid layout
169
+ - Fetch-based API calls
170
+ - Expandable request cards
171
+ - Token breakdown visualization
172
+ - Basic filtering
173
+
174
+ ### Step 5: Memory proxy + write ops
175
+ - [ ] Proxy endpoints to memory daemon for L2/L3 browsing
176
+ - [ ] DELETE/PATCH endpoints for L3 management
177
+
178
+ ---
179
+
180
+ ## Design Decisions
181
+
182
+ | Question | Decision | Rationale |
183
+ |----------|----------|-----------|
184
+ | Storage | SQLite | Survives restarts, queryable, lightweight |
185
+ | Token counting | chars/4 estimate | Good enough, no tokenizer dep |
186
+ | Logging | Eager | Captures exact state, overhead negligible vs LLM latency |
187
+ | Bind address | 127.0.0.1 only | Local access, no auth needed |
188
+ | Framework | None (vanilla) | Single HTML file, no build step |
189
+ | Read-only or read-write? | Read-write from start | Ryan will want to delete bad L3 entries immediately |
190
+ | Persist request logs? | Yes, 7 days | Need to compare across memory config changes |
191
+ | Multi-agent? | Single agent for now | Don't over-engineer, but use agent name in logs |
192
+ | Port | 9485 | Next to health port (9484), easy to remember |
193
+
194
+ ---
195
+
196
+ ## Sesame Command Fix (Bonus)
197
+
198
+ While we're in the code, fix the sender prefix issue:
199
+ - In `pipeline.ts` `startSesameLoop()`, before calling `agent.processMessage()`, strip the sender prefix for command parsing
200
+ - Or better: in `agent.ts` `handleSpecialCommand()`, strip known prefix patterns before regex matching
201
+ - This unblocks context switching, task commands, and cross-context search over Sesame
202
+
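The prefix strip can be a single regex applied before `handleSpecialCommand()` sees the message. The `[sender]: ` format below is an assumption about what Sesame prepends; adjust the pattern to the real prefix:

```typescript
// Strip a leading "[sender]: " style prefix (assumed format) so
// command parsing sees the bare message.
function stripSenderPrefix(raw: string): string {
  return raw.replace(/^\s*\[[^\]]+\]:\s*/, "");
}
```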
203
+ ---
204
+
205
+ *Created: 2026-02-28*
206
+ *Status: Ready to implement*