smartcontext-proxy 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (166)
  1. package/PLAN.md +406 -0
  2. package/PROGRESS.md +60 -0
  3. package/README.md +99 -0
  4. package/SPEC.md +915 -0
  5. package/adapters/openclaw/embedding.d.ts +8 -0
  6. package/adapters/openclaw/embedding.js +16 -0
  7. package/adapters/openclaw/embedding.ts +15 -0
  8. package/adapters/openclaw/index.d.ts +18 -0
  9. package/adapters/openclaw/index.js +42 -0
  10. package/adapters/openclaw/index.ts +43 -0
  11. package/adapters/openclaw/session-importer.d.ts +22 -0
  12. package/adapters/openclaw/session-importer.js +99 -0
  13. package/adapters/openclaw/session-importer.ts +105 -0
  14. package/adapters/openclaw/storage.d.ts +26 -0
  15. package/adapters/openclaw/storage.js +177 -0
  16. package/adapters/openclaw/storage.ts +183 -0
  17. package/dist/adapters/openclaw/embedding.d.ts +8 -0
  18. package/dist/adapters/openclaw/embedding.js +16 -0
  19. package/dist/adapters/openclaw/index.d.ts +18 -0
  20. package/dist/adapters/openclaw/index.js +42 -0
  21. package/dist/adapters/openclaw/session-importer.d.ts +22 -0
  22. package/dist/adapters/openclaw/session-importer.js +99 -0
  23. package/dist/adapters/openclaw/storage.d.ts +26 -0
  24. package/dist/adapters/openclaw/storage.js +177 -0
  25. package/dist/config/auto-detect.d.ts +3 -0
  26. package/dist/config/auto-detect.js +48 -0
  27. package/dist/config/defaults.d.ts +2 -0
  28. package/dist/config/defaults.js +28 -0
  29. package/dist/config/schema.d.ts +30 -0
  30. package/dist/config/schema.js +3 -0
  31. package/dist/context/budget.d.ts +25 -0
  32. package/dist/context/budget.js +85 -0
  33. package/dist/context/canonical.d.ts +39 -0
  34. package/dist/context/canonical.js +12 -0
  35. package/dist/context/chunker.d.ts +9 -0
  36. package/dist/context/chunker.js +148 -0
  37. package/dist/context/optimizer.d.ts +31 -0
  38. package/dist/context/optimizer.js +163 -0
  39. package/dist/context/retriever.d.ts +29 -0
  40. package/dist/context/retriever.js +103 -0
  41. package/dist/daemon/process.d.ts +6 -0
  42. package/dist/daemon/process.js +76 -0
  43. package/dist/daemon/service.d.ts +2 -0
  44. package/dist/daemon/service.js +99 -0
  45. package/dist/embedding/ollama.d.ts +11 -0
  46. package/dist/embedding/ollama.js +72 -0
  47. package/dist/embedding/types.d.ts +6 -0
  48. package/dist/embedding/types.js +3 -0
  49. package/dist/index.d.ts +2 -0
  50. package/dist/index.js +190 -0
  51. package/dist/metrics/collector.d.ts +43 -0
  52. package/dist/metrics/collector.js +72 -0
  53. package/dist/providers/anthropic.d.ts +15 -0
  54. package/dist/providers/anthropic.js +109 -0
  55. package/dist/providers/google.d.ts +13 -0
  56. package/dist/providers/google.js +40 -0
  57. package/dist/providers/ollama.d.ts +13 -0
  58. package/dist/providers/ollama.js +82 -0
  59. package/dist/providers/openai.d.ts +15 -0
  60. package/dist/providers/openai.js +115 -0
  61. package/dist/providers/types.d.ts +18 -0
  62. package/dist/providers/types.js +3 -0
  63. package/dist/proxy/router.d.ts +12 -0
  64. package/dist/proxy/router.js +46 -0
  65. package/dist/proxy/server.d.ts +25 -0
  66. package/dist/proxy/server.js +265 -0
  67. package/dist/proxy/stream.d.ts +8 -0
  68. package/dist/proxy/stream.js +32 -0
  69. package/dist/src/config/auto-detect.d.ts +3 -0
  70. package/dist/src/config/auto-detect.js +48 -0
  71. package/dist/src/config/defaults.d.ts +2 -0
  72. package/dist/src/config/defaults.js +28 -0
  73. package/dist/src/config/schema.d.ts +30 -0
  74. package/dist/src/config/schema.js +3 -0
  75. package/dist/src/context/budget.d.ts +25 -0
  76. package/dist/src/context/budget.js +85 -0
  77. package/dist/src/context/canonical.d.ts +39 -0
  78. package/dist/src/context/canonical.js +12 -0
  79. package/dist/src/context/chunker.d.ts +9 -0
  80. package/dist/src/context/chunker.js +148 -0
  81. package/dist/src/context/optimizer.d.ts +31 -0
  82. package/dist/src/context/optimizer.js +163 -0
  83. package/dist/src/context/retriever.d.ts +29 -0
  84. package/dist/src/context/retriever.js +103 -0
  85. package/dist/src/daemon/process.d.ts +6 -0
  86. package/dist/src/daemon/process.js +76 -0
  87. package/dist/src/daemon/service.d.ts +2 -0
  88. package/dist/src/daemon/service.js +99 -0
  89. package/dist/src/embedding/ollama.d.ts +11 -0
  90. package/dist/src/embedding/ollama.js +72 -0
  91. package/dist/src/embedding/types.d.ts +6 -0
  92. package/dist/src/embedding/types.js +3 -0
  93. package/dist/src/index.d.ts +2 -0
  94. package/dist/src/index.js +190 -0
  95. package/dist/src/metrics/collector.d.ts +43 -0
  96. package/dist/src/metrics/collector.js +72 -0
  97. package/dist/src/providers/anthropic.d.ts +15 -0
  98. package/dist/src/providers/anthropic.js +109 -0
  99. package/dist/src/providers/google.d.ts +13 -0
  100. package/dist/src/providers/google.js +40 -0
  101. package/dist/src/providers/ollama.d.ts +13 -0
  102. package/dist/src/providers/ollama.js +82 -0
  103. package/dist/src/providers/openai.d.ts +15 -0
  104. package/dist/src/providers/openai.js +115 -0
  105. package/dist/src/providers/types.d.ts +18 -0
  106. package/dist/src/providers/types.js +3 -0
  107. package/dist/src/proxy/router.d.ts +12 -0
  108. package/dist/src/proxy/router.js +46 -0
  109. package/dist/src/proxy/server.d.ts +25 -0
  110. package/dist/src/proxy/server.js +265 -0
  111. package/dist/src/proxy/stream.d.ts +8 -0
  112. package/dist/src/proxy/stream.js +32 -0
  113. package/dist/src/storage/lancedb.d.ts +21 -0
  114. package/dist/src/storage/lancedb.js +158 -0
  115. package/dist/src/storage/types.d.ts +52 -0
  116. package/dist/src/storage/types.js +3 -0
  117. package/dist/src/test/context.test.d.ts +1 -0
  118. package/dist/src/test/context.test.js +141 -0
  119. package/dist/src/test/dashboard.test.d.ts +1 -0
  120. package/dist/src/test/dashboard.test.js +85 -0
  121. package/dist/src/test/proxy.test.d.ts +1 -0
  122. package/dist/src/test/proxy.test.js +188 -0
  123. package/dist/src/ui/dashboard.d.ts +2 -0
  124. package/dist/src/ui/dashboard.js +183 -0
  125. package/dist/storage/lancedb.d.ts +21 -0
  126. package/dist/storage/lancedb.js +158 -0
  127. package/dist/storage/types.d.ts +52 -0
  128. package/dist/storage/types.js +3 -0
  129. package/dist/test/context.test.d.ts +1 -0
  130. package/dist/test/context.test.js +141 -0
  131. package/dist/test/dashboard.test.d.ts +1 -0
  132. package/dist/test/dashboard.test.js +85 -0
  133. package/dist/test/proxy.test.d.ts +1 -0
  134. package/dist/test/proxy.test.js +188 -0
  135. package/dist/ui/dashboard.d.ts +2 -0
  136. package/dist/ui/dashboard.js +183 -0
  137. package/package.json +38 -0
  138. package/src/config/auto-detect.ts +51 -0
  139. package/src/config/defaults.ts +26 -0
  140. package/src/config/schema.ts +33 -0
  141. package/src/context/budget.ts +126 -0
  142. package/src/context/canonical.ts +50 -0
  143. package/src/context/chunker.ts +165 -0
  144. package/src/context/optimizer.ts +201 -0
  145. package/src/context/retriever.ts +123 -0
  146. package/src/daemon/process.ts +70 -0
  147. package/src/daemon/service.ts +103 -0
  148. package/src/embedding/ollama.ts +68 -0
  149. package/src/embedding/types.ts +6 -0
  150. package/src/index.ts +176 -0
  151. package/src/metrics/collector.ts +114 -0
  152. package/src/providers/anthropic.ts +117 -0
  153. package/src/providers/google.ts +42 -0
  154. package/src/providers/ollama.ts +87 -0
  155. package/src/providers/openai.ts +127 -0
  156. package/src/providers/types.ts +20 -0
  157. package/src/proxy/router.ts +48 -0
  158. package/src/proxy/server.ts +315 -0
  159. package/src/proxy/stream.ts +39 -0
  160. package/src/storage/lancedb.ts +169 -0
  161. package/src/storage/types.ts +47 -0
  162. package/src/test/context.test.ts +165 -0
  163. package/src/test/dashboard.test.ts +94 -0
  164. package/src/test/proxy.test.ts +218 -0
  165. package/src/ui/dashboard.ts +184 -0
  166. package/tsconfig.json +18 -0
package/SPEC.md ADDED
@@ -0,0 +1,915 @@
# SmartContext Proxy — Technical Specification v2.0

## Goal

Self-configuring, provider-agnostic transparent proxy between LLM clients and providers. Operates like a network firewall — intercepts traffic, applies context optimization logic, forwards transparently. Zero-config `npx` install, works out of the box.

## Core Principle: Transparent Firewall

```
Client App ──► SmartContext Proxy ──► LLM Provider
(unchanged)    (intercept+optimize)   (any provider)
```

The client doesn't know SmartContext exists. The provider doesn't know SmartContext exists. SmartContext sits in the middle, reads the conversation, replaces bloated history with optimized context, and forwards. Like a firewall — but for tokens.

## 1. Zero-Config Bootstrap

### Install & Run
```bash
npx smartcontext-proxy
```

That's it. On first run:

1. **Auto-detect providers**: Scan env vars (`ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, `GOOGLE_API_KEY`, `OPENROUTER_API_KEY`, `OLLAMA_HOST`). Each detected key = one supported provider.
2. **Auto-select embedding**: Check for local Ollama (`localhost:11434`) → use `nomic-embed-text`. No Ollama → use built-in ONNX runtime (`@xenova/transformers` with `nomic-embed-text-v1.5`). Zero external dependencies either way.
3. **Auto-select storage**: Embedded LanceDB (zero config, writes to `~/.smartcontext/data/`). No server, no setup.
4. **Start proxy**: Listen on `localhost:4800`. Print one line: `SmartContext listening on http://localhost:4800 — providers: anthropic, openai, ollama`
5. **Generate config**: Write `~/.smartcontext/config.json` with detected settings. User can edit later if needed.

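Step 1 reduces to a table scan over known env vars. A sketch (the table mirrors the list above; function and constant names are illustrative, not the package's actual exports):

```typescript
// Env vars the bootstrap scans, per the list above
const PROVIDER_ENV_VARS: Record<string, string> = {
  ANTHROPIC_API_KEY: "anthropic",
  OPENAI_API_KEY: "openai",
  GOOGLE_API_KEY: "google",
  OPENROUTER_API_KEY: "openrouter",
  OLLAMA_HOST: "ollama",
};

// Returns the providers whose credentials are present in the environment
function detectProviders(env: Record<string, string | undefined>): string[] {
  return Object.entries(PROVIDER_ENV_VARS)
    .filter(([envVar]) => Boolean(env[envVar]))
    .map(([, provider]) => provider);
}
```
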
### Client Integration
Change one env var:
```bash
# Before
ANTHROPIC_API_URL=https://api.anthropic.com

# After
ANTHROPIC_API_URL=http://localhost:4800/v1/anthropic
```

Or for OpenAI-compatible clients:
```bash
OPENAI_BASE_URL=http://localhost:4800/v1/openai
```

### Self-Configuration via LLM
If config is ambiguous (multiple providers, unclear defaults), SmartContext can use a cheap local model (Ollama) or the cheapest available cloud model to:
- Analyze the user's typical usage pattern (from the first few intercepted requests)
- Suggest optimal tier thresholds
- Auto-tune chunk sizes based on actual conversation structure

## 2. Provider-Agnostic Architecture

### Request Flow (Firewall Model)

```
Inbound Request
        │
        ▼
┌─────────────────┐
│  Format Detect  │  ← Auto-detect: Anthropic Messages / OpenAI Chat / Google GenerateContent
│  (by URL path   │    /v1/anthropic/* → Anthropic format
│   or headers)   │    /v1/openai/*    → OpenAI format
└────────┬────────┘    /v1/google/*    → Google format
         │             /v1/ollama/*    → Ollama format
         ▼
┌─────────────────┐
│  Parse to       │  ← Normalize all formats to internal CanonicalMessage[]
│  Canonical      │    { role, content, metadata, timestamp }
│  Format         │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Context        │  ← The core logic:
│  Optimizer      │    1. Extract system prompt (keep stable for KV-cache)
│                 │    2. Keep Tier 1 (last N exchanges) verbatim
│                 │    3. Embed user query → retrieve Tier 2 from vector store
│                 │    4. Pack into token budget
│                 │    5. Append Tier 3 summaries if space remains
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Serialize to   │  ← Convert back to original provider format
│  Provider Format│    (same format as inbound — transparent proxy)
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Forward to     │  ← SSE stream-through for streaming requests
│  Provider       │    Async post-index after response complete
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Index Exchange │  ← Embed + store the full exchange (async, non-blocking)
│  (async)        │    Write raw log to disk
└─────────────────┘
```

### Provider Modules

Each provider is a module implementing `ProviderAdapter`:

```typescript
interface ProviderAdapter {
  name: string;
  detect(req: IncomingMessage): boolean;              // Can this adapter handle this request?
  parseRequest(body: any): CanonicalRequest;          // Provider format → canonical
  serializeRequest(canonical: CanonicalRequest): any; // Canonical → provider format
  forwardUrl(originalPath: string): string;           // Where to forward
  parseStreamChunk(chunk: Buffer): StreamEvent;       // Parse SSE chunks
  serializeStreamChunk(event: StreamEvent): Buffer;   // Re-serialize SSE
}
```

Built-in adapters: `anthropic`, `openai`, `google`, `ollama`.
Custom adapters: drop a `.js` file into `~/.smartcontext/adapters/`.
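
A custom adapter only has to satisfy the interface above. A trimmed sketch for a hypothetical `acme` provider — only `detect` and `forwardUrl` shown, with stand-in types (everything here is illustrative, not a shipped adapter):

```typescript
// Stand-in for the request type the full interface references
interface MiniRequest {
  url: string;
}

// Hypothetical "acme" adapter: detect by URL prefix, forward by
// stripping the proxy's /v1/acme prefix and targeting the upstream host.
const acmeAdapter = {
  name: "acme",
  detect: (req: MiniRequest) => req.url.startsWith("/v1/acme/"),
  forwardUrl: (originalPath: string) =>
    "https://api.acme.example" + originalPath.replace(/^\/v1\/acme/, ""),
};
```
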

### Canonical Message Format

```typescript
interface CanonicalMessage {
  role: 'system' | 'user' | 'assistant' | 'tool';
  content: string | ContentBlock[]; // text, images, tool_use, tool_result
  timestamp?: number;
  metadata?: {
    provider?: string;
    model?: string;
    tokens?: number;
    files?: string[];    // file paths mentioned
    tools?: string[];    // tools used
    sessionId?: string;
  };
}

interface CanonicalRequest {
  messages: CanonicalMessage[];
  systemPrompt?: string;
  model: string;
  stream: boolean;
  maxTokens?: number;
  temperature?: number;
  tools?: any[];
  rawHeaders: Record<string, string>; // preserved for forwarding
  providerAuth: string;               // API key for forwarding
}
```
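
To illustrate the normalization step, here is a hypothetical helper mapping OpenAI-style chat messages onto a trimmed version of the canonical shape (not the package's actual parser, which also handles content-block arrays, tool calls, and metadata):

```typescript
type Role = "system" | "user" | "assistant" | "tool";

// Trimmed canonical shape, string content only
interface MiniCanonicalMessage {
  role: Role;
  content: string;
  timestamp?: number;
}

// Sketch: OpenAI Chat Completions messages → canonical messages
function fromOpenAIMessages(
  messages: { role: string; content: string }[],
): MiniCanonicalMessage[] {
  return messages.map((m) => ({
    role: m.role as Role,
    content: m.content,
    timestamp: undefined, // OpenAI requests carry no per-message timestamps
  }));
}
```
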

## 3. Context Optimization Engine

### Tiered Strategy

| Tier | What | Token Budget | Source |
|------|------|--------------|--------|
| **T0** | System prompt | Unlimited (stable prefix) | From request, never modified |
| **T1** | Hot context | Last 3 exchanges verbatim | From request |
| **T2** | Warm context | Top-K retrieved chunks | Vector store (semantic search) |
| **T3** | Cold context | Session/project summaries | Pre-computed summaries |

### Token Budget Algorithm

```
available  = model_context_limit - system_prompt_tokens - response_reserve
t1_tokens  = sum(last_3_exchanges)
t3_reserve = min(500, available * 0.05)
t2_budget  = available - t1_tokens - t3_reserve

// Fill T2 greedily by relevance score until budget exhausted
for chunk in retrieved_chunks_sorted_by_score:
    if t2_used + chunk.tokens <= t2_budget:
        include(chunk)
        t2_used += chunk.tokens

// Fill T3 with remaining space
remaining = t3_reserve + (t2_budget - t2_used)
if remaining > 100:
    include(session_summaries, limit=remaining)
```
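
The pseudocode above translates almost line-for-line into TypeScript. A sketch (type and function names are illustrative, not the package's API):

```typescript
interface ScoredChunk {
  tokens: number;
  score: number;
  text: string;
}

// Tier budgeting as described above; constants mirror the spec defaults
// (500-token T3 reserve cap, greedy T2 fill by relevance score).
function packContext(
  contextLimit: number,
  systemPromptTokens: number,
  responseReserve: number,
  t1Tokens: number,
  candidates: ScoredChunk[], // assumed pre-sorted by descending score
): { t2: ScoredChunk[]; t3Budget: number } {
  const available = contextLimit - systemPromptTokens - responseReserve;
  const t3Reserve = Math.min(500, available * 0.05);
  const t2Budget = available - t1Tokens - t3Reserve;

  const t2: ScoredChunk[] = [];
  let t2Used = 0;
  for (const chunk of candidates) {
    if (t2Used + chunk.tokens <= t2Budget) { // greedy fill, skip oversized chunks
      t2.push(chunk);
      t2Used += chunk.tokens;
    }
  }
  // The reserve plus any unused T2 budget goes to summaries
  const t3Budget = t3Reserve + (t2Budget - t2Used);
  return { t2, t3Budget };
}
```
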

### Retrieval Pipeline

1. **Embed query**: User's last message → embedding vector
2. **Candidate retrieval**: Top-20 from vector store (cosine similarity)
3. **File-path boost**: If query mentions a file path, chunks containing that path get +0.2 boost. File-path inertia: if recent exchanges focused on a file, keep boosting it.
4. **Recency boost**: Current session chunks get +0.15, last-hour chunks get +0.05
5. **Dedup**: Chunks with similarity >0.92 → keep most recent
6. **Confidence gate**: If best chunk score < 0.55, skip retrieval entirely → pass original context through (graceful degradation)
7. **Min chunks**: Always include at least 3 chunks if they pass threshold 0.55
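
Steps 3, 4, and 6 amount to a re-scoring pass over the raw similarity scores. A sketch using the default boost values from the config section (dedup from step 5 omitted; names are illustrative):

```typescript
interface Candidate {
  score: number;      // raw cosine similarity
  files: string[];    // file paths mentioned in the chunk
  sessionId: string;
}

// Applies the file-path (+0.2) and recency (+0.15) boosts, then the
// 0.55 confidence gate: if even the best candidate is weak, return
// nothing so the proxy falls back to pass-through.
function rescore(
  candidates: Candidate[],
  queryFiles: string[],
  currentSession: string,
): Candidate[] {
  const boosted = candidates.map((c) => ({
    ...c,
    score:
      c.score +
      (c.files.some((f) => queryFiles.includes(f)) ? 0.2 : 0) + // file-path boost
      (c.sessionId === currentSession ? 0.15 : 0),              // recency boost
  }));
  boosted.sort((a, b) => b.score - a.score);
  return boosted.length > 0 && boosted[0].score >= 0.55 ? boosted : [];
}
```
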

### Chunking

- **Unit**: One user-assistant exchange = one chunk
- **Long responses**: Split at paragraph boundaries if >2000 tokens. Keep code blocks atomic.
- **Metadata per chunk**: `{ sessionId, timestamp, files[], tools[], summary(first 100 chars) }`
- **Overlap**: Last sentence of prev chunk prepended to next chunk

## 4. Streaming Architecture

Non-negotiable: zero perceived latency overhead.

```
Client ←──SSE──── SmartContext ←──SSE──── Provider
        (pass-through)          (pass-through)

Timeline:
0ms    Client sends request
5ms    SmartContext intercepts, optimizes context
15ms   Forward to provider (optimized request, fewer tokens)
20ms   Provider starts streaming response
20ms   SmartContext passes first SSE chunk to client
...    Stream continues transparently
done   SmartContext asynchronously indexes the exchange
```

The optimization happens BEFORE the provider call (5-15ms for embed + retrieve). The streaming response is passed through byte-for-byte with zero buffering.
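
Even with byte-for-byte pass-through, the proxy still has to recognize SSE event boundaries on the side so it can index the exchange once the stream completes. A minimal sketch of that framing (single-`data:`-line events only; real provider streams need fuller parsing):

```typescript
// Splits an SSE byte stream into complete "data: ..." payloads.
// An incomplete trailing event stays in `buffer` until more bytes arrive,
// so the proxy can forward bytes immediately and parse on the side.
class SseSplitter {
  private buffer = "";

  push(chunk: string): string[] {
    this.buffer += chunk;
    const events = this.buffer.split("\n\n");
    this.buffer = events.pop() ?? ""; // last piece may be incomplete
    return events
      .filter((e) => e.startsWith("data: "))
      .map((e) => e.slice("data: ".length));
  }
}
```
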

## 5. Storage Architecture

### Plugin System

```typescript
interface StorageAdapter {
  name: string;

  // Vector operations
  upsertChunks(chunks: Chunk[]): Promise<void>;
  search(embedding: number[], options: SearchOptions): Promise<ScoredChunk[]>;

  // Raw log operations
  appendLog(sessionId: string, exchange: Exchange): Promise<void>;
  getSessionLog(sessionId: string): Promise<Exchange[]>;

  // Summary operations
  upsertSummary(sessionId: string, summary: string): Promise<void>;
  getSummaries(sessionIds: string[]): Promise<Summary[]>;

  // Lifecycle
  initialize(config: any): Promise<void>;
  close(): Promise<void>;
}
```
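
The smallest useful implementation of the vector half of this interface is an in-memory store with cosine scoring. An illustrative sketch (not one of the shipped adapters):

```typescript
interface MiniChunk {
  id: string;
  embedding: number[];
  text: string;
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// In-memory stand-in for the upsertChunks/search half of StorageAdapter
class MemoryStore {
  private chunks: MiniChunk[] = [];

  async upsertChunks(chunks: MiniChunk[]): Promise<void> {
    for (const c of chunks) {
      // Replace any chunk with the same id, then append
      this.chunks = this.chunks.filter((x) => x.id !== c.id).concat(c);
    }
  }

  async search(
    embedding: number[],
    topK: number,
  ): Promise<{ chunk: MiniChunk; score: number }[]> {
    return this.chunks
      .map((chunk) => ({ chunk, score: cosine(embedding, chunk.embedding) }))
      .sort((a, b) => b.score - a.score)
      .slice(0, topK);
  }
}
```
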

### Built-in Adapters

| Adapter | Config | Use Case |
|---------|--------|----------|
| `lancedb` (default) | Zero-config, `~/.smartcontext/data/` | npx users, single machine |
| `opensearch` | `{ url: "http://..." }` | Teams, existing ES/OS infra |
| `qdrant` | `{ url: "http://..." }` | ML teams with Qdrant |
| `filesystem` | `{ path: "..." }` | Minimal, logs only, no vector search |

### Embedding Plugin

```typescript
interface EmbeddingAdapter {
  name: string;
  dimensions: number;
  embed(texts: string[]): Promise<number[][]>;
  initialize(config: any): Promise<void>;
}
```

| Adapter | Config | Use Case |
|---------|--------|----------|
| `onnx` (default) | Zero-config, downloads model on first run | npx users, no GPU |
| `ollama` | `{ url: "http://localhost:11434", model: "nomic-embed-text" }` | Local Ollama users |
| `remote-ollama` | `{ url: "http://beast:11434", model: "nomic-embed-text" }` | Our setup (Beast PC) |
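
Any entry in this table can be replaced by a custom implementation of `EmbeddingAdapter`. A deterministic toy adapter — character-code bucketing instead of a model, useful only for wiring tests, never for real retrieval:

```typescript
// Toy EmbeddingAdapter: buckets character codes into a fixed-size vector.
// Deterministic and dependency-free; real adapters call an embedding model.
const toyEmbedding = {
  name: "toy",
  dimensions: 8,
  async initialize(_config: unknown): Promise<void> {},
  async embed(texts: string[]): Promise<number[][]> {
    return texts.map((text) => {
      const v: number[] = new Array(8).fill(0);
      for (const ch of text) v[ch.charCodeAt(0) % 8] += 1;
      return v;
    });
  },
};
```
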

## 6. Configuration

### Auto-Generated Config (`~/.smartcontext/config.json`)

```jsonc
{
  // Auto-detected on first run, editable
  "proxy": {
    "port": 4800,
    "host": "127.0.0.1"
  },
  "providers": {
    // Auto-discovered from env vars
    "anthropic": { "apiKey": "env:ANTHROPIC_API_KEY" },
    "openai": { "apiKey": "env:OPENAI_API_KEY" }
  },
  "embedding": {
    // Auto-selected: ollama if available, else onnx
    "adapter": "ollama",
    "config": { "url": "http://localhost:11434", "model": "nomic-embed-text" }
  },
  "storage": {
    // Default: zero-config lancedb
    "adapter": "lancedb",
    "config": { "path": "~/.smartcontext/data" }
  },
  "context": {
    "tier1_exchanges": 3,            // Hot: last N exchanges kept verbatim
    "tier2_max_chunks": 10,          // Warm: max retrieved chunks
    "tier2_min_score": 0.55,         // Minimum similarity for retrieval
    "tier3_token_reserve": 500,      // Cold: tokens reserved for summaries
    "recency_boost": 0.15,           // Boost for current session chunks
    "filepath_boost": 0.20,          // Boost for file-path matches
    "dedup_threshold": 0.92,         // Near-duplicate merge threshold
    "confidence_gate": 0.55,         // Below this: skip retrieval, pass-through
    "response_reserve_tokens": 8192  // Reserve for model response
  },
  "logging": {
    "level": "info",
    "raw_logs": true,       // Store full conversation logs
    "metrics": true,        // Token savings, latency stats
    "debug_headers": false  // X-SmartContext-* headers in responses
  }
}
```
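
The `env:NAME` values are indirections, so the config file itself never stores secrets. Resolution is a one-liner (a sketch; the actual loader may differ):

```typescript
// Resolves "env:NAME" indirections like the apiKey values in the config above.
// Plain strings pass through; a missing env var resolves to undefined.
function resolveEnvRef(
  value: string,
  env: Record<string, string | undefined>,
): string | undefined {
  return value.startsWith("env:") ? env[value.slice("env:".length)] : value;
}
```
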

### Process Management

**Foreground (default)** — like any dev server:
```bash
npx smartcontext-proxy   # Starts in foreground, Ctrl+C stops
```

**Daemon mode** — runs in background:
```bash
npx smartcontext-proxy start     # Start as background daemon
npx smartcontext-proxy stop      # Stop daemon (sends SIGTERM)
npx smartcontext-proxy restart   # Restart daemon
npx smartcontext-proxy status    # Show: running/stopped, PID, uptime, stats
```

Daemon mechanics:
- PID file: `~/.smartcontext/smartcontext.pid`
- Stdout/stderr: `~/.smartcontext/logs/proxy.log`
- `start` detaches process, writes PID, exits immediately
- `stop` reads PID file, sends SIGTERM, waits for graceful shutdown (flush metrics, close storage)
- `status` checks PID alive + shows stats from metrics endpoint
- Graceful shutdown on SIGTERM/SIGINT: finish in-flight requests (5s timeout), flush index queue, close storage, remove PID file

**System service (optional)** — for always-on:
```bash
npx smartcontext-proxy install-service     # Generate systemd/launchd service file
npx smartcontext-proxy uninstall-service   # Remove service
```
- macOS: generates LaunchAgent plist in `~/Library/LaunchAgents/`
- Linux: generates systemd user service in `~/.config/systemd/user/`
- Auto-start on boot, auto-restart on crash

### CLI

```bash
npx smartcontext-proxy                     # Start foreground (Ctrl+C to stop)
npx smartcontext-proxy start               # Start daemon
npx smartcontext-proxy stop                # Stop daemon
npx smartcontext-proxy restart             # Restart daemon
npx smartcontext-proxy status              # Running? PID, uptime, savings stats
npx smartcontext-proxy install-service     # Install system service (auto-start)
npx smartcontext-proxy uninstall-service   # Remove system service
npx smartcontext-proxy --port 8080         # Custom port (foreground)
npx smartcontext-proxy --config ./my.json  # Custom config
npx smartcontext-proxy index <file>        # Index existing session logs
npx smartcontext-proxy providers           # List detected providers
npx smartcontext-proxy benchmark           # Run retrieval quality benchmark
```

## 7. Adapter System (Plugin & Play)

### How Adapters Work

Adapters are npm packages following the naming convention `smartcontext-adapter-*`:

```bash
# Install OpenSearch adapter
npm install -g smartcontext-adapter-opensearch

# Install Qdrant adapter
npm install -g smartcontext-adapter-qdrant

# SmartContext auto-discovers installed adapters
npx smartcontext-proxy
# Output: "Discovered adapters: opensearch, qdrant"
```

### Our OC Adapter

For our system, we build `smartcontext-adapter-openclaw`:

```bash
npm install -g smartcontext-adapter-openclaw
```

This adapter:
- **Storage**: Uses OpenSearch on Castle (auto-discovers from `ES_URL` env var or OC config)
- **Embedding**: Uses Beast Ollama (auto-discovers from OC agent config)
- **Sessions**: Reads OC gateway session logs for initial indexing
- **Dashboard**: Exposes metrics to dashboard-ts via existing OpenSearch indices
- **Auth**: Reads OC auth-profiles for provider API keys

Config for our setup becomes just:
```jsonc
{
  "adapter": "openclaw",
  "config": { "ocHome": "~/.openclaw" }  // Everything else auto-discovered
}
```

## 8. Control Panel & Observability

SmartContext ships with a built-in web dashboard. The user sees real value from minute one.

### 8.1 Web Dashboard (built-in)

Accessible at `http://localhost:4800` (same port as proxy, root path serves UI).
Single-page app, embedded in the binary — no extra dependencies, no build step.
Built with vanilla HTML/CSS/JS (no React/Vue) — inlined into the server, <50KB total.

#### Dashboard Screens

**Home / Status**
```
┌─────────────────────────────────────────────────────────┐
│  SmartContext Proxy                          ● Running  │
│                                         [Pause] [Stop]  │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  💰 Total Saved           ⚡ Requests Today             │
│  $63.00                   142                           │
│  4.2M tokens              avg 68% savings               │
│                                                         │
│  📊 Savings Over Time (7-day chart)                     │
│  ▁▃▅▆▇█▇▆▇██▇                                           │
│                                                         │
│  🔌 Providers             💾 Storage                    │
│  anthropic: ● active      chunks:   8,943               │
│  openai:    ● active      sessions: 142                 │
│  ollama:    ● active      disk:     234 MB              │
│                                                         │
│  ⏱ Performance                                          │
│  Latency overhead: 12ms p50 / 18ms p95                  │
│  Embedding: ollama (nomic-embed-text)                   │
│  Cache hit rate: 73%                                    │
└─────────────────────────────────────────────────────────┘
```

**Live Feed** — real-time request stream:
```
┌─────────────────────────────────────────────────────────┐
│  Live Feed                            [Auto-scroll ✓]   │
├─────────────────────────────────────────────────────────┤
│  14:23:05  anthropic/opus    45.2K → 12.1K   -73%       │
│            Retrieved: 7 chunks (top: 0.89)   12ms       │
│  14:22:58  openai/gpt-4o     28.1K →  9.8K   -65%       │
│            Retrieved: 5 chunks (top: 0.82)    8ms       │
│  14:22:41  anthropic/sonnet   8.2K →  8.2K   pass       │
│            ⚠ Below threshold, pass-through    2ms       │
│  14:22:30  ollama/qwen3      12.0K →  4.1K   -66%       │
│            Retrieved: 4 chunks (top: 0.77)   15ms       │
└─────────────────────────────────────────────────────────┘
```

Clicking a row expands to show: original messages, what was retrieved, what was cut, final assembled context. Full transparency.

**Sessions** — per-session breakdown:
- Session list with timestamps, request count, total savings
- Click session → see all exchanges, retrieval decisions, chunk scores
- Export session as JSON

**Savings Report** — the money page:
```
┌─────────────────────────────────────────────────────────┐
│  Savings Report                          [Export CSV]   │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  This Month                                             │
│  ┌──────────────────────────────┐                       │
│  │ Without SmartContext:  $412  │                       │
│  │ With SmartContext:     $127  │                       │
│  │ ──────────────────────────── │                       │
│  │ You saved:             $285  │  ← big, green         │
│  │ Savings rate:           69%  │                       │
│  └──────────────────────────────┘                       │
│                                                         │
│  By Provider                                            │
│  anthropic   $198 saved  (72% reduction)                │
│  openai      $67 saved   (61% reduction)                │
│  ollama      $0 saved    (local, free)                  │
│                                                         │
│  By Model                                               │
│  claude-opus-4-6    $142 saved  (most expensive)        │
│  claude-sonnet-4-6  $56 saved                           │
│  gpt-4o             $67 saved                           │
│                                                         │
│  Projection (if current usage continues)                │
│  Next month: ~$290 saved                                │
│  Next year:  ~$3,480 saved                              │
│                                                         │
└─────────────────────────────────────────────────────────┘
```

**Settings** — editable from UI:
- Context tuning (tier sizes, thresholds, boosts)
- Provider management (add/remove API keys)
- Storage config
- Logging level
- Pause/resume individual providers
- Changes write to `~/.smartcontext/config.json`

#### Dashboard Tech Stack
- Vanilla HTML + CSS + minimal JS (no framework)
- Served by the proxy server itself (same port, root path)
- All HTML/CSS/JS inlined into a single TypeScript file (`src/ui/dashboard.ts`)
- Data via WebSocket from proxy (real-time updates) + REST API for history
- Charts: lightweight `<canvas>` drawing, no chart library dependency
- Works offline, no CDN, no external resources

### 8.2 System Tray (optional, separate package)

For users who want a tray icon:
```bash
npm install -g smartcontext-tray
```

Tray icon shows:
- Green dot = running, yellow = paused, red = stopped
- Click → opens web dashboard in default browser
- Right-click menu: Pause / Resume / Stop / Open Dashboard / Quit
- Tooltip: "SmartContext: 142 requests, $63 saved today"

Built with `trayhost` (lightweight, no Electron). Separate package because the tray requires native deps — the core proxy stays zero-native-deps.

### 8.3 API Endpoints (programmatic access)

All dashboard data is available via REST:

```
GET  /_sc/status           → { state: "running"|"paused", uptime, pid }
GET  /_sc/stats            → { requests, savings, latency, storage }
GET  /_sc/stats/daily      → [ { date, requests, tokens_saved, cost_saved } ... ]
GET  /_sc/stats/providers  → { anthropic: {...}, openai: {...} }
GET  /_sc/stats/models     → { "claude-opus-4-6": {...}, "gpt-4o": {...} }
GET  /_sc/feed             → WebSocket: real-time request stream
GET  /_sc/sessions         → [ { id, started, requests, savings } ... ]
GET  /_sc/sessions/:id     → { exchanges: [...], chunks_retrieved: [...] }
GET  /_sc/config           → current config (keys redacted)
PUT  /_sc/config           → update config (partial merge)
POST /_sc/pause            → pause proxy (pass-through all requests)
POST /_sc/resume           → resume optimization
POST /_sc/stop             → graceful shutdown
```
549
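
The `PUT /_sc/config` partial merge could be a recursive merge where nested objects merge key-by-key and everything else (scalars, arrays) is replaced. A minimal sketch — the `deepMerge` helper and its exact merge rules are illustrative, not the published API:

```typescript
type Json = { [key: string]: unknown };

// Recursively merge a partial update into the current config.
// Plain objects merge key-by-key; arrays and primitives are replaced wholesale.
function deepMerge(current: Json, update: Json): Json {
  const result: Json = { ...current };
  for (const [key, value] of Object.entries(update)) {
    const existing = result[key];
    if (
      value !== null && typeof value === "object" && !Array.isArray(value) &&
      existing !== null && typeof existing === "object" && !Array.isArray(existing)
    ) {
      result[key] = deepMerge(existing as Json, value as Json);
    } else {
      result[key] = value; // replace scalars, arrays, and new keys
    }
  }
  return result;
}

// Example: a PUT body only touches the keys it names.
console.log(deepMerge(
  { retrieval: { tier2_min_score: 0.55, max_chunks: 20 } },
  { retrieval: { tier2_min_score: 0.5 } },
));
// → { retrieval: { tier2_min_score: 0.5, max_chunks: 20 } }
```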

### 8.4 Pause Mode

When paused:
- The proxy still runs and forwards all requests
- Context optimization is disabled; requests pass through unmodified
- Indexing continues (still learning from conversations)
- The dashboard shows a "PAUSED" badge
- Useful for: debugging, with/without comparisons, temporarily disabling optimization

### 8.5 Response Headers (opt-in)

When `logging.debug_headers: true`:
```
X-SmartContext-Savings: 73%
X-SmartContext-Original-Tokens: 45200
X-SmartContext-Optimized-Tokens: 12100
X-SmartContext-Retrieved-Chunks: 7
X-SmartContext-Top-Score: 0.89
X-SmartContext-Cache-Hit: prefix
X-SmartContext-Latency-Ms: 12
X-SmartContext-Mode: optimized|pass-through|paused
```

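
The savings header is just the relative input-token reduction, rounded. A sketch of how these headers could be assembled from per-request metrics (the interface and helper names are illustrative):

```typescript
interface RequestMetrics {
  originalTokens: number;
  optimizedTokens: number;
  mode: "optimized" | "pass-through" | "paused";
}

// Build the opt-in debug headers from per-request metrics.
// Savings = 1 - optimized/original, as a rounded percentage.
function debugHeaders(m: RequestMetrics): Record<string, string> {
  const savings = Math.round((1 - m.optimizedTokens / m.originalTokens) * 100);
  return {
    "X-SmartContext-Savings": `${savings}%`,
    "X-SmartContext-Original-Tokens": String(m.originalTokens),
    "X-SmartContext-Optimized-Tokens": String(m.optimizedTokens),
    "X-SmartContext-Mode": m.mode,
  };
}

// The numbers from the example above: 45,200 → 12,100 tokens is 73% savings.
console.log(debugHeaders({ originalTokens: 45200, optimizedTokens: 12100, mode: "optimized" }));
```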
### 8.6 Dashboard Integration (OC Adapter)

For our setup, the OC adapter additionally:
- Writes metrics to the `smartcontext-metrics` OpenSearch index
- Dashboard-ts gets a SmartContext tab reading from this index
- Same data, native dashboard look & feel

## 9. Test Mode & LLM-Assisted Diagnostics

### 9.1 A/B Test Mode

```bash
npx smartcontext-proxy --test-mode
# or from dashboard: Settings → Enable Test Mode
```

In test mode, every request is sent **twice**:

```
Client Request
      │
      ├──► Path A: SmartContext optimized → Provider → Response A (returned to client)
      │
      └──► Path B: Original unmodified    → Provider → Response B (stored, not returned)
```

The client always gets Response A (optimized). Response B is stored for comparison.

**What gets compared:**
- **Semantic similarity**: embed both responses and compute cosine similarity. A score >0.95 means equivalent quality; <0.85 flags a potential retrieval miss.
- **Token delta**: how many input tokens were saved (A vs. B)
- **Latency delta**: overhead of the optimization path
- **Content diff**: structured diff of the responses (key facts present/missing)
- **Tool-use match**: whether A and B call the same tools with the same arguments
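
The similarity check reduces to cosine similarity between the two response embeddings, bucketed by the thresholds above. A minimal sketch (the bucket names are illustrative labels):

```typescript
// Cosine similarity between two embedding vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Bucket a comparison by the spec's thresholds.
function qualityBucket(score: number): "match" | "minor" | "significant" {
  if (score > 0.95) return "match";  // equivalent quality
  if (score >= 0.85) return "minor"; // minor diff
  return "significant";              // potential retrieval miss
}

console.log(cosineSimilarity([1, 0], [1, 0])); // → 1
console.log(qualityBucket(0.78));              // → "significant"
```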

**Dashboard in Test Mode:**

```
┌─────────────────────────────────────────────────────┐
│ A/B Test Results                      [Export JSON] │
├─────────────────────────────────────────────────────┤
│                                                     │
│ Total comparisons:        87                        │
│ Quality match (>0.95):     79 (91%)   ← green       │
│ Minor diff (0.85-0.95):     6  (7%)   ← yellow      │
│ Significant diff (<0.85):   2  (2%)   ← red         │
│                                                     │
│ Avg token savings:     64%                          │
│ Avg latency overhead:  14ms                         │
│                                                     │
│ ⚠ Significant diffs (click to inspect):             │
│   #43  anthropic/opus   0.78  "missed DB schema..." │
│   #71  openai/gpt-4o    0.82  "lost function sig..."│
│                                                     │
└─────────────────────────────────────────────────────┘
```

Clicking a diff row shows a side-by-side view: optimized context vs. full context, which chunks were retrieved, what was missing, and both responses.

**Cost note:** Test mode doubles API costs. The dashboard shows the estimated extra cost. Users can limit it with `--test-mode --test-sample 20%` (randomly sample 20% of requests for A/B).

### 9.2 Verbose Logging

```bash
npx smartcontext-proxy --verbose
# or config: logging.level = "debug"
# or dashboard: Settings → Log Level → Debug
```

Debug logs per request:

```
[14:23:05.001] REQ #142 anthropic/claude-opus-4-6
[14:23:05.002] Original: 45,200 tokens (23 messages)
[14:23:05.003] System prompt: 2,100 tokens (stable, cache-eligible)
[14:23:05.004] Tier 1: kept last 3 exchanges (4,800 tokens)
[14:23:05.008] Embedding query: "fix the auth middleware bug" → 768-dim vector (6ms)
[14:23:05.012] Retrieval: 20 candidates, 7 above threshold
[14:23:05.012]   #1 score=0.89 session=abc123 "auth middleware refactor from yesterday"
[14:23:05.012]   #2 score=0.84 session=abc123 "JWT validation edge case discussion"
[14:23:05.012]   #3 score=0.81 session=def456 "middleware stack architecture overview"
[14:23:05.012]   #4 score=0.77 session=abc123 "auth test failures and fixes"
[14:23:05.012]   #5 score=0.72 session=ghi789 "similar bug in payment middleware"
[14:23:05.012]   #6 score=0.68 filepath-boost=+0.20 → 0.88 "src/middleware/auth.ts changes"
[14:23:05.012]   #7 score=0.61 session=abc123 "general project setup"
[14:23:05.013] Dedup: merged #1 and #4 (similarity 0.93)
[14:23:05.013] Budget: 38,300 available → packed 6 chunks (5,200 tokens)
[14:23:05.013] Tier 3: 1 session summary (320 tokens)
[14:23:05.014] Final context: 12,420 tokens (savings: 72.5%)
[14:23:05.014] Forwarding to api.anthropic.com
[14:23:07.891] Response: 1,842 tokens, streaming completed (2,877ms)
[14:23:07.892] Indexing exchange async...
[14:23:07.910] Indexed: 1 chunk (18ms)
```
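
The `filepath-boost` seen for candidate #6 suggests a flat additive bonus for chunks tied to files relevant to the current request. A sketch under that assumption — both the trigger condition (file path mentioned in the query) and the +0.20 constant are assumptions mirroring the log line, not confirmed internals:

```typescript
// Assumed semantics: flat +0.20 bonus when a chunk's file path is
// relevant to the current query, capped at 1.0.
const FILEPATH_BOOST = 0.2;

function boostedScore(baseScore: number, chunkPaths: string[], query: string): number {
  // Illustrative trigger: the query literally mentions one of the chunk's paths.
  const mentioned = chunkPaths.some((p) => query.includes(p));
  return mentioned ? Math.min(1, baseScore + FILEPATH_BOOST) : baseScore;
}

// Candidate #6 from the log: 0.68 base + 0.20 boost ≈ 0.88.
console.log(boostedScore(0.68, ["src/middleware/auth.ts"], "fix src/middleware/auth.ts"));
```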

Logs written to:
- `~/.smartcontext/logs/proxy.log` (standard)
- `~/.smartcontext/logs/debug.log` (verbose, only when enabled)
- `~/.smartcontext/logs/requests/` (per-request JSON dumps when `logging.request_dumps: true`)

Per-request JSON dump (for forensic analysis):

```jsonc
{
  "id": 142,
  "timestamp": "2026-03-29T14:23:05.001Z",
  "provider": "anthropic",
  "model": "claude-opus-4-6",
  "original": {
    "messages": 23,
    "tokens": 45200,
    "system_prompt_tokens": 2100
  },
  "optimized": {
    "messages": 12,
    "tokens": 12420,
    "tier1_tokens": 4800,
    "tier2_tokens": 5200,
    "tier3_tokens": 320,
    "system_prompt_tokens": 2100
  },
  "retrieval": {
    "query_embedding_ms": 6,
    "search_ms": 4,
    "candidates": 20,
    "above_threshold": 7,
    "after_dedup": 6,
    "after_budget": 6,
    "top_score": 0.89,
    "chunks": [
      { "id": "chunk_abc_17", "score": 0.89, "tokens": 850, "session": "abc123", "preview": "auth middleware refactor..." },
      // ...
    ]
  },
  "savings_pct": 72.5,
  "latency_overhead_ms": 14,
  "response_tokens": 1842,
  // In test mode, also includes:
  "ab_test": {
    "response_b_tokens": 1910,
    "semantic_similarity": 0.97,
    "quality_match": true
  }
}
```
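
The dump's numbers are internally consistent: the optimized token count is the sum of the system prompt and the three tiers, and `savings_pct` follows from the original/optimized totals. A quick check:

```typescript
// Reconstruct the optimized totals from the dump above.
const dump = {
  original: { tokens: 45200 },
  optimized: { tokens: 12420, tier1_tokens: 4800, tier2_tokens: 5200, tier3_tokens: 320, system_prompt_tokens: 2100 },
};

const o = dump.optimized;
// 2,100 + 4,800 + 5,200 + 320 = 12,420 = optimized.tokens
const tierSum = o.system_prompt_tokens + o.tier1_tokens + o.tier2_tokens + o.tier3_tokens;
// 1 - 12,420/45,200 ≈ 72.5% = savings_pct
const savingsPct = (1 - o.tokens / dump.original.tokens) * 100;

console.log(tierSum);               // → 12420
console.log(savingsPct.toFixed(1)); // → "72.5"
```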

### 9.3 LLM-Assisted Diagnostics

SmartContext can use a cheap LLM to analyze its own behavior and help debug issues.

#### Auto-Diagnosis (on significant quality diff)

When an A/B test detects similarity <0.85, or when the user flags a bad response:

1. SmartContext collects the full context: original messages, what was retrieved, what was cut, and both responses
2. Sends it to a diagnostic LLM (cheapest available: local Ollama, or a Haiku-tier cloud model)
3. The LLM analyzes it and produces a diagnostic report

```
┌──────────────────────────────────────────────────┐
│ 🔍 Diagnostic Report - Request #43               │
├──────────────────────────────────────────────────┤
│                                                  │
│ Problem: Response quality degraded (similarity   │
│ 0.78). Model missed database schema context.     │
│                                                  │
│ Root cause: The DB schema was discussed in       │
│ exchange #7 (45 min ago) but the embedding       │
│ similarity to the current query was only 0.52,   │
│ below the 0.55 threshold. The schema discussion  │
│ used technical column names while the current    │
│ query uses business-level terminology.           │
│                                                  │
│ Recommended fixes:                               │
│ 1. Lower tier2_min_score to 0.50 for this        │
│    session type (schema discussions)             │
│ 2. Add keyword fallback: if query contains       │
│    table/column names, grep raw logs             │
│ 3. Consider hybrid retrieval: semantic + keyword │
│                                                  │
│ ⚡ Auto-fix available:                           │
│ [Apply fix #1] [Apply fix #2] [Ignore]           │
│                                                  │
└──────────────────────────────────────────────────┘
```

#### Diagnostic Commands

```bash
npx smartcontext-proxy diagnose               # Analyze last 100 requests, find issues
npx smartcontext-proxy diagnose --request 43  # Diagnose a specific request
npx smartcontext-proxy diagnose --tune        # Suggest config tuning based on usage patterns
```

From the dashboard: any request in the Live Feed or A/B results has a "🔍 Diagnose" button.

#### What the Diagnostic LLM Analyzes

| Trigger | What the LLM Receives | What the LLM Returns |
|---------|-----------------------|----------------------|
| Quality diff <0.85 | Both contexts, both responses, retrieval scores | Root cause, fix suggestions |
| User clicks "Diagnose" | Full request dump (JSON) | Plain-language analysis |
| `diagnose --tune` | Aggregate stats, score distributions, miss patterns | Config recommendations with reasoning |
| First 50 requests (onboarding) | Request patterns, conversation structure | Auto-tune suggestions for chunk size, thresholds |

#### Auto-Tuning

After accumulating 50+ requests, SmartContext can auto-tune itself:

```bash
npx smartcontext-proxy auto-tune
```

The diagnostic LLM analyzes:
- Score distribution (are thresholds too high or too low?)
- Miss patterns (what type of context gets lost?)
- Chunk-size effectiveness (too big = wasteful, too small = lost context)
- Provider-specific patterns (some models need more context than others)

It produces a tuning report with specific config changes; the user approves or rejects each one.

Dashboard: Settings → "🧪 Auto-Tune" button. Shows proposed changes with before/after predictions.

#### LLM Selection for Diagnostics

Priority (cheapest first):
1. Local Ollama (qwen3-coder-next, kimi-k2.5): free, fast
2. Ollama Cloud: cheap, fast
3. Cheapest detected cloud provider (Haiku, GPT-4o-mini)
4. Skip diagnostics if no cheap LLM is available

Diagnostic calls are **never** sent through SmartContext itself (to avoid recursion); they go directly to the diagnostic LLM's API.
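
This priority order is a first-available fallback chain. A sketch (the backend labels are illustrative, not real adapter IDs):

```typescript
type DiagnosticBackend = "ollama-local" | "ollama-cloud" | "cloud-cheap";

// Cheapest-first priority order; returning null means rule 4 applies
// and diagnostics are skipped entirely.
const PRIORITY: DiagnosticBackend[] = ["ollama-local", "ollama-cloud", "cloud-cheap"];

function selectDiagnosticBackend(available: Set<DiagnosticBackend>): DiagnosticBackend | null {
  for (const backend of PRIORITY) {
    if (available.has(backend)) return backend;
  }
  return null; // no cheap LLM available → skip diagnostics
}

console.log(selectDiagnosticBackend(new Set<DiagnosticBackend>(["cloud-cheap"]))); // → "cloud-cheap"
console.log(selectDiagnosticBackend(new Set<DiagnosticBackend>()));                // → null
```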

## 10. Graceful Degradation

SmartContext must NEVER make things worse:

| Failure | Behavior |
|---------|----------|
| Vector store down | Pass through the original request unmodified |
| Embedding fails | Pass through the original request unmodified |
| No chunks above threshold | Pass through the original request unmodified |
| Provider unreachable | Return the error (same as without the proxy) |
| Config missing | Auto-generate defaults and start |
| First run, empty index | Pass through until enough data is indexed |

The proxy is **additive only**. If anything in the optimization pipeline fails, the original request goes through untouched. The user never sees degraded quality from SmartContext; worst case, they get exactly what they'd get without it.
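
The additive-only guarantee can be enforced at a single choke point: wrap the whole optimization pipeline and fall back to the untouched request on any failure. A sketch (the `optimize` callback signature is illustrative, and it is shown synchronously for brevity; the real pipeline would be async):

```typescript
interface ChatRequest {
  messages: string[];
}

// Single choke point for graceful degradation: run the optimization
// pipeline, and on ANY failure return the original request untouched.
function optimizeOrPassThrough(
  original: ChatRequest,
  optimize: (req: ChatRequest) => ChatRequest,
): ChatRequest {
  try {
    return optimize(original);
  } catch {
    return original; // worst case: exactly what you'd get without the proxy
  }
}

// A pipeline failure (e.g. vector store down) degrades to pass-through:
const result = optimizeOrPassThrough({ messages: ["fix the auth bug"] }, () => {
  throw new Error("vector store down");
});
console.log(result.messages); // → ["fix the auth bug"]
```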

## 11. Security

- API keys are never stored in plaintext; the config references env vars (`env:ANTHROPIC_API_KEY`)
- The proxy listens on localhost by default (no network exposure)
- Raw logs are encrypted at rest (AES-256, key derived from the machine ID)
- No telemetry, no phone-home, no analytics
- All data stays local unless the user explicitly configures remote storage
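
Resolving an `env:` key reference could look like the sketch below. The environment is passed in explicitly, and whether non-`env:` literal values are even permitted is an assumption:

```typescript
// Resolve an `env:NAME` key reference from the config against the
// given environment; other values pass through unchanged (assumed).
function resolveKeyRef(ref: string, env: Record<string, string | undefined>): string | undefined {
  if (ref.startsWith("env:")) {
    return env[ref.slice("env:".length)];
  }
  return ref;
}

console.log(resolveKeyRef("env:ANTHROPIC_API_KEY", { ANTHROPIC_API_KEY: "sk-ant-..." }));
// → "sk-ant-..."
```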

## 12. Project Structure

```
smartcontext-proxy/
├── src/
│   ├── index.ts               # Entry point, CLI, daemon management
│   ├── proxy/
│   │   ├── server.ts          # HTTP/SSE proxy server + UI serving
│   │   ├── router.ts          # Route to correct provider adapter
│   │   ├── stream.ts          # SSE pass-through logic
│   │   └── pause.ts           # Pause/resume state management
│   ├── providers/
│   │   ├── types.ts           # ProviderAdapter interface
│   │   ├── anthropic.ts       # Anthropic Messages API
│   │   ├── openai.ts          # OpenAI Chat Completions
│   │   ├── google.ts          # Google GenerateContent
│   │   └── ollama.ts          # Ollama native API
│   ├── context/
│   │   ├── optimizer.ts       # Core context optimization logic
│   │   ├── canonical.ts       # Canonical message format
│   │   ├── chunker.ts         # Message chunking
│   │   ├── retriever.ts       # Vector search + scoring
│   │   └── budget.ts          # Token budget allocation
│   ├── storage/
│   │   ├── types.ts           # StorageAdapter interface
│   │   ├── lancedb.ts         # Default embedded storage
│   │   └── filesystem.ts      # Fallback: raw logs only
│   ├── embedding/
│   │   ├── types.ts           # EmbeddingAdapter interface
│   │   ├── onnx.ts            # Built-in ONNX embedding
│   │   └── ollama.ts          # Ollama embedding
│   ├── config/
│   │   ├── auto-detect.ts     # Provider/embedding/storage discovery
│   │   ├── schema.ts          # Config validation
│   │   └── defaults.ts        # Default values
│   ├── metrics/
│   │   ├── collector.ts       # Request/response metrics
│   │   ├── endpoint.ts        # /_sc/* REST API
│   │   └── history.ts         # Persistent metrics (daily/monthly)
│   ├── ui/
│   │   ├── dashboard.ts       # Generates HTML/CSS/JS (inlined, no deps)
│   │   ├── ws-feed.ts         # WebSocket live feed server
│   │   └── api.ts             # /_sc/* route handlers (pause/stop/config)
│   ├── daemon/
│   │   ├── process.ts         # Fork/detach, PID file, signal handling
│   │   └── service.ts         # Generate systemd/launchd service files
│   └── adapters/
│       └── loader.ts          # Discover & load external adapters
├── adapters/
│   └── openclaw/              # Our adapter (separate npm package)
│       ├── index.ts
│       ├── storage.ts         # OpenSearch storage
│       ├── embedding.ts       # Beast Ollama embedding
│       └── session-importer.ts # Import OC session logs
├── package.json
├── tsconfig.json
└── README.md
```

## 13. Synergy with Our Stack

| Our Component | SmartContext Integration |
|---------------|--------------------------|
| **OC Gateway** | SmartContext sits between OC and the Anthropic/Gemini APIs. OC config points `baseUrl` to SmartContext. |
| **Beast PC** | Remote Ollama embedding via the `remote-ollama` adapter. Faster than ONNX, zero cost. |
| **OpenSearch (Castle)** | The `openclaw` adapter stores chunks + metrics in OS. The dashboard reads them. |
| **Session logs** | `session-importer.ts` indexes historical OC sessions on first setup. |
| **Dashboard** | New SmartContext tab: savings graph, retrieval quality, per-cron breakdown. |
| **Cron jobs** | Each cron call goes through SmartContext → cross-session context for recurring tasks. |
| **A2A Bridge** | Agent-to-agent messages are indexed → agents share context automatically. |

## 14. Benchmark Plan

Before public release, benchmark on 10 real CC sessions (2 per type):

| Session Type | What to Measure |
|--------------|-----------------|
| Bug fix (short) | Retrieval precision, latency overhead |
| Feature build (long) | Token savings %, quality retention |
| Cron/monitoring | Cross-session context value |
| Multi-file refactor | File-path boost effectiveness |
| Learning/research | Summary-tier value |

Metrics per session:
- **Semantic similarity**: SmartContext response vs. full-context response (cosine similarity of embeddings)
- **Token ratio**: optimized / original
- **Latency**: p50 and p95 overhead
- **Retrieval precision**: manually scored relevance of retrieved chunks (1-5 scale)
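
The latency metric is percentiles over per-request overhead samples, and the token ratio is a simple quotient. A sketch of both, using the nearest-rank percentile definition (one common choice; the sample values are made up):

```typescript
// Nearest-rank percentile: the value at position ceil(p/100 * n) in the sorted sample.
function percentile(samples: number[], p: number): number {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

// Token ratio: optimized input tokens over original input tokens.
function tokenRatio(optimized: number, original: number): number {
  return optimized / original;
}

const overheadMs = [9, 11, 12, 12, 13, 14, 15, 18, 22, 41];
console.log(percentile(overheadMs, 50));          // p50 → 13
console.log(percentile(overheadMs, 95));          // p95 → 41
console.log(tokenRatio(12420, 45200).toFixed(3)); // → "0.275"
```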