smartcontext-proxy 0.1.0
- package/PLAN.md +406 -0
- package/PROGRESS.md +60 -0
- package/README.md +99 -0
- package/SPEC.md +915 -0
- package/adapters/openclaw/embedding.d.ts +8 -0
- package/adapters/openclaw/embedding.js +16 -0
- package/adapters/openclaw/embedding.ts +15 -0
- package/adapters/openclaw/index.d.ts +18 -0
- package/adapters/openclaw/index.js +42 -0
- package/adapters/openclaw/index.ts +43 -0
- package/adapters/openclaw/session-importer.d.ts +22 -0
- package/adapters/openclaw/session-importer.js +99 -0
- package/adapters/openclaw/session-importer.ts +105 -0
- package/adapters/openclaw/storage.d.ts +26 -0
- package/adapters/openclaw/storage.js +177 -0
- package/adapters/openclaw/storage.ts +183 -0
- package/dist/adapters/openclaw/embedding.d.ts +8 -0
- package/dist/adapters/openclaw/embedding.js +16 -0
- package/dist/adapters/openclaw/index.d.ts +18 -0
- package/dist/adapters/openclaw/index.js +42 -0
- package/dist/adapters/openclaw/session-importer.d.ts +22 -0
- package/dist/adapters/openclaw/session-importer.js +99 -0
- package/dist/adapters/openclaw/storage.d.ts +26 -0
- package/dist/adapters/openclaw/storage.js +177 -0
- package/dist/config/auto-detect.d.ts +3 -0
- package/dist/config/auto-detect.js +48 -0
- package/dist/config/defaults.d.ts +2 -0
- package/dist/config/defaults.js +28 -0
- package/dist/config/schema.d.ts +30 -0
- package/dist/config/schema.js +3 -0
- package/dist/context/budget.d.ts +25 -0
- package/dist/context/budget.js +85 -0
- package/dist/context/canonical.d.ts +39 -0
- package/dist/context/canonical.js +12 -0
- package/dist/context/chunker.d.ts +9 -0
- package/dist/context/chunker.js +148 -0
- package/dist/context/optimizer.d.ts +31 -0
- package/dist/context/optimizer.js +163 -0
- package/dist/context/retriever.d.ts +29 -0
- package/dist/context/retriever.js +103 -0
- package/dist/daemon/process.d.ts +6 -0
- package/dist/daemon/process.js +76 -0
- package/dist/daemon/service.d.ts +2 -0
- package/dist/daemon/service.js +99 -0
- package/dist/embedding/ollama.d.ts +11 -0
- package/dist/embedding/ollama.js +72 -0
- package/dist/embedding/types.d.ts +6 -0
- package/dist/embedding/types.js +3 -0
- package/dist/index.d.ts +2 -0
- package/dist/index.js +190 -0
- package/dist/metrics/collector.d.ts +43 -0
- package/dist/metrics/collector.js +72 -0
- package/dist/providers/anthropic.d.ts +15 -0
- package/dist/providers/anthropic.js +109 -0
- package/dist/providers/google.d.ts +13 -0
- package/dist/providers/google.js +40 -0
- package/dist/providers/ollama.d.ts +13 -0
- package/dist/providers/ollama.js +82 -0
- package/dist/providers/openai.d.ts +15 -0
- package/dist/providers/openai.js +115 -0
- package/dist/providers/types.d.ts +18 -0
- package/dist/providers/types.js +3 -0
- package/dist/proxy/router.d.ts +12 -0
- package/dist/proxy/router.js +46 -0
- package/dist/proxy/server.d.ts +25 -0
- package/dist/proxy/server.js +265 -0
- package/dist/proxy/stream.d.ts +8 -0
- package/dist/proxy/stream.js +32 -0
- package/dist/src/config/auto-detect.d.ts +3 -0
- package/dist/src/config/auto-detect.js +48 -0
- package/dist/src/config/defaults.d.ts +2 -0
- package/dist/src/config/defaults.js +28 -0
- package/dist/src/config/schema.d.ts +30 -0
- package/dist/src/config/schema.js +3 -0
- package/dist/src/context/budget.d.ts +25 -0
- package/dist/src/context/budget.js +85 -0
- package/dist/src/context/canonical.d.ts +39 -0
- package/dist/src/context/canonical.js +12 -0
- package/dist/src/context/chunker.d.ts +9 -0
- package/dist/src/context/chunker.js +148 -0
- package/dist/src/context/optimizer.d.ts +31 -0
- package/dist/src/context/optimizer.js +163 -0
- package/dist/src/context/retriever.d.ts +29 -0
- package/dist/src/context/retriever.js +103 -0
- package/dist/src/daemon/process.d.ts +6 -0
- package/dist/src/daemon/process.js +76 -0
- package/dist/src/daemon/service.d.ts +2 -0
- package/dist/src/daemon/service.js +99 -0
- package/dist/src/embedding/ollama.d.ts +11 -0
- package/dist/src/embedding/ollama.js +72 -0
- package/dist/src/embedding/types.d.ts +6 -0
- package/dist/src/embedding/types.js +3 -0
- package/dist/src/index.d.ts +2 -0
- package/dist/src/index.js +190 -0
- package/dist/src/metrics/collector.d.ts +43 -0
- package/dist/src/metrics/collector.js +72 -0
- package/dist/src/providers/anthropic.d.ts +15 -0
- package/dist/src/providers/anthropic.js +109 -0
- package/dist/src/providers/google.d.ts +13 -0
- package/dist/src/providers/google.js +40 -0
- package/dist/src/providers/ollama.d.ts +13 -0
- package/dist/src/providers/ollama.js +82 -0
- package/dist/src/providers/openai.d.ts +15 -0
- package/dist/src/providers/openai.js +115 -0
- package/dist/src/providers/types.d.ts +18 -0
- package/dist/src/providers/types.js +3 -0
- package/dist/src/proxy/router.d.ts +12 -0
- package/dist/src/proxy/router.js +46 -0
- package/dist/src/proxy/server.d.ts +25 -0
- package/dist/src/proxy/server.js +265 -0
- package/dist/src/proxy/stream.d.ts +8 -0
- package/dist/src/proxy/stream.js +32 -0
- package/dist/src/storage/lancedb.d.ts +21 -0
- package/dist/src/storage/lancedb.js +158 -0
- package/dist/src/storage/types.d.ts +52 -0
- package/dist/src/storage/types.js +3 -0
- package/dist/src/test/context.test.d.ts +1 -0
- package/dist/src/test/context.test.js +141 -0
- package/dist/src/test/dashboard.test.d.ts +1 -0
- package/dist/src/test/dashboard.test.js +85 -0
- package/dist/src/test/proxy.test.d.ts +1 -0
- package/dist/src/test/proxy.test.js +188 -0
- package/dist/src/ui/dashboard.d.ts +2 -0
- package/dist/src/ui/dashboard.js +183 -0
- package/dist/storage/lancedb.d.ts +21 -0
- package/dist/storage/lancedb.js +158 -0
- package/dist/storage/types.d.ts +52 -0
- package/dist/storage/types.js +3 -0
- package/dist/test/context.test.d.ts +1 -0
- package/dist/test/context.test.js +141 -0
- package/dist/test/dashboard.test.d.ts +1 -0
- package/dist/test/dashboard.test.js +85 -0
- package/dist/test/proxy.test.d.ts +1 -0
- package/dist/test/proxy.test.js +188 -0
- package/dist/ui/dashboard.d.ts +2 -0
- package/dist/ui/dashboard.js +183 -0
- package/package.json +38 -0
- package/src/config/auto-detect.ts +51 -0
- package/src/config/defaults.ts +26 -0
- package/src/config/schema.ts +33 -0
- package/src/context/budget.ts +126 -0
- package/src/context/canonical.ts +50 -0
- package/src/context/chunker.ts +165 -0
- package/src/context/optimizer.ts +201 -0
- package/src/context/retriever.ts +123 -0
- package/src/daemon/process.ts +70 -0
- package/src/daemon/service.ts +103 -0
- package/src/embedding/ollama.ts +68 -0
- package/src/embedding/types.ts +6 -0
- package/src/index.ts +176 -0
- package/src/metrics/collector.ts +114 -0
- package/src/providers/anthropic.ts +117 -0
- package/src/providers/google.ts +42 -0
- package/src/providers/ollama.ts +87 -0
- package/src/providers/openai.ts +127 -0
- package/src/providers/types.ts +20 -0
- package/src/proxy/router.ts +48 -0
- package/src/proxy/server.ts +315 -0
- package/src/proxy/stream.ts +39 -0
- package/src/storage/lancedb.ts +169 -0
- package/src/storage/types.ts +47 -0
- package/src/test/context.test.ts +165 -0
- package/src/test/dashboard.test.ts +94 -0
- package/src/test/proxy.test.ts +218 -0
- package/src/ui/dashboard.ts +184 -0
- package/tsconfig.json +18 -0
package/SPEC.md
ADDED
@@ -0,0 +1,915 @@
# SmartContext Proxy - Technical Specification v2.0

## Goal

A self-configuring, provider-agnostic transparent proxy between LLM clients and providers. It operates like a network firewall: it intercepts traffic, applies context optimization logic, and forwards transparently. Zero-config `npx` install, works out of the box.

## Core Principle: Transparent Firewall

```
Client App ───▶ SmartContext Proxy ───▶ LLM Provider
(unchanged)     (intercept + optimize)  (any provider)
```

The client doesn't know SmartContext exists. The provider doesn't know SmartContext exists. SmartContext sits in the middle, reads the conversation, replaces bloated history with optimized context, and forwards the request. Like a firewall, but for tokens.
## 1. Zero-Config Bootstrap

### Install & Run
```bash
npx smartcontext-proxy
```

That's it. On first run:

1. **Auto-detect providers**: Scan env vars (`ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, `GOOGLE_API_KEY`, `OPENROUTER_API_KEY`, `OLLAMA_HOST`). Each detected key = one supported provider.
2. **Auto-select embedding**: Check for a local Ollama (`localhost:11434`) → use `nomic-embed-text`. No Ollama → use the built-in ONNX runtime (`@xenova/transformers` with `nomic-embed-text-v1.5`). Zero external dependencies either way.
3. **Auto-select storage**: Embedded LanceDB (zero config, writes to `~/.smartcontext/data/`). No server, no setup.
4. **Start proxy**: Listen on `localhost:4800`. Print one line: `SmartContext listening on http://localhost:4800 - providers: anthropic, openai, ollama`
5. **Generate config**: Write `~/.smartcontext/config.json` with the detected settings. The user can edit it later if needed.
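The env-var scan in step 1 can be sketched as a simple lookup table. This is an illustrative sketch, not the package's actual implementation; the `detectProviders` name and the `env` parameter shape are assumptions.

```typescript
// Illustrative sketch of step 1: map well-known env vars to provider names.
// `detectProviders` is a hypothetical helper, not the package's real API.
const PROVIDER_ENV_VARS: Record<string, string> = {
  ANTHROPIC_API_KEY: 'anthropic',
  OPENAI_API_KEY: 'openai',
  GOOGLE_API_KEY: 'google',
  OPENROUTER_API_KEY: 'openrouter',
  OLLAMA_HOST: 'ollama',
};

function detectProviders(env: Record<string, string | undefined>): string[] {
  // Each env var that is set yields one supported provider
  return Object.entries(PROVIDER_ENV_VARS)
    .filter(([envVar]) => Boolean(env[envVar]))
    .map(([, provider]) => provider);
}

console.log(detectProviders({ ANTHROPIC_API_KEY: 'sk-...', OLLAMA_HOST: 'http://localhost:11434' }));
// → [ 'anthropic', 'ollama' ]
```

In the real proxy this would run against `process.env` at startup, before the config file is written.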
### Client Integration
Change one env var:
```bash
# Before
ANTHROPIC_API_URL=https://api.anthropic.com

# After
ANTHROPIC_API_URL=http://localhost:4800/v1/anthropic
```

Or for OpenAI-compatible clients:
```bash
OPENAI_BASE_URL=http://localhost:4800/v1/openai
```

### Self-Configuration via LLM
If the configuration is ambiguous (multiple providers, unclear defaults), SmartContext can use a cheap local model (Ollama) or the cheapest available cloud model to:
- Analyze the user's typical usage pattern (from the first few intercepted requests)
- Suggest optimal tier thresholds
- Auto-tune chunk sizes based on actual conversation structure
## 2. Provider-Agnostic Architecture

### Request Flow (Firewall Model)

```
Inbound Request
        │
        ▼
┌─────────────────┐
│ Format Detect   │ ← Auto-detect: Anthropic Messages / OpenAI Chat / Google GenerateContent
│ (by URL path    │   /v1/anthropic/* → Anthropic format
│  or headers)    │   /v1/openai/*    → OpenAI format
└────────┬────────┘   /v1/google/*    → Google format
         │            /v1/ollama/*    → Ollama format
         ▼
┌─────────────────┐
│ Parse to        │ ← Normalize all formats to internal CanonicalMessage[]
│ Canonical       │   { role, content, metadata, timestamp }
│ Format          │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Context         │ ← The core logic:
│ Optimizer       │   1. Extract system prompt (keep stable for KV-cache)
│                 │   2. Keep Tier 1 (last N exchanges) verbatim
│                 │   3. Embed user query → retrieve Tier 2 from vector store
│                 │   4. Pack into token budget
│                 │   5. Append Tier 3 summaries if space remains
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Serialize to    │ ← Convert back to the original provider format
│ Provider Format │   (same format as inbound → transparent proxy)
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Forward to      │ ← SSE stream-through for streaming requests
│ Provider        │   Async post-index after the response completes
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Index Exchange  │ ← Embed + store the full exchange (async, non-blocking)
│ (async)         │   Write raw log to disk
└─────────────────┘
```
### Provider Modules

Each provider is a module implementing `ProviderAdapter`:

```typescript
interface ProviderAdapter {
  name: string;
  detect(req: IncomingMessage): boolean;              // Can this adapter handle this request?
  parseRequest(body: any): CanonicalRequest;          // Provider format → canonical
  serializeRequest(canonical: CanonicalRequest): any; // Canonical → provider format
  forwardUrl(originalPath: string): string;           // Where to forward
  parseStreamChunk(chunk: Buffer): StreamEvent;       // Parse SSE chunks
  serializeStreamChunk(event: StreamEvent): Buffer;   // Re-serialize SSE
}
```

Built-in adapters: `anthropic`, `openai`, `google`, `ollama`.
Custom adapters: drop a `.js` file into `~/.smartcontext/adapters/`.
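To make the adapter surface concrete, here is a minimal, hypothetical adapter skeleton. The `exampleAdapter` name, the upstream URL, and the simplified `detect(url)` signature (the real interface takes an `IncomingMessage`) are all illustrative assumptions; only the routing methods are fleshed out.

```typescript
// Hypothetical skeleton of a custom adapter that could live in
// ~/.smartcontext/adapters/. Simplified: detect() takes a URL string here.
interface CanonicalRequest { messages: any[]; model: string; stream: boolean; }

const exampleAdapter = {
  name: 'example',

  // Claim requests routed under /v1/example/*
  detect(url: string): boolean {
    return url.startsWith('/v1/example/');
  },

  // Strip the proxy prefix and forward to the (fictional) upstream API
  forwardUrl(originalPath: string): string {
    return 'https://api.example.com' + originalPath.replace('/v1/example', '');
  },

  // Provider format → canonical (stubbed)
  parseRequest(body: any): CanonicalRequest {
    return { messages: body.messages ?? [], model: body.model, stream: !!body.stream };
  },
};

console.log(exampleAdapter.detect('/v1/example/chat'));            // true
console.log(exampleAdapter.forwardUrl('/v1/example/v1/messages')); // https://api.example.com/v1/messages
```

The routing methods (`detect`, `forwardUrl`) are pure functions, which keeps them easy to unit-test; the parse/serialize pair carries the actual format translation.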
### Canonical Message Format

```typescript
interface CanonicalMessage {
  role: 'system' | 'user' | 'assistant' | 'tool';
  content: string | ContentBlock[]; // text, images, tool_use, tool_result
  timestamp?: number;
  metadata?: {
    provider?: string;
    model?: string;
    tokens?: number;
    files?: string[];  // file paths mentioned
    tools?: string[];  // tools used
    sessionId?: string;
  };
}

interface CanonicalRequest {
  messages: CanonicalMessage[];
  systemPrompt?: string;
  model: string;
  stream: boolean;
  maxTokens?: number;
  temperature?: number;
  tools?: any[];
  rawHeaders: Record<string, string>; // preserved for forwarding
  providerAuth: string;               // API key for forwarding
}
```
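As an illustration of the "parse to canonical" step, a normalizer for an OpenAI-style message might look like the following. The field names mirror the spec's `CanonicalMessage`, but the `fromOpenAIMessage` helper itself is an assumption, not the package's real API, and the type is trimmed to text-only content.

```typescript
// Illustrative normalization into the canonical format (text-only subset).
// `fromOpenAIMessage` is a hypothetical helper, not the package's real API.
interface CanonicalMessage {
  role: 'system' | 'user' | 'assistant' | 'tool';
  content: string;
  timestamp?: number;
  metadata?: { provider?: string; model?: string };
}

function fromOpenAIMessage(
  msg: { role: string; content: string },
  model: string
): CanonicalMessage {
  return {
    role: msg.role as CanonicalMessage['role'],
    content: msg.content,
    timestamp: Date.now(), // stamped at interception time
    metadata: { provider: 'openai', model },
  };
}

const canonical = fromOpenAIMessage({ role: 'user', content: 'hello' }, 'gpt-4o');
console.log(canonical.role, canonical.metadata?.provider); // user openai
```

The symmetric serializer would drop `timestamp` and `metadata` before forwarding, since those fields exist only inside the proxy.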
## 3. Context Optimization Engine

### Tiered Strategy

| Tier | What | Token Budget | Source |
|------|------|--------------|--------|
| **T0** | System prompt | Unlimited (stable prefix) | From request, never modified |
| **T1** | Hot context | Last 3 exchanges verbatim | From request |
| **T2** | Warm context | Top-K retrieved chunks | Vector store (semantic search) |
| **T3** | Cold context | Session/project summaries | Pre-computed summaries |

### Token Budget Algorithm

```
available  = model_context_limit - system_prompt_tokens - response_reserve
t1_tokens  = sum(last_3_exchanges)
t3_reserve = min(500, available * 0.05)
t2_budget  = available - t1_tokens - t3_reserve

// Fill T2 greedily by relevance score until the budget is exhausted
for chunk in retrieved_chunks_sorted_by_score:
    if t2_used + chunk.tokens <= t2_budget:
        include(chunk)
        t2_used += chunk.tokens

// Fill T3 with the remaining space
if remaining > 100:
    include(session_summaries, limit=remaining)
```
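The T2 part of the budget algorithm above can be written out as a small runnable function. This is a sketch with illustrative constants and names (`packTier2`, the sample token counts); the real defaults come from `config.json`.

```typescript
// Runnable sketch of the token budget algorithm's T2 fill.
// Constants and the `packTier2` name are illustrative.
interface Chunk { tokens: number; score: number; }

function packTier2(
  contextLimit: number,
  systemPromptTokens: number,
  responseReserve: number,
  t1Tokens: number,
  chunks: Chunk[]
): Chunk[] {
  const available = contextLimit - systemPromptTokens - responseReserve;
  const t3Reserve = Math.min(500, available * 0.05);
  const t2Budget = available - t1Tokens - t3Reserve;

  const included: Chunk[] = [];
  let used = 0;
  // Greedy fill by relevance score: take best-scoring chunks that still fit
  for (const chunk of [...chunks].sort((a, b) => b.score - a.score)) {
    if (used + chunk.tokens <= t2Budget) {
      included.push(chunk);
      used += chunk.tokens;
    }
  }
  return included;
}

const picked = packTier2(
  200_000, 2_000, 8_192, 6_000,
  [{ tokens: 1_500, score: 0.9 }, { tokens: 200_000, score: 0.8 }, { tokens: 800, score: 0.7 }]
);
console.log(picked.length); // 2 (the oversized middle chunk is skipped)
```

Note the greedy fill skips a chunk that does not fit but keeps scanning lower-scored chunks that do, so the budget is used even when one retrieved chunk is huge.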
### Retrieval Pipeline

1. **Embed query**: The user's last message → embedding vector
2. **Candidate retrieval**: Top-20 from the vector store (cosine similarity)
3. **File-path boost**: If the query mentions a file path, chunks containing that path get a +0.2 boost. File-path inertia: if recent exchanges focused on a file, keep boosting it.
4. **Recency boost**: Current-session chunks get +0.15, last-hour chunks get +0.05
5. **Dedup**: For chunks with similarity > 0.92, keep the most recent
6. **Confidence gate**: If the best chunk score is < 0.55, skip retrieval entirely and pass the original context through (graceful degradation)
7. **Min chunks**: Always include at least 3 chunks if they pass the 0.55 threshold

### Chunking

- **Unit**: One user-assistant exchange = one chunk
- **Long responses**: Split at paragraph boundaries if > 2000 tokens. Keep code blocks atomic.
- **Metadata per chunk**: `{ sessionId, timestamp, files[], tools[], summary (first 100 chars) }`
- **Overlap**: The last sentence of the previous chunk is prepended to the next chunk
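The boost-and-gate steps of the retrieval pipeline can be sketched as a rerank function. The thresholds mirror the spec's numbers; the `rerank` name, the naive file-path regex, and the simplified chunk shape are illustrative assumptions (the real pipeline also handles dedup, the last-hour boost, and the min-chunks rule).

```typescript
// Sketch of steps 3, 4, and 6 of the retrieval pipeline: apply boosts,
// then gate on the confidence threshold. Names are illustrative.
interface ScoredChunk { score: number; sessionId: string; text: string; }

const FILEPATH_BOOST = 0.2;
const RECENCY_BOOST = 0.15;
const CONFIDENCE_GATE = 0.55;

function rerank(
  chunks: ScoredChunk[],
  query: string,
  currentSession: string
): ScoredChunk[] {
  const boosted = chunks.map((c) => {
    let score = c.score;
    // File-path boost: naive check that the chunk mentions a path-like token
    const path = query.match(/\S+\.\w+/)?.[0];
    if (path && c.text.includes(path)) score += FILEPATH_BOOST;
    // Recency boost for chunks from the current session
    if (c.sessionId === currentSession) score += RECENCY_BOOST;
    return { ...c, score };
  });
  // Confidence gate: chunks below threshold are dropped; if none survive,
  // the caller passes the original context through unmodified.
  return boosted
    .filter((c) => c.score >= CONFIDENCE_GATE)
    .sort((a, b) => b.score - a.score);
}

const result = rerank(
  [
    { score: 0.5, sessionId: 's1', text: 'edits src/proxy/server.ts' },
    { score: 0.4, sessionId: 's2', text: 'unrelated chat' },
  ],
  'why does src/proxy/server.ts crash?',
  's1'
);
console.log(result.length); // 1
```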
## 4. Streaming Architecture

Non-negotiable: zero perceived latency overhead.

```
Client ◀──SSE─── SmartContext ◀──SSE─── Provider
         (pass-through)         (pass-through)

Timeline:
0ms    Client sends request
5ms    SmartContext intercepts, optimizes context
15ms   Forward to provider (optimized request, fewer tokens)
20ms   Provider starts streaming response
20ms   SmartContext passes the first SSE chunk to the client
...    Stream continues transparently
done   SmartContext asynchronously indexes the exchange
```

The optimization happens BEFORE the provider call (5-15ms for embed + retrieve). The streaming response is passed through byte-for-byte with zero buffering.
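The pass-through-plus-indexing idea reduces to a "tee": every chunk is forwarded unmodified while a copy is queued for post-response indexing. The sketch below works on arrays of SSE strings for clarity; the real proxy pipes provider sockets, and `passThrough` is an illustrative name.

```typescript
// Sketch of the tee: SSE chunks flow to the client unmodified while a copy
// is collected for async indexing. Illustrative; real code pipes streams.
function passThrough(chunks: string[], collected: string[]): string[] {
  const forwarded: string[] = [];
  for (const chunk of chunks) {
    collected.push(chunk); // tee a copy for post-response indexing
    forwarded.push(chunk); // forwarded unmodified, chunk by chunk
  }
  return forwarded;
}

const indexQueue: string[] = [];
const out = passThrough(['data: {"delta":"Hel"}\n\n', 'data: {"delta":"lo"}\n\n'], indexQueue);
console.log(out.length === indexQueue.length); // true
```

Because the tee never inspects or rewrites the bytes on the hot path, the client-visible latency is just the pre-call optimization, as in the timeline above.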
## 5. Storage Architecture

### Plugin System

```typescript
interface StorageAdapter {
  name: string;

  // Vector operations
  upsertChunks(chunks: Chunk[]): Promise<void>;
  search(embedding: number[], options: SearchOptions): Promise<ScoredChunk[]>;

  // Raw log operations
  appendLog(sessionId: string, exchange: Exchange): Promise<void>;
  getSessionLog(sessionId: string): Promise<Exchange[]>;

  // Summary operations
  upsertSummary(sessionId: string, summary: string): Promise<void>;
  getSummaries(sessionIds: string[]): Promise<Summary[]>;

  // Lifecycle
  initialize(config: any): Promise<void>;
  close(): Promise<void>;
}
```
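The `search` operation boils down to cosine similarity over stored embeddings. Here is an in-memory sketch of that core (the `search`/`cosine` names and the tiny 2-dimensional vectors are illustrative); real adapters delegate this to LanceDB, OpenSearch, or Qdrant.

```typescript
// In-memory sketch of the StorageAdapter `search` operation using cosine
// similarity, the spec's retrieval metric. Names are illustrative.
interface StoredChunk { id: string; embedding: number[]; }

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function search(chunks: StoredChunk[], query: number[], topK: number) {
  return chunks
    .map((c) => ({ id: c.id, score: cosine(c.embedding, query) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}

const hits = search(
  [
    { id: 'a', embedding: [1, 0] },
    { id: 'b', embedding: [0, 1] },
    { id: 'c', embedding: [0.9, 0.1] },
  ],
  [1, 0],
  2
);
console.log(hits.map((h) => h.id)); // [ 'a', 'c' ]
```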
### Built-in Adapters

| Adapter | Config | Use Case |
|---------|--------|----------|
| `lancedb` (default) | Zero-config, `~/.smartcontext/data/` | npx users, single machine |
| `opensearch` | `{ url: "http://..." }` | Teams, existing ES/OS infra |
| `qdrant` | `{ url: "http://..." }` | ML teams with Qdrant |
| `filesystem` | `{ path: "..." }` | Minimal, logs only, no vector search |

### Embedding Plugin

```typescript
interface EmbeddingAdapter {
  name: string;
  dimensions: number;
  embed(texts: string[]): Promise<number[][]>;
  initialize(config: any): Promise<void>;
}
```

| Adapter | Config | Use Case |
|---------|--------|----------|
| `onnx` (default) | Zero-config, downloads the model on first run | npx users, no GPU |
| `ollama` | `{ url: "http://localhost:11434", model: "nomic-embed-text" }` | Local Ollama users |
| `remote-ollama` | `{ url: "http://beast:11434", model: "nomic-embed-text" }` | Our setup (Beast PC) |
## 6. Configuration

### Auto-Generated Config (`~/.smartcontext/config.json`)

```jsonc
{
  // Auto-detected on first run, editable
  "proxy": {
    "port": 4800,
    "host": "127.0.0.1"
  },
  "providers": {
    // Auto-discovered from env vars
    "anthropic": { "apiKey": "env:ANTHROPIC_API_KEY" },
    "openai": { "apiKey": "env:OPENAI_API_KEY" }
  },
  "embedding": {
    // Auto-selected: ollama if available, else onnx
    "adapter": "ollama",
    "config": { "url": "http://localhost:11434", "model": "nomic-embed-text" }
  },
  "storage": {
    // Default: zero-config lancedb
    "adapter": "lancedb",
    "config": { "path": "~/.smartcontext/data" }
  },
  "context": {
    "tier1_exchanges": 3,            // Hot: last N exchanges kept verbatim
    "tier2_max_chunks": 10,          // Warm: max retrieved chunks
    "tier2_min_score": 0.55,         // Minimum similarity for retrieval
    "tier3_token_reserve": 500,      // Cold: tokens reserved for summaries
    "recency_boost": 0.15,           // Boost for current-session chunks
    "filepath_boost": 0.20,          // Boost for file-path matches
    "dedup_threshold": 0.92,         // Near-duplicate merge threshold
    "confidence_gate": 0.55,         // Below this: skip retrieval, pass through
    "response_reserve_tokens": 8192  // Reserve for the model response
  },
  "logging": {
    "level": "info",
    "raw_logs": true,       // Store full conversation logs
    "metrics": true,        // Token savings, latency stats
    "debug_headers": false  // X-SmartContext-* headers in responses
  }
}
```
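The `"env:ANTHROPIC_API_KEY"` values above are an indirection so API keys never live in `config.json` itself. Resolving them is a one-liner; the `resolveSecret` helper name is an assumption, only the `env:` prefix convention comes from the config above.

```typescript
// Sketch of resolving the config's "env:NAME" indirection.
// Only the "env:" prefix is from the spec; the helper name is illustrative.
function resolveSecret(
  value: string,
  env: Record<string, string | undefined>
): string | undefined {
  if (value.startsWith('env:')) {
    return env[value.slice(4)]; // look up the named env var
  }
  return value; // literal values pass through unchanged
}

console.log(resolveSecret('env:ANTHROPIC_API_KEY', { ANTHROPIC_API_KEY: 'sk-test' })); // sk-test
console.log(resolveSecret('literal-key', {}));                                         // literal-key
```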
### Process Management

**Foreground (default)** runs like any dev server:
```bash
npx smartcontext-proxy   # Starts in foreground, Ctrl+C stops
```

**Daemon mode** runs in the background:
```bash
npx smartcontext-proxy start     # Start as background daemon
npx smartcontext-proxy stop      # Stop daemon (sends SIGTERM)
npx smartcontext-proxy restart   # Restart daemon
npx smartcontext-proxy status    # Show: running/stopped, PID, uptime, stats
```

Daemon mechanics:
- PID file: `~/.smartcontext/smartcontext.pid`
- Stdout/stderr: `~/.smartcontext/logs/proxy.log`
- `start` detaches the process, writes the PID, exits immediately
- `stop` reads the PID file, sends SIGTERM, waits for graceful shutdown (flush metrics, close storage)
- `status` checks that the PID is alive and shows stats from the metrics endpoint
- Graceful shutdown on SIGTERM/SIGINT: finish in-flight requests (5s timeout), flush the index queue, close storage, remove the PID file
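The liveness check behind `status` is the standard Node.js "signal 0" probe: sending signal `0` tests whether a process exists without affecting it. The `isRunning` name is illustrative; the pidfile path it would read comes from the list above.

```typescript
// Sketch of the `status` liveness check: probe a PID with signal 0,
// which tests existence without killing anything. `isRunning` is illustrative.
function isRunning(pid: number): boolean {
  try {
    process.kill(pid, 0); // signal 0 = existence check only
    return true;
  } catch {
    return false; // ESRCH (no such process) or EPERM lands here
  }
}

console.log(isRunning(process.pid)); // true (we are running)
console.log(isRunning(2 ** 30));     // almost certainly false
```

A caveat worth noting: `process.kill(pid, 0)` can also throw `EPERM` for a live process owned by another user, so a production `status` would treat that case as "running".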
**System service (optional)** for always-on use:
```bash
npx smartcontext-proxy install-service     # Generate systemd/launchd service file
npx smartcontext-proxy uninstall-service   # Remove service
```
- macOS: generates a LaunchAgent plist in `~/Library/LaunchAgents/`
- Linux: generates a systemd user service in `~/.config/systemd/user/`
- Auto-start on boot, auto-restart on crash
### CLI

```bash
npx smartcontext-proxy                     # Start foreground (Ctrl+C to stop)
npx smartcontext-proxy start               # Start daemon
npx smartcontext-proxy stop                # Stop daemon
npx smartcontext-proxy restart             # Restart daemon
npx smartcontext-proxy status              # Running? PID, uptime, savings stats
npx smartcontext-proxy install-service     # Install system service (auto-start)
npx smartcontext-proxy uninstall-service   # Remove system service
npx smartcontext-proxy --port 8080         # Custom port (foreground)
npx smartcontext-proxy --config ./my.json  # Custom config
npx smartcontext-proxy index <file>        # Index existing session logs
npx smartcontext-proxy providers           # List detected providers
npx smartcontext-proxy benchmark           # Run retrieval quality benchmark
```
## 7. Adapter System (Plugin & Play)

### How Adapters Work

Adapters are npm packages following the naming convention `smartcontext-adapter-*`:

```bash
# Install OpenSearch adapter
npm install -g smartcontext-adapter-opensearch

# Install Qdrant adapter
npm install -g smartcontext-adapter-qdrant

# SmartContext auto-discovers installed adapters
npx smartcontext-proxy
# Output: "Discovered adapters: opensearch, qdrant"
```

### Our OC Adapter

For our system, we build `smartcontext-adapter-openclaw`:

```bash
npm install -g smartcontext-adapter-openclaw
```

This adapter:
- **Storage**: Uses OpenSearch on Castle (auto-discovers from the `ES_URL` env var or the OC config)
- **Embedding**: Uses Beast Ollama (auto-discovers from the OC agent config)
- **Sessions**: Reads OC gateway session logs for initial indexing
- **Dashboard**: Exposes metrics to dashboard-ts via the existing OpenSearch indices
- **Auth**: Reads OC auth-profiles for provider API keys

Config for our setup becomes just:
```jsonc
{
  "adapter": "openclaw",
  "config": { "ocHome": "~/.openclaw" } // Everything else auto-discovered
}
```
## 8. Control Panel & Observability

SmartContext ships with a built-in web dashboard. The user sees real value from minute one.

### 8.1 Web Dashboard (built-in)

Accessible at `http://localhost:4800` (same port as the proxy; the root path serves the UI).
Single-page app, embedded in the binary: no extra dependencies, no build step.
Built with vanilla HTML/CSS/JS (no React/Vue), inlined into the server, <50KB total.

#### Dashboard Screens

**Home / Status**
```
┌─────────────────────────────────────────────────────┐
│ SmartContext Proxy                        ● Running │
│                                      [Pause] [Stop] │
├─────────────────────────────────────────────────────┤
│                                                     │
│  💰 Total Saved         ⚡ Requests Today           │
│     $63.00                 142                      │
│     4.2M tokens            avg 68% savings          │
│                                                     │
│  📈 Savings Over Time (7-day chart)                 │
│     ▁▂▄▅▆▇█                                         │
│                                                     │
│  🔌 Providers           💾 Storage                  │
│     anthropic: ✓ active    chunks:   8,943          │
│     openai:    ✓ active    sessions: 142            │
│     ollama:    ✓ active    disk:     234 MB         │
│                                                     │
│  ⏱ Performance                                      │
│     Latency overhead: 12ms p50 / 18ms p95           │
│     Embedding: ollama (nomic-embed-text)            │
│     Cache hit rate: 73%                             │
└─────────────────────────────────────────────────────┘
```
**Live Feed**, a real-time request stream:
```
┌─────────────────────────────────────────────────────┐
│ Live Feed                         [Auto-scroll ✓]   │
├─────────────────────────────────────────────────────┤
│ 14:23:05  anthropic/opus     45.2K → 12.1K  -73%    │
│           Retrieved: 7 chunks (top: 0.89)   12ms    │
│ 14:22:58  openai/gpt-4o      28.1K →  9.8K  -65%    │
│           Retrieved: 5 chunks (top: 0.82)    8ms    │
│ 14:22:41  anthropic/sonnet    8.2K →  8.2K  pass    │
│           ⚠ Below threshold, pass-through    2ms    │
│ 14:22:30  ollama/qwen3       12.0K →  4.1K  -66%    │
│           Retrieved: 4 chunks (top: 0.77)   15ms    │
└─────────────────────────────────────────────────────┘
```

Clicking a row expands it to show: the original messages, what was retrieved, what was cut, and the final assembled context. Full transparency.

**Sessions**, a per-session breakdown:
- Session list with timestamps, request count, total savings
- Click a session to see all exchanges, retrieval decisions, chunk scores
- Export session as JSON
**Savings Report**, the money page:
```
┌─────────────────────────────────────────────────────┐
│ Savings Report                        [Export CSV]  │
├─────────────────────────────────────────────────────┤
│                                                     │
│  This Month                                         │
│  ┌──────────────────────────────┐                   │
│  │ Without SmartContext:  $412  │                   │
│  │ With SmartContext:     $127  │                   │
│  │ ──────────────────────────   │                   │
│  │ You saved: $285              │  ← big, green     │
│  │ Savings rate: 69%            │                   │
│  └──────────────────────────────┘                   │
│                                                     │
│  By Provider                                        │
│    anthropic   $198 saved  (72% reduction)          │
│    openai       $67 saved  (61% reduction)          │
│    ollama        $0 saved  (local, free)            │
│                                                     │
│  By Model                                           │
│    claude-opus-4-6     $142 saved  (most expensive) │
│    claude-sonnet-4-6    $56 saved                   │
│    gpt-4o               $67 saved                   │
│                                                     │
│  Projection (if current usage continues)            │
│    Next month:  ~$290 saved                         │
│    Next year:   ~$3,480 saved                       │
│                                                     │
└─────────────────────────────────────────────────────┘
```

**Settings**, editable from the UI:
- Context tuning (tier sizes, thresholds, boosts)
- Provider management (add/remove API keys)
- Storage config
- Logging level
- Pause/resume individual providers
- Changes write to `~/.smartcontext/config.json`
#### Dashboard Tech Stack
- Vanilla HTML + CSS + minimal JS (no framework)
- Served by the proxy server itself (same port, root path)
- All HTML/CSS/JS inlined into a single TypeScript file (`src/ui/dashboard.ts`)
- Data via WebSocket from the proxy (real-time updates) + REST API for history
- Charts: lightweight `<canvas>` drawing, no chart-library dependency
- Works offline, no CDN, no external resources

### 8.2 System Tray (optional, separate package)

For users who want a tray icon:
```bash
npm install -g smartcontext-tray
```

The tray icon shows:
- Green dot = running, yellow = paused, red = stopped
- Click opens the web dashboard in the default browser
- Right-click menu: Pause / Resume / Stop / Open Dashboard / Quit
- Tooltip: "SmartContext: 142 requests, $63 saved today"

Built with `trayhost` (lightweight, no Electron). It is a separate package because the tray requires native deps; the core proxy stays zero-native-deps.
### 8.3 API Endpoints (programmatic access)

All dashboard data is available via REST:

```
GET  /_sc/status          → { state: "running"|"paused", uptime, pid }
GET  /_sc/stats           → { requests, savings, latency, storage }
GET  /_sc/stats/daily     → [ { date, requests, tokens_saved, cost_saved } ... ]
GET  /_sc/stats/providers → { anthropic: {...}, openai: {...} }
GET  /_sc/stats/models    → { "claude-opus-4-6": {...}, "gpt-4o": {...} }
GET  /_sc/feed            → WebSocket: real-time request stream
GET  /_sc/sessions        → [ { id, started, requests, savings } ... ]
GET  /_sc/sessions/:id    → { exchanges: [...], chunks_retrieved: [...] }
GET  /_sc/config          → current config (keys redacted)
PUT  /_sc/config          → update config (partial merge)
POST /_sc/pause           → pause proxy (pass-through all requests)
POST /_sc/resume          → resume optimization
POST /_sc/stop            → graceful shutdown
```

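A minimal TypeScript sketch of a client wrapper for these endpoints. The transport is injected so it can be stubbed in tests; the `ScStatus` shape follows the listing above, while the class name and default base URL are assumptions (use whatever host/port your proxy listens on):

```typescript
interface ScStatus {
  state: "running" | "paused";
  uptime: number;
  pid: number;
}

// Injected transport so tests (or non-fetch environments) can stub it out.
type Transport = (url: string, init?: { method?: string }) => Promise<unknown>;

class ScClient {
  constructor(
    private transport: Transport,
    private base = "http://localhost:8000", // assumed port; adjust to your config
  ) {}

  status(): Promise<ScStatus> {
    return this.transport(`${this.base}/_sc/status`) as Promise<ScStatus>;
  }

  pause(): Promise<unknown> {
    return this.transport(`${this.base}/_sc/pause`, { method: "POST" });
  }
}
```

With a real runtime the transport would be something like `(url, init) => fetch(url, init).then((r) => r.json())`.
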
### 8.4 Pause Mode

When paused:
- The proxy still runs and forwards all requests
- Context optimization is disabled → requests pass through unmodified
- Indexing continues (still learning from conversations)
- The dashboard shows a "PAUSED" badge
- Useful for debugging, with/without comparisons, and temporarily disabling optimization

### 8.5 Response Headers (opt-in)

When `logging.debug_headers: true`, responses include:

```
X-SmartContext-Savings: 73%
X-SmartContext-Original-Tokens: 45200
X-SmartContext-Optimized-Tokens: 12100
X-SmartContext-Retrieved-Chunks: 7
X-SmartContext-Top-Score: 0.89
X-SmartContext-Cache-Hit: prefix
X-SmartContext-Latency-Ms: 12
X-SmartContext-Mode: optimized|pass-through|paused
```

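On the client side these headers can be folded back into a typed object. A hedged sketch; the interface name and defaulting behavior are illustrative:

```typescript
interface SmartContextDebugInfo {
  savingsPct: number;
  originalTokens: number;
  optimizedTokens: number;
  mode: string;
}

// Parse the X-SmartContext-* debug headers from a plain header record.
// Names are matched case-insensitively, as HTTP allows.
function parseDebugHeaders(headers: Record<string, string>): SmartContextDebugInfo {
  const get = (name: string): string => {
    const key = Object.keys(headers).find(
      (k) => k.toLowerCase() === name.toLowerCase(),
    );
    return key ? headers[key] : "";
  };
  return {
    savingsPct: parseFloat(get("X-SmartContext-Savings")) || 0, // "73%" -> 73
    originalTokens: parseInt(get("X-SmartContext-Original-Tokens"), 10) || 0,
    optimizedTokens: parseInt(get("X-SmartContext-Optimized-Tokens"), 10) || 0,
    mode: get("X-SmartContext-Mode") || "optimized",
  };
}
```
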
### 8.6 Dashboard Integration (OC Adapter)

For our setup, the OC adapter additionally:
- Writes metrics to the `smartcontext-metrics` OpenSearch index
- Gives dashboard-ts a SmartContext tab that reads from this index
- Presents the same data with the native dashboard look & feel

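A sketch of how the adapter might build an OpenSearch `_bulk` payload for these metrics. The document fields mirror the per-request stats elsewhere in this spec, but the actual mapping of the `smartcontext-metrics` index is not defined here, so treat the field names as assumptions:

```typescript
interface MetricDoc {
  timestamp: string;
  provider: string;
  model: string;
  savings_pct: number;
  latency_overhead_ms: number;
}

// OpenSearch bulk format: alternating action and source lines,
// newline-delimited, with a trailing newline.
function buildBulkBody(index: string, docs: MetricDoc[]): string {
  return (
    docs
      .map(
        (doc) =>
          JSON.stringify({ index: { _index: index } }) + "\n" + JSON.stringify(doc),
      )
      .join("\n") + "\n"
  );
}
```

The resulting string is POSTed to `/_bulk` with `Content-Type: application/x-ndjson`.
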
## 9. Test Mode & LLM-Assisted Diagnostics

### 9.1 A/B Test Mode

```bash
npx smartcontext-proxy --test-mode
# or from dashboard: Settings → Enable Test Mode
```

In test mode, every request is sent **twice**:

```
Client Request
      │
      ├──► Path A: SmartContext optimized → Provider → Response A (returned to client)
      │
      └──► Path B: Original unmodified    → Provider → Response B (stored, not returned)
```

The client always gets Response A (optimized). Response B is stored for comparison.

**What gets compared:**
- **Semantic similarity**: embed both responses, compute cosine similarity. A score >0.95 means equivalent quality; a score <0.85 signals a potential retrieval miss.
- **Token delta**: how many tokens were saved (input of A vs. B)
- **Latency delta**: overhead of the optimization path
- **Content diff**: structured diff of the responses (key facts present/missing)
- **Tool use match**: whether A and B call the same tools with the same arguments

**Dashboard in Test Mode:**

```
┌─────────────────────────────────────────────────────┐
│ A/B Test Results                      [Export JSON] │
├─────────────────────────────────────────────────────┤
│                                                     │
│ Total comparisons: 87                               │
│ Quality match (>0.95):    79 (91%)  → green         │
│ Minor diff (0.85-0.95):   6 (7%)    → yellow        │
│ Significant diff (<0.85): 2 (2%)    → red           │
│                                                     │
│ Avg token savings:    64%                           │
│ Avg latency overhead: 14ms                          │
│                                                     │
│ ⚠ Significant diffs (click to inspect):             │
│   #43  anthropic/opus  0.78  "missed DB schema..."  │
│   #71  openai/gpt-4o   0.82  "lost function sig..." │
│                                                     │
└─────────────────────────────────────────────────────┘
```

Clicking a diff row shows a side-by-side view: optimized context vs. full context, which chunks were retrieved, what was missing, and both responses.

**Cost note:** Test mode doubles API costs. The dashboard shows the estimated extra cost. The user can limit it with `--test-mode --test-sample 20%`, which randomly samples 20% of requests for A/B comparison.

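The comparison and sampling logic above can be sketched in a few small functions; the thresholds come from the dashboard buckets, while the function names are illustrative:

```typescript
// Cosine similarity between two embedding vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

type AbVerdict = "match" | "minor-diff" | "significant-diff";

// Bucket a similarity score using the dashboard's thresholds.
function classify(similarity: number): AbVerdict {
  if (similarity > 0.95) return "match";
  if (similarity >= 0.85) return "minor-diff";
  return "significant-diff";
}

// --test-sample 20%: decide per request whether to run the B path.
function shouldSample(rate: number, rng: () => number = Math.random): boolean {
  return rng() < rate;
}
```
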
### 9.2 Verbose Logging

```bash
npx smartcontext-proxy --verbose
# or config: logging.level = "debug"
# or dashboard: Settings → Log Level → Debug
```

Debug logs per request:

```
[14:23:05.001] REQ #142 anthropic/claude-opus-4-6
[14:23:05.002] Original: 45,200 tokens (23 messages)
[14:23:05.003] System prompt: 2,100 tokens (stable, cache-eligible)
[14:23:05.004] Tier 1: kept last 3 exchanges (4,800 tokens)
[14:23:05.008] Embedding query: "fix the auth middleware bug" → 768-dim vector (6ms)
[14:23:05.012] Retrieval: 20 candidates, 7 above threshold
[14:23:05.012]   #1 score=0.89 session=abc123 "auth middleware refactor from yesterday"
[14:23:05.012]   #2 score=0.84 session=abc123 "JWT validation edge case discussion"
[14:23:05.012]   #3 score=0.81 session=def456 "middleware stack architecture overview"
[14:23:05.012]   #4 score=0.77 session=abc123 "auth test failures and fixes"
[14:23:05.012]   #5 score=0.72 session=ghi789 "similar bug in payment middleware"
[14:23:05.012]   #6 score=0.68 filepath-boost=+0.20 → 0.88 "src/middleware/auth.ts changes"
[14:23:05.012]   #7 score=0.61 session=abc123 "general project setup"
[14:23:05.013] Dedup: merged #1 and #4 (similarity 0.93)
[14:23:05.013] Budget: 38,300 available → packed 6 chunks (5,200 tokens)
[14:23:05.013] Tier 3: 1 session summary (320 tokens)
[14:23:05.014] Final context: 12,420 tokens (savings: 72.5%)
[14:23:05.014] Forwarding to api.anthropic.com
[14:23:07.891] Response: 1,842 tokens, streaming completed (2,877ms)
[14:23:07.892] Indexing exchange async...
[14:23:07.910] Indexed: 1 chunk (18ms)
```

Logs are written to:
- `~/.smartcontext/logs/proxy.log` (standard)
- `~/.smartcontext/logs/debug.log` (verbose, only when enabled)
- `~/.smartcontext/logs/requests/` (per-request JSON dumps when `logging.request_dumps: true`)

Per-request JSON dump (for forensic analysis):

```jsonc
{
  "id": 142,
  "timestamp": "2026-03-29T14:23:05.001Z",
  "provider": "anthropic",
  "model": "claude-opus-4-6",
  "original": {
    "messages": 23,
    "tokens": 45200,
    "system_prompt_tokens": 2100
  },
  "optimized": {
    "messages": 12,
    "tokens": 12420,
    "tier1_tokens": 4800,
    "tier2_tokens": 5200,
    "tier3_tokens": 320,
    "system_prompt_tokens": 2100
  },
  "retrieval": {
    "query_embedding_ms": 6,
    "search_ms": 4,
    "candidates": 20,
    "above_threshold": 7,
    "after_dedup": 6,
    "after_budget": 6,
    "top_score": 0.89,
    "chunks": [
      { "id": "chunk_abc_17", "score": 0.89, "tokens": 850, "session": "abc123", "preview": "auth middleware refactor..." },
      // ...
    ]
  },
  "savings_pct": 72.5,
  "latency_overhead_ms": 14,
  "response_tokens": 1842,
  // In test mode, also includes:
  "ab_test": {
    "response_b_tokens": 1910,
    "semantic_similarity": 0.97,
    "quality_match": true
  }
}
```

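The retrieval → boost → budget steps visible in the log above can be sketched as a greedy packer. This is a hedged sketch: the threshold and boost values match the log sample, but the real optimizer lives in `src/context/budget.ts` and its interfaces may differ:

```typescript
interface Chunk {
  id: string;
  score: number;
  tokens: number;
  filepathMatch?: boolean; // query mentions a file this chunk touches
}

// Apply the file-path boost, filter by threshold, then greedily pack
// the highest-scoring chunks into the available token budget.
function packChunks(
  candidates: Chunk[],
  budgetTokens: number,
  minScore = 0.55,
  filepathBoost = 0.2,
): Chunk[] {
  const scored = candidates
    .map((c) => ({ ...c, score: c.score + (c.filepathMatch ? filepathBoost : 0) }))
    .filter((c) => c.score >= minScore)
    .sort((a, b) => b.score - a.score);

  const packed: Chunk[] = [];
  let used = 0;
  for (const c of scored) {
    if (used + c.tokens <= budgetTokens) {
      packed.push(c);
      used += c.tokens;
    }
  }
  return packed;
}
```
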
### 9.3 LLM-Assisted Diagnostics

SmartContext can use a cheap LLM to analyze its own behavior and help debug issues.

#### Auto-Diagnosis (on significant quality diff)

When an A/B test detects similarity <0.85, or when the user flags a bad response:

1. SmartContext collects the full context: original messages, what was retrieved, what was cut, both responses
2. It sends this to a diagnostic LLM (cheapest available: local Ollama, or a Haiku-tier cloud model)
3. The LLM analyzes it and produces a diagnostic report

```
┌─────────────────────────────────────────────────────┐
│ 🔍 Diagnostic Report – Request #43                  │
├─────────────────────────────────────────────────────┤
│                                                     │
│ Problem: Response quality degraded (similarity      │
│ 0.78). Model missed database schema context.        │
│                                                     │
│ Root cause: The DB schema was discussed in          │
│ exchange #7 (45 min ago) but the embedding          │
│ similarity to the current query was only 0.52,      │
│ below the 0.55 threshold. The schema discussion     │
│ used technical column names while the current       │
│ query uses business-level terminology.              │
│                                                     │
│ Recommended fixes:                                  │
│ 1. Lower tier2_min_score to 0.50 for this           │
│    session type (schema discussions)                │
│ 2. Add keyword fallback: if query contains          │
│    table/column names, grep raw logs                │
│ 3. Consider hybrid retrieval: semantic + keyword    │
│                                                     │
│ ⚡ Auto-fix available:                               │
│ [Apply fix #1]  [Apply fix #2]  [Ignore]            │
│                                                     │
└─────────────────────────────────────────────────────┘
```

#### Diagnostic Commands

```bash
npx smartcontext-proxy diagnose               # Analyze last 100 requests, find issues
npx smartcontext-proxy diagnose --request 43  # Diagnose a specific request
npx smartcontext-proxy diagnose --tune        # Suggest config tuning based on usage patterns
```

From the dashboard: any request in the Live Feed or A/B results has a "🔍 Diagnose" button.

#### What the Diagnostic LLM Analyzes

| Trigger | What the LLM Receives | What the LLM Returns |
|---------|-----------------------|----------------------|
| Quality diff <0.85 | Both contexts, both responses, retrieval scores | Root cause, fix suggestions |
| User clicks "Diagnose" | Full request dump (JSON) | Plain-language analysis |
| `diagnose --tune` | Aggregate stats, score distributions, miss patterns | Config recommendations with reasoning |
| First 50 requests (onboarding) | Request patterns, conversation structure | Auto-tune suggestions for chunk size, thresholds |

#### Auto-Tuning

After accumulating 50+ requests, SmartContext can auto-tune itself:

```bash
npx smartcontext-proxy auto-tune
```

The diagnostic LLM analyzes:
- Score distribution (are thresholds too high or too low?)
- Miss patterns (what type of context gets lost?)
- Chunk size effectiveness (too big = wasteful, too small = lost context)
- Provider-specific patterns (some models need more context than others)

It produces a tuning report with specific config changes; the user approves or rejects each one.

Dashboard: Settings → "🧪 Auto-Tune" button. Shows proposed changes with before/after predictions.

#### LLM Selection for Diagnostics

Priority (cheapest first):
1. Local Ollama (qwen3-coder-next, kimi-k2.5) – free, fast
2. Ollama Cloud – cheap, fast
3. Cheapest detected cloud provider (Haiku, GPT-4o-mini)
4. Skip diagnostics if no cheap LLM is available

Diagnostic calls are **never** routed through SmartContext itself (to avoid recursion); they go directly to the diagnostic LLM's API.

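The priority order reduces to a first-available selection. A minimal sketch; the availability probes are injected by the caller, and the backend names simply label the tiers listed above:

```typescript
interface DiagnosticBackend {
  name: string;
  available: () => boolean; // probe injected by the caller (e.g. ping Ollama)
}

// Walk the cheapest-first list and pick the first backend that responds.
// Returns null when diagnostics should be skipped entirely.
function selectDiagnosticLlm(backends: DiagnosticBackend[]): string | null {
  for (const b of backends) {
    if (b.available()) return b.name;
  }
  return null;
}
```
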
## 10. Graceful Degradation

SmartContext must NEVER make things worse:

| Failure | Behavior |
|---------|----------|
| Vector store down | Pass through the original request unmodified |
| Embedding fails | Pass through the original request unmodified |
| No chunks above threshold | Pass through the original request unmodified |
| Provider unreachable | Return the error (same as without the proxy) |
| Config missing | Auto-generate defaults and start |
| First run, empty index | Pass through until enough data is indexed |

The proxy is **additive only**. If anything in the optimization pipeline fails, the original request goes through untouched. The user never sees degraded quality from SmartContext; in the worst case they get exactly what they'd get without it.

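The additive-only guarantee boils down to one wrapper around the optimization pipeline. A minimal sketch; the request shape and optimizer signature are placeholders for whatever the pipeline actually uses:

```typescript
type ProxyRequest = { messages: unknown[] }; // placeholder shape

// If any stage of optimization throws, fall back to the untouched request.
// Worst case, the provider sees exactly what it would see without the proxy.
function optimizeWithFallback(
  original: ProxyRequest,
  optimize: (req: ProxyRequest) => ProxyRequest,
): ProxyRequest {
  try {
    return optimize(original);
  } catch {
    return original; // never propagate optimizer failures to the client
  }
}
```

Every failure row in the table above (store down, embedding error, empty index) lands in the `catch` branch and yields the original request.
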
## 11. Security

- API keys are never stored in plaintext; the config references env vars (`env:ANTHROPIC_API_KEY`)
- The proxy listens on localhost by default (no network exposure)
- Raw logs are encrypted at rest (AES-256, key derived from the machine ID)
- No telemetry, no phone-home, no analytics
- All data stays local unless the user explicitly configures remote storage

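A sketch of the at-rest encryption under stated assumptions: AES-256-GCM as the concrete AES-256 mode and scrypt as the KDF for the machine-ID-derived key, neither of which this spec pins down:

```typescript
import { scryptSync, randomBytes, createCipheriv, createDecipheriv } from "node:crypto";

// Derive a 32-byte AES-256 key from the machine ID. The salt and KDF are
// illustrative; the spec only says "AES-256, key derived from machine ID".
function deriveKey(machineId: string): Buffer {
  return scryptSync(machineId, "smartcontext-logs", 32);
}

function encryptLog(plaintext: string, key: Buffer): Buffer {
  const iv = randomBytes(12); // standard GCM nonce size
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const body = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  // Store iv + auth tag + ciphertext together in one blob.
  return Buffer.concat([iv, cipher.getAuthTag(), body]);
}

function decryptLog(blob: Buffer, key: Buffer): string {
  const iv = blob.subarray(0, 12);
  const tag = blob.subarray(12, 28);
  const body = blob.subarray(28);
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(body), decipher.final()]).toString("utf8");
}
```

GCM also authenticates the data, so a tampered log file fails decryption instead of silently yielding garbage.
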
## 12. Project Structure

```
smartcontext-proxy/
├── src/
│   ├── index.ts              # Entry point, CLI, daemon management
│   ├── proxy/
│   │   ├── server.ts         # HTTP/SSE proxy server + UI serving
│   │   ├── router.ts         # Route to correct provider adapter
│   │   ├── stream.ts         # SSE pass-through logic
│   │   └── pause.ts          # Pause/resume state management
│   ├── providers/
│   │   ├── types.ts          # ProviderAdapter interface
│   │   ├── anthropic.ts      # Anthropic Messages API
│   │   ├── openai.ts         # OpenAI Chat Completions
│   │   ├── google.ts         # Google GenerateContent
│   │   └── ollama.ts         # Ollama native API
│   ├── context/
│   │   ├── optimizer.ts      # Core context optimization logic
│   │   ├── canonical.ts      # Canonical message format
│   │   ├── chunker.ts        # Message chunking
│   │   ├── retriever.ts      # Vector search + scoring
│   │   └── budget.ts         # Token budget allocation
│   ├── storage/
│   │   ├── types.ts          # StorageAdapter interface
│   │   ├── lancedb.ts        # Default embedded storage
│   │   └── filesystem.ts     # Fallback: raw logs only
│   ├── embedding/
│   │   ├── types.ts          # EmbeddingAdapter interface
│   │   ├── onnx.ts           # Built-in ONNX embedding
│   │   └── ollama.ts         # Ollama embedding
│   ├── config/
│   │   ├── auto-detect.ts    # Provider/embedding/storage discovery
│   │   ├── schema.ts         # Config validation
│   │   └── defaults.ts       # Default values
│   ├── metrics/
│   │   ├── collector.ts      # Request/response metrics
│   │   ├── endpoint.ts       # /_sc/* REST API
│   │   └── history.ts        # Persistent metrics (daily/monthly)
│   ├── ui/
│   │   ├── dashboard.ts      # Generates HTML/CSS/JS (inlined, no deps)
│   │   ├── ws-feed.ts        # WebSocket live feed server
│   │   └── api.ts            # /_sc/* route handlers (pause/stop/config)
│   ├── daemon/
│   │   ├── process.ts        # Fork/detach, PID file, signal handling
│   │   └── service.ts        # Generate systemd/launchd service files
│   └── adapters/
│       └── loader.ts         # Discover & load external adapters
├── adapters/
│   └── openclaw/             # Our adapter (separate npm package)
│       ├── index.ts
│       ├── storage.ts        # OpenSearch storage
│       ├── embedding.ts      # Beast Ollama embedding
│       └── session-importer.ts  # Import OC session logs
├── package.json
├── tsconfig.json
└── README.md
```

## 13. Synergy with Our Stack

| Our Component | SmartContext Integration |
|--------------|--------------------------|
| **OC Gateway** | SmartContext sits between OC and the Anthropic/Gemini APIs. OC config points `baseUrl` to SmartContext. |
| **Beast PC** | Remote Ollama embedding via the `remote-ollama` adapter. Faster than ONNX, zero cost. |
| **OpenSearch (Castle)** | The `openclaw` adapter stores chunks + metrics in OS. The dashboard reads them. |
| **Session logs** | `session-importer.ts` indexes historical OC sessions on first setup. |
| **Dashboard** | New SmartContext tab: savings graph, retrieval quality, per-cron breakdown. |
| **Cron jobs** | Each cron call goes through SmartContext → cross-session context for recurring tasks. |
| **A2A Bridge** | Agent-to-agent messages are indexed → agents share context automatically. |

## 14. Benchmark Plan

Before public release, benchmark on 10 real CC sessions (2 per type):

| Session Type | What to Measure |
|-------------|-----------------|
| Bug fix (short) | Retrieval precision, latency overhead |
| Feature build (long) | Token savings %, quality retention |
| Cron/monitoring | Cross-session context value |
| Multi-file refactor | File-path boost effectiveness |
| Learning/research | Summary tier value |

Metrics per session:
- **Semantic similarity**: SmartContext response vs. full-context response (cosine similarity of embeddings)
- **Token ratio**: optimized / original
- **Latency**: p50 and p95 overhead
- **Retrieval precision**: manually scored relevance of retrieved chunks (1-5 scale)