@tecet/ollm 0.1.4 → 0.1.5
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/dist/cli.js +20 -14
- package/dist/cli.js.map +3 -3
- package/dist/services/documentService.d.ts.map +1 -1
- package/dist/services/documentService.js +12 -2
- package/dist/services/documentService.js.map +1 -1
- package/dist/ui/components/docs/DocsPanel.d.ts.map +1 -1
- package/dist/ui/components/docs/DocsPanel.js +1 -1
- package/dist/ui/components/docs/DocsPanel.js.map +1 -1
- package/dist/ui/components/launch/VersionBanner.js +1 -1
- package/dist/ui/components/launch/VersionBanner.js.map +1 -1
- package/dist/ui/components/layout/KeybindsLegend.d.ts.map +1 -1
- package/dist/ui/components/layout/KeybindsLegend.js +1 -1
- package/dist/ui/components/layout/KeybindsLegend.js.map +1 -1
- package/dist/ui/components/tabs/BugReportTab.js +1 -1
- package/dist/ui/components/tabs/BugReportTab.js.map +1 -1
- package/dist/ui/services/docsService.d.ts +12 -27
- package/dist/ui/services/docsService.d.ts.map +1 -1
- package/dist/ui/services/docsService.js +40 -67
- package/dist/ui/services/docsService.js.map +1 -1
- package/docs/README.md +3 -410
- package/package.json +10 -7
- package/scripts/copy-docs-to-user.cjs +34 -0
- package/docs/Context/CheckpointFlowDiagram.md +0 -673
- package/docs/Context/ContextArchitecture.md +0 -898
- package/docs/Context/ContextCompression.md +0 -1102
- package/docs/Context/ContextManagment.md +0 -750
- package/docs/Context/Index.md +0 -209
- package/docs/Context/README.md +0 -390
- package/docs/DevelopmentRoadmap/Index.md +0 -238
- package/docs/DevelopmentRoadmap/OLLM-CLI_Releases.md +0 -419
- package/docs/DevelopmentRoadmap/PlanedFeatures.md +0 -448
- package/docs/DevelopmentRoadmap/README.md +0 -174
- package/docs/DevelopmentRoadmap/Roadmap.md +0 -572
- package/docs/DevelopmentRoadmap/RoadmapVisual.md +0 -372
- package/docs/Hooks/Architecture.md +0 -885
- package/docs/Hooks/Index.md +0 -244
- package/docs/Hooks/KeyboardShortcuts.md +0 -248
- package/docs/Hooks/Protocol.md +0 -817
- package/docs/Hooks/README.md +0 -403
- package/docs/Hooks/UserGuide.md +0 -1483
- package/docs/Hooks/VisualGuide.md +0 -598
- package/docs/Index.md +0 -506
- package/docs/Installation.md +0 -586
- package/docs/Introduction.md +0 -367
- package/docs/LLM Models/Index.md +0 -239
- package/docs/LLM Models/LLM_GettingStarted.md +0 -748
- package/docs/LLM Models/LLM_Index.md +0 -701
- package/docs/LLM Models/LLM_MemorySystem.md +0 -337
- package/docs/LLM Models/LLM_ModelCompatibility.md +0 -499
- package/docs/LLM Models/LLM_ModelsArchitecture.md +0 -933
- package/docs/LLM Models/LLM_ModelsCommands.md +0 -839
- package/docs/LLM Models/LLM_ModelsConfiguration.md +0 -1094
- package/docs/LLM Models/LLM_ModelsList.md +0 -1071
- package/docs/LLM Models/LLM_ModelsList.md.backup +0 -400
- package/docs/LLM Models/README.md +0 -355
- package/docs/MCP/MCP_Architecture.md +0 -1086
- package/docs/MCP/MCP_Commands.md +0 -1111
- package/docs/MCP/MCP_GettingStarted.md +0 -590
- package/docs/MCP/MCP_Index.md +0 -524
- package/docs/MCP/MCP_Integration.md +0 -866
- package/docs/MCP/MCP_Marketplace.md +0 -160
- package/docs/MCP/README.md +0 -415
- package/docs/Prompts System/Architecture.md +0 -760
- package/docs/Prompts System/Index.md +0 -223
- package/docs/Prompts System/PromptsRouting.md +0 -1047
- package/docs/Prompts System/PromptsTemplates.md +0 -1102
- package/docs/Prompts System/README.md +0 -389
- package/docs/Prompts System/SystemPrompts.md +0 -856
- package/docs/Quickstart.md +0 -535
- package/docs/Tools/Architecture.md +0 -884
- package/docs/Tools/GettingStarted.md +0 -624
- package/docs/Tools/Index.md +0 -216
- package/docs/Tools/ManifestReference.md +0 -141
- package/docs/Tools/README.md +0 -440
- package/docs/Tools/UserGuide.md +0 -773
- package/docs/Troubleshooting.md +0 -1265
- package/docs/UI&Settings/Architecture.md +0 -729
- package/docs/UI&Settings/ColorASCII.md +0 -34
- package/docs/UI&Settings/Commands.md +0 -755
- package/docs/UI&Settings/Configuration.md +0 -872
- package/docs/UI&Settings/Index.md +0 -293
- package/docs/UI&Settings/Keybinds.md +0 -372
- package/docs/UI&Settings/README.md +0 -278
- package/docs/UI&Settings/Terminal.md +0 -637
- package/docs/UI&Settings/Themes.md +0 -604
- package/docs/UI&Settings/UIGuide.md +0 -550
@@ -1,750 +0,0 @@

# Context Management System

**Last Updated:** January 26, 2026
**Status:** Source of Truth

**Related Documents:**

- `ContextArchitecture.md` - Overall system architecture
- `ContextCompression.md` - Compression, checkpoints, snapshots
- `SystemPrompts.md` - System prompt architecture

---

## Overview

The Context Management System determines context window sizes, monitors VRAM, and manages token counting. It provides the foundation for compression and prompt systems.

**Core Responsibility:** Determine and maintain the context size that will be sent to Ollama.

---

## Table of Contents

1. [Architecture](#architecture)
2. [Context Tiers](#context-tiers)
3. [Context Size Flow](#context-size-flow)
4. [Auto-Sizing](#auto-sizing)
5. [Token Counting](#token-counting)
6. [VRAM Monitoring](#vram-monitoring)
7. [Configuration](#configuration)
8. [API Reference](#api-reference)

---

## Architecture

### Core Components

```mermaid
graph TB
    subgraph "Context Management"
        A[Context Manager]
        B[VRAM Monitor]
        C[Token Counter]
        D[Context Pool]
        E[Memory Guard]
    end

    subgraph "Supporting Systems"
        F[System Prompt Builder]
        G[Compression Coordinator]
        H[Snapshot Manager]
    end

    A --> B
    A --> C
    A --> D
    A --> E
    A --> F
    A --> G
    A --> H

    style A fill:#4d96ff
    style B fill:#6bcf7f
    style C fill:#ffd93d
```

**Component Responsibilities:**

1. **Context Manager** (`contextManager.ts`)
   - Main orchestration layer
   - Coordinates all context services
   - Manages conversation state
   - Owns system prompt

2. **VRAM Monitor** (`vramMonitor.ts`)
   - Tracks GPU memory availability
   - Detects low memory conditions
   - Platform-specific implementations (NVIDIA, AMD, Apple Silicon)

3. **Token Counter** (`tokenCounter.ts`)
   - Measures context usage in tokens
   - Caches token counts for performance
   - Estimates tokens for messages

4. **Context Pool** (`contextPool.ts`)
   - Manages dynamic context sizing
   - Calculates optimal context size based on VRAM
   - Handles context resizing

5. **Memory Guard** (`memoryGuard.ts`)
   - Prevents OOM errors
   - Emits warnings at memory thresholds
   - Triggers emergency actions

---

## Context Tiers

Context tiers are **labels** that represent different context window sizes. They are the **result** of user selection or hardware detection, not decision makers.

### Tier Definitions

```mermaid
graph LR
    subgraph "Tier 1: Minimal"
        A1[2K, 4K]
        A2[1700, 3400 Ollama]
    end

    subgraph "Tier 2: Basic"
        B1[8K]
        B2[6800 Ollama]
    end

    subgraph "Tier 3: Standard ⭐"
        C1[16K]
        C2[13600 Ollama]
    end

    subgraph "Tier 4: Premium"
        D1[32K]
        D2[27200 Ollama]
    end

    subgraph "Tier 5: Ultra"
        E1[64K, 128K]
        E2[54400, 108800 Ollama]
    end

    style C1 fill:#6bcf7f
    style C2 fill:#6bcf7f
```

| Tier              | Context Size | Ollama Size (85%) | Use Case                            |
| ----------------- | ------------ | ----------------- | ----------------------------------- |
| Tier 1 (Minimal)  | 2K, 4K       | 1700, 3400        | Quick tasks, minimal context        |
| Tier 2 (Basic)    | 8K           | 6800              | Standard conversations              |
| Tier 3 (Standard) | 16K          | 13600             | Complex tasks, code review ⭐       |
| Tier 4 (Premium)  | 32K          | 27200             | Large codebases, long conversations |
| Tier 5 (Ultra)    | 64K, 128K    | 54400, 108800     | Maximum context, research tasks     |

**Key Points:**

- Tiers are **labels only** - they don't make decisions
- Context size drives everything
- Each tier has specific context sizes (not ranges)
- Tiers are used for prompt selection (see `SystemPrompts.md`)
- The 85% values are **pre-calculated by devs** in `LLM_profiles.json`
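
Because tiers are pure labels derived from the selected context size, the mapping can be written as a small lookup. A minimal sketch in TypeScript - the `tierForContextSize` helper and its exact boundary handling are illustrative, not the actual implementation:

```typescript
// Sketch: map a selected context size (tokens) to its tier label.
// Boundaries mirror the table above; the helper name is hypothetical.
type TierLabel = 'Tier 1' | 'Tier 2' | 'Tier 3' | 'Tier 4' | 'Tier 5';

function tierForContextSize(size: number): TierLabel {
  if (size <= 4096) return 'Tier 1'; // 2K, 4K
  if (size <= 8192) return 'Tier 2'; // 8K
  if (size <= 16384) return 'Tier 3'; // 16K (primary target)
  if (size <= 32768) return 'Tier 4'; // 32K
  return 'Tier 5'; // 64K, 128K
}

tierForContextSize(16384); // => 'Tier 3'
```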
---

## Context Size Flow

### User Selection → Ollama

```mermaid
sequenceDiagram
    participant User
    participant System
    participant Profile as LLM_profiles.json
    participant Ollama

    User->>System: Select 16K context
    System->>Profile: Read model entry
    Profile->>System: ollama_context_size: 13600
    System->>System: Determine tier: Tier 3
    System->>System: Build prompt for Tier 3
    System->>Ollama: Send prompt + num_ctx: 13600
    Ollama->>Ollama: Use 100% of 13600 tokens

    Note over System,Ollama: 85% already calculated in profile
```

**Flow Steps:**

1. User selects context size (e.g., 16K)
2. System reads `LLM_profiles.json`
3. Gets pre-calculated `ollama_context_size` (e.g., 13600 for 16K)
4. System determines tier label (Tier 3 for 16K)
5. System builds prompt based on tier label
6. System sends prompt + `ollama_context_size` (13600) to Ollama
7. Ollama uses 100% of that value (13600 tokens)

**Critical:** The 85% is already calculated in `LLM_profiles.json`. No runtime calculation of 85% should exist in the code.

### Data Flow Chain

```mermaid
graph TD
    A[LLM_profiles.json] --> B[ProfileManager.getModelEntry]
    B --> C[calculateContextSizing]
    C --> D[Returns: allowed, ollamaContextSize, ratio]
    D --> E[ModelContext.sendToLLM OR nonInteractive.ts]
    E --> F[contextActions.updateConfig]
    F --> G[context.maxTokens = ollamaContextSize]
    G --> H[provider.chatStream]
    H --> I[Ollama enforces limit]

    style A fill:#4d96ff
    style G fill:#6bcf7f
    style I fill:#ffd93d
```

**Critical:** `context.maxTokens` MUST equal `ollamaContextSize`, not the user's selection.
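
One way to keep that invariant honest is an explicit runtime check right after configuration - a sketch only, using the names from the chain above:

```typescript
// Sketch: fail fast if the context limit drifts from the profile value.
// `context` and `contextSizing` refer to the objects named in the chain above.
if (context.maxTokens !== contextSizing.ollamaContextSize) {
  throw new Error(
    `context.maxTokens (${context.maxTokens}) must equal ` +
      `ollamaContextSize (${contextSizing.ollamaContextSize})`,
  );
}
```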
---

## LLM_profiles.json Structure

### Profile Format

```json
{
  "models": [
    {
      "id": "llama3.2:3b",
      "context_profiles": [
        {
          "size": 4096, // User sees this
          "ollama_context_size": 3482, // We send this to Ollama (85%)
          "size_label": "4k"
        }
      ]
    }
  ]
}
```

**Why pre-calculate ratios?**

- Model-specific (different models need different ratios)
- Empirically tested values
- No runtime calculation = no bugs
- Single source of truth
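
With the values pre-calculated, resolving the Ollama limit at runtime is a plain lookup. A minimal sketch assuming the JSON shape above (the `getOllamaContextSize` helper is hypothetical):

```typescript
// Sketch: look up the pre-calculated Ollama limit for a model + requested size.
interface ContextProfile {
  size: number;
  ollama_context_size: number;
  size_label: string;
}

interface ModelEntry {
  id: string;
  context_profiles: ContextProfile[];
}

function getOllamaContextSize(
  models: ModelEntry[],
  modelId: string,
  requestedSize: number,
): number | undefined {
  const entry = models.find((m) => m.id === modelId);
  // Read the stored value - never compute userSize * 0.85 at runtime.
  return entry?.context_profiles.find((p) => p.size === requestedSize)
    ?.ollama_context_size;
}

// getOllamaContextSize(profiles.models, 'llama3.2:3b', 4096) // => 3482
```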
---

## Auto-Sizing

Auto-sizing picks the optimal context size at startup based on available VRAM, then **stays fixed** for the session.

### Auto-Sizing Flow

```mermaid
graph TD
    Start[Session Start] --> Mode{Sizing Mode?}

    Mode -->|Auto| Auto[Auto-Sizing]
    Mode -->|Manual| Manual[User Selection]

    Auto --> CheckVRAM[Check VRAM]
    CheckVRAM --> CalcOptimal[Calculate Optimal Size]
    CalcOptimal --> PickTier[Pick One Tier Below Max]
    PickTier --> Lock[LOCK for Session]

    Manual --> UserPick[User Picks Size]
    UserPick --> Lock

    Lock --> SelectPrompt[Select System Prompt]
    SelectPrompt --> Fixed[Context FIXED]

    Fixed --> LowMem{Low Memory<br/>During Session?}
    LowMem -->|Yes| Warn[Show Warning]
    LowMem -->|No| Continue[Continue]

    Warn --> NoResize[Do NOT Resize]
    NoResize --> Continue

    Continue --> End[Session Continues]

    style Lock fill:#6bcf7f
    style Fixed fill:#6bcf7f
    style NoResize fill:#ffd93d
```

### Context Sizing Logic

**Step 1: Load Profile**

```typescript
const modelEntry = profileManager.getModelEntry(modelId);
```

**Step 2: Calculate Sizing**

```typescript
const contextSizing = calculateContextSizing(requestedSize, modelEntry, contextCapRatio);
// Returns: { requested: 4096, allowed: 4096, ollamaContextSize: 3482, ratio: 0.85 }
```

**Step 3: Set Context Limits (CRITICAL)**

```typescript
// Set context.maxTokens to Ollama's limit, NOT user's selection
contextActions.updateConfig({ targetSize: contextSizing.ollamaContextSize });
// Now context.maxTokens = 3482
```

**Step 4: Send to Provider**

```typescript
provider.chatStream({
  options: { num_ctx: contextSizing.ollamaContextSize }, // 3482
});
```

### Expected Behavior

```mermaid
graph LR
    A[Auto Mode] --> B[Check VRAM]
    B --> C[Pick One Tier Below Max]
    C --> D[FIXED for Session]

    E[Manual Mode] --> F[User Picks]
    F --> D

    D --> G[Low Memory?]
    G -->|Yes| H[Show Warning]
    G -->|No| I[Continue]
    H --> I

    style D fill:#6bcf7f
    style H fill:#ffd93d
```

- **Auto mode:** Check VRAM → pick one tier below max → FIXED for session
- **Manual mode:** User picks → FIXED for session
- **On low memory:** Show warning to user (system message)
- **No automatic mid-conversation changes**

### Warning Message Example

```
⚠️ Low memory detected (VRAM: 85% used)
Your current context size may cause performance issues.
Consider restarting with a smaller context size.
```

---

## Token Counting

### Token Counter Responsibilities

```mermaid
graph LR
    A[Messages] --> B[Token Counter]
    B --> C[Count Tokens]
    C --> D[Cache Results]
    D --> E[Return Count]

    B --> F[Estimate New Content]
    F --> G[Return Estimate]

    style B fill:#4d96ff
    style D fill:#6bcf7f
```

- Count tokens in messages
- Cache token counts for performance
- Estimate tokens for new content
- Track total context usage

### Usage Tracking

```typescript
interface ContextUsage {
  currentTokens: number; // Current usage
  maxTokens: number; // Ollama limit (85% of user selection)
  percentage: number; // Usage percentage
  available: number; // Remaining tokens
}
```

**Example:**

```
User selects: 16K
Ollama limit: 13,600 (85%)
Current usage: 8,500 tokens
Percentage: 62%
Available: 5,100 tokens
```
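
The two derived fields follow directly from the counters, as the example shows. A sketch of that arithmetic (the `computeUsage` helper is illustrative, not the actual tokenCounter API):

```typescript
// Sketch: derive percentage and headroom from the raw token counters.
function computeUsage(currentTokens: number, maxTokens: number): ContextUsage {
  return {
    currentTokens,
    maxTokens,
    percentage: Math.floor((currentTokens / maxTokens) * 100), // 8500/13600 → 62
    available: maxTokens - currentTokens, // 13600 - 8500 = 5100
  };
}
```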
### Token Budget Breakdown

```mermaid
graph TB
    A[Total Context: 13,600 tokens] --> B[System Prompt: 1,000 tokens]
    A --> C[Checkpoints: 2,100 tokens]
    A --> D[User Messages: 3,000 tokens]
    A --> E[Assistant Messages: 7,500 tokens]

    B --> F[Never Compressed]
    C --> G[Compressed History]
    D --> F
    E --> H[Not Yet Compressed]

    style F fill:#6bcf7f
    style G fill:#ffd93d
    style H fill:#4d96ff
```

---

## VRAM Monitoring

### VRAM Monitor Responsibilities

```mermaid
graph TD
    A[VRAM Monitor] --> B[Detect GPU Type]
    B --> C{Platform?}

    C -->|NVIDIA| D[nvidia-smi]
    C -->|AMD| E[rocm-smi]
    C -->|Apple| F[system APIs]

    D --> G[Query VRAM]
    E --> G
    F --> G

    G --> H[Calculate Available]
    H --> I[Check Thresholds]
    I --> J[Emit Warnings]

    style A fill:#4d96ff
    style I fill:#ffd93d
    style J fill:#ff6b6b
```

- Detect GPU type (NVIDIA, AMD, Apple Silicon)
- Query VRAM usage
- Emit low memory warnings
- Calculate optimal context size

### Platform Support

**NVIDIA (nvidia-smi):**

- Total VRAM
- Used VRAM
- Free VRAM
- GPU utilization

**AMD (rocm-smi):**

- Total VRAM
- Used VRAM
- Free VRAM

**Apple Silicon (system APIs):**

- Unified memory
- Memory pressure
- Available memory
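
On NVIDIA hardware these numbers can be read by shelling out to `nvidia-smi`. A minimal sketch - the `queryNvidiaVram` helper is illustrative, and the real `vramMonitor.ts` may work differently:

```typescript
import { execFile } from 'node:child_process';
import { promisify } from 'node:util';

const execFileAsync = promisify(execFile);

// Sketch: read total/used/free VRAM (MiB) from nvidia-smi's CSV output.
async function queryNvidiaVram(): Promise<{ total: number; used: number; free: number }> {
  const { stdout } = await execFileAsync('nvidia-smi', [
    '--query-gpu=memory.total,memory.used,memory.free',
    '--format=csv,noheader,nounits',
  ]);
  const [total, used, free] = stdout.trim().split(',').map((v) => Number(v.trim()));
  return { total, used, free };
}
```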
### Memory Thresholds

```typescript
enum MemoryLevel {
  NORMAL, // < 70% usage
  WARNING, // 70-85% usage
  CRITICAL, // 85-95% usage
  EMERGENCY, // > 95% usage
}
```

```mermaid
graph LR
    A[Memory Usage] --> B{Level?}

    B -->|< 70%| C[🟢 NORMAL<br/>Continue]
    B -->|70-85%| D[🟡 WARNING<br/>Show Warning]
    B -->|85-95%| E[🟠 CRITICAL<br/>Critical Warning]
    B -->|> 95%| F[🔴 EMERGENCY<br/>Emergency Warning]

    style C fill:#6bcf7f
    style D fill:#ffd93d
    style E fill:#ff9f43
    style F fill:#ff6b6b
```
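
Classifying a usage ratio into these levels is a straight threshold check - a sketch, with the `classifyMemory` name being illustrative:

```typescript
// Sketch: map a VRAM usage fraction (0..1) onto the MemoryLevel thresholds.
function classifyMemory(usage: number): MemoryLevel {
  if (usage > 0.95) return MemoryLevel.EMERGENCY;
  if (usage > 0.85) return MemoryLevel.CRITICAL;
  if (usage > 0.7) return MemoryLevel.WARNING;
  return MemoryLevel.NORMAL;
}

classifyMemory(0.8); // => MemoryLevel.WARNING
```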
---

## Configuration

### Context Config

```typescript
interface ContextConfig {
  targetSize: number; // Target context size (user selection)
  minSize: number; // Minimum context size
  maxSize: number; // Maximum context size
  autoSize: boolean; // Enable auto-sizing
  vramBuffer: number; // VRAM safety buffer (MB)
  kvQuantization: boolean; // Enable KV cache quantization
}
```

### Default Values

```typescript
const DEFAULT_CONTEXT_CONFIG = {
  targetSize: 8192,
  minSize: 2048,
  maxSize: 131072,
  autoSize: false,
  vramBuffer: 1024, // 1GB safety buffer
  kvQuantization: false,
};
```

---

## Events

### Core Events

- `started` - Context management started
- `stopped` - Context management stopped
- `config-updated` - Configuration changed
- `tier-changed` - Context tier changed
- `mode-changed` - Operational mode changed

### Memory Events

- `low-memory` - Low VRAM detected
- `memory-warning` - Memory usage warning (70-85%)
- `memory-critical` - Critical memory usage (85-95%)
- `memory-emergency` - Emergency memory condition (>95%)

### Context Events

- `context-resized` - Context size changed
- `context-recalculated` - Available tokens recalculated
- `context-discovered` - New context discovered (JIT)
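
Assuming the manager exposes a Node-style `on(event, handler)` emitter surface - this document does not confirm the exact API - subscribing would look like:

```typescript
// Sketch: react to memory events; the emitter-style API is an assumption.
contextManager.on('memory-warning', () => {
  console.warn('VRAM usage at 70-85% - consider a smaller context next session');
});
contextManager.on('memory-emergency', () => {
  console.error('VRAM usage above 95% - emergency actions may be triggered');
});
```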
---

## API Reference

### Context Manager

```typescript
class ConversationContextManager {
  // Lifecycle
  async start(): Promise<void>;
  async stop(): Promise<void>;

  // Configuration
  updateConfig(config: Partial<ContextConfig>): void;

  // Context
  getUsage(): ContextUsage;
  getContext(): ConversationContext;

  // Messages
  async addMessage(message: Message): Promise<void>;
  async getMessages(): Promise<Message[]>;

  // System Prompt (see SystemPrompts.md)
  setSystemPrompt(content: string): void;
  getSystemPrompt(): string;

  // Mode & Skills (see SystemPrompts.md)
  setMode(mode: OperationalMode): void;
  getMode(): OperationalMode;
  setActiveSkills(skills: string[]): void;
  setActiveTools(tools: string[]): void;
  setActiveHooks(hooks: string[]): void;
  setActiveMcpServers(servers: string[]): void;

  // Compression (see ContextCompression.md)
  async compress(): Promise<void>;
  getCheckpoints(): CompressionCheckpoint[];

  // Snapshots (see ContextCompression.md)
  async createSnapshot(): Promise<ContextSnapshot>;
  async restoreSnapshot(snapshotId: string): Promise<void>;

  // Discovery
  async discoverContext(targetPath: string): Promise<void>;

  // Streaming
  reportInflightTokens(delta: number): void;
  clearInflightTokens(): void;
}
```
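
A typical call sequence against this surface might look as follows. This is a sketch: how the manager is constructed is not covered here, so the factory is hypothetical:

```typescript
// Sketch of one session lifecycle; createContextManager() is a hypothetical factory.
const manager: ConversationContextManager = createContextManager();
await manager.start();

// Use the Ollama limit from the profile, never the user's raw selection.
manager.updateConfig({ targetSize: 13600 });

await manager.addMessage({ role: 'user', content: 'Review this diff' } as Message);

const usage = manager.getUsage();
if (usage.percentage > 75) {
  await manager.compress(); // see ContextCompression.md
}

await manager.stop();
```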
---

## Best Practices

### 1. Context Size Selection

- Start with Tier 3 (16K) for most tasks
- Use Tier 2 (8K) for quick conversations
- Use Tier 1 (2K, 4K) for minimal context needs
- Use Tier 4 (32K) for large codebases
- Use Tier 5 (64K, 128K) only when necessary (high VRAM cost)

### 2. Auto-Sizing

- Enable for automatic optimization
- Picks one tier below maximum for safety
- Fixed for session (no mid-conversation changes)
- Shows warnings on low memory

### 3. VRAM Management

- Monitor VRAM usage regularly
- Keep 1GB safety buffer
- Close other GPU applications
- Use KV cache quantization for large contexts

---

## Troubleshooting

### Context Overflow

**Symptom:** "Context usage at 95%" warning

**Solutions:**

1. Create a snapshot and start fresh (see `ContextCompression.md`)
2. Enable compression if disabled
3. Use a smaller context size
4. Clear old messages

### Low Memory

**Symptom:** "Low memory detected" warning

**Solutions:**

1. Restart with a smaller context size
2. Close other applications
3. Use a model with fewer parameters
4. Enable KV cache quantization

### Wrong Context Size Sent to Ollama

**Symptom:** Ollama receives the wrong `num_ctx` value

**Solutions:**

1. Verify `context.maxTokens` equals `ollamaContextSize`
2. Check `LLM_profiles.json` has correct pre-calculated values
3. Ensure `calculateContextSizing()` reads from the profile (no calculation)
4. Verify `contextActions.updateConfig()` is called before sending to the provider

---

## Common Mistakes

### ❌ Calculating instead of reading

```typescript
const ollamaSize = userSize * 0.85; // Wrong
const ollamaSize = profile.ollama_context_size; // Correct
```

### ❌ Not updating context.maxTokens

```typescript
// Wrong - maxTokens stays at user selection
provider.chat({ options: { num_ctx: ollamaContextSize } });

// Correct - update maxTokens first
contextActions.updateConfig({ targetSize: ollamaContextSize });
provider.chat({ options: { num_ctx: ollamaContextSize } });
```

### ❌ Using user selection for thresholds

```typescript
const trigger = userContextSize * 0.75; // Wrong - uses user selection
const trigger = context.maxTokens * 0.75; // Correct - uses ollama limit
```

---

## File Locations

| File                                                  | Purpose                   |
| ----------------------------------------------------- | ------------------------- |
| `packages/core/src/context/contextManager.ts`         | Main orchestration        |
| `packages/core/src/context/vramMonitor.ts`            | VRAM monitoring           |
| `packages/core/src/context/tokenCounter.ts`           | Token counting            |
| `packages/core/src/context/contextPool.ts`            | Dynamic sizing            |
| `packages/core/src/context/memoryGuard.ts`            | Memory safety             |
| `packages/core/src/context/types.ts`                  | Type definitions          |
| `packages/cli/src/config/LLM_profiles.json`           | Pre-calculated 85% values |
| `packages/cli/src/features/context/contextSizing.ts`  | calculateContextSizing()  |
| `packages/cli/src/features/context/ModelContext.tsx`  | Interactive mode          |
| `packages/cli/src/nonInteractive.ts`                  | CLI mode                  |

---

## Summary

### Key Features

1. **Fixed Context Sizing** ✅
   - Context size determined once at startup
   - Stays fixed for entire session
   - No mid-conversation changes
   - Predictable behavior

2. **Tier-Based System** ✅
   - 5 tiers from Minimal to Ultra
   - Labels represent specific context sizes
   - Used for prompt selection
   - Tier 3 (Standard) is the primary target

3. **Pre-Calculated Ratios** ✅
   - 85% values in `LLM_profiles.json`
   - No runtime calculation
   - Model-specific values
   - Single source of truth

4. **VRAM Monitoring** ✅
   - Platform-specific implementations
   - Real-time memory tracking
   - Low memory warnings
   - Optimal size calculation

5. **Auto-Sizing** ✅
   - Automatic optimization
   - One tier below max for safety
   - Fixed for session
   - Clear warnings

6. **Token Counting** ✅
   - Accurate token measurement
   - Performance caching
   - Usage tracking
   - Budget management

---

**Document Status:** ✅ Updated
**Last Updated:** January 26, 2026
**Purpose:** Complete guide to the context management system

**Note:** This document focuses on context sizing logic. For compression and snapshots, see `ContextCompression.md`. For prompt structure, see `SystemPrompts.md`.