@tecet/ollm 0.1.4-b → 0.1.5

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (66)
  1. package/docs/README.md +3 -410
  2. package/package.json +2 -2
  3. package/docs/Context/CheckpointFlowDiagram.md +0 -673
  4. package/docs/Context/ContextArchitecture.md +0 -898
  5. package/docs/Context/ContextCompression.md +0 -1102
  6. package/docs/Context/ContextManagment.md +0 -750
  7. package/docs/Context/Index.md +0 -209
  8. package/docs/Context/README.md +0 -390
  9. package/docs/DevelopmentRoadmap/Index.md +0 -238
  10. package/docs/DevelopmentRoadmap/OLLM-CLI_Releases.md +0 -419
  11. package/docs/DevelopmentRoadmap/PlanedFeatures.md +0 -448
  12. package/docs/DevelopmentRoadmap/README.md +0 -174
  13. package/docs/DevelopmentRoadmap/Roadmap.md +0 -572
  14. package/docs/DevelopmentRoadmap/RoadmapVisual.md +0 -372
  15. package/docs/Hooks/Architecture.md +0 -885
  16. package/docs/Hooks/Index.md +0 -244
  17. package/docs/Hooks/KeyboardShortcuts.md +0 -248
  18. package/docs/Hooks/Protocol.md +0 -817
  19. package/docs/Hooks/README.md +0 -403
  20. package/docs/Hooks/UserGuide.md +0 -1483
  21. package/docs/Hooks/VisualGuide.md +0 -598
  22. package/docs/Index.md +0 -506
  23. package/docs/Installation.md +0 -586
  24. package/docs/Introduction.md +0 -367
  25. package/docs/LLM Models/Index.md +0 -239
  26. package/docs/LLM Models/LLM_GettingStarted.md +0 -748
  27. package/docs/LLM Models/LLM_Index.md +0 -701
  28. package/docs/LLM Models/LLM_MemorySystem.md +0 -337
  29. package/docs/LLM Models/LLM_ModelCompatibility.md +0 -499
  30. package/docs/LLM Models/LLM_ModelsArchitecture.md +0 -933
  31. package/docs/LLM Models/LLM_ModelsCommands.md +0 -839
  32. package/docs/LLM Models/LLM_ModelsConfiguration.md +0 -1094
  33. package/docs/LLM Models/LLM_ModelsList.md +0 -1071
  34. package/docs/LLM Models/LLM_ModelsList.md.backup +0 -400
  35. package/docs/LLM Models/README.md +0 -355
  36. package/docs/MCP/MCP_Architecture.md +0 -1086
  37. package/docs/MCP/MCP_Commands.md +0 -1111
  38. package/docs/MCP/MCP_GettingStarted.md +0 -590
  39. package/docs/MCP/MCP_Index.md +0 -524
  40. package/docs/MCP/MCP_Integration.md +0 -866
  41. package/docs/MCP/MCP_Marketplace.md +0 -160
  42. package/docs/MCP/README.md +0 -415
  43. package/docs/Prompts System/Architecture.md +0 -760
  44. package/docs/Prompts System/Index.md +0 -223
  45. package/docs/Prompts System/PromptsRouting.md +0 -1047
  46. package/docs/Prompts System/PromptsTemplates.md +0 -1102
  47. package/docs/Prompts System/README.md +0 -389
  48. package/docs/Prompts System/SystemPrompts.md +0 -856
  49. package/docs/Quickstart.md +0 -535
  50. package/docs/Tools/Architecture.md +0 -884
  51. package/docs/Tools/GettingStarted.md +0 -624
  52. package/docs/Tools/Index.md +0 -216
  53. package/docs/Tools/ManifestReference.md +0 -141
  54. package/docs/Tools/README.md +0 -440
  55. package/docs/Tools/UserGuide.md +0 -773
  56. package/docs/Troubleshooting.md +0 -1265
  57. package/docs/UI&Settings/Architecture.md +0 -729
  58. package/docs/UI&Settings/ColorASCII.md +0 -34
  59. package/docs/UI&Settings/Commands.md +0 -755
  60. package/docs/UI&Settings/Configuration.md +0 -872
  61. package/docs/UI&Settings/Index.md +0 -293
  62. package/docs/UI&Settings/Keybinds.md +0 -372
  63. package/docs/UI&Settings/README.md +0 -278
  64. package/docs/UI&Settings/Terminal.md +0 -637
  65. package/docs/UI&Settings/Themes.md +0 -604
  66. package/docs/UI&Settings/UIGuide.md +0 -550
@@ -1,750 +0,0 @@
# Context Management System

**Last Updated:** January 26, 2026
**Status:** Source of Truth

**Related Documents:**

- `ContextArchitecture.md` - Overall system architecture
- `ContextCompression.md` - Compression, checkpoints, snapshots
- `SystemPrompts.md` - System prompt architecture

---

## Overview

The Context Management System determines context window sizes, monitors VRAM, and counts tokens. It provides the foundation for the compression and prompt systems.

**Core Responsibility:** Determine and maintain the context size that will be sent to Ollama.

---

## Table of Contents

1. [Architecture](#architecture)
2. [Context Tiers](#context-tiers)
3. [Context Size Flow](#context-size-flow)
4. [LLM_profiles.json Structure](#llm_profilesjson-structure)
5. [Auto-Sizing](#auto-sizing)
6. [Token Counting](#token-counting)
7. [VRAM Monitoring](#vram-monitoring)
8. [Configuration](#configuration)
9. [Events](#events)
10. [API Reference](#api-reference)
11. [Best Practices](#best-practices)
12. [Troubleshooting](#troubleshooting)
13. [Common Mistakes](#common-mistakes)
14. [File Locations](#file-locations)
15. [Summary](#summary)

---

## Architecture

### Core Components

```mermaid
graph TB
    subgraph "Context Management"
        A[Context Manager]
        B[VRAM Monitor]
        C[Token Counter]
        D[Context Pool]
        E[Memory Guard]
    end

    subgraph "Supporting Systems"
        F[System Prompt Builder]
        G[Compression Coordinator]
        H[Snapshot Manager]
    end

    A --> B
    A --> C
    A --> D
    A --> E
    A --> F
    A --> G
    A --> H

    style A fill:#4d96ff
    style B fill:#6bcf7f
    style C fill:#ffd93d
```

**Component Responsibilities:**

1. **Context Manager** (`contextManager.ts`)
   - Main orchestration layer
   - Coordinates all context services
   - Manages conversation state
   - Owns the system prompt

2. **VRAM Monitor** (`vramMonitor.ts`)
   - Tracks GPU memory availability
   - Detects low memory conditions
   - Platform-specific implementations (NVIDIA, AMD, Apple Silicon)

3. **Token Counter** (`tokenCounter.ts`)
   - Measures context usage in tokens
   - Caches token counts for performance
   - Estimates tokens for messages

4. **Context Pool** (`contextPool.ts`)
   - Manages dynamic context sizing
   - Calculates optimal context size based on VRAM
   - Handles context resizing

5. **Memory Guard** (`memoryGuard.ts`)
   - Prevents OOM errors
   - Emits warnings at memory thresholds
   - Triggers emergency actions
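
The manager sits on top of these services and re-emits their signals. As a purely illustrative sketch (every name here except the file names above is an assumption, not the package's real API):

```typescript
import { EventEmitter } from 'node:events';

// Hypothetical service shapes; the real classes live in vramMonitor.ts,
// tokenCounter.ts, etc., and their actual interfaces will differ.
interface VramMonitorLike extends EventEmitter {
  start(): Promise<void>;
}
interface TokenCounterLike {
  count(text: string): number;
}

class ContextManagerSketch extends EventEmitter {
  constructor(
    private readonly vram: VramMonitorLike,
    private readonly tokens: TokenCounterLike,
  ) {
    super();
  }

  async start(): Promise<void> {
    // Re-emit low-memory signals so the UI layer can warn the user
    this.vram.on('low-memory', (info: unknown) => this.emit('low-memory', info));
    await this.vram.start();
  }
}
```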

---

## Context Tiers

Context tiers are **labels** that represent different context window sizes. They are the **result** of user selection or hardware detection, not decision makers.

### Tier Definitions

```mermaid
graph LR
    subgraph "Tier 1: Minimal"
        A1[2K, 4K]
        A2[1700, 3400 Ollama]
    end

    subgraph "Tier 2: Basic"
        B1[8K]
        B2[6800 Ollama]
    end

    subgraph "Tier 3: Standard ⭐"
        C1[16K]
        C2[13600 Ollama]
    end

    subgraph "Tier 4: Premium"
        D1[32K]
        D2[27200 Ollama]
    end

    subgraph "Tier 5: Ultra"
        E1[64K, 128K]
        E2[54400, 108800 Ollama]
    end

    style C1 fill:#6bcf7f
    style C2 fill:#6bcf7f
```

| Tier              | Context Size | Ollama Size (85%) | Use Case                            |
| ----------------- | ------------ | ----------------- | ----------------------------------- |
| Tier 1 (Minimal)  | 2K, 4K       | 1700, 3400        | Quick tasks, minimal context        |
| Tier 2 (Basic)    | 8K           | 6800              | Standard conversations              |
| Tier 3 (Standard) | 16K          | 13600             | Complex tasks, code review ⭐       |
| Tier 4 (Premium)  | 32K          | 27200             | Large codebases, long conversations |
| Tier 5 (Ultra)    | 64K, 128K    | 54400, 108800     | Maximum context, research tasks     |

**Key Points:**

- Tiers are **labels only** - they don't make decisions
- Context size drives everything
- Each tier has specific context sizes (not ranges)
- Tiers are used for prompt selection (see `SystemPrompts.md`)
- The 85% values are **pre-calculated by devs** in `LLM_profiles.json`
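
Because tiers are just labels, deriving one from a selected size can be a plain lookup. A minimal sketch, with the function name assumed (the real prompt-selection logic is described in `SystemPrompts.md`):

```typescript
// Map a selected context size (tokens) to its tier label, per the table above.
type Tier = 1 | 2 | 3 | 4 | 5;

function tierForContextSize(size: number): Tier {
  if (size <= 4096) return 1; // Minimal: 2K, 4K
  if (size <= 8192) return 2; // Basic: 8K
  if (size <= 16384) return 3; // Standard: 16K
  if (size <= 32768) return 4; // Premium: 32K
  return 5; // Ultra: 64K, 128K
}

tierForContextSize(16384); // => 3 (Standard)
```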

---

## Context Size Flow

### User Selection → Ollama

```mermaid
sequenceDiagram
    participant User
    participant System
    participant Profile as LLM_profiles.json
    participant Ollama

    User->>System: Select 16K context
    System->>Profile: Read model entry
    Profile->>System: ollama_context_size: 13600
    System->>System: Determine tier: Tier 3
    System->>System: Build prompt for Tier 3
    System->>Ollama: Send prompt + num_ctx: 13600
    Ollama->>Ollama: Use 100% of 13600 tokens

    Note over System,Ollama: 85% already calculated in profile
```

**Flow Steps:**

1. User selects context size (e.g., 16K)
2. System reads `LLM_profiles.json`
3. Gets pre-calculated `ollama_context_size` (e.g., 13600 for 16K)
4. System determines tier label (Tier 3 for 16K)
5. System builds prompt based on tier label
6. System sends prompt + `ollama_context_size` (13600) to Ollama
7. Ollama uses 100% of that value (13600 tokens)

**Critical:** The 85% is already calculated in `LLM_profiles.json`. No runtime calculation of 85% should exist in the code.

### Data Flow Chain

```mermaid
graph TD
    A[LLM_profiles.json] --> B[ProfileManager.getModelEntry]
    B --> C[calculateContextSizing]
    C --> D[Returns: allowed, ollamaContextSize, ratio]
    D --> E[ModelContext.sendToLLM OR nonInteractive.ts]
    E --> F[contextActions.updateConfig]
    F --> G[context.maxTokens = ollamaContextSize]
    G --> H[provider.chatStream]
    H --> I[Ollama enforces limit]

    style A fill:#4d96ff
    style G fill:#6bcf7f
    style I fill:#ffd93d
```

**Critical:** `context.maxTokens` MUST equal `ollamaContextSize`, not the user's selection.

---

## LLM_profiles.json Structure

### Profile Format

```jsonc
{
  "models": [
    {
      "id": "llama3.2:3b",
      "context_profiles": [
        {
          "size": 4096, // User sees this
          "ollama_context_size": 3482, // We send this to Ollama (85%)
          "size_label": "4k"
        }
      ]
    }
  ]
}
```

**Why pre-calculate ratios?**

- Model-specific (different models need different ratios)
- Empirically tested values
- No runtime calculation = no bugs
- Single source of truth
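
In code, honoring these points means looking the value up rather than computing it. A sketch assuming the profile shape above (the helper name is hypothetical; `calculateContextSizing()` in `contextSizing.ts` is the real consumer):

```typescript
interface ContextProfile {
  size: number; // what the user sees
  ollama_context_size: number; // what gets sent to Ollama
  size_label: string;
}

interface ModelEntry {
  id: string;
  context_profiles: ContextProfile[];
}

// Read the pre-calculated value from the profile; never multiply by 0.85.
function ollamaSizeFor(entry: ModelEntry, requestedSize: number): number {
  const profile = entry.context_profiles.find((p) => p.size === requestedSize);
  if (!profile) {
    throw new Error(`No context profile for size ${requestedSize} on ${entry.id}`);
  }
  return profile.ollama_context_size;
}

// ollamaSizeFor(llama32Entry, 4096) => 3482
```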

---

## Auto-Sizing

Auto-sizing picks the optimal context size at startup based on available VRAM, then **stays fixed** for the session.

### Auto-Sizing Flow

```mermaid
graph TD
    Start[Session Start] --> Mode{Sizing Mode?}

    Mode -->|Auto| Auto[Auto-Sizing]
    Mode -->|Manual| Manual[User Selection]

    Auto --> CheckVRAM[Check VRAM]
    CheckVRAM --> CalcOptimal[Calculate Optimal Size]
    CalcOptimal --> PickTier[Pick One Tier Below Max]
    PickTier --> Lock[LOCK for Session]

    Manual --> UserPick[User Picks Size]
    UserPick --> Lock

    Lock --> SelectPrompt[Select System Prompt]
    SelectPrompt --> Fixed[Context FIXED]

    Fixed --> LowMem{Low Memory<br/>During Session?}
    LowMem -->|Yes| Warn[Show Warning]
    LowMem -->|No| Continue[Continue]

    Warn --> NoResize[Do NOT Resize]
    NoResize --> Continue

    Continue --> End[Session Continues]

    style Lock fill:#6bcf7f
    style Fixed fill:#6bcf7f
    style NoResize fill:#ffd93d
```

### Context Sizing Logic

**Step 1: Load Profile**

```typescript
const modelEntry = profileManager.getModelEntry(modelId);
```

**Step 2: Calculate Sizing**

```typescript
const contextSizing = calculateContextSizing(requestedSize, modelEntry, contextCapRatio);
// Returns: { requested: 4096, allowed: 4096, ollamaContextSize: 3482, ratio: 0.85 }
```

**Step 3: Set Context Limits (CRITICAL)**

```typescript
// Set context.maxTokens to Ollama's limit, NOT the user's selection
contextActions.updateConfig({ targetSize: contextSizing.ollamaContextSize });
// Now context.maxTokens = 3482
```

**Step 4: Send to Provider**

```typescript
provider.chatStream({
  options: { num_ctx: contextSizing.ollamaContextSize }, // 3482
});
```


### Expected Behavior

```mermaid
graph LR
    A[Auto Mode] --> B[Check VRAM]
    B --> C[Pick One Tier Below Max]
    C --> D[FIXED for Session]

    E[Manual Mode] --> F[User Picks]
    F --> D

    D --> G[Low Memory?]
    G -->|Yes| H[Show Warning]
    G -->|No| I[Continue]
    H --> I

    style D fill:#6bcf7f
    style H fill:#ffd93d
```

- **Auto mode:** Check VRAM → pick one tier below max → FIXED for session
- **Manual mode:** User picks → FIXED for session
- **On low memory:** Show warning to user (system message)
- **No automatic mid-conversation changes**
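
A hypothetical sketch of the auto-mode rule. The real heuristic lives in `contextPool.ts`; here the VRAM-fit test is abstracted away, and stepping down one entry in the size list stands in for 'one tier below max':

```typescript
// Candidate context sizes, largest first.
const CANDIDATE_SIZES = [131072, 65536, 32768, 16384, 8192, 4096, 2048];

// fitsInVram is assumed to estimate whether a size fits in free VRAM
// (model weights + KV cache + safety buffer); that estimator is not shown here.
function pickAutoSize(fitsInVram: (size: number) => boolean): number {
  const max = CANDIDATE_SIZES.findIndex(fitsInVram); // largest size that fits
  if (max === -1) return CANDIDATE_SIZES[CANDIDATE_SIZES.length - 1]; // fall back to 2K
  // One step below the maximum for safety; the result is locked for the session.
  return CANDIDATE_SIZES[Math.min(max + 1, CANDIDATE_SIZES.length - 1)];
}
```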

### Warning Message Example

```
⚠️ Low memory detected (VRAM: 85% used)
Your current context size may cause performance issues.
Consider restarting with a smaller context size.
```

---

## Token Counting

### Token Counter Responsibilities

```mermaid
graph LR
    A[Messages] --> B[Token Counter]
    B --> C[Count Tokens]
    C --> D[Cache Results]
    D --> E[Return Count]

    B --> F[Estimate New Content]
    F --> G[Return Estimate]

    style B fill:#4d96ff
    style D fill:#6bcf7f
```

- Count tokens in messages
- Cache token counts for performance
- Estimate tokens for new content
- Track total context usage

### Usage Tracking

```typescript
interface ContextUsage {
  currentTokens: number; // Current usage
  maxTokens: number; // Ollama limit (85% of user selection)
  percentage: number; // Usage percentage
  available: number; // Remaining tokens
}
```

**Example:**

```
User selects: 16K
Ollama limit: 13,600 (85%)
Current usage: 8,500 tokens
Percentage: 62%
Available: 5,100 tokens
```
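
The example numbers follow directly from the `ContextUsage` fields. A minimal sketch of the arithmetic:

```typescript
// Derive the reported fields from current and max token counts.
function computeUsage(currentTokens: number, maxTokens: number): ContextUsage {
  return {
    currentTokens,
    maxTokens,
    percentage: Math.floor((currentTokens / maxTokens) * 100),
    available: maxTokens - currentTokens,
  };
}

computeUsage(8500, 13600); // => { percentage: 62, available: 5100, ... }
```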

### Token Budget Breakdown

```mermaid
graph TB
    A[Total Context: 13,600 tokens] --> B[System Prompt: 1,000 tokens]
    A --> C[Checkpoints: 2,100 tokens]
    A --> D[User Messages: 3,000 tokens]
    A --> E[Assistant Messages: 7,500 tokens]

    B --> F[Never Compressed]
    C --> G[Compressed History]
    D --> F
    E --> H[Not Yet Compressed]

    style F fill:#6bcf7f
    style G fill:#ffd93d
    style H fill:#4d96ff
```

---

## VRAM Monitoring

### VRAM Monitor Responsibilities

```mermaid
graph TD
    A[VRAM Monitor] --> B[Detect GPU Type]
    B --> C{Platform?}

    C -->|NVIDIA| D[nvidia-smi]
    C -->|AMD| E[rocm-smi]
    C -->|Apple| F[system APIs]

    D --> G[Query VRAM]
    E --> G
    F --> G

    G --> H[Calculate Available]
    H --> I[Check Thresholds]
    I --> J[Emit Warnings]

    style A fill:#4d96ff
    style I fill:#ffd93d
    style J fill:#ff6b6b
```

- Detect GPU type (NVIDIA, AMD, Apple Silicon)
- Query VRAM usage
- Emit low memory warnings
- Calculate optimal context size

### Platform Support

**NVIDIA (nvidia-smi):**

- Total VRAM
- Used VRAM
- Free VRAM
- GPU utilization

**AMD (rocm-smi):**

- Total VRAM
- Used VRAM
- Free VRAM

**Apple Silicon (system APIs):**

- Unified memory
- Memory pressure
- Available memory
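
As an illustration of the NVIDIA path, a small Node sketch that shells out to `nvidia-smi` in its machine-readable CSV mode (the function name and surrounding structure are assumptions; see `vramMonitor.ts` for the real implementation and the AMD/Apple paths):

```typescript
import { execFile } from 'node:child_process';
import { promisify } from 'node:util';

const run = promisify(execFile);

interface VramInfo {
  totalMb: number;
  usedMb: number;
  freeMb: number;
}

async function queryNvidiaVram(): Promise<VramInfo> {
  // CSV output without headers or units, e.g. "24576, 1024, 23552"
  const { stdout } = await run('nvidia-smi', [
    '--query-gpu=memory.total,memory.used,memory.free',
    '--format=csv,noheader,nounits',
  ]);
  // First line only; multi-GPU systems print one line per device.
  const [totalMb, usedMb, freeMb] = stdout.trim().split('\n')[0].split(',').map(Number);
  return { totalMb, usedMb, freeMb };
}
```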

### Memory Thresholds

```typescript
enum MemoryLevel {
  NORMAL, // < 70% usage
  WARNING, // 70-85% usage
  CRITICAL, // 85-95% usage
  EMERGENCY, // > 95% usage
}
```

```mermaid
graph LR
    A[Memory Usage] --> B{Level?}

    B -->|< 70%| C[🟢 NORMAL<br/>Continue]
    B -->|70-85%| D[🟡 WARNING<br/>Show Warning]
    B -->|85-95%| E[🟠 CRITICAL<br/>Critical Warning]
    B -->|> 95%| F[🔴 EMERGENCY<br/>Emergency Warning]

    style C fill:#6bcf7f
    style D fill:#ffd93d
    style E fill:#ff9f43
    style F fill:#ff6b6b
```
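
Classifying a reading against these thresholds is a straight cascade. A minimal sketch (the function name is hypothetical):

```typescript
// Map a VRAM usage fraction (0..1) to the MemoryLevel bands above.
function classifyMemory(usedFraction: number): MemoryLevel {
  if (usedFraction > 0.95) return MemoryLevel.EMERGENCY;
  if (usedFraction > 0.85) return MemoryLevel.CRITICAL;
  if (usedFraction > 0.7) return MemoryLevel.WARNING;
  return MemoryLevel.NORMAL;
}

classifyMemory(0.85); // => MemoryLevel.WARNING (a boundary value falls in the lower band here)
```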

---

## Configuration

### Context Config

```typescript
interface ContextConfig {
  targetSize: number; // Target context size (set to ollamaContextSize at runtime, not the raw user selection)
  minSize: number; // Minimum context size
  maxSize: number; // Maximum context size
  autoSize: boolean; // Enable auto-sizing
  vramBuffer: number; // VRAM safety buffer (MB)
  kvQuantization: boolean; // Enable KV cache quantization
}
```

### Default Values

```typescript
const DEFAULT_CONTEXT_CONFIG = {
  targetSize: 8192,
  minSize: 2048,
  maxSize: 131072,
  autoSize: false,
  vramBuffer: 1024, // 1GB safety buffer
  kvQuantization: false,
};
```
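
Overrides go through `updateConfig(config: Partial<ContextConfig>)`, so a caller only names the fields to change. A hypothetical example:

```typescript
// Start from the defaults and override selected fields.
const config: ContextConfig = {
  ...DEFAULT_CONTEXT_CONFIG,
  targetSize: 13600, // pre-calculated Ollama size for a 16K selection
  autoSize: true,
};

// Or, against a running ConversationContextManager instance (see API Reference below):
manager.updateConfig({ vramBuffer: 2048 }); // widen the safety buffer to 2GB
```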

---

## Events

### Core Events

- `started` - Context management started
- `stopped` - Context management stopped
- `config-updated` - Configuration changed
- `tier-changed` - Context tier changed
- `mode-changed` - Operational mode changed

### Memory Events

- `low-memory` - Low VRAM detected
- `memory-warning` - Memory usage warning (70-85%)
- `memory-critical` - Critical memory usage (85-95%)
- `memory-emergency` - Emergency memory condition (>95%)

### Context Events

- `context-resized` - Context size changed
- `context-recalculated` - Available tokens recalculated
- `context-discovered` - New context discovered (JIT)
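
Assuming the manager exposes these through an EventEmitter-style interface (the subscription API is not shown in this document, so treat this as a sketch):

```typescript
// Surface memory pressure and resizes in the UI layer.
manager.on('memory-warning', () => {
  console.warn('⚠️ VRAM usage between 70% and 85%');
});

manager.on('context-resized', () => {
  const { maxTokens } = manager.getUsage();
  console.log(`Context limit is now ${maxTokens} tokens`);
});
```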

---

## API Reference

### Context Manager

```typescript
class ConversationContextManager {
  // Lifecycle
  async start(): Promise<void>;
  async stop(): Promise<void>;

  // Configuration
  updateConfig(config: Partial<ContextConfig>): void;

  // Context
  getUsage(): ContextUsage;
  getContext(): ConversationContext;

  // Messages
  async addMessage(message: Message): Promise<void>;
  async getMessages(): Promise<Message[]>;

  // System Prompt (see SystemPrompts.md)
  setSystemPrompt(content: string): void;
  getSystemPrompt(): string;

  // Mode & Skills (see SystemPrompts.md)
  setMode(mode: OperationalMode): void;
  getMode(): OperationalMode;
  setActiveSkills(skills: string[]): void;
  setActiveTools(tools: string[]): void;
  setActiveHooks(hooks: string[]): void;
  setActiveMcpServers(servers: string[]): void;

  // Compression (see ContextCompression.md)
  async compress(): Promise<void>;
  getCheckpoints(): CompressionCheckpoint[];

  // Snapshots (see ContextCompression.md)
  async createSnapshot(): Promise<ContextSnapshot>;
  async restoreSnapshot(snapshotId: string): Promise<void>;

  // Discovery
  async discoverContext(targetPath: string): Promise<void>;

  // Streaming
  reportInflightTokens(delta: number): void;
  clearInflightTokens(): void;
}
```
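
A minimal end-to-end usage sketch of this API; the constructor arguments and the `Message` shape are assumptions, not taken from this document:

```typescript
const manager = new ConversationContextManager();
await manager.start();

// Point the manager at the pre-calculated Ollama limit (see Context Size Flow).
manager.updateConfig({ targetSize: 13600 });
manager.setSystemPrompt('You are a helpful assistant.');

await manager.addMessage({ role: 'user', content: 'Hello!' });

const usage = manager.getUsage();
console.log(`${usage.currentTokens}/${usage.maxTokens} tokens (${usage.percentage}%)`);

await manager.stop();
```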

---

## Best Practices

### 1. Context Size Selection

- Start with Tier 3 (16K) for most tasks
- Use Tier 2 (8K) for quick conversations
- Use Tier 1 (2K, 4K) for minimal context needs
- Use Tier 4 (32K) for large codebases
- Use Tier 5 (64K, 128K) only when necessary (high VRAM cost)

### 2. Auto-Sizing

- Enable for automatic optimization
- Picks one tier below maximum for safety
- Fixed for the session (no mid-conversation changes)
- Shows warnings on low memory

### 3. VRAM Management

- Monitor VRAM usage regularly
- Keep a 1GB safety buffer
- Close other GPU applications
- Use KV cache quantization for large contexts

---

## Troubleshooting

### Context Overflow

**Symptom:** "Context usage at 95%" warning

**Solutions:**

1. Create a snapshot and start fresh (see `ContextCompression.md`)
2. Enable compression if disabled
3. Use a smaller context size
4. Clear old messages

### Low Memory

**Symptom:** "Low memory detected" warning

**Solutions:**

1. Restart with a smaller context size
2. Close other applications
3. Use a model with fewer parameters
4. Enable KV cache quantization

### Wrong Context Size Sent to Ollama

**Symptom:** Ollama receives the wrong `num_ctx` value

**Solutions:**

1. Verify `context.maxTokens` equals `ollamaContextSize` (see the check sketched below)
2. Check that `LLM_profiles.json` has the correct pre-calculated values
3. Ensure `calculateContextSizing()` reads from the profile (no calculation)
4. Verify `contextActions.updateConfig()` is called before sending to the provider
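
For solution 1, a hypothetical pre-flight check that fails fast when the two values drift apart (`contextSizing` and `manager` come from the sizing flow and API shown above):

```typescript
// Assert the runtime limit matches the profile before calling the provider.
const expected = contextSizing.ollamaContextSize;
const actual = manager.getUsage().maxTokens;
if (actual !== expected) {
  throw new Error(`context.maxTokens is ${actual}, expected ${expected} from LLM_profiles.json`);
}
```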

---

## Common Mistakes

### ❌ Calculating instead of reading

```typescript
const ollamaSize = userSize * 0.85; // Wrong
const ollamaSize = profile.ollama_context_size; // Correct
```

### ❌ Not updating context.maxTokens

```typescript
// Wrong - maxTokens stays at user selection
provider.chat({ options: { num_ctx: ollamaContextSize } });

// Correct - update maxTokens first
contextActions.updateConfig({ targetSize: ollamaContextSize });
provider.chat({ options: { num_ctx: ollamaContextSize } });
```

### ❌ Using user selection for thresholds

```typescript
const trigger = userContextSize * 0.75; // Wrong - uses user selection
const trigger = context.maxTokens * 0.75; // Correct - uses ollama limit
```

---

## File Locations

| File                                                  | Purpose                   |
| ----------------------------------------------------- | ------------------------- |
| `packages/core/src/context/contextManager.ts`         | Main orchestration        |
| `packages/core/src/context/vramMonitor.ts`            | VRAM monitoring           |
| `packages/core/src/context/tokenCounter.ts`           | Token counting            |
| `packages/core/src/context/contextPool.ts`            | Dynamic sizing            |
| `packages/core/src/context/memoryGuard.ts`            | Memory safety             |
| `packages/core/src/context/types.ts`                  | Type definitions          |
| `packages/cli/src/config/LLM_profiles.json`           | Pre-calculated 85% values |
| `packages/cli/src/features/context/contextSizing.ts`  | calculateContextSizing()  |
| `packages/cli/src/features/context/ModelContext.tsx`  | Interactive mode          |
| `packages/cli/src/nonInteractive.ts`                  | CLI mode                  |

---

## Summary

### Key Features

1. **Fixed Context Sizing** ✅
   - Context size determined once at startup
   - Stays fixed for the entire session
   - No mid-conversation changes
   - Predictable behavior

2. **Tier-Based System** ✅
   - 5 tiers from Minimal to Ultra
   - Labels represent specific context sizes
   - Used for prompt selection
   - Tier 3 (Standard) is the primary target

3. **Pre-Calculated Ratios** ✅
   - 85% values in LLM_profiles.json
   - No runtime calculation
   - Model-specific values
   - Single source of truth

4. **VRAM Monitoring** ✅
   - Platform-specific implementations
   - Real-time memory tracking
   - Low memory warnings
   - Optimal size calculation

5. **Auto-Sizing** ✅
   - Automatic optimization
   - One tier below max for safety
   - Fixed for the session
   - Clear warnings

6. **Token Counting** ✅
   - Accurate token measurement
   - Performance caching
   - Usage tracking
   - Budget management

---

**Document Status:** ✅ Updated
**Last Updated:** January 26, 2026
**Purpose:** Complete guide to the context management system

**Note:** This document focuses on context sizing logic. For compression and snapshots, see `ContextCompression.md`. For prompt structure, see `SystemPrompts.md`.