cc-context-stats 1.7.0 → 1.8.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/CHANGELOG.md CHANGED
@@ -7,6 +7,30 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
  ## [Unreleased]
 
+ ## [1.8.0] - 2026-03-15
+
+ ### Added
+
+ - **Model Intelligence (MI) score** — Heuristic score estimating answer quality based on context utilization, cache efficiency, and output productivity. Inspired by the Michelangelo paper (arXiv:2409.12640). Displayed as `MI:X.XX` in the statusline with green/yellow/red color coding
+ - **MI score in all implementations** — MI computation available across the Python package, standalone Python, Node.js, and Bash (via `awk`) statusline scripts with full cross-implementation parity
+ - **MI timeseries graph** — `context-stats --type mi` renders the MI score trajectory over time as an ASCII graph with decimal Y-axis labels
+ - **MI in session summary** — The `context-stats` summary now shows the MI score with a sub-component breakdown (CPS, ES, PS) and interpretation text
+ - **Shared test vectors** — `tests/fixtures/mi_test_vectors.json` with 6 vectors ensuring Python and Node.js produce identical MI scores within ±0.01 tolerance
+ - **`label_fn` parameter for `render_timeseries()`** — Optional custom Y-axis label formatter, used by the MI graph to display decimals instead of token counts
+ - **Bash feature parity** — `statusline-full.sh` now supports custom color overrides, state file rotation, MI score display, and all config keys (`show_mi`, `mi_curve_beta`, `reduced_motion`, `show_io_tokens`)
+ - **Config: `show_mi`** — Toggle MI score display (default: `true`)
+ - **Config: `mi_curve_beta`** — Adjust the MI degradation curve shape (default: `1.5`)
+
+ ### Changed
+
+ - **Compact context display** — Removed the word "free" from the context info (`872,748 (87.3%)` instead of `872,748 free (87.3%)`) across all implementations
+ - **Decoupled state reads from `show_delta`** — The state file is now read when either `show_delta` or `show_mi` is enabled, allowing MI to work independently of the delta display
+ - **Node.js terminal width default** — Changed from `80` to `200` when no TTY is detected (matching Python behavior), preventing `fitToWidth` from dropping statusline parts in Claude Code's subprocess
+
+ ### Fixed
+
+ - **Node.js terminal width** — Fixed `getTerminalWidth()` defaulting to 80 in Claude Code's subprocess, which caused the MI, delta, AC, and session parts to be silently dropped
+
  ## [1.7.0] - 2026-03-14
 
  ### Added
@@ -0,0 +1,396 @@
# Model Intelligence (MI) Metric — Implementation Plan

> Inspired by the **Michelangelo** paper: *Long Context Evaluations Beyond Haystacks via Latent Structure Queries* (arXiv:2409.12640, Google DeepMind, Sep 2024)

## Context

Users of cc-context-stats can see how much context remains, but have no indicator of **how well the model is likely performing** at the current context fill level. The Michelangelo paper demonstrates that LLM answer quality degrades predictably as context fills — an initial sharp super-linear drop followed by linear/flat degradation. Different capabilities (reasoning, retrieval, self-awareness) degrade at different rates.

This plan introduces a **Model Intelligence (MI)** score: a [0, 1] heuristic that estimates answer quality using only data already available in the CSV state entries. It complements the existing Smart/Dumb/Wrap Up Zone indicators with a continuous, multi-dimensional score.

## Key Insights from the Paper

1. **Performance degrades with context length** — All models show significant falloff, often starting before 32K tokens. There is an initial sharp super-linear drop, followed by either flattening or continued linear degradation.

2. **Three orthogonal evaluation dimensions** (each measures a different aspect):
   - **MRCR**: Understanding ordering, distinguishing similar content, reproducing context (scored via string similarity [0,1])
   - **Latent List**: Tracking a latent data structure through operations (scored via exact match + normalized error [0,1])
   - **IDK**: Knowing what the model doesn't know (scored via accuracy [0,1])

3. **Higher complexity = steeper degradation** — As task complexity increases, performance falls off faster with context length.

4. **Cross-over behavior** — A model that is better at short contexts can become worse than its peers at long contexts.

5. **Perplexity ≠ reasoning quality** — Low perplexity does not predict good reasoning at long contexts.

## Formula Design

### Guard Clause

If `context_window_size == 0` (old 2-field CSV entries, malformed data), return `MI = 1.0` with all sub-scores at defaults (CPS=1.0, ES=1.0, PS=0.5). This avoids division by zero and treats unknown-context entries optimistically.

### MI = 0.60 × CPS + 0.25 × ES + 0.15 × PS

Three sub-scores, each inspired by a Michelangelo evaluation dimension. Weights are **hardcoded constants** (not configurable) to minimize cross-implementation sync burden.

### 1. Context Pressure Score (CPS) — weight: 0.60

Maps to the paper's primary finding: performance degrades with context utilization.

```
u = current_used_tokens / context_window_size   (utilization ratio, 0 to 1+)
CPS = max(0, 1 - u^β)
```

Default `β = 1.5`, configurable via `mi_curve_beta`. This creates the super-linear initial drop observed in the paper:

| Utilization (u) | CPS (β=1.5) | CPS (β=1.0, linear) | CPS (β=2.0, quadratic) |
|---|---|---|---|
| 0.0 | 1.00 | 1.00 | 1.00 |
| 0.2 | 0.91 | 0.80 | 0.96 |
| 0.4 | 0.75 | 0.60 | 0.84 |
| 0.6 | 0.54 | 0.40 | 0.64 |
| 0.8 | 0.28 | 0.20 | 0.36 |
| 1.0 | 0.00 | 0.00 | 0.00 |

**Rationale:** β=1.5 reproduces the paper's observation that performance is good early, degrades significantly past ~50%, and becomes severely impaired above ~80%.
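As a sanity check, the β=1.5 column of the table can be reproduced in a few lines (a sketch; the function name follows Step 1's proposed `calculate_context_pressure`):

```python
def calculate_context_pressure(utilization: float, beta: float = 1.5) -> float:
    """CPS = max(0, 1 - u^beta); 1.0 for an empty context, 0.0 at full."""
    return max(0.0, 1.0 - utilization ** beta)

# Reproduces the beta=1.5 column of the table above
for u in (0.0, 0.2, 0.4, 0.6, 0.8, 1.0):
    print(f"u={u:.1f} -> CPS={calculate_context_pressure(u):.2f}")
```

Because `u` can exceed 1.0 (auto-compact headroom), the `max(0, ...)` clamp keeps CPS from going negative.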

### 2. Efficiency Score (ES) — weight: 0.25

Proxies context utilization quality (analogous to MRCR — is the model effectively re-using prior context?).

```
total_context = entry.current_used_tokens   # = current_input + cache_creation + cache_read

if total_context == 0:
    ES = 1.0   # No context yet
else:
    cache_hit_ratio = cache_read / total_context
    ES = 0.3 + 0.7 × cache_hit_ratio   # [0.3, 1.0]
```

**Note:** `total_context` is the same as `StateEntry.current_used_tokens` (state.py:132). Reuse the existing property in the package; compute inline in standalone scripts.

- Floor of 0.3 prevents penalizing early-session entries (no cache available yet)
- Full cache-read → ES=1.0

**Rationale:** High cache-read ratio indicates the model is successfully re-using previously cached context rather than re-processing, suggesting better context utilization.
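In runnable form, taking raw counts instead of a `StateEntry` (a sketch for illustration):

```python
def calculate_efficiency(cache_read: int, total_context: int) -> float:
    """ES in [0.3, 1.0]; the 0.3 floor keeps cache-less early entries from being punished."""
    if total_context == 0:
        return 1.0  # no context yet: optimistic default
    return 0.3 + 0.7 * (cache_read / total_context)
```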

### 3. Productivity Score (PS) — weight: 0.15

Proxies output quality (analogous to Latent List/IDK — is the model producing meaningful, actionable output?).

**Deltas are computed as consecutive entry differences** (not cumulative totals):
```
delta_lines_added = current.lines_added - previous.lines_added
delta_lines_removed = current.lines_removed - previous.lines_removed
delta_output_tokens = current.total_output_tokens - previous.total_output_tokens
```

```
if no previous entry OR delta_output_tokens <= 0:
    PS = 0.5   # Neutral
else:
    delta_lines = delta_lines_added + delta_lines_removed
    ratio = delta_lines / delta_output_tokens
    normalized = min(1.0, ratio / target)   # target: 0.2 (hardcoded)
    PS = 0.2 + 0.8 × normalized   # [0.2, 1.0]
```

- Target: 0.2 lines/token (1 line per 5 output tokens) = perfect score, **hardcoded** (not configurable)
- Floor of 0.2 for explanation-heavy sessions (still valid work)
- Lowest weight (0.15) because it's the noisiest proxy

**Rationale:** When the model produces concrete code changes relative to token expenditure, it is likely giving focused, relevant answers. When it produces many tokens with no code changes, it may be struggling.
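With the deltas precomputed, the branch above reduces to (a sketch; the plan's `calculate_productivity` takes `StateEntry` objects rather than raw deltas):

```python
def calculate_productivity(delta_lines: int, delta_output_tokens: int,
                           target: float = 0.2) -> float:
    """PS in [0.2, 1.0]; 0.5 (neutral) when there is no usable output delta."""
    if delta_output_tokens <= 0:
        return 0.5  # no previous entry, or no new output tokens
    normalized = min(1.0, (delta_lines / delta_output_tokens) / target)
    return 0.2 + 0.8 * normalized
```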

### Color Thresholds (hardcoded)

| MI Range | Color | Label | Interpretation |
|---|---|---|---|
| > 0.65 | Green | High Intelligence | Model is operating well |
| 0.35–0.65 | Yellow | Degraded | Context pressure affecting quality |
| < 0.35 | Red | Critical | Severely degraded, consider a new session |
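The table maps to a color name like this (a sketch; per the ranges above, values of exactly 0.65 or 0.35 fall in the yellow band):

```python
MI_GREEN_THRESHOLD = 0.65
MI_YELLOW_THRESHOLD = 0.35

def get_mi_color(mi: float) -> str:
    """Return 'green', 'yellow', or 'red' per the hardcoded thresholds."""
    if mi > MI_GREEN_THRESHOLD:
        return "green"
    if mi >= MI_YELLOW_THRESHOLD:
        return "yellow"
    return "red"
```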

### Example Calculations

**Fresh session** (u=0.1, 60% cache, 150 lines/1000 tokens):
- CPS = 1 - 0.1^1.5 = 0.968
- ES = 0.3 + 0.7 × 0.6 = 0.72
- PS = 0.2 + 0.8 × min(1, 0.15/0.2) = 0.80
- **MI = 0.60×0.968 + 0.25×0.72 + 0.15×0.80 = 0.581 + 0.180 + 0.120 = 0.88**

**Mid-session** (u=0.5, 40% cache, 100 lines/1000 tokens):
- CPS = 1 - 0.5^1.5 = 0.646
- ES = 0.3 + 0.7 × 0.4 = 0.58
- PS = 0.2 + 0.8 × min(1, 0.1/0.2) = 0.60
- **MI = 0.60×0.646 + 0.25×0.58 + 0.15×0.60 = 0.388 + 0.145 + 0.090 = 0.62**

**Late session** (u=0.85, 20% cache, 50 lines/1000 tokens):
- CPS = 1 - 0.85^1.5 = 0.217
- ES = 0.3 + 0.7 × 0.2 = 0.44
- PS = 0.2 + 0.8 × min(1, 0.05/0.2) = 0.40
- **MI = 0.60×0.217 + 0.25×0.44 + 0.15×0.40 = 0.130 + 0.110 + 0.060 = 0.30**
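The three worked examples can be verified end to end (a sketch; `lines_per_token` stands in for the PS delta ratio, and the helper name is illustrative):

```python
def mi_from_ratios(u: float, cache_ratio: float, lines_per_token: float,
                   beta: float = 1.5) -> float:
    """Composite MI from the three sub-scores, weights 0.60/0.25/0.15."""
    cps = max(0.0, 1.0 - u ** beta)
    es = 0.3 + 0.7 * cache_ratio
    ps = 0.2 + 0.8 * min(1.0, lines_per_token / 0.2)
    return 0.60 * cps + 0.25 * es + 0.15 * ps

print(f"{mi_from_ratios(0.10, 0.6, 0.15):.2f}")  # fresh session -> 0.88
print(f"{mi_from_ratios(0.50, 0.4, 0.10):.2f}")  # mid-session   -> 0.62
print(f"{mi_from_ratios(0.85, 0.2, 0.05):.2f}")  # late session  -> 0.30
```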

## Implementation Steps

### Step 1: Create `src/claude_statusline/graphs/intelligence.py`

Core MI computation module:
- `IntelligenceConfig` dataclass — beta only (weights, thresholds, productivity_target are hardcoded constants)
- `IntelligenceScore` dataclass — cps, es, ps, mi, utilization (all floats)
- `calculate_context_pressure(utilization, beta) → float`
- `calculate_efficiency(entry: StateEntry) → float`
- `calculate_productivity(current: StateEntry, previous: StateEntry | None) → float` — uses **consecutive entry diffs** for delta_lines and delta_output_tokens
- `calculate_intelligence(current: StateEntry, previous: StateEntry | None, context_window_size: int, beta=1.5) → IntelligenceScore`
  - **Guard clause:** if `context_window_size == 0`, return `IntelligenceScore(cps=1.0, es=1.0, ps=0.5, mi=1.0, utilization=0.0)`
- `get_mi_color(mi) → str` — returns "green"/"yellow"/"red" using hardcoded thresholds (0.65/0.35)
- `format_mi_score(mi) → str` — returns "0.82"

Constants (hardcoded, not configurable):
```python
MI_WEIGHT_CPS = 0.60
MI_WEIGHT_ES = 0.25
MI_WEIGHT_PS = 0.15
MI_GREEN_THRESHOLD = 0.65
MI_YELLOW_THRESHOLD = 0.35
MI_PRODUCTIVITY_TARGET = 0.2
```

### Step 2: Create `tests/python/test_intelligence.py`

Unit tests for all functions:
- **CPS**: empty/full/half context, custom beta, clamping at 0
- **CPS guard**: `context_window_size == 0` returns MI=1.0 with defaults
- **ES**: no tokens, all cache read, no cache, mixed cache
- **PS**: no previous entry, no output, high/zero/moderate productivity, capping
- **PS deltas**: verify consecutive entry diff computation (not cumulative)
- **Composite**: optimal/worst conditions, weight validation, bounds check
- **Color**: green/yellow/red thresholds, boundary values
- **Format**: two decimals, zero, one, rounding
- **Statusline integration**: `show_mi=true` + `show_delta=false` produces MI output without delta display

### Step 2b: Create `tests/fixtures/mi_test_vectors.json`

Shared test vectors for cross-implementation parity:
```json
[
  {
    "description": "Fresh session",
    "input": { "current_used": 20000, "context_window": 200000, "cache_read": 12000, "current_input": 5000, "cache_creation": 3000, "prev_lines_added": 0, "prev_lines_removed": 0, "cur_lines_added": 150, "cur_lines_removed": 10, "prev_output": 0, "cur_output": 1000, "beta": 1.5 },
    "expected": { "cps": 0.968, "es": 0.72, "ps": 0.84, "mi": 0.887 }
  }
]
```

5-6 vectors covering: fresh session, mid-session, late session, no previous entry, context_window=0, no cache. Both Python and Node.js test suites read this file and assert results within ±0.01 tolerance.
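The parity assertion both suites would make can be illustrated against the "Fresh session" vector (a sketch that inlines the vector and the formula instead of reading the fixture or importing the package):

```python
import math

vector = {
    "input": {"current_used": 20000, "context_window": 200000, "cache_read": 12000,
              "prev_lines_added": 0, "prev_lines_removed": 0,
              "cur_lines_added": 150, "cur_lines_removed": 10,
              "prev_output": 0, "cur_output": 1000, "beta": 1.5},
    "expected": {"cps": 0.968, "es": 0.72, "ps": 0.84, "mi": 0.887},
}

i, expected = vector["input"], vector["expected"]
u = i["current_used"] / i["context_window"]
cps = max(0.0, 1.0 - u ** i["beta"])
es = 0.3 + 0.7 * i["cache_read"] / i["current_used"]
# Consecutive entry diffs, per the PS definition
delta_lines = (i["cur_lines_added"] - i["prev_lines_added"]
               + i["cur_lines_removed"] - i["prev_lines_removed"])
delta_output = i["cur_output"] - i["prev_output"]
ps = 0.2 + 0.8 * min(1.0, (delta_lines / delta_output) / 0.2)
mi = 0.60 * cps + 0.25 * es + 0.15 * ps

for name, got in {"cps": cps, "es": es, "ps": ps, "mi": mi}.items():
    assert math.isclose(got, expected[name], abs_tol=0.01), name
```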

### Step 3: Modify `src/claude_statusline/core/config.py`

Add to `Config` dataclass (only 2 new fields):
- `show_mi: bool = True`
- `mi_curve_beta: float = 1.5`

All other MI parameters (weights, thresholds, productivity_target) are **hardcoded constants** in `intelligence.py`, not configurable. This minimizes cross-implementation sync burden (2 config branches vs 8).

Add parsing in `_read_config()`:
- `show_mi`: boolean, same pattern as existing keys (`value_lower != "false"`)
- `mi_curve_beta`: float, parsed with a small `try/except` — `self.mi_curve_beta = float(raw_value)`, silently keeping the default on `ValueError`

Add MI section to `_create_default()` config template:
```ini
# Model Intelligence (MI) score display
show_mi=true

# MI degradation curve shape (higher = steeper initial drop)
# mi_curve_beta=1.5
```

Update `to_dict()` with both new fields.

### Step 4: Modify `src/claude_statusline/cli/statusline.py`

**Key change:** Decouple state file reads from `show_delta`. Currently, the previous entry is only read inside `if config.show_delta:`. With MI, the previous entry must be read whenever `show_mi` OR `show_delta` is enabled.

Restructure the state file logic:
```python
# Read previous entry if needed for delta OR MI
if config.show_delta or config.show_mi:
    state_file = StateFile(session_id)
    prev_entry = state_file.read_last_entry()
    # ... build current entry, append if changed ...

if config.show_delta:
    # ... existing delta_info logic ...

if config.show_mi:
    from claude_statusline.graphs.intelligence import calculate_intelligence, get_mi_color, format_mi_score
    mi_score = calculate_intelligence(entry, prev_entry, total_size, config.mi_curve_beta)
    mi_color_name = get_mi_color(mi_score.mi)
    mi_color = getattr(colors, mi_color_name)
    mi_info = f" {mi_color}MI:{format_mi_score(mi_score.mi)}{colors.reset}"
```

Add `mi_info` to the output parts list (between `delta_info` and `ac_info`). Color-code using `get_mi_color()`.

### Step 5: Modify `src/claude_statusline/graphs/renderer.py`

**Two changes:**

**5a.** Add an optional `label_fn: Callable[[int], str] | None = None` parameter to `render_timeseries()`. When provided, use it instead of `format_tokens()` for Y-axis labels. This allows the MI graph to display `0.62` instead of `620` when data is scaled ×1000. Default behavior (existing graphs) is unchanged.

**5b.** In `render_summary()` (~line 318, after the zone indicator), add MI score display:
```
Model Intelligence: 0.62 (Context pressure is degrading answer quality)
  CPS: 0.54  ES: 0.72  PS: 0.60
```
Accepts an optional `IntelligenceScore` parameter.

### Step 6: Modify `src/claude_statusline/cli/context_stats.py`

- Add `"mi"` to `--type` choices
- In `render_once()`: compute MI scores for each entry pair and render as a timeseries graph (scale ×1000 for the integer renderer)
- Use `label_fn=lambda v: f"{v/1000:.2f}"` when calling `render_timeseries()` for the MI graph
- Include MI in the `"all"` graph type
- Pass MI score to `render_summary()`
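The ×1000 scaling from Step 6 and the `label_fn` from Step 5a round-trip like this (a sketch; the real label call happens inside `render_timeseries()`):

```python
mi_scores = [0.88, 0.62, 0.30]                    # floats in [0, 1]
scaled = [round(mi * 1000) for mi in mi_scores]   # what the integer renderer sees
label_fn = lambda v: f"{v / 1000:.2f}"            # turns 620 back into "0.62"

print(scaled)                         # [880, 620, 300]
print([label_fn(v) for v in scaled])  # ['0.88', '0.62', '0.30']
```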

### Step 7: Modify `scripts/statusline.py` (standalone)

Add a single self-contained `compute_mi()` function (not 4 separate functions) that takes raw values and returns `(mi, cps, es, ps)`. This reduces the sync surface from 4 functions to 1.

```python
def compute_mi(used_tokens, context_window_size, cache_read, total_context,
               delta_lines, delta_output, beta=1.5):
    """Compute Model Intelligence score. Returns (mi, cps, es, ps)."""
    ...
```

Add `show_mi` and `mi_curve_beta` config parsing (2 keys, matching the package). Add `MI:X.XX` to output. Decouple state file read from `show_delta` (same restructuring as Step 4).
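One possible body for that function, assembling the guard clause and the sub-scores from Formula Design (a sketch, not the shipped implementation):

```python
def compute_mi(used_tokens, context_window_size, cache_read, total_context,
               delta_lines, delta_output, beta=1.5):
    """Compute Model Intelligence score. Returns (mi, cps, es, ps)."""
    if context_window_size == 0:           # guard: old 2-field or malformed entries
        return (1.0, 1.0, 1.0, 0.5)        # MI=1.0 with default sub-scores
    u = used_tokens / context_window_size
    cps = max(0.0, 1.0 - u ** beta)
    es = 1.0 if total_context == 0 else 0.3 + 0.7 * cache_read / total_context
    if delta_output <= 0:                  # no previous entry or no new output
        ps = 0.5
    else:
        ps = 0.2 + 0.8 * min(1.0, (delta_lines / delta_output) / 0.2)
    mi = 0.60 * cps + 0.25 * es + 0.15 * ps
    return (mi, cps, es, ps)
```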

### Step 8: Modify `scripts/statusline.js` (standalone Node.js)

Port the MI formula as a single `computeMI()` function (same signature as Python). Add `show_mi` and `mi_curve_beta` config parsing. Add `MI:X.XX` to output. Decouple state file read from `showDelta`.

```javascript
function computeMI(usedTokens, contextWindowSize, cacheRead, totalContext,
                   deltaLines, deltaOutput, beta = 1.5) {
  // Returns { mi, cps, es, ps }
}
```

### Step 9: Add Node.js tests and shared test vectors

- Create `tests/fixtures/mi_test_vectors.json` with 5-6 test vectors
- Port core MI formula tests to `tests/node/intelligence.test.js`, reading from the shared vectors
- Update `tests/python/test_intelligence.py` to also read from the shared vectors
- Ensures cross-implementation parity within ±0.01 tolerance

## Critical Files

| File | Action | Description |
|---|---|---|
| `src/claude_statusline/graphs/intelligence.py` | **Create** | Core MI computation module (hardcoded constants + configurable beta) |
| `tests/python/test_intelligence.py` | **Create** | Unit tests for MI module (incl. guard clause, integration tests) |
| `tests/fixtures/mi_test_vectors.json` | **Create** | Shared test vectors for cross-implementation parity |
| `src/claude_statusline/core/config.py` | **Modify** | Add `show_mi` (bool) and `mi_curve_beta` (float) only |
| `src/claude_statusline/cli/statusline.py` | **Modify** | Add MI score; decouple state read from `show_delta` |
| `src/claude_statusline/cli/context_stats.py` | **Modify** | Add `--type mi` graph option |
| `src/claude_statusline/graphs/renderer.py` | **Modify** | Add `label_fn` param to `render_timeseries()`; add MI to summary |
| `scripts/statusline.py` | **Modify** | Single `compute_mi()` function; decouple state read |
| `scripts/statusline.js` | **Modify** | Single `computeMI()` function; decouple state read |
| `tests/node/intelligence.test.js` | **Create** | Node.js MI tests using shared vectors |

## Existing Utilities to Reuse

- `StateEntry.current_used_tokens` property (`core/state.py:132`) — already computes `current_input_tokens + cache_creation + cache_read` (used for both CPS utilization and ES total_context)
- `ColorManager` (`core/colors.py`) — for MI color coding
- `fit_to_width()` (`formatters/layout.py`) — for statusline width management
- `Config.load()` pattern (`core/config.py`) — extended with 2 new fields
- `GraphRenderer.render_timeseries()` (`graphs/renderer.py`) — for the MI graph (extended with the `label_fn` parameter)

**Not used**: `format_tokens()` — MI uses its own `f"{mi:.2f}"` format. `calculate_deltas()` — MI computes its own consecutive entry diffs internally.

## Configuration Options

Only 2 config keys are exposed (weights, thresholds, and the productivity target are hardcoded constants to minimize cross-implementation sync burden):

```ini
# Model Intelligence (MI) score display
# Shows a heuristic quality score based on context utilization
show_mi=true

# MI degradation curve shape (higher = steeper initial drop)
# Based on Michelangelo paper's observed performance degradation
# mi_curve_beta=1.5
```

Hardcoded constants (in `intelligence.py`, `compute_mi()`, and `computeMI()`):
- Weights: CPS=0.60, ES=0.25, PS=0.15
- Thresholds: green > 0.65, yellow 0.35–0.65, red < 0.35
- Productivity target: 0.2 lines/token

## Display Integration

### Statusline output

```text
[Opus] myproject | main [3] | 75k free (37.5%) [+2,500] MI:0.62 [AC:45k] abc123
```

### Context Stats CLI summary

```text
Session Summary
────────────────────────────────────────────────
Context Remaining: 75,000/200,000 (37%)
>>> DUMB ZONE <<< (You are in the dumb zone - Dex Horthy says so)
Model Intelligence: 0.62 (Context pressure is degrading answer quality)
  CPS: 0.54  ES: 0.72  PS: 0.60
```

### Context Stats MI graph (`--type mi`)

ASCII timeseries graph of the MI score over time, showing the degradation trajectory.

## Verification

1. **Unit tests**: `pytest tests/python/test_intelligence.py -v`
2. **All Python tests**: `source venv/bin/activate && pytest tests/python/ -v`
3. **Node.js tests**: `npm test` (includes `intelligence.test.js` with shared vectors)
4. **Cross-implementation parity**: Both Python and Node.js test suites read `tests/fixtures/mi_test_vectors.json` and assert results within ±0.01
5. **Manual statusline test**: Pipe JSON to the statusline and verify `MI:X.XX` appears
6. **Manual statusline test (decoupled)**: Set `show_mi=true` + `show_delta=false` and verify MI appears without delta
7. **Manual context-stats test**: `context-stats --type mi --no-watch` — verify the MI graph renders with decimal Y-axis labels

## Known Limitations

1. **Productivity Score is noisy for non-coding sessions** — Research/planning sessions have low PS even with high-quality answers. Mitigation: PS has the lowest weight (0.15) and a floor of 0.2.

2. **Cache hit ratio reflects API behavior, not reasoning quality** — Cache management is infrastructure, not model intelligence. Mitigation: ES weight is moderate (0.25) and presented as a heuristic proxy.

3. **Degradation curve is not calibrated to specific models** — The paper found different curves per model family. Mitigation: `mi_curve_beta` is configurable.

4. **Integer graph renderer** — MI scores (floats in [0,1]) are scaled to [0, 1000] for the integer-based renderer, with Y-axis labels formatted as decimals via `label_fn`.

5. **MI adds file I/O when show_delta=false** — When `show_mi=true` and `show_delta=false`, the statusline reads the previous entry for the PS calculation. Users who need minimal I/O can set `show_mi=false`.

## Review Decisions Log

Decisions made during engineering review (2026-03-14):

| # | Decision | Resolution |
|---|---|---|
| 1 | PS delta definition | Consecutive entry diffs (not cumulative totals) |
| 2 | CPS division by zero | Guard clause: return MI=1.0 when context_window_size=0 |
| 3 | MI vs show_delta coupling | Decoupled — MI reads prev entry independently |
| 4 | Config surface area | Only `show_mi` + `mi_curve_beta` (hardcode the rest) |
| 5 | MI formula DRY | Single `compute_mi()` / `computeMI()` in standalone scripts |
| 6 | Module location | Keep in `graphs/` next to `statistics.py` |
| 7 | Float config parsing | Inline try/except for `mi_curve_beta` |
| 8 | MI graph Y-axis | Add `label_fn` parameter to `render_timeseries()` |
| 9 | Cross-impl parity | Shared `tests/fixtures/mi_test_vectors.json` |

## TODOs (deferred)

- **Shared test vectors**: Create `tests/fixtures/mi_test_vectors.json` with 5-6 vectors for cross-implementation parity (agreed during review and folded into Steps 2b and 9, so built during implementation rather than deferred)
- **MI trend indicators**: Show `MI:0.62↓` or `MI:0.82↑` in the statusline based on comparison with the previous MI score (deferred — adds cross-impl sync burden)
- **Per-model beta calibration**: Map known model IDs to empirically tuned beta values (deferred — requires empirical data we don't have yet)
@@ -79,7 +79,7 @@ Session Summary
  Output Tokens: 43,429
  Session Duration: 2h 29m
 
- Powered by cc-context-stats v1.7.0 - https://github.com/luongnv89/cc-context-stats
+ Powered by cc-context-stats v1.8.0 - https://github.com/luongnv89/cc-context-stats
 
  ```
 
  ## Features
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "cc-context-stats",
-   "version": "1.7.0",
+   "version": "1.8.0",
    "description": "Monitor your Claude Code session context in real-time - track token usage and never run out of context",
    "main": "scripts/statusline.js",
    "scripts": {
package/pyproject.toml CHANGED
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 
  [project]
  name = "cc-context-stats"
- version = "1.7.0"
+ version = "1.8.0"
  description = "Monitor your Claude Code session context in real-time - track token usage and never run out of context"
  readme = "README.md"
  license = { text = "MIT" }