@ictechgy/context-guard 0.4.3 → 0.4.4

package/CHANGELOG.md CHANGED
@@ -2,6 +2,12 @@

  All notable changes for the ContextGuard plugin are documented here.

+ ## [0.4.4] - 2026-06-08
+
+ - Added top-level `cache_layout_advice` to transcript audit JSON and feasibility output so cache-prefix instability can be prioritized without mixing advice into evidence-only diagnostics.
+ - Documented the `cache_layout_advice` consumer contract and conservative cause boundaries for volatile-prefix findings.
+ - Refined cache-prefix recommendation wording after quad-review so advice does not overclaim cache reads or session-splitting evidence.
+
  ## [0.4.3] - 2026-06-08

  - Fixed the Homebrew formula template so packaged helper paths are handled as Pathname objects during install.
package/README.ko.md CHANGED
@@ -99,7 +99,7 @@ brief mode asks the coding agent to cut filler, but

  - full-file reads versus symbol or line-range reads
  - raw logs versus digest output or locally archived digest records
- - transcript usage hotspots reported by `context-guard-audit` and `cache_friendliness` prompt-layout signals
+ - transcript usage hotspots reported by `context-guard-audit`, `cache_friendliness` prompt-layout signals, and `cache_layout_advice` experiment priorities
  - statusline `cache` / `reuse` values: these are observed transcript and provider-cache signals, not savings produced directly by ContextGuard.
  - Use `context-guard cost preflight` to see the estimated cost of an Anthropic request JSON, then after the call use `context-guard cost observe` to cross-check the provider usage fields (`cache_creation_input_tokens`, `cache_read_input_tokens`).
  - results from comparing matched pairs of successful baseline/variant runs with `context-guard-bench`
@@ -300,7 +300,7 @@ If you need a semantic summary instead of head/tail logs, `--digest markdown` or
  ./plugins/context-guard/bin/context-guard-audit ~/.claude/projects --top 20 --recommend
  ```

- By default the audit command skips oversized transcript files and JSONL records (`--max-file-bytes`, `--max-line-bytes`) and reports the skipped counts alongside. This is a safeguard so that a corrupt trace cannot monopolize memory or hide scan gaps. JSON output also includes `cache_friendliness` and [`cache_diagnostics`](docs/cache-diagnostics-schema.md). These are heuristic prompt-layout/cache-read diagnostics built from bounded usage fields, timestamped cache telemetry records, and redacted segment hashes, and they can indicate whether frequently changing content sits near the start of the prompt, whether stable-prefix candidates exist, and what the cache-miss hypotheses and TTL/headroom evidence gaps are. They do not print raw prompt text, do not prove provider cache hits, and may be `missing`, `partial`, `hypothesis`, or `unavailable` when the transcript schema does not expose enough evidence.
+ By default the audit command skips oversized transcript files and JSONL records (`--max-file-bytes`, `--max-line-bytes`) and reports the skipped counts alongside. This is a safeguard so that a corrupt trace cannot monopolize memory or hide scan gaps. JSON output also includes `cache_friendliness` and [`cache_diagnostics`](docs/cache-diagnostics-schema.md). These are heuristic prompt-layout/cache-read diagnostics built from bounded usage fields, timestamped cache telemetry records, and redacted segment hashes. The sibling `cache_layout_advice` turns the signals into ranked **checks/experiments** such as session splitting and prefix stabilization, while keeping observed issues separate from hypothesized or corroborated causes. They do not print raw prompt text, do not prove provider cache hits, and may be `missing`, `partial`, `hypothesis`, or `unavailable` when the transcript schema does not expose enough evidence.

  ### Check context and cache status in the statusline

package/README.md CHANGED
@@ -99,7 +99,7 @@ When you need a savings claim, measure it on your own tasks:

  - full-file reads versus symbol or line-range reads
  - raw logs versus digest output or artifact receipts
- - transcript hotspots reported by `context-guard-audit`, including `cache_friendliness` prompt-layout signals
+ - transcript hotspots reported by `context-guard-audit`, including `cache_friendliness` prompt-layout signals and `cache_layout_advice` experiment priorities
  - statusline `cache` / `reuse` as observed transcript/provider-cache signals, not savings caused by ContextGuard
  - `context-guard cost preflight` estimates for Anthropic request JSON, followed by `context-guard cost observe` using provider usage fields (`cache_creation_input_tokens`, `cache_read_input_tokens`) after the call
  - matched successful baseline/variant runs from `context-guard-bench`
@@ -339,7 +339,7 @@ JSON
  ./plugins/context-guard/bin/context-guard-audit ~/.claude/projects --top 20 --recommend
  ```

- The audit command skips oversized transcript files and JSONL records by default (`--max-file-bytes`, `--max-line-bytes`) and reports skipped counts, so a corrupt trace cannot dominate memory or hide scan gaps. JSON output also includes `cache_friendliness` and [`cache_diagnostics`](docs/cache-diagnostics-schema.md): heuristic prompt-layout/cache-read diagnostics built from bounded usage fields, timestamped cache telemetry records, and redacted segment hashes. They can flag likely volatile content near the prompt prefix, stable-prefix candidates, cache-miss hypotheses, and TTL/headroom evidence gaps, but they do not print raw prompt text, do not prove provider cache hits, and may be `missing`, `partial`, `hypothesis`, or `unavailable` when transcript schemas do not expose enough evidence.
+ The audit command skips oversized transcript files and JSONL records by default (`--max-file-bytes`, `--max-line-bytes`) and reports skipped counts, so a corrupt trace cannot dominate memory or hide scan gaps. JSON output also includes `cache_friendliness` and [`cache_diagnostics`](docs/cache-diagnostics-schema.md): heuristic prompt-layout/cache-read diagnostics built from bounded usage fields, timestamped cache telemetry records, and redacted segment hashes. The sibling `cache_layout_advice` field turns those signals into ranked **checks/experiments** such as splitting long sessions or stabilizing early prompt prefixes, while keeping observed issues separate from hypothesized or corroborated causes. These fields can flag likely volatile content near the prompt prefix, stable-prefix candidates, cache-miss hypotheses, and TTL/headroom evidence gaps, but they do not print raw prompt text, do not prove provider cache hits, and may be `missing`, `partial`, `hypothesis`, or `unavailable` when transcript schemas do not expose enough evidence.

  ### Watch context and cache health in the statusline

@@ -65,7 +65,7 @@ python3 context-guard-kit/sanitize_output.py -- git diff
  `../research/experimental-token-reduction-radar.md` is a gate that documents optional future experiments such as learned compression, multimodal crop/OCR/visual-token pruning, and self-hosted KV/latent inference optimization. `../docs/experimental-benchmark-fixtures.md` has fixture-only task/variant starter examples. The radar and fixtures are not currently shipped runtime helpers and do not guarantee hosted API token/cost savings. Hosted API token/cost savings claims are allowed only with provider-measured matched-task evidence. The radar's later-roadmap gate keeps neural/semantic compression, trust-tiered injection-aware compression, context-diff compaction, and local proxy constraints experimental/non-shipped until a separate future PR passes the gate.

  The default output of `claude_transcript_cost_audit.py --recommend` anonymizes transcript paths as `basename#hash` and commands as `command#hash` so it is safe to share. Add `--show-paths` or `--show-commands` only when you really need the local raw identifiers.
- To guard against oversized or corrupt transcripts, the per-file `--max-file-bytes` and per-JSONL-record `--max-line-bytes` limits are also applied by default, and skipped items are shown as skip counts and warnings. `cache_friendliness` in the JSON summary/feasibility output is a heuristic that compares stable-prefix and volatile prefix/tail signals using bounded, sanitized segment hashes. It does not print raw prompt text; interpret provider cache token fields as separate diagnostic telemetry, not as evidence of token savings produced by ContextGuard.
+ To guard against oversized or corrupt transcripts, the per-file `--max-file-bytes` and per-JSONL-record `--max-line-bytes` limits are also applied by default, and skipped items are shown as skip counts and warnings. `cache_friendliness` in the JSON summary/feasibility output is a heuristic that compares stable-prefix and volatile prefix/tail signals using bounded, sanitized segment hashes. `cache_layout_advice` links those signals to ranked checks/experiments such as splitting long sessions, stabilizing the prefix, and running diet checks, while keeping observed issues separate from hypothesized or corroborated causes. It does not print raw prompt text; interpret provider cache token fields as separate diagnostic telemetry, not as evidence of token savings produced by ContextGuard.

  `context_guard_diet.py scan` is a read-only scanner that always reads locally only. The default output anonymizes the project root and reports mainly relative paths. `--top` applies to both the report's context-like file list and its context-exclusion recommendation list. Use `--show-paths` only for local/private debugging.

@@ -49,6 +49,7 @@ TIMESTAMP_KEYS = ("timestamp", "created_at", "createdAt", "time", "ts")
  FEASIBILITY_SCHEMA_VERSION = "contextguard.metric-feasibility.v1.2"
  FEASIBILITY_PRODUCER = "context-guard-audit"
  CACHE_DIAGNOSTICS_SCHEMA_VERSION = "contextguard.cache-diagnostics.v1"
+ CACHE_LAYOUT_ADVICE_SCHEMA_VERSION = "contextguard.cache-layout-advice.v1"
  MAX_ERROR_EXAMPLES = 20
  JSON_PARSE_RECURSION_LIMIT = 10_000
  READ_CHUNK_BYTES = 64 * 1024
@@ -184,6 +185,7 @@ class UsageSummary:
      prompt_cache_audit: PromptCacheAudit = field(default_factory=PromptCacheAudit)
      cache_friendliness_cache: dict[str, Any] | None = field(default=None, init=False, repr=False)
      cache_diagnostics_cache: dict[str, Any] | None = field(default=None, init=False, repr=False)
+     cache_layout_advice_cache: dict[str, Any] | None = field(default=None, init=False, repr=False)

      @property
      def total_tokens(self) -> int:
@@ -1398,6 +1400,222 @@ def cache_diagnostics_for_summary(summary: UsageSummary) -> dict[str, Any]:
      return build_cache_diagnostics(summary)


+ def _dominant_transcript(summary: UsageSummary) -> dict[str, Any] | None:
+     if summary.total_tokens <= 0 or not summary.by_file:
+         return None
+     _label, tokens = summary.by_file.most_common(1)[0]
+     share = tokens / summary.total_tokens if summary.total_tokens else 0.0
+     return {
+         "tokens": tokens,
+         "share": round(share, 4),
+         "dominates": share >= 0.20 and tokens >= 1_000,
+     }
+
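As a rough illustration of the dominance check in `_dominant_transcript`, the sketch below applies the same two thresholds from the diff (a share of at least 0.20 and at least 1,000 tokens) to a plain `collections.Counter`. The `dominant_share` helper and the sample transcript labels are hypothetical stand-ins; the real function reads `UsageSummary.by_file`, which is not reproduced here.

```python
from __future__ import annotations

from collections import Counter


def dominant_share(by_file: Counter, total_tokens: int) -> dict | None:
    """Mirror the share/threshold arithmetic with plain inputs (hypothetical helper)."""
    if total_tokens <= 0 or not by_file:
        return None
    _label, tokens = by_file.most_common(1)[0]
    share = tokens / total_tokens
    return {
        "tokens": tokens,
        "share": round(share, 4),
        # Both thresholds must hold: a large share of a tiny total does not dominate.
        "dominates": share >= 0.20 and tokens >= 1_000,
    }


# Hypothetical per-transcript token counts, labelled in the anonymized basename#hash style.
sample = Counter({"a.jsonl#1f3c": 3_000, "b.jsonl#9ab2": 7_000})
result = dominant_share(sample, total_tokens=10_000)
small = dominant_share(Counter({"c.jsonl#77aa": 500}), total_tokens=600)
```

In this sketch `result["dominates"]` is true because the top transcript holds 70% of 10,000 tokens, while `small` fails the absolute 1,000-token floor even though its share is high.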
1415
+ def _first_dynamic_breaker(cache_diagnostics: dict[str, Any]) -> dict[str, Any] | None:
1416
+ breakers = cache_diagnostics.get("dynamic_prefix_breakers") or []
1417
+ if not breakers:
1418
+ return None
1419
+ first = breakers[0]
1420
+ return first if isinstance(first, dict) else None
1421
+
1422
+
1423
+ def build_cache_layout_advice(summary: UsageSummary) -> dict[str, Any]:
1424
+ if summary.cache_layout_advice_cache is not None:
1425
+ return summary.cache_layout_advice_cache
1426
+
1427
+ cache_friendliness = cache_friendliness_for_summary(summary)
1428
+ cache_diagnostics = cache_diagnostics_for_summary(summary)
1429
+ signals = cache_friendliness.get("signals") if isinstance(cache_friendliness.get("signals"), dict) else {}
1430
+ dynamic_breaker = _first_dynamic_breaker(cache_diagnostics)
1431
+ dominant = _dominant_transcript(summary)
1432
+ cache_creation = summary.tokens.get("cache_creation", 0)
1433
+ cache_read = summary.tokens.get("cache_read", 0)
1434
+ cache_fields = cache_diagnostics.get("observations", {}).get("cache_fields", {}) if isinstance(cache_diagnostics.get("observations"), dict) else {}
1435
+ cache_status = cache_fields.get("status") if isinstance(cache_fields, dict) else None
1436
+ stable_prefix_share = signals.get("stable_prefix_share")
1437
+ volatile_prefix_share = signals.get("volatile_prefix_share")
1438
+ volatile_tail_share = signals.get("volatile_tail_share")
1439
+ max_prefix_position = dynamic_breaker.get("position") if dynamic_breaker else None
1440
+ max_prefix_position_volatile_share = dynamic_breaker.get("volatile_share") if dynamic_breaker else signals.get("max_prefix_position_volatile_share")
1441
+
1442
+ status = "missing"
1443
+ confidence = "unavailable"
1444
+ observed_issue = "unknown"
1445
+ priority = "P2"
1446
+ hypothesized_causes: list[dict[str, Any]] = []
1447
+ corroborated_causes: list[dict[str, Any]] = []
1448
+ next_checks: list[dict[str, Any]] = []
1449
+ recommended_experiments: list[dict[str, Any]] = []
1450
+
1451
+ has_cache_any = bool(
1452
+ summary.token_field_presence.get("cache_read", 0)
1453
+ or summary.token_field_presence.get("cache_creation", 0)
1454
+ )
1455
+ has_prompt_samples = bool(summary.prompt_cache_audit.samples)
1456
+ if has_cache_any or has_prompt_samples:
1457
+ status = "partial" if (
1458
+ not has_prompt_samples
1459
+ or cache_friendliness.get("status") == "partial"
1460
+ or cache_diagnostics.get("status") == "partial"
1461
+ or summary.skipped_files
1462
+ or summary.skipped_records
1463
+ or summary.parse_errors
1464
+ ) else "available"
1465
+ confidence = "partial" if status == "partial" else "hypothesis"
1466
+
1467
+ volatile_prefix_breaker = bool(
1468
+ dynamic_breaker
1469
+ and cache_creation > 0
1470
+ and (max_prefix_position in {0, 1} or (max_prefix_position_volatile_share or 0) >= PROMPT_PREFIX_VOLATILE_THRESHOLD)
1471
+ )
1472
+ long_session_dominates = bool(dominant and dominant.get("dominates"))
1473
+
1474
+ if volatile_prefix_breaker:
1475
+ observed_issue = "volatile_prefix_breaker"
1476
+ priority = "P0" if cache_creation >= 50_000 and max_prefix_position in {0, 1} else "P1"
1477
+ hypothesized_causes.append({
1478
+ "id": "prefix-position-churn",
1479
+ "confidence": confidence,
1480
+ "evidence": EVIDENCE_INFERRED,
1481
+ "reason": (
1482
+ "A highly volatile redacted prompt segment appears in the early prefix window; "
1483
+ "this identifies a layout issue, not a confirmed source."
1484
+ ),
1485
+ "next_check": "Check whether startup context, generated evidence, or tool/MCP catalog changes are moving before stable policy.",
1486
+ })
1487
+ if cache_diagnostics.get("stable_prefix_candidates"):
1488
+ hypothesized_causes.append({
1489
+ "id": "evidence-before-policy",
1490
+ "confidence": confidence,
1491
+ "evidence": EVIDENCE_INFERRED,
1492
+ "reason": (
1493
+ "Stable reusable segments appear elsewhere while the early prefix churns; "
1494
+ "check whether logs, diffs, timestamps, or file evidence precede stable instructions."
1495
+ ),
1496
+ "next_check": "Keep stable policy/instructions first and move generated run evidence later.",
1497
+ })
1498
+ next_checks.append({
1499
+ "id": "inspect-startup-context-size",
1500
+ "confidence": "hypothesis",
1501
+ "command_templates": [
1502
+ "context-guard-diet scan <repo>",
1503
+ "context-guard-diet structural-waste <repo>",
1504
+ ],
1505
+ "evidence_required_for_corroboration": (
1506
+ "Large or duplicate CLAUDE.md/AGENTS.md/GEMINI.md findings from diet output."
1507
+ ),
1508
+ })
1509
+ elif long_session_dominates:
1510
+ observed_issue = "long_session_accumulation"
1511
+ priority = "P1"
1512
+ elif cache_creation >= 10_000 and cache_read > 0 and summary.cache_amortization < 0.5:
1513
+ observed_issue = "low_cache_reuse"
1514
+ priority = "P1"
1515
+ elif cache_status == "missing" or not has_cache_any:
1516
+ observed_issue = "missing_cache_fields"
1517
+ priority = "P2"
1518
+
1519
+ if long_session_dominates:
1520
+ recommended_experiments.append({
1521
+ "id": "split-long-sessions",
1522
+ "order": len(recommended_experiments) + 1,
1523
+ "priority": "P1",
1524
+ "effort": "low",
1525
+ "action": "Use /clear between unrelated tasks and /compact focus on changed files, failing tests, and remaining TODO during long work.",
1526
+ "expected_signal": "Cache creation per comparable task decreases and one transcript no longer dominates observed tokens.",
1527
+ "verification": "Re-run context-guard-audit on a comparable window and compare cache_creation, cache_amortization, and top transcript share.",
1528
+ "evidence": dominant or {},
1529
+ })
1530
+ if volatile_prefix_breaker:
1531
+ recommended_experiments.append({
1532
+ "id": "stabilize-cache-prefix",
1533
+ "order": len(recommended_experiments) + 1,
1534
+ "priority": priority,
1535
+ "effort": "medium",
1536
+ "action": "Keep stable reusable instructions/policy before volatile logs, diffs, timestamps, and generated file evidence.",
1537
+ "expected_signal": "Stable prefix share rises and volatile prefix share falls on matched audit windows.",
1538
+ "verification": "Re-run context-guard-audit --json --recommend and compare cache_layout_advice plus cache_friendliness signals.",
1539
+ "evidence": {
1540
+ "dynamic_prefix_breaker_position": max_prefix_position,
1541
+ "dynamic_prefix_breaker_volatile_share": max_prefix_position_volatile_share,
1542
+ },
1543
+ })
1544
+ recommended_experiments.append({
1545
+ "id": "run-context-diet-checks",
1546
+ "order": len(recommended_experiments) + 1,
1547
+ "priority": "P1",
1548
+ "effort": "low",
1549
+ "action": "Run the generated diet command templates and treat any large/duplicate context-file findings as corroborating evidence before editing instructions.",
1550
+ "expected_signal": "Diet output identifies or rules out oversized/duplicated startup context as a contributor.",
1551
+ "verification": "Record diet JSON separately; do not convert prefix-position evidence alone into a confirmed startup-context cause.",
1552
+ "command_templates": [
1553
+ "context-guard-diet scan <repo> --json > diet.json",
1554
+ "context-guard-diet structural-waste <repo> --json > structural-waste.json",
1555
+ ],
1556
+ })
1557
+ if cache_creation >= 50_000 and summary.cache_amortization_defined and 1.0 <= summary.cache_amortization < 5.0:
1558
+ recommended_experiments.append({
1559
+ "id": "defer-longer-ttl-until-prefix-stable" if volatile_prefix_breaker else "evaluate-longer-ttl-after-stability-check",
1560
+ "order": len(recommended_experiments) + 1,
1561
+ "priority": "P2",
1562
+ "effort": "medium",
1563
+ "action": "Treat longer TTL as secondary; first corroborate stable prefix reuse and current provider TTL/pricing behavior.",
1564
+ "expected_signal": "TTL evaluation happens only after prefix volatility is reduced or ruled out.",
1565
+ "verification": "Use timestamped cache telemetry and provider-measured billing/cost evidence; historical token totals alone are insufficient.",
1566
+ })
1567
+ if not recommended_experiments and status == "partial":
1568
+ next_checks.append({
1569
+ "id": "rerun-narrower-audit",
1570
+ "confidence": "partial",
1571
+ "command_templates": ["context-guard-audit <transcript-or-project-dir> --json --recommend"],
1572
+ "evidence_required_for_corroboration": "Enough uncapped prompt/cache records to classify prefix layout.",
1573
+ })
1574
+ if not recommended_experiments and observed_issue == "missing_cache_fields":
1575
+ next_checks.append({
1576
+ "id": "collect-cache-telemetry",
1577
+ "confidence": "unavailable",
1578
+ "command_templates": ["context-guard-audit ~/.claude/projects --json --recommend"],
1579
+ "evidence_required_for_corroboration": "Transcript records with cache_read/cache_creation fields.",
1580
+ })
1581
+
1582
+ advice = {
1583
+ "schema_version": CACHE_LAYOUT_ADVICE_SCHEMA_VERSION,
1584
+ "status": status,
1585
+ "confidence": confidence,
1586
+ "heuristic": True,
1587
+ "observed_issue": observed_issue,
1588
+ "priority": priority,
1589
+ "observed_summary": {
1590
+ "cache_creation_tokens": cache_creation,
1591
+ "cache_read_tokens": cache_read,
1592
+ "cache_amortization": round(summary.cache_amortization, 4) if summary.cache_amortization_defined else None,
1593
+ "stable_prefix_share": stable_prefix_share,
1594
+ "volatile_prefix_share": volatile_prefix_share,
1595
+ "volatile_tail_share": volatile_tail_share,
1596
+ "max_prefix_position": max_prefix_position,
1597
+ "max_prefix_position_volatile_share": max_prefix_position_volatile_share,
1598
+ "dominant_transcript_share": dominant.get("share") if dominant else None,
1599
+ },
1600
+ "hypothesized_causes": hypothesized_causes,
1601
+ "corroborated_causes": corroborated_causes,
1602
+ "next_checks": next_checks,
1603
+ "recommended_experiments": recommended_experiments,
1604
+ "caveats": [
1605
+ "Cache layout advice is a local transcript heuristic, not billing authority or provider-cache proof.",
1606
+ "Observed issues come from cache fields and redacted segment statistics; causes remain hypotheses until corroborated by diet/structural evidence.",
1607
+ "Generated command templates use placeholders and must not be treated as observed user commands or paths.",
1608
+ "Use matched before/after audits before making token or cost savings claims.",
1609
+ ],
1610
+ }
1611
+ summary.cache_layout_advice_cache = advice
1612
+ return advice
1613
+
1614
+
1615
+ def cache_layout_advice_for_summary(summary: UsageSummary) -> dict[str, Any]:
1616
+ return build_cache_layout_advice(summary)
1617
+
1618
+
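The branch order that picks `observed_issue` and `priority` in `build_cache_layout_advice` can be sketched in isolation, which may make the precedence easier to follow. This is a simplified illustration and not the package's code: `classify_layout_issue` is a hypothetical name, the default `volatile_threshold` of 0.5 is a placeholder because the value of `PROMPT_PREFIX_VOLATILE_THRESHOLD` is not shown in this diff, and `has_cache_fields` collapses the `cache_status == "missing" or not has_cache_any` condition into one flag.

```python
from __future__ import annotations


def classify_layout_issue(
    breaker: dict | None,
    cache_creation: int,
    cache_read: int,
    cache_amortization: float,
    dominant_share: float,
    dominant_tokens: int,
    has_cache_fields: bool,
    volatile_threshold: float = 0.5,  # assumption: the real threshold is not in the diff
) -> tuple[str, str]:
    """Return (observed_issue, priority) using the same precedence as the diff."""
    position = breaker.get("position") if breaker else None
    share = (breaker.get("volatile_share") if breaker else None) or 0.0
    volatile_prefix = bool(
        breaker
        and cache_creation > 0
        and (position in {0, 1} or share >= volatile_threshold)
    )
    dominates = dominant_share >= 0.20 and dominant_tokens >= 1_000

    # A volatile early prefix wins over every other symptom.
    if volatile_prefix:
        priority = "P0" if cache_creation >= 50_000 and position in {0, 1} else "P1"
        return "volatile_prefix_breaker", priority
    if dominates:
        return "long_session_accumulation", "P1"
    if cache_creation >= 10_000 and cache_read > 0 and cache_amortization < 0.5:
        return "low_cache_reuse", "P1"
    if not has_cache_fields:
        return "missing_cache_fields", "P2"
    return "unknown", "P2"


# Hypothetical inputs: a volatile segment at prefix position 0 with heavy cache creation.
p0_case = classify_layout_issue({"position": 0, "volatile_share": 0.9}, 60_000, 10, 0.0, 0.1, 500, True)
```

Because the checks form a single `if`/`elif` chain, only one issue is reported even when several symptoms are present; a dominant long session still contributes its own experiment separately in the diff.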
  def build_metric_caveats(summary: UsageSummary) -> list[str]:
      caveats = [
          "Values are observed from local Claude Code transcript JSON/JSONL fields and are not official billing records.",
@@ -1433,6 +1651,7 @@ def feasibility_json(
      stable_total_tokens = sum(stable_tokens.values())
      cache_friendliness = cache_friendliness_for_summary(summary)
      cache_diagnostics = cache_diagnostics_for_summary(summary)
+     cache_layout_advice = cache_layout_advice_for_summary(summary)
      return {
          "schema_version": FEASIBILITY_SCHEMA_VERSION,
          "producer": FEASIBILITY_PRODUCER,
@@ -1452,6 +1671,7 @@ def feasibility_json(
                  "headroom_availability",
                  "cache_friendliness",
                  "cache_diagnostics",
+                 "cache_layout_advice",
                  "totals",
              ],
              "diagnostic_fields": ["summary"],
@@ -1480,6 +1700,7 @@ def feasibility_json(
          "headroom_availability": availability["headroom"],
          "cache_friendliness": cache_friendliness,
          "cache_diagnostics": cache_diagnostics,
+         "cache_layout_advice": cache_layout_advice,
          "totals": {
              "total_tokens": stable_total_tokens,
              "tokens": stable_tokens,
@@ -1531,6 +1752,36 @@ def build_recommendations(summary: UsageSummary, top: int) -> list[dict[str, Any
      input_ratio = input_tokens / total
      cache_friendliness = cache_friendliness_for_summary(summary)
      cache_diagnostics = cache_diagnostics_for_summary(summary)
+     cache_layout_advice = cache_layout_advice_for_summary(summary)
+     if cache_layout_advice.get("observed_issue") == "volatile_prefix_breaker":
+         evidence = {
+             "observed_issue": cache_layout_advice.get("observed_issue"),
+             "priority": cache_layout_advice.get("priority"),
+             "confidence": cache_layout_advice.get("confidence"),
+             "cache_creation_tokens": cache_creation,
+             "cache_read_tokens": cache_read,
+         }
+         observed_summary = cache_layout_advice.get("observed_summary")
+         if isinstance(observed_summary, dict):
+             for key in ("max_prefix_position", "max_prefix_position_volatile_share", "stable_prefix_share", "volatile_prefix_share"):
+                 evidence[key] = observed_summary.get(key)
+         rec = recommendation(
+             "prioritize-cache-prefix-stabilization",
+             "Prioritize cache-prefix stabilization before TTL or output trimming",
+             (
+                 "Cache creation remains material and redacted segment statistics show a volatile early prefix; "
+                 "this is an experiment-prioritization signal, not a confirmed root cause."
+             ),
+             (
+                 "If one transcript dominates, split unrelated work into shorter sessions; then check startup/context "
+                 "size and keep stable policy before volatile logs, diffs, timestamps, and generated evidence."
+             ),
+             str(cache_layout_advice.get("priority") or "P1"),
+             evidence,
+         )
+         rec["heuristic"] = True
+         rec["confidence"] = cache_layout_advice.get("confidence")
+         recs.append(rec)
      for finding in cache_friendliness.get("findings", []):
          if isinstance(finding, dict) and finding.get("id") == "volatile-content-near-prefix":
              evidence = dict(finding.get("evidence") or {})
@@ -1754,6 +2005,7 @@ def summary_json(
          "top_tools": counter_json(summary.by_tool, top),
          "cache_friendliness": cache_friendliness_for_summary(summary),
          "cache_diagnostics": cache_diagnostics_for_summary(summary),
+         "cache_layout_advice": cache_layout_advice_for_summary(summary),
      }
      if include_recommendations:
          data["recommendations"] = build_recommendations(summary, top)
@@ -1887,6 +2139,26 @@ def main() -> int:
          headroom = cache_diagnostics.get("headroom_diagnostics") or {}
          print(f" headroom_status {headroom.get('status')} ({headroom.get('evidence')})")

+     cache_layout_advice = cache_layout_advice_for_summary(summary)
+     if cache_layout_advice.get("status") != "missing" or cache_layout_advice.get("observed_issue") != "unknown":
+         print("\nCache layout advice")
+         print(f" status {cache_layout_advice.get('status')}")
+         print(f" confidence {cache_layout_advice.get('confidence')}")
+         print(f" observed_issue {cache_layout_advice.get('observed_issue')}")
+         print(f" priority {cache_layout_advice.get('priority')}")
+         experiments = cache_layout_advice.get("recommended_experiments") or []
+         if experiments:
+             first = experiments[0]
+             print(f" first_experiment {first.get('id')} ({first.get('priority')})")
+             print(f" experiment_action {first.get('action')}")
+         checks = cache_layout_advice.get("next_checks") or []
+         if checks:
+             first = checks[0]
+             print(f" next_check {first.get('id')}")
+             templates = first.get("command_templates") or []
+             if templates:
+                 print(f" command_template {templates[0]}")
+
      model_totals = Counter({model: sum(tokens.values()) for model, tokens in summary.by_model.items()})
      print_counter("By model", model_totals, args.top)

@@ -2,7 +2,9 @@

  `cache_diagnostics` is the nested diagnostic object emitted by `context-guard-audit --json` and by top-level `cache_diagnostics` in `context-guard-audit --feasibility-json`. The committed schema file, [`cache-diagnostics.schema.json`](cache-diagnostics.schema.json), describes that nested object only; it is not the full CLI response envelope.

- The object is for GUI and external consumers that need stable cache-read, prefix-layout, TTL-evidence, and headroom-boundary fields without scraping prose. It is a local transcript diagnostic contract, not a billing source, not provider telemetry verification, and not a token or cost savings promise.
+ The object is for GUI and external consumers that need stable cache-read, prefix-layout, TTL-evidence, and headroom-boundary fields without scraping prose. It is a local transcript diagnostic contract, not a billing source, not provider telemetry verification, and not a token or cost savings promise. It does not guarantee savings, does not prove provider cache hits, and does not infer live headroom.
+
+ `context-guard-audit` also emits a top-level sibling `cache_layout_advice` object. That sibling is intentionally separate from `cache_diagnostics`: diagnostics stay evidence-oriented, while advice ranks checks and experiments such as session splitting, prefix stabilization, and context-diet scans. Advice distinguishes an `observed_issue` from `hypothesized_causes`, `corroborated_causes`, and `next_checks`; without diet or structural evidence, volatile prefix positions should be presented as hypotheses to check, not confirmed root causes.

  ## Files

@@ -13,11 +15,30 @@ The object is for GUI and external consumers that need stable cache-read, prefix

  ### `context-guard-audit --json`

- The legacy audit JSON includes top-level `cache_diagnostics` beside `cache_metrics` and `cache_friendliness`.
+ The legacy audit JSON includes top-level `cache_diagnostics` beside `cache_metrics`, `cache_friendliness`, and the separate `cache_layout_advice` advice object.

  ### `context-guard-audit --feasibility-json`

- The feasibility JSON includes top-level `cache_diagnostics` and lists `cache_diagnostics` in `consumer_contract.stable_top_level_fields`. GUI consumers should prefer the top-level feasibility field when available and use `summary.cache_diagnostics` only for legacy compatibility.
+ The feasibility JSON includes top-level `cache_diagnostics` and `cache_layout_advice`, and lists both in `consumer_contract.stable_top_level_fields`. GUI consumers should prefer the top-level feasibility field when available and use `summary.cache_diagnostics` only for legacy compatibility.
+
+ ## Sibling `cache_layout_advice` fields
+
+ `cache_layout_advice` is a stable top-level sibling of `cache_diagnostics`, but it is deliberately not part of `cache-diagnostics.schema.json`. It is an advice contract over local transcript heuristics.
+
+ | Field | Meaning | Consumer note |
+ | --- | --- | --- |
+ | `schema_version` | Stable version string, currently `contextguard.cache-layout-advice.v1`. | Treat unknown versions conservatively. |
+ | `status` | Advice availability: `available`, `partial`, or `missing`. | `partial` means prompt/cache evidence was capped, skipped, or incomplete. |
+ | `confidence` | Overall advice confidence: `hypothesis`, `partial`, or `unavailable`. | Never present as provider truth or billing proof. |
+ | `heuristic` | Always `true` for v1. | UI should label advice as heuristic. |
+ | `observed_issue` | Primary observed layout issue: `volatile_prefix_breaker`, `long_session_accumulation`, `low_cache_reuse`, `missing_cache_fields`, or `unknown`. | This is an observed/audited symptom, not a confirmed cause. |
+ | `priority` | Suggested priority bucket (`P0`, `P1`, or `P2`). | Use for ordering checks, not for savings claims. |
+ | `observed_summary` | Sanitized numeric summary such as cache creation/read tokens, prefix shares, breaker position, and dominant transcript share. | Contains aggregate counts/shares only, not raw prompt text. |
+ | `hypothesized_causes` | Candidate causes to investigate, each with `id`, `confidence`, `evidence`, `reason`, and `next_check`. | Keep separate from confirmed causes. |
+ | `corroborated_causes` | Causes supported by independent evidence beyond prefix-position heuristics. | Empty means no cause has been confirmed. |
+ | `next_checks` | Evidence-gathering checks with `id`, `confidence`, `command_templates`, and `evidence_required_for_corroboration`. | Templates use placeholders such as `<repo>` and must not embed observed local paths. |
+ | `recommended_experiments` | Ordered experiments with `id`, `order`, `priority`, `effort`, `action`, `expected_signal`, and `verification`. | Run in `order`; compare matched audit windows before claiming improvement. |
+ | `caveats` | User-facing boundaries for claims and evidence limits. | Preserve these in GUI summaries and reports. |
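
A consumer following the notes in the table might read the advice roughly as sketched below. The function name `ordered_experiments` and the sample payload are hypothetical; only the field names and the `contextguard.cache-layout-advice.v1` version string come from the table, and ignoring unknown versions is one possible reading of "treat unknown versions conservatively".

```python
from __future__ import annotations

import json

KNOWN_SCHEMA_VERSION = "contextguard.cache-layout-advice.v1"


def ordered_experiments(audit_json: str) -> list[dict]:
    """Return recommended experiments sorted by `order`, or [] if advice is unusable."""
    payload = json.loads(audit_json)
    if not isinstance(payload, dict):
        return []
    advice = payload.get("cache_layout_advice")
    if not isinstance(advice, dict):
        return []
    # Assumption: an unknown schema version is skipped rather than partially parsed.
    if advice.get("schema_version") != KNOWN_SCHEMA_VERSION:
        return []
    experiments = [e for e in advice.get("recommended_experiments") or [] if isinstance(e, dict)]
    return sorted(experiments, key=lambda e: e.get("order", 0))


# Hypothetical payload shaped like the documented fields.
sample = json.dumps({
    "cache_layout_advice": {
        "schema_version": KNOWN_SCHEMA_VERSION,
        "status": "partial",
        "heuristic": True,
        "recommended_experiments": [
            {"id": "stabilize-cache-prefix", "order": 2},
            {"id": "split-long-sessions", "order": 1},
        ],
    }
})
ids = [e["id"] for e in ordered_experiments(sample)]
```

A GUI built on this would still need to surface `caveats` and label the result as heuristic, since the sketch only handles ordering.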

  ## Top-level fields

@@ -72,4 +93,4 @@ Historical transcript scans do not carry live context-window state. `headroom_di

  ## Claim boundaries

- `cache_diagnostics` can help users reorganize prompts, find volatile prefix segments, and identify missing evidence. It does not guarantee savings, does not verify provider cache state, is not billing authority, does not prove provider cache hits, and does not infer live headroom from historical token totals.
+ `cache_diagnostics` and the sibling `cache_layout_advice` can help users reorganize prompts, find volatile prefix segments, and identify missing evidence or next checks. They do not guarantee savings, do not verify provider cache state, are not billing authority, do not prove provider cache hits, and do not infer live headroom from historical token totals.
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "@ictechgy/context-guard",
-   "version": "0.4.3",
+   "version": "0.4.4",
    "description": "ContextGuard CLI helpers for keeping AI coding agent context focused and local-first.",
    "license": "Apache-2.0",
    "homepage": "https://github.com/ictechgy/context-guard#readme",
@@ -5,7 +5,7 @@ class ContextGuard < Formula
5
5
 
6
6
  desc "Local-first context guardrails for AI coding agents"
7
7
  homepage "https://github.com/ictechgy/context-guard"
8
- url "https://github.com/ictechgy/context-guard/archive/refs/tags/v0.4.3.tar.gz"
8
+ url "https://github.com/ictechgy/context-guard/archive/refs/tags/v0.4.4.tar.gz"
9
9
  sha256 "REPLACE_WITH_RELEASE_TARBALL_SHA256"
10
10
  license "Apache-2.0"
11
11
 
@@ -37,5 +37,5 @@
37
37
  "gated-experiments",
38
38
  "future-roadmap"
39
39
  ],
40
- "version": "0.4.3"
40
+ "version": "0.4.4"
41
41
  }
@@ -95,7 +95,7 @@ context-guard-statusline-merged
95
95
  - **출력 축약기**는 감싼 명령의 종료 코드를 보존하면서 긴 로그를 줄이고, `--digest markdown` 또는 `--digest json`으로 실행기 실패 정보, 가림 처리된 failure signature, 중복 라인 그룹, 다음 조회 제안이 담긴 요약을 만들 수 있습니다.
96
96
  - **민감정보 가림 도구**는 검색, diff, 로그 출력에서 자격 증명 패턴, 비공개 키 블록, 인증 헤더, 자격 증명이 포함된 URL, 민감해 보이는 경로를 가립니다.
97
97
  - **상태표시줄**은 모델, 컨텍스트, 비용 신호를 짧게 보여주고, 대화 기록 데이터가 있으면 캐시 읽기와 캐시 재사용 신호도 함께 표시합니다.
98
- - **대화 기록 감사**는 usage/cost/cache bucket을 집계하고, 토큰 집중 지점과 `cache_friendliness` 프롬프트 배치 신호를 제한된 가림 처리된 segment hash로 보고합니다. 원문 프롬프트는 출력하지 않습니다.
98
+ - **대화 기록 감사**는 usage/cost/cache bucket을 집계하고, 토큰 집중 지점, `cache_friendliness` 프롬프트 배치 신호, `cache_layout_advice` 확인/실험 우선순위를 제한된 가림 처리된 segment hash로 보고합니다. 원문 프롬프트는 출력하지 않습니다.
99
99
  - **반복 실패 알림**은 Bash 실패가 반복될 때 같은 경로를 계속 재시도하지 않고 전략을 바꾸도록 안내합니다.
100
100
  - **벤치마크 헬퍼**는 기준/변형 실행을 대응해 실제 토큰·비용 필드, 별도의 바이트 감소 간접 증거, 진단용 `wall_time_seconds`, `provider_cached_tokens`, provider-cache 사용 가능성 텔레메트리로 기록합니다.
101
101
 
@@ -109,7 +109,7 @@ brief 모드는 코딩 에이전트가 군더더기를 줄이도록 요청하되
109
109
 
110
110
  ## 절감 수치를 과장하지 않습니다
111
111
 
112
- 이 헬퍼들은 흔히 컨텍스트를 불필요하게 키우는 원인을 줄이지만, 고정된 절감률을 보장하지 않습니다. 실제 전후 비교 증거가 필요하면 `context-guard-bench --ledger-jsonl ... --report-json ...`로 본인 작업에서 측정하세요. 토큰 절감 주장은 대응 태스크 양쪽 모두에 `primary_tokens_measured`가 있을 때만 계산하며, report의 `matched_pair_evidence`가 성공한 baseline/variant task bucket을 transform, quality gate, 측정 가능 여부, claim boundary와 연결합니다. wall-time과 provider-cache 필드는 진단용 텔레메트리이지 단독 절감 증거가 아닙니다. 감사의 `cache_friendliness`와 [`cache_diagnostics`](https://github.com/ictechgy/context-guard/blob/main/docs/cache-diagnostics-schema.md) 관측/추론/가설/불가 경계를 둔 휴리스틱 배치·cache-read 신호이며 청구 기준이나 provider-cache 증명이 아닙니다. 벤치마크 CSV 스키마는 엄격하므로 헬퍼 업그레이드 후에는 새 CSV를 시작하거나 헤더를 마이그레이션하세요. 작업 유형별 합성 예시는 [`docs/benchmark-workflow-examples.md`](https://github.com/ictechgy/context-guard/blob/main/docs/benchmark-workflow-examples.md)에 있고, fixture-only 실험 시작 예시는 [`docs/experimental-benchmark-fixtures.md`](https://github.com/ictechgy/context-guard/blob/main/docs/experimental-benchmark-fixtures.md)에 있습니다.
112
+ 이 헬퍼들은 흔히 컨텍스트를 불필요하게 키우는 원인을 줄이지만, 고정된 절감률을 보장하지 않습니다. 실제 전후 비교 증거가 필요하면 `context-guard-bench --ledger-jsonl ... --report-json ...`로 본인 작업에서 측정하세요. 토큰 절감 주장은 대응 태스크 양쪽 모두에 `primary_tokens_measured`가 있을 때만 계산하며, report의 `matched_pair_evidence`가 성공한 baseline/variant task bucket을 transform, quality gate, 측정 가능 여부, claim boundary와 연결합니다. wall-time과 provider-cache 필드는 진단용 텔레메트리이지 단독 절감 증거가 아닙니다. 감사의 `cache_friendliness`, [`cache_diagnostics`](https://github.com/ictechgy/context-guard/blob/main/docs/cache-diagnostics-schema.md), `cache_layout_advice`는 관측/추론/가설/불가 경계를 둔 휴리스틱 배치·cache-read 신호와 순위화된 확인/실험이며 청구 기준이나 provider-cache 증명이 아닙니다. 벤치마크 CSV 스키마는 엄격하므로 헬퍼 업그레이드 후에는 새 CSV를 시작하거나 헤더를 마이그레이션하세요. 작업 유형별 합성 예시는 [`docs/benchmark-workflow-examples.md`](https://github.com/ictechgy/context-guard/blob/main/docs/benchmark-workflow-examples.md)에 있고, fixture-only 실험 시작 예시는 [`docs/experimental-benchmark-fixtures.md`](https://github.com/ictechgy/context-guard/blob/main/docs/experimental-benchmark-fixtures.md)에 있습니다.
113
113
 
114
114
  ContextGuard는 모델 토큰을 줄이기 위해 작업을 외부 AI 서비스로 전송하지 않습니다. 모든 헬퍼 명령은 로컬에서 동작합니다. 로컬 RAM/디스크 보관본은 다음에 보낼 컨텍스트를 줄이는 데 도움될 수 있지만 provider prompt cache를 대체하지 않습니다. Anthropic 배포나 청구 설명 전에는 공식 prompt caching/pricing 문서를 다시 확인하세요: https://docs.anthropic.com/en/build-with-claude/prompt-caching 및 https://platform.claude.com/docs/en/about-claude/pricing.
115
115
 
@@ -101,7 +101,7 @@ context-guard-statusline-merged
101
101
  - **Output trimmer** preserves the wrapped command exit code, trims long logs, and can emit `--digest markdown` or `--digest json` summaries with runner failure facts, sanitized failure signatures, duplicate-line groups, and suggested next queries. Add `--artifact-receipt` with digest mode to store the exact sanitized full output as a local artifact receipt and re-expand omitted slices with the emitted `context-guard-artifact get ...` command.
102
102
  - **Sanitizer** redacts common credential patterns, private key blocks, auth headers, credential URLs, and sensitive-looking paths from search, diff, and log output.
103
103
  - **Statusline** displays compact model/context/cost signals and, when transcript data is available, cache-read and cache-reuse signals.
104
- - **Transcript audit** aggregates usage/cost/cache buckets, flags likely token hotspots, and exposes `cache_friendliness` plus additive [`cache_diagnostics`](https://github.com/ictechgy/context-guard/blob/main/docs/cache-diagnostics-schema.md) findings from bounded usage fields, timestamped cache telemetry records, and redacted segment hashes without printing raw prompt text or claiming provider-cache savings.
104
+ - **Transcript audit** aggregates usage/cost/cache buckets, flags likely token hotspots, and exposes `cache_friendliness`, additive [`cache_diagnostics`](https://github.com/ictechgy/context-guard/blob/main/docs/cache-diagnostics-schema.md), and `cache_layout_advice` experiment priorities from bounded usage fields, timestamped cache telemetry records, and redacted segment hashes without printing raw prompt text or claiming provider-cache savings.
105
105
  - **Repeated-failure nudge** warns after repeated Bash failures so the agent switches strategy instead of retrying the same context-heavy path.
106
106
  - **Benchmark helper** records matched baseline/variant runs with real token and cost fields, separate byte-reduction proxy evidence, diagnostic `wall_time_seconds`, `provider_cached_tokens`, and provider-cache availability telemetry.
107
107
 
@@ -115,7 +115,7 @@ Three deterministic levels — `lite`, `standard`, `ultra` — live under [`brie
115
115
 
116
116
  ## Conservative claims
117
117
 
118
- These helpers reduce common sources of context bloat, but they do not guarantee a fixed percentage savings. Use `context-guard-bench --ledger-jsonl ... --report-json ...` when you need measured before/after evidence for your own tasks; token-savings claims require `primary_tokens_measured` on both matched sides, and the report's `matched_pair_evidence` links each successful baseline/variant task bucket to the transform, quality gate, measurement availability, and claim boundary. Wall-time/provider-cache fields are diagnostic telemetry, not standalone savings proof. Audit `cache_friendliness` and [`cache_diagnostics`](https://github.com/ictechgy/context-guard/blob/main/docs/cache-diagnostics-schema.md) findings are heuristic layout/cache-read signals with observed/inferred/hypothesis/unavailable boundaries, not billing authority or provider-cache proof. Benchmark CSV schemas are strict, so start a new CSV or migrate the header after helper upgrades. Workflow-specific synthetic examples live in [`docs/benchmark-workflow-examples.md`](https://github.com/ictechgy/context-guard/blob/main/docs/benchmark-workflow-examples.md), and fixture-only experimental task/variant starters live in [`docs/experimental-benchmark-fixtures.md`](https://github.com/ictechgy/context-guard/blob/main/docs/experimental-benchmark-fixtures.md).
118
+ These helpers reduce common sources of context bloat, but they do not guarantee a fixed percentage savings. Use `context-guard-bench --ledger-jsonl ... --report-json ...` when you need measured before/after evidence for your own tasks; token-savings claims require `primary_tokens_measured` on both matched sides, and the report's `matched_pair_evidence` links each successful baseline/variant task bucket to the transform, quality gate, measurement availability, and claim boundary. Wall-time/provider-cache fields are diagnostic telemetry, not standalone savings proof. Audit `cache_friendliness`, [`cache_diagnostics`](https://github.com/ictechgy/context-guard/blob/main/docs/cache-diagnostics-schema.md), and `cache_layout_advice` findings are heuristic layout/cache-read signals and ranked checks/experiments with observed/inferred/hypothesis/unavailable boundaries, not billing authority or provider-cache proof. Benchmark CSV schemas are strict, so start a new CSV or migrate the header after helper upgrades. Workflow-specific synthetic examples live in [`docs/benchmark-workflow-examples.md`](https://github.com/ictechgy/context-guard/blob/main/docs/benchmark-workflow-examples.md), and fixture-only experimental task/variant starters live in [`docs/experimental-benchmark-fixtures.md`](https://github.com/ictechgy/context-guard/blob/main/docs/experimental-benchmark-fixtures.md).
119
119
 
120
120
  ContextGuard also does not send work to external AI providers to save model tokens. All helper commands run locally. Local RAM/disk receipts can reduce what you choose to send, but they do not replace a provider prompt cache. Before release or billing claims for Anthropic, recheck the official prompt-caching and pricing docs: https://docs.anthropic.com/en/build-with-claude/prompt-caching and https://platform.claude.com/docs/en/about-claude/pricing.
121
121
 
@@ -49,6 +49,7 @@ TIMESTAMP_KEYS = ("timestamp", "created_at", "createdAt", "time", "ts")
49
49
  FEASIBILITY_SCHEMA_VERSION = "contextguard.metric-feasibility.v1.2"
50
50
  FEASIBILITY_PRODUCER = "context-guard-audit"
51
51
  CACHE_DIAGNOSTICS_SCHEMA_VERSION = "contextguard.cache-diagnostics.v1"
52
+ CACHE_LAYOUT_ADVICE_SCHEMA_VERSION = "contextguard.cache-layout-advice.v1"
52
53
  MAX_ERROR_EXAMPLES = 20
53
54
  JSON_PARSE_RECURSION_LIMIT = 10_000
54
55
  READ_CHUNK_BYTES = 64 * 1024
@@ -184,6 +185,7 @@ class UsageSummary:
184
185
  prompt_cache_audit: PromptCacheAudit = field(default_factory=PromptCacheAudit)
185
186
  cache_friendliness_cache: dict[str, Any] | None = field(default=None, init=False, repr=False)
186
187
  cache_diagnostics_cache: dict[str, Any] | None = field(default=None, init=False, repr=False)
188
+ cache_layout_advice_cache: dict[str, Any] | None = field(default=None, init=False, repr=False)
187
189
 
188
190
  @property
189
191
  def total_tokens(self) -> int:
@@ -1398,6 +1400,222 @@ def cache_diagnostics_for_summary(summary: UsageSummary) -> dict[str, Any]:
1398
1400
  return build_cache_diagnostics(summary)
1399
1401
 
1400
1402
 
+ def _dominant_transcript(summary: UsageSummary) -> dict[str, Any] | None:
+     if summary.total_tokens <= 0 or not summary.by_file:
+         return None
+     _label, tokens = summary.by_file.most_common(1)[0]
+     share = tokens / summary.total_tokens if summary.total_tokens else 0.0
+     return {
+         "tokens": tokens,
+         "share": round(share, 4),
+         "dominates": share >= 0.20 and tokens >= 1_000,
+     }
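
The dominance rule in `_dominant_transcript` can be tried in isolation. The sketch below is a standalone approximation, not the packaged helper: `dominant_share` is a hypothetical name, and it assumes the total equals the sum of the per-file counts, whereas the real function divides by `summary.total_tokens`.

```python
from __future__ import annotations

from collections import Counter


def dominant_share(
    by_file: Counter[str],
    min_share: float = 0.20,
    min_tokens: int = 1_000,
) -> tuple[int, float, bool] | None:
    """Return (tokens, share, dominates) for the largest transcript, or None if empty."""
    total = sum(by_file.values())  # assumption: total is the sum of per-file tokens
    if total <= 0:
        return None
    _label, tokens = by_file.most_common(1)[0]
    share = tokens / total
    # Both thresholds must hold: a large share of a tiny total does not dominate.
    return tokens, round(share, 4), share >= min_share and tokens >= min_tokens


print(dominant_share(Counter({"a.jsonl": 6_000, "b.jsonl": 4_000})))  # (6000, 0.6, True)
print(dominant_share(Counter({"a.jsonl": 500, "b.jsonl": 100})))      # (500, 0.8333, False)
```

The second call shows why the rule carries an absolute floor: one file holds most of the tokens, but at 500 tokens it is too small to justify a session-splitting experiment.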
1413
+
1414
+
+ def _first_dynamic_breaker(cache_diagnostics: dict[str, Any]) -> dict[str, Any] | None:
+     breakers = cache_diagnostics.get("dynamic_prefix_breakers") or []
+     if not breakers:
+         return None
+     first = breakers[0]
+     return first if isinstance(first, dict) else None
1421
+
1422
+
1423
+ def build_cache_layout_advice(summary: UsageSummary) -> dict[str, Any]:
1424
+ if summary.cache_layout_advice_cache is not None:
1425
+ return summary.cache_layout_advice_cache
1426
+
1427
+ cache_friendliness = cache_friendliness_for_summary(summary)
1428
+ cache_diagnostics = cache_diagnostics_for_summary(summary)
1429
+ signals = cache_friendliness.get("signals") if isinstance(cache_friendliness.get("signals"), dict) else {}
1430
+ dynamic_breaker = _first_dynamic_breaker(cache_diagnostics)
1431
+ dominant = _dominant_transcript(summary)
1432
+ cache_creation = summary.tokens.get("cache_creation", 0)
1433
+ cache_read = summary.tokens.get("cache_read", 0)
1434
+ cache_fields = cache_diagnostics.get("observations", {}).get("cache_fields", {}) if isinstance(cache_diagnostics.get("observations"), dict) else {}
1435
+ cache_status = cache_fields.get("status") if isinstance(cache_fields, dict) else None
1436
+ stable_prefix_share = signals.get("stable_prefix_share")
1437
+ volatile_prefix_share = signals.get("volatile_prefix_share")
1438
+ volatile_tail_share = signals.get("volatile_tail_share")
1439
+ max_prefix_position = dynamic_breaker.get("position") if dynamic_breaker else None
1440
+ max_prefix_position_volatile_share = dynamic_breaker.get("volatile_share") if dynamic_breaker else signals.get("max_prefix_position_volatile_share")
1441
+
1442
+ status = "missing"
1443
+ confidence = "unavailable"
1444
+ observed_issue = "unknown"
1445
+ priority = "P2"
1446
+ hypothesized_causes: list[dict[str, Any]] = []
1447
+ corroborated_causes: list[dict[str, Any]] = []
1448
+ next_checks: list[dict[str, Any]] = []
1449
+ recommended_experiments: list[dict[str, Any]] = []
1450
+
1451
+ has_cache_any = bool(
1452
+ summary.token_field_presence.get("cache_read", 0)
1453
+ or summary.token_field_presence.get("cache_creation", 0)
1454
+ )
1455
+ has_prompt_samples = bool(summary.prompt_cache_audit.samples)
1456
+ if has_cache_any or has_prompt_samples:
1457
+ status = "partial" if (
1458
+ not has_prompt_samples
1459
+ or cache_friendliness.get("status") == "partial"
1460
+ or cache_diagnostics.get("status") == "partial"
1461
+ or summary.skipped_files
1462
+ or summary.skipped_records
1463
+ or summary.parse_errors
1464
+ ) else "available"
1465
+ confidence = "partial" if status == "partial" else "hypothesis"
1466
+
1467
+ volatile_prefix_breaker = bool(
1468
+ dynamic_breaker
1469
+ and cache_creation > 0
1470
+ and (max_prefix_position in {0, 1} or (max_prefix_position_volatile_share or 0) >= PROMPT_PREFIX_VOLATILE_THRESHOLD)
1471
+ )
1472
+ long_session_dominates = bool(dominant and dominant.get("dominates"))
1473
+
1474
+ if volatile_prefix_breaker:
1475
+ observed_issue = "volatile_prefix_breaker"
1476
+ priority = "P0" if cache_creation >= 50_000 and max_prefix_position in {0, 1} else "P1"
1477
+ hypothesized_causes.append({
1478
+ "id": "prefix-position-churn",
1479
+ "confidence": confidence,
1480
+ "evidence": EVIDENCE_INFERRED,
1481
+ "reason": (
1482
+ "A highly volatile redacted prompt segment appears in the early prefix window; "
1483
+ "this identifies a layout issue, not a confirmed source."
1484
+ ),
1485
+ "next_check": "Check whether startup context, generated evidence, or tool/MCP catalog changes are moving before stable policy.",
1486
+ })
1487
+ if cache_diagnostics.get("stable_prefix_candidates"):
1488
+ hypothesized_causes.append({
1489
+ "id": "evidence-before-policy",
1490
+ "confidence": confidence,
1491
+ "evidence": EVIDENCE_INFERRED,
1492
+ "reason": (
1493
+ "Stable reusable segments appear elsewhere while the early prefix churns; "
1494
+ "check whether logs, diffs, timestamps, or file evidence precede stable instructions."
1495
+ ),
1496
+ "next_check": "Keep stable policy/instructions first and move generated run evidence later.",
1497
+ })
1498
+ next_checks.append({
1499
+ "id": "inspect-startup-context-size",
1500
+ "confidence": "hypothesis",
1501
+ "command_templates": [
1502
+ "context-guard-diet scan <repo>",
1503
+ "context-guard-diet structural-waste <repo>",
1504
+ ],
1505
+ "evidence_required_for_corroboration": (
1506
+ "Large or duplicate CLAUDE.md/AGENTS.md/GEMINI.md findings from diet output."
1507
+ ),
1508
+ })
1509
+ elif long_session_dominates:
1510
+ observed_issue = "long_session_accumulation"
1511
+ priority = "P1"
1512
+ elif cache_creation >= 10_000 and cache_read > 0 and summary.cache_amortization < 0.5:
1513
+ observed_issue = "low_cache_reuse"
1514
+ priority = "P1"
1515
+ elif cache_status == "missing" or not has_cache_any:
1516
+ observed_issue = "missing_cache_fields"
1517
+ priority = "P2"
1518
+
1519
+ if long_session_dominates:
1520
+ recommended_experiments.append({
1521
+ "id": "split-long-sessions",
1522
+ "order": len(recommended_experiments) + 1,
1523
+ "priority": "P1",
1524
+ "effort": "low",
1525
+ "action": "Use /clear between unrelated tasks and /compact focus on changed files, failing tests, and remaining TODO during long work.",
1526
+ "expected_signal": "Cache creation per comparable task decreases and one transcript no longer dominates observed tokens.",
1527
+ "verification": "Re-run context-guard-audit on a comparable window and compare cache_creation, cache_amortization, and top transcript share.",
1528
+ "evidence": dominant or {},
1529
+ })
1530
+ if volatile_prefix_breaker:
1531
+ recommended_experiments.append({
1532
+ "id": "stabilize-cache-prefix",
1533
+ "order": len(recommended_experiments) + 1,
1534
+ "priority": priority,
1535
+ "effort": "medium",
1536
+ "action": "Keep stable reusable instructions/policy before volatile logs, diffs, timestamps, and generated file evidence.",
1537
+ "expected_signal": "Stable prefix share rises and volatile prefix share falls on matched audit windows.",
1538
+ "verification": "Re-run context-guard-audit --json --recommend and compare cache_layout_advice plus cache_friendliness signals.",
1539
+ "evidence": {
1540
+ "dynamic_prefix_breaker_position": max_prefix_position,
1541
+ "dynamic_prefix_breaker_volatile_share": max_prefix_position_volatile_share,
1542
+ },
1543
+ })
1544
+ recommended_experiments.append({
1545
+ "id": "run-context-diet-checks",
1546
+ "order": len(recommended_experiments) + 1,
1547
+ "priority": "P1",
1548
+ "effort": "low",
1549
+ "action": "Run the generated diet command templates and treat any large/duplicate context-file findings as corroborating evidence before editing instructions.",
1550
+ "expected_signal": "Diet output identifies or rules out oversized/duplicated startup context as a contributor.",
1551
+ "verification": "Record diet JSON separately; do not convert prefix-position evidence alone into a confirmed startup-context cause.",
1552
+ "command_templates": [
1553
+ "context-guard-diet scan <repo> --json > diet.json",
1554
+ "context-guard-diet structural-waste <repo> --json > structural-waste.json",
1555
+ ],
1556
+ })
1557
+ if cache_creation >= 50_000 and summary.cache_amortization_defined and 1.0 <= summary.cache_amortization < 5.0:
1558
+ recommended_experiments.append({
1559
+ "id": "defer-longer-ttl-until-prefix-stable" if volatile_prefix_breaker else "evaluate-longer-ttl-after-stability-check",
1560
+ "order": len(recommended_experiments) + 1,
1561
+ "priority": "P2",
1562
+ "effort": "medium",
1563
+ "action": "Treat longer TTL as secondary; first corroborate stable prefix reuse and current provider TTL/pricing behavior.",
1564
+ "expected_signal": "TTL evaluation happens only after prefix volatility is reduced or ruled out.",
1565
+ "verification": "Use timestamped cache telemetry and provider-measured billing/cost evidence; historical token totals alone are insufficient.",
1566
+ })
1567
+ if not recommended_experiments and status == "partial":
1568
+ next_checks.append({
1569
+ "id": "rerun-narrower-audit",
1570
+ "confidence": "partial",
1571
+ "command_templates": ["context-guard-audit <transcript-or-project-dir> --json --recommend"],
1572
+ "evidence_required_for_corroboration": "Enough uncapped prompt/cache records to classify prefix layout.",
1573
+ })
1574
+ if not recommended_experiments and observed_issue == "missing_cache_fields":
1575
+ next_checks.append({
1576
+ "id": "collect-cache-telemetry",
1577
+ "confidence": "unavailable",
1578
+ "command_templates": ["context-guard-audit ~/.claude/projects --json --recommend"],
1579
+ "evidence_required_for_corroboration": "Transcript records with cache_read/cache_creation fields.",
1580
+ })
1581
+
1582
+ advice = {
1583
+ "schema_version": CACHE_LAYOUT_ADVICE_SCHEMA_VERSION,
1584
+ "status": status,
1585
+ "confidence": confidence,
1586
+ "heuristic": True,
1587
+ "observed_issue": observed_issue,
1588
+ "priority": priority,
1589
+ "observed_summary": {
1590
+ "cache_creation_tokens": cache_creation,
1591
+ "cache_read_tokens": cache_read,
1592
+ "cache_amortization": round(summary.cache_amortization, 4) if summary.cache_amortization_defined else None,
1593
+ "stable_prefix_share": stable_prefix_share,
1594
+ "volatile_prefix_share": volatile_prefix_share,
1595
+ "volatile_tail_share": volatile_tail_share,
1596
+ "max_prefix_position": max_prefix_position,
1597
+ "max_prefix_position_volatile_share": max_prefix_position_volatile_share,
1598
+ "dominant_transcript_share": dominant.get("share") if dominant else None,
1599
+ },
1600
+ "hypothesized_causes": hypothesized_causes,
1601
+ "corroborated_causes": corroborated_causes,
1602
+ "next_checks": next_checks,
1603
+ "recommended_experiments": recommended_experiments,
1604
+ "caveats": [
1605
+ "Cache layout advice is a local transcript heuristic, not billing authority or provider-cache proof.",
1606
+ "Observed issues come from cache fields and redacted segment statistics; causes remain hypotheses until corroborated by diet/structural evidence.",
1607
+ "Generated command templates use placeholders and must not be treated as observed user commands or paths.",
1608
+ "Use matched before/after audits before making token or cost savings claims.",
1609
+ ],
1610
+ }
1611
+ summary.cache_layout_advice_cache = advice
1612
+ return advice
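
The `if`/`elif` chain that sets `observed_issue` and `priority` in `build_cache_layout_advice` gives the issues a fixed precedence. The sketch below restates only that chain as a pure function, assuming the branch order shown in the diff. `classify_issue` and its parameter names are hypothetical, and the cause, check, and experiment lists are left out because this diff view flattens their nesting.

```python
from __future__ import annotations


def classify_issue(
    *,
    volatile_prefix_breaker: bool,
    long_session_dominates: bool,
    cache_creation: int,
    cache_read: int,
    cache_amortization: float,
    cache_fields_missing: bool,
    breaker_position: int | None = None,
) -> tuple[str, str]:
    """Return (observed_issue, priority) using the precedence seen in the diff."""
    if volatile_prefix_breaker:
        # P0 only when cache creation is large and the breaker sits at position 0 or 1.
        early = breaker_position in {0, 1}
        return "volatile_prefix_breaker", "P0" if cache_creation >= 50_000 and early else "P1"
    if long_session_dominates:
        return "long_session_accumulation", "P1"
    if cache_creation >= 10_000 and cache_read > 0 and cache_amortization < 0.5:
        return "low_cache_reuse", "P1"
    if cache_fields_missing:
        return "missing_cache_fields", "P2"
    # Defaults assigned before the chain runs.
    return "unknown", "P2"
```

Because the first matching branch wins, a window that shows both a volatile prefix and a dominant transcript is reported as `volatile_prefix_breaker`. The session-splitting experiment can still appear in `recommended_experiments`, since that list is built by separate checks.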
1613
+
1614
+
+ def cache_layout_advice_for_summary(summary: UsageSummary) -> dict[str, Any]:
+     return build_cache_layout_advice(summary)
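
The TTL experiment emitted by `build_cache_layout_advice` is gated on cache-creation volume and a bounded amortization range, and its `id` changes when the prefix is still volatile. The following sketch isolates that gate under stated assumptions: `ttl_experiment_id` is a hypothetical name, and `amortization=None` stands in for `summary.cache_amortization_defined` being false. How the amortization value itself is computed is not shown in this diff, so the sketch takes it as an input.

```python
from __future__ import annotations


def ttl_experiment_id(
    cache_creation: int,
    amortization: float | None,
    volatile_prefix_breaker: bool,
) -> str | None:
    """Return the TTL experiment id the gate would select, or None when it stays closed."""
    if amortization is None:
        # Undefined amortization: no TTL experiment is suggested.
        return None
    if cache_creation >= 50_000 and 1.0 <= amortization < 5.0:
        if volatile_prefix_breaker:
            # Prefix still churning: the TTL question is explicitly deferred.
            return "defer-longer-ttl-until-prefix-stable"
        return "evaluate-longer-ttl-after-stability-check"
    return None
```

Either id maps to a P2 experiment in the diff, which matches the intent that a longer TTL is treated as secondary to prefix stability.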
1617
+
1618
+
1401
1619
  def build_metric_caveats(summary: UsageSummary) -> list[str]:
1402
1620
  caveats = [
1403
1621
  "Values are observed from local Claude Code transcript JSON/JSONL fields and are not official billing records.",
@@ -1433,6 +1651,7 @@ def feasibility_json(
1433
1651
  stable_total_tokens = sum(stable_tokens.values())
1434
1652
  cache_friendliness = cache_friendliness_for_summary(summary)
1435
1653
  cache_diagnostics = cache_diagnostics_for_summary(summary)
1654
+ cache_layout_advice = cache_layout_advice_for_summary(summary)
1436
1655
  return {
1437
1656
  "schema_version": FEASIBILITY_SCHEMA_VERSION,
1438
1657
  "producer": FEASIBILITY_PRODUCER,
@@ -1452,6 +1671,7 @@ def feasibility_json(
1452
1671
  "headroom_availability",
1453
1672
  "cache_friendliness",
1454
1673
  "cache_diagnostics",
1674
+ "cache_layout_advice",
1455
1675
  "totals",
1456
1676
  ],
1457
1677
  "diagnostic_fields": ["summary"],
@@ -1480,6 +1700,7 @@ def feasibility_json(
1480
1700
  "headroom_availability": availability["headroom"],
1481
1701
  "cache_friendliness": cache_friendliness,
1482
1702
  "cache_diagnostics": cache_diagnostics,
1703
+ "cache_layout_advice": cache_layout_advice,
1483
1704
  "totals": {
1484
1705
  "total_tokens": stable_total_tokens,
1485
1706
  "tokens": stable_tokens,
@@ -1531,6 +1752,36 @@ def build_recommendations(summary: UsageSummary, top: int) -> list[dict[str, Any
1531
1752
  input_ratio = input_tokens / total
1532
1753
  cache_friendliness = cache_friendliness_for_summary(summary)
1533
1754
  cache_diagnostics = cache_diagnostics_for_summary(summary)
1755
+ cache_layout_advice = cache_layout_advice_for_summary(summary)
1756
+ if cache_layout_advice.get("observed_issue") == "volatile_prefix_breaker":
1757
+ evidence = {
1758
+ "observed_issue": cache_layout_advice.get("observed_issue"),
1759
+ "priority": cache_layout_advice.get("priority"),
1760
+ "confidence": cache_layout_advice.get("confidence"),
1761
+ "cache_creation_tokens": cache_creation,
1762
+ "cache_read_tokens": cache_read,
1763
+ }
1764
+ observed_summary = cache_layout_advice.get("observed_summary")
1765
+ if isinstance(observed_summary, dict):
1766
+ for key in ("max_prefix_position", "max_prefix_position_volatile_share", "stable_prefix_share", "volatile_prefix_share"):
1767
+ evidence[key] = observed_summary.get(key)
1768
+ rec = recommendation(
1769
+ "prioritize-cache-prefix-stabilization",
1770
+ "Prioritize cache-prefix stabilization before TTL or output trimming",
1771
+ (
1772
+ "Cache creation remains material and redacted segment statistics show a volatile early prefix; "
1773
+ "this is an experiment-prioritization signal, not a confirmed root cause."
1774
+ ),
1775
+ (
1776
+ "If one transcript dominates, split unrelated work into shorter sessions; then check startup/context "
1777
+ "size and keep stable policy before volatile logs, diffs, timestamps, and generated evidence."
1778
+ ),
1779
+ str(cache_layout_advice.get("priority") or "P1"),
1780
+ evidence,
1781
+ )
1782
+ rec["heuristic"] = True
1783
+ rec["confidence"] = cache_layout_advice.get("confidence")
1784
+ recs.append(rec)
1534
1785
  for finding in cache_friendliness.get("findings", []):
1535
1786
  if isinstance(finding, dict) and finding.get("id") == "volatile-content-near-prefix":
1536
1787
  evidence = dict(finding.get("evidence") or {})
@@ -1754,6 +2005,7 @@ def summary_json(
1754
2005
  "top_tools": counter_json(summary.by_tool, top),
1755
2006
  "cache_friendliness": cache_friendliness_for_summary(summary),
1756
2007
  "cache_diagnostics": cache_diagnostics_for_summary(summary),
2008
+ "cache_layout_advice": cache_layout_advice_for_summary(summary),
1757
2009
  }
1758
2010
  if include_recommendations:
1759
2011
  data["recommendations"] = build_recommendations(summary, top)
@@ -1887,6 +2139,26 @@ def main() -> int:
1887
2139
  headroom = cache_diagnostics.get("headroom_diagnostics") or {}
1888
2140
  print(f" headroom_status {headroom.get('status')} ({headroom.get('evidence')})")
1889
2141
 
2142
+ cache_layout_advice = cache_layout_advice_for_summary(summary)
2143
+ if cache_layout_advice.get("status") != "missing" or cache_layout_advice.get("observed_issue") != "unknown":
2144
+ print("\nCache layout advice")
2145
+ print(f" status {cache_layout_advice.get('status')}")
2146
+ print(f" confidence {cache_layout_advice.get('confidence')}")
2147
+ print(f" observed_issue {cache_layout_advice.get('observed_issue')}")
2148
+ print(f" priority {cache_layout_advice.get('priority')}")
2149
+ experiments = cache_layout_advice.get("recommended_experiments") or []
2150
+ if experiments:
2151
+ first = experiments[0]
2152
+ print(f" first_experiment {first.get('id')} ({first.get('priority')})")
2153
+ print(f" experiment_action {first.get('action')}")
2154
+ checks = cache_layout_advice.get("next_checks") or []
2155
+ if checks:
2156
+ first = checks[0]
2157
+ print(f" next_check {first.get('id')}")
2158
+ templates = first.get("command_templates") or []
2159
+ if templates:
2160
+ print(f" command_template {templates[0]}")
2161
+
1890
2162
  model_totals = Counter({model: sum(tokens.values()) for model, tokens in summary.by_model.items()})
1891
2163
  print_counter("By model", model_totals, args.top)
1892
2164