@ictechgy/context-guard 0.4.3 → 0.4.5

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (29)
  1. package/CHANGELOG.md +13 -0
  2. package/README.ko.md +16 -3
  3. package/README.md +13 -3
  4. package/context-guard-kit/README.md +2 -2
  5. package/context-guard-kit/benchmark_runner.py +244 -6
  6. package/context-guard-kit/claude_transcript_cost_audit.py +443 -1
  7. package/docs/benchmark-fixtures/learned-compression-baseline-context-pack.prompt.example.md +19 -0
  8. package/docs/benchmark-fixtures/learned-compression-candidate-digest.prompt.example.md +21 -0
  9. package/docs/benchmark-fixtures/learned-compression.tasks.example.json +5 -1
  10. package/docs/benchmark-fixtures/output-transform-baseline-raw-output.prompt.example.md +20 -0
  11. package/docs/benchmark-fixtures/output-transform-digest-receipt.prompt.example.md +23 -0
  12. package/docs/benchmark-fixtures/output-transform.tasks.example.json +28 -0
  13. package/docs/benchmark-fixtures/output-transform.variants.example.json +10 -0
  14. package/docs/benchmark-fixtures/visual-ocr-cropped-ocr.prompt.example.md +22 -0
  15. package/docs/benchmark-fixtures/visual-ocr-full-visual.prompt.example.md +19 -0
  16. package/docs/benchmark-fixtures/visual-ocr.tasks.example.json +5 -1
  17. package/docs/benchmark-workflow-examples.md +6 -2
  18. package/docs/benchmark-workflows/self-hosted-metrics-ledger.example.jsonl +1 -0
  19. package/docs/cache-diagnostics-schema.md +25 -4
  20. package/docs/experimental-benchmark-fixtures.md +17 -6
  21. package/docs/mac-visibility-feasibility-schema.md +62 -0
  22. package/docs/mac-visibility-feasibility.example.json +130 -0
  23. package/package.json +5 -1
  24. package/packaging/homebrew/context-guard.rb.template +1 -1
  25. package/plugins/context-guard/.claude-plugin/plugin.json +1 -1
  26. package/plugins/context-guard/README.ko.md +3 -3
  27. package/plugins/context-guard/README.md +3 -3
  28. package/plugins/context-guard/bin/context-guard-audit +443 -1
  29. package/plugins/context-guard/bin/context-guard-bench +244 -6
@@ -0,0 +1,22 @@
1
+ Fixture-only cropped/OCR prompt for multimodal crop/OCR experiment setup.
2
+
3
+ You are reviewing sanitized textual cropped/OCR-derived evidence only. This fixture contains no screenshot, image file, image URL, private path, endpoint, crop helper, OCR helper, visual-token helper, or provider call.
4
+
5
+ Cropped or OCR-derived evidence:
6
+ - original image dimensions telemetry: width 1200, height 800
7
+ - crop area telemetry: x 80, y 120, width 640, height 360
8
+ - visible area: checkout form region with validation text `Card number required`
9
+ - OCR text: `Card number required`; `Continue`
10
+ - OCR confidence telemetry: 0.96 for validation text; 0.92 for navigation label
11
+ - OCR error notes: footer notice omitted by crop; decorative icon ignored
12
+ - omitted context: footer notice `Sandbox order` and page-wide navigation are not visible in the crop
13
+ - missed-context guardrail: if the answer depends on omitted context, fall back to full visual evidence before judging success
14
+ - full visual fallback: rerun with baseline_full_visual_fixture evidence when OCR confidence is low, crop area excludes required context, or human correction is needed
15
+
16
+ Task:
17
+ 1. Identify the navigation control associated with the visible validation error.
18
+ 2. List any missed or omitted context that could change the answer.
19
+ 3. State that crop area, OCR text, OCR confidence, and byte counts are proxy or telemetry evidence only, not hosted API token or cost savings evidence.
20
+ 4. State that real comparisons require provider-measured image/text token or cost fields when available, matched successful tasks, failure-rate guardrail, human corrections, and shifted-cost accounting.
21
+
22
+ This prompt is dry-run-only fixture scaffolding and does not claim hosted API savings.
@@ -0,0 +1,19 @@
1
+ Fixture-only full visual prompt for multimodal crop/OCR experiment setup.
2
+
3
+ You are reviewing sanitized textual visual evidence only. This fixture contains no screenshot, image file, image URL, private path, endpoint, crop helper, OCR helper, visual-token helper, or provider call.
4
+
5
+ Full visual evidence:
6
+ - image dimensions telemetry: width 1200, height 800
7
+ - visible area: full page region
8
+ - checkout form context: navigation button label `Continue`, validation text `Card number required`, footer notice `Sandbox order`
9
+ - missed context risk: none for this baseline because the full visual description is present
10
+ - OCR confidence telemetry: not applicable for full visual baseline
11
+ - OCR error notes: not applicable for full visual baseline
12
+
13
+ Task:
14
+ 1. Identify the navigation control associated with the visible validation error.
15
+ 2. State what evidence would need to remain visible if a cropped or OCR-derived variant is used.
16
+ 3. State that image dimensions and visible area are telemetry only, not hosted API token or cost savings evidence.
17
+ 4. State that real comparisons require provider-measured image/text token or cost fields when available, matched successful tasks, failure-rate guardrail, human corrections, and shifted-cost accounting.
18
+
19
+ This prompt is dry-run-only fixture scaffolding and does not claim hosted API savings.
@@ -8,7 +8,11 @@
8
8
  "max_budget_usd": 1.0,
9
9
  "allowed_tools": [],
10
10
  "success_command": "python3 -c \"raise SystemExit('fixture-only placeholder: replace success_command before real benchmark runs')\"",
11
- "success_cwd": "."
11
+ "success_cwd": ".",
12
+ "variant_prompt_files": {
13
+ "baseline_full_visual_fixture": "visual-ocr-full-visual.prompt.example.md",
14
+ "fixture_only_cropped_or_ocr_evidence": "visual-ocr-cropped-ocr.prompt.example.md"
15
+ }
12
16
  },
13
17
  {
14
18
  "id": "visual_ocr_table_status_fixture",
@@ -17,6 +17,7 @@ Use them to decide what evidence a workflow has and what it does **not** prove:
17
17
  | [`benchmark-workflows/context-pack-byte-proxy.example.json`](benchmark-workflows/context-pack-byte-proxy.example.json) | `context-guard-pack auto` can reduce selected local bytes and inferred token proxies. | No hosted API token-savings claim because primary provider token fields are unavailable. |
18
18
  | [`benchmark-workflows/provider-cache-telemetry.example.json`](benchmark-workflows/provider-cache-telemetry.example.json) | Cache-layout diagnostics can coincide with observed provider cached-token telemetry. | Provider-cache telemetry is not proof that ContextGuard reduced prompt tokens or cost. |
19
19
  | [`benchmark-workflows/measured-token-workflow.example.json`](benchmark-workflows/measured-token-workflow.example.json) | A matched successful task pair with measured primary tokens may expose `token_savings_pct`. | The percentage is sample report data only, not a general savings promise; real claims require your own matched successful task runs and quality gates. |
20
+ | [`benchmark-workflows/self-hosted-metrics-ledger.example.jsonl`](benchmark-workflows/self-hosted-metrics-ledger.example.jsonl) | A run-evidence JSONL row can carry explicit local/model-server latency, peak-memory, and quality sidecar metrics. | Self-hosted metrics are not hosted API token/cost telemetry and do not change report savings math. |
20
21
 
21
22
  ## How to use the examples
22
23
 
@@ -24,6 +25,7 @@ Use them to decide what evidence a workflow has and what it does **not** prove:
24
25
  2. Compare your report's `claim_status`, `summary_by_variant`, and `comparisons[].quality_gate` to the examples.
25
26
  3. Treat `comparisons[].quality_gate != "pass"` as a warning to inspect failures, correction burden, and unmatched tasks before discussing savings.
26
27
  4. Keep byte-proxy, provider-cache, wall-time, and shifted-cost evidence in separate language from provider-measured token/cost claims. Provider-cache telemetry is not independent savings proof.
28
+ 5. Keep self-hosted local/model-server latency, memory, and quality metrics in the run-evidence ledger sidecar; do not fold them into hosted API token/cost savings claims unless provider-measured matched-task evidence separately supports that claim.
27
29
 
28
30
  ## Safe wording
29
31
 
@@ -35,6 +37,8 @@ Avoid language like:
35
37
 
36
38
  > ContextGuard guarantees this workflow will save tokens or cost.
37
39
 
38
- The fixtures intentionally use full `context-guard-bench-report-v1` shapes so tests can catch schema drift and overclaim wording.
40
+ The `.example.json` fixtures intentionally use full `context-guard-bench-report-v1` shapes so tests can catch schema drift and overclaim wording.
39
41
 
40
- For task/variant starter fixtures rather than full report-shape examples, see [`experimental-benchmark-fixtures.md`](experimental-benchmark-fixtures.md). Those files are fixture-only and synthetic dry-run-only starters until users replace the placeholder prompts and success checks; they are not shipped OCR, visual-token, or learned-compression runtime features, and real claims still require provider-measured matched successful tasks plus failure-rate, correction, and shifted-cost guardrails.
42
+ The self-hosted metrics example is a JSONL run-evidence sidecar, not a full report shape. Its fields are additive ledger evidence only: `latency_ms`, `peak_memory_mb`, and normalized `quality_score` describe local/model-server behavior and leave hosted API report calculations unchanged.
43
+
44
+ For task/variant starter fixtures rather than full report-shape examples, see [`experimental-benchmark-fixtures.md`](experimental-benchmark-fixtures.md). Those files are fixture-only and synthetic dry-run-only starters until users replace the placeholder prompts and success checks; they are not shipped OCR, visual-token, learned-compression, or output-transform benchmark results, and real claims still require provider-measured matched successful tasks plus failure-rate, correction, and shifted-cost guardrails.
@@ -0,0 +1 @@
1
+ {"schema_version":"contextguard.bench.run-evidence.v1","task_id":"self-hosted-demo","variant":"local-cache-reuse","success":true,"primary_tokens_measured":false,"primary_tokens":0,"primary_cost_measured":false,"primary_cost_usd":0.0,"external_tokens_measured":false,"external_tokens":0,"external_cost_measured":false,"external_cost_usd":0.0,"total_cost_with_shift_usd":null,"wall_time_seconds":0.0,"measurement_availability":{"primary_tokens":false,"primary_cost":false,"external_tokens":false,"external_cost":false,"shifted_cost":false,"provider_cache":false,"byte_metrics":false,"wall_time":true,"self_hosted_metrics":true},"self_hosted_metrics":{"schema_version":"contextguard.bench.self-hosted-metrics.v1","source":"explicit_provider_payload.self_hosted_metrics","metrics":{"latency_ms":842.5,"peak_memory_mb":14336.0,"quality_score":0.98},"labels":{"model_server":"local test server","optimization":"prefix cache reuse","quality_metric":"golden task pass rate"},"measurement_availability":{"latency_ms":true,"peak_memory_mb":true,"quality_score":true},"claim_boundary":{"id":"self_hosted_metrics_only_not_hosted_api_token_or_cost_savings","hosted_api_token_savings_claim_allowed":false,"hosted_api_cost_savings_claim_allowed":false,"requires_provider_measured_matched_tasks_for_hosted_claims":true,"reason":"Self-hosted local/model-server latency, memory, and quality metrics are not hosted API token or cost telemetry."}},"proxy_metrics":{"byte_metrics_observed":false,"token_proxy":"chars_div_4","bytes_per_token":4,"claim_boundary":"proxy_only_not_hosted_token_savings"},"notes":"Synthetic JSONL shape only; make hosted API savings claims only from provider-measured matched successful task evidence."}
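The JSONL row above keeps self-hosted metrics in a sidecar object with its own `measurement_availability` flags. As a minimal sketch of how a consumer might read such a row, the hypothetical helper below (`read_self_hosted_metrics` is not part of the package) keeps only the metrics that are explicitly flagged available. It reports separately whether primary tokens and cost were provider-measured, so the sidecar values are never mistaken for hosted API evidence.

```python
import json


def read_self_hosted_metrics(jsonl_line: str) -> dict:
    """Parse one run-evidence JSONL row and return its usable sidecar metrics.

    Hypothetical helper for illustration; field names follow the example row.
    """
    row = json.loads(jsonl_line)
    sidecar = row.get("self_hosted_metrics") or {}
    available = sidecar.get("measurement_availability") or {}
    metrics = sidecar.get("metrics") or {}

    # Keep only metrics whose availability flag is explicitly true.
    usable = {name: value for name, value in metrics.items() if available.get(name) is True}

    return {
        "task_id": row.get("task_id"),
        "variant": row.get("variant"),
        "self_hosted": usable,
        # Hosted API claims depend on provider-measured fields, not on the sidecar.
        "provider_measured": bool(row.get("primary_tokens_measured"))
        and bool(row.get("primary_cost_measured")),
    }
```

For the example row, `provider_measured` would be false, so the latency, memory, and quality values stay ledger evidence only.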
@@ -2,7 +2,9 @@
2
2
 
3
3
  `cache_diagnostics` is the nested diagnostic object emitted by `context-guard-audit --json` and by top-level `cache_diagnostics` in `context-guard-audit --feasibility-json`. The committed schema file, [`cache-diagnostics.schema.json`](cache-diagnostics.schema.json), describes that nested object only; it is not the full CLI response envelope.
4
4
 
5
- The object is for GUI and external consumers that need stable cache-read, prefix-layout, TTL-evidence, and headroom-boundary fields without scraping prose. It is a local transcript diagnostic contract, not a billing source, not provider telemetry verification, and not a token or cost savings promise.
5
+ The object is for GUI and external consumers that need stable cache-read, prefix-layout, TTL-evidence, and headroom-boundary fields without scraping prose. It is a local transcript diagnostic contract, not a billing source, not provider telemetry verification, and not a token or cost savings promise. It does not guarantee savings, does not prove provider cache hits, and does not infer live headroom.
6
+
7
+ `context-guard-audit` also emits a top-level sibling `cache_layout_advice` object. That sibling is intentionally separate from `cache_diagnostics`: diagnostics stay evidence-oriented, while advice ranks checks and experiments such as session splitting, prefix stabilization, and context-diet scans. Advice distinguishes an `observed_issue` from `hypothesized_causes`, `corroborated_causes`, and `next_checks`; without diet or structural evidence, volatile prefix positions should be presented as hypotheses to check, not confirmed root causes.
6
8
 
7
9
  ## Files
8
10
 
@@ -13,11 +15,30 @@ The object is for GUI and external consumers that need stable cache-read, prefix
13
15
 
14
16
  ### `context-guard-audit --json`
15
17
 
16
- The legacy audit JSON includes top-level `cache_diagnostics` beside `cache_metrics` and `cache_friendliness`.
18
+ The legacy audit JSON includes top-level `cache_diagnostics` beside `cache_metrics`, `cache_friendliness`, and the separate `cache_layout_advice` advice object.
17
19
 
18
20
  ### `context-guard-audit --feasibility-json`
19
21
 
20
- The feasibility JSON includes top-level `cache_diagnostics` and lists `cache_diagnostics` in `consumer_contract.stable_top_level_fields`. GUI consumers should prefer the top-level feasibility field when available and use `summary.cache_diagnostics` only for legacy compatibility.
22
+ The feasibility JSON includes top-level `cache_diagnostics` and `cache_layout_advice`, and lists both in `consumer_contract.stable_top_level_fields`. GUI consumers should prefer the top-level feasibility field when available and use `summary.cache_diagnostics` only for legacy compatibility.
23
+
24
+ ## Sibling `cache_layout_advice` fields
25
+
26
+ `cache_layout_advice` is a stable top-level sibling of `cache_diagnostics`, but it is deliberately not part of `cache-diagnostics.schema.json`. It is an advice contract over local transcript heuristics.
27
+
28
+ | Field | Meaning | Consumer note |
29
+ | --- | --- | --- |
30
+ | `schema_version` | Stable version string, currently `contextguard.cache-layout-advice.v1`. | Treat unknown versions conservatively. |
31
+ | `status` | Advice availability: `available`, `partial`, or `missing`. | `partial` means prompt/cache evidence was capped, skipped, or incomplete. |
32
+ | `confidence` | Overall advice confidence: `hypothesis`, `partial`, or `unavailable`. | Never present as provider truth or billing proof. |
33
+ | `heuristic` | Always `true` for v1. | UI should label advice as heuristic. |
34
+ | `observed_issue` | Primary observed layout issue: `volatile_prefix_breaker`, `long_session_accumulation`, `low_cache_reuse`, `missing_cache_fields`, or `unknown`. | This is an observed/audited symptom, not a confirmed cause. |
35
+ | `priority` | Suggested priority bucket (`P0`, `P1`, or `P2`). | Use for ordering checks, not for savings claims. |
36
+ | `observed_summary` | Sanitized numeric summary such as cache creation/read tokens, prefix shares, breaker position, and dominant transcript share. | Contains aggregate counts/shares only, not raw prompt text. |
37
+ | `hypothesized_causes` | Candidate causes to investigate, each with `id`, `confidence`, `evidence`, `reason`, and `next_check`. | Keep separate from confirmed causes. |
38
+ | `corroborated_causes` | Causes supported by independent evidence beyond prefix-position heuristics. | Empty means no cause has been confirmed. |
39
+ | `next_checks` | Evidence-gathering checks with `id`, `confidence`, `command_templates`, and `evidence_required_for_corroboration`. | Templates use placeholders such as `<repo>` and must not embed observed local paths. |
40
+ | `recommended_experiments` | Ordered experiments with `id`, `order`, `priority`, `effort`, `action`, `expected_signal`, and `verification`. | Run in `order`; compare matched audit windows before claiming improvement. |
41
+ | `caveats` | User-facing boundaries for claims and evidence limits. | Preserve these in GUI summaries and reports. |
21
42
 
22
43
  ## Top-level fields
23
44
 
@@ -72,4 +93,4 @@ Historical transcript scans do not carry live context-window state. `headroom_di
72
93
 
73
94
  ## Claim boundaries
74
95
 
75
- `cache_diagnostics` can help users reorganize prompts, find volatile prefix segments, and identify missing evidence. It does not guarantee savings, does not verify provider cache state, is not billing authority, does not prove provider cache hits, and does not infer live headroom from historical token totals.
96
+ `cache_diagnostics` and the sibling `cache_layout_advice` can help users reorganize prompts, find volatile prefix segments, and identify missing evidence or next checks. They do not guarantee savings, do not verify provider cache state, are not billing authority, do not prove provider cache hits, and do not infer live headroom from historical token totals.
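The `cache_layout_advice` field table above asks consumers to treat unknown versions conservatively and to keep hypothesized causes separate from corroborated ones. The following is a rough sketch of that handling, not the package's own code. The function name `summarize_advice` and the returned dictionary shape are illustrative, and it assumes entries in `corroborated_causes` carry an `id` like the hypothesized ones, which the table does not state.

```python
KNOWN_ADVICE_VERSION = "contextguard.cache-layout-advice.v1"


def _cause_ids(entries) -> list:
    """Collect `id` values from a list of cause objects, skipping malformed entries."""
    return [e["id"] for e in entries or [] if isinstance(e, dict) and "id" in e]


def summarize_advice(envelope: dict) -> dict:
    """Build a display-safe summary of cache_layout_advice (illustrative only)."""
    advice = envelope.get("cache_layout_advice")

    # Unknown or missing versions are treated conservatively: show nothing.
    if not isinstance(advice, dict) or advice.get("schema_version") != KNOWN_ADVICE_VERSION:
        return {"usable": False, "hypotheses": [], "confirmed": []}
    if advice.get("status") == "missing":
        return {"usable": False, "hypotheses": [], "confirmed": []}

    return {
        "usable": True,
        "heuristic": True,  # v1 advice is always heuristic per the table
        "observed_issue": advice.get("observed_issue", "unknown"),
        # Hypotheses and corroborated causes stay in separate lists.
        "hypotheses": _cause_ids(advice.get("hypothesized_causes")),
        "confirmed": _cause_ids(advice.get("corroborated_causes")),
    }
```

An empty `confirmed` list means no cause has been corroborated, so a UI built on this sketch would present the `hypotheses` entries as checks to run rather than as root causes.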
@@ -1,6 +1,6 @@
1
1
  # Experimental benchmark fixtures
2
2
 
3
- These fixtures are **fixture-only** starter scaffolds for future visual/OCR and learned-compression experiments. They are **synthetic**, package-visible examples for `context-guard-bench` task and variant shapes; they are **not a shipped runtime feature**, not an OCR/compression implementation, and not a hosted API savings claim.
3
+ These fixtures are **fixture-only** starter scaffolds for future visual/OCR, learned-compression, and reversible output-transform experiments. They are **synthetic**, package-visible examples for `context-guard-bench` task and variant shapes; they are **not shipped benchmark results**, not OCR/compression implementations, and not hosted API savings claims.
4
4
 
5
5
  Use them when designing an experiment that starts from ContextGuard's existing benchmark discipline:
6
6
 
@@ -12,20 +12,31 @@ Use them when designing an experiment that starts from ContextGuard's existing b
12
12
  5. Treat byte counts, image dimensions, OCR confidence, and local compressor ratios as proxy evidence. Real token/cost claims require **provider-measured** primary token/cost fields on both sides.
13
13
  6. Keep private screenshots, raw secrets, and external service endpoints out of fixture files.
14
14
 
15
+ ## Runner-native variant prompt files
16
+
17
+ `context-guard-bench` supports optional file-backed `variant_prompt_files` in task fixtures. The map is keyed by variant name and lets a single logical task swap sanitized prompt evidence per variant, for example a baseline raw-output prompt versus a digest plus artifact receipt prompt. Prompt files are resolved relative to the task JSON, must be relative paths, and are read with the same no-follow/symlink-safe posture as task and variant fixtures.
18
+
19
+ This runner-native swap only proves command shape and prompt selection until the user supplies real sanitized tasks, success checks, and provider telemetry. It does **not** make dry-run output, artifact receipts, byte counts, or digest metadata into token/cost savings evidence. For real non-dry-run output-transform experiments, keep task IDs matched across baseline and digest variants and require provider-measured primary token/cost fields on matched successful tasks before making any comparison claim.
20
+
15
21
  ## Included fixture sets
16
22
 
17
23
  | Fixture set | Task file | Variant file | Intended future experiment |
18
24
  | --- | --- | --- | --- |
19
- | Visual/OCR evidence | [`benchmark-fixtures/visual-ocr.tasks.example.json`](benchmark-fixtures/visual-ocr.tasks.example.json) | [`benchmark-fixtures/visual-ocr.variants.example.json`](benchmark-fixtures/visual-ocr.variants.example.json) | Compare full visual evidence against cropped or OCR-derived evidence after the user supplies sanitized artifacts and provider telemetry. |
20
- | Learned compression | [`benchmark-fixtures/learned-compression.tasks.example.json`](benchmark-fixtures/learned-compression.tasks.example.json) | [`benchmark-fixtures/learned-compression.variants.example.json`](benchmark-fixtures/learned-compression.variants.example.json) | Compare baseline context packs or artifact digests against a future learned-compression candidate after quality gates and shifted costs are measured. |
25
+ | Visual/OCR evidence | [`benchmark-fixtures/visual-ocr.tasks.example.json`](benchmark-fixtures/visual-ocr.tasks.example.json) | [`benchmark-fixtures/visual-ocr.variants.example.json`](benchmark-fixtures/visual-ocr.variants.example.json) | Compare full visual evidence against cropped or OCR-derived evidence after the user supplies sanitized textual evidence, missed-context notes, crop/OCR telemetry, and provider telemetry. |
26
+ | Learned compression | [`benchmark-fixtures/learned-compression.tasks.example.json`](benchmark-fixtures/learned-compression.tasks.example.json) | [`benchmark-fixtures/learned-compression.variants.example.json`](benchmark-fixtures/learned-compression.variants.example.json) | Compare sanitized baseline context packs against a fixture-only compressed digest candidate after exact retrieval or receipt fallback, quality gates, and shifted costs are measured. |
27
+ | Reversible output transform | [`benchmark-fixtures/output-transform.tasks.example.json`](benchmark-fixtures/output-transform.tasks.example.json) | [`benchmark-fixtures/output-transform.variants.example.json`](benchmark-fixtures/output-transform.variants.example.json) | Compare raw sanitized command output against a digest plus artifact receipt after variant prompt files, success checks, and provider telemetry are supplied. |
21
28
 
22
29
  ## Visual/OCR fixture notes
23
30
 
24
- The visual/OCR fixtures describe placeholder evidence only. They do not crop images, run OCR, prune visual tokens, or call a model. Future experiments should record image dimensions, crop area, OCR confidence/error notes, provider image/text token telemetry when available, task success, corrections, and any external/local processing cost.
31
+ The visual/OCR fixtures describe sanitized textual visual evidence only and now demonstrate `variant_prompt_files` for full visual evidence versus cropped/OCR-derived evidence. They do not include image assets, crop images, run OCR, prune visual tokens, or call a model. Future experiments should record image dimensions, crop area, visible area, omitted or missed context, OCR confidence/error notes, full visual fallback conditions, provider image/text token telemetry when available, task success, corrections, and any external/local processing cost.
25
32
 
26
33
  ## Learned-compression fixture notes
27
34
 
28
- The learned-compression fixtures describe already-sanitized context-pack or artifact-digest comparisons. They do not invoke LLMLingua-style, gist-token, latent-context, or reranking implementations. Future experiments should preserve exact retrieval for lossy transforms where possible and record bytes before/after, primary provider tokens, cost, success, corrections, compressor latency, and external cost.
35
+ The learned-compression fixtures describe already-sanitized context-pack or artifact-digest comparisons and now demonstrate `variant_prompt_files` for baseline context-pack evidence versus a fixture-only compressed digest candidate. They do not invoke LLMLingua-style, gist-token, latent-context, embedding, or reranking implementations. Future experiments must follow a sanitized evidence only rule, keep protected evidence exact or receipt-retrievable, forbid semantic rewrites of identifiers, numeric constants, hashes, paths, quoted strings, stack frames, JSON keys, code fences, and diff zones, and record bytes before/after, primary provider tokens, cost, success, corrections, compressor latency, and external cost.
36
+
37
+ ## Reversible output-transform fixture notes
38
+
39
+ The output-transform fixtures describe already-sanitized command output comparisons and now demonstrate `variant_prompt_files` for raw sanitized output versus digest plus artifact receipt prompt evidence. They do not execute `context-guard-trim-output`, store artifacts, call `context-guard-artifact`, or invoke a provider. Future experiments should compare raw sanitized output against `--digest` output plus an `--artifact-receipt`, verify the receipt's exact re-expand command retrieves the omitted sanitized lines, and record bytes before/after, primary provider tokens, cost, success, corrections, artifact-store usage, and any external/local processing cost.
29
40
 
30
41
  ## Safe wording
31
42
 
@@ -33,4 +44,4 @@ Use language like:
33
44
 
34
45
  > This synthetic fixture validates benchmark task/variant shape only. A real claim needs provider-measured token/cost data for matched successful baseline and variant tasks, plus failure-rate, correction, and shifted-cost guardrails.
35
46
 
36
- Avoid language that presents dry-run output, bytes saved, OCR text, or compressor ratios as hosted API token/cost savings evidence.
47
+ Avoid language that presents dry-run output, bytes saved, OCR text, artifact receipts, exact re-expand handles, or compressor ratios as hosted API token/cost savings evidence.
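The `variant_prompt_files` rules described in the fixture notes (keyed by variant name, resolved relative to the task JSON, relative paths only, symlink-safe) can be sketched as below. This is an assumption-laden illustration rather than the runner's actual logic. `resolve_prompt_file` is a hypothetical name, rejecting `..` segments is an extra precaution not stated in the docs, and the symlink check covers only the final path component.

```python
from pathlib import Path
from typing import Optional


def resolve_prompt_file(task_json: Path, task: dict, variant: str) -> Optional[Path]:
    """Return the prompt path for a variant, or None if the task defines none."""
    mapping = task.get("variant_prompt_files") or {}
    rel = mapping.get(variant)
    if rel is None:
        return None

    rel_path = Path(rel)
    # Prompt files must be relative paths; parent traversal is also refused here
    # as an assumed hardening step.
    if rel_path.is_absolute() or ".." in rel_path.parts:
        raise ValueError(f"prompt file for {variant!r} must be a plain relative path")

    # Resolve against the directory that holds the task JSON.
    candidate = task_json.parent / rel_path

    # Approximate a no-follow posture by refusing a symlinked final component.
    if candidate.is_symlink():
        raise ValueError(f"prompt file for {variant!r} must not be a symlink")
    return candidate
```

With the visual/OCR fixture, looking up `baseline_full_visual_fixture` would yield a path next to the task file ending in `visual-ocr-full-visual.prompt.example.md`. Selecting a prompt this way shows command shape only and says nothing about token or cost savings.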
@@ -0,0 +1,62 @@
1
+ # macOS visibility feasibility contract
2
+
3
+ `context-guard-audit --feasibility-json` emits a local, pre-GUI contract for future macOS-visible surfaces such as a menu-bar app, xbar item, Raycast command, or SwiftUI prototype. It is a transcript-scan contract, not a GUI implementation and not a live daemon.
4
+
5
+ The full feasibility envelope is versioned as `contextguard.metric-feasibility.v1.3`. The macOS binding/index inside that envelope is the top-level `mac_visibility` object with nested `schema_version: contextguard.mac-visibility.v1`.
6
+
7
+ ## Contract boundary
8
+
9
+ - `mac_visibility` is a thin index over stable top-level feasibility fields. It does not recompute totals and does not read diagnostic `summary`.
10
+ - GUI or menu-bar consumers should bind only to fields listed in `consumer_contract.stable_top_level_fields` and `mac_visibility.bind_to_top_level_fields`.
11
+ - `summary` is diagnostic/backward-compatible payload only. It may be shown in a debug panel, but it must not drive primary cards.
12
+ - Historical transcript scans do not include live context-window state. Context and headroom cards stay `missing` until a future surface provides `live_statusline_snapshot`.
13
+ - Values are local transcript observations. They are not invoice-grade billing records, do not prove provider cache hits, and do not guarantee token or cost savings.
14
+
15
+ ## Stable `mac_visibility` keys
16
+
17
+ | Key | Meaning |
18
+ | --- | --- |
19
+ | `schema_version` | Nested contract version: `contextguard.mac-visibility.v1`. |
20
+ | `surface_kind` | Local surface family; currently `local_macos_visibility_contract`. |
21
+ | `readiness.status` | One of `ready`, `partial`, or `missing`, derived from token availability and scan integrity. |
22
+ | `bind_to_top_level_fields` | Stable top-level fields primary consumers may use. |
23
+ | `diagnostic_only_fields` | Fields that must not drive primary UI; currently `summary`. |
24
+ | `primary_cards` | Ordered card descriptors with `id`, `title`, `status`, and `binding_paths`. |
25
+ | `missing_live_observations` | Required live observations that transcript scans cannot provide. |
26
+ | `claim_boundaries` | Copy-safe caveats for UI labels and docs. |
27
+ | `redaction_required` | Always `true` for default GUI/menu-bar presentation. |
28
+
29
+ ## Card IDs and binding paths
30
+
31
+ `primary_cards[*].binding_paths` use dotted paths inside the feasibility envelope. Current card IDs are:
32
+
33
+ 1. `source_freshness` → `source_kind`, `source_freshness.status`, `source_freshness.generated_at`
34
+ 2. `scan_integrity` → scan completeness and skipped counts
35
+ 3. `token_totals` → `totals.total_tokens` and `totals.tokens.*`
36
+ 4. `cache_reuse` → `totals.cache_read_share`, `totals.cache_reuse_ratio`, `metric_availability.cache`
37
+ 5. `observed_cost` → `totals.cost_usd_observed`, `metric_availability.cost`
38
+ 6. `context_availability` → `context_availability`, `metric_availability.context`
39
+ 7. `headroom_availability` → `headroom_availability`, `cache_diagnostics.headroom_diagnostics`
40
+ 8. `cache_layout_advice` → `cache_layout_advice`, `cache_friendliness`, `cache_diagnostics.dynamic_prefix_breakers`
41
+
42
+ When a card includes `required_observation: live_statusline_snapshot`, consumers should show an unavailable or setup state rather than treating the value as zero.
43
+
44
+ ## Example
45
+
46
+ See [`mac-visibility-feasibility.example.json`](mac-visibility-feasibility.example.json) for an abridged feasibility envelope. It keeps `summary` out of primary bindings and demonstrates the missing live context/headroom boundary.
47
+
48
+ ## Verification guidance
49
+
50
+ For a local fixture:
51
+
52
+ ```bash
53
+ context-guard-audit ./fixtures/transcripts --feasibility-json --recommend
54
+ ```
55
+
56
+ Then verify:
57
+
58
+ - `schema_version == "contextguard.metric-feasibility.v1.3"`
59
+ - `consumer_contract.stable_top_level_fields` contains `mac_visibility`
60
+ - `mac_visibility.diagnostic_only_fields` contains `summary`
61
+ - no `primary_cards[*].binding_paths` entry starts with `summary`
62
+ - `missing_live_observations[*].required_observation` names `live_statusline_snapshot` when context/headroom are missing
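The verification checklist above could be automated along these lines. This is a hedged sketch, not a shipped validator. `check_feasibility_envelope` is a hypothetical function, and it assumes `missing_live_observations` is a list of objects under `mac_visibility`, as the key table and the last bullet suggest.

```python
EXPECTED_ENVELOPE_VERSION = "contextguard.metric-feasibility.v1.3"
LIVE_OBSERVATION = "live_statusline_snapshot"


def check_feasibility_envelope(envelope: dict) -> list:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if envelope.get("schema_version") != EXPECTED_ENVELOPE_VERSION:
        problems.append("unexpected schema_version")

    contract = envelope.get("consumer_contract") or {}
    if "mac_visibility" not in (contract.get("stable_top_level_fields") or []):
        problems.append("mac_visibility is not listed as a stable top-level field")

    mac = envelope.get("mac_visibility") or {}
    if "summary" not in (mac.get("diagnostic_only_fields") or []):
        problems.append("summary is not marked diagnostic-only")

    # No primary card may bind to the diagnostic summary payload.
    for card in mac.get("primary_cards") or []:
        for path in card.get("binding_paths") or []:
            if path.startswith("summary"):
                problems.append(f"card {card.get('id')} binds to {path}")

    # When context or headroom is missing, the live observation must be named.
    missing_live = any(
        (envelope.get(key) or {}).get("status") == "missing"
        for key in ("context_availability", "headroom_availability")
    )
    named = {
        obs.get("required_observation")
        for obs in mac.get("missing_live_observations") or []
        if isinstance(obs, dict)
    }
    if missing_live and LIVE_OBSERVATION not in named:
        problems.append("context/headroom missing without a live_statusline_snapshot requirement")
    return problems
```

One way to use it is to load the output of the `context-guard-audit ... --feasibility-json` command with `json.load` and fail the check when the returned list is non-empty.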
@@ -0,0 +1,130 @@
1
+ {
2
+ "schema_version": "contextguard.metric-feasibility.v1.3",
3
+ "producer": "context-guard-audit",
4
+ "generated_at": "2026-06-08T12:00:00Z",
5
+ "consumer_contract": {
6
+ "stable_top_level_fields": [
7
+ "schema_version",
8
+ "producer",
9
+ "generated_at",
10
+ "source_kind",
11
+ "source_freshness",
12
+ "scan_integrity",
13
+ "metric_availability",
14
+ "metric_caveats",
15
+ "redaction_mode",
16
+ "context_availability",
17
+ "headroom_availability",
18
+ "cache_friendliness",
19
+ "cache_diagnostics",
20
+ "cache_layout_advice",
21
+ "mac_visibility",
22
+ "totals"
23
+ ],
24
+ "diagnostic_fields": ["summary"],
25
+ "summary_contract": "summary is the legacy audit JSON payload for diagnostics and backward compatibility; new GUI prototypes should bind to stable top-level feasibility fields first."
26
+ },
27
+ "source_kind": "historical_transcript_scan",
28
+ "source_freshness": {
29
+ "status": "snapshot_at_scan_time",
30
+ "live": false,
31
+ "generated_at": "2026-06-08T12:00:00Z",
32
+ "description": "Local transcript files were scanned when this report was generated; this is not a live statusline snapshot."
33
+ },
34
+ "scan_integrity": {
35
+ "status": "complete",
36
+ "files_scanned": 1,
37
+ "records_scanned": 1,
38
+ "skipped_files": 0,
39
+ "skipped_records": 0,
40
+ "parse_error_count": 0,
41
+ "complete": true
42
+ },
43
+ "metric_availability": {
44
+ "tokens": {"status": "available", "present_fields": {"input": 1, "output": 1, "cache_read": 1, "cache_creation": 1}, "evidence": "observed"},
45
+ "cache": {"status": "available", "present_fields": {"cache_read": 1, "cache_creation": 1}, "zero_values_observed": {"cache_read": false, "cache_creation": false}, "evidence": "observed"},
46
+ "cost": {"status": "available", "present_count": 1, "observed_cost_usd": 0.1234, "evidence": "observed"},
47
+ "context": {"status": "missing", "evidence": "unavailable", "reason": "Transcript scans do not include live Claude Code context_window data. Pass a live statusline snapshot in a future surface to populate context availability."},
48
+ "headroom": {"status": "missing", "evidence": "unavailable", "reason": "Transcript scans do not carry live context-window or remaining-token data, so context headroom cannot be observed or conservatively inferred from history alone.", "observable_via": "live_statusline_snapshot"}
49
+ },
50
+ "metric_caveats": [
51
+ "Values are observed from local Claude Code transcript JSON/JSONL fields and are not official billing records.",
52
+ "cache-read share is cache_read / (input + cache_read + cache_creation), not a provider billing hit-rate."
53
+ ],
54
+ "redaction_mode": {
55
+ "paths": "basename_plus_stable_hash_by_default",
56
+ "commands": "command_category_plus_stable_hash_by_default",
57
+ "secret_like_values": "pattern_redacted",
58
+ "raw_path_and_command_flags": ["--show-paths", "--show-commands"]
59
+ },
60
+ "context_availability": {"status": "missing", "evidence": "unavailable", "reason": "Transcript scans do not include live Claude Code context_window data. Pass a live statusline snapshot in a future surface to populate context availability."},
61
+ "headroom_availability": {"status": "missing", "evidence": "unavailable", "observable_via": "live_statusline_snapshot", "reason": "Transcript scans do not carry live context-window or remaining-token data, so context headroom cannot be observed or conservatively inferred from history alone."},
62
+ "cache_friendliness": {"status": "partial", "confidence": "partial", "evidence": "observed", "heuristic": true},
63
+ "cache_diagnostics": {
64
+ "schema_version": "contextguard.cache-diagnostics.v1",
65
+ "status": "partial",
66
+ "confidence": "hypothesis",
67
+ "evidence": "inferred",
68
+ "heuristic": true,
69
+ "dynamic_prefix_breakers": [],
70
+ "headroom_diagnostics": {"status": "missing", "evidence": "unavailable", "observable_via": "live_statusline_snapshot", "required_observation": "live_statusline_snapshot", "historical_total_tokens_are_not_headroom": true}
71
+ },
72
+ "cache_layout_advice": {
73
+ "schema_version": "contextguard.cache-layout-advice.v1",
74
+ "status": "partial",
75
+ "confidence": "partial",
76
+ "priority": "P1",
77
+ "observed_issue": "long_session_accumulation"
78
+ },
79
+ "mac_visibility": {
80
+ "schema_version": "contextguard.mac-visibility.v1",
81
+ "surface_kind": "local_macos_visibility_contract",
82
+ "readiness": {
83
+ "status": "ready",
84
+ "reason": "Transcript token totals are available and the scan completed within configured limits."
85
+ },
86
+ "bind_to_top_level_fields": [
87
+ "source_kind",
88
+ "source_freshness",
89
+ "scan_integrity",
90
+ "metric_availability",
91
+ "metric_caveats",
92
+ "redaction_mode",
93
+ "context_availability",
94
+ "headroom_availability",
95
+ "cache_friendliness",
96
+ "cache_diagnostics",
97
+ "cache_layout_advice",
98
+ "totals"
99
+ ],
100
+ "diagnostic_only_fields": ["summary"],
101
+ "primary_cards": [
102
+ {"id": "source_freshness", "title": "Source freshness", "status": "available", "binding_paths": ["source_kind", "source_freshness.status", "source_freshness.generated_at"]},
103
+ {"id": "scan_integrity", "title": "Scan integrity", "status": "complete", "binding_paths": ["scan_integrity.status", "scan_integrity.files_scanned", "scan_integrity.records_scanned", "scan_integrity.skipped_files", "scan_integrity.skipped_records"]},
104
+ {"id": "token_totals", "title": "Token totals", "status": "available", "binding_paths": ["totals.total_tokens", "totals.tokens.input", "totals.tokens.output", "totals.tokens.cache_read", "totals.tokens.cache_creation"]},
105
+ {"id": "cache_reuse", "title": "Cache-read share and reuse ratio", "status": "available", "binding_paths": ["totals.cache_read_share", "totals.cache_reuse_ratio", "metric_availability.cache"]},
106
+ {"id": "observed_cost", "title": "Observed transcript cost", "status": "available", "binding_paths": ["totals.cost_usd_observed", "metric_availability.cost"]},
107
+ {"id": "context_availability", "title": "Context availability", "status": "missing", "binding_paths": ["context_availability", "metric_availability.context"], "required_observation": "live_statusline_snapshot"},
108
+ {"id": "headroom_availability", "title": "Headroom availability", "status": "missing", "binding_paths": ["headroom_availability", "cache_diagnostics.headroom_diagnostics"], "required_observation": "live_statusline_snapshot"},
109
+ {"id": "cache_layout_advice", "title": "Cache layout advice", "status": "partial", "binding_paths": ["cache_layout_advice", "cache_friendliness", "cache_diagnostics.dynamic_prefix_breakers"]}
110
+ ],
111
+ "missing_live_observations": [
112
+ {"id": "live_context_window", "required_observation": "live_statusline_snapshot", "affects": ["context_availability", "metric_availability.context"], "reason": "Historical transcript scans do not include live Claude Code context_window data."},
113
+ {"id": "live_headroom", "required_observation": "live_statusline_snapshot", "affects": ["headroom_availability", "cache_diagnostics.headroom_diagnostics"], "reason": "Historical transcript totals are not remaining-token or live headroom observations."}
114
+ ],
115
+ "claim_boundaries": [
116
+ "Local transcript observations are not invoice-grade billing records.",
117
+ "Provider cache fields are telemetry, not ContextGuard-caused token reduction and do not prove provider cache hits.",
118
+ "Historical transcript totals do not infer live context headroom or remaining tokens.",
119
+ "This contract does not guarantee token or cost savings."
120
+ ],
121
+ "redaction_required": true
122
+ },
123
+ "totals": {
124
+ "total_tokens": 1150,
125
+ "tokens": {"input": 100, "output": 50, "cache_read": 800, "cache_creation": 200},
126
+ "cost_usd_observed": 0.1234,
127
+ "cache_read_share": 0.7272727272727273,
128
+ "cache_reuse_ratio": 4.0
129
+ }
130
+ }
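The derived values in `totals` can be cross-checked against the token counts. The sketch below uses the `cache_read_share` formula stated in `metric_caveats`. The formulas for `total_tokens` (sum of all four buckets) and `cache_reuse_ratio` (`cache_read / cache_creation`) are assumptions inferred from the example values, not confirmed by the schema, and `derive_totals` is a hypothetical name.

```python
from typing import Optional


def derive_totals(tokens: dict[str, int]) -> dict[str, Optional[float]]:
    """Recompute the derived `totals` fields from raw token buckets."""
    inp, out = tokens["input"], tokens["output"]
    read, creation = tokens["cache_read"], tokens["cache_creation"]

    # Documented in metric_caveats: cache_read / (input + cache_read + cache_creation).
    share_denominator = inp + read + creation

    return {
        # Assumption: total_tokens is the sum of all four buckets.
        "total_tokens": inp + out + read + creation,
        "cache_read_share": read / share_denominator if share_denominator else None,
        # Assumption: reuse ratio is cache_read / cache_creation.
        "cache_reuse_ratio": read / creation if creation else None,
    }


example = derive_totals(
    {"input": 100, "output": 50, "cache_read": 800, "cache_creation": 200}
)
```

With the example token counts this reproduces the fixture's `total_tokens` of 1150, a share of 800/1100, and a reuse ratio of 4.0. As the caveats note, the share is not a provider billing hit-rate.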
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@ictechgy/context-guard",
3
- "version": "0.4.3",
3
+ "version": "0.4.5",
4
4
  "description": "ContextGuard CLI helpers for keeping AI coding agent context focused and local-first.",
5
5
  "license": "Apache-2.0",
6
6
  "homepage": "https://github.com/ictechgy/context-guard#readme",
@@ -55,9 +55,13 @@
55
55
  "docs/cache-diagnostics-schema.md",
56
56
  "docs/cache-diagnostics.schema.json",
57
57
  "docs/cache-diagnostics.example.json",
58
+ "docs/mac-visibility-feasibility-schema.md",
59
+ "docs/mac-visibility-feasibility.example.json",
58
60
  "docs/benchmark-workflows/*.example.json",
61
+ "docs/benchmark-workflows/*.example.jsonl",
59
62
  "docs/benchmark-workflow-examples.md",
60
63
  "docs/benchmark-fixtures/*.example.json",
64
+ "docs/benchmark-fixtures/*.prompt.example.md",
61
65
  "docs/experimental-benchmark-fixtures.md",
62
66
  "packaging/homebrew/context-guard.rb.template"
63
67
  ],
@@ -5,7 +5,7 @@ class ContextGuard < Formula
5
5
 
6
6
  desc "Local-first context guardrails for AI coding agents"
7
7
  homepage "https://github.com/ictechgy/context-guard"
8
- url "https://github.com/ictechgy/context-guard/archive/refs/tags/v0.4.3.tar.gz"
8
+ url "https://github.com/ictechgy/context-guard/archive/refs/tags/v0.4.4.tar.gz"
9
9
  sha256 "REPLACE_WITH_RELEASE_TARBALL_SHA256"
10
10
  license "Apache-2.0"
11
11
 
@@ -37,5 +37,5 @@
37
37
  "gated-experiments",
38
38
  "future-roadmap"
39
39
  ],
40
- "version": "0.4.3"
40
+ "version": "0.4.5"
41
41
  }
@@ -95,9 +95,9 @@ context-guard-statusline-merged
95
95
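  - **Output trimmer** preserves the wrapped command's exit code while shortening long logs, and with `--digest markdown` or `--digest json` it can produce a summary containing runner failure facts, redacted failure signatures, duplicate-line groups, and suggested next queries.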
96
96
  - **Sanitizer** redacts credential patterns, private key blocks, auth headers, URLs containing credentials, and sensitive-looking paths from search, diff, and log output.
97
97
  - **Statusline** shows compact model, context, and cost signals and, when transcript data is available, also shows cache-read and cache-reuse signals.
98
- - **Transcript audit** aggregates usage/cost/cache buckets and reports token hotspots and `cache_friendliness` prompt-layout signals as bounded, redacted segment hashes. It does not print raw prompt text.
98
+ - **Transcript audit** aggregates usage/cost/cache buckets and reports token hotspots, `cache_friendliness` prompt-layout signals, and `cache_layout_advice` check/experiment priorities as bounded, redacted segment hashes. It does not print raw prompt text.
99
99
  - **Repeated-failure nudge** prompts a change of strategy when Bash failures repeat, instead of retrying the same path over and over.
100
- - **Benchmark helper** matches baseline/variant runs and records real token and cost fields, separate byte-reduction proxy evidence, diagnostic `wall_time_seconds`, `provider_cached_tokens`, and provider-cache availability telemetry.
100
+ - **Benchmark helper** matches baseline/variant runs and records real token and cost fields, separate byte-reduction proxy evidence, diagnostic `wall_time_seconds`, `provider_cached_tokens`, provider-cache availability telemetry, file-backed `variant_prompt_files`, and optional per-run `self_hosted_metrics` JSONL ledger sidecars. These sidecars are not merged into hosted API savings claims.
101
101
 
102
102
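  Cost guard's local HMAC key is created automatically at `.context-guard/cost-ledger/hmac.key` by default. If an administrator provisions it directly, the file must contain exactly one canonical URL-safe base64 32-byte key with the required padding; a trailing newline or whitespace is not allowed. Reports do not print the key or raw prompt text, and the local ledger does not replace the Anthropic/provider prompt cache.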
103
103
 
@@ -109,7 +109,7 @@ brief mode asks the coding agent to cut filler, but
109
109
 
110
110
  ## We do not overstate savings figures
111
111
 
112
- These helpers reduce common causes of unnecessary context growth, but they do not guarantee a fixed savings rate. If you need real before/after evidence, measure on your own tasks with `context-guard-bench --ledger-jsonl ... --report-json ...`. Token-savings claims are computed only when both sides of a matched task have `primary_tokens_measured`, and the report's `matched_pair_evidence` links each successful baseline/variant task bucket to the transform, quality gate, measurement availability, and claim boundary. The wall-time and provider-cache fields are diagnostic telemetry, not standalone savings evidence. The audit's `cache_friendliness` and [`cache_diagnostics`](https://github.com/ictechgy/context-guard/blob/main/docs/cache-diagnostics-schema.md) are heuristic layout/cache-read signals with observed/inferred/hypothesis/unavailable boundaries, not a billing basis or provider-cache proof. The benchmark CSV schema is strict, so after upgrading the helper, start a new CSV or migrate the header. Synthetic examples by workflow type are in [`docs/benchmark-workflow-examples.md`](https://github.com/ictechgy/context-guard/blob/main/docs/benchmark-workflow-examples.md), and fixture-only experimental starter examples are in [`docs/experimental-benchmark-fixtures.md`](https://github.com/ictechgy/context-guard/blob/main/docs/experimental-benchmark-fixtures.md).
112
+ These helpers reduce common causes of unnecessary context growth, but they do not guarantee a fixed savings rate. If you need real before/after evidence, measure on your own tasks with `context-guard-bench --ledger-jsonl ... --report-json ...`. Token-savings claims are computed only when both sides of a matched task have `primary_tokens_measured`, and the report's `matched_pair_evidence` links each successful baseline/variant task bucket to the transform, quality gate, measurement availability, and claim boundary. The wall-time and provider-cache fields are diagnostic telemetry, not standalone savings evidence. The audit's `cache_friendliness`, [`cache_diagnostics`](https://github.com/ictechgy/context-guard/blob/main/docs/cache-diagnostics-schema.md), and `cache_layout_advice` are heuristic layout/cache-read signals and ranked checks/experiments with observed/inferred/hypothesis/unavailable boundaries, not a billing basis or provider-cache proof. The benchmark CSV schema is strict, so after upgrading the helper, start a new CSV or migrate the header. Synthetic examples by workflow type are in [`docs/benchmark-workflow-examples.md`](https://github.com/ictechgy/context-guard/blob/main/docs/benchmark-workflow-examples.md), and fixture-only experimental starter examples are in [`docs/experimental-benchmark-fixtures.md`](https://github.com/ictechgy/context-guard/blob/main/docs/experimental-benchmark-fixtures.md).
113
113
 
114
114
  ContextGuard does not send work to external AI services to reduce model tokens. All helper commands run locally. Local RAM/disk receipts can help reduce the context you send next, but they do not replace the provider prompt cache. Before an Anthropic release or billing statement, recheck the official prompt caching and pricing docs: https://docs.anthropic.com/en/build-with-claude/prompt-caching and https://platform.claude.com/docs/en/about-claude/pricing.
115
115
 
@@ -101,9 +101,9 @@ context-guard-statusline-merged
101
101
  - **Output trimmer** preserves the wrapped command exit code, trims long logs, and can emit `--digest markdown` or `--digest json` summaries with runner failure facts, sanitized failure signatures, duplicate-line groups, and suggested next queries. Add `--artifact-receipt` with digest mode to store the exact sanitized full output as a local artifact receipt and re-expand omitted slices with the emitted `context-guard-artifact get ...` command.
102
102
  - **Sanitizer** redacts common credential patterns, private key blocks, auth headers, credential URLs, and sensitive-looking paths from search, diff, and log output.
103
103
  - **Statusline** displays compact model/context/cost signals and, when transcript data is available, cache-read and cache-reuse signals.
104
- - **Transcript audit** aggregates usage/cost/cache buckets, flags likely token hotspots, and exposes `cache_friendliness` plus additive [`cache_diagnostics`](https://github.com/ictechgy/context-guard/blob/main/docs/cache-diagnostics-schema.md) findings from bounded usage fields, timestamped cache telemetry records, and redacted segment hashes without printing raw prompt text or claiming provider-cache savings.
104
+ - **Transcript audit** aggregates usage/cost/cache buckets, flags likely token hotspots, and exposes `cache_friendliness`, additive [`cache_diagnostics`](https://github.com/ictechgy/context-guard/blob/main/docs/cache-diagnostics-schema.md), and `cache_layout_advice` experiment priorities from bounded usage fields, timestamped cache telemetry records, and redacted segment hashes without printing raw prompt text or claiming provider-cache savings.
105
105
  - **Repeated-failure nudge** warns after repeated Bash failures so the agent switches strategy instead of retrying the same context-heavy path.
106
- - **Benchmark helper** records matched baseline/variant runs with real token and cost fields, separate byte-reduction proxy evidence, diagnostic `wall_time_seconds`, `provider_cached_tokens`, and provider-cache availability telemetry.
106
+ - **Benchmark helper** records matched baseline/variant runs with real token and cost fields, separate byte-reduction proxy evidence, diagnostic `wall_time_seconds`, `provider_cached_tokens`, provider-cache availability telemetry, file-backed `variant_prompt_files`, and optional per-run `self_hosted_metrics` JSONL ledger sidecars that stay out of hosted API savings claims.
107
107
 
108
108
  Cost guard creates its local HMAC key automatically at `.context-guard/cost-ledger/hmac.key`. If you provision that file yourself, it must contain exactly one canonical URL-safe base64 32-byte key with required padding and no trailing newline or whitespace. Reports never emit the key or raw prompt text, and the local ledger does not replace Anthropic/provider prompt caching.
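The key-file rule in the paragraph above (one canonical URL-safe base64 encoding of 32 bytes, padding required, no trailing newline or whitespace) can be checked before provisioning. The following is a minimal sketch under stated assumptions: `is_canonical_hmac_key` is a hypothetical helper, and "canonical" is interpreted as "re-encoding the decoded bytes reproduces the file contents exactly", which may be stricter or looser than what the package enforces.

```python
import base64
import binascii
import re

# 32 bytes encode to 43 URL-safe alphabet characters plus one '=' pad (44 bytes).
_URLSAFE_B64_32 = re.compile(rb"[A-Za-z0-9_-]{43}=")


def is_canonical_hmac_key(raw: bytes) -> bool:
    """Return True if `raw` looks like a canonical URL-safe base64 32-byte key.

    Hypothetical check, not taken from the ContextGuard source.
    """
    # fullmatch rejects trailing newlines, whitespace, and missing padding.
    if not _URLSAFE_B64_32.fullmatch(raw):
        return False
    try:
        decoded = base64.urlsafe_b64decode(raw)
    except (binascii.Error, ValueError):
        return False
    # Re-encoding must round-trip, which rejects non-zero unused trailing bits.
    return len(decoded) == 32 and base64.urlsafe_b64encode(decoded) == raw
```

A caller would read the file in binary mode, for example `Path(".context-guard/cost-ledger/hmac.key").read_bytes()`, so that a stray newline is not silently stripped before the check.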
109
109
 
@@ -115,7 +115,7 @@ Three deterministic levels — `lite`, `standard`, `ultra` — live under [`brie
115
115
 
116
116
  ## Conservative claims
117
117
 
118
- These helpers reduce common sources of context bloat, but they do not guarantee a fixed percentage savings. Use `context-guard-bench --ledger-jsonl ... --report-json ...` when you need measured before/after evidence for your own tasks; token-savings claims require `primary_tokens_measured` on both matched sides, and the report's `matched_pair_evidence` links each successful baseline/variant task bucket to the transform, quality gate, measurement availability, and claim boundary. Wall-time/provider-cache fields are diagnostic telemetry, not standalone savings proof. Audit `cache_friendliness` and [`cache_diagnostics`](https://github.com/ictechgy/context-guard/blob/main/docs/cache-diagnostics-schema.md) findings are heuristic layout/cache-read signals with observed/inferred/hypothesis/unavailable boundaries, not billing authority or provider-cache proof. Benchmark CSV schemas are strict, so start a new CSV or migrate the header after helper upgrades. Workflow-specific synthetic examples live in [`docs/benchmark-workflow-examples.md`](https://github.com/ictechgy/context-guard/blob/main/docs/benchmark-workflow-examples.md), and fixture-only experimental task/variant starters live in [`docs/experimental-benchmark-fixtures.md`](https://github.com/ictechgy/context-guard/blob/main/docs/experimental-benchmark-fixtures.md).
118
+ These helpers reduce common sources of context bloat, but they do not guarantee a fixed percentage savings. Use `context-guard-bench --ledger-jsonl ... --report-json ...` when you need measured before/after evidence for your own tasks; token-savings claims require `primary_tokens_measured` on both matched sides, and the report's `matched_pair_evidence` links each successful baseline/variant task bucket to the transform, quality gate, measurement availability, and claim boundary. Wall-time/provider-cache fields are diagnostic telemetry, not standalone savings proof. Audit `cache_friendliness`, [`cache_diagnostics`](https://github.com/ictechgy/context-guard/blob/main/docs/cache-diagnostics-schema.md), and `cache_layout_advice` findings are heuristic layout/cache-read signals and ranked checks/experiments with observed/inferred/hypothesis/unavailable boundaries, not billing authority or provider-cache proof. Benchmark CSV schemas are strict, so start a new CSV or migrate the header after helper upgrades. Workflow-specific synthetic examples live in [`docs/benchmark-workflow-examples.md`](https://github.com/ictechgy/context-guard/blob/main/docs/benchmark-workflow-examples.md), and fixture-only experimental task/variant starters live in [`docs/experimental-benchmark-fixtures.md`](https://github.com/ictechgy/context-guard/blob/main/docs/experimental-benchmark-fixtures.md).
119
119
 
120
120
  ContextGuard also does not send work to external AI providers to save model tokens. All helper commands run locally. Local RAM/disk receipts can reduce what you choose to send, but they do not replace a provider prompt cache. Before release or billing claims for Anthropic, recheck the official prompt-caching and pricing docs: https://docs.anthropic.com/en/build-with-claude/prompt-caching and https://platform.claude.com/docs/en/about-claude/pricing.
121
121