npm - @ictechgy/context-guard - Versions diffs - 0.4.1 → 0.4.4 - Mend

@ictechgy/context-guard 0.4.1 → 0.4.4

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (45)

package/CHANGELOG.md +15 -0
package/README.ko.md +62 -33
package/README.md +91 -23
package/context-guard-kit/README.md +39 -26
package/context-guard-kit/benchmark_runner.py +273 -8
package/context-guard-kit/claude_transcript_cost_audit.py +597 -12
package/context-guard-kit/context_compress.py +153 -1
package/context-guard-kit/context_filter.py +446 -0
package/context-guard-kit/context_guard_cli.py +3 -0
package/context-guard-kit/context_guard_diet.py +677 -2
package/context-guard-kit/context_pack.py +1694 -2
package/context-guard-kit/cost_guard.py +1870 -0
package/context-guard-kit/setup_wizard.py +820 -29
package/context-guard-kit/trim_command_output.py +396 -45
package/docs/benchmark-fixtures/learned-compression.tasks.example.json +24 -0
package/docs/benchmark-fixtures/learned-compression.variants.example.json +10 -0
package/docs/benchmark-fixtures/visual-ocr.tasks.example.json +24 -0
package/docs/benchmark-fixtures/visual-ocr.variants.example.json +10 -0
package/docs/benchmark-workflow-examples.md +40 -0
package/docs/benchmark-workflows/context-pack-byte-proxy.example.json +169 -0
package/docs/benchmark-workflows/measured-token-workflow.example.json +170 -0
package/docs/benchmark-workflows/provider-cache-telemetry.example.json +170 -0
package/docs/cache-diagnostics-schema.md +96 -0
package/docs/cache-diagnostics.example.json +116 -0
package/docs/cache-diagnostics.schema.json +460 -0
package/docs/distribution.md +4 -2
package/docs/experimental-benchmark-fixtures.md +36 -0
package/package.json +11 -2
package/packaging/homebrew/context-guard.rb.template +3 -2
package/plugins/context-guard/.claude-plugin/plugin.json +1 -1
package/plugins/context-guard/README.ko.md +22 -14
package/plugins/context-guard/README.md +24 -10
package/plugins/context-guard/bin/context-guard +3 -0
package/plugins/context-guard/bin/context-guard-audit +597 -12
package/plugins/context-guard/bin/context-guard-bench +273 -8
package/plugins/context-guard/bin/context-guard-compress +153 -1
package/plugins/context-guard/bin/context-guard-cost +1870 -0
package/plugins/context-guard/bin/context-guard-diet +677 -2
package/plugins/context-guard/bin/context-guard-filter +446 -0
package/plugins/context-guard/bin/context-guard-pack +1694 -2
package/plugins/context-guard/bin/context-guard-setup +820 -29
package/plugins/context-guard/bin/context-guard-trim-output +396 -45
package/plugins/context-guard/brief/README.md +10 -3
package/plugins/context-guard/skills/optimize/SKILL.md +5 -2
package/plugins/context-guard/skills/setup/SKILL.md +3 -1

package/docs/benchmark-workflow-examples.md ADDED

@@ -0,0 +1,40 @@
+# Workflow benchmark examples
+These examples show how to read `context-guard-bench` reports for common ContextGuard workflows. They are **synthetic report-shape examples**, not benchmark results for every repository.
+Use them to decide what evidence a workflow has and what it does **not** prove:
+- matched successful tasks are the comparison basis;
+- provider-measured primary token/cost fields are required for token/cost savings claims;
+- byte reductions and `chars_div_4` token proxies are local proxy evidence;
+- provider cached-token fields are diagnostic telemetry and must stay separate from token-reduction claims;
+- wall time, human corrections, and shifted external costs are quality/cost guardrails, not standalone savings proof.
+## Example fixtures
+| Workflow example | What it demonstrates | Claim boundary |
+| --- | --- | --- |
+| [`benchmark-workflows/context-pack-byte-proxy.example.json`](benchmark-workflows/context-pack-byte-proxy.example.json) | `context-guard-pack auto` can reduce selected local bytes and inferred token proxies. | No hosted API token-savings claim because primary provider token fields are unavailable. |
+| [`benchmark-workflows/provider-cache-telemetry.example.json`](benchmark-workflows/provider-cache-telemetry.example.json) | Cache-layout diagnostics can coincide with observed provider cached-token telemetry. | Provider-cache telemetry is not proof that ContextGuard reduced prompt tokens or cost. |
+| [`benchmark-workflows/measured-token-workflow.example.json`](benchmark-workflows/measured-token-workflow.example.json) | A matched successful task pair with measured primary tokens may expose `token_savings_pct`. | The percentage is sample report data only, not a general savings promise; real claims require your own matched successful task runs and quality gates. |
+## How to use the examples
+1. Run your own benchmark with `context-guard-bench --tasks ... --variants ... --csv ... --report-json ...`.
+2. Compare your report's `claim_status`, `summary_by_variant`, and `comparisons[].quality_gate` to the examples.
+3. Treat `comparisons[].quality_gate != "pass"` as a warning to inspect failures, correction burden, and unmatched tasks before discussing savings.
+4. Keep byte-proxy, provider-cache, wall-time, and shifted-cost evidence in separate language from provider-measured token/cost claims. Provider-cache telemetry is not independent savings proof.
+## Safe wording
+Use language like:
+> In this matched successful task set, primary token telemetry was observed for both variants and the report shows `token_savings_pct` for the optimized variant. Byte reductions and provider-cache fields are diagnostic context, not independent savings proof.
+Avoid language like:
+> ContextGuard guarantees this workflow will save tokens or cost.
+The fixtures intentionally use full `context-guard-bench-report-v1` shapes so tests can catch schema drift and overclaim wording.
+For task/variant starter fixtures rather than full report-shape examples, see [`experimental-benchmark-fixtures.md`](experimental-benchmark-fixtures.md). Those files are fixture-only and synthetic dry-run-only starters until users replace the placeholder prompts and success checks; they are not shipped OCR, visual-token, or learned-compression runtime features, and real claims still require provider-measured matched successful tasks plus failure-rate, correction, and shifted-cost guardrails.
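
Steps 2 and 3 of "How to use the examples" can be sketched as a small report check. This is a minimal sketch and not part of the package: the helper name `review_report` and the inline sample dict are hypothetical, and only the field names `claim_status`, `comparisons[].variant`, `comparisons[].quality_gate`, and `token_savings_pct` are taken from the example reports.

```python
from typing import List


def review_report(report: dict) -> List[str]:
    """Return review notes for a context-guard-bench-report-v1 style dict.

    Hypothetical helper: it only reads fields that appear in the example reports.
    """
    notes = ["claim_status: " + str(report.get("claim_status", "unknown"))]
    for comparison in report.get("comparisons", []):
        variant = comparison.get("variant", "<unnamed>")
        gate = comparison.get("quality_gate")
        if gate != "pass":
            # Step 3: a non-passing gate is a warning, so stop before any savings talk.
            notes.append(f"{variant}: quality_gate={gate!r}; inspect failures, corrections, and unmatched tasks first")
            continue
        savings = comparison.get("token_savings_pct")
        if savings is None:
            notes.append(f"{variant}: no provider-measured token pairs; no token-savings claim")
        else:
            notes.append(f"{variant}: token_savings_pct={savings} on matched successful tasks only")
    return notes


# Hypothetical sample, trimmed to the fields the helper reads.
sample = {
    "claim_status": "insufficient_paired_data",
    "comparisons": [
        {"variant": "context_pack_auto", "quality_gate": "pass", "token_savings_pct": None},
    ],
}

for line in review_report(sample):
    print(line)
```

The helper deliberately reports byte-proxy-only comparisons as "no token-savings claim" so that proxy evidence never gets phrased as a measured saving.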

package/docs/benchmark-workflows/context-pack-byte-proxy.example.json ADDED

@@ -0,0 +1,169 @@
+{
+  "schema": "context-guard-bench-report-v1",
+  "baseline_variant": "baseline",
+  "row_count": 2,
+  "summary_by_variant": {
+    "baseline": {
+      "runs": 1,
+      "successful_runs": 1,
+      "failed_runs": 0,
+      "total_tokens_all_runs": 0,
+      "primary_tokens_measured_runs": 0,
+      "primary_cost_all_runs_usd": 0.0,
+      "primary_cost_measured_runs": 0,
+      "wall_time_seconds_all_runs": 12.0,
+      "wall_time_seconds_measured_runs": 1,
+      "provider_cached_tokens_all_runs": 0,
+      "provider_cached_tokens_measured_runs": 0,
+      "total_cost_with_shift_all_runs_usd": 0.0,
+      "total_cost_with_shift_measured_runs": 0,
+      "total_tokens_successful": 0,
+      "primary_tokens_measured_successful": 0,
+      "primary_cost_successful_usd": 0.0,
+      "primary_cost_measured_successful": 0,
+      "wall_time_seconds_successful": 12.0,
+      "wall_time_seconds_measured_successful": 1,
+      "provider_cached_tokens_successful": 0,
+      "provider_cached_tokens_measured_successful": 0,
+      "external_cost_successful_usd": 0.0,
+      "external_cost_unknown_successful": 1,
+      "total_cost_with_shift_successful_usd": 0.0,
+      "total_cost_with_shift_measured_successful": 0,
+      "external_tokens_successful": 0,
+      "external_tokens_measured_successful": 0,
+      "artifacts_used_successful": 0,
+      "corrections_successful": 0,
+      "bytes_before_successful": 24000,
+      "bytes_after_successful": 24000,
+      "turns_successful": 0,
+      "hook_triggers_successful": 0,
+      "failure_rate": 0.0,
+      "task_count": 1,
+      "successful_task_count": 1,
+      "tokens_per_task_including_failures": null,
+      "wall_time_seconds_per_task_including_failures": 12.0,
+      "provider_cached_tokens_per_task_including_failures": 0.0,
+      "primary_cost_per_task_including_failures_usd": null,
+      "total_cost_with_shift_per_task_including_failures_usd": null,
+      "tokens_per_successful_task": null,
+      "wall_time_seconds_per_successful_task": 12.0,
+      "provider_cached_tokens_per_successful_task": 0.0,
+      "primary_cost_per_successful_task_usd": null,
+      "total_cost_with_shift_per_successful_task_usd": null,
+      "external_tokens_per_successful_task": null,
+      "artifacts_used_per_successful_task": 0.0,
+      "corrections_per_successful_task": 0.0,
+      "byte_reduction_ratio": 1.0,
+      "compression_strategy": "baseline",
+      "is_baseline_strategy": true,
+      "bytes_saved_successful": 0,
+      "bytes_saved_per_successful_task": 0.0,
+      "byte_savings_pct": 0.0,
+      "token_proxy_saved_successful": 0,
+      "token_proxy_saved_per_successful_task": 0.0,
+      "observed_telemetry": {
+        "tokens": "unavailable",
+        "primary_cost": "unavailable",
+        "external_tokens": "unavailable",
+        "byte_savings": "observed",
+        "token_proxy": "inferred",
+        "wall_time": "observed",
+        "provider_cache": "unavailable"
+      }
+    },
+    "context_pack_auto": {
+      "runs": 1,
+      "successful_runs": 1,
+      "failed_runs": 0,
+      "total_tokens_all_runs": 0,
+      "primary_tokens_measured_runs": 0,
+      "primary_cost_all_runs_usd": 0.0,
+      "primary_cost_measured_runs": 0,
+      "wall_time_seconds_all_runs": 11.0,
+      "wall_time_seconds_measured_runs": 1,
+      "provider_cached_tokens_all_runs": 0,
+      "provider_cached_tokens_measured_runs": 0,
+      "total_cost_with_shift_all_runs_usd": 0.0,
+      "total_cost_with_shift_measured_runs": 0,
+      "total_tokens_successful": 0,
+      "primary_tokens_measured_successful": 0,
+      "primary_cost_successful_usd": 0.0,
+      "primary_cost_measured_successful": 0,
+      "wall_time_seconds_successful": 11.0,
+      "wall_time_seconds_measured_successful": 1,
+      "provider_cached_tokens_successful": 0,
+      "provider_cached_tokens_measured_successful": 0,
+      "external_cost_successful_usd": 0.0,
+      "external_cost_unknown_successful": 1,
+      "total_cost_with_shift_successful_usd": 0.0,
+      "total_cost_with_shift_measured_successful": 0,
+      "external_tokens_successful": 0,
+      "external_tokens_measured_successful": 0,
+      "artifacts_used_successful": 0,
+      "corrections_successful": 0,
+      "bytes_before_successful": 24000,
+      "bytes_after_successful": 6000,
+      "turns_successful": 0,
+      "hook_triggers_successful": 0,
+      "failure_rate": 0.0,
+      "task_count": 1,
+      "successful_task_count": 1,
+      "tokens_per_task_including_failures": null,
+      "wall_time_seconds_per_task_including_failures": 11.0,
+      "provider_cached_tokens_per_task_including_failures": 0.0,
+      "primary_cost_per_task_including_failures_usd": null,
+      "total_cost_with_shift_per_task_including_failures_usd": null,
+      "tokens_per_successful_task": null,
+      "wall_time_seconds_per_successful_task": 11.0,
+      "provider_cached_tokens_per_successful_task": 0.0,
+      "primary_cost_per_successful_task_usd": null,
+      "total_cost_with_shift_per_successful_task_usd": null,
+      "external_tokens_per_successful_task": null,
+      "artifacts_used_per_successful_task": 0.0,
+      "corrections_per_successful_task": 0.0,
+      "byte_reduction_ratio": 0.25,
+      "compression_strategy": "context_pack_auto",
+      "is_baseline_strategy": false,
+      "bytes_saved_successful": 18000,
+      "bytes_saved_per_successful_task": 18000.0,
+      "byte_savings_pct": 75.0,
+      "token_proxy_saved_successful": 4500,
+      "token_proxy_saved_per_successful_task": 4500.0,
+      "observed_telemetry": {
+        "tokens": "unavailable",
+        "primary_cost": "unavailable",
+        "external_tokens": "unavailable",
+        "byte_savings": "observed",
+        "token_proxy": "inferred",
+        "wall_time": "observed",
+        "provider_cache": "unavailable"
+      }
+    }
+  },
+  "comparisons": [
+    {
+      "variant": "context_pack_auto",
+      "baseline_variant": "baseline",
+      "quality_gate": "pass",
+      "baseline_failure_rate": 0.0,
+      "variant_failure_rate": 0.0,
+      "failure_rate_delta_pp": 0.0,
+      "matched_successful_task_count": 1,
+      "baseline_successful_task_count": 1,
+      "missing_baseline_success_tasks": [],
+      "baseline_corrections_per_successful_task": 0.0,
+      "variant_corrections_per_successful_task": 0.0,
+      "paired_corrections_task_count": 1,
+      "corrections_delta_per_successful_task": 0.0,
+      "token_savings_pct": null,
+      "paired_token_task_count": 0,
+      "wall_time_delta_seconds_per_successful_task": -1.0,
+      "wall_time_change_pct": -8.333333333333332,
+      "paired_wall_time_task_count": 1,
+      "cost_savings_pct_with_shift": null,
+      "paired_cost_task_count": 0
+    }
+  ],
+  "claim_status": "insufficient_paired_data",
+  "caveat": "Proxy byte reductions are reported separately from matched-task token/cost metrics; shifted cost savings require measured primary cost and measured external cost when external tokens are present. Wall time and provider cached-token fields are diagnostic telemetry, not proof of ContextGuard-caused token or cost savings; provider-cache discounts must stay separate from token-reduction claims."
+}
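
The byte-proxy numbers in this fixture are internally consistent, which the following sketch reproduces. It assumes the `chars_div_4` proxy is simply bytes saved divided by four; that matches the fixture values (18000 bytes saved, 4500 proxy tokens), but the function name `byte_proxy_summary` is hypothetical and the package's own rounding may differ.

```python
from typing import Optional


def byte_proxy_summary(bytes_before: int, bytes_after: int) -> Optional[dict]:
    """Recompute local byte-proxy fields from before/after byte counts.

    Returns None when no bytes were recorded, mirroring the null byte
    fields in reports that carry no byte telemetry.
    """
    if bytes_before <= 0:
        return None
    bytes_saved = bytes_before - bytes_after
    return {
        "byte_reduction_ratio": bytes_after / bytes_before,
        "bytes_saved": bytes_saved,
        "byte_savings_pct": 100.0 * bytes_saved / bytes_before,
        # Assumption: chars_div_4 proxy = bytes saved // 4 (inferred, not provider-measured).
        "token_proxy_saved": bytes_saved // 4,
    }


summary = byte_proxy_summary(24000, 6000)
print(summary)
```

These values are local proxy evidence only, which is why the fixture still reports `token_savings_pct` as `null` and `claim_status` as `insufficient_paired_data`.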

package/docs/benchmark-workflows/measured-token-workflow.example.json ADDED

@@ -0,0 +1,170 @@
+{
+  "schema": "context-guard-bench-report-v1",
+  "baseline_variant": "baseline",
+  "row_count": 2,
+  "summary_by_variant": {
+    "baseline": {
+      "runs": 1,
+      "successful_runs": 1,
+      "failed_runs": 0,
+      "total_tokens_all_runs": 1000,
+      "primary_tokens_measured_runs": 1,
+      "primary_cost_all_runs_usd": 0.0,
+      "primary_cost_measured_runs": 0,
+      "wall_time_seconds_all_runs": 10.0,
+      "wall_time_seconds_measured_runs": 1,
+      "provider_cached_tokens_all_runs": 0,
+      "provider_cached_tokens_measured_runs": 0,
+      "total_cost_with_shift_all_runs_usd": 0.0,
+      "total_cost_with_shift_measured_runs": 0,
+      "total_tokens_successful": 1000,
+      "primary_tokens_measured_successful": 1,
+      "primary_cost_successful_usd": 0.0,
+      "primary_cost_measured_successful": 0,
+      "wall_time_seconds_successful": 10.0,
+      "wall_time_seconds_measured_successful": 1,
+      "provider_cached_tokens_successful": 0,
+      "provider_cached_tokens_measured_successful": 0,
+      "external_cost_successful_usd": 0.0,
+      "external_cost_unknown_successful": 1,
+      "total_cost_with_shift_successful_usd": 0.0,
+      "total_cost_with_shift_measured_successful": 0,
+      "external_tokens_successful": 0,
+      "external_tokens_measured_successful": 0,
+      "artifacts_used_successful": 0,
+      "corrections_successful": 0,
+      "bytes_before_successful": 12000,
+      "bytes_after_successful": 12000,
+      "turns_successful": 0,
+      "hook_triggers_successful": 0,
+      "failure_rate": 0.0,
+      "task_count": 1,
+      "successful_task_count": 1,
+      "tokens_per_task_including_failures": 1000.0,
+      "wall_time_seconds_per_task_including_failures": 10.0,
+      "provider_cached_tokens_per_task_including_failures": 0.0,
+      "primary_cost_per_task_including_failures_usd": null,
+      "total_cost_with_shift_per_task_including_failures_usd": null,
+      "tokens_per_successful_task": 1000.0,
+      "wall_time_seconds_per_successful_task": 10.0,
+      "provider_cached_tokens_per_successful_task": 0.0,
+      "primary_cost_per_successful_task_usd": null,
+      "total_cost_with_shift_per_successful_task_usd": null,
+      "external_tokens_per_successful_task": null,
+      "artifacts_used_per_successful_task": 0.0,
+      "corrections_per_successful_task": 0.0,
+      "byte_reduction_ratio": 1.0,
+      "compression_strategy": "baseline",
+      "is_baseline_strategy": true,
+      "bytes_saved_successful": 0,
+      "bytes_saved_per_successful_task": 0.0,
+      "byte_savings_pct": 0.0,
+      "token_proxy_saved_successful": 0,
+      "token_proxy_saved_per_successful_task": 0.0,
+      "observed_telemetry": {
+        "tokens": "observed",
+        "primary_cost": "unavailable",
+        "external_tokens": "unavailable",
+        "byte_savings": "observed",
+        "token_proxy": "inferred",
+        "wall_time": "observed",
+        "provider_cache": "unavailable"
+      }
+    },
+    "brief_mode_standard": {
+      "runs": 1,
+      "successful_runs": 1,
+      "failed_runs": 0,
+      "total_tokens_all_runs": 760,
+      "primary_tokens_measured_runs": 1,
+      "primary_cost_all_runs_usd": 0.0,
+      "primary_cost_measured_runs": 0,
+      "wall_time_seconds_all_runs": 9.6,
+      "wall_time_seconds_measured_runs": 1,
+      "provider_cached_tokens_all_runs": 0,
+      "provider_cached_tokens_measured_runs": 0,
+      "total_cost_with_shift_all_runs_usd": 0.0,
+      "total_cost_with_shift_measured_runs": 0,
+      "total_tokens_successful": 760,
+      "primary_tokens_measured_successful": 1,
+      "primary_cost_successful_usd": 0.0,
+      "primary_cost_measured_successful": 0,
+      "wall_time_seconds_successful": 9.6,
+      "wall_time_seconds_measured_successful": 1,
+      "provider_cached_tokens_successful": 0,
+      "provider_cached_tokens_measured_successful": 0,
+      "external_cost_successful_usd": 0.0,
+      "external_cost_unknown_successful": 1,
+      "total_cost_with_shift_successful_usd": 0.0,
+      "total_cost_with_shift_measured_successful": 0,
+      "external_tokens_successful": 0,
+      "external_tokens_measured_successful": 0,
+      "artifacts_used_successful": 0,
+      "corrections_successful": 0,
+      "bytes_before_successful": 12000,
+      "bytes_after_successful": 9000,
+      "turns_successful": 0,
+      "hook_triggers_successful": 0,
+      "failure_rate": 0.0,
+      "task_count": 1,
+      "successful_task_count": 1,
+      "tokens_per_task_including_failures": 760.0,
+      "wall_time_seconds_per_task_including_failures": 9.6,
+      "provider_cached_tokens_per_task_including_failures": 0.0,
+      "primary_cost_per_task_including_failures_usd": null,
+      "total_cost_with_shift_per_task_including_failures_usd": null,
+      "tokens_per_successful_task": 760.0,
+      "wall_time_seconds_per_successful_task": 9.6,
+      "provider_cached_tokens_per_successful_task": 0.0,
+      "primary_cost_per_successful_task_usd": null,
+      "total_cost_with_shift_per_successful_task_usd": null,
+      "external_tokens_per_successful_task": null,
+      "artifacts_used_per_successful_task": 0.0,
+      "corrections_per_successful_task": 0.0,
+      "byte_reduction_ratio": 0.75,
+      "compression_strategy": "brief_mode_standard",
+      "is_baseline_strategy": false,
+      "bytes_saved_successful": 3000,
+      "bytes_saved_per_successful_task": 3000.0,
+      "byte_savings_pct": 25.0,
+      "token_proxy_saved_successful": 750,
+      "token_proxy_saved_per_successful_task": 750.0,
+      "observed_telemetry": {
+        "tokens": "observed",
+        "primary_cost": "unavailable",
+        "external_tokens": "unavailable",
+        "byte_savings": "observed",
+        "token_proxy": "inferred",
+        "wall_time": "observed",
+        "provider_cache": "unavailable"
+      }
+    }
+  },
+  "comparisons": [
+    {
+      "variant": "brief_mode_standard",
+      "baseline_variant": "baseline",
+      "quality_gate": "pass",
+      "baseline_failure_rate": 0.0,
+      "variant_failure_rate": 0.0,
+      "failure_rate_delta_pp": 0.0,
+      "matched_successful_task_count": 1,
+      "baseline_successful_task_count": 1,
+      "missing_baseline_success_tasks": [],
+      "baseline_corrections_per_successful_task": 0.0,
+      "variant_corrections_per_successful_task": 0.0,
+      "paired_corrections_task_count": 1,
+      "corrections_delta_per_successful_task": 0.0,
+      "token_delta_per_successful_task": -240.0,
+      "token_savings_pct": 24.0,
+      "paired_token_task_count": 1,
+      "wall_time_delta_seconds_per_successful_task": -0.40000000000000036,
+      "wall_time_change_pct": -4.0000000000000036,
+      "paired_wall_time_task_count": 1,
+      "cost_savings_pct_with_shift": null,
+      "paired_cost_task_count": 0
+    }
+  ],
+  "claim_status": "token_savings_observed_cost_unmeasured",
+  "caveat": "Proxy byte reductions are reported separately from matched-task token/cost metrics; shifted cost savings require measured primary cost and measured external cost when external tokens are present. Wall time and provider cached-token fields are diagnostic telemetry, not proof of ContextGuard-caused token or cost savings; provider-cache discounts must stay separate from token-reduction claims."
+}
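
For the single matched pair in this fixture, the comparison fields follow from the per-task token counts (1000 baseline, 760 variant). The sketch below shows that arithmetic under the assumption that savings are expressed relative to the baseline; the function name `paired_token_comparison` is hypothetical and is not the package's implementation.

```python
from typing import Optional


def paired_token_comparison(baseline_tokens: float, variant_tokens: float) -> Optional[dict]:
    """Compare provider-measured tokens for one matched successful task pair."""
    if baseline_tokens <= 0:
        # Without a measured baseline there is nothing to compare against.
        return None
    delta = variant_tokens - baseline_tokens
    return {
        "token_delta_per_successful_task": delta,
        "token_savings_pct": 100.0 * (baseline_tokens - variant_tokens) / baseline_tokens,
    }


result = paired_token_comparison(1000.0, 760.0)
print(result)
```

Cost stays unmeasured in this fixture (`paired_cost_task_count` is 0), so the percentage supports only the `token_savings_observed_cost_unmeasured` status, not a cost claim.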

package/docs/benchmark-workflows/provider-cache-telemetry.example.json ADDED

@@ -0,0 +1,170 @@
+{
+  "schema": "context-guard-bench-report-v1",
+  "baseline_variant": "baseline",
+  "row_count": 2,
+  "summary_by_variant": {
+    "baseline": {
+      "runs": 1,
+      "successful_runs": 1,
+      "failed_runs": 0,
+      "total_tokens_all_runs": 1200,
+      "primary_tokens_measured_runs": 1,
+      "primary_cost_all_runs_usd": 0.0,
+      "primary_cost_measured_runs": 0,
+      "wall_time_seconds_all_runs": 10.0,
+      "wall_time_seconds_measured_runs": 1,
+      "provider_cached_tokens_all_runs": 0,
+      "provider_cached_tokens_measured_runs": 1,
+      "total_cost_with_shift_all_runs_usd": 0.0,
+      "total_cost_with_shift_measured_runs": 0,
+      "total_tokens_successful": 1200,
+      "primary_tokens_measured_successful": 1,
+      "primary_cost_successful_usd": 0.0,
+      "primary_cost_measured_successful": 0,
+      "wall_time_seconds_successful": 10.0,
+      "wall_time_seconds_measured_successful": 1,
+      "provider_cached_tokens_successful": 0,
+      "provider_cached_tokens_measured_successful": 1,
+      "external_cost_successful_usd": 0.0,
+      "external_cost_unknown_successful": 1,
+      "total_cost_with_shift_successful_usd": 0.0,
+      "total_cost_with_shift_measured_successful": 0,
+      "external_tokens_successful": 0,
+      "external_tokens_measured_successful": 0,
+      "artifacts_used_successful": 0,
+      "corrections_successful": 0,
+      "bytes_before_successful": 0,
+      "bytes_after_successful": 0,
+      "turns_successful": 0,
+      "hook_triggers_successful": 0,
+      "failure_rate": 0.0,
+      "task_count": 1,
+      "successful_task_count": 1,
+      "tokens_per_task_including_failures": 1200.0,
+      "wall_time_seconds_per_task_including_failures": 10.0,
+      "provider_cached_tokens_per_task_including_failures": 0.0,
+      "primary_cost_per_task_including_failures_usd": null,
+      "total_cost_with_shift_per_task_including_failures_usd": null,
+      "tokens_per_successful_task": 1200.0,
+      "wall_time_seconds_per_successful_task": 10.0,
+      "provider_cached_tokens_per_successful_task": 0.0,
+      "primary_cost_per_successful_task_usd": null,
+      "total_cost_with_shift_per_successful_task_usd": null,
+      "external_tokens_per_successful_task": null,
+      "artifacts_used_per_successful_task": 0.0,
+      "corrections_per_successful_task": 0.0,
+      "byte_reduction_ratio": null,
+      "compression_strategy": "baseline",
+      "is_baseline_strategy": true,
+      "bytes_saved_successful": null,
+      "bytes_saved_per_successful_task": null,
+      "byte_savings_pct": null,
+      "token_proxy_saved_successful": null,
+      "token_proxy_saved_per_successful_task": null,
+      "observed_telemetry": {
+        "tokens": "observed",
+        "primary_cost": "unavailable",
+        "external_tokens": "unavailable",
+        "byte_savings": "unavailable",
+        "token_proxy": "unavailable",
+        "wall_time": "observed",
+        "provider_cache": "observed"
+      }
+    },
+    "cache_layout_check": {
+      "runs": 1,
+      "successful_runs": 1,
+      "failed_runs": 0,
+      "total_tokens_all_runs": 1200,
+      "primary_tokens_measured_runs": 1,
+      "primary_cost_all_runs_usd": 0.0,
+      "primary_cost_measured_runs": 0,
+      "wall_time_seconds_all_runs": 10.0,
+      "wall_time_seconds_measured_runs": 1,
+      "provider_cached_tokens_all_runs": 900,
+      "provider_cached_tokens_measured_runs": 1,
+      "total_cost_with_shift_all_runs_usd": 0.0,
+      "total_cost_with_shift_measured_runs": 0,
+      "total_tokens_successful": 1200,
+      "primary_tokens_measured_successful": 1,
+      "primary_cost_successful_usd": 0.0,
+      "primary_cost_measured_successful": 0,
+      "wall_time_seconds_successful": 10.0,
+      "wall_time_seconds_measured_successful": 1,
+      "provider_cached_tokens_successful": 900,
+      "provider_cached_tokens_measured_successful": 1,
+      "external_cost_successful_usd": 0.0,
+      "external_cost_unknown_successful": 1,
+      "total_cost_with_shift_successful_usd": 0.0,
+      "total_cost_with_shift_measured_successful": 0,
+      "external_tokens_successful": 0,
+      "external_tokens_measured_successful": 0,
+      "artifacts_used_successful": 0,
+      "corrections_successful": 0,
+      "bytes_before_successful": 0,
+      "bytes_after_successful": 0,
+      "turns_successful": 0,
+      "hook_triggers_successful": 0,
+      "failure_rate": 0.0,
+      "task_count": 1,
+      "successful_task_count": 1,
+      "tokens_per_task_including_failures": 1200.0,
+      "wall_time_seconds_per_task_including_failures": 10.0,
+      "provider_cached_tokens_per_task_including_failures": 900.0,
+      "primary_cost_per_task_including_failures_usd": null,
+      "total_cost_with_shift_per_task_including_failures_usd": null,
+      "tokens_per_successful_task": 1200.0,
+      "wall_time_seconds_per_successful_task": 10.0,
+      "provider_cached_tokens_per_successful_task": 900.0,
+      "primary_cost_per_successful_task_usd": null,
+      "total_cost_with_shift_per_successful_task_usd": null,
+      "external_tokens_per_successful_task": null,
+      "artifacts_used_per_successful_task": 0.0,
+      "corrections_per_successful_task": 0.0,
+      "byte_reduction_ratio": null,
+      "compression_strategy": "cache_layout_check",
+      "is_baseline_strategy": false,
+      "bytes_saved_successful": null,
+      "bytes_saved_per_successful_task": null,
+      "byte_savings_pct": null,
+      "token_proxy_saved_successful": null,
+      "token_proxy_saved_per_successful_task": null,
+      "observed_telemetry": {
+        "tokens": "observed",
+        "primary_cost": "unavailable",
+        "external_tokens": "unavailable",
+        "byte_savings": "unavailable",
+        "token_proxy": "unavailable",
+        "wall_time": "observed",
+        "provider_cache": "observed"
+      }
+    }
+  },
+  "comparisons": [
+    {
+      "variant": "cache_layout_check",
+      "baseline_variant": "baseline",
+      "quality_gate": "pass",
+      "baseline_failure_rate": 0.0,
+      "variant_failure_rate": 0.0,
+      "failure_rate_delta_pp": 0.0,
+      "matched_successful_task_count": 1,
+      "baseline_successful_task_count": 1,
+      "missing_baseline_success_tasks": [],
+      "baseline_corrections_per_successful_task": 0.0,
+      "variant_corrections_per_successful_task": 0.0,
+      "paired_corrections_task_count": 1,
+      "corrections_delta_per_successful_task": 0.0,
+      "token_delta_per_successful_task": 0.0,
+      "token_savings_pct": 0.0,
+      "paired_token_task_count": 1,
+      "wall_time_delta_seconds_per_successful_task": 0.0,
+      "wall_time_change_pct": 0.0,
+      "paired_wall_time_task_count": 1,
+      "cost_savings_pct_with_shift": null,
+      "paired_cost_task_count": 0
+    }
+  ],
+  "claim_status": "compare_variants",
+  "caveat": "Proxy byte reductions are reported separately from matched-task token/cost metrics; shifted cost savings require measured primary cost and measured external cost when external tokens are present. Wall time and provider cached-token fields are diagnostic telemetry, not proof of ContextGuard-caused token or cost savings; provider-cache discounts must stay separate from token-reduction claims."
+}
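
This fixture pairs a nonzero cached-token reading (900 versus 0) with a `token_savings_pct` of 0.0, which illustrates why the two must be reported separately. A minimal sketch of that separation follows; the function name `cache_telemetry_note` and the output keys `diagnostic_cached_tokens_delta` and `token_savings_claim_supported` are hypothetical, while the input field names come from the fixture.

```python
def cache_telemetry_note(baseline: dict, variant: dict, comparison: dict) -> dict:
    """Keep provider-cache telemetry apart from token-savings evidence."""
    cached_delta = (
        variant["provider_cached_tokens_per_successful_task"]
        - baseline["provider_cached_tokens_per_successful_task"]
    )
    savings = comparison.get("token_savings_pct")
    return {
        # Diagnostic only: more cached tokens does not mean fewer prompt tokens.
        "diagnostic_cached_tokens_delta": cached_delta,
        "token_savings_pct": savings,
        # Hypothetical flag: true only with a passing gate and positive measured savings.
        "token_savings_claim_supported": (
            comparison.get("quality_gate") == "pass"
            and savings is not None
            and savings > 0
        ),
    }


note = cache_telemetry_note(
    {"provider_cached_tokens_per_successful_task": 0.0},
    {"provider_cached_tokens_per_successful_task": 900.0},
    {"quality_gate": "pass", "token_savings_pct": 0.0},
)
print(note)
```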

package/docs/cache-diagnostics-schema.md ADDED

@@ -0,0 +1,96 @@
+# ContextGuard `cache_diagnostics` schema
+`cache_diagnostics` is the nested diagnostic object emitted by `context-guard-audit --json` and by top-level `cache_diagnostics` in `context-guard-audit --feasibility-json`. The committed schema file, [`cache-diagnostics.schema.json`](cache-diagnostics.schema.json), describes that nested object only; it is not the full CLI response envelope.
+The object is for GUI and external consumers that need stable cache-read, prefix-layout, TTL-evidence, and headroom-boundary fields without scraping prose. It is a local transcript diagnostic contract, not a billing source, not provider telemetry verification, and not a token or cost savings promise. It does not guarantee savings, does not prove provider cache hits, and does not infer live headroom.
+`context-guard-audit` also emits a top-level sibling `cache_layout_advice` object. That sibling is intentionally separate from `cache_diagnostics`: diagnostics stay evidence-oriented, while advice ranks checks and experiments such as session splitting, prefix stabilization, and context-diet scans. Advice distinguishes an `observed_issue` from `hypothesized_causes`, `corroborated_causes`, and `next_checks`; without diet or structural evidence, volatile prefix positions should be presented as hypotheses to check, not confirmed root causes.
+## Files
+- [`cache-diagnostics.schema.json`](cache-diagnostics.schema.json) — JSON Schema 2020-12-style reference for the nested `cache_diagnostics` object.
+- [`cache-diagnostics.example.json`](cache-diagnostics.example.json) — focused example generated from a synthetic timestamped transcript through `context-guard-audit --json`.
+## Output surfaces
+### `context-guard-audit --json`
+The legacy audit JSON includes top-level `cache_diagnostics` beside `cache_metrics`, `cache_friendliness`, and the separate `cache_layout_advice` advice object.
+### `context-guard-audit --feasibility-json`
+The feasibility JSON includes top-level `cache_diagnostics` and `cache_layout_advice`, and lists both in `consumer_contract.stable_top_level_fields`. GUI consumers should prefer the top-level feasibility field when available and use `summary.cache_diagnostics` only for legacy compatibility.
+## Sibling `cache_layout_advice` fields
+`cache_layout_advice` is a stable top-level sibling of `cache_diagnostics`, but it is deliberately not part of `cache-diagnostics.schema.json`. It is an advice contract over local transcript heuristics.
+| Field | Meaning | Consumer note |
+| --- | --- | --- |
+| `schema_version` | Stable version string, currently `contextguard.cache-layout-advice.v1`. | Treat unknown versions conservatively. |
+| `status` | Advice availability: `available`, `partial`, or `missing`. | `partial` means prompt/cache evidence was capped, skipped, or incomplete. |
+| `confidence` | Overall advice confidence: `hypothesis`, `partial`, or `unavailable`. | Never present as provider truth or billing proof. |
+| `heuristic` | Always `true` for v1. | UI should label advice as heuristic. |
+| `observed_issue` | Primary observed layout issue: `volatile_prefix_breaker`, `long_session_accumulation`, `low_cache_reuse`, `missing_cache_fields`, or `unknown`. | This is an observed/audited symptom, not a confirmed cause. |
+| `priority` | Suggested priority bucket (`P0`, `P1`, or `P2`). | Use for ordering checks, not for savings claims. |
+| `observed_summary` | Sanitized numeric summary such as cache creation/read tokens, prefix shares, breaker position, and dominant transcript share. | Contains aggregate counts/shares only, not raw prompt text. |
+| `hypothesized_causes` | Candidate causes to investigate, each with `id`, `confidence`, `evidence`, `reason`, and `next_check`. | Keep separate from confirmed causes. |
+| `corroborated_causes` | Causes supported by independent evidence beyond prefix-position heuristics. | Empty means no cause has been confirmed. |
+| `next_checks` | Evidence-gathering checks with `id`, `confidence`, `command_templates`, and `evidence_required_for_corroboration`. | Templates use placeholders such as `<repo>` and must not embed observed local paths. |
+| `recommended_experiments` | Ordered experiments with `id`, `order`, `priority`, `effort`, `action`, `expected_signal`, and `verification`. | Run in `order`; compare matched audit windows before claiming improvement. |
+| `caveats` | User-facing boundaries for claims and evidence limits. | Preserve these in GUI summaries and reports. |
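One way a consumer might apply the notes in this table is sketched below. The function name `summarize_advice` and the shape of its return value are assumptions made for illustration; only the field names and the version string come from the table above.

```python
from typing import Any, Dict

KNOWN_ADVICE_VERSION = "contextguard.cache-layout-advice.v1"


def summarize_advice(advice: Dict[str, Any]) -> Dict[str, Any]:
    """Build a display summary from a cache_layout_advice object (hypothetical helper)."""
    # Treat unknown versions conservatively: do not interpret the fields.
    if advice.get("schema_version") != KNOWN_ADVICE_VERSION:
        return {"usable": False, "reason": "unknown schema_version"}
    if advice.get("status") == "missing":
        return {"usable": False, "reason": "advice missing"}

    # Experiments are meant to be run in their declared order.
    experiments = sorted(
        advice.get("recommended_experiments") or [],
        key=lambda item: item.get("order", 0),
    )

    return {
        "usable": True,
        "label": "heuristic",  # v1 advice is always heuristic
        "observed_issue": advice.get("observed_issue", "unknown"),
        # Keep hypotheses and corroborated causes in separate lists.
        "hypotheses_to_check": [c.get("id") for c in advice.get("hypothesized_causes") or []],
        "corroborated": [c.get("id") for c in advice.get("corroborated_causes") or []],
        "experiment_ids_in_order": [e.get("id") for e in experiments],
        "caveats": list(advice.get("caveats") or []),
    }
```

An empty `corroborated` list means that no cause has been confirmed, so a UI built on this summary should present `hypotheses_to_check` as open questions.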
+## Top-level fields
+| Field | Meaning | Consumer note |
+| --- | --- | --- |
+| `schema_version` | Stable version string, currently `contextguard.cache-diagnostics.v1`. | Treat unknown versions as compatible only after checking docs. |
+| `status` | Overall diagnostic availability: `available`, `partial`, or `missing`. | `partial` means skipped/capped evidence or partial prompt-layout confidence. |
+| `confidence` | Overall confidence: `hypothesis`, `partial`, or `unavailable` for current v1 output. | Do not present `hypothesis` as observed provider truth. |
+| `evidence` | Overall evidence class: `inferred` or `unavailable`. | Cache fields may be observed inside nested observations. |
+| `heuristic` | Always `true` for v1. | UI should label these as local heuristics. |
+| `observations` | Observed cache field counts and token totals. | Distinguishes missing cache fields from observed zero values. |
+| `derived_ratios` | Inferred ratios such as cache-read share and reuse ratio. | Ratios are formulas over transcript fields. |
+| `stable_prefix_candidates` | Redacted segment positions that appear stable across samples. | Never contains raw prompt text. |
+| `dynamic_prefix_breakers` | Redacted segment positions that appear volatile near the prefix. | Use as prompt-layout guidance only. |
+| `cache_miss_hypotheses` | Ordered local hypotheses for missing or weak cache reads. | Small, conservative list; not root-cause proof. |
+| `ttl_diagnostics` | Timestamped cache telemetry interval evidence and TTL caveats. | Positive timestamped records delimit local observation intervals only; they do not prove provider TTL state. |
+| `headroom_diagnostics` | Historical-scan headroom boundary. | Live headroom requires `live_statusline_snapshot`. |
+| `caveats` | User-facing caveats for evidence limits. | Preserve these when presenting summaries. |
+## Evidence vocabulary
+- `observed`: a field was present in the local transcript scan.
+- `inferred`: a value was derived from local transcript fields or redacted prompt segment hashes.
+- `hypothesis`: a plausible local interpretation that still needs corroboration.
+- `partial`: some scan evidence was skipped, capped, overlapping, or otherwise incomplete.
+- `unavailable`: the scan does not contain enough evidence to expose the metric.
+## TTL diagnostics
+`ttl_diagnostics` documents timestamped cache telemetry fields:
+- `timestamped_cache_record_count`: transcript records that had timestamps and cache telemetry fields.
+- `positive_timestamped_cache_record_count`: timestamped cache telemetry records with positive cache read or cache creation token counts.
+- `timestamped_cache_record_span_seconds`: span between the first and last positive timestamped cache telemetry records, or `null` when fewer than two positive records exist.
+- `interval_basis`: always `positive_timestamped_cache_records` for v1.
+- `candidate`: one of `within-5m`, `between-5m-and-1h`, `beyond-1h`, or `null`.
+- `status`, `evidence`, and `confidence`: no stronger than `hypothesis` / `inferred` for timestamp-derived intervals.
+Positive timestamped cache telemetry records are local interval evidence. They do not prove provider TTL state, provider cache hits, invoices, or token/cost savings.
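As a hedged illustration of how a consumer could surface these fields without overstating them, the sketch below turns a `ttl_diagnostics` object into a caption. The helper name `ttl_caption` and the caption wording are assumptions; the code passes the `candidate` value through unchanged and does not try to reproduce how the audit derives it.

```python
from typing import Any, Dict


def ttl_caption(ttl: Dict[str, Any]) -> str:
    """Render a cautious caption for ttl_diagnostics (hypothetical helper)."""
    candidate = ttl.get("candidate")
    span = ttl.get("timestamped_cache_record_span_seconds")

    # A null candidate or span means there is no local interval evidence
    # to show; do not substitute zero.
    if candidate is None or span is None:
        return "No local interval evidence available."

    return (
        f"Local interval candidate {candidate} (span {span:.0f}s); "
        "heuristic, not provider TTL state."
    )
```

The caption keeps the "local, heuristic" qualifier attached to the value so that it cannot be read as a provider-side TTL measurement.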
+## Headroom diagnostics
+Historical transcript scans do not carry live context-window state. `headroom_diagnostics` therefore keeps headroom `missing`/`unavailable`, sets `historical_total_tokens_are_not_headroom: true`, and names `live_statusline_snapshot` as the required observation.
+## GUI binding guidance
+1. Bind to `schema_version` and tolerate future additive fields only after reviewing the schema docs.
+2. Display `status`, `confidence`, `evidence`, and `heuristic` near any cache recommendation.
+3. Separate observed token telemetry under `observations.cache_fields` from inferred ratios and hypotheses.
+4. Preserve TTL/headroom caveats in UI copy.
+5. Hide provider-cache or headroom widgets when evidence is `unavailable` instead of treating missing data as zero.
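The binding steps above can be sketched as a small view-model builder. This is an illustrative consumer-side example only: `cache_widget_model` and the keys of its return value are hypothetical, while the input field names follow the tables in this document.

```python
from typing import Any, Dict, Optional


def cache_widget_model(diag: Optional[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    """Return display data for a cache widget, or None to hide it (hypothetical helper)."""
    # Hide the widget instead of rendering missing data as zero.
    if diag is None:
        return None
    if diag.get("status") == "missing" or diag.get("evidence") == "unavailable":
        return None

    observations = diag.get("observations") or {}
    return {
        # Shown next to any cache recommendation.
        "status": diag.get("status"),
        "confidence": diag.get("confidence"),
        "evidence": diag.get("evidence"),
        "heuristic": diag.get("heuristic", True),
        # Observed telemetry is kept apart from inferred values.
        "observed": observations.get("cache_fields"),
        "inferred": diag.get("derived_ratios"),
        # Caveats travel with the data so the UI can preserve them.
        "caveats": list(diag.get("caveats") or []),
    }
```

Keeping `observed` and `inferred` as separate keys lets the UI style them differently, which matches the separation between token telemetry and derived ratios described above.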
+## Claim boundaries
+`cache_diagnostics` and the sibling `cache_layout_advice` can help users reorganize prompts, find volatile prefix segments, and identify missing evidence or next checks. They do not guarantee savings, do not verify provider cache state, are not billing authority, do not prove provider cache hits, and do not infer live headroom from historical token totals.