PyPI - contexttrace - Versions diffs - 0.2.0__tar.gz → 0.3.0__tar.gz - Mend

contexttrace 0.2.0__tar.gz → 0.3.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (51)

{contexttrace-0.2.0 → contexttrace-0.3.0}/MANIFEST.in RENAMED

@@ -3,6 +3,7 @@ include pyproject.toml
 include setup.py
 include contexttrace/py.typed
 recursive-include contexttrace *.py
+recursive-include contexttrace/verify *.json
 prune build
 prune dist
 prune tests

{contexttrace-0.2.0 → contexttrace-0.3.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: contexttrace
-Version: 0.2.0
+Version: 0.3.0
 Summary: Local-first SDK and CLI for RAG and agent reliability tracing, citation checks, and failure diagnosis.
 Author: ContextTrace contributors
 License: MIT
@@ -145,17 +145,21 @@ contexttrace verify trace.json --report --out reports/example.html
 contexttrace verify trace.json --mode semantic
 contexttrace verify trace.json --fail-on unsupported --fail-on citation_mismatch
 contexttrace verify-benchmark --mode semantic
+contexttrace verify-benchmark --mode semantic --report
+contexttrace verify-benchmark --case-set external --mode semantic --report
 ```
 Input requires `query`, `answer`, and `contexts` with `id` and `text`. Optional `citations` are checked to catch cited sources that do not actually support the matched claim.
 `verify-demo` uses bundled demo traces, so it works immediately after `pip install contexttrace`. Available demos include `unsupported_claim`, `partial_support`, `citation_mismatch`, `should_abstain`, and `supported_answer`.
-Use `--mode semantic` for local paraphrase-aware matching, and `verify-benchmark` to inspect bundled precision/recall metrics.
+Use `--mode semantic` for local paraphrase-aware matching, and `verify-benchmark` to inspect bundled precision/recall metrics. The default benchmark includes 32 real ContextTrace docs and release-artifact cases. `--case-set external` adds public OSS documentation and GitHub issue cases from Qdrant, Chroma, Haystack, and LangChain, while `--case-set all` runs both packs. `--report` writes an HTML report with misses to inspect.
-ContextTrace verifies whether each generated claim is actually supported by retrieved evidence. Instead of only showing a trace or a score, it tells you where the evidence chain broke: unsupported claim, citation mismatch, insufficient context, or should-have-abstained.
+Verification output includes evidence span offsets, stable span hashes, multiple supporting spans, typed matched/missing facts, and claim-level root causes so partial support failures are easier to inspect.
-The v0.2.0 verifier uses local lexical heuristics by default. Claim extraction is rule-based, contradiction detection is conservative, and semantic or LLM-judge support can be added later.
+ContextTrace verifies whether each generated claim is actually supported by retrieved evidence. Instead of only showing a trace or a score, it tells you where the evidence chain broke: unsupported claim, citation mismatch, retrieval miss, answer overreach, conflicting context, or should-have-abstained.
+The v0.3.0 verifier uses local lexical heuristics by default. Claim extraction is rule-based, contradiction detection is conservative, and semantic or LLM-judge support can be added later.
 ## What It Catches
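
The input contract stated in the hunk above (`query`, `answer`, and `contexts` entries carrying `id` and `text`) can be sketched as a small script that builds a trace file and checks its shape before it is passed to `contexttrace verify`. This is only an illustration under stated assumptions: the helper `missing_trace_fields` is hypothetical and not part of the package, the sample values are invented, and the optional `citations` field is left out because its structure is not shown in this diff.

```python
import json
import tempfile
from pathlib import Path

# Field names taken from the README text in the diff.
REQUIRED_TOP_LEVEL = ("query", "answer", "contexts")
REQUIRED_CONTEXT_KEYS = ("id", "text")


def missing_trace_fields(trace: dict) -> list:
    """Hypothetical helper: list missing required fields (empty list means OK)."""
    problems = [key for key in REQUIRED_TOP_LEVEL if key not in trace]
    for index, context in enumerate(trace.get("contexts", [])):
        for key in REQUIRED_CONTEXT_KEYS:
            if key not in context:
                problems.append("contexts[%d].%s" % (index, key))
    return problems


# Invented sample data, for illustration only.
trace = {
    "query": "What is the default evidence scoring mode?",
    "answer": "The default mode is lexical.",
    "contexts": [
        {"id": "doc-1", "text": "The verify-benchmark command defaults to lexical mode."},
    ],
}

problems = missing_trace_fields(trace)

# Write the trace to a temporary directory so the sketch leaves no files behind
# in the working tree.
trace_path = Path(tempfile.mkdtemp()) / "trace.json"
trace_path.write_text(json.dumps(trace, indent=2), encoding="utf-8")
print("problems:", problems)
print("wrote:", trace_path)
```

A file written this way could then be passed to the documented command, for example `contexttrace verify trace.json --mode semantic`.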

{contexttrace-0.2.0 → contexttrace-0.3.0}/README.md RENAMED

@@ -88,17 +88,21 @@ contexttrace verify trace.json --report --out reports/example.html
 contexttrace verify trace.json --mode semantic
 contexttrace verify trace.json --fail-on unsupported --fail-on citation_mismatch
 contexttrace verify-benchmark --mode semantic
+contexttrace verify-benchmark --mode semantic --report
+contexttrace verify-benchmark --case-set external --mode semantic --report
 ```
 Input requires `query`, `answer`, and `contexts` with `id` and `text`. Optional `citations` are checked to catch cited sources that do not actually support the matched claim.
 `verify-demo` uses bundled demo traces, so it works immediately after `pip install contexttrace`. Available demos include `unsupported_claim`, `partial_support`, `citation_mismatch`, `should_abstain`, and `supported_answer`.
-Use `--mode semantic` for local paraphrase-aware matching, and `verify-benchmark` to inspect bundled precision/recall metrics.
+Use `--mode semantic` for local paraphrase-aware matching, and `verify-benchmark` to inspect bundled precision/recall metrics. The default benchmark includes 32 real ContextTrace docs and release-artifact cases. `--case-set external` adds public OSS documentation and GitHub issue cases from Qdrant, Chroma, Haystack, and LangChain, while `--case-set all` runs both packs. `--report` writes an HTML report with misses to inspect.
-ContextTrace verifies whether each generated claim is actually supported by retrieved evidence. Instead of only showing a trace or a score, it tells you where the evidence chain broke: unsupported claim, citation mismatch, insufficient context, or should-have-abstained.
+Verification output includes evidence span offsets, stable span hashes, multiple supporting spans, typed matched/missing facts, and claim-level root causes so partial support failures are easier to inspect.
-The v0.2.0 verifier uses local lexical heuristics by default. Claim extraction is rule-based, contradiction detection is conservative, and semantic or LLM-judge support can be added later.
+ContextTrace verifies whether each generated claim is actually supported by retrieved evidence. Instead of only showing a trace or a score, it tells you where the evidence chain broke: unsupported claim, citation mismatch, retrieval miss, answer overreach, conflicting context, or should-have-abstained.
+The v0.3.0 verifier uses local lexical heuristics by default. Claim extraction is rule-based, contradiction detection is conservative, and semantic or LLM-judge support can be added later.
 ## What It Catches
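
The README refers to per-label precision/recall metrics, and the CLI prints `precision`, `recall`, `f1`, `tp`, `fp`, and `fn` columns for each label. The package's own computation is not visible in this diff, so the following is a generic sketch of how such counts are commonly derived from expected and predicted label sets. The function name `per_label_metrics`, the sample rows, and the choice to report `0.0` when a denominator is zero are all assumptions.

```python
from collections import defaultdict


def per_label_metrics(rows):
    """Hypothetical sketch: multi-label precision/recall/F1 from expected vs predicted."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for row in rows:
        expected, predicted = set(row["expected"]), set(row["predicted"])
        for label in expected & predicted:
            counts[label]["tp"] += 1  # predicted and expected
        for label in predicted - expected:
            counts[label]["fp"] += 1  # predicted but not expected
        for label in expected - predicted:
            counts[label]["fn"] += 1  # expected but not predicted

    metrics = {}
    for label, c in counts.items():
        tp, fp, fn = c["tp"], c["fp"], c["fn"]
        # Assumption: report 0.0 rather than raising when a denominator is zero.
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
        metrics[label] = {"precision": precision, "recall": recall, "f1": f1,
                          "tp": tp, "fp": fp, "fn": fn}
    return metrics


# Invented rows; the label names are borrowed from the --fail-on examples.
rows = [
    {"id": "a", "expected": ["unsupported"], "predicted": ["unsupported"]},
    {"id": "b", "expected": ["unsupported"], "predicted": ["citation_mismatch"]},
    {"id": "c", "expected": ["citation_mismatch"], "predicted": ["citation_mismatch"]},
]
metrics = per_label_metrics(rows)
for label in sorted(metrics):
    print(label, metrics[label])
```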

contexttrace-0.3.0/contexttrace/_version.py ADDED

@@ -0,0 +1 @@
+__version__ = "0.3.0"

{contexttrace-0.2.0 → contexttrace-0.3.0}/contexttrace/cli.py RENAMED

@@ -29,7 +29,7 @@ from contexttrace.verify import (
     load_verify_demo,
     verify_trace,
 )
-from contexttrace.verify.benchmark import run_verify_benchmark
+from contexttrace.verify.benchmark import run_verify_benchmark, write_verify_benchmark_report
 from contexttrace.verify.report import VerifyReportGenerator
 from contexttrace.viewer import serve_viewer
@@ -288,18 +288,31 @@ def verify_demo_command(
 @cli.command("verify-benchmark")
 @click.option("--mode", default="lexical", show_default=True, type=click.Choice(["lexical", "semantic"]), help="Evidence scoring mode.")
+@click.option("--case-set", default="contexttrace", show_default=True, type=click.Choice(["contexttrace", "external", "all"]), help="Benchmark case set to run.")
 @click.option("--json", "json_output", is_flag=True, help="Print benchmark results as JSON.")
-def verify_benchmark_command(mode: str, json_output: bool) -> int:
+@click.option("--report", is_flag=True, help="Generate a local HTML benchmark report.")
+@click.option("--out", default=None, help="HTML benchmark report path. Implies --report when provided.")
+def verify_benchmark_command(mode: str, case_set: str, json_output: bool, report: bool, out: Optional[str]) -> int:
     """Run the bundled verification precision/recall benchmark."""
-    result = run_verify_benchmark(mode=mode)
+    result = run_verify_benchmark(mode=mode, case_set=case_set)
+    written_report = None
+    if report or out:
+        output_path = out or str(Path(".contexttrace") / "reports" / ("verify_benchmark_%s.html" % mode))
+        written_report = write_verify_benchmark_report(result, path=output_path)
     if json_output:
+        if written_report:
+            click.echo("Report: %s" % written_report, err=True)
         click.echo(json.dumps(result, indent=2))
         return 0
     click.echo("Mode: %s" % result["mode"])
+    click.echo("Case source: %s" % result["case_source"])
     click.echo("Cases: %s" % result["cases"])
     click.echo("Exact match rate: %.3f" % float(result["exact_match_rate"]))
+    click.echo("Verdict match rate: %.3f" % float(result["verdict_match_rate"]))
+    click.echo("Citation match rate: %.3f" % float(result["citation_match_rate"]))
+    click.echo("Abstention match rate: %.3f" % float(result["abstention_match_rate"]))
     click.echo("label\tprecision\trecall\tf1\ttp\tfp\tfn")
     for label, metrics in result["per_label"].items():
         click.echo(
@@ -322,6 +335,8 @@ def verify_benchmark_command(mode: str, json_output: bool) -> int:
                 "- %s expected=%s predicted=%s"
                 % (row["id"], ",".join(row["expected"]), ",".join(row["predicted"]))
             )
+    if written_report:
+        click.echo("Report: %s" % written_report)
     return 0
@@ -365,6 +380,7 @@ def _print_verify_result(
     click.echo("Unsupported claim rate: %.3f" % float(summary["unsupported_claim_rate"]))
     click.echo("Citation mismatches: %s" % summary["citation_mismatches"])
     click.echo("Failure type: %s" % summary["failure_type"])
+    click.echo("Primary root cause: %s" % summary.get("primary_root_cause", "unknown"))
     click.echo("Should abstain: %s" % str(summary["should_abstain"]).lower())
     click.echo("Suggested fix: %s" % summary["suggested_fix"])
     if written_report:
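
The report-path rule added to `verify_benchmark_command` can be isolated as a small function: a report is written when either `--report` or `--out` is given, and the default location depends only on `--mode`. The function name `resolve_benchmark_report_path` is hypothetical and exists only for this sketch; the default path expression is copied from the hunk above.

```python
from pathlib import Path
from typing import Optional


def resolve_benchmark_report_path(report: bool, out: Optional[str], mode: str) -> Optional[str]:
    """Hypothetical helper mirroring the `if report or out:` branch in the diff."""
    if not (report or out):
        return None  # neither flag given: no report is written
    # An explicit --out wins; otherwise fall back to the mode-specific default.
    return out or str(Path(".contexttrace") / "reports" / ("verify_benchmark_%s.html" % mode))


print(resolve_benchmark_report_path(False, None, "lexical"))
print(resolve_benchmark_report_path(True, None, "semantic"))
print(resolve_benchmark_report_path(False, "reports/bench.html", "semantic"))
```

As written in the hunk, the default filename includes the mode but not the case set, so runs with `--case-set contexttrace` and `--case-set external` under the same mode appear to target the same default file unless `--out` is supplied.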
