contexttrace (PyPI): 0.3.0.tar.gz → 0.4.0.tar.gz

Files changed (52)

{contexttrace-0.3.0 → contexttrace-0.4.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: contexttrace
-Version: 0.3.0
+Version: 0.4.0
 Summary: Local-first SDK and CLI for RAG and agent reliability tracing, citation checks, and failure diagnosis.
 Author: ContextTrace contributors
 License: MIT
@@ -147,6 +147,9 @@ contexttrace verify trace.json --fail-on unsupported --fail-on citation_mismatch
 contexttrace verify-benchmark --mode semantic
 contexttrace verify-benchmark --mode semantic --report
 contexttrace verify-benchmark --case-set external --mode semantic --report
+contexttrace compare baseline.json current.json
+contexttrace compare baseline.json current.json --report
+contexttrace compare baseline.json current.json --fail-on new_failure
 ```
 Input requires `query`, `answer`, and `contexts` with `id` and `text`. Optional `citations` are checked to catch cited sources that do not actually support the matched claim.
@@ -159,7 +162,9 @@ Verification output includes evidence span offsets, stable span hashes, multiple
 ContextTrace verifies whether each generated claim is actually supported by retrieved evidence. Instead of only showing a trace or a score, it tells you where the evidence chain broke: unsupported claim, citation mismatch, retrieval miss, answer overreach, conflicting context, or should-have-abstained.
-The v0.3.0 verifier uses local lexical heuristics by default. Claim extraction is rule-based, contradiction detection is conservative, and semantic or LLM-judge support can be added later.
+Use `contexttrace compare baseline.json current.json` to diff two portable traces or saved `verify --json` outputs. It reports support-rate deltas, new unsupported claims, citation regressions, should-abstain flips, and new root causes, with `--fail-on` gates for CI.
+The v0.4.0 verifier uses local lexical heuristics by default. Claim extraction is rule-based, contradiction detection is conservative, and semantic or LLM-judge support can be added later.
 ## What It Catches

{contexttrace-0.3.0 → contexttrace-0.4.0}/README.md RENAMED

@@ -90,6 +90,9 @@ contexttrace verify trace.json --fail-on unsupported --fail-on citation_mismatch
 contexttrace verify-benchmark --mode semantic
 contexttrace verify-benchmark --mode semantic --report
 contexttrace verify-benchmark --case-set external --mode semantic --report
+contexttrace compare baseline.json current.json
+contexttrace compare baseline.json current.json --report
+contexttrace compare baseline.json current.json --fail-on new_failure
 ```
 Input requires `query`, `answer`, and `contexts` with `id` and `text`. Optional `citations` are checked to catch cited sources that do not actually support the matched claim.
@@ -102,7 +105,9 @@ Verification output includes evidence span offsets, stable span hashes, multiple
 ContextTrace verifies whether each generated claim is actually supported by retrieved evidence. Instead of only showing a trace or a score, it tells you where the evidence chain broke: unsupported claim, citation mismatch, retrieval miss, answer overreach, conflicting context, or should-have-abstained.
-The v0.3.0 verifier uses local lexical heuristics by default. Claim extraction is rule-based, contradiction detection is conservative, and semantic or LLM-judge support can be added later.
+Use `contexttrace compare baseline.json current.json` to diff two portable traces or saved `verify --json` outputs. It reports support-rate deltas, new unsupported claims, citation regressions, should-abstain flips, and new root causes, with `--fail-on` gates for CI.
+The v0.4.0 verifier uses local lexical heuristics by default. Claim extraction is rule-based, contradiction detection is conservative, and semantic or LLM-judge support can be added later.
 ## What It Catches
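
The `--fail-on` gates added to the README can be wired into a CI step. The following is a minimal sketch, not part of the package: `build_compare_argv` and `run_gate` are hypothetical helper names, the `contexttrace` executable is assumed to be on `PATH`, and the process exit status is assumed to be nonzero when a gate fires. The diff shows `compare_command` returning 1 in that case, but not how the entry point handles that return value, so check this against the installed CLI.

```python
from __future__ import annotations

import subprocess
from typing import Sequence


def build_compare_argv(
    baseline: str,
    current: str,
    fail_on: Sequence[str] = (),
    report: bool = False,
) -> list[str]:
    """Build the command line for `contexttrace compare` (hypothetical helper)."""
    argv = ["contexttrace", "compare", baseline, current]
    if report:
        argv.append("--report")
    # --fail-on is a repeatable option, so emit one flag per rule.
    for rule in fail_on:
        argv.extend(["--fail-on", rule])
    return argv


def run_gate(baseline: str, current: str, fail_on: Sequence[str] = ("new_failure",)) -> int:
    """Run the comparison and return the process exit status.

    Assumption: a fired gate yields a nonzero exit status.
    """
    completed = subprocess.run(build_compare_argv(baseline, current, fail_on), check=False)
    return completed.returncode
```

A CI script could then call `sys.exit(run_gate("baseline.json", "current.json"))` so the job fails when the comparison reports a new failure.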

contexttrace-0.4.0/contexttrace/_version.py ADDED

@@ -0,0 +1 @@
+__version__ = "0.4.0"

{contexttrace-0.3.0 → contexttrace-0.4.0}/contexttrace/cli.py RENAMED

@@ -24,12 +24,15 @@ from contexttrace.storage import SQLiteTraceStore
 from contexttrace.thresholds import parse_thresholds, threshold_failures
 from contexttrace.verify import (
     VerificationInputError,
+    compare_failures,
+    compare_trace_files,
     list_verify_demos,
     load_trace_file,
     load_verify_demo,
     verify_trace,
 )
 from contexttrace.verify.benchmark import run_verify_benchmark, write_verify_benchmark_report
+from contexttrace.verify.compare_report import CompareReportGenerator
 from contexttrace.verify.report import VerifyReportGenerator
 from contexttrace.viewer import serve_viewer
@@ -340,6 +343,67 @@ def verify_benchmark_command(mode: str, case_set: str, json_output: bool, report
     return 0
+@cli.command("compare")
+@click.argument("baseline_json")
+@click.argument("current_json")
+@click.option("--json", "json_output", is_flag=True, help="Print the full comparison result as JSON.")
+@click.option("--report", is_flag=True, help="Generate a local HTML regression report.")
+@click.option("--out", default=None, help="HTML report path. Implies --report when provided.")
+@click.option("--mode", default="lexical", show_default=True, type=click.Choice(["lexical", "semantic"]), help="Evidence scoring mode for raw trace inputs.")
+@click.option("--fail-on", multiple=True, help="Fail on new_failure, new_unsupported, new_citation_mismatch, should_abstain_flip, support_rate_drop, new_root_cause, or any_regression.")
+def compare_command(
+    baseline_json: str,
+    current_json: str,
+    json_output: bool,
+    report: bool,
+    out: Optional[str],
+    mode: str,
+    fail_on: tuple[str, ...],
+) -> int:
+    """Compare two portable RAG traces or verification JSON outputs."""
+    try:
+        result = compare_trace_files(baseline_json, current_json, mode=mode)
+    except VerificationInputError as exc:
+        raise click.ClickException(str(exc)) from exc
+    written_report = None
+    if report or out:
+        default_name = "%s_vs_%s_compare.html" % (Path(baseline_json).stem, Path(current_json).stem)
+        output_path = out or str(Path(".contexttrace") / "reports" / default_name)
+        written_report = CompareReportGenerator().generate(result, path=output_path)
+    fail_messages = compare_failures(result, fail_on)
+    if json_output:
+        if written_report:
+            click.echo("Report: %s" % written_report, err=True)
+        click.echo(json.dumps(result, indent=2))
+        for message in fail_messages:
+            click.echo("Comparison failed: %s" % message, err=True)
+        return 1 if fail_messages else 0
+    summary = result["summary"]
+    click.echo("Regression: %s" % str(summary["regression"]).lower())
+    click.echo("Support rate: %.3f -> %.3f (%+.3f)" % (
+        float(summary.get("support_rate_before") or 0.0),
+        float(summary.get("support_rate_after") or 0.0),
+        float(summary.get("support_rate_delta") or 0.0),
+    ))
+    click.echo("Unsupported claim rate delta: %+.3f" % float(summary.get("unsupported_claim_rate_delta") or 0.0))
+    click.echo("Citation mismatch delta: %+d" % int(summary.get("citation_mismatch_delta") or 0))
+    click.echo("New failures: %s" % summary["new_failures"])
+    click.echo("Resolved failures: %s" % summary["resolved_failures"])
+    click.echo("Added claims: %s" % summary["added_claims"])
+    click.echo("Removed claims: %s" % summary["removed_claims"])
+    click.echo("Changed claims: %s" % summary["changed_claims"])
+    click.echo("New root causes: %s" % (", ".join(summary.get("new_root_causes") or []) or "none"))
+    if written_report:
+        click.echo("Report: %s" % written_report)
+    for message in fail_messages:
+        click.echo("Comparison failed: %s" % message, err=True)
+    return 1 if fail_messages else 0
 def _write_verify_report(
     result: dict,
     trace: object,
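
The `--fail-on` help text above lists seven rule names. In this release, `compare_failures` in `verify/compare.py` also accepts `root_cause_regression` as an alias of `new_root_cause`, and it lowercases each rule and replaces hyphens with underscores before matching. The sketch below restates that gate logic as a lookup table. It is an illustration under stated assumptions: `RULES` and `triggered_rules` are hypothetical names, and the sketch returns normalized rule names rather than the human-readable messages the package emits.

```python
from __future__ import annotations

from typing import Any, Callable

# Rule name -> predicate over the comparison summary. The summary keys are the
# ones built by _summary() in compare.py; the table layout itself is an
# illustrative choice, not the package's implementation.
RULES: dict[str, Callable[[dict[str, Any]], bool]] = {
    "new_failure": lambda s: int(s.get("new_failures") or 0) > 0,
    "new_unsupported": lambda s: int(s.get("new_unsupported") or 0) > 0,
    "new_citation_mismatch": lambda s: int(s.get("new_citation_mismatches") or 0) > 0,
    "should_abstain_flip": lambda s: bool(s.get("should_abstain_regressed")),
    "support_rate_drop": lambda s: float(s.get("support_rate_delta") or 0.0) < 0,
    "new_root_cause": lambda s: bool(s.get("new_root_causes")),
    "root_cause_regression": lambda s: bool(s.get("new_root_causes")),
    "any_regression": lambda s: bool(s.get("regression")),
}


def triggered_rules(summary: dict[str, Any], fail_on: tuple[str, ...]) -> list[str]:
    """Return the normalized rules that fire, plus a note for unknown rules."""
    hits = []
    for raw_rule in fail_on:
        # Same normalization as compare_failures: trim, lowercase, hyphen -> underscore.
        rule = raw_rule.strip().lower().replace("-", "_")
        check = RULES.get(rule)
        if check is None:
            hits.append("unknown rule %s" % raw_rule)
        elif check(summary):
            hits.append(rule)
    return hits
```

As in the original function, an unknown rule is reported alongside the fired rules, so a typo in a `--fail-on` value also produces a failure message.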

{contexttrace-0.3.0 → contexttrace-0.4.0}/contexttrace/verify/__init__.py RENAMED

@@ -1,4 +1,5 @@
 from contexttrace.verify.runner import verify_trace, verify_trace_file
+from contexttrace.verify.compare import compare_failures, compare_trace_files, compare_verifications
 from contexttrace.verify.schema import (
     RAGTrace,
     TraceCitation,
@@ -13,6 +14,9 @@ __all__ = [
     "TraceCitation",
     "TraceContext",
     "VerificationInputError",
+    "compare_failures",
+    "compare_trace_files",
+    "compare_verifications",
     "list_verify_demos",
     "load_trace_file",
     "load_verify_demo",

contexttrace-0.4.0/contexttrace/verify/compare.py ADDED

@@ -0,0 +1,445 @@
+from __future__ import annotations
+import json
+from pathlib import Path
+from typing import Any
+from contexttrace.verify.evidence import lexical_score
+from contexttrace.verify.runner import verify_trace
+from contexttrace.verify.schema import VerificationInputError, load_trace
+FAILURE_VERDICTS = {"partially_supported", "unsupported", "contradicted", "unverifiable"}
+BAD_CITATIONS = {
+    "cited_source_missing",
+    "cited_source_does_not_support_claim",
+    "claim_supported_by_different_source",
+}
+NO_ROOT_CAUSE = "no_failure_detected"
+MATCH_THRESHOLD = 0.58
+def compare_trace_files(
+    baseline_path: str | Path,
+    current_path: str | Path,
+    *,
+    mode: str = "lexical",
+) -> dict[str, Any]:
+    baseline = load_compare_input(baseline_path, mode=mode)
+    current = load_compare_input(current_path, mode=mode)
+    return compare_verifications(baseline, current, mode=mode)
+def load_compare_input(path: str | Path, *, mode: str = "lexical") -> dict[str, Any]:
+    input_path = Path(path)
+    try:
+        payload = json.loads(input_path.read_text(encoding="utf-8"))
+    except OSError as exc:
+        raise VerificationInputError("Could not read compare input %s: %s" % (input_path, exc)) from exc
+    except json.JSONDecodeError as exc:
+        raise VerificationInputError(
+            "Invalid JSON in %s at line %s column %s: %s"
+            % (input_path, exc.lineno, exc.colno, exc.msg)
+        ) from exc
+    if _looks_like_verification_result(payload):
+        return _normalize_verified_result(payload, source=str(input_path))
+    trace = load_trace(payload, source=str(input_path))
+    result = verify_trace(trace, mode=mode)
+    result.setdefault("metadata", {})
+    result["metadata"] = {
+        **dict(result.get("metadata") or {}),
+        "compare_input": str(input_path),
+        "compare_input_type": "raw_trace",
+    }
+    return result
+def compare_verifications(
+    baseline: dict[str, Any],
+    current: dict[str, Any],
+    *,
+    mode: str = "lexical",
+) -> dict[str, Any]:
+    baseline_claims = list(baseline.get("claims") or [])
+    current_claims = list(current.get("claims") or [])
+    matches = _match_claims(baseline_claims, current_claims, mode=mode)
+    changes = []
+    matched_baseline = set()
+    matched_current = set()
+    for baseline_index, current_index, score in matches:
+        matched_baseline.add(baseline_index)
+        matched_current.add(current_index)
+        change = _matched_change(
+            baseline_claims[baseline_index],
+            current_claims[current_index],
+            match_score=score,
+        )
+        if change["status"] != "unchanged":
+            changes.append(change)
+    for index, claim in enumerate(current_claims):
+        if index in matched_current:
+            continue
+        changes.append(_single_change("added_failure" if _is_failure(claim) else "added_claim", after=claim))
+    for index, claim in enumerate(baseline_claims):
+        if index in matched_baseline:
+            continue
+        changes.append(_single_change("removed_failure" if _is_failure(claim) else "removed_claim", before=claim))
+    changes = sorted(changes, key=_change_sort_key)
+    summary = _summary(baseline, current, changes)
+    return {
+        "mode": mode,
+        "summary": summary,
+        "changes": changes,
+        "baseline": _run_snapshot(baseline),
+        "current": _run_snapshot(current),
+    }
+def compare_failures(result: dict[str, Any], fail_on: tuple[str, ...]) -> list[str]:
+    if not fail_on:
+        return []
+    summary = result.get("summary") or {}
+    messages = []
+    for raw_rule in fail_on:
+        rule = raw_rule.strip().lower().replace("-", "_")
+        if rule == "new_failure" and int(summary.get("new_failures") or 0) > 0:
+            messages.append("new verification failure detected")
+        elif rule == "new_unsupported" and int(summary.get("new_unsupported") or 0) > 0:
+            messages.append("new unsupported claim detected")
+        elif rule == "new_citation_mismatch" and int(summary.get("new_citation_mismatches") or 0) > 0:
+            messages.append("new citation mismatch detected")
+        elif rule == "should_abstain_flip" and bool(summary.get("should_abstain_regressed")):
+            messages.append("should-abstain changed from false to true")
+        elif rule == "support_rate_drop" and float(summary.get("support_rate_delta") or 0.0) < 0:
+            messages.append("support rate dropped")
+        elif rule in {"new_root_cause", "root_cause_regression"} and summary.get("new_root_causes"):
+            messages.append("new root cause detected")
+        elif rule == "any_regression" and bool(summary.get("regression")):
+            messages.append("verification regression detected")
+        elif rule not in {
+            "new_failure",
+            "new_unsupported",
+            "new_citation_mismatch",
+            "should_abstain_flip",
+            "support_rate_drop",
+            "new_root_cause",
+            "root_cause_regression",
+            "any_regression",
+        }:
+            messages.append("unknown --fail-on rule %s" % raw_rule)
+    return messages
+def _looks_like_verification_result(payload: Any) -> bool:
+    return (
+        isinstance(payload, dict)
+        and isinstance(payload.get("summary"), dict)
+        and isinstance(payload.get("claims"), list)
+    )
+def _normalize_verified_result(payload: dict[str, Any], *, source: str) -> dict[str, Any]:
+    result = dict(payload)
+    result.setdefault("metadata", {})
+    result["metadata"] = {
+        **dict(result.get("metadata") or {}),
+        "compare_input": source,
+        "compare_input_type": "verification_result",
+    }
+    return result
+def _match_claims(
+    baseline_claims: list[dict[str, Any]],
+    current_claims: list[dict[str, Any]],
+    *,
+    mode: str,
+) -> list[tuple[int, int, float]]:
+    candidates = []
+    for baseline_index, baseline_claim in enumerate(baseline_claims):
+        for current_index, current_claim in enumerate(current_claims):
+            score = _claim_similarity(
+                str(baseline_claim.get("claim") or ""),
+                str(current_claim.get("claim") or ""),
+                mode=mode,
+            )
+            if score >= MATCH_THRESHOLD:
+                candidates.append((score, baseline_index, current_index))
+    matches = []
+    used_baseline = set()
+    used_current = set()
+    for score, baseline_index, current_index in sorted(candidates, reverse=True):
+        if baseline_index in used_baseline or current_index in used_current:
+            continue
+        used_baseline.add(baseline_index)
+        used_current.add(current_index)
+        matches.append((baseline_index, current_index, score))
+    return matches
+def _claim_similarity(left: str, right: str, *, mode: str) -> float:
+    if _normalize_text(left) == _normalize_text(right):
+        return 1.0
+    forward, _ = lexical_score(left, right, mode=mode)
+    reverse, _ = lexical_score(right, left, mode=mode)
+    return max(forward, reverse)
+def _matched_change(
+    before_claim: dict[str, Any],
+    after_claim: dict[str, Any],
+    *,
+    match_score: float,
+) -> dict[str, Any]:
+    before_failure = _is_failure(before_claim)
+    after_failure = _is_failure(after_claim)
+    before_severity = _severity(before_claim)
+    after_severity = _severity(after_claim)
+    before_citation = _citation_severity(before_claim)
+    after_citation = _citation_severity(after_claim)
+    before_root = _root_label(before_claim)
+    after_root = _root_label(after_claim)
+    if not before_failure and after_failure:
+        status = "new_failure"
+    elif before_failure and not after_failure:
+        status = "resolved_failure"
+    elif after_severity > before_severity:
+        status = "verdict_regressed"
+    elif after_severity < before_severity:
+        status = "verdict_improved"
+    elif after_citation > before_citation:
+        status = "citation_regressed"
+    elif after_citation < before_citation:
+        status = "citation_improved"
+    elif before_root != after_root and after_root != NO_ROOT_CAUSE:
+        status = "root_cause_regressed"
+    elif before_root != after_root:
+        status = "root_cause_changed"
+    elif _context_id(before_claim) != _context_id(after_claim):
+        status = "source_changed"
+    elif _normalize_text(str(before_claim.get("claim") or "")) != _normalize_text(str(after_claim.get("claim") or "")):
+        status = "claim_changed"
+    else:
+        status = "unchanged"
+    return {
+        "status": status,
+        "claim": str(after_claim.get("claim") or before_claim.get("claim") or ""),
+        "match_score": round(match_score, 3),
+        "before": _claim_snapshot(before_claim),
+        "after": _claim_snapshot(after_claim),
+        "suggested_fix": _suggested_fix(after_claim, status=status),
+    }
+def _single_change(
+    status: str,
+    *,
+    before: dict[str, Any] | None = None,
+    after: dict[str, Any] | None = None,
+) -> dict[str, Any]:
+    claim = after or before or {}
+    return {
+        "status": status,
+        "claim": str(claim.get("claim") or ""),
+        "match_score": None,
+        "before": _claim_snapshot(before) if before else None,
+        "after": _claim_snapshot(after) if after else None,
+        "suggested_fix": _suggested_fix(claim, status=status),
+    }
+def _summary(
+    baseline: dict[str, Any],
+    current: dict[str, Any],
+    changes: list[dict[str, Any]],
+) -> dict[str, Any]:
+    baseline_summary = dict(baseline.get("summary") or {})
+    current_summary = dict(current.get("summary") or {})
+    new_failures = [change for change in changes if change["status"] in {"new_failure", "added_failure", "verdict_regressed", "citation_regressed", "root_cause_regressed"}]
+    resolved_failures = [change for change in changes if change["status"] in {"resolved_failure", "removed_failure", "verdict_improved", "citation_improved"}]
+    new_unsupported = [
+        change
+        for change in new_failures
+        if ((change.get("after") or {}).get("verdict") in {"unsupported", "contradicted"})
+    ]
+    new_citations = [
+        change
+        for change in new_failures
+        if _citation_status_from_snapshot(change.get("after")) in BAD_CITATIONS
+    ]
+    before_abstain = bool((baseline.get("abstention") or {}).get("should_abstain") or baseline_summary.get("should_abstain"))
+    after_abstain = bool((current.get("abstention") or {}).get("should_abstain") or current_summary.get("should_abstain"))
+    support_delta = _delta(current_summary.get("support_rate"), baseline_summary.get("support_rate"))
+    unsupported_delta = _delta(current_summary.get("unsupported_claim_rate"), baseline_summary.get("unsupported_claim_rate"))
+    citation_delta = int(current_summary.get("citation_mismatches") or 0) - int(baseline_summary.get("citation_mismatches") or 0)
+    new_root_causes = sorted(
+        {
+            _root_from_snapshot(change.get("after"))
+            for change in new_failures
+            if _root_from_snapshot(change.get("after")) != NO_ROOT_CAUSE
+        }
+    )
+    resolved_root_causes = sorted(
+        {
+            _root_from_snapshot(change.get("before"))
+            for change in resolved_failures
+            if _root_from_snapshot(change.get("before")) != NO_ROOT_CAUSE
+        }
+    )
+    regression = bool(
+        new_failures
+        or support_delta < 0
+        or unsupported_delta > 0
+        or citation_delta > 0
+        or (not before_abstain and after_abstain)
+    )
+    return {
+        "regression": regression,
+        "improved": bool(resolved_failures and not regression),
+        "support_rate_before": _number(baseline_summary.get("support_rate")),
+        "support_rate_after": _number(current_summary.get("support_rate")),
+        "support_rate_delta": support_delta,
+        "unsupported_claim_rate_delta": unsupported_delta,
+        "citation_mismatch_delta": citation_delta,
+        "should_abstain_before": before_abstain,
+        "should_abstain_after": after_abstain,
+        "should_abstain_changed": before_abstain != after_abstain,
+        "should_abstain_regressed": (not before_abstain and after_abstain),
+        "new_failures": len(new_failures),
+        "resolved_failures": len(resolved_failures),
+        "new_unsupported": len(new_unsupported),
+        "new_citation_mismatches": len(new_citations),
+        "added_claims": len([change for change in changes if change["status"] in {"added_claim", "added_failure"}]),
+        "removed_claims": len([change for change in changes if change["status"] in {"removed_claim", "removed_failure"}]),
+        "changed_claims": len(changes),
+        "new_root_causes": new_root_causes,
+        "resolved_root_causes": resolved_root_causes,
+    }
+def _run_snapshot(result: dict[str, Any]) -> dict[str, Any]:
+    return {
+        "query": result.get("query"),
+        "answer": result.get("answer"),
+        "summary": result.get("summary") or {},
+        "abstention": result.get("abstention") or {},
+        "metadata": result.get("metadata") or {},
+    }
+def _claim_snapshot(claim: dict[str, Any] | None) -> dict[str, Any] | None:
+    if claim is None:
+        return None
+    root = claim.get("root_cause") or {}
+    return {
+        "claim_id": claim.get("claim_id"),
+        "claim": claim.get("claim"),
+        "verdict": claim.get("verdict"),
+        "confidence": claim.get("confidence"),
+        "best_context_id": claim.get("best_context_id"),
+        "citation_status": claim.get("citation_status"),
+        "root_cause": root.get("label") if isinstance(root, dict) else None,
+        "missing_fact": root.get("missing_fact") if isinstance(root, dict) else None,
+        "closest_evidence": root.get("closest_evidence") if isinstance(root, dict) else claim.get("evidence"),
+        "suggested_fix": root.get("suggested_fix") if isinstance(root, dict) else None,
+    }
+def _is_failure(claim: dict[str, Any]) -> bool:
+    return (
+        str(claim.get("verdict") or "") in FAILURE_VERDICTS
+        or str(claim.get("citation_status") or "") in BAD_CITATIONS
+        or _root_label(claim) != NO_ROOT_CAUSE
+    )
+def _severity(claim: dict[str, Any]) -> int:
+    verdict = str(claim.get("verdict") or "")
+    if verdict in {"unsupported", "contradicted"}:
+        return 3
+    if verdict in {"partially_supported", "unverifiable"}:
+        return 2
+    return 0
+def _citation_severity(claim: dict[str, Any]) -> int:
+    return 1 if str(claim.get("citation_status") or "") in BAD_CITATIONS else 0
+def _root_label(claim: dict[str, Any]) -> str:
+    root = claim.get("root_cause") or {}
+    if isinstance(root, dict):
+        return str(root.get("label") or NO_ROOT_CAUSE)
+    return NO_ROOT_CAUSE
+def _context_id(claim: dict[str, Any]) -> str:
+    return str(claim.get("best_context_id") or "")
+def _root_from_snapshot(snapshot: dict[str, Any] | None) -> str:
+    if not snapshot:
+        return NO_ROOT_CAUSE
+    return str(snapshot.get("root_cause") or NO_ROOT_CAUSE)
+def _citation_status_from_snapshot(snapshot: dict[str, Any] | None) -> str:
+    if not snapshot:
+        return ""
+    return str(snapshot.get("citation_status") or "")
+def _suggested_fix(claim: dict[str, Any], *, status: str) -> str:
+    root = claim.get("root_cause") or {}
+    if isinstance(root, dict) and root.get("suggested_fix"):
+        return str(root["suggested_fix"])
+    if status in {"added_failure", "new_failure", "verdict_regressed"}:
+        return "Inspect the new claim and remove unsupported details or retrieve supporting evidence."
+    if status == "citation_regressed":
+        return "Regenerate claim-level citations and require cited source IDs to support the claim."
+    if status == "source_changed":
+        return "Check whether the new retrieved source is intentional and still supports the claim."
+    return "No automatic fix suggested."
+def _change_sort_key(change: dict[str, Any]) -> tuple[int, str]:
+    priority = {
+        "added_failure": 0,
+        "new_failure": 1,
+        "verdict_regressed": 2,
+        "citation_regressed": 3,
+        "root_cause_regressed": 4,
+        "resolved_failure": 5,
+        "verdict_improved": 6,
+        "citation_improved": 7,
+        "removed_failure": 8,
+        "added_claim": 8,
+        "removed_claim": 9,
+        "source_changed": 10,
+        "claim_changed": 11,
+    }
+    return (priority.get(str(change.get("status")), 99), str(change.get("claim") or ""))
+def _delta(current: Any, baseline: Any) -> float:
+    return round(_number(current) - _number(baseline), 3)
+def _number(value: Any) -> float:
+    try:
+        return round(float(value), 3)
+    except (TypeError, ValueError):
+        return 0.0
+def _normalize_text(text: str) -> str:
+    return " ".join(str(text or "").lower().strip().strip(".!?").split())
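
The claim pairing in `_match_claims` above is a greedy one-to-one assignment: every baseline/current pair scoring at least `MATCH_THRESHOLD` becomes a candidate, candidates are visited from highest to lowest score, and each claim is used at most once. The sketch below reproduces that idea in isolation. Because `lexical_score` is not shown in this diff, a token-set Jaccard overlap stands in for it; that substitution, and the names `similarity` and `greedy_match`, are assumptions made for illustration only.

```python
from __future__ import annotations

MATCH_THRESHOLD = 0.58  # same constant as compare.py


def normalize(text: str) -> str:
    """Lowercase, trim trailing sentence punctuation, collapse whitespace."""
    return " ".join(text.lower().strip().strip(".!?").split())


def similarity(left: str, right: str) -> float:
    """Stand-in scorer: 1.0 for equal normalized text, else Jaccard overlap.

    The package uses lexical_score here; Jaccard is only a placeholder.
    """
    if normalize(left) == normalize(right):
        return 1.0
    a, b = set(normalize(left).split()), set(normalize(right).split())
    return len(a & b) / len(a | b) if (a | b) else 0.0


def greedy_match(baseline: list[str], current: list[str]) -> list[tuple[int, int, float]]:
    """Pair claims one-to-one, taking the highest-scoring candidates first."""
    candidates = []
    for i, before in enumerate(baseline):
        for j, after in enumerate(current):
            score = similarity(before, after)
            if score >= MATCH_THRESHOLD:
                candidates.append((score, i, j))
    matches = []
    used_baseline, used_current = set(), set()
    for score, i, j in sorted(candidates, reverse=True):
        if i in used_baseline or j in used_current:
            continue  # each claim may take part in at most one pair
        used_baseline.add(i)
        used_current.add(j)
        matches.append((i, j, score))
    return matches
```

Claims left unpaired after this pass are the ones `compare_verifications` reports as added or removed. Greedy assignment is simple and deterministic, but it does not guarantee the pairing with the highest total score.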

contexttrace-0.4.0/contexttrace/verify/compare_report.py ADDED

@@ -0,0 +1,386 @@
+from __future__ import annotations
+import json
+from html import escape
+from pathlib import Path
+from typing import Any
+class CompareReportGenerator:
+    def generate(self, result: dict[str, Any], *, path: str) -> str:
+        output_path = Path(path)
+        output_path.parent.mkdir(parents=True, exist_ok=True)
+        output_path.write_text(self.render(result), encoding="utf-8")
+        return str(output_path)
+    def render(self, result: dict[str, Any]) -> str:
+        summary = result.get("summary") or {}
+        changes = list(result.get("changes") or [])
+        return HTML_TEMPLATE.format(
+            verdict_class="bad" if summary.get("regression") else "ok",
+            regression=escape(_string(summary.get("regression"))),
+            mode=escape(_string(result.get("mode") or "lexical")),
+            summary_cards=_summary_cards(summary),
+            change_rows=_change_rows(changes),
+            new_failures=_change_cards(
+                changes,
+                {"added_failure", "new_failure", "verdict_regressed", "citation_regressed", "root_cause_regressed"},
+                empty="No new claim-level verification failures were detected.",
+            ),
+            resolved_failures=_change_cards(
+                changes,
+                {"resolved_failure", "removed_failure", "verdict_improved", "citation_improved"},
+                empty="No previously failing claims were resolved.",
+            ),
+            root_changes=_root_changes(summary),
+            baseline_summary=_run_summary(result.get("baseline") or {}),
+            current_summary=_run_summary(result.get("current") or {}),
+            raw_json=escape(json.dumps(_raw_summary(result), indent=2)),
+        )
+def _summary_cards(summary: dict[str, Any]) -> str:
+    cards = [
+        ("Regression", summary.get("regression")),
+        ("Support Rate Delta", _signed(summary.get("support_rate_delta"))),
+        ("Unsupported Rate Delta", _signed(summary.get("unsupported_claim_rate_delta"))),
+        ("Citation Mismatch Delta", _signed(summary.get("citation_mismatch_delta"))),
+        ("New Failures", summary.get("new_failures", 0)),
+        ("Resolved Failures", summary.get("resolved_failures", 0)),
+        ("New Unsupported", summary.get("new_unsupported", 0)),
+        ("New Citation Mismatches", summary.get("new_citation_mismatches", 0)),
+        ("Added Claims", summary.get("added_claims", 0)),
+        ("Removed Claims", summary.get("removed_claims", 0)),
+        ("Should Abstain Before", summary.get("should_abstain_before")),
+        ("Should Abstain After", summary.get("should_abstain_after")),
+    ]
+    return "\n".join(
+        """
+        <div class="card">
+          <div class="label">{label}</div>
+          <div class="value">{value}</div>
+        </div>
+        """.format(label=escape(label), value=escape(_string(value)))
+        for label, value in cards
+    )
+def _change_rows(changes: list[dict[str, Any]]) -> str:
+    if not changes:
+        return "<tr><td colspan=\"7\" class=\"muted\">No claim-level changes detected.</td></tr>"
+    rows = []
+    for change in changes:
+        before = change.get("before") or {}
+        after = change.get("after") or {}
+        rows.append(
+            """
+            <tr>
+              <td><span class="badge status-{status_class}">{status}</span></td>
+              <td>{claim}</td>
+              <td>{before_verdict}</td>
+              <td>{after_verdict}</td>
+              <td>{before_root}</td>
+              <td>{after_root}</td>
+              <td>{fix}</td>
+            </tr>
+            """.format(
+                status_class=escape(_css_token(change.get("status"))),
+                status=escape(_string(change.get("status"))),
+                claim=escape(_string(change.get("claim"))),
+                before_verdict=escape(_string(before.get("verdict") or "none")),
+                after_verdict=escape(_string(after.get("verdict") or "none")),
+                before_root=escape(_string(before.get("root_cause") or "none")),
+                after_root=escape(_string(after.get("root_cause") or "none")),
+                fix=escape(_string(change.get("suggested_fix"))),
+            )
+        )
+    return "\n".join(rows)
+def _change_cards(changes: list[dict[str, Any]], statuses: set[str], *, empty: str) -> str:
+    selected = [change for change in changes if change.get("status") in statuses]
+    if not selected:
+        return "<p class=\"muted\">%s</p>" % escape(empty)
+    return "\n".join(_change_card(change) for change in selected)
+def _change_card(change: dict[str, Any]) -> str:
+    before = change.get("before") or {}
+    after = change.get("after") or {}
+    active = after or before
+    return """
+    <article class="item">
+      <div class="item-meta">{status} | match {match_score}</div>
+      <h3>{claim}</h3>
+      <p><strong>Before:</strong> {before_verdict} | {before_citation} | {before_root}</p>
+      <p><strong>After:</strong> {after_verdict} | {after_citation} | {after_root}</p>
+      <p><strong>Best context:</strong> {context_id}</p>
+      <p><strong>Closest evidence:</strong> {evidence}</p>
+      <p><strong>Suggested fix:</strong> {fix}</p>
+    </article>
+    """.format(
+        status=escape(_string(change.get("status"))),
+        match_score=escape(_string(change.get("match_score") if change.get("match_score") is not None else "new")),
+        claim=escape(_string(change.get("claim"))),
+        before_verdict=escape(_string(before.get("verdict") or "none")),
+        before_citation=escape(_string(before.get("citation_status") or "none")),
+        before_root=escape(_string(before.get("root_cause") or "none")),
+        after_verdict=escape(_string(after.get("verdict") or "none")),
+        after_citation=escape(_string(after.get("citation_status") or "none")),
+        after_root=escape(_string(after.get("root_cause") or "none")),
+        context_id=escape(_string(active.get("best_context_id") or "none")),
+        evidence=escape(_string(active.get("closest_evidence") or "none")),
+        fix=escape(_string(change.get("suggested_fix"))),
+    )
+
+
+def _root_changes(summary: dict[str, Any]) -> str:
+    new_roots = list(summary.get("new_root_causes") or [])
+    resolved_roots = list(summary.get("resolved_root_causes") or [])
+    if not new_roots and not resolved_roots:
+        return "<p class=\"muted\">No root-cause labels changed.</p>"
+    return """
+    <div class="grid-two">
+      <div class="item">
+        <div class="item-meta">New root causes</div>
+        <p>{new_roots}</p>
+      </div>
+      <div class="item">
+        <div class="item-meta">Resolved root causes</div>
+        <p>{resolved_roots}</p>
+      </div>
+    </div>
+    """.format(
+        new_roots=escape(", ".join(new_roots) or "none"),
+        resolved_roots=escape(", ".join(resolved_roots) or "none"),
+    )
+
+
+def _run_summary(run: dict[str, Any]) -> str:
+    summary = run.get("summary") or {}
+    metadata = run.get("metadata") or {}
+    cards = [
+        ("Query", run.get("query")),
+        ("Support Rate", summary.get("support_rate")),
+        ("Unsupported Rate", summary.get("unsupported_claim_rate")),
+        ("Citation Mismatches", summary.get("citation_mismatches")),
+        ("Failure Type", summary.get("failure_type")),
+        ("Primary Root Cause", summary.get("primary_root_cause")),
+        ("Should Abstain", summary.get("should_abstain")),
+        ("Input Type", metadata.get("compare_input_type")),
+    ]
+    return "\n".join(
+        """
+        <div class="card">
+          <div class="label">{label}</div>
+          <div class="small-value">{value}</div>
+        </div>
+        """.format(label=escape(label), value=escape(_string(value)))
+        for label, value in cards
+    )
+
+
+def _raw_summary(result: dict[str, Any]) -> dict[str, Any]:
+    return {
+        "mode": result.get("mode"),
+        "summary": result.get("summary"),
+        "changes": result.get("changes"),
+        "baseline": result.get("baseline"),
+        "current": result.get("current"),
+    }
+
+
+def _signed(value: Any) -> str:
+    try:
+        number = float(value)
+    except (TypeError, ValueError):
+        return "0"
+    if number > 0:
+        return "+%s" % _string(round(number, 3))
+    return _string(round(number, 3))
+
+
+def _css_token(value: Any) -> str:
+    token = _string(value).lower().replace("_", "-").replace(" ", "-")
+    return "".join(char for char in token if char.isalnum() or char == "-") or "unknown"
+
+
+def _string(value: Any) -> str:
+    if value is None:
+        return ""
+    return str(value)
+
+
+HTML_TEMPLATE = """<!doctype html>
+<html lang="en">
+<head>
+  <meta charset="utf-8">
+  <meta name="viewport" content="width=device-width, initial-scale=1">
+  <title>ContextTrace Regression Report</title>
+  <style>
+    :root {{
+      color-scheme: light;
+      --bg: #f7f8fa;
+      --panel: #ffffff;
+      --subtle: #fbfcfe;
+      --text: #202832;
+      --muted: #657286;
+      --line: #d9e0ea;
+      --ok: #176f44;
+      --warn: #946200;
+      --bad: #b42318;
+      --accent: #2458d3;
+    }}
+    * {{ box-sizing: border-box; }}
+    body {{
+      margin: 0;
+      background: var(--bg);
+      color: var(--text);
+      font-family: Inter, ui-sans-serif, system-ui, -apple-system, BlinkMacSystemFont, "Segoe UI", sans-serif;
+      line-height: 1.5;
+    }}
+    main {{ max-width: 1160px; margin: 0 auto; padding: 32px 20px 56px; }}
+    header {{ border-bottom: 1px solid var(--line); margin-bottom: 22px; padding-bottom: 18px; }}
+    h1, h2, h3 {{ margin: 0; }}
+    h1 {{ font-size: 30px; }}
+    h2 {{ font-size: 18px; margin-bottom: 12px; }}
+    h3 {{ font-size: 15px; margin-bottom: 8px; }}
+    section {{
+      background: var(--panel);
+      border: 1px solid var(--line);
+      border-radius: 8px;
+      margin: 16px 0;
+      padding: 18px;
+    }}
+    .banner {{
+      border: 1px solid var(--line);
+      border-radius: 8px;
+      background: var(--subtle);
+      padding: 14px;
+      margin-top: 12px;
+    }}
+    .banner.ok {{ border-color: #a7dfbf; background: #edf9f1; }}
+    .banner.bad {{ border-color: #f3b1ac; background: #fff1f0; }}
+    .summary {{
+      display: grid;
+      gap: 12px;
+      grid-template-columns: repeat(auto-fit, minmax(155px, 1fr));
+    }}
+    .grid-two {{
+      display: grid;
+      gap: 12px;
+      grid-template-columns: repeat(auto-fit, minmax(260px, 1fr));
+    }}
+    .card, .item {{
+      border: 1px solid var(--line);
+      border-radius: 8px;
+      background: var(--subtle);
+      padding: 12px;
+    }}
+    .item + .item {{ margin-top: 10px; }}
+    .label, .item-meta {{
+      color: var(--muted);
+      font-size: 12px;
+      font-weight: 700;
+      text-transform: uppercase;
+    }}
+    .value {{ margin-top: 4px; font-size: 18px; overflow-wrap: anywhere; }}
+    .small-value {{ margin-top: 4px; font-size: 14px; overflow-wrap: anywhere; }}
+    .muted {{ color: var(--muted); }}
+    table {{ width: 100%; border-collapse: collapse; font-size: 14px; }}
+    th, td {{ border-bottom: 1px solid var(--line); padding: 10px; text-align: left; vertical-align: top; }}
+    th {{ color: var(--muted); font-size: 12px; text-transform: uppercase; }}
+    .badge {{
+      display: inline-block;
+      border-radius: 999px;
+      border: 1px solid var(--line);
+      background: #eef2f7;
+      padding: 3px 8px;
+      font-size: 12px;
+      font-weight: 700;
+      white-space: nowrap;
+    }}
+    .status-added-failure, .status-new-failure, .status-verdict-regressed,
+    .status-citation-regressed, .status-root-cause-regressed {{ color: var(--bad); background: #fdeceb; }}
+    .status-resolved-failure, .status-removed-failure, .status-verdict-improved,
+    .status-citation-improved {{ color: var(--ok); background: #e9f7ef; }}
+    .status-added-claim, .status-removed-claim, .status-source-changed,
+    .status-claim-changed, .status-root-cause-changed {{ color: var(--warn); background: #fff7df; }}
+    pre {{
+      margin: 0;
+      overflow: auto;
+      background: #101828;
+      color: #f8fafc;
+      border-radius: 8px;
+      padding: 14px;
+      font-size: 13px;
+    }}
+  </style>
+</head>
+<body>
+  <main>
+    <header>
+      <h1>ContextTrace Regression Report</h1>
+      <p class="muted">Local diff of two claim-level evidence verification runs.</p>
+      <div class="banner {verdict_class}">
+        <strong>Regression: {regression}</strong>
+        <span class="muted"> | mode {mode}</span>
+      </div>
+    </header>
+    <section>
+      <h2>Regression Summary</h2>
+      <div class="summary">{summary_cards}</div>
+    </section>
+    <section>
+      <h2>Claim Changes</h2>
+      <table>
+        <thead>
+          <tr>
+            <th>Status</th>
+            <th>Claim</th>
+            <th>Before Verdict</th>
+            <th>After Verdict</th>
+            <th>Before Root Cause</th>
+            <th>After Root Cause</th>
+            <th>Suggested Fix</th>
+          </tr>
+        </thead>
+        <tbody>{change_rows}</tbody>
+      </table>
+    </section>
+    <section>
+      <h2>New Failures</h2>
+      {new_failures}
+    </section>
+    <section>
+      <h2>Resolved Failures</h2>
+      {resolved_failures}
+    </section>
+    <section>
+      <h2>Root Cause Changes</h2>
+      {root_changes}
+    </section>
+    <section>
+      <h2>Baseline Summary</h2>
+      <div class="summary">{baseline_summary}</div>
+    </section>
+    <section>
+      <h2>Current Summary</h2>
+      <div class="summary">{current_summary}</div>
+    </section>
+    <section>
+      <h2>Raw JSON Summary</h2>
+      <pre>{raw_json}</pre>
+    </section>
+  </main>
+</body>
+</html>
+"""

{contexttrace-0.3.0 → contexttrace-0.4.0}/contexttrace.egg-info/SOURCES.txt RENAMED Viewed

@@ -33,6 +33,8 @@ contexttrace/verify/abstention.py
 contexttrace/verify/benchmark.py
 contexttrace/verify/citations.py
 contexttrace/verify/claims.py
+contexttrace/verify/compare.py
+contexttrace/verify/compare_report.py
 contexttrace/verify/demos.py
 contexttrace/verify/evidence.py
 contexttrace/verify/external_benchmark_cases.json

{contexttrace-0.3.0 → contexttrace-0.4.0}/pyproject.toml RENAMED Viewed

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 [project]
 name = "contexttrace"
-version = "0.3.0"
+version = "0.4.0"
 description = "Local-first SDK and CLI for RAG and agent reliability tracing, citation checks, and failure diagnosis."
 readme = "README.md"
 requires-python = ">=3.8"

contexttrace-0.3.0/contexttrace/_version.py DELETED Viewed

@@ -1 +0,0 @@
-__version__ = "0.3.0"
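
Because `contexttrace/_version.py` is deleted in 0.4.0, code that imported `__version__` from that module will no longer find it. The diff does not show how the package itself now exposes its version, so the following is only a sketch of one common alternative: reading the version from installed distribution metadata with the standard-library `importlib.metadata` (available on Python 3.8+, matching `requires-python`). The helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str, default: str = "unknown") -> str:
    """Return the installed distribution's version, or `default` if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in this environment.
        return default


if __name__ == "__main__":
    # Prints the installed version string, or "unknown" if not installed.
    print(installed_version("contexttrace"))
```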

{contexttrace-0.3.0 → contexttrace-0.4.0}/MANIFEST.in RENAMED Viewed

File without changes

{contexttrace-0.3.0 → contexttrace-0.4.0}/contexttrace/__init__.py RENAMED Viewed

File without changes

{contexttrace-0.3.0 → contexttrace-0.4.0}/contexttrace/client.py RENAMED Viewed

File without changes

{contexttrace-0.3.0 → contexttrace-0.4.0}/contexttrace/config.py RENAMED Viewed

File without changes

{contexttrace-0.3.0 → contexttrace-0.4.0}/contexttrace/demo.py RENAMED Viewed

File without changes

{contexttrace-0.3.0 → contexttrace-0.4.0}/contexttrace/demo_data.py RENAMED Viewed

File without changes

{contexttrace-0.3.0 → contexttrace-0.4.0}/contexttrace/endpoint_eval.py RENAMED Viewed

File without changes

{contexttrace-0.3.0 → contexttrace-0.4.0}/contexttrace/errors.py RENAMED Viewed

File without changes

{contexttrace-0.3.0 → contexttrace-0.4.0}/contexttrace/evaluator.py RENAMED Viewed

File without changes

{contexttrace-0.3.0 → contexttrace-0.4.0}/contexttrace/integrations/__init__.py RENAMED Viewed

File without changes

{contexttrace-0.3.0 → contexttrace-0.4.0}/contexttrace/integrations/fastapi.py RENAMED Viewed

File without changes

{contexttrace-0.3.0 → contexttrace-0.4.0}/contexttrace/integrations/langchain.py RENAMED Viewed

File without changes

{contexttrace-0.3.0 → contexttrace-0.4.0}/contexttrace/integrations/langgraph.py RENAMED Viewed

File without changes

{contexttrace-0.3.0 → contexttrace-0.4.0}/contexttrace/integrations/llamaindex.py RENAMED Viewed

File without changes

{contexttrace-0.3.0 → contexttrace-0.4.0}/contexttrace/integrations/opentelemetry.py RENAMED Viewed

File without changes

{contexttrace-0.3.0 → contexttrace-0.4.0}/contexttrace/local.py RENAMED Viewed

File without changes

{contexttrace-0.3.0 → contexttrace-0.4.0}/contexttrace/py.typed RENAMED Viewed

File without changes

{contexttrace-0.3.0 → contexttrace-0.4.0}/contexttrace/regression.py RENAMED Viewed

File without changes

{contexttrace-0.3.0 → contexttrace-0.4.0}/contexttrace/reliability.py RENAMED Viewed

File without changes

{contexttrace-0.3.0 → contexttrace-0.4.0}/contexttrace/report.py RENAMED Viewed

File without changes

{contexttrace-0.3.0 → contexttrace-0.4.0}/contexttrace/storage/__init__.py RENAMED Viewed

File without changes

{contexttrace-0.3.0 → contexttrace-0.4.0}/contexttrace/storage/sqlite_store.py RENAMED Viewed

File without changes

{contexttrace-0.3.0 → contexttrace-0.4.0}/contexttrace/thresholds.py RENAMED Viewed

File without changes

{contexttrace-0.3.0 → contexttrace-0.4.0}/contexttrace/transport.py RENAMED Viewed

File without changes

{contexttrace-0.3.0 → contexttrace-0.4.0}/contexttrace/verify/abstention.py RENAMED Viewed

File without changes

{contexttrace-0.3.0 → contexttrace-0.4.0}/contexttrace/verify/benchmark.py RENAMED Viewed

File without changes

{contexttrace-0.3.0 → contexttrace-0.4.0}/contexttrace/verify/citations.py RENAMED Viewed

File without changes

{contexttrace-0.3.0 → contexttrace-0.4.0}/contexttrace/verify/claims.py RENAMED Viewed

File without changes

{contexttrace-0.3.0 → contexttrace-0.4.0}/contexttrace/verify/demos.py RENAMED Viewed

File without changes

{contexttrace-0.3.0 → contexttrace-0.4.0}/contexttrace/verify/evidence.py RENAMED Viewed

File without changes

{contexttrace-0.3.0 → contexttrace-0.4.0}/contexttrace/verify/external_benchmark_cases.json RENAMED Viewed

File without changes

{contexttrace-0.3.0 → contexttrace-0.4.0}/contexttrace/verify/facts.py RENAMED Viewed

File without changes

{contexttrace-0.3.0 → contexttrace-0.4.0}/contexttrace/verify/real_benchmark_cases.json RENAMED Viewed

File without changes

{contexttrace-0.3.0 → contexttrace-0.4.0}/contexttrace/verify/report.py RENAMED Viewed

File without changes

{contexttrace-0.3.0 → contexttrace-0.4.0}/contexttrace/verify/root_cause.py RENAMED Viewed

File without changes

{contexttrace-0.3.0 → contexttrace-0.4.0}/contexttrace/verify/runner.py RENAMED Viewed

File without changes

{contexttrace-0.3.0 → contexttrace-0.4.0}/contexttrace/verify/schema.py RENAMED Viewed

File without changes

{contexttrace-0.3.0 → contexttrace-0.4.0}/contexttrace/verify/spans.py RENAMED Viewed

File without changes

{contexttrace-0.3.0 → contexttrace-0.4.0}/contexttrace/verify/verdicts.py RENAMED Viewed

File without changes

{contexttrace-0.3.0 → contexttrace-0.4.0}/contexttrace/viewer.py RENAMED Viewed

File without changes

{contexttrace-0.3.0 → contexttrace-0.4.0}/setup.cfg RENAMED Viewed

File without changes

{contexttrace-0.3.0 → contexttrace-0.4.0}/setup.py RENAMED Viewed

File without changes
