contexttrace (PyPI): contexttrace-0.4.0.tar.gz → contexttrace-0.5.0.tar.gz


Files changed (54)

{contexttrace-0.4.0 → contexttrace-0.5.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: contexttrace
-Version: 0.4.0
+Version: 0.5.0
 Summary: Local-first SDK and CLI for RAG and agent reliability tracing, citation checks, and failure diagnosis.
 Author: ContextTrace contributors
 License: MIT
@@ -150,6 +150,9 @@ contexttrace verify-benchmark --case-set external --mode semantic --report
 contexttrace compare baseline.json current.json
 contexttrace compare baseline.json current.json --report
 contexttrace compare baseline.json current.json --fail-on new_failure
+contexttrace audit trace.json --corpus docs/
+contexttrace audit trace.json --corpus docs/ --report
+contexttrace audit trace.json --corpus docs/ --fail-on retrieval_miss
 ```
 Input requires `query`, `answer`, and `contexts` with `id` and `text`. Optional `citations` are checked to catch cited sources that do not actually support the matched claim.
@@ -164,7 +167,9 @@ ContextTrace verifies whether each generated claim is actually supported by retr
 Use `contexttrace compare baseline.json current.json` to diff two portable traces or saved `verify --json` outputs. It reports support-rate deltas, new unsupported claims, citation regressions, should-abstain flips, and new root causes, with `--fail-on` gates for CI.
-The v0.4.0 verifier uses local lexical heuristics by default. Claim extraction is rule-based, contradiction detection is conservative, and semantic or LLM-judge support can be added later.
+Use `contexttrace audit trace.json --corpus docs/` to diagnose whether an unsupported claim failed because retrieval missed evidence, chunking omitted the supporting span, the corpus lacks coverage, or generation overclaimed.
+The v0.5.0 verifier uses local lexical heuristics by default. Claim extraction is rule-based, contradiction detection is conservative, and semantic or LLM-judge support can be added later.
 ## What It Catches

{contexttrace-0.4.0 → contexttrace-0.5.0}/README.md RENAMED

@@ -93,6 +93,9 @@ contexttrace verify-benchmark --case-set external --mode semantic --report
 contexttrace compare baseline.json current.json
 contexttrace compare baseline.json current.json --report
 contexttrace compare baseline.json current.json --fail-on new_failure
+contexttrace audit trace.json --corpus docs/
+contexttrace audit trace.json --corpus docs/ --report
+contexttrace audit trace.json --corpus docs/ --fail-on retrieval_miss
 ```
 Input requires `query`, `answer`, and `contexts` with `id` and `text`. Optional `citations` are checked to catch cited sources that do not actually support the matched claim.
@@ -107,7 +110,9 @@ ContextTrace verifies whether each generated claim is actually supported by retr
 Use `contexttrace compare baseline.json current.json` to diff two portable traces or saved `verify --json` outputs. It reports support-rate deltas, new unsupported claims, citation regressions, should-abstain flips, and new root causes, with `--fail-on` gates for CI.
-The v0.4.0 verifier uses local lexical heuristics by default. Claim extraction is rule-based, contradiction detection is conservative, and semantic or LLM-judge support can be added later.
+Use `contexttrace audit trace.json --corpus docs/` to diagnose whether an unsupported claim failed because retrieval missed evidence, chunking omitted the supporting span, the corpus lacks coverage, or generation overclaimed.
+The v0.5.0 verifier uses local lexical heuristics by default. Claim extraction is rule-based, contradiction detection is conservative, and semantic or LLM-judge support can be added later.
 ## What It Catches
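
Trying the new `audit` subcommand takes two inputs: a trace file and a corpus path. The following is a minimal sketch that prepares both, assuming only the fields the README names (`query`, `answer`, and `contexts` with `id` and `text`). The sample question, file names, and document text are invented for illustration, and the `contexttrace` command itself appears only in a comment.

```python
import json
import tempfile
from pathlib import Path


def write_sample_inputs(workdir: Path) -> tuple[Path, Path]:
    """Write a minimal trace.json and a one-file docs/ corpus (illustrative data)."""
    trace = {
        "query": "What is the refund window?",
        "answer": "Refunds are available within 30 days of purchase.",
        # The retrieved context deliberately lacks the supporting sentence.
        "contexts": [
            {"id": "faq.md", "text": "Contact support to start a return."},
        ],
    }
    trace_path = workdir / "trace.json"
    trace_path.write_text(json.dumps(trace, indent=2), encoding="utf-8")

    # The broader corpus does contain the supporting sentence.
    corpus_dir = workdir / "docs"
    corpus_dir.mkdir(parents=True, exist_ok=True)
    (corpus_dir / "refunds.md").write_text(
        "Refunds are available within 30 days of purchase.\n", encoding="utf-8"
    )
    return trace_path, corpus_dir


with tempfile.TemporaryDirectory() as tmp:
    trace_path, corpus_dir = write_sample_inputs(Path(tmp))
    loaded = json.loads(trace_path.read_text(encoding="utf-8"))
    corpus_files = sorted(p.name for p in corpus_dir.iterdir())
    # From inside `tmp`, the audit would then be run as:
    #   contexttrace audit trace.json --corpus docs/
```

Inputs shaped like this put the supporting sentence in the corpus but not in the retrieved context, which is the kind of situation the README describes as retrieval missing evidence. The label actually reported depends on the verifier's scoring.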

contexttrace-0.5.0/contexttrace/_version.py ADDED

@@ -0,0 +1 @@
+__version__ = "0.5.0"

{contexttrace-0.4.0 → contexttrace-0.5.0}/contexttrace/cli.py RENAMED

@@ -24,6 +24,8 @@ from contexttrace.storage import SQLiteTraceStore
 from contexttrace.thresholds import parse_thresholds, threshold_failures
 from contexttrace.verify import (
     VerificationInputError,
+    audit_failures,
+    audit_trace,
     compare_failures,
     compare_trace_files,
     list_verify_demos,
@@ -32,6 +34,7 @@ from contexttrace.verify import (
     verify_trace,
 )
 from contexttrace.verify.benchmark import run_verify_benchmark, write_verify_benchmark_report
+from contexttrace.verify.audit_report import AuditReportGenerator
 from contexttrace.verify.compare_report import CompareReportGenerator
 from contexttrace.verify.report import VerifyReportGenerator
 from contexttrace.viewer import serve_viewer
@@ -404,6 +407,63 @@ def compare_command(
     return 1 if fail_messages else 0
+@cli.command("audit")
+@click.argument("trace_json")
+@click.option("--corpus", "corpus_path", required=True, help="Local corpus directory or file to search for supporting evidence.")
+@click.option("--json", "json_output", is_flag=True, help="Print the full audit result as JSON.")
+@click.option("--report", is_flag=True, help="Generate a local HTML retrieval audit report.")
+@click.option("--out", default=None, help="HTML report path. Implies --report when provided.")
+@click.option("--mode", default="lexical", show_default=True, type=click.Choice(["lexical", "semantic"]), help="Evidence scoring mode.")
+@click.option("--fail-on", multiple=True, help="Fail on retrieval_miss, reranking_failure, chunking_issue, corpus_gap, answer_overreach, stale_source, insufficient_context, or any_failure.")
+def audit_command(
+    trace_json: str,
+    corpus_path: str,
+    json_output: bool,
+    report: bool,
+    out: Optional[str],
+    mode: str,
+    fail_on: tuple[str, ...],
+) -> int:
+    """Audit a verified trace against a broader local corpus."""
+    try:
+        trace = load_trace_file(trace_json)
+        result = audit_trace(trace, corpus_path=corpus_path, mode=mode)
+    except VerificationInputError as exc:
+        raise click.ClickException(str(exc)) from exc
+    written_report = None
+    if report or out:
+        default_name = "%s_audit.html" % Path(trace_json).stem
+        output_path = out or str(Path(".contexttrace") / "reports" / default_name)
+        written_report = AuditReportGenerator().generate(result, trace, path=output_path)
+    fail_messages = audit_failures(result, fail_on)
+    if json_output:
+        if written_report:
+            click.echo("Report: %s" % written_report, err=True)
+        click.echo(json.dumps(result, indent=2))
+        for message in fail_messages:
+            click.echo("Audit failed: %s" % message, err=True)
+        return 1 if fail_messages else 0
+    summary = result["summary"]
+    click.echo("Primary audit label: %s" % summary["primary_audit_label"])
+    click.echo("Claims audited: %s" % summary["total_claims"])
+    click.echo("Corpus documents: %s" % summary["corpus_documents"])
+    click.echo("Retrieval misses: %s" % summary["retrieval_miss"])
+    click.echo("Chunking issues: %s" % summary["chunking_issue"])
+    click.echo("Reranking failures: %s" % summary["reranking_failure"])
+    click.echo("Corpus gaps: %s" % summary["corpus_gap"])
+    click.echo("Answer overreach: %s" % summary["answer_overreach"])
+    click.echo("Insufficient context: %s" % summary["insufficient_context"])
+    if written_report:
+        click.echo("Report: %s" % written_report)
+    for message in fail_messages:
+        click.echo("Audit failed: %s" % message, err=True)
+    return 1 if fail_messages else 0
 def _write_verify_report(
     result: dict,
     trace: object,
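
The `--fail-on` option is what lets `audit` act as a CI gate. Below is a simplified sketch of the rule handling, modeled on `audit_failures` in `contexttrace/verify/audit.py` from this diff: each rule is trimmed, lower-cased, and has hyphens turned into underscores before it is matched against the summary counts. The function name `gate_messages` and the sample summary are hypothetical and not part of the package.

```python
FAILURE_LABELS = {
    "retrieval_miss", "reranking_failure", "chunking_issue", "corpus_gap",
    "answer_overreach", "stale_source", "insufficient_context",
}


def gate_messages(summary: dict, fail_on: tuple[str, ...]) -> list[str]:
    """Return one message per triggered or unrecognized --fail-on rule."""
    messages = []
    for raw_rule in fail_on:
        # Normalize so "Retrieval-Miss" and "retrieval_miss" are equivalent.
        rule = raw_rule.strip().lower().replace("-", "_")
        if rule == "any_failure":
            if summary.get("has_audit_failures"):
                messages.append("audit failure detected")
        elif rule in FAILURE_LABELS:
            if int(summary.get(rule) or 0) > 0:
                messages.append("%s detected" % rule.replace("_", " "))
        else:
            # Unknown rules are reported rather than silently ignored.
            messages.append("unknown --fail-on rule %s" % raw_rule)
    return messages


summary = {"has_audit_failures": True, "retrieval_miss": 2, "corpus_gap": 0}
messages = gate_messages(summary, ("Retrieval-Miss", "corpus_gap", "typo_rule"))
exit_code = 1 if messages else 0
```

As in the released function, an unrecognized rule also yields a message, so a misspelled rule in a CI configuration fails the run instead of passing unnoticed.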

{contexttrace-0.4.0 → contexttrace-0.5.0}/contexttrace/verify/__init__.py RENAMED

@@ -1,4 +1,5 @@
 from contexttrace.verify.runner import verify_trace, verify_trace_file
+from contexttrace.verify.audit import audit_failures, audit_trace, audit_trace_file, load_corpus
 from contexttrace.verify.compare import compare_failures, compare_trace_files, compare_verifications
 from contexttrace.verify.schema import (
     RAGTrace,
@@ -14,10 +15,14 @@ __all__ = [
     "TraceCitation",
     "TraceContext",
     "VerificationInputError",
+    "audit_failures",
+    "audit_trace",
+    "audit_trace_file",
     "compare_failures",
     "compare_trace_files",
     "compare_verifications",
     "list_verify_demos",
+    "load_corpus",
     "load_trace_file",
     "load_verify_demo",
     "verify_trace",

contexttrace-0.5.0/contexttrace/verify/audit.py ADDED

@@ -0,0 +1,449 @@
+from __future__ import annotations
+from collections import Counter
+from pathlib import Path
+from typing import Any
+from contexttrace.verify.claims import Claim
+from contexttrace.verify.evidence import find_best_evidence
+from contexttrace.verify.runner import verify_trace
+from contexttrace.verify.schema import RAGTrace, TraceContext, VerificationInputError, load_trace_file
+from contexttrace.verify.verdicts import classify_claim
+NO_FAILURE = "no_failure_detected"
+RETRIEVAL_MISS = "retrieval_miss"
+RERANKING_FAILURE = "reranking_failure"
+CHUNKING_ISSUE = "chunking_issue"
+CORPUS_GAP = "corpus_gap"
+ANSWER_OVERREACH = "answer_overreach"
+STALE_SOURCE = "stale_source"
+INSUFFICIENT_CONTEXT = "insufficient_context"
+AUDIT_FAILURE_LABELS = {
+    RETRIEVAL_MISS,
+    RERANKING_FAILURE,
+    CHUNKING_ISSUE,
+    CORPUS_GAP,
+    ANSWER_OVERREACH,
+    STALE_SOURCE,
+    INSUFFICIENT_CONTEXT,
+}
+BAD_CITATIONS = {
+    "cited_source_missing",
+    "cited_source_does_not_support_claim",
+    "claim_supported_by_different_source",
+}
+SUPPORTED_VERDICTS = {"supported"}
+CORPUS_EXTENSIONS = {
+    ".csv",
+    ".html",
+    ".json",
+    ".jsonl",
+    ".md",
+    ".markdown",
+    ".rst",
+    ".text",
+    ".tsv",
+    ".txt",
+    ".yaml",
+    ".yml",
+}
+SKIP_DIRECTORIES = {
+    ".contexttrace",
+    ".git",
+    ".hg",
+    ".mypy_cache",
+    ".pytest_cache",
+    ".ruff_cache",
+    ".svn",
+    "__pycache__",
+    "build",
+    "dist",
+    "node_modules",
+}
+MAX_FILE_BYTES = 1_000_000
+RERANKING_CUTOFF = 3
+def audit_trace_file(
+    trace_path: str | Path,
+    *,
+    corpus_path: str | Path,
+    mode: str = "lexical",
+) -> dict[str, Any]:
+    trace = load_trace_file(trace_path)
+    return audit_trace(trace, corpus_path=corpus_path, mode=mode)
+def audit_trace(
+    trace: RAGTrace,
+    *,
+    corpus_path: str | Path,
+    mode: str = "lexical",
+) -> dict[str, Any]:
+    corpus_contexts = load_corpus(corpus_path)
+    verification = verify_trace(trace, mode=mode)
+    claim_audits = [
+        _audit_claim(claim, trace, corpus_contexts, mode=mode)
+        for claim in verification.get("claims") or []
+    ]
+    summary = _summary(claim_audits, verification, corpus_contexts, mode=mode)
+    return {
+        "query": trace.query,
+        "answer": trace.answer,
+        "summary": summary,
+        "claims": claim_audits,
+        "verification": {
+            "summary": verification.get("summary") or {},
+            "abstention": verification.get("abstention") or {},
+            "diagnostics": verification.get("diagnostics") or {},
+        },
+        "corpus": {
+            "path": str(Path(corpus_path)),
+            "documents": len(corpus_contexts),
+        },
+        "metadata": dict(trace.metadata),
+    }
+def load_corpus(corpus_path: str | Path) -> list[TraceContext]:
+    root = Path(corpus_path)
+    if not root.exists():
+        raise VerificationInputError("Corpus path %s does not exist." % root)
+    files = [root] if root.is_file() else _corpus_files(root)
+    contexts: list[TraceContext] = []
+    for path in files:
+        text = _read_text(path)
+        if not text.strip():
+            continue
+        context_id = _context_id(path, root)
+        contexts.append(
+            TraceContext(
+                id=context_id,
+                text=text,
+                metadata={
+                    "path": str(path),
+                    "source": context_id,
+                    "size_bytes": path.stat().st_size,
+                    "kind": "corpus_document",
+                },
+            )
+        )
+    if not contexts:
+        raise VerificationInputError("Corpus path %s did not contain readable text documents." % root)
+    return contexts
+def audit_failures(result: dict[str, Any], fail_on: tuple[str, ...]) -> list[str]:
+    if not fail_on:
+        return []
+    summary = result.get("summary") or {}
+    messages = []
+    for raw_rule in fail_on:
+        rule = raw_rule.strip().lower().replace("-", "_")
+        if rule == "any_failure" and bool(summary.get("has_audit_failures")):
+            messages.append("audit failure detected")
+        elif rule == "retrieval_miss" and int(summary.get(RETRIEVAL_MISS) or 0) > 0:
+            messages.append("retrieval miss detected")
+        elif rule == "reranking_failure" and int(summary.get(RERANKING_FAILURE) or 0) > 0:
+            messages.append("reranking failure detected")
+        elif rule == "chunking_issue" and int(summary.get(CHUNKING_ISSUE) or 0) > 0:
+            messages.append("chunking issue detected")
+        elif rule == "corpus_gap" and int(summary.get(CORPUS_GAP) or 0) > 0:
+            messages.append("corpus gap detected")
+        elif rule == "answer_overreach" and int(summary.get(ANSWER_OVERREACH) or 0) > 0:
+            messages.append("answer overreach detected")
+        elif rule == "stale_source" and int(summary.get(STALE_SOURCE) or 0) > 0:
+            messages.append("stale source detected")
+        elif rule == "insufficient_context" and int(summary.get(INSUFFICIENT_CONTEXT) or 0) > 0:
+            messages.append("insufficient context detected")
+        elif rule not in AUDIT_FAILURE_LABELS and rule != "any_failure":
+            messages.append("unknown --fail-on rule %s" % raw_rule)
+    return messages
+def _audit_claim(
+    claim: dict[str, Any],
+    trace: RAGTrace,
+    corpus_contexts: list[TraceContext],
+    *,
+    mode: str,
+) -> dict[str, Any]:
+    claim_text = str(claim.get("claim") or "")
+    claim_id = str(claim.get("claim_id") or "")
+    corpus_match = find_best_evidence(claim_text, corpus_contexts, mode=mode)
+    corpus_verification = classify_claim(
+        Claim(id=claim_id or "claim", text=claim_text),
+        corpus_match,
+        has_contexts=bool(corpus_contexts),
+    )
+    diagnosis = _diagnose(claim, trace, corpus_match, corpus_verification)
+    return {
+        "claim_id": claim_id,
+        "claim": claim_text,
+        "audit_label": diagnosis["label"],
+        "confidence": diagnosis["confidence"],
+        "reason": diagnosis["reason"],
+        "suggested_fix": diagnosis["suggested_fix"],
+        "retrieved": {
+            "verdict": claim.get("verdict"),
+            "best_context_id": claim.get("best_context_id"),
+            "best_score": claim.get("best_score"),
+            "evidence": claim.get("evidence"),
+            "matched_terms": list(claim.get("matched_terms") or []),
+            "root_cause": (claim.get("root_cause") or {}).get("label"),
+            "citation_status": claim.get("citation_status"),
+        },
+        "corpus": {
+            "verdict": corpus_verification.verdict,
+            "best_document_id": corpus_match.context_id,
+            "best_score": corpus_match.score,
+            "evidence": corpus_match.snippet,
+            "matched_terms": list(corpus_match.matched_terms),
+            "evidence_span": corpus_match.span_dict(),
+            "supporting_spans": list(corpus_match.supporting_spans or []),
+            "required_facts": list(corpus_verification.required_facts),
+            "matched_facts": list(corpus_verification.matched_facts),
+            "missing_facts": list(corpus_verification.missing_facts),
+            "conflicting_facts": list(corpus_verification.conflicting_facts),
+        },
+    }
+def _diagnose(
+    claim: dict[str, Any],
+    trace: RAGTrace,
+    corpus_match: object,
+    corpus_verification: object,
+) -> dict[str, Any]:
+    verdict = str(claim.get("verdict") or "")
+    root_label = str((claim.get("root_cause") or {}).get("label") or NO_FAILURE)
+    citation_status = str(claim.get("citation_status") or "")
+    corpus_verdict = str(getattr(corpus_verification, "verdict", ""))
+    corpus_score = float(getattr(corpus_match, "score", 0.0) or 0.0)
+    same_source_rank = _same_source_retrieved_rank(str(getattr(corpus_match, "context_id", "") or ""), trace)
+    if _is_citation_only_failure(claim):
+        return _result(
+            NO_FAILURE,
+            0.92,
+            "The claim is supported by retrieved evidence; the remaining issue is citation-level, not a retrieval or corpus failure.",
+            "Fix the claim-level citation, but do not treat this as a retrieval miss.",
+        )
+    if not _is_failure(claim):
+        return _result(
+            NO_FAILURE,
+            0.99,
+            "The claim is already supported by the retrieved contexts.",
+            "No fix needed for this claim.",
+        )
+    if verdict == "contradicted" or corpus_verdict == "contradicted" or root_label in {"stale_context", "conflicting_contexts"}:
+        return _result(
+            STALE_SOURCE,
+            0.86,
+            "The claim appears to conflict with retrieved or corpus evidence.",
+            "Resolve stale or conflicting sources before allowing the answer to use this fact.",
+        )
+    if corpus_verdict in SUPPORTED_VERDICTS:
+        if same_source_rank is None:
+            return _result(
+                RETRIEVAL_MISS,
+                max(0.82, min(0.98, corpus_score + 0.12)),
+                "The broader corpus contains evidence for this claim, but the retrieved contexts did not include it.",
+                "Improve retrieval recall, filters, query rewriting, or top_k so this source is retrieved.",
+            )
+        if same_source_rank >= RERANKING_CUTOFF:
+            return _result(
+                RERANKING_FAILURE,
+                max(0.78, min(0.95, corpus_score + 0.08)),
+                "A related source was retrieved, but it appeared too low in the retrieved context list for reliable generation.",
+                "Add a reranker or raise high-evidence chunks from this source before generation.",
+            )
+        return _result(
+            CHUNKING_ISSUE,
+            max(0.78, min(0.95, corpus_score + 0.08)),
+            "The retrieved source appears related, but the retrieved chunk omitted the supporting span found in the corpus.",
+            "Adjust chunk boundaries, overlap, or parent-document retrieval so the answerable span is included.",
+        )
+    if root_label == "answer_overreach" or verdict == "partially_supported":
+        return _result(
+            ANSWER_OVERREACH,
+            0.82,
+            "The evidence supports part of the claim, but not every required fact.",
+            "Remove unsupported details or retrieve evidence that explicitly supports each detail.",
+        )
+    if corpus_verdict == "partially_supported":
+        return _result(
+            ANSWER_OVERREACH,
+            0.78,
+            "The corpus supports only part of the claim, so the answer likely added unsupported detail.",
+            "Split the claim and require support for every required fact before answering.",
+        )
+    if corpus_verdict == "unverifiable" or verdict == "unverifiable":
+        return _result(
+            INSUFFICIENT_CONTEXT,
+            0.72,
+            "The closest corpus evidence is related but too weak or ambiguous to verify the claim.",
+            "Retrieve more specific evidence or force the model to qualify/abstain.",
+        )
+    if citation_status in BAD_CITATIONS and corpus_score >= 0.35:
+        return _result(
+            INSUFFICIENT_CONTEXT,
+            0.7,
+            "The claim has a citation problem and the broader corpus evidence is still not strong enough.",
+            "Regenerate claim-level citations and require cited sources to cover all required facts.",
+        )
+    return _result(
+        CORPUS_GAP,
+        max(0.7, min(0.95, 1.0 - corpus_score)),
+        "Neither the retrieved contexts nor the broader corpus provide enough support for this claim.",
+        "Add the missing source to the corpus or make the answer abstain when the corpus lacks this fact.",
+    )
+def _summary(
+    claim_audits: list[dict[str, Any]],
+    verification: dict[str, Any],
+    corpus_contexts: list[TraceContext],
+    *,
+    mode: str,
+) -> dict[str, Any]:
+    counts = Counter(str(claim.get("audit_label") or NO_FAILURE) for claim in claim_audits)
+    labels = [NO_FAILURE] + sorted(AUDIT_FAILURE_LABELS)
+    failure_count = sum(counts[label] for label in AUDIT_FAILURE_LABELS)
+    return {
+        "mode": mode,
+        "total_claims": len(claim_audits),
+        "audited_claims": len([claim for claim in claim_audits if claim.get("audit_label") != NO_FAILURE]),
+        "corpus_documents": len(corpus_contexts),
+        "has_audit_failures": failure_count > 0,
+        "primary_audit_label": _primary_label(counts),
+        "verification_failure_type": (verification.get("summary") or {}).get("failure_type"),
+        "verification_primary_root_cause": (verification.get("summary") or {}).get("primary_root_cause"),
+        **{label: counts[label] for label in labels},
+    }
+def _primary_label(counts: Counter) -> str:
+    failures = {label: counts[label] for label in AUDIT_FAILURE_LABELS if counts[label]}
+    if not failures:
+        return NO_FAILURE
+    priority = [
+        RETRIEVAL_MISS,
+        CHUNKING_ISSUE,
+        RERANKING_FAILURE,
+        CORPUS_GAP,
+        ANSWER_OVERREACH,
+        STALE_SOURCE,
+        INSUFFICIENT_CONTEXT,
+    ]
+    return max(
+        failures,
+        key=lambda label: (
+            failures[label],
+            -priority.index(label) if label in priority else -len(priority),
+        ),
+    )
+def _is_failure(claim: dict[str, Any]) -> bool:
+    return (
+        str(claim.get("verdict") or "") not in SUPPORTED_VERDICTS
+        or str(claim.get("citation_status") or "") in BAD_CITATIONS
+        or str((claim.get("root_cause") or {}).get("label") or NO_FAILURE) != NO_FAILURE
+    )
+def _is_citation_only_failure(claim: dict[str, Any]) -> bool:
+    return (
+        str(claim.get("verdict") or "") in SUPPORTED_VERDICTS
+        and str(claim.get("citation_status") or "") in BAD_CITATIONS
+        and str((claim.get("root_cause") or {}).get("label") or NO_FAILURE)
+        in {"wrong_source_cited", "missing_cited_source", NO_FAILURE}
+    )
+def _same_source_retrieved_rank(corpus_context_id: str, trace: RAGTrace) -> int | None:
+    corpus_key = _source_key(corpus_context_id)
+    if not corpus_key:
+        return None
+    for index, context in enumerate(trace.contexts):
+        candidates = [
+            context.id,
+            context.metadata.get("source"),
+            context.metadata.get("path"),
+            context.metadata.get("file"),
+            context.metadata.get("document"),
+        ]
+        if any(_sources_match(corpus_key, _source_key(value)) for value in candidates):
+            return index
+    return None
+def _sources_match(left: str, right: str) -> bool:
+    if not left or not right:
+        return False
+    if left == right:
+        return True
+    return Path(left).name == Path(right).name
+def _source_key(value: Any) -> str:
+    text = str(value or "").strip().replace("\\", "/").lower()
+    return text.strip("./")
+def _result(label: str, confidence: float, reason: str, suggested_fix: str) -> dict[str, Any]:
+    return {
+        "label": label,
+        "confidence": round(confidence, 3),
+        "reason": reason,
+        "suggested_fix": suggested_fix,
+    }
+def _corpus_files(root: Path) -> list[Path]:
+    files: list[Path] = []
+    for path in root.rglob("*"):
+        if not path.is_file():
+            continue
+        if any(part in SKIP_DIRECTORIES for part in path.parts):
+            continue
+        if path.suffix.lower() not in CORPUS_EXTENSIONS:
+            continue
+        if path.stat().st_size > MAX_FILE_BYTES:
+            continue
+        files.append(path)
+    return sorted(files, key=lambda item: str(item).lower())
+def _read_text(path: Path) -> str:
+    try:
+        return path.read_text(encoding="utf-8")
+    except UnicodeDecodeError:
+        try:
+            return path.read_text(encoding="utf-8", errors="ignore")
+        except OSError:
+            return ""
+    except OSError:
+        return ""
+def _context_id(path: Path, root: Path) -> str:
+    if root.is_file():
+        return path.name
+    try:
+        return path.relative_to(root).as_posix()
+    except ValueError:
+        return path.name
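
Read top to bottom, `_diagnose` is an ordered decision list, and the branch that separates the three retrieval-side labels is the one taken when the broader corpus supports the claim. Here is a condensed sketch of that branch only, including its confidence clamps. `diagnose_supported_in_corpus` is an illustrative name, and the inputs are plain values rather than the verifier's match objects.

```python
from typing import Optional

RERANKING_CUTOFF = 3  # same value as the constant in audit.py


def diagnose_supported_in_corpus(
    corpus_score: float, same_source_rank: Optional[int]
) -> tuple[str, float]:
    """Pick a label when the corpus supports a claim the retrieved contexts did not."""
    if same_source_rank is None:
        # The supporting source never appeared among the retrieved contexts.
        confidence = max(0.82, min(0.98, corpus_score + 0.12))
        return "retrieval_miss", round(confidence, 3)
    confidence = round(max(0.78, min(0.95, corpus_score + 0.08)), 3)
    if same_source_rank >= RERANKING_CUTOFF:
        # The source was retrieved, but too far down the list.
        return "reranking_failure", confidence
    # The source was retrieved near the top, yet the chunk lacked the span.
    return "chunking_issue", confidence


examples = [
    diagnose_supported_in_corpus(0.9, None),
    diagnose_supported_in_corpus(0.9, 4),
    diagnose_supported_in_corpus(0.5, 0),
]
```

The rank comes from a zero-based `enumerate` over `trace.contexts`, so a cutoff of 3 means the matching source sat in the fourth position or later. Because `_sources_match` falls back to comparing file names, two different documents that share a base name would be treated as the same source.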

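`load_corpus` relies on `_corpus_files` to decide which files count as corpus documents. What follows is a reduced sketch of those selection rules (extension allow-list, skipped directories, 1,000,000-byte size cap). The allow-list and skip set are shortened here, and `select_corpus_files` is an illustrative name rather than a function in the package.

```python
import tempfile
from pathlib import Path

EXTENSIONS = {".md", ".txt", ".rst", ".json"}        # shortened allow-list
SKIP_DIRS = {".git", "node_modules", "__pycache__"}  # shortened skip set
MAX_FILE_BYTES = 1_000_000


def select_corpus_files(root: Path) -> list[Path]:
    """Return readable-looking text files under root, in a stable order."""
    files = []
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        # Assumption: only directories below `root` are checked here.
        if any(part in SKIP_DIRS for part in path.relative_to(root).parts):
            continue
        if path.suffix.lower() not in EXTENSIONS:
            continue
        if path.stat().st_size > MAX_FILE_BYTES:
            continue
        files.append(path)
    return sorted(files, key=lambda item: str(item).lower())


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "guide.md").write_text("Install with pip.", encoding="utf-8")
    (root / "logo.png").write_bytes(b"\x89PNG")
    (root / "node_modules").mkdir()
    (root / "node_modules" / "readme.md").write_text("vendored", encoding="utf-8")
    (root / "big.txt").write_text("x" * (MAX_FILE_BYTES + 1), encoding="utf-8")
    selected = [p.relative_to(root).as_posix() for p in select_corpus_files(root)]
```

One difference is deliberate: the sketch checks only path components below `root`, whereas the released code tests `path.parts` as a whole. As written, a corpus stored under a directory named, for example, `build` or `dist` would appear to have every file skipped.
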
contexttrace-0.5.0/contexttrace/verify/audit_report.py ADDED

@@ -0,0 +1,372 @@
+from __future__ import annotations
+import json
+from html import escape
+from pathlib import Path
+from typing import Any
+from contexttrace.verify.schema import RAGTrace
+class AuditReportGenerator:
+    def generate(self, result: dict[str, Any], trace: RAGTrace, *, path: str) -> str:
+        output_path = Path(path)
+        output_path.parent.mkdir(parents=True, exist_ok=True)
+        output_path.write_text(self.render(result, trace), encoding="utf-8")
+        return str(output_path)
+    def render(self, result: dict[str, Any], trace: RAGTrace) -> str:
+        summary = result.get("summary") or {}
+        claims = list(result.get("claims") or [])
+        return HTML_TEMPLATE.format(
+            query=escape(_string(result.get("query"))),
+            answer=escape(_string(result.get("answer"))),
+            summary_cards=_summary_cards(summary),
+            claim_rows=_claim_rows(claims),
+            retrieval_misses=_claim_cards(claims, {"retrieval_miss"}, "No retrieval misses detected."),
+            chunking_issues=_claim_cards(
+                claims,
+                {"chunking_issue", "reranking_failure"},
+                "No chunking or reranking failures detected.",
+            ),
+            corpus_gaps=_claim_cards(claims, {"corpus_gap"}, "No corpus coverage gaps detected."),
+            answer_overreach=_claim_cards(
+                claims,
+                {"answer_overreach", "insufficient_context", "stale_source"},
+                "No answer overreach, stale source, or insufficient-context failures detected.",
+            ),
+            retrieved_contexts=_retrieved_contexts(trace),
+            corpus_summary=escape(json.dumps(result.get("corpus") or {}, indent=2)),
+            why_failed=_why_failed(claims),
+            raw_json=escape(json.dumps(_raw_summary(result), indent=2)),
+        )
+def _summary_cards(summary: dict[str, Any]) -> str:
+    cards = [
+        ("Primary Audit Label", summary.get("primary_audit_label")),
+        ("Total Claims", summary.get("total_claims", 0)),
+        ("Audited Failures", summary.get("audited_claims", 0)),
+        ("Corpus Documents", summary.get("corpus_documents", 0)),
+        ("Retrieval Misses", summary.get("retrieval_miss", 0)),
+        ("Chunking Issues", summary.get("chunking_issue", 0)),
+        ("Reranking Failures", summary.get("reranking_failure", 0)),
+        ("Corpus Gaps", summary.get("corpus_gap", 0)),
+        ("Answer Overreach", summary.get("answer_overreach", 0)),
+        ("Stale Sources", summary.get("stale_source", 0)),
+        ("Insufficient Context", summary.get("insufficient_context", 0)),
+        ("Verification Failure", summary.get("verification_failure_type")),
+    ]
+    return "\n".join(
+        """
+        <div class="card">
+          <div class="label">{label}</div>
+          <div class="value">{value}</div>
+        </div>
+        """.format(label=escape(label), value=escape(_string(value)))
+        for label, value in cards
+    )
+def _claim_rows(claims: list[dict[str, Any]]) -> str:
+    if not claims:
+        return "<tr><td colspan=\"7\" class=\"muted\">No factual claims were extracted.</td></tr>"
+    rows = []
+    for claim in claims:
+        retrieved = claim.get("retrieved") or {}
+        corpus = claim.get("corpus") or {}
+        label = _string(claim.get("audit_label"))
+        rows.append(
+            """
+            <tr>
+              <td><span class="badge audit-{label_class}">{label}</span></td>
+              <td>{claim}</td>
+              <td>{retrieved_verdict}</td>
+              <td>{retrieved_context}</td>
+              <td>{corpus_verdict}</td>
+              <td>{corpus_document}</td>
+              <td>{fix}</td>
+            </tr>
+            """.format(
+                label_class=escape(_css_token(label)),
+                label=escape(label),
+                claim=escape(_string(claim.get("claim"))),
+                retrieved_verdict=escape(_string(retrieved.get("verdict"))),
+                retrieved_context=escape(_string(retrieved.get("best_context_id") or "none")),
+                corpus_verdict=escape(_string(corpus.get("verdict"))),
+                corpus_document=escape(_string(corpus.get("best_document_id") or "none")),
+                fix=escape(_string(claim.get("suggested_fix"))),
+            )
+        )
+    return "\n".join(rows)
+def _claim_cards(claims: list[dict[str, Any]], labels: set[str], empty: str) -> str:
+    selected = [claim for claim in claims if claim.get("audit_label") in labels]
+    if not selected:
+        return "<p class=\"muted\">%s</p>" % escape(empty)
+    return "\n".join(_claim_card(claim) for claim in selected)
+def _claim_card(claim: dict[str, Any]) -> str:
+    retrieved = claim.get("retrieved") or {}
+    corpus = claim.get("corpus") or {}
+    return """
+    <article class="item">
+      <div class="item-meta">{claim_id} | {label} | confidence {confidence}</div>
+      <h3>{claim}</h3>
+      <p><strong>Diagnosis:</strong> {reason}</p>
+      <p><strong>Retrieved evidence:</strong> {retrieved_evidence}</p>
+      <p class="muted">Retrieved context: {retrieved_context} | verdict {retrieved_verdict} | score {retrieved_score}</p>
+      <p><strong>Corpus evidence:</strong> {corpus_evidence}</p>
+      <p class="muted">Corpus document: {corpus_document} | verdict {corpus_verdict} | score {corpus_score}</p>
+      <p><strong>Suggested fix:</strong> {fix}</p>
+    </article>
+    """.format(
+        claim_id=escape(_string(claim.get("claim_id"))),
+        label=escape(_string(claim.get("audit_label"))),
+        confidence=escape(_string(claim.get("confidence"))),
+        claim=escape(_string(claim.get("claim"))),
+        reason=escape(_string(claim.get("reason"))),
+        retrieved_evidence=escape(_string(retrieved.get("evidence") or "none")),
+        retrieved_context=escape(_string(retrieved.get("best_context_id") or "none")),
+        retrieved_verdict=escape(_string(retrieved.get("verdict"))),
+        retrieved_score=escape(_string(retrieved.get("best_score"))),
+        corpus_evidence=escape(_string(corpus.get("evidence") or "none")),
+        corpus_document=escape(_string(corpus.get("best_document_id") or "none")),
+        corpus_verdict=escape(_string(corpus.get("verdict"))),
+        corpus_score=escape(_string(corpus.get("best_score"))),
+        fix=escape(_string(claim.get("suggested_fix"))),
+    )
+def _retrieved_contexts(trace: RAGTrace) -> str:
+    if not trace.contexts:
+        return "<p class=\"muted\">No retrieved contexts were supplied.</p>"
+    cards = []
+    for index, context in enumerate(trace.contexts, start=1):
+        cards.append(
+            """
+            <article class="item">
+              <div class="item-meta">rank {rank} | {context_id} | {metadata}</div>
+              <p>{text}</p>
+            </article>
+            """.format(
+                rank=index,
+                context_id=escape(context.id),
+                metadata=escape(json.dumps(context.metadata, sort_keys=True) if context.metadata else "no metadata"),
+                text=escape(context.text),
+            )
+        )
+    return "\n".join(cards)
+
+
+def _why_failed(claims: list[dict[str, Any]]) -> str:
+    explanations = []
+    for claim in claims:
+        label = _string(claim.get("audit_label"))
+        if label == "no_failure_detected":
+            continue
+        explanations.append(
+            "%s: %s Suggested fix: %s"
+            % (
+                label,
+                _string(claim.get("reason")),
+                _string(claim.get("suggested_fix")),
+            )
+        )
+    if not explanations:
+        explanations.append("No corpus-level evidence-chain failure was detected.")
+    return "<ul>%s</ul>" % "\n".join("<li>%s</li>" % escape(item) for item in explanations)
+
+
+def _raw_summary(result: dict[str, Any]) -> dict[str, Any]:
+    return {
+        "summary": result.get("summary"),
+        "claims": result.get("claims"),
+        "verification": result.get("verification"),
+        "corpus": result.get("corpus"),
+    }
+
+
+def _css_token(value: Any) -> str:
+    token = _string(value).lower().replace("_", "-").replace(" ", "-")
+    return "".join(char for char in token if char.isalnum() or char == "-") or "unknown"
+
+
+def _string(value: Any) -> str:
+    if value is None:
+        return ""
+    return str(value)
+
+
+HTML_TEMPLATE = """<!doctype html>
+<html lang="en">
+<head>
+  <meta charset="utf-8">
+  <meta name="viewport" content="width=device-width, initial-scale=1">
+  <title>ContextTrace Retrieval Audit Report</title>
+  <style>
+    :root {{
+      color-scheme: light;
+      --bg: #f7f8fa;
+      --panel: #ffffff;
+      --subtle: #fbfcfe;
+      --text: #202832;
+      --muted: #657286;
+      --line: #d9e0ea;
+      --ok: #176f44;
+      --warn: #946200;
+      --bad: #b42318;
+      --accent: #2458d3;
+    }}
+    * {{ box-sizing: border-box; }}
+    body {{
+      margin: 0;
+      background: var(--bg);
+      color: var(--text);
+      font-family: Inter, ui-sans-serif, system-ui, -apple-system, BlinkMacSystemFont, "Segoe UI", sans-serif;
+      line-height: 1.5;
+    }}
+    main {{ max-width: 1160px; margin: 0 auto; padding: 32px 20px 56px; }}
+    header {{ border-bottom: 1px solid var(--line); margin-bottom: 22px; padding-bottom: 18px; }}
+    h1, h2, h3 {{ margin: 0; }}
+    h1 {{ font-size: 30px; }}
+    h2 {{ font-size: 18px; margin-bottom: 12px; }}
+    h3 {{ font-size: 15px; margin-bottom: 8px; }}
+    section {{
+      background: var(--panel);
+      border: 1px solid var(--line);
+      border-radius: 8px;
+      margin: 16px 0;
+      padding: 18px;
+    }}
+    .summary {{
+      display: grid;
+      gap: 12px;
+      grid-template-columns: repeat(auto-fit, minmax(155px, 1fr));
+    }}
+    .card, .item {{
+      border: 1px solid var(--line);
+      border-radius: 8px;
+      background: var(--subtle);
+      padding: 12px;
+    }}
+    .item + .item {{ margin-top: 10px; }}
+    .label, .item-meta {{
+      color: var(--muted);
+      font-size: 12px;
+      font-weight: 700;
+      text-transform: uppercase;
+    }}
+    .value {{ margin-top: 4px; font-size: 18px; overflow-wrap: anywhere; }}
+    .muted {{ color: var(--muted); }}
+    .answer, .item p {{ white-space: pre-wrap; }}
+    table {{ width: 100%; border-collapse: collapse; font-size: 14px; }}
+    th, td {{ border-bottom: 1px solid var(--line); padding: 10px; text-align: left; vertical-align: top; }}
+    th {{ color: var(--muted); font-size: 12px; text-transform: uppercase; }}
+    .badge {{
+      display: inline-block;
+      border-radius: 999px;
+      border: 1px solid var(--line);
+      background: #eef2f7;
+      padding: 3px 8px;
+      font-size: 12px;
+      font-weight: 700;
+      white-space: nowrap;
+    }}
+    .audit-no-failure-detected {{ color: var(--ok); background: #e9f7ef; }}
+    .audit-retrieval-miss, .audit-corpus-gap, .audit-stale-source {{ color: var(--bad); background: #fdeceb; }}
+    .audit-chunking-issue, .audit-reranking-failure,
+    .audit-answer-overreach, .audit-insufficient-context {{ color: var(--warn); background: #fff7df; }}
+    pre {{
+      margin: 0;
+      overflow: auto;
+      background: #101828;
+      color: #f8fafc;
+      border-radius: 8px;
+      padding: 14px;
+      font-size: 13px;
+    }}
+  </style>
+</head>
+<body>
+  <main>
+    <header>
+      <h1>ContextTrace Retrieval Audit Report</h1>
+      <p class="muted">Local corpus-level diagnosis for claim evidence failures.</p>
+    </header>
+    <section>
+      <h2>Audit Summary</h2>
+      <div class="summary">{summary_cards}</div>
+    </section>
+    <section>
+      <h2>Query</h2>
+      <p>{query}</p>
+      <h2>Answer</h2>
+      <p class="answer">{answer}</p>
+    </section>
+    <section>
+      <h2>Claim Failure Diagnosis</h2>
+      <table>
+        <thead>
+          <tr>
+            <th>Audit Label</th>
+            <th>Claim</th>
+            <th>Retrieved Verdict</th>
+            <th>Retrieved Context</th>
+            <th>Corpus Verdict</th>
+            <th>Corpus Document</th>
+            <th>Suggested Fix</th>
+          </tr>
+        </thead>
+        <tbody>{claim_rows}</tbody>
+      </table>
+    </section>
+    <section>
+      <h2>Retrieval Misses</h2>
+      {retrieval_misses}
+    </section>
+    <section>
+      <h2>Chunking And Reranking Issues</h2>
+      {chunking_issues}
+    </section>
+    <section>
+      <h2>Corpus Gaps</h2>
+      {corpus_gaps}
+    </section>
+    <section>
+      <h2>Answer Overreach And Ambiguous Evidence</h2>
+      {answer_overreach}
+    </section>
+    <section>
+      <h2>Retrieved Contexts</h2>
+      {retrieved_contexts}
+    </section>
+    <section>
+      <h2>Corpus Summary</h2>
+      <pre>{corpus_summary}</pre>
+    </section>
+    <section>
+      <h2>Why This Failed</h2>
+      {why_failed}
+    </section>
+    <section>
+      <h2>Raw JSON Summary</h2>
+      <pre>{raw_json}</pre>
+    </section>
+  </main>
+</body>
+</html>
+"""
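
The report helpers in this hunk lean on two standard-library behaviors: `str.format` treats doubled braces (`{{` and `}}`) as literal braces, which is why the CSS in `HTML_TEMPLATE` is written that way, and `html.escape` neutralizes markup in every value before it is substituted. The following is a minimal standalone sketch of that pattern, not the package's code. `to_text` and `css_token` mirror `_string` and `_css_token` from the diff; `MINI_TEMPLATE`, `render_badge`, and the way the `audit-` class prefix is joined to the token are assumptions made for illustration, since the code that builds the badge markup is not shown here.

```python
from html import escape
from typing import Any


def to_text(value: Any) -> str:
    # Mirrors the diff's _string helper: None renders as an empty string.
    return "" if value is None else str(value)


def css_token(value: Any) -> str:
    # Mirrors the diff's _css_token helper: lowercase, turn underscores and
    # spaces into hyphens, drop any other punctuation, fall back to "unknown".
    token = to_text(value).lower().replace("_", "-").replace(" ", "-")
    return "".join(ch for ch in token if ch.isalnum() or ch == "-") or "unknown"


# Hypothetical miniature template. Doubled braces come out of str.format as
# literal CSS braces; single-brace names are substitution fields.
MINI_TEMPLATE = """<style>
  .badge {{ font-weight: 700; }}
</style>
<span class="badge audit-{token}">{label}</span>
<p>{claim}</p>"""


def render_badge(claim: dict) -> str:
    # Assumption: the CSS class is "audit-" plus the tokenized audit label.
    label = to_text(claim.get("audit_label"))
    return MINI_TEMPLATE.format(
        token=css_token(label),
        label=escape(label),
        claim=escape(to_text(claim.get("claim"))),
    )


html = render_badge(
    {"audit_label": "retrieval_miss", "claim": "Refunds take <5 days & are free."}
)
print(html)
```

Escaping at the point of substitution, as `_claim_card` and `_retrieved_contexts` do, keeps claim text or retrieved context that contains `<`, `>`, or `&` from being interpreted as markup in the generated report.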

{contexttrace-0.4.0 → contexttrace-0.5.0}/contexttrace.egg-info/SOURCES.txt RENAMED Viewed

@@ -30,6 +30,8 @@ contexttrace/storage/__init__.py
 contexttrace/storage/sqlite_store.py
 contexttrace/verify/__init__.py
 contexttrace/verify/abstention.py
+contexttrace/verify/audit.py
+contexttrace/verify/audit_report.py
 contexttrace/verify/benchmark.py
 contexttrace/verify/citations.py
 contexttrace/verify/claims.py

{contexttrace-0.4.0 → contexttrace-0.5.0}/pyproject.toml RENAMED Viewed

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 [project]
 name = "contexttrace"
-version = "0.4.0"
+version = "0.5.0"
 description = "Local-first SDK and CLI for RAG and agent reliability tracing, citation checks, and failure diagnosis."
 readme = "README.md"
 requires-python = ">=3.8"

contexttrace-0.4.0/contexttrace/_version.py DELETED Viewed

@@ -1 +0,0 @@
-__version__ = "0.4.0"
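
With `contexttrace/_version.py` removed, any external code that imported `__version__` from that module would need another source for the version string. This diff does not show where, or whether, 0.5.0 exposes its version at runtime, so the sketch below is only one possible approach: it reads the version from installed distribution metadata through the standard-library `importlib.metadata`, which is available on the Python 3.8+ range declared in `pyproject.toml`. The helper name `installed_version` and its `default` parameter are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution: str, default: str = "unknown") -> str:
    """Return the installed version of a distribution, or `default` if absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        # The distribution is not installed in this environment.
        return default


# Prints the installed contexttrace version, or "unknown" if it is not installed.
print(installed_version("contexttrace"))
```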

{contexttrace-0.4.0 → contexttrace-0.5.0}/MANIFEST.in RENAMED Viewed

File without changes

{contexttrace-0.4.0 → contexttrace-0.5.0}/contexttrace/__init__.py RENAMED Viewed

File without changes

{contexttrace-0.4.0 → contexttrace-0.5.0}/contexttrace/client.py RENAMED Viewed

File without changes

{contexttrace-0.4.0 → contexttrace-0.5.0}/contexttrace/config.py RENAMED Viewed

File without changes

{contexttrace-0.4.0 → contexttrace-0.5.0}/contexttrace/demo.py RENAMED Viewed

File without changes

{contexttrace-0.4.0 → contexttrace-0.5.0}/contexttrace/demo_data.py RENAMED Viewed

File without changes

{contexttrace-0.4.0 → contexttrace-0.5.0}/contexttrace/endpoint_eval.py RENAMED Viewed

File without changes

{contexttrace-0.4.0 → contexttrace-0.5.0}/contexttrace/errors.py RENAMED Viewed

File without changes

{contexttrace-0.4.0 → contexttrace-0.5.0}/contexttrace/evaluator.py RENAMED Viewed

File without changes

{contexttrace-0.4.0 → contexttrace-0.5.0}/contexttrace/integrations/__init__.py RENAMED Viewed

File without changes

{contexttrace-0.4.0 → contexttrace-0.5.0}/contexttrace/integrations/fastapi.py RENAMED Viewed

File without changes

{contexttrace-0.4.0 → contexttrace-0.5.0}/contexttrace/integrations/langchain.py RENAMED Viewed

File without changes

{contexttrace-0.4.0 → contexttrace-0.5.0}/contexttrace/integrations/langgraph.py RENAMED Viewed

File without changes

{contexttrace-0.4.0 → contexttrace-0.5.0}/contexttrace/integrations/llamaindex.py RENAMED Viewed

File without changes

{contexttrace-0.4.0 → contexttrace-0.5.0}/contexttrace/integrations/opentelemetry.py RENAMED Viewed

File without changes

{contexttrace-0.4.0 → contexttrace-0.5.0}/contexttrace/local.py RENAMED Viewed

File without changes

{contexttrace-0.4.0 → contexttrace-0.5.0}/contexttrace/py.typed RENAMED Viewed

File without changes

{contexttrace-0.4.0 → contexttrace-0.5.0}/contexttrace/regression.py RENAMED Viewed

File without changes

{contexttrace-0.4.0 → contexttrace-0.5.0}/contexttrace/reliability.py RENAMED Viewed

File without changes

{contexttrace-0.4.0 → contexttrace-0.5.0}/contexttrace/report.py RENAMED Viewed

File without changes

{contexttrace-0.4.0 → contexttrace-0.5.0}/contexttrace/storage/__init__.py RENAMED Viewed

File without changes

{contexttrace-0.4.0 → contexttrace-0.5.0}/contexttrace/storage/sqlite_store.py RENAMED Viewed

File without changes

{contexttrace-0.4.0 → contexttrace-0.5.0}/contexttrace/thresholds.py RENAMED Viewed

File without changes

{contexttrace-0.4.0 → contexttrace-0.5.0}/contexttrace/transport.py RENAMED Viewed

File without changes

{contexttrace-0.4.0 → contexttrace-0.5.0}/contexttrace/verify/abstention.py RENAMED Viewed

File without changes

{contexttrace-0.4.0 → contexttrace-0.5.0}/contexttrace/verify/benchmark.py RENAMED Viewed

File without changes

{contexttrace-0.4.0 → contexttrace-0.5.0}/contexttrace/verify/citations.py RENAMED Viewed

File without changes

{contexttrace-0.4.0 → contexttrace-0.5.0}/contexttrace/verify/claims.py RENAMED Viewed

File without changes

{contexttrace-0.4.0 → contexttrace-0.5.0}/contexttrace/verify/compare.py RENAMED Viewed

File without changes

{contexttrace-0.4.0 → contexttrace-0.5.0}/contexttrace/verify/compare_report.py RENAMED Viewed

File without changes

{contexttrace-0.4.0 → contexttrace-0.5.0}/contexttrace/verify/demos.py RENAMED Viewed

File without changes

{contexttrace-0.4.0 → contexttrace-0.5.0}/contexttrace/verify/evidence.py RENAMED Viewed

File without changes

{contexttrace-0.4.0 → contexttrace-0.5.0}/contexttrace/verify/external_benchmark_cases.json RENAMED Viewed

File without changes

{contexttrace-0.4.0 → contexttrace-0.5.0}/contexttrace/verify/facts.py RENAMED Viewed

File without changes

{contexttrace-0.4.0 → contexttrace-0.5.0}/contexttrace/verify/real_benchmark_cases.json RENAMED Viewed

File without changes

{contexttrace-0.4.0 → contexttrace-0.5.0}/contexttrace/verify/report.py RENAMED Viewed

File without changes

{contexttrace-0.4.0 → contexttrace-0.5.0}/contexttrace/verify/root_cause.py RENAMED Viewed

File without changes

{contexttrace-0.4.0 → contexttrace-0.5.0}/contexttrace/verify/runner.py RENAMED Viewed

File without changes

{contexttrace-0.4.0 → contexttrace-0.5.0}/contexttrace/verify/schema.py RENAMED Viewed

File without changes

{contexttrace-0.4.0 → contexttrace-0.5.0}/contexttrace/verify/spans.py RENAMED Viewed

File without changes

{contexttrace-0.4.0 → contexttrace-0.5.0}/contexttrace/verify/verdicts.py RENAMED Viewed

File without changes

{contexttrace-0.4.0 → contexttrace-0.5.0}/contexttrace/viewer.py RENAMED Viewed

File without changes

{contexttrace-0.4.0 → contexttrace-0.5.0}/setup.cfg RENAMED Viewed

File without changes

{contexttrace-0.4.0 → contexttrace-0.5.0}/setup.py RENAMED Viewed

File without changes
