evidence-gate-py 0.1.0__tar.gz

@@ -0,0 +1,13 @@
+ node_modules/
+ dist/
+ *.tgz
+ __pycache__/
+ *.pyc
+ .pytest_cache/
+ build/
+ *.egg-info/
+ .venv/
+ .DS_Store
+
+ node_modules/
+ package-lock.json
@@ -0,0 +1,124 @@
+ Metadata-Version: 2.4
+ Name: evidence-gate-py
+ Version: 0.1.0
+ Summary: Stop your AI from making up facts about your own data.
+ Project-URL: Homepage, https://github.com/LopezDray/evidence-gate
+ Author: LopezDray
+ License: MIT
+ Keywords: agents,ai,grounding,guardrails,hallucination,llm,rag,reliability
+ Requires-Python: >=3.8
+ Description-Content-Type: text/markdown
+
+ # Evidence Gate
+
+ **Stop your AI from making up facts about your own data.**
+
+ A tiny, zero-dependency gate you call **before** an LLM generates anything. It looks at the records you actually have and returns whether the model may summarize, whether the data is stale or low-quality, and the exact caveats to inject into your prompt.
+
+ It is the missing step between "retrieve" and "generate" in most RAG and agent pipelines: a check that the evidence is actually good enough to speak from.
+
+ ```js
+ import { evidenceGate, presets } from "evidence-gate";
+
+ const gate = evidenceGate({ records: myRecords, rules: presets.FINANCE });
+
+ if (!gate.allowedActions.summarize) {
+   return gate.caveats.join(" "); // e.g. "No financial statements available — the AI must not invent numbers."
+ }
+
+ const prompt = `${systemPrompt}\n\nDATA RULES:\n- ${gate.caveats.join("\n- ")}\n\n${userPrompt}`;
+ ```
+
+ ---
+
+ ## The problem
+
+ LLMs answer confidently even when the underlying data is missing, stale, or partial. Over your own database that means a chatbot inventing a quarter that doesn't exist, quoting a number from a cached/secondary source as if it were authoritative, or saying "as of today" about data that's months old.
+
+ Content-safety guardrails (toxicity, PII) don't catch this. This is a **grounding** problem, and it needs a grounding gate.
+
+ ## What it does
+
+ Given a set of evidence `records` and a `rules` preset, Evidence Gate returns:
+
+ ```js
+ {
+   status: "available" | "quality_warning" | "fallback" | "missing",
+   freshness: "fresh" | "stale" | "unknown",
+   allowedActions: { summarize, compare, /* + your forbidden actions forced to false */ },
+   warnings: [{ level, code, message }],
+   caveats: ["...strings ready to inject into the prompt..."]
+ }
+ ```
+
+ - **`status`** — is there enough authoritative evidence to speak from at all?
+ - **`allowedActions`** — gate generation on `summarize`; forbidden actions (e.g. `personalized_advice`, `diagnose`, `claim_realtime`) are always `false`.
+ - **`caveats`** — drop straight into your system prompt so the model self-limits.
+
+ It is fully **domain-agnostic**. The same engine works for finance, healthcare, support, legal — a domain is just a preset.
+
+ ## Install
+
+ ```bash
+ npm install evidence-gate
+ # or
+ pip install evidence-gate-py
+ ```
+
+ ## Records
+
+ Each record is one observation/period you have evidence for:
+
+ ```js
+ {
+   date: "2026-03-31", // ISO date of the observation
+   qualityScore: 92, // optional 0-100
+   quality: "clean", // optional "clean" | "review"
+   flags: ["RESTATED"], // optional data-quality flags
+   tier: "primary" // optional "primary" (default) | "fallback" (cached/secondary)
+ }
+ ```
+
+ ## Presets
+
+ A preset is a ruleset. Adding a vertical = copying a preset, never touching the core.
+
+ ```js
+ export const FINANCE = {
+   primaryLabel: "financial statements",
+   staleDays: 135,
+   minRecords: 4,
+   qualityThreshold: 70,
+   forbiddenActions: ["personalized_advice", "claim_realtime"],
+ };
+ ```
+
+ Ships with `FINANCE`, `HEALTH`, and `SUPPORT` examples. Override any message via `rules.messages`.
+
+ ## Use it as an MCP server
+
+ Give an agent a fact-checker it calls before it speaks. The package ships an MCP server exposing a `check_evidence` tool, so any MCP-compatible agent (Claude, IDE assistants, etc.) can gate itself.
+
+ ```bash
+ npm install evidence-gate @modelcontextprotocol/sdk # the SDK is an optional peer
+ ```
+
+ Register it with your client:
+
+ ```json
+ {
+   "mcpServers": {
+     "evidence-gate": { "command": "npx", "args": ["evidence-gate-mcp"] }
+   }
+ }
+ ```
+
+ The tool accepts `{ records, supporting?, preset?, rules? }` and returns the gate result. Instruct your agent to call it first and refuse to answer when `allowedActions.summarize` is `false`.
+
+ ## Why trust it
+
+ The engine is intentionally small and pure — no network, no dependencies, all logic unit-tested. It runs in production today inside a financial-data application, gating an LLM over real, messy, sometimes-missing filings.
+
+ ## License
+
+ MIT
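
The record example in the README above uses the JavaScript field name `qualityScore`, while the Python engine in this release reads `quality_score`. A minimal sketch of a key-mapping helper, assuming records arrive in the README's camelCase shape; `to_python_record` and `KEY_MAP` are hypothetical names and not part of the package:

```python
# Hypothetical helper (not part of evidence-gate-py): rename README-style
# camelCase keys to the snake_case keys the Python engine reads.
KEY_MAP = {"qualityScore": "quality_score"}


def to_python_record(record):
    """Return a copy of `record` with known camelCase keys renamed."""
    return {KEY_MAP.get(key, key): value for key, value in record.items()}


# A record written the way the README documents it.
js_style = {"date": "2026-03-31", "qualityScore": 92, "flags": ["RESTATED"], "tier": "primary"}
py_style = to_python_record(js_style)
print(py_style)
```

The other record keys (`date`, `quality`, `flags`, `tier`) are spelled the same in both styles, so only the score key needs renaming.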
@@ -0,0 +1,113 @@
+ # Evidence Gate
+
+ **Stop your AI from making up facts about your own data.**
+
+ A tiny, zero-dependency gate you call **before** an LLM generates anything. It looks at the records you actually have and returns whether the model may summarize, whether the data is stale or low-quality, and the exact caveats to inject into your prompt.
+
+ It is the missing step between "retrieve" and "generate" in most RAG and agent pipelines: a check that the evidence is actually good enough to speak from.
+
+ ```js
+ import { evidenceGate, presets } from "evidence-gate";
+
+ const gate = evidenceGate({ records: myRecords, rules: presets.FINANCE });
+
+ if (!gate.allowedActions.summarize) {
+   return gate.caveats.join(" "); // e.g. "No financial statements available — the AI must not invent numbers."
+ }
+
+ const prompt = `${systemPrompt}\n\nDATA RULES:\n- ${gate.caveats.join("\n- ")}\n\n${userPrompt}`;
+ ```
+
+ ---
+
+ ## The problem
+
+ LLMs answer confidently even when the underlying data is missing, stale, or partial. Over your own database that means a chatbot inventing a quarter that doesn't exist, quoting a number from a cached/secondary source as if it were authoritative, or saying "as of today" about data that's months old.
+
+ Content-safety guardrails (toxicity, PII) don't catch this. This is a **grounding** problem, and it needs a grounding gate.
+
+ ## What it does
+
+ Given a set of evidence `records` and a `rules` preset, Evidence Gate returns:
+
+ ```js
+ {
+   status: "available" | "quality_warning" | "fallback" | "missing",
+   freshness: "fresh" | "stale" | "unknown",
+   allowedActions: { summarize, compare, /* + your forbidden actions forced to false */ },
+   warnings: [{ level, code, message }],
+   caveats: ["...strings ready to inject into the prompt..."]
+ }
+ ```
+
+ - **`status`** — is there enough authoritative evidence to speak from at all?
+ - **`allowedActions`** — gate generation on `summarize`; forbidden actions (e.g. `personalized_advice`, `diagnose`, `claim_realtime`) are always `false`.
+ - **`caveats`** — drop straight into your system prompt so the model self-limits.
+
+ It is fully **domain-agnostic**. The same engine works for finance, healthcare, support, legal — a domain is just a preset.
+
+ ## Install
+
+ ```bash
+ npm install evidence-gate
+ # or
+ pip install evidence-gate-py
+ ```
+
+ ## Records
+
+ Each record is one observation/period you have evidence for:
+
+ ```js
+ {
+   date: "2026-03-31", // ISO date of the observation
+   qualityScore: 92, // optional 0-100
+   quality: "clean", // optional "clean" | "review"
+   flags: ["RESTATED"], // optional data-quality flags
+   tier: "primary" // optional "primary" (default) | "fallback" (cached/secondary)
+ }
+ ```
+
+ ## Presets
+
+ A preset is a ruleset. Adding a vertical = copying a preset, never touching the core.
+
+ ```js
+ export const FINANCE = {
+   primaryLabel: "financial statements",
+   staleDays: 135,
+   minRecords: 4,
+   qualityThreshold: 70,
+   forbiddenActions: ["personalized_advice", "claim_realtime"],
+ };
+ ```
+
+ Ships with `FINANCE`, `HEALTH`, and `SUPPORT` examples. Override any message via `rules.messages`.
+
+ ## Use it as an MCP server
+
+ Give an agent a fact-checker it calls before it speaks. The package ships an MCP server exposing a `check_evidence` tool, so any MCP-compatible agent (Claude, IDE assistants, etc.) can gate itself.
+
+ ```bash
+ npm install evidence-gate @modelcontextprotocol/sdk # the SDK is an optional peer
+ ```
+
+ Register it with your client:
+
+ ```json
+ {
+   "mcpServers": {
+     "evidence-gate": { "command": "npx", "args": ["evidence-gate-mcp"] }
+   }
+ }
+ ```
+
+ The tool accepts `{ records, supporting?, preset?, rules? }` and returns the gate result. Instruct your agent to call it first and refuse to answer when `allowedActions.summarize` is `false`.
+
+ ## Why trust it
+
+ The engine is intentionally small and pure — no network, no dependencies, all logic unit-tested. It runs in production today inside a financial-data application, gating an LLM over real, messy, sometimes-missing filings.
+
+ ## License
+
+ MIT
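
The README quick-start is written against the JavaScript API. For the Python module in this release, the same "refuse or inject caveats" step might look like the sketch below. It assumes the snake_case result shape that `evidence_gate` returns in the core module (`allowed_actions`, `caveats`). The names `build_prompt`, `system_prompt`, and `user_prompt` are hypothetical, and the gate results are hand-written with illustrative caveat strings so the sketch runs without the package installed:

```python
def build_prompt(gate, system_prompt, user_prompt):
    """Return (allowed, text).

    When summarizing is not allowed, `text` is the joined caveats, to be shown
    instead of calling the model. Otherwise `text` is the full prompt with the
    caveats injected as data rules.
    """
    if not gate["allowed_actions"].get("summarize", False):
        return False, " ".join(gate["caveats"])
    rules = "\n".join(f"- {caveat}" for caveat in gate["caveats"])
    return True, f"{system_prompt}\n\nDATA RULES:\n{rules}\n\n{user_prompt}"


# Hand-written stand-ins for evidence_gate(...) results (illustrative only).
missing_gate = {
    "status": "missing",
    "freshness": "unknown",
    "allowed_actions": {"summarize": False, "compare": False},
    "caveats": ["No financial statements available.", "No supporting evidence."],
}
stale_gate = {
    "status": "available",
    "freshness": "stale",
    "allowed_actions": {"summarize": True, "compare": True},
    "caveats": ["Data is stale."],
}

blocked = build_prompt(missing_gate, "SYS", "USER")
allowed = build_prompt(stale_gate, "SYS", "USER")
print(blocked)
print(allowed)
```

In real use the `gate` argument would come from `evidence_gate(records=..., rules=...)` rather than a literal dict.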
@@ -0,0 +1,21 @@
+ """Evidence Gate — stop your AI from making up facts about your own data."""
+ from .core import (
+     evidence_gate,
+     classify_status,
+     derive_allowed_actions,
+     parse_date,
+     days_since,
+     freshness_label,
+ )
+ from . import presets
+
+ __all__ = [
+     "evidence_gate",
+     "classify_status",
+     "derive_allowed_actions",
+     "parse_date",
+     "days_since",
+     "freshness_label",
+     "presets",
+ ]
+ __version__ = "0.1.0"
@@ -0,0 +1,123 @@
+ """Evidence Gate — core engine (Python).
+
+ Decide whether an LLM may speak about a set of records, BEFORE it generates
+ anything — based on evidence coverage, freshness, and quality.
+
+ No dependencies. Domain-agnostic: bring records + a ruleset (preset).
+
+ record = {"date", "quality_score"?, "quality"?, "flags"?, "tier"?}
+ """
+ from datetime import datetime, date
+
+
+ def parse_date(value):
+     if not value:
+         return None
+     try:
+         return datetime.strptime(str(value)[:10], "%Y-%m-%d").date()
+     except ValueError:
+         return None
+
+
+ def days_since(d):
+     if not d:
+         return None
+     return (date.today() - d).days
+
+
+ def freshness_label(latest, threshold_days):
+     age = days_since(latest)
+     if age is None:
+         return "unknown"
+     return "stale" if age > threshold_days else "fresh"
+
+
+ def classify_status(records, rules):
+     """records[] + rules -> status of the primary evidence group.
+
+     rules: {"stale_days", "min_records", "quality_threshold"}
+     status: "available" | "quality_warning" | "fallback" | "missing"
+     """
+     usable = records or []
+     if not usable:
+         return {"status": "missing", "freshness": "unknown", "latest": None,
+                 "count": 0, "quality_min": None, "flags": []}
+
+     dates = [d for d in (parse_date(r.get("date")) for r in usable) if d]
+     latest = max(dates) if dates else None
+     freshness = freshness_label(latest, rules["stale_days"]) if latest else "unknown"
+     latest_str = latest.isoformat() if latest else None
+
+     if all(r.get("tier") == "fallback" for r in usable):
+         return {"status": "fallback", "freshness": freshness, "latest": latest_str,
+                 "count": len(usable), "quality_min": None, "flags": []}
+
+     scores = [r["quality_score"] for r in usable if r.get("quality_score") is not None]
+     quality_min = min(scores) if scores else None
+     flags = []
+     for r in usable:
+         wf = r.get("flags")
+         if isinstance(wf, list):
+             flags.extend(wf)
+         elif isinstance(wf, str) and wf:
+             flags.append(wf)
+     flags = sorted(set(flags))
+
+     review_quality = any(r.get("quality") == "review" for r in usable)
+     has_quality_issue = (
+         (quality_min is not None and quality_min < rules["quality_threshold"])
+         or bool(flags)
+         or review_quality
+     )
+
+     if len(usable) < rules["min_records"]:
+         status = "quality_warning"
+     elif has_quality_issue:
+         status = "quality_warning"
+     else:
+         status = "available"
+
+     return {"status": status, "freshness": freshness, "latest": latest_str,
+             "count": len(usable), "quality_min": quality_min, "flags": flags}
+
+
+ def derive_allowed_actions(primary_status, supporting_present=False, forbidden_actions=None):
+     summarize = primary_status in ("available", "quality_warning", "fallback") or supporting_present
+     compare = primary_status in ("available", "quality_warning")
+     actions = {"summarize": summarize, "compare": compare}
+     for a in (forbidden_actions or []):
+         actions[a] = False
+     return actions
+
+
+ def evidence_gate(records=None, supporting=None, rules=None):
+     """records + rules -> {status, freshness, allowed_actions, warnings, caveats}."""
+     if not rules:
+         raise ValueError("evidence_gate: `rules` (a preset) is required")
+
+     primary = classify_status(records or [], rules)
+     supporting_present = bool(supporting or [])
+     allowed = derive_allowed_actions(primary["status"], supporting_present, rules.get("forbidden_actions"))
+
+     warnings = []
+     m = rules.get("messages", {})
+     label = rules.get("primary_label", "primary data")
+     st = primary["status"]
+     if st == "missing":
+         warnings.append({"level": "block", "code": "primary_missing",
+                          "message": m.get("primary_missing", f"No {label} available — the AI must not invent numbers.")})
+     elif st == "fallback":
+         warnings.append({"level": "review", "code": "primary_fallback",
+                          "message": m.get("primary_fallback", f"{label} is cached/fallback, not authoritative — the AI must say so.")})
+     elif st == "quality_warning":
+         warnings.append({"level": "review", "code": "primary_quality",
+                          "message": m.get("primary_quality", f"{label} has a data-quality warning — the AI must add a caveat.")})
+     if primary["freshness"] == "stale":
+         warnings.append({"level": "review", "code": "primary_stale",
+                          "message": m.get("primary_stale", f'{label} is stale (older than {rules["stale_days"]} days) — the AI must not say "latest" or "today".')})
+     if not supporting_present:
+         warnings.append({"level": "info", "code": "no_supporting",
+                          "message": m.get("no_supporting", "No supporting evidence — primary source only.")})
+
+     return {"status": st, "freshness": primary["freshness"], "allowed_actions": allowed,
+             "warnings": warnings, "caveats": [w["message"] for w in warnings]}
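
Because `days_since` reads `date.today()` directly, freshness results depend on the day the code runs. One way to pin down the boundary behavior is a variant that accepts the reference date as a parameter. The sketch below is an assumption about how that could look, not part of the package: `freshness_label_at` and its `today` parameter are hypothetical, while the comparison (`age > threshold_days`) mirrors `freshness_label` in the hunk above.

```python
from datetime import date, timedelta


def freshness_label_at(latest, threshold_days, today=None):
    """Hypothetical variant of freshness_label with an injectable reference date."""
    if latest is None:
        return "unknown"
    reference = today or date.today()
    age = (reference - latest).days
    # Same strict comparison as the original: exactly `threshold_days` old is still fresh.
    return "stale" if age > threshold_days else "fresh"


fixed_today = date(2026, 6, 1)
on_boundary = freshness_label_at(fixed_today - timedelta(days=135), 135, today=fixed_today)
past_boundary = freshness_label_at(fixed_today - timedelta(days=136), 135, today=fixed_today)
no_date = freshness_label_at(None, 135, today=fixed_today)
print(on_boundary, past_boundary, no_date)
```

With the `FINANCE` threshold of 135 days, a record exactly 135 days old is still labeled fresh; it becomes stale on day 136.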
@@ -0,0 +1,25 @@
+ """Evidence Gate — example presets. A vertical is just a ruleset."""
+
+ FINANCE = {
+     "primary_label": "financial statements",
+     "stale_days": 135,
+     "min_records": 4,
+     "quality_threshold": 70,
+     "forbidden_actions": ["personalized_advice", "claim_realtime"],
+ }
+
+ HEALTH = {
+     "primary_label": "lab results",
+     "stale_days": 90,
+     "min_records": 2,
+     "quality_threshold": 80,
+     "forbidden_actions": ["diagnose", "prescribe", "claim_realtime"],
+ }
+
+ SUPPORT = {
+     "primary_label": "knowledge-base documents",
+     "stale_days": 365,
+     "min_records": 1,
+     "quality_threshold": 50,
+     "forbidden_actions": ["make_promises", "claim_realtime"],
+ }
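
Since a preset is a plain dict, adding a vertical can be sketched as copying one and overriding fields. The example below is illustrative only: `LEGAL`, its values, and `missing_rule_keys` are hypothetical and do not ship with the package. `FINANCE` is copied from the hunk above so the sketch runs standalone, and the required keys are the ones `classify_status` indexes directly (`stale_days`, `min_records`, `quality_threshold`).

```python
# Copied from the presets hunk above so this sketch is self-contained.
FINANCE = {
    "primary_label": "financial statements",
    "stale_days": 135,
    "min_records": 4,
    "quality_threshold": 70,
    "forbidden_actions": ["personalized_advice", "claim_realtime"],
}

# Hypothetical new vertical: start from FINANCE and override what differs.
LEGAL = {
    **FINANCE,
    "primary_label": "case filings",
    "stale_days": 30,
    "forbidden_actions": ["legal_advice", "claim_realtime"],
    # Optional message overrides, keyed by warning code.
    "messages": {"primary_missing": "No case filings on record; do not cite any."},
}

# Keys that classify_status reads with rules[...], so they must be present.
REQUIRED_KEYS = ("stale_days", "min_records", "quality_threshold")


def missing_rule_keys(rules):
    """Return the required keys that `rules` lacks (empty list means usable)."""
    return [key for key in REQUIRED_KEYS if key not in rules]


print(missing_rule_keys(LEGAL))
print(missing_rule_keys({"stale_days": 30}))
```

Checking the required keys up front turns a `KeyError` deep inside the engine into an explicit validation step at configuration time.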
@@ -0,0 +1,19 @@
+ [project]
+ name = "evidence-gate-py"
+ version = "0.1.0"
+ description = "Stop your AI from making up facts about your own data."
+ readme = "README.md"
+ requires-python = ">=3.8"
+ license = { text = "MIT" }
+ keywords = ["llm", "rag", "ai", "agents", "guardrails", "hallucination", "grounding", "reliability"]
+ authors = [{ name = "LopezDray" }]
+
+ [project.urls]
+ Homepage = "https://github.com/LopezDray/evidence-gate"
+
+ [build-system]
+ requires = ["hatchling"]
+ build-backend = "hatchling.build"
+
+ [tool.hatch.build.targets.wheel]
+ packages = ["evidence_gate"]
@@ -0,0 +1,53 @@
+ """Evidence Gate — core tests: python -m pytest (or: python tests/test_core.py)"""
+ import os
+ import sys
+ from datetime import date, timedelta
+
+ sys.path.insert(0, os.path.join(os.path.dirname(__file__), ".."))
+ from evidence_gate.core import classify_status, derive_allowed_actions, evidence_gate
+ from evidence_gate.presets import FINANCE, HEALTH
+
+
+ def iso(days_ago):
+     return (date.today() - timedelta(days=days_ago)).isoformat()
+
+
+ def rec(q, days_ago, **extra):
+     return {"date": iso(days_ago), "quality_score": q, **extra}
+
+
+ def test_classify():
+     assert classify_status([rec(95, 5)] * 4, FINANCE)["status"] == "available"
+     assert classify_status([rec(95, 5)] * 2, FINANCE)["status"] == "quality_warning"
+     assert classify_status([rec(95, 5), rec(95, 5), rec(95, 5), rec(40, 5)], FINANCE)["status"] == "quality_warning"
+     assert classify_status([], FINANCE)["status"] == "missing"
+     assert classify_status([rec(95, 5, tier="fallback")] * 2, FINANCE)["status"] == "fallback"
+     assert classify_status([rec(95, 200)] * 4, FINANCE)["freshness"] == "stale"
+
+
+ def test_allowed_actions():
+     a = derive_allowed_actions("available", forbidden_actions=["personalized_advice"])
+     assert a["summarize"] and a["compare"] and a["personalized_advice"] is False
+     assert derive_allowed_actions("missing")["summarize"] is False
+     assert derive_allowed_actions("missing", supporting_present=True)["summarize"] is True
+
+
+ def test_date_validation():
+     from evidence_gate.core import parse_date
+     assert parse_date(None) is None
+     assert parse_date("not-a-date") is None
+     assert parse_date("2026-13-01") is None  # impossible month
+     assert parse_date("2026-02-31") is None  # impossible day
+     assert parse_date("2024-02-29") is not None  # valid leap day
+
+
+ def test_domain_swap():
+     g = evidence_gate(records=[rec(95, 5), rec(95, 5)], rules=HEALTH)
+     assert g["status"] == "available"
+     assert g["allowed_actions"]["diagnose"] is False
+     assert g["allowed_actions"]["prescribe"] is False
+
+
+ if __name__ == "__main__":
+     test_classify(); test_allowed_actions(); test_date_validation(); test_domain_swap()
+     print("all tests passed")
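
`test_allowed_actions` above spot-checks three cases. As a complement, the full status-by-supporting-evidence table can be enumerated. The sketch below restates the logic of `derive_allowed_actions` from the core hunk so that it runs without the package installed (in real use you would import it instead); `STATUSES` and `table` are names introduced here for illustration.

```python
def derive_allowed_actions(primary_status, supporting_present=False, forbidden_actions=None):
    """Local restatement of the core function, for a standalone demo."""
    summarize = primary_status in ("available", "quality_warning", "fallback") or supporting_present
    compare = primary_status in ("available", "quality_warning")
    actions = {"summarize": summarize, "compare": compare}
    for name in (forbidden_actions or []):
        actions[name] = False  # forbidden actions are always forced off
    return actions


STATUSES = ("available", "quality_warning", "fallback", "missing")

# Every (status, supporting_present) combination, with one forbidden action.
table = {
    (status, supporting): derive_allowed_actions(status, supporting, ["claim_realtime"])
    for status in STATUSES
    for supporting in (False, True)
}

for (status, supporting), actions in table.items():
    print(f"{status:16} supporting={supporting!s:5} -> {actions}")
```

The table makes two properties easy to see: supporting evidence can unlock `summarize` when primary data is missing, but it never unlocks `compare`, and a forbidden action stays `False` in every row.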