evidence-gate-py 0.1.0__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- evidence_gate_py-0.1.0/.gitignore +13 -0
- evidence_gate_py-0.1.0/PKG-INFO +124 -0
- evidence_gate_py-0.1.0/README.md +113 -0
- evidence_gate_py-0.1.0/evidence_gate/__init__.py +21 -0
- evidence_gate_py-0.1.0/evidence_gate/core.py +123 -0
- evidence_gate_py-0.1.0/evidence_gate/presets.py +25 -0
- evidence_gate_py-0.1.0/pyproject.toml +19 -0
- evidence_gate_py-0.1.0/tests/test_core.py +53 -0

evidence_gate_py-0.1.0/PKG-INFO
@@ -0,0 +1,124 @@
Metadata-Version: 2.4
Name: evidence-gate-py
Version: 0.1.0
Summary: Stop your AI from making up facts about your own data.
Project-URL: Homepage, https://github.com/LopezDray/evidence-gate
Author: LopezDray
License: MIT
Keywords: agents,ai,grounding,guardrails,hallucination,llm,rag,reliability
Requires-Python: >=3.8
Description-Content-Type: text/markdown

# Evidence Gate

**Stop your AI from making up facts about your own data.**

A tiny, zero-dependency gate you call **before** an LLM generates anything. It looks at the records you actually have and returns whether the model may summarize, whether the data is stale or low-quality, and the exact caveats to inject into your prompt.

It is the missing step between "retrieve" and "generate" in most RAG and agent pipelines: a check that the evidence is actually good enough to speak from.

```js
import { evidenceGate, presets } from "evidence-gate";

const gate = evidenceGate({ records: myRecords, rules: presets.FINANCE });

if (!gate.allowedActions.summarize) {
  return gate.caveats.join(" "); // e.g. "No financial statements available — the AI must not invent numbers."
}

const prompt = `${systemPrompt}\n\nDATA RULES:\n- ${gate.caveats.join("\n- ")}\n\n${userPrompt}`;
```
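
The quick start above is JavaScript. This sdist is the Python port. Its PKG-INFO names the distribution `evidence-gate-py`, so the PyPI install is `pip install evidence-gate-py`, and the import package is `evidence_gate`. In Python, `evidence_gate(records=..., rules=presets.FINANCE)` returns a dict with snake_case keys such as `allowed_actions`.

The block below is a minimal Python sketch of the same gate-then-prompt step. `gated_prompt` is a hypothetical helper. The `gate` literal is a hand-built stand-in for a real result, with the caveat wording shortened.

```python
from typing import Any, Dict, Tuple


def gated_prompt(gate: Dict[str, Any], system_prompt: str, user_prompt: str) -> Tuple[bool, str]:
    """Return (True, prompt) if the gate allows summarizing, else (False, refusal text)."""
    caveats = gate["caveats"]
    if not gate["allowed_actions"]["summarize"]:
        # Nothing to speak from: hand the caveats back instead of calling the LLM.
        return False, " ".join(caveats)
    # Inject each caveat as a bullet under DATA RULES, as the JS example does.
    rules_block = "\n- ".join(caveats)
    return True, f"{system_prompt}\n\nDATA RULES:\n- {rules_block}\n\n{user_prompt}"


# Hand-built stand-in for evidence_gate(records=my_records, rules=presets.FINANCE).
gate = {
    "status": "available",
    "freshness": "fresh",
    "allowed_actions": {"summarize": True, "compare": True,
                        "personalized_advice": False, "claim_realtime": False},
    "warnings": [{"level": "info", "code": "no_supporting",
                  "message": "No supporting evidence, primary source only."}],
    "caveats": ["No supporting evidence, primary source only."],
}

ok, text = gated_prompt(gate, "You are a financial analyst.", "Summarize the last four quarters.")
print(ok)
print(text)
```

With the package installed, `from evidence_gate import evidence_gate, presets` followed by `gate = evidence_gate(records=my_records, rules=presets.FINANCE)` would replace the literal.
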

---

## The problem

LLMs answer confidently even when the underlying data is missing, stale, or partial. Over your own database that means a chatbot inventing a quarter that doesn't exist, quoting a number from a cached/secondary source as if it were authoritative, or saying "as of today" about data that's months old.

Content-safety guardrails (toxicity, PII) don't catch this. This is a **grounding** problem, and it needs a grounding gate.

## What it does

Given a set of evidence `records` and a `rules` preset, Evidence Gate returns:

```js
{
  status: "available" | "quality_warning" | "fallback" | "missing",
  freshness: "fresh" | "stale" | "unknown",
  allowedActions: { summarize, compare, /* + your forbidden actions forced to false */ },
  warnings: [{ level, code, message }],
  caveats: ["...strings ready to inject into the prompt..."]
}
```

- **`status`** — is there enough authoritative evidence to speak from at all?
- **`allowedActions`** — gate generation on `summarize`; forbidden actions (e.g. `personalized_advice`, `diagnose`, `claim_realtime`) are always `false`.
- **`caveats`** — drop straight into your system prompt so the model self-limits.
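
From Python the same result arrives as a plain dict, with `allowed_actions` spelled in snake_case. The warning codes `core.py` can emit are:

- `primary_missing` (level `block`)
- `primary_fallback`, `primary_quality` and `primary_stale` (level `review`)
- `no_supporting` (level `info`)

The two helpers below are a hypothetical way to consume that dict:

- `may` treats any action the gate never mentioned as forbidden. This is a default-deny choice made here, not library behaviour.
- `codes_by_level` groups warning codes so an app can hard-stop on `block` and surface `review` items.

The `gate` literal is hand-built to match what `core.py` should return for two records dated 200 days ago under `FINANCE`.

```python
from typing import Any, Dict, List


def may(gate: Dict[str, Any], action: str) -> bool:
    """Default-deny: an action missing from allowed_actions counts as not allowed."""
    return bool(gate["allowed_actions"].get(action, False))


def codes_by_level(gate: Dict[str, Any]) -> Dict[str, List[str]]:
    """Group warning codes by level, keeping the order the gate emitted them."""
    grouped: Dict[str, List[str]] = {}
    for warning in gate["warnings"]:
        grouped.setdefault(warning["level"], []).append(warning["code"])
    return grouped


# Hand-built stand-in: two records dated 200 days ago under FINANCE (messages elided).
gate = {
    "status": "quality_warning",
    "freshness": "stale",
    "allowed_actions": {"summarize": True, "compare": True,
                        "personalized_advice": False, "claim_realtime": False},
    "warnings": [
        {"level": "review", "code": "primary_quality", "message": "..."},
        {"level": "review", "code": "primary_stale", "message": "..."},
        {"level": "info", "code": "no_supporting", "message": "..."},
    ],
    "caveats": ["...", "...", "..."],
}

print(may(gate, "summarize"), may(gate, "claim_realtime"), may(gate, "translate"))
print(codes_by_level(gate))
```
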

It is fully **domain-agnostic**. The same engine works for finance, healthcare, support, legal — a domain is just a preset.
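
As an illustration of a domain being just a preset, the sketch below copies the numeric thresholds from the shipped `presets.py`. It then applies only two of the checks `classify_status` makes: record count against `min_records`, and the lowest `quality_score` against `quality_threshold`.

It deliberately ignores flags, `quality: "review"`, fallback tiers and freshness. It is therefore a simplified mirror rather than a substitute for the engine. `passes_minimums` is a hypothetical name.

```python
from typing import Any, Dict, List, Optional

# Numeric thresholds copied from evidence_gate/presets.py (labels and forbidden_actions omitted).
PRESETS: Dict[str, Dict[str, Any]] = {
    "FINANCE": {"stale_days": 135, "min_records": 4, "quality_threshold": 70},
    "HEALTH": {"stale_days": 90, "min_records": 2, "quality_threshold": 80},
    "SUPPORT": {"stale_days": 365, "min_records": 1, "quality_threshold": 50},
}


def passes_minimums(scores: List[Optional[float]], rules: Dict[str, Any]) -> bool:
    """Simplified mirror of two classify_status checks: record count and lowest score.

    Each entry is one record's quality_score; None marks an unscored record, which
    (as in classify_status) counts toward min_records but not toward the minimum.
    """
    if len(scores) < rules["min_records"]:
        return False
    known = [s for s in scores if s is not None]
    return not known or min(known) >= rules["quality_threshold"]


scores = [75, 90]  # the same two records, judged under three domains
for name, rules in PRESETS.items():
    print(name, passes_minimums(scores, rules))
```

The identical evidence gets three different verdicts:

- It fails `FINANCE` on count (2 < 4).
- It fails `HEALTH` on quality (75 < 80).
- It passes `SUPPORT`.

Only the ruleset changed.
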

## Install

```bash
npm install evidence-gate
# or
pip install evidence-gate
```

## Records

Each record is one observation/period you have evidence for:

```js
{
  date: "2026-03-31",     // ISO date of the observation
  qualityScore: 92,       // optional 0-100
  quality: "clean",       // optional "clean" | "review"
  flags: ["RESTATED"],    // optional data-quality flags
  tier: "primary"         // optional "primary" (default) | "fallback" (cached/secondary)
}
```
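
The record above uses the JavaScript package's camelCase keys. The Python engine reads snake_case instead:

- Its module docstring lists `record = {"date", "quality_score"?, ...}`.
- `classify_status` looks up `r.get("quality_score")`.

As a result, a record carrying `qualityScore` is silently treated as unscored, and a camelCase preset can raise `KeyError` on `rules["stale_days"]`.

Below is a minimal converter for top-level keys. `snake_keys` is a hypothetical helper, not part of the package.

```python
import re
from typing import Any, Dict

# Zero-width match before every uppercase letter except at the start of the key.
_CAMEL_BOUNDARY = re.compile(r"(?<!^)(?=[A-Z])")


def snake_keys(obj: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy with top-level camelCase keys renamed to snake_case.

    Values (e.g. a flags list) are passed through untouched.
    """
    return {_CAMEL_BOUNDARY.sub("_", key).lower(): value for key, value in obj.items()}


# A record written in the README's JavaScript spelling.
js_record = {"date": "2026-03-31", "qualityScore": 92, "quality": "clean",
             "flags": ["RESTATED"], "tier": "primary"}
py_record = snake_keys(js_record)
print(py_record["quality_score"])  # 92

# The same helper converts camelCase preset keys.
print(snake_keys({"staleDays": 135, "minRecords": 4, "qualityThreshold": 70}))
```
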

## Presets

A preset is a ruleset. Adding a vertical = copying a preset, never touching the core.

```js
export const FINANCE = {
  primaryLabel: "financial statements",
  staleDays: 135,
  minRecords: 4,
  qualityThreshold: 70,
  forbiddenActions: ["personalized_advice", "claim_realtime"],
};
```

Ships with `FINANCE`, `HEALTH`, and `SUPPORT` examples. Override any message via `rules.messages`.
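
In Python the overrides live under a `messages` key of the rules dict, keyed by warning code. `evidence_gate` looks up each of `primary_missing`, `primary_fallback`, `primary_quality`, `primary_stale` and `no_supporting` with `m.get(code, default)`. A misspelled key therefore falls back to the default text without any error, so a small wrapper that rejects unknown codes may be worth having.

The sketch below builds a new preset from an existing one without mutating it. `with_messages` is hypothetical. The `FINANCE` literal is copied from `presets.py` so the block runs standalone.

```python
from typing import Any, Dict

# Codes whose text evidence_gate() looks up in rules["messages"].
MESSAGE_CODES = {"primary_missing", "primary_fallback", "primary_quality",
                 "primary_stale", "no_supporting"}

# Copied from evidence_gate/presets.py so the block runs on its own.
FINANCE: Dict[str, Any] = {
    "primary_label": "financial statements",
    "stale_days": 135,
    "min_records": 4,
    "quality_threshold": 70,
    "forbidden_actions": ["personalized_advice", "claim_realtime"],
}


def with_messages(preset: Dict[str, Any], **messages: str) -> Dict[str, Any]:
    """Return a shallow copy of `preset` with extra message overrides.

    Unknown codes are rejected, because evidence_gate() would otherwise ignore
    them and use its default wording without any error.
    """
    unknown = set(messages) - MESSAGE_CODES
    if unknown:
        raise ValueError(f"unknown message code(s): {sorted(unknown)}")
    merged = {**preset.get("messages", {}), **messages}
    return {**preset, "messages": merged}


ANALYST = with_messages(
    FINANCE,
    primary_missing="I don't have filings for this company yet, so I won't estimate figures.",
)
print(ANALYST["messages"]["primary_missing"])
print("messages" in FINANCE)  # False: the source preset is left untouched
```
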

## Use it as an MCP server

Give an agent a fact-checker it calls before it speaks. The package ships an MCP server exposing a `check_evidence` tool, so any MCP-compatible agent (Claude, IDE assistants, etc.) can gate itself.

```bash
npm install evidence-gate @modelcontextprotocol/sdk  # the SDK is an optional peer
```

Register it with your client:

```json
{
  "mcpServers": {
    "evidence-gate": { "command": "npx", "args": ["evidence-gate-mcp"] }
  }
}
```

The tool accepts `{ records, supporting?, preset?, rules? }` and returns the gate result. Instruct your agent to call it first and refuse to answer when `allowedActions.summarize` is `false`.
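
This sdist's file list contains only `__init__.py`, `core.py` and `presets.py` under `evidence_gate/`. The MCP server described here therefore ships with the npm package, not with this Python distribution.

To expose the Python engine behind a similar tool, one step is turning the tool's `{ records, supporting?, preset?, rules? }` arguments into an `evidence_gate(records=..., supporting=..., rules=...)` call. The sketch below rests on assumptions that are not documented behaviour of the npm server:

- `preset` names one of the shipped presets, matched case-insensitively.
- Any `rules` fields override that preset's values.
- The name `resolve_call` is hypothetical.

```python
import json
from typing import Any, Dict

# FINANCE and HEALTH copied from evidence_gate/presets.py; SUPPORT omitted for brevity.
PRESETS: Dict[str, Dict[str, Any]] = {
    "FINANCE": {"primary_label": "financial statements", "stale_days": 135,
                "min_records": 4, "quality_threshold": 70,
                "forbidden_actions": ["personalized_advice", "claim_realtime"]},
    "HEALTH": {"primary_label": "lab results", "stale_days": 90,
               "min_records": 2, "quality_threshold": 80,
               "forbidden_actions": ["diagnose", "prescribe", "claim_realtime"]},
}


def resolve_call(arguments_json: str) -> Dict[str, Any]:
    """Turn check_evidence-style tool arguments into evidence_gate() keyword arguments.

    Assumptions: preset names match case-insensitively, and fields in `rules`
    override the named preset's values.
    """
    args = json.loads(arguments_json)
    base: Dict[str, Any] = {}
    if "preset" in args:
        name = str(args["preset"]).upper()
        if name not in PRESETS:
            raise ValueError(f"unknown preset: {args['preset']!r}")
        base = PRESETS[name]
    rules = {**base, **args.get("rules", {})}  # new dict; PRESETS is not mutated
    if not rules:
        # evidence_gate() itself raises ValueError when rules are missing.
        raise ValueError("either `preset` or `rules` is required")
    return {"records": args.get("records", []),
            "supporting": args.get("supporting", []),
            "rules": rules}


call = resolve_call('{"records": [{"date": "2026-03-31"}], "preset": "health", '
                    '"rules": {"stale_days": 30}}')
print(call["rules"]["stale_days"], call["rules"]["min_records"])  # 30 2
```

With the Python package installed, `evidence_gate(**call)` would then produce the gate result to return from the tool.
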

## Why trust it

The engine is intentionally small and pure — no network, no dependencies, all logic unit-tested. It runs in production today inside a financial-data application, gating an LLM over real, messy, sometimes-missing filings.

## License

MIT

evidence_gate_py-0.1.0/evidence_gate/__init__.py
@@ -0,0 +1,21 @@
"""Evidence Gate — stop your AI from making up facts about your own data."""
from .core import (
    evidence_gate,
    classify_status,
    derive_allowed_actions,
    parse_date,
    days_since,
    freshness_label,
)
from . import presets

__all__ = [
    "evidence_gate",
    "classify_status",
    "derive_allowed_actions",
    "parse_date",
    "days_since",
    "freshness_label",
    "presets",
]
__version__ = "0.1.0"

evidence_gate_py-0.1.0/evidence_gate/core.py
@@ -0,0 +1,123 @@
"""Evidence Gate — core engine (Python).

Decide whether an LLM may speak about a set of records, BEFORE it generates
anything — based on evidence coverage, freshness, and quality.

No dependencies. Domain-agnostic: bring records + a ruleset (preset).

record = {"date", "quality_score"?, "quality"?, "flags"?, "tier"?}
"""
from datetime import datetime, date


def parse_date(value):
    if not value:
        return None
    try:
        return datetime.strptime(str(value)[:10], "%Y-%m-%d").date()
    except ValueError:
        return None


def days_since(d):
    if not d:
        return None
    return (date.today() - d).days


def freshness_label(latest, threshold_days):
    age = days_since(latest)
    if age is None:
        return "unknown"
    return "stale" if age > threshold_days else "fresh"


def classify_status(records, rules):
    """records[] + rules -> status of the primary evidence group.

    rules: {"stale_days", "min_records", "quality_threshold"}
    status: "available" | "quality_warning" | "fallback" | "missing"
    """
    usable = records or []
    if not usable:
        return {"status": "missing", "freshness": "unknown", "latest": None,
                "count": 0, "quality_min": None, "flags": []}

    dates = [d for d in (parse_date(r.get("date")) for r in usable) if d]
    latest = max(dates) if dates else None
    freshness = freshness_label(latest, rules["stale_days"]) if latest else "unknown"
    latest_str = latest.isoformat() if latest else None

    if all(r.get("tier") == "fallback" for r in usable):
        return {"status": "fallback", "freshness": freshness, "latest": latest_str,
                "count": len(usable), "quality_min": None, "flags": []}

    scores = [r["quality_score"] for r in usable if r.get("quality_score") is not None]
    quality_min = min(scores) if scores else None
    flags = []
    for r in usable:
        wf = r.get("flags")
        if isinstance(wf, list):
            flags.extend(wf)
        elif isinstance(wf, str) and wf:
            flags.append(wf)
    flags = sorted(set(flags))

    review_quality = any(r.get("quality") == "review" for r in usable)
    has_quality_issue = (
        (quality_min is not None and quality_min < rules["quality_threshold"])
        or bool(flags)
        or review_quality
    )

    if len(usable) < rules["min_records"]:
        status = "quality_warning"
    elif has_quality_issue:
        status = "quality_warning"
    else:
        status = "available"

    return {"status": status, "freshness": freshness, "latest": latest_str,
            "count": len(usable), "quality_min": quality_min, "flags": flags}


def derive_allowed_actions(primary_status, supporting_present=False, forbidden_actions=None):
    summarize = primary_status in ("available", "quality_warning", "fallback") or supporting_present
    compare = primary_status in ("available", "quality_warning")
    actions = {"summarize": summarize, "compare": compare}
    for a in (forbidden_actions or []):
        actions[a] = False
    return actions


def evidence_gate(records=None, supporting=None, rules=None):
    """records + rules -> {status, freshness, allowed_actions, warnings, caveats}."""
    if not rules:
        raise ValueError("evidence_gate: `rules` (a preset) is required")

    primary = classify_status(records or [], rules)
    supporting_present = bool(supporting or [])
    allowed = derive_allowed_actions(primary["status"], supporting_present, rules.get("forbidden_actions"))

    warnings = []
    m = rules.get("messages", {})
    label = rules.get("primary_label", "primary data")
    st = primary["status"]
    if st == "missing":
        warnings.append({"level": "block", "code": "primary_missing",
                         "message": m.get("primary_missing", f"No {label} available — the AI must not invent numbers.")})
    elif st == "fallback":
        warnings.append({"level": "review", "code": "primary_fallback",
                         "message": m.get("primary_fallback", f"{label} is cached/fallback, not authoritative — the AI must say so.")})
    elif st == "quality_warning":
        warnings.append({"level": "review", "code": "primary_quality",
                         "message": m.get("primary_quality", f"{label} has a data-quality warning — the AI must add a caveat.")})
    if primary["freshness"] == "stale":
        warnings.append({"level": "review", "code": "primary_stale",
                         "message": m.get("primary_stale", f'{label} is stale (older than {rules["stale_days"]} days) — the AI must not say "latest" or "today".')})
    if not supporting_present:
        warnings.append({"level": "info", "code": "no_supporting",
                         "message": m.get("no_supporting", "No supporting evidence — primary source only.")})

    return {"status": st, "freshness": primary["freshness"], "allowed_actions": allowed,
            "warnings": warnings, "caveats": [w["message"] for w in warnings]}
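
`days_since`, and therefore `freshness_label`, read `date.today()`. The tests accordingly build dates relative to today via `iso(days_ago)`.

Because the comparison is `age > threshold_days`, a record exactly `stale_days` old still counts as fresh. The sketch below is a hypothetical variant (`freshness_at` is not in the package) that takes the reference date as a parameter. That makes the boundary easy to pin down in a test.

```python
from datetime import date, timedelta
from typing import Optional


def freshness_at(latest: Optional[date], threshold_days: int, today: date) -> str:
    """Same rule as core.freshness_label, with the reference date passed in."""
    if latest is None:
        return "unknown"
    age = (today - latest).days
    # Strictly greater than: a record exactly threshold_days old is still fresh.
    return "stale" if age > threshold_days else "fresh"


today = date(2026, 6, 30)
for age in (134, 135, 136):
    print(age, freshness_at(today - timedelta(days=age), 135, today))
```

Inside the package, the equivalent change would mean threading a `today` argument through `days_since` and `freshness_label`. The current signatures do not accept one.
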

evidence_gate_py-0.1.0/evidence_gate/presets.py
@@ -0,0 +1,25 @@
"""Evidence Gate — example presets. A vertical is just a ruleset."""

FINANCE = {
    "primary_label": "financial statements",
    "stale_days": 135,
    "min_records": 4,
    "quality_threshold": 70,
    "forbidden_actions": ["personalized_advice", "claim_realtime"],
}

HEALTH = {
    "primary_label": "lab results",
    "stale_days": 90,
    "min_records": 2,
    "quality_threshold": 80,
    "forbidden_actions": ["diagnose", "prescribe", "claim_realtime"],
}

SUPPORT = {
    "primary_label": "knowledge-base documents",
    "stale_days": 365,
    "min_records": 1,
    "quality_threshold": 50,
    "forbidden_actions": ["make_promises", "claim_realtime"],
}
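
`classify_status` indexes `rules["stale_days"]`, `rules["min_records"]` and `rules["quality_threshold"]` directly, and `evidence_gate` only checks that `rules` is truthy. A hand-written preset with a missing or misspelled key therefore fails with a `KeyError` partway through, and only on the code path that needs that key.

A hypothetical up-front check might look like the block below. `validate_preset` is not part of the package, and the `LEGAL` preset is an invented example vertical with made-up thresholds.

```python
from typing import Any, Dict, List

REQUIRED_NUMERIC = ("stale_days", "min_records", "quality_threshold")


def validate_preset(rules: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the preset looks usable."""
    problems = []
    for key in REQUIRED_NUMERIC:
        value = rules.get(key)
        # bool is a subclass of int, so rule it out explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"{key} must be a number, got {value!r}")
    forbidden = rules.get("forbidden_actions", [])
    if not isinstance(forbidden, list) or not all(isinstance(a, str) for a in forbidden):
        problems.append("forbidden_actions must be a list of strings")
    return problems


LEGAL = {  # hypothetical vertical, not one of the shipped presets
    "primary_label": "contract documents",
    "stale_days": 180,
    "min_records": 1,
    "quality_threshold": 75,
    "forbidden_actions": ["give_legal_advice", "claim_realtime"],
}
print(validate_preset(LEGAL))               # []
print(validate_preset({"staleDays": 135}))  # three problems: all required keys missing
```
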

evidence_gate_py-0.1.0/pyproject.toml
@@ -0,0 +1,19 @@
[project]
name = "evidence-gate-py"
version = "0.1.0"
description = "Stop your AI from making up facts about your own data."
readme = "README.md"
requires-python = ">=3.8"
license = { text = "MIT" }
keywords = ["llm", "rag", "ai", "agents", "guardrails", "hallucination", "grounding", "reliability"]
authors = [{ name = "LopezDray" }]

[project.urls]
Homepage = "https://github.com/LopezDray/evidence-gate"

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[tool.hatch.build.targets.wheel]
packages = ["evidence_gate"]

evidence_gate_py-0.1.0/tests/test_core.py
@@ -0,0 +1,53 @@
"""Evidence Gate — core tests: python -m pytest (or: python tests/test_core.py)"""
import os
import sys
from datetime import date, timedelta

sys.path.insert(0, os.path.join(os.path.dirname(__file__), ".."))
from evidence_gate.core import classify_status, derive_allowed_actions, evidence_gate
from evidence_gate.presets import FINANCE, HEALTH


def iso(days_ago):
    return (date.today() - timedelta(days=days_ago)).isoformat()


def rec(q, days_ago, **extra):
    return {"date": iso(days_ago), "quality_score": q, **extra}


def test_classify():
    assert classify_status([rec(95, 5)] * 4, FINANCE)["status"] == "available"
    assert classify_status([rec(95, 5)] * 2, FINANCE)["status"] == "quality_warning"
    assert classify_status([rec(95, 5), rec(95, 5), rec(95, 5), rec(40, 5)], FINANCE)["status"] == "quality_warning"
    assert classify_status([], FINANCE)["status"] == "missing"
    assert classify_status([rec(95, 5, tier="fallback")] * 2, FINANCE)["status"] == "fallback"
    assert classify_status([rec(95, 200)] * 4, FINANCE)["freshness"] == "stale"


def test_allowed_actions():
    a = derive_allowed_actions("available", forbidden_actions=["personalized_advice"])
    assert a["summarize"] and a["compare"] and a["personalized_advice"] is False
    assert derive_allowed_actions("missing")["summarize"] is False
    assert derive_allowed_actions("missing", supporting_present=True)["summarize"] is True


def test_date_validation():
    from evidence_gate.core import parse_date
    assert parse_date(None) is None
    assert parse_date("not-a-date") is None
    assert parse_date("2026-13-01") is None  # impossible month
    assert parse_date("2026-02-31") is None  # impossible day
    assert parse_date("2024-02-29") is not None  # valid leap day


def test_domain_swap():
    g = evidence_gate(records=[rec(95, 5), rec(95, 5)], rules=HEALTH)
    assert g["status"] == "available"
    assert g["allowed_actions"]["diagnose"] is False
    assert g["allowed_actions"]["prescribe"] is False


if __name__ == "__main__":
    test_classify(); test_allowed_actions(); test_date_validation(); test_domain_swap()
    print("all tests passed")
|