decision-provenance 1.0.0 (tar.gz)

@@ -0,0 +1,276 @@
Metadata-Version: 2.4
Name: decision-provenance
Version: 1.0.0
Summary: Tamper-evident audit logging for ML inference pipelines.
Author-email: Hitesh Srivastava <srivastavahitesh09@gmail.com>
License: MIT
Project-URL: Homepage, https://github.com/hitcaff/decision_provenance
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Provides-Extra: ipfs
Requires-Dist: requests>=2.28; extra == "ipfs"
Provides-Extra: evm
Requires-Dist: web3>=6.0; extra == "evm"
Provides-Extra: api
Requires-Dist: fastapi>=0.100; extra == "api"
Requires-Dist: uvicorn>=0.22; extra == "api"
Requires-Dist: pydantic>=2.0; extra == "api"
Provides-Extra: all
Requires-Dist: requests>=2.28; extra == "all"
Requires-Dist: web3>=6.0; extra == "all"
Requires-Dist: fastapi>=0.100; extra == "all"
Requires-Dist: uvicorn>=0.22; extra == "all"
Requires-Dist: pydantic>=2.0; extra == "all"
Dynamic: requires-python

# decision-provenance

Tamper-evident audit logging for any ML inference pipeline.

Designed to support **EU AI Act** compliance for high-risk AI systems, in particular the record-keeping (Article 12) and transparency (Article 13) obligations.

---

## What it solves

When a loan is denied, a résumé is rejected, or a fraud flag fires, there is typically no standard way to prove:
- Which exact model version made the call
- What features it saw
- What threshold was in effect at that moment
- Whether any of those records have been altered since

This library makes every logged decision cryptographically tamper-evident using a hash chain in a local SQLite database. It does not require blockchain infrastructure; external anchoring is optional.

---

## Architecture

Three independent components share one SQLite database:

```
LabelRegistry  → stable label IDs (L001, L002, ...)
                 "approved" can be renamed; L001 never changes

ConfigChain    → versioned threshold records
                 threshold 0.55 → 0.65 is a new ConfigRecord, not a mutation
                 every change requires a mandatory change_reason

MerkleChain    → decision records (a linear hash chain)
                 each append computes SHA-256(prev_root ∥ record_hash)
                 prev_root is assigned inside the write lock (concurrency safe)
                 any mutation breaks every subsequent root
```

**What is in the decision hash:**
`model_id + model_version + model_hash + input_hash + output_hash + label_id + config_id + timestamp`

**What is deliberately NOT in the decision hash:**
- `label_display`: a display string that can be renamed without affecting the decision
- `threshold`: lives in ConfigChain and is referenced by `config_id`
- `runtime_env`: informational only

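A small sketch can show the split between hashed and unhashed fields. It is not the library's code. `SketchLabelRegistry`, `decision_hash`, the `C001` config ID and the canonical-JSON serialisation (sorted keys, compact separators) are assumptions made for illustration, and `record.py` may encode fields differently. Renaming a label changes only its display string, so the decision hash stays the same. Changing `label_id` itself produces a different hash.

```python
import hashlib
import json


class SketchLabelRegistry:
    """Hypothetical stand-in for LabelRegistry: stable IDs, renameable display strings."""

    def __init__(self) -> None:
        self._display: dict[str, str] = {}  # label_id -> current display string

    def register(self, display: str) -> str:
        # Return the existing ID if this display string is already registered
        for label_id, name in self._display.items():
            if name == display:
                return label_id
        label_id = f"L{len(self._display) + 1:03d}"  # L001, L002, ...
        self._display[label_id] = display
        return label_id

    def rename(self, label_id: str, new_display: str) -> None:
        # Only the display string changes; the ID is never reassigned
        self._display[label_id] = new_display

    def display(self, label_id: str) -> str:
        return self._display[label_id]


# The fields listed above as "in the decision hash"
HASHED_FIELDS = ("model_id", "model_version", "model_hash", "input_hash",
                 "output_hash", "label_id", "config_id", "timestamp")


def decision_hash(record: dict) -> str:
    # Canonical JSON (sorted keys, no whitespace) over the hashed fields only;
    # label_display, threshold and runtime_env are ignored even if present
    payload = {field: record[field] for field in HASHED_FIELDS}
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


registry = SketchLabelRegistry()
approved = registry.register("approved")
record = {
    "model_id": "loan_scorer", "model_version": "2.3.1", "model_hash": "ab" * 32,
    "input_hash": "cd" * 32, "output_hash": "ef" * 32, "label_id": approved,
    "config_id": "C001", "timestamp": "2025-01-01T00:00:00Z",
    "label_display": registry.display(approved),
}
hash_before = decision_hash(record)

registry.rename(approved, "accepted")  # rename the display string only
record["label_display"] = registry.display(approved)
hash_after = decision_hash(record)  # equal to hash_before
```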
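---

## Install

```bash
# From the package registry
pip install decision-provenance

# Or from source
git clone <repo>
cd decision_provenance
pip install -e .

# Optional extras (declared in the package metadata)
pip install "decision-provenance[ipfs]"   # IPFS per-record anchoring (requests)
pip install "decision-provenance[evm]"    # EVM periodic chain-root anchoring (web3)
pip install "decision-provenance[api]"    # HTTP microservice wrapper (fastapi, uvicorn, pydantic)
pip install "decision-provenance[all]"    # all of the above
```
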
---

## Quick start

```python
from decision_provenance import ProvenanceLogger

logger = ProvenanceLogger(
    model_id="loan_scorer",
    model_version="2.3.1",
    db_path="provenance.db",
    # Drops direct identifiers (name, ssn, email) from the feature dict
    anonymise_fn=lambda f: {k: v for k, v in f.items()
                            if k not in ("name", "ssn", "email")},
)

# Register the threshold config; changed_by and change_reason form the audit trail
logger.set_config(
    threshold=0.6,
    above_label="approved",
    below_label="denied",
    changed_by="data_team",
    change_reason="initial production deployment",
)


def my_model(features: dict) -> dict:
    # Stand-in for your real model: it must return the "score" key read by score_fn
    return {"score": 0.72}


# Wrap your model with one decorator; score_fn pulls the score out of the model output
@logger.log(score_fn=lambda out: out["score"])
def predict(features: dict) -> dict:
    return my_model(features)  # unchanged


# Use normally: provenance is logged automatically
result = predict({"income": 95_000, "credit_score": 740, "debt_ratio": 0.28})
```

---

## Threshold changes

Every threshold change creates a **new ConfigRecord** instead of mutating the existing one, and it must include a `change_reason`:

```python
logger.set_config(
    threshold=0.65,
    above_label="approved",
    below_label="denied",
    changed_by="risk_committee",
    change_reason="Q3 risk review: reduce default rate",
)
```

The EU AI Act export includes the full config history. Every decision also records the `config_id` that was active when it was made, so an auditor can reconstruct which threshold applied to any record.

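The following minimal sketch shows that append-only pattern and how an auditor maps a decision's `config_id` back to its threshold. `SketchConfigChain`, `SketchConfigRecord`, the `prev_hash` link and the `C001`-style IDs are hypothetical; the real `ConfigChain` in `config_record.py` may store and link records differently. Changing the threshold appends a second record. The first record, which older decisions still reference, is left untouched.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SketchConfigRecord:
    config_id: str
    threshold: float
    above_label: str
    below_label: str
    changed_by: str
    change_reason: str
    prev_hash: str  # digest of the previous config record ("" for the first)

    def digest(self) -> str:
        canonical = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class SketchConfigChain:
    """Hypothetical append-only config chain; earlier records are never edited."""

    def __init__(self) -> None:
        self.records: list[SketchConfigRecord] = []

    def set_config(self, threshold: float, above_label: str, below_label: str,
                   changed_by: str, change_reason: str) -> SketchConfigRecord:
        if not change_reason.strip():
            raise ValueError("change_reason is mandatory")
        prev_hash = self.records[-1].digest() if self.records else ""
        record = SketchConfigRecord(
            config_id=f"C{len(self.records) + 1:03d}",
            threshold=threshold, above_label=above_label, below_label=below_label,
            changed_by=changed_by, change_reason=change_reason, prev_hash=prev_hash,
        )
        self.records.append(record)
        return record

    def resolve(self, config_id: str) -> SketchConfigRecord:
        # Map a decision's config_id back to the config that was active for it
        for record in self.records:
            if record.config_id == config_id:
                return record
        raise KeyError(config_id)


chain = SketchConfigChain()
first = chain.set_config(0.6, "approved", "denied", "data_team",
                         "initial production deployment")
second = chain.set_config(0.65, "approved", "denied", "risk_committee",
                          "Q3 risk review: reduce default rate")
```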
---

## Verification

```python
ok, message = logger.verify()
# True  → "Chain intact — 1247 records, root=a3f8..."
# False → "Root mismatch at seq=43: computed=... != stored=..."
```

`verify()` re-walks the full chain from genesis and recomputes every root; no external service is required. Local verification cannot, however, detect an attacker who rewrites the database and recomputes the entire chain. External anchoring covers that case.

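Re-walking the chain means recomputing each root from its predecessor and comparing it with the stored value. The sketch below is illustrative only. The in-memory `rows` list stands in for the SQLite table, `GENESIS_ROOT` and the hex-string concatenation are assumed encodings, and `append`/`verify` are hypothetical helpers rather than `MerkleChain` methods. Editing the second record makes verification fail at `seq=2`.

```python
import hashlib

GENESIS_ROOT = "0" * 64  # assumed starting value for prev_root


def record_hash(data: str) -> str:
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def next_root(prev_root: str, rec_hash: str) -> str:
    # root_n = SHA-256(root_{n-1} || record_hash_n), hex strings concatenated
    return hashlib.sha256((prev_root + rec_hash).encode("ascii")).hexdigest()


def append(rows: list[dict], data: str) -> None:
    prev_root = rows[-1]["root"] if rows else GENESIS_ROOT
    rows.append({"seq": len(rows) + 1, "data": data,
                 "root": next_root(prev_root, record_hash(data))})


def verify(rows: list[dict]) -> tuple[bool, str]:
    # Walk from genesis, recomputing every record hash and root
    prev_root = GENESIS_ROOT
    for row in rows:
        computed = next_root(prev_root, record_hash(row["data"]))
        if computed != row["root"]:
            return False, f"Root mismatch at seq={row['seq']}"
        prev_root = computed
    return True, f"Chain intact, {len(rows)} records"


rows: list[dict] = []
for data in ("decision-1", "decision-2", "decision-3"):
    append(rows, data)
intact = verify(rows)

tampered = [dict(row) for row in rows]
tampered[1]["data"] = "decision-2 (edited)"  # mutate one stored record
broken = verify(tampered)
```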
---

## Export

```python
# Full JSONL audit log
logger.export_audit_log("audit_log.jsonl")

# EU AI Act Article 13 compliance report
report = logger.export_eu_ai_act("compliance_report.json")
# Includes: label_registry, config_history, decision_distribution, chain_integrity
```

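Because the audit log is plain JSONL, it can be checked with nothing but the standard library. The following minimal sketch tallies decisions per label. It assumes each line is one JSON object carrying a `label_id` field. The exact export layout is not documented here, so `decision_distribution` is a hypothetical helper, and the field name should be checked against a real `audit_log.jsonl`.

```python
import json
import os
import tempfile
from collections import Counter


def decision_distribution(jsonl_path: str) -> dict[str, int]:
    """Count decisions per label_id in a JSONL audit log (one JSON object per line)."""
    counts: Counter[str] = Counter()
    with open(jsonl_path, encoding="utf-8") as fh:
        for line in fh:
            if line.strip():  # skip blank lines
                counts[json.loads(line)["label_id"]] += 1
    return dict(counts)


# Usage against a tiny hand-written log
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "audit_log.jsonl")
    with open(path, "w", encoding="utf-8") as fh:
        for label_id in ("L001", "L002", "L001"):
            fh.write(json.dumps({"label_id": label_id}) + "\n")
    distribution = decision_distribution(path)
```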
---

## On-chain anchoring (optional)

Local SQLite is tamper-evident, but anyone with write access to the database could rewrite it and recompute the whole chain. External anchoring adds **public** verifiability: the chain root is published outside your infrastructure, where it cannot be altered retroactively.

```python
import os

from decision_provenance import ProvenanceLogger

# Per-record IPFS anchor (closes the local-mutation window immediately)
logger = ProvenanceLogger(
    ...,  # other constructor arguments as in Quick start
    ipfs_anchor=True,
    pinata_jwt=os.environ["PINATA_JWT"],
)

# Periodic EVM anchor every 100 records (public, unforgeable timestamp)
logger = ProvenanceLogger(
    ...,  # other constructor arguments as in Quick start
    evm_anchor_every=100,
    evm_config={
        "private_key": os.environ["SIGNER_KEY"],
        "contract_address": "0x...",
        "rpc_url": "https://eth-mainnet.rpc.grove.city/v1/<app_id>",
    },
)
```

Deploy `contracts/ProvenanceRegistry.sol` once per organisation. Each EVM anchor call costs roughly 35,000 gas.

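Periodic anchoring leaves a window of records whose roots exist only in the local database. Publishing root *n* commits to every record up to *n*, so only the records after the last anchor are exposed. Below is a minimal sketch of an every-N trigger. `PeriodicAnchor` and the `publish` callback are hypothetical stand-ins for an IPFS pin or an EVM transaction, not the library's `anchor_root_evm`. With `every=100`, up to 99 records can sit between anchors, and roughly 35,000 gas per call amortises to about 350 gas per record. `every=1` corresponds to per-record anchoring with no unanchored window.

```python
from typing import Callable


class PeriodicAnchor:
    """Calls publish(seq, root) once every `every` appended records."""

    def __init__(self, every: int, publish: Callable[[int, str], None]) -> None:
        if every < 1:
            raise ValueError("every must be >= 1")
        self.every = every
        self.publish = publish
        self.last_anchored_seq = 0

    def on_append(self, seq: int, root: str) -> None:
        # seq is the 1-based position of the record that was just appended
        if seq % self.every == 0:
            self.publish(seq, root)
            self.last_anchored_seq = seq

    def unanchored(self, latest_seq: int) -> int:
        # Records whose roots so far exist only in the local database
        return latest_seq - self.last_anchored_seq


published: list[tuple[int, str]] = []
anchor = PeriodicAnchor(every=3, publish=lambda seq, root: published.append((seq, root)))
for seq in range(1, 8):
    anchor.on_append(seq, f"root{seq}")
```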
---

## FastAPI microservice

Requires the `api` extra (`fastapi`, `uvicorn`, `pydantic`):

```bash
python -m decision_provenance.api
```

```
POST /configure          initialise or reconfigure the logger
POST /record             log one decision
GET  /verify             verify chain integrity
GET  /record/{id}        fetch a single record
GET  /export/audit       download the JSONL audit log
GET  /export/eu_ai_act   download the compliance report
GET  /health             liveness check
```

---

## Concurrency

Thread-safe by design. `prev_root` is assigned inside a module-level write lock, so concurrent callers in the same process can never chain onto the same root. SQLite WAL mode lets readers proceed without blocking writers.

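The locking pattern can be sketched with the standard library alone. A process-wide `threading.Lock` turns "read the latest root, compute the next one, insert the row" into one atomic step. `PRAGMA journal_mode=WAL` lets readers continue while that write is in progress. The table layout and function names are hypothetical, not taken from `chain.py`. Note that a `threading.Lock` only serialises threads inside one process.

```python
import hashlib
import os
import sqlite3
import tempfile
import threading
from concurrent.futures import ThreadPoolExecutor

_WRITE_LOCK = threading.Lock()  # module-level: shared by every writer thread in this process
GENESIS_ROOT = "0" * 64        # assumed starting value for prev_root


def open_db(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path, check_same_thread=False)
    conn.execute("PRAGMA journal_mode=WAL")  # readers do not block the writer
    conn.execute(
        "CREATE TABLE IF NOT EXISTS chain ("
        "seq INTEGER PRIMARY KEY, record_hash TEXT NOT NULL, "
        "prev_root TEXT NOT NULL, root TEXT NOT NULL)"
    )
    conn.commit()
    return conn


def append(conn: sqlite3.Connection, record_hash: str) -> str:
    with _WRITE_LOCK:
        # Read the latest root and insert the next row while holding the lock,
        # so two threads can never chain onto the same prev_root
        row = conn.execute("SELECT root FROM chain ORDER BY seq DESC LIMIT 1").fetchone()
        prev_root = row[0] if row else GENESIS_ROOT
        root = hashlib.sha256((prev_root + record_hash).encode("ascii")).hexdigest()
        conn.execute(
            "INSERT INTO chain (record_hash, prev_root, root) VALUES (?, ?, ?)",
            (record_hash, prev_root, root),
        )
        conn.commit()
        return root


# Usage: 50 appends from 8 threads still form a single unbroken chain
with tempfile.TemporaryDirectory() as tmp:
    conn = open_db(os.path.join(tmp, "provenance.db"))
    with ThreadPoolExecutor(max_workers=8) as pool:
        list(pool.map(lambda i: append(conn, hashlib.sha256(str(i).encode()).hexdigest()),
                      range(50)))
    rows = conn.execute("SELECT prev_root, root FROM chain ORDER BY seq").fetchall()
    journal_mode = conn.execute("PRAGMA journal_mode").fetchone()[0]
    conn.close()
```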
---

## Threat model

| Threat | Protection |
|--------|------------|
| DB record mutation | Merkle chain: any change breaks all subsequent roots |
| Label string rename | Label registry: the hash uses the stable ID, not the display string |
| Threshold changed to cover tracks | ConfigChain: every change is a new record with a mandatory reason |
| Concurrent write corruption | Write lock + WAL mode |
| Careless attacker flips the label column | `label_id` is in the hash, so changing it breaks verification; `label_display` is not hashed but can be checked against the registry entry for its `label_id` |
| Determined attacker with DB write access | External anchor (IPFS/EVM): the root already exists outside the DB, so a rewritten and recomputed chain no longer matches it |
| Compromised model lying to the logger | Out of scope: requires an HSM plus model signing at training time |

---

## Test suite

```bash
python -m pytest tests/ -v
# 38 tests covering: hash determinism, label registry, config chain,
# Merkle chain, tamper detection, input validation, concurrency,
# EU AI Act export, threshold change audit trail
```

---

## EU AI Act relevance

Article 12 of the EU AI Act (record-keeping) requires high-risk AI systems to log events automatically over their lifetime. Article 13 (transparency) requires the instructions for use to explain how deployers can collect, store and interpret those logs. In practice this calls for:
- Logging with sufficient granularity to identify the cause of results
- Traceability of system operation
- Version control of the model

The `export_eu_ai_act()` output is structured for direct inclusion in conformity assessment documentation.

---

## File structure

```
decision_provenance/
    __init__.py              public API
    label_registry.py        stable label ID registry
    config_record.py         versioned threshold config chain
    record.py                canonical provenance record + hashing
    chain.py                 thread-safe Merkle chain (SQLite + WAL)
    logger.py                ProvenanceLogger: main entry point
    anchor.py                IPFS per-record + EVM periodic anchoring
    api.py                   FastAPI microservice wrapper
contracts/
    ProvenanceRegistry.sol   on-chain anchor registry
examples/
    loan_scorer_demo.py      full walkthrough
tests/
    test_all.py              38 tests, 100% pass
```
@@ -0,0 +1,19 @@
from .logger import ProvenanceLogger
from .record import ProvenanceRecord, build_record, ValidationError
from .chain import MerkleChain
from .config_record import ConfigChain, ConfigRecord
from .label_registry import LabelRegistry
from .anchor import anchor_record_ipfs, anchor_root_evm

__all__ = [
    "ProvenanceLogger",
    "ProvenanceRecord",
    "build_record",
    "ValidationError",
    "MerkleChain",
    "ConfigChain",
    "ConfigRecord",
    "LabelRegistry",
    "anchor_record_ipfs",
    "anchor_root_evm",
]