insightforge 0.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,236 @@
Metadata-Version: 2.4
Name: insightforge
Version: 0.1.0
Summary: A transparency engine for AI interactions.
Author: InsightForge
License: MIT
Project-URL: Homepage, https://github.com/your-org/InsightForge
Project-URL: Repository, https://github.com/your-org/InsightForge
Keywords: ai,observability,auditability,cli,transparency
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.11
Description-Content-Type: text/markdown

# InsightForge

AI without insight is blind ambition.

InsightForge is an open-source transparency engine for AI interactions. It wraps a model call or agent command, captures the prompt and execution context, scores the response for obvious risk signals, and renders a visual trace you can inspect instead of blindly trusting output.

This is not observability theater. It is the start of a forensic layer for AI:

- What did we ask?
- What came back?
- How confident should we be?
- Which parts look biased, weakly grounded, or potentially hallucinatory?
- What evidence would make this answer safer to trust?

## Why this exists

The current AI stack optimizes for output volume and benchmark swagger. What it usually does not give you is the "why" layer:

- Developers cannot easily debug why an agent drifted off task.
- Teams cannot prove traceability during audits.
- Users are asked to trust answers without seeing grounding quality.
- Compliance pressure is increasing faster than transparency tooling.

InsightForge is the neutral inspector that sits around model interactions and makes them legible.

## MVP Scope

This repository bootstraps the first local MVP:

- `insightforge wrap ...` runs any shell command that represents an AI interaction.
- `insightforge ask ...` talks to supported providers directly.
- `insightforge list` shows indexed traces from the local registry.
- `insightforge diff ...` compares two traces and renders a visual report.
- `insightforge schema-version` shows the current SQLite schema version.
- `insightforge migrate` upgrades local storage to the latest schema version.
- Evaluates production policies such as minimum confidence, stderr failures, source requirements, and blocked absolute language.
- Redacts common secrets and emails before traces are persisted.
- Stores trace metadata and payloads in SQLite for durable retrieval.
- Captures prompt, stdout, stderr, exit status, and basic provenance hints.
- Applies heuristic checks for weak grounding and overgeneralized language.
- Emits both a machine-readable JSON trace and a polished HTML "insight map".

It is intentionally simple. The point is to prove the workflow before building richer provider adapters, policy packs, and team-grade compliance pipelines.

## Quickstart

```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -e .
cp .insightforge.toml.example .insightforge.toml
insightforge wrap claude "Explain why this migration failed" --cmd "python3 -c 'print(\"Maybe the issue is a missing foreign key\")'"
```

The command generates:

- `traces/latest.json`
- `traces/latest.html`

Open the HTML file in a browser to inspect the visual trace.
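The JSON trace also supports scripted triage. The sketch below assumes the serialized trace mirrors the `TraceRecord` fields defined in the analyzer (`confidence_score`, `bias_flags`, `hallucination_flags`, `summary`); the exact JSON layout and the sample payload are assumptions, not the project's documented schema.

```python
import json

# Hypothetical trace payload; field names follow the TraceRecord dataclass
# used by the analyzer, but the serialized layout is an assumption.
sample = """
{
  "provider": "mock",
  "model_hint": "demo-model",
  "confidence_score": 0.62,
  "bias_flags": [{"code": "OVERGENERALIZATION", "severity": "medium"}],
  "hallucination_flags": [],
  "summary": "Language patterns correlate with weak grounding."
}
"""

trace = json.loads(sample)

MIN_CONFIDENCE = 0.85  # mirrors the default policy.min_confidence

# Surface traces that fall below the policy threshold or carry risk flags.
flags = trace["bias_flags"] + trace["hallucination_flags"]
if trace["confidence_score"] < MIN_CONFIDENCE or flags:
    print(f"REVIEW: {trace['model_hint']} scored {trace['confidence_score']}")
    for flag in flags:
        print(f"  - {flag['code']} ({flag['severity']})")
```

In practice you would `json.load` the `traces/latest.json` file instead of the inline sample.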

Provider-native flow:

```bash
insightforge ask mock demo-model "Why did the recommendation change?" \
  --system "Explain assumptions and mention missing evidence." \
  --out traces/mock-provider
```

Supported providers today:

- `mock` for local demos and tests
- `openai` via `OPENAI_API_KEY`
- `anthropic` via `ANTHROPIC_API_KEY`

The provider adapters use only the Python standard library so the project stays dependency-light.
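As an illustration of what a standard-library-only adapter can look like, the sketch below builds a chat request with `urllib.request` against the public OpenAI REST endpoint. This is not the project's actual adapter code; the endpoint and header shapes are the ones the public API documents, and the model name is a placeholder.

```python
import json
import urllib.request

OPENAI_URL = "https://api.openai.com/v1/chat/completions"  # public REST endpoint


def build_chat_request(api_key: str, model: str, prompt: str, system: str = "") -> urllib.request.Request:
    """Build a stdlib-only POST request; no third-party SDK required."""
    messages = []
    if system:
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": prompt})
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        OPENAI_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# The caller would pass the request to urllib.request.urlopen(...) and decode
# the JSON response; error handling and retries are omitted here.
req = build_chat_request("sk-test", "gpt-4o-mini", "Why did the recommendation change?")
```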

CLI install options for other developers:

```bash
pipx install git+https://github.com/<your-org>/InsightForge.git
```

Or from a published package:

```bash
pipx install insightforge
```

That gives developers a global `insightforge` command without needing a project-local virtualenv.

Users are automatically notified in the CLI when PyPI has a newer InsightForge release. They can also check manually with:

```bash
insightforge version --check-updates
```

Comparison workflow:

```bash
insightforge list
insightforge diff trace_id_one trace_id_two --out traces/compare.html
```

Every captured trace is indexed in `.insightforge/registry.json` so you can compare runs by path or by trace id.
The primary production store is SQLite at `.insightforge/traces.db`, with the JSON registry kept as a convenience export.

Storage maintenance:

```bash
insightforge schema-version --expected
insightforge migrate
```

CI/CD automation:

- `.github/workflows/ci.yml` runs compile, tests, and package builds on every push and pull request.
- `.github/workflows/release.yml` runs on `main`, re-tests the package, compares the local version to PyPI, and publishes only when the version is newer.
- Users then see the update prompt in the CLI and can upgrade with `pipx upgrade insightforge`.

Trusted Publishing setup:

1. Push this repository to GitHub.
2. In GitHub, keep Actions enabled for the repository.
3. In PyPI, add a Trusted Publisher for this project with:
   - Owner: your GitHub user or org
   - Repository: `InsightForge`
   - Workflow name: `release.yml`
   - Environment name: `pypi`
4. No `PYPI_API_TOKEN` secret is needed in GitHub; Trusted Publishing authenticates the workflow directly.
5. Bump the version in `pyproject.toml`, merge or push to `main`, and the release workflow will publish automatically if tests pass and the version is newer than the one on PyPI.

If you want a dry run before real publishing, point the same workflow at TestPyPI first and use a separate trusted publisher there.
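For orientation, a minimal `release.yml` compatible with the setup above might look roughly like this sketch. It is an illustrative assumption, not the repository's actual workflow: the `pypa/gh-action-pypi-publish` action and the `id-token: write` permission are the standard Trusted Publishing pieces, and the PyPI version-comparison step described above is omitted for brevity.

```yaml
name: release
on:
  push:
    branches: [main]

jobs:
  publish:
    runs-on: ubuntu-latest
    environment: pypi        # must match the environment name registered on PyPI
    permissions:
      id-token: write        # required for Trusted Publishing (OIDC)
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install build && python -m build
      - uses: pypa/gh-action-pypi-publish@release/v1
```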

Policy and redaction config:

```bash
cp .insightforge.toml.example .insightforge.toml
```

The checked-in defaults are intentionally strict for factual audit demos:

- `policy.min_confidence = 0.85`
- `policy.require_sources = true`
- `policy.fail_on_stderr = true`
- `policy.block_absolute_language = true`

Other knobs:

- `policy.max_output_chars`
- `redaction.enabled`
- `storage.sqlite_path`
- `updates.enabled`
- `updates.check_interval_hours`

## Example

```bash
insightforge wrap local-llm "Review this answer for risk" \
  --cmd "python3 -c 'print(\"This obviously always works\")'" \
  --out traces/risky-demo
```

Expected outcome:

- Confidence score drops.
- The report flags overgeneralized language.
- The insight map shows prompt, execution, heuristic analysis, and captured output.
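The confidence drop can be reproduced from the heuristics in `src/insightforge/analyzer.py`: the demo output contains two absolute-language markers, and the arithmetic below mirrors that file for this input (no hedges, no source signals, stdout present, no stderr).

```python
import re

# Same bias patterns as analyzer.BIAS_PATTERNS.
BIAS_PATTERNS = (r"\balways\b", r"\bnever\b", r"\bobviously\b", r"\beveryone\b", r"\bno one\b")

output = "This obviously always works"

# Same counting approach as analyzer._count_matches.
bias_hits = sum(len(re.findall(p, output, flags=re.IGNORECASE)) for p in BIAS_PATTERNS)

# Same scoring arithmetic as analyzer.build_trace; the hedge, source,
# stderr, and empty-output terms are all zero for this input.
confidence = 0.72
confidence -= min(0.20, bias_hits * 0.05)
confidence = max(0.05, min(0.99, round(confidence, 2)))

print(bias_hits, confidence)  # → 2 0.62
```

A score of 0.62 falls below the default `policy.min_confidence = 0.85`, which is why this demo trips the policy check.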

## Product Direction

The wedge is developer trust. The moat is structured forensic data.

Short term:

- Wrap local CLIs and SDK calls.
- Add richer provider-specific adapters for OpenAI, Anthropic, and local models.
- Expand bias, hallucination, and provenance checks.
- Ship a VS Code extension and diff view for prompt iterations.

Long term:

- Team dashboards for audit review.
- Queryable trace stores.
- Policy enforcement before outputs reach end users.
- Explainability primitives for enterprise compliance.

## 90-Day Build Narrative

Weeks 1-2:

- Ship crude but usable.
- Publish the manifesto and demo.
- Capture raw traces and get developers trying it immediately.

Weeks 3-4:

- Talk to users who already complain about hallucinations and opacity.
- Turn every legitimate complaint into a fast release.

Weeks 5-8:

- Add richer visual maps and editor integrations.
- Make "gotcha" demos trivially shareable.

Weeks 9-12:

- Launch a hosted audit workflow for teams.
- Use real usage, not pitch decks, to pull in platform partners.

## Current Constraints

This MVP does not capture hidden chain-of-thought or privileged model internals. It captures the observable trail around the interaction and turns that into a usable audit artifact. That distinction matters.

It also uses heuristic analysis for confidence and risk flags. Those signals are useful for audit triage, but they are not a claim of true model introspection.

If you want trustworthy AI systems, you need tooling that treats every answer as inspectable infrastructure.
@@ -0,0 +1,34 @@
[build-system]
requires = ["setuptools>=68"]
build-backend = "setuptools.build_meta"

[project]
name = "insightforge"
version = "0.1.0"
description = "A transparency engine for AI interactions."
readme = "README.md"
requires-python = ">=3.11"
license = { text = "MIT" }
authors = [
    { name = "InsightForge" }
]
urls = { Homepage = "https://github.com/your-org/InsightForge", Repository = "https://github.com/your-org/InsightForge" }
keywords = ["ai", "observability", "auditability", "cli", "transparency"]
classifiers = [
    "Development Status :: 3 - Alpha",
    "Environment :: Console",
    "Intended Audience :: Developers",
    "License :: OSI Approved :: MIT License",
    "Programming Language :: Python :: 3",
    "Programming Language :: Python :: 3.11",
    "Programming Language :: Python :: 3.12",
]

[project.scripts]
insightforge = "insightforge.cli:main"

[tool.setuptools]
package-dir = { "" = "src" }

[tool.setuptools.packages.find]
where = ["src"]
@@ -0,0 +1,4 @@
[egg_info]
tag_build =
tag_date = 0

@@ -0,0 +1,7 @@
"""InsightForge public package interface."""

from .analyzer import build_trace

__version__ = "0.1.0"

__all__ = ["__version__", "build_trace"]
@@ -0,0 +1,5 @@
from .cli import main


if __name__ == "__main__":
    raise SystemExit(main())
@@ -0,0 +1,172 @@
from __future__ import annotations

import re
from collections.abc import Sequence

from .models import RiskFlag, TraceNode, TraceRecord


HEDGE_PATTERNS = (
    r"\bmaybe\b",
    r"\bprobably\b",
    r"\bI think\b",
    r"\blikely\b",
    r"\bappears to\b",
    r"\bseems\b",
)

SOURCE_PATTERNS = (
    r"https?://",
    r"\bsource\b",
    r"\bcitation\b",
    r"\breference\b",
    r"\bdocumentation\b",
    r"\bresearch paper\b",
    r"\bstudy\b",
)

BIAS_PATTERNS = (
    r"\balways\b",
    r"\bnever\b",
    r"\bobviously\b",
    r"\beveryone\b",
    r"\bno one\b",
)


def _count_matches(patterns: Sequence[str], text: str) -> int:
    return sum(len(re.findall(pattern, text, flags=re.IGNORECASE)) for pattern in patterns)


def _build_summary(prompt: str, stdout: str, stderr: str, risk_count: int) -> str:
    if stderr.strip():
        return "The wrapped command produced stderr output; review execution details before trusting the answer."
    if risk_count:
        return "InsightForge detected language patterns that correlate with weak grounding or overconfident claims."
    if not stdout.strip():
        return "The wrapped command returned no stdout, so the trace is mostly execution metadata."
    if prompt.strip():
        return "The response completed without obvious risk markers, but the trace should still be reviewed for source quality."
    return "Execution completed successfully with a low-risk heuristic profile."


def build_trace(
    *,
    prompt: str,
    command: Sequence[str],
    model_hint: str,
    provider: str = "unknown",
    system_prompt: str = "",
    stdout: str,
    stderr: str,
    exit_code: int,
    metadata: dict[str, str] | None = None,
    provenance_notes: Sequence[str] | None = None,
) -> TraceRecord:
    output_blob = "\n".join(part for part in (stdout, stderr) if part)
    hedge_hits = _count_matches(HEDGE_PATTERNS, output_blob)
    source_hits = _count_matches(SOURCE_PATTERNS, output_blob)
    bias_hits = _count_matches(BIAS_PATTERNS, output_blob)
    stderr_penalty = 0.2 if stderr.strip() else 0.0
    empty_penalty = 0.15 if not stdout.strip() else 0.0

    confidence = 0.72
    confidence -= min(0.24, hedge_hits * 0.04)
    confidence -= min(0.20, bias_hits * 0.05)
    confidence -= stderr_penalty + empty_penalty
    confidence += min(0.18, source_hits * 0.06)
    confidence = max(0.05, min(0.99, round(confidence, 2)))

    bias_flags: list[RiskFlag] = []
    hallucination_flags: list[RiskFlag] = []
    provenance: list[str] = list(provenance_notes or [])

    if source_hits:
        provenance.append("Sources or citations were mentioned in the output.")
    elif not provenance:
        provenance.append("No explicit sources or citations were detected.")

    if bias_hits:
        bias_flags.append(
            RiskFlag(
                code="OVERGENERALIZATION",
                title="Overgeneralized claim pattern",
                severity="medium",
                evidence="The output uses absolute language that can hide edge cases or demographic skew.",
                recommendation="Ask the model to qualify claims, state assumptions, and list known exceptions.",
            )
        )

    if hedge_hits and not source_hits:
        hallucination_flags.append(
            RiskFlag(
                code="UNGROUNDED_HEDGING",
                title="Ungrounded uncertainty",
                severity="high",
                evidence="The output contains hedging language without nearby source signals.",
                recommendation="Request citations, intermediate evidence, or a narrower task boundary.",
            )
        )

    if stderr.strip():
        hallucination_flags.append(
            RiskFlag(
                code="EXECUTION_ANOMALY",
                title="Execution anomaly",
                severity="medium",
                evidence="The wrapped command emitted stderr output, which may indicate tool failure or partial completion.",
                recommendation="Inspect stderr and rerun before relying on the result for audits or downstream actions.",
            )
        )

    nodes = [
        TraceNode(id="prompt", label="Prompt", kind="input", detail=prompt or "No prompt recorded."),
        TraceNode(
            id="system",
            label="System Prompt",
            kind="input",
            detail=system_prompt or "No system prompt recorded.",
        ),
        TraceNode(
            id="execution",
            label="Execution",
            kind="process",
            detail=" ".join(command) if command else "No command recorded.",
            score=1.0 if exit_code == 0 else 0.4,
        ),
        TraceNode(
            id="analysis",
            label="Heuristic Analysis",
            kind="analysis",
            detail=f"Hedges={hedge_hits}, source signals={source_hits}, bias markers={bias_hits}",
            score=confidence,
        ),
        TraceNode(
            id="output",
            label="Model Output",
            kind="output",
            detail=(stdout or stderr or "No output captured.")[:1200],
            score=confidence,
        ),
    ]

    risk_count = len(bias_flags) + len(hallucination_flags)
    summary = _build_summary(prompt, stdout, stderr, risk_count)

    return TraceRecord(
        model_hint=model_hint or "unknown",
        provider=provider or "unknown",
        prompt=prompt,
        system_prompt=system_prompt,
        command=list(command),
        exit_code=exit_code,
        stdout=stdout,
        stderr=stderr,
        metadata=dict(metadata or {}),
        confidence_score=confidence,
        bias_flags=bias_flags,
        hallucination_flags=hallucination_flags,
        provenance=provenance,
        nodes=nodes,
        summary=summary,
    )