groundy 0.3.0__tar.gz

@@ -0,0 +1,17 @@
1
+ # Copy to .env and fill in. .env is gitignored — never commit real keys.
2
+
3
+ # groundy makes ONE call (reformulation) over any OpenAI-compatible API. Set all three
4
+ # explicitly — there is no default provider, so always name your endpoint.
5
+ GROUNDY_API_KEY=sk-...
6
+ GROUNDY_BASE_URL=https://api.openai.com/v1 # your provider (OpenRouter, Groq, Anthropic, a local server…)
7
+ GROUNDY_MODEL=gpt-4o-mini # the reformulation model (required, no default)
8
+
9
+ # Dev only: print every reformulation, answer, and score. Silent unless truthy.
10
+ # GROUNDY_DEBUG=1
11
+
12
+ # Optional observability — only if you trace with LangfuseTracer (the `groundy[langfuse]`
13
+ # extra). groundy core never reads these; the Langfuse SDK does. Get the keys from your
14
+ # Langfuse project settings. Leave unset to run without tracing.
15
+ LANGFUSE_SECRET_KEY="sk-lf-..."
16
+ LANGFUSE_PUBLIC_KEY="pk-lf-..."
17
+ LANGFUSE_BASE_URL="https://cloud.langfuse.com"
@@ -0,0 +1,41 @@
1
+ name: release
2
+
3
+ # Publish to PyPI via Trusted Publishing (OIDC) — no API token stored.
4
+ # Fires when you push a version tag (e.g. `git tag v0.3.0 && git push origin v0.3.0`).
5
+ on:
6
+ push:
7
+ tags:
8
+ - "v*"
9
+
10
+ jobs:
11
+ build:
12
+ runs-on: ubuntu-latest
13
+ steps:
14
+ - uses: actions/checkout@v4
15
+ - name: Install uv
16
+ uses: astral-sh/setup-uv@v5
17
+ - name: Build sdist + wheel
18
+ run: uv build
19
+ - name: Check metadata
20
+ run: uvx twine check dist/*
21
+ - name: Upload build artifacts
22
+ uses: actions/upload-artifact@v4
23
+ with:
24
+ name: dist
25
+ path: dist/
26
+
27
+ publish:
28
+ needs: build
29
+ runs-on: ubuntu-latest
30
+ # Must match the "Environment name" in the PyPI trusted-publisher config.
31
+ environment: pypi
32
+ permissions:
33
+ id-token: write # required for OIDC / Trusted Publishing
34
+ steps:
35
+ - name: Download build artifacts
36
+ uses: actions/download-artifact@v4
37
+ with:
38
+ name: dist
39
+ path: dist/
40
+ - name: Publish to PyPI
41
+ uses: pypa/gh-action-pypi-publish@release/v1
@@ -0,0 +1,70 @@
1
+ # Secrets / local config
2
+ .env
3
+ .env.*
4
+ !.env.example
5
+
6
+ # Byte-compiled / optimized / DLL files
7
+ __pycache__/
8
+ *.py[cod]
9
+ *$py.class
10
+ *.so
11
+
12
+ # Distribution / packaging
13
+ .Python
14
+ build/
15
+ dist/
16
+ downloads/
17
+ eggs/
18
+ .eggs/
19
+ sdist/
20
+ wheels/
21
+ *.egg-info/
22
+ *.egg
23
+ MANIFEST
24
+
25
+ # Installer logs
26
+ pip-log.txt
27
+ pip-delete-this-directory.txt
28
+
29
+ # Unit test / coverage reports
30
+ .pytest_cache/
31
+ .coverage
32
+ .coverage.*
33
+ htmlcov/
34
+ .cache
35
+ .tox/
36
+ .nox/
37
+
38
+ # Type checkers / linters
39
+ .mypy_cache/
40
+ .ruff_cache/
41
+ .dmypy.json
42
+
43
+ # Virtual environments
44
+ .venv/
45
+ venv/
46
+ env/
47
+ ENV/
48
+
49
+ # uv lockfile — not committed for this library (resolution is left to consumers)
50
+ uv.lock
51
+
52
+ # Jupyter
53
+ .ipynb_checkpoints/
54
+
55
+ # sentence-transformers / HF model cache (can be large)
56
+ .cache/huggingface/
57
+
58
+ # IDEs / editors
59
+ .idea/
60
+ .vscode/
61
+ *.swp
62
+ *~
63
+
64
+ # OS cruft
65
+ .DS_Store
66
+ Thumbs.db
67
+
68
+ # Claude Code
69
+ CLAUDE.md
70
+ .claude/
@@ -0,0 +1,45 @@
1
+ # Changelog
2
+
3
+ Notable changes to `groundy`. Pre-1.0, so the API may still shift between releases.
4
+
5
+ ## 0.3.0 — 2026-06-11
6
+
7
+ - Added agnostic observability: pass a `tracer` to `@groundy` / `GroundyChecker` (a small
8
+ `Tracer` protocol, like `cache=`) and each `check()` emits a nested trace —
9
+ `reformulate → verify ×n → score → served`. Default `tracer=None` is a no-op (zero overhead).
10
+ - Ships a Langfuse adapter behind the `groundy[langfuse]` extra
11
+ (`from groundy.observability.langfuse import LangfuseTracer`); the core imports no vendor SDK.
12
+ The reformulation node (the one call groundy owns) carries the model, temperature, token
13
+ usage, and a prompt-template hash.
14
+
15
+ ## 0.2.1 — 2026-06-09
16
+
17
+ - Added a `fastembed` similarity backend: the same `all-MiniLM-L6-v2` model via ONNX
18
+ Runtime (no torch), ~15x lighter import (CLI cold start ~10s → ~1-2s). Opt-in via
19
+ `backend="fastembed"` + the `fastembed` extra; the CLI defaults to it and falls back to
20
+ `embeddings` when it isn't installed.
21
+ - Added a `concurrency` knob (`GroundyChecker` / `@groundy`, default 2; CLI `-c`) to fetch
22
+ the verify answers in parallel — cuts wall-clock since they're independent. The served
23
+ call stays sequential.
24
+
25
+ ## 0.2.0 — 2026-06-08
26
+
27
+ - **Breaking:** env config moved to groundy's own namespace — `GROUNDY_API_KEY` /
28
+ `GROUNDY_BASE_URL` (was `OPENAI_*`), and `base_url` is now required (no default provider).
29
+ - **Breaking:** removed the `reformulate_fn` hook — any OpenAI-compatible `base_url` covers it.
30
+ - Added the `groundy` CLI: a terminal vibe-check printing the verdict + agreement matrix
31
+ (`--matrix` for the N×N heatmap). Supports stdin and `-q`/`-n`/`-t`/`--debug`.
32
+
33
+ ## 0.1.0b1 — 2026-06-07
34
+
35
+ - **Breaking:** renamed env `GROUNDY_REFORMULATION_MODEL` → `GROUNDY_MODEL`; the
36
+ reformulation model is now required (no `gpt-4o-mini` fallback) — set `model=` or
37
+ `GROUNDY_MODEL` or it raises.
38
+ - Fixed: `best_answer` returns the served answer (not the consensus/medoid) when reliable,
39
+ else `None`; docs now match the code.
40
+
41
+ ## 0.1.0b0
42
+
43
+ - Initial release: the `@groundy` decorator, `GroundyChecker.check()` rich result, the
44
+ `Cache` protocol with `cache=` orchestration, pluggable similarity backends (`embeddings`
45
+ real, `llm_judge` stub), and an injectable `reformulate_fn`.
groundy-0.3.0/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 Lorenzo 'lopoc' Cococcia
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
groundy-0.3.0/PKG-INFO ADDED
@@ -0,0 +1,348 @@
1
+ Metadata-Version: 2.4
2
+ Name: groundy
3
+ Version: 0.3.0
4
+ Summary: Hallucination detection for LLMs via semantic consistency checking
5
+ Project-URL: Homepage, https://github.com/lopoc/groundy
6
+ Project-URL: Repository, https://github.com/lopoc/groundy
7
+ Project-URL: Issues, https://github.com/lopoc/groundy/issues
8
+ Project-URL: Changelog, https://github.com/lopoc/groundy/blob/main/CHANGELOG.md
9
+ Author: Lorenzo 'lopoc' Cococcia
10
+ License-Expression: MIT
11
+ License-File: LICENSE
12
+ Keywords: agents,ai,hallucination,llm,openai,semantic-cache
13
+ Classifier: Development Status :: 4 - Beta
14
+ Classifier: Intended Audience :: Developers
15
+ Classifier: Operating System :: OS Independent
16
+ Classifier: Programming Language :: Python :: 3
17
+ Classifier: Programming Language :: Python :: 3.10
18
+ Classifier: Programming Language :: Python :: 3.11
19
+ Classifier: Programming Language :: Python :: 3.12
20
+ Classifier: Programming Language :: Python :: 3.13
21
+ Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
22
+ Classifier: Topic :: Software Development :: Libraries :: Python Modules
23
+ Classifier: Typing :: Typed
24
+ Requires-Python: >=3.10
25
+ Requires-Dist: loguru>=0.7.0
26
+ Requires-Dist: openai>=1.0.0
27
+ Requires-Dist: sentence-transformers>=2.7.0
28
+ Provides-Extra: fastembed
29
+ Requires-Dist: fastembed>=0.3.0; extra == 'fastembed'
30
+ Provides-Extra: langfuse
31
+ Requires-Dist: langfuse>=3.2.0; extra == 'langfuse'
32
+ Description-Content-Type: text/markdown
33
+
34
+ # groundy 🌱
35
+
36
+ **Keep your LLM grounded - no ground truth required.**
37
+
38
+ A grounded model agrees with itself: ask the same question a few different ways and the
39
+ answer holds. A model that's improvising scatters. `groundy` wraps that check into one
40
+ decorator that returns an answer you can trust - or a refusal when the model is just
41
+ making things up. No labels, no fine-tuning, no retrieval.
42
+
43
+ ```python
44
+ from groundy import groundy
45
+
46
+ @groundy
47
+ def ask(q: str) -> str:
48
+ return my_llm(q) # your LLM call - any provider, returns a str
49
+
50
+ ask("Who proved Fermat's Last Theorem?") # → "Andrew Wiles."
51
+ ask("Who was the 14th person on the Moon?") # → "I'm not confident enough to answer that reliably."
52
+ ```
53
+
54
+ Same signature, same `str` return. Nothing downstream changes - the answer just became
55
+ trustworthy.
56
+
57
+ ## Get started
58
+
59
+ **1. Install** (not on PyPI yet):
60
+
61
+ ```bash
62
+ uv add git+https://github.com/lopoc/groundy.git
63
+ ```
64
+
65
+ That's the full library, ready to use — the `@groundy` decorator and the local `embeddings`
66
+ backend work out of the box, no extras needed. Two optional extras add heavier integrations
67
+ only if you want them:
68
+
69
+ | Extra | Adds | Use it for |
70
+ |---|---|---|
71
+ | `fastembed` | ONNX embedding backend (no torch) | ~15× lighter import (CLI cold start ~10s → ~1–2s). Select with `backend="fastembed"`. |
72
+ | `langfuse` | Langfuse tracing adapter | Trace every check (`tracer=LangfuseTracer()`). See [Observability](#observability). |
73
+
74
+ Add them in the brackets (comma-separated for several) — note the quotes and the `name @`
75
+ prefix when you include an extra:
76
+
77
+ ```bash
78
+ uv add "groundy[fastembed,langfuse] @ git+https://github.com/lopoc/groundy.git"
79
+ ```
80
+
81
+ Skip the extras and nothing breaks: `fastembed` and the Langfuse SDK are imported lazily —
82
+ only when you actually select that backend or construct the tracer — so a plain install
83
+ never needs them.
84
+
85
+ **2. Give groundy an API key, a provider, and a model name.** It makes one call of its own
86
+ - reformulation, over any OpenAI-compatible API - all under its own `GROUNDY_*` namespace:
87
+
88
+ ```bash
89
+ export GROUNDY_API_KEY=sk-...
90
+ export GROUNDY_BASE_URL=https://api.openai.com/v1 # your provider — name it, no default (OpenRouter, Groq, a local server…)
91
+ export GROUNDY_MODEL=gpt-4o-mini # the reformulation model (required, no default)
92
+ ```
93
+
94
+ **3. Decorate your LLM call** and use it as usual:
95
+
96
+ ```python
97
+ from openai import OpenAI
98
+ from groundy import groundy
99
+
100
+ client = OpenAI()
101
+
102
+ @groundy
103
+ def ask(q: str) -> str:
104
+ return client.chat.completions.create(
105
+ model="gpt-4o", messages=[{"role": "user", "content": q}]
106
+ ).choices[0].message.content
107
+
108
+ print(ask("Who proved Fermat's Last Theorem?"))
109
+ print(ask("Who was the 14th person on the Moon?"))
110
+ ```
111
+
112
+ That's it. A ready-to-run version (decorator + cache + raw checker) ships in the repo:
113
+ `uv run python examples/basic.py`.
114
+
115
+ > 💡 `export GROUNDY_DEBUG=1` prints every reformulation, answer, and score.
116
+
117
+ ## Vibe-check it from the terminal
118
+
119
+ No code needed - `groundy` asks your question a few ways and shows you the *matrix*: each
120
+ distinct answer with a bar for how much it **agrees with the rest** (groundy's own signal),
121
+ consensus on top, outliers at the bottom. Identical answers collapse to one `×N` row:
122
+
123
+ ```bash
124
+ export GROUNDY_API_KEY=sk-...
125
+ export GROUNDY_BASE_URL=https://api.openai.com/v1 # your provider — required, no default
126
+ export GROUNDY_MODEL=gpt-4o-mini
127
+
128
+ groundy "Who was the 14th person to walk on the Moon?"
129
+ ```
130
+
131
+ ```text
132
+ 🌱 groundy
133
+
134
+ ? Who was the 14th person to walk on the Moon?
135
+
136
+ ⚠ uncertain consistency 0.50 · 17.8s
137
+
138
+ I'm not confident enough to answer that reliably.
139
+
140
+ scatter
141
+ █████░░░ 0.61 Eugene Cernan (the last person to walk on the Moon, Apollo 17)…
142
+ ████░░░░ 0.52 Eugene Cernan was the last (12th) person to walk on the Moon…
143
+ ███░░░░░ 0.41 Harrison Schmitt ×2
144
+ ```
145
+
146
+ On a reliable question the bars stand tall together and collapse to one row
147
+ (`████████ 1.00 Paris ×5`); on a shaky one they fan down as the answers pull apart.
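The bars in the sample output are consistent with an 8-cell bar filled in proportion to the agreement score. A minimal sketch under that assumption: `render_bar` is a hypothetical helper written for illustration, not groundy's actual rendering code, and round-to-nearest is a guess that happens to reproduce the bars shown for 1.00, 0.61, 0.52, and 0.41.

```python
def render_bar(score: float, width: int = 8) -> str:
    """Render a score in [0, 1] as a fixed-width block bar (hypothetical helper)."""
    clamped = max(0.0, min(1.0, score))  # keep out-of-range scores inside the bar
    filled = round(clamped * width)      # assumption: round to the nearest cell
    return "█" * filled + "░" * (width - filled)


# Print the four scores that appear in the sample output.
for score in (1.00, 0.61, 0.52, 0.41):
    print(render_bar(score), f"{score:.2f}")
```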
148
+
149
+ Want the raw structure? `--matrix` prints the full N×N pairwise heatmap - mutually-agreeing
150
+ answers light up as bright blocks, so you *see* the clusters with no threshold and nothing
151
+ aggregated:
152
+
153
+ ```text
154
+ scatter
155
+ a b c d e
156
+ a ██░░██████ Eugene Cernan was the last (12th)…
157
+ b ░░██░░░░░░ Gene Cernan
158
+ c ██░░██████ Eugene Cernan was the last (12th)…
159
+ d ██░░██████ Eugene Cernan was the last (12th)…
160
+ e ██░░██████ Eugene Cernan (the last person…)
161
+ ```
162
+
163
+ It reads `GROUNDY_API_KEY`, `GROUNDY_BASE_URL`, and `GROUNDY_MODEL` like everything else. Pipe a question in
164
+ (`echo "…" | groundy`), add `-q` for answer-only output, `--matrix` for the heatmap,
165
+ `-n`/`-t` to tune, or `--debug` for the raw reformulation log.
166
+
167
+ ## How it works
168
+
169
+ An uncertain model disagrees with itself when you rephrase the question; a confident one
170
+ doesn't. With `@groundy(n=5)`, each call:
171
+
172
+ 1. **Rephrases** the query 4 ways - the one LLM call groundy makes itself.
173
+ 2. **Answers all 5 tersely.** A `verify_prompt` is prepended so the comparison is about
174
+ *substance, not phrasing*. These are the *verify answers*.
175
+ 3. **Scores agreement** - embeds the verify answers locally (sentence-transformers) and
176
+ averages their pairwise cosine similarity into a `consistency_score` in `[0, 1]`.
177
+ 4. **Decides:** `reliable = consistency_score >= threshold`.
178
+ 5. **Answers your way - only if reliable.** It calls your function once more on the raw
179
+ query for the *served* answer (your verbosity/prompt) and returns it. Unreliable → it
180
+ skips this call and returns your `on_unreliable` string.
181
+
182
+ You serve the answer the way you want it, but verification is terse so verbosity can't
183
+ hide disagreement. Cost: **7 LLM calls when reliable** (1 reformulation + 5 verify + 1
184
+ served), **6 when unreliable**, all synchronous - which is exactly why you cache it.
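Steps 3 and 4 can be sketched in plain Python. This is a minimal illustration, not groundy's implementation: the toy vectors stand in for real sentence embeddings, the helper names (`cosine`, `consistency_score`, `is_reliable`) are hypothetical, and clamping the mean into `[0, 1]` is an assumption, since the README states only the score's range.

```python
from itertools import combinations
from math import sqrt


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))


def consistency_score(vectors: list[list[float]]) -> float:
    """Mean pairwise cosine similarity over all unordered pairs."""
    if len(vectors) < 2:
        raise ValueError("need at least two answers to compare")
    pairs = list(combinations(vectors, 2))
    mean = sum(cosine(u, v) for u, v in pairs) / len(pairs)
    return max(0.0, min(1.0, mean))  # assumption: clamp into [0, 1]


def is_reliable(score: float, threshold: float = 0.75) -> bool:
    """Step 4: the decision rule stated in the README."""
    return score >= threshold


# Toy 2-D "embeddings": one tight cluster, one scattered set.
agreeing = [[1.0, 0.1], [0.9, 0.2], [1.0, 0.0]]
scattered = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

for name, vectors in (("agreeing", agreeing), ("scattered", scattered)):
    score = consistency_score(vectors)
    print(name, round(score, 3), is_reliable(score))
```

With real embeddings the scores tend to sit much closer together than in this toy case, which is why the threshold needs calibrating on your own prompts.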
185
+
186
+ ## Cache it - pay once per *cluster* of questions
187
+
188
+ groundy is expensive, so hand it a cache and it runs **only on a miss**. A cache is anything
189
+ with `get(key) -> str | None` and `set(key, value)`. The real win is a **semantic** cache: a
190
+ hit fires on any question close enough in *meaning*, so groundy runs once per cluster of
191
+ similar questions and serves the whole neighbourhood for free.
192
+
193
+ ```python
194
+ from groundy import groundy
195
+
196
+ # Bring any semantic cache exposing get(key) -> str | None and set(key, value). A hit fires
197
+ # on questions close in *meaning*, so groundy runs once per cluster (GPTCache, Momento,
198
+ # Upstash, Redis + RedisVL - a 3-line adapter if the method names differ).
199
+ cache = SemanticCache(threshold=0.9)
200
+
201
+ @groundy(cache=cache)
202
+ def ask(q: str) -> str:
203
+ return client.chat.completions.create(...).choices[0].message.content # the RAW model
204
+
205
+ ask("Who discovered penicillin?") # MISS → full check → verdict cached
206
+ ask("Who was penicillin discovered by?") # HIT → same meaning, zero LLM calls
207
+ ```
208
+
209
+ On a hit groundy never runs. On a miss it checks, then `cache.set`s the verdict - refusals
210
+ included, so "the model can't answer this" is remembered too.
211
+
212
+ > ⚠️ **The one rule: groundy goes *above* your semantic cache, never below it.** If a
213
+ > semantic cache sits inside the wrapped call, the reformulations - semantically
214
+ > equivalent by design - all hit the same entry, score a perfect 1.0, and *every* check
215
+ > falsely passes. The semantic cache belongs on top (via `cache=`), caching the verdict.
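The `get`/`set` contract is small enough to sketch. The class below is an exact-match stand-in that assumes only the two methods the README names. It is not a semantic cache (a hit requires the same normalized text), and `DictCache`, `cached_call`, and `fake_check` are hypothetical names used to show the miss-then-hit flow without calling groundy or an LLM.

```python
from typing import Callable, Optional


class DictCache:
    """Exact-match cache exposing get(key) -> str | None and set(key, value)."""

    def __init__(self) -> None:
        self._store: dict[str, str] = {}

    @staticmethod
    def _normalize(key: str) -> str:
        # Assumption: case and extra whitespace are not significant.
        return " ".join(key.lower().split())

    def get(self, key: str) -> Optional[str]:
        return self._store.get(self._normalize(key))

    def set(self, key: str, value: str) -> None:
        self._store[self._normalize(key)] = value


def cached_call(cache: DictCache, question: str, check: Callable[[str], str]) -> str:
    """Run `check` only on a miss, then remember its verdict (refusals included)."""
    hit = cache.get(question)
    if hit is not None:
        return hit
    verdict = check(question)
    cache.set(question, verdict)
    return verdict


calls: list[str] = []


def fake_check(question: str) -> str:
    calls.append(question)  # stands in for the expensive full check
    return "Alexander Fleming."


cache = DictCache()
cached_call(cache, "Who discovered penicillin?", fake_check)     # miss: runs the check
cached_call(cache, "  who discovered PENICILLIN? ", fake_check)  # hit: same normalized key
print(len(calls))
```

An object shaped like this should satisfy the documented contract when passed as `cache=`, but only a semantic cache gives the per-cluster savings this section describes.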
216
+
217
+ ## When you want the numbers
218
+
219
+ The decorator hides the scores on purpose. Reach past it for the rich result:
220
+
221
+ ```python
222
+ from groundy import GroundyChecker
223
+
224
+ checker = GroundyChecker(n=5, threshold=0.75)
225
+ r = checker.check("What does Italian Civil Code art. 2043 establish?", answer_fn=my_llm)
226
+
227
+ r.consistency_score # 0.0–1.0
228
+ r.is_reliable # bool
229
+ r.best_answer # the served answer if reliable, else None
230
+ r.consensus_answer, r.agreement_scores, r.similarity_scores, r.latency_ms
231
+ ```
232
+
233
+ `best_answer` is the **served** answer (your raw call) when reliable, and `None` when not
234
+ - on a genuine split the right move is to refuse, not guess. The decorator turns that
235
+ `None` into your `on_unreliable` string. (`consensus_answer`, the verify answer that agrees
236
+ most with the rest, is diagnostic only.)
237
+
238
+ ## Run on any vendor
239
+
240
+ There are **two independent LLM tasks**, configured separately:
241
+
242
+ - **Answering** - your decorated function. OpenAI, LiteLLM, Ollama, anything returning a
243
+ `str`. There's no `answer_model=` knob: the answer call *is* your function.
244
+ - **Reformulating** - groundy's own OpenAI-compatible call. Set `GROUNDY_MODEL` +
245
+ `GROUNDY_BASE_URL` (both required, no default provider), or pass `model` / `base_url` /
246
+ `api_key`.
247
+
248
+ So you can reformulate on a cheap, fast model and answer on a stronger one - even across
249
+ providers:
250
+
251
+ ```python
252
+ @groundy(
253
+ model="llama-3.3-70b-versatile", # reformulate on Groq…
254
+ base_url="https://api.groq.com/openai/v1",
255
+ api_key="gsk_...",
256
+ )
257
+ def ask(q: str) -> str:
258
+ return openai_client.chat.completions.create( # …answer on OpenAI
259
+ model="gpt-4o", messages=[{"role": "user", "content": q}]
260
+ ).choices[0].message.content
261
+ ```
262
+
263
+ Any OpenAI-compatible endpoint works - that covers OpenAI, OpenRouter, Groq, Together,
264
+ Fireworks, and local servers (vLLM, llama.cpp, Ollama).
265
+
266
+ ## Knobs
267
+
268
+ | Param | Default | What it does |
269
+ |---|---|---|
270
+ | `n` | `5` | Answers compared: original + n-1 reformulations. Must be ≥ 2. Higher = sturdier + pricier. |
271
+ | `threshold` | `0.75` | Score below this → refusal. **Calibrate it** (see limits). |
272
+ | `backend` | `"embeddings"` | `embeddings` (local, sentence-transformers), `fastembed` (ONNX, needs the `fastembed` extra), or `llm_judge` (stub). |
273
+ | `model` | `None` | Reformulation model - **required** (no default). `None` → `GROUNDY_MODEL`, else `ValueError`. |
274
+ | `temperature` | `0.0` | Reformulator temperature (`0.0` = reproducible). Set `None` to omit it for models that reject the param. |
275
+ | `base_url` | `None` | Reformulation provider — **required** (no default). `None` → `GROUNDY_BASE_URL`, else `ValueError`. |
276
+ | `api_key` | `None` | `None` → `GROUNDY_API_KEY` (may be unset for keyless local servers). |
277
+ | `verify_prompt` | *(terse instruction)* | Prepended to the verify answers (not the served one). `None` verifies with your raw answers. |
278
+ | `cache` | `None` | Any object with `get`/`set`. Runs groundy only on a miss. |
279
+ | `tracer` | `None` | Any object with the `Tracer` protocol. Emits a nested trace per check. Langfuse adapter in `groundy[langfuse]`. |
280
+ | `on_unreliable` | *(a refusal)* | Returned/cached when the model disagrees with itself. |
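The `None` fallbacks in the table follow one pattern: an explicit argument wins, then the `GROUNDY_*` variable, and a missing required value raises `ValueError`. A sketch of that rule, assuming nothing about groundy's internals; `resolve_setting` is a hypothetical helper, and treating an empty variable as unset is an assumption.

```python
import os
from typing import Optional


def resolve_setting(
    value: Optional[str], env_name: str, required: bool = True
) -> Optional[str]:
    """Explicit value first, then the environment, else ValueError if required."""
    if value is not None:
        return value
    from_env = os.environ.get(env_name)
    if from_env:  # assumption: an empty string counts as unset
        return from_env
    if required:
        raise ValueError(f"{env_name} is not set and no explicit value was passed")
    return None  # optional settings (e.g. the API key for keyless local servers)


os.environ["GROUNDY_MODEL"] = "gpt-4o-mini"
print(resolve_setting(None, "GROUNDY_MODEL"))                       # falls back to the env
print(resolve_setting("llama-3.3-70b-versatile", "GROUNDY_MODEL"))  # explicit value wins
```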
281
+
282
+ ## Honest limits - read this
283
+
284
+ groundy measures **self-consistency, not correctness.** Know the failure modes:
285
+
286
+ - **Consistent confabulation passes.** A confidently, consistently wrong model scores
287
+ high. This catches uncertainty *that surfaces as divergence* - a large subset of
288
+ hallucination, not all of it. Terse verify answers help: verbose hedging hides
289
+ disagreement (verbose answers to *"the 14th person on the Moon"* all hedge alike and
290
+ score ~0.9; terse ones confabulate *different* names → ~0.30, flagged). That's why
291
+ verification is terse by default while your served answer stays verbose.
292
+ - **Calibrate the threshold.** With the default `all-MiniLM-L6-v2` backend, scores cluster
293
+ high (~0.75–0.95) for any related text. `0.75` is a starting point - tune it on your
294
+ prompts.
295
+ - **It costs ~N+2 LLM calls per check** (n=5 ≈ 7; the verify answers can run in parallel via `concurrency`). Hence `cache=`: vet a
296
+ question once, serve it free forever after.
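The call-count arithmetic can be written down as a small sketch. `calls_per_check` follows the breakdown given under "How it works" (1 reformulation + n verify + 1 served, with the served call skipped on a refusal). `expected_calls` adds two assumptions: a cache hit costs zero LLM calls, and hits occur at a fixed rate. Both function names are hypothetical.

```python
def calls_per_check(n: int = 5, reliable: bool = True) -> int:
    """LLM calls for one uncached check: 1 reformulation + n verify (+ 1 served)."""
    if n < 2:
        raise ValueError("n must be >= 2")
    return 1 + n + (1 if reliable else 0)


def expected_calls(hit_rate: float, n: int = 5, reliable: bool = True) -> float:
    """Average LLM calls per question when a fraction `hit_rate` is served from cache."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be in [0, 1]")
    return (1.0 - hit_rate) * calls_per_check(n, reliable)


print(calls_per_check(5, reliable=True))   # 7
print(calls_per_check(5, reliable=False))  # 6
print(expected_calls(hit_rate=0.9))        # average cost at a 90% hit rate
```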
297
+
298
+ ## Observability
299
+
300
+ Optional and agnostic. Pass a `tracer` (a tiny `Tracer` protocol, just like `cache=`) and
301
+ every `check()` emits a nested trace: `reformulate → verify ×n → score → served`. Default
302
+ `tracer=None` → no tracing, zero overhead.
303
+
304
+ A Langfuse adapter ships in the box — add the `langfuse` extra:
305
+
306
+ ```bash
307
+ uv add "groundy[langfuse] @ git+https://github.com/lopoc/groundy.git"
308
+ ```
309
+
310
+ ```python
311
+ from groundy.observability.langfuse import LangfuseTracer
312
+
313
+ @groundy(tracer=LangfuseTracer()) # reads LANGFUSE_* from the env
314
+ def ask(q: str) -> str:
315
+ ...
316
+ ```
317
+
318
+ The core imports no vendor SDK - only you import the adapter. groundy owns one LLM call
319
+ (reformulation), so that node carries the model, temperature, token usage, and a prompt hash;
320
+ the `answer_fn` nodes show text + timing only. Prefer to log it yourself? The full
321
+ `GroundyResult` is still right there. For dev, `GROUNDY_DEBUG=1` prints reformulations,
322
+ answers, and scores.
323
+
324
+ ## Develop
325
+
326
+ ```bash
327
+ git clone https://github.com/lopoc/groundy.git
328
+ cd groundy
329
+ uv sync # creates .venv, installs runtime + dev tools
330
+
331
+ uv run python examples/basic.py # smoke test (needs GROUNDY_API_KEY + GROUNDY_BASE_URL + GROUNDY_MODEL)
332
+ uv run ruff check groundy # lint
333
+ uv run ruff format groundy # format
334
+ uv run pytest # tests (once a tests/ dir exists)
335
+ ```
336
+
337
+ ## Roadmap
338
+
339
+ - [x] CLI: `groundy "your query"`
340
+ - [ ] `async def acheck()` - parallelize the N calls
341
+ - [ ] `llm_judge` backend (structured 0–1 scoring - sharper than embeddings)
342
+ - [ ] Tests + benchmark (measured reliable-vs-hallucinated separation)
343
+
344
+ ## Origin
345
+
346
+ A practical take on the **Laplace agent** from the Socrates/Laplace judicial-AI framework.
347
+
348
+ MIT License