@twarc_net/groundtruth 0.1.0 → 0.3.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +136 -2
- package/dist/cli.js +1120 -178
- package/dist/index.d.ts +139 -8
- package/dist/index.js +849 -24
- package/package.json +1 -1
package/README.md
CHANGED
@@ -1,5 +1,5 @@
 <p align="center">
-<img src="assets/
+<img src="assets/demo.svg" alt="groundtruth — catch when your AI coding assistant claims work it didn't do" width="820">
 </p>

 <p align="center">
@@ -11,8 +11,22 @@
 <a href="https://github.com/youcefzemmar/groundtruth/blob/main/CONTRIBUTING.md"><img src="https://img.shields.io/badge/PRs-welcome-brightgreen.svg" alt="PRs welcome"></a>
 </p>

+<p align="center">
+<b>English</b> ·
+<a href="docs/i18n/README.zh-CN.md">简体中文</a> ·
+<a href="docs/i18n/README.es.md">Español</a> ·
+<a href="docs/i18n/README.pt-BR.md">Português</a> ·
+<a href="docs/i18n/README.fr.md">Français</a> ·
+<a href="docs/i18n/README.de.md">Deutsch</a> ·
+<a href="docs/i18n/README.ja.md">日本語</a> ·
+<a href="docs/i18n/README.ru.md">Русский</a> ·
+<a href="docs/i18n/README.ar.md">العربية</a>
+</p>
+
 # groundtruth

+> **TL;DR** — Your AI says _"Done! I added X, fixed Y, wrote tests."_ groundtruth checks each claim against the real diff and flags the ones that never happened. One command: `npx @twarc_net/groundtruth install`.
+
 **Catch when your AI coding assistant claims work it didn't do.**

 Your agent ends a turn with _"Done! I added a `rateLimiter` middleware to `src/server.ts`, fixed the timeout bug, and added tests."_ You trust the summary, commit, and move on. Two weeks later production breaks — the rate limiter was never written. The summary lied (or hallucinated), and nothing checked it against the actual diff.
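The check described here can be pictured with a deliberately small sketch that handles one claim kind only: file paths written in backticks. The function names, the path heuristic, and the `verified`/`unsupported` labels are illustrative assumptions, not groundtruth's claim engine, which also covers symbols, tests, dependencies, commands, and actions.

```typescript
// Hypothetical sketch, NOT groundtruth's real claim engine.
// It handles a single claim kind: a backticked path such as `src/server.ts`.
type PathVerdict = { path: string; verdict: "verified" | "unsupported" };

// Collect backticked tokens that look like file paths
// (they contain a "/" or end in a dot extension).
function extractPathClaims(summary: string): string[] {
  const found = new Set<string>();
  const backticked = /`([^`\s]+)`/g;
  let m: RegExpExecArray | null;
  while ((m = backticked.exec(summary)) !== null) {
    const token = m[1];
    if (/\/|\.[a-z0-9]+$/i.test(token)) found.add(token);
  }
  return Array.from(found);
}

// A path claim is verified only if the diff actually touched that file.
function checkPathClaims(summary: string, changedFiles: string[]): PathVerdict[] {
  const changed = new Set(changedFiles);
  return extractPathClaims(summary).map((path): PathVerdict => ({
    path,
    verdict: changed.has(path) ? "verified" : "unsupported",
  }));
}
```

Applied to the example above with a diff that only touched `src/app.ts`, `src/server.ts` comes back as unsupported because nothing in the diff supports it. `rateLimiter` is skipped entirely: it is a symbol claim, which this sketch does not attempt.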
@@ -69,6 +83,13 @@ Restart Claude Code (or run `/hooks`) and groundtruth checks every turn automati
 npx @twarc_net/groundtruth verify
 ```

+Prefer plugins? Add the marketplace and install in one step:
+
+```text
+/plugin marketplace add youcefzemmar/groundtruth
+/plugin install groundtruth
+```
+
 ## How it works

 ```text
@@ -119,6 +140,98 @@ By default the hook is **non-blocking**: it prints its report and gets out of th

 Full details in [`docs/claim-types.md`](docs/claim-types.md).

+## Use in CI (GitHub Action)
+
+Post claim verdicts as a sticky comment on every PR — grading the **PR description against the diff**, so it works on any PR with zero agent setup:
+
+```yaml
+# .github/workflows/groundtruth.yml
+name: groundtruth
+on: pull_request
+permissions:
+  contents: read
+  pull-requests: write
+jobs:
+  claim-check:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v6
+        with: { fetch-depth: 0 }
+      - uses: youcefzemmar/groundtruth@v0.3.0
+```
+
+Add `with: { strict: true }` to turn it into a merge gate. Full options in [docs/github-action.md](docs/github-action.md).
+
+### Locally, against your commit message
+
+`--staged` checks a message against what's actually staged — drop this in `.git/hooks/commit-msg` (or a [lefthook](https://github.com/evilmartians/lefthook)/husky `commit-msg` hook):
+
+```sh
+#!/bin/sh
+# .git/hooks/commit-msg — verify the commit message against the staged diff
+npx @twarc_net/groundtruth verify --summary "$1" --staged
+# add --strict to abort the commit when a claim is unsupported
+```
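Checking against "what's actually staged" means reading the index rather than the working tree. One standard source is `git diff --cached --name-status`, which prints one tab-separated line per staged file: a status letter such as `A`, `M`, or `D`, then the path. Renames and copies append a similarity score to the letter and list both paths. The parser below is a hypothetical sketch of that step, not a description of how groundtruth reads the index.

```typescript
// Hypothetical sketch: parse `git diff --cached --name-status` output.
// Not groundtruth's implementation. Git quotes unusual paths unless -z is
// used; this sketch does not handle quoted paths.
type StagedChange = { status: string; path: string; oldPath?: string };

function parseNameStatus(output: string): StagedChange[] {
  const changes: StagedChange[] = [];
  for (const line of output.split("\n")) {
    if (line.trim() === "") continue;
    const [rawStatus, first, second] = line.split("\t");
    // Renames/copies look like "R087\told.ts\tnew.ts"; keep only the letter.
    const status = rawStatus.charAt(0);
    if ((status === "R" || status === "C") && second !== undefined) {
      changes.push({ status, path: second, oldPath: first });
    } else {
      changes.push({ status, path: first });
    }
  }
  return changes;
}
```

A claim such as "added tests in `test/limiter.test.ts`" could then be compared with the staged paths instead of the last commit.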
+
+## Configuration
+
+Optional — drop a `.groundtruthrc.json` in your project (or a `"groundtruth"` key in package.json):
+
+```json
+{
+  "strict": false,
+  "failOn": ["unsupported"],
+  "shadow": false,
+  "ignore": ["CHANGELOG.md", "*.generated.ts"],
+  "ignoreKinds": ["command"],
+  "output": "terminal"
+}
+```
+
+- **`ignore`** — claim targets to skip (substring or `*` glob). Your escape hatch for any false positive.
+- **`ignoreKinds`** — whole claim kinds to skip (`file`, `symbol`, `test`, `dependency`, `command`, `action`).
+- **`strict`** / **`output`** — defaults for blocking and output format.
+- **`failOn`** — which verdict levels count as a failure in strict mode (default `["unsupported"]`).
+- **`shadow`** — record to the ledger but never print or block (for gradual rollout).
+
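These options read naturally as a filter over verdicts followed by a pass/fail decision. The sketch below is one hypothetical reading of them. The `Verdict` shape, the `review` level name, the rule that an rc file replaces the package.json key, and the glob semantics (`*` matches any run of characters, anything else is a substring test) are assumptions, not groundtruth's documented behavior.

```typescript
// Hypothetical sketch: option names match the README, everything else is assumed.
type Kind = "file" | "symbol" | "test" | "dependency" | "command" | "action";
type Level = "verified" | "unsupported" | "review"; // "review" is an assumed level name

interface Verdict {
  kind: Kind;
  target: string;
  level: Level;
}

// `output` is left out because it only affects formatting.
interface Config {
  strict: boolean;
  failOn: Level[];
  shadow: boolean;
  ignore: string[];
  ignoreKinds: Kind[];
}

const DEFAULTS: Config = { strict: false, failOn: ["unsupported"], shadow: false, ignore: [], ignoreKinds: [] };

// Assumption: an rc file, when present, is used instead of the package.json key.
function resolveConfig(rc?: Partial<Config>, pkgKey?: Partial<Config>): Config {
  return { ...DEFAULTS, ...(rc ?? pkgKey ?? {}) };
}

// Plain patterns match as substrings; with "*", the whole target must match
// and each "*" stands for any run of characters (including "/").
function matchesIgnore(target: string, pattern: string): boolean {
  if (!pattern.includes("*")) return target.includes(pattern);
  const body = pattern
    .split("*")
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp(`^${body}$`).test(target);
}

// Drop ignored claims, then decide. Shadow mode never prints or blocks
// (the ledger write it still performs is not modeled here).
function evaluate(verdicts: Verdict[], cfg: Config): { shown: Verdict[]; fail: boolean } {
  const kept = verdicts.filter(
    (v) => !cfg.ignoreKinds.includes(v.kind) && !cfg.ignore.some((p) => matchesIgnore(v.target, p)),
  );
  if (cfg.shadow) return { shown: [], fail: false };
  return { shown: kept, fail: cfg.strict && kept.some((v) => cfg.failOn.includes(v.level)) };
}
```

Keeping the failure decision behind both `strict` and `failOn` mirrors the README's point that blocking is opt-in: with the defaults, `fail` is always `false`.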
+Install into more hook events for multi-agent workflows:
+
+```bash
+npx @twarc_net/groundtruth install --events Stop,SubagentStop,SessionEnd
+```
+
+`SubagentStop` checks each subagent's turn; `SessionEnd` prints a per-session digest.
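Claude Code keeps hooks in its settings JSON under a `hooks` key: each event name maps to a list of matcher groups, and each group holds `command` hooks (events such as `Stop` need no `matcher`). Installing into several events plausibly amounts to adding the same command under each key. The sketch below assumes exactly that; `addHookToEvents` is a hypothetical name and the command string is a placeholder, so inspect your settings after running `install` rather than relying on this shape.

```typescript
// Hypothetical sketch: merge one command hook into several Claude Code hook events.
// The command passed in is a placeholder, not what groundtruth actually installs.
type CommandHook = { type: "command"; command: string };
type MatcherGroup = { matcher?: string; hooks: CommandHook[] };
type Settings = { hooks?: Record<string, MatcherGroup[]>; [key: string]: unknown };

function addHookToEvents(settings: Settings, events: string[], command: string): Settings {
  const hooks: Record<string, MatcherGroup[]> = { ...(settings.hooks ?? {}) };
  for (const event of events) {
    const groups = hooks[event] ?? [];
    // Leave events that already run this exact command alone, so re-installing is idempotent.
    const already = groups.some((g) => g.hooks.some((h) => h.command === command));
    const entry: MatcherGroup = { hooks: [{ type: "command", command }] };
    hooks[event] = already ? groups : [...groups, entry];
  }
  // Other settings keys are preserved untouched.
  return { ...settings, hooks };
}
```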
+
+## Other agents
+
+The Stop hook is Claude Code-specific, but `verify` reads other agents' transcripts too — the claim engine is agent-neutral:
+
+```bash
+groundtruth verify --agent codex # OpenAI Codex CLI
+groundtruth verify --agent gemini # Gemini CLI
+groundtruth verify --agent cursor # Cursor (agent-transcripts)
+groundtruth verify --agent opencode # OpenCode
+groundtruth verify --agent aider # Aider (best-effort)
+groundtruth verify --agent auto # pick the most recent across all
+```
+
+Each adapter normalizes the agent's transcript into the same `{summary, toolUses}` shape. New adapters are a great contribution — see [CONTRIBUTING.md](CONTRIBUTING.md).
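An adapter in this sense is a function from an agent-specific transcript to the shared `{summary, toolUses}` shape. The sketch below uses an invented toy JSONL format (it is not the Claude Code, Codex, Gemini, Cursor, OpenCode, or Aider format) and assumes that each user message starts a new turn and that the turn's last assistant message is its summary.

```typescript
// Hypothetical adapter sketch. The ToyLine format is invented for illustration.
interface ToolUse {
  name: string;
  input: Record<string, unknown>;
}
interface Normalized {
  summary: string;
  toolUses: ToolUse[];
}

type ToyLine =
  | { type: "message"; role: "user" | "assistant"; text: string }
  | { type: "tool"; name: string; input: Record<string, unknown> };

// Returns the summary and tool uses of the LAST turn in the transcript.
function normalizeToyTranscript(jsonl: string): Normalized {
  let summary = "";
  const toolUses: ToolUse[] = [];
  for (const raw of jsonl.split("\n")) {
    if (raw.trim() === "") continue;
    const line = JSON.parse(raw) as ToyLine;
    if (line.type === "message" && line.role === "user") {
      // A new user message starts a new turn: forget the previous turn.
      summary = "";
      toolUses.length = 0;
    } else if (line.type === "message") {
      summary = line.text; // the latest assistant text wins
    } else {
      toolUses.push({ name: line.name, input: line.input });
    }
  }
  return { summary, toolUses };
}
```

Because every adapter produces the same shape, the claim checks downstream never need to know which agent wrote the transcript.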
+
+## Stats & status bar
+
+The hook keeps a privacy-safe local tally (counts only — never code or prompts, in `~/.groundtruth/ledger.jsonl`):
+
+```bash
+groundtruth stats # this project: turns, verified, unsupported, to-review (7d/30d/all)
+groundtruth stats --all # across every project
+```
+
+Show a live count in the Claude Code status bar (`🔎 gt 3❌ ·7d`):
+
+```bash
+npx @twarc_net/groundtruth install --statusline
+```
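A counts-only ledger lends itself to simple windowed aggregation. The sketch below assumes a hypothetical JSONL record (`ts`, `project`, `verified`, `unsupported`, `review`) written once per turn; the real format of `~/.groundtruth/ledger.jsonl` is not documented here. It totals a time window and formats a string modeled on the `🔎 gt 3❌ ·7d` example.

```typescript
// Hypothetical ledger record: counts only, never code or prompts.
interface LedgerEntry {
  ts: number; // epoch milliseconds
  project: string;
  verified: number;
  unsupported: number;
  review: number;
}
interface Totals {
  turns: number;
  verified: number;
  unsupported: number;
  review: number;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Sum entries from the last `days` days, for one project or (if omitted) all of them.
// Assumes one ledger entry per checked turn.
function tally(jsonl: string, now: number, days: number, project?: string): Totals {
  const totals: Totals = { turns: 0, verified: 0, unsupported: 0, review: 0 };
  for (const raw of jsonl.split("\n")) {
    if (raw.trim() === "") continue;
    const e = JSON.parse(raw) as LedgerEntry;
    if (now - e.ts > days * DAY_MS) continue;
    if (project !== undefined && e.project !== project) continue;
    totals.turns += 1;
    totals.verified += e.verified;
    totals.unsupported += e.unsupported;
    totals.review += e.review;
  }
  return totals;
}

// Mirrors the README's status-bar example.
function statusLine(t: Totals, days: number): string {
  return `🔎 gt ${t.unsupported}❌ ·${days}d`;
}
```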
+
 ## Honest limitations

 - It verifies that claimed work **exists in the diff**, not that it is **correct**. _"Fixed the bug"_ can be confirmed to touch the right code; it cannot be confirmed to actually fix anything. That's what tests are for.
@@ -134,13 +247,34 @@ const report = runPipeline({ transcriptPath: "session.jsonl", cwd: process.cwd()
 console.log(renderMarkdown(report));
 ```

+## FAQ
+
+**Does it send my code anywhere?**
+No. It runs entirely locally — reads your transcript and `git`, writes nothing except when you run `install`. Zero network calls, zero runtime dependencies.
+
+**Will it block my commits or get in the way?**
+No. By default it just prints a report and exits cleanly. Blocking is strictly opt-in (`--strict`).
+
+**Isn't this what tests are for?**
+Tests catch code that's _wrong_. groundtruth catches code that was _never written_ but reported as done — there's nothing for a test to run. They're complementary.
+
+**Does it work with Cursor / other agents?**
+The engine is format-agnostic; today it ships a Claude Code transcript adapter. Adapters for other agents are a great first contribution — see [CONTRIBUTING.md](CONTRIBUTING.md).
+
+**Will it falsely accuse me?**
+It's tuned hard against that. A claim is only `unsupported` when it's concretely checkable and nothing supports it; everything fuzzy is shown as advisory, never a failure.
+
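The rule in the last answer is small enough to write down as a decision function. In this hypothetical sketch, `checkable` and `evidence` are invented fields standing in for whatever the engine actually computes; the point is that `unsupported` requires both a concrete target and zero support, while everything else lands on `verified` or `advisory`.

```typescript
// Hypothetical claim shape; groundtruth's real types may differ.
interface Claim {
  text: string;
  checkable: boolean; // has a concrete target: a named file, symbol, test, dependency, ...
  evidence: number; // how many diff hunks or tool uses support it
}
type Level = "verified" | "unsupported" | "advisory";

function classify(c: Claim): Level {
  // Fuzzy claims ("improved performance") are advisory and never fail a run.
  if (!c.checkable) return "advisory";
  return c.evidence > 0 ? "verified" : "unsupported";
}
```

Putting the `checkable` test first is what keeps vague wording from ever being reported as a failure.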
 ## Contributing

 Issues and PRs welcome — especially new claim patterns, agent adapters, and false-positive reports (those are gold). See [CONTRIBUTING.md](CONTRIBUTING.md) and the [Code of Conduct](CODE_OF_CONDUCT.md).

 ## Star history

-If groundtruth ever catches your agent in a lie,
+If groundtruth ever catches your agent in a lie, a ⭐ helps other people find it.
+
+<a href="https://star-history.com/#youcefzemmar/groundtruth&Date">
+<img src="https://api.star-history.com/svg?repos=youcefzemmar/groundtruth&type=Date" alt="Star History Chart" width="600">
+</a>

 ## License