@twarc_net/groundtruth 0.1.0 → 0.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -1,5 +1,5 @@
1
1
  <p align="center">
2
- <img src="assets/hero.svg" alt="groundtruth — catch when your AI coding assistant claims work it didn't do" width="820">
2
+ <img src="assets/demo.svg" alt="groundtruth — catch when your AI coding assistant claims work it didn't do" width="820">
3
3
  </p>
4
4
 
5
5
  <p align="center">
@@ -11,8 +11,22 @@
11
11
  <a href="https://github.com/youcefzemmar/groundtruth/blob/main/CONTRIBUTING.md"><img src="https://img.shields.io/badge/PRs-welcome-brightgreen.svg" alt="PRs welcome"></a>
12
12
  </p>
13
13
 
14
+ <p align="center">
15
+ <b>English</b> ·
16
+ <a href="docs/i18n/README.zh-CN.md">简体中文</a> ·
17
+ <a href="docs/i18n/README.es.md">Español</a> ·
18
+ <a href="docs/i18n/README.pt-BR.md">Português</a> ·
19
+ <a href="docs/i18n/README.fr.md">Français</a> ·
20
+ <a href="docs/i18n/README.de.md">Deutsch</a> ·
21
+ <a href="docs/i18n/README.ja.md">日本語</a> ·
22
+ <a href="docs/i18n/README.ru.md">Русский</a> ·
23
+ <a href="docs/i18n/README.ar.md">العربية</a>
24
+ </p>
25
+
14
26
  # groundtruth
15
27
 
28
+ > **TL;DR** — Your AI says _"Done! I added X, fixed Y, wrote tests."_ groundtruth checks each claim against the real diff and flags the ones that never happened. One command: `npx @twarc_net/groundtruth install`.
29
+
16
30
  **Catch when your AI coding assistant claims work it didn't do.**
17
31
 
18
32
  Your agent ends a turn with _"Done! I added a `rateLimiter` middleware to `src/server.ts`, fixed the timeout bug, and added tests."_ You trust the summary, commit, and move on. Two weeks later production breaks — the rate limiter was never written. The summary lied (or hallucinated), and nothing checked it against the actual diff.
@@ -69,6 +83,13 @@ Restart Claude Code (or run `/hooks`) and groundtruth checks every turn automati
69
83
  npx @twarc_net/groundtruth verify
70
84
  ```
71
85
 
86
+ Prefer plugins? Add the marketplace and install in one step:
87
+
88
+ ```text
89
+ /plugin marketplace add youcefzemmar/groundtruth
90
+ /plugin install groundtruth
91
+ ```
92
+
72
93
  ## How it works
73
94
 
74
95
  ```text
@@ -119,6 +140,85 @@ By default the hook is **non-blocking**: it prints its report and gets out of th
119
140
 
120
141
  Full details in [`docs/claim-types.md`](docs/claim-types.md).
121
142
 
143
+ ## Use in CI (GitHub Action)
144
+
145
+ Post claim verdicts as a sticky comment on every PR — grading the **PR description against the diff**, so it works on any PR with zero agent setup:
146
+
147
+ ```yaml
148
+ # .github/workflows/groundtruth.yml
149
+ name: groundtruth
150
+ on: pull_request
151
+ permissions:
152
+ contents: read
153
+ pull-requests: write
154
+ jobs:
155
+ claim-check:
156
+ runs-on: ubuntu-latest
157
+ steps:
158
+ - uses: actions/checkout@v6
159
+ with: { fetch-depth: 0 }
160
+ - uses: youcefzemmar/groundtruth@v0.2.0
161
+ ```
162
+
163
+ Add `with: { strict: true }` to turn it into a merge gate. Full options in [docs/github-action.md](docs/github-action.md).
164
+
165
+ ## Configuration
166
+
167
+ Optional — drop a `.groundtruthrc.json` in your project (or a `"groundtruth"` key in package.json):
168
+
169
+ ```json
170
+ {
171
+ "strict": false,
172
+ "failOn": ["unsupported"],
173
+ "shadow": false,
174
+ "ignore": ["CHANGELOG.md", "*.generated.ts"],
175
+ "ignoreKinds": ["command"],
176
+ "output": "terminal"
177
+ }
178
+ ```
179
+
180
+ - **`ignore`** — claim targets to skip (substring or `*` glob). Your escape hatch for any false positive.
181
+ - **`ignoreKinds`** — whole claim kinds to skip (`file`, `symbol`, `test`, `dependency`, `command`, `action`).
182
+ - **`strict`** / **`output`** — defaults for blocking and output format.
183
+ - **`failOn`** — which verdict levels count as a failure in strict mode (default `["unsupported"]`).
184
+ - **`shadow`** — record to the ledger but never print or block (for gradual rollout).
185
+
186
+ Install into more hook events for multi-agent workflows:
187
+
188
+ ```bash
189
+ npx @twarc_net/groundtruth install --events Stop,SubagentStop,SessionEnd
190
+ ```
191
+
192
+ `SubagentStop` checks each subagent's turn; `SessionEnd` prints a per-session digest.
193
+
194
+ ## Other agents
195
+
196
+ The Stop hook is Claude Code-specific, but `verify` reads other agents' transcripts too — the claim engine is agent-neutral:
197
+
198
+ ```bash
199
+ groundtruth verify --agent codex # OpenAI Codex CLI
200
+ groundtruth verify --agent gemini # Gemini CLI
201
+ groundtruth verify --agent cursor # Cursor (agent-transcripts)
202
+ groundtruth verify --agent auto # pick the most recent across all
203
+ ```
204
+
205
+ Each adapter normalizes the agent's transcript into the same `{summary, toolUses}` shape. New adapters are a great contribution — see [CONTRIBUTING.md](CONTRIBUTING.md).
206
+
207
+ ## Stats & status bar
208
+
209
+ The hook keeps a privacy-safe local tally (counts only — never code or prompts, in `~/.groundtruth/ledger.jsonl`):
210
+
211
+ ```bash
212
+ groundtruth stats # this project: turns, verified, unsupported, to-review (7d/30d/all)
213
+ groundtruth stats --all # across every project
214
+ ```
215
+
216
+ Show a live count in the Claude Code status bar (`🔎 gt 3❌ ·7d`):
217
+
218
+ ```bash
219
+ npx @twarc_net/groundtruth install --statusline
220
+ ```
221
+
122
222
  ## Honest limitations
123
223
 
124
224
  - It verifies that claimed work **exists in the diff**, not that it is **correct**. _"Fixed the bug"_ can be confirmed to touch the right code; it cannot be confirmed to actually fix anything. That's what tests are for.
@@ -134,13 +234,34 @@ const report = runPipeline({ transcriptPath: "session.jsonl", cwd: process.cwd()
134
234
  console.log(renderMarkdown(report));
135
235
  ```
136
236
 
237
+ ## FAQ
238
+
239
+ **Does it send my code anywhere?**
240
+ No. It runs entirely locally — reads your transcript and `git`, writes nothing except when you run `install`. Zero network calls, zero runtime dependencies.
241
+
242
+ **Will it block my commits or get in the way?**
243
+ No. By default it just prints a report and exits cleanly. Blocking is strictly opt-in (`--strict`).
244
+
245
+ **Isn't this what tests are for?**
246
+ Tests catch code that's _wrong_. groundtruth catches code that was _never written_ but reported as done — there's nothing for a test to run. They're complementary.
247
+
248
+ **Does it work with Cursor / other agents?**
249
+ The engine is format-agnostic; today it ships a Claude Code transcript adapter. Adapters for other agents are a great first contribution — see [CONTRIBUTING.md](CONTRIBUTING.md).
250
+
251
+ **Will it falsely accuse me?**
252
+ It's tuned hard against that. A claim is only `unsupported` when it's concretely checkable and nothing supports it; everything fuzzy is shown as advisory, never a failure.
253
+
137
254
  ## Contributing
138
255
 
139
256
  Issues and PRs welcome — especially new claim patterns, agent adapters, and false-positive reports (those are gold). See [CONTRIBUTING.md](CONTRIBUTING.md) and the [Code of Conduct](CODE_OF_CONDUCT.md).
140
257
 
141
258
  ## Star history
142
259
 
143
- If groundtruth ever catches your agent in a lie, consider [starring the repo](https://github.com/youcefzemmar/groundtruth) — it helps other people find it.
260
+ If groundtruth ever catches your agent in a lie, a helps other people find it.
261
+
262
+ <a href="https://star-history.com/#youcefzemmar/groundtruth&Date">
263
+ <img src="https://api.star-history.com/svg?repos=youcefzemmar/groundtruth&type=Date" alt="Star History Chart" width="600">
264
+ </a>
144
265
 
145
266
  ## License
146
267