altimate-receipts 0.8.0 → 0.10.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -2,10 +2,9 @@
2
2
 
3
3
  # 🧾 receipts
4
4
 
5
- **Agent-work verification — not code review.**
5
+ **See what your AI coding agent actually did — not just what it says it did.**
6
6
 
7
- A deterministic, cross-agent **Report Card** of what your coding agent *actually did*
8
- read straight from the agent's own transcript, on your machine.
7
+ Deterministic · local · reads the agent's own transcript · never grades your code.
9
8
 
10
9
  [![npm](https://img.shields.io/npm/v/altimate-receipts.svg)](https://www.npmjs.com/package/altimate-receipts)
11
10
  [![CI](https://github.com/AltimateAI/altimate-receipts/actions/workflows/ci.yml/badge.svg)](https://github.com/AltimateAI/altimate-receipts/actions/workflows/ci.yml)
@@ -14,225 +13,209 @@ read straight from the agent's own transcript, on your machine.
14
13
  </div>
15
14
 
16
15
  ```sh
17
- # 30-second start — no install, no account:
18
- npx altimate-receipts # Report Card for your most recent agent session
16
+ npx altimate-receipts # no install, no account — reads your last agent session
19
17
  ```
20
18
 
21
- > **Adding it to a repo's PRs?** `npx altimate-receipts init` — one command, one PR,
22
- > and receipts attach themselves from then on. Contributors install nothing. See
23
- > [Add it to a repo](#add-it-to-a-repo-nothing-to-install).
19
+ ---
20
+
21
+ ## The problem
22
+
23
+ Your agent writes the code now. You review the **diff** — but the diff is only the
24
+ *result*. It can't show you the **work** behind it:
24
25
 
25
26
  ```text
27
+ the agent's work ████████████████████████████ 1 hour · 90 commands · 40 files
28
+ what you review ███ the final diff
29
+ ```
30
+
31
+ The test that failed and got retried until green, the file quietly deleted, the check
32
+ weakened to pass, the *"all done!"* that wasn't — it's all in the part you didn't read.
33
+
34
+ ## What receipts does
35
+
36
+ It reads the transcript your agent already saved and shows you what it **actually did** —
37
+ cited to the line, deterministic, on your machine.
38
+
39
+ > Your tests say whether the code is **good**. Receipts says what the agent **did**.
40
+
41
+ ### On a pull request
42
+
43
+ The comment receipts leaves on the PR — the events that deserve a human's eyes, then the
44
+ full append-only record. No grade, no verdict:
45
+
46
+ > ### Agent work record — claude-code · 16 files · spent ≈ $13.72 on this PR
47
+ >
48
+ > push 2 · `7289522` — +0 new · 0 cleared · 1 open · transcript covers 16/16 changed files
49
+ >
50
+ > **Sanity-check**
51
+ > - [`src/cli.ts:699`](https://github.com/AltimateAI/altimate-receipts/pull/120/files) — **Silently swallowed an error in cli.ts** · error hidden · evidence `tool-1983-0`
52
+ >
53
+ > Other 15 files: nothing detected.
54
+ >
55
+ > <sub>tests: 564 passed _(parsed from runner output in transcript)_</sub>
56
+ >
57
+ > <details><summary>Record — pushes, files, checks, custody (append-only)</summary>
58
+ >
59
+ > | push | events | change cost |
60
+ > | :-- | :-- | :-- |
61
+ > | 1 · 2026-06-12 | +0 new · 0 cleared · 1 open | — |
62
+ > | 2 · `7289522` · 2026-06-12 | +0 new · 0 cleared · 1 open | $13.72 |
63
+ >
64
+ > **Checks** _("not detected" means this check found nothing — not that nothing exists)_
65
+ >
66
+ > | check | reading |
67
+ > | :-- | :-- |
68
+ > | swallowed errors | 1 detected |
69
+ > | destructive ops | not detected |
70
+ > | CI/CD tampering | not detected |
71
+ > | … 14 more checks | not detected |
72
+ >
73
+ > </details>
74
+ >
75
+ > <sub>a record of the agent's process, not a code review · entries are never edited or removed · re-derivable (L1) · 0 model calls</sub>
76
+
77
+ _Real output — [the PR that built this very feature](https://github.com/AltimateAI/altimate-receipts/pull/120#issuecomment-4693301980), wearing its own record: the masthead carries the PR's attributed agent spend, the ledger shows per-commit cost, and the one flagged event is the author's own swallowed error._
78
+
79
+ ### In your terminal
80
+
81
+ `npx altimate-receipts` prints the same record as a card:
82
+
83
+ ```text
84
+
26
85
  ╔══════════════════════════════════════════════════════════════════════════╗
27
- ║ 🧾 RECEIPTS — Agent Report Card proof, not vibes ║
86
+ ║ 🧾 RECEIPTS — Agent Work Record proof, not vibes ║
28
87
  ╚══════════════════════════════════════════════════════════════════════════╝
29
88
 
30
- Session Add rate limiting to the billing service
31
- Agent claude-code · claude-opus-4-8
32
- Scope 1h 4m · 412 msgs · 318 tools · 6M tok · $24.18
89
+ Session Pull latest main branch
90
+ Agent claude-code · claude-fable-5
91
+ Scope 44h 18m · 1k msgs · 407 tools · 342M tok · $713.90
33
92
 
34
- ┌─ VERDICT ────────────────────────────────────────────────────────────────┐
35
- F ⛔ DO NOT MERGE WITHOUT REVIEW
36
- │ 2 critical · 3 high · 1 medium │
93
+ ┌─ RECORD ─────────────────────────────────────────────────────────────────┐
94
+ 3 critical · 4 high · 2 medium
37
95
  └──────────────────────────────────────────────────────────────────────────┘
38
96
 
39
97
  CRITICAL
40
- ⛔ Destructive op: rm -rf ./dist ./build ×2
98
+ ⛔ Destructive op: git checkout -- .github/meta/commit.txt
99
+ data-loss risk
100
+ ⛔ Destructive op: git reset --hard origin/main
101
+ data-loss risk
102
+ ⛔ Destructive op: rm -rf /tmp/codex-gaps
41
103
  data-loss risk
42
- ⛔ Modified the grader/harness: score.py
43
- gamed the eval · score.py
44
104
  HIGH
105
+ ⚠️ Edited a file it never read: SPEC-0074-m75-store-ref-default.md
106
+ clobber risk · SPEC-0074-m75-store-ref-default.md
45
107
  ⚠️ Force-pushed over remote history
46
108
  history overwrite
47
- ⚠️ Weakened the checker config: tsconfig.json
48
- checker defanged · tsconfig.json
49
- ⚠️ Stuck loop wasted $3.10 / 7m
50
- $3.10 · 7m
109
+ ⚠️ Rewrote git history made earlier this session
110
+ history rewritten
111
+ ⚠️ Stuck loop wasted $8.75 / 3m 12s
112
+ $8.75 · 3m 12s
51
113
  MEDIUM
52
- 🔍 6 turns drove 57% of the $24.18 spend
53
- $13.78
114
+ 🔍 Bash failed 10× before succeeding
115
+ 10 retries
116
+ 🔍 Edit failed 4× before succeeding
117
+ 4 retries
118
+ ▸ 1 minor (collapsed)
54
119
 
55
120
  EVIDENCE
56
- 58 files changed · 146 edits · 92 commands · tests ran ✓ · 4 destructive ops · cache 0%
121
+ 21 files changed · 55 edits · 308 commands · tests ran ✓ · 13 destructive ops · cache 100%
57
122
 
58
123
  ──────────────────────────────────────────────────────────────────────────
59
124
  ✅ Verified by Receipts · deterministic · 0 model calls · evidence, not judgement
60
125
  what it did — not whether it's correct. your tests are the oracle for success.
61
126
  ```
62
127
 
63
- > _Illustrative output. Run `receipts` on your own Claude Code sessions to see the real card._
128
+ > _Also real, reproduced verbatim: a 44-hour receipts development session, graded by the
129
+ > tool it was building — the destructive ops, the force-push, and the $8.75 stuck loop are
130
+ > all the author's own. A clean session reads `nothing detected` (not a ✅ pass), because
131
+ > "not detected" is a fact about what was checked, never a verdict on your code._
64
132
 
65
- ---
66
-
67
- ## The problem
68
-
69
- Agents now write most of the code. It is no longer humanly possible to read every line
70
- of the agent's **output** — the diff — and a long session hides far more than a diff
71
- shows: commands run, files touched then reverted, tests that "passed," destructive ops,
72
- quietly weakened checkers.
73
-
74
- So review the agent's **work**, not its output. Instead of re-reading every line, you
75
- read a faithful, deterministic account of what the agent *actually did* — and spend your
76
- attention where the account flags something.
133
+ ## What it catches that the diff and green CI don't
77
134
 
78
- This is a new category: **agent-work verification.** It is **not** code review and never
79
- grades code quality. Your tests remain the oracle for correctness; Receipts is the
80
- oracle for *what happened*.
135
+ - **"Tests pass" did they?** The run that printed `FAILED` right before the agent declared success.
136
+ - **Claimed vs. actually done.** "Committed and pushed," "added the validation" checked against the trace.
137
+ - **Destructive ops.** `rm -rf`, force-pushes, history rewrites.
138
+ - **Gamed checks.** A weakened linter or `tsconfig`, an edited grader or test assertion.
139
+ - **Quiet churn.** Files edited then reverted; loops that burned spend with nothing to show.
81
140
 
82
- ## What a Receipt is
83
-
84
- **receipts** reads the agent's own local transcript and prints a deterministic,
85
- `file:line`-cited account of what the agent did:
86
-
87
- - **What changed** — files, edits, the actual bodies.
88
- - **What ran** — commands, and whether the tests actually ran.
89
- - **What it touched that it shouldn't have** — destructive ops, history rewrites,
90
- test/eval tampering, and other research-backed reward-hacking patterns.
91
- - **Claims vs. evidence** — what the agent *said* it did, checked against the transcript.
92
-
93
- Every finding is `file:line`-cited or it doesn't ship. Nothing leaves your machine.
94
-
95
- → **[What problems Receipts solves](./docs/problems.md)** — the before/after for each
96
- capability, shipped and planned.
141
+ Every finding is cited to a line in the transcript, or it doesn't ship.
97
142
 
98
143
  ## Why you can trust it
99
144
 
100
- The whole pitch collapses if the account itself is wrong, so trust is engineered in:
101
-
102
- - **Deterministic zero model calls.** Findings come from regex/heuristic rules over
103
- the transcript. Same input same output. There is no LLM in the product path to
104
- hallucinate.
105
- - **Near-zero false positives, now *measured*.** A single false alarm and you go back to
106
- reading diffs, so this is treated as existential and tested, not asserted: **precision
107
- 100% on a 70-session labeled corpus**, and a **1% flag rate over 1,200 real local
108
- sessions** in the out-of-sample field scan every flag manually adjudicated as a true
109
- positive on genuine work, **zero confirmed false positives**. See
110
- [`docs/eval.md`](./docs/eval.md).
111
- - **Evidence, not judgement.** Receipts reports what an agent *did*, never whether the
112
- code is *good*.
113
- - **Local-first.** The CLI runs entirely on your machine — no account, no upload, no
114
- telemetry. (The optional CI Action can emit *anonymous, aggregate* counts only; off by
115
- default in releases. See [telemetry](./docs/telemetry.md).)
116
-
117
- ## How it works
118
-
119
- ```
120
- the agent's own local transcript (raw JSONL / SQLite)
121
- → adapter (per agent) normalize to one session model, tool calls passed raw
122
- → spans derive edits / commands / reads / cost / destructive ops
123
- → findings deterministic detectors → file:line-cited findings
124
- → Report Card human-readable, severity-grouped (default output)
125
- → Receipt (--json) portable, schema-validated object; optionally signed
126
- ```
127
-
128
- Receipts reads the transcript your agent already writes to disk — it doesn't sit between
129
- you and the model, doesn't need an API key, and doesn't phone home. One Report Card
130
- spans **Claude Code, Codex, Cursor, and OpenClaw**; detectors are agent-agnostic.
131
-
132
- ## Invariants
145
+ - **Deterministic no LLM in the path.** Findings come from rules over the transcript.
146
+ Same session in, same account out. There is nothing to hallucinate.
147
+ - **It never grades your code.** It reports *what happened*, never *whether it's good* —
148
+ no letter grade, no "do not merge." Your tests stay the judge of correctness.
149
+ - **Near-zero false positives — measured, not claimed.** 100% precision on a 70-session
150
+ labeled corpus; a 1% flag rate across 1,200 real local sessions, every flag adjudicated
151
+ as a true positive, zero confirmed false positives ([`docs/eval.md`](./docs/eval.md)).
152
+ - **Local-first.** Runs entirely on your machine. No account, no upload, no telemetry.
153
+ - **Works with your agent.** Claude Code, Codex, Cursor, OpenClaw one tool, one format.
133
154
 
134
- These are binding constraints, not aspirations (full set in
135
- [`SPEC-0000`](./specs/SPEC-0000-product.md)):
136
-
137
- - **Deterministic — zero model calls in the product path.** (R1)
138
- - **Evidence, not judgement** — never grades code quality. (R2)
139
- - **Near-zero false positives, measured** — see [`docs/eval.md`](./docs/eval.md). (R3)
140
- - **Local-first** — no upload; the opt-in CI Action emits only anonymous aggregate
141
- telemetry. (R4)
142
-
143
- ## Usage
155
+ ## Try it (10 seconds)
144
156
 
145
157
  ```sh
146
- receipts # Report Card for your most recent session (any agent)
147
- receipts --list # list recent sessions across all agents
148
- receipts --agent codex # limit to one agent: claude-code | codex | cursor | openclaw
149
- receipts 3 # the 3rd session from --list
150
- receipts "billing" # first session whose title contains "billing"
151
- receipts --json # emit the Receipt object (in-toto Statement)
152
- receipts --json --compact # canonical (sorted, minified) JSON
153
- receipts --share # redacted, paste-ready Markdown summary
154
- receipts guardrails --last 5 # prevention rules for AGENTS.md from recent sessions
155
- receipts trends # cross-session digest: grades, recurring findings, cost
156
- receipts trends --last 20 # span a wider window (default 10)
157
- receipts pr # write THIS branch's receipt (branch-scoped) to .receipts/
158
- receipts verify <bundle> --transcript <t> # prove the receipt is faithful (L1)
159
- receipts diff <a> <b> # what changed between two receipts (deltas; --json)
160
- receipts log # list the committed receipts in .receipts/ (--last N)
161
- receipts stats # dogfooding scoreboard: how often it ran + what it caught
162
- receipts eval # flag-rate of the detectors over your real local sessions
163
- receipts badge [receipt] # shields.io endpoint JSON for a README/PR badge
164
- receipts sarif [receipt] # SARIF 2.1.0 for GitHub code-scanning (inline + Security tab)
165
- receipts prune # remove committed receipts for merged/deleted branches (--dry-run)
166
- receipts init # 1-command adopt: PR-check workflow + the repo-committed agent hook
167
- receipts install-hook # write the self-updating git pre-push hook into .git/hooks
168
- receipts setup-local # add the self-gated hook to YOUR ~/.claude/settings.json
169
- receipts rederive <t> # reproduce the canonical receipt from a transcript
170
- receipts mcp # start the MCP server (stdio) for IDEs/agents
171
- receipts --no-color # plain text (also honors NO_COLOR)
158
+ npx altimate-receipts # your most recent agent session
159
+ npx altimate-receipts --list # choose from recent sessions, any agent
172
160
  ```
173
161
 
174
- Most people only ever need the Report Card. The **Receipt** (`receipts --json`) is
175
- there when you want a portable, vendor-neutral record to feed tooling — an
176
- [in-toto](https://in-toto.io) Statement carrying the same deterministic evidence +
177
- findings ([schema](./schema/agent-execution-receipt-v1.json)).
178
-
179
- ## Status
180
-
181
- 🚧 **Early.** Working today across **Claude Code**, **Codex**, **Cursor**, and
182
- **OpenClaw**:
162
+ No install and no signup it reads transcripts that are already on your disk.
183
163
 
184
- - the deterministic **Report Card** (`receipts`) — the core,
185
- - **prevention rules** you can paste into `AGENTS.md` (`receipts guardrails`),
186
- - a cross-session **trends** digest — what your agent does wrong over time (`receipts trends`),
187
- - a portable **Receipt** (`receipts --json`) + a redacted **`--share`** summary,
188
- - an **MCP server** (`receipts mcp`) for IDEs/agents ([docs](./docs/mcp.md)).
189
-
190
- The findings engine is measured for fidelity — **precision 100%** on a 70-session
191
- labeled corpus, **1% flag rate** over 1,200 real local sessions (every flag adjudicated as
192
- a true positive; zero confirmed false positives) ([`docs/eval.md`](./docs/eval.md)).
193
- See the roadmap in [`specs/`](./specs).
194
-
195
- ## For teams (optional)
196
-
197
- Need to attach the Receipt to a PR or prove it in CI? A Receipt can be
198
- Sigstore-signed and posted as a **"Verified-by: Receipts"** check
199
- ([docs](./docs/verified-by.md)), and re-derived from its transcript to prove it
200
- wasn't hand-edited (`receipts verify --transcript`, [trust model](./docs/trust.md)).
201
- This is **opt-in** — the value is the Report Card; signing is there for when your
202
- org's compliance or settlement needs make it concrete.
203
-
204
- ## Add it to a repo (nothing to install)
205
-
206
- **Repo owner, once:**
164
+ ## Add it to your repo's PRs
207
165
 
208
166
  ```sh
209
167
  npx altimate-receipts init
210
168
  ```
211
169
 
212
- Commit the files it writes, merge the PR — done. From then on, any Claude Code session
213
- that pushes a branch attaches its receipt automatically, *before* the push, so it lands
214
- on the PR with no extra step.
170
+ Commit the files it writes and merge the PR. From then on, every agent session attaches
171
+ its Work Record to the PR automatically, before the push **contributors install
172
+ nothing**. The hook rides along with `git clone`, the CLI is fetched on demand, and
173
+ publishing a new release updates everyone. Every path is best-effort: a missing session
174
+ or an unreachable registry never blocks a push.
175
+
176
+ Humans pushing from a terminal, other agents, or repos that won't commit `.claude/`
177
+ config → **[the onboarding guide](./docs/onboarding.md)** has a one-command fallback for
178
+ each.
179
+
180
+ ## Going deeper (optional)
181
+
182
+ - **`receipts --json`** — a portable, vendor-neutral [in-toto](https://in-toto.io)
183
+ Receipt carrying the same evidence, for feeding other tooling
184
+ ([schema](./schema/agent-execution-receipt-v1.json)).
185
+ - **Sign it.** A Receipt can be Sigstore-signed and posted as a *"Verified by Receipts"*
186
+ PR check ([docs](./docs/verified-by.md)), then re-derived from its transcript to prove
187
+ it wasn't hand-edited ([trust model](./docs/trust.md)). Opt-in — for when compliance or
188
+ settlement needs make it concrete.
189
+ - **More surfaces.** `receipts guardrails` (prevention rules for `AGENTS.md`),
190
+ `receipts trends` (what your agent gets wrong over time), `receipts mcp` (for
191
+ IDEs/agents, [docs](./docs/mcp.md)), SARIF for code-scanning, badges. Run
192
+ `receipts --help`.
215
193
 
216
- **Contributors install nothing.** The hook travels with `git clone` (Claude Code asks
217
- once to approve it), the CLI is fetched by npx on demand, and publishing a new release
218
- updates everyone automatically. Every path is best-effort — a missing session or an
219
- unreachable registry never blocks a push.
194
+ ## How it works
220
195
 
221
- Humans pushing from a terminal, other coding agents, or repos that won't commit
222
- `.claude/` config: **[the onboarding guide](./docs/onboarding.md)** has a one-command
223
- fallback for each.
196
+ ```
197
+ the agent's own local transcript (JSONL / SQLite)
198
+ adapter normalize each agent's format to one session model (tool calls raw)
199
+ → spans edits · commands · reads · cost · destructive ops
200
+ → detectors deterministic, file:line-cited findings
201
+ → Work Record the human-readable account (default), or a signed --json Receipt
202
+ ```
224
203
 
225
- ## Dogfooding
204
+ One pipeline, every agent, nothing leaving your machine. The full set of binding
205
+ constraints (deterministic, evidence-not-judgement, near-zero-FP, local-first) lives in
206
+ [`SPEC-0000`](./specs/SPEC-0000-product.md).
207
+
208
+ ## Status
226
209
 
227
- Receipts audits its own development: open this repo in your coding agent and
228
- `receipts pr` attaches a **branch-scoped** Receipt to each PR (a pre-push hook and
229
- [`AGENTS.md`](./AGENTS.md) automate it).
210
+ 🚧 **Early, and working today** across Claude Code, Codex, Cursor, and OpenClaw. The
211
+ detection engine is measured for fidelity (see trust, above); the roadmap lives in
212
+ [`specs/`](./specs). Receipts audits its own development — every PR in this repo carries
213
+ its own Work Record.
230
214
 
231
- ## How we build: spec-driven
215
+ ## How we build
232
216
 
233
- Every change starts from a spec in [`specs/`](./specs) (use
234
- [`specs/TEMPLATE.md`](./specs/TEMPLATE.md)). The product vision is
235
- [`SPEC-0000`](./specs/SPEC-0000-product.md). See [CONTRIBUTING.md](./CONTRIBUTING.md).
217
+ Spec-driven: every change starts from a spec in [`specs/`](./specs)
218
+ ([template](./specs/TEMPLATE.md), [contributing](./CONTRIBUTING.md)).
236
219
 
237
220
  ## License
238
221