npm - altimate-receipts - Versions diffs - 0.8.0 → 0.10.0 - Mend

altimate-receipts 0.8.0 → 0.10.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (14) hide show

package/README.md +158 -175
package/dist/{chunk-SGVU3CGP.js → chunk-63E3RZHD.js} +283 -166
package/dist/chunk-63E3RZHD.js.map +1 -0
package/dist/{chunk-6QULLAVH.js → chunk-JE6HSACL.js} +2 -2
package/dist/{chunk-NPYC3NGR.js → chunk-KM6VCSVW.js} +2 -2
package/dist/chunk-KM6VCSVW.js.map +1 -0
package/dist/cli.js +489 -76
package/dist/cli.js.map +1 -1
package/dist/index.js +2 -2
package/dist/mcp/server.js +2 -2
package/package.json +4 -1
package/dist/chunk-NPYC3NGR.js.map +0 -1
package/dist/chunk-SGVU3CGP.js.map +0 -1
/package/dist/{chunk-6QULLAVH.js.map → chunk-JE6HSACL.js.map} +0 -0

package/README.md CHANGED Viewed

@@ -2,10 +2,9 @@
 # 🧾 receipts
-**Agent-work verification — not code review.**
+**See what your AI coding agent actually did — not just what it says it did.**
-A deterministic, cross-agent **Report Card** of what your coding agent *actually did* —
-read straight from the agent's own transcript, on your machine.
+Deterministic · local · reads the agent's own transcript · never grades your code.
 [![npm](https://img.shields.io/npm/v/altimate-receipts.svg)](https://www.npmjs.com/package/altimate-receipts)
 [![CI](https://github.com/AltimateAI/altimate-receipts/actions/workflows/ci.yml/badge.svg)](https://github.com/AltimateAI/altimate-receipts/actions/workflows/ci.yml)
@@ -14,225 +13,209 @@ read straight from the agent's own transcript, on your machine.
 </div>
 ```sh
-# 30-second start — no install, no account:
-npx altimate-receipts          # Report Card for your most recent agent session
+npx altimate-receipts          # no install, no account — reads your last agent session
 ```
-> **Adding it to a repo's PRs?** `npx altimate-receipts init` — one command, one PR,
-> and receipts attach themselves from then on. Contributors install nothing. See
-> [Add it to a repo](#add-it-to-a-repo-nothing-to-install).
+---
+## The problem
+Your agent writes the code now. You review the **diff** — but the diff is only the
+*result*. It can't show you the **work** behind it:
 ```text
+  the agent's work    ████████████████████████████   1 hour · 90 commands · 40 files
+  what you review     ███                             the final diff
+```
+The test that failed and got retried until green, the file quietly deleted, the check
+weakened to pass, the *"all done!"* that wasn't — it's all in the part you didn't read.
+## What receipts does
+It reads the transcript your agent already saved and shows you what it **actually did** —
+cited to the line, deterministic, on your machine.
+> Your tests say whether the code is **good**. Receipts says what the agent **did**.
+### On a pull request
+The comment receipts leaves on the PR — the events that deserve a human's eyes, then the
+full append-only record. No grade, no verdict:
+> ### Agent work record — claude-code · 16 files · spent ≈ $13.72 on this PR
+>
+> push 2 · `7289522` — +0 new · 0 cleared · 1 open · transcript covers 16/16 changed files
+>
+> **Sanity-check**
+> - [`src/cli.ts:699`](https://github.com/AltimateAI/altimate-receipts/pull/120/files) — **Silently swallowed an error in cli.ts** · error hidden · evidence `tool-1983-0`
+>
+> Other 15 files: nothing detected.
+>
+> <sub>tests: 564 passed _(parsed from runner output in transcript)_</sub>
+>
+> <details><summary>Record — pushes, files, checks, custody (append-only)</summary>
+>
+> | push | events | change cost |
+> | :-- | :-- | :-- |
+> | 1 · 2026-06-12 | +0 new · 0 cleared · 1 open | — |
+> | 2 · `7289522` · 2026-06-12 | +0 new · 0 cleared · 1 open | $13.72 |
+>
+> **Checks** _("not detected" means this check found nothing — not that nothing exists)_
+>
+> | check | reading |
+> | :-- | :-- |
+> | swallowed errors | 1 detected |
+> | destructive ops | not detected |
+> | CI/CD tampering | not detected |
+> | … 14 more checks | not detected |
+>
+> </details>
+>
+> <sub>a record of the agent's process, not a code review · entries are never edited or removed · re-derivable (L1) · 0 model calls</sub>
+_Real output — [the PR that built this very feature](https://github.com/AltimateAI/altimate-receipts/pull/120#issuecomment-4693301980), wearing its own record: the masthead carries the PR's attributed agent spend, the ledger shows per-commit cost, and the one flagged event is the author's own swallowed error._
+### In your terminal
+`npx altimate-receipts` prints the same record as a card:
+```text
   ╔══════════════════════════════════════════════════════════════════════════╗
-  ║ 🧾  RECEIPTS — Agent Report Card                        proof, not vibes ║
+  ║ 🧾  RECEIPTS — Agent Work Record                        proof, not vibes ║
   ╚══════════════════════════════════════════════════════════════════════════╝
-  Session  Add rate limiting to the billing service
-  Agent    claude-code · claude-opus-4-8
-  Scope    1h 4m · 412 msgs · 318 tools · 6M tok · $24.18
+  Session  Pull latest main branch
+  Agent    claude-code · claude-fable-5
+  Scope    44h 18m · 1k msgs · 407 tools · 342M tok · $713.90
-  ┌─ VERDICT ────────────────────────────────────────────────────────────────┐
-  │   F     ⛔ DO NOT MERGE WITHOUT REVIEW                                   │
-  │ 2 critical · 3 high · 1 medium                                           │
+  ┌─ RECORD ─────────────────────────────────────────────────────────────────┐
+  │ 3 critical · 4 high · 2 medium                                           │
   └──────────────────────────────────────────────────────────────────────────┘
   CRITICAL
-   ⛔ Destructive op: rm -rf ./dist ./build ×2
+   ⛔ Destructive op: git checkout -- .github/meta/commit.txt
+      data-loss risk
+   ⛔ Destructive op: git reset --hard origin/main
+      data-loss risk
+   ⛔ Destructive op: rm -rf /tmp/codex-gaps
       data-loss risk
-   ⛔ Modified the grader/harness: score.py
-      gamed the eval · score.py
   HIGH
+   ⚠️  Edited a file it never read: SPEC-0074-m75-store-ref-default.md
+      clobber risk · SPEC-0074-m75-store-ref-default.md
    ⚠️  Force-pushed over remote history
       history overwrite
-   ⚠️  Weakened the checker config: tsconfig.json
-      checker defanged · tsconfig.json
-   ⚠️  Stuck loop wasted $3.10 / 7m
-      $3.10 · 7m
+   ⚠️  Rewrote git history made earlier this session
+      history rewritten
+   ⚠️  Stuck loop wasted $8.75 / 3m 12s
+      $8.75 · 3m 12s
   MEDIUM
-   🔍 6 turns drove 57% of the $24.18 spend
-      $13.78
+   🔍 Bash failed 10× before succeeding
+      10 retries
+   🔍 Edit failed 4× before succeeding
+      4 retries
+     ▸ 1 minor (collapsed)
   EVIDENCE
-   58 files changed · 146 edits · 92 commands · tests ran ✓ · 4 destructive ops · cache 0%
+   21 files changed · 55 edits · 308 commands · tests ran ✓ · 13 destructive ops · cache 100%
   ──────────────────────────────────────────────────────────────────────────
   ✅ Verified by Receipts  ·  deterministic  ·  0 model calls  ·  evidence, not judgement
   what it did — not whether it's correct. your tests are the oracle for success.
 ```
-> _Illustrative output. Run `receipts` on your own Claude Code sessions to see the real card._
+> _Also real, reproduced verbatim: a 44-hour receipts development session, graded by the
+> tool it was building — the destructive ops, the force-push, and the $8.75 stuck loop are
+> all the author's own. A clean session reads `nothing detected` (not a ✅ pass), because
+> "not detected" is a fact about what was checked, never a verdict on your code._
----
-## The problem
-Agents now write most of the code. It is no longer humanly possible to read every line
-of the agent's **output** — the diff — and a long session hides far more than a diff
-shows: commands run, files touched then reverted, tests that "passed," destructive ops,
-quietly weakened checkers.
-So review the agent's **work**, not its output. Instead of re-reading every line, you
-read a faithful, deterministic account of what the agent *actually did* — and spend your
-attention where the account flags something.
+## What it catches that the diff and green CI don't
-This is a new category: **agent-work verification.** It is **not** code review and never
-grades code quality. Your tests remain the oracle for correctness; Receipts is the
-oracle for *what happened*.
+- **"Tests pass" — did they?** The run that printed `FAILED` right before the agent declared success.
+- **Claimed vs. actually done.** "Committed and pushed," "added the validation" — checked against the trace.
+- **Destructive ops.** `rm -rf`, force-pushes, history rewrites.
+- **Gamed checks.** A weakened linter or `tsconfig`, an edited grader or test assertion.
+- **Quiet churn.** Files edited then reverted; loops that burned spend with nothing to show.
-## What a Receipt is
-**receipts** reads the agent's own local transcript and prints a deterministic,
-`file:line`-cited account of what the agent did:
-- **What changed** — files, edits, the actual bodies.
-- **What ran** — commands, and whether the tests actually ran.
-- **What it touched that it shouldn't have** — destructive ops, history rewrites,
-  test/eval tampering, and other research-backed reward-hacking patterns.
-- **Claims vs. evidence** — what the agent *said* it did, checked against the transcript.
-Every finding is `file:line`-cited or it doesn't ship. Nothing leaves your machine.
-→ **[What problems Receipts solves](./docs/problems.md)** — the before/after for each
-capability, shipped and planned.
+Every finding is cited to a line in the transcript, or it doesn't ship.
 ## Why you can trust it
-The whole pitch collapses if the account itself is wrong, so trust is engineered in:
-- **Deterministic — zero model calls.** Findings come from regex/heuristic rules over
-  the transcript. Same input → same output. There is no LLM in the product path to
-  hallucinate.
-- **Near-zero false positives, now *measured*.** A single false alarm and you go back to
-  reading diffs, so this is treated as existential and tested, not asserted: **precision
-  100% on a 70-session labeled corpus**, and a **1% flag rate over 1,200 real local
-  sessions** in the out-of-sample field scan — every flag manually adjudicated as a true
-  positive on genuine work, **zero confirmed false positives**. See
-  [`docs/eval.md`](./docs/eval.md).
-- **Evidence, not judgement.** Receipts reports what an agent *did*, never whether the
-  code is *good*.
-- **Local-first.** The CLI runs entirely on your machine — no account, no upload, no
-  telemetry. (The optional CI Action can emit *anonymous, aggregate* counts only; off by
-  default in releases. See [telemetry](./docs/telemetry.md).)
-## How it works
-```
-the agent's own local transcript (raw JSONL / SQLite)
-  → adapter (per agent)   normalize to one session model, tool calls passed raw
-  → spans                 derive edits / commands / reads / cost / destructive ops
-  → findings              deterministic detectors → file:line-cited findings
-  → Report Card           human-readable, severity-grouped (default output)
-  → Receipt (--json)      portable, schema-validated object; optionally signed
-```
-Receipts reads the transcript your agent already writes to disk — it doesn't sit between
-you and the model, doesn't need an API key, and doesn't phone home. One Report Card
-spans **Claude Code, Codex, Cursor, and OpenClaw**; detectors are agent-agnostic.
-## Invariants
+- **Deterministic — no LLM in the path.** Findings come from rules over the transcript.
+  Same session in, same account out. There is nothing to hallucinate.
+- **It never grades your code.** It reports *what happened*, never *whether it's good* —
+  no letter grade, no "do not merge." Your tests stay the judge of correctness.
+- **Near-zero false positives — measured, not claimed.** 100% precision on a 70-session
+  labeled corpus; a 1% flag rate across 1,200 real local sessions, every flag adjudicated
+  as a true positive, zero confirmed false positives ([`docs/eval.md`](./docs/eval.md)).
+- **Local-first.** Runs entirely on your machine. No account, no upload, no telemetry.
+- **Works with your agent.** Claude Code, Codex, Cursor, OpenClaw — one tool, one format.
-These are binding constraints, not aspirations (full set in
-[`SPEC-0000`](./specs/SPEC-0000-product.md)):
-- **Deterministic — zero model calls in the product path.** (R1)
-- **Evidence, not judgement** — never grades code quality. (R2)
-- **Near-zero false positives, measured** — see [`docs/eval.md`](./docs/eval.md). (R3)
-- **Local-first** — no upload; the opt-in CI Action emits only anonymous aggregate
-  telemetry. (R4)
-## Usage
+## Try it (10 seconds)
 ```sh
-receipts                  # Report Card for your most recent session (any agent)
-receipts --list           # list recent sessions across all agents
-receipts --agent codex    # limit to one agent: claude-code | codex | cursor | openclaw
-receipts 3                # the 3rd session from --list
-receipts "billing"        # first session whose title contains "billing"
-receipts --json           # emit the Receipt object (in-toto Statement)
-receipts --json --compact # canonical (sorted, minified) JSON
-receipts --share          # redacted, paste-ready Markdown summary
-receipts guardrails --last 5  # prevention rules for AGENTS.md from recent sessions
-receipts trends           # cross-session digest: grades, recurring findings, cost
-receipts trends --last 20 # span a wider window (default 10)
-receipts pr               # write THIS branch's receipt (branch-scoped) to .receipts/
-receipts verify <bundle> --transcript <t>   # prove the receipt is faithful (L1)
-receipts diff <a> <b>     # what changed between two receipts (deltas; --json)
-receipts log              # list the committed receipts in .receipts/ (--last N)
-receipts stats            # dogfooding scoreboard: how often it ran + what it caught
-receipts eval             # flag-rate of the detectors over your real local sessions
-receipts badge [receipt]  # shields.io endpoint JSON for a README/PR badge
-receipts sarif [receipt]  # SARIF 2.1.0 for GitHub code-scanning (inline + Security tab)
-receipts prune            # remove committed receipts for merged/deleted branches (--dry-run)
-receipts init             # 1-command adopt: PR-check workflow + the repo-committed agent hook
-receipts install-hook     # write the self-updating git pre-push hook into .git/hooks
-receipts setup-local      # add the self-gated hook to YOUR ~/.claude/settings.json
-receipts rederive <t>     # reproduce the canonical receipt from a transcript
-receipts mcp              # start the MCP server (stdio) for IDEs/agents
-receipts --no-color       # plain text (also honors NO_COLOR)
+npx altimate-receipts            # your most recent agent session
+npx altimate-receipts --list     # choose from recent sessions, any agent
 ```
-Most people only ever need the Report Card. The **Receipt** (`receipts --json`) is
-there when you want a portable, vendor-neutral record to feed tooling — an
-[in-toto](https://in-toto.io) Statement carrying the same deterministic evidence +
-findings ([schema](./schema/agent-execution-receipt-v1.json)).
-## Status
-🚧 **Early.** Working today across **Claude Code**, **Codex**, **Cursor**, and
-**OpenClaw**:
+No install and no signup — it reads transcripts that are already on your disk.
-- the deterministic **Report Card** (`receipts`) — the core,
-- **prevention rules** you can paste into `AGENTS.md` (`receipts guardrails`),
-- a cross-session **trends** digest — what your agent does wrong over time (`receipts trends`),
-- a portable **Receipt** (`receipts --json`) + a redacted **`--share`** summary,
-- an **MCP server** (`receipts mcp`) for IDEs/agents ([docs](./docs/mcp.md)).
-The findings engine is measured for fidelity — **precision 100%** on a 70-session
-labeled corpus, **1% flag rate** over 1,200 real local sessions (every flag adjudicated as
-a true positive; zero confirmed false positives) ([`docs/eval.md`](./docs/eval.md)).
-See the roadmap in [`specs/`](./specs).
-## For teams (optional)
-Need to attach the Receipt to a PR or prove it in CI? A Receipt can be
-Sigstore-signed and posted as a **"Verified-by: Receipts"** check
-([docs](./docs/verified-by.md)), and re-derived from its transcript to prove it
-wasn't hand-edited (`receipts verify --transcript`, [trust model](./docs/trust.md)).
-This is **opt-in** — the value is the Report Card; signing is there for when your
-org's compliance or settlement needs make it concrete.
-## Add it to a repo (nothing to install)
-**Repo owner, once:**
+## Add it to your repo's PRs
 ```sh
 npx altimate-receipts init
 ```
-Commit the files it writes, merge the PR — done. From then on, any Claude Code session
-that pushes a branch attaches its receipt automatically, *before* the push, so it lands
-on the PR with no extra step.
+Commit the files it writes and merge the PR. From then on, every agent session attaches
+its Work Record to the PR automatically, before the push — **contributors install
+nothing**. The hook rides along with `git clone`, the CLI is fetched on demand, and
+publishing a new release updates everyone. Every path is best-effort: a missing session
+or an unreachable registry never blocks a push.
+Humans pushing from a terminal, other agents, or repos that won't commit `.claude/`
+config → **[the onboarding guide](./docs/onboarding.md)** has a one-command fallback for
+each.
+## Going deeper (optional)
+- **`receipts --json`** — a portable, vendor-neutral [in-toto](https://in-toto.io)
+  Receipt carrying the same evidence, for feeding other tooling
+  ([schema](./schema/agent-execution-receipt-v1.json)).
+- **Sign it.** A Receipt can be Sigstore-signed and posted as a *"Verified by Receipts"*
+  PR check ([docs](./docs/verified-by.md)), then re-derived from its transcript to prove
+  it wasn't hand-edited ([trust model](./docs/trust.md)). Opt-in — for when compliance or
+  settlement needs make it concrete.
+- **More surfaces.** `receipts guardrails` (prevention rules for `AGENTS.md`),
+  `receipts trends` (what your agent gets wrong over time), `receipts mcp` (for
+  IDEs/agents, [docs](./docs/mcp.md)), SARIF for code-scanning, badges. Run
+  `receipts --help`.
-**Contributors install nothing.** The hook travels with `git clone` (Claude Code asks
-once to approve it), the CLI is fetched by npx on demand, and publishing a new release
-updates everyone automatically. Every path is best-effort — a missing session or an
-unreachable registry never blocks a push.
+## How it works
-Humans pushing from a terminal, other coding agents, or repos that won't commit
-`.claude/` config: → **[the onboarding guide](./docs/onboarding.md)** has a one-command
-fallback for each.
+```
+the agent's own local transcript (JSONL / SQLite)
+  → adapter        normalize each agent's format to one session model (tool calls raw)
+  → spans          edits · commands · reads · cost · destructive ops
+  → detectors      deterministic, file:line-cited findings
+  → Work Record    the human-readable account (default), or a signed --json Receipt
+```
-## Dogfooding
+One pipeline, every agent, nothing leaving your machine. The full set of binding
+constraints (deterministic, evidence-not-judgement, near-zero-FP, local-first) lives in
+[`SPEC-0000`](./specs/SPEC-0000-product.md).
+## Status
-Receipts audits its own development: open this repo in your coding agent and
-`receipts pr` attaches a **branch-scoped** Receipt to each PR (a pre-push hook and
-[`AGENTS.md`](./AGENTS.md) automate it).
+🚧 **Early, and working today** across Claude Code, Codex, Cursor, and OpenClaw. The
+detection engine is measured for fidelity (see trust, above); the roadmap lives in
+[`specs/`](./specs). Receipts audits its own development — every PR in this repo carries
+its own Work Record.
-## How we build: spec-driven
+## How we build
-Every change starts from a spec in [`specs/`](./specs) (use
-[`specs/TEMPLATE.md`](./specs/TEMPLATE.md)). The product vision is
-[`SPEC-0000`](./specs/SPEC-0000-product.md). See [CONTRIBUTING.md](./CONTRIBUTING.md).
+Spec-driven: every change starts from a spec in [`specs/`](./specs)
+([template](./specs/TEMPLATE.md), [contributing](./CONTRIBUTING.md)).
 ## License