leerness 1.32.0 → 1.33.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (6)
  1. package/CHANGELOG.md +14080 -14012
  2. package/README.ko.md +195 -187
  3. package/README.md +113 -105
  4. package/bin/leerness.js +20650 -20637
  5. package/package.json +58 -58
  6. package/scripts/e2e.js +6679 -6612
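To reproduce this comparison locally, here is a minimal POSIX shell sketch. It assumes npm plus GNU diffutils (`--strip-trailing-cr` is a GNU `diff` option). `classify_change` is a hypothetical helper written for this note and is not part of leerness. In the first README.md hunk below, every line of the old top section is removed and re-added with visually identical text. A CRLF/LF line-ending change is one possible cause, but that is an assumption, and this helper can test it.

```shell
#!/bin/sh
# Hypothetical helper: report whether two files are identical, differ only
# in line endings (CRLF vs LF), or differ in actual content.
# Assumes GNU diff, which provides --strip-trailing-cr.
classify_change() {
  if cmp -s "$1" "$2"; then
    echo "identical"
  elif diff -q --strip-trailing-cr "$1" "$2" >/dev/null; then
    echo "line endings only"
  else
    echo "content changed"
  fi
}

# Example usage (downloads the published tarballs; requires network access):
#   npm pack leerness@1.32.0 leerness@1.33.0   # writes leerness-<version>.tgz
#   mkdir -p old new
#   tar -xzf leerness-1.32.0.tgz -C old        # npm tarballs unpack into package/
#   tar -xzf leerness-1.33.0.tgz -C new
#   classify_change old/package/README.md new/package/README.md
#   diff -u --strip-trailing-cr old/package/README.md new/package/README.md
```

For README.md as a whole, the helper should print `content changed`, because 1.33.0 adds the Maturity section and bumps the version strings. The second `diff` in the usage comments is the more telling check. If it shows only those edits, the remaining churn was line endings. If it still lists the whole top section, something else (for example trailing whitespace) differs.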
package/README.md CHANGED
@@ -1,110 +1,118 @@
- # leerness
-
- ```
- ██╗ ███████╗███████╗██████╗ ███╗ ██╗███████╗███████╗
- ██║ ██╔════╝██╔════╝██╔══██╗████╗ ██║██╔════╝██╔════╝
- ██║ █████╗ █████╗ ██████╔╝██╔██╗ ██║█████╗ ███████╗
- ██║ ██╔══╝ ██╔══╝ ██╔══██╗██║╚██╗██║██╔══╝ ╚════██║
- ███████╗███████╗███████╗██║ ██║██║ ╚████║███████╗███████║
- ╚══════╝╚══════╝╚══════╝╚═╝ ╚═╝╚═╝ ╚═══╝╚══════╝╚══════╝
- ```
-
- > **The AI-coding operations layer that makes "done" require evidence — for any language, any AI agent.**
- > leerness does not write code. It gives your AI agent persistent memory, verified completion, and clean handoffs — stored inside your repo as plain files, exposed via CLI + MCP.
-
- [![npm](https://img.shields.io/npm/v/leerness)](https://www.npmjs.com/package/leerness) · ![MCP tools](https://img.shields.io/badge/MCP--tools-85-blue) · **0 runtime deps** · **0 install scripts** · offline-first · Node ≥ 18 · MIT
-
- **🇰🇷 Full Korean version: [README.ko.md](./README.ko.md)**
-
- ---
-
- ## Try it in 30 seconds
-
- ```bash
- npx -y leerness init . --yes # adds .harness/ memory + guard files to your project
- npx leerness handoff . # everything your AI should know right now, in one call
- ```
-
- Your project now has agent-independent memory. To see the flagship feature — catching a false "done" claim:
-
- ```bash
- npx leerness task add "Implement payment API" # prints the new id, e.g. T-0002 — use it below
- npx leerness task update T-0002 --status done --evidence "payment.js implemented + tested"
- npx leerness verify-claim T-0002 # exit 1 — payment.js does not exist. Claim rejected.
- ```
-
- Now actually write `payment.js`, then run the **same** `verify-claim T-0002` → it exits 0. That is the whole idea: **"done" must match reality.**
-
- > Tip: if your evidence claims a specific test count (e.g. "5 tests passed"), leerness measures the real count and rejects a mismatch — so claim only what's true, or add `--run-tests --test-cmd "<your test cmd>"` to verify by running them.
-
- > Want a smaller footprint? `leerness init . --minimal` installs only the core memory + verification files instead of the full set.
-
- ---
-
- ## No terminal? Let your AI run it
-
- You never have to type a command yourself. Paste this into Claude Code, Cursor, Codex, or any coding agent:
-
- > Set up leerness in this project by running `npx -y leerness init . --yes`. From now on, run `leerness handoff .` at the start of every session, verify finished work with `leerness verify-claim`, and run `leerness session close .` before you finish.
-
- The agent installs and operates it for you — `leerness init` also writes the instructions into CLAUDE.md / AGENTS.md so future sessions pick them up automatically.
-
- Prefer pure natural language? leerness ships an **MCP server with 85 tools** (`leerness mcp serve`). Connect it once to Claude Desktop / Claude Code and just ask: *"what was I working on?"*, *"did the AI actually finish T-0001?"*
-
- ---
-
- ## Claude and Codex already have memory. Why leerness?
-
- Built-in harnesses remember what the AI **said**. leerness verifies what the AI **did** — and keeps working when you switch agents.
-
- | | Built-in (CLAUDE.md, agent memory) | leerness |
- |---|---|---|
- | Memory | per-agent, free-form notes | structured tasks / decisions / lessons / rules — agent-independent files in your repo |
- | "Done" claims | trusted as written | **evidence-gated**: claimed files, test counts, and run output are checked against reality — bluffs exit 1 |
- | Switching agents (Claude → Codex → Cursor) | context lost | same `.harness/` state, same one-call handoff |
- | Secrets · encoding · drift guards | none | `scan secrets` · `encoding check` · `drift check --auto-fix` — CI-ready |
- | Lock-in | one vendor | any agent, any language, 0 runtime dependencies |
-
- This positioning was verified by **independent clean-room evaluations** — fresh `npm install` into temp dirs, driven by behavior only, including adversarial attacks against the verifier itself (fake tests, comment-only stubs, inflated test counts — all rejected). Methodology, results, and honest limitations: **[docs/clean-room-evaluations.md](./docs/clean-room-evaluations.md)**.
-
- ---
-
- ## Guidance vs enforcement (be honest about this)
-
- By default leerness is **cooperative**: your AI agent runs the commands because CLAUDE.md / AGENTS.md tell it to. A determined agent could skip them. To make verification **enforced**, not optional:
-
- ```bash
- leerness ci init # writes .github/workflows/leerness-gate.yml — runs `leerness gate` on every PR
- ```
-
- Then make that check **required** in GitHub branch protection. Now a PR that skips verification (or whose claims fail) **cannot merge** — the gate runs independently of the agent, returns a non-zero exit code, and blocks. That is the difference between a guideline and a guardrail.
-
- ---
-
- ## What is inside (the 60-second tour)
-
- - **Memory** — `task` / `plan` / `decision` / `lesson` / `rule`: canonical JSON + markdown projections, archive/restore.
- - **Handoff** — `handoff` (session start context) · `session close` (closing report). Survives agent swaps.
- - **Verification** — `verify-claim` (evidence vs reality, stub/fake-test/inflated-count detection, `--run-tests --test-cmd` for any language) · `contract verify` (spec ↔ impl) · `gate` (one-call CI gate).
- - **Audit** — `audit` · `lazy detect` · `drift check` keep the workspace honest over time.
- - **Security** — `scan secrets` (committed-secret detection) · `encoding check` (BOM/CP949) — also runs at `session close`.
-
- Full command reference, workflows, and architecture: **[README.ko.md](./README.ko.md)** (Korean) · `leerness commands` · `leerness help`.
-
- ## Links
-
- - npm: https://www.npmjs.com/package/leerness
- - Site & release videos: https://leerness.pages.dev
- - Changelog: [CHANGELOG.md](./CHANGELOG.md)
-
- ## License
-
+ # leerness
+
+ ```
+ ██╗ ███████╗███████╗██████╗ ███╗ ██╗███████╗███████╗
+ ██║ ██╔════╝██╔════╝██╔══██╗████╗ ██║██╔════╝██╔════╝
+ ██║ █████╗ █████╗ ██████╔╝██╔██╗ ██║█████╗ ███████╗
+ ██║ ██╔══╝ ██╔══╝ ██╔══██╗██║╚██╗██║██╔══╝ ╚════██║
+ ███████╗███████╗███████╗██║ ██║██║ ╚████║███████╗███████║
+ ╚══════╝╚══════╝╚══════╝╚═╝ ╚═╝╚═╝ ╚═══╝╚══════╝╚══════╝
+ ```
+
+ > **The AI-coding operations layer that makes "done" require evidence — for any language, any AI agent.**
+ > leerness does not write code. It gives your AI agent persistent memory, verified completion, and clean handoffs — stored inside your repo as plain files, exposed via CLI + MCP.
+
+ [![npm](https://img.shields.io/npm/v/leerness)](https://www.npmjs.com/package/leerness) · ![MCP tools](https://img.shields.io/badge/MCP--tools-85-blue) · **0 runtime deps** · **0 install scripts** · offline-first · Node ≥ 18 · MIT
+
+ **🇰🇷 Full Korean version: [README.ko.md](./README.ko.md)**
+
+ ---
+
+ ## Try it in 30 seconds
+
+ ```bash
+ npx -y leerness init . --yes # adds .harness/ memory + guard files to your project
+ npx leerness handoff . # everything your AI should know right now, in one call
+ ```
+
+ Your project now has agent-independent memory. To see the flagship feature — catching a false "done" claim:
+
+ ```bash
+ npx leerness task add "Implement payment API" # prints the new id, e.g. T-0002 — use it below
+ npx leerness task update T-0002 --status done --evidence "payment.js implemented + tested"
+ npx leerness verify-claim T-0002 # exit 1 — payment.js does not exist. Claim rejected.
+ ```
+
+ Now actually write `payment.js`, then run the **same** `verify-claim T-0002` → it exits 0. That is the whole idea: **"done" must match reality.**
+
+ > Tip: if your evidence claims a specific test count (e.g. "5 tests passed"), leerness measures the real count and rejects a mismatch — so claim only what's true, or add `--run-tests --test-cmd "<your test cmd>"` to verify by running them.
+
+ > Want a smaller footprint? `leerness init . --minimal` installs only the core memory + verification files instead of the full set.
+
+ ---
+
+ ## No terminal? Let your AI run it
+
+ You never have to type a command yourself. Paste this into Claude Code, Cursor, Codex, or any coding agent:
+
+ > Set up leerness in this project by running `npx -y leerness init . --yes`. From now on, run `leerness handoff .` at the start of every session, verify finished work with `leerness verify-claim`, and run `leerness session close .` before you finish.
+
+ The agent installs and operates it for you — `leerness init` also writes the instructions into CLAUDE.md / AGENTS.md so future sessions pick them up automatically.
+
+ Prefer pure natural language? leerness ships an **MCP server with 85 tools** (`leerness mcp serve`). Connect it once to Claude Desktop / Claude Code and just ask: *"what was I working on?"*, *"did the AI actually finish T-0001?"*
+
+ ---
+
+ ## Claude and Codex already have memory. Why leerness?
+
+ Built-in harnesses remember what the AI **said**. leerness verifies what the AI **did** — and keeps working when you switch agents.
+
+ | | Built-in (CLAUDE.md, agent memory) | leerness |
+ |---|---|---|
+ | Memory | per-agent, free-form notes | structured tasks / decisions / lessons / rules — agent-independent files in your repo |
+ | "Done" claims | trusted as written | **evidence-gated**: claimed files, test counts, and run output are checked against reality — bluffs exit 1 |
+ | Switching agents (Claude → Codex → Cursor) | context lost | same `.harness/` state, same one-call handoff |
+ | Secrets · encoding · drift guards | none | `scan secrets` · `encoding check` · `drift check --auto-fix` — CI-ready |
+ | Lock-in | one vendor | any agent, any language, 0 runtime dependencies |
+
+ This positioning is checked by **self-administered clean-room evaluations** — AI agents do a fresh `npm install` into temp dirs and drive it by behavior only, including adversarial attacks against the verifier itself (fake tests, comment-only stubs, inflated test counts — all rejected). To be clear: these are *AI* clean-room runs, **not third-party human audits or peer review** — they make the claim *checkable* rather than a marketing line. Methodology, results, and honest limitations: **[docs/clean-room-evaluations.md](./docs/clean-room-evaluations.md)**.
+
+ ---
+
+ ## Guidance vs enforcement (be honest about this)
+
+ By default leerness is **cooperative**: your AI agent runs the commands because CLAUDE.md / AGENTS.md tell it to. A determined agent could skip them. To make verification **enforced**, not optional:
+
+ ```bash
+ leerness ci init # writes .github/workflows/leerness-gate.yml — runs `leerness gate` on every PR
+ ```
+
+ Then make that check **required** in GitHub branch protection. Now a PR that skips verification (or whose claims fail) **cannot merge** — the gate runs independently of the agent, returns a non-zero exit code, and blocks. That is the difference between a guideline and a guardrail.
+
+ ---
+
+ ## Maturity and why trying it is still cheap
+
+ Be honest with yourself before you depend on this: leerness is **early and largely solo-maintained**, developed mostly through autonomous AI rounds so its own `selftest` + e2e suites are the primary quality signal, and external adoption is still small. Don't make it load-bearing on faith: **pin a version**, and treat the differentiated slice — `verify-claim` + the CI `gate` as a required check — as the part worth relying on.
+
+ The asymmetry is what makes a trial reasonable anyway: MIT, **0 runtime dependencies**, offline-first, and all state is plain files in *your* repo. Lock-in is near zero — if it doesn't earn its place, remove the tool and your `task`/`decision`/`lesson` files stay. (For secret scanning specifically, mature dedicated tools like gitleaks/trufflehog exist — use those if you need a hard guarantee; leerness's `scan secrets` is a convenience guard, not a replacement.)
+
+ ---
+
+ ## What is inside (the 60-second tour)
+
+ - **Memory** — `task` / `plan` / `decision` / `lesson` / `rule`: canonical JSON + markdown projections, archive/restore.
+ - **Handoff** — `handoff` (session start context) · `session close` (closing report). Survives agent swaps.
+ - **Verification** — `verify-claim` (evidence vs reality, stub/fake-test/inflated-count detection, `--run-tests --test-cmd` for any language) · `contract verify` (spec ↔ impl) · `gate` (one-call CI gate).
+ - **Audit** — `audit` · `lazy detect` · `drift check` keep the workspace honest over time.
+ - **Security** — `scan secrets` (committed-secret detection) · `encoding check` (BOM/CP949) — also runs at `session close`.
+
+ Full command reference, workflows, and architecture: **[README.ko.md](./README.ko.md)** (Korean) · `leerness commands` · `leerness help`.
+
+ ## Links
+
+ - npm: https://www.npmjs.com/package/leerness
+ - Site & release videos: https://leerness.pages.dev
+ - Changelog: [CHANGELOG.md](./CHANGELOG.md)
+
+ ## License
+
  MIT
 
  <!-- leerness:project-readme:start -->
  ## Leerness Project Harness
 
- This project uses the Leerness v1.32.0 harness. AI agents must load context with `leerness handoff` before starting work, and run `leerness check`/`leerness audit`/`leerness session close` after finishing it.
+ This project uses the Leerness v1.33.0 harness. AI agents must load context with `leerness handoff` before starting work, and run `leerness check`/`leerness audit`/`leerness session close` after finishing it.
 
  ### Identity — AI agent operations layer (UR-0030)
 
@@ -158,7 +166,7 @@ leerness memory restore decision <date|title>
 
  ### MCP server (external AI integration)
 
- Leerness v1.32.0 embeds a stdio JSON-RPC MCP server, exposing **85 tools** to external AIs such as Claude Code · Cursor · Codex CLI:
+ Leerness v1.33.0 embeds a stdio JSON-RPC MCP server, exposing **85 tools** to external AIs such as Claude Code · Cursor · Codex CLI:
 
  ```jsonc
  // by category
@@ -179,7 +187,7 @@ Leerness v1.32.0 embeds a stdio JSON-RPC MCP server — Claude Code
  Send only the `<<autonomous-loop-dynamic>>` signal and the AI will:
  1) select the next round's candidate → 2) change the code → 3) write new stress-v* tests + run cumulative regression → 4) e2e 219/219 → 5) npm pack + git tag + GitHub release → 6) auto-push to main (1.9.140+) → 7) session close → 8) schedule the next round.
 
- Cumulative so far: **70 rounds (1.9.40 → 1.32.0)** · a GitHub release/tag is created every round · _reports/ is kept private.
+ Cumulative so far: **70 rounds (1.9.40 → 1.33.0)** · a GitHub release/tag is created every round · _reports/ is kept private.
 
  ### Performance guide (measured on 1.9.140)
 
@@ -217,6 +225,6 @@ leerness release pack --close --auto-main-push
  - `.harness/session-handoff.md`: handoff for the next session (written automatically)
  - `.harness/lessons.md` / `decisions.md` / `rules.md`: persistent memory (5 surfaces)
 
- Last synced by Leerness v1.32.0: 2026-06-16
+ Last synced by Leerness v1.33.0: 2026-06-17
  <!-- leerness:project-readme:end -->