leerness 1.32.0 → 1.34.0
- package/CHANGELOG.md +14152 -14012
- package/README.ko.md +195 -187
- package/README.md +116 -106
- package/bin/leerness.js +20747 -20637
- package/lib/mcp-tools.js +1 -0
- package/package.json +58 -58
- package/scripts/e2e.js +6782 -6612
package/README.md
CHANGED
|
@@ -1,110 +1,120 @@
|
|
|
1
|
-
# leerness
|
|
2
|
-
|
|
3
|
-
```
|
|
4
|
-
██╗ ███████╗███████╗██████╗ ███╗ ██╗███████╗███████╗
|
|
5
|
-
██║ ██╔════╝██╔════╝██╔══██╗████╗ ██║██╔════╝██╔════╝
|
|
6
|
-
██║ █████╗ █████╗ ██████╔╝██╔██╗ ██║█████╗ ███████╗
|
|
7
|
-
██║ ██╔══╝ ██╔══╝ ██╔══██╗██║╚██╗██║██╔══╝ ╚════██║
|
|
8
|
-
███████╗███████╗███████╗██║ ██║██║ ╚████║███████╗███████║
|
|
9
|
-
╚══════╝╚══════╝╚══════╝╚═╝ ╚═╝╚═╝ ╚═══╝╚══════╝╚══════╝
|
|
10
|
-
```
|
|
11
|
-
|
|
12
|
-
> **The AI-coding operations layer that makes "done" require evidence — for any language, any AI agent.**
|
|
13
|
-
> leerness does not write code. It gives your AI agent persistent memory, verified completion, and clean handoffs — stored inside your repo as plain files, exposed via CLI + MCP.
|
|
14
|
-
|
|
15
|
-
[](https://www.npmjs.com/package/leerness) · **
|
|
18
|
-
|
|
19
|
-
---
|
|
20
|
-
|
|
21
|
-
## Try it in 30 seconds
|
|
22
|
-
|
|
23
|
-
```bash
|
|
24
|
-
npx -y leerness init . --yes # adds .harness/ memory + guard files to your project
|
|
25
|
-
npx leerness handoff . # everything your AI should know right now, in one call
|
|
26
|
-
```
|
|
27
|
-
|
|
28
|
-
Your project now has agent-independent memory. To see the flagship feature — catching a false "done" claim:
|
|
29
|
-
|
|
30
|
-
```bash
|
|
31
|
-
npx leerness task add "Implement payment API" # prints the new id, e.g. T-0002 — use it below
|
|
32
|
-
npx leerness task update T-0002 --status done --evidence "payment.js implemented + tested"
|
|
33
|
-
npx leerness verify-claim T-0002 # exit 1 — payment.js does not exist. Claim rejected.
|
|
34
|
-
```
|
|
35
|
-
|
|
36
|
-
Now actually write `payment.js`, then run the **same** `verify-claim T-0002` → it exits 0. That is the whole idea: **"done" must match reality.**
|
|
37
|
-
|
|
38
|
-
> Tip: if your evidence claims a specific test count (e.g. "5 tests passed"), leerness measures the real count and rejects a mismatch — so claim only what's true, or add `--run-tests --test-cmd "<your test cmd>"` to verify by running them.
|
|
39
|
-
|
|
40
|
-
> Want a smaller footprint? `leerness init . --minimal` installs only the core memory + verification files instead of the full set.
|
|
41
|
-
|
|
42
|
-
---
|
|
43
|
-
|
|
44
|
-
## No terminal? Let your AI run it
|
|
45
|
-
|
|
46
|
-
You never have to type a command yourself. Paste this into Claude Code, Cursor, Codex, or any coding agent:
|
|
47
|
-
|
|
48
|
-
> Set up leerness in this project by running `npx -y leerness init . --yes`. From now on, run `leerness handoff .` at the start of every session, verify finished work with `leerness verify-claim`, and run `leerness session close .` before you finish.
|
|
49
|
-
|
|
50
|
-
The agent installs and operates it for you — `leerness init` also writes the instructions into CLAUDE.md / AGENTS.md so future sessions pick them up automatically.
|
|
51
|
-
|
|
52
|
-
Prefer pure natural language? leerness ships an **MCP server with
|
|
53
|
-
|
|
54
|
-
---
|
|
55
|
-
|
|
56
|
-
## Claude and Codex already have memory. Why leerness?
|
|
57
|
-
|
|
58
|
-
Built-in harnesses remember what the AI **said**. leerness verifies what the AI **did** — and keeps working when you switch agents.
|
|
59
|
-
|
|
60
|
-
| | Built-in (CLAUDE.md, agent memory) | leerness |
|
|
61
|
-
|---|---|---|
|
|
62
|
-
| Memory | per-agent, free-form notes | structured tasks / decisions / lessons / rules — agent-independent files in your repo |
|
|
63
|
-
| "Done" claims | trusted as written | **evidence-gated**: claimed files, test counts, and run output are checked against reality — bluffs exit 1 |
|
|
64
|
-
| Switching agents (Claude → Codex → Cursor) | context lost | same `.harness/` state, same one-call handoff |
|
|
65
|
-
| Secrets · encoding · drift guards | none | `scan secrets` · `encoding check` · `drift check --auto-fix` — CI-ready |
|
|
66
|
-
| Lock-in | one vendor | any agent, any language, 0 runtime dependencies |
|
|
67
|
-
|
|
68
|
-
This positioning
|
|
69
|
-
|
|
70
|
-
---
|
|
71
|
-
|
|
72
|
-
## Guidance vs enforcement (be honest about this)
|
|
73
|
-
|
|
74
|
-
By default leerness is **cooperative**: your AI agent runs the commands because CLAUDE.md / AGENTS.md tell it to. A determined agent could skip them. To make verification **enforced**, not optional:
|
|
75
|
-
|
|
76
|
-
```bash
|
|
77
|
-
leerness ci init # writes .github/workflows/leerness-gate.yml — runs `leerness gate` on every PR
|
|
78
|
-
```
|
|
79
|
-
|
|
80
|
-
|
|
81
|
-
|
|
82
|
-
|
|
83
|
-
|
|
84
|
-
|
|
85
|
-
|
|
86
|
-
|
|
87
|
-
|
|
88
|
-
|
|
89
|
-
|
|
90
|
-
|
|
91
|
-
|
|
92
|
-
|
|
93
|
-
|
|
94
|
-
##
|
|
95
|
-
|
|
96
|
-
-
|
|
97
|
-
-
|
|
98
|
-
-
|
|
99
|
-
|
|
100
|
-
|
|
101
|
-
|
|
1
|
+
# leerness
|
|
2
|
+
|
|
3
|
+
```
|
|
4
|
+
██╗ ███████╗███████╗██████╗ ███╗ ██╗███████╗███████╗
|
|
5
|
+
██║ ██╔════╝██╔════╝██╔══██╗████╗ ██║██╔════╝██╔════╝
|
|
6
|
+
██║ █████╗ █████╗ ██████╔╝██╔██╗ ██║█████╗ ███████╗
|
|
7
|
+
██║ ██╔══╝ ██╔══╝ ██╔══██╗██║╚██╗██║██╔══╝ ╚════██║
|
|
8
|
+
███████╗███████╗███████╗██║ ██║██║ ╚████║███████╗███████║
|
|
9
|
+
╚══════╝╚══════╝╚══════╝╚═╝ ╚═╝╚═╝ ╚═══╝╚══════╝╚══════╝
|
|
10
|
+
```
|
|
11
|
+
|
|
12
|
+
> **The AI-coding operations layer that makes "done" require evidence — for any language, any AI agent.**
|
|
13
|
+
> leerness does not write code. It gives your AI agent persistent memory, verified completion, and clean handoffs — stored inside your repo as plain files, exposed via CLI + MCP.
|
|
14
|
+
|
|
15
|
+
[npm](https://www.npmjs.com/package/leerness) · **0 runtime deps** · **0 install scripts** · offline-first · Node ≥ 18 · MIT
|
|
16
|
+
|
|
17
|
+
**🇰🇷 Full Korean version: [README.ko.md](./README.ko.md)**
|
|
18
|
+
|
|
19
|
+
---
|
|
20
|
+
|
|
21
|
+
## Try it in 30 seconds
|
|
22
|
+
|
|
23
|
+
```bash
|
|
24
|
+
npx -y leerness init . --yes # adds .harness/ memory + guard files to your project
|
|
25
|
+
npx leerness handoff . # everything your AI should know right now, in one call
|
|
26
|
+
```
|
|
27
|
+
|
|
28
|
+
Your project now has agent-independent memory. To see the flagship feature — catching a false "done" claim:
|
|
29
|
+
|
|
30
|
+
```bash
|
|
31
|
+
npx leerness task add "Implement payment API" # prints the new id, e.g. T-0002 — use it below
|
|
32
|
+
npx leerness task update T-0002 --status done --evidence "payment.js implemented + tested"
|
|
33
|
+
npx leerness verify-claim T-0002 # exit 1 — payment.js does not exist. Claim rejected.
|
|
34
|
+
```
|
|
35
|
+
|
|
36
|
+
Now actually write `payment.js`, then run the **same** `verify-claim T-0002` → it exits 0. That is the whole idea: **"done" must match reality.**
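
The idea that "done" must match reality can be sketched in a few lines. This is a hypothetical illustration, not leerness's actual `verify-claim` logic: the helper names `claimedFiles` and `checkClaim`, the file-name regex, and the injectable `exists` parameter are all assumptions made for the example.

```javascript
// Hypothetical sketch: NOT leerness's real verify-claim implementation.
const fs = require("node:fs");
const path = require("node:path");

// Pull file-like tokens (e.g. "payment.js") out of a free-form evidence string.
// The regex is a rough heuristic and would also match things like version numbers.
function claimedFiles(evidence) {
  const matches = evidence.match(/[\w./-]+\.[A-Za-z0-9]+/g) || [];
  return [...new Set(matches)];
}

// Compare the claim with the filesystem.
// `ok` is true only when no claimed file is missing under rootDir.
function checkClaim(evidence, rootDir, exists = fs.existsSync) {
  const files = claimedFiles(evidence);
  const missing = files.filter((f) => !exists(path.join(rootDir, f)));
  return { ok: missing.length === 0, files, missing };
}

// Demo with a stubbed existence check that pretends nothing is on disk.
const result = checkClaim("payment.js implemented + tested", process.cwd(), () => false);
console.log(result.ok ? "claim accepted" : `claim rejected, missing: ${result.missing.join(", ")}`);
```

With the stub above the demo reports `payment.js` as missing; dropping the third argument makes the same call consult the real filesystem, so the claim passes once the file exists.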
|
|
37
|
+
|
|
38
|
+
> Tip: if your evidence claims a specific test count (e.g. "5 tests passed"), leerness measures the real count and rejects a mismatch — so claim only what's true, or add `--run-tests --test-cmd "<your test cmd>"` to verify by running them.
|
|
39
|
+
|
|
40
|
+
> Want a smaller footprint? `leerness init . --minimal` installs only the core memory + verification files instead of the full set.
|
|
41
|
+
|
|
42
|
+
---
|
|
43
|
+
|
|
44
|
+
## No terminal? Let your AI run it
|
|
45
|
+
|
|
46
|
+
You never have to type a command yourself. Paste this into Claude Code, Cursor, Codex, or any coding agent:
|
|
47
|
+
|
|
48
|
+
> Set up leerness in this project by running `npx -y leerness init . --yes`. From now on, run `leerness handoff .` at the start of every session, verify finished work with `leerness verify-claim`, and run `leerness session close .` before you finish.
|
|
49
|
+
|
|
50
|
+
The agent installs and operates it for you — `leerness init` also writes the instructions into CLAUDE.md / AGENTS.md so future sessions pick them up automatically.
|
|
51
|
+
|
|
52
|
+
Prefer pure natural language? leerness ships an **MCP server with 86 tools** (`leerness mcp serve`). Connect it once to Claude Desktop / Claude Code and just ask: *"what was I working on?"*, *"did the AI actually finish T-0001?"*
|
|
53
|
+
|
|
54
|
+
---
|
|
55
|
+
|
|
56
|
+
## Claude and Codex already have memory. Why leerness?
|
|
57
|
+
|
|
58
|
+
Built-in harnesses remember what the AI **said**. leerness verifies what the AI **did** — and keeps working when you switch agents.
|
|
59
|
+
|
|
60
|
+
| | Built-in (CLAUDE.md, agent memory) | leerness |
|
|
61
|
+
|---|---|---|
|
|
62
|
+
| Memory | per-agent, free-form notes | structured tasks / decisions / lessons / rules — agent-independent files in your repo |
|
|
63
|
+
| "Done" claims | trusted as written | **evidence-gated**: claimed files, test counts, and run output are checked against reality — bluffs exit 1 |
|
|
64
|
+
| Switching agents (Claude → Codex → Cursor) | context lost | same `.harness/` state, same one-call handoff |
|
|
65
|
+
| Secrets · encoding · drift guards | none | `scan secrets` · `encoding check` · `drift check --auto-fix` — CI-ready |
|
|
66
|
+
| Lock-in | one vendor | any agent, any language, 0 runtime dependencies |
|
|
67
|
+
|
|
68
|
+
This positioning is checked by **self-administered clean-room evaluations** — AI agents do a fresh `npm install` into temp dirs and drive it by behavior only, including adversarial attacks against the verifier itself (fake tests, comment-only stubs, inflated test counts — all rejected). To be clear: these are *AI* clean-room runs, **not third-party human audits or peer review** — they make the claim *checkable* rather than a marketing line. Methodology, results, and honest limitations: **[docs/clean-room-evaluations.md](./docs/clean-room-evaluations.md)**.
|
|
69
|
+
|
|
70
|
+
---
|
|
71
|
+
|
|
72
|
+
## Guidance vs enforcement (be honest about this)
|
|
73
|
+
|
|
74
|
+
By default leerness is **cooperative**: your AI agent runs the commands because CLAUDE.md / AGENTS.md tell it to. A determined agent could skip them. To make verification **enforced**, not optional:
|
|
75
|
+
|
|
76
|
+
```bash
|
|
77
|
+
leerness ci init # writes .github/workflows/leerness-gate.yml — runs `leerness gate` on every PR
|
|
78
|
+
```
|
|
79
|
+
|
|
80
|
+
The generated workflow is production-grade: it **pins the leerness version** (reproducible — the gate's verdict can't change from a silent upgrade), runs with **least-privilege permissions** (`contents: read`), and cancels superseded runs.
|
|
81
|
+
|
|
82
|
+
Then make that check **required** in GitHub branch protection. Now a PR that skips verification (or whose claims fail) **cannot merge** — the gate runs independently of the agent, returns a non-zero exit code, and blocks. That is the difference between a guideline and a guardrail. For exact per-claim enforcement, run `leerness gate --claims` — it adds a 6th check that runs `verify-claim` on **every** completed task and fails the gate if any "done" task's evidence doesn't match reality (the default 5-check gate already blocks false-done via heuristics; `--claims` makes it precise).
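
As a rough mental model of why a gate blocks a merge, the sketch below aggregates several checks into a single exit code. It is an assumption-laden illustration: `runGate` and the check names are invented here and do not reflect how `leerness gate` is actually implemented or which checks it runs.

```javascript
// Hypothetical gate runner; check names and logic are illustrative only.
function runGate(checks) {
  const results = checks.map(({ name, run }) => {
    try {
      // A check passes only if it returns a truthy value.
      return { name, passed: Boolean(run()) };
    } catch (err) {
      // A check that throws counts as a failure, not as a crash of the gate.
      return { name, passed: false, error: String(err.message || err) };
    }
  });
  const failed = results.filter((r) => !r.passed);
  // Any single failure yields a non-zero exit code.
  return { exitCode: failed.length === 0 ? 0 : 1, results, failed };
}

const gate = runGate([
  { name: "claims-match-evidence", run: () => true },
  { name: "no-committed-secrets", run: () => false },
]);
for (const r of gate.results) console.log(`${r.passed ? "PASS" : "FAIL"} ${r.name}`);
console.log("exit code:", gate.exitCode);
```

A CI job that ends with this exit code fails whenever any check fails, which is what a required status check in branch protection then turns into a blocked merge.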
|
|
83
|
+
|
|
84
|
+
---
|
|
85
|
+
|
|
86
|
+
## Maturity — and why trying it is still cheap
|
|
87
|
+
|
|
88
|
+
Be honest with yourself before you depend on this: leerness is **early and largely solo-maintained**, developed mostly through autonomous AI rounds — so its own `selftest` + e2e suites are the primary quality signal, and external adoption is still small. Don't make it load-bearing on faith: **pin a version**, and treat the differentiated slice — `verify-claim` + the CI `gate` as a required check — as the part worth relying on.
|
|
89
|
+
|
|
90
|
+
The asymmetry is what makes a trial reasonable anyway: MIT, **0 runtime dependencies**, offline-first, and all state is plain files in *your* repo. Lock-in is near zero — if it doesn't earn its place, remove the tool and your `task`/`decision`/`lesson` files stay. (For secret scanning specifically, mature dedicated tools like gitleaks/trufflehog exist — use those if you need a hard guarantee; leerness's `scan secrets` is a convenience guard, not a replacement.)
|
|
91
|
+
|
|
92
|
+
---
|
|
93
|
+
|
|
94
|
+
## What is inside (the 60-second tour)
|
|
95
|
+
|
|
96
|
+
- **Memory** — `task` / `plan` / `decision` / `lesson` / `rule`: canonical JSON + markdown projections, archive/restore.
|
|
97
|
+
- **Handoff** — `handoff` (session start context) · `session close` (closing report). Survives agent swaps.
|
|
98
|
+
- **Verification** — `verify-claim` (evidence vs reality, stub/fake-test/inflated-count detection, `--run-tests --test-cmd` for any language; `--all` checks **every** completed claim at once for CI) · `contract verify` (spec ↔ impl) · `gate` (one-call CI gate).
|
|
99
|
+
- **Audit** — `audit` · `lazy detect` · `drift check` keep the workspace honest over time.
|
|
100
|
+
- **Security** — `scan secrets` (committed-secret detection) · `encoding check` (BOM/CP949) — also runs at `session close`.
|
|
101
|
+
|
|
102
|
+
Full command reference, workflows, and architecture: **[README.ko.md](./README.ko.md)** (Korean) · `leerness commands` · `leerness help`.
|
|
103
|
+
|
|
104
|
+
## Links
|
|
105
|
+
|
|
106
|
+
- npm: https://www.npmjs.com/package/leerness
|
|
107
|
+
- Site & release videos: https://leerness.pages.dev
|
|
108
|
+
- Changelog: [CHANGELOG.md](./CHANGELOG.md)
|
|
109
|
+
|
|
110
|
+
## License
|
|
111
|
+
|
|
102
112
|
MIT
|
|
103
113
|
|
|
104
114
|
<!-- leerness:project-readme:start -->
|
|
105
115
|
## Leerness Project Harness
|
|
106
116
|
|
|
107
|
-
This project uses Leerness v1.
|
|
117
|
+
This project uses the Leerness v1.34.0 harness. AI agents must load context with `leerness handoff` before starting work, and run `leerness check`/`leerness audit`/`leerness session close` after finishing.
|
|
108
118
|
|
|
109
119
|
### Identity: AI agent operations layer (UR-0030)
|
|
110
120
|
|
|
@@ -158,7 +168,7 @@ leerness memory restore decision <date|title>
|
|
|
158
168
|
|
|
159
169
|
### MCP server (external AI integration)
|
|
160
170
|
|
|
161
|
-
Leerness v1.
|
|
171
|
+
Leerness v1.34.0 ships a built-in stdio JSON-RPC MCP server that exposes **86 tools** to external AIs such as Claude Code · Cursor · Codex CLI:
|
|
162
172
|
|
|
163
173
|
```jsonc
|
|
164
174
|
// By category
|
|
@@ -171,7 +181,7 @@ Leerness v1.32.0 ships a built-in stdio JSON-RPC MCP server: Claude Code
|
|
|
171
181
|
// • Workflow: session_close / agents_list / task_export / env_check / usage_stats / reuse_map / whats_new
|
|
172
182
|
|
|
173
183
|
// Run the MCP server: leerness mcp serve
|
|
174
|
-
// tools/list response:
|
|
184
|
+
// tools/list response: 86 tools
|
|
175
185
|
```
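
To make the `tools/list` exchange concrete, here is a minimal sketch of the JSON-RPC 2.0 messages involved. The helper names (`toolsListRequest`, `encodeMessage`, `toolNames`) are invented for illustration, and the sample response is made up: it lists only two of the tool names mentioned above rather than the full set. A real MCP client would also need to complete the `initialize` handshake before sending this request.

```javascript
// Sketch only: message shapes follow JSON-RPC 2.0 as used by MCP.
// The sample response below is a made-up illustration, not real server output.

// Build a tools/list request with a caller-chosen id.
function toolsListRequest(id) {
  return { jsonrpc: "2.0", id, method: "tools/list", params: {} };
}

// The stdio transport sends one JSON message per line.
function encodeMessage(message) {
  return JSON.stringify(message) + "\n";
}

// Extract tool names from one response line, surfacing JSON-RPC errors.
function toolNames(responseLine) {
  const response = JSON.parse(responseLine);
  if (response.error) throw new Error(response.error.message);
  return response.result.tools.map((t) => t.name);
}

const request = toolsListRequest(1);
process.stdout.write(encodeMessage(request));

const sampleResponse = JSON.stringify({
  jsonrpc: "2.0",
  id: 1,
  result: { tools: [{ name: "session_close" }, { name: "env_check" }] },
});
console.log(toolNames(sampleResponse).length, "tools in the sample response");
```

In practice the encoded request would be written to the stdin of the process started by `leerness mcp serve`, and response lines would be read back from its stdout.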
|
|
176
186
|
|
|
177
187
|
### Autonomous mode
|
|
@@ -179,7 +189,7 @@ Leerness v1.32.0 ships a built-in stdio JSON-RPC MCP server: Claude Code
|
|
|
179
189
|
Send just the `<<autonomous-loop-dynamic>>` signal and the AI will:
|
|
180
190
|
1) select the next round's candidate → 2) change the code → 3) write a new stress-v* suite + run the cumulative regression → 4) e2e 219/219 → 5) npm pack + git tag + GitHub release → 6) auto-push to main (1.9.140+) → 7) session close → 8) schedule the next round.
|
|
181
191
|
|
|
182
|
-
Cumulative so far: **70 rounds (1.9.40 → 1.
|
|
192
|
+
Cumulative so far: **70 rounds (1.9.40 → 1.34.0)** · a GitHub release/tag is created every round · _reports/ is kept private.
|
|
183
193
|
|
|
184
194
|
### Performance guide (measured on 1.9.140)
|
|
185
195
|
|
|
@@ -217,6 +227,6 @@ leerness release pack --close --auto-main-push
|
|
|
217
227
|
- `.harness/session-handoff.md`: handoff for the next session (written automatically)
|
|
218
228
|
- `.harness/lessons.md` / `decisions.md` / `rules.md`: persistent memory (5 surfaces)
|
|
219
229
|
|
|
220
|
-
Last synced by Leerness v1.
|
|
230
|
+
Last synced by Leerness v1.34.0: 2026-06-19
|
|
221
231
|
<!-- leerness:project-readme:end -->
|
|
222
232
|
|