neurain 0.1.0-alpha.0
- package/CHANGELOG.md +19 -0
- package/LICENSE +57 -0
- package/README.md +205 -0
- package/SECURITY.md +22 -0
- package/bin/neurain.mjs +7 -0
- package/docs/comparison-mem0.en.md +22 -0
- package/docs/connect-claude.en.md +48 -0
- package/docs/connect-claude.kr.md +51 -0
- package/docs/connect-codex.en.md +38 -0
- package/docs/connect-codex.kr.md +40 -0
- package/docs/connect-gemini.en.md +71 -0
- package/docs/connect-gemini.kr.md +71 -0
- package/docs/connect-runtime.en.md +61 -0
- package/docs/connect-runtime.kr.md +61 -0
- package/docs/development-status.en.md +157 -0
- package/docs/development-status.kr.md +157 -0
- package/docs/knowledge-os.en.md +105 -0
- package/docs/knowledge-os.kr.md +106 -0
- package/docs/pricing.en.md +14 -0
- package/docs/privacy-and-data-flow.en.md +25 -0
- package/docs/public-saas-readiness.en.md +39 -0
- package/docs/quickstart.en.md +64 -0
- package/docs/quickstart.kr.md +64 -0
- package/docs/release-checklist.en.md +38 -0
- package/docs/safety.en.md +36 -0
- package/docs/self-improvement-90-roadmap.en.md +429 -0
- package/docs/self-improvement-90-roadmap.kr.md +429 -0
- package/docs/self-improving-workflows.en.md +163 -0
- package/docs/self-improving-workflows.kr.md +163 -0
- package/docs/support.en.md +17 -0
- package/docs/troubleshooting.en.md +35 -0
- package/package.json +36 -0
- package/src/cli.mjs +261 -0
- package/src/core/adopt.mjs +304 -0
- package/src/core/answer_eval.mjs +450 -0
- package/src/core/capabilities.mjs +217 -0
- package/src/core/capture_durable.mjs +181 -0
- package/src/core/classify.mjs +237 -0
- package/src/core/compile_desk.mjs +324 -0
- package/src/core/complete.mjs +108 -0
- package/src/core/config.mjs +142 -0
- package/src/core/connect.mjs +355 -0
- package/src/core/curator.mjs +351 -0
- package/src/core/daemon.mjs +536 -0
- package/src/core/digest.mjs +155 -0
- package/src/core/doctor.mjs +115 -0
- package/src/core/durable.mjs +96 -0
- package/src/core/envelope.mjs +97 -0
- package/src/core/flush.mjs +190 -0
- package/src/core/fs.mjs +121 -0
- package/src/core/init.mjs +194 -0
- package/src/core/journal.mjs +269 -0
- package/src/core/labels.mjs +117 -0
- package/src/core/lessons.mjs +793 -0
- package/src/core/lifecycle.mjs +1138 -0
- package/src/core/link_check.mjs +180 -0
- package/src/core/live_cases.mjs +221 -0
- package/src/core/onboard.mjs +175 -0
- package/src/core/plan_receipt.mjs +177 -0
- package/src/core/plan_writeback.mjs +176 -0
- package/src/core/queue.mjs +62 -0
- package/src/core/queue_archive.mjs +87 -0
- package/src/core/queue_model.mjs +161 -0
- package/src/core/queue_write.mjs +28 -0
- package/src/core/recall.mjs +1802 -0
- package/src/core/recall_bench.mjs +275 -0
- package/src/core/recall_corpus.mjs +152 -0
- package/src/core/recall_facts.mjs +233 -0
- package/src/core/recall_intel.mjs +233 -0
- package/src/core/recall_lexical.mjs +269 -0
- package/src/core/recap.mjs +78 -0
- package/src/core/review_queue.mjs +131 -0
- package/src/core/review_worker.mjs +284 -0
- package/src/core/route.mjs +73 -0
- package/src/core/safety.mjs +57 -0
- package/src/core/scheduler.mjs +697 -0
- package/src/core/search.mjs +54 -0
- package/src/core/secret_scan.mjs +143 -0
- package/src/core/semantic.mjs +187 -0
- package/src/core/source_digest.mjs +56 -0
- package/src/core/source_digest_gen.mjs +311 -0
- package/src/core/stage.mjs +105 -0
- package/src/core/status.mjs +175 -0
- package/src/core/vault_state.mjs +115 -0
- package/src/core/watch.mjs +282 -0
- package/src/core/wiki_log.mjs +29 -0
- package/src/core/wrap.mjs +62 -0
- package/src/mcp/server.mjs +865 -0
- package/templates/starter-vault/README.md +9 -0
|
@@ -0,0 +1,429 @@
|
|
|
1
|
+
# [en] Self-Improvement 90 Roadmap
|
|
2
|
+
|
|
3
|
+
Version: 1.0
|
|
4
|
+
|
|
5
|
+
This roadmap explains how Neurain Knowledge OS absorbs the useful public architecture patterns from Agent Runtime without pretending that Neurain is an LLM and without cloning the runtime itself.
|
|
6
|
+
|
|
7
|
+
Shipped vs planned:
|
|
8
|
+
|
|
9
|
+
- Shipped alpha: lesson list, lesson candidates, lesson candidate detection eval with synthetic and reviewed case-file modes, recap, wrap dry-run, capabilities, narrow CLI promotion, rollback receipt, event journal add/list/verify, journal receipts, read-only watch report, read-only review worker report, read-only scheduler tick, scheduler trigger eval with synthetic and reviewed case-file modes, bounded foreground scheduler monitor, user-started foreground continuous daemon, append-only lifecycle emit, read-only lifecycle lineage report, Claude Code lifecycle hook preview, Codex lifecycle hook preview, E21 native lifecycle automation eval, snapshot-gated curator lifecycle, optional SQLite FTS5 recall DB, E22 local lexical-semantic recall, hybrid recall, E23 live-content recall coverage, E23 redacted live-case scaffold, E24 read-only onboarding, E25 publish-ready CLI gates, E26 Codex/Claude/Gemini/Runtime MCP connector surface, cross-host recall eval smoke, larger 100-case cross-host recall regression eval, reviewed recall case-file eval, answer-quality eval with synthetic and reviewed case-file modes, and MCP preview-only boundary.
|
|
10
|
+
- Remaining in this roadmap: deeper native Runtime lifecycle event automation beyond the shipped host-proxy contract and external user walkthrough evidence.
|
|
11
|
+
- Any command in the planned section is a target interface, not a claim that the command exists today.
|
|
12
|
+
|
|
13
|
+
Naming note: "Agent Runtime" means a generic AI host that owns prompt assembly, model calls, tool execution, and session lifecycle. "Self-Improvement 90" is Neurain's internal milestone name, not a claim about an upstream runtime release version.
|
|
14
|
+
|
|
15
|
+
Distribution note: this roadmap is an alpha planning document. Customer-facing launch pages should use the architecture and promise, but remove internal readiness estimates and implementation gaps.
|
|
16
|
+
|
|
17
|
+
## Category Correction
|
|
18
|
+
|
|
19
|
+
Agent Runtime means a generic AI host, which is an execution runtime and not a model family. It is therefore not comparable to Claude, Gemini, or GPT model families.
|
|
20
|
+
|
|
21
|
+
Correct comparison:
|
|
22
|
+
|
|
23
|
+
- Agent Runtime vs Codex vs Claude Code vs Gemini CLI vs OpenClaw
|
|
24
|
+
- Neurain Knowledge OS as the local-first knowledge layer those runtimes can use
|
|
25
|
+
|
|
26
|
+
## Target Upgrade
|
|
27
|
+
|
|
28
|
+
| Capability | Target |
|
|
29
|
+
|---|---:|
|
|
30
|
+
| Host-agnostic lesson candidate detection | 90 |
|
|
31
|
+
| Safe promotion and rollback | 95 |
|
|
32
|
+
| Background automatic review | 90 |
|
|
33
|
+
| Skill curator equivalent | 90 |
|
|
34
|
+
| Cross-session recall | 90 |
|
|
35
|
+
| User-perceived automation | 90 |
|
|
36
|
+
|
|
37
|
+
These numbers are product readiness targets, not model intelligence scores. Internal baseline estimates are tracked outside the published package.
|
|
38
|
+
|
|
39
|
+
## Measurement Plan
|
|
40
|
+
|
|
41
|
+
| Capability | Measurement method | Minimum fixture |
|
|
42
|
+
|---|---|---|
|
|
43
|
+
| Lesson candidate detection | Precision and recall on reviewed candidate fixtures | 100 events, 40 positive, 60 negative |
|
|
44
|
+
| Safe promotion and rollback | Synthetic promotion, partial-write, drift, and rollback tests | 50 writes |
|
|
45
|
+
| Background review | Trigger precision, trigger recall, and no-recursion checks | 100 event sequences |
|
|
46
|
+
| Curator | Lifecycle, pinned protection, snapshot, rollback, and no-auto-delete tests | 50 lessons/playbooks |
|
|
47
|
+
| Cross-session recall | Hit@5, source support, private-boundary tests, rebuild tests | 100 recall prompts |
|
|
48
|
+
| User-perceived automation | First-run task completion and comprehension survey | 10 non-developer walkthroughs |
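
The precision and recall measurements in the table can be illustrated with a small calculation. This is a minimal sketch in Python, not Neurain's eval code: `precision_recall`, the boolean fixture layout, and the detector output are all assumptions made for illustration. Only the fixture size (100 events, 40 positive, 60 negative) comes from the table.

```python
def precision_recall(labels, predictions):
    """Return (precision, recall) for two parallel lists of booleans."""
    pairs = list(zip(labels, predictions))
    true_pos = sum(1 for y, p in pairs if y and p)
    false_pos = sum(1 for y, p in pairs if not y and p)
    false_neg = sum(1 for y, p in pairs if y and not p)
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    return precision, recall


# Hypothetical reviewed fixture at the minimum size: 40 positive, 60 negative.
labels = [True] * 40 + [False] * 60
# Hypothetical detector output: 36 positives found, 4 negatives wrongly flagged.
predictions = [True] * 36 + [False] * 4 + [True] * 4 + [False] * 56

precision, recall = precision_recall(labels, predictions)
```

With this made-up detector output both values come out at 0.9, which would sit exactly on a 90 percent gate.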
|
|
49
|
+
|
|
50
|
+
## Runtime Patterns To Generalize
|
|
51
|
+
|
|
52
|
+
Modern self-improving agent runtimes point to five product patterns worth generalizing:
|
|
53
|
+
|
|
54
|
+
1. Agent loop ownership: prompt assembly, model calls, tool execution, retries, fallback, compression, and persistence are managed by the runtime.
|
|
55
|
+
2. Skills as procedural memory: skill covers are cheap to inspect, and full skill bodies load only when needed.
|
|
56
|
+
3. Background self-improvement review: periodic or idle-triggered review proposes memory or skill improvements.
|
|
57
|
+
4. Curator lifecycle: agent-created skills can move through active, stale, and archived states with snapshot and rollback protection.
|
|
58
|
+
5. Session search: SQLite plus FTS5 gives fast recall across session messages and lineage.
|
|
59
|
+
|
|
60
|
+
Neurain should absorb the patterns, not the runtime dependency.
|
|
61
|
+
|
|
62
|
+
## Neurain Adaptation
|
|
63
|
+
|
|
64
|
+
### 1. Neurain Work Loop
|
|
65
|
+
|
|
66
|
+
A dedicated runtime can know exactly when a model response ends because it owns the agent loop. Neurain usually does not own Codex, Claude Code, Gemini CLI, or generic runtime sessions.
|
|
67
|
+
|
|
68
|
+
Neurain's adaptation has several layers:
|
|
69
|
+
|
|
70
|
+
- Manual mode: `neurain wrap <folder> --dry-run` or `!wrap`.
|
|
71
|
+
- Watch report, shipped alpha: `neurain watch <folder> --poll-once` observes local receipts, file changes, journal events, recap hints, and lesson candidates, then proposes review work without writing.
|
|
72
|
+
- Review worker report, shipped alpha: `neurain review <folder> --json` converts watch and journal signals into manual improvement proposals without model calls or writes.
|
|
73
|
+
- Scheduler tick, shipped alpha: `neurain scheduler tick <folder> --json` decides whether local review should run and includes a review worker report only when thresholds are met.
|
|
74
|
+
- Scheduler trigger eval, shipped alpha: `neurain scheduler eval <folder> --fixture-size 100 --json` measures trigger precision, trigger recall, no-recursion, private-boundary handling, and target-root non-write.
|
|
75
|
+
- Scheduler monitor, shipped alpha: `neurain scheduler monitor <folder> --max-ticks 3` repeats read-only scheduler ticks in a bounded foreground run without installing background jobs.
|
|
76
|
+
- Lifecycle report, shipped alpha: `neurain lifecycle emit` records reviewed host boundary events, while `neurain lifecycle report` summarizes turn completion, compaction, resume, and parent-session lineage without Neurain pretending it owns the host loop.
|
|
77
|
+
- Claude Code lifecycle hook preview, shipped alpha: `neurain connect claude <folder> --lifecycle-hooks --dry-run` prints a settings snippet that maps SessionStart, UserPromptSubmit, Stop, and SessionEnd into Neurain lifecycle receipts without storing prompt bodies, transcript paths, or success stdout in model context.
|
|
78
|
+
- Continuous daemon, shipped alpha: `neurain daemon run <folder>` remains active as a user-started foreground loop and calls scheduler ticks without durable knowledge writes.
|
|
79
|
+
- Host-proxy mode: optional future connector where a host routes turn lifecycle events into Neurain.
|
|
80
|
+
|
|
81
|
+
The goal is to make Neurain feel automatic without pretending it controls every host.
|
|
82
|
+
|
|
83
|
+
Acceptance:
|
|
84
|
+
|
|
85
|
+
- Watch mode never writes durable knowledge silently.
|
|
86
|
+
- Every suggested action has a receipt or candidate id.
|
|
87
|
+
- Host-proxy mode is optional and connector-specific.
|
|
88
|
+
|
|
89
|
+
### 2. Lessons And Playbooks
|
|
90
|
+
|
|
91
|
+
Runtime skills map to Neurain lessons and playbooks.
|
|
92
|
+
|
|
93
|
+
Neurain structure:
|
|
94
|
+
|
|
95
|
+
- cover: title, trigger, scope, sensitivity, confidence, last verified
|
|
96
|
+
- body: procedure, pitfalls, tests, rollback, examples
|
|
97
|
+
- source: raw capture, receipt, review finding, or user correction
|
|
98
|
+
- lifecycle: candidate, active, stale, archived
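
The cover/body split above can be sketched as a record whose cover is always in memory while the body loads on first use. This is an illustrative Python sketch only: the class names, the `load_body` callback, and the example values are assumptions, not Neurain's storage format.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

LIFECYCLE_STATES = ("candidate", "active", "stale", "archived")


@dataclass
class LessonCover:
    title: str
    trigger: str
    scope: str
    sensitivity: str
    confidence: float
    last_verified: str  # ISO date string


@dataclass
class Lesson:
    cover: LessonCover
    source: str  # raw capture, receipt, review finding, or user correction
    load_body: Callable[[], str]  # hypothetical loader, called at most once
    lifecycle: str = "candidate"
    _body: Optional[str] = field(default=None, repr=False)

    def __post_init__(self):
        if self.lifecycle not in LIFECYCLE_STATES:
            raise ValueError(f"unknown lifecycle state: {self.lifecycle}")

    def body(self) -> str:
        # The body is fetched lazily so that listing covers stays cheap.
        if self._body is None:
            self._body = self.load_body()
        return self._body


body_loads = []


def load_example_body():
    body_loads.append("loaded")
    return "procedure, pitfalls, tests, rollback, examples"


lesson = Lesson(
    cover=LessonCover(
        title="Check drift before rollback",  # made-up example values
        trigger="curator rollback",
        scope="project",
        sensitivity="internal",
        confidence=0.8,
        last_verified="2026-01-01",
    ),
    source="receipt",
    load_body=load_example_body,
)
```

Inspecting `lesson.cover` never touches the loader; only the first `lesson.body()` call does.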
|
|
99
|
+
|
|
100
|
+
Acceptance:
|
|
101
|
+
|
|
102
|
+
- Candidate precision is at least 90 percent on reviewed samples.
|
|
103
|
+
- `neurain lessons eval --fixture-size 100` passes recall, precision, false-positive, unsafe-blocking, and target-root non-write gates.
|
|
104
|
+
- `neurain lessons eval --case-file <json>` can run reviewed lesson candidate cases without writing to the target root.
|
|
105
|
+
- Private-derived candidates cannot become global lessons.
|
|
106
|
+
- Prompt-injection and secret-like content is blocked before any prompt-context use.
|
|
107
|
+
|
|
108
|
+
### 3. Review Worker
|
|
109
|
+
|
|
110
|
+
A dedicated runtime can run a background review fork. Neurain's version should be a local review worker.
|
|
111
|
+
|
|
112
|
+
Triggers:
|
|
113
|
+
|
|
114
|
+
- every N meaningful events
|
|
115
|
+
- after `wrap`
|
|
116
|
+
- after a failed test followed by a successful fix
|
|
117
|
+
- after a Claude review finding is resolved
|
|
118
|
+
- after repeated user correction
|
|
119
|
+
- during idle watch mode
|
|
120
|
+
|
|
121
|
+
Outputs:
|
|
122
|
+
|
|
123
|
+
- lesson candidate
|
|
124
|
+
- stale lesson warning
|
|
125
|
+
- capability routing hint
|
|
126
|
+
- recall gap
|
|
127
|
+
- rollback risk
|
|
128
|
+
|
|
129
|
+
Acceptance:
|
|
130
|
+
|
|
131
|
+
- Review worker trigger precision is at least 85 percent.
|
|
132
|
+
- It never recursively starts another review worker.
|
|
133
|
+
- It cannot execute external writes.
|
|
134
|
+
- It cannot follow or promote injected instructions found in source, receipts, logs, or handoffs.
|
|
135
|
+
- Durable promotion still requires human confirmation.
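
The triggers and the no-recursion rule above could be combined in a pure decision function like the one below. This is a hedged Python sketch: the event shape (`kind`, `meaningful`, `private`), the threshold value, and the function name are hypothetical, and the function only returns a decision, it never writes.

```python
def should_trigger_review(events, threshold=5, inside_review=False):
    """Return a read-only decision about whether a review should be proposed."""
    if inside_review:
        # No-recursion guard: a review worker never starts another one.
        return {"trigger": False, "reason": "no_recursion"}

    # Private events are withheld from trigger logic entirely.
    visible = [e for e in events if not e.get("private")]
    kinds = [e.get("kind") for e in visible]

    if "wrap" in kinds:
        return {"trigger": True, "reason": "after_wrap"}

    # A failed test that is later followed by a passing one.
    if "test_failed" in kinds:
        later = kinds[kinds.index("test_failed") + 1:]
        if "test_passed" in later:
            return {"trigger": True, "reason": "fail_then_fix"}

    meaningful = [e for e in visible if e.get("meaningful")]
    if len(meaningful) >= threshold:
        return {"trigger": True, "reason": "event_threshold"}

    return {"trigger": False, "reason": "below_threshold"}
```

Keeping the decision separate from any promotion step is what lets the acceptance rule "durable promotion still requires human confirmation" hold regardless of how often the trigger fires.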
|
|
136
|
+
|
|
137
|
+
### 4. Curator
|
|
138
|
+
|
|
139
|
+
Runtime curator manages agent-created skills. Neurain curator should manage Neurain-created lessons and playbooks.
|
|
140
|
+
|
|
141
|
+
Rules:
|
|
142
|
+
|
|
143
|
+
- active to stale after configured non-use
|
|
144
|
+
- stale to archived after configured non-use
|
|
145
|
+
- pinned items are protected
|
|
146
|
+
- human-authored docs are protected
|
|
147
|
+
- every mutating curator pass starts with a snapshot
|
|
148
|
+
- rollback restores the exact previous state
|
|
149
|
+
- dry-run exists before real mutation
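
The rules above can be sketched as a planning step (the dry run) followed by a snapshot-first apply step. This is an illustrative Python sketch under stated assumptions: the lesson dictionaries, the 30 and 90 day thresholds, and both function names are hypothetical and not taken from the Neurain source.

```python
import copy
from datetime import date


def plan_curator_pass(lessons, today, stale_after_days=30, archive_after_days=90):
    """Dry run: return proposed (lesson_id, new_status) pairs without mutating."""
    plan = []
    for lesson in lessons:
        if lesson.get("pinned") or lesson.get("human_authored"):
            continue  # protected items are never touched
        idle_days = (today - lesson["last_used"]).days
        if lesson["status"] == "active" and idle_days >= stale_after_days:
            plan.append((lesson["id"], "stale"))
        elif lesson["status"] == "stale" and idle_days >= archive_after_days:
            plan.append((lesson["id"], "archived"))
    return plan


def apply_plan(lessons, plan):
    """Take a snapshot first, then return (snapshot, updated). Nothing is deleted."""
    snapshot = copy.deepcopy(lessons)
    updated = copy.deepcopy(lessons)
    changes = dict(plan)
    for lesson in updated:
        if lesson["id"] in changes:
            lesson["status"] = changes[lesson["id"]]
    return snapshot, updated


lessons = [
    {"id": "L1", "status": "active", "last_used": date(2026, 1, 1)},
    {"id": "L2", "status": "active", "last_used": date(2026, 1, 1), "pinned": True},
    {"id": "L3", "status": "stale", "last_used": date(2025, 12, 1)},
    {"id": "L4", "status": "active", "last_used": date(2026, 3, 1)},
]
plan = plan_curator_pass(lessons, today=date(2026, 3, 10))
snapshot, updated = apply_plan(lessons, plan)
```

Rollback in this sketch is simply restoring `snapshot`, which equals the pre-pass state exactly because it was deep-copied before any change.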
|
|
150
|
+
|
|
151
|
+
Acceptance:
|
|
152
|
+
|
|
153
|
+
- Accidental destructive curator changes: 0
|
|
154
|
+
- Snapshot restore success: 100 percent in synthetic tests
|
|
155
|
+
- Pinned item mutation refusal: 100 percent
|
|
156
|
+
|
|
157
|
+
### 5. Recall DB
|
|
158
|
+
|
|
159
|
+
A runtime can use SQLite and FTS5 for session search. Neurain should add a local optional recall index while keeping markdown as canonical truth.
|
|
160
|
+
|
|
161
|
+
Design:
|
|
162
|
+
|
|
163
|
+
- SQLite FTS5 stores session events, receipts, lesson covers, handoff summaries, and source references.
|
|
164
|
+
- Markdown remains source of truth.
|
|
165
|
+
- WAL mode supports concurrent reads.
|
|
166
|
+
- Per-host isolation keeps Codex, Claude Code, Agent Runtime, and Gemini CLI traces distinguishable.
|
|
167
|
+
- Session lineage links compacted or resumed work.
|
|
168
|
+
- Sensitivity filters run before indexing.
|
|
169
|
+
- Private-area facts are excluded from global and cross-host recall by default.
|
|
170
|
+
- Secret-like and instruction-injection content is redacted before indexing.
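
A minimal sketch of this design, using Python's standard `sqlite3` module, is shown below. It assumes a hypothetical in-memory table named `recall` and made-up documents; the real schema is not described in this roadmap. Private documents are filtered before indexing, and search falls back to scanning the canonical documents when no index is available (including when the local SQLite build lacks FTS5).

```python
import sqlite3


def build_index(docs):
    """Build an in-memory FTS5 index; return None if FTS5 is unavailable."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE VIRTUAL TABLE recall USING fts5("
            "doc_id UNINDEXED, host UNINDEXED, body)"
        )
    except sqlite3.OperationalError:
        conn.close()
        return None
    # Sensitivity filter runs before indexing: private docs never enter the cache.
    rows = [(d["id"], d["host"], d["body"]) for d in docs if not d.get("private")]
    conn.executemany("INSERT INTO recall (doc_id, host, body) VALUES (?, ?, ?)", rows)
    return conn


def search(conn, docs, term, host=None):
    """Search the FTS5 cache when present, else scan the canonical documents."""
    if conn is None:
        hits = [
            d["id"]
            for d in docs
            if not d.get("private")
            and term.lower() in d["body"].lower().split()
            and (host is None or d["host"] == host)
        ]
        return sorted(hits)
    phrase = '"' + term.replace('"', '""') + '"'  # quote the term for FTS5
    sql = "SELECT doc_id FROM recall WHERE recall MATCH ?"
    params = [phrase]
    if host is not None:
        sql += " AND host = ?"
        params.append(host)
    return sorted(row[0] for row in conn.execute(sql, params))


docs = [
    {"id": "a", "host": "codex", "body": "rollback receipt restored the registry"},
    {"id": "b", "host": "claude", "body": "rollback drill for curator snapshot"},
    {"id": "c", "host": "codex", "body": "secret rollback notes", "private": True},
]
index = build_index(docs)
```

Calling `search(None, docs, "rollback")` exercises the fallback path, which corresponds to the case where the index has been deleted.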
|
|
171
|
+
|
|
172
|
+
Acceptance:
|
|
173
|
+
|
|
174
|
+
- Cross-session recall Hit@5 is at least 90 percent on test prompts.
|
|
175
|
+
- Citation or source support is at least 95 percent for recall-backed answers.
|
|
176
|
+
- Plain markdown fallback still works if the index is deleted.
|
|
177
|
+
- Rebuild from markdown and receipts preserves indexed ids and source pointers in synthetic tests.
|
|
178
|
+
- Recall index private or secret leakage is 0 in global and cross-host recall fixtures.
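
For reference, Hit@5 is the fraction of prompts whose expected source appears in the top five results. A minimal Python sketch follows; the `hit_at_k` name and the toy result lists are assumptions for illustration.

```python
def hit_at_k(ranked_results, expected_ids, k=5):
    """Fraction of prompts whose expected id appears in the top k results."""
    if not expected_ids:
        return 0.0
    hits = sum(
        1
        for ranked, expected in zip(ranked_results, expected_ids)
        if expected in ranked[:k]
    )
    return hits / len(expected_ids)


# Three hypothetical recall prompts with their ranked result ids.
ranked_results = [
    ["a", "b"],                               # expected "b" at rank 2: hit
    ["x", "y", "z", "w", "v", "target"],      # expected "target" at rank 6: miss
    ["q"],                                    # expected "q" at rank 1: hit
]
expected_ids = ["b", "target", "q"]
score = hit_at_k(ranked_results, expected_ids, k=5)
```

Here two of the three prompts hit, so the toy score falls well short of the 90 percent acceptance gate.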
|
|
179
|
+
|
|
180
|
+
### 6. User-Perceived Automation
|
|
181
|
+
|
|
182
|
+
The user should not need to understand internal commands to feel the improvement.
|
|
183
|
+
|
|
184
|
+
Target experience:
|
|
185
|
+
|
|
186
|
+
- Start with `npx neurain init <folder>`.
|
|
187
|
+
- Connect a host.
|
|
188
|
+
- Work normally.
|
|
189
|
+
- Neurain surfaces "what changed, what to remember, what is risky, what to review."
|
|
190
|
+
- User approval is requested only for durable changes.
|
|
191
|
+
|
|
192
|
+
Acceptance:
|
|
193
|
+
|
|
194
|
+
- First-run to useful status: under 5 minutes.
|
|
195
|
+
- Non-developer onboarding copy has only two visible actions: check state and wrap work.
|
|
196
|
+
- Advanced commands stay available but do not dominate onboarding.
|
|
197
|
+
|
|
198
|
+
## Implementation Phases
|
|
199
|
+
|
|
200
|
+
### Phase 0: Branding And Category Clarity
|
|
201
|
+
|
|
202
|
+
- Promote "Neurain Knowledge OS" across docs.
|
|
203
|
+
- Fix Runtime category language.
|
|
204
|
+
- Publish layer model.
|
|
205
|
+
|
|
206
|
+
### Phase 1: Event Journal
|
|
207
|
+
|
|
208
|
+
- Status: alpha shipped.
|
|
209
|
+
- `neurain journal add` appends reviewed events only with `--confirm "1건 저장 진행"` (a literal Korean confirmation phrase meaning "proceed with saving 1 item").
|
|
210
|
+
- `neurain journal list` and `neurain journal verify` are read-only.
|
|
211
|
+
- MCP exposes journal list/verify only, not journal append.
|
|
212
|
+
- Secret-like and instruction-injection summaries are redacted and marked unsafe for prompt context or cross-host indexing.
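
One way an append-only journal with a confirmation gate and a verify step could work is a hash chain, sketched below in Python. This is an assumption for illustration only: the roadmap does not say how `neurain journal verify` checks integrity, and `append_event`, `verify`, and the entry layout are hypothetical. Only the confirmation phrase is taken from the text above.

```python
import hashlib
import json

CONFIRM_PHRASE = "1건 저장 진행"  # phrase quoted in the roadmap
GENESIS = "0" * 64


def _entry_hash(prev_hash, event):
    payload = json.dumps(event, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_event(journal, event, confirm):
    """Append one reviewed event; refuse without the exact confirmation phrase."""
    if confirm != CONFIRM_PHRASE:
        raise PermissionError("explicit confirmation phrase required")
    prev_hash = journal[-1]["hash"] if journal else GENESIS
    digest = _entry_hash(prev_hash, event)
    journal.append({"event": event, "prev": prev_hash, "hash": digest})
    return digest


def verify(journal):
    """Read-only check that every entry still chains to the one before it."""
    prev_hash = GENESIS
    for entry in journal:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _entry_hash(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True


journal = []
append_event(journal, {"kind": "wrap", "summary": "finished task"}, CONFIRM_PHRASE)
append_event(journal, {"kind": "review", "summary": "resolved finding"}, CONFIRM_PHRASE)
```

In this sketch, editing any stored event after the fact changes its hash and makes `verify` return `False`.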
|
|
213
|
+
|
|
214
|
+
### Phase 2 / E3: Watch Mode
|
|
215
|
+
|
|
216
|
+
- Status: read-only watch report alpha shipped.
|
|
217
|
+
- `neurain watch <folder> --poll-once` observes safe local signals.
|
|
218
|
+
- It reads recent text files, event journal entries, recap hints, and lesson candidates.
|
|
219
|
+
- It produces candidate reports only.
|
|
220
|
+
- It does not start a daemon, append events, promote lessons, or write durable wiki knowledge in alpha.
|
|
221
|
+
- MCP exposes read-only `neurain_watch_report` only.
|
|
222
|
+
|
|
223
|
+
### Phase 3: Review Worker
|
|
224
|
+
|
|
225
|
+
- Status: read-only review worker report alpha shipped.
|
|
226
|
+
- `neurain review <folder> --json` converts watch reports, journal event sequences, recap hints, and lesson candidates into review items, blocked items, and suggested actions.
|
|
227
|
+
- It scores candidate usefulness and risk before any promotion path.
|
|
228
|
+
- It includes blocked reasons for private, unsafe, or withheld signals.
|
|
229
|
+
- It does not call a model, start a nested review worker, run external tools, promote lessons, or write durable wiki knowledge.
|
|
230
|
+
- MCP exposes read-only `neurain_review_worker` only.
|
|
231
|
+
- Connector-triggered automatic scheduling remains planned. User-started foreground daemon checks are shipped separately.
|
|
232
|
+
|
|
233
|
+
### Phase 3b: Scheduler Tick
|
|
234
|
+
|
|
235
|
+
- Status: read-only scheduler tick alpha shipped.
|
|
236
|
+
- `neurain scheduler tick <folder> --json` inspects watch signals and decides whether local review should run.
|
|
237
|
+
- `neurain scheduler status <folder> --json` reports the same decision without including a review worker report.
|
|
238
|
+
- It does not install background jobs, start daemons, call models, promote lessons, or write durable wiki knowledge.
|
|
239
|
+
- MCP exposes read-only `neurain_scheduler_tick` for scheduler ticks.
|
|
240
|
+
|
|
241
|
+
### Phase 3b2 / E20: Scheduler Trigger Eval
|
|
242
|
+
|
|
243
|
+
- Status: read-only scheduler trigger eval alpha shipped.
|
|
244
|
+
- `neurain scheduler eval <folder> --fixture-size 100 --json` runs a synthetic background-review trigger regression without touching the target root.
|
|
245
|
+
- `neurain scheduler eval <folder> --case-file scheduler-cases.json --min-cases 5 --json` runs human-reviewed scheduler trigger cases.
|
|
246
|
+
- It gates trigger precision, trigger recall, false positives, false negatives, no-recursion, private-boundary handling, and target-root non-write.
|
|
247
|
+
- E20 safety hardening added broad target-root snapshots, temp cleanup proof, explicit private/no-recursion denominator counts, case-file size caps, traversal refusal tests, and MCP positive-integer validation.
|
|
248
|
+
- MCP exposes read-only `neurain_scheduler_eval`.
|
|
249
|
+
|
|
250
|
+
### Phase 3c: Scheduler Monitor
|
|
251
|
+
|
|
252
|
+
- Status: bounded foreground scheduler monitor alpha shipped.
|
|
253
|
+
- `neurain scheduler monitor <folder> --interval-seconds 60 --max-ticks 3 --json` repeats read-only scheduler ticks for a user-specified number of ticks.
|
|
254
|
+
- It does not install background jobs, start daemons, continue after the command exits, call models, promote lessons, or write durable wiki knowledge.
|
|
255
|
+
- MCP does not expose scheduler monitor in alpha.
|
|
256
|
+
|
|
257
|
+
### Phase 3d: Lifecycle Lineage
|
|
258
|
+
|
|
259
|
+
- Status: append-only lifecycle event and read-only lineage report alpha shipped.
|
|
260
|
+
- `neurain lifecycle emit <folder> --host codex --event turn_end --session-id <id> --turn-id <id> --confirm "1건 저장 진행"` records reviewed host boundary events in the event journal.
|
|
261
|
+
- Supported lifecycle events are `session_start`, `turn_start`, `turn_end`, `wrap_complete`, `review_due`, `review_complete`, `compaction`, `resume`, and `session_end`.
|
|
262
|
+
- `neurain lifecycle report <folder> --json` summarizes completed turns, open turns, review-due events, compactions, resumes, parent sessions, and journal integrity.
|
|
263
|
+
- MCP exposes read-only `neurain_lifecycle_report` only. It does not expose lifecycle emit.
|
|
264
|
+
- This is Neurain's current substitute for Runtime agent-loop ownership when Neurain is attached beneath Codex, Claude Code, Gemini CLI, Agent Runtime, or OpenClaw.
|
|
265
|
+
- Codex lifecycle hook preview and Runtime host-proxy lifecycle contract are shipped in alpha. Deeper native Runtime lifecycle emission remains connector-specific future work.
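
The kind of summary a lineage report produces can be sketched from a list of lifecycle events. The Python below is illustrative only: the event dictionaries, the `parent_session_id` field, and the `lifecycle_report` function are assumptions, while the supported event names are the ones listed above.

```python
SUPPORTED_EVENTS = {
    "session_start", "turn_start", "turn_end", "wrap_complete", "review_due",
    "review_complete", "compaction", "resume", "session_end",
}


def lifecycle_report(events):
    """Summarize turns, compactions, resumes, and parent-session lineage."""
    started, ended = set(), set()
    counts = {"review_due": 0, "compaction": 0, "resume": 0}
    parents = {}
    ignored = 0
    for event in events:
        name = event.get("event")
        session_id = event.get("session_id")
        if name not in SUPPORTED_EVENTS or not session_id:
            ignored += 1  # unsupported or missing-session payloads are skipped
            continue
        key = (session_id, event.get("turn_id"))
        if name == "turn_start":
            started.add(key)
        elif name == "turn_end":
            ended.add(key)
        elif name in counts:
            counts[name] += 1
        if event.get("parent_session_id"):
            parents[session_id] = event["parent_session_id"]
    return {
        "completed_turns": len(ended),
        "open_turns": len(started - ended),
        "parent_sessions": parents,
        "ignored": ignored,
        **counts,
    }


events = [
    {"event": "session_start", "session_id": "s1"},
    {"event": "turn_start", "session_id": "s1", "turn_id": "t1"},
    {"event": "turn_end", "session_id": "s1", "turn_id": "t1"},
    {"event": "turn_start", "session_id": "s1", "turn_id": "t2"},
    {"event": "compaction", "session_id": "s1"},
    {"event": "resume", "session_id": "s2", "parent_session_id": "s1"},
    {"event": "unknown_event", "session_id": "s1"},
    {"event": "turn_end"},
]
report = lifecycle_report(events)
```

The report is derived purely from recorded boundary events, which matches the idea that Neurain observes the host loop rather than owning it.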
|
|
266
|
+
|
|
267
|
+
### Phase 3d2 / E21: Native Lifecycle Automation Eval
|
|
268
|
+
|
|
269
|
+
- Status: read-only native lifecycle automation eval alpha shipped.
|
|
270
|
+
- `neurain lifecycle eval <folder> --fixture-size 100 --json` replays synthetic host lifecycle payloads from Claude Code, Codex, Runtime, and generic hosts through the real hook adapter inside isolated temporary roots.
|
|
271
|
+
- `neurain lifecycle eval <folder> --case-file lifecycle-cases.json --min-cases 6 --json` runs human-reviewed lifecycle payload cases.
|
|
272
|
+
- It gates host coverage (claude, codex, runtime, generic), lifecycle event coverage (session_start, turn_start, turn_end, review_due, compaction, resume, session_end), and ignored handling for malformed, unsupported, and missing-session payloads with no durable write.
|
|
273
|
+
- It reads back every durable artifact and proves that prompt bodies, transcript paths, tool stdout, tool stderr, secrets, and private payloads are never persisted, plus path-traversal containment, broad target-root non-write snapshots, and temp cleanup.
|
|
274
|
+
- It does not call models or external tools and does not write to the target root.
|
|
275
|
+
- MCP exposes read-only `neurain_lifecycle_eval` only. It does not expose lifecycle emit or any lifecycle write path.
|
|
276
|
+
- E21 is the lifecycle observation safety gate. It does not build the full background self-improvement loop and does not claim automatic background review, automatic lesson promotion, or automatic rollback. The next epic, E22 semantic recall quality, continues from here.
|
|
277
|
+
|
|
278
|
+
### Phase 5b / E22: Semantic Recall Quality
|
|
279
|
+
|
|
280
|
+
- Status: read-only local lexical-semantic recall alpha shipped on 2026-06-09.
|
|
281
|
+
- `neurain recall semantic-search <folder> <query> --json` adds a lexical-semantic layer (stemming, curated synonyms, fuzzy character-trigram) on top of exact-token recall so paraphrased queries are found.
|
|
282
|
+
- `neurain recall eval <folder> --semantic --fixture-size 60 --min-cases 50 --json` checks, on a synthetic paraphrase fixture, that semantic Hit@top beats the exact-token baseline by the gated margin (at least 0.2).
|
|
283
|
+
- `neurain recall eval <folder> --semantic --case-file semantic-recall-cases.json --min-cases 50 --json` runs reviewed semantic recall cases.
|
|
284
|
+
- The default `local-lexical` provider is deterministic, makes no model call, has no external dependency, needs no separate generated index (markdown stays canonical), and does not lock Neurain to any LLM. The embedding provider is swappable so a real vector model can be attached later without changing the canonical source.
|
|
285
|
+
- Gates: semantic Hit@top at or above 0.9 and at least exact baseline plus 0.2, source support, host isolation, private exclusion, no-answer abstention, rebuild equivalence, and target-root non-write.
|
|
286
|
+
- MCP exposes read-only `neurain_recall_semantic_search` and `neurain_recall_semantic_eval` only.
|
|
287
|
+
- Honest scope: this handles morphological variants, curated synonyms, and typos, not arbitrary neural concept similarity. The synthetic eval is a deterministic demonstration of the mechanism, not a real-user benchmark. Concepts outside the synonym map need a pluggable embedding provider, and live-user semantic recall proof remains future work.
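
The three ingredients named in this phase (stemming, curated synonyms, fuzzy character trigrams) can be sketched deterministically in a few lines of Python. Everything here is a simplified assumption: the suffix list, the two-entry synonym map, the 0.5 threshold, and the function names are illustrative and are not the `local-lexical` provider's actual rules.

```python
SYNONYMS = {"revert": "rollback", "undo": "rollback"}  # hypothetical curated map


def stem(token):
    """Strip a few common English suffixes (a deliberately crude stemmer)."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def normalize(text):
    """Lowercase, stem, and map synonyms to a canonical term."""
    stems = [stem(t) for t in text.lower().split()]
    return {SYNONYMS.get(t, t) for t in stems}


def trigrams(token):
    padded = f"  {token} "
    return {padded[i : i + 3] for i in range(len(padded) - 2)}


def fuzzy(a, b):
    """Jaccard similarity over character trigrams, useful for typos."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)


def score(query, document, fuzzy_threshold=0.5):
    """Fraction of query terms matched exactly or fuzzily in the document."""
    q_terms, d_terms = normalize(query), normalize(document)
    if not q_terms:
        return 0.0
    matched = sum(
        1
        for qt in q_terms
        if qt in d_terms or any(fuzzy(qt, dt) >= fuzzy_threshold for dt in d_terms)
    )
    return matched / len(q_terms)
```

For example, `score("reverted snapshots", "rollback snapshot restored")` matches both query terms through stemming plus the synonym map, while an exact-token comparison would match neither.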
|
|
288
|
+
|
|
289
|
+
### Phase 6 / E23: Live User Eval Pack (first increment)
|
|
290
|
+
|
|
291
|
+
- Status: first increments shipped on 2026-06-09: read-only live-content recall coverage and redacted live case scaffold.
|
|
292
|
+
- `neurain recall live-eval <folder>` measures how much of a real folder's own indexed content is recallable under auto-derived paraphrase queries (real terms swapped to synonyms), reporting hybrid coverage (the recommended exact-union-semantic strategy), exact-token coverage, semantic-only coverage, and per-kind coverage.
|
|
293
|
+
- Real-content finding: running live-eval on a real folder showed that semantic-only recall can recall less than exact-token recall (semantic 0.767 vs exact 0.814 on a real internal corpus), because the lexical-semantic layer lacks the rarity weighting that exact-token BM25 has. The E22 follow-on fix is hybrid recall (`recall hybrid-search`, MCP `neurain_recall_hybrid_search`), which returns the union of exact-token and semantic results, so it is never worse than exact-token and adds paraphrase catches (hybrid 0.86 vs exact 0.814 on the same corpus). For hybrid search, `--top` is the candidate depth for each branch, so the returned union can exceed `--top` when semantic-only catches exist. This is exactly the kind of regression that synthetic fixtures missed and real-content eval caught.
|
|
294
|
+
- It is read-only, returns metrics only with no stored content (safe on a private vault), and makes no model or external calls. The readiness gate requires that hybrid coverage is never worse than exact-token coverage.
|
|
295
|
+
- Purpose: move alpha evidence from synthetic fixtures toward real content, and give a real user a direct read on whether recall is good enough on their own material and what it misses.
|
|
296
|
+
- Honest scope and what remains: the queries are auto-derived, not human-judged relevance, and one folder is not three users. This reduces but does not remove the synthetic-only claim. Full E23 completion (reviewed live cases from at least three real work folders, redacted, wired into readiness) is a human step: real walkthroughs on real folders. The agent will not fabricate user proof.
|
|
297
|
+
- Storage-safety bridge shipped: `neurain live-cases scaffold <folder>` prepares a redacted reviewed-case pack scaffold with hash-only source refs and no raw source text or absolute paths. It reports `reviewed_live_user_evidence: false` and `human_judged: false` until a human fills safe local case files and runs the evals.
|
|
298
|
+
- Remaining E23 increments: answer and lesson live-eval against real folders, filled reviewed live case files from at least three real work folders, and readiness gates that require those reviewed receipts before external user evidence is claimed.
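
The hybrid strategy described in this phase (exact-token results unioned with semantic results, with `--top` applied per branch) can be sketched as below. This is an illustrative Python sketch; `hybrid_search` and the toy id lists are assumptions, not the shipped implementation.

```python
def hybrid_search(exact_ranked, semantic_ranked, top=5):
    """Union of the top results from each branch, exact-token results first."""
    merged = list(exact_ranked[:top])
    seen = set(merged)
    for doc_id in semantic_ranked[:top]:
        if doc_id not in seen:  # keep only semantic-only catches
            merged.append(doc_id)
            seen.add(doc_id)
    return merged


exact_ranked = ["a", "b", "c"]
semantic_ranked = ["c", "d", "e"]
results = hybrid_search(exact_ranked, semantic_ranked, top=3)
```

Because every exact-token result within the depth is kept, the union cannot recall less than the exact branch, and in this toy case it returns five ids for `top=3`, which shows why the result count can exceed `--top`.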
|
|
299
|
+
|
|
300
|
+
### Phase 6b / E24: Non-Developer Onboarding
|
|
301
|
+
|
|
302
|
+
- Status: read-only first-run onboarding alpha shipped on 2026-06-09.
|
|
303
|
+
- `neurain onboard <folder>` tells a new user whether the next action is `init`, `adopt --dry-run`, host connection, or `wrap --dry-run`.
|
|
304
|
+
- It reports what Neurain stores and does not store.
|
|
305
|
+
- It makes no model call, no external call, and no durable write.
|
|
306
|
+
- Readiness includes a non-developer onboarding dry-run gate.
|
|
307
|
+
|
|
308
|
+
### Phase 6c / E25: Installable CLI Readiness
|
|
309
|
+
|
|
310
|
+
- Status: publish-ready alpha CLI gates shipped on 2026-06-09.
|
|
311
|
+
- The package exposes the `neurain` binary and includes public docs, CI workflow, templates, license, and security doc.
|
|
312
|
+
- Full readiness verifies npm audit, npm pack dry-run, and temporary tarball install smoke.
|
|
313
|
+
- Honest scope: actual npm publish or package name reservation remains a release action, not an implementation claim.
|
|
314
|
+
|
|
315
|
+
### Phase 6d / E26: Thin MCP Connector
|
|
316
|
+
|
|
317
|
+
- Status: Codex, Claude Code, Gemini CLI, and Agent Runtime MCP surfaces shipped on 2026-06-09.
|
|
318
|
+
- `neurain connect codex <folder> --dry-run`, `neurain connect claude <folder> --dry-run`, and `neurain connect gemini <folder> --dry-run` print host CLI setup commands.
|
|
319
|
+
- `neurain connect runtime <folder> --dry-run` prints a bounded config snippet.
|
|
320
|
+
- Gemini uses a bounded read-first allowlist. Raw capture is not included in the Gemini allowlist by default.
|
|
321
|
+
- MCP exposes bounded read, scan, eval, live-case scaffold, and preview tools. It does not expose silent durable wiki writes, lifecycle emit, daemon run/stop, curator write, recall rebuild write, or lesson promotion.
|
|
322
|
+
- Gemini lifecycle automation is manual-only in alpha; explicit `lifecycle emit --host gemini` is the receipt path when needed.
|
|
323
|
+
|
|
324
|
+
### Phase 3e / E11: Continuous Daemon
|
|
325
|
+
|
|
326
|
+
- Status: user-started foreground continuous daemon alpha shipped.
|
|
327
|
+
- `neurain daemon run <folder> --interval-seconds 300 --json` repeats scheduler ticks until the process is stopped.
|
|
328
|
+
- `neurain daemon run <folder> --max-ticks 2 --json` provides a bounded proof path for tests and demos.
|
|
329
|
+
- `neurain daemon status <folder> --json` reads the last operational state.
|
|
330
|
+
- `neurain daemon stop <folder> --json` requests cooperative stop when a matching foreground daemon is running.
|
|
331
|
+
- Stop requests are checked during the sleep interval with about one-second polling.
|
|
332
|
+
- The daemon writes only `00_system/neurain/daemon-state.json` as operational state.
|
|
333
|
+
- It does not install background jobs, write durable wiki knowledge, promote lessons, call models, call external tools, store private paths in state, or expose MCP tools.
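
The foreground loop described in this phase (repeat ticks, optional tick bound, cooperative stop checked about once per second while sleeping) could look like the following. This is a hedged Python sketch: `run_daemon`, its parameters, and the injected `sleep` and `stop_requested` callables are hypothetical and chosen so the loop can be exercised without real waiting.

```python
import time


def run_daemon(tick, interval_seconds=300, max_ticks=None,
               stop_requested=lambda: False, sleep=time.sleep, poll_seconds=1.0):
    """Run ticks in the foreground until bounded out or cooperatively stopped."""
    ticks = 0
    while max_ticks is None or ticks < max_ticks:
        tick()
        ticks += 1
        if max_ticks is not None and ticks >= max_ticks:
            break
        # Sleep in short steps so a stop request is noticed within about a second.
        waited = 0.0
        while waited < interval_seconds:
            if stop_requested():
                return {"ticks": ticks, "stopped": "stop_requested"}
            step = min(poll_seconds, interval_seconds - waited)
            sleep(step)
            waited += step
    return {"ticks": ticks, "stopped": "max_ticks"}
```

A bounded demo run such as `run_daemon(my_tick, interval_seconds=3, max_ticks=2)` exits on its own, and nothing continues after the function returns, since no background job is installed.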
|
|
334
|
+
|
|
335
|
+
### Phase 4: Curator

- Status: snapshot-gated curator lifecycle alpha shipped.
- `neurain curator status <folder>` reports planned lifecycle changes without writing.
- `neurain curator run <folder> --dry-run` previews the same plan.
- `neurain curator run <folder> --confirm "1건 저장 진행"` writes a snapshot receipt and changes only lesson `Status` fields. The confirmation phrase is Korean, roughly "proceed with saving 1 item".
- `neurain curator rollback <folder> --receipt <receipt>` restores the exact previous registry snapshot when the registry has not drifted.
- Curator never deletes lessons and protects pinned, human-authored, private, and non-agent-created lessons.
- MCP exposes read-only `neurain_curator_status` and `neurain_curator_run_preview` only.

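The snapshot-then-rollback contract above can be illustrated with an in-memory registry. This is a sketch under stated assumptions: the lesson fields (`id`, `status`, `createdBy`, `pinned`, `private`), the receipt shape, and the JSON-string fingerprint are all hypothetical stand-ins for whatever the real registry and receipts use.

```javascript
// Hypothetical data shapes; the real registry and receipt formats may differ.
const clone = (value) => JSON.parse(JSON.stringify(value));

// Stand-in for a real content hash: a canonical JSON string.
const fingerprint = (registry) => JSON.stringify(registry);

function isProtected(lesson) {
  return lesson.pinned || lesson.private || lesson.createdBy !== "agent";
}

function curatorRun(registry, plannedStatus) {
  const snapshot = clone(registry); // taken before any change
  const next = registry.map((lesson) => {
    const status = plannedStatus[lesson.id];
    if (!status || isProtected(lesson)) return lesson;
    return { ...lesson, status }; // only the status field changes
  });
  // Lessons are never removed, so next.length === registry.length.
  const receipt = { snapshot, afterFingerprint: fingerprint(next) };
  return { registry: next, receipt };
}

function curatorRollback(registry, receipt) {
  // Refuse to restore if anything changed after the receipt was written.
  if (fingerprint(registry) !== receipt.afterFingerprint) {
    throw new Error("registry drifted since the receipt; refusing rollback");
  }
  return clone(receipt.snapshot);
}
```

Recording the post-write fingerprint in the receipt is what lets rollback detect drift: restoring an old snapshot over later edits would silently discard them.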
### Phase 5: Recall DB

- Status: optional SQLite FTS5 recall DB alpha shipped.
- `neurain recall status <folder>` reports cache state, row count, manifest hash, and markdown fallback availability without writing.
- `neurain recall rebuild <folder> --dry-run` previews indexed count without writing.
- `neurain recall rebuild <folder>` writes only `00_system/neurain/recall.sqlite` and a rebuild receipt.
- `neurain recall search <folder> <query>` searches the SQLite FTS5 cache when present and falls back to markdown when the cache is missing.
- `neurain recall verify <folder>` compares the current markdown, safe events, and safe receipts with the cache manifest.
- Markdown remains canonical, and the SQLite file is a rebuildable cache.
- Private paths, raw source bodies, secret-like content, instruction-injection content, and recall rebuild receipts are excluded from the index.
- MCP exposes read-only `neurain_recall_status`, `neurain_recall_rebuild_preview`, `neurain_recall_search`, `neurain_recall_verify`, and `neurain_recall_cross_host_eval`.
- MCP exposes read-only `neurain_answer_quality_eval` for answer-quality fixture regression.
- MCP exposes read-only `neurain_scheduler_eval` for scheduler trigger regression.
- `neurain recall eval <folder>` is shipped as a read-only alpha smoke eval for safe host-tagged journal events.
- `neurain recall eval <folder> --fixture-size 100 --private-probes 20` is shipped as a larger synthetic cross-host regression eval. It does not touch the target root, and it gates on exact-token host filtering, source-supporting snippets, host isolation, and private leakage.
- `neurain recall eval <folder> --case-file recall-cases.json` is shipped as a reviewed case-file eval. It does not touch the target root, and it gates on host filtering, source-supporting snippets, host isolation, and private leakage against human-reviewed recall cases.
- These recall eval modes validate recall plumbing, reviewed case handling, and privacy filters. They do not count as semantic recall quality evidence for the 90 percent goal by themselves.
- `neurain answer eval <folder> --fixture-size 120` is shipped as a read-only alpha fixture for answer faithfulness, citation accuracy, conflict surfacing, abstention, private boundaries, and stale-source handling.
- `neurain answer eval <folder> --case-file answer-cases.json` is shipped as a read-only reviewed case-file eval for answer faithfulness, citation accuracy, conflict surfacing, abstention, private boundaries, and stale-source handling.
- These answer eval modes validate policy gates and reviewed case handling, not live production answer quality. External user walkthroughs remain required before claiming full 90 percent answer quality.

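The relationship between canonical markdown and the rebuildable cache can be sketched with in-memory stand-ins. Everything here is an assumption made for illustration: an array of rows stands in for the SQLite FTS5 table, a sorted path-and-length string stands in for the manifest hash, and the `private` and `secretLike` flags stand in for the real exclusion filters.

```javascript
// Hypothetical in-memory stand-ins: a real cache would be SQLite FTS5.
function isIndexable(doc) {
  return !doc.private && !doc.secretLike;
}

// Stand-in manifest: sorted "path:length" entries for indexable docs.
function manifestOf(docs) {
  return docs
    .filter(isIndexable)
    .map((doc) => `${doc.path}:${doc.body.length}`)
    .sort()
    .join("|");
}

function rebuild(docs) {
  const rows = docs
    .filter(isIndexable)
    .map((doc) => ({ path: doc.path, text: doc.body.toLowerCase() }));
  return { rows, manifest: manifestOf(docs) };
}

// True when the cache still matches the current canonical content.
function verify(cache, docs) {
  return cache !== null && cache.manifest === manifestOf(docs);
}

function search(cache, docs, query) {
  const needle = query.toLowerCase();
  if (cache) {
    const hits = cache.rows.filter((row) => row.text.includes(needle));
    return { source: "cache", hits: hits.map((row) => row.path) };
  }
  // Markdown stays canonical, so a missing cache only changes the source.
  const hits = docs
    .filter(isIndexable)
    .filter((doc) => doc.body.toLowerCase().includes(needle));
  return { source: "markdown", hits: hits.map((doc) => doc.path) };
}
```

Because the same exclusion filter runs on both paths, deleting the cache changes where results come from but not which documents are eligible to appear.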
### Phase 6: Runtime Connector

- Status: Runtime MCP config-preview connector alpha shipped.
- `neurain connect runtime <folder> --dry-run` prints a bounded MCP server snippet for a host-managed config.
- The snippet uses Neurain as a local stdio MCP server with a small read-heavy tool allowlist.
- The alpha does not edit Runtime config directly.
- Neurain adoption should not require Runtime.
- Deeper Runtime lifecycle-event connector automation remains planned.

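The idea of a bounded, preview-only snippet with a tool allowlist can be sketched as follows. The actual Runtime allowlist and snippet format are not given in this section, so this is only an assumption-laden illustration: the tool names are the read-only MCP tools mentioned in this document, while `previewRuntimeSnippet` and the `transport` and `allowedTools` keys are hypothetical.

```javascript
// Tool names are taken from this document; the snippet shape is assumed.
const READ_ONLY_TOOLS = new Set([
  "neurain_curator_status",
  "neurain_curator_run_preview",
  "neurain_recall_status",
  "neurain_recall_rebuild_preview",
  "neurain_recall_search",
  "neurain_recall_verify",
  "neurain_recall_cross_host_eval",
  "neurain_answer_quality_eval",
  "neurain_scheduler_eval",
]);

function previewRuntimeSnippet(requestedTools) {
  const allowed = requestedTools.filter((name) => READ_ONLY_TOOLS.has(name));
  const rejected = requestedTools.filter((name) => !READ_ONLY_TOOLS.has(name));
  return {
    // Returned for the user to paste; nothing is written to host config.
    snippet: { transport: "stdio", allowedTools: allowed },
    rejected,
  };
}
```

Returning the rejected names alongside the snippet makes the boundary visible to the user instead of silently dropping tools.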
## Hard Gates Before Claiming 90 Percent

- No product package leakage.
- No secret or prompt-injection lesson enters prompt context.
- No source, receipt, log, handoff, or recall-index instruction injection is executed or promoted.
- No private-area fact or secret-like string appears in global or cross-host recall fixtures.
- No global promotion from private or area-only source.
- Rollback removes only receipt-listed writes.
- Curator has dry-run, snapshot, rollback, and pinned protection.
- Recall DB can be deleted and rebuilt from canonical markdown and receipts.
- Recall DB rebuild equivalence and markdown-only fallback both pass synthetic tests.
- Background review has both trigger precision and trigger recall evidence.
- Background review trigger eval passes synthetic and reviewed case-file gates.
- User-perceived automation has non-developer walkthrough evidence.
- Claude MAX EFFORT review returns PASS or PASS-WITH-CHANGES with blocking findings resolved.
- Product tests and readiness checks pass after implementation.

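The trigger precision and trigger recall gate above rests on the standard definitions: precision is the share of fired triggers that should have fired, and recall is the share of expected triggers that actually fired. A minimal sketch of computing both from reviewed cases, assuming a hypothetical case shape of `{ expected, actual }` booleans:

```javascript
// Hypothetical case shape: { expected: boolean, actual: boolean }.
function triggerMetrics(cases) {
  let truePositives = 0;
  let falsePositives = 0;
  let falseNegatives = 0;
  for (const { expected, actual } of cases) {
    if (actual && expected) truePositives += 1;
    else if (actual && !expected) falsePositives += 1;
    else if (!actual && expected) falseNegatives += 1;
  }
  const fired = truePositives + falsePositives;
  const wanted = truePositives + falseNegatives;
  return {
    // null marks an undefined ratio rather than pretending it is 0 or 1.
    precision: fired === 0 ? null : truePositives / fired,
    recall: wanted === 0 ? null : truePositives / wanted,
  };
}
```

Reporting both numbers matters because either one can be gamed alone: a trigger that never fires has no false positives, and one that always fires has no false negatives.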
## Current Status

The current alpha has implemented the first safe slice:

- lesson list
- event journal add/list/verify
- lesson candidates
- recap
- wrap dry-run
- capabilities
- read-only watch report
- read-only review worker report
- read-only scheduler tick
- scheduler trigger eval alpha with synthetic and reviewed case-file modes
- bounded foreground scheduler monitor
- user-started foreground continuous daemon
- append-only lifecycle emit
- read-only lifecycle lineage report
- Claude Code lifecycle hook preview
- snapshot-gated curator lifecycle
- optional SQLite FTS5 recall DB
- E22 local lexical-semantic recall
- hybrid recall
- E23 live-content recall coverage
- E23 redacted live-case scaffold
- E24 read-only onboarding
- E25 publish-ready CLI gates
- E26 Codex, Claude Code, Gemini CLI, and Agent Runtime MCP connector surface
- cross-host recall eval smoke
- larger 100-case cross-host recall regression eval
- answer-quality eval alpha with synthetic and reviewed case-file modes
- narrow CLI promotion
- rollback receipt
- MCP preview-only boundary

Claude Code connector-specific lifecycle hook preview is shipped in alpha. It maps SessionStart, UserPromptSubmit, Stop, and SessionEnd into Neurain lifecycle receipts without storing prompt bodies, transcript paths, or success stdout in model context.
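
The mapping from Claude Code hook events to receipts can be sketched as an explicit field allowlist. This is an illustration only: `hook_event_name`, `transcript_path`, and `prompt` are fields of the Claude Code hook input, but `toLifecycleReceipt` and the receipt shape (`host`, `event`, `at`) are hypothetical and not taken from the Neurain source.

```javascript
// Hypothetical receipt shape; only the four event names come from the text.
const LIFECYCLE_EVENTS = new Set([
  "SessionStart",
  "UserPromptSubmit",
  "Stop",
  "SessionEnd",
]);

function toLifecycleReceipt(hookInput, now = new Date()) {
  if (!LIFECYCLE_EVENTS.has(hookInput.hook_event_name)) return null;
  // Build the receipt from named fields only, so prompt bodies and
  // transcript paths in hookInput can never be copied by accident.
  return {
    host: "claude-code",
    event: hookInput.hook_event_name,
    at: now.toISOString(),
  };
}
```

Constructing the receipt from a fixed set of fields, rather than copying the input and deleting sensitive keys, means a new sensitive field added to the hook input later is excluded by default.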

The Codex lifecycle hook preview is now shipped in alpha for `.codex/hooks.json` SessionStart and Edit or Write PostToolUse review markers. The Runtime host-proxy lifecycle contract is now shipped in alpha for agent loops that can forward direct lifecycle boundary events. Deeper native Runtime lifecycle-event automation remains future work because Neurain does not own the Runtime agent loop.