npm - llm-wiki-kit - Versions diffs - 0.2.13 → 0.2.15 - Mend

llm-wiki-kit 0.2.13 → 0.2.15

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (20)

package/README.md +12 -4
package/docs/concepts.md +14 -9
package/docs/integrations/claude-code.md +4 -2
package/docs/integrations/codex.md +4 -2
package/docs/manual.md +51 -3
package/docs/security.md +4 -0
package/package.json +1 -1
package/src/cli.js +52 -2
package/src/constants.js +2 -0
package/src/evidence.js +128 -0
package/src/maintenance.js +127 -22
package/src/project.js +5 -2
package/src/templates.js +20 -0
package/src/update.js +3 -1
package/src/wiki-eval.js +110 -0
package/src/wiki-export.js +214 -0
package/src/wiki-lint.js +66 -1
package/src/wiki-model.js +2 -0
package/src/wiki-search.js +79 -13
package/src/wiki-visibility.js +34 -4

package/README.md CHANGED

@@ -85,7 +85,9 @@ llm-wiki/
 ├── outputs/
 │   ├── questions/
 │   ├── reports/
+│   ├── exports/
 │   └── maintenance/
+├── evals/
 └── procedures/
 ```
@@ -102,6 +104,7 @@ The installed hooks:
 - remove Codex-facing legacy `oh-my-codex:wiki`/`omx_wiki` surfaces at session start so `llm-wiki/` remains the active wiki implementation
 - record small redacted raw event envelopes and per-turn state
 - capture meaningful work and structured decision points, including tool evidence, changed files, and verification notes
+- attach safe `evidence_refs` candidates to generated durable candidates when changed files or verification commands are available
 - before compaction, classify the current turn and save a redacted checkpoint only for meaningful work, structured decisions, or explicit durable requests; explicit durable candidates also get a maintenance queue item when no durable wiki update is detected
 - after compaction, store the redacted compact summary only; if pre-compact preservation failed, prepare a recovery packet for the next legal model-visible context hook
 - allow tool calls to proceed without secret/PII-based hook blocking
@@ -123,7 +126,7 @@ Most users should not need these during daily Claude Code/Codex work. They exist
 - Install/update: `llm-wiki install`, `llm-wiki update`, `llm-wiki post-update`, `llm-wiki projects`
 - Diagnostics: `llm-wiki doctor`, `llm-wiki status`, `llm-wiki version`
 - Manual: `llm-wiki manual`
-- Agent maintenance helpers: `llm-wiki context`, `llm-wiki lint`, `llm-wiki consolidate`, `llm-wiki maintenance`
+- Agent maintenance helpers: `llm-wiki context`, `llm-wiki lint`, `llm-wiki consolidate`, `llm-wiki maintenance`, `llm-wiki eval`, `llm-wiki export`
 - Live Q&A archive helper: `llm-wiki archive-questions --workspace <project> [--date YYYY-MM-DD] [--dry-run]`
 - Cleanup: `llm-wiki uninstall`
@@ -139,13 +142,17 @@ Installed npm runtimes also perform a cached update notice check from hooks whil
 `llm-wiki post-update --workspace <project>` reapplies the current runtime's hook entries and safe managed template updates without running `npm install -g`. Use `post-update --all --workspace <search-root>` to reapply templates across discovered project roots.
-`llm-wiki context "<query>"` prints the full debug view of the layered context sources used by hooks. Hook injection may render those sources as functional compact context for Codex and Claude, but this CLI stays verbose so maintainers can inspect retrieval, snippets, memory, index, expansion behavior, and context budget metadata. Daily use should rely on hook injection. By default, episodic `wiki/queries/`, `wiki/context/`, and `session-log` pages are excluded from search unless they were promoted with `memory_type: semantic` or `procedural` and `importance >= 4`; use `--include-episodic` only when debugging old automatic records. Archived or superseded pages are hidden unless `--include-archived` is requested, while stale pages remain searchable with lower score.
+`llm-wiki context "<query>"` prints the full debug view of the layered context sources used by hooks. Hook injection may render those sources as functional compact context for Codex and Claude, but this CLI stays verbose so maintainers can inspect retrieval, snippets, memory, index, expansion behavior, context budget metadata, `rankReason`, `matchedFields`, `scoreBreakdown`, `visibilityReason`, and `evidenceRefs`. The text formatter adds a short `why selected` line for each hit; hook compact context deliberately omits that extra detail. Daily use should rely on hook injection. By default, episodic `wiki/queries/`, `wiki/context/`, and `session-log` pages are excluded from search unless they were promoted with `memory_type: semantic` or `procedural` and `importance >= 4`; use `--include-episodic` only when debugging old automatic records. Archived or superseded pages are hidden unless `--include-archived` is requested, while stale pages remain searchable with lower score.
-`llm-wiki lint` checks wiki health and detects outdated managed rules from older kit versions. It also warns when `memory.md` is near budget, wiki page count nears the search cap, hidden episodic/context pages accumulate, or stale/archived pages lack supersession/link discoverability. Agents may use it before/after meaningful wiki maintenance.
+`llm-wiki lint` checks wiki health and detects outdated managed rules from older kit versions. It validates optional `evidence_refs` entries with the prefixes `file:`, `cmd:`, `raw:`, and `url:`; unsafe paths, unsupported prefixes, credential-bearing URLs, command secrets, and secret-like values are reported as errors, while missing local evidence targets are warnings. It also warns when `memory.md` is near budget, wiki page count nears the search cap, hidden episodic/context pages accumulate, or stale/archived pages lack supersession/link discoverability. Agents may use it before/after meaningful wiki maintenance.
 `llm-wiki consolidate` refreshes only generated marker blocks in `wiki/memory.md` and `wiki/index.md`. Generated maps keep durable non-archived pages, hide default episodic records, skip stale/archived/superseded pages, and report those counts in dry-run output. It is an agent maintenance helper, not a command users should run after every turn.
-`llm-wiki maintenance` prints the pending queue and review due status from `llm-wiki/outputs/maintenance/queue.md`. Hooks create only selective candidates; the active agent should merge reusable items into existing durable wiki pages and mark queue items `done` or `skipped` without delaying unrelated user answers. Periodic maintenance is a soft agent-side reminder, not a user command loop.
+`llm-wiki maintenance` prints the queue and review due status from `llm-wiki/outputs/maintenance/queue.md`. Queue states are `pending -> approved -> done` or `skipped`. Use `llm-wiki maintenance --workspace <project> --approve <id> --target <wiki/...md>` when durable promotion is approved, `--done <id> --target <wiki/...md>` after the active agent has merged the fact into a durable page, and `--skip <id> [--note "..."]` for duplicate or non-durable candidates. Approved items are shown before pending items in hook reminders. Periodic maintenance is a soft agent-side reminder, not a user command loop.
+`llm-wiki eval --workspace <project> [--fixture <path>] [--limit 5] [--json]` runs retrieval fixtures from `llm-wiki/evals/retrieval.json` by default. If the fixture is absent, it exits successfully with `no fixture found`. Fixtures list `query`, `expected`, and `unexpected` paths; output reports recall, missed expected hits, unexpected hits, and top hits using the same durable visibility policy as export.
+`llm-wiki export --workspace <project> [--format all|llms|llms-full|json] [--output <dir>] [--dry-run] [--json]` writes durable wiki manifests under `llm-wiki/outputs/exports/` by default. `llms.txt` is an agent onboarding and handoff manifest, not a passive SEO artifact. `llms-full.txt` is a redacted durable context bundle for compaction recovery or handoff. `llm-wiki.json` is a structured manifest for future adapters and eval tooling. Export uses the same durable visibility policy as search/eval and redacts credentials before writing.
 `llm-wiki archive-questions` splits older legacy `llm-wiki/outputs/questions/YYYY-MM-DD-live-qa.md` files into the chunked `llm-wiki/outputs/questions/YYYY-MM-DD/` layout. It preserves the original under `outputs/questions/archive/originals/` with a SHA-256 sidecar and replaces the legacy file with a short pointer stub. Use `--dry-run` first when reviewing a large archive.
@@ -192,6 +199,7 @@ llm-wiki hook claude Stop
 - PreCompact may read a small bounded transcript tail to create a redacted checkpoint, but it does not store the full transcript or raw `transcript_path`.
 - Tool calls are not blocked only because inputs look sensitive.
 - Authentication values such as tokens, passwords, and private keys are redacted before durable summaries are written.
+- Generated exports are redacted and must not contain npm tokens, WinRM credentials, private keys, raw `.env`, or full raw transcripts.
 - Hook payloads are stored only as redacted event envelopes.
 - Phone numbers, emails, dates, and business identifiers are preserved by default so the wiki remains useful for local work.
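
The `evidence_refs` rules added to the README can be illustrated with a small validator. This is a minimal JavaScript sketch, not llm-wiki-kit's `src/evidence.js`. Only two things come from the docs: the four prefixes (`file:`, `cmd:`, `raw:`, `url:`) and the split between errors and warnings. Everything else is assumed for illustration: the function name `checkEvidenceRef`, the `exists` callback, the secret-detection regexes, the empty-`raw:` error, and the 200-character command limit.

```javascript
// Hypothetical evidence_refs checker. The four prefixes and the
// error/warning split follow the docs; everything else is assumed.
const ALLOWED_PREFIXES = new Set(["file", "cmd", "raw", "url"]);

// Illustrative secret heuristics only (assumption, not the kit's rules).
const SECRET_LIKE = [
  /-----BEGIN [A-Z ]*PRIVATE KEY-----/,
  /\bnpm_[A-Za-z0-9]{36}\b/,
  /\b(password|passwd|token|secret|api[_-]?key)\s*[=:]\s*\S+/i,
];

const MAX_COMMAND_LENGTH = 200; // assumed limit for a "short" command

// exists(prefix, value) reports whether a local file:/raw: target is present.
function checkEvidenceRef(ref, exists = () => true) {
  const errors = [];
  const warnings = [];
  const sep = ref.indexOf(":");
  const prefix = sep > 0 ? ref.slice(0, sep) : "";
  const value = sep > 0 ? ref.slice(sep + 1) : "";

  if (!ALLOWED_PREFIXES.has(prefix)) {
    errors.push(`unsupported prefix: ${ref}`);
    return { errors, warnings };
  }
  if (SECRET_LIKE.some((pattern) => pattern.test(value))) {
    errors.push("secret-like value");
  }

  if (prefix === "file") {
    // Repo-relative only: no absolute paths, drive letters, or ".." segments.
    const segments = value.split(/[\\/]/);
    const unsafe =
      value === "" ||
      /^[\\/]/.test(value) ||
      /^[A-Za-z]:/.test(value) ||
      segments.includes("..");
    if (unsafe) errors.push("unsafe file path");
    else if (!exists(prefix, value)) warnings.push("missing local target");
  } else if (prefix === "raw") {
    if (value === "") errors.push("empty raw reference");
    else if (!exists(prefix, value)) warnings.push("missing local target");
  } else if (prefix === "cmd") {
    if (value.trim() === "" || /[\r\n]/.test(value) || value.length > MAX_COMMAND_LENGTH) {
      errors.push("command must be a short single line");
    }
  } else if (prefix === "url") {
    let url;
    try {
      url = new URL(value);
    } catch {
      errors.push("invalid url");
      return { errors, warnings };
    }
    if (url.protocol !== "http:" && url.protocol !== "https:") errors.push("url must use http or https");
    if (url.username || url.password) errors.push("credential-bearing url");
  }
  return { errors, warnings };
}

// Usage with the example refs from the manual.
for (const ref of ["file:src/wiki-search.js", "cmd:node --test", "raw:source-id", "url:https://example.com/reference"]) {
  console.log(ref, checkEvidenceRef(ref));
}
```

Passing `exists` as a callback keeps the sketch free of filesystem access. A real check would resolve `file:` paths against the project root and `raw:` IDs against the raw source store.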

package/docs/concepts.md CHANGED

@@ -19,11 +19,13 @@ The important behavior is a loop:
 3. The user works normally; no extra command loop is required.
 4. Hooks gather redacted prompt/tool/result summaries.
 5. At stop/session end, hooks append redacted chunked live Q&A only for turns with work evidence or structured decision/debugging conclusions.
-6. Simple answers, status checks, and keyword-only responses stay out of live Q&A and durable wiki by default.
-7. Durable wiki promotion is selective: explicit record/document requests should be handled by the active agent in existing wiki pages; the hook queues review only when such a request was not reflected in durable files.
-8. At the next start/prompt after an abrupt shutdown, hooks can recover stale turn state into `outputs/maintenance/queue.md`.
-9. When reusable knowledge appears, the active Claude Code/Codex agent folds approved facts into existing durable wiki pages instead of leaving everything as one-off Q&A.
-10. Future sessions start from the improved wiki instead of relying on long chat history.
+6. When possible, generated candidates carry safe `evidence_refs` such as changed files, verification commands, raw source IDs, or external URLs.
+7. Simple answers, status checks, and keyword-only responses stay out of live Q&A and durable wiki by default.
+8. Durable wiki promotion is selective: explicit record/document requests should be handled by the active agent in existing wiki pages; the hook queues review only when such a request was not reflected in durable files.
+9. At the next start/prompt after an abrupt shutdown, hooks can recover stale turn state into `outputs/maintenance/queue.md`.
+10. When reusable knowledge appears, the active Claude Code/Codex agent folds approved facts into existing durable wiki pages instead of leaving everything as one-off Q&A.
+11. Export and eval reuse the same durable visibility policy so handoff manifests, retrieval fixtures, and context selection describe the same wiki surface.
+12. Future sessions start from the improved wiki instead of relying on long chat history.
 The kit is a template/runtime repository. It must not centralize project wiki contents.
@@ -40,8 +42,11 @@ The maintenance loop is intentionally layered:
 - `memory.md`: short hot index for current durable facts.
 - `index.md`: broad navigation map.
-- MiniSearch + wikilinks: retrieval over durable `wiki/**/*.md`, with episodic `wiki/queries/`, `wiki/context/`, and `session-log` pages hidden by default unless promoted or `--include-episodic` is requested; archived/superseded pages stay preserved but hidden unless `--include-archived` is requested.
-- `outputs/maintenance/queue.md`: selective reminders for explicit durable requests that need review, plus stale turn recovery.
-- `lint`: finds broken links, stale pages, duplicates, metadata gaps, secret-like content, outdated managed rules, memory/page-count budget pressure, hidden episodic growth, and stale/archived discoverability gaps.
-- `maintenance`: reports `reviewDue` only when periodic thresholds are met; hook reminders are soft and limited to session start/instructions loaded or maintenance-related prompts.
+- MiniSearch + wikilinks: retrieval over durable `wiki/**/*.md`, with episodic `wiki/queries/`, `wiki/context/`, and `session-log` pages hidden by default unless promoted or `--include-episodic` is requested; archived/superseded pages stay preserved but hidden unless `--include-archived` is requested. Verbose context explains `why selected`; hook context stays compact.
+- `evidence_refs`: optional frontmatter that ties durable claims to `file:`, `cmd:`, `raw:`, or `url:` evidence without embedding secrets or raw transcripts.
+- `outputs/maintenance/queue.md`: selective reminders for explicit durable requests that need review, plus stale turn recovery. Queue state is `pending`, `approved`, `done`, or `skipped`.
+- `lint`: finds broken links, stale pages, duplicates, metadata gaps, invalid evidence refs, secret-like content, outdated managed rules, memory/page-count budget pressure, hidden episodic growth, and stale/archived discoverability gaps.
+- `maintenance`: reports `reviewDue` only when periodic thresholds are met; hook reminders are soft and limited to session start/instructions loaded or maintenance-related prompts, with approved items shown before pending items.
 - `consolidate`: agent helper that refreshes generated blocks in `memory.md` and `index.md` while preserving handwritten notes, keeping default query/context/session pages out of the durable generated maps, and skipping stale/archived/superseded pages.
+- `eval`: checks retrieval fixtures in `llm-wiki/evals/retrieval.json` and reports expected recall, missed expected paths, unexpected hits, and top hits.
+- `export`: writes redacted `llms.txt`, `llms-full.txt`, and `llm-wiki.json` manifests for agent onboarding, handoff, retrieval eval, and external consumption. `llms.txt` is not treated as a passive SEO artifact.
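
The queue lifecycle described in these docs (`pending -> approved -> done`, or `skipped`) behaves like a small state machine. The following JavaScript sketch is hypothetical. The names `transition` and `reminderOrder` and the item shape `{ id, state, target, note }` are illustrative. The sketch only permits the documented path, plus `skipped` from either open state. Whether the real CLI accepts other transitions, such as `pending` straight to `done`, is not stated in the diff.

```javascript
// Hypothetical model of the maintenance queue lifecycle:
// pending -> approved -> done, with skipped closing either open state.
const TRANSITIONS = new Map([
  ["pending", ["approved", "skipped"]],
  ["approved", ["done", "skipped"]],
  ["done", []],
  ["skipped", []],
]);

const WIKI_TARGET = /^wiki\/.+\.md$/;

function transition(item, next, { target, note } = {}) {
  const allowed = TRANSITIONS.get(item.state) ?? [];
  if (!allowed.includes(next)) {
    throw new Error(`cannot move ${item.id} from ${item.state} to ${next}`);
  }
  // --approve and --done are documented with a --target <wiki/...md> argument.
  if ((next === "approved" || next === "done") && !WIKI_TARGET.test(target ?? "")) {
    throw new Error(`${next} requires a wiki/...md target`);
  }
  return { ...item, state: next, target: target ?? item.target, note: note ?? item.note };
}

// Hook reminders list approved items before pending ones; closed items are dropped.
const REMINDER_ORDER = ["approved", "pending"];

function reminderOrder(items) {
  return items
    .filter((item) => REMINDER_ORDER.includes(item.state))
    .sort((a, b) => REMINDER_ORDER.indexOf(a.state) - REMINDER_ORDER.indexOf(b.state));
}

// Usage: approve, then mark done after the agent merges the fact.
let item = { id: "q-001", state: "pending" };
item = transition(item, "approved", { target: "wiki/concepts/topic.md" });
item = transition(item, "done", { target: "wiki/concepts/topic.md" });
console.log(item.state);
```

Because `Array.prototype.sort` is stable in current Node.js, items within the same state keep their queue order.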

package/docs/integrations/claude-code.md CHANGED

@@ -42,11 +42,13 @@ when no project `CLAUDE.md` exists. Existing `CLAUDE.md` files are not overwritt
 The hook records redacted turn summaries but does not deny tool calls only because an input looks sensitive. Hook payloads are stored as small redacted event envelopes rather than full transcripts, and context output is redacted field by field before it is returned to Claude Code.
-At `SessionStart`/`InstructionsLoaded`, the hook first attempts a safe managed-template refresh, recovers stale turn state into `outputs/maintenance/queue.md`, performs a cached npm update notice check for npm installs, then injects functional compact context. The context still uses `llm-wiki/wiki/memory.md`, `llm-wiki/wiki/index.md`, relevant wiki/search state, operating rules, maintenance signals, passive runtime update status, and managed-template cleanup notes; the hook formats those signals so they are usable if shown in the Claude Code UI. At `UserPromptSubmit`, it recovers stale turn state, searches wiki pages with MiniSearch or substring fallback, expands one-hop wikilinks, redacts context fields, performs the same cached update notice check, and injects the smallest useful functional compact context set. Update notice cache is scoped by npm command, and maintenance reminders are shown only when the prompt is wiki/maintenance related or matches a queue topic.
+At `SessionStart`/`InstructionsLoaded`, the hook first attempts a safe managed-template refresh, recovers stale turn state into `outputs/maintenance/queue.md`, performs a cached npm update notice check for npm installs, then injects functional compact context. The context still uses `llm-wiki/wiki/memory.md`, `llm-wiki/wiki/index.md`, relevant wiki/search state, operating rules, maintenance signals, passive runtime update status, and managed-template cleanup notes; the hook formats those signals so they are usable if shown in the Claude Code UI. At `UserPromptSubmit`, it recovers stale turn state, searches wiki pages with MiniSearch or substring fallback, expands one-hop wikilinks, redacts context fields, performs the same cached update notice check, and injects the smallest useful functional compact context set. Verbose `llm-wiki context` can explain `why selected`, `rankReason`, `matchedFields`, and `evidenceRefs`, but hook context keeps those details compact. Update notice cache is scoped by npm command, and maintenance reminders are shown only when the prompt is wiki/maintenance related or matches a queue topic.
 Hook-visible language is selected from the current user prompt first. Korean prompts get Korean guidance, English prompts get English guidance. If no prompt language is clear, the hook checks Claude Code `settings.json` `language` when it exists, then local `CLAUDE.md`/`AGENTS.md` language signals, then English. The kit does not require Claude Code to expose a language setting.
-`PostToolUse` and `PostToolBatch` record redacted tool summaries in the same turn buffer. `PreCompact` classifies the current turn before compaction: simple turns record only a context note, work-evidence or structured-decision turns write a chunked live Q&A checkpoint, and explicit durable candidates write a maintenance queue item only when no durable wiki update is detected. The checkpoint can include only a bounded redacted transcript tail, never the full raw transcript or raw `transcript_path`. Compaction is not blocked; if checkpoint storage fails, the hook records a compact recovery packet for the next legal context-injection event. `PostCompact` stores the redacted compact summary as a context note and prepares any pending recovery packet without returning model-visible context directly. In the default `answer-first` mode, `SubagentStop` does not create live Q&A, query, decision, or maintenance files. `Stop` and `SessionEnd` append chunked live Q&A only for work-evidence or structured-decision turns and do not auto-create `wiki/queries/` or `wiki/decisions/`. If the user explicitly asked to record or document durable knowledge and no durable wiki update is detected, `Stop`/`SessionEnd` queue a pending maintenance item for agent review. `Stop` and `SessionEnd` then clear the per-session turn buffer; `SubagentStop` does not.
+`PostToolUse` and `PostToolBatch` record redacted tool summaries in the same turn buffer. `PreCompact` classifies the current turn before compaction: simple turns record only a context note, work-evidence or structured-decision turns write a chunked live Q&A checkpoint, and explicit durable candidates write a maintenance queue item only when no durable wiki update is detected. Queue items may carry safe `evidence_refs` candidates from changed files and verification commands. The checkpoint can include only a bounded redacted transcript tail, never the full raw transcript or raw `transcript_path`. Compaction is not blocked; if checkpoint storage fails, the hook records a compact recovery packet for the next legal context-injection event. `PostCompact` stores the redacted compact summary as a context note and prepares any pending recovery packet without returning model-visible context directly. In the default `answer-first` mode, `SubagentStop` does not create live Q&A, query, decision, or maintenance files. `Stop` and `SessionEnd` append chunked live Q&A only for work-evidence or structured-decision turns and do not auto-create `wiki/queries/` or `wiki/decisions/`. If the user explicitly asked to record or document durable knowledge and no durable wiki update is detected, `Stop`/`SessionEnd` queue a pending maintenance item for agent review. Approved maintenance items are shown before pending items in later reminders. `Stop` and `SessionEnd` then clear the per-session turn buffer; `SubagentStop` does not.
+For handoff or retrieval verification, use `llm-wiki export --workspace <project> --format all` and `llm-wiki eval --workspace <project>`. The generated `llms.txt`/`llms-full.txt`/`llm-wiki.json` files are redacted durable manifests, not raw transcripts.
 Set `LLM_WIKI_KIT_AUTO_PROJECT_UPDATE=0` only while diagnosing automatic managed-template refresh behavior.
 Set `LLM_WIKI_KIT_UPDATE_NOTICE=0` only while suppressing the cached passive runtime update status.
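
The Claude Code integration doc defines a language fallback chain for hook-visible guidance. It checks the current prompt, then the `settings.json` `language` value, then `CLAUDE.md`/`AGENTS.md` signals, and finally falls back to English. That chain can be sketched as a series of nullable lookups. This is an assumption-laden illustration, not the kit's implementation:

- Hangul detection via Unicode ranges, the two-letter Latin-word check, and the accepted `language` values are guesses.
- `selectHookLanguage` is a hypothetical name.

Per its integration doc, Codex skips the `settings.json` step.

```javascript
// Hypothetical sketch of hook language selection for Claude Code:
// prompt -> settings.json "language" -> CLAUDE.md/AGENTS.md -> English.
const HANGUL = /[\u1100-\u11FF\u3130-\u318F\uAC00-\uD7A3]/;
const LATIN_WORD = /[A-Za-z]{2,}/;

// Crude detector (assumption): any Hangul means Korean, otherwise Latin words mean English.
function detectLanguage(text) {
  if (typeof text !== "string" || text.trim() === "") return null;
  if (HANGUL.test(text)) return "ko";
  if (LATIN_WORD.test(text)) return "en";
  return null; // no clear signal, e.g. only digits or punctuation
}

// Assumed mapping for a settings.json "language" value such as "korean" or "en-US".
function fromSettings(language) {
  if (typeof language !== "string") return null;
  const value = language.trim().toLowerCase();
  if (value.startsWith("ko")) return "ko";
  if (value.startsWith("en")) return "en";
  return null;
}

function selectHookLanguage({ prompt, settingsLanguage, instructionText } = {}) {
  return (
    detectLanguage(prompt) ??
    fromSettings(settingsLanguage) ??
    detectLanguage(instructionText) ??
    "en"
  );
}

console.log(selectHookLanguage({ prompt: "위키를 업데이트해 줘" }));
console.log(selectHookLanguage({ prompt: "123", settingsLanguage: "korean" }));
```

Returning `null` for "no clear signal" is what lets each later source take over. A prompt with any recognizable language always wins over settings and instruction files.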

package/docs/integrations/codex.md CHANGED

@@ -30,18 +30,20 @@ Handled events:
 Expected behavior:
 - `SessionStart` first attempts a safe managed-template refresh, removes Codex-facing legacy `oh-my-codex:wiki`/`omx_wiki` surfaces when they reappear, recovers stale turn state into `outputs/maintenance/queue.md`, performs a cached npm update notice check for npm installs, then injects functional compact context. The context still uses `llm-wiki/wiki/memory.md`, `llm-wiki/wiki/index.md`, relevant wiki/search state, operating rules, maintenance signals, passive runtime update status, and managed-template cleanup notes; the hook formats those signals so they are usable if shown in the Codex UI.
-- `UserPromptSubmit` recovers stale turn state, searches project wiki pages with MiniSearch or substring fallback, expands one-hop wikilinks, redacts context fields, performs the same cached update notice check, and injects the smallest useful functional compact context set. Update notice cache is scoped by npm command, and maintenance reminders are shown only when the prompt is wiki/maintenance related or matches a queue topic.
+- `UserPromptSubmit` recovers stale turn state, searches project wiki pages with MiniSearch or substring fallback, expands one-hop wikilinks, redacts context fields, performs the same cached update notice check, and injects the smallest useful functional compact context set. Verbose `llm-wiki context` can explain `why selected`, `rankReason`, `matchedFields`, and `evidenceRefs`, but hook context keeps those details compact. Update notice cache is scoped by npm command, and maintenance reminders are shown only when the prompt is wiki/maintenance related or matches a queue topic.
 - Hook-visible language is selected from the current user prompt first. Korean prompts get Korean guidance, English prompts get English guidance. If no prompt language is clear, Codex falls back to local `CLAUDE.md`/`AGENTS.md` language signals, then English.
 - `PreToolUse` records redacted tool summaries without blocking tool calls.
 - `PostToolUse` records redacted tool summaries in a turn buffer.
 - `PreCompact` classifies the current turn before compaction. Simple turns record only a context note; work-evidence or structured-decision turns write a chunked live Q&A checkpoint; explicit durable candidates write a maintenance queue item only when no durable wiki update is detected. The checkpoint can include only a bounded redacted transcript tail, never the full raw transcript or raw `transcript_path`. Compaction is not blocked; if checkpoint storage fails, the hook records a compact recovery packet for the next legal context-injection event.
 - `PostCompact` stores the redacted compact summary as a context note and prepares any pending compact recovery packet. It does not return `hookSpecificOutput.additionalContext`, because Codex `PostCompact` only supports common output fields.
 - In the default `answer-first` mode, `SubagentStop` does not create live Q&A, query, decision, or maintenance files. `Stop` appends chunked live Q&A only for work-evidence or structured-decision turns and does not auto-create `wiki/queries/` or `wiki/decisions/`.
-- If the user explicitly asked to record or document durable knowledge and no durable wiki update is detected, `Stop` queues a pending maintenance item for agent review.
+- If the user explicitly asked to record or document durable knowledge and no durable wiki update is detected, `Stop` queues a pending maintenance item for agent review. Queue items may carry safe `evidence_refs` candidates from changed files and verification commands. Approved maintenance items are shown before pending items in later reminders.
 - `Stop` clears the per-session turn buffer after recording. `SubagentStop` leaves the parent turn buffer available for the final stop event.
 Hook payloads are stored as small redacted event envelopes rather than full transcripts. Context output is also redacted field by field before it is returned to Codex. Functional compact context is a presentation policy, not a feature reduction: Codex still receives the wiki memory, search, maintenance, and passive update signals needed for the hook workflow.
+For handoff or retrieval verification, use `llm-wiki export --workspace <project> --format all` and `llm-wiki eval --workspace <project>`. The generated `llms.txt`/`llms-full.txt`/`llm-wiki.json` files are redacted durable manifests, not raw transcripts.
 Set `LLM_WIKI_KIT_AUTO_PROJECT_UPDATE=0` only while diagnosing automatic managed-template refresh behavior.
 Set `LLM_WIKI_KIT_UPDATE_NOTICE=0` only while suppressing the cached passive runtime update status.
 Set `LLM_WIKI_KIT_CAPTURE_MODE=legacy-eager` only as deprecated compatibility mode for the old eager query/decision capture behavior.
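
Export, eval, and search are documented as sharing one durable visibility policy. Below is a minimal JavaScript sketch of that policy, assuming frontmatter is already parsed into a `meta` object.

Two rules come from the README and concepts changes:
- the episodic path rules;
- the `memory_type`/`importance >= 4` promotion rule.

The rest is assumed:
- the `status` values and the `superseded_by` field;
- matching `session-log` anywhere in the path;
- the 0.5 stale score factor;
- the `reason` strings, which need not match the kit's `visibilityReason` values.

```javascript
// Hypothetical sketch of the shared durable visibility policy.
// meta is assumed to be parsed frontmatter; field names beyond
// memory_type and importance are assumptions.
const EPISODIC_PREFIXES = ["wiki/queries/", "wiki/context/"];
const STALE_SCORE_FACTOR = 0.5; // assumed penalty; docs only say "lower score"

function isEpisodicPath(path) {
  // Matching "session-log" anywhere in the path is an assumption.
  return EPISODIC_PREFIXES.some((prefix) => path.startsWith(prefix)) || path.includes("session-log");
}

function visibility(page, { includeEpisodic = false, includeArchived = false } = {}) {
  const meta = page.meta ?? {};
  const archived = meta.status === "archived" || meta.status === "superseded" || Boolean(meta.superseded_by);
  if (archived && !includeArchived) {
    return { visible: false, reason: "archived-or-superseded" };
  }
  const promoted =
    (meta.memory_type === "semantic" || meta.memory_type === "procedural") && Number(meta.importance) >= 4;
  const onEpisodicPath = isEpisodicPath(page.path);
  const episodic = onEpisodicPath && !promoted;
  if (episodic && !includeEpisodic) {
    return { visible: false, reason: "episodic-hidden" };
  }
  let reason = "durable";
  if (archived) reason = "archived-included";
  else if (episodic) reason = "episodic-included";
  else if (onEpisodicPath) reason = "promoted-episodic";
  // Stale pages stay visible but rank lower.
  return { visible: true, reason, scoreFactor: meta.status === "stale" ? STALE_SCORE_FACTOR : 1 };
}

// Export, eval, and search would all filter through the same function.
function durablePages(pages, options) {
  return pages.filter((page) => visibility(page, options).visible);
}

console.log(durablePages([{ path: "wiki/a.md" }, { path: "wiki/queries/old-auto.md" }]));
```

Routing every consumer through one `visibility` function is the point of the design. If export filtered pages with its own rules, `llms.txt` could describe a different wiki surface than the one eval fixtures and hook context see.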

package/docs/manual.md CHANGED

@@ -55,7 +55,9 @@ llm-wiki/
 ├── outputs/
 │   ├── questions/
 │   ├── reports/
+│   ├── exports/
 │   └── maintenance/
+├── evals/
 └── procedures/
 ```
@@ -69,6 +71,7 @@ Use Codex or Claude Code normally. Installed hooks:
 - select Korean or English hook guidance from the current user prompt and local instruction files;
 - use `wiki/memory.md`, `wiki/index.md`, relevant wiki search, maintenance signals, update notices, and compact recovery packets;
 - record redacted prompt/tool/result summaries in per-turn state;
+- preserve safe evidence pointers as `evidence_refs` when changed files or verification commands are available;
 - archive only meaningful work turns or structured decision/debugging turns into chunked `outputs/questions/YYYY-MM-DD/live-qa-001.md` files;
 - avoid automatic `wiki/queries/` and `wiki/decisions/` promotion in the default answer-first mode;
 - queue durable cleanup candidates only for explicit documentation requests that were not reflected in durable wiki files, or when stale turn state is recovered;
@@ -149,9 +152,11 @@ Most users should not need these during daily coding. They are for install, upda
 - `llm-wiki bootstrap --workspace <project>`: create project-local wiki structure.
 - `llm-wiki migrate --workspace <project>`: copy legacy wiki material into the current layout.
 - `llm-wiki context "<query>" --workspace <project>`: verbose debug view of hook context sources.
+- `llm-wiki eval --workspace <project> [--fixture <path>] [--limit 5] [--json]`: run retrieval fixtures.
+- `llm-wiki export --workspace <project> [--format all|llms|llms-full|json] [--output <dir>] [--dry-run] [--json]`: write durable wiki manifests.
 - `llm-wiki lint --workspace <project>`: wiki health check.
 - `llm-wiki consolidate --workspace <project> [--dry-run]`: refresh generated blocks in `memory.md` and `index.md`.
-- `llm-wiki maintenance --workspace <project> [--json]`: show pending durable cleanup candidates and review health.
+- `llm-wiki maintenance --workspace <project> [--approve <id> --target <wiki/...md> | --done <id> --target <wiki/...md> | --skip <id> [--note "..."]] [--json]`: show or update durable cleanup review state.
 - `llm-wiki archive-questions --workspace <project> [--date YYYY-MM-DD] [--dry-run]`: split old flat live Q&A files into chunks.
 - `llm-wiki uninstall`: remove kit-managed hook entries, leaving project wiki contents intact.
@@ -187,11 +192,51 @@ llm-wiki context "auth architecture" --workspace /path/to/project --include-epis
 llm-wiki context "auth architecture" --workspace /path/to/project --include-archived
 ```
-Default search prioritizes durable semantic/procedural wiki pages. Episodic `wiki/queries/`, `wiki/context/`, and `session-log` pages are hidden unless promoted with durable metadata or explicitly requested. Archived and superseded pages are hidden unless `--include-archived` is used. Stale pages remain searchable with lower score.
+Default search prioritizes durable semantic/procedural wiki pages. Episodic `wiki/queries/`, `wiki/context/`, and `session-log` pages are hidden unless promoted with durable metadata or explicitly requested. Archived and superseded pages are hidden unless `--include-archived` is used. Stale pages remain searchable with lower score. JSON hits include `rankReason`, `visibilityReason`, `evidenceRefs`, `matchedFields`, and `scoreBreakdown`; the text formatter prints `why selected` for maintainers. Hook compact context stays shorter and does not include those debug lines.
+## Evidence, Eval, And Export
+Curated wiki pages may include optional frontmatter:
+```yaml
+evidence_refs:
+  - "file:src/wiki-search.js"
+  - "cmd:node --test"
+  - "raw:source-id"
+  - "url:https://example.com/reference"
+```
+`llm-wiki lint` validates the prefix, safety, and rough reachability of those references. `file:` must be repo-relative, `cmd:` must be a short single-line redacted-safe command, `raw:` should resolve to a raw/source candidate, and `url:` must be `http` or `https` without credentials.
+`llm-wiki eval` reads `llm-wiki/evals/retrieval.json` by default:
+```json
+{
+  "queries": [
+    {
+      "query": "semantic retrieval",
+      "expected": ["wiki/architecture/retrieval.md"],
+      "unexpected": ["wiki/queries/old-auto.md"]
+    }
+  ]
+}
+```
+Missing fixtures exit successfully with `no fixture found`. Present fixtures report expected recall, missed expected paths, unexpected hits, and top hits. Eval and export share the same durable visibility policy so archived/superseded/default episodic pages are treated consistently.
+`llm-wiki export` writes `llms.txt`, `llms-full.txt`, and `llm-wiki.json` under `llm-wiki/outputs/exports/` by default. `llms.txt` is a curated onboarding and handoff manifest for agents and humans, not a passive SEO file. `llms-full.txt` is a bounded redacted context bundle for handoff or compaction recovery. `llm-wiki.json` is the structured manifest for future adapters and eval tooling. `--dry-run` reports planned files without writing them.
 ## Maintenance
-`llm-wiki maintenance` reports pending queue state and review health. It does not merge pages automatically. The active agent should merge reusable items into existing durable pages and mark queue items `done` or `skipped`.
+`llm-wiki maintenance` reports queue state and review health. It does not merge pages automatically. The active agent should merge reusable items into existing durable pages and mark queue items through `pending`, `approved`, `done`, or `skipped`.
+```bash
+llm-wiki maintenance --workspace <project> --approve <id> --target wiki/concepts/topic.md
+llm-wiki maintenance --workspace <project> --done <id> --target wiki/concepts/topic.md
+llm-wiki maintenance --workspace <project> --skip <id> --note "duplicate"
+```
+`approved` means durable promotion is accepted but not yet merged. `done` means the durable page has been updated. `skipped` means the item was duplicate or not reusable enough. Approved reminders are shown before pending reminders.
 Hook reminders are soft:
@@ -214,6 +259,7 @@ LLM_WIKI_KIT_PRECOMPACT_ENFORCEMENT=limited
 - Hook payloads are stored as small redacted event envelopes.
 - Tool calls are not blocked only because input looks sensitive.
 - Tokens, passwords, bearer credentials, private keys, and raw `.env` contents are redacted before durable storage.
+- Generated exports are redacted and must not store npm tokens, WinRM credentials, private keys, raw `.env`, or full raw transcripts.
 - Phone numbers, emails, dates, and business identifiers are preserved by default because they can be useful local work context.
 - `llm-wiki lint` reports secret-like wiki content as an error.
@@ -232,6 +278,8 @@ llm-wiki version
 llm-wiki status --workspace /path/to/project
 llm-wiki doctor --workspace /path/to/project
 llm-wiki update --check --workspace /path/to/project
+llm-wiki eval --workspace /path/to/project --json
+llm-wiki export --workspace /path/to/project --format all --dry-run --json
 ```
 Native Windows support claims require a real Windows smoke: install the published package, run `install`, `status`, and `doctor` against a Windows project, inspect `%USERPROFILE%\.codex\hooks.json` and `%USERPROFILE%\.claude\settings.json`, and run hook smoke tests through `llm-wiki.cmd`.
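The eval report fields described above (expected recall, missed expected paths, unexpected hits, top hits) can be illustrated with a small scoring function. This is a hypothetical sketch, not `wiki-eval.js`: the fixture shape `{ query, expected }`, the `scoreQuery` name, and treating every top-k hit outside the expected set as "unexpected" are assumptions.

```javascript
// Hypothetical sketch: score one retrieval query against a fixture entry.
// Assumed fixture shape: { query: string, expected: string[] }, where
// `expected` lists wiki page paths that retrieval should return.
function scoreQuery(fixture, hits, limit = 5) {
  // Only the top `limit` hits count, mirroring the `--limit 5` default.
  const top = hits.slice(0, limit);
  const expected = new Set(fixture.expected);
  const found = top.filter((path) => expected.has(path));
  const missed = fixture.expected.filter((path) => !top.includes(path));
  const unexpected = top.filter((path) => !expected.has(path));
  return {
    query: fixture.query,
    // Recall = share of expected paths that appear among the top hits.
    recall: expected.size === 0 ? 1 : new Set(found).size / expected.size,
    missed,
    unexpected,
    topHits: top,
  };
}

const result = scoreQuery(
  { query: 'maintenance queue states', expected: ['wiki/concepts/maintenance.md', 'wiki/concepts/queue.md'] },
  ['wiki/concepts/maintenance.md', 'wiki/queries/old.md', 'wiki/concepts/other.md'],
);
// recall 0.5, one missed path (wiki/concepts/queue.md), two unexpected hits
console.log(result.recall, result.missed, result.unexpected);
```

Reporting missed and unexpected paths separately, rather than only a recall number, tells the agent whether a page is unreachable or whether stale pages are crowding the top hits.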

package/docs/security.md CHANGED

@@ -13,4 +13,8 @@ Before writing durable summaries, the runtime redacts authentication values such
 Manual and hook context output also runs through redaction before returning excerpts or search hits. `llm-wiki lint` reports remaining secret-like wiki content as an error so it can be removed or rewritten before it becomes reusable project memory.
+`evidence_refs` are pointers, not a place to paste secrets or transcripts. `llm-wiki lint` rejects secret-like evidence values, unsafe `file:` paths, credential-bearing `url:` values, unsupported prefixes, and unsafe commands. Missing local `file:` or `raw:` targets are warnings so agents can fix references without losing the surrounding durable note.
+`llm-wiki export` redacts generated `llms.txt`, `llms-full.txt`, and `llm-wiki.json` output. Exports must not contain npm tokens, WinRM credentials, private keys, raw `.env`, or full raw transcripts. `llms.txt` is an agent onboarding/handoff manifest and follows the same durable visibility policy as retrieval eval, so archived/superseded/default episodic pages are excluded by default.
 Hook payloads are stored as small event envelopes, not full raw transcripts. Full transcript capture is intentionally not implemented as a default. `PreCompact` may read a small bounded transcript tail for a redacted checkpoint, but it does not store the raw transcript path or full transcript. If a project needs raw transcript capture, add a project-local policy and a redaction path first.
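The `evidence_refs` lint rules described above can be sketched as a single classifier. This is a hypothetical illustration, not the `wiki-lint.js` implementation: the `checkEvidenceRef` name, the `exists` callback, the crude `SECRET_LIKE` pattern (a stand-in for the package's redaction helpers), and treating an unparsable URL as an error are all assumptions. The unsafe-command rule is not modelled here.

```javascript
// Hypothetical sketch of the evidence_refs checks; not the package's code.
const ALLOWED_PREFIXES = new Set(['file', 'cmd', 'raw', 'url']);
// Crude stand-in for the package's secret detection (assumption).
const SECRET_LIKE = /(npm_[A-Za-z0-9]{20,}|-----BEGIN [A-Z ]*PRIVATE KEY-----|bearer\s+\S{20,})/i;

function checkEvidenceRef(ref, exists = () => true) {
  const text = String(ref || '').trim();
  // Same "prefix:value" split as parseEvidenceRef in evidence.js.
  const match = text.match(/^([a-z]+):([\s\S]*)$/i);
  const prefix = match ? match[1].toLowerCase() : '';
  const value = match ? match[2].trim() : text;
  if (SECRET_LIKE.test(text)) return { level: 'error', reason: 'secret-like value' };
  if (!ALLOWED_PREFIXES.has(prefix)) return { level: 'error', reason: 'unsupported prefix' };
  if (prefix === 'file') {
    const normalized = value.replace(/\\/g, '/');
    // Absolute paths (POSIX or drive-letter) and ".." escapes are unsafe.
    if (normalized.startsWith('/') || /^[A-Za-z]:\//.test(normalized) || normalized.split('/').includes('..')) {
      return { level: 'error', reason: 'unsafe file path' };
    }
  }
  if (prefix === 'url') {
    let url;
    try {
      url = new URL(value);
    } catch {
      // Assumption of this sketch: unparsable URLs are errors.
      return { level: 'error', reason: 'invalid url' };
    }
    // user:password@host URLs carry credentials.
    if (url.username || url.password) return { level: 'error', reason: 'credential-bearing url' };
  }
  if ((prefix === 'file' || prefix === 'raw') && !exists(value)) {
    // Missing local targets are warnings so the surrounding note survives.
    return { level: 'warning', reason: 'missing target' };
  }
  return { level: 'ok', reason: '' };
}

console.log(checkEvidenceRef('file:src/evidence.js').level); // ok
console.log(checkEvidenceRef('file:../secrets.txt').reason); // unsafe file path
console.log(checkEvidenceRef('url:https://user:pw@example.com/x').reason); // credential-bearing url
```

Returning a level rather than throwing keeps the error/warning split explicit: only missing targets downgrade to warnings, matching the policy that agents can fix references without losing the note.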

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "llm-wiki-kit",
-  "version": "0.2.13",
+  "version": "0.2.15",
   "description": "Hook-first living Markdown wiki runtime for Codex and Claude Code with Korean/English prompt-aware guidance.",
   "type": "module",
   "files": [

package/src/cli.js CHANGED

@@ -3,7 +3,7 @@ import { resolve } from 'path';
 import { formatConsolidateResult, runConsolidate } from './consolidate.js';
 import { handleHook } from './hook.js';
 import { install, status, uninstall } from './install.js';
-import { formatMaintenanceResult, maintenanceSummary } from './maintenance.js';
+import { formatMaintenanceResult, maintenanceSummary, updateMaintenanceItem } from './maintenance.js';
 import { bootstrapProject } from './project.js';
 import { inspectProjectState } from './project-state.js';
 import { commandForProject, knownProjectRoots, recordProject } from './projects.js';
@@ -11,6 +11,8 @@ import { formatDoctor, runDoctor } from './doctor.js';
 import { migrate } from './migrate.js';
 import { postUpdate, update } from './update.js';
 import { buildContextPack, formatContextPack } from './wiki-search.js';
+import { formatEvalResult, runEval } from './wiki-eval.js';
+import { formatExportResult, runExport } from './wiki-export.js';
 import { formatLintResult, runLint } from './wiki-lint.js';
 import { archiveQuestions, formatArchiveQuestionsResult } from './live-qa.js';
@@ -33,6 +35,30 @@ function parseOptions(args) {
     } else if (arg === '--to') {
       options.to = optionValue(arg, i);
       i += 1;
+    } else if (arg === '--fixture') {
+      options.fixture = optionValue(arg, i);
+      i += 1;
+    } else if (arg === '--format') {
+      options.format = optionValue(arg, i);
+      i += 1;
+    } else if (arg === '--output') {
+      options.output = resolve(optionValue(arg, i));
+      i += 1;
+    } else if (arg === '--target') {
+      options.target = optionValue(arg, i);
+      i += 1;
+    } else if (arg === '--note') {
+      options.note = optionValue(arg, i);
+      i += 1;
+    } else if (arg === '--approve') {
+      options.approve = optionValue(arg, i);
+      i += 1;
+    } else if (arg === '--done') {
+      options.done = optionValue(arg, i);
+      i += 1;
+    } else if (arg === '--skip') {
+      options.skip = optionValue(arg, i);
+      i += 1;
     } else if (arg === '--date') {
       const value = optionValue(arg, i);
       if (!/^\d{4}-\d{2}-\d{2}$/.test(value)) {
@@ -127,9 +153,11 @@ Usage:
   llm-wiki bootstrap --workspace <project>
   llm-wiki migrate --workspace <project>
   llm-wiki context "<query>" --workspace <project> [--limit 5] [--no-expand] [--include-episodic] [--include-archived]
+  llm-wiki eval --workspace <project> [--fixture <path>] [--limit 5] [--json]
+  llm-wiki export --workspace <project> [--format all|llms|llms-full|json] [--output <dir>] [--dry-run] [--json]
   llm-wiki lint --workspace <project>
   llm-wiki consolidate --workspace <project> [--dry-run]
-  llm-wiki maintenance --workspace <project> [--json]
+  llm-wiki maintenance --workspace <project> [--approve <id> --target <wiki/...md> | --done <id> --target <wiki/...md> | --skip <id> [--note "..."]] [--json]
   llm-wiki archive-questions --workspace <project> [--date YYYY-MM-DD] [--dry-run] [--json]
 `);
     return;
@@ -229,6 +257,20 @@ Usage:
     return;
   }
+  if (command === 'eval') {
+    const projectRoot = resolve(options.workspace || process.cwd());
+    const result = await runEval(projectRoot, options);
+    printJsonOrText(result, options, formatEvalResult);
+    if (!result.ok) process.exitCode = 1;
+    return;
+  }
+  if (command === 'export') {
+    const projectRoot = resolve(options.workspace || process.cwd());
+    printJsonOrText(await runExport(projectRoot, options), options, formatExportResult);
+    return;
+  }
   if (command === 'lint') {
     const projectRoot = resolve(options.workspace || process.cwd());
     const result = await runLint(projectRoot, options);
@@ -245,6 +287,14 @@ Usage:
   if (command === 'maintenance') {
     const projectRoot = resolve(options.workspace || process.cwd());
+    const actions = [options.approve ? 'approve' : '', options.done ? 'done' : '', options.skip ? 'skip' : ''].filter(Boolean);
+    if (actions.length > 1) throw new Error('maintenance accepts only one of --approve, --done, or --skip');
+    if (actions.length === 1) {
+      const action = actions[0];
+      const id = options.approve || options.done || options.skip;
+      printJsonOrText(await updateMaintenanceItem(projectRoot, id, action, options), options);
+      return;
+    }
     printJsonOrText(await maintenanceSummary(projectRoot, { ...options, includeLint: true }), options, formatMaintenanceResult);
     return;
   }
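The new `maintenance` branch accepts at most one of `--approve`, `--done`, or `--skip`, and the manual says approved reminders come before pending ones. The sketch below models that flow. It is hypothetical and not `maintenance.js`: the item shape, the `pickAction`/`applyAction`/`reminderOrder` names, and requiring `--target` for approve and done (inferred from the usage line) are assumptions.

```javascript
// Hypothetical sketch of the maintenance queue flow; not maintenance.js.
const ACTION_TO_STATUS = { approve: 'approved', done: 'done', skip: 'skipped' };

function pickAction(options) {
  // Mirrors the cli.js rule: at most one of --approve, --done, --skip.
  const actions = ['approve', 'done', 'skip'].filter((name) => options[name]);
  if (actions.length > 1) throw new Error('maintenance accepts only one of --approve, --done, or --skip');
  return actions[0] || null;
}

function applyAction(item, action, options = {}) {
  // Assumption: approve/done need a wiki target, as the usage text suggests;
  // skip only takes an optional note.
  if ((action === 'approve' || action === 'done') && !options.target) {
    throw new Error(`--${action} requires --target`);
  }
  return {
    ...item,
    status: ACTION_TO_STATUS[action],
    target: options.target || item.target || '',
    note: options.note || item.note || '',
  };
}

function reminderOrder(items) {
  // Approved reminders come before pending ones; done/skipped items drop out.
  const rank = { approved: 0, pending: 1 };
  return items
    .filter((item) => Object.hasOwn(rank, item.status))
    .sort((a, b) => rank[a.status] - rank[b.status]);
}

const queue = [{ id: 'q1', status: 'pending' }, { id: 'q2', status: 'pending' }];
const action = pickAction({ approve: 'q2', target: 'wiki/concepts/topic.md' });
queue[1] = applyAction(queue[1], action, { target: 'wiki/concepts/topic.md' });
console.log(reminderOrder(queue).map((item) => item.id)); // [ 'q2', 'q1' ]
```

Keeping `approved` distinct from `done` lets an agent record that promotion was accepted without claiming the durable page has already been merged.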

package/src/constants.js CHANGED

@@ -52,7 +52,9 @@ export const LLM_WIKI_DIRS = [
   'wiki/queries',
   'outputs/questions',
   'outputs/reports',
+  'outputs/exports',
   'outputs/maintenance',
+  'evals',
   'procedures',
 ];
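The new `outputs/exports` directory is where `llm-wiki export` writes `llms.txt`, `llms-full.txt`, and `llm-wiki.json` by default. The sketch below shows one way `--format`, `--output`, and `--dry-run` could combine into a file plan. It is hypothetical, not `wiki-export.js`: the `plannedExportFiles` name, the returned shape, and defaulting to `all` are assumptions, and paths are joined with `/` only to keep the sketch dependency-free.

```javascript
// Hypothetical sketch: map --format to export files. File names and the
// default directory come from the docs; the function itself is illustrative.
const EXPORT_FILES = {
  llms: 'llms.txt',
  'llms-full': 'llms-full.txt',
  json: 'llm-wiki.json',
};

function plannedExportFiles(projectRoot, options = {}) {
  const format = options.format || 'all'; // assumption: "all" is the default
  if (format !== 'all' && !Object.hasOwn(EXPORT_FILES, format)) {
    throw new Error(`unknown --format: ${format}`);
  }
  // Default output directory unless --output is given.
  const outputDir = options.output || `${projectRoot}/llm-wiki/outputs/exports`;
  const formats = format === 'all' ? Object.keys(EXPORT_FILES) : [format];
  const files = formats.map((name) => `${outputDir}/${EXPORT_FILES[name]}`);
  // With --dry-run a caller would report `files` without writing anything.
  return { dryRun: Boolean(options.dryRun), files };
}

console.log(plannedExportFiles('/work/project', { format: 'llms', dryRun: true }).files);
// [ '/work/project/llm-wiki/outputs/exports/llms.txt' ]
```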

package/src/evidence.js ADDED

@@ -0,0 +1,128 @@
+import { isAbsolute, relative, sep } from 'path';
+import { extractPathsFromText, hasSecretLikeText, isSensitivePath, redactText, summarizeForStorage } from './redaction.js';
+export const EVIDENCE_PREFIXES = new Set(['file', 'cmd', 'raw', 'url']);
+export const MAX_CMD_EVIDENCE_CHARS = 500;
+export function normalizeEvidenceRefs(value) {
+  if (!Array.isArray(value)) return [];
+  const refs = value
+    .map((item) => summarizeForStorage(String(item || ''), 700))
+    .map((item) => item.replace(/\s+/g, ' ').trim())
+    .filter(Boolean);
+  return [...new Set(refs)];
+}
+export function parseEvidenceRef(ref) {
+  const text = String(ref || '').trim();
+  const match = text.match(/^([a-z]+):([\s\S]*)$/i);
+  if (!match) return { raw: text, prefix: '', value: text };
+  return {
+    raw: text,
+    prefix: match[1].toLowerCase(),
+    value: match[2].trim(),
+  };
+}
+export function parseEvidenceRefsField(value) {
+  const text = String(value || '').trim();
+  if (!text) return [];
+  try {
+    const parsed = JSON.parse(text);
+    if (Array.isArray(parsed)) return normalizeEvidenceRefs(parsed);
+  } catch {
+    // Fall through to legacy comma/text parsing.
+  }
+  return normalizeEvidenceRefs(text.split(',').map((item) => item.trim()));
+}
+export function frontmatterEvidenceRefs(refs) {
+  const normalized = normalizeEvidenceRefs(refs);
+  if (normalized.length === 0) return 'evidence_refs: []';
+  return ['evidence_refs:', ...normalized.map((ref) => `  - "${ref.replace(/"/g, '\\"')}"`)].join('\n');
+}
+function cleanCandidatePath(value) {
+  return String(value || '')
+    .replace(/^[-*]\s+/, '')
+    .replace(/^file:/i, '')
+    .replace(/[),.;:]+$/g, '')
+    .replace(/\\/g, '/')
+    .trim();
+}
+function isUsefulFileCandidate(value) {
+  const text = cleanCandidatePath(value);
+  if (!text || text.includes('://') || text.startsWith('cmd:') || text.startsWith('raw:')) return false;
+  if (isSensitivePath(text) || hasSecretLikeText(text)) return false;
+  return (
+    text.startsWith('llm-wiki/') ||
+    text.startsWith('src/') ||
+    text.startsWith('test/') ||
+    text.startsWith('docs/') ||
+    text.startsWith('bin/') ||
+    text.startsWith('examples/') ||
+    /^(?:README|AGENTS|CLAUDE|LICENSE|package(?:-lock)?|install)\.[A-Za-z0-9]+$/i.test(text) ||
+    /\.[A-Za-z0-9]{1,12}$/.test(text)
+  );
+}
+function relativeProjectPath(projectRoot, value) {
+  const cleaned = cleanCandidatePath(value);
+  if (!cleaned) return '';
+  if (projectRoot && cleaned.startsWith(projectRoot)) {
+    return relative(projectRoot, cleaned).split(sep).join('/');
+  }
+  return cleaned.replace(/^\.\//, '');
+}
+function addFileRefs(refs, projectRoot, text) {
+  for (const candidate of extractPathsFromText(text || '')) {
+    const rel = relativeProjectPath(projectRoot, candidate);
+    if (!isUsefulFileCandidate(rel)) continue;
+    if (isAbsolute(rel) || rel.split('/').includes('..')) continue;
+    refs.push(`file:${rel}`);
+  }
+}
+function decodeJsonString(value) {
+  try {
+    return JSON.parse(`"${value}"`);
+  } catch {
+    return value.replace(/\\"/g, '"');
+  }
+}
+function addCommandRefs(refs, text) {
+  const body = String(text || '');
+  const cmdRegex = /"cmd"\s*:\s*"((?:\\.|[^"\\])*)"/g;
+  let match = cmdRegex.exec(body);
+  while (match) {
+    const command = summarizeForStorage(decodeJsonString(match[1]).replace(/\s+/g, ' '), MAX_CMD_EVIDENCE_CHARS);
+    if (command && !hasSecretLikeText(command)) refs.push(`cmd:${command}`);
+    match = cmdRegex.exec(body);
+  }
+  for (const line of body.split(/\r?\n/)) {
+    const cleaned = line.replace(/^[-*]\s+/, '').trim();
+    const direct = cleaned.match(/^(?:Bash|Shell|Verification|Command):\s*(.+)$/i)?.[1];
+    const command = direct || (/^(?:node|npm|npx|git|llm-wiki|pytest|python|pnpm|yarn|vitest|jest|tsc)\b/.test(cleaned) ? cleaned : '');
+    if (!command) continue;
+    const safe = summarizeForStorage(command.replace(/\s+/g, ' '), MAX_CMD_EVIDENCE_CHARS);
+    if (safe && !hasSecretLikeText(safe)) refs.push(`cmd:${safe}`);
+  }
+}
+export function evidenceRefsFromEntry(entry, options = {}) {
+  const refs = [];
+  const projectRoot = options.projectRoot || '';
+  addFileRefs(refs, projectRoot, entry?.changedFiles);
+  addFileRefs(refs, projectRoot, entry?.work);
+  addCommandRefs(refs, entry?.verification);
+  addCommandRefs(refs, entry?.work);
+  return normalizeEvidenceRefs(refs).slice(0, options.limit || 20);
+}
+export function redactEvidenceRefs(refs) {
+  return normalizeEvidenceRefs(refs).map((ref) => redactText(ref, 700));
+}