@ctxr/skill-llm-wiki 1.0.1

Files changed (75)
  1. package/CHANGELOG.md +134 -0
  2. package/LICENSE +21 -0
  3. package/README.md +484 -0
  4. package/SKILL.md +252 -0
  5. package/guide/basics/concepts.md +74 -0
  6. package/guide/basics/index.md +45 -0
  7. package/guide/basics/schema.md +140 -0
  8. package/guide/cli.md +256 -0
  9. package/guide/correctness/index.md +45 -0
  10. package/guide/correctness/invariants.md +89 -0
  11. package/guide/correctness/safety.md +96 -0
  12. package/guide/history/diff.md +110 -0
  13. package/guide/history/hidden-git.md +130 -0
  14. package/guide/history/index.md +52 -0
  15. package/guide/history/remote-sync.md +113 -0
  16. package/guide/index.md +134 -0
  17. package/guide/isolation/coexistence.md +134 -0
  18. package/guide/isolation/index.md +44 -0
  19. package/guide/isolation/scale.md +251 -0
  20. package/guide/layout/in-place-mode.md +97 -0
  21. package/guide/layout/index.md +53 -0
  22. package/guide/layout/layout-contract.md +131 -0
  23. package/guide/layout/layout-modes.md +115 -0
  24. package/guide/operations/index.md +76 -0
  25. package/guide/operations/ingest/build.md +75 -0
  26. package/guide/operations/ingest/extend.md +61 -0
  27. package/guide/operations/ingest/index.md +54 -0
  28. package/guide/operations/ingest/join.md +65 -0
  29. package/guide/operations/maintain/fix.md +66 -0
  30. package/guide/operations/maintain/index.md +47 -0
  31. package/guide/operations/maintain/rebuild.md +86 -0
  32. package/guide/operations/validate.md +48 -0
  33. package/guide/substrate/index.md +47 -0
  34. package/guide/substrate/operators.md +96 -0
  35. package/guide/substrate/tiered-ai.md +363 -0
  36. package/guide/ux/index.md +44 -0
  37. package/guide/ux/preflight.md +150 -0
  38. package/guide/ux/user-intent.md +135 -0
  39. package/package.json +55 -0
  40. package/scripts/cli.mjs +893 -0
  41. package/scripts/commands/remote.mjs +93 -0
  42. package/scripts/commands/review.mjs +253 -0
  43. package/scripts/commands/sync.mjs +84 -0
  44. package/scripts/lib/chunk.mjs +421 -0
  45. package/scripts/lib/cluster-detect.mjs +516 -0
  46. package/scripts/lib/decision-log.mjs +343 -0
  47. package/scripts/lib/draft.mjs +158 -0
  48. package/scripts/lib/embeddings.mjs +366 -0
  49. package/scripts/lib/frontmatter.mjs +497 -0
  50. package/scripts/lib/git-commands.mjs +155 -0
  51. package/scripts/lib/git.mjs +486 -0
  52. package/scripts/lib/gitignore.mjs +62 -0
  53. package/scripts/lib/history.mjs +331 -0
  54. package/scripts/lib/indices.mjs +510 -0
  55. package/scripts/lib/ingest.mjs +258 -0
  56. package/scripts/lib/intent.mjs +713 -0
  57. package/scripts/lib/interactive.mjs +99 -0
  58. package/scripts/lib/migrate.mjs +126 -0
  59. package/scripts/lib/nest-applier.mjs +260 -0
  60. package/scripts/lib/operators.mjs +1365 -0
  61. package/scripts/lib/orchestrator.mjs +718 -0
  62. package/scripts/lib/paths.mjs +197 -0
  63. package/scripts/lib/preflight.mjs +213 -0
  64. package/scripts/lib/provenance.mjs +672 -0
  65. package/scripts/lib/quality-metric.mjs +269 -0
  66. package/scripts/lib/query-fixture.mjs +71 -0
  67. package/scripts/lib/rollback.mjs +95 -0
  68. package/scripts/lib/shape-check.mjs +172 -0
  69. package/scripts/lib/similarity-cache.mjs +126 -0
  70. package/scripts/lib/similarity.mjs +230 -0
  71. package/scripts/lib/snapshot.mjs +54 -0
  72. package/scripts/lib/source-frontmatter.mjs +85 -0
  73. package/scripts/lib/tier2-protocol.mjs +470 -0
  74. package/scripts/lib/tiered.mjs +453 -0
  75. package/scripts/lib/validate.mjs +362 -0
package/README.md ADDED
@@ -0,0 +1,484 @@
1
+ # skill-llm-wiki — Structured knowledge that your AI can actually use
2
+
3
+ [![npm](https://img.shields.io/npm/v/@ctxr/skill-llm-wiki)](https://www.npmjs.com/package/@ctxr/skill-llm-wiki)
4
+ [![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
5
+ [![CI](https://img.shields.io/badge/CI-Ubuntu%20%2B%20Windows-green)](.github/workflows/ci.yml)
6
+
7
+ > **Turn any folder of markdown, docs, or source into a deterministic, token-efficient knowledge base your AI agent reads the way you'd want it to — once, and only the parts it needs.**
8
+
9
+ ## The problem every AI-heavy workflow eventually hits
10
+
11
+ You want your AI pair — Claude, Cursor, an agent loop, whatever — to *know things*. Architecture decisions. Runbooks. API contracts. Prior postmortems. Team conventions. The messy folder of `.md` notes you've been keeping for eighteen months.
12
+
13
+ So you dump it into the context window. And then you watch:
14
+
15
+ - **Token costs balloon** because every query re-reads the whole thing.
16
+ - **Answers go stale** because the AI grabbed a snippet from a doc that was deprecated three sprints ago.
17
+ - **Irrelevant context bleeds in** and your AI confidently cites something that isn't in scope for this task.
18
+ - **The folder structure drifts** because nobody has time to keep hand-maintained README indexes in sync with reality.
19
+
20
+ The fix isn't "more context window." It's **giving your AI a retrieval structure it can actually walk** — one that names what's in each subtree, routes queries to only the relevant leaves, and rewrites itself whenever the shape of the knowledge changes. That's what `skill-llm-wiki` builds.
21
+
22
+ ## What you get
23
+
24
+ Point this skill at a folder. It produces an **LLM wiki**: a sibling folder of markdown files organised into a deterministic, token-efficient retrieval structure. Your AI agent reads the root index, makes a semantic routing decision based on each subcategory's `focus` string, descends only into the subtrees that match the current task, and loads exactly the leaves it needs.
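The routing walk described above can be pictured with a small sketch. Everything here is illustrative: the in-memory tree shape, the `route` function, and the naive keyword `matches` helper are assumptions standing in for the real `index.md` frontmatter and the agent's semantic routing decision.

```javascript
// Hypothetical in-memory view of a wiki; the real data lives in index.md frontmatter.
const root = {
  id: "root",
  focus: "product documentation",
  entries: [],
  children: [
    {
      id: "installation",
      focus: "installing the product on Linux and macOS",
      entries: [
        { id: "linux", focus: "installing on Linux distributions" },
        { id: "macos", focus: "installing on macOS" },
      ],
      children: [],
    },
    {
      id: "billing",
      focus: "invoices, payments, and refunds",
      entries: [{ id: "refunds", focus: "requesting a refund" }],
      children: [],
    },
  ],
};

// Stand-in for the semantic decision: does any task keyword appear in the focus string?
function matches(focus, task) {
  const words = task.toLowerCase().split(/\W+/).filter((w) => w.length > 3);
  return words.some((w) => focus.toLowerCase().includes(w));
}

// Collect matching leaves, descending only into subtrees whose focus matches.
function route(node, task) {
  const leaves = node.entries
    .filter((entry) => matches(entry.focus, task))
    .map((entry) => entry.id);
  for (const child of node.children) {
    if (matches(child.focus, task)) leaves.push(...route(child, task));
  }
  return leaves;
}

console.log(route(root, "set up macOS")); // logs: [ 'macos' ]
```

The `billing` subtree is never opened for this task, which is the point of routing on `focus` strings: the cost of a query depends on how many subtrees match, not on how many exist.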
25
+
26
+ The wiki is **just markdown on disk**. No database, no vector store service, no lock-in. Every file is something you can read, edit, grep, and commit to your own git. The skill adds a hierarchy, a routing grammar, and a history substrate — then gets out of the way.
27
+
28
+ **Effectiveness, measured on real corpora:**
29
+
30
+ - **~90% of retrieval decisions resolve without reaching Claude** at all. TF-IDF + local MiniLM embeddings handle the routine cases for free, on-device, with zero API cost.
31
+ - **Token cost scales with *ambiguity*, not corpus size.** A 10,000-entry wiki costs roughly the same per query as a 100-entry wiki when the ambiguity rate is comparable. Decisive decisions short-circuit the ladder.
32
+ - **Dogfooded on itself.** The skill's own operational reference (`guide/`) was rebuilt by the skill. Same content, ~24% smaller on-disk after the convergence loop picked an 8-subcategory nested structure.
33
+ - **Deterministic and reproducible.** Same source + `LLM_WIKI_FIXED_TIMESTAMP=<epoch>` → byte-identical commit and tree SHAs across runs and across machines. Your build is hermetic.
34
+ - **Novel-corpus validated.** The 45-leaf `skill-code-review` corpus builds in one convergence iteration: 13 non-conflicting NESTs are applied atomically, no orphans remain, and `validate` returns 0 errors / 0 warnings.
35
+
36
+ ## Why this matters for AI-heavy workflows
37
+
38
+ If you are building anything that involves an AI agent reading your codebase, your docs, your notes, or your decisions — you are already paying a structure tax. Either your agent re-reads too much (token bill balloons, latency climbs, context gets noisy) or it reads the wrong thing (answers drift, confidence is unjustified, debugging takes longer than writing the feature).
39
+
40
+ A well-structured LLM wiki flips that. Instead of "cram everything into the prompt and hope the model attends to the right parts," you get:
41
+
42
+ - **Routing discipline.** Your agent walks a semantic hierarchy from the root, and only loads the leaves whose `focus` string actually matches the current task. No blind full-tree reads.
43
+ - **Fresh history, not stale snapshots.** Every operation is a git commit inside a private repo under `<wiki>/.llmwiki/git/`. Roll back a bad rebuild with one command. Diff two operations. Blame a line in an index. Your AI's knowledge base is version-controlled with the same discipline as your code.
44
+ - **Rewrite operators that fire themselves.** When you add 50 new docs and the tree shape drifts, the convergence loop (DESCEND → LIFT → MERGE → NEST → DECOMPOSE) detects the drift, proposes structural changes, gates each one on a routing-cost metric, and commits the ones that objectively improve retrieval. You never have to "go fix the index table of contents" again.
45
+ - **It works on anything.** Markdown notes, product docs, API references, research dumps, runbooks, ADRs, policy libraries, source code, mixed folders, whole monorepos — the ingester doesn't care.
46
+
47
+ This skill is built for people who ship with AI and want their AI to ship better — AI vibe coders who have moved past "paste the file and pray" and want their knowledge base to compound the same way their codebase does.
48
+
49
+ ## How it works (the short version)
50
+
51
+ 1. **Ingest** — walk the source folder, compute content hashes, emit one candidate per file with byte-range provenance so nothing is silently dropped.
52
+ 2. **Draft frontmatter** — for each entry, derive `id`, `focus`, `covers[]`, `tags`, and `parents[]` from structure where possible; Claude fills in prose-heavy cases.
53
+ 3. **Layout + operator convergence** — the convergence loop applies deterministic rewrite operators (DESCEND, LIFT, MERGE, NEST, DECOMPOSE) until the tree reaches its token-minimal normal form, measured by a `routing_cost` metric. Clusters are proposed via Tier 2 sub-agents; each application is gated on whether it actually improves routing cost and rolled back otherwise.
54
+ 4. **Index generation** — every directory gets an `index.md` with machine routing metadata in frontmatter and human/LLM orientation prose in the body.
55
+ 5. **Validate + commit-finalize** — hard invariants (id uniqueness, DAG acyclicity, narrowing-chain consistency, byte-range loss check, private-git integrity) run before the operation is allowed to finalise. Any failure rolls back the entire operation to the pre-op snapshot.
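The gating in step 3 (apply a rewrite, keep it only if the metric improves) can be sketched as below. This is a toy under stated assumptions: the tree is just a list of groups, `cost` is an invented stand-in for the real `routing_cost` metric, and the proposal names are hypothetical rather than the skill's operators.

```javascript
// Toy cost (assumption): groups listed at the root plus the size of the largest group.
const cost = (groups) => groups.length + Math.max(...groups.map((g) => g.length));

// Apply each proposal to a candidate copy; keep it only when the cost drops.
function converge(tree, proposals, costFn) {
  let current = tree;
  const applied = [];
  for (const proposal of proposals) {
    const candidate = proposal.apply(current);
    if (costFn(candidate) < costFn(current)) {
      current = candidate; // improvement: keep it
      applied.push(proposal.name);
    }
    // otherwise the candidate is discarded, which plays the role of a rollback
  }
  return { tree: current, applied };
}

// Hypothetical proposals for illustration only.
const proposals = [
  {
    name: "split-in-halves",
    apply: (groups) =>
      groups.flatMap((g) => {
        const mid = Math.ceil(g.length / 2);
        return [g.slice(0, mid), g.slice(mid)];
      }),
  },
  { name: "split-into-singletons", apply: (groups) => groups.flat().map((x) => [x]) },
];

const result = converge([["a", "b", "c", "d", "e", "f"]], proposals, cost);
console.log(result.applied); // logs: [ 'split-in-halves' ]
```

The second proposal is rejected because six singleton groups cost more to scan than two groups of three, so the tree stays at the cheaper shape.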
56
+
57
+ Every phase is a git commit in the wiki's private history, so you can inspect, diff, roll back, and mirror exactly like a real repo — because it is one.
58
+
59
+ ## Features at a glance
60
+
61
+ - **Git-backed history.** Every operation is a snapshot + a series of per-phase commits under an isolated private git. Rollback, diff, blame, log, reflog, and remote mirroring are first-class skill subcommands — `skill-llm-wiki diff <wiki> --op <id>` is a passthrough to `git diff --find-renames --find-copies` scoped to the op's commit range, rollback is a byte-exact `git reset --hard pre-op/<id>`, and every URL printed by the remote-sync subcommands is redacted by default.
62
+ - **Stable sibling layout.** `<source>.wiki/` is the one folder a wiki ever lives in. No more `.llmwiki.v1`/`.v2`/`.v3` directory proliferation — prior states are reachable as git tags (`pre-op/<id>`, `op/<id>`) in the private repo.
63
+ - **Three layout modes, never guessed.** `sibling` (default), `in-place` (source IS the wiki), and `hosted` (user-chosen path with a `.llmwiki.layout.yaml` contract). Ambiguous invocations refuse and prompt — see the "Ask, don't guess" rule.
64
+ - **User-repo coexistence.** An auto-generated `.gitignore` hides the private metadata from any ancestor user git. The skill's isolation env block (`GIT_DIR`, `GIT_CONFIG_NOSYSTEM`, `core.hooksPath=/dev/null`, …) keeps the two gits from leaking into each other.
65
+ - **Tiered AI strategy.** TF-IDF (free) → local MiniLM embeddings (required, ~23 MB one-time model download, zero-API) → Claude (only for mid-band ambiguity and decisions requiring natural-language judgment). `--quality-mode tiered-fast|claude-first|tier0-only` selects the escalation policy.
66
+ - **Deterministic slug collisions.** NEST operator auto-resolves slug-vs-member-id collisions with a deterministic `-group` suffix before apply. Your convergence loop never needs manual retries for DUP-ID.
67
+ - **Optional interactive review.** `skill-llm-wiki rebuild <wiki> --review` prints the post-convergence diff and commit list, lets the user approve / abort / `drop:<sha>` specific iterations, and re-runs validation + index regen on the reverted tree.
68
+ - **Windows parity.** The CI matrix runs the smoke suite on both `ubuntu-latest` and `windows-latest`; the isolation env switches `/dev/null` to `NUL` and enables `core.longpaths=true` on Windows.
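As a rough illustration of the tiered AI strategy in the list above, the sketch below escalates a decision through cheaper tiers first and stops at the first decisive score. The thresholds, the `classifyPair` name, and the stubbed scorers are all assumptions; the real tiers compute TF-IDF similarity, MiniLM embedding similarity, and a Claude judgment.

```javascript
// Illustrative thresholds (assumptions, not the skill's real values).
const DECISIVE_HIGH = 0.8;
const DECISIVE_LOW = 0.2;

// A score outside the mid-band is decisive; inside it, the next tier is consulted.
function decide(score) {
  if (score >= DECISIVE_HIGH) return "same";
  if (score <= DECISIVE_LOW) return "different";
  return null;
}

function classifyPair(pair, tiers) {
  for (const tier of tiers) {
    const verdict = decide(tier.score(pair));
    if (verdict !== null) return { verdict, tier: tier.name };
  }
  return { verdict: "undecided", tier: null };
}

// Stub scorers that read precomputed numbers off the pair.
const tiers = [
  { name: "tfidf", score: (pair) => pair.tfidf },
  { name: "embeddings", score: (pair) => pair.embed },
  { name: "llm", score: (pair) => pair.llm },
];

console.log(classifyPair({ tfidf: 0.9, embed: 0.5, llm: 1 }, tiers));
// logs: { verdict: 'same', tier: 'tfidf' }
```

Only pairs that stay in the mid-band at every local tier reach the last, most expensive scorer.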
69
+
70
+ Works on any corpus: markdown notes, product docs, API references, research, runbooks, architecture records, policy libraries, source code, mixed folders, whole projects.
71
+
72
+ ## Quick Start
73
+
74
+ ```bash
75
+ # Install into your project
76
+ npx @ctxr/kit install @ctxr/skill-llm-wiki
77
+ ```
78
+
79
+ Then in Claude Code, ask for any of the six operations:
80
+
81
+ ```text
82
+ Build an LLM wiki from ./docs
83
+ Add ./arch to my docs wiki
84
+ Validate my docs wiki
85
+ Rebuild my docs wiki
86
+ Fix my docs wiki
87
+ Merge my docs and runbooks wikis into a handbook
88
+ ```
89
+
90
+ ## Requirements
91
+
92
+ This skill has two hard requirements. If either is missing, the skill will refuse to run and print a clear message explaining why and how to fix it.
93
+
94
+ 1. **[Claude Code](https://claude.ai/code) CLI or IDE extension.**
95
+ 2. **Node.js ≥ 18.0.0.** The skill's deterministic CLI (`scripts/cli.mjs`) is a Node.js program, so Node must be available in the shell Claude Code uses to run Bash commands. If Node.js is missing or below the minimum version, Claude will stop the operation before making any changes and relay platform-specific install instructions.
96
+
97
+ ### Verify your environment before invoking the skill
98
+
99
+ Open a terminal and run:
100
+
101
+ ```bash
102
+ node --version
103
+ ```
104
+
105
+ - If you see `v18.0.0` or newer → you're ready.
106
+ - If you see a version below `v18.0.0` → upgrade Node.js before using the skill.
107
+ - If you see `command not found` or similar → install Node.js before using the skill.
108
+
109
+ ### Installing or upgrading Node.js
110
+
111
+ Pick the option for your platform.
112
+
113
+ **macOS (Homebrew):**
114
+
115
+ ```bash
116
+ brew install node # or: brew upgrade node
117
+ ```
118
+
119
+ **macOS / Linux (nvm — recommended for dev machines):**
120
+
121
+ ```bash
122
+ curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/master/install.sh | bash
123
+ nvm install 20
124
+ nvm use 20
125
+ ```
126
+
127
+ **Linux (Debian/Ubuntu):**
128
+
129
+ ```bash
130
+ curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash -
131
+ sudo apt-get install -y nodejs
132
+ ```
133
+
134
+ **Linux (RHEL/Fedora):**
135
+
136
+ ```bash
137
+ curl -fsSL https://rpm.nodesource.com/setup_20.x | sudo bash -
138
+ sudo dnf install -y nodejs
139
+ ```
140
+
141
+ **Windows (winget):**
142
+
143
+ ```powershell
144
+ winget install OpenJS.NodeJS
145
+ ```
146
+
147
+ **Windows (Chocolatey):**
148
+
149
+ ```powershell
150
+ choco install nodejs-lts
151
+ ```
152
+
153
+ **Any platform:** download the official installer from <https://nodejs.org/en/download/>.
154
+
155
+ After installing, open a **fresh** terminal (so the shell picks up the new `PATH`) and verify with `node --version` again.
156
+
157
+ ### Two-layer safety net
158
+
159
+ The skill checks Node.js availability before running any operation so you never see cryptic failures:
160
+
161
+ 1. **Preflight (Bash).** Before the first CLI invocation of every operation, Claude runs `node --version` via Bash and stops with a detailed install message if Node is missing or too old. Nothing gets mutated before this check passes.
162
+ 2. **Runtime guard (Node).** `scripts/cli.mjs` re-checks `process.version` as its very first action and exits with code 4 and a short message if somehow invoked on an unsupported Node. Defense-in-depth so even a broken shell environment cannot produce a half-finished wiki.
163
+
164
+ Both checks fail **loud and early** with a clear explanation and zero side-effects. The skill is safe to point at real folders on any machine.
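A runtime guard of the kind described in layer 2 might look like the following minimal sketch. The helper names are hypothetical and the message wording is invented; only the `18.0.0` minimum and exit code `4` come from the text above.

```javascript
// Parse "v20.11.1" into [20, 11, 1]; parseInt tolerates suffixes such as "0-nightly".
function parseVersion(version) {
  return version.replace(/^v/, "").split(".").map((part) => parseInt(part, 10));
}

// True when `version` is at least `minimum`, compared as major.minor.patch.
function isSupportedNode(version, minimum = "18.0.0") {
  const actual = parseVersion(version);
  const required = parseVersion(minimum);
  for (let i = 0; i < 3; i++) {
    if (actual[i] !== required[i]) return actual[i] > required[i];
  }
  return true;
}

// Fail loudly before doing anything else.
if (!isSupportedNode(process.version)) {
  console.error(`Node.js >= 18.0.0 is required; found ${process.version}`);
  process.exit(4);
}
```

Running the check before any other work is what keeps a too-old runtime from leaving a partially written wiki behind.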
165
+
166
+ ## Installation
167
+
168
+ ### Via @ctxr/kit
169
+
170
+ ```bash
171
+ npx @ctxr/kit install @ctxr/skill-llm-wiki # project-local
172
+ npx @ctxr/kit install @ctxr/skill-llm-wiki --user # user-global
173
+ ```
174
+
175
+ Installs to `.claude/skills/ctxr-skill-llm-wiki/` (or `~/.claude/skills/…` with `--user`). No post-install wiring, no automatic hooks, no filesystem watchers — the skill is pure standby until you explicitly ask Claude to run an operation against a specific directory.
176
+
177
+ The installed package contains `SKILL.md` (the routing entry point Claude reads at activation), `LICENSE`, `README.md`, `scripts/` (invoked via `node scripts/cli.mjs <subcommand>`, never read as source), and `guide/` (context-specific routing leaves loaded on keyword activation — `hidden-git.md` when the user asks about history or diff, `user-intent.md` when the request is ambiguous, `tiered-ai.md` when the user asks about quality modes, etc.). The internal design doc `methodology.md` is deliberately excluded from the installed package (`files[]` in `package.json` does not list it) so it is never copied into any user environment and never loaded during a session.
178
+
179
+ ### Manual
180
+
181
+ ```bash
182
+ git clone https://github.com/ctxr-dev/skill-llm-wiki.git /tmp/skill-llm-wiki
183
+ mkdir -p .claude/skills
184
+ cp -r /tmp/skill-llm-wiki .claude/skills/skill-llm-wiki
185
+ ```
186
+
187
+ ### Git Submodule
188
+
189
+ ```bash
190
+ git submodule add https://github.com/ctxr-dev/skill-llm-wiki.git \
191
+ .claude/skills/skill-llm-wiki
192
+ ```
193
+
194
+ ## Usage
195
+
196
+ Ask Claude for any of the six operations against a specific target directory. Examples:
197
+
198
+ ```text
199
+ Build an LLM wiki from ./docs
200
+ # → creates ./docs.wiki/ next to ./docs, initialises the private
201
+ # git at ./docs.wiki/.llmwiki/git/, tags pre-op/<id> and op/<id>
202
+
203
+ Add ./arch to my docs wiki
204
+ # → extends ./docs.wiki/ in place with a new op tag
205
+
206
+ Validate ./docs.wiki
207
+ # → read-only invariant check; prints findings with severity
208
+
209
+ Rebuild ./docs.wiki --review
210
+ # → runs convergence, prints the diff + per-iteration commit list,
211
+ # and prompts approve / abort / drop:<sha> before validation
212
+
213
+ Diff ./docs.wiki --op <op-id> --stat
214
+ # → byte-identical native `git diff --stat` against the private repo
215
+
216
+ Rollback ./docs.wiki --to pre-op/<op-id>
217
+ # → byte-exact reset to the snapshot taken before that operation
218
+
219
+ Fix ./docs.wiki
220
+ # → runs AUTO-class repairs; HUMAN-class findings surface as structured prompts for the user to resolve
221
+
222
+ Merge ./docs.wiki and ./runbooks.wiki into handbook
223
+ # → creates ./handbook.wiki/ with merged content and rewired references
224
+ ```
225
+
226
+ Nothing happens until you ask. The skill performs exactly the operation you request against the target you name, then stops. Ambiguous invocations (two folders would both match, two layout modes are both compatible, a default sibling would stomp on a foreign directory, …) refuse with an `INT-NN` structured error rather than guessing — the skill's "ask, don't guess" rule is a hard contract.
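The "ask, don't guess" contract can be sketched as a resolver that returns a structured refusal whenever it does not find exactly one match. The `resolveTarget` function and the result shape are hypothetical, and `INT-NN` is kept as the placeholder used above because the concrete codes are not listed in this README.

```javascript
// Resolve a user-supplied name against known wiki directories; never pick among several.
function resolveTarget(name, candidates) {
  const hits = candidates.filter((candidate) => candidate.includes(name));
  if (hits.length === 1) return { ok: true, target: hits[0] };
  return {
    ok: false,
    code: "INT-NN", // placeholder; real codes are defined by the skill
    reason: hits.length === 0 ? "no match" : "ambiguous",
    candidates: hits,
  };
}

const known = ["./docs.wiki", "./api-docs.wiki", "./runbooks.wiki"];
console.log(resolveTarget("runbooks", known)); // a single match resolves
console.log(resolveTarget("docs", known)); // two matches: refuse and list both
```

Returning the competing candidates with the refusal gives the caller what it needs to prompt the user instead of guessing.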
227
+
228
+ ## Layout modes
229
+
230
+ Every operation accepts `--layout-mode <mode>`; the default is `sibling`. Ambiguous cases refuse and prompt — they are never silently resolved.
231
+
232
+ ### `sibling` (default)
233
+
234
+ `<source>.wiki/` lives next to `<source>/`. One wiki, one sibling directory, forever. Subsequent Rebuilds update the same sibling in place; prior states are reachable as git tags in the private repo under `<wiki>/.llmwiki/git/`. No `.llmwiki.v<N>` directory proliferation — the private git is the authoritative history substrate.
235
+
236
+ ### `in-place`
237
+
238
+ The source folder IS the wiki. `<source>/.llmwiki/git/` is created inside the source itself; the `pre-op/<first-op>` snapshot captures the user's original content byte-for-byte; subsequent operations mutate the source directly. Rollback to the snapshot tag restores the original tree exactly. **Only runs when explicitly requested** with `--layout-mode in-place` — never inferred.
239
+
240
+ ### `hosted`
241
+
242
+ The wiki lives at a user-chosen path that carries a `.llmwiki.layout.yaml` contract. Pass `--layout-mode hosted --target <path>`. The contract describes the required directories, allowed entry types, dynamic subdirectory templates (e.g., `daily/{yyyy}-{mm}-{dd}/`), and any additional invariants. Hosted mode is designed for shared team wikis and for "my wiki lives at `./memory/knowledge/`, I don't want it next to any source folder" workflows.
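To make the dynamic subdirectory template concrete, here is a minimal sketch that expands `daily/{yyyy}-{mm}-{dd}/` for a given date. The `expandTemplate` helper is hypothetical, and the use of UTC with zero-padded fields is an assumption; the contract format itself is not specified in this README.

```javascript
// Expand {yyyy}, {mm}, {dd} placeholders using the UTC date (an assumption).
function expandTemplate(template, date) {
  const parts = {
    yyyy: String(date.getUTCFullYear()),
    mm: String(date.getUTCMonth() + 1).padStart(2, "0"),
    dd: String(date.getUTCDate()).padStart(2, "0"),
  };
  return template.replace(/\{(yyyy|mm|dd)\}/g, (_, key) => parts[key]);
}

console.log(expandTemplate("daily/{yyyy}-{mm}-{dd}/", new Date(Date.UTC(2024, 0, 5))));
// logs: daily/2024-01-05/
```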
243
+
244
+ ### User-repo coexistence
245
+
246
+ A wiki's filesystem location often sits inside the user's own git repository. The skill's private git never interferes with the user's git: every `git` subprocess runs with a strict isolation env (`GIT_DIR`, `GIT_CONFIG_NOSYSTEM=1`, `GIT_CONFIG_GLOBAL=/dev/null`, `HOME=<tmpdir>`, `core.hooksPath=/dev/null`, …). An auto-generated `<wiki>/.gitignore` hides `.llmwiki/`, `.work/`, and `.shape/history/*/work/` from any ancestor user git. The wiki content itself is plain markdown the user is encouraged to commit.
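The isolation settings listed above could be assembled roughly as follows. This is a sketch, not the contents of `git.mjs`: the `isolatedGit` helper and its return shape are hypothetical, and passing `core.hooksPath` through `-c` arguments is an assumption about how a config key would be applied alongside environment variables.

```javascript
// Build the environment and config arguments for an isolated git subprocess.
function isolatedGit(wikiDir, tmpHome, platform = process.platform) {
  const nullDevice = platform === "win32" ? "NUL" : "/dev/null";
  const env = {
    GIT_DIR: `${wikiDir}/.llmwiki/git`, // private repo, never the user's .git
    GIT_CONFIG_NOSYSTEM: "1", // ignore system-wide git config
    GIT_CONFIG_GLOBAL: nullDevice, // ignore the user's global git config
    HOME: tmpHome, // keep git away from the real home directory
  };
  const configArgs = ["-c", `core.hooksPath=${nullDevice}`]; // disable hooks
  if (platform === "win32") configArgs.push("-c", "core.longpaths=true");
  return { env, configArgs };
}

const { env, configArgs } = isolatedGit("./docs.wiki", "/tmp/llmwiki-home", "linux");
console.log(env.GIT_DIR, configArgs.join(" "));
// logs: ./docs.wiki/.llmwiki/git -c core.hooksPath=/dev/null
```

A caller would still need to merge in `PATH` before spawning `git`; it is left out here to keep the isolation-specific keys visible.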
247
+
248
+ ### Legacy `.llmwiki.v<N>/` auto-migration
249
+
250
+ When the skill encounters a pre-2.0 versioned sibling directory, the intent resolver halts with a migration prompt. On acceptance, the latest version is copied into a new `<source>.wiki/`, the private git is initialised, and the genesis commit is tagged `op/migrated-from-v<N>`. The old folder is left untouched; users prune it manually.
251
+
252
+ ## The Six Operations
253
+
254
+ | Operation | Purpose | Output |
255
+ | --------- | ------- | ------ |
256
+ | **Build** | Create a new wiki from raw sources | sibling: `<source>.wiki/` · in-place: mutates `<source>/` directly · hosted: user-chosen path under a layout contract |
257
+ | **Extend** | Add new sources to an existing wiki | new per-phase commits + a new `op/<id>` tag on the existing `<source>.wiki/` |
258
+ | **Validate** | Read-only invariant check | structured findings report (hard + soft) |
259
+ | **Rebuild** | Optimise structure for token efficiency | new per-phase commits on the same wiki; `--review` gates the commit-finalize step on user approval |
260
+ | **Fix** | Repair methodology divergences | new commits on the existing wiki; HUMAN-class findings surface as structured prompts for user resolution *(minimal build-forward stub for now; full fix pipeline + dedicated INT error code are future work)* |
261
+ | **Join** | Merge two or more wikis into one | new unified wiki at the user-chosen target *(stub; full join pipeline is future work)* |
262
+
263
+ ### Safety envelope (all operations)
264
+
265
+ - **Sources are immutable** in `sibling` and `hosted` modes; in `in-place` mode every change is anchored by the `pre-op/<op-id>` snapshot tag so rollback is byte-exact.
266
+ - **Every operation is a git sequence.** The pipeline always runs `pre-op snapshot → phase commits → validation → commit-finalize`. Validation failure triggers `git reset --hard pre-op/<id>` + `git clean -fd`; the failed phase commits survive in the reflog for post-mortem.
267
+ - **Rollback, diff, log, show, blame, history, reflog.** All exposed as subcommands and all byte-identical to native `git` under the isolation env. See `skill-llm-wiki diff/log/show/blame/history/reflog <wiki>`.
268
+ - **Phase-commit audit trail.** Each operation decomposes into named phases; every phase (and every operator-convergence iteration) is a git commit so the private repo's log is a complete per-phase audit trail. An interrupted operation can be inspected via `skill-llm-wiki log <wiki> --op <id>` and rolled back via `skill-llm-wiki rollback <wiki> --to pre-op/<op-id>`. True mid-phase resume ("pick up from the last per-item marker") is scoped as future work.
269
+ - **Deterministic.** Same source + `LLM_WIKI_FIXED_TIMESTAMP=<epoch>` → byte-identical HEAD commit AND tree SHAs across runs and across machines. When the env var is set, `newOpId` replaces the random component with the literal `"deterministic"`, so the op-id, tag bodies, commit objects, and tree objects are all reproducible. AI calls are cached by request hash; similarity decisions are cached by content-hash pair.
270
+ - **Atomic commit-finalize.** The final `op/<op-id>` tag is set as the last step of every operation; until that tag exists, the operation is still reversible in one command.
271
+ - **Optional interactive review.** `rebuild --review` prints `git diff --stat` + the per-iteration commit list and prompts approve / abort / `drop:<sha>`. Drops become `git revert --no-edit` commits and the loop re-prompts so the user can drop multiple iterations.
272
+ - **Never-auto-push remote mirroring.** `skill-llm-wiki remote <wiki> add <name> <url>` plus `skill-llm-wiki sync <wiki>` pushes tags (and optionally a branch) to a bare remote the user manages. Tag-only refspec by default; URL credentials are redacted in every echoed line and error message.
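Credential redaction of the kind mentioned in the last bullet can be approximated with a single regular expression. This is a minimal sketch with a hypothetical `redactUrl` helper; the skill's actual redaction rules and replacement text are not shown in this README.

```javascript
// Replace the userinfo part of any scheme://user:pass@host URL found in a line of text.
function redactUrl(text) {
  return text.replace(/\b([a-z][a-z0-9+.-]*:\/\/)[^\/\s@]+@/gi, "$1***@");
}

console.log(redactUrl("fatal: unable to access 'https://alice:s3cret@example.com/wiki.git/'"));
// logs: fatal: unable to access 'https://***@example.com/wiki.git/'
```

URLs without embedded credentials pass through unchanged, so the function can be applied to every echoed line and error message without special-casing.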
273
+
274
+ ## Phase-by-phase pipeline (the long version)
275
+
276
+ Every operation runs the same git-backed sequence end-to-end. Phases are explicit so you can read the private git's `log --oneline` after a run and recover the full story of what happened.
277
+
278
+ 1. **Preflight + pre-op snapshot** — Node and git version checks, private-git integrity check, then `git add -A && git commit -m "pre-op <op-id>"` + tag `pre-op/<op-id>`.
279
+ 2. **Ingest** (Build only) — walk the source tree, compute content hashes, emit entry candidates. Byte-range provenance is recorded to `<wiki>/.llmwiki/provenance.yaml` so `LOSS-01` can verify nothing was silently dropped. Extend / Rebuild / Fix / Join do not currently touch `provenance.yaml`.
280
+ 3. **Classify** — group entries into categories. Tiered AI ladder: TF-IDF → local MiniLM embeddings → Claude. Decisive Tier 0 / Tier 1 outcomes never reach Claude.
281
+ 4. **Draft frontmatter** — derive `id`, `focus`, `covers[]`, `activation`, `tags`, `parents[]` from structure where possible; Claude fallback for prose-heavy sources.
282
+ 5. **Layout** — place entries in a draft tree honouring the narrowing-chain rule.
283
+ 6. **Operator convergence** — apply DESCEND, LIFT, MERGE, NEST, DECOMPOSE in priority order until the tree reaches its normal form. One git commit per iteration so `git log pre-op/<id>..HEAD` reads like a per-iteration audit trail.
284
+ 7. **Review (optional, `--review` only)** — print `git diff --stat` + commit list; accept approve / abort / drop:<sha> from the user. Drops land as `git revert --no-edit` commits and the loop re-prompts.
285
+ 8. **Index generation** — emit a unified `index.md` at every directory with machine routing metadata in frontmatter and human/LLM orientation in the body.
286
+ 9. **Validation** — run hard invariants including the new `GIT-01` (private-git integrity under the isolation env) and `LOSS-01` (byte-range coverage equals source size). Failure triggers `git reset --hard pre-op/<id>` + `git clean -fd`.
287
+ 10. **Commit-finalize** — tag the final commit `op/<op-id>`, append to `<wiki>/.llmwiki/op-log.yaml`, delete the live `.work/` scratch directory. *(A "golden-path" phase that compares routing-fixture load sets against the prior op and a `.work/` → `.shape/history/<op-id>/` archive step are scoped as future work.)*
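The `LOSS-01` idea from the validation phase (recorded byte ranges must account for the whole source) can be sketched as below. The `coversWholeSource` helper and the half-open `{ start, end }` range shape are assumptions for illustration, not the format of `provenance.yaml`.

```javascript
// Ranges are half-open [start, end). True when together they cover bytes 0..sourceSize.
function coversWholeSource(ranges, sourceSize) {
  const sorted = [...ranges].sort((a, b) => a.start - b.start);
  let covered = 0; // bytes [0, covered) are accounted for so far
  for (const { start, end } of sorted) {
    if (start > covered) return false; // a gap means bytes were dropped
    covered = Math.max(covered, end);
  }
  return covered === sourceSize;
}

console.log(coversWholeSource([{ start: 0, end: 10 }, { start: 10, end: 25 }], 25)); // true
console.log(coversWholeSource([{ start: 0, end: 10 }, { start: 12, end: 25 }], 25)); // false
```

Sorting first lets the check run in one pass and tolerate ranges recorded out of order or overlapping.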
288
+
289
+ ## Wiki format
290
+
291
+ Every directory in a wiki holds exactly one `index.md`:
292
+
293
+ ```markdown
294
+ ---
295
+ id: installation
296
+ type: index
297
+ depth_role: category
298
+ depth: 1
299
+ focus: installing the product on supported platforms
300
+ parents:
301
+ - ../index.md
302
+ shared_covers:
303
+ - prerequisite checks
304
+ - post-install validation
305
+ entries:
306
+ - id: linux
307
+ file: linux.md
308
+ type: primary
309
+ focus: installing on Linux distributions
310
+ - id: macos
311
+ file: macos.md
312
+ type: primary
313
+ focus: installing on macOS
314
+ children: []
315
+ ---
316
+ <!-- BEGIN AUTO-GENERATED NAVIGATION -->
317
+ # Installation
318
+ ## Children
319
+ | File | Type | Focus |
320
+ | ... |
321
+ <!-- END AUTO-GENERATED NAVIGATION -->
322
+ <!-- BEGIN AUTHORED ORIENTATION -->
323
+ Human/LLM-authored prose, preserved across regenerations.
324
+ <!-- END AUTHORED ORIENTATION -->
325
+ ```
326
+
327
+ Leaves are `<id>.md` files with their own frontmatter (`id`, `type`, `focus`, `covers[]`, `parents[]`, `activation`, `tags`, `aliases`, `links`, `source`). The root `index.md` additionally carries a `generator: skill-llm-wiki/v1` marker that scripts use as a safety check before mutating anything.
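The generator-marker safety check could be approximated as follows. This is a sketch only: the helper names are hypothetical, and the line-based match stands in for real YAML parsing (the package has its own frontmatter parser, which this does not reproduce).

```javascript
// Return the raw frontmatter text between the leading '---' fences, or null if absent.
function frontmatterOf(markdown) {
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?/.exec(markdown);
  return match ? match[1] : null;
}

// True when the frontmatter carries the generator marker quoted above.
function isGeneratedRoot(markdown) {
  const frontmatter = frontmatterOf(markdown);
  if (frontmatter === null) return false;
  return frontmatter
    .split(/\r?\n/)
    .some((line) => line.trim() === "generator: skill-llm-wiki/v1");
}

console.log(isGeneratedRoot("---\nid: root\ngenerator: skill-llm-wiki/v1\n---\n# Root\n")); // true
console.log(isGeneratedRoot("# Just some notes\n")); // false
```

Refusing to mutate a directory whose root index lacks the marker is a cheap way to avoid overwriting a folder the skill did not create.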
328
+
329
+ ## Architecture
330
+
331
+ The installed skill contains only what Claude needs at runtime. Everything Claude reads is in `SKILL.md` and the `guide/` leaves; everything it executes is in the `scripts/` CLI.
332
+
333
+ ```text
334
+ skill-llm-wiki/ # installed package layout
335
+ ├── SKILL.md # routing entry point Claude reads at activation
336
+ ├── README.md # human-facing docs (this file)
337
+ ├── LICENSE
338
+ ├── guide/ # routing-time leaves loaded by Claude on keyword activation
339
+ │ ├── hidden-git.md # using the private git for history / diff / blame
340
+ │ ├── layout-modes.md # sibling vs in-place vs hosted
341
+ │ ├── user-intent.md # "ask, don't guess" scenarios
342
+ │ ├── tiered-ai.md # tier ladder and quality modes
343
+ │ ├── remote-sync.md # remote mirroring + redaction
344
+ │ └── … # (coexistence, scale, diff, in-place-mode, safety, operations/*)
345
+ └── scripts/
346
+ ├── cli.mjs # Deterministic CLI dispatcher — invoked, never read
347
+ ├── commands/ # Command-level orchestrators
348
+ │ ├── review.mjs # --review flow for rebuild
349
+ │ ├── remote.mjs # remote add/list/remove
350
+ │ └── sync.mjs # remote sync (tag-only default refspec)
351
+ └── lib/
352
+ ├── git.mjs # THE git subprocess spawner — isolation env + redaction
353
+ ├── git-commands.mjs # log/show/diff/blame/history/reflog subcommand bodies
354
+ ├── gitignore.mjs # auto-writer for the wiki-local `.gitignore`
355
+ ├── paths.mjs # Sibling/in-place/hosted recognition + `.llmwiki/git/` detection
356
+ ├── snapshot.mjs # preOpSnapshot + tag helpers
357
+ ├── rollback.mjs # ref verification + reset/clean
358
+ ├── history.mjs # op-log append/read, entry history traversal
359
+ ├── provenance.mjs # byte-range record / verifyCoverage (LOSS-01 source)
360
+ ├── chunk.mjs # Buffer-first frontmatter-only async iterator
361
+ ├── preflight.mjs # Node + git + wiki-fsck checks
362
+ ├── intent.mjs # layout-mode / target / op resolver (INT-NN errors)
363
+ ├── interactive.mjs # stdin prompts; non-TTY → hard error
364
+ ├── similarity.mjs # Tier 0 — TF-IDF + cosine
365
+ ├── embeddings.mjs # Tier 1 — MiniLM via @xenova/transformers (required)
366
+ ├── similarity-cache.mjs # pairwise memoisation
367
+ ├── decision-log.mjs # .llmwiki/decisions.yaml writer
368
+ ├── tiered.mjs # escalation orchestrator + quality modes
369
+ ├── migrate.mjs # legacy .llmwiki.v<N> → .wiki migration flow
370
+ ├── operators.mjs # The five rewrite operator primitives
371
+ ├── nest-applier.mjs # NEST apply + deterministic slug collision resolver
372
+ ├── cluster-detect.mjs # NEST candidate clusterer (affinity + threshold sweep)
373
+ ├── quality-metric.mjs # routing_cost metric for NEST gating
374
+ ├── frontmatter.mjs # Zero-dep YAML frontmatter parser/writer
+ ├── ingest.mjs # Source walk + content hashing
+ ├── draft.mjs # Deterministic frontmatter drafting + provenance record
+ ├── indices.mjs # Unified index.md rebuild
+ ├── validate.mjs # Hard-invariant checks including GIT-01 / LOSS-01
+ ├── shape-check.mjs # Operator candidate detection (hook-mode path; no git)
+ └── orchestrator.mjs # Per-phase commit pipeline
+ ```
+
+ `SKILL.md` and the `guide/` leaves are the only files Claude reads at routing/session time; the `scripts/` source is invoked as a process, never read. Every CLI subcommand's inputs, outputs, and exit codes are documented in `SKILL.md` so no source inspection is ever necessary during a session.
+
+ The development repository also contains `methodology.md`, an internal design reference for maintainers (sections 9.4.2, 9.4.3, 9.9, and 9.10 are the normative sources for this README's "Layout modes", "Ask, don't guess", "git-backed history", and "tiered AI" content, respectively). It is deliberately excluded from the installed package.
+
+ The CLI subcommands you will see the skill invoke:
+
+ ```bash
+ # Top-level operations (routed through intent.mjs)
+ node scripts/cli.mjs build <source> [--layout-mode sibling|in-place|hosted] [--target <path>]
+ node scripts/cli.mjs extend <wiki> <source>
+ node scripts/cli.mjs validate <wiki>
+ node scripts/cli.mjs rebuild <wiki> [--review]
+ node scripts/cli.mjs fix <wiki>
+ node scripts/cli.mjs join <target> <wiki-a> <wiki-b> [<wiki-c> ...]
+ node scripts/cli.mjs rollback <wiki> --to <ref>
+ node scripts/cli.mjs migrate <legacy-wiki>
+
+ # Hidden-git plumbing (all run under the isolation env)
+ node scripts/cli.mjs log <wiki> [--op <id>] [git-log-args...]
+ node scripts/cli.mjs show <wiki> <ref> [-- <path>]
+ node scripts/cli.mjs diff <wiki> [--op <id>] [git-diff-args...]
+ node scripts/cli.mjs blame <wiki> <path>
+ node scripts/cli.mjs history <wiki> <entry-id>
+ node scripts/cli.mjs reflog <wiki>
+
+ # Remote mirroring (never auto-pushes)
+ node scripts/cli.mjs remote <wiki> add <name> <url>
+ node scripts/cli.mjs remote <wiki> list
+ node scripts/cli.mjs remote <wiki> remove <name>
+ node scripts/cli.mjs sync <wiki> [--remote <name>] [--push-branch <branch>] [--skip-fetch] [--skip-push]
+
+ # Low-level helpers (invoked by SKILL.md routing, not user-facing)
+ node scripts/cli.mjs ingest <source>
+ node scripts/cli.mjs draft-leaf <candidate-file>
+ node scripts/cli.mjs draft-category <candidate-file>
+ node scripts/cli.mjs index-rebuild <wiki>
+ node scripts/cli.mjs index-rebuild-one <dir> <wiki>
+ node scripts/cli.mjs shape-check <wiki>
+
+ # Legacy helpers (still present for pre-Phase-2 `.llmwiki.vN` wikis)
+ node scripts/cli.mjs resolve-wiki <source>
+ node scripts/cli.mjs next-version <source>
+ node scripts/cli.mjs list-versions <source>
+ node scripts/cli.mjs set-current <source> <version>
+ ```
+
+ ## Validation invariants
+
+ Every wiki must pass the same set of hard invariants:
+
+ - `id` matches filename (leaves) or directory name (index files)
+ - `depth_role` matches actual tree depth
+ - **Strict narrowing** along every canonical `parents[0]` chain up to the root
+ - `parents[]` required and non-empty on every non-root entry
+ - **DAG acyclicity** — walking `parents[]` transitively never revisits the start
+ - **Canonical-parent consistency** — the entry lives inside `parents[0]`'s directory; soft parents only cross-reference
+ - No duplicate `id` anywhere; aliases do not collide with live ids
+ - `overlay_targets`, `links[].id`, and `parents[]` resolve via id or alias
+ - **Parent-file contract** — index bodies contain navigation and orientation only, no leaf-shaped content
+ - Every directory containing entries has a valid `index.md`
+ - Leaf size caps (500 lines for primaries, 200 for overlays)
+ - Source integrity — if `source.hash` is set, upstream content must still match
+ - **`GIT-01` — private-git integrity.** When `<wiki>/.llmwiki/git/HEAD` exists, `git fsck --no-dangling --no-reflogs` must succeed under the isolation env, and — when the op-log has at least one entry — the most recent logged op's `pre-op/<op-id>` tag must exist and be reachable from HEAD.
+ - **`LOSS-01` — byte-range coverage.** When `<wiki>/.llmwiki/provenance.yaml` exists, for every source file recorded in it, the total byte coverage (`sources[].byte_range` + `discarded_ranges[].byte_range`) must equal the manifest-recorded `source_size`, with no overlapping ranges. Sizes are read from the manifest so the check runs without needing access to the original source tree.
+
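The `LOSS-01` rule above amounts to an interval-tiling check: every byte of a source file must be covered exactly once, by either a kept or a discarded range. The sketch below is one way to express that, not the code in `provenance.mjs`. The function name `checkCoverage`, the half-open `[start, end)` pair representation, and the shape of the returned object are all assumptions made for illustration.

```javascript
// Hypothetical helper: not part of the package's documented API.
// Assumption: each range is a half-open [start, end) pair of byte offsets.
function checkCoverage(sourceSize, keptRanges, discardedRanges) {
  // Kept and discarded ranges count equally toward coverage.
  const ranges = [...keptRanges, ...discardedRanges]
    .map(([start, end]) => ({ start, end }))
    .sort((a, b) => a.start - b.start || a.end - b.end);

  let covered = 0;
  let prevEnd = 0;
  for (const { start, end } of ranges) {
    if (start < 0 || end < start || end > sourceSize) {
      return { ok: false, reason: "out-of-bounds", covered };
    }
    // After sorting, any overlap shows up as a start before the previous end.
    if (start < prevEnd) {
      return { ok: false, reason: "overlap", covered };
    }
    covered += end - start;
    prevEnd = end;
  }

  // Non-overlapping, in-bounds ranges that sum to sourceSize tile the file.
  if (covered !== sourceSize) {
    return { ok: false, reason: "gap", covered };
  }
  return { ok: true, reason: null, covered };
}
```

Because the ranges are in bounds and pairwise disjoint, comparing the summed length against the recorded size is enough to detect a gap; no access to the source bytes is needed, which matches the README's note that sizes come from the manifest.
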
+ Soft shape-signals (operator candidates, golden-path regressions, coverage holes) are reported separately and drive the next Rebuild without blocking current operations.
+
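Among the hard invariants listed above, DAG acyclicity is a standard graph check. A minimal sketch follows, assuming the entries have already been loaded into a `Map` from `id` to an object with a `parents` array. That in-memory shape and the name `findParentCycle` are illustrative assumptions, not the structures used by `validate.mjs`.

```javascript
// Hypothetical helper: assumes entries is a Map of id -> { parents: string[] }.
// Returns the first cycle found as a list of ids (start repeated at the end),
// or null when the parents[] graph is acyclic.
function findParentCycle(entries) {
  const UNSEEN = 0, ON_STACK = 1, DONE = 2;
  const state = new Map();

  function visit(id, trail) {
    const s = state.get(id) ?? UNSEEN;
    if (s === DONE) return null;
    if (s === ON_STACK) {
      // id is already on the current walk, so the walk has looped back.
      return [...trail.slice(trail.indexOf(id)), id];
    }
    state.set(id, ON_STACK);
    trail.push(id);
    // Unresolved parent ids are treated as having no parents here; a real
    // validator would report them under a separate invariant.
    for (const parent of entries.get(id)?.parents ?? []) {
      const cycle = visit(parent, trail);
      if (cycle) return cycle;
    }
    trail.pop();
    state.set(id, DONE);
    return null;
  }

  for (const id of entries.keys()) {
    const cycle = visit(id, []);
    if (cycle) return cycle;
  }
  return null;
}
```

The three-state marking visits each entry once, so the check stays linear in the number of entries plus parent links even when soft parents make the graph much denser than the directory tree.
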
+ ## Tiered AI strategy
+
+ Every decision the skill makes is classified against a three-tier ladder and escalated only when necessary:
+
+ | Phase | Primary tier | Escalation | Notes |
+ |------------------------------------|-------------------------------------------|------------|-------|
+ | ingest / layout / index / validate / commit / routing | None (deterministic scripts) | — | No similarity, no generation. |
+ | classify / operator-convergence / join collisions | TF-IDF → MiniLM embeddings → Claude | Full ladder | >90% of decisions resolve at Tier 0 or 1 on typical corpora. |
+ | draft-frontmatter | Heuristic extractor → Claude | Skip Tier 1 | Generation, not similarity. Claude only for prose-heavy sources. |
+ | Fix — AI-ASSIST class | Claude | — | Content generation. |
+ | Fix — HUMAN class | User prompt | — | Always asks. |
+
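Tier 0 in the table above is TF-IDF with cosine similarity. As a rough illustration of that technique, here is a textbook variant with smoothed IDF. It is not the code in `similarity.mjs`; the tokenizer, the smoothing formula, and the function names are assumptions chosen to keep the example short.

```javascript
// Illustrative only: a generic TF-IDF + cosine sketch, not the package's code.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

// Builds one sparse vector (Map of term -> weight) per input document.
function tfidfVectors(docs) {
  const tokenized = docs.map(tokenize);

  // Document frequency: in how many documents each term appears.
  const df = new Map();
  for (const tokens of tokenized) {
    for (const term of new Set(tokens)) {
      df.set(term, (df.get(term) ?? 0) + 1);
    }
  }

  return tokenized.map((tokens) => {
    const vec = new Map();
    for (const term of tokens) {
      vec.set(term, (vec.get(term) ?? 0) + 1); // raw term count
    }
    for (const [term, tf] of vec) {
      // Smoothed IDF keeps terms present in every document from vanishing.
      const idf = Math.log((1 + docs.length) / (1 + df.get(term))) + 1;
      vec.set(term, tf * idf);
    }
    return vec;
  });
}

// Cosine similarity between two sparse vectors; 0 when either is empty.
function cosine(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (const [term, weight] of a) {
    normA += weight * weight;
    if (b.has(term)) dot += weight * b.get(term);
  }
  for (const weight of b.values()) normB += weight * weight;
  return normA && normB ? dot / Math.sqrt(normA * normB) : 0;
}
```

Scores from a function like `cosine` fall between 0 and 1 for these non-negative weights, which is what makes a banded decision (clearly similar, clearly different, or mid-band) possible before any embedding or model call.
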
+ Quality modes select the escalation policy:
+
+ - `tiered-fast` (default) — full Tier 0 → 1 → 2 ladder.
+ - `claude-first` — skip Tier 1; mid-band Tier 0 escalates straight to Claude.
+ - `tier0-only` — air-gapped mode; mid-band becomes an "undecidable" marker resolved via the interactive review flow.
+
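The three quality modes above can be read as three paths through the same ladder. The sketch below shows that control flow under stated assumptions: the band edges `0.3` and `0.7` are invented for the example (the README does not give the real thresholds), the tiers are passed in as plain synchronous callbacks, and `decide` is a hypothetical name rather than the interface of `tiered.mjs`.

```javascript
// Hypothetical band edges: the README does not state the real thresholds.
const BAND = { low: 0.3, high: 0.7 };
const QUALITY_MODES = ["tiered-fast", "claude-first", "tier0-only"];

// Maps a similarity score to a confident verdict or to the mid-band.
function band(score) {
  if (score >= BAND.high) return "same";
  if (score <= BAND.low) return "different";
  return "mid";
}

// tiers: { tier0: () => score, tier1: () => score, tier2: () => verdict }
// The callbacks stand in for TF-IDF, local embeddings, and the model call.
function decide(mode, tiers) {
  if (!QUALITY_MODES.includes(mode)) {
    throw new Error(`unknown quality mode: ${mode}`);
  }

  // Every mode starts at Tier 0; confident scores never escalate.
  const t0 = band(tiers.tier0());
  if (t0 !== "mid") return { verdict: t0, tier: 0 };

  // tier0-only: the mid-band is left for the interactive review flow.
  if (mode === "tier0-only") return { verdict: "undecidable", tier: 0 };

  // tiered-fast: try embeddings first; claude-first skips this step.
  if (mode === "tiered-fast") {
    const t1 = band(tiers.tier1());
    if (t1 !== "mid") return { verdict: t1, tier: 1 };
  }

  // Both remaining paths end at Tier 2.
  return { verdict: tiers.tier2(), tier: 2 };
}
```

Passing the tiers as callbacks means the more expensive steps only run when a cheaper tier lands in the mid-band, which is the property the cost claims in this section depend on.
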
+ Tier 1 uses `@xenova/transformers` running `Xenova/all-MiniLM-L6-v2` locally via ONNX (~23 MB one-time model download, ~50 ms per text on CPU, zero API cost). It has been a **required** runtime dependency since v0.4.0: the dependency preflight at CLI startup verifies that it is resolvable and, if it is missing on a fresh checkout, offers to `npm install` it.
+
+ Token cost is proportional to *ambiguity*, not to corpus size. A 10k-entry wiki takes roughly the same Claude budget as a 100-entry wiki when it produces the same number of mid-band decisions. All AI calls are cached by request hash at `.work/ai-cache/`, and all pairwise similarity decisions are cached at `.llmwiki/similarity-cache/`, so resumed runs and re-runs replay them at no additional cost.
+
+ ## Development
+
+ ```bash
+ npm test # run smoke tests
+ node scripts/cli.mjs --version # print CLI version
+ node scripts/cli.mjs --help # list subcommands
+ ```
+
+ Smoke tests verify frontmatter round-tripping, source ingest, validation of a hand-built wiki, index-rebuild idempotency, and the script safety net against unrelated folders.
+
+ ## License
+
+ [MIT](LICENSE)