@ctxr/skill-llm-wiki 1.0.1
- package/CHANGELOG.md +134 -0
- package/LICENSE +21 -0
- package/README.md +484 -0
- package/SKILL.md +252 -0
- package/guide/basics/concepts.md +74 -0
- package/guide/basics/index.md +45 -0
- package/guide/basics/schema.md +140 -0
- package/guide/cli.md +256 -0
- package/guide/correctness/index.md +45 -0
- package/guide/correctness/invariants.md +89 -0
- package/guide/correctness/safety.md +96 -0
- package/guide/history/diff.md +110 -0
- package/guide/history/hidden-git.md +130 -0
- package/guide/history/index.md +52 -0
- package/guide/history/remote-sync.md +113 -0
- package/guide/index.md +134 -0
- package/guide/isolation/coexistence.md +134 -0
- package/guide/isolation/index.md +44 -0
- package/guide/isolation/scale.md +251 -0
- package/guide/layout/in-place-mode.md +97 -0
- package/guide/layout/index.md +53 -0
- package/guide/layout/layout-contract.md +131 -0
- package/guide/layout/layout-modes.md +115 -0
- package/guide/operations/index.md +76 -0
- package/guide/operations/ingest/build.md +75 -0
- package/guide/operations/ingest/extend.md +61 -0
- package/guide/operations/ingest/index.md +54 -0
- package/guide/operations/ingest/join.md +65 -0
- package/guide/operations/maintain/fix.md +66 -0
- package/guide/operations/maintain/index.md +47 -0
- package/guide/operations/maintain/rebuild.md +86 -0
- package/guide/operations/validate.md +48 -0
- package/guide/substrate/index.md +47 -0
- package/guide/substrate/operators.md +96 -0
- package/guide/substrate/tiered-ai.md +363 -0
- package/guide/ux/index.md +44 -0
- package/guide/ux/preflight.md +150 -0
- package/guide/ux/user-intent.md +135 -0
- package/package.json +55 -0
- package/scripts/cli.mjs +893 -0
- package/scripts/commands/remote.mjs +93 -0
- package/scripts/commands/review.mjs +253 -0
- package/scripts/commands/sync.mjs +84 -0
- package/scripts/lib/chunk.mjs +421 -0
- package/scripts/lib/cluster-detect.mjs +516 -0
- package/scripts/lib/decision-log.mjs +343 -0
- package/scripts/lib/draft.mjs +158 -0
- package/scripts/lib/embeddings.mjs +366 -0
- package/scripts/lib/frontmatter.mjs +497 -0
- package/scripts/lib/git-commands.mjs +155 -0
- package/scripts/lib/git.mjs +486 -0
- package/scripts/lib/gitignore.mjs +62 -0
- package/scripts/lib/history.mjs +331 -0
- package/scripts/lib/indices.mjs +510 -0
- package/scripts/lib/ingest.mjs +258 -0
- package/scripts/lib/intent.mjs +713 -0
- package/scripts/lib/interactive.mjs +99 -0
- package/scripts/lib/migrate.mjs +126 -0
- package/scripts/lib/nest-applier.mjs +260 -0
- package/scripts/lib/operators.mjs +1365 -0
- package/scripts/lib/orchestrator.mjs +718 -0
- package/scripts/lib/paths.mjs +197 -0
- package/scripts/lib/preflight.mjs +213 -0
- package/scripts/lib/provenance.mjs +672 -0
- package/scripts/lib/quality-metric.mjs +269 -0
- package/scripts/lib/query-fixture.mjs +71 -0
- package/scripts/lib/rollback.mjs +95 -0
- package/scripts/lib/shape-check.mjs +172 -0
- package/scripts/lib/similarity-cache.mjs +126 -0
- package/scripts/lib/similarity.mjs +230 -0
- package/scripts/lib/snapshot.mjs +54 -0
- package/scripts/lib/source-frontmatter.mjs +85 -0
- package/scripts/lib/tier2-protocol.mjs +470 -0
- package/scripts/lib/tiered.mjs +453 -0
- package/scripts/lib/validate.mjs +362 -0
package/README.md
ADDED
|
@@ -0,0 +1,484 @@
|
|
|
1
|
+
# skill-llm-wiki — Structured knowledge that your AI can actually use
|
|
2
|
+
|
|
3
|
+
[npm package](https://www.npmjs.com/package/@ctxr/skill-llm-wiki)
[License](LICENSE)
[CI workflow](.github/workflows/ci.yml)
|
|
6
|
+
|
|
7
|
+
> **Turn any folder of markdown, docs, or source into a deterministic, token-efficient knowledge base your AI agent reads the way you'd want it to — once, and only the parts it needs.**
|
|
8
|
+
|
|
9
|
+
## The problem every AI-heavy workflow eventually hits
|
|
10
|
+
|
|
11
|
+
You want your AI pair — Claude, Cursor, an agent loop, whatever — to *know things*. Architecture decisions. Runbooks. API contracts. Prior postmortems. Team conventions. The messy folder of `.md` notes you've been keeping for eighteen months.
|
|
12
|
+
|
|
13
|
+
So you dump it into the context window. And then you watch:
|
|
14
|
+
|
|
15
|
+
- **Token costs balloon** because every query re-reads the whole thing.
|
|
16
|
+
- **Answers go stale** because the AI grabbed a snippet from a doc that was deprecated three sprints ago.
|
|
17
|
+
- **Irrelevant context bleeds in** and your AI confidently cites something that isn't in scope for this task.
|
|
18
|
+
- **The folder structure drifts** because nobody has time to keep hand-maintained README indexes in sync with reality.
|
|
19
|
+
|
|
20
|
+
The fix isn't "more context window." It's **giving your AI a retrieval structure it can actually walk** — one that names what's in each subtree, routes queries to only the relevant leaves, and rewrites itself whenever the shape of the knowledge changes. That's what `skill-llm-wiki` builds.
|
|
21
|
+
|
|
22
|
+
## What you get
|
|
23
|
+
|
|
24
|
+
Point this skill at a folder. It produces an **LLM wiki**: a sibling folder of markdown files organised into a deterministic, token-efficient retrieval structure. Your AI agent reads the root index, makes a semantic routing decision based on each subcategory's `focus` string, descends only into the subtrees that match the current task, and loads exactly the leaves it needs.
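
The routing walk described above can be pictured with a small sketch. It is an illustration under stated assumptions, not the skill's router: the in-memory `tree` shape, the `route` function, and the keyword-overlap `matches` test are all hypothetical stand-ins, since the real routing decision is a semantic judgment the agent makes from each `focus` string.

```javascript
// Hypothetical in-memory stand-in for a wiki: each node has a focus string,
// optional child indexes (children), and optional leaf files (leaves).
const tree = {
  focus: "product documentation",
  children: [
    {
      focus: "installing the product on supported platforms",
      leaves: [
        { file: "installation/linux.md", focus: "Linux distributions (apt, dnf)" },
        { file: "installation/macos.md", focus: "macOS (Homebrew)" },
      ],
    },
    {
      focus: "operating and monitoring the service",
      leaves: [{ file: "operations/alerts.md", focus: "alert runbooks" }],
    },
  ],
};

// Crude stand-in for a semantic match: any shared word of 4+ letters.
function matches(focus, task) {
  const words = (s) => new Set(s.toLowerCase().match(/[a-z]{4,}/g) || []);
  const taskWords = words(task);
  return [...words(focus)].some((w) => taskWords.has(w));
}

// Descend only into matching subtrees; collect only matching leaves.
function route(node, task, loaded = []) {
  for (const leaf of node.leaves || []) {
    if (matches(leaf.focus, task)) loaded.push(leaf.file);
  }
  for (const child of node.children || []) {
    if (matches(child.focus, task)) route(child, task, loaded);
  }
  return loaded;
}

console.log(route(tree, "installing on Linux"));
```

The point of the sketch is the shape of the traversal: the operations subtree is never opened for an installation task, so none of its leaves are read.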
|
|
25
|
+
|
|
26
|
+
The wiki is **just markdown on disk**. No database, no vector store service, no lock-in. Every file is something you can read, edit, grep, and commit to your own git. The skill adds a hierarchy, a routing grammar, and a history substrate — then gets out of the way.
|
|
27
|
+
|
|
28
|
+
**Effectiveness, measured on real corpora:**
|
|
29
|
+
|
|
30
|
+
- **~90% of retrieval decisions resolve without reaching Claude** at all. TF-IDF + local MiniLM embeddings handle the routine cases for free, on-device, with zero API cost.
|
|
31
|
+
- **Token cost scales with *ambiguity*, not corpus size.** A 10,000-entry wiki costs roughly the same per query as a 100-entry wiki when the ambiguity rate is comparable. Decisive decisions short-circuit the ladder.
|
|
32
|
+
- **Dogfooded on itself.** The skill's own operational reference (`guide/`) was rebuilt by the skill. Same content, ~24% smaller on-disk after the convergence loop picked an 8-subcategory nested structure.
|
|
33
|
+
- **Deterministic and reproducible.** Same source + `LLM_WIKI_FIXED_TIMESTAMP=<epoch>` → byte-identical commit and tree SHAs across runs and across machines. Your build is hermetic.
|
|
34
|
+
- **Novel-corpus validated.** The 45-leaf `skill-code-review` corpus builds in one convergence iteration, 13 non-conflicting NESTs applied atomically, zero orphans, `validate` returns 0 errors / 0 warnings.
|
|
35
|
+
|
|
36
|
+
## Why this matters for AI-heavy workflows
|
|
37
|
+
|
|
38
|
+
If you are building anything that involves an AI agent reading your codebase, your docs, your notes, or your decisions — you are already paying a structure tax. Either your agent re-reads too much (token bill balloons, latency climbs, context gets noisy) or it reads the wrong thing (answers drift, confidence is unjustified, debugging takes longer than writing the feature).
|
|
39
|
+
|
|
40
|
+
A well-structured LLM wiki flips that. Instead of "cram everything into the prompt and hope the model attends to the right parts," you get:
|
|
41
|
+
|
|
42
|
+
- **Routing discipline.** Your agent walks a semantic hierarchy from the root, and only loads the leaves whose `focus` string actually matches the current task. No blind full-tree reads.
|
|
43
|
+
- **Fresh history, not stale snapshots.** Every operation is a git commit inside a private repo under `<wiki>/.llmwiki/git/`. Roll back a bad rebuild with one command. Diff two operations. Blame a line in an index. Your AI's knowledge base is version-controlled with the same discipline as your code.
|
|
44
|
+
- **Rewrite operators that fire themselves.** When you add 50 new docs and the tree shape drifts, the convergence loop (DESCEND → LIFT → MERGE → NEST → DECOMPOSE) detects the drift, proposes structural changes, gates each one on a routing-cost metric, and commits the ones that objectively improve retrieval. You never have to "go fix the index table of contents" again.
|
|
45
|
+
- **It works on anything.** Markdown notes, product docs, API references, research dumps, runbooks, ADRs, policy libraries, source code, mixed folders, whole monorepos — the ingester doesn't care.
|
|
46
|
+
|
|
47
|
+
This skill is built for people who ship with AI and want their AI to ship better — AI vibe coders who have moved past "paste the file and pray" and want their knowledge base to compound the same way their codebase does.
|
|
48
|
+
|
|
49
|
+
## How it works (the short version)
|
|
50
|
+
|
|
51
|
+
1. **Ingest** — walk the source folder, compute content hashes, emit one candidate per file with byte-range provenance so nothing is silently dropped.
|
|
52
|
+
2. **Draft frontmatter** — for each entry, derive `id`, `focus`, `covers[]`, `tags`, and `parents[]` from structure where possible; Claude fills in prose-heavy cases.
|
|
53
|
+
3. **Layout + operator convergence** — the convergence loop applies deterministic rewrite operators (DESCEND, LIFT, MERGE, NEST, DECOMPOSE) until the tree reaches its token-minimal normal form, measured by a `routing_cost` metric. Clusters are proposed via Tier 2 sub-agents; each application is gated on whether it actually improves routing cost and rolled back otherwise.
|
|
54
|
+
4. **Index generation** — every directory gets an `index.md` with machine routing metadata in frontmatter and human/LLM orientation prose in the body.
|
|
55
|
+
5. **Validate + commit-finalize** — hard invariants (id uniqueness, DAG acyclicity, narrowing-chain consistency, byte-range loss check, private-git integrity) run before the operation is allowed to finalise. Any failure rolls back the entire operation to the pre-op snapshot.
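
The gating idea in step 3 (apply a proposed rewrite, keep it only if routing cost improves, otherwise roll it back) can be sketched as follows. Everything here is an assumption made for illustration: the `routingCost` formula, the `converge` loop, and the two proposal functions are hypothetical and do not reproduce the skill's `routing_cost` metric or its operators.

```javascript
// Hypothetical cost: entries scanned at this index plus the average cost of
// descending into one child. Not the skill's real routing_cost formula.
function routingCost(node) {
  const kids = node.children || [];
  if (kids.length === 0) return 0;
  const avgChild = kids.reduce((sum, k) => sum + routingCost(k), 0) / kids.length;
  return kids.length + avgChild;
}

// Try each proposal in order; keep it only if the cost strictly drops.
function converge(tree, proposals) {
  let current = tree;
  let cost = routingCost(current);
  const log = [];
  for (const propose of proposals) {
    const candidate = propose(current);
    const candidateCost = routingCost(candidate);
    const accepted = candidateCost < cost;
    log.push({ operator: propose.name, cost: candidateCost, accepted });
    if (accepted) {
      current = candidate;
      cost = candidateCost;
    } // otherwise the candidate is discarded, i.e. "rolled back"
  }
  return { tree: current, cost, log };
}

// A flat index with eight leaves.
const flat = { id: "root", children: "abcdefgh".split("").map((id) => ({ id })) };

// Hypothetical proposal 1: split the children into two groups.
function nestInHalves(t) {
  const half = Math.ceil(t.children.length / 2);
  return {
    id: t.id,
    children: [
      { id: "group-1", children: t.children.slice(0, half) },
      { id: "group-2", children: t.children.slice(half) },
    ],
  };
}

// Hypothetical proposal 2: wrap every child in its own single-entry group.
function wrapEachChild(t) {
  return { id: t.id, children: t.children.map((c) => ({ id: c.id + "-group", children: [c] })) };
}

const result = converge(flat, [nestInHalves, wrapEachChild]);
console.log(result.cost, result.log);
```

Under this toy metric the first proposal lowers the cost and is kept, while the second adds a level without narrowing anything and is rejected.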
|
|
56
|
+
|
|
57
|
+
Every phase is a git commit in the wiki's private history, so you can inspect, diff, roll back, and mirror exactly like a real repo — because it is one.
|
|
58
|
+
|
|
59
|
+
## Features at a glance
|
|
60
|
+
|
|
61
|
+
- **Git-backed history.** Every operation is a snapshot + a series of per-phase commits under an isolated private git. Rollback, diff, blame, log, reflog, and remote mirroring are first-class skill subcommands — `skill-llm-wiki diff <wiki> --op <id>` is a passthrough to `git diff --find-renames --find-copies` scoped to the op's commit range, rollback is a byte-exact `git reset --hard pre-op/<id>`, and every URL printed by the remote-sync subcommands is redacted by default.
|
|
62
|
+
- **Stable sibling layout.** `<source>.wiki/` is the one folder a wiki ever lives in. No more `.llmwiki.v1`/`.v2`/`.v3` directory proliferation — prior states are reachable as git tags (`pre-op/<id>`, `op/<id>`) in the private repo.
|
|
63
|
+
- **Three layout modes, never guessed.** `sibling` (default), `in-place` (source IS the wiki), and `hosted` (user-chosen path with a `.llmwiki.layout.yaml` contract). Ambiguous invocations refuse and prompt — see the "Ask, don't guess" rule.
|
|
64
|
+
- **User-repo coexistence.** An auto-generated `.gitignore` hides the private metadata from any ancestor user git. The skill's isolation env block (`GIT_DIR`, `GIT_CONFIG_NOSYSTEM`, `core.hooksPath=/dev/null`, …) keeps the two gits from leaking into each other.
|
|
65
|
+
- **Tiered AI strategy.** TF-IDF (free) → local MiniLM embeddings (required, ~23 MB one-time model download, zero-API) → Claude (only for mid-band ambiguity and decisions requiring natural-language judgment). `--quality-mode tiered-fast|claude-first|tier0-only` selects the escalation policy.
|
|
66
|
+
- **Deterministic slug collisions.** NEST operator auto-resolves slug-vs-member-id collisions with a deterministic `-group` suffix before apply. Your convergence loop never needs manual retries for DUP-ID.
|
|
67
|
+
- **Optional interactive review.** `skill-llm-wiki rebuild <wiki> --review` prints the post-convergence diff and commit list, lets the user approve / abort / `drop:<sha>` specific iterations, and re-runs validation + index regen on the reverted tree.
|
|
68
|
+
- **Windows parity.** The CI matrix runs the smoke suite on both `ubuntu-latest` and `windows-latest`; the isolation env switches `/dev/null` to `NUL` and enables `core.longpaths=true` on Windows.
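
The tiered strategy listed above amounts to an escalation ladder: a cheap scorer answers when it is decisive, and only mid-band scores move up a tier. The sketch below shows that control flow. The thresholds, the `decide` function, the raw term-count cosine standing in for TF-IDF, and the constant-score Tier 1 placeholder are all assumptions for illustration, not the skill's implementation.

```javascript
// Thresholds are illustrative assumptions, not the skill's real values.
const DECISIVE_SAME = 0.8;
const DECISIVE_DIFFERENT = 0.2;

// Tier 0 stand-in: cosine similarity over raw term counts (no IDF weighting).
function termCosine(a, b) {
  const counts = (s) => {
    const m = new Map();
    for (const w of s.toLowerCase().match(/[a-z0-9]+/g) || []) m.set(w, (m.get(w) || 0) + 1);
    return m;
  };
  const ca = counts(a);
  const cb = counts(b);
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (const [w, n] of ca) {
    na += n * n;
    dot += n * (cb.get(w) || 0);
  }
  for (const n of cb.values()) nb += n * n;
  return na && nb ? dot / Math.sqrt(na * nb) : 0;
}

// Walk the ladder; stop at the first tier whose score is outside the mid-band.
function decide(a, b, tiers) {
  for (const tier of tiers) {
    const score = tier.score(a, b);
    if (score >= DECISIVE_SAME) return { tier: tier.name, verdict: "same" };
    if (score <= DECISIVE_DIFFERENT) return { tier: tier.name, verdict: "different" };
  }
  // Nothing local was decisive: this is the only case that would reach Claude.
  return { tier: "claude", verdict: "needs-judgment" };
}

const tiers = [
  { name: "tier0-terms", score: termCosine },
  // Placeholder for local embeddings; a real Tier 1 would embed both texts.
  { name: "tier1-embeddings", score: () => 0.5 },
];

console.log(decide("install on linux", "install on linux", tiers));
console.log(decide("install on linux", "quarterly revenue report", tiers));
console.log(decide("install on linux", "install on macos", tiers));
```

Identical and unrelated texts are settled at the first tier; only the partially overlapping pair falls through to the most expensive one.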
|
|
69
|
+
|
|
70
|
+
Works on any corpus: markdown notes, product docs, API references, research, runbooks, architecture records, policy libraries, source code, mixed folders, whole projects.
|
|
71
|
+
|
|
72
|
+
## Quick Start
|
|
73
|
+
|
|
74
|
+
```bash
# Install into your project
npx @ctxr/kit install @ctxr/skill-llm-wiki
```
|
|
78
|
+
|
|
79
|
+
Then in Claude Code, ask for any of the six operations:
|
|
80
|
+
|
|
81
|
+
```text
Build an LLM wiki from ./docs
Add ./arch to my docs wiki
Validate my docs wiki
Rebuild my docs wiki
Fix my docs wiki
Merge my docs and runbooks wikis into a handbook
```
|
|
89
|
+
|
|
90
|
+
## Requirements
|
|
91
|
+
|
|
92
|
+
This skill has two hard requirements. If either is missing, the skill will refuse to run and print a clear message explaining why and how to fix it.
|
|
93
|
+
|
|
94
|
+
1. **[Claude Code](https://claude.ai/code) CLI or IDE extension.**
|
|
95
|
+
2. **Node.js ≥ 18.0.0.** The skill's deterministic CLI (`scripts/cli.mjs`) is a Node.js program, so Node must be available in the shell Claude Code uses to run Bash commands. If Node.js is missing or below the minimum version, Claude will stop the operation before making any changes and relay platform-specific install instructions.
|
|
96
|
+
|
|
97
|
+
### Verify your environment before invoking the skill
|
|
98
|
+
|
|
99
|
+
Open a terminal and run:
|
|
100
|
+
|
|
101
|
+
```bash
node --version
```
|
|
104
|
+
|
|
105
|
+
- If you see `v18.0.0` or newer → you're ready.
|
|
106
|
+
- If you see a version below `v18.0.0` → upgrade Node.js before using the skill.
|
|
107
|
+
- If you see `command not found` or similar → install Node.js before using the skill.
|
|
108
|
+
|
|
109
|
+
### Installing or upgrading Node.js
|
|
110
|
+
|
|
111
|
+
Pick the option for your platform.
|
|
112
|
+
|
|
113
|
+
**macOS (Homebrew):**
|
|
114
|
+
|
|
115
|
+
```bash
brew install node     # or: brew upgrade node
```
|
|
118
|
+
|
|
119
|
+
**macOS / Linux (nvm — recommended for dev machines):**
|
|
120
|
+
|
|
121
|
+
```bash
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/master/install.sh | bash
nvm install 20
nvm use 20
```
|
|
126
|
+
|
|
127
|
+
**Linux (Debian/Ubuntu):**
|
|
128
|
+
|
|
129
|
+
```bash
curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash -
sudo apt-get install -y nodejs
```
|
|
133
|
+
|
|
134
|
+
**Linux (RHEL/Fedora):**
|
|
135
|
+
|
|
136
|
+
```bash
curl -fsSL https://rpm.nodesource.com/setup_20.x | sudo bash -
sudo dnf install -y nodejs
```
|
|
140
|
+
|
|
141
|
+
**Windows (winget):**
|
|
142
|
+
|
|
143
|
+
```powershell
winget install OpenJS.NodeJS
```
|
|
146
|
+
|
|
147
|
+
**Windows (Chocolatey):**
|
|
148
|
+
|
|
149
|
+
```powershell
choco install nodejs-lts
```
|
|
152
|
+
|
|
153
|
+
**Any platform:** download the official installer from <https://nodejs.org/en/download/>.
|
|
154
|
+
|
|
155
|
+
After installing, open a **fresh** terminal (so the shell picks up the new `PATH`) and verify with `node --version` again.
|
|
156
|
+
|
|
157
|
+
### Two-layer safety net
|
|
158
|
+
|
|
159
|
+
The skill checks Node.js availability before running any operation so you never see cryptic failures:
|
|
160
|
+
|
|
161
|
+
1. **Preflight (Bash).** Before the first CLI invocation of every operation, Claude runs `node --version` via Bash and stops with a detailed install message if Node is missing or too old. Nothing gets mutated before this check passes.
|
|
162
|
+
2. **Runtime guard (Node).** `scripts/cli.mjs` re-checks `process.version` as its very first action and exits with code 4 and a short message if somehow invoked on an unsupported Node. Defense-in-depth so even a broken shell environment cannot produce a half-finished wiki.
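
The runtime guard in the second layer can be approximated in a few lines. This is a minimal sketch assuming a simple numeric comparison of `process.version` against 18.0.0; the `isSupported` and `guard` names are hypothetical, and the message text is invented for the example. Only the exit code 4 comes from the description above.

```javascript
const MIN_NODE = [18, 0, 0];

// Compare a version string like "v18.17.1" against a [major, minor, patch] minimum.
function isSupported(version, min = MIN_NODE) {
  const parts = version.replace(/^v/, "").split(".").map((n) => parseInt(n, 10));
  for (let i = 0; i < 3; i++) {
    const have = parts[i] || 0;
    if (have > min[i]) return true;
    if (have < min[i]) return false;
  }
  return true; // exactly equal to the minimum
}

// Report and set exit code 4 when the running Node is too old.
function guard(version = process.version) {
  if (!isSupported(version)) {
    console.error(`Node.js >= ${MIN_NODE.join(".")} is required (found ${version}).`);
    process.exitCode = 4;
    return false;
  }
  return true;
}

guard();
```

Running the check before any other work is what keeps the failure side-effect free: nothing has been read or written by the time the process gives up.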
|
|
163
|
+
|
|
164
|
+
Both checks fail **loud and early** with a clear explanation and zero side-effects. The skill is safe to point at real folders on any machine.
|
|
165
|
+
|
|
166
|
+
## Installation
|
|
167
|
+
|
|
168
|
+
### Via @ctxr/kit
|
|
169
|
+
|
|
170
|
+
```bash
npx @ctxr/kit install @ctxr/skill-llm-wiki          # project-local
npx @ctxr/kit install @ctxr/skill-llm-wiki --user   # user-global
```
|
|
174
|
+
|
|
175
|
+
Installs to `.claude/skills/ctxr-skill-llm-wiki/` (or `~/.claude/skills/…` with `--user`). No post-install wiring, no automatic hooks, no filesystem watchers — the skill is pure standby until you explicitly ask Claude to run an operation against a specific directory.
|
|
176
|
+
|
|
177
|
+
The installed package contains `SKILL.md` (the routing entry point Claude reads at activation), `LICENSE`, `README.md`, `scripts/` (invoked via `node scripts/cli.mjs <subcommand>`, never read as source), and `guide/` (context-specific routing leaves loaded on keyword activation — `hidden-git.md` when the user asks about history or diff, `user-intent.md` when the request is ambiguous, `tiered-ai.md` when the user asks about quality modes, etc.). The internal design doc `methodology.md` is deliberately excluded from the installed package (`files[]` in `package.json` does not list it) so it is never copied into any user environment and never loaded during a session.
|
|
178
|
+
|
|
179
|
+
### Manual
|
|
180
|
+
|
|
181
|
+
```bash
git clone https://github.com/ctxr-dev/skill-llm-wiki.git /tmp/skill-llm-wiki
mkdir -p .claude/skills
cp -r /tmp/skill-llm-wiki .claude/skills/skill-llm-wiki
```
|
|
186
|
+
|
|
187
|
+
### Git Submodule
|
|
188
|
+
|
|
189
|
+
```bash
git submodule add https://github.com/ctxr-dev/skill-llm-wiki.git \
  .claude/skills/skill-llm-wiki
```
|
|
193
|
+
|
|
194
|
+
## Usage
|
|
195
|
+
|
|
196
|
+
Ask Claude for any of the six operations against a specific target directory. Examples:
|
|
197
|
+
|
|
198
|
+
```text
Build an LLM wiki from ./docs
# → creates ./docs.wiki/ next to ./docs, initialises the private
#   git at ./docs.wiki/.llmwiki/git/, tags pre-op/<id> and op/<id>

Add ./arch to my docs wiki
# → extends ./docs.wiki/ in place with a new op tag

Validate ./docs.wiki
# → read-only invariant check; prints findings with severity

Rebuild ./docs.wiki --review
# → runs convergence, prints the diff + per-iteration commit list,
#   and prompts approve / abort / drop:<sha> before validation

Diff ./docs.wiki --op <op-id> --stat
# → byte-identical native `git diff --stat` against the private repo

Rollback ./docs.wiki --to pre-op/<op-id>
# → byte-exact reset to the snapshot taken before that operation

Fix ./docs.wiki
# → runs AUTO-class repairs; HUMAN-class findings surface as structured prompts for the user to resolve

Merge ./docs.wiki and ./runbooks.wiki into handbook
# → creates ./handbook.wiki/ with merged content and rewired references
```
|
|
225
|
+
|
|
226
|
+
Nothing happens until you ask. The skill performs exactly the operation you request against the target you name, then stops. Ambiguous invocations (two folders would both match, two layout modes are both compatible, a default sibling would stomp on a foreign directory, …) refuse with an `INT-NN` structured error rather than guessing — the skill's "ask, don't guess" rule is a hard contract.
|
|
227
|
+
|
|
228
|
+
## Layout modes
|
|
229
|
+
|
|
230
|
+
Every operation accepts `--layout-mode <mode>`; the default is `sibling`. Ambiguous cases refuse and prompt — they are never silently resolved.
|
|
231
|
+
|
|
232
|
+
### `sibling` (default)
|
|
233
|
+
|
|
234
|
+
`<source>.wiki/` lives next to `<source>/`. One wiki, one sibling directory, forever. Subsequent Rebuilds update the same sibling in place; prior states are reachable as git tags in the private repo under `<wiki>/.llmwiki/git/`. No `.llmwiki.v<N>` directory proliferation — the private git is the authoritative history substrate.
|
|
235
|
+
|
|
236
|
+
### `in-place`
|
|
237
|
+
|
|
238
|
+
The source folder IS the wiki. `<source>/.llmwiki/git/` is created inside the source itself; the `pre-op/<first-op>` snapshot captures the user's original content byte-for-byte; subsequent operations mutate the source directly. Rollback to the snapshot tag restores the original tree exactly. **Only runs when explicitly requested** with `--layout-mode in-place` — never inferred.
|
|
239
|
+
|
|
240
|
+
### `hosted`
|
|
241
|
+
|
|
242
|
+
The wiki lives at a user-chosen path that carries a `.llmwiki.layout.yaml` contract. Pass `--layout-mode hosted --target <path>`. The contract describes the required directories, allowed entry types, dynamic subdirectory templates (e.g., `daily/{yyyy}-{mm}-{dd}/`), and any additional invariants. Hosted mode is designed for shared team wikis and for "my wiki lives at `./memory/knowledge/`, I don't want it next to any source folder" workflows.
|
|
243
|
+
|
|
244
|
+
### User-repo coexistence
|
|
245
|
+
|
|
246
|
+
A wiki's filesystem location often sits inside the user's own git repository. The skill's private git never interferes with the user's git: every `git` subprocess runs with a strict isolation env (`GIT_DIR`, `GIT_CONFIG_NOSYSTEM=1`, `GIT_CONFIG_GLOBAL=/dev/null`, `HOME=<tmpdir>`, `core.hooksPath=/dev/null`, …). An auto-generated `<wiki>/.gitignore` hides `.llmwiki/`, `.work/`, and `.shape/history/*/work/` from any ancestor user git. The wiki content itself is plain markdown the user is encouraged to commit.
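
One way to picture the isolation described above is a helper that assembles the environment and leading arguments for each `git` subprocess. This is a sketch under assumptions, not the contents of `scripts/lib/git.mjs`: the `isolatedGit` function and its parameters are hypothetical, and it covers only the settings named in this README (the elided ones are not guessed at).

```javascript
// Build the environment and leading arguments for an isolated git subprocess.
// Setting names come from the README; the function itself is hypothetical.
function isolatedGit(wikiDir, tmpHome, platform = process.platform) {
  const isWindows = platform === "win32";
  const nullDevice = isWindows ? "NUL" : "/dev/null";

  const env = {
    PATH: process.env.PATH,              // still needed to locate the git binary
    GIT_DIR: `${wikiDir}/.llmwiki/git`,  // private repo, never an ancestor .git
    GIT_CONFIG_NOSYSTEM: "1",            // ignore system-wide git config
    GIT_CONFIG_GLOBAL: nullDevice,       // ignore the user's global git config
    HOME: tmpHome,                       // keep per-user files out of reach
  };

  // Config overrides are passed as `-c key=value` ahead of the subcommand.
  const args = ["-c", `core.hooksPath=${nullDevice}`];
  if (isWindows) args.push("-c", "core.longpaths=true");

  return { env, args };
}

const { env, args } = isolatedGit("./docs.wiki", "/tmp/llmwiki-home", "linux");
console.log(env.GIT_DIR, args.join(" "));
```

A caller would pass `env` and `[...args, "log", "--oneline"]` to whatever process spawner it uses, with the wiki directory as the working directory.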
|
|
247
|
+
|
|
248
|
+
### Legacy `.llmwiki.v<N>/` auto-migration
|
|
249
|
+
|
|
250
|
+
When the skill encounters a pre-2.0 versioned sibling directory, the intent resolver halts with a migration prompt. On acceptance, the latest version is copied into a new `<source>.wiki/`, the private git is initialised, and the genesis commit is tagged `op/migrated-from-v<N>`. The old folder is left untouched; users prune it manually.
|
|
251
|
+
|
|
252
|
+
## The Six Operations
|
|
253
|
+
|
|
254
|
+
| Operation | Purpose | Output |
| --------- | ------- | ------ |
| **Build** | Create a new wiki from raw sources | sibling: `<source>.wiki/` · in-place: mutates `<source>/` directly · hosted: user-chosen path under a layout contract |
| **Extend** | Add new sources to an existing wiki | new per-phase commits + a new `op/<id>` tag on the existing `<source>.wiki/` |
| **Validate** | Read-only invariant check | structured findings report (hard + soft) |
| **Rebuild** | Optimise structure for token efficiency | new per-phase commits on the same wiki; `--review` gates the commit-finalize step on user approval |
| **Fix** | Repair methodology divergences | new commits on the existing wiki; HUMAN-class findings surface as structured prompts for user resolution *(minimal build-forward stub for now; full fix pipeline + dedicated INT error code are future work)* |
| **Join** | Merge two or more wikis into one | new unified wiki at the user-chosen target *(stub; full join pipeline is future work)* |
|
|
262
|
+
|
|
263
|
+
### Safety envelope (all operations)
|
|
264
|
+
|
|
265
|
+
- **Sources are immutable** in `sibling` and `hosted` modes; in `in-place` mode every change is anchored by the `pre-op/<op-id>` snapshot tag so rollback is byte-exact.
|
|
266
|
+
- **Every operation is a git sequence.** The pipeline always runs `pre-op snapshot → phase commits → validation → commit-finalize`. Validation failure triggers `git reset --hard pre-op/<id>` + `git clean -fd`; the failed phase commits survive in the reflog for post-mortem.
|
|
267
|
+
- **Rollback, diff, log, show, blame, history, reflog.** All exposed as subcommands and all byte-identical to native `git` under the isolation env. See `skill-llm-wiki diff/log/show/blame/history/reflog <wiki>`.
|
|
268
|
+
- **Phase-commit audit trail.** Each operation decomposes into named phases; every phase (and every operator-convergence iteration) is a git commit, so the private repo's log is a complete per-phase audit trail. An interrupted operation can be inspected via `skill-llm-wiki log <wiki> --op <id>` and rolled back via `skill-llm-wiki rollback <wiki> --to pre-op/<op-id>`. True mid-phase resume ("pick up from the last per-item marker") is scoped as future work.
|
|
269
|
+
- **Deterministic.** Same source + `LLM_WIKI_FIXED_TIMESTAMP=<epoch>` → byte-identical HEAD commit AND tree SHAs across runs and across machines. When the env var is set, `newOpId` replaces the random component with the literal `"deterministic"`, so the op-id, tag bodies, commit objects, and tree objects are all reproducible. AI calls are cached by request hash; similarity decisions are cached by content-hash pair.
|
|
270
|
+
- **Atomic commit-finalize.** The final `op/<op-id>` tag is set as the last step of every operation; until that tag exists, the operation is still reversible in one command.
|
|
271
|
+
- **Optional interactive review.** `rebuild --review` prints `git diff --stat` + the per-iteration commit list and prompts approve / abort / `drop:<sha>`. Drops become `git revert --no-edit` commits and the loop re-prompts so the user can drop multiple iterations.
|
|
272
|
+
- **Never-auto-push remote mirroring.** `skill-llm-wiki remote <wiki> add <name> <url>` plus `skill-llm-wiki sync <wiki>` pushes tags (and optionally a branch) to a bare remote the user manages. Tag-only refspec by default; URL credentials are redacted in every echoed line and error message.
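
The credential redaction mentioned in the last bullet can be illustrated with a single regular expression. This is a minimal sketch, assuming credentials appear only in the `scheme://user:secret@host` form; the `redactUrl` name and the `***` placeholder are hypothetical and may differ from what the skill prints.

```javascript
// Replace the userinfo part of any scheme://user[:secret]@host URL in a
// string with a fixed placeholder. URLs without credentials are untouched.
function redactUrl(text) {
  return text.replace(/\b([a-z][a-z0-9+.-]*:\/\/)[^\s\/@]+@/gi, "$1***@");
}

console.log(redactUrl("https://alice:s3cret@example.com/wiki.git"));
console.log(redactUrl("fatal: unable to access 'https://alice:s3cret@example.com/wiki.git/'"));
console.log(redactUrl("https://example.com/wiki.git"));
```

Applying the same function to every echoed line and every error message, rather than only to the URL argument, is what keeps a secret from leaking through text produced by `git` itself.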
|
|
273
|
+
|
|
274
|
+
## Phase-by-phase pipeline (the long version)
|
|
275
|
+
|
|
276
|
+
Every operation runs the same git-backed sequence end-to-end. Phases are explicit so you can read the private git's `log --oneline` after a run and recover the full story of what happened.
|
|
277
|
+
|
|
278
|
+
1. **Preflight + pre-op snapshot** — Node and git version checks, private-git integrity check, then `git add -A && git commit -m "pre-op <op-id>"` + tag `pre-op/<op-id>`.
|
|
279
|
+
2. **Ingest** (Build only) — walk the source tree, compute content hashes, emit entry candidates. Byte-range provenance is recorded to `<wiki>/.llmwiki/provenance.yaml` so `LOSS-01` can verify nothing was silently dropped. Extend / Rebuild / Fix / Join do not currently touch `provenance.yaml`.
|
|
280
|
+
3. **Classify** — group entries into categories. Tiered AI ladder: TF-IDF → local MiniLM embeddings → Claude. Decisive Tier 0 / Tier 1 outcomes never reach Claude.
|
|
281
|
+
4. **Draft frontmatter** — derive `id`, `focus`, `covers[]`, `activation`, `tags`, `parents[]` from structure where possible; Claude fallback for prose-heavy sources.
|
|
282
|
+
5. **Layout** — place entries in a draft tree honouring the narrowing-chain rule.
|
|
283
|
+
6. **Operator convergence** — apply DESCEND, LIFT, MERGE, NEST, DECOMPOSE in priority order until the tree reaches its normal form. One git commit per iteration so `git log pre-op/<id>..HEAD` reads like a per-iteration audit trail.
|
|
284
|
+
7. **Review (optional, `--review` only)** — print `git diff --stat` + commit list; accept approve / abort / drop:<sha> from the user. Drops land as `git revert --no-edit` commits and the loop re-prompts.
|
|
285
|
+
8. **Index generation** — emit a unified `index.md` at every directory with machine routing metadata in frontmatter and human/LLM orientation in the body.
|
|
286
|
+
9. **Validation** — run hard invariants including the new `GIT-01` (private-git integrity under the isolation env) and `LOSS-01` (byte-range coverage equals source size). Failure triggers `git reset --hard pre-op/<id>` + `git clean -fd`.
|
|
287
|
+
10. **Commit-finalize** — tag the final commit `op/<op-id>`, append to `<wiki>/.llmwiki/op-log.yaml`, delete the live `.work/` scratch directory. *(A "golden-path" phase that compares routing-fixture load sets against the prior op and a `.work/` → `.shape/history/<op-id>/` archive step are scoped as future work.)*
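
The `LOSS-01` idea from steps 2 and 9 (recorded byte ranges must account for the whole source) reduces to an interval-coverage check. The sketch below is a simplified illustration with a hypothetical `checkCoverage` function and a hypothetical range shape; it is not the skill's provenance code and ignores details such as per-file records in `provenance.yaml`.

```javascript
// ranges: [{ start, end }] half-open byte ranges [start, end) claimed by entries.
// Returns the gaps that no range accounts for within [0, sourceSize).
function checkCoverage(ranges, sourceSize) {
  const sorted = [...ranges].sort((a, b) => a.start - b.start);
  let covered = 0; // everything in [0, covered) is accounted for
  const gaps = [];
  for (const { start, end } of sorted) {
    if (start > covered) gaps.push({ start: covered, end: start });
    covered = Math.max(covered, end);
  }
  if (covered < sourceSize) gaps.push({ start: covered, end: sourceSize });
  return { ok: gaps.length === 0, gaps };
}

// Two entries that together cover a 25-byte source.
console.log(checkCoverage([{ start: 10, end: 25 }, { start: 0, end: 10 }], 25));

// Bytes 10..15 were silently dropped.
console.log(checkCoverage([{ start: 0, end: 10 }, { start: 15, end: 25 }], 25));
```

Reporting the gaps themselves, rather than a bare pass or fail, makes it possible to say which bytes of which source went missing.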
|
|
288
|
+
|
|
289
|
+
## Wiki format
|
|
290
|
+
|
|
291
|
+
Every directory in a wiki holds exactly one `index.md`:
|
|
292
|
+
|
|
293
|
+
```markdown
---
id: installation
type: index
depth_role: category
depth: 1
focus: installing the product on supported platforms
parents:
  - ../index.md
shared_covers:
  - prerequisite checks
  - post-install validation
entries:
  - id: linux
    file: linux.md
    type: primary
    focus: installing on Linux distributions
  - id: macos
    file: macos.md
    type: primary
    focus: installing on macOS
children: []
---
<!-- BEGIN AUTO-GENERATED NAVIGATION -->
# Installation
## Children
| File | Type | Focus |
| ... |
<!-- END AUTO-GENERATED NAVIGATION -->
<!-- BEGIN AUTHORED ORIENTATION -->
Human/LLM-authored prose, preserved across regenerations.
<!-- END AUTHORED ORIENTATION -->
```
|
|
326
|
+
|
|
327
|
+
Leaves are `<id>.md` files with their own frontmatter (`id`, `type`, `focus`, `covers[]`, `parents[]`, `activation`, `tags`, `aliases`, `links`, `source`). The root `index.md` additionally carries a `generator: skill-llm-wiki/v1` marker that scripts use as a safety check before mutating anything.
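
The `generator` marker check mentioned above might look something like the following. This is a sketch only: `readFrontmatter` and `isManagedWiki` are hypothetical names, and the line-based matching is a shortcut that sidesteps real YAML parsing (the package ships its own frontmatter parser, which is not reproduced here).

```javascript
// Return the lines between the opening and closing '---' markers, or null.
function readFrontmatter(text) {
  const lines = text.split(/\r?\n/);
  if (lines[0] !== "---") return null;
  const end = lines.indexOf("---", 1);
  return end === -1 ? null : lines.slice(1, end);
}

// True only when the root index declares the expected generator marker.
function isManagedWiki(rootIndexText) {
  const fm = readFrontmatter(rootIndexText);
  if (fm === null) return false;
  return fm.some((line) => /^generator:\s*["']?skill-llm-wiki\/v1["']?\s*$/.test(line));
}

const managed = "---\nid: root\ntype: index\ngenerator: skill-llm-wiki/v1\n---\n# Root\n";
const foreign = "---\ntitle: My notes\n---\n# Notes\n";

console.log(isManagedWiki(managed));
console.log(isManagedWiki(foreign));
```

A mutating command would run a check like this first and refuse to proceed when it returns `false`, so a directory the skill did not create is never rewritten by accident.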
|
|
328
|
+
|
|
329
|
+
## Architecture
|
|
330
|
+
|
|
331
|
+
The installed skill contains only what Claude needs at runtime. Claude reads `SKILL.md` at activation and loads `guide/` leaves on keyword activation; everything it executes is in the `scripts/` CLI.

```text
skill-llm-wiki/                   # installed package layout
├── SKILL.md                      # the ONLY file Claude reads — fully self-contained
├── README.md                     # human-facing docs (this file)
├── LICENSE
├── guide/                        # routing-time leaves loaded by Claude on keyword activation
│   ├── hidden-git.md             # using the private git for history / diff / blame
│   ├── layout-modes.md           # sibling vs in-place vs hosted
│   ├── user-intent.md            # "ask, don't guess" scenarios
│   ├── tiered-ai.md              # tier ladder and quality modes
│   ├── remote-sync.md            # remote mirroring + redaction
│   └── …                         # (coexistence, scale, diff, in-place-mode, safety, operations/*)
└── scripts/
    ├── cli.mjs                   # Deterministic CLI dispatcher — invoked, never read
    ├── commands/                 # Command-level orchestrators
    │   ├── review.mjs            # --review flow for rebuild
    │   ├── remote.mjs            # remote add/list/remove
    │   └── sync.mjs              # remote sync (tag-only default refspec)
    └── lib/
        ├── git.mjs               # THE git subprocess spawner — isolation env + redaction
        ├── git-commands.mjs      # log/show/diff/blame/history/reflog subcommand bodies
        ├── gitignore.mjs         # auto-writer for the wiki-local `.gitignore`
        ├── paths.mjs             # Sibling/in-place/hosted recognition + `.llmwiki/git/` detection
        ├── snapshot.mjs          # preOpSnapshot + tag helpers
        ├── rollback.mjs          # ref verification + reset/clean
        ├── history.mjs           # op-log append/read, entry history traversal
        ├── provenance.mjs        # byte-range record / verifyCoverage (LOSS-01 source)
        ├── chunk.mjs             # Buffer-first frontmatter-only async iterator
        ├── preflight.mjs         # Node + git + wiki-fsck checks
        ├── intent.mjs            # layout-mode / target / op resolver (INT-NN errors)
        ├── interactive.mjs       # stdin prompts; non-TTY → hard error
        ├── similarity.mjs        # Tier 0 — TF-IDF + cosine
        ├── embeddings.mjs        # Tier 1 — MiniLM via @xenova/transformers (required)
        ├── similarity-cache.mjs  # pairwise memoisation
        ├── decision-log.mjs      # .llmwiki/decisions.yaml writer
        ├── tiered.mjs            # escalation orchestrator + quality modes
        ├── migrate.mjs           # legacy .llmwiki.v<N> → .wiki migration flow
        ├── operators.mjs         # The five rewrite operator primitives
        ├── nest-applier.mjs      # NEST apply + deterministic slug collision resolver
        ├── cluster-detect.mjs    # NEST candidate clusterer (affinity + threshold sweep)
        ├── quality-metric.mjs    # routing_cost metric for NEST gating
        ├── frontmatter.mjs       # Zero-dep YAML frontmatter parser/writer
        ├── ingest.mjs            # Source walk + content hashing
        ├── draft.mjs             # Deterministic frontmatter drafting + provenance record
        ├── indices.mjs           # Unified index.md rebuild
        ├── validate.mjs          # Hard-invariant checks including GIT-01 / LOSS-01
        ├── shape-check.mjs       # Operator candidate detection (hook-mode path; no git)
        └── orchestrator.mjs      # Per-phase commit pipeline
```

`SKILL.md` and the `guide/` leaves are the only files Claude reads at routing/session time; the `scripts/` source is invoked as a process, never read. Every CLI subcommand's inputs, outputs, and exit codes are documented in `SKILL.md` so no source inspection is ever necessary during a session.

The development repository also contains `methodology.md`, an internal design reference for maintainers (sections 9.4.2/9.4.3/9.9/9.10 are the normative source for this README's "Layout modes", "Ask, don't guess", "git-backed history", and "tiered AI" content respectively). It is deliberately excluded from the installed package.

The CLI subcommands you will see the skill invoke:

```bash
# Top-level operations (routed through intent.mjs)
node scripts/cli.mjs build <source> [--layout-mode sibling|in-place|hosted] [--target <path>]
node scripts/cli.mjs extend <wiki> <source>
node scripts/cli.mjs validate <wiki>
node scripts/cli.mjs rebuild <wiki> [--review]
node scripts/cli.mjs fix <wiki>
node scripts/cli.mjs join <target> <wiki-a> <wiki-b> [<wiki-c> ...]
node scripts/cli.mjs rollback <wiki> --to <ref>
node scripts/cli.mjs migrate <legacy-wiki>

# Hidden-git plumbing (all run under the isolation env)
node scripts/cli.mjs log <wiki> [--op <id>] [git-log-args...]
node scripts/cli.mjs show <wiki> <ref> [-- <path>]
node scripts/cli.mjs diff <wiki> [--op <id>] [git-diff-args...]
node scripts/cli.mjs blame <wiki> <path>
node scripts/cli.mjs history <wiki> <entry-id>
node scripts/cli.mjs reflog <wiki>

# Remote mirroring (never auto-pushes)
node scripts/cli.mjs remote <wiki> add <name> <url>
node scripts/cli.mjs remote <wiki> list
node scripts/cli.mjs remote <wiki> remove <name>
node scripts/cli.mjs sync <wiki> [--remote <name>] [--push-branch <branch>] [--skip-fetch] [--skip-push]

# Low-level helpers (invoked by SKILL.md routing, not user-facing)
node scripts/cli.mjs ingest <source>
node scripts/cli.mjs draft-leaf <candidate-file>
node scripts/cli.mjs draft-category <candidate-file>
node scripts/cli.mjs index-rebuild <wiki>
node scripts/cli.mjs index-rebuild-one <dir> <wiki>
node scripts/cli.mjs shape-check <wiki>

# Legacy helpers (still present for pre-Phase-2 `.llmwiki.vN` wikis)
node scripts/cli.mjs resolve-wiki <source>
node scripts/cli.mjs next-version <source>
node scripts/cli.mjs list-versions <source>
node scripts/cli.mjs set-current <source> <version>
```

## Validation invariants

Every wiki passes the same set of hard invariants:

- `id` matches filename (leaves) or directory name (index files)
- `depth_role` matches actual tree depth
- **Strict narrowing** along every canonical `parents[0]` chain up to the root
- `parents[]` required and non-empty on every non-root entry
- **DAG acyclicity** — walking `parents[]` transitively never revisits the start
- **Canonical-parent consistency** — the entry lives inside `parents[0]`'s directory; soft parents only cross-reference
- No duplicate `id` anywhere; aliases do not collide with live ids
- `overlay_targets`, `links[].id`, and `parents[]` resolve via id or alias
- **Parent-file contract** — index bodies contain navigation and orientation only, no leaf-shaped content
- Every directory containing entries has a valid `index.md`
- Leaf size caps (500 lines for primaries, 200 for overlays)
- Source integrity — if `source.hash` is set, upstream content must still match
- **`GIT-01` — private-git integrity.** When `<wiki>/.llmwiki/git/HEAD` exists, `git fsck --no-dangling --no-reflogs` must succeed under the isolation env, and — when the op-log has at least one entry — the most recent logged op's `pre-op/<op-id>` tag must exist and be reachable from HEAD.
- **`LOSS-01` — byte-range coverage.** When `<wiki>/.llmwiki/provenance.yaml` exists, for every source file recorded in it, the total byte coverage (`sources[].byte_range` + `discarded_ranges[].byte_range`) must equal the manifest-recorded `source_size`, with no overlapping ranges. Sizes are read from the manifest so the check runs without needing access to the original source tree.
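
To make the `LOSS-01` rule concrete, here is a minimal sketch of a byte-range coverage check for a single source file. It rests on assumptions that the README does not confirm: each `byte_range` is modelled as a half-open `[start, end)` pair, the function name `verifyCoverageSketch` is hypothetical, and the bounds check against `sourceSize` is an extra guard added for the example.

```javascript
// Sketch only: the [start, end) pair is an assumed encoding of `byte_range`,
// not the documented provenance.yaml schema.
function verifyCoverageSketch(sourceSize, keptRanges, discardedRanges) {
  const ranges = [...keptRanges, ...discardedRanges]
    .map(([start, end]) => ({ start, end }))
    .sort((a, b) => a.start - b.start);

  let covered = 0;
  let cursor = 0; // end offset of the previous range
  for (const { start, end } of ranges) {
    if (end <= start) return { ok: false, reason: "empty or inverted range" };
    if (start < 0 || end > sourceSize) return { ok: false, reason: "range out of bounds" };
    if (start < cursor) return { ok: false, reason: "overlapping ranges" };
    covered += end - start;
    cursor = end;
  }
  if (covered !== sourceSize) {
    return { ok: false, reason: `covered ${covered} of ${sourceSize} bytes` };
  }
  return { ok: true, reason: "" };
}

console.log(verifyCoverageSketch(10, [[0, 4], [6, 10]], [[4, 6]]).ok); // true
console.log(verifyCoverageSketch(10, [[0, 4], [6, 10]], []).reason);   // covered 8 of 10 bytes
console.log(verifyCoverageSketch(10, [[0, 6]], [[4, 10]]).reason);     // overlapping ranges
```

Because the size comes in as a plain number, the check needs only manifest data, which matches the note above that the original source tree is not required.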

Soft shape-signals (operator candidates, golden-path regressions, coverage holes) are reported separately and drive the next Rebuild without blocking current operations.
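
Among the hard invariants listed earlier in this section, DAG acyclicity is the most algorithmic. The following sketch shows one common way to detect a cycle with a depth-first walk over `parents[]`. The input shape (a plain object mapping each id to its parents array) and the name `findParentCycle` are assumptions for illustration, not the validator's real interface.

```javascript
// Sketch only: `entries` maps each id to its parents[] array (assumed shape).
// Returns the first cycle found as a list of ids, or null when acyclic.
function findParentCycle(entries) {
  const state = new Map(); // id -> "visiting" | "done"
  const path = [];

  function visit(id) {
    if (state.get(id) === "done") return null;
    if (state.get(id) === "visiting") {
      return [...path.slice(path.indexOf(id)), id]; // close the loop
    }
    state.set(id, "visiting");
    path.push(id);
    const parents = Object.hasOwn(entries, id) ? entries[id] : [];
    for (const parent of parents) {
      const cycle = visit(parent);
      if (cycle) return cycle;
    }
    path.pop();
    state.set(id, "done");
    return null;
  }

  for (const id of Object.keys(entries)) {
    const cycle = visit(id);
    if (cycle) return cycle;
  }
  return null;
}

const healthy = { root: [], guides: ["root"], "hidden-git": ["guides", "root"] };
const broken = { root: [], a: ["b"], b: ["c"], c: ["a"] };

console.log(findParentCycle(healthy)); // null
console.log(findParentCycle(broken));  // [ 'a', 'b', 'c', 'a' ]
```

Marking nodes as "done" means each entry is expanded once, so the walk stays linear in the number of parent links even when soft parents create many shared ancestors.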

## Tiered AI strategy

Every decision the skill makes is classified against a three-tier ladder and escalated only when necessary:

| Phase | Primary tier | Escalation | Notes |
|-------|--------------|------------|-------|
| ingest / layout / index / validate / commit / routing | None (deterministic scripts) | — | No similarity, no generation. |
| classify / operator-convergence / join collisions | TF-IDF → MiniLM embeddings → Claude | Full ladder | >90% of decisions resolve at Tier 0 or 1 on typical corpora. |
| draft-frontmatter | Heuristic extractor → Claude | Skip Tier 1 | Generation, not similarity. Claude only for prose-heavy sources. |
| Fix — AI-ASSIST class | Claude | — | Content generation. |
| Fix — HUMAN class | User prompt | — | Always asks. |
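
Tier 0 in the table above is TF-IDF plus cosine similarity. As a rough illustration of that technique (a generic textbook version with smoothed IDF, not a copy of `similarity.mjs`), the sketch below builds sparse TF-IDF vectors and compares them. The tokenizer, the smoothing formula, and all function names are assumptions.

```javascript
// Generic TF-IDF + cosine sketch; tokenizer and IDF smoothing are assumptions.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

// Build one sparse TF-IDF vector (Map of term -> weight) per document.
function tfidfVectors(docs) {
  const tokenized = docs.map(tokenize);
  const docFreq = new Map();
  for (const tokens of tokenized) {
    for (const term of new Set(tokens)) {
      docFreq.set(term, (docFreq.get(term) ?? 0) + 1);
    }
  }
  return tokenized.map((tokens) => {
    const vec = new Map();
    for (const term of tokens) vec.set(term, (vec.get(term) ?? 0) + 1);
    for (const [term, tf] of vec) {
      // Smoothed IDF keeps terms shared by every document above zero.
      const idf = Math.log((1 + docs.length) / (1 + docFreq.get(term))) + 1;
      vec.set(term, tf * idf);
    }
    return vec;
  });
}

function cosine(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (const [term, w] of a) {
    normA += w * w;
    if (b.has(term)) dot += w * b.get(term);
  }
  for (const w of b.values()) normB += w * w;
  return normA && normB ? dot / Math.sqrt(normA * normB) : 0;
}

const [a, b, c] = tfidfVectors([
  "rollback a wiki to a pre-op tag",
  "roll back the wiki to an earlier pre-op tag",
  "embedding model download size",
]);
console.log(cosine(a, b) > cosine(a, c)); // true
```

No model or network access is involved, which is why a tier like this can run in air-gapped settings and resolve the clear-cut cases before anything more expensive is consulted.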

Quality modes select the escalation policy:

- `tiered-fast` (default) — full Tier 0 → 1 → 2 ladder.
- `claude-first` — skip Tier 1; mid-band Tier 0 escalates straight to Claude.
- `tier0-only` — air-gapped mode; mid-band becomes an "undecidable" marker resolved via the interactive review flow.
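
The three modes above can be read as a small routing function. The sketch below is one possible reading, not the logic in `tiered.mjs`: the band thresholds (`0.3` and `0.8`), the verdict labels, and the `decide` signature are all invented for illustration.

```javascript
// All thresholds and verdict labels below are illustrative assumptions.
const BAND = { low: 0.3, high: 0.8 };
const MODES = new Set(["tiered-fast", "claude-first", "tier0-only"]);

function band(score) {
  if (score >= BAND.high) return "same";
  if (score <= BAND.low) return "different";
  return "mid"; // ambiguous: a higher tier has to decide
}

// tier0Score: TF-IDF cosine. tier1Score: a function, so the embedding
// comparison is only computed when the ladder actually reaches Tier 1.
function decide(mode, tier0Score, tier1Score) {
  if (!MODES.has(mode)) throw new Error(`unknown quality mode: ${mode}`);

  const t0 = band(tier0Score);
  if (t0 !== "mid") return { tier: 0, verdict: t0 };
  if (mode === "tier0-only") return { tier: 0, verdict: "undecidable" };
  if (mode === "claude-first") return { tier: 2, verdict: "ask-claude" };

  // tiered-fast: try Tier 1 before spending tokens on Tier 2.
  const t1 = band(tier1Score());
  if (t1 !== "mid") return { tier: 1, verdict: t1 };
  return { tier: 2, verdict: "ask-claude" };
}

console.log(decide("tiered-fast", 0.9, () => 0.5));   // { tier: 0, verdict: 'same' }
console.log(decide("tiered-fast", 0.5, () => 0.85));  // { tier: 1, verdict: 'same' }
console.log(decide("claude-first", 0.5, () => 0.85)); // { tier: 2, verdict: 'ask-claude' }
console.log(decide("tier0-only", 0.5, () => 0.85));   // { tier: 0, verdict: 'undecidable' }
```

In this reading only mid-band scores ever leave Tier 0, so the mode changes where ambiguous decisions go and leaves clear-cut ones untouched.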

Tier 1 uses `@xenova/transformers` running `Xenova/all-MiniLM-L6-v2` locally via ONNX (~23 MB one-time model download, ~50 ms per text on CPU, zero API cost). It has been a **required** runtime dependency since v0.4.0: the dependency preflight at CLI startup verifies that it is resolvable and, on a fresh checkout where it is missing, offers to `npm install` it.

Token cost is proportional to *ambiguity*, not to corpus size. A 10k-entry wiki takes roughly the same Claude budget as a 100-entry wiki when it produces the same number of mid-band decisions. All AI calls are cached by request hash at `.work/ai-cache/`, and all pairwise similarity decisions are cached at `.llmwiki/similarity-cache/`, so resumes and re-runs replay at no extra cost.

## Development

```bash
npm test                          # run smoke tests
node scripts/cli.mjs --version    # print CLI version
node scripts/cli.mjs --help       # list subcommands
```

Smoke tests verify: frontmatter roundtrip, source ingest, validation of a hand-built wiki, index-rebuild idempotency, and the script safety net against unrelated folders.

## License

[MIT](LICENSE)