llm-wiki-compiler 0.2.0 → 0.4.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +163 -9
- package/dist/cli.js +1478 -417
- package/dist/cli.js.map +1 -1
- package/package.json +5 -2
package/README.md
CHANGED
@@ -66,6 +66,52 @@ Example with zero exports (Claude Code already configured):
 llmwiki compile
 ```
 
+### OpenAI-Compatible Local Servers
+
+Use the OpenAI provider for local OpenAI-compatible servers such as
+`llama-server`. `OPENAI_BASE_URL` is used for chat/tool calls, and
+`OPENAI_EMBEDDINGS_BASE_URL` is optional. Set it only when embeddings are
+served from a different endpoint; when unset, embeddings use the same client
+and base URL as chat. Include `/v1` in custom URLs.
+
+Split endpoint example:
+
+```bash
+export LLMWIKI_PROVIDER=openai
+export LLMWIKI_MODEL=qwen3.6-35b
+export LLMWIKI_EMBEDDING_MODEL=text-embedding-model
+export OPENAI_API_KEY=sk-local
+export OPENAI_BASE_URL=http://host_url:port/v1
+export OPENAI_EMBEDDINGS_BASE_URL=http://host_url:port/v1
+```
+
+`OPENAI_API_KEY` is still required by the CLI and OpenAI SDK. For local
+servers that do not check authentication, any dummy value is sufficient.
+
+### Ollama
+
+Ollama uses its OpenAI-compatible endpoint. Set `OLLAMA_HOST` for chat and
+optionally set `OLLAMA_EMBEDDINGS_HOST` only when embeddings are served from a
+different endpoint. When unset, embeddings use `OLLAMA_HOST`. Include `/v1` in
+custom URLs.
+
+```bash
+export LLMWIKI_PROVIDER=ollama
+export LLMWIKI_MODEL=llama3.1
+export LLMWIKI_EMBEDDING_MODEL=nomic-embed-text
+export OLLAMA_HOST=http://ollama_host:11434/v1
+export OLLAMA_EMBEDDINGS_HOST=http://ollama_host:11435/v1
+```
+
+### Request timeouts
+
+The OpenAI SDK defaults to a 10-minute per-request timeout, which can cut off long compile-time completions on slower local models. Override per provider:
+
+- `LLMWIKI_REQUEST_TIMEOUT_MS` — provider-agnostic timeout in milliseconds. Applies to both the `openai` and `ollama` backends.
+- `OLLAMA_TIMEOUT_MS` — Ollama-specific override. Wins over `LLMWIKI_REQUEST_TIMEOUT_MS` when both are set.
+
+Defaults: 10 minutes for `openai`, 30 minutes for `ollama` (local models commonly need more).
+
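The precedence rules in the added "Request timeouts" and endpoint sections above can be sketched in a few lines. This is only an illustration in Python, not the code shipped in `dist/cli.js`: the function names are hypothetical, and treating a non-numeric or non-positive value as "unset" is an assumption the README does not confirm.

```python
from typing import Mapping, Optional

# Defaults quoted from the README text: 10 minutes for openai, 30 for ollama.
DEFAULT_TIMEOUT_MS = {"openai": 10 * 60 * 1000, "ollama": 30 * 60 * 1000}


def _positive_int(raw: Optional[str]) -> Optional[int]:
    """Parse a positive integer; anything else counts as unset (assumption)."""
    if raw is None:
        return None
    try:
        value = int(raw)
    except ValueError:
        return None
    return value if value > 0 else None


def resolve_timeout_ms(provider: str, env: Mapping[str, str]) -> int:
    """Hypothetical helper: OLLAMA_TIMEOUT_MS wins for ollama, then the
    provider-agnostic variable, then the per-provider default."""
    if provider == "ollama":
        specific = _positive_int(env.get("OLLAMA_TIMEOUT_MS"))
        if specific is not None:
            return specific
    generic = _positive_int(env.get("LLMWIKI_REQUEST_TIMEOUT_MS"))
    if generic is not None:
        return generic
    return DEFAULT_TIMEOUT_MS[provider]


def resolve_embeddings_url(provider: str, env: Mapping[str, str]) -> Optional[str]:
    """Hypothetical helper: embeddings reuse the chat endpoint unless a
    separate embeddings endpoint is configured."""
    if provider == "ollama":
        return env.get("OLLAMA_EMBEDDINGS_HOST") or env.get("OLLAMA_HOST")
    return env.get("OPENAI_EMBEDDINGS_BASE_URL") or env.get("OPENAI_BASE_URL")
```

With both timeout variables set, `resolve_timeout_ms("ollama", env)` returns the `OLLAMA_TIMEOUT_MS` value, while the `openai` provider ignores it and uses `LLMWIKI_REQUEST_TIMEOUT_MS`.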
 ## Why not just RAG?
 
 RAG retrieves chunks at query time. Every question re-discovers the same relationships from scratch. Nothing accumulates.
@@ -99,6 +145,7 @@ A raw source like a Wikipedia article on knowledge compilation becomes a structu
 ---
 title: Knowledge Compilation
 summary: Techniques for converting knowledge representations into forms that support efficient reasoning.
+kind: concept
 sources:
 - knowledge-compilation.md
 createdAt: "2026-04-05T12:00:00Z"
@@ -113,7 +160,7 @@ a knowledge base into a target language that supports efficient queries.
 Related concepts: [[Propositional Logic]], [[Model Counting]]
 ```
 
-Pages include source attribution in frontmatter. Paragraphs are annotated with `^[filename.md]` markers pointing back to the source file that contributed the content.
+Pages include source attribution in frontmatter. Paragraphs are annotated with `^[filename.md]` markers pointing back to the source file that contributed the content; specific claims can use line ranges like `^[filename.md:42-58]` or `^[filename.md#L42-L58]`.
 
 ## Commands
 
@@ -121,9 +168,16 @@ Pages include source attribution in frontmatter. Paragraphs are annotated with `
 |---------|-------------|
 | `llmwiki ingest <url\|file>` | Fetch a URL or copy a local file into `sources/` |
 | `llmwiki compile` | Incremental compile: extract concepts, generate wiki pages |
+| `llmwiki compile --review` | Write candidate pages to `.llmwiki/candidates/` instead of `wiki/` so you can review before they land |
+| `llmwiki review list` | List pending candidate pages |
+| `llmwiki review show <id>` | Print a candidate's title, summary, and body |
+| `llmwiki review approve <id>` | Promote a candidate into `wiki/` and refresh index/MOC/embeddings |
+| `llmwiki review reject <id>` | Archive a candidate without touching `wiki/` |
+| `llmwiki schema init` | Write a starter `.llmwiki/schema.json` file |
+| `llmwiki schema show` | Print the resolved schema for the current project |
 | `llmwiki query "question"` | Ask questions against your compiled wiki |
 | `llmwiki query "question" --save` | Answer and save the result as a wiki page |
-| `llmwiki lint` | Check wiki quality (broken links, orphans, empty pages, etc.) |
+| `llmwiki lint` | Check wiki quality (broken links, orphans, empty pages, low confidence, contradictions, etc.) |
 | `llmwiki watch` | Auto-recompile when `sources/` changes |
 | `llmwiki serve [--root <dir>]` | Start an MCP server exposing wiki tools to AI agents |
 
@@ -131,13 +185,97 @@ Pages include source attribution in frontmatter. Paragraphs are annotated with `
 
 ```
 wiki/
-  concepts/
-  queries/
-  index.md
+  concepts/              one .md file per concept, with YAML frontmatter
+  queries/               saved query answers, included in index and retrieval
+  index.md               auto-generated table of contents
+.llmwiki/
+  schema.json            optional page-kind and cross-link policy
+  candidates/            pending review candidates from `compile --review`
+  candidates/archive/    rejected candidates kept for audit
 ```
 
 Obsidian-compatible. `[[wikilinks]]` resolve to concept titles.
 
+## Review queue
+
+By default, `compile` writes pages directly to `wiki/`. Add `--review` to write candidate JSON records to `.llmwiki/candidates/` instead, so you can inspect each generated page before it lands.
+
+```bash
+llmwiki compile --review      # produces candidates, leaves wiki/ untouched
+llmwiki review list           # see what's pending
+llmwiki review show <id>      # inspect a single candidate
+llmwiki review approve <id>   # write into wiki/ + refresh index/MOC/embeddings
+llmwiki review reject <id>    # archive to .llmwiki/candidates/archive/
+```
+
+A few things to know:
+
+- **Approve and reject acquire `.llmwiki/lock`** so they serialize cleanly against each other and against any concurrent `compile`.
+- **Source state is deferred per-source.** When one source produces multiple candidates, the source isn't marked compiled until the last candidate is approved — so unresolved siblings stay re-detectable on the next `compile --review`.
+- **Deletion bookkeeping is deferred.** `compile --review` does not orphan-mark deleted sources; the next non-review `compile` does that. The `--review` help text advertises this.
+- MCP `wiki_status` exposes `pendingCandidates` so agents can see the queue depth.
+
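The "source state is deferred per-source" rule above is easier to follow with a toy model. The sketch below is an assumption-laden illustration, not the tool's implementation: `ReviewState` and its fields are hypothetical names, the on-disk candidate format is not modeled, and rejection is left out because the README only describes what happens on approval.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class ReviewState:
    """Hypothetical in-memory stand-in for the review queue bookkeeping."""

    # source file -> ids of candidates still awaiting approval
    pending: Dict[str, Set[str]] = field(default_factory=dict)
    # sources that have been marked compiled
    compiled_sources: Set[str] = field(default_factory=set)

    def add_candidate(self, source: str, candidate_id: str) -> None:
        self.pending.setdefault(source, set()).add(candidate_id)

    def approve(self, source: str, candidate_id: str) -> bool:
        """Approve one candidate; return True once the source is marked compiled."""
        remaining = self.pending.get(source, set())
        if candidate_id not in remaining:
            raise KeyError(f"no pending candidate {candidate_id!r} for {source!r}")
        remaining.remove(candidate_id)
        if remaining:
            # Sibling candidates are unresolved, so the source stays
            # re-detectable on the next `compile --review`.
            return False
        del self.pending[source]
        self.compiled_sources.add(source)
        return True
```

In this model a source with two candidates is only added to `compiled_sources` by the second `approve` call, which mirrors the behavior the bullet describes.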
+## Page metadata
+
+Compiled pages can carry epistemic metadata in frontmatter so consumers know how trustworthy each page is. All fields are optional and existing pages without them continue to work.
+
+```yaml
+---
+title: Knowledge Compilation
+summary: Techniques for converting knowledge representations...
+sources:
+- knowledge-compilation.md
+confidence: 0.82          # 0–1, LLM-reported confidence in the synthesized page
+provenanceState: merged   # extracted | merged | inferred | ambiguous
+contradictedBy:
+- slug: probabilistic-reasoning
+inferredParagraphs: 1     # paragraphs the LLM marked as inferred (vs cited)
+---
+```
+
+When multiple sources merge into one slug, metadata is reconciled: `min` confidence, `provenanceState = 'merged'`, union of `contradictedBy` (deduped by slug), `max` `inferredParagraphs`.
+
+`llmwiki lint` adds three rules that surface this metadata:
+
+- `low-confidence` — flags pages with `confidence` below a threshold
+- `contradicted-page` — flags pages with non-empty `contradictedBy`
+- `excess-inferred-paragraphs` — flags pages with too many inferred paragraphs without citations
+
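The reconciliation rule stated above (minimum confidence, `merged` provenance, deduplicated union of `contradictedBy`, maximum `inferredParagraphs`) can be written out as a small function. This is a sketch under assumptions: `merge_page_metadata` is a hypothetical name, pages are represented as plain dictionaries, and skipping a field when no input page carries it is a guess based on "all fields are optional".

```python
from typing import Any, Dict, List


def merge_page_metadata(pages: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Reconcile metadata from several per-source pages that share one slug."""
    merged: Dict[str, Any] = {"provenanceState": "merged"}

    # The least confident contributor bounds the merged page.
    confidences = [p["confidence"] for p in pages if "confidence" in p]
    if confidences:
        merged["confidence"] = min(confidences)

    # Union of contradictedBy entries, deduplicated by slug, first seen wins.
    seen = set()
    contradicted = []
    for page in pages:
        for entry in page.get("contradictedBy", []):
            if entry["slug"] not in seen:
                seen.add(entry["slug"])
                contradicted.append(entry)
    if contradicted:
        merged["contradictedBy"] = contradicted

    # The largest inferred-paragraph count is kept.
    inferred = [p["inferredParagraphs"] for p in pages if "inferredParagraphs" in p]
    if inferred:
        merged["inferredParagraphs"] = max(inferred)
    return merged
```

Taking the minimum confidence and the maximum inferred-paragraph count is the conservative direction in both cases: a merged page is never reported as more trustworthy than its weakest input.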
+## Claim-level provenance
+
+Paragraph citations continue to use the original source-marker form:
+
+```markdown
+This paragraph is grounded in the source. ^[source.md]
+```
+
+For claims that need tighter verification, pages can pin a statement to a line range in the ingested source:
+
+```markdown
+The system uses a two-phase compile pipeline. ^[architecture-notes.md:42-58]
+The same range can also use GitHub-style anchors. ^[architecture-notes.md#L42-L58]
+```
+
+`llmwiki lint` validates both forms. It reports missing source files, malformed claim citations, impossible ranges like line `0` or `8-3`, and ranges that extend past the end of the source file.
+
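The lint checks listed above can be approximated with two regular expressions. The following is a minimal sketch, not the linter's real code: `check_citations`, the message strings, and passing source lengths as a `{filename: line_count}` mapping are all illustrative choices, and accepting a single line number such as `^[file.md:42]` is an assumption drawn from the README's mention of "line `0`".

```python
import re
from typing import List, Mapping

# Any ^[...] marker in the page body.
MARKER = re.compile(r"\^\[([^\]]+)\]")
# file.md, file.md:42, file.md:42-58, file.md#L42, file.md#L42-L58
CITATION = re.compile(
    r"(?P<file>[^:#]+)"
    r"(?::(?P<s1>\d+)(?:-(?P<e1>\d+))?|#L(?P<s2>\d+)(?:-L(?P<e2>\d+))?)?"
)


def check_citations(page_text: str, source_line_counts: Mapping[str, int]) -> List[str]:
    """Return one problem string per invalid citation marker, in page order."""
    problems: List[str] = []
    for marker in MARKER.finditer(page_text):
        inner = marker.group(1)
        parsed = CITATION.fullmatch(inner)
        if parsed is None:
            problems.append(f"malformed citation: ^[{inner}]")
            continue
        name = parsed["file"]
        if name not in source_line_counts:
            problems.append(f"missing source file: {name}")
            continue
        start = parsed["s1"] or parsed["s2"]
        if start is None:
            continue  # paragraph-level marker, no range to validate
        end = parsed["e1"] or parsed["e2"] or start
        start_n, end_n = int(start), int(end)
        if start_n < 1 or end_n < start_n:
            problems.append(f"impossible range: ^[{inner}]")
        elif end_n > source_line_counts[name]:
            problems.append(f"range past end of file: ^[{inner}]")
    return problems
```

For a 50-line `a.md`, the markers `^[a.md:0]` and `^[a.md#L8-L3]` are reported as impossible ranges and `^[a.md:40-60]` as running past the end of the file, while `^[a.md]` and `^[a.md:42-50]` pass.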
+## Schema layer
+
+Projects can optionally define `.llmwiki/schema.json` to shape the wiki beyond flat concept pages. Existing projects do not need a schema file; missing or invalid `kind` values fall back to `concept`.
+
+```bash
+llmwiki schema init
+llmwiki schema show
+```
+
+The schema supports four page kinds:
+
+- `concept` — standalone idea or pattern
+- `entity` — specific person, product, organization, or named artifact
+- `comparison` — side-by-side analysis across concepts or entities
+- `overview` — map page that connects several concepts in a domain
+
+Schema rules can set per-kind `minWikilinks` and optional `seedPages`. Compile can materialize seed pages such as overviews, lint enforces page-kind-specific cross-link minimums, and review candidates surface schema violations before approval.
+
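To make the `kind` fallback and the per-kind `minWikilinks` rule described above concrete, here is a rough sketch. The layout of `.llmwiki/schema.json` is not shown in this diff, so the `{"kinds": {kind: {"minWikilinks": n}}}` shape used here is purely hypothetical, as are the function names and the choice to count distinct link targets rather than total links.

```python
import re
from typing import Any, Dict, List

# The four page kinds named in the README.
PAGE_KINDS = {"concept", "entity", "comparison", "overview"}
WIKILINK = re.compile(r"\[\[([^\[\]]+)\]\]")


def resolve_kind(frontmatter: Dict[str, Any]) -> str:
    """Missing or invalid `kind` values fall back to `concept`."""
    kind = frontmatter.get("kind")
    return kind if isinstance(kind, str) and kind in PAGE_KINDS else "concept"


def check_min_wikilinks(
    frontmatter: Dict[str, Any], body: str, schema: Dict[str, Any]
) -> List[str]:
    """Return a violation message if the page has too few distinct wikilinks."""
    kind = resolve_kind(frontmatter)
    # Hypothetical schema shape; kinds without a rule require zero links.
    required = schema.get("kinds", {}).get(kind, {}).get("minWikilinks", 0)
    found = len(set(WIKILINK.findall(body)))
    if found < required:
        return [f"{kind} page has {found} wikilinks, schema requires {required}"]
    return []
```

Under a schema that requires three links for `overview` pages, a body containing only `[[Propositional Logic]]` and `[[Model Counting]]` would produce one violation, while the same body on a page with no `kind` is checked against the `concept` rule instead.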
 ## Demo
 
 Try it on any article or document:
@@ -232,17 +370,33 @@ Karpathy describes an abstract pattern for turning raw data into compiled knowle
 
 ## Roadmap
 
+Shipped in 0.4.0:
+
+- ✅ Claim-level provenance with source ranges
+- ✅ First-class schema layer with typed page kinds (`concept`, `entity`, `comparison`, `overview`)
+
+Shipped in 0.3.0:
+
+- ✅ Candidate review queue (approve compile output before pages are written)
+- ✅ Confidence and contradiction metadata on compiled pages
+
+Shipped in 0.2.0:
+
 - ✅ Better provenance (paragraph-level source attribution)
 - ✅ Linting pass for wiki quality checks
 - ✅ Multi-provider support (OpenAI, Ollama, MiniMax)
 - ✅ Larger-corpus query strategy (semantic search, embeddings)
 - ✅ Deeper Obsidian integration (tags, aliases, Map of Content)
 - ✅ MCP server for agent integration
-- Image support
-- Marp slides
-- Fine-tuning
 
-
+Next up:
+
+- Multimodal ingest (images, PDFs, transcripts)
+- Chunked retrieval with reranking
+- Export bundle (`llms.txt`, JSON, JSON-LD, GraphML, Marp)
+- Session-history adapters (Claude, Codex, Cursor exports)
+
+If you like ambitious problems: **multimodal ingest**, **chunked retrieval with reranking**, and **export bundles** are the meatiest. Open an issue to claim one or kick off a design discussion.
 
 ## Requirements
 