llm-wiki-compiler 0.3.0 → 0.5.0
- package/README.md +72 -7
- package/dist/cli.js +2651 -1576
- package/dist/cli.js.map +1 -1
- package/package.json +6 -3
package/README.md
CHANGED
@@ -103,6 +103,15 @@ export OLLAMA_HOST=http://ollama_host:11434/v1
 export OLLAMA_EMBEDDINGS_HOST=http://ollama_host:11435/v1
 ```
 
+### Request timeouts
+
+The OpenAI SDK defaults to a 10-minute per-request timeout, which can cut off long compile-time completions on slower local models. Override per provider:
+
+- `LLMWIKI_REQUEST_TIMEOUT_MS` — provider-agnostic timeout in milliseconds. Applies to both the `openai` and `ollama` backends.
+- `OLLAMA_TIMEOUT_MS` — Ollama-specific override. Wins over `LLMWIKI_REQUEST_TIMEOUT_MS` when both are set.
+
+Defaults: 10 minutes for `openai`, 30 minutes for `ollama` (local models commonly need more).
+
 ## Why not just RAG?
 
 RAG retrieves chunks at query time. Every question re-discovers the same relationships from scratch. Nothing accumulates.
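The precedence described in the added "Request timeouts" text can be sketched as below. This is a minimal illustration, not the package's actual code: the function names `resolveTimeoutMs` and `parsePositiveInt`, the `Provider` type, and the choice to ignore unparsable values are all assumptions. Only the two environment variable names and the default durations come from the README.

```typescript
type Provider = "openai" | "ollama";

// Defaults stated in the README: 10 minutes for openai, 30 minutes for ollama.
const DEFAULT_TIMEOUT_MS: Record<Provider, number> = {
  openai: 10 * 60 * 1000,
  ollama: 30 * 60 * 1000,
};

// Parse a positive integer; anything else counts as "not set" (an assumption).
function parsePositiveInt(raw: string | undefined): number | undefined {
  if (raw === undefined) return undefined;
  const value = Number(raw);
  return Number.isInteger(value) && value > 0 ? value : undefined;
}

// Hypothetical helper. The environment is passed in as a plain record so the
// sketch does not depend on Node-specific typings.
function resolveTimeoutMs(
  provider: Provider,
  env: Record<string, string | undefined>,
): number {
  // The Ollama-specific variable is only consulted for the ollama backend.
  const ollamaOverride =
    provider === "ollama" ? parsePositiveInt(env["OLLAMA_TIMEOUT_MS"]) : undefined;
  if (ollamaOverride !== undefined) return ollamaOverride;

  // Otherwise fall back to the provider-agnostic variable, then the default.
  const shared = parsePositiveInt(env["LLMWIKI_REQUEST_TIMEOUT_MS"]);
  return shared !== undefined ? shared : DEFAULT_TIMEOUT_MS[provider];
}
```

Calling `resolveTimeoutMs("ollama", process.env)` in a Node script would apply the same lookup to the real environment.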
@@ -136,6 +145,7 @@ A raw source like a Wikipedia article on knowledge compilation becomes a structu
 ---
 title: Knowledge Compilation
 summary: Techniques for converting knowledge representations into forms that support efficient reasoning.
+kind: concept
 sources:
 - knowledge-compilation.md
 createdAt: "2026-04-05T12:00:00Z"
@@ -150,7 +160,7 @@ a knowledge base into a target language that supports efficient queries.
 Related concepts: [[Propositional Logic]], [[Model Counting]]
 ```
 
-Pages include source attribution in frontmatter. Paragraphs are annotated with `^[filename.md]` markers pointing back to the source file that contributed the content.
+Pages include source attribution in frontmatter. Paragraphs are annotated with `^[filename.md]` markers pointing back to the source file that contributed the content; specific claims can use line ranges like `^[filename.md:42-58]` or `^[filename.md#L42-L58]`.
 
 ## Commands
 
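The three marker forms introduced in this hunk (`^[file.md]`, `^[file.md:42-58]`, `^[file.md#L42-L58]`) could be parsed along these lines. This is a sketch under stated assumptions, not the parser shipped in `dist/cli.js`: the `Citation` shape, the function names, the regular expression, and the error strings are hypothetical. The range checks mirror the problems the README says `llmwiki lint` reports (line `0`, reversed ranges such as `8-3`, and ranges past the end of the source file).

```typescript
interface Citation {
  file: string;
  startLine?: number;
  endLine?: number;
}

// Extract citation markers from a paragraph. Markers that fit none of the
// three forms are skipped here; a real linter would report them as malformed.
function parseCitations(text: string): Citation[] {
  // Assumes file names contain no whitespace, ':' or '#'.
  const re = /\^\[([^\]\s:#]+)(?::(\d+)-(\d+)|#L(\d+)-L(\d+))?\]/g;
  const citations: Citation[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    // Groups 2/3 hold the colon form, groups 4/5 the GitHub-style form.
    const start = m[2] !== undefined ? m[2] : m[4];
    const end = m[3] !== undefined ? m[3] : m[5];
    if (start !== undefined && end !== undefined) {
      citations.push({ file: m[1], startLine: Number(start), endLine: Number(end) });
    } else {
      citations.push({ file: m[1] });
    }
  }
  return citations;
}

// Return a description of what is wrong with a line range, or null if it is fine.
function rangeProblem(c: Citation, sourceLineCount: number): string | null {
  if (c.startLine === undefined || c.endLine === undefined) return null;
  if (c.startLine < 1) return "line numbers start at 1";
  if (c.endLine < c.startLine) return "range end precedes range start";
  if (c.endLine > sourceLineCount) return "range extends past end of file";
  return null;
}
```

Keeping parsing separate from validation means the line count of the source file only has to be known at lint time, not when the page is read.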
@@ -163,6 +173,8 @@ Pages include source attribution in frontmatter. Paragraphs are annotated with `
 | `llmwiki review show <id>` | Print a candidate's title, summary, and body |
 | `llmwiki review approve <id>` | Promote a candidate into `wiki/` and refresh index/MOC/embeddings |
 | `llmwiki review reject <id>` | Archive a candidate without touching `wiki/` |
+| `llmwiki schema init` | Write a starter `.llmwiki/schema.json` file |
+| `llmwiki schema show` | Print the resolved schema for the current project |
 | `llmwiki query "question"` | Ask questions against your compiled wiki |
 | `llmwiki query "question" --save` | Answer and save the result as a wiki page |
 | `llmwiki lint` | Check wiki quality (broken links, orphans, empty pages, low confidence, contradictions, etc.) |
@@ -177,6 +189,7 @@ wiki/
   queries/             saved query answers, included in index and retrieval
   index.md             auto-generated table of contents
 .llmwiki/
+  schema.json          optional page-kind and cross-link policy
   candidates/          pending review candidates from `compile --review`
   candidates/archive/  rejected candidates kept for audit
 ```
@@ -228,6 +241,41 @@ When multiple sources merge into one slug, metadata is reconciled: `min` confide
 - `contradicted-page` — flags pages with non-empty `contradictedBy`
 - `excess-inferred-paragraphs` — flags pages with too many inferred paragraphs without citations
 
+## Claim-level provenance
+
+Paragraph citations continue to use the original source-marker form:
+
+```markdown
+This paragraph is grounded in the source. ^[source.md]
+```
+
+For claims that need tighter verification, pages can pin a statement to a line range in the ingested source:
+
+```markdown
+The system uses a two-phase compile pipeline. ^[architecture-notes.md:42-58]
+The same range can also use GitHub-style anchors. ^[architecture-notes.md#L42-L58]
+```
+
+`llmwiki lint` validates both forms. It reports missing source files, malformed claim citations, impossible ranges like line `0` or `8-3`, and ranges that extend past the end of the source file.
+
+## Schema layer
+
+Projects can optionally define `.llmwiki/schema.json` to shape the wiki beyond flat concept pages. Existing projects do not need a schema file; missing or invalid `kind` values fall back to `concept`.
+
+```bash
+llmwiki schema init
+llmwiki schema show
+```
+
+The schema supports four page kinds:
+
+- `concept` — standalone idea or pattern
+- `entity` — specific person, product, organization, or named artifact
+- `comparison` — side-by-side analysis across concepts or entities
+- `overview` — map page that connects several concepts in a domain
+
+Schema rules can set per-kind `minWikilinks` and optional `seedPages`. Compile can materialize seed pages such as overviews, lint enforces page-kind-specific cross-link minimums, and review candidates surface schema violations before approval.
+
 ## Demo
 
 Try it on any article or document:
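The "Schema layer" text added in this hunk describes two behaviors that a short sketch can make concrete: the fallback of missing or invalid `kind` values to `concept`, and a per-kind minimum number of wikilinks. The code below is an illustration only. The `MinWikilinks` shape is a hypothetical in-memory form and not the actual `.llmwiki/schema.json` layout, which this diff does not show. The function names, the message format, and counting distinct link targets rather than total links are also assumptions.

```typescript
const PAGE_KINDS = ["concept", "entity", "comparison", "overview"] as const;
type PageKind = (typeof PAGE_KINDS)[number];

// Hypothetical in-memory rule set: minimum wikilinks required per page kind.
type MinWikilinks = Partial<Record<PageKind, number>>;

// Missing or invalid kinds fall back to "concept", as the README states.
function resolveKind(raw: unknown): PageKind {
  if (typeof raw === "string" && (PAGE_KINDS as readonly string[]).indexOf(raw) !== -1) {
    return raw as PageKind;
  }
  return "concept";
}

// Count distinct [[Target]] or [[Target|label]] links in a page body.
function countWikilinks(body: string): number {
  const re = /\[\[([^\[\]|]+)(?:\|[^\[\]]*)?\]\]/g;
  const targets: Record<string, true> = {};
  let m: RegExpExecArray | null;
  while ((m = re.exec(body)) !== null) {
    targets[m[1].trim()] = true;
  }
  return Object.keys(targets).length;
}

// Return a violation message, or null when the page meets its kind's minimum.
function crossLinkViolation(
  kind: unknown,
  body: string,
  minWikilinks: MinWikilinks,
): string | null {
  const resolved = resolveKind(kind);
  const configured = minWikilinks[resolved];
  const required = configured !== undefined ? configured : 0;
  const found = countWikilinks(body);
  return found < required
    ? `${resolved} page has ${found} wikilinks, needs at least ${required}`
    : null;
}
```

For instance, with a rule of `{ overview: 3 }`, an overview page whose body links only to `[[Model Counting]]` would be flagged, while a page with no `kind` would be checked against whatever minimum is set for `concept`.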
@@ -322,6 +370,17 @@ Karpathy describes an abstract pattern for turning raw data into compiled knowle
 
 ## Roadmap
 
+Shipped in 0.5.0:
+
+- ✅ Multimodal ingest (images, PDFs, transcripts)
+- ✅ Chunked retrieval with reranking and `--debug` output
+- ⚠️ Minimum Node version raised to 24 (was 18)
+
+Shipped in 0.4.0:
+
+- ✅ Claim-level provenance with source ranges
+- ✅ First-class schema layer with typed page kinds (`concept`, `entity`, `comparison`, `overview`)
+
 Shipped in 0.3.0:
 
 - ✅ Candidate review queue (approve compile output before pages are written)
@@ -338,18 +397,24 @@ Shipped in 0.2.0:
 
 Next up:
 
-- Claim-level provenance with source ranges
-- First-class schema layer with typed page kinds (`concept`, `entity`, `comparison`, `overview`)
-- Multimodal ingest (images, PDFs, transcripts)
-- Chunked retrieval with reranking
 - Export bundle (`llms.txt`, JSON, JSON-LD, GraphML, Marp)
 - Session-history adapters (Claude, Codex, Cursor exports)
 
-
+Future ideas (open to discussion):
+
+- Recurring source refresh jobs — re-ingest URLs on a schedule, diff against the prior snapshot, re-compile only what changed
+- Graph export and a lightweight read-only graph browser for the concept network
+- A local read-only web UI for browsing the compiled wiki without Obsidian
+- MCP prompt resources — curated agent prompts (review the wiki, propose new sources, draft a comparison page) shipped as MCP resources
+- Maintenance log + log rotation so long-running watch sessions don't grow unbounded
+
+If you like ambitious problems: **graph export with a browser**, **recurring source refresh**, and **MCP prompt resources** are the meatiest of the futures. Open an issue to claim one or kick off a design discussion.
+
+Explicitly not planned (good ideas, just not for this repo): full static-site generator, desktop or mobile apps, fine-tuning, a formal ontology engine, heavy graph reasoning.
 
 ## Requirements
 
-Node.js >=
+Node.js >= 24, plus provider credentials (for Anthropic: `ANTHROPIC_API_KEY` or `ANTHROPIC_AUTH_TOKEN`).
 
 ## License
 