llm-wiki-compiler 0.3.0 → 0.4.0

package/README.md CHANGED
@@ -103,6 +103,15 @@ export OLLAMA_HOST=http://ollama_host:11434/v1
  export OLLAMA_EMBEDDINGS_HOST=http://ollama_host:11435/v1
  ```
 
+ ### Request timeouts
+
+ The OpenAI SDK defaults to a 10-minute per-request timeout, which can cut off long compile-time completions on slower local models. Override it with these environment variables:
+
+ - `LLMWIKI_REQUEST_TIMEOUT_MS` — provider-agnostic timeout in milliseconds. Applies to both the `openai` and `ollama` backends.
+ - `OLLAMA_TIMEOUT_MS` — Ollama-specific override. Wins over `LLMWIKI_REQUEST_TIMEOUT_MS` when both are set.
+
+ Defaults: 10 minutes for `openai`, 30 minutes for `ollama` (local models commonly need more).
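The precedence above can be expressed as a small resolver. This is a minimal sketch, not the package's code. The `resolveRequestTimeoutMs` name, the `Provider` type, and the choice to ignore blank, non-numeric, or non-positive values are assumptions. The OpenAI Node SDK accepts a `timeout` option in milliseconds on its client constructor, so a value resolved this way could plausibly be passed there.

```typescript
// Hypothetical sketch of the documented precedence; names and the
// "ignore invalid values" rule are assumptions, not llmwiki's code.
type Provider = "openai" | "ollama";

const MINUTE_MS = 60 * 1000;

// Documented defaults: 10 minutes for openai, 30 minutes for ollama.
const DEFAULT_TIMEOUT_MS: Record<Provider, number> = {
  openai: 10 * MINUTE_MS,
  ollama: 30 * MINUTE_MS,
};

// Returns undefined for unset, blank, non-numeric, or non-positive values (assumed behavior).
function parseTimeoutMs(raw: string | undefined): number | undefined {
  if (raw === undefined || raw.trim() === "") return undefined;
  const ms = Number(raw);
  return isFinite(ms) && ms > 0 ? ms : undefined;
}

function resolveRequestTimeoutMs(
  provider: Provider,
  env: Record<string, string | undefined>,
): number {
  // OLLAMA_TIMEOUT_MS only affects the ollama backend and wins when set.
  if (provider === "ollama") {
    const ollamaMs = parseTimeoutMs(env["OLLAMA_TIMEOUT_MS"]);
    if (ollamaMs !== undefined) return ollamaMs;
  }
  // LLMWIKI_REQUEST_TIMEOUT_MS applies to both backends.
  const sharedMs = parseTimeoutMs(env["LLMWIKI_REQUEST_TIMEOUT_MS"]);
  return sharedMs !== undefined ? sharedMs : DEFAULT_TIMEOUT_MS[provider];
}
```

In a Node process you would pass `process.env` as `env`. Under this sketch, `resolveRequestTimeoutMs("ollama", { LLMWIKI_REQUEST_TIMEOUT_MS: "900000" })` yields 900000 because `OLLAMA_TIMEOUT_MS` is unset. Adding `OLLAMA_TIMEOUT_MS: "3600000"` raises the result to 3600000.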
+
  ## Why not just RAG?
 
  RAG retrieves chunks at query time. Every question re-discovers the same relationships from scratch. Nothing accumulates.
@@ -136,6 +145,7 @@ A raw source like a Wikipedia article on knowledge compilation becomes a structu
  ---
  title: Knowledge Compilation
  summary: Techniques for converting knowledge representations into forms that support efficient reasoning.
+ kind: concept
  sources:
  - knowledge-compilation.md
  createdAt: "2026-04-05T12:00:00Z"
@@ -150,7 +160,7 @@ a knowledge base into a target language that supports efficient queries.
  Related concepts: [[Propositional Logic]], [[Model Counting]]
  ```
 
- Pages include source attribution in frontmatter. Paragraphs are annotated with `^[filename.md]` markers pointing back to the source file that contributed the content.
+ Pages include source attribution in frontmatter. Paragraphs are annotated with `^[filename.md]` markers pointing back to the source file that contributed the content; specific claims can use line ranges like `^[filename.md:42-58]` or `^[filename.md#L42-L58]`.
 
  ## Commands
 
@@ -163,6 +173,8 @@ Pages include source attribution in frontmatter. Paragraphs are annotated with `
  | `llmwiki review show <id>` | Print a candidate's title, summary, and body |
  | `llmwiki review approve <id>` | Promote a candidate into `wiki/` and refresh index/MOC/embeddings |
  | `llmwiki review reject <id>` | Archive a candidate without touching `wiki/` |
+ | `llmwiki schema init` | Write a starter `.llmwiki/schema.json` file |
+ | `llmwiki schema show` | Print the resolved schema for the current project |
  | `llmwiki query "question"` | Ask questions against your compiled wiki |
  | `llmwiki query "question" --save` | Answer and save the result as a wiki page |
  | `llmwiki lint` | Check wiki quality (broken links, orphans, empty pages, low confidence, contradictions, etc.) |
@@ -177,6 +189,7 @@ wiki/
  queries/ saved query answers, included in index and retrieval
  index.md auto-generated table of contents
  .llmwiki/
+ schema.json optional page-kind and cross-link policy
  candidates/ pending review candidates from `compile --review`
  candidates/archive/ rejected candidates kept for audit
  ```
@@ -228,6 +241,41 @@ When multiple sources merge into one slug, metadata is reconciled: `min` confide
  - `contradicted-page` — flags pages with non-empty `contradictedBy`
  - `excess-inferred-paragraphs` — flags pages with too many inferred paragraphs without citations
 
+ ## Claim-level provenance
+
+ Paragraph citations continue to use the original source-marker form:
+
+ ```markdown
+ This paragraph is grounded in the source. ^[source.md]
+ ```
+
+ For claims that need tighter verification, pages can pin a statement to a line range in the ingested source:
+
+ ```markdown
+ The system uses a two-phase compile pipeline. ^[architecture-notes.md:42-58]
+ The same range can also use GitHub-style anchors. ^[architecture-notes.md#L42-L58]
+ ```
+
+ `llmwiki lint` validates both forms. It reports missing source files, malformed claim citations, impossible ranges like line `0` or `8-3`, and ranges that extend past the end of the source file.
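The checks listed above can be approximated with two regular expressions and a line count. This is a minimal sketch, not llmwiki's implementation. The `checkCitations` function, the regexes, and the issue messages are hypothetical. It also assumes that sources are supplied as an in-memory record from filename to text rather than read from disk, and that only the documented range suffixes are valid, so anything else after the filename counts as malformed.

```typescript
// Hypothetical sketch of claim-citation linting; the regexes, names, and
// messages are assumptions, not llmwiki's implementation.
interface CitationIssue {
  marker: string;
  problem: string;
}

// A source file name, optionally followed by :START-END or #LSTART-LEND.
// Any other suffix (including a single line such as ":42") is treated as malformed.
const CITATION = /^([^:#]+)(?::(\d+)-(\d+)|#L(\d+)-L(\d+))?$/;

// Lines in a text file; a trailing newline ends the last line rather than adding one.
function countLines(text: string): number {
  if (text === "") return 0;
  const parts = text.split("\n");
  return text.slice(-1) === "\n" ? parts.length - 1 : parts.length;
}

function checkCitations(page: string, sources: Record<string, string>): CitationIssue[] {
  const issues: CitationIssue[] = [];
  const markerRe = /\^\[([^\]]+)\]/g; // every ^[...] marker, well-formed or not
  let match: RegExpExecArray | null;
  while ((match = markerRe.exec(page)) !== null) {
    const marker = match[0];
    const parsed = CITATION.exec(match[1].trim());
    if (parsed === null) {
      issues.push({ marker, problem: "malformed citation" });
      continue;
    }
    const file = parsed[1];
    if (!Object.prototype.hasOwnProperty.call(sources, file)) {
      issues.push({ marker, problem: `missing source file ${file}` });
      continue;
    }
    const startText = parsed[2] !== undefined ? parsed[2] : parsed[4];
    const endText = parsed[3] !== undefined ? parsed[3] : parsed[5];
    if (startText === undefined || endText === undefined) continue; // plain ^[file.md]
    const start = Number(startText);
    const end = Number(endText);
    // Lines are 1-indexed, so line 0 is impossible, as is a backwards range like 8-3.
    if (start < 1 || end < start) {
      issues.push({ marker, problem: `impossible range ${start}-${end}` });
      continue;
    }
    const total = countLines(sources[file]);
    if (end > total) {
      issues.push({ marker, problem: `range ends at line ${end} but ${file} has ${total} lines` });
    }
  }
  return issues;
}
```

Capturing every `^[...]` marker first, and then validating its contents separately, lets a malformed marker be reported instead of silently skipped.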
+
+ ## Schema layer
+
+ Projects can optionally define `.llmwiki/schema.json` to shape the wiki beyond flat concept pages. Existing projects do not need a schema file; missing or invalid `kind` values fall back to `concept`.
+
+ ```bash
+ llmwiki schema init
+ llmwiki schema show
+ ```
+
+ The schema supports four page kinds:
+
+ - `concept` — standalone idea or pattern
+ - `entity` — specific person, product, organization, or named artifact
+ - `comparison` — side-by-side analysis across concepts or entities
+ - `overview` — map page that connects several concepts in a domain
+
+ Schema rules can set per-kind `minWikilinks` and optional `seedPages`. Compile can materialize seed pages such as overviews, lint enforces page-kind-specific cross-link minimums, and review candidates surface schema violations before approval.
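A per-kind cross-link check like the one lint performs might look like the following minimal sketch. It follows the documented fallback of missing or invalid `kind` values to `concept`. The `SchemaRules` shape and the function names are assumptions, as are counting only distinct `[[...]]` targets and skipping kinds that have no `minWikilinks`. The real file layout is whatever `llmwiki schema init` writes and `llmwiki schema show` prints.

```typescript
// Hypothetical sketch: this rule shape is an assumption, not the real schema.json format.
const PAGE_KINDS = ["concept", "entity", "comparison", "overview"] as const;
type PageKind = (typeof PAGE_KINDS)[number];

interface KindRule {
  minWikilinks?: number;
}
type SchemaRules = { [K in PageKind]?: KindRule };

// Documented behavior: a missing or unrecognized kind falls back to "concept".
function resolveKind(raw: unknown): PageKind {
  for (const kind of PAGE_KINDS) {
    if (raw === kind) return kind;
  }
  return "concept";
}

// Counts distinct [[Target]] links; treating repeats as one link is an assumption.
function countWikilinks(body: string): number {
  const seen: { [target: string]: boolean } = {};
  const linkRe = /\[\[([^\]]+)\]\]/g;
  let m: RegExpExecArray | null;
  while ((m = linkRe.exec(body)) !== null) seen[m[1].trim()] = true;
  return Object.keys(seen).length;
}

// Returns a lint message when a page has fewer links than its kind requires, else null.
function lintWikilinks(rawKind: unknown, body: string, rules: SchemaRules): string | null {
  const kind = resolveKind(rawKind);
  const rule = rules[kind];
  if (rule === undefined || rule.minWikilinks === undefined) return null; // no minimum for this kind
  const found = countWikilinks(body);
  return found < rule.minWikilinks
    ? `${kind} page has ${found} wikilinks; schema requires at least ${rule.minWikilinks}`
    : null;
}
```

Under these assumptions, a rule of `{ overview: { minWikilinks: 3 } }` flags an `overview` page that links only `[[A]]` and `[[B]]`. A page whose `kind` is unrecognized is checked against whatever rule exists for `concept`.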
+
  ## Demo
 
  Try it on any article or document:
@@ -322,6 +370,11 @@ Karpathy describes an abstract pattern for turning raw data into compiled knowle
 
  ## Roadmap
 
+ Shipped in 0.4.0:
+
+ - ✅ Claim-level provenance with source ranges
+ - ✅ First-class schema layer with typed page kinds (`concept`, `entity`, `comparison`, `overview`)
+
  Shipped in 0.3.0:
 
  - ✅ Candidate review queue (approve compile output before pages are written)
@@ -338,14 +391,12 @@ Shipped in 0.2.0:
 
  Next up:
 
- - Claim-level provenance with source ranges
- - First-class schema layer with typed page kinds (`concept`, `entity`, `comparison`, `overview`)
  - Multimodal ingest (images, PDFs, transcripts)
  - Chunked retrieval with reranking
  - Export bundle (`llms.txt`, JSON, JSON-LD, GraphML, Marp)
  - Session-history adapters (Claude, Codex, Cursor exports)
 
- If you like ambitious problems: **schema layer + typed page kinds**, **claim-level provenance**, and **chunked retrieval with reranking** are the meatiest. Open an issue to claim one or kick off a design discussion.
+ If you like ambitious problems: **multimodal ingest**, **chunked retrieval with reranking**, and **export bundles** are the meatiest. Open an issue to claim one or kick off a design discussion.
 
  ## Requirements