llm-wiki-compiler 0.6.0 → 0.7.0

package/README.md CHANGED
@@ -25,9 +25,20 @@ export ANTHROPIC_API_KEY=sk-...
  llmwiki ingest https://some-article.com
  llmwiki compile
  llmwiki query "what is X?"
+ llmwiki view --open
  ```
 
- ## Configuration
+
+ <br>
+
+ ---
+
+ <br>
+
+
+ <details>
+ <summary><span style="font-size: 1.4em;"><strong>Configuration — click to expand</strong></span></summary>
+
 
  llmwiki configures providers via environment variables. The default provider is Anthropic.
 
@@ -103,6 +114,35 @@ export OLLAMA_HOST=http://ollama_host:11434/v1
  export OLLAMA_EMBEDDINGS_HOST=http://ollama_host:11435/v1
  ```
 
+ ### GitHub Copilot
+
+ Uses the GitHub Copilot API (`https://api.githubcopilot.com`), an
+ OpenAI-compatible endpoint available to Copilot subscribers. Requires a GitHub
+ OAuth token with the `copilot` scope — **classic PATs are not supported**.
+
+ First, ensure your `gh` CLI token has the required scope:
+
+ ```bash
+ gh auth refresh --scopes copilot
+ ```
+
+ Then run:
+
+ ```bash
+ export LLMWIKI_PROVIDER=copilot
+ export GITHUB_TOKEN=$(gh auth token) # OAuth token required; PATs will not work
+ export LLMWIKI_MODEL=gpt-4o # optional; gpt-4o is the default
+ ```
+
+ Available models (version numbers use dots, not dashes): `gpt-4o`, `gpt-4o-mini`,
+ `claude-sonnet-4.5`, `claude-sonnet-4.6`, `claude-opus-4.5`, `gemini-2.5-pro`,
+ and others — availability depends on your Copilot plan.
+
+ **Embeddings:** The GitHub Copilot API does not expose an embeddings endpoint.
+ Semantic search (used by `llmwiki query` with chunked retrieval) will fall back
+ to full-index selection without embeddings. For embedding-dependent workflows,
+ switch to the `openai` provider and provide `OPENAI_API_KEY`.
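
Because classic PATs are rejected, it can help to check which kind of token is exported before running a compile. The sketch below relies on GitHub's documented token prefixes (`gho_` for OAuth tokens, `ghp_` for classic PATs, `github_pat_` for fine-grained PATs). The `token_kind` helper is hypothetical and not part of llmwiki, and a matching prefix does not prove that the token carries the `copilot` scope.

```bash
# Hypothetical helper (not part of llmwiki): classify a GitHub token by prefix.
token_kind() {
  case "$1" in
    gho_*)        echo "oauth" ;;            # OAuth token, e.g. from `gh auth token`
    ghp_*)        echo "classic-pat" ;;      # classic personal access token
    github_pat_*) echo "fine-grained-pat" ;; # fine-grained personal access token
    "")           echo "missing" ;;
    *)            echo "unknown" ;;
  esac
}

# Warn early instead of waiting for the API to reject the token.
kind=$(token_kind "${GITHUB_TOKEN:-}")
if [ "$kind" != "oauth" ]; then
  echo "GITHUB_TOKEN looks like '$kind'; the copilot provider expects an OAuth token" >&2
fi
```

If the warning fires, re-running `gh auth refresh --scopes copilot` and re-exporting `GITHUB_TOKEN=$(gh auth token)` is the path the README describes.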
+
  ### Request timeouts
 
  The OpenAI SDK defaults to a 10-minute per-request timeout, which can cut off long compile-time completions on slower local models. Override per provider:
@@ -129,6 +169,16 @@ When many sources contribute to the same compiled concept, `compile` enforces a
 
  A truncation warning prints to stderr when the cap fires so you know which concept hit the budget.
 
+ </details>
+
+
+ <br>
+
+ ---
+
+ <br>
+
+
  ## Why not just RAG?
 
  RAG retrieves chunks at query time. Every question re-discovers the same relationships from scratch. Nothing accumulates.
@@ -179,6 +229,18 @@ Related concepts: [[Propositional Logic]], [[Model Counting]]
 
  Pages include source attribution in frontmatter. Paragraphs are annotated with `^[filename.md]` markers pointing back to the source file that contributed the content; specific claims can use line ranges like `^[filename.md:42-58]` or `^[filename.md#L42-L58]`.
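
The three marker forms above can be pulled out of a page with a regular expression. This is a minimal sketch, assuming a marker never contains a `]` character; the `extract_citations` helper and the sample text are illustrative and not part of llmwiki.

```bash
# Hypothetical helper (not part of llmwiki): print each ^[...] marker read
# from stdin, one per line. Assumes a marker never contains a closing bracket.
extract_citations() {
  grep -oE '\^\[[^]]+]' || true   # `|| true`: finding no markers is not an error
}

sample='First claim. ^[notes.md] Second claim. ^[paper.md:42-58] Third. ^[paper.md#L42-L58]'
printf '%s\n' "$sample" | extract_citations
# prints:
# ^[notes.md]
# ^[paper.md:42-58]
# ^[paper.md#L42-L58]
```

To scan a compiled page, redirect the file into the function, for example `extract_citations < path/to/page.md`.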
 
+
+ <br>
+
+ ---
+
+ <br>
+
+
+ <details>
+ <summary><span style="font-size: 1.4em;"><strong>CLI and wiki model — click to expand</strong></span></summary>
+
+
  ## Commands
 
  | Command | What it does |
@@ -197,6 +259,7 @@ Pages include source attribution in frontmatter. Paragraphs are annotated with `
  | `llmwiki query "question"` | Ask questions against your compiled wiki |
  | `llmwiki query "question" --save` | Answer and save the result as a wiki page |
  | `llmwiki export [--target <name>]` | Export the wiki to portable formats — `llms.txt`, `llms-full.txt`, JSON, JSON-LD, GraphML, Marp slides |
+ | `llmwiki view [--open]` | Start a read-only local web viewer for browsing, searching, and inspecting the compiled wiki |
  | `llmwiki lint` | Check wiki quality (broken links, orphans, empty pages, low confidence, contradictions, etc.) |
  | `llmwiki watch` | Auto-recompile when `sources/` changes |
  | `llmwiki serve [--root <dir>]` | Start an MCP server exposing wiki tools to AI agents |
@@ -216,6 +279,17 @@ wiki/
 
  Obsidian-compatible. `[[wikilinks]]` resolve to concept titles.
 
+ ## Local web viewer
+
+ Run `llmwiki view` from a project root to browse the compiled wiki in a local browser without Obsidian. The viewer is read-only: it renders `wiki/`, exposes sidebar navigation, search, page metadata, health counts, and provenance/citation chips, but does not mutate sources or generated pages.
+
+ ```bash
+ llmwiki view # prints Viewer ready at http://127.0.0.1:<port>
+ llmwiki view --open # also opens the URL in your default browser
+ ```
+
+ The server is private by default. It binds to `127.0.0.1` unless you explicitly provide both `--host <host>` and `--allow-lan`; wildcard hosts are rejected. Viewer responses use a strict local-asset CSP and path-confinement checks so the UI can safely render local markdown content.
+
  ## Review queue
 
  By default, `compile` writes pages directly to `wiki/`. Add `--review` to write candidate JSON records to `.llmwiki/candidates/` instead, so you can inspect each generated page before it lands.
@@ -295,6 +369,16 @@ The schema supports four page kinds:
 
  Schema rules can set per-kind `minWikilinks` and optional `seedPages`. Compile can materialize seed pages such as overviews, lint enforces page-kind-specific cross-link minimums, and review candidates surface schema violations before approval.
 
+ </details>
+
+
+ <br>
+
+ ---
+
+ <br>
+
+
  ## Demo
 
  Try it on any article or document:
@@ -304,10 +388,23 @@ mkdir my-wiki && cd my-wiki
  llmwiki ingest https://en.wikipedia.org/wiki/Andrej_Karpathy
  llmwiki compile
  llmwiki query "What terms did Andrej coin?"
+ llmwiki view --open
  ```
 
  See `examples/basic/` in the repo for pre-generated output you can browse without an API key.
 
+
+ <br>
+
+ ---
+
+ <br>
+
+
+ <details>
+ <summary><span style="font-size: 1.4em;"><strong>MCP Server — click to expand</strong></span></summary>
+
+
  ## MCP Server
 
  llmwiki ships an MCP (Model Context Protocol) server so AI agents (Claude Desktop, Cursor, Claude Code, etc.) can drive the full pipeline directly: ingest sources, compile, query, search, lint, and read pages — without scraping CLI output.
@@ -364,6 +461,16 @@ Tools that need an LLM (`compile_wiki`, `query_wiki`, `search_pages`) check for
  | `llmwiki://sources` | List of ingested source files with metadata. |
  | `llmwiki://state` | Compilation state (per-source hashes, last compile times). |
 
+ </details>
+
+
+ <br>
+
+ ---
+
+ <br>
+
+
  ## Limitations
 
  Early software. Best for small, high-signal corpora (a few dozen sources). Query routing is index-based.
@@ -389,6 +496,12 @@ Karpathy describes an abstract pattern for turning raw data into compiled knowle
 
  ## Roadmap
 
+ Shipped in 0.7.0:
+
+ - ✅ Read-only local web viewer — `llmwiki view` with sidebar navigation, markdown rendering, search, metadata, health counts, and provenance/citation chips
+ - ✅ GitHub Copilot provider — `LLMWIKI_PROVIDER=copilot` with `GITHUB_TOKEN=$(gh auth token)` for Copilot chat/tool calls
+ - ✅ Cached lint health summary — `llmwiki lint` writes `.llmwiki/last-lint.json` so viewer health can show the latest lint counts without re-running lint
+
  Shipped in 0.6.0:
 
  - ✅ Export bundle (`llms.txt`, JSON, JSON-LD, GraphML, Marp slides)
@@ -421,17 +534,24 @@ Shipped in 0.2.0:
  - ✅ Deeper Obsidian integration (tags, aliases, Map of Content)
  - ✅ MCP server for agent integration
 
- Future ideas (open to discussion):
+ Next up:
+
+ - **Graph/context layer** — page-neighborhood tools, graph paths, gap detection, and token-budgeted context packs for agents.
+ - **Evaluation harness** — benchmark answer quality, citation accuracy, update drift, retrieval recall, and scale curves against serious retrieval baselines.
+ - **Task and decision ledger** — turn session ingest into durable agent memory: goals, decisions, open questions, outcomes, and next-agent handoffs.
+ - **Rollback, audit, and source lifecycle** — undo/reverse ingest, compile diff reports, stale-claim checks, freshness reports, and a durable operation log.
+ - **Domain templates** — schema/prompt packs for research, codebase docs, team handbooks, decision logs, and standards/regulations.
+
+ Later / open to discussion:
 
  - Recurring source refresh jobs — re-ingest URLs on a schedule, diff against the prior snapshot, re-compile only what changed
- - Graph export and a lightweight read-only graph browser for the concept network
- - A local read-only web UI for browsing the compiled wiki without Obsidian
- - MCP prompt resources: curated agent prompts (review the wiki, propose new sources, draft a comparison page) shipped as MCP resources
- - Maintenance log + log rotation so long-running watch sessions don't grow unbounded
+ - MCP prompt resources: curated agent prompts such as "review the wiki", "propose new sources", and "draft a comparison page"
+ - Codex OAuth provider: ChatGPT subscription auth as a dedicated provider, with clear token refresh and embedding-limit behavior
+ - Team-chat connectors for Slack/Discord/Teams-style institutional memory
 
- If you like ambitious problems: **graph export with a browser**, **recurring source refresh**, and **MCP prompt resources** are the meatiest of the futures. Open an issue to claim one or kick off a design discussion.
+ If you like ambitious problems: **local web UI**, **graph/context packs**, and **eval harness** are the meatiest next contributions. Open an issue to claim one or kick off a design discussion.
 
- Explicitly not planned (good ideas, just not for this repo): full static-site generator, desktop or mobile apps, fine-tuning, a formal ontology engine, heavy graph reasoning.
+ Explicitly not planned (good ideas, just not for this repo): full static-site generator, desktop or mobile apps, fine-tuning, a formal ontology engine, heavy graph database infrastructure.
 
  ## Requirements