llm-wiki-compiler 0.1.0 → 0.2.0

package/README.md CHANGED
@@ -4,6 +4,8 @@ Compile raw sources into an interlinked markdown wiki.
 
  Inspired by Karpathy's [LLM Wiki](https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f) pattern: instead of re-discovering knowledge at query time, compile it once into a persistent, browsable artifact that compounds over time.

+ ![llmwiki demo](docs/images/demo.gif)
+
  ## Who this is for

  - **AI researchers and engineers** building persistent knowledge from papers, docs, and notes
@@ -15,12 +17,55 @@ Inspired by Karpathy's [LLM Wiki](https://gist.github.com/karpathy/442a6bf555914
  ```bash
  npm install -g llm-wiki-compiler
  export ANTHROPIC_API_KEY=sk-...
+ # Or use ANTHROPIC_AUTH_TOKEN if your Anthropic-compatible gateway expects it.
+ # Or use a different provider:
+ # export LLMWIKI_PROVIDER=openai
+ # export OPENAI_API_KEY=sk-...

  llmwiki ingest https://some-article.com
  llmwiki compile
  llmwiki query "what is X?"
  ```

+ ## Configuration
+
+ llmwiki configures providers via environment variables. The default provider is Anthropic.
+
+ Configuration precedence for Anthropic values:
+
+ 1. Shell env / local `.env`
+ 2. Claude Code settings fallback (`~/.claude/settings.json` → `env` block)
+ 3. Built-in provider defaults (where applicable)
+
+ - `LLMWIKI_PROVIDER`: The provider to use (e.g., anthropic, openai).
+ - `LLMWIKI_MODEL`: The model name to override the provider default.
+
+ ### Anthropic (Default)
+
+ - `ANTHROPIC_API_KEY` or `ANTHROPIC_AUTH_TOKEN`: Required. Either one can satisfy Anthropic authentication.
+ - `ANTHROPIC_BASE_URL`: Optional. Custom endpoint for proxies. Valid HTTP(S) URLs are accepted, including Claude-style path endpoints such as `https://api.kimi.com/coding/`.
+
+ Example using an Anthropic or cc-switch custom proxy:
+
+ ```bash
+ export LLMWIKI_PROVIDER=anthropic
+ export ANTHROPIC_API_KEY=sk-...
+ export ANTHROPIC_BASE_URL=https://proxy.example.com
+ ```
+
+ If those values are not set in shell env or `.env`, llmwiki will try Anthropic-compatible values from `~/.claude/settings.json` (`env` block) for:
+
+ - `ANTHROPIC_API_KEY`
+ - `ANTHROPIC_AUTH_TOKEN`
+ - `ANTHROPIC_BASE_URL`
+ - `ANTHROPIC_MODEL`
+
+ Example with zero exports (Claude Code already configured):
+
+ ```bash
+ llmwiki compile
+ ```
+
  ## Why not just RAG?

  RAG retrieves chunks at query time. Every question re-discovers the same relationships from scratch. Nothing accumulates.
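The Configuration section added in 0.2.0 describes a lookup order for Anthropic values: shell env or `.env` first, then the `env` block of `~/.claude/settings.json`, then built-in defaults, with either `ANTHROPIC_API_KEY` or `ANTHROPIC_AUTH_TOKEN` accepted as the credential. The TypeScript sketch below shows one way that order could be resolved. It is not llmwiki's code: the function name `resolveAnthropicConfig`, the assumption that `.env` has already been merged into the environment object, and the default base URL are all hypothetical.

```typescript
// Hypothetical sketch of the documented lookup order for Anthropic values:
// 1. shell env / .env, 2. ~/.claude/settings.json "env" block, 3. built-in defaults.
type EnvLike = Record<string, string | undefined>;

interface ClaudeSettings {
  env?: Record<string, string | undefined>;
}

const ANTHROPIC_KEYS = [
  "ANTHROPIC_API_KEY",
  "ANTHROPIC_AUTH_TOKEN",
  "ANTHROPIC_BASE_URL",
  "ANTHROPIC_MODEL",
] as const;

type AnthropicKey = (typeof ANTHROPIC_KEYS)[number];
type AnthropicConfig = Partial<Record<AnthropicKey, string>>;

// Assumed built-in default; the README only says "where applicable".
const DEFAULTS: AnthropicConfig = {
  ANTHROPIC_BASE_URL: "https://api.anthropic.com",
};

function resolveAnthropicConfig(
  env: EnvLike, // e.g. process.env after a .env loader has run (assumption)
  settings: ClaudeSettings, // parsed ~/.claude/settings.json, or {} if absent
): AnthropicConfig {
  const resolved: AnthropicConfig = {};
  for (const key of ANTHROPIC_KEYS) {
    // Each key falls back independently; empty strings are treated as unset.
    const value = env[key] || settings.env?.[key] || DEFAULTS[key];
    if (value) resolved[key] = value;
  }
  // Either credential satisfies Anthropic authentication.
  if (!resolved.ANTHROPIC_API_KEY && !resolved.ANTHROPIC_AUTH_TOKEN) {
    throw new Error("Set ANTHROPIC_API_KEY or ANTHROPIC_AUTH_TOKEN");
  }
  return resolved;
}
```

Because each key falls back on its own here, a base URL exported in the shell can combine with a token that exists only in Claude Code's settings. The diff does not say whether llmwiki merges values in exactly this way.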
@@ -68,7 +113,7 @@ a knowledge base into a target language that supports efficient queries.
  Related concepts: [[Propositional Logic]], [[Model Counting]]
  ```

- Pages include source attribution in frontmatter. Provenance is page-level today, not claim-level.
+ Pages include source attribution in frontmatter. Paragraphs are annotated with `^[filename.md]` markers pointing back to the source file that contributed the content.

  ## Commands
 
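The updated provenance line says paragraphs carry `^[filename.md]` markers naming the source file that contributed them. A minimal TypeScript sketch of reading those markers back out of a page body follows. It assumes frontmatter has already been stripped, that blank lines separate paragraphs, and that filenames contain no `]`; `paragraphSources` is a hypothetical helper, not part of llmwiki.

```typescript
// Hypothetical helper: list the source files cited in each paragraph of a
// compiled page body via ^[filename.md] markers (frontmatter already stripped).
const SOURCE_MARKER = /\^\[[^\]]+\]/g;

interface ParagraphProvenance {
  paragraph: number; // 0-based index among non-empty paragraphs
  sources: string[]; // unique filenames, in order of first appearance
}

function paragraphSources(body: string): ParagraphProvenance[] {
  return body
    .split(/\n\s*\n/) // assumption: blank lines separate paragraphs
    .map((p) => p.trim())
    .filter((p) => p.length > 0)
    .map((p, i) => {
      // With the /g flag, match() returns every full "^[...]" occurrence.
      const markers = p.match(SOURCE_MARKER) ?? [];
      // Strip the leading "^[" and trailing "]" to get the filename.
      const names = markers.map((m) => m.slice(2, -1).trim());
      return { paragraph: i, sources: Array.from(new Set(names)) };
    });
}
```

Paragraphs without markers come back with an empty `sources` array, which makes unattributed text easy to spot.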
@@ -78,7 +123,9 @@ Pages include source attribution in frontmatter. Provenance is page-level today,
  | `llmwiki compile` | Incremental compile: extract concepts, generate wiki pages |
  | `llmwiki query "question"` | Ask questions against your compiled wiki |
  | `llmwiki query "question" --save` | Answer and save the result as a wiki page |
+ | `llmwiki lint` | Check wiki quality (broken links, orphans, empty pages, etc.) |
  | `llmwiki watch` | Auto-recompile when `sources/` changes |
+ | `llmwiki serve [--root <dir>]` | Start an MCP server exposing wiki tools to AI agents |

  ## Output
 
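`llmwiki lint` is listed as checking broken links, orphans, and empty pages, among other things. The sketch below illustrates those three checks over an in-memory map of pages, using the `[[Title]]` wikilink form from the README's sample page (`Related concepts: [[Propositional Logic]], [[Model Counting]]`). It assumes pages are keyed by title and treats an orphan as a page that no other page links to; llmwiki's actual rules, slug mapping, and diagnostic format are not described in this diff, and `lintWiki` is hypothetical.

```typescript
// Hypothetical lint pass over an in-memory wiki keyed by page title.
type Wiki = Map<string, string>; // title -> markdown body

interface LintReport {
  brokenLinks: { from: string; target: string }[];
  orphans: string[]; // pages that no other page links to
  emptyPages: string[];
}

// Matches [[Target]] and Obsidian-style [[Target|alias]]; group 1 is the target.
const WIKILINK = /\[\[([^\]|]+)(?:\|[^\]]*)?\]\]/g;

function lintWiki(pages: Wiki): LintReport {
  const report: LintReport = { brokenLinks: [], orphans: [], emptyPages: [] };
  const inbound = new Map<string, number>();
  pages.forEach((_body, title) => inbound.set(title, 0));

  pages.forEach((body, from) => {
    if (body.trim() === "") report.emptyPages.push(from);
    const re = new RegExp(WIKILINK.source, "g"); // fresh lastIndex per page
    let m: RegExpExecArray | null;
    while ((m = re.exec(body)) !== null) {
      const target = m[1].trim();
      if (!pages.has(target)) {
        report.brokenLinks.push({ from, target });
      } else if (target !== from) {
        // Self-links do not rescue a page from being an orphan.
        inbound.set(target, (inbound.get(target) ?? 0) + 1);
      }
    }
  });

  inbound.forEach((count, title) => {
    if (count === 0) report.orphans.push(title);
  });
  return report;
}
```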
@@ -97,16 +144,72 @@ Try it on any article or document:
 
  ```bash
  mkdir my-wiki && cd my-wiki
- llmwiki ingest https://en.wikipedia.org/wiki/Knowledge_compilation
+ llmwiki ingest https://en.wikipedia.org/wiki/Andrej_Karpathy
  llmwiki compile
- llmwiki query "how does knowledge compilation work?"
+ llmwiki query "What terms did Andrej coin?"
  ```

  See `examples/basic/` in the repo for pre-generated output you can browse without an API key.

+ ## MCP Server
+
+ llmwiki ships an MCP (Model Context Protocol) server so AI agents (Claude Desktop, Cursor, Claude Code, etc.) can drive the full pipeline directly: ingest sources, compile, query, search, lint, and read pages — without scraping CLI output.
+
+ Where [llm-wiki-kit](https://github.com/iamsashank09/llm-wiki-kit) gives agents raw CRUD against wiki pages, llmwiki exposes the **automated pipelines**: agents get intelligent compilation, incremental change detection, and semantic query routing built in.
+
+ ### Setup
+
+ Start the server (stdio transport, no API key required at startup):
+
+ ```bash
+ llmwiki serve --root /path/to/your/wiki-project
+ ```
+
+ ### Claude Desktop / Cursor configuration
+
+ Add to your client's MCP config (e.g. `claude_desktop_config.json`):
+
+ ```json
+ {
+   "mcpServers": {
+     "llmwiki": {
+       "command": "npx",
+       "args": ["llm-wiki-compiler", "serve", "--root", "/path/to/wiki-project"],
+       "env": {
+         "ANTHROPIC_API_KEY": "sk-ant-..."
+       }
+     }
+   }
+ }
+ ```
+
+ Tools that need an LLM (`compile_wiki`, `query_wiki`, `search_pages`) check for a configured provider on each call. Read-only tools (`read_page`, `lint_wiki`, `wiki_status`) and `ingest_source` work without any credentials.
+
+ ### Tools
+
+ | Tool | What it does |
+ |------|--------------|
+ | `ingest_source` | Fetch a URL or local file into `sources/`. |
+ | `compile_wiki` | Run the incremental compile pipeline; returns counts, slugs, errors. |
+ | `query_wiki` | Two-step grounded answer with optional `--save`. |
+ | `search_pages` | Return full content of pages relevant to a question. |
+ | `read_page` | Read a single page by slug (concepts/ then queries/). |
+ | `lint_wiki` | Run quality checks; returns structured diagnostics. |
+ | `wiki_status` | Page count, source count, orphans, pending changes (read-only). |
+
+ ### Resources
+
+ | URI | Returns |
+ |-----|---------|
+ | `llmwiki://index` | Full `wiki/index.md` content. |
+ | `llmwiki://concept/{slug}` | A single concept page (frontmatter + body). |
+ | `llmwiki://query/{slug}` | A single saved query page. |
+ | `llmwiki://sources` | List of ingested source files with metadata. |
+ | `llmwiki://state` | Compilation state (per-source hashes, last compile times). |
+
  ## Limitations

- Early software. Best for small, high-signal corpora (a few dozen sources). Query routing is index-based. Provenance is page-level, not claim-level. Anthropic-only for now.
+ Early software. Best for small, high-signal corpora (a few dozen sources). Query routing is index-based.

  **Honest about truncation.** Sources that exceed the character limit are truncated on ingest with `truncated: true` and the original character count recorded in frontmatter, so downstream consumers know they're working with partial content.
 
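The MCP tool table describes `compile_wiki` as incremental and `wiki_status` as reporting pending changes, while `llmwiki://state` exposes per-source hashes and last compile times. One common way to get that behavior is to compare each source's current content hash with the hash recorded at the last compile. The TypeScript sketch below does that under stated assumptions: SHA-256 over the file text, a state object keyed by source filename, and the hypothetical names `CompileState` and `pendingChanges`. The real state layout is not shown in this diff.

```typescript
import { createHash } from "node:crypto";

// Hypothetical compile state: source filename -> hash of the content last compiled.
type CompileState = Record<string, { hash: string; compiledAt: string }>;

interface PendingChanges {
  added: string[]; // present in sources/ but never compiled
  changed: string[]; // content hash differs from the recorded one
  removed: string[]; // recorded in state but no longer in sources/
}

// Assumption: SHA-256 over UTF-8 text; the diff does not name the hash function.
function sha256(text: string): string {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

// Own-property check, so names like "constructor" are not mistaken for entries.
const has = (obj: object, key: string): boolean =>
  Object.prototype.hasOwnProperty.call(obj, key);

function pendingChanges(
  sources: Record<string, string>, // filename -> current file contents
  state: CompileState,
): PendingChanges {
  const pending: PendingChanges = { added: [], changed: [], removed: [] };
  for (const name of Object.keys(sources)) {
    if (!has(state, name)) pending.added.push(name);
    else if (state[name].hash !== sha256(sources[name])) pending.changed.push(name);
  }
  for (const name of Object.keys(state)) {
    if (!has(sources, name)) pending.removed.push(name);
  }
  return pending;
}
```

Reporting `removed` separately keeps deleted sources visible instead of silently dropping them from the status.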
@@ -121,24 +224,29 @@ Karpathy describes an abstract pattern for turning raw data into compiled knowle
  | Q&A | `llmwiki query` | Implemented |
  | Output filing (save answers back) | `llmwiki query --save` | Implemented |
  | Auto-recompile | `llmwiki watch` | Implemented |
- | Linting / health-check pass | | Not yet implemented (`watch` is auto-recompile, not lint) |
+ | Linting / health-check pass | `llmwiki lint` | Implemented |
+ | Agent integration | `llmwiki serve` (MCP server) | Implemented |
  | Image support | — | Not yet implemented |
  | Marp slides | — | Not yet implemented |
  | Fine-tuning | — | Not yet implemented |

  ## Roadmap

- - Multi-provider support (OpenAI, local models)
- - Better provenance (claim-level source attribution)
- - Larger-corpus query strategy (semantic search, embeddings)
- - Deeper Obsidian integration
- - Linting pass for wiki quality checks
+ - Better provenance (paragraph-level source attribution)
+ - Linting pass for wiki quality checks
+ - ✅ Multi-provider support (OpenAI, Ollama, MiniMax)
+ - Larger-corpus query strategy (semantic search, embeddings)
+ - Deeper Obsidian integration (tags, aliases, Map of Content)
+ - ✅ MCP server for agent integration
+ - Image support
+ - Marp slides
+ - Fine-tuning

  If you want to contribute, these are the highest-leverage areas right now. Issues and PRs are welcome.

  ## Requirements

- Node.js >= 18, an Anthropic API key.
+ Node.js >= 18, plus provider credentials (for Anthropic: `ANTHROPIC_API_KEY` or `ANTHROPIC_AUTH_TOKEN`).

  ## License