llm-wiki-compiler 0.1.0 → 0.2.0
- package/README.md +119 -11
- package/dist/cli.js +1610 -248
- package/dist/cli.js.map +1 -1
- package/package.json +5 -2
package/README.md
CHANGED
@@ -4,6 +4,8 @@ Compile raw sources into an interlinked markdown wiki.
 
 Inspired by Karpathy's [LLM Wiki](https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f) pattern: instead of re-discovering knowledge at query time, compile it once into a persistent, browsable artifact that compounds over time.
 
+
+
 ## Who this is for
 
 - **AI researchers and engineers** building persistent knowledge from papers, docs, and notes
@@ -15,12 +17,55 @@ Inspired by Karpathy's [LLM Wiki](https://gist.github.com/karpathy/442a6bf555914
 ```bash
 npm install -g llm-wiki-compiler
 export ANTHROPIC_API_KEY=sk-...
+# Or use ANTHROPIC_AUTH_TOKEN if your Anthropic-compatible gateway expects it.
+# Or use a different provider:
+# export LLMWIKI_PROVIDER=openai
+# export OPENAI_API_KEY=sk-...
 
 llmwiki ingest https://some-article.com
 llmwiki compile
 llmwiki query "what is X?"
 ```
 
+## Configuration
+
+llmwiki configures providers via environment variables. The default provider is Anthropic.
+
+Configuration precedence for Anthropic values:
+
+1. Shell env / local `.env`
+2. Claude Code settings fallback (`~/.claude/settings.json` → `env` block)
+3. Built-in provider defaults (where applicable)
+
+- `LLMWIKI_PROVIDER`: The provider to use (e.g., anthropic, openai).
+- `LLMWIKI_MODEL`: The model name to override the provider default.
+
+### Anthropic (Default)
+
+- `ANTHROPIC_API_KEY` or `ANTHROPIC_AUTH_TOKEN`: Required. Either one can satisfy Anthropic authentication.
+- `ANTHROPIC_BASE_URL`: Optional. Custom endpoint for proxies. Valid HTTP(S) URLs are accepted, including Claude-style path endpoints such as `https://api.kimi.com/coding/`.
+
+Example using an Anthropic or cc-switch custom proxy:
+
+```bash
+export LLMWIKI_PROVIDER=anthropic
+export ANTHROPIC_API_KEY=sk-...
+export ANTHROPIC_BASE_URL=https://proxy.example.com
+```
+
+If those values are not set in shell env or `.env`, llmwiki will try Anthropic-compatible values from `~/.claude/settings.json` (`env` block) for:
+
+- `ANTHROPIC_API_KEY`
+- `ANTHROPIC_AUTH_TOKEN`
+- `ANTHROPIC_BASE_URL`
+- `ANTHROPIC_MODEL`
+
+Example with zero exports (Claude Code already configured):
+
+```bash
+llmwiki compile
+```
+
 ## Why not just RAG?
 
 RAG retrieves chunks at query time. Every question re-discovers the same relationships from scratch. Nothing accumulates.
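The configuration precedence added in the hunk above (shell env or local `.env`, then the `env` block of `~/.claude/settings.json`, then built-in defaults) can be pictured as a small pure resolver. This is a minimal sketch, not llmwiki's code: `resolveAnthropicValues` and `hasAnthropicCredentials` are hypothetical names, and the sketch assumes each key is resolved independently and that blank strings count as unset.

```typescript
// Hypothetical sketch of the documented precedence for Anthropic values.
// Not llmwiki's implementation; names are illustrative only.

type EnvMap = Record<string, string | undefined>;

// Keys the README lists as eligible for the Claude Code settings fallback.
const FALLBACK_KEYS = [
  "ANTHROPIC_API_KEY",
  "ANTHROPIC_AUTH_TOKEN",
  "ANTHROPIC_BASE_URL",
  "ANTHROPIC_MODEL",
] as const;

type FallbackKey = (typeof FALLBACK_KEYS)[number];
type ResolvedValues = Partial<Record<FallbackKey, string>>;

// Return the first value that is defined and not blank.
function firstNonEmpty(...values: Array<string | undefined>): string | undefined {
  for (const value of values) {
    if (value !== undefined && value.trim() !== "") return value;
  }
  return undefined;
}

// Assumption: each key is resolved on its own, so a shell API key can be
// combined with a base URL that only exists in ~/.claude/settings.json.
function resolveAnthropicValues(
  shellAndDotenv: EnvMap,    // process.env after a local .env has been loaded
  claudeSettingsEnv: EnvMap, // the "env" block of ~/.claude/settings.json
  builtInDefaults: EnvMap = {},
): ResolvedValues {
  const resolved: ResolvedValues = {};
  for (const key of FALLBACK_KEYS) {
    const value = firstNonEmpty(shellAndDotenv[key], claudeSettingsEnv[key], builtInDefaults[key]);
    if (value !== undefined) resolved[key] = value;
  }
  return resolved;
}

// Either credential satisfies Anthropic authentication.
function hasAnthropicCredentials(values: ResolvedValues): boolean {
  return values.ANTHROPIC_API_KEY !== undefined || values.ANTHROPIC_AUTH_TOKEN !== undefined;
}

// Usage: the shell key wins; the proxy URL comes from the settings fallback.
const example = resolveAnthropicValues(
  { ANTHROPIC_API_KEY: "sk-shell" },
  { ANTHROPIC_API_KEY: "sk-settings", ANTHROPIC_BASE_URL: "https://proxy.example.com" },
);
console.log(example.ANTHROPIC_API_KEY, example.ANTHROPIC_BASE_URL, hasAnthropicCredentials(example));
// prints: sk-shell https://proxy.example.com true
```

Resolving per key rather than per source is a guess. The README only states the order in which sources are consulted.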
@@ -68,7 +113,7 @@ a knowledge base into a target language that supports efficient queries.
 Related concepts: [[Propositional Logic]], [[Model Counting]]
 ```
 
-Pages include source attribution in frontmatter.
+Pages include source attribution in frontmatter. Paragraphs are annotated with `^[filename.md]` markers pointing back to the source file that contributed the content.
 
 ## Commands
 
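The `^[filename.md]` paragraph markers introduced in the hunk above are regular enough to read back with a regular expression, for example to list which sources each paragraph of a compiled page draws on. The sketch below is hypothetical (`extractProvenance` is not part of llmwiki) and assumes paragraphs are separated by blank lines and that each marker is `^[` plus a filename ending in `.md` plus `]`, with no `]` inside the filename.

```typescript
// Hypothetical reader for paragraph-level provenance markers like ^[paper.md].
// Not part of llmwiki; it only parses the marker syntax the README describes.

interface AttributedParagraph {
  text: string;      // paragraph text with the markers removed
  sources: string[]; // filenames cited by markers, in order of appearance
}

// Assumption: a marker is "^[" + a filename ending in ".md" + "]".
const MARKER_PATTERN = "\\^\\[([^\\]]+?\\.md)\\]";

function extractProvenance(markdownBody: string): AttributedParagraph[] {
  const paragraphs = markdownBody
    .split(/\n\s*\n/) // blank lines separate paragraphs
    .map((p) => p.trim())
    .filter((p) => p.length > 0);

  return paragraphs.map((paragraph) => {
    const sources: string[] = [];
    // A fresh global regex per paragraph, because exec() keeps lastIndex state.
    const finder = new RegExp(MARKER_PATTERN, "g");
    let match: RegExpExecArray | null;
    while ((match = finder.exec(paragraph)) !== null) {
      sources.push(match[1]);
    }
    // Drop each marker together with the whitespace in front of it.
    const text = paragraph.replace(new RegExp("\\s*" + MARKER_PATTERN, "g"), "").trim();
    return { text, sources };
  });
}

// Usage with a made-up page body.
const page = "First paragraph. ^[paper.md]\n\nSecond paragraph. ^[paper.md] ^[notes.md]";
for (const { text, sources } of extractProvenance(page)) {
  console.log(`${sources.join(", ")}: ${text}`);
}
// prints:
// paper.md: First paragraph.
// paper.md, notes.md: Second paragraph.
```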
@@ -78,7 +123,9 @@ Pages include source attribution in frontmatter. Provenance is page-level today,
 | `llmwiki compile` | Incremental compile: extract concepts, generate wiki pages |
 | `llmwiki query "question"` | Ask questions against your compiled wiki |
 | `llmwiki query "question" --save` | Answer and save the result as a wiki page |
+| `llmwiki lint` | Check wiki quality (broken links, orphans, empty pages, etc.) |
 | `llmwiki watch` | Auto-recompile when `sources/` changes |
+| `llmwiki serve [--root <dir>]` | Start an MCP server exposing wiki tools to AI agents |
 
 ## Output
 
@@ -97,16 +144,72 @@ Try it on any article or document:
 
 ```bash
 mkdir my-wiki && cd my-wiki
-llmwiki ingest https://en.wikipedia.org/wiki/
+llmwiki ingest https://en.wikipedia.org/wiki/Andrej_Karpathy
 llmwiki compile
-llmwiki query "
+llmwiki query "What terms did Andrej coin?"
 ```
 
 See `examples/basic/` in the repo for pre-generated output you can browse without an API key.
 
+## MCP Server
+
+llmwiki ships an MCP (Model Context Protocol) server so AI agents (Claude Desktop, Cursor, Claude Code, etc.) can drive the full pipeline directly: ingest sources, compile, query, search, lint, and read pages — without scraping CLI output.
+
+Where [llm-wiki-kit](https://github.com/iamsashank09/llm-wiki-kit) gives agents raw CRUD against wiki pages, llmwiki exposes the **automated pipelines**: agents get intelligent compilation, incremental change detection, and semantic query routing built in.
+
+### Setup
+
+Start the server (stdio transport, no API key required at startup):
+
+```bash
+llmwiki serve --root /path/to/your/wiki-project
+```
+
+### Claude Desktop / Cursor configuration
+
+Add to your client's MCP config (e.g. `claude_desktop_config.json`):
+
+```json
+{
+  "mcpServers": {
+    "llmwiki": {
+      "command": "npx",
+      "args": ["llm-wiki-compiler", "serve", "--root", "/path/to/wiki-project"],
+      "env": {
+        "ANTHROPIC_API_KEY": "sk-ant-..."
+      }
+    }
+  }
+}
+```
+
+Tools that need an LLM (`compile_wiki`, `query_wiki`, `search_pages`) check for a configured provider on each call. Read-only tools (`read_page`, `lint_wiki`, `wiki_status`) and `ingest_source` work without any credentials.
+
+### Tools
+
+| Tool | What it does |
+|------|--------------|
+| `ingest_source` | Fetch a URL or local file into `sources/`. |
+| `compile_wiki` | Run the incremental compile pipeline; returns counts, slugs, errors. |
+| `query_wiki` | Two-step grounded answer with optional `--save`. |
+| `search_pages` | Return full content of pages relevant to a question. |
+| `read_page` | Read a single page by slug (concepts/ then queries/). |
+| `lint_wiki` | Run quality checks; returns structured diagnostics. |
+| `wiki_status` | Page count, source count, orphans, pending changes (read-only). |
+
+### Resources
+
+| URI | Returns |
+|-----|---------|
+| `llmwiki://index` | Full `wiki/index.md` content. |
+| `llmwiki://concept/{slug}` | A single concept page (frontmatter + body). |
+| `llmwiki://query/{slug}` | A single saved query page. |
+| `llmwiki://sources` | List of ingested source files with metadata. |
+| `llmwiki://state` | Compilation state (per-source hashes, last compile times). |
+
 ## Limitations
 
-Early software. Best for small, high-signal corpora (a few dozen sources). Query routing is index-based.
+Early software. Best for small, high-signal corpora (a few dozen sources). Query routing is index-based.
 
 **Honest about truncation.** Sources that exceed the character limit are truncated on ingest with `truncated: true` and the original character count recorded in frontmatter, so downstream consumers know they're working with partial content.
 
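The `read_page` lookup order (concepts/ then queries/) and the page-level `llmwiki://` resource URIs in the hunk above amount to a small slug-to-path mapping. The sketch below is hypothetical and is not llmwiki's implementation: it assumes pages live at `wiki/concepts/<slug>.md` and `wiki/queries/<slug>.md` under the project root (the README names only `wiki/index.md` explicitly), assumes a conservative slug format, and takes an `exists` callback instead of touching the filesystem so the lookup order is easy to see and test.

```typescript
// Hypothetical slug-to-path mapping for read_page and the llmwiki:// resources.
// Directory names, the ".md" suffix, and the slug format are assumptions.

// Reject anything that could escape the wiki directory (slashes, "..", etc.).
const SAFE_SLUG = /^[A-Za-z0-9][A-Za-z0-9._-]*$/;

function assertSafeSlug(slug: string): void {
  if (!SAFE_SLUG.test(slug)) throw new Error(`invalid slug: ${slug}`);
}

// read_page order: concept pages first, then saved query pages.
function readPageCandidates(slug: string): string[] {
  assertSafeSlug(slug);
  return [`wiki/concepts/${slug}.md`, `wiki/queries/${slug}.md`];
}

// Return the first candidate path that exists, or undefined if none does.
function resolvePage(slug: string, exists: (relativePath: string) => boolean): string | undefined {
  return readPageCandidates(slug).find(exists);
}

// Map the page-level resource URIs from the Resources table to relative paths.
function resourceToPath(uri: string): string | undefined {
  if (uri === "llmwiki://index") return "wiki/index.md";
  const match = /^llmwiki:\/\/(concept|query)\/([^/]+)$/.exec(uri);
  if (match === null) return undefined; // not a page URI (e.g. llmwiki://sources, llmwiki://state)
  const slug = match[2];
  assertSafeSlug(slug);
  const dir = match[1] === "concept" ? "concepts" : "queries";
  return `wiki/${dir}/${slug}.md`;
}

// Usage with a made-up set of files on disk.
const onDisk = new Set(["wiki/concepts/model-counting.md", "wiki/queries/what-is-x.md"]);
console.log(resolvePage("what-is-x", (p) => onDisk.has(p))); // wiki/queries/what-is-x.md
console.log(resourceToPath("llmwiki://concept/model-counting")); // wiki/concepts/model-counting.md
```

Validating the slug before building a path keeps an agent-supplied slug such as `../secrets` from reaching outside the wiki directory, which matters once the server is exposed to arbitrary MCP clients.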
@@ -121,24 +224,29 @@ Karpathy describes an abstract pattern for turning raw data into compiled knowle
 | Q&A | `llmwiki query` | Implemented |
 | Output filing (save answers back) | `llmwiki query --save` | Implemented |
 | Auto-recompile | `llmwiki watch` | Implemented |
-| Linting / health-check pass |
+| Linting / health-check pass | `llmwiki lint` | Implemented |
+| Agent integration | `llmwiki serve` (MCP server) | Implemented |
 | Image support | — | Not yet implemented |
 | Marp slides | — | Not yet implemented |
 | Fine-tuning | — | Not yet implemented |
 
 ## Roadmap
 
--
--
--
--
--
+- ✅ Better provenance (paragraph-level source attribution)
+- ✅ Linting pass for wiki quality checks
+- ✅ Multi-provider support (OpenAI, Ollama, MiniMax)
+- ✅ Larger-corpus query strategy (semantic search, embeddings)
+- ✅ Deeper Obsidian integration (tags, aliases, Map of Content)
+- ✅ MCP server for agent integration
+- Image support
+- Marp slides
+- Fine-tuning
 
 If you want to contribute, these are the highest-leverage areas right now. Issues and PRs are welcome.
 
 ## Requirements
 
-Node.js >= 18,
+Node.js >= 18, plus provider credentials (for Anthropic: `ANTHROPIC_API_KEY` or `ANTHROPIC_AUTH_TOKEN`).
 
 ## License
 