llm-wiki-compiler 0.1.1 → 0.2.0
- package/README.md +115 -9
- package/dist/cli.js +1599 -224
- package/dist/cli.js.map +1 -1
- package/package.json +5 -2
package/README.md
CHANGED
@@ -17,12 +17,55 @@ Inspired by Karpathy's [LLM Wiki](https://gist.github.com/karpathy/442a6bf555914
 ```bash
 npm install -g llm-wiki-compiler
 export ANTHROPIC_API_KEY=sk-...
+# Or use ANTHROPIC_AUTH_TOKEN if your Anthropic-compatible gateway expects it.
+# Or use a different provider:
+# export LLMWIKI_PROVIDER=openai
+# export OPENAI_API_KEY=sk-...
 
 llmwiki ingest https://some-article.com
 llmwiki compile
 llmwiki query "what is X?"
 ```
 
+## Configuration
+
+llmwiki configures providers via environment variables. The default provider is Anthropic.
+
+Configuration precedence for Anthropic values:
+
+1. Shell env / local `.env`
+2. Claude Code settings fallback (`~/.claude/settings.json` → `env` block)
+3. Built-in provider defaults (where applicable)
+
+- `LLMWIKI_PROVIDER`: The provider to use (e.g., anthropic, openai).
+- `LLMWIKI_MODEL`: The model name to override the provider default.
+
+### Anthropic (Default)
+
+- `ANTHROPIC_API_KEY` or `ANTHROPIC_AUTH_TOKEN`: Required. Either one can satisfy Anthropic authentication.
+- `ANTHROPIC_BASE_URL`: Optional. Custom endpoint for proxies. Valid HTTP(S) URLs are accepted, including Claude-style path endpoints such as `https://api.kimi.com/coding/`.
+
+Example using an Anthropic or cc-switch custom proxy:
+
+```bash
+export LLMWIKI_PROVIDER=anthropic
+export ANTHROPIC_API_KEY=sk-...
+export ANTHROPIC_BASE_URL=https://proxy.example.com
+```
+
+If those values are not set in shell env or `.env`, llmwiki will try Anthropic-compatible values from `~/.claude/settings.json` (`env` block) for:
+
+- `ANTHROPIC_API_KEY`
+- `ANTHROPIC_AUTH_TOKEN`
+- `ANTHROPIC_BASE_URL`
+- `ANTHROPIC_MODEL`
+
+Example with zero exports (Claude Code already configured):
+
+```bash
+llmwiki compile
+```
+
 ## Why not just RAG?
 
 RAG retrieves chunks at query time. Every question re-discovers the same relationships from scratch. Nothing accumulates.
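
Review note on the Configuration section added above: the three-level precedence (shell env or local `.env`, then the `env` block of `~/.claude/settings.json`, then built-in defaults) can be pictured with the sketch below. It is not taken from `dist/cli.js`. The names `resolveAnthropicEnv` and `hasAnthropicCredentials` are hypothetical, the inputs are passed in as plain objects rather than read from disk, and treating an empty string as "unset" is an assumption.

```typescript
// Hypothetical sketch of the documented precedence; not the package's actual code.
type Env = Record<string, string | undefined>;

// Keys the README lists as readable from the Claude Code settings fallback.
const ANTHROPIC_KEYS = [
  "ANTHROPIC_API_KEY",
  "ANTHROPIC_AUTH_TOKEN",
  "ANTHROPIC_BASE_URL",
  "ANTHROPIC_MODEL",
];

function resolveAnthropicEnv(
  shellEnv: Env,          // shell env merged with local .env (highest precedence)
  claudeSettingsEnv: Env, // `env` block of ~/.claude/settings.json (fallback)
  defaults: Env           // built-in provider defaults (lowest precedence)
): Env {
  const resolved: Env = {};
  for (const key of ANTHROPIC_KEYS) {
    // Assumption: an empty string counts as unset, so it does not mask a fallback.
    const value = shellEnv[key] || claudeSettingsEnv[key] || defaults[key];
    if (value) resolved[key] = value;
  }
  return resolved;
}

// Per the README, either credential satisfies Anthropic authentication.
function hasAnthropicCredentials(env: Env): boolean {
  return Boolean(env["ANTHROPIC_API_KEY"] || env["ANTHROPIC_AUTH_TOKEN"]);
}
```

Under this reading, the "zero exports" example works because the second level supplies the token when the shell provides nothing.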
@@ -70,7 +113,7 @@ a knowledge base into a target language that supports efficient queries.
 Related concepts: [[Propositional Logic]], [[Model Counting]]
 ```
 
-Pages include source attribution in frontmatter.
+Pages include source attribution in frontmatter. Paragraphs are annotated with `^[filename.md]` markers pointing back to the source file that contributed the content.
 
 ## Commands
 
@@ -80,7 +123,9 @@ Pages include source attribution in frontmatter. Provenance is page-level today,
 | `llmwiki compile` | Incremental compile: extract concepts, generate wiki pages |
 | `llmwiki query "question"` | Ask questions against your compiled wiki |
 | `llmwiki query "question" --save` | Answer and save the result as a wiki page |
+| `llmwiki lint` | Check wiki quality (broken links, orphans, empty pages, etc.) |
 | `llmwiki watch` | Auto-recompile when `sources/` changes |
+| `llmwiki serve [--root <dir>]` | Start an MCP server exposing wiki tools to AI agents |
 
 ## Output
 
@@ -106,9 +151,65 @@ llmwiki query "What terms did Andrej coin?"
 
 See `examples/basic/` in the repo for pre-generated output you can browse without an API key.
 
+## MCP Server
+
+llmwiki ships an MCP (Model Context Protocol) server so AI agents (Claude Desktop, Cursor, Claude Code, etc.) can drive the full pipeline directly: ingest sources, compile, query, search, lint, and read pages — without scraping CLI output.
+
+Where [llm-wiki-kit](https://github.com/iamsashank09/llm-wiki-kit) gives agents raw CRUD against wiki pages, llmwiki exposes the **automated pipelines**: agents get intelligent compilation, incremental change detection, and semantic query routing built in.
+
+### Setup
+
+Start the server (stdio transport, no API key required at startup):
+
+```bash
+llmwiki serve --root /path/to/your/wiki-project
+```
+
+### Claude Desktop / Cursor configuration
+
+Add to your client's MCP config (e.g. `claude_desktop_config.json`):
+
+```json
+{
+  "mcpServers": {
+    "llmwiki": {
+      "command": "npx",
+      "args": ["llm-wiki-compiler", "serve", "--root", "/path/to/wiki-project"],
+      "env": {
+        "ANTHROPIC_API_KEY": "sk-ant-..."
+      }
+    }
+  }
+}
+```
+
+Tools that need an LLM (`compile_wiki`, `query_wiki`, `search_pages`) check for a configured provider on each call. Read-only tools (`read_page`, `lint_wiki`, `wiki_status`) and `ingest_source` work without any credentials.
+
+### Tools
+
+| Tool | What it does |
+|------|--------------|
+| `ingest_source` | Fetch a URL or local file into `sources/`. |
+| `compile_wiki` | Run the incremental compile pipeline; returns counts, slugs, errors. |
+| `query_wiki` | Two-step grounded answer with optional `--save`. |
+| `search_pages` | Return full content of pages relevant to a question. |
+| `read_page` | Read a single page by slug (concepts/ then queries/). |
+| `lint_wiki` | Run quality checks; returns structured diagnostics. |
+| `wiki_status` | Page count, source count, orphans, pending changes (read-only). |
+
+### Resources
+
+| URI | Returns |
+|-----|---------|
+| `llmwiki://index` | Full `wiki/index.md` content. |
+| `llmwiki://concept/{slug}` | A single concept page (frontmatter + body). |
+| `llmwiki://query/{slug}` | A single saved query page. |
+| `llmwiki://sources` | List of ingested source files with metadata. |
+| `llmwiki://state` | Compilation state (per-source hashes, last compile times). |
+
 ## Limitations
 
-Early software. Best for small, high-signal corpora (a few dozen sources). Query routing is index-based.
+Early software. Best for small, high-signal corpora (a few dozen sources). Query routing is index-based.
 
 **Honest about truncation.** Sources that exceed the character limit are truncated on ingest with `truncated: true` and the original character count recorded in frontmatter, so downstream consumers know they're working with partial content.
 
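
Review note on the MCP Server section added above: the README says LLM-backed tools check for a configured provider on each call, while read-only tools and `ingest_source` need no credentials. A minimal sketch of that gating follows, assuming only the two providers whose variables appear in this diff (Anthropic and OpenAI). `canRunTool` and `providerConfigured` are hypothetical names, and the sketch ignores the `~/.claude/settings.json` fallback for brevity.

```typescript
// Hypothetical sketch of per-call credential gating; not the package's actual code.
type ToolEnv = Record<string, string | undefined>;

// Tool names as listed in the README's Tools table.
const LLM_TOOLS = ["compile_wiki", "query_wiki", "search_pages"];
const CREDENTIAL_FREE_TOOLS = ["read_page", "lint_wiki", "wiki_status", "ingest_source"];

function providerConfigured(env: ToolEnv): boolean {
  // The README states that Anthropic is the default provider.
  const provider = env["LLMWIKI_PROVIDER"] || "anthropic";
  if (provider === "anthropic") {
    return Boolean(env["ANTHROPIC_API_KEY"] || env["ANTHROPIC_AUTH_TOKEN"]);
  }
  if (provider === "openai") {
    return Boolean(env["OPENAI_API_KEY"]);
  }
  return false; // other providers are not modelled in this sketch
}

function canRunTool(tool: string, env: ToolEnv): boolean {
  if (LLM_TOOLS.indexOf(tool) !== -1) {
    // Checked on every call, so the server can start without any key.
    return providerConfigured(env);
  }
  return CREDENTIAL_FREE_TOOLS.indexOf(tool) !== -1;
}
```

Checking on each call rather than at startup matches the Setup note that no API key is required to start the server.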
@@ -123,24 +224,29 @@ Karpathy describes an abstract pattern for turning raw data into compiled knowle
 | Q&A | `llmwiki query` | Implemented |
 | Output filing (save answers back) | `llmwiki query --save` | Implemented |
 | Auto-recompile | `llmwiki watch` | Implemented |
-| Linting / health-check pass |
+| Linting / health-check pass | `llmwiki lint` | Implemented |
+| Agent integration | `llmwiki serve` (MCP server) | Implemented |
 | Image support | — | Not yet implemented |
 | Marp slides | — | Not yet implemented |
 | Fine-tuning | — | Not yet implemented |
 
 ## Roadmap
 
--
--
--
--
--
+- ✅ Better provenance (paragraph-level source attribution)
+- ✅ Linting pass for wiki quality checks
+- ✅ Multi-provider support (OpenAI, Ollama, MiniMax)
+- ✅ Larger-corpus query strategy (semantic search, embeddings)
+- ✅ Deeper Obsidian integration (tags, aliases, Map of Content)
+- ✅ MCP server for agent integration
+- Image support
+- Marp slides
+- Fine-tuning
 
 If you want to contribute, these are the highest-leverage areas right now. Issues and PRs are welcome.
 
 ## Requirements
 
-Node.js >= 18,
+Node.js >= 18, plus provider credentials (for Anthropic: `ANTHROPIC_API_KEY` or `ANTHROPIC_AUTH_TOKEN`).
 
 ## License
 
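
Review note on the paragraph-level provenance marked done in the Roadmap: the README change describes `^[filename.md]` markers attached to paragraphs. A downstream consumer might collect them roughly as sketched below. The marker grammar is assumed from that single example (a caret followed by a bracketed file name), and `extractSourceMarkers` is a hypothetical helper, not something this diff shows the package exporting.

```typescript
// Hypothetical helper: collects the distinct source files cited in one paragraph.
function extractSourceMarkers(paragraph: string): string[] {
  // Assumed grammar: "^[" + file name without "]" + "]".
  const pattern = /\^\[([^\]]+)\]/g;
  const sources: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(paragraph)) !== null) {
    const name = match[1];
    // Keep first-seen order and drop repeats of the same file.
    if (name !== undefined && sources.indexOf(name) === -1) {
      sources.push(name);
    }
  }
  return sources;
}
```

For a paragraph ending in `^[paper.md] ^[notes.md]`, the helper would return both file names in the order they appear.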