llm-wiki-compiler 0.1.0

package/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 atomicmemory
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
package/README.md ADDED
@@ -0,0 +1,150 @@
1
+ # llmwiki
2
+
3
+ Compile raw sources into an interlinked markdown wiki.
4
+
5
+ Inspired by Karpathy's [LLM Wiki](https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f) pattern: instead of re-discovering knowledge at query time, compile it once into a persistent, browsable artifact that compounds over time.
6
+
7
+ ## Who this is for
8
+
9
+ - **AI researchers and engineers** building persistent knowledge from papers, docs, and notes
10
+ - **Technical writers** compiling scattered sources into a structured, interlinked reference
11
+ - **Anyone with too many bookmarks** who wants a wiki instead of a graveyard of tabs
12
+
13
+ ## Quick start
14
+
15
+ ```bash
16
+ npm install -g llm-wiki-compiler
17
+ export ANTHROPIC_API_KEY=sk-...
18
+
19
+ llmwiki ingest https://some-article.com
20
+ llmwiki compile
21
+ llmwiki query "what is X?"
22
+ ```
23
+
24
+ ## Why not just RAG?
25
+
26
+ RAG retrieves chunks at query time. Every question re-discovers the same relationships from scratch. Nothing accumulates.
27
+
28
+ llmwiki **compiles** your sources into a wiki. Concepts get their own pages. Pages link to each other. When you ask a question with `--save`, the answer becomes a new page, and future queries use it as context. Your explorations compound.
29
+
30
+ This is complementary to RAG, not a replacement. RAG is great for ad-hoc retrieval over large corpora. llmwiki gives you a persistent, structured artifact to retrieve from.
31
+
32
+ ```
33
+ RAG: query → search chunks → answer → forget
34
+ llmwiki: sources → compile → wiki → query → save → richer wiki → better answers
35
+ ```
36
+
37
+ ## How it works
38
+
39
+ ```
40
+ sources/ → SHA-256 hash check → LLM concept extraction → wiki page generation → [[wikilink]] resolution → index.md
41
+ ```
42
+
43
+ **Two-phase pipeline.** Phase 1 extracts all concepts from all sources. Phase 2 generates pages. This eliminates order-dependence, catches failures before writing anything, and merges concepts shared across multiple sources into single pages.
44
+
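Reviewer note: the two-phase pipeline described above could be sketched roughly as follows. This is an illustration only, not the package's actual code. The names `Extract`, `PagePlan`, `planPages`, and `compile` are hypothetical, and the LLM call is replaced by an injected synchronous function so that the control flow is easy to follow.

```typescript
// Hypothetical stand-in for the LLM extraction call: returns concept titles for one source.
type Extract = (sourceName: string, text: string) => string[];

// Hypothetical plan for one wiki page: a concept title plus every source that mentions it.
interface PagePlan {
  title: string;
  sources: string[];
}

// Phase 1: extract concepts from ALL sources and merge shared concepts by title.
// Nothing is written here, so an extraction failure aborts before any output exists.
function planPages(sources: Record<string, string>, extract: Extract): PagePlan[] {
  const byTitle = new Map<string, Set<string>>();
  for (const name of Object.keys(sources).sort()) {
    for (const title of extract(name, sources[name])) {
      if (!byTitle.has(title)) byTitle.set(title, new Set<string>());
      byTitle.get(title)!.add(name);
    }
  }
  // Sorting titles and source names makes the result independent of input order.
  return Array.from(byTitle.keys())
    .sort()
    .map((title) => ({ title, sources: Array.from(byTitle.get(title)!).sort() }));
}

// Phase 2: generate one page per merged concept, only after phase 1 fully succeeded.
function compile(
  sources: Record<string, string>,
  extract: Extract,
  writePage: (plan: PagePlan) => void,
): number {
  const plans = planPages(sources, extract);
  plans.forEach(writePage);
  return plans.length;
}
```

Because every source is processed before any page is generated, a concept that appears in two sources ends up as one plan listing both, whichever source is read first.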
45
+ **Incremental.** Only changed sources go through the LLM. Everything else is skipped via hash-based change detection.
46
+
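Reviewer note: a minimal sketch of hash-based change detection, assuming the compiler keeps a map from source filename to the SHA-256 digest recorded at the last compile. The `HashState` shape and the function names are assumptions for illustration; the README does not say where or how the package stores its hashes.

```typescript
import { createHash } from "node:crypto";

// Hypothetical state: source filename -> SHA-256 hex digest from the previous compile.
type HashState = Record<string, string>;

// Hex-encoded SHA-256 digest of a source's text.
function sha256(content: string): string {
  return createHash("sha256").update(content, "utf8").digest("hex");
}

// Returns the names of sources that are new or whose content changed since the
// previous run. Only these would need to go through the LLM again.
function changedSources(sources: Record<string, string>, previous: HashState): string[] {
  return Object.entries(sources)
    .filter(([name, content]) => previous[name] !== sha256(content))
    .map(([name]) => name)
    .sort();
}

// Usage: "a.md" is unchanged, "b.md" is new.
const previousState: HashState = { "a.md": sha256("alpha") };
console.log(changedSources({ "a.md": "alpha", "b.md": "beta" }, previousState)); // logs [ 'b.md' ]
```

Comparing digests rather than timestamps means that touching a file without changing its content does not trigger a recompile.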
47
+ **Compounding queries.** `llmwiki query --save` writes the answer as a wiki page and immediately rebuilds the index. Saved answers show up in future queries as context.
48
+
49
+ ### What it produces
50
+
51
+ A raw source like a Wikipedia article on knowledge compilation becomes a structured wiki page:
52
+
53
+ ```yaml
54
+ ---
55
+ title: Knowledge Compilation
56
+ summary: Techniques for converting knowledge representations into forms that support efficient reasoning.
57
+ sources:
58
+ - knowledge-compilation.md
59
+ createdAt: "2026-04-05T12:00:00Z"
60
+ updatedAt: "2026-04-05T12:00:00Z"
61
+ ---
62
+ ```
63
+
64
+ ```markdown
65
+ Knowledge compilation refers to a family of techniques for pre-processing
66
+ a knowledge base into a target language that supports efficient queries.
67
+
68
+ Related concepts: [[Propositional Logic]], [[Model Counting]]
69
+ ```
70
+
71
+ Pages include source attribution in frontmatter. Provenance is page-level today, not claim-level.
72
+
73
+ ## Commands
74
+
75
+ | Command | What it does |
76
+ |---------|-------------|
77
+ | `llmwiki ingest <url\|file>` | Fetch a URL or copy a local file into `sources/` |
78
+ | `llmwiki compile` | Incremental compile: extract concepts, generate wiki pages |
79
+ | `llmwiki query "question"` | Ask questions against your compiled wiki |
80
+ | `llmwiki query "question" --save` | Answer and save the result as a wiki page |
81
+ | `llmwiki watch` | Auto-recompile when `sources/` changes |
82
+
83
+ ## Output
84
+
85
+ ```
86
+ wiki/
87
+ concepts/ one .md file per concept, with YAML frontmatter
88
+ queries/ saved query answers, included in index and retrieval
89
+ index.md auto-generated table of contents
90
+ ```
91
+
92
+ Obsidian-compatible. `[[wikilinks]]` resolve to concept titles.
93
+
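Reviewer note: one way `[[wikilink]]` resolution against concept titles might look. The slug rule, the case-insensitive matching, and the `LinkReport` shape are assumptions made for this sketch; only the `concepts/` directory and the `[[...]]` syntax come from the README.

```typescript
// Hypothetical slug rule: lowercase, with runs of non-alphanumerics collapsed to "-".
function slugify(title: string): string {
  return title.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
}

interface LinkReport {
  resolved: Record<string, string>; // link text -> relative page path
  broken: string[]; // link text with no matching concept title
}

// Finds every [[Target]] or [[Target|alias]] in a page body and maps the target
// to a concept page, matching titles case-insensitively (an assumption).
function resolveWikilinks(body: string, titles: string[]): LinkReport {
  const byLowerTitle = new Map<string, string>();
  for (const title of titles) byLowerTitle.set(title.toLowerCase(), title);

  const report: LinkReport = { resolved: {}, broken: [] };
  const pattern = /\[\[([^\[\]|]+)(?:\|[^\[\]]*)?\]\]/g;
  for (const match of Array.from(body.matchAll(pattern))) {
    const target = match[1].trim();
    const title = byLowerTitle.get(target.toLowerCase());
    if (title !== undefined) {
      report.resolved[target] = `concepts/${slugify(title)}.md`;
    } else if (!report.broken.includes(target)) {
      report.broken.push(target);
    }
  }
  return report;
}
```

Collecting unresolved targets in `broken` is the kind of information a later linting pass could report.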
94
+ ## Demo
95
+
96
+ Try it on any article or document:
97
+
98
+ ```bash
99
+ mkdir my-wiki && cd my-wiki
100
+ llmwiki ingest https://en.wikipedia.org/wiki/Knowledge_compilation
101
+ llmwiki compile
102
+ llmwiki query "how does knowledge compilation work?"
103
+ ```
104
+
105
+ See `examples/basic/` in the repo for pre-generated output you can browse without an API key.
106
+
107
+ ## Limitations
108
+
109
+ Early software. Best for small, high-signal corpora (a few dozen sources). Query routing is index-based. Provenance is page-level, not claim-level. Anthropic-only for now.
110
+
111
+ **Honest about truncation.** Sources that exceed the character limit are truncated on ingest with `truncated: true` and the original character count recorded in frontmatter, so downstream consumers know they're working with partial content.
112
+
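Reviewer note: a sketch of the truncation behaviour described above, under stated assumptions. The README names the `truncated: true` flag but not the field that holds the original character count or the size of the limit, so `originalLength` and the `maxChars` parameter here are hypothetical.

```typescript
interface IngestResult {
  frontmatter: Record<string, string | number | boolean>;
  body: string;
}

// Truncates content to maxChars and records that fact in frontmatter.
// Lengths are JavaScript string lengths (UTF-16 code units), so a cut can in
// principle land inside a surrogate pair; a real implementation may handle that.
function ingestWithLimit(title: string, content: string, maxChars: number): IngestResult {
  const frontmatter: IngestResult["frontmatter"] = { title };
  if (content.length <= maxChars) {
    return { frontmatter, body: content };
  }
  frontmatter.truncated = true;
  frontmatter.originalLength = content.length; // hypothetical field name
  return { frontmatter, body: content.slice(0, maxChars) };
}

// Renders frontmatter as a YAML block followed by the body. JSON-encoded
// scalars (quoted strings, bare numbers and booleans) are also valid YAML.
function render(result: IngestResult): string {
  const yaml = Object.entries(result.frontmatter)
    .map(([key, value]) => `${key}: ${JSON.stringify(value)}`)
    .join("\n");
  return `---\n${yaml}\n---\n\n${result.body}\n`;
}
```

A downstream consumer can then check `truncated` before treating the page as a complete representation of its source.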
113
+ ## Karpathy's LLM Wiki pattern vs this compiler
114
+
115
+ Karpathy describes an abstract pattern for turning raw data into compiled knowledge. Here's how llmwiki maps to it:
116
+
117
+ | Karpathy's concept | llmwiki | Status |
118
+ |---|---|---|
119
+ | Data ingest | `llmwiki ingest` | Implemented |
120
+ | Compile wiki | `llmwiki compile` | Implemented |
121
+ | Q&A | `llmwiki query` | Implemented |
122
+ | Output filing (save answers back) | `llmwiki query --save` | Implemented |
123
+ | Auto-recompile | `llmwiki watch` | Implemented |
124
+ | Linting / health-check pass | — | Not yet implemented (`watch` is auto-recompile, not lint) |
125
+ | Image support | — | Not yet implemented |
126
+ | Marp slides | — | Not yet implemented |
127
+ | Fine-tuning | — | Not yet implemented |
128
+
129
+ ## Roadmap
130
+
131
+ - Multi-provider support (OpenAI, local models)
132
+ - Better provenance (claim-level source attribution)
133
+ - Larger-corpus query strategy (semantic search, embeddings)
134
+ - Deeper Obsidian integration
135
+ - Linting pass for wiki quality checks
136
+
137
+ If you want to contribute, these are the highest-leverage areas right now. Issues and PRs are welcome.
138
+
139
+ ## Requirements
140
+
141
+ Node.js >= 18, an Anthropic API key.
142
+
143
+ ## License
144
+
145
+ MIT
146
+
147
+
148
+ ## Disclaimer
149
+
150
+ No LLMs were harmed in the making of this repo.