npm - llm-wiki-compiler - Versions diffs - 0.1.0 → 0.2.0 - Mend

llm-wiki-compiler 0.1.0 → 0.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (4)

package/README.md CHANGED

@@ -4,6 +4,8 @@ Compile raw sources into an interlinked markdown wiki.
 Inspired by Karpathy's [LLM Wiki](https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f) pattern: instead of re-discovering knowledge at query time, compile it once into a persistent, browsable artifact that compounds over time.
+![llmwiki demo](docs/images/demo.gif)
 ## Who this is for
 - **AI researchers and engineers** building persistent knowledge from papers, docs, and notes
@@ -15,12 +17,55 @@ Inspired by Karpathy's [LLM Wiki](https://gist.github.com/karpathy/442a6bf555914
 ```bash
 npm install -g llm-wiki-compiler
 export ANTHROPIC_API_KEY=sk-...
+# Or use ANTHROPIC_AUTH_TOKEN if your Anthropic-compatible gateway expects it.
+# Or use a different provider:
+# export LLMWIKI_PROVIDER=openai
+# export OPENAI_API_KEY=sk-...
 llmwiki ingest https://some-article.com
 llmwiki compile
 llmwiki query "what is X?"
 ```
+## Configuration
+llmwiki configures providers via environment variables. The default provider is Anthropic.
+Configuration precedence for Anthropic values:
+1. Shell env / local `.env`
+2. Claude Code settings fallback (`~/.claude/settings.json` → `env` block)
+3. Built-in provider defaults (where applicable)
+- `LLMWIKI_PROVIDER`: The provider to use (e.g., anthropic, openai).
+- `LLMWIKI_MODEL`: The model name to override the provider default.
+### Anthropic (Default)
+- `ANTHROPIC_API_KEY` or `ANTHROPIC_AUTH_TOKEN`: Required. Either one can satisfy Anthropic authentication.
+- `ANTHROPIC_BASE_URL`: Optional. Custom endpoint for proxies. Valid HTTP(S) URLs are accepted, including Claude-style path endpoints such as `https://api.kimi.com/coding/`.
+Example using an Anthropic or cc-switch custom proxy:
+```bash
+export LLMWIKI_PROVIDER=anthropic
+export ANTHROPIC_API_KEY=sk-...
+export ANTHROPIC_BASE_URL=https://proxy.example.com
+```
+If those values are not set in shell env or `.env`, llmwiki will try Anthropic-compatible values from `~/.claude/settings.json` (`env` block) for:
+- `ANTHROPIC_API_KEY`
+- `ANTHROPIC_AUTH_TOKEN`
+- `ANTHROPIC_BASE_URL`
+- `ANTHROPIC_MODEL`
+Example with zero exports (Claude Code already configured):
+```bash
+llmwiki compile
+```
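
The precedence order added in this hunk (shell env / `.env`, then the Claude Code `env` block, then built-in defaults) can be read as a first-match lookup over three layers. The following is a minimal sketch under that reading, not llmwiki's actual implementation: the names `ConfigLayers`, `resolveAnthropicValue`, and `hasAnthropicCredentials` are hypothetical, the layers are assumed to be already parsed into plain objects, and treating an empty string as "not set" is an assumption.

```typescript
// Hypothetical sketch of the documented precedence; not llmwiki's actual code.
type EnvMap = Record<string, string | undefined>;

interface ConfigLayers {
  shellEnv: EnvMap;          // shell env merged with local .env (highest precedence)
  claudeSettingsEnv: EnvMap; // `env` block parsed from ~/.claude/settings.json
  defaults: EnvMap;          // built-in provider defaults, where applicable
}

// Return the first non-empty value for `key`, walking the layers in order.
// Assumption: an empty string counts as "not set".
function resolveAnthropicValue(key: string, layers: ConfigLayers): string | undefined {
  const ordered: EnvMap[] = [layers.shellEnv, layers.claudeSettingsEnv, layers.defaults];
  for (const layer of ordered) {
    const value = layer[key];
    if (value !== undefined && value !== "") {
      return value;
    }
  }
  return undefined;
}

// Either credential variable can satisfy Anthropic authentication.
function hasAnthropicCredentials(layers: ConfigLayers): boolean {
  return (
    resolveAnthropicValue("ANTHROPIC_API_KEY", layers) !== undefined ||
    resolveAnthropicValue("ANTHROPIC_AUTH_TOKEN", layers) !== undefined
  );
}
```

With this shape, a value exported in the shell always wins over the same key in `~/.claude/settings.json`, which matches the "zero exports" example: when the shell layer is empty, the Claude Code layer supplies the credentials.
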
 ## Why not just RAG?
 RAG retrieves chunks at query time. Every question re-discovers the same relationships from scratch. Nothing accumulates.
@@ -68,7 +113,7 @@ a knowledge base into a target language that supports efficient queries.
 Related concepts: [[Propositional Logic]], [[Model Counting]]
 ```
-Pages include source attribution in frontmatter. Provenance is page-level today, not claim-level.
+Pages include source attribution in frontmatter. Paragraphs are annotated with `^[filename.md]` markers pointing back to the source file that contributed the content.
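
The `^[filename.md]` markers introduced by this change make it possible to recover which source files back each paragraph. A minimal sketch of how a consumer might read them follows. It assumes paragraphs are separated by blank lines and that a paragraph may carry more than one marker; neither detail is stated in the README, and `extractProvenance` is a hypothetical helper, not part of llmwiki.

```typescript
// Hypothetical helper: collect the source files cited by each paragraph.
interface ParagraphProvenance {
  text: string;      // paragraph text with markers removed
  sources: string[]; // filenames found in ^[...] markers, in order of appearance
}

// Matches markers of the form ^[filename.md].
const PROVENANCE_MARKER = /\^\[([^\]\n]+)\]/g;

function extractProvenance(pageBody: string): ParagraphProvenance[] {
  const results: ParagraphProvenance[] = [];
  // Assumption: blank lines separate paragraphs.
  for (const raw of pageBody.split(/\n\s*\n/)) {
    const paragraph = raw.trim();
    if (paragraph.length === 0) {
      continue;
    }
    const sources: string[] = [];
    // Strip each marker and record the filename it names.
    const text = paragraph
      .replace(PROVENANCE_MARKER, (_whole: string, file: string) => {
        sources.push(file);
        return "";
      })
      .trim();
    results.push({ text, sources });
  }
  return results;
}
```

A paragraph with an empty `sources` array is one with no attribution, which could be worth flagging in a review.
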
 ## Commands
@@ -78,7 +123,9 @@ Pages include source attribution in frontmatter. Provenance is page-level today,
 | `llmwiki compile` | Incremental compile: extract concepts, generate wiki pages |
 | `llmwiki query "question"` | Ask questions against your compiled wiki |
 | `llmwiki query "question" --save` | Answer and save the result as a wiki page |
+| `llmwiki lint` | Check wiki quality (broken links, orphans, empty pages, etc.) |
 | `llmwiki watch` | Auto-recompile when `sources/` changes |
+| `llmwiki serve [--root <dir>]` | Start an MCP server exposing wiki tools to AI agents |
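
The new `llmwiki lint` row names broken links, orphans, and empty pages among its checks. As an illustration of what those three checks mean over `[[wikilink]]`-style pages, here is a minimal sketch. It is not llmwiki's linter: `lintPages` and `LintReport` are hypothetical, pages are assumed to be keyed by the exact title used inside `[[...]]`, and the handling of `|alias` and `#heading` suffixes follows the common Obsidian convention as an assumption.

```typescript
// Hypothetical sketch of three of the named checks; not llmwiki's actual linter.
interface LintReport {
  brokenLinks: { from: string; target: string }[]; // links to pages that do not exist
  orphans: string[];                               // pages with no inbound links
  emptyPages: string[];                            // pages whose body is blank
}

function lintPages(pages: Record<string, string>): LintReport {
  const titles = Object.keys(pages);
  const inbound: Record<string, number> = {};
  for (const title of titles) {
    inbound[title] = 0;
  }

  const report: LintReport = { brokenLinks: [], orphans: [], emptyPages: [] };
  // Captures the target of [[Target]], [[Target|alias]], or [[Target#heading]].
  const wikilink = /\[\[([^\]|#]+)(?:[|#][^\]]*)?\]\]/g;

  for (const title of titles) {
    const body = pages[title];
    if (body.trim() === "") {
      report.emptyPages.push(title);
    }
    wikilink.lastIndex = 0; // reset the shared global regex before each page
    let match: RegExpExecArray | null = wikilink.exec(body);
    while (match !== null) {
      const target = match[1].trim();
      if (!Object.prototype.hasOwnProperty.call(pages, target)) {
        report.brokenLinks.push({ from: title, target });
      } else if (target !== title) {
        inbound[target] += 1; // self-links do not count as inbound
      }
      match = wikilink.exec(body);
    }
  }

  for (const title of titles) {
    if (inbound[title] === 0) {
      report.orphans.push(title);
    }
  }
  return report;
}
```
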
 ## Output
@@ -97,16 +144,72 @@ Try it on any article or document:
 ```bash
 mkdir my-wiki && cd my-wiki
-llmwiki ingest https://en.wikipedia.org/wiki/Knowledge_compilation
+llmwiki ingest https://en.wikipedia.org/wiki/Andrej_Karpathy
 llmwiki compile
-llmwiki query "how does knowledge compilation work?"
+llmwiki query "What terms did Andrej coin?"
 ```
 See `examples/basic/` in the repo for pre-generated output you can browse without an API key.
+## MCP Server
+llmwiki ships an MCP (Model Context Protocol) server so AI agents (Claude Desktop, Cursor, Claude Code, etc.) can drive the full pipeline directly: ingest sources, compile, query, search, lint, and read pages — without scraping CLI output.
+Where [llm-wiki-kit](https://github.com/iamsashank09/llm-wiki-kit) gives agents raw CRUD against wiki pages, llmwiki exposes the **automated pipelines**: agents get intelligent compilation, incremental change detection, and semantic query routing built in.
+### Setup
+Start the server (stdio transport, no API key required at startup):
+```bash
+llmwiki serve --root /path/to/your/wiki-project
+```
+### Claude Desktop / Cursor configuration
+Add to your client's MCP config (e.g. `claude_desktop_config.json`):
+```json
+{
+  "mcpServers": {
+    "llmwiki": {
+      "command": "npx",
+      "args": ["llm-wiki-compiler", "serve", "--root", "/path/to/wiki-project"],
+      "env": {
+        "ANTHROPIC_API_KEY": "sk-ant-..."
+      }
+    }
+  }
+}
+```
+Tools that need an LLM (`compile_wiki`, `query_wiki`, `search_pages`) check for a configured provider on each call. Read-only tools (`read_page`, `lint_wiki`, `wiki_status`) and `ingest_source` work without any credentials.
+### Tools
+| Tool | What it does |
+|------|--------------|
+| `ingest_source` | Fetch a URL or local file into `sources/`. |
+| `compile_wiki` | Run the incremental compile pipeline; returns counts, slugs, errors. |
+| `query_wiki` | Two-step grounded answer with optional `--save`. |
+| `search_pages` | Return full content of pages relevant to a question. |
+| `read_page` | Read a single page by slug (concepts/ then queries/). |
+| `lint_wiki` | Run quality checks; returns structured diagnostics. |
+| `wiki_status` | Page count, source count, orphans, pending changes (read-only). |
+### Resources
+| URI | Returns |
+|-----|---------|
+| `llmwiki://index` | Full `wiki/index.md` content. |
+| `llmwiki://concept/{slug}` | A single concept page (frontmatter + body). |
+| `llmwiki://query/{slug}` | A single saved query page. |
+| `llmwiki://sources` | List of ingested source files with metadata. |
+| `llmwiki://state` | Compilation state (per-source hashes, last compile times). |
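
For readers unfamiliar with MCP, the tools in the table above are invoked with JSON-RPC 2.0 `tools/call` requests, and the stdio transport carries one JSON message per line. The sketch below only builds and frames such a request; it does not start the server, and a real client must complete the MCP `initialize` handshake first. The helper names are hypothetical, and `slug` is an assumed argument name for `read_page`: the README does not list tool input schemas, so check what the server advertises via `tools/list`.

```typescript
// Shape of an MCP `tools/call` request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper: build a tools/call request for a named tool.
function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// The stdio transport is newline-delimited: one serialized message per line.
function frameForStdio(request: ToolCallRequest): string {
  return JSON.stringify(request) + "\n";
}

// `slug` is an assumed argument name, used here for illustration only.
const readPageCall = buildToolCall(1, "read_page", { slug: "knowledge-compilation" });
const wireMessage = frameForStdio(readPageCall);
```

In practice an MCP client library handles this framing, so the sketch is mainly useful for understanding what crosses the pipe when an agent calls a tool.
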
 ## Limitations
-Early software. Best for small, high-signal corpora (a few dozen sources). Query routing is index-based. Provenance is page-level, not claim-level. Anthropic-only for now.
+Early software. Best for small, high-signal corpora (a few dozen sources). Query routing is index-based.
 **Honest about truncation.** Sources that exceed the character limit are truncated on ingest with `truncated: true` and the original character count recorded in frontmatter, so downstream consumers know they're working with partial content.
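
The truncation behaviour described above amounts to a length check that records what was lost. A minimal sketch follows, with several assumptions: the README does not state the character limit, so it is passed in as `maxChars`; the field name `originalChars` is hypothetical (only `truncated: true` is named in the source); and whether untruncated sources record `truncated: false` is also an assumption.

```typescript
// Hypothetical sketch of truncate-on-ingest; field names other than
// `truncated` are assumptions, not llmwiki's actual frontmatter keys.
interface IngestResult {
  body: string;
  frontmatter: { truncated: boolean; originalChars?: number };
}

function truncateOnIngest(content: string, maxChars: number): IngestResult {
  if (content.length <= maxChars) {
    // Within the limit: keep the content as-is.
    return { body: content, frontmatter: { truncated: false } };
  }
  // Over the limit: keep the first maxChars characters and record the
  // original length so downstream consumers know the content is partial.
  return {
    body: content.slice(0, maxChars),
    frontmatter: { truncated: true, originalChars: content.length },
  };
}
```

Note that `length` and `slice` count UTF-16 code units, so a real implementation may need care around characters outside the Basic Multilingual Plane.
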
@@ -121,24 +224,29 @@ Karpathy describes an abstract pattern for turning raw data into compiled knowle
 | Q&A | `llmwiki query` | Implemented |
 | Output filing (save answers back) | `llmwiki query --save` | Implemented |
 | Auto-recompile | `llmwiki watch` | Implemented |
-| Linting / health-check pass | — | Not yet implemented (`watch` is auto-recompile, not lint) |
+| Linting / health-check pass | `llmwiki lint` | Implemented |
+| Agent integration | `llmwiki serve` (MCP server) | Implemented |
 | Image support | — | Not yet implemented |
 | Marp slides | — | Not yet implemented |
 | Fine-tuning | — | Not yet implemented |
 ## Roadmap
-- Multi-provider support (OpenAI, local models)
-- Better provenance (claim-level source attribution)
-- Larger-corpus query strategy (semantic search, embeddings)
-- Deeper Obsidian integration
-- Linting pass for wiki quality checks
+- ✅ Better provenance (paragraph-level source attribution)
+- ✅ Linting pass for wiki quality checks
+- ✅ Multi-provider support (OpenAI, Ollama, MiniMax)
+- ✅ Larger-corpus query strategy (semantic search, embeddings)
+- ✅ Deeper Obsidian integration (tags, aliases, Map of Content)
+- ✅ MCP server for agent integration
+- Image support
+- Marp slides
+- Fine-tuning
 If you want to contribute, these are the highest-leverage areas right now. Issues and PRs are welcome.
 ## Requirements
-Node.js >= 18, an Anthropic API key.
+Node.js >= 18, plus provider credentials (for Anthropic: `ANTHROPIC_API_KEY` or `ANTHROPIC_AUTH_TOKEN`).
 ## License