@wanshi-kg/wanshi 0.2.0 → 0.2.1

Files changed (2)
  1. package/README.md +102 -150
  2. package/package.json +1 -1
package/README.md CHANGED
@@ -6,6 +6,9 @@
6
6
  <img alt="wanshi" src="docs/assets/readme-banner-light.png">
7
7
  </picture>
8
8
 
9
+ [![npm version](https://img.shields.io/npm/v/@wanshi-kg/wanshi)](https://www.npmjs.com/package/@wanshi-kg/wanshi)
10
+ [![CI](https://github.com/wanshi-kg/wanshi/actions/workflows/ci.yml/badge.svg)](https://github.com/wanshi-kg/wanshi/actions/workflows/ci.yml)
11
+ [![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)
9
12
 
10
13
  > A local-first CLI that reads ten thousand things — code, docs, PDFs, audio, transcripts — and builds one knowledge graph that remembers where every fact came from.
11
14
 
@@ -15,7 +18,11 @@ It's a working CLI and a research platform in equal measure — the long game is
15
18
 
16
19
  ---
17
20
 
18
- > **Command shorthand:** examples below write `wanshi` for the run command. Installed from npm (`@wanshi-kg/wanshi`) that's literally `wanshi`; from a source checkout it's `npm start --` (i.e. `npx ts-node ./src/cli/index.ts`) in dev, or `node ./dist/cli/index.js` after `npm run build`.
21
+ > **Command shorthand:** examples below write `wanshi` for the run command: the global CLI once you've run `npm i -g @wanshi-kg/wanshi`. From a source checkout it's `npm start --` (i.e. `npx ts-node ./src/cli/index.ts`) in dev, or `node ./dist/cli/index.js` after `npm run build`.
22
+
23
+ ## Contents
24
+
25
+ [What's distinctive](#whats-distinctive) · [Supported inputs](#supported-inputs) · [Install](#install) · [Quick start](#quick-start) · [CLI reference](#cli-reference) · [Output formats](#output-formats) · [Local model guidance](#local-model-guidance) · [Quality metrics](#quality-metrics) · [Architecture](#architecture) · [Development](#development)
19
26
 
20
27
  ## What's distinctive
21
28
 
@@ -25,6 +32,7 @@ Most text→KG tools stop at "extract triples." `wanshi` is built around the par
25
32
  - **A grounding gate (opt-in).** Each extracted fact can be scored against its source chunk and flagged or dropped before it reaches the output — keyword overlap as a cheap pre-filter, with an optional local NLI checker (MiniCheck) for the uncertain cases. Enabled (`--grounding flag|drop`), it won't record what it can't verify against the source — but it's `disabled` by default.
26
33
  - **Closed-vocabulary extraction.** An optional corpus pre-pass builds a glossary of canonical entity/relation types, which then *constrains* extraction — so a large corpus doesn't fragment into hundreds of one-off types.
27
34
  - **Transcript-aware ingestion.** Speaker-labeled transcripts and chat exports are split into speaker-pure chunks, so a speaker becomes per-fact provenance rather than a polluting entity.
35
+ - **Beyond plain text.** A structured source can map straight to graph — a SQLite `.db` becomes tables→types, rows→entities, foreign-keys→edges with no LLM — and a document's own links and citations become deterministic edges, optionally fetching the cited work to ground the claim.
28
36
  - **Memory-store interop.** `mcp-jsonl` output is byte-compatible with the official [MCP memory server](https://github.com/modelcontextprotocol/servers/tree/main/src/memory) — point it at the file and query your graph from Claude Code/Desktop. No store to build.
29
37
  - **Training-data exports.** Emit KBLaM `(entity, property, value)` triples or quality-filtered LoRA/SFT chat examples straight from a graph.
30
38
  - **Resumable runs.** Per-chunk checkpoints survive interrupts and exhausted API credits; re-run the same command to continue.
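
The grounding-gate bullet above describes keyword overlap as a cheap pre-filter with `flag` and `drop` modes. A minimal sketch of that idea follows, assuming a simple token-overlap ratio. The function names, the stopword list, and the ASCII-only tokenizer are all hypothetical and are not taken from wanshi's source; the `0.5` default only mirrors the documented `--grounding-min-score` default.

```typescript
// Hypothetical sketch of a keyword-overlap grounding check; not wanshi's actual code.
type GroundingMode = "disabled" | "flag" | "drop";

interface Fact {
  text: string;
  grounded?: boolean;
  groundingScore?: number;
}

// Assumed stopword list, for illustration only.
const STOPWORDS = ["a", "an", "the", "of", "to", "and", "is", "in", "that", "it"];

// Lowercased, de-duplicated keywords (ASCII-only tokenizer, for brevity).
function keywords(text: string): string[] {
  const tokens: string[] = text.toLowerCase().match(/[a-z0-9]+/g) || [];
  return tokens.filter((t, i) => STOPWORDS.indexOf(t) === -1 && tokens.indexOf(t) === i);
}

// Fraction of the fact's keywords that also appear in the source chunk (0..1).
function overlapScore(fact: string, chunk: string): number {
  const factWords = keywords(fact);
  if (factWords.length === 0) return 0;
  const chunkWords = keywords(chunk);
  const hits = factWords.filter((w) => chunkWords.indexOf(w) !== -1).length;
  return hits / factWords.length;
}

// "flag" annotates every fact; "drop" additionally removes facts below minScore.
function applyGrounding(facts: Fact[], chunk: string, mode: GroundingMode, minScore = 0.5): Fact[] {
  if (mode === "disabled") return facts;
  const scored = facts.map((f) => {
    const groundingScore = overlapScore(f.text, chunk);
    return { ...f, groundingScore, grounded: groundingScore >= minScore };
  });
  return mode === "drop" ? scored.filter((f) => f.grounded) : scored;
}
```

In a real gate, the uncertain middle band of scores would presumably be handed to the optional NLI checker rather than decided by overlap alone; that step is left out here.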
@@ -35,9 +43,15 @@ Most text→KG tools stop at "extract triples." `wanshi` is built around the par
35
43
  | ------ | ---------- | -------- |
36
44
  | Text / source code | `.txt`, `.ts`, `.js`, `.py`, `.go`, `.rs`, … | Direct / code-aware extraction |
37
45
  | Markdown | `.md` | Markdown-aware parsing |
46
+ | LaTeX | `.tex` | De-TeX'd to readable prose; `\cite{}` keys feed the citation pipeline |
47
+ | EPUB | `.epub` | Unzipped and parsed per chapter (adm-zip + cheerio + html-to-text) |
48
+ | Jupyter | `.ipynb` | Cell-aware (markdown narrative + fenced code); cell outputs opt-in |
38
49
  | Transcripts | speaker-labeled `*.parakeet.txt`/`*.whisper.txt`, transcript/turn JSON, Claude/ChatGPT exports | Speaker-pure chunks with per-fact `speaker`/`occurredAt` |
50
+ | Email | `.eml`, `.mbox` | Per-message turns (sender → `speaker`, `Date` → `validAt`); thread-aware; quoted replies stripped |
51
+ | Chat exports | WhatsApp `.txt`, Telegram/Discord/Slack `.json` | Per-message speaker-pure turns via a per-platform parser |
52
+ | Subtitles | `.srt`, `.vtt` | Caption text (timecodes/styling stripped); VTT `<v>` voice tags → speakers |
39
53
  | JSON | `.json`, `.jsonl`, `.geojson` | Structure-aware chunking (splits on JSON structure, never mid-object) |
40
- | PDF | `.pdf` | Page text (`pdf2json`), or a richer engine via `--pdf-engine docling\|marker\|mistral` |
54
+ | PDF | `.pdf` | Page text (`pdf2json`), or a richer engine via `--pdf-engine tesseract\|docling\|marker\|chandra\|mistral` |
41
55
  | Office | `.docx`, `.xlsx`, `.pptx` | Via officeparser |
42
56
  | HTML / RTF | `.html`, `.htm`, `.rtf` | cheerio / RTF parsing |
43
57
  | Images | `.jpg`, `.png`, `.gif`, `.webp`, `.tiff`, `.heic`, `.avif` | Vision model required |
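
The JSON row in the table above says chunking splits on JSON structure and never mid-object. One way to picture that, for the simple case of a top-level array, is sketched below. `chunkJsonArray` is a hypothetical helper, not wanshi's implementation; the `2000` default only mirrors the documented `--chunk-size` default.

```typescript
// Hypothetical sketch: chunk a top-level JSON array on element boundaries.
// Every chunk is itself valid JSON; an element is never cut in half.
function chunkJsonArray(items: unknown[], maxChars = 2000): string[] {
  const chunks: string[] = [];
  let current: string[] = [];
  let size = 2; // the enclosing "[" and "]"
  for (let i = 0; i < items.length; i++) {
    const piece = JSON.stringify(items[i]);
    const extra = piece.length + (current.length > 0 ? 1 : 0); // +1 for the comma
    if (current.length > 0 && size + extra > maxChars) {
      // Adding this element would overflow: close the current chunk first.
      chunks.push("[" + current.join(",") + "]");
      current = [];
      size = 2;
    }
    size += piece.length + (current.length > 0 ? 1 : 0);
    current.push(piece);
  }
  if (current.length > 0) chunks.push("[" + current.join(",") + "]");
  return chunks;
}
```

An element larger than `maxChars` ends up alone in an oversized chunk rather than being split, which is the trade-off the "never mid-object" wording implies.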
@@ -48,14 +62,20 @@ Most text→KG tools stop at "extract triples." `wanshi` is built around the par
48
62
  Requires **Node.js 18+** and **[Ollama](https://ollama.ai)** running locally (needed for the default local generation + embeddings path; optional only if you point *both* at an OpenAI-compatible provider).
49
63
 
50
64
  ```bash
51
- git clone https://github.com/wanshi-kg/wanshi
52
- cd wanshi
53
- npm install
65
+ # Install the published CLI (gives you the `wanshi` command)
66
+ npm install -g @wanshi-kg/wanshi
54
67
 
55
68
  # Default local models
56
69
  ollama pull llama3.2 # generation
57
- ollama pull nomic-embed-text # embeddings
70
+ ollama pull nomic-embed-text # embeddings
71
+ ```
72
+
73
+ Or run from a source checkout (for development / contributing):
58
74
 
75
+ ```bash
76
+ git clone https://github.com/wanshi-kg/wanshi
77
+ cd wanshi
78
+ npm install
59
79
  npm run build # optional; ts-node works directly
60
80
  ```
61
81
 
@@ -145,98 +165,43 @@ wanshi --export-only -i ./knowledge-graph.json --export-format kblam -o ./kb.jso
145
165
 
146
166
  ## CLI reference
147
167
 
148
- ### Core
168
+ The most-used flags are below. Run **`wanshi --help`** for the full list and **`wanshi schema`** for the complete, authoritative config (generated from the Zod schema, so it never drifts from the code); the prose reference lives in [`website/docs/reference/cli.md`](website/docs/reference/cli.md).
149
169
 
150
170
  | Option | Default | Description |
151
171
  | ------ | ------- | ----------- |
152
172
  | `-i, --input <path>` | `.` | Input directory |
153
- | `-f, --filter <glob>` | `**/*` | Include pattern |
154
- | `-e, --exclude <glob...>` | — | Exclude patterns |
155
173
  | `-o, --output <path>` | `knowledge-graph.json` | Output file |
156
- | `-d, --description <text>` | | Content description for LLM context |
157
- | `--config <file>` | — | YAML/JSON config file |
158
-
159
- ### LLM
160
-
161
- | Option | Default | Description |
162
- | ------ | ------- | ----------- |
174
+ | `-f, --filter` / `-e, --exclude <glob…>` | `**/*` | Include / exclude patterns |
175
+ | `--config <file>` | — | YAML/JSON config (recommended; nested shape — `wanshi schema`) |
163
176
  | `--provider <name>` | `ollama` | `ollama` or `openai` (any OpenAI-compatible endpoint) |
164
- | `-m, --model <name>` | `llama3.2` | Ollama tag or provider model id |
165
- | `-h, --host <url>` | `http://localhost:11434` | Ollama host, or OpenAI-compatible base URL |
166
- | `--api-key <key>` | | Falls back to `$OPENAI_API_KEY` / `$WANSHI_API_KEY` |
167
- | `--temperature <n>` | `0.1` | Sampling temperature |
168
- | `--repeat-penalty <n>` | `1.1` | Ollama only (>1.0 discourages repetition) |
169
- | `--context-length <n>` | `8192` | Context window (Ollama only) |
170
- | `--max-tokens <n>` | provider default | Raise (or lower `--chunk-size`) if graph JSON truncates mid-output |
171
- | `--seed <n>` | — | Reproducibility seed (Ollama only) |
172
- | `-s, --system <prompt\|path>` | — | Custom system prompt or template path |
173
-
174
- ### Embeddings (independent from generation)
175
-
176
- | Option | Default | Description |
177
- | ------ | ------- | ----------- |
178
- | `--embeddings-provider <name>` | `ollama` | `ollama` or `openai` |
179
- | `--embeddings-model <name>` | `nomic-embed-text` | Embeddings model |
180
- | `--embeddings-host <url>` | `http://localhost:11434` | Host / base URL |
181
- | `--embeddings-max-input-chars <n>` | `1024` | Truncate embedding inputs (safe for 512-token models; raise for cloud) |
182
-
183
- ### Processing & retrieval
184
-
185
- | Option | Default | Description |
186
- | ------ | ------- | ----------- |
187
- | `--chunking <mode>` | `enabled` | `enabled\|disabled\|auto` |
177
+ | `-m, --model <name>` | `llama3.2` | Generation model |
178
+ | `-h, --host <url>` | `localhost:11434` | Ollama host / OpenAI base URL |
179
+ | `--embeddings-model <name>` | `nomic-embed-text` | Embeddings model (chosen independently from generation) |
188
180
  | `-c, --chunk-size <n>` | `2000` | Max chunk size (chars) |
189
- | `--overlap-size <n>` | `100` | Chunk overlap |
190
- | `--retrieval <mode>` | `enabled` | `enabled\|disabled\|auto` |
191
- | `--retrieval-limit <n>` | `3` | Retrieved context entities per chunk |
192
- | `--retrieval-scope <mode>` | `chunk` | `chunk` (per-chunk) or `file` (once, reused) |
193
- | `--json-strategy <mode>` | `structural` | `structural` (split on JSON structure) or `raw` |
194
-
195
- ### Media & classification
196
-
197
- | Option | Default | Description |
198
- | ------ | ------- | ----------- |
199
- | `--asr <mode>` | `enabled` | `enabled\|disabled\|auto` |
200
- | `--whisper-model <name>` | `medium` | `tiny\|base\|small\|medium\|large` |
201
- | `--language <lang>` | `auto` | Language code or `auto` |
202
- | `--translate` | `false` | Translate audio to English |
203
- | `--images <mode>` | `auto` | `enabled\|disabled\|auto` (vision model required) |
204
- | `--pdf-engine <engine>` | `pdf2json` | `pdf2json\|docling\|marker\|mistral` — PDF reading engine (non-default engines degrade to `pdf2json` on failure) |
205
- | `--asr-engine <engine>` | `whisper` | `whisper\|dual` — `dual` = vendored Python VAD + Parakeet/Whisper dual-STT + diarization (Apple-Silicon) |
206
- | `--classifier <mode>` | `disabled` | `disabled\|heuristic\|llm\|cascade` — drives domain prompt hints and scopes `entityType` to a per-domain enum *(experimental)* |
207
- | `--trace` | `false` | Emit a structured decision run-trace to `<output>.trace.jsonl` *(debug/observability)* |
208
-
209
- ### Merging, grounding, corpus glossary
210
-
211
- | Option | Default | Description |
212
- | ------ | ------- | ----------- |
213
- | `--entity-similarity-threshold <n>` | `0.9` | Jaro-Winkler entity dedup (0–1) |
214
- | `--observation-similarity-threshold <n>` | `0.9` | Embedding similarity (0–1) |
215
- | `--enable-similarity-merging` | `true` | Enable entity deduplication |
216
- | `--grounding <mode>` | `disabled` | `disabled` · `flag` (annotate `grounded`/`groundingScore`) · `drop` (remove below threshold) |
217
- | `--grounding-min-score <n>` | `0.5` | Min grounding score; also gates which facts the `lora` export keeps |
218
- | `--corpus-profiling <mode>` | `disabled` | Pre-pass that builds an authoritative corpus glossary (closed vocab under v5) *(experimental)* |
219
- | `--prompt-version <version>` | `v5` | `v5` (closed-vocab + topology hygiene) or `v4.5` (legacy) |
220
-
221
- ### Export, resume, logging
222
-
223
- | Option | Default | Description |
224
- | ------ | ------- | ----------- |
225
- | `--export-format <format>` | `json` | `json\|jsonl\|mcp-jsonl\|dot\|kblam\|lora\|graphiti` |
226
- | `--export-only` | `false` | Convert an existing graph (`--input`) to `--export-format` — no extraction |
181
+ | `--export-format <fmt>` | `json` | `json·jsonl·mcp-jsonl·dot·kblam·lora·graphiti` |
182
+ | `--export-only` | `false` | Convert an existing graph — no extraction |
227
183
  | `--resume` | `false` | Checkpoint chunks; skip done ones on re-run |
228
- | `--checkpoint <path>` | `<output>.checkpoint.jsonl` | Checkpoint sidecar |
229
- | `-L, --log-level <level>` | `info` | `debug\|info\|warning\|error` |
230
- | `-l, --log-file <path>` | | Write logs to file |
231
- | `-w, --watch` | `false` | Watch mode |
184
+ | `--grounding <mode>` | `disabled` | `flag` / `drop` ungrounded facts (opt-in) |
185
+ | `--pdf-engine <engine>` | `pdf2json` | `pdf2json·tesseract·docling·marker·chandra·mistral` |
186
+ | `-w, --watch` | `false` | Update the graph as files change |
232
187
 
233
- > Document-outline injection (`readers.outline`) and DOT styling (`export.dot`) are config-only (no CLI flags); see the config schema.
188
+ **Opt-in subsystems** — all default **off** (an otherwise byte-identical, offline run): reference + citation resolution (`--reference-links`, `--reference-citations`, `--reference-web`, `--reference-citation-fetch`, plus GROBID / Unpaywall / title-resolver), image enrichment (`--exif`, `--c2pa`, `--object-detection`), structured-source adapters (`--sqlite`), AST code seeding (`--ast`), the dual-STT ASR engine (`--asr-engine dual`), and cost metering (`--cost` / `--max-cost`). Run `wanshi --help` for each.
234
189
 
235
190
  ## Output formats
236
191
 
237
- ### JSON (`json`)
192
+ Pick with `--export-format`:
238
193
 
239
- Observations are **objects**, not bare strings — each carries provenance and the bi-temporal axis. The LLM emits plain text; `wanshi` stamps the metadata deterministically from what it knows about the chunk. Unknown fields are omitted; legacy string-observation graphs still load.
194
+ | Format | What it's for |
195
+ | ------ | ------------- |
196
+ | `json` (default) | Full graph; observations are **objects** carrying provenance + the bi-temporal axis |
197
+ | `jsonl` | Streamable JSON Lines |
198
+ | `mcp-jsonl` | Byte-compatible with the [MCP memory server](https://github.com/modelcontextprotocol/servers/tree/main/src/memory) — point it at the file, query from Claude. No store to build |
199
+ | `dot` | Styled GraphViz (colors, legend, clustering — config-only `export.dot:`); render `dot -Tsvg graph.dot -o graph.svg` |
200
+ | `kblam` | Microsoft [KBLaM](https://github.com/microsoft/KBLaM) `(entity, property, value)` triples for knowledge-token training |
201
+ | `lora` | Chat SFT examples, **quality-filtered** (drops facts below `--grounding-min-score`) |
202
+ | `graphiti` | `add_triplet`-shaped `{ nodes, edges }` for a [Graphiti](https://github.com/getzep/graphiti) temporal graph |
203
+
204
+ The default `json` keeps observations as provenance-stamped **objects** — the LLM emits plain text; `wanshi` stamps `source`/`speaker` and the bi-temporal axis deterministically from what it knows about each chunk:
240
205
 
241
206
  ```json
242
207
  {
@@ -245,11 +210,8 @@ Observations are **objects**, not bare strings — each carries provenance and t
245
210
  "name": "knowledge_graph_builder",
246
211
  "entityType": "class",
247
212
  "observations": [
248
- {
249
- "text": "Extracts entities and relations from file content using an LLM",
250
- "source": "src/core/knowledge/KnowledgeGraphBuilder.ts",
251
- "createdAt": "2026-06-05T15:57:59.856Z"
252
- }
213
+ { "text": "Extracts entities and relations from file content using an LLM",
214
+ "source": "src/core/knowledge/KnowledgeGraphBuilder.ts", "createdAt": "2026-06-05T15:57:59.856Z" }
253
215
  ],
254
216
  "files": ["src/core/knowledge/KnowledgeGraphBuilder.ts"]
255
217
  },
@@ -257,13 +219,9 @@ Observations are **objects**, not bare strings — each carries provenance and t
257
219
  "name": "SPEAKER_01",
258
220
  "entityType": "person",
259
221
  "observations": [
260
- {
261
- "text": "Explains that a Naïve Bayes classifier assumes word independence",
262
- "speaker": "SPEAKER_01",
263
- "source": "Olga Lesson P.parakeet.txt",
264
- "validAt": "2026-05-28T00:00:00Z",
265
- "createdAt": "2026-06-05T15:57:59.856Z"
266
- }
222
+ { "text": "Explains that a Naïve Bayes classifier assumes word independence",
223
+ "speaker": "SPEAKER_01", "source": "Olga Lesson P.parakeet.txt",
224
+ "validAt": "2026-05-28T00:00:00Z", "createdAt": "2026-06-05T15:57:59.856Z" }
267
225
  ],
268
226
  "files": ["Olga Lesson P.parakeet.txt"]
269
227
  }
@@ -274,37 +232,7 @@ Observations are **objects**, not bare strings — each carries provenance and t
274
232
  }
275
233
  ```
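
The observation objects in the JSON example above can be modelled roughly as follows. This is a sketch under assumptions: the field names come from the example, but the types, the `ChunkContext` shape, and both helper functions are hypothetical and are not wanshi's exported API. Reading `validAt` as the time a fact held and `createdAt` as the time it was recorded is also an interpretation of "bi-temporal".

```typescript
// Hypothetical types mirroring the fields visible in the JSON example.
interface Observation {
  text: string;
  speaker?: string;
  source?: string;
  validAt?: string;   // ISO 8601; assumed to mean "when the fact held"
  createdAt?: string; // ISO 8601; assumed to mean "when it was recorded"
}

// What the pipeline is assumed to know about a chunk (illustrative only).
interface ChunkContext {
  source?: string;
  speaker?: string;
  validAt?: string;
}

// The LLM supplies only `text`; metadata is stamped from the chunk context.
// Unknown fields are omitted rather than written as null.
function stampObservation(text: string, ctx: ChunkContext, now: Date): Observation {
  const obs: Observation = { text };
  if (ctx.speaker) obs.speaker = ctx.speaker;
  if (ctx.source) obs.source = ctx.source;
  if (ctx.validAt) obs.validAt = ctx.validAt;
  obs.createdAt = now.toISOString();
  return obs;
}

// Legacy graphs stored observations as bare strings; lift them into objects on load.
function normalizeObservation(raw: string | Observation): Observation {
  return typeof raw === "string" ? { text: raw } : raw;
}
```

Passing `now` in explicitly keeps the stamping deterministic and easy to test.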
276
234
 
277
- ### MCP-compatible JSONL (`mcp-jsonl`)
278
-
279
- ```jsonl
280
- {"type":"entity","name":"knowledge_graph_builder","entityType":"class","observations":["Extracts entities and relations from file content using an LLM"]}
281
- {"type":"relation","from":"knowledge_graph_builder","to":"ollama_service","relationType":"uses,depends_on"}
282
- ```
283
-
284
- ### GraphViz DOT (`dot`)
285
-
286
- Styled, colored graph (one node per entity, colored edges per relation type, legend, config summary). Render with `dot -Tsvg graph.dot -o graph.svg` (or `neato`/`fdp`/`sfdp`/`circo`/`twopi`). Styling is config-only under `export.dot:` — layout, `rankdir`, `colorScheme` (`default\|scientific\|code\|minimal`), clustering by type or file, etc.
287
-
288
- ### KBLaM triples (`kblam`)
289
-
290
- JSONL in the shape Microsoft [KBLaM](https://github.com/microsoft/KBLaM)'s `dataset_generation` ingests — **one `(entity, property, value)` per line**, each with the derived `Q`/`A`/`key_string` it encodes into a knowledge token. Property names are distinct per entity (relations contribute their predicate as the property), and keys are unique per `(name, property)` so rectangular-attention lookup is unambiguous.
291
-
292
- ```jsonl
293
- {"name":"Recursion","property":"definition","value":"a function that calls itself","Q":"What is the definition of Recursion?","A":"The definition of Recursion is a function that calls itself.","key_string":"the definition of Recursion"}
294
- {"name":"Recursion","property":"terminates_at","value":"BaseCase","Q":"What is the terminates_at of Recursion?","A":"The terminates_at of Recursion is BaseCase.","key_string":"the terminates_at of Recursion"}
295
- ```
296
-
297
- ### LoRA / SFT (`lora`)
298
-
299
- Chat-format instruction examples derived from the same triples, **quality-filtered**: observations whose grounding score is below `--grounding-min-score` are dropped, so only grounded facts become training data.
300
-
301
- ```jsonl
302
- {"messages":[{"role":"user","content":"What is the definition of Recursion?"},{"role":"assistant","content":"The definition of Recursion is a function that calls itself."}]}
303
- ```
304
-
305
- ### Graphiti (`graphiti`)
306
-
307
- `add_triplet`-shaped `{ nodes, edges }` for ingestion into a [Graphiti](https://github.com/getzep/graphiti) temporal graph — entities → nodes (summary from observations), relations → `UPPER_SNAKE` edges with stable uuids. Per-fact valid-time rides along in the `json`/`kblam` exports.
235
+ Per-format shapes + examples (KBLaM / LoRA / Graphiti / DOT): [`website/docs/guides/output-formats.md`](website/docs/guides/output-formats.md).
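
The removed KBLaM and LoRA examples show each record carrying a `Q`, `A`, and `key_string` derived from an `(entity, property, value)` triple. The sketch below reproduces that derivation from the sample lines only. The function and type names are hypothetical, and the real exporter may handle articles, escaping, or filtering differently.

```typescript
// Hypothetical sketch: derive KBLaM and LoRA records from a triple,
// following the templates visible in the removed README examples.
interface Triple {
  name: string;
  property: string;
  value: string;
}

interface KblamRecord extends Triple {
  Q: string;
  A: string;
  key_string: string;
}

function toKblam(t: Triple): KblamRecord {
  const key = `the ${t.property} of ${t.name}`;
  return {
    ...t,
    Q: `What is ${key}?`,
    A: `The ${t.property} of ${t.name} is ${t.value}.`,
    key_string: key,
  };
}

// A LoRA/SFT example reuses the same question and answer as a chat turn pair.
function toLoraExample(r: KblamRecord) {
  return {
    messages: [
      { role: "user", content: r.Q },
      { role: "assistant", content: r.A },
    ],
  };
}

// JSON Lines: one serialized record per line.
function toJsonl(records: object[]): string {
  return records.map((r) => JSON.stringify(r)).join("\n");
}
```

For the triple `("Recursion", "definition", "a function that calls itself")`, `toJsonl([toKblam(...)])` yields the same line as the first KBLaM sample in the removed section.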
308
236
 
309
237
  ## Local model guidance
310
238
 
@@ -320,15 +248,16 @@ Quality/speed trade-off for local selection. For measured numbers see the benchm
320
248
 
321
249
  Default embeddings: `nomic-embed-text`.
322
250
 
323
- The table above is qualitative guidance. For measured, comparative numbers (wanshi vs KGGen on gold-labeled datasets) see **[Benchmarks](#benchmarks)** below — note those run on **cloud** models; local-model benchmarks are planned.
251
+ The table above is qualitative guidance. For measured, comparative numbers (wanshi vs KGGen on gold-labeled datasets) see **[Benchmarks](#benchmarks)** below — both a cloud arm and a **local (M4 + L4) arm**.
324
252
 
325
253
  ## Benchmarks
326
254
 
327
- > **Scope & honesty (read first).** Every number here is **cloud inference via OpenRouter** —
328
- > **local-model (offline-first) benchmarks are planned and not yet run** (see [What's not yet
329
- > measured](#whats-not-yet-measured)). Comparative baselines are **re-scored under one identical
330
- > harness, not the published figures**. The document-level result rests on **one dataset** so far.
331
- > **MINE** is a recall-only, LLM-judge-mediated axis, reported as *context*, not a load-bearing claim.
255
+ > **Scope & honesty (read first).** Cloud numbers are **OpenRouter inference**; the
256
+ > **local (offline-first) arm is now measured too**; see [Local arm](#local-arm-offline-first).
257
+ > Comparative baselines are **re-scored under one identical harness**
258
+ > ([pre-registered methodology](docs/benchmark/SCORING.md)), not the published figures. The
259
+ > document-level result rests on **one dataset** so far. **MINE** is a recall-only, LLM-judge-mediated
260
+ > axis, reported as *context*, not a load-bearing claim.
332
261
 
333
262
  wanshi vs **KGGen** (its real Python package), **same model for both tools**, on gold-labeled datasets.
334
263
  The fair cross-tool metric is **entity-capture F1** (did the tool recover the gold entities) — both
@@ -399,13 +328,29 @@ npx ts-node scripts/gold-compare.ts --dataset redocred --limit 100 \
399
328
  # add --relation-vocab @data/redocred/compare/relation-vocab.txt for the schema-aware (H4) cell
400
329
  ```
401
330
 
331
+ ### Local arm (offline-first)
332
+
333
+ The deployment-target floor is now measured: wanshi vs KGGen on the **same local Ollama model**
334
+ (`gemma3:4b`, `qwen3:8b`), gold corpora, on a **16 GB M4 laptop** *and* a rented **L4 GPU**. The
335
+ precision-collapse holds at the 4B *local* tier — biored KGGen node-precision **0.26**, matching the
336
+ cloud's ~0.24 — so the precision-stability claim is **model-invariant across 4B→70B and three hardware
337
+ tiers**, not just cloud.
338
+
339
+ | `gemma3:4b` · biored | wanshi node-F1 | KGGen node-F1 | conformance | throughput |
340
+ | -------------------- | -------------- | ------------- | ----------- | ---------- |
341
+ | M4 (16 GB laptop) | 0.49 | 0.39 | 1.000 | ~25 tok/s |
342
+ | L4 (rented GPU) | 0.49 | 0.39 | 1.000 | ~63 tok/s |
343
+
344
+ **Quality is hardware-independent** — M4 and L4 node-F1 differ only by sampling noise, and JSON-conformance
345
+ is **1.000** on both dense models; the M4 delivers this at **~40% of the rented L4's throughput** (~25 vs ~63 tok/s). wanshi wins node-F1 in
346
+ **8/8 M4 cells and 11/12 L4 cells** (sole loss: redocred/qwen3:8b). *(qwen3:8b runs on 16 GB only
347
+ serialized; a full 8B comparison sweep isn't a realistic laptop workload.)*
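
The node-precision and node-F1 figures above are set-overlap scores between extracted and gold entities. A minimal sketch of how such a score is computed follows, assuming exact matching after trimming and lowercasing. `entityCaptureF1` is a hypothetical name, and the benchmark's actual matching rules live in its scoring methodology, which this does not reproduce.

```typescript
// Hypothetical sketch of an entity-capture precision/recall/F1 score.
interface PRF {
  precision: number;
  recall: number;
  f1: number;
}

function entityCaptureF1(predicted: string[], gold: string[]): PRF {
  // Assumed normalization: trim + lowercase, then de-duplicate.
  const norm = (s: string) => s.trim().toLowerCase();
  const unique = (xs: string[]) => xs.map(norm).filter((x, i, all) => all.indexOf(x) === i);

  const pred = unique(predicted);
  const goldSet = unique(gold);
  const hits = pred.filter((p) => goldSet.indexOf(p) !== -1).length;

  const precision = pred.length > 0 ? hits / pred.length : 0;
  const recall = goldSet.length > 0 ? hits / goldSet.length : 0;
  const f1 = precision + recall > 0 ? (2 * precision * recall) / (precision + recall) : 0;
  return { precision, recall, f1 };
}
```

A tool that emits many spurious entities keeps its recall but loses precision, which is the "precision collapse" pattern the paragraph above describes.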
348
+
402
349
  ### What's not yet measured
403
350
 
404
- - **Local-model (offline-first) benchmarks** — the deployment-target floor (`gemma3:4b`-class) is *owed*;
405
- every number above is cloud inference. This is the next benchmark priority. *(An earlier indicative
406
- n=20 single-domain run hinted small `gemma3:4b` ≈ larger Gemmas on entity extraction — to be confirmed
407
- in the local arm.)*
408
351
  - **A second document-level dataset** (SciERC / BioRED) to close the single-dataset caveat on claim (a).
352
+ - **A clean wanshi-alone cell + the redocred/qwen3:8b document cell** (the one local loss) on that second
353
+ corpus — to settle whether the doc-level arc weakens at 8B or it's noise.
409
354
 
410
355
  ## Quality metrics
411
356
 
@@ -415,18 +360,25 @@ Importable evaluators in `src/quality/` (also wired into `npm run benchmark`): *
415
360
 
416
361
  ```text
417
362
  src/
418
- ├── cli/ # Commander.js CLI (process/watch/export; --export-only)
363
+ ├── cli/ # Commander.js CLI (process/watch/export; --export-only)
364
+ ├── config/ # Single nested Zod schema — defaults, validation, `wanshi schema`
419
365
  ├── core/
420
- │ ├── di/ # Async DI container + service registrations
421
- │ ├── processor/ # File readers (transcript, JSON, PDF, Office, audio, …) + chunking + classifiers
422
- │ ├── checkpoint/# Per-chunk resume sidecar
423
- │ ├── llm/ # Ollama / OpenAI-compatible providers, embeddings, Handlebars prompts
424
- │ ├── knowledge/ # KG building (LLM+Zod, provenance + grounding gate), 3-level merge, vector search
425
- └── export/ # Strategy pattern: json, jsonl, mcp-jsonl, dot, kblam, lora, graphiti
426
- ├── quality/ # Importable metrics (structural, semantic, factual, consistency, composite)
427
- ├── evaluation/ # Benchmark harness (CrossRE / REBEL / RE-DocRED)
428
- ├── types/ # Interfaces and data models
429
- └── shared/ # Logger, graceful shutdown, utilities (Jaro-Winkler, cosine, config)
366
+ │ ├── di/ # Async DI container + service registrations
367
+ │ ├── processor/ # File readers (transcript, email, chat, PDF/OCR, audio, …) + chunking + classifiers + AST seed
368
+ │ ├── corpus/ # Corpus pre-pass: term frequency + LLM glossary (closed vocab)
369
+ │ ├── checkpoint/ # Per-chunk resume sidecar
370
+ │ ├── llm/ # Ollama / OpenAI-compatible providers, embeddings, Handlebars prompts
371
+ ├── knowledge/ # KG build (LLM+Zod, provenance + grounding gate), 3-level merge, canon, references, images, vector search
372
+ ├── adapters/ # Structured-emit adapters (SQLite graph fragments, no LLM)
373
+ ├── cv/ # Object-detection pre-pass (a signal for the model, not a verdict)
374
+ ├── cost/ # Token/cost metering + `--max-cost` cap
375
+ │ ├── trace/ # Debug run-trace sidecar (observability, off by default)
376
+ │ ├── pipeline/ # Post-merge transform stages
377
+ │ └── export/ # Strategy pattern: json, jsonl, mcp-jsonl, dot, kblam, lora, graphiti
378
+ ├── quality/ # Importable metrics (structural, semantic, factual, consistency, composite)
379
+ ├── evaluation/ # Benchmark harness (CrossRE / REBEL / RE-DocRED / SemEval-2010 T8 / MINE)
380
+ ├── types/ # Interfaces and data models
381
+ └── shared/ # Logger, graceful shutdown, utilities (Jaro-Winkler, cosine, config)
430
382
  ```
431
383
 
432
384
  Tests use Jest (`npm test`); mock the LLM via `ILLMProvider` for network-free unit tests.
@@ -439,7 +391,7 @@ npm start -- --config config.yaml # run directly (ts-
439
391
  npm run build && node ./dist/cli/index.js --config config.yaml # or build first
440
392
  ```
441
393
 
442
- See `examples/kg-mail-assistant/` for a full integration (Gmail OAuth + Telegram bot + continuous email→KG pipeline) and programmatic usage via `ContainerFactory`.
394
+ See [`examples/`](examples/) for integrations: `kg-telegram-sink` (Telegram graph bot with an A/B canon config) and the legacy `kg-mail-assistant` (Gmail OAuth + email→KG prototype, reference-only), plus programmatic usage via `ContainerFactory`.
443
395
 
444
396
  ## Acknowledgments
445
397
 
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@wanshi-kg/wanshi",
3
- "version": "0.2.0",
3
+ "version": "0.2.1",
4
4
  "description": "Local-first CLI that turns files, code, PDFs, audio and transcripts into a provenance-tracked knowledge graph — via local Ollama or any OpenAI-compatible LLM.",
5
5
  "keywords": [
6
6
  "knowledge-graph",