llm-kb 0.4.0 → 0.4.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (58)
  1. package/README.md +183 -42
  2. package/bin/anthropic-5TIU2EED.js +5515 -0
  3. package/bin/azure-openai-responses-ZVUVMK3G.js +190 -0
  4. package/bin/chunk-2WV6TQRI.js +4792 -0
  5. package/bin/chunk-3YMNGUZZ.js +262 -0
  6. package/bin/chunk-5PYKQQLA.js +14295 -0
  7. package/bin/chunk-65KFH7OI.js +31 -0
  8. package/bin/chunk-DHOXVEIR.js +7261 -0
  9. package/bin/chunk-EAQYK3U2.js +41 -0
  10. package/bin/chunk-IFS3OKBN.js +428 -0
  11. package/bin/chunk-LDHOKBJA.js +86 -0
  12. package/bin/chunk-SLYBG6ZQ.js +32681 -0
  13. package/bin/chunk-UEODFF7H.js +17 -0
  14. package/bin/chunk-XCXTZJGO.js +174 -0
  15. package/bin/chunk-XFV534WU.js +7056 -0
  16. package/bin/cli.js +30 -4
  17. package/bin/dist-3YH7P2QF.js +1244 -0
  18. package/bin/google-JFC43EFJ.js +371 -0
  19. package/bin/google-gemini-cli-K4XNMYDI.js +712 -0
  20. package/bin/google-vertex-Y42F254G.js +414 -0
  21. package/bin/indexer-KSYRIVVN.js +10 -0
  22. package/bin/mistral-ZU2JS5XZ.js +38406 -0
  23. package/bin/multipart-parser-CO464TZY.js +371 -0
  24. package/bin/openai-codex-responses-NW2LELBH.js +712 -0
  25. package/bin/openai-completions-TW3VKTHO.js +662 -0
  26. package/bin/openai-responses-VGL522MK.js +198 -0
  27. package/bin/src-Y22OHE3S.js +1408 -0
  28. package/package.json +6 -1
  29. package/PHASE2_SPEC.md +0 -274
  30. package/PHASE3_SPEC.md +0 -245
  31. package/PHASE4_SPEC.md +0 -358
  32. package/SPEC.md +0 -275
  33. package/plan.md +0 -300
  34. package/src/auth.ts +0 -55
  35. package/src/cli.ts +0 -257
  36. package/src/config.ts +0 -61
  37. package/src/eval.ts +0 -548
  38. package/src/indexer.ts +0 -152
  39. package/src/md-stream.ts +0 -133
  40. package/src/pdf.ts +0 -119
  41. package/src/query.ts +0 -408
  42. package/src/resolve-kb.ts +0 -19
  43. package/src/scan.ts +0 -59
  44. package/src/session-store.ts +0 -22
  45. package/src/session-watcher.ts +0 -89
  46. package/src/trace-builder.ts +0 -168
  47. package/src/tui-display.ts +0 -281
  48. package/src/utils.ts +0 -17
  49. package/src/watcher.ts +0 -87
  50. package/src/wiki-updater.ts +0 -136
  51. package/test/auth.test.ts +0 -65
  52. package/test/config.test.ts +0 -96
  53. package/test/md-stream.test.ts +0 -98
  54. package/test/resolve-kb.test.ts +0 -33
  55. package/test/scan.test.ts +0 -65
  56. package/test/trace-builder.test.ts +0 -215
  57. package/tsconfig.json +0 -14
  58. package/vitest.config.ts +0 -8
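The per-file `+N -N` counts in the list above can be recomputed from any unified diff. A minimal TypeScript sketch — the `diffStats` helper and the sample input are illustrative, not part of llm-kb or the registry tooling:

```typescript
// Count added/removed lines per file in a unified diff.
// Illustrative helper, not part of llm-kb.
function diffStats(diff: string): Map<string, { added: number; removed: number }> {
  const stats = new Map<string, { added: number; removed: number }>();
  let current: { added: number; removed: number } | null = null;
  for (const line of diff.split("\n")) {
    const header = line.match(/^\+\+\+ b\/(.+)$/);
    if (header) {
      // A "+++ b/<path>" header starts a new file section.
      current = { added: 0, removed: 0 };
      stats.set(header[1], current);
    } else if (current && line.startsWith("+") && !line.startsWith("+++")) {
      current.added++;
    } else if (current && line.startsWith("-") && !line.startsWith("---")) {
      current.removed++;
    }
  }
  return stats;
}

const sample = [
  "--- a/package/README.md",
  "+++ b/package/README.md",
  "@@ -42,1 +42,1 @@",
  "-llm-kb v0.3.0",
  "+llm-kb v0.4.1",
].join("\n");

console.log(diffStats(sample).get("package/README.md")); // { added: 1, removed: 1 }
```

The same tallying over the full diff below yields the `+183 -42` shown for `package/README.md`.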
package/README.md CHANGED
@@ -39,7 +39,7 @@ llm-kb run ./my-documents
  ```

  ```
- llm-kb v0.3.0
+ llm-kb v0.4.1

  Scanning ./my-documents...
  Found 9 files (9 PDF)
@@ -77,7 +77,7 @@ Revenue grew 12% QoQ driven by...
  3. **Indexes** — Haiku reads sources, writes `index.md` with summary table
  4. **Watches** — drop new files while running, they get parsed and indexed automatically
  5. **Chat** — interactive TUI with Pi-style markdown rendering, thinking display, tool call progress
- 6. **Learns** — every answer updates a knowledge wiki; repeated questions answered instantly from cache
+ 6. **Learns** — every answer updates a concept-organized wiki; repeated questions answered instantly from cache

  ### Continuous conversation

@@ -99,16 +99,49 @@ Sessions persist across restarts — run `llm-kb run` again and the conversation
  ### Query — single question from CLI

  ```bash
- # Auto-detects .llm-kb/ by walking up from cwd
  llm-kb query "compare Q3 vs Q4"
+ llm-kb query "summarize revenue data" --folder ./my-documents
+ llm-kb query "full analysis of lease terms" --save # research mode
+ ```

- # Explicit folder
- llm-kb query "summarize all revenue data" --folder ./my-documents
+ ### Eval — analyze and improve

- # Research mode — saves answer and re-indexes
- llm-kb query "full analysis of lease terms" --save
+ ```bash
+ llm-kb eval
+ llm-kb eval --last 10
+ ```
+
+ ```
+ llm-kb eval
+
+ Reading sessions...
+ Found 29 Q&A exchanges across sessions
+ Judging 1/29: "What are the 2023 new laws?"
+ ...
+ Judging 29/29: "How many files you have"
+
+ Results:
+ Queries analyzed: 29
+ Wiki hit rate: 66%
+ Wasted reads: 42
+ Issues: 22 errors 24 warnings
+ Wiki gaps: 28
+
+ Report: .llm-kb/wiki/outputs/eval-report.md
  ```

+ Eval reads your session files and uses Haiku as a judge to find:
+
+ | Check | What it catches |
+ |---|---|
+ | **Citation validity** | Agent claims "Clause 303" but source says "Clause 304" |
+ | **Contradictions** | Answer says "sedition retained" but source says "removed" |
+ | **Wiki gaps** | Topics asked 4 times but never cached in wiki |
+ | **Wasted reads** | Files read but never cited in the answer |
+ | **Performance** | Wiki hit rate, avg duration, most-read files |
+
+ The eval report includes actionable recommendations and updates `.llm-kb/guidelines.md` — learned rules the agent reads on-demand during queries. You can also add your own rules to this file (see [Guidelines](#guidelines) below).
+
  ### Status — KB overview

  ```bash
@@ -120,38 +153,113 @@ Knowledge Base Status
  Folder: /path/to/my-documents
  Sources: 12 parsed sources
  Index: 3 min ago
+ Articles: 15 compiled
  Outputs: 2 saved answers
  Models: claude-sonnet-4-6 (query) claude-haiku-4-5 (index)
  Auth: Pi SDK
  ```

- ## The Knowledge Wiki
+ ## The Three-Layer Architecture

- Every query makes the system smarter. After answering, `llm-kb` uses Haiku to update `.llm-kb/wiki/wiki.md` — a structured knowledge wiki organized by topic:
+ The system separates **how to behave**, **what to know**, and **what went wrong** into three files with distinct lifecycles:

- ```markdown
- ## Indian Evidence Act, 1872
+ ```
+ ┌──────────────────────────────────────────────────────────────┐
+ │ AGENTS.md (runtime — built by code, not on disk)             │
+ │ How to answer: source list, tool patterns, citation          │
+ │ rules. Points to guidelines.md for learned behaviour.        │
+ └──────────────────────────────┬───────────────────────────────┘
+                                │
+                ┌───────────────┴────────────────┐
+                ▼                                ▼
+ ┌─────────────────────────────┐  ┌─────────────────────────────┐
+ │ wiki.md                     │  │ guidelines.md               │
+ │ WHAT to know                │  │ HOW to behave better        │
+ │                             │  │                             │
+ │ Concept-organized knowledge │  │ Eval insights (auto)        │
+ │ synthesized from sources.   │  │ + your custom rules.        │
+ │ Updated after every query.  │  │ Read on-demand by agent.    │
+ └─────────────────────────────┘  └─────────────────────────────┘
+                ▲                                ▲
+                │ updated by wiki-updater        │ updated by eval
+                │                                │
+ ┌──────────────┴────────────────────────────────┴──────────────┐
+ │ llm-kb eval                                                  │
+ │ Reads sessions → judges quality → updates guidelines.md      │
+ │ + writes eval-report.md for humans                           │
+ └──────────────────────────────────────────────────────────────┘
+ ```

- ### Overview
- Foundational legislation covering 167 sections in 3 parts...
+ | Layer | File | Changes when | Written by |
+ |---|---|---|---|
+ | Architecture | AGENTS.md (runtime) | Code deploys | Developer |
+ | Behaviour | `guidelines.md` | After eval / by you | Eval + user |
+ | Knowledge | `wiki.md` | After every query | Wiki updater |

- ### Part I — Relevancy of Facts
- Admissions, confessions, dying declarations, expert opinions...
+ The agent sees AGENTS.md in its system prompt (lean, stable). It reads `guidelines.md` and `wiki.md` on-demand via tool calls — progressive disclosure, not context bloat.

- ### Electronic Records (Section 65B)
- Admissible with certificate from responsible official...
+ ## The Data Flywheel

- *Sources: Indian Evidence Act.md · 2026-04-06*
+ Every query makes the system faster. Every eval makes it smarter.

- ---
+ ```
+        ┌─────────────────┐
+        │    User asks    │
+        │   a question    │
+        └────────┬────────┘
+                 │
+                 ▼
+    ┌────────────────────────┐
+    │ Agent checks wiki.md   │
+    │ + reads guidelines.md  │ ◄── on-demand, not forced
+    │ + reads source files   │
+    └────────────┬───────────┘
+                 │
+                 ▼
+    ┌────────────────────────┐
+    │ Wiki updated           │ ◄── knowledge compounds
+    │ (concept-organized)    │
+    └────────────┬───────────┘
+                 │
+                 ▼
+    ┌────────────────────────┐
+    │ Next similar query     │
+    │ answered from wiki     │ ◄── 0 file reads, 2s instead of 25s
+    └────────────┬───────────┘
+                 │
+                 ▼
+    ┌────────────────────────┐
+    │ llm-kb eval            │ ◄── behaviour compounds
+    │ analyzes sessions      │     updates guidelines.md
+    │ improves behaviour     │     with learned rules
+    └────────────────────────┘
+ ```

- ## Bankers Books Evidence Act, 1891
+ **Proven results:**
+ - First query about a topic: ~25s, reads source files
+ - Same question again: ~2s, answered from wiki, 0 files read
+ - Wiki hit rate grows with usage: 0% → 66% after 29 queries

- ### Key Sections
- Section 4 (core): certified copy = prima facie evidence...
- ```
+ ## The Concept Wiki

- When you ask a question already covered by the wiki, the agent answers instantly — no source files read. New questions expand the wiki. The knowledge compounds.
+ The wiki organizes knowledge by **concepts**, not source files. A single wiki entry can synthesize information from multiple sources:
+
+ ```markdown
+ ## Mob Lynching
+ First-ever criminalisation in Indian law under BNS 2023, Clause 101(2).
+ Group of 5+ persons, discriminatory grounds, minimum 7 years to death.
+ IPC had no equivalent — prosecuted under general S.302.
+ See also: [[Murder and Homicide]], [[BNS 2023 Overview]]
+ *Sources: indian penal code - new.md (p.137), Annotated comparison (p.15) · 2026-04-06*
+
+ ---
+
+ ## Electronic Evidence
+ Section 65B requires certificate from responsible official.
+ BSA 2023 expands: emails, WhatsApp, GPS, cloud docs all admissible.
+ See also: [[Evidence Law Overview]]
+ *Sources: Indian Evidence Act.md, Comparison Chart.md · 2026-04-06*
+ ```

  ## Model Configuration

@@ -164,8 +272,12 @@ Auto-generated at `.llm-kb/config.json`:
  }
  ```

- - **Haiku** for indexing — cheap, fast, good enough for summaries
- - **Sonnet** for queries — strong reasoning for cited answers
+ | Task | Model | Why |
+ |---|---|---|
+ | Index | Haiku | Summarizing sources — cheap, fast |
+ | Wiki update | Haiku | Merging knowledge — cheap, fast |
+ | Eval judge | Haiku | Checking quality — cheap, fast |
+ | Query | Sonnet | Complex reasoning, citations — needs strength |

  Override with env vars:
  ```bash
@@ -175,31 +287,55 @@ LLM_KB_QUERY_MODEL=claude-sonnet-4-6 llm-kb query "question"

  ## Non-PDF Files

- PDFs are parsed at scan time. Other file types are read dynamically by the agent at query time using bash:
+ PDFs are parsed at scan time. Other file types are read dynamically by the agent using bash scripts:

  | File type | How it's read |
  |---|---|
  | `.pdf` | Pre-parsed to markdown + bounding boxes (LiteParse) |
- | `.docx` | Agent reads selectively via `adm-zip` (XML structure) |
- | `.xlsx` | Agent reads specific sheets/cells via `exceljs` |
- | `.pptx` | Agent extracts text via `officeparser` |
+ | `.docx` | Selective XML reading via `adm-zip` (structure first, then relevant sections) |
+ | `.xlsx` | Specific sheets/cells via `exceljs` |
+ | `.pptx` | Text extraction via `officeparser` |
  | `.md`, `.txt`, `.csv` | Read directly |

- For large `.docx` files, the agent reads the document structure first, then extracts only the sections relevant to your question — not the whole file.
+ For large files, the agent reads the structure first, then extracts only the sections relevant to the question — never dumps the entire file.

  ## OCR for Scanned PDFs

  Most PDFs have native text. For scanned PDFs:

  ```bash
- # Local Tesseract (built-in, slower)
- OCR_ENABLED=true llm-kb run ./docs
+ OCR_ENABLED=true llm-kb run ./docs # local Tesseract
+ OCR_SERVER_URL="http://localhost:8080/ocr?key=KEY" llm-kb run . # remote Azure OCR
+ ```
+
+ ## Guidelines
+
+ `guidelines.md` is the agent’s learned behaviour file. Eval writes the `## Eval Insights` section automatically. You can add your own rules below it — eval will never overwrite them.

- # Remote Azure OCR (faster, better quality)
- OCR_SERVER_URL="http://localhost:8080/ocr?key=KEY" llm-kb run ./docs
+ ```markdown
+ ## Eval Insights (auto-generated 2026-04-07)
+
+ ### Wiki Gaps — add to wiki when users ask about these topics
+ - Reserve requirements
+ - Engine types
+
+ ### Behaviour Fixes
+ - Double-check clause numbers against source text.
+
+ ### Performance
+ - Wiki hit rate: 82% (target: 80%+)
+ - Avg query time: 3.1s
+
+ ## My Rules
+
+ - Always use Hindi transliterations for legal terms
+ - Respond in bullet points for legal questions
+ - For aviation leases: always check both lessee and lessor obligations
  ```

- Native-text pages are always processed locally (free). Only scanned pages hit the OCR server.
+ The agent reads this file on-demand — not on every query. It consults guidelines when unsure about citation accuracy, file selection, or when a question touches a topic that had issues before. This keeps the system prompt lean while making learned behaviour available when it matters.
+
+ You can create `guidelines.md` manually before ever running eval. The agent will find it.

  ## What It Creates

@@ -208,32 +344,37 @@ Native-text pages are always processed locally (free). Only scanned pages hit th
  ├── (your files — untouched)
  └── .llm-kb/
      ├── config.json        ← model configuration
+     ├── guidelines.md      ← learned rules from eval + your custom rules
      ├── sessions/          ← conversation history (JSONL)
      ├── traces/            ← per-query traces (JSON)
      │   └── .processed     ← prevents re-processing on restart
      └── wiki/
          ├── index.md       ← source summary table
-         ├── wiki.md        ← knowledge wiki (grows over time)
+         ├── wiki.md        ← concept-organized knowledge wiki
          ├── queries.md     ← query log (newest first)
          ├── sources/       ← parsed markdown + bounding boxes
-         └── outputs/       ← saved research answers (--save)
+         └── outputs/
+             ├── eval-report.md   ← eval analysis report
+             └── ...              ← saved research answers (--save)
  ```

  Your original files are never modified. Delete `.llm-kb/` to start fresh.

  ## Display

- The interactive TUI (via `@mariozechner/pi-tui`) shows:
+ The interactive TUI (via `@mariozechner/pi-tui`) shows the Claude Web UI pattern:

  | Phase | What you see |
  |---|---|
  | Model | `⟡ claude-sonnet-4-6` |
  | Thinking | `▸ Thinking` + streamed reasoning (dim) |
  | Tool calls | `▸ Reading file.md` / `▸ Running bash` + code block |
- | Answer | Separator line → markdown rendered with tables, code, headers |
+ | Answer | Separator line → markdown with tables, code blocks, headers |
  | Done | `── 8.3s · 2 files read ──` |

- The `llm-kb query` command uses stdout mode — same phases, streams to terminal, works with pipes.
+ Phases can interleave: think → read files → answer → think again → read more → continue answer.
+
+ The `llm-kb query` command uses stdout mode — same phases, works with pipes and scripts.

  ## Development

@@ -244,7 +385,7 @@ npm install
  npm run build
  npm link

- npm test # 38 tests
+ npm test # 42 tests
  npm run test:watch # vitest watch mode

  llm-kb run ./test-folder
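
The model-selection behaviour the README diff describes (`config.json` defaults plus env-var overrides such as `LLM_KB_QUERY_MODEL`) reduces to a one-line precedence rule. A minimal TypeScript sketch — `resolveModel` is a hypothetical helper, not llm-kb source, and the `LLM_KB_INDEX_MODEL` name is an assumption extrapolated from the query variant shown in the README:

```typescript
// Env var beats config.json default; config.json is the fallback.
type Task = "query" | "index";

function resolveModel(
  task: Task,
  config: Record<Task, string>,
  env: Record<string, string | undefined>,
): string {
  // e.g. task "query" → env key "LLM_KB_QUERY_MODEL"
  const override = env[`LLM_KB_${task.toUpperCase()}_MODEL`];
  return override ?? config[task];
}

const config = { query: "claude-sonnet-4-6", index: "claude-haiku-4-5" };

console.log(resolveModel("query", config, {})); // claude-sonnet-4-6
console.log(resolveModel("query", config, { LLM_KB_QUERY_MODEL: "my-model" })); // my-model
```

Under this reading, `LLM_KB_QUERY_MODEL=claude-sonnet-4-6 llm-kb query "question"` from the README simply shadows the `query` entry of `.llm-kb/config.json` for that invocation.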