llm-kb 0.1.0 → 0.4.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/PHASE2_SPEC.md +274 -0
- package/PHASE3_SPEC.md +245 -0
- package/PHASE4_SPEC.md +358 -0
- package/README.md +190 -49
- package/bin/cli.js +5512 -176
- package/package.json +11 -6
- package/plan.md +253 -4
- package/src/auth.ts +55 -0
- package/src/cli.ts +182 -33
- package/src/config.ts +61 -0
- package/src/eval.ts +548 -0
- package/src/indexer.ts +36 -32
- package/src/md-stream.ts +133 -0
- package/src/query.ts +408 -0
- package/src/resolve-kb.ts +19 -0
- package/src/session-store.ts +22 -0
- package/src/session-watcher.ts +89 -0
- package/src/trace-builder.ts +168 -0
- package/src/tui-display.ts +281 -0
- package/src/utils.ts +17 -0
- package/src/watcher.ts +5 -2
- package/src/wiki-updater.ts +136 -0
- package/test/auth.test.ts +65 -0
- package/test/config.test.ts +96 -0
- package/test/md-stream.test.ts +98 -0
- package/test/resolve-kb.test.ts +33 -0
- package/test/scan.test.ts +65 -0
- package/test/trace-builder.test.ts +215 -0
- package/vitest.config.ts +8 -0
package/PHASE2_SPEC.md
ADDED

# llm-kb — Phase 2: Query Engine

> **Goal:** `llm-kb query "question" --folder ./research` works from the terminal.
> **Depends on:** Phase 1 (ingest pipeline — complete)
> **Blog:** Part 3 of the series

---

## What Success Looks Like

```bash
llm-kb query "what are the reserve requirements?" --folder ./research
```

```
Reading index... 12 sources
Selected: reserve-policy.md, q3-results.md, board-deck.md
Reading 3 files...

Reserve requirements are defined in two documents:

1. **Reserve Policy** (reserve-policy.md, p.3): Minimum reserve
   ratio of 12% of total assets, reviewed quarterly.

2. **Q3 Results** (q3-results.md, p.8): Current reserve ratio
   is 14.2%, above the 12% minimum. Management notes this
   provides a 2.2% buffer against regulatory changes.

Sources: reserve-policy.md (p.3), q3-results.md (p.8)
```

That's the shape: file selection visible, citations inline, synthesis across sources.

---

## Two Modes

### Query (read-only)

```bash
llm-kb query "what changed in Q4 guidance?" --folder ./research
```

The agent reads `index.md`, picks files, reads them, answers. **Cannot modify anything.** Tools: `createReadTool` only.

### Research (read + write)

```bash
llm-kb query "compare pipeline coverage to revenue target" --folder ./research --save
```

Same as query, but the answer is also saved to `.llm-kb/wiki/outputs/`. The watcher detects the new file and re-indexes. The next query can reference the analysis.

Tools: `createReadTool` + `createWriteTool` + `createBashTool`.

The `--save` flag switches from query mode to research mode.

---

## Architecture

Same pattern as the indexer — a Pi SDK session with different tools:

```typescript
export async function query(
  folder: string,
  question: string,
  options: { save?: boolean }
) {
  const sourcesDir = join(folder, ".llm-kb", "wiki", "sources");
  const outputsDir = join(folder, ".llm-kb", "wiki", "outputs");

  // Build AGENTS.md for query context
  const agentsContent = buildQueryAgents(sourcesDir, options.save);

  const loader = new DefaultResourceLoader({
    cwd: folder,
    agentsFilesOverride: (current) => ({
      agentsFiles: [
        ...current.agentsFiles,
        { path: ".llm-kb/AGENTS.md", content: agentsContent },
      ],
    }),
  });
  await loader.reload();

  const tools = [createReadTool(folder)];
  if (options.save) {
    tools.push(createWriteTool(folder), createBashTool(folder));
  }

  const { session } = await createAgentSession({
    cwd: folder,
    resourceLoader: loader,
    tools,
    sessionManager: SessionManager.inMemory(),
    settingsManager: SettingsManager.inMemory({
      compaction: { enabled: false },
    }),
  });

  // Stream output to terminal
  session.subscribe((event) => {
    if (
      event.type === "message_update" &&
      event.assistantMessageEvent.type === "text_delta"
    ) {
      process.stdout.write(event.assistantMessageEvent.delta);
    }
  });

  await session.prompt(question);
  session.dispose();
}
```

### The Query AGENTS.md

The injected `AGENTS.md` for query mode tells the agent:

```markdown
# llm-kb Knowledge Base — Query Mode

## How to answer questions

1. FIRST read .llm-kb/wiki/index.md to see all available sources
2. Based on the question, select the most relevant source files
3. Read those files in full (not just the first 500 chars)
4. Answer with inline citations: (filename, page/section)
5. If the answer requires cross-referencing, read additional files
6. Prefer primary sources over previous analyses in outputs/

## Available sources
(dynamically generated list of .md files in sources/)

## Available libraries for non-PDF files
- exceljs — for .xlsx/.xls
- mammoth — for .docx
- officeparser — for .pptx
Write a quick Node.js script via bash to read these when needed.

## Rules
- Always cite sources with filename and page number
- If you can't find the answer, say so — don't hallucinate
- Read the FULL file, not just the beginning
```

For research mode, add:

```markdown
## Research Mode
You can save your analysis to .llm-kb/wiki/outputs/.
Use a descriptive filename (e.g., coverage-analysis.md).
The file watcher will detect it and update the index.
```

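The Architecture snippet calls `buildQueryAgents`, which the spec never defines. A minimal sketch, assuming the prompt is assembled from the templates in this section (the template text is abbreviated here; the real version would carry the full instructions and the library list):

```typescript
import { readdirSync } from "node:fs";

// Sketch of buildQueryAgents, referenced in the Architecture snippet.
// Assembles the query-mode AGENTS.md, appending the research-mode section
// when save is enabled. Prompt text abbreviated — see the templates above.
export function buildQueryAgents(sourcesDir: string, save?: boolean): string {
  let sources: string[] = [];
  try {
    sources = readdirSync(sourcesDir).filter((f) => f.endsWith(".md"));
  } catch {
    // sources/ may not exist yet; fall through with an empty list
  }
  const lines = [
    "# llm-kb Knowledge Base — Query Mode",
    "",
    "## How to answer questions",
    "1. FIRST read .llm-kb/wiki/index.md to see all available sources",
    "2. Select the most relevant source files and read them in full",
    "3. Answer with inline citations: (filename, page/section)",
    "",
    "## Available sources",
    ...sources.map((f) => `- ${f}`),
  ];
  if (save) {
    lines.push("", "## Research Mode", "You can save your analysis to .llm-kb/wiki/outputs/.");
  }
  return lines.join("\n");
}
```

Regenerating this string on every query keeps the source list fresh without any extra bookkeeping.
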
---

## CLI Integration

Add `query` command to Commander:

```typescript
program
  .command("query")
  .description("Ask a question across your knowledge base")
  .argument("<question>", "Your question")
  .option("--folder <path>", "Path to document folder", ".")
  .option("--save", "Save the answer to wiki/outputs/ (research mode)")
  .action(async (question, options) => {
    const folder = resolve(options.folder);

    // Check if .llm-kb exists
    if (!existsSync(join(folder, ".llm-kb"))) {
      console.error(chalk.red("No knowledge base found. Run 'llm-kb run' first."));
      process.exit(1);
    }

    await query(folder, question, { save: options.save });
  });
```

---

## Trace Logging (Prep for Eval — Phase 4)

Every query gets logged to `.llm-kb/traces/`:

```json
{
  "timestamp": "2026-04-05T14:30:00Z",
  "question": "what are the reserve requirements?",
  "mode": "query",
  "filesRead": ["index.md", "reserve-policy.md", "q3-results.md"],
  "filesAvailable": ["reserve-policy.md", "q3-results.md", "board-deck.md", "pipeline.md"],
  "answer": "Reserve requirements are defined in two documents...",
  "citations": [
    { "file": "reserve-policy.md", "location": "p.3", "claim": "Minimum reserve ratio of 12%" },
    { "file": "q3-results.md", "location": "p.8", "claim": "Current reserve ratio is 14.2%" }
  ],
  "tokensUsed": 3800,
  "durationMs": 4200,
  "model": "claude-sonnet-4"
}
```

Implementation: wrap the session to intercept tool calls and capture which files were read. Save the trace JSON after the session completes.

The eval agent (Phase 4) reads these traces to check citations against sources.

---

## Streaming Output

Terminal query should stream — the user sees the answer appear word by word, not wait for the full response. The `session.subscribe()` handler writes deltas to stdout.

For the `run` command (when we add query to the web UI in Phase 3), streaming goes through the Vercel AI SDK protocol.

---

## Constraints

1. **Query must work without the web server running.** `llm-kb query` is standalone — it reads `.llm-kb/` directly. No dependency on `llm-kb run`.

2. **Read-only by default.** Query mode cannot modify files. Only `--save` enables write.

3. **Index must exist.** If `.llm-kb/wiki/index.md` doesn't exist, error out: "No knowledge base found. Run 'llm-kb run' first."

4. **Graceful on empty results.** If the agent can't find relevant files, it should say "I couldn't find sources relevant to this question" — not hallucinate.

5. **Token-conscious.** The agent reads index.md (~200 tokens for 50 sources) first, then only the files it selects (3-7 typically). Don't read all sources.

---

## Build Order (Slices)

| Slice | What | Demoable? |
|---|---|---|
| 1 | `query` command + read-only session + streaming | ✅ Ask questions, get answers |
| 2 | `--save` flag + research mode + write to outputs/ | ✅ Answers compound in wiki |
| 3 | Trace logging (JSON per query) | Prep for eval |
| 4 | `status` command (show KB stats) | ✅ Nice-to-have |

---

## Definition of Done

- [ ] `llm-kb query "question" --folder ./research` returns a cited answer
- [ ] Answer streams to terminal (word by word, not all at once)
- [ ] Agent reads index.md first, then selects and reads relevant source files
- [ ] `--save` flag saves the answer to `.llm-kb/wiki/outputs/`
- [ ] Saved answers get detected by watcher and re-indexed
- [ ] Query traces logged to `.llm-kb/traces/` as JSON
- [ ] Error if no `.llm-kb/` exists ("run 'llm-kb run' first")
- [ ] Non-PDF files (Excel, Word) readable by agent via bundled libraries
- [ ] Blog Part 3 written with real terminal output

---

## What This Enables

With query working, the demo becomes:

```bash
npx llm-kb run ./my-documents          # ingest
llm-kb query "what changed?"           # ask
llm-kb query "compare X vs Y" --save   # research (compounds)
```

Three commands. Ingest → Query → Research. That's a product, not a script.

---

*Phase 2 spec written April 4, 2026. DeltaXY.*

package/PHASE3_SPEC.md
ADDED

# llm-kb — Phase 3: Auth Fix + Eval Loop + LLM Config

> **Priority 1:** Auth fix — users are bouncing because Pi isn't configured
> **Priority 2:** Eval loop — the differentiator nobody else has
> **Priority 3:** LLM config — let users pick models
> **Blog:** Part 4 of the series (eval loop)

---

## 1. Auth Fix (URGENT)

Users run `npx llm-kb run` and hit a wall because Pi SDK isn't installed or configured. 117 people saved the LinkedIn post — they're coming back soon.

### The Flow

```
User runs `npx llm-kb run ./docs`
│
├─ Pi SDK auth exists (~/.pi/agent/auth.json)?
│    → Use it. Done.
│
├─ ANTHROPIC_API_KEY env var set?
│    → Configure Pi SDK programmatically. Done.
│
└─ Neither?
     → Show clear error:

       No LLM authentication found.

       Option 1: Install Pi SDK (recommended)
         npm install -g @mariozechner/pi-coding-agent
         pi

       Option 2: Set your Anthropic API key
         export ANTHROPIC_API_KEY=sk-ant-...
```

### Implementation

Check auth before creating any session. Add to `cli.ts` or a new `auth.ts`:

```typescript
import { existsSync } from "node:fs";
import { join } from "node:path";
import { homedir } from "node:os";

export function checkAuth(): { ok: boolean; method: string } {
  // Check Pi SDK auth
  const piAuthPath = join(homedir(), ".pi", "agent", "auth.json");
  if (existsSync(piAuthPath)) {
    return { ok: true, method: "pi-sdk" };
  }

  // Check ANTHROPIC_API_KEY
  if (process.env.ANTHROPIC_API_KEY) {
    return { ok: true, method: "api-key" };
  }

  return { ok: false, method: "none" };
}
```

If method is `"api-key"`, configure Pi SDK's settings programmatically so `createAgentSession` works with the env var.
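
The CLI wiring is left open; one way to sketch it, with `requireAuth` and `authErrorMessage` as hypothetical helper names and the message text mirroring the error from "The Flow":

```typescript
// Hypothetical helper: the error text from "The Flow" above.
export function authErrorMessage(): string {
  return [
    "No LLM authentication found.",
    "",
    "Option 1: Install Pi SDK (recommended)",
    "  npm install -g @mariozechner/pi-coding-agent",
    "  pi",
    "",
    "Option 2: Set your Anthropic API key",
    "  export ANTHROPIC_API_KEY=sk-ant-...",
  ].join("\n");
}

// Gate every command on checkAuth(); print the error and exit when it fails.
export function requireAuth(
  check: () => { ok: boolean; method: string },
): string {
  const { ok, method } = check();
  if (!ok) {
    console.error(authErrorMessage());
    process.exit(1);
  }
  return method; // "pi-sdk" or "api-key"
}
```

Taking the check as a parameter keeps the gate testable without touching the home directory or env vars.
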

### Definition of Done
- [ ] `ANTHROPIC_API_KEY=sk-... npx llm-kb run ./docs` works without Pi installed
- [ ] Pi SDK auth works as before (no regression)
- [ ] Clear error message when neither is available
- [ ] README updated with both auth options

---

## 2. LLM Configuration

### Config File

Auto-generated on first run at `.llm-kb/config.json`:

```json
{
  "indexModel": "claude-haiku-3-5",
  "queryModel": "claude-sonnet-4-20250514",
  "provider": "anthropic"
}
```

### Env Var Overrides

```bash
LLM_KB_INDEX_MODEL=claude-haiku-3-5 llm-kb run ./docs
LLM_KB_QUERY_MODEL=claude-sonnet-4-20250514 llm-kb query "question"
```

### Priority

```
1. Env var (LLM_KB_INDEX_MODEL, LLM_KB_QUERY_MODEL)
2. Config file (.llm-kb/config.json)
3. Defaults (Haiku for indexing, Sonnet for query)
```
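
The chain collapses into one pure function — a sketch, with `resolveModel` as a hypothetical name and defaults taken from the config example above:

```typescript
interface KbConfig {
  indexModel?: string;
  queryModel?: string;
}

// Resolve a model name using the priority above: env var > config file > default.
export function resolveModel(
  kind: "index" | "query",
  env: Record<string, string | undefined>,
  config: KbConfig,
): string {
  const fromEnv = env[kind === "index" ? "LLM_KB_INDEX_MODEL" : "LLM_KB_QUERY_MODEL"];
  const fromConfig = kind === "index" ? config.indexModel : config.queryModel;
  const fallback = kind === "index" ? "claude-haiku-3-5" : "claude-sonnet-4-20250514";
  return fromEnv ?? fromConfig ?? fallback;
}
```

Passing `env` in (rather than reading `process.env` directly) makes the priority order trivially testable.
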

### Why This Matters

- Haiku for indexing is 10x cheaper than Sonnet — users shouldn't pay Sonnet prices for one-line summaries
- Some users want GPT or local models — provider config enables that later
- Config file is portable — `.llm-kb/` travels with the documents

### Definition of Done
- [ ] `config.json` auto-generated on first run
- [ ] Index uses cheap model (Haiku), query uses strong model (Sonnet) by default
- [ ] Env vars override config file
- [ ] `llm-kb status` shows current model config

---

## 3. Eval Loop

### What Gets Traced

Every query logs a JSON file to `.llm-kb/traces/`:

```json
{
  "id": "2026-04-06T14-30-00-query",
  "timestamp": "2026-04-06T14:30:00Z",
  "question": "what are the reserve requirements?",
  "mode": "query",
  "filesRead": ["index.md", "reserve-policy.md", "q3-results.md"],
  "filesAvailable": ["reserve-policy.md", "q3-results.md", "board-deck.md", "pipeline.md"],
  "filesSkipped": ["board-deck.md", "pipeline.md"],
  "answer": "Reserve requirements are defined in two documents...",
  "citations": [
    { "file": "reserve-policy.md", "location": "p.3", "claim": "Minimum reserve ratio of 12%" },
    { "file": "q3-results.md", "location": "p.8", "claim": "Current reserve ratio is 14.2%" }
  ],
  "durationMs": 4200
}
```

### How to Capture Traces

Wrap the session to intercept tool calls:

```typescript
// Track which files the agent reads
const filesRead: string[] = [];

session.subscribe((event) => {
  if (event.type === "tool_use") {
    // Check if it's a read tool call on a source file
    const path = extractPathFromToolCall(event);
    if (path && !filesRead.includes(path)) {
      filesRead.push(path);
    }
  }
});
```
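
`extractPathFromToolCall` is not defined in the spec. A minimal version might look like this — the event payload shape here is an assumption about the Pi SDK's `tool_use` event, so adjust the field names to the real type:

```typescript
// Hypothetical event shape — the real Pi SDK tool_use payload may differ.
interface ToolUseEvent {
  type: string;
  toolName?: string;
  input?: { path?: string };
}

// Return the file path for read-tool calls, null for everything else.
export function extractPathFromToolCall(event: ToolUseEvent): string | null {
  if (event.type !== "tool_use" || event.toolName !== "read") return null;
  return event.input?.path ?? null;
}
```
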

After the session completes, write the trace JSON.
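
Persisting it is plain filesystem work. A sketch, with `writeTrace` as a hypothetical helper name; the id format follows the example trace above (ISO timestamp with colons replaced, plus the mode):

```typescript
import { mkdirSync, writeFileSync } from "node:fs";
import { join } from "node:path";

// Hypothetical helper: persist one query trace to .llm-kb/traces/<id>.json.
export function writeTrace(
  folder: string,
  trace: { question: string; mode: string; filesRead: string[]; durationMs: number },
): string {
  const tracesDir = join(folder, ".llm-kb", "traces");
  mkdirSync(tracesDir, { recursive: true });
  const timestamp = new Date().toISOString();
  // "2026-04-06T14:30:00" → "2026-04-06T14-30-00-query"
  const id = `${timestamp.slice(0, 19).replace(/:/g, "-")}-${trace.mode}`;
  const path = join(tracesDir, `${id}.json`);
  writeFileSync(path, JSON.stringify({ id, timestamp, ...trace }, null, 2) + "\n");
  return path;
}
```
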

### The Eval Command

```bash
llm-kb eval --folder ./research
llm-kb eval --folder ./research --last 20   # only check last 20 queries
```

The eval agent is a Pi SDK session (read-only) that:

1. Reads trace files from `.llm-kb/traces/`
2. For each trace, checks:
   - **Citation validity** — does the cited file contain the claimed text?
   - **Missing sources** — were any skipped files actually relevant?
   - **Answer consistency** — does the answer contradict the cited sources?
3. Writes report to `.llm-kb/wiki/outputs/eval-report.md`
4. Watcher detects the report, re-indexes
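
Step 1 and the `--last` flag are plain filesystem work. A sketch — trace ids sort chronologically because they start with an ISO-style timestamp:

```typescript
import { readdirSync, readFileSync } from "node:fs";
import { join } from "node:path";

// Pick the most recent `last` trace filenames (all when `last` is undefined).
export function selectTraceFiles(files: string[], last?: number): string[] {
  const sorted = files.filter((f) => f.endsWith(".json")).sort();
  return last ? sorted.slice(-last) : sorted;
}

export function loadTraces(tracesDir: string, last?: number): unknown[] {
  const names = selectTraceFiles(readdirSync(tracesDir), last);
  return names.map((f) => JSON.parse(readFileSync(join(tracesDir, f), "utf8")));
}
```
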

### The Eval AGENTS.md

```markdown
# llm-kb Knowledge Base — Eval Mode

## Your job
Read query traces from .llm-kb/traces/ and check answer quality.

## For each trace, check:
1. Citation validity — read the cited source file. Does it actually
   contain the claimed text at the claimed location?
2. Missing sources — read the index summary for each skipped file.
   Given the question, should any skipped file have been read?
3. Consistency — does the answer contradict anything in the
   cited sources?

## Output
Write .llm-kb/wiki/outputs/eval-report.md with:
- Summary: X traces checked, Y issues found
- Per-trace findings (only flag issues, skip clean traces)
- Recommendations (e.g., "update summary for file X")
```

### Status Command

```bash
llm-kb status --folder ./research
```

```
Knowledge Base: ./research/.llm-kb/
Sources: 12 files (8 PDF, 2 XLSX, 1 DOCX, 1 TXT)
Index: 12 entries, last updated 2 min ago
Outputs: 3 saved research answers
Traces: 47 queries logged
Model: claude-sonnet-4 (query), claude-haiku-3-5 (index)
Auth: Pi SDK
```

---

## Build Order (Slices)

| Slice | What | Urgency |
|---|---|---|
| 1 | Auth check + ANTHROPIC_API_KEY fallback | 🔴 NOW — users bouncing |
| 2 | Config file (model selection) | 🟡 This week |
| 3 | Trace logging (JSON per query) | 🟡 This week |
| 4 | `status` command | 🟢 Nice to have |
| 5 | `eval` command + eval session | 🟡 This week |
| 6 | Blog Part 4 (eval loop) | After code works |

---

## Definition of Done (Full Phase 3)

- [ ] `ANTHROPIC_API_KEY` works without Pi SDK installed
- [ ] Clear error when no auth found
- [ ] Config file with model selection (index vs query model)
- [ ] Every query logs a trace to `.llm-kb/traces/`
- [ ] `llm-kb eval` checks citations and writes report
- [ ] `llm-kb status` shows KB stats + config
- [ ] README updated with auth options + eval command
- [ ] Blog Part 4 written with real eval output

---

*Phase 3 spec written April 5, 2026. DeltaXY.*