agent-knowledge 1.0.11 → 1.0.12

@@ -1,244 +1,244 @@
1
- # Architecture
2
-
3
- ## Overview
4
-
5
- ```mermaid
6
- graph TB
7
- Claude[Claude Code] -->|MCP stdio| Server[server.ts]
8
- Server --> Knowledge[Knowledge Module]
9
- Server --> Session[Session Module]
10
-
11
- Knowledge --> Store[store.ts — CRUD]
12
- Knowledge --> KSearch[search.ts — TF-IDF]
13
- Knowledge --> Git[git.ts — Sync]
14
-
15
- Store --> Vault[(~/claude-memory)]
16
- Git --> Remote[(Git Remote)]
17
-
18
- Session --> Parser[parser.ts — JSONL + Cache]
19
- Session --> SSearch[search.ts — TF-IDF Index]
20
- Session --> Scopes[scopes.ts — 6 Filters]
21
- Session --> Summary[summary.ts]
22
-
23
- Parser --> Transcripts[(~/.claude/projects/*.jsonl)]
24
-
25
- Server --> Dashboard[dashboard.ts — :3423]
26
- Dashboard --> HTTP[REST API]
27
- Dashboard --> WS[WebSocket]
28
- Dashboard --> Watcher[File Watcher]
29
- HTTP --> Browser[Browser UI]
30
- WS --> Browser
31
- ```
32
-
33
- ## File Structure
34
-
35
- ```
36
- src/
37
- index.ts Entry point — MCP stdio + dashboard auto-start
38
- server.ts 12 tool definitions, request routing, error handling
39
- dashboard.ts HTTP + WebSocket server, REST API, file watcher
40
- types.ts KnowledgeConfig interface, getConfig()
41
- knowledge/
42
- store.ts CRUD for markdown entries with YAML frontmatter
43
- search.ts TF-IDF search over knowledge entries
44
- git.ts git pull/push/sync with execSync + timeouts
45
- sessions/
46
- parser.ts JSONL parsing with mtime-based file cache
47
- search.ts TF-IDF ranked search with 60s global index cache
48
- scopes.ts 6 search scopes, post-filters cached index results
49
- summary.ts Topic extraction, tool/file detection
50
- search/
51
- tfidf.ts TF-IDF scoring engine (tokenizer, stopwords, index)
52
- fuzzy.ts Levenshtein distance, sliding window matching
53
- types.ts SearchResult, SearchOptions interfaces
54
- ui/
55
- index.html Dashboard SPA
56
- styles.css MD3 design tokens (light + dark)
57
- app.js Client-side JS (WebSocket, tabs, rendering)
58
- ```
59
-
60
- ## Knowledge Module
61
-
62
- ### store.ts
63
-
64
- CRUD for markdown files with YAML frontmatter:
65
-
66
- - **parseFrontmatter()** — splits on `---` delimiters, extracts title/tags/updated
67
- - **listEntries()** — recursively finds `.md` files, skips dot-directories, filters by category/tag
68
- - **readEntry()** — reads file with path traversal protection (`path.resolve` must start with base dir)
69
- - **writeEntry()** — validates category against allowed list, ensures directory exists, auto-adds `.md`
70
- - **deleteEntry()** — removes file with path traversal protection
71
-
72
- ### git.ts
73
-
74
- Wraps `execSync` for git operations with timeouts:
75
-
76
- - `gitPull()` — `git pull --rebase --quiet` (15s timeout)
77
- - `gitPush()` — `git add -A`, conditional commit (checks `git diff --cached --quiet`), push (5s/5s/15s)
78
- - `gitSync()` — pull then push, returns both results
79
-
80
- ### search.ts
81
-
82
- Builds a TF-IDF index from all knowledge entries, searches with ranking, falls back to regex for exact phrases.
83
-
84
- ## Session Module
85
-
86
- ### parser.ts — Mtime Cache
87
-
88
- Before parsing a JSONL file, checks `fs.statSync` for mtime. If unchanged since last parse, returns cached result. This avoids re-parsing large transcript files on every search.
89
-
90
- ```
91
- parseSessionFile(path)
92
- → statSync(path).mtimeMs
93
- → if mtime matches cache → return cached entries
94
- → else parse JSONL lines → cache with mtime → return
95
- ```
96
-
97
- ### search.ts — Global TF-IDF Index
98
-
99
- Maintains a single TF-IDF index across all sessions with a 60-second TTL:
100
-
101
- ```
102
- getOrBuildIndex(projects)
103
- → if cache exists AND age < 60s → return cached index
104
- → else scan all sessions → parse (using mtime cache) → index all messages → cache → return
105
- ```
106
-
107
- Role filtering happens post-search: the index includes all roles, and results are filtered after scoring.
108
-
109
- ### scopes.ts
110
-
111
- Uses the cached search index from `search.ts` (via `searchSessions`), then post-filters by scope patterns:
112
-
113
- | Scope | Filter |
114
- | ----------- | --------------------------------------------------------------- |
115
- | `errors` | Regex: Error, Exception, failed, crash, ENOENT, TypeError, etc. |
116
- | `plans` | Regex: plan, step, phase, strategy, TODO, architecture, etc. |
117
- | `configs` | Regex: config, .env, .json, tsconfig, docker, etc. |
118
- | `tools` | Role filter: tool_use, tool_result messages only |
119
- | `files` | Regex: src/, .ts, .js, created, modified, deleted, etc. |
120
- | `decisions` | Regex: decided, chose, because, tradeoff, opted for, etc. |
121
-
122
- ### summary.ts
123
-
124
- Extracts session summaries:
125
-
126
- - **Topics**: user messages filtered to exclude JSON/tool_result/base64/system-reminders
127
- - **Tools used**: tool names from tool_use entries
128
- - **Files modified**: file paths detected via regex in tool_result content
129
-
130
- ## Search Engine
131
-
132
- ### tfidf.ts
133
-
134
- Self-contained TF-IDF implementation:
135
-
136
- **Tokenization**: lowercase → split on `[^a-z0-9]+` → remove ~100 English stopwords
137
-
138
- **Scoring**:
139
-
140
- ```
141
- TF(t, d) = count(t in d) / total_terms(d)
142
- IDF(t) = log(1 + N / docs_containing(t))
143
- Score(q, d) = sum(TF(t, d) * IDF(t)) for each term t in query q
144
- ```
145
-
146
- The `1 +` in IDF ensures single-document results still get a positive score.
147
-
148
- ### fuzzy.ts
149
-
150
- Levenshtein edit distance with two-row DP (O(n\*m) time, O(m) space). Fuzzy matching uses a sliding window of varying size to find approximate substring matches.
151
-
152
- ## Dashboard
153
-
154
- ### dashboard.ts
155
-
156
- Single HTTP server handles both REST API and static files:
157
-
158
- - **Static serving**: resolves UI directory (checks `src/ui/` then `dist/ui/`), serves with MIME detection and CSP headers
159
- - **REST API**: routes for knowledge CRUD/search, session list/search/recall/get/summary, health
160
- - **WebSocket**: `ws` library with `noServer` mode, heartbeat every 30s, initial state snapshot on connect
161
- - **File watcher**: `fs.watch` on UI directory, debounced 200ms, broadcasts `{type: "reload"}` to all WS clients
162
-
163
- ### UI Architecture
164
-
165
- Vanilla JS SPA (no framework, no build step):
166
-
167
- - WebSocket connects on load, handles `state` and `reload` messages
168
- - 4 tabs with lazy data loading
169
- - `marked` + `DOMPurify` + `highlight.js` for markdown rendering
170
- - Theme persisted in `localStorage('agent-knowledge-theme')`
171
-
172
- ## Caching Strategy
173
-
174
- ```
175
- Search Request
176
-
177
-
178
- ┌─────────────────────┐
179
- │ TF-IDF index < 60s? │──Yes──► Search cached index (~40ms)
180
- └─────────────────────┘
181
- │ No
182
-
183
- ┌─────────────────────┐
184
- │ Scan session files │
185
- │ Check mtime cache │──► Parse only changed files
186
- └─────────────────────┘
187
-
188
-
189
- ┌─────────────────────┐
190
- │ Rebuild index │──► Cache with 60s TTL (~5s cold)
191
- └─────────────────────┘
192
-
193
-
194
- Search new index
195
- ```
196
-
197
- ## Data Flow
198
-
199
- ### Session Search
200
-
201
- ```mermaid
202
- sequenceDiagram
203
- participant C as Claude Code
204
- participant S as MCP Server
205
- participant I as TF-IDF Index
206
- participant P as Parser Cache
207
- participant F as File System
208
-
209
- C->>S: knowledge_search({ query })
210
- S->>I: search(query)
211
- alt Index expired
212
- I->>F: List .jsonl files
213
- loop Each file
214
- alt Mtime changed
215
- I->>P: parse(file)
216
- P->>F: Read JSONL
217
- P-->>I: Parsed entries
218
- else Mtime unchanged
219
- I->>P: getCached(file)
220
- P-->>I: Cached entries
221
- end
222
- end
223
- I->>I: Rebuild index
224
- end
225
- I-->>S: Ranked results
226
- S-->>C: SearchResult[]
227
- ```
228
-
229
- ### Knowledge Write
230
-
231
- ```mermaid
232
- sequenceDiagram
233
- participant C as Claude Code
234
- participant S as MCP Server
235
- participant G as Git
236
- participant F as File System
237
-
238
- C->>S: knowledge_write({ category, filename, content })
239
- S->>G: git pull --rebase
240
- S->>F: Write markdown file
241
- S->>G: git add -A && commit && push
242
- G-->>S: Push result
243
- S-->>C: { path, git status }
244
- ```
1
+ # Architecture
2
+
3
+ ## Overview
4
+
5
+ ```mermaid
6
+ graph TB
7
+ Claude[Claude Code] -->|MCP stdio| Server[server.ts]
8
+ Server --> Knowledge[Knowledge Module]
9
+ Server --> Session[Session Module]
10
+
11
+ Knowledge --> Store[store.ts — CRUD]
12
+ Knowledge --> KSearch[search.ts — TF-IDF]
13
+ Knowledge --> Git[git.ts — Sync]
14
+
15
+ Store --> Vault[(~/claude-memory)]
16
+ Git --> Remote[(Git Remote)]
17
+
18
+ Session --> Parser[parser.ts — JSONL + Cache]
19
+ Session --> SSearch[search.ts — TF-IDF Index]
20
+ Session --> Scopes[scopes.ts — 6 Filters]
21
+ Session --> Summary[summary.ts]
22
+
23
+ Parser --> Transcripts[(~/.claude/projects/*.jsonl)]
24
+
25
+ Server --> Dashboard[dashboard.ts — :3423]
26
+ Dashboard --> HTTP[REST API]
27
+ Dashboard --> WS[WebSocket]
28
+ Dashboard --> Watcher[File Watcher]
29
+ HTTP --> Browser[Browser UI]
30
+ WS --> Browser
31
+ ```
32
+
33
+ ## File Structure
34
+
35
+ ```
36
+ src/
37
+ index.ts Entry point — MCP stdio + dashboard auto-start
38
+ server.ts 12 tool definitions, request routing, error handling
39
+ dashboard.ts HTTP + WebSocket server, REST API, file watcher
40
+ types.ts KnowledgeConfig interface, getConfig()
41
+ knowledge/
42
+ store.ts CRUD for markdown entries with YAML frontmatter
43
+ search.ts TF-IDF search over knowledge entries
44
+ git.ts git pull/push/sync with execSync + timeouts
45
+ sessions/
46
+ parser.ts JSONL parsing with mtime-based file cache
47
+ search.ts TF-IDF ranked search with 60s global index cache
48
+ scopes.ts 6 search scopes, post-filters cached index results
49
+ summary.ts Topic extraction, tool/file detection
50
+ search/
51
+ tfidf.ts TF-IDF scoring engine (tokenizer, stopwords, index)
52
+ fuzzy.ts Levenshtein distance, sliding window matching
53
+ types.ts SearchResult, SearchOptions interfaces
54
+ ui/
55
+ index.html Dashboard SPA
56
+ styles.css MD3 design tokens (light + dark)
57
+ app.js Client-side JS (WebSocket, tabs, rendering)
58
+ ```
59
+
60
+ ## Knowledge Module
61
+
62
+ ### store.ts
63
+
64
+ CRUD for markdown files with YAML frontmatter:
65
+
66
+ - **parseFrontmatter()** — splits on `---` delimiters, extracts title/tags/updated
67
+ - **listEntries()** — recursively finds `.md` files, skips dot-directories, filters by category/tag
68
+ - **readEntry()** — reads file with path traversal protection (`path.resolve` must start with base dir)
69
+ - **writeEntry()** — validates category against allowed list, ensures directory exists, auto-adds `.md`
70
+ - **deleteEntry()** — removes file with path traversal protection
71
+
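To make the `parseFrontmatter()` bullet concrete, here is a minimal sketch of how a `---`-delimited header might be separated from the body. It is not the package's code: the `Frontmatter` shape, the single regular expression (used here instead of a literal split), and the inline-only `tags` handling are all assumptions made for illustration.

```typescript
// Hypothetical result shape; the real store.ts may track more fields.
interface Frontmatter {
  title?: string;
  tags: string[];
  updated?: string;
}

function parseFrontmatter(raw: string): { meta: Frontmatter; body: string } {
  const meta: Frontmatter = { tags: [] };
  // The header must open on the first line with `---` and close with `---`.
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?([\s\S]*)$/.exec(raw);
  if (!match) return { meta, body: raw };

  for (const line of (match[1] ?? "").split(/\r?\n/)) {
    const idx = line.indexOf(":");
    if (idx === -1) continue;
    const key = line.slice(0, idx).trim();
    const value = line.slice(idx + 1).trim();
    if (key === "title") meta.title = value;
    else if (key === "updated") meta.updated = value;
    else if (key === "tags") {
      // Assumption: only inline forms such as `tags: [a, b]` or `tags: a, b`.
      meta.tags = value
        .replace(/^\[|\]$/g, "")
        .split(",")
        .map((t) => t.trim())
        .filter((t) => t.length > 0);
    }
  }
  return { meta, body: match[2] ?? "" };
}
```

A file without a leading `---` line is returned unchanged as `body` with empty metadata, which keeps plain markdown notes readable by the same code path.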
72
+ ### git.ts
73
+
74
+ Wraps `execSync` for git operations with timeouts:
75
+
76
+ - `gitPull()` — `git pull --rebase --quiet` (15s timeout)
77
+ - `gitPush()` — `git add -A`, conditional commit (checks `git diff --cached --quiet`), push (5s/5s/15s)
78
+ - `gitSync()` — pull then push, returns both results
79
+
80
+ ### search.ts
81
+
82
+ Builds a TF-IDF index from all knowledge entries, searches with ranking, falls back to regex for exact phrases.
83
+
84
+ ## Session Module
85
+
86
+ ### parser.ts — Mtime Cache
87
+
88
+ Before parsing a JSONL file, checks `fs.statSync` for mtime. If unchanged since last parse, returns cached result. This avoids re-parsing large transcript files on every search.
89
+
90
+ ```
91
+ parseSessionFile(path)
92
+ → statSync(path).mtimeMs
93
+ → if mtime matches cache → return cached entries
94
+ → else parse JSONL lines → cache with mtime → return
95
+ ```
96
+
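The flow above could be sketched roughly as follows. The `FileSource` interface and the `SessionParser` class are hypothetical names introduced so the example runs without Node typings; in practice they would presumably wrap `fs.statSync(path).mtimeMs` and `fs.readFileSync(path, "utf8")`. Skipping malformed lines is also an assumption.

```typescript
// Minimal file-system surface, injected so the sketch is self-contained.
interface FileSource {
  mtimeMs(path: string): number;
  read(path: string): string;
}

interface CachedParse {
  mtimeMs: number;
  entries: unknown[];
}

class SessionParser {
  private cache = new Map<string, CachedParse>();
  private source: FileSource;

  constructor(source: FileSource) {
    this.source = source;
  }

  parseSessionFile(path: string): unknown[] {
    const mtimeMs = this.source.mtimeMs(path);
    const hit = this.cache.get(path);
    // Unchanged mtime: reuse the previous parse without touching the file.
    if (hit && hit.mtimeMs === mtimeMs) return hit.entries;

    const entries: unknown[] = [];
    for (const line of this.source.read(path).split("\n")) {
      if (line.trim() === "") continue;
      try {
        entries.push(JSON.parse(line));
      } catch {
        // Assumption: malformed JSONL lines are skipped rather than fatal.
      }
    }
    this.cache.set(path, { mtimeMs, entries });
    return entries;
  }
}
```

Keying the cache on the path and comparing the stored mtime means one cheap stat call decides whether the full read and parse can be skipped.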
97
+ ### search.ts — Global TF-IDF Index
98
+
99
+ Maintains a single TF-IDF index across all sessions with a 60-second TTL:
100
+
101
+ ```
102
+ getOrBuildIndex(projects)
103
+ → if cache exists AND age < 60s → return cached index
104
+ → else scan all sessions → parse (using mtime cache) → index all messages → cache → return
105
+ ```
106
+
107
+ Role filtering happens post-search: the index includes all roles, and results are filtered after scoring.
108
+
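As an illustration of the 60-second TTL described above, the following sketch caches one built value and rebuilds it once it is too old. `TtlCache` and its injectable clock are hypothetical; the real `getOrBuildIndex` may be structured differently.

```typescript
class TtlCache<T> {
  private value: T | undefined;
  private builtAt = 0;
  private hasValue = false;
  private readonly ttlMs: number;
  private readonly now: () => number;

  // The clock is injectable so expiry can be exercised without waiting.
  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  getOrBuild(build: () => T): T {
    const t = this.now();
    // Fresh enough: return the cached value and skip the expensive build.
    if (this.hasValue && t - this.builtAt < this.ttlMs) {
      return this.value as T;
    }
    const built = build();
    this.value = built;
    this.builtAt = t;
    this.hasValue = true;
    return built;
  }
}
```

Used with `new TtlCache<Index>(60000)`, where `Index` stands for whatever index type is in use, a burst of searches inside one minute would share a single build. The cost is that sessions written in that window are not searchable until the next rebuild.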
109
+ ### scopes.ts
110
+
111
+ Uses the cached search index from `search.ts` (via `searchSessions`), then post-filters by scope patterns:
112
+
113
+ | Scope | Filter |
114
+ | ----------- | --------------------------------------------------------------- |
115
+ | `errors` | Regex: Error, Exception, failed, crash, ENOENT, TypeError, etc. |
116
+ | `plans` | Regex: plan, step, phase, strategy, TODO, architecture, etc. |
117
+ | `configs` | Regex: config, .env, .json, tsconfig, docker, etc. |
118
+ | `tools` | Role filter: tool_use, tool_result messages only |
119
+ | `files` | Regex: src/, .ts, .js, created, modified, deleted, etc. |
120
+ | `decisions` | Regex: decided, chose, because, tradeoff, opted for, etc. |
121
+
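A rough sketch of the post-filtering step in the table: results that were already scored are narrowed by scope afterwards. The `Hit` shape, the `filterByScope` name, and the abbreviated, case-insensitive patterns are assumptions assembled from the terms listed above, not the package's actual regexes.

```typescript
type Scope = "errors" | "plans" | "configs" | "tools" | "files" | "decisions";

// Hypothetical result shape for an already-scored search hit.
interface Hit {
  role: string;
  text: string;
  score: number;
}

// Abbreviated patterns; the real lists are longer (the table ends in "etc.").
const SCOPE_PATTERNS: Record<Exclude<Scope, "tools">, RegExp> = {
  errors: /Error|Exception|failed|crash|ENOENT|TypeError/i,
  plans: /plan|step|phase|strategy|TODO|architecture/i,
  configs: /config|\.env|\.json|tsconfig|docker/i,
  files: /src\/|\.ts|\.js|created|modified|deleted/i,
  decisions: /decided|chose|because|tradeoff|opted for/i,
};

function filterByScope(hits: Hit[], scope: Scope): Hit[] {
  if (scope === "tools") {
    // The one scope that filters on message role instead of text.
    return hits.filter((h) => h.role === "tool_use" || h.role === "tool_result");
  }
  const pattern = SCOPE_PATTERNS[scope];
  return hits.filter((h) => pattern.test(h.text));
}
```

Because filtering runs on ranked results rather than on the index, every scope can reuse the same cached index, and the relative order of the surviving hits is preserved.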
122
+ ### summary.ts
123
+
124
+ Extracts session summaries:
125
+
126
+ - **Topics**: user messages filtered to exclude JSON/tool_result/base64/system-reminders
127
+ - **Tools used**: tool names from tool_use entries
128
+ - **Files modified**: file paths detected via regex in tool_result content
129
+
130
+ ## Search Engine
131
+
132
+ ### tfidf.ts
133
+
134
+ Self-contained TF-IDF implementation:
135
+
136
+ **Tokenization**: lowercase → split on `[^a-z0-9]+` → remove ~100 English stopwords
137
+
138
+ **Scoring**:
139
+
140
+ ```
141
+ TF(t, d) = count(t in d) / total_terms(d)
142
+ IDF(t) = log(1 + N / docs_containing(t))
143
+ Score(q, d) = sum(TF(t, d) * IDF(t)) for each term t in query q
144
+ ```
145
+
146
+ The `1 +` in IDF ensures single-document results still get a positive score.
147
+
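The formulas above can be sketched as a small scoring function. This is an illustration under stated assumptions, not the contents of `tfidf.ts`: the stopword list is cut down to a handful of words, and `scoreDocuments` is a hypothetical name that scores every document directly instead of maintaining an index.

```typescript
// Abbreviated stopword list; the document mentions roughly 100 entries.
const STOPWORDS = new Set(["the", "a", "an", "and", "or", "of", "to", "in", "is"]);

function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((t) => t.length > 0 && !STOPWORDS.has(t));
}

function scoreDocuments(query: string, docs: string[]): number[] {
  const tokenized = docs.map(tokenize);
  const n = docs.length;

  // docs_containing(t): number of documents in which each term appears.
  const df = new Map<string, number>();
  for (const tokens of tokenized) {
    Array.from(new Set(tokens)).forEach((term) => {
      df.set(term, (df.get(term) ?? 0) + 1);
    });
  }

  const queryTerms = tokenize(query);
  return tokenized.map((tokens) => {
    if (tokens.length === 0) return 0;
    let score = 0;
    for (const term of queryTerms) {
      const count = tokens.filter((t) => t === term).length;
      const containing = df.get(term) ?? 0;
      if (count === 0 || containing === 0) continue;
      // TF(t, d) * IDF(t), with IDF(t) = log(1 + N / docs_containing(t)).
      score += (count / tokens.length) * Math.log(1 + n / containing);
    }
    return score;
  });
}
```

With a single document that contains the query term, IDF is `log(1 + 1/1) = log 2`, so the score stays positive. Without the `1 +` it would be `log 1 = 0`.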
148
+ ### fuzzy.ts
149
+
150
+ Levenshtein edit distance with two-row DP (O(n\*m) time, O(m) space). Fuzzy matching uses a sliding window of varying size to find approximate substring matches.
151
+
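For reference, a two-row Levenshtein implementation and a sliding-window substring check might look like the sketch below. `fuzzyContains` and its `maxDistance` parameter are hypothetical; how `fuzzy.ts` chooses window sizes and thresholds is not stated in this document.

```typescript
function levenshtein(a: string, b: string): number {
  // prev holds the previous DP row, so only two rows are alive at a time.
  let prev: number[] = [];
  for (let j = 0; j <= b.length; j++) prev.push(j);

  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a.charAt(i - 1) === b.charAt(j - 1) ? 0 : 1;
      curr.push(
        Math.min(
          prev[j] + 1, // deletion
          curr[j - 1] + 1, // insertion
          prev[j - 1] + cost // substitution or match
        )
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Slides windows of several lengths over the text and accepts the first
// window within maxDistance edits of the pattern.
function fuzzyContains(text: string, pattern: string, maxDistance: number): boolean {
  if (pattern.length === 0) return true;
  const minLen = Math.max(1, pattern.length - maxDistance);
  const maxLen = pattern.length + maxDistance;
  for (let len = minLen; len <= maxLen; len++) {
    for (let start = 0; start + len <= text.length; start++) {
      if (levenshtein(text.slice(start, start + len), pattern) <= maxDistance) {
        return true;
      }
    }
  }
  return false;
}
```

Window lengths range over `pattern.length ± maxDistance` because an approximate match can be shorter or longer than the pattern by at most that many insertions or deletions.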
152
+ ## Dashboard
153
+
154
+ ### dashboard.ts
155
+
156
+ Single HTTP server handles both REST API and static files:
157
+
158
+ - **Static serving**: resolves UI directory (checks `src/ui/` then `dist/ui/`), serves with MIME detection and CSP headers
159
+ - **REST API**: routes for knowledge CRUD/search, session list/search/recall/get/summary, health
160
+ - **WebSocket**: `ws` library with `noServer` mode, heartbeat every 30s, initial state snapshot on connect
161
+ - **File watcher**: `fs.watch` on UI directory, debounced 200ms, broadcasts `{type: "reload"}` to all WS clients
162
+
163
+ ### UI Architecture
164
+
165
+ Vanilla JS SPA (no framework, no build step):
166
+
167
+ - WebSocket connects on load, handles `state` and `reload` messages
168
+ - 4 tabs with lazy data loading
169
+ - `marked` + `DOMPurify` + `highlight.js` for markdown rendering
170
+ - Theme persisted in `localStorage('agent-knowledge-theme')`
171
+
172
+ ## Caching Strategy
173
+
174
+ ```
175
+ Search Request
176
+
177
+
178
+ ┌─────────────────────┐
179
+ │ TF-IDF index < 60s? │──Yes──► Search cached index (~40ms)
180
+ └─────────────────────┘
181
+ │ No
182
+
183
+ ┌─────────────────────┐
184
+ │ Scan session files │
185
+ │ Check mtime cache │──► Parse only changed files
186
+ └─────────────────────┘
187
+
188
+
189
+ ┌─────────────────────┐
190
+ │ Rebuild index │──► Cache with 60s TTL (~5s cold)
191
+ └─────────────────────┘
192
+
193
+
194
+ Search new index
195
+ ```
196
+
197
+ ## Data Flow
198
+
199
+ ### Session Search
200
+
201
+ ```mermaid
202
+ sequenceDiagram
203
+ participant C as Claude Code
204
+ participant S as MCP Server
205
+ participant I as TF-IDF Index
206
+ participant P as Parser Cache
207
+ participant F as File System
208
+
209
+ C->>S: knowledge_search({ query })
210
+ S->>I: search(query)
211
+ alt Index expired
212
+ I->>F: List .jsonl files
213
+ loop Each file
214
+ alt Mtime changed
215
+ I->>P: parse(file)
216
+ P->>F: Read JSONL
217
+ P-->>I: Parsed entries
218
+ else Mtime unchanged
219
+ I->>P: getCached(file)
220
+ P-->>I: Cached entries
221
+ end
222
+ end
223
+ I->>I: Rebuild index
224
+ end
225
+ I-->>S: Ranked results
226
+ S-->>C: SearchResult[]
227
+ ```
228
+
229
+ ### Knowledge Write
230
+
231
+ ```mermaid
232
+ sequenceDiagram
233
+ participant C as Claude Code
234
+ participant S as MCP Server
235
+ participant G as Git
236
+ participant F as File System
237
+
238
+ C->>S: knowledge_write({ category, filename, content })
239
+ S->>G: git pull --rebase
240
+ S->>F: Write markdown file
241
+ S->>G: git add -A && commit && push
242
+ G-->>S: Push result
243
+ S-->>C: { path, git status }
244
+ ```