@chiway/contextweaver 1.4.0 → 1.5.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (35)
  1. package/LICENSE +1 -1
  2. package/README.md +482 -196
  3. package/README.zh-CN.md +669 -0
  4. package/dist/{SearchService-OS7CYHNJ.js → SearchService-WVD6THR3.js} +116 -74
  5. package/dist/chunk-2EVCLNYN.js +223 -0
  6. package/dist/{chunk-ZOMGPIU6.js → chunk-3BNHQV5W.js} +1 -5
  7. package/dist/chunk-BFCIZ52F.js +102 -0
  8. package/dist/chunk-H4MGLXXF.js +115 -0
  9. package/dist/{lock-FL54LIQL.js → chunk-HHYPQA3X.js} +1 -1
  10. package/dist/chunk-IZ6IUHNN.js +77 -0
  11. package/dist/chunk-LB42CZEB.js +18 -0
  12. package/dist/chunk-MN6BQJDB.js +85 -0
  13. package/dist/{chunk-EMSMLPMK.js → chunk-ORYIVY7D.js} +10 -117
  14. package/dist/{chunk-RGJSXUFS.js → chunk-PPLFJGO3.js} +60 -0
  15. package/dist/chunk-TPM6YP43.js +38 -0
  16. package/dist/chunk-XFIM2T6S.js +57 -0
  17. package/dist/{chunk-AB24E3Z7.js → chunk-XMZZZKG7.js} +23 -79
  18. package/dist/chunk-XTWNT7KP.js +156 -0
  19. package/dist/chunk-YMQWNIQI.js +143 -0
  20. package/dist/{chunk-X7PAYQMT.js → chunk-YSQI5IRI.js} +125 -5
  21. package/dist/{codebaseRetrieval-3Z4CRA7X.js → codebaseRetrieval-4BFIM7PU.js} +5 -2
  22. package/dist/{db-PMVM7557.js → db-GBCLP4GG.js} +15 -1
  23. package/dist/findReferences-EBYR3VNL.js +16 -0
  24. package/dist/getSymbolDefinition-ZQK65FPN.js +17 -0
  25. package/dist/index.js +244 -41
  26. package/dist/listFiles-W7C5UYOP.js +14 -0
  27. package/dist/loadConfig-XTVT2OWW.js +9 -0
  28. package/dist/lock-HNKQ6X5B.js +8 -0
  29. package/dist/scanner-OVMAMQSQ.js +13 -0
  30. package/dist/server-ZIJIRVWH.js +347 -0
  31. package/dist/stats-AGKUCJQI.js +12 -0
  32. package/dist/{vectorStore-HPQZOVWF.js → vectorStore-4ODCERRO.js} +1 -1
  33. package/package.json +15 -23
  34. package/dist/scanner-2XGJWYHR.js +0 -11
  35. package/dist/server-XK6EINRV.js +0 -146
package/README.md CHANGED
@@ -1,130 +1,218 @@
1
1
  # ContextWeaver
2
2
 
3
3
  <p align="center">
4
- <strong>🧵 A codebase context engine carefully woven for AI agents</strong>
4
+ <strong>🧵 A codebase context engine woven for AI agents</strong>
5
5
  </p>
6
6
 
7
7
  <p align="center">
8
8
  <em>Semantic Code Retrieval for AI Agents — Hybrid Search • Graph Expansion • Token-Aware Packing</em>
9
9
  </p>
10
10
 
11
- ---
12
-
13
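- **ContextWeaver** is a semantic retrieval engine designed for AI coding assistants. It uses hybrid search (vector + lexical), intelligent context expansion, and a token-aware packing strategy to give LLMs precise, relevant, and context-complete code snippets.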
14
-
15
11
  <p align="center">
16
- <img src="docs/architecture.png" alt="ContextWeaver architecture overview" width="800" />
12
+ <strong>English</strong> ·
13
+ <a href="README.zh-CN.md">简体中文</a>
17
14
  </p>
18
15
 
19
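- ## ✨ Core Features
20
-
21
- ### 🔍 Hybrid Retrieval Engine
22
- - **Vector Retrieval**: deep understanding based on semantic similarity
23
- - **Lexical Retrieval (Lexical/FTS)**: exact matching of technical terms such as function and class names
24
- - **RRF Fusion (Reciprocal Rank Fusion)**: intelligently fuses results from multiple recall channels
25
-
26
- ### 🧠 AST Semantic Chunking
27
- - **Tree-sitter parsing**: supports six languages: TypeScript, JavaScript, Python, Go, Java, and Rust
28
- - **Dual-Text strategy**: `displayCode` for display, `vectorText` for embedding
29
- - **Gap-Aware merging**: intelligently handles gaps between code blocks while preserving semantic integrity
30
- - **Breadcrumb injection**: vector text includes the hierarchical path, improving retrieval recall
31
-
32
- ### 📊 Three-Stage Context Expansion
33
- - **E1 Neighbor expansion**: adjacent chunks before and after in the same file, ensuring code block completeness
34
- - **E2 Breadcrumb completion**: other methods under the same class/function, for understanding the overall structure
35
- - **E3 Import resolution**: cross-file dependency tracking (configurable toggle)
36
-
37
- ### 🎯 Smart Cutoff Strategy (Smart TopK)
38
- - **Anchor & Floor**: dynamic threshold + absolute floor as a double safeguard
39
- - **Delta Guard**: prevents misjudgment in Top1-outlier scenarios
40
- - **Safe Harbor**: the first N results only check the floor, guaranteeing baseline recall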
16
+ ---
41
17
 
42
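- ### 🔌 Native MCP Support
43
- - **MCP Server mode**: launch a Model Context Protocol server with one command
44
- - **Intent/term separation**: LLM-friendly API design
45
- - **Auto-indexing**: the first query triggers indexing automatically; incremental updates are transparent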
18
+ **ContextWeaver** is a semantic retrieval engine purpose-built for AI coding assistants. It combines hybrid search (vector + lexical), intelligent context expansion, and token-aware packing to deliver precise, relevant, and context-complete code snippets to LLMs.
46
19
 
47
- ## 📦 Quick Start
20
+ <p align="center">
21
+ <img src="docs/architecture.png" alt="ContextWeaver architecture overview" width="800" />
22
+ </p>
48
23
 
49
- ### Requirements
24
+ ## ✨ Core Features
25
+
26
+ ### 🔍 Hybrid Retrieval Engine
27
+ - **Vector Retrieval**: deep semantic understanding via similarity
28
+ - **Lexical Retrieval (FTS)**: exact matching for function names, class names, and other technical terms
29
+ - **RRF Fusion (Reciprocal Rank Fusion)**: intelligently merges multiple recall channels
30
+
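The RRF fusion step can be illustrated with a weighted Reciprocal Rank Fusion. This is a minimal sketch, assuming each channel returns chunk IDs ordered best-first and that `wVec`, `wLex`, and `rrfK0` (documented defaults 0.6, 0.4, and 20) weight and smooth the usual `1 / (k0 + rank)` term. `fuseRrf` is a hypothetical helper, not ContextWeaver's API.

```typescript
// Hypothetical sketch: weighted Reciprocal Rank Fusion over two ranked lists.
// Assumption: each list holds chunk IDs ordered best-first (index 0 = rank 1).
interface FusedHit {
  id: string;
  score: number;
}

function fuseRrf(
  vectorIds: string[],
  lexicalIds: string[],
  wVec = 0.6,
  wLex = 0.4,
  k0 = 20,
): FusedHit[] {
  const scores = new Map<string, number>();
  const addChannel = (ids: string[], weight: number): void => {
    ids.forEach((id, i) => {
      const rank = i + 1; // 1-based rank within this channel
      scores.set(id, (scores.get(id) ?? 0) + weight / (k0 + rank));
    });
  };
  addChannel(vectorIds, wVec);
  addChannel(lexicalIds, wLex);
  return Array.from(scores.entries())
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

const fused = fuseRrf(["a", "b", "c"], ["b", "d"]);
console.log(fused.map((h) => h.id).join(",")); // b,a,c,d
```

Because each channel adds its own term, a chunk surfaced by both vector and lexical recall (here `b`) rises above chunks that only one channel found.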
31
+ ### 🧠 AST Semantic Chunking
32
+ - **Tree-sitter parsing**: supports TypeScript, JavaScript, Python, Go, Java, Rust, C, C++, C#, and more
33
+ - **Dual-Text strategy**: `displayCode` for presentation, `vectorText` for embedding
34
+ - **Gap-Aware merging**: handles code gaps intelligently while preserving semantic integrity
35
+ - **Breadcrumb injection**: vector text carries hierarchical paths to boost recall
36
+ - **UTF-16 character-domain normalization**: offsets are unified via `SourceAdapter.toCharOffset` before writing metadata, preventing multi-byte character slicing errors (v1.4.0+)
37
+
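To illustrate why offsets need normalizing: a UTF-8 byte offset and a JavaScript string index diverge as soon as a multi-byte character appears. The sketch below assumes the parser reports UTF-8 byte offsets. `byteToCharOffset` is a hypothetical stand-in and does not reflect the real signature of `SourceAdapter.toCharOffset`.

```typescript
// Hypothetical sketch: convert a UTF-8 byte offset into a UTF-16 code-unit
// offset so String.prototype.slice() cuts at the intended character.
function byteToCharOffset(text: string, byteOffset: number): number {
  let bytes = 0;
  let i = 0;
  while (i < text.length && bytes < byteOffset) {
    const cp = text.codePointAt(i)!;
    // Number of bytes this code point occupies in UTF-8
    bytes += cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
    // Code points above U+FFFF take two UTF-16 code units (a surrogate pair)
    i += cp > 0xffff ? 2 : 1;
  }
  return i;
}

const src = "const 名 = '🧵';";
const byteIdx = new TextEncoder().encode(src).indexOf(0x3d); // 0x3d is '='
console.log(byteIdx, byteToCharOffset(src, byteIdx), src.indexOf("=")); // 10 8 8
```

Slicing `src` with the raw byte offset 10 would land two characters too far; the converted offset 8 matches the string index.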
38
+ ### 📊 Three-Stage Context Expansion
39
+ - **E1 Neighbor expansion**: adjacent chunks within the same file, preserving block completeness
40
+ - **E2 Breadcrumb completion**: sibling methods under the same class/function for structural understanding
41
+ - **E3 Import resolution**: cross-file dependency tracking (configurable toggle)
42
+
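E1 neighbor expansion can be sketched as a window of `neighborHops` chunks on each side of every seed, clipped to the file's bounds and de-duplicated. The sketch assumes chunks are addressed by `(filePath, chunkIndex)` and that each file's chunk count is known. `ChunkRef` and `expandNeighbors` are hypothetical names.

```typescript
// Hypothetical sketch of E1 neighbor expansion within a single file.
interface ChunkRef {
  filePath: string;
  chunkIndex: number;
}

function expandNeighbors(
  seeds: ChunkRef[],
  chunkCounts: Map<string, number>,
  neighborHops = 2,
): ChunkRef[] {
  const seen = new Set<string>();
  const out: ChunkRef[] = [];
  for (const seed of seeds) {
    const count = chunkCounts.get(seed.filePath) ?? 0;
    // Clip the window to [0, count - 1]
    const lo = Math.max(0, seed.chunkIndex - neighborHops);
    const hi = Math.min(count - 1, seed.chunkIndex + neighborHops);
    for (let i = lo; i <= hi; i++) {
      const key = `${seed.filePath}#${i}`;
      if (!seen.has(key)) {
        seen.add(key); // overlapping windows from nearby seeds add nothing twice
        out.push({ filePath: seed.filePath, chunkIndex: i });
      }
    }
  }
  return out;
}

const counts = new Map([["src/a.ts", 5]]);
const expanded = expandNeighbors([{ filePath: "src/a.ts", chunkIndex: 0 }], counts);
console.log(expanded.map((c) => c.chunkIndex).join(",")); // 0,1,2
```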
43
+ ### 🎯 Smart TopK Cutoff
44
+ - **Anchor & Floor**: dynamic threshold plus an absolute floor as dual safeguards
45
+ - **Delta Guard**: prevents misjudgment in Top1-outlier scenarios
46
+ - **Safe Harbor**: the first N results only check the floor, guaranteeing baseline recall
47
+
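One possible reading of these three safeguards, expressed as code. The following assumptions are not confirmed by this README. Scores arrive sorted descending. The anchor rule is `score >= top1 * smartTopScoreRatio`. The delta guard (`smartTopScoreDeltaAbs`) caps how far a result may fall below Top1. The defaults used are the documented ones (0.5, 0.25, 0.25, 2, 8).

```typescript
// Hypothetical sketch of a Smart TopK cutoff over rerank scores.
// Assumptions: scores are sorted descending; a result outside the Safe Harbor
// must reach top1 * ratio AND stay within deltaAbs of Top1.
interface SmartTopKOptions {
  ratio: number; // smartTopScoreRatio (anchor)
  deltaAbs: number; // smartTopScoreDeltaAbs (delta guard)
  minScore: number; // smartMinScore (absolute floor)
  minK: number; // smartMinK (Safe Harbor size)
  maxK: number; // smartMaxK (hard upper bound)
}

function smartTopK(scores: number[], o: SmartTopKOptions): number[] {
  const kept: number[] = [];
  if (scores.length === 0) return kept;
  const top1 = scores[0];
  for (let i = 0; i < scores.length && kept.length < o.maxK; i++) {
    const s = scores[i];
    if (s < o.minScore) break; // the floor applies to every position
    const inSafeHarbor = i < o.minK;
    const belowAnchor = s < top1 * o.ratio;
    const dropTooLarge = top1 - s > o.deltaAbs;
    if (!inSafeHarbor && (belowAnchor || dropTooLarge)) break;
    kept.push(s);
  }
  return kept;
}

const defaults: SmartTopKOptions = { ratio: 0.5, deltaAbs: 0.25, minScore: 0.25, minK: 2, maxK: 8 };
console.log(smartTopK([0.9, 0.8, 0.7, 0.6, 0.3], defaults).join(",")); // 0.9,0.8,0.7
console.log(smartTopK([0.9, 0.3, 0.28], defaults).join(",")); // 0.9,0.3
```

In the first call, 0.6 is cut because it falls more than 0.25 below Top1. In the second, 0.3 survives only because it sits inside the Safe Harbor.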
48
+ ### 🔌 Native MCP Support
49
+ - **MCP Server mode**: launch a Model Context Protocol server with one command
50
+ - **Multi-tool granularity** (v1.5.0+): beyond core semantic retrieval, adds dedicated tools for structure browsing, symbol references, symbol definitions, and statistics
51
+ - **Intent/term separation**: an LLM-friendly API design
52
+ - **Auto-indexing**: the first query triggers indexing automatically; incremental updates are transparent
53
+
54
+ ### ⚡ Query Cache & File Watching (v1.5.0+)
55
+ - **Query cache (QueryCache)**: in-process per-project LRU cache (50 entries by default); a hit skips the entire vector recall / rerank / expansion pipeline
56
+ - **Automatic cache invalidation**: the cache key is composed of `normalized query + projectId + index version + search-config fingerprint`, so it invalidates automatically after an index update or config change — stale results are never returned
57
+ - **Watch mode**: `contextweaver watch` watches the filesystem and triggers incremental indexing automatically, with debouncing (500ms by default) and scan de-duplication (no concurrent scans)
58
+
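A per-project LRU with a composite key can be sketched in a few lines, because a JavaScript `Map` iterates in insertion order. The key layout follows the invalidation bullet above. The normalization rule (trim, lowercase, collapse whitespace), the `\u0000` separator, and the class name `LruQueryCache` are assumptions for illustration.

```typescript
// Hypothetical sketch of a per-project LRU query cache. The composite key mirrors
// the README (normalized query + projectId + index version + config fingerprint);
// the normalization rule and this class shape are assumptions.
class LruQueryCache<V> {
  private readonly entries = new Map<string, V>();
  private readonly capacity: number;

  constructor(capacity = 50) {
    this.capacity = capacity;
  }

  static key(query: string, projectId: string, indexVersion: number, configFingerprint: string): string {
    // Assumed normalization: trim, lowercase, collapse internal whitespace
    const normalized = query.trim().toLowerCase().replace(/\s+/g, " ");
    return [normalized, projectId, String(indexVersion), configFingerprint].join("\u0000");
  }

  get(key: string): V | undefined {
    const value = this.entries.get(key);
    if (value !== undefined) {
      // Re-insert so this key becomes the most recently used
      this.entries.delete(key);
      this.entries.set(key, value);
    }
    return value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.capacity) {
      // Map iterates in insertion order, so the first key is the least recently used
      const oldest = this.entries.keys().next().value as string;
      this.entries.delete(oldest);
    }
  }
}

const cache = new LruQueryCache<string>();
cache.set(LruQueryCache.key("  Auth  flow ", "p1", 7, "cfg-a"), "result-1");
console.log(cache.get(LruQueryCache.key("auth flow", "p1", 7, "cfg-a"))); // result-1
// After re-indexing the version changes, so the old entry is simply never hit
console.log(cache.get(LruQueryCache.key("auth flow", "p1", 8, "cfg-a"))); // undefined
```

Bumping the index version produces a different key. Stale entries are therefore never hit and age out through normal LRU eviction, with no explicit invalidation pass.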
59
+ ### 📈 Statistics & Observability (v1.5.0+)
60
+ - **Three metric groups**: indexing process, search quality/behavior, health/consistency
61
+ - **Two entry points**: the `contextweaver stats` CLI (with `--json`) and the MCP `stats` tool
62
+ - **Consistency diagnostics**: automatically detects abnormal migration state, `pending_marks` backlog, missing vector rows, and more — with suggested fixes
63
+
64
+ ### 🛡️ Crash-Safe Data Architecture (v1.4.0+)
65
+ - **Single source of truth for content**: LanceDB stores only vectors and locating metadata; content is read back from `files.content`, reducing index size by 30–50%
66
+ - **Cross-store transactional compensation**: three-stage write LanceDB → FTS+outbox → SQLite mark, with automatic rollback or replay on any failure
67
+ - **Migration state machine**: `pending/done/aborted` persisted, auto-rebuilt on crash recovery
68
+ - **Cross-process mutual exclusion**: an advisory lock prevents the MCP server and CLI from triggering LanceDB migration concurrently
69
+ - **chunk_id de-duplication**: pre-delete before write to avoid duplicate rows on retry
70
+
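The three-stage write and its compensation paths can be modeled with in-memory stand-ins. This is a hypothetical sketch. `FakeStores`, `writeFile`, and `replayOutbox` are illustrative names, and the maps only imitate LanceDB and SQLite. In the real system, stages 2 and 3 would each be a single SQLite transaction.

```typescript
// Hypothetical sketch of a three-stage cross-store write with compensation.
// In-memory maps stand in for LanceDB and SQLite; all names are illustrative.
interface FileUpdate {
  path: string;
  hash: string;
  chunkIds: string[];
}

type WriteOutcome = "ok" | "rolled_back" | "pending_replay";

class FakeStores {
  vectors = new Map<string, string[]>(); // LanceDB stand-in: path -> chunk ids
  fts = new Map<string, string[]>(); // FTS stand-in
  outbox = new Set<string>(); // pending_marks stand-in: "path@hash"
  marks = new Map<string, string>(); // vector_index_hash mark per path
  failAt: "fts" | "mark" | null = null; // failure injection for the demo
}

function writeFile(db: FakeStores, u: FileUpdate): WriteOutcome {
  // Stage 1: vector write; overwriting by path stands in for pre-delete + add
  const previous = db.vectors.get(u.path);
  db.vectors.set(u.path, [...u.chunkIds]);

  // Stage 2: FTS + outbox as one transaction; on failure, undo stage 1
  try {
    if (db.failAt === "fts") throw new Error("FTS transaction failed");
    db.fts.set(u.path, [...u.chunkIds]);
    db.outbox.add(`${u.path}@${u.hash}`);
  } catch {
    if (previous !== undefined) db.vectors.set(u.path, previous);
    else db.vectors.delete(u.path);
    return "rolled_back";
  }

  // Stage 3: mark + clear outbox as one transaction; on failure, keep the outbox
  try {
    if (db.failAt === "mark") throw new Error("mark transaction failed");
    db.marks.set(u.path, u.hash);
    db.outbox.delete(`${u.path}@${u.hash}`);
    return "ok";
  } catch {
    return "pending_replay"; // the outbox entry survives for the next launch
  }
}

// Startup replay: finish marks for any outbox entries left behind.
function replayOutbox(db: FakeStores): void {
  for (const entry of Array.from(db.outbox)) {
    const at = entry.lastIndexOf("@");
    db.marks.set(entry.slice(0, at), entry.slice(at + 1));
    db.outbox.delete(entry);
  }
}

const db = new FakeStores();
db.failAt = "mark";
console.log(writeFile(db, { path: "src/a.ts", hash: "h1", chunkIds: ["c1"] })); // pending_replay
replayOutbox(db);
console.log(db.marks.get("src/a.ts"), db.outbox.size); // h1 0
```

The two failure paths are handled differently on purpose. A stage 2 failure is undone by deleting or restoring the vector rows. A stage 3 failure leaves the outbox entry in place so that startup replay can finish the mark.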
71
+ ## 📦 Quick Start
72
+
73
+ ### Requirements
50
74
 
51
75
  - Node.js >= 20
52
- - pnpm (recommended) or npm
76
+ - pnpm (recommended) or npm
53
77
 
54
- ### Installation
78
+ ### Installation
55
79
 
56
80
  ```bash
57
- # Global install
58
- npm install -g @hsingjui/contextweaver
81
+ # Global install
82
+ npm install -g @chiway/contextweaver
59
83
 
60
- # Or use pnpm
61
- pnpm add -g @hsingjui/contextweaver
84
+ # Or with pnpm
85
+ pnpm add -g @chiway/contextweaver
62
86
  ```
63
87
 
64
- ### Initialize Configuration
88
+ ### Initialize Configuration
65
89
 
66
90
  ```bash
67
- # Initialize the config file (creates ~/.contextweaver/.env)
91
+ # Create the config file (~/.contextweaver/.env)
68
92
  contextweaver init
69
- # Or the short form
93
+ # Or the short alias
70
94
  cw init
71
95
  ```
72
96
 
73
- Edit `~/.contextweaver/.env` and fill in your API key:
97
+ Edit `~/.contextweaver/.env` and fill in your API keys:
74
98
 
75
99
  ```bash
76
- # Embedding API config (required)
100
+ # Embedding API config (required)
77
101
  EMBEDDINGS_API_KEY=your-api-key-here
78
102
  EMBEDDINGS_BASE_URL=https://api.siliconflow.cn/v1/embeddings
79
103
  EMBEDDINGS_MODEL=BAAI/bge-m3
80
104
  EMBEDDINGS_MAX_CONCURRENCY=10
81
105
  EMBEDDINGS_DIMENSIONS=1024
82
106
 
83
- # Reranker config (required)
107
+ # Reranker config (required)
84
108
  RERANK_API_KEY=your-api-key-here
85
109
  RERANK_BASE_URL=https://api.siliconflow.cn/v1/rerank
86
110
  RERANK_MODEL=BAAI/bge-reranker-v2-m3
87
111
  RERANK_TOP_N=20
88
112
 
89
- # Ignore patterns (optional, comma-separated)
113
+ # Search parameters (optional, override built-in defaults)
114
+ CW_SEARCH_WVEC=0.6
115
+ CW_SEARCH_WLEX=0.4
116
+ CW_SEARCH_RERANK_TOP_N=10
117
+ CW_SEARCH_MAX_TOTAL_CHARS=48000
118
+ CW_SEARCH_VECTOR_TOP_K=80
119
+ CW_SEARCH_SMART_MAX_K=8
120
+ CW_SEARCH_IMPORT_FILES_PER_SEED=3
121
+
122
+ # Ignore patterns (optional, comma-separated)
90
123
  # IGNORE_PATTERNS=.venv,node_modules
91
124
  ```
92
125
 
93
- ### Index the Codebase
126
+ ### Index a Codebase
94
127
 
95
128
  ```bash
96
- # Run in the codebase root directory
129
+ # Run from the codebase root
97
130
  contextweaver index
98
131
 
99
- # Specify a path
132
+ # Specify a path
100
133
  contextweaver index /path/to/your/project
101
134
 
102
- # Force re-indexing
135
+ # Force a full re-index
103
136
  contextweaver index --force
104
137
  ```
105
138
 
106
- ### Local Search
139
+ ### Watch Mode (v1.5.0+)
140
+
141
+ ```bash
142
+ # Watch for file changes and auto-index incrementally (Ctrl+C to stop)
143
+ contextweaver watch
144
+
145
+ # Specify a path and debounce window (ms)
146
+ contextweaver watch /path/to/project --debounce 800
147
+ ```
148
+
149
+ `watch` runs one full incremental scan on startup, then listens to filesystem events; changes trigger a de-duplicated scan within the debounce window, and paths excluded by ignore rules never trigger a scan.
150
+
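Debouncing plus scan de-duplication can be sketched with one timer and a "rerun requested" flag. Each event resets the timer. An event that fires while a scan is running queues exactly one follow-up scan instead of starting a concurrent one. `ScanCoordinator`, its constructor parameters, and the ignore predicate are hypothetical, and ContextWeaver's actual watcher may be structured differently.

```typescript
// Hypothetical sketch: debounce filesystem events and never run two scans at once.
type Timer = ReturnType<typeof setTimeout>;

class ScanCoordinator {
  private readonly debounceMs: number;
  private readonly isIgnored: (path: string) => boolean;
  private readonly scan: () => Promise<void>;
  private timer: Timer | undefined = undefined;
  private running = false;
  private rerunRequested = false;

  constructor(debounceMs: number, isIgnored: (path: string) => boolean, scan: () => Promise<void>) {
    this.debounceMs = debounceMs;
    this.isIgnored = isIgnored;
    this.scan = scan;
  }

  onEvent(path: string): void {
    if (this.isIgnored(path)) return; // ignored paths never schedule a scan
    if (this.timer !== undefined) clearTimeout(this.timer); // restart the debounce window
    this.timer = setTimeout(() => {
      this.timer = undefined;
      void this.trigger();
    }, this.debounceMs);
  }

  private async trigger(): Promise<void> {
    if (this.running) {
      this.rerunRequested = true; // coalesce into one follow-up scan
      return;
    }
    this.running = true;
    try {
      do {
        this.rerunRequested = false;
        await this.scan();
      } while (this.rerunRequested);
    } finally {
      this.running = false;
    }
  }
}

const coordinator = new ScanCoordinator(
  500, // the README's default debounce window in ms
  (p) => p.startsWith("node_modules/"),
  async () => console.log("incremental scan"),
);
coordinator.onEvent("src/index.ts");
coordinator.onEvent("src/config.ts"); // same window: still a single scan
coordinator.onEvent("node_modules/x/index.js"); // ignored
```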
151
+ ### Local Search
152
+
153
+ ```bash
154
+ # Semantic search
155
+ cw search --information-request "How is the user authentication flow implemented?"
156
+
157
+ # With exact terms
158
+ cw search --information-request "Database connection logic" --technical-terms "DatabasePool,Connection"
159
+ ```
160
+
161
+ ### Structure Browsing & Symbol Lookup (v1.5.0+)
162
+
163
+ The following commands are CLI mirrors of MCP tools, with zero Embedding API cost:
107
164
 
108
165
  ```bash
109
- # Semantic search
110
- cw search --information-request "How is the user authentication flow implemented?"
166
+ # List indexed files (supports glob / language / count filters)
167
+ contextweaver list-files --glob "src/**/*.ts" --language typescript --max-results 100
168
+
169
+ # Look up a symbol definition
170
+ contextweaver definition SearchService --hint-path src/search
111
171
 
112
- # 带精确术语
113
- cw search --information-request "数据库连接逻辑" --technical-terms "DatabasePool,Connection"
172
+ # Look up symbol references
173
+ contextweaver references handleStats --exclude-definition
114
174
  ```
115
175
 
116
- ### Start the MCP Server
176
+ ### Statistics (v1.5.0+)
117
177
 
118
178
  ```bash
119
- # Launch the MCP server (for Claude and other AI assistants)
179
+ # Human-readable stats report
180
+ contextweaver stats
181
+
182
+ # JSON output (for scripting)
183
+ contextweaver stats --json
184
+
185
+ # Specify a project path
186
+ contextweaver stats --path /path/to/project
187
+ ```
188
+
189
+ ### Start the MCP Server
190
+
191
+ ```bash
192
+ # Launch the MCP server (for use by Claude and other AI assistants)
120
193
  contextweaver mcp
121
194
  ```
122
195
 
123
- ## 🔧 MCP Integration Configuration
196
+ ### Index Management (v1.4.0+)
197
+
198
+ ```bash
199
+ # Show LanceDB migration state
200
+ contextweaver migrate
124
201
 
125
- ### Claude Desktop Configuration
202
+ # Clear the aborted state: wipe LanceDB and trigger a full rebuild
203
+ # Use when the Indexer refuses to write because sampling validation failed;
204
+ # run this, then index again.
205
+ contextweaver migrate --reset
206
+
207
+ # Specify a project path
208
+ contextweaver migrate --path /path/to/project
209
+ ```
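The migration states and the `--reset` escape hatch can be summarized as a small transition table. This is a hypothetical sketch. The event names (`rebuildSucceeded`, `validationFailed`, `reset`) and the exact transitions are assumptions drawn from the wording of the crash-safe bullets and the `migrate` comments, not ContextWeaver's persisted schema.

```typescript
// Hypothetical sketch of the LanceDB migration state machine. State names come
// from the README; the events and transition rules are assumptions.
type MigrationState = "pending" | "done" | "aborted";
type MigrationEvent = "rebuildSucceeded" | "validationFailed" | "reset";

const transitions: Record<MigrationState, Partial<Record<MigrationEvent, MigrationState>>> = {
  pending: { rebuildSucceeded: "done", validationFailed: "aborted" },
  done: {},
  aborted: { reset: "pending" }, // modeled on `contextweaver migrate --reset`
};

function transition(state: MigrationState, event: MigrationEvent): MigrationState {
  const next = transitions[state][event];
  if (next === undefined) throw new Error(`illegal transition: ${state} + ${event}`);
  return next;
}

// A persisted "pending" found at startup means a crash interrupted the
// migration, so the rebuild runs again.
function needsRebuildOnStartup(state: MigrationState): boolean {
  return state === "pending";
}

// The indexer refuses to write while the state is "aborted".
function indexerMayWrite(state: MigrationState): boolean {
  return state !== "aborted";
}

let current: MigrationState = "pending";
current = transition(current, "validationFailed");
console.log(current, indexerMayWrite(current)); // aborted false
current = transition(current, "reset");
console.log(current, needsRebuildOnStartup(current)); // pending true
```

Persisting the state, rather than keeping it in memory, is what allows a crashed process to tell at the next launch whether a rebuild was left half-done.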
126
210
 
127
- Claude Desktop 的配置文件中添加:
211
+ ## 🔧 MCP Integration
212
+
213
+ ### Claude Desktop Configuration
214
+
215
+ Add the following to your Claude Desktop config file:
128
216
 
129
217
  ```json
130
218
  {
@@ -137,25 +225,62 @@ contextweaver mcp
137
225
  }
138
226
  ```
139
227
 
140
- ### MCP Tool Reference
228
+ ### MCP Tools Overview (v1.5.0+)
229
+
230
+ ContextWeaver exposes 5 MCP tools, following a layered design of "semantic retrieval first, structure browsing second":
231
+
232
+ | Tool | Purpose | Embedding cost |
233
+ |------|---------|----------------|
234
+ | `codebase-retrieval` | **Primary tool**: hybrid semantic + exact-match retrieval | Yes |
235
+ | `list-files` | List indexed file structure (path/language/size) | No |
236
+ | `find-references` | Find heuristic text references to a symbol | No |
237
+ | `get-symbol-definition` | Find likely definition blocks for a symbol | No |
238
+ | `stats` | Index/search/health statistics | No |
141
239
 
142
- ContextWeaver provides one core MCP tool: `codebase-retrieval`
240
+ #### `codebase-retrieval` Parameters
143
241
 
144
- #### Parameters
242
+ | Parameter | Type | Required | Description |
243
+ |-----------|------|----------|-------------|
244
+ | `repo_path` | string | ✅ | Absolute path to the repository root |
245
+ | `information_request` | string | ✅ | The semantic intent in natural language |
246
+ | `technical_terms` | string[] | ❌ | Exact technical terms (class/function names, etc.) |
145
247
 
146
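- | Parameter | Type | Required | Description |
147
- |------|------|------|------|
148
- | `repo_path` | string | ✅ | Absolute path to the codebase root directory |
149
- | `information_request` | string | ✅ | Semantic intent described in natural language |
150
- | `technical_terms` | string[] | ❌ | Exact technical terms (class names, function names, etc.) |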
248
+ #### `list-files` Parameters
151
249
 
152
- #### Design Philosophy
250
+ | Parameter | Type | Required | Description |
251
+ |-----------|------|----------|-------------|
252
+ | `repo_path` | string | ✅ | Absolute path to the repository root |
253
+ | `glob` | string | ❌ | Glob pattern to filter paths |
254
+ | `language` | string | ❌ | Language filter (matched against `files.language`) |
255
+ | `max_results` | number | ❌ | Max files to return (default 200) |
153
256
 
154
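- - **Intent/term separation**: `information_request` describes "what to do", `technical_terms` filters "what it's called"
155
- - **Same-file context first**: same-file context is provided by default; cross-file exploration is initiated by the agent itself
156
- - **Return to agent instincts**: the tool only handles locating; cross-file exploration is triggered by the agent on demand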
257
+ #### `find-references` Parameters
157
258
 
158
- ## 🏗️ Architecture Design
259
+ | Parameter | Type | Required | Description |
260
+ |-----------|------|----------|-------------|
261
+ | `repo_path` | string | ✅ | Absolute path to the repository root |
262
+ | `symbol` | string | ✅ | Exact symbol name |
263
+ | `exclude_definition` | boolean | ❌ | Exclude chunks whose breadcrumb tail matches the symbol name |
264
+ | `max_results` | number | ❌ | Max references to return (default 50) |
265
+
266
+ #### `get-symbol-definition` Parameters
267
+
268
+ | Parameter | Type | Required | Description |
269
+ |-----------|------|----------|-------------|
270
+ | `repo_path` | string | ✅ | Absolute path to the repository root |
271
+ | `symbol` | string | ✅ | Exact symbol name to resolve |
272
+ | `hint_path` | string | ❌ | Preferred path to disambiguate same-name definitions |
273
+ | `max_results` | number | ❌ | Max definitions to return (default 3) |
274
+
275
+ > **Note**: `find-references` and `get-symbol-definition` are heuristic text lookups over indexed chunks, not compiler-accurate navigation. For exhaustive raw text matching, use `grep` outside MCP.
276
+
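The heuristic nature of `find-references` can be illustrated with a whole-word scan over chunk text, where `exclude_definition` drops chunks whose breadcrumb tail equals the symbol. This sketch assumes a breadcrumb is an array of enclosing symbol names. `IndexedChunk` and `findReferences` are hypothetical names, and the real chunk metadata may differ.

```typescript
// Hypothetical sketch of a heuristic reference lookup over indexed chunks.
interface IndexedChunk {
  filePath: string;
  breadcrumb: string[]; // assumed: enclosing symbol names, outermost first
  text: string;
}

function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function findReferences(
  chunks: IndexedChunk[],
  symbol: string,
  excludeDefinition = false,
  maxResults = 50,
): IndexedChunk[] {
  // Whole-word match: `handleStats` must not match inside `handleStatsV2`
  const pattern = new RegExp(`(?<![\\w$])${escapeRegExp(symbol)}(?![\\w$])`);
  const hits: IndexedChunk[] = [];
  for (const chunk of chunks) {
    if (hits.length >= maxResults) break;
    const tail = chunk.breadcrumb[chunk.breadcrumb.length - 1];
    if (excludeDefinition && tail === symbol) continue; // likely the definition itself
    if (pattern.test(chunk.text)) hits.push(chunk);
  }
  return hits;
}

const sample: IndexedChunk[] = [
  { filePath: "src/mcp/tools/stats.ts", breadcrumb: ["handleStats"], text: "export async function handleStats() {}" },
  { filePath: "src/mcp/server.ts", breadcrumb: ["registerTools"], text: "server.tool('stats', handleStats);" },
  { filePath: "src/other.ts", breadcrumb: ["run"], text: "handleStatsV2();" },
];
console.log(findReferences(sample, "handleStats", true).map((c) => c.filePath).join(",")); // src/mcp/server.ts
```

Like any text heuristic, this matches names rather than bindings, so a same-named local variable elsewhere would also be reported.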
277
+ #### Design Philosophy
278
+
279
+ - **Intent/term separation**: `information_request` describes "what to do", `technical_terms` filters "what it's called"
280
+ - **Same-file context first**: same-file context is provided by default; cross-file exploration is initiated by the agent
281
+ - **Return to agent instincts**: the tool only locates; cross-file exploration is triggered by the agent on demand
282
+
283
+ ## 🏗️ Architecture
159
284
 
160
285
  ```mermaid
161
286
  flowchart TB
@@ -165,9 +290,11 @@ flowchart TB
165
290
  end
166
291
 
167
292
  subgraph Search["SearchService"]
293
+ QC[QueryCache<br/>LRU]
168
294
  VR[Vector Retrieval]
169
295
  LR[Lexical Retrieval]
170
296
  RRF[RRF Fusion + Rerank]
297
+ QC -.cache hit.-> CP
171
298
  VR --> RRF
172
299
  LR --> RRF
173
300
  end
@@ -194,188 +321,347 @@ flowchart TB
194
321
  Index --> Storage
195
322
  ```
196
323
 
197
- ### Core Module Overview
324
+ ### Core Modules
198
325
 
199
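- | Module | Responsibility |
200
- |------|------|
201
- | **SearchService** | Hybrid search core; coordinates vector/lexical recall, RRF fusion, and rerank |
202
- | **GraphExpander** | Context expander; runs the E1/E2/E3 three-stage expansion strategy |
203
- | **ContextPacker** | Context packer; handles segment merging and token budget control |
204
- | **VectorStore** | LanceDB adapter layer; manages create/read/update/delete on the vector index |
205
- | **SQLite (FTS5)** | Metadata storage + full-text search index |
206
- | **SemanticSplitter** | AST semantic chunker based on Tree-sitter parsing |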
326
+ | Module | Responsibility |
327
+ |--------|----------------|
328
+ | **SearchService** | Hybrid search core: coordinates vector/lexical recall, RRF fusion, rerank; integrates QueryCache |
329
+ | **QueryCache** | Per-project in-process LRU cache (v1.5.0+); a hit skips the entire retrieval pipeline |
330
+ | **GraphExpander** | Context expander: runs the E1/E2/E3 three-stage expansion strategy |
331
+ | **ContextPacker** | Context packer: segment merging and token budget control |
332
+ | **ChunkContentLoader** | Slices `files.content` by `(path, start_index, end_index)` (v1.4.0+) |
333
+ | **VectorStore** | LanceDB adapter; exposes pure vector operations only |
334
+ | **Database (SQLite)** | Metadata storage + FTS5 full-text index + statistics counters, schema_version=3 |
335
+ | **Bootstrap** | Cross-store init coordinator: pending_marks replay + LanceDB schema migration (v1.4.0+) |
336
+ | **SemanticSplitter** | AST semantic chunker (Tree-sitter); normalizes offsets to the UTF-16 character domain on write |
337
+ | **Watcher** | File-watch coordinator (v1.5.0+): debounce + scan de-duplication + ignore filtering |
338
+ | **Stats** | Statistics aggregation layer (v1.5.0+): combines index/search/health metrics |
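ContextPacker has two duties: segment merging and budget control. They can be sketched in two steps: first merge touching or overlapping ranges per file, then admit segments best-first while the character budget and `maxSegmentsPerFile` allow. The sketch rests on several illustrative assumptions: greedy admission, skipping rather than truncating oversized segments, and the `Candidate` shape. The 48000-character default corresponds to roughly 12k tokens at the ~4 characters per token implied by the configuration table.

```typescript
// Hypothetical sketch of token-aware packing. Offsets are UTF-16 indices into
// files.content; the greedy policy and field names are assumptions.
interface Candidate {
  filePath: string;
  start: number; // inclusive
  end: number; // exclusive
  score: number;
}

// Merge overlapping or touching ranges within each file into one segment.
function mergeRanges(cands: Candidate[]): Candidate[] {
  const byFile = new Map<string, Candidate[]>();
  for (const c of cands) {
    const list = byFile.get(c.filePath) ?? [];
    list.push(c);
    byFile.set(c.filePath, list);
  }
  const merged: Candidate[] = [];
  for (const list of Array.from(byFile.values())) {
    list.sort((a, b) => a.start - b.start);
    let cur: Candidate = { ...list[0] };
    for (const c of list.slice(1)) {
      if (c.start <= cur.end) {
        cur.end = Math.max(cur.end, c.end);
        cur.score = Math.max(cur.score, c.score); // a segment is as good as its best chunk
      } else {
        merged.push(cur);
        cur = { ...c };
      }
    }
    merged.push(cur);
  }
  return merged;
}

// Admit segments best-first while the char budget and per-file cap allow.
function pack(cands: Candidate[], maxTotalChars = 48000, maxSegmentsPerFile = 3): Candidate[] {
  const perFile = new Map<string, number>();
  const out: Candidate[] = [];
  let used = 0;
  const segments = mergeRanges(cands).sort((a, b) => b.score - a.score);
  for (const seg of segments) {
    const len = seg.end - seg.start;
    const count = perFile.get(seg.filePath) ?? 0;
    if (count >= maxSegmentsPerFile || used + len > maxTotalChars) continue; // skip, try smaller ones
    out.push(seg);
    perFile.set(seg.filePath, count + 1);
    used += len;
  }
  return out;
}

const packed = pack(
  [
    { filePath: "a.ts", start: 0, end: 100, score: 0.9 },
    { filePath: "a.ts", start: 100, end: 180, score: 0.5 }, // touches the first: merged
    { filePath: "b.ts", start: 0, end: 500, score: 0.8 }, // too large for the budget
    { filePath: "c.ts", start: 0, end: 100, score: 0.7 },
  ],
  400,
);
console.log(packed.map((s) => `${s.filePath}:${s.start}-${s.end}`).join(" ")); // a.ts:0-180 c.ts:0-100
```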
207
339
 
208
- ## 📁 Project Structure
340
+ ### Data Architecture (v1.4.0+)
341
+
342
+ ```
343
+ ~/.contextweaver/<projectId>/
344
+ ├── index.db # SQLite
345
+ │ ├── files # File metadata + full content (content column, the only source for text slicing)
346
+ │ ├── files_fts # External-content table, inverted index pointing to files
347
+ │ ├── chunks_fts # Chunk-level inverted index, per-file wholesale replacement
348
+ │ ├── metadata # schema_version / lancedb_migration_state / lock
349
+ │ ├── stats # Cumulative index/search counters (v1.5.0+)
350
+ │ └── pending_marks # Outbox: replayed when a vector_index_hash mark failed
351
+ └── vectors.lance/ # LanceDB chunks table (vectors + locating metadata only, no content)
352
+ ```
353
+
354
+ **Key invariants**:
355
+ - The single source of truth for content is `files.content`; `ChunkContentLoader` slices via `start_index/end_index` (same source as `displayCode`)
356
+ - All LanceDB offset fields live in the UTF-16 character domain; multi-byte files are never sliced incorrectly
357
+ - Cross-store write order: LanceDB → (FTS + outbox single transaction) → SQLite mark + clear outbox
358
+ - LanceDB migration state `pending/done/aborted` is persisted, with cross-process mutual exclusion via an advisory lock
359
+ - The query cache key is bound to the index version and search-config fingerprint; it invalidates on any index or config change
360
+
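The first invariant implies that a chunk's text is recovered by slicing `files.content` with its stored offsets. The minimal sketch below makes two assumptions. First, offsets are UTF-16 code-unit indices, so `String.prototype.slice` applies directly. Second, a `Map` stands in for the SQLite `files` table. `ChunkContentLoaderSketch` is a hypothetical name.

```typescript
// Hypothetical sketch of ChunkContentLoader-style slicing: chunk text is not
// stored with the vectors; it is recovered from the file's full content.
interface ChunkLocator {
  path: string;
  start_index: number; // UTF-16 code-unit offset, inclusive
  end_index: number; // exclusive
}

class ChunkContentLoaderSketch {
  private readonly files: Map<string, string>;

  constructor(files: Map<string, string>) {
    this.files = files;
  }

  load(loc: ChunkLocator): string | undefined {
    const content = this.files.get(loc.path);
    if (content === undefined) return undefined; // no file row: the vector row is stale
    const { start_index: start, end_index: end } = loc;
    if (start < 0 || end > content.length || start > end) return undefined; // offsets out of range
    return content.slice(start, end);
  }
}

const fileText = "// 日本語 comment\nfunction greet() {}\n";
const loader = new ChunkContentLoaderSketch(new Map([["src/a.ts", fileText]]));
const begin = fileText.indexOf("function");
console.log(loader.load({ path: "src/a.ts", start_index: begin, end_index: fileText.length - 1 })); // function greet() {}
```

Keeping the text in one place means a content change only has to be written once, which is the trade-off behind the smaller LanceDB footprint described earlier.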
361
+ ## 📁 Project Structure
209
362
 
210
363
  ```
211
364
  contextweaver/
212
365
  ├── src/
213
- │ ├── index.ts # CLI entry
214
- │ ├── config.ts # Config management (environment variables)
215
- │ ├── api/ # External API wrappers
216
- │ │ ├── embed.ts # Embedding API
217
- │ │ └── rerank.ts # Reranker API
218
- │ ├── chunking/ # Semantic chunking
219
- │ │ ├── SemanticSplitter.ts # AST semantic chunker
220
- │ │ ├── SourceAdapter.ts # Source adapter
221
- │ │ ├── LanguageSpec.ts # Language spec definitions
222
- │ │ └── ParserPool.ts # Tree-sitter parser pool
223
- │ ├── scanner/ # File scanning
224
- │ │ ├── crawler.ts # Filesystem traversal
225
- │ │ ├── processor.ts # File processing
226
- │ │ └── filter.ts # Filter rules
227
- │ ├── indexer/ # Indexer
228
- │ │ └── index.ts # Batch indexing logic
229
- │ ├── vectorStore/ # Vector storage
230
- │ │ └── index.ts # LanceDB adapter layer
231
- │ ├── db/ # Database
232
- │ │ └── index.ts # SQLite + FTS5
233
- │ ├── search/ # Search service
234
- │ │ ├── SearchService.ts # Core search service
235
- │ │ ├── GraphExpander.ts # Context expander
236
- │ │ ├── ContextPacker.ts # Context packer
237
- │ │ ├── fts.ts # Full-text search
238
- │ │ ├── config.ts # Search config
239
- │ │ ├── types.ts # Type definitions
240
- │ │ └── resolvers/ # Multi-language import resolvers
366
+ │ ├── index.ts # CLI entry (init / index / watch / search / mcp / migrate / stats)
367
+ │ ├── config.ts # Config management (environment variables)
368
+ │ ├── defaultEnv.ts # Default .env template
369
+ │ ├── cli/
370
+ │ │ └── mirrorCommands.ts # CLI mirrors of MCP tools (list-files / definition / references)
371
+ │ ├── api/ # External API wrappers
372
+ │ │ ├── embedding.ts # Embedding API
373
+ │ │ └── reranker.ts # Reranker API
374
+ │ ├── chunking/ # Semantic chunking
375
+ │ │ ├── SemanticSplitter.ts # AST semantic chunker
376
+ │ │ ├── SourceAdapter.ts # Source adapter (UTF-16/UTF-8 domain normalization)
377
+ │ │ ├── LanguageSpec.ts # Language spec definitions
378
+ │ │ ├── ParserPool.ts # Tree-sitter parser pool
379
+ │ │ └── types.ts # Chunking type definitions
380
+ │ ├── scanner/ # File scanning
381
+ │ │ ├── index.ts # Scan orchestration
382
+ │ │ ├── crawler.ts # Filesystem traversal
383
+ │ │ ├── processor.ts # File processing
384
+ │ │ ├── watcher.ts # File-watch coordinator (v1.5.0+)
385
+ │ │ ├── filter.ts # Filter rules
386
+ │ │ ├── hash.ts # File hash
387
+ │ │ └── language.ts # Language detection
388
+ │ ├── indexer/ # Indexer
389
+ │ │ └── index.ts # Three-stage transaction (LanceDB → FTS+outbox → SQLite mark)
390
+ │ ├── vectorStore/ # Vector storage
391
+ │ │ └── index.ts # LanceDB adapter (pure vector operations)
392
+ │ ├── db/ # Database
393
+ │ │ ├── index.ts # SQLite + FTS5 + pending_marks + migration state machine + stats counters
394
+ │ │ └── bootstrap.ts # Cross-store init coordinator (v1.4.0+)
395
+ │ ├── search/ # Search service
396
+ │ │ ├── SearchService.ts # Core search service (cache-integrated)
397
+ │ │ ├── QueryCache.ts # Per-project LRU query cache (v1.5.0+)
398
+ │ │ ├── GraphExpander.ts # Context expander
399
+ │ │ ├── ContextPacker.ts # Context packer
400
+ │ │ ├── ChunkContentLoader.ts # Slices by (path, start_index, end_index) (v1.4.0+)
401
+ │ │ ├── fts.ts # Full-text search (per-file wholesale replacement)
402
+ │ │ ├── config.ts # Search default config + value bounds
403
+ │ │ ├── loadConfig.ts # Env-var overrides + config fingerprint (v1.5.0+)
404
+ │ │ ├── types.ts # Type definitions
405
+ │ │ ├── utils.ts # Token-overlap scoring
406
+ │ │ └── resolvers/ # Multi-language import resolvers
241
407
  │ │ ├── JsTsResolver.ts
242
408
  │ │ ├── PythonResolver.ts
243
409
  │ │ ├── GoResolver.ts
244
410
  │ │ ├── JavaResolver.ts
245
- │ │ └── RustResolver.ts
246
- │ ├── mcp/ # MCP server
247
- │ │ ├── server.ts # MCP server implementation
248
- │ │ ├── main.ts # MCP entry
411
+ │ │ ├── RustResolver.ts
412
+ │ │ ├── CppResolver.ts
413
+ │ │ └── CSharpResolver.ts
414
+ │ ├── stats/ # Statistics aggregation layer (v1.5.0+)
415
+ │ │ └── index.ts # Aggregates and renders index/search/health metrics
416
+ │ ├── mcp/ # MCP server
417
+ │ │ ├── server.ts # MCP server implementation (registers 5 tools)
418
+ │ │ ├── main.ts # MCP entry
249
419
  │ │ └── tools/
250
- │ │ └── codebaseRetrieval.ts # Code retrieval tool
251
- │ └── utils/ # Utilities
252
- │ └── logger.ts # Logging system
420
+ │ │ ├── index.ts # Tool registry
421
+ │ │ ├── shared.ts # Shared tool logic
422
+ │ │ ├── codebaseRetrieval.ts # Code retrieval tool
423
+ │ │ ├── listFiles.ts # File structure browsing (v1.5.0+)
424
+ │ │ ├── findReferences.ts # Symbol reference lookup (v1.5.0+)
425
+ │ │ ├── getSymbolDefinition.ts # Symbol definition lookup (v1.5.0+)
426
+ │ │ └── stats.ts # Statistics tool (v1.5.0+)
427
+ │ └── utils/ # Utilities
428
+ │ ├── logger.ts # Logging system
429
+ │ ├── encoding.ts # Encoding detection
430
+ │ └── lock.ts # File lock
431
+ ├── tests/ # Unit + integration tests (28 test files, 156 test cases)
432
+ │ ├── chunking/ # SourceAdapter / chunking
433
+ │ ├── cli/ # mirrorCommands
434
+ │ ├── db/ # migration, outbox, advisory lock, index-version
435
+ │ ├── indexer/ # transaction compensation, GC, aborted guard
436
+ │ ├── integration/ # real LanceDB end-to-end
437
+ │ ├── mcp/ # list-files / find-references / get-symbol-definition / shared / tool registry
438
+ │ ├── scanner/ # watcher / index-version
439
+ │ ├── search/ # FTS, ChunkContentLoader, Packer, cache, loadConfig
440
+ │ ├── stats/ # statistics aggregation
441
+ │ └── vectorStore/ # chunk_id de-duplication, sampling validation
253
442
  ├── package.json
254
443
  └── tsconfig.json
255
444
  ```
256
445
 
257
- ## ⚙️ Configuration Details
446
+ ## ⚙️ Configuration Reference
447
+
448
+ ### Environment Variables
449
+
450
+ | Variable | Required | Default | Description |
451
+ |----------|----------|---------|-------------|
452
+ | `EMBEDDINGS_API_KEY` | ✅ | - | Embedding API key |
453
+ | `EMBEDDINGS_BASE_URL` | ✅ | - | Embedding API URL |
454
+ | `EMBEDDINGS_MODEL` | ✅ | - | Embedding model name |
455
+ | `EMBEDDINGS_MAX_CONCURRENCY` | ❌ | 10 | Embedding concurrency |
456
+ | `EMBEDDINGS_DIMENSIONS` | ❌ | 1024 | Vector dimensions |
457
+ | `RERANK_API_KEY` | ✅ | - | Reranker API key |
458
+ | `RERANK_BASE_URL` | ✅ | - | Reranker API URL |
459
+ | `RERANK_MODEL` | ✅ | - | Reranker model name |
460
+ | `RERANK_TOP_N` | ❌ | 20 | Number of results returned by the reranker |
461
+ | `IGNORE_PATTERNS` | ❌ | - | Extra ignore patterns |
258
462
 
259
- ### Environment Variables
463
+ ### Search Parameter Env Overrides (v1.5.0+)
260
464
 
261
- | Variable | Required | Default | Description |
262
- |--------|------|--------|------|
263
- | `EMBEDDINGS_API_KEY` | ✅ | - | Embedding API key |
264
- | `EMBEDDINGS_BASE_URL` | ✅ | - | Embedding API URL |
265
- | `EMBEDDINGS_MODEL` | ✅ | - | Embedding model name |
266
- | `EMBEDDINGS_MAX_CONCURRENCY` | ❌ | 10 | Embedding concurrency |
267
- | `EMBEDDINGS_DIMENSIONS` | ❌ | 1024 | Vector dimensions |
268
- | `RERANK_API_KEY` | ✅ | - | Reranker API key |
269
- | `RERANK_BASE_URL` | ✅ | - | Reranker API URL |
270
- | `RERANK_MODEL` | ✅ | - | Reranker model name |
271
- | `RERANK_TOP_N` | ❌ | 20 | Number of results returned by the reranker |
272
- | `IGNORE_PATTERNS` | ❌ | - | Extra ignore patterns |
465
+ The following environment variables override built-in defaults; out-of-range values are automatically clamped to the valid interval. When only one of `wVec`/`wLex` is set, the other is automatically set to `1 - x`.
273
466
 
274
- ### Search Config Parameters
467
+ | Variable | Default | Bounds | Description |
468
+ |----------|---------|--------|-------------|
469
+ | `CW_SEARCH_WVEC` | 0.6 | 0–1 | Vector weight (fusion stage) |
470
+ | `CW_SEARCH_WLEX` | 0.4 | 0–1 | Lexical weight (complements `wVec`) |
471
+ | `CW_SEARCH_RERANK_TOP_N` | 10 | 5–20 | Results kept after rerank |
472
+ | `CW_SEARCH_MAX_TOTAL_CHARS` | 48000 | 20000–80000 | Token budget (in chars, ~12k tokens) |
473
+ | `CW_SEARCH_VECTOR_TOP_K` | 80 | 40–200 | Vector recall candidates |
474
+ | `CW_SEARCH_SMART_MAX_K` | 8 | 5–15 | Smart TopK hard upper bound |
475
+ | `CW_SEARCH_IMPORT_FILES_PER_SEED` | 3 | 0–5 | E3 import files resolved per seed (0 disables cross-file expansion) |
476
+
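The override rules above (clamping to bounds, and deriving the missing weight as `1 - x`) can be sketched as a pure function over an environment record. The variable names, defaults, and bounds come from the table. The following are assumptions: non-numeric values are ignored, count parameters are rounded, and the fingerprint is `JSON.stringify`. `loadSearchOverrides` is a hypothetical name and is not the real `loadConfig.ts` export.

```typescript
// Hypothetical sketch of env-var overrides with clamping and weight complement.
interface SearchOverrides {
  wVec: number;
  wLex: number;
  rerankTopN: number;
  maxTotalChars: number;
  vectorTopK: number;
  smartMaxK: number;
  importFilesPerSeed: number;
}

type Env = Record<string, string | undefined>;

const clamp = (x: number, lo: number, hi: number): number => Math.min(hi, Math.max(lo, x));

function readNumber(env: Env, name: string): number | undefined {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return undefined;
  const n = Number(raw);
  return Number.isFinite(n) ? n : undefined; // assumed: ignore non-numeric values
}

function loadSearchOverrides(env: Env): SearchOverrides {
  const intParam = (name: string, def: number, lo: number, hi: number): number => {
    const n = readNumber(env, name);
    return n === undefined ? def : clamp(Math.round(n), lo, hi);
  };

  let wVec = readNumber(env, "CW_SEARCH_WVEC");
  let wLex = readNumber(env, "CW_SEARCH_WLEX");
  if (wVec !== undefined) wVec = clamp(wVec, 0, 1);
  if (wLex !== undefined) wLex = clamp(wLex, 0, 1);
  // Only one weight given: the other becomes its complement
  if (wVec !== undefined && wLex === undefined) wLex = 1 - wVec;
  if (wLex !== undefined && wVec === undefined) wVec = 1 - wLex;

  return {
    wVec: wVec ?? 0.6,
    wLex: wLex ?? 0.4,
    rerankTopN: intParam("CW_SEARCH_RERANK_TOP_N", 10, 5, 20),
    maxTotalChars: intParam("CW_SEARCH_MAX_TOTAL_CHARS", 48000, 20000, 80000),
    vectorTopK: intParam("CW_SEARCH_VECTOR_TOP_K", 80, 40, 200),
    smartMaxK: intParam("CW_SEARCH_SMART_MAX_K", 8, 5, 15),
    importFilesPerSeed: intParam("CW_SEARCH_IMPORT_FILES_PER_SEED", 3, 0, 5),
  };
}

// Any changed value yields a different fingerprint (and thus new cache keys)
const fingerprint = (c: SearchOverrides): string => JSON.stringify(c);

const cfg = loadSearchOverrides({ CW_SEARCH_WVEC: "0.25", CW_SEARCH_SMART_MAX_K: "50" });
console.log(cfg.wVec, cfg.wLex, cfg.smartMaxK); // 0.25 0.75 15
```

In a Node process, `process.env` would be passed in. Taking a plain record instead lets the function be exercised without touching the real environment.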
477
+ ### Search Config Parameters (built-in defaults)
275
478
 
276
479
  ```typescript
277
480
  interface SearchConfig {
278
- // === Recall ===
279
- vectorTopK: number; // Vector recall count (default 30)
280
- vectorTopM: number; // Vector results fed into fusion (default 30)
281
- ftsTopKFiles: number; // FTS recall file count (default 15)
282
- lexChunksPerFile: number; // Lexical chunks per file (default 3)
283
- lexTotalChunks: number; // Total lexical chunks (default 30)
284
-
285
- // === Fusion ===
286
- rrfK0: number; // RRF smoothing constant (default 60)
287
- wVec: number; // Vector weight (default 1.0)
288
- wLex: number; // Lexical weight (default 0.5)
289
- fusedTopM: number; // Candidates sent to rerank after fusion (default 40)
481
+ // === Recall ===
482
+ vectorTopK: number; // Vector recall candidates (default 80)
483
+ vectorTopM: number; // Vectors kept after dedup (default 60)
484
+ ftsTopKFiles: number; // FTS recall file count (default 20)
485
+ lexChunksPerFile: number; // Lexical chunks per file (default 2)
486
+ lexTotalChunks: number; // Total lexical chunks (default 40)
487
+
488
+ // === Fusion ===
489
+ rrfK0: number; // RRF smoothing constant (default 20)
490
+ wVec: number; // Vector weight (default 0.6)
491
+ wLex: number; // Lexical weight (default 0.4)
492
+ fusedTopM: number; // Candidates fed into rerank after fusion (default 60)
290
493
 
291
494
  // === Rerank ===
292
- rerankTopN: number; // Results kept after rerank (default 10)
293
- maxRerankChars: number; // Max characters of rerank text (default 1200)
495
+ rerankTopN: number; // Results kept after rerank (default 10)
496
+ maxRerankChars: number; // Max chars per chunk sent to reranker (default 1000)
497
+ maxBreadcrumbChars: number; // Max chars for breadcrumb context (default 250)
498
+ headRatio: number; // Head/tail ratio when truncating (default 0.67)
499
+
500
+ // === Expansion ===
501
+ neighborHops: number; // E1 neighbor hops (default 2)
502
+ breadcrumbExpandLimit: number; // E2 breadcrumb completions (default 3)
503
+ importFilesPerSeed: number; // E3 import files per seed (default 3)
504
+ chunksPerImportFile: number; // E3 chunks per import file (default 3)
294
505
 
295
- // === Expansion Strategy ===
296
- neighborHops: number; // E1 neighbor hops (default 2)
297
- breadcrumbExpandLimit: number; // E2 breadcrumb completions (default 3)
298
- importFilesPerSeed: number; // E3 import files per seed (default 0)
299
- chunksPerImportFile: number; // E3 chunks per import file (default 0)
506
+ // === ContextPacker ===
507
+ maxSegmentsPerFile: number; // Max non-contiguous segments per file (default 3)
508
+ maxTotalChars: number; // Token budget (chars, default 48000)
300
509
 
301
510
  // === Smart TopK ===
302
- enableSmartTopK: boolean; // Enable smart cutoff (default true)
303
- smartTopScoreRatio: number; // Dynamic threshold ratio (default 0.5)
304
- smartMinScore: number; // Absolute floor (default 0.25)
305
- smartMinK: number; // Safe Harbor count (default 2)
306
- smartMaxK: number; // Hard upper bound (default 15)
511
+ enableSmartTopK: boolean; // Enable smart cutoff (default true)
512
+ smartTopScoreRatio: number; // Dynamic threshold ratio (default 0.5)
513
+ smartTopScoreDeltaAbs: number; // Max absolute drop from Top1 (default 0.25)
514
+ smartMinScore: number; // Absolute floor (default 0.25)
515
+ smartMinK: number; // Safe Harbor count (default 2)
516
+ smartMaxK: number; // Hard upper bound (default 8)
307
517
  }
308
518
  ```
309
519
 
310
- ## 🌍 Multi-Language Support
520
+ ## 🌍 Multi-Language Support
311
521
 
312
- ContextWeaver natively supports AST parsing for the following programming languages via Tree-sitter:
522
+ ContextWeaver natively supports AST parsing for the following languages via Tree-sitter:
313
523
 
314
- | Language | AST Parsing | Import Resolution | File Extensions |
315
- |------|----------|-------------|-----------|
524
+ | Language | AST Parsing | Import Resolution | Extensions |
525
+ |----------|-------------|-------------------|------------|
316
526
  | TypeScript | ✅ | ✅ | `.ts`, `.tsx` |
317
- | JavaScript | ✅ | ✅ | `.js`, `.jsx`, `.mjs` |
527
+ | JavaScript | ✅ | ✅ | `.js`, `.jsx`, `.mjs`, `.cjs` |
318
528
  | Python | ✅ | ✅ | `.py` |
319
529
  | Go | ✅ | ✅ | `.go` |
320
530
  | Java | ✅ | ✅ | `.java` |
321
531
  | Rust | ✅ | ✅ | `.rs` |
532
+ | C | ✅ | ✅ | `.c`, `.h` |
533
+ | C++ | ✅ | ✅ | `.cpp`, `.cc`, `.cxx`, `.hpp` |
534
+ | C# | ✅ | ✅ | `.cs` |
322
535
 
323
- 其他语言会采用基于行的 Fallback 分片策略,仍可正常索引和搜索。
536
+ Other languages fall back to line-based chunking and can still be indexed and searched normally.
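A minimal sketch of how the extension table above could drive parser selection, with a line-window fallback for every other file. The grammar labels, the 40-line window, and the 10-line overlap are illustrative assumptions; the fallback window sizes ContextWeaver actually uses are not stated here.

```typescript
// Hypothetical routing from file extension to a Tree-sitter grammar label.
type Grammar =
  | "typescript" | "javascript" | "python" | "go" | "java"
  | "rust" | "c" | "cpp" | "c_sharp";

const EXTENSION_TO_GRAMMAR: Record<string, Grammar> = {
  ".ts": "typescript", ".tsx": "typescript",
  ".js": "javascript", ".jsx": "javascript", ".mjs": "javascript", ".cjs": "javascript",
  ".py": "python", ".go": "go", ".java": "java", ".rs": "rust",
  ".c": "c", ".h": "c",
  ".cpp": "cpp", ".cc": "cpp", ".cxx": "cpp", ".hpp": "cpp",
  ".cs": "c_sharp",
};

// Returns undefined when the file should use the line-based fallback.
function grammarFor(path: string): Grammar | undefined {
  const dot = path.lastIndexOf(".");
  return dot === -1 ? undefined : EXTENSION_TO_GRAMMAR[path.slice(dot).toLowerCase()];
}

interface LineChunk {
  startLine: number; // 1-based, inclusive
  endLine: number;   // 1-based, inclusive
  text: string;
}

// Fallback: fixed-size line windows that overlap so context is not cut mid-thought.
function chunkByLines(content: string, windowSize = 40, overlap = 10): LineChunk[] {
  const lines = content.split("\n");
  const step = Math.max(1, windowSize - overlap);
  const chunks: LineChunk[] = [];
  for (let start = 0; start < lines.length; start += step) {
    const end = Math.min(lines.length, start + windowSize);
    chunks.push({ startLine: start + 1, endLine: end, text: lines.slice(start, end).join("\n") });
    if (end === lines.length) break; // last window reached the end of the file
  }
  return chunks;
}
```

A 100-line file without a known extension would yield three overlapping windows: lines 1-40, 31-70, and 61-100.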
 
- ## 🔄 Workflows
+ ## 🔄 Workflows
 
- ### Indexing Flow
+ ### Indexing Flow
 
  ```
- 1. Crawler → traverse the filesystem, filter ignored items
- 2. Processor → read file content, compute hash
- 3. Splitter → AST parsing, semantic chunking
- 4. Indexer → batch embedding, write to the vector store
- 5. FTS Index → update the full-text search index
+ 0. Bootstrap → pending_marks replay + LanceDB schema migration (first launch)
+ 1. Crawler → traverse the filesystem, filter ignored items
+ 2. Processor → read file content, compute hash
+ 3. Splitter → AST parse, semantic chunking (offsets normalized to the UTF-16 character domain)
+ 4. Indexer → batch embedding
+ 5. Stages 4-6 pseudo-transaction:
+ ├─ LanceDB write (pre-delete (path, hash) to avoid duplicates → add → clear old versions)
+ ├─ FTS + outbox in a single SQLite transaction (rolls back the LanceDB write on failure)
+ └─ SQLite mark + outbox clear in a single transaction (outbox kept on failure, replayed on next launch)
+ 6. Trailing GC → clean up LanceDB orphan chunks (5 s time budget)
  ```
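Step 5's three sub-stages form a compensating ("pseudo") transaction rather than a true cross-store one. The in-memory sketch below models only that ordering: `VectorStore` and `SqliteLike` are hypothetical stand-ins for LanceDB and SQLite, `failNext` is a test hook for simulating failures, and LanceDB's old-version cleanup and the trailing GC are deliberately left out.

```typescript
interface Chunk { path: string; hash: string; text: string; }

// Hypothetical stand-in for the LanceDB chunks table.
class VectorStore {
  rows: Chunk[] = [];
  // Remove rows for this exact (path, hash) so a retried write cannot duplicate them.
  deleteVersion(path: string, hash: string): void {
    this.rows = this.rows.filter((r) => !(r.path === path && r.hash === hash));
  }
  add(chunks: Chunk[]): void {
    this.rows.push(...chunks);
  }
}

// Hypothetical stand-in for the SQLite side (FTS, pending_marks outbox, marks).
class SqliteLike {
  fts = new Map<string, string>();             // path -> indexed content
  outbox = new Map<string, string>();          // pending_marks: path -> hash awaiting its mark
  vectorIndexHash = new Map<string, string>(); // path -> hash already embedded
  failNext: "fts" | "mark" | null = null;      // test hook: simulate one failure

  // All-or-nothing: on error, restore the snapshot taken before fn ran.
  transaction(fn: () => void): void {
    const fts = new Map(this.fts);
    const outbox = new Map(this.outbox);
    const marks = new Map(this.vectorIndexHash);
    try {
      fn();
    } catch (err) {
      this.fts = fts;
      this.outbox = outbox;
      this.vectorIndexHash = marks;
      throw err;
    }
  }
}

type Outcome = "ok" | "rolled_back" | "mark_pending";

function indexFile(vs: VectorStore, db: SqliteLike, path: string, hash: string,
                   content: string, chunks: Chunk[]): Outcome {
  // LanceDB write: pre-delete (path, hash), then add.
  vs.deleteVersion(path, hash);
  vs.add(chunks);

  // FTS + outbox in one SQLite transaction; on failure, compensate the LanceDB write.
  try {
    db.transaction(() => {
      db.fts.set(path, content);
      if (db.failNext === "fts") { db.failNext = null; throw new Error("simulated FTS failure"); }
      db.outbox.set(path, hash);
    });
  } catch {
    vs.deleteVersion(path, hash);
    return "rolled_back";
  }

  // Mark + outbox clear in one transaction; on failure the outbox row survives.
  try {
    db.transaction(() => {
      if (db.failNext === "mark") { db.failNext = null; throw new Error("simulated mark failure"); }
      db.vectorIndexHash.set(path, hash);
      db.outbox.delete(path);
    });
  } catch {
    return "mark_pending";
  }
  return "ok";
}
```

If the FTS transaction fails, the new vector rows are removed and the old mark stays, so the next run simply re-embeds the file; if only the mark fails, the surviving outbox row is what lets a later launch finish the job without re-embedding.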
 
- ### Search Flow
+ ### Search Flow
 
  ```
- 1. Query Parse → parse the query, separate semantics from terms
- 2. Hybrid Recall → dual-channel vector + lexical recall
- 3. RRF Fusion → Reciprocal Rank Fusion
- 4. Rerank → cross-encoder reranking
- 5. Smart Cutoff → intelligent score cutoff
- 6. Graph Expand → neighbor/breadcrumb/import expansion
- 7. Context Pack → segment merging, token budget
- 8. Format Output → format and return to the LLM
+ 1. Query Parse → parse the query, separate semantics from terms
+ 2. Cache Lookup → return immediately on hit (v1.5.0+, key includes index version + config fingerprint)
+ 3. Hybrid Recall → dual-channel vector + lexical recall
+ 4. RRF Fusion → Reciprocal Rank Fusion
+ 5. Rerank → cross-encoder reranking
+ 6. Smart Cutoff → intelligent score cutoff
+ 7. Graph Expand → neighbor/breadcrumb/import expansion
+ 8. Context Pack → segment merging, token budget
+ 9. Cache Store → write to cache (v1.5.0+)
+ 10. Format Output → format and return to the LLM
  ```
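RRF fusion uses only rank positions, which sidesteps the fact that vector similarities and lexical scores live on different scales. A minimal sketch of the standard formula, where each list contributes `1 / (k + rank)` with ranks starting at 1; `k = 60` is the constant from the original RRF paper and is an assumption here, since this section does not state the value ContextWeaver uses. `rrfFuse` is a hypothetical name.

```typescript
// Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank(d)).
// Assumes each input list holds unique ids, best match first.
function rrfFuse(lists: string[][], k = 60): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((id, index) => {
      // index is 0-based, so rank = index + 1
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  return Array.from(scores.entries(), ([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}
```

For a vector list `["a", "b", "c"]` and a lexical list `["b", "d", "a"]`, `b` wins because it ranks high in both channels, followed by `a`, `d`, and `c`.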
 
- ## 📊 Performance Characteristics
+ ## 📊 Performance Characteristics
+
+ - **Query cache**: repeated queries hit the LRU cache, skipping the entire recall/rerank/expansion pipeline (v1.5.0+)
+ - **Incremental indexing**: only changed files are processed; re-indexing is 10x+ faster
+ - **Batch embedding**: adaptive batch size with concurrency control
+ - **Rate-limit recovery**: automatic backoff on 429 errors, gradual recovery
+ - **Parser pool reuse**: Tree-sitter parsers are pooled and reused
+ - **File index caching**: lazy-loaded file-path index in GraphExpander
+ - **Zero-cost metadata tools**: `list-files`/`find-references`/`get-symbol-definition` do not call the Embedding API (v1.5.0+)
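The rate-limit bullet describes backoff on HTTP 429 followed by gradual recovery. One common way to get that behavior is additive-increase/multiplicative-decrease (AIMD), sketched below as an illustration only; the class name, the 500 ms base wait, the 30 s cap, and the five-success recovery step are all assumptions, not ContextWeaver's actual values.

```typescript
// Hypothetical AIMD controller for embedding requests.
class RateLimitController {
  concurrency: number;
  backoffMs = 0;
  private successStreak = 0;
  private readonly maxConcurrency: number;
  private readonly baseBackoffMs = 500;   // first wait after a 429 (assumed)
  private readonly maxBackoffMs = 30_000; // cap on the wait (assumed)
  private readonly recoverAfter = 5;      // successes per recovery step (assumed)

  constructor(maxConcurrency: number) {
    this.maxConcurrency = maxConcurrency;
    this.concurrency = maxConcurrency;
  }

  // Multiplicative decrease: a 429 halves concurrency and doubles the wait.
  onRateLimited(): void {
    this.concurrency = Math.max(1, Math.floor(this.concurrency / 2));
    this.backoffMs = this.backoffMs === 0
      ? this.baseBackoffMs
      : Math.min(this.maxBackoffMs, this.backoffMs * 2);
    this.successStreak = 0;
  }

  // Additive increase: each run of `recoverAfter` successes restores one slot
  // and halves the remaining wait, so recovery is gradual rather than instant.
  onSuccess(): void {
    this.successStreak += 1;
    if (this.successStreak < this.recoverAfter) return;
    this.successStreak = 0;
    this.concurrency = Math.min(this.maxConcurrency, this.concurrency + 1);
    this.backoffMs = Math.floor(this.backoffMs / 2);
  }
}
```

Cutting hard and recovering slowly keeps a rate-limited provider from being hammered again the moment the first request succeeds.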
+
+ ## 📈 Statistics & Observability (v1.5.0+)
+
+ `contextweaver stats` outputs three sections:
+
+ - **Indexing process**: cumulative index run count, last index time, last-run snapshot (added/modified/deleted/unchanged/skipped/errors + vector index details)
+ - **Search quality/behavior**: cumulative queries, cache hit rate, number of actual compute runs, plus average per-stage latency (retrieve / rerank / expand / pack) and average recalled seed count
+ - **Health/consistency**: file count and total content size, LanceDB vector row count, embedding dimensions, index version, migration state, `pending_marks`, language breakdown
 
- - **Incremental indexing**: only changed files are processed; re-indexing is 10x+ faster
- - **Batch embedding**: adaptive batch size, with concurrency control
- - **Rate-limit recovery**: automatic backoff on 429 errors, gradual recovery
- - **Connection pool reuse**: Tree-sitter parser pooling and reuse
- - **File index caching**: lazy-loaded file-path index in GraphExpander
+ When an abnormal migration state, a `pending_marks` backlog, or missing vector rows are detected, the report appends **diagnostic warnings** with the corresponding fix commands. The `--json` output maps to `StatsReport` for scripts and monitoring systems.
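The search-side figures in `stats` are cumulative counters plus per-stage averages. A minimal sketch of how such a roll-up can be kept, assuming averages are taken over compute runs only (a cache hit skips the pipeline and so has no stage timings); `StatsAccumulator` and its report shape are hypothetical and are not the real `StatsReport`.

```typescript
type Stage = "retrieve" | "rerank" | "expand" | "pack";
type StageTimings = Record<Stage, number>; // milliseconds
const STAGES: Stage[] = ["retrieve", "rerank", "expand", "pack"];

class StatsAccumulator {
  private queries = 0;
  private cacheHits = 0;
  private computeRuns = 0;
  private seedTotal = 0;
  private stageTotals: StageTimings = { retrieve: 0, rerank: 0, expand: 0, pack: 0 };

  // A cache hit counts as a query but runs no pipeline stages.
  recordCacheHit(): void {
    this.queries += 1;
    this.cacheHits += 1;
  }

  // A cache miss runs the full pipeline and contributes timings and seed counts.
  recordComputed(timings: StageTimings, seedCount: number): void {
    this.queries += 1;
    this.computeRuns += 1;
    this.seedTotal += seedCount;
    for (const stage of STAGES) this.stageTotals[stage] += timings[stage];
  }

  report() {
    const runs = Math.max(1, this.computeRuns); // avoid dividing by zero before any run
    const avgStageMs: StageTimings = { retrieve: 0, rerank: 0, expand: 0, pack: 0 };
    for (const stage of STAGES) avgStageMs[stage] = this.stageTotals[stage] / runs;
    return {
      queries: this.queries,
      cacheHitRate: this.queries === 0 ? 0 : this.cacheHits / this.queries,
      computeRuns: this.computeRuns,
      avgStageMs,
      avgSeeds: this.seedTotal / runs,
    };
  }
}
```

Averaging over compute runs rather than all queries keeps a high cache hit rate from making the pipeline look faster than it is.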
 
- ## 🐛 Logging & Debugging
+ ## 🐛 Logging & Debugging
 
- Log file location: `~/.contextweaver/logs/app.YYYY-MM-DD.log`
+ Log file location: `~/.contextweaver/logs/app.YYYY-MM-DD.log`
 
- Set the log level:
+ Set the log level:
 
  ```bash
- # Enable debug logging
+ # Enable debug logging
  LOG_LEVEL=debug contextweaver search --information-request "..."
  ```
 
- ## 📄 License
+ ## 🚨 Troubleshooting (v1.4.0+)
+
+ ### LanceDB Migration Stuck (`aborted` state)
+
+ **Symptom**: `contextweaver index` fails with "LanceDB is in the aborted state, refusing to write to prevent schema pollution."
+
+ **Cause**: during the v1.4.0 upgrade, a sampling check found that the old LanceDB index's `display_code` differs from the current `files.content` in more than 1% of sampled chunks (typically on legacy indexes whose chunk offsets were in the UTF-8 byte domain).
+
+ **Fix**:
+ ```bash
+ contextweaver migrate --reset # Clear the LanceDB chunks table + reset state to done
+ contextweaver index # Full rebuild (new schema)
+ ```
+
+ You can also run `contextweaver stats` first to view diagnostic warnings and confirm the current migration state and `pending_marks` backlog.
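The byte-domain versus character-domain mismatch behind this error is easy to reproduce: once a file contains non-ASCII text, a UTF-8 byte offset and a JavaScript (UTF-16) string index for the same position are different numbers, so slicing `files.content` with a legacy byte offset returns the wrong span. The helper below is an illustrative conversion, not the SemanticSplitter's code; `utf8ByteOffsetToUtf16` is a hypothetical name.

```typescript
// Convert a UTF-8 byte offset into a UTF-16 code-unit index for the same text.
function utf8ByteOffsetToUtf16(text: string, byteOffset: number): number {
  let bytes = 0;
  let index = 0;
  while (index < text.length && bytes < byteOffset) {
    const cp = text.codePointAt(index)!;
    // UTF-8 length of this code point: 1 to 4 bytes.
    bytes += cp <= 0x7f ? 1 : cp <= 0x7ff ? 2 : cp <= 0xffff ? 3 : 4;
    // Astral code points (e.g. emoji) occupy two UTF-16 code units.
    index += cp > 0xffff ? 2 : 1;
  }
  return index;
}

const source = "const 名前 = 1;";
const byteOffset = 12; // end of "名前" in UTF-8: 6 ASCII bytes + 3 + 3

console.log(source.slice(0, byteOffset));                                // "const 名前 = 1" (wrong span)
console.log(source.slice(0, utf8ByteOffsetToUtf16(source, byteOffset))); // "const 名前"
```

Pure-ASCII files are unaffected because both domains agree there, which is why the drift tends to surface only on part of a legacy index.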
+
+ ### Cross-Process Migration Race
+
+ If the MCP server is long-running and another terminal runs `contextweaver index`, the two processes contend for the migration. v1.4.0 introduces an advisory lock with a 10-minute zombie threshold, so one process automatically skips the migration while the other completes it.
+
+ If the lock gets stuck (for example, after a `kill -9`), clear it manually:
+ ```bash
+ sqlite3 ~/.contextweaver/<projectId>/index.db \
+ "DELETE FROM metadata WHERE key = 'lancedb_migration_lock';"
+ ```
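A minimal sketch of an advisory lock with a zombie threshold, reusing the `lancedb_migration_lock` metadata key from the command above. The JSON value format (`holder` + `acquiredAt`) and the function names are assumptions, and a `Map` stands in for the `metadata` table; against real SQLite, the read-check-write must run inside one write transaction (for example `BEGIN IMMEDIATE`) to be safe across processes.

```typescript
const LOCK_KEY = "lancedb_migration_lock";
const ZOMBIE_MS = 10 * 60 * 1000; // a lock older than 10 minutes is presumed dead

interface LockValue { holder: string; acquiredAt: number; }

// Returns true if `holder` now owns the lock, false if it should skip the migration.
function tryAcquireMigrationLock(metadata: Map<string, string>, holder: string,
                                 now: number = Date.now()): boolean {
  const raw = metadata.get(LOCK_KEY);
  if (raw !== undefined) {
    const current = JSON.parse(raw) as LockValue;
    const isZombie = now - current.acquiredAt > ZOMBIE_MS;
    if (!isZombie && current.holder !== holder) return false; // a live owner exists
  }
  // No lock, a zombie lock, or our own lock: take (or refresh) it.
  metadata.set(LOCK_KEY, JSON.stringify({ holder, acquiredAt: now }));
  return true;
}

// Only the current owner may release; a release from a holder that was taken over is a no-op.
function releaseMigrationLock(metadata: Map<string, string>, holder: string): void {
  const raw = metadata.get(LOCK_KEY);
  if (raw !== undefined && (JSON.parse(raw) as LockValue).holder === holder) {
    metadata.delete(LOCK_KEY);
  }
}
```

The zombie threshold is what makes a crashed holder recoverable without manual cleanup; the `DELETE` above is only needed when waiting out the threshold is not acceptable.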
+
+ ### Wasted Duplicate Embeddings
+
+ v1.4.0 prevents this with the `pending_marks` outbox: if the FTS write succeeds but the `vector_index_hash` mark fails, the mark is replayed automatically on the next launch, so the file is not embedded a second time.
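The outbox turns a lost mark into a deferred one: the FTS transaction also records the intended `vector_index_hash` in `pending_marks`, and startup applies whatever is still there. A minimal sketch of that replay and of the hash check that then skips re-embedding; `PendingMark`, `replayPendingMarks`, and `needsEmbedding` are hypothetical names, and plain arrays and maps stand in for the SQLite tables.

```typescript
interface PendingMark { path: string; hash: string; }

// Apply every outstanding mark, then clear the outbox.
// In SQLite both steps would run in one transaction.
function replayPendingMarks(outbox: PendingMark[], vectorIndexHash: Map<string, string>): number {
  for (const { path, hash } of outbox) {
    vectorIndexHash.set(path, hash);
  }
  const replayed = outbox.length;
  outbox.length = 0;
  return replayed;
}

// Indexer side: a file is embedded only if its content hash differs from its mark.
function needsEmbedding(path: string, contentHash: string,
                        vectorIndexHash: Map<string, string>): boolean {
  return vectorIndexHash.get(path) !== contentHash;
}
```

Without the replay, the missing mark would make `needsEmbedding` return `true` for a file whose vectors are already stored, which is exactly the duplicate embedding this section describes.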
+
+ ### Search Results Don't Reflect Recent Changes
+
+ Confirm that incremental indexing has run (or enable `contextweaver watch` for automatic incremental indexing). The query cache key is bound to the index version, so old cache entries are invalidated automatically after an index update; no manual clearing is needed.
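Binding the cache key to the index version means invalidation needs no explicit purge: after a re-index, old keys are simply never requested again and age out of the LRU. A minimal sketch of that idea; `LruQueryCache`, `configFingerprint`, the key layout, and the capacity are hypothetical and are not the real `QueryCache` API, and the insertion order of a JavaScript `Map` provides the LRU ordering.

```typescript
// Stable fingerprint of the search config: sorted keys, so property order does not matter.
function configFingerprint(config: Record<string, number | boolean>): string {
  return Object.keys(config).sort().map((k) => `${k}=${config[k]}`).join("&");
}

class LruQueryCache<V> {
  private readonly entries = new Map<string, V>(); // iteration order = recency order
  private readonly capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  // Index version and config fingerprint are part of the key, so either change misses.
  static key(query: string, indexVersion: number, fingerprint: string): string {
    return JSON.stringify([indexVersion, fingerprint, query]);
  }

  get(key: string): V | undefined {
    const value = this.entries.get(key);
    if (value !== undefined) {
      // Refresh recency by moving the entry to the end.
      this.entries.delete(key);
      this.entries.set(key, value);
    }
    return value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.capacity) {
      // Evict the least recently used entry (the first in iteration order).
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }
}
```

Stale entries from an older index version cost only memory until they are evicted, which is why no manual clearing step is needed.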
+
+ ## 📜 Version History
+
+ - **v1.5.0** (2026-06): query cache, file watching, statistics, and multi-granularity MCP tools
+   - Added `QueryCache` (per-project LRU); the cache key includes the index version + search-config fingerprint and invalidates automatically
+   - Added `contextweaver watch` for file watching + debounced incremental indexing
+   - Added the `contextweaver stats` CLI (`--json`) and MCP `stats` tool: three metric groups + consistency diagnostics
+   - Added 3 MCP tools: `list-files` / `find-references` / `get-symbol-definition`, plus their CLI mirror commands
+   - Added `CW_SEARCH_*` environment variables to override search parameters (with bounds clamping)
+   - 28 test files / 156 test cases
+ - **v1.4.0** (2026-05): data architecture and cross-store consistency overhaul
+   - LanceDB chunks table drops `display_code`/`vector_text`; content is read back from `files.content`
+   - SemanticSplitter offsets unified to the UTF-16 character domain
+   - schema_version 2 → 3; added the `pending_marks` outbox + tri-state migration state machine
+   - Added the `contextweaver migrate` CLI
+   - Cross-process advisory lock prevents migration races
+ - **v1.3.x**: cross-store write transactionality, trailing auto-GC after scan, files_fts external-content table
+ - **v1.2.x**: search pipeline optimization, indexing memory optimization
+ - **v1.1.x**: Smart TopK cutoff, Smart Cutoff
+ - **v1.0.x**: initial release
+
+ ## 📄 License
 
- This project is licensed under the MIT License.
+ This project is licensed under the MIT License.
 
- ## 🙏 Acknowledgements
+ ## 🙏 Acknowledgements
 
- - [Tree-sitter](https://tree-sitter.github.io/tree-sitter/) - high-performance syntax parsing
- - [LanceDB](https://lancedb.com/) - embedded vector database
+ - [Tree-sitter](https://tree-sitter.github.io/tree-sitter/) - high-performance syntax parsing
+ - [LanceDB](https://lancedb.com/) - embedded vector database
  - [MCP](https://modelcontextprotocol.io/) - Model Context Protocol
- - [SiliconFlow](https://siliconflow.cn/) - recommended Embedding/Reranker API service
+ - [SiliconFlow](https://siliconflow.cn/) - recommended Embedding/Reranker API service
 
  ---