@lyy0709/contextweaver 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/LICENSE ADDED
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2025 hsingjui

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
package/README.en.md ADDED
@@ -0,0 +1,405 @@
# ContextWeaver

<p align="center">
<strong>🧡 A Context Weaving Engine for AI Agents</strong>
</p>

<p align="center">
<em>Semantic Code Retrieval for AI Agents: Hybrid Search • Graph Expansion • Token-Aware Packing • Prompt Enhancer</em>
</p>

<p align="center">
English | <a href="./README.md">中文</a>
</p>

---

> **Fork Note**: This project is forked from [hsingjui/ContextWeaver](https://github.com/hsingjui/ContextWeaver). It adds a **Prompt Enhancer** feature that supports OpenAI, Claude, and Gemini endpoints. The enhancer can be used through CLI commands or a Web UI.

**ContextWeaver** is a semantic retrieval engine designed for AI coding assistants. It combines hybrid search (vector + lexical), intelligent context expansion, and token-aware packing to give LLMs precise, relevant, and context-complete code snippets.

<p align="center">
<img src="docs/architecture.png" alt="ContextWeaver Architecture" width="800" />
</p>

## ✨ Key Features

### 🔍 Hybrid Retrieval Engine
- **Vector Retrieval**: Matches on meaning through semantic similarity
- **Lexical Retrieval (FTS)**: Exact matching of function names, class names, and technical terms
- **RRF Fusion (Reciprocal Rank Fusion)**: Merges the ranked results of both recall paths into a single list
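
Reciprocal Rank Fusion scores each item by summing `1 / (k + rank)` over every ranked list it appears in. Items that rank well in both the vector and the lexical results therefore rise to the top. The sketch below is a minimal illustration, not ContextWeaver's `SearchService` code. It assumes plain string chunk IDs and `k = 60`, the constant commonly used in the RRF literature; the package's actual constant is not documented here.

```typescript
// Minimal Reciprocal Rank Fusion (RRF) sketch.
// Assumption: each input list is sorted best-first and holds chunk IDs.
// Assumption: k = 60; ContextWeaver's real constant may differ.
interface FusedHit {
  id: string;
  score: number;
}

function rrfFuse(rankedLists: string[][], k = 60): FusedHit[] {
  const scores = new Map<string, number>();
  for (const list of rankedLists) {
    list.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Sort by fused score, highest first.
  return Array.from(scores.entries())
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

// "auth.ts#login" ranks high in both lists, so it comes out on top.
const vectorHits = ["auth.ts#login", "session.ts#create", "db.ts#pool"];
const lexicalHits = ["db.ts#pool", "auth.ts#login"];
console.log(rrfFuse([vectorHits, lexicalHits]).map((h) => h.id));
```

RRF uses only rank positions, not raw scores. This means cosine similarities and FTS relevance values never need to be put on a common scale.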

### 🧠 AST-Based Semantic Chunking
- **Tree-sitter Parsing**: Supports TypeScript, JavaScript, Python, Go, Java, Rust, C, C++, and C# (9 languages)
- **Dual-Text Strategy**: `displayCode` for display, `vectorText` for embedding
- **Gap-Aware Merging**: Handles the gaps between code blocks while preserving semantic integrity
- **Breadcrumb Injection**: Includes the hierarchical (breadcrumb) path in the vector text to improve retrieval recall
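
The sketch below illustrates the dual-text idea. It keeps the raw source for display and builds a separate embedding text that carries the breadcrumb path. The `Chunk` shape, the `breadcrumb` array, and `buildVectorText` are hypothetical, and the `path > Class > method` header format is an assumption. The exact text ContextWeaver embeds is not documented here.

```typescript
// Hypothetical chunk shape illustrating the dual-text strategy.
interface Chunk {
  filePath: string;
  breadcrumb: string[]; // enclosing symbols, outermost first (e.g. class, then method)
  displayCode: string;  // shown to the user / LLM verbatim
}

// Build the text that gets embedded: a path + breadcrumb header, then the code.
// The header format is an assumption for illustration only.
function buildVectorText(chunk: Chunk): string {
  const path = [chunk.filePath, ...chunk.breadcrumb].join(" > ");
  return `${path}\n${chunk.displayCode}`;
}

const chunk: Chunk = {
  filePath: "src/auth/service.ts",
  breadcrumb: ["AuthService", "login"],
  displayCode: "async login(user: string, pw: string) { /* ... */ }",
};
console.log(buildVectorText(chunk));
```

A method body on its own often never mentions its class or file. Prepending that context gives the embedding model signal that the displayed code lacks.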

### 📊 Three-Phase Context Expansion
- **E1 Neighbor Expansion**: Adds adjacent chunks from the same file so code blocks are complete
- **E2 Breadcrumb Completion**: Adds other methods under the same class/function for structural understanding
- **E3 Import Resolution**: Cross-file dependency tracking (configurable)
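
E1 can be pictured as widening each hit to its neighbors within the same file. The sketch below makes several assumptions:

- Chunks are identified by `(filePath, index)`, with consecutive indices per file.
- The `radius` is 1.
- `expandNeighbors` and `ChunkRef` are hypothetical names.

The real GraphExpander may use different keys and limits.

```typescript
// Hypothetical sketch of E1 neighbor expansion.
interface ChunkRef {
  filePath: string;
  index: number; // position of the chunk within its file, 0-based
}

// For each hit, add chunks within `radius` positions in the same file,
// clamped to [0, chunkCounts[file] - 1]. Duplicates are removed.
function expandNeighbors(
  hits: ChunkRef[],
  chunkCounts: Map<string, number>,
  radius = 1,
): ChunkRef[] {
  const seen = new Set<string>();
  const result: ChunkRef[] = [];
  for (const hit of hits) {
    const count = chunkCounts.get(hit.filePath) ?? 0;
    const lo = Math.max(0, hit.index - radius);
    const hi = Math.min(count - 1, hit.index + radius);
    for (let i = lo; i <= hi; i++) {
      const key = `${hit.filePath}#${i}`;
      if (!seen.has(key)) {
        seen.add(key);
        result.push({ filePath: hit.filePath, index: i });
      }
    }
  }
  return result;
}

// Hits at chunks 0 and 2 of a 5-chunk file expand to chunks 0..3.
const counts = new Map([["src/db.ts", 5]]);
console.log(expandNeighbors([{ filePath: "src/db.ts", index: 0 }, { filePath: "src/db.ts", index: 2 }], counts));
```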

### 🎯 Smart TopK Cutoff
- **Anchor & Floor**: Dynamic threshold plus an absolute lower bound
- **Delta Guard**: Prevents a wrong cutoff when the Top-1 score is an outlier
- **Safe Harbor**: The first N results are checked only against the lower bound, ensuring basic recall
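
One way the three rules could combine is shown below. Every numeric parameter (`anchorRatio`, `floor`, `deltaThreshold`, `safeHarbor`) is hypothetical and chosen for illustration. The exact interplay of the rules inside ContextWeaver may differ.

```typescript
// Hypothetical smart TopK cutoff combining Anchor & Floor, Delta Guard, and Safe Harbor.
interface CutoffOptions {
  anchorRatio: number;    // keep results scoring at least this fraction of the anchor
  floor: number;          // absolute minimum score
  deltaThreshold: number; // Top-1/Top-2 gap above which Top-1 is treated as an outlier
  safeHarbor: number;     // the first N results are only checked against the floor
}

function smartCutoff(scores: number[], opts: CutoffOptions): number[] {
  const sorted = scores.slice().sort((a, b) => b - a);
  if (sorted.length === 0) return [];

  // Delta Guard: if Top-1 towers over Top-2, anchor on Top-2 instead,
  // so one outlier does not push the dynamic threshold too high.
  let anchor = sorted[0];
  if (sorted.length > 1 && sorted[0] - sorted[1] > opts.deltaThreshold) {
    anchor = sorted[1];
  }
  const dynamicThreshold = Math.max(opts.floor, anchor * opts.anchorRatio);

  const kept: number[] = [];
  for (let i = 0; i < sorted.length; i++) {
    const limit = i < opts.safeHarbor ? opts.floor : dynamicThreshold;
    if (sorted[i] < limit) break; // scores are descending, so no later score passes either
    kept.push(sorted[i]);
  }
  return kept;
}

// The guard anchors on 0.5 (threshold 0.3), so 0.45 survives; anchoring on 0.95 would cut it.
const example = smartCutoff([0.95, 0.5, 0.45, 0.25, 0.1], {
  anchorRatio: 0.6, floor: 0.2, deltaThreshold: 0.3, safeHarbor: 2,
});
console.log(example);
```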

### 🔌 Native MCP Support
- **MCP Server Mode**: Launches a Model Context Protocol server with a single command
- **Zen Design**: Separates intent from exact terms, giving an LLM-friendly API
- **Auto-Indexing**: The first query triggers indexing automatically; later incremental updates happen transparently

### ✏️ Prompt Enhancer
- **Multi-LLM Support**: Switch between OpenAI, Claude, and Gemini with a single config setting
- **Three Interaction Modes**: MCP tool call, CLI command, or Web UI in the browser
- **Auto Language Detection**: Chinese input automatically produces Chinese output
- **Custom Templates**: Supports custom enhancement prompt templates
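
Auto language detection can be approximated with a character-ratio heuristic. The sketch below is an assumption, not the logic in `src/enhancer/detect.ts`. It counts CJK Unified Ideographs (U+4E00 to U+9FFF) among the non-whitespace characters. It answers Chinese when they make up at least a hypothetical 10% of the text.

```typescript
// Hypothetical heuristic: classify a prompt as Chinese if enough of its
// non-whitespace characters are CJK Unified Ideographs (U+4E00 to U+9FFF).
type OutputLanguage = "zh" | "en";

function detectLanguage(text: string, minRatio = 0.1): OutputLanguage {
  const chars = Array.from(text.replace(/\s+/g, "")); // split by code point, not UTF-16 unit
  if (chars.length === 0) return "en";
  const cjk = chars.filter((c) => /[\u4e00-\u9fff]/.test(c)).length;
  return cjk / chars.length >= minRatio ? "zh" : "en";
}

console.log(detectLanguage("实现一个带缓存的语义搜索"));           // "zh"
console.log(detectLanguage("Implement a cached semantic search")); // "en"
```

A ratio, rather than "any CJK character present", tolerates a stray Chinese identifier in an English prompt. It still handles mixed prompts that contain code.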

## 📦 Quick Start

### Requirements

- Node.js >= 20
- pnpm (recommended) or npm

### Installation

```bash
# Global install (enhanced version with Prompt Enhancer)
npm install -g @lyy0709/contextweaver

# Or using pnpm
pnpm add -g @lyy0709/contextweaver
```

### Initialize Configuration

```bash
# Create the config file (~/.contextweaver/.env)
contextweaver init
# Or the shorthand
cw init
```

Edit `~/.contextweaver/.env` and fill in your API keys:

```bash
# Embedding API (required)
EMBEDDINGS_API_KEY=your-api-key-here
EMBEDDINGS_BASE_URL=https://api.siliconflow.cn/v1/embeddings
EMBEDDINGS_MODEL=BAAI/bge-m3
EMBEDDINGS_MAX_CONCURRENCY=10
EMBEDDINGS_DIMENSIONS=1024

# Reranker (required)
RERANK_API_KEY=your-api-key-here
RERANK_BASE_URL=https://api.siliconflow.cn/v1/rerank
RERANK_MODEL=BAAI/bge-reranker-v2-m3
RERANK_TOP_N=20

# Ignore patterns (optional, comma-separated)
# IGNORE_PATTERNS=.venv,node_modules

# Prompt Enhancer (optional; required only when using enhance / enhance-prompt)
# PROMPT_ENHANCER_ENDPOINT=openai            # Endpoint: openai / claude / gemini
# PROMPT_ENHANCER_BASE_URL=                  # Custom API URL (for proxies, etc.)
# PROMPT_ENHANCER_TOKEN=your-api-key-here    # API key (required for enhance)
# PROMPT_ENHANCER_MODEL=                     # Custom model override
# PROMPT_ENHANCER_TEMPLATE=                  # Custom template file path
```
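
To show how these settings could be interpreted, the hypothetical `loadConfig` below reads a subset of them from an environment map. It applies the documented defaults (10, 1024, 20) and splits `IGNORE_PATTERNS` on commas. It is not the package's `src/config.ts`. Treating blank values as unset and rejecting non-positive integers are assumptions.

```typescript
// Hypothetical loader for a subset of the documented variables; not src/config.ts.
interface CoreConfig {
  apiKey: string;
  baseUrl: string;
  model: string;
  maxConcurrency: number;
  dimensions: number;
  rerankTopN: number;
  ignorePatterns: string[];
}

type Env = Record<string, string | undefined>;

function required(env: Env, name: string): string {
  const value = env[name]?.trim();
  if (!value) throw new Error(`Missing required environment variable: ${name}`);
  return value;
}

// Blank or missing values fall back to the default; invalid numbers are rejected.
function intOr(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const parsed = Number.parseInt(raw, 10);
  if (Number.isNaN(parsed) || parsed <= 0) throw new Error(`${name} must be a positive integer`);
  return parsed;
}

function loadConfig(env: Env): CoreConfig {
  return {
    apiKey: required(env, "EMBEDDINGS_API_KEY"),
    baseUrl: required(env, "EMBEDDINGS_BASE_URL"),
    model: required(env, "EMBEDDINGS_MODEL"),
    maxConcurrency: intOr(env, "EMBEDDINGS_MAX_CONCURRENCY", 10),
    dimensions: intOr(env, "EMBEDDINGS_DIMENSIONS", 1024),
    rerankTopN: intOr(env, "RERANK_TOP_N", 20),
    // Comma-separated list; surrounding spaces and empty entries are dropped.
    ignorePatterns: (env.IGNORE_PATTERNS ?? "")
      .split(",")
      .map((p) => p.trim())
      .filter((p) => p.length > 0),
  };
}

const cfg = loadConfig({
  EMBEDDINGS_API_KEY: "sk-test",
  EMBEDDINGS_BASE_URL: "https://api.siliconflow.cn/v1/embeddings",
  EMBEDDINGS_MODEL: "BAAI/bge-m3",
  IGNORE_PATTERNS: ".venv, node_modules",
});
console.log(cfg.maxConcurrency, cfg.ignorePatterns);
```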

### Index a Codebase

```bash
# Index the current directory
contextweaver index

# Index a specific path
contextweaver index /path/to/your/project

# Force a re-index
contextweaver index --force
```

### Local Search

```bash
# Semantic search
cw search --information-request "How is user authentication implemented?"

# With exact terms
cw search --information-request "Database connection logic" --technical-terms "DatabasePool,Connection"
```

### Prompt Enhancement

```bash
# Launch the Web UI for interactive editing (default)
cw enhance "Implement a cached semantic search"

# Print directly to stdout
cw enhance "Implement a cached semantic search" --no-browser

# Override the endpoint for this run (openai/claude/gemini)
cw enhance "Implement a cached semantic search" --endpoint claude --no-browser
```

### Start MCP Server

```bash
# Start the MCP server (for Claude and other AI assistants)
contextweaver mcp
```

## 🔧 MCP Integration

### Claude Desktop / Claude Code Configuration

Add the following to your MCP client's configuration file:

```json
{
  "mcpServers": {
    "contextweaver": {
      "command": "contextweaver",
      "args": ["mcp"]
    }
  }
}
```

### MCP Tools

ContextWeaver provides two MCP tools:

- `codebase-retrieval`: Codebase search (primary tool)
- `enhance-prompt`: Prompt enhancement (optional; requires an external LLM API to be configured)

#### `codebase-retrieval` Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `repo_path` | string | ✅ | Absolute path to the repository root |
| `information_request` | string | ✅ | Semantic intent in natural language |
| `technical_terms` | string[] | ❌ | Exact technical terms (class names, function names, etc.) |
187
+ #### Zen Design Philosophy
188
+
189
+ - **Intent-Term Separation**: `information_request` describes "what to do", `technical_terms` filters "what it's called"
190
+ - **Golden Defaults**: Provides same-file context, no cross-file crawling by default
191
+ - **Agent Autonomy**: The tool only locates; cross-file exploration is driven by the Agent
192
+
193
+ #### `enhance-prompt` Parameters
194
+
195
+ | Parameter | Type | Required | Description |
196
+ |-----------|------|----------|-------------|
197
+ | `prompt` | string | βœ… | The original prompt to enhance |
198
+ | `conversation_history` | string | ❌ | Conversation history (`User: ...\nAssistant: ...`) |
199
+ | `project_root_path` | string | ❌ | Project root path for context |
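
The `conversation_history` parameter is a single string in the `User: ...\nAssistant: ...` form shown in the table. A client that stores structured turns could flatten them as follows. The `Turn` type and the `formatHistory` helper are hypothetical.

```typescript
// Hypothetical structured turn, as a client might store it.
interface Turn {
  role: "user" | "assistant";
  content: string;
}

// Flatten turns into the "User: ...\nAssistant: ..." string format.
function formatHistory(turns: Turn[]): string {
  return turns
    .map((t) => `${t.role === "user" ? "User" : "Assistant"}: ${t.content.trim()}`)
    .join("\n");
}

const history = formatHistory([
  { role: "user", content: "Add caching to search." },
  { role: "assistant", content: "Which cache backend do you prefer?" },
]);
console.log(history);
```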

#### Prompt Enhancer Endpoint Defaults

| Endpoint | Default Base URL | Default Model |
|----------|-----------------|---------------|
| `openai` | `https://api.openai.com/v1/chat/completions` | `gpt-4o-mini` |
| `claude` | `https://api.anthropic.com/v1/messages` | `claude-sonnet-4-20250514` |
| `gemini` | `https://generativelanguage.googleapis.com/v1beta` | `gemini-2.0-flash` |
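
Resolving these defaults against the `PROMPT_ENHANCER_*` variables might look like the sketch below. The base URLs and models are copied from the table. The precedence (a non-blank env value first, then the per-endpoint default) and the `resolveEnhancer` name are assumptions.

```typescript
type Endpoint = "openai" | "claude" | "gemini";

// Per-endpoint defaults, copied from the table above.
const ENDPOINT_DEFAULTS: Record<Endpoint, { baseUrl: string; model: string }> = {
  openai: { baseUrl: "https://api.openai.com/v1/chat/completions", model: "gpt-4o-mini" },
  claude: { baseUrl: "https://api.anthropic.com/v1/messages", model: "claude-sonnet-4-20250514" },
  gemini: { baseUrl: "https://generativelanguage.googleapis.com/v1beta", model: "gemini-2.0-flash" },
};

interface EnhancerSettings {
  endpoint: Endpoint;
  baseUrl: string;
  model: string;
  token: string;
}

function isEndpoint(value: string): value is Endpoint {
  return value === "openai" || value === "claude" || value === "gemini";
}

// Hypothetical resolver: a non-blank env value wins, otherwise the endpoint default applies.
function resolveEnhancer(env: Record<string, string | undefined>): EnhancerSettings {
  const raw = (env.PROMPT_ENHANCER_ENDPOINT?.trim() || "openai").toLowerCase();
  if (!isEndpoint(raw)) throw new Error(`Unsupported PROMPT_ENHANCER_ENDPOINT: ${raw}`);
  const token = env.PROMPT_ENHANCER_TOKEN?.trim();
  if (!token) throw new Error("PROMPT_ENHANCER_TOKEN is required for prompt enhancement");
  const defaults = ENDPOINT_DEFAULTS[raw];
  return {
    endpoint: raw,
    // `||` (not `??`) so that blank values such as `PROMPT_ENHANCER_MODEL=` count as unset.
    baseUrl: env.PROMPT_ENHANCER_BASE_URL?.trim() || defaults.baseUrl,
    model: env.PROMPT_ENHANCER_MODEL?.trim() || defaults.model,
    token,
  };
}

console.log(resolveEnhancer({ PROMPT_ENHANCER_ENDPOINT: "claude", PROMPT_ENHANCER_TOKEN: "sk-test" }));
```

Treating empty strings as unset matters here. The `.env` template above leaves several of these keys with blank values.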

## 🏗️ Architecture

```mermaid
flowchart TB
    subgraph Interface["CLI / MCP Interface"]
        CLI[contextweaver CLI]
        MCP[MCP Server]
    end

    subgraph Search["SearchService"]
        VR[Vector Retrieval]
        LR[Lexical Retrieval]
        RRF[RRF Fusion + Rerank]
        VR --> RRF
        LR --> RRF
    end

    subgraph Expand["Context Expansion"]
        GE[GraphExpander]
        CP[ContextPacker]
        GE --> CP
    end

    subgraph Storage["Storage Layer"]
        VS[(VectorStore<br/>LanceDB)]
        DB[(SQLite<br/>FTS5)]
    end

    subgraph Index["Indexing Pipeline"]
        CR[Crawler<br/>fdir] --> SS[SemanticSplitter<br/>Tree-sitter] --> IX[Indexer<br/>Batch Embedding]
    end

    subgraph Enhancer["Prompt Enhancer"]
        PE[enhancePrompt]
        LLM[LLM Adapters<br/>OpenAI / Claude / Gemini]
        WEB[Web UI Server]
        PE --> LLM
        PE --> WEB
    end

    Interface --> Search
    Interface --> Enhancer
    RRF --> GE
    Search <--> Storage
    Expand <--> Storage
    Index --> Storage
```

### Core Modules

| Module | Responsibility |
|--------|---------------|
| **SearchService** | Hybrid search core: vector/lexical recall, RRF fusion, reranking |
| **GraphExpander** | Context expander: E1/E2/E3 three-phase expansion |
| **ContextPacker** | Context packer: paragraph merging and token budget control |
| **VectorStore** | LanceDB adapter: vector index CRUD |
| **SQLite (FTS5)** | Metadata storage + full-text search index |
| **SemanticSplitter** | AST semantic chunker based on Tree-sitter |
| **Prompt Enhancer** | Prompt enhancement: multi-LLM adapters, Web UI interaction |
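
ContextPacker's role (paragraph merging under a token budget) can be sketched as a greedy pass over ranked segments. First, segments that overlap or touch in the same file are merged. Then the merged blocks are added in score order until the budget runs out. The `Segment` shape, `mergeSegments`, `packContext`, and the rough 4-characters-per-token estimate are all assumptions. A real packer would typically use a proper tokenizer.

```typescript
// Hypothetical segment: a line range from one file with a relevance score.
interface Segment {
  filePath: string;
  startLine: number; // inclusive, 1-based
  endLine: number;   // inclusive
  text: string;      // lines startLine..endLine joined with "\n"
  score: number;
}

// Rough token estimate (assumption: about 4 characters per token).
const estimateTokens = (s: string): number => Math.ceil(s.length / 4);

// Merge segments in the same file whose line ranges overlap or are adjacent.
function mergeSegments(segments: Segment[]): Segment[] {
  const sorted = segments
    .slice()
    .sort((a, b) => a.filePath.localeCompare(b.filePath) || a.startLine - b.startLine);
  const merged: Segment[] = [];
  for (const seg of sorted) {
    const last = merged[merged.length - 1];
    if (last && last.filePath === seg.filePath && seg.startLine <= last.endLine + 1) {
      if (seg.endLine > last.endLine) {
        // Append only the lines of `seg` that extend past `last`.
        const extra = seg.text.split("\n").slice(last.endLine + 1 - seg.startLine);
        last.text = [last.text, ...extra].join("\n");
        last.endLine = seg.endLine;
      }
      last.score = Math.max(last.score, seg.score);
    } else {
      merged.push({ ...seg }); // copy so the caller's segments are not mutated
    }
  }
  return merged;
}

// Greedy packing: highest-scoring blocks first; skip any block that would exceed the budget.
function packContext(segments: Segment[], tokenBudget: number): Segment[] {
  const packed: Segment[] = [];
  let used = 0;
  for (const seg of mergeSegments(segments).sort((a, b) => b.score - a.score)) {
    const cost = estimateTokens(seg.text);
    if (used + cost > tokenBudget) continue;
    packed.push(seg);
    used += cost;
  }
  return packed;
}
```

Merging before packing avoids paying twice for overlapping lines. It also hands the LLM one contiguous block instead of fragments.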

## 📁 Project Structure

```
contextweaver/
├── src/
│   ├── index.ts                  # CLI entry point
│   ├── config.ts                 # Configuration (environment variables)
│   ├── api/                      # External API clients
│   │   ├── embed.ts              # Embedding API
│   │   └── rerank.ts             # Reranker API
│   ├── chunking/                 # Semantic chunking
│   │   ├── SemanticSplitter.ts   # AST semantic chunker
│   │   ├── SourceAdapter.ts      # Source adapter
│   │   ├── LanguageSpec.ts       # Language specifications
│   │   └── ParserPool.ts         # Tree-sitter parser pool
│   ├── scanner/                  # File scanning
│   ├── indexer/                  # Indexing
│   ├── vectorStore/              # Vector storage (LanceDB)
│   ├── db/                       # Database (SQLite + FTS5)
│   ├── search/                   # Search service
│   │   ├── SearchService.ts      # Core search service
│   │   ├── GraphExpander.ts      # Context expander
│   │   ├── ContextPacker.ts      # Context packer
│   │   └── resolvers/            # Multi-language import resolvers
│   ├── enhancer/                 # Prompt Enhancer
│   │   ├── index.ts              # Enhancement orchestration
│   │   ├── template.ts           # Template management
│   │   ├── detect.ts             # Language detection
│   │   ├── parser.ts             # Response parsing
│   │   ├── llmClient.ts          # LLM client interface + factory
│   │   ├── server.ts             # Web UI HTTP server
│   │   ├── ui.ts                 # Frontend page template
│   │   ├── browser.ts            # Browser launcher
│   │   └── adapters/             # LLM API adapters
│   │       ├── openai.ts
│   │       ├── claude.ts
│   │       └── gemini.ts
│   ├── mcp/                      # MCP server
│   │   ├── server.ts             # MCP server implementation
│   │   ├── main.ts               # MCP entry point
│   │   └── tools/
│   │       ├── codebaseRetrieval.ts  # Code retrieval tool
│   │       └── enhancePrompt.ts      # Prompt enhancement tool
│   └── utils/                    # Utilities
│       └── logger.ts             # Logging system
├── tests/                        # Unit tests
├── package.json
├── tsconfig.json
└── vitest.config.ts
```

## ⚙️ Configuration Reference

### Environment Variables

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `EMBEDDINGS_API_KEY` | ✅ | - | Embedding API key |
| `EMBEDDINGS_BASE_URL` | ✅ | - | Embedding API URL |
| `EMBEDDINGS_MODEL` | ✅ | - | Embedding model name |
| `EMBEDDINGS_MAX_CONCURRENCY` | ❌ | 10 | Embedding request concurrency |
| `EMBEDDINGS_DIMENSIONS` | ❌ | 1024 | Vector dimensions |
| `RERANK_API_KEY` | ✅ | - | Reranker API key |
| `RERANK_BASE_URL` | ✅ | - | Reranker API URL |
| `RERANK_MODEL` | ✅ | - | Reranker model name |
| `RERANK_TOP_N` | ❌ | 20 | Number of results returned by the reranker |
| `IGNORE_PATTERNS` | ❌ | - | Extra ignore patterns (comma-separated) |
| `PROMPT_ENHANCER_ENDPOINT` | ❌ | `openai` | Enhancer endpoint (openai/claude/gemini) |
| `PROMPT_ENHANCER_TOKEN` | ❌* | - | Enhancer API key (*required for `enhance` / `enhance-prompt`) |
| `PROMPT_ENHANCER_BASE_URL` | ❌ | per endpoint | Custom enhancer API URL |
| `PROMPT_ENHANCER_MODEL` | ❌ | per endpoint | Custom enhancer model |
| `PROMPT_ENHANCER_TEMPLATE` | ❌ | - | Custom enhancer template path |

## 🌍 Language Support

ContextWeaver natively supports AST parsing for the following languages via Tree-sitter:

| Language | AST Parsing | Import Resolution | File Extensions |
|----------|-------------|-------------------|-----------------|
| TypeScript | ✅ | ✅ | `.ts`, `.tsx` |
| JavaScript | ✅ | ✅ | `.js`, `.jsx`, `.mjs` |
| Python | ✅ | ✅ | `.py` |
| Go | ✅ | ✅ | `.go` |
| Java | ✅ | ✅ | `.java` |
| Rust | ✅ | ✅ | `.rs` |
| C | ✅ | ✅ | `.c`, `.h` |
| C++ | ✅ | ✅ | `.cpp`, `.hpp`, `.cc`, `.cxx` |
| C# | ✅ | ✅ | `.cs` |

Other languages use a line-based fallback chunking strategy and can still be indexed and searched.
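
Routing a file to a parser by extension, with a line-based fallback for everything else, could look like the sketch below. The extension list comes from the table. Two things are hypothetical:

- The `languageFor` and `lineChunks` names.
- The fallback's 40-line window with 10 lines of overlap. This is not a documented ContextWeaver setting.

```typescript
// Extension-to-language map taken from the Language Support table.
const EXTENSION_LANGUAGE: Record<string, string> = {
  ".ts": "typescript", ".tsx": "typescript",
  ".js": "javascript", ".jsx": "javascript", ".mjs": "javascript",
  ".py": "python",
  ".go": "go",
  ".java": "java",
  ".rs": "rust",
  ".c": "c", ".h": "c",
  ".cpp": "cpp", ".hpp": "cpp", ".cc": "cpp", ".cxx": "cpp",
  ".cs": "csharp",
};

// Returns the AST language for a path, or null when the line-based fallback applies.
function languageFor(path: string): string | null {
  const dot = path.lastIndexOf(".");
  if (dot === -1 || dot < path.lastIndexOf("/")) return null; // no extension in the file name
  const ext = path.slice(dot).toLowerCase();
  return Object.prototype.hasOwnProperty.call(EXTENSION_LANGUAGE, ext) ? EXTENSION_LANGUAGE[ext] : null;
}

// Hypothetical line-based fallback: fixed-size windows that overlap by `overlap` lines.
function lineChunks(source: string, windowSize = 40, overlap = 10): string[] {
  if (overlap >= windowSize) throw new Error("overlap must be smaller than windowSize");
  const lines = source.split("\n");
  const chunks: string[] = [];
  for (let start = 0; start < lines.length; start += windowSize - overlap) {
    chunks.push(lines.slice(start, start + windowSize).join("\n"));
    if (start + windowSize >= lines.length) break; // last window reached the end
  }
  return chunks;
}

console.log(languageFor("src/app/main.tsx"), languageFor("docs/README.md"));
```

The overlap keeps a statement that straddles a window boundary visible in full in at least one chunk. That partly compensates for the lack of AST boundaries.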

## 📊 Performance

- **Incremental Indexing**: Only changed files are reprocessed, giving a 10x+ speedup on re-index
- **Batch Embedding**: Adaptive batch sizing with concurrency control
- **Rate Limit Recovery**: Automatic backoff on HTTP 429 errors, with progressive recovery
- **Parser Pooling**: Tree-sitter parsers are pooled and reused
- **File Index Cache**: GraphExpander lazily loads its file path index
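
Automatic backoff on HTTP 429 is commonly implemented as exponential backoff with jitter. The sketch below retries a request function only on 429 responses. `HttpError`, `withBackoff`, the 500 ms base delay, the 30 s cap, and the 5-retry limit are hypothetical. The sketch does not model the adaptive batch sizing or the progressive recovery described above.

```typescript
// Hypothetical error type carrying an HTTP status code.
class HttpError extends Error {
  status: number;
  constructor(status: number, message: string) {
    super(message);
    this.status = status;
  }
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Structural check, so it also works for errors thrown by other HTTP clients.
function isRateLimited(err: unknown): boolean {
  return typeof err === "object" && err !== null && (err as { status?: unknown }).status === 429;
}

// Retry `fn` on HTTP 429 using exponential backoff with "full jitter".
async function withBackoff<T>(
  fn: () => Promise<T>,
  { maxRetries = 5, baseDelayMs = 500, maxDelayMs = 30_000 } = {},
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (!isRateLimited(err) || attempt >= maxRetries) throw err;
      // The delay ceiling doubles per attempt (capped); the actual wait is random in [0, ceiling).
      const ceiling = Math.min(maxDelayMs, baseDelayMs * 2 ** attempt);
      await sleep(Math.random() * ceiling);
    }
  }
}
```

Full jitter spreads out retries from concurrent embedding workers. This keeps them from hitting the API again in lockstep after a shared 429.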

## 🧪 Testing

```bash
# Run tests
pnpm test

# Watch mode
pnpm test:watch
```

## 🐛 Logging & Debugging

Log file location: `~/.contextweaver/logs/app.YYYY-MM-DD.log`

Set the log level:

```bash
# Enable debug logging
LOG_LEVEL=debug contextweaver search --information-request "..."
```

## 📄 License

This project is licensed under the MIT License.

## 🙏 Acknowledgments

- [hsingjui/ContextWeaver](https://github.com/hsingjui/ContextWeaver): Original project
- [Tree-sitter](https://tree-sitter.github.io/tree-sitter/): High-performance syntax parsing
- [LanceDB](https://lancedb.com/): Embedded vector database
- [MCP](https://modelcontextprotocol.io/): Model Context Protocol
- [SiliconFlow](https://siliconflow.cn/): Recommended Embedding/Reranker API provider

---

<p align="center">
<sub>Made with ❤️ for AI-assisted coding</sub>
</p>