lmgrep 0.1.0
- package/README.md +181 -0
- package/completions/_lmgrep +88 -0
- package/dist/chunker/context.d.ts +25 -0
- package/dist/chunker/context.js +204 -0
- package/dist/chunker/context.js.map +1 -0
- package/dist/chunker/index.d.ts +3 -0
- package/dist/chunker/index.js +145 -0
- package/dist/chunker/index.js.map +1 -0
- package/dist/chunker/languages.d.ts +15 -0
- package/dist/chunker/languages.js +251 -0
- package/dist/chunker/languages.js.map +1 -0
- package/dist/cli.d.ts +2 -0
- package/dist/cli.js +69 -0
- package/dist/cli.js.map +1 -0
- package/dist/config.d.ts +2 -0
- package/dist/config.js +31 -0
- package/dist/config.js.map +1 -0
- package/dist/embedder.d.ts +3 -0
- package/dist/embedder.js +55 -0
- package/dist/embedder.js.map +1 -0
- package/dist/index-cmd.d.ts +9 -0
- package/dist/index-cmd.js +250 -0
- package/dist/index-cmd.js.map +1 -0
- package/dist/mcp.d.ts +2 -0
- package/dist/mcp.js +80 -0
- package/dist/mcp.js.map +1 -0
- package/dist/providers.d.ts +1 -0
- package/dist/providers.js +43 -0
- package/dist/providers.js.map +1 -0
- package/dist/repair-cmd.d.ts +5 -0
- package/dist/repair-cmd.js +112 -0
- package/dist/repair-cmd.js.map +1 -0
- package/dist/search-cmd.d.ts +10 -0
- package/dist/search-cmd.js +60 -0
- package/dist/search-cmd.js.map +1 -0
- package/dist/serve-cmd.d.ts +1 -0
- package/dist/serve-cmd.js +139 -0
- package/dist/serve-cmd.js.map +1 -0
- package/dist/status-cmd.d.ts +5 -0
- package/dist/status-cmd.js +119 -0
- package/dist/status-cmd.js.map +1 -0
- package/dist/store.d.ts +25 -0
- package/dist/store.js +207 -0
- package/dist/store.js.map +1 -0
- package/dist/types.d.ts +40 -0
- package/dist/types.js +2 -0
- package/dist/types.js.map +1 -0
- package/dist/walker.d.ts +3 -0
- package/dist/walker.js +90 -0
- package/dist/walker.js.map +1 -0
- package/package.json +57 -0
package/README.md
ADDED
|
@@ -0,0 +1,181 @@
|
|
|
1
|
+
# lmgrep
|
|
2
|
+
|
|
3
|
+
Semantic code search powered by AI embeddings. Index your codebase with any embedding provider and search it using natural language.
|
|
4
|
+
|
|
5
|
+
lmgrep uses [Tree-sitter](https://tree-sitter.github.io/tree-sitter/) to parse source code into meaningful chunks (functions, classes, interfaces, etc.), embeds them with the AI model of your choice, and stores the vectors in a local [LanceDB](https://lancedb.github.io/lancedb/) database. Queries are matched by semantic similarity, so you find code by *intent* rather than exact strings.
|
|
6
|
+
|
|
7
|
+
## Features
|
|
8
|
+
|
|
9
|
+
- **Any embedding provider** — works with Ollama, OpenAI, Google, or any provider supported by the [Vercel AI SDK](https://sdk.vercel.ai/)
|
|
10
|
+
- **Tree-sitter chunking** — splits code at AST boundaries so search results are complete, meaningful units
|
|
11
|
+
- **MCP server** — ships with an MCP server (`lmgrep-mcp`) for integration with Claude Code, Cursor, and other AI tools
|
|
12
|
+
- **File watching** — `lmgrep serve` watches for changes and incrementally re-indexes
|
|
13
|
+
- **Cross-project search** — search across multiple indexed projects
|
|
14
|
+
- **Git-aware** — respects `.gitignore`, deduplicates across worktrees sharing the same remote
|
|
15
|
+
- **Configurable** — global or per-project config, custom ignore patterns, extension filtering
|
|
16
|
+
|
|
17
|
+
## Quick start
|
|
18
|
+
|
|
19
|
+
### 1. Install
|
|
20
|
+
|
|
21
|
+
```sh
|
|
22
|
+
pnpm install -g lmgrep
|
|
23
|
+
```
|
|
24
|
+
|
|
25
|
+
### 2. Set up an embedding model
|
|
26
|
+
|
|
27
|
+
The fastest way to get started is with [Ollama](https://ollama.com/):
|
|
28
|
+
|
|
29
|
+
```sh
|
|
30
|
+
# Install Ollama
|
|
31
|
+
curl -fsSL https://ollama.com/install.sh | sh
|
|
32
|
+
|
|
33
|
+
# Pull an embedding model
|
|
34
|
+
ollama pull nomic-embed-text
|
|
35
|
+
|
|
36
|
+
# Auto-detect and write config
|
|
37
|
+
lmgrep init
|
|
38
|
+
```
|
|
39
|
+
|
|
40
|
+
This creates a config file at `~/.config/lmgrep/config.yml` (Linux) or `~/Library/Application Support/lmgrep/config.yml` (macOS).
|
|
41
|
+
|
|
42
|
+
### 3. Index your project
|
|
43
|
+
|
|
44
|
+
```sh
|
|
45
|
+
cd /path/to/your/project
|
|
46
|
+
lmgrep index
|
|
47
|
+
```
|
|
48
|
+
|
|
49
|
+
### 4. Search
|
|
50
|
+
|
|
51
|
+
```sh
|
|
52
|
+
lmgrep search "how are users authenticated"
|
|
53
|
+
lmgrep search "database connection pooling" --limit 5
|
|
54
|
+
lmgrep search "error handling" --file-prefix src/lib --language .ts
|
|
55
|
+
```
|
|
56
|
+
|
|
57
|
+
## CLI commands
|
|
58
|
+
|
|
59
|
+
| Command | Description |
|
|
60
|
+
|---|---|
|
|
61
|
+
| `lmgrep index` | Index the current directory |
|
|
62
|
+
| `lmgrep search <query>` | Search using natural language |
|
|
63
|
+
| `lmgrep status` | Show index stats and embedding connectivity |
|
|
64
|
+
| `lmgrep serve` | Watch for changes and re-index automatically |
|
|
65
|
+
| `lmgrep init` | Detect embedding setup and create config |
|
|
66
|
+
| `lmgrep config` | Open the global config in your editor |
|
|
67
|
+
| `lmgrep repair` | Detect and fix index inconsistencies |
|
|
68
|
+
| `lmgrep compact` | Compact the index to reclaim disk space |
|
|
69
|
+
| `lmgrep prune` | Delete the index for the current directory |
|
|
70
|
+
| `lmgrep import [db-path]` | Import chunks from another lmgrep database |
|
|
71
|
+
| `lmgrep completions zsh` | Output or install zsh completions |
|
|
72
|
+
|
|
73
|
+
### Search options
|
|
74
|
+
|
|
75
|
+
```
|
|
76
|
+
--limit <n> Max results (default: 25)
|
|
77
|
+
--file-prefix <path> Only search files under this path
|
|
78
|
+
--language <exts> Filter by file extension (e.g. .ts,.py)
|
|
79
|
+
--type <types> Filter by AST node type (e.g. function_declaration)
|
|
80
|
+
--not <query> Exclude results similar to this query
|
|
81
|
+
--scores Show relevance scores
|
|
82
|
+
--compact Show file paths only
|
|
83
|
+
--json Output as JSON
|
|
84
|
+
--project <path> Search a different project's index
|
|
85
|
+
--across <paths> Search multiple projects (comma-separated)
|
|
86
|
+
```
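The filter flags above take comma-separated values. As a rough sketch of how `--file-prefix`, `--language`, and `--type` might narrow a result list, assuming each result carries `filePath` and `type` fields (as the chunks built in `dist/chunker/index.js` do): the names `Hit`, `splitList`, and `matchesFilters` are hypothetical, and lmgrep may well apply these filters inside the database query instead.

```ts
// Hypothetical result shape; only `filePath` and `type` are taken from the package's chunks.
interface Hit {
  filePath: string;
  type: string;
}

// Split a comma-separated flag value such as ".ts,.py" into trimmed, non-empty entries.
function splitList(value: string): string[] {
  return value
    .split(",")
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

// Return true when a hit passes every filter that was supplied.
function matchesFilters(
  hit: Hit,
  opts: { filePrefix?: string; language?: string; type?: string },
): boolean {
  if (opts.filePrefix && !hit.filePath.startsWith(opts.filePrefix)) return false;
  if (opts.language && !splitList(opts.language).some((ext) => hit.filePath.endsWith(ext))) {
    return false;
  }
  if (opts.type && !splitList(opts.type).includes(hit.type)) return false;
  return true;
}
```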
|
|
87
|
+
|
|
88
|
+
### Index options
|
|
89
|
+
|
|
90
|
+
```
|
|
91
|
+
--reset Rebuild the entire index from scratch
|
|
92
|
+
--since <dur> Only re-index files modified within duration (e.g. 10m, 2h, 1d)
|
|
93
|
+
--force Force re-embed even if file hash is unchanged
|
|
94
|
+
--dry Show what would be indexed without doing it
|
|
95
|
+
--verbose Show file-by-file progress
|
|
96
|
+
```
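The `--since` values shown above (`10m`, `2h`, `1d`) could be interpreted along these lines. This is a minimal sketch rather than lmgrep's actual parser: the names `parseDuration` and `modifiedWithin` and the accepted unit set (`s`, `m`, `h`, `d`) are assumptions for illustration.

```ts
// Hypothetical unit table; lmgrep's real parser may accept different units.
const UNIT_MS: Record<string, number> = {
  s: 1000,
  m: 60000,
  h: 3600000,
  d: 86400000,
};

// Convert a duration such as "10m", "2h" or "1d" to milliseconds.
function parseDuration(input: string): number {
  const match = /^(\d+)([smhd])$/.exec(input.trim());
  if (!match) {
    throw new Error(`Invalid duration: ${input}`);
  }
  return Number(match[1]) * UNIT_MS[match[2]];
}

// A file qualifies when its modification time falls inside the window ending at `nowMs`.
function modifiedWithin(mtimeMs: number, duration: string, nowMs: number): boolean {
  return nowMs - mtimeMs <= parseDuration(duration);
}
```

In practice `nowMs` would come from `Date.now()` and `mtimeMs` from the file's stat record.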
|
|
97
|
+
|
|
98
|
+
## MCP server
|
|
99
|
+
|
|
100
|
+
lmgrep includes an MCP server for use with AI coding assistants. Add it to your tool's MCP configuration:
|
|
101
|
+
|
|
102
|
+
```json
|
|
103
|
+
{
|
|
104
|
+
"mcpServers": {
|
|
105
|
+
"lmgrep": {
|
|
106
|
+
"command": "lmgrep-mcp",
|
|
107
|
+
"args": []
|
|
108
|
+
}
|
|
109
|
+
}
|
|
110
|
+
}
|
|
111
|
+
```
|
|
112
|
+
|
|
113
|
+
The MCP server exposes a `search` tool and a `list_other_indexed_projects` tool. It automatically watches for file changes and keeps the index up to date.
|
|
114
|
+
|
|
115
|
+
## Configuration
|
|
116
|
+
|
|
117
|
+
lmgrep looks for configuration in this order:
|
|
118
|
+
|
|
119
|
+
1. `.lmgrep.yml` in the project root (per-project)
|
|
120
|
+
2. `~/.config/lmgrep/config.yml` (global, Linux) or `~/Library/Application Support/lmgrep/config.yml` (macOS)
|
|
121
|
+
3. `~/.lmgrep.yml` (legacy fallback)
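
The lookup order above can be sketched as follows, assuming the first file that exists wins (the README does not say whether files are merged). The function names are hypothetical, and `exists` is injected so the logic can be tried without touching the disk.

```ts
// Hypothetical sketch of the documented lookup order; not lmgrep's actual code.
function configCandidates(projectRoot: string, home: string, platform: string): string[] {
  const globalPath =
    platform === "darwin"
      ? `${home}/Library/Application Support/lmgrep/config.yml`
      : `${home}/.config/lmgrep/config.yml`;
  return [
    `${projectRoot}/.lmgrep.yml`, // 1. per-project
    globalPath, // 2. global
    `${home}/.lmgrep.yml`, // 3. legacy fallback
  ];
}

// Return the first candidate for which `exists` reports true, or undefined if none does.
function resolveConfigPath(
  projectRoot: string,
  home: string,
  platform: string,
  exists: (path: string) => boolean,
): string | undefined {
  return configCandidates(projectRoot, home, platform).find(exists);
}
```

In a real program, `existsSync` from `node:fs` could be passed as `exists`, with `process.platform` as `platform`.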
|
|
122
|
+
|
|
123
|
+
### Example config
|
|
124
|
+
|
|
125
|
+
```yaml
|
|
126
|
+
# Embedding model in "provider:model" format
|
|
127
|
+
model: ollama:nomic-embed-text
|
|
128
|
+
|
|
129
|
+
# Base URL for the embedding API
|
|
130
|
+
baseURL: http://localhost:11434/v1
|
|
131
|
+
|
|
132
|
+
# Batch size for embedding API calls
|
|
133
|
+
batchSize: 100
|
|
134
|
+
|
|
135
|
+
# Optional: embedding dimensions (if model supports it)
|
|
136
|
+
# dimensions: 384
|
|
137
|
+
|
|
138
|
+
# Optional: max tokens per chunk (estimated at 4 chars/token)
|
|
139
|
+
# maxTokens: 8192
|
|
140
|
+
|
|
141
|
+
# Optional: prefixes for asymmetric embedding models
|
|
142
|
+
# queryPrefix: "search_query: "
|
|
143
|
+
# documentPrefix: "search_document: "
|
|
144
|
+
|
|
145
|
+
# Optional: additional ignore patterns (merged with .gitignore)
|
|
146
|
+
# ignore:
|
|
147
|
+
# - "*.generated.ts"
|
|
148
|
+
# - "fixtures/"
|
|
149
|
+
|
|
150
|
+
# Optional: file extension control
|
|
151
|
+
# extensions:
|
|
152
|
+
# include: [".sql", ".graphql", ".proto"]
|
|
153
|
+
# exclude: [".json"]
|
|
154
|
+
```
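
The `maxTokens` comment above mentions an estimate of 4 characters per token. The compiled chunker in `dist/chunker/index.js` uses the same arithmetic, `Math.ceil(content.length / 4)`. A small sketch of that heuristic follows; the function names are illustrative, only the arithmetic and the 8192 default come from the package.

```ts
// Roughly 4 characters per token, as noted in the config comment.
const CHARS_PER_TOKEN = 4;

// Estimate the token count of a chunk from its character length.
function estimateTokens(content: string): number {
  return Math.ceil(content.length / CHARS_PER_TOKEN);
}

// True when the estimate is strictly above the limit.
function exceedsLimit(content: string, maxTokens: number = 8192): boolean {
  return estimateTokens(content) > maxTokens;
}
```

Under this estimate, a chunk of up to 32,768 characters stays within a limit of 8192 tokens.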
|
|
155
|
+
|
|
156
|
+
### Using other providers
|
|
157
|
+
|
|
158
|
+
Install the provider package globally and set the model accordingly:
|
|
159
|
+
|
|
160
|
+
```sh
|
|
161
|
+
# OpenAI
|
|
162
|
+
npm install -g @ai-sdk/openai
|
|
163
|
+
# then in config: model: openai:text-embedding-3-small
|
|
164
|
+
|
|
165
|
+
# Google
|
|
166
|
+
npm install -g @ai-sdk/google
|
|
167
|
+
# then in config: model: google:text-embedding-004
|
|
168
|
+
```
|
|
169
|
+
|
|
170
|
+
## Development
|
|
171
|
+
|
|
172
|
+
```sh
|
|
173
|
+
pnpm install
|
|
174
|
+
pnpm build # compile TypeScript
|
|
175
|
+
pnpm dev # watch mode
|
|
176
|
+
pnpm check # format and lint (Biome)
|
|
177
|
+
```
|
|
178
|
+
|
|
179
|
+
## License
|
|
180
|
+
|
|
181
|
+
Apache-2.0
|
|
package/completions/_lmgrep
ADDED
@@ -0,0 +1,88 @@
|
|
|
1
|
+
#compdef lmgrep
|
|
2
|
+
|
|
3
|
+
_lmgrep() {
|
|
4
|
+
local -a commands
|
|
5
|
+
commands=(
|
|
6
|
+
'index:Index the current directory for semantic search'
|
|
7
|
+
'search:Search the codebase using natural language'
|
|
8
|
+
'status:Show index stats and check embedding connectivity'
|
|
9
|
+
'repair:Detect and fix index inconsistencies'
|
|
10
|
+
'serve:Watch the current directory and re-index on changes'
|
|
11
|
+
'init:Detect your embedding setup and create config'
|
|
12
|
+
'config:Open the global config file in your editor'
|
|
13
|
+
'compact:Compact the index to reclaim disk space'
|
|
14
|
+
'prune:Delete the index database for the current directory'
|
|
15
|
+
'import:Import chunks and file hashes from another lmgrep database'
|
|
16
|
+
'completions:Output shell completions'
|
|
17
|
+
)
|
|
18
|
+
|
|
19
|
+
_arguments -C \
|
|
20
|
+
'(-h --help)'{-h,--help}'[Show help]' \
|
|
21
|
+
'(-V --version)'{-V,--version}'[Show version]' \
|
|
22
|
+
'1:command:->command' \
|
|
23
|
+
'*::arg:->args'
|
|
24
|
+
|
|
25
|
+
case "$state" in
|
|
26
|
+
command)
|
|
27
|
+
_describe 'command' commands
|
|
28
|
+
;;
|
|
29
|
+
args)
|
|
30
|
+
case "$words[1]" in
|
|
31
|
+
index)
|
|
32
|
+
_arguments \
|
|
33
|
+
'(-r --reset)'{-r,--reset}'[Reset and rebuild the entire index]' \
|
|
34
|
+
'(-v --verbose)'{-v,--verbose}'[Show file-by-file progress]' \
|
|
35
|
+
'(-s --since)'{-s,--since}'[Only consider files modified within duration]:duration' \
|
|
36
|
+
'(-f --force)'{-f,--force}'[Force re-embed even if file hash unchanged]' \
|
|
37
|
+
'(-d --dry)'{-d,--dry}'[Show what would be indexed without actually doing it]'
|
|
38
|
+
;;
|
|
39
|
+
search)
|
|
40
|
+
_arguments \
|
|
41
|
+
'1:query' \
|
|
42
|
+
'(-m --limit)'{-m,--limit}'[Max results]:number' \
|
|
43
|
+
'--scores[Show relevance scores]' \
|
|
44
|
+
'--compact[Show file paths only]' \
|
|
45
|
+
'--json[Output results as JSON]' \
|
|
46
|
+
'--min-score[Minimum score threshold]:score' \
|
|
47
|
+
'--file-prefix[Only search files matching this path prefix]:prefix:_files -/' \
|
|
48
|
+
'--not[Exclude results similar to this query]:query' \
|
|
49
|
+
'--type[Only return chunks of these AST types]:types' \
|
|
50
|
+
'--language[Only return chunks from files with these extensions]:extensions' \
|
|
51
|
+
'--project[Search a different project index]:path:_files -/' \
|
|
52
|
+
'--across[Search multiple project indexes]:paths'
|
|
53
|
+
;;
|
|
54
|
+
status)
|
|
55
|
+
_arguments \
|
|
56
|
+
'(-c --changes)'{-c,--changes}'[Scan for changed files since last index]' \
|
|
57
|
+
'--json[Output status as JSON]'
|
|
58
|
+
;;
|
|
59
|
+
repair)
|
|
60
|
+
_arguments \
|
|
61
|
+
'(-d --dry)'{-d,--dry}'[Show what would be repaired without making changes]' \
|
|
62
|
+
'--json[Output repair results as JSON]'
|
|
63
|
+
;;
|
|
64
|
+
init)
|
|
65
|
+
_arguments \
|
|
66
|
+
'--force[Overwrite existing config]' \
|
|
67
|
+
'--local[Write a project-local .lmgrep.yml instead of the global config]'
|
|
68
|
+
;;
|
|
69
|
+
prune)
|
|
70
|
+
_arguments \
|
|
71
|
+
'--force[Skip confirmation]'
|
|
72
|
+
;;
|
|
73
|
+
import)
|
|
74
|
+
_arguments \
|
|
75
|
+
'1:database path:_files' \
|
|
76
|
+
'--reset[Reset the current index before importing]'
|
|
77
|
+
;;
|
|
78
|
+
completions)
|
|
79
|
+
_arguments \
|
|
80
|
+
'1:shell:(zsh)' \
|
|
81
|
+
'--install[Install to site-functions]'
|
|
82
|
+
;;
|
|
83
|
+
esac
|
|
84
|
+
;;
|
|
85
|
+
esac
|
|
86
|
+
}
|
|
87
|
+
|
|
88
|
+
_lmgrep "$@"
|
|
package/dist/chunker/context.d.ts
ADDED
@@ -0,0 +1,25 @@
|
|
|
1
|
+
import type Parser from "web-tree-sitter";
|
|
2
|
+
import type { LanguageConfig } from "./languages.js";
|
|
3
|
+
export type StructuralRole = "definition" | "orchestration" | "implementation";
|
|
4
|
+
export interface ScopeEntry {
|
|
5
|
+
kind: string;
|
|
6
|
+
name: string;
|
|
7
|
+
}
|
|
8
|
+
export interface ChunkContext {
|
|
9
|
+
filePath: string;
|
|
10
|
+
scope: ScopeEntry[];
|
|
11
|
+
leadingComment: string | null;
|
|
12
|
+
role: StructuralRole;
|
|
13
|
+
}
|
|
14
|
+
/** Walk up from a node to collect typed parent scopes */
|
|
15
|
+
export declare function extractScope(node: Parser.SyntaxNode, langConfig: LanguageConfig): ScopeEntry[];
|
|
16
|
+
/** Extract import module names from the root of the tree */
|
|
17
|
+
export declare function extractImports(tree: Parser.Tree, langConfig: LanguageConfig): string[];
|
|
18
|
+
/** Extract sibling signatures from the same parent scope */
|
|
19
|
+
export declare function extractSiblingSignatures(node: Parser.SyntaxNode, langConfig: LanguageConfig): string[];
|
|
20
|
+
/** Extract leading comments and decorators immediately before a node */
|
|
21
|
+
export declare function extractLeadingComment(node: Parser.SyntaxNode, source: string): string | null;
|
|
22
|
+
/** Classify a chunk's structural role based on its AST node type */
|
|
23
|
+
export declare function classifyRole(node: Parser.SyntaxNode): StructuralRole;
|
|
24
|
+
/** Build the context prefix string for a chunk */
|
|
25
|
+
export declare function buildContextString(ctx: ChunkContext): string;
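
To make the declarations above concrete, here is a sketch of the prefix that `buildContextString` produces. The body restates the logic of the compiled `context.js` in this package; the example values (`src/auth/session.ts`, `SessionStore`, the comment text) are made up.

```ts
// Types mirror the declarations in context.d.ts.
type StructuralRole = "definition" | "orchestration" | "implementation";
interface ScopeEntry {
  kind: string;
  name: string;
}
interface ChunkContext {
  filePath: string;
  scope: ScopeEntry[];
  leadingComment: string | null;
  role: StructuralRole;
}

// Build the bracketed header lines that precede a chunk's content.
function buildContextString(ctx: ChunkContext): string {
  const lines = [`[file: ${ctx.filePath}]`, `[role: ${ctx.role}]`];
  if (ctx.scope.length > 0) {
    lines.push(`[scope: ${ctx.scope.map((s) => `${s.kind} ${s.name}`).join(" > ")}]`);
  }
  if (ctx.leadingComment) {
    lines.push(`[doc: ${ctx.leadingComment}]`);
  }
  return lines.join("\n");
}

// Illustrative input only.
const prefix = buildContextString({
  filePath: "src/auth/session.ts",
  scope: [{ kind: "class", name: "SessionStore" }],
  leadingComment: "/** Refresh an expired session */",
  role: "implementation",
});
// prefix is:
// [file: src/auth/session.ts]
// [role: implementation]
// [scope: class SessionStore]
// [doc: /** Refresh an expired session */]
```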
|
|
package/dist/chunker/context.js
ADDED
@@ -0,0 +1,204 @@
|
|
|
1
|
+
/** Map AST node types to human-readable scope kinds */
|
|
2
|
+
const SCOPE_KIND_MAP = {
|
|
3
|
+
// TypeScript / JavaScript
|
|
4
|
+
class_declaration: "class",
|
|
5
|
+
class_body: "class",
|
|
6
|
+
interface_declaration: "interface",
|
|
7
|
+
module: "module",
|
|
8
|
+
namespace_declaration: "namespace",
|
|
9
|
+
// Python
|
|
10
|
+
class_definition: "class",
|
|
11
|
+
// Rust
|
|
12
|
+
impl_item: "impl",
|
|
13
|
+
trait_item: "trait",
|
|
14
|
+
mod_item: "mod",
|
|
15
|
+
// Go
|
|
16
|
+
type_declaration: "type",
|
|
17
|
+
// Ruby
|
|
18
|
+
class: "class",
|
|
19
|
+
// C / C++
|
|
20
|
+
struct_specifier: "struct",
|
|
21
|
+
class_specifier: "class",
|
|
22
|
+
namespace_definition: "namespace",
|
|
23
|
+
// Swift
|
|
24
|
+
struct_declaration: "struct",
|
|
25
|
+
extension_declaration: "extension",
|
|
26
|
+
// Scala
|
|
27
|
+
object_definition: "object",
|
|
28
|
+
trait_definition: "trait",
|
|
29
|
+
// Generic
|
|
30
|
+
ContainerDecl: "container",
|
|
31
|
+
};
|
|
32
|
+
/** AST node types that represent type/structure definitions */
|
|
33
|
+
const DEFINITION_TYPES = new Set([
|
|
34
|
+
"class_declaration",
|
|
35
|
+
"class_definition",
|
|
36
|
+
"interface_declaration",
|
|
37
|
+
"type_alias_declaration",
|
|
38
|
+
"enum_declaration",
|
|
39
|
+
"struct_item",
|
|
40
|
+
"enum_item",
|
|
41
|
+
"trait_item",
|
|
42
|
+
"type_item",
|
|
43
|
+
"type_declaration",
|
|
44
|
+
"struct_specifier",
|
|
45
|
+
"class_specifier",
|
|
46
|
+
"enum_specifier",
|
|
47
|
+
"type_definition",
|
|
48
|
+
"struct_declaration",
|
|
49
|
+
"protocol_declaration",
|
|
50
|
+
"class_definition",
|
|
51
|
+
"trait_definition",
|
|
52
|
+
"ContainerDecl",
|
|
53
|
+
]);
|
|
54
|
+
/** AST node types that typically orchestrate / glue logic */
|
|
55
|
+
const ORCHESTRATION_TYPES = new Set([
|
|
56
|
+
"export_statement",
|
|
57
|
+
"decorated_definition",
|
|
58
|
+
]);
|
|
59
|
+
/** Walk up from a node to collect typed parent scopes */
|
|
60
|
+
export function extractScope(node, langConfig) {
|
|
61
|
+
const scopes = [];
|
|
62
|
+
let current = node.parent;
|
|
63
|
+
while (current) {
|
|
64
|
+
if (langConfig.scopeTypes.includes(current.type)) {
|
|
65
|
+
const name = extractNodeName(current);
|
|
66
|
+
if (name) {
|
|
67
|
+
const kind = SCOPE_KIND_MAP[current.type] ?? current.type;
|
|
68
|
+
scopes.unshift({ kind, name });
|
|
69
|
+
}
|
|
70
|
+
}
|
|
71
|
+
current = current.parent;
|
|
72
|
+
}
|
|
73
|
+
return scopes;
|
|
74
|
+
}
|
|
75
|
+
/** Extract import module names from the root of the tree */
|
|
76
|
+
export function extractImports(tree, langConfig) {
|
|
77
|
+
const modules = [];
|
|
78
|
+
for (const child of tree.rootNode.children) {
|
|
79
|
+
if (langConfig.importTypes.includes(child.type)) {
|
|
80
|
+
const mod = extractModuleName(child);
|
|
81
|
+
if (mod && !modules.includes(mod)) {
|
|
82
|
+
modules.push(mod);
|
|
83
|
+
}
|
|
84
|
+
}
|
|
85
|
+
}
|
|
86
|
+
return modules;
|
|
87
|
+
}
|
|
88
|
+
/** Pull out just the module/package name from an import node */
|
|
89
|
+
function extractModuleName(node) {
|
|
90
|
+
// Look for string children (the module specifier)
|
|
91
|
+
const stringNode = findDescendant(node, (n) => n.type === "string" ||
|
|
92
|
+
n.type === "string_literal" ||
|
|
93
|
+
n.type === "interpreted_string_literal");
|
|
94
|
+
if (stringNode) {
|
|
95
|
+
// Strip quotes
|
|
96
|
+
return stringNode.text.replace(/^["']|["']$/g, "");
|
|
97
|
+
}
|
|
98
|
+
// Python: import foo.bar / from foo.bar import baz
|
|
99
|
+
const dottedName = findDescendant(node, (n) => n.type === "dotted_name" || n.type === "module_name");
|
|
100
|
+
if (dottedName)
|
|
101
|
+
return dottedName.text;
|
|
102
|
+
// Rust: use crate::foo::bar
|
|
103
|
+
const path = findDescendant(node, (n) => n.type === "scoped_identifier" || n.type === "use_list");
|
|
104
|
+
if (path)
|
|
105
|
+
return path.text;
|
|
106
|
+
// C/C++: #include <foo.h> or "foo.h"
|
|
107
|
+
const sysLib = findDescendant(node, (n) => n.type === "system_lib_string" || n.type === "string_literal");
|
|
108
|
+
if (sysLib)
|
|
109
|
+
return sysLib.text.replace(/^[<"]|[>"]$/g, "");
|
|
110
|
+
return null;
|
|
111
|
+
}
|
|
112
|
+
function findDescendant(node, predicate) {
|
|
113
|
+
for (const child of node.children) {
|
|
114
|
+
if (predicate(child))
|
|
115
|
+
return child;
|
|
116
|
+
const found = findDescendant(child, predicate);
|
|
117
|
+
if (found)
|
|
118
|
+
return found;
|
|
119
|
+
}
|
|
120
|
+
return null;
|
|
121
|
+
}
|
|
122
|
+
/** Extract sibling signatures from the same parent scope */
|
|
123
|
+
export function extractSiblingSignatures(node, langConfig) {
|
|
124
|
+
const parent = node.parent;
|
|
125
|
+
if (!parent)
|
|
126
|
+
return [];
|
|
127
|
+
const siblings = [];
|
|
128
|
+
for (const child of parent.children) {
|
|
129
|
+
if (child.id === node.id)
|
|
130
|
+
continue;
|
|
131
|
+
if (!langConfig.chunkTypes.includes(child.type))
|
|
132
|
+
continue;
|
|
133
|
+
const name = extractNodeName(child);
|
|
134
|
+
if (name) {
|
|
135
|
+
const firstLine = child.text.split("\n")[0].trim();
|
|
136
|
+
if (firstLine.length < 200) {
|
|
137
|
+
siblings.push(firstLine);
|
|
138
|
+
}
|
|
139
|
+
}
|
|
140
|
+
}
|
|
141
|
+
return siblings.slice(0, 10);
|
|
142
|
+
}
|
|
143
|
+
/** Extract leading comments and decorators immediately before a node */
|
|
144
|
+
export function extractLeadingComment(node, source) {
|
|
145
|
+
const lines = source.split("\n");
|
|
146
|
+
const nodeStartLine = node.startPosition.row;
|
|
147
|
+
const collected = [];
|
|
148
|
+
// Walk backwards from the line before the node
|
|
149
|
+
for (let i = nodeStartLine - 1; i >= 0 && i >= nodeStartLine - 10; i--) {
|
|
150
|
+
const line = lines[i].trim();
|
|
151
|
+
if (line.startsWith("//") ||
|
|
152
|
+
line.startsWith("#") ||
|
|
153
|
+
line.startsWith("*") ||
|
|
154
|
+
line.startsWith("/*") ||
|
|
155
|
+
line.startsWith("*/") ||
|
|
156
|
+
line.startsWith("///") ||
|
|
157
|
+
line.startsWith("--") ||
|
|
158
|
+
line.startsWith("@") ||
|
|
159
|
+
line.startsWith('"""') ||
|
|
160
|
+
line.startsWith("'''")) {
|
|
161
|
+
collected.unshift(lines[i]);
|
|
162
|
+
}
|
|
163
|
+
else if (line === "") {
|
|
164
|
+
// Skip blank lines between the node and its comment; stop at the first blank line once comment lines have been collected
|
|
165
|
+
if (collected.length > 0)
|
|
166
|
+
break;
|
|
167
|
+
}
|
|
168
|
+
else {
|
|
169
|
+
break;
|
|
170
|
+
}
|
|
171
|
+
}
|
|
172
|
+
if (collected.length === 0)
|
|
173
|
+
return null;
|
|
174
|
+
return collected.join("\n").trim();
|
|
175
|
+
}
|
|
176
|
+
/** Classify a chunk's structural role based on its AST node type */
|
|
177
|
+
export function classifyRole(node) {
|
|
178
|
+
if (DEFINITION_TYPES.has(node.type))
|
|
179
|
+
return "definition";
|
|
180
|
+
if (ORCHESTRATION_TYPES.has(node.type))
|
|
181
|
+
return "orchestration";
|
|
182
|
+
return "implementation";
|
|
183
|
+
}
|
|
184
|
+
/** Build the context prefix string for a chunk */
|
|
185
|
+
export function buildContextString(ctx) {
|
|
186
|
+
const lines = [];
|
|
187
|
+
lines.push(`[file: ${ctx.filePath}]`);
|
|
188
|
+
lines.push(`[role: ${ctx.role}]`);
|
|
189
|
+
if (ctx.scope.length > 0) {
|
|
190
|
+
const scopeStr = ctx.scope.map((s) => `${s.kind} ${s.name}`).join(" > ");
|
|
191
|
+
lines.push(`[scope: ${scopeStr}]`);
|
|
192
|
+
}
|
|
193
|
+
if (ctx.leadingComment) {
|
|
194
|
+
lines.push(`[doc: ${ctx.leadingComment}]`);
|
|
195
|
+
}
|
|
196
|
+
return lines.join("\n");
|
|
197
|
+
}
|
|
198
|
+
/** Try to extract a name identifier from an AST node */
|
|
199
|
+
function extractNodeName(node) {
|
|
200
|
+
const nameNode = node.childForFieldName("name") ??
|
|
201
|
+
node.children.find((c) => c.type === "identifier" || c.type === "type_identifier");
|
|
202
|
+
return nameNode?.text;
|
|
203
|
+
}
|
|
204
|
+
//# sourceMappingURL=context.js.map
|
|
package/dist/chunker/context.js.map
ADDED
@@ -0,0 +1 @@
|
|
|
1
|
+
{"version":3,"file":"context.js","sourceRoot":"","sources":["../../src/chunker/context.ts"],"names":[],"mappings":"AAiBA,uDAAuD;AACvD,MAAM,cAAc,GAA2B;IAC9C,0BAA0B;IAC1B,iBAAiB,EAAE,OAAO;IAC1B,UAAU,EAAE,OAAO;IACnB,qBAAqB,EAAE,WAAW;IAClC,MAAM,EAAE,QAAQ;IAChB,qBAAqB,EAAE,WAAW;IAClC,SAAS;IACT,gBAAgB,EAAE,OAAO;IACzB,OAAO;IACP,SAAS,EAAE,MAAM;IACjB,UAAU,EAAE,OAAO;IACnB,QAAQ,EAAE,KAAK;IACf,KAAK;IACL,gBAAgB,EAAE,MAAM;IACxB,OAAO;IACP,KAAK,EAAE,OAAO;IACd,UAAU;IACV,gBAAgB,EAAE,QAAQ;IAC1B,eAAe,EAAE,OAAO;IACxB,oBAAoB,EAAE,WAAW;IACjC,QAAQ;IACR,kBAAkB,EAAE,QAAQ;IAC5B,qBAAqB,EAAE,WAAW;IAClC,QAAQ;IACR,iBAAiB,EAAE,QAAQ;IAC3B,gBAAgB,EAAE,OAAO;IACzB,UAAU;IACV,aAAa,EAAE,WAAW;CAC1B,CAAC;AAEF,+DAA+D;AAC/D,MAAM,gBAAgB,GAAG,IAAI,GAAG,CAAC;IAChC,mBAAmB;IACnB,kBAAkB;IAClB,uBAAuB;IACvB,wBAAwB;IACxB,kBAAkB;IAClB,aAAa;IACb,WAAW;IACX,YAAY;IACZ,WAAW;IACX,kBAAkB;IAClB,kBAAkB;IAClB,iBAAiB;IACjB,gBAAgB;IAChB,iBAAiB;IACjB,oBAAoB;IACpB,sBAAsB;IACtB,kBAAkB;IAClB,kBAAkB;IAClB,eAAe;CACf,CAAC,CAAC;AAEH,6DAA6D;AAC7D,MAAM,mBAAmB,GAAG,IAAI,GAAG,CAAC;IACnC,kBAAkB;IAClB,sBAAsB;CACtB,CAAC,CAAC;AAEH,yDAAyD;AACzD,MAAM,UAAU,YAAY,CAC3B,IAAuB,EACvB,UAA0B;IAE1B,MAAM,MAAM,GAAiB,EAAE,CAAC;IAChC,IAAI,OAAO,GAAG,IAAI,CAAC,MAAM,CAAC;IAC1B,OAAO,OAAO,EAAE,CAAC;QAChB,IAAI,UAAU,CAAC,UAAU,CAAC,QAAQ,CAAC,OAAO,CAAC,IAAI,CAAC,EAAE,CAAC;YAClD,MAAM,IAAI,GAAG,eAAe,CAAC,OAAO,CAAC,CAAC;YACtC,IAAI,IAAI,EAAE,CAAC;gBACV,MAAM,IAAI,GAAG,cAAc,CAAC,OAAO,CAAC,IAAI,CAAC,IAAI,OAAO,CAAC,IAAI,CAAC;gBAC1D,MAAM,CAAC,OAAO,CAAC,EAAE,IAAI,EAAE,IAAI,EAAE,CAAC,CAAC;YAChC,CAAC;QACF,CAAC;QACD,OAAO,GAAG,OAAO,CAAC,MAAM,CAAC;IAC1B,CAAC;IACD,OAAO,MAAM,CAAC;AACf,CAAC;AAED,4DAA4D;AAC5D,MAAM,UAAU,cAAc,CAC7B,IAAiB,EACjB,UAA0B;IAE1B,MAAM,OAAO,GAAa,EAAE,CAAC;IAC7B,KAAK,MAAM,KAAK,IAAI,IAAI,CAAC,QAAQ,CAAC,QAAQ,EAAE,CAAC;QAC5C,IAAI,UAAU,CAAC,WAAW,CAAC,QAAQ,CAAC,KAAK,CAAC,IAAI,CAAC,EAAE,CAAC;YACjD,MAAM,GAAG,GAAG,iBAAiB,CAAC,KAAK,CAAC,CAAC;YACrC,IAAI,GAAG,IAAI,CAAC,OAAO,CAAC,QAAQ,CAAC,GAAG,CAAC,EAAE,CAAC;gBACnC,OAAO,CAAC,IAAI,CAAC,GAAG,CAAC,CAAC;YACnB,CAAC;QACF,CAAC;IACF,C
AAC;IACD,OAAO,OAAO,CAAC;AAChB,CAAC;AAED,gEAAgE;AAChE,SAAS,iBAAiB,CAAC,IAAuB;IACjD,kDAAkD;IAClD,MAAM,UAAU,GAAG,cAAc,CAAC,IAAI,EAAE,CAAC,CAAC,EAAE,EAAE,CAC7C,CAAC,CAAC,IAAI,KAAK,QAAQ;QACnB,CAAC,CAAC,IAAI,KAAK,gBAAgB;QAC3B,CAAC,CAAC,IAAI,KAAK,4BAA4B,CACvC,CAAC;IACF,IAAI,UAAU,EAAE,CAAC;QAChB,eAAe;QACf,OAAO,UAAU,CAAC,IAAI,CAAC,OAAO,CAAC,cAAc,EAAE,EAAE,CAAC,CAAC;IACpD,CAAC;IAED,mDAAmD;IACnD,MAAM,UAAU,GAAG,cAAc,CAAC,IAAI,EAAE,CAAC,CAAC,EAAE,EAAE,CAC7C,CAAC,CAAC,IAAI,KAAK,aAAa,IAAI,CAAC,CAAC,IAAI,KAAK,aAAa,CACpD,CAAC;IACF,IAAI,UAAU;QAAE,OAAO,UAAU,CAAC,IAAI,CAAC;IAEvC,4BAA4B;IAC5B,MAAM,IAAI,GAAG,cAAc,CAAC,IAAI,EAAE,CAAC,CAAC,EAAE,EAAE,CACvC,CAAC,CAAC,IAAI,KAAK,mBAAmB,IAAI,CAAC,CAAC,IAAI,KAAK,UAAU,CACvD,CAAC;IACF,IAAI,IAAI;QAAE,OAAO,IAAI,CAAC,IAAI,CAAC;IAE3B,qCAAqC;IACrC,MAAM,MAAM,GAAG,cAAc,CAAC,IAAI,EAAE,CAAC,CAAC,EAAE,EAAE,CACzC,CAAC,CAAC,IAAI,KAAK,mBAAmB,IAAI,CAAC,CAAC,IAAI,KAAK,gBAAgB,CAC7D,CAAC;IACF,IAAI,MAAM;QAAE,OAAO,MAAM,CAAC,IAAI,CAAC,OAAO,CAAC,cAAc,EAAE,EAAE,CAAC,CAAC;IAE3D,OAAO,IAAI,CAAC;AACb,CAAC;AAED,SAAS,cAAc,CACtB,IAAuB,EACvB,SAA4C;IAE5C,KAAK,MAAM,KAAK,IAAI,IAAI,CAAC,QAAQ,EAAE,CAAC;QACnC,IAAI,SAAS,CAAC,KAAK,CAAC;YAAE,OAAO,KAAK,CAAC;QACnC,MAAM,KAAK,GAAG,cAAc,CAAC,KAAK,EAAE,SAAS,CAAC,CAAC;QAC/C,IAAI,KAAK;YAAE,OAAO,KAAK,CAAC;IACzB,CAAC;IACD,OAAO,IAAI,CAAC;AACb,CAAC;AAED,4DAA4D;AAC5D,MAAM,UAAU,wBAAwB,CACvC,IAAuB,EACvB,UAA0B;IAE1B,MAAM,MAAM,GAAG,IAAI,CAAC,MAAM,CAAC;IAC3B,IAAI,CAAC,MAAM;QAAE,OAAO,EAAE,CAAC;IAEvB,MAAM,QAAQ,GAAa,EAAE,CAAC;IAC9B,KAAK,MAAM,KAAK,IAAI,MAAM,CAAC,QAAQ,EAAE,CAAC;QACrC,IAAI,KAAK,CAAC,EAAE,KAAK,IAAI,CAAC,EAAE;YAAE,SAAS;QACnC,IAAI,CAAC,UAAU,CAAC,UAAU,CAAC,QAAQ,CAAC,KAAK,CAAC,IAAI,CAAC;YAAE,SAAS;QAE1D,MAAM,IAAI,GAAG,eAAe,CAAC,KAAK,CAAC,CAAC;QACpC,IAAI,IAAI,EAAE,CAAC;YACV,MAAM,SAAS,GAAG,KAAK,CAAC,IAAI,CAAC,KAAK,CAAC,IAAI,CAAC,CAAC,CAAC,CAAC,CAAC,IAAI,EAAE,CAAC;YACnD,IAAI,SAAS,CAAC,MAAM,GAAG,GAAG,EAAE,CAAC;gBAC5B,QAAQ,CAAC,IAAI,CAAC,SAAS,CAAC,CAAC;YAC1B,CAAC;QACF,CAAC;IACF,CAAC;IACD,OAAO,QAAQ,CAAC,KAAK,CAAC,CAAC,EAAE,EAAE,CAAC,CAAC;AAC9B,CAAC;AAED,
wEAAwE;AACxE,MAAM,UAAU,qBAAqB,CACpC,IAAuB,EACvB,MAAc;IAEd,MAAM,KAAK,GAAG,MAAM,CAAC,KAAK,CAAC,IAAI,CAAC,CAAC;IACjC,MAAM,aAAa,GAAG,IAAI,CAAC,aAAa,CAAC,GAAG,CAAC;IAC7C,MAAM,SAAS,GAAa,EAAE,CAAC;IAE/B,+CAA+C;IAC/C,KAAK,IAAI,CAAC,GAAG,aAAa,GAAG,CAAC,EAAE,CAAC,IAAI,CAAC,IAAI,CAAC,IAAI,aAAa,GAAG,EAAE,EAAE,CAAC,EAAE,EAAE,CAAC;QACxE,MAAM,IAAI,GAAG,KAAK,CAAC,CAAC,CAAC,CAAC,IAAI,EAAE,CAAC;QAC7B,IACC,IAAI,CAAC,UAAU,CAAC,IAAI,CAAC;YACrB,IAAI,CAAC,UAAU,CAAC,GAAG,CAAC;YACpB,IAAI,CAAC,UAAU,CAAC,GAAG,CAAC;YACpB,IAAI,CAAC,UAAU,CAAC,IAAI,CAAC;YACrB,IAAI,CAAC,UAAU,CAAC,IAAI,CAAC;YACrB,IAAI,CAAC,UAAU,CAAC,KAAK,CAAC;YACtB,IAAI,CAAC,UAAU,CAAC,IAAI,CAAC;YACrB,IAAI,CAAC,UAAU,CAAC,GAAG,CAAC;YACpB,IAAI,CAAC,UAAU,CAAC,KAAK,CAAC;YACtB,IAAI,CAAC,UAAU,CAAC,KAAK,CAAC,EACrB,CAAC;YACF,SAAS,CAAC,OAAO,CAAC,KAAK,CAAC,CAAC,CAAC,CAAC,CAAC;QAC7B,CAAC;aAAM,IAAI,IAAI,KAAK,EAAE,EAAE,CAAC;YACxB,2BAA2B;YAC3B,IAAI,SAAS,CAAC,MAAM,GAAG,CAAC;gBAAE,MAAM;QACjC,CAAC;aAAM,CAAC;YACP,MAAM;QACP,CAAC;IACF,CAAC;IAED,IAAI,SAAS,CAAC,MAAM,KAAK,CAAC;QAAE,OAAO,IAAI,CAAC;IACxC,OAAO,SAAS,CAAC,IAAI,CAAC,IAAI,CAAC,CAAC,IAAI,EAAE,CAAC;AACpC,CAAC;AAED,oEAAoE;AACpE,MAAM,UAAU,YAAY,CAC3B,IAAuB;IAEvB,IAAI,gBAAgB,CAAC,GAAG,CAAC,IAAI,CAAC,IAAI,CAAC;QAAE,OAAO,YAAY,CAAC;IACzD,IAAI,mBAAmB,CAAC,GAAG,CAAC,IAAI,CAAC,IAAI,CAAC;QAAE,OAAO,eAAe,CAAC;IAC/D,OAAO,gBAAgB,CAAC;AACzB,CAAC;AAED,kDAAkD;AAClD,MAAM,UAAU,kBAAkB,CAAC,GAAiB;IACnD,MAAM,KAAK,GAAa,EAAE,CAAC;IAE3B,KAAK,CAAC,IAAI,CAAC,UAAU,GAAG,CAAC,QAAQ,GAAG,CAAC,CAAC;IACtC,KAAK,CAAC,IAAI,CAAC,UAAU,GAAG,CAAC,IAAI,GAAG,CAAC,CAAC;IAElC,IAAI,GAAG,CAAC,KAAK,CAAC,MAAM,GAAG,CAAC,EAAE,CAAC;QAC1B,MAAM,QAAQ,GAAG,GAAG,CAAC,KAAK,CAAC,GAAG,CAAC,CAAC,CAAC,EAAE,EAAE,CAAC,GAAG,CAAC,CAAC,IAAI,IAAI,CAAC,CAAC,IAAI,EAAE,CAAC,CAAC,IAAI,CAAC,KAAK,CAAC,CAAC;QACzE,KAAK,CAAC,IAAI,CAAC,WAAW,QAAQ,GAAG,CAAC,CAAC;IACpC,CAAC;IAED,IAAI,GAAG,CAAC,cAAc,EAAE,CAAC;QACxB,KAAK,CAAC,IAAI,CAAC,SAAS,GAAG,CAAC,cAAc,GAAG,CAAC,CAAC;IAC5C,CAAC;IAED,OAAO,KAAK,CAAC,IAAI,CAAC,IAAI,CAAC,CAAC;AACzB,CAAC;AAED,wDAAwD;AACxD,SAAS,eAAe,CAAC,IAAuB;IAC/C
,MAAM,QAAQ,GACb,IAAI,CAAC,iBAAiB,CAAC,MAAM,CAAC;QAC9B,IAAI,CAAC,QAAQ,CAAC,IAAI,CACjB,CAAC,CAAoB,EAAE,EAAE,CACxB,CAAC,CAAC,IAAI,KAAK,YAAY,IAAI,CAAC,CAAC,IAAI,KAAK,iBAAiB,CACxD,CAAC;IACH,OAAO,QAAQ,EAAE,IAAI,CAAC;AACvB,CAAC"}
|
|
package/dist/chunker/index.js
ADDED
@@ -0,0 +1,145 @@
|
|
|
1
|
+
import { createHash } from "node:crypto";
|
|
2
|
+
import { readFileSync } from "node:fs";
|
|
3
|
+
import { join } from "node:path";
|
|
4
|
+
import Parser from "web-tree-sitter";
|
|
5
|
+
import { buildContextString, classifyRole, extractLeadingComment, extractScope, } from "./context.js";
|
|
6
|
+
import { getLanguageForFile, getWasmPath, } from "./languages.js";
|
|
7
|
+
const MAX_CHUNK_TOKENS = 8192;
|
|
8
|
+
let parserInstance;
|
|
9
|
+
const loadedLanguages = new Map();
|
|
10
|
+
async function getParser() {
|
|
11
|
+
if (!parserInstance) {
|
|
12
|
+
await Parser.init();
|
|
13
|
+
parserInstance = new Parser();
|
|
14
|
+
}
|
|
15
|
+
return parserInstance;
|
|
16
|
+
}
|
|
17
|
+
async function getLanguage(langConfig) {
|
|
18
|
+
const cached = loadedLanguages.get(langConfig.id);
|
|
19
|
+
if (cached)
|
|
20
|
+
return cached;
|
|
21
|
+
const wasmPath = getWasmPath(langConfig);
|
|
22
|
+
if (!wasmPath)
|
|
23
|
+
return undefined;
|
|
24
|
+
const lang = await Parser.Language.load(wasmPath);
|
|
25
|
+
loadedLanguages.set(langConfig.id, lang);
|
|
26
|
+
return lang;
|
|
27
|
+
}
|
|
28
|
+
/** Chunk a single file into context-enriched chunks */
|
|
29
|
+
export async function chunkFile(filePath, cwd) {
|
|
30
|
+
const langConfig = getLanguageForFile(filePath);
|
|
31
|
+
if (!langConfig)
|
|
32
|
+
return fallbackChunk(filePath, cwd);
|
|
33
|
+
const parser = await getParser();
|
|
34
|
+
const language = await getLanguage(langConfig);
|
|
35
|
+
if (!language)
|
|
36
|
+
return fallbackChunk(filePath, cwd);
|
|
37
|
+
parser.setLanguage(language);
|
|
38
|
+
const absolutePath = join(cwd, filePath);
|
|
39
|
+
const source = readFileSync(absolutePath, "utf-8");
|
|
40
|
+
const tree = parser.parse(source);
|
|
41
|
+
if (!tree)
|
|
42
|
+
return fallbackChunk(filePath, cwd);
|
|
43
|
+
const chunks = [];
|
|
44
|
+
collectChunks(tree.rootNode, langConfig, filePath, source, chunks);
|
|
45
|
+
// If no AST chunks found, fall back to the whole file
|
|
46
|
+
if (chunks.length === 0) {
|
|
47
|
+
return fallbackChunk(filePath, cwd);
|
|
48
|
+
}
|
|
49
|
+
return chunks;
|
|
50
|
+
}
|
|
51
|
+
function collectChunks(node, langConfig, filePath, source, chunks) {
|
|
52
|
+
if (langConfig.chunkTypes.includes(node.type)) {
|
|
53
|
+
const content = node.text;
|
|
54
|
+
// If too large, recurse into children instead of chunking the whole node
|
|
55
|
+
const estimatedTokens = Math.ceil(content.length / 4);
|
|
56
|
+
if (estimatedTokens > MAX_CHUNK_TOKENS && hasChunkableDescendants(node, langConfig)) {
|
|
57
|
+
for (const child of node.children) {
|
|
58
|
+
collectChunks(child, langConfig, filePath, source, chunks);
|
|
59
|
+
}
|
|
60
|
+
return;
|
|
61
|
+
}
|
|
62
|
+
const name = extractNodeName(node) ?? `anonymous_${node.startPosition.row}`;
|
|
63
|
+
// Skip very small chunks (a single line shorter than 50 characters)
|
|
64
|
+
if (content.split("\n").length < 2 && content.length < 50) {
|
|
65
|
+
return;
|
|
66
|
+
}
|
|
67
|
+
const scope = extractScope(node, langConfig);
|
|
68
|
+
const leadingComment = extractLeadingComment(node, source);
|
|
69
|
+
const role = classifyRole(node);
|
|
70
|
+
const context = buildContextString({
|
|
71
|
+
filePath,
|
|
72
|
+
scope,
|
|
73
|
+
leadingComment,
|
|
74
|
+
role,
|
|
75
|
+
});
|
|
76
|
+
const hash = createHash("sha256")
|
|
77
|
+
.update(content)
|
|
78
|
+
.digest("hex")
|
|
79
|
+
.slice(0, 16);
|
|
80
|
+
chunks.push({
|
|
81
|
+
id: `${filePath}:${node.startPosition.row}:${hash}`,
|
|
82
|
+
filePath,
|
|
83
|
+
startLine: node.startPosition.row + 1,
|
|
84
|
+
endLine: node.endPosition.row + 1,
|
|
85
|
+
type: node.type,
|
|
86
|
+
name,
|
|
87
|
+
content,
|
|
88
|
+
context,
|
|
89
|
+
hash,
|
|
90
|
+
});
|
|
91
|
+
return;
|
|
92
|
+
}
|
|
93
|
+
// Recurse into children
|
|
94
|
+
for (const child of node.children) {
|
|
95
|
+
collectChunks(child, langConfig, filePath, source, chunks);
|
|
96
|
+
}
|
|
97
|
+
}
|
|
98
|
+
function hasChunkableDescendants(node, langConfig) {
|
|
99
|
+
for (const child of node.children) {
|
|
100
|
+
if (langConfig.chunkTypes.includes(child.type))
|
|
101
|
+
return true;
|
|
102
|
+
if (hasChunkableDescendants(child, langConfig))
|
|
103
|
+
return true;
|
|
104
|
+
}
|
|
105
|
+
return false;
|
|
106
|
+
}
|
|
107
|
+
function extractNodeName(node) {
|
|
108
|
+
const nameNode = node.childForFieldName("name") ??
|
|
109
|
+
node.children.find((c) => c.type === "identifier" || c.type === "type_identifier");
|
|
110
|
+
return nameNode?.text;
|
|
111
|
+
}
|
|
112
|
+
/** Fallback: chunk by sliding window for unsupported or empty-parse files */
|
|
113
|
+
function fallbackChunk(filePath, cwd) {
|
|
114
|
+
const absolutePath = join(cwd, filePath);
|
|
115
|
+
const source = readFileSync(absolutePath, "utf-8");
|
|
116
|
+
const lines = source.split("\n");
|
|
117
|
+
if (lines.length === 0)
|
|
118
|
+
return [];
|
|
119
|
+
const WINDOW = 50;
|
|
120
|
+
const STRIDE = 25;
|
|
121
|
+
const chunks = [];
|
|
122
|
+
for (let i = 0; i < lines.length; i += STRIDE) {
|
|
123
|
+
const slice = lines.slice(i, i + WINDOW);
|
|
124
|
+
const content = slice.join("\n");
|
|
125
|
+
if (content.trim().length === 0)
|
|
126
|
+
continue;
|
|
127
|
+
const hash = createHash("sha256")
|
|
128
|
+
.update(content)
|
|
129
|
+
.digest("hex")
|
|
130
|
+
.slice(0, 16);
|
|
131
|
+
chunks.push({
|
|
132
|
+
id: `${filePath}:${i}:${hash}`,
|
|
133
|
+
filePath,
|
|
134
|
+
startLine: i + 1,
|
|
135
|
+
endLine: Math.min(i + WINDOW, lines.length),
|
|
136
|
+
type: "block",
|
|
137
|
+
name: `lines_${i + 1}_${Math.min(i + WINDOW, lines.length)}`,
|
|
138
|
+
content,
|
|
139
|
+
context: `[file: ${filePath}]`,
|
|
140
|
+
hash,
|
|
141
|
+
});
|
|
142
|
+
}
|
|
143
|
+
return chunks;
|
|
144
|
+
}
|
|
145
|
+
//# sourceMappingURL=index.js.map
|
|
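Note on `fallbackChunk` in the diff above: the window arithmetic is easier to follow in isolation. The sketch below is not part of the package. `slidingWindows` is a hypothetical helper that reproduces only the `startLine`/`endLine` computation, with defaults taken from the `WINDOW = 50` and `STRIDE = 25` constants; it leaves out the hashing and the blank-window skip.

```javascript
// Hypothetical helper (not in the package): returns the 1-based line ranges
// that a sliding window covers, using the same arithmetic as fallbackChunk.
function slidingWindows(lineCount, windowSize = 50, stride = 25) {
    const ranges = [];
    for (let i = 0; i < lineCount; i += stride) {
        ranges.push({
            startLine: i + 1,
            // The final windows are clamped to the end of the file.
            endLine: Math.min(i + windowSize, lineCount),
        });
    }
    return ranges;
}

// A 60-line file yields windows starting at lines 1, 26 and 51.
const ranges = slidingWindows(60);
console.log(ranges);
```

Because the stride is half the window, consecutive windows overlap by 25 lines, and the last window of a file (51 to 60 in this example) is fully contained in the one before it (26 to 60).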
@@ -0,0 +1 @@
|
|
|
1
|
+
{"version":3,"file":"index.js","sourceRoot":"","sources":["../../src/chunker/index.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,UAAU,EAAE,MAAM,aAAa,CAAC;AACzC,OAAO,EAAE,YAAY,EAAE,MAAM,SAAS,CAAC;AACvC,OAAO,EAAE,IAAI,EAAE,MAAM,WAAW,CAAC;AACjC,OAAO,MAAM,MAAM,iBAAiB,CAAC;AAErC,OAAO,EACN,kBAAkB,EAClB,YAAY,EACZ,qBAAqB,EACrB,YAAY,GACZ,MAAM,cAAc,CAAC;AACtB,OAAO,EAEN,kBAAkB,EAClB,WAAW,GACX,MAAM,gBAAgB,CAAC;AAExB,MAAM,gBAAgB,GAAG,IAAI,CAAC;AAE9B,IAAI,cAAkC,CAAC;AACvC,MAAM,eAAe,GAAG,IAAI,GAAG,EAA2B,CAAC;AAE3D,KAAK,UAAU,SAAS;IACvB,IAAI,CAAC,cAAc,EAAE,CAAC;QACrB,MAAM,MAAM,CAAC,IAAI,EAAE,CAAC;QACpB,cAAc,GAAG,IAAI,MAAM,EAAE,CAAC;IAC/B,CAAC;IACD,OAAO,cAAc,CAAC;AACvB,CAAC;AAED,KAAK,UAAU,WAAW,CACzB,UAA0B;IAE1B,MAAM,MAAM,GAAG,eAAe,CAAC,GAAG,CAAC,UAAU,CAAC,EAAE,CAAC,CAAC;IAClD,IAAI,MAAM;QAAE,OAAO,MAAM,CAAC;IAE1B,MAAM,QAAQ,GAAG,WAAW,CAAC,UAAU,CAAC,CAAC;IACzC,IAAI,CAAC,QAAQ;QAAE,OAAO,SAAS,CAAC;IAEhC,MAAM,IAAI,GAAG,MAAM,MAAM,CAAC,QAAQ,CAAC,IAAI,CAAC,QAAQ,CAAC,CAAC;IAClD,eAAe,CAAC,GAAG,CAAC,UAAU,CAAC,EAAE,EAAE,IAAI,CAAC,CAAC;IACzC,OAAO,IAAI,CAAC;AACb,CAAC;AAED,uDAAuD;AACvD,MAAM,CAAC,KAAK,UAAU,SAAS,CAC9B,QAAgB,EAChB,GAAW;IAEX,MAAM,UAAU,GAAG,kBAAkB,CAAC,QAAQ,CAAC,CAAC;IAChD,IAAI,CAAC,UAAU;QAAE,OAAO,aAAa,CAAC,QAAQ,EAAE,GAAG,CAAC,CAAC;IAErD,MAAM,MAAM,GAAG,MAAM,SAAS,EAAE,CAAC;IACjC,MAAM,QAAQ,GAAG,MAAM,WAAW,CAAC,UAAU,CAAC,CAAC;IAC/C,IAAI,CAAC,QAAQ;QAAE,OAAO,aAAa,CAAC,QAAQ,EAAE,GAAG,CAAC,CAAC;IAEnD,MAAM,CAAC,WAAW,CAAC,QAAQ,CAAC,CAAC;IAE7B,MAAM,YAAY,GAAG,IAAI,CAAC,GAAG,EAAE,QAAQ,CAAC,CAAC;IACzC,MAAM,MAAM,GAAG,YAAY,CAAC,YAAY,EAAE,OAAO,CAAC,CAAC;IACnD,MAAM,IAAI,GAAG,MAAM,CAAC,KAAK,CAAC,MAAM,CAAC,CAAC;IAClC,IAAI,CAAC,IAAI;QAAE,OAAO,aAAa,CAAC,QAAQ,EAAE,GAAG,CAAC,CAAC;IAE/C,MAAM,MAAM,GAAY,EAAE,CAAC;IAE3B,aAAa,CAAC,IAAI,CAAC,QAAQ,EAAE,UAAU,EAAE,QAAQ,EAAE,MAAM,EAAE,MAAM,CAAC,CAAC;IAEnE,sDAAsD;IACtD,IAAI,MAAM,CAAC,MAAM,KAAK,CAAC,EAAE,CAAC;QACzB,OAAO,aAAa,CAAC,QAAQ,EAAE,GAAG,CAAC,CAAC;IACrC,CAAC;IAED,OAAO,MAAM,CAAC;AACf,CAAC;AAED,SAAS,aAAa,CACrB,IAAuB,EACvB,UAA0B,EAC1B,QAAgB,EAChB,MAAc,EACd,MAAe;IAEf,IAAI,UA
AU,CAAC,UAAU,CAAC,QAAQ,CAAC,IAAI,CAAC,IAAI,CAAC,EAAE,CAAC;QAC/C,MAAM,OAAO,GAAG,IAAI,CAAC,IAAI,CAAC;QAE1B,yEAAyE;QACzE,MAAM,eAAe,GAAG,IAAI,CAAC,IAAI,CAAC,OAAO,CAAC,MAAM,GAAG,CAAC,CAAC,CAAC;QACtD,IAAI,eAAe,GAAG,gBAAgB,IAAI,uBAAuB,CAAC,IAAI,EAAE,UAAU,CAAC,EAAE,CAAC;YACrF,KAAK,MAAM,KAAK,IAAI,IAAI,CAAC,QAAQ,EAAE,CAAC;gBACnC,aAAa,CAAC,KAAK,EAAE,UAAU,EAAE,QAAQ,EAAE,MAAM,EAAE,MAAM,CAAC,CAAC;YAC5D,CAAC;YACD,OAAO;QACR,CAAC;QAED,MAAM,IAAI,GAAG,eAAe,CAAC,IAAI,CAAC,IAAI,aAAa,IAAI,CAAC,aAAa,CAAC,GAAG,EAAE,CAAC;QAE5E,6CAA6C;QAC7C,IAAI,OAAO,CAAC,KAAK,CAAC,IAAI,CAAC,CAAC,MAAM,GAAG,CAAC,IAAI,OAAO,CAAC,MAAM,GAAG,EAAE,EAAE,CAAC;YAC3D,OAAO;QACR,CAAC;QAED,MAAM,KAAK,GAAG,YAAY,CAAC,IAAI,EAAE,UAAU,CAAC,CAAC;QAC7C,MAAM,cAAc,GAAG,qBAAqB,CAAC,IAAI,EAAE,MAAM,CAAC,CAAC;QAC3D,MAAM,IAAI,GAAG,YAAY,CAAC,IAAI,CAAC,CAAC;QAEhC,MAAM,OAAO,GAAG,kBAAkB,CAAC;YAClC,QAAQ;YACR,KAAK;YACL,cAAc;YACd,IAAI;SACJ,CAAC,CAAC;QAEH,MAAM,IAAI,GAAG,UAAU,CAAC,QAAQ,CAAC;aAC/B,MAAM,CAAC,OAAO,CAAC;aACf,MAAM,CAAC,KAAK,CAAC;aACb,KAAK,CAAC,CAAC,EAAE,EAAE,CAAC,CAAC;QAEf,MAAM,CAAC,IAAI,CAAC;YACX,EAAE,EAAE,GAAG,QAAQ,IAAI,IAAI,CAAC,aAAa,CAAC,GAAG,IAAI,IAAI,EAAE;YACnD,QAAQ;YACR,SAAS,EAAE,IAAI,CAAC,aAAa,CAAC,GAAG,GAAG,CAAC;YACrC,OAAO,EAAE,IAAI,CAAC,WAAW,CAAC,GAAG,GAAG,CAAC;YACjC,IAAI,EAAE,IAAI,CAAC,IAAI;YACf,IAAI;YACJ,OAAO;YACP,OAAO;YACP,IAAI;SACJ,CAAC,CAAC;QACH,OAAO;IACR,CAAC;IAED,wBAAwB;IACxB,KAAK,MAAM,KAAK,IAAI,IAAI,CAAC,QAAQ,EAAE,CAAC;QACnC,aAAa,CAAC,KAAK,EAAE,UAAU,EAAE,QAAQ,EAAE,MAAM,EAAE,MAAM,CAAC,CAAC;IAC5D,CAAC;AACF,CAAC;AAED,SAAS,uBAAuB,CAC/B,IAAuB,EACvB,UAA0B;IAE1B,KAAK,MAAM,KAAK,IAAI,IAAI,CAAC,QAAQ,EAAE,CAAC;QACnC,IAAI,UAAU,CAAC,UAAU,CAAC,QAAQ,CAAC,KAAK,CAAC,IAAI,CAAC;YAAE,OAAO,IAAI,CAAC;QAC5D,IAAI,uBAAuB,CAAC,KAAK,EAAE,UAAU,CAAC;YAAE,OAAO,IAAI,CAAC;IAC7D,CAAC;IACD,OAAO,KAAK,CAAC;AACd,CAAC;AAED,SAAS,eAAe,CAAC,IAAuB;IAC/C,MAAM,QAAQ,GACb,IAAI,CAAC,iBAAiB,CAAC,MAAM,CAAC;QAC9B,IAAI,CAAC,QAAQ,CAAC,IAAI,CACjB,CAAC,CAAoB,EAAE,EAAE,CAAC,CAAC,CAAC,IAAI,KAAK,YAAY,IAAI,CAAC,CAAC,IAAI,KAAK,iBAAiB,CACjF,CAAC;IACH,OAAO,QAAQ,EAAE,IAAI,CAAC
;AACvB,CAAC;AAED,6EAA6E;AAC7E,SAAS,aAAa,CAAC,QAAgB,EAAE,GAAW;IACnD,MAAM,YAAY,GAAG,IAAI,CAAC,GAAG,EAAE,QAAQ,CAAC,CAAC;IACzC,MAAM,MAAM,GAAG,YAAY,CAAC,YAAY,EAAE,OAAO,CAAC,CAAC;IACnD,MAAM,KAAK,GAAG,MAAM,CAAC,KAAK,CAAC,IAAI,CAAC,CAAC;IAEjC,IAAI,KAAK,CAAC,MAAM,KAAK,CAAC;QAAE,OAAO,EAAE,CAAC;IAElC,MAAM,MAAM,GAAG,EAAE,CAAC;IAClB,MAAM,MAAM,GAAG,EAAE,CAAC;IAClB,MAAM,MAAM,GAAY,EAAE,CAAC;IAE3B,KAAK,IAAI,CAAC,GAAG,CAAC,EAAE,CAAC,GAAG,KAAK,CAAC,MAAM,EAAE,CAAC,IAAI,MAAM,EAAE,CAAC;QAC/C,MAAM,KAAK,GAAG,KAAK,CAAC,KAAK,CAAC,CAAC,EAAE,CAAC,GAAG,MAAM,CAAC,CAAC;QACzC,MAAM,OAAO,GAAG,KAAK,CAAC,IAAI,CAAC,IAAI,CAAC,CAAC;QACjC,IAAI,OAAO,CAAC,IAAI,EAAE,CAAC,MAAM,KAAK,CAAC;YAAE,SAAS;QAE1C,MAAM,IAAI,GAAG,UAAU,CAAC,QAAQ,CAAC;aAC/B,MAAM,CAAC,OAAO,CAAC;aACf,MAAM,CAAC,KAAK,CAAC;aACb,KAAK,CAAC,CAAC,EAAE,EAAE,CAAC,CAAC;QAEf,MAAM,CAAC,IAAI,CAAC;YACX,EAAE,EAAE,GAAG,QAAQ,IAAI,CAAC,IAAI,IAAI,EAAE;YAC9B,QAAQ;YACR,SAAS,EAAE,CAAC,GAAG,CAAC;YAChB,OAAO,EAAE,IAAI,CAAC,GAAG,CAAC,CAAC,GAAG,MAAM,EAAE,KAAK,CAAC,MAAM,CAAC;YAC3C,IAAI,EAAE,OAAO;YACb,IAAI,EAAE,SAAS,CAAC,GAAG,CAAC,IAAI,IAAI,CAAC,GAAG,CAAC,CAAC,GAAG,MAAM,EAAE,KAAK,CAAC,MAAM,CAAC,EAAE;YAC5D,OAAO;YACP,OAAO,EAAE,UAAU,QAAQ,GAAG;YAC9B,IAAI;SACJ,CAAC,CAAC;IACJ,CAAC;IAED,OAAO,MAAM,CAAC;AACf,CAAC"}
|
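Note on `collectChunks` in `dist/chunker/index.js`: a node of a chunkable type is emitted whole unless its estimated token count exceeds `MAX_CHUNK_TOKENS` and it has chunkable descendants, in which case only the descendants are considered. The sketch below illustrates that rule on plain objects rather than tree-sitter nodes. The limit of 10, the `name` field, and the `collectNames` helper are assumptions made for the example; the package's actual `MAX_CHUNK_TOKENS` value is defined outside this excerpt, and the small-chunk skip is omitted.

```javascript
// Assumed value for illustration only; not the package's real limit.
const MAX_CHUNK_TOKENS = 10;

// Same rough estimate as collectChunks: about four characters per token.
function estimateTokens(text) {
    return Math.ceil(text.length / 4);
}

// True if any descendant (at any depth) has a chunkable type.
function hasChunkableDescendants(node, chunkTypes) {
    return node.children.some(
        (child) => chunkTypes.includes(child.type) || hasChunkableDescendants(child, chunkTypes)
    );
}

// Hypothetical helper: collects the names of the nodes that would become chunks.
function collectNames(node, chunkTypes, out = []) {
    if (chunkTypes.includes(node.type)) {
        const tooLarge = estimateTokens(node.text) > MAX_CHUNK_TOKENS;
        if (!(tooLarge && hasChunkableDescendants(node, chunkTypes))) {
            out.push(node.name); // emit the whole node and stop descending
            return out;
        }
    }
    // Either not chunkable, or too large with chunkable descendants: descend.
    for (const child of node.children) {
        collectNames(child, chunkTypes, out);
    }
    return out;
}

const method = (name, text) => ({ type: "method", name, text, children: [] });
const tree = {
    type: "class",
    name: "Big",
    text: "x".repeat(100), // 25 estimated tokens, over the assumed limit
    children: [method("a", "x".repeat(20)), method("b", "x".repeat(20))],
};
console.log(collectNames(tree, ["class", "method"]));
```

With these inputs the oversized class is replaced by its two methods. A class under the limit would be emitted as a single chunk and its methods would not be visited.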