raggrep 0.15.0 → 0.17.0
- package/README.md +112 -8
- package/dist/cli/main.js +827 -521
- package/dist/cli/main.js.map +26 -19
- package/dist/domain/ports/embedding.d.ts +10 -0
- package/dist/domain/ports/index.d.ts +1 -1
- package/dist/domain/services/chunkContext.d.ts +76 -0
- package/dist/domain/services/index.d.ts +1 -0
- package/dist/index.js +638 -390
- package/dist/index.js.map +25 -18
- package/dist/infrastructure/config/configLoader.d.ts +9 -11
- package/dist/infrastructure/config/index.d.ts +1 -1
- package/dist/infrastructure/embeddings/embeddingPaths.d.ts +6 -0
- package/dist/infrastructure/embeddings/embeddingProviderFactory.d.ts +9 -0
- package/dist/infrastructure/embeddings/globalEmbeddings.d.ts +28 -0
- package/dist/infrastructure/embeddings/huggingfaceEmbeddingProvider.d.ts +21 -0
- package/dist/infrastructure/embeddings/index.d.ts +9 -2
- package/dist/infrastructure/embeddings/modelCache.d.ts +10 -0
- package/dist/infrastructure/embeddings/modelCatalog.d.ts +23 -0
- package/dist/infrastructure/embeddings/xenovaEmbeddingProvider.d.ts +23 -0
- package/dist/infrastructure/index.d.ts +1 -1
- package/package.json +7 -3
- package/dist/infrastructure/embeddings/transformersEmbedding.d.ts +0 -52
--- package/README.md
+++ package/README.md
@@ -12,7 +12,9 @@ RAGgrep indexes your code and lets you search it using natural language. Everyth
 - **Local-first** — All indexing and search happens on your machine. No cloud dependencies.
 - **Incremental** — Only re-indexes files that have changed. Instant search when nothing changed.
 - **Watch mode** — Keep the index fresh in real-time as you code.
-- **Hybrid search** — Combines semantic similarity
+- **Hybrid search** — Combines semantic similarity, keyword matching, and exact text matching for best results.
+- **Exact match track** — Finds identifiers in ANY file type (YAML, .env, config, not just code) with grep-like precision.
+- **Fusion boosting** — Semantic results containing exact matches get boosted (1.5x) for better ranking.
 - **Literal boosting** — Exact identifier matches get priority. Use backticks for precise matching: `` `AuthService` ``.
 - **Phrase matching** — Exact phrases in documentation are found even when semantic similarity is low.
 - **Semantic expansion** — Domain-specific synonyms improve recall (function ↔ method, auth ↔ authentication).
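The **Fusion boosting** entry added in the hunk above can be pictured with a small sketch. This is a minimal TypeScript illustration, assuming a hypothetical `SemanticResult` shape and a hypothetical `applyFusionBoost` helper. Only the 1.5x multiplier comes from the README; RAGgrep's actual implementation is not shown in this diff and may differ (for instance, it may cap or normalize scores).

```typescript
// Hypothetical result shape for illustration; RAGgrep's internal types are not shown in this diff.
interface SemanticResult {
  filepath: string;
  content: string;
  score: number; // similarity score, assumed to be in [0, 1] before boosting
}

// The 1.5x multiplier is the value stated in the README.
const EXACT_MATCH_BOOST = 1.5;

// Multiply the score of every semantic result whose text contains the
// queried identifier, then re-sort so the highest score comes first.
function applyFusionBoost(
  results: SemanticResult[],
  identifier: string
): SemanticResult[] {
  return results
    .map((r) =>
      r.content.includes(identifier)
        ? { ...r, score: r.score * EXACT_MATCH_BOOST }
        : r
    )
    .sort((a, b) => b.score - a.score);
}

// Usage: the second result contains the identifier, so it overtakes the first.
const ranked = applyFusionBoost(
  [
    { filepath: "src/users/types.ts", content: "export interface User {}", score: 0.5 },
    { filepath: "src/auth/authService.ts", content: "private baseUrl = AUTH_SERVICE_URL;", score: 0.375 },
  ],
  "AUTH_SERVICE_URL"
);
console.log(ranked.map((r) => `${r.filepath} ${r.score}`));
```

In this sketch a plain substring check stands in for the exact match track; a real implementation would more likely reuse the matches that track has already found.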
@@ -40,6 +42,59 @@ That's it. The first query creates the index automatically. Subsequent queries a
 
 ### Example Output
 
+**Natural Language Query:**
+```
+Index updated: 42 indexed
+
+RAGgrep Search
+=============
+
+Searching for: "user authentication"
+
+Found 3 results:
+
+1. src/auth/authService.ts:24-55 (login)
+Score: 34.4% | Type: function | via TypeScript | exported
+export async function login(credentials: LoginCredentials): Promise<AuthResult> {
+const { email, password } = credentials;
+
+2. src/auth/session.ts:10-25 (createSession)
+Score: 28.2% | Type: function | via TypeScript | exported
+export function createSession(user: User): Session {
+
+3. src/users/types.ts:3-12 (User)
+Score: 26.0% | Type: interface | via TypeScript | exported
+export interface User {
+id: string;
+```
+
+**Exact Identifier Query (shows both tracks):**
+```
+Index updated: 42 indexed
+
+Searching for: "AUTH_SERVICE_URL"
+
+┌─ Exact Matches (4 files, 6 matches) ─┐
+│ Query: "AUTH_SERVICE_URL"
+└─────────────────────────────────────────────────────────────────────┘
+
+1. config.yaml (2 matches)
+8 │ auth:
+9 │ url: AUTH_SERVICE_URL
+► 10 │ grpc_url: AUTH_SERVICE_GRPC_URL
+11 │ timeout: 5000
+
+2. .env.example (1 match)
+2 │ AUTH_SERVICE_URL=https://auth.example.com
+► 3 │ AUTH_SERVICE_GRPC_URL=grpc://auth.example.com:9000
+
+┌─ Semantic Results (boosted by exact matches) ─┐
+└─────────────────────────────────────────────────────────────────────┘
+
+1. src/auth/authService.ts:2-10 (AuthService)
+Score: 45.2% | Type: class | via TypeScript | exported | exact match
+export class AuthService {
+private baseUrl = AUTH_SERVICE_URL;
 ```
 Index updated: 42 indexed
 
@@ -97,17 +152,20 @@ raggrep reset # Clear the index
 ### Query Options
 
 ```bash
-raggrep query "user login" #
+raggrep query "user login" # Natural language query
+raggrep query -C ~/projects/my-app "login" # Search a project without cd
+raggrep query "AUTH_SERVICE_URL" # Exact identifier (auto-triggers exact match)
+raggrep query "\`AuthService\`" # Backticks force exact match
 raggrep query "error handling" --top 5 # Limit results
 raggrep query "database" --min-score 0.2 # Set minimum score threshold
 raggrep query "interface" --type ts # Filter by file extension
 raggrep query "auth" --filter src/auth # Filter by path
 raggrep query "api" -f src/api -f src/routes # Multiple path filters
-raggrep query "\`AuthService\` class" # Exact identifier match (backticks)
 ```
 
 | Flag | Short | Description |
 | ----------------- | ----- | ---------------------------------------------------------- |
+| `--dir <path>` | `-C` | Project directory to search (default: current directory) |
 | `--top <n>` | `-k` | Number of results to return (default: 10) |
 | `--min-score <n>` | `-s` | Minimum similarity score 0-1 (default: 0.15) |
 | `--type <ext>` | `-t` | Filter by file extension (e.g., ts, tsx, js) |
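The query examples in the hunk above say that an identifier-like query auto-triggers exact matching and that backticks force it. A minimal TypeScript sketch of how such a decision could be made follows. The regular expressions and the `classifyQuery` helper are hypothetical: the rules RAGgrep actually applies are not shown in this diff, and this sketch only handles queries that are backticked as a whole, not backticks embedded in a longer phrase.

```typescript
// Hypothetical patterns for the three naming styles the README mentions.
const SCREAMING_SNAKE = /^[A-Z][A-Z0-9]*(_[A-Z0-9]+)+$/; // e.g. AUTH_SERVICE_URL
const CAMEL_CASE = /^[a-z][a-z0-9]*([A-Z][a-z0-9]*)+$/; // e.g. getUserById
const PASCAL_CASE = /^([A-Z][a-z0-9]+){2,}$/; // e.g. AuthService
const BACKTICKED = /^`[^`]+`$/; // e.g. `configuration`

interface QueryMode {
  exact: boolean; // whether the exact match track should run
  term: string; // the term to search for, with backticks removed
}

// Decide whether a query should trigger exact matching.
function classifyQuery(query: string): QueryMode {
  const trimmed = query.trim();
  // Backticks force exact matching, even for natural-language words.
  if (BACKTICKED.test(trimmed)) {
    return { exact: true, term: trimmed.slice(1, -1) };
  }
  // Otherwise, exact matching runs only for identifier-like queries.
  const looksLikeIdentifier = [SCREAMING_SNAKE, CAMEL_CASE, PASCAL_CASE].some(
    (pattern) => pattern.test(trimmed)
  );
  return { exact: looksLikeIdentifier, term: trimmed };
}

for (const q of ["AUTH_SERVICE_URL", "getUserById", "`configuration`", "user login"]) {
  console.log(q, classifyQuery(q));
}
```

Under these assumed patterns, a plain word such as `configuration` stays on the semantic track unless it is wrapped in backticks, which mirrors the behaviour the README describes.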
@@ -150,10 +208,35 @@ raggrep query "config" --filter "*.json" --filter "*.yaml" --filter config/
 
 This is useful when you know whether you're looking for code or documentation.
 
+### Exact Match Search
+
+For identifier-like queries (SCREAMING_SNAKE_CASE, camelCase, PascalCase), RAGgrep automatically runs exact match search:
+
+```bash
+# Finds AUTH_SERVICE_URL in ALL file types (YAML, .env, config, etc.)
+raggrep query "AUTH_SERVICE_URL"
+
+# Finds the function by exact name
+raggrep query "getUserById"
+
+# Use backticks for explicit exact matching (even natural words)
+raggrep query "`configuration`"
+```
+
+**What Gets Searched:**
+- **Source code**: `.ts`, `.js`, `.py`, `.go`, `.rs`
+- **Config files**: `.yaml`, `.yml`, `.json`, `.toml`, `.env`
+- **Documentation**: `.md`, `.txt`
+
+**Ignored:** `node_modules`, `.git`, `dist`, `build`, `.cache`, etc.
+
+Exact matches are shown in a separate section with line numbers and context. Semantic results containing the same identifier get boosted (1.5x score multiplier).
+
 ### Index Options
 
 ```bash
 raggrep index # Index current directory
+raggrep index --dir ../other-repo # Index another path without cd
 raggrep index --watch # Watch mode - re-index on file changes
 raggrep index --verbose # Show detailed progress
 raggrep index --concurrency 8 # Set parallel workers (default: auto)
@@ -162,18 +245,21 @@ raggrep index --model bge-small-en-v1.5 # Use specific embedding model
 
 | Flag | Short | Description |
 | ------------------- | ----- | ------------------------------------------------------- |
+| `--dir <path>` | `-C` | Project directory to index (default: current directory) |
 | `--watch` | `-w` | Watch for file changes and re-index automatically |
 | `--verbose` | `-v` | Show detailed progress |
 | `--concurrency <n>` | `-c` | Number of parallel workers (default: auto based on CPU) |
-| `--model <name>` | `-m` |
+| `--model <name>` | `-m` | Override TypeScript module embedding model (saved config otherwise) |
 | `--help` | `-h` | Show help message |
 
 ### Other Commands
 
 ```bash
-raggrep status
-raggrep
-raggrep
+raggrep status # Show index status and statistics
+raggrep status --dir ./packages/api
+raggrep reset # Clear the index for the current directory
+raggrep reset -C ~/projects/my-app
+raggrep --version # Show version
 ```
 
 ## How It Works
@@ -183,7 +269,25 @@ raggrep --version # Show version
 3. **Files changed** — Re-indexes only modified files automatically
 4. **Files deleted** — Stale entries cleaned up automatically
 
-The index is stored in
+The index is stored under **`.raggrep/`** in the project directory you index or pass with **`--dir` / `-C`** (by default, the current working directory). Add `.raggrep/` to `.gitignore` if you do not want index files in version control.
+
+## Embeddings and benchmarks
+
+Indexing uses Transformers.js–style **local ONNX** models. Unless you change `.raggrep/config.json` or pass **`raggrep index --model`**, a fresh install uses this stack:
+
+| | Default |
+|---|--------|
+| **Runtime** | **`huggingface`** (`@huggingface/transformers`). Set `embeddingRuntime` to `"xenova"` on a module in `.raggrep/config.json` to use `@xenova/transformers` instead. |
+| **Model** | **`bge-small-en-v1.5`** on each embedding-backed module (TypeScript, Python, Go, Rust, JSON, markdown). |
+
+**Benchmarks** (clone [next-convex-starter-app](https://github.com/conradkoh/next-convex-starter-app) at a pinned commit; see each script for options):
+
+| Command | What it measures | Source |
+|--------|------------------|--------|
+| `bun run bench:embeddings` | Embedding throughput (runtime × model matrix; **nomic** omitted from the harness for now) | [`scripts/benchmark-embedding-runtimes.ts`](./scripts/benchmark-embedding-runtimes.ts) |
+| `bun run bench:retrieval` | Index + hybrid search time and accuracy vs golden queries | [`scripts/benchmark-retrieval-quality.ts`](./scripts/benchmark-retrieval-quality.ts) |
+
+Golden query set: [`scripts/eval/golden-queries-next-convex.json`](./scripts/eval/golden-queries-next-convex.json). Retrieval runs write `scripts/benchmarks/<benchmark-name>.result.md` and resumable `*.cache.json` (ignored by git by default).
 
 ## What Gets Indexed
 
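The "Embeddings and benchmarks" text in the hunk above describes a precedence: a `--model` flag overrides the saved configuration, which in turn overrides the fresh-install defaults. A minimal TypeScript sketch of that resolution order follows. The `EmbeddingSettings` shape, the `model` field name, and the `resolveEmbeddingSettings` helper are assumptions made for illustration; only the `embeddingRuntime` key, its two values, and the default model name come from the README, and the real schema of `.raggrep/config.json` is not shown in this diff.

```typescript
// Hypothetical per-module settings; the real config schema may differ.
interface EmbeddingSettings {
  embeddingRuntime: "huggingface" | "xenova";
  model: string; // assumed field name
}

// Fresh-install defaults as stated in the README.
const DEFAULTS: EmbeddingSettings = {
  embeddingRuntime: "huggingface",
  model: "bge-small-en-v1.5",
};

// Resolve settings in the order: CLI --model, then saved config, then defaults.
function resolveEmbeddingSettings(
  saved: Partial<EmbeddingSettings> = {},
  cliModel?: string
): EmbeddingSettings {
  return {
    embeddingRuntime: saved.embeddingRuntime ?? DEFAULTS.embeddingRuntime,
    model: cliModel ?? saved.model ?? DEFAULTS.model,
  };
}

// Usage: no saved config and no flag falls back to the defaults.
console.log(resolveEmbeddingSettings());
// A saved runtime is kept while a CLI model (placeholder name) takes priority.
console.log(resolveEmbeddingSettings({ embeddingRuntime: "xenova" }, "some-other-model"));
```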