raggrep 0.15.0 → 0.17.0

package/README.md CHANGED
@@ -12,7 +12,9 @@ RAGgrep indexes your code and lets you search it using natural language. Everyth
  - **Local-first** — All indexing and search happens on your machine. No cloud dependencies.
  - **Incremental** — Only re-indexes files that have changed. Instant search when nothing changed.
  - **Watch mode** — Keep the index fresh in real-time as you code.
- - **Hybrid search** — Combines semantic similarity with keyword matching for best results.
+ - **Hybrid search** — Combines semantic similarity, keyword matching, and exact text matching for best results.
+ - **Exact match track** — Finds identifiers in ANY file type (YAML, .env, config, not just code) with grep-like precision.
+ - **Fusion boosting** — Semantic results containing exact matches get boosted (1.5x) for better ranking.
  - **Literal boosting** — Exact identifier matches get priority. Use backticks for precise matching: `` `AuthService` ``.
  - **Phrase matching** — Exact phrases in documentation are found even when semantic similarity is low.
  - **Semantic expansion** — Domain-specific synonyms improve recall (function ↔ method, auth ↔ authentication).
@@ -40,6 +42,59 @@ That's it. The first query creates the index automatically. Subsequent queries a
 
  ### Example Output
 
+ **Natural Language Query:**
+ ```
+ Index updated: 42 indexed
+
+ RAGgrep Search
+ =============
+
+ Searching for: "user authentication"
+
+ Found 3 results:
+
+ 1. src/auth/authService.ts:24-55 (login)
+ Score: 34.4% | Type: function | via TypeScript | exported
+ export async function login(credentials: LoginCredentials): Promise<AuthResult> {
+ const { email, password } = credentials;
+
+ 2. src/auth/session.ts:10-25 (createSession)
+ Score: 28.2% | Type: function | via TypeScript | exported
+ export function createSession(user: User): Session {
+
+ 3. src/users/types.ts:3-12 (User)
+ Score: 26.0% | Type: interface | via TypeScript | exported
+ export interface User {
+ id: string;
+ ```
+
+ **Exact Identifier Query (shows both tracks):**
+ ```
+ Index updated: 42 indexed
+
+ Searching for: "AUTH_SERVICE_URL"
+
+ ┌─ Exact Matches (4 files, 6 matches) ─┐
+ │ Query: "AUTH_SERVICE_URL"
+ └─────────────────────────────────────────────────────────────────────┘
+
+ 1. config.yaml (2 matches)
+ 8 │ auth:
+ 9 │ url: AUTH_SERVICE_URL
+ ► 10 │ grpc_url: AUTH_SERVICE_GRPC_URL
+ 11 │ timeout: 5000
+
+ 2. .env.example (1 match)
+ 2 │ AUTH_SERVICE_URL=https://auth.example.com
+ ► 3 │ AUTH_SERVICE_GRPC_URL=grpc://auth.example.com:9000
+
+ ┌─ Semantic Results (boosted by exact matches) ─┐
+ └─────────────────────────────────────────────────────────────────────┘
+
+ 1. src/auth/authService.ts:2-10 (AuthService)
+ Score: 45.2% | Type: class | via TypeScript | exported | exact match
+ export class AuthService {
+ private baseUrl = AUTH_SERVICE_URL;
  ```
  Index updated: 42 indexed
 
@@ -97,17 +152,20 @@ raggrep reset # Clear the index
  ### Query Options
 
  ```bash
- raggrep query "user login" # Basic search
+ raggrep query "user login" # Natural language query
+ raggrep query -C ~/projects/my-app "login" # Search a project without cd
+ raggrep query "AUTH_SERVICE_URL" # Exact identifier (auto-triggers exact match)
+ raggrep query "\`AuthService\`" # Backticks force exact match
  raggrep query "error handling" --top 5 # Limit results
  raggrep query "database" --min-score 0.2 # Set minimum score threshold
  raggrep query "interface" --type ts # Filter by file extension
  raggrep query "auth" --filter src/auth # Filter by path
  raggrep query "api" -f src/api -f src/routes # Multiple path filters
- raggrep query "\`AuthService\` class" # Exact identifier match (backticks)
  ```
 
  | Flag | Short | Description |
  | ----------------- | ----- | ---------------------------------------------------------- |
+ | `--dir <path>` | `-C` | Project directory to search (default: current directory) |
  | `--top <n>` | `-k` | Number of results to return (default: 10) |
  | `--min-score <n>` | `-s` | Minimum similarity score 0-1 (default: 0.15) |
  | `--type <ext>` | `-t` | Filter by file extension (e.g., ts, tsx, js) |
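Reviewer note: the `--top` and `--min-score` flags in the table above can be read as a threshold filter followed by a top-k cut. The sketch below illustrates that reading only. The `ScoredResult` shape and the `selectResults` function are hypothetical and are not taken from RAGgrep's source. Only the defaults (10 and 0.15) come from the table, and treating the threshold as inclusive is an assumption.

```typescript
// Hypothetical result shape; only the defaults (10 and 0.15) come from the flag table.
interface ScoredResult {
  file: string;
  score: number; // similarity in the range 0-1
}

// Assumed behavior: drop results under the threshold, rank by score, keep the best `top`.
function selectResults(
  results: ScoredResult[],
  top: number = 10,
  minScore: number = 0.15,
): ScoredResult[] {
  return results
    .filter((r) => r.score >= minScore) // inclusive threshold is an assumption
    .sort((a, b) => b.score - a.score) // best match first
    .slice(0, top); // keep at most `top` results
}
```

Under this reading, `--top 5 --min-score 0.2` would correspond to `selectResults(results, 5, 0.2)`.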
@@ -150,10 +208,35 @@ raggrep query "config" --filter "*.json" --filter "*.yaml" --filter config/
 
  This is useful when you know whether you're looking for code or documentation.
 
+ ### Exact Match Search
+
+ For identifier-like queries (SCREAMING_SNAKE_CASE, camelCase, PascalCase), RAGgrep automatically runs exact match search:
+
+ ```bash
+ # Finds AUTH_SERVICE_URL in ALL file types (YAML, .env, config, etc.)
+ raggrep query "AUTH_SERVICE_URL"
+
+ # Finds the function by exact name
+ raggrep query "getUserById"
+
+ # Use backticks for explicit exact matching (even natural words)
+ raggrep query "\`configuration\`"
+ ```
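Reviewer note: the added text says identifier-like queries (SCREAMING_SNAKE_CASE, camelCase, PascalCase) and backtick-wrapped queries trigger the exact match track. A minimal sketch of such a classifier follows. The regular expressions and the `looksLikeIdentifier` name are assumptions made for illustration, and RAGgrep's actual heuristics may differ (for example, in how a single capitalized word is treated).

```typescript
// Hypothetical patterns; the README names the three styles but not the exact rules.
const SCREAMING_SNAKE = /^[A-Z][A-Z0-9]*(_[A-Z0-9]+)+$/; // AUTH_SERVICE_URL
const CAMEL_CASE = /^[a-z][a-z0-9]*([A-Z][a-z0-9]*)+$/; // getUserById
const PASCAL_CASE = /^([A-Z][a-z0-9]+){2,}$/; // AuthService (two or more humps assumed)
const BACKTICKED = /^`[^`]+`$/; // `configuration`

// Returns true when the query would, under these assumptions, take the exact match track.
function looksLikeIdentifier(query: string): boolean {
  const q = query.trim();
  return (
    BACKTICKED.test(q) ||
    SCREAMING_SNAKE.test(q) ||
    CAMEL_CASE.test(q) ||
    PASCAL_CASE.test(q)
  );
}
```

With these patterns, a plain phrase such as `user login` stays on the semantic track, while wrapping a natural word in backticks forces exact matching, which matches the behavior the examples describe.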
+
+ **What Gets Searched:**
+ - **Source code**: `.ts`, `.js`, `.py`, `.go`, `.rs`
+ - **Config files**: `.yaml`, `.yml`, `.json`, `.toml`, `.env`
+ - **Documentation**: `.md`, `.txt`
+
+ **Ignored:** `node_modules`, `.git`, `dist`, `build`, `.cache`, etc.
+
+ Exact matches are shown in a separate section with line numbers and context. Semantic results containing the same identifier get boosted (1.5x score multiplier).
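Reviewer note: the 1.5x boost described above can be pictured as a post-processing pass over the semantic results. The sketch below is an illustration only. The `SemanticResult` shape and the `applyFusionBoost` function are hypothetical, and only the 1.5 multiplier comes from the README. Whether RAGgrep caps boosted scores or matches on something other than a plain substring is not stated, so neither is modeled here.

```typescript
// Hypothetical result shape for illustration.
interface SemanticResult {
  file: string;
  score: number; // semantic similarity before boosting
  text: string; // chunk content the score was computed for
}

const EXACT_MATCH_BOOST = 1.5; // multiplier stated in the README

// Multiply the score of every result whose text contains the identifier, then re-rank.
function applyFusionBoost(
  results: SemanticResult[],
  identifier: string,
): SemanticResult[] {
  return results
    .map((r) =>
      r.text.includes(identifier)
        ? { ...r, score: r.score * EXACT_MATCH_BOOST }
        : r,
    )
    .sort((a, b) => b.score - a.score); // highest score first after boosting
}
```

In this sketch a chunk scoring 0.30 that contains `AUTH_SERVICE_URL` is lifted to about 0.45 and can overtake an unboosted chunk scoring 0.40.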
+
  ### Index Options
 
  ```bash
  raggrep index # Index current directory
+ raggrep index --dir ../other-repo # Index another path without cd
  raggrep index --watch # Watch mode - re-index on file changes
  raggrep index --verbose # Show detailed progress
  raggrep index --concurrency 8 # Set parallel workers (default: auto)
@@ -162,18 +245,21 @@ raggrep index --model bge-small-en-v1.5 # Use specific embedding model
 
  | Flag | Short | Description |
  | ------------------- | ----- | ------------------------------------------------------- |
+ | `--dir <path>` | `-C` | Project directory to index (default: current directory) |
  | `--watch` | `-w` | Watch for file changes and re-index automatically |
  | `--verbose` | `-v` | Show detailed progress |
  | `--concurrency <n>` | `-c` | Number of parallel workers (default: auto based on CPU) |
- | `--model <name>` | `-m` | Embedding model to use |
+ | `--model <name>` | `-m` | Override TypeScript module embedding model (saved config otherwise) |
  | `--help` | `-h` | Show help message |
 
  ### Other Commands
 
  ```bash
- raggrep status # Show index status and statistics
- raggrep reset # Clear the index completely
- raggrep --version # Show version
+ raggrep status # Show index status and statistics
+ raggrep status --dir ./packages/api
+ raggrep reset # Clear the index for the current directory
+ raggrep reset -C ~/projects/my-app
+ raggrep --version # Show version
  ```
 
  ## How It Works
@@ -183,7 +269,25 @@ raggrep --version # Show version
  3. **Files changed** — Re-indexes only modified files automatically
  4. **Files deleted** — Stale entries cleaned up automatically
 
- The index is stored in a system temp directory, keeping your project clean.
+ The index is stored under **`.raggrep/`** in the project directory you index or pass with **`--dir` / `-C`** (by default, the current working directory). Add `.raggrep/` to `.gitignore` if you do not want index files in version control.
+
+ ## Embeddings and benchmarks
+
+ Indexing uses Transformers.js–style **local ONNX** models. Unless you change `.raggrep/config.json` or pass **`raggrep index --model`**, a fresh install uses this stack:
+
+ | | Default |
+ |---|--------|
+ | **Runtime** | **`huggingface`** (`@huggingface/transformers`). Set `embeddingRuntime` to `"xenova"` on a module in `.raggrep/config.json` to use `@xenova/transformers` instead. |
+ | **Model** | **`bge-small-en-v1.5`** on each embedding-backed module (TypeScript, Python, Go, Rust, JSON, markdown). |
+
+ **Benchmarks** (clone [next-convex-starter-app](https://github.com/conradkoh/next-convex-starter-app) at a pinned commit; see each script for options):
+
+ | Command | What it measures | Source |
+ |--------|------------------|--------|
+ | `bun run bench:embeddings` | Embedding throughput (runtime × model matrix; **nomic** omitted from the harness for now) | [`scripts/benchmark-embedding-runtimes.ts`](./scripts/benchmark-embedding-runtimes.ts) |
+ | `bun run bench:retrieval` | Index + hybrid search time and accuracy vs golden queries | [`scripts/benchmark-retrieval-quality.ts`](./scripts/benchmark-retrieval-quality.ts) |
+
+ Golden query set: [`scripts/eval/golden-queries-next-convex.json`](./scripts/eval/golden-queries-next-convex.json). Retrieval runs write `scripts/benchmarks/<benchmark-name>.result.md` and resumable `*.cache.json` (ignored by git by default).
 
  ## What Gets Indexed