simargl 0.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (40)
  1. simargl-0.1.0/LICENSE +21 -0
  2. simargl-0.1.0/PKG-INFO +588 -0
  3. simargl-0.1.0/README.md +559 -0
  4. simargl-0.1.0/pyproject.toml +36 -0
  5. simargl-0.1.0/setup.cfg +4 -0
  6. simargl-0.1.0/setup.py +30 -0
  7. simargl-0.1.0/simargl/__init__.py +2 -0
  8. simargl-0.1.0/simargl/backends/__init__.py +56 -0
  9. simargl-0.1.0/simargl/backends/numpy_backend.py +349 -0
  10. simargl-0.1.0/simargl/backends/postgres_backend.py +335 -0
  11. simargl-0.1.0/simargl/config.py +16 -0
  12. simargl-0.1.0/simargl/embedder.py +209 -0
  13. simargl-0.1.0/simargl/indexer.py +269 -0
  14. simargl-0.1.0/simargl/ingest/__init__.py +7 -0
  15. simargl-0.1.0/simargl/ingest/db_manager.py +191 -0
  16. simargl-0.1.0/simargl/ingest/git_connector.py +82 -0
  17. simargl-0.1.0/simargl/ingest/task_extractor.py +52 -0
  18. simargl-0.1.0/simargl/ingest/task_fetcher.py +102 -0
  19. simargl-0.1.0/simargl/ingest/trackers/__init__.py +15 -0
  20. simargl-0.1.0/simargl/ingest/trackers/github.py +63 -0
  21. simargl-0.1.0/simargl/ingest/trackers/gitlab.py +54 -0
  22. simargl-0.1.0/simargl/ingest/trackers/jira_api.py +29 -0
  23. simargl-0.1.0/simargl/ingest/trackers/jira_html.py +30 -0
  24. simargl-0.1.0/simargl/ingest/trackers/jira_selenium.py +41 -0
  25. simargl-0.1.0/simargl/ingest/trackers/youtrack.py +49 -0
  26. simargl-0.1.0/simargl/mcp_server.py +333 -0
  27. simargl-0.1.0/simargl/searcher.py +224 -0
  28. simargl-0.1.0/simargl/ui/__init__.py +0 -0
  29. simargl-0.1.0/simargl/ui/cli.py +442 -0
  30. simargl-0.1.0/simargl/ui/gradio_app.py +174 -0
  31. simargl-0.1.0/simargl/utils.py +41 -0
  32. simargl-0.1.0/simargl.egg-info/PKG-INFO +588 -0
  33. simargl-0.1.0/simargl.egg-info/SOURCES.txt +38 -0
  34. simargl-0.1.0/simargl.egg-info/dependency_links.txt +1 -0
  35. simargl-0.1.0/simargl.egg-info/entry_points.txt +3 -0
  36. simargl-0.1.0/simargl.egg-info/requires.txt +23 -0
  37. simargl-0.1.0/simargl.egg-info/top_level.txt +1 -0
  38. simargl-0.1.0/tests/test_indexer.py +55 -0
  39. simargl-0.1.0/tests/test_searcher.py +51 -0
  40. simargl-0.1.0/tests/test_utils.py +67 -0
simargl-0.1.0/LICENSE ADDED

MIT License

Copyright (c) 2026 Stanislav Zholobetskyi

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
simargl-0.1.0/PKG-INFO ADDED

Metadata-Version: 2.4
Name: simargl
Version: 0.1.0
Summary: Task-to-code retrieval — MCP server and web UI
Project-URL: Homepage, https://github.com/szholobetsky/simargl
Project-URL: Repository, https://github.com/szholobetsky/simargl
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: sentence-transformers>=2.2
Requires-Dist: mcp>=1.0
Requires-Dist: numpy>=1.24
Requires-Dist: tqdm>=4.60
Requires-Dist: pyyaml>=6.0
Provides-Extra: ui
Requires-Dist: gradio>=4.0; extra == "ui"
Provides-Extra: postgres
Requires-Dist: psycopg2-binary>=2.9; extra == "postgres"
Requires-Dist: pgvector>=0.2; extra == "postgres"
Provides-Extra: http
Requires-Dist: uvicorn>=0.24; extra == "http"
Provides-Extra: ingest
Requires-Dist: gitpython>=3.1; extra == "ingest"
Requires-Dist: requests>=2.28; extra == "ingest"
Requires-Dist: beautifulsoup4>=4.12; extra == "ingest"
Provides-Extra: selenium
Requires-Dist: selenium>=4.0; extra == "selenium"
Dynamic: license-file
+
30
+ # simargl
31
+
32
+ **S**emantic **I**ndex: **M**ap **A**rtifacts, **R**etrieve from **G**it **L**og
33
+
34
+ Task-to-code retrieval. Given a description of a change, finds which files and modules are likely affected — using semantic similarity over historical tasks or commits.
35
+
36
+ Exposes an MCP server (stdio transport) compatible with any MCP-aware agent system.
37
+
38
+ ---
39
+
40
+ ## Install
41
+
42
+ ```bash
43
+ pip install simargl
44
+ ```
45
+
46
+ The default embedding model (`bge-small`, ~130MB) is downloaded automatically during install.
47
+ If that fails or you installed offline, download it manually:
48
+
49
+ ```bash
50
+ simargl download
51
+ ```
52
+
53
+ ---

## Step 1 — Index your project

You need two indexes: one for code files, one for tasks (or commits if no tracker).

```bash
# Index code files (walks the repo, chunks text files, stores vectors)
simargl index files C:/repos/sonar --project sonar

# Index tasks from SQLite (auto-detects tasks vs commits)
simargl index units C:/data/sonar.db --project sonar

# Check what was indexed
simargl status --project sonar
```

Both indexes land in `.simargl/sonar/` relative to your working directory.
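
Indexing splits each text file into chunks before embedding (the `index_files` tool exposes a `chunk_size` parameter, default 400). A minimal sliding-window chunker as a sketch; the `overlap` parameter and the exact strategy are illustrative assumptions, not simargl's documented behavior:

```python
def chunk_text(text: str, chunk_size: int = 400, overlap: int = 50) -> list[str]:
    """Fixed-size sliding-window chunks; overlap keeps context across boundaries.
    Illustrative only — simargl's real chunker is not specified here."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]

# A 1000-char file becomes three overlapping chunks:
print([len(c) for c in chunk_text("x" * 1000)])  # → [400, 400, 300]
```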

Use `--model bge-large` if you need higher accuracy (uses more RAM and disk).

Available model keys:

```bash
# sentence-transformers — runs locally, CPU or GPU, downloads model on first use
--model bge-small   # default, 384 dims
--model bge-large   # better quality, 1024 dims

# Ollama — no model download, uses whatever is already pulled in Ollama
--model ollama://nomic-embed-text                  # localhost:11434
--model ollama://nomic-embed-text@192.168.1.10     # remote machine

# OpenAI-compatible local server — LM Studio, llama.cpp, LiteLLM, Jan, Koboldcpp
--model openai://localhost:1234/nomic-embed-text   # LM Studio
--model openai://localhost:8080/all-minilm         # llama.cpp server
--model openai://localhost:4000/nomic-embed-text   # LiteLLM
```

`openai://` means OpenAI-compatible API — no cloud, no API key, runs entirely locally.
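
The three key schemes above can be routed with a small parser. This is a hypothetical sketch (the function `parse_model_key` and its return shape are illustrative, not simargl's internals):

```python
from urllib.parse import urlparse

def parse_model_key(key: str) -> dict:
    """Route a --model key to a backend. Illustrative sketch only."""
    if key.startswith("ollama://"):
        # ollama://model or ollama://model@host
        body = key[len("ollama://"):]
        model, _, host = body.partition("@")
        return {"backend": "ollama", "model": model,
                "host": host or "localhost", "port": 11434}
    if key.startswith("openai://"):
        # openai://host:port/model — any OpenAI-compatible server
        parsed = urlparse(key)
        return {"backend": "openai", "model": parsed.path.lstrip("/"),
                "host": parsed.hostname, "port": parsed.port}
    # bare key → sentence-transformers alias (bge-small, bge-large, ...)
    return {"backend": "sentence-transformers", "model": key}

print(parse_model_key("ollama://nomic-embed-text@192.168.1.10"))
print(parse_model_key("openai://localhost:1234/nomic-embed-text"))
```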

---

## Step 2 — Connect to 1bcoder

Launch 1bcoder from your project directory — the MCP subprocess inherits that working
directory, so `.simargl` resolves correctly with no extra flags.

```bash
cd C:/Project/my-app
1bcoder
```

If you indexed with the default project_id (no `--project` flag):
```
/mcp connect simargl simargl-mcp
```

If you indexed with a custom project_id:
```
/mcp connect simargl simargl-mcp --project-id bookcrossing
```

The first connect takes 30–60 s while the embedding model loads — this is normal.
Tool calls are instant after that.

To connect to a project in a different directory without restarting 1bcoder, use `--cwd`:
```
/mcp connect simargl simargl-mcp --cwd C:/Project/other-app --project-id myproject
```

Check it connected:

```
/mcp tools simargl
```

You should see: `find`, `index_files`, `index_units`, `status`, `vacuum`, `embedding`, `distance`.

With `--project-id` set at server startup, you never need to pass `project_id` in tool calls.

---

## Step 3 — Index

1bcoder MCP call syntax is `/mcp call server/tool {json_args}`.

```
/mcp call simargl/index_files {"path": "C:/Project/my-app"}
```

If you have a task database in SQLite (Jira/GitHub export):
```
/mcp call simargl/index_units {"db_path": "C:/data/myproject.db"}
```

Check what was indexed:
```
/mcp call simargl/status {}
```

---

## Step 4 — Search

The call syntax is always `/mcp call simargl/tool {json}`.

### Find files related to a description

```
/mcp call simargl/find {"query": "make author field longer in the book class"}
```

Default mode is `task` + `sort=rank`. If you only indexed files (no task database), use `mode=file`:

```
/mcp call simargl/find {"query": "make author field longer in the book class", "mode": "file"}
```

### All parameters

```
/mcp call simargl/find {
  "query": "make author field longer in the book class",
  "mode": "file",
  "top_n": 10
}
```

| param | values | default |
|---|---|---|
| `mode` | `task`, `file`, `aggr` | `task` |
| `sort` | `rank`, `freq` | `rank` |
| `top_n` | integer | 10 |
| `top_k` | integer | 10 |
| `include_diff` | true/false | false |
| `project_id` | string | `default` |
| `store_dir` | path | `.simargl` |
+
191
+ ### If you used a custom project_id at index time
192
+
193
+ ```
194
+ /mcp call simargl/find {"query": "add author field", "project_id": "bookcrossing"}
195
+ ```
196
+
197
+ To avoid passing `project_id` every time, re-index without it (uses `default`):
198
+ ```
199
+ /mcp call simargl/index_files {"path": "C:/Project/my-app"}
200
+ ```
201
+
202
+ ---
203
+
204
+ ## Typical 1bcoder workflow
205
+
206
+ ```
207
+ # 1. Find files
208
+ /mcp call simargl/find {"query": "make author field longer in the book class", "mode": "file"} -> find_result
209
+ /var set find_files matches
210
+
211
+ # 2. Read the most relevant files
212
+ /read {{find_files}}
213
+
214
+ # 3. Ask the model
215
+ make the author field longer in the Book class
216
+
217
+ # 4. Apply
218
+ /patch models.py code
219
+ ```
220
+
221
+ ---

## Other tools

### Check index status

```
/mcp call simargl/status {}
/mcp call simargl/status {"project_id": "bookcrossing"}
```

### Compute embedding for any text

```
/mcp call simargl/embedding {"text": "add user authentication to login flow"} -> vector1
```

Stores the vector as `{{vector1}}`. Use later with `distance`.

### Measure semantic distance between two things

```
/mcp call simargl/distance {"source1": "auth.py", "source2": "views.py"}
/mcp call simargl/distance {"source1": "add user auth", "source2": "auth.py"}
```

Returns cosine similarity (0–1).
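
For reference, cosine similarity between two embedding vectors is the dot product of their unit-normalized forms. A standalone numpy sketch (not simargl code):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: 1.0 = same direction, 0.0 = orthogonal."""
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return float(a @ b)

v1 = np.array([1.0, 2.0, 3.0])
v2 = np.array([2.0, 4.0, 6.0])    # same direction, different magnitude
print(cosine_similarity(v1, v2))  # → 1.0 (up to float rounding)
```

Because the vectors are normalized first, magnitude is ignored: only the direction in embedding space matters.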

### Vacuum (reclaim disk after many incremental re-indexes)

```
/mcp call simargl/vacuum {}
```

### Re-index after code changes

```bash
# Incremental (default) — only processes files modified since last run
simargl index files C:/repos/sonar --project sonar

# Full reindex — re-embeds everything regardless of mtime
simargl index files C:/repos/sonar --project sonar --full
```

Incremental indexing uses `mtime` comparison against the previous `indexed_at` timestamp:
- unchanged files → skipped
- modified files → old chunks soft-deleted, new chunks appended
- deleted files → chunks soft-deleted

Soft-deleted vectors stay in the int8 file until you vacuum. Run vacuum periodically
(e.g. after a big refactor) to reclaim disk space:

```bash
simargl vacuum --project sonar
# or from 1bcoder:
/mcp call simargl/vacuum {}
```
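
The "int8 file" above implies quantized vector storage. Whether simargl uses exactly this scheme is not stated here, but a common approach is symmetric per-vector int8 quantization with a single float scale; a sketch under that assumption:

```python
import numpy as np

def quantize(v: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric int8 quantization: int8 codes plus one float scale per vector.
    An assumed scheme for illustration — not necessarily simargl's format."""
    scale = float(np.abs(v).max()) / 127.0
    return np.round(v / scale).astype(np.int8), scale

def dequantize(codes: np.ndarray, scale: float) -> np.ndarray:
    return codes.astype(np.float32) * scale

v = np.array([0.12, -0.5, 0.33], dtype=np.float32)
codes, scale = quantize(v)
err = np.abs(dequantize(codes, scale) - v).max()
print(codes.dtype, bool(err < scale))  # reconstruction error stays under one step
```

The payoff is 4x smaller storage than float32, at the cost of a small, bounded reconstruction error.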

The units index is separate — re-run `index units` only when the SQLite database is updated.
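
The incremental rules above reduce to comparing each file's mtime with the previous run's timestamp. A self-contained sketch of that classification (the names `classify` and `indexed` are illustrative, not simargl's internals):

```python
import os

def classify(root: str, indexed: dict[str, float], indexed_at: float):
    """Split files into skip / reindex / deleted by mtime vs the last index run."""
    skip, reindex = [], []
    seen = set()
    for dirpath, _, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            seen.add(path)
            if path in indexed and os.path.getmtime(path) <= indexed_at:
                skip.append(path)     # unchanged since last run
            else:
                reindex.append(path)  # new or modified → re-embed
    deleted = [p for p in indexed if p not in seen]  # soft-delete their chunks
    return skip, reindex, deleted
```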

---

## Parameters reference

| Tool | Key params | Default |
|---|---|---|
| `find` | `mode` (task\|file\|aggr), `sort` (rank\|freq), `top_n`, `top_k`, `top_m`, `include_diff` | task, rank, 10, 10, 5, false |
| `index_files` | `path`, `model_key`, `project_id`, `chunk_size` | —, bge-small, default, 400 |
| `index_units` | `db_path`, `model_key`, `project_id`, `mode` | —, bge-small, default, auto |
| `embedding` | `text` or `file`, `project_id` | — |
| `distance` | `source1`, `source2`, `project_id` | — |

---
+
295
+ ## Multiple projects
296
+
297
+ ```bash
298
+ simargl index units kafka.db --project kafka
299
+ simargl index files C:/repos/kafka --project kafka
300
+ ```
301
+
302
+ ```
303
+ /mcp simargl find "add partition rebalance" project_id=kafka
304
+ /mcp simargl status project_id=kafka
305
+ ```
306
+
307
+ Each project stores its vectors in `.simargl/{project_id}/` independently.
308
+
309
+ ---

## Running on Android (Termux) — LAN access from laptop

simargl runs fully on Android via Termux. With 8GB+ RAM (e.g. Redmi Note 14 Pro 12/512),
Ollama + nomic-embed-text + simargl-mcp all fit comfortably on the phone.
The laptop connects over LAN — no cloud, no GPU, everything local.

### Phone setup (Termux)

```bash
# base tools
pkg update && pkg install python git

# Ollama for Android (ARM64)
pkg install ollama
ollama serve &
ollama pull nomic-embed-text   # 274MB embedding model
# optional: ollama pull nemotron-mini (if you want an LLM on the phone too)

# simargl
pip install simargl
pip install "simargl[http]"    # adds starlette + uvicorn for LAN transport

# index your project (copy the SQLite database and repo to phone storage first)
simargl index units /sdcard/data/sonar.db \
    --project sonar \
    --model ollama://nomic-embed-text

simargl index files /sdcard/repos/sonar \
    --project sonar \
    --model ollama://nomic-embed-text

# start MCP server on LAN
simargl-mcp --http --port 8765
# → simargl MCP server — http://0.0.0.0:8765/sse
```

### Laptop — connect to phone

Find the phone's IP: `ip addr` in Termux, or check Wi-Fi settings.

**1bcoder:**
```
/mcp connect simargl http://192.168.1.42:8765/sse
/mcp tools simargl
```

**Claude Desktop** (`claude_desktop_config.json`):
```json
{
  "mcpServers": {
    "simargl": {
      "url": "http://192.168.1.42:8765/sse"
    }
  }
}
```

**Claude Code / OpenCode / Cursor** — same URL pattern as the Claude Desktop config above.

### Keep the server running in the Termux background

```bash
# run in background, log to file
nohup simargl-mcp --http --port 8765 > ~/.simargl-mcp.log 2>&1 &

# or use tmux (pkg install tmux)
tmux new -s simargl
simargl-mcp --http --port 8765
# Ctrl+B D to detach
```

### What runs where

| Component | Phone | Laptop |
|---|---|---|
| Vector index (`.simargl/`) | yes | — |
| Embedding model (nomic-embed-text) | yes (Ollama) | — |
| MCP server (simargl-mcp) | yes | — |
| Agent / LLM (1bcoder, Claude) | — | yes |
| Repo source files | yes (for indexing) | yes (for editing) |

The phone stores the index and computes embeddings. The laptop runs the agent and edits code.
Both use the same `.simargl/` directory — if you prefer, mount phone storage via sshfs
so the laptop can also run `simargl index` directly against it.

---

## Connecting to agent systems

simargl-mcp uses **stdio transport** — the universal MCP default. Always pass `--store-dir`
with the absolute path to your `.simargl/` store so the subprocess finds the index
regardless of which directory the agent system uses as its working directory.

### Claude Desktop

Config file:
- macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`
- Windows: `%APPDATA%\Claude\claude_desktop_config.json`

```json
{
  "mcpServers": {
    "simargl": {
      "command": "simargl-mcp",
      "args": ["--store-dir", "C:/repos/sonar/.simargl", "--project-id", "sonar"]
    }
  }
}
```

Restart Claude Desktop after editing. Tools appear automatically in the UI.

### Claude Code (CLI)

Option A — add to global settings `~/.claude/settings.json`:

```json
{
  "mcpServers": {
    "simargl": {
      "command": "simargl-mcp",
      "args": ["--store-dir", "C:/repos/sonar/.simargl", "--project-id", "sonar"]
    }
  }
}
```

Option B — connect interactively from any session (no restart needed):

```
/mcp add simargl simargl-mcp --store-dir C:/repos/sonar/.simargl --project-id sonar
```

Then call tools directly in your prompt:
```
use simargl find to locate files related to "add buildString to project analysis"
```

### OpenCode

Config file: `~/.config/opencode/config.json`

```json
{
  "mcp": {
    "simargl": {
      "command": ["simargl-mcp"],
      "cwd": "C:/repos/sonar"
    }
  }
}
```

### OpenAI Codex CLI

Config file: `~/.codex/config.yaml`

```yaml
mcp_servers:
  simargl:
    command: simargl-mcp
    cwd: C:/repos/sonar
```

### Cursor

Config file: `.cursor/mcp.json` in your project root (or global `~/.cursor/mcp.json`):

```json
{
  "mcpServers": {
    "simargl": {
      "command": "simargl-mcp",
      "cwd": "${workspaceFolder}"
    }
  }
}
```

`${workspaceFolder}` resolves to the open project directory — simargl will look for `.simargl/` there.

### Windsurf (Codeium)

Config file: `~/.codeium/windsurf/mcp_settings.json`

```json
{
  "mcpServers": {
    "simargl": {
      "command": "simargl-mcp",
      "cwd": "C:/repos/sonar"
    }
  }
}
```

### Any other MCP-compatible system

The pattern is always the same:

```json
{
  "command": "simargl-mcp",
  "args": [],
  "cwd": "<directory where .simargl/ lives>"
}
```

If the agent system does not support `cwd`, pass the directory via an environment variable and adjust the server startup — or simply `cd` to the right directory before launching.

---

### Tip: multiple projects across agents

If you work on several repos, use `project_id` to keep their indexes separate under the same `.simargl/` directory:

```
find files related to "add partition rebalance" project_id=kafka
find files related to "add buildString to API" project_id=sonar
```

---

## PostgreSQL + pgvector backend

For larger codebases, or when you want sub-linear search via an HNSW index.

```bash
pip install "simargl[postgres]"
```

Requires PostgreSQL with the pgvector extension:
```sql
CREATE EXTENSION IF NOT EXISTS vector;
```

### Index with the postgres backend

```bash
simargl index units sonar.db --project sonar \
    --backend postgres \
    --db-url postgresql://postgres:postgres@localhost/simargl

simargl index files C:/repos/sonar --project sonar \
    --backend postgres \
    --db-url postgresql://postgres:postgres@localhost/simargl
```

### MCP server with postgres

```bash
simargl-mcp --backend postgres \
    --db-url postgresql://postgres:postgres@localhost/simargl
```

### numpy vs postgres — when to choose which

| | numpy | postgres |
|---|---|---|
| Install | zero extra deps | psycopg2 + pgvector |
| Search speed | linear scan | sub-linear (HNSW) |
| Scales well to | ~500k chunks | millions of chunks |
| Vacuum | file rebuild | `DELETE` + `VACUUM ANALYZE` |
| Concurrent writes | no | yes |
| Termux / Android | yes | harder |
| Laptop / server | yes | yes |

For most projects (sonar.db ≈ 100k chunks) numpy is fast enough. Switch to postgres when search latency becomes noticeable or when you index multiple large repos.
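
The numpy backend's "linear scan" is essentially one matrix-vector product over normalized vectors followed by an argsort. A toy sketch (illustrative; the real backend also handles int8 storage and soft-deletes):

```python
import numpy as np

def top_k(query: np.ndarray, matrix: np.ndarray, k: int = 3) -> list[int]:
    """Brute-force cosine search: normalize, dot, argsort. O(n·d) per query."""
    q = query / np.linalg.norm(query)
    m = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    scores = m @ q                       # one similarity per stored chunk
    return np.argsort(-scores)[:k].tolist()

rng = np.random.default_rng(0)
vectors = rng.normal(size=(1000, 384)).astype(np.float32)  # 1000 chunks, 384 dims
query = vectors[42] + 0.01 * rng.normal(size=384).astype(np.float32)
print(top_k(query, vectors))  # index 42 ranks first
```

This scan visits every stored vector, which is exactly why an HNSW index (sub-linear neighbor search) starts to pay off at millions of chunks.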

---

## Deferred (session 2)

- Ollama and OpenAI embedding providers (`ollama://nomic-embed`, `openai://text-embedding-3-small`)
- Mode `aggregated` — avg task vectors → file search
- Set operations: `/mcp simargl find "query" mode=tasks+files` (union/intersection)
- Gradio web UI (`simargl ui`)
- PostgreSQL backend (`pip install "simargl[postgres]"`)