knowledge-rag 3.9.0__tar.gz → 4.0.0__tar.gz

@@ -45,6 +45,12 @@ documents/README-CATEGORIES.md
45
45
  *.tar.gz
46
46
  *.bak
47
47
 
48
+ # Type-checker cache (per-Python-version, auto-generated)
49
+ .mypy_cache/
50
+
51
+ # Hypothesis property-based testing cache (auto-generated)
52
+ .hypothesis/
53
+
48
54
  # OS files
49
55
  .DS_Store
50
56
  Thumbs.db
@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.4
2
2
  Name: knowledge-rag
3
- Version: 3.9.0
3
+ Version: 4.0.0
4
4
  Summary: Local RAG System for Claude Code — Hybrid search + Cross-encoder Reranking + 12 MCP Tools + 20 Format Parsers. Zero external servers.
5
5
  Project-URL: Homepage, https://github.com/lyonzin/knowledge-rag
6
6
  Project-URL: Repository, https://github.com/lyonzin/knowledge-rag
@@ -23,7 +23,7 @@ Requires-Python: >=3.11
23
23
  Requires-Dist: beautifulsoup4>=4.12.0
24
24
  Requires-Dist: chromadb>=1.4.0
25
25
  Requires-Dist: fastembed[reranking]>=0.4.0
26
- Requires-Dist: mcp>=1.0.0
26
+ Requires-Dist: mcp>=1.6.0
27
27
  Requires-Dist: openpyxl>=3.1.0
28
28
  Requires-Dist: pymupdf>=1.23.0
29
29
  Requires-Dist: python-docx>=1.0.0
@@ -34,6 +34,8 @@ Requires-Dist: requests>=2.33.0
34
34
  Requires-Dist: watchdog>=4.0.0
35
35
  Provides-Extra: gpu
36
36
  Requires-Dist: onnxruntime-gpu>=1.14.0; extra == 'gpu'
37
+ Provides-Extra: server
38
+ Requires-Dist: uvicorn>=0.20.0; extra == 'server'
37
39
  Description-Content-Type: text/markdown
38
40
 
39
41
  # Knowledge RAG
@@ -66,17 +68,56 @@ pip install knowledge-rag → restart Claude Code → search_knowledge("your que
66
68
 
67
69
  **12 MCP Tools** | **Hybrid Search + Reranking** | **20 File Formats** | **Optional NVIDIA GPU** | **100% Local**
68
70
 
69
- [What's New](#whats-new-in-v390) | [Supported Formats](#supported-formats) | [Installation](#installation) | [Configuration](#configuration) | [API Reference](#api-reference) | [Architecture](#architecture)
71
+ [What's New](#whats-new-in-v400) | [Supported Formats](#supported-formats) | [Installation](#installation) | [Configuration](#configuration) | [API Reference](#api-reference) | [Architecture](#architecture)
70
72
 
71
73
  </div>
72
74
 
73
75
  ---
74
76
 
75
- ## What's New in v3.9.0
77
+ ## Star History
78
+
79
+ <div align="center">
80
+
81
+ <a href="https://www.star-history.com/?repos=lyonzin%2Fknowledge-rag&type=date&legend=top-left">
82
+ <picture>
83
+ <source media="(prefers-color-scheme: dark)" srcset="https://api.star-history.com/chart?repos=lyonzin/knowledge-rag&type=date&theme=dark&legend=top-left" />
84
+ <source media="(prefers-color-scheme: light)" srcset="https://api.star-history.com/chart?repos=lyonzin/knowledge-rag&type=date&legend=top-left" />
85
+ <img alt="Star History Chart" src="https://api.star-history.com/chart?repos=lyonzin/knowledge-rag&type=date&legend=top-left" />
86
+ </picture>
87
+ </a>
88
+
89
+ </div>
90
+
91
+ ---
92
+
93
+ ## What's New in v4.0.0
94
+
95
+ ### Enterprise Concurrent Access — SSE/HTTP Transport (v4.0.0)
96
+
97
+ The server now supports **SSE** and **streamable-http** transport modes. Instead of spawning a separate process per client (stdio), a single server process serves all clients with shared resources — 1 embedding model, 1 ChromaDB, 1 query cache.
98
+
99
+ ```yaml
100
+ # config.yaml
101
+ server:
102
+   transport: "sse" # "stdio" | "sse" | "streamable-http"
103
+   host: "127.0.0.1"
104
+   port: 8179
105
+ ```
106
+
107
+ Or via CLI: `knowledge-rag --transport sse`
108
+
109
+ **Optional enterprise features** (all disabled by default):
110
+ - **Rate limiting**: Sliding-window counter, configurable RPM and burst
111
+ - **Prometheus metrics**: `/metrics` endpoint on separate port
112
+ - **Bearer auth**: Token validation for SSE/HTTP connections
113
+
114
+ All 12 MCP tools are instrumented with `@rate_limited` and `@instrument` decorators — zero overhead when features are disabled. Default transport remains **stdio** for full backwards compatibility.
115
+
116
+ > **Migration**: Existing users need zero changes. SSE mode is opt-in via `server.transport: "sse"` in config.yaml. See [Configuration](#configuration) for details.
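The rate-limiting bullet above mentions a sliding window with configurable RPM and burst. The following is a minimal sketch of how such a limiter could behave, not the package's actual implementation. The class name `SlidingWindowLimiter`, the injectable `clock`, and the choice to treat `burst` as extra allowance on top of the per-minute quota are all assumptions made for illustration. For simplicity it keeps a log of timestamps rather than an approximate counter.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Hypothetical limiter: allows up to RPM + burst calls per rolling window."""

    def __init__(self, requests_per_minute=60, burst=10,
                 window_seconds=60.0, clock=time.monotonic):
        self.limit = requests_per_minute + burst  # assumption: burst adds headroom
        self.window = window_seconds
        self.clock = clock
        self._hits = deque()  # timestamps of accepted calls, oldest first

    def allow(self):
        """Return True and record the call if it fits in the current window."""
        now = self.clock()
        # Drop timestamps that have slid out of the window.
        while self._hits and now - self._hits[0] >= self.window:
            self._hits.popleft()
        if len(self._hits) >= self.limit:
            return False
        self._hits.append(now)
        return True
```

A per-client variant could keep one such limiter per client identifier in a dictionary; how knowledge-rag identifies clients is not stated in this diff.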
76
117
 
77
118
  ### Quality Gate — 7-Pillar PR Validation
78
119
 
79
- knowledge-rag is now used daily by 70+ enterprise teams. Every PR (including dependabot bumps and one-line fixes) is now evaluated against **35+ automated checks** spread across 7 pillars before any human review:
120
+ Every PR (including dependabot bumps and one-line fixes) is now evaluated against **35+ automated checks** spread across 7 pillars before any human review:
80
121
 
81
122
  | Pillar | What it enforces | Tools |
82
123
  |---|---|---|
@@ -124,6 +165,7 @@ All methods produce the same MCP server. See [Installation](#installation) for f
124
165
 
125
166
  ### Recent Highlights
126
167
 
168
+ - **v4.0.0** — **Enterprise concurrent access**: SSE/HTTP transport (1 server → N clients), thread-safe shared state, optional rate limiting + Prometheus metrics, ChromaDB WAL mode, `--transport` CLI
127
169
  - **v3.9.0** — **Quality Gate** activated: 35+ automated PR checks across 7 pillars (Security, Stability, Memory Leak, Versatility, Scalability, Versioning, Quality) + nightly resilience suite (chaos, soak, determinism, mutation)
128
170
  - **v3.8.1** — Critical hotfix: loud-fail embeddings (no more silent zero-vector corruption); Windows CI flake eradicated (HF_HUB_OFFLINE + shell:bash + atexit wrapper)
129
171
  - **v3.8.0** — Lazy-load embeddings, opt-in single-instance guard, version sync across PyPI/NPM/Docker
@@ -369,6 +411,7 @@ flowchart LR
369
411
 
370
412
  - Python 3.11+
371
413
  - Claude Code CLI
414
+ - *…or any other MCP client (Claude Desktop, Cursor, VS Code, Antigravity, opencode, Windsurf) — see [Use with other MCP clients](#use-with-other-mcp-clients)*
372
415
  - ~200MB disk for model cache (auto-downloaded on first run)
373
416
  - *Optional:* NVIDIA GPU + CUDA for accelerated embeddings (`pip install knowledge-rag[gpu]` + `models.embedding.gpu: true` in config)
374
417
 
@@ -484,6 +527,94 @@ Add to `~/.claude.json`:
484
527
  > Replace `YOUR_USER` with your username, or use the full path from `echo $HOME`.
485
528
  </details>
486
529
 
530
+ #### Option F: SSE Server Mode (multi-agent)
531
+
532
+ For multi-agent setups where multiple clients query the same knowledge base simultaneously:
533
+
534
+ ```bash
535
+ pip install knowledge-rag[server] # Adds uvicorn for SSE/HTTP
536
+ knowledge-rag --transport sse # Starts on http://127.0.0.1:8179
537
+ ```
538
+
539
+ Then configure each MCP client to connect via SSE:
540
+
541
+ ```json
542
+ {
543
+ "mcpServers": {
544
+ "knowledge-rag": {
545
+ "type": "sse",
546
+ "url": "http://127.0.0.1:8179/sse"
547
+ }
548
+ }
549
+ }
550
+ ```
551
+
552
+ One server process serves all agents — shared embedding model, shared cache, shared ChromaDB. See [Configuration > Server](#server) for rate limiting, metrics, and auth options.
553
+
554
+ ### Use with other MCP clients
555
+
556
+ `knowledge-rag` supports both **stdio** (default, 1:1) and **SSE** (1:N) transport modes. In stdio mode, it works with any MCP-compatible client, not only Claude Code. The launch command is the same everywhere (the `python -m mcp_server.server` from whichever install method you picked); only the **config file location** and **JSON shape** differ per client.
557
+
558
+ #### Clients using the standard `mcpServers` format
559
+
560
+ For **Claude Desktop, Cursor, Antigravity, and Windsurf**, use the same block — only the file location changes:
561
+
562
+ ```json
563
+ {
564
+ "mcpServers": {
565
+ "knowledge-rag": {
566
+ "command": "/home/YOUR_USER/knowledge-rag/venv/bin/python",
567
+ "args": ["-m", "mcp_server.server"]
568
+ }
569
+ }
570
+ }
571
+ ```
572
+
573
+ > **Windows**: set `command` to the full path of `venv\Scripts\python.exe`.
574
+
575
+ | Client | Config file | Notes |
576
+ |---|---|---|
577
+ | **Claude Code** | use `claude mcp add …` (see install methods above) | The CLI writes `~/.claude.json` for you — manual edits to it aren't reliably picked up. |
578
+ | **Claude Desktop** | macOS: `~/Library/Application Support/Claude/claude_desktop_config.json` · Windows: `%APPDATA%\Claude\claude_desktop_config.json` | Easiest: **Settings → Developer → Edit Config** opens the correct file (avoids the Windows Store/MSIX path quirk). |
579
+ | **Cursor** | `~/.cursor/mcp.json` (global) or `.cursor/mcp.json` (per project) | — |
580
+ | **Antigravity** | macOS/Linux: `~/.gemini/antigravity/mcp_config.json` · Windows: `%USERPROFILE%\.gemini\antigravity\mcp_config.json` | Open via Agent panel → **"…" → Manage MCP Servers → View raw config**. |
581
+ | **Windsurf** | `~/.codeium/windsurf/mcp_config.json` (global only) | Easiest: Cascade panel → MCP → **View raw config**. |
582
+
583
+ #### VS Code — uses a `servers` key
584
+
585
+ VS Code (Copilot MCP) nests servers under **`servers`**, not `mcpServers`. Put this in `.vscode/mcp.json` (workspace) or the file opened by the **MCP: Open User Configuration** command:
586
+
587
+ ```json
588
+ {
589
+ "servers": {
590
+ "knowledge-rag": {
591
+ "type": "stdio",
592
+ "command": "/home/YOUR_USER/knowledge-rag/venv/bin/python",
593
+ "args": ["-m", "mcp_server.server"]
594
+ }
595
+ }
596
+ }
597
+ ```
598
+
599
+ #### opencode — uses an `mcp` key
600
+
601
+ opencode nests servers under **`mcp`**, takes `command` as a single **array**, and uses `environment` instead of `env`. Put this in `opencode.json` (project root) or `~/.config/opencode/opencode.json` (global):
602
+
603
+ ```jsonc
604
+ {
605
+ "$schema": "https://opencode.ai/config.json",
606
+ "mcp": {
607
+ "knowledge-rag": {
608
+ "type": "local",
609
+ "command": ["/home/YOUR_USER/knowledge-rag/venv/bin/python", "-m", "mcp_server.server"],
610
+ "enabled": true
611
+ }
612
+ }
613
+ }
614
+ ```
615
+
616
+ > **Any other MCP client**: point it at the same command + args (`…/venv/bin/python -m mcp_server.server`). If it speaks stdio MCP, knowledge-rag works — only the config file's location and key naming differ. Check your client's docs for the exact path.
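The three JSON shapes described above differ only in the top-level key and in how the command is expressed. As a rough illustration, a small helper could generate each shape from the same interpreter path. The function name `client_config` and its `client` labels are hypothetical and not part of knowledge-rag.

```python
import json


def client_config(client, python_path, module="mcp_server.server", name="knowledge-rag"):
    """Return a stdio config dict in the shape the given client family expects."""
    args = ["-m", module]
    if client == "vscode":
        # VS Code nests servers under "servers" and names the transport type.
        return {"servers": {name: {"type": "stdio", "command": python_path, "args": args}}}
    if client == "opencode":
        # opencode nests under "mcp" and takes the command as a single array.
        return {
            "$schema": "https://opencode.ai/config.json",
            "mcp": {name: {"type": "local", "command": [python_path, *args], "enabled": True}},
        }
    # Claude Desktop, Cursor, Antigravity and Windsurf share the standard shape.
    return {"mcpServers": {name: {"command": python_path, "args": args}}}


if __name__ == "__main__":
    path = "/home/YOUR_USER/knowledge-rag/venv/bin/python"
    print(json.dumps(client_config("vscode", path), indent=2))
```

The output still has to be written to the client-specific file listed in the table above.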
617
+
487
618
  ### Verify
488
619
 
489
620
  ```bash
@@ -880,6 +1011,21 @@ query_expansions:
880
1011
  privesc:
881
1012
  - privilege escalation
882
1013
  - privesc
1014
+
1015
+ # Server — enterprise features (new in v4.0.0)
1016
+ server:
1017
+   transport: "stdio" # "stdio" | "sse" | "streamable-http"
1018
+   host: "127.0.0.1" # Bind address (SSE/HTTP only)
1019
+   port: 8179 # Bind port (SSE/HTTP only)
1020
+   auth:
1021
+     bearer_token: "" # Set a secret to enable auth (SSE/HTTP only)
1022
+   rate_limit:
1023
+     enabled: false
1024
+     requests_per_minute: 60
1025
+     burst: 10
1026
+   metrics:
1027
+     enabled: false
1028
+     port: 9179 # Separate port for Prometheus scraping
883
1029
  ```
884
1030
 
885
1031
  > See `config.example.yaml` for the fully documented template with explanations for every field.
@@ -899,6 +1045,22 @@ Pre-built configurations for common use cases:
899
1045
 
900
1046
  ### Configuration Reference
901
1047
 
1048
+ #### Server
1049
+
1050
+ | Field | Default | Description |
1051
+ |-------|---------|-------------|
1052
+ | `server.transport` | `"stdio"` | Transport protocol: `"stdio"`, `"sse"`, or `"streamable-http"` |
1053
+ | `server.host` | `"127.0.0.1"` | Bind address for SSE/HTTP mode |
1054
+ | `server.port` | `8179` | Bind port for SSE/HTTP mode |
1055
+ | `server.auth.bearer_token` | `""` (disabled) | Bearer token for SSE/HTTP auth. Empty = no auth |
1056
+ | `server.rate_limit.enabled` | `false` | Enable per-client rate limiting |
1057
+ | `server.rate_limit.requests_per_minute` | `60` | Max requests per minute |
1058
+ | `server.rate_limit.burst` | `10` | Burst allowance above steady rate |
1059
+ | `server.metrics.enabled` | `false` | Enable Prometheus `/metrics` endpoint |
1060
+ | `server.metrics.port` | `9179` | Port for metrics scraping |
1061
+
1062
+ In stdio mode (default), server settings are ignored. SSE/HTTP mode auto-enables the single-instance lock.
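The defaults in the table above can be read as "every field is optional". A minimal sketch of that behaviour, assuming the YAML has already been parsed into a plain dictionary: the names `ServerSettings` and `load_server_settings` are hypothetical and do not come from the package's `Config` class.

```python
from dataclasses import dataclass

VALID_TRANSPORTS = ("stdio", "sse", "streamable-http")


@dataclass(frozen=True)
class ServerSettings:
    """Defaults mirror the Server table; the class itself is illustrative."""
    transport: str = "stdio"
    host: str = "127.0.0.1"
    port: int = 8179
    bearer_token: str = ""
    rate_limit_enabled: bool = False
    requests_per_minute: int = 60
    burst: int = 10
    metrics_enabled: bool = False
    metrics_port: int = 9179


def load_server_settings(config):
    """Merge a parsed config dict over the defaults, validating the transport."""
    server = config.get("server") or {}
    auth = server.get("auth") or {}
    rate = server.get("rate_limit") or {}
    metrics = server.get("metrics") or {}
    d = ServerSettings()
    transport = server.get("transport", d.transport)
    if transport not in VALID_TRANSPORTS:
        raise ValueError(f"unknown transport: {transport!r}")
    return ServerSettings(
        transport=transport,
        host=server.get("host", d.host),
        port=int(server.get("port", d.port)),
        bearer_token=auth.get("bearer_token", d.bearer_token),
        rate_limit_enabled=bool(rate.get("enabled", d.rate_limit_enabled)),
        requests_per_minute=int(rate.get("requests_per_minute", d.requests_per_minute)),
        burst=int(rate.get("burst", d.burst)),
        metrics_enabled=bool(metrics.get("enabled", d.metrics_enabled)),
        metrics_port=int(metrics.get("port", d.metrics_port)),
    )
```

An empty dictionary yields the documented defaults, which matches the claim that existing stdio users need no changes.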
1063
+
902
1064
  #### Paths
903
1065
 
904
1066
  | Field | Default | Description |
@@ -1136,10 +1298,59 @@ export KNOWLEDGE_RAG_SINGLE_INSTANCE=1
1136
1298
 
1137
1299
  A second instance exits immediately with code 75. Default is OFF (multi-client friendly). Full guide: [docs/single-instance.md](docs/single-instance.md). Sample MCP config: [examples/mcp-config-single-instance.json](examples/mcp-config-single-instance.json).
1138
1300
 
1301
+ ### SSE server won't start
1302
+
1303
+ ```bash
1304
+ # Check if port 8179 is already in use
1305
+ # Windows:
1306
+ netstat -aon | findstr :8179
1307
+ # Linux/macOS:
1308
+ lsof -i :8179
1309
+ ```
1310
+
1311
+ If `uvicorn` is not found, install the server extras: `pip install knowledge-rag[server]`
1312
+
1313
+ ### Can't connect to SSE server
1314
+
1315
+ Verify the server is running and the URL is correct:
1316
+
1317
+ ```bash
1318
+ curl http://127.0.0.1:8179/sse
1319
+ ```
1320
+
1321
+ Common issues:
1322
+ - Wrong URL: must end with `/sse` (not just the port)
1323
+ - Firewall blocking the port
1324
+ - Server started with a different host/port than configured in the MCP client
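When `netstat`, `lsof`, or `curl` are not at hand, the same reachability check can be approximated from Python with the standard library. This is only a sketch: `port_is_open` is a hypothetical helper, and it confirms that something accepts TCP connections on the port, not that the listener is knowledge-rag.

```python
import socket


def port_is_open(host="127.0.0.1", port=8179, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    print("listening" if port_is_open() else "nothing listening on 127.0.0.1:8179")
```

If this reports a listener before the server has been started, another process likely holds the port; if it reports none afterwards, the server probably bound to a different host or port.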
1325
+
1139
1326
  ---
1140
1327
 
1141
1328
  ## Changelog
1142
1329
 
1330
+ ### v4.0.0 (2026-06-09) — Enterprise Concurrent Access
1331
+
1332
+ - **NEW**: SSE and streamable-http transport modes — 1 server serves N clients (`server.transport: "sse"` in config.yaml or `--transport sse` CLI).
1333
+ - **NEW**: Thread-safe shared state for concurrent queries — QueryCache locking, BM25 build lock, orchestrator double-checked locking.
1334
+ - **NEW**: ChromaDB WAL mode enabled automatically in SSE/HTTP mode for concurrent read performance.
1335
+ - **NEW**: Optional rate limiting — sliding-window counter, configurable RPM and burst, disabled by default.
1336
+ - **NEW**: Optional Prometheus metrics endpoint — tool call counts, latency histograms, separate port, disabled by default.
1337
+ - **NEW**: All 12 MCP tools instrumented with `@rate_limited` and `@instrument` decorators (zero-cost when disabled).
1338
+ - **NEW**: `--transport` CLI override for Docker/systemd deployments.
1339
+ - **NEW**: `pip install knowledge-rag[server]` optional dependency for SSE/HTTP (uvicorn).
1340
+ - **CHANGED**: SSE/HTTP mode auto-enables single-instance lock (port collision prevention).
1341
+ - **CHANGED**: `mcp` dependency bumped to `>=1.6.0` (SSE/streamable-http support).
1342
+ - **MIGRATION**: Default transport remains `stdio` — existing users need zero changes. See config.example.yaml for SSE setup.
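The changelog mentions double-checked locking for shared state. For readers unfamiliar with the pattern, here is a generic sketch of lazy, thread-safe initialisation. `LazyResource` is a hypothetical name and this is not the orchestrator code from the package.

```python
import threading


class LazyResource:
    """Builds an expensive object once, even when many threads ask for it."""

    def __init__(self, factory):
        self._factory = factory
        self._lock = threading.Lock()
        self._value = None
        self._ready = False

    def get(self):
        # First check without the lock: cheap once the value exists.
        if not self._ready:
            with self._lock:
                # Second check under the lock: another thread may have won the race.
                if not self._ready:
                    self._value = self._factory()
                    self._ready = True
        return self._value
```

The point of the second check is that only one thread ever runs the factory, while later callers skip the lock entirely.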
1343
+
1344
+ ### v3.9.1 (2026-06-08)
1345
+
1346
+ - **FIX**: Expand `~` in `config.yaml` path values (`documents_dir`, `data_dir`, `models_cache_dir`) via `expanduser()` on all platforms (#86).
1347
+ - **FIX**: Warn when `documents_dir` resolves to a non-existent path instead of silently indexing zero files.
1348
+ - **FIX**: File watcher now uses accumulate-mode debounce — bulk file copies no longer starve the reindex trigger.
1349
+ - **FIX**: Concurrent `index_all()` calls are serialized via `_index_lock` to prevent ChromaDB SQLite corruption.
1350
+ - **FIX**: `collection.add()` is batched (500 chunks/call) to cap memory usage during large reindex operations.
1351
+ - **NEW**: `KNOWLEDGE_RAG_WATCHER_DISABLED=1` env var to disable the file watcher for troubleshooting.
1352
+ - **NEW**: Progress logging every 10% for reindex operations with >100 documents.
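The batching fix above caps how many chunks go into a single `collection.add()` call. A minimal sketch of that idea follows. The helper names `batched` and `add_in_batches` are hypothetical, and `collection` is assumed to be any object whose `add` method accepts `ids` and `documents` keyword arguments, as ChromaDB collections do.

```python
def batched(items, batch_size=500):
    """Yield consecutive slices of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def add_in_batches(collection, ids, documents, batch_size=500):
    """Add documents in fixed-size batches; returns the number of add() calls."""
    if len(ids) != len(documents):
        raise ValueError("ids and documents must have the same length")
    calls = 0
    for id_batch, doc_batch in zip(batched(ids, batch_size), batched(documents, batch_size)):
        collection.add(ids=id_batch, documents=doc_batch)
        calls += 1
    return calls
```

With 1,200 chunks and the default size, this issues three calls of 500, 500, and 200 items.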
1353
+
1143
1354
  ### v3.9.0 (2026-05-10) — Quality Gate
1144
1355
 
1145
1356
  **Major governance + CI hardening release. No runtime behavior change in `mcp_server/`. Public API surface unchanged from v3.8.1.**
@@ -28,17 +28,56 @@ pip install knowledge-rag → restart Claude Code → search_knowledge("your que
28
28
 
29
29
  **12 MCP Tools** | **Hybrid Search + Reranking** | **20 File Formats** | **Optional NVIDIA GPU** | **100% Local**
30
30
 
31
- [What's New](#whats-new-in-v390) | [Supported Formats](#supported-formats) | [Installation](#installation) | [Configuration](#configuration) | [API Reference](#api-reference) | [Architecture](#architecture)
31
+ [What's New](#whats-new-in-v400) | [Supported Formats](#supported-formats) | [Installation](#installation) | [Configuration](#configuration) | [API Reference](#api-reference) | [Architecture](#architecture)
32
32
 
33
33
  </div>
34
34
 
35
35
  ---
36
36
 
37
- ## What's New in v3.9.0
37
+ ## Star History
38
+
39
+ <div align="center">
40
+
41
+ <a href="https://www.star-history.com/?repos=lyonzin%2Fknowledge-rag&type=date&legend=top-left">
42
+ <picture>
43
+ <source media="(prefers-color-scheme: dark)" srcset="https://api.star-history.com/chart?repos=lyonzin/knowledge-rag&type=date&theme=dark&legend=top-left" />
44
+ <source media="(prefers-color-scheme: light)" srcset="https://api.star-history.com/chart?repos=lyonzin/knowledge-rag&type=date&legend=top-left" />
45
+ <img alt="Star History Chart" src="https://api.star-history.com/chart?repos=lyonzin/knowledge-rag&type=date&legend=top-left" />
46
+ </picture>
47
+ </a>
48
+
49
+ </div>
50
+
51
+ ---
52
+
53
+ ## What's New in v4.0.0
54
+
55
+ ### Enterprise Concurrent Access — SSE/HTTP Transport (v4.0.0)
56
+
57
+ The server now supports **SSE** and **streamable-http** transport modes. Instead of spawning a separate process per client (stdio), a single server process serves all clients with shared resources — 1 embedding model, 1 ChromaDB, 1 query cache.
58
+
59
+ ```yaml
60
+ # config.yaml
61
+ server:
62
+   transport: "sse" # "stdio" | "sse" | "streamable-http"
63
+   host: "127.0.0.1"
64
+   port: 8179
65
+ ```
66
+
67
+ Or via CLI: `knowledge-rag --transport sse`
68
+
69
+ **Optional enterprise features** (all disabled by default):
70
+ - **Rate limiting**: Sliding-window counter, configurable RPM and burst
71
+ - **Prometheus metrics**: `/metrics` endpoint on separate port
72
+ - **Bearer auth**: Token validation for SSE/HTTP connections
73
+
74
+ All 12 MCP tools are instrumented with `@rate_limited` and `@instrument` decorators — zero overhead when features are disabled. Default transport remains **stdio** for full backwards compatibility.
75
+
76
+ > **Migration**: Existing users need zero changes. SSE mode is opt-in via `server.transport: "sse"` in config.yaml. See [Configuration](#configuration) for details.
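One common way to get "zero overhead when disabled" from a decorator is to return the original function untouched when the feature is off. The sketch below illustrates that technique only. The signature of `instrument` here (an `enabled` flag plus a `record` callback) is an assumption and almost certainly differs from the package's own `@instrument`.

```python
import functools
import time


def instrument(enabled, record):
    """Hypothetical decorator factory: times calls only when enabled."""
    def decorator(func):
        if not enabled:
            return func  # no wrapper is created, so calls cost nothing extra

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Report the tool name and elapsed seconds to the callback.
                record(func.__name__, time.perf_counter() - start)
        return wrapper
    return decorator
```

Because the decision is made at decoration time, toggling the feature in this sketch requires a restart rather than a live config reload.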
38
77
 
39
78
  ### Quality Gate — 7-Pillar PR Validation
40
79
 
41
- knowledge-rag is now used daily by 70+ enterprise teams. Every PR (including dependabot bumps and one-line fixes) is now evaluated against **35+ automated checks** spread across 7 pillars before any human review:
80
+ Every PR (including dependabot bumps and one-line fixes) is now evaluated against **35+ automated checks** spread across 7 pillars before any human review:
42
81
 
43
82
  | Pillar | What it enforces | Tools |
44
83
  |---|---|---|
@@ -86,6 +125,7 @@ All methods produce the same MCP server. See [Installation](#installation) for f
86
125
 
87
126
  ### Recent Highlights
88
127
 
128
+ - **v4.0.0** — **Enterprise concurrent access**: SSE/HTTP transport (1 server → N clients), thread-safe shared state, optional rate limiting + Prometheus metrics, ChromaDB WAL mode, `--transport` CLI
89
129
  - **v3.9.0** — **Quality Gate** activated: 35+ automated PR checks across 7 pillars (Security, Stability, Memory Leak, Versatility, Scalability, Versioning, Quality) + nightly resilience suite (chaos, soak, determinism, mutation)
90
130
  - **v3.8.1** — Critical hotfix: loud-fail embeddings (no more silent zero-vector corruption); Windows CI flake eradicated (HF_HUB_OFFLINE + shell:bash + atexit wrapper)
91
131
  - **v3.8.0** — Lazy-load embeddings, opt-in single-instance guard, version sync across PyPI/NPM/Docker
@@ -331,6 +371,7 @@ flowchart LR
331
371
 
332
372
  - Python 3.11+
333
373
  - Claude Code CLI
374
+ - *…or any other MCP client (Claude Desktop, Cursor, VS Code, Antigravity, opencode, Windsurf) — see [Use with other MCP clients](#use-with-other-mcp-clients)*
334
375
  - ~200MB disk for model cache (auto-downloaded on first run)
335
376
  - *Optional:* NVIDIA GPU + CUDA for accelerated embeddings (`pip install knowledge-rag[gpu]` + `models.embedding.gpu: true` in config)
336
377
 
@@ -446,6 +487,94 @@ Add to `~/.claude.json`:
446
487
  > Replace `YOUR_USER` with your username, or use the full path from `echo $HOME`.
447
488
  </details>
448
489
 
490
+ #### Option F: SSE Server Mode (multi-agent)
491
+
492
+ For multi-agent setups where multiple clients query the same knowledge base simultaneously:
493
+
494
+ ```bash
495
+ pip install knowledge-rag[server] # Adds uvicorn for SSE/HTTP
496
+ knowledge-rag --transport sse # Starts on http://127.0.0.1:8179
497
+ ```
498
+
499
+ Then configure each MCP client to connect via SSE:
500
+
501
+ ```json
502
+ {
503
+ "mcpServers": {
504
+ "knowledge-rag": {
505
+ "type": "sse",
506
+ "url": "http://127.0.0.1:8179/sse"
507
+ }
508
+ }
509
+ }
510
+ ```
511
+
512
+ One server process serves all agents — shared embedding model, shared cache, shared ChromaDB. See [Configuration > Server](#server) for rate limiting, metrics, and auth options.
513
+
514
+ ### Use with other MCP clients
515
+
516
+ `knowledge-rag` supports both **stdio** (default, 1:1) and **SSE** (1:N) transport modes. In stdio mode, it works with any MCP-compatible client, not only Claude Code. The launch command is the same everywhere (the `python -m mcp_server.server` from whichever install method you picked); only the **config file location** and **JSON shape** differ per client.
517
+
518
+ #### Clients using the standard `mcpServers` format
519
+
520
+ For **Claude Desktop, Cursor, Antigravity, and Windsurf**, use the same block — only the file location changes:
521
+
522
+ ```json
523
+ {
524
+ "mcpServers": {
525
+ "knowledge-rag": {
526
+ "command": "/home/YOUR_USER/knowledge-rag/venv/bin/python",
527
+ "args": ["-m", "mcp_server.server"]
528
+ }
529
+ }
530
+ }
531
+ ```
532
+
533
+ > **Windows**: set `command` to the full path of `venv\Scripts\python.exe`.
534
+
535
+ | Client | Config file | Notes |
536
+ |---|---|---|
537
+ | **Claude Code** | use `claude mcp add …` (see install methods above) | The CLI writes `~/.claude.json` for you — manual edits to it aren't reliably picked up. |
538
+ | **Claude Desktop** | macOS: `~/Library/Application Support/Claude/claude_desktop_config.json` · Windows: `%APPDATA%\Claude\claude_desktop_config.json` | Easiest: **Settings → Developer → Edit Config** opens the correct file (avoids the Windows Store/MSIX path quirk). |
539
+ | **Cursor** | `~/.cursor/mcp.json` (global) or `.cursor/mcp.json` (per project) | — |
540
+ | **Antigravity** | macOS/Linux: `~/.gemini/antigravity/mcp_config.json` · Windows: `%USERPROFILE%\.gemini\antigravity\mcp_config.json` | Open via Agent panel → **"…" → Manage MCP Servers → View raw config**. |
541
+ | **Windsurf** | `~/.codeium/windsurf/mcp_config.json` (global only) | Easiest: Cascade panel → MCP → **View raw config**. |
542
+
543
+ #### VS Code — uses a `servers` key
544
+
545
+ VS Code (Copilot MCP) nests servers under **`servers`**, not `mcpServers`. Put this in `.vscode/mcp.json` (workspace) or the file opened by the **MCP: Open User Configuration** command:
546
+
547
+ ```json
548
+ {
549
+ "servers": {
550
+ "knowledge-rag": {
551
+ "type": "stdio",
552
+ "command": "/home/YOUR_USER/knowledge-rag/venv/bin/python",
553
+ "args": ["-m", "mcp_server.server"]
554
+ }
555
+ }
556
+ }
557
+ ```
558
+
559
+ #### opencode — uses an `mcp` key
560
+
561
+ opencode nests servers under **`mcp`**, takes `command` as a single **array**, and uses `environment` instead of `env`. Put this in `opencode.json` (project root) or `~/.config/opencode/opencode.json` (global):
562
+
563
+ ```jsonc
564
+ {
565
+ "$schema": "https://opencode.ai/config.json",
566
+ "mcp": {
567
+ "knowledge-rag": {
568
+ "type": "local",
569
+ "command": ["/home/YOUR_USER/knowledge-rag/venv/bin/python", "-m", "mcp_server.server"],
570
+ "enabled": true
571
+ }
572
+ }
573
+ }
574
+ ```
575
+
576
+ > **Any other MCP client**: point it at the same command + args (`…/venv/bin/python -m mcp_server.server`). If it speaks stdio MCP, knowledge-rag works — only the config file's location and key naming differ. Check your client's docs for the exact path.
577
+
449
578
  ### Verify
450
579
 
451
580
  ```bash
@@ -842,6 +971,21 @@ query_expansions:
842
971
  privesc:
843
972
  - privilege escalation
844
973
  - privesc
974
+
975
+ # Server — enterprise features (new in v4.0.0)
976
+ server:
977
+   transport: "stdio" # "stdio" | "sse" | "streamable-http"
978
+   host: "127.0.0.1" # Bind address (SSE/HTTP only)
979
+   port: 8179 # Bind port (SSE/HTTP only)
980
+   auth:
981
+     bearer_token: "" # Set a secret to enable auth (SSE/HTTP only)
982
+   rate_limit:
983
+     enabled: false
984
+     requests_per_minute: 60
985
+     burst: 10
986
+   metrics:
987
+     enabled: false
988
+     port: 9179 # Separate port for Prometheus scraping
845
989
  ```
846
990
 
847
991
  > See `config.example.yaml` for the fully documented template with explanations for every field.
@@ -861,6 +1005,22 @@ Pre-built configurations for common use cases:
861
1005
 
862
1006
  ### Configuration Reference
863
1007
 
1008
+ #### Server
1009
+
1010
+ | Field | Default | Description |
1011
+ |-------|---------|-------------|
1012
+ | `server.transport` | `"stdio"` | Transport protocol: `"stdio"`, `"sse"`, or `"streamable-http"` |
1013
+ | `server.host` | `"127.0.0.1"` | Bind address for SSE/HTTP mode |
1014
+ | `server.port` | `8179` | Bind port for SSE/HTTP mode |
1015
+ | `server.auth.bearer_token` | `""` (disabled) | Bearer token for SSE/HTTP auth. Empty = no auth |
1016
+ | `server.rate_limit.enabled` | `false` | Enable per-client rate limiting |
1017
+ | `server.rate_limit.requests_per_minute` | `60` | Max requests per minute |
1018
+ | `server.rate_limit.burst` | `10` | Burst allowance above steady rate |
1019
+ | `server.metrics.enabled` | `false` | Enable Prometheus `/metrics` endpoint |
1020
+ | `server.metrics.port` | `9179` | Port for metrics scraping |
1021
+
1022
+ In stdio mode (default), server settings are ignored. SSE/HTTP mode auto-enables the single-instance lock.
1023
+
864
1024
  #### Paths
865
1025
 
866
1026
  | Field | Default | Description |
@@ -1098,10 +1258,59 @@ export KNOWLEDGE_RAG_SINGLE_INSTANCE=1
1098
1258
 
1099
1259
  A second instance exits immediately with code 75. Default is OFF (multi-client friendly). Full guide: [docs/single-instance.md](docs/single-instance.md). Sample MCP config: [examples/mcp-config-single-instance.json](examples/mcp-config-single-instance.json).
1100
1260
 
1261
+ ### SSE server won't start
1262
+
1263
+ ```bash
1264
+ # Check if port 8179 is already in use
1265
+ # Windows:
1266
+ netstat -aon | findstr :8179
1267
+ # Linux/macOS:
1268
+ lsof -i :8179
1269
+ ```
1270
+
1271
+ If `uvicorn` is not found, install the server extras: `pip install knowledge-rag[server]`
1272
+
1273
+ ### Can't connect to SSE server
1274
+
1275
+ Verify the server is running and the URL is correct:
1276
+
1277
+ ```bash
1278
+ curl http://127.0.0.1:8179/sse
1279
+ ```
1280
+
1281
+ Common issues:
1282
+ - Wrong URL: must end with `/sse` (not just the port)
1283
+ - Firewall blocking the port
1284
+ - Server started with a different host/port than configured in the MCP client
1285
+
1101
1286
  ---
1102
1287
 
1103
1288
  ## Changelog
1104
1289
 
1290
+ ### v4.0.0 (2026-06-09) — Enterprise Concurrent Access
1291
+
1292
+ - **NEW**: SSE and streamable-http transport modes — 1 server serves N clients (`server.transport: "sse"` in config.yaml or `--transport sse` CLI).
1293
+ - **NEW**: Thread-safe shared state for concurrent queries — QueryCache locking, BM25 build lock, orchestrator double-checked locking.
1294
+ - **NEW**: ChromaDB WAL mode enabled automatically in SSE/HTTP mode for concurrent read performance.
1295
+ - **NEW**: Optional rate limiting — sliding-window counter, configurable RPM and burst, disabled by default.
1296
+ - **NEW**: Optional Prometheus metrics endpoint — tool call counts, latency histograms, separate port, disabled by default.
1297
+ - **NEW**: All 12 MCP tools instrumented with `@rate_limited` and `@instrument` decorators (zero-cost when disabled).
1298
+ - **NEW**: `--transport` CLI override for Docker/systemd deployments.
1299
+ - **NEW**: `pip install knowledge-rag[server]` optional dependency for SSE/HTTP (uvicorn).
1300
+ - **CHANGED**: SSE/HTTP mode auto-enables single-instance lock (port collision prevention).
1301
+ - **CHANGED**: `mcp` dependency bumped to `>=1.6.0` (SSE/streamable-http support).
1302
+ - **MIGRATION**: Default transport remains `stdio` — existing users need zero changes. See config.example.yaml for SSE setup.
1303
+
1304
+ ### v3.9.1 (2026-06-08)
1305
+
1306
+ - **FIX**: Expand `~` in `config.yaml` path values (`documents_dir`, `data_dir`, `models_cache_dir`) via `expanduser()` on all platforms (#86).
1307
+ - **FIX**: Warn when `documents_dir` resolves to a non-existent path instead of silently indexing zero files.
1308
+ - **FIX**: File watcher now uses accumulate-mode debounce — bulk file copies no longer starve the reindex trigger.
1309
+ - **FIX**: Concurrent `index_all()` calls are serialized via `_index_lock` to prevent ChromaDB SQLite corruption.
1310
+ - **FIX**: `collection.add()` is batched (500 chunks/call) to cap memory usage during large reindex operations.
1311
+ - **NEW**: `KNOWLEDGE_RAG_WATCHER_DISABLED=1` env var to disable the file watcher for troubleshooting.
1312
+ - **NEW**: Progress logging every 10% for reindex operations with >100 documents.
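The first two v3.9.1 fixes concern `~` expansion and missing directories. A small sketch of both checks, assuming the config is a flat dictionary of strings: `expand_path_values` and `missing_dirs` are hypothetical helpers and not the package's functions.

```python
from pathlib import Path

PATH_KEYS = ("documents_dir", "data_dir", "models_cache_dir")


def expand_path_values(config, keys=PATH_KEYS):
    """Return a copy of config with '~' expanded in the listed path fields."""
    expanded = dict(config)
    for key in keys:
        value = expanded.get(key)
        if isinstance(value, str):
            expanded[key] = str(Path(value).expanduser())
    return expanded


def missing_dirs(config, keys=("documents_dir",)):
    """List configured path fields that do not point at an existing directory."""
    return [k for k in keys if isinstance(config.get(k), str) and not Path(config[k]).is_dir()]
```

A caller could log a warning for every key returned by `missing_dirs` instead of silently indexing zero files.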
1313
+
1105
1314
  ### v3.9.0 (2026-05-10) — Quality Gate
1106
1315
 
1107
1316
  **Major governance + CI hardening release. No runtime behavior change in `mcp_server/`. Public API surface unchanged from v3.8.1.**
@@ -245,3 +245,36 @@ query_expansions: {}
245
245
  #
246
246
  # # Logging verbosity: DEBUG, INFO, WARNING, ERROR
247
247
  # # log_level: "INFO"
248
+
249
+
250
+ # ============================================================================
251
+ # SERVER (new in v4.0.0)
252
+ # ============================================================================
253
+ # Controls transport, networking, and enterprise features.
254
+ # All fields are optional — defaults preserve v3.x stdio behavior.
255
+
256
+ server:
257
+   # Transport protocol: "stdio" (legacy), "sse", "streamable-http"
258
+   # stdio: 1 process per client (compatible with all MCP clients)
259
+   # sse: 1 server serves N clients over HTTP+SSE (recommended for multi-agent)
260
+   # streamable-http: 1 server, HTTP streaming
261
+   transport: "stdio"
262
+
263
+   # Network settings (ignored when transport is stdio)
264
+   host: "127.0.0.1"
265
+   port: 8179
266
+
267
+   # Auth: optional bearer token validation (SSE/HTTP only)
268
+   auth:
269
+     bearer_token: ""
270
+
271
+   # Rate limiting: optional per-client request throttling
272
+   rate_limit:
273
+     enabled: false
274
+     requests_per_minute: 60
275
+     burst: 10
276
+
277
+   # Metrics: optional Prometheus-compatible /metrics endpoint
278
+   metrics:
279
+     enabled: false
280
+     port: 9179
@@ -8,7 +8,7 @@ import sys # noqa: I001
8
8
  _original_stdout = sys.stdout
9
9
  sys.stdout = sys.stderr
10
10
 
11
- __version__ = "3.9.0"
11
+ __version__ = "4.0.0"
12
12
  __author__ = "Ailton Rocha (Lyon.)"
13
13
 
14
14
  from .config import Config # noqa: E402