knowledge-rag 3.9.0__tar.gz → 4.0.0__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {knowledge_rag-3.9.0 → knowledge_rag-4.0.0}/.gitignore +6 -0
- {knowledge_rag-3.9.0 → knowledge_rag-4.0.0}/PKG-INFO +216 -5
- {knowledge_rag-3.9.0 → knowledge_rag-4.0.0}/README.md +212 -3
- {knowledge_rag-3.9.0 → knowledge_rag-4.0.0}/config.example.yaml +33 -0
- {knowledge_rag-3.9.0 → knowledge_rag-4.0.0}/mcp_server/__init__.py +1 -1
- {knowledge_rag-3.9.0 → knowledge_rag-4.0.0}/mcp_server/config.py +73 -3
- knowledge_rag-4.0.0/mcp_server/metrics.py +102 -0
- knowledge_rag-4.0.0/mcp_server/ratelimit.py +69 -0
- {knowledge_rag-3.9.0 → knowledge_rag-4.0.0}/mcp_server/server.py +624 -105
- {knowledge_rag-3.9.0 → knowledge_rag-4.0.0}/pyproject.toml +3 -2
- {knowledge_rag-3.9.0 → knowledge_rag-4.0.0}/LICENSE +0 -0
- {knowledge_rag-3.9.0 → knowledge_rag-4.0.0}/mcp_server/guarded.py +0 -0
- {knowledge_rag-3.9.0 → knowledge_rag-4.0.0}/mcp_server/ingestion.py +0 -0
- {knowledge_rag-3.9.0 → knowledge_rag-4.0.0}/mcp_server/instance_lock.py +0 -0
- {knowledge_rag-3.9.0 → knowledge_rag-4.0.0}/mcp_server/preflight.py +0 -0
- {knowledge_rag-3.9.0 → knowledge_rag-4.0.0}/npm/README.md +0 -0
- {knowledge_rag-3.9.0 → knowledge_rag-4.0.0}/presets/cybersecurity.yaml +0 -0
- {knowledge_rag-3.9.0 → knowledge_rag-4.0.0}/presets/developer.yaml +0 -0
- {knowledge_rag-3.9.0 → knowledge_rag-4.0.0}/presets/general.yaml +0 -0
- {knowledge_rag-3.9.0 → knowledge_rag-4.0.0}/presets/research.yaml +0 -0
- {knowledge_rag-3.9.0 → knowledge_rag-4.0.0}/requirements.txt +0 -0
|
@@ -45,6 +45,12 @@ documents/README-CATEGORIES.md
|
|
|
45
45
|
*.tar.gz
|
|
46
46
|
*.bak
|
|
47
47
|
|
|
48
|
+
# Type-checker cache (per-Python-version, auto-generated)
|
|
49
|
+
.mypy_cache/
|
|
50
|
+
|
|
51
|
+
# Hypothesis property-based testing cache (auto-generated)
|
|
52
|
+
.hypothesis/
|
|
53
|
+
|
|
48
54
|
# OS files
|
|
49
55
|
.DS_Store
|
|
50
56
|
Thumbs.db
|
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
Metadata-Version: 2.4
|
|
2
2
|
Name: knowledge-rag
|
|
3
|
-
Version:
|
|
3
|
+
Version: 4.0.0
|
|
4
4
|
Summary: Local RAG System for Claude Code — Hybrid search + Cross-encoder Reranking + 12 MCP Tools + 20 Format Parsers. Zero external servers.
|
|
5
5
|
Project-URL: Homepage, https://github.com/lyonzin/knowledge-rag
|
|
6
6
|
Project-URL: Repository, https://github.com/lyonzin/knowledge-rag
|
|
@@ -23,7 +23,7 @@ Requires-Python: >=3.11
|
|
|
23
23
|
Requires-Dist: beautifulsoup4>=4.12.0
|
|
24
24
|
Requires-Dist: chromadb>=1.4.0
|
|
25
25
|
Requires-Dist: fastembed[reranking]>=0.4.0
|
|
26
|
-
Requires-Dist: mcp>=1.
|
|
26
|
+
Requires-Dist: mcp>=1.6.0
|
|
27
27
|
Requires-Dist: openpyxl>=3.1.0
|
|
28
28
|
Requires-Dist: pymupdf>=1.23.0
|
|
29
29
|
Requires-Dist: python-docx>=1.0.0
|
|
@@ -34,6 +34,8 @@ Requires-Dist: requests>=2.33.0
|
|
|
34
34
|
Requires-Dist: watchdog>=4.0.0
|
|
35
35
|
Provides-Extra: gpu
|
|
36
36
|
Requires-Dist: onnxruntime-gpu>=1.14.0; extra == 'gpu'
|
|
37
|
+
Provides-Extra: server
|
|
38
|
+
Requires-Dist: uvicorn>=0.20.0; extra == 'server'
|
|
37
39
|
Description-Content-Type: text/markdown
|
|
38
40
|
|
|
39
41
|
# Knowledge RAG
|
|
@@ -66,17 +68,56 @@ pip install knowledge-rag → restart Claude Code → search_knowledge("your que
|
|
|
66
68
|
|
|
67
69
|
**12 MCP Tools** | **Hybrid Search + Reranking** | **20 File Formats** | **Optional NVIDIA GPU** | **100% Local**
|
|
68
70
|
|
|
69
|
-
[What's New](#whats-new-in-
|
|
71
|
+
[What's New](#whats-new-in-v400) | [Supported Formats](#supported-formats) | [Installation](#installation) | [Configuration](#configuration) | [API Reference](#api-reference) | [Architecture](#architecture)
|
|
70
72
|
|
|
71
73
|
</div>
|
|
72
74
|
|
|
73
75
|
---
|
|
74
76
|
|
|
75
|
-
##
|
|
77
|
+
## Star History
|
|
78
|
+
|
|
79
|
+
<div align="center">
|
|
80
|
+
|
|
81
|
+
<a href="https://www.star-history.com/?repos=lyonzin%2Fknowledge-rag&type=date&legend=top-left">
|
|
82
|
+
<picture>
|
|
83
|
+
<source media="(prefers-color-scheme: dark)" srcset="https://api.star-history.com/chart?repos=lyonzin/knowledge-rag&type=date&theme=dark&legend=top-left" />
|
|
84
|
+
<source media="(prefers-color-scheme: light)" srcset="https://api.star-history.com/chart?repos=lyonzin/knowledge-rag&type=date&legend=top-left" />
|
|
85
|
+
<img alt="Star History Chart" src="https://api.star-history.com/chart?repos=lyonzin/knowledge-rag&type=date&legend=top-left" />
|
|
86
|
+
</picture>
|
|
87
|
+
</a>
|
|
88
|
+
|
|
89
|
+
</div>
|
|
90
|
+
|
|
91
|
+
---
|
|
92
|
+
|
|
93
|
+
## What's New in v4.0.0
|
|
94
|
+
|
|
95
|
+
### Enterprise Concurrent Access — SSE/HTTP Transport (v4.0.0)
|
|
96
|
+
|
|
97
|
+
The server now supports **SSE** and **streamable-http** transport modes. Instead of spawning a separate process per client (stdio), a single server process serves all clients with shared resources — 1 embedding model, 1 ChromaDB, 1 query cache.
|
|
98
|
+
|
|
99
|
+
```yaml
|
|
100
|
+
# config.yaml
|
|
101
|
+
server:
|
|
102
|
+
transport: "sse" # "stdio" | "sse" | "streamable-http"
|
|
103
|
+
host: "127.0.0.1"
|
|
104
|
+
port: 8179
|
|
105
|
+
```
|
|
106
|
+
|
|
107
|
+
Or via CLI: `knowledge-rag --transport sse`
|
|
108
|
+
|
|
109
|
+
**Optional enterprise features** (all disabled by default):
|
|
110
|
+
- **Rate limiting**: Sliding-window counter, configurable RPM and burst
|
|
111
|
+
- **Prometheus metrics**: `/metrics` endpoint on separate port
|
|
112
|
+
- **Bearer auth**: Token validation for SSE/HTTP connections
|
|
113
|
+
|
|
114
|
+
All 12 MCP tools are instrumented with `@rate_limited` and `@instrument` decorators — zero overhead when features are disabled. Default transport remains **stdio** for full backwards compatibility.
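
As a rough illustration of the decorator pattern described above, the sketch below shows one way a `rate_limited` decorator could hand back the undecorated function when limiting is disabled, so the disabled path adds no per-call work. It is not the package's actual `mcp_server/ratelimit.py`: `SlidingWindowLimiter`, `RateLimitExceeded`, and the `rate_limited(limiter)` signature are hypothetical names, the limiter is a sliding-window log (a simpler relative of the sliding-window counter mentioned above), and the `burst` setting is left out for brevity.

```python
import threading
import time
from collections import deque
from functools import wraps


class RateLimitExceeded(Exception):
    """Raised when a call would exceed the configured request budget."""


class SlidingWindowLimiter:
    """Hypothetical limiter: allow at most `limit` calls per `window` seconds."""

    def __init__(self, limit, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self._calls = deque()  # timestamps of accepted calls, oldest first
        self._lock = threading.Lock()

    def allow(self):
        now = self.clock()
        with self._lock:
            # Drop timestamps that have slid out of the window.
            while self._calls and now - self._calls[0] >= self.window:
                self._calls.popleft()
            if len(self._calls) >= self.limit:
                return False
            self._calls.append(now)
            return True


def rate_limited(limiter):
    """Wrap a tool with `limiter`; with `limiter=None` return it untouched."""

    def decorator(func):
        if limiter is None:
            return func  # disabled: no wrapper, so no per-call overhead

        @wraps(func)
        def wrapper(*args, **kwargs):
            if not limiter.allow():
                raise RateLimitExceeded(func.__name__)
            return func(*args, **kwargs)

        return wrapper

    return decorator
```

Resolving the enabled/disabled decision once, at decoration time, is what keeps the disabled case free of a runtime check on every tool call.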
|
|
115
|
+
|
|
116
|
+
> **Migration**: Existing users need zero changes. SSE mode is opt-in via `server.transport: "sse"` in config.yaml. See [Configuration](#configuration) for details.
|
|
76
117
|
|
|
77
118
|
### Quality Gate — 7-Pillar PR Validation
|
|
78
119
|
|
|
79
|
-
|
|
120
|
+
Every PR (including dependabot bumps and one-line fixes) is now evaluated against **35+ automated checks** spread across 7 pillars before any human review:
|
|
80
121
|
|
|
81
122
|
| Pillar | What it enforces | Tools |
|
|
82
123
|
|---|---|---|
|
|
@@ -124,6 +165,7 @@ All methods produce the same MCP server. See [Installation](#installation) for f
|
|
|
124
165
|
|
|
125
166
|
### Recent Highlights
|
|
126
167
|
|
|
168
|
+
- **v4.0.0** — **Enterprise concurrent access**: SSE/HTTP transport (1 server → N clients), thread-safe shared state, optional rate limiting + Prometheus metrics, ChromaDB WAL mode, `--transport` CLI
|
|
127
169
|
- **v3.9.0** — **Quality Gate** activated: 35+ automated PR checks across 7 pillars (Security, Stability, Memory Leak, Versatility, Scalability, Versioning, Quality) + nightly resilience suite (chaos, soak, determinism, mutation)
|
|
128
170
|
- **v3.8.1** — Critical hotfix: loud-fail embeddings (no more silent zero-vector corruption); Windows CI flake eradicated (HF_HUB_OFFLINE + shell:bash + atexit wrapper)
|
|
129
171
|
- **v3.8.0** — Lazy-load embeddings, opt-in single-instance guard, version sync across PyPI/NPM/Docker
|
|
@@ -369,6 +411,7 @@ flowchart LR
|
|
|
369
411
|
|
|
370
412
|
- Python 3.11+
|
|
371
413
|
- Claude Code CLI
|
|
414
|
+
- *…or any other MCP client (Claude Desktop, Cursor, VS Code, Antigravity, opencode, Windsurf) — see [Use with other MCP clients](#use-with-other-mcp-clients)*
|
|
372
415
|
- ~200MB disk for model cache (auto-downloaded on first run)
|
|
373
416
|
- *Optional:* NVIDIA GPU + CUDA for accelerated embeddings (`pip install knowledge-rag[gpu]` + `models.embedding.gpu: true` in config)
|
|
374
417
|
|
|
@@ -484,6 +527,94 @@ Add to `~/.claude.json`:
|
|
|
484
527
|
> Replace `YOUR_USER` with your username, or use the full path from `echo $HOME`.
|
|
485
528
|
</details>
|
|
486
529
|
|
|
530
|
+
#### Option F: SSE Server Mode (multi-agent)
|
|
531
|
+
|
|
532
|
+
For multi-agent setups where multiple clients query the same knowledge base simultaneously:
|
|
533
|
+
|
|
534
|
+
```bash
|
|
535
|
+
pip install knowledge-rag[server] # Adds uvicorn for SSE/HTTP
|
|
536
|
+
knowledge-rag --transport sse # Starts on http://127.0.0.1:8179
|
|
537
|
+
```
|
|
538
|
+
|
|
539
|
+
Then configure each MCP client to connect via SSE:
|
|
540
|
+
|
|
541
|
+
```json
|
|
542
|
+
{
|
|
543
|
+
"mcpServers": {
|
|
544
|
+
"knowledge-rag": {
|
|
545
|
+
"type": "sse",
|
|
546
|
+
"url": "http://127.0.0.1:8179/sse"
|
|
547
|
+
}
|
|
548
|
+
}
|
|
549
|
+
}
|
|
550
|
+
```
|
|
551
|
+
|
|
552
|
+
One server process serves all agents — shared embedding model, shared cache, shared ChromaDB. See [Configuration > Server](#server) for rate limiting, metrics, and auth options.
|
|
553
|
+
|
|
554
|
+
### Use with other MCP clients
|
|
555
|
+
|
|
556
|
+
`knowledge-rag` supports both **stdio** (default, 1:1) and **SSE** (1:N) transport modes. In stdio mode, it works with any MCP-compatible client, not only Claude Code. The launch command is the same everywhere (the `python -m mcp_server.server` from whichever install method you picked); only the **config file location** and **JSON shape** differ per client.
|
|
557
|
+
|
|
558
|
+
#### Clients using the standard `mcpServers` format
|
|
559
|
+
|
|
560
|
+
For **Claude Desktop, Cursor, Antigravity, and Windsurf**, use the same block — only the file location changes:
|
|
561
|
+
|
|
562
|
+
```json
|
|
563
|
+
{
|
|
564
|
+
"mcpServers": {
|
|
565
|
+
"knowledge-rag": {
|
|
566
|
+
"command": "/home/YOUR_USER/knowledge-rag/venv/bin/python",
|
|
567
|
+
"args": ["-m", "mcp_server.server"]
|
|
568
|
+
}
|
|
569
|
+
}
|
|
570
|
+
}
|
|
571
|
+
```
|
|
572
|
+
|
|
573
|
+
> **Windows**: set `command` to the full path of `venv\Scripts\python.exe`.
|
|
574
|
+
|
|
575
|
+
| Client | Config file | Notes |
|
|
576
|
+
|---|---|---|
|
|
577
|
+
| **Claude Code** | use `claude mcp add …` (see install methods above) | The CLI writes `~/.claude.json` for you — manual edits to it aren't reliably picked up. |
|
|
578
|
+
| **Claude Desktop** | macOS: `~/Library/Application Support/Claude/claude_desktop_config.json` · Windows: `%APPDATA%\Claude\claude_desktop_config.json` | Easiest: **Settings → Developer → Edit Config** opens the correct file (avoids the Windows Store/MSIX path quirk). |
|
|
579
|
+
| **Cursor** | `~/.cursor/mcp.json` (global) or `.cursor/mcp.json` (per project) | — |
|
|
580
|
+
| **Antigravity** | macOS/Linux: `~/.gemini/antigravity/mcp_config.json` · Windows: `%USERPROFILE%\.gemini\antigravity\mcp_config.json` | Open via Agent panel → **"…" → Manage MCP Servers → View raw config**. |
|
|
581
|
+
| **Windsurf** | `~/.codeium/windsurf/mcp_config.json` (global only) | Easiest: Cascade panel → MCP → **View raw config**. |
|
|
582
|
+
|
|
583
|
+
#### VS Code — uses a `servers` key
|
|
584
|
+
|
|
585
|
+
VS Code (Copilot MCP) nests servers under **`servers`**, not `mcpServers`. Put this in `.vscode/mcp.json` (workspace) or the file opened by the **MCP: Open User Configuration** command:
|
|
586
|
+
|
|
587
|
+
```json
|
|
588
|
+
{
|
|
589
|
+
"servers": {
|
|
590
|
+
"knowledge-rag": {
|
|
591
|
+
"type": "stdio",
|
|
592
|
+
"command": "/home/YOUR_USER/knowledge-rag/venv/bin/python",
|
|
593
|
+
"args": ["-m", "mcp_server.server"]
|
|
594
|
+
}
|
|
595
|
+
}
|
|
596
|
+
}
|
|
597
|
+
```
|
|
598
|
+
|
|
599
|
+
#### opencode — uses an `mcp` key
|
|
600
|
+
|
|
601
|
+
opencode nests servers under **`mcp`**, takes `command` as a single **array**, and uses `environment` instead of `env`. Put this in `opencode.json` (project root) or `~/.config/opencode/opencode.json` (global):
|
|
602
|
+
|
|
603
|
+
```jsonc
|
|
604
|
+
{
|
|
605
|
+
"$schema": "https://opencode.ai/config.json",
|
|
606
|
+
"mcp": {
|
|
607
|
+
"knowledge-rag": {
|
|
608
|
+
"type": "local",
|
|
609
|
+
"command": ["/home/YOUR_USER/knowledge-rag/venv/bin/python", "-m", "mcp_server.server"],
|
|
610
|
+
"enabled": true
|
|
611
|
+
}
|
|
612
|
+
}
|
|
613
|
+
}
|
|
614
|
+
```
|
|
615
|
+
|
|
616
|
+
> **Any other MCP client**: point it at the same command + args (`…/venv/bin/python -m mcp_server.server`). If it speaks stdio MCP, knowledge-rag works — only the config file's location and key naming differ. Check your client's docs for the exact path.
|
|
617
|
+
|
|
487
618
|
### Verify
|
|
488
619
|
|
|
489
620
|
```bash
|
|
@@ -880,6 +1011,21 @@ query_expansions:
|
|
|
880
1011
|
privesc:
|
|
881
1012
|
- privilege escalation
|
|
882
1013
|
- privesc
|
|
1014
|
+
|
|
1015
|
+
# Server — enterprise features (new in v4.0.0)
|
|
1016
|
+
server:
|
|
1017
|
+
transport: "stdio" # "stdio" | "sse" | "streamable-http"
|
|
1018
|
+
host: "127.0.0.1" # Bind address (SSE/HTTP only)
|
|
1019
|
+
port: 8179 # Bind port (SSE/HTTP only)
|
|
1020
|
+
auth:
|
|
1021
|
+
bearer_token: "" # Set a secret to enable auth (SSE/HTTP only)
|
|
1022
|
+
rate_limit:
|
|
1023
|
+
enabled: false
|
|
1024
|
+
requests_per_minute: 60
|
|
1025
|
+
burst: 10
|
|
1026
|
+
metrics:
|
|
1027
|
+
enabled: false
|
|
1028
|
+
port: 9179 # Separate port for Prometheus scraping
|
|
883
1029
|
```
|
|
884
1030
|
|
|
885
1031
|
> See `config.example.yaml` for the fully documented template with explanations for every field.
|
|
@@ -899,6 +1045,22 @@ Pre-built configurations for common use cases:
|
|
|
899
1045
|
|
|
900
1046
|
### Configuration Reference
|
|
901
1047
|
|
|
1048
|
+
#### Server
|
|
1049
|
+
|
|
1050
|
+
| Field | Default | Description |
|
|
1051
|
+
|-------|---------|-------------|
|
|
1052
|
+
| `server.transport` | `"stdio"` | Transport protocol: `"stdio"`, `"sse"`, or `"streamable-http"` |
|
|
1053
|
+
| `server.host` | `"127.0.0.1"` | Bind address for SSE/HTTP mode |
|
|
1054
|
+
| `server.port` | `8179` | Bind port for SSE/HTTP mode |
|
|
1055
|
+
| `server.auth.bearer_token` | `""` (disabled) | Bearer token for SSE/HTTP auth. Empty = no auth |
|
|
1056
|
+
| `server.rate_limit.enabled` | `false` | Enable per-client rate limiting |
|
|
1057
|
+
| `server.rate_limit.requests_per_minute` | `60` | Max requests per minute |
|
|
1058
|
+
| `server.rate_limit.burst` | `10` | Burst allowance above steady rate |
|
|
1059
|
+
| `server.metrics.enabled` | `false` | Enable Prometheus `/metrics` endpoint |
|
|
1060
|
+
| `server.metrics.port` | `9179` | Port for metrics scraping |
|
|
1061
|
+
|
|
1062
|
+
In stdio mode (default), server settings are ignored. SSE/HTTP mode auto-enables the single-instance lock.
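
To make the defaults in the table above concrete, here is a minimal sketch of how a partial `server:` mapping could be merged with them. It assumes the YAML has already been parsed into a plain dict; `ServerSettings` and `load_server_settings` are hypothetical names for illustration and are not taken from `mcp_server/config.py`.

```python
from dataclasses import dataclass

VALID_TRANSPORTS = ("stdio", "sse", "streamable-http")


@dataclass(frozen=True)
class ServerSettings:
    """Hypothetical container mirroring the documented `server.*` defaults."""

    transport: str = "stdio"
    host: str = "127.0.0.1"
    port: int = 8179
    bearer_token: str = ""
    rate_limit_enabled: bool = False
    requests_per_minute: int = 60
    burst: int = 10
    metrics_enabled: bool = False
    metrics_port: int = 9179

    @property
    def auth_enabled(self):
        # An empty token means auth is off, as stated in the table.
        return bool(self.bearer_token)


def load_server_settings(raw):
    """Build settings from a parsed `server:` mapping (None or partial is fine)."""
    raw = raw or {}
    auth = raw.get("auth") or {}
    rate = raw.get("rate_limit") or {}
    metrics = raw.get("metrics") or {}
    defaults = ServerSettings()
    settings = ServerSettings(
        transport=raw.get("transport", defaults.transport),
        host=raw.get("host", defaults.host),
        port=int(raw.get("port", defaults.port)),
        bearer_token=auth.get("bearer_token", defaults.bearer_token),
        rate_limit_enabled=bool(rate.get("enabled", defaults.rate_limit_enabled)),
        requests_per_minute=int(
            rate.get("requests_per_minute", defaults.requests_per_minute)
        ),
        burst=int(rate.get("burst", defaults.burst)),
        metrics_enabled=bool(metrics.get("enabled", defaults.metrics_enabled)),
        metrics_port=int(metrics.get("port", defaults.metrics_port)),
    )
    if settings.transport not in VALID_TRANSPORTS:
        raise ValueError(f"unknown transport: {settings.transport!r}")
    return settings
```

With this shape, an absent `server:` section yields the stdio defaults, which matches the migration promise that existing configs need no changes.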
|
|
1063
|
+
|
|
902
1064
|
#### Paths
|
|
903
1065
|
|
|
904
1066
|
| Field | Default | Description |
|
|
@@ -1136,10 +1298,59 @@ export KNOWLEDGE_RAG_SINGLE_INSTANCE=1
|
|
|
1136
1298
|
|
|
1137
1299
|
A second instance exits immediately with code 75. Default is OFF (multi-client friendly). Full guide: [docs/single-instance.md](docs/single-instance.md). Sample MCP config: [examples/mcp-config-single-instance.json](examples/mcp-config-single-instance.json).
|
|
1138
1300
|
|
|
1301
|
+
### SSE server won't start
|
|
1302
|
+
|
|
1303
|
+
```bash
|
|
1304
|
+
# Check if port 8179 is already in use
|
|
1305
|
+
# Windows:
|
|
1306
|
+
netstat -aon | findstr :8179
|
|
1307
|
+
# Linux/macOS:
|
|
1308
|
+
lsof -i :8179
|
|
1309
|
+
```
|
|
1310
|
+
|
|
1311
|
+
If `uvicorn` is not found, install the server extras: `pip install knowledge-rag[server]`
|
|
1312
|
+
|
|
1313
|
+
### Can't connect to SSE server
|
|
1314
|
+
|
|
1315
|
+
Verify the server is running and the URL is correct:
|
|
1316
|
+
|
|
1317
|
+
```bash
|
|
1318
|
+
curl http://127.0.0.1:8179/sse
|
|
1319
|
+
```
|
|
1320
|
+
|
|
1321
|
+
Common issues:
|
|
1322
|
+
- Wrong URL: must end with `/sse` (not just the port)
|
|
1323
|
+
- Firewall blocking the port
|
|
1324
|
+
- Server started with a different host/port than configured in the MCP client
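
For a cross-platform variant of the port checks above, the following sketch uses only the Python standard library. The helper name `port_in_use` is hypothetical; the host and port defaults are the ones documented for SSE mode.

```python
import socket


def port_in_use(host="127.0.0.1", port=8179, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    state = "in use" if port_in_use() else "free"
    print(f"127.0.0.1:8179 is {state}")
```

A `True` result before starting the server suggests another process already holds the port. A `False` result while a client cannot connect suggests the server is not running or is bound to a different host or port.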
|
|
1325
|
+
|
|
1139
1326
|
---
|
|
1140
1327
|
|
|
1141
1328
|
## Changelog
|
|
1142
1329
|
|
|
1330
|
+
### v4.0.0 (2026-06-09) — Enterprise Concurrent Access
|
|
1331
|
+
|
|
1332
|
+
- **NEW**: SSE and streamable-http transport modes — 1 server serves N clients (`server.transport: "sse"` in config.yaml or `--transport sse` CLI).
|
|
1333
|
+
- **NEW**: Thread-safe shared state for concurrent queries — QueryCache locking, BM25 build lock, orchestrator double-checked locking.
|
|
1334
|
+
- **NEW**: ChromaDB WAL mode enabled automatically in SSE/HTTP mode for concurrent read performance.
|
|
1335
|
+
- **NEW**: Optional rate limiting — sliding-window counter, configurable RPM and burst, disabled by default.
|
|
1336
|
+
- **NEW**: Optional Prometheus metrics endpoint — tool call counts, latency histograms, separate port, disabled by default.
|
|
1337
|
+
- **NEW**: All 12 MCP tools instrumented with `@rate_limited` and `@instrument` decorators (zero-cost when disabled).
|
|
1338
|
+
- **NEW**: `--transport` CLI override for Docker/systemd deployments.
|
|
1339
|
+
- **NEW**: `pip install knowledge-rag[server]` optional dependency for SSE/HTTP (uvicorn).
|
|
1340
|
+
- **CHANGED**: SSE/HTTP mode auto-enables single-instance lock (port collision prevention).
|
|
1341
|
+
- **CHANGED**: `mcp` dependency bumped to `>=1.6.0` (SSE/streamable-http support).
|
|
1342
|
+
- **MIGRATION**: Default transport remains `stdio` — existing users need zero changes. See config.example.yaml for SSE setup.
|
|
1343
|
+
|
|
1344
|
+
### v3.9.1 (2026-06-08)
|
|
1345
|
+
|
|
1346
|
+
- **FIX**: Expand `~` in `config.yaml` path values (`documents_dir`, `data_dir`, `models_cache_dir`) via `expanduser()` on all platforms (#86).
|
|
1347
|
+
- **FIX**: Warn when `documents_dir` resolves to a non-existent path instead of silently indexing zero files.
|
|
1348
|
+
- **FIX**: File watcher now uses accumulate-mode debounce — bulk file copies no longer starve the reindex trigger.
|
|
1349
|
+
- **FIX**: Concurrent `index_all()` calls are serialized via `_index_lock` to prevent ChromaDB SQLite corruption.
|
|
1350
|
+
- **FIX**: `collection.add()` is batched (500 chunks/call) to cap memory usage during large reindex operations.
|
|
1351
|
+
- **NEW**: `KNOWLEDGE_RAG_WATCHER_DISABLED=1` env var to disable the file watcher for troubleshooting.
|
|
1352
|
+
- **NEW**: Progress logging every 10% for reindex operations with >100 documents.
|
|
1353
|
+
|
|
1143
1354
|
### v3.9.0 (2026-05-10) — Quality Gate
|
|
1144
1355
|
|
|
1145
1356
|
**Major governance + CI hardening release. No runtime behavior change in `mcp_server/`. Public API surface unchanged from v3.8.1.**
|
|
@@ -28,17 +28,56 @@ pip install knowledge-rag → restart Claude Code → search_knowledge("your que
|
|
|
28
28
|
|
|
29
29
|
**12 MCP Tools** | **Hybrid Search + Reranking** | **20 File Formats** | **Optional NVIDIA GPU** | **100% Local**
|
|
30
30
|
|
|
31
|
-
[What's New](#whats-new-in-
|
|
31
|
+
[What's New](#whats-new-in-v400) | [Supported Formats](#supported-formats) | [Installation](#installation) | [Configuration](#configuration) | [API Reference](#api-reference) | [Architecture](#architecture)
|
|
32
32
|
|
|
33
33
|
</div>
|
|
34
34
|
|
|
35
35
|
---
|
|
36
36
|
|
|
37
|
-
##
|
|
37
|
+
## Star History
|
|
38
|
+
|
|
39
|
+
<div align="center">
|
|
40
|
+
|
|
41
|
+
<a href="https://www.star-history.com/?repos=lyonzin%2Fknowledge-rag&type=date&legend=top-left">
|
|
42
|
+
<picture>
|
|
43
|
+
<source media="(prefers-color-scheme: dark)" srcset="https://api.star-history.com/chart?repos=lyonzin/knowledge-rag&type=date&theme=dark&legend=top-left" />
|
|
44
|
+
<source media="(prefers-color-scheme: light)" srcset="https://api.star-history.com/chart?repos=lyonzin/knowledge-rag&type=date&legend=top-left" />
|
|
45
|
+
<img alt="Star History Chart" src="https://api.star-history.com/chart?repos=lyonzin/knowledge-rag&type=date&legend=top-left" />
|
|
46
|
+
</picture>
|
|
47
|
+
</a>
|
|
48
|
+
|
|
49
|
+
</div>
|
|
50
|
+
|
|
51
|
+
---
|
|
52
|
+
|
|
53
|
+
## What's New in v4.0.0
|
|
54
|
+
|
|
55
|
+
### Enterprise Concurrent Access — SSE/HTTP Transport (v4.0.0)
|
|
56
|
+
|
|
57
|
+
The server now supports **SSE** and **streamable-http** transport modes. Instead of spawning a separate process per client (stdio), a single server process serves all clients with shared resources — 1 embedding model, 1 ChromaDB, 1 query cache.
|
|
58
|
+
|
|
59
|
+
```yaml
|
|
60
|
+
# config.yaml
|
|
61
|
+
server:
|
|
62
|
+
transport: "sse" # "stdio" | "sse" | "streamable-http"
|
|
63
|
+
host: "127.0.0.1"
|
|
64
|
+
port: 8179
|
|
65
|
+
```
|
|
66
|
+
|
|
67
|
+
Or via CLI: `knowledge-rag --transport sse`
|
|
68
|
+
|
|
69
|
+
**Optional enterprise features** (all disabled by default):
|
|
70
|
+
- **Rate limiting**: Sliding-window counter, configurable RPM and burst
|
|
71
|
+
- **Prometheus metrics**: `/metrics` endpoint on separate port
|
|
72
|
+
- **Bearer auth**: Token validation for SSE/HTTP connections
|
|
73
|
+
|
|
74
|
+
All 12 MCP tools are instrumented with `@rate_limited` and `@instrument` decorators — zero overhead when features are disabled. Default transport remains **stdio** for full backwards compatibility.
|
|
75
|
+
|
|
76
|
+
> **Migration**: Existing users need zero changes. SSE mode is opt-in via `server.transport: "sse"` in config.yaml. See [Configuration](#configuration) for details.
|
|
38
77
|
|
|
39
78
|
### Quality Gate — 7-Pillar PR Validation
|
|
40
79
|
|
|
41
|
-
|
|
80
|
+
Every PR (including dependabot bumps and one-line fixes) is now evaluated against **35+ automated checks** spread across 7 pillars before any human review:
|
|
42
81
|
|
|
43
82
|
| Pillar | What it enforces | Tools |
|
|
44
83
|
|---|---|---|
|
|
@@ -86,6 +125,7 @@ All methods produce the same MCP server. See [Installation](#installation) for f
|
|
|
86
125
|
|
|
87
126
|
### Recent Highlights
|
|
88
127
|
|
|
128
|
+
- **v4.0.0** — **Enterprise concurrent access**: SSE/HTTP transport (1 server → N clients), thread-safe shared state, optional rate limiting + Prometheus metrics, ChromaDB WAL mode, `--transport` CLI
|
|
89
129
|
- **v3.9.0** — **Quality Gate** activated: 35+ automated PR checks across 7 pillars (Security, Stability, Memory Leak, Versatility, Scalability, Versioning, Quality) + nightly resilience suite (chaos, soak, determinism, mutation)
|
|
90
130
|
- **v3.8.1** — Critical hotfix: loud-fail embeddings (no more silent zero-vector corruption); Windows CI flake eradicated (HF_HUB_OFFLINE + shell:bash + atexit wrapper)
|
|
91
131
|
- **v3.8.0** — Lazy-load embeddings, opt-in single-instance guard, version sync across PyPI/NPM/Docker
|
|
@@ -331,6 +371,7 @@ flowchart LR
|
|
|
331
371
|
|
|
332
372
|
- Python 3.11+
|
|
333
373
|
- Claude Code CLI
|
|
374
|
+
- *…or any other MCP client (Claude Desktop, Cursor, VS Code, Antigravity, opencode, Windsurf) — see [Use with other MCP clients](#use-with-other-mcp-clients)*
|
|
334
375
|
- ~200MB disk for model cache (auto-downloaded on first run)
|
|
335
376
|
- *Optional:* NVIDIA GPU + CUDA for accelerated embeddings (`pip install knowledge-rag[gpu]` + `models.embedding.gpu: true` in config)
|
|
336
377
|
|
|
@@ -446,6 +487,94 @@ Add to `~/.claude.json`:
|
|
|
446
487
|
> Replace `YOUR_USER` with your username, or use the full path from `echo $HOME`.
|
|
447
488
|
</details>
|
|
448
489
|
|
|
490
|
+
#### Option F: SSE Server Mode (multi-agent)
|
|
491
|
+
|
|
492
|
+
For multi-agent setups where multiple clients query the same knowledge base simultaneously:
|
|
493
|
+
|
|
494
|
+
```bash
|
|
495
|
+
pip install knowledge-rag[server] # Adds uvicorn for SSE/HTTP
|
|
496
|
+
knowledge-rag --transport sse # Starts on http://127.0.0.1:8179
|
|
497
|
+
```
|
|
498
|
+
|
|
499
|
+
Then configure each MCP client to connect via SSE:
|
|
500
|
+
|
|
501
|
+
```json
|
|
502
|
+
{
|
|
503
|
+
"mcpServers": {
|
|
504
|
+
"knowledge-rag": {
|
|
505
|
+
"type": "sse",
|
|
506
|
+
"url": "http://127.0.0.1:8179/sse"
|
|
507
|
+
}
|
|
508
|
+
}
|
|
509
|
+
}
|
|
510
|
+
```
|
|
511
|
+
|
|
512
|
+
One server process serves all agents — shared embedding model, shared cache, shared ChromaDB. See [Configuration > Server](#server) for rate limiting, metrics, and auth options.
|
|
513
|
+
|
|
514
|
+
### Use with other MCP clients
|
|
515
|
+
|
|
516
|
+
`knowledge-rag` supports both **stdio** (default, 1:1) and **SSE** (1:N) transport modes. In stdio mode, it works with any MCP-compatible client, not only Claude Code. The launch command is the same everywhere (the `python -m mcp_server.server` from whichever install method you picked); only the **config file location** and **JSON shape** differ per client.
|
|
517
|
+
|
|
518
|
+
#### Clients using the standard `mcpServers` format
|
|
519
|
+
|
|
520
|
+
For **Claude Desktop, Cursor, Antigravity, and Windsurf**, use the same block — only the file location changes:
|
|
521
|
+
|
|
522
|
+
```json
|
|
523
|
+
{
|
|
524
|
+
"mcpServers": {
|
|
525
|
+
"knowledge-rag": {
|
|
526
|
+
"command": "/home/YOUR_USER/knowledge-rag/venv/bin/python",
|
|
527
|
+
"args": ["-m", "mcp_server.server"]
|
|
528
|
+
}
|
|
529
|
+
}
|
|
530
|
+
}
|
|
531
|
+
```
|
|
532
|
+
|
|
533
|
+
> **Windows**: set `command` to the full path of `venv\Scripts\python.exe`.
|
|
534
|
+
|
|
535
|
+
| Client | Config file | Notes |
|
|
536
|
+
|---|---|---|
|
|
537
|
+
| **Claude Code** | use `claude mcp add …` (see install methods above) | The CLI writes `~/.claude.json` for you — manual edits to it aren't reliably picked up. |
|
|
538
|
+
| **Claude Desktop** | macOS: `~/Library/Application Support/Claude/claude_desktop_config.json` · Windows: `%APPDATA%\Claude\claude_desktop_config.json` | Easiest: **Settings → Developer → Edit Config** opens the correct file (avoids the Windows Store/MSIX path quirk). |
|
|
539
|
+
| **Cursor** | `~/.cursor/mcp.json` (global) or `.cursor/mcp.json` (per project) | — |
|
|
540
|
+
| **Antigravity** | macOS/Linux: `~/.gemini/antigravity/mcp_config.json` · Windows: `%USERPROFILE%\.gemini\antigravity\mcp_config.json` | Open via Agent panel → **"…" → Manage MCP Servers → View raw config**. |
|
|
541
|
+
| **Windsurf** | `~/.codeium/windsurf/mcp_config.json` (global only) | Easiest: Cascade panel → MCP → **View raw config**. |
|
|
542
|
+
|
|
543
|
+
#### VS Code — uses a `servers` key
|
|
544
|
+
|
|
545
|
+
VS Code (Copilot MCP) nests servers under **`servers`**, not `mcpServers`. Put this in `.vscode/mcp.json` (workspace) or the file opened by the **MCP: Open User Configuration** command:
|
|
546
|
+
|
|
547
|
+
```json
|
|
548
|
+
{
|
|
549
|
+
"servers": {
|
|
550
|
+
"knowledge-rag": {
|
|
551
|
+
"type": "stdio",
|
|
552
|
+
"command": "/home/YOUR_USER/knowledge-rag/venv/bin/python",
|
|
553
|
+
"args": ["-m", "mcp_server.server"]
|
|
554
|
+
}
|
|
555
|
+
}
|
|
556
|
+
}
|
|
557
|
+
```
|
|
558
|
+
|
|
559
|
+
#### opencode — uses an `mcp` key
|
|
560
|
+
|
|
561
|
+
opencode nests servers under **`mcp`**, takes `command` as a single **array**, and uses `environment` instead of `env`. Put this in `opencode.json` (project root) or `~/.config/opencode/opencode.json` (global):
|
|
562
|
+
|
|
563
|
+
```jsonc
|
|
564
|
+
{
|
|
565
|
+
"$schema": "https://opencode.ai/config.json",
|
|
566
|
+
"mcp": {
|
|
567
|
+
"knowledge-rag": {
|
|
568
|
+
"type": "local",
|
|
569
|
+
"command": ["/home/YOUR_USER/knowledge-rag/venv/bin/python", "-m", "mcp_server.server"],
|
|
570
|
+
"enabled": true
|
|
571
|
+
}
|
|
572
|
+
}
|
|
573
|
+
}
|
|
574
|
+
```
|
|
575
|
+
|
|
576
|
+
> **Any other MCP client**: point it at the same command + args (`…/venv/bin/python -m mcp_server.server`). If it speaks stdio MCP, knowledge-rag works — only the config file's location and key naming differ. Check your client's docs for the exact path.
|
|
577
|
+
|
|
449
578
|
### Verify
|
|
450
579
|
|
|
451
580
|
```bash
|
|
@@ -842,6 +971,21 @@ query_expansions:
|
|
|
842
971
|
privesc:
|
|
843
972
|
- privilege escalation
|
|
844
973
|
- privesc
|
|
974
|
+
|
|
975
|
+
# Server — enterprise features (new in v4.0.0)
|
|
976
|
+
server:
|
|
977
|
+
transport: "stdio" # "stdio" | "sse" | "streamable-http"
|
|
978
|
+
host: "127.0.0.1" # Bind address (SSE/HTTP only)
|
|
979
|
+
port: 8179 # Bind port (SSE/HTTP only)
|
|
980
|
+
auth:
|
|
981
|
+
bearer_token: "" # Set a secret to enable auth (SSE/HTTP only)
|
|
982
|
+
rate_limit:
|
|
983
|
+
enabled: false
|
|
984
|
+
requests_per_minute: 60
|
|
985
|
+
burst: 10
|
|
986
|
+
metrics:
|
|
987
|
+
enabled: false
|
|
988
|
+
port: 9179 # Separate port for Prometheus scraping
|
|
845
989
|
```
|
|
846
990
|
|
|
847
991
|
> See `config.example.yaml` for the fully documented template with explanations for every field.
|
|
@@ -861,6 +1005,22 @@ Pre-built configurations for common use cases:
|
|
|
861
1005
|
|
|
862
1006
|
### Configuration Reference
|
|
863
1007
|
|
|
1008
|
+
#### Server
|
|
1009
|
+
|
|
1010
|
+
| Field | Default | Description |
|
|
1011
|
+
|-------|---------|-------------|
|
|
1012
|
+
| `server.transport` | `"stdio"` | Transport protocol: `"stdio"`, `"sse"`, or `"streamable-http"` |
|
|
1013
|
+
| `server.host` | `"127.0.0.1"` | Bind address for SSE/HTTP mode |
|
|
1014
|
+
| `server.port` | `8179` | Bind port for SSE/HTTP mode |
|
|
1015
|
+
| `server.auth.bearer_token` | `""` (disabled) | Bearer token for SSE/HTTP auth. Empty = no auth |
|
|
1016
|
+
| `server.rate_limit.enabled` | `false` | Enable per-client rate limiting |
|
|
1017
|
+
| `server.rate_limit.requests_per_minute` | `60` | Max requests per minute |
|
|
1018
|
+
| `server.rate_limit.burst` | `10` | Burst allowance above steady rate |
|
|
1019
|
+
| `server.metrics.enabled` | `false` | Enable Prometheus `/metrics` endpoint |
|
|
1020
|
+
| `server.metrics.port` | `9179` | Port for metrics scraping |
|
|
1021
|
+
|
|
1022
|
+
In stdio mode (default), server settings are ignored. SSE/HTTP mode auto-enables the single-instance lock.
|
|
1023
|
+
|
|
864
1024
|
#### Paths
|
|
865
1025
|
|
|
866
1026
|
| Field | Default | Description |
|
|
@@ -1098,10 +1258,59 @@ export KNOWLEDGE_RAG_SINGLE_INSTANCE=1
|
|
|
1098
1258
|
|
|
1099
1259
|
A second instance exits immediately with code 75. Default is OFF (multi-client friendly). Full guide: [docs/single-instance.md](docs/single-instance.md). Sample MCP config: [examples/mcp-config-single-instance.json](examples/mcp-config-single-instance.json).
|
|
1100
1260
|
|
|
1261
|
+
### SSE server won't start
|
|
1262
|
+
|
|
1263
|
+
```bash
|
|
1264
|
+
# Check if port 8179 is already in use
|
|
1265
|
+
# Windows:
|
|
1266
|
+
netstat -aon | findstr :8179
|
|
1267
|
+
# Linux/macOS:
|
|
1268
|
+
lsof -i :8179
|
|
1269
|
+
```
|
|
1270
|
+
|
|
1271
|
+
If `uvicorn` is not found, install the server extras: `pip install knowledge-rag[server]`
|
|
1272
|
+
|
|
1273
|
+
### Can't connect to SSE server
|
|
1274
|
+
|
|
1275
|
+
Verify the server is running and the URL is correct:
|
|
1276
|
+
|
|
1277
|
+
```bash
|
|
1278
|
+
curl http://127.0.0.1:8179/sse
|
|
1279
|
+
```
|
|
1280
|
+
|
|
1281
|
+
Common issues:
|
|
1282
|
+
- Wrong URL: must end with `/sse` (not just the port)
|
|
1283
|
+
- Firewall blocking the port
|
|
1284
|
+
- Server started with a different host/port than configured in the MCP client
|
|
1285
|
+
|
|
1101
1286
|
---
|
|
1102
1287
|
|
|
1103
1288
|
## Changelog
|
|
1104
1289
|
|
|
1290
|
+
### v4.0.0 (2026-06-09) — Enterprise Concurrent Access
|
|
1291
|
+
|
|
1292
|
+
- **NEW**: SSE and streamable-http transport modes — 1 server serves N clients (`server.transport: "sse"` in config.yaml or `--transport sse` CLI).
|
|
1293
|
+
- **NEW**: Thread-safe shared state for concurrent queries — QueryCache locking, BM25 build lock, orchestrator double-checked locking.
|
|
1294
|
+
- **NEW**: ChromaDB WAL mode enabled automatically in SSE/HTTP mode for concurrent read performance.
|
|
1295
|
+
- **NEW**: Optional rate limiting — sliding-window counter, configurable RPM and burst, disabled by default.
|
|
1296
|
+
- **NEW**: Optional Prometheus metrics endpoint — tool call counts, latency histograms, separate port, disabled by default.
|
|
1297
|
+
- **NEW**: All 12 MCP tools instrumented with `@rate_limited` and `@instrument` decorators (zero-cost when disabled).
|
|
1298
|
+
- **NEW**: `--transport` CLI override for Docker/systemd deployments.
|
|
1299
|
+
- **NEW**: `pip install knowledge-rag[server]` optional dependency for SSE/HTTP (uvicorn).
|
|
1300
|
+
- **CHANGED**: SSE/HTTP mode auto-enables single-instance lock (port collision prevention).
|
|
1301
|
+
- **CHANGED**: `mcp` dependency bumped to `>=1.6.0` (SSE/streamable-http support).
|
|
1302
|
+
- **MIGRATION**: Default transport remains `stdio` — existing users need zero changes. See config.example.yaml for SSE setup.
|
|
1303
|
+
|
|
1304
|
+
### v3.9.1 (2026-06-08)
|
|
1305
|
+
|
|
1306
|
+
- **FIX**: Expand `~` in `config.yaml` path values (`documents_dir`, `data_dir`, `models_cache_dir`) via `expanduser()` on all platforms (#86).
|
|
1307
|
+
- **FIX**: Warn when `documents_dir` resolves to a non-existent path instead of silently indexing zero files.
|
|
1308
|
+
- **FIX**: File watcher now uses accumulate-mode debounce — bulk file copies no longer starve the reindex trigger.
|
|
1309
|
+
- **FIX**: Concurrent `index_all()` calls are serialized via `_index_lock` to prevent ChromaDB SQLite corruption.
|
|
1310
|
+
- **FIX**: `collection.add()` is batched (500 chunks/call) to cap memory usage during large reindex operations.
|
|
1311
|
+
- **NEW**: `KNOWLEDGE_RAG_WATCHER_DISABLED=1` env var to disable the file watcher for troubleshooting.
|
|
1312
|
+
- **NEW**: Progress logging every 10% for reindex operations with >100 documents.
|
|
1313
|
+
|
|
1105
1314
|
### v3.9.0 (2026-05-10) — Quality Gate
|
|
1106
1315
|
|
|
1107
1316
|
**Major governance + CI hardening release. No runtime behavior change in `mcp_server/`. Public API surface unchanged from v3.8.1.**
|
|
@@ -245,3 +245,36 @@ query_expansions: {}
|
|
|
245
245
|
#
|
|
246
246
|
# # Logging verbosity: DEBUG, INFO, WARNING, ERROR
|
|
247
247
|
# # log_level: "INFO"
|
|
248
|
+
|
|
249
|
+
|
|
250
|
+
# ============================================================================
|
|
251
|
+
# SERVER (new in v4.0.0)
|
|
252
|
+
# ============================================================================
|
|
253
|
+
# Controls transport, networking, and enterprise features.
|
|
254
|
+
# All fields are optional — defaults preserve v3.x stdio behavior.
|
|
255
|
+
|
|
256
|
+
server:
|
|
257
|
+
# Transport protocol: "stdio" (legacy), "sse", "streamable-http"
|
|
258
|
+
# stdio: 1 process per client (compatible with all MCP clients)
|
|
259
|
+
# sse: 1 server serves N clients over HTTP+SSE (recommended for multi-agent)
|
|
260
|
+
# streamable-http: 1 server, HTTP streaming
|
|
261
|
+
transport: "stdio"
|
|
262
|
+
|
|
263
|
+
# Network settings (ignored when transport is stdio)
|
|
264
|
+
host: "127.0.0.1"
|
|
265
|
+
port: 8179
|
|
266
|
+
|
|
267
|
+
# Auth: optional bearer token validation (SSE/HTTP only)
|
|
268
|
+
auth:
|
|
269
|
+
bearer_token: ""
|
|
270
|
+
|
|
271
|
+
# Rate limiting: optional per-client request throttling
|
|
272
|
+
rate_limit:
|
|
273
|
+
enabled: false
|
|
274
|
+
requests_per_minute: 60
|
|
275
|
+
burst: 10
|
|
276
|
+
|
|
277
|
+
# Metrics: optional Prometheus-compatible /metrics endpoint
|
|
278
|
+
metrics:
|
|
279
|
+
enabled: false
|
|
280
|
+
port: 9179
|