knowledge-rag 3.5.0__tar.gz → 3.5.2__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: knowledge-rag
- Version: 3.5.0
+ Version: 3.5.2
  Summary: Local RAG System for Claude Code — Hybrid search + Cross-encoder Reranking + 12 MCP Tools. Zero external servers.
  Project-URL: Homepage, https://github.com/lyonzin/knowledge-rag
  Project-URL: Repository, https://github.com/lyonzin/knowledge-rag
@@ -19,7 +19,7 @@ Classifier: Programming Language :: Python :: 3.11
  Classifier: Programming Language :: Python :: 3.12
  Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
  Classifier: Topic :: Text Processing :: Indexing
- Requires-Python: <3.13,>=3.11
+ Requires-Python: >=3.11
  Requires-Dist: beautifulsoup4>=4.12.0
  Requires-Dist: chromadb>=1.4.0
  Requires-Dist: fastembed[reranking]>=0.4.0
@@ -40,10 +40,11 @@ Description-Content-Type: text/markdown
 
  <div align="center">
 
- ![Version](https://img.shields.io/badge/version-3.5.0-blue.svg)
- ![Python](https://img.shields.io/badge/python-3.11%20%7C%203.12-green.svg)
+ ![Version](https://img.shields.io/badge/version-3.5.2-blue.svg)
+ ![Python](https://img.shields.io/badge/python-3.11%2B-green.svg)
  ![License](https://img.shields.io/badge/license-MIT-yellow.svg)
  ![Platform](https://img.shields.io/badge/platform-Windows%20%7C%20Linux%20%7C%20macOS-lightgrey.svg)
+ ![GPU](https://img.shields.io/badge/GPU-NVIDIA%20CUDA-76B900.svg?logo=nvidia)
  [![CI](https://github.com/lyonzin/knowledge-rag/actions/workflows/ci.yml/badge.svg)](https://github.com/lyonzin/knowledge-rag/actions/workflows/ci.yml)
  [![CodeQL](https://github.com/lyonzin/knowledge-rag/actions/workflows/security.yml/badge.svg)](https://github.com/lyonzin/knowledge-rag/actions/workflows/security.yml)
  [![Glama Score](https://glama.ai/mcp/servers/lyonzin/knowledge-rag/badges/score.svg)](https://glama.ai/mcp/servers/lyonzin/knowledge-rag)
@@ -63,35 +64,47 @@ Your documents become instantly searchable inside Claude Code — with reranking
 
  **12 MCP Tools** | **Hybrid Search + Cross-Encoder Reranking** | **Markdown-Aware Chunking** | **100% Local, Zero Cloud**
 
- [What's New](#whats-new-in-v350) | [Supported Formats](#supported-formats) | [Installation](#installation) | [Configuration](#configuration) | [API Reference](#api-reference) | [Architecture](#architecture)
+ [What's New](#whats-new-in-v352) | [Supported Formats](#supported-formats) | [Installation](#installation) | [Configuration](#configuration) | [API Reference](#api-reference) | [Architecture](#architecture)
 
  </div>
 
  ---
 
- ## What's New in v3.5.0
+ ## What's New in v3.5.2
 
  ### GPU-Accelerated Embeddings (Optional)
 
- ONNX embeddings can now run on NVIDIA GPUs for **5-10x faster indexing**. Opt-in — CPU remains the default.
+ ONNX embeddings can run on NVIDIA GPUs for **5-10x faster indexing**. Opt-in — CPU remains the default.
 
  ```bash
+ # NVIDIA GPU (requires CUDA 12.x drivers)
  pip install knowledge-rag[gpu]
+
+ # Also install CUDA 12 runtime libraries (if not using CUDA Toolkit 12.x)
+ pip install nvidia-cublas-cu12 nvidia-cudnn-cu12 nvidia-cuda-runtime-cu12
  ```
 
  ```yaml
  # config.yaml
  models:
    embedding:
-     gpu: true # Falls back to CPU if CUDA unavailable
+     gpu: true # Automatic CPU fallback if CUDA is unavailable
  ```
 
+ **How it works:**
+ - Sets `CUDAExecutionProvider` as primary, `CPUExecutionProvider` as fallback
+ - Auto-discovers CUDA 12 DLLs from pip-installed NVIDIA packages (no manual PATH config)
+ - If GPU init fails for any reason, falls back to CPU silently with a `[WARN]` log
+ - `gpu: false` (default) forces CPU-only mode — zero CUDA overhead, clean logs
+
  Ideal for large knowledge bases (1000+ documents) where full rebuilds take minutes on CPU. After the initial index, incremental reindexing (`force: true`) takes seconds regardless.
 
  ### Recent Highlights
 
+ - **v3.5.2** — CUDA DLL auto-discovery from pip packages, graceful GPU→CPU fallback, explicit CPU provider (no CUDA noise when `gpu: false`), BASE_DIR resolution fix for editable installs
+ - **v3.5.1** — Remove Python `<3.13` upper bound — 3.13 and 3.14 now supported
+ - **v3.5.0** — Optional GPU acceleration, supported formats table, full README rewrite
  - **v3.4.3** — MCP stdout save/restore fix (v3.4.2 broke JSON-RPC responses)
- - **v3.4.1** — `pip install` auto-detects project dir from venv location, Linux/macOS `install.sh`
  - **v3.4.0** — Persistent model cache, exclude patterns, Jupyter Notebook parser, inotify resilience, MetaTrader support
 
  See [Changelog](#changelog) for full history.
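The "How it works" bullets describe a CUDA-first provider list with a CPU-only retry when GPU initialisation raises, and an explicit CPU-only list when `gpu: false`. A minimal sketch of that control flow, assuming a hypothetical `load_with_fallback` helper and a `factory` callable that stands in for the real embedding constructor (the sketch does not import fastembed or onnxruntime, and it only covers the case where construction raises `ValueError` or `RuntimeError`, the same two exceptions the diff catches):

```python
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")

GPU_PROVIDERS = ["CUDAExecutionProvider", "CPUExecutionProvider"]
CPU_PROVIDERS = ["CPUExecutionProvider"]


def load_with_fallback(factory: Callable[[List[str]], T], gpu: bool) -> Tuple[T, str]:
    """Build a model with the provider list implied by the `gpu` flag.

    `factory` is a hypothetical stand-in for a constructor that accepts an
    ONNX Runtime provider list. Returns the model plus a label naming the path taken.
    """
    if not gpu:
        # gpu: false -> pass the CPU provider explicitly so CUDA is never probed.
        return factory(list(CPU_PROVIDERS)), "cpu"
    try:
        # CUDA first, with the CPU provider kept in the list as a secondary option.
        return factory(list(GPU_PROVIDERS)), "gpu"
    except (ValueError, RuntimeError) as exc:
        # Construction failed outright: log a warning and retry on CPU only.
        print(f"[WARN] GPU init failed ({exc}), falling back to CPU...")
        return factory(list(CPU_PROVIDERS)), "cpu-fallback"
```

Passing the constructor in as a callable keeps the fallback logic testable on machines without a GPU: a fake factory that raises whenever `CUDAExecutionProvider` is requested exercises the retry path.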
@@ -319,7 +332,7 @@ flowchart LR
 
  ### Prerequisites
 
- - Python 3.11 or 3.12 (**NOT** 3.13+ — onnxruntime incompatibility)
+ - Python 3.11+
  - Claude Code CLI
  - ~200MB disk for model cache (auto-downloaded on first run)
  - *Optional:* NVIDIA GPU + CUDA for [accelerated embeddings](#gpu-accelerated-embeddings-optional) (`pip install knowledge-rag[gpu]`)
@@ -950,17 +963,10 @@ knowledge-rag/
 
  ### Python version mismatch
 
- ChromaDB depends on onnxruntime which requires Python 3.11 or 3.12. Python 3.13+ is **NOT** supported.
+ Requires Python 3.11 or newer.
 
  ```bash
- # Check version
- python --version
-
- # Windows: use specific version
- py -3.12 -m venv venv
-
- # Linux/macOS: use specific version
- python3.12 -m venv venv
+ python --version # Must be 3.11+
  ```
 
  ### FastEmbed model download fails
@@ -1018,6 +1024,17 @@ With ~200 documents, expect ~300-500MB RAM. The embedding model (~50MB) and rera
 
  ## Changelog
 
+ ### v3.5.2 (2026-04-16)
+
+ - **NEW**: Auto-discovery of CUDA 12 DLLs from pip-installed NVIDIA packages — no manual PATH configuration needed
+ - **NEW**: Graceful GPU→CPU fallback with `[WARN]` log when CUDA init fails (missing drivers, wrong version, etc.)
+ - **FIX**: Explicit `CPUExecutionProvider` when `gpu: false` — eliminates noisy CUDA probe errors in logs
+ - **FIX**: BASE_DIR resolution now correctly prefers directories with `config.yaml` over those with only `config.example.yaml` (fixes editable installs)
+
+ ### v3.5.1 (2026-04-16)
+
+ - **FIX**: Removed Python upper bound constraint (`<3.13` → `>=3.11`). Python 3.13 and 3.14 now supported — onnxruntime ships wheels for both.
+
  ### v3.5.0 (2026-04-16)
 
  - **NEW**: Optional GPU acceleration for ONNX embeddings — `pip install knowledge-rag[gpu]` + `models.embedding.gpu: true` in config. 5-10x faster indexing on NVIDIA GPUs with automatic CPU fallback.
@@ -2,10 +2,11 @@
 
  <div align="center">
 
- ![Version](https://img.shields.io/badge/version-3.5.0-blue.svg)
- ![Python](https://img.shields.io/badge/python-3.11%20%7C%203.12-green.svg)
+ ![Version](https://img.shields.io/badge/version-3.5.2-blue.svg)
+ ![Python](https://img.shields.io/badge/python-3.11%2B-green.svg)
  ![License](https://img.shields.io/badge/license-MIT-yellow.svg)
  ![Platform](https://img.shields.io/badge/platform-Windows%20%7C%20Linux%20%7C%20macOS-lightgrey.svg)
+ ![GPU](https://img.shields.io/badge/GPU-NVIDIA%20CUDA-76B900.svg?logo=nvidia)
  [![CI](https://github.com/lyonzin/knowledge-rag/actions/workflows/ci.yml/badge.svg)](https://github.com/lyonzin/knowledge-rag/actions/workflows/ci.yml)
  [![CodeQL](https://github.com/lyonzin/knowledge-rag/actions/workflows/security.yml/badge.svg)](https://github.com/lyonzin/knowledge-rag/actions/workflows/security.yml)
  [![Glama Score](https://glama.ai/mcp/servers/lyonzin/knowledge-rag/badges/score.svg)](https://glama.ai/mcp/servers/lyonzin/knowledge-rag)
@@ -25,35 +26,47 @@ Your documents become instantly searchable inside Claude Code — with reranking
 
  **12 MCP Tools** | **Hybrid Search + Cross-Encoder Reranking** | **Markdown-Aware Chunking** | **100% Local, Zero Cloud**
 
- [What's New](#whats-new-in-v350) | [Supported Formats](#supported-formats) | [Installation](#installation) | [Configuration](#configuration) | [API Reference](#api-reference) | [Architecture](#architecture)
+ [What's New](#whats-new-in-v352) | [Supported Formats](#supported-formats) | [Installation](#installation) | [Configuration](#configuration) | [API Reference](#api-reference) | [Architecture](#architecture)
 
  </div>
 
  ---
 
- ## What's New in v3.5.0
+ ## What's New in v3.5.2
 
  ### GPU-Accelerated Embeddings (Optional)
 
- ONNX embeddings can now run on NVIDIA GPUs for **5-10x faster indexing**. Opt-in — CPU remains the default.
+ ONNX embeddings can run on NVIDIA GPUs for **5-10x faster indexing**. Opt-in — CPU remains the default.
 
  ```bash
+ # NVIDIA GPU (requires CUDA 12.x drivers)
  pip install knowledge-rag[gpu]
+
+ # Also install CUDA 12 runtime libraries (if not using CUDA Toolkit 12.x)
+ pip install nvidia-cublas-cu12 nvidia-cudnn-cu12 nvidia-cuda-runtime-cu12
  ```
 
  ```yaml
  # config.yaml
  models:
    embedding:
-     gpu: true # Falls back to CPU if CUDA unavailable
+     gpu: true # Automatic CPU fallback if CUDA is unavailable
  ```
 
+ **How it works:**
+ - Sets `CUDAExecutionProvider` as primary, `CPUExecutionProvider` as fallback
+ - Auto-discovers CUDA 12 DLLs from pip-installed NVIDIA packages (no manual PATH config)
+ - If GPU init fails for any reason, falls back to CPU silently with a `[WARN]` log
+ - `gpu: false` (default) forces CPU-only mode — zero CUDA overhead, clean logs
+
  Ideal for large knowledge bases (1000+ documents) where full rebuilds take minutes on CPU. After the initial index, incremental reindexing (`force: true`) takes seconds regardless.
 
  ### Recent Highlights
 
+ - **v3.5.2** — CUDA DLL auto-discovery from pip packages, graceful GPU→CPU fallback, explicit CPU provider (no CUDA noise when `gpu: false`), BASE_DIR resolution fix for editable installs
+ - **v3.5.1** — Remove Python `<3.13` upper bound — 3.13 and 3.14 now supported
+ - **v3.5.0** — Optional GPU acceleration, supported formats table, full README rewrite
  - **v3.4.3** — MCP stdout save/restore fix (v3.4.2 broke JSON-RPC responses)
- - **v3.4.1** — `pip install` auto-detects project dir from venv location, Linux/macOS `install.sh`
  - **v3.4.0** — Persistent model cache, exclude patterns, Jupyter Notebook parser, inotify resilience, MetaTrader support
 
  See [Changelog](#changelog) for full history.
@@ -281,7 +294,7 @@ flowchart LR
 
  ### Prerequisites
 
- - Python 3.11 or 3.12 (**NOT** 3.13+ — onnxruntime incompatibility)
+ - Python 3.11+
  - Claude Code CLI
  - ~200MB disk for model cache (auto-downloaded on first run)
  - *Optional:* NVIDIA GPU + CUDA for [accelerated embeddings](#gpu-accelerated-embeddings-optional) (`pip install knowledge-rag[gpu]`)
@@ -912,17 +925,10 @@ knowledge-rag/
 
  ### Python version mismatch
 
- ChromaDB depends on onnxruntime which requires Python 3.11 or 3.12. Python 3.13+ is **NOT** supported.
+ Requires Python 3.11 or newer.
 
  ```bash
- # Check version
- python --version
-
- # Windows: use specific version
- py -3.12 -m venv venv
-
- # Linux/macOS: use specific version
- python3.12 -m venv venv
+ python --version # Must be 3.11+
  ```
 
  ### FastEmbed model download fails
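The troubleshooting entry now reduces to a single floor of 3.11 with no ceiling, matching `requires-python = ">=3.11"`. If you want the same check inside a launcher script rather than by reading `python --version`, a minimal sketch follows; `MIN_PYTHON`, `python_supported` and `require_python` are hypothetical names, not part of knowledge-rag:

```python
import sys
from typing import Sequence, Tuple

# Mirrors requires-python = ">=3.11"; there is deliberately no upper bound.
MIN_PYTHON: Tuple[int, int] = (3, 11)


def python_supported(version: Sequence[int] = tuple(sys.version_info[:2])) -> bool:
    """Return True if the (major, minor) prefix of `version` is at least MIN_PYTHON."""
    return tuple(version[:2]) >= MIN_PYTHON


def require_python() -> None:
    """Exit with a readable message when the running interpreter is too old."""
    if not python_supported():
        found = ".".join(str(part) for part in sys.version_info[:3])
        sys.exit(f"knowledge-rag needs Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+, found {found}")
```

Comparing `(major, minor)` tuples avoids string comparison pitfalls such as `"3.9" > "3.11"` being true.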
@@ -980,6 +986,17 @@ With ~200 documents, expect ~300-500MB RAM. The embedding model (~50MB) and rera
 
  ## Changelog
 
+ ### v3.5.2 (2026-04-16)
+
+ - **NEW**: Auto-discovery of CUDA 12 DLLs from pip-installed NVIDIA packages — no manual PATH configuration needed
+ - **NEW**: Graceful GPU→CPU fallback with `[WARN]` log when CUDA init fails (missing drivers, wrong version, etc.)
+ - **FIX**: Explicit `CPUExecutionProvider` when `gpu: false` — eliminates noisy CUDA probe errors in logs
+ - **FIX**: BASE_DIR resolution now correctly prefers directories with `config.yaml` over those with only `config.example.yaml` (fixes editable installs)
+
+ ### v3.5.1 (2026-04-16)
+
+ - **FIX**: Removed Python upper bound constraint (`<3.13` → `>=3.11`). Python 3.13 and 3.14 now supported — onnxruntime ships wheels for both.
+
  ### v3.5.0 (2026-04-16)
 
  - **NEW**: Optional GPU acceleration for ONNX embeddings — `pip install knowledge-rag[gpu]` + `models.embedding.gpu: true` in config. 5-10x faster indexing on NVIDIA GPUs with automatic CPU fallback.
@@ -8,7 +8,7 @@ import sys # noqa: I001
  _original_stdout = sys.stdout
  sys.stdout = sys.stderr
 
- __version__ = "3.5.0"
+ __version__ = "3.5.2"
  __author__ = "Ailton Rocha (Lyon.)"
 
  from .config import Config # noqa: E402
@@ -51,12 +51,17 @@ _venv_dir = _venv_project_dir()
 
  if os.environ.get("KNOWLEDGE_RAG_DIR"):
      BASE_DIR = Path(os.environ["KNOWLEDGE_RAG_DIR"])
+ elif _venv_dir is not None and (_venv_dir / "config.yaml").exists():
+     # Prefer venv parent if it has an actual config.yaml (editable installs, PyPI installs)
+     BASE_DIR = _venv_dir
+ elif _is_project_root(_source_dir) and (_source_dir / "config.yaml").exists():
+     BASE_DIR = _source_dir
+ elif _is_project_root(Path.cwd()):
+     BASE_DIR = Path.cwd()
  elif _is_project_root(_source_dir):
      BASE_DIR = _source_dir
  elif _is_project_root(_venv_dir):
      BASE_DIR = _venv_dir
- elif _is_project_root(Path.cwd()):
-     BASE_DIR = Path.cwd()
  else:
      BASE_DIR = _venv_dir if _venv_dir is not None else Path.cwd()
 
@@ -19,7 +19,7 @@ Features:
  - CRUD operations via MCP tools (add, update, remove docs)
 
  Autor: Lyon (Ailton Rocha)
- Versao: 3.5.0
+ Versao: 3.5.2
  Data: 2026-04-16
  """
 
@@ -138,17 +138,60 @@ class FastEmbedEmbeddings:
      Model: BAAI/bge-small-en-v1.5 (384-dim, MTEB score 62.x)
      """
 
+     @staticmethod
+     def _setup_cuda_dll_paths():
+         """Add NVIDIA CUDA 12 pip package DLL paths to os.environ['PATH'].
+
+         When onnxruntime-gpu is installed alongside nvidia-cublas-cu12 etc.,
+         the DLLs live under site-packages/nvidia/*/bin/ and onnxruntime can't
+         find them unless they're on PATH. This is a no-op if the dirs don't exist.
+         """
+         import os
+         import site
+
+         site_dirs = site.getsitepackages() if hasattr(site, "getsitepackages") else []
+         nvidia_libs = [
+             "nvidia/cublas/bin",
+             "nvidia/cudnn/bin",
+             "nvidia/cuda_runtime/bin",
+             "nvidia/cufft/bin",
+             "nvidia/curand/bin",
+             "nvidia/cusolver/bin",
+             "nvidia/cusparse/bin",
+             "nvidia/nvjitlink/bin",
+             "nvidia/cuda_nvrtc/bin",
+         ]
+         added = []
+         for sp in site_dirs:
+             for lib in nvidia_libs:
+                 p = os.path.join(sp, lib)
+                 if os.path.isdir(p) and p not in os.environ.get("PATH", ""):
+                     os.environ["PATH"] = p + os.pathsep + os.environ.get("PATH", "")
+                     added.append(lib.split("/")[1])
+         if added:
+             print(f"[INFO] CUDA DLL paths added for: {', '.join(dict.fromkeys(added))}")
+
      def __init__(self, model: str = None):
          self.model_name = model or config.embedding_model
          self._dim = config.embedding_dim
          kwargs = {"model_name": self.model_name, "cache_dir": str(config.models_cache_dir)}
          if config.gpu_acceleration:
+             self._setup_cuda_dll_paths()
              kwargs["providers"] = ["CUDAExecutionProvider", "CPUExecutionProvider"]
              print(f"[INFO] Loading embedding model: {self.model_name} ({self._dim}D) [GPU accelerated]...")
+             try:
+                 self._model = TextEmbedding(**kwargs)
+                 print("[INFO] Embedding model loaded successfully [GPU]")
+             except (ValueError, RuntimeError) as e:
+                 print(f"[WARN] GPU init failed ({e}), falling back to CPU...")
+                 kwargs["providers"] = ["CPUExecutionProvider"]
+                 self._model = TextEmbedding(**kwargs)
+                 print("[INFO] Embedding model loaded successfully [CPU fallback]")
          else:
+             kwargs["providers"] = ["CPUExecutionProvider"]
              print(f"[INFO] Loading embedding model: {self.model_name} ({self._dim}D)...")
-         self._model = TextEmbedding(**kwargs)
-         print("[INFO] Embedding model loaded successfully")
+             self._model = TextEmbedding(**kwargs)
+             print("[INFO] Embedding model loaded successfully")
 
      def __call__(self, input: List[str]) -> List[List[float]]:
          """
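`_setup_cuda_dll_paths` prepends any existing `site-packages/nvidia/*/bin` directories to `PATH` before ONNX Runtime is loaded. A minimal sketch of the same idea as a pure function, so it can be tested without touching the real environment. `prepend_existing_dirs` is a hypothetical name, the library list is a subset of the one in the diff, and the sketch makes one deliberate change: it de-duplicates by comparing whole `PATH` entries, where the diff uses a substring test on the `PATH` string (which would also skip a directory whose path merely appears inside a longer entry).

```python
import os
from typing import List, Sequence, Tuple

# Subset of the sub-directories listed in the diff.
NVIDIA_LIBS = ["nvidia/cublas/bin", "nvidia/cudnn/bin", "nvidia/cuda_runtime/bin"]


def prepend_existing_dirs(
    site_dirs: Sequence[str], libs: Sequence[str], path_value: str
) -> Tuple[str, List[str]]:
    """Return (new PATH string, unique names of the libs that were added).

    A candidate is added only if it exists on disk and is not already a
    whole entry in `path_value`. Later candidates end up in front, as in the diff.
    """
    entries = [entry for entry in path_value.split(os.pathsep) if entry]
    added: List[str] = []
    for sp in site_dirs:
        for lib in libs:
            candidate = os.path.join(sp, *lib.split("/"))  # platform-native separators
            if os.path.isdir(candidate) and candidate not in entries:
                entries.insert(0, candidate)
                added.append(lib.split("/")[1])  # e.g. "cublas"
    return os.pathsep.join(entries), list(dict.fromkeys(added))
```

Applying it would look like `os.environ["PATH"], libs = prepend_existing_dirs(site.getsitepackages(), NVIDIA_LIBS, os.environ.get("PATH", ""))`; returning the new string instead of mutating `os.environ` keeps the function side-effect free.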
@@ -4,11 +4,11 @@ build-backend = "hatchling.build"
 
  [project]
  name = "knowledge-rag"
- version = "3.5.0"
+ version = "3.5.2"
  description = "Local RAG System for Claude Code — Hybrid search + Cross-encoder Reranking + 12 MCP Tools. Zero external servers."
  readme = "README.md"
  license = {text = "MIT"}
- requires-python = ">=3.11,<3.13"
+ requires-python = ">=3.11"
  authors = [
      {name = "Lyon.", email = "lyonzin@users.noreply.github.com"},
  ]
File without changes
File without changes