PyPI - knowledge-rag - Versions diffs - 3.5.0__tar.gz → 3.5.2__tar.gz - Mend

knowledge-rag 3.5.0__tar.gz → 3.5.2__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (16)

{knowledge_rag-3.5.0 → knowledge_rag-3.5.2}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: knowledge-rag
-Version: 3.5.0
+Version: 3.5.2
 Summary: Local RAG System for Claude Code — Hybrid search + Cross-encoder Reranking + 12 MCP Tools. Zero external servers.
 Project-URL: Homepage, https://github.com/lyonzin/knowledge-rag
 Project-URL: Repository, https://github.com/lyonzin/knowledge-rag
@@ -19,7 +19,7 @@ Classifier: Programming Language :: Python :: 3.11
 Classifier: Programming Language :: Python :: 3.12
 Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
 Classifier: Topic :: Text Processing :: Indexing
-Requires-Python: <3.13,>=3.11
+Requires-Python: >=3.11
 Requires-Dist: beautifulsoup4>=4.12.0
 Requires-Dist: chromadb>=1.4.0
 Requires-Dist: fastembed[reranking]>=0.4.0
@@ -40,10 +40,11 @@ Description-Content-Type: text/markdown
 <div align="center">
-![Version](https://img.shields.io/badge/version-3.5.0-blue.svg)
-![Python](https://img.shields.io/badge/python-3.11%20%7C%203.12-green.svg)
+![Version](https://img.shields.io/badge/version-3.5.2-blue.svg)
+![Python](https://img.shields.io/badge/python-3.11%2B-green.svg)
 ![License](https://img.shields.io/badge/license-MIT-yellow.svg)
 ![Platform](https://img.shields.io/badge/platform-Windows%20%7C%20Linux%20%7C%20macOS-lightgrey.svg)
+![GPU](https://img.shields.io/badge/GPU-NVIDIA%20CUDA-76B900.svg?logo=nvidia)
 [![CI](https://github.com/lyonzin/knowledge-rag/actions/workflows/ci.yml/badge.svg)](https://github.com/lyonzin/knowledge-rag/actions/workflows/ci.yml)
 [![CodeQL](https://github.com/lyonzin/knowledge-rag/actions/workflows/security.yml/badge.svg)](https://github.com/lyonzin/knowledge-rag/actions/workflows/security.yml)
 [![Glama Score](https://glama.ai/mcp/servers/lyonzin/knowledge-rag/badges/score.svg)](https://glama.ai/mcp/servers/lyonzin/knowledge-rag)
@@ -63,35 +64,47 @@ Your documents become instantly searchable inside Claude Code — with reranking
 **12 MCP Tools** | **Hybrid Search + Cross-Encoder Reranking** | **Markdown-Aware Chunking** | **100% Local, Zero Cloud**
-[What's New](#whats-new-in-v350) | [Supported Formats](#supported-formats) | [Installation](#installation) | [Configuration](#configuration) | [API Reference](#api-reference) | [Architecture](#architecture)
+[What's New](#whats-new-in-v352) | [Supported Formats](#supported-formats) | [Installation](#installation) | [Configuration](#configuration) | [API Reference](#api-reference) | [Architecture](#architecture)
 </div>
 ---
-## What's New in v3.5.0
+## What's New in v3.5.2
 ### GPU-Accelerated Embeddings (Optional)
-ONNX embeddings can now run on NVIDIA GPUs for **5-10x faster indexing**. Opt-in — CPU remains the default.
+ONNX embeddings can run on NVIDIA GPUs for **5-10x faster indexing**. Opt-in — CPU remains the default.
 ```bash
+# NVIDIA GPU (requires CUDA 12.x drivers)
 pip install knowledge-rag[gpu]
+# Also install CUDA 12 runtime libraries (if not using CUDA Toolkit 12.x)
+pip install nvidia-cublas-cu12 nvidia-cudnn-cu12 nvidia-cuda-runtime-cu12
 ```
 ```yaml
 # config.yaml
 models:
   embedding:
-    gpu: true   # Falls back to CPU if CUDA unavailable
+    gpu: true   # Automatic CPU fallback if CUDA is unavailable
 ```
+**How it works:**
+- Sets `CUDAExecutionProvider` as primary, `CPUExecutionProvider` as fallback
+- Auto-discovers CUDA 12 DLLs from pip-installed NVIDIA packages (no manual PATH config)
+- If GPU init fails for any reason, falls back to CPU silently with a `[WARN]` log
+- `gpu: false` (default) forces CPU-only mode — zero CUDA overhead, clean logs
 Ideal for large knowledge bases (1000+ documents) where full rebuilds take minutes on CPU. After the initial index, incremental reindexing (`force: true`) takes seconds regardless.
 ### Recent Highlights
+- **v3.5.2** — CUDA DLL auto-discovery from pip packages, graceful GPU→CPU fallback, explicit CPU provider (no CUDA noise when `gpu: false`), BASE_DIR resolution fix for editable installs
+- **v3.5.1** — Remove Python `<3.13` upper bound — 3.13 and 3.14 now supported
+- **v3.5.0** — Optional GPU acceleration, supported formats table, full README rewrite
 - **v3.4.3** — MCP stdout save/restore fix (v3.4.2 broke JSON-RPC responses)
-- **v3.4.1** — `pip install` auto-detects project dir from venv location, Linux/macOS `install.sh`
 - **v3.4.0** — Persistent model cache, exclude patterns, Jupyter Notebook parser, inotify resilience, MetaTrader support
 See [Changelog](#changelog) for full history.
@@ -319,7 +332,7 @@ flowchart LR
 ### Prerequisites
-- Python 3.11 or 3.12 (**NOT** 3.13+ — onnxruntime incompatibility)
+- Python 3.11+
 - Claude Code CLI
 - ~200MB disk for model cache (auto-downloaded on first run)
 - *Optional:* NVIDIA GPU + CUDA for [accelerated embeddings](#gpu-accelerated-embeddings-optional) (`pip install knowledge-rag[gpu]`)
@@ -950,17 +963,10 @@ knowledge-rag/
 ### Python version mismatch
-ChromaDB depends on onnxruntime which requires Python 3.11 or 3.12. Python 3.13+ is **NOT** supported.
+Requires Python 3.11 or newer.
 ```bash
-# Check version
-python --version
-# Windows: use specific version
-py -3.12 -m venv venv
-# Linux/macOS: use specific version
-python3.12 -m venv venv
+python --version    # Must be 3.11+
 ```
 ### FastEmbed model download fails
@@ -1018,6 +1024,17 @@ With ~200 documents, expect ~300-500MB RAM. The embedding model (~50MB) and rera
 ## Changelog
+### v3.5.2 (2026-04-16)
+- **NEW**: Auto-discovery of CUDA 12 DLLs from pip-installed NVIDIA packages — no manual PATH configuration needed
+- **NEW**: Graceful GPU→CPU fallback with `[WARN]` log when CUDA init fails (missing drivers, wrong version, etc.)
+- **FIX**: Explicit `CPUExecutionProvider` when `gpu: false` — eliminates noisy CUDA probe errors in logs
+- **FIX**: BASE_DIR resolution now correctly prefers directories with `config.yaml` over those with only `config.example.yaml` (fixes editable installs)
+### v3.5.1 (2026-04-16)
+- **FIX**: Removed Python upper bound constraint (`<3.13` → `>=3.11`). Python 3.13 and 3.14 now supported — onnxruntime ships wheels for both.
 ### v3.5.0 (2026-04-16)
 - **NEW**: Optional GPU acceleration for ONNX embeddings — `pip install knowledge-rag[gpu]` + `models.embedding.gpu: true` in config. 5-10x faster indexing on NVIDIA GPUs with automatic CPU fallback.

{knowledge_rag-3.5.0 → knowledge_rag-3.5.2}/README.md RENAMED

@@ -2,10 +2,11 @@
 <div align="center">
-![Version](https://img.shields.io/badge/version-3.5.0-blue.svg)
-![Python](https://img.shields.io/badge/python-3.11%20%7C%203.12-green.svg)
+![Version](https://img.shields.io/badge/version-3.5.2-blue.svg)
+![Python](https://img.shields.io/badge/python-3.11%2B-green.svg)
 ![License](https://img.shields.io/badge/license-MIT-yellow.svg)
 ![Platform](https://img.shields.io/badge/platform-Windows%20%7C%20Linux%20%7C%20macOS-lightgrey.svg)
+![GPU](https://img.shields.io/badge/GPU-NVIDIA%20CUDA-76B900.svg?logo=nvidia)
 [![CI](https://github.com/lyonzin/knowledge-rag/actions/workflows/ci.yml/badge.svg)](https://github.com/lyonzin/knowledge-rag/actions/workflows/ci.yml)
 [![CodeQL](https://github.com/lyonzin/knowledge-rag/actions/workflows/security.yml/badge.svg)](https://github.com/lyonzin/knowledge-rag/actions/workflows/security.yml)
 [![Glama Score](https://glama.ai/mcp/servers/lyonzin/knowledge-rag/badges/score.svg)](https://glama.ai/mcp/servers/lyonzin/knowledge-rag)
@@ -25,35 +26,47 @@ Your documents become instantly searchable inside Claude Code — with reranking
 **12 MCP Tools** | **Hybrid Search + Cross-Encoder Reranking** | **Markdown-Aware Chunking** | **100% Local, Zero Cloud**
-[What's New](#whats-new-in-v350) | [Supported Formats](#supported-formats) | [Installation](#installation) | [Configuration](#configuration) | [API Reference](#api-reference) | [Architecture](#architecture)
+[What's New](#whats-new-in-v352) | [Supported Formats](#supported-formats) | [Installation](#installation) | [Configuration](#configuration) | [API Reference](#api-reference) | [Architecture](#architecture)
 </div>
 ---
-## What's New in v3.5.0
+## What's New in v3.5.2
 ### GPU-Accelerated Embeddings (Optional)
-ONNX embeddings can now run on NVIDIA GPUs for **5-10x faster indexing**. Opt-in — CPU remains the default.
+ONNX embeddings can run on NVIDIA GPUs for **5-10x faster indexing**. Opt-in — CPU remains the default.
 ```bash
+# NVIDIA GPU (requires CUDA 12.x drivers)
 pip install knowledge-rag[gpu]
+# Also install CUDA 12 runtime libraries (if not using CUDA Toolkit 12.x)
+pip install nvidia-cublas-cu12 nvidia-cudnn-cu12 nvidia-cuda-runtime-cu12
 ```
 ```yaml
 # config.yaml
 models:
   embedding:
-    gpu: true   # Falls back to CPU if CUDA unavailable
+    gpu: true   # Automatic CPU fallback if CUDA is unavailable
 ```
+**How it works:**
+- Sets `CUDAExecutionProvider` as primary, `CPUExecutionProvider` as fallback
+- Auto-discovers CUDA 12 DLLs from pip-installed NVIDIA packages (no manual PATH config)
+- If GPU init fails for any reason, falls back to CPU silently with a `[WARN]` log
+- `gpu: false` (default) forces CPU-only mode — zero CUDA overhead, clean logs
 Ideal for large knowledge bases (1000+ documents) where full rebuilds take minutes on CPU. After the initial index, incremental reindexing (`force: true`) takes seconds regardless.
 ### Recent Highlights
+- **v3.5.2** — CUDA DLL auto-discovery from pip packages, graceful GPU→CPU fallback, explicit CPU provider (no CUDA noise when `gpu: false`), BASE_DIR resolution fix for editable installs
+- **v3.5.1** — Remove Python `<3.13` upper bound — 3.13 and 3.14 now supported
+- **v3.5.0** — Optional GPU acceleration, supported formats table, full README rewrite
 - **v3.4.3** — MCP stdout save/restore fix (v3.4.2 broke JSON-RPC responses)
-- **v3.4.1** — `pip install` auto-detects project dir from venv location, Linux/macOS `install.sh`
 - **v3.4.0** — Persistent model cache, exclude patterns, Jupyter Notebook parser, inotify resilience, MetaTrader support
 See [Changelog](#changelog) for full history.
@@ -281,7 +294,7 @@ flowchart LR
 ### Prerequisites
-- Python 3.11 or 3.12 (**NOT** 3.13+ — onnxruntime incompatibility)
+- Python 3.11+
 - Claude Code CLI
 - ~200MB disk for model cache (auto-downloaded on first run)
 - *Optional:* NVIDIA GPU + CUDA for [accelerated embeddings](#gpu-accelerated-embeddings-optional) (`pip install knowledge-rag[gpu]`)
@@ -912,17 +925,10 @@ knowledge-rag/
 ### Python version mismatch
-ChromaDB depends on onnxruntime which requires Python 3.11 or 3.12. Python 3.13+ is **NOT** supported.
+Requires Python 3.11 or newer.
 ```bash
-# Check version
-python --version
-# Windows: use specific version
-py -3.12 -m venv venv
-# Linux/macOS: use specific version
-python3.12 -m venv venv
+python --version    # Must be 3.11+
 ```
 ### FastEmbed model download fails
@@ -980,6 +986,17 @@ With ~200 documents, expect ~300-500MB RAM. The embedding model (~50MB) and rera
 ## Changelog
+### v3.5.2 (2026-04-16)
+- **NEW**: Auto-discovery of CUDA 12 DLLs from pip-installed NVIDIA packages — no manual PATH configuration needed
+- **NEW**: Graceful GPU→CPU fallback with `[WARN]` log when CUDA init fails (missing drivers, wrong version, etc.)
+- **FIX**: Explicit `CPUExecutionProvider` when `gpu: false` — eliminates noisy CUDA probe errors in logs
+- **FIX**: BASE_DIR resolution now correctly prefers directories with `config.yaml` over those with only `config.example.yaml` (fixes editable installs)
+### v3.5.1 (2026-04-16)
+- **FIX**: Removed Python upper bound constraint (`<3.13` → `>=3.11`). Python 3.13 and 3.14 now supported — onnxruntime ships wheels for both.
 ### v3.5.0 (2026-04-16)
 - **NEW**: Optional GPU acceleration for ONNX embeddings — `pip install knowledge-rag[gpu]` + `models.embedding.gpu: true` in config. 5-10x faster indexing on NVIDIA GPUs with automatic CPU fallback.
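
The provider order described in the README's "How it works" list (CUDA first with CPU as fallback, CPU only when `gpu: false`) can be sketched without FastEmbed or onnxruntime installed. This is a minimal illustration, not the package's code: `load_with_cpu_fallback` and the `factory` callable are hypothetical names, and the caught exception types are simply the ones that appear in the `mcp_server/server.py` hunk of this diff.

```python
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def load_with_cpu_fallback(
    factory: Callable[[List[str]], T],
    use_gpu: bool,
) -> Tuple[T, List[str]]:
    """Build a model via factory(providers) and report which providers were used.

    `factory` is a hypothetical stand-in for a model constructor that accepts
    an ONNX Runtime provider list.
    """
    cpu_only = ["CPUExecutionProvider"]
    if not use_gpu:
        # Explicit CPU provider: the CUDA provider is never requested.
        return factory(cpu_only), cpu_only

    gpu_first = ["CUDAExecutionProvider", "CPUExecutionProvider"]
    try:
        return factory(gpu_first), gpu_first
    except (ValueError, RuntimeError) as exc:
        # Same exception types as the diff; retry once on CPU only.
        print(f"[WARN] GPU init failed ({exc}), falling back to CPU...")
        return factory(cpu_only), cpu_only
```

Returning the provider list alongside the model lets a caller log or assert which path was taken, which the original constructor reports only through its `[INFO]` messages.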

{knowledge_rag-3.5.0 → knowledge_rag-3.5.2}/mcp_server/__init__.py RENAMED

@@ -8,7 +8,7 @@ import sys  # noqa: I001
 _original_stdout = sys.stdout
 sys.stdout = sys.stderr
-__version__ = "3.5.0"
+__version__ = "3.5.2"
 __author__ = "Ailton Rocha (Lyon.)"
 from .config import Config  # noqa: E402

{knowledge_rag-3.5.0 → knowledge_rag-3.5.2}/mcp_server/config.py RENAMED

@@ -51,12 +51,17 @@ _venv_dir = _venv_project_dir()
 if os.environ.get("KNOWLEDGE_RAG_DIR"):
     BASE_DIR = Path(os.environ["KNOWLEDGE_RAG_DIR"])
+elif _venv_dir is not None and (_venv_dir / "config.yaml").exists():
+    # Prefer venv parent if it has an actual config.yaml (editable installs, PyPI installs)
+    BASE_DIR = _venv_dir
+elif _is_project_root(_source_dir) and (_source_dir / "config.yaml").exists():
+    BASE_DIR = _source_dir
+elif _is_project_root(Path.cwd()):
+    BASE_DIR = Path.cwd()
 elif _is_project_root(_source_dir):
     BASE_DIR = _source_dir
 elif _is_project_root(_venv_dir):
     BASE_DIR = _venv_dir
-elif _is_project_root(Path.cwd()):
-    BASE_DIR = Path.cwd()
 else:
     BASE_DIR = _venv_dir if _venv_dir is not None else Path.cwd()
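
The new branch order in this hunk can be read as a single precedence function. The sketch below mirrors the 3.5.2 order as shown above; `resolve_base_dir` and `looks_like_root` are hypothetical names. The real `_is_project_root` is not part of this diff, so the stand-in assumes (based on the changelog wording) that a directory with either `config.yaml` or `config.example.yaml` counts as a project root.

```python
from pathlib import Path
from typing import Callable, Optional


def looks_like_root(path: Optional[Path]) -> bool:
    """Assumed stand-in for _is_project_root: either config file qualifies."""
    if path is None:
        return False
    return (path / "config.yaml").exists() or (path / "config.example.yaml").exists()


def resolve_base_dir(
    env_dir: Optional[str],
    venv_dir: Optional[Path],
    source_dir: Path,
    cwd: Path,
    is_project_root: Callable[[Optional[Path]], bool] = looks_like_root,
) -> Path:
    """Apply the 3.5.2 precedence from the diff, first match wins."""
    if env_dir:                                   # KNOWLEDGE_RAG_DIR override
        return Path(env_dir)
    if venv_dir is not None and (venv_dir / "config.yaml").exists():
        return venv_dir                           # real config next to the venv
    if is_project_root(source_dir) and (source_dir / "config.yaml").exists():
        return source_dir                         # real config in the source tree
    if is_project_root(cwd):
        return cwd
    if is_project_root(source_dir):
        return source_dir
    if is_project_root(venv_dir):
        return venv_dir
    return venv_dir if venv_dir is not None else cwd
```

Under this assumption, a source tree that ships only `config.example.yaml` no longer wins over a venv parent that holds the user's actual `config.yaml`, which is the editable-install case the changelog mentions.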

{knowledge_rag-3.5.0 → knowledge_rag-3.5.2}/mcp_server/server.py RENAMED

@@ -19,7 +19,7 @@ Features:
     - CRUD operations via MCP tools (add, update, remove docs)
 Autor:   Lyon (Ailton Rocha)
-Versao:  3.5.0
+Versao:  3.5.2
 Data:    2026-04-16
 """
@@ -138,17 +138,60 @@ class FastEmbedEmbeddings:
     Model: BAAI/bge-small-en-v1.5 (384-dim, MTEB score 62.x)
     """
+    @staticmethod
+    def _setup_cuda_dll_paths():
+        """Add NVIDIA CUDA 12 pip package DLL paths to os.environ['PATH'].
+        When onnxruntime-gpu is installed alongside nvidia-cublas-cu12 etc.,
+        the DLLs live under site-packages/nvidia/*/bin/ and onnxruntime can't
+        find them unless they're on PATH. This is a no-op if the dirs don't exist.
+        """
+        import os
+        import site
+        site_dirs = site.getsitepackages() if hasattr(site, "getsitepackages") else []
+        nvidia_libs = [
+            "nvidia/cublas/bin",
+            "nvidia/cudnn/bin",
+            "nvidia/cuda_runtime/bin",
+            "nvidia/cufft/bin",
+            "nvidia/curand/bin",
+            "nvidia/cusolver/bin",
+            "nvidia/cusparse/bin",
+            "nvidia/nvjitlink/bin",
+            "nvidia/cuda_nvrtc/bin",
+        ]
+        added = []
+        for sp in site_dirs:
+            for lib in nvidia_libs:
+                p = os.path.join(sp, lib)
+                if os.path.isdir(p) and p not in os.environ.get("PATH", ""):
+                    os.environ["PATH"] = p + os.pathsep + os.environ.get("PATH", "")
+                    added.append(lib.split("/")[1])
+        if added:
+            print(f"[INFO] CUDA DLL paths added for: {', '.join(dict.fromkeys(added))}")
     def __init__(self, model: str = None):
         self.model_name = model or config.embedding_model
         self._dim = config.embedding_dim
         kwargs = {"model_name": self.model_name, "cache_dir": str(config.models_cache_dir)}
         if config.gpu_acceleration:
+            self._setup_cuda_dll_paths()
             kwargs["providers"] = ["CUDAExecutionProvider", "CPUExecutionProvider"]
             print(f"[INFO] Loading embedding model: {self.model_name} ({self._dim}D) [GPU accelerated]...")
+            try:
+                self._model = TextEmbedding(**kwargs)
+                print("[INFO] Embedding model loaded successfully [GPU]")
+            except (ValueError, RuntimeError) as e:
+                print(f"[WARN] GPU init failed ({e}), falling back to CPU...")
+                kwargs["providers"] = ["CPUExecutionProvider"]
+                self._model = TextEmbedding(**kwargs)
+                print("[INFO] Embedding model loaded successfully [CPU fallback]")
         else:
+            kwargs["providers"] = ["CPUExecutionProvider"]
             print(f"[INFO] Loading embedding model: {self.model_name} ({self._dim}D)...")
-        self._model = TextEmbedding(**kwargs)
-        print("[INFO] Embedding model loaded successfully")
+            self._model = TextEmbedding(**kwargs)
+            print("[INFO] Embedding model loaded successfully")
     def __call__(self, input: List[str]) -> List[List[float]]:
         """
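
`_setup_cuda_dll_paths` in the hunk above prepends existing `nvidia/*/bin` directories to `PATH` and skips a directory when its string already occurs anywhere in `PATH`. The sketch below is a variant for illustration, not the package's code: `prepend_existing_dirs` is a hypothetical helper that returns a new PATH string instead of mutating `os.environ`, and it compares whole PATH entries rather than substrings, which is an assumption about what a stricter check could look like.

```python
import os
from typing import Iterable, List


def prepend_existing_dirs(candidates: Iterable[str], path_value: str) -> str:
    """Return path_value with each existing candidate directory prepended once."""
    entries: List[str] = [e for e in path_value.split(os.pathsep) if e]
    # Normalise for comparison only; the original spellings are kept in the result.
    known = {os.path.normcase(os.path.normpath(e)) for e in entries}
    for cand in candidates:
        key = os.path.normcase(os.path.normpath(cand))
        if os.path.isdir(cand) and key not in known:
            entries.insert(0, cand)
            known.add(key)
    return os.pathsep.join(entries)
```

A caller that wanted the same side effect as the diff could assign the result back, for example `os.environ["PATH"] = prepend_existing_dirs(dirs, os.environ.get("PATH", ""))`, where `dirs` is whatever candidate list the caller has built.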

{knowledge_rag-3.5.0 → knowledge_rag-3.5.2}/pyproject.toml RENAMED

@@ -4,11 +4,11 @@ build-backend = "hatchling.build"
 [project]
 name = "knowledge-rag"
-version = "3.5.0"
+version = "3.5.2"
 description = "Local RAG System for Claude Code — Hybrid search + Cross-encoder Reranking + 12 MCP Tools. Zero external servers."
 readme = "README.md"
 license = {text = "MIT"}
-requires-python = ">=3.11,<3.13"
+requires-python = ">=3.11"
 authors = [
     {name = "Lyon.", email = "lyonzin@users.noreply.github.com"},
 ]
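
The effect of the `requires-python` change can be checked with plain tuple comparison. This is a simplified sketch: installers evaluate the field as a PEP 440 version specifier, whereas the hypothetical helpers `allowed_before` and `allowed_after` below only compare `(major, minor)` pairs.

```python
from typing import Tuple

Version = Tuple[int, int]


def allowed_before(version: Version) -> bool:
    """3.5.0 metadata: requires-python = ">=3.11,<3.13"."""
    return (3, 11) <= version < (3, 13)


def allowed_after(version: Version) -> bool:
    """3.5.2 metadata: requires-python = ">=3.11"."""
    return version >= (3, 11)


if __name__ == "__main__":
    for candidate in [(3, 10), (3, 11), (3, 12), (3, 13), (3, 14)]:
        print(candidate, allowed_before(candidate), allowed_after(candidate))
```

Only interpreters at 3.13 and above change status between the two releases; 3.10 and older remain excluded in both.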

{knowledge_rag-3.5.0 → knowledge_rag-3.5.2}/.gitignore RENAMED

File without changes

{knowledge_rag-3.5.0 → knowledge_rag-3.5.2}/LICENSE RENAMED

File without changes

{knowledge_rag-3.5.0 → knowledge_rag-3.5.2}/config.example.yaml RENAMED

File without changes

{knowledge_rag-3.5.0 → knowledge_rag-3.5.2}/documents/examples/sample-document.md RENAMED

File without changes

{knowledge_rag-3.5.0 → knowledge_rag-3.5.2}/mcp_server/ingestion.py RENAMED

File without changes

{knowledge_rag-3.5.0 → knowledge_rag-3.5.2}/presets/cybersecurity.yaml RENAMED

File without changes

{knowledge_rag-3.5.0 → knowledge_rag-3.5.2}/presets/developer.yaml RENAMED

File without changes

{knowledge_rag-3.5.0 → knowledge_rag-3.5.2}/presets/general.yaml RENAMED

File without changes

{knowledge_rag-3.5.0 → knowledge_rag-3.5.2}/presets/research.yaml RENAMED

File without changes

{knowledge_rag-3.5.0 → knowledge_rag-3.5.2}/requirements.txt RENAMED

File without changes
