PyPI - knowledge-rag - Versions diffs - 3.8.0__tar.gz → 3.9.0__tar.gz - Mend

knowledge-rag 3.8.0tar.gz → 3.9.0tar.gz


Files changed (20)

{knowledge_rag-3.8.0 → knowledge_rag-3.9.0}/.gitignore RENAMED

@@ -32,8 +32,9 @@ documents/**/*.csv
 models_cache/
 *.onnx
-# Local scripts (not part of distribution)
-scripts/
+# Local one-off scripts (not part of distribution)
+# Note: tracked utility scripts under scripts/ (e.g. check_version_sync.py) are
+# committed; this rule only catches ad-hoc local files outside that directory.
 setup-notebook.ps1
 demo-real.yml
 documents/.sync-log.txt

{knowledge_rag-3.8.0 → knowledge_rag-3.9.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: knowledge-rag
-Version: 3.8.0
+Version: 3.9.0
 Summary: Local RAG System for Claude Code — Hybrid search + Cross-encoder Reranking + 12 MCP Tools + 20 Format Parsers. Zero external servers.
 Project-URL: Homepage, https://github.com/lyonzin/knowledge-rag
 Project-URL: Repository, https://github.com/lyonzin/knowledge-rag
@@ -49,6 +49,7 @@ Description-Content-Type: text/markdown
 ![GPU](https://img.shields.io/badge/GPU-NVIDIA%20CUDA-76B900.svg?logo=nvidia)
 [![CI](https://github.com/lyonzin/knowledge-rag/actions/workflows/ci.yml/badge.svg)](https://github.com/lyonzin/knowledge-rag/actions/workflows/ci.yml)
 [![CodeQL](https://github.com/lyonzin/knowledge-rag/actions/workflows/security.yml/badge.svg)](https://github.com/lyonzin/knowledge-rag/actions/workflows/security.yml)
+[![Quality Gate](https://github.com/lyonzin/knowledge-rag/actions/workflows/quality-gate.yml/badge.svg)](https://github.com/lyonzin/knowledge-rag/actions/workflows/quality-gate.yml)
 [![Glama Score](https://glama.ai/mcp/servers/lyonzin/knowledge-rag/badges/score.svg)](https://glama.ai/mcp/servers/lyonzin/knowledge-rag)
 ### Your docs, your machine, zero cloud. Claude Code searches them natively.
@@ -65,19 +66,41 @@ pip install knowledge-rag → restart Claude Code → search_knowledge("your que
 **12 MCP Tools** | **Hybrid Search + Reranking** | **20 File Formats** | **Optional NVIDIA GPU** | **100% Local**
-[What's New](#whats-new-in-v360) | [Supported Formats](#supported-formats) | [Installation](#installation) | [Configuration](#configuration) | [API Reference](#api-reference) | [Architecture](#architecture)
+[What's New](#whats-new-in-v390) | [Supported Formats](#supported-formats) | [Installation](#installation) | [Configuration](#configuration) | [API Reference](#api-reference) | [Architecture](#architecture)
 </div>
 ---
-## What's New in v3.8.0
+## What's New in v3.9.0
-### Lazy-Loaded Embeddings — Cheaper Idle Processes
+### Quality Gate — 7-Pillar PR Validation
+knowledge-rag is now used daily by 70+ enterprise teams. Every PR (including dependabot bumps and one-line fixes) is now evaluated against **35+ automated checks** spread across 7 pillars before any human review:
+| Pillar | What it enforces | Tools |
+|---|---|---|
+| **1 Security** | SAST, secrets, CVEs, supply chain | bandit, semgrep, gitleaks, pip-audit, dependency-review, Snyk, CodeQL, Socket |
+| **2 Stability** | Flake detection, coverage trend, test count, deterministic runs | pytest-rerunfailures, codecov ±0.5pp, test-count guard |
+| **3 Memory Leak** | RSS bounded under 1000-query load, no idle bloat | psutil-based baseline tests + nightly 50K-iteration soak |
+| **4 Versatility** | 9 OS×Python combos, 14 format parsers, 4 config presets, locale tolerance, property-based fuzzing | matrix CI on Linux+Windows+macOS × 3.11+3.12+3.13, Hypothesis |
+| **5 Scalability** | Performance regression > 10% blocks merge, public bench dashboard | pytest-benchmark, GH Pages chart |
+| **6 Versioning** | Atomic version sync, API surface diff, conventional commits, CHANGELOG enforcement, backwards compat | griffe-style AST diff, custom guards |
+| **7 Quality** | Type strictness, docstring coverage, complexity, dead code | mypy strict, interrogate ≥80%, radon, vulture |
+Plus a **nightly resilience workflow** that runs chaos failure-injection (HF down, ChromaDB corruption, watchdog crash, ONNX zero-byte replay), determinism check (full suite × 3), and mutation testing on selected modules.
+Read the full philosophy in [CONTRIBUTING.md](CONTRIBUTING.md). Report bugs via [SECURITY.md](SECURITY.md) or the [issue templates](.github/ISSUE_TEMPLATE/).
+### Critical Hotfix — No More Silent Zero-Vector Corruption (v3.8.1)
+`FastEmbedEmbeddings.__call__` no longer swallows exceptions and returns `[[0.0]*dim, ...]` when the ONNX model fails to load. That bug pre-existed in master but was silent: ChromaDB happily stored zero embeddings, `count()` reported normal numbers, smart-reindex skipped them as "already indexed", and queries returned garbage similarity with no error visible. Now raises `EmbeddingModelLoadError` / `EmbeddingError` loudly. **All v3.8.0 users should upgrade.** Full details in [Changelog](#v381-2026-05-10--hotfix).
+### Lazy-Loaded Embeddings — Cheaper Idle Processes (v3.8.0)
 The FastEmbed ONNX model (~200MB resident) now loads on the **first query**, not at startup. Idle `knowledge-rag` processes are now genuinely cheap. Why this matters: MCP stdio is one-process-per-client by protocol — multiple Claude Code windows, Claude Desktop + IDE simultaneously, or review/approval flows that open extra connections all spawn their own processes. Before v3.8.0, every one of them paid the full embedding-model cost up front. Now only processes that actually serve queries load the model. Public API is unchanged.
-### Opt-In Single-Instance Guard
+### Opt-In Single-Instance Guard (v3.8.0)
 For users who measured their setup and want a hard cap of one server per `data_dir`:
@@ -101,6 +124,8 @@ All methods produce the same MCP server. See [Installation](#installation) for f
 ### Recent Highlights
+- **v3.9.0** — **Quality Gate** activated: 35+ automated PR checks across 7 pillars (Security, Stability, Memory Leak, Versatility, Scalability, Versioning, Quality) + nightly resilience suite (chaos, soak, determinism, mutation)
+- **v3.8.1** — Critical hotfix: loud-fail embeddings (no more silent zero-vector corruption); Windows CI flake eradicated (HF_HUB_OFFLINE + shell:bash + atexit wrapper)
 - **v3.8.0** — Lazy-load embeddings, opt-in single-instance guard, version sync across PyPI/NPM/Docker
 - **v3.6.0** — Multi-language code parsing (C/C++/JS/TS/XML), NPM wrapper, Docker image, automated release pipeline
 - **v3.5.2** — CUDA DLL auto-discovery from pip packages, graceful GPU→CPU fallback, explicit CPU provider (no CUDA noise when `gpu: false`), BASE_DIR resolution fix for editable installs
@@ -1115,6 +1140,36 @@ A second instance exits immediately with code 75. Default is OFF (multi-client f
 ## Changelog
+### v3.9.0 (2026-05-10) — Quality Gate
+**Major governance + CI hardening release. No runtime behavior change in `mcp_server/`. Public API surface unchanged from v3.8.1.**
+- **NEW** Quality Gate workflow (`.github/workflows/quality-gate.yml`) enforcing the 7 pillars on every PR: Security, Stability, Memory Leak, Versatility, Scalability, Versioning, Quality. 35+ status checks total.
+- **NEW** Nightly resilience workflow (`.github/workflows/nightly.yml`): chaos suite (failure injection), 1h soak test (50K-iteration loop), determinism check (full suite × 3), mutation testing (mutmut). Auto-opens GitHub issue on any nightly failure.
+- **NEW** Performance benchmark suite under `bench/` (12 microbenchmarks, pytest-benchmark) with 10% regression gate on every PR.
+- **NEW** Public performance dashboard via GitHub Pages (`.github/workflows/bench-pages.yml`) — chart of latency/throughput per commit. Dormant until repo Pages is enabled.
+- **NEW** Property-based fuzzing of all parsers via Hypothesis (`tests/test_ingestion_property.py`) — 200 random examples per CI run.
+- **NEW** Memory baseline regression tests (`tests/test_memory_baseline.py`, cross-platform via psutil) — RSS bounded under 1000 queries; nightly soak amplifies to 50K iterations.
+- **NEW** Property/locale/format/preset matrices (`tests/test_presets.py`, `tests/test_locale.py`, `tests/test_format_smoke.py`).
+- **NEW** Backwards-compatibility regression tests (`tests/test_backwards_compat.py`) — legacy YAML configs from v3.6.0 / v3.7.0 still parse; all 12 MCP tool parameter names frozen.
+- **NEW** AST-based public API surface diff (`scripts/check_api_surface.py`) — any breaking change blocks merge, baseline at `.github/api-surface-baseline.json`.
+- **NEW** CHANGELOG enforcement (`scripts/check_changelog.py`) — user-facing PRs must add a bullet under `## Unreleased`; bypass via `skip-changelog` label.
+- **NEW** Test count anti-regression (`scripts/check_test_count.py`) — guards against silent test deletion.
+- **NEW** Conventional commits required on every PR title (commitlint via `amannn/action-semantic-pull-request`).
+- **NEW** mypy `--strict` rolling out per-module (currently `instance_lock.py` + `preflight.py` + `scripts/`); interrogate docstring coverage ≥ 80%; radon, vulture, PR-size guard report-only.
+- **NEW** CI matrix expanded to 9 cells: Linux + Windows + **macOS** × 3.11 + 3.12 + **3.13** (all required at v3.9.0; macOS / 3.13 promoted from experimental after two clean cycles).
+- **NEW** Governance docs: `CONTRIBUTING.md`, `CODE_OF_CONDUCT.md`, `SECURITY.md`, `.github/PULL_REQUEST_TEMPLATE.md`, 3 issue templates, expanded `CODEOWNERS`.
+- **NEW** Pre-commit hooks: ruff, gitleaks, version-sync, conventional commits.
+- **CHORE** `.github/codecov.yml` enforcing coverage trend gate (-0.5pp blocks; new code ≥ 70%).
+### v3.8.1 (2026-05-10) — hotfix
+- **FIX (critical)**: `FastEmbedEmbeddings.__call__` no longer returns vectors of zeros when the ONNX model fails to load or `embed()` raises. The previous behavior silently corrupted the index — ChromaDB stored zero embeddings, `count()` reported normal numbers, smart-reindex skipped the bad chunks, and queries returned garbage scores with no error visible. Now raises `EmbeddingModelLoadError` / `EmbeddingError`. (#36)
+- **FIX**: Sticky `_load_failed` flag — after a load failure, subsequent calls re-raise immediately instead of looping through HuggingFace download attempts (was the "frozen query" UX in v3.8.0).
+- **NEW**: Sanity checks in `__call__` — embed count and dim mismatches raise `EmbeddingError` instead of silently returning malformed vectors.
+- **TEST**: 7 new regression cases in `tests/test_lazy_embeddings.py`, including `test_does_not_return_zero_vectors_silently` as a guard for the whole class of bug.
+- **NOTE**: This is a pre-existing bug in master, not introduced by v3.8.0. v3.8.0 lazy-load expanded the impact (failures moved to query time). All v3.8.0 users should upgrade.
 ### v3.8.0 (2026-05-10)
 - **NEW**: Lazy-load FastEmbed embedding model (~200MB ONNX runtime). Loads on first query instead of startup — idle `knowledge-rag` processes are now cheap, which matters when MCP stdio clients spawn parallel server processes (multiple Claude Code windows, Claude Desktop + IDE, etc.). Public API unchanged. (#32)
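
The v3.9.0 changelog mentions an AST-based public API surface diff that blocks breaking changes. The package's own `scripts/check_api_surface.py` is not part of this diff, so the following is only a minimal sketch of the general idea using the standard-library `ast` module. The helper names `public_surface` and `breaking_changes` are hypothetical, and the `search_knowledge` signatures in the usage note are illustrative, not the real tool signature.

```python
import ast


def public_surface(source: str) -> dict[str, list[str]]:
    """Map each public top-level function name to its parameter names."""
    surface: dict[str, list[str]] = {}
    for node in ast.parse(source).body:
        is_function = isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        if is_function and not node.name.startswith("_"):
            args = node.args
            params = args.posonlyargs + args.args + args.kwonlyargs
            surface[node.name] = [p.arg for p in params]
    return surface


def breaking_changes(old: dict[str, list[str]], new: dict[str, list[str]]) -> list[str]:
    """Report removed functions and removed or renamed parameters."""
    problems: list[str] = []
    for name, old_params in old.items():
        if name not in new:
            problems.append(f"removed: {name}")
            continue
        missing = [p for p in old_params if p not in new[name]]
        if missing:
            problems.append(f"{name}: parameters removed or renamed: {missing}")
    return problems


# Illustrative sources only; not the package's real code.
OLD = "def search_knowledge(query, hybrid_alpha=0.5):\n    pass\n\ndef _helper():\n    pass\n"
NEW = "def search_knowledge(q, hybrid_alpha=0.5):\n    pass\n"
```

Calling `breaking_changes(public_surface(OLD), public_surface(NEW))` flags the renamed `query` parameter while ignoring the removed private helper. A real gate would presumably also cover classes, defaults, and return annotations, and would compare against a stored baseline file rather than two strings.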

{knowledge_rag-3.8.0 → knowledge_rag-3.9.0}/README.md RENAMED

@@ -11,6 +11,7 @@
 ![GPU](https://img.shields.io/badge/GPU-NVIDIA%20CUDA-76B900.svg?logo=nvidia)
 [![CI](https://github.com/lyonzin/knowledge-rag/actions/workflows/ci.yml/badge.svg)](https://github.com/lyonzin/knowledge-rag/actions/workflows/ci.yml)
 [![CodeQL](https://github.com/lyonzin/knowledge-rag/actions/workflows/security.yml/badge.svg)](https://github.com/lyonzin/knowledge-rag/actions/workflows/security.yml)
+[![Quality Gate](https://github.com/lyonzin/knowledge-rag/actions/workflows/quality-gate.yml/badge.svg)](https://github.com/lyonzin/knowledge-rag/actions/workflows/quality-gate.yml)
 [![Glama Score](https://glama.ai/mcp/servers/lyonzin/knowledge-rag/badges/score.svg)](https://glama.ai/mcp/servers/lyonzin/knowledge-rag)
 ### Your docs, your machine, zero cloud. Claude Code searches them natively.
@@ -27,19 +28,41 @@ pip install knowledge-rag → restart Claude Code → search_knowledge("your que
 **12 MCP Tools** | **Hybrid Search + Reranking** | **20 File Formats** | **Optional NVIDIA GPU** | **100% Local**
-[What's New](#whats-new-in-v360) | [Supported Formats](#supported-formats) | [Installation](#installation) | [Configuration](#configuration) | [API Reference](#api-reference) | [Architecture](#architecture)
+[What's New](#whats-new-in-v390) | [Supported Formats](#supported-formats) | [Installation](#installation) | [Configuration](#configuration) | [API Reference](#api-reference) | [Architecture](#architecture)
 </div>
 ---
-## What's New in v3.8.0
+## What's New in v3.9.0
-### Lazy-Loaded Embeddings — Cheaper Idle Processes
+### Quality Gate — 7-Pillar PR Validation
+knowledge-rag is now used daily by 70+ enterprise teams. Every PR (including dependabot bumps and one-line fixes) is now evaluated against **35+ automated checks** spread across 7 pillars before any human review:
+| Pillar | What it enforces | Tools |
+|---|---|---|
+| **1 Security** | SAST, secrets, CVEs, supply chain | bandit, semgrep, gitleaks, pip-audit, dependency-review, Snyk, CodeQL, Socket |
+| **2 Stability** | Flake detection, coverage trend, test count, deterministic runs | pytest-rerunfailures, codecov ±0.5pp, test-count guard |
+| **3 Memory Leak** | RSS bounded under 1000-query load, no idle bloat | psutil-based baseline tests + nightly 50K-iteration soak |
+| **4 Versatility** | 9 OS×Python combos, 14 format parsers, 4 config presets, locale tolerance, property-based fuzzing | matrix CI on Linux+Windows+macOS × 3.11+3.12+3.13, Hypothesis |
+| **5 Scalability** | Performance regression > 10% blocks merge, public bench dashboard | pytest-benchmark, GH Pages chart |
+| **6 Versioning** | Atomic version sync, API surface diff, conventional commits, CHANGELOG enforcement, backwards compat | griffe-style AST diff, custom guards |
+| **7 Quality** | Type strictness, docstring coverage, complexity, dead code | mypy strict, interrogate ≥80%, radon, vulture |
+Plus a **nightly resilience workflow** that runs chaos failure-injection (HF down, ChromaDB corruption, watchdog crash, ONNX zero-byte replay), determinism check (full suite × 3), and mutation testing on selected modules.
+Read the full philosophy in [CONTRIBUTING.md](CONTRIBUTING.md). Report bugs via [SECURITY.md](SECURITY.md) or the [issue templates](.github/ISSUE_TEMPLATE/).
+### Critical Hotfix — No More Silent Zero-Vector Corruption (v3.8.1)
+`FastEmbedEmbeddings.__call__` no longer swallows exceptions and returns `[[0.0]*dim, ...]` when the ONNX model fails to load. That bug pre-existed in master but was silent: ChromaDB happily stored zero embeddings, `count()` reported normal numbers, smart-reindex skipped them as "already indexed", and queries returned garbage similarity with no error visible. Now raises `EmbeddingModelLoadError` / `EmbeddingError` loudly. **All v3.8.0 users should upgrade.** Full details in [Changelog](#v381-2026-05-10--hotfix).
+### Lazy-Loaded Embeddings — Cheaper Idle Processes (v3.8.0)
 The FastEmbed ONNX model (~200MB resident) now loads on the **first query**, not at startup. Idle `knowledge-rag` processes are now genuinely cheap. Why this matters: MCP stdio is one-process-per-client by protocol — multiple Claude Code windows, Claude Desktop + IDE simultaneously, or review/approval flows that open extra connections all spawn their own processes. Before v3.8.0, every one of them paid the full embedding-model cost up front. Now only processes that actually serve queries load the model. Public API is unchanged.
-### Opt-In Single-Instance Guard
+### Opt-In Single-Instance Guard (v3.8.0)
 For users who measured their setup and want a hard cap of one server per `data_dir`:
@@ -63,6 +86,8 @@ All methods produce the same MCP server. See [Installation](#installation) for f
 ### Recent Highlights
+- **v3.9.0** — **Quality Gate** activated: 35+ automated PR checks across 7 pillars (Security, Stability, Memory Leak, Versatility, Scalability, Versioning, Quality) + nightly resilience suite (chaos, soak, determinism, mutation)
+- **v3.8.1** — Critical hotfix: loud-fail embeddings (no more silent zero-vector corruption); Windows CI flake eradicated (HF_HUB_OFFLINE + shell:bash + atexit wrapper)
 - **v3.8.0** — Lazy-load embeddings, opt-in single-instance guard, version sync across PyPI/NPM/Docker
 - **v3.6.0** — Multi-language code parsing (C/C++/JS/TS/XML), NPM wrapper, Docker image, automated release pipeline
 - **v3.5.2** — CUDA DLL auto-discovery from pip packages, graceful GPU→CPU fallback, explicit CPU provider (no CUDA noise when `gpu: false`), BASE_DIR resolution fix for editable installs
@@ -1077,6 +1102,36 @@ A second instance exits immediately with code 75. Default is OFF (multi-client f
 ## Changelog
+### v3.9.0 (2026-05-10) — Quality Gate
+**Major governance + CI hardening release. No runtime behavior change in `mcp_server/`. Public API surface unchanged from v3.8.1.**
+- **NEW** Quality Gate workflow (`.github/workflows/quality-gate.yml`) enforcing the 7 pillars on every PR: Security, Stability, Memory Leak, Versatility, Scalability, Versioning, Quality. 35+ status checks total.
+- **NEW** Nightly resilience workflow (`.github/workflows/nightly.yml`): chaos suite (failure injection), 1h soak test (50K-iteration loop), determinism check (full suite × 3), mutation testing (mutmut). Auto-opens GitHub issue on any nightly failure.
+- **NEW** Performance benchmark suite under `bench/` (12 microbenchmarks, pytest-benchmark) with 10% regression gate on every PR.
+- **NEW** Public performance dashboard via GitHub Pages (`.github/workflows/bench-pages.yml`) — chart of latency/throughput per commit. Dormant until repo Pages is enabled.
+- **NEW** Property-based fuzzing of all parsers via Hypothesis (`tests/test_ingestion_property.py`) — 200 random examples per CI run.
+- **NEW** Memory baseline regression tests (`tests/test_memory_baseline.py`, cross-platform via psutil) — RSS bounded under 1000 queries; nightly soak amplifies to 50K iterations.
+- **NEW** Property/locale/format/preset matrices (`tests/test_presets.py`, `tests/test_locale.py`, `tests/test_format_smoke.py`).
+- **NEW** Backwards-compatibility regression tests (`tests/test_backwards_compat.py`) — legacy YAML configs from v3.6.0 / v3.7.0 still parse; all 12 MCP tool parameter names frozen.
+- **NEW** AST-based public API surface diff (`scripts/check_api_surface.py`) — any breaking change blocks merge, baseline at `.github/api-surface-baseline.json`.
+- **NEW** CHANGELOG enforcement (`scripts/check_changelog.py`) — user-facing PRs must add a bullet under `## Unreleased`; bypass via `skip-changelog` label.
+- **NEW** Test count anti-regression (`scripts/check_test_count.py`) — guards against silent test deletion.
+- **NEW** Conventional commits required on every PR title (commitlint via `amannn/action-semantic-pull-request`).
+- **NEW** mypy `--strict` rolling out per-module (currently `instance_lock.py` + `preflight.py` + `scripts/`); interrogate docstring coverage ≥ 80%; radon, vulture, PR-size guard report-only.
+- **NEW** CI matrix expanded to 9 cells: Linux + Windows + **macOS** × 3.11 + 3.12 + **3.13** (all required at v3.9.0; macOS / 3.13 promoted from experimental after two clean cycles).
+- **NEW** Governance docs: `CONTRIBUTING.md`, `CODE_OF_CONDUCT.md`, `SECURITY.md`, `.github/PULL_REQUEST_TEMPLATE.md`, 3 issue templates, expanded `CODEOWNERS`.
+- **NEW** Pre-commit hooks: ruff, gitleaks, version-sync, conventional commits.
+- **CHORE** `.github/codecov.yml` enforcing coverage trend gate (-0.5pp blocks; new code ≥ 70%).
+### v3.8.1 (2026-05-10) — hotfix
+- **FIX (critical)**: `FastEmbedEmbeddings.__call__` no longer returns vectors of zeros when the ONNX model fails to load or `embed()` raises. The previous behavior silently corrupted the index — ChromaDB stored zero embeddings, `count()` reported normal numbers, smart-reindex skipped the bad chunks, and queries returned garbage scores with no error visible. Now raises `EmbeddingModelLoadError` / `EmbeddingError`. (#36)
+- **FIX**: Sticky `_load_failed` flag — after a load failure, subsequent calls re-raise immediately instead of looping through HuggingFace download attempts (was the "frozen query" UX in v3.8.0).
+- **NEW**: Sanity checks in `__call__` — embed count and dim mismatches raise `EmbeddingError` instead of silently returning malformed vectors.
+- **TEST**: 7 new regression cases in `tests/test_lazy_embeddings.py`, including `test_does_not_return_zero_vectors_silently` as a guard for the whole class of bug.
+- **NOTE**: This is a pre-existing bug in master, not introduced by v3.8.0. v3.8.0 lazy-load expanded the impact (failures moved to query time). All v3.8.0 users should upgrade.
 ### v3.8.0 (2026-05-10)
 - **NEW**: Lazy-load FastEmbed embedding model (~200MB ONNX runtime). Loads on first query instead of startup — idle `knowledge-rag` processes are now cheap, which matters when MCP stdio clients spawn parallel server processes (multiple Claude Code windows, Claude Desktop + IDE, etc.). Public API unchanged. (#32)
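
The v3.8.1 notes explain that a failed model load used to store all-zero vectors while `count()` still looked normal. As a rough illustration of how such entries could be spotted, here is a sketch that scans plain Python lists. It assumes the ids and embeddings have already been exported from the index by some other means; it does not call any ChromaDB API, and `find_zero_vectors` is a hypothetical helper name.

```python
from typing import Sequence


def find_zero_vectors(
    ids: Sequence[str], embeddings: Sequence[Sequence[float]]
) -> list[str]:
    """Return the ids whose embedding consists only of zeros."""
    # any() is False when every component is 0.0, which is exactly the
    # shape the old fallback produced: [0.0] * dim.
    return [doc_id for doc_id, vec in zip(ids, embeddings) if not any(vec)]


# Illustrative data: two corrupted chunks and one healthy chunk.
sample_ids = ["chunk-a", "chunk-b", "chunk-c"]
sample_embeddings = [[0.0, 0.0, 0.0], [0.12, -0.4, 0.9], [0.0, 0.0, 0.0]]
suspect = find_zero_vectors(sample_ids, sample_embeddings)
```

Any id reported this way carries no semantic information, because a zero vector has no direction to compare against a query. Under the assumption that smart-reindex still treats those chunks as already indexed, they would need to be re-embedded explicitly after upgrading.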

{knowledge_rag-3.8.0 → knowledge_rag-3.9.0}/mcp_server/__init__.py RENAMED

@@ -8,7 +8,7 @@ import sys  # noqa: I001
 _original_stdout = sys.stdout
 sys.stdout = sys.stderr
-__version__ = "3.8.0"
+__version__ = "3.9.0"
 __author__ = "Ailton Rocha (Lyon.)"
 from .config import Config  # noqa: E402

{knowledge_rag-3.8.0 → knowledge_rag-3.9.0}/mcp_server/instance_lock.py RENAMED

@@ -155,7 +155,7 @@ def single_instance_lock() -> Iterator[Optional[Path]]:
     # Wire signal handlers so SIGINT/SIGTERM cleanup the lock before exit
     previous_handlers: dict[int, object] = {}
-    def _signal_cleanup(signum: int, frame) -> None:
+    def _signal_cleanup(signum: int, frame: object) -> None:
         _remove_if_ours(lock_path)
         # Restore original handler and re-raise so default action runs
         prev = previous_handlers.get(signum, signal.SIG_DFL)
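
Only a fragment of the signal wiring is visible in this hunk. The comments describe a common pattern: run cleanup, restore the handler that was installed before, then re-raise the signal so the original action still happens. A minimal standalone sketch of that pattern follows. The function name `install_cleanup_handlers` and its parameters are hypothetical, and `signal.signal` must be called from the main thread.

```python
import signal
from types import FrameType
from typing import Callable, Iterable, Optional


def install_cleanup_handlers(
    cleanup: Callable[[], None],
    signums: Iterable[int] = (signal.SIGINT, signal.SIGTERM),
) -> dict[int, object]:
    """Run cleanup on each signal, then hand off to the previous handler."""
    previous: dict[int, object] = {}

    def _handler(signum: int, frame: Optional[FrameType]) -> None:
        cleanup()
        # signal.signal() returns None for handlers not installed from
        # Python, so fall back to the default action in that case.
        prev = previous.get(signum)
        signal.signal(signum, prev if prev is not None else signal.SIG_DFL)
        # Re-raise so the restored handler (or the default action) runs.
        signal.raise_signal(signum)

    for signum in signums:
        previous[signum] = signal.signal(signum, _handler)
    return previous
```

With the default handler restored, re-raising `SIGTERM` terminates the process as usual, only after `cleanup` has had a chance to remove the lock file. `signal.raise_signal` requires Python 3.8 or newer, which fits the `py311` target declared in `pyproject.toml`.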

{knowledge_rag-3.8.0 → knowledge_rag-3.9.0}/mcp_server/server.py RENAMED

@@ -129,6 +129,18 @@ class QueryCache:
 # =============================================================================
+class EmbeddingError(RuntimeError):
+    """Raised when embedding generation fails after a successful model load."""
+class EmbeddingModelLoadError(RuntimeError):
+    """Raised when the embedding model itself cannot be loaded.
+    Distinct from EmbeddingError so callers can decide whether to retry
+    (transient runtime failure) or surface a hard configuration problem.
+    """
 class FastEmbedEmbeddings:
     """
     FastEmbed-based embedding function for ChromaDB (v1.4.0+ compatible).
@@ -190,33 +202,57 @@ class FastEmbedEmbeddings:
         self._gpu = bool(config.gpu_acceleration)
         self._model: Optional[TextEmbedding] = None
         self._load_lock = threading.Lock()
+        # Sticky failure flag: once load fails, subsequent calls re-raise immediately
+        # instead of looping through download/retry. Same pattern as CrossEncoderReranker.
+        self._load_failed: Optional[Exception] = None
     def _load_model(self) -> None:
-        """Load the ONNX model on demand. Idempotent and thread-safe."""
+        """Load the ONNX model on demand. Idempotent and thread-safe.
+        Raises:
+            EmbeddingModelLoadError: when the underlying ONNX runtime cannot
+                instantiate the model (missing files, hash mismatch, etc.). The
+                exception is sticky — subsequent calls raise the same error
+                without retrying so callers do not loop through HF downloads.
+        """
         if self._model is not None:
             return
+        if self._load_failed is not None:
+            raise EmbeddingModelLoadError(
+                f"Embedding model previously failed to load: {self._load_failed}"
+            ) from self._load_failed
         with self._load_lock:
             if self._model is not None:  # double-checked under the lock
                 return
+            if self._load_failed is not None:
+                raise EmbeddingModelLoadError(
+                    f"Embedding model previously failed to load: {self._load_failed}"
+                ) from self._load_failed
             kwargs = dict(self._init_kwargs)
-            if self._gpu:
-                self._setup_cuda_dll_paths()
-                kwargs["providers"] = ["CUDAExecutionProvider", "CPUExecutionProvider"]
-                print(f"[INFO] Loading embedding model: {self.model_name} ({self._dim}D) [GPU accelerated]...")
-                try:
-                    self._model = TextEmbedding(**kwargs)
-                    print("[INFO] Embedding model loaded successfully [GPU]")
-                    return
-                except (ValueError, RuntimeError) as e:
-                    print(f"[WARN] GPU init failed ({e}), falling back to CPU...")
+            try:
+                if self._gpu:
+                    self._setup_cuda_dll_paths()
+                    kwargs["providers"] = ["CUDAExecutionProvider", "CPUExecutionProvider"]
+                    print(f"[INFO] Loading embedding model: {self.model_name} ({self._dim}D) [GPU accelerated]...")
+                    try:
+                        self._model = TextEmbedding(**kwargs)
+                        print("[INFO] Embedding model loaded successfully [GPU]")
+                    except (ValueError, RuntimeError) as e:
+                        print(f"[WARN] GPU init failed ({e}), falling back to CPU...")
+                        kwargs["providers"] = ["CPUExecutionProvider"]
+                        self._model = TextEmbedding(**kwargs)
+                        print("[INFO] Embedding model loaded successfully [CPU fallback]")
+                else:
                     kwargs["providers"] = ["CPUExecutionProvider"]
+                    print(f"[INFO] Loading embedding model: {self.model_name} ({self._dim}D)...")
                     self._model = TextEmbedding(**kwargs)
-                    print("[INFO] Embedding model loaded successfully [CPU fallback]")
-                    return
-            kwargs["providers"] = ["CPUExecutionProvider"]
-            print(f"[INFO] Loading embedding model: {self.model_name} ({self._dim}D)...")
-            self._model = TextEmbedding(**kwargs)
-            print("[INFO] Embedding model loaded successfully")
+                    print("[INFO] Embedding model loaded successfully")
+            except Exception as exc:
+                # ONNXRuntimeError, FileNotFoundError, etc. — record and re-raise loud
+                self._load_failed = exc
+                self._model = None
+                print(f"[ERROR] Embedding model load FAILED: {exc}", file=sys.stderr)
+                raise EmbeddingModelLoadError(f"Failed to load embedding model: {exc}") from exc
     def __call__(self, input: List[str]) -> List[List[float]]:
         """
@@ -224,17 +260,38 @@ class FastEmbedEmbeddings:
         ChromaDB embedding_function interface: __call__(input: List[str]) -> List[List[float]]
         FastEmbed.embed() returns a generator, so we consume it into a list.
+        Raises:
+            EmbeddingModelLoadError: when the model could not be loaded.
+            EmbeddingError: when embedding generation fails after a successful load.
+        Behavior note (changed in v3.8.1):
+            Previously this method swallowed any exception and returned vectors
+            of zeros (``[[0.0]*dim for _ in input]``). That silently corrupted
+            the index — ChromaDB stored zero vectors as document embeddings,
+            ``count()`` returned the right number of chunks, smart-reindex
+            would skip them as "already indexed", and queries returned garbage
+            similarity scores. Failures are now LOUD: the caller (ChromaDB
+            ``add()``, MCP search tool, etc.) sees the real error and can
+            surface it to the user.
         """
         if not input:
             return []
-        self._load_model()
+        self._load_model()  # may raise EmbeddingModelLoadError
         try:
             embeddings = list(self._model.embed(input))
-            return [emb.tolist() for emb in embeddings]
-        except Exception as e:
-            print(f"[WARN] Embedding failed: {e}")
-            return [[0.0] * self._dim for _ in input]
+        except Exception as exc:
+            print(f"[ERROR] Embedding generation FAILED: {exc}", file=sys.stderr)
+            raise EmbeddingError(f"Embedding generation failed: {exc}") from exc
+        # Sanity check: model returned the right number of vectors with the right dim
+        if len(embeddings) != len(input):
+            raise EmbeddingError(f"Embedding count mismatch: expected {len(input)}, got {len(embeddings)}")
+        result = [emb.tolist() for emb in embeddings]
+        if result and len(result[0]) != self._dim:
+            raise EmbeddingError(f"Embedding dim mismatch: expected {self._dim}, got {len(result[0])}")
+        return result
     def name(self) -> str:
         """Return embedding function name (required by ChromaDB v1.4.0+)"""

{knowledge_rag-3.8.0 → knowledge_rag-3.9.0}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 [project]
 name = "knowledge-rag"
-version = "3.8.0"
+version = "3.9.0"
 description = "Local RAG System for Claude Code — Hybrid search + Cross-encoder Reranking + 12 MCP Tools + 20 Format Parsers. Zero external servers."
 readme = "README.md"
 license = {text = "MIT"}
@@ -100,6 +100,12 @@ pythonpath = ["."]
 # many tmp dirs accumulate). Tests run isolated; we don't need history.
 tmp_path_retention_count = 1
 tmp_path_retention_policy = "failed"
+markers = [
+    "chaos: failure-injection tests run in nightly workflow only",
+]
+# Default: skip chaos tests in regular pytest runs (PR + local).
+# Nightly workflow opts in with `pytest -m chaos`.
+addopts = "-m 'not chaos'"
 [tool.ruff]
 target-version = "py311"
@@ -118,3 +124,36 @@ source = ["mcp_server"]
 [tool.coverage.report]
 show_missing = true
 fail_under = 35
+# ── mypy: strict gradual rollout (Pillar 7) ─────────────────────────────────
+# Strict mode is enabled GLOBALLY but with a per-module exclusion list while
+# we incrementally annotate the legacy modules. The CI job runs strict on the
+# allowlist below; new modules are added as they earn full annotations.
+[tool.mypy]
+python_version = "3.11"
+strict = true
+show_error_codes = true
+warn_unused_configs = true
+warn_unreachable = true
+ignore_missing_imports = true   # third-party libs (chromadb, fastembed, etc.) often lack stubs
+# ── interrogate: docstring coverage (Pillar 7) ──────────────────────────────
+[tool.interrogate]
+fail-under = 80
+verbose = 1
+quiet = false
+exclude = [
+    "tests",
+    "bench",
+    "scripts",
+    "build",
+    "dist",
+    "venv",
+    ".venv",
+]
+ignore-init-method = true
+ignore-init-module = true
+ignore-magic = true
+ignore-property-decorators = true
+ignore-private = true
+ignore-semiprivate = true
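
The `[tool.interrogate]` block sets an 80% docstring threshold while ignoring private, semiprivate, magic, and `__init__` members. To make the metric concrete, here is a rough approximation built on the standard-library `ast` module. It is not interrogate's actual algorithm: it skips every name starting with an underscore as a stand-in for those ignore options, and it does not count module docstrings. The function name `docstring_coverage` is hypothetical.

```python
import ast


def docstring_coverage(source: str) -> float:
    """Percentage of public functions and classes that have a docstring."""
    total = documented = 0
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            continue
        if node.name.startswith("_"):  # private, semiprivate, magic, __init__
            continue
        total += 1
        if ast.get_docstring(node) is not None:
            documented += 1
    # Treat a file with nothing to document as fully covered.
    return 100.0 if total == 0 else 100.0 * documented / total


# Illustrative module: one documented and one undocumented public function.
SAMPLE = '''
def documented():
    """Has a docstring."""

def undocumented():
    pass

def _private():
    pass
'''
coverage = docstring_coverage(SAMPLE)
```

Here `coverage` is 50.0, which would fall below a `fail-under = 80` threshold; adding a docstring to `undocumented` brings the sample to 100.0, and `_private` never counts either way.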

knowledge_rag-3.8.0/documents/examples/sample-document.md DELETED

@@ -1,36 +0,0 @@
-# Sample Document — Knowledge RAG
-This is an example document showing the expected format for the Knowledge RAG system.
-## How Documents Are Organized
-Place your documents in the `documents/` directory, organized by category:
-- `security/` — Security research, pentest notes, exploit techniques
-- `ctf/` — CTF writeups, challenge solutions
-- `logscale/` — LogScale/LQL queries and documentation
-- `development/` — Code documentation, API references
-- `general/` — Everything else
-## Supported Formats
-- **Markdown** (`.md`) — Best format. Chunks align to `##` sections.
-- **PDF** (`.pdf`) — Extracted page-by-page via PyMuPDF.
-- **Text** (`.txt`) — Plain text, paragraph-based chunking.
-- **Python** (`.py`) — Code with function/class extraction.
-- **JSON** (`.json`) — Structured data, pretty-printed.
-## Tips for Best Results
-1. Use `##` and `###` headers in Markdown files — the system chunks by sections
-2. Keep sections focused on a single topic for better retrieval precision
-3. Include relevant keywords naturally in your text
-4. After adding new documents, the system auto-indexes on next startup
-## Example Search Queries
-```
-search_knowledge("sql injection bypass", hybrid_alpha=0.3)
-search_knowledge("privilege escalation linux", category="security")
-search_knowledge("formatTime logscale", hybrid_alpha=0)
-```

{knowledge_rag-3.8.0 → knowledge_rag-3.9.0}/LICENSE RENAMED

File without changes

{knowledge_rag-3.8.0 → knowledge_rag-3.9.0}/config.example.yaml RENAMED

File without changes

{knowledge_rag-3.8.0 → knowledge_rag-3.9.0}/mcp_server/config.py RENAMED

File without changes

{knowledge_rag-3.8.0 → knowledge_rag-3.9.0}/mcp_server/guarded.py RENAMED

File without changes

{knowledge_rag-3.8.0 → knowledge_rag-3.9.0}/mcp_server/ingestion.py RENAMED

File without changes

{knowledge_rag-3.8.0 → knowledge_rag-3.9.0}/mcp_server/preflight.py RENAMED

File without changes

{knowledge_rag-3.8.0 → knowledge_rag-3.9.0}/npm/README.md RENAMED

File without changes

{knowledge_rag-3.8.0 → knowledge_rag-3.9.0}/presets/cybersecurity.yaml RENAMED

File without changes

{knowledge_rag-3.8.0 → knowledge_rag-3.9.0}/presets/developer.yaml RENAMED

File without changes

{knowledge_rag-3.8.0 → knowledge_rag-3.9.0}/presets/general.yaml RENAMED

File without changes

{knowledge_rag-3.8.0 → knowledge_rag-3.9.0}/presets/research.yaml RENAMED

File without changes

{knowledge_rag-3.8.0 → knowledge_rag-3.9.0}/requirements.txt RENAMED

File without changes
