knowledge-rag 3.8.0__tar.gz → 3.9.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -32,8 +32,9 @@ documents/**/*.csv
32
32
  models_cache/
33
33
  *.onnx
34
34
 
35
- # Local scripts (not part of distribution)
36
- scripts/
35
+ # Local one-off scripts (not part of distribution)
36
+ # Note: tracked utility scripts under scripts/ (e.g. check_version_sync.py) are
37
+ # committed; the entries below only catch ad-hoc local files outside that directory.
37
38
  setup-notebook.ps1
38
39
  demo-real.yml
39
40
  documents/.sync-log.txt
@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.4
2
2
  Name: knowledge-rag
3
- Version: 3.8.0
3
+ Version: 3.9.0
4
4
  Summary: Local RAG System for Claude Code — Hybrid search + Cross-encoder Reranking + 12 MCP Tools + 20 Format Parsers. Zero external servers.
5
5
  Project-URL: Homepage, https://github.com/lyonzin/knowledge-rag
6
6
  Project-URL: Repository, https://github.com/lyonzin/knowledge-rag
@@ -49,6 +49,7 @@ Description-Content-Type: text/markdown
49
49
  ![GPU](https://img.shields.io/badge/GPU-NVIDIA%20CUDA-76B900.svg?logo=nvidia)
50
50
  [![CI](https://github.com/lyonzin/knowledge-rag/actions/workflows/ci.yml/badge.svg)](https://github.com/lyonzin/knowledge-rag/actions/workflows/ci.yml)
51
51
  [![CodeQL](https://github.com/lyonzin/knowledge-rag/actions/workflows/security.yml/badge.svg)](https://github.com/lyonzin/knowledge-rag/actions/workflows/security.yml)
52
+ [![Quality Gate](https://github.com/lyonzin/knowledge-rag/actions/workflows/quality-gate.yml/badge.svg)](https://github.com/lyonzin/knowledge-rag/actions/workflows/quality-gate.yml)
52
53
  [![Glama Score](https://glama.ai/mcp/servers/lyonzin/knowledge-rag/badges/score.svg)](https://glama.ai/mcp/servers/lyonzin/knowledge-rag)
53
54
 
54
55
  ### Your docs, your machine, zero cloud. Claude Code searches them natively.
@@ -65,19 +66,41 @@ pip install knowledge-rag → restart Claude Code → search_knowledge("your que
65
66
 
66
67
  **12 MCP Tools** | **Hybrid Search + Reranking** | **20 File Formats** | **Optional NVIDIA GPU** | **100% Local**
67
68
 
68
- [What's New](#whats-new-in-v360) | [Supported Formats](#supported-formats) | [Installation](#installation) | [Configuration](#configuration) | [API Reference](#api-reference) | [Architecture](#architecture)
69
+ [What's New](#whats-new-in-v390) | [Supported Formats](#supported-formats) | [Installation](#installation) | [Configuration](#configuration) | [API Reference](#api-reference) | [Architecture](#architecture)
69
70
 
70
71
  </div>
71
72
 
72
73
  ---
73
74
 
74
- ## What's New in v3.8.0
75
+ ## What's New in v3.9.0
75
76
 
76
- ### Lazy-Loaded Embeddings — Cheaper Idle Processes
77
+ ### Quality Gate — 7-Pillar PR Validation
78
+
79
+ knowledge-rag is now used daily by 70+ enterprise teams. Every PR (including Dependabot bumps and one-line fixes) is evaluated against **35+ automated checks** spread across 7 pillars before any human review:
80
+
81
+ | Pillar | What it enforces | Tools |
82
+ |---|---|---|
83
+ | **1 Security** | SAST, secrets, CVEs, supply chain | bandit, semgrep, gitleaks, pip-audit, dependency-review, Snyk, CodeQL, Socket |
84
+ | **2 Stability** | Flake detection, coverage trend, test count, deterministic runs | pytest-rerunfailures, codecov trend gate (-0.5pp blocks), test-count guard |
85
+ | **3 Memory Leak** | RSS bounded under 1000-query load, no idle bloat | psutil-based baseline tests + nightly 50K-iteration soak |
86
+ | **4 Versatility** | 9 OS×Python combos, 14 format parsers, 4 config presets, locale tolerance, property-based fuzzing | matrix CI on Linux+Windows+macOS × 3.11+3.12+3.13, Hypothesis |
87
+ | **5 Scalability** | Performance regression > 10% blocks merge, public bench dashboard | pytest-benchmark, GH Pages chart |
88
+ | **6 Versioning** | Atomic version sync, API surface diff, conventional commits, CHANGELOG enforcement, backwards compat | griffe-style AST diff, custom guards |
89
+ | **7 Quality** | Type strictness, docstring coverage, complexity, dead code | mypy strict, interrogate ≥80%, radon, vulture |
90
+
91
+ Plus a **nightly resilience workflow** that runs chaos failure-injection (HF down, ChromaDB corruption, watchdog crash, ONNX zero-byte replay), determinism check (full suite × 3), and mutation testing on selected modules.
92
+
93
+ Read the full philosophy in [CONTRIBUTING.md](CONTRIBUTING.md). Report security vulnerabilities via [SECURITY.md](SECURITY.md) and other bugs via the [issue templates](.github/ISSUE_TEMPLATE/).
94
+
95
+ ### Critical Hotfix — No More Silent Zero-Vector Corruption (v3.8.1)
96
+
97
+ `FastEmbedEmbeddings.__call__` no longer swallows exceptions and returns `[[0.0]*dim, ...]` when the ONNX model fails to load. The bug already existed on master but was silent: ChromaDB happily stored the zero embeddings, `count()` reported normal numbers, smart-reindex skipped them as "already indexed", and queries returned garbage similarity scores with no visible error. It now raises `EmbeddingModelLoadError` / `EmbeddingError` loudly. **All v3.8.0 users should upgrade.** Full details in [Changelog](#v381-2026-05-10--hotfix).
98
+
99
+ ### Lazy-Loaded Embeddings — Cheaper Idle Processes (v3.8.0)
77
100
 
78
101
  The FastEmbed ONNX model (~200MB resident) now loads on the **first query**, not at startup. Idle `knowledge-rag` processes are now genuinely cheap. Why this matters: MCP stdio is one-process-per-client by protocol — multiple Claude Code windows, Claude Desktop + IDE simultaneously, or review/approval flows that open extra connections all spawn their own processes. Before v3.8.0, every one of them paid the full embedding-model cost up front. Now only processes that actually serve queries load the model. Public API is unchanged.
79
102
 
80
- ### Opt-In Single-Instance Guard
103
+ ### Opt-In Single-Instance Guard (v3.8.0)
81
104
 
82
105
  For users who measured their setup and want a hard cap of one server per `data_dir`:
83
106
 
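Pillar 5 says a performance regression of more than 10% blocks the merge. A minimal sketch of that rule, assuming per-benchmark timing samples (seconds per run) from a baseline run and a PR run; the function name, the use of medians, and the benchmark names are hypothetical and not the project's actual gate. pytest-benchmark also has a built-in `--benchmark-compare-fail` option (for example `mean:10%`) that a workflow could use instead.

```python
"""Hypothetical sketch of a '>10% slower blocks merge' benchmark gate."""

from statistics import median

THRESHOLD = 0.10  # fail when a benchmark's median gets more than 10% slower


def regressions(
    baseline: dict[str, list[float]],
    candidate: dict[str, list[float]],
    threshold: float = THRESHOLD,
) -> dict[str, float]:
    """Return {benchmark: relative_slowdown} for every benchmark over the threshold.

    Medians damp one-off CI noise. Benchmarks missing from the baseline are
    new and therefore skipped; a non-positive baseline cannot be compared.
    """
    failing: dict[str, float] = {}
    for name, runs in candidate.items():
        if name not in baseline:
            continue
        base = median(baseline[name])
        if base <= 0:
            continue
        slowdown = (median(runs) - base) / base
        if slowdown > threshold:
            failing[name] = slowdown
    return failing


if __name__ == "__main__":
    base = {"search_hybrid": [0.020, 0.021, 0.019], "parse_pdf": [0.100, 0.100, 0.100]}
    pr = {"search_hybrid": [0.021, 0.020, 0.022], "parse_pdf": [0.120, 0.118, 0.121]}
    print(regressions(base, pr))  # only parse_pdf (about 20% slower) is reported
```

A non-empty result would then be turned into a failing status check by the workflow.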
@@ -101,6 +124,8 @@ All methods produce the same MCP server. See [Installation](#installation) for f
101
124
 
102
125
  ### Recent Highlights
103
126
 
127
+ - **v3.9.0** — **Quality Gate** activated: 35+ automated PR checks across 7 pillars (Security, Stability, Memory Leak, Versatility, Scalability, Versioning, Quality) + nightly resilience suite (chaos, soak, determinism, mutation)
128
+ - **v3.8.1** — Critical hotfix: loud-fail embeddings (no more silent zero-vector corruption); Windows CI flake eradicated (HF_HUB_OFFLINE + shell:bash + atexit wrapper)
104
129
  - **v3.8.0** — Lazy-load embeddings, opt-in single-instance guard, version sync across PyPI/NPM/Docker
105
130
  - **v3.6.0** — Multi-language code parsing (C/C++/JS/TS/XML), NPM wrapper, Docker image, automated release pipeline
106
131
  - **v3.5.2** — CUDA DLL auto-discovery from pip packages, graceful GPU→CPU fallback, explicit CPU provider (no CUDA noise when `gpu: false`), BASE_DIR resolution fix for editable installs
@@ -1115,6 +1140,36 @@ A second instance exits immediately with code 75. Default is OFF (multi-client f
1115
1140
 
1116
1141
  ## Changelog
1117
1142
 
1143
+ ### v3.9.0 (2026-05-10) — Quality Gate
1144
+
1145
+ **Major governance + CI hardening release. No runtime behavior change in `mcp_server/`. Public API surface unchanged from v3.8.1.**
1146
+
1147
+ - **NEW** Quality Gate workflow (`.github/workflows/quality-gate.yml`) enforcing the 7 pillars on every PR: Security, Stability, Memory Leak, Versatility, Scalability, Versioning, Quality. 35+ status checks total.
1148
+ - **NEW** Nightly resilience workflow (`.github/workflows/nightly.yml`): chaos suite (failure injection), 1h soak test (50K-iteration loop), determinism check (full suite × 3), mutation testing (mutmut). Auto-opens GitHub issue on any nightly failure.
1149
+ - **NEW** Performance benchmark suite under `bench/` (12 microbenchmarks, pytest-benchmark) with 10% regression gate on every PR.
1150
+ - **NEW** Public performance dashboard via GitHub Pages (`.github/workflows/bench-pages.yml`) — chart of latency/throughput per commit. Dormant until GitHub Pages is enabled for the repo.
1151
+ - **NEW** Property-based fuzzing of all parsers via Hypothesis (`tests/test_ingestion_property.py`) — 200 random examples per CI run.
1152
+ - **NEW** Memory baseline regression tests (`tests/test_memory_baseline.py`, cross-platform via psutil) — RSS bounded under 1000 queries; nightly soak amplifies to 50K iterations.
1153
+ - **NEW** Property/locale/format/preset matrices (`tests/test_presets.py`, `tests/test_locale.py`, `tests/test_format_smoke.py`).
1154
+ - **NEW** Backwards-compatibility regression tests (`tests/test_backwards_compat.py`) — legacy YAML configs from v3.6.0 / v3.7.0 still parse; all 12 MCP tool parameter names frozen.
1155
+ - **NEW** AST-based public API surface diff (`scripts/check_api_surface.py`) — any breaking change blocks merge, baseline at `.github/api-surface-baseline.json`.
1156
+ - **NEW** CHANGELOG enforcement (`scripts/check_changelog.py`) — user-facing PRs must add a bullet under `## Unreleased`; bypass via `skip-changelog` label.
1157
+ - **NEW** Test count anti-regression (`scripts/check_test_count.py`) — guards against silent test deletion.
1158
+ - **NEW** Conventional commits required on every PR title (commitlint via `amannn/action-semantic-pull-request`).
1159
+ - **NEW** mypy `--strict` being rolled out module by module (currently `instance_lock.py` + `preflight.py` + `scripts/`); interrogate docstring coverage ≥ 80%; radon, vulture and the PR-size guard run report-only.
1160
+ - **NEW** CI matrix expanded to 9 cells: Linux + Windows + **macOS** × 3.11 + 3.12 + **3.13** (all required at v3.9.0; macOS / 3.13 promoted from experimental after two clean cycles).
1161
+ - **NEW** Governance docs: `CONTRIBUTING.md`, `CODE_OF_CONDUCT.md`, `SECURITY.md`, `.github/PULL_REQUEST_TEMPLATE.md`, 3 issue templates, expanded `CODEOWNERS`.
1162
+ - **NEW** Pre-commit hooks: ruff, gitleaks, version-sync, conventional commits.
1163
+ - **CHORE** `.github/codecov.yml` enforcing coverage trend gate (-0.5pp blocks; new code ≥ 70%).
1164
+
1165
+ ### v3.8.1 (2026-05-10) — hotfix
1166
+
1167
+ - **FIX (critical)**: `FastEmbedEmbeddings.__call__` no longer returns vectors of zeros when the ONNX model fails to load or `embed()` raises. The previous behavior silently corrupted the index — ChromaDB stored zero embeddings, `count()` reported normal numbers, smart-reindex skipped the bad chunks, and queries returned garbage scores with no error visible. Now raises `EmbeddingModelLoadError` / `EmbeddingError`. (#36)
1168
+ - **FIX**: Sticky `_load_failed` flag — after a load failure, subsequent calls re-raise immediately instead of looping through HuggingFace download attempts (the loop behind the "frozen query" behavior in v3.8.0).
1169
+ - **NEW**: Sanity checks in `__call__` — embed count and dim mismatches raise `EmbeddingError` instead of silently returning malformed vectors.
1170
+ - **TEST**: 7 new regression cases in `tests/test_lazy_embeddings.py`, including `test_does_not_return_zero_vectors_silently`, which guards against this whole class of bug.
1171
+ - **NOTE**: The bug predates v3.8.0 and was not introduced by it; v3.8.0's lazy loading widened its impact by moving model-load failures to query time. All v3.8.0 users should upgrade.
1172
+
1118
1173
  ### v3.8.0 (2026-05-10)
1119
1174
 
1120
1175
  - **NEW**: Lazy-load FastEmbed embedding model (~200MB ONNX runtime). Loads on first query instead of startup — idle `knowledge-rag` processes are now cheap, which matters when MCP stdio clients spawn parallel server processes (multiple Claude Code windows, Claude Desktop + IDE, etc.). Public API unchanged. (#32)
@@ -11,6 +11,7 @@
11
11
  ![GPU](https://img.shields.io/badge/GPU-NVIDIA%20CUDA-76B900.svg?logo=nvidia)
12
12
  [![CI](https://github.com/lyonzin/knowledge-rag/actions/workflows/ci.yml/badge.svg)](https://github.com/lyonzin/knowledge-rag/actions/workflows/ci.yml)
13
13
  [![CodeQL](https://github.com/lyonzin/knowledge-rag/actions/workflows/security.yml/badge.svg)](https://github.com/lyonzin/knowledge-rag/actions/workflows/security.yml)
14
+ [![Quality Gate](https://github.com/lyonzin/knowledge-rag/actions/workflows/quality-gate.yml/badge.svg)](https://github.com/lyonzin/knowledge-rag/actions/workflows/quality-gate.yml)
14
15
  [![Glama Score](https://glama.ai/mcp/servers/lyonzin/knowledge-rag/badges/score.svg)](https://glama.ai/mcp/servers/lyonzin/knowledge-rag)
15
16
 
16
17
  ### Your docs, your machine, zero cloud. Claude Code searches them natively.
@@ -27,19 +28,41 @@ pip install knowledge-rag → restart Claude Code → search_knowledge("your que
27
28
 
28
29
  **12 MCP Tools** | **Hybrid Search + Reranking** | **20 File Formats** | **Optional NVIDIA GPU** | **100% Local**
29
30
 
30
- [What's New](#whats-new-in-v360) | [Supported Formats](#supported-formats) | [Installation](#installation) | [Configuration](#configuration) | [API Reference](#api-reference) | [Architecture](#architecture)
31
+ [What's New](#whats-new-in-v390) | [Supported Formats](#supported-formats) | [Installation](#installation) | [Configuration](#configuration) | [API Reference](#api-reference) | [Architecture](#architecture)
31
32
 
32
33
  </div>
33
34
 
34
35
  ---
35
36
 
36
- ## What's New in v3.8.0
37
+ ## What's New in v3.9.0
37
38
 
38
- ### Lazy-Loaded Embeddings — Cheaper Idle Processes
39
+ ### Quality Gate — 7-Pillar PR Validation
40
+
41
+ knowledge-rag is now used daily by 70+ enterprise teams. Every PR (including Dependabot bumps and one-line fixes) is evaluated against **35+ automated checks** spread across 7 pillars before any human review:
42
+
43
+ | Pillar | What it enforces | Tools |
44
+ |---|---|---|
45
+ | **1 Security** | SAST, secrets, CVEs, supply chain | bandit, semgrep, gitleaks, pip-audit, dependency-review, Snyk, CodeQL, Socket |
46
+ | **2 Stability** | Flake detection, coverage trend, test count, deterministic runs | pytest-rerunfailures, codecov trend gate (-0.5pp blocks), test-count guard |
47
+ | **3 Memory Leak** | RSS bounded under 1000-query load, no idle bloat | psutil-based baseline tests + nightly 50K-iteration soak |
48
+ | **4 Versatility** | 9 OS×Python combos, 14 format parsers, 4 config presets, locale tolerance, property-based fuzzing | matrix CI on Linux+Windows+macOS × 3.11+3.12+3.13, Hypothesis |
49
+ | **5 Scalability** | Performance regression > 10% blocks merge, public bench dashboard | pytest-benchmark, GH Pages chart |
50
+ | **6 Versioning** | Atomic version sync, API surface diff, conventional commits, CHANGELOG enforcement, backwards compat | griffe-style AST diff, custom guards |
51
+ | **7 Quality** | Type strictness, docstring coverage, complexity, dead code | mypy strict, interrogate ≥80%, radon, vulture |
52
+
53
+ Plus a **nightly resilience workflow** that runs chaos failure-injection (HF down, ChromaDB corruption, watchdog crash, ONNX zero-byte replay), determinism check (full suite × 3), and mutation testing on selected modules.
54
+
55
+ Read the full philosophy in [CONTRIBUTING.md](CONTRIBUTING.md). Report security vulnerabilities via [SECURITY.md](SECURITY.md) and other bugs via the [issue templates](.github/ISSUE_TEMPLATE/).
56
+
57
+ ### Critical Hotfix — No More Silent Zero-Vector Corruption (v3.8.1)
58
+
59
+ `FastEmbedEmbeddings.__call__` no longer swallows exceptions and returns `[[0.0]*dim, ...]` when the ONNX model fails to load. The bug already existed on master but was silent: ChromaDB happily stored the zero embeddings, `count()` reported normal numbers, smart-reindex skipped them as "already indexed", and queries returned garbage similarity scores with no visible error. It now raises `EmbeddingModelLoadError` / `EmbeddingError` loudly. **All v3.8.0 users should upgrade.** Full details in [Changelog](#v381-2026-05-10--hotfix).
60
+
61
+ ### Lazy-Loaded Embeddings — Cheaper Idle Processes (v3.8.0)
39
62
 
40
63
  The FastEmbed ONNX model (~200MB resident) now loads on the **first query**, not at startup. Idle `knowledge-rag` processes are now genuinely cheap. Why this matters: MCP stdio is one-process-per-client by protocol — multiple Claude Code windows, Claude Desktop + IDE simultaneously, or review/approval flows that open extra connections all spawn their own processes. Before v3.8.0, every one of them paid the full embedding-model cost up front. Now only processes that actually serve queries load the model. Public API is unchanged.
41
64
 
42
- ### Opt-In Single-Instance Guard
65
+ ### Opt-In Single-Instance Guard (v3.8.0)
43
66
 
44
67
  For users who measured their setup and want a hard cap of one server per `data_dir`:
45
68
 
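Pillar 6 mentions an AST-based API surface diff (`scripts/check_api_surface.py`, baseline at `.github/api-surface-baseline.json`). One way such a check could work is sketched below, assuming the "surface" is the parameter names of public top-level functions and classes; every name is hypothetical. This sketch only flags removed symbols and removed or renamed parameters, so a newly required parameter would slip through.

```python
"""Hypothetical sketch of an AST-based public API surface diff."""

import ast


def _params(args: ast.arguments) -> list[str]:
    """Positional-only, regular and keyword-only parameter names, in order."""
    return [a.arg for a in (*args.posonlyargs, *args.args, *args.kwonlyargs)]


def public_surface(source: str) -> dict[str, list[str]]:
    """Map each public top-level function/class to its parameter names.

    Classes map to their __init__ parameters minus self; names starting
    with "_" are treated as private and left out.
    """
    surface: dict[str, list[str]] = {}
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            if not node.name.startswith("_"):
                surface[node.name] = _params(node.args)
        elif isinstance(node, ast.ClassDef) and not node.name.startswith("_"):
            init = next(
                (n for n in node.body
                 if isinstance(n, ast.FunctionDef) and n.name == "__init__"),
                None,
            )
            surface[node.name] = _params(init.args)[1:] if init else []
    return surface


def breaking_changes(old: dict[str, list[str]], new: dict[str, list[str]]) -> list[str]:
    """Removed symbols or removed/renamed parameters break callers; additions do not."""
    problems: list[str] = []
    for name, params in old.items():
        if name not in new:
            problems.append(f"removed: {name}")
            continue
        missing = [p for p in params if p not in new[name]]
        if missing:
            problems.append(f"{name}: parameters removed or renamed {missing}")
    return problems
```

Serializing `public_surface()` output with `json.dump` would give a committed baseline that later PRs are diffed against.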
@@ -63,6 +86,8 @@ All methods produce the same MCP server. See [Installation](#installation) for f
63
86
 
64
87
  ### Recent Highlights
65
88
 
89
+ - **v3.9.0** — **Quality Gate** activated: 35+ automated PR checks across 7 pillars (Security, Stability, Memory Leak, Versatility, Scalability, Versioning, Quality) + nightly resilience suite (chaos, soak, determinism, mutation)
90
+ - **v3.8.1** — Critical hotfix: loud-fail embeddings (no more silent zero-vector corruption); Windows CI flake eradicated (HF_HUB_OFFLINE + shell:bash + atexit wrapper)
66
91
  - **v3.8.0** — Lazy-load embeddings, opt-in single-instance guard, version sync across PyPI/NPM/Docker
67
92
  - **v3.6.0** — Multi-language code parsing (C/C++/JS/TS/XML), NPM wrapper, Docker image, automated release pipeline
68
93
  - **v3.5.2** — CUDA DLL auto-discovery from pip packages, graceful GPU→CPU fallback, explicit CPU provider (no CUDA noise when `gpu: false`), BASE_DIR resolution fix for editable installs
@@ -1077,6 +1102,36 @@ A second instance exits immediately with code 75. Default is OFF (multi-client f
1077
1102
 
1078
1103
  ## Changelog
1079
1104
 
1105
+ ### v3.9.0 (2026-05-10) — Quality Gate
1106
+
1107
+ **Major governance + CI hardening release. No runtime behavior change in `mcp_server/`. Public API surface unchanged from v3.8.1.**
1108
+
1109
+ - **NEW** Quality Gate workflow (`.github/workflows/quality-gate.yml`) enforcing the 7 pillars on every PR: Security, Stability, Memory Leak, Versatility, Scalability, Versioning, Quality. 35+ status checks total.
1110
+ - **NEW** Nightly resilience workflow (`.github/workflows/nightly.yml`): chaos suite (failure injection), 1h soak test (50K-iteration loop), determinism check (full suite × 3), mutation testing (mutmut). Auto-opens GitHub issue on any nightly failure.
1111
+ - **NEW** Performance benchmark suite under `bench/` (12 microbenchmarks, pytest-benchmark) with 10% regression gate on every PR.
1112
+ - **NEW** Public performance dashboard via GitHub Pages (`.github/workflows/bench-pages.yml`) — chart of latency/throughput per commit. Dormant until GitHub Pages is enabled for the repo.
1113
+ - **NEW** Property-based fuzzing of all parsers via Hypothesis (`tests/test_ingestion_property.py`) — 200 random examples per CI run.
1114
+ - **NEW** Memory baseline regression tests (`tests/test_memory_baseline.py`, cross-platform via psutil) — RSS bounded under 1000 queries; nightly soak amplifies to 50K iterations.
1115
+ - **NEW** Property/locale/format/preset matrices (`tests/test_presets.py`, `tests/test_locale.py`, `tests/test_format_smoke.py`).
1116
+ - **NEW** Backwards-compatibility regression tests (`tests/test_backwards_compat.py`) — legacy YAML configs from v3.6.0 / v3.7.0 still parse; all 12 MCP tool parameter names frozen.
1117
+ - **NEW** AST-based public API surface diff (`scripts/check_api_surface.py`) — any breaking change blocks merge, baseline at `.github/api-surface-baseline.json`.
1118
+ - **NEW** CHANGELOG enforcement (`scripts/check_changelog.py`) — user-facing PRs must add a bullet under `## Unreleased`; bypass via `skip-changelog` label.
1119
+ - **NEW** Test count anti-regression (`scripts/check_test_count.py`) — guards against silent test deletion.
1120
+ - **NEW** Conventional commits required on every PR title (commitlint via `amannn/action-semantic-pull-request`).
1121
+ - **NEW** mypy `--strict` being rolled out module by module (currently `instance_lock.py` + `preflight.py` + `scripts/`); interrogate docstring coverage ≥ 80%; radon, vulture and the PR-size guard run report-only.
1122
+ - **NEW** CI matrix expanded to 9 cells: Linux + Windows + **macOS** × 3.11 + 3.12 + **3.13** (all required at v3.9.0; macOS / 3.13 promoted from experimental after two clean cycles).
1123
+ - **NEW** Governance docs: `CONTRIBUTING.md`, `CODE_OF_CONDUCT.md`, `SECURITY.md`, `.github/PULL_REQUEST_TEMPLATE.md`, 3 issue templates, expanded `CODEOWNERS`.
1124
+ - **NEW** Pre-commit hooks: ruff, gitleaks, version-sync, conventional commits.
1125
+ - **CHORE** `.github/codecov.yml` enforcing coverage trend gate (-0.5pp blocks; new code ≥ 70%).
1126
+
1127
+ ### v3.8.1 (2026-05-10) — hotfix
1128
+
1129
+ - **FIX (critical)**: `FastEmbedEmbeddings.__call__` no longer returns vectors of zeros when the ONNX model fails to load or `embed()` raises. The previous behavior silently corrupted the index — ChromaDB stored zero embeddings, `count()` reported normal numbers, smart-reindex skipped the bad chunks, and queries returned garbage scores with no error visible. Now raises `EmbeddingModelLoadError` / `EmbeddingError`. (#36)
1130
+ - **FIX**: Sticky `_load_failed` flag — after a load failure, subsequent calls re-raise immediately instead of looping through HuggingFace download attempts (the loop behind the "frozen query" behavior in v3.8.0).
1131
+ - **NEW**: Sanity checks in `__call__` — embed count and dim mismatches raise `EmbeddingError` instead of silently returning malformed vectors.
1132
+ - **TEST**: 7 new regression cases in `tests/test_lazy_embeddings.py`, including `test_does_not_return_zero_vectors_silently`, which guards against this whole class of bug.
1133
+ - **NOTE**: The bug predates v3.8.0 and was not introduced by it; v3.8.0's lazy loading widened its impact by moving model-load failures to query time. All v3.8.0 users should upgrade.
1134
+
1080
1135
  ### v3.8.0 (2026-05-10)
1081
1136
 
1082
1137
  - **NEW**: Lazy-load FastEmbed embedding model (~200MB ONNX runtime). Loads on first query instead of startup — idle `knowledge-rag` processes are now cheap, which matters when MCP stdio clients spawn parallel server processes (multiple Claude Code windows, Claude Desktop + IDE, etc.). Public API unchanged. (#32)
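The CHANGELOG gate (`scripts/check_changelog.py`) requires user-facing PRs to add a bullet under `## Unreleased` unless the PR carries the `skip-changelog` label. A minimal sketch of that rule, assuming the check compares the changelog text before and after the PR; the function names are hypothetical.

```python
"""Hypothetical sketch of a CHANGELOG gate for pull requests."""


def unreleased_bullets(changelog: str) -> set[str]:
    """Collect bullet lines between '## Unreleased' and the next '## ' heading.

    '### ' subheadings do not match '## ', so they stay inside the section.
    """
    bullets: set[str] = set()
    in_section = False
    for line in changelog.splitlines():
        if line.startswith("## "):
            in_section = line.strip().lower() == "## unreleased"
            continue
        if in_section and line.lstrip().startswith(("- ", "* ")):
            bullets.add(line.strip())
    return bullets


def changelog_ok(before: str, after: str, labels: set[str]) -> bool:
    """True if the PR is exempt or adds at least one new Unreleased bullet."""
    if "skip-changelog" in labels:
        return True
    return bool(unreleased_bullets(after) - unreleased_bullets(before))
```

Diffing bullet sets (rather than checking that the section is non-empty) prevents a PR from passing on entries that earlier PRs already added.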
@@ -8,7 +8,7 @@ import sys # noqa: I001
8
8
  _original_stdout = sys.stdout
9
9
  sys.stdout = sys.stderr
10
10
 
11
- __version__ = "3.8.0"
11
+ __version__ = "3.9.0"
12
12
  __author__ = "Ailton Rocha (Lyon.)"
13
13
 
14
14
  from .config import Config # noqa: E402
@@ -155,7 +155,7 @@ def single_instance_lock() -> Iterator[Optional[Path]]:
155
155
  # Wire signal handlers so SIGINT/SIGTERM cleanup the lock before exit
156
156
  previous_handlers: dict[int, object] = {}
157
157
 
158
- def _signal_cleanup(signum: int, frame) -> None:
158
+ def _signal_cleanup(signum: int, frame: object) -> None:
159
159
  _remove_if_ours(lock_path)
160
160
  # Restore original handler and re-raise so default action runs
161
161
  prev = previous_handlers.get(signum, signal.SIG_DFL)
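The hunk above wires SIGINT/SIGTERM handlers that clean up the lock, restore the original handler, and re-raise so the default action runs. A minimal sketch of that pattern, assuming a plain lock file: `install_lock_cleanup` is a hypothetical name, and unlike the diff's `_remove_if_ours` it deletes the file without checking ownership. The `frame` argument is typed `Optional[FrameType]` because Python passes the current stack frame or `None`.

```python
"""Sketch: delete a lock file on a signal, then let the original action run."""

import signal
from pathlib import Path
from types import FrameType
from typing import Any, Optional


def install_lock_cleanup(lock_path: Path, signums: tuple[int, ...]) -> dict[int, Any]:
    """Install handlers that remove lock_path, restore the previous handler
    and re-deliver the signal. Must be called from the main thread."""
    previous: dict[int, Any] = {}

    def _cleanup(signum: int, frame: Optional[FrameType]) -> None:
        lock_path.unlink(missing_ok=True)  # no ownership check in this sketch
        prev = previous.get(signum)
        # signal.signal() returns None for handlers not installed from Python;
        # fall back to the default action in that case.
        signal.signal(signum, prev if prev is not None else signal.SIG_DFL)
        signal.raise_signal(signum)  # re-deliver so the restored handler runs

    for signum in signums:
        previous[signum] = signal.signal(signum, _cleanup)
    return previous
```

With `SIG_DFL` restored, re-raising SIGTERM terminates the process as usual, but only after the lock file is gone.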
@@ -129,6 +129,18 @@ class QueryCache:
129
129
  # =============================================================================
130
130
 
131
131
 
132
+ class EmbeddingError(RuntimeError):
133
+ """Raised when embedding generation fails after a successful model load."""
134
+
135
+
136
+ class EmbeddingModelLoadError(RuntimeError):
137
+ """Raised when the embedding model itself cannot be loaded.
138
+
139
+ Distinct from EmbeddingError so callers can decide whether to retry
140
+ (transient runtime failure) or surface a hard configuration problem.
141
+ """
142
+
143
+
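The docstring above says the two exception types exist so callers can decide whether to retry (transient runtime failure) or surface a hard configuration problem. A minimal sketch of such a caller policy; the exception classes are local stand-ins mirroring the diff, while `embed_with_retry` and its attempt/backoff defaults are hypothetical.

```python
"""Sketch: retry transient embedding failures, surface load failures at once."""

import time
from typing import Callable, List


class EmbeddingError(RuntimeError):
    """Stand-in: embedding generation failed after a successful model load."""


class EmbeddingModelLoadError(RuntimeError):
    """Stand-in: the embedding model itself could not be loaded."""


def embed_with_retry(
    embed: Callable[[List[str]], List[List[float]]],
    texts: List[str],
    attempts: int = 3,
    backoff_s: float = 0.1,
) -> List[List[float]]:
    """Call embed(texts), retrying only EmbeddingError with linear backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return embed(texts)
        except EmbeddingModelLoadError:
            raise  # configuration problem: retrying will not help
        except EmbeddingError:
            if attempt == attempts:
                raise
            time.sleep(backoff_s * attempt)
    raise ValueError("attempts must be >= 1")
```

Because the load error is sticky in the diff's `_load_model`, retrying it would only re-raise the same exception, which is why this policy passes it straight through.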
132
144
  class FastEmbedEmbeddings:
133
145
  """
134
146
  FastEmbed-based embedding function for ChromaDB (v1.4.0+ compatible).
@@ -190,33 +202,57 @@ class FastEmbedEmbeddings:
190
202
  self._gpu = bool(config.gpu_acceleration)
191
203
  self._model: Optional[TextEmbedding] = None
192
204
  self._load_lock = threading.Lock()
205
+ # Sticky failure flag: once load fails, subsequent calls re-raise immediately
206
+ # instead of looping through download/retry. Same pattern as CrossEncoderReranker.
207
+ self._load_failed: Optional[Exception] = None
193
208
 
194
209
  def _load_model(self) -> None:
195
- """Load the ONNX model on demand. Idempotent and thread-safe."""
210
+ """Load the ONNX model on demand. Idempotent and thread-safe.
211
+
212
+ Raises:
213
+ EmbeddingModelLoadError: when the underlying ONNX runtime cannot
214
+ instantiate the model (missing files, hash mismatch, etc.). The
215
+ exception is sticky — subsequent calls raise the same error
216
+ without retrying so callers do not loop through HF downloads.
217
+ """
196
218
  if self._model is not None:
197
219
  return
220
+ if self._load_failed is not None:
221
+ raise EmbeddingModelLoadError(
222
+ f"Embedding model previously failed to load: {self._load_failed}"
223
+ ) from self._load_failed
198
224
  with self._load_lock:
199
225
  if self._model is not None: # double-checked under the lock
200
226
  return
227
+ if self._load_failed is not None:
228
+ raise EmbeddingModelLoadError(
229
+ f"Embedding model previously failed to load: {self._load_failed}"
230
+ ) from self._load_failed
201
231
  kwargs = dict(self._init_kwargs)
202
- if self._gpu:
203
- self._setup_cuda_dll_paths()
204
- kwargs["providers"] = ["CUDAExecutionProvider", "CPUExecutionProvider"]
205
- print(f"[INFO] Loading embedding model: {self.model_name} ({self._dim}D) [GPU accelerated]...")
206
- try:
207
- self._model = TextEmbedding(**kwargs)
208
- print("[INFO] Embedding model loaded successfully [GPU]")
209
- return
210
- except (ValueError, RuntimeError) as e:
211
- print(f"[WARN] GPU init failed ({e}), falling back to CPU...")
232
+ try:
233
+ if self._gpu:
234
+ self._setup_cuda_dll_paths()
235
+ kwargs["providers"] = ["CUDAExecutionProvider", "CPUExecutionProvider"]
236
+ print(f"[INFO] Loading embedding model: {self.model_name} ({self._dim}D) [GPU accelerated]...")
237
+ try:
238
+ self._model = TextEmbedding(**kwargs)
239
+ print("[INFO] Embedding model loaded successfully [GPU]")
240
+ except (ValueError, RuntimeError) as e:
241
+ print(f"[WARN] GPU init failed ({e}), falling back to CPU...")
242
+ kwargs["providers"] = ["CPUExecutionProvider"]
243
+ self._model = TextEmbedding(**kwargs)
244
+ print("[INFO] Embedding model loaded successfully [CPU fallback]")
245
+ else:
212
246
  kwargs["providers"] = ["CPUExecutionProvider"]
247
+ print(f"[INFO] Loading embedding model: {self.model_name} ({self._dim}D)...")
213
248
  self._model = TextEmbedding(**kwargs)
214
- print("[INFO] Embedding model loaded successfully [CPU fallback]")
215
- return
216
- kwargs["providers"] = ["CPUExecutionProvider"]
217
- print(f"[INFO] Loading embedding model: {self.model_name} ({self._dim}D)...")
218
- self._model = TextEmbedding(**kwargs)
219
- print("[INFO] Embedding model loaded successfully")
249
+ print("[INFO] Embedding model loaded successfully")
250
+ except Exception as exc:
251
+ # ONNXRuntimeError, FileNotFoundError, etc. — record and re-raise loud
252
+ self._load_failed = exc
253
+ self._model = None
254
+ print(f"[ERROR] Embedding model load FAILED: {exc}", file=sys.stderr)
255
+ raise EmbeddingModelLoadError(f"Failed to load embedding model: {exc}") from exc
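`_load_model` combines double-checked locking with a sticky failure flag. The same pattern is sketched below in isolation, assuming a generic zero-argument factory in place of `TextEmbedding(**kwargs)`; `LazyResource` and `LoadError` are hypothetical names. The diff additionally checks the failure flag before taking the lock, which this sketch omits for brevity.

```python
"""Sketch: lazy, thread-safe loading with a sticky failure flag."""

import threading
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class LoadError(RuntimeError):
    """Raised when the factory failed, now or on an earlier call."""


class LazyResource(Generic[T]):
    def __init__(self, factory: Callable[[], T]) -> None:
        self._factory = factory
        self._value: Optional[T] = None
        self._failed: Optional[Exception] = None
        self._lock = threading.Lock()

    def get(self) -> T:
        if self._value is not None:  # fast path: no lock once loaded
            return self._value
        with self._lock:
            if self._value is not None:  # another thread won the race
                return self._value
            if self._failed is not None:  # sticky: never call the factory again
                raise LoadError(f"previously failed: {self._failed}") from self._failed
            try:
                value = self._factory()
            except Exception as exc:
                self._failed = exc
                raise LoadError(f"load failed: {exc}") from exc
            self._value = value
            return value
```

The sticky flag trades automatic recovery for predictability: once loading fails, every caller gets the same error immediately instead of re-running a slow download.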
220
256
 
221
257
  def __call__(self, input: List[str]) -> List[List[float]]:
222
258
  """
@@ -224,17 +260,38 @@ class FastEmbedEmbeddings:
224
260
 
225
261
  ChromaDB embedding_function interface: __call__(input: List[str]) -> List[List[float]]
226
262
  FastEmbed.embed() returns a generator, so we consume it into a list.
263
+
264
+ Raises:
265
+ EmbeddingModelLoadError: when the model could not be loaded.
266
+ EmbeddingError: when embedding generation fails after a successful load.
267
+
268
+ Behavior note (changed in v3.8.1):
269
+ Previously this method swallowed any exception and returned vectors
270
+ of zeros (``[[0.0]*dim for _ in input]``). That silently corrupted
271
+ the index — ChromaDB stored zero vectors as document embeddings,
272
+ ``count()`` returned the right number of chunks, smart-reindex
273
+ would skip them as "already indexed", and queries returned garbage
274
+ similarity scores. Failures are now LOUD: the caller (ChromaDB
275
+ ``add()``, MCP search tool, etc.) sees the real error and can
276
+ surface it to the user.
227
277
  """
228
278
  if not input:
229
279
  return []
230
280
 
231
- self._load_model()
281
+ self._load_model() # may raise EmbeddingModelLoadError
232
282
  try:
233
283
  embeddings = list(self._model.embed(input))
234
- return [emb.tolist() for emb in embeddings]
235
- except Exception as e:
236
- print(f"[WARN] Embedding failed: {e}")
237
- return [[0.0] * self._dim for _ in input]
284
+ except Exception as exc:
285
+ print(f"[ERROR] Embedding generation FAILED: {exc}", file=sys.stderr)
286
+ raise EmbeddingError(f"Embedding generation failed: {exc}") from exc
287
+
288
+ # Sanity check: model returned the right number of vectors with the right dim
289
+ if len(embeddings) != len(input):
290
+ raise EmbeddingError(f"Embedding count mismatch: expected {len(input)}, got {len(embeddings)}")
291
+ result = [emb.tolist() for emb in embeddings]
292
+ if result and len(result[0]) != self._dim:
293
+ raise EmbeddingError(f"Embedding dim mismatch: expected {self._dim}, got {len(result[0])}")
294
+ return result
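The new `__call__` wraps generation errors and validates the output before anything reaches ChromaDB. A minimal sketch of the same loud-fail shape, assuming an `embed` callable that yields one vector per input text; `embed_checked` is a hypothetical name, and unlike the diff (which checks only the first vector's dimension) it checks every vector.

```python
"""Sketch: loud-fail embedding call with count and dimension checks."""

from typing import Callable, Iterable, List, Sequence


class EmbeddingError(RuntimeError):
    """Embedding generation failed or produced malformed output."""


def embed_checked(
    embed: Callable[[List[str]], Iterable[Sequence[float]]],
    texts: List[str],
    dim: int,
) -> List[List[float]]:
    if not texts:
        return []
    try:
        # Consume generator-style output into plain Python floats.
        vectors = [[float(x) for x in v] for v in embed(texts)]
    except Exception as exc:
        # Never substitute zero vectors: they would be indexed as if valid.
        raise EmbeddingError(f"Embedding generation failed: {exc}") from exc
    if len(vectors) != len(texts):
        raise EmbeddingError(
            f"Embedding count mismatch: expected {len(texts)}, got {len(vectors)}"
        )
    bad = [i for i, v in enumerate(vectors) if len(v) != dim]
    if bad:
        raise EmbeddingError(f"Embedding dim mismatch at index {bad[0]}: expected {dim}")
    return vectors
```

Checking every vector costs one pass over the batch, which is small next to the model call itself.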
238
295
 
239
296
  def name(self) -> str:
240
297
  """Return embedding function name (required by ChromaDB v1.4.0+)"""
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
4
4
 
5
5
  [project]
6
6
  name = "knowledge-rag"
7
- version = "3.8.0"
7
+ version = "3.9.0"
8
8
  description = "Local RAG System for Claude Code — Hybrid search + Cross-encoder Reranking + 12 MCP Tools + 20 Format Parsers. Zero external servers."
9
9
  readme = "README.md"
10
10
  license = {text = "MIT"}
@@ -100,6 +100,12 @@ pythonpath = ["."]
100
100
  # many tmp dirs accumulate). Tests run isolated; we don't need history.
101
101
  tmp_path_retention_count = 1
102
102
  tmp_path_retention_policy = "failed"
103
+ markers = [
104
+ "chaos: failure-injection tests run in nightly workflow only",
105
+ ]
106
+ # Default: skip chaos tests in regular pytest runs (PR + local).
107
+ # Nightly workflow opts in with `pytest -m chaos`.
108
+ addopts = "-m 'not chaos'"
103
109
 
104
110
  [tool.ruff]
105
111
  target-version = "py311"
@@ -118,3 +124,36 @@ source = ["mcp_server"]
118
124
  [tool.coverage.report]
119
125
  show_missing = true
120
126
  fail_under = 35
127
+
128
+ # ── mypy: strict gradual rollout (Pillar 7) ─────────────────────────────────
129
+ # Strict mode is enabled GLOBALLY but with a per-module exclusion list while
130
+ # we incrementally annotate the legacy modules. The CI job runs strict on the
131
+ # allowlist below; new modules are added as they earn full annotations.
132
+ [tool.mypy]
133
+ python_version = "3.11"
134
+ strict = true
135
+ show_error_codes = true
136
+ warn_unused_configs = true
137
+ warn_unreachable = true
138
+ ignore_missing_imports = true # third-party libs (chromadb, fastembed, etc.) often lack stubs
139
+
140
+ # ── interrogate: docstring coverage (Pillar 7) ──────────────────────────────
141
+ [tool.interrogate]
142
+ fail-under = 80
143
+ verbose = 1
144
+ quiet = false
145
+ exclude = [
146
+ "tests",
147
+ "bench",
148
+ "scripts",
149
+ "build",
150
+ "dist",
151
+ "venv",
152
+ ".venv",
153
+ ]
154
+ ignore-init-method = true
155
+ ignore-init-module = true
156
+ ignore-magic = true
157
+ ignore-property-decorators = true
158
+ ignore-private = true
159
+ ignore-semiprivate = true
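The `[tool.interrogate]` block sets an 80% docstring-coverage floor and skips several kinds of definitions. A rough approximation of that metric, assuming coverage means "documented nodes / counted nodes"; this is not interrogate's actual implementation, and `docstring_coverage` is a hypothetical name. Skipping every name that starts with `_` approximates `ignore-init-method`, `ignore-magic`, `ignore-private` and `ignore-semiprivate`; `ignore-init-module` is not modeled.

```python
"""Rough approximation of docstring coverage under the settings above."""

import ast

_DEFS = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)


def _is_property(node: ast.AST) -> bool:
    """True if the node is decorated with a bare @property."""
    decorators = getattr(node, "decorator_list", [])
    return any(isinstance(d, ast.Name) and d.id == "property" for d in decorators)


def docstring_coverage(source: str) -> float:
    """Percent of counted module/class/function nodes that have a docstring."""
    tree = ast.parse(source)
    nodes: list[ast.AST] = [tree]  # the module itself counts
    for node in ast.walk(tree):
        if isinstance(node, _DEFS):
            if node.name.startswith("_") or _is_property(node):
                continue  # ignore-* settings
            nodes.append(node)
    documented = sum(1 for n in nodes if ast.get_docstring(n) is not None)  # type: ignore[arg-type]
    return 100.0 * documented / len(nodes)
```

A module scoring below 80 under this approximation would likely also fail `fail-under = 80`, though interrogate's exact counting rules may differ.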
@@ -1,36 +0,0 @@
1
- # Sample Document — Knowledge RAG
2
-
3
- This is an example document showing the expected format for the Knowledge RAG system.
4
-
5
- ## How Documents Are Organized
6
-
7
- Place your documents in the `documents/` directory, organized by category:
8
-
9
- - `security/` — Security research, pentest notes, exploit techniques
10
- - `ctf/` — CTF writeups, challenge solutions
11
- - `logscale/` — LogScale/LQL queries and documentation
12
- - `development/` — Code documentation, API references
13
- - `general/` — Everything else
14
-
15
- ## Supported Formats
16
-
17
- - **Markdown** (`.md`) — Best format. Chunks align to `##` sections.
18
- - **PDF** (`.pdf`) — Extracted page-by-page via PyMuPDF.
19
- - **Text** (`.txt`) — Plain text, paragraph-based chunking.
20
- - **Python** (`.py`) — Code with function/class extraction.
21
- - **JSON** (`.json`) — Structured data, pretty-printed.
22
-
23
- ## Tips for Best Results
24
-
25
- 1. Use `##` and `###` headers in Markdown files — the system chunks by sections
26
- 2. Keep sections focused on a single topic for better retrieval precision
27
- 3. Include relevant keywords naturally in your text
28
- 4. After adding new documents, the system auto-indexes on next startup
29
-
30
- ## Example Search Queries
31
-
32
- ```
33
- search_knowledge("sql injection bypass", hybrid_alpha=0.3)
34
- search_knowledge("privilege escalation linux", category="security")
35
- search_knowledge("formatTime logscale", hybrid_alpha=0)
36
- ```
File without changes