knowledge-rag 3.8.0__tar.gz → 3.9.0__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {knowledge_rag-3.8.0 → knowledge_rag-3.9.0}/.gitignore +3 -2
- {knowledge_rag-3.8.0 → knowledge_rag-3.9.0}/PKG-INFO +60 -5
- {knowledge_rag-3.8.0 → knowledge_rag-3.9.0}/README.md +59 -4
- {knowledge_rag-3.8.0 → knowledge_rag-3.9.0}/mcp_server/__init__.py +1 -1
- {knowledge_rag-3.8.0 → knowledge_rag-3.9.0}/mcp_server/instance_lock.py +1 -1
- {knowledge_rag-3.8.0 → knowledge_rag-3.9.0}/mcp_server/server.py +79 -22
- {knowledge_rag-3.8.0 → knowledge_rag-3.9.0}/pyproject.toml +40 -1
- knowledge_rag-3.8.0/documents/examples/sample-document.md +0 -36
- {knowledge_rag-3.8.0 → knowledge_rag-3.9.0}/LICENSE +0 -0
- {knowledge_rag-3.8.0 → knowledge_rag-3.9.0}/config.example.yaml +0 -0
- {knowledge_rag-3.8.0 → knowledge_rag-3.9.0}/mcp_server/config.py +0 -0
- {knowledge_rag-3.8.0 → knowledge_rag-3.9.0}/mcp_server/guarded.py +0 -0
- {knowledge_rag-3.8.0 → knowledge_rag-3.9.0}/mcp_server/ingestion.py +0 -0
- {knowledge_rag-3.8.0 → knowledge_rag-3.9.0}/mcp_server/preflight.py +0 -0
- {knowledge_rag-3.8.0 → knowledge_rag-3.9.0}/npm/README.md +0 -0
- {knowledge_rag-3.8.0 → knowledge_rag-3.9.0}/presets/cybersecurity.yaml +0 -0
- {knowledge_rag-3.8.0 → knowledge_rag-3.9.0}/presets/developer.yaml +0 -0
- {knowledge_rag-3.8.0 → knowledge_rag-3.9.0}/presets/general.yaml +0 -0
- {knowledge_rag-3.8.0 → knowledge_rag-3.9.0}/presets/research.yaml +0 -0
- {knowledge_rag-3.8.0 → knowledge_rag-3.9.0}/requirements.txt +0 -0
|
@@ -32,8 +32,9 @@ documents/**/*.csv
|
|
|
32
32
|
models_cache/
|
|
33
33
|
*.onnx
|
|
34
34
|
|
|
35
|
-
# Local scripts (not part of distribution)
|
|
36
|
-
scripts/
|
|
35
|
+
# Local one-off scripts (not part of distribution)
|
|
36
|
+
# Note: tracked utility scripts under scripts/ (e.g. check_version_sync.py) are
|
|
37
|
+
# committed; this rule only catches ad-hoc local files outside that directory.
|
|
37
38
|
setup-notebook.ps1
|
|
38
39
|
demo-real.yml
|
|
39
40
|
documents/.sync-log.txt
|
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
Metadata-Version: 2.4
|
|
2
2
|
Name: knowledge-rag
|
|
3
|
-
Version: 3.
|
|
3
|
+
Version: 3.9.0
|
|
4
4
|
Summary: Local RAG System for Claude Code — Hybrid search + Cross-encoder Reranking + 12 MCP Tools + 20 Format Parsers. Zero external servers.
|
|
5
5
|
Project-URL: Homepage, https://github.com/lyonzin/knowledge-rag
|
|
6
6
|
Project-URL: Repository, https://github.com/lyonzin/knowledge-rag
|
|
@@ -49,6 +49,7 @@ Description-Content-Type: text/markdown
|
|
|
49
49
|

|
|
50
50
|
[](https://github.com/lyonzin/knowledge-rag/actions/workflows/ci.yml)
|
|
51
51
|
[](https://github.com/lyonzin/knowledge-rag/actions/workflows/security.yml)
|
|
52
|
+
[](https://github.com/lyonzin/knowledge-rag/actions/workflows/quality-gate.yml)
|
|
52
53
|
[](https://glama.ai/mcp/servers/lyonzin/knowledge-rag)
|
|
53
54
|
|
|
54
55
|
### Your docs, your machine, zero cloud. Claude Code searches them natively.
|
|
@@ -65,19 +66,41 @@ pip install knowledge-rag → restart Claude Code → search_knowledge("your que
|
|
|
65
66
|
|
|
66
67
|
**12 MCP Tools** | **Hybrid Search + Reranking** | **20 File Formats** | **Optional NVIDIA GPU** | **100% Local**
|
|
67
68
|
|
|
68
|
-
[What's New](#whats-new-in-
|
|
69
|
+
[What's New](#whats-new-in-v390) | [Supported Formats](#supported-formats) | [Installation](#installation) | [Configuration](#configuration) | [API Reference](#api-reference) | [Architecture](#architecture)
|
|
69
70
|
|
|
70
71
|
</div>
|
|
71
72
|
|
|
72
73
|
---
|
|
73
74
|
|
|
74
|
-
## What's New in v3.
|
|
75
|
+
## What's New in v3.9.0
|
|
75
76
|
|
|
76
|
-
###
|
|
77
|
+
### Quality Gate — 7-Pillar PR Validation
|
|
78
|
+
|
|
79
|
+
knowledge-rag is now used daily by 70+ enterprise teams. Every PR (including dependabot bumps and one-line fixes) is now evaluated against **35+ automated checks** spread across 7 pillars before any human review:
|
|
80
|
+
|
|
81
|
+
| Pillar | What it enforces | Tools |
|
|
82
|
+
|---|---|---|
|
|
83
|
+
| **1 Security** | SAST, secrets, CVEs, supply chain | bandit, semgrep, gitleaks, pip-audit, dependency-review, Snyk, CodeQL, Socket |
|
|
84
|
+
| **2 Stability** | Flake detection, coverage trend, test count, deterministic runs | pytest-rerunfailures, codecov ±0.5pp, test-count guard |
|
|
85
|
+
| **3 Memory Leak** | RSS bounded under 1000-query load, no idle bloat | psutil-based baseline tests + nightly 50K-iteration soak |
|
|
86
|
+
| **4 Versatility** | 9 OS×Python combos, 14 format parsers, 4 config presets, locale tolerance, property-based fuzzing | matrix CI on Linux+Windows+macOS × 3.11+3.12+3.13, Hypothesis |
|
|
87
|
+
| **5 Scalability** | Performance regression > 10% blocks merge, public bench dashboard | pytest-benchmark, GH Pages chart |
|
|
88
|
+
| **6 Versioning** | Atomic version sync, API surface diff, conventional commits, CHANGELOG enforcement, backwards compat | griffe-style AST diff, custom guards |
|
|
89
|
+
| **7 Quality** | Type strictness, docstring coverage, complexity, dead code | mypy strict, interrogate ≥80%, radon, vulture |
|
|
90
|
+
|
|
91
|
+
Plus a **nightly resilience workflow** that runs chaos failure-injection (HF down, ChromaDB corruption, watchdog crash, ONNX zero-byte replay), determinism check (full suite × 3), and mutation testing on selected modules.
|
|
92
|
+
|
|
93
|
+
Read the full philosophy in [CONTRIBUTING.md](CONTRIBUTING.md). Report bugs via [SECURITY.md](SECURITY.md) or the [issue templates](.github/ISSUE_TEMPLATE/).
|
|
94
|
+
|
|
95
|
+
### Critical Hotfix — No More Silent Zero-Vector Corruption (v3.8.1)
|
|
96
|
+
|
|
97
|
+
`FastEmbedEmbeddings.__call__` no longer swallows exceptions and returns `[[0.0]*dim, ...]` when the ONNX model fails to load. That bug pre-existed in master but was silent: ChromaDB happily stored zero embeddings, `count()` reported normal numbers, smart-reindex skipped them as "already indexed", and queries returned garbage similarity with no error visible. Now raises `EmbeddingModelLoadError` / `EmbeddingError` loudly. **All v3.8.0 users should upgrade.** Full details in [Changelog](#v381-2026-05-10--hotfix).
|
|
98
|
+
|
|
99
|
+
### Lazy-Loaded Embeddings — Cheaper Idle Processes (v3.8.0)
|
|
77
100
|
|
|
78
101
|
The FastEmbed ONNX model (~200MB resident) now loads on the **first query**, not at startup. Idle `knowledge-rag` processes are now genuinely cheap. Why this matters: MCP stdio is one-process-per-client by protocol — multiple Claude Code windows, Claude Desktop + IDE simultaneously, or review/approval flows that open extra connections all spawn their own processes. Before v3.8.0, every one of them paid the full embedding-model cost up front. Now only processes that actually serve queries load the model. Public API is unchanged.
|
|
79
102
|
|
|
80
|
-
### Opt-In Single-Instance Guard
|
|
103
|
+
### Opt-In Single-Instance Guard (v3.8.0)
|
|
81
104
|
|
|
82
105
|
For users who measured their setup and want a hard cap of one server per `data_dir`:
|
|
83
106
|
|
|
@@ -101,6 +124,8 @@ All methods produce the same MCP server. See [Installation](#installation) for f
|
|
|
101
124
|
|
|
102
125
|
### Recent Highlights
|
|
103
126
|
|
|
127
|
+
- **v3.9.0** — **Quality Gate** activated: 35+ automated PR checks across 7 pillars (Security, Stability, Memory Leak, Versatility, Scalability, Versioning, Quality) + nightly resilience suite (chaos, soak, determinism, mutation)
|
|
128
|
+
- **v3.8.1** — Critical hotfix: loud-fail embeddings (no more silent zero-vector corruption); Windows CI flake erradicated (HF_HUB_OFFLINE + shell:bash + atexit wrapper)
|
|
104
129
|
- **v3.8.0** — Lazy-load embeddings, opt-in single-instance guard, version sync across PyPI/NPM/Docker
|
|
105
130
|
- **v3.6.0** — Multi-language code parsing (C/C++/JS/TS/XML), NPM wrapper, Docker image, automated release pipeline
|
|
106
131
|
- **v3.5.2** — CUDA DLL auto-discovery from pip packages, graceful GPU→CPU fallback, explicit CPU provider (no CUDA noise when `gpu: false`), BASE_DIR resolution fix for editable installs
|
|
@@ -1115,6 +1140,36 @@ A second instance exits immediately with code 75. Default is OFF (multi-client f
|
|
|
1115
1140
|
|
|
1116
1141
|
## Changelog
|
|
1117
1142
|
|
|
1143
|
+
### v3.9.0 (2026-05-10) — Quality Gate
|
|
1144
|
+
|
|
1145
|
+
**Major governance + CI hardening release. No runtime behavior change in `mcp_server/`. Public API surface unchanged from v3.8.1.**
|
|
1146
|
+
|
|
1147
|
+
- **NEW** Quality Gate workflow (`.github/workflows/quality-gate.yml`) enforcing the 7 pillars on every PR: Security, Stability, Memory Leak, Versatility, Scalability, Versioning, Quality. 35+ status checks total.
|
|
1148
|
+
- **NEW** Nightly resilience workflow (`.github/workflows/nightly.yml`): chaos suite (failure injection), 1h soak test (50K-iteration loop), determinism check (full suite × 3), mutation testing (mutmut). Auto-opens GitHub issue on any nightly failure.
|
|
1149
|
+
- **NEW** Performance benchmark suite under `bench/` (12 microbenchmarks, pytest-benchmark) with 10% regression gate on every PR.
|
|
1150
|
+
- **NEW** Public performance dashboard via GitHub Pages (`.github/workflows/bench-pages.yml`) — chart of latency/throughput per commit. Dormant until repo Pages is enabled.
|
|
1151
|
+
- **NEW** Property-based fuzzing of all parsers via Hypothesis (`tests/test_ingestion_property.py`) — 200 random examples per CI run.
|
|
1152
|
+
- **NEW** Memory baseline regression tests (`tests/test_memory_baseline.py`, cross-platform via psutil) — RSS bounded under 1000 queries; nightly soak amplifies to 50K iterations.
|
|
1153
|
+
- **NEW** Property/locale/format/preset matrices (`tests/test_presets.py`, `tests/test_locale.py`, `tests/test_format_smoke.py`).
|
|
1154
|
+
- **NEW** Backwards-compatibility regression tests (`tests/test_backwards_compat.py`) — legacy YAML configs from v3.6.0 / v3.7.0 still parse; all 12 MCP tool parameter names frozen.
|
|
1155
|
+
- **NEW** AST-based public API surface diff (`scripts/check_api_surface.py`) — any breaking change blocks merge, baseline at `.github/api-surface-baseline.json`.
|
|
1156
|
+
- **NEW** CHANGELOG enforcement (`scripts/check_changelog.py`) — user-facing PRs must add a bullet under `## Unreleased`; bypass via `skip-changelog` label.
|
|
1157
|
+
- **NEW** Test count anti-regression (`scripts/check_test_count.py`) — guards against silent test deletion.
|
|
1158
|
+
- **NEW** Conventional commits required on every PR title (commitlint via `amannn/action-semantic-pull-request`).
|
|
1159
|
+
- **NEW** mypy `--strict` rolling out per-module (currently `instance_lock.py` + `preflight.py` + `scripts/`); interrogate docstring coverage ≥ 80%; radon, vulture, PR-size guard report-only.
|
|
1160
|
+
- **NEW** CI matrix expanded to 9 cells: Linux + Windows + **macOS** × 3.11 + 3.12 + **3.13** (all required at v3.9.0; macOS / 3.13 promoted from experimental after two clean cycles).
|
|
1161
|
+
- **NEW** Governance docs: `CONTRIBUTING.md`, `CODE_OF_CONDUCT.md`, `SECURITY.md`, `.github/PULL_REQUEST_TEMPLATE.md`, 3 issue templates, expanded `CODEOWNERS`.
|
|
1162
|
+
- **NEW** Pre-commit hooks: ruff, gitleaks, version-sync, conventional commits.
|
|
1163
|
+
- **CHORE** `.github/codecov.yml` enforcing coverage trend gate (-0.5pp blocks; new code ≥ 70%).
|
|
1164
|
+
|
|
1165
|
+
### v3.8.1 (2026-05-10) — hotfix
|
|
1166
|
+
|
|
1167
|
+
- **FIX (critical)**: `FastEmbedEmbeddings.__call__` no longer returns vectors of zeros when the ONNX model fails to load or `embed()` raises. The previous behavior silently corrupted the index — ChromaDB stored zero embeddings, `count()` reported normal numbers, smart-reindex skipped the bad chunks, and queries returned garbage scores with no error visible. Now raises `EmbeddingModelLoadError` / `EmbeddingError`. (#36)
|
|
1168
|
+
- **FIX**: Sticky `_load_failed` flag — after a load failure, subsequent calls re-raise immediately instead of looping through HuggingFace download attempts (was the "frozen query" UX in v3.8.0).
|
|
1169
|
+
- **NEW**: Sanity checks in `__call__` — embed count and dim mismatches raise `EmbeddingError` instead of silently returning malformed vectors.
|
|
1170
|
+
- **TEST**: 7 new regression cases in `tests/test_lazy_embeddings.py`, including `test_does_not_return_zero_vectors_silently` as a guard for the whole class of bug.
|
|
1171
|
+
- **NOTE**: This is a pre-existing bug in master, not introduced by v3.8.0. v3.8.0 lazy-load expanded the impact (failures moved to query time). All v3.8.0 users should upgrade.
|
|
1172
|
+
|
|
1118
1173
|
### v3.8.0 (2026-05-10)
|
|
1119
1174
|
|
|
1120
1175
|
- **NEW**: Lazy-load FastEmbed embedding model (~200MB ONNX runtime). Loads on first query instead of startup — idle `knowledge-rag` processes are now cheap, which matters when MCP stdio clients spawn parallel server processes (multiple Claude Code windows, Claude Desktop + IDE, etc.). Public API unchanged. (#32)
|
|
@@ -11,6 +11,7 @@
|
|
|
11
11
|

|
|
12
12
|
[](https://github.com/lyonzin/knowledge-rag/actions/workflows/ci.yml)
|
|
13
13
|
[](https://github.com/lyonzin/knowledge-rag/actions/workflows/security.yml)
|
|
14
|
+
[](https://github.com/lyonzin/knowledge-rag/actions/workflows/quality-gate.yml)
|
|
14
15
|
[](https://glama.ai/mcp/servers/lyonzin/knowledge-rag)
|
|
15
16
|
|
|
16
17
|
### Your docs, your machine, zero cloud. Claude Code searches them natively.
|
|
@@ -27,19 +28,41 @@ pip install knowledge-rag → restart Claude Code → search_knowledge("your que
|
|
|
27
28
|
|
|
28
29
|
**12 MCP Tools** | **Hybrid Search + Reranking** | **20 File Formats** | **Optional NVIDIA GPU** | **100% Local**
|
|
29
30
|
|
|
30
|
-
[What's New](#whats-new-in-
|
|
31
|
+
[What's New](#whats-new-in-v390) | [Supported Formats](#supported-formats) | [Installation](#installation) | [Configuration](#configuration) | [API Reference](#api-reference) | [Architecture](#architecture)
|
|
31
32
|
|
|
32
33
|
</div>
|
|
33
34
|
|
|
34
35
|
---
|
|
35
36
|
|
|
36
|
-
## What's New in v3.
|
|
37
|
+
## What's New in v3.9.0
|
|
37
38
|
|
|
38
|
-
###
|
|
39
|
+
### Quality Gate — 7-Pillar PR Validation
|
|
40
|
+
|
|
41
|
+
knowledge-rag is now used daily by 70+ enterprise teams. Every PR (including dependabot bumps and one-line fixes) is now evaluated against **35+ automated checks** spread across 7 pillars before any human review:
|
|
42
|
+
|
|
43
|
+
| Pillar | What it enforces | Tools |
|
|
44
|
+
|---|---|---|
|
|
45
|
+
| **1 Security** | SAST, secrets, CVEs, supply chain | bandit, semgrep, gitleaks, pip-audit, dependency-review, Snyk, CodeQL, Socket |
|
|
46
|
+
| **2 Stability** | Flake detection, coverage trend, test count, deterministic runs | pytest-rerunfailures, codecov ±0.5pp, test-count guard |
|
|
47
|
+
| **3 Memory Leak** | RSS bounded under 1000-query load, no idle bloat | psutil-based baseline tests + nightly 50K-iteration soak |
|
|
48
|
+
| **4 Versatility** | 9 OS×Python combos, 14 format parsers, 4 config presets, locale tolerance, property-based fuzzing | matrix CI on Linux+Windows+macOS × 3.11+3.12+3.13, Hypothesis |
|
|
49
|
+
| **5 Scalability** | Performance regression > 10% blocks merge, public bench dashboard | pytest-benchmark, GH Pages chart |
|
|
50
|
+
| **6 Versioning** | Atomic version sync, API surface diff, conventional commits, CHANGELOG enforcement, backwards compat | griffe-style AST diff, custom guards |
|
|
51
|
+
| **7 Quality** | Type strictness, docstring coverage, complexity, dead code | mypy strict, interrogate ≥80%, radon, vulture |
|
|
52
|
+
|
|
53
|
+
Plus a **nightly resilience workflow** that runs chaos failure-injection (HF down, ChromaDB corruption, watchdog crash, ONNX zero-byte replay), determinism check (full suite × 3), and mutation testing on selected modules.
|
|
54
|
+
|
|
55
|
+
Read the full philosophy in [CONTRIBUTING.md](CONTRIBUTING.md). Report bugs via [SECURITY.md](SECURITY.md) or the [issue templates](.github/ISSUE_TEMPLATE/).
|
|
56
|
+
|
|
57
|
+
### Critical Hotfix — No More Silent Zero-Vector Corruption (v3.8.1)
|
|
58
|
+
|
|
59
|
+
`FastEmbedEmbeddings.__call__` no longer swallows exceptions and returns `[[0.0]*dim, ...]` when the ONNX model fails to load. That bug pre-existed in master but was silent: ChromaDB happily stored zero embeddings, `count()` reported normal numbers, smart-reindex skipped them as "already indexed", and queries returned garbage similarity with no error visible. Now raises `EmbeddingModelLoadError` / `EmbeddingError` loudly. **All v3.8.0 users should upgrade.** Full details in [Changelog](#v381-2026-05-10--hotfix).
|
|
60
|
+
|
|
61
|
+
### Lazy-Loaded Embeddings — Cheaper Idle Processes (v3.8.0)
|
|
39
62
|
|
|
40
63
|
The FastEmbed ONNX model (~200MB resident) now loads on the **first query**, not at startup. Idle `knowledge-rag` processes are now genuinely cheap. Why this matters: MCP stdio is one-process-per-client by protocol — multiple Claude Code windows, Claude Desktop + IDE simultaneously, or review/approval flows that open extra connections all spawn their own processes. Before v3.8.0, every one of them paid the full embedding-model cost up front. Now only processes that actually serve queries load the model. Public API is unchanged.
|
|
41
64
|
|
|
42
|
-
### Opt-In Single-Instance Guard
|
|
65
|
+
### Opt-In Single-Instance Guard (v3.8.0)
|
|
43
66
|
|
|
44
67
|
For users who measured their setup and want a hard cap of one server per `data_dir`:
|
|
45
68
|
|
|
@@ -63,6 +86,8 @@ All methods produce the same MCP server. See [Installation](#installation) for f
|
|
|
63
86
|
|
|
64
87
|
### Recent Highlights
|
|
65
88
|
|
|
89
|
+
- **v3.9.0** — **Quality Gate** activated: 35+ automated PR checks across 7 pillars (Security, Stability, Memory Leak, Versatility, Scalability, Versioning, Quality) + nightly resilience suite (chaos, soak, determinism, mutation)
|
|
90
|
+
- **v3.8.1** — Critical hotfix: loud-fail embeddings (no more silent zero-vector corruption); Windows CI flake erradicated (HF_HUB_OFFLINE + shell:bash + atexit wrapper)
|
|
66
91
|
- **v3.8.0** — Lazy-load embeddings, opt-in single-instance guard, version sync across PyPI/NPM/Docker
|
|
67
92
|
- **v3.6.0** — Multi-language code parsing (C/C++/JS/TS/XML), NPM wrapper, Docker image, automated release pipeline
|
|
68
93
|
- **v3.5.2** — CUDA DLL auto-discovery from pip packages, graceful GPU→CPU fallback, explicit CPU provider (no CUDA noise when `gpu: false`), BASE_DIR resolution fix for editable installs
|
|
@@ -1077,6 +1102,36 @@ A second instance exits immediately with code 75. Default is OFF (multi-client f
|
|
|
1077
1102
|
|
|
1078
1103
|
## Changelog
|
|
1079
1104
|
|
|
1105
|
+
### v3.9.0 (2026-05-10) — Quality Gate
|
|
1106
|
+
|
|
1107
|
+
**Major governance + CI hardening release. No runtime behavior change in `mcp_server/`. Public API surface unchanged from v3.8.1.**
|
|
1108
|
+
|
|
1109
|
+
- **NEW** Quality Gate workflow (`.github/workflows/quality-gate.yml`) enforcing the 7 pillars on every PR: Security, Stability, Memory Leak, Versatility, Scalability, Versioning, Quality. 35+ status checks total.
|
|
1110
|
+
- **NEW** Nightly resilience workflow (`.github/workflows/nightly.yml`): chaos suite (failure injection), 1h soak test (50K-iteration loop), determinism check (full suite × 3), mutation testing (mutmut). Auto-opens GitHub issue on any nightly failure.
|
|
1111
|
+
- **NEW** Performance benchmark suite under `bench/` (12 microbenchmarks, pytest-benchmark) with 10% regression gate on every PR.
|
|
1112
|
+
- **NEW** Public performance dashboard via GitHub Pages (`.github/workflows/bench-pages.yml`) — chart of latency/throughput per commit. Dormant until repo Pages is enabled.
|
|
1113
|
+
- **NEW** Property-based fuzzing of all parsers via Hypothesis (`tests/test_ingestion_property.py`) — 200 random examples per CI run.
|
|
1114
|
+
- **NEW** Memory baseline regression tests (`tests/test_memory_baseline.py`, cross-platform via psutil) — RSS bounded under 1000 queries; nightly soak amplifies to 50K iterations.
|
|
1115
|
+
- **NEW** Property/locale/format/preset matrices (`tests/test_presets.py`, `tests/test_locale.py`, `tests/test_format_smoke.py`).
|
|
1116
|
+
- **NEW** Backwards-compatibility regression tests (`tests/test_backwards_compat.py`) — legacy YAML configs from v3.6.0 / v3.7.0 still parse; all 12 MCP tool parameter names frozen.
|
|
1117
|
+
- **NEW** AST-based public API surface diff (`scripts/check_api_surface.py`) — any breaking change blocks merge, baseline at `.github/api-surface-baseline.json`.
|
|
1118
|
+
- **NEW** CHANGELOG enforcement (`scripts/check_changelog.py`) — user-facing PRs must add a bullet under `## Unreleased`; bypass via `skip-changelog` label.
|
|
1119
|
+
- **NEW** Test count anti-regression (`scripts/check_test_count.py`) — guards against silent test deletion.
|
|
1120
|
+
- **NEW** Conventional commits required on every PR title (commitlint via `amannn/action-semantic-pull-request`).
|
|
1121
|
+
- **NEW** mypy `--strict` rolling out per-module (currently `instance_lock.py` + `preflight.py` + `scripts/`); interrogate docstring coverage ≥ 80%; radon, vulture, PR-size guard report-only.
|
|
1122
|
+
- **NEW** CI matrix expanded to 9 cells: Linux + Windows + **macOS** × 3.11 + 3.12 + **3.13** (all required at v3.9.0; macOS / 3.13 promoted from experimental after two clean cycles).
|
|
1123
|
+
- **NEW** Governance docs: `CONTRIBUTING.md`, `CODE_OF_CONDUCT.md`, `SECURITY.md`, `.github/PULL_REQUEST_TEMPLATE.md`, 3 issue templates, expanded `CODEOWNERS`.
|
|
1124
|
+
- **NEW** Pre-commit hooks: ruff, gitleaks, version-sync, conventional commits.
|
|
1125
|
+
- **CHORE** `.github/codecov.yml` enforcing coverage trend gate (-0.5pp blocks; new code ≥ 70%).
|
|
1126
|
+
|
|
1127
|
+
### v3.8.1 (2026-05-10) — hotfix
|
|
1128
|
+
|
|
1129
|
+
- **FIX (critical)**: `FastEmbedEmbeddings.__call__` no longer returns vectors of zeros when the ONNX model fails to load or `embed()` raises. The previous behavior silently corrupted the index — ChromaDB stored zero embeddings, `count()` reported normal numbers, smart-reindex skipped the bad chunks, and queries returned garbage scores with no error visible. Now raises `EmbeddingModelLoadError` / `EmbeddingError`. (#36)
|
|
1130
|
+
- **FIX**: Sticky `_load_failed` flag — after a load failure, subsequent calls re-raise immediately instead of looping through HuggingFace download attempts (was the "frozen query" UX in v3.8.0).
|
|
1131
|
+
- **NEW**: Sanity checks in `__call__` — embed count and dim mismatches raise `EmbeddingError` instead of silently returning malformed vectors.
|
|
1132
|
+
- **TEST**: 7 new regression cases in `tests/test_lazy_embeddings.py`, including `test_does_not_return_zero_vectors_silently` as a guard for the whole class of bug.
|
|
1133
|
+
- **NOTE**: This is a pre-existing bug in master, not introduced by v3.8.0. v3.8.0 lazy-load expanded the impact (failures moved to query time). All v3.8.0 users should upgrade.
|
|
1134
|
+
|
|
1080
1135
|
### v3.8.0 (2026-05-10)
|
|
1081
1136
|
|
|
1082
1137
|
- **NEW**: Lazy-load FastEmbed embedding model (~200MB ONNX runtime). Loads on first query instead of startup — idle `knowledge-rag` processes are now cheap, which matters when MCP stdio clients spawn parallel server processes (multiple Claude Code windows, Claude Desktop + IDE, etc.). Public API unchanged. (#32)
|
|
@@ -155,7 +155,7 @@ def single_instance_lock() -> Iterator[Optional[Path]]:
|
|
|
155
155
|
# Wire signal handlers so SIGINT/SIGTERM cleanup the lock before exit
|
|
156
156
|
previous_handlers: dict[int, object] = {}
|
|
157
157
|
|
|
158
|
-
def _signal_cleanup(signum: int, frame) -> None:
|
|
158
|
+
def _signal_cleanup(signum: int, frame: object) -> None:
|
|
159
159
|
_remove_if_ours(lock_path)
|
|
160
160
|
# Restore original handler and re-raise so default action runs
|
|
161
161
|
prev = previous_handlers.get(signum, signal.SIG_DFL)
|
|
@@ -129,6 +129,18 @@ class QueryCache:
|
|
|
129
129
|
# =============================================================================
|
|
130
130
|
|
|
131
131
|
|
|
132
|
+
class EmbeddingError(RuntimeError):
|
|
133
|
+
"""Raised when embedding generation fails after a successful model load."""
|
|
134
|
+
|
|
135
|
+
|
|
136
|
+
class EmbeddingModelLoadError(RuntimeError):
|
|
137
|
+
"""Raised when the embedding model itself cannot be loaded.
|
|
138
|
+
|
|
139
|
+
Distinct from EmbeddingError so callers can decide whether to retry
|
|
140
|
+
(transient runtime failure) or surface a hard configuration problem.
|
|
141
|
+
"""
|
|
142
|
+
|
|
143
|
+
|
|
132
144
|
class FastEmbedEmbeddings:
|
|
133
145
|
"""
|
|
134
146
|
FastEmbed-based embedding function for ChromaDB (v1.4.0+ compatible).
|
|
@@ -190,33 +202,57 @@ class FastEmbedEmbeddings:
|
|
|
190
202
|
self._gpu = bool(config.gpu_acceleration)
|
|
191
203
|
self._model: Optional[TextEmbedding] = None
|
|
192
204
|
self._load_lock = threading.Lock()
|
|
205
|
+
# Sticky failure flag: once load fails, subsequent calls re-raise immediately
|
|
206
|
+
# instead of looping through download/retry. Same pattern as CrossEncoderReranker.
|
|
207
|
+
self._load_failed: Optional[Exception] = None
|
|
193
208
|
|
|
194
209
|
def _load_model(self) -> None:
|
|
195
|
-
"""Load the ONNX model on demand. Idempotent and thread-safe.
|
|
210
|
+
"""Load the ONNX model on demand. Idempotent and thread-safe.
|
|
211
|
+
|
|
212
|
+
Raises:
|
|
213
|
+
EmbeddingModelLoadError: when the underlying ONNX runtime cannot
|
|
214
|
+
instantiate the model (missing files, hash mismatch, etc.). The
|
|
215
|
+
exception is sticky — subsequent calls raise the same error
|
|
216
|
+
without retrying so callers do not loop through HF downloads.
|
|
217
|
+
"""
|
|
196
218
|
if self._model is not None:
|
|
197
219
|
return
|
|
220
|
+
if self._load_failed is not None:
|
|
221
|
+
raise EmbeddingModelLoadError(
|
|
222
|
+
f"Embedding model previously failed to load: {self._load_failed}"
|
|
223
|
+
) from self._load_failed
|
|
198
224
|
with self._load_lock:
|
|
199
225
|
if self._model is not None: # double-checked under the lock
|
|
200
226
|
return
|
|
227
|
+
if self._load_failed is not None:
|
|
228
|
+
raise EmbeddingModelLoadError(
|
|
229
|
+
f"Embedding model previously failed to load: {self._load_failed}"
|
|
230
|
+
) from self._load_failed
|
|
201
231
|
kwargs = dict(self._init_kwargs)
|
|
202
|
-
|
|
203
|
-
self.
|
|
204
|
-
|
|
205
|
-
|
|
206
|
-
|
|
207
|
-
|
|
208
|
-
|
|
209
|
-
|
|
210
|
-
|
|
211
|
-
|
|
232
|
+
try:
|
|
233
|
+
if self._gpu:
|
|
234
|
+
self._setup_cuda_dll_paths()
|
|
235
|
+
kwargs["providers"] = ["CUDAExecutionProvider", "CPUExecutionProvider"]
|
|
236
|
+
print(f"[INFO] Loading embedding model: {self.model_name} ({self._dim}D) [GPU accelerated]...")
|
|
237
|
+
try:
|
|
238
|
+
self._model = TextEmbedding(**kwargs)
|
|
239
|
+
print("[INFO] Embedding model loaded successfully [GPU]")
|
|
240
|
+
except (ValueError, RuntimeError) as e:
|
|
241
|
+
print(f"[WARN] GPU init failed ({e}), falling back to CPU...")
|
|
242
|
+
kwargs["providers"] = ["CPUExecutionProvider"]
|
|
243
|
+
self._model = TextEmbedding(**kwargs)
|
|
244
|
+
print("[INFO] Embedding model loaded successfully [CPU fallback]")
|
|
245
|
+
else:
|
|
212
246
|
kwargs["providers"] = ["CPUExecutionProvider"]
|
|
247
|
+
print(f"[INFO] Loading embedding model: {self.model_name} ({self._dim}D)...")
|
|
213
248
|
self._model = TextEmbedding(**kwargs)
|
|
214
|
-
print("[INFO] Embedding model loaded successfully
|
|
215
|
-
|
|
216
|
-
|
|
217
|
-
|
|
218
|
-
|
|
219
|
-
|
|
249
|
+
print("[INFO] Embedding model loaded successfully")
|
|
250
|
+
except Exception as exc:
|
|
251
|
+
# ONNXRuntimeError, FileNotFoundError, etc. — record and re-raise loud
|
|
252
|
+
self._load_failed = exc
|
|
253
|
+
self._model = None
|
|
254
|
+
print(f"[ERROR] Embedding model load FAILED: {exc}", file=sys.stderr)
|
|
255
|
+
raise EmbeddingModelLoadError(f"Failed to load embedding model: {exc}") from exc
|
|
220
256
|
|
|
221
257
|
def __call__(self, input: List[str]) -> List[List[float]]:
|
|
222
258
|
"""
|
|
@@ -224,17 +260,38 @@ class FastEmbedEmbeddings:
|
|
|
224
260
|
|
|
225
261
|
ChromaDB embedding_function interface: __call__(input: List[str]) -> List[List[float]]
|
|
226
262
|
FastEmbed.embed() returns a generator, so we consume it into a list.
|
|
263
|
+
|
|
264
|
+
Raises:
|
|
265
|
+
EmbeddingModelLoadError: when the model could not be loaded.
|
|
266
|
+
EmbeddingError: when embedding generation fails after a successful load.
|
|
267
|
+
|
|
268
|
+
Behavior note (changed in v3.8.1):
|
|
269
|
+
Previously this method swallowed any exception and returned vectors
|
|
270
|
+
of zeros (``[[0.0]*dim for _ in input]``). That silently corrupted
|
|
271
|
+
the index — ChromaDB stored zero vectors as document embeddings,
|
|
272
|
+
``count()`` returned the right number of chunks, smart-reindex
|
|
273
|
+
would skip them as "already indexed", and queries returned garbage
|
|
274
|
+
similarity scores. Failures are now LOUD: the caller (ChromaDB
|
|
275
|
+
``add()``, MCP search tool, etc.) sees the real error and can
|
|
276
|
+
surface it to the user.
|
|
227
277
|
"""
|
|
228
278
|
if not input:
|
|
229
279
|
return []
|
|
230
280
|
|
|
231
|
-
self._load_model()
|
|
281
|
+
self._load_model() # may raise EmbeddingModelLoadError
|
|
232
282
|
try:
|
|
233
283
|
embeddings = list(self._model.embed(input))
|
|
234
|
-
|
|
235
|
-
|
|
236
|
-
|
|
237
|
-
|
|
284
|
+
except Exception as exc:
|
|
285
|
+
print(f"[ERROR] Embedding generation FAILED: {exc}", file=sys.stderr)
|
|
286
|
+
raise EmbeddingError(f"Embedding generation failed: {exc}") from exc
|
|
287
|
+
|
|
288
|
+
# Sanity check: model returned the right number of vectors with the right dim
|
|
289
|
+
if len(embeddings) != len(input):
|
|
290
|
+
raise EmbeddingError(f"Embedding count mismatch: expected {len(input)}, got {len(embeddings)}")
|
|
291
|
+
result = [emb.tolist() for emb in embeddings]
|
|
292
|
+
if result and len(result[0]) != self._dim:
|
|
293
|
+
raise EmbeddingError(f"Embedding dim mismatch: expected {self._dim}, got {len(result[0])}")
|
|
294
|
+
return result
|
|
238
295
|
|
|
239
296
|
def name(self) -> str:
|
|
240
297
|
"""Return embedding function name (required by ChromaDB v1.4.0+)"""
|
|
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
|
|
|
4
4
|
|
|
5
5
|
[project]
|
|
6
6
|
name = "knowledge-rag"
|
|
7
|
-
version = "3.
|
|
7
|
+
version = "3.9.0"
|
|
8
8
|
description = "Local RAG System for Claude Code — Hybrid search + Cross-encoder Reranking + 12 MCP Tools + 20 Format Parsers. Zero external servers."
|
|
9
9
|
readme = "README.md"
|
|
10
10
|
license = {text = "MIT"}
|
|
@@ -100,6 +100,12 @@ pythonpath = ["."]
|
|
|
100
100
|
# many tmp dirs accumulate). Tests run isolated; we don't need history.
|
|
101
101
|
tmp_path_retention_count = 1
|
|
102
102
|
tmp_path_retention_policy = "failed"
|
|
103
|
+
markers = [
|
|
104
|
+
"chaos: failure-injection tests run in nightly workflow only",
|
|
105
|
+
]
|
|
106
|
+
# Default: skip chaos tests in regular pytest runs (PR + local).
|
|
107
|
+
# Nightly workflow opts in with `pytest -m chaos`.
|
|
108
|
+
addopts = "-m 'not chaos'"
|
|
103
109
|
|
|
104
110
|
[tool.ruff]
|
|
105
111
|
target-version = "py311"
|
|
@@ -118,3 +124,36 @@ source = ["mcp_server"]
|
|
|
118
124
|
[tool.coverage.report]
|
|
119
125
|
show_missing = true
|
|
120
126
|
fail_under = 35
|
|
127
|
+
|
|
128
|
+
# ── mypy: strict gradual rollout (Pillar 7) ─────────────────────────────────
|
|
129
|
+
# Strict mode is enabled GLOBALLY but with a per-module exclusion list while
|
|
130
|
+
# we incrementally annotate the legacy modules. The CI job runs strict on the
|
|
131
|
+
# allowlist below; new modules are added as they earn full annotations.
|
|
132
|
+
[tool.mypy]
|
|
133
|
+
python_version = "3.11"
|
|
134
|
+
strict = true
|
|
135
|
+
show_error_codes = true
|
|
136
|
+
warn_unused_configs = true
|
|
137
|
+
warn_unreachable = true
|
|
138
|
+
ignore_missing_imports = true # third-party libs (chromadb, fastembed, etc.) often lack stubs
|
|
139
|
+
|
|
140
|
+
# ── interrogate: docstring coverage (Pillar 7) ──────────────────────────────
|
|
141
|
+
[tool.interrogate]
|
|
142
|
+
fail-under = 80
|
|
143
|
+
verbose = 1
|
|
144
|
+
quiet = false
|
|
145
|
+
exclude = [
|
|
146
|
+
"tests",
|
|
147
|
+
"bench",
|
|
148
|
+
"scripts",
|
|
149
|
+
"build",
|
|
150
|
+
"dist",
|
|
151
|
+
"venv",
|
|
152
|
+
".venv",
|
|
153
|
+
]
|
|
154
|
+
ignore-init-method = true
|
|
155
|
+
ignore-init-module = true
|
|
156
|
+
ignore-magic = true
|
|
157
|
+
ignore-property-decorators = true
|
|
158
|
+
ignore-private = true
|
|
159
|
+
ignore-semiprivate = true
|
|
@@ -1,36 +0,0 @@
|
|
|
1
|
-
# Sample Document — Knowledge RAG
|
|
2
|
-
|
|
3
|
-
This is an example document showing the expected format for the Knowledge RAG system.
|
|
4
|
-
|
|
5
|
-
## How Documents Are Organized
|
|
6
|
-
|
|
7
|
-
Place your documents in the `documents/` directory, organized by category:
|
|
8
|
-
|
|
9
|
-
- `security/` — Security research, pentest notes, exploit techniques
|
|
10
|
-
- `ctf/` — CTF writeups, challenge solutions
|
|
11
|
-
- `logscale/` — LogScale/LQL queries and documentation
|
|
12
|
-
- `development/` — Code documentation, API references
|
|
13
|
-
- `general/` — Everything else
|
|
14
|
-
|
|
15
|
-
## Supported Formats
|
|
16
|
-
|
|
17
|
-
- **Markdown** (`.md`) — Best format. Chunks align to `##` sections.
|
|
18
|
-
- **PDF** (`.pdf`) — Extracted page-by-page via PyMuPDF.
|
|
19
|
-
- **Text** (`.txt`) — Plain text, paragraph-based chunking.
|
|
20
|
-
- **Python** (`.py`) — Code with function/class extraction.
|
|
21
|
-
- **JSON** (`.json`) — Structured data, pretty-printed.
|
|
22
|
-
|
|
23
|
-
## Tips for Best Results
|
|
24
|
-
|
|
25
|
-
1. Use `##` and `###` headers in Markdown files — the system chunks by sections
|
|
26
|
-
2. Keep sections focused on a single topic for better retrieval precision
|
|
27
|
-
3. Include relevant keywords naturally in your text
|
|
28
|
-
4. After adding new documents, the system auto-indexes on next startup
|
|
29
|
-
|
|
30
|
-
## Example Search Queries
|
|
31
|
-
|
|
32
|
-
```
|
|
33
|
-
search_knowledge("sql injection bypass", hybrid_alpha=0.3)
|
|
34
|
-
search_knowledge("privilege escalation linux", category="security")
|
|
35
|
-
search_knowledge("formatTime logscale", hybrid_alpha=0)
|
|
36
|
-
```
|
|
File without changes
|
|
File without changes
|
|
File without changes
|
|
File without changes
|
|
File without changes
|
|
File without changes
|
|
File without changes
|
|
File without changes
|
|
File without changes
|
|
File without changes
|
|
File without changes
|
|
File without changes
|