PyPI - knowledge-rag - Versions diffs - 3.5.2__tar.gz → 3.6.0__tar.gz - Mend

knowledge-rag 3.5.2.tar.gz → 3.6.0.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (17)

{knowledge_rag-3.5.2 → knowledge_rag-3.6.0}/.gitignore RENAMED

@@ -1,59 +1,59 @@
-# Runtime
-venv/
-__pycache__/
-*.pyc
-data/
-# User config (personal settings — use presets/ as starting point)
-config.yaml
-# Personal documents (NEVER commit — user-populated content)
-documents/aar/
-documents/security/
-documents/logscale/
-documents/general/
-documents/ctf/
-documents/development/
-documents/documents/
-# Sensitive file formats (extra safety layer)
-documents/**/*.pdf
-documents/**/*.docx
-documents/**/*.xlsx
-documents/**/*.pptx
-documents/**/*.csv
-# Keep example docs
-!documents/examples/
-!documents/examples/**
-# FastEmbed model cache
-.cache/
-models_cache/
-*.onnx
-# Local scripts (not part of distribution)
-scripts/
-setup-notebook.ps1
-demo-real.yml
-documents/.sync-log.txt
-documents/README-CATEGORIES.md
-# Temp/junk
-*.b64
-*.tar.gz
-*.bak
-# OS files
-.DS_Store
-Thumbs.db
-desktop.ini
-# IDE
-.vscode/
-.idea/
-*.swp
-*.swo
-dist/
-.ruff_cache/
-.pytest_cache/
+# Runtime
+venv/
+__pycache__/
+*.pyc
+data/
+# User config (personal settings — use presets/ as starting point)
+config.yaml
+# Personal documents (NEVER commit — user-populated content)
+documents/aar/
+documents/security/
+documents/logscale/
+documents/general/
+documents/ctf/
+documents/development/
+documents/documents/
+# Sensitive file formats (extra safety layer)
+documents/**/*.pdf
+documents/**/*.docx
+documents/**/*.xlsx
+documents/**/*.pptx
+documents/**/*.csv
+# Keep example docs
+!documents/examples/
+!documents/examples/**
+# FastEmbed model cache
+.cache/
+models_cache/
+*.onnx
+# Local scripts (not part of distribution)
+scripts/
+setup-notebook.ps1
+demo-real.yml
+documents/.sync-log.txt
+documents/README-CATEGORIES.md
+# Temp/junk
+*.b64
+*.tar.gz
+*.bak
+# OS files
+.DS_Store
+Thumbs.db
+desktop.ini
+# IDE
+.vscode/
+.idea/
+*.swp
+*.swo
+dist/
+.ruff_cache/
+.pytest_cache/

{knowledge_rag-3.5.2 → knowledge_rag-3.6.0}/LICENSE RENAMED

@@ -1,21 +1,21 @@
-MIT License
-Copyright (c) 2025 Ailton Rocha (Lyon)
-Permission is hereby granted, free of charge, to any person obtaining a copy
-of this software and associated documentation files (the "Software"), to deal
-in the Software without restriction, including without limitation the rights
-to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
-copies of the Software, and to permit persons to whom the Software is
-furnished to do so, subject to the following conditions:
-The above copyright notice and this permission notice shall be included in all
-copies or substantial portions of the Software.
-THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
-IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
-FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
-AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
-LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
-OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
-SOFTWARE.
+MIT License
+Copyright (c) 2025 Ailton Rocha (Lyon)
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

{knowledge_rag-3.5.2 → knowledge_rag-3.6.0}/PKG-INFO RENAMED

@@ -1,7 +1,7 @@
 Metadata-Version: 2.4
 Name: knowledge-rag
-Version: 3.5.2
-Summary: Local RAG System for Claude Code — Hybrid search + Cross-encoder Reranking + 12 MCP Tools. Zero external servers.
+Version: 3.6.0
+Summary: Local RAG System for Claude Code — Hybrid search + Cross-encoder Reranking + 12 MCP Tools + 20 Format Parsers. Zero external servers.
 Project-URL: Homepage, https://github.com/lyonzin/knowledge-rag
 Project-URL: Repository, https://github.com/lyonzin/knowledge-rag
 Project-URL: Issues, https://github.com/lyonzin/knowledge-rag/issues
@@ -50,19 +50,19 @@ Description-Content-Type: text/markdown
 [![Glama Score](https://glama.ai/mcp/servers/lyonzin/knowledge-rag/badges/score.svg)](https://glama.ai/mcp/servers/lyonzin/knowledge-rag)
 [![PyPI](https://img.shields.io/pypi/v/knowledge-rag)](https://pypi.org/project/knowledge-rag/)
-### LLMs don't know your docs. Every conversation starts from zero.
+### Your docs, your machine, zero cloud. Claude Code searches them natively.
-Your notes, writeups, internal procedures, PDFs — none of it exists to your AI assistant.
-Cloud RAG solutions leak your private data. Local ones require Docker, Ollama, and 15 minutes of setup before a single query.
+Drop your PDFs, markdown, code, notebooks — **1800+ files, 39K chunks, indexed in under 3 minutes.**<br/>
+Hybrid search (BM25 + semantic vectors + cross-encoder reranking) through 12 MCP tools.<br/>
+Everything runs locally via ONNX. No Docker, no Ollama, no API keys, no data leaves your machine.
-**Knowledge RAG fixes this.** One `pip install`, zero external servers.
-Your documents become instantly searchable inside Claude Code — with reranking precision that actually finds what you need.
-`pip install → restart Claude Code → done.`
+```
+pip install knowledge-rag → restart Claude Code → search_knowledge("your query")
+```
 ---
-**12 MCP Tools** | **Hybrid Search + Cross-Encoder Reranking** | **Markdown-Aware Chunking** | **100% Local, Zero Cloud**
+**12 MCP Tools** | **Hybrid Search + Reranking** | **20 File Formats** | **Optional NVIDIA GPU** | **100% Local**
 [What's New](#whats-new-in-v352) | [Supported Formats](#supported-formats) | [Installation](#installation) | [Configuration](#configuration) | [API Reference](#api-reference) | [Architecture](#architecture)
@@ -125,6 +125,14 @@ See [Changelog](#changelog) for full history.
 | Excel | `.xlsx` | openpyxl | Yes | Sheet-by-sheet extraction |
 | PowerPoint | `.pptx` | python-pptx | Yes | Slide-by-slide extraction |
 | Jupyter Notebook | `.ipynb` | Cell-aware parser | Yes | Markdown + code cells only, no outputs/base64 |
+| C Source | `.c` | Code-aware parser | Yes | Functions/structs/includes extracted |
+| C/C++ Header | `.h` | Code-aware parser | Yes | Function declarations/structs extracted |
+| C++ Source | `.cpp` | Code-aware parser | Yes | Classes/structs/includes extracted |
+| JavaScript | `.js` | Code-aware parser | Yes | Functions/classes/imports (ESM + CJS) |
+| React JSX | `.jsx` | Code-aware parser | Yes | Same as JS parser |
+| TypeScript | `.ts` | Code-aware parser | Yes | Functions/classes/interfaces/enums/imports |
+| React TSX | `.tsx` | Code-aware parser | Yes | Same as TS parser |
+| XML | `.xml` | XML parser | Yes | Root element and namespace extraction |
 | MQL4 Header | `.mqh` | Code parser | No | MetaTrader — add to `supported_formats` to enable |
 | MQL4 Source | `.mq4` | Code parser | No | MetaTrader — add to `supported_formats` to enable |
@@ -144,7 +152,7 @@ See [Changelog](#changelog) for full history.
 | **Markdown-Aware Chunking** | `.md` files split by `##`/`###` sections instead of fixed windows |
 | **In-Process Embeddings** | FastEmbed ONNX Runtime (BAAI/bge-small-en-v1.5, 384D) |
 | **Keyword Routing** | Word-boundary aware routing for domain-specific queries |
-| **12 Format Parsers** | MD, TXT, PDF, PY, JSON, CSV, DOCX, XLSX, PPTX, IPYNB + opt-in MQH/MQ4 |
+| **20 Format Parsers** | MD, TXT, PDF, PY, C, H, CPP, JS, JSX, TS, TSX, JSON, XML, CSV, DOCX, XLSX, PPTX, IPYNB + opt-in MQH/MQ4 |
 | **Category Organization** | Organize docs by folder, auto-tagged by path |
 | **Incremental Indexing** | Change detection via mtime/size — only re-indexes modified files |
 | **Chunk Deduplication** | SHA256 content hashing prevents duplicate chunks |
@@ -202,7 +210,7 @@ flowchart TB
     end
     subgraph INGEST["DOCUMENT INGESTION"]
-        PARSERS["12 Parsers<br/>MD | PDF | TXT | PY | JSON | CSV<br/>DOCX | XLSX | PPTX | IPYNB | MQH | MQ4"]
+        PARSERS["20 Parsers<br/>MD | PDF | TXT | PY | C | H | CPP | JS | JSX | TS | TSX | JSON | XML | CSV<br/>DOCX | XLSX | PPTX | IPYNB | MQH | MQ4"]
         CHUNKER["Chunking<br/>MD: section-aware<br/>Other: 1000 chars + 200 overlap"]
         PARSERS --> CHUNKER
     end
@@ -268,11 +276,11 @@ flowchart LR
         FILES["documents/<br/>├── security/<br/>├── development/<br/>├── ctf/<br/>└── general/"]
     end
-    subgraph PARSE["Parse (12 formats)"]
+    subgraph PARSE["Parse (20 formats)"]
         MD["Markdown"]
         PDF["PDF<br/>(PyMuPDF)"]
         OFFICE["DOCX | XLSX<br/>PPTX | CSV"]
-        CODE["PY | JSON<br/>IPYNB"]
+        CODE["PY | C | H | CPP | JS | JSX<br/>TS | TSX | JSON | XML | IPYNB"]
     end
     subgraph CHUNK["Chunk"]
@@ -933,7 +941,7 @@ knowledge-rag/
 ├── mcp_server/
 │   ├── __init__.py          # Stdout protection + version
 │   ├── config.py            # YAML config loader + defaults
-│   ├── ingestion.py         # 12 parsers, chunking, metadata extraction
+│   ├── ingestion.py         # 20 parsers, chunking, metadata extraction
 │   └── server.py            # MCP server, ChromaDB, BM25, reranker, 12 tools
 ├── config.example.yaml      # Documented config template (copy to config.yaml)
 ├── config.yaml              # Your active configuration (git-ignored)
@@ -1024,6 +1032,15 @@ With ~200 documents, expect ~300-500MB RAM. The embedding model (~50MB) and rera
 ## Changelog
+### v3.6.0 (2026-04-23)
+- **NEW**: Multi-language code parsing — C (`.c`), C++ (`.cpp`/`.h`), JavaScript (`.js`/`.jsx`), TypeScript (`.ts`/`.tsx`) with per-language function/class/import extraction
+- **NEW**: XML parser (`.xml`) — root element and namespace metadata extraction
+- **NEW**: All 8 new formats default enabled — no config change needed
+- **NEW**: NPM wrapper (`npx knowledge-rag`) + Docker image (`ghcr.io/lyonzin/knowledge-rag`)
+- **NEW**: Automated release pipeline — PyPI (Trusted Publishing), NPM, Docker GHCR
+- **IMPROVED**: Code parser reports correct `language` metadata per file type (was hardcoded to `"python"` for all code files)
 ### v3.5.2 (2026-04-16)
 - **NEW**: Auto-discovery of CUDA 12 DLLs from pip-installed NVIDIA packages — no manual PATH configuration needed
@@ -1038,7 +1055,7 @@ With ~200 documents, expect ~300-500MB RAM. The embedding model (~50MB) and rera
 ### v3.5.0 (2026-04-16)
 - **NEW**: Optional GPU acceleration for ONNX embeddings — `pip install knowledge-rag[gpu]` + `models.embedding.gpu: true` in config. 5-10x faster indexing on NVIDIA GPUs with automatic CPU fallback.
-- **DOCS**: Supported formats table added to README (12 formats)
+- **DOCS**: Supported formats table added to README (20 formats)
 ### v3.4.3 (2026-04-16)
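
The feature table in this file mentions fixed-window chunking ("1000 chars + 200 overlap") for non-Markdown files and SHA256 content hashing for chunk deduplication. The sketch below illustrates those two ideas only; it is not the package's implementation, and the function names `chunk_text` and `dedupe_chunks` are hypothetical.

```python
import hashlib


def chunk_text(text, size=1000, overlap=200):
    """Split text into fixed windows of `size` chars, each overlapping the previous by `overlap`.

    Hypothetical helper: the 1000/200 defaults come from the README diagram,
    everything else is an assumption made for illustration.
    """
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        # Stop once the current window has reached the end of the text.
        if start + size >= len(text):
            break
    return chunks


def dedupe_chunks(chunks):
    """Drop chunks whose SHA256 digest has already been seen, keeping first occurrences."""
    seen = set()
    unique = []
    for chunk in chunks:
        digest = hashlib.sha256(chunk.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(chunk)
    return unique
```

Hashing the chunk text rather than the file path means identical passages that appear in several documents collapse to one stored chunk, which is one plausible reading of the "prevents duplicate chunks" claim.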

{knowledge_rag-3.5.2 → knowledge_rag-3.6.0}/README.md RENAMED

@@ -12,19 +12,19 @@
 [![Glama Score](https://glama.ai/mcp/servers/lyonzin/knowledge-rag/badges/score.svg)](https://glama.ai/mcp/servers/lyonzin/knowledge-rag)
 [![PyPI](https://img.shields.io/pypi/v/knowledge-rag)](https://pypi.org/project/knowledge-rag/)
-### LLMs don't know your docs. Every conversation starts from zero.
+### Your docs, your machine, zero cloud. Claude Code searches them natively.
-Your notes, writeups, internal procedures, PDFs — none of it exists to your AI assistant.
-Cloud RAG solutions leak your private data. Local ones require Docker, Ollama, and 15 minutes of setup before a single query.
+Drop your PDFs, markdown, code, notebooks — **1800+ files, 39K chunks, indexed in under 3 minutes.**<br/>
+Hybrid search (BM25 + semantic vectors + cross-encoder reranking) through 12 MCP tools.<br/>
+Everything runs locally via ONNX. No Docker, no Ollama, no API keys, no data leaves your machine.
-**Knowledge RAG fixes this.** One `pip install`, zero external servers.
-Your documents become instantly searchable inside Claude Code — with reranking precision that actually finds what you need.
-`pip install → restart Claude Code → done.`
+```
+pip install knowledge-rag → restart Claude Code → search_knowledge("your query")
+```
 ---
-**12 MCP Tools** | **Hybrid Search + Cross-Encoder Reranking** | **Markdown-Aware Chunking** | **100% Local, Zero Cloud**
+**12 MCP Tools** | **Hybrid Search + Reranking** | **20 File Formats** | **Optional NVIDIA GPU** | **100% Local**
 [What's New](#whats-new-in-v352) | [Supported Formats](#supported-formats) | [Installation](#installation) | [Configuration](#configuration) | [API Reference](#api-reference) | [Architecture](#architecture)
@@ -87,6 +87,14 @@ See [Changelog](#changelog) for full history.
 | Excel | `.xlsx` | openpyxl | Yes | Sheet-by-sheet extraction |
 | PowerPoint | `.pptx` | python-pptx | Yes | Slide-by-slide extraction |
 | Jupyter Notebook | `.ipynb` | Cell-aware parser | Yes | Markdown + code cells only, no outputs/base64 |
+| C Source | `.c` | Code-aware parser | Yes | Functions/structs/includes extracted |
+| C/C++ Header | `.h` | Code-aware parser | Yes | Function declarations/structs extracted |
+| C++ Source | `.cpp` | Code-aware parser | Yes | Classes/structs/includes extracted |
+| JavaScript | `.js` | Code-aware parser | Yes | Functions/classes/imports (ESM + CJS) |
+| React JSX | `.jsx` | Code-aware parser | Yes | Same as JS parser |
+| TypeScript | `.ts` | Code-aware parser | Yes | Functions/classes/interfaces/enums/imports |
+| React TSX | `.tsx` | Code-aware parser | Yes | Same as TS parser |
+| XML | `.xml` | XML parser | Yes | Root element and namespace extraction |
 | MQL4 Header | `.mqh` | Code parser | No | MetaTrader — add to `supported_formats` to enable |
 | MQL4 Source | `.mq4` | Code parser | No | MetaTrader — add to `supported_formats` to enable |
@@ -106,7 +114,7 @@ See [Changelog](#changelog) for full history.
 | **Markdown-Aware Chunking** | `.md` files split by `##`/`###` sections instead of fixed windows |
 | **In-Process Embeddings** | FastEmbed ONNX Runtime (BAAI/bge-small-en-v1.5, 384D) |
 | **Keyword Routing** | Word-boundary aware routing for domain-specific queries |
-| **12 Format Parsers** | MD, TXT, PDF, PY, JSON, CSV, DOCX, XLSX, PPTX, IPYNB + opt-in MQH/MQ4 |
+| **20 Format Parsers** | MD, TXT, PDF, PY, C, H, CPP, JS, JSX, TS, TSX, JSON, XML, CSV, DOCX, XLSX, PPTX, IPYNB + opt-in MQH/MQ4 |
 | **Category Organization** | Organize docs by folder, auto-tagged by path |
 | **Incremental Indexing** | Change detection via mtime/size — only re-indexes modified files |
 | **Chunk Deduplication** | SHA256 content hashing prevents duplicate chunks |
@@ -164,7 +172,7 @@ flowchart TB
     end
     subgraph INGEST["DOCUMENT INGESTION"]
-        PARSERS["12 Parsers<br/>MD | PDF | TXT | PY | JSON | CSV<br/>DOCX | XLSX | PPTX | IPYNB | MQH | MQ4"]
+        PARSERS["20 Parsers<br/>MD | PDF | TXT | PY | C | H | CPP | JS | JSX | TS | TSX | JSON | XML | CSV<br/>DOCX | XLSX | PPTX | IPYNB | MQH | MQ4"]
         CHUNKER["Chunking<br/>MD: section-aware<br/>Other: 1000 chars + 200 overlap"]
         PARSERS --> CHUNKER
     end
@@ -230,11 +238,11 @@ flowchart LR
         FILES["documents/<br/>├── security/<br/>├── development/<br/>├── ctf/<br/>└── general/"]
     end
-    subgraph PARSE["Parse (12 formats)"]
+    subgraph PARSE["Parse (20 formats)"]
         MD["Markdown"]
         PDF["PDF<br/>(PyMuPDF)"]
         OFFICE["DOCX | XLSX<br/>PPTX | CSV"]
-        CODE["PY | JSON<br/>IPYNB"]
+        CODE["PY | C | H | CPP | JS | JSX<br/>TS | TSX | JSON | XML | IPYNB"]
     end
     subgraph CHUNK["Chunk"]
@@ -895,7 +903,7 @@ knowledge-rag/
 ├── mcp_server/
 │   ├── __init__.py          # Stdout protection + version
 │   ├── config.py            # YAML config loader + defaults
-│   ├── ingestion.py         # 12 parsers, chunking, metadata extraction
+│   ├── ingestion.py         # 20 parsers, chunking, metadata extraction
 │   └── server.py            # MCP server, ChromaDB, BM25, reranker, 12 tools
 ├── config.example.yaml      # Documented config template (copy to config.yaml)
 ├── config.yaml              # Your active configuration (git-ignored)
@@ -986,6 +994,15 @@ With ~200 documents, expect ~300-500MB RAM. The embedding model (~50MB) and rera
 ## Changelog
+### v3.6.0 (2026-04-23)
+- **NEW**: Multi-language code parsing — C (`.c`), C++ (`.cpp`/`.h`), JavaScript (`.js`/`.jsx`), TypeScript (`.ts`/`.tsx`) with per-language function/class/import extraction
+- **NEW**: XML parser (`.xml`) — root element and namespace metadata extraction
+- **NEW**: All 8 new formats default enabled — no config change needed
+- **NEW**: NPM wrapper (`npx knowledge-rag`) + Docker image (`ghcr.io/lyonzin/knowledge-rag`)
+- **NEW**: Automated release pipeline — PyPI (Trusted Publishing), NPM, Docker GHCR
+- **IMPROVED**: Code parser reports correct `language` metadata per file type (was hardcoded to `"python"` for all code files)
 ### v3.5.2 (2026-04-16)
 - **NEW**: Auto-discovery of CUDA 12 DLLs from pip-installed NVIDIA packages — no manual PATH configuration needed
@@ -1000,7 +1017,7 @@ With ~200 documents, expect ~300-500MB RAM. The embedding model (~50MB) and rera
 ### v3.5.0 (2026-04-16)
 - **NEW**: Optional GPU acceleration for ONNX embeddings — `pip install knowledge-rag[gpu]` + `models.embedding.gpu: true` in config. 5-10x faster indexing on NVIDIA GPUs with automatic CPU fallback.
-- **DOCS**: Supported formats table added to README (12 formats)
+- **DOCS**: Supported formats table added to README (20 formats)
 ### v3.4.3 (2026-04-16)
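
The README's feature table describes incremental indexing as change detection via mtime/size, so that only modified files are re-indexed. A minimal sketch of that idea follows, assuming the previous index state is a plain dict keyed by path; `file_signature` and `changed_files` are hypothetical names, not the package's API.

```python
import os


def file_signature(path):
    """Return (mtime in nanoseconds, size in bytes) for a file."""
    st = os.stat(path)
    return (st.st_mtime_ns, st.st_size)


def changed_files(paths, index_state):
    """Return the paths whose signature differs from the one stored in `index_state`.

    `index_state` maps str(path) to the last seen signature and is updated in
    place, so a second call with no file changes returns an empty list.
    """
    changed = []
    for path in paths:
        signature = file_signature(path)
        if index_state.get(str(path)) != signature:
            changed.append(path)
            index_state[str(path)] = signature
    return changed
```

Comparing mtime and size avoids reading file contents on every run; the trade-off is that an edit preserving both values would go unnoticed, which a content hash would catch at higher cost.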
