PyPI - chunksilo - Versions diffs - 2.0.0__tar.gz → 2.1.0__tar.gz - Mend

chunksilo 2.0.0__tar.gz → 2.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.

This version of chunksilo might be problematic.

Files changed (31)

{chunksilo-2.0.0/src/chunksilo.egg-info → chunksilo-2.1.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: chunksilo
-Version: 2.0.0
+Version: 2.1.0
 Summary: Local RAG-based semantic document search with MCP server interface
 Author: Fredrik Reveny
 License-Expression: Apache-2.0
@@ -33,32 +33,53 @@ Requires-Dist: fastembed<1,>=0.5.0
 Requires-Dist: pyyaml<7,>=6.0
 Provides-Extra: confluence
 Requires-Dist: llama-index-readers-confluence<1,>=0.6.0; extra == "confluence"
+Provides-Extra: jira
+Requires-Dist: jira<4,>=3.5.0; extra == "jira"
 Provides-Extra: test
 Requires-Dist: pytest<9,>=7.4.0; extra == "test"
 Requires-Dist: requests<3,>=2.31.0; extra == "test"
 Dynamic: license-file
+<p align="center">
+  <img src="chunksilo.png" alt="ChunkSilo Logo" width="500">
+</p>
 # ChunkSilo MCP Server
 ChunkSilo is like a local Google for your documents. It uses semantic search — matching by meaning rather than exact keywords — so your LLM can find relevant information across all your files even when the wording differs from your query. Point it at your PDFs, Word docs, Markdown, and text files, and it builds a fully searchable index locally on your machine.
-## Overview
-- **No permissions headache**: Each user indexes only the files they already have access to. No centralized access-control system to build or maintain — document permissions stay exactly where they are.
-- **No infrastructure required**: Runs entirely on the user's own machine as an MCP server. Nothing to deploy, no servers to manage.
-- **Easy to set up**: Any user with an MCP-compatible LLM client can install, point at their document directories, and have everything indexed and searchable.
-- **Works with what you have**: Supports PDF, DOCX, DOC, Markdown, and TXT from local folders, network drives, or shared mounts.
+- Runs entirely on your machine — no servers, no infrastructure
+- Semantic search + keyword filename matching across PDF, DOCX, DOC, Markdown, and TXT
+- Incremental indexing — only reprocesses new or changed files
+- Heading-aware results with source links back to the original file
+- Date filtering and recency boosting
+- Optional Confluence integration
-## Features
+### Example `search_docs` output
-- **Local indexing and search**: All indexing and search runs on your machine with bundled models — ChunkSilo itself makes no external network calls when `offline: true`. Note: search results are passed to your MCP client's LLM, which may be cloud-hosted.
-- **Incremental indexing**: Only reindexes new or changed files, so re-runs are fast even on large document collections.
-- **Heading-aware navigation**: Extracts headings from PDFs, Word docs, and Markdown so results include the full heading path (e.g. "Chapter 3 > Setup > Prerequisites").
-- **Date filtering and recency boost**: Search within a date range or let recent documents rank higher automatically.
-- **Dual retrieval**: Returns both meaning-based chunk matches and keyword-based filename matches separately, so file lookups don't get buried by unrelated content.
-- **Multi-directory with per-folder rules**: Index multiple directories with individual include/exclude glob patterns — useful for shared drives with mixed content.
-- **Confluence integration**: Optionally searches your Confluence instance alongside local files, with results returned in the same format.
-- **Source links**: Each result includes a clickable link back to the source file or Confluence page in supported MCP clients.
+```json
+{
+  "matched_files": [
+    { "uri": "file:///docs/database-configuration.docx", "score": 0.8432 }
+  ],
+  "num_matched_files": 1,
+  "chunks": [
+    {
+      "text": "To configure the database connection, set the DATABASE_URL environment variable...",
+      "score": 0.912,
+      "location": {
+        "uri": "file:///docs/setup-guide.pdf",
+        "page": 12,
+        "line": null,
+        "heading_path": ["Getting Started", "Configuration", "Database"]
+      }
+    }
+  ],
+  "num_chunks": 1,
+  "query": "how to configure the database",
+  "retrieval_time": "0.42s"
+}
+```
 ## Installation
@@ -71,6 +92,12 @@ pip install chunksilo
 # Or with Confluence support:
 pip install chunksilo[confluence]
+# Or with Jira support:
+pip install chunksilo[jira]
+# Or with both Confluence and Jira:
+pip install chunksilo[confluence,jira]
 ```
 Then:
@@ -184,6 +211,27 @@ All settings are optional and have sensible defaults.
 | `confluence.timeout` | `10.0` | Request timeout in seconds |
 | `confluence.max_results` | `30` | Maximum results per search |
+#### Jira Settings (optional)
+> **Note:** Jira integration requires the optional dependency. Install with: `pip install chunksilo[jira]`
+| Setting | Default | Description |
+| :--- | :--- | :--- |
+| `jira.url` | `""` | Jira base URL (empty = disabled) |
+| `jira.username` | `""` | Jira username/email |
+| `jira.api_token` | `""` | Jira API token |
+| `jira.timeout` | `10.0` | Request timeout in seconds |
+| `jira.max_results` | `30` | Maximum results per search |
+| `jira.projects` | `[]` | Project keys to search (empty = all) |
+| `jira.include_comments` | `true` | Include issue comments in search |
+| `jira.include_custom_fields` | `true` | Include custom fields in search |
+**Creating a Jira API Token:**
+1. Log into Jira
+2. Go to Account Settings > Security > API Tokens
+3. Click "Create API Token"
+4. Copy the token and add it to your config
 #### SSL Settings (optional)
 | Setting | Default | Description |
@@ -357,6 +405,7 @@ Add to `mcp_settings.json` (typically in `~/.config/Code/User/globalStorage/roov
 - **Retrieval errors**: Check paths in your MCP client configuration.
 - **Offline mode**: PyPI installs default to `offline: false` (models auto-download). The offline bundle includes pre-downloaded models and sets `offline: true`. Set `retrieval.offline: true` in `config.yaml` to prevent network calls after initial model download.
 - **Confluence Integration**: Install with `pip install chunksilo[confluence]`, then set `confluence.url`, `confluence.username`, and `confluence.api_token` in `config.yaml`.
+- **Jira Integration**: Install with `pip install chunksilo[jira]`, then set `jira.url`, `jira.username`, and `jira.api_token` in `config.yaml`. Optionally configure `jira.projects` to restrict search to specific project keys.
 - **Custom CA Bundle**: Set `ssl.ca_bundle_path` in `config.yaml` for custom certificates.
 - **Network mounts**: Unavailable directories are skipped with a warning; indexing continues with available directories.
 - **Legacy .doc files**: Requires LibreOffice to be installed for automatic conversion to .docx. If LibreOffice is not found, .doc files are skipped with a warning. Full heading extraction is supported.
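The example `search_docs` output above reports filename matches (`matched_files`) separately from content matches (`chunks`), and each chunk carries a `heading_path`. A minimal sketch of a client-side script that renders such a payload, assuming it arrives as the JSON object shown with exactly those field names. `format_results` is a hypothetical helper, not part of ChunkSilo, and the `" > "` separator simply mirrors the heading-path style ("Chapter 3 > Setup > Prerequisites") from the 2.0.0 feature list.

```python
import json


# Hypothetical helper: renders a search_docs-style payload as plain text lines.
# Field names follow the example output; nothing here calls ChunkSilo itself.
def format_results(payload: dict) -> list[str]:
    lines: list[str] = []
    # Keyword-based filename matches are listed first, apart from content chunks.
    for match in payload.get("matched_files", []):
        lines.append(f"[file {match['score']:.2f}] {match['uri']}")
    # Semantic chunk matches carry a location with an optional heading path.
    for chunk in payload.get("chunks", []):
        loc = chunk.get("location", {})
        headings = " > ".join(loc.get("heading_path") or [])
        where = loc["uri"]
        if loc.get("page") is not None:
            where += f" (page {loc['page']})"
        lines.append(f"[chunk {chunk['score']:.2f}] {where} :: {headings}")
    return lines


example = json.loads("""
{
  "matched_files": [{"uri": "file:///docs/database-configuration.docx", "score": 0.8432}],
  "num_matched_files": 1,
  "chunks": [{
    "text": "To configure the database connection, set the DATABASE_URL environment variable...",
    "score": 0.912,
    "location": {"uri": "file:///docs/setup-guide.pdf", "page": 12, "line": null,
                 "heading_path": ["Getting Started", "Configuration", "Database"]}
  }],
  "num_chunks": 1,
  "query": "how to configure the database",
  "retrieval_time": "0.42s"
}
""")

for line in format_results(example):
    print(line)
```

Keeping the two result kinds in separate loops preserves the "dual retrieval" split, so a filename hit is not buried among content chunks.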

{chunksilo-2.0.0 → chunksilo-2.1.0}/README.md RENAMED

@@ -1,24 +1,43 @@
+<p align="center">
+  <img src="chunksilo.png" alt="ChunkSilo Logo" width="500">
+</p>
 # ChunkSilo MCP Server
 ChunkSilo is like a local Google for your documents. It uses semantic search — matching by meaning rather than exact keywords — so your LLM can find relevant information across all your files even when the wording differs from your query. Point it at your PDFs, Word docs, Markdown, and text files, and it builds a fully searchable index locally on your machine.
-## Overview
-- **No permissions headache**: Each user indexes only the files they already have access to. No centralized access-control system to build or maintain — document permissions stay exactly where they are.
-- **No infrastructure required**: Runs entirely on the user's own machine as an MCP server. Nothing to deploy, no servers to manage.
-- **Easy to set up**: Any user with an MCP-compatible LLM client can install, point at their document directories, and have everything indexed and searchable.
-- **Works with what you have**: Supports PDF, DOCX, DOC, Markdown, and TXT from local folders, network drives, or shared mounts.
+- Runs entirely on your machine — no servers, no infrastructure
+- Semantic search + keyword filename matching across PDF, DOCX, DOC, Markdown, and TXT
+- Incremental indexing — only reprocesses new or changed files
+- Heading-aware results with source links back to the original file
+- Date filtering and recency boosting
+- Optional Confluence integration
-## Features
+### Example `search_docs` output
-- **Local indexing and search**: All indexing and search runs on your machine with bundled models — ChunkSilo itself makes no external network calls when `offline: true`. Note: search results are passed to your MCP client's LLM, which may be cloud-hosted.
-- **Incremental indexing**: Only reindexes new or changed files, so re-runs are fast even on large document collections.
-- **Heading-aware navigation**: Extracts headings from PDFs, Word docs, and Markdown so results include the full heading path (e.g. "Chapter 3 > Setup > Prerequisites").
-- **Date filtering and recency boost**: Search within a date range or let recent documents rank higher automatically.
-- **Dual retrieval**: Returns both meaning-based chunk matches and keyword-based filename matches separately, so file lookups don't get buried by unrelated content.
-- **Multi-directory with per-folder rules**: Index multiple directories with individual include/exclude glob patterns — useful for shared drives with mixed content.
-- **Confluence integration**: Optionally searches your Confluence instance alongside local files, with results returned in the same format.
-- **Source links**: Each result includes a clickable link back to the source file or Confluence page in supported MCP clients.
+```json
+{
+  "matched_files": [
+    { "uri": "file:///docs/database-configuration.docx", "score": 0.8432 }
+  ],
+  "num_matched_files": 1,
+  "chunks": [
+    {
+      "text": "To configure the database connection, set the DATABASE_URL environment variable...",
+      "score": 0.912,
+      "location": {
+        "uri": "file:///docs/setup-guide.pdf",
+        "page": 12,
+        "line": null,
+        "heading_path": ["Getting Started", "Configuration", "Database"]
+      }
+    }
+  ],
+  "num_chunks": 1,
+  "query": "how to configure the database",
+  "retrieval_time": "0.42s"
+}
+```
 ## Installation
@@ -31,6 +50,12 @@ pip install chunksilo
 # Or with Confluence support:
 pip install chunksilo[confluence]
+# Or with Jira support:
+pip install chunksilo[jira]
+# Or with both Confluence and Jira:
+pip install chunksilo[confluence,jira]
 ```
 Then:
@@ -144,6 +169,27 @@ All settings are optional and have sensible defaults.
 | `confluence.timeout` | `10.0` | Request timeout in seconds |
 | `confluence.max_results` | `30` | Maximum results per search |
+#### Jira Settings (optional)
+> **Note:** Jira integration requires the optional dependency. Install with: `pip install chunksilo[jira]`
+| Setting | Default | Description |
+| :--- | :--- | :--- |
+| `jira.url` | `""` | Jira base URL (empty = disabled) |
+| `jira.username` | `""` | Jira username/email |
+| `jira.api_token` | `""` | Jira API token |
+| `jira.timeout` | `10.0` | Request timeout in seconds |
+| `jira.max_results` | `30` | Maximum results per search |
+| `jira.projects` | `[]` | Project keys to search (empty = all) |
+| `jira.include_comments` | `true` | Include issue comments in search |
+| `jira.include_custom_fields` | `true` | Include custom fields in search |
+**Creating a Jira API Token:**
+1. Log into Jira
+2. Go to Account Settings > Security > API Tokens
+3. Click "Create API Token"
+4. Copy the token and add it to your config
 #### SSL Settings (optional)
 | Setting | Default | Description |
@@ -317,6 +363,7 @@ Add to `mcp_settings.json` (typically in `~/.config/Code/User/globalStorage/roov
 - **Retrieval errors**: Check paths in your MCP client configuration.
 - **Offline mode**: PyPI installs default to `offline: false` (models auto-download). The offline bundle includes pre-downloaded models and sets `offline: true`. Set `retrieval.offline: true` in `config.yaml` to prevent network calls after initial model download.
 - **Confluence Integration**: Install with `pip install chunksilo[confluence]`, then set `confluence.url`, `confluence.username`, and `confluence.api_token` in `config.yaml`.
+- **Jira Integration**: Install with `pip install chunksilo[jira]`, then set `jira.url`, `jira.username`, and `jira.api_token` in `config.yaml`. Optionally configure `jira.projects` to restrict search to specific project keys.
 - **Custom CA Bundle**: Set `ssl.ca_bundle_path` in `config.yaml` for custom certificates.
 - **Network mounts**: Unavailable directories are skipped with a warning; indexing continues with available directories.
 - **Legacy .doc files**: Requires LibreOffice to be installed for automatic conversion to .docx. If LibreOffice is not found, .doc files are skipped with a warning. Full heading extraction is supported.
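The Jira troubleshooting entry says `jira.projects` restricts search to specific project keys, and the settings table says an empty list means all projects. The diff does not show how ChunkSilo actually queries Jira, so the following is only a sketch, assuming a JQL full-text query; `build_jql` and `_quote` are hypothetical names. In JQL, `text ~` performs a full-text match and `project in (...)` limits the set of projects.

```python
# Hypothetical sketch: how a search query plus jira.projects could map onto JQL.
# This is an assumption for illustration, not ChunkSilo's actual query builder.
def _quote(value: str) -> str:
    # Escape backslashes and double quotes for a JQL string literal.
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_jql(query: str, projects: list[str]) -> str:
    clauses = [f"text ~ {_quote(query)}"]
    # An empty projects list means "all accessible projects": add no filter.
    if projects:
        keys = ", ".join(_quote(key) for key in projects)
        clauses.append(f"project in ({keys})")
    return " AND ".join(clauses)


print(build_jql("database configuration", []))
print(build_jql("database configuration", ["OPS", "PLAT"]))
```

With the `jira` extra installed, a string like this could be passed to the `jira` library's `JIRA.search_issues()`, with `maxResults` taken from `jira.max_results`.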

{chunksilo-2.0.0 → chunksilo-2.1.0}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 [project]
 name = "chunksilo"
-version = "2.0.0"
+version = "2.1.0"
 description = "Local RAG-based semantic document search with MCP server interface"
 license = "Apache-2.0"
 requires-python = ">=3.11"
@@ -26,6 +26,7 @@ dynamic = ["dependencies"]
 [project.optional-dependencies]
 confluence = ["llama-index-readers-confluence>=0.6.0,<1"]
+jira = ["jira>=3.5.0,<4"]
 test = ["pytest>=7.4.0,<9", "requests>=2.31.0,<3"]
 [project.scripts]
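The new `jira` extra pins `jira>=3.5.0,<4`, and the README notes that Jira integration requires this optional dependency. A minimal sketch of checking whether an extra's module is importable before enabling an integration. The `EXTRAS` mapping and both helpers are hypothetical, and the module names (`jira`, `llama_index.readers.confluence`) are assumptions about what each distribution installs.

```python
import importlib.util

# Assumed mapping from extra name to a module that extra provides (hypothetical).
EXTRAS = {
    "jira": "jira",
    "confluence": "llama_index.readers.confluence",
}


def has_module(name: str) -> bool:
    """Return True if `name` can be located without executing the module itself."""
    try:
        # For dotted names, find_spec imports the parent packages first.
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package of a dotted name is missing.
        return False


def missing_extras(wanted: list[str]) -> list[str]:
    return [extra for extra in wanted if not has_module(EXTRAS[extra])]


for extra in missing_extras(["jira", "confluence"]):
    print(f'Install with: pip install "chunksilo[{extra}]"')
```

The quotes around the requirement matter in zsh, where unquoted square brackets are treated as a glob pattern.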

{chunksilo-2.0.0 → chunksilo-2.1.0}/src/chunksilo/__init__.py RENAMED

@@ -1,4 +1,4 @@
 # SPDX-License-Identifier: Apache-2.0
 """ChunkSilo - Local RAG-based semantic document search."""
-__version__ = "2.0.0"
+__version__ = "2.1.0"

{chunksilo-2.0.0 → chunksilo-2.1.0}/src/chunksilo/cfgload.py RENAMED

@@ -70,6 +70,16 @@ _DEFAULTS: dict[str, Any] = {
         "timeout": 10.0,
         "max_results": 30,
     },
+    "jira": {
+        "url": "",
+        "username": "",
+        "api_token": "",
+        "timeout": 10.0,
+        "max_results": 30,
+        "projects": [],  # Empty list = all accessible projects
+        "include_comments": False,
+        "include_custom_fields": False,
+    },
     "ssl": {
         "ca_bundle_path": "",
     },
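The `jira` block added to `_DEFAULTS` sets `include_comments` and `include_custom_fields` to `False`, while the README's Jira settings table lists `true` for both. If `cfgload.py` merges user settings over `_DEFAULTS` (this hunk does not show how), the code's `False` is what takes effect when `config.yaml` omits those keys. A minimal sketch of such a recursive merge under that assumption; `deep_merge`, `jira_enabled`, and `JIRA_DEFAULTS` are hypothetical names, and the URL is a placeholder.

```python
import copy
from typing import Any

# Copy of the "jira" defaults from the cfgload.py hunk above.
JIRA_DEFAULTS: dict[str, Any] = {
    "url": "",
    "username": "",
    "api_token": "",
    "timeout": 10.0,
    "max_results": 30,
    "projects": [],  # Empty list = all accessible projects
    "include_comments": False,
    "include_custom_fields": False,
}


def deep_merge(defaults: dict[str, Any], overrides: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict with overrides applied recursively on top of defaults."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def jira_enabled(cfg: dict[str, Any]) -> bool:
    # Per the README table, an empty jira.url disables the integration.
    return bool(cfg["jira"]["url"])


user_cfg = {"jira": {"url": "https://jira.example.com", "projects": ["OPS"]}}
cfg = deep_merge({"jira": JIRA_DEFAULTS}, user_cfg)
print(jira_enabled(cfg), cfg["jira"]["projects"], cfg["jira"]["include_comments"])
```

Deep-copying the defaults keeps the shared `projects` list from being mutated by one user's configuration.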
