mcp-kb 0.1.0__tar.gz

mcp_kb-0.1.0/PKG-INFO ADDED
@@ -0,0 +1,176 @@
+ Metadata-Version: 2.4
+ Name: mcp-kb
+ Version: 0.1.0
+ Summary: MCP server exposing a local markdown knowledge base
+ Author: LLM Maintainer
+ Requires-Python: >=3.11
+ Description-Content-Type: text/markdown
+ Requires-Dist: chromadb>=1.1.0
+ Requires-Dist: httpx>=0.28.1
+ Requires-Dist: mcp[cli]>=1.15.0
+ Provides-Extra: vector
+ Requires-Dist: tiktoken>=0.11.0; extra == "vector"
+ Requires-Dist: langchain-text-splitters>=0.3.11; extra == "vector"
+
+ # MCP Knowledge Base Server
+
+ This project implements a Model Context Protocol (MCP) server that manages a local markdown knowledge base. It provides tools for creating, reading, updating, searching, and organizing documents stored beneath a configurable root directory. The server uses `FastMCP` for transport and schema generation, so the Python tool functions double as the canonical source of truth for MCP metadata.
+
+ ## Running the server
+
+ ```bash
+ uv run mcp-kb-server --root /path/to/knowledgebase
+ ```
+
+ To expose multiple transports, repeat the `--transport` flag (supported values:
+ `stdio`, `sse`, `http`; the default is `stdio` only):
+
+ ```bash
+ uv run mcp-kb-server --transport stdio --transport http
+ ```
+
+ Use `--host` and `--port` to bind HTTP/SSE transports to a specific interface and port:
+
+ ```bash
+ uv run mcp-kb-server --transport http --host 0.0.0.0 --port 9000
+ ```
+
+ On first launch the server copies a bundled `KNOWLEDBASE_DOC.md` into the
+ `.docs/` directory if it is missing so that every deployment starts with a
+ baseline usage guide.
+
+ ## Optional ChromaDB Mirroring
+
+ The CLI can mirror knowledge base changes into a Chroma collection without
+ exposing raw Chroma operations as MCP tools. Mirroring is enabled by default via
+ the `persistent` client, which stores data under `<root>/chroma`. Choose a
+ different backend with `--chroma-client` (choices: `ephemeral`, `persistent`,
+ `http`, `cloud`). Example:
+
+ ```bash
+ uv run mcp-kb-server \
+   --root /path/to/knowledgebase \
+   --chroma-client ephemeral \
+   --chroma-collection local-kb \
+   --chroma-embedding default
+ ```
+
+ Persistent and remote clients accept additional flags:
+
+ - `--chroma-data-dir`: storage directory for the persistent client (defaults to
+   `<root>/chroma` when not supplied).
+ - `--chroma-host`, `--chroma-port`, `--chroma-ssl`, `--chroma-custom-auth`:
+   options for a self-hosted HTTP server.
+ - `--chroma-tenant`, `--chroma-database`, `--chroma-api-key`: credentials for
+   Chroma Cloud deployments.
+ - `--chroma-id-prefix`: custom document ID prefix (`kb::` by default).
+
+ Every flag has a matching environment variable (`MCP_KB_CHROMA_*`), so the
+ following snippet enables an HTTP client without modifying CLI commands:
+
+ ```bash
+ export MCP_KB_CHROMA_CLIENT=http
+ export MCP_KB_CHROMA_HOST=chroma.internal
+ export MCP_KB_CHROMA_PORT=8001
+ export MCP_KB_CHROMA_CUSTOM_AUTH="username:password"
+ uv run mcp-kb-server --transport http
+ ```
+
+ When enabled, any file creation, update, or soft deletion is synchronously
+ propagated to the configured Chroma collection, ensuring semantic search stays
+ in lockstep with the markdown knowledge base.
+
+ The `kb.search` MCP tool automatically queries Chroma when mirroring is active,
+ falling back to direct filesystem scans if the semantic index returns no hits.
+
+ ## Reindexing
+
+ Use the standalone CLI to rebuild external indexes (e.g., Chroma) from the
+ current knowledge base. This command is not exposed as an MCP tool.
+
+ ```bash
+ uv run mcp-kb-reindex --root /path/to/knowledgebase \
+   --chroma-client persistent \
+   --chroma-data-dir /path/to/chroma \
+   --chroma-collection knowledge-base \
+   --chroma-embedding default
+ ```
+
+ - Honors the same `--chroma-*` flags and `MCP_KB_CHROMA_*` environment
+   variables as the server.
+ - Processes all non-deleted `*.md` files under the root and prints a summary:
+   `Reindexed N documents`.
+
+ ## Testing
+
+ ```bash
+ uv run pytest
+ ```
+
+ ## LLM Client Configuration
+
+ Below are sample configurations for popular MCP-capable LLM clients. All
+ examples assume this repository is cloned locally and that `uv` is installed.
+
+ ### Claude Desktop
+
+ Add the following block to your Claude Desktop `claude_desktop_config.json`:
+
+ ```json
+ {
+   "mcpServers": {
+     "local-kb": {
+       "server-name": "kb-server",
+       "command": "uv",
+       "args": [
+         "run",
+         "mcp-kb-server",
+         "--root",
+         "/absolute/path/to/.knowledgebase"
+       ]
+     }
+   }
+ }
+ ```
+
+ ### Cursor AI
+
+ In Cursor's `cursor-settings.json`, register the server as a custom tool:
+
+ ```json
+ {
+   "mcpServers": {
+     "local_knowledge_base": {
+       "id": "local-kb",
+       "title": "Local Knowledge Base",
+       "command": "uvx",
+       "args": [
+         "--from",
+         "~/cursor_projects/local_knowledge_base",
+         "mcp-kb-server"
+       ]
+     }
+   }
+ }
+ ```
+
+ ### VS Code (Claude MCP Extension)
+
+ For the `Claude MCP` extension, add an entry to `settings.json`:
+
+ ```json
+ {
+   "claudeMcp.servers": {
+     "local-kb": {
+       "command": "uv",
+       "args": ["run", "mcp-kb-server"],
+       "env": {
+         "MCP_KB_ROOT": "/absolute/path/to/.knowledgebase"
+       }
+     }
+   }
+ }
+ ```
+
+ Adjust the `--root` flag or `MCP_KB_ROOT` environment variable to point at the
+ desired knowledge base directory for each client.
mcp_kb-0.1.0/README.md ADDED
@@ -0,0 +1,162 @@
+ # MCP Knowledge Base Server
+
+ This project implements a Model Context Protocol (MCP) server that manages a local markdown knowledge base. It provides tools for creating, reading, updating, searching, and organizing documents stored beneath a configurable root directory. The server uses `FastMCP` for transport and schema generation, so the Python tool functions double as the canonical source of truth for MCP metadata.
+
+ ## Running the server
+
+ ```bash
+ uv run mcp-kb-server --root /path/to/knowledgebase
+ ```
+
+ To expose multiple transports, repeat the `--transport` flag (supported values:
+ `stdio`, `sse`, `http`; the default is `stdio` only):
+
+ ```bash
+ uv run mcp-kb-server --transport stdio --transport http
+ ```
+
+ Use `--host` and `--port` to bind HTTP/SSE transports to a specific interface and port:
+
+ ```bash
+ uv run mcp-kb-server --transport http --host 0.0.0.0 --port 9000
+ ```
+
+ On first launch the server copies a bundled `KNOWLEDBASE_DOC.md` into the
+ `.docs/` directory if it is missing so that every deployment starts with a
+ baseline usage guide.
+
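The repeatable flag can be pictured with a small `argparse` sketch. It follows the option shape used by the server CLI elsewhere in this diff (`action="append"`, falling back to `stdio`), but `parse_transports` is a hypothetical helper written for illustration, not something the package exports.

```python
import argparse
from typing import List, Optional, Sequence


def parse_transports(argv: Optional[Sequence[str]] = None) -> List[str]:
    """Collect every --transport occurrence; fall back to stdio when none is given."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--transport",
        dest="transports",
        action="append",  # each occurrence is appended to a list
        choices=["stdio", "sse", "http"],
    )
    options = parser.parse_args(argv)
    # With action="append" and no default, the attribute is None when the flag is absent.
    return options.transports or ["stdio"]
```

Calling `parse_transports(["--transport", "stdio", "--transport", "http"])` yields both names in the order given, while an empty argument list yields `["stdio"]`.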
+ ## Optional ChromaDB Mirroring
+
+ The CLI can mirror knowledge base changes into a Chroma collection without
+ exposing raw Chroma operations as MCP tools. Mirroring is enabled by default via
+ the `persistent` client, which stores data under `<root>/chroma`. Choose a
+ different backend with `--chroma-client` (choices: `ephemeral`, `persistent`,
+ `http`, `cloud`). Example:
+
+ ```bash
+ uv run mcp-kb-server \
+   --root /path/to/knowledgebase \
+   --chroma-client ephemeral \
+   --chroma-collection local-kb \
+   --chroma-embedding default
+ ```
+
+ Persistent and remote clients accept additional flags:
+
+ - `--chroma-data-dir`: storage directory for the persistent client (defaults to
+   `<root>/chroma` when not supplied).
+ - `--chroma-host`, `--chroma-port`, `--chroma-ssl`, `--chroma-custom-auth`:
+   options for a self-hosted HTTP server.
+ - `--chroma-tenant`, `--chroma-database`, `--chroma-api-key`: credentials for
+   Chroma Cloud deployments.
+ - `--chroma-id-prefix`: custom document ID prefix (`kb::` by default).
+
+ Every flag has a matching environment variable (`MCP_KB_CHROMA_*`), so the
+ following snippet enables an HTTP client without modifying CLI commands:
+
+ ```bash
+ export MCP_KB_CHROMA_CLIENT=http
+ export MCP_KB_CHROMA_HOST=chroma.internal
+ export MCP_KB_CHROMA_PORT=8001
+ export MCP_KB_CHROMA_CUSTOM_AUTH="username:password"
+ uv run mcp-kb-server --transport http
+ ```
+
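The flag/environment pairing works because each environment variable is read once and used as the `argparse` default, so an explicit flag still wins. A minimal sketch of that precedence, covering only three of the options; `build_parser` and its `env` parameter are hypothetical and exist so the behavior can be exercised without touching the real process environment:

```python
import argparse
from typing import Mapping


def build_parser(env: Mapping[str, str]) -> argparse.ArgumentParser:
    """Use MCP_KB_CHROMA_* values from ``env`` as defaults for the matching flags."""
    port_text = env.get("MCP_KB_CHROMA_PORT")
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--chroma-client",
        dest="chroma_client",
        default=env.get("MCP_KB_CHROMA_CLIENT", "persistent").lower(),
    )
    parser.add_argument("--chroma-host", dest="chroma_host", default=env.get("MCP_KB_CHROMA_HOST"))
    parser.add_argument(
        "--chroma-port",
        dest="chroma_port",
        type=int,
        default=int(port_text) if port_text else None,
    )
    return parser


env = {
    "MCP_KB_CHROMA_CLIENT": "http",
    "MCP_KB_CHROMA_HOST": "chroma.internal",
    "MCP_KB_CHROMA_PORT": "8001",
}
from_env = build_parser(env).parse_args([])
overridden = build_parser(env).parse_args(["--chroma-port", "9000"])
```

Here `from_env.chroma_port` comes from the environment mapping, while `overridden.chroma_port` reflects the command-line value.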
+ When enabled, any file creation, update, or soft deletion is synchronously
+ propagated to the configured Chroma collection, ensuring semantic search stays
+ in lockstep with the markdown knowledge base.
+
+ The `kb.search` MCP tool automatically queries Chroma when mirroring is active,
+ falling back to direct filesystem scans if the semantic index returns no hits.
+
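The implementation of `kb.search` is not part of this excerpt, so the following only illustrates the fallback order described above. `search_with_fallback` and the `semantic_search` callable are hypothetical names, and the case-insensitive substring scan is an assumed stand-in for whatever the filesystem search actually does.

```python
from pathlib import Path
from typing import Callable, List


def search_with_fallback(
    query: str,
    root: Path,
    semantic_search: Callable[[str], List[str]],
) -> List[str]:
    """Return semantic hits when there are any, otherwise scan markdown files on disk."""
    hits = semantic_search(query)
    if hits:
        return hits
    # Fallback: plain substring match over every *.md file under the root.
    needle = query.lower()
    matches: List[str] = []
    for path in sorted(root.rglob("*.md")):
        text = path.read_text(encoding="utf-8")
        if needle in text.lower():
            matches.append(path.relative_to(root).as_posix())
    return matches
```

One consequence of this ordering is that the filesystem scan only runs when the index returns nothing, so a stale index that still returns some hits would mask newer files until a reindex.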
+ ## Reindexing
+
+ Use the standalone CLI to rebuild external indexes (e.g., Chroma) from the
+ current knowledge base. This command is not exposed as an MCP tool.
+
+ ```bash
+ uv run mcp-kb-reindex --root /path/to/knowledgebase \
+   --chroma-client persistent \
+   --chroma-data-dir /path/to/chroma \
+   --chroma-collection knowledge-base \
+   --chroma-embedding default
+ ```
+
+ - Honors the same `--chroma-*` flags and `MCP_KB_CHROMA_*` environment
+   variables as the server.
+ - Processes all non-deleted `*.md` files under the root and prints a summary:
+   `Reindexed N documents`.
+
+ ## Testing
+
+ ```bash
+ uv run pytest
+ ```
+
+ ## LLM Client Configuration
+
+ Below are sample configurations for popular MCP-capable LLM clients. All
+ examples assume this repository is cloned locally and that `uv` is installed.
+
+ ### Claude Desktop
+
+ Add the following block to your Claude Desktop `claude_desktop_config.json`:
+
+ ```json
+ {
+   "mcpServers": {
+     "local-kb": {
+       "server-name": "kb-server",
+       "command": "uv",
+       "args": [
+         "run",
+         "mcp-kb-server",
+         "--root",
+         "/absolute/path/to/.knowledgebase"
+       ]
+     }
+   }
+ }
+ ```
+
+ ### Cursor AI
+
+ In Cursor's `cursor-settings.json`, register the server as a custom tool:
+
+ ```json
+ {
+   "mcpServers": {
+     "local_knowledge_base": {
+       "id": "local-kb",
+       "title": "Local Knowledge Base",
+       "command": "uvx",
+       "args": [
+         "--from",
+         "~/cursor_projects/local_knowledge_base",
+         "mcp-kb-server"
+       ]
+     }
+   }
+ }
+ ```
+
+ ### VS Code (Claude MCP Extension)
+
+ For the `Claude MCP` extension, add an entry to `settings.json`:
+
+ ```json
+ {
+   "claudeMcp.servers": {
+     "local-kb": {
+       "command": "uv",
+       "args": ["run", "mcp-kb-server"],
+       "env": {
+         "MCP_KB_ROOT": "/absolute/path/to/.knowledgebase"
+       }
+     }
+   }
+ }
+ ```
+
+ Adjust the `--root` flag or `MCP_KB_ROOT` environment variable to point at the
+ desired knowledge base directory for each client.
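
When the same server is registered in several clients, generating the entry from one place can help keep the arguments consistent. A minimal sketch, assuming only the `command` and `args` keys that the samples above have in common; `server_entry` is a hypothetical helper, and client-specific keys are deliberately left out:

```python
import json
from typing import Dict, List, Optional


def server_entry(root: Optional[str] = None) -> Dict[str, object]:
    """Build the command/args pair used to launch mcp-kb-server through uv."""
    args: List[str] = ["run", "mcp-kb-server"]
    if root is not None:
        # Each argument is its own list element with no surrounding whitespace.
        args += ["--root", root]
    return {"command": "uv", "args": args}


config = {"mcpServers": {"local-kb": server_entry("/absolute/path/to/.knowledgebase")}}
print(json.dumps(config, indent=2))
```

The printed JSON can be merged into the client's configuration file by hand; the sketch does not write to any file.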
@@ -0,0 +1 @@
+ """Top-level package for the MCP knowledge base server implementation."""
@@ -0,0 +1 @@
+ """CLI subpackage exposing entry points for running the server."""
@@ -0,0 +1,153 @@
+ """Shared CLI argument wiring for knowledge base utilities.
+
+ This module centralizes the definition of common command-line options and
+ helpers so that multiple entry points (e.g., server and reindex commands) can
+ remain small and focused while sharing consistent behavior.
+ """
+ from __future__ import annotations
+
+ import os
+ from argparse import ArgumentParser, Namespace
+ from pathlib import Path
+ from typing import Optional
+
+ from mcp_kb.ingest.chroma import SUPPORTED_CLIENTS, ChromaConfiguration, ChromaIngestor
+
+
+ def parse_bool(value: str | bool | None) -> bool:
+     """Return ``True`` when ``value`` represents an affirmative boolean string.
+
+     The function accepts the case-insensitive strings "1", "true", "t",
+     "yes", and "y". ``None`` and any other string yield ``False``.
+     """
+
+     if isinstance(value, bool):
+         return value
+     if value is None:
+         return False
+     return value.lower() in {"1", "true", "t", "yes", "y"}
+
+
+ def add_chroma_arguments(parser: ArgumentParser) -> None:
+     """Register Chroma ingestion arguments on ``parser``.
+
+     Environment variables are used as defaults where available so that
+     deployments can configure ingestion without repeating flags.
+     """
+
+     default_chroma_client = os.getenv("MCP_KB_CHROMA_CLIENT", "persistent").lower()
+     default_collection = os.getenv("MCP_KB_CHROMA_COLLECTION", "knowledge-base")
+     default_embedding = os.getenv("MCP_KB_CHROMA_EMBEDDING", "default")
+     default_data_dir = os.getenv("MCP_KB_CHROMA_DATA_DIR")
+     default_host = os.getenv("MCP_KB_CHROMA_HOST")
+     default_port_env = os.getenv("MCP_KB_CHROMA_PORT")
+     default_port = int(default_port_env) if default_port_env else None
+     default_ssl = parse_bool(os.getenv("MCP_KB_CHROMA_SSL", "true"))
+     default_tenant = os.getenv("MCP_KB_CHROMA_TENANT")
+     default_database = os.getenv("MCP_KB_CHROMA_DATABASE")
+     default_api_key = os.getenv("MCP_KB_CHROMA_API_KEY")
+     default_custom_auth = os.getenv("MCP_KB_CHROMA_CUSTOM_AUTH")
+     default_id_prefix = os.getenv("MCP_KB_CHROMA_ID_PREFIX")
+
+     parser.add_argument(
+         "--chroma-client",
+         dest="chroma_client",
+         choices=SUPPORTED_CLIENTS,
+         default=default_chroma_client,
+         help="Client implementation for mirroring data to ChromaDB (default: persistent).",
+     )
+     parser.add_argument(
+         "--chroma-collection",
+         dest="chroma_collection",
+         default=default_collection,
+         help="Chroma collection name used to store documents.",
+     )
+     parser.add_argument(
+         "--chroma-embedding",
+         dest="chroma_embedding",
+         default=default_embedding,
+         help="Embedding function name registered with chromadb.utils.embedding_functions.",
+     )
+     parser.add_argument(
+         "--chroma-data-dir",
+         dest="chroma_data_dir",
+         default=default_data_dir,
+         help="Storage directory for the persistent Chroma client.",
+     )
+     parser.add_argument(
+         "--chroma-host",
+         dest="chroma_host",
+         default=default_host,
+         help="Target host for HTTP or cloud Chroma clients.",
+     )
+     parser.add_argument(
+         "--chroma-port",
+         dest="chroma_port",
+         type=int,
+         default=default_port,
+         help="Port for the HTTP Chroma client.",
+     )
+     parser.add_argument(
+         "--chroma-ssl",
+         dest="chroma_ssl",
+         type=parse_bool,
+         default=default_ssl,
+         help="Toggle SSL for the HTTP Chroma client (default: true).",
+     )
+     parser.add_argument(
+         "--chroma-tenant",
+         dest="chroma_tenant",
+         default=default_tenant,
+         help="Tenant identifier for Chroma Cloud deployments.",
+     )
+     parser.add_argument(
+         "--chroma-database",
+         dest="chroma_database",
+         default=default_database,
+         help="Database name for Chroma Cloud deployments.",
+     )
+     parser.add_argument(
+         "--chroma-api-key",
+         dest="chroma_api_key",
+         default=default_api_key,
+         help="API key used to authenticate against Chroma Cloud.",
+     )
+     parser.add_argument(
+         "--chroma-custom-auth",
+         dest="chroma_custom_auth",
+         default=default_custom_auth,
+         help="Optional custom auth credentials for self-hosted HTTP deployments.",
+     )
+     parser.add_argument(
+         "--chroma-id-prefix",
+         dest="chroma_id_prefix",
+         default=default_id_prefix,
+         help="Prefix applied to document IDs stored in Chroma (default: kb::).",
+     )
+
+
+ def build_chroma_listener(options: Namespace, root: Path) -> Optional[ChromaIngestor]:
+     """Construct a Chroma listener from parsed CLI options when enabled.
+
+     Returns ``None`` when the resulting configuration is not enabled (for
+     example, when the configured client type is ``off``).
+     """
+
+     configuration = ChromaConfiguration.from_options(
+         root=root,
+         client_type=options.chroma_client,
+         collection_name=options.chroma_collection,
+         embedding=options.chroma_embedding,
+         data_directory=options.chroma_data_dir,
+         host=options.chroma_host,
+         port=options.chroma_port,
+         ssl=options.chroma_ssl,
+         tenant=options.chroma_tenant,
+         database=options.chroma_database,
+         api_key=options.chroma_api_key,
+         custom_auth_credentials=options.chroma_custom_auth,
+         id_prefix=options.chroma_id_prefix,
+     )
+     if not configuration.enabled:
+         return None
+     return ChromaIngestor(configuration)
+
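
Because `--chroma-ssl` uses `parse_bool` as its `argparse` `type`, the flag always takes a value rather than acting as a bare switch. The sketch below copies `parse_bool` from the module above into a standalone parser to show that behavior; the parser itself is illustrative and registers only this one flag.

```python
import argparse
from typing import Optional, Union


def parse_bool(value: Union[str, bool, None]) -> bool:
    """Same logic as the module's parse_bool, reproduced so the sketch is self-contained."""
    if isinstance(value, bool):
        return value
    if value is None:
        return False
    return value.lower() in {"1", "true", "t", "yes", "y"}


parser = argparse.ArgumentParser()
# type= runs on the string given on the command line; the bool default is used unchanged.
parser.add_argument("--chroma-ssl", dest="chroma_ssl", type=parse_bool, default=True)

ssl_default = parser.parse_args([]).chroma_ssl
ssl_off = parser.parse_args(["--chroma-ssl", "false"]).chroma_ssl
ssl_yes = parser.parse_args(["--chroma-ssl", "YES"]).chroma_ssl
ssl_typo = parser.parse_args(["--chroma-ssl", "ture"]).chroma_ssl
```

Note the last case: an unrecognized string such as a typo is treated as `False` instead of raising an error, which is worth keeping in mind when SSL appears to be unexpectedly disabled.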
@@ -0,0 +1,116 @@
+ """Command line interface for running the MCP knowledge base server."""
+ from __future__ import annotations
+
+ import argparse
+ import asyncio
+ import logging
+ import os
+ from pathlib import Path
+ from typing import Iterable, List, Optional
+
+ from mcp_kb.config import DOCS_FOLDER_NAME, resolve_knowledge_base_root
+ from mcp_kb.cli.args import add_chroma_arguments, build_chroma_listener, parse_bool
+ from mcp_kb.ingest.chroma import ChromaIngestor
+ from mcp_kb.knowledge.bootstrap import install_default_documentation
+ from mcp_kb.security.path_validation import PathRules
+ from mcp_kb.server.app import create_fastmcp_app
+ from mcp.server.fastmcp import FastMCP
+
+ logging.basicConfig(level=logging.INFO)
+
+ logger = logging.getLogger(__name__)
+
+
+ def _build_argument_parser() -> argparse.ArgumentParser:
+     """Create and return the argument parser used by ``main``."""
+
+     parser = argparse.ArgumentParser(description="Run the MCP knowledge base server")
+     parser.add_argument(
+         "--root",
+         dest="root",
+         default=None,
+         help="Optional path to the knowledge base root (defaults to environment configuration)",
+     )
+     parser.add_argument(
+         "--transport",
+         dest="transports",
+         action="append",
+         choices=["stdio", "sse", "http"],
+         help="Transport protocol to enable (repeatable). Defaults to stdio only.",
+     )
+     parser.add_argument(
+         "--host",
+         dest="host",
+         default=None,
+         help="Host interface for HTTP/SSE transports (default 127.0.0.1).",
+     )
+     parser.add_argument(
+         "--port",
+         dest="port",
+         type=int,
+         default=None,
+         help="Port for HTTP/SSE transports (default 8000).",
+     )
+
+     add_chroma_arguments(parser)
+     return parser
+
+
+ async def _run_transports(server: FastMCP, transports: List[str]) -> None:
+     """Run all selected transport protocols concurrently."""
+
+     coroutines = []
+     for name in transports:
+         if name == "stdio":
+             coroutines.append(server.run_stdio_async())
+         elif name == "sse":
+             coroutines.append(server.run_sse_async())
+         elif name == "http":
+             coroutines.append(server.run_streamable_http_async())
+         else:  # pragma: no cover - argparse restricts values
+             raise ValueError(f"Unsupported transport: {name}")
+
+     await asyncio.gather(*coroutines)
+
+
+ def run_server(arguments: Iterable[str] | None = None) -> None:
+     """Entry point used by both CLI invocations and unit tests."""
+
+     parser = _build_argument_parser()
+     options = parser.parse_args(arguments)
+     root_path = resolve_knowledge_base_root(options.root)
+     rules = PathRules(root=root_path, protected_folders=(DOCS_FOLDER_NAME,))
+     install_default_documentation(root_path)
+     listeners: List[ChromaIngestor] = []
+     try:
+         listener = build_chroma_listener(options, root_path)
+     except Exception as exc:  # pragma: no cover - configuration errors
+         raise SystemExit(f"Failed to configure Chroma ingestion: {exc}") from exc
+     if listener is not None:
+         listeners.append(listener)
+         logger.info(
+             "Chroma ingestion enabled (client=%s, collection=%s)",
+             options.chroma_client,
+             options.chroma_collection,
+         )
+     server = create_fastmcp_app(
+         rules,
+         host=options.host,
+         port=options.port,
+         listeners=listeners,
+     )
+     transports = options.transports or ["stdio"]
+     logger.info("Running server on %s:%s with transports %s", options.host, options.port, transports)
+     logger.info("Data root is %s", root_path)
+     # No print() here: stdout carries the protocol stream for the stdio transport,
+     # so diagnostics go through logging, which writes to stderr by default.
+     asyncio.run(_run_transports(server, transports))
+
+
+ def main() -> None:
+     """CLI hook that executes :func:`run_server`."""
+
+     run_server()
+
+
+ if __name__ == "__main__":
+     main()
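
The concurrency pattern in `_run_transports` (collect one coroutine per selected transport, then await them together) can be tried without an MCP server. In this sketch the `runners` mapping and the `_fake_transport` coroutine are hypothetical stand-ins for the `FastMCP` `run_*_async` methods; only `asyncio.gather` is the real mechanism.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


async def run_selected(
    runners: Dict[str, Callable[[], Awaitable[str]]],
    names: List[str],
) -> List[str]:
    """Start one coroutine per requested transport and wait for all of them."""
    coroutines = []
    for name in names:
        if name not in runners:
            raise ValueError(f"Unsupported transport: {name}")
        coroutines.append(runners[name]())
    # gather() runs the awaitables concurrently and returns results in input order.
    return list(await asyncio.gather(*coroutines))


async def _fake_transport(name: str) -> str:
    await asyncio.sleep(0)  # yield control once, as a real server loop would
    return f"{name} finished"


def demo() -> List[str]:
    runners = {
        "stdio": lambda: _fake_transport("stdio"),
        "http": lambda: _fake_transport("http"),
    }
    return asyncio.run(run_selected(runners, ["stdio", "http"]))
```

With real transports the gathered coroutines normally run until shutdown, so `gather` only returns when every transport has stopped, and an exception in one transport propagates to the caller.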
@@ -0,0 +1,90 @@
+ """CLI command to reindex the knowledge base into configured ingestors.
+
+ This command does not expose an MCP tool. Instead, it builds the configured
+ ingestors and calls their ``reindex`` method when available, allowing operators
+ to trigger a full rebuild of external indexes (e.g., Chroma) from the current
+ filesystem state.
+ """
+ from __future__ import annotations
+
+ import argparse
+ import logging
+ from typing import Iterable, List
+
+ from mcp_kb.cli.args import add_chroma_arguments, build_chroma_listener
+ from mcp_kb.config import DOCS_FOLDER_NAME, resolve_knowledge_base_root
+ from mcp_kb.knowledge.events import KnowledgeBaseReindexListener
+ from mcp_kb.knowledge.store import KnowledgeBase
+ from mcp_kb.security.path_validation import PathRules
+
+
+ logger = logging.getLogger(__name__)
+
+
+ def _build_argument_parser() -> argparse.ArgumentParser:
+     """Return the argument parser for the reindex command."""
+
+     parser = argparse.ArgumentParser(description="Reindex the knowledge base into configured backends")
+     parser.add_argument(
+         "--root",
+         dest="root",
+         default=None,
+         help="Optional path to the knowledge base root (defaults to environment configuration)",
+     )
+     add_chroma_arguments(parser)
+     return parser
+
+
+ def run_reindex(arguments: Iterable[str] | None = None) -> int:
+     """Execute a reindex run across all registered ingestors.
+
+     The function constructs a :class:`~mcp_kb.knowledge.store.KnowledgeBase`
+     using the same root resolution logic as the server, builds any enabled
+     ingestion listeners from CLI options, and invokes ``reindex`` on those that
+     implement the optional protocol.
+
+     Parameters
+     ----------
+     arguments:
+         Optional iterable of command-line arguments, primarily used by tests.
+
+     Returns
+     -------
+     int
+         The total number of documents processed across all reindex-capable
+         listeners.
+     """
+
+     parser = _build_argument_parser()
+     options = parser.parse_args(arguments)
+     root_path = resolve_knowledge_base_root(options.root)
+     rules = PathRules(root=root_path, protected_folders=(DOCS_FOLDER_NAME,))
+     kb = KnowledgeBase(rules)
+
+     listeners: List[KnowledgeBaseReindexListener] = []
+     try:
+         chroma = build_chroma_listener(options, root_path)
+     except Exception as exc:  # pragma: no cover - configuration errors
+         raise SystemExit(f"Failed to configure Chroma ingestion: {exc}") from exc
+     if chroma is not None and isinstance(chroma, KnowledgeBaseReindexListener):
+         listeners.append(chroma)
+
+     total = 0
+     for listener in listeners:
+         count = listener.reindex(kb)
+         logger.info("Reindexed %d documents via %s", count, listener.__class__.__name__)
+         total += count
+
+     return total
+
+
+ def main() -> None:
+     """CLI hook that executes :func:`run_reindex` and prints a summary."""
+
+     total = run_reindex()
+     print(f"Reindexed {total} documents")
+
+
+ if __name__ == "__main__":
+     main()
+
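
The `isinstance(chroma, KnowledgeBaseReindexListener)` check suggests an optional capability that listeners may or may not implement. The definition of `KnowledgeBaseReindexListener` is not shown in this excerpt, so the sketch below assumes it behaves like a runtime-checkable `typing.Protocol`; every class name here is hypothetical.

```python
from typing import Iterable, Protocol, runtime_checkable


@runtime_checkable
class ReindexListener(Protocol):
    """Assumed shape of the optional protocol: a reindex(kb) method returning a count."""

    def reindex(self, kb: object) -> int: ...


class CountingIngestor:
    """Stand-in ingestor that reports a fixed number of reindexed documents."""

    def __init__(self, count: int) -> None:
        self._count = count

    def reindex(self, kb: object) -> int:
        return self._count


class WriteOnlyListener:
    """Stand-in listener without reindex support; it should be skipped."""

    def handle_upsert(self, path: str) -> None:
        pass


def reindex_all(kb: object, listeners: Iterable[object]) -> int:
    """Sum the document counts of every listener that implements reindex."""
    total = 0
    for listener in listeners:
        # For a runtime-checkable Protocol, isinstance only checks that the method exists.
        if isinstance(listener, ReindexListener):
            total += listener.reindex(kb)
    return total
```

Under this assumption a listener opts in to reindexing simply by defining `reindex`, without inheriting from a shared base class.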