PyPI - knowledge-master - Versions diffs - 0.1.0__tar.gz → 0.2.0__tar.gz - Mend

knowledge-master 0.1.0__tar.gz → 0.2.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (33)

{knowledge_master-0.1.0 → knowledge_master-0.2.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: knowledge-master
-Version: 0.1.0
+Version: 0.2.0
 Summary: Local-first knowledge graph for developers. Your AI agent's permanent memory.
 Author: Milenko Mitrovic
 License: MIT
@@ -27,6 +27,7 @@ Requires-Dist: gitpython<4.0,>=3.1.0
 Requires-Dist: rich<15.0,>=14.0.0
 Requires-Dist: fastapi<1.0,>=0.115.0
 Requires-Dist: uvicorn<1.0,>=0.34.0
+Requires-Dist: pyyaml>=6.0
 Provides-Extra: office
 Requires-Dist: python-docx<2.0,>=1.1.0; extra == "office"
 Requires-Dist: openpyxl<4.0,>=3.1.0; extra == "office"
@@ -41,6 +42,10 @@ Dynamic: license-file
 **Your codebase's memory.** A local knowledge graph that gives AI agents real understanding of your architecture — not just text search.
 [![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
+![Status: Alpha](https://img.shields.io/badge/Status-Alpha-orange)
+![Python 3.11+](https://img.shields.io/badge/Python-3.11+-blue)
+> ⚠️ **Alpha software.** Core features work (search, graph, CLI, MCP server) but some capabilities are early-stage. See [Feature Status](#feature-status) below.
 ---
@@ -210,9 +215,11 @@ Your AI agent gets these tools:
 | `km start` | Boot Docker containers + pull embedding model |
 | `km stop` | Stop containers |
 | `km index <path>` | Index a git repo or docs directory |
-| `km search <query>` | Semantic search with graph context |
-| `km blast-radius <target>` | Show dependencies and affected entities |
+| `km search <query>` | Semantic search with re-ranking |
+| `km blast-radius <target>` | Multi-layer dependency analysis (imports → services → people) |
+| `km who-owns <file>` | File ownership from git blame (weighted by recency) |
 | `km check-conventions <path>` | Verify code follows detected patterns |
+| `km connect <source>` | Pull from external MCP (email, Slack) |
 | `km list` | Show indexed repos, techs, stats |
 | `km remove <name>` | Remove a source from the knowledge base |
 | `km serve` | Start web UI at http://127.0.0.1:9999 |
@@ -231,6 +238,26 @@ When you index a repo, Knowledge Master detects:
 | **People** | Git commit authors and file ownership |
 | **Code structure** | Functions, classes, chunked by AST-aware boundaries |
+## Feature Status
+| Feature | Status | Notes |
+|---|---|---|
+| Semantic search + re-ranking | ✅ Stable | Core retrieval works well |
+| Knowledge graph (FalkorDB) | ✅ Stable | Node/edge storage, vector index |
+| CLI commands | ✅ Stable | All commands functional |
+| MCP server | ✅ Stable | search, blast_radius, check_conventions |
+| Web UI + graph viz | ✅ Stable | htmx + D3, no build step |
+| Git repo indexing | ✅ Stable | Parses code, extracts authors |
+| Tech stack detection | ⚡ Basic | Regex over dependency files — works for common cases |
+| Service topology | ⚡ Basic | docker-compose parsing — limited YAML support |
+| Convention detection | ⚡ Basic | Folder structure + file naming patterns |
+| Blast radius | ⚡ Basic | Graph traversal on stored edges — doesn't trace imports/calls |
+| Email connector (ms-365) | 🧪 Experimental | Works but requires ms-365-mcp setup |
+| Re-ranking | 🧪 Experimental | Novel approach, not benchmarked against cross-encoders |
+| Incremental indexing | 🧪 Experimental | File watcher + git hooks, needs more testing |
+**Legend:** ✅ Stable — ⚡ Basic (works, limited scope) — 🧪 Experimental (may change)
 ## Comparison
 | Feature | Knowledge Master | Generic RAG | GitHub Copilot | Glean |
@@ -259,6 +286,23 @@ python -m knowledge_master.server
 python -m knowledge_master.cli status
 ```
+## Security
+Knowledge Master runs **entirely on your machine**. No data leaves localhost.
+- All ports bound to `127.0.0.1` (not accessible from LAN)
+- Ollama runs locally — no cloud API calls
+- MCP server uses stdio (no network exposure)
+- Optional API key auth for REST endpoints
+```bash
+# Enable API key auth
+export KM_API_KEY=$(openssl rand -hex 32)
+km serve
+```
+See [SECURITY.md](SECURITY.md) for full security model, risks, and hardening guide.
 ## Troubleshooting
 | Issue | Fix |

{knowledge_master-0.1.0 → knowledge_master-0.2.0}/README.md RENAMED

@@ -3,6 +3,10 @@
 **Your codebase's memory.** A local knowledge graph that gives AI agents real understanding of your architecture — not just text search.
 [![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
+![Status: Alpha](https://img.shields.io/badge/Status-Alpha-orange)
+![Python 3.11+](https://img.shields.io/badge/Python-3.11+-blue)
+> ⚠️ **Alpha software.** Core features work (search, graph, CLI, MCP server) but some capabilities are early-stage. See [Feature Status](#feature-status) below.
 ---
@@ -172,9 +176,11 @@ Your AI agent gets these tools:
 | `km start` | Boot Docker containers + pull embedding model |
 | `km stop` | Stop containers |
 | `km index <path>` | Index a git repo or docs directory |
-| `km search <query>` | Semantic search with graph context |
-| `km blast-radius <target>` | Show dependencies and affected entities |
+| `km search <query>` | Semantic search with re-ranking |
+| `km blast-radius <target>` | Multi-layer dependency analysis (imports → services → people) |
+| `km who-owns <file>` | File ownership from git blame (weighted by recency) |
 | `km check-conventions <path>` | Verify code follows detected patterns |
+| `km connect <source>` | Pull from external MCP (email, Slack) |
 | `km list` | Show indexed repos, techs, stats |
 | `km remove <name>` | Remove a source from the knowledge base |
 | `km serve` | Start web UI at http://127.0.0.1:9999 |
@@ -193,6 +199,26 @@ When you index a repo, Knowledge Master detects:
 | **People** | Git commit authors and file ownership |
 | **Code structure** | Functions, classes, chunked by AST-aware boundaries |
+## Feature Status
+| Feature | Status | Notes |
+|---|---|---|
+| Semantic search + re-ranking | ✅ Stable | Core retrieval works well |
+| Knowledge graph (FalkorDB) | ✅ Stable | Node/edge storage, vector index |
+| CLI commands | ✅ Stable | All commands functional |
+| MCP server | ✅ Stable | search, blast_radius, check_conventions |
+| Web UI + graph viz | ✅ Stable | htmx + D3, no build step |
+| Git repo indexing | ✅ Stable | Parses code, extracts authors |
+| Tech stack detection | ⚡ Basic | Regex over dependency files — works for common cases |
+| Service topology | ⚡ Basic | docker-compose parsing — limited YAML support |
+| Convention detection | ⚡ Basic | Folder structure + file naming patterns |
+| Blast radius | ⚡ Basic | Graph traversal on stored edges — doesn't trace imports/calls |
+| Email connector (ms-365) | 🧪 Experimental | Works but requires ms-365-mcp setup |
+| Re-ranking | 🧪 Experimental | Novel approach, not benchmarked against cross-encoders |
+| Incremental indexing | 🧪 Experimental | File watcher + git hooks, needs more testing |
+**Legend:** ✅ Stable — ⚡ Basic (works, limited scope) — 🧪 Experimental (may change)
 ## Comparison
 | Feature | Knowledge Master | Generic RAG | GitHub Copilot | Glean |
@@ -221,6 +247,23 @@ python -m knowledge_master.server
 python -m knowledge_master.cli status
 ```
+## Security
+Knowledge Master runs **entirely on your machine**. No data leaves localhost.
+- All ports bound to `127.0.0.1` (not accessible from LAN)
+- Ollama runs locally — no cloud API calls
+- MCP server uses stdio (no network exposure)
+- Optional API key auth for REST endpoints
+```bash
+# Enable API key auth
+export KM_API_KEY=$(openssl rand -hex 32)
+km serve
+```
+See [SECURITY.md](SECURITY.md) for full security model, risks, and hardening guide.
 ## Troubleshooting
 | Issue | Fix |

knowledge_master-0.2.0/knowledge_master/api.py ADDED

@@ -0,0 +1,92 @@
+"""REST API — JSON endpoints for external tool integration."""
+from pathlib import Path
+from fastapi import APIRouter
+from . import embeddings, store
+from .parsers import git_repo, markdown
+router = APIRouter(prefix="/api/v1")
+@router.get("/search")
+async def search(q: str, top_k: int = 10, source_type: str = None):
+    """Semantic search across the knowledge base."""
+    graph = store.get_graph()
+    vec = embeddings.embed(q)
+    results = store.graph_context_search(graph, vec, top_k, query=q)
+    if source_type:
+        results = [r for r in results if r.get("source_type") == source_type]
+    return {"query": q, "results": results}
+@router.get("/blast-radius/{target}")
+async def blast_radius(target: str):
+    """Show what depends on a target."""
+    graph = store.get_graph()
+    # Try Service
+    result = graph.query(
+        """MATCH (t:Service {name: $name})
+           OPTIONAL MATCH (other)-[*1..3]->(t)
+           WHERE other <> t
+           RETURN labels(other)[0] AS type, other.name AS name""",
+        params={"name": target},
+    )
+    if not result.result_set or all(r[1] is None for r in result.result_set):
+        # Try Tech
+        result = graph.query(
+            """MATCH (t:Tech {name: $name})
+               OPTIONAL MATCH (r:Repo)-[:USES_TECH]->(t)
+               RETURN 'Repo' AS type, r.name AS name""",
+            params={"name": target},
+        )
+    affected = [{"type": r[0], "name": r[1]} for r in (result.result_set or []) if r[1]]
+    return {"target": target, "affected_count": len(affected), "affected": affected}
+@router.get("/conventions/check")
+async def check_conventions(path: str = "."):
+    """Check conventions for a path."""
+    path = str(Path(path).expanduser().resolve())
+    repo_name = Path(path).name
+    graph = store.get_graph()
+    result = graph.query(
+        "MATCH (r:Repo)-[:FOLLOWS]->(c:Convention) WHERE r.name = $name RETURN c.name, c.category",
+        params={"name": repo_name},
+    )
+    if not result.result_set:
+        result = graph.query("MATCH (c:Convention) RETURN c.name, c.category")
+    from .cli import _check_convention
+    checks = []
+    for conv_name, category in (result.result_set or []):
+        passed = _check_convention(path, conv_name)
+        checks.append({"convention": conv_name, "category": category, "passed": passed})
+    return {"path": path, "checks": checks}
+@router.post("/index")
+async def index_source(path: str, type: str = "auto"):
+    """Index a repo or directory."""
+    path = str(Path(path).expanduser().resolve())
+    if not Path(path).exists():
+        return {"error": f"Path not found: {path}"}
+    graph = store.get_graph()
+    store.init_schema(graph)
+    resolved_type = type if type != "auto" else ("repo" if (Path(path) / ".git").exists() else "docs")
+    if resolved_type == "repo":
+        result = git_repo.index_repo(path, graph)
+    else:
+        result = markdown.index_directory(path, graph)
+    return result
+@router.get("/status")
+async def status():
+    """Knowledge base stats."""
+    graph = store.get_graph()
+    return store.get_stats(graph)
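
Note on `index_source` above: when `type` is `auto`, a path is treated as a repo if it contains a `.git` entry and as docs otherwise. A minimal standalone sketch of that rule follows. The helper name `resolve_source_type` and the `demo` function are hypothetical and do not appear in the package.

```python
import tempfile
from pathlib import Path


def resolve_source_type(path: str, requested: str = "auto") -> str:
    """Mirror the rule in index_source: an explicit type wins, otherwise
    a `.git` entry under the path means "repo" and anything else "docs".

    Hypothetical helper for illustration; not part of knowledge_master.
    """
    if requested != "auto":
        return requested
    return "repo" if (Path(path) / ".git").exists() else "docs"


def demo() -> tuple[str, str, str]:
    """Build two throwaway directories and classify them."""
    with tempfile.TemporaryDirectory() as tmp:
        repo_dir = Path(tmp) / "repo"
        docs_dir = Path(tmp) / "docs"
        (repo_dir / ".git").mkdir(parents=True)
        docs_dir.mkdir()
        return (
            resolve_source_type(str(repo_dir)),          # has .git
            resolve_source_type(str(docs_dir)),          # no .git
            resolve_source_type(str(docs_dir), "repo"),  # explicit override
        )
```

Because the check only looks for `.git` directly under the given path, a subdirectory of a repository would be classified as docs under this rule unless `type` is passed explicitly.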

{knowledge_master-0.1.0 → knowledge_master-0.2.0}/knowledge_master/cli.py RENAMED

@@ -108,7 +108,7 @@ def search(
     """Semantic search across the knowledge base."""
     graph = store.get_graph()
     vec = embeddings.embed(query)
-    results = store.graph_context_search(graph, vec, top_k)
+    results = store.graph_context_search(graph, vec, top_k, query=query)
     table = Table(title=f"Results for: {query}")
     table.add_column("Score", width=6)
@@ -134,62 +134,135 @@ def search(
 @app.command()
 def blast_radius(
-    target: str = typer.Argument(..., help="Service, file, or tech name to check"),
-    depth: int = typer.Option(3, "--depth", "-d", help="Traversal depth"),
+    target: str = typer.Argument(..., help="Service, file, function, or tech name"),
+    depth: int = typer.Option(4, "--depth", "-d", help="Traversal depth"),
 ):
-    """Show what depends on a target — the blast radius of changing it."""
+    """Show what depends on a target — multi-layer blast radius analysis."""
     graph = store.get_graph()
+    results = _compute_blast_radius(graph, target, depth)
-    # Try as Service first
-    result = graph.query(
-        """MATCH (target:Service {name: $name})
-           OPTIONAL MATCH path = (other)-[*1..3]->(target)
-           WHERE other <> target
-           RETURN labels(other)[0] AS type, other.name AS name,
-                  length(path) AS distance, type(last(relationships(path))) AS rel
-           ORDER BY distance""",
-        params={"name": target},
-    )
-    if not result.result_set:
-        # Try as Tech
-        result = graph.query(
-            """MATCH (target:Tech {name: $name})
-               OPTIONAL MATCH (r:Repo)-[:USES_TECH]->(target)
-               RETURN 'Repo' AS type, r.name AS name, 1 AS distance, 'USES_TECH' AS rel""",
-            params={"name": target},
-        )
-    if not result.result_set:
-        # Try as file/document
-        result = graph.query(
-            """MATCH (target:Document) WHERE target.path CONTAINS $name
-               OPTIONAL MATCH (c:Chunk)-[:PART_OF]->(target)
-               OPTIONAL MATCH (p:Person)-[:AUTHORED]->(target)
-               OPTIONAL MATCH (target)-[:IN_REPO]->(r:Repo)
-               RETURN 'Repo' AS type, r.name AS name, 1 AS distance, 'CONTAINS' AS rel
-               UNION
-               MATCH (target:Document) WHERE target.path CONTAINS $name
-               OPTIONAL MATCH (p:Person)-[:AUTHORED]->(target)
-               RETURN 'Person' AS type, p.name AS name, 1 AS distance, 'AUTHORED' AS rel""",
-            params={"name": target},
-        )
-    if not result.result_set or all(r[1] is None for r in result.result_set):
+    if not results:
         console.print(f"[yellow]No dependencies found for:[/] {target}")
-        console.print("[dim]Try: a service name, technology, or file path[/]")
+        console.print("[dim]Try: a file path, function name, service, or technology[/]")
         return
     tree = Tree(f"[bold red]💥 Blast radius: {target}[/]")
-    seen = set()
-    for node_type, name, distance, rel in result.result_set:
-        if name and name not in seen:
-            seen.add(name)
-            icon = {"Repo": "📦", "Service": "⚙️", "Person": "👤", "Document": "📄", "Tech": "🔧"}.get(node_type, "•")
-            tree.add(f"{icon} [bold]{name}[/] [dim]({node_type}, via {rel})[/]")
+    # Group by confidence
+    definite = [r for r in results if r["confidence"] == "definite"]
+    likely = [r for r in results if r["confidence"] == "likely"]
+    possible = [r for r in results if r["confidence"] == "possible"]
+    if definite:
+        branch = tree.add("[bold]Definite impact[/]")
+        for r in definite:
+            icon = _icon(r["type"])
+            branch.add(f"{icon} [bold]{r['name']}[/] [dim]({r['type']}, {r['rel']})[/]")
+    if likely:
+        branch = tree.add("[yellow]Likely affected[/]")
+        for r in likely:
+            icon = _icon(r["type"])
+            branch.add(f"{icon} {r['name']} [dim]({r['type']}, {r['rel']})[/]")
+    if possible:
+        branch = tree.add("[dim]Possibly affected[/]")
+        for r in possible:
+            icon = _icon(r["type"])
+            branch.add(f"{icon} {r['name']} [dim]({r['type']}, {r['rel']})[/]")
     console.print(tree)
-    console.print(f"\n[dim]{len(seen)} entities affected[/]")
+    console.print(f"\n[dim]{len(results)} entities: {len(definite)} definite, {len(likely)} likely, {len(possible)} possible[/]")
+def _compute_blast_radius(graph, target: str, depth: int = 4) -> list[dict]:
+    """Multi-layer blast radius: Symbol → File → Service → Person."""
+    results = []
+    seen = set()
+    # Layer 1: File-level imports (who imports this file?)
+    r = graph.query(
+        """MATCH (src:Document)-[:IMPORTS]->(dst:Document)
+           WHERE dst.path CONTAINS $name
+           RETURN 'Document' AS type, src.path AS name, 'IMPORTS' AS rel""",
+        params={"name": target},
+    )
+    for row in (r.result_set or []):
+        if row[1] and row[1] not in seen:
+            seen.add(row[1])
+            results.append({"type": row[0], "name": row[1], "rel": row[2], "confidence": "definite"})
+    # Layer 1b: Symbol-level (who defines/uses this function?)
+    r = graph.query(
+        """MATCH (f:Function {name: $name})-[:DEFINED_IN]->(d:Document)
+           OPTIONAL MATCH (importer:Document)-[:IMPORTS]->(d)
+           RETURN 'Document' AS type, importer.path AS name, 'IMPORTS function' AS rel""",
+        params={"name": target},
+    )
+    for row in (r.result_set or []):
+        if row[1] and row[1] not in seen:
+            seen.add(row[1])
+            results.append({"type": row[0], "name": row[1], "rel": row[2], "confidence": "definite"})
+    # Layer 2: Service-level (which service owns affected files?)
+    affected_files = [r["name"] for r in results if r["type"] == "Document"]
+    affected_files.append(target)  # include the target itself
+    r = graph.query(
+        """MATCH (d:Document)-[:IN_REPO]->(repo:Repo)-[:DEFINES_SERVICE]->(svc:Service)
+           WHERE any(f IN $files WHERE d.path CONTAINS f)
+           RETURN 'Service' AS type, svc.name AS name, 'owns affected file' AS rel""",
+        params={"files": affected_files},
+    )
+    for row in (r.result_set or []):
+        if row[1] and row[1] not in seen:
+            seen.add(row[1])
+            results.append({"type": row[0], "name": row[1], "rel": row[2], "confidence": "likely"})
+    # Layer 2b: Services that depend on affected services
+    affected_services = [r["name"] for r in results if r["type"] == "Service"]
+    if affected_services:
+        r = graph.query(
+            """MATCH (upstream:Service)-[:DEPENDS_ON]->(downstream:Service)
+               WHERE downstream.name IN $services
+               RETURN 'Service' AS type, upstream.name AS name, 'DEPENDS_ON' AS rel""",
+            params={"services": affected_services},
+        )
+        for row in (r.result_set or []):
+            if row[1] and row[1] not in seen:
+                seen.add(row[1])
+                results.append({"type": row[0], "name": row[1], "rel": row[2], "confidence": "likely"})
+    # Layer 3: Tech-level
+    r = graph.query(
+        """MATCH (t:Tech {name: $name})
+           OPTIONAL MATCH (repo:Repo)-[:USES_TECH]->(t)
+           RETURN 'Repo' AS type, repo.name AS name, 'USES_TECH' AS rel""",
+        params={"name": target},
+    )
+    for row in (r.result_set or []):
+        if row[1] and row[1] not in seen:
+            seen.add(row[1])
+            results.append({"type": row[0], "name": row[1], "rel": row[2], "confidence": "possible"})
+    # Layer 4: People (who authored affected files?)
+    r = graph.query(
+        """MATCH (p:Person)-[:AUTHORED]->(d:Document)
+           WHERE any(f IN $files WHERE d.path = f)
+           RETURN 'Person' AS type, p.name AS name, 'AUTHORED affected file' AS rel""",
+        params={"files": affected_files},
+    )
+    for row in (r.result_set or []):
+        if row[1] and row[1] not in seen:
+            seen.add(row[1])
+            results.append({"type": row[0], "name": row[1], "rel": row[2], "confidence": "possible"})
+    return results
+def _icon(node_type: str) -> str:
+    return {"Repo": "📦", "Service": "⚙️", "Person": "👤", "Document": "📄",
+            "Tech": "🔧", "Function": "🔧", "Class": "🏗️"}.get(node_type, "•")
 @app.command()
@@ -313,6 +386,33 @@ def remove(source: str = typer.Argument(..., help="Repo name or doc path to remo
         console.print(f"[yellow]Not found:[/] {source}")
+@app.command()
+def connect(
+    source: str = typer.Argument(..., help="Source to pull from: outlook, slack, notion, or custom"),
+    command: str = typer.Option(None, "--command", "-c", help="Custom MCP server command"),
+    tool: str = typer.Option(None, "--tool", "-t", help="Tool name to call on the MCP server"),
+):
+    """Pull and index data from an external MCP server (email, Slack, etc.)."""
+    from .connectors import sync_pull_and_index, add_custom_source, SOURCES
+    if command and tool:
+        add_custom_source(source, command.split(), tool)
+    if source not in SOURCES:
+        console.print(f"[yellow]Unknown source:[/] {source}")
+        console.print(f"[dim]Available: {', '.join(SOURCES.keys())}[/]")
+        console.print("[dim]Or use --command and --tool for custom MCP servers[/]")
+        raise typer.Exit(1)
+    console.print(f"[bold blue]Connecting to {source}...[/]")
+    try:
+        result = sync_pull_and_index(source)
+        console.print(f"[green]✓ Done![/] {json.dumps(result)}")
+    except Exception as e:
+        console.print(f"[red]✗ Failed:[/] {e}")
+        raise typer.Exit(1)
 @app.command()
 def status():
     """Check system health."""
@@ -340,5 +440,25 @@ def serve(port: int = typer.Option(9999, help="Port for web UI")):
     uvicorn.run(create_app(), host="127.0.0.1", port=port)
+@app.command(name="who-owns")
+def who_owns(file: str = typer.Argument(..., help="File path to check ownership")):
+    """Show who owns a file based on git blame analysis."""
+    graph = store.get_graph()
+    result = graph.query(
+        """MATCH (p:Person)-[r:OWNS]->(d:Document)
+           WHERE d.path CONTAINS $file
+           RETURN p.name, r.weight, d.path
+           ORDER BY r.weight DESC LIMIT 1""",
+        params={"file": file},
+    )
+    if result.result_set:
+        name, weight, path = result.result_set[0]
+        console.print(f"[bold]{path}[/]")
+        console.print(f"  Owner: [green]{name}[/] (weight: {weight:.2f})")
+    else:
+        console.print(f"[yellow]No ownership data for:[/] {file}")
+        console.print("[dim]Run 'km index <repo>' first to extract ownership.[/]")
 if __name__ == "__main__":
     app()
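
Note on the new `blast_radius` output above: each row returned by `_compute_blast_radius` carries a `confidence` of `definite`, `likely`, or `possible`, and the command prints one tree branch per level plus a summary line. A minimal sketch of that grouping step, without the graph or Rich, is below. The helper names `group_by_confidence` and `summarize` and the sample rows are hypothetical.

```python
CONFIDENCE_ORDER = ("definite", "likely", "possible")


def group_by_confidence(results: list[dict]) -> dict[str, list[dict]]:
    """Bucket blast-radius rows by their confidence label, keeping order.

    Rows with an unrecognized label are left out of every bucket, which
    matches the three list comprehensions in the diff.
    """
    grouped: dict[str, list[dict]] = {level: [] for level in CONFIDENCE_ORDER}
    for row in results:
        if row["confidence"] in grouped:
            grouped[row["confidence"]].append(row)
    return grouped


def summarize(results: list[dict]) -> str:
    """Build the same style of summary line the CLI prints at the end."""
    grouped = group_by_confidence(results)
    counts = ", ".join(f"{len(grouped[lvl])} {lvl}" for lvl in CONFIDENCE_ORDER)
    return f"{len(results)} entities: {counts}"


# Hypothetical rows in the shape produced by _compute_blast_radius.
sample = [
    {"type": "Document", "name": "app/routes.py", "rel": "IMPORTS", "confidence": "definite"},
    {"type": "Service", "name": "api", "rel": "owns affected file", "confidence": "likely"},
    {"type": "Person", "name": "Ada", "rel": "AUTHORED affected file", "confidence": "possible"},
    {"type": "Document", "name": "app/main.py", "rel": "IMPORTS", "confidence": "definite"},
]
summary = summarize(sample)
```

One detail worth checking when reading the diff: Layer 2 matches files with `d.path CONTAINS f`, while Layer 4 matches with `d.path = f`, so a partial target name that finds a service may not find its authors.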

knowledge_master-0.2.0/knowledge_master/connectors.py ADDED

@@ -0,0 +1,134 @@
+"""MCP Connector — index data from external MCP servers (email, Slack, etc.)."""
+import asyncio
+import json
+from dataclasses import dataclass
+from . import chunking, embeddings, store
+@dataclass
+class MCPSource:
+    """Configuration for an external MCP server to pull data from."""
+    name: str
+    command: list[str]
+    tool_name: str  # which tool to call to get data
+    tool_args: dict  # arguments to pass
+    source_type: str  # email, slack, docs, etc.
+# Pre-configured sources — commands must be installed separately
+SOURCES = {
+    "outlook": MCPSource(
+        name="Microsoft 365 Emails",
+        command=["npx", "@subzone81/ms-365-mcp", "--preset", "mail"],
+        tool_name="list-mail-messages",
+        tool_args={"top": 50},
+        source_type="email",
+    ),
+    "slack": MCPSource(
+        name="Slack Messages",
+        command=["npx", "@modelcontextprotocol/server-slack"],
+        tool_name="slack_search_messages",
+        tool_args={"query": ""},
+        source_type="slack",
+    ),
+}
+async def pull_and_index(source: MCPSource, graph=None):
+    """Connect to an MCP server, pull data, and index it into our graph."""
+    from mcp import ClientSession
+    from mcp.client.stdio import stdio_client, StdioServerParameters
+    if graph is None:
+        graph = store.get_graph()
+    store.init_schema(graph)
+    params = StdioServerParameters(command=source.command[0], args=source.command[1:])
+    async with stdio_client(params) as (read, write):
+        async with ClientSession(read, write) as session:
+            await session.initialize()
+            # Call the tool to get data
+            result = await session.call_tool(source.tool_name, source.tool_args)
+            items = _parse_mcp_result(result)
+            indexed = 0
+            for item in items:
+                text = item.get("text", item.get("content", item.get("body", "")))
+                if not text or len(text.strip()) < 20:
+                    continue
+                title = item.get("subject", item.get("title", item.get("name", "")))
+                author = item.get("from", item.get("author", item.get("user", "")))
+                source_id = item.get("id", item.get("url", title))
+                # Chunk and embed
+                chunks = chunking.chunk_text(text)
+                vectors = embeddings.embed_batch(chunks)
+                # Store document
+                doc_path = f"{source.source_type}/{source_id}"
+                store.upsert_document(graph, doc_path, source.source_type, {"title": title})
+                # Store person if we have author info
+                if author:
+                    email = author if "@" in author else ""
+                    store.upsert_person(graph, author, email)
+                    store.link_person_authored(graph, email or author, doc_path)
+                # Store chunks
+                for i, (chunk_text, vector) in enumerate(zip(chunks, vectors)):
+                    cid = chunking.chunk_id(doc_path, i)
+                    store.upsert_chunk(graph, cid, chunk_text, vector,
+                                       {"source": doc_path, "source_type": source.source_type})
+                    store.link_chunk_to_document(graph, cid, doc_path)
+                indexed += 1
+    return {"source": source.name, "items_indexed": indexed}
+def _parse_mcp_result(result) -> list[dict]:
+    """Parse MCP tool result into a list of items."""
+    items = []
+    for content in result.content:
+        if hasattr(content, "text"):
+            try:
+                data = json.loads(content.text)
+                if isinstance(data, list):
+                    items.extend(data)
+                elif isinstance(data, dict):
+                    if "results" in data:
+                        items.extend(data["results"])
+                    elif "messages" in data:
+                        items.extend(data["messages"])
+                    elif "items" in data:
+                        items.extend(data["items"])
+                    else:
+                        items.append(data)
+            except json.JSONDecodeError:
+                # Plain text — treat as single item
+                items.append({"text": content.text, "title": "mcp-result"})
+    return items
+def sync_pull_and_index(source_key: str, graph=None):
+    """Synchronous wrapper for CLI usage."""
+    if source_key not in SOURCES:
+        available = ", ".join(SOURCES.keys())
+        raise ValueError(f"Unknown source: {source_key}. Available: {available}")
+    source = SOURCES[source_key]
+    return asyncio.run(pull_and_index(source, graph))
+def add_custom_source(name: str, command: list[str], tool_name: str,
+                      tool_args: dict = None, source_type: str = "external"):
+    """Register a custom MCP source."""
+    SOURCES[name] = MCPSource(
+        name=name, command=command, tool_name=tool_name,
+        tool_args=tool_args or {}, source_type=source_type,
+    )
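
Note on `_parse_mcp_result` above: it accepts four payload shapes, namely a JSON list, a JSON object wrapping a list under `results`, `messages`, or `items`, a bare JSON object, and plain text. The sketch below restates that logic against stand-in objects so the shapes can be tried without an MCP server. `parse_result`, `fake_result`, and the sample payloads are hypothetical; `SimpleNamespace` only imitates the `.content[i].text` attributes the diff relies on.

```python
import json
from types import SimpleNamespace

WRAPPER_KEYS = ("results", "messages", "items")


def parse_result(result) -> list[dict]:
    """Flatten an MCP-style tool result into a list of item dicts."""
    items: list[dict] = []
    for content in result.content:
        if not hasattr(content, "text"):
            continue  # non-text content parts are ignored
        try:
            data = json.loads(content.text)
        except json.JSONDecodeError:
            # Plain text: keep it as a single item
            items.append({"text": content.text, "title": "mcp-result"})
            continue
        if isinstance(data, list):
            items.extend(data)
        elif isinstance(data, dict):
            # First matching wrapper key wins, in the same order as the diff
            key = next((k for k in WRAPPER_KEYS if k in data), None)
            if key is None:
                items.append(data)
            else:
                items.extend(data[key])
    return items


def fake_result(*texts: str) -> SimpleNamespace:
    """Stand-in for a tool result: an object with .content[i].text."""
    return SimpleNamespace(content=[SimpleNamespace(text=t) for t in texts])


parsed = parse_result(
    fake_result('[{"id": 1}]', '{"messages": [{"id": 2}]}', '{"id": 3}', "plain text")
)
```

As in the diff, a payload that parses to a JSON scalar (for example `42`) produces no items.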

{knowledge_master-0.1.0 → knowledge_master-0.2.0}/knowledge_master/embeddings.py RENAMED

@@ -1,13 +1,17 @@
 """Embedding client using Ollama local models."""
-import ollama
+from ollama import Client
 MODEL = "nomic-embed-text"
+TIMEOUT = 30  # seconds
+# Create client with timeout
+_client = Client(timeout=TIMEOUT)
 def embed(text: str) -> list[float]:
     """Embed a single text string, returns vector."""
-    response = ollama.embed(model=MODEL, input=text)
+    response = _client.embed(model=MODEL, input=text)
     return response["embeddings"][0]
@@ -16,6 +20,6 @@ def embed_batch(texts: list[str], batch_size: int = 64) -> list[list[float]]:
     vectors = []
     for i in range(0, len(texts), batch_size):
         batch = texts[i : i + batch_size]
-        response = ollama.embed(model=MODEL, input=batch)
+        response = _client.embed(model=MODEL, input=batch)
         vectors.extend(response["embeddings"])
     return vectors
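
Note on `embed_batch` above: the loop slices the input into chunks of at most `batch_size` and makes one embed call per chunk, so the 30-second client timeout introduced in this version applies per batch rather than to the whole input. The sketch below shows the slicing with a stand-in embedder instead of Ollama. `embed_in_batches` and `fake_embed` are hypothetical names, and the fake vectors carry no meaning.

```python
from typing import Callable


def embed_in_batches(
    texts: list[str],
    embed_fn: Callable[[list[str]], list[list[float]]],
    batch_size: int = 64,
) -> list[list[float]]:
    """Call embed_fn once per slice of at most batch_size texts."""
    vectors: list[list[float]] = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i : i + batch_size]
        vectors.extend(embed_fn(batch))
    return vectors


calls: list[int] = []  # records the size of each batch the fake embedder sees


def fake_embed(batch: list[str]) -> list[list[float]]:
    """Stand-in embedder: one 1-dimensional 'vector' per text (its length)."""
    calls.append(len(batch))
    return [[float(len(text))] for text in batch]


vectors = embed_in_batches(["a", "bb", "ccc", "dddd", "eeeee"], fake_embed, batch_size=2)
```

With five inputs and a batch size of two, the stand-in is called three times, and the output order matches the input order.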
