PyPI - neuroweave-python - Versions diffs - 0.1.1__tar.gz → 0.2.0__tar.gz - Mend

neuroweave-python 0.1.1__tar.gz → 0.2.0__tar.gz


Files changed (37)

{neuroweave_python-0.1.1 → neuroweave_python-0.2.0}/CHANGELOG.md RENAMED

@@ -5,6 +5,82 @@ All notable changes to NeuroWeave will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+## [0.2.0] — 2026-04-03
+### Summary
+Major feature release adding persistent storage backends, scientific knowledge graph
+support, bulk document ingestion, vector search integration, and cross-session
+entity deduplication.
+### Added
+**NW-001 — Persistent Graph Backend (Neo4j)**
+- `AbstractGraphStore` ABC — common interface for all graph backends.
+- `MemoryGraphStore` — existing in-memory backend, now extends `AbstractGraphStore`.
+- `Neo4jGraphStore` — persistent graph backend using Neo4j (optional dependency).
+- `_build_graph_store()` factory in API — selects backend from `graph_backend` config.
+- Neo4j config fields: `neo4j_uri`, `neo4j_user`, `neo4j_password`, `neo4j_database`.
+- `GraphBackend` enum extended with `NEO4J` and `POSTGRESQL` (reserved).
+**NW-002 — Scientific Entity Schema**
+- 12 new `NodeType` values: `THEOREM`, `LEMMA`, `CONJECTURE`, `PROOF`, `DEFINITION`,
+  `EXAMPLE`, `PAPER`, `AUTHOR`, `DOMAIN`, `MATH_OBJECT`, `OPEN_PROBLEM`, `ALGORITHM`.
+- `RelationType` enum with 18 typed scientific relations (e.g. `PROVES`, `CITES`,
+  `FOLLOWS_FROM`, `BELONGS_TO`).
+- Scientific extraction prompt (`_SCIENTIFIC_SYSTEM_PROMPT`) for mathematical text.
+- `ExtractionPipeline` now accepts `mode` parameter (`"general"` | `"scientific"`).
+- `query_by_type()` — query all nodes of a given type with optional relation filter.
+- `get_proof_chain()` — traverse theorem dependency chains.
+- `get_domain_graph()` — retrieve all entities belonging to a mathematical domain.
+- `extraction_mode` config field.
+**NW-003 — Bulk Document Ingestion**
+- `DocumentIngester` — chunks full documents and extracts concurrently.
+- `ChunkStrategy` enum: `PARAGRAPH`, `FIXED`, `SECTION`, `SENTENCE`.
+- `DocumentIngestionResult` — result with entity/relation counts and timing.
+- `NeuroWeave.ingest_document()` facade method.
+- Short chunk merging to avoid tiny extraction windows.
+**NW-004 — Qdrant Integration Bridge**
+- `QdrantBridge` — combines graph traversal with Qdrant vector similarity search.
+- `VectorContextResult` — merged result from graph + vector with deduplicated names.
+- `NeuroWeave.get_context_with_vectors()` facade method.
+- Concurrent graph + vector search via `asyncio.gather()`.
+- `upsert_node_vectors()` — store node embeddings in Qdrant.
+- Optional dependency: `qdrant-client>=1.9`.
+**NW-005 — Node Merge / Deduplication**
+- Cross-session entity deduplication via `_resolve_entity_name()`.
+- `update_node_properties()` — merge new properties into existing nodes (new wins).
+- Property merging on entity reuse during ingestion.
+- `NODE_UPDATED` events emitted on property merge.
+**NW-006 — Configuration & Exports**
+- All new public symbols exported from `neuroweave.__init__` and `__all__`.
+- Updated `config/default.yaml` with all new fields.
+- Optional dependency groups: `neo4j`, `qdrant`.
+### Changed
+- `ExtractionPipeline.__init__` now accepts `mode` and `confidence_threshold` parameters.
+- `ingest_extraction()` uses cross-session dedup (queries persistent store).
+- Entity type mapping extended with all scientific types.
+### Testing
+- 377 tests total (313 original + 64 new) across 20 test files.
+- New test files: `test_neo4j_backend.py`, `test_scientific_schema.py`,
+  `test_document_ingestion.py`, `test_qdrant_bridge.py`, `test_deduplication.py`.
+---
 ## [0.1.0] — 2026-02-17
 ### Summary
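
Note on NW-003: the changelog mentions paragraph chunking and short-chunk merging, but `neuroweave/ingest/document.py` is not part of the hunks shown here. As a rough illustration only, the idea might look like the sketch below. `chunk_paragraphs` and `min_chars` are hypothetical names chosen for this example, not the package's API.

```python
import re


def chunk_paragraphs(text: str, min_chars: int = 200) -> list[str]:
    """Split text on blank lines, folding short paragraphs into a neighbour.

    Hypothetical helper: the name and the min_chars threshold are assumptions,
    not NeuroWeave's actual implementation.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks: list[str] = []
    buffer = ""
    for para in paragraphs:
        # Keep accumulating paragraphs until the buffer is long enough.
        buffer = f"{buffer}\n\n{para}" if buffer else para
        if len(buffer) >= min_chars:
            chunks.append(buffer)
            buffer = ""
    if buffer:
        # Trailing short remainder: attach it to the last chunk if one exists.
        if chunks:
            chunks[-1] = f"{chunks[-1]}\n\n{buffer}"
        else:
            chunks.append(buffer)
    return chunks
```

Merging forward keeps every extraction window above the minimum size, at the cost of occasionally joining unrelated paragraphs.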

{neuroweave_python-0.1.1 → neuroweave_python-0.2.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: neuroweave-python
-Version: 0.1.1
+Version: 0.2.0
 Summary: Real-time knowledge graph memory for agentic AI platforms
 Project-URL: Homepage, https://github.com/alexh-scrt/neuroweave
 Project-URL: Documentation, https://neuroweave.readthedocs.io
@@ -45,6 +45,10 @@ Requires-Dist: mkdocs-material>=9.5; extra == 'docs'
 Requires-Dist: mkdocs-section-index>=0.3; extra == 'docs'
 Requires-Dist: mkdocs>=1.6; extra == 'docs'
 Requires-Dist: mkdocstrings[python]>=0.27; extra == 'docs'
+Provides-Extra: neo4j
+Requires-Dist: neo4j>=5.0; extra == 'neo4j'
+Provides-Extra: qdrant
+Requires-Dist: qdrant-client>=1.9; extra == 'qdrant'
 Description-Content-Type: text/markdown
 <p align="center">

neuroweave_python-0.2.0/config/default.yaml ADDED

@@ -0,0 +1,23 @@
+# NeuroWeave default configuration
+# Override any field via environment variable: NEUROWEAVE_{FIELD}
+llm_provider: "anthropic"
+llm_model: "claude-haiku-4-5-20251001"
+# llm_api_key: set via NEUROWEAVE_LLM_API_KEY or ANTHROPIC_API_KEY
+extraction_enabled: true
+extraction_confidence_threshold: 0.3
+extraction_mode: "general"               # "general" | "scientific"
+graph_backend: "memory"                  # "memory" | "neo4j" | "postgresql"
+neo4j_uri: "neo4j://localhost:7687"
+neo4j_user: "neo4j"
+neo4j_password: ""
+neo4j_database: "neo4j"
+server_host: "127.0.0.1"
+server_port: 8787
+log_level: "INFO"
+log_format: "console"                    # "console" | "json"
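
Note on the `NEUROWEAVE_{FIELD}` override convention: the real loading is done by `NeuroWeaveConfig(BaseSettings)`, whose settings are not visible in this diff. The sketch below only illustrates the overlay idea with the standard library. `resolve_config` and the trimmed `DEFAULTS` dict are hypothetical, and the upper-cased field name is an assumption based on the `NEUROWEAVE_LLM_API_KEY` comment above.

```python
from __future__ import annotations

import os

# A small subset of the defaults from config/default.yaml.
DEFAULTS = {
    "graph_backend": "memory",
    "neo4j_uri": "neo4j://localhost:7687",
    "server_port": 8787,
}


def resolve_config(defaults: dict, environ: dict | None = None) -> dict:
    """Overlay NEUROWEAVE_<FIELD> environment variables on top of defaults."""
    env = os.environ if environ is None else environ
    resolved = dict(defaults)
    for field, default in defaults.items():
        raw = env.get(f"NEUROWEAVE_{field.upper()}")
        if raw is None:
            continue
        # Coerce to the default's type so ports stay integers.
        # (Only str and int defaults are handled in this sketch.)
        resolved[field] = type(default)(raw)
    return resolved


overrides = {"NEUROWEAVE_GRAPH_BACKEND": "neo4j", "NEUROWEAVE_SERVER_PORT": "9000"}
config = resolve_config(DEFAULTS, overrides)
```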

{neuroweave_python-0.1.1 → neuroweave_python-0.2.0}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 [project]
 name = "neuroweave-python"
-version = "0.1.1"
+version = "0.2.0"
 description = "Real-time knowledge graph memory for agentic AI platforms"
 readme = "README.md"
 license = "Apache-2.0"
@@ -49,6 +49,8 @@ dependencies = [
 ]
 [project.optional-dependencies]
+neo4j = ["neo4j>=5.0"]
+qdrant = ["qdrant-client>=1.9"]
 dev = [
     "pytest>=9.0.2",
     "pytest-asyncio>=1.3.0",

neuroweave_python-0.2.0/src/neuroweave/__init__.py ADDED

@@ -0,0 +1,26 @@
+"""NeuroWeave — Real-time knowledge graph memory for agentic AI platforms."""
+__version__ = "0.2.0"
+from neuroweave.api import ContextResult, EventType, NeuroWeave, ProcessResult
+from neuroweave.graph.query import QueryResult, get_domain_graph, get_proof_chain, query_by_type
+from neuroweave.graph.store import NodeType, RelationType
+from neuroweave.ingest.document import ChunkStrategy, DocumentIngestionResult
+from neuroweave.vector.qdrant_bridge import QdrantBridge, VectorContextResult
+__all__ = [
+    "ChunkStrategy",
+    "ContextResult",
+    "DocumentIngestionResult",
+    "EventType",
+    "NeuroWeave",
+    "NodeType",
+    "ProcessResult",
+    "QdrantBridge",
+    "QueryResult",
+    "RelationType",
+    "VectorContextResult",
+    "get_domain_graph",
+    "get_proof_chain",
+    "query_by_type",
+]

{neuroweave_python-0.1.1 → neuroweave_python-0.2.0}/src/neuroweave/api.py RENAMED

@@ -21,7 +21,7 @@ from typing import Any, Awaitable, Callable
 import uvicorn
-from neuroweave.config import LLMProvider, LogFormat, NeuroWeaveConfig
+from neuroweave.config import GraphBackend, LLMProvider, LogFormat, NeuroWeaveConfig
 from neuroweave.events import EventBus
 from neuroweave.extraction.llm_client import (
     AnthropicLLMClient,
@@ -235,8 +235,12 @@ class NeuroWeave:
         # Core components
         llm_client = _create_llm_client(self._config)
-        self._store = GraphStore()
-        self._pipeline = ExtractionPipeline(llm_client)
+        self._store = _build_graph_store(self._config)
+        self._pipeline = ExtractionPipeline(
+            llm_client,
+            mode=self._config.extraction_mode,
+            confidence_threshold=self._config.extraction_confidence_threshold,
+        )
         self._event_bus = EventBus()
         self._nl_planner = NLQueryPlanner(llm_client, self._store)
@@ -380,6 +384,77 @@ class NeuroWeave:
             plan=plan,
         )
+    # -- Bulk ingestion -----------------------------------------------------
+    async def ingest_document(
+        self,
+        text: str,
+        doc_type: str = "paper",
+        metadata: dict[str, Any] | None = None,
+        chunk_strategy: str = "paragraph",
+        concurrent_chunks: int = 5,
+    ) -> Any:
+        """Ingest a full document, chunking and extracting concurrently.
+        Usage:
+            result = await nw.ingest_document(
+                text=full_paper_text,
+                doc_type="paper",
+                metadata={"title": "...", "doi": "...", "year": 2025},
+            )
+            print(f"Extracted {result.total_entities} entities from {result.chunk_count} chunks")
+        """
+        self._ensure_started()
+        from neuroweave.ingest.document import ChunkStrategy, DocumentIngester
+        strategy = ChunkStrategy(chunk_strategy)
+        ingester = DocumentIngester(
+            pipeline=self._pipeline,  # type: ignore[arg-type]
+            store=self._store,  # type: ignore[arg-type]
+            chunk_strategy=strategy,
+            concurrent_chunks=concurrent_chunks,
+        )
+        return await ingester.ingest_document(text, doc_type=doc_type, metadata=metadata)
+    # -- Vector context -----------------------------------------------------
+    async def get_context_with_vectors(
+        self,
+        query: str,
+        query_vector: list[float],
+        qdrant_client: Any,
+        collection: str = "ravennest_papers",
+        top_k: int = 10,
+        graph_hops: int = 2,
+        qdrant_filter: dict[str, Any] | None = None,
+    ) -> Any:
+        """Combined graph + vector search. Requires qdrant-client to be installed.
+        Usage:
+            from qdrant_client import AsyncQdrantClient
+            client = AsyncQdrantClient(url="http://localhost:6333")
+            result = await nw.get_context_with_vectors(
+                query="chromatic polynomial bounds",
+                query_vector=embedding,
+                qdrant_client=client,
+            )
+        """
+        self._ensure_started()
+        from neuroweave.vector.qdrant_bridge import QdrantBridge
+        bridge = QdrantBridge(
+            store=self._store,  # type: ignore[arg-type]
+            qdrant_client=qdrant_client,
+            collection=collection,
+        )
+        return await bridge.get_context_with_vectors(
+            query=query,
+            query_vector=query_vector,
+            top_k=top_k,
+            graph_hops=graph_hops,
+            qdrant_filter=qdrant_filter,
+        )
     # -- Event subscription -------------------------------------------------
     def subscribe(
@@ -480,6 +555,23 @@ class NeuroWeave:
 # ---------------------------------------------------------------------------
+def _build_graph_store(config: NeuroWeaveConfig) -> GraphStore:
+    """Factory: returns the correct GraphStore implementation."""
+    if config.graph_backend == GraphBackend.NEO4J:
+        from neuroweave.graph.backends.neo4j import Neo4jGraphStore
+        return Neo4jGraphStore(
+            uri=config.neo4j_uri,
+            user=config.neo4j_user,
+            password=config.neo4j_password,
+            database=config.neo4j_database,
+        )  # type: ignore[return-value]
+    # Default: memory
+    from neuroweave.graph.backends.memory import MemoryGraphStore
+    return MemoryGraphStore()  # type: ignore[return-value]
 def _create_llm_client(config: NeuroWeaveConfig) -> LLMClient:
     """Create the appropriate LLM client based on configuration."""
     if config.llm_provider == LLMProvider.MOCK:
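
Note on `_build_graph_store()`: as written in the hunk above, only `GraphBackend.NEO4J` is matched explicitly, so any other value, including the reserved `POSTGRESQL`, falls through to the in-memory store. A stricter variant is sketched below for comparison. `build_store`, `MemoryStore`, and `Neo4jStore` are stand-ins invented for this example so that it runs without the package installed.

```python
from enum import Enum


class GraphBackend(str, Enum):
    MEMORY = "memory"
    NEO4J = "neo4j"
    POSTGRESQL = "postgresql"  # reserved, no implementation in this diff


class MemoryStore:
    """Stand-in for MemoryGraphStore."""


class Neo4jStore:
    """Stand-in for Neo4jGraphStore; only records the URI."""

    def __init__(self, uri: str) -> None:
        self.uri = uri


def build_store(backend: GraphBackend, neo4j_uri: str = "neo4j://localhost:7687"):
    """Hypothetical stricter factory: reject backends with no implementation."""
    if backend is GraphBackend.NEO4J:
        return Neo4jStore(neo4j_uri)
    if backend is GraphBackend.MEMORY:
        return MemoryStore()
    raise NotImplementedError(f"graph backend {backend.value!r} is not implemented")
```

Failing loudly here would surface a misconfigured `graph_backend` at startup instead of silently losing persistence.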

{neuroweave_python-0.1.1 → neuroweave_python-0.2.0}/src/neuroweave/config.py RENAMED

@@ -24,6 +24,8 @@ class LogFormat(str, Enum):
 class GraphBackend(str, Enum):
     MEMORY = "memory"
+    NEO4J = "neo4j"
+    POSTGRESQL = "postgresql"  # reserved for future
 _PROJECT_ROOT = Path(__file__).resolve().parent.parent.parent
@@ -63,10 +65,17 @@ class NeuroWeaveConfig(BaseSettings):
     # --- Extraction ---
     extraction_enabled: bool = True
     extraction_confidence_threshold: float = Field(default=0.3, ge=0.0, le=1.0)
+    extraction_mode: str = "general"  # "general" | "scientific"
     # --- Graph ---
     graph_backend: GraphBackend = GraphBackend.MEMORY
+    # --- Neo4j ---
+    neo4j_uri: str = "neo4j://localhost:7687"
+    neo4j_user: str = "neo4j"
+    neo4j_password: str = ""
+    neo4j_database: str = "neo4j"
     # --- Server ---
     server_host: str = "127.0.0.1"
     server_port: int = Field(default=8787, ge=1024, le=65535)
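
Note on `GraphBackend(str, Enum)`: because the enum mixes in `str`, a raw value such as `"neo4j"` from YAML or the environment maps to a member by value lookup, and unknown values are rejected. By contrast, `extraction_mode` is declared as a plain `str`, so nothing in this hunk validates it. The snippet below redefines the enum locally so it runs on its own.

```python
from enum import Enum


class GraphBackend(str, Enum):
    MEMORY = "memory"
    NEO4J = "neo4j"
    POSTGRESQL = "postgresql"


# Lookup by value returns the singleton member.
backend = GraphBackend("neo4j")
is_neo4j = backend is GraphBackend.NEO4J

# Because the enum mixes in str, members compare equal to their raw value.
equals_raw = backend == "neo4j"

# Unknown values are rejected with ValueError.
try:
    GraphBackend("sqlite")
    rejected = False
except ValueError:
    rejected = True
```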

{neuroweave_python-0.1.1 → neuroweave_python-0.2.0}/src/neuroweave/extraction/pipeline.py RENAMED

@@ -56,7 +56,7 @@ class ExtractionResult:
 # System prompt
 # ---------------------------------------------------------------------------
-EXTRACTION_SYSTEM_PROMPT = """\
+_GENERAL_SYSTEM_PROMPT = """\
 You are a knowledge extraction engine. Your task is to extract entities and \
 relationships from a user's conversational message.
@@ -91,6 +91,48 @@ Respond with ONLY valid JSON in this exact format, no other text:
 }
 """
+# Backward compat alias
+EXTRACTION_SYSTEM_PROMPT = _GENERAL_SYSTEM_PROMPT
+_SCIENTIFIC_SYSTEM_PROMPT = """\
+You are a scientific knowledge extraction system.
+Extract entities and relations from mathematical and scientific text.
+OUTPUT FORMAT — valid JSON only, no surrounding text:
+{
+  "entities": [
+    {
+      "name": "string — canonical name of the entity",
+      "entity_type": "theorem|lemma|conjecture|proof|definition|example|paper|author|domain|math_object|open_problem|algorithm|entity|concept",
+      "properties": {
+        "statement": "formal statement if this is a theorem/lemma/conjecture",
+        "domain": "mathematical subdomain e.g. Graph Theory",
+        "status": "proven|unproven|disproven|open",
+        "year": 2024,
+        "doi": "10.xxxx/yyy if known"
+      }
+    }
+  ],
+  "relations": [
+    {
+      "source": "entity name",
+      "target": "entity name",
+      "relation": "proves|follows_from|uses|contradicts|generalizes|is_special_case|equivalent_to|is_part_of|belongs_to|applies_to|authored_by|published_in|cites|builds_on|verified_by|rejected_by",
+      "confidence": 0.0,
+      "properties": {}
+    }
+  ]
+}
+RULES:
+- Use specific scientific entity types (theorem, lemma, etc.) over generic ones (concept, entity)
+- "statement" property on theorems/lemmas must be the verbatim mathematical statement if present
+- Confidence 0.90-0.99 for explicitly stated facts, 0.50-0.70 for inferred relations
+- Extract the full citation as a PAPER entity if a paper is referenced
+- Empty arrays if no entities or relations are extractable
+- NEVER add explanation or preamble — pure JSON only
+"""
 # ---------------------------------------------------------------------------
 # JSON repair — handles common LLM output issues
@@ -229,8 +271,19 @@ class ExtractionPipeline:
         result = pipeline.extract("My wife's name is Lena")
     """
-    def __init__(self, llm_client: LLMClient) -> None:
+    def __init__(
+        self,
+        llm_client: LLMClient,
+        mode: str = "general",
+        confidence_threshold: float = 0.3,
+    ) -> None:
         self._llm = llm_client
+        self._mode = mode
+        self._threshold = confidence_threshold
+    @property
+    def _system_prompt(self) -> str:
+        return _SCIENTIFIC_SYSTEM_PROMPT if self._mode == "scientific" else _GENERAL_SYSTEM_PROMPT
     async def extract(self, message: str) -> ExtractionResult:
         """Extract entities and relations from a user message.
@@ -246,7 +299,7 @@ class ExtractionPipeline:
         start = time.monotonic()
         try:
-            raw_response = await self._llm.extract(EXTRACTION_SYSTEM_PROMPT, message)
+            raw_response = await self._llm.extract(self._system_prompt, message)
         except LLMError as e:
             log.error("extraction.llm_error", error=str(e))
             return ExtractionResult(
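
Note on `confidence_threshold`: the hunks above store the value in `self._threshold` but do not show where it is applied. One plausible reading is that relations whose `confidence` falls below the threshold are dropped after parsing. The sketch below works under that assumption; `filter_relations` is a hypothetical helper, and the sample payload simply follows the JSON shape given in `_SCIENTIFIC_SYSTEM_PROMPT`.

```python
import json


def filter_relations(raw_json: str, threshold: float = 0.3) -> list[dict]:
    """Parse an extraction response and keep relations at or above threshold.

    Assumption: the threshold applies to each relation's "confidence" field;
    the diff does not show where ExtractionPipeline uses _threshold.
    """
    payload = json.loads(raw_json)
    return [
        rel
        for rel in payload.get("relations", [])
        if float(rel.get("confidence", 0.0)) >= threshold
    ]


sample = json.dumps(
    {
        "entities": [],
        "relations": [
            {"source": "Lemma 2", "target": "Theorem 1", "relation": "proves",
             "confidence": 0.95, "properties": {}},
            {"source": "Theorem 1", "target": "Graph Theory", "relation": "belongs_to",
             "confidence": 0.2, "properties": {}},
        ],
    }
)
kept = filter_relations(sample)
```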

{neuroweave_python-0.1.1 → neuroweave_python-0.2.0}/src/neuroweave/graph/__init__.py RENAMED

@@ -1,7 +1,13 @@
 """Graph storage, ingestion, query engine, and NL query planner."""
 from neuroweave.graph.nl_query import NLQueryPlanner, QueryPlan
-from neuroweave.graph.query import QueryResult, query_subgraph
+from neuroweave.graph.query import (
+    QueryResult,
+    get_domain_graph,
+    get_proof_chain,
+    query_by_type,
+    query_subgraph,
+)
 from neuroweave.graph.store import (
     Edge,
     GraphEvent,
@@ -9,6 +15,7 @@ from neuroweave.graph.store import (
     GraphStore,
     Node,
     NodeType,
+    RelationType,
     make_edge,
     make_node,
 )
@@ -23,7 +30,11 @@ __all__ = [
     "NodeType",
     "QueryPlan",
     "QueryResult",
+    "RelationType",
+    "get_domain_graph",
+    "get_proof_chain",
     "make_edge",
     "make_node",
+    "query_by_type",
     "query_subgraph",
 ]

neuroweave_python-0.2.0/src/neuroweave/graph/backends/__init__.py ADDED

@@ -0,0 +1,7 @@
+"""Graph storage backend implementations."""
+from neuroweave.graph.backends.base import AbstractGraphStore
+from neuroweave.graph.backends.memory import MemoryGraphStore
+from neuroweave.graph.backends.neo4j import Neo4jGraphStore
+__all__ = ["AbstractGraphStore", "MemoryGraphStore", "Neo4jGraphStore"]

neuroweave_python-0.2.0/src/neuroweave/graph/backends/base.py ADDED

@@ -0,0 +1,79 @@
+"""Abstract base for all graph storage backends."""
+from __future__ import annotations
+import abc
+from typing import Any
+from neuroweave.graph.store import Edge, Node
+class AbstractGraphStore(abc.ABC):
+    """Interface contract for all NeuroWeave graph backends.
+    Implementations must be thread-safe for concurrent reads during
+    single-writer access from the main thread.
+    """
+    @abc.abstractmethod
+    def set_event_queue(self, q: Any) -> None:
+        """Attach an event queue. Events are pushed here on mutations."""
+        ...
+    @abc.abstractmethod
+    def add_node(self, node: Node) -> Node:
+        """Add a node. Returns the node (possibly with db-assigned id)."""
+        ...
+    @abc.abstractmethod
+    def get_node(self, node_id: str) -> dict[str, Any] | None:
+        """Return node dict by id, or None if not found."""
+        ...
+    @abc.abstractmethod
+    def find_nodes(
+        self,
+        node_type: str | None = None,
+        name_contains: str | None = None,
+    ) -> list[dict[str, Any]]:
+        """Return all nodes matching the given filters."""
+        ...
+    @abc.abstractmethod
+    def add_edge(self, edge: Edge) -> Edge:
+        """Add a directed edge. Returns the edge."""
+        ...
+    @abc.abstractmethod
+    def get_edges(
+        self,
+        source_id: str | None = None,
+        target_id: str | None = None,
+        relation: str | None = None,
+    ) -> list[dict[str, Any]]:
+        """Return edges matching any combination of source, target, relation."""
+        ...
+    @abc.abstractmethod
+    def get_neighbors(self, node_id: str, depth: int = 1) -> list[dict[str, Any]]:
+        """Return all nodes within `depth` hops of node_id via BFS."""
+        ...
+    @abc.abstractmethod
+    def to_dict(self) -> dict[str, Any]:
+        """Full serialization: {"nodes": [...], "edges": [...], "stats": {...}}."""
+        ...
+    @abc.abstractmethod
+    def update_node_properties(self, node_id: str, properties: dict[str, Any]) -> None:
+        """Merge new properties into an existing node. Existing keys are preserved;
+        new keys are added. Conflicts: new value wins."""
+        ...
+    @property
+    @abc.abstractmethod
+    def node_count(self) -> int: ...
+    @property
+    @abc.abstractmethod
+    def edge_count(self) -> int: ...
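
Note on `get_neighbors()`: the docstring specifies a BFS bounded by `depth` hops, but no implementation appears in this file. A minimal depth-limited BFS over a plain adjacency mapping is sketched below. `neighbors_within` is a hypothetical function; it returns node ids, not the node dicts the interface promises, and it follows edges in one direction only.

```python
from collections import deque


def neighbors_within(adjacency: dict[str, list[str]], start: str, depth: int = 1) -> list[str]:
    """Breadth-first search returning node ids within `depth` hops of `start`.

    The start node itself is excluded. Whether NeuroWeave's backends include
    it, or traverse edges in both directions, is not visible in this diff.
    """
    seen = {start}
    queue = deque([(start, 0)])
    found: list[str] = []
    while queue:
        node, dist = queue.popleft()
        if dist == depth:
            continue  # do not expand past the hop limit
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                found.append(nxt)
                queue.append((nxt, dist + 1))
    return found


graph = {"a": ["b", "c"], "b": ["d"], "c": [], "d": ["e"]}
one_hop = neighbors_within(graph, "a", depth=1)
two_hops = neighbors_within(graph, "a", depth=2)
```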

neuroweave_python-0.2.0/src/neuroweave/graph/backends/memory.py ADDED

@@ -0,0 +1,35 @@
+"""In-memory graph backend — wraps the existing GraphStore as MemoryGraphStore."""
+from __future__ import annotations
+from typing import Any
+from neuroweave.graph.backends.base import AbstractGraphStore
+from neuroweave.graph.store import (
+    GraphEvent,
+    GraphEventType,
+    GraphStore,
+)
+class MemoryGraphStore(GraphStore, AbstractGraphStore):
+    """In-memory graph backend using NetworkX.
+    This is the original GraphStore with the AbstractGraphStore interface.
+    All existing functionality is inherited from GraphStore.
+    """
+    def __init__(self) -> None:
+        GraphStore.__init__(self)
+    def update_node_properties(self, node_id: str, properties: dict[str, Any]) -> None:
+        """Merge new properties into an existing node. New value wins on conflict."""
+        if node_id not in self._graph.nodes:
+            return
+        existing = self._graph.nodes[node_id].get("properties", {})
+        merged = {**existing, **properties}
+        self._graph.nodes[node_id]["properties"] = merged
+        self._emit(GraphEvent(
+            event_type=GraphEventType.NODE_UPDATED,
+            data={"id": node_id, "properties": merged},
+        ))
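
Note on the merge in `update_node_properties()`: `{**existing, **properties}` is a shallow merge in which later mappings win on key conflicts. The example below uses made-up property values to show the effect. Nested dictionaries would be replaced wholesale, not merged, and per the hunk above a missing `node_id` returns silently without emitting an event.

```python
existing = {"domain": "Graph Theory", "status": "unproven"}
incoming = {"status": "proven", "year": 2024}

# Later mappings win on key conflicts; untouched keys carry over.
merged = {**existing, **incoming}
```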
