gnosisllm-knowledge 0.2.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (63)
  1. gnosisllm_knowledge-0.2.0/PKG-INFO +382 -0
  2. gnosisllm_knowledge-0.2.0/README.md +346 -0
  3. gnosisllm_knowledge-0.2.0/pyproject.toml +127 -0
  4. gnosisllm_knowledge-0.2.0/src/gnosisllm_knowledge/__init__.py +152 -0
  5. gnosisllm_knowledge-0.2.0/src/gnosisllm_knowledge/api/__init__.py +5 -0
  6. gnosisllm_knowledge-0.2.0/src/gnosisllm_knowledge/api/knowledge.py +548 -0
  7. gnosisllm_knowledge-0.2.0/src/gnosisllm_knowledge/backends/__init__.py +26 -0
  8. gnosisllm_knowledge-0.2.0/src/gnosisllm_knowledge/backends/memory/__init__.py +9 -0
  9. gnosisllm_knowledge-0.2.0/src/gnosisllm_knowledge/backends/memory/indexer.py +384 -0
  10. gnosisllm_knowledge-0.2.0/src/gnosisllm_knowledge/backends/memory/searcher.py +516 -0
  11. gnosisllm_knowledge-0.2.0/src/gnosisllm_knowledge/backends/opensearch/__init__.py +19 -0
  12. gnosisllm_knowledge-0.2.0/src/gnosisllm_knowledge/backends/opensearch/agentic.py +738 -0
  13. gnosisllm_knowledge-0.2.0/src/gnosisllm_knowledge/backends/opensearch/config.py +195 -0
  14. gnosisllm_knowledge-0.2.0/src/gnosisllm_knowledge/backends/opensearch/indexer.py +499 -0
  15. gnosisllm_knowledge-0.2.0/src/gnosisllm_knowledge/backends/opensearch/mappings.py +255 -0
  16. gnosisllm_knowledge-0.2.0/src/gnosisllm_knowledge/backends/opensearch/queries.py +445 -0
  17. gnosisllm_knowledge-0.2.0/src/gnosisllm_knowledge/backends/opensearch/searcher.py +383 -0
  18. gnosisllm_knowledge-0.2.0/src/gnosisllm_knowledge/backends/opensearch/setup.py +1390 -0
  19. gnosisllm_knowledge-0.2.0/src/gnosisllm_knowledge/chunking/__init__.py +9 -0
  20. gnosisllm_knowledge-0.2.0/src/gnosisllm_knowledge/chunking/fixed.py +138 -0
  21. gnosisllm_knowledge-0.2.0/src/gnosisllm_knowledge/chunking/sentence.py +239 -0
  22. gnosisllm_knowledge-0.2.0/src/gnosisllm_knowledge/cli/__init__.py +18 -0
  23. gnosisllm_knowledge-0.2.0/src/gnosisllm_knowledge/cli/app.py +509 -0
  24. gnosisllm_knowledge-0.2.0/src/gnosisllm_knowledge/cli/commands/__init__.py +7 -0
  25. gnosisllm_knowledge-0.2.0/src/gnosisllm_knowledge/cli/commands/agentic.py +529 -0
  26. gnosisllm_knowledge-0.2.0/src/gnosisllm_knowledge/cli/commands/load.py +369 -0
  27. gnosisllm_knowledge-0.2.0/src/gnosisllm_knowledge/cli/commands/search.py +440 -0
  28. gnosisllm_knowledge-0.2.0/src/gnosisllm_knowledge/cli/commands/setup.py +228 -0
  29. gnosisllm_knowledge-0.2.0/src/gnosisllm_knowledge/cli/display/__init__.py +5 -0
  30. gnosisllm_knowledge-0.2.0/src/gnosisllm_knowledge/cli/display/service.py +555 -0
  31. gnosisllm_knowledge-0.2.0/src/gnosisllm_knowledge/cli/utils/__init__.py +5 -0
  32. gnosisllm_knowledge-0.2.0/src/gnosisllm_knowledge/cli/utils/config.py +207 -0
  33. gnosisllm_knowledge-0.2.0/src/gnosisllm_knowledge/core/__init__.py +87 -0
  34. gnosisllm_knowledge-0.2.0/src/gnosisllm_knowledge/core/domain/__init__.py +43 -0
  35. gnosisllm_knowledge-0.2.0/src/gnosisllm_knowledge/core/domain/document.py +240 -0
  36. gnosisllm_knowledge-0.2.0/src/gnosisllm_knowledge/core/domain/result.py +176 -0
  37. gnosisllm_knowledge-0.2.0/src/gnosisllm_knowledge/core/domain/search.py +327 -0
  38. gnosisllm_knowledge-0.2.0/src/gnosisllm_knowledge/core/domain/source.py +139 -0
  39. gnosisllm_knowledge-0.2.0/src/gnosisllm_knowledge/core/events/__init__.py +23 -0
  40. gnosisllm_knowledge-0.2.0/src/gnosisllm_knowledge/core/events/emitter.py +216 -0
  41. gnosisllm_knowledge-0.2.0/src/gnosisllm_knowledge/core/events/types.py +226 -0
  42. gnosisllm_knowledge-0.2.0/src/gnosisllm_knowledge/core/exceptions.py +407 -0
  43. gnosisllm_knowledge-0.2.0/src/gnosisllm_knowledge/core/interfaces/__init__.py +20 -0
  44. gnosisllm_knowledge-0.2.0/src/gnosisllm_knowledge/core/interfaces/agentic.py +136 -0
  45. gnosisllm_knowledge-0.2.0/src/gnosisllm_knowledge/core/interfaces/chunker.py +64 -0
  46. gnosisllm_knowledge-0.2.0/src/gnosisllm_knowledge/core/interfaces/fetcher.py +112 -0
  47. gnosisllm_knowledge-0.2.0/src/gnosisllm_knowledge/core/interfaces/indexer.py +244 -0
  48. gnosisllm_knowledge-0.2.0/src/gnosisllm_knowledge/core/interfaces/loader.py +102 -0
  49. gnosisllm_knowledge-0.2.0/src/gnosisllm_knowledge/core/interfaces/searcher.py +178 -0
  50. gnosisllm_knowledge-0.2.0/src/gnosisllm_knowledge/core/interfaces/setup.py +164 -0
  51. gnosisllm_knowledge-0.2.0/src/gnosisllm_knowledge/fetchers/__init__.py +12 -0
  52. gnosisllm_knowledge-0.2.0/src/gnosisllm_knowledge/fetchers/config.py +77 -0
  53. gnosisllm_knowledge-0.2.0/src/gnosisllm_knowledge/fetchers/http.py +167 -0
  54. gnosisllm_knowledge-0.2.0/src/gnosisllm_knowledge/fetchers/neoreader.py +204 -0
  55. gnosisllm_knowledge-0.2.0/src/gnosisllm_knowledge/loaders/__init__.py +13 -0
  56. gnosisllm_knowledge-0.2.0/src/gnosisllm_knowledge/loaders/base.py +399 -0
  57. gnosisllm_knowledge-0.2.0/src/gnosisllm_knowledge/loaders/factory.py +202 -0
  58. gnosisllm_knowledge-0.2.0/src/gnosisllm_knowledge/loaders/sitemap.py +285 -0
  59. gnosisllm_knowledge-0.2.0/src/gnosisllm_knowledge/loaders/website.py +57 -0
  60. gnosisllm_knowledge-0.2.0/src/gnosisllm_knowledge/py.typed +0 -0
  61. gnosisllm_knowledge-0.2.0/src/gnosisllm_knowledge/services/__init__.py +9 -0
  62. gnosisllm_knowledge-0.2.0/src/gnosisllm_knowledge/services/indexing.py +387 -0
  63. gnosisllm_knowledge-0.2.0/src/gnosisllm_knowledge/services/search.py +349 -0
@@ -0,0 +1,382 @@
1
+ Metadata-Version: 2.4
2
+ Name: gnosisllm-knowledge
3
+ Version: 0.2.0
4
+ Summary: Enterprise-grade knowledge loading, indexing, and search for Python
5
+ License: MIT
6
+ Keywords: knowledge-base,rag,semantic-search,vector-search,opensearch,llm,embeddings,enterprise
7
+ Author: David Marsa
8
+ Author-email: david.marsa@neomanex.com
9
+ Requires-Python: >=3.11,<4.0
10
+ Classifier: Development Status :: 4 - Beta
11
+ Classifier: Intended Audience :: Developers
12
+ Classifier: License :: OSI Approved :: MIT License
13
+ Classifier: Operating System :: OS Independent
14
+ Classifier: Programming Language :: Python :: 3
15
+ Classifier: Programming Language :: Python :: 3.11
16
+ Classifier: Programming Language :: Python :: 3.12
17
+ Classifier: Programming Language :: Python :: 3.13
18
+ Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
19
+ Classifier: Topic :: Text Processing :: Indexing
20
+ Classifier: Typing :: Typed
21
+ Classifier: Framework :: AsyncIO
22
+ Requires-Dist: aiohttp (>=3.13.2,<4.0.0)
23
+ Requires-Dist: httpx (>=0.28.1,<0.29.0)
24
+ Requires-Dist: opensearch-py (>=3.1.0,<4.0.0)
25
+ Requires-Dist: pydantic (>=2.12.5,<3.0.0)
26
+ Requires-Dist: pydantic-settings (>=2.12.0,<3.0.0)
27
+ Requires-Dist: python-dotenv (>=1.2.1,<2.0.0)
28
+ Requires-Dist: rich (>=13.7.0,<14.0.0)
29
+ Requires-Dist: typer (>=0.12.0,<1.0.0)
30
+ Project-URL: Changelog, https://github.com/gnosisllm/gnosisllm-knowledge/blob/main/CHANGELOG.md
31
+ Project-URL: Documentation, https://gnosisllm-knowledge.readthedocs.io
32
+ Project-URL: Homepage, https://github.com/gnosisllm/gnosisllm-knowledge
33
+ Project-URL: Repository, https://github.com/gnosisllm/gnosisllm-knowledge
34
+ Description-Content-Type: text/markdown
35
+
36
+ # GnosisLLM Knowledge
37
+
38
+ Enterprise-grade knowledge loading, indexing, and semantic search library for Python.
39
+
40
+ ## Features
41
+
42
+ - **Semantic Search**: Vector-based similarity search using OpenAI embeddings
43
+ - **Hybrid Search**: Combine semantic and keyword (BM25) search for best results
44
+ - **Multiple Loaders**: Load content from websites, sitemaps, and files
45
+ - **Intelligent Chunking**: Sentence-aware text splitting with configurable overlap
46
+ - **OpenSearch Backend**: Production-ready with k-NN vector search
47
+ - **Multi-Tenancy**: Built-in support for account and collection isolation
48
+ - **Event-Driven**: Observer pattern for progress tracking and monitoring
49
+ - **SOLID Architecture**: Clean, maintainable, and extensible codebase
50
+
51
+ ## Installation
52
+
53
+ ```bash
54
+ pip install gnosisllm-knowledge
55
+
56
+ # With OpenSearch backend
57
+ pip install "gnosisllm-knowledge[opensearch]"
58
+
59
+ # With all optional dependencies
60
+ pip install "gnosisllm-knowledge[all]"
61
+ ```
62
+
63
+ ## Quick Start (CLI)
64
+
65
+ ```bash
66
+ # Install
67
+ pip install gnosisllm-knowledge
68
+
69
+ # Set OpenAI API key for embeddings
70
+ export OPENAI_API_KEY=sk-...
71
+
72
+ # Setup OpenSearch with ML model
73
+ gnosisllm-knowledge setup --host localhost --port 9200
74
+ # ✓ Created connector, model, pipelines, index
75
+ # Model ID: abc123 → Add to .env: OPENSEARCH_MODEL_ID=abc123
76
+
77
+ export OPENSEARCH_MODEL_ID=abc123
78
+
79
+ # Load content from a sitemap
80
+ gnosisllm-knowledge load https://docs.example.com/sitemap.xml
81
+ # ✓ Loaded 247 documents (1,248 chunks) in 45.3s
82
+
83
+ # Search
84
+ gnosisllm-knowledge search "how to configure authentication"
85
+ # Found 42 results (23.4ms)
86
+ # 1. Authentication Guide (92.3%)
87
+ # To configure authentication, set AUTH_PROVIDER...
88
+
89
+ # Interactive search mode
90
+ gnosisllm-knowledge search --interactive
91
+ ```
92
+
93
+ ## Quick Start (Python API)
94
+
95
+ ```python
96
+ from gnosisllm_knowledge import Knowledge
97
+
98
+ # Create instance with OpenSearch backend
99
+ knowledge = Knowledge.from_opensearch(
100
+ host="localhost",
101
+ port=9200,
102
+ )
103
+
104
+ # Setup backend (creates indices)
105
+ await knowledge.setup()
106
+
107
+ # Load and index a sitemap
108
+ await knowledge.load(
109
+ "https://docs.example.com/sitemap.xml",
110
+ collection_id="docs",
111
+ )
112
+
113
+ # Search
114
+ results = await knowledge.search("how to configure authentication")
115
+ for item in results.items:
116
+ print(f"{item.title}: {item.score}")
117
+ ```
118
+
119
+ ## CLI Commands
120
+
121
+ ### Setup
122
+
123
+ Configure OpenSearch with neural search capabilities:
124
+
125
+ ```bash
126
+ gnosisllm-knowledge setup [OPTIONS]
127
+
128
+ Options:
129
+ --host OpenSearch host (default: localhost)
130
+ --port OpenSearch port (default: 9200)
131
+ --use-ssl Enable SSL connection
132
+ --force Clean up existing resources first
133
+ --no-hybrid Skip hybrid search pipeline
134
+ ```
135
+
136
+ ### Load
137
+
138
+ Load and index content from URLs or sitemaps:
139
+
140
+ ```bash
141
+ gnosisllm-knowledge load <URL> [OPTIONS]
142
+
143
+ Options:
144
+ --type Source type: website, sitemap (auto-detects)
145
+ --index Target index name (default: knowledge)
146
+ --account-id Multi-tenant account ID
147
+ --collection-id Collection grouping ID
148
+ --batch-size Documents per batch (default: 100)
149
+ --max-urls Max URLs from sitemap (default: 1000)
150
+ --dry-run Preview without indexing
151
+ ```
152
+
153
+ ### Search
154
+
155
+ Search indexed content with multiple modes:
156
+
157
+ ```bash
158
+ gnosisllm-knowledge search <QUERY> [OPTIONS]
159
+
160
+ Options:
161
+ --mode Search mode: semantic, keyword, hybrid, agentic
162
+ --index Index to search (default: knowledge)
163
+ --limit Max results (default: 5)
164
+ --account-id Filter by account
165
+ --collection-ids Filter by collections (comma-separated)
166
+ --json Output as JSON for scripting
167
+ --interactive Interactive search session
168
+ ```
169
+
170
+ ## Architecture
171
+
172
+ ```
173
+ gnosisllm-knowledge/
174
+ ├── api/ # High-level Knowledge facade
175
+ ├── core/
176
+ │ ├── domain/ # Document, SearchQuery, SearchResult models
177
+ │ ├── interfaces/ # Protocol definitions (IContentLoader, etc.)
178
+ │ ├── events/ # Event system for progress tracking
179
+ │ └── exceptions.py # Exception hierarchy
180
+ ├── loaders/ # Content loaders (website, sitemap)
181
+ ├── fetchers/ # Content fetchers (HTTP, Neoreader)
182
+ ├── chunking/ # Text chunking strategies
183
+ ├── backends/
184
+ │ ├── opensearch/ # OpenSearch implementation
185
+ │ └── memory/ # In-memory backend for testing
186
+ └── services/ # Indexing and search orchestration
187
+ ```
188
+
189
+ ## Search Modes
190
+
191
+ ```python
192
+ from gnosisllm_knowledge import SearchMode
193
+
194
+ # Semantic search (vector similarity)
195
+ results = await knowledge.search(query, mode=SearchMode.SEMANTIC)
196
+
197
+ # Keyword search (BM25)
198
+ results = await knowledge.search(query, mode=SearchMode.KEYWORD)
199
+
200
+ # Hybrid search (default - combines both)
201
+ results = await knowledge.search(query, mode=SearchMode.HYBRID)
202
+ ```
203
+
204
+ ## Agentic Search
205
+
206
+ AI-powered search with reasoning and natural language answers using OpenSearch ML agents.
207
+
208
+ **Requirements:** OpenSearch 3.4+ for conversational memory support.
209
+
210
+ ### Setup
211
+
212
+ ```bash
213
+ # 1. First run standard setup (creates embedding model)
214
+ gnosisllm-knowledge setup --port 9201
215
+
216
+ # 2. Setup agentic agents (creates LLM connector, VectorDBTool, MLModelTool, agents)
217
+ gnosisllm-knowledge agentic setup
218
+ # ✓ Flow Agent ID: abc123
219
+ # ✓ Conversational Agent ID: def456
220
+
221
+ # 3. Add agent IDs to environment
222
+ export OPENSEARCH_FLOW_AGENT_ID=abc123
223
+ export OPENSEARCH_CONVERSATIONAL_AGENT_ID=def456
224
+ ```
225
+
226
+ ### Usage
227
+
228
+ ```bash
229
+ # Single-turn agentic search (uses flow agent)
230
+ gnosisllm-knowledge search --mode agentic "What is Typer?"
231
+
232
+ # Interactive multi-turn chat (uses conversational agent with memory)
233
+ gnosisllm-knowledge agentic chat
234
+ # You: What is Typer?
235
+ # Assistant: Typer is a library for building CLI applications...
236
+ # You: What did you just say about it?
237
+ # Assistant: I told you that Typer is a library for building CLI...
238
+ ```
239
+
240
+ ### How It Works
241
+
242
+ ```
243
+ ┌─────────────────────────────────────────────────────────────────────┐
244
+ │ User Query │
245
+ │ "What is Typer?" │
246
+ └─────────────────────────────────────────────────────────────────────┘
247
+
248
+
249
+ ┌─────────────────────────────────────────────────────────────────────┐
250
+ │ OpenSearch ML Agent │
251
+ │ (Flow or Conversational) │
252
+ └─────────────────────────────────────────────────────────────────────┘
253
+
254
+ ┌─────────────────┴─────────────────┐
255
+ ▼ ▼
256
+ ┌───────────────────────┐ ┌───────────────────────┐
257
+ │ VectorDBTool │ │ Conversation Memory │
258
+ │ (Knowledge Search) │ │ (Conversational │
259
+ │ │ │ Agent Only) │
260
+ │ - Searches index │ │ │
261
+ │ - Returns context │ │ - Stores Q&A pairs │
262
+ │ │ │ - Injects chat_history│
263
+ └───────────────────────┘ └───────────────────────┘
264
+ │ │
265
+ └─────────────────┬─────────────────┘
266
+
267
+ ┌─────────────────────────────────────────────────────────────────────┐
268
+ │ MLModelTool (answer_generator) │
269
+ │ │
270
+ │ Prompt Template: │
271
+ │ ┌────────────────────────────────────────────────────────────────┐ │
272
+ │ │ Context from knowledge base: │ │
273
+ │ │ ${parameters.knowledge_search.output} │ │
274
+ │ │ │ │
275
+ │ │ Previous conversation: ← Only for conversational agent │ │
276
+ │ │ ${parameters.chat_history:-} │ │
277
+ │ │ │ │
278
+ │ │ Question: ${parameters.question} │ │
279
+ │ └────────────────────────────────────────────────────────────────┘ │
280
+ └─────────────────────────────────────────────────────────────────────┘
281
+
282
+
283
+ ┌─────────────────────────────────────────────────────────────────────┐
284
+ │ AI-Generated Answer │
285
+ │ "Typer is a library for building CLI applications in Python..." │
286
+ └─────────────────────────────────────────────────────────────────────┘
287
+ ```
288
+
289
+ ### Agent Types
290
+
291
+ | Agent | Type | Use Case | Memory |
292
+ |-------|------|----------|--------|
293
+ | Flow | `flow` | Fast single-turn RAG, API calls | No |
294
+ | Conversational | `conversational_flow` | Multi-turn dialogue, chat | Yes |
295
+
296
+ ### Key Configuration
297
+
298
+ The conversational agent requires these settings for memory to work:
299
+
300
+ ```python
301
+ # In agent registration (setup.py)
302
+ agent_body = {
303
+ "type": "conversational_flow",
304
+ "app_type": "rag", # Required for memory injection
305
+ "llm": {
306
+ "model_id": llm_model_id,
307
+ "parameters": {
308
+ "message_history_limit": 10, # Include last N messages
309
+ },
310
+ },
311
+ "memory": {"type": "conversation_index"},
312
+ }
313
+
314
+ # MLModelTool prompt must include:
315
+ # ${parameters.chat_history:-} ← Receives conversation history
316
+ ```
317
+
318
+ ## Multi-Tenancy
319
+
320
+ ```python
321
+ # Load with tenant isolation
322
+ await knowledge.load(
323
+ source="https://docs.example.com/sitemap.xml",
324
+ account_id="tenant-123",
325
+ collection_id="docs",
326
+ )
327
+
328
+ # Search within tenant
329
+ results = await knowledge.search(
330
+ "query",
331
+ account_id="tenant-123",
332
+ collection_ids=["docs"],
333
+ )
334
+ ```
335
+
336
+ ## Event Tracking
337
+
338
+ ```python
339
+ from gnosisllm_knowledge import EventType
340
+
341
+ # Subscribe to events
342
+ @knowledge.events.on(EventType.DOCUMENT_INDEXED)
343
+ def on_indexed(event):
344
+ print(f"Indexed: {event.document_id}")
345
+
346
+ @knowledge.events.on(EventType.BATCH_COMPLETED)
347
+ def on_batch(event):
348
+ print(f"Batch complete: {event.documents_indexed} docs")
349
+ ```
350
+
351
+ ## Configuration
352
+
353
+ ```python
354
+ from gnosisllm_knowledge import OpenSearchConfig
355
+
356
+ # From environment variables
357
+ config = OpenSearchConfig.from_env()
358
+
359
+ # Explicit configuration
360
+ config = OpenSearchConfig(
361
+ host="search.example.com",
362
+ port=443,
363
+ use_ssl=True,
364
+ username="admin",
365
+ password="secret",
366
+ embedding_model="text-embedding-3-small",
367
+ embedding_dimension=1536,
368
+ )
369
+
370
+ knowledge = Knowledge.from_opensearch(config=config)
371
+ ```
372
+
373
+ ## Requirements
374
+
375
+ - Python 3.11+
376
+ - OpenSearch 2.0+ (for production use)
377
+ - OpenSearch 3.4+ (for agentic search with conversation memory)
378
+
379
+ ## License
380
+
381
+ MIT
382
+