dao-ai 0.1.18__tar.gz → 0.1.19__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {dao_ai-0.1.18 → dao_ai-0.1.19}/PKG-INFO +3 -2
- {dao_ai-0.1.18 → dao_ai-0.1.19}/README.md +2 -1
- {dao_ai-0.1.18 → dao_ai-0.1.19}/config/examples/04_genie/README.md +75 -15
- dao_ai-0.1.19/config/examples/04_genie/genie_in_memory_semantic_cache.yaml +148 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/config/examples/README.md +2 -1
- {dao_ai-0.1.18 → dao_ai-0.1.19}/docs/examples.md +3 -2
- {dao_ai-0.1.18 → dao_ai-0.1.19}/docs/key-capabilities.md +69 -5
- {dao_ai-0.1.18 → dao_ai-0.1.19}/pyproject.toml +1 -1
- {dao_ai-0.1.18 → dao_ai-0.1.19}/src/dao_ai/config.py +99 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/src/dao_ai/genie/cache/__init__.py +2 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/src/dao_ai/genie/cache/core.py +1 -1
- dao_ai-0.1.19/src/dao_ai/genie/cache/in_memory_semantic.py +871 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/src/dao_ai/genie/cache/lru.py +15 -11
- {dao_ai-0.1.18 → dao_ai-0.1.19}/src/dao_ai/genie/cache/semantic.py +52 -18
- {dao_ai-0.1.18 → dao_ai-0.1.19}/src/dao_ai/tools/genie.py +28 -3
- {dao_ai-0.1.18 → dao_ai-0.1.19}/tests/dao_ai/test_genie.py +8 -9
- dao_ai-0.1.19/tests/dao_ai/test_in_memory_semantic_cache.py +1144 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/.gitignore +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/.python-version +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/CHANGELOG.md +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/CONTRIBUTING.md +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/CONTRIBUTORS.md +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/LICENSE +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/Makefile +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/app.yaml +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/config/examples/01_getting_started/README.md +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/config/examples/01_getting_started/minimal.yaml +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/config/examples/02_mcp/README.md +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/config/examples/02_mcp/custom_mcp.yaml +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/config/examples/02_mcp/external_mcp.yaml +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/config/examples/02_mcp/filtered_mcp.yaml +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/config/examples/02_mcp/managed_mcp.yaml +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/config/examples/02_mcp/slack_integration.yaml +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/config/examples/03_reranking/README.md +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/config/examples/03_reranking/instruction_aware_reranking.yaml +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/config/examples/03_reranking/vector_search_with_reranking.yaml +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/config/examples/04_genie/genie_basic.yaml +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/config/examples/04_genie/genie_lru_cache.yaml +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/config/examples/04_genie/genie_semantic_cache.yaml +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/config/examples/04_genie/genie_with_conversation_id.yaml +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/config/examples/05_memory/README.md +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/config/examples/05_memory/conversation_summarization.yaml +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/config/examples/05_memory/in_memory_basic.yaml +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/config/examples/05_memory/lakebase_persistence.yaml +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/config/examples/05_memory/postgres_persistence.yaml +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/config/examples/06_on_behalf_of_user/README.md +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/config/examples/06_on_behalf_of_user/obo_basic.yaml +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/config/examples/07_human_in_the_loop/README.md +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/config/examples/07_human_in_the_loop/human_in_the_loop.yaml +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/config/examples/08_guardrails/README.md +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/config/examples/08_guardrails/guardrails_basic.yaml +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/config/examples/09_structured_output/README.md +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/config/examples/09_structured_output/structured_output.yaml +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/config/examples/10_agent_integrations/README.md +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/config/examples/10_agent_integrations/agent_bricks.yaml +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/config/examples/10_agent_integrations/kasal.yaml +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/config/examples/11_prompt_engineering/README.md +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/config/examples/11_prompt_engineering/prompt_optimization.yaml +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/config/examples/11_prompt_engineering/prompt_registry.yaml +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/config/examples/12_middleware/README.md +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/config/examples/12_middleware/combined_middleware.yaml +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/config/examples/12_middleware/context_management.yaml +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/config/examples/12_middleware/custom_field_validation.yaml +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/config/examples/12_middleware/limit_middleware.yaml +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/config/examples/12_middleware/logging_middleware.yaml +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/config/examples/12_middleware/pii_middleware.yaml +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/config/examples/12_middleware/retry_middleware.yaml +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/config/examples/12_middleware/tool_selector_middleware.yaml +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/config/examples/13_orchestration/README.md +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/config/examples/13_orchestration/supervisor_pattern.yaml +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/config/examples/13_orchestration/swarm_pattern.yaml +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/config/examples/14_basic_tools/README.md +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/config/examples/14_basic_tools/sql_tool_example.yaml +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/config/examples/15_complete_applications/README.md +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/config/examples/15_complete_applications/brick_store.yaml +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/config/examples/15_complete_applications/deep_research.yaml +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/config/examples/15_complete_applications/executive_assistant.yaml +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/config/examples/15_complete_applications/genie_and_genie_mcp.yaml +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/config/examples/15_complete_applications/genie_vector_search_hybrid.yaml +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/config/examples/15_complete_applications/hardware_store.yaml +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/config/examples/15_complete_applications/hardware_store_instructed.yaml +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/config/examples/15_complete_applications/hardware_store_lakebase.yaml +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/config/examples/15_complete_applications/hardware_store_swarm.yaml +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/config/examples/15_complete_applications/quick_serve_restaurant.yaml +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/config/examples/15_complete_applications/reservations_system.yaml +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/config/examples/16_instructed_retriever/README.md +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/config/examples/16_instructed_retriever/full_pipeline.yaml +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/config/examples/16_instructed_retriever/instructed_retriever.yaml +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/data/dais2025/appointments.sql +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/data/dais2025/appointments_data.sql +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/data/dais2025/brand_rep_demo_data.sql +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/data/dais2025/brand_rep_demo_queries.sql +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/data/dais2025/brand_rep_demo_tables.sql +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/data/dais2025/brand_rep_demo_validation.sql +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/data/dais2025/customers.sql +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/data/dais2025/customers_data.sql +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/data/dais2025/dim_stores.sql +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/data/dais2025/dim_stores_data.sql +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/data/dais2025/employee_performance.sql +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/data/dais2025/employee_performance_data.sql +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/data/dais2025/employee_tasks.sql +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/data/dais2025/employee_tasks_data.sql +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/data/dais2025/inventory.sql +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/data/dais2025/inventory_data.sql +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/data/dais2025/managers.sql +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/data/dais2025/managers_data.sql +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/data/dais2025/product_data.sql +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/data/dais2025/products.sql +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/data/dais2025/task_assignments.sql +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/data/hardware_store/inventory.snappy.parquet +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/data/hardware_store/inventory.sql +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/data/hardware_store/products.snappy.parquet +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/data/hardware_store/products.sql +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/data/quick_serve_restaurant/.gitkeep +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/data/quick_serve_restaurant/fulfil_item_orders.sql +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/data/quick_serve_restaurant/items_description.csv +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/data/quick_serve_restaurant/items_description.sql +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/data/quick_serve_restaurant/items_raw.csv +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/data/quick_serve_restaurant/items_raw.sql +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/data/quick_serve_restaurant/orders_raw.csv +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/data/quick_serve_restaurant/orders_raw.sql +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/databricks.yaml.template +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/docs/architecture.md +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/docs/cli-reference.md +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/docs/configuration-reference.md +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/docs/contributing.md +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/docs/faq.md +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/docs/hardware_store/README.md +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/docs/hardware_store/retail_supervisor.png +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/docs/hardware_store/retail_swarm.png +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/docs/images/genie.png +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/docs/python-api.md +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/docs/quick_serve_restaurant/.gitkeep +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/docs/quick_serve_restaurant/quick-serve-restaurant.png +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/docs/why-dao.md +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/environment.yaml +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/examples/dais2025/examples.yaml +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/examples/deep_research/examples.yaml +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/examples/executive_assistant/examples.yaml +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/examples/hardware_store/examples.yaml +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/examples/quick_serve_restaurant/.gitkeep +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/examples/quick_serve_restaurant/examples.yaml +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/functions/dais2025/extract_store_numbers.sql +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/functions/dais2025/find_inventory_by_sku.sql +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/functions/dais2025/find_inventory_by_upc.sql +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/functions/dais2025/find_product_by_sku.sql +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/functions/dais2025/find_product_by_upc.sql +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/functions/dais2025/find_store_by_number.sql +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/functions/dais2025/find_store_inventory_by_sku.sql +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/functions/dais2025/find_store_inventory_by_upc.sql +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/functions/hardware_store/find_inventory_by_sku.sql +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/functions/hardware_store/find_inventory_by_upc.sql +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/functions/hardware_store/find_product_by_sku.sql +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/functions/hardware_store/find_product_by_upc.sql +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/functions/hardware_store/find_store_inventory_by_sku.sql +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/functions/hardware_store/find_store_inventory_by_upc.sql +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/functions/quick_serve_restaurant/.gitkeep +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/functions/quick_serve_restaurant/insert_coffee_order.sql +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/functions/quick_serve_restaurant/lookup_items_by_descriptions.sql +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/functions/quick_serve_restaurant/match_historical_item_order_by_date.sql +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/functions/quick_serve_restaurant/match_item_by_description_and_price.sql +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/notebooks/01_ingest_and_transform.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/notebooks/02_provision_vector_search.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/notebooks/03_provision_lakebase.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/notebooks/04_unity_catalog_tools.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/notebooks/05_deploy_agent.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/notebooks/06_generate_evaluation_data.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/notebooks/07_run_evaluation.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/notebooks/08_run_examples.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/notebooks/09_evaluate_inferences.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/notebooks/10_optimize_prompts.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/notebooks/99_scratchpad.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/requirements.txt +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/schemas/bundle_config_schema.json +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/schemas/model_config_schema.json +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/src/dao_ai/__init__.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/src/dao_ai/apps/__init__.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/src/dao_ai/apps/handlers.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/src/dao_ai/apps/model_serving.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/src/dao_ai/apps/resources.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/src/dao_ai/apps/server.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/src/dao_ai/catalog.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/src/dao_ai/cli.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/src/dao_ai/evaluation.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/src/dao_ai/genie/__init__.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/src/dao_ai/genie/cache/base.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/src/dao_ai/genie/core.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/src/dao_ai/graph.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/src/dao_ai/hooks/__init__.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/src/dao_ai/hooks/core.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/src/dao_ai/logging.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/src/dao_ai/memory/__init__.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/src/dao_ai/memory/base.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/src/dao_ai/memory/core.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/src/dao_ai/memory/databricks.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/src/dao_ai/memory/postgres.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/src/dao_ai/messages.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/src/dao_ai/middleware/__init__.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/src/dao_ai/middleware/assertions.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/src/dao_ai/middleware/base.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/src/dao_ai/middleware/context_editing.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/src/dao_ai/middleware/core.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/src/dao_ai/middleware/guardrails.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/src/dao_ai/middleware/human_in_the_loop.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/src/dao_ai/middleware/message_validation.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/src/dao_ai/middleware/model_call_limit.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/src/dao_ai/middleware/model_retry.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/src/dao_ai/middleware/pii.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/src/dao_ai/middleware/summarization.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/src/dao_ai/middleware/tool_call_limit.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/src/dao_ai/middleware/tool_retry.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/src/dao_ai/middleware/tool_selector.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/src/dao_ai/models.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/src/dao_ai/nodes.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/src/dao_ai/optimization.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/src/dao_ai/orchestration/__init__.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/src/dao_ai/orchestration/core.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/src/dao_ai/orchestration/supervisor.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/src/dao_ai/orchestration/swarm.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/src/dao_ai/prompts/__init__.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/src/dao_ai/prompts/instructed_retriever_decomposition.yaml +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/src/dao_ai/prompts/instruction_reranker.yaml +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/src/dao_ai/prompts/router.yaml +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/src/dao_ai/prompts/verifier.yaml +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/src/dao_ai/providers/__init__.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/src/dao_ai/providers/base.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/src/dao_ai/providers/databricks.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/src/dao_ai/state.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/src/dao_ai/tools/__init__.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/src/dao_ai/tools/agent.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/src/dao_ai/tools/core.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/src/dao_ai/tools/email.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/src/dao_ai/tools/instructed_retriever.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/src/dao_ai/tools/instruction_reranker.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/src/dao_ai/tools/mcp.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/src/dao_ai/tools/memory.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/src/dao_ai/tools/python.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/src/dao_ai/tools/router.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/src/dao_ai/tools/search.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/src/dao_ai/tools/slack.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/src/dao_ai/tools/sql.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/src/dao_ai/tools/time.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/src/dao_ai/tools/unity_catalog.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/src/dao_ai/tools/vector_search.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/src/dao_ai/tools/verifier.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/src/dao_ai/types.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/src/dao_ai/utils.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/src/dao_ai/vector_search.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/tests/config/test_model_config.yaml +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/tests/conftest.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/tests/dao_ai/middleware/test_context_editing.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/tests/dao_ai/middleware/test_model_call_limit.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/tests/dao_ai/middleware/test_model_retry.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/tests/dao_ai/middleware/test_pii.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/tests/dao_ai/middleware/test_tool_call_limit.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/tests/dao_ai/middleware/test_tool_retry.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/tests/dao_ai/middleware/test_tool_selector.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/tests/dao_ai/test_agent_response_format.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/tests/dao_ai/test_assertions_middleware.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/tests/dao_ai/test_catalog.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/tests/dao_ai/test_chat_history.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/tests/dao_ai/test_config.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/tests/dao_ai/test_databricks.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/tests/dao_ai/test_evaluation.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/tests/dao_ai/test_function_parsing.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/tests/dao_ai/test_genie_conversation_ids_in_outputs.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/tests/dao_ai/test_genie_databricks_integration.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/tests/dao_ai/test_genie_room_model.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/tests/dao_ai/test_guardrail_retry.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/tests/dao_ai/test_hitl_config_model.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/tests/dao_ai/test_hitl_responses_agent.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/tests/dao_ai/test_hooks.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/tests/dao_ai/test_human_in_the_loop.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/tests/dao_ai/test_inference.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/tests/dao_ai/test_inference_integration.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/tests/dao_ai/test_input_output_structure.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/tests/dao_ai/test_instructed_retriever.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/tests/dao_ai/test_instruction_reranker.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/tests/dao_ai/test_instruction_reranker_integration.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/tests/dao_ai/test_interrupt_type.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/tests/dao_ai/test_llm_interrupt_handling.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/tests/dao_ai/test_mcp.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/tests/dao_ai/test_mcp_filtering.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/tests/dao_ai/test_mcp_filtering_integration.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/tests/dao_ai/test_mcp_function_model.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/tests/dao_ai/test_message_validation_middleware.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/tests/dao_ai/test_messages.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/tests/dao_ai/test_models.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/tests/dao_ai/test_optimization.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/tests/dao_ai/test_postgres_integration.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/tests/dao_ai/test_prompt_optimizations.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/tests/dao_ai/test_prompts.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/tests/dao_ai/test_reranking.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/tests/dao_ai/test_reranking_integration.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/tests/dao_ai/test_resources_model_genie_integration.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/tests/dao_ai/test_response_format.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/tests/dao_ai/test_responses_agent_structured_output_unit.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/tests/dao_ai/test_router.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/tests/dao_ai/test_semantic_cache_context.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/tests/dao_ai/test_sql_tool.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/tests/dao_ai/test_sql_tool_integration.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/tests/dao_ai/test_state.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/tests/dao_ai/test_summarization_inference.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/tests/dao_ai/test_swarm_middleware.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/tests/dao_ai/test_tools.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/tests/dao_ai/test_types.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/tests/dao_ai/test_unity_catalog.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/tests/dao_ai/test_utils.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/tests/dao_ai/test_utils_type_from_fqn.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/tests/dao_ai/test_vector_search.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/tests/dao_ai/test_verifier.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/tests/dao_ai/test_warehouse_model.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/tests/dao_ai/weather_server_mcp.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/tests/hardware_store/.gitkeep +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/tests/hardware_store/test_graph.py +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/tests/images/doritos_upc.png +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/tests/images/lays_upc.png +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/tests/quick_serve_restaurant/.gitkeep +0 -0
- {dao_ai-0.1.18 → dao_ai-0.1.19}/tests/test_mcp_app_auth.py +0 -0
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
Metadata-Version: 2.4
|
|
2
2
|
Name: dao-ai
|
|
3
|
-
Version: 0.1.
|
|
3
|
+
Version: 0.1.19
|
|
4
4
|
Summary: DAO AI: A modular, multi-agent orchestration framework for complex AI workflows. Supports agent handoff, tool integration, and dynamic configuration via YAML.
|
|
5
5
|
Project-URL: Homepage, https://github.com/natefleming/dao-ai
|
|
6
6
|
Project-URL: Documentation, https://natefleming.github.io/dao-ai
|
|
@@ -409,7 +409,8 @@ The `config/examples/` directory contains ready-to-use configurations organized
|
|
|
409
409
|
|
|
410
410
|
- `01_getting_started/minimal.yaml` - Simplest possible agent
|
|
411
411
|
- `02_tools/vector_search_with_reranking.yaml` - RAG with improved accuracy
|
|
412
|
-
- `04_genie/genie_semantic_cache.yaml` - NL-to-SQL with
|
|
412
|
+
- `04_genie/genie_semantic_cache.yaml` - NL-to-SQL with PostgreSQL semantic caching
|
|
413
|
+
- `04_genie/genie_in_memory_semantic_cache.yaml` - NL-to-SQL with in-memory semantic caching (no database)
|
|
413
414
|
- `05_memory/conversation_summarization.yaml` - Long conversation handling
|
|
414
415
|
- `06_on_behalf_of_user/obo_basic.yaml` - User-level access control
|
|
415
416
|
- `07_human_in_the_loop/human_in_the_loop.yaml` - Approval workflows
|
|
@@ -330,7 +330,8 @@ The `config/examples/` directory contains ready-to-use configurations organized
|
|
|
330
330
|
|
|
331
331
|
- `01_getting_started/minimal.yaml` - Simplest possible agent
|
|
332
332
|
- `02_tools/vector_search_with_reranking.yaml` - RAG with improved accuracy
|
|
333
|
-
- `04_genie/genie_semantic_cache.yaml` - NL-to-SQL with
|
|
333
|
+
- `04_genie/genie_semantic_cache.yaml` - NL-to-SQL with PostgreSQL semantic caching
|
|
334
|
+
- `04_genie/genie_in_memory_semantic_cache.yaml` - NL-to-SQL with in-memory semantic caching (no database)
|
|
334
335
|
- `05_memory/conversation_summarization.yaml` - Long conversation handling
|
|
335
336
|
- `06_on_behalf_of_user/obo_basic.yaml` - User-level access control
|
|
336
337
|
- `07_human_in_the_loop/human_in_the_loop.yaml` - Approval workflows
|
|
@@ -52,10 +52,20 @@ flowchart TB
|
|
|
52
52
|
|
|
53
53
|
| File | Description |
|
|
54
54
|
|------|-------------|
|
|
55
|
-
| [`genie_cached.yaml`](./genie_cached.yaml) | Two-tier caching with LRU and semantic cache |
|
|
55
|
+
| [`genie_cached.yaml`](./genie_cached.yaml) | Two-tier caching with LRU and PostgreSQL semantic cache |
|
|
56
|
+
| [`genie_in_memory_semantic_cache.yaml`](./genie_in_memory_semantic_cache.yaml) | In-memory semantic cache (no database required) |
|
|
56
57
|
|
|
57
58
|
## Cache Tiers
|
|
58
59
|
|
|
60
|
+
DAO provides two L2 semantic cache implementations:
|
|
61
|
+
|
|
62
|
+
| Implementation | Best For | Database Required |
|
|
63
|
+
|----------------|----------|-------------------|
|
|
64
|
+
| **PostgreSQL Semantic Cache** | Production multi-instance deployments, large cache sizes (thousands+), cross-instance sharing | Yes (PostgreSQL with pg_vector) |
|
|
65
|
+
| **In-Memory Semantic Cache** | Single-instance deployments, dev/test, no database access, moderate cache sizes (hundreds to low thousands) | No (in-memory only) |
|
|
66
|
+
|
|
67
|
+
Both use the same L2 distance algorithm and support conversation context awareness for consistent behavior.
|
|
68
|
+
|
|
59
69
|
```mermaid
|
|
60
70
|
%%{init: {'theme': 'base'}}%%
|
|
61
71
|
graph TB
|
|
@@ -70,8 +80,9 @@ graph TB
|
|
|
70
80
|
subgraph L2["🧠 L2: Semantic Cache"]
|
|
71
81
|
SEM1["<b>Type:</b> Similarity match"]
|
|
72
82
|
SEM2["<b>Speed:</b> ~50ms"]
|
|
73
|
-
SEM3["<b>
|
|
74
|
-
SEM4["<b>
|
|
83
|
+
SEM3["<b>Options:</b> PostgreSQL or In-Memory"]
|
|
84
|
+
SEM4["<b>Threshold:</b> 0.85-0.95"]
|
|
85
|
+
SEM5["<b>TTL:</b> ttl: 3600 (1 hour)"]
|
|
75
86
|
end
|
|
76
87
|
end
|
|
77
88
|
|
|
@@ -81,21 +92,56 @@ graph TB
|
|
|
81
92
|
|
|
82
93
|
## Configuration
|
|
83
94
|
|
|
95
|
+
### PostgreSQL Semantic Cache (Multi-Instance)
|
|
96
|
+
|
|
84
97
|
```yaml
|
|
85
|
-
|
|
86
|
-
|
|
87
|
-
|
|
88
|
-
|
|
98
|
+
genie_tool:
|
|
99
|
+
function:
|
|
100
|
+
type: factory
|
|
101
|
+
name: dao_ai.tools.create_genie_tool
|
|
102
|
+
args:
|
|
103
|
+
genie_room: *retail_genie_room
|
|
89
104
|
|
|
90
105
|
# ⚡ L1: LRU Cache - Exact match
|
|
91
|
-
|
|
92
|
-
|
|
106
|
+
lru_cache_parameters:
|
|
107
|
+
warehouse: *warehouse
|
|
108
|
+
capacity: 100
|
|
109
|
+
time_to_live_seconds: 3600
|
|
110
|
+
|
|
111
|
+
# 🧠 L2: PostgreSQL Semantic Cache - Similar queries
|
|
112
|
+
semantic_cache_parameters:
|
|
113
|
+
database: *postgres_db
|
|
114
|
+
warehouse: *warehouse
|
|
115
|
+
embedding_model: *embedding_model
|
|
116
|
+
similarity_threshold: 0.85
|
|
117
|
+
time_to_live_seconds: 3600
|
|
118
|
+
context_window_size: 3
|
|
119
|
+
```
|
|
120
|
+
|
|
121
|
+
### In-Memory Semantic Cache (Single-Instance)
|
|
122
|
+
|
|
123
|
+
```yaml
|
|
124
|
+
genie_tool:
|
|
125
|
+
function:
|
|
126
|
+
type: factory
|
|
127
|
+
name: dao_ai.tools.create_genie_tool
|
|
128
|
+
args:
|
|
129
|
+
genie_room: *retail_genie_room
|
|
130
|
+
|
|
131
|
+
# Optional L1: LRU Cache - Exact match
|
|
132
|
+
# lru_cache_parameters:
|
|
133
|
+
# warehouse: *warehouse
|
|
134
|
+
# capacity: 100
|
|
135
|
+
# time_to_live_seconds: 3600
|
|
93
136
|
|
|
94
|
-
# 🧠
|
|
95
|
-
|
|
96
|
-
|
|
97
|
-
|
|
98
|
-
|
|
137
|
+
# 🧠 In-Memory Semantic Cache - No database required
|
|
138
|
+
in_memory_semantic_cache_parameters:
|
|
139
|
+
warehouse: *warehouse
|
|
140
|
+
embedding_model: *embedding_model
|
|
141
|
+
similarity_threshold: 0.85
|
|
142
|
+
time_to_live_seconds: 604800 # 1 week
|
|
143
|
+
capacity: 1000 # LRU eviction when full
|
|
144
|
+
context_window_size: 3
|
|
99
145
|
```
|
|
100
146
|
|
|
101
147
|
## Cache Flow
|
|
@@ -210,8 +256,10 @@ agents:
|
|
|
210
256
|
|
|
211
257
|
## Quick Start
|
|
212
258
|
|
|
259
|
+
### PostgreSQL Semantic Cache
|
|
260
|
+
|
|
213
261
|
```bash
|
|
214
|
-
# Run with
|
|
262
|
+
# Run with PostgreSQL semantic cache
|
|
215
263
|
dao-ai chat -c config/examples/04_genie/genie_cached.yaml
|
|
216
264
|
|
|
217
265
|
# Test caching behavior
|
|
@@ -220,6 +268,18 @@ dao-ai chat -c config/examples/04_genie/genie_cached.yaml
|
|
|
220
268
|
> Show me Q4 revenue # Semantic cache hit (~50ms)
|
|
221
269
|
```
|
|
222
270
|
|
|
271
|
+
### In-Memory Semantic Cache
|
|
272
|
+
|
|
273
|
+
```bash
|
|
274
|
+
# Run with in-memory semantic cache (no database required)
|
|
275
|
+
dao-ai chat -c config/examples/04_genie/genie_in_memory_semantic_cache.yaml
|
|
276
|
+
|
|
277
|
+
# Test caching behavior
|
|
278
|
+
> What are the total sales for Q4? # First query - Genie hit
|
|
279
|
+
> What are the total sales for Q4? # Semantic cache hit (~50ms)
|
|
280
|
+
> Show me Q4 revenue # Semantic cache hit (~50ms)
|
|
281
|
+
```
|
|
282
|
+
|
|
223
283
|
## Cache Monitoring
|
|
224
284
|
|
|
225
285
|
```bash
|
|
@@ -0,0 +1,148 @@
|
|
|
1
|
+
# yaml-language-server: $schema=../../../schemas/model_config_schema.json
|
|
2
|
+
#
|
|
3
|
+
# Example configuration for Genie with in-memory semantic caching:
|
|
4
|
+
# - In-Memory Semantic Cache: Similarity search without external database
|
|
5
|
+
# - Optional LRU Cache (L1): Fast O(1) exact match lookup
|
|
6
|
+
#
|
|
7
|
+
# This configuration is ideal for:
|
|
8
|
+
# - Environments without access to PostgreSQL or Databricks Lakebase
|
|
9
|
+
# - Single-instance deployments (cache not shared across instances)
|
|
10
|
+
# - Moderate cache sizes (hundreds to low thousands of entries)
|
|
11
|
+
# - Cases where cache persistence across restarts is not required
|
|
12
|
+
#
|
|
13
|
+
# Cache flow: Question → LRU (exact match) → In-Memory Semantic (similarity) → Genie API
|
|
14
|
+
# On cache hit, the cached SQL is re-executed against the warehouse for fresh data.
|
|
15
|
+
|
|
16
|
+
|
|
17
|
+
schemas:
|
|
18
|
+
|
|
19
|
+
quick_serve_restaurant_schema: &quick_serve_restaurant_schema
|
|
20
|
+
catalog_name: retail_consumer_goods # Unity Catalog name
|
|
21
|
+
schema_name: quick_serve_restaurant # Schema within the catalog
|
|
22
|
+
|
|
23
|
+
resources:
|
|
24
|
+
llms:
|
|
25
|
+
# Primary LLM for general tasks
|
|
26
|
+
default_llm: &default_llm
|
|
27
|
+
name: databricks-claude-sonnet-4
|
|
28
|
+
temperature: 0.1 # Low temperature for consistent responses
|
|
29
|
+
max_tokens: 8192 # Maximum tokens per response
|
|
30
|
+
on_behalf_of_user: False
|
|
31
|
+
|
|
32
|
+
# Embedding model for semantic similarity search
|
|
33
|
+
embedding_model: &embedding_model
|
|
34
|
+
name: databricks-gte-large-en # Text embedding model
|
|
35
|
+
on_behalf_of_user: False
|
|
36
|
+
|
|
37
|
+
warehouses:
|
|
38
|
+
# Warehouse for executing SQL queries (used by semantic cache)
|
|
39
|
+
shared_endpoint_warehouse: &shared_endpoint_warehouse
|
|
40
|
+
name: "Shared Endpoint Warehouse" # Human-readable name
|
|
41
|
+
description: "A warehouse for shared endpoints" # Description
|
|
42
|
+
warehouse_id: 148ccb90800933a1 # Databricks warehouse ID
|
|
43
|
+
on_behalf_of_user: False
|
|
44
|
+
|
|
45
|
+
genie_rooms:
|
|
46
|
+
# Genie space for retail data queries
|
|
47
|
+
retail_genie_room: &retail_genie_room
|
|
48
|
+
name: "Retail AI Genie Room" # Human-readable name
|
|
49
|
+
description: "A room for Genie agents to interact" # Description
|
|
50
|
+
space_id:
|
|
51
|
+
env: RETAIL_AI_GENIE_SPACE_ID
|
|
52
|
+
default_value: 01f01c91f1f414d59daaefd2b7ec82ea
|
|
53
|
+
|
|
54
|
+
|
|
55
|
+
# =============================================================================
|
|
56
|
+
# MEMORY CONFIGURATION
|
|
57
|
+
# =============================================================================
|
|
58
|
+
# Configure in-memory storage for agent conversations and state persistence
|
|
59
|
+
|
|
60
|
+
memory: &memory
|
|
61
|
+
# Conversation checkpointing for state persistence
|
|
62
|
+
checkpointer:
|
|
63
|
+
name: default_checkpointer # Checkpointer identifier (type inferred as memory - no database)
|
|
64
|
+
|
|
65
|
+
|
|
66
|
+
tools:
|
|
67
|
+
genie_tool: &genie_tool
|
|
68
|
+
name: genie
|
|
69
|
+
function:
|
|
70
|
+
type: factory # Tool type: factory function
|
|
71
|
+
name: dao_ai.tools.create_genie_tool # Factory function path
|
|
72
|
+
args: # Arguments passed to factory
|
|
73
|
+
name: my_genie_tool
|
|
74
|
+
description: Answers questions about retail products and inventory
|
|
75
|
+
genie_room: *retail_genie_room # Reference to Genie room config
|
|
76
|
+
|
|
77
|
+
# Optional L1 Cache: LRU (Least Recently Used) - Fast exact match
|
|
78
|
+
# Uncomment to enable LRU cache in front of semantic cache
|
|
79
|
+
# lru_cache_parameters:
|
|
80
|
+
# warehouse: *shared_endpoint_warehouse # Warehouse to re-execute cached SQL
|
|
81
|
+
# capacity: 100 # Maximum number of cached entries
|
|
82
|
+
# time_to_live_seconds: 3600 # Cache entries expire after 1 hour
|
|
83
|
+
|
|
84
|
+
# In-Memory Semantic Cache: Similarity-based lookup with LRU eviction (NO database required)
|
|
85
|
+
# Default settings optimized for ~30 users on 8GB machine:
|
|
86
|
+
# - Capacity: 10,000 entries (~200MB, ~330 queries/user)
|
|
87
|
+
# - Eviction: LRU (Least Recently Used) keeps hot queries cached
|
|
88
|
+
# - TTL: 1 week (accommodates weekly work patterns)
|
|
89
|
+
# - Memory: ~4-5% of 8GB system
|
|
90
|
+
in_memory_semantic_cache_parameters:
|
|
91
|
+
warehouse: *shared_endpoint_warehouse # Warehouse used to re-execute cached SQL
|
|
92
|
+
embedding_model: *embedding_model # Reference to embedding model
|
|
93
|
+
# embedding_dims: 1024 # Auto-detected if omitted (recommended)
|
|
94
|
+
similarity_threshold: 0.85 # Minimum similarity for question matching (L2 distance to 0-1)
|
|
95
|
+
context_similarity_threshold: 0.80 # Minimum similarity for context matching
|
|
96
|
+
# time_to_live_seconds: 604800 # Cache entries expire after 1 week (default)
|
|
97
|
+
# capacity: 10000 # Max cache entries, LRU eviction when full (default: 10000, ~200MB)
|
|
98
|
+
# # Adjust for different scenarios:
|
|
99
|
+
# # - Small (5-10 users): capacity: 1000 (~20MB)
|
|
100
|
+
# # - Medium (30 users): capacity: 10000 (~200MB, default)
|
|
101
|
+
# # - Large (100 users): capacity: 30000 (~600MB)
|
|
102
|
+
# # - Unlimited: capacity: null (not recommended - unbounded memory)
|
|
103
|
+
context_window_size: 3 # Number of previous conversation turns to include
|
|
104
|
+
# max_context_tokens: 2000 # Maximum context length (default: 2000)
|
|
105
|
+
# question_weight: 0.6 # Weight for question similarity (default: 0.6)
|
|
106
|
+
# context_weight: 0.4 # Weight for context similarity (default: 0.4)
|
|
107
|
+
# Note: question_weight + context_weight must equal 1.0
|
|
108
|
+
|
|
109
|
+
persist_conversation: true
|
|
110
|
+
|
|
111
|
+
|
|
112
|
+
agents:
|
|
113
|
+
genie: &genie
|
|
114
|
+
name: genie # Agent identifier
|
|
115
|
+
description: "Genie Agent with In-Memory Semantic Cache"
|
|
116
|
+
model: *default_llm # Reference to LLM configuration
|
|
117
|
+
tools: # Tools available to this agent
|
|
118
|
+
- *genie_tool
|
|
119
|
+
prompt: | # System prompt defining agent behavior
|
|
120
|
+
Answers questions about retail products and inventory using natural language.
|
|
121
|
+
You have access to a semantic cache that remembers similar questions to provide faster responses.
|
|
122
|
+
|
|
123
|
+
|
|
124
|
+
app:
|
|
125
|
+
name: genie_in_memory_semantic_cache_dao # Application name
|
|
126
|
+
description: "Multi-agent system that talks to genie with in-memory semantic caching (no database required)"
|
|
127
|
+
log_level: DEBUG # Logging level for the application
|
|
128
|
+
environment_vars: # Secrets to inject at runtime
|
|
129
|
+
RETAIL_AI_DATABRICKS_CLIENT_ID: "{{secrets/retail_consumer_goods/RETAIL_AI_DATABRICKS_CLIENT_ID}}"
|
|
130
|
+
RETAIL_AI_DATABRICKS_CLIENT_SECRET: "{{secrets/retail_consumer_goods/RETAIL_AI_DATABRICKS_CLIENT_SECRET}}"
|
|
131
|
+
RETAIL_AI_DATABRICKS_HOST: "{{secrets/retail_consumer_goods/RETAIL_AI_DATABRICKS_HOST}}"
|
|
132
|
+
registered_model: # MLflow registered model configuration
|
|
133
|
+
schema: *quick_serve_restaurant_schema # Schema where model will be registered
|
|
134
|
+
name: dao_genie_in_memory_semantic_cache # Model name in MLflow registry
|
|
135
|
+
endpoint_name: dao_genie_in_memory_semantic_cache # Model serving endpoint name
|
|
136
|
+
tags: # Tags for resource organization
|
|
137
|
+
business: rcg # Business unit identifier
|
|
138
|
+
streaming: true # Indicates streaming capabilities
|
|
139
|
+
permissions: # Model serving permissions
|
|
140
|
+
- principals: [users] # Grant access to all users
|
|
141
|
+
entitlements:
|
|
142
|
+
- CAN_QUERY # Query permissions
|
|
143
|
+
agents: # List of agents included in the system
|
|
144
|
+
- *genie # Genie agent with in-memory cache
|
|
145
|
+
orchestration: # Agent orchestration configuration
|
|
146
|
+
memory: *memory # In-memory conversation persistence
|
|
147
|
+
swarm: # Swarm orchestration pattern
|
|
148
|
+
default_agent: *genie # Default agent for routing
|
|
@@ -52,7 +52,8 @@ Or jump directly to the category that matches your current need.
|
|
|
52
52
|
**Natural language to SQL**
|
|
53
53
|
- Basic Genie integration
|
|
54
54
|
- LRU caching for performance
|
|
55
|
-
-
|
|
55
|
+
- PostgreSQL semantic caching with embeddings
|
|
56
|
+
- In-memory semantic caching (no database required)
|
|
56
57
|
|
|
57
58
|
👉 Query data with natural language, optimized with caching
|
|
58
59
|
|
|
@@ -120,9 +120,10 @@ Improve performance and reduce costs through intelligent caching.
|
|
|
120
120
|
| Example | Description |
|
|
121
121
|
|---------|-------------|
|
|
122
122
|
| `genie_lru_cache.yaml` | LRU (Least Recently Used) caching for Genie |
|
|
123
|
-
| `genie_semantic_cache.yaml` | Two-tier semantic caching with embeddings |
|
|
123
|
+
| `genie_semantic_cache.yaml` | Two-tier semantic caching with PostgreSQL embeddings |
|
|
124
|
+
| `genie_in_memory_semantic_cache.yaml` | In-memory semantic caching (no database required) |
|
|
124
125
|
|
|
125
|
-
**Prerequisites:** PostgreSQL or Lakebase for
|
|
126
|
+
**Prerequisites:** PostgreSQL or Lakebase required for `genie_semantic_cache.yaml` only
|
|
126
127
|
**Next:** Add persistence in `05_memory/`
|
|
127
128
|
|
|
128
129
|
---
|
|
@@ -202,7 +202,7 @@ graph TB
|
|
|
202
202
|
l1_cache["L1: LRU Cache (In-Memory)<br/>• Capacity: 1000 entries<br/>• Hash-based lookup<br/>• O(1) exact string match"]
|
|
203
203
|
l1_hit{Hit?}
|
|
204
204
|
|
|
205
|
-
l2_cache["L2: Semantic Cache
|
|
205
|
+
l2_cache["L2: Semantic Cache<br/>• PostgreSQL (pg_vector) OR In-Memory<br/>• Dual embeddings (question + context)<br/>• L2 distance similarity<br/>• Conversation context aware<br/>• Partitioned by Genie space ID"]
|
|
206
206
|
l2_hit{Hit?}
|
|
207
207
|
|
|
208
208
|
genie["Genie API<br/>(Expensive call)<br/>Natural language to SQL"]
|
|
@@ -247,7 +247,11 @@ The **LRU (Least Recently Used) Cache** provides instant lookups for exact quest
|
|
|
247
247
|
|
|
248
248
|
### Semantic Cache (L2)
|
|
249
249
|
|
|
250
|
-
The **Semantic Cache** uses PostgreSQL with pg_vector to find similar questions
|
|
250
|
+
The **Semantic Cache** finds similar questions even when worded differently using vector embeddings and similarity search. It includes **conversation context awareness** to improve matching in multi-turn conversations. DAO provides two implementations:
|
|
251
|
+
|
|
252
|
+
#### PostgreSQL-Based Semantic Cache
|
|
253
|
+
|
|
254
|
+
Uses PostgreSQL with pg_vector for persistent, multi-instance shared caching:
|
|
251
255
|
|
|
252
256
|
| Parameter | Default | Description |
|
|
253
257
|
|-----------|---------|-------------|
|
|
@@ -259,6 +263,62 @@ The **Semantic Cache** uses PostgreSQL with pg_vector to find similar questions
|
|
|
259
263
|
| `table_name` | `genie_semantic_cache` | Table name for cache storage |
|
|
260
264
|
| `context_window_size` | 3 | Number of previous conversation turns to include |
|
|
261
265
|
| `context_similarity_threshold` | 0.80 | Minimum similarity for conversation context |
|
|
266
|
+
| `question_weight` | 0.6 | Weight for question similarity in combined score (0.0-1.0) |
|
|
267
|
+
| `context_weight` | 0.4 | Weight for context similarity (computed as 1 - question_weight if not set) |
|
|
268
|
+
| `embedding_dims` | Auto-detected | Embedding vector dimensions (auto-detected from model if not specified) |
|
|
269
|
+
| `max_context_tokens` | 2000 | Maximum token length for conversation context embeddings |
|
|
270
|
+
|
|
271
|
+
**Best for:** Production deployments with multiple instances, large cache sizes (thousands+), and cross-instance cache sharing
|
|
272
|
+
|
|
273
|
+
#### In-Memory Semantic Cache
|
|
274
|
+
|
|
275
|
+
Uses in-memory storage without external database dependencies:
|
|
276
|
+
|
|
277
|
+
```yaml
|
|
278
|
+
genie_tool:
|
|
279
|
+
function:
|
|
280
|
+
type: factory
|
|
281
|
+
name: dao_ai.tools.create_genie_tool
|
|
282
|
+
args:
|
|
283
|
+
genie_room: *retail_genie_room
|
|
284
|
+
|
|
285
|
+
# In-memory semantic cache (no database required)
|
|
286
|
+
in_memory_semantic_cache_parameters:
|
|
287
|
+
warehouse: *warehouse
|
|
288
|
+
embedding_model: *embedding_model # Default: databricks-gte-large-en
|
|
289
|
+
similarity_threshold: 0.85 # 0.0-1.0 (default: 0.85)
|
|
290
|
+
time_to_live_seconds: 86400 # 1 day (default), use -1 or None for never expire
|
|
291
|
+
capacity: 1000 # Max cache entries (LRU eviction when full)
|
|
292
|
+
context_window_size: 3 # Number of previous conversation turns
|
|
293
|
+
context_similarity_threshold: 0.80 # Minimum context similarity
|
|
294
|
+
question_weight: 0.6 # Weight for question similarity
|
|
295
|
+
context_weight: 0.4 # Weight for context similarity
|
|
296
|
+
embedding_dims: null # Auto-detected from model
|
|
297
|
+
max_context_tokens: 2000 # Max context token length
|
|
298
|
+
```
|
|
299
|
+
|
|
300
|
+
| Parameter | Default | Description |
|
|
301
|
+
|-----------|---------|-------------|
|
|
302
|
+
| `similarity_threshold` | 0.85 | Minimum similarity for cache hit (0.0-1.0) |
|
|
303
|
+
| `time_to_live_seconds` | 86400 | Cache entry lifetime (-1 = never expire) |
|
|
304
|
+
| `embedding_model` | `databricks-gte-large-en` | Model for generating question embeddings |
|
|
305
|
+
| `warehouse` | Required | Databricks warehouse for SQL execution |
|
|
306
|
+
| `capacity` | 1000 | Maximum cache entries (LRU eviction when full) |
|
|
307
|
+
| `context_window_size` | 3 | Number of previous conversation turns to include |
|
|
308
|
+
| `context_similarity_threshold` | 0.80 | Minimum similarity for conversation context |
|
|
309
|
+
| `question_weight` | 0.6 | Weight for question similarity in combined score (0.0-1.0) |
|
|
310
|
+
| `context_weight` | 0.4 | Weight for context similarity (computed as 1 - question_weight if not set) |
|
|
311
|
+
| `embedding_dims` | Auto-detected | Embedding vector dimensions (auto-detected from model if not specified) |
|
|
312
|
+
| `max_context_tokens` | 2000 | Maximum token length for conversation context embeddings |
|
|
313
|
+
|
|
314
|
+
**Best for:** Single-instance deployments, development/testing, scenarios without database access, moderate cache sizes (hundreds to low thousands)
|
|
315
|
+
|
|
316
|
+
**Key Differences:**
|
|
317
|
+
- ✅ **No external database required** - Simpler setup and deployment
|
|
318
|
+
- ✅ **Same L2 distance algorithm** - Consistent behavior with PostgreSQL version
|
|
319
|
+
- ⚠️ **Per-instance cache** - Each replica has its own cache (not shared)
|
|
320
|
+
- ⚠️ **No persistence** - Cache is lost on restart
|
|
321
|
+
- ⚠️ **Memory-bound** - Limited by available RAM; use capacity limits
|
|
262
322
|
|
|
263
323
|
**Best for:** Catching rephrased questions like:
|
|
264
324
|
- "What's our inventory status?" ≈ "Show me stock levels"
|
|
@@ -271,6 +331,12 @@ The semantic cache tracks conversation history to resolve ambiguous references:
|
|
|
271
331
|
|
|
272
332
|
This works by embedding both the current question *and* recent conversation turns, then computing a weighted similarity score. This dramatically improves cache hits in multi-turn conversations where users naturally use pronouns and references.
|
|
273
333
|
|
|
334
|
+
**Weight Configuration:**
|
|
335
|
+
The `question_weight` and `context_weight` parameters control how question vs conversation context similarity are combined into the final score:
|
|
336
|
+
- Both weights must sum to 1.0 (if only one is provided, the other is computed automatically)
|
|
337
|
+
- Higher `question_weight` prioritizes matching the exact question wording
|
|
338
|
+
- Higher `context_weight` prioritizes matching the conversation context, useful for multi-turn conversations with pronouns and references
|
|
339
|
+
|
|
274
340
|
### Cache Behavior
|
|
275
341
|
|
|
276
342
|
1. **SQL Caching, Not Results**: The cache stores the *generated SQL query*, not the query results. On a cache hit, the SQL is re-executed against your warehouse, ensuring **data freshness**.
|
|
@@ -283,12 +349,10 @@ This works by embedding both the current question *and* recent conversation turn
|
|
|
283
349
|
- Genie generates fresh SQL
|
|
284
350
|
- The new SQL is cached
|
|
285
351
|
|
|
286
|
-
4. **Multi-Instance Aware**: Each LRU cache is per-instance (in Model Serving, each replica has its own). The semantic cache is shared across all instances
|
|
352
|
+
4. **Multi-Instance Aware**: Each LRU cache is per-instance (in Model Serving, each replica has its own). The PostgreSQL semantic cache is shared across all instances. The in-memory semantic cache is per-instance (not shared).
|
|
287
353
|
|
|
288
354
|
5. **Space ID Partitioning**: Cache entries are isolated per Genie space, preventing cross-space cache pollution.
|
|
289
355
|
|
|
290
|
-
For more details on semantic cache configuration, see [docs/semantic_cache_weight_configuration.md](semantic_cache_weight_configuration.md).
|
|
291
|
-
|
|
292
356
|
## 5. Vector Search Reranking
|
|
293
357
|
|
|
294
358
|
**The problem:** Vector search (semantic similarity) is fast but sometimes returns loosely related results. It's like a librarian who quickly grabs 50 books that *might* be relevant.
|
|
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
|
|
|
4
4
|
|
|
5
5
|
[project]
|
|
6
6
|
name = "dao-ai"
|
|
7
|
-
version = "0.1.18"
|
|
7
|
+
version = "0.1.19"
|
|
8
8
|
description = "DAO AI: A modular, multi-agent orchestration framework for complex AI workflows. Supports agent handoff, tool integration, and dynamic configuration via YAML."
|
|
9
9
|
readme = "README.md"
|
|
10
10
|
license = { text = "MIT" }
|
|
@@ -1773,6 +1773,105 @@ class GenieSemanticCacheParametersModel(BaseModel):
|
|
|
1773
1773
|
return self
|
|
1774
1774
|
|
|
1775
1775
|
|
|
1776
|
+
# Memory estimation for capacity planning:
|
|
1777
|
+
# - Each entry: ~20KB (8KB question embedding + 8KB context embedding + 4KB strings/overhead)
|
|
1778
|
+
# - 1,000 entries: ~20MB (0.4% of 8GB)
|
|
1779
|
+
# - 5,000 entries: ~100MB (2% of 8GB)
|
|
1780
|
+
# - 10,000 entries: ~200MB (4-5% of 8GB) - default for ~30 users
|
|
1781
|
+
# - 20,000 entries: ~400MB (8-10% of 8GB)
|
|
1782
|
+
# Default 10,000 entries provides ~330 queries per user for 30 users.
|
|
1783
|
+
class GenieInMemorySemanticCacheParametersModel(BaseModel):
|
|
1784
|
+
"""
|
|
1785
|
+
Configuration for in-memory semantic cache (no database required).
|
|
1786
|
+
|
|
1787
|
+
This cache stores embeddings and cache entries entirely in memory, providing
|
|
1788
|
+
semantic similarity matching without requiring external database dependencies
|
|
1789
|
+
like PostgreSQL or Databricks Lakebase.
|
|
1790
|
+
|
|
1791
|
+
Default settings are tuned for ~30 users on an 8GB machine:
|
|
1792
|
+
- Capacity: 10,000 entries (~200MB memory, ~330 queries per user)
|
|
1793
|
+
- Eviction: LRU (Least Recently Used) - keeps frequently accessed queries
|
|
1794
|
+
- TTL: 1 week (accommodates weekly work patterns and batch jobs)
|
|
1795
|
+
- Memory overhead: ~4-5% of 8GB system
|
|
1796
|
+
|
|
1797
|
+
The LRU eviction strategy ensures hot queries stay cached while cold queries
|
|
1798
|
+
are evicted, providing better hit rates than FIFO eviction.
|
|
1799
|
+
|
|
1800
|
+
For larger deployments or memory-constrained environments, adjust capacity and TTL accordingly.
|
|
1801
|
+
|
|
1802
|
+
Use this when:
|
|
1803
|
+
- No external database access is available
|
|
1804
|
+
- Single-instance deployments (cache not shared across instances)
|
|
1805
|
+
- Cache persistence across restarts is not required
|
|
1806
|
+
- Cache sizes are moderate (hundreds to low thousands of entries)
|
|
1807
|
+
|
|
1808
|
+
For multi-instance deployments or large cache sizes, use GenieSemanticCacheParametersModel
|
|
1809
|
+
with PostgreSQL backend instead.
|
|
1810
|
+
"""
|
|
1811
|
+
|
|
1812
|
+
model_config = ConfigDict(use_enum_values=True, extra="forbid")
|
|
1813
|
+
time_to_live_seconds: int | None = (
|
|
1814
|
+
60 * 60 * 24 * 7
|
|
1815
|
+
) # 1 week default (604800 seconds), None or negative = never expires
|
|
1816
|
+
similarity_threshold: float = 0.85 # Minimum similarity for question matching (L2 distance converted to 0-1 scale)
|
|
1817
|
+
context_similarity_threshold: float = 0.80 # Minimum similarity for context matching (L2 distance converted to 0-1 scale)
|
|
1818
|
+
question_weight: Optional[float] = (
|
|
1819
|
+
0.6 # Weight for question similarity in combined score (0-1). If not provided, computed as 1 - context_weight
|
|
1820
|
+
)
|
|
1821
|
+
context_weight: Optional[float] = (
|
|
1822
|
+
None # Weight for context similarity in combined score (0-1). If not provided, computed as 1 - question_weight
|
|
1823
|
+
)
|
|
1824
|
+
embedding_model: str | LLMModel = "databricks-gte-large-en"
|
|
1825
|
+
embedding_dims: int | None = None # Auto-detected if None
|
|
1826
|
+
warehouse: WarehouseModel
|
|
1827
|
+
capacity: int | None = (
|
|
1828
|
+
10000 # Maximum cache entries. ~200MB for 10000 entries (1024-dim embeddings). LRU eviction when full. None = unlimited (not recommended for production).
|
|
1829
|
+
)
|
|
1830
|
+
context_window_size: int = 3 # Number of previous turns to include for context
|
|
1831
|
+
max_context_tokens: int = (
|
|
1832
|
+
2000 # Maximum context length to prevent extremely long embeddings
|
|
1833
|
+
)
|
|
1834
|
+
|
|
1835
|
+
@model_validator(mode="after")
|
|
1836
|
+
def compute_and_validate_weights(self) -> Self:
|
|
1837
|
+
"""
|
|
1838
|
+
Compute missing weight and validate that question_weight + context_weight = 1.0.
|
|
1839
|
+
|
|
1840
|
+
Either question_weight or context_weight (or both) can be provided.
|
|
1841
|
+
The missing one will be computed as 1.0 - provided_weight.
|
|
1842
|
+
If both are provided, they must sum to 1.0.
|
|
1843
|
+
"""
|
|
1844
|
+
if self.question_weight is None and self.context_weight is None:
|
|
1845
|
+
# Both missing - use defaults
|
|
1846
|
+
self.question_weight = 0.6
|
|
1847
|
+
self.context_weight = 0.4
|
|
1848
|
+
elif self.question_weight is None:
|
|
1849
|
+
# Compute question_weight from context_weight
|
|
1850
|
+
if not (0.0 <= self.context_weight <= 1.0):
|
|
1851
|
+
raise ValueError(
|
|
1852
|
+
f"context_weight must be between 0.0 and 1.0, got {self.context_weight}"
|
|
1853
|
+
)
|
|
1854
|
+
self.question_weight = 1.0 - self.context_weight
|
|
1855
|
+
elif self.context_weight is None:
|
|
1856
|
+
# Compute context_weight from question_weight
|
|
1857
|
+
if not (0.0 <= self.question_weight <= 1.0):
|
|
1858
|
+
raise ValueError(
|
|
1859
|
+
f"question_weight must be between 0.0 and 1.0, got {self.question_weight}"
|
|
1860
|
+
)
|
|
1861
|
+
self.context_weight = 1.0 - self.question_weight
|
|
1862
|
+
else:
|
|
1863
|
+
# Both provided - validate they sum to 1.0
|
|
1864
|
+
total_weight = self.question_weight + self.context_weight
|
|
1865
|
+
if not abs(total_weight - 1.0) < 0.0001: # Allow small floating point error
|
|
1866
|
+
raise ValueError(
|
|
1867
|
+
f"question_weight ({self.question_weight}) + context_weight ({self.context_weight}) "
|
|
1868
|
+
f"must equal 1.0 (got {total_weight}). These weights determine the relative importance "
|
|
1869
|
+
f"of question vs context similarity in the combined score."
|
|
1870
|
+
)
|
|
1871
|
+
|
|
1872
|
+
return self
|
|
1873
|
+
|
|
1874
|
+
|
|
1776
1875
|
class SearchParametersModel(BaseModel):
|
|
1777
1876
|
model_config = ConfigDict(use_enum_values=True, extra="forbid")
|
|
1778
1877
|
num_results: Optional[int] = 10
|
|
@@ -28,6 +28,7 @@ from dao_ai.genie.cache.base import (
|
|
|
28
28
|
SQLCacheEntry,
|
|
29
29
|
)
|
|
30
30
|
from dao_ai.genie.cache.core import execute_sql_via_warehouse
|
|
31
|
+
from dao_ai.genie.cache.in_memory_semantic import InMemorySemanticCacheService
|
|
31
32
|
from dao_ai.genie.cache.lru import LRUCacheService
|
|
32
33
|
from dao_ai.genie.cache.semantic import SemanticCacheService
|
|
33
34
|
|
|
@@ -38,6 +39,7 @@ __all__ = [
|
|
|
38
39
|
"SQLCacheEntry",
|
|
39
40
|
"execute_sql_via_warehouse",
|
|
40
41
|
# Cache implementations
|
|
42
|
+
"InMemorySemanticCacheService",
|
|
41
43
|
"LRUCacheService",
|
|
42
44
|
"SemanticCacheService",
|
|
43
45
|
]
|
|
@@ -38,7 +38,7 @@ def execute_sql_via_warehouse(
|
|
|
38
38
|
w: WorkspaceClient = warehouse.workspace_client
|
|
39
39
|
warehouse_id: str = str(warehouse.warehouse_id)
|
|
40
40
|
|
|
41
|
-
logger.trace("Executing cached SQL", layer=layer_name,
|
|
41
|
+
logger.trace("Executing cached SQL", layer=layer_name, sql=sql[:100])
|
|
42
42
|
|
|
43
43
|
statement_response: StatementResponse = w.statement_execution.execute_statement(
|
|
44
44
|
statement=sql,
|