PyPI - dao-ai - Versions diffs - 0.1.18__tar.gz → 0.1.20__tar.gz - Mend

dao-ai 0.1.18__tar.gz → 0.1.20__tar.gz

Files changed (338)

{dao_ai-0.1.18 → dao_ai-0.1.20}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: dao-ai
-Version: 0.1.18
+Version: 0.1.20
 Summary: DAO AI: A modular, multi-agent orchestration framework for complex AI workflows. Supports agent handoff, tool integration, and dynamic configuration via YAML.
 Project-URL: Homepage, https://github.com/natefleming/dao-ai
 Project-URL: Documentation, https://natefleming.github.io/dao-ai
@@ -409,7 +409,8 @@ The `config/examples/` directory contains ready-to-use configurations organized
 - `01_getting_started/minimal.yaml` - Simplest possible agent
 - `02_tools/vector_search_with_reranking.yaml` - RAG with improved accuracy
-- `04_genie/genie_semantic_cache.yaml` - NL-to-SQL with two-tier caching
+- `04_genie/genie_semantic_cache.yaml` - NL-to-SQL with PostgreSQL semantic caching
+- `04_genie/genie_in_memory_semantic_cache.yaml` - NL-to-SQL with in-memory semantic caching (no database)
 - `05_memory/conversation_summarization.yaml` - Long conversation handling
 - `06_on_behalf_of_user/obo_basic.yaml` - User-level access control
 - `07_human_in_the_loop/human_in_the_loop.yaml` - Approval workflows

{dao_ai-0.1.18 → dao_ai-0.1.20}/README.md RENAMED

@@ -330,7 +330,8 @@ The `config/examples/` directory contains ready-to-use configurations organized
 - `01_getting_started/minimal.yaml` - Simplest possible agent
 - `02_tools/vector_search_with_reranking.yaml` - RAG with improved accuracy
-- `04_genie/genie_semantic_cache.yaml` - NL-to-SQL with two-tier caching
+- `04_genie/genie_semantic_cache.yaml` - NL-to-SQL with PostgreSQL semantic caching
+- `04_genie/genie_in_memory_semantic_cache.yaml` - NL-to-SQL with in-memory semantic caching (no database)
 - `05_memory/conversation_summarization.yaml` - Long conversation handling
 - `06_on_behalf_of_user/obo_basic.yaml` - User-level access control
 - `07_human_in_the_loop/human_in_the_loop.yaml` - Approval workflows

{dao_ai-0.1.18 → dao_ai-0.1.20}/config/examples/04_genie/README.md RENAMED

@@ -52,10 +52,20 @@ flowchart TB
 | File | Description |
 |------|-------------|
-| [`genie_cached.yaml`](./genie_cached.yaml) | Two-tier caching with LRU and semantic cache |
+| [`genie_cached.yaml`](./genie_cached.yaml) | Two-tier caching with LRU and PostgreSQL semantic cache |
+| [`genie_in_memory_semantic_cache.yaml`](./genie_in_memory_semantic_cache.yaml) | In-memory semantic cache (no database required) |
 ## Cache Tiers
+DAO provides two L2 semantic cache implementations:
+| Implementation | Best For | Database Required |
+|----------------|----------|-------------------|
+| **PostgreSQL Semantic Cache** | Production multi-instance deployments, large cache sizes (thousands+), cross-instance sharing | Yes (PostgreSQL with pg_vector) |
+| **In-Memory Semantic Cache** | Single-instance deployments, dev/test, no database access, moderate cache sizes (hundreds to low thousands) | No (in-memory only) |
+Both use the same L2 distance algorithm and support conversation context awareness for consistent behavior.
 ```mermaid
 %%{init: {'theme': 'base'}}%%
 graph TB
@@ -70,8 +80,9 @@ graph TB
         subgraph L2["🧠 L2: Semantic Cache"]
             SEM1["<b>Type:</b> Similarity match"]
             SEM2["<b>Speed:</b> ~50ms"]
-            SEM3["<b>Threshold:</b> 0.95"]
-            SEM4["<b>TTL:</b> ttl: 3600 (1 hour)"]
+            SEM3["<b>Options:</b> PostgreSQL or In-Memory"]
+            SEM4["<b>Threshold:</b> 0.85-0.95"]
+            SEM5["<b>TTL:</b> ttl: 3600 (1 hour)"]
         end
     end
@@ -81,21 +92,56 @@ graph TB
 ## Configuration
+### PostgreSQL Semantic Cache (Multi-Instance)
 ```yaml
-resources:
-  genie_rooms:
-    retail_genie_room: &retail_genie_room
-      space_id: "01efabcd1234567890abcdef12345678"
+genie_tool:
+  function:
+    type: factory
+    name: dao_ai.tools.create_genie_tool
+    args:
+      genie_room: *retail_genie_room
       # ⚡ L1: LRU Cache - Exact match
-      lru_cache:
-        maxsize: 100              # Max cached queries
+      lru_cache_parameters:
+        warehouse: *warehouse
+        capacity: 100
+        time_to_live_seconds: 3600
+      # 🧠 L2: PostgreSQL Semantic Cache - Similar queries
+      semantic_cache_parameters:
+        database: *postgres_db
+        warehouse: *warehouse
+        embedding_model: *embedding_model
+        similarity_threshold: 0.85
+        time_to_live_seconds: 3600
+        context_window_size: 2  # default
+```
+### In-Memory Semantic Cache (Single-Instance)
+```yaml
+genie_tool:
+  function:
+    type: factory
+    name: dao_ai.tools.create_genie_tool
+    args:
+      genie_room: *retail_genie_room
+      # Optional L1: LRU Cache - Exact match
+      # lru_cache_parameters:
+      #   warehouse: *warehouse
+      #   capacity: 100
+      #   time_to_live_seconds: 3600
-      # 🧠 L2: Semantic Cache - Similar queries
-      semantic_cache:
-        similarity_threshold: 0.95  # How similar (0.0-1.0)
-        ttl: 3600                   # Time-to-live in seconds
-        max_results: 1000           # Max cached embeddings
+      # 🧠 In-Memory Semantic Cache - No database required
+      in_memory_semantic_cache_parameters:
+        warehouse: *warehouse
+        embedding_model: *embedding_model
+        similarity_threshold: 0.85
+        time_to_live_seconds: 604800  # 1 week
+        capacity: 1000                # LRU eviction when full
+        context_window_size: 2  # default
 ```
 ## Cache Flow
@@ -210,8 +256,10 @@ agents:
 ## Quick Start
+### PostgreSQL Semantic Cache
 ```bash
-# Run with caching enabled
+# Run with PostgreSQL semantic cache
 dao-ai chat -c config/examples/04_genie/genie_cached.yaml
 # Test caching behavior
@@ -220,6 +268,18 @@ dao-ai chat -c config/examples/04_genie/genie_cached.yaml
 > Show me Q4 revenue                  # Semantic cache hit (~50ms)
 ```
+### In-Memory Semantic Cache
+```bash
+# Run with in-memory semantic cache (no database required)
+dao-ai chat -c config/examples/04_genie/genie_in_memory_semantic_cache.yaml
+# Test caching behavior
+> What are the total sales for Q4?    # First query - Genie hit
+> What are the total sales for Q4?    # Semantic cache hit (~50ms)
+> Show me Q4 revenue                  # Semantic cache hit (~50ms)
+```
 ## Cache Monitoring
 ```bash
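
The tiered lookup this README describes (L1 exact match, then L2 similarity match, then the Genie API) can be sketched roughly as follows. This is a minimal illustration, not dao-ai's implementation: the `TieredCache` class, the `word_overlap` similarity (a toy stand-in for embedding distance), and the promotion of L2 hits into L1 are all assumptions made for the example.

```python
import re
from typing import Callable


def word_overlap(a: str, b: str) -> float:
    """Toy similarity (Jaccard word overlap); a real cache would compare embeddings."""
    wa = set(re.findall(r"\w+", a.lower()))
    wb = set(re.findall(r"\w+", b.lower()))
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


class TieredCache:
    """Hypothetical two-tier cache: exact match first, then similarity match."""

    def __init__(self, ask_genie: Callable[[str], str], threshold: float = 0.5):
        self.ask_genie = ask_genie          # fallback that produces SQL for a question
        self.threshold = threshold          # minimum similarity for an L2 hit
        self.exact: dict[str, str] = {}     # L1: question text -> SQL
        self.semantic: dict[str, str] = {}  # L2: question text -> SQL

    def lookup(self, question: str) -> tuple[str, str]:
        """Return (sql, tier), where tier is 'lru', 'semantic', or 'genie'."""
        # L1: exact match on the question text.
        if question in self.exact:
            return self.exact[question], "lru"
        # L2: most similar cached question, accepted only above the threshold.
        best = max(self.semantic, key=lambda q: word_overlap(q, question), default=None)
        if best is not None and word_overlap(best, question) >= self.threshold:
            sql, tier = self.semantic[best], "semantic"
        else:
            # Miss on both tiers: ask Genie and remember the answer in L2.
            sql, tier = self.ask_genie(question), "genie"
            self.semantic[question] = sql
        # Assumed design choice: promote the result into L1 for exact repeats.
        self.exact[question] = sql
        return sql, tier
```

In this sketch a repeated question is served from L1 and a close paraphrase from L2. The toy word-overlap measure will not recognise paraphrases that share few words (for example "Show me Q4 revenue"), which is the gap embedding-based similarity is meant to close.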

dao_ai-0.1.20/config/examples/04_genie/cache_threshold_optimization.yaml ADDED

@@ -0,0 +1,180 @@
+# yaml-language-server: $schema=../../../schemas/model_config_schema.json
+#
+# Example configuration for Genie semantic cache threshold optimization.
+#
+# This configuration demonstrates how to:
+#   1. Define an evaluation dataset with question pairs
+#   2. Configure threshold optimization parameters
+#   3. Run Optuna Bayesian optimization to find optimal thresholds
+#
+# The optimizer tunes these parameters:
+#   - similarity_threshold: Minimum similarity for question matching (0.5-0.99)
+#   - context_similarity_threshold: Minimum similarity for context matching (0.5-0.99)
+#   - question_weight: Weight for question vs context in combined score (0.1-0.9)
+#
+# Usage:
+#   1. Update the evaluation dataset with your domain-specific question pairs
+#   2. Run the optimization notebook: notebooks/11_optimize_cache_thresholds.py
+#   3. Apply the optimized thresholds to your cache configuration
+schemas:
+  quick_serve_restaurant_schema: &quick_serve_restaurant_schema
+    catalog_name: retail_consumer_goods
+    schema_name: quick_serve_restaurant
+resources:
+  llms:
+    # Judge model for semantic equivalence evaluation
+    # Used when expected_match is not provided for an entry
+    judge_model: &judge_model
+      name: databricks-meta-llama-3-3-70b-instruct
+      temperature: 0.0  # Low temperature for consistent judgments
+      max_tokens: 10    # Only need "MATCH" or "NO_MATCH"
+    # Embedding model for generating embeddings
+    embedding_model: &embedding_model
+      name: databricks-gte-large-en
+  warehouses:
+    shared_endpoint_warehouse: &shared_endpoint_warehouse
+      name: "Shared Endpoint Warehouse"
+      warehouse_id: 148ccb90800933a1
+  databases:
+    semantic_cache_db: &semantic_cache_db
+      name: "Retail and Consumer Goods Database"
+      instance_name: "retail-consumer-goods"
+# =============================================================================
+# CACHE PARAMETERS (Current Configuration)
+# =============================================================================
+# These are the current threshold values that will be optimized
+cache_parameters: &cache_parameters
+  database: *semantic_cache_db
+  warehouse: *shared_endpoint_warehouse
+  embedding_model: *embedding_model
+  similarity_threshold: 0.85           # Question matching threshold
+  context_similarity_threshold: 0.80   # Context matching threshold
+  question_weight: 0.6                 # Weight for question (context = 1 - question)
+  time_to_live_seconds: 86400
+# =============================================================================
+# EVALUATION DATASET
+# =============================================================================
+# Define pairs of questions to evaluate threshold effectiveness.
+#
+# Each entry contains:
+#   - question/context: The incoming query
+#   - cached_question/cached_context: The cached entry to compare against
+#   - expected_match: Whether these should be considered a cache hit
+#     - true: Semantically equivalent (should return cached result)
+#     - false: Different questions (should not match)
+#     - null/omitted: Use LLM judge to determine
+#
+# Tips for good evaluation data:
+#   - Include diverse question types from your domain
+#   - Balance positive and negative examples
+#   - Include edge cases (similar but different questions)
+#   - Use real questions from your production cache if available
+# Note: Embeddings would normally be pre-computed. For this example,
+# we show the structure - use the notebook to generate real embeddings.
+threshold_eval_dataset: &threshold_eval_dataset
+  name: retail_cache_eval_dataset
+  description: "Evaluation dataset for retail domain semantic cache tuning"
+  entries: []
+  # In practice, populate with real entries like:
+  #
+  # entries:
+  #   # Positive pair - paraphrases that should match
+  #   - question: "What are total sales for Q1?"
+  #     question_embedding: [0.1, 0.2, ...]  # Pre-computed embeddings
+  #     context: "Previous: Show me revenue breakdown"
+  #     context_embedding: [0.1, 0.2, ...]
+  #     cached_question: "Show me Q1 total sales"
+  #     cached_question_embedding: [0.1, 0.2, ...]
+  #     cached_context: "Previous: Show me revenue breakdown"
+  #     cached_context_embedding: [0.1, 0.2, ...]
+  #     expected_match: true
+  #
+  #   # Negative pair - different questions that should NOT match
+  #   - question: "What is inventory count by store?"
+  #     question_embedding: [0.3, 0.1, ...]
+  #     context: ""
+  #     context_embedding: [0.0, 0.0, ...]
+  #     cached_question: "Show revenue by region"
+  #     cached_question_embedding: [0.5, 0.6, ...]
+  #     cached_context: ""
+  #     cached_context_embedding: [0.0, 0.0, ...]
+  #     expected_match: false
+  #
+  #   # Unlabeled entry - LLM judge will determine
+  #   - question: "How many items sold last week?"
+  #     question_embedding: [0.2, 0.3, ...]
+  #     context: "Previous: Filter by electronics"
+  #     context_embedding: [0.1, 0.4, ...]
+  #     cached_question: "Total items sold in past 7 days"
+  #     cached_question_embedding: [0.2, 0.35, ...]
+  #     cached_context: "Previous: Filter by electronics"
+  #     cached_context_embedding: [0.1, 0.4, ...]
+  #     # expected_match omitted - will use LLM judge
+# =============================================================================
+# THRESHOLD OPTIMIZATION CONFIGURATION
+# =============================================================================
+# Configure the optimization run parameters.
+threshold_optimizations:
+  optimize_retail_cache_thresholds:
+    name: optimize_retail_cache_thresholds
+    cache_parameters: *cache_parameters       # Current thresholds to improve
+    dataset: *threshold_eval_dataset          # Evaluation dataset
+    judge_model: *judge_model                 # LLM for unlabeled entries
+    # Optimization parameters
+    n_trials: 50                              # Number of Optuna trials (more = better results)
+    metric: f1                                # Metric to optimize: f1, precision, recall, fbeta
+    beta: 1.0                                 # Beta for fbeta metric (higher = favor recall)
+    seed: 42                                  # Random seed for reproducibility
+# =============================================================================
+# USAGE INSTRUCTIONS
+# =============================================================================
+#
+# 1. PREPARE EVALUATION DATA:
+#    Generate embeddings for your question pairs using the embedding model.
+#    You can use the notebook or the generate_eval_dataset_from_cache() function
+#    to create a dataset from existing cache entries.
+#
+# 2. RUN OPTIMIZATION:
+#    Use the notebook notebooks/11_optimize_cache_thresholds.py with this config,
+#    or run programmatically:
+#
+#    ```python
+#    from dao_ai.config import AppConfig
+#
+#    config = AppConfig.from_file("cache_threshold_optimization.yaml")
+#    optimization = config.threshold_optimizations["optimize_retail_cache_thresholds"]
+#    result = optimization.optimize()
+#
+#    print(f"Optimized thresholds: {result.optimized_thresholds}")
+#    print(f"Improvement: {result.improvement:.1%}")
+#    ```
+#
+# 3. APPLY RESULTS:
+#    Update your semantic cache configuration with the optimized values:
+#
+#    semantic_cache_parameters:
+#      similarity_threshold: <optimized_value>
+#      context_similarity_threshold: <optimized_value>
+#      question_weight: <optimized_value>
+#
+# 4. MONITOR:
+#    Track cache hit rates and accuracy in production to validate improvements.
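
The metric that the optimization config asks for (`f1`, or `fbeta` with a `beta`) can be illustrated with a small scoring function over labelled pairs. This is a sketch under stated assumptions, not the package's optimizer: the `Pair` type, the rule that a pair counts as a hit only when both thresholds are met, and the sample similarities are hypothetical. Only the weighted combination (`context = 1 - question`) and the standard F-beta formula are taken as given.

```python
from dataclasses import dataclass


@dataclass
class Pair:
    """One evaluation entry, reduced to precomputed similarities (hypothetical shape)."""
    question_sim: float
    context_sim: float
    expected_match: bool


def combined_score(question_sim: float, context_sim: float, question_weight: float) -> float:
    """Weighted score; the context weight is 1 - question_weight, as in the config comment."""
    return question_weight * question_sim + (1.0 - question_weight) * context_sim


def predict_hit(pair: Pair, similarity_threshold: float, context_similarity_threshold: float) -> bool:
    """Assumed decision rule: both similarities must clear their thresholds."""
    return (pair.question_sim >= similarity_threshold
            and pair.context_sim >= context_similarity_threshold)


def fbeta(pairs: list[Pair], similarity_threshold: float,
          context_similarity_threshold: float, beta: float = 1.0) -> float:
    """F-beta over labelled pairs; beta > 1 favours recall, beta < 1 favours precision."""
    tp = fp = fn = 0
    for pair in pairs:
        hit = predict_hit(pair, similarity_threshold, context_similarity_threshold)
        if hit and pair.expected_match:
            tp += 1
        elif hit:
            fp += 1
        elif pair.expected_match:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    b2 = beta * beta
    denom = b2 * precision + recall
    return (1 + b2) * precision * recall / denom if denom else 0.0


pairs = [
    Pair(0.92, 0.90, True),   # paraphrase, should match
    Pair(0.88, 0.85, True),   # paraphrase, should match
    Pair(0.86, 0.82, False),  # similar wording, different question
    Pair(0.60, 0.70, False),  # clearly different
]
loose = fbeta(pairs, 0.85, 0.80)   # admits the false positive
strict = fbeta(pairs, 0.87, 0.80)  # excludes it
```

With these made-up similarities, raising the question threshold from 0.85 to 0.87 removes the one false positive without losing a true match, which is the kind of trade-off an optimizer searches for across many trials.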

dao_ai-0.1.20/config/examples/04_genie/genie_in_memory_semantic_cache.yaml ADDED

@@ -0,0 +1,148 @@
+# yaml-language-server: $schema=../../../schemas/model_config_schema.json
+#
+# Example configuration for Genie with in-memory semantic caching:
+#   - In-Memory Semantic Cache: Similarity search without external database
+#   - Optional LRU Cache (L1): Fast O(1) exact match lookup
+#
+# This configuration is ideal for:
+# - Environments without access to PostgreSQL or Databricks Lakebase
+# - Single-instance deployments (cache not shared across instances)
+# - Moderate cache sizes (hundreds to low thousands of entries)
+# - Cases where cache persistence across restarts is not required
+#
+# Cache flow: Question → LRU (exact match) → In-Memory Semantic (similarity) → Genie API
+# On cache hit, the cached SQL is re-executed against the warehouse for fresh data.
+schemas:
+  quick_serve_restaurant_schema: &quick_serve_restaurant_schema
+    catalog_name: retail_consumer_goods                    # Unity Catalog name
+    schema_name: quick_serve_restaurant                    # Schema within the catalog
+resources:
+  llms:
+    # Primary LLM for general tasks
+    default_llm: &default_llm
+      name: databricks-claude-sonnet-4
+      temperature: 0.1                              # Low temperature for consistent responses
+      max_tokens: 8192                              # Maximum tokens per response
+      on_behalf_of_user: False
+    # Embedding model for semantic similarity search
+    embedding_model: &embedding_model
+      name: databricks-gte-large-en                 # Text embedding model
+      on_behalf_of_user: False
+  warehouses:
+    # Warehouse for executing SQL queries (used by semantic cache)
+    shared_endpoint_warehouse: &shared_endpoint_warehouse
+      name: "Shared Endpoint Warehouse"             # Human-readable name
+      description: "A warehouse for shared endpoints"  # Description
+      warehouse_id: 148ccb90800933a1                # Databricks warehouse ID
+      on_behalf_of_user: False
+  genie_rooms:
+    # Genie space for retail data queries
+    retail_genie_room: &retail_genie_room
+      name: "Retail AI Genie Room"                        # Human-readable name
+      description: "A room for Genie agents to interact"  # Description
+      space_id:
+        env: RETAIL_AI_GENIE_SPACE_ID
+        default_value: 01f01c91f1f414d59daaefd2b7ec82ea
+# =============================================================================
+# MEMORY CONFIGURATION
+# =============================================================================
+# Configure in-memory storage for agent conversations and state persistence
+memory: &memory
+  # Conversation checkpointing for state persistence
+  checkpointer:
+    name: default_checkpointer                      # Checkpointer identifier (type inferred as memory - no database)
+tools:
+  genie_tool: &genie_tool
+    name: genie
+    function:
+      type: factory                                 # Tool type: factory function
+      name: dao_ai.tools.create_genie_tool          # Factory function path
+      args:                                         # Arguments passed to factory
+        name: my_genie_tool
+        description: Answers questions about retail products and inventory
+        genie_room: *retail_genie_room              # Reference to Genie room config
+        # Optional L1 Cache: LRU (Least Recently Used) - Fast exact match
+        # Uncomment to enable LRU cache in front of semantic cache
+        # lru_cache_parameters:
+        #   warehouse: *shared_endpoint_warehouse     # Warehouse to re-execute cached SQL
+        #   capacity: 100                             # Maximum number of cached entries
+        #   time_to_live_seconds: 3600                # Cache entries expire after 1 hour
+        # In-Memory Semantic Cache: Similarity-based lookup with LRU eviction (NO database required)
+        # Default settings optimized for ~30 users on 8GB machine:
+        #   - Capacity: 10,000 entries (~200MB, ~330 queries/user)
+        #   - Eviction: LRU (Least Recently Used) keeps hot queries cached
+        #   - TTL: 1 week (accommodates weekly work patterns)
+        #   - Memory: ~4-5% of 8GB system
+        in_memory_semantic_cache_parameters:
+          warehouse: *shared_endpoint_warehouse     # Warehouse used to re-execute cached SQL
+          embedding_model: *embedding_model         # Reference to embedding model
+          # embedding_dims: 1024                    # Auto-detected if omitted (recommended)
+          similarity_threshold: 0.85                # Minimum similarity for question matching (L2 distance to 0-1)
+          context_similarity_threshold: 0.80        # Minimum similarity for context matching
+          # time_to_live_seconds: 604800            # Cache entries expire after 1 week (default)
+          # capacity: 10000                         # Max cache entries, LRU eviction when full (default: 10000, ~200MB)
+          #                                         # Adjust for different scenarios:
+          #                                         #   - Small (5-10 users):  capacity: 1000  (~20MB)
+          #                                         #   - Medium (30 users):   capacity: 10000 (~200MB, default)
+          #                                         #   - Large (100 users):   capacity: 30000 (~600MB)
+          #                                         #   - Unlimited:           capacity: null  (not recommended - unbounded memory)
+          context_window_size: 3                    # Number of previous conversation turns to include
+          # max_context_tokens: 2000                # Maximum context length (default: 2000)
+          # question_weight: 0.6                    # Weight for question similarity (default: 0.6)
+          # context_weight: 0.4                     # Weight for context similarity (default: 0.4)
+          # Note: question_weight + context_weight must equal 1.0
+        persist_conversation: true
+agents:
+  genie: &genie
+    name: genie                                     # Agent identifier
+    description: "Genie Agent with In-Memory Semantic Cache"
+    model: *default_llm                             # Reference to LLM configuration
+    tools:                                          # Tools available to this agent
+      - *genie_tool
+    prompt: |                                       # System prompt defining agent behavior
+      Answers questions about retail products and inventory using natural language.
+      You have access to a semantic cache that remembers similar questions to provide faster responses.
+app:
+  name: genie_in_memory_semantic_cache_dao          # Application name
+  description: "Multi-agent system that talks to genie with in-memory semantic caching (no database required)"
+  log_level: DEBUG                                  # Logging level for the application
+  environment_vars:                                 # Secrets to inject at runtime
+    RETAIL_AI_DATABRICKS_CLIENT_ID: "{{secrets/retail_consumer_goods/RETAIL_AI_DATABRICKS_CLIENT_ID}}"
+    RETAIL_AI_DATABRICKS_CLIENT_SECRET: "{{secrets/retail_consumer_goods/RETAIL_AI_DATABRICKS_CLIENT_SECRET}}"
+    RETAIL_AI_DATABRICKS_HOST: "{{secrets/retail_consumer_goods/RETAIL_AI_DATABRICKS_HOST}}"
+  registered_model:                                 # MLflow registered model configuration
+    schema: *quick_serve_restaurant_schema          # Schema where model will be registered
+    name: dao_genie_in_memory_semantic_cache        # Model name in MLflow registry
+  endpoint_name: dao_genie_in_memory_semantic_cache # Model serving endpoint name
+  tags:                                             # Tags for resource organization
+    business: rcg                                   # Business unit identifier
+    streaming: true                                 # Indicates streaming capabilities
+  permissions:                                      # Model serving permissions
+    - principals: [users]                           # Grant access to all users
+      entitlements:
+        - CAN_QUERY                                 # Query permissions
+  agents:                                           # List of agents included in the system
+    - *genie                                        # Genie agent with in-memory cache
+  orchestration:                                    # Agent orchestration configuration
+    memory: *memory                                 # In-memory conversation persistence
+    swarm:                                          # Swarm orchestration pattern
+      default_agent: *genie                         # Default agent for routing
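
The behaviour that `in_memory_semantic_cache_parameters` configures (similarity lookup, a capacity with LRU eviction, and a time-to-live) can be sketched as below. This is an illustration under assumptions rather than the package's code: the class name, the `1 / (1 + distance)` mapping from L2 distance to a 0-1 similarity, and the injected `clock` are all hypothetical, and real embeddings would come from the configured embedding model.

```python
import math
import time
from collections import OrderedDict
from typing import Callable, Optional, Sequence


def l2_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Map L2 distance into (0, 1]; the 1 / (1 + d) mapping is an assumption."""
    return 1.0 / (1.0 + math.dist(a, b))


class InMemorySemanticCache:
    """Hypothetical similarity cache with LRU eviction and per-entry TTL."""

    def __init__(self, capacity: int = 1000, ttl_seconds: float = 604800.0,
                 similarity_threshold: float = 0.85,
                 clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self.ttl_seconds = ttl_seconds
        self.similarity_threshold = similarity_threshold
        self.clock = clock
        # question -> (embedding, sql, stored_at); insertion order tracks recency.
        self._entries = OrderedDict()

    def put(self, question: str, embedding: Sequence[float], sql: str) -> None:
        self._entries[question] = (list(embedding), sql, self.clock())
        self._entries.move_to_end(question)
        # Evict least recently used entries once capacity is exceeded.
        while len(self._entries) > self.capacity:
            self._entries.popitem(last=False)

    def get(self, embedding: Sequence[float]) -> Optional[str]:
        now = self.clock()
        # Drop entries older than the TTL before searching.
        expired = [k for k, (_, _, ts) in self._entries.items()
                   if now - ts > self.ttl_seconds]
        for key in expired:
            del self._entries[key]
        # Linear scan for the most similar cached embedding.
        best_key, best_sim = None, 0.0
        for key, (cached, _, _) in self._entries.items():
            sim = l2_similarity(embedding, cached)
            if sim > best_sim:
                best_key, best_sim = key, sim
        if best_key is None or best_sim < self.similarity_threshold:
            return None
        self._entries.move_to_end(best_key)  # a hit marks the entry as recently used
        return self._entries[best_key][1]
```

The sizing comments in the config imply roughly 20 KB per entry (about 200 MB for 10,000 entries), which is consistent with the 1,000-entry (~20 MB) and 30,000-entry (~600 MB) figures listed alongside it. A linear scan like the one above is only reasonable at those moderate sizes.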

{dao_ai-0.1.18 → dao_ai-0.1.20}/config/examples/README.md RENAMED

@@ -52,7 +52,8 @@ Or jump directly to the category that matches your current need.
 **Natural language to SQL**
 - Basic Genie integration
 - LRU caching for performance
-- Semantic caching with embeddings
+- PostgreSQL semantic caching with embeddings
+- In-memory semantic caching (no database required)
 👉 Query data with natural language, optimized with caching

{dao_ai-0.1.18 → dao_ai-0.1.20}/docs/examples.md RENAMED

@@ -120,9 +120,10 @@ Improve performance and reduce costs through intelligent caching.
 | Example | Description |
 |---------|-------------|
 | `genie_lru_cache.yaml` | LRU (Least Recently Used) caching for Genie |
-| `genie_semantic_cache.yaml` | Two-tier semantic caching with embeddings |
+| `genie_semantic_cache.yaml` | Two-tier semantic caching with PostgreSQL embeddings |
+| `genie_in_memory_semantic_cache.yaml` | In-memory semantic caching (no database required) |
-**Prerequisites:** PostgreSQL or Lakebase for semantic cache
+**Prerequisites:** PostgreSQL or Lakebase required for `genie_semantic_cache.yaml` only
 **Next:** Add persistence in `05_memory/`
 ---
