PyPI - dao-ai - Versions diffs - 0.1.12__tar.gz → 0.1.14__tar.gz - Mend


This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (304)

{dao_ai-0.1.12 → dao_ai-0.1.14}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: dao-ai
-Version: 0.1.12
+Version: 0.1.14
 Summary: DAO AI: A modular, multi-agent orchestration framework for complex AI workflows. Supports agent handoff, tool integration, and dynamic configuration via YAML.
 Project-URL: Homepage, https://github.com/natefleming/dao-ai
 Project-URL: Documentation, https://natefleming.github.io/dao-ai
@@ -125,7 +125,7 @@ DAO AI Builder generates valid YAML configurations that work seamlessly with thi
 - **[Architecture](docs/architecture.md)** - Understand how DAO works under the hood
 ### Core Concepts
-- **[Key Capabilities](docs/key-capabilities.md)** - Explore 14 powerful features for production agents
+- **[Key Capabilities](docs/key-capabilities.md)** - Explore 15 powerful features for production agents
 - **[Configuration Reference](docs/configuration-reference.md)** - Complete YAML configuration guide
 - **[Examples](docs/examples.md)** - Ready-to-use example configurations
@@ -148,7 +148,7 @@ Before you begin, you'll need:
 - **Python 3.11 or newer** installed on your computer ([download here](https://www.python.org/downloads/))
 - **A Databricks workspace** (ask your IT team or see [Databricks docs](https://docs.databricks.com/))
   - Access to **Unity Catalog** (your organization's data catalog)
-  - **Model Serving** enabled (for deploying AI agents)
+  - **Model Serving** or **Databricks Apps** enabled (for deploying AI agents)
   - *Optional*: Vector Search, Genie (for advanced features)
 **Not sure if you have access?** Your Databricks administrator can grant you permissions.
@@ -345,6 +345,7 @@ DAO provides powerful capabilities for building production-ready AI agents:
 | Feature | Description |
 |---------|-------------|
+| **Dual Deployment Targets** | Deploy to Databricks Model Serving or Databricks Apps with a single config |
 | **Multi-Tool Support** | Python functions, Unity Catalog, MCP, Agent Endpoints |
 | **On-Behalf-Of User** | Per-user permissions and governance |
 | **Advanced Caching** | Two-tier (LRU + Semantic) caching for cost optimization |

{dao_ai-0.1.12 → dao_ai-0.1.14}/README.md RENAMED

@@ -46,7 +46,7 @@ DAO AI Builder generates valid YAML configurations that work seamlessly with thi
 - **[Architecture](docs/architecture.md)** - Understand how DAO works under the hood
 ### Core Concepts
-- **[Key Capabilities](docs/key-capabilities.md)** - Explore 14 powerful features for production agents
+- **[Key Capabilities](docs/key-capabilities.md)** - Explore 15 powerful features for production agents
 - **[Configuration Reference](docs/configuration-reference.md)** - Complete YAML configuration guide
 - **[Examples](docs/examples.md)** - Ready-to-use example configurations
@@ -69,7 +69,7 @@ Before you begin, you'll need:
 - **Python 3.11 or newer** installed on your computer ([download here](https://www.python.org/downloads/))
 - **A Databricks workspace** (ask your IT team or see [Databricks docs](https://docs.databricks.com/))
   - Access to **Unity Catalog** (your organization's data catalog)
-  - **Model Serving** enabled (for deploying AI agents)
+  - **Model Serving** or **Databricks Apps** enabled (for deploying AI agents)
   - *Optional*: Vector Search, Genie (for advanced features)
 **Not sure if you have access?** Your Databricks administrator can grant you permissions.
@@ -266,6 +266,7 @@ DAO provides powerful capabilities for building production-ready AI agents:
 | Feature | Description |
 |---------|-------------|
+| **Dual Deployment Targets** | Deploy to Databricks Model Serving or Databricks Apps with a single config |
 | **Multi-Tool Support** | Python functions, Unity Catalog, MCP, Agent Endpoints |
 | **On-Behalf-Of User** | Per-user permissions and governance |
 | **Advanced Caching** | Two-tier (LRU + Semantic) caching for cost optimization |

{dao_ai-0.1.12 → dao_ai-0.1.14}/config/examples/15_complete_applications/hardware_store_lakebase.yaml RENAMED

@@ -15,14 +15,15 @@
 variables:
   client_id: &client_id
     options:
-      - env: RETAIL_AI_DATABRICKS_CLIENT_ID      # Service principal client ID
       - scope: retail_consumer_goods
         secret: RETAIL_AI_DATABRICKS_CLIENT_ID
+      - env: RETAIL_AI_DATABRICKS_CLIENT_ID      # Service principal client ID
   client_secret: &client_secret
     options:
-      - env: RETAIL_AI_DATABRICKS_CLIENT_SECRET  # Service principal secret
       - scope: retail_consumer_goods
         secret: RETAIL_AI_DATABRICKS_CLIENT_SECRET
+      - env: RETAIL_AI_DATABRICKS_CLIENT_SECRET  # Service principal secret
 schemas:
   retail_schema: &retail_schema
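The hunk above moves the secret-scope entry ahead of the environment-variable entry in each `options` list. The diff does not spell out the resolution rule. Assuming `options` is evaluated in order and the first source that yields a value wins, the effective lookup can be sketched as follows. `resolve_first`, `SecretLookup` and the dict shapes are hypothetical illustrations, not dao-ai's actual resolver.

```python
import os
from typing import Callable, Optional

# Hypothetical signature for a secret-scope reader: (scope, key) -> value or None.
SecretLookup = Callable[[str, str], Optional[str]]


def resolve_first(options: list[dict], read_secret: SecretLookup) -> Optional[str]:
    """Return the first non-empty value from an ordered list of sources.

    Hypothetical sketch: each option is either {"scope": ..., "secret": ...}
    or {"env": ...}. dao-ai's real resolver may behave differently.
    """
    for option in options:
        if "scope" in option and "secret" in option:
            value = read_secret(option["scope"], option["secret"])
        elif "env" in option:
            value = os.environ.get(option["env"])
        else:
            continue  # unknown option shape: skip it
        if value:
            return value
    return None


# Mirrors the 0.1.14 ordering: secret scope first, environment variable second.
client_id_options = [
    {"scope": "retail_consumer_goods", "secret": "RETAIL_AI_DATABRICKS_CLIENT_ID"},
    {"env": "RETAIL_AI_DATABRICKS_CLIENT_ID"},
]
```

Under that assumption, the new order means a value stored in the `retail_consumer_goods` scope is preferred. A locally exported `RETAIL_AI_DATABRICKS_CLIENT_ID` is used only when the secret cannot be read.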

{dao_ai-0.1.12 → dao_ai-0.1.14}/docs/key-capabilities.md RENAMED

@@ -2,7 +2,62 @@
 These are the powerful features that make DAO production-ready. Don't worry if some seem complex — you can start simple and add these capabilities as you need them.
-## 1. Multi-Tool Support
+## 1. Dual Deployment Targets
+**What is this?** DAO agents can be deployed to either **Databricks Model Serving** or **Databricks Apps** using the same configuration, giving you flexibility in how you expose your agent.
+**Why this matters:**
+- **Model Serving**: Traditional endpoint for inference workloads, autoscaling, pay-per-token pricing
+- **Databricks Apps**: Full web applications with custom UI, background jobs, and richer integrations
+- **Single Configuration**: Switch deployment targets with one line change — no code rewrite needed
+- **Environment Consistency**: Same YAML config works in both environments
+**Comparison:**
+| Feature | Model Serving | Databricks Apps |
+|---------|--------------|-----------------|
+| **Use Case** | Inference API endpoint | Full web application |
+| **Scaling** | Auto-scales based on load | Manual scaling configuration |
+| **UI** | API only | Custom web UI possible |
+| **Pricing** | Pay per token/request | Compute-based |
+| **Deployment Speed** | ~2-5 minutes | ~2-5 minutes |
+| **Best For** | API integrations, high throughput | Interactive apps, custom UX |
+**How to configure:**
+```yaml
+app:
+  name: my_agent
+  deployment_target: model_serving  # or 'apps'
+  # Model Serving specific options (only used when deployment_target: model_serving)
+  endpoint_name: my_agent_endpoint
+  workload_size: Small
+  scale_to_zero: true
+  agents:
+    - *my_agent
+```
+**Deploy to Model Serving:**
+```bash
+dao-ai deploy -c config.yaml --target model_serving
+```
+**Deploy to Databricks Apps:**
+```bash
+dao-ai deploy -c config.yaml --target apps
+```
+**CLI override:** The `--target` flag always takes precedence over the YAML config, making it easy to deploy the same config to different environments.
+**Behind the scenes:**
+- Model Serving deployments create an MLflow model and serving endpoint
+- Apps deployments create a Databricks App with MLflow experiment tracking
+- Both share the same agent code, tools, and orchestration logic
+- Environment variables and secrets are automatically configured for each platform
+## 2. Multi-Tool Support
 **What are tools?** Tools are actions an agent can perform — like querying a database, calling an API, or running custom code.
@@ -57,7 +112,7 @@ tools:
       connection: *github_connection
 ```
-## 2. On-Behalf-Of User Support
+## 3. On-Behalf-Of User Support
 **What is this?** Many Databricks resources (like SQL warehouses, Genie spaces, and LLMs) can operate "on behalf of" the end user, using their permissions instead of the agent's service account credentials.
@@ -104,7 +159,7 @@ The same agent code enforces different permissions for each user automatically.
 - The user must have the necessary permissions on the underlying resources
 - Not all Databricks resources support on-behalf-of functionality
-## 3. Advanced Caching (Genie Queries)
+## 4. Advanced Caching (Genie Queries)
 **Why caching matters:** When users ask similar questions repeatedly, you don't want to pay for the same AI processing over and over. Caching stores results so you can reuse them.
@@ -234,7 +289,7 @@ This works by embedding both the current question *and* recent conversation turn
 For more details on semantic cache configuration, see [docs/semantic_cache_weight_configuration.md](semantic_cache_weight_configuration.md).
-## 4. Vector Search Reranking
+## 5. Vector Search Reranking
 **The problem:** Vector search (semantic similarity) is fast but sometimes returns loosely related results. It's like a librarian who quickly grabs 50 books that *might* be relevant.
@@ -321,7 +376,7 @@ rerank:
 **Note:** Model weights are downloaded automatically on first use (~34MB for MiniLM-L-12-v2).
-## 5. Human-in-the-Loop Approvals
+## 6. Human-in-the-Loop Approvals
 **Why this matters:** Some actions are too important to automate completely. For example, you might want human approval before an agent:
 - Deletes data
@@ -343,7 +398,7 @@ tools:
         review_prompt: "This operation will modify production data. Approve?"
 ```
-## 6. Memory & State Persistence
+## 7. Memory & State Persistence
 **What is memory?** Your agent needs to remember past conversations. When a user asks "What about size XL?" the agent should remember they were talking about shirts.
@@ -395,7 +450,7 @@ memory:
 - **PostgreSQL**: When you need external database features or already have PostgreSQL infrastructure
 - **Lakebase**: When you want Databricks-native persistence with Unity Catalog governance
-## 7. MLflow Prompt Registry Integration
+## 8. MLflow Prompt Registry Integration
 **The problem:** Prompts (instructions you give to AI models) need constant refinement. Hardcoding them in YAML means every change requires redeployment.
@@ -427,7 +482,7 @@ agents:
     prompt: *product_expert_prompt  # Loaded from MLflow registry
 ```
-## 8. Automated Prompt Optimization
+## 9. Automated Prompt Optimization
 **What is this?** Instead of manually tweaking prompts through trial and error, DAO can automatically test variations and find the best one.
@@ -452,7 +507,7 @@ optimizations:
       num_candidates: 5
 ```
-## 9. Guardrails & Response Quality Middleware
+## 10. Guardrails & Response Quality Middleware
 **What are guardrails?** Safety and quality controls that validate agent responses before they reach users. Think of them as quality assurance checkpoints.
@@ -670,7 +725,7 @@ agents:
         prompt: *professional_tone_prompt
 ```
-## 10. Conversation Summarization
+## 11. Conversation Summarization
 **The problem:** AI models have a maximum amount of text they can process (the "context window"). Long conversations eventually exceed this limit.
@@ -697,7 +752,7 @@ The `LoggingSummarizationMiddleware` provides detailed observability:
 INFO | Summarization: BEFORE 25 messages (~12500 tokens) → AFTER 3 messages (~2100 tokens) | Reduced by ~10400 tokens
 ```
-## 11. Structured Output (Response Format)
+## 12. Structured Output (Response Format)
 **What is this?** A way to force your agent to return data in a specific JSON structure, making responses machine-readable and predictable.
@@ -749,7 +804,7 @@ See `config/examples/09_structured_output/structured_output.yaml` for a complete
 ---
-## 12. Custom Input & Custom Output Support
+## 13. Custom Input & Custom Output Support
 **What is this?** A flexible system for passing custom configuration values to your agents and receiving enriched output with runtime state.
@@ -829,7 +884,7 @@ When invoked with the `custom_inputs` above, the prompt automatically populates:
 - `session` state is automatically maintained and returned in `custom_outputs`
 - Backward compatible with legacy flat custom_inputs format
-## 13. Middleware (Input Validation, Logging, Monitoring)
+## 14. Middleware (Input Validation, Logging, Monitoring)
 **What is middleware?** Middleware are functions that wrap around agent execution to add cross-cutting concerns like validation, logging, authentication, and monitoring. They run before and after the agent processes requests.
@@ -1049,7 +1104,7 @@ agents:
 ---
-## 14. Hook System
+## 15. Hook System
 **What are hooks?** Hooks let you run custom code at specific moments in your agent's lifecycle — like "before starting" or "when shutting down".
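The new "Dual Deployment Targets" section of `key-capabilities.md` states that the `--target` CLI flag always takes precedence over `deployment_target` in the YAML. Below is a minimal sketch of that precedence rule. It assumes the only accepted values are `model_serving` and `apps`, which are the two used in the examples. `Target` and `effective_target` are hypothetical names. They are not dao-ai's `DeploymentTarget` enum or its CLI code.

```python
from enum import Enum
from typing import Optional


class Target(str, Enum):
    """Hypothetical stand-in for dao-ai's DeploymentTarget enum."""

    MODEL_SERVING = "model_serving"
    APPS = "apps"


def effective_target(cli_target: Optional[str], yaml_target: str) -> Target:
    """Pick the deployment target, letting an explicit CLI flag win over YAML."""
    chosen = cli_target if cli_target is not None else yaml_target
    # Target(value) raises ValueError for anything other than the two known values.
    return Target(chosen)
```

For example, `effective_target("apps", "model_serving")` yields `Target.APPS`. This mirrors deploying a config that says `model_serving` with `--target apps`.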

{dao_ai-0.1.12 → dao_ai-0.1.14}/notebooks/07_run_evaluation.py RENAMED

@@ -225,6 +225,8 @@ def tool_call_efficiency(trace: Trace) -> Feedback:
         value=True,
         rationale=f"Efficient tool usage: {len(tool_calls)} successful calls"
     )
 # COMMAND ----------

{dao_ai-0.1.12 → dao_ai-0.1.14}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 [project]
 name = "dao-ai"
-version = "0.1.12"
+version = "0.1.14"
 description = "DAO AI: A modular, multi-agent orchestration framework for complex AI workflows. Supports agent handoff, tool integration, and dynamic configuration via YAML."
 readme = "README.md"
 license = { text = "MIT" }

{dao_ai-0.1.12 → dao_ai-0.1.14}/src/dao_ai/apps/handlers.py RENAMED

@@ -14,7 +14,7 @@ from typing import AsyncGenerator
 import mlflow
 from dotenv import load_dotenv
-from mlflow.genai.agent_server import invoke, stream
+from mlflow.genai.agent_server import get_request_headers, invoke, stream
 from mlflow.types.responses import (
     ResponsesAgentRequest,
     ResponsesAgentResponse,
@@ -25,6 +25,23 @@ from dao_ai.config import AppConfig
 from dao_ai.logging import configure_logging
 from dao_ai.models import LanggraphResponsesAgent
+def _inject_headers_into_request(request: ResponsesAgentRequest) -> None:
+    """Inject request headers into custom_inputs for Context propagation.
+    Captures headers from the MLflow AgentServer context (where they're available)
+    and injects them into request.custom_inputs.configurable.headers so they
+    flow through to Context and can be used for OBO authentication.
+    """
+    headers: dict[str, str] = get_request_headers()
+    if headers:
+        if request.custom_inputs is None:
+            request.custom_inputs = {}
+        if "configurable" not in request.custom_inputs:
+            request.custom_inputs["configurable"] = {}
+        request.custom_inputs["configurable"]["headers"] = headers
 # Load environment variables from .env.local if it exists
 load_dotenv(dotenv_path=".env.local", override=True)
@@ -61,6 +78,8 @@ async def non_streaming(request: ResponsesAgentRequest) -> ResponsesAgentRespons
     Returns:
         ResponsesAgentResponse with the complete output
     """
+    # Capture headers while in the AgentServer async context (before they're lost)
+    _inject_headers_into_request(request)
     return await _responses_agent.apredict(request)
@@ -80,5 +99,7 @@ async def streaming(
     Yields:
         ResponsesAgentStreamEvent objects as they are generated
     """
+    # Capture headers while in the AgentServer async context (before they're lost)
+    _inject_headers_into_request(request)
     async for event in _responses_agent.apredict_stream(request):
         yield event
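`_inject_headers_into_request` copies the AgentServer request headers into `request.custom_inputs["configurable"]["headers"]`. It creates the intermediate dicts only when they are missing. The sketch below reproduces that nesting so it can be run without MLflow, with two changes:

- It takes the headers as an argument instead of calling `get_request_headers()`.
- It uses a hypothetical `FakeRequest` dataclass in place of `ResponsesAgentRequest`.

It also copies the headers with `dict(...)`, whereas the diff assigns the dict directly.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class FakeRequest:
    """Hypothetical stand-in for ResponsesAgentRequest; only custom_inputs matters here."""

    custom_inputs: Optional[dict[str, Any]] = None


def inject_headers(request: FakeRequest, headers: dict[str, str]) -> None:
    """Copy headers into request.custom_inputs["configurable"]["headers"].

    Mirrors the nesting done by _inject_headers_into_request: existing
    custom_inputs and configurable keys are preserved; only "headers" is set.
    """
    if not headers:
        return  # nothing captured, leave the request untouched
    if request.custom_inputs is None:
        request.custom_inputs = {}
    request.custom_inputs.setdefault("configurable", {})["headers"] = dict(headers)
```

Keys the client already placed under `configurable`, such as a thread ID, survive the injection. Only `headers` is overwritten.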

{dao_ai-0.1.12 → dao_ai-0.1.14}/src/dao_ai/config.py RENAMED

@@ -7,6 +7,7 @@ from enum import Enum
 from os import PathLike
 from pathlib import Path
 from typing import (
+    TYPE_CHECKING,
     Any,
     Callable,
     Iterator,
@@ -18,6 +19,9 @@ from typing import (
     Union,
 )
+if TYPE_CHECKING:
+    from dao_ai.state import Context
 from databricks.sdk import WorkspaceClient
 from databricks.sdk.credentials_provider import (
     CredentialsStrategy,
@@ -284,8 +288,8 @@ class IsDatabricksResource(ABC, BaseModel):
         Authentication priority:
         1. On-Behalf-Of User (on_behalf_of_user=True):
-           - Forwarded headers (Databricks Apps)
-           - ModelServingUserCredentials (Model Serving)
+           - Uses ModelServingUserCredentials (Model Serving)
+           - For Databricks Apps with headers, use workspace_client_from(context)
         2. Service Principal (client_id + client_secret + workspace_host)
         3. PAT (pat + workspace_host)
         4. Ambient/default authentication
@@ -294,36 +298,6 @@ class IsDatabricksResource(ABC, BaseModel):
         # Check for OBO first (highest priority)
         if self.on_behalf_of_user:
-            # NEW: In Databricks Apps, use forwarded headers for per-user auth
-            try:
-                from mlflow.genai.agent_server import get_request_headers
-                headers = get_request_headers()
-                forwarded_token = headers.get("x-forwarded-access-token")
-                if forwarded_token:
-                    forwarded_user = headers.get("x-forwarded-user", "unknown")
-                    logger.debug(
-                        f"Creating WorkspaceClient for {self.__class__.__name__} "
-                        f"with OBO using forwarded token from Databricks Apps",
-                        forwarded_user=forwarded_user,
-                    )
-                    # Use workspace_host if configured, otherwise SDK will auto-detect
-                    workspace_host_value: str | None = (
-                        normalize_host(value_of(self.workspace_host))
-                        if self.workspace_host
-                        else None
-                    )
-                    return WorkspaceClient(
-                        host=workspace_host_value,
-                        token=forwarded_token,
-                        auth_type="pat",
-                    )
-            except (ImportError, LookupError):
-                # mlflow not available or headers not set - fall through to Model Serving
-                pass
-            # Fall back to Model Serving OBO (existing behavior)
             credentials_strategy: CredentialsStrategy = ModelServingUserCredentials()
             logger.debug(
                 f"Creating WorkspaceClient for {self.__class__.__name__} "
@@ -382,6 +356,55 @@ class IsDatabricksResource(ABC, BaseModel):
         )
         return WorkspaceClient()
+    def workspace_client_from(self, context: "Context | None") -> WorkspaceClient:
+        """
+        Get a WorkspaceClient using headers from the provided Context.
+        Use this method from tools that have access to ToolRuntime[Context].
+        This allows OBO authentication to work in Databricks Apps where headers
+        are captured at request entry and passed through the Context.
+        Args:
+            context: Runtime context containing headers for OBO auth.
+                     If None or no headers, falls back to workspace_client property.
+        Returns:
+            WorkspaceClient configured with appropriate authentication.
+        """
+        from dao_ai.utils import normalize_host
+        # Check if we have headers in context for OBO
+        if context and context.headers and self.on_behalf_of_user:
+            headers = context.headers
+            # Try both lowercase and title-case header names (HTTP headers are case-insensitive)
+            forwarded_token = headers.get("x-forwarded-access-token") or headers.get(
+                "X-Forwarded-Access-Token"
+            )
+            if forwarded_token:
+                forwarded_user = headers.get("x-forwarded-user") or headers.get(
+                    "X-Forwarded-User", "unknown"
+                )
+                logger.debug(
+                    f"Creating WorkspaceClient for {self.__class__.__name__} "
+                    f"with OBO using forwarded token from Context",
+                    forwarded_user=forwarded_user,
+                )
+                # Use workspace_host if configured, otherwise SDK will auto-detect
+                workspace_host_value: str | None = (
+                    normalize_host(value_of(self.workspace_host))
+                    if self.workspace_host
+                    else None
+                )
+                return WorkspaceClient(
+                    host=workspace_host_value,
+                    token=forwarded_token,
+                    auth_type="pat",
+                )
+        # Fall back to existing workspace_client property
+        return self.workspace_client
 class DeploymentTarget(str, Enum):
     """Target platform for agent deployment."""
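`workspace_client_from` checks exactly two spellings of each header: `x-forwarded-access-token` and `X-Forwarded-Access-Token`. HTTP field names are case-insensitive, so a different casing such as `X-FORWARDED-ACCESS-TOKEN` would fall through to the non-OBO `workspace_client` path. Below is a fully case-insensitive lookup, offered as a possible alternative rather than what dao-ai does. `get_header` and `forwarded_identity` are hypothetical helpers.

```python
from typing import Mapping, Optional


def get_header(
    headers: Mapping[str, str], name: str, default: Optional[str] = None
) -> Optional[str]:
    """Look up an HTTP header regardless of the casing used by the sender."""
    wanted = name.lower()
    for key, value in headers.items():
        if key.lower() == wanted:
            return value
    return default


def forwarded_identity(headers: Mapping[str, str]) -> tuple[Optional[str], Optional[str]]:
    """Return (forwarded access token or None, forwarded user or "unknown")."""
    token = get_header(headers, "x-forwarded-access-token")
    user = get_header(headers, "x-forwarded-user", "unknown")
    return token, user
```

A non-`None` token would then be passed to `WorkspaceClient(host=..., token=token, auth_type="pat")`, as in the diff.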

{dao_ai-0.1.12 → dao_ai-0.1.14}/src/dao_ai/tools/genie.py RENAMED

@@ -139,29 +139,53 @@ Returns:
 GenieResponse: A response object containing the conversation ID and result from Genie."""
     tool_description = tool_description + function_docs
-    genie: Genie = Genie(
-        space_id=space_id,
-        client=genie_room.workspace_client,
-        truncate_results=truncate_results,
-    )
+    # Cache for genie service - created lazily on first call
+    # This allows us to use workspace_client_from with runtime context for OBO
+    _cached_genie_service: GenieServiceBase | None = None
+    def _get_genie_service(context: Context | None) -> GenieServiceBase:
+        """Get or create the Genie service, using context for OBO auth if available."""
+        nonlocal _cached_genie_service
+        # Use cached service if available (for non-OBO or after first call)
+        # For OBO, we need fresh workspace client each time to use the user's token
+        if _cached_genie_service is not None and not genie_room.on_behalf_of_user:
+            return _cached_genie_service
+        # Get workspace client using context for OBO support
+        from databricks.sdk import WorkspaceClient
-    genie_service: GenieServiceBase = GenieService(genie)
-    # Wrap with semantic cache first (checked second due to decorator pattern)
-    if semantic_cache_parameters is not None:
-        genie_service = SemanticCacheService(
-            impl=genie_service,
-            parameters=semantic_cache_parameters,
-            workspace_client=genie_room.workspace_client,  # Pass workspace client for conversation history
-        ).initialize()  # Eagerly initialize to fail fast and create table
-    # Wrap with LRU cache last (checked first - fast O(1) exact match)
-    if lru_cache_parameters is not None:
-        genie_service = LRUCacheService(
-            impl=genie_service,
-            parameters=lru_cache_parameters,
+        workspace_client: WorkspaceClient = genie_room.workspace_client_from(context)
+        genie: Genie = Genie(
+            space_id=space_id,
+            client=workspace_client,
+            truncate_results=truncate_results,
         )
+        genie_service: GenieServiceBase = GenieService(genie)
+        # Wrap with semantic cache first (checked second due to decorator pattern)
+        if semantic_cache_parameters is not None:
+            genie_service = SemanticCacheService(
+                impl=genie_service,
+                parameters=semantic_cache_parameters,
+                workspace_client=workspace_client,
+            ).initialize()
+        # Wrap with LRU cache last (checked first - fast O(1) exact match)
+        if lru_cache_parameters is not None:
+            genie_service = LRUCacheService(
+                impl=genie_service,
+                parameters=lru_cache_parameters,
+            )
+        # Cache for non-OBO scenarios
+        if not genie_room.on_behalf_of_user:
+            _cached_genie_service = genie_service
+        return genie_service
     @tool(
         name_or_callable=tool_name,
         description=tool_description,
@@ -177,6 +201,10 @@ GenieResponse: A response object containing the conversation ID and result from
         # Access state through runtime
         state: AgentState = runtime.state
         tool_call_id: str = runtime.tool_call_id
+        context: Context | None = runtime.context
+        # Get genie service with OBO support via context
+        genie_service: GenieServiceBase = _get_genie_service(context)
         # Ensure space_id is a string for state keys
         space_id_str: str = str(space_id)
@@ -194,6 +222,14 @@ GenieResponse: A response object containing the conversation ID and result from
             conversation_id=existing_conversation_id,
         )
+        # Log the prompt being sent to Genie
+        logger.trace(
+            "Sending prompt to Genie",
+            space_id=space_id_str,
+            conversation_id=existing_conversation_id,
+            prompt=question[:500] + "..." if len(question) > 500 else question,
+        )
         # Call ask_question which always returns CacheResult with cache metadata
         cache_result: CacheResult = genie_service.ask_question(
             question, conversation_id=existing_conversation_id
@@ -211,6 +247,22 @@ GenieResponse: A response object containing the conversation ID and result from
             cache_key=cache_key,
         )
+        # Log truncated response for debugging
+        result_preview: str = str(genie_response.result)
+        if len(result_preview) > 500:
+            result_preview = result_preview[:500] + "..."
+        logger.trace(
+            "Genie response content",
+            question=question[:100] + "..." if len(question) > 100 else question,
+            query=genie_response.query,
+            description=(
+                genie_response.description[:200] + "..."
+                if genie_response.description and len(genie_response.description) > 200
+                else genie_response.description
+            ),
+            result_preview=result_preview,
+        )
         # Update session state with cache information
         if persist_conversation:
             session.genie.update_space(
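In `genie.py`, `_get_genie_service` replaces the eagerly built service with a closure that builds it on first use:

- **Without OBO**, the result is cached in a `nonlocal` variable and reused.
- **With `on_behalf_of_user`**, the service is rebuilt on every call, so each request gets a client bound to the caller's forwarded token.

The pattern, stripped of Genie specifics, looks roughly like this. `make_service_getter` and the `build` callback are hypothetical names, and the context is simplified to a dict.

```python
from typing import Callable, Optional


def make_service_getter(
    build: Callable[[Optional[dict]], object], on_behalf_of_user: bool
) -> Callable[[Optional[dict]], object]:
    """Return a getter that builds the service lazily.

    - Non-OBO: build once on first call, then reuse the cached instance.
    - OBO: build on every call so each request uses its caller's credentials.
    """
    cached: Optional[object] = None

    def get(context: Optional[dict]) -> object:
        nonlocal cached
        if cached is not None and not on_behalf_of_user:
            return cached
        service = build(context)
        if not on_behalf_of_user:
            cached = service
        return service

    return get
```

One consequence worth checking: in OBO mode the diff re-creates the `LRUCacheService` and `SemanticCacheService` wrappers on every call, including `.initialize()`. Unless those classes keep their state outside the instance, in-memory LRU hits will not carry over between OBO requests.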
