PyPI - dao-ai - Versions diffs - 0.1.2__tar.gz → 0.1.3__tar.gz - Mend

dao-ai 0.1.2.tar.gz → 0.1.3.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (278)

{dao_ai-0.1.2 → dao_ai-0.1.3}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: dao-ai
-Version: 0.1.2
+Version: 0.1.3
 Summary: DAO AI: A modular, multi-agent orchestration framework for complex AI workflows. Supports agent handoff, tool integration, and dynamic configuration via YAML.
 Project-URL: Homepage, https://github.com/natefleming/dao-ai
 Project-URL: Documentation, https://natefleming.github.io/dao-ai
@@ -79,7 +79,7 @@ Description-Content-Type: text/markdown
 # DAO: Declarative Agent Orchestration
-[![Version](https://img.shields.io/badge/version-0.1.0-blue.svg)](CHANGELOG.md)
+[![Version](https://img.shields.io/badge/version-0.1.2-blue.svg)](CHANGELOG.md)
 [![Python](https://img.shields.io/badge/python-3.11+-green.svg)](https://www.python.org/)
 [![License](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)

{dao_ai-0.1.2 → dao_ai-0.1.3}/README.md RENAMED

@@ -1,6 +1,6 @@
 # DAO: Declarative Agent Orchestration
-[![Version](https://img.shields.io/badge/version-0.1.0-blue.svg)](CHANGELOG.md)
+[![Version](https://img.shields.io/badge/version-0.1.2-blue.svg)](CHANGELOG.md)
 [![Python](https://img.shields.io/badge/python-3.11+-green.svg)](https://www.python.org/)
 [![License](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)

{dao_ai-0.1.2 → dao_ai-0.1.3}/config/examples/03_reranking/vector_search_with_reranking.yaml RENAMED

@@ -83,7 +83,7 @@ retrievers:
       filters: {}
       query_type: ANN
     rerank:                                         # Custom reranking configuration
-      model: ms-marco-MiniLM-L-6-v2                 # Faster, lighter model
+      model: ms-marco-TinyBERT-L-2-v2               # Fastest, smallest model (~4MB)
       top_n: 5                                      # Return top 5 after reranking
       # cache_dir: ~/.dao_ai/cache/flashrank        # Cache directory (default shown)
@@ -258,7 +258,7 @@ app:
 #    - Use when: Need better results than standard search
 #
 # 3. Fast Reranking - fast_search_agent
-#    - Uses lighter model (ms-marco-MiniLM-L-6-v2)
+#    - Uses lighter model (ms-marco-TinyBERT-L-2-v2)
 #    - Retrieves 100 candidates, returns top 5
 #    - Faster than default but still improves results
 #    - Use when: High volume, need speed with some accuracy boost
@@ -286,8 +286,10 @@ app:
 #   - More candidates = better reranking but slower
 #   - Cache directory should be on fast storage
 #
-# Model Selection:
-#   - ms-marco-TinyBERT-L-2-v2: Fastest, basic accuracy
-#   - ms-marco-MiniLM-L-6-v2: Fast, good accuracy
-#   - ms-marco-MiniLM-L-12-v2: Balanced (default)
-#   - rank-T5-flan: Most accurate, slowest
+# Model Selection (see https://github.com/PrithivirajDamodaran/FlashRank):
+#   - ms-marco-TinyBERT-L-2-v2: ~4MB, fastest
+#   - ms-marco-MiniLM-L-12-v2: ~34MB, best cross-encoder (default)
+#   - rank-T5-flan: ~110MB, best non cross-encoder
+#   - ms-marco-MultiBERT-L-12: ~150MB, multilingual (100+ languages)
+#   - ce-esci-MiniLM-L12-v2: e-commerce optimized (Amazon ESCI)
+#   - miniReranker_arabic_v1: Arabic language

{dao_ai-0.1.2 → dao_ai-0.1.3}/config/examples/15_complete_applications/hardware_store_lakebase.yaml RENAMED

@@ -689,7 +689,7 @@ app:
   registered_model:                                 # MLflow registered model configuration
     schema: *retail_schema                          # Schema where model will be registered
     name: hardware_store_lakebase_dao
-  endpoint_name: hardware_store_postgres_beta           # Model serving endpoint name
+  endpoint_name: hardware_store_lakebase_dao           # Model serving endpoint name
   environment_vars:
     PGHOST: "{{secrets/retail_ai/PGHOST}}"  # Databricks host URL
     RETAIL_AI_DATABRICKS_CLIENT_ID: "{{secrets/retail_ai/RETAIL_AI_DATABRICKS_CLIENT_ID}}"

{dao_ai-0.1.2 → dao_ai-0.1.3}/docs/key-capabilities.md RENAMED

@@ -298,12 +298,16 @@ Vector embeddings capture semantic similarity but may rank loosely related docum
 ### Available Models
-| Model | Speed | Quality | Use Case |
-|-------|-------|---------|----------|
-| `ms-marco-TinyBERT-L-2-v2` | ⚡⚡⚡ Fastest | Good | High-throughput, latency-sensitive |
-| `ms-marco-MiniLM-L-6-v2` | ⚡⚡ Fast | Better | Balanced performance |
-| `ms-marco-MiniLM-L-12-v2` | ⚡ Moderate | Best | Default, recommended |
-| `rank-T5-flan` | Slower | Excellent | Maximum accuracy |
+See [FlashRank](https://github.com/PrithivirajDamodaran/FlashRank) for the full list of supported models.
+| Model | Size | Speed | Use Case |
+|-------|------|-------|----------|
+| `ms-marco-TinyBERT-L-2-v2` | ~4MB | ⚡⚡⚡ Fastest | High-throughput, latency-sensitive |
+| `ms-marco-MiniLM-L-12-v2` | ~34MB | ⚡⚡ Fast | Default, best cross-encoder |
+| `rank-T5-flan` | ~110MB | ⚡ Moderate | Best non cross-encoder |
+| `ms-marco-MultiBERT-L-12` | ~150MB | Slower | Multilingual (100+ languages) |
+| `ce-esci-MiniLM-L12-v2` | - | ⚡⚡ Fast | E-commerce optimized |
+| `miniReranker_arabic_v1` | - | ⚡⚡ Fast | Arabic language |
 ### Configuration Options
@@ -311,11 +315,11 @@ Vector embeddings capture semantic similarity but may rank loosely related docum
 rerank:
   model: ms-marco-MiniLM-L-12-v2    # FlashRank model name
   top_n: 10                          # Documents to return (default: all)
-  cache_dir: /tmp/flashrank_cache    # Model weights cache location
+  cache_dir: ~/.dao_ai/cache/flashrank  # Model weights cache location
   columns: [description, name]       # Columns for Databricks Reranker (optional)
 ```
-**Note:** Model weights are downloaded automatically on first use (~20MB for MiniLM-L-12-v2).
+**Note:** Model weights are downloaded automatically on first use (~34MB for MiniLM-L-12-v2).
 ## 5. Human-in-the-Loop Approvals
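
The retrieve-then-rerank workflow that these options configure can be sketched in plain Python. This is a minimal illustration, not the dao-ai or FlashRank implementation: `overlap_score` is a toy stand-in for a cross-encoder model, and `rerank` is a hypothetical helper whose `top_n=None` default mirrors the documented "default: all" behavior.

```python
from typing import Callable, Optional


def overlap_score(query: str, text: str) -> float:
    """Toy relevance score: fraction of query tokens present in the text.

    Assumption: stands in for a real reranking model such as
    ms-marco-MiniLM-L-12-v2; it is only here to make the sketch runnable.
    """
    query_tokens = set(query.lower().split())
    text_tokens = set(text.lower().split())
    if not query_tokens:
        return 0.0
    return len(query_tokens & text_tokens) / len(query_tokens)


def rerank(
    query: str,
    candidates: list[str],
    top_n: Optional[int] = None,
    score_fn: Callable[[str, str], float] = overlap_score,
) -> list[str]:
    """Reorder candidates by descending score and keep the first top_n."""
    ranked = sorted(candidates, key=lambda doc: score_fn(query, doc), reverse=True)
    return ranked if top_n is None else ranked[:top_n]


# Step 1: retrieve more candidates than needed (hard-coded here).
candidates = [
    "garden hose 50 ft",
    "cordless drill with battery",
    "drill bit set for cordless drill",
    "paint roller",
]
# Steps 2 and 3: rerank all candidates, then return only the best matches.
top = rerank("cordless drill battery", candidates, top_n=2)
```

Swapping `score_fn` for a call into a real reranker would keep the rest of the flow unchanged.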

{dao_ai-0.1.2 → dao_ai-0.1.3}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 [project]
 name = "dao-ai"
-version = "0.1.2"
+version = "0.1.3"
 description = "DAO AI: A modular, multi-agent orchestration framework for complex AI workflows. Supports agent handoff, tool integration, and dynamic configuration via YAML."
 readme = "README.md"
 license = { text = "MIT" }

{dao_ai-0.1.2 → dao_ai-0.1.3}/requirements.txt RENAMED

@@ -63,7 +63,6 @@ grandalf==0.8
 graphene==3.4.3
 graphql-core==3.2.7
 graphql-relay==3.2.0
-greenlet==3.3.0
 grpcio==1.76.0
 grpcio-status==1.76.0
 gunicorn==23.0.0

{dao_ai-0.1.2 → dao_ai-0.1.3}/schemas/model_config_schema.json RENAMED

@@ -3402,7 +3402,7 @@
     },
     "RerankParametersModel": {
       "additionalProperties": false,
-      "description": "Configuration for reranking retrieved documents using FlashRank.\n\nFlashRank provides fast, local reranking without API calls using lightweight\ncross-encoder models. Reranking improves retrieval quality by reordering results\nbased on semantic relevance to the query.\n\nTypical workflow:\n1. Retrieve more documents than needed (e.g., 50 via num_results)\n2. Rerank all retrieved documents\n3. Return top_n best matches (e.g., 5)\n\nExample:\n    ```yaml\n    retriever:\n      search_parameters:\n        num_results: 50  # Retrieve more candidates\n      rerank:\n        model: ms-marco-MiniLM-L-12-v2\n        top_n: 5  # Return top 5 after reranking\n    ```\n\nAvailable models (from fastest to most accurate):\n- \"ms-marco-TinyBERT-L-2-v2\" (fastest, smallest)\n- \"ms-marco-MiniLM-L-6-v2\"\n- \"ms-marco-MiniLM-L-12-v2\" (default, good balance)\n- \"rank-T5-flan\" (most accurate, slower)",
+      "description": "Configuration for reranking retrieved documents using FlashRank.\n\nFlashRank provides fast, local reranking without API calls using lightweight\ncross-encoder models. Reranking improves retrieval quality by reordering results\nbased on semantic relevance to the query.\n\nTypical workflow:\n1. Retrieve more documents than needed (e.g., 50 via num_results)\n2. Rerank all retrieved documents\n3. Return top_n best matches (e.g., 5)\n\nExample:\n    ```yaml\n    retriever:\n      search_parameters:\n        num_results: 50  # Retrieve more candidates\n      rerank:\n        model: ms-marco-MiniLM-L-12-v2\n        top_n: 5  # Return top 5 after reranking\n    ```\n\nAvailable models (see https://github.com/PrithivirajDamodaran/FlashRank):\n- \"ms-marco-TinyBERT-L-2-v2\" (~4MB, fastest)\n- \"ms-marco-MiniLM-L-12-v2\" (~34MB, best cross-encoder, default)\n- \"rank-T5-flan\" (~110MB, best non cross-encoder)\n- \"ms-marco-MultiBERT-L-12\" (~150MB, multilingual 100+ languages)\n- \"ce-esci-MiniLM-L12-v2\" (e-commerce optimized, Amazon ESCI)\n- \"miniReranker_arabic_v1\" (Arabic language)",
       "properties": {
         "model": {
           "default": "ms-marco-MiniLM-L-12-v2",
@@ -4336,6 +4336,7 @@
     },
     "VectorStoreModel": {
       "additionalProperties": false,
+      "description": "Configuration model for a Databricks Vector Search store.\n\nSupports two modes:\n1. **Use Existing Index**: Provide only `index` (fully qualified name).\n   Used for querying an existing vector search index at runtime.\n2. **Provisioning Mode**: Provide `source_table` + `embedding_source_column`.\n   Used for creating a new vector search index.\n\nExamples:\n    Minimal configuration (use existing index):\n    ```yaml\n    vector_stores:\n      products_search:\n        index:\n          name: catalog.schema.my_index\n    ```\n\n    Full provisioning configuration:\n    ```yaml\n    vector_stores:\n      products_search:\n        source_table:\n          schema: *my_schema\n          name: products\n        embedding_source_column: description\n        endpoint:\n          name: my_endpoint\n    ```",
       "properties": {
         "on_behalf_of_user": {
           "anyOf": [
@@ -4492,10 +4493,10 @@
           "default": null,
           "title": "Pat"
         },
-        "embedding_model": {
+        "index": {
           "anyOf": [
             {
-              "$ref": "#/$defs/LLMModel"
+              "$ref": "#/$defs/IndexModel"
             },
             {
               "type": "null"
@@ -4503,10 +4504,33 @@
           ],
           "default": null
         },
-        "index": {
+        "source_table": {
           "anyOf": [
             {
-              "$ref": "#/$defs/IndexModel"
+              "$ref": "#/$defs/TableModel"
+            },
+            {
+              "type": "null"
+            }
+          ],
+          "default": null
+        },
+        "embedding_source_column": {
+          "anyOf": [
+            {
+              "type": "string"
+            },
+            {
+              "type": "null"
+            }
+          ],
+          "default": null,
+          "title": "Embedding Source Column"
+        },
+        "embedding_model": {
+          "anyOf": [
+            {
+              "$ref": "#/$defs/LLMModel"
             },
             {
               "type": "null"
@@ -4525,9 +4549,6 @@
           ],
           "default": null
         },
-        "source_table": {
-          "$ref": "#/$defs/TableModel"
-        },
         "source_path": {
           "anyOf": [
             {
@@ -4587,16 +4608,8 @@
           ],
           "default": null,
           "title": "Doc Uri"
-        },
-        "embedding_source_column": {
-          "title": "Embedding Source Column",
-          "type": "string"
         }
       },
-      "required": [
-        "source_table",
-        "embedding_source_column"
-      ],
       "title": "VectorStoreModel",
       "type": "object"
     },

{dao_ai-0.1.2 → dao_ai-0.1.3}/src/dao_ai/cli.py RENAMED

@@ -715,7 +715,15 @@ def run_databricks_command(
     target: Optional[str] = None,
     dry_run: bool = False,
 ) -> None:
-    """Execute a databricks CLI command with optional profile and target."""
+    """Execute a databricks CLI command with optional profile and target.
+    Args:
+        command: The databricks CLI command to execute (e.g., ["bundle", "deploy"])
+        profile: Optional Databricks CLI profile name
+        config: Optional path to the configuration file
+        target: Optional bundle target name
+        dry_run: If True, print the command without executing
+    """
     config_path = Path(config) if config else None
     if config_path and not config_path.exists():
@@ -737,15 +745,17 @@ def run_databricks_command(
         logger.debug(f"Using app-specific target: {target}")
     # Build databricks command (no -c flag needed, uses databricks.yaml in current dir)
+    # Note: --profile is a global flag, but --target is a subcommand flag for 'bundle'
     cmd = ["databricks"]
     if profile:
         cmd.extend(["--profile", profile])
+    cmd.extend(command)
+    # --target must come after the bundle subcommand (it's a subcommand-specific flag)
     if target:
         cmd.extend(["--target", target])
-    cmd.extend(command)
     # Add config_path variable for notebooks
     if config_path and app_config:
         # Calculate relative path from notebooks directory to config file
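
The reordering in the hunk above places `--profile` before the subcommand and `--target` after it. A minimal standalone sketch of that ordering follows; `build_databricks_command` is a hypothetical helper written for illustration and is not a function in `cli.py`.

```python
from typing import Optional


def build_databricks_command(
    command: list[str],
    profile: Optional[str] = None,
    target: Optional[str] = None,
) -> list[str]:
    """Assemble the argument list in the order the 0.1.3 hunk uses."""
    cmd = ["databricks"]
    # --profile is treated as a global flag, so it precedes the subcommand.
    if profile:
        cmd.extend(["--profile", profile])
    cmd.extend(command)
    # --target is treated as a flag of the 'bundle' subcommand, so it follows it.
    if target:
        cmd.extend(["--target", target])
    return cmd


example = build_databricks_command(
    ["bundle", "deploy"], profile="dev", target="my_app"
)
```

With the 0.1.2 ordering, the same inputs would have put `--target my_app` ahead of `bundle deploy`.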

{dao_ai-0.1.2 → dao_ai-0.1.3}/src/dao_ai/config.py RENAMED

@@ -1009,27 +1009,92 @@ class VolumePathModel(BaseModel, HasFullName):
 class VectorStoreModel(IsDatabricksResource):
+    """
+    Configuration model for a Databricks Vector Search store.
+    Supports two modes:
+    1. **Use Existing Index**: Provide only `index` (fully qualified name).
+       Used for querying an existing vector search index at runtime.
+    2. **Provisioning Mode**: Provide `source_table` + `embedding_source_column`.
+       Used for creating a new vector search index.
+    Examples:
+        Minimal configuration (use existing index):
+        ```yaml
+        vector_stores:
+          products_search:
+            index:
+              name: catalog.schema.my_index
+        ```
+        Full provisioning configuration:
+        ```yaml
+        vector_stores:
+          products_search:
+            source_table:
+              schema: *my_schema
+              name: products
+            embedding_source_column: description
+            endpoint:
+              name: my_endpoint
+        ```
+    """
     model_config = ConfigDict(use_enum_values=True, extra="forbid")
-    embedding_model: Optional[LLMModel] = None
+    # RUNTIME: Only index is truly required for querying existing indexes
     index: Optional[IndexModel] = None
+    # PROVISIONING ONLY: Required when creating a new index
+    source_table: Optional[TableModel] = None
+    embedding_source_column: Optional[str] = None
+    embedding_model: Optional[LLMModel] = None
     endpoint: Optional[VectorSearchEndpoint] = None
-    source_table: TableModel
+    # OPTIONAL: For both modes
     source_path: Optional[VolumePathModel] = None
     checkpoint_path: Optional[VolumePathModel] = None
     primary_key: Optional[str] = None
     columns: Optional[list[str]] = Field(default_factory=list)
     doc_uri: Optional[str] = None
-    embedding_source_column: str
+    @model_validator(mode="after")
+    def validate_configuration_mode(self) -> Self:
+        """
+        Validate that configuration is valid for either:
+        - Use existing mode: index is provided
+        - Provisioning mode: source_table + embedding_source_column provided
+        """
+        has_index = self.index is not None
+        has_source_table = self.source_table is not None
+        has_embedding_col = self.embedding_source_column is not None
+        # Must have at least index OR source_table
+        if not has_index and not has_source_table:
+            raise ValueError(
+                "Either 'index' (for existing indexes) or 'source_table' "
+                "(for provisioning) must be provided"
+            )
+        # If provisioning mode, need embedding_source_column
+        if has_source_table and not has_embedding_col:
+            raise ValueError(
+                "embedding_source_column is required when source_table is provided (provisioning mode)"
+            )
+        return self
     @model_validator(mode="after")
     def set_default_embedding_model(self) -> Self:
-        if not self.embedding_model:
+        # Only set default embedding model in provisioning mode
+        if self.source_table is not None and not self.embedding_model:
             self.embedding_model = LLMModel(name="databricks-gte-large-en")
         return self
     @model_validator(mode="after")
     def set_default_primary_key(self) -> Self:
-        if self.primary_key is None:
+        # Only auto-discover primary key in provisioning mode
+        if self.primary_key is None and self.source_table is not None:
             from dao_ai.providers.databricks import DatabricksProvider
             provider: DatabricksProvider = DatabricksProvider()
@@ -1050,14 +1115,16 @@ class VectorStoreModel(IsDatabricksResource):
     @model_validator(mode="after")
     def set_default_index(self) -> Self:
-        if self.index is None:
+        # Only generate index from source_table in provisioning mode
+        if self.index is None and self.source_table is not None:
             name: str = f"{self.source_table.name}_index"
             self.index = IndexModel(schema=self.source_table.schema_model, name=name)
         return self
     @model_validator(mode="after")
     def set_default_endpoint(self) -> Self:
-        if self.endpoint is None:
+        # Only find/create endpoint in provisioning mode
+        if self.endpoint is None and self.source_table is not None:
             from dao_ai.providers.databricks import (
                 DatabricksProvider,
                 with_available_indexes,
@@ -1549,11 +1616,13 @@ class RerankParametersModel(BaseModel):
             top_n: 5  # Return top 5 after reranking
         ```
-    Available models (from fastest to most accurate):
-    - "ms-marco-TinyBERT-L-2-v2" (fastest, smallest)
-    - "ms-marco-MiniLM-L-6-v2"
-    - "ms-marco-MiniLM-L-12-v2" (default, good balance)
-    - "rank-T5-flan" (most accurate, slower)
+    Available models (see https://github.com/PrithivirajDamodaran/FlashRank):
+    - "ms-marco-TinyBERT-L-2-v2" (~4MB, fastest)
+    - "ms-marco-MiniLM-L-12-v2" (~34MB, best cross-encoder, default)
+    - "rank-T5-flan" (~110MB, best non cross-encoder)
+    - "ms-marco-MultiBERT-L-12" (~150MB, multilingual 100+ languages)
+    - "ce-esci-MiniLM-L12-v2" (e-commerce optimized, Amazon ESCI)
+    - "miniReranker_arabic_v1" (Arabic language)
     """
     model_config = ConfigDict(use_enum_values=True, extra="forbid")
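
The two configuration modes that `VectorStoreModel` now accepts can be illustrated without Pydantic or Databricks. The sketch below is a simplified stand-in, assuming plain strings in place of `IndexModel`, `TableModel`, and `LLMModel`; `VectorStoreSketch` is a hypothetical name, and primary-key and endpoint discovery are left out because they call the Databricks provider.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VectorStoreSketch:
    """Stand-in for VectorStoreModel using strings instead of nested models."""

    index: Optional[str] = None
    source_table: Optional[str] = None
    embedding_source_column: Optional[str] = None
    embedding_model: Optional[str] = None

    def __post_init__(self) -> None:
        # Mode check: at least one of index / source_table must be present.
        if self.index is None and self.source_table is None:
            raise ValueError(
                "Either 'index' (for existing indexes) or 'source_table' "
                "(for provisioning) must be provided"
            )
        # Provisioning mode also needs the column to embed.
        if self.source_table is not None and self.embedding_source_column is None:
            raise ValueError(
                "embedding_source_column is required when source_table "
                "is provided (provisioning mode)"
            )
        # Defaults are filled in only when provisioning.
        if self.source_table is not None:
            if self.embedding_model is None:
                self.embedding_model = "databricks-gte-large-en"
            if self.index is None:
                self.index = f"{self.source_table}_index"


existing = VectorStoreSketch(index="catalog.schema.my_index")
provisioned = VectorStoreSketch(
    source_table="products", embedding_source_column="description"
)
```

In use-existing mode the defaults stay unset, which matches the new tests asserting that `embedding_model` and `endpoint` remain `None`.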

{dao_ai-0.1.2 → dao_ai-0.1.3}/src/dao_ai/providers/databricks.py RENAMED

@@ -625,6 +625,22 @@ class DatabricksProvider(ServiceProvider):
                 df.write.mode("overwrite").saveAsTable(table)
     def create_vector_store(self, vector_store: VectorStoreModel) -> None:
+        # Validate that this is a provisioning-mode config
+        if vector_store.source_table is None:
+            raise ValueError(
+                "Cannot create vector store: source_table is required for provisioning. "
+                "This VectorStoreModel appears to be configured for 'use existing index' mode. "
+                "To provision a new vector store, provide source_table and embedding_source_column."
+            )
+        if vector_store.embedding_source_column is None:
+            raise ValueError(
+                "Cannot create vector store: embedding_source_column is required for provisioning."
+            )
+        if vector_store.endpoint is None:
+            raise ValueError(
+                "Cannot create vector store: endpoint is required for provisioning."
+            )
         if not endpoint_exists(self.vsc, vector_store.endpoint.name):
             self.vsc.create_endpoint_and_wait(
                 name=vector_store.endpoint.name,
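
Because `create_vector_store` now raises for use-existing configs, a caller may want to filter configs before provisioning. The following is a sketch of one way to do that, not code from dao-ai: `is_provisioning_config` is a hypothetical helper, and `SimpleNamespace` objects stand in for `VectorStoreModel` instances.

```python
from types import SimpleNamespace


def is_provisioning_config(vector_store) -> bool:
    """True when all three fields checked by the new guard clauses are set."""
    return (
        vector_store.source_table is not None
        and vector_store.embedding_source_column is not None
        and vector_store.endpoint is not None
    )


# Stand-in configs; real code would hold VectorStoreModel instances.
stores = {
    "existing": SimpleNamespace(
        source_table=None, embedding_source_column=None, endpoint=None
    ),
    "new": SimpleNamespace(
        source_table="products",
        embedding_source_column="description",
        endpoint="my_endpoint",
    ),
}

# Only configs that pass the check would be handed to create_vector_store.
to_provision = [
    name for name, store in stores.items() if is_provisioning_config(store)
]
```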

{dao_ai-0.1.2 → dao_ai-0.1.3}/tests/dao_ai/test_databricks.py RENAMED

@@ -10,8 +10,10 @@ from dao_ai.config import (
     AppConfig,
     DatabaseModel,
     FunctionModel,
+    IndexModel,
     SchemaModel,
     TableModel,
+    VectorStoreModel,
 )
 from dao_ai.providers.databricks import DatabricksProvider
@@ -1442,3 +1444,188 @@ def test_create_lakebase_instance_role_with_composite_variable():
     mock_workspace_client.database.get_database_instance_role.assert_called_once()
     call_args = mock_workspace_client.database.get_database_instance_role.call_args
     assert call_args.kwargs["name"] == "test-client-id-456"
+# ==================== VectorStoreModel Tests ====================
+@pytest.mark.unit
+def test_vector_store_model_use_existing_index_minimal():
+    """Test VectorStoreModel with minimal config for existing index (use existing mode)."""
+    # Create VectorStoreModel with just an index - this is the minimal config
+    vector_store = VectorStoreModel(
+        index=IndexModel(name="catalog.schema.my_index"),
+    )
+    assert vector_store.index is not None
+    assert vector_store.index.full_name == "catalog.schema.my_index"
+    # Provisioning fields should be None
+    assert vector_store.source_table is None
+    assert vector_store.embedding_source_column is None
+    # Endpoint should NOT be auto-discovered (only in provisioning mode)
+    assert vector_store.endpoint is None
+    # Embedding model should NOT be set (only in provisioning mode)
+    assert vector_store.embedding_model is None
+@pytest.mark.unit
+def test_vector_store_model_use_existing_index_with_optional_fields():
+    """Test VectorStoreModel with existing index and optional fields."""
+    vector_store = VectorStoreModel(
+        index=IndexModel(name="catalog.schema.my_index"),
+        columns=["id", "name", "description"],
+        primary_key="id",
+        doc_uri="https://docs.example.com",
+    )
+    assert vector_store.index.full_name == "catalog.schema.my_index"
+    assert vector_store.columns == ["id", "name", "description"]
+    assert vector_store.primary_key == "id"
+    assert vector_store.doc_uri == "https://docs.example.com"
+    # Provisioning fields remain None
+    assert vector_store.source_table is None
+    assert vector_store.embedding_source_column is None
+@pytest.mark.unit
+def test_vector_store_model_validation_requires_index_or_source_table():
+    """Test that VectorStoreModel fails without either index or source_table."""
+    with pytest.raises(ValueError) as exc_info:
+        VectorStoreModel()
+    assert "Either 'index' (for existing indexes) or 'source_table'" in str(
+        exc_info.value
+    )
+@pytest.mark.unit
+def test_vector_store_model_provisioning_requires_embedding_source_column():
+    """Test that provisioning mode requires embedding_source_column."""
+    schema = SchemaModel(catalog_name="test_catalog", schema_name="test_schema")
+    table = TableModel(schema=schema, name="test_table")
+    with pytest.raises(ValueError) as exc_info:
+        VectorStoreModel(source_table=table)
+    assert "embedding_source_column is required when source_table is provided" in str(
+        exc_info.value
+    )
+@pytest.mark.unit
+def test_vector_store_model_provisioning_mode():
+    """Test VectorStoreModel in provisioning mode (source_table + embedding_source_column)."""
+    schema = SchemaModel(catalog_name="test_catalog", schema_name="test_schema")
+    table = TableModel(schema=schema, name="test_table")
+    # Mock the DatabricksProvider to avoid actual API calls
+    # The import happens inside the validators, so we patch the providers module
+    with patch(
+        "dao_ai.providers.databricks.DatabricksProvider"
+    ) as mock_provider_class:
+        mock_provider = MagicMock()
+        mock_provider.find_primary_key.return_value = ["id"]
+        mock_provider.find_endpoint_for_index.return_value = "test_endpoint"
+        mock_provider_class.return_value = mock_provider
+        vector_store = VectorStoreModel(
+            source_table=table,
+            embedding_source_column="description",
+        )
+        # Index should be auto-generated
+        assert vector_store.index is not None
+        assert vector_store.index.name == "test_table_index"
+        assert (
+            vector_store.index.full_name == "test_catalog.test_schema.test_table_index"
+        )
+        # Default embedding model should be set in provisioning mode
+        assert vector_store.embedding_model is not None
+        assert vector_store.embedding_model.name == "databricks-gte-large-en"
+        # Primary key should be auto-discovered
+        assert vector_store.primary_key == "id"
+        # Endpoint should be auto-discovered in provisioning mode
+        assert vector_store.endpoint is not None
+        assert vector_store.endpoint.name == "test_endpoint"
+@pytest.mark.unit
+def test_vector_store_model_provisioning_with_explicit_index():
+    """Test that explicit index is respected in provisioning mode."""
+    schema = SchemaModel(catalog_name="test_catalog", schema_name="test_schema")
+    table = TableModel(schema=schema, name="test_table")
+    with patch(
+        "dao_ai.providers.databricks.DatabricksProvider"
+    ) as mock_provider_class:
+        mock_provider = MagicMock()
+        mock_provider.find_primary_key.return_value = ["id"]
+        mock_provider.find_endpoint_for_index.return_value = "test_endpoint"
+        mock_provider_class.return_value = mock_provider
+        vector_store = VectorStoreModel(
+            source_table=table,
+            embedding_source_column="description",
+            index=IndexModel(schema=schema, name="custom_index"),
+        )
+        # Explicit index should be preserved
+        assert vector_store.index.name == "custom_index"
+        assert vector_store.index.full_name == "test_catalog.test_schema.custom_index"
+@pytest.mark.unit
+def test_vector_store_model_use_existing_no_auto_discovery():
+    """Test that use existing mode does not trigger expensive auto-discovery."""
+    # This test ensures no DatabricksProvider calls happen in "use existing" mode
+    with patch(
+        "dao_ai.providers.databricks.DatabricksProvider"
+    ) as mock_provider_class:
+        mock_provider = MagicMock()
+        mock_provider_class.return_value = mock_provider
+        vector_store = VectorStoreModel(
+            index=IndexModel(name="catalog.schema.existing_index"),
+        )
+        # In use existing mode, no provider methods should be called
+        mock_provider.find_primary_key.assert_not_called()
+        mock_provider.find_endpoint_for_index.assert_not_called()
+        mock_provider.find_vector_search_endpoint.assert_not_called()
+        # Verify the model is correctly created
+        assert vector_store.index.full_name == "catalog.schema.existing_index"
+@pytest.mark.unit
+def test_vector_store_model_api_scopes():
+    """Test VectorStoreModel API scopes."""
+    vector_store = VectorStoreModel(
+        index=IndexModel(name="catalog.schema.my_index"),
+    )
+    api_scopes = vector_store.api_scopes
+    assert "vectorsearch.vector-search-endpoints" in api_scopes
+    assert "serving.serving-endpoints" in api_scopes
+    assert "vectorsearch.vector-search-indexes" in api_scopes
+@pytest.mark.unit
+def test_create_vector_store_fails_for_use_existing_mode():
+    """Test that create_vector_store raises clear error for use-existing mode config."""
+    # Create a use-existing mode VectorStoreModel
+    vector_store = VectorStoreModel(
+        index=IndexModel(name="catalog.schema.existing_index"),
+    )
+    # Try to provision (should fail with clear error)
+    provider = DatabricksProvider(w=MagicMock(), vsc=MagicMock())
+    with pytest.raises(ValueError) as exc_info:
+        provider.create_vector_store(vector_store)
+    assert "source_table is required for provisioning" in str(exc_info.value)
+    assert "use existing index" in str(exc_info.value).lower()

{dao_ai-0.1.2 → dao_ai-0.1.3}/tests/dao_ai/test_reranking.py RENAMED

@@ -54,7 +54,11 @@ def create_mock_vector_store() -> Mock:
     vector_store.embedding_model = None
     vector_store.primary_key = "id"
     vector_store.index = Mock()
+    vector_store.index.full_name = "catalog.schema.test_index"
     vector_store.endpoint = Mock()
+    # New optional fields for VectorStoreModel
+    vector_store.source_table = None  # Use existing index mode
+    vector_store.embedding_source_column = None
     add_databricks_resource_attrs(vector_store)
     return vector_store