PyPI - pyobvector - Versions diffs - 0.2.16__tar.gz → 0.2.18__tar.gz - Mend

pyobvector 0.2.16__tar.gz → 0.2.18__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (41)

{pyobvector-0.2.16 → pyobvector-0.2.18}/PKG-INFO RENAMED

@@ -1,7 +1,8 @@
-Metadata-Version: 2.3
+Metadata-Version: 2.4
 Name: pyobvector
-Version: 0.2.16
+Version: 0.2.18
 Summary: A python SDK for OceanBase Vector Store, based on SQLAlchemy, compatible with Milvus API.
+License-File: LICENSE
 Author: shanhaikang.shk
 Author-email: shanhaikang.shk@oceanbase.com
 Requires-Python: >=3.9,<4.0
@@ -11,12 +12,13 @@ Classifier: Programming Language :: Python :: 3.10
 Classifier: Programming Language :: Python :: 3.11
 Classifier: Programming Language :: Python :: 3.12
 Classifier: Programming Language :: Python :: 3.13
-Requires-Dist: aiomysql (>=0.2.0,<0.3.0)
+Classifier: Programming Language :: Python :: 3.14
+Requires-Dist: aiomysql (>=0.3.2,<0.4.0)
 Requires-Dist: numpy (>=1.17.0,<2.0.0)
 Requires-Dist: pydantic (>=2.7.0,<3)
 Requires-Dist: pymysql (>=1.1.1,<2.0.0)
 Requires-Dist: sqlalchemy (>=1.4,<=3)
-Requires-Dist: sqlglot (>=26.0.1,<27.0.0)
+Requires-Dist: sqlglot (>=26.0.1)
 Description-Content-Type: text/markdown
 # pyobvector
@@ -36,7 +38,7 @@ poetry install
 - install with pip:
 ```shell
-pip install pyobvector==0.2.16
+pip install pyobvector==0.2.18
 ```
 ## Build Doc
@@ -48,6 +50,10 @@ mkdir build
 make html
 ```
+## Release Notes
+For detailed release notes and changelog, see [RELEASE_NOTES.md](RELEASE_NOTES.md).
 ## Usage
 `pyobvector` supports two modes:
@@ -174,19 +180,75 @@ client.insert(test_collection_name, data=data1)
 - do ann search:
 ```python
-# perform ann search
+# perform ann search with basic column selection
 res = self.client.ann_search(
     test_collection_name,
     vec_data=[0,0,0],
     vec_column_name='embedding',
     distance_func=l2_distance,
     topk=5,
-    output_column_names=['id']
+    output_column_names=['id']  # Legacy parameter
 )
 # For example, the result will be:
 # [(112,), (111,), (10,), (11,), (12,)]
+# perform ann search with SQLAlchemy expressions (recommended)
+from sqlalchemy import Table, text, func
+table = Table(test_collection_name, client.metadata_obj, autoload_with=client.engine)
+res = self.client.ann_search(
+    test_collection_name,
+    vec_data=[0,0,0],
+    vec_column_name='embedding',
+    distance_func=l2_distance,
+    topk=5,
+    output_columns=[
+        table.c.id,
+        table.c.meta,
+        (table.c.id + 1000).label('id_plus_1000'),
+        text("JSON_EXTRACT(meta, '$.key') as extracted_key")
+    ]
+)
+# For example, the result will be:
+# [(112, '{"key": "value"}', 1112, 'value'), ...]
+# perform ann search with distance threshold (filter results by distance)
+res = self.client.ann_search(
+    test_collection_name,
+    vec_data=[0,0,0],
+    vec_column_name='embedding',
+    distance_func=l2_distance,
+    with_dist=True,
+    topk=10,
+    output_column_names=['id'],
+    distance_threshold=0.5  # Only return results where distance <= 0.5
+)
+# Only returns results with distance <= 0.5
+# For example, the result will be:
+# [(10, 0.0), (11, 0.0), ...]  # Only includes results with distance <= 0.5
 ```
+#### ann_search Parameters
+The `ann_search` method supports flexible output column selection through the `output_columns` parameter:
+- **`output_columns`** (recommended): Accepts SQLAlchemy Column objects, expressions, or a mix of both
+  - Column objects: `table.c.id`, `table.c.name`
+  - Expressions: `(table.c.age + 10).label('age_plus_10')`
+  - JSON queries: `text("JSON_EXTRACT(meta, '$.key') as extracted_key")`
+  - String functions: `func.concat(table.c.name, ' (', table.c.age, ')').label('name_age')`
+- **`output_column_names`** (legacy): Accepts list of column name strings
+  - Example: `['id', 'name', 'meta']`
+- **Parameter Priority**: `output_columns` takes precedence over `output_column_names` when both are provided
+- **`distance_threshold`** (optional): Filter results by distance threshold
+  - Type: `Optional[float]`
+  - Only returns results where `distance <= threshold`
+  - Example: `distance_threshold=0.5` returns only results with distance <= 0.5
+  - Use case: Quality control for similarity search, only return highly similar results
 - If you want to use pure `SQLAlchemy` API with `OceanBase` dialect, you can just get an `SQLAlchemy.engine` via `client.engine`. The engine can also be created as following:
 ```python

{pyobvector-0.2.16 → pyobvector-0.2.18}/README.md RENAMED

@@ -15,7 +15,7 @@ poetry install
 - install with pip:
 ```shell
-pip install pyobvector==0.2.16
+pip install pyobvector==0.2.18
 ```
 ## Build Doc
@@ -27,6 +27,10 @@ mkdir build
 make html
 ```
+## Release Notes
+For detailed release notes and changelog, see [RELEASE_NOTES.md](RELEASE_NOTES.md).
 ## Usage
 `pyobvector` supports two modes:
@@ -153,19 +157,75 @@ client.insert(test_collection_name, data=data1)
 - do ann search:
 ```python
-# perform ann search
+# perform ann search with basic column selection
 res = self.client.ann_search(
     test_collection_name,
     vec_data=[0,0,0],
     vec_column_name='embedding',
     distance_func=l2_distance,
     topk=5,
-    output_column_names=['id']
+    output_column_names=['id']  # Legacy parameter
 )
 # For example, the result will be:
 # [(112,), (111,), (10,), (11,), (12,)]
+# perform ann search with SQLAlchemy expressions (recommended)
+from sqlalchemy import Table, text, func
+table = Table(test_collection_name, client.metadata_obj, autoload_with=client.engine)
+res = self.client.ann_search(
+    test_collection_name,
+    vec_data=[0,0,0],
+    vec_column_name='embedding',
+    distance_func=l2_distance,
+    topk=5,
+    output_columns=[
+        table.c.id,
+        table.c.meta,
+        (table.c.id + 1000).label('id_plus_1000'),
+        text("JSON_EXTRACT(meta, '$.key') as extracted_key")
+    ]
+)
+# For example, the result will be:
+# [(112, '{"key": "value"}', 1112, 'value'), ...]
+# perform ann search with distance threshold (filter results by distance)
+res = self.client.ann_search(
+    test_collection_name,
+    vec_data=[0,0,0],
+    vec_column_name='embedding',
+    distance_func=l2_distance,
+    with_dist=True,
+    topk=10,
+    output_column_names=['id'],
+    distance_threshold=0.5  # Only return results where distance <= 0.5
+)
+# Only returns results with distance <= 0.5
+# For example, the result will be:
+# [(10, 0.0), (11, 0.0), ...]  # Only includes results with distance <= 0.5
 ```
+#### ann_search Parameters
+The `ann_search` method supports flexible output column selection through the `output_columns` parameter:
+- **`output_columns`** (recommended): Accepts SQLAlchemy Column objects, expressions, or a mix of both
+  - Column objects: `table.c.id`, `table.c.name`
+  - Expressions: `(table.c.age + 10).label('age_plus_10')`
+  - JSON queries: `text("JSON_EXTRACT(meta, '$.key') as extracted_key")`
+  - String functions: `func.concat(table.c.name, ' (', table.c.age, ')').label('name_age')`
+- **`output_column_names`** (legacy): Accepts list of column name strings
+  - Example: `['id', 'name', 'meta']`
+- **Parameter Priority**: `output_columns` takes precedence over `output_column_names` when both are provided
+- **`distance_threshold`** (optional): Filter results by distance threshold
+  - Type: `Optional[float]`
+  - Only returns results where `distance <= threshold`
+  - Example: `distance_threshold=0.5` returns only results with distance <= 0.5
+  - Use case: Quality control for similarity search, only return highly similar results
 - If you want to use pure `SQLAlchemy` API with `OceanBase` dialect, you can just get an `SQLAlchemy.engine` via `client.engine`. The engine can also be created as following:
 ```python
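The `distance_threshold` rule documented above (only rows with `distance <= threshold` are returned, and at most `topk` of them) can be illustrated with a brute-force, client-side sketch. This is not how pyobvector or OceanBase implement the filter (the diff does not show that); it assumes `l2_distance` means Euclidean distance, and `ann_search_sketch` is a hypothetical helper working over an in-memory dict of rows.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def ann_search_sketch(
    rows: Dict[int, Sequence[float]],
    query: Sequence[float],
    topk: int,
    distance_threshold: Optional[float] = None,
) -> List[Tuple[int, float]]:
    """Exact nearest-neighbour search returning (id, distance) pairs.

    Hypothetical helper: mimics with_dist=True plus the documented rule
    "only return results where distance <= threshold".
    """
    # Euclidean (L2) distance from the query to every stored vector.
    scored = [(row_id, math.dist(vec, query)) for row_id, vec in rows.items()]
    # Drop rows farther than the threshold, if one was given.
    if distance_threshold is not None:
        scored = [(rid, d) for rid, d in scored if d <= distance_threshold]
    # Closest first; ties broken by id so the output is deterministic.
    scored.sort(key=lambda pair: (pair[1], pair[0]))
    return scored[:topk]


rows = {10: [0, 0, 0], 11: [0, 0, 0], 12: [3, 4, 0], 111: [6, 8, 0]}
print(ann_search_sketch(rows, [0, 0, 0], topk=10, distance_threshold=5.0))
# → [(10, 0.0), (11, 0.0), (12, 5.0)]
```

Because results are ordered closest-first, the rows that pass the threshold always form a prefix of the sorted list, so applying the threshold before or after the `topk` cut yields the same rows.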

{pyobvector-0.2.16 → pyobvector-0.2.18}/pyobvector/__init__.py RENAMED

@@ -14,6 +14,7 @@ In this mode, you can regard `pyobvector` as an extension of SQLAlchemy.
 * IndexParams           A list of IndexParam to create vector index in batch
 * DataType              Specify field type in collection schema for MilvusLikeClient
 * VECTOR                An extended data type in SQLAlchemy for ObVecClient
+* SPARSE_VECTOR         An extended data type in SQLAlchemy for ObVecClient
 * VectorIndex           An extended index type in SQLAlchemy for ObVecClient
 * FtsIndex              Full Text Search Index
 * FieldSchema           Clas to define field schema in collection for MilvusLikeClient
@@ -43,6 +44,7 @@ from .client import *
 from .schema import (
     ARRAY,
     VECTOR,
+    SPARSE_VECTOR,
     POINT,
     VectorIndex,
     OceanBaseDialect,
@@ -70,6 +72,7 @@ __all__ = [
     "DataType",
     "ARRAY",
     "VECTOR",
+    "SPARSE_VECTOR",
     "POINT",
     "VectorIndex",
     "FtsIndex",
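`SPARSE_VECTOR` is the new export here, and `index_param.py` in this same diff accepts only `inner_product` for the sparse (`daat`) index. As a hedged illustration of that metric on sparse data, the sketch below represents a sparse vector as a `{dimension: value}` dict (an assumption; the diff does not show how pyobvector serializes `SPARSE_VECTOR` values) and sums products only over dimensions present in both vectors.

```python
from typing import Dict

# Assumed in-memory representation: only nonzero dimensions are stored.
SparseVec = Dict[int, float]


def sparse_inner_product(a: SparseVec, b: SparseVec) -> float:
    """Inner product of two sparse vectors stored as {dimension: value} dicts."""
    # Iterate over the smaller dict; a dimension missing on either side contributes 0.
    if len(a) > len(b):
        a, b = b, a
    return sum(value * b[dim] for dim, value in a.items() if dim in b)


query = {1: 0.5, 7: 2.0}
doc = {1: 4.0, 3: 1.0, 7: 0.25}
print(sparse_inner_product(query, doc))  # → 2.5
```

The cost depends on the number of nonzero entries rather than the nominal dimensionality, which is the usual reason sparse vectors are scored this way.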

{pyobvector-0.2.16 → pyobvector-0.2.18}/pyobvector/client/collection_schema.py RENAMED

@@ -79,14 +79,14 @@ class FieldSchema:
             if "max_length" not in self.kwargs:
                 raise VarcharFieldParamException(
                     code=ErrorCode.INVALID_ARGUMENT,
-                    message=ExceptionsMessage.VarcharFieldMissinglengthParam,
+                    message=ExceptionsMessage.VarcharFieldMissingLengthParam,
                 )
             self.type_params["length"] = self.kwargs["max_length"]
         elif self.dtype == DataType.ARRAY:
             if "element_type" not in self.kwargs:
                 raise ArrayFieldParamException(
                     code=ErrorCode.INVALID_ARGUMENT,
-                    message=ExceptionsMessage.ArrayFiledMissingElementType,
+                    message=ExceptionsMessage.ArrayFieldMissingElementType,
                 )
             if self.kwargs["element_type"] in (
                 DataType.ARRAY,
@@ -95,7 +95,7 @@ class FieldSchema:
             ):
                 raise ArrayFieldParamException(
                     code=ErrorCode.INVALID_ARGUMENT,
-                    message=ExceptionsMessage.ArrayFiledInvalidElementType,
+                    message=ExceptionsMessage.ArrayFieldInvalidElementType,
                 )
             self.type_params["item_type"] = convert_datatype_to_sqltype(
@@ -147,9 +147,9 @@ class CollectionSchema:
         """Add field to collection.
         Args:
-        :param field_name (string) : new field name
-        :param datatype (DataType) : field data type
-        :param kwargs : parameters for data type
+            field_name (string): new field name
+            datatype (DataType): field data type
+            **kwargs: parameters for data type
         """
         field = FieldSchema(field_name, datatype, **kwargs)
         cur_idx = len(self.fields)
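The `FieldSchema` checks in this hunk now reference the corrected message names. A minimal standalone sketch of the same two rules follows; the `DataType` members, the `validate_field` helper, and the use of `ValueError` are hypothetical stand-ins, and the disallowed element types (array/vector/varchar) are taken from the `ArrayFieldInvalidElementType` message because the full tuple is cut off in this diff.

```python
from enum import Enum
from typing import Any, Dict


class DataType(Enum):
    # Hypothetical stand-in for pyobvector's DataType.
    INT64 = "int64"
    VARCHAR = "varchar"
    FLOAT_VECTOR = "float_vector"
    ARRAY = "array"


# Message texts copied from ExceptionsMessage in 0.2.18.
VARCHAR_FIELD_MISSING_LENGTH_PARAM = "Param 'max_length' must be set for varchar field."
ARRAY_FIELD_MISSING_ELEMENT_TYPE = "Param 'element_type' must be set for array field."
ARRAY_FIELD_INVALID_ELEMENT_TYPE = "Param 'element_type' can not be array/vector/varchar."


def validate_field(dtype: DataType, **kwargs: Any) -> Dict[str, Any]:
    """Return type parameters for a field, raising ValueError on missing or invalid params."""
    type_params: Dict[str, Any] = {}
    if dtype == DataType.VARCHAR:
        if "max_length" not in kwargs:
            raise ValueError(VARCHAR_FIELD_MISSING_LENGTH_PARAM)
        type_params["length"] = kwargs["max_length"]
    elif dtype == DataType.ARRAY:
        if "element_type" not in kwargs:
            raise ValueError(ARRAY_FIELD_MISSING_ELEMENT_TYPE)
        if kwargs["element_type"] in (DataType.ARRAY, DataType.FLOAT_VECTOR, DataType.VARCHAR):
            raise ValueError(ARRAY_FIELD_INVALID_ELEMENT_TYPE)
        # The real code converts this to a SQLAlchemy type; the sketch keeps the enum.
        type_params["item_type"] = kwargs["element_type"]
    return type_params


print(validate_field(DataType.VARCHAR, max_length=64))  # → {'length': 64}
```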

{pyobvector-0.2.16 → pyobvector-0.2.18}/pyobvector/client/exceptions.py RENAMED

@@ -101,9 +101,9 @@ class ExceptionsMessage:
     )
     PrimaryFieldType = "Param primary_field must be int or str type."
     VectorFieldMissingDimParam = "Param 'dim' must be set for vector field."
-    VarcharFieldMissinglengthParam = "Param 'max_length' must be set for varchar field."
-    ArrayFiledMissingElementType = "Param 'element_type' must be set for array field."
-    ArrayFiledInvalidElementType = (
+    VarcharFieldMissingLengthParam = "Param 'max_length' must be set for varchar field."
+    ArrayFieldMissingElementType = "Param 'element_type' must be set for array field."
+    ArrayFieldInvalidElementType = (
         "Param 'element_type' can not be array/vector/varchar."
     )
     CollectionNotExists = "Collection does not exist."
@@ -111,5 +111,5 @@ class ExceptionsMessage:
     MetricTypeValueInvalid = "MetricType should be 'l2'/'ip'/'neg_ip'/'cosine' in ann search."
     UsingInIDsWhenMultiPrimaryKey = "Using 'ids' when table has multi primary key."
     ClusterVersionIsLow = (
-        "OceanBase Vector Store is not supported because cluster version is below 4.3.3.0."
+        "OceanBase %s feature is not supported because cluster version is below %s."
     )
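`ClusterVersionIsLow` is now a `%`-style template with two placeholders (feature name, minimum version) rather than a fixed sentence, and `hybrid_search.py` fills it with `% ("Hybrid Search", "4.4.1.0")`. The sketch below reproduces that formatting with the string copied from this diff; a caller that used the attribute without `%` would instead surface the raw `%s` placeholders.

```python
# Template copied from ExceptionsMessage.ClusterVersionIsLow in 0.2.18.
CLUSTER_VERSION_IS_LOW = (
    "OceanBase %s feature is not supported because cluster version is below %s."
)

# hybrid_search.py passes the feature name and the minimum version as a tuple.
message = CLUSTER_VERSION_IS_LOW % ("Hybrid Search", "4.4.1.0")
print(message)
# → OceanBase Hybrid Search feature is not supported because cluster version is below 4.4.1.0.
```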

{pyobvector-0.2.16 → pyobvector-0.2.18}/pyobvector/client/fts_index_param.py RENAMED

@@ -18,13 +18,12 @@ class FtsIndexParam:
         self.field_names = field_names
         self.parser_type = parser_type
-    def param_str(self) -> str:
-        if self.parser_type is None:
-            return None
+    def param_str(self) -> str | None:
         if self.parser_type == FtsParser.IK:
             return "ik"
         if self.parser_type == FtsParser.NGRAM:
             return "ngram"
+        return None
     def __iter__(self):
         yield "index_name", self.index_name
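The `param_str` refactor replaces the early `None` check with a single trailing `return None`; the result is unchanged for every `parser_type`. One thing worth checking: a `str | None` annotation is evaluated when the function is defined, which raises `TypeError` on Python 3.9 (still allowed by `Requires-Python: >=3.9`) unless the module begins with `from __future__ import annotations`, which this hunk does not show. The sketch below uses `Optional[str]`, which works on 3.9; `fts_param_str` is a hypothetical free-function version, and the `FtsParser` member values are assumptions (only the names `IK` and `NGRAM` appear in the diff).

```python
from enum import Enum
from typing import Optional


class FtsParser(Enum):
    # Hypothetical values; only the member names IK and NGRAM appear in the diff.
    IK = 0
    NGRAM = 1


def fts_param_str(parser_type: Optional[FtsParser]) -> Optional[str]:
    """Map a full-text parser to its index parameter string, or None if unset/unknown."""
    if parser_type == FtsParser.IK:
        return "ik"
    if parser_type == FtsParser.NGRAM:
        return "ngram"
    # Also covers parser_type=None, which 0.2.16 handled with a separate early return.
    return None


print(fts_param_str(FtsParser.IK), fts_param_str(FtsParser.NGRAM), fts_param_str(None))
# → ik ngram None
```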

pyobvector-0.2.18/pyobvector/client/hybrid_search.py ADDED

@@ -0,0 +1,81 @@
+"""OceanBase Hybrid Search Client."""
+import json
+import logging
+from typing import Dict, Any
+from sqlalchemy import text
+from .exceptions import ClusterVersionException, ErrorCode, ExceptionsMessage
+from .ob_vec_client import ObVecClient as Client
+from ..util import ObVersion
+logger = logging.getLogger(__name__)
+logger.setLevel(logging.DEBUG)
+class HybridSearch(Client):
+    """The OceanBase Hybrid Search Client"""
+    def __init__(
+        self,
+        uri: str = "127.0.0.1:2881",
+        user: str = "root@test",
+        password: str = "",
+        db_name: str = "test",
+        **kwargs,
+    ):
+        super().__init__(uri, user, password, db_name, **kwargs)
+        if self.ob_version < ObVersion.from_db_version_nums(4, 4, 1, 0):
+            raise ClusterVersionException(
+                code=ErrorCode.NOT_SUPPORTED,
+                message=ExceptionsMessage.ClusterVersionIsLow % ("Hybrid Search", "4.4.1.0"),
+            )
+    def search(
+        self,
+        index: str,
+        body: Dict[str, Any],
+        **kwargs,
+    ):
+        """Execute hybrid search with parameter compatible with Elasticsearch.
+        Args:
+            index: The name of the table to search
+            body: The search query body
+            **kwargs: Additional search parameters
+        Returns:
+            Search results
+        """
+        body_str = json.dumps(body)
+        sql = text("SELECT DBMS_HYBRID_SEARCH.SEARCH(:index, :body_str)")
+        with self.engine.connect() as conn:
+            with conn.begin():
+                res = conn.execute(sql, {"index": index, "body_str": body_str}).fetchone()
+                return json.loads(res[0])
+    def get_sql(
+        self,
+        index: str,
+        body: Dict[str, Any],
+    ) -> str:
+        """Get the SQL actually to be executed in hybrid search.
+        Args:
+            index: The name of the table to search
+            body: The hybrid search query body
+        Returns:
+            The SQL actually to be executed
+        """
+        body_str = json.dumps(body)
+        sql = text("SELECT DBMS_HYBRID_SEARCH.GET_SQL(:index, :body_str)")
+        with self.engine.connect() as conn:
+            with conn.begin():
+                res = conn.execute(sql, {"index": index, "body_str": body_str}).fetchone()
+                return res[0]
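`HybridSearch.__init__` refuses to run against clusters older than 4.4.1.0. The sketch below shows an equivalent gate using plain tuples compared element by element; treating `ObVersion` as tuple-like is an assumption, and `parse_ob_version` and `supports_hybrid_search` are hypothetical helpers, not pyobvector API.

```python
from typing import Tuple

Version = Tuple[int, int, int, int]

# Minimum cluster version that HybridSearch checks for in this diff.
HYBRID_SEARCH_MIN_VERSION: Version = (4, 4, 1, 0)


def parse_ob_version(version: str) -> Version:
    """Parse a dotted four-part version string such as '4.3.5.1' (hypothetical helper)."""
    # Unpacking raises ValueError unless there are exactly four components.
    major, minor, patch, build = (int(p) for p in version.split("."))
    return (major, minor, patch, build)


def supports_hybrid_search(version: str) -> bool:
    # Tuples compare element by element, so (4, 3, 5, 1) < (4, 4, 1, 0).
    return parse_ob_version(version) >= HYBRID_SEARCH_MIN_VERSION


print(supports_hybrid_search("4.3.5.1"), supports_hybrid_search("4.4.1.0"))
# → False True
```

Comparing integer tuples rather than raw strings matters once a component reaches two digits: as strings, `"4.10.0.0" < "4.4.1.0"`, which would wrongly reject a newer cluster.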

{pyobvector-0.2.16 → pyobvector-0.2.18}/pyobvector/client/index_param.py RENAMED

@@ -9,7 +9,7 @@ class VecIndexType(Enum):
     IVFFLAT = 2
     IVFSQ = 3
     IVFPQ = 4
+    DAAT = 5
 class IndexParam:
     """Vector index parameters.
@@ -31,6 +31,7 @@ class IndexParam:
     IVFFLAT_ALGO_NAME = "ivf_flat"
     IVFSQ_ALGO_NAME = "ivf_sq8"
     IVFPQ_ALGO_NAME = "ivf_pq"
+    DAAT_ALGO_NAME = "daat"
     def __init__(
         self, index_name: str, field_name: str, index_type: Union[VecIndexType, str], **kwargs
@@ -57,6 +58,11 @@ class IndexParam:
         return self.index_type in [
             IndexParam.IVFPQ_ALGO_NAME,
         ]
+    def is_index_type_sparse_vector(self):
+        return self.index_type in [
+            IndexParam.DAAT_ALGO_NAME,
+        ]
     def _get_vector_index_type_str(self):
         """Parse vector index type to string."""
@@ -71,6 +77,8 @@ class IndexParam:
                 return IndexParam.IVFSQ_ALGO_NAME
             elif self.index_type == VecIndexType.IVFPQ:
                 return IndexParam.IVFPQ_ALGO_NAME
+            elif self.index_type == VecIndexType.DAAT:
+                return IndexParam.DAAT_ALGO_NAME
             raise ValueError(f"unsupported vector index type: {self.index_type}")
         assert isinstance(self.index_type, str)
         index_type = self.index_type.lower()
@@ -80,6 +88,7 @@ class IndexParam:
             IndexParam.IVFFLAT_ALGO_NAME,
             IndexParam.IVFSQ_ALGO_NAME,
             IndexParam.IVFPQ_ALGO_NAME,
+            IndexParam.DAAT_ALGO_NAME,
         ]:
             raise ValueError(f"unsupported vector index type: {self.index_type}")
         return index_type
@@ -124,15 +133,19 @@ class IndexParam:
                     ob_params['ef_construction'] = params['efConstruction']
                 if 'efSearch' in params:
                     ob_params['ef_search'] = params['efSearch']
+        if self.is_index_type_sparse_vector() and ob_params['distance'] != 'inner_product':
+            raise ValueError("Metric type should be 'inner_product' for sparse vector index.")
         return ob_params
     def param_str(self):
         """Parse vector index parameters to string."""
         ob_param = self._parse_kwargs()
         partial_str = ",".join([f"{k}={v}" for k, v in ob_param.items()])
-        if len(partial_str) > 0:
-            partial_str += ","
-        partial_str += f"type={self.index_type}"
+        if not self.is_index_type_sparse_vector():
+            if len(partial_str) > 0:
+                partial_str += ","
+            partial_str += f"type={self.index_type}"
         return partial_str
     def __iter__(self):
@@ -165,10 +178,10 @@ class IndexParams:
         """Add `IndexParam` to `IndexParams`
         Args:
-        :param field_name (string) : vector index built on which field
-        :param index_type (VecIndexType) :
-                vector index algorithms (Only HNSW supported)
-        :param index_name (string) : vector index name
+            field_name (string): vector index built on which field
+            index_type (VecIndexType): vector index algorithms (Only HNSW supported)
+            index_name (string): vector index name
+            **kwargs: additional parameters for different index types
         """
         index_param = IndexParam(index_name, field_name, index_type, **kwargs)
         pair_key = (field_name, index_name)
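With the new `daat` index type, `param_str` omits the trailing `type=` entry, and `_parse_kwargs` rejects a sparse index whose `distance` is not `inner_product`. A standalone sketch of that string-building logic follows; `build_index_param_str` is a hypothetical helper that takes an already-parsed params dict (the real mapping from Milvus-style arguments in `_parse_kwargs` is only partly shown here), and the `m` and `ef_construction` values in the example are illustrative.

```python
from typing import Dict, Union

SPARSE_INDEX_TYPES = ("daat",)  # mirrors IndexParam.is_index_type_sparse_vector()


def build_index_param_str(index_type: str, ob_params: Dict[str, Union[str, int]]) -> str:
    """Render parsed index params as 'k=v,...', appending type= for dense indexes only."""
    is_sparse = index_type in SPARSE_INDEX_TYPES
    # Sparse (DAAT) indexes only accept the inner-product metric.
    # (The original indexes ob_params['distance'] directly; .get() avoids a KeyError here.)
    if is_sparse and ob_params.get("distance") != "inner_product":
        raise ValueError("Metric type should be 'inner_product' for sparse vector index.")
    partial_str = ",".join(f"{k}={v}" for k, v in ob_params.items())
    if not is_sparse:
        if partial_str:
            partial_str += ","
        partial_str += f"type={index_type}"
    return partial_str


print(build_index_param_str("hnsw", {"distance": "l2", "m": 16, "ef_construction": 200}))
# → distance=l2,m=16,ef_construction=200,type=hnsw
print(build_index_param_str("daat", {"distance": "inner_product"}))
# → distance=inner_product
```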
