PyPI - databricks-sql-connector - Versions diffs - 4.0.0b3__tar.gz → 4.0.1__tar.gz - Mend

databricks-sql-connector 4.0.0b3.tar.gz → 4.0.1.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (35)

{databricks_sql_connector-4.0.0b3 → databricks_sql_connector-4.0.1}/CHANGELOG.md RENAMED

@@ -1,5 +1,39 @@
 # Release History
+# 4.0.1 (2025-03-19)
+- Support for multiple timestamp formats parsing (databricks/databricks-sql-python#533 by @jprakash-db)
+- Rename `_user_agent_entry` in connect call to `user_agent_entry` to expose it as a public parameter. (databricks/databricks-sql-python#530 by @shivam2680)
+- Fix: compatibility with urllib3 versions less than 2.x. (databricks/databricks-sql-python#526 by @shivam2680)
+- Support for Python 3.13 and updated dependencies (databricks/databricks-sql-python#510 by @dhirschfeld and @dbaxa)
+# 4.0.0 (2025-01-19)
+- Split the connector into two separate packages: `databricks-sql-connector` and `databricks-sqlalchemy`. The `databricks-sql-connector` package contains the core functionality of the connector, while the `databricks-sqlalchemy` package contains the SQLAlchemy dialect for the connector.
+- Pyarrow dependency is now optional in `databricks-sql-connector`. Users needing arrow are supposed to explicitly install pyarrow
+# 3.7.3 (2025-03-28)
+- Fix: Unable to poll small results in execute_async function (databricks/databricks-sql-python#515 by @jprakash-db)
+- Updated log messages to show the status code and error messages of requests (databricks/databricks-sql-python#511 by @jprakash-db)
+- Fix: Incorrect metadata was fetched in case of queries with the same alias (databricks/databricks-sql-python#505 by @jprakash-db)
+# 3.7.2 (2025-01-31)
+- Updated the retry_dela_max and retry_timeout (databricks/databricks-sql-python#497 by @jprakash-db)
+# 3.7.1 (2025-01-07)
+- Relaxed the number of Http retry attempts (databricks/databricks-sql-python#486 by @jprakash-db)
+# 3.7.0 (2024-12-23)
+- Fix: Incorrect number of rows fetched in inline results when fetching results with FETCH_NEXT orientation (databricks/databricks-sql-python#479 by @jprakash-db)
+- Updated the doc to specify native parameters are not supported in PUT operation (databricks/databricks-sql-python#477 by @jprakash-db)
+- Relax `pyarrow` and `numpy` pin (databricks/databricks-sql-python#452 by @arredond)
+- Feature: Support for async execute has been added (databricks/databricks-sql-python#463 by @jprakash-db)
+- Updated the HTTP retry logic to be similar to the other Databricks drivers (databricks/databricks-sql-python#467 by @jprakash-db)
 # 3.6.0 (2024-10-25)
 - Support encryption headers in the cloud fetch request (https://github.com/databricks/databricks-sql-python/pull/460 by @jackyhu-db)
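
The 4.0.0 entry above makes PyArrow an optional dependency, so calling code may want to check for it before relying on Arrow-specific fetch methods. The following is a minimal sketch using only the standard library. The helper names `module_available` and `choose_fetch_method` are hypothetical and not part of the connector.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if a top-level module can be found in this environment."""
    return importlib.util.find_spec(name) is not None


def choose_fetch_method(pyarrow_installed: bool) -> str:
    """Pick the Arrow fetch only when PyArrow is present (hypothetical helper)."""
    return "fetchmany_arrow" if pyarrow_installed else "fetchmany"


if __name__ == "__main__":
    # Prints "fetchmany_arrow" only when pyarrow is importable
    print(choose_fetch_method(module_available("pyarrow")))
```

Checking with `find_spec` avoids importing PyArrow just to learn whether it is installed.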

{databricks_sql_connector-4.0.0b3 → databricks_sql_connector-4.0.1}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: databricks-sql-connector
-Version: 4.0.0b3
+Version: 4.0.1
 Summary: Databricks SQL Connector for Python
 License: Apache-2.0
 Author: Databricks
@@ -13,18 +13,15 @@ Classifier: Programming Language :: Python :: 3.9
 Classifier: Programming Language :: Python :: 3.10
 Classifier: Programming Language :: Python :: 3.11
 Classifier: Programming Language :: Python :: 3.12
-Provides-Extra: alembic
-Provides-Extra: databricks-sqlalchemy
 Provides-Extra: pyarrow
-Requires-Dist: alembic (>=1.0.11,<2.0.0) ; extra == "alembic"
-Requires-Dist: databricks-sqlalchemy (>=2.0.0) ; extra == "databricks-sqlalchemy" or extra == "alembic"
 Requires-Dist: lz4 (>=4.0.2,<5.0.0)
-Requires-Dist: numpy (>=1.16.6,<2.0.0) ; python_version >= "3.8" and python_version < "3.11"
-Requires-Dist: numpy (>=1.23.4,<2.0.0) ; python_version >= "3.11"
 Requires-Dist: oauthlib (>=3.1.0,<4.0.0)
 Requires-Dist: openpyxl (>=3.0.10,<4.0.0)
-Requires-Dist: pandas (>=1.2.5,<2.3.0) ; python_version >= "3.8"
-Requires-Dist: pyarrow (>=14.0.1,<17) ; extra == "pyarrow"
+Requires-Dist: pandas (>=1.2.5,<2.3.0) ; python_version >= "3.8" and python_version < "3.13"
+Requires-Dist: pandas (>=2.2.3,<2.3.0) ; python_version >= "3.13"
+Requires-Dist: pyarrow (>=14.0.1) ; (python_version >= "3.8" and python_version < "3.13") and (extra == "pyarrow")
+Requires-Dist: pyarrow (>=18.0.0) ; (python_version >= "3.13") and (extra == "pyarrow")
+Requires-Dist: python-dateutil (>=2.9.0,<3.0.0)
 Requires-Dist: requests (>=2.18.1,<3.0.0)
 Requires-Dist: thrift (>=0.16.0,<0.21.0)
 Requires-Dist: urllib3 (>=1.26)
@@ -37,9 +34,9 @@ Description-Content-Type: text/markdown
 [![PyPI](https://img.shields.io/pypi/v/databricks-sql-connector?style=flat-square)](https://pypi.org/project/databricks-sql-connector/)
 [![Downloads](https://pepy.tech/badge/databricks-sql-connector)](https://pepy.tech/project/databricks-sql-connector)
-The Databricks SQL Connector for Python allows you to develop Python applications that connect to Databricks clusters and SQL warehouses. It is a Thrift-based client with no dependencies on ODBC or JDBC. It conforms to the [Python DB API 2.0 specification](https://www.python.org/dev/peps/pep-0249/) and exposes a [SQLAlchemy](https://www.sqlalchemy.org/) dialect for use with tools like `pandas` and `alembic` which use SQLAlchemy to execute DDL. Use `pip install databricks-sql-connector[sqlalchemy]` to install with SQLAlchemy's dependencies. `pip install databricks-sql-connector[alembic]` will install alembic's dependencies.
+The Databricks SQL Connector for Python allows you to develop Python applications that connect to Databricks clusters and SQL warehouses. It is a Thrift-based client with no dependencies on ODBC or JDBC. It conforms to the [Python DB API 2.0 specification](https://www.python.org/dev/peps/pep-0249/).
-This connector uses Arrow as the data-exchange format, and supports APIs to directly fetch Arrow tables. Arrow tables are wrapped in the `ArrowQueue` class to provide a natural API to get several rows at a time.
+This connector uses Arrow as the data-exchange format, and supports APIs (e.g. `fetchmany_arrow`) to directly fetch Arrow tables. Arrow tables are wrapped in the `ArrowQueue` class to provide a natural API to get several rows at a time. [PyArrow](https://arrow.apache.org/docs/python/index.html) is required to enable this and use these APIs, you can install it via  `pip install pyarrow` or `pip install databricks-sql-connector[pyarrow]`.
 You are welcome to file an issue here for general use cases. You can also contact Databricks Support [here](help.databricks.com).
@@ -56,7 +53,12 @@ For the latest documentation, see
 ## Quickstart
-Install the library with `pip install databricks-sql-connector`
+### Installing the core library
+Install using `pip install databricks-sql-connector`
+### Installing the core library with PyArrow
+Install using `pip install databricks-sql-connector[pyarrow]`
 ```bash
 export DATABRICKS_HOST=********.databricks.com
@@ -94,6 +96,18 @@ or to a Databricks Runtime interactive cluster (e.g. /sql/protocolv1/o/123456789
 > to authenticate the target Databricks user account and needs to open the browser for authentication. So it
 > can only run on the user's machine.
+## SQLAlchemy
+Starting from `databricks-sql-connector` version 4.0.0 SQLAlchemy support has been extracted to a new library `databricks-sqlalchemy`.
+- Github repository [databricks-sqlalchemy github](https://github.com/databricks/databricks-sqlalchemy)
+- PyPI [databricks-sqlalchemy pypi](https://pypi.org/project/databricks-sqlalchemy/)
+### Quick SQLAlchemy guide
+Users can now choose between using the SQLAlchemy v1 or SQLAlchemy v2 dialects with the connector core
+- Install the latest SQLAlchemy v1 using `pip install databricks-sqlalchemy~=1.0`
+- Install SQLAlchemy v2 using `pip install databricks-sqlalchemy`
 ## Contributing

{databricks_sql_connector-4.0.0b3 → databricks_sql_connector-4.0.1}/README.md RENAMED

@@ -3,9 +3,9 @@
 [![PyPI](https://img.shields.io/pypi/v/databricks-sql-connector?style=flat-square)](https://pypi.org/project/databricks-sql-connector/)
 [![Downloads](https://pepy.tech/badge/databricks-sql-connector)](https://pepy.tech/project/databricks-sql-connector)
-The Databricks SQL Connector for Python allows you to develop Python applications that connect to Databricks clusters and SQL warehouses. It is a Thrift-based client with no dependencies on ODBC or JDBC. It conforms to the [Python DB API 2.0 specification](https://www.python.org/dev/peps/pep-0249/) and exposes a [SQLAlchemy](https://www.sqlalchemy.org/) dialect for use with tools like `pandas` and `alembic` which use SQLAlchemy to execute DDL. Use `pip install databricks-sql-connector[sqlalchemy]` to install with SQLAlchemy's dependencies. `pip install databricks-sql-connector[alembic]` will install alembic's dependencies.
+The Databricks SQL Connector for Python allows you to develop Python applications that connect to Databricks clusters and SQL warehouses. It is a Thrift-based client with no dependencies on ODBC or JDBC. It conforms to the [Python DB API 2.0 specification](https://www.python.org/dev/peps/pep-0249/).
-This connector uses Arrow as the data-exchange format, and supports APIs to directly fetch Arrow tables. Arrow tables are wrapped in the `ArrowQueue` class to provide a natural API to get several rows at a time.
+This connector uses Arrow as the data-exchange format, and supports APIs (e.g. `fetchmany_arrow`) to directly fetch Arrow tables. Arrow tables are wrapped in the `ArrowQueue` class to provide a natural API to get several rows at a time. [PyArrow](https://arrow.apache.org/docs/python/index.html) is required to enable this and use these APIs, you can install it via  `pip install pyarrow` or `pip install databricks-sql-connector[pyarrow]`.
 You are welcome to file an issue here for general use cases. You can also contact Databricks Support [here](help.databricks.com).
@@ -22,7 +22,12 @@ For the latest documentation, see
 ## Quickstart
-Install the library with `pip install databricks-sql-connector`
+### Installing the core library
+Install using `pip install databricks-sql-connector`
+### Installing the core library with PyArrow
+Install using `pip install databricks-sql-connector[pyarrow]`
 ```bash
 export DATABRICKS_HOST=********.databricks.com
@@ -60,6 +65,18 @@ or to a Databricks Runtime interactive cluster (e.g. /sql/protocolv1/o/123456789
 > to authenticate the target Databricks user account and needs to open the browser for authentication. So it
 > can only run on the user's machine.
+## SQLAlchemy
+Starting from `databricks-sql-connector` version 4.0.0 SQLAlchemy support has been extracted to a new library `databricks-sqlalchemy`.
+- Github repository [databricks-sqlalchemy github](https://github.com/databricks/databricks-sqlalchemy)
+- PyPI [databricks-sqlalchemy pypi](https://pypi.org/project/databricks-sqlalchemy/)
+### Quick SQLAlchemy guide
+Users can now choose between using the SQLAlchemy v1 or SQLAlchemy v2 dialects with the connector core
+- Install the latest SQLAlchemy v1 using `pip install databricks-sqlalchemy~=1.0`
+- Install SQLAlchemy v2 using `pip install databricks-sqlalchemy`
 ## Contributing

{databricks_sql_connector-4.0.0b3 → databricks_sql_connector-4.0.1}/pyproject.toml RENAMED

@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "databricks-sql-connector"
-version = "4.0.0.b3"
+version = "4.0.1"
 description = "Databricks SQL Connector for Python"
 authors = ["Databricks <databricks-sql-connector-maintainers@databricks.com>"]
 license = "Apache-2.0"
@@ -12,26 +12,21 @@ include = ["CHANGELOG.md"]
 python = "^3.8.0"
 thrift = ">=0.16.0,<0.21.0"
 pandas = [
-    { version = ">=1.2.5,<2.3.0", python = ">=3.8" }
+    { version = ">=1.2.5,<2.3.0", python = ">=3.8,<3.13" },
+    { version = ">=2.2.3,<2.3.0", python = ">=3.13" }
 ]
 lz4 = "^4.0.2"
 requests = "^2.18.1"
 oauthlib = "^3.1.0"
-numpy = [
-    { version = "^1.16.6", python = ">=3.8,<3.11" },
-    { version = "^1.23.4", python = ">=3.11" },
-]
 openpyxl = "^3.0.10"
 urllib3 = ">=1.26"
-databricks-sqlalchemy = { version = ">=2.0.0", optional = true }
-pyarrow = { version = ">=14.0.1,<17", optional=true }
-alembic = { version = "^1.0.11", optional = true }
+pyarrow = [
+    { version = ">=14.0.1", python = ">=3.8,<3.13", optional=true },
+    { version = ">=18.0.0", python = ">=3.13", optional=true }
+]
+python-dateutil = "^2.9.0"
 [tool.poetry.extras]
-databricks-sqlalchemy = ["databricks-sqlalchemy"]
-alembic = ["databricks-sqlalchemy", "alembic"]
 pyarrow = ["pyarrow"]
 [tool.poetry.dev-dependencies]
@@ -40,14 +35,15 @@ mypy = "^1.10.1"
 pylint = ">=2.12.0"
 black = "^22.3.0"
 pytest-dotenv = "^0.5.2"
+numpy = [
+    { version = ">=1.16.6", python = ">=3.8,<3.11" },
+    { version = ">=1.23.4", python = ">=3.11" },
+]
 [tool.poetry.urls]
 "Homepage" = "https://github.com/databricks/databricks-sql-python"
 "Bug Tracker" = "https://github.com/databricks/databricks-sql-python/issues"
-[tool.poetry.plugins."sqlalchemy.dialects"]
-"databricks" = "databricks.sqlalchemy:DatabricksDialect"
 [build-system]
 requires = ["poetry-core>=1.0.0"]
 build-backend = "poetry.core.masonry.api"
@@ -64,5 +60,5 @@ markers = {"reviewed" = "Test case has been reviewed by Databricks"}
 minversion = "6.0"
 log_cli = "false"
 log_cli_level = "INFO"
-testpaths = ["tests", "src/databricks/sqlalchemy/test_local"]
-env_files = ["test.env"]
+testpaths = ["tests"]
+env_files = ["test.env"]

{databricks_sql_connector-4.0.0b3 → databricks_sql_connector-4.0.1}/src/databricks/sql/__init__.py RENAMED

@@ -68,7 +68,7 @@ DATETIME = DBAPITypeObject("timestamp")
 DATE = DBAPITypeObject("date")
 ROWID = DBAPITypeObject()
-__version__ = "3.6.0"
+__version__ = "4.0.1"
 USER_AGENT_NAME = "PyDatabricksSqlConnector"
 # These two functions are pyhive legacy

{databricks_sql_connector-4.0.0b3 → databricks_sql_connector-4.0.1}/src/databricks/sql/auth/retry.py RENAMED

@@ -1,9 +1,12 @@
 import logging
+import random
 import time
 import typing
 from enum import Enum
 from typing import List, Optional, Tuple, Union
+import urllib3
 # We only use this import for type hinting
 try:
     # If urllib3~=2.0 is installed
@@ -13,6 +16,8 @@ except ImportError:
     from urllib3 import HTTPResponse as BaseHTTPResponse
 from urllib3 import Retry
 from urllib3.util.retry import RequestHistory
+from packaging import version
 from databricks.sql.exc import (
     CursorAlreadyClosedError,
@@ -285,25 +290,32 @@ class DatabricksRetryPolicy(Retry):
         """
         retry_after = self.get_retry_after(response)
         if retry_after:
-            backoff = self.get_backoff_time()
-            proposed_wait = max(backoff, retry_after)
-            self.check_proposed_wait(proposed_wait)
-            time.sleep(proposed_wait)
-            return True
+            proposed_wait = retry_after
+        else:
+            proposed_wait = self.get_backoff_time()
-        return False
+        proposed_wait = max(proposed_wait, self.delay_max)
+        self.check_proposed_wait(proposed_wait)
+        logger.debug(f"Retrying after {proposed_wait} seconds")
+        time.sleep(proposed_wait)
+        return True
     def get_backoff_time(self) -> float:
-        """Calls urllib3's built-in get_backoff_time.
+        """
+        This method implements the exponential backoff algorithm to calculate the delay between retries.
         Never returns a value larger than self.delay_max
         A MaxRetryDurationError will be raised if the calculated backoff would exceed self.max_attempts_duration
-        Note: within urllib3, a backoff is only calculated in cases where a Retry-After header is not present
-            in the previous unsuccessful request and `self.respect_retry_after_header` is True (which is always true)
+        :return:
         """
-        proposed_backoff = super().get_backoff_time()
+        current_attempt = self.stop_after_attempts_count - int(self.total or 0)
+        proposed_backoff = (2**current_attempt) * self.delay_min
+        if version.parse(urllib3.__version__) >= version.parse("2.0.0"):
+            if self.backoff_jitter != 0.0:
+                proposed_backoff += random.random() * self.backoff_jitter
         proposed_backoff = min(proposed_backoff, self.delay_max)
         self.check_proposed_wait(proposed_backoff)
@@ -338,23 +350,24 @@ class DatabricksRetryPolicy(Retry):
         if a retry would violate the configured policy.
         """
+        logger.info(f"Received status code {status_code} for {method} request")
         # Request succeeded. Don't retry.
         if status_code == 200:
             return False, "200 codes are not retried"
         if status_code == 401:
-            raise NonRecoverableNetworkError(
-                "Received 401 - UNAUTHORIZED. Confirm your authentication credentials."
+            return (
+                False,
+                "Received 401 - UNAUTHORIZED. Confirm your authentication credentials.",
             )
         if status_code == 403:
-            raise NonRecoverableNetworkError(
-                "Received 403 - FORBIDDEN. Confirm your authentication credentials."
-            )
+            return False, "403 codes are not retried"
         # Request failed and server said NotImplemented. This isn't recoverable. Don't retry.
         if status_code == 501:
-            raise NonRecoverableNetworkError("Received code 501 from server.")
+            return False, "Received code 501 from server."
         # Request failed and this method is not retryable. We only retry POST requests.
         if not self._is_method_retryable(method):
@@ -393,8 +406,9 @@ class DatabricksRetryPolicy(Retry):
             and status_code not in self.status_forcelist
             and status_code not in self.force_dangerous_codes
         ):
-            raise UnsafeToRetryError(
-                "ExecuteStatement command can only be retried for codes 429 and 503"
+            return (
+                False,
+                "ExecuteStatement command can only be retried for codes 429 and 503",
             )
         # Request failed with a dangerous code, was an ExecuteStatement, but user forced retries for this
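
The new `get_backoff_time` body in the hunks above computes an exponential delay from the attempt number, optionally adds jitter, and caps the result at `delay_max`. A standalone sketch of that arithmetic follows. The function name `backoff_seconds` and its parameters are illustrative assumptions, and the sketch leaves out the urllib3 version check that guards the jitter term in the source.

```python
import random


def backoff_seconds(attempt: int, delay_min: float, delay_max: float, jitter: float = 0.0) -> float:
    """Exponential backoff: (2 ** attempt) * delay_min, plus optional jitter, capped at delay_max."""
    proposed = (2 ** attempt) * delay_min
    if jitter:
        # Add a random amount in [0, jitter) before applying the cap
        proposed += random.random() * jitter
    return min(proposed, delay_max)


if __name__ == "__main__":
    for attempt in range(8):
        print(attempt, backoff_seconds(attempt, delay_min=1.0, delay_max=60.0))
```

The sketch caps with `min`. The `proposed_wait = max(proposed_wait, self.delay_max)` line in the first hunk does the opposite: it raises every wait to at least `delay_max`, which is worth keeping in mind when reading the retry timings of this release.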

{databricks_sql_connector-4.0.0b3 → databricks_sql_connector-4.0.1}/src/databricks/sql/auth/thrift_http_client.py RENAMED

@@ -198,6 +198,12 @@ class THttpClient(thrift.transport.THttpClient.THttpClient):
         self.message = self.__resp.reason
         self.headers = self.__resp.headers
+        logger.info(
+            "HTTP Response with status code {}, message: {}".format(
+                self.code, self.message
+            )
+        )
     @staticmethod
     def basic_proxy_auth_headers(proxy):
         if proxy is None or not proxy.username:

{databricks_sql_connector-4.0.0b3 → databricks_sql_connector-4.0.1}/src/databricks/sql/client.py RENAMED

@@ -1,3 +1,4 @@
+import time
 from typing import Dict, Tuple, List, Optional, Any, Union, Sequence
 import pandas
@@ -47,11 +48,19 @@ from databricks.sql.experimental.oauth_persistence import OAuthPersistence
 from databricks.sql.thrift_api.TCLIService.ttypes import (
     TSparkParameter,
+    TOperationState,
 )
 logger = logging.getLogger(__name__)
+if pyarrow is None:
+    logger.warning(
+        "[WARN] pyarrow is not installed by default since databricks-sql-connector 4.0.0,"
+        "any arrow specific api (e.g. fetchmany_arrow) and cloud fetch will be disabled."
+        "If you need these features, please run pip install pyarrow or pip install databricks-sql-connector[pyarrow] to install"
+    )
 DEFAULT_RESULT_BUFFER_SIZE_BYTES = 104857600
 DEFAULT_ARRAY_SIZE = 100000
@@ -113,6 +122,9 @@ class Connection:
                 port of the oauth redirect uri (localhost). This is required when custom oauth client_id
                 `oauth_client_id` is set
+            user_agent_entry: `str`, optional
+                A custom tag to append to the User-Agent header. This is typically used by partners to identify their applications.. If not specified, it will use the default user agent PyDatabricksSqlConnector
             experimental_oauth_persistence: configures preferred storage for persisting oauth tokens.
                 This has to be a class implementing `OAuthPersistence`.
                 When `auth_type` is set to `databricks-oauth` or `azure-oauth` without persisting the oauth token in a
@@ -167,8 +179,6 @@ class Connection:
         """
         # Internal arguments in **kwargs:
-        # _user_agent_entry
-        #   Tag to add to User-Agent header. For use by partners.
         # _use_cert_as_auth
         #  Use a TLS cert instead of a token
         # _enable_ssl
@@ -218,12 +228,21 @@ class Connection:
             server_hostname, **kwargs
         )
-        if not kwargs.get("_user_agent_entry"):
-            useragent_header = "{}/{}".format(USER_AGENT_NAME, __version__)
-        else:
+        user_agent_entry = kwargs.get("user_agent_entry")
+        if user_agent_entry is None:
+            user_agent_entry = kwargs.get("_user_agent_entry")
+            if user_agent_entry is not None:
+                logger.warning(
+                    "[WARN] Parameter '_user_agent_entry' is deprecated; use 'user_agent_entry' instead. "
+                    "This parameter will be removed in the upcoming releases."
+                )
+        if user_agent_entry:
             useragent_header = "{}/{} ({})".format(
-                USER_AGENT_NAME, __version__, kwargs.get("_user_agent_entry")
+                USER_AGENT_NAME, __version__, user_agent_entry
             )
+        else:
+            useragent_header = "{}/{}".format(USER_AGENT_NAME, __version__)
         base_headers = [("User-Agent", useragent_header)]
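
The hunk above prefers the new public `user_agent_entry` argument, falls back to the deprecated `_user_agent_entry` with a warning, and only then builds the header. The same decision order can be sketched as a standalone function. `build_user_agent` is a hypothetical name used for illustration, and the version string is passed in rather than read from the package.

```python
import logging

logger = logging.getLogger(__name__)

USER_AGENT_NAME = "PyDatabricksSqlConnector"


def build_user_agent(version: str, **kwargs) -> str:
    """Prefer `user_agent_entry`; fall back to the deprecated `_user_agent_entry`."""
    entry = kwargs.get("user_agent_entry")
    if entry is None:
        entry = kwargs.get("_user_agent_entry")
        if entry is not None:
            logger.warning("'_user_agent_entry' is deprecated; use 'user_agent_entry' instead.")
    if entry:
        # A non-empty entry is appended in parentheses
        return "{}/{} ({})".format(USER_AGENT_NAME, version, entry)
    return "{}/{}".format(USER_AGENT_NAME, version)


if __name__ == "__main__":
    print(build_user_agent("4.0.1", user_agent_entry="acme-app"))
```

When both keys are supplied, the public one wins and no deprecation warning is logged, matching the order of the checks in the diff.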
@@ -430,6 +449,8 @@ class Cursor:
         self.escaper = ParamEscaper()
         self.lastrowid = None
+        self.ASYNC_DEFAULT_POLLING_INTERVAL = 2
     # The ideal return type for this method is perhaps Self, but that was not added until 3.11, and we support pre-3.11 pythons, currently.
     def __enter__(self) -> "Cursor":
         return self
@@ -733,6 +754,7 @@ class Cursor:
         self,
         operation: str,
         parameters: Optional[TParameterCollection] = None,
+        enforce_embedded_schema_correctness=False,
     ) -> "Cursor":
         """
         Execute a query and wait for execution to complete.
@@ -796,6 +818,8 @@ class Cursor:
             cursor=self,
             use_cloud_fetch=self.connection.use_cloud_fetch,
             parameters=prepared_params,
+            async_op=False,
+            enforce_embedded_schema_correctness=enforce_embedded_schema_correctness,
         )
         self.active_result_set = ResultSet(
             self.connection,
@@ -803,6 +827,7 @@ class Cursor:
             self.thrift_backend,
             self.buffer_size_bytes,
             self.arraysize,
+            self.connection.use_cloud_fetch,
         )
         if execute_response.is_staging_operation:
@@ -812,6 +837,108 @@ class Cursor:
         return self
+    def execute_async(
+        self,
+        operation: str,
+        parameters: Optional[TParameterCollection] = None,
+        enforce_embedded_schema_correctness=False,
+    ) -> "Cursor":
+        """
+        Execute a query and do not wait for it to complete and just move ahead
+        :param operation:
+        :param parameters:
+        :return:
+        """
+        param_approach = self._determine_parameter_approach(parameters)
+        if param_approach == ParameterApproach.NONE:
+            prepared_params = NO_NATIVE_PARAMS
+            prepared_operation = operation
+        elif param_approach == ParameterApproach.INLINE:
+            prepared_operation, prepared_params = self._prepare_inline_parameters(
+                operation, parameters
+            )
+        elif param_approach == ParameterApproach.NATIVE:
+            normalized_parameters = self._normalize_tparametercollection(parameters)
+            param_structure = self._determine_parameter_structure(normalized_parameters)
+            transformed_operation = transform_paramstyle(
+                operation, normalized_parameters, param_structure
+            )
+            prepared_operation, prepared_params = self._prepare_native_parameters(
+                transformed_operation, normalized_parameters, param_structure
+            )
+        self._check_not_closed()
+        self._close_and_clear_active_result_set()
+        self.thrift_backend.execute_command(
+            operation=prepared_operation,
+            session_handle=self.connection._session_handle,
+            max_rows=self.arraysize,
+            max_bytes=self.buffer_size_bytes,
+            lz4_compression=self.connection.lz4_compression,
+            cursor=self,
+            use_cloud_fetch=self.connection.use_cloud_fetch,
+            parameters=prepared_params,
+            async_op=True,
+            enforce_embedded_schema_correctness=enforce_embedded_schema_correctness,
+        )
+        return self
+    def get_query_state(self) -> "TOperationState":
+        """
+        Get the state of the async executing query or basically poll the status of the query
+        :return:
+        """
+        self._check_not_closed()
+        return self.thrift_backend.get_query_state(self.active_op_handle)
+    def get_async_execution_result(self):
+        """
+        Checks for the status of the async executing query and fetches the result if the query is finished
+        Otherwise it will keep polling the status of the query till there is a Not pending state
+        :return:
+        """
+        self._check_not_closed()
+        def is_executing(operation_state) -> "bool":
+            return not operation_state or operation_state in [
+                ttypes.TOperationState.RUNNING_STATE,
+                ttypes.TOperationState.PENDING_STATE,
+            ]
+        while is_executing(self.get_query_state()):
+            # Poll after some default time
+            time.sleep(self.ASYNC_DEFAULT_POLLING_INTERVAL)
+        operation_state = self.get_query_state()
+        if operation_state == ttypes.TOperationState.FINISHED_STATE:
+            execute_response = self.thrift_backend.get_execution_result(
+                self.active_op_handle, self
+            )
+            self.active_result_set = ResultSet(
+                self.connection,
+                execute_response,
+                self.thrift_backend,
+                self.buffer_size_bytes,
+                self.arraysize,
+            )
+            if execute_response.is_staging_operation:
+                self._handle_staging_operation(
+                    staging_allowed_local_path=self.thrift_backend.staging_allowed_local_path
+                )
+            return self
+        else:
+            raise Error(
+                f"get_execution_result failed with Operation status {operation_state}"
+            )
     def executemany(self, operation, seq_of_parameters):
         """
         Execute the operation once for every set of passed in parameters.
@@ -1097,6 +1224,7 @@ class ResultSet:
         thrift_backend: ThriftBackend,
         result_buffer_size_bytes: int = DEFAULT_RESULT_BUFFER_SIZE_BYTES,
         arraysize: int = 10000,
+        use_cloud_fetch: bool = True,
     ):
         """
         A ResultSet manages the results of a single command.
@@ -1118,6 +1246,7 @@ class ResultSet:
         self.description = execute_response.description
         self._arrow_schema_bytes = execute_response.arrow_schema_bytes
         self._next_row_index = 0
+        self._use_cloud_fetch = use_cloud_fetch
         if execute_response.arrow_queue:
             # In this case the server has taken the fast path and returned an initial batch of
@@ -1145,6 +1274,7 @@ class ResultSet:
             lz4_compressed=self.lz4_compressed,
             arrow_schema_bytes=self._arrow_schema_bytes,
             description=self.description,
+            use_cloud_fetch=self._use_cloud_fetch,
         )
         self.results = results
         self.has_more_rows = has_more_rows
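
The `execute_async`, `get_query_state` and `get_async_execution_result` additions above form a submit-then-poll pattern: the statement is sent without waiting, and the caller polls until the operation leaves its pending or running state. The sketch below shows that loop against a stand-in cursor. `State`, `FakeCursor` and `wait_for_result` are hypothetical names that replace `TOperationState` and the real cursor, so the example runs without a Databricks connection.

```python
import time
from enum import Enum


class State(Enum):
    # Hypothetical stand-ins for the TOperationState values used in the diff
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    FINISHED = "FINISHED"
    ERROR = "ERROR"


class FakeCursor:
    """Stand-in for a cursor that reports a scripted sequence of states."""

    def __init__(self, states):
        self._states = list(states)

    def get_query_state(self):
        # Return the next scripted state; repeat the final one once exhausted
        if len(self._states) > 1:
            return self._states.pop(0)
        return self._states[0]


def wait_for_result(cursor, poll_interval=0.0):
    """Poll until the query leaves PENDING/RUNNING; return how many re-polls were needed."""
    polls = 0
    state = cursor.get_query_state()
    while state in (State.PENDING, State.RUNNING):
        time.sleep(poll_interval)
        state = cursor.get_query_state()
        polls += 1
    if state is not State.FINISHED:
        raise RuntimeError(f"query ended in state {state.name}")
    return polls


if __name__ == "__main__":
    cursor = FakeCursor([State.PENDING, State.RUNNING, State.FINISHED])
    print(wait_for_result(cursor))
```

In the diff the polling interval comes from `ASYNC_DEFAULT_POLLING_INTERVAL` (2 seconds), and a missing state is also treated as still executing. The sketch uses a zero interval by default so it finishes immediately.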

{databricks_sql_connector-4.0.0b3 → databricks_sql_connector-4.0.1}/src/databricks/sql/thrift_backend.py RENAMED

@@ -7,6 +7,8 @@ import uuid
 import threading
 from typing import List, Union
+from databricks.sql.thrift_api.TCLIService.ttypes import TOperationState
 try:
     import pyarrow
 except ImportError:
@@ -93,8 +95,6 @@ class ThriftBackend:
         **kwargs,
     ):
         # Internal arguments in **kwargs:
-        # _user_agent_entry
-        #   Tag to add to User-Agent header. For use by partners.
         # _username, _password
         #   Username and password Basic authentication (no official support)
         # _connection_uri
@@ -319,7 +319,7 @@ class ThriftBackend:
     # FUTURE: Consider moving to https://github.com/litl/backoff or
     # https://github.com/jd/tenacity for retry logic.
-    def make_request(self, method, request):
+    def make_request(self, method, request, retryable=True):
         """Execute given request, attempting retries when
             1. Receiving HTTP 429/503 from server
             2. OSError is raised during a GetOperationStatus
@@ -458,7 +458,7 @@ class ThriftBackend:
         #       return on success
         #       if available: bounded delay and retry
         #       if not: raise error
-        max_attempts = self._retry_stop_after_attempts_count
+        max_attempts = self._retry_stop_after_attempts_count if retryable else 1
         # use index-1 counting for logging/human consistency
         for attempt in range(1, max_attempts + 1):
@@ -769,6 +769,63 @@ class ThriftBackend:
             arrow_schema_bytes=schema_bytes,
         )
+    def get_execution_result(self, op_handle, cursor):
+        assert op_handle is not None
+        req = ttypes.TFetchResultsReq(
+            operationHandle=ttypes.TOperationHandle(
+                op_handle.operationId,
+                op_handle.operationType,
+                False,
+                op_handle.modifiedRowCount,
+            ),
+            maxRows=cursor.arraysize,
+            maxBytes=cursor.buffer_size_bytes,
+            orientation=ttypes.TFetchOrientation.FETCH_NEXT,
+            includeResultSetMetadata=True,
+        )
+        resp = self.make_request(self._client.FetchResults, req)
+        t_result_set_metadata_resp = resp.resultSetMetadata
+        lz4_compressed = t_result_set_metadata_resp.lz4Compressed
+        is_staging_operation = t_result_set_metadata_resp.isStagingOperation
+        has_more_rows = resp.hasMoreRows
+        description = self._hive_schema_to_description(
+            t_result_set_metadata_resp.schema
+        )
+        schema_bytes = (
+            t_result_set_metadata_resp.arrowSchema
+            or self._hive_schema_to_arrow_schema(t_result_set_metadata_resp.schema)
+            .serialize()
+            .to_pybytes()
+        )
+        queue = ResultSetQueueFactory.build_queue(
+            row_set_type=resp.resultSetMetadata.resultFormat,
+            t_row_set=resp.results,
+            arrow_schema_bytes=schema_bytes,
+            max_download_threads=self.max_download_threads,
+            lz4_compressed=lz4_compressed,
+            description=description,
+            ssl_options=self._ssl_options,
+        )
+        return ExecuteResponse(
+            arrow_queue=queue,
+            status=resp.status,
+            has_been_closed_server_side=False,
+            has_more_rows=has_more_rows,
+            lz4_compressed=lz4_compressed,
+            is_staging_operation=is_staging_operation,
+            command_handle=op_handle,
+            description=description,
+            arrow_schema_bytes=schema_bytes,
+        )
     def _wait_until_command_done(self, op_handle, initial_operation_status_resp):
         if initial_operation_status_resp:
             self._check_command_not_in_error_or_closed_state(
@@ -787,6 +844,12 @@ class ThriftBackend:
             self._check_command_not_in_error_or_closed_state(op_handle, poll_resp)
         return operation_state
+    def get_query_state(self, op_handle) -> "TOperationState":
+        poll_resp = self._poll_for_status(op_handle)
+        operation_state = poll_resp.operationState
+        self._check_command_not_in_error_or_closed_state(op_handle, poll_resp)
+        return operation_state
     @staticmethod
     def _check_direct_results_for_error(t_spark_direct_results):
         if t_spark_direct_results:
@@ -817,6 +880,8 @@ class ThriftBackend:
         cursor,
         use_cloud_fetch=True,
         parameters=[],
+        async_op=False,
+        enforce_embedded_schema_correctness=False,
     ):
         assert session_handle is not None
@@ -832,8 +897,12 @@ class ThriftBackend:
             sessionHandle=session_handle,
             statement=operation,
             runAsync=True,
-            getDirectResults=ttypes.TSparkGetDirectResults(
-                maxRows=max_rows, maxBytes=max_bytes
+            # For async operation we don't want the direct results
+            getDirectResults=None
+            if async_op
+            else ttypes.TSparkGetDirectResults(
+                maxRows=max_rows,
+                maxBytes=max_bytes,
             ),
             canReadArrowResult=True if pyarrow else False,
             canDecompressLZ4Result=lz4_compression,
@@ -844,9 +913,14 @@ class ThriftBackend:
             },
             useArrowNativeTypes=spark_arrow_types,
             parameters=parameters,
+            enforceEmbeddedSchemaCorrectness=enforce_embedded_schema_correctness,
         )
         resp = self.make_request(self._client.ExecuteStatement, req)
-        return self._handle_execute_response(resp, cursor)
+        if async_op:
+            self._handle_execute_response_async(resp, cursor)
+        else:
+            return self._handle_execute_response(resp, cursor)
     def get_catalogs(self, session_handle, max_rows, max_bytes, cursor):
         assert session_handle is not None
@@ -945,6 +1019,10 @@ class ThriftBackend:
         return self._results_message_to_execute_response(resp, final_operation_state)
+    def _handle_execute_response_async(self, resp, cursor):
+        cursor.active_op_handle = resp.operationHandle
+        self._check_direct_results_for_error(resp.directResults)
     def fetch_results(
         self,
         op_handle,
@@ -954,6 +1032,7 @@ class ThriftBackend:
         lz4_compressed,
         arrow_schema_bytes,
         description,
+        use_cloud_fetch=True,
     ):
         assert op_handle is not None
@@ -970,10 +1049,11 @@ class ThriftBackend:
             includeResultSetMetadata=True,
         )
-        resp = self.make_request(self._client.FetchResults, req)
+        # Fetch results in Inline mode with FETCH_NEXT orientation are not idempotent and hence not retried
+        resp = self.make_request(self._client.FetchResults, req, use_cloud_fetch)
         if resp.results.startRowOffset > expected_row_start_offset:
-            logger.warning(
-                "Expected results to start from {} but they instead start at {}".format(
+            raise DataError(
+                "fetch_results failed due to inconsistency in the state between the client and the server. Expected results to start from {} but they instead start at {}, some result batches must have been skipped".format(
                     expected_row_start_offset, resp.results.startRowOffset
                 )
             )
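
Taken together, the hunks above add an asynchronous path to `ThriftBackend`. With `async_op=True`, `execute_command` requests no direct results, stores the handle on `cursor.active_op_handle` and returns nothing. `get_query_state` then polls the operation, and `get_execution_result` fetches the rows once it has finished. The following is a minimal sketch of that submit, poll, fetch sequence and not the connector's own code: `FakeBackend`, `OperationState` and `wait_for_result` are hypothetical stand-ins, and the state names are simplified relative to the Thrift `TOperationState` values.

```python
import enum
import time


class OperationState(enum.Enum):
    """Simplified stand-in for the Thrift TOperationState values."""
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    FINISHED = "FINISHED"
    ERROR = "ERROR"


class FakeBackend:
    """Hypothetical backend exposing the two method names added in the diff."""

    def __init__(self, states, rows):
        self._states = iter(states)
        self._rows = rows

    def get_query_state(self, op_handle):
        # The real method polls the server; this one replays a scripted sequence.
        return next(self._states)

    def get_execution_result(self, op_handle, cursor):
        # The real method issues a FetchResults request; this one returns canned rows.
        return list(self._rows)


def wait_for_result(backend, op_handle, cursor=None, interval=0.0):
    """Poll until the operation leaves PENDING/RUNNING, then fetch the result."""
    polls = 0
    while True:
        state = backend.get_query_state(op_handle)
        polls += 1
        if state not in (OperationState.PENDING, OperationState.RUNNING):
            break
        time.sleep(interval)
    if state is not OperationState.FINISHED:
        raise RuntimeError(f"operation ended in state {state.name}")
    return backend.get_execution_result(op_handle, cursor), polls


backend = FakeBackend(
    [OperationState.PENDING, OperationState.RUNNING, OperationState.FINISHED],
    rows=[(1,), (2,)],
)
rows, polls = wait_for_result(backend, op_handle="op-1")
print(rows, polls)  # [(1,), (2,)] 3
```

In the diff itself, error and closed states are raised inside `get_query_state` by `_check_command_not_in_error_or_closed_state`, so a real caller would see an exception from the poll rather than an `ERROR` return value as in this sketch.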

{databricks_sql_connector-4.0.0b3 → databricks_sql_connector-4.0.1}/src/databricks/sql/utils.py RENAMED

@@ -1,6 +1,6 @@
 from __future__ import annotations
-import pytz
+from dateutil import parser
 import datetime
 import decimal
 from abc import ABC, abstractmethod
@@ -642,16 +642,7 @@ def convert_to_assigned_datatypes_in_column_table(column_table, description):
             )
         elif description[i][1] == "timestamp":
             converted_column_table.append(
-                tuple(
-                    (
-                        v
-                        if v is None
-                        else datetime.datetime.strptime(
-                            v, "%Y-%m-%d %H:%M:%S.%f"
-                        ).replace(tzinfo=pytz.UTC)
-                    )
-                    for v in col
-                )
+                tuple((v if v is None else parser.parse(v)) for v in col)
             )
         else:
             converted_column_table.append(col)
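
The timestamp hunk above replaces a single fixed `strptime` format with `dateutil.parser.parse`, which matches the changelog entry about supporting multiple timestamp formats. The sketch below illustrates why the fixed format was limiting. `parse_fixed_format` is a hypothetical helper that mimics the removed lines, with `datetime.timezone.utc` standing in for `pytz.UTC`. The `dateutil` part runs only if the third-party `python-dateutil` package is installed.

```python
import datetime

OLD_FORMAT = "%Y-%m-%d %H:%M:%S.%f"  # the only layout the removed code accepted


def parse_fixed_format(value):
    """Mimic the removed logic (timezone.utc is a stand-in for pytz.UTC)."""
    if value is None:
        return None
    return datetime.datetime.strptime(value, OLD_FORMAT).replace(
        tzinfo=datetime.timezone.utc
    )


def accepts(value):
    """Return True if the fixed-format parser can handle the string."""
    try:
        parse_fixed_format(value)
        return True
    except ValueError:
        return False


samples = [
    "2025-03-19 12:34:56.789000",  # matches the old format
    "2025-03-19 12:34:56",         # no fractional seconds
    "2025-03-19T12:34:56.789000",  # ISO 8601 'T' separator
]
results = {s: accepts(s) for s in samples}

try:
    from dateutil import parser  # third-party: python-dateutil
except ImportError:
    parser = None

if parser is not None:
    # All three layouts parse; strings without an offset yield naive datetimes.
    flexible = [parser.parse(s) for s in samples]
```

One behavioural difference is visible in the diff itself: the removed code always attached UTC via `.replace(tzinfo=pytz.UTC)`, while the new line returns whatever `parser.parse` produces, which is a naive `datetime` when the input string carries no offset.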

databricks_sql_connector-4.0.0b3/src/databricks/sqlalchemy/__init__.py DELETED

@@ -1,6 +0,0 @@
-try:
-	from databricks_sqlalchemy import *
-except:
-	import warnings
-	warnings.warn("Install databricks-sqlalchemy plugin before using this")
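
The deleted shim above tried to re-export the separate `databricks_sqlalchemy` package and warned if the import failed, using a bare `except:` that also swallows unrelated errors. For comparison, a guarded optional import is more commonly written against `ImportError` only. The sketch below is an illustration under that assumption, not code from either package. `load_optional` and the module name `hypothetical_missing_plugin_xyz` are made up for the example, and unlike the shim it returns the module instead of re-exporting its names with `import *`.

```python
import importlib
import warnings


def load_optional(module_name, hint):
    """Import module_name if it is installed; otherwise warn and return None."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # Only a missing module is treated as "optional"; other errors propagate.
        warnings.warn(hint)
        return None


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    plugin = load_optional(
        "hypothetical_missing_plugin_xyz",  # assumed not to exist
        "Install the plugin before using this",
    )
```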