databricks-sql-connector (PyPI) version diff: 4.0.0b2.tar.gz → 4.0.0b4.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (35)

{databricks_sql_connector-4.0.0b2 → databricks_sql_connector-4.0.0b4}/CHANGELOG.md RENAMED

@@ -1,5 +1,25 @@
 # Release History
+# 4.0.0
+- Split the connector into two separate packages: `databricks-sql-connector` and `databricks-sqlalchemy`. The `databricks-sql-connector` package contains the core functionality of the connector, while the `databricks-sqlalchemy` package contains the SQLAlchemy dialect for the connector.
+- Pyarrow dependency is now optional in `databricks-sql-connector`. Users needing arrow are supposed to explicitly install pyarrow
+# 3.6.0 (2024-10-25)
+- Support encryption headers in the cloud fetch request (https://github.com/databricks/databricks-sql-python/pull/460 by @jackyhu-db)
+# 3.5.0 (2024-10-18)
+- Create a non pyarrow flow to handle small results for the column set (databricks/databricks-sql-python#440 by @jprakash-db)
+- Fix: On non-retryable error, ensure PySQL includes useful information in error (databricks/databricks-sql-python#447 by @shivam2680)
+# 3.4.0 (2024-08-27)
+- Unpin pandas to support v2.2.2 (databricks/databricks-sql-python#416 by @kfollesdal)
+- Make OAuth as the default authenticator if no authentication setting is provided (databricks/databricks-sql-python#419 by @jackyhu-db)
+- Fix (regression): use SSL options with HTTPS connection pool (databricks/databricks-sql-python#425 by @kravets-levko)
 # 3.3.0 (2024-07-18)
 - Don't retry requests that fail with HTTP code 401 (databricks/databricks-sql-python#408 by @Hodnebo)

{databricks_sql_connector-4.0.0b2 → databricks_sql_connector-4.0.0b4}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: databricks-sql-connector
-Version: 4.0.0b2
+Version: 4.0.0b4
 Summary: Databricks SQL Connector for Python
 License: Apache-2.0
 Author: Databricks
@@ -13,11 +13,7 @@ Classifier: Programming Language :: Python :: 3.9
 Classifier: Programming Language :: Python :: 3.10
 Classifier: Programming Language :: Python :: 3.11
 Classifier: Programming Language :: Python :: 3.12
-Provides-Extra: alembic
-Provides-Extra: databricks-sqlalchemy
 Provides-Extra: pyarrow
-Requires-Dist: alembic (>=1.0.11,<2.0.0) ; extra == "alembic"
-Requires-Dist: databricks-sqlalchemy (>=2.0.0) ; extra == "databricks-sqlalchemy" or extra == "alembic"
 Requires-Dist: lz4 (>=4.0.2,<5.0.0)
 Requires-Dist: numpy (>=1.16.6,<2.0.0) ; python_version >= "3.8" and python_version < "3.11"
 Requires-Dist: numpy (>=1.23.4,<2.0.0) ; python_version >= "3.11"
@@ -37,9 +33,9 @@ Description-Content-Type: text/markdown
 [![PyPI](https://img.shields.io/pypi/v/databricks-sql-connector?style=flat-square)](https://pypi.org/project/databricks-sql-connector/)
 [![Downloads](https://pepy.tech/badge/databricks-sql-connector)](https://pepy.tech/project/databricks-sql-connector)
-The Databricks SQL Connector for Python allows you to develop Python applications that connect to Databricks clusters and SQL warehouses. It is a Thrift-based client with no dependencies on ODBC or JDBC. It conforms to the [Python DB API 2.0 specification](https://www.python.org/dev/peps/pep-0249/) and exposes a [SQLAlchemy](https://www.sqlalchemy.org/) dialect for use with tools like `pandas` and `alembic` which use SQLAlchemy to execute DDL. Use `pip install databricks-sql-connector[sqlalchemy]` to install with SQLAlchemy's dependencies. `pip install databricks-sql-connector[alembic]` will install alembic's dependencies.
+The Databricks SQL Connector for Python allows you to develop Python applications that connect to Databricks clusters and SQL warehouses. It is a Thrift-based client with no dependencies on ODBC or JDBC. It conforms to the [Python DB API 2.0 specification](https://www.python.org/dev/peps/pep-0249/).
-This connector uses Arrow as the data-exchange format, and supports APIs to directly fetch Arrow tables. Arrow tables are wrapped in the `ArrowQueue` class to provide a natural API to get several rows at a time.
+This connector uses Arrow as the data-exchange format, and supports APIs (e.g. `fetchmany_arrow`) to directly fetch Arrow tables. Arrow tables are wrapped in the `ArrowQueue` class to provide a natural API to get several rows at a time. [PyArrow](https://arrow.apache.org/docs/python/index.html) is required to enable this and use these APIs, you can install it via  `pip install pyarrow` or `pip install databricks-sql-connector[pyarrow]`.
 You are welcome to file an issue here for general use cases. You can also contact Databricks Support [here](help.databricks.com).
@@ -56,7 +52,12 @@ For the latest documentation, see
 ## Quickstart
-Install the library with `pip install databricks-sql-connector`
+### Installing the core library
+Install using `pip install databricks-sql-connector`
+### Installing the core library with PyArrow
+Install using `pip install databricks-sql-connector[pyarrow]`
 ```bash
 export DATABRICKS_HOST=********.databricks.com
@@ -94,6 +95,18 @@ or to a Databricks Runtime interactive cluster (e.g. /sql/protocolv1/o/123456789
 > to authenticate the target Databricks user account and needs to open the browser for authentication. So it
 > can only run on the user's machine.
+## SQLAlchemy
+Starting from `databricks-sql-connector` version 4.0.0 SQLAlchemy support has been extracted to a new library `databricks-sqlalchemy`.
+- Github repository [databricks-sqlalchemy github](https://github.com/databricks/databricks-sqlalchemy)
+- PyPI [databricks-sqlalchemy pypi](https://pypi.org/project/databricks-sqlalchemy/)
+### Quick SQLAlchemy guide
+Users can now choose between using the SQLAlchemy v1 or SQLAlchemy v2 dialects with the connector core
+- Install the latest SQLAlchemy v1 using `pip install databricks-sqlalchemy~=1.0`
+- Install SQLAlchemy v2 using `pip install databricks-sqlalchemy`
 ## Contributing

{databricks_sql_connector-4.0.0b2 → databricks_sql_connector-4.0.0b4}/README.md RENAMED

@@ -3,9 +3,9 @@
 [![PyPI](https://img.shields.io/pypi/v/databricks-sql-connector?style=flat-square)](https://pypi.org/project/databricks-sql-connector/)
 [![Downloads](https://pepy.tech/badge/databricks-sql-connector)](https://pepy.tech/project/databricks-sql-connector)
-The Databricks SQL Connector for Python allows you to develop Python applications that connect to Databricks clusters and SQL warehouses. It is a Thrift-based client with no dependencies on ODBC or JDBC. It conforms to the [Python DB API 2.0 specification](https://www.python.org/dev/peps/pep-0249/) and exposes a [SQLAlchemy](https://www.sqlalchemy.org/) dialect for use with tools like `pandas` and `alembic` which use SQLAlchemy to execute DDL. Use `pip install databricks-sql-connector[sqlalchemy]` to install with SQLAlchemy's dependencies. `pip install databricks-sql-connector[alembic]` will install alembic's dependencies.
+The Databricks SQL Connector for Python allows you to develop Python applications that connect to Databricks clusters and SQL warehouses. It is a Thrift-based client with no dependencies on ODBC or JDBC. It conforms to the [Python DB API 2.0 specification](https://www.python.org/dev/peps/pep-0249/).
-This connector uses Arrow as the data-exchange format, and supports APIs to directly fetch Arrow tables. Arrow tables are wrapped in the `ArrowQueue` class to provide a natural API to get several rows at a time.
+This connector uses Arrow as the data-exchange format, and supports APIs (e.g. `fetchmany_arrow`) to directly fetch Arrow tables. Arrow tables are wrapped in the `ArrowQueue` class to provide a natural API to get several rows at a time. [PyArrow](https://arrow.apache.org/docs/python/index.html) is required to enable this and use these APIs, you can install it via  `pip install pyarrow` or `pip install databricks-sql-connector[pyarrow]`.
 You are welcome to file an issue here for general use cases. You can also contact Databricks Support [here](help.databricks.com).
@@ -22,7 +22,12 @@ For the latest documentation, see
 ## Quickstart
-Install the library with `pip install databricks-sql-connector`
+### Installing the core library
+Install using `pip install databricks-sql-connector`
+### Installing the core library with PyArrow
+Install using `pip install databricks-sql-connector[pyarrow]`
 ```bash
 export DATABRICKS_HOST=********.databricks.com
@@ -60,6 +65,18 @@ or to a Databricks Runtime interactive cluster (e.g. /sql/protocolv1/o/123456789
 > to authenticate the target Databricks user account and needs to open the browser for authentication. So it
 > can only run on the user's machine.
+## SQLAlchemy
+Starting from `databricks-sql-connector` version 4.0.0 SQLAlchemy support has been extracted to a new library `databricks-sqlalchemy`.
+- Github repository [databricks-sqlalchemy github](https://github.com/databricks/databricks-sqlalchemy)
+- PyPI [databricks-sqlalchemy pypi](https://pypi.org/project/databricks-sqlalchemy/)
+### Quick SQLAlchemy guide
+Users can now choose between using the SQLAlchemy v1 or SQLAlchemy v2 dialects with the connector core
+- Install the latest SQLAlchemy v1 using `pip install databricks-sqlalchemy~=1.0`
+- Install SQLAlchemy v2 using `pip install databricks-sqlalchemy`
 ## Contributing

{databricks_sql_connector-4.0.0b2 → databricks_sql_connector-4.0.0b4}/pyproject.toml RENAMED

@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "databricks-sql-connector"
-version = "4.0.0.b2"
+version = "4.0.0.b4"
 description = "Databricks SQL Connector for Python"
 authors = ["Databricks <databricks-sql-connector-maintainers@databricks.com>"]
 license = "Apache-2.0"
@@ -23,15 +23,9 @@ numpy = [
 ]
 openpyxl = "^3.0.10"
 urllib3 = ">=1.26"
-databricks-sqlalchemy = { version = ">=2.0.0", optional = true }
 pyarrow = { version = ">=14.0.1,<17", optional=true }
-alembic = { version = "^1.0.11", optional = true }
 [tool.poetry.extras]
-databricks-sqlalchemy = ["databricks-sqlalchemy"]
-alembic = ["databricks-sqlalchemy", "alembic"]
 pyarrow = ["pyarrow"]
 [tool.poetry.dev-dependencies]
@@ -45,9 +39,6 @@ pytest-dotenv = "^0.5.2"
 "Homepage" = "https://github.com/databricks/databricks-sql-python"
 "Bug Tracker" = "https://github.com/databricks/databricks-sql-python/issues"
-[tool.poetry.plugins."sqlalchemy.dialects"]
-"databricks" = "databricks.sqlalchemy:DatabricksDialect"
 [build-system]
 requires = ["poetry-core>=1.0.0"]
 build-backend = "poetry.core.masonry.api"
@@ -64,5 +55,5 @@ markers = {"reviewed" = "Test case has been reviewed by Databricks"}
 minversion = "6.0"
 log_cli = "false"
 log_cli_level = "INFO"
-testpaths = ["tests", "src/databricks/sqlalchemy/test_local"]
+testpaths = ["tests"]
 env_files = ["test.env"]
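
The removal of the `alembic` and `databricks-sqlalchemy` extras in this hunk can be checked against whatever distribution is actually installed. The following is a minimal sketch using the standard-library `importlib.metadata`; the helper name `declared_extras` is purely illustrative and is not part of the connector.

```python
from importlib import metadata
from typing import List


def declared_extras(dist_name: str) -> List[str]:
    """Return the extras (Provides-Extra entries) of an installed distribution.

    Returns an empty list when the distribution is not installed.
    """
    try:
        meta = metadata.metadata(dist_name)
    except metadata.PackageNotFoundError:
        return []
    # get_all() returns None when the header is absent entirely.
    return sorted(meta.get_all("Provides-Extra") or [])


if __name__ == "__main__":
    # Going by the PKG-INFO diff, 4.0.0b4 should report only the pyarrow extra.
    print(declared_extras("databricks-sql-connector"))
```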

{databricks_sql_connector-4.0.0b2 → databricks_sql_connector-4.0.0b4}/src/databricks/sql/__init__.py RENAMED

@@ -68,7 +68,7 @@ DATETIME = DBAPITypeObject("timestamp")
 DATE = DBAPITypeObject("date")
 ROWID = DBAPITypeObject()
-__version__ = "3.3.0"
+__version__ = "3.6.0"
 USER_AGENT_NAME = "PyDatabricksSqlConnector"
 # These two functions are pyhive legacy
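
In this hunk the module-level `__version__` moves to `3.6.0`, while PKG-INFO and pyproject.toml in the same release carry `4.0.0b4`. Code that needs the installed release number may therefore prefer the distribution metadata over the module attribute. A hedged sketch, with a hypothetical helper name:

```python
from importlib import metadata
from typing import Optional


def installed_release(dist_name: str) -> Optional[str]:
    """Return the installed distribution version, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # For this release the metadata and databricks.sql.__version__ may disagree.
    print(installed_release("databricks-sql-connector"))
```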

{databricks_sql_connector-4.0.0b2 → databricks_sql_connector-4.0.0b4}/src/databricks/sql/auth/thrift_http_client.py RENAMED

@@ -1,13 +1,11 @@
 import base64
 import logging
 import urllib.parse
-from typing import Dict, Union
+from typing import Dict, Union, Optional
 import six
 import thrift
-logger = logging.getLogger(__name__)
 import ssl
 import warnings
 from http.client import HTTPResponse
@@ -16,6 +14,9 @@ from io import BytesIO
 from urllib3 import HTTPConnectionPool, HTTPSConnectionPool, ProxyManager
 from urllib3.util import make_headers
 from databricks.sql.auth.retry import CommandType, DatabricksRetryPolicy
+from databricks.sql.types import SSLOptions
+logger = logging.getLogger(__name__)
 class THttpClient(thrift.transport.THttpClient.THttpClient):
@@ -25,13 +26,12 @@ class THttpClient(thrift.transport.THttpClient.THttpClient):
         uri_or_host,
         port=None,
         path=None,
-        cafile=None,
-        cert_file=None,
-        key_file=None,
-        ssl_context=None,
+        ssl_options: Optional[SSLOptions] = None,
         max_connections: int = 1,
         retry_policy: Union[DatabricksRetryPolicy, int] = 0,
     ):
+        self._ssl_options = ssl_options
         if port is not None:
             warnings.warn(
                 "Please use the THttpClient('http{s}://host:port/path') constructor",
@@ -48,13 +48,11 @@ class THttpClient(thrift.transport.THttpClient.THttpClient):
             self.scheme = parsed.scheme
             assert self.scheme in ("http", "https")
             if self.scheme == "https":
-                self.certfile = cert_file
-                self.keyfile = key_file
-                self.context = (
-                    ssl.create_default_context(cafile=cafile)
-                    if (cafile and not ssl_context)
-                    else ssl_context
-                )
+                if self._ssl_options is not None:
+                    # TODO: Not sure if those options are used anywhere - need to double-check
+                    self.certfile = self._ssl_options.tls_client_cert_file
+                    self.keyfile = self._ssl_options.tls_client_cert_key_file
+                    self.context = self._ssl_options.create_ssl_context()
             self.port = parsed.port
             self.host = parsed.hostname
             self.path = parsed.path
@@ -109,12 +107,23 @@ class THttpClient(thrift.transport.THttpClient.THttpClient):
     def open(self):
         # self.__pool replaces the self.__http used by the original THttpClient
+        _pool_kwargs = {"maxsize": self.max_connections}
         if self.scheme == "http":
             pool_class = HTTPConnectionPool
         elif self.scheme == "https":
             pool_class = HTTPSConnectionPool
-        _pool_kwargs = {"maxsize": self.max_connections}
+            _pool_kwargs.update(
+                {
+                    "cert_reqs": ssl.CERT_REQUIRED
+                    if self._ssl_options.tls_verify
+                    else ssl.CERT_NONE,
+                    "ca_certs": self._ssl_options.tls_trusted_ca_file,
+                    "cert_file": self._ssl_options.tls_client_cert_file,
+                    "key_file": self._ssl_options.tls_client_cert_key_file,
+                    "key_password": self._ssl_options.tls_client_cert_key_password,
+                }
+            )
         if self.using_proxy():
             proxy_manager = ProxyManager(
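
The `open()` hunk above forwards the TLS settings to the urllib3 connection pool as keyword arguments. In the lines shown, the `https` branch reads attributes of `self._ssl_options` unconditionally, although the constructor default is `None`. The sketch below reproduces the mapping in isolation and adds a `None` guard; the guard, the `TlsSettings` stand-in class, and the function name are assumptions made for illustration, not the library's behavior.

```python
import ssl
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class TlsSettings:
    """Hypothetical stand-in mirroring the SSLOptions fields used in the hunk."""

    tls_verify: bool = True
    tls_trusted_ca_file: Optional[str] = None
    tls_client_cert_file: Optional[str] = None
    tls_client_cert_key_file: Optional[str] = None
    tls_client_cert_key_password: Optional[str] = None


def pool_kwargs(
    scheme: str, max_connections: int, tls: Optional[TlsSettings]
) -> Dict[str, Any]:
    """Build keyword arguments for a urllib3 HTTP(S)ConnectionPool."""
    kwargs: Dict[str, Any] = {"maxsize": max_connections}
    # Assumption: fall back to urllib3 defaults when no TLS settings are given.
    if scheme == "https" and tls is not None:
        kwargs.update(
            cert_reqs=ssl.CERT_REQUIRED if tls.tls_verify else ssl.CERT_NONE,
            ca_certs=tls.tls_trusted_ca_file,
            cert_file=tls.tls_client_cert_file,
            key_file=tls.tls_client_cert_key_file,
            key_password=tls.tls_client_cert_key_password,
        )
    return kwargs
```

The returned dictionary could then be passed as `HTTPSConnectionPool(host, port, **kwargs)`; plain `http` connections receive only `maxsize`.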

{databricks_sql_connector-4.0.0b2 → databricks_sql_connector-4.0.0b4}/src/databricks/sql/client.py RENAMED

@@ -1,6 +1,11 @@
 from typing import Dict, Tuple, List, Optional, Any, Union, Sequence
 import pandas
+try:
+    import pyarrow
+except ImportError:
+    pyarrow = None
 import requests
 import json
 import os
@@ -21,6 +26,8 @@ from databricks.sql.utils import (
     ParamEscaper,
     inject_parameters,
     transform_paramstyle,
+    ColumnTable,
+    ColumnQueue,
 )
 from databricks.sql.parameters.native import (
     DbsqlParameterBase,
@@ -34,7 +41,7 @@ from databricks.sql.parameters.native import (
 )
-from databricks.sql.types import Row
+from databricks.sql.types import Row, SSLOptions
 from databricks.sql.auth.auth import get_python_sql_connector_auth_provider
 from databricks.sql.experimental.oauth_persistence import OAuthPersistence
@@ -42,13 +49,16 @@ from databricks.sql.thrift_api.TCLIService.ttypes import (
     TSparkParameter,
 )
-try:
-    import pyarrow
-except ImportError:
-    pyarrow = None
 logger = logging.getLogger(__name__)
+if pyarrow is None:
+    logger.warning(
+        "[WARN] pyarrow is not installed by default since databricks-sql-connector 4.0.0,"
+        "any arrow specific api (e.g. fetchmany_arrow) and cloud fetch will be disabled."
+        "If you need these features, please run pip install pyarrow or pip install databricks-sql-connector[pyarrow] to install"
+    )
 DEFAULT_RESULT_BUFFER_SIZE_BYTES = 104857600
 DEFAULT_ARRAY_SIZE = 100000
@@ -181,8 +191,9 @@ class Connection:
         # _tls_trusted_ca_file
         #   Set to the path of the file containing trusted CA certificates for server certificate
         #   verification. If not provide, uses system truststore.
-        # _tls_client_cert_file, _tls_client_cert_key_file
+        # _tls_client_cert_file, _tls_client_cert_key_file, _tls_client_cert_key_password
         #   Set client SSL certificate.
+        #   See https://docs.python.org/3/library/ssl.html#ssl.SSLContext.load_cert_chain
         # _retry_stop_after_attempts_count
         #  The maximum number of attempts during a request retry sequence (defaults to 24)
         # _socket_timeout
@@ -223,12 +234,25 @@ class Connection:
         base_headers = [("User-Agent", useragent_header)]
+        self._ssl_options = SSLOptions(
+            # Double negation is generally a bad thing, but we have to keep backward compatibility
+            tls_verify=not kwargs.get(
+                "_tls_no_verify", False
+            ),  # by default - verify cert and host
+            tls_verify_hostname=kwargs.get("_tls_verify_hostname", True),
+            tls_trusted_ca_file=kwargs.get("_tls_trusted_ca_file"),
+            tls_client_cert_file=kwargs.get("_tls_client_cert_file"),
+            tls_client_cert_key_file=kwargs.get("_tls_client_cert_key_file"),
+            tls_client_cert_key_password=kwargs.get("_tls_client_cert_key_password"),
+        )
         self.thrift_backend = ThriftBackend(
             self.host,
             self.port,
             http_path,
             (http_headers or []) + base_headers,
             auth_provider,
+            ssl_options=self._ssl_options,
             _use_arrow_native_complex_types=_use_arrow_native_complex_types,
             **kwargs,
         )
@@ -1132,6 +1156,18 @@ class ResultSet:
         self.results = results
         self.has_more_rows = has_more_rows
+    def _convert_columnar_table(self, table):
+        column_names = [c[0] for c in self.description]
+        ResultRow = Row(*column_names)
+        result = []
+        for row_index in range(table.num_rows):
+            curr_row = []
+            for col_index in range(table.num_columns):
+                curr_row.append(table.get_item(col_index, row_index))
+            result.append(ResultRow(*curr_row))
+        return result
     def _convert_arrow_table(self, table):
         column_names = [c[0] for c in self.description]
         ResultRow = Row(*column_names)
@@ -1199,6 +1235,48 @@ class ResultSet:
         return results
+    def merge_columnar(self, result1, result2):
+        """
+        Function to merge / combining the columnar results into a single result
+        :param result1:
+        :param result2:
+        :return:
+        """
+        if result1.column_names != result2.column_names:
+            raise ValueError("The columns in the results don't match")
+        merged_result = [
+            result1.column_table[i] + result2.column_table[i]
+            for i in range(result1.num_columns)
+        ]
+        return ColumnTable(merged_result, result1.column_names)
+    def fetchmany_columnar(self, size: int):
+        """
+        Fetch the next set of rows of a query result, returning a Columnar Table.
+        An empty sequence is returned when no more rows are available.
+        """
+        if size < 0:
+            raise ValueError("size argument for fetchmany is %s but must be >= 0", size)
+        results = self.results.next_n_rows(size)
+        n_remaining_rows = size - results.num_rows
+        self._next_row_index += results.num_rows
+        while (
+            n_remaining_rows > 0
+            and not self.has_been_closed_server_side
+            and self.has_more_rows
+        ):
+            self._fill_results_buffer()
+            partial_results = self.results.next_n_rows(n_remaining_rows)
+            results = self.merge_columnar(results, partial_results)
+            n_remaining_rows -= partial_results.num_rows
+            self._next_row_index += partial_results.num_rows
+        return results
     def fetchall_arrow(self) -> "pyarrow.Table":
         """Fetch all (remaining) rows of a query result, returning them as a PyArrow table."""
         results = self.results.remaining_rows()
@@ -1212,12 +1290,30 @@ class ResultSet:
         return results
+    def fetchall_columnar(self):
+        """Fetch all (remaining) rows of a query result, returning them as a Columnar table."""
+        results = self.results.remaining_rows()
+        self._next_row_index += results.num_rows
+        while not self.has_been_closed_server_side and self.has_more_rows:
+            self._fill_results_buffer()
+            partial_results = self.results.remaining_rows()
+            results = self.merge_columnar(results, partial_results)
+            self._next_row_index += partial_results.num_rows
+        return results
     def fetchone(self) -> Optional[Row]:
         """
         Fetch the next row of a query result set, returning a single sequence,
         or None when no more data is available.
         """
-        res = self._convert_arrow_table(self.fetchmany_arrow(1))
+        if isinstance(self.results, ColumnQueue):
+            res = self._convert_columnar_table(self.fetchmany_columnar(1))
+        else:
+            res = self._convert_arrow_table(self.fetchmany_arrow(1))
         if len(res) > 0:
             return res[0]
         else:
@@ -1227,7 +1323,10 @@ class ResultSet:
         """
         Fetch all (remaining) rows of a query result, returning them as a list of rows.
         """
-        return self._convert_arrow_table(self.fetchall_arrow())
+        if isinstance(self.results, ColumnQueue):
+            return self._convert_columnar_table(self.fetchall_columnar())
+        else:
+            return self._convert_arrow_table(self.fetchall_arrow())
     def fetchmany(self, size: int) -> List[Row]:
         """
@@ -1235,7 +1334,10 @@ class ResultSet:
         An empty sequence is returned when no more rows are available.
         """
-        return self._convert_arrow_table(self.fetchmany_arrow(size))
+        if isinstance(self.results, ColumnQueue):
+            return self._convert_columnar_table(self.fetchmany_columnar(size))
+        else:
+            return self._convert_arrow_table(self.fetchmany_arrow(size))
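
The columnar path added above (`merge_columnar`, `_convert_columnar_table`) can be illustrated without the connector. The real `ColumnTable` lives in `databricks.sql.utils` and is not shown in this part of the diff, so the class below is a hypothetical stand-in that exposes only the attributes the hunks rely on (`column_table`, `column_names`, `num_columns`, `num_rows`, `get_item`). Rows are plain tuples here rather than `Row` objects.

```python
from typing import Any, List, Sequence, Tuple


class ColumnTable:
    """Hypothetical stand-in for databricks.sql.utils.ColumnTable."""

    def __init__(self, column_table: List[list], column_names: Sequence[str]):
        self.column_table = column_table  # one Python list per column
        self.column_names = list(column_names)

    @property
    def num_columns(self) -> int:
        return len(self.column_names)

    @property
    def num_rows(self) -> int:
        return len(self.column_table[0]) if self.column_table else 0

    def get_item(self, col_index: int, row_index: int) -> Any:
        return self.column_table[col_index][row_index]


def merge_columnar(first: ColumnTable, second: ColumnTable) -> ColumnTable:
    """Concatenate two column-oriented results column by column."""
    if first.column_names != second.column_names:
        raise ValueError("The columns in the results don't match")
    merged = [
        first.column_table[i] + second.column_table[i]
        for i in range(first.num_columns)
    ]
    return ColumnTable(merged, first.column_names)


def to_rows(table: ColumnTable) -> List[Tuple[Any, ...]]:
    """Pivot a column-oriented table into a list of row tuples."""
    return [
        tuple(table.get_item(c, r) for c in range(table.num_columns))
        for r in range(table.num_rows)
    ]


batch_1 = ColumnTable([[1, 2], ["a", "b"]], ["id", "name"])
batch_2 = ColumnTable([[3], ["c"]], ["id", "name"])
rows = to_rows(merge_columnar(batch_1, batch_2))
print(rows)  # [(1, 'a'), (2, 'b'), (3, 'c')]
```

One detail worth noting in the hunk itself: `raise ValueError("size argument for fetchmany is %s but must be >= 0", size)` passes `size` as a second exception argument, so the `%s` placeholder is not interpolated into the message.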
     def close(self) -> None:
         """

{databricks_sql_connector-4.0.0b2 → databricks_sql_connector-4.0.0b4}/src/databricks/sql/cloudfetch/download_manager.py RENAMED

@@ -1,6 +1,5 @@
 import logging
-from ssl import SSLContext
 from concurrent.futures import ThreadPoolExecutor, Future
 from typing import List, Union
@@ -9,6 +8,8 @@ from databricks.sql.cloudfetch.downloader import (
     DownloadableResultSettings,
     DownloadedFile,
 )
+from databricks.sql.types import SSLOptions
 from databricks.sql.thrift_api.TCLIService.ttypes import TSparkArrowResultLink
 logger = logging.getLogger(__name__)
@@ -20,7 +21,7 @@ class ResultFileDownloadManager:
         links: List[TSparkArrowResultLink],
         max_download_threads: int,
         lz4_compressed: bool,
-        ssl_context: SSLContext,
+        ssl_options: SSLOptions,
     ):
         self._pending_links: List[TSparkArrowResultLink] = []
         for link in links:
@@ -38,7 +39,7 @@ class ResultFileDownloadManager:
         self._thread_pool = ThreadPoolExecutor(max_workers=self._max_download_threads)
         self._downloadable_result_settings = DownloadableResultSettings(lz4_compressed)
-        self._ssl_context = ssl_context
+        self._ssl_options = ssl_options
     def get_next_downloaded_file(
         self, next_row_offset: int
@@ -95,7 +96,7 @@ class ResultFileDownloadManager:
             handler = ResultSetDownloadHandler(
                 settings=self._downloadable_result_settings,
                 link=link,
-                ssl_context=self._ssl_context,
+                ssl_options=self._ssl_options,
             )
             task = self._thread_pool.submit(handler.run)
             self._download_tasks.append(task)

{databricks_sql_connector-4.0.0b2 → databricks_sql_connector-4.0.0b4}/src/databricks/sql/cloudfetch/downloader.py RENAMED

@@ -3,13 +3,12 @@ from dataclasses import dataclass
 import requests
 from requests.adapters import HTTPAdapter, Retry
-from ssl import SSLContext, CERT_NONE
 import lz4.frame
 import time
 from databricks.sql.thrift_api.TCLIService.ttypes import TSparkArrowResultLink
 from databricks.sql.exc import Error
+from databricks.sql.types import SSLOptions
 logger = logging.getLogger(__name__)
@@ -66,11 +65,11 @@ class ResultSetDownloadHandler:
         self,
         settings: DownloadableResultSettings,
         link: TSparkArrowResultLink,
-        ssl_context: SSLContext,
+        ssl_options: SSLOptions,
     ):
         self.settings = settings
         self.link = link
-        self._ssl_context = ssl_context
+        self._ssl_options = ssl_options
     def run(self) -> DownloadedFile:
         """
@@ -95,14 +94,14 @@ class ResultSetDownloadHandler:
         session.mount("http://", HTTPAdapter(max_retries=retryPolicy))
         session.mount("https://", HTTPAdapter(max_retries=retryPolicy))
-        ssl_verify = self._ssl_context.verify_mode != CERT_NONE
         try:
             # Get the file via HTTP request
             response = session.get(
                 self.link.fileLink,
                 timeout=self.settings.download_timeout,
-                verify=ssl_verify,
+                verify=self._ssl_options.tls_verify,
+                headers=self.link.httpHeaders
+                # TODO: Pass cert from `self._ssl_options`
             )
             response.raise_for_status()
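
The hunk above passes only `verify=self._ssl_options.tls_verify` to `session.get` and leaves a TODO for the client certificate. As a sketch of what that TODO might involve, the helper below builds the `verify` and `cert` keyword arguments that `requests` accepts (`verify` may be a boolean or a CA bundle path; `cert` may be a path or a `(cert, key)` tuple). The function name and the decision to prefer the CA file over `True` are assumptions; this is not what the connector does in 4.0.0b4.

```python
from typing import Any, Dict, Optional, Union


def request_tls_kwargs(
    tls_verify: bool = True,
    trusted_ca_file: Optional[str] = None,
    client_cert_file: Optional[str] = None,
    client_cert_key_file: Optional[str] = None,
) -> Dict[str, Any]:
    """Translate TLS settings into keyword arguments for requests' get()."""
    verify: Union[bool, str] = tls_verify
    if tls_verify and trusted_ca_file:
        # requests treats a string as the path to a CA bundle.
        verify = trusted_ca_file
    kwargs: Dict[str, Any] = {"verify": verify}
    if client_cert_file:
        kwargs["cert"] = (
            (client_cert_file, client_cert_key_file)
            if client_cert_key_file
            else client_cert_file
        )
    return kwargs
```

A call site could then read `session.get(url, timeout=..., headers=..., **request_tls_kwargs(...))`. The key password carried by `SSLOptions` has no counterpart in this sketch.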

{databricks_sql_connector-4.0.0b2 → databricks_sql_connector-4.0.0b4}/src/databricks/sql/thrift_backend.py RENAMED

@@ -5,9 +5,12 @@ import math
 import time
 import uuid
 import threading
-from ssl import CERT_NONE, CERT_REQUIRED, create_default_context
 from typing import List, Union
+try:
+    import pyarrow
+except ImportError:
+    pyarrow = None
 import thrift.transport.THttpClient
 import thrift.protocol.TBinaryProtocol
 import thrift.transport.TSocket
@@ -35,11 +38,7 @@ from databricks.sql.utils import (
     convert_decimals_in_arrow_table,
     convert_column_based_set_to_arrow_table,
 )
-try:
-    import pyarrow
-except ImportError:
-    pyarrow = None
+from databricks.sql.types import SSLOptions
 logger = logging.getLogger(__name__)
@@ -89,6 +88,7 @@ class ThriftBackend:
         http_path: str,
         http_headers,
         auth_provider: AuthProvider,
+        ssl_options: SSLOptions,
         staging_allowed_local_path: Union[None, str, List[str]] = None,
         **kwargs,
     ):
@@ -97,16 +97,6 @@ class ThriftBackend:
         #   Tag to add to User-Agent header. For use by partners.
         # _username, _password
         #   Username and password Basic authentication (no official support)
-        # _tls_no_verify
-        #   Set to True (Boolean) to completely disable SSL verification.
-        # _tls_verify_hostname
-        #   Set to False (Boolean) to disable SSL hostname verification, but check certificate.
-        # _tls_trusted_ca_file
-        #   Set to the path of the file containing trusted CA certificates for server certificate
-        #   verification. If not provide, uses system truststore.
-        # _tls_client_cert_file, _tls_client_cert_key_file, _tls_client_cert_key_password
-        #   Set client SSL certificate.
-        #   See https://docs.python.org/3/library/ssl.html#ssl.SSLContext.load_cert_chain
         # _connection_uri
         #   Overrides server_hostname and http_path.
         # RETRY/ATTEMPT POLICY
@@ -166,29 +156,7 @@ class ThriftBackend:
         # Cloud fetch
         self.max_download_threads = kwargs.get("max_download_threads", 10)
-        # Configure tls context
-        ssl_context = create_default_context(cafile=kwargs.get("_tls_trusted_ca_file"))
-        if kwargs.get("_tls_no_verify") is True:
-            ssl_context.check_hostname = False
-            ssl_context.verify_mode = CERT_NONE
-        elif kwargs.get("_tls_verify_hostname") is False:
-            ssl_context.check_hostname = False
-            ssl_context.verify_mode = CERT_REQUIRED
-        else:
-            ssl_context.check_hostname = True
-            ssl_context.verify_mode = CERT_REQUIRED
-        tls_client_cert_file = kwargs.get("_tls_client_cert_file")
-        tls_client_cert_key_file = kwargs.get("_tls_client_cert_key_file")
-        tls_client_cert_key_password = kwargs.get("_tls_client_cert_key_password")
-        if tls_client_cert_file:
-            ssl_context.load_cert_chain(
-                certfile=tls_client_cert_file,
-                keyfile=tls_client_cert_key_file,
-                password=tls_client_cert_key_password,
-            )
-        self._ssl_context = ssl_context
+        self._ssl_options = ssl_options
         self._auth_provider = auth_provider
@@ -229,7 +197,7 @@ class ThriftBackend:
         self._transport = databricks.sql.auth.thrift_http_client.THttpClient(
             auth_provider=self._auth_provider,
             uri_or_host=uri,
-            ssl_context=self._ssl_context,
+            ssl_options=self._ssl_options,
             **additional_transport_args,  # type: ignore
         )
@@ -656,12 +624,6 @@ class ThriftBackend:
     @staticmethod
     def _hive_schema_to_arrow_schema(t_table_schema):
-        if pyarrow is None:
-            raise ImportError(
-                "pyarrow is required to convert Hive schema to Arrow schema"
-            )
         def map_type(t_type_entry):
             if t_type_entry.primitiveEntry:
                 return {
@@ -767,12 +729,17 @@ class ThriftBackend:
         description = self._hive_schema_to_description(
             t_result_set_metadata_resp.schema
         )
-        schema_bytes = (
-            t_result_set_metadata_resp.arrowSchema
-            or self._hive_schema_to_arrow_schema(t_result_set_metadata_resp.schema)
-            .serialize()
-            .to_pybytes()
-        )
+        if pyarrow:
+            schema_bytes = (
+                t_result_set_metadata_resp.arrowSchema
+                or self._hive_schema_to_arrow_schema(t_result_set_metadata_resp.schema)
+                .serialize()
+                .to_pybytes()
+            )
+        else:
+            schema_bytes = None
         lz4_compressed = t_result_set_metadata_resp.lz4Compressed
         is_staging_operation = t_result_set_metadata_resp.isStagingOperation
         if direct_results and direct_results.resultSet:
@@ -786,7 +753,7 @@ class ThriftBackend:
                 max_download_threads=self.max_download_threads,
                 lz4_compressed=lz4_compressed,
                 description=description,
-                ssl_context=self._ssl_context,
+                ssl_options=self._ssl_options,
             )
         else:
             arrow_queue_opt = None
@@ -1018,7 +985,7 @@ class ThriftBackend:
             max_download_threads=self.max_download_threads,
             lz4_compressed=lz4_compressed,
             description=description,
-            ssl_context=self._ssl_context,
+            ssl_options=self._ssl_options,
         )
         return queue, resp.hasMoreRows
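
The `try: import pyarrow` guard and the `if pyarrow:` branch around `schema_bytes` follow a common optional-dependency pattern: import once, keep `None` as a sentinel, and branch wherever the dependency is needed. A self-contained sketch of that pattern follows; the helper names and the all-string schema are illustrative assumptions.

```python
import importlib
from types import ModuleType
from typing import List, Optional


def optional_import(name: str) -> Optional[ModuleType]:
    """Import a module by name, returning None when it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


pyarrow = optional_import("pyarrow")


def serialize_string_schema(column_names: List[str]) -> Optional[bytes]:
    """Return serialized Arrow schema bytes, or None when pyarrow is unavailable."""
    if pyarrow is None:
        return None
    schema = pyarrow.schema([(name, pyarrow.string()) for name in column_names])
    return schema.serialize().to_pybytes()
```

Callers then treat `None` as "no Arrow schema available", which corresponds to the `schema_bytes = None` branch in the hunk.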

{databricks_sql_connector-4.0.0b2 → databricks_sql_connector-4.0.0b4}/src/databricks/sql/types.py RENAMED

@@ -19,6 +19,54 @@
 from typing import Any, Dict, List, Optional, Tuple, Union, TypeVar
 import datetime
 import decimal
+from ssl import SSLContext, CERT_NONE, CERT_REQUIRED, create_default_context
+class SSLOptions:
+    tls_verify: bool
+    tls_verify_hostname: bool
+    tls_trusted_ca_file: Optional[str]
+    tls_client_cert_file: Optional[str]
+    tls_client_cert_key_file: Optional[str]
+    tls_client_cert_key_password: Optional[str]
+    def __init__(
+        self,
+        tls_verify: bool = True,
+        tls_verify_hostname: bool = True,
+        tls_trusted_ca_file: Optional[str] = None,
+        tls_client_cert_file: Optional[str] = None,
+        tls_client_cert_key_file: Optional[str] = None,
+        tls_client_cert_key_password: Optional[str] = None,
+    ):
+        self.tls_verify = tls_verify
+        self.tls_verify_hostname = tls_verify_hostname
+        self.tls_trusted_ca_file = tls_trusted_ca_file
+        self.tls_client_cert_file = tls_client_cert_file
+        self.tls_client_cert_key_file = tls_client_cert_key_file
+        self.tls_client_cert_key_password = tls_client_cert_key_password
+    def create_ssl_context(self) -> SSLContext:
+        ssl_context = create_default_context(cafile=self.tls_trusted_ca_file)
+        if self.tls_verify is False:
+            ssl_context.check_hostname = False
+            ssl_context.verify_mode = CERT_NONE
+        elif self.tls_verify_hostname is False:
+            ssl_context.check_hostname = False
+            ssl_context.verify_mode = CERT_REQUIRED
+        else:
+            ssl_context.check_hostname = True
+            ssl_context.verify_mode = CERT_REQUIRED
+        if self.tls_client_cert_file:
+            ssl_context.load_cert_chain(
+                certfile=self.tls_client_cert_file,
+                keyfile=self.tls_client_cert_key_file,
+                password=self.tls_client_cert_key_password,
+            )
+        return ssl_context
 class Row(tuple):
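
The three branches of `create_ssl_context` above map two booleans onto the standard library's `check_hostname` and `verify_mode` settings. The sketch below reproduces that mapping with the `ssl` module only, so it can be run without the connector installed; `build_context` is a hypothetical helper, not part of the package, and client certificate loading is left out.

```python
import ssl
from typing import Optional


def build_context(
    verify: bool = True,
    verify_hostname: bool = True,
    ca_file: Optional[str] = None,
) -> ssl.SSLContext:
    """Mirror the verification branches shown in the diff (illustrative only)."""
    ctx = ssl.create_default_context(cafile=ca_file)
    if not verify:
        # check_hostname must be disabled first; the ssl module raises ValueError
        # if verify_mode is set to CERT_NONE while check_hostname is still True.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    elif not verify_hostname:
        # Validate the certificate chain but skip the hostname match.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_REQUIRED
    else:
        ctx.check_hostname = True
        ctx.verify_mode = ssl.CERT_REQUIRED
    return ctx


no_verify = build_context(verify=False)
chain_only = build_context(verify=True, verify_hostname=False)
strict = build_context()
```

Note that `verify=False` takes priority over `verify_hostname`, as in the diff: once certificate verification is off, hostname checking is off as well.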

{databricks_sql_connector-4.0.0b2 → databricks_sql_connector-4.0.0b4}/src/databricks/sql/utils.py RENAMED

@@ -1,5 +1,6 @@
 from __future__ import annotations
+import pytz
 import datetime
 import decimal
 from abc import ABC, abstractmethod
@@ -9,10 +10,14 @@ from decimal import Decimal
 from enum import Enum
 from typing import Any, Dict, List, Optional, Union
 import re
-from ssl import SSLContext
 import lz4.frame
+try:
+    import pyarrow
+except ImportError:
+    pyarrow = None
 from databricks.sql import OperationalError, exc
 from databricks.sql.cloudfetch.download_manager import ResultFileDownloadManager
 from databricks.sql.thrift_api.TCLIService.ttypes import (
@@ -20,17 +25,14 @@ from databricks.sql.thrift_api.TCLIService.ttypes import (
     TSparkArrowResultLink,
     TSparkRowSetType,
 )
+from databricks.sql.types import SSLOptions
 from databricks.sql.parameters.native import ParameterStructure, TDbsqlParameter
-BIT_MASKS = [1, 2, 4, 8, 16, 32, 64, 128]
 import logging
-try:
-    import pyarrow
-except ImportError:
-    pyarrow = None
+BIT_MASKS = [1, 2, 4, 8, 16, 32, 64, 128]
+DEFAULT_ERROR_CONTEXT = "Unknown error"
 logger = logging.getLogger(__name__)
@@ -52,7 +54,7 @@ class ResultSetQueueFactory(ABC):
         t_row_set: TRowSet,
         arrow_schema_bytes: bytes,
         max_download_threads: int,
-        ssl_context: SSLContext,
+        ssl_options: SSLOptions,
         lz4_compressed: bool = True,
         description: Optional[List[List[Any]]] = None,
     ) -> ResultSetQueue:
@@ -66,7 +68,7 @@ class ResultSetQueueFactory(ABC):
             lz4_compressed (bool): Whether result data has been lz4 compressed.
             description (List[List[Any]]): Hive table schema description.
             max_download_threads (int): Maximum number of downloader thread pool threads.
-            ssl_context (SSLContext): SSLContext object for CloudFetchQueue
+            ssl_options (SSLOptions): SSLOptions object for CloudFetchQueue
         Returns:
             ResultSetQueue
@@ -80,13 +82,15 @@ class ResultSetQueueFactory(ABC):
             )
             return ArrowQueue(converted_arrow_table, n_valid_rows)
         elif row_set_type == TSparkRowSetType.COLUMN_BASED_SET:
-            arrow_table, n_valid_rows = convert_column_based_set_to_arrow_table(
+            column_table, column_names = convert_column_based_set_to_column_table(
                 t_row_set.columns, description
             )
-            converted_arrow_table = convert_decimals_in_arrow_table(
-                arrow_table, description
+            converted_column_table = convert_to_assigned_datatypes_in_column_table(
+                column_table, description
             )
-            return ArrowQueue(converted_arrow_table, n_valid_rows)
+            return ColumnQueue(ColumnTable(converted_column_table, column_names))
         elif row_set_type == TSparkRowSetType.URL_BASED_SET:
             return CloudFetchQueue(
                 schema_bytes=arrow_schema_bytes,
@@ -95,12 +99,65 @@ class ResultSetQueueFactory(ABC):
                 lz4_compressed=lz4_compressed,
                 description=description,
                 max_download_threads=max_download_threads,
-                ssl_context=ssl_context,
+                ssl_options=ssl_options,
             )
         else:
             raise AssertionError("Row set type is not valid")
+class ColumnTable:
+    def __init__(self, column_table, column_names):
+        self.column_table = column_table
+        self.column_names = column_names
+    @property
+    def num_rows(self):
+        if len(self.column_table) == 0:
+            return 0
+        else:
+            return len(self.column_table[0])
+    @property
+    def num_columns(self):
+        return len(self.column_names)
+    def get_item(self, col_index, row_index):
+        return self.column_table[col_index][row_index]
+    def slice(self, curr_index, length):
+        sliced_column_table = [
+            column[curr_index : curr_index + length] for column in self.column_table
+        ]
+        return ColumnTable(sliced_column_table, self.column_names)
+    def __eq__(self, other):
+        return (
+            self.column_table == other.column_table
+            and self.column_names == other.column_names
+        )
+class ColumnQueue(ResultSetQueue):
+    def __init__(self, column_table: ColumnTable):
+        self.column_table = column_table
+        self.cur_row_index = 0
+        self.n_valid_rows = column_table.num_rows
+    def next_n_rows(self, num_rows):
+        length = min(num_rows, self.n_valid_rows - self.cur_row_index)
+        slice = self.column_table.slice(self.cur_row_index, length)
+        self.cur_row_index += slice.num_rows
+        return slice
+    def remaining_rows(self):
+        slice = self.column_table.slice(
+            self.cur_row_index, self.n_valid_rows - self.cur_row_index
+        )
+        self.cur_row_index += slice.num_rows
+        return slice
 class ArrowQueue(ResultSetQueue):
     def __init__(
         self,
@@ -141,7 +198,7 @@ class CloudFetchQueue(ResultSetQueue):
         self,
         schema_bytes,
         max_download_threads: int,
-        ssl_context: SSLContext,
+        ssl_options: SSLOptions,
         start_row_offset: int = 0,
         result_links: Optional[List[TSparkArrowResultLink]] = None,
         lz4_compressed: bool = True,
@@ -164,7 +221,7 @@ class CloudFetchQueue(ResultSetQueue):
         self.result_links = result_links
         self.lz4_compressed = lz4_compressed
         self.description = description
-        self._ssl_context = ssl_context
+        self._ssl_options = ssl_options
         logger.debug(
             "Initialize CloudFetch loader, row set start offset: {}, file list:".format(
@@ -182,7 +239,7 @@ class CloudFetchQueue(ResultSetQueue):
             links=result_links or [],
             max_download_threads=self.max_download_threads,
             lz4_compressed=self.lz4_compressed,
-            ssl_context=self._ssl_context,
+            ssl_options=self._ssl_options,
         )
         self.table = self._create_next_table()
@@ -361,7 +418,12 @@ class RequestErrorInfo(
             user_friendly_error_message = "{}: {}".format(
                 user_friendly_error_message, self.error_message
             )
-        return user_friendly_error_message
+        try:
+            error_context = str(self.error)
+        except:
+            error_context = DEFAULT_ERROR_CONTEXT
+        return user_friendly_error_message + ". " + error_context
 # Taken from PyHive
@@ -566,6 +628,37 @@ def convert_decimals_in_arrow_table(table, description) -> "pyarrow.Table":
     return table
+def convert_to_assigned_datatypes_in_column_table(column_table, description):
+    converted_column_table = []
+    for i, col in enumerate(column_table):
+        if description[i][1] == "decimal":
+            converted_column_table.append(
+                tuple(v if v is None else Decimal(v) for v in col)
+            )
+        elif description[i][1] == "date":
+            converted_column_table.append(
+                tuple(v if v is None else datetime.date.fromisoformat(v) for v in col)
+            )
+        elif description[i][1] == "timestamp":
+            converted_column_table.append(
+                tuple(
+                    (
+                        v
+                        if v is None
+                        else datetime.datetime.strptime(
+                            v, "%Y-%m-%d %H:%M:%S.%f"
+                        ).replace(tzinfo=pytz.UTC)
+                    )
+                    for v in col
+                )
+            )
+        else:
+            converted_column_table.append(col)
+    return converted_column_table
 def convert_column_based_set_to_arrow_table(columns, description):
     arrow_table = pyarrow.Table.from_arrays(
         [_convert_column_to_arrow_array(c) for c in columns],
@@ -577,6 +670,13 @@ def convert_column_based_set_to_arrow_table(columns, description):
     return arrow_table, arrow_table.num_rows
+def convert_column_based_set_to_column_table(columns, description):
+    column_names = [c[0] for c in description]
+    column_table = [_convert_column_to_list(c) for c in columns]
+    return column_table, column_names
 def _convert_column_to_arrow_array(t_col):
     """
     Return a pyarrow array from the values in a TColumn instance.
@@ -601,6 +701,26 @@ def _convert_column_to_arrow_array(t_col):
     raise OperationalError("Empty TColumn instance {}".format(t_col))
+def _convert_column_to_list(t_col):
+    SUPPORTED_FIELD_TYPES = (
+        "boolVal",
+        "byteVal",
+        "i16Val",
+        "i32Val",
+        "i64Val",
+        "doubleVal",
+        "stringVal",
+        "binaryVal",
+    )
+    for field in SUPPORTED_FIELD_TYPES:
+        wrapper = getattr(t_col, field)
+        if wrapper:
+            return _create_python_tuple(wrapper)
+    raise OperationalError("Empty TColumn instance {}".format(t_col))
 def _create_arrow_array(t_col_value_wrapper, arrow_type):
     result = t_col_value_wrapper.values
     nulls = t_col_value_wrapper.nulls  # bitfield describing which values are null
@@ -615,3 +735,19 @@ def _create_arrow_array(t_col_value_wrapper, arrow_type):
             result[i] = None
     return pyarrow.array(result, type=arrow_type)
+def _create_python_tuple(t_col_value_wrapper):
+    result = t_col_value_wrapper.values
+    nulls = t_col_value_wrapper.nulls  # bitfield describing which values are null
+    assert isinstance(nulls, bytes)
+    # The number of bits in nulls can be both larger or smaller than the number of
+    # elements in result, so take the minimum of both to iterate over.
+    length = min(len(result), len(nulls) * 8)
+    for i in range(length):
+        if nulls[i >> 3] & BIT_MASKS[i & 0x7]:
+            result[i] = None
+    return tuple(result)
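
`_create_python_tuple` above reads the `nulls` bitfield least significant bit first: value `i` is null when bit `i & 7` of byte `i >> 3` is set. A standalone sketch of the same decoding, using plain lists and `bytes` in place of the Thrift wrapper; `apply_nulls` is a hypothetical name, and unlike the original it copies the input instead of mutating it in place.

```python
from typing import Any, Sequence, Tuple

BIT_MASKS = [1, 2, 4, 8, 16, 32, 64, 128]


def apply_nulls(values: Sequence[Any], nulls: bytes) -> Tuple[Any, ...]:
    """Replace entries flagged in the nulls bitfield with None."""
    result = list(values)  # copy so the caller's data is left untouched
    # The bitfield may cover more or fewer positions than there are values,
    # so only iterate over the positions both sides describe.
    length = min(len(result), len(nulls) * 8)
    for i in range(length):
        if nulls[i >> 3] & BIT_MASKS[i & 0x7]:
            result[i] = None
    return tuple(result)


# 0b00000101 flags positions 0 and 2 as null.
decoded = apply_nulls([10, 20, 30], bytes([0b00000101]))
```

An empty bitfield leaves every value as is, and positions beyond the end of the bitfield are treated as non-null.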

databricks_sql_connector-4.0.0b2/src/databricks/sqlalchemy/__init__.py DELETED

@@ -1,6 +0,0 @@
-try:
-    from databricks_sqlalchemy import *
-except:
-    import warnings
-    warnings.warn("Install databricks-sqlalchemy plugin before using this")