sqlframe 1.14.0__tar.gz → 2.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (227)
  1. {sqlframe-1.14.0 → sqlframe-2.1.0}/Makefile +3 -0
  2. {sqlframe-1.14.0 → sqlframe-2.1.0}/PKG-INFO +6 -2
  3. {sqlframe-1.14.0 → sqlframe-2.1.0}/README.md +5 -1
  4. {sqlframe-1.14.0 → sqlframe-2.1.0}/docs/configuration.md +30 -2
  5. {sqlframe-1.14.0 → sqlframe-2.1.0}/docs/duckdb.md +3 -0
  6. {sqlframe-1.14.0 → sqlframe-2.1.0}/docs/postgres.md +2 -0
  7. {sqlframe-1.14.0 → sqlframe-2.1.0}/docs/snowflake.md +3 -1
  8. {sqlframe-1.14.0 → sqlframe-2.1.0}/setup.py +8 -8
  9. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/_version.py +2 -2
  10. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/base/_typing.py +1 -0
  11. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/base/catalog.py +36 -13
  12. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/base/column.py +11 -9
  13. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/base/dataframe.py +72 -79
  14. sqlframe-2.1.0/sqlframe/base/decorators.py +15 -0
  15. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/base/function_alternatives.py +36 -25
  16. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/base/functions.py +88 -28
  17. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/base/mixins/catalog_mixins.py +156 -45
  18. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/base/mixins/dataframe_mixins.py +1 -1
  19. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/base/readerwriter.py +12 -14
  20. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/base/session.py +157 -84
  21. sqlframe-2.1.0/sqlframe/base/udf.py +36 -0
  22. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/base/util.py +71 -20
  23. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/bigquery/catalog.py +79 -28
  24. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/bigquery/functions.py +5 -8
  25. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/bigquery/session.py +4 -2
  26. sqlframe-2.1.0/sqlframe/bigquery/udf.py +11 -0
  27. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/duckdb/catalog.py +30 -13
  28. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/duckdb/dataframe.py +5 -0
  29. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/duckdb/functions.py +2 -0
  30. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/duckdb/readwriter.py +7 -6
  31. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/duckdb/session.py +8 -2
  32. sqlframe-2.1.0/sqlframe/duckdb/udf.py +19 -0
  33. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/postgres/catalog.py +30 -18
  34. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/postgres/functions.py +2 -0
  35. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/postgres/session.py +16 -5
  36. sqlframe-2.1.0/sqlframe/postgres/udf.py +11 -0
  37. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/redshift/catalog.py +28 -13
  38. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/redshift/session.py +4 -2
  39. sqlframe-2.1.0/sqlframe/redshift/udf.py +11 -0
  40. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/snowflake/catalog.py +64 -24
  41. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/snowflake/dataframe.py +9 -5
  42. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/snowflake/functions.py +1 -0
  43. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/snowflake/session.py +5 -2
  44. sqlframe-2.1.0/sqlframe/snowflake/udf.py +11 -0
  45. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/spark/catalog.py +180 -10
  46. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/spark/session.py +46 -14
  47. sqlframe-2.1.0/sqlframe/spark/udf.py +34 -0
  48. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/standalone/session.py +3 -0
  49. sqlframe-2.1.0/sqlframe/standalone/udf.py +11 -0
  50. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe.egg-info/PKG-INFO +6 -2
  51. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe.egg-info/SOURCES.txt +9 -0
  52. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe.egg-info/requires.txt +8 -8
  53. {sqlframe-1.14.0 → sqlframe-2.1.0}/tests/integration/engines/bigquery/test_bigquery_catalog.py +2 -2
  54. {sqlframe-1.14.0 → sqlframe-2.1.0}/tests/integration/engines/bigquery/test_bigquery_dataframe.py +15 -15
  55. {sqlframe-1.14.0 → sqlframe-2.1.0}/tests/integration/engines/bigquery/test_bigquery_session.py +1 -1
  56. {sqlframe-1.14.0 → sqlframe-2.1.0}/tests/integration/engines/duck/test_duckdb_catalog.py +12 -11
  57. {sqlframe-1.14.0 → sqlframe-2.1.0}/tests/integration/engines/duck/test_duckdb_dataframe.py +50 -14
  58. {sqlframe-1.14.0 → sqlframe-2.1.0}/tests/integration/engines/duck/test_duckdb_session.py +1 -1
  59. sqlframe-2.1.0/tests/integration/engines/duck/test_duckdb_udf.py +12 -0
  60. {sqlframe-1.14.0 → sqlframe-2.1.0}/tests/integration/engines/postgres/test_postgres_catalog.py +4 -4
  61. {sqlframe-1.14.0 → sqlframe-2.1.0}/tests/integration/engines/postgres/test_postgres_dataframe.py +7 -7
  62. {sqlframe-1.14.0 → sqlframe-2.1.0}/tests/integration/engines/postgres/test_postgres_session.py +2 -2
  63. {sqlframe-1.14.0 → sqlframe-2.1.0}/tests/integration/engines/redshift/test_redshift_catalog.py +4 -4
  64. {sqlframe-1.14.0 → sqlframe-2.1.0}/tests/integration/engines/redshift/test_redshift_session.py +2 -2
  65. {sqlframe-1.14.0 → sqlframe-2.1.0}/tests/integration/engines/snowflake/test_snowflake_catalog.py +50 -50
  66. {sqlframe-1.14.0 → sqlframe-2.1.0}/tests/integration/engines/snowflake/test_snowflake_dataframe.py +44 -44
  67. {sqlframe-1.14.0 → sqlframe-2.1.0}/tests/integration/engines/snowflake/test_snowflake_session.py +2 -2
  68. {sqlframe-1.14.0 → sqlframe-2.1.0}/tests/integration/engines/spark/test_spark_catalog.py +4 -4
  69. {sqlframe-1.14.0 → sqlframe-2.1.0}/tests/integration/engines/test_engine_dataframe.py +8 -33
  70. sqlframe-2.1.0/tests/integration/engines/test_engine_session.py +41 -0
  71. {sqlframe-1.14.0 → sqlframe-2.1.0}/tests/integration/engines/test_int_functions.py +77 -42
  72. {sqlframe-1.14.0 → sqlframe-2.1.0}/tests/integration/test_int_dataframe.py +68 -0
  73. {sqlframe-1.14.0 → sqlframe-2.1.0}/tests/unit/standalone/fixtures.py +5 -1
  74. {sqlframe-1.14.0 → sqlframe-2.1.0}/tests/unit/standalone/test_dataframe.py +9 -9
  75. sqlframe-2.1.0/tests/unit/standalone/test_dataframe_writer.py +107 -0
  76. {sqlframe-1.14.0 → sqlframe-2.1.0}/tests/unit/standalone/test_functions.py +14 -5
  77. {sqlframe-1.14.0 → sqlframe-2.1.0}/tests/unit/standalone/test_session.py +1 -1
  78. {sqlframe-1.14.0 → sqlframe-2.1.0}/tests/unit/test_util.py +1 -0
  79. sqlframe-1.14.0/sqlframe/base/decorators.py +0 -53
  80. sqlframe-1.14.0/tests/integration/engines/test_engine_session.py +0 -47
  81. sqlframe-1.14.0/tests/unit/standalone/test_dataframe_writer.py +0 -107
  82. {sqlframe-1.14.0 → sqlframe-2.1.0}/.github/CODEOWNERS +0 -0
  83. {sqlframe-1.14.0 → sqlframe-2.1.0}/.github/workflows/main.workflow.yaml +0 -0
  84. {sqlframe-1.14.0 → sqlframe-2.1.0}/.github/workflows/publish.workflow.yaml +0 -0
  85. {sqlframe-1.14.0 → sqlframe-2.1.0}/.gitignore +0 -0
  86. {sqlframe-1.14.0 → sqlframe-2.1.0}/.pre-commit-config.yaml +0 -0
  87. {sqlframe-1.14.0 → sqlframe-2.1.0}/.readthedocs.yaml +0 -0
  88. {sqlframe-1.14.0 → sqlframe-2.1.0}/LICENSE +0 -0
  89. {sqlframe-1.14.0 → sqlframe-2.1.0}/blogs/add_chatgpt_support.md +0 -0
  90. {sqlframe-1.14.0 → sqlframe-2.1.0}/blogs/images/add_chatgpt_support/adding_ai_to_meal.jpeg +0 -0
  91. {sqlframe-1.14.0 → sqlframe-2.1.0}/blogs/images/add_chatgpt_support/hype_train.gif +0 -0
  92. {sqlframe-1.14.0 → sqlframe-2.1.0}/blogs/images/add_chatgpt_support/marvin_paranoid_robot.gif +0 -0
  93. {sqlframe-1.14.0 → sqlframe-2.1.0}/blogs/images/add_chatgpt_support/nonsense_sql.png +0 -0
  94. {sqlframe-1.14.0 → sqlframe-2.1.0}/blogs/images/add_chatgpt_support/openai_full_rewrite.png +0 -0
  95. {sqlframe-1.14.0 → sqlframe-2.1.0}/blogs/images/add_chatgpt_support/openai_replacing_cte_names.png +0 -0
  96. {sqlframe-1.14.0 → sqlframe-2.1.0}/blogs/images/add_chatgpt_support/sqlglot_optimized_code.png +0 -0
  97. {sqlframe-1.14.0 → sqlframe-2.1.0}/blogs/images/add_chatgpt_support/sunny_shake_head_no.gif +0 -0
  98. {sqlframe-1.14.0 → sqlframe-2.1.0}/blogs/images/but_wait_theres_more.gif +0 -0
  99. {sqlframe-1.14.0 → sqlframe-2.1.0}/blogs/images/cake.gif +0 -0
  100. {sqlframe-1.14.0 → sqlframe-2.1.0}/blogs/images/you_get_pyspark_api.gif +0 -0
  101. {sqlframe-1.14.0 → sqlframe-2.1.0}/blogs/sqlframe_universal_dataframe_api.md +0 -0
  102. {sqlframe-1.14.0 → sqlframe-2.1.0}/docs/bigquery.md +0 -0
  103. {sqlframe-1.14.0 → sqlframe-2.1.0}/docs/docs/bigquery.md +0 -0
  104. {sqlframe-1.14.0 → sqlframe-2.1.0}/docs/docs/duckdb.md +0 -0
  105. {sqlframe-1.14.0 → sqlframe-2.1.0}/docs/docs/images/SF.png +0 -0
  106. {sqlframe-1.14.0 → sqlframe-2.1.0}/docs/docs/images/favicon.png +0 -0
  107. {sqlframe-1.14.0 → sqlframe-2.1.0}/docs/docs/images/favicon_old.png +0 -0
  108. {sqlframe-1.14.0 → sqlframe-2.1.0}/docs/docs/images/sqlframe_diagram.png +0 -0
  109. {sqlframe-1.14.0 → sqlframe-2.1.0}/docs/docs/images/sqlframe_logo.png +0 -0
  110. {sqlframe-1.14.0 → sqlframe-2.1.0}/docs/docs/postgres.md +0 -0
  111. {sqlframe-1.14.0 → sqlframe-2.1.0}/docs/images/SF.png +0 -0
  112. {sqlframe-1.14.0 → sqlframe-2.1.0}/docs/images/favicon.png +0 -0
  113. {sqlframe-1.14.0 → sqlframe-2.1.0}/docs/images/favicon_old.png +0 -0
  114. {sqlframe-1.14.0 → sqlframe-2.1.0}/docs/images/sqlframe_diagram.png +0 -0
  115. {sqlframe-1.14.0 → sqlframe-2.1.0}/docs/images/sqlframe_logo.png +0 -0
  116. {sqlframe-1.14.0 → sqlframe-2.1.0}/docs/index.md +0 -0
  117. {sqlframe-1.14.0 → sqlframe-2.1.0}/docs/requirements.txt +0 -0
  118. {sqlframe-1.14.0 → sqlframe-2.1.0}/docs/spark.md +0 -0
  119. {sqlframe-1.14.0 → sqlframe-2.1.0}/docs/standalone.md +0 -0
  120. {sqlframe-1.14.0 → sqlframe-2.1.0}/docs/stylesheets/extra.css +0 -0
  121. {sqlframe-1.14.0 → sqlframe-2.1.0}/mkdocs.yml +0 -0
  122. {sqlframe-1.14.0 → sqlframe-2.1.0}/pytest.ini +0 -0
  123. {sqlframe-1.14.0 → sqlframe-2.1.0}/renovate.json +0 -0
  124. {sqlframe-1.14.0 → sqlframe-2.1.0}/setup.cfg +0 -0
  125. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/LICENSE +0 -0
  126. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/__init__.py +0 -0
  127. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/base/__init__.py +0 -0
  128. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/base/exceptions.py +0 -0
  129. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/base/group.py +0 -0
  130. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/base/mixins/__init__.py +0 -0
  131. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/base/mixins/readwriter_mixins.py +0 -0
  132. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/base/normalize.py +0 -0
  133. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/base/operations.py +0 -0
  134. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/base/transforms.py +0 -0
  135. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/base/types.py +0 -0
  136. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/base/window.py +0 -0
  137. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/bigquery/__init__.py +0 -0
  138. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/bigquery/column.py +0 -0
  139. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/bigquery/dataframe.py +0 -0
  140. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/bigquery/functions.pyi +0 -0
  141. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/bigquery/group.py +0 -0
  142. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/bigquery/readwriter.py +0 -0
  143. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/bigquery/types.py +0 -0
  144. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/bigquery/window.py +0 -0
  145. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/duckdb/__init__.py +0 -0
  146. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/duckdb/column.py +0 -0
  147. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/duckdb/functions.pyi +0 -0
  148. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/duckdb/group.py +0 -0
  149. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/duckdb/types.py +0 -0
  150. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/duckdb/window.py +0 -0
  151. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/postgres/__init__.py +0 -0
  152. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/postgres/column.py +0 -0
  153. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/postgres/dataframe.py +0 -0
  154. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/postgres/functions.pyi +0 -0
  155. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/postgres/group.py +0 -0
  156. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/postgres/readwriter.py +0 -0
  157. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/postgres/types.py +0 -0
  158. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/postgres/window.py +0 -0
  159. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/redshift/__init__.py +0 -0
  160. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/redshift/column.py +0 -0
  161. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/redshift/dataframe.py +0 -0
  162. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/redshift/functions.py +0 -0
  163. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/redshift/group.py +0 -0
  164. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/redshift/readwriter.py +0 -0
  165. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/redshift/types.py +0 -0
  166. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/redshift/window.py +0 -0
  167. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/snowflake/__init__.py +0 -0
  168. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/snowflake/column.py +0 -0
  169. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/snowflake/functions.pyi +0 -0
  170. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/snowflake/group.py +0 -0
  171. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/snowflake/readwriter.py +0 -0
  172. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/snowflake/types.py +0 -0
  173. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/snowflake/window.py +0 -0
  174. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/spark/__init__.py +0 -0
  175. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/spark/column.py +0 -0
  176. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/spark/dataframe.py +0 -0
  177. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/spark/functions.py +0 -0
  178. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/spark/functions.pyi +0 -0
  179. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/spark/group.py +0 -0
  180. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/spark/readwriter.py +0 -0
  181. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/spark/types.py +0 -0
  182. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/spark/window.py +0 -0
  183. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/standalone/__init__.py +0 -0
  184. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/standalone/catalog.py +0 -0
  185. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/standalone/column.py +0 -0
  186. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/standalone/dataframe.py +0 -0
  187. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/standalone/functions.py +0 -0
  188. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/standalone/group.py +0 -0
  189. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/standalone/readwriter.py +0 -0
  190. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/standalone/types.py +0 -0
  191. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/standalone/window.py +0 -0
  192. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/testing/__init__.py +0 -0
  193. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe/testing/utils.py +0 -0
  194. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe.egg-info/dependency_links.txt +0 -0
  195. {sqlframe-1.14.0 → sqlframe-2.1.0}/sqlframe.egg-info/top_level.txt +0 -0
  196. {sqlframe-1.14.0 → sqlframe-2.1.0}/tests/__init__.py +0 -0
  197. {sqlframe-1.14.0 → sqlframe-2.1.0}/tests/common_fixtures.py +0 -0
  198. {sqlframe-1.14.0 → sqlframe-2.1.0}/tests/conftest.py +0 -0
  199. {sqlframe-1.14.0 → sqlframe-2.1.0}/tests/fixtures/employee.csv +0 -0
  200. {sqlframe-1.14.0 → sqlframe-2.1.0}/tests/fixtures/employee.json +0 -0
  201. {sqlframe-1.14.0 → sqlframe-2.1.0}/tests/fixtures/employee.parquet +0 -0
  202. {sqlframe-1.14.0 → sqlframe-2.1.0}/tests/fixtures/employee_extra_line.csv +0 -0
  203. {sqlframe-1.14.0 → sqlframe-2.1.0}/tests/integration/__init__.py +0 -0
  204. {sqlframe-1.14.0 → sqlframe-2.1.0}/tests/integration/engines/__init__.py +0 -0
  205. {sqlframe-1.14.0 → sqlframe-2.1.0}/tests/integration/engines/bigquery/__init__.py +0 -0
  206. {sqlframe-1.14.0 → sqlframe-2.1.0}/tests/integration/engines/duck/__init__.py +0 -0
  207. {sqlframe-1.14.0 → sqlframe-2.1.0}/tests/integration/engines/duck/test_duckdb_reader.py +0 -0
  208. {sqlframe-1.14.0 → sqlframe-2.1.0}/tests/integration/engines/postgres/__init__.py +0 -0
  209. {sqlframe-1.14.0 → sqlframe-2.1.0}/tests/integration/engines/redshift/__init__.py +0 -0
  210. {sqlframe-1.14.0 → sqlframe-2.1.0}/tests/integration/engines/snowflake/__init__.py +0 -0
  211. {sqlframe-1.14.0 → sqlframe-2.1.0}/tests/integration/engines/spark/__init__.py +0 -0
  212. {sqlframe-1.14.0 → sqlframe-2.1.0}/tests/integration/engines/spark/test_spark_dataframe.py +0 -0
  213. {sqlframe-1.14.0 → sqlframe-2.1.0}/tests/integration/engines/test_engine_column.py +0 -0
  214. {sqlframe-1.14.0 → sqlframe-2.1.0}/tests/integration/engines/test_engine_reader.py +0 -0
  215. {sqlframe-1.14.0 → sqlframe-2.1.0}/tests/integration/engines/test_engine_writer.py +0 -0
  216. {sqlframe-1.14.0 → sqlframe-2.1.0}/tests/integration/engines/test_int_testing.py +0 -0
  217. {sqlframe-1.14.0 → sqlframe-2.1.0}/tests/integration/fixtures.py +0 -0
  218. {sqlframe-1.14.0 → sqlframe-2.1.0}/tests/integration/test_int_dataframe_stats.py +0 -0
  219. {sqlframe-1.14.0 → sqlframe-2.1.0}/tests/integration/test_int_grouped_data.py +0 -0
  220. {sqlframe-1.14.0 → sqlframe-2.1.0}/tests/integration/test_int_session.py +0 -0
  221. {sqlframe-1.14.0 → sqlframe-2.1.0}/tests/types.py +0 -0
  222. {sqlframe-1.14.0 → sqlframe-2.1.0}/tests/unit/__init__.py +0 -0
  223. {sqlframe-1.14.0 → sqlframe-2.1.0}/tests/unit/standalone/__init__.py +0 -0
  224. {sqlframe-1.14.0 → sqlframe-2.1.0}/tests/unit/standalone/test_column.py +0 -0
  225. {sqlframe-1.14.0 → sqlframe-2.1.0}/tests/unit/standalone/test_session_case_sensitivity.py +0 -0
  226. {sqlframe-1.14.0 → sqlframe-2.1.0}/tests/unit/standalone/test_types.py +0 -0
  227. {sqlframe-1.14.0 → sqlframe-2.1.0}/tests/unit/standalone/test_window.py +0 -0
--- sqlframe-1.14.0/Makefile
+++ sqlframe-2.1.0/Makefile
@@ -19,6 +19,9 @@ bigquery-test:
 duckdb-test:
 	pytest -n auto -m "duckdb"
 
+snowflake-test:
+	pytest -n auto -m "snowflake"
+
 style:
 	pre-commit run --all-files
 
--- sqlframe-1.14.0/PKG-INFO
+++ sqlframe-2.1.0/PKG-INFO
@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: sqlframe
-Version: 1.14.0
+Version: 2.1.0
 Summary: Turning PySpark Into a Universal DataFrame API
 Home-page: https://github.com/eakmanrq/sqlframe
 Author: Ryan Eakman
@@ -78,6 +78,10 @@ SQLFrame generates consistently accurate yet complex SQL for engine execution.
 However, when using df.sql(), it produces more human-readable SQL.
 For details on how to configure this output and leverage OpenAI to enhance the SQL, see [Generated SQL Configuration](https://sqlframe.readthedocs.io/en/stable/configuration/#generated-sql).
 
+SQLFrame by default uses the Spark dialect for input and output.
+This can be changed to make SQLFrame feel more like a native DataFrame API for the engine you are using.
+See [Input and Output Dialect Configuration](https://sqlframe.readthedocs.io/en/stable/configuration/#input-and-output-dialect).
+
 ## Example Usage
 
 ```python
@@ -112,7 +116,7 @@ df = (
 )
 ```
 ```python
->>> df.sql()
+>>> df.sql(optimize=True)
 WITH `t94228` AS (
   SELECT
     `natality`.`year` AS `year`,
--- sqlframe-1.14.0/README.md
+++ sqlframe-2.1.0/README.md
@@ -48,6 +48,10 @@ SQLFrame generates consistently accurate yet complex SQL for engine execution.
 However, when using df.sql(), it produces more human-readable SQL.
 For details on how to configure this output and leverage OpenAI to enhance the SQL, see [Generated SQL Configuration](https://sqlframe.readthedocs.io/en/stable/configuration/#generated-sql).
 
+SQLFrame by default uses the Spark dialect for input and output.
+This can be changed to make SQLFrame feel more like a native DataFrame API for the engine you are using.
+See [Input and Output Dialect Configuration](https://sqlframe.readthedocs.io/en/stable/configuration/#input-and-output-dialect).
+
 ## Example Usage
 
 ```python
@@ -82,7 +86,7 @@ df = (
 )
 ```
 ```python
->>> df.sql()
+>>> df.sql(optimize=True)
 WITH `t94228` AS (
   SELECT
     `natality`.`year` AS `year`,
--- sqlframe-1.14.0/docs/configuration.md
+++ sqlframe-2.1.0/docs/configuration.md
@@ -1,5 +1,29 @@
 # General Configuration
 
+## Input and Output Dialect
+
+By default, SQLFrame processes all string inputs using the Spark dialect (e.g., date format strings, SQL) and generates outputs in the Spark dialect (e.g., column names, data types).
+This configuration is ideal if you aim to use the PySpark DataFrame API as if running on Spark while actually executing on another engine.
+
+This configuration can be changed to make SQLFrame feel more like a native DataFrame API for the engine you are using.
+
+Example: Using BigQuery to Change Default Behavior
+
+```python
+from sqlframe.bigquery import BigQuerySession
+
+session = BigQuerySession.builder.config(
+    map={
+        "sqlframe.input.dialect": "bigquery",
+        "sqlframe.output.dialect": "bigquery",
+    }
+).getOrCreate()
+```
+
+In this configuration, you can use BigQuery syntax for elements such as date format strings and will receive BigQuery column names and data types in the output.
+
+SQLFrame supports multiple dialects, all of which can be specific as the `input_dialect` and `output_dialect`.
+
 ## Generated SQL
 
 ### Pretty
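
The new section above only demonstrates BigQuery. As an illustrative sketch (not part of the diff), the same two configuration keys should apply to any other engine session, assuming `DuckDBSession` exposes the same builder interface:

```python
from sqlframe.duckdb import DuckDBSession

# Hedged sketch: the config keys come from the diff above; using them with
# DuckDB is an assumption, not something shown in the package docs.
session = DuckDBSession.builder.config(
    map={
        "sqlframe.input.dialect": "duckdb",
        "sqlframe.output.dialect": "duckdb",
    }
).getOrCreate()
```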
@@ -28,7 +52,9 @@ SELECT CAST(`a3`.`a` AS BIGINT) AS `a`, CAST(`a3`.`b` AS BIGINT) AS `b` FROM VAL
 
 ### Optimized
 
-Optimized SQL is SQL that has been processed by SQLGlot's optimizer. For complex queries this will significantly reduce the number of CTEs produced and remove extra unused columns. Defaults to `True`.
+Optimized SQL is SQL that has been processed by SQLGlot's optimizer.
+For complex queries this will significantly reduce the number of CTEs produced and remove extra unused columns.
+Defaults to `False`.
 
 ```python
 from sqlframe.bigquery import BigQuerySession
@@ -177,7 +203,9 @@ LIMIT 5
 
 ### Override Dialect
 
-The dialect of the generated SQL will be based on the session's dialect. However, you can override the dialect by passing a string to the `dialect` parameter. This is useful when you want to generate SQL for a different database.
+The dialect of the generated SQL will be based on the session's output dialect.
+However, you can override the dialect by passing a string to the `dialect` parameter.
+This is useful when you want to generate SQL for a different database.
 
 ```python
 # create session and `df` like normal
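
Taken together, the documentation changes above mean that in 2.x `df.sql()` is unoptimized by default and follows the session's output dialect unless overridden. A minimal sketch, assuming `df` was built as in the README example:

```python
raw_sql = df.sql()                     # unoptimized output (new default is optimize=False)
optimized_sql = df.sql(optimize=True)  # opt in to SQLGlot's optimizer
duckdb_sql = df.sql(dialect="duckdb")  # override the session's output dialect
```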
--- sqlframe-1.14.0/docs/duckdb.md
+++ sqlframe-2.1.0/docs/duckdb.md
@@ -258,6 +258,7 @@ See something that you would like to see supported? [Open an issue](https://gith
 * [concat](https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.concat.html)
   * Only works on strings (does not work on arrays)
 * [concat_ws](https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.concat_ws.html)
+* [convert_timezone](https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.convert_timezone.html)
 * [corr](https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.corr.html)
 * [cos](https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.cos.html)
 * [cot](https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.cot.html)
@@ -293,6 +294,7 @@ See something that you would like to see supported? [Open an issue](https://gith
 * [element_at](https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.element_at.html)
   * Only works on strings (does not work on arrays)
 * [encode](https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.encode.html)
+* [endswith](https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.Column.endswith.html)
 * [exp](https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.exp.html)
 * [explode](https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.explode.html)
 * [expm1](https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.expm1.html)
@@ -320,6 +322,7 @@ See something that you would like to see supported? [Open an issue](https://gith
 * [kurtosis](https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.kurtosis.html)
 * [lag](https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.lag.html)
 * [last](https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.last.html)
+* [last_day](https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.last_day.html)
 * [lcase](https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.lcase.html)
 * [lead](https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.lead.html)
 * [least](https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.least.html)
--- sqlframe-1.14.0/docs/postgres.md
+++ sqlframe-2.1.0/docs/postgres.md
@@ -300,6 +300,7 @@ See something that you would like to see supported? [Open an issue](https://gith
 * [element_at](https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.element_at.html)
   * Only works on strings (does not work on arrays)
 * [encode](https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.encode.html)
+* [endswith](https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.Column.endswith.html)
 * [exp](https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.exp.html)
 * [explode](https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.explode.html)
   * Doesn't support exploding maps
@@ -320,6 +321,7 @@ See something that you would like to see supported? [Open an issue](https://gith
 * [isnan](https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.isnan.html)
 * [isnull](https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.isnull.html)
 * [lag](https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.lag.html)
+* [last_day](https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.last_day.html)
 * [lcase](https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.lcase.html)
 * [lead](https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.lead.html)
 * [least](https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.least.html)
--- sqlframe-1.14.0/docs/snowflake.md
+++ sqlframe-2.1.0/docs/snowflake.md
@@ -1,4 +1,4 @@
-# BigQuery
+# Snowflake
 
 ## Installation
 
@@ -286,6 +286,7 @@ See something that you would like to see supported? [Open an issue](https://gith
 * [concat](https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.concat.html)
   * Can only concat strings not arrays
 * [concat_ws](https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.concat_ws.html)
+* [convert_timezone](https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.convert_timezone.html)
 * [corr](https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.corr.html)
 * [cos](https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.cos.html)
 * [cosh](https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.cosh.html)
@@ -319,6 +320,7 @@ See something that you would like to see supported? [Open an issue](https://gith
 * [desc_nulls_last](https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.desc_nulls_last.html)
 * [e](https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.e.html)
 * [element_at](https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.element_at.html)
+* [endswith](https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.Column.endswith.html)
 * [exp](https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.exp.html)
 * [explode](https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.explode.html)
 * [expm1](https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.expm1.html)
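
The docs hunks above add `convert_timezone`, `endswith`, and `last_day` to the supported-function lists for DuckDB, Postgres, and Snowflake. A hedged sketch of what calling them could look like, assuming the `sqlframe.duckdb.functions` module mirrors the linked PySpark signatures (data and column names are made up):

```python
from sqlframe.duckdb import DuckDBSession
from sqlframe.duckdb import functions as F

session = DuckDBSession()
df = session.createDataFrame([("2024-02-10 12:00:00",)], ["ts"])

result = df.select(
    # last_day: last day of the month containing the given date
    F.last_day(F.col("ts").cast("date")).alias("month_end"),
    # convert_timezone(sourceTz, targetTz, sourceTs) as in the linked PySpark docs
    F.convert_timezone(
        F.lit("UTC"), F.lit("America/Los_Angeles"), F.col("ts").cast("timestamp")
    ).alias("local_ts"),
    # Column.endswith, the method linked for all three engines
    F.col("ts").endswith("00").alias("ends_with_00"),
)
result.show()
```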
--- sqlframe-1.14.0/setup.py
+++ sqlframe-2.1.0/setup.py
@@ -20,7 +20,7 @@ setup(
     python_requires=">=3.8",
     install_requires=[
         "prettytable<3.11.0",
-        "sqlglot>=24.0.0,<25.5",
+        "sqlglot>=24.0.0,<25.11",
         "typing_extensions>=4.8,<5",
     ],
     extras_require={
@@ -30,18 +30,18 @@ setup(
         ],
         "dev": [
             "duckdb>=0.9,<1.1",
-            "mypy>=1.10.0,<1.11",
-            "openai>=1.30,<1.36",
+            "mypy>=1.10.0,<1.12",
+            "openai>=1.30,<1.41",
             "pandas>=2,<3",
             "pandas-stubs>=2,<3",
             "psycopg>=3.1,<4",
-            "pyarrow>=10,<17",
+            "pyarrow>=10,<18",
             "pyspark>=2,<3.6",
-            "pytest>=8.2.0,<8.3",
+            "pytest>=8.2.0,<8.4",
             "pytest-postgresql>=6,<7",
             "pytest-xdist>=3.6,<3.7",
             "pre-commit>=3.5;python_version=='3.8'",
-            "pre-commit>=3.7,<3.8;python_version>='3.9'",
+            "pre-commit>=3.7,<3.9;python_version>='3.9'",
             "ruff>=0.4.4,<0.6",
             "types-psycopg2>=2.9,<3",
         ],
@@ -57,7 +57,7 @@ setup(
             "pandas>=2,<3",
         ],
         "openai": [
-            "openai>=1.30,<1.36",
+            "openai>=1.30,<1.41",
         ],
         "pandas": [
             "pandas>=2,<3",
@@ -69,7 +69,7 @@ setup(
             "redshift_connector>=2.1.1,<2.2.0",
         ],
         "snowflake": [
-            "snowflake-connector-python[secure-local-storage]>=3.10.0,<3.12",
+            "snowflake-connector-python[secure-local-storage]>=3.10.0,<3.13",
         ],
         "spark": [
             "pyspark>=2,<3.6",
--- sqlframe-1.14.0/sqlframe/_version.py
+++ sqlframe-2.1.0/sqlframe/_version.py
@@ -12,5 +12,5 @@ __version__: str
 __version_tuple__: VERSION_TUPLE
 version_tuple: VERSION_TUPLE
 
-__version__ = version = '1.14.0'
-__version_tuple__ = version_tuple = (1, 14, 0)
+__version__ = version = '2.1.0'
+__version_tuple__ = version_tuple = (2, 1, 0)
--- sqlframe-1.14.0/sqlframe/base/_typing.py
+++ sqlframe-2.1.0/sqlframe/base/_typing.py
@@ -24,6 +24,7 @@ OutputExpressionContainer = t.Union[exp.Select, exp.Create, exp.Insert]
 StorageLevel = str
 PathOrPaths = t.Union[str, t.List[str]]
 OptionalPrimitiveType = t.Optional[PrimitiveType]
+DataTypeOrString = t.Union[DataType, str]
 
 
 class UserDefinedFunctionLike(t.Protocol):
--- sqlframe-1.14.0/sqlframe/base/catalog.py
+++ sqlframe-2.1.0/sqlframe/base/catalog.py
@@ -3,12 +3,12 @@
 from __future__ import annotations
 
 import typing as t
+from collections import defaultdict
 
 from sqlglot import MappingSchema, exp
 
-from sqlframe.base.decorators import normalize
 from sqlframe.base.exceptions import TableSchemaError
-from sqlframe.base.util import ensure_column_mapping, to_schema
+from sqlframe.base.util import ensure_column_mapping, normalize_string, to_schema
 
 if t.TYPE_CHECKING:
     from sqlglot.schema import ColumnMapping
@@ -33,6 +33,7 @@ class _BaseCatalog(t.Generic[SESSION, DF]):
         """Create a new Catalog that wraps the underlying JVM object."""
         self.session = sparkSession
         self._schema = schema or MappingSchema()
+        self._quoted_columns: t.Dict[exp.Table, t.List[str]] = defaultdict(list)
 
     @property
     def spark(self) -> SESSION:
@@ -52,7 +53,7 @@ class _BaseCatalog(t.Generic[SESSION, DF]):
     def get_columns_from_schema(self, table: exp.Table | str) -> t.Dict[str, exp.DataType]:
         table = self.ensure_table(table)
         return {
-            exp.column(name, quoted=True).sql(
+            exp.column(name, quoted=name in self._quoted_columns[table]).sql(
                 dialect=self.session.input_dialect
             ): exp.DataType.build(dtype, dialect=self.session.input_dialect)
             for name, dtype in self._schema.find(table, raise_on_missing=True).items() # type: ignore
@@ -64,9 +65,7 @@ class _BaseCatalog(t.Generic[SESSION, DF]):
         if not columns:
             return {}
         return {
-            exp.column(c.name, quoted=True).sql(
-                dialect=self.session.input_dialect
-            ): exp.DataType.build(c.dataType, dialect=self.session.input_dialect)
+            c.name: exp.DataType.build(c.dataType, dialect=self.session.output_dialect)
             for c in columns
         }
 
@@ -79,16 +78,30 @@ class _BaseCatalog(t.Generic[SESSION, DF]):
             return
         if not column_mapping:
            try:
-                column_mapping = self.get_columns(table)
+                column_mapping = {
+                    normalize_string(
+                        k, from_dialect="output", to_dialect="input", is_column=True
+                    ): normalize_string(
+                        v.sql(dialect=self.session.output_dialect),
+                        from_dialect="output",
+                        to_dialect="input",
+                        is_datatype=True,
+                    )
+                    for k, v in self.get_columns(table).items()
+                }
            except NotImplementedError:
                # TODO: Add doc link
                raise TableSchemaError(
                    "This session does not have access to a catalog that can lookup column information. See docs for explicitly defining columns or using a session that can automatically determine this."
                )
        column_mapping = ensure_column_mapping(column_mapping) # type: ignore
+        for column_name in column_mapping:
+            column = exp.to_column(column_name, dialect=self.session.input_dialect)
+            if column.this.quoted:
+                self._quoted_columns[table].append(column.this.name)
+
        self._schema.add_table(table, column_mapping, dialect=self.session.input_dialect)
 
-    @normalize(["dbName"])
     def getDatabase(self, dbName: str) -> Database:
         """Get the database with the specified name.
         This throws an :class:`AnalysisException` when the database cannot be found.
@@ -115,6 +128,7 @@ class _BaseCatalog(t.Generic[SESSION, DF]):
         >>> spark.catalog.getDatabase("spark_catalog.default")
         Database(name='default', catalog='spark_catalog', description='default database', ...
         """
+        dbName = normalize_string(dbName, from_dialect="input", is_schema=True)
         schema = to_schema(dbName, dialect=self.session.input_dialect)
         database_name = schema.db
         databases = self.listDatabases(pattern=database_name)
@@ -122,12 +136,16 @@ class _BaseCatalog(t.Generic[SESSION, DF]):
             raise ValueError(f"Database '{dbName}' not found")
         if len(databases) > 1:
             if schema.catalog is not None:
-                filtered_databases = [db for db in databases if db.catalog == schema.catalog]
+                filtered_databases = [
+                    db
+                    for db in databases
+                    if normalize_string(db.catalog, from_dialect="output", to_dialect="input") # type: ignore
+                    == schema.catalog
+                ]
                 if filtered_databases:
                     return filtered_databases[0]
         return databases[0]
 
-    @normalize(["dbName"])
     def databaseExists(self, dbName: str) -> bool:
         """Check if the database with the specified name exists.
 
@@ -168,7 +186,6 @@ class _BaseCatalog(t.Generic[SESSION, DF]):
         except ValueError:
             return False
 
-    @normalize(["tableName"])
     def getTable(self, tableName: str) -> Table:
         """Get the table or view with the specified name. This table can be a temporary view or a
         table/view. This throws an :class:`AnalysisException` when no Table can be found.
@@ -210,13 +227,18 @@ class _BaseCatalog(t.Generic[SESSION, DF]):
         ...
         AnalysisException: ...
         """
+        tableName = normalize_string(tableName, from_dialect="input", is_table=True)
         table = exp.to_table(tableName, dialect=self.session.input_dialect)
         schema = table.copy()
         schema.set("this", None)
         tables = self.listTables(
             schema.sql(dialect=self.session.input_dialect) if schema.db else None
         )
-        matching_tables = [t for t in tables if t.name == table.name]
+        matching_tables = [
+            t
+            for t in tables
+            if normalize_string(t.name, from_dialect="output", to_dialect="input") == table.name
+        ]
         if not matching_tables:
             raise ValueError(f"Table '{tableName}' not found")
         return matching_tables[0]
@@ -315,7 +337,6 @@ class _BaseCatalog(t.Generic[SESSION, DF]):
             raise ValueError(f"Function '{functionName}' not found")
         return matching_functions[0]
 
-    @normalize(["tableName", "dbName"])
     def tableExists(self, tableName: str, dbName: t.Optional[str] = None) -> bool:
         """Check if the table or view with the specified name exists.
         This can either be a temporary view or a table/view.
@@ -389,6 +410,8 @@ class _BaseCatalog(t.Generic[SESSION, DF]):
         >>> spark.catalog.tableExists("view1")
         False
         """
+        tableName = normalize_string(tableName, from_dialect="input", is_table=True)
+        dbName = normalize_string(dbName, from_dialect="input", is_schema=True) if dbName else None
         table = exp.to_table(tableName, dialect=self.session.input_dialect)
         schema_arg = to_schema(dbName, dialect=self.session.input_dialect) if dbName else None
         if not table.db:
--- sqlframe-1.14.0/sqlframe/base/column.py
+++ sqlframe-2.1.0/sqlframe/base/column.py
@@ -7,11 +7,11 @@ import math
 import typing as t
 
 import sqlglot
+from sqlglot import Dialect
 from sqlglot import expressions as exp
 from sqlglot.helper import flatten, is_iterable
 from sqlglot.optimizer.normalize_identifiers import normalize_identifiers
 
-from sqlframe.base.decorators import normalize
 from sqlframe.base.exceptions import UnsupportedOperationError
 from sqlframe.base.types import DataType
 from sqlframe.base.util import get_func_from_session, quote_preserving_alias_or_name
@@ -211,9 +211,8 @@ class Column:
     def binary_op(
         self, klass: t.Callable, other: ColumnOrLiteral, paren: bool = False, **kwargs
     ) -> Column:
-        op = klass(
-            this=self.column_expression, expression=Column(other).column_expression, **kwargs
-        )
+        other = self._lit(other) if isinstance(other, str) else Column(other)
+        op = klass(this=self.column_expression, expression=other.column_expression, **kwargs)
         if paren:
             return Column(exp.Paren(this=op))
         return Column(op)
@@ -221,9 +220,8 @@
     def inverse_binary_op(
         self, klass: t.Callable, other: ColumnOrLiteral, paren: bool = False, **kwargs
     ) -> Column:
-        op = klass(
-            this=Column(other).column_expression, expression=self.column_expression, **kwargs
-        )
+        other = self._lit(other) if isinstance(other, str) else Column(other)
+        op = klass(this=other.column_expression, expression=self.column_expression, **kwargs)
         if paren:
             return Column(exp.Paren(this=op))
         return Column(op)
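
The `binary_op`/`inverse_binary_op` change above wraps string operands with `_lit` instead of `Column(other)`, so a bare string on the right-hand side of an operator is treated as a literal value rather than a column reference. A small sketch of the apparent user-visible effect (hypothetical data, not from the diff):

```python
from sqlframe.duckdb import DuckDBSession
from sqlframe.duckdb import functions as F

session = DuckDBSession()
df = session.createDataFrame([("Alice", 1), ("Bob", 2)], ["name", "id"])

# With 2.x, "Alice" here is compared as the string literal 'Alice',
# not resolved as a reference to a column named Alice.
df.where(F.col("name") == "Alice").show()
```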
@@ -340,13 +338,17 @@ class Column:
         new_expression = exp.Not(this=exp.Is(this=self.column_expression, expression=exp.Null()))
         return Column(new_expression)
 
-    def cast(self, dataType: t.Union[str, DataType]) -> Column:
+    def cast(
+        self, dataType: t.Union[str, DataType], dialect: t.Optional[t.Union[str, Dialect]] = None
+    ) -> Column:
         from sqlframe.base.session import _BaseSession
 
         if isinstance(dataType, DataType):
             dataType = dataType.simpleString()
         return Column(
-            exp.cast(self.column_expression, dataType, dialect=_BaseSession().input_dialect)
+            exp.cast(
+                self.column_expression, dataType, dialect=dialect or _BaseSession().input_dialect
+            )
         )
 
     def startswith(self, value: t.Union[str, Column]) -> Column:
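
`Column.cast` above gains an optional `dialect` argument for parsing the type string. A rough sketch of calling it, using the Standalone session so nothing executes; the type string and dialect choice are illustrative, not taken from the package docs:

```python
from sqlframe.standalone import StandaloneSession
from sqlframe.standalone import functions as F

session = StandaloneSession()
df = session.createDataFrame([(1,)], ["a"])

# Parse the data type string with an explicit dialect instead of the
# session's input dialect (the new optional argument from the hunk above).
df = df.select(F.col("a").cast("TIMESTAMPTZ", dialect="duckdb").alias("a_ts"))
```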