PyPI - sqlglot - Versions diff - 27.20.0__tar.gz → 27.21.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (228)

{sqlglot-27.20.0 → sqlglot-27.21.0}/CHANGELOG.md

@@ -1,6 +1,62 @@
 Changelog
 =========
+## [v27.20.0] - 2025-09-30
+### :boom: BREAKING CHANGES
+- due to [`13a30df`](https://github.com/tobymao/sqlglot/commit/13a30dfa37096df5bfc2c31538325c40a49f7917) - Annotate type for snowflake TRY_BASE64_DECODE_BINARY function *(PR [#5972](https://github.com/tobymao/sqlglot/pull/5972) by [@fivetran-amrutabhimsenayachit](https://github.com/fivetran-amrutabhimsenayachit))*:
+  Annotate type for snowflake TRY_BASE64_DECODE_BINARY function (#5972)
+- due to [`1f5fdd7`](https://github.com/tobymao/sqlglot/commit/1f5fdd799c047de167a4572f7ac26b7ad92167f2) - Annotate type for snowflake TRY_BASE64_DECODE_STRING function *(PR [#5974](https://github.com/tobymao/sqlglot/pull/5974) by [@fivetran-amrutabhimsenayachit](https://github.com/fivetran-amrutabhimsenayachit))*:
+  Annotate type for snowflake TRY_BASE64_DECODE_STRING function (#5974)
+- due to [`324e82f`](https://github.com/tobymao/sqlglot/commit/324e82fe1fb11722f91341010602a743b151e055) - Annotate type for snowflake TRY_HEX_DECODE_BINARY function *(PR [#5975](https://github.com/tobymao/sqlglot/pull/5975) by [@fivetran-amrutabhimsenayachit](https://github.com/fivetran-amrutabhimsenayachit))*:
+  Annotate type for snowflake TRY_HEX_DECODE_BINARY function (#5975)
+- due to [`6caf99d`](https://github.com/tobymao/sqlglot/commit/6caf99d556a3357ffaa6c294a9babcd30dd5fac5) - Annotate type for snowflake TRY_HEX_DECODE_STRING function *(PR [#5976](https://github.com/tobymao/sqlglot/pull/5976) by [@fivetran-amrutabhimsenayachit](https://github.com/fivetran-amrutabhimsenayachit))*:
+  Annotate type for snowflake TRY_HEX_DECODE_STRING function (#5976)
+- due to [`73186a8`](https://github.com/tobymao/sqlglot/commit/73186a812ce422c108ee81b3de11da6ee9a9e902) - annotate type for Snowflake REGEXP_COUNT function *(PR [#5963](https://github.com/tobymao/sqlglot/pull/5963) by [@fivetran-BradfordPaskewitz](https://github.com/fivetran-BradfordPaskewitz))*:
+  annotate type for Snowflake REGEXP_COUNT function (#5963)
+- due to [`c3bdb3c`](https://github.com/tobymao/sqlglot/commit/c3bdb3cd1af1809ed82be0ae40744d9fffc8ce18) - array start index is 1, support array_flatten, fixes [#5983](https://github.com/tobymao/sqlglot/pull/5983) *(commit by [@georgesittas](https://github.com/georgesittas))*:
+  array start index is 1, support array_flatten, fixes #5983
+- due to [`244fb48`](https://github.com/tobymao/sqlglot/commit/244fb48fc9c4776f427c08b825d139b1c172fd26) - annotate type for Snowflake SPLIT_PART function *(PR [#5988](https://github.com/tobymao/sqlglot/pull/5988) by [@fivetran-BradfordPaskewitz](https://github.com/fivetran-BradfordPaskewitz))*:
+  annotate type for Snowflake SPLIT_PART function (#5988)
+- due to [`0d772e0`](https://github.com/tobymao/sqlglot/commit/0d772e0b9d687b24d49203c05d7a90cc1dce02d5) - add ast node for `DIRECTORY` source *(PR [#5990](https://github.com/tobymao/sqlglot/pull/5990) by [@georgesittas](https://github.com/georgesittas))*:
+  add ast node for `DIRECTORY` source (#5990)
+### :sparkles: New Features
+- [`13a30df`](https://github.com/tobymao/sqlglot/commit/13a30dfa37096df5bfc2c31538325c40a49f7917) - **optimizer**: Annotate type for snowflake TRY_BASE64_DECODE_BINARY function *(PR [#5972](https://github.com/tobymao/sqlglot/pull/5972) by [@fivetran-amrutabhimsenayachit](https://github.com/fivetran-amrutabhimsenayachit))*
+- [`1f5fdd7`](https://github.com/tobymao/sqlglot/commit/1f5fdd799c047de167a4572f7ac26b7ad92167f2) - **optimizer**: Annotate type for snowflake TRY_BASE64_DECODE_STRING function *(PR [#5974](https://github.com/tobymao/sqlglot/pull/5974) by [@fivetran-amrutabhimsenayachit](https://github.com/fivetran-amrutabhimsenayachit))*
+- [`324e82f`](https://github.com/tobymao/sqlglot/commit/324e82fe1fb11722f91341010602a743b151e055) - **optimizer**: Annotate type for snowflake TRY_HEX_DECODE_BINARY function *(PR [#5975](https://github.com/tobymao/sqlglot/pull/5975) by [@fivetran-amrutabhimsenayachit](https://github.com/fivetran-amrutabhimsenayachit))*
+- [`6caf99d`](https://github.com/tobymao/sqlglot/commit/6caf99d556a3357ffaa6c294a9babcd30dd5fac5) - **optimizer**: Annotate type for snowflake TRY_HEX_DECODE_STRING function *(PR [#5976](https://github.com/tobymao/sqlglot/pull/5976) by [@fivetran-amrutabhimsenayachit](https://github.com/fivetran-amrutabhimsenayachit))*
+- [`73186a8`](https://github.com/tobymao/sqlglot/commit/73186a812ce422c108ee81b3de11da6ee9a9e902) - **optimizer**: annotate type for Snowflake REGEXP_COUNT function *(PR [#5963](https://github.com/tobymao/sqlglot/pull/5963) by [@fivetran-BradfordPaskewitz](https://github.com/fivetran-BradfordPaskewitz))*
+- [`6124de7`](https://github.com/tobymao/sqlglot/commit/6124de76fa6d6725e844cd37e09ebfe99469b0ec) - **optimizer**: Annotate type for snowflake SOUNDEX function *(PR [#5986](https://github.com/tobymao/sqlglot/pull/5986) by [@fivetran-amrutabhimsenayachit](https://github.com/fivetran-amrutabhimsenayachit))*
+- [`244fb48`](https://github.com/tobymao/sqlglot/commit/244fb48fc9c4776f427c08b825d139b1c172fd26) - **optimizer**: annotate type for Snowflake SPLIT_PART function *(PR [#5988](https://github.com/tobymao/sqlglot/pull/5988) by [@fivetran-BradfordPaskewitz](https://github.com/fivetran-BradfordPaskewitz))*
+- [`0d772e0`](https://github.com/tobymao/sqlglot/commit/0d772e0b9d687b24d49203c05d7a90cc1dce02d5) - **snowflake**: add ast node for `DIRECTORY` source *(PR [#5990](https://github.com/tobymao/sqlglot/pull/5990) by [@georgesittas](https://github.com/georgesittas))*
+### :bug: Bug Fixes
+- [`7a3744f`](https://github.com/tobymao/sqlglot/commit/7a3744f203b93211e5dd97e6730b6bf59d6d96e0) - **sqlite**: support `RANGE CURRENT ROW` in window spec *(commit by [@georgesittas](https://github.com/georgesittas))*
+- [`c3bdb3c`](https://github.com/tobymao/sqlglot/commit/c3bdb3cd1af1809ed82be0ae40744d9fffc8ce18) - **starrocks**: array start index is 1, support array_flatten, fixes [#5983](https://github.com/tobymao/sqlglot/pull/5983) *(commit by [@georgesittas](https://github.com/georgesittas))*
+### :recycle: Refactors
+- [`d425ba2`](https://github.com/tobymao/sqlglot/commit/d425ba26b96b368801f8f486fa375cd75105993d) - make hash and eq non recursive *(PR [#5966](https://github.com/tobymao/sqlglot/pull/5966) by [@tobymao](https://github.com/tobymao))*
+### :wrench: Chores
+- [`345c6a1`](https://github.com/tobymao/sqlglot/commit/345c6a153481a22d6df1b12ef1863e2133688fdf) - add uv support to Makefile *(PR [#5973](https://github.com/tobymao/sqlglot/pull/5973) by [@eakmanrq](https://github.com/eakmanrq))*
 ## [v27.19.0] - 2025-09-26
 ### :boom: BREAKING CHANGES
 - due to [`68473ac`](https://github.com/tobymao/sqlglot/commit/68473ac3ec8dc76512dc76819892a1b0324c7ddc) - Annotate type for snowflake PARSE_URL function *(PR [#5962](https://github.com/tobymao/sqlglot/pull/5962) by [@fivetran-amrutabhimsenayachit](https://github.com/fivetran-amrutabhimsenayachit))*:
@@ -7561,3 +7617,4 @@ Changelog
 [v27.17.0]: https://github.com/tobymao/sqlglot/compare/v27.16.3...v27.17.0
 [v27.18.0]: https://github.com/tobymao/sqlglot/compare/v27.17.0...v27.18.0
 [v27.19.0]: https://github.com/tobymao/sqlglot/compare/v27.18.0...v27.19.0
+[v27.20.0]: https://github.com/tobymao/sqlglot/compare/v27.19.0...v27.20.0

{sqlglot-27.20.0 → sqlglot-27.21.0}/PKG-INFO

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: sqlglot
-Version: 27.20.0
+Version: 27.21.0
 Summary: An easily customizable SQL parser and transpiler
 Author-email: Toby Mao <toby.mao@gmail.com>
 License-Expression: MIT
@@ -33,7 +33,7 @@ Requires-Dist: typing_extensions; extra == "dev"
 Requires-Dist: maturin<2.0,>=1.4; extra == "dev"
 Requires-Dist: pyperf; extra == "dev"
 Provides-Extra: rs
-Requires-Dist: sqlglotrs==0.6.2; extra == "rs"
+Requires-Dist: sqlglotrs==0.7.0; extra == "rs"
 Dynamic: license-file
 Dynamic: provides-extra

{sqlglot-27.20.0 → sqlglot-27.21.0}/sqlglot/_version.py

@@ -28,7 +28,7 @@ version_tuple: VERSION_TUPLE
 commit_id: COMMIT_ID
 __commit_id__: COMMIT_ID
-__version__ = version = '27.20.0'
-__version_tuple__ = version_tuple = (27, 20, 0)
+__version__ = version = '27.21.0'
+__version_tuple__ = version_tuple = (27, 21, 0)
-__commit_id__ = commit_id = 'g0d772e0b9'
+__commit_id__ = commit_id = 'g5dd2ed3c6'

{sqlglot-27.20.0 → sqlglot-27.21.0}/sqlglot/dialects/bigquery.py

@@ -867,6 +867,8 @@ class BigQuery(Dialect):
             "FROM_HEX": exp.Unhex.from_arg_list,
             "WEEK": lambda args: exp.WeekStart(this=exp.var(seq_get(args, 0))),
         }
+        # Remove SEARCH to avoid parameter routing issues - let it fall back to Anonymous function
+        FUNCTIONS.pop("SEARCH")
         FUNCTION_PARSERS = {
             **parser.Parser.FUNCTION_PARSERS,

{sqlglot-27.20.0 → sqlglot-27.21.0}/sqlglot/dialects/dialect.py

@@ -1715,7 +1715,7 @@ def unit_to_str(expression: exp.Expression, default: str = "DAY") -> t.Optional[
 def unit_to_var(expression: exp.Expression, default: str = "DAY") -> t.Optional[exp.Expression]:
     unit = expression.args.get("unit")
-    if isinstance(unit, (exp.Var, exp.Placeholder, exp.WeekStart)):
+    if isinstance(unit, (exp.Var, exp.Placeholder, exp.WeekStart, exp.Column)):
         return unit
     value = unit.name if unit else default
@@ -1736,7 +1736,9 @@ def map_date_part(
 def map_date_part(part, dialect: DialectType = Dialect):
     mapped = (
-        Dialect.get_or_raise(dialect).DATE_PART_MAPPING.get(part.name.upper()) if part else None
+        Dialect.get_or_raise(dialect).DATE_PART_MAPPING.get(part.name.upper())
+        if part and not (isinstance(part, exp.Column) and len(part.parts) != 1)
+        else None
     )
     if mapped:
         return exp.Literal.string(mapped) if part.is_string else exp.var(mapped)
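
The guard added to `unit_to_var` and `map_date_part` above appears to keep a qualified column reference such as `t.mm` from being mistaken for a date-part keyword. The following is a minimal stdlib-only sketch of that rule. `Var`, `Column`, and the three-entry `DATE_PART_MAPPING` are simplified stand-ins introduced for illustration, not sqlglot's own classes or tables.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


# Hypothetical stand-ins for sqlglot's exp.Var and exp.Column.
@dataclass
class Var:
    name: str


@dataclass
class Column:
    parts: Tuple[str, ...]  # e.g. ("t", "mm") for the qualified reference t.mm

    @property
    def name(self) -> str:
        return self.parts[-1]


# Illustrative subset of a dialect's abbreviation table (assumed values).
DATE_PART_MAPPING = {"YY": "YEAR", "MM": "MONTH", "DD": "DAY"}


def map_date_part(part: Optional[Union[Var, Column]]) -> Optional[str]:
    """Return the canonical unit name, or None if `part` should be left alone."""
    if part is None:
        return None
    # A column with more than one part is treated as a real column reference,
    # not as a unit keyword, so no mapping is attempted.
    if isinstance(part, Column) and len(part.parts) != 1:
        return None
    return DATE_PART_MAPPING.get(part.name.upper())
```

With this sketch, `Column(("mm",))` still maps to `MONTH`, while `Column(("t", "mm"))` is passed through untouched.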

{sqlglot-27.20.0 → sqlglot-27.21.0}/sqlglot/dialects/duckdb.py

@@ -311,6 +311,7 @@ class DuckDB(Dialect):
             "PIVOT_WIDER": TokenType.PIVOT,
             "POSITIONAL": TokenType.POSITIONAL,
             "RESET": TokenType.COMMAND,
+            "ROW": TokenType.STRUCT,
             "SIGNED": TokenType.INT,
             "STRING": TokenType.TEXT,
             "SUMMARIZE": TokenType.SUMMARIZE,
@@ -337,16 +338,14 @@ class DuckDB(Dialect):
     class Parser(parser.Parser):
         MAP_KEYS_ARE_ARBITRARY_EXPRESSIONS = True
-        BITWISE = {
-            **parser.Parser.BITWISE,
-            TokenType.TILDA: exp.RegexpLike,
-        }
+        BITWISE = parser.Parser.BITWISE.copy()
         BITWISE.pop(TokenType.CARET)
         RANGE_PARSERS = {
             **parser.Parser.RANGE_PARSERS,
             TokenType.DAMP: binary_range_parser(exp.ArrayOverlaps),
             TokenType.CARET_AT: binary_range_parser(exp.StartsWith),
+            TokenType.TILDA: binary_range_parser(exp.RegexpFullMatch),
         }
         EXPONENT = {
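
The hunk above moves `~` from a bitwise-level `exp.RegexpLike` to a range-level `exp.RegexpFullMatch`. Assuming `RegexpLike` denotes a partial match (as it does in most dialects), the difference between the two node types can be illustrated with Python's `re` module. This is only an analogy; the function names below are hypothetical and DuckDB's regex engine is not Python's.

```python
import re


def regexp_like(text: str, pattern: str) -> bool:
    # Partial match: the pattern may match anywhere inside the text.
    return re.search(pattern, text) is not None


def regexp_full_match(text: str, pattern: str) -> bool:
    # Full match: the pattern must cover the entire text.
    return re.fullmatch(pattern, text) is not None
```

Under these definitions, `abc123` matches `\d+` partially but not fully, which is the kind of behavioral difference the new node makes explicit.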

{sqlglot-27.20.0 → sqlglot-27.21.0}/sqlglot/dialects/hive.py

@@ -531,7 +531,6 @@ class Hive(Dialect):
         TRANSFORMS = {
             **generator.Generator.TRANSFORMS,
-            exp.Group: transforms.preprocess([transforms.unalias_group]),
             exp.Property: property_sql,
             exp.AnyValue: rename_func("FIRST"),
             exp.ApproxDistinct: approx_count_distinct_sql,

{sqlglot-27.20.0 → sqlglot-27.21.0}/sqlglot/dialects/oracle.py

@@ -307,7 +307,6 @@ class Oracle(Dialect):
             ),
             exp.DateTrunc: lambda self, e: self.func("TRUNC", e.this, e.unit),
             exp.EuclideanDistance: rename_func("L2_DISTANCE"),
-            exp.Group: transforms.preprocess([transforms.unalias_group]),
             exp.ILike: no_ilike_sql,
             exp.LogicalOr: rename_func("MAX"),
             exp.LogicalAnd: rename_func("MIN"),

{sqlglot-27.20.0 → sqlglot-27.21.0}/sqlglot/dialects/presto.py

@@ -475,7 +475,6 @@ class Presto(Dialect):
             e: f"WITH_TIMEZONE({self.sql(e, 'this')}, {self.sql(e, 'zone')}) AT TIME ZONE 'UTC'",
             exp.GenerateSeries: sequence_sql,
             exp.GenerateDateArray: sequence_sql,
-            exp.Group: transforms.preprocess([transforms.unalias_group]),
             exp.If: if_sql(),
             exp.ILike: no_ilike_sql,
             exp.Initcap: _initcap_sql,

{sqlglot-27.20.0 → sqlglot-27.21.0}/sqlglot/dialects/risingwave.py

@@ -25,6 +25,20 @@ class RisingWave(Postgres):
             "KEY": lambda self: self._parse_encode_property(key=True),
         }
+        CONSTRAINT_PARSERS = {
+            **Postgres.Parser.CONSTRAINT_PARSERS,
+            "WATERMARK": lambda self: self.expression(
+                exp.WatermarkColumnConstraint,
+                this=self._match(TokenType.FOR) and self._parse_column(),
+                expression=self._match(TokenType.ALIAS) and self._parse_disjunction(),
+            ),
+        }
+        SCHEMA_UNNAMED_CONSTRAINTS = {
+            *Postgres.Parser.SCHEMA_UNNAMED_CONSTRAINTS,
+            "WATERMARK",
+        }
         def _parse_table_hints(self) -> t.Optional[t.List[exp.Expression]]:
             # There is no hint in risingwave.
             # Do nothing here to avoid WITH keywords conflict in CREATE SINK statement.

{sqlglot-27.20.0 → sqlglot-27.21.0}/sqlglot/dialects/snowflake.py

@@ -41,7 +41,18 @@ if t.TYPE_CHECKING:
     from sqlglot._typing import E, B
-# from https://docs.snowflake.com/en/sql-reference/functions/to_timestamp.html
+def _build_strtok(args: t.List) -> exp.SplitPart:
+    # Add default delimiter (space) if missing - per Snowflake docs
+    if len(args) == 1:
+        args.append(exp.Literal.string(" "))
+    # Add default part_index (1) if missing
+    if len(args) == 2:
+        args.append(exp.Literal.number(1))
+    return exp.SplitPart.from_arg_list(args)
 def _build_datetime(
     name: str, kind: exp.DataType.Type, safe: bool = False
 ) -> t.Callable[[t.List], exp.Func]:
@@ -137,12 +148,35 @@ def _build_if_from_div0(args: t.List) -> exp.If:
     return exp.If(this=cond, true=true, false=false)
+# https://docs.snowflake.com/en/sql-reference/functions/div0null
+def _build_if_from_div0null(args: t.List) -> exp.If:
+    lhs = exp._wrap(seq_get(args, 0), exp.Binary)
+    rhs = exp._wrap(seq_get(args, 1), exp.Binary)
+    # Returns 0 when divisor is 0 OR NULL
+    cond = exp.EQ(this=rhs, expression=exp.Literal.number(0)).or_(
+        exp.Is(this=rhs, expression=exp.null())
+    )
+    true = exp.Literal.number(0)
+    false = exp.Div(this=lhs, expression=rhs)
+    return exp.If(this=cond, true=true, false=false)
 # https://docs.snowflake.com/en/sql-reference/functions/zeroifnull
 def _build_if_from_zeroifnull(args: t.List) -> exp.If:
     cond = exp.Is(this=seq_get(args, 0), expression=exp.Null())
     return exp.If(this=cond, true=exp.Literal.number(0), false=seq_get(args, 0))
+def _build_search(args: t.List) -> exp.Search:
+    kwargs = {
+        "this": seq_get(args, 0),
+        "expression": seq_get(args, 1),
+        **{arg.name.lower(): arg for arg in args[2:] if isinstance(arg, exp.Kwarg)},
+    }
+    return exp.Search(**kwargs)
 # https://docs.snowflake.com/en/sql-reference/functions/zeroifnull
 def _build_if_from_nullifzero(args: t.List) -> exp.If:
     cond = exp.EQ(this=seq_get(args, 0), expression=exp.Literal.number(0))
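
The `_build_if_from_div0null` builder above rewrites `DIV0NULL(a, b)` into `IF(b = 0 OR b IS NULL, 0, a / b)`. A plain-Python sketch of the value semantics of that rewritten expression follows, with `None` standing in for SQL `NULL`. The function name `div0null` is illustrative, and the sketch models only the expression the builder emits, not Snowflake's own implementation.

```python
from typing import Optional


def div0null(dividend: Optional[float], divisor: Optional[float]) -> Optional[float]:
    # Mirrors IF(divisor = 0 OR divisor IS NULL, 0, dividend / divisor).
    if divisor is None or divisor == 0:
        return 0
    if dividend is None:
        # Dividing NULL by a non-zero value yields NULL in SQL.
        return None
    return dividend / divisor
```

By contrast, the existing `DIV0` builder only guards against a zero divisor, so the extra `IS NULL` branch is the distinguishing part of this change.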
@@ -529,6 +563,16 @@ class Snowflake(Dialect):
     TYPE_TO_EXPRESSIONS = {
         **Dialect.TYPE_TO_EXPRESSIONS,
+        exp.DataType.Type.DOUBLE: {
+            *Dialect.TYPE_TO_EXPRESSIONS[exp.DataType.Type.DOUBLE],
+            exp.Cos,
+            exp.Cosh,
+            exp.Cot,
+            exp.Degrees,
+            exp.Exp,
+            exp.Sin,
+            exp.Tan,
+        },
         exp.DataType.Type.INT: {
             *Dialect.TYPE_TO_EXPRESSIONS[exp.DataType.Type.INT],
             exp.Ascii,
@@ -539,6 +583,7 @@ class Snowflake(Dialect):
             exp.Levenshtein,
             exp.JarowinklerSimilarity,
             exp.StrPosition,
+            exp.Unicode,
         },
         exp.DataType.Type.VARCHAR: {
             *Dialect.TYPE_TO_EXPRESSIONS[exp.DataType.Type.VARCHAR],
@@ -564,8 +609,10 @@ class Snowflake(Dialect):
             exp.SHA,
             exp.SHA2,
             exp.Soundex,
+            exp.SoundexP123,
             exp.Space,
             exp.SplitPart,
+            exp.Translate,
             exp.Uuid,
         },
         exp.DataType.Type.BINARY: {
@@ -587,6 +634,8 @@ class Snowflake(Dialect):
         },
         exp.DataType.Type.ARRAY: {
             exp.Split,
+            exp.RegexpExtractAll,
+            exp.StringToArray,
         },
         exp.DataType.Type.OBJECT: {
             exp.ParseUrl,
@@ -595,6 +644,10 @@ class Snowflake(Dialect):
         exp.DataType.Type.DECIMAL: {
             exp.RegexpCount,
         },
+        exp.DataType.Type.BOOLEAN: {
+            *Dialect.TYPE_TO_EXPRESSIONS[exp.DataType.Type.BOOLEAN],
+            exp.Search,
+        },
     }
     ANNOTATORS = {
@@ -614,11 +667,17 @@ class Snowflake(Dialect):
                 exp.Substring,
             )
         },
+        **{
+            expr_type: lambda self, e: self._annotate_with_type(
+                e, exp.DataType.build("NUMBER", dialect="snowflake")
+            )
+            for expr_type in (
+                exp.RegexpCount,
+                exp.RegexpInstr,
+            )
+        },
         exp.ConcatWs: lambda self, e: self._annotate_by_args(e, "expressions"),
         exp.Reverse: _annotate_reverse,
-        exp.RegexpCount: lambda self, e: self._annotate_with_type(
-            e, exp.DataType.build("NUMBER", dialect="snowflake")
-        ),
     }
     TIME_MAPPING = {
@@ -691,7 +750,7 @@ class Snowflake(Dialect):
             "APPROX_PERCENTILE": exp.ApproxQuantile.from_arg_list,
             "ARRAY_CONSTRUCT": lambda args: exp.Array(expressions=args),
             "ARRAY_CONTAINS": lambda args: exp.ArrayContains(
-                this=seq_get(args, 1), expression=seq_get(args, 0)
+                this=seq_get(args, 1), expression=seq_get(args, 0), ensure_variant=False
             ),
             "ARRAY_GENERATE_RANGE": lambda args: exp.GenerateSeries(
                 # ARRAY_GENERATE_RANGE has an exlusive end; we normalize it to be inclusive
@@ -727,6 +786,7 @@ class Snowflake(Dialect):
             "DATEDIFF": _build_datediff,
             "DAYOFWEEKISO": exp.DayOfWeekIso.from_arg_list,
             "DIV0": _build_if_from_div0,
+            "DIV0NULL": _build_if_from_div0null,
             "EDITDISTANCE": lambda args: exp.Levenshtein(
                 this=seq_get(args, 0), expression=seq_get(args, 1), max_dist=seq_get(args, 2)
             ),
@@ -765,6 +825,7 @@ class Snowflake(Dialect):
             "SHA2_BINARY": exp.SHA2Digest.from_arg_list,
             "SHA2_HEX": exp.SHA2.from_arg_list,
             "SQUARE": lambda args: exp.Pow(this=seq_get(args, 0), expression=exp.Literal.number(2)),
+            "STRTOK": _build_strtok,
             "TABLE": lambda args: exp.TableFromRows(this=seq_get(args, 0)),
             "TIMEADD": _build_date_time_add(exp.TimeAdd),
             "TIMEDIFF": _build_datediff,
@@ -799,6 +860,7 @@ class Snowflake(Dialect):
             "ZEROIFNULL": _build_if_from_zeroifnull,
             "LIKE": _build_like(exp.Like),
             "ILIKE": _build_like(exp.ILike),
+            "SEARCH": _build_search,
         }
         FUNCTIONS.pop("PREDICT")
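
The `SEARCH` entry registered above routes through `_build_search`, which takes the first two positional arguments as `this` and `expression` and turns any remaining named arguments into lower-cased keyword slots. A minimal sketch of that routing, using a hypothetical `Kwarg` dataclass in place of sqlglot's `exp.Kwarg` and plain strings in place of expression nodes:

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Kwarg:
    # Hypothetical stand-in for a `NAME => value` argument node.
    name: str
    value: Any


def build_search_kwargs(args: List[Any]) -> Dict[str, Any]:
    kwargs: Dict[str, Any] = {
        "this": args[0] if len(args) > 0 else None,
        "expression": args[1] if len(args) > 1 else None,
    }
    # Only named arguments after the first two are kept; positional extras
    # are silently dropped, as in the builder shown in the diff.
    for arg in args[2:]:
        if isinstance(arg, Kwarg):
            kwargs[arg.name.lower()] = arg
    return kwargs
```

For example, `["col", "'q'", Kwarg("SEARCH_MODE", "'AND'")]` yields the keys `this`, `expression`, and `search_mode`, which line up with the optional slots declared on `exp.Search`.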
@@ -1364,7 +1426,13 @@ class Snowflake(Dialect):
             exp.ArgMax: rename_func("MAX_BY"),
             exp.ArgMin: rename_func("MIN_BY"),
             exp.ArrayConcat: lambda self, e: self.arrayconcat_sql(e, name="ARRAY_CAT"),
-            exp.ArrayContains: lambda self, e: self.func("ARRAY_CONTAINS", e.expression, e.this),
+            exp.ArrayContains: lambda self, e: self.func(
+                "ARRAY_CONTAINS",
+                e.expression
+                if e.args.get("ensure_variant") is False
+                else exp.cast(e.expression, exp.DataType.Type.VARIANT, copy=False),
+                e.this,
+            ),
             exp.ArrayIntersect: rename_func("ARRAY_INTERSECTION"),
             exp.AtTimeZone: lambda self, e: self.func(
                 "CONVERT_TIMEZONE", e.args.get("zone"), e.this
@@ -1894,3 +1962,13 @@ class Snowflake(Dialect):
                 return self.func("TO_CHAR", expression.expressions[0])
             return self.function_fallback_sql(expression)
+        def splitpart_sql(self, expression: exp.SplitPart) -> str:
+            # Set part_index to 1 if missing
+            if not expression.args.get("delimiter"):
+                expression.set("delimiter", exp.Literal.string(" "))
+            if not expression.args.get("part_index"):
+                expression.set("part_index", exp.Literal.number(1))
+            return rename_func("SPLIT_PART")(self, expression)
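
Both `_build_strtok` and `splitpart_sql` in this file fill in the same defaults: a single space as the delimiter and `1` as the part index. The defaulting rule can be sketched on plain Python values as follows. The helper name `fill_strtok_defaults` is hypothetical, and unlike `_build_strtok`, which appends to `args` in place, this sketch copies the list first.

```python
from typing import List


def fill_strtok_defaults(args: List[object]) -> List[object]:
    filled = list(args)  # work on a copy instead of mutating the caller's list
    # Missing delimiter: default to a single space.
    if len(filled) == 1:
        filled.append(" ")
    # Missing part index: default to the first part.
    if len(filled) == 2:
        filled.append(1)
    return filled
```

Because the two checks run in sequence, a one-argument call picks up both defaults, while a two-argument call only picks up the part index.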

{sqlglot-27.20.0 → sqlglot-27.21.0}/sqlglot/dialects/spark.py

@@ -230,7 +230,6 @@ class Spark(Spark2):
         }
         TRANSFORMS.pop(exp.AnyValue)
         TRANSFORMS.pop(exp.DateDiff)
-        TRANSFORMS.pop(exp.Group)
         def bracket_sql(self, expression: exp.Bracket) -> str:
             if expression.args.get("safe"):

{sqlglot-27.20.0 → sqlglot-27.21.0}/sqlglot/expressions.py

@@ -5385,7 +5385,7 @@ class TimeUnit(Expression):
     def __init__(self, **args):
         unit = args.get("unit")
-        if type(unit) in self.VAR_LIKE:
+        if type(unit) in self.VAR_LIKE and not (isinstance(unit, Column) and len(unit.parts) != 1):
             args["unit"] = Var(
                 this=(self.UNABBREVIATED_UNIT_NAME.get(unit.name) or unit.name).upper()
             )
@@ -5525,6 +5525,10 @@ class Coth(Func):
     pass
+class Cos(Func):
+    pass
 class Csc(Func):
     pass
@@ -5549,6 +5553,18 @@ class Sinh(Func):
     pass
+class Tan(Func):
+    pass
+class Degrees(Func):
+    pass
+class Cosh(Func):
+    pass
 class CosineDistance(Func):
     arg_types = {"this": True, "expression": True}
@@ -5840,6 +5856,7 @@ class ArrayConstructCompact(Func):
 class ArrayContains(Binary, Func):
+    arg_types = {"this": True, "expression": True, "ensure_variant": False}
     _sql_names = ["ARRAY_CONTAINS", "ARRAY_HAS"]
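
The optional `ensure_variant` argument added to `ArrayContains` is consumed by the Snowflake generator shown earlier in this diff: the searched value is cast to `VARIANT` unless the flag is explicitly `False`, which the Snowflake parser sets when the SQL was already Snowflake. A string-level sketch of that decision follows; the function name and the exact `CAST(...)` text are illustrative assumptions, not the generator's guaranteed output.

```python
from typing import Optional


def array_contains_sql(
    array_sql: str, value_sql: str, ensure_variant: Optional[bool] = None
) -> str:
    # Only an explicit False skips the cast; an unset flag (None) still casts.
    if ensure_variant is not False:
        value_sql = f"CAST({value_sql} AS VARIANT)"
    # The value is emitted first and the array second, matching the argument
    # order used by the Snowflake generator in the diff.
    return f"ARRAY_CONTAINS({value_sql}, {array_sql})"
```

The tri-state check (`is not False` rather than a truthiness test) matters here, since expressions built from other dialects leave the flag unset and should still receive the cast.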
@@ -6172,7 +6189,9 @@ class DateTrunc(Func):
         unabbreviate = args.pop("unabbreviate", True)
         unit = args.get("unit")
-        if isinstance(unit, TimeUnit.VAR_LIKE):
+        if isinstance(unit, TimeUnit.VAR_LIKE) and not (
+            isinstance(unit, Column) and len(unit.parts) != 1
+        ):
             unit_name = unit.name.upper()
             if unabbreviate and unit_name in TimeUnit.UNABBREVIATED_UNIT_NAME:
                 unit_name = TimeUnit.UNABBREVIATED_UNIT_NAME[unit_name]
@@ -7279,6 +7298,10 @@ class RegexpILike(Binary, Func):
     arg_types = {"this": True, "expression": True, "flag": False}
+class RegexpFullMatch(Binary, Func):
+    arg_types = {"this": True, "expression": True, "options": False}
 class RegexpInstr(Func):
     arg_types = {
         "this": True,
@@ -7380,13 +7403,20 @@ class Soundex(Func):
     pass
+# https://docs.snowflake.com/en/sql-reference/functions/soundex_p123
+class SoundexP123(Func):
+    pass
 class Split(Func):
     arg_types = {"this": True, "expression": True, "limit": False}
 # https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.split_part.html
+# https://docs.snowflake.com/en/sql-reference/functions/split_part
+# https://docs.snowflake.com/en/sql-reference/functions/strtok
 class SplitPart(Func):
-    arg_types = {"this": True, "delimiter": True, "part_index": True}
+    arg_types = {"this": True, "delimiter": False, "part_index": False}
 # Start may be omitted in the case of postgres
@@ -7430,6 +7460,19 @@ class StrPosition(Func):
     }
+# Snowflake: https://docs.snowflake.com/en/sql-reference/functions/search
+# BigQuery: https://cloud.google.com/bigquery/docs/reference/standard-sql/search_functions#search
+class Search(Func):
+    arg_types = {
+        "this": True,  # data_to_search / search_data
+        "expression": True,  # search_query / search_string
+        "json_scope": False,  # BigQuery: JSON_VALUES | JSON_KEYS | JSON_KEYS_AND_VALUES
+        "analyzer": False,  # Both: analyzer / ANALYZER
+        "analyzer_options": False,  # BigQuery: analyzer_options_values
+        "search_mode": False,  # Snowflake: OR | AND
+    }
 class StrToDate(Func):
     arg_types = {"this": True, "format": False, "safe": False}

{sqlglot-27.20.0 → sqlglot-27.21.0}/sqlglot/optimizer/canonicalize.py

@@ -77,7 +77,7 @@ def coerce_type(node: exp.Expression, promote_to_inferred_datetime_type: bool) -
         _coerce_date(node.left, node.right, promote_to_inferred_datetime_type)
     elif isinstance(node, exp.Between):
         _coerce_date(node.this, node.args["low"], promote_to_inferred_datetime_type)
-    elif isinstance(node, exp.Extract) and not node.expression.type.is_type(
+    elif isinstance(node, exp.Extract) and not node.expression.is_type(
         *exp.DataType.TEMPORAL_TYPES
     ):
         _replace_cast(node.expression, exp.DataType.Type.DATETIME)

{sqlglot-27.20.0 → sqlglot-27.21.0}/sqlglot/optimizer/merge_subqueries.py

@@ -201,6 +201,7 @@ def _mergeable(
         and not outer_scope.pivots
         and not any(e.find(exp.AggFunc, exp.Select, exp.Explode) for e in inner_select.expressions)
         and not (leave_tables_isolated and len(outer_scope.selected_sources) > 1)
+        and not (isinstance(from_or_join, exp.Join) and inner_select.args.get("joins"))
         and not (
             isinstance(from_or_join, exp.Join)
             and inner_select.args.get("where")
@@ -282,6 +283,7 @@ def _merge_joins(outer_scope: Scope, inner_scope: Scope, from_or_join: FromOrJoi
     new_joins = []
     joins = inner_scope.expression.args.get("joins") or []
     for join in joins:
         new_joins.append(join)
         outer_scope.add_source(join.alias_or_name, inner_scope.sources[join.alias_or_name])

{sqlglot-27.20.0 → sqlglot-27.21.0}/sqlglot/parser.py

@@ -1141,11 +1141,6 @@ class Parser(metaclass=_Parser):
         "TTL": lambda self: self.expression(exp.MergeTreeTTL, expressions=[self._parse_bitwise()]),
         "UNIQUE": lambda self: self._parse_unique(),
         "UPPERCASE": lambda self: self.expression(exp.UppercaseColumnConstraint),
-        "WATERMARK": lambda self: self.expression(
-            exp.WatermarkColumnConstraint,
-            this=self._match(TokenType.FOR) and self._parse_column(),
-            expression=self._match(TokenType.ALIAS) and self._parse_disjunction(),
-        ),
         "WITH": lambda self: self.expression(
             exp.Properties, expressions=self._parse_wrapped_properties()
         ),
@@ -1211,7 +1206,6 @@ class Parser(metaclass=_Parser):
         "PERIOD",
         "PRIMARY KEY",
         "UNIQUE",
-        "WATERMARK",
         "BUCKET",
         "TRUNCATE",
     }
@@ -4592,14 +4586,10 @@ class Parser(metaclass=_Parser):
             before_with_index = self._index
             with_prefix = self._match(TokenType.WITH)
-            if self._match(TokenType.ROLLUP):
-                elements["rollup"].append(
-                    self._parse_cube_or_rollup(exp.Rollup, with_prefix=with_prefix)
-                )
-            elif self._match(TokenType.CUBE):
-                elements["cube"].append(
-                    self._parse_cube_or_rollup(exp.Cube, with_prefix=with_prefix)
-                )
+            cube_or_rollup = self._parse_cube_or_rollup(with_prefix=with_prefix)
+            if cube_or_rollup:
+                key = "rollup" if isinstance(cube_or_rollup, exp.Rollup) else "cube"
+                elements[key].append(cube_or_rollup)
             elif self._match(TokenType.GROUPING_SETS):
                 elements["grouping_sets"].append(
                     self.expression(
@@ -4619,18 +4609,20 @@ class Parser(metaclass=_Parser):
         return self.expression(exp.Group, comments=comments, **elements)  # type: ignore
-    def _parse_cube_or_rollup(self, kind: t.Type[E], with_prefix: bool = False) -> E:
+    def _parse_cube_or_rollup(self, with_prefix: bool = False) -> t.Optional[exp.Cube | exp.Rollup]:
+        if self._match(TokenType.CUBE):
+            kind: t.Type[exp.Cube | exp.Rollup] = exp.Cube
+        elif self._match(TokenType.ROLLUP):
+            kind = exp.Rollup
+        else:
+            return None
         return self.expression(
             kind, expressions=[] if with_prefix else self._parse_wrapped_csv(self._parse_column)
         )
     def _parse_grouping_set(self) -> t.Optional[exp.Expression]:
-        if self._match(TokenType.L_PAREN):
-            grouping_set = self._parse_csv(self._parse_bitwise)
-            self._match_r_paren()
-            return self.expression(exp.Tuple, expressions=grouping_set)
-        return self._parse_column()
+        return self._parse_cube_or_rollup() or self._parse_bitwise()
     def _parse_having(self, skip_having_token: bool = False) -> t.Optional[exp.Having]:
         if not skip_having_token and not self._match(TokenType.HAVING):
@@ -4749,11 +4741,15 @@ class Parser(metaclass=_Parser):
             exp.Ordered, this=this, desc=desc, nulls_first=nulls_first, with_fill=with_fill
         )
-    def _parse_limit_options(self) -> exp.LimitOptions:
-        percent = self._match(TokenType.PERCENT)
+    def _parse_limit_options(self) -> t.Optional[exp.LimitOptions]:
+        percent = self._match_set((TokenType.PERCENT, TokenType.MOD))
         rows = self._match_set((TokenType.ROW, TokenType.ROWS))
         self._match_text_seq("ONLY")
         with_ties = self._match_text_seq("WITH", "TIES")
+        if not (percent or rows or with_ties):
+            return None
         return self.expression(exp.LimitOptions, percent=percent, rows=rows, with_ties=with_ties)
     def _parse_limit(
@@ -4771,10 +4767,13 @@ class Parser(metaclass=_Parser):
                 if limit_paren:
                     self._match_r_paren()
-                limit_options = self._parse_limit_options()
             else:
-                limit_options = None
-                expression = self._parse_term()
+                # Parsing LIMIT x% (i.e x PERCENT) as a term leads to an error, since
+                # we try to build an exp.Mod expr. For that matter, we backtrack and instead
+                # consume the factor plus parse the percentage separately
+                expression = self._try_parse(self._parse_term) or self._parse_factor()
+            limit_options = self._parse_limit_options()
             if self._match(TokenType.COMMA):
                 offset = expression
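
The reworked `_parse_limit_options` above accepts either `PERCENT` or `%`, and returns `None` when no modifier was matched at all. A rough sketch of that control flow over a list of upper-cased token strings follows. The `LimitOptions` dataclass and the list-based `match` helper are hypothetical simplifications of the parser's token machinery.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class LimitOptions:
    percent: bool
    rows: bool
    with_ties: bool


def parse_limit_options(tokens: List[str]) -> Tuple[Optional[LimitOptions], List[str]]:
    """Consume optional LIMIT/FETCH modifiers; return (options, remaining tokens)."""
    rest = list(tokens)

    def match(*expected: str) -> bool:
        # Consume the sequence only if the next tokens match it exactly.
        if tuple(rest[: len(expected)]) == expected:
            del rest[: len(expected)]
            return True
        return False

    percent = match("PERCENT") or match("%")
    rows = match("ROW") or match("ROWS")
    match("ONLY")  # consumed but not recorded, as in the diff
    with_ties = match("WITH", "TIES")

    # New behavior: no modifiers means no LimitOptions node at all.
    if not (percent or rows or with_ties):
        return None, rest
    return LimitOptions(percent, rows, with_ties), rest
```

Returning `None` instead of an empty options node is what allows the caller to invoke this step unconditionally after either branch of the limit expression parse.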

{sqlglot-27.20.0 → sqlglot-27.21.0}/sqlglot/tokens.py

@@ -1421,7 +1421,11 @@ class Tokenizer(metaclass=_Tokenizer):
                         raise_unmatched=not self.HEREDOC_TAG_IS_IDENTIFIER,
                     )
-                if tag and self.HEREDOC_TAG_IS_IDENTIFIER and (self._end or not tag.isidentifier()):
+                if (
+                    tag
+                    and self.HEREDOC_TAG_IS_IDENTIFIER
+                    and (self._end or tag.isdigit() or any(c.isspace() for c in tag))
+                ):
                     if not self._end:
                         self._advance(-1)
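
The tokenizer change above narrows the condition under which a candidate heredoc tag is rejected: instead of rejecting anything that is not a Python identifier, it now rejects only all-digit tags and tags containing whitespace. The two predicates can be compared side by side. The function names are hypothetical, `at_end` stands in for `self._end`, and `HEREDOC_TAG_IS_IDENTIFIER` is assumed to be true.

```python
def rejected_as_tag_old(tag: str, at_end: bool) -> bool:
    # 27.20.0: any non-identifier tag was rejected.
    return bool(tag) and (at_end or not tag.isidentifier())


def rejected_as_tag_new(tag: str, at_end: bool) -> bool:
    # 27.21.0: only all-digit tags or tags containing whitespace are rejected.
    return bool(tag) and (at_end or tag.isdigit() or any(c.isspace() for c in tag))
```

Tags such as `a-b` or `1a` were rejected by the old predicate but are accepted by the new one, while `1` and `a b` are rejected by both.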
