PyPI - sqlglot - Versions diffs - 27.8.0__tar.gz → 27.9.0__tar.gz - Mend

sqlglot 27.8.0tar.gz → 27.9.0tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (227)

{sqlglot-27.8.0 → sqlglot-27.9.0}/CHANGELOG.md RENAMED

@@ -1,6 +1,74 @@
 Changelog
 =========
+## [v27.8.0] - 2025-08-19
+### :boom: BREAKING CHANGES
+- due to [`2a33339`](https://github.com/tobymao/sqlglot/commit/2a333395cde71936df911488afcff92cae735e11) - annotate type for bigquery REPLACE *(PR [#5572](https://github.com/tobymao/sqlglot/pull/5572) by [@geooo109](https://github.com/geooo109))*:
+  annotate type for bigquery REPLACE (#5572)
+- due to [`1e6f813`](https://github.com/tobymao/sqlglot/commit/1e6f81343de641e588f1a05ce7dc01bed72bd849) - annotate type for bigquery REGEXP_EXTRACT_ALL *(PR [#5573](https://github.com/tobymao/sqlglot/pull/5573) by [@geooo109](https://github.com/geooo109))*:
+  annotate type for bigquery REGEXP_EXTRACT_ALL (#5573)
+- due to [`d0d62ed`](https://github.com/tobymao/sqlglot/commit/d0d62ede6320b3fd0eee04b7073f5708676dc58c) - support `TO_CHAR` with numeric inputs *(PR [#5570](https://github.com/tobymao/sqlglot/pull/5570) by [@jasonthomassql](https://github.com/jasonthomassql))*:
+  support `TO_CHAR` with numeric inputs (#5570)
+- due to [`7928985`](https://github.com/tobymao/sqlglot/commit/7928985a655c3d0244bc9175a37f502b19a5c5f0) - allow dashes in JSONPath keys *(PR [#5574](https://github.com/tobymao/sqlglot/pull/5574) by [@georgesittas](https://github.com/georgesittas))*:
+  allow dashes in JSONPath keys (#5574)
+- due to [`eb09e6e`](https://github.com/tobymao/sqlglot/commit/eb09e6e32491a05846488de7b72b1dca0e0a2669) - parse and annotate type for bigquery TRANSLATE *(PR [#5575](https://github.com/tobymao/sqlglot/pull/5575) by [@geooo109](https://github.com/geooo109))*:
+  parse and annotate type for bigquery TRANSLATE (#5575)
+- due to [`f9a522b`](https://github.com/tobymao/sqlglot/commit/f9a522b26cd5d643b8b18fa64d70f2a3f0ff2d2c) - parse and annotate type for bigquery SOUNDEX *(PR [#5576](https://github.com/tobymao/sqlglot/pull/5576) by [@geooo109](https://github.com/geooo109))*:
+  parse and annotate type for bigquery SOUNDEX (#5576)
+- due to [`51da41b`](https://github.com/tobymao/sqlglot/commit/51da41b90ce421b154e45add28353ac044640a1c) - annotate type for bigquery MD5 *(PR [#5577](https://github.com/tobymao/sqlglot/pull/5577) by [@geooo109](https://github.com/geooo109))*:
+  annotate type for bigquery MD5 (#5577)
+- due to [`bcf302f`](https://github.com/tobymao/sqlglot/commit/bcf302ff6ad2d0adfc29f708a8b53b5c0e547619) - annotate type for bigquery MIN/MAX BY *(PR [#5579](https://github.com/tobymao/sqlglot/pull/5579) by [@geooo109](https://github.com/geooo109))*:
+  annotate type for bigquery MIN/MAX BY (#5579)
+- due to [`c501d9e`](https://github.com/tobymao/sqlglot/commit/c501d9e6f58e4880e4d23f21f53f72dcb5fdaa8c) - parse and annotate type for bigquery GROUPING *(PR [#5581](https://github.com/tobymao/sqlglot/pull/5581) by [@geooo109](https://github.com/geooo109))*:
+  parse and annotate type for bigquery GROUPING (#5581)
+### :sparkles: New Features
+- [`2a33339`](https://github.com/tobymao/sqlglot/commit/2a333395cde71936df911488afcff92cae735e11) - **optimizer**: annotate type for bigquery REPLACE *(PR [#5572](https://github.com/tobymao/sqlglot/pull/5572) by [@geooo109](https://github.com/geooo109))*
+- [`1e6f813`](https://github.com/tobymao/sqlglot/commit/1e6f81343de641e588f1a05ce7dc01bed72bd849) - **optimizer**: annotate type for bigquery REGEXP_EXTRACT_ALL *(PR [#5573](https://github.com/tobymao/sqlglot/pull/5573) by [@geooo109](https://github.com/geooo109))*
+- [`eb09e6e`](https://github.com/tobymao/sqlglot/commit/eb09e6e32491a05846488de7b72b1dca0e0a2669) - **optimizer**: parse and annotate type for bigquery TRANSLATE *(PR [#5575](https://github.com/tobymao/sqlglot/pull/5575) by [@geooo109](https://github.com/geooo109))*
+- [`f9a522b`](https://github.com/tobymao/sqlglot/commit/f9a522b26cd5d643b8b18fa64d70f2a3f0ff2d2c) - **optimizer**: parse and annotate type for bigquery SOUNDEX *(PR [#5576](https://github.com/tobymao/sqlglot/pull/5576) by [@geooo109](https://github.com/geooo109))*
+- [`51da41b`](https://github.com/tobymao/sqlglot/commit/51da41b90ce421b154e45add28353ac044640a1c) - **optimizer**: annotate type for bigquery MD5 *(PR [#5577](https://github.com/tobymao/sqlglot/pull/5577) by [@geooo109](https://github.com/geooo109))*
+- [`bcf302f`](https://github.com/tobymao/sqlglot/commit/bcf302ff6ad2d0adfc29f708a8b53b5c0e547619) - **optimizer**: annotate type for bigquery MIN/MAX BY *(PR [#5579](https://github.com/tobymao/sqlglot/pull/5579) by [@geooo109](https://github.com/geooo109))*
+- [`c501d9e`](https://github.com/tobymao/sqlglot/commit/c501d9e6f58e4880e4d23f21f53f72dcb5fdaa8c) - **optimizer**: parse and annotate type for bigquery GROUPING *(PR [#5581](https://github.com/tobymao/sqlglot/pull/5581) by [@geooo109](https://github.com/geooo109))*
+- [`8612825`](https://github.com/tobymao/sqlglot/commit/86128253f911b733d45b073356e3b8ddf261c22b) - **spark**: generate date/time ops as interval binary ops *(commit by [@georgesittas](https://github.com/georgesittas))*
+- [`8fda774`](https://github.com/tobymao/sqlglot/commit/8fda774b7a9b0c66948349dfe030d3c122ff6eee) - **singlestore**: Added parsing and generation of JSON_EXTRACT *(PR [#5555](https://github.com/tobymao/sqlglot/pull/5555) by [@AdalbertMemSQL](https://github.com/AdalbertMemSQL))*
+- [`82cc954`](https://github.com/tobymao/sqlglot/commit/82cc9549a875211a400e5c4e818b05ca48a0a9f4) - **exasol**: map div function to IntDiv in exasol dialect *(PR [#5593](https://github.com/tobymao/sqlglot/pull/5593) by [@nnamdi16](https://github.com/nnamdi16))*
+- [`eb0fe68`](https://github.com/tobymao/sqlglot/commit/eb0fe68d6b5977053c871badf2f5c1895b3e1c66) - **trino**: add JSON_VALUE function support with RETURNING clause *(PR [#5590](https://github.com/tobymao/sqlglot/pull/5590) by [@rev-rwasilewski](https://github.com/rev-rwasilewski))*
+- [`9e95c11`](https://github.com/tobymao/sqlglot/commit/9e95c115ea0304d9ccb4cb0be8389f5ff5f2a952) - **exasol**: mapped weekofyear to week in Exasol dialect *(PR [#5594](https://github.com/tobymao/sqlglot/pull/5594) by [@nnamdi16](https://github.com/nnamdi16))*
+- [`8f013c3`](https://github.com/tobymao/sqlglot/commit/8f013c37a412ca5978889c1e47b0c6f7add0715d) - **singlestore**: Fixed parsing of DATE function *(PR [#5601](https://github.com/tobymao/sqlglot/pull/5601) by [@AdalbertMemSQL](https://github.com/AdalbertMemSQL))*
+- [`a4a299a`](https://github.com/tobymao/sqlglot/commit/a4a299acbaf4461f0c2b470bc4e9e9590515eda7) - transpile `TO_CHAR` from Dremio to Databricks *(PR [#5598](https://github.com/tobymao/sqlglot/pull/5598) by [@jasonthomassql](https://github.com/jasonthomassql))*
+- [`093f35c`](https://github.com/tobymao/sqlglot/commit/093f35c201c3c22c3a14c6f8de26c06246bdf19c) - **dremio**: handle `DATE_FORMAT`, `TO_DATE`, and `TO_TIMESTAMP` *(PR [#5597](https://github.com/tobymao/sqlglot/pull/5597) by [@jasonthomassql](https://github.com/jasonthomassql))*
+### :bug: Bug Fixes
+- [`d0d62ed`](https://github.com/tobymao/sqlglot/commit/d0d62ede6320b3fd0eee04b7073f5708676dc58c) - **dremio**: support `TO_CHAR` with numeric inputs *(PR [#5570](https://github.com/tobymao/sqlglot/pull/5570) by [@jasonthomassql](https://github.com/jasonthomassql))*
+- [`7928985`](https://github.com/tobymao/sqlglot/commit/7928985a655c3d0244bc9175a37f502b19a5c5f0) - **bigquery**: allow dashes in JSONPath keys *(PR [#5574](https://github.com/tobymao/sqlglot/pull/5574) by [@georgesittas](https://github.com/georgesittas))*
+- [`866042d`](https://github.com/tobymao/sqlglot/commit/866042d0268da0cebce042c0868878c0fb39c3d1) - Remove TokenType.APPLY from table alias tokens *(PR [#5592](https://github.com/tobymao/sqlglot/pull/5592) by [@VaggelisD](https://github.com/VaggelisD))*
+  - :arrow_lower_right: *fixes issue [#5591](https://github.com/tobymao/sqlglot/issues/5591) opened by [@saadbelgi](https://github.com/saadbelgi)*
+- [`b485f66`](https://github.com/tobymao/sqlglot/commit/b485f6666fa8625b7da45ef832b5d666fbb707ea) - **dremio**: improve `TO_CHAR` transpilability *(PR [#5580](https://github.com/tobymao/sqlglot/pull/5580) by [@jasonthomassql](https://github.com/jasonthomassql))*
+- [`81874e9`](https://github.com/tobymao/sqlglot/commit/81874e9c3aafcc2cf8fb443f65146c5b3598b9b3) - handle unknown types in `unit_to_str` *(commit by [@georgesittas](https://github.com/georgesittas))*
+### :wrench: Chores
+- [`173e442`](https://github.com/tobymao/sqlglot/commit/173e4425b692728abffa8542324690823f984303) - refactor JSON_VALUE handling for MySQL and Trino *(commit by [@georgesittas](https://github.com/georgesittas))*
 ## [v27.7.0] - 2025-08-13
 ### :boom: BREAKING CHANGES
 - due to [`938f4b6`](https://github.com/tobymao/sqlglot/commit/938f4b6ebc1c0d26bd3c1400883978c79a435189) - annotate type for LAST_DAY *(PR [#5528](https://github.com/tobymao/sqlglot/pull/5528) by [@geooo109](https://github.com/geooo109))*:
@@ -6535,3 +6603,4 @@ Changelog
 [v27.5.1]: https://github.com/tobymao/sqlglot/compare/v27.5.0...v27.5.1
 [v27.6.0]: https://github.com/tobymao/sqlglot/compare/v27.5.1...v27.6.0
 [v27.7.0]: https://github.com/tobymao/sqlglot/compare/v27.6.0...v27.7.0
+[v27.8.0]: https://github.com/tobymao/sqlglot/compare/v27.7.0...v27.8.0

{sqlglot-27.8.0 → sqlglot-27.9.0}/CONTRIBUTING.md RENAMED

@@ -39,6 +39,10 @@ to share any relevant context and increase its chances of getting merged.
 Note: make sure to follow the [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/) guidelines when creating a PR.
+## IMPORTANT: Keep PRs minimal in scope
+Each pull request should focus on a single, well-defined change. Avoid bundling multiple unrelated fixes or features in one PR. This makes code review faster and more effective, increases the likelihood of acceptance, and helps maintain a clean git history.
 ## Report bugs using GitHub's [issues](https://github.com/tobymao/sqlglot/issues)
 We use GitHub issues to track public bugs. Report a bug by opening a new issue.

{sqlglot-27.8.0 → sqlglot-27.9.0}/PKG-INFO RENAMED

@@ -1,8 +1,9 @@
 Metadata-Version: 2.4
 Name: sqlglot
-Version: 27.8.0
+Version: 27.9.0
 Summary: An easily customizable SQL parser and transpiler
 Author-email: Toby Mao <toby.mao@gmail.com>
+License-Expression: MIT
 Project-URL: Homepage, https://sqlglot.com/
 Project-URL: Documentation, https://sqlglot.com/sqlglot.html
 Project-URL: Repository, https://github.com/tobymao/sqlglot
@@ -72,6 +73,7 @@ Contributions are very welcome in SQLGlot; read the [contribution guide](https:/
 * [Run Tests and Lint](#run-tests-and-lint)
 * [Benchmarks](#benchmarks)
 * [Optional Dependencies](#optional-dependencies)
+* [Supported Dialects](#supported-dialects)
 ## Install
@@ -584,3 +586,41 @@ SQLGlot uses [dateutil](https://github.com/dateutil/dateutil) to simplify litera
 ```sql
 x + interval '1' month
 ```
+## Supported Dialects
+| Dialect | Support Level |
+|---------|---------------|
+| Athena | Official |
+| BigQuery | Official |
+| ClickHouse | Official |
+| Databricks | Official |
+| Doris | Community |
+| Dremio | Community |
+| Drill | Community |
+| Druid | Community |
+| DuckDB | Official |
+| Exasol | Community |
+| Fabric | Community |
+| Hive | Official |
+| Materialize | Community |
+| MySQL | Official |
+| Oracle | Official |
+| Postgres | Official |
+| Presto | Official |
+| PRQL | Community |
+| Redshift | Official |
+| RisingWave | Community |
+| SingleStore | Community |
+| Snowflake | Official |
+| Spark | Official |
+| SQLite | Official |
+| StarRocks | Official |
+| Tableau | Official |
+| Teradata | Community |
+| Trino | Official |
+| TSQL | Official |
+**Official Dialects** are maintained by the core SQLGlot team with higher priority for bug fixes and feature additions.
+**Community Dialects** are developed and maintained primarily through community contributions. These are fully functional but may receive lower priority for issue resolution compared to officially supported dialects. We welcome and encourage community contributions to improve these dialects.

{sqlglot-27.8.0 → sqlglot-27.9.0}/README.md RENAMED

@@ -34,6 +34,7 @@ Contributions are very welcome in SQLGlot; read the [contribution guide](https:/
 * [Run Tests and Lint](#run-tests-and-lint)
 * [Benchmarks](#benchmarks)
 * [Optional Dependencies](#optional-dependencies)
+* [Supported Dialects](#supported-dialects)
 ## Install
@@ -546,3 +547,41 @@ SQLGlot uses [dateutil](https://github.com/dateutil/dateutil) to simplify litera
 ```sql
 x + interval '1' month
 ```
+## Supported Dialects
+| Dialect | Support Level |
+|---------|---------------|
+| Athena | Official |
+| BigQuery | Official |
+| ClickHouse | Official |
+| Databricks | Official |
+| Doris | Community |
+| Dremio | Community |
+| Drill | Community |
+| Druid | Community |
+| DuckDB | Official |
+| Exasol | Community |
+| Fabric | Community |
+| Hive | Official |
+| Materialize | Community |
+| MySQL | Official |
+| Oracle | Official |
+| Postgres | Official |
+| Presto | Official |
+| PRQL | Community |
+| Redshift | Official |
+| RisingWave | Community |
+| SingleStore | Community |
+| Snowflake | Official |
+| Spark | Official |
+| SQLite | Official |
+| StarRocks | Official |
+| Tableau | Official |
+| Teradata | Community |
+| Trino | Official |
+| TSQL | Official |
+**Official Dialects** are maintained by the core SQLGlot team with higher priority for bug fixes and feature additions.
+**Community Dialects** are developed and maintained primarily through community contributions. These are fully functional but may receive lower priority for issue resolution compared to officially supported dialects. We welcome and encourage community contributions to improve these dialects.

{sqlglot-27.8.0 → sqlglot-27.9.0}/pyproject.toml RENAMED

@@ -4,6 +4,7 @@ dynamic = ["version", "optional-dependencies"]
 description = "An easily customizable SQL parser and transpiler"
 readme = "README.md"
 authors = [{ name = "Toby Mao", email = "toby.mao@gmail.com" }]
+license = "MIT"
 license-files = ["LICENSE"]
 requires-python = ">= 3.9"
 classifiers = [
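The `license = "MIT"` key added to pyproject.toml corresponds to the new `License-Expression: MIT` line in the PKG-INFO hunk earlier in this diff. As a minimal sketch, assuming only the header lines visible in that hunk, the field can be read with the standard library's email parser, since PKG-INFO uses RFC 822-style headers. The `PKG_INFO` string and the `license_expression` helper are illustrative names, not part of sqlglot.

```python
from email import message_from_string

# Header lines copied from the 27.9.0 PKG-INFO hunk; the rest of the file is omitted.
PKG_INFO = """\
Metadata-Version: 2.4
Name: sqlglot
Version: 27.9.0
Summary: An easily customizable SQL parser and transpiler
License-Expression: MIT
"""


def license_expression(pkg_info_text):
    """Return the License-Expression header, or None when it is absent."""
    return message_from_string(pkg_info_text)["License-Expression"]


print(license_expression(PKG_INFO))
```

Applied to the 27.8.0 headers, which lack the field, the same helper would return `None`.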

{sqlglot-27.8.0 → sqlglot-27.9.0}/sqlglot/_version.py RENAMED

@@ -28,7 +28,7 @@ version_tuple: VERSION_TUPLE
 commit_id: COMMIT_ID
 __commit_id__: COMMIT_ID
-__version__ = version = '27.8.0'
-__version_tuple__ = version_tuple = (27, 8, 0)
+__version__ = version = '27.9.0'
+__version_tuple__ = version_tuple = (27, 9, 0)
-__commit_id__ = commit_id = 'g093f35c20'
+__commit_id__ = commit_id = 'ge4e08e8c9'
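The version bump keeps `__version__` and `__version_tuple__` in step. A small sketch of why the tuple form is the safer one to compare against: tuples of integers order numerically, while version strings order character by character. `parse_version` is a hypothetical helper that only handles plain `X.Y.Z` release strings like the two shown here.

```python
def parse_version(version):
    """Turn a plain 'X.Y.Z' release string into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


old = parse_version("27.8.0")
new = parse_version("27.9.0")

# Tuple comparison is element-wise and numeric.
print(new > old)

# String comparison would misorder a two-digit minor version.
print(parse_version("27.10.0") > new, "27.10.0" > "27.9.0")
```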

{sqlglot-27.8.0 → sqlglot-27.9.0}/sqlglot/dialects/bigquery.py RENAMED

@@ -295,6 +295,22 @@ def _annotate_math_functions(self: TypeAnnotator, expression: E) -> E:
     return expression
+def _annotate_by_args_approx_top(self: TypeAnnotator, expression: exp.ApproxTopK) -> exp.ApproxTopK:
+    self._annotate_args(expression)
+    struct_type = exp.DataType(
+        this=exp.DataType.Type.STRUCT,
+        expressions=[expression.this.type, exp.DataType(this=exp.DataType.Type.BIGINT)],
+        nested=True,
+    )
+    self._set_type(
+        expression,
+        exp.DataType(this=exp.DataType.Type.ARRAY, expressions=[struct_type], nested=True),
+    )
+    return expression
 @unsupported_args("ins_cost", "del_cost", "sub_cost")
 def _levenshtein_sql(self: BigQuery.Generator, expression: exp.Levenshtein) -> str:
     max_dist = expression.args.get("max_dist")
@@ -473,17 +489,24 @@ class BigQuery(Dialect):
                 exp.Substring,
             )
         },
+        exp.ApproxTopSum: lambda self, e: _annotate_by_args_approx_top(self, e),
+        exp.ApproxTopK: lambda self, e: _annotate_by_args_approx_top(self, e),
+        exp.ApproxQuantiles: lambda self, e: self._annotate_by_args(e, "this", array=True),
         exp.ArgMax: lambda self, e: self._annotate_by_args(e, "this"),
         exp.ArgMin: lambda self, e: self._annotate_by_args(e, "this"),
         exp.Array: _annotate_array,
         exp.ArrayConcat: lambda self, e: self._annotate_by_args(e, "this", "expressions"),
         exp.Ascii: lambda self, e: self._annotate_with_type(e, exp.DataType.Type.BIGINT),
+        exp.JSONBool: lambda self, e: self._annotate_with_type(e, exp.DataType.Type.BOOLEAN),
         exp.BitwiseAndAgg: lambda self, e: self._annotate_with_type(e, exp.DataType.Type.BIGINT),
         exp.BitwiseOrAgg: lambda self, e: self._annotate_with_type(e, exp.DataType.Type.BIGINT),
         exp.BitwiseXorAgg: lambda self, e: self._annotate_with_type(e, exp.DataType.Type.BIGINT),
         exp.BitwiseCountAgg: lambda self, e: self._annotate_with_type(e, exp.DataType.Type.BIGINT),
         exp.ByteLength: lambda self, e: self._annotate_with_type(e, exp.DataType.Type.BIGINT),
         exp.ByteString: lambda self, e: self._annotate_with_type(e, exp.DataType.Type.BINARY),
+        exp.CodePointsToBytes: lambda self, e: self._annotate_with_type(
+            e, exp.DataType.Type.BINARY
+        ),
         exp.CodePointsToString: lambda self, e: self._annotate_with_type(
             e, exp.DataType.Type.VARCHAR
         ),
@@ -493,6 +516,9 @@ class BigQuery(Dialect):
         exp.CovarSamp: lambda self, e: self._annotate_with_type(e, exp.DataType.Type.DOUBLE),
         exp.DateFromUnixDate: lambda self, e: self._annotate_with_type(e, exp.DataType.Type.DATE),
         exp.DateTrunc: lambda self, e: self._annotate_by_args(e, "this"),
+        exp.FarmFingerprint: lambda self, e: self._annotate_with_type(e, exp.DataType.Type.BIGINT),
+        exp.Unhex: lambda self, e: self._annotate_with_type(e, exp.DataType.Type.BINARY),
+        exp.Float64: lambda self, e: self._annotate_with_type(e, exp.DataType.Type.DOUBLE),
         exp.GenerateTimestampArray: lambda self, e: self._annotate_with_type(
             e, exp.DataType.build("ARRAY<TIMESTAMP>", dialect="bigquery")
         ),
@@ -506,12 +532,20 @@ class BigQuery(Dialect):
         ),
         exp.JSONType: lambda self, e: self._annotate_with_type(e, exp.DataType.Type.VARCHAR),
         exp.Lag: lambda self, e: self._annotate_by_args(e, "this", "default"),
+        exp.LowerHex: lambda self, e: self._annotate_with_type(e, exp.DataType.Type.VARCHAR),
         exp.MD5Digest: lambda self, e: self._annotate_with_type(e, exp.DataType.Type.BINARY),
         exp.ParseTime: lambda self, e: self._annotate_with_type(e, exp.DataType.Type.TIME),
         exp.ParseDatetime: lambda self, e: self._annotate_with_type(e, exp.DataType.Type.DATETIME),
+        exp.ParseBignumeric: lambda self, e: self._annotate_with_type(
+            e, exp.DataType.Type.BIGDECIMAL
+        ),
+        exp.ParseNumeric: lambda self, e: self._annotate_with_type(e, exp.DataType.Type.DECIMAL),
         exp.RegexpExtractAll: lambda self, e: self._annotate_by_args(e, "this", array=True),
         exp.Replace: lambda self, e: self._annotate_by_args(e, "this"),
         exp.Reverse: lambda self, e: self._annotate_by_args(e, "this"),
+        exp.SafeConvertBytesToString: lambda self, e: self._annotate_with_type(
+            e, exp.DataType.Type.VARCHAR
+        ),
         exp.Soundex: lambda self, e: self._annotate_with_type(e, exp.DataType.Type.VARCHAR),
         exp.SHA: lambda self, e: self._annotate_with_type(e, exp.DataType.Type.BINARY),
         exp.SHA2: lambda self, e: self._annotate_with_type(e, exp.DataType.Type.BINARY),
@@ -522,8 +556,11 @@ class BigQuery(Dialect):
         ),
         exp.TimestampTrunc: lambda self, e: self._annotate_by_args(e, "this"),
         exp.TimeFromParts: lambda self, e: self._annotate_with_type(e, exp.DataType.Type.TIME),
-        exp.TsOrDsToTime: lambda self, e: self._annotate_with_type(e, exp.DataType.Type.TIME),
         exp.TimeTrunc: lambda self, e: self._annotate_with_type(e, exp.DataType.Type.TIME),
+        exp.ToCodePoints: lambda self, e: self._annotate_with_type(
+            e, exp.DataType.build("ARRAY<BIGINT>", dialect="bigquery")
+        ),
+        exp.TsOrDsToTime: lambda self, e: self._annotate_with_type(e, exp.DataType.Type.TIME),
         exp.Translate: lambda self, e: self._annotate_by_args(e, "this"),
         exp.Unicode: lambda self, e: self._annotate_with_type(e, exp.DataType.Type.BIGINT),
     }
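The `_annotate_by_args_approx_top` helper added in this file builds a nested result type: an array of structs pairing the type of the first argument with a `BIGINT`. The `array=True` annotators, such as the one registered for `exp.ApproxQuantiles`, wrap the argument type in an array. The sketch below models only that shape, using plain strings as stand-ins for `exp.DataType` nodes. The function names are illustrative and are not sqlglot APIs.

```python
def array_of(element_type):
    """Model of the array=True case: wrap the argument type in ARRAY<...>."""
    return f"ARRAY<{element_type}>"


def approx_top_result_type(value_type):
    """Model of the nested type built for ApproxTopK and ApproxTopSum."""
    struct_type = f"STRUCT<{value_type}, BIGINT>"
    return array_of(struct_type)


print(approx_top_result_type("VARCHAR"))  # ARRAY<STRUCT<VARCHAR, BIGINT>>
print(array_of("DOUBLE"))                 # ARRAY<DOUBLE>
```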
@@ -596,10 +633,13 @@ class BigQuery(Dialect):
             "EXPORT": TokenType.EXPORT,
             "FLOAT64": TokenType.DOUBLE,
             "FOR SYSTEM_TIME": TokenType.TIMESTAMP_SNAPSHOT,
+            "LOOP": TokenType.COMMAND,
             "MODEL": TokenType.MODEL,
             "NOT DETERMINISTIC": TokenType.VOLATILE,
             "RECORD": TokenType.STRUCT,
+            "REPEAT": TokenType.COMMAND,
             "TIMESTAMP": TokenType.TIMESTAMPTZ,
+            "WHILE": TokenType.COMMAND,
         }
         KEYWORDS.pop("DIV")
         KEYWORDS.pop("VALUES")
@@ -623,6 +663,8 @@ class BigQuery(Dialect):
         FUNCTIONS = {
             **parser.Parser.FUNCTIONS,
+            "APPROX_TOP_COUNT": exp.ApproxTopK.from_arg_list,
+            "BOOL": exp.JSONBool.from_arg_list,
             "CONTAINS_SUBSTR": _build_contains_substring,
             "DATE": _build_date,
             "DATE_ADD": build_date_delta_with_interval(exp.DateAdd),
@@ -689,6 +731,7 @@ class BigQuery(Dialect):
             "FORMAT_DATETIME": _build_format_time(exp.TsOrDsToDatetime),
             "FORMAT_TIMESTAMP": _build_format_time(exp.TsOrDsToTimestamp),
             "FORMAT_TIME": _build_format_time(exp.TsOrDsToTime),
+            "FROM_HEX": exp.Unhex.from_arg_list,
             "WEEK": lambda args: exp.WeekStart(this=exp.var(seq_get(args, 0))),
         }
@@ -699,7 +742,10 @@ class BigQuery(Dialect):
                 exp.JSONArray, expressions=self._parse_csv(self._parse_bitwise)
             ),
             "MAKE_INTERVAL": lambda self: self._parse_make_interval(),
+            "PREDICT": lambda self: self._parse_predict(),
             "FEATURES_AT_TIME": lambda self: self._parse_features_at_time(),
+            "GENERATE_EMBEDDING": lambda self: self._parse_generate_embedding(),
+            "VECTOR_SEARCH": lambda self: self._parse_vector_search(),
         }
         FUNCTION_PARSERS.pop("TRIM")
@@ -979,13 +1025,40 @@ class BigQuery(Dialect):
             return expr
-        def _parse_features_at_time(self) -> exp.FeaturesAtTime:
-            expr = self.expression(
-                exp.FeaturesAtTime,
-                this=(self._match(TokenType.TABLE) and self._parse_table())
-                or self._parse_select(nested=True),
+        def _parse_predict(self) -> exp.Predict:
+            self._match_text_seq("MODEL")
+            this = self._parse_table()
+            self._match(TokenType.COMMA)
+            self._match_text_seq("TABLE")
+            return self.expression(
+                exp.Predict,
+                this=this,
+                expression=self._parse_table(),
+                params_struct=self._match(TokenType.COMMA) and self._parse_bitwise(),
+            )
+        def _parse_generate_embedding(self) -> exp.GenerateEmbedding:
+            self._match_text_seq("MODEL")
+            this = self._parse_table()
+            self._match(TokenType.COMMA)
+            self._match_text_seq("TABLE")
+            return self.expression(
+                exp.GenerateEmbedding,
+                this=this,
+                expression=self._parse_table(),
+                params_struct=self._match(TokenType.COMMA) and self._parse_bitwise(),
             )
+        def _parse_features_at_time(self) -> exp.FeaturesAtTime:
+            self._match(TokenType.TABLE)
+            this = self._parse_table()
+            expr = self.expression(exp.FeaturesAtTime, this=this)
             while self._match(TokenType.COMMA):
                 arg = self._parse_lambda()
@@ -996,6 +1069,37 @@ class BigQuery(Dialect):
             return expr
+        def _parse_vector_search(self) -> exp.VectorSearch:
+            self._match(TokenType.TABLE)
+            base_table = self._parse_table()
+            self._match(TokenType.COMMA)
+            column_to_search = self._parse_bitwise()
+            self._match(TokenType.COMMA)
+            self._match(TokenType.TABLE)
+            query_table = self._parse_table()
+            expr = self.expression(
+                exp.VectorSearch,
+                this=base_table,
+                column_to_search=column_to_search,
+                query_table=query_table,
+            )
+            while self._match(TokenType.COMMA):
+                # query_column_to_search can be named argument or positional
+                if self._match(TokenType.STRING, advance=False):
+                    query_column = self._parse_string()
+                    expr.set("query_column_to_search", query_column)
+                else:
+                    arg = self._parse_lambda()
+                    if arg:
+                        expr.set(arg.this.name, arg)
+            return expr
         def _parse_export_data(self) -> exp.Export:
             self._match_text_seq("DATA")
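The trailing loop in `_parse_vector_search` treats each remaining argument in one of two ways: a bare string literal is stored as `query_column_to_search`, and anything else is parsed as a named argument and stored under its own name. The following is a rough standalone model of that branching, assuming the arguments have already been split into raw strings and that named arguments use BigQuery's `name => value` form. `collect_optional_args` is a hypothetical helper, not the parser's real token-based logic.

```python
def collect_optional_args(raw_args):
    """Classify the arguments that follow the three required ones."""
    result = {}
    for raw in raw_args:
        raw = raw.strip()
        if raw.startswith("'") and raw.endswith("'"):
            # Positional form: a bare string names the query column.
            result["query_column_to_search"] = raw[1:-1]
        elif "=>" in raw:
            # Named form: keep the value under the argument's own name.
            name, value = (part.strip() for part in raw.split("=>", 1))
            result[name] = value
    return result


print(collect_optional_args(["'embedding'", "top_k => 5"]))
```

A named `query_column_to_search => 'embedding'` argument lands under the same key as the positional form, which matches the comment in the hunk about supporting both spellings.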
@@ -1043,6 +1147,7 @@ class BigQuery(Dialect):
         TRANSFORMS = {
             **generator.Generator.TRANSFORMS,
+            exp.ApproxTopK: rename_func("APPROX_TOP_COUNT"),
             exp.ApproxDistinct: rename_func("APPROX_COUNT_DISTINCT"),
             exp.ArgMax: arg_max_or_min_no_count("MAX_BY"),
             exp.ArgMin: arg_max_or_min_no_count("MIN_BY"),
@@ -1083,6 +1188,7 @@ class BigQuery(Dialect):
             exp.ILike: no_ilike_sql,
             exp.IntDiv: rename_func("DIV"),
             exp.Int64: rename_func("INT64"),
+            exp.JSONBool: rename_func("BOOL"),
             exp.JSONExtract: _json_extract_sql,
             exp.JSONExtractArray: _json_extract_sql,
             exp.JSONExtractScalar: _json_extract_sql,

{sqlglot-27.8.0 → sqlglot-27.9.0}/sqlglot/dialects/clickhouse.py RENAMED

@@ -345,6 +345,7 @@ class ClickHouse(Dialect):
             "LEVENSHTEINDISTANCE": exp.Levenshtein.from_arg_list,
         }
         FUNCTIONS.pop("TRANSFORM")
+        FUNCTIONS.pop("APPROX_TOP_SUM")
         AGG_FUNCTIONS = {
             "count",
@@ -379,6 +380,7 @@ class ClickHouse(Dialect):
             "argMax",
             "avgWeighted",
             "topK",
+            "approx_top_sum",
             "topKWeighted",
             "deltaSum",
             "deltaSumTimestamp",
@@ -977,6 +979,14 @@ class ClickHouse(Dialect):
             return value
+        def _parse_partitioned_by(self) -> exp.PartitionedByProperty:
+            # ClickHouse allows custom expressions as partition key
+            # https://clickhouse.com/docs/engines/table-engines/mergetree-family/custom-partitioning-key
+            return self.expression(
+                exp.PartitionedByProperty,
+                this=self._parse_assignment(),
+            )
     class Generator(generator.Generator):
         QUERY_HINTS = False
         STRUCT_DELIMITER = ("(", ")")
@@ -1094,6 +1104,7 @@ class ClickHouse(Dialect):
             exp.DateStrToDate: rename_func("toDate"),
             exp.DateSub: _datetime_delta_sql("DATE_SUB"),
             exp.Explode: rename_func("arrayJoin"),
+            exp.FarmFingerprint: rename_func("farmFingerprint64"),
             exp.Final: lambda self, e: f"{self.sql(e, 'this')} FINAL",
             exp.IsNan: rename_func("isNaN"),
             exp.JSONCast: lambda self, e: f"{self.sql(e, 'this')}.:{self.sql(e, 'to')}",
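The `rename_func("farmFingerprint64")` entry means the ClickHouse generator emits the same call under a different function name. A toy model of that per-dialect name lookup is sketched below. The table layout and `render_call` are illustrative only, and treating BigQuery's `FARM_FINGERPRINT` spelling as the default name is an assumption rather than something this hunk shows.

```python
# Assumed default spelling; only the ClickHouse override comes from the hunk above.
DEFAULT_NAMES = {"FarmFingerprint": "FARM_FINGERPRINT"}

# Per-dialect overrides, mirroring the role of a rename_func entry.
DIALECT_OVERRIDES = {
    "clickhouse": {"FarmFingerprint": "farmFingerprint64"},
}


def render_call(dialect, node_name, args):
    """Render a function call using the dialect's name for the node."""
    overrides = DIALECT_OVERRIDES.get(dialect, {})
    name = overrides.get(node_name, DEFAULT_NAMES[node_name])
    return f"{name}({', '.join(args)})"


print(render_call("bigquery", "FarmFingerprint", ["x"]))
print(render_call("clickhouse", "FarmFingerprint", ["x"]))
```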

{sqlglot-27.8.0 → sqlglot-27.9.0}/sqlglot/dialects/dialect.py RENAMED

@@ -668,6 +668,7 @@ class Dialect(metaclass=_Dialect):
             exp.UnixMillis,
         },
         exp.DataType.Type.BINARY: {
+            exp.FromBase32,
             exp.FromBase64,
         },
         exp.DataType.Type.BOOLEAN: {
@@ -779,6 +780,7 @@ class Dialect(metaclass=_Dialect):
             exp.TimeToStr,
             exp.TimeToTimeStr,
             exp.Trim,
+            exp.ToBase32,
             exp.ToBase64,
             exp.TsOrDsToDateStr,
             exp.UnixToStr,
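The first hunk registers `exp.FromBase32` next to `exp.FromBase64` under `exp.DataType.Type.BINARY`. The second adds `exp.ToBase32` next to `exp.ToBase64` in a group whose key falls outside the hunk, presumably a string type. Python's standard `base64` module shows the same direction of types: encoding produces text, decoding produces bytes. This is an analogy only and does not call sqlglot.

```python
import base64

raw = b"sqlglot"

# "To base32": bytes in, ASCII text out (the string-typed direction).
encoded = base64.b32encode(raw).decode("ascii")

# "From base32": text in, raw bytes out (the BINARY-typed direction).
decoded = base64.b32decode(encoded)

print(type(encoded).__name__, type(decoded).__name__)
```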

{sqlglot-27.8.0 → sqlglot-27.9.0}/sqlglot/dialects/doris.py RENAMED

@@ -65,7 +65,11 @@ class Doris(MySQL):
             **MySQL.Parser.PROPERTY_PARSERS,
             "PROPERTIES": lambda self: self._parse_wrapped_properties(),
             "UNIQUE": lambda self: self._parse_composite_key_property(exp.UniqueKeyProperty),
+            # Plain KEY without UNIQUE/DUPLICATE/AGGREGATE prefixes should be treated as UniqueKeyProperty with unique=False
+            "KEY": lambda self: self._parse_composite_key_property(exp.UniqueKeyProperty),
             "PARTITION BY": lambda self: self._parse_partition_by_opt_range(),
+            "BUILD": lambda self: self._parse_build_property(),
+            "REFRESH": lambda self: self._parse_refresh_property(),
         }
         def _parse_partitioning_granularity_dynamic(self) -> exp.PartitionByRangePropertyDynamic:
@@ -104,9 +108,27 @@ class Doris(MySQL):
             part_range = self.expression(exp.PartitionRange, this=name, expressions=values)
             return self.expression(exp.Partition, expressions=[part_range])
+        def _parse_partition_definition_list(self) -> exp.Partition:
+            # PARTITION <name> VALUES IN (<value_csv>)
+            self._match_text_seq("PARTITION")
+            name = self._parse_id_var()
+            self._match_text_seq("VALUES", "IN")
+            values = self._parse_wrapped_csv(self._parse_expression)
+            part_list = self.expression(exp.PartitionList, this=name, expressions=values)
+            return self.expression(exp.Partition, expressions=[part_list])
         def _parse_partition_by_opt_range(
             self,
-        ) -> exp.PartitionedByProperty | exp.PartitionByRangeProperty:
+        ) -> exp.PartitionedByProperty | exp.PartitionByRangeProperty | exp.PartitionByListProperty:
+            if self._match_text_seq("LIST"):
+                return self.expression(
+                    exp.PartitionByListProperty,
+                    partition_expressions=self._parse_wrapped_id_vars(),
+                    create_expressions=self._parse_wrapped_csv(
+                        self._parse_partition_definition_list
+                    ),
+                )
             if not self._match_text_seq("RANGE"):
                 return super()._parse_partitioned_by()
@@ -128,6 +150,28 @@ class Doris(MySQL):
                 create_expressions=create_expressions,
             )
+        def _parse_build_property(self) -> exp.BuildProperty:
+            return self.expression(exp.BuildProperty, this=self._parse_var(upper=True))
+        def _parse_refresh_property(self) -> exp.RefreshTriggerProperty:
+            method = self._parse_var(upper=True)
+            self._match(TokenType.ON)
+            kind = self._match_texts(("MANUAL", "COMMIT", "SCHEDULE")) and self._prev.text.upper()
+            every = self._match_text_seq("EVERY") and self._parse_number()
+            unit = self._parse_var(any_token=True) if every else None
+            starts = self._match_text_seq("STARTS") and self._parse_string()
+            return self.expression(
+                exp.RefreshTriggerProperty,
+                method=method,
+                kind=kind,
+                every=every,
+                unit=unit,
+                starts=starts,
+            )
     class Generator(MySQL.Generator):
         LAST_DAY_SUPPORTS_DATE_PART = False
         VARCHAR_REQUIRES_SIZE = False
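`_parse_refresh_property` reads the clause in a fixed order: a method, an optional `ON` followed by `MANUAL`, `COMMIT`, or `SCHEDULE`, an optional `EVERY <number> <unit>`, and an optional `STARTS '<string>'`. The regex-based sketch below mirrors that order on a raw string. It is a simplified stand-in for the token-based parser, and the sample clauses are illustrative inputs rather than ones taken from the sqlglot test suite.

```python
import re

REFRESH_RE = re.compile(
    r"""
    REFRESH\s+(?P<method>\w+)
    (?:\s+ON\s+(?P<kind>MANUAL|COMMIT|SCHEDULE))?
    (?:\s+EVERY\s+(?P<every>\d+)\s+(?P<unit>\w+))?
    (?:\s+STARTS\s+'(?P<starts>[^']*)')?
    \s*$
    """,
    re.IGNORECASE | re.VERBOSE,
)


def parse_refresh(clause):
    """Split a REFRESH clause into method, kind, every, unit and starts."""
    match = REFRESH_RE.match(clause.strip())
    if match is None:
        raise ValueError(f"not a recognised REFRESH clause: {clause!r}")
    parts = match.groupdict()
    # The parser upper-cases the method and the trigger kind.
    for key in ("method", "kind"):
        if parts[key]:
            parts[key] = parts[key].upper()
    return parts


print(parse_refresh("REFRESH COMPLETE ON SCHEDULE EVERY 10 MINUTE STARTS '2025-01-01 00:00:00'"))
print(parse_refresh("refresh auto on manual"))
```

Clauses that omit `EVERY` or `STARTS` leave those entries as `None`, much as the parser leaves the corresponding arguments unset.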
@@ -145,7 +189,10 @@ class Doris(MySQL):
             **MySQL.Generator.PROPERTIES_LOCATION,
             exp.UniqueKeyProperty: exp.Properties.Location.POST_SCHEMA,
             exp.PartitionByRangeProperty: exp.Properties.Location.POST_SCHEMA,
+            exp.PartitionByListProperty: exp.Properties.Location.POST_SCHEMA,
             exp.PartitionedByProperty: exp.Properties.Location.POST_SCHEMA,
+            exp.BuildProperty: exp.Properties.Location.POST_SCHEMA,
+            exp.RefreshTriggerProperty: exp.Properties.Location.POST_SCHEMA,
         }
         CAST_MAPPING = {}
@@ -662,9 +709,18 @@ class Doris(MySQL):
             "year",
         }
+        def uniquekeyproperty_sql(
+            self, expression: exp.UniqueKeyProperty, prefix: str = "UNIQUE KEY"
+        ) -> str:
+            create_stmt = expression.find_ancestor(exp.Create)
+            if create_stmt and create_stmt.args["properties"].find(exp.MaterializedProperty):
+                return super().uniquekeyproperty_sql(expression, prefix="KEY")
+            return super().uniquekeyproperty_sql(expression)
         def partition_sql(self, expression: exp.Partition) -> str:
             parent = expression.parent
-            if isinstance(parent, exp.PartitionByRangeProperty):
+            if isinstance(parent, (exp.PartitionByRangeProperty, exp.PartitionByListProperty)):
                 return ", ".join(self.sql(e) for e in expression.expressions)
             return super().partition_sql(expression)
@@ -685,7 +741,9 @@ class Doris(MySQL):
             return f"PARTITION {name} VALUES LESS THAN ({self.sql(values[0])})"
-        def partitionbyrangepropertydynamic_sql(self, expression):
+        def partitionbyrangepropertydynamic_sql(
+            self, expression: exp.PartitionByRangePropertyDynamic
+        ) -> str:
             # Generates: FROM ("start") TO ("end") INTERVAL N UNIT
             start = self.sql(expression, "start")
             end = self.sql(expression, "end")
@@ -699,15 +757,25 @@ class Doris(MySQL):
             return f"FROM ({start}) TO ({end}) {interval}"
-        def partitionbyrangeproperty_sql(self, expression):
-            partition_expressions = ", ".join(
-                self.sql(e) for e in expression.args.get("partition_expressions") or []
+        def partitionbyrangeproperty_sql(self, expression: exp.PartitionByRangeProperty) -> str:
+            partition_expressions = self.expressions(
+                expression, key="partition_expressions", indent=False
             )
-            create_expressions = expression.args.get("create_expressions") or []
-            # Handle both static and dynamic partition definitions
-            create_sql = ", ".join(self.sql(e) for e in create_expressions)
+            create_sql = self.expressions(expression, key="create_expressions", indent=False)
             return f"PARTITION BY RANGE ({partition_expressions}) ({create_sql})"
+        def partitionbylistproperty_sql(self, expression: exp.PartitionByListProperty) -> str:
+            partition_expressions = self.expressions(
+                expression, key="partition_expressions", indent=False
+            )
+            create_sql = self.expressions(expression, key="create_expressions", indent=False)
+            return f"PARTITION BY LIST ({partition_expressions}) ({create_sql})"
+        def partitionlist_sql(self, expression: exp.PartitionList) -> str:
+            name = self.sql(expression, "this")
+            values = self.expressions(expression, indent=False)
+            return f"PARTITION {name} VALUES IN ({values})"
         def partitionedbyproperty_sql(self, expression: exp.PartitionedByProperty) -> str:
             node = expression.this
             if isinstance(node, exp.Schema):
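The two new generator methods, `partitionlist_sql` and `partitionbylistproperty_sql`, compose by string formatting: each partition renders as `PARTITION <name> VALUES IN (...)`, and the property wraps the column list and the partition list in `PARTITION BY LIST (...) (...)`. The sketch below reproduces those two f-strings on plain Python values, assuming items are joined with a comma and a space. The column and partition names are made up for illustration.

```python
def partition_list_sql(name, values):
    """Render one list partition, as partitionlist_sql does."""
    return f"PARTITION {name} VALUES IN ({', '.join(values)})"


def partition_by_list_sql(columns, partitions):
    """Render the whole property, as partitionbylistproperty_sql does."""
    partition_expressions = ", ".join(columns)
    create_sql = ", ".join(
        partition_list_sql(name, values) for name, values in partitions
    )
    return f"PARTITION BY LIST ({partition_expressions}) ({create_sql})"


print(
    partition_by_list_sql(
        ["city"],
        [("p_east", ["'NYC'", "'Boston'"]), ("p_west", ["'LA'"])],
    )
)
```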
