@clickzetta/cz-cli-darwin-x64 0.3.80 → 0.3.81

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (201)
  1. package/bin/cz-cli +0 -0
  2. package/package.json +1 -1
  3. package/bin/skills/clickzetta-access-control/LICENSE +0 -16
  4. package/bin/skills/clickzetta-access-control/SKILL.md +0 -243
  5. package/bin/skills/clickzetta-access-control/eval_cases.jsonl +0 -3
  6. package/bin/skills/clickzetta-access-control/references/dynamic-masking.md +0 -86
  7. package/bin/skills/clickzetta-access-control/references/grant-revoke.md +0 -103
  8. package/bin/skills/clickzetta-access-control/references/role-management.md +0 -66
  9. package/bin/skills/clickzetta-access-control/references/user-management.md +0 -61
  10. package/bin/skills/clickzetta-app-python-sdk/LICENSE +0 -16
  11. package/bin/skills/clickzetta-app-python-sdk/SKILL.md +0 -153
  12. package/bin/skills/clickzetta-app-python-sdk/eval_cases.jsonl +0 -12
  13. package/bin/skills/clickzetta-app-python-sdk/references/bulkload.md +0 -196
  14. package/bin/skills/clickzetta-app-python-sdk/references/connector.md +0 -143
  15. package/bin/skills/clickzetta-app-python-sdk/references/realtime.md +0 -122
  16. package/bin/skills/clickzetta-batch-sync-pipeline/LICENSE +0 -16
  17. package/bin/skills/clickzetta-batch-sync-pipeline/SKILL.md +0 -227
  18. package/bin/skills/clickzetta-batch-sync-pipeline/eval_cases.jsonl +0 -5
  19. package/bin/skills/clickzetta-bi-connect/LICENSE +0 -16
  20. package/bin/skills/clickzetta-bi-connect/SKILL.md +0 -176
  21. package/bin/skills/clickzetta-bi-connect/eval_cases.jsonl +0 -5
  22. package/bin/skills/clickzetta-bi-connect/references/bi-tools.md +0 -170
  23. package/bin/skills/clickzetta-cdc-sync-pipeline/LICENSE +0 -16
  24. package/bin/skills/clickzetta-cdc-sync-pipeline/SKILL.md +0 -633
  25. package/bin/skills/clickzetta-cdc-sync-pipeline/eval_cases.jsonl +0 -5
  26. package/bin/skills/clickzetta-data-ingest-pipeline/LICENSE +0 -16
  27. package/bin/skills/clickzetta-data-ingest-pipeline/SKILL.md +0 -237
  28. package/bin/skills/clickzetta-data-ingest-pipeline/eval_cases.jsonl +0 -5
  29. package/bin/skills/clickzetta-data-retention/LICENSE +0 -16
  30. package/bin/skills/clickzetta-data-retention/SKILL.md +0 -160
  31. package/bin/skills/clickzetta-data-retention/eval_cases.jsonl +0 -5
  32. package/bin/skills/clickzetta-data-retention/references/lifecycle-reference.md +0 -175
  33. package/bin/skills/clickzetta-data-science/LICENSE +0 -16
  34. package/bin/skills/clickzetta-data-science/SKILL.md +0 -125
  35. package/bin/skills/clickzetta-data-science/eval_cases.jsonl +0 -12
  36. package/bin/skills/clickzetta-data-science/references/bitmap-profile.md +0 -146
  37. package/bin/skills/clickzetta-data-science/references/data-patterns.md +0 -110
  38. package/bin/skills/clickzetta-data-science/references/setup.md +0 -160
  39. package/bin/skills/clickzetta-data-science/references/stats-functions.md +0 -195
  40. package/bin/skills/clickzetta-data-science/references/write-and-infer.md +0 -122
  41. package/bin/skills/clickzetta-data-science/references/zettapark-api.md +0 -156
  42. package/bin/skills/clickzetta-data-sharing/LICENSE +0 -16
  43. package/bin/skills/clickzetta-data-sharing/SKILL.md +0 -160
  44. package/bin/skills/clickzetta-data-sharing/eval_cases.jsonl +0 -3
  45. package/bin/skills/clickzetta-data-sharing/references/share-ddl.md +0 -134
  46. package/bin/skills/clickzetta-dba-guide/LICENSE +0 -16
  47. package/bin/skills/clickzetta-dba-guide/SKILL.md +0 -542
  48. package/bin/skills/clickzetta-dba-guide/eval_cases.jsonl +0 -3
  49. package/bin/skills/clickzetta-dw-modeling/LICENSE +0 -16
  50. package/bin/skills/clickzetta-dw-modeling/SKILL.md +0 -351
  51. package/bin/skills/clickzetta-dw-modeling/eval_cases.jsonl +0 -4
  52. package/bin/skills/clickzetta-dw-modeling/references/modeling-patterns.md +0 -100
  53. package/bin/skills/clickzetta-dynamic-table/LICENSE +0 -16
  54. package/bin/skills/clickzetta-dynamic-table/SKILL.md +0 -230
  55. package/bin/skills/clickzetta-dynamic-table/best-practices/dimension-table-join-guide.md +0 -253
  56. package/bin/skills/clickzetta-dynamic-table/best-practices/medallion-and-stream-patterns.md +0 -124
  57. package/bin/skills/clickzetta-dynamic-table/best-practices/non-partitioned-merge-into-warning.md +0 -96
  58. package/bin/skills/clickzetta-dynamic-table/best-practices/performance-optimization.md +0 -109
  59. package/bin/skills/clickzetta-dynamic-table/best-practices/scheduling-guide.md +0 -135
  60. package/bin/skills/clickzetta-dynamic-table/dt-creator/SKILL.md +0 -15
  61. package/bin/skills/clickzetta-dynamic-table/dt-creator/references/dt-declaration-strategy.md +0 -185
  62. package/bin/skills/clickzetta-dynamic-table/dt-creator/references/incremental-config-reference.md +0 -427
  63. package/bin/skills/clickzetta-dynamic-table/dt-creator/references/refresh-history-guide.md +0 -260
  64. package/bin/skills/clickzetta-dynamic-table/dt-creator/references/sql-limitations.md +0 -80
  65. package/bin/skills/clickzetta-dynamic-table/dynamic-table-alter/SKILL.md +0 -190
  66. package/bin/skills/clickzetta-dynamic-table/eval_cases.jsonl +0 -5
  67. package/bin/skills/clickzetta-dynamic-table/sql-to-dt/SKILL.md +0 -27
  68. package/bin/skills/clickzetta-dynamic-table/sql-to-dt/references/sql2dt-column-validation-rules.md +0 -118
  69. package/bin/skills/clickzetta-dynamic-table/sql-to-dt/references/sql2dt-conversion-rules.md +0 -225
  70. package/bin/skills/clickzetta-dynamic-table/sql-to-dt/references/sql2dt-placeholder-rules.md +0 -182
  71. package/bin/skills/clickzetta-dynamic-table/sql-to-dt/references/sql2dt-refresh-rules.md +0 -98
  72. package/bin/skills/clickzetta-dynamic-table/sql-to-dt/references/sql2dt-self-reference-rules.md +0 -76
  73. package/bin/skills/clickzetta-dynamic-table/sql-to-dt/references/sql2dt-workflow.md +0 -109
  74. package/bin/skills/clickzetta-external-catalog/LICENSE +0 -16
  75. package/bin/skills/clickzetta-external-catalog/SKILL.md +0 -123
  76. package/bin/skills/clickzetta-external-catalog/eval_cases.jsonl +0 -5
  77. package/bin/skills/clickzetta-external-catalog/references/external-catalog-ddl.md +0 -130
  78. package/bin/skills/clickzetta-external-function/LICENSE +0 -16
  79. package/bin/skills/clickzetta-external-function/SKILL.md +0 -203
  80. package/bin/skills/clickzetta-external-function/eval_cases.jsonl +0 -4
  81. package/bin/skills/clickzetta-external-function/references/external-function-ddl.md +0 -171
  82. package/bin/skills/clickzetta-file-import-pipeline/LICENSE +0 -16
  83. package/bin/skills/clickzetta-file-import-pipeline/SKILL.md +0 -190
  84. package/bin/skills/clickzetta-file-import-pipeline/eval_cases.jsonl +0 -5
  85. package/bin/skills/clickzetta-index-manager/LICENSE +0 -16
  86. package/bin/skills/clickzetta-index-manager/SKILL.md +0 -140
  87. package/bin/skills/clickzetta-index-manager/eval_cases.jsonl +0 -5
  88. package/bin/skills/clickzetta-index-manager/references/bloomfilter-index.md +0 -67
  89. package/bin/skills/clickzetta-index-manager/references/index-management.md +0 -73
  90. package/bin/skills/clickzetta-index-manager/references/inverted-index.md +0 -80
  91. package/bin/skills/clickzetta-index-manager/references/vector-index.md +0 -81
  92. package/bin/skills/clickzetta-java-sdk/LICENSE +0 -16
  93. package/bin/skills/clickzetta-java-sdk/SKILL.md +0 -186
  94. package/bin/skills/clickzetta-java-sdk/eval_cases.jsonl +0 -12
  95. package/bin/skills/clickzetta-java-sdk/references/bulkload.md +0 -163
  96. package/bin/skills/clickzetta-java-sdk/references/realtime.md +0 -212
  97. package/bin/skills/clickzetta-kafka-ingest-pipeline/LICENSE +0 -16
  98. package/bin/skills/clickzetta-kafka-ingest-pipeline/SKILL.md +0 -769
  99. package/bin/skills/clickzetta-kafka-ingest-pipeline/eval_cases.jsonl +0 -5
  100. package/bin/skills/clickzetta-kafka-ingest-pipeline/references/kafka-pipe-syntax.md +0 -324
  101. package/bin/skills/clickzetta-lakehouse-connect/LICENSE +0 -16
  102. package/bin/skills/clickzetta-lakehouse-connect/SKILL.md +0 -218
  103. package/bin/skills/clickzetta-lakehouse-connect/eval_cases.jsonl +0 -3
  104. package/bin/skills/clickzetta-lakehouse-connect/evals/evals.json +0 -35
  105. package/bin/skills/clickzetta-lakehouse-connect/references/config-file.md +0 -435
  106. package/bin/skills/clickzetta-lakehouse-connect/references/jdbc.md +0 -478
  107. package/bin/skills/clickzetta-lakehouse-connect/references/python-sdk.md +0 -225
  108. package/bin/skills/clickzetta-lakehouse-connect/references/sqlalchemy.md +0 -468
  109. package/bin/skills/clickzetta-lakehouse-connect/references/zettapark-session.md +0 -445
  110. package/bin/skills/clickzetta-manage-comments/LICENSE +0 -16
  111. package/bin/skills/clickzetta-manage-comments/SKILL.md +0 -219
  112. package/bin/skills/clickzetta-manage-comments/eval_cases.jsonl +0 -3
  113. package/bin/skills/clickzetta-metadata/LICENSE +0 -16
  114. package/bin/skills/clickzetta-metadata/SKILL.md +0 -502
  115. package/bin/skills/clickzetta-metadata/eval_cases.jsonl +0 -5
  116. package/bin/skills/clickzetta-metadata/references/instance-views-reference.md +0 -276
  117. package/bin/skills/clickzetta-metadata/references/metering-views-reference.md +0 -137
  118. package/bin/skills/clickzetta-metadata/references/show-desc-reference.md +0 -326
  119. package/bin/skills/clickzetta-metadata/references/views-reference.md +0 -271
  120. package/bin/skills/clickzetta-monitoring/LICENSE +0 -16
  121. package/bin/skills/clickzetta-monitoring/SKILL.md +0 -215
  122. package/bin/skills/clickzetta-monitoring/eval_cases.jsonl +0 -5
  123. package/bin/skills/clickzetta-monitoring/references/job-history-analysis.md +0 -97
  124. package/bin/skills/clickzetta-monitoring/references/show-jobs.md +0 -48
  125. package/bin/skills/clickzetta-oss-ingest-pipeline/LICENSE +0 -16
  126. package/bin/skills/clickzetta-oss-ingest-pipeline/SKILL.md +0 -562
  127. package/bin/skills/clickzetta-oss-ingest-pipeline/eval_cases.jsonl +0 -5
  128. package/bin/skills/clickzetta-overview/LICENSE +0 -16
  129. package/bin/skills/clickzetta-overview/SKILL.md +0 -102
  130. package/bin/skills/clickzetta-overview/eval_cases.jsonl +0 -5
  131. package/bin/skills/clickzetta-overview/references/brands-and-endpoints.md +0 -79
  132. package/bin/skills/clickzetta-overview/references/object-model.md +0 -311
  133. package/bin/skills/clickzetta-overview/references/studio-modules.md +0 -173
  134. package/bin/skills/clickzetta-pipeline-review/LICENSE +0 -16
  135. package/bin/skills/clickzetta-pipeline-review/SKILL.md +0 -377
  136. package/bin/skills/clickzetta-query-optimizer/LICENSE +0 -16
  137. package/bin/skills/clickzetta-query-optimizer/SKILL.md +0 -156
  138. package/bin/skills/clickzetta-query-optimizer/eval_cases.jsonl +0 -5
  139. package/bin/skills/clickzetta-query-optimizer/references/explain.md +0 -56
  140. package/bin/skills/clickzetta-query-optimizer/references/hints-and-sortkey.md +0 -78
  141. package/bin/skills/clickzetta-query-optimizer/references/optimize.md +0 -65
  142. package/bin/skills/clickzetta-query-optimizer/references/result-cache.md +0 -49
  143. package/bin/skills/clickzetta-query-optimizer/references/show-jobs.md +0 -42
  144. package/bin/skills/clickzetta-realtime-sync-pipeline/LICENSE +0 -16
  145. package/bin/skills/clickzetta-realtime-sync-pipeline/SKILL.md +0 -323
  146. package/bin/skills/clickzetta-realtime-sync-pipeline/eval_cases.jsonl +0 -5
  147. package/bin/skills/clickzetta-semantic-view/LICENSE +0 -16
  148. package/bin/skills/clickzetta-semantic-view/SKILL.md +0 -207
  149. package/bin/skills/clickzetta-semantic-view/eval_cases.jsonl +0 -12
  150. package/bin/skills/clickzetta-semantic-view/references/semantic-view-reference.md +0 -167
  151. package/bin/skills/clickzetta-spark-flink-connector/LICENSE +0 -16
  152. package/bin/skills/clickzetta-spark-flink-connector/SKILL.md +0 -92
  153. package/bin/skills/clickzetta-spark-flink-connector/eval_cases.jsonl +0 -5
  154. package/bin/skills/clickzetta-spark-flink-connector/references/flink.md +0 -147
  155. package/bin/skills/clickzetta-spark-flink-connector/references/spark.md +0 -132
  156. package/bin/skills/clickzetta-sql-pipeline-manager/LICENSE +0 -16
  157. package/bin/skills/clickzetta-sql-pipeline-manager/SKILL.md +0 -485
  158. package/bin/skills/clickzetta-sql-pipeline-manager/eval_cases.jsonl +0 -12
  159. package/bin/skills/clickzetta-sql-pipeline-manager/evals/evals.json +0 -166
  160. package/bin/skills/clickzetta-sql-pipeline-manager/references/dynamic-table.md +0 -185
  161. package/bin/skills/clickzetta-sql-pipeline-manager/references/materialized-view.md +0 -129
  162. package/bin/skills/clickzetta-sql-pipeline-manager/references/pipe.md +0 -222
  163. package/bin/skills/clickzetta-sql-pipeline-manager/references/table-stream.md +0 -125
  164. package/bin/skills/clickzetta-sql-syntax-guide/LICENSE +0 -16
  165. package/bin/skills/clickzetta-sql-syntax-guide/SKILL.md +0 -249
  166. package/bin/skills/clickzetta-sql-syntax-guide/eval_cases.jsonl +0 -3
  167. package/bin/skills/clickzetta-sql-syntax-guide/references/ddl-reference.md +0 -350
  168. package/bin/skills/clickzetta-sql-syntax-guide/references/dml-reference.md +0 -279
  169. package/bin/skills/clickzetta-sql-syntax-guide/references/dql-reference.md +0 -504
  170. package/bin/skills/clickzetta-sql-syntax-guide/references/functions-reference.md +0 -372
  171. package/bin/skills/clickzetta-sql-syntax-guide/references/migration-databricks.md +0 -260
  172. package/bin/skills/clickzetta-sql-syntax-guide/references/migration-snowflake.md +0 -382
  173. package/bin/skills/clickzetta-sql-syntax-guide/references/vs-snowflake.md +0 -346
  174. package/bin/skills/clickzetta-sql-syntax-guide/references/vs-spark.md +0 -229
  175. package/bin/skills/clickzetta-studio-task-manager/LICENSE +0 -16
  176. package/bin/skills/clickzetta-studio-task-manager/SKILL.md +0 -652
  177. package/bin/skills/clickzetta-table-lineage/LICENSE +0 -16
  178. package/bin/skills/clickzetta-table-lineage/SKILL.md +0 -90
  179. package/bin/skills/clickzetta-table-lineage/eval_cases.jsonl +0 -1
  180. package/bin/skills/clickzetta-table-lineage/references/normalize_func.sql +0 -14
  181. package/bin/skills/clickzetta-table-lineage/references/table_cost.sql +0 -38
  182. package/bin/skills/clickzetta-table-lineage/references/table_lineage_standalone.html +0 -562
  183. package/bin/skills/clickzetta-table-lineage/references/table_relation.sql +0 -25
  184. package/bin/skills/clickzetta-table-stream-pipeline/LICENSE +0 -16
  185. package/bin/skills/clickzetta-table-stream-pipeline/SKILL.md +0 -206
  186. package/bin/skills/clickzetta-table-stream-pipeline/eval_cases.jsonl +0 -5
  187. package/bin/skills/clickzetta-vcluster-manager/LICENSE +0 -16
  188. package/bin/skills/clickzetta-vcluster-manager/SKILL.md +0 -212
  189. package/bin/skills/clickzetta-vcluster-manager/eval_cases.jsonl +0 -5
  190. package/bin/skills/clickzetta-vcluster-manager/references/vc-cache.md +0 -54
  191. package/bin/skills/clickzetta-vcluster-manager/references/vcluster-ddl.md +0 -150
  192. package/bin/skills/clickzetta-volume-manager/LICENSE +0 -16
  193. package/bin/skills/clickzetta-volume-manager/SKILL.md +0 -292
  194. package/bin/skills/clickzetta-volume-manager/eval_cases.jsonl +0 -5
  195. package/bin/skills/clickzetta-volume-manager/references/volume-ddl.md +0 -199
  196. package/bin/skills/clickzetta-zettapark/LICENSE +0 -16
  197. package/bin/skills/clickzetta-zettapark/SKILL.md +0 -248
  198. package/bin/skills/clickzetta-zettapark/eval_cases.jsonl +0 -12
  199. package/bin/skills/clickzetta-zettapark/references/zettapark-api.md +0 -283
  200. package/bin/skills/cz-cli/SKILL.md +0 -311
  201. package/bin/skills/cz-cli/references/profile-setup.md +0 -120
@@ -1,225 +0,0 @@
1
- # SQL → Dynamic Table Conversion Rules
2
-
3
- You are a SQL conversion expert. Given a Hive/Spark SQL CREATE TABLE DDL and the corresponding INSERT OVERWRITE statement, merge them into a single Dynamic Table DDL according to the rules below.
4
-
5
- ## Overall Conversion Formula
6
-
7
- ```
8
- Input 1: CREATE TABLE schema.table_name (...) PARTITIONED BY (...) ...
9
- Input 2: INSERT OVERWRITE TABLE schema.table_name PARTITION(...) SELECT ... FROM ...
10
- Output: CREATE OR REPLACE DYNAMIC TABLE schema.table_name (...) PARTITIONED BY (...) ... AS SELECT ... FROM ...
11
- ```
12
-
13
- Core idea: combine the structure definition from CREATE TABLE with the query logic from INSERT OVERWRITE into a single `CREATE OR REPLACE DYNAMIC TABLE ... AS SELECT ...` statement.
14
-
15
- ## Step 1: Parse the CREATE TABLE DDL
16
-
17
- Extract the following from the DDL:
18
-
19
- 1. **Table name** (including schema): `schema.table_name`
20
- 2. **Regular columns**: column name, data type, COMMENT (preserve the original indentation)
21
- 3. **Partition columns**: column name, data type, and COMMENT from PARTITIONED BY
22
- 4. **Storage format**: STORED AS PARQUET/ORC/AVRO, etc.
23
- 5. **Table properties**: key-value pairs from TBLPROPERTIES or WITH PROPERTIES
24
- 6. **Bucketing**: CLUSTERED BY / SORTED BY / RANGE CLUSTERED BY / HASH CLUSTERED BY
25
- 7. **Lifecycle**: LIFECYCLE N
26
- 8. **Connection**: CONNECTION schema.connection_name
27
- 9. **Location**: LOCATION 'path'
28
-
29
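- ## Step 2: Parse the INSERT OVERWRITE Statement
30
-
31
- Extract the following from the INSERT statement:
32
-
33
- 1. **Target table name**: used for self-reference detection
34
- 2. **Partition type**:
35
- - Dynamic partition: `PARTITION (col1, col2)` (column names without values)
36
- - Static partition: `PARTITION (col1='value1', col2=value2)` (column names with values)
37
- - Mixed partition: `PARTITION (static_col='value', dynamic_col)` (only some columns have values)
38
- 3. **SELECT query**: the complete query logic (including WHERE, JOIN, GROUP BY, etc.)
39
- 4. **CTE (WITH clause)**: if present, keep the complete WITH ... AS (...) structure
40
- 5. **Preceding statements**: SET statements, CREATE TEMPORARY FUNCTION, etc. (keep them)
41
-
42
- ### Statements to Filter Out
43
-
44
- Remove the following from the INSERT file:
45
- - `ALTER TABLE ... ADD PARTITION ...`
46
- - `ALTER TABLE ... DROP PARTITION ...`
47
- - Any other statement that starts with `ALTER TABLE`
48
- - `ANALYZE TABLE` statements
49
- - SQL comments (`--` and `/* */`)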
50
-
51
- ## Step 3: Assemble the Dynamic Table DDL
52
-
53
- Assemble the output in the following order:
54
-
55
- ```sql
56
- -- Optional: uncomment the next line to drop an existing table with the same name
57
- -- DROP TABLE IF EXISTS schema.table_name;
58
-
59
- CREATE SCHEMA IF NOT EXISTS schema; -- only when the table name includes a schema
60
- CREATE OR REPLACE DYNAMIC TABLE schema.table_name (
61
- col1 BIGINT COMMENT '...', -- regular columns (keep the original formatting)
62
- col2 STRING COMMENT '...',
63
- part_col1 STRING COMMENT '...' -- partition columns are appended after the regular columns
64
- )
65
- PARTITIONED BY (part_col1, part_col2) -- column names only, no types
66
- [CLUSTERED BY (...) [SORTED BY (...)] [INTO N BUCKETS]]
67
- [STORED AS PARQUET]
68
- TBLPROPERTIES ('key' = 'value') -- merged template and original properties
69
- [LIFECYCLE N]
70
- [CONNECTION schema.connection_name]
71
- [LOCATION 'original_path_dt'] -- original path with a _dt suffix
72
- AS
73
- <SELECT query>; -- the query from INSERT OVERWRITE
74
- ```
75
-
76
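- ### Key Rules
77
-
78
- 1. **Column definitions**: merge regular and partition columns into a single parenthesized list, preserving the original indentation
79
- 2. **PARTITIONED BY**: list column names only, without types (unlike CREATE TABLE)
80
- 3. **CREATE SCHEMA**: if the table name contains `.` (e.g. `kscdm.table_name`), add `CREATE SCHEMA IF NOT EXISTS kscdm;` before the DDL
81
- 4. **LOCATION**: append the `_dt` suffix to the original path
82
- 5. **DROP statement**: place the commented-out `DROP TABLE IF EXISTS` at the very top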
83
-
84
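- ## Step 4: Static Partition Injection
85
-
86
- When INSERT OVERWRITE uses static partitions (`PARTITION(col=value)`), the partition values must be injected into the SELECT clause.
87
-
88
- ### Injection Rules
89
-
90
- Append the partition columns after the last SELECT column and before FROM, in the order the partition columns are defined in the DDL:
91
-
92
- ```sql
93
- -- Original SELECT
94
- SELECT col1, col2 FROM source_table
95
-
96
- -- After injection (assuming PARTITION(year=2024, month='January'))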
97
- SELECT col1, col2,
98
- 2024 AS year,
99
- 'January' AS month
100
- FROM source_table
101
- ```
102
-
103
- ### Value Type Handling
104
-
105
- During injection, the value's type determines whether it is quoted:
106
-
107
- | Value type | Detection rule | Handling | Example |
108
- |--------|----------|------|------|
109
- | Already quoted | Starts and ends with `'` or `"` | Keep as-is | `'hello'` → `'hello'` |
110
- | NULL | Value is `NULL` (case-insensitive) | No quotes | `NULL` |
111
- | Boolean | `true` / `false` (case-insensitive) | No quotes | `true` |
112
- | Number | Parseable by `float()` | No quotes | `123`, `-45.67`, `1.23e-4` |
113
- | SESSION_CONFIGS | Contains `SESSION_CONFIGS(` | No quotes | `SESSION_CONFIGS()['dt.args.ds']` |
114
- | Function call | Matches `identifier(...)` with balanced parentheses | No quotes | `CURRENT_DATE()`, `YEAR(col)` |
115
- | String | None of the above | Wrap in single quotes, escaping inner `'` as `''` | `hello` → `'hello'` |
116
-
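The value-type table above can be read as an ordered chain of checks. The following is a minimal Python sketch of that chain, not the tool's actual implementation: the names `format_partition_value`, `_is_number`, `_balanced`, and `_CALL_RE` are hypothetical, and the "already quoted" check additionally assumes the opening and closing quote characters match.

```python
import re

# Hypothetical pattern for the "function call" row: identifier followed by (...).
_CALL_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_.]*\(.*\)$", re.DOTALL)


def _is_number(text: str) -> bool:
    # Mirrors the "parseable by float()" rule; note that float() also
    # accepts values such as "inf" and "nan".
    try:
        float(text)
    except ValueError:
        return False
    return True


def _balanced(text: str) -> bool:
    # True when every "(" has a matching ")" and none closes too early.
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def format_partition_value(raw: str) -> str:
    """Return the SQL text to inject for one static partition value."""
    value = raw.strip()
    # Already quoted: keep as-is (assumption: both ends use the same quote).
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
        return value
    # NULL and booleans are case-insensitive and stay unquoted.
    if value.upper() == "NULL" or value.lower() in ("true", "false"):
        return value
    if _is_number(value):
        return value
    if "SESSION_CONFIGS(" in value:
        return value
    if _CALL_RE.match(value) and _balanced(value):
        return value
    # Fallback: plain string, single-quoted with inner quotes doubled.
    return "'" + value.replace("'", "''") + "'"
```

The order of the checks matters: a value such as `SESSION_CONFIGS()['dt.args.ds']` must be recognized before the string fallback, otherwise it would be wrapped in quotes and injected as a literal.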
117
- ### UNION ALL Handling
118
-
119
- If the SELECT contains UNION ALL, inject the partition columns into each branch independently:
120
-
121
- ```sql
122
- SELECT col1, col2,
123
- 2024 AS year
124
- FROM table_a
125
- UNION ALL
126
- SELECT col1, col2,
127
- 2024 AS year
128
- FROM table_b
129
- ```
130
-
131
- ### CTE + UNION ALL
132
-
133
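- If there is a WITH clause, split off the CTE part first and inject only into the UNION branches of the main query.
134
-
135
- ### Partition Columns Already Present
136
-
137
- If the SELECT already contains a partition column (detected via `AS alias` or a trailing identifier), skip injection for that column to avoid duplicates.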
138
-
139
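- ## Step 5: Date Function Post-processing
140
-
141
- After the DDL is generated, apply a global replacement over the entire DDL text:
142
-
143
- | Original form | Replaced with |
144
- |----------|--------|
145
- | `DATE_SUB(expr, INTERVAL N DAY)` | `sub_days(expr, N)` |
146
- | `DATE_ADD(expr, INTERVAL N DAY)` | `sub_days(expr, -N)` |
147
-
148
- This step ensures the final output consistently uses the `sub_days` function.
149
-
150
- > Note: in the SQL engine, `SUB_DAYS` is an alias of `DATE_SUB`; the two are equivalent. `sub_days` is used throughout to keep the output consistent.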
151
-
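A global replacement of this kind could be sketched with a single regular expression. The Python below is an illustration only: `rewrite_date_functions` and `_INTERVAL_RE` are hypothetical names, and the pattern assumes the `INTERVAL N DAY` form with a literal integer N. It does not parse SQL, so a `DATE_SUB`/`DATE_ADD` call in some other form appearing earlier in the same statement could confuse it.

```python
import re

# Hypothetical pattern: DATE_SUB/DATE_ADD(<expr>, INTERVAL <N> DAY).
_INTERVAL_RE = re.compile(
    r"\b(DATE_SUB|DATE_ADD)\s*\(\s*(.+?)\s*,\s*INTERVAL\s+(-?\d+)\s+DAY\s*\)",
    re.IGNORECASE | re.DOTALL,
)


def rewrite_date_functions(ddl: str) -> str:
    """Rewrite DATE_SUB/DATE_ADD interval calls to sub_days over the whole text."""

    def _replace(match: re.Match) -> str:
        func = match.group(1).upper()
        expr = match.group(2)
        n = int(match.group(3))
        # DATE_SUB keeps the sign; DATE_ADD is expressed as a negative subtraction.
        days = n if func == "DATE_SUB" else -n
        return f"sub_days({expr}, {days})"

    return _INTERVAL_RE.sub(_replace, ddl)


print(rewrite_date_functions("WHERE dt = DATE_ADD(ds, INTERVAL 7 DAY)"))
# WHERE dt = sub_days(ds, -7)
```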
152
- ## Step 6: Table Property Template Merging
153
-
154
- Default template property: `data_lifecycle = 15`
155
-
156
- Merge rules:
157
- - Start from the template properties
158
- - TBLPROPERTIES from the original DDL override template properties with the same name
159
- - Write the final result to TBLPROPERTIES
160
-
161
- ```sql
162
- -- Template: data_lifecycle=15
163
- -- Original DDL: TBLPROPERTIES('compression'='snappy', 'data_lifecycle'='30')
164
- -- Merged result:
165
- TBLPROPERTIES ('data_lifecycle' = '30', 'compression' = 'snappy')
166
- -- data_lifecycle keeps the original value 30; compression comes from the original DDL
167
- ```
168
-
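The merge rule amounts to a dictionary update in which the original DDL wins on key conflicts. A minimal Python sketch, with hypothetical helper names (`merge_tblproperties`, `render_tblproperties`) and all property values assumed to be strings:

```python
# Default template from the rules above; values are kept as strings.
DEFAULT_TEMPLATE = {"data_lifecycle": "15"}


def merge_tblproperties(original, template=None):
    """Start from the template, then let the original DDL override same-named keys."""
    merged = dict(DEFAULT_TEMPLATE if template is None else template)
    merged.update(original)
    return merged


def render_tblproperties(props):
    """Render the merged properties as a TBLPROPERTIES clause."""
    body = ", ".join(f"'{key}' = '{value}'" for key, value in props.items())
    return f"TBLPROPERTIES ({body})"


merged = merge_tblproperties({"compression": "snappy", "data_lifecycle": "30"})
print(render_tblproperties(merged))
# TBLPROPERTIES ('data_lifecycle' = '30', 'compression' = 'snappy')
```

Because Python dictionaries preserve insertion order, template keys come first and keys found only in the original DDL are appended, which matches the ordering in the merged example above.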
169
- ## Complete Example
170
-
171
- ### Input 1: DDL
172
- ```sql
173
- CREATE TABLE IF NOT EXISTS sales_data (
174
- id BIGINT COMMENT 'Sales record ID',
175
- product_name STRING COMMENT 'Product name',
176
- sales_amount DECIMAL(12,2) COMMENT 'Sales amount'
177
- )
178
- PARTITIONED BY (
179
- year INT COMMENT 'Year',
180
- month INT COMMENT 'Month'
181
- )
182
- STORED AS PARQUET
183
- LOCATION '/data/warehouse/sales_data';
184
- ```
185
-
186
- ### Input 2: INSERT OVERWRITE
187
- ```sql
188
- INSERT OVERWRITE TABLE sales_data
189
- PARTITION (year, month)
190
- SELECT
191
- s.id,
192
- s.product_name,
193
- s.price * s.quantity AS sales_amount,
194
- YEAR(s.sales_date) AS year,
195
- MONTH(s.sales_date) AS month
196
- FROM raw_sales s
197
- WHERE s.status = 'completed';
198
- ```
199
-
200
- ### Output: Dynamic Table DDL
201
- ```sql
202
- -- Optional: uncomment the next line to drop an existing table with the same name
203
- -- DROP TABLE IF EXISTS sales_data;
204
-
205
- CREATE OR REPLACE DYNAMIC TABLE sales_data (
206
- id BIGINT COMMENT 'Sales record ID',
207
- product_name STRING COMMENT 'Product name',
208
- sales_amount DECIMAL(12,2) COMMENT 'Sales amount',
209
- year INT COMMENT 'Year',
210
- month INT COMMENT 'Month'
211
- )
212
- PARTITIONED BY (year, month)
213
- STORED AS PARQUET
214
- TBLPROPERTIES ('data_lifecycle' = '15')
215
- LOCATION '/data/warehouse/sales_data_dt'
216
- AS
217
- SELECT
218
- s.id,
219
- s.product_name,
220
- s.price * s.quantity AS sales_amount,
221
- YEAR(s.sales_date) AS year,
222
- MONTH(s.sales_date) AS month
223
- FROM raw_sales s
224
- WHERE s.status = 'completed';
225
- ```
@@ -1,182 +0,0 @@
1
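- # SQL Placeholder → SESSION_CONFIGS() Conversion Rules
2
-
3
- You are a SQL conversion expert. When converting traditional SQL to Dynamic Table SQL, convert every placeholder format into a `SESSION_CONFIGS()` function call.
4
-
5
- ## Placeholder Format Normalization
6
-
7
- First, normalize all legacy formats to the `${...}` format:
8
-
9
- | Legacy format | Normalized to |
10
- |--------|--------|
11
- | `{{ var }}` | `${var}` |
12
- | `{{ ds }}` | `${ds}` |
13
- | `{{region}}` | `${region}` |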
14
-
15
- Conversion regex: `\{\{\s*([^}]+?)\s*\}\}` → `${\1}` (the capture group is non-greedy so that whitespace before `}}` is not captured)
16
-
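As a quick check of the normalization step, here is a minimal Python sketch. It uses a non-greedy capture group, since a greedy `[^}]+` would also swallow the whitespace before `}}` and turn `{{ ds }}` into `${ds }`. The names `normalize_placeholders` and `_JINJA_RE` are hypothetical.

```python
import re

# Non-greedy capture so that trailing spaces inside {{ ... }} are dropped.
_JINJA_RE = re.compile(r"\{\{\s*([^}]+?)\s*\}\}")


def normalize_placeholders(sql: str) -> str:
    """Rewrite {{ var }} style placeholders to the ${var} form."""
    # In a Python replacement template, "$" and braces are literal; \1 is the capture.
    return _JINJA_RE.sub(r"${\1}", sql)


print(normalize_placeholders("WHERE dt = '{{ ds }}' AND region = '{{region}}'"))
# WHERE dt = '${ds}' AND region = '${region}'
```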
17
- ## Basic Replacement Rules
18
-
19
- ### Simple Variables
20
-
21
- | Input | Output |
22
- |------|------|
23
- | `${ds}` | `SESSION_CONFIGS()['dt.args.ds']` |
24
- | `${region}` | `SESSION_CONFIGS()['dt.args.region']` |
25
- | `${hour}` | `SESSION_CONFIGS()['dt.args.hour']` |
26
-
27
- ### nodash Variables (Special Handling)
28
-
29
- When a variable name contains `nodash`, it is automatically wrapped in DATE_FORMAT, and the variable name itself is left unchanged:
30
-
31
- | Input | Output |
32
- |------|------|
33
- | `${ds_nodash}` | `DATE_FORMAT(SESSION_CONFIGS()['dt.args.ds_nodash'], 'yyyyMMdd')` |
34
- | `${dsnodash}` | `DATE_FORMAT(SESSION_CONFIGS()['dt.args.dsnodash'], 'yyyyMMdd')` |
35
-
36
- Note: the variable name stays as-is (`ds_nodash` does not become `ds`); only the outer DATE_FORMAT wrapper is added.
37
-
38
- ### Variables with Arithmetic
39
-
40
- The final output always uses the `sub_days` function (a post-processing step converts every `DATE_SUB`/`DATE_ADD` to `sub_days`):
41
-
42
- | Input | Final output |
43
- |------|----------|
44
- | `${ds - 1}` | `DATE_FORMAT(sub_days(SESSION_CONFIGS()['dt.args.ds'], 1), 'yyyy-MM-dd')` |
45
- | `${ds + 7}` | `DATE_FORMAT(sub_days(SESSION_CONFIGS()['dt.args.ds'], -7), 'yyyy-MM-dd')` |
46
- | `${ds_nodash - 1}` | `DATE_FORMAT(sub_days(SESSION_CONFIGS()['dt.args.ds_nodash'], 1), 'yyyyMMdd')::STRING` |
47
-
48
- Rules:
49
- - `-` operation → `sub_days(..., N)` (N is positive)
50
- - `+` operation → `sub_days(..., -N)` (N is negated)
51
- - Wrap the result in `DATE_FORMAT`, with the format chosen by variable name:
52
- - contains `nodash` → `'yyyyMMdd'`
53
- - does not contain `nodash` → `'yyyy-MM-dd'`
54
- - When a `nodash` variable is used with arithmetic, append a `::STRING` cast
55
-
56
- Note: this is the final output form. Intermediate steps may first produce `DATE_SUB`/`DATE_ADD`, but post-processing converts them all to `sub_days`.
57
-
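The simple, `nodash`, and arithmetic rules can be combined in one small function. This Python sketch is an illustration under assumptions: `convert_placeholder` and `_ARITH_RE` are hypothetical names, the input is the text between `${` and `}`, only `name ± integer` arithmetic is recognized, and the surrounding quote context (covered by the quote context rules) is ignored.

```python
import re

# Hypothetical pattern for "<name> - N" or "<name> + N" inside a placeholder.
_ARITH_RE = re.compile(r"^\s*(\w+)\s*([+-])\s*(\d+)\s*$")


def convert_placeholder(expr: str) -> str:
    """Convert the text inside ${...} to a SESSION_CONFIGS() expression."""
    match = _ARITH_RE.match(expr)
    if match:
        name, op, n = match.group(1), match.group(2), int(match.group(3))
        # "-" keeps N positive; "+" negates it, since sub_days subtracts days.
        days = n if op == "-" else -n
        nodash = "nodash" in name
        fmt = "yyyyMMdd" if nodash else "yyyy-MM-dd"
        out = f"DATE_FORMAT(sub_days(SESSION_CONFIGS()['dt.args.{name}'], {days}), '{fmt}')"
        # nodash variables with arithmetic get an explicit string cast.
        return out + "::STRING" if nodash else out
    name = expr.strip()
    ref = f"SESSION_CONFIGS()['dt.args.{name}']"
    if "nodash" in name:
        return f"DATE_FORMAT({ref}, 'yyyyMMdd')"
    return ref


print(convert_placeholder("ds + 7"))
# DATE_FORMAT(sub_days(SESSION_CONFIGS()['dt.args.ds'], -7), 'yyyy-MM-dd')
```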
58
- ### The macros.ds_add Function
59
-
60
- | Input | Output |
61
- |------|------|
62
- | `${macros.ds_add(ds, -1)}` | `DATE_FORMAT(sub_days(SESSION_CONFIGS()['dt.args.ds'], 1), 'yyyy-MM-dd')` |
63
- | `${macros.ds_add(ds, 7)}` | `DATE_FORMAT(sub_days(SESSION_CONFIGS()['dt.args.ds'], -7), 'yyyy-MM-dd')` |
64
-
65
- Note: the second argument of `macros.ds_add` has the opposite sign from the `sub_days` argument. `macros.ds_add(ds, -1)` means ds minus 1 day, which maps to `sub_days(ds, 1)` (positive = subtract days); `macros.ds_add(ds, 7)` means ds plus 7 days, which maps to `sub_days(ds, -7)` (negative = add days).
66
-
67
- ## Quote Context Rules
68
-
69
- How a placeholder is handled depends on the quote context it appears in:
70
-
71
- ### Case 1: Placeholder Inside Single Quotes (Placeholder Only)
72
-
73
- ```sql
74
- -- Input
75
- WHERE dt = '${ds}'
76
- -- Output (outer quotes removed, direct replacement)
77
- WHERE dt = SESSION_CONFIGS()['dt.args.ds']
78
- ```
79
-
80
- ### Case 2: Placeholder Inside Single Quotes (Mixed Content)
81
-
82
- When the quoted string contains both a placeholder and literal text, use CONCAT:
83
-
84
- ```sql
85
- -- Input
86
- WHERE dt = '${ds_nodash}_done'
87
- -- Output
88
- WHERE dt = CONCAT(DATE_FORMAT(SESSION_CONFIGS()['dt.args.ds_nodash'], 'yyyyMMdd'), '_done')
89
- ```
90
-
91
- ```sql
92
- -- Input
93
- WHERE path = '/data/${region}/output'
94
- -- Output
95
- WHERE path = CONCAT('/data/', SESSION_CONFIGS()['dt.args.region'], '/output')
96
- ```
97
-
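Cases 1 and 2 can be handled by one routine that splits the quoted text into literal pieces and placeholder pieces. The Python below is a sketch under assumptions: `convert_quoted_literal` and `_ref` are hypothetical names, the input is the text between the single quotes, only simple and `nodash` variables are covered, and literal pieces are assumed not to contain single quotes themselves.

```python
import re

_PLACEHOLDER_RE = re.compile(r"\$\{([^}]+)\}")


def _ref(name: str) -> str:
    # Simple variables map directly; nodash variables get the DATE_FORMAT wrapper.
    name = name.strip()
    ref = f"SESSION_CONFIGS()['dt.args.{name}']"
    return f"DATE_FORMAT({ref}, 'yyyyMMdd')" if "nodash" in name else ref


def convert_quoted_literal(inner: str) -> str:
    """Convert the contents of a single-quoted literal that may hold placeholders."""
    parts = []
    pos = 0
    for match in _PLACEHOLDER_RE.finditer(inner):
        if match.start() > pos:
            # Literal text before this placeholder stays a quoted string.
            parts.append("'" + inner[pos:match.start()] + "'")
        parts.append(_ref(match.group(1)))
        pos = match.end()
    if pos < len(inner):
        parts.append("'" + inner[pos:] + "'")
    # A lone placeholder (or a lone literal) needs no CONCAT.
    if len(parts) == 1:
        return parts[0]
    return "CONCAT(" + ", ".join(parts) + ")"


print(convert_quoted_literal("/data/${region}/output"))
# CONCAT('/data/', SESSION_CONFIGS()['dt.args.region'], '/output')
```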
98
- ### Case 3: Placeholder Outside Quotes
99
-
100
- ```sql
101
- -- Input
102
- WHERE dt = ${ds}
103
- -- Output
104
- WHERE dt = SESSION_CONFIGS()['dt.args.ds']
105
- ```
106
-
107
- ### Case 4: Placeholder Inside Single Quotes with Date Arithmetic
108
-
109
- ```sql
110
- -- Input
111
- WHERE dt = '${ds - 1}'
112
- -- Output (outer quotes removed, ::STRING cast added)
113
- WHERE dt = DATE_FORMAT(sub_days(SESSION_CONFIGS()['dt.args.ds'], 1), 'yyyy-MM-dd')::STRING
114
- ```
115
-
116
- ### Choosing Quotes Inside Quotes
117
-
118
- When the replaced expression still sits inside a single-quoted string (as in the CONCAT scenario), the SESSION_CONFIGS key uses double quotes to avoid a quote conflict:
119
- ```sql
120
- -- In a single-quoted context such as CONCAT
121
- CONCAT('prefix_', SESSION_CONFIGS()["dt.args.ds"])
122
-
123
- -- Standalone expression (outer quotes already removed)
124
- SESSION_CONFIGS()['dt.args.ds']
125
- ```
126
-
127
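- ## Placeholders in Static Partitions
128
-
129
- After placeholders in static partition values are replaced, the values are injected into the SELECT clause:
130
-
131
- ```sql
132
- -- Input
133
- INSERT OVERWRITE TABLE t PARTITION(dt='${ds}', region='${region}')
134
- SELECT col1 FROM source;
135
-
136
- -- After conversion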
137
- SELECT col1,
138
- SESSION_CONFIGS()['dt.args.ds'] AS dt,
139
- SESSION_CONFIGS()['dt.args.region'] AS region
140
- FROM source;
141
- ```
142
-
143
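- ## Unrecognized Expressions
144
-
145
- For complex expressions that cannot be parsed (such as Airflow Jinja templates), sanitize them:
146
- 1. Convert Python strftime format codes to SQL style: `%Y`→`yyyy`, `%m`→`MM`, `%d`→`dd`, `%H`→`HH`
147
- 2. Replace every character that is not a letter, digit, or underscore with `_`
148
- 3. Collapse consecutive underscores and strip leading and trailing underscores
149
- 4. Use the sanitized string as the SESSION_CONFIGS key name
150
-
151
- ```sql
152
- -- Input
153
- ${execution_date.strftime("%H00")}
154
- -- Sanitized key name: execution_date_strftime_HH00
155
- -- Output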
156
- SESSION_CONFIGS()['dt.args.execution_date_strftime_HH00']
157
- ```
158
-
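The four sanitization steps translate almost line for line into code. A minimal Python sketch, with hypothetical names (`sanitize_key`, `to_session_config`) and the assumption that "letter or digit" means ASCII only:

```python
import re

# strftime codes listed in the rules above, mapped to SQL-style patterns.
_STRFTIME_MAP = {"%Y": "yyyy", "%m": "MM", "%d": "dd", "%H": "HH"}


def sanitize_key(expr: str) -> str:
    """Turn an unparseable placeholder expression into a SESSION_CONFIGS key name."""
    # Step 1: strftime codes -> SQL-style format tokens.
    for code, sql_token in _STRFTIME_MAP.items():
        expr = expr.replace(code, sql_token)
    # Step 2: anything outside [A-Za-z0-9_] becomes an underscore.
    expr = re.sub(r"[^A-Za-z0-9_]", "_", expr)
    # Step 3: collapse runs of underscores and trim both ends.
    return re.sub(r"_+", "_", expr).strip("_")


def to_session_config(expr: str) -> str:
    # Step 4: use the sanitized string as the key name.
    return f"SESSION_CONFIGS()['dt.args.{sanitize_key(expr)}']"


print(to_session_config('execution_date.strftime("%H00")'))
# SESSION_CONFIGS()['dt.args.execution_date_strftime_HH00']
```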
159
- ## Complete Example
160
-
161
- ### Input
162
- ```sql
163
- INSERT OVERWRITE TABLE kscdm.dim_table
164
- PARTITION(p_date='{{ ds_nodash }}_done', product='done', dt='{{ ds }}')
165
- SELECT id, name
166
- FROM source_table
167
- WHERE dt = '{{ ds }}'
168
- AND prev_dt = '{{ ds - 1 }}'
169
- AND region = '{{ region }}';
170
- ```
171
-
172
- ### Output (After Placeholder Replacement)
173
- ```sql
174
- SELECT id, name,
175
- CONCAT(DATE_FORMAT(SESSION_CONFIGS()['dt.args.ds_nodash'], 'yyyyMMdd'), '_done') AS p_date,
176
- 'done' AS product,
177
- SESSION_CONFIGS()['dt.args.ds'] AS dt
178
- FROM source_table
179
- WHERE dt = SESSION_CONFIGS()['dt.args.ds']
180
- AND prev_dt = DATE_FORMAT(sub_days(SESSION_CONFIGS()['dt.args.ds'], 1), 'yyyy-MM-dd')::STRING
181
- AND region = SESSION_CONFIGS()['dt.args.region'];
182
- ```
@@ -1,98 +0,0 @@
1
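- # Dynamic Table Refresh and Scheduling File Generation Rules
2
-
3
- You are a SQL conversion expert. After generating the Dynamic Table DDL, you also need to generate the accompanying refresh statements, backfill statements, and scheduling configuration files.
4
-
5
- ## Refresh Statement Generation
6
-
7
- ### Variable Extraction
8
-
9
- From the converted DDL, extract every variable name XXX that appears in `SESSION_CONFIGS()['dt.args.XXX']`, then deduplicate and sort.
10
-
11
- Note: extract only variable names that actually appear in the DDL. For example, if the DDL contains only `SESSION_CONFIGS()['dt.args.ds_nodash']`, generate a SET statement for the single variable `ds_nodash`.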
12
-
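Variable extraction is a find-all, deduplicate, and sort operation. A minimal Python sketch; the names `extract_variables` and `_VAR_RE` are hypothetical, and accepting both single- and double-quoted keys is an assumption made here for robustness:

```python
import re

# Matches SESSION_CONFIGS()['dt.args.<name>'] with either quote style around the key.
_VAR_RE = re.compile(r"""SESSION_CONFIGS\(\)\[['"]dt\.args\.([^'"\]]+)['"]\]""")


def extract_variables(ddl: str) -> list:
    """Return the distinct dt.args variable names used in the DDL, sorted."""
    return sorted(set(_VAR_RE.findall(ddl)))


ddl = (
    "SELECT a, SESSION_CONFIGS()['dt.args.region'] AS region "
    "FROM t WHERE dt = SESSION_CONFIGS()['dt.args.ds'] "
    "AND prev = sub_days(SESSION_CONFIGS()['dt.args.ds'], 1)"
)
print(extract_variables(ddl))
# ['ds', 'region']
```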
13
- ### Three Kinds of Refresh Files
14
-
15
- For each converted table, generate three kinds of files:
16
-
17
- #### 1. Current-period refresh (`<table_name>_refresh.sql`)
18
-
19
- ```sql
20
- set dt.args.ds = ${ds};
21
- set dt.args.region = ${region};
22
- REFRESH DYNAMIC TABLE schema.table_name PARTITION(ds = '${ds}', region = '${region}');
23
- ```
24
-
25
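- Rules:
26
- - Generate one `set dt.args.<variable> = ${<variable>};` statement for each extracted variable
27
- - Sort variables alphabetically
28
- - The PARTITION clause contains only static partition columns (taken from the PARTITION clause of the original INSERT OVERWRITE)
29
- - Partition values use the `'${<variable>}'` format
30
-
31
- #### 2. Previous-period refresh (`<table_name>_prev_refresh.sql`)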
32
-
33
- ```sql
34
- set dt.args.ds = ${prev_ds};
35
- set dt.args.region = ${prev_region};
36
- REFRESH DYNAMIC TABLE schema.table_name PARTITION(ds = '${prev_ds}', region = '${prev_region}');
37
- ```
38
-
39
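- Rule: add the `prev_` prefix to each placeholder variable name; the `dt.args.*` keys on the left-hand side stay unchanged.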
40
-
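The current-period and previous-period scripts differ only in the placeholder prefix, so one builder can produce both. The Python below is a sketch, not the tool's actual generator: `build_refresh_script` is a hypothetical name, `static_partitions` is assumed to be a list of `(partition_column, variable_name)` pairs, and emitting SET statements even when there are no static partitions is an assumption the rules above do not spell out.

```python
def build_refresh_script(table, variables, static_partitions, prefix=""):
    """Build the SET statements plus the REFRESH statement for one table.

    prefix is "" for the current period and "prev_" for the previous one.
    """
    # One SET per variable, alphabetical; only the placeholder gets the prefix.
    lines = [f"set dt.args.{name} = ${{{prefix}{name}}};" for name in sorted(variables)]
    if static_partitions:
        parts = ", ".join(
            f"{column} = '${{{prefix}{name}}}'" for column, name in static_partitions
        )
        lines.append(f"REFRESH DYNAMIC TABLE {table} PARTITION({parts});")
    else:
        # No static partitions: plain refresh without a PARTITION clause.
        lines.append(f"REFRESH DYNAMIC TABLE {table};")
    return "\n".join(lines)


print(build_refresh_script(
    "kscdm.my_table", ["region", "ds"], [("dt", "ds"), ("region", "region")], prefix="prev_"
))
# set dt.args.ds = ${prev_ds};
# set dt.args.region = ${prev_region};
# REFRESH DYNAMIC TABLE kscdm.my_table PARTITION(dt = '${prev_ds}', region = '${prev_region}');
```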
41
- #### 3. Backfill statement (`<table_name>_backfill.sql`)
42
-
43
- ```sql
44
- set cz.optimizer.incremental.backfill.enabled = TRUE;
45
-
46
- INSERT OVERWRITE schema.table_name
47
- SELECT *
48
- FROM ext_schema.table_name
49
- WHERE ds = '${ds}' AND region = '${region}';
50
- ```
51
-
52
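- Rules:
53
- - A fixed SET statement that enables the backfill switch
54
- - SELECT * from the extension table (ext_schema) into the target table
55
- - The WHERE condition uses the static partition columns (taken from the PARTITION clause of the original INSERT OVERWRITE)
56
-
57
- ### Tables Without Partitions
58
-
59
- If the table has no static partition variables:
60
- - Generate only the current-period refresh: `REFRESH DYNAMIC TABLE schema.table_name;`
61
- - Do not generate the prev_refresh and backfill files
62
-
63
- ### Extension Table Naming Rule
64
-
65
- - If `ext_schema` is specified: `ext_schema.table_name`
66
-
67
- ## Complete Example
68
-
69
- ### Input (the converted DDL contains the following variables)
70
-
71
- The DDL contains: `SESSION_CONFIGS()['dt.args.ds']` and `SESSION_CONFIGS()['dt.args.region']`
72
- Original PARTITION: `PARTITION(dt='${ds}', region='${region}')`
73
-
74
- ### Output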
75
-
76
- **refresh.sql:**
77
- ```sql
78
- set dt.args.ds = ${ds};
79
- set dt.args.region = ${region};
80
- REFRESH DYNAMIC TABLE kscdm.my_table PARTITION(dt = '${ds}', region = '${region}');
81
- ```
82
-
83
- **prev_refresh.sql:**
84
- ```sql
85
- set dt.args.ds = ${prev_ds};
86
- set dt.args.region = ${prev_region};
87
- REFRESH DYNAMIC TABLE kscdm.my_table PARTITION(dt = '${prev_ds}', region = '${prev_region}');
88
- ```
89
-
90
- **backfill.sql:**
91
- ```sql
92
- set cz.optimizer.incremental.backfill.enabled = TRUE;
93
-
94
- INSERT OVERWRITE kscdm.my_table
95
- SELECT *
96
- FROM ext_kscdm.my_table
97
- WHERE dt = '${ds}' AND region = '${region}';
98
- ```
@@ -1,76 +0,0 @@
1
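- # Dynamic Table Self-Reference Conversion Rules
2
-
3
- You are a SQL conversion expert. When the target table of an INSERT OVERWRITE also appears in the FROM/JOIN of the query, this is a self-reference scenario that requires special handling.
4
-
5
- ## Self-Reference Detection
6
-
7
- ### Detection Criteria
8
-
9
- 1. Extract the target table name (including schema) from the INSERT OVERWRITE statement
10
- 2. Search for that table name in the FROM and JOIN clauses of the SELECT query
11
- 3. Exclude table-name references in the PARTITION clause (they do not count as self-references)
12
- 4. If the target table name is found in FROM/JOIN → treat it as a self-reference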
13
-
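The detection criteria could be approximated with a regular expression over the SELECT part of the statement. This Python sketch is a rough illustration only: `is_self_reference` is a hypothetical name, the caller is assumed to pass just the SELECT query (so the INSERT target and PARTITION clause are already excluded), and comma-separated FROM lists, quoted identifiers, and table names inside comments or string literals are not handled.

```python
import re


def is_self_reference(target_table: str, select_sql: str) -> bool:
    """Return True when target_table appears right after FROM or JOIN in select_sql."""
    pattern = re.compile(
        # The negative lookahead stops "t" from matching "t_backup" or "t.col".
        r"\b(?:FROM|JOIN)\s+" + re.escape(target_table) + r"(?![\w.])",
        re.IGNORECASE,
    )
    return bool(pattern.search(select_sql))


select_sql = """
SELECT cur.id, cur.amount
FROM source_sales cur
LEFT JOIN kscdm.daily_sales prev ON cur.id = prev.id
"""
print(is_self_reference("kscdm.daily_sales", select_sql))
# True
```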
14
- ### Example
15
-
16
- ```sql
17
- -- Target table: kscdm.daily_sales
18
- INSERT OVERWRITE TABLE kscdm.daily_sales PARTITION(ds='${ds}')
19
- SELECT current.id, current.amount
20
- FROM source_sales current
21
- LEFT JOIN kscdm.daily_sales prev ON current.id = prev.id -- ← self-reference
22
- WHERE prev.ds = '${ds - 1}';
23
- ```
24
-
25
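- ## Conversion Rules
26
-
27
- Converting a self-referencing table is largely the same as converting a regular table, with the following differences:
28
-
29
- ### 1. Explicit Schema Declaration
30
-
31
- A self-referencing table must declare its full column definitions (including types) explicitly in CREATE DYNAMIC TABLE, because the SQL engine needs this information to infer the types of the self-dependent columns: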
32
-
33
- ```sql
34
- CREATE OR REPLACE DYNAMIC TABLE kscdm.daily_sales (
35
- id BIGINT COMMENT '...',
36
- amount DECIMAL(10,2) COMMENT '...',
37
- ds STRING COMMENT '...'
38
- )
39
- PARTITIONED BY (ds)
40
- AS
41
- SELECT current.id, current.amount,
42
- SESSION_CONFIGS()['dt.args.ds'] AS ds
43
- FROM source_sales current
44
- LEFT JOIN kscdm.daily_sales prev ON current.id = prev.id
45
- WHERE prev.ds = DATE_FORMAT(sub_days(SESSION_CONFIGS()['dt.args.ds'], 1), 'yyyy-MM-dd')::STRING;
46
- ```
47
-
48
- ### 2. 查询中保留自引用
49
-
50
- 转换后的 AS 子句中,自引用表名保持不变,不做任何替换。SQL 引擎会自动处理自引用的版本管理。
-
- ## Common Self-Reference Scenarios
-
- ### Day-over-Day Calculation
-
- ```sql
- -- Input
- INSERT OVERWRITE TABLE metrics PARTITION(ds='${ds}')
- SELECT t.id, t.value,
-   t.value - prev.value AS daily_change
- FROM source t
- LEFT JOIN metrics prev ON t.id = prev.id AND prev.ds = '${ds - 1}';
-
- -- Output
- CREATE OR REPLACE DYNAMIC TABLE metrics (
-   id BIGINT, value DECIMAL(10,2), daily_change DECIMAL(10,2), ds STRING
- )
- PARTITIONED BY (ds)
- AS
- SELECT t.id, t.value,
-   t.value - prev.value AS daily_change,
-   SESSION_CONFIGS()['dt.args.ds'] AS ds
- FROM source t
- LEFT JOIN metrics prev ON t.id = prev.id
-   AND prev.ds = DATE_FORMAT(sub_days(SESSION_CONFIGS()['dt.args.ds'], 1), 'yyyy-MM-dd')::STRING;
- ```
@@ -1,109 +0,0 @@
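- # SQL → Dynamic Table: Complete Conversion Workflow
-
- When the user gives you a set of CREATE TABLE DDL and INSERT OVERWRITE SQL and asks you to convert them to Dynamic Tables, execute the following steps in order.
-
- The detailed rules for each step live in the corresponding skill file; you need to reference those files alongside this one.
-
- ## Workflow Steps
-
- ### Step 1: Preprocess the Input
-
- Remove from the INSERT OVERWRITE file:
- - All `ALTER TABLE` statements
- - `ANALYZE TABLE` statements
- - SQL comments (`--` and `/* */`)
-
- Keep: CREATE TABLE, INSERT OVERWRITE, WITH, SET, CREATE TEMPORARY FUNCTION.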
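As a rough illustration of Step 1, the removal rules could be applied with a few regular expressions. The sketch below is an assumption about how such preprocessing might look, not the skill's actual code; `preprocess` is a hypothetical name, and the naive handling of `--` and `;` would break on string literals that contain those characters.

```python
import re


def preprocess(sql: str) -> str:
    """Drop comments plus ALTER TABLE / ANALYZE TABLE statements."""
    # Remove /* ... */ block comments first, then -- line comments.
    sql = re.sub(r"/\*.*?\*/", "", sql, flags=re.DOTALL)
    sql = re.sub(r"--[^\n]*", "", sql)

    kept = []
    # Naive statement split on ';' (assumes no ';' inside string literals).
    for stmt in sql.split(";"):
        stmt = stmt.strip()
        if not stmt:
            continue
        if re.match(r"(?:ALTER|ANALYZE)\s+TABLE\b", stmt, flags=re.IGNORECASE):
            continue  # statements that Step 1 says to remove
        kept.append(stmt + ";")
    return "\n".join(kept)
```

Everything that is not explicitly removed passes through unchanged, which covers the CREATE TABLE, INSERT OVERWRITE, WITH, SET, and CREATE TEMPORARY FUNCTION statements that the step says to keep.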
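-
- ### Step 2: Placeholder Replacement
-
- Following the rules in #[[file:sql2dt-placeholder-rules.md]]:
- 1. Normalize the placeholder format (`{{ }}` → `${ }`)
- 2. Replace every placeholder with a `SESSION_CONFIGS()` call
- 3. Handle nodash variables, date arithmetic, and macros functions
- 4. Choose the handling based on the quoting context (strip quotes / CONCAT / direct replacement)
-
- ### Step 3: Self-Reference Detection
-
- Following the rules in #[[file:sql2dt-self-reference-rules.md]]:
- 1. Check whether the INSERT OVERWRITE target table appears in FROM/JOIN
- 2. If it is a self-referencing table, flag it; in later steps, add a comment and use an explicit schema
-
- ### Step 4: Core Conversion
-
- Following the rules in #[[file:sql2dt-conversion-rules.md]]:
- 1. Parse the CREATE TABLE DDL (extract columns, partitions, properties, etc.)
- 2. Parse the INSERT OVERWRITE (extract the query and the partition type)
- 3. Assemble `CREATE OR REPLACE DYNAMIC TABLE ... AS SELECT ...`
- 4. Inject static partition values into the SELECT (with smart quote handling)
- 5. Merge the table-properties template (default `data_lifecycle=15`)
- 6. Handle UNION ALL (inject into each branch independently)
- 7. Date-function post-processing: convert every `DATE_SUB/DATE_ADD` to `sub_days`
-
- ### Step 5: Column Validation
-
- Following the rules in #[[file:sql2dt-column-validation-rules.md]]:
- 1. Count the schema columns and the SELECT columns
- 2. Verify that the two counts are equal
- 3. Check for duplicate aliases and missing partition columns
- 4. Check that all UNION ALL branches have the same number of columns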
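The first three checks of Step 5 can be sketched as below, assuming the column names have already been extracted into plain lists by an earlier parsing step. The function name `validate_columns` and its return format are hypothetical; the detailed rules live in the referenced validation file, which is not reproduced here. The UNION ALL branch check is omitted because it depends on how branches are parsed.

```python
from collections import Counter


def validate_columns(schema_cols, select_aliases, partition_cols):
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []

    # Checks 1-2: the schema and the SELECT list must have the same length.
    if len(schema_cols) != len(select_aliases):
        problems.append(
            f"column count mismatch: schema={len(schema_cols)} "
            f"select={len(select_aliases)}"
        )

    # Check 3a: no alias may appear more than once in the SELECT list.
    dupes = sorted(a for a, n in Counter(select_aliases).items() if n > 1)
    if dupes:
        problems.append(f"duplicate aliases: {dupes}")

    # Check 3b: every partition column must be produced by the SELECT.
    missing = [p for p in partition_cols if p not in select_aliases]
    if missing:
        problems.append(f"missing partition columns: {missing}")

    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every issue for a table in a single pass.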
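-
- ### Step 6: Generate Companion Files
-
- Following the rules in #[[file:sql2dt-refresh-rules.md]]:
- 1. Extract all SESSION_CONFIGS variables from the DDL
- 2. Generate the current-period refresh statement
- 3. Generate the previous-period prev_refresh statement
- 4. Generate the backfill statement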
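A minimal sketch of how the refresh and prev_refresh statements might be rendered, modeled on the `refresh.sql` and `prev_refresh.sql` outputs shown earlier in this diff. The helper `render_refresh` and its `partition_vars` mapping (partition column → variable name) are illustrative assumptions, not part of the package.

```python
def render_refresh(table: str, partition_vars: dict, prefix: str = "") -> str:
    """Render `set dt.args.*` lines followed by a REFRESH DYNAMIC TABLE statement.

    partition_vars maps each partition column to its variable name,
    e.g. {"dt": "ds"}. prefix="prev_" yields the previous-period variant.
    """
    # One `set` line per variable; the placeholder keeps the ${...} form.
    sets = [
        f"set dt.args.{var} = ${{{prefix}{var}}};"
        for var in partition_vars.values()
    ]
    # Partition spec such as: dt = '${ds}', region = '${region}'
    spec = ", ".join(
        f"{col} = '${{{prefix}{var}}}'" for col, var in partition_vars.items()
    )
    return "\n".join(sets + [f"REFRESH DYNAMIC TABLE {table} PARTITION({spec});"])
```

Calling `render_refresh("kscdm.my_table", {"dt": "ds", "region": "region"})` reproduces the current-period statement, and passing `prefix="prev_"` produces the previous-period one.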
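-
- ### Step 7: Post-Conversion Improvement Suggestions
-
- After the DDL is generated, run the following checks on the conversion result and proactively propose improvements to the user:
-
- **Check 1: Non-partitioned table + continuous-write risk**
-
- Following the decision logic in #[[file:../best-practices/non-partitioned-merge-into-warning.md]]:
- - The generated DT is a non-partitioned table (neither `PARTITIONED BY` nor `SESSION_CONFIGS()` is present)
- - And the SQL contains the `ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ... DESC) WHERE rn = 1` deduplication pattern
-
- → When both conditions hold, use the warning template in that document to alert the user to the risk, and suggest switching to a MERGE INTO + Table Stream approach.
-
- **Check 2: SQL performance optimization opportunities**
-
- Following the rules in #[[file:../best-practices/performance-optimization.md]], scan the generated DT SQL:
- - A `LEFT/RIGHT/FULL OUTER JOIN` is present → point out that, if the business logic allows, switching to INNER JOIN can improve incremental efficiency
- - A window function without `PARTITION BY` is present → recommend adding PARTITION BY; otherwise every incremental refresh recomputes the full data set
- - `GROUP BY` uses complex expressions (such as `DATE_TRUNC` or `SUBSTR`) → recommend precomputing upstream or splitting into multi-level DTs
-
- **Check 3: Whether the JOIN involves a dimension table**
-
- Following the recommended scenarios in #[[file:../best-practices/dimension-table-join-guide.md]]:
- - A JOIN is present in the SQL → ask the user whether the right-hand table is a rarely changing dimension table (code table, dictionary table, configuration table, etc.)
- - If so → suggest adding the `mv_const_tables` setting in TBLPROPERTIES, and explain its behavior and the data-consistency trade-off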
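-
- ## Output Checklist
-
- For each table, the final output is:
-
- | File | Content | Condition |
- |------|------|------|
- | `<table_name>.sql` | Dynamic Table DDL | Always generated |
- | `<table_name>_refresh.sql` | Current-period REFRESH statement | Always generated |
- | `<table_name>_prev_refresh.sql` | Previous-period REFRESH statement | Only when partition variables exist |
- | `<table_name>_backfill.sql` | Backfill statement | Only when partition variables exist |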
-
- ## Quick Decision Path
-
- ```
- Input DDL + INSERT OVERWRITE
-
- ├─ Has placeholders? → Step 2 placeholder replacement
-
- ├─ Self-reference? → Step 3 special handling
-
- ├─ Has static partitions? → Step 4 inject partition values into SELECT
-
- ├─ Has UNION ALL? → Step 4 inject into each branch independently
-
- └─ Generate DDL → Step 5 validation → Step 6 generate companion files → Step 7 improvement suggestions
- ```