@clickzetta/cz-cli-darwin-arm64 0.3.81 → 0.3.83
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/bin/cz-cli +0 -0
- package/bin/skills/clickzetta-access-control/LICENSE +16 -0
- package/bin/skills/clickzetta-access-control/SKILL.md +243 -0
- package/bin/skills/clickzetta-access-control/eval_cases.jsonl +3 -0
- package/bin/skills/clickzetta-access-control/references/dynamic-masking.md +86 -0
- package/bin/skills/clickzetta-access-control/references/grant-revoke.md +103 -0
- package/bin/skills/clickzetta-access-control/references/role-management.md +66 -0
- package/bin/skills/clickzetta-access-control/references/user-management.md +61 -0
- package/bin/skills/clickzetta-app-python-sdk/LICENSE +16 -0
- package/bin/skills/clickzetta-app-python-sdk/SKILL.md +153 -0
- package/bin/skills/clickzetta-app-python-sdk/eval_cases.jsonl +12 -0
- package/bin/skills/clickzetta-app-python-sdk/references/bulkload.md +196 -0
- package/bin/skills/clickzetta-app-python-sdk/references/connector.md +143 -0
- package/bin/skills/clickzetta-app-python-sdk/references/realtime.md +122 -0
- package/bin/skills/clickzetta-batch-sync-pipeline/LICENSE +16 -0
- package/bin/skills/clickzetta-batch-sync-pipeline/SKILL.md +227 -0
- package/bin/skills/clickzetta-batch-sync-pipeline/eval_cases.jsonl +5 -0
- package/bin/skills/clickzetta-bi-connect/LICENSE +16 -0
- package/bin/skills/clickzetta-bi-connect/SKILL.md +176 -0
- package/bin/skills/clickzetta-bi-connect/eval_cases.jsonl +5 -0
- package/bin/skills/clickzetta-bi-connect/references/bi-tools.md +170 -0
- package/bin/skills/clickzetta-cdc-sync-pipeline/LICENSE +16 -0
- package/bin/skills/clickzetta-cdc-sync-pipeline/SKILL.md +633 -0
- package/bin/skills/clickzetta-cdc-sync-pipeline/eval_cases.jsonl +5 -0
- package/bin/skills/clickzetta-data-ingest-pipeline/LICENSE +16 -0
- package/bin/skills/clickzetta-data-ingest-pipeline/SKILL.md +237 -0
- package/bin/skills/clickzetta-data-ingest-pipeline/eval_cases.jsonl +5 -0
- package/bin/skills/clickzetta-data-retention/LICENSE +16 -0
- package/bin/skills/clickzetta-data-retention/SKILL.md +160 -0
- package/bin/skills/clickzetta-data-retention/eval_cases.jsonl +5 -0
- package/bin/skills/clickzetta-data-retention/references/lifecycle-reference.md +175 -0
- package/bin/skills/clickzetta-data-science/LICENSE +16 -0
- package/bin/skills/clickzetta-data-science/SKILL.md +125 -0
- package/bin/skills/clickzetta-data-science/eval_cases.jsonl +12 -0
- package/bin/skills/clickzetta-data-science/references/bitmap-profile.md +146 -0
- package/bin/skills/clickzetta-data-science/references/data-patterns.md +110 -0
- package/bin/skills/clickzetta-data-science/references/setup.md +160 -0
- package/bin/skills/clickzetta-data-science/references/stats-functions.md +195 -0
- package/bin/skills/clickzetta-data-science/references/write-and-infer.md +122 -0
- package/bin/skills/clickzetta-data-science/references/zettapark-api.md +156 -0
- package/bin/skills/clickzetta-data-sharing/LICENSE +16 -0
- package/bin/skills/clickzetta-data-sharing/SKILL.md +160 -0
- package/bin/skills/clickzetta-data-sharing/eval_cases.jsonl +3 -0
- package/bin/skills/clickzetta-data-sharing/references/share-ddl.md +134 -0
- package/bin/skills/clickzetta-dba-guide/LICENSE +16 -0
- package/bin/skills/clickzetta-dba-guide/SKILL.md +542 -0
- package/bin/skills/clickzetta-dba-guide/eval_cases.jsonl +3 -0
- package/bin/skills/clickzetta-dw-modeling/LICENSE +16 -0
- package/bin/skills/clickzetta-dw-modeling/SKILL.md +351 -0
- package/bin/skills/clickzetta-dw-modeling/eval_cases.jsonl +4 -0
- package/bin/skills/clickzetta-dw-modeling/references/modeling-patterns.md +100 -0
- package/bin/skills/clickzetta-dynamic-table/LICENSE +16 -0
- package/bin/skills/clickzetta-dynamic-table/SKILL.md +230 -0
- package/bin/skills/clickzetta-dynamic-table/best-practices/dimension-table-join-guide.md +253 -0
- package/bin/skills/clickzetta-dynamic-table/best-practices/medallion-and-stream-patterns.md +124 -0
- package/bin/skills/clickzetta-dynamic-table/best-practices/non-partitioned-merge-into-warning.md +96 -0
- package/bin/skills/clickzetta-dynamic-table/best-practices/performance-optimization.md +109 -0
- package/bin/skills/clickzetta-dynamic-table/best-practices/scheduling-guide.md +135 -0
- package/bin/skills/clickzetta-dynamic-table/dt-creator/SKILL.md +15 -0
- package/bin/skills/clickzetta-dynamic-table/dt-creator/references/dt-declaration-strategy.md +185 -0
- package/bin/skills/clickzetta-dynamic-table/dt-creator/references/incremental-config-reference.md +427 -0
- package/bin/skills/clickzetta-dynamic-table/dt-creator/references/refresh-history-guide.md +260 -0
- package/bin/skills/clickzetta-dynamic-table/dt-creator/references/sql-limitations.md +80 -0
- package/bin/skills/clickzetta-dynamic-table/dynamic-table-alter/SKILL.md +190 -0
- package/bin/skills/clickzetta-dynamic-table/eval_cases.jsonl +5 -0
- package/bin/skills/clickzetta-dynamic-table/sql-to-dt/SKILL.md +27 -0
- package/bin/skills/clickzetta-dynamic-table/sql-to-dt/references/sql2dt-column-validation-rules.md +118 -0
- package/bin/skills/clickzetta-dynamic-table/sql-to-dt/references/sql2dt-conversion-rules.md +225 -0
- package/bin/skills/clickzetta-dynamic-table/sql-to-dt/references/sql2dt-placeholder-rules.md +182 -0
- package/bin/skills/clickzetta-dynamic-table/sql-to-dt/references/sql2dt-refresh-rules.md +98 -0
- package/bin/skills/clickzetta-dynamic-table/sql-to-dt/references/sql2dt-self-reference-rules.md +76 -0
- package/bin/skills/clickzetta-dynamic-table/sql-to-dt/references/sql2dt-workflow.md +109 -0
- package/bin/skills/clickzetta-external-catalog/LICENSE +16 -0
- package/bin/skills/clickzetta-external-catalog/SKILL.md +123 -0
- package/bin/skills/clickzetta-external-catalog/eval_cases.jsonl +5 -0
- package/bin/skills/clickzetta-external-catalog/references/external-catalog-ddl.md +130 -0
- package/bin/skills/clickzetta-external-function/LICENSE +16 -0
- package/bin/skills/clickzetta-external-function/SKILL.md +203 -0
- package/bin/skills/clickzetta-external-function/eval_cases.jsonl +4 -0
- package/bin/skills/clickzetta-external-function/references/external-function-ddl.md +171 -0
- package/bin/skills/clickzetta-file-import-pipeline/LICENSE +16 -0
- package/bin/skills/clickzetta-file-import-pipeline/SKILL.md +190 -0
- package/bin/skills/clickzetta-file-import-pipeline/eval_cases.jsonl +5 -0
- package/bin/skills/clickzetta-index-manager/LICENSE +16 -0
- package/bin/skills/clickzetta-index-manager/SKILL.md +140 -0
- package/bin/skills/clickzetta-index-manager/eval_cases.jsonl +5 -0
- package/bin/skills/clickzetta-index-manager/references/bloomfilter-index.md +67 -0
- package/bin/skills/clickzetta-index-manager/references/index-management.md +73 -0
- package/bin/skills/clickzetta-index-manager/references/inverted-index.md +80 -0
- package/bin/skills/clickzetta-index-manager/references/vector-index.md +81 -0
- package/bin/skills/clickzetta-java-sdk/LICENSE +16 -0
- package/bin/skills/clickzetta-java-sdk/SKILL.md +186 -0
- package/bin/skills/clickzetta-java-sdk/eval_cases.jsonl +12 -0
- package/bin/skills/clickzetta-java-sdk/references/bulkload.md +163 -0
- package/bin/skills/clickzetta-java-sdk/references/realtime.md +212 -0
- package/bin/skills/clickzetta-kafka-ingest-pipeline/LICENSE +16 -0
- package/bin/skills/clickzetta-kafka-ingest-pipeline/SKILL.md +769 -0
- package/bin/skills/clickzetta-kafka-ingest-pipeline/eval_cases.jsonl +5 -0
- package/bin/skills/clickzetta-kafka-ingest-pipeline/references/kafka-pipe-syntax.md +324 -0
- package/bin/skills/clickzetta-lakehouse-connect/LICENSE +16 -0
- package/bin/skills/clickzetta-lakehouse-connect/SKILL.md +218 -0
- package/bin/skills/clickzetta-lakehouse-connect/eval_cases.jsonl +3 -0
- package/bin/skills/clickzetta-lakehouse-connect/evals/evals.json +35 -0
- package/bin/skills/clickzetta-lakehouse-connect/references/config-file.md +435 -0
- package/bin/skills/clickzetta-lakehouse-connect/references/jdbc.md +478 -0
- package/bin/skills/clickzetta-lakehouse-connect/references/python-sdk.md +225 -0
- package/bin/skills/clickzetta-lakehouse-connect/references/sqlalchemy.md +468 -0
- package/bin/skills/clickzetta-lakehouse-connect/references/zettapark-session.md +445 -0
- package/bin/skills/clickzetta-manage-comments/LICENSE +16 -0
- package/bin/skills/clickzetta-manage-comments/SKILL.md +219 -0
- package/bin/skills/clickzetta-manage-comments/eval_cases.jsonl +3 -0
- package/bin/skills/clickzetta-metadata/LICENSE +16 -0
- package/bin/skills/clickzetta-metadata/SKILL.md +502 -0
- package/bin/skills/clickzetta-metadata/eval_cases.jsonl +5 -0
- package/bin/skills/clickzetta-metadata/references/instance-views-reference.md +276 -0
- package/bin/skills/clickzetta-metadata/references/metering-views-reference.md +137 -0
- package/bin/skills/clickzetta-metadata/references/show-desc-reference.md +326 -0
- package/bin/skills/clickzetta-metadata/references/views-reference.md +271 -0
- package/bin/skills/clickzetta-monitoring/LICENSE +16 -0
- package/bin/skills/clickzetta-monitoring/SKILL.md +215 -0
- package/bin/skills/clickzetta-monitoring/eval_cases.jsonl +5 -0
- package/bin/skills/clickzetta-monitoring/references/job-history-analysis.md +97 -0
- package/bin/skills/clickzetta-monitoring/references/show-jobs.md +48 -0
- package/bin/skills/clickzetta-oss-ingest-pipeline/LICENSE +16 -0
- package/bin/skills/clickzetta-oss-ingest-pipeline/SKILL.md +562 -0
- package/bin/skills/clickzetta-oss-ingest-pipeline/eval_cases.jsonl +5 -0
- package/bin/skills/clickzetta-overview/LICENSE +16 -0
- package/bin/skills/clickzetta-overview/SKILL.md +102 -0
- package/bin/skills/clickzetta-overview/eval_cases.jsonl +5 -0
- package/bin/skills/clickzetta-overview/references/brands-and-endpoints.md +79 -0
- package/bin/skills/clickzetta-overview/references/object-model.md +311 -0
- package/bin/skills/clickzetta-overview/references/studio-modules.md +173 -0
- package/bin/skills/clickzetta-pipeline-review/LICENSE +16 -0
- package/bin/skills/clickzetta-pipeline-review/SKILL.md +377 -0
- package/bin/skills/clickzetta-query-optimizer/LICENSE +16 -0
- package/bin/skills/clickzetta-query-optimizer/SKILL.md +156 -0
- package/bin/skills/clickzetta-query-optimizer/eval_cases.jsonl +5 -0
- package/bin/skills/clickzetta-query-optimizer/references/explain.md +56 -0
- package/bin/skills/clickzetta-query-optimizer/references/hints-and-sortkey.md +78 -0
- package/bin/skills/clickzetta-query-optimizer/references/optimize.md +65 -0
- package/bin/skills/clickzetta-query-optimizer/references/result-cache.md +49 -0
- package/bin/skills/clickzetta-query-optimizer/references/show-jobs.md +42 -0
- package/bin/skills/clickzetta-realtime-sync-pipeline/LICENSE +16 -0
- package/bin/skills/clickzetta-realtime-sync-pipeline/SKILL.md +323 -0
- package/bin/skills/clickzetta-realtime-sync-pipeline/eval_cases.jsonl +5 -0
- package/bin/skills/clickzetta-semantic-view/LICENSE +16 -0
- package/bin/skills/clickzetta-semantic-view/SKILL.md +207 -0
- package/bin/skills/clickzetta-semantic-view/eval_cases.jsonl +12 -0
- package/bin/skills/clickzetta-semantic-view/references/semantic-view-reference.md +167 -0
- package/bin/skills/clickzetta-spark-flink-connector/LICENSE +16 -0
- package/bin/skills/clickzetta-spark-flink-connector/SKILL.md +92 -0
- package/bin/skills/clickzetta-spark-flink-connector/eval_cases.jsonl +5 -0
- package/bin/skills/clickzetta-spark-flink-connector/references/flink.md +147 -0
- package/bin/skills/clickzetta-spark-flink-connector/references/spark.md +132 -0
- package/bin/skills/clickzetta-sql-pipeline-manager/LICENSE +16 -0
- package/bin/skills/clickzetta-sql-pipeline-manager/SKILL.md +485 -0
- package/bin/skills/clickzetta-sql-pipeline-manager/eval_cases.jsonl +12 -0
- package/bin/skills/clickzetta-sql-pipeline-manager/evals/evals.json +166 -0
- package/bin/skills/clickzetta-sql-pipeline-manager/references/dynamic-table.md +185 -0
- package/bin/skills/clickzetta-sql-pipeline-manager/references/materialized-view.md +129 -0
- package/bin/skills/clickzetta-sql-pipeline-manager/references/pipe.md +222 -0
- package/bin/skills/clickzetta-sql-pipeline-manager/references/table-stream.md +125 -0
- package/bin/skills/clickzetta-sql-syntax-guide/LICENSE +16 -0
- package/bin/skills/clickzetta-sql-syntax-guide/SKILL.md +249 -0
- package/bin/skills/clickzetta-sql-syntax-guide/eval_cases.jsonl +3 -0
- package/bin/skills/clickzetta-sql-syntax-guide/references/ddl-reference.md +350 -0
- package/bin/skills/clickzetta-sql-syntax-guide/references/dml-reference.md +279 -0
- package/bin/skills/clickzetta-sql-syntax-guide/references/dql-reference.md +504 -0
- package/bin/skills/clickzetta-sql-syntax-guide/references/functions-reference.md +372 -0
- package/bin/skills/clickzetta-sql-syntax-guide/references/migration-databricks.md +260 -0
- package/bin/skills/clickzetta-sql-syntax-guide/references/migration-snowflake.md +382 -0
- package/bin/skills/clickzetta-sql-syntax-guide/references/vs-snowflake.md +346 -0
- package/bin/skills/clickzetta-sql-syntax-guide/references/vs-spark.md +229 -0
- package/bin/skills/clickzetta-studio-task-manager/LICENSE +16 -0
- package/bin/skills/clickzetta-studio-task-manager/SKILL.md +652 -0
- package/bin/skills/clickzetta-table-lineage/LICENSE +16 -0
- package/bin/skills/clickzetta-table-lineage/SKILL.md +90 -0
- package/bin/skills/clickzetta-table-lineage/eval_cases.jsonl +1 -0
- package/bin/skills/clickzetta-table-lineage/references/normalize_func.sql +14 -0
- package/bin/skills/clickzetta-table-lineage/references/table_cost.sql +38 -0
- package/bin/skills/clickzetta-table-lineage/references/table_lineage_standalone.html +562 -0
- package/bin/skills/clickzetta-table-lineage/references/table_relation.sql +25 -0
- package/bin/skills/clickzetta-table-stream-pipeline/LICENSE +16 -0
- package/bin/skills/clickzetta-table-stream-pipeline/SKILL.md +206 -0
- package/bin/skills/clickzetta-table-stream-pipeline/eval_cases.jsonl +5 -0
- package/bin/skills/clickzetta-vcluster-manager/LICENSE +16 -0
- package/bin/skills/clickzetta-vcluster-manager/SKILL.md +212 -0
- package/bin/skills/clickzetta-vcluster-manager/eval_cases.jsonl +5 -0
- package/bin/skills/clickzetta-vcluster-manager/references/vc-cache.md +54 -0
- package/bin/skills/clickzetta-vcluster-manager/references/vcluster-ddl.md +150 -0
- package/bin/skills/clickzetta-volume-manager/LICENSE +16 -0
- package/bin/skills/clickzetta-volume-manager/SKILL.md +292 -0
- package/bin/skills/clickzetta-volume-manager/eval_cases.jsonl +5 -0
- package/bin/skills/clickzetta-volume-manager/references/volume-ddl.md +199 -0
- package/bin/skills/clickzetta-zettapark/LICENSE +16 -0
- package/bin/skills/clickzetta-zettapark/SKILL.md +248 -0
- package/bin/skills/clickzetta-zettapark/eval_cases.jsonl +12 -0
- package/bin/skills/clickzetta-zettapark/references/zettapark-api.md +283 -0
- package/bin/skills/cz-cli/SKILL.md +313 -0
- package/bin/skills/cz-cli/references/profile-setup.md +120 -0
- package/package.json +1 -1
|
@@ -0,0 +1,225 @@
# SQL → Dynamic Table Conversion Rules

You are a SQL conversion expert. Given a Hive/Spark SQL CREATE TABLE DDL and its corresponding INSERT OVERWRITE statement, merge them into a single Dynamic Table DDL according to the rules below.

## Overall Conversion Formula

```
Input 1: CREATE TABLE schema.table_name (...) PARTITIONED BY (...) ...
Input 2: INSERT OVERWRITE TABLE schema.table_name PARTITION(...) SELECT ... FROM ...
Output:  CREATE OR REPLACE DYNAMIC TABLE schema.table_name (...) PARTITIONED BY (...) ... AS SELECT ... FROM ...
```

Core idea: combine the structure definition from CREATE TABLE with the query logic from INSERT OVERWRITE into one `CREATE OR REPLACE DYNAMIC TABLE ... AS SELECT ...` statement.

## Step 1: Parse the CREATE TABLE DDL

Extract the following from the DDL:

1. **Table name** (including schema): `schema.table_name`
2. **Regular columns**: column name, data type, COMMENT (keep the original indentation)
3. **Partition columns**: column name, data type, and COMMENT from PARTITIONED BY
4. **Storage format**: STORED AS PARQUET/ORC/AVRO, etc.
5. **Table properties**: key-value pairs from TBLPROPERTIES or WITH PROPERTIES
6. **Bucketing**: CLUSTERED BY / SORTED BY / RANGE CLUSTERED BY / HASH CLUSTERED BY
7. **Lifecycle**: LIFECYCLE N
8. **Connection**: CONNECTION schema.connection_name
9. **Location**: LOCATION 'path'

## Step 2: Parse the INSERT OVERWRITE Statement

Extract the following from the INSERT statement:

1. **Target table name**: used for self-reference detection
2. **Partition type**:
   - Dynamic partition: `PARTITION (col1, col2)` (column names without values)
   - Static partition: `PARTITION (col1='value1', col2=value2)` (every column has a value)
   - Mixed partition: `PARTITION (static_col='value', dynamic_col)` (only some columns have values)
3. **SELECT query**: the complete query logic (including WHERE, JOIN, GROUP BY, etc.)
4. **CTE (WITH clause)**: if present, keep the full WITH ... AS (...) structure
5. **Leading statements**: SET statements, CREATE TEMPORARY FUNCTION, etc. (keep them)
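
The three partition types can be told apart mechanically. The following is a minimal sketch, not the tool's actual implementation: `split_top_level` and `classify_partition_spec` are hypothetical helper names, and the input is assumed to be the text between the parentheses of the PARTITION clause.

```python
def split_top_level(text, sep=","):
    """Split on `sep`, ignoring separators inside quotes or parentheses."""
    parts, buf, depth, quote = [], [], 0, None
    for ch in text:
        if quote:
            buf.append(ch)
            if ch == quote:
                quote = None
        elif ch in ("'", '"'):
            quote = ch
            buf.append(ch)
        elif ch == sep and depth == 0:
            parts.append("".join(buf).strip())
            buf = []
        else:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
            buf.append(ch)
    tail = "".join(buf).strip()
    if tail:
        parts.append(tail)
    return parts


def classify_partition_spec(spec):
    """Return 'dynamic', 'static', or 'mixed' for the inside of PARTITION(...)."""
    columns = split_top_level(spec)
    with_value = sum(1 for col in columns if "=" in col)
    if with_value == 0:
        return "dynamic"
    if with_value == len(columns):
        return "static"
    return "mixed"
```

Splitting only at top-level commas keeps a value such as `'a,b'` in one piece, which a plain `str.split(",")` would break.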

### Statements to Filter Out

Remove the following from the INSERT file:
- `ALTER TABLE ... ADD PARTITION ...`
- `ALTER TABLE ... DROP PARTITION ...`
- Any statement that starts with `ALTER TABLE`
- `ANALYZE TABLE` statements
- SQL comments (`--` and `/* */`)
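
One way to picture this filtering is shown below. It is a deliberately naive sketch under stated assumptions: `filter_insert_script` is a hypothetical name, statements are assumed to be separated by `;`, and string literals are assumed not to contain `--`, `/*`, or `;` (a real implementation would need a tokenizer to handle those).

```python
import re

# Line comments and block comments (naive: does not look inside string literals).
_COMMENT = re.compile(r"--[^\n]*|/\*.*?\*/", re.DOTALL)
# Statements that are dropped entirely.
_DROPPED = re.compile(r"^\s*(ALTER\s+TABLE|ANALYZE\s+TABLE)\b", re.IGNORECASE)


def filter_insert_script(script):
    """Return the statements that survive filtering, without trailing semicolons."""
    without_comments = _COMMENT.sub("", script)
    statements = [s.strip() for s in without_comments.split(";")]
    return [s for s in statements if s and not _DROPPED.match(s)]
```

Leading statements such as SET pass through unchanged, which matches the rule that they are kept.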

## Step 3: Assemble the Dynamic Table DDL

Assemble the output in the following order:

```sql
-- Optional: uncomment the next line to drop an existing table with the same name
-- DROP TABLE IF EXISTS schema.table_name;

CREATE SCHEMA IF NOT EXISTS schema;  -- only when the table name includes a schema
CREATE OR REPLACE DYNAMIC TABLE schema.table_name (
    col1 BIGINT COMMENT '...',         -- regular columns (keep the original formatting)
    col2 STRING COMMENT '...',
    part_col1 STRING COMMENT '...',    -- partition columns are appended after the regular columns
    part_col2 STRING COMMENT '...'
)
PARTITIONED BY (part_col1, part_col2)  -- column names only, no types
[CLUSTERED BY (...) [SORTED BY (...)] [INTO N BUCKETS]]
[STORED AS PARQUET]
TBLPROPERTIES ('key' = 'value')        -- template properties merged with the original ones
[LIFECYCLE N]
[CONNECTION schema.connection_name]
[LOCATION 'original_path_dt']          -- original path with a _dt suffix
AS
SELECT query;                          -- the query from INSERT OVERWRITE
```

### Key Rules

1. **Column definitions**: merge regular and partition columns into one parenthesized list, keeping the original indentation
2. **PARTITIONED BY**: list column names only, without types (unlike CREATE TABLE)
3. **CREATE SCHEMA**: if the table name contains `.` (e.g. `kscdm.table_name`), add `CREATE SCHEMA IF NOT EXISTS kscdm;` before the DDL
4. **LOCATION**: append the `_dt` suffix to the original path
5. **DROP statement**: place the commented-out `DROP TABLE IF EXISTS` at the very top

## Step 4: Static Partition Injection

When INSERT OVERWRITE uses static partitions (`PARTITION(col=value)`), the partition values must be injected into the SELECT clause.

### Injection Rules

Append the partition columns after the last column of the SELECT and before FROM, in the order the partition columns are defined in the DDL:

```sql
-- Original SELECT
SELECT col1, col2 FROM source_table

-- After injection (assuming PARTITION(year=2024, month='January'))
SELECT col1, col2,
    2024 AS year,
    'January' AS month
FROM source_table
```

### Value Type Handling

During injection, the type of the value determines whether it is quoted:

| Value type | Detection rule | Handling | Example |
|------------|----------------|----------|---------|
| Already quoted | Starts and ends with `'` or `"` | Keep as-is | `'hello'` → `'hello'` |
| NULL | Value is `NULL` (case-insensitive) | No quotes | `NULL` |
| Boolean | `true` / `false` (case-insensitive) | No quotes | `true` |
| Number | Parseable by `float()` | No quotes | `123`, `-45.67`, `1.23e-4` |
| SESSION_CONFIGS | Contains `SESSION_CONFIGS(` | No quotes | `SESSION_CONFIGS()['dt.args.ds']` |
| Function call | Matches `identifier(...)` with balanced parentheses | No quotes | `CURRENT_DATE()`, `YEAR(col)` |
| String | None of the above match | Wrap in single quotes, escaping inner `'` as `''` | `hello` → `'hello'` |
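
The table reads as an ordered chain of checks, which the sketch below walks through in the same order. It is an illustration only: `format_partition_value` and `_is_function_call` are hypothetical names, and the exact identifier pattern is an assumption. Note that Python's `float()` also accepts strings such as `inf` and `nan`, so those would be left unquoted by this rule as written.

```python
import re

_FUNC_HEAD = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\(")


def _is_function_call(value):
    """True if value looks like identifier(...) and the first '(' closes at the end."""
    if not _FUNC_HEAD.match(value) or not value.endswith(")"):
        return False
    depth = 0
    for i, ch in enumerate(value):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
            if depth == 0 and i != len(value) - 1:
                return False  # closed early, e.g. f(a)+g(b)
    return depth == 0


def format_partition_value(raw):
    """Return the text to inject into SELECT for one static partition value."""
    value = raw.strip()
    if len(value) >= 2 and value[0] == value[-1] and value[0] in ("'", '"'):
        return value                      # already quoted
    if value.upper() == "NULL" or value.lower() in ("true", "false"):
        return value                      # NULL / boolean
    try:
        float(value)
        return value                      # number
    except ValueError:
        pass
    if "SESSION_CONFIGS(" in value or _is_function_call(value):
        return value                      # expression
    return "'" + value.replace("'", "''") + "'"  # plain string
```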

### UNION ALL Handling

If the SELECT contains UNION ALL, inject the partition columns into every branch independently:

```sql
SELECT col1, col2,
    2024 AS year
FROM table_a
UNION ALL
SELECT col1, col2,
    2024 AS year
FROM table_b
```

### CTE + UNION ALL

If there is a WITH clause, separate the CTE part first and inject only into the UNION branches of the main query.

### Partition Columns Already Present

If the SELECT already contains a partition column (detected via `AS alias` or a trailing identifier), skip injection for that column to avoid duplicates.

## Step 5: Date Function Post-Processing

After the DDL is generated, apply one global replacement over the entire DDL text:

| Original form | Replaced with |
|---------------|---------------|
| `DATE_SUB(expr, INTERVAL N DAY)` | `sub_days(expr, N)` |
| `DATE_ADD(expr, INTERVAL N DAY)` | `sub_days(expr, -N)` |

This step ensures the final output consistently uses the `sub_days` function.

> Note: in the SQL engine, `SUB_DAYS` is an alias of `DATE_SUB` and the two are equivalent. `sub_days` is used throughout only to keep the output consistent.
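
As an illustration, the replacement can be approximated with one regular expression. This is a sketch under assumptions: `to_sub_days` is a hypothetical name, N is assumed to be a literal non-negative integer, and nested `DATE_SUB`/`DATE_ADD` calls are not handled (that would need a real parser).

```python
import re

# expr is matched lazily up to the first ", INTERVAL N DAY)" that follows it.
_DATE_FN = re.compile(
    r"\bDATE_(SUB|ADD)\s*\(\s*(.+?)\s*,\s*INTERVAL\s+(\d+)\s+DAY\s*\)",
    re.IGNORECASE | re.DOTALL,
)


def to_sub_days(ddl):
    """Rewrite DATE_SUB/DATE_ADD(expr, INTERVAL N DAY) as sub_days(expr, ±N)."""
    def repl(match):
        fn, expr, n = match.group(1).upper(), match.group(2), match.group(3)
        days = n if fn == "SUB" else "-" + n  # DATE_ADD flips the sign
        return f"sub_days({expr}, {days})"
    return _DATE_FN.sub(repl, ddl)
```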

## Step 6: Merging Table Property Templates

Default template property: `data_lifecycle = 15`

Merge rules:
- Template properties form the base
- TBLPROPERTIES from the original DDL override template properties with the same name
- The final result is written to TBLPROPERTIES

```sql
-- Template: data_lifecycle=15
-- Original DDL: TBLPROPERTIES('compression'='snappy', 'data_lifecycle'='30')
-- Merged result:
TBLPROPERTIES ('data_lifecycle' = '30', 'compression' = 'snappy')
-- data_lifecycle keeps the original value 30; compression comes from the original DDL
```
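
The merge is an ordinary "defaults, then overrides" dictionary update. A minimal sketch follows, assuming properties are held as string-to-string dictionaries; `TEMPLATE_PROPERTIES`, `merge_tblproperties`, and `render_tblproperties` are hypothetical names.

```python
TEMPLATE_PROPERTIES = {"data_lifecycle": "15"}


def merge_tblproperties(original, template=TEMPLATE_PROPERTIES):
    """Start from the template, then let the original DDL's values win."""
    merged = dict(template)
    merged.update(original)
    return merged


def render_tblproperties(props):
    """Render a dict as a TBLPROPERTIES clause."""
    body = ", ".join(f"'{key}' = '{value}'" for key, value in props.items())
    return f"TBLPROPERTIES ({body})"
```

Because Python dictionaries preserve insertion order, template keys come first and keys that exist only in the original DDL follow, which reproduces the ordering in the merged result above.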

## Complete Example

### Input 1: DDL
```sql
CREATE TABLE IF NOT EXISTS sales_data (
    id BIGINT COMMENT 'Sales record ID',
    product_name STRING COMMENT 'Product name',
    sales_amount DECIMAL(12,2) COMMENT 'Sales amount'
)
PARTITIONED BY (
    year INT COMMENT 'Year',
    month INT COMMENT 'Month'
)
STORED AS PARQUET
LOCATION '/data/warehouse/sales_data';
```

### Input 2: INSERT OVERWRITE
```sql
INSERT OVERWRITE TABLE sales_data
PARTITION (year, month)
SELECT
    s.id,
    s.product_name,
    s.price * s.quantity AS sales_amount,
    YEAR(s.sales_date) AS year,
    MONTH(s.sales_date) AS month
FROM raw_sales s
WHERE s.status = 'completed';
```

### Output: Dynamic Table DDL
```sql
-- Optional: uncomment the next line to drop an existing table with the same name
-- DROP TABLE IF EXISTS sales_data;

CREATE OR REPLACE DYNAMIC TABLE sales_data (
    id BIGINT COMMENT 'Sales record ID',
    product_name STRING COMMENT 'Product name',
    sales_amount DECIMAL(12,2) COMMENT 'Sales amount',
    year INT COMMENT 'Year',
    month INT COMMENT 'Month'
)
PARTITIONED BY (year, month)
STORED AS PARQUET
TBLPROPERTIES ('data_lifecycle' = '15')
LOCATION '/data/warehouse/sales_data_dt'
AS
SELECT
    s.id,
    s.product_name,
    s.price * s.quantity AS sales_amount,
    YEAR(s.sales_date) AS year,
    MONTH(s.sales_date) AS month
FROM raw_sales s
WHERE s.status = 'completed';
```
@@ -0,0 +1,182 @@
# SQL Placeholder → SESSION_CONFIGS() Conversion Rules

You are a SQL conversion expert. When converting traditional SQL to Dynamic Table SQL, all placeholder formats must be converted to `SESSION_CONFIGS()` function calls.

## Placeholder Format Normalization

First, normalize all legacy formats to the `${...}` format:

| Legacy format | Normalized to |
|---------------|---------------|
| `{{ var }}` | `${var}` |
| `{{ ds }}` | `${ds}` |
| `{{region}}` | `${region}` |

Conversion regex: `\{\{\s*([^}]+?)\s*\}\}` → `${\1}` (the capture is non-greedy so that whitespace before `}}` is not captured, as the table above requires)
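
A minimal Python sketch of this normalization is shown below; `normalize_placeholders` is a hypothetical name. It uses a non-greedy capture (`+?`) so that the whitespace before `}}` is left to the trailing `\s*` instead of ending up inside the variable name.

```python
import re

_JINJA = re.compile(r"\{\{\s*([^}]+?)\s*\}\}")


def normalize_placeholders(sql):
    """Rewrite {{ name }} style placeholders as ${name}."""
    return _JINJA.sub(r"${\1}", sql)
```

For example, `normalize_placeholders("WHERE dt = '{{ ds }}'")` yields `WHERE dt = '${ds}'`, and spaces inside an expression such as `{{ ds - 1 }}` are kept as `${ds - 1}`.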

## Basic Replacement Rules

### Simple Variables

| Input | Output |
|-------|--------|
| `${ds}` | `SESSION_CONFIGS()['dt.args.ds']` |
| `${region}` | `SESSION_CONFIGS()['dt.args.region']` |
| `${hour}` | `SESSION_CONFIGS()['dt.args.hour']` |

### nodash Variables (Special Handling)

When a variable name contains `nodash`, the expression is automatically wrapped in DATE_FORMAT, while the variable name itself is kept unchanged:

| Input | Output |
|-------|--------|
| `${ds_nodash}` | `DATE_FORMAT(SESSION_CONFIGS()['dt.args.ds_nodash'], 'yyyyMMdd')` |
| `${dsnodash}` | `DATE_FORMAT(SESSION_CONFIGS()['dt.args.dsnodash'], 'yyyyMMdd')` |

Note: the variable name stays as-is (`ds_nodash` does not become `ds`); only DATE_FORMAT is wrapped around it.

### Variables with Arithmetic

The final output always uses the `sub_days` function (a post-processing step converts every `DATE_SUB`/`DATE_ADD` to `sub_days`):

| Input | Final output |
|-------|--------------|
| `${ds - 1}` | `DATE_FORMAT(sub_days(SESSION_CONFIGS()['dt.args.ds'], 1), 'yyyy-MM-dd')` |
| `${ds + 7}` | `DATE_FORMAT(sub_days(SESSION_CONFIGS()['dt.args.ds'], -7), 'yyyy-MM-dd')` |
| `${ds_nodash - 1}` | `DATE_FORMAT(sub_days(SESSION_CONFIGS()['dt.args.ds_nodash'], 1), 'yyyyMMdd')::STRING` |

Rules:
- `-` operation → `sub_days(..., N)` (N is positive)
- `+` operation → `sub_days(..., -N)` (N is negated)
- Wrap the result in `DATE_FORMAT`; the format depends on the variable name:
  - Contains `nodash` → `'yyyyMMdd'`
  - Does not contain `nodash` → `'yyyy-MM-dd'`
- When a `nodash` variable is used with arithmetic, append a `::STRING` cast

Note: this is the final output form. Intermediate steps may first produce `DATE_SUB`/`DATE_ADD`, which post-processing then converts to `sub_days`.
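
The simple, nodash, and arithmetic rules can be combined into one small function. The sketch below is illustrative only: `session_key` and `convert_placeholder` are hypothetical names, the input is assumed to be the text inside `${...}`, and `macros.ds_add` calls and unparseable expressions are outside its scope.

```python
import re

_ARITH = re.compile(r"^(\w+)\s*([+-])\s*(\d+)$")


def session_key(name):
    """SESSION_CONFIGS() lookup for one variable name."""
    return f"SESSION_CONFIGS()['dt.args.{name}']"


def convert_placeholder(inner):
    """Convert the text inside ${...} to its final SQL expression."""
    inner = inner.strip()
    match = _ARITH.match(inner)
    if match:
        name, op, n = match.group(1), match.group(2), int(match.group(3))
        days = n if op == "-" else -n          # '+' becomes a negative sub_days argument
        nodash = "nodash" in name
        fmt = "yyyyMMdd" if nodash else "yyyy-MM-dd"
        expr = f"DATE_FORMAT(sub_days({session_key(name)}, {days}), '{fmt}')"
        return expr + "::STRING" if nodash else expr
    if "nodash" in inner:
        return f"DATE_FORMAT({session_key(inner)}, 'yyyyMMdd')"
    return session_key(inner)
```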

### The macros.ds_add Function

| Input | Output |
|-------|--------|
| `${macros.ds_add(ds, -1)}` | `DATE_FORMAT(sub_days(SESSION_CONFIGS()['dt.args.ds'], 1), 'yyyy-MM-dd')` |
| `${macros.ds_add(ds, 7)}` | `DATE_FORMAT(sub_days(SESSION_CONFIGS()['dt.args.ds'], -7), 'yyyy-MM-dd')` |

Note: the second argument of `macros.ds_add` has the opposite sign from the `sub_days` argument. `macros.ds_add(ds, -1)` means ds minus 1 day, which maps to `sub_days(ds, 1)` (positive = subtract days); `macros.ds_add(ds, 7)` means ds plus 7 days, which maps to `sub_days(ds, -7)` (negative = add days).

## Quote Context Rules

How a placeholder is handled depends on the quote context it appears in:

### Case 1: Placeholder Inside Single Quotes (Placeholder Only)

```sql
-- Input
WHERE dt = '${ds}'
-- Output (outer quotes removed, placeholder replaced directly)
WHERE dt = SESSION_CONFIGS()['dt.args.ds']
```

### Case 2: Placeholder Inside Single Quotes (Mixed Content)

When the quoted string contains both a placeholder and literal text, use CONCAT:

```sql
-- Input
WHERE dt = '${ds_nodash}_done'
-- Output
WHERE dt = CONCAT(DATE_FORMAT(SESSION_CONFIGS()['dt.args.ds_nodash'], 'yyyyMMdd'), '_done')
```

```sql
-- Input
WHERE path = '/data/${region}/output'
-- Output
WHERE path = CONCAT('/data/', SESSION_CONFIGS()['dt.args.region'], '/output')
```
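
The CONCAT construction amounts to cutting the quoted text at each placeholder and emitting literals and expressions alternately. The sketch below follows the quoting shown in the Case 2 examples; `convert_quoted_literal` and `simple_convert` are hypothetical names, and `simple_convert` covers only plain and nodash variables.

```python
import re

_PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def simple_convert(name):
    """Plain or nodash variable to a SESSION_CONFIGS() expression."""
    name = name.strip()
    expr = f"SESSION_CONFIGS()['dt.args.{name}']"
    return f"DATE_FORMAT({expr}, 'yyyyMMdd')" if "nodash" in name else expr


def convert_quoted_literal(content, convert=simple_convert):
    """content is the text between the single quotes of the original literal."""
    parts, pos = [], 0
    for match in _PLACEHOLDER.finditer(content):
        if match.start() > pos:
            parts.append("'" + content[pos:match.start()] + "'")  # literal before
        parts.append(convert(match.group(1)))
        pos = match.end()
    if pos < len(content):
        parts.append("'" + content[pos:] + "'")                   # trailing literal
    if not parts:
        return "''"
    if len(parts) == 1:
        return parts[0]   # pure placeholder or pure literal: no CONCAT needed
    return "CONCAT(" + ", ".join(parts) + ")"
```

A string that consists of exactly one placeholder falls out as the bare expression, which is the Case 1 behaviour of dropping the outer quotes.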

### Case 3: Placeholder Outside Quotes

```sql
-- Input
WHERE dt = ${ds}
-- Output
WHERE dt = SESSION_CONFIGS()['dt.args.ds']
```

### Case 4: Placeholder Inside Single Quotes with Date Arithmetic

```sql
-- Input
WHERE dt = '${ds - 1}'
-- Output (outer quotes removed, ::STRING cast added)
WHERE dt = DATE_FORMAT(sub_days(SESSION_CONFIGS()['dt.args.ds'], 1), 'yyyy-MM-dd')::STRING
```

### Choosing Quotes Inside Quotes

When the replaced expression still sits inside a single-quoted string (as in the CONCAT scenario), the SESSION_CONFIGS key uses double quotes to avoid a quote conflict:
```sql
-- Inside a single-quoted context such as CONCAT
CONCAT('prefix_', SESSION_CONFIGS()["dt.args.ds"])

-- Standalone expression (outer quotes already removed)
SESSION_CONFIGS()['dt.args.ds']
```

## Placeholders in Static Partitions

After the placeholders in static partition values are replaced, the values are injected into the SELECT clause:

```sql
-- Input
INSERT OVERWRITE TABLE t PARTITION(dt='${ds}', region='${region}')
SELECT col1 FROM source;

-- After conversion
SELECT col1,
    SESSION_CONFIGS()['dt.args.ds'] AS dt,
    SESSION_CONFIGS()['dt.args.region'] AS region
FROM source;
```

## Unrecognized Expressions

For complex expressions that cannot be parsed (such as Airflow Jinja templates), sanitize them:
1. Convert Python strftime format codes to SQL style: `%Y`→`yyyy`, `%m`→`MM`, `%d`→`dd`, `%H`→`HH`
2. Replace every character that is not a letter, digit, or underscore with `_`
3. Collapse consecutive underscores and strip leading and trailing underscores
4. Use the sanitized string as the SESSION_CONFIGS key

```sql
-- Input
${execution_date.strftime("%H00")}
-- Sanitized key: execution_date_strftime_HH00
-- Output
SESSION_CONFIGS()['dt.args.execution_date_strftime_HH00']
```
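
The four sanitizing steps translate almost line for line into Python. In this sketch `sanitize_key` is a hypothetical name, and "letter, digit, or underscore" is assumed to mean ASCII characters only.

```python
import re

# Python strftime codes and their SQL-style equivalents (step 1).
_STRFTIME_TO_SQL = {"%Y": "yyyy", "%m": "MM", "%d": "dd", "%H": "HH"}


def sanitize_key(expression):
    """Turn an unparseable placeholder expression into a SESSION_CONFIGS key."""
    for python_code, sql_code in _STRFTIME_TO_SQL.items():
        expression = expression.replace(python_code, sql_code)
    expression = re.sub(r"[^A-Za-z0-9_]", "_", expression)   # step 2
    return re.sub(r"_+", "_", expression).strip("_")         # step 3


def unrecognized_to_sql(expression):
    """Step 4: use the sanitized text as the key."""
    return f"SESSION_CONFIGS()['dt.args.{sanitize_key(expression)}']"
```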

## Complete Example

### Input
```sql
INSERT OVERWRITE TABLE kscdm.dim_table
PARTITION(p_date='{{ ds_nodash }}_done', product='done', dt='{{ ds }}')
SELECT id, name
FROM source_table
WHERE dt = '{{ ds }}'
  AND prev_dt = '{{ ds - 1 }}'
  AND region = '{{ region }}';
```

### Output (After Placeholder Replacement)
```sql
SELECT id, name,
    CONCAT(DATE_FORMAT(SESSION_CONFIGS()['dt.args.ds_nodash'], 'yyyyMMdd'), '_done') AS p_date,
    'done' AS product,
    SESSION_CONFIGS()['dt.args.ds'] AS dt
FROM source_table
WHERE dt = SESSION_CONFIGS()['dt.args.ds']
  AND prev_dt = DATE_FORMAT(sub_days(SESSION_CONFIGS()['dt.args.ds'], 1), 'yyyy-MM-dd')::STRING
  AND region = SESSION_CONFIGS()['dt.args.region'];
```
@@ -0,0 +1,98 @@
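# Dynamic Table Refresh and Scheduling File Generation Rules

You are a SQL conversion expert. After generating the Dynamic Table DDL, you also need to generate the accompanying refresh statements, backfill statements, and scheduling configuration files.

## Generating Refresh Statements

### Variable Extraction

From the converted DDL, extract the variable name XXX from every `SESSION_CONFIGS()['dt.args.XXX']`, then deduplicate and sort the names.

Note: extract only variable names that actually appear in the DDL. For example, if the DDL contains only `SESSION_CONFIGS()['dt.args.ds_nodash']`, generate a SET statement for the single variable `ds_nodash`.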
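
Extraction, deduplication, and sorting fit in a few lines. This is a sketch with a hypothetical function name, `extract_variables`; accepting double-quoted keys as well as single-quoted ones is an assumption made for robustness.

```python
import re

# Matches SESSION_CONFIGS()['dt.args.NAME'] with either quote style.
_KEY = re.compile(r"SESSION_CONFIGS\(\)\[['\"]dt\.args\.([^'\"\]]+)['\"]\]")


def extract_variables(ddl):
    """Return the distinct dt.args variable names in the DDL, sorted."""
    return sorted(set(_KEY.findall(ddl)))
```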

### Three Types of Refresh Files

Generate three types of files for each converted table:

#### 1. Current-Cycle Refresh (`<table_name>_refresh.sql`)

```sql
set dt.args.ds = ${ds};
set dt.args.region = ${region};
REFRESH DYNAMIC TABLE schema.table_name PARTITION(ds = '${ds}', region = '${region}');
```

Rules:
- Generate one `set dt.args.<variable> = ${<variable>};` statement for each extracted variable
- Sort the variables alphabetically
- The PARTITION clause contains only static partition columns (taken from the PARTITION clause of the original INSERT OVERWRITE)
- Partition values use the `'${<variable>}'` format
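
A sketch of a generator for this file is given below. `build_refresh_sql` is a hypothetical name; `static_partitions` is assumed to be a list of (partition column, variable name) pairs taken from the original PARTITION clause, and the optional `prefix` argument is an assumption that lets the same function emit the `prev_` variant this document describes for the previous cycle.

```python
def build_refresh_sql(table, variables, static_partitions, prefix=""):
    """Return the text of a refresh file.

    variables: names extracted from the DDL.
    static_partitions: list of (partition_column, variable_name) pairs.
    """
    lines = [f"set dt.args.{name} = ${{{prefix}{name}}};" for name in sorted(variables)]
    if static_partitions:
        spec = ", ".join(
            f"{column} = '${{{prefix}{name}}}'" for column, name in static_partitions
        )
        lines.append(f"REFRESH DYNAMIC TABLE {table} PARTITION({spec});")
    else:
        lines.append(f"REFRESH DYNAMIC TABLE {table};")
    return "\n".join(lines)
```

Keeping the partition column and the variable name as separate fields matters when they differ, for instance a `dt` column that is filled from the `ds` variable.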

#### 2. Previous-Cycle Refresh (`<table_name>_prev_refresh.sql`)

```sql
set dt.args.ds = ${prev_ds};
set dt.args.region = ${prev_region};
REFRESH DYNAMIC TABLE schema.table_name PARTITION(ds = '${prev_ds}', region = '${prev_region}');
```

Rule: add the `prev_` prefix to every variable name.

#### 3. Backfill Statement (`<table_name>_backfill.sql`)

```sql
set cz.optimizer.incremental.backfill.enabled = TRUE;

INSERT OVERWRITE schema.table_name
SELECT *
FROM ext_schema.table_name
WHERE ds = '${ds}' AND region = '${region}';
```

Rules:
- A fixed SET statement that turns on the backfill switch
- SELECT * from the extended table (ext_schema) into the target table
- The WHERE condition uses the static partition columns (taken from the PARTITION clause of the original INSERT OVERWRITE)
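
The backfill file can be assembled the same way. In this sketch `build_backfill_sql` is a hypothetical name, the extended table name is passed in by the caller, and `static_partitions` is again assumed to be a list of (partition column, variable name) pairs.

```python
def build_backfill_sql(table, ext_table, static_partitions):
    """Return the text of a backfill file for one table."""
    where = " AND ".join(
        f"{column} = '${{{name}}}'" for column, name in static_partitions
    )
    return "\n".join([
        "set cz.optimizer.incremental.backfill.enabled = TRUE;",  # fixed switch
        "",
        f"INSERT OVERWRITE {table}",
        "SELECT *",
        f"FROM {ext_table}",
        f"WHERE {where};",
    ])
```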

### Tables Without Partitions

If the table has no static partition variables:
- Generate only the current-cycle refresh: `REFRESH DYNAMIC TABLE schema.table_name;`
- Do not generate the prev_refresh and backfill files

### Extended Table Naming Rule

- If `ext_schema` is specified: `ext_schema.table_name`
|
|
66
|
+
|
|
67
|
+
## 完整示例
|
|
68
|
+
|
|
69
|
+
### 输入(转换后的 DDL 含以下变量)
|
|
70
|
+
|
|
71
|
+
DDL 中包含:`SESSION_CONFIGS()['dt.args.ds']` 和 `SESSION_CONFIGS()['dt.args.region']`
|
|
72
|
+
原始 PARTITION:`PARTITION(dt='${ds}', region='${region}')`
|
|
73
|
+
|
|
74
|
+
### Output
|
|
75
|
+
|
|
76
|
+
**refresh.sql:**
|
|
77
|
+
```sql
|
|
78
|
+
set dt.args.ds = ${ds};
|
|
79
|
+
set dt.args.region = ${region};
|
|
80
|
+
REFRESH DYNAMIC TABLE kscdm.my_table PARTITION(dt = '${ds}', region = '${region}');
|
|
81
|
+
```
|
|
82
|
+
|
|
83
|
+
**prev_refresh.sql:**
|
|
84
|
+
```sql
|
|
85
|
+
set dt.args.ds = ${prev_ds};
|
|
86
|
+
set dt.args.region = ${prev_region};
|
|
87
|
+
REFRESH DYNAMIC TABLE kscdm.my_table PARTITION(dt = '${prev_ds}', region = '${prev_region}');
|
|
88
|
+
```
|
|
89
|
+
|
|
90
|
+
**backfill.sql:**
|
|
91
|
+
```sql
|
|
92
|
+
set cz.optimizer.incremental.backfill.enabled = TRUE;
|
|
93
|
+
|
|
94
|
+
INSERT OVERWRITE kscdm.my_table
|
|
95
|
+
SELECT *
|
|
96
|
+
FROM ext_kscdm.my_table
|
|
97
|
+
WHERE dt = '${ds}' AND region = '${region}';
|
|
98
|
+
```
|
package/bin/skills/clickzetta-dynamic-table/sql-to-dt/references/sql2dt-self-reference-rules.md
ADDED
|
@@ -0,0 +1,76 @@
|
|
|
1
|
+
# Dynamic Table conversion rules for self-referencing tables
|
|
2
|
+
|
|
3
|
+
You are a SQL conversion expert. When the target table of an INSERT OVERWRITE also appears in the FROM/JOIN of the query, this is a self-reference scenario that needs special handling.
|
|
4
|
+
|
|
5
|
+
## Self-reference detection
|
|
6
|
+
|
|
7
|
+
### Criteria
|
|
8
|
+
|
|
9
|
+
1. Extract the target table name (including the schema) from the INSERT OVERWRITE statement
|
|
10
|
+
2. Search for that table name in the FROM and JOIN clauses of the SELECT query
|
|
11
|
+
3. Exclude table-name references inside the PARTITION clause (these do not count as self-references)
|
|
12
|
+
4. If the target table name is found in FROM/JOIN → classify the statement as a self-reference
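
The four criteria above can be approximated with regular expressions. This is a rough sketch, not a SQL parser: `is_self_reference` is a hypothetical name, and the approach assumes comments were already removed and that tables are listed directly after `FROM` or `JOIN` (comma-separated FROM lists are not covered).

```python
import re

# Captures the table name after INSERT OVERWRITE [TABLE].
_TARGET = re.compile(r"INSERT\s+OVERWRITE\s+(?:TABLE\s+)?([\w.]+)", re.IGNORECASE)
# Captures every name that directly follows FROM or JOIN.
_SOURCE = re.compile(r"\b(?:FROM|JOIN)\s+([\w.]+)", re.IGNORECASE)


def is_self_reference(sql: str) -> bool:
    """Return True when the INSERT OVERWRITE target also appears after FROM or JOIN."""
    match = _TARGET.search(sql)
    if match is None:
        return False
    target = match.group(1).lower()
    body = sql[match.end():]
    # Only FROM/JOIN positions are inspected, so the PARTITION clause never matches.
    return any(name.lower() == target for name in _SOURCE.findall(body))


sql = """INSERT OVERWRITE TABLE kscdm.daily_sales PARTITION(ds='${ds}')
SELECT current.id, current.amount
FROM source_sales current
LEFT JOIN kscdm.daily_sales prev ON current.id = prev.id
WHERE prev.ds = '${ds - 1}';"""
print(is_self_reference(sql))
```

The comparison is an exact, case-insensitive match on the qualified name, so `daily_sales` without its schema would not match `kscdm.daily_sales`; whether that should count is a decision this sketch leaves open.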
|
|
13
|
+
|
|
14
|
+
### Example
|
|
15
|
+
|
|
16
|
+
```sql
|
|
17
|
+
-- Target table: kscdm.daily_sales
|
|
18
|
+
INSERT OVERWRITE TABLE kscdm.daily_sales PARTITION(ds='${ds}')
|
|
19
|
+
SELECT current.id, current.amount
|
|
20
|
+
FROM source_sales current
|
|
21
|
+
LEFT JOIN kscdm.daily_sales prev ON current.id = prev.id  -- ← self-reference
|
|
22
|
+
WHERE prev.ds = '${ds - 1}';
|
|
23
|
+
```
|
|
24
|
+
|
|
25
|
+
## Conversion rules
|
|
26
|
+
|
|
27
|
+
Converting a self-referencing table is mostly the same as converting an ordinary table, with the following differences:
|
|
28
|
+
|
|
29
|
+
### 1. Explicit schema declaration
|
|
30
|
+
|
|
31
|
+
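A self-referencing table must declare its full column definitions (including types) explicitly in CREATE DYNAMIC TABLE, because the SQL engine needs this information to infer the types of the self-dependent columns: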
|
|
32
|
+
|
|
33
|
+
```sql
|
|
34
|
+
CREATE OR REPLACE DYNAMIC TABLE kscdm.daily_sales (
|
|
35
|
+
id BIGINT COMMENT '...',
|
|
36
|
+
amount DECIMAL(10,2) COMMENT '...',
|
|
37
|
+
ds STRING COMMENT '...'
|
|
38
|
+
)
|
|
39
|
+
PARTITIONED BY (ds)
|
|
40
|
+
AS
|
|
41
|
+
SELECT current.id, current.amount,
|
|
42
|
+
SESSION_CONFIGS()['dt.args.ds'] AS ds
|
|
43
|
+
FROM source_sales current
|
|
44
|
+
LEFT JOIN kscdm.daily_sales prev ON current.id = prev.id
|
|
45
|
+
WHERE prev.ds = DATE_FORMAT(sub_days(SESSION_CONFIGS()['dt.args.ds'], 1), 'yyyy-MM-dd')::STRING;
|
|
46
|
+
```
|
|
47
|
+
|
|
48
|
+
### 2. Keep the self-reference in the query
|
|
49
|
+
|
|
50
|
+
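In the converted AS clause, the self-referenced table name stays unchanged and is not replaced. The SQL engine handles version management for the self-reference automatically.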
|
|
51
|
+
|
|
52
|
+
## Common self-reference scenarios
|
|
53
|
+
|
|
54
|
+
### Day-over-day calculation
|
|
55
|
+
|
|
56
|
+
```sql
|
|
57
|
+
-- Input
|
|
58
|
+
INSERT OVERWRITE TABLE metrics PARTITION(ds='${ds}')
|
|
59
|
+
SELECT t.id, t.value,
|
|
60
|
+
t.value - prev.value AS daily_change
|
|
61
|
+
FROM source t
|
|
62
|
+
LEFT JOIN metrics prev ON t.id = prev.id AND prev.ds = '${ds - 1}';
|
|
63
|
+
|
|
64
|
+
-- Output
|
|
65
|
+
CREATE OR REPLACE DYNAMIC TABLE metrics (
|
|
66
|
+
id BIGINT, value DECIMAL(10,2), daily_change DECIMAL(10,2), ds STRING
|
|
67
|
+
)
|
|
68
|
+
PARTITIONED BY (ds)
|
|
69
|
+
AS
|
|
70
|
+
SELECT t.id, t.value,
|
|
71
|
+
t.value - prev.value AS daily_change,
|
|
72
|
+
SESSION_CONFIGS()['dt.args.ds'] AS ds
|
|
73
|
+
FROM source t
|
|
74
|
+
LEFT JOIN metrics prev ON t.id = prev.id
|
|
75
|
+
AND prev.ds = DATE_FORMAT(sub_days(SESSION_CONFIGS()['dt.args.ds'], 1), 'yyyy-MM-dd')::STRING;
|
|
76
|
+
```
|
|
@@ -0,0 +1,109 @@
|
|
|
1
|
+
# SQL → Dynamic Table complete conversion workflow
|
|
2
|
+
|
|
3
|
+
When the user gives you a set of CREATE TABLE DDL and INSERT OVERWRITE SQL and asks you to convert them into Dynamic Tables, execute the following steps in order.
|
|
4
|
+
|
|
5
|
+
The detailed rules for each step are in the corresponding skill file; consult those files alongside this workflow.
|
|
6
|
+
|
|
7
|
+
## Workflow steps
|
|
8
|
+
|
|
9
|
+
### Step 1: Preprocess the input
|
|
10
|
+
|
|
11
|
+
Remove the following from the INSERT OVERWRITE file:
|
|
12
|
+
- All `ALTER TABLE` statements
|
|
13
|
+
- `ANALYZE TABLE` statements
|
|
14
|
+
- SQL comments (`--` and `/* */`)
|
|
15
|
+
|
|
16
|
+
Keep: CREATE TABLE, INSERT OVERWRITE, WITH, SET, CREATE TEMPORARY FUNCTION.
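
The preprocessing step might be sketched as follows. This is a simplified illustration: `preprocess` is a hypothetical name, and the approach assumes that `--`, `/*`, and `;` never occur inside string literals, which a real implementation would need to handle.

```python
import re


def preprocess(sql: str) -> str:
    """Strip comments and drop ALTER TABLE / ANALYZE TABLE statements."""
    sql = re.sub(r"/\*.*?\*/", "", sql, flags=re.DOTALL)  # block comments
    sql = re.sub(r"--[^\n]*", "", sql)                    # line comments
    kept = []
    for stmt in sql.split(";"):
        stmt = stmt.strip()
        if not stmt:
            continue
        if re.match(r"(ALTER|ANALYZE)\s+TABLE\b", stmt, re.IGNORECASE):
            continue  # removed by the rules above
        kept.append(stmt + ";")  # everything else is kept unchanged
    return "\n".join(kept)


source = """-- nightly load
ALTER TABLE t DROP PARTITION (ds='1');
/* main */ INSERT OVERWRITE TABLE t SELECT 1;
ANALYZE TABLE t COMPUTE STATISTICS;"""
print(preprocess(source))
```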
|
|
17
|
+
|
|
18
|
+
### Step 2: Placeholder substitution
|
|
19
|
+
|
|
20
|
+
Follow the rules in #[[file:sql2dt-placeholder-rules.md]]:
|
|
21
|
+
1. Normalize the placeholder format (`{{ }}` → `${ }`)
|
|
22
|
+
2. Replace every placeholder with a `SESSION_CONFIGS()` call
|
|
23
|
+
3. Handle nodash variables, date arithmetic, and macros functions
|
|
24
|
+
4. Choose the handling based on the quoting context (strip the quotes / CONCAT / direct replacement)
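
As a partial illustration of sub-steps 1 and 2, the sketch below normalizes `{{ name }}` to `${name}` and then rewrites a placeholder that fills an entire quoted literal. It covers only that one quoting case, which is an assumption drawn from the examples in these files; the authoritative rules (nodash variables, date arithmetic, macros, CONCAT handling) live in sql2dt-placeholder-rules.md and are not reproduced here. Both function names are hypothetical.

```python
import re


def normalize_placeholders(sql: str) -> str:
    """Rewrite {{ name }} as ${name}."""
    return re.sub(r"\{\{\s*(.*?)\s*\}\}", r"${\1}", sql)


def replace_fully_quoted(sql: str) -> str:
    """Rewrite '${name}' (the placeholder is the whole literal) as a SESSION_CONFIGS() lookup.

    Assumption for this sketch: in this context the surrounding quotes are dropped.
    Placeholders with expressions, such as ${ds - 1}, are left untouched.
    """
    return re.sub(r"'\$\{(\w+)\}'", r"SESSION_CONFIGS()['dt.args.\1']", sql)


step1 = normalize_placeholders("WHERE ds = '{{ ds }}' AND region = '${region}'")
print(replace_fully_quoted(step1))
```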
|
|
25
|
+
|
|
26
|
+
### Step 3: Self-reference detection
|
|
27
|
+
|
|
28
|
+
Follow the rules in #[[file:sql2dt-self-reference-rules.md]]:
|
|
29
|
+
1. Check whether the INSERT OVERWRITE target table appears in FROM/JOIN
|
|
30
|
+
2. If it is a self-referencing table, mark it, then add a comment and use an explicit schema in the later steps
|
|
31
|
+
|
|
32
|
+
### Step 4: Core conversion
|
|
33
|
+
|
|
34
|
+
Follow the rules in #[[file:sql2dt-conversion-rules.md]]:
|
|
35
|
+
1. Parse the CREATE TABLE DDL (extract columns, partitions, properties, and so on)
|
|
36
|
+
2. Parse the INSERT OVERWRITE (extract the query and the partition type)
|
|
37
|
+
3. Assemble `CREATE OR REPLACE DYNAMIC TABLE ... AS SELECT ...`
|
|
38
|
+
4. Inject the static partition values into the SELECT (with smart quote handling)
|
|
39
|
+
5. Merge the table property template (default `data_lifecycle=15`)
|
|
40
|
+
6. Handle UNION ALL (inject into each branch independently)
|
|
41
|
+
7. Date function post-processing: convert every `DATE_SUB/DATE_ADD` to `sub_days`
|
|
42
|
+
|
|
43
|
+
### Step 5: Column validation
|
|
44
|
+
|
|
45
|
+
Follow the rules in #[[file:sql2dt-column-validation-rules.md]]:
|
|
46
|
+
1. Count the schema columns and the SELECT columns
|
|
47
|
+
2. Verify that the two counts are equal
|
|
48
|
+
3. Check for duplicate aliases and missing partition columns
|
|
49
|
+
4. Check that all UNION ALL branches have the same number of columns
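
The column-count comparison in sub-steps 1 and 2 depends on splitting a column list at top-level commas only, so that `DECIMAL(10,2)` or a function call counts as one column. The sketch below shows one way to do this; `split_top_level` and `column_counts_match` are hypothetical names, and the inputs are assumed to be the text between the schema parentheses and the text of the SELECT list.

```python
def split_top_level(text: str) -> list[str]:
    """Split on commas that are not nested inside parentheses or single-quoted strings."""
    parts, buf = [], []
    depth, in_string = 0, False
    for ch in text:
        if ch == "'":
            in_string = not in_string
        elif not in_string:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
            elif ch == "," and depth == 0:
                parts.append("".join(buf).strip())
                buf = []
                continue  # the separator itself is not part of any column
        buf.append(ch)
    tail = "".join(buf).strip()
    if tail:
        parts.append(tail)
    return parts


def column_counts_match(schema_columns: str, select_list: str) -> bool:
    """Compare the number of declared columns with the number of SELECT expressions."""
    return len(split_top_level(schema_columns)) == len(split_top_level(select_list))


schema = "id BIGINT, value DECIMAL(10,2), daily_change DECIMAL(10,2), ds STRING"
select = "t.id, t.value, t.value - prev.value AS daily_change, SESSION_CONFIGS()['dt.args.ds'] AS ds"
print(column_counts_match(schema, select))
```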
|
|
50
|
+
|
|
51
|
+
### Step 6: Generate the companion files
|
|
52
|
+
|
|
53
|
+
Follow the rules in #[[file:sql2dt-refresh-rules.md]]:
|
|
54
|
+
1. Extract all SESSION_CONFIGS variables from the DDL
|
|
55
|
+
2. Generate the current-period refresh statement
|
|
56
|
+
3. Generate the previous-period prev_refresh statement
|
|
57
|
+
4. Generate the backfill statement
|
|
58
|
+
|
|
59
|
+
### Step 7: Post-conversion improvement suggestions
|
|
60
|
+
|
|
61
|
+
After the DDL is generated, run the following checks on the conversion result and proactively suggest improvements to the user:
|
|
62
|
+
|
|
63
|
+
**Check 1: Non-partitioned table + continuous-write risk**
|
|
64
|
+
|
|
65
|
+
Follow the decision logic in #[[file:../best-practices/non-partitioned-merge-into-warning.md]]:
|
|
66
|
+
- The generated DT is a non-partitioned table (no `PARTITIONED BY` and no `SESSION_CONFIGS()`)
|
|
67
|
+
- And the SQL contains the `ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ... DESC) WHERE rn = 1` deduplication pattern
|
|
68
|
+
|
|
69
|
+
→ When both conditions hold, use the warning template in that document to alert the user to the risk, and suggest switching to a MERGE INTO + Table Stream approach.
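
The two conditions of Check 1 could be approximated with a text heuristic like the one below. This is only a sketch: `needs_merge_into_warning` is a hypothetical name, and the pattern accepts any `WHERE <column> = 1` filter rather than requiring the alias `rn`. The referenced best-practices document remains the authority on the actual decision logic.

```python
import re


def needs_merge_into_warning(dt_sql: str) -> bool:
    """Heuristic: non-partitioned DT that deduplicates with ROW_NUMBER() ... WHERE rn = 1."""
    non_partitioned = (
        re.search(r"\bPARTITIONED\s+BY\b", dt_sql, re.IGNORECASE) is None
        and "SESSION_CONFIGS()" not in dt_sql
    )
    dedup_pattern = (
        re.search(r"\bROW_NUMBER\s*\(\s*\)\s*OVER\s*\(", dt_sql, re.IGNORECASE) is not None
        and re.search(r"\bWHERE\s+[\w.]+\s*=\s*1\b", dt_sql, re.IGNORECASE) is not None
    )
    return non_partitioned and dedup_pattern


dedup_sql = (
    "CREATE OR REPLACE DYNAMIC TABLE t AS SELECT * FROM ("
    "SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY ts DESC) AS rn FROM s"
    ") WHERE rn = 1"
)
print(needs_merge_into_warning(dedup_sql))
```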
|
|
70
|
+
|
|
71
|
+
**Check 2: SQL performance optimization opportunities**
|
|
72
|
+
|
|
73
|
+
Scan the generated DT SQL according to the rules in #[[file:../best-practices/performance-optimization.md]]:
|
|
74
|
+
- A `LEFT/RIGHT/FULL OUTER JOIN` is present → point out that, if the business logic allows, switching to INNER JOIN can improve incremental efficiency
|
|
75
|
+
- A window function without `PARTITION BY` is present → suggest adding PARTITION BY; otherwise every incremental refresh recomputes in full
|
|
76
|
+
- `GROUP BY` uses complex expressions (such as `DATE_TRUNC` or `SUBSTR`) → suggest precomputing upstream or splitting into multi-level DTs
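
The three scans of Check 2 can be sketched as regex heuristics. This is an approximation under stated assumptions: `performance_hints` and the hint labels are hypothetical, window specifications that contain nested parentheses are not inspected, and any parenthesis in the GROUP BY list is treated as a "complex expression".

```python
import re


def performance_hints(dt_sql: str) -> list[str]:
    """Return labels for the optimization opportunities found in the DT SQL."""
    hints = []
    if re.search(r"\b(LEFT|RIGHT|FULL)\s+(OUTER\s+)?JOIN\b", dt_sql, re.IGNORECASE):
        hints.append("outer-join")
    # Inspect each OVER ( ... ) window specification without nested parentheses.
    for window in re.finditer(r"\bOVER\s*\(([^()]*)\)", dt_sql, re.IGNORECASE):
        if not re.search(r"\bPARTITION\s+BY\b", window.group(1), re.IGNORECASE):
            hints.append("window-without-partition-by")
            break
    group_by = re.search(
        r"\bGROUP\s+BY\b(.*?)(?:\bHAVING\b|\bORDER\s+BY\b|\bLIMIT\b|;|$)",
        dt_sql,
        re.IGNORECASE | re.DOTALL,
    )
    if group_by and "(" in group_by.group(1):
        hints.append("group-by-expression")
    return hints


query = (
    "SELECT DATE_TRUNC('day', ts) AS d, SUM(x) OVER (ORDER BY ts) AS s "
    "FROM a LEFT JOIN b ON a.id = b.id GROUP BY DATE_TRUNC('day', ts)"
)
print(performance_hints(query))
```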
|
|
77
|
+
|
|
78
|
+
**Check 3: Whether the JOIN involves a dimension table**
|
|
79
|
+
|
|
80
|
+
Follow the recommended scenarios in #[[file:../best-practices/dimension-table-join-guide.md]]:
|
|
81
|
+
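- A JOIN is present in the SQL → ask the user whether the right-hand table is a rarely changing dimension table (code table, dictionary table, configuration table, and so on)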
|
|
82
|
+
- If so → suggest adding the `mv_const_tables` setting to TBLPROPERTIES, and explain its behavior and the data-consistency trade-off
|
|
83
|
+
|
|
84
|
+
## Output checklist
|
|
85
|
+
|
|
86
|
+
For each table, the final output is:
|
|
87
|
+
|
|
88
|
+
| File | Content | Condition |
|
|
89
|
+
|------|------|------|
|
|
90
|
+
| `<table_name>.sql` | Dynamic Table DDL | Always generated |
|
|
91
|
+
| `<table_name>_refresh.sql` | Current-period REFRESH statement | Always generated |
|
|
92
|
+
| `<table_name>_prev_refresh.sql` | Previous-period REFRESH statement | Only when partition variables exist |
|
|
93
|
+
| `<table_name>_backfill.sql` | Backfill statement | Only when partition variables exist |
|
|
94
|
+
|
|
95
|
+
## Quick decision path
|
|
96
|
+
|
|
97
|
+
```
|
|
98
|
+
Input DDL + INSERT OVERWRITE
|
|
99
|
+
│
|
|
100
|
+
├─ Placeholders present? → Step 2 placeholder substitution
|
|
101
|
+
│
|
|
102
|
+
├─ Self-reference? → Step 3 special handling
|
|
103
|
+
│
|
|
104
|
+
├─ Static partitions present? → Step 4 inject partition values into SELECT
|
|
105
|
+
│
|
|
106
|
+
├─ UNION ALL present? → Step 4 inject into each branch independently
|
|
107
|
+
│
|
|
108
|
+
└─ Generate DDL → Step 5 validation → Step 6 generate companion files → Step 7 improvement suggestions
|
|
109
|
+
```
|