@clickzetta/cz-cli-darwin-x64 0.3.89 → 0.3.90

Files changed (27)
  1. package/bin/cz-cli +0 -0
  2. package/bin/skills/clickzetta-dynamic-table/SKILL.md +169 -169
  3. package/bin/skills/clickzetta-dynamic-table/best-practices/dimension-table-join-guide.md +126 -126
  4. package/bin/skills/clickzetta-dynamic-table/best-practices/medallion-and-stream-patterns.md +25 -25
  5. package/bin/skills/clickzetta-dynamic-table/best-practices/non-partitioned-merge-into-warning.md +48 -48
  6. package/bin/skills/clickzetta-dynamic-table/best-practices/performance-optimization.md +51 -51
  7. package/bin/skills/clickzetta-dynamic-table/best-practices/scheduling-guide.md +59 -59
  8. package/bin/skills/clickzetta-dynamic-table/dt-creator/SKILL.md +8 -7
  9. package/bin/skills/clickzetta-dynamic-table/dt-creator/references/dt-declaration-strategy.md +99 -99
  10. package/bin/skills/clickzetta-dynamic-table/dt-creator/references/incremental-config-reference.md +188 -188
  11. package/bin/skills/clickzetta-dynamic-table/dt-creator/references/refresh-history-guide.md +117 -117
  12. package/bin/skills/clickzetta-dynamic-table/dt-creator/references/sql-limitations.md +29 -29
  13. package/bin/skills/clickzetta-dynamic-table/dynamic-table-alter/SKILL.md +80 -79
  14. package/bin/skills/clickzetta-dynamic-table/sql-to-dt/SKILL.md +15 -15
  15. package/bin/skills/clickzetta-dynamic-table/sql-to-dt/references/sql2dt-column-validation-rules.md +61 -61
  16. package/bin/skills/clickzetta-dynamic-table/sql-to-dt/references/sql2dt-conversion-rules.md +100 -100
  17. package/bin/skills/clickzetta-dynamic-table/sql-to-dt/references/sql2dt-placeholder-rules.md +64 -64
  18. package/bin/skills/clickzetta-dynamic-table/sql-to-dt/references/sql2dt-refresh-rules.md +32 -32
  19. package/bin/skills/clickzetta-dynamic-table/sql-to-dt/references/sql2dt-self-reference-rules.md +21 -21
  20. package/bin/skills/clickzetta-dynamic-table/sql-to-dt/references/sql2dt-workflow.md +71 -71
  21. package/bin/skills/clickzetta-sql-pipeline-manager/SKILL.md +203 -202
  22. package/bin/skills/clickzetta-sql-pipeline-manager/references/dynamic-table.md +62 -62
  23. package/bin/skills/clickzetta-sql-pipeline-manager/references/materialized-view.md +34 -34
  24. package/bin/skills/clickzetta-sql-pipeline-manager/references/pipe.md +61 -61
  25. package/bin/skills/clickzetta-sql-pipeline-manager/references/table-stream.md +41 -41
  26. package/bin/skills/clickzetta-table-stream-pipeline/SKILL.md +103 -101
  27. package/package.json +1 -1
@@ -1,122 +1,122 @@
1
- # SQL → Dynamic Table 转换规则
1
+ # SQL → Dynamic Table Conversion Rules
2
2
 
3
- 你是一个 SQL 转换专家。给定一个 Hive/Spark SQL CREATE TABLE DDL 和对应的 INSERT OVERWRITE 语句,你需要按以下规则将它们合并为一个 Dynamic Table DDL
3
+ You are a SQL conversion expert. Given a CREATE TABLE DDL and corresponding INSERT OVERWRITE statement from Hive/Spark SQL, you need to merge them into a Dynamic Table DDL following the rules below.
4
4
 
5
- ## 总体转换公式
5
+ ## Overall Conversion Formula
6
6
 
7
7
  ```
8
- 输入1: CREATE TABLE schema.table_name (...) PARTITIONED BY (...) ...
9
- 输入2: INSERT OVERWRITE TABLE schema.table_name PARTITION(...) SELECT ... FROM ...
10
- 输出: CREATE OR REPLACE DYNAMIC TABLE schema.table_name (...) PARTITIONED BY (...) ... AS SELECT ... FROM ...
8
+ Input 1: CREATE TABLE schema.table_name (...) PARTITIONED BY (...) ...
9
+ Input 2: INSERT OVERWRITE TABLE schema.table_name PARTITION(...) SELECT ... FROM ...
10
+ Output: CREATE OR REPLACE DYNAMIC TABLE schema.table_name (...) PARTITIONED BY (...) ... AS SELECT ... FROM ...
11
11
  ```
12
12
 
13
- 核心思想:把 CREATE TABLE 的结构定义 + INSERT OVERWRITE 的查询逻辑,合并成一个 `CREATE OR REPLACE DYNAMIC TABLE ... AS SELECT ...` 语句。
13
+ Core idea: merge the structure definition from CREATE TABLE with the query logic from INSERT OVERWRITE into a single `CREATE OR REPLACE DYNAMIC TABLE ... AS SELECT ...` statement.
14
14
 
15
- ## 第一步:解析 CREATE TABLE DDL
15
+ ## Step 1: Parse the CREATE TABLE DDL
16
16
 
17
- DDL 中提取以下信息:
17
+ Extract the following information from the DDL:
18
18
 
19
- 1. **表名**(含 schema):`schema.table_name`
20
- 2. **普通列**:列名、数据类型、COMMENT(保持原始缩进格式)
21
- 3. **分区列**:PARTITIONED BY 中的列名、数据类型、COMMENT
22
- 4. **存储格式**:STORED AS PARQUET/ORC/AVRO
23
- 5. **表属性**:TBLPROPERTIES 或 WITH PROPERTIES 中的键值对
24
- 6. **分桶信息**:CLUSTERED BY / SORTED BY / RANGE CLUSTERED BY / HASH CLUSTERED BY
25
- 7. **生命周期**:LIFECYCLE N
26
- 8. **连接信息**:CONNECTION schema.connection_name
27
- 9. **位置信息**:LOCATION 'path'
19
+ 1. **Table name** (including schema): `schema.table_name`
20
+ 2. **Regular columns**: column name, data type, COMMENT (preserve original indentation format)
21
+ 3. **Partition columns**: column name, data type, COMMENT from PARTITIONED BY
22
+ 4. **Storage format**: STORED AS PARQUET/ORC/AVRO, etc.
23
+ 5. **Table properties**: key-value pairs from TBLPROPERTIES or WITH PROPERTIES
24
+ 6. **Bucketing info**: CLUSTERED BY / SORTED BY / RANGE CLUSTERED BY / HASH CLUSTERED BY
25
+ 7. **Lifecycle**: LIFECYCLE N
26
+ 8. **Connection info**: CONNECTION schema.connection_name
27
+ 9. **Location info**: LOCATION 'path'
28
28
 
29
- ## 第二步:解析 INSERT OVERWRITE 语句
29
+ ## Step 2: Parse the INSERT OVERWRITE Statement
30
30
 
31
- INSERT 语句中提取:
31
+ Extract from the INSERT statement:
32
32
 
33
- 1. **目标表名**:用于自引用检测
34
- 2. **分区类型**:
35
- - 动态分区:`PARTITION (col1, col2)` — 列名无值
36
- - 静态分区:`PARTITION (col1='value1', col2=value2)` — 列名有值
37
- - 混合分区:`PARTITION (static_col='value', dynamic_col)` — 部分有值
38
- 3. **SELECT 查询**:完整的查询逻辑(含 WHERE、JOIN、GROUP BY 等)
39
- 4. **CTE(WITH 子句)**:如果有,保留完整的 WITH ... AS (...) 结构
40
- 5. **前置语句**:SET 语句、CREATE TEMPORARY FUNCTION 等(保留)
33
+ 1. **Target table name**: used for self-reference detection
34
+ 2. **Partition type**:
35
+ - Dynamic partition: `PARTITION (col1, col2)` — column names without values
36
+ - Static partition: `PARTITION (col1='value1', col2=value2)` — column names with values
37
+ - Mixed partition: `PARTITION (static_col='value', dynamic_col)` — some with values
38
+ 3. **SELECT query**: complete query logic (including WHERE, JOIN, GROUP BY, etc.)
39
+ 4. **CTE (WITH clause)**: if present, retain the complete `WITH ... AS (...)` structure
40
+ 5. **Preceding statements**: SET statements, CREATE TEMPORARY FUNCTION, etc. (retain)
41
41
 
42
- ### 需要过滤的语句
42
+ ### Statements to Filter Out
43
43
 
44
- INSERT 文件中移除:
44
+ Remove from the INSERT file:
45
45
  - `ALTER TABLE ... ADD PARTITION ...`
46
46
  - `ALTER TABLE ... DROP PARTITION ...`
47
- - 所有 `ALTER TABLE` 开头的语句
48
- - `ANALYZE TABLE` 语句
49
- - SQL 注释(`--` 和 `/* */`)
47
+ - All statements starting with `ALTER TABLE`
48
+ - `ANALYZE TABLE` statements
49
+ - SQL comments (`--` and `/* */`)
50
50
 
51
- ## 第三步:组装 Dynamic Table DDL
51
+ ## Step 3: Assemble the Dynamic Table DDL
52
52
 
53
- 按以下顺序组装输出:
53
+ Assemble the output in the following order:
54
54
 
55
55
  ```sql
56
- -- 可选:如果需要删除已存在的同名表,请取消下一行的注释
56
+ -- Optional: to drop an existing table with the same name, uncomment the next line
57
57
  -- DROP TABLE IF EXISTS schema.table_name;
58
58
 
59
- CREATE SCHEMA IF NOT EXISTS schema; -- 仅当表名含 schema
59
+ CREATE SCHEMA IF NOT EXISTS schema; -- only when table name contains schema
60
60
  CREATE OR REPLACE DYNAMIC TABLE schema.table_name (
61
- col1 BIGINT COMMENT '...', -- 普通列(保持原始格式)
61
+ col1 BIGINT COMMENT '...', -- regular columns (preserve original format)
62
62
  col2 STRING COMMENT '...',
63
- part_col1 STRING COMMENT '...' -- 分区列追加在普通列后面
63
+ part_col1 STRING COMMENT '...' -- partition columns appended after regular columns
64
64
  )
65
- PARTITIONED BY (part_col1, part_col2) -- 仅列名,不含类型
65
+ PARTITIONED BY (part_col1, part_col2) -- column names only, no types
66
66
  [CLUSTERED BY (...) [SORTED BY (...)] [INTO N BUCKETS]]
67
67
  [STORED AS PARQUET]
68
- TBLPROPERTIES ('key' = 'value') -- 合并模板属性和原始属性
68
+ TBLPROPERTIES ('key' = 'value') -- merge template properties and original properties
69
69
  [LIFECYCLE N]
70
70
  [CONNECTION schema.connection_name]
71
- [LOCATION 'original_path_dt'] -- 原路径加 _dt 后缀
71
+ [LOCATION 'original_path_dt'] -- original path with _dt suffix
72
72
  AS
73
- SELECT查询; -- 来自 INSERT OVERWRITE 的查询
73
+ SELECT query; -- query from INSERT OVERWRITE
74
74
  ```
75
75
 
76
- ### 关键规则
76
+ ### Key Rules
77
77
 
78
- 1. **列定义**:普通列 + 分区列合并到一个括号内,保持原始缩进
79
- 2. **PARTITIONED BY**:只写列名,不写类型(与 CREATE TABLE 不同)
80
- 3. **CREATE SCHEMA**:如果表名含 `.`(如 `kscdm.table_name`),在 DDL 前加 `CREATE SCHEMA IF NOT EXISTS kscdm;`
81
- 4. **LOCATION**:原路径加 `_dt` 后缀
82
- 5. **DROP 语句**:注释掉的 `DROP TABLE IF EXISTS` 放在最前面
78
+ 1. **Column definitions**: regular columns + partition columns merged into one set of parentheses, preserving original indentation
79
+ 2. **PARTITIONED BY**: write column names only, no types (unlike CREATE TABLE)
80
+ 3. **CREATE SCHEMA**: if the table name contains `.` (e.g., `kscdm.table_name`), add `CREATE SCHEMA IF NOT EXISTS kscdm;` before the DDL
81
+ 4. **LOCATION**: original path with `_dt` suffix
82
+ 5. **DROP statement**: commented-out `DROP TABLE IF EXISTS` placed at the very beginning
83
83
 
84
- ## 第四步:静态分区注入
84
+ ## Step 4: Static Partition Injection
85
85
 
86
- INSERT OVERWRITE 使用静态分区(`PARTITION(col=value)`)时,需要将分区值注入到 SELECT 子句中。
86
+ When INSERT OVERWRITE uses static partitions (`PARTITION(col=value)`), partition values need to be injected into the SELECT clause.
87
87
 
88
- ### 注入规则
88
+ ### Injection Rules
89
89
 
90
- SELECT 的最后一个列之后、FROM 之前,按 DDL 中分区列的定义顺序追加:
90
+ After the last column in SELECT and before FROM, append in the order of partition column definitions in the DDL:
91
91
 
92
92
  ```sql
93
- -- 原始 SELECT
93
+ -- Original SELECT
94
94
  SELECT col1, col2 FROM source_table
95
95
 
96
- -- 注入后(假设 PARTITION(year=2024, month='January')
96
+ -- After injection (assuming PARTITION(year=2024, month='January'))
97
97
  SELECT col1, col2,
98
98
  2024 AS year,
99
99
  'January' AS month
100
100
  FROM source_table
101
101
  ```
102
102
 
103
- ### 值类型智能处理
103
+ ### Smart Value Type Handling
104
104
 
105
- 注入时根据值的类型决定是否加引号:
105
+ Decide whether to add quotes based on the value type when injecting:
106
106
 
107
- | 值类型 | 判断规则 | 处理 | 示例 |
107
+ | Value type | Detection rule | Handling | Example |
108
108
  |--------|----------|------|------|
109
- | 已有引号 | `'` `"` 开头结尾 | 保持原样 | `'hello'` → `'hello'` |
110
- | NULL | 值为 `NULL`(不区分大小写) | 不加引号 | `NULL` |
111
- | 布尔值 | `true` / `false`(不区分大小写) | 不加引号 | `true` |
112
- | 数字 | 可被 `float()` 解析 | 不加引号 | `123`, `-45.67`, `1.23e-4` |
113
- | SESSION_CONFIGS | 包含 `SESSION_CONFIGS(` | 不加引号 | `SESSION_CONFIGS()['dt.args.ds']` |
114
- | 函数调用 | 匹配 `标识符(...)` 且括号平衡 | 不加引号 | `CURRENT_DATE()`, `YEAR(col)` |
115
- | 字符串 | 以上都不匹配 | 加单引号,内部 `'` 转义为 `''` | `hello` → `'hello'` |
109
+ | Already quoted | Starts and ends with `'` or `"` | Keep as-is | `'hello'` → `'hello'` |
110
+ | NULL | Value is `NULL` (case-insensitive) | No quotes | `NULL` |
111
+ | Boolean | `true` / `false` (case-insensitive) | No quotes | `true` |
112
+ | Number | Can be parsed by `float()` | No quotes | `123`, `-45.67`, `1.23e-4` |
113
+ | SESSION_CONFIGS | Contains `SESSION_CONFIGS(` | No quotes | `SESSION_CONFIGS()['dt.args.ds']` |
114
+ | Function call | Matches `identifier(...)` with balanced parentheses | No quotes | `CURRENT_DATE()`, `YEAR(col)` |
115
+ | String | None of the above match | Add single quotes; escape internal `'` as `''` | `hello` → `'hello'` |
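The detection order in the table above could be implemented roughly as follows. This is a minimal sketch, not the tool's actual code: `format_partition_value` and its helpers are hypothetical names, and the function-call pattern is an assumption about what "identifier(...)" means.

```python
import re

# Assumed shape of a function call: an identifier, then a parenthesized argument list.
_FUNC_CALL = re.compile(r"^[A-Za-z_][A-Za-z0-9_.]*\(.*\)$", re.DOTALL)


def _is_number(text: str) -> bool:
    """True when Python's float() accepts the text, as the table specifies."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def _balanced(text: str) -> bool:
    """True when every ')' closes an earlier '(' and none are left open."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def format_partition_value(raw: str) -> str:
    """Decide whether a static partition value needs quoting before injection."""
    value = raw.strip()
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
        return value  # already quoted
    if value.upper() == "NULL" or value.lower() in ("true", "false"):
        return value
    if _is_number(value):
        return value
    if "SESSION_CONFIGS(" in value:
        return value
    if _FUNC_CALL.match(value) and _balanced(value):
        return value
    return "'" + value.replace("'", "''") + "'"  # plain string: quote and escape
```

Note that `float()` also accepts strings such as `nan` and `inf`, so a rule based on it would leave those unquoted as well; whether that is intended is not stated in the rules.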
116
116
 
117
- ### UNION ALL 处理
117
+ ### UNION ALL Handling
118
118
 
119
- 如果 SELECT 包含 UNION ALL,每个分支都要独立注入分区列:
119
+ If SELECT contains UNION ALL, inject partition columns into each branch independently:
120
120
 
121
121
  ```sql
122
122
  SELECT col1, col2,
@@ -130,60 +130,60 @@ FROM table_b
130
130
 
131
131
  ### CTE + UNION ALL
132
132
 
133
- 如果有 WITH 子句,先分离 CTE 部分,只对主查询中的 UNION 分支注入。
133
+ If there is a WITH clause, first separate the CTE part, then inject only into the UNION branches in the main query.
134
134
 
135
- ### 已存在的分区列
135
+ ### Already-existing Partition Columns
136
136
 
137
- 如果 SELECT 中已经包含了某个分区列(通过 `AS alias` 或末尾标识符检测),则跳过该列的注入,避免重复。
137
+ If SELECT already contains a partition column (detected via `AS alias` or trailing identifier), skip injection for that column to avoid duplication.
138
138
 
139
- ## 第五步:日期函数后处理
139
+ ## Step 5: Date Function Post-processing
140
140
 
141
- 生成 DDL 后,对整个 DDL 文本做一次全局替换:
141
+ After generating the DDL, do a global replacement on the entire DDL text:
142
142
 
143
- | 原始形式 | 替换为 |
143
+ | Original form | Replace with |
144
144
  |----------|--------|
145
145
  | `DATE_SUB(expr, INTERVAL N DAY)` | `sub_days(expr, N)` |
146
146
  | `DATE_ADD(expr, INTERVAL N DAY)` | `sub_days(expr, -N)` |
147
147
 
148
- 这一步确保最终输出统一使用 `sub_days` 函数。
148
+ This step ensures the final output consistently uses the `sub_days` function.
149
149
 
150
- > 注意:在 SQL 引擎中,`SUB_DAYS` 是 `DATE_SUB` 的别名,两者等价。统一使用 `sub_days` 是为了保持输出一致性。
150
+ > Note: In the SQL engine, `SUB_DAYS` is an alias for `DATE_SUB`; they are equivalent. Using `sub_days` uniformly is for output consistency.
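As an illustration of the global replacement, a regex-based pass might look like the sketch below. `to_sub_days` is a hypothetical name, and the pattern is a simplification: it assumes a literal integer day count and does not handle one `DATE_SUB`/`DATE_ADD` call nested inside another.

```python
import re

# Matches DATE_SUB(expr, INTERVAL N DAY) and DATE_ADD(expr, INTERVAL N DAY).
_INTERVAL_DAY = re.compile(
    r"\b(DATE_SUB|DATE_ADD)\(\s*(.+?)\s*,\s*INTERVAL\s+(\d+)\s+DAY\s*\)",
    re.IGNORECASE,
)


def to_sub_days(ddl: str) -> str:
    """Rewrite day-interval DATE_SUB/DATE_ADD calls to sub_days(expr, +/-N)."""

    def replace(match):
        func = match.group(1).upper()
        expr = match.group(2)
        days = match.group(3)
        # DATE_ADD becomes a negative day count, as in the table above.
        signed = days if func == "DATE_SUB" else f"-{days}"
        return f"sub_days({expr}, {signed})"

    return _INTERVAL_DAY.sub(replace, ddl)


print(to_sub_days("WHERE dt = DATE_ADD(SESSION_CONFIGS()['dt.args.ds'], INTERVAL 7 DAY)"))
```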
151
151
 
152
- ## 第六步:表属性模板合并
152
+ ## Step 6: Table Property Template Merge
153
153
 
154
- 默认模板属性:`data_lifecycle = 15`
154
+ Default template property: `data_lifecycle = 15`
155
155
 
156
- 合并规则:
157
- - 模板属性作为基础
158
- - 原始 DDL 中的 TBLPROPERTIES 覆盖同名模板属性
159
- - 最终结果写入 TBLPROPERTIES
156
+ Merge rules:
157
+ - Template properties serve as the base
158
+ - TBLPROPERTIES from the original DDL override template properties with the same name
159
+ - Final result is written to TBLPROPERTIES
160
160
 
161
161
  ```sql
162
- -- 模板: data_lifecycle=15
163
- -- 原始DDL: TBLPROPERTIES('compression'='snappy', 'data_lifecycle'='30')
164
- -- 合并结果:
162
+ -- Template: data_lifecycle=15
163
+ -- Original DDL: TBLPROPERTIES('compression'='snappy', 'data_lifecycle'='30')
164
+ -- Merged result:
165
165
  TBLPROPERTIES ('data_lifecycle' = '30', 'compression' = 'snappy')
166
- -- data_lifecycle 保留原始值 30,compression 来自原始DDL
166
+ -- data_lifecycle retains original value 30; compression comes from original DDL
167
167
  ```
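The merge rule amounts to a dictionary update in which the original DDL wins. A minimal sketch, assuming property values are kept as strings and using hypothetical names (`DEFAULT_TEMPLATE`, `merge_tblproperties`, `render_tblproperties`):

```python
# Default template taken from the rule above; values are kept as strings here.
DEFAULT_TEMPLATE = {"data_lifecycle": "15"}


def merge_tblproperties(original: dict, template: dict = None) -> dict:
    """Start from the template, then let the original DDL's properties override it."""
    merged = dict(DEFAULT_TEMPLATE if template is None else template)
    merged.update(original)
    return merged


def render_tblproperties(props: dict) -> str:
    """Render the merged properties as a TBLPROPERTIES clause."""
    pairs = ", ".join(f"'{key}' = '{value}'" for key, value in props.items())
    return f"TBLPROPERTIES ({pairs})"


merged = merge_tblproperties({"compression": "snappy", "data_lifecycle": "30"})
print(render_tblproperties(merged))
```

Because `dict.update` keeps the position of keys that already exist, template keys stay first in the rendered clause, which matches the ordering in the example above.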
168
168
 
169
- ## 完整示例
169
+ ## Complete Example
170
170
 
171
- ### 输入1:DDL
171
+ ### Input 1: DDL
172
172
  ```sql
173
173
  CREATE TABLE IF NOT EXISTS sales_data (
174
- id BIGINT COMMENT '销售记录ID',
175
- product_name STRING COMMENT '产品名称',
176
- sales_amount DECIMAL(12,2) COMMENT '销售金额'
174
+ id BIGINT COMMENT 'Sales record ID',
175
+ product_name STRING COMMENT 'Product name',
176
+ sales_amount DECIMAL(12,2) COMMENT 'Sales amount'
177
177
  )
178
178
  PARTITIONED BY (
179
- year INT COMMENT '年份',
180
- month INT COMMENT '月份'
179
+ year INT COMMENT 'Year',
180
+ month INT COMMENT 'Month'
181
181
  )
182
182
  STORED AS PARQUET
183
183
  LOCATION '/data/warehouse/sales_data';
184
184
  ```
185
185
 
186
- ### 输入2:INSERT OVERWRITE
186
+ ### Input 2: INSERT OVERWRITE
187
187
  ```sql
188
188
  INSERT OVERWRITE TABLE sales_data
189
189
  PARTITION (year, month)
@@ -197,17 +197,17 @@ FROM raw_sales s
197
197
  WHERE s.status = 'completed';
198
198
  ```
199
199
 
200
- ### 输出:Dynamic Table DDL
200
+ ### Output: Dynamic Table DDL
201
201
  ```sql
202
- -- 可选:如果需要删除已存在的同名表,请取消下一行的注释
202
+ -- Optional: to drop an existing table with the same name, uncomment the next line
203
203
  -- DROP TABLE IF EXISTS sales_data;
204
204
 
205
205
  CREATE OR REPLACE DYNAMIC TABLE sales_data (
206
- id BIGINT COMMENT '销售记录ID',
207
- product_name STRING COMMENT '产品名称',
208
- sales_amount DECIMAL(12,2) COMMENT '销售金额',
209
- year INT COMMENT '年份',
210
- month INT COMMENT '月份'
206
+ id BIGINT COMMENT 'Sales record ID',
207
+ product_name STRING COMMENT 'Product name',
208
+ sales_amount DECIMAL(12,2) COMMENT 'Sales amount',
209
+ year INT COMMENT 'Year',
210
+ month INT COMMENT 'Month'
211
211
  )
212
212
  PARTITIONED BY (year, month)
213
213
  STORED AS PARQUET
@@ -1,164 +1,164 @@
1
- # SQL 占位符 → SESSION_CONFIGS() 转换规则
1
+ # SQL Placeholder → SESSION_CONFIGS() Conversion Rules
2
2
 
3
- 你是一个 SQL 转换专家。在将传统 SQL 转换为 Dynamic Table SQL 时,需要将各种占位符格式统一转换为 `SESSION_CONFIGS()` 函数调用。
3
+ You are a SQL conversion expert. When converting traditional SQL to Dynamic Table SQL, you need to convert various placeholder formats uniformly to `SESSION_CONFIGS()` function calls.
4
4
 
5
- ## 占位符格式统一
5
+ ## Placeholder Format Normalization
6
6
 
7
- 首先将所有旧格式统一为 `${...}` 格式:
7
+ First, normalize all legacy formats to `${...}` format:
8
8
 
9
- | 旧格式 | 统一为 |
9
+ | Legacy format | Normalize to |
10
10
  |--------|--------|
11
11
  | `{{ var }}` | `${var}` |
12
12
  | `{{ ds }}` | `${ds}` |
13
13
  | `{{region}}` | `${region}` |
14
14
 
15
- 转换正则:`\{\{\s*([^}]+)\s*\}\}` → `${\1}`
15
+ Conversion regex: `\{\{\s*([^}]+)\s*\}\}` → `${\1}`
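A short Python sketch of this normalization follows. One caveat, stated as an observation rather than a claim about the tool: with a greedy `[^}]+`, the capture group in `{{ ds }}` appears to keep the trailing space and yield `${ds }`. The sketch therefore uses a lazy `[^}]+?`, which is an assumption on my part; `normalize_placeholders` is a hypothetical name.

```python
import re

# Lazy variant of the documented pattern so surrounding whitespace is not captured.
_JINJA = re.compile(r"\{\{\s*([^}]+?)\s*\}\}")


def normalize_placeholders(sql: str) -> str:
    """Rewrite {{ var }} style placeholders to ${var}."""
    return _JINJA.sub(r"${\1}", sql)


print(normalize_placeholders("WHERE dt = '{{ ds }}' AND region = '{{region}}'"))
```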
16
16
 
17
- ## 基本替换规则
17
+ ## Basic Replacement Rules
18
18
 
19
- ### 简单变量
19
+ ### Simple Variables
20
20
 
21
- | 输入 | 输出 |
21
+ | Input | Output |
22
22
  |------|------|
23
23
  | `${ds}` | `SESSION_CONFIGS()['dt.args.ds']` |
24
24
  | `${region}` | `SESSION_CONFIGS()['dt.args.region']` |
25
25
  | `${hour}` | `SESSION_CONFIGS()['dt.args.hour']` |
26
26
 
27
- ### nodash 变量(特殊处理)
27
+ ### nodash Variables (Special Handling)
28
28
 
29
- 变量名中包含 `nodash` 时,自动包装 DATE_FORMAT,但变量名保持原样:
29
+ When the variable name contains `nodash`, automatically wrap with DATE_FORMAT, but keep the variable name as-is:
30
30
 
31
- | 输入 | 输出 |
31
+ | Input | Output |
32
32
  |------|------|
33
33
  | `${ds_nodash}` | `DATE_FORMAT(SESSION_CONFIGS()['dt.args.ds_nodash'], 'yyyyMMdd')` |
34
34
  | `${dsnodash}` | `DATE_FORMAT(SESSION_CONFIGS()['dt.args.dsnodash'], 'yyyyMMdd')` |
35
35
 
36
- 注意:变量名保持原样(`ds_nodash` 不会变成 `ds`),只是外层包 DATE_FORMAT
36
+ Note: the variable name stays as-is (`ds_nodash` does not become `ds`); only the outer DATE_FORMAT is added.
37
37
 
38
- ### 带运算的变量
38
+ ### Variables with Arithmetic
39
39
 
40
- 最终输出统一使用 `sub_days` 函数(有一个后处理步骤会将所有 `DATE_SUB`/`DATE_ADD` 转为 `sub_days`):
40
+ The final output consistently uses the `sub_days` function (a post-processing step converts all `DATE_SUB`/`DATE_ADD` to `sub_days`):
41
41
 
42
- | 输入 | 最终输出 |
42
+ | Input | Final output |
43
43
  |------|----------|
44
44
  | `${ds - 1}` | `DATE_FORMAT(sub_days(SESSION_CONFIGS()['dt.args.ds'], 1), 'yyyy-MM-dd')` |
45
45
  | `${ds + 7}` | `DATE_FORMAT(sub_days(SESSION_CONFIGS()['dt.args.ds'], -7), 'yyyy-MM-dd')` |
46
46
  | `${ds_nodash - 1}` | `DATE_FORMAT(sub_days(SESSION_CONFIGS()['dt.args.ds_nodash'], 1), 'yyyyMMdd')::STRING` |
47
47
 
48
- 规则:
49
- - `-` 运算 → `sub_days(..., N)`(N 为正数)
50
- - `+` 运算 → `sub_days(..., -N)`(N 取反为负数)
51
- - 外层包 `DATE_FORMAT`,格式根据变量名决定:
52
- - `nodash` → `'yyyyMMdd'`
53
- - 不含 `nodash` → `'yyyy-MM-dd'`
54
- - `nodash` 的变量带运算时,追加 `::STRING` 类型转换
48
+ Rules:
49
+ - `-` operation → `sub_days(..., N)` (N is positive)
50
+ - `+` operation → `sub_days(..., -N)` (N negated to negative)
51
+ - Outer `DATE_FORMAT`, format determined by variable name:
52
+ - Contains `nodash` → `'yyyyMMdd'`
53
+ - Does not contain `nodash` → `'yyyy-MM-dd'`
54
+ - Variables containing `nodash` with arithmetic append `::STRING` type cast
55
55
 
56
- 注意:这是最终输出形式。中间步骤可能先生成 `DATE_SUB`/`DATE_ADD`,但最终会被后处理统一转为 `sub_days`。
56
+ Note: this is the final output form. Intermediate steps may first generate `DATE_SUB`/`DATE_ADD`, but they will be uniformly converted to `sub_days` by post-processing.
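The simple, `nodash`, and arithmetic rules can be combined into one rendering function. The sketch below reproduces the rows of the tables above for a placeholder body (the text between `${` and `}`); `render_placeholder` is a hypothetical name, and quote-context handling is deliberately left out.

```python
import re

_ARITH = re.compile(r"^\s*([A-Za-z_]\w*)\s*([+-])\s*(\d+)\s*$")
_SIMPLE = re.compile(r"^\s*([A-Za-z_]\w*)\s*$")


def render_placeholder(body: str) -> str:
    """Render the body of a ${...} placeholder as a SESSION_CONFIGS() expression."""
    arith = _ARITH.match(body)
    if arith:
        name, op, amount = arith.groups()
        days = amount if op == "-" else f"-{amount}"  # '+' flips the sign for sub_days
        nodash = "nodash" in name
        fmt = "yyyyMMdd" if nodash else "yyyy-MM-dd"
        expr = (
            f"DATE_FORMAT(sub_days(SESSION_CONFIGS()['dt.args.{name}'], {days}), '{fmt}')"
        )
        return expr + "::STRING" if nodash else expr
    simple = _SIMPLE.match(body)
    if simple is None:
        raise ValueError(f"unrecognized placeholder body: {body!r}")
    name = simple.group(1)
    ref = f"SESSION_CONFIGS()['dt.args.{name}']"
    return f"DATE_FORMAT({ref}, 'yyyyMMdd')" if "nodash" in name else ref


for body in ("ds", "ds_nodash", "ds - 1", "ds + 7", "ds_nodash - 1"):
    print(body, "->", render_placeholder(body))
```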
57
57
 
58
- ### macros.ds_add 函数
58
+ ### macros.ds_add Function
59
59
 
60
- | 输入 | 输出 |
60
+ | Input | Output |
61
61
  |------|------|
62
62
  | `${macros.ds_add(ds, -1)}` | `DATE_FORMAT(sub_days(SESSION_CONFIGS()['dt.args.ds'], 1), 'yyyy-MM-dd')` |
63
63
  | `${macros.ds_add(ds, 7)}` | `DATE_FORMAT(sub_days(SESSION_CONFIGS()['dt.args.ds'], -7), 'yyyy-MM-dd')` |
64
64
 
65
- 注意:`macros.ds_add` 的第二个参数与 `sub_days` 的参数符号相反。`macros.ds_add(ds, -1)` 表示 ds 减 1 天,对应 `sub_days(ds, 1)`(正数=减天数);`macros.ds_add(ds, 7)` 表示 ds 加 7 天,对应 `sub_days(ds, -7)`(负数=加天数)。
65
+ Note: the second parameter of `macros.ds_add` has the opposite sign from `sub_days`. `macros.ds_add(ds, -1)` means ds minus 1 day, corresponding to `sub_days(ds, 1)` (positive = subtract days); `macros.ds_add(ds, 7)` means ds plus 7 days, corresponding to `sub_days(ds, -7)` (negative = add days).
66
66
 
67
- ## 引号上下文规则
67
+ ## Quote Context Rules
68
68
 
69
- 占位符的处理方式取决于它所在的引号上下文:
69
+ The handling of a placeholder depends on the quote context it is in:
70
70
 
71
- ### 情况1:占位符在单引号内(纯占位符)
71
+ ### Case 1: Placeholder inside single quotes (pure placeholder)
72
72
 
73
73
  ```sql
74
- -- 输入
74
+ -- Input
75
75
  WHERE dt = '${ds}'
76
- -- 输出(去除外层引号,直接替换)
76
+ -- Output (remove outer quotes; direct replacement)
77
77
  WHERE dt = SESSION_CONFIGS()['dt.args.ds']
78
78
  ```
79
79
 
80
- ### 情况2:占位符在单引号内(混合内容)
80
+ ### Case 2: Placeholder inside single quotes (mixed content)
81
81
 
82
- 当引号内同时包含占位符和字面文本时,使用 CONCAT
82
+ When the quoted string contains both a placeholder and literal text, use CONCAT:
83
83
 
84
84
  ```sql
85
- -- 输入
85
+ -- Input
86
86
  WHERE dt = '${ds_nodash}_done'
87
- -- 输出
87
+ -- Output
88
88
  WHERE dt = CONCAT(DATE_FORMAT(SESSION_CONFIGS()['dt.args.ds_nodash'], 'yyyyMMdd'), '_done')
89
89
  ```
90
90
 
91
91
  ```sql
92
- -- 输入
92
+ -- Input
93
93
  WHERE path = '/data/${region}/output'
94
- -- 输出
94
+ -- Output
95
95
  WHERE path = CONCAT('/data/', SESSION_CONFIGS()['dt.args.region'], '/output')
96
96
  ```
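Cases 1 and 2 differ only in whether literal text surrounds the placeholder, so one routine can cover both. A minimal sketch, assuming the input is the content between the single quotes and that placeholders are already in `${...}` form; `convert_quoted` and `render_name` are hypothetical names, and the key is single-quoted to match the Case 2 outputs shown above.

```python
import re

_PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def render_name(name: str) -> str:
    """Render a plain variable name, wrapping nodash variables in DATE_FORMAT."""
    ref = f"SESSION_CONFIGS()['dt.args.{name}']"
    return f"DATE_FORMAT({ref}, 'yyyyMMdd')" if "nodash" in name else ref


def convert_quoted(inner: str) -> str:
    """Convert the content of a single-quoted string that may contain placeholders."""
    parts = []
    last = 0
    for match in _PLACEHOLDER.finditer(inner):
        if match.start() > last:
            parts.append("'" + inner[last:match.start()] + "'")  # literal before
        parts.append(render_name(match.group(1).strip()))
        last = match.end()
    if last < len(inner):
        parts.append("'" + inner[last:] + "'")  # trailing literal
    if not parts:
        return "''"
    if len(parts) == 1:
        return parts[0]  # pure placeholder or pure literal: no CONCAT needed
    return "CONCAT(" + ", ".join(parts) + ")"


print(convert_quoted("/data/${region}/output"))
```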
97
97
 
98
- ### 情况3:占位符不在引号内
98
+ ### Case 3: Placeholder not inside quotes
99
99
 
100
100
  ```sql
101
- -- 输入
101
+ -- Input
102
102
  WHERE dt = ${ds}
103
- -- 输出
103
+ -- Output
104
104
  WHERE dt = SESSION_CONFIGS()['dt.args.ds']
105
105
  ```
106
106
 
107
- ### 情况4:占位符在单引号内,且是日期运算
107
+ ### Case 4: Placeholder inside single quotes with date arithmetic
108
108
 
109
109
  ```sql
110
- -- 输入
110
+ -- Input
111
111
  WHERE dt = '${ds - 1}'
112
- -- 输出(去除外层引号,添加 ::STRING 类型转换)
112
+ -- Output (remove outer quotes; add ::STRING type cast)
113
113
  WHERE dt = DATE_FORMAT(sub_days(SESSION_CONFIGS()['dt.args.ds'], 1), 'yyyy-MM-dd')::STRING
114
114
  ```
115
115
 
116
- ### 引号内的引号选择
116
+ ### Quote Selection Inside Strings
117
117
 
118
- 当替换后的表达式仍然处于单引号字符串内部时(如 CONCAT 场景),SESSION_CONFIGS 的键名使用双引号以避免引号冲突:
118
+ When the replaced expression is still inside a single-quoted string (e.g., CONCAT scenario), use double quotes for SESSION_CONFIGS key names to avoid quote conflicts:
119
119
  ```sql
120
- -- CONCAT 等单引号上下文中
120
+ -- Inside single-quote context (e.g., CONCAT)
121
121
  CONCAT('prefix_', SESSION_CONFIGS()["dt.args.ds"])
122
122
 
123
- -- 独立表达式(外层引号已去除)
123
+ -- Standalone expression (outer quotes already removed)
124
124
  SESSION_CONFIGS()['dt.args.ds']
125
125
  ```
126
126
 
127
- ## 静态分区中的占位符
127
+ ## Placeholders in Static Partitions
128
128
 
129
- 静态分区值中的占位符替换后,值会被注入到 SELECT 子句:
129
+ Placeholders in static partition values are replaced and then injected into the SELECT clause:
130
130
 
131
131
  ```sql
132
- -- 输入
132
+ -- Input
133
133
  INSERT OVERWRITE TABLE t PARTITION(dt='${ds}', region='${region}')
134
134
  SELECT col1 FROM source;
135
135
 
136
- -- 转换后
136
+ -- After conversion
137
137
  SELECT col1,
138
138
  SESSION_CONFIGS()['dt.args.ds'] AS dt,
139
139
  SESSION_CONFIGS()['dt.args.region'] AS region
140
140
  FROM source;
141
141
  ```
142
142
 
143
- ## 不可识别的表达式
143
+ ## Unrecognizable Expressions
144
144
 
145
- 对于无法解析的复杂表达式(如 Airflow Jinja 模板),进行清洗:
146
- 1. Python strftime 格式符转为 SQL 风格:`%Y`→`yyyy`, `%m`→`MM`, `%d`→`dd`, `%H`→`HH`
147
- 2. 非字母数字下划线字符替换为 `_`
148
- 3. 合并连续下划线,去除首尾下划线
149
- 4. 用清洗后的字符串作为 SESSION_CONFIGS 的键名
145
+ For complex expressions that cannot be parsed (e.g., Airflow Jinja templates), clean them up:
146
+ 1. Convert Python strftime format specifiers to SQL style: `%Y`→`yyyy`, `%m`→`MM`, `%d`→`dd`, `%H`→`HH`
147
+ 2. Replace non-alphanumeric-underscore characters with `_`
148
+ 3. Merge consecutive underscores; remove leading/trailing underscores
149
+ 4. Use the cleaned string as the SESSION_CONFIGS key name
150
150
 
151
151
  ```sql
152
- -- 输入
152
+ -- Input
153
153
  ${execution_date.strftime("%H00")}
154
- -- 清洗后键名: execution_date_strftime_HH00
155
- -- 输出
154
+ -- Cleaned key name: execution_date_strftime_HH00
155
+ -- Output
156
156
  SESSION_CONFIGS()['dt.args.execution_date_strftime_HH00']
157
157
  ```
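The four cleanup steps translate directly into a few string operations. A minimal sketch with hypothetical names (`clean_key`, `to_session_config`):

```python
import re

# Step 1 mapping from Python strftime specifiers to SQL-style format tokens.
_STRFTIME_TO_SQL = {"%Y": "yyyy", "%m": "MM", "%d": "dd", "%H": "HH"}


def clean_key(expression: str) -> str:
    """Turn an unparseable placeholder body into a SESSION_CONFIGS key name."""
    text = expression
    for specifier, sql_token in _STRFTIME_TO_SQL.items():
        text = text.replace(specifier, sql_token)   # step 1
    text = re.sub(r"[^A-Za-z0-9_]", "_", text)      # step 2
    return re.sub(r"_+", "_", text).strip("_")      # step 3


def to_session_config(expression: str) -> str:
    """Step 4: use the cleaned string as the key."""
    return f"SESSION_CONFIGS()['dt.args.{clean_key(expression)}']"


print(to_session_config('execution_date.strftime("%H00")'))
```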
158
158
 
159
- ## 完整示例
159
+ ## Complete Example
160
160
 
161
- ### 输入
161
+ ### Input
162
162
  ```sql
163
163
  INSERT OVERWRITE TABLE kscdm.dim_table
164
164
  PARTITION(p_date='{{ ds_nodash }}_done', product='done', dt='{{ ds }}')
@@ -169,7 +169,7 @@ WHERE dt = '{{ ds }}'
169
169
  AND region = '{{ region }}';
170
170
  ```
171
171
 
172
- ### 输出(占位符替换后)
172
+ ### Output (after placeholder replacement)
173
173
  ```sql
174
174
  SELECT id, name,
175
175
  CONCAT(DATE_FORMAT(SESSION_CONFIGS()['dt.args.ds_nodash'], 'yyyyMMdd'), '_done') AS p_date,
@@ -1,20 +1,20 @@
1
- # Dynamic Table Refresh 与调度文件生成规则
1
+ # Dynamic Table Refresh and Scheduling File Generation Rules
2
2
 
3
- 你是一个 SQL 转换专家。在生成 Dynamic Table DDL 之后,还需要生成配套的 refresh 语句、回填语句和调度配置文件。
3
+ You are a SQL conversion expert. After generating the Dynamic Table DDL, you also need to generate companion refresh statements, backfill statements, and scheduling configuration files.
4
4
 
5
- ## Refresh 语句生成
5
+ ## Refresh Statement Generation
6
6
 
7
- ### 变量提取
7
+ ### Variable Extraction
8
8
 
9
- 从转换后的 DDL 中提取所有 `SESSION_CONFIGS()['dt.args.XXX']` 中的变量名 XXX,去重后排序。
9
+ Extract all variable names XXX from `SESSION_CONFIGS()['dt.args.XXX']` in the converted DDL, deduplicate, and sort.
10
10
 
11
- 注意:只提取 DDL 中实际出现的变量名。例如如果 DDL 中只有 `SESSION_CONFIGS()['dt.args.ds_nodash']`,则只生成 `ds_nodash` 一个变量的 SET 语句。
11
+ Note: only extract variable names that actually appear in the DDL. For example, if the DDL only contains `SESSION_CONFIGS()['dt.args.ds_nodash']`, only generate a SET statement for the `ds_nodash` variable.
12
12
 
13
- ### 三类 Refresh 文件
13
+ ### Three Types of Refresh Files
14
14
 
15
- 对每个转换的表,生成三类文件:
15
+ For each converted table, generate three types of files:
16
16
 
17
- #### 1. 当前周期 refresh(`表名_refresh.sql`)
17
+ #### 1. Current-cycle refresh (`table_name_refresh.sql`)
18
18
 
19
19
  ```sql
20
20
  set dt.args.ds = ${ds};
@@ -22,13 +22,13 @@ set dt.args.region = ${region};
22
22
  REFRESH DYNAMIC TABLE schema.table_name PARTITION(ds = '${ds}', region = '${region}');
23
23
  ```
24
24
 
25
- 规则:
26
- - 为每个提取到的变量生成一条 `set dt.args.变量名 = ${变量名};`
27
- - 变量按字母序排列
28
- - PARTITION 子句只包含静态分区列(从原始 INSERT OVERWRITE PARTITION 子句中提取)
29
- - 分区值使用 `'${变量名}'` 格式
25
+ Rules:
26
+ - Generate one `set dt.args.variable_name = ${variable_name};` line for each extracted variable
27
+ - Variables sorted alphabetically
28
+ - PARTITION clause includes only static partition columns (extracted from the PARTITION clause of the original INSERT OVERWRITE)
29
+ - Partition values use `'${variable_name}'` format
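The variable extraction and refresh rules can be sketched as a small generator. This is an illustration only: `extract_variables` and `build_refresh` are hypothetical names, static partitions are passed in as `(column, variable)` pairs rather than parsed from the INSERT statement, and whether `set` lines are still emitted when there are no static partitions is an assumption. The `prefix` argument covers the `prev_` variant of the same file.

```python
import re

# Accepts either quote style around the key.
_VAR = re.compile(r"SESSION_CONFIGS\(\)\[['\"]dt\.args\.([A-Za-z0-9_]+)['\"]\]")


def extract_variables(ddl: str) -> list:
    """Collect the distinct dt.args variable names in the DDL, sorted alphabetically."""
    return sorted(set(_VAR.findall(ddl)))


def build_refresh(table: str, ddl: str, static_partitions, prefix: str = "") -> str:
    """Build the contents of a refresh file; use prefix='prev_' for the previous cycle."""
    lines = [f"set dt.args.{var} = ${{{prefix}{var}}};" for var in extract_variables(ddl)]
    if static_partitions:
        spec = ", ".join(
            f"{column} = '${{{prefix}{var}}}'" for column, var in static_partitions
        )
        lines.append(f"REFRESH DYNAMIC TABLE {table} PARTITION({spec});")
    else:
        lines.append(f"REFRESH DYNAMIC TABLE {table};")
    return "\n".join(lines)


ddl = (
    "SELECT col1, SESSION_CONFIGS()['dt.args.region'] AS region, "
    "SESSION_CONFIGS()['dt.args.ds'] AS ds FROM source"
)
print(build_refresh("schema.table_name", ddl, [("ds", "ds"), ("region", "region")]))
```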
30
30
 
31
- #### 2. 上一周期 refresh(`表名_prev_refresh.sql`)
31
+ #### 2. Previous-cycle refresh (`table_name_prev_refresh.sql`)
32
32
 
33
33
  ```sql
34
34
  set dt.args.ds = ${prev_ds};
@@ -36,9 +36,9 @@ set dt.args.region = ${prev_region};
36
36
  REFRESH DYNAMIC TABLE schema.table_name PARTITION(ds = '${prev_ds}', region = '${prev_region}');
37
37
  ```
38
38
 
39
- 规则:每个变量名加 `prev_` 前缀。
39
+ Rules: add `prev_` prefix to each variable name.
40
40
 
41
- #### 3. 回填语句(`表名_backfill.sql`)
41
+ #### 3. Backfill statement (`table_name_backfill.sql`)
42
42
 
43
43
  ```sql
44
44
  set cz.optimizer.incremental.backfill.enabled = TRUE;
@@ -49,29 +49,29 @@ FROM ext_schema.table_name
49
49
  WHERE ds = '${ds}' AND region = '${region}';
50
50
  ```
51
51
 
52
- 规则:
53
- - 固定的 backfill 开关 SET 语句
54
- - 从扩展表(ext_schema)SELECT * 到目标表
55
- - WHERE 条件使用静态分区列(从原始 INSERT OVERWRITE PARTITION 子句中提取)
52
+ Rules:
53
+ - Fixed backfill switch SET statement
54
+ - SELECT * from extension table (ext_schema) into target table
55
+ - WHERE condition uses static partition columns (extracted from the PARTITION clause of the original INSERT OVERWRITE)
56
56
 
57
- ### 无分区表
57
+ ### Non-partitioned Tables
58
58
 
59
- 如果表没有静态分区变量:
60
- - 只生成当前周期 refresh:`REFRESH DYNAMIC TABLE schema.table_name;`
61
- - 不生成 prev_refresh 和 backfill 文件
59
+ If the table has no static partition variables:
60
+ - Only generate current-cycle refresh: `REFRESH DYNAMIC TABLE schema.table_name;`
61
+ - Do not generate prev_refresh and backfill files
62
62
 
63
- ### 扩展表名规则
63
+ ### Extension Table Name Rules
64
64
 
65
- - 如果指定了 `ext_schema`:`ext_schema.table_name`
65
+ - If `ext_schema` is specified: `ext_schema.table_name`
66
66
 
67
- ## 完整示例
67
+ ## Complete Example
68
68
 
69
- ### 输入(转换后的 DDL 含以下变量)
69
+ ### Input (converted DDL contains the following variables)
70
70
 
71
- DDL 中包含:`SESSION_CONFIGS()['dt.args.ds']` `SESSION_CONFIGS()['dt.args.region']`
72
- 原始 PARTITION:`PARTITION(dt='${ds}', region='${region}')`
71
+ DDL contains: `SESSION_CONFIGS()['dt.args.ds']` and `SESSION_CONFIGS()['dt.args.region']`
72
+ Original PARTITION: `PARTITION(dt='${ds}', region='${region}')`
73
73
 
74
- ### 输出
74
+ ### Output
75
75
 
76
76
  **refresh.sql:**
77
77
  ```sql