@clickzetta/cz-cli-darwin-arm64 0.3.87 → 0.3.88
- package/bin/cz-cli +0 -0
- package/bin/skills/clickzetta-dynamic-table/SKILL.md +169 -169
- package/bin/skills/clickzetta-dynamic-table/best-practices/dimension-table-join-guide.md +126 -126
- package/bin/skills/clickzetta-dynamic-table/best-practices/medallion-and-stream-patterns.md +25 -25
- package/bin/skills/clickzetta-dynamic-table/best-practices/non-partitioned-merge-into-warning.md +48 -48
- package/bin/skills/clickzetta-dynamic-table/best-practices/performance-optimization.md +51 -51
- package/bin/skills/clickzetta-dynamic-table/best-practices/scheduling-guide.md +59 -59
- package/bin/skills/clickzetta-dynamic-table/dt-creator/SKILL.md +8 -7
- package/bin/skills/clickzetta-dynamic-table/dt-creator/references/dt-declaration-strategy.md +99 -99
- package/bin/skills/clickzetta-dynamic-table/dt-creator/references/incremental-config-reference.md +188 -188
- package/bin/skills/clickzetta-dynamic-table/dt-creator/references/refresh-history-guide.md +117 -117
- package/bin/skills/clickzetta-dynamic-table/dt-creator/references/sql-limitations.md +29 -29
- package/bin/skills/clickzetta-dynamic-table/dynamic-table-alter/SKILL.md +80 -79
- package/bin/skills/clickzetta-dynamic-table/sql-to-dt/SKILL.md +15 -15
- package/bin/skills/clickzetta-dynamic-table/sql-to-dt/references/sql2dt-column-validation-rules.md +61 -61
- package/bin/skills/clickzetta-dynamic-table/sql-to-dt/references/sql2dt-conversion-rules.md +100 -100
- package/bin/skills/clickzetta-dynamic-table/sql-to-dt/references/sql2dt-placeholder-rules.md +64 -64
- package/bin/skills/clickzetta-dynamic-table/sql-to-dt/references/sql2dt-refresh-rules.md +32 -32
- package/bin/skills/clickzetta-dynamic-table/sql-to-dt/references/sql2dt-self-reference-rules.md +21 -21
- package/bin/skills/clickzetta-dynamic-table/sql-to-dt/references/sql2dt-workflow.md +71 -71
- package/bin/skills/clickzetta-sql-pipeline-manager/SKILL.md +203 -202
- package/bin/skills/clickzetta-sql-pipeline-manager/references/dynamic-table.md +62 -62
- package/bin/skills/clickzetta-sql-pipeline-manager/references/materialized-view.md +34 -34
- package/bin/skills/clickzetta-sql-pipeline-manager/references/pipe.md +61 -61
- package/bin/skills/clickzetta-sql-pipeline-manager/references/table-stream.md +41 -41
- package/bin/skills/clickzetta-table-stream-pipeline/SKILL.md +103 -101
- package/package.json +1 -1
|
@@ -1,122 +1,122 @@
|
|
|
1
|
-
# SQL → Dynamic Table
|
|
1
|
+
# SQL → Dynamic Table Conversion Rules
|
|
2
2
|
|
|
3
|
-
|
|
3
|
+
You are a SQL conversion expert. Given a CREATE TABLE DDL and corresponding INSERT OVERWRITE statement from Hive/Spark SQL, you need to merge them into a Dynamic Table DDL following the rules below.
|
|
4
4
|
|
|
5
|
-
##
|
|
5
|
+
## Overall Conversion Formula
|
|
6
6
|
|
|
7
7
|
```
|
|
8
|
-
|
|
9
|
-
|
|
10
|
-
|
|
8
|
+
Input 1: CREATE TABLE schema.table_name (...) PARTITIONED BY (...) ...
|
|
9
|
+
Input 2: INSERT OVERWRITE TABLE schema.table_name PARTITION(...) SELECT ... FROM ...
|
|
10
|
+
Output: CREATE OR REPLACE DYNAMIC TABLE schema.table_name (...) PARTITIONED BY (...) ... AS SELECT ... FROM ...
|
|
11
11
|
```
|
|
12
12
|
|
|
13
|
-
|
|
13
|
+
Core idea: merge the structure definition from CREATE TABLE with the query logic from INSERT OVERWRITE into a single `CREATE OR REPLACE DYNAMIC TABLE ... AS SELECT ...` statement.
|
|
14
14
|
|
|
15
|
-
##
|
|
15
|
+
## Step 1: Parse the CREATE TABLE DDL
|
|
16
16
|
|
|
17
|
-
|
|
17
|
+
Extract the following information from the DDL:
|
|
18
18
|
|
|
19
|
-
1.
|
|
20
|
-
2.
|
|
21
|
-
3.
|
|
22
|
-
4.
|
|
23
|
-
5.
|
|
24
|
-
6.
|
|
25
|
-
7.
|
|
26
|
-
8.
|
|
27
|
-
9.
|
|
19
|
+
1. **Table name** (including schema): `schema.table_name`
|
|
20
|
+
2. **Regular columns**: column name, data type, COMMENT (preserve original indentation format)
|
|
21
|
+
3. **Partition columns**: column name, data type, COMMENT from PARTITIONED BY
|
|
22
|
+
4. **Storage format**: STORED AS PARQUET/ORC/AVRO, etc.
|
|
23
|
+
5. **Table properties**: key-value pairs from TBLPROPERTIES or WITH PROPERTIES
|
|
24
|
+
6. **Bucketing info**: CLUSTERED BY / SORTED BY / RANGE CLUSTERED BY / HASH CLUSTERED BY
|
|
25
|
+
7. **Lifecycle**: LIFECYCLE N
|
|
26
|
+
8. **Connection info**: CONNECTION schema.connection_name
|
|
27
|
+
9. **Location info**: LOCATION 'path'
|
|
28
28
|
|
|
29
|
-
##
|
|
29
|
+
## Step 2: Parse the INSERT OVERWRITE Statement
|
|
30
30
|
|
|
31
|
-
|
|
31
|
+
Extract from the INSERT statement:
|
|
32
32
|
|
|
33
|
-
1.
|
|
34
|
-
2.
|
|
35
|
-
-
|
|
36
|
-
-
|
|
37
|
-
-
|
|
38
|
-
3. **SELECT
|
|
39
|
-
4. **CTE
|
|
40
|
-
5.
|
|
33
|
+
1. **Target table name**: used for self-reference detection
|
|
34
|
+
2. **Partition type**:
|
|
35
|
+
- Dynamic partition: `PARTITION (col1, col2)` — column names without values
|
|
36
|
+
- Static partition: `PARTITION (col1='value1', col2=value2)` — column names with values
|
|
37
|
+
- Mixed partition: `PARTITION (static_col='value', dynamic_col)` — some with values
|
|
38
|
+
3. **SELECT query**: complete query logic (including WHERE, JOIN, GROUP BY, etc.)
|
|
39
|
+
4. **CTE (WITH clause)**: if present, retain the complete `WITH ... AS (...)` structure
|
|
40
|
+
5. **Preceding statements**: SET statements, CREATE TEMPORARY FUNCTION, etc. (retain)
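The three partition types in item 2 can be told apart by checking which entries of the `PARTITION (...)` body carry a value. The following is a minimal Python sketch under that assumption; `split_top_level` and `classify_partition` are hypothetical helper names, not part of the skill.

```python
def split_top_level(spec: str) -> list[str]:
    """Split on commas that sit outside quotes and parentheses."""
    parts, buf, depth, quote = [], [], 0, None
    for ch in spec:
        if quote:
            if ch == quote:
                quote = None
        elif ch in "'\"":
            quote = ch
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            parts.append("".join(buf).strip())
            buf = []
            continue
        buf.append(ch)
    parts.append("".join(buf).strip())
    return [p for p in parts if p]


def classify_partition(spec: str) -> str:
    """Classify the body of PARTITION (...) as dynamic, static, or mixed."""
    items = split_top_level(spec)
    if not items:
        raise ValueError("empty PARTITION clause")
    # An entry with '=' carries a value (static); a bare name is dynamic.
    has_value = ["=" in item for item in items]
    if all(has_value):
        return "static"
    if not any(has_value):
        return "dynamic"
    return "mixed"
```

The function takes only the text between the parentheses, for example `classify_partition("static_col='value', dynamic_col")`.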
|
|
41
41
|
|
|
42
|
-
###
|
|
42
|
+
### Statements to Filter Out
|
|
43
43
|
|
|
44
|
-
|
|
44
|
+
Remove from the INSERT file:
|
|
45
45
|
- `ALTER TABLE ... ADD PARTITION ...`
|
|
46
46
|
- `ALTER TABLE ... DROP PARTITION ...`
|
|
47
|
-
-
|
|
48
|
-
- `ANALYZE TABLE`
|
|
49
|
-
- SQL
|
|
47
|
+
- All statements starting with `ALTER TABLE`
|
|
48
|
+
- `ANALYZE TABLE` statements
|
|
49
|
+
- SQL comments (`--` and `/* */`)
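A rough Python sketch of this filtering step follows. It is deliberately naive: it assumes that `--`, `/*`, and `;` never appear inside string literals, and `filter_statements` is a hypothetical name used only for illustration.

```python
import re


def filter_statements(sql: str) -> list[str]:
    """Drop comments plus ALTER TABLE / ANALYZE TABLE statements."""
    # Naive comment removal: not safe if comment markers occur inside strings.
    sql = re.sub(r"/\*.*?\*/", "", sql, flags=re.DOTALL)
    sql = re.sub(r"--[^\n]*", "", sql)
    kept = []
    for stmt in sql.split(";"):
        stmt = stmt.strip()
        if not stmt:
            continue
        if re.match(r"(ALTER|ANALYZE)\s+TABLE\b", stmt, flags=re.IGNORECASE):
            continue
        kept.append(stmt)
    return kept
```

Statements that survive the filter (SET, CREATE TEMPORARY FUNCTION, the INSERT itself) are returned in their original order without the trailing semicolon.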
|
|
50
50
|
|
|
51
|
-
##
|
|
51
|
+
## Step 3: Assemble the Dynamic Table DDL
|
|
52
52
|
|
|
53
|
-
|
|
53
|
+
Assemble the output in the following order:
|
|
54
54
|
|
|
55
55
|
```sql
|
|
56
|
-
--
|
|
56
|
+
-- Optional: to drop an existing table with the same name, uncomment the next line
|
|
57
57
|
-- DROP TABLE IF EXISTS schema.table_name;
|
|
58
58
|
|
|
59
|
-
CREATE SCHEMA IF NOT EXISTS schema; --
|
|
59
|
+
CREATE SCHEMA IF NOT EXISTS schema; -- only when table name contains schema
|
|
60
60
|
CREATE OR REPLACE DYNAMIC TABLE schema.table_name (
|
|
61
|
-
col1 BIGINT COMMENT '...', --
|
|
61
|
+
col1 BIGINT COMMENT '...', -- regular columns (preserve original format)
|
|
62
62
|
col2 STRING COMMENT '...',
|
|
63
|
-
part_col1 STRING COMMENT '...' --
|
|
63
|
+
part_col1 STRING COMMENT '...' -- partition columns appended after regular columns
|
|
64
64
|
)
|
|
65
|
-
PARTITIONED BY (part_col1, part_col2) --
|
|
65
|
+
PARTITIONED BY (part_col1, part_col2) -- column names only, no types
|
|
66
66
|
[CLUSTERED BY (...) [SORTED BY (...)] [INTO N BUCKETS]]
|
|
67
67
|
[STORED AS PARQUET]
|
|
68
|
-
TBLPROPERTIES ('key' = 'value') --
|
|
68
|
+
TBLPROPERTIES ('key' = 'value') -- merge template properties and original properties
|
|
69
69
|
[LIFECYCLE N]
|
|
70
70
|
[CONNECTION schema.connection_name]
|
|
71
|
-
[LOCATION 'original_path_dt'] --
|
|
71
|
+
[LOCATION 'original_path_dt'] -- original path with _dt suffix
|
|
72
72
|
AS
|
|
73
|
-
SELECT
|
|
73
|
+
SELECT query; -- query from INSERT OVERWRITE
|
|
74
74
|
```
|
|
75
75
|
|
|
76
|
-
###
|
|
76
|
+
### Key Rules
|
|
77
77
|
|
|
78
|
-
1.
|
|
79
|
-
2. **PARTITIONED BY
|
|
80
|
-
3. **CREATE SCHEMA
|
|
81
|
-
4. **LOCATION
|
|
82
|
-
5. **DROP
|
|
78
|
+
1. **Column definitions**: regular columns + partition columns merged into one set of parentheses, preserving original indentation
|
|
79
|
+
2. **PARTITIONED BY**: write column names only, no types (unlike CREATE TABLE)
|
|
80
|
+
3. **CREATE SCHEMA**: if the table name contains `.` (e.g., `kscdm.table_name`), add `CREATE SCHEMA IF NOT EXISTS kscdm;` before the DDL
|
|
81
|
+
4. **LOCATION**: original path with `_dt` suffix
|
|
82
|
+
5. **DROP statement**: commented-out `DROP TABLE IF EXISTS` placed at the very beginning
|
|
83
83
|
|
|
84
|
-
##
|
|
84
|
+
## Step 4: Static Partition Injection
|
|
85
85
|
|
|
86
|
-
|
|
86
|
+
When INSERT OVERWRITE uses static partitions (`PARTITION(col=value)`), partition values need to be injected into the SELECT clause.
|
|
87
87
|
|
|
88
|
-
###
|
|
88
|
+
### Injection Rules
|
|
89
89
|
|
|
90
|
-
|
|
90
|
+
After the last column in SELECT and before FROM, append in the order of partition column definitions in the DDL:
|
|
91
91
|
|
|
92
92
|
```sql
|
|
93
|
-
--
|
|
93
|
+
-- Original SELECT
|
|
94
94
|
SELECT col1, col2 FROM source_table
|
|
95
95
|
|
|
96
|
-
--
|
|
96
|
+
-- After injection (assuming PARTITION(year=2024, month='January'))
|
|
97
97
|
SELECT col1, col2,
|
|
98
98
|
2024 AS year,
|
|
99
99
|
'January' AS month
|
|
100
100
|
FROM source_table
|
|
101
101
|
```
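For a single SELECT, the injection above amounts to finding the first `FROM` keyword that is not nested in parentheses or quotes and inserting the partition expressions just before it. The Python sketch below illustrates only that simple case; it does not handle UNION ALL, CTEs, or columns that already exist, and `find_top_level_from` and `inject_partitions` are hypothetical names.

```python
def _is_word_char(ch: str) -> bool:
    return ch.isalnum() or ch == "_"


def find_top_level_from(sql: str) -> int:
    """Index of the first FROM keyword outside quotes and parentheses, or -1."""
    depth, quote = 0, None
    for i, ch in enumerate(sql):
        if quote:
            if ch == quote:
                quote = None
        elif ch in "'\"":
            quote = ch
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif depth == 0 and sql[i:i + 4].upper() == "FROM":
            before_ok = i == 0 or not _is_word_char(sql[i - 1])
            after_ok = i + 4 >= len(sql) or not _is_word_char(sql[i + 4])
            if before_ok and after_ok:
                return i
    return -1


def inject_partitions(select_sql: str, partitions: list[tuple[str, str]]) -> str:
    """Append 'value AS column' entries after the last SELECT column.

    partitions: (column, already formatted value) pairs in DDL order.
    """
    pos = find_top_level_from(select_sql)
    if pos < 0:
        raise ValueError("no top-level FROM found")
    head = select_sql[:pos].rstrip()
    extra = "".join(f",\n    {value} AS {col}" for col, value in partitions)
    return f"{head}{extra}\n{select_sql[pos:]}"
```

Values are passed in already formatted (quoted or not), so quoting decisions stay separate from the injection itself.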
|
|
102
102
|
|
|
103
|
-
###
|
|
103
|
+
### Smart Value Type Handling
|
|
104
104
|
|
|
105
|
-
|
|
105
|
+
Decide whether to add quotes based on the value type when injecting:
|
|
106
106
|
|
|
107
|
-
|
|
|
107
|
+
| Value type | Detection rule | Handling | Example |
|
|
108
108
|
|--------|----------|------|------|
|
|
109
|
-
|
|
|
110
|
-
| NULL |
|
|
111
|
-
|
|
|
112
|
-
|
|
|
113
|
-
| SESSION_CONFIGS |
|
|
114
|
-
|
|
|
115
|
-
|
|
|
109
|
+
| Already quoted | Starts and ends with `'` or `"` | Keep as-is | `'hello'` → `'hello'` |
|
|
110
|
+
| NULL | Value is `NULL` (case-insensitive) | No quotes | `NULL` |
|
|
111
|
+
| Boolean | `true` / `false` (case-insensitive) | No quotes | `true` |
|
|
112
|
+
| Number | Can be parsed by `float()` | No quotes | `123`, `-45.67`, `1.23e-4` |
|
|
113
|
+
| SESSION_CONFIGS | Contains `SESSION_CONFIGS(` | No quotes | `SESSION_CONFIGS()['dt.args.ds']` |
|
|
114
|
+
| Function call | Matches `identifier(...)` with balanced parentheses | No quotes | `CURRENT_DATE()`, `YEAR(col)` |
|
|
115
|
+
| String | None of the above match | Add single quotes; escape internal `'` as `''` | `hello` → `'hello'` |
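The table reads as an ordered decision chain, which the Python sketch below follows row by row. The names `is_function_call` and `format_partition_value` are hypothetical. Two simplifications are assumed: a quoted value must start and end with the same quote character, and the function-call check ignores parentheses inside string arguments. Note also that Python's `float()` accepts `nan` and `inf`, so such values would be treated as numbers.

```python
import re


def is_function_call(value: str) -> bool:
    """True if value looks like identifier(...) and the first '(' closes at the end."""
    m = re.match(r"[A-Za-z_][A-Za-z0-9_.]*\(", value)
    if not m or not value.endswith(")"):
        return False
    depth = 0
    for i in range(m.end() - 1, len(value)):
        if value[i] == "(":
            depth += 1
        elif value[i] == ")":
            depth -= 1
            if depth == 0:
                return i == len(value) - 1
    return False


def format_partition_value(raw: str) -> str:
    """Return the value as it should appear in the injected SELECT column."""
    value = raw.strip()
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
        return value                      # already quoted
    if value.upper() == "NULL":
        return value
    if value.lower() in ("true", "false"):
        return value
    try:
        float(value)
        return value                      # number
    except ValueError:
        pass
    if "SESSION_CONFIGS(" in value:
        return value
    if is_function_call(value):
        return value
    # Plain string: add single quotes and double any embedded quote.
    return "'" + value.replace("'", "''") + "'"
```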
|
|
116
116
|
|
|
117
|
-
### UNION ALL
|
|
117
|
+
### UNION ALL Handling
|
|
118
118
|
|
|
119
|
-
|
|
119
|
+
If SELECT contains UNION ALL, inject partition columns into each branch independently:
|
|
120
120
|
|
|
121
121
|
```sql
|
|
122
122
|
SELECT col1, col2,
|
|
@@ -130,60 +130,60 @@ FROM table_b
|
|
|
130
130
|
|
|
131
131
|
### CTE + UNION ALL
|
|
132
132
|
|
|
133
|
-
|
|
133
|
+
If there is a WITH clause, first separate the CTE part, then inject only into the UNION branches in the main query.
|
|
134
134
|
|
|
135
|
-
###
|
|
135
|
+
### Already-existing Partition Columns
|
|
136
136
|
|
|
137
|
-
|
|
137
|
+
If SELECT already contains a partition column (detected via `AS alias` or trailing identifier), skip injection for that column to avoid duplication.
|
|
138
138
|
|
|
139
|
-
##
|
|
139
|
+
## Step 5: Date Function Post-processing
|
|
140
140
|
|
|
141
|
-
|
|
141
|
+
After generating the DDL, do a global replacement on the entire DDL text:
|
|
142
142
|
|
|
143
|
-
|
|
|
143
|
+
| Original form | Replace with |
|
|
144
144
|
|----------|--------|
|
|
145
145
|
| `DATE_SUB(expr, INTERVAL N DAY)` | `sub_days(expr, N)` |
|
|
146
146
|
| `DATE_ADD(expr, INTERVAL N DAY)` | `sub_days(expr, -N)` |
|
|
147
147
|
|
|
148
|
-
|
|
148
|
+
This step ensures the final output consistently uses the `sub_days` function.
|
|
149
149
|
|
|
150
|
-
>
|
|
150
|
+
> Note: In the SQL engine, `SUB_DAYS` is an alias for `DATE_SUB`; they are equivalent. Using `sub_days` uniformly is for output consistency.
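Because the first argument may itself contain parentheses (for example `SESSION_CONFIGS()[...]`), a single regular expression is not enough for this replacement. The Python sketch below scans for the matching closing parenthesis instead. It is an illustration under stated assumptions: only the `INTERVAL N DAY` form with a non-negative integer is rewritten, anything else is left untouched, and `to_sub_days` is a hypothetical name.

```python
import re

CALL = re.compile(r"\bDATE_(SUB|ADD)\s*\(", re.IGNORECASE)
INTERVAL = re.compile(r"\s*INTERVAL\s+(\d+)\s+DAY\s*$", re.IGNORECASE)


def matching_paren(text: str, open_pos: int) -> int:
    """Index of the ')' that closes the '(' at open_pos, or -1."""
    depth, quote = 0, None
    for i in range(open_pos, len(text)):
        ch = text[i]
        if quote:
            if ch == quote:
                quote = None
        elif ch in "'\"":
            quote = ch
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                return i
    return -1


def to_sub_days(sql: str) -> str:
    """Rewrite DATE_SUB/DATE_ADD(expr, INTERVAL N DAY) as sub_days(expr, +/-N)."""
    out, pos = [], 0
    while True:
        m = CALL.search(sql, pos)
        close = matching_paren(sql, m.end() - 1) if m else -1
        if not m or close < 0:
            out.append(sql[pos:])
            return "".join(out)
        expr, sep, interval = sql[m.end():close].rpartition(",")
        im = INTERVAL.match(interval)
        if not sep or not im:
            out.append(sql[pos:m.end()])  # unsupported form: keep as-is
            pos = m.end()
            continue
        n = im.group(1) if m.group(1).upper() == "SUB" else "-" + im.group(1)
        out.append(sql[pos:m.start()])
        out.append(f"sub_days({to_sub_days(expr.strip())}, {n})")
        pos = close + 1
```

Nested calls are handled by the recursive call on the first argument.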
|
|
151
151
|
|
|
152
|
-
##
|
|
152
|
+
## Step 6: Table Property Template Merge
|
|
153
153
|
|
|
154
|
-
|
|
154
|
+
Default template property: `data_lifecycle = 15`
|
|
155
155
|
|
|
156
|
-
|
|
157
|
-
-
|
|
158
|
-
-
|
|
159
|
-
-
|
|
156
|
+
Merge rules:
|
|
157
|
+
- Template properties serve as the base
|
|
158
|
+
- TBLPROPERTIES from the original DDL override template properties with the same name
|
|
159
|
+
- Final result is written to TBLPROPERTIES
|
|
160
160
|
|
|
161
161
|
```sql
|
|
162
|
-
--
|
|
163
|
-
--
|
|
164
|
-
--
|
|
162
|
+
-- Template: data_lifecycle=15
|
|
163
|
+
-- Original DDL: TBLPROPERTIES('compression'='snappy', 'data_lifecycle'='30')
|
|
164
|
+
-- Merged result:
|
|
165
165
|
TBLPROPERTIES ('data_lifecycle' = '30', 'compression' = 'snappy')
|
|
166
|
-
-- data_lifecycle
|
|
166
|
+
-- data_lifecycle retains original value 30; compression comes from original DDL
|
|
167
167
|
```
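The merge rules map directly onto a dictionary update, as in this small Python sketch. `merge_tblproperties` and `render_tblproperties` are hypothetical names, and all values are assumed to be rendered as single-quoted strings, as in the merged result above.

```python
def merge_tblproperties(template: dict[str, str], original: dict[str, str]) -> dict[str, str]:
    """Template properties are the base; properties from the original DDL win on conflict."""
    merged = dict(template)
    merged.update(original)
    return merged


def render_tblproperties(props: dict[str, str]) -> str:
    """Render the merged properties as a TBLPROPERTIES clause."""
    body = ", ".join(f"'{key}' = '{value}'" for key, value in props.items())
    return f"TBLPROPERTIES ({body})"
```

Since Python dictionaries preserve insertion order, template keys come first and keys that exist only in the original DDL follow.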
|
|
168
168
|
|
|
169
|
-
##
|
|
169
|
+
## Complete Example
|
|
170
170
|
|
|
171
|
-
###
|
|
171
|
+
### Input 1: DDL
|
|
172
172
|
```sql
|
|
173
173
|
CREATE TABLE IF NOT EXISTS sales_data (
|
|
174
|
-
id BIGINT COMMENT '
|
|
175
|
-
product_name STRING COMMENT '
|
|
176
|
-
sales_amount DECIMAL(12,2) COMMENT '
|
|
174
|
+
id BIGINT COMMENT 'Sales record ID',
|
|
175
|
+
product_name STRING COMMENT 'Product name',
|
|
176
|
+
sales_amount DECIMAL(12,2) COMMENT 'Sales amount'
|
|
177
177
|
)
|
|
178
178
|
PARTITIONED BY (
|
|
179
|
-
year INT COMMENT '
|
|
180
|
-
month INT COMMENT '
|
|
179
|
+
year INT COMMENT 'Year',
|
|
180
|
+
month INT COMMENT 'Month'
|
|
181
181
|
)
|
|
182
182
|
STORED AS PARQUET
|
|
183
183
|
LOCATION '/data/warehouse/sales_data';
|
|
184
184
|
```
|
|
185
185
|
|
|
186
|
-
###
|
|
186
|
+
### Input 2: INSERT OVERWRITE
|
|
187
187
|
```sql
|
|
188
188
|
INSERT OVERWRITE TABLE sales_data
|
|
189
189
|
PARTITION (year, month)
|
|
@@ -197,17 +197,17 @@ FROM raw_sales s
|
|
|
197
197
|
WHERE s.status = 'completed';
|
|
198
198
|
```
|
|
199
199
|
|
|
200
|
-
###
|
|
200
|
+
### Output: Dynamic Table DDL
|
|
201
201
|
```sql
|
|
202
|
-
--
|
|
202
|
+
-- Optional: to drop an existing table with the same name, uncomment the next line
|
|
203
203
|
-- DROP TABLE IF EXISTS sales_data;
|
|
204
204
|
|
|
205
205
|
CREATE OR REPLACE DYNAMIC TABLE sales_data (
|
|
206
|
-
id BIGINT COMMENT '
|
|
207
|
-
product_name STRING COMMENT '
|
|
208
|
-
sales_amount DECIMAL(12,2) COMMENT '
|
|
209
|
-
year INT COMMENT '
|
|
210
|
-
month INT COMMENT '
|
|
206
|
+
id BIGINT COMMENT 'Sales record ID',
|
|
207
|
+
product_name STRING COMMENT 'Product name',
|
|
208
|
+
sales_amount DECIMAL(12,2) COMMENT 'Sales amount',
|
|
209
|
+
year INT COMMENT 'Year',
|
|
210
|
+
month INT COMMENT 'Month'
|
|
211
211
|
)
|
|
212
212
|
PARTITIONED BY (year, month)
|
|
213
213
|
STORED AS PARQUET
|
package/bin/skills/clickzetta-dynamic-table/sql-to-dt/references/sql2dt-placeholder-rules.md
CHANGED
|
@@ -1,164 +1,164 @@
|
|
|
1
|
-
# SQL
|
|
1
|
+
# SQL Placeholder → SESSION_CONFIGS() Conversion Rules
|
|
2
2
|
|
|
3
|
-
|
|
3
|
+
You are a SQL conversion expert. When converting traditional SQL to Dynamic Table SQL, you need to convert various placeholder formats uniformly to `SESSION_CONFIGS()` function calls.
|
|
4
4
|
|
|
5
|
-
##
|
|
5
|
+
## Placeholder Format Normalization
|
|
6
6
|
|
|
7
|
-
|
|
7
|
+
First, normalize all legacy formats to `${...}` format:
|
|
8
8
|
|
|
9
|
-
|
|
|
9
|
+
| Legacy format | Normalize to |
|
|
10
10
|
|--------|--------|
|
|
11
11
|
| `{{ var }}` | `${var}` |
|
|
12
12
|
| `{{ ds }}` | `${ds}` |
|
|
13
13
|
| `{{region}}` | `${region}` |
|
|
14
14
|
|
|
15
|
-
|
|
15
|
+
Conversion regex: `\{\{\s*([^}]+)\s*\}\}` → `${\1}`
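A short Python sketch of this normalization follows; `normalize_placeholders` is a hypothetical name. One deliberate difference from the pattern quoted above: in Python's `re` module the greedy `[^}]+` would also capture the whitespace before `}}` (giving `${ds }`), so the sketch uses the lazy form `[^}]+?` to reproduce the outputs shown in the table.

```python
import re

# Lazy capture so that whitespace before '}}' is not part of the variable name.
LEGACY = re.compile(r"\{\{\s*([^}]+?)\s*\}\}")


def normalize_placeholders(sql: str) -> str:
    """Rewrite {{ var }} style placeholders to ${var}."""
    return LEGACY.sub(r"${\1}", sql)
```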
|
|
16
16
|
|
|
17
|
-
##
|
|
17
|
+
## Basic Replacement Rules
|
|
18
18
|
|
|
19
|
-
###
|
|
19
|
+
### Simple Variables
|
|
20
20
|
|
|
21
|
-
|
|
|
21
|
+
| Input | Output |
|
|
22
22
|
|------|------|
|
|
23
23
|
| `${ds}` | `SESSION_CONFIGS()['dt.args.ds']` |
|
|
24
24
|
| `${region}` | `SESSION_CONFIGS()['dt.args.region']` |
|
|
25
25
|
| `${hour}` | `SESSION_CONFIGS()['dt.args.hour']` |
|
|
26
26
|
|
|
27
|
-
### nodash
|
|
27
|
+
### nodash Variables (Special Handling)
|
|
28
28
|
|
|
29
|
-
|
|
29
|
+
When the variable name contains `nodash`, automatically wrap with DATE_FORMAT, but keep the variable name as-is:
|
|
30
30
|
|
|
31
|
-
|
|
|
31
|
+
| Input | Output |
|
|
32
32
|
|------|------|
|
|
33
33
|
| `${ds_nodash}` | `DATE_FORMAT(SESSION_CONFIGS()['dt.args.ds_nodash'], 'yyyyMMdd')` |
|
|
34
34
|
| `${dsnodash}` | `DATE_FORMAT(SESSION_CONFIGS()['dt.args.dsnodash'], 'yyyyMMdd')` |
|
|
35
35
|
|
|
36
|
-
|
|
36
|
+
Note: the variable name stays as-is (`ds_nodash` does not become `ds`); only the outer DATE_FORMAT is added.
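The simple and `nodash` rules can be sketched together in Python as below. `replace_simple` is a hypothetical name, and the sketch ignores quote context entirely, so it covers only the bare mapping shown in the two tables.

```python
import re

SIMPLE = re.compile(r"\$\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}")


def replace_simple(sql: str) -> str:
    """Replace ${name} with a SESSION_CONFIGS() lookup; wrap nodash names in DATE_FORMAT."""
    def repl(m: re.Match) -> str:
        name = m.group(1)
        ref = f"SESSION_CONFIGS()['dt.args.{name}']"
        if "nodash" in name:
            # The variable name is kept as-is; only the outer DATE_FORMAT is added.
            return f"DATE_FORMAT({ref}, 'yyyyMMdd')"
        return ref
    return SIMPLE.sub(repl, sql)
```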
|
|
37
37
|
|
|
38
|
-
###
|
|
38
|
+
### Variables with Arithmetic
|
|
39
39
|
|
|
40
|
-
|
|
40
|
+
The final output consistently uses the `sub_days` function (a post-processing step converts all `DATE_SUB`/`DATE_ADD` to `sub_days`):
|
|
41
41
|
|
|
42
|
-
|
|
|
42
|
+
| Input | Final output |
|
|
43
43
|
|------|----------|
|
|
44
44
|
| `${ds - 1}` | `DATE_FORMAT(sub_days(SESSION_CONFIGS()['dt.args.ds'], 1), 'yyyy-MM-dd')` |
|
|
45
45
|
| `${ds + 7}` | `DATE_FORMAT(sub_days(SESSION_CONFIGS()['dt.args.ds'], -7), 'yyyy-MM-dd')` |
|
|
46
46
|
| `${ds_nodash - 1}` | `DATE_FORMAT(sub_days(SESSION_CONFIGS()['dt.args.ds_nodash'], 1), 'yyyyMMdd')::STRING` |
|
|
47
47
|
|
|
48
|
-
|
|
49
|
-
- `-`
|
|
50
|
-
- `+`
|
|
51
|
-
-
|
|
52
|
-
-
|
|
53
|
-
-
|
|
54
|
-
-
|
|
48
|
+
Rules:
|
|
49
|
+
- `-` operation → `sub_days(..., N)` (N is positive)
|
|
50
|
+
- `+` operation → `sub_days(..., -N)` (N negated to negative)
|
|
51
|
+
- Outer `DATE_FORMAT`, format determined by variable name:
|
|
52
|
+
- Contains `nodash` → `'yyyyMMdd'`
|
|
53
|
+
- Does not contain `nodash` → `'yyyy-MM-dd'`
|
|
54
|
+
- Variables containing `nodash` with arithmetic append `::STRING` type cast
|
|
55
55
|
|
|
56
|
-
|
|
56
|
+
Note: this is the final output form. Intermediate steps may first generate `DATE_SUB`/`DATE_ADD`, but they will be uniformly converted to `sub_days` by post-processing.
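The arithmetic rules can be written as one small function that produces the final form directly. This is a Python sketch with a hypothetical name, `replace_arithmetic`; it accepts only the `${name +/- N}` shape with an integer N and does not model the quote-context cases.

```python
import re

ARITH = re.compile(r"\$\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*([+-])\s*(\d+)\s*\}")


def replace_arithmetic(placeholder: str) -> str:
    """Convert ${name - N} / ${name + N} to DATE_FORMAT(sub_days(...), fmt)."""
    m = ARITH.fullmatch(placeholder.strip())
    if not m:
        raise ValueError(f"not an arithmetic placeholder: {placeholder!r}")
    name, op, n = m.group(1), m.group(2), int(m.group(3))
    offset = n if op == "-" else -n          # '+' flips the sign for sub_days
    nodash = "nodash" in name
    fmt = "yyyyMMdd" if nodash else "yyyy-MM-dd"
    expr = f"DATE_FORMAT(sub_days(SESSION_CONFIGS()['dt.args.{name}'], {offset}), '{fmt}')"
    # nodash variables with arithmetic get an explicit string cast.
    return expr + "::STRING" if nodash else expr
```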
|
|
57
57
|
|
|
58
|
-
### macros.ds_add
|
|
58
|
+
### macros.ds_add Function
|
|
59
59
|
|
|
60
|
-
|
|
|
60
|
+
| Input | Output |
|
|
61
61
|
|------|------|
|
|
62
62
|
| `${macros.ds_add(ds, -1)}` | `DATE_FORMAT(sub_days(SESSION_CONFIGS()['dt.args.ds'], 1), 'yyyy-MM-dd')` |
|
|
63
63
|
| `${macros.ds_add(ds, 7)}` | `DATE_FORMAT(sub_days(SESSION_CONFIGS()['dt.args.ds'], -7), 'yyyy-MM-dd')` |
|
|
64
64
|
|
|
65
|
-
|
|
65
|
+
Note: the second parameter of `macros.ds_add` has the opposite sign from `sub_days`. `macros.ds_add(ds, -1)` means ds minus 1 day, corresponding to `sub_days(ds, 1)` (positive = subtract days); `macros.ds_add(ds, 7)` means ds plus 7 days, corresponding to `sub_days(ds, -7)` (negative = add days).
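The sign flip is easy to get wrong, so here is a minimal Python sketch of it. `replace_ds_add` is a hypothetical name, and the sketch always uses the `'yyyy-MM-dd'` format because the table above only shows `ds`; how `nodash` variables behave inside `macros.ds_add` is not stated and is not modeled.

```python
import re

DS_ADD = re.compile(
    r"\$\{\s*macros\.ds_add\(\s*([A-Za-z_][A-Za-z0-9_]*)\s*,\s*(-?\d+)\s*\)\s*\}"
)


def replace_ds_add(placeholder: str) -> str:
    """Convert ${macros.ds_add(name, days)} to the sub_days form."""
    m = DS_ADD.fullmatch(placeholder.strip())
    if not m:
        raise ValueError(f"not a macros.ds_add placeholder: {placeholder!r}")
    name, days = m.group(1), int(m.group(2))
    # ds_add adds days, sub_days subtracts them, so the sign is inverted.
    return f"DATE_FORMAT(sub_days(SESSION_CONFIGS()['dt.args.{name}'], {-days}), 'yyyy-MM-dd')"
```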
|
|
66
66
|
|
|
67
|
-
##
|
|
67
|
+
## Quote Context Rules
|
|
68
68
|
|
|
69
|
-
|
|
69
|
+
The handling of a placeholder depends on the quote context it is in:
|
|
70
70
|
|
|
71
|
-
###
|
|
71
|
+
### Case 1: Placeholder inside single quotes (pure placeholder)
|
|
72
72
|
|
|
73
73
|
```sql
|
|
74
|
-
--
|
|
74
|
+
-- Input
|
|
75
75
|
WHERE dt = '${ds}'
|
|
76
|
-
--
|
|
76
|
+
-- Output (remove outer quotes; direct replacement)
|
|
77
77
|
WHERE dt = SESSION_CONFIGS()['dt.args.ds']
|
|
78
78
|
```
|
|
79
79
|
|
|
80
|
-
###
|
|
80
|
+
### Case 2: Placeholder inside single quotes (mixed content)
|
|
81
81
|
|
|
82
|
-
|
|
82
|
+
When the quoted string contains both a placeholder and literal text, use CONCAT:
|
|
83
83
|
|
|
84
84
|
```sql
|
|
85
|
-
--
|
|
85
|
+
-- Input
|
|
86
86
|
WHERE dt = '${ds_nodash}_done'
|
|
87
|
-
--
|
|
87
|
+
-- Output
|
|
88
88
|
WHERE dt = CONCAT(DATE_FORMAT(SESSION_CONFIGS()['dt.args.ds_nodash'], 'yyyyMMdd'), '_done')
|
|
89
89
|
```
|
|
90
90
|
|
|
91
91
|
```sql
|
|
92
|
-
--
|
|
92
|
+
-- Input
|
|
93
93
|
WHERE path = '/data/${region}/output'
|
|
94
|
-
--
|
|
94
|
+
-- Output
|
|
95
95
|
WHERE path = CONCAT('/data/', SESSION_CONFIGS()['dt.args.region'], '/output')
|
|
96
96
|
```
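Cases 1 and 2 differ only in whether the quoted string contains anything besides the placeholder. The Python sketch below takes the text between the single quotes and returns either a direct replacement or a `CONCAT(...)` call. `config_ref` and `convert_quoted` are hypothetical names; the sketch covers simple and `nodash` variables only and writes key names in single quotes, as in the outputs shown above.

```python
import re

PLACEHOLDER = re.compile(r"\$\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}")


def config_ref(name: str) -> str:
    """SESSION_CONFIGS() lookup, wrapped in DATE_FORMAT for nodash variables."""
    ref = f"SESSION_CONFIGS()['dt.args.{name}']"
    return f"DATE_FORMAT({ref}, 'yyyyMMdd')" if "nodash" in name else ref


def convert_quoted(content: str) -> str:
    """Convert the inside of a single-quoted literal that may hold placeholders."""
    parts, pos = [], 0
    for m in PLACEHOLDER.finditer(content):
        if m.start() > pos:
            parts.append("'" + content[pos:m.start()] + "'")   # literal text before
        parts.append(config_ref(m.group(1)))
        pos = m.end()
    if pos < len(content):
        parts.append("'" + content[pos:] + "'")                # trailing literal text
    if not parts:
        return "''"
    if len(parts) == 1:
        return parts[0]            # pure placeholder or pure literal
    return "CONCAT(" + ", ".join(parts) + ")"
```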
|
|
97
97
|
|
|
98
|
-
###
|
|
98
|
+
### Case 3: Placeholder not inside quotes
|
|
99
99
|
|
|
100
100
|
```sql
|
|
101
|
-
--
|
|
101
|
+
-- Input
|
|
102
102
|
WHERE dt = ${ds}
|
|
103
|
-
--
|
|
103
|
+
-- Output
|
|
104
104
|
WHERE dt = SESSION_CONFIGS()['dt.args.ds']
|
|
105
105
|
```
|
|
106
106
|
|
|
107
|
-
###
|
|
107
|
+
### Case 4: Placeholder inside single quotes with date arithmetic
|
|
108
108
|
|
|
109
109
|
```sql
|
|
110
|
-
--
|
|
110
|
+
-- Input
|
|
111
111
|
WHERE dt = '${ds - 1}'
|
|
112
|
-
--
|
|
112
|
+
-- Output (remove outer quotes; add ::STRING type cast)
|
|
113
113
|
WHERE dt = DATE_FORMAT(sub_days(SESSION_CONFIGS()['dt.args.ds'], 1), 'yyyy-MM-dd')::STRING
|
|
114
114
|
```
|
|
115
115
|
|
|
116
|
-
###
|
|
116
|
+
### Quote Selection Inside Strings
|
|
117
117
|
|
|
118
|
-
|
|
118
|
+
When the replaced expression is still inside a single-quoted string (e.g., CONCAT scenario), use double quotes for SESSION_CONFIGS key names to avoid quote conflicts:
|
|
119
119
|
```sql
|
|
120
|
-
--
|
|
120
|
+
-- Inside single-quote context (e.g., CONCAT)
|
|
121
121
|
CONCAT('prefix_', SESSION_CONFIGS()["dt.args.ds"])
|
|
122
122
|
|
|
123
|
-
--
|
|
123
|
+
-- Standalone expression (outer quotes already removed)
|
|
124
124
|
SESSION_CONFIGS()['dt.args.ds']
|
|
125
125
|
```
|
|
126
126
|
|
|
127
|
-
##
|
|
127
|
+
## Placeholders in Static Partitions
|
|
128
128
|
|
|
129
|
-
|
|
129
|
+
Placeholders in static partition values are replaced and then injected into the SELECT clause:
|
|
130
130
|
|
|
131
131
|
```sql
|
|
132
|
-
--
|
|
132
|
+
-- Input
|
|
133
133
|
INSERT OVERWRITE TABLE t PARTITION(dt='${ds}', region='${region}')
|
|
134
134
|
SELECT col1 FROM source;
|
|
135
135
|
|
|
136
|
-
--
|
|
136
|
+
-- After conversion
|
|
137
137
|
SELECT col1,
|
|
138
138
|
SESSION_CONFIGS()['dt.args.ds'] AS dt,
|
|
139
139
|
SESSION_CONFIGS()['dt.args.region'] AS region
|
|
140
140
|
FROM source;
|
|
141
141
|
```
|
|
142
142
|
|
|
143
|
-
##
|
|
143
|
+
## Unrecognizable Expressions
|
|
144
144
|
|
|
145
|
-
|
|
146
|
-
1.
|
|
147
|
-
2.
|
|
148
|
-
3.
|
|
149
|
-
4.
|
|
145
|
+
For complex expressions that cannot be parsed (e.g., Airflow Jinja templates), clean them up:
|
|
146
|
+
1. Convert Python strftime format specifiers to SQL style: `%Y`→`yyyy`, `%m`→`MM`, `%d`→`dd`, `%H`→`HH`
|
|
147
|
+
2. Replace non-alphanumeric-underscore characters with `_`
|
|
148
|
+
3. Merge consecutive underscores; remove leading/trailing underscores
|
|
149
|
+
4. Use the cleaned string as the SESSION_CONFIGS key name
|
|
150
150
|
|
|
151
151
|
```sql
|
|
152
|
-
--
|
|
152
|
+
-- Input
|
|
153
153
|
${execution_date.strftime("%H00")}
|
|
154
|
-
--
|
|
155
|
-
--
|
|
154
|
+
-- Cleaned key name: execution_date_strftime_HH00
|
|
155
|
+
-- Output
|
|
156
156
|
SESSION_CONFIGS()['dt.args.execution_date_strftime_HH00']
|
|
157
157
|
```
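The four cleanup steps translate almost line for line into Python. In this sketch `clean_key` and `fallback_reference` are hypothetical names, and "alphanumeric" is assumed to mean ASCII letters and digits.

```python
import re

# Step 1: Python strftime specifiers and their SQL-style equivalents.
STRFTIME_TO_SQL = {"%Y": "yyyy", "%m": "MM", "%d": "dd", "%H": "HH"}


def clean_key(expr: str) -> str:
    """Turn an unparseable placeholder expression into a SESSION_CONFIGS key name."""
    for spec, sql_fmt in STRFTIME_TO_SQL.items():
        expr = expr.replace(spec, sql_fmt)
    expr = re.sub(r"[^A-Za-z0-9_]", "_", expr)     # step 2
    return re.sub(r"_+", "_", expr).strip("_")     # step 3


def fallback_reference(expr: str) -> str:
    """Step 4: use the cleaned string as the key name."""
    return f"SESSION_CONFIGS()['dt.args.{clean_key(expr)}']"
```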
|
|
158
158
|
|
|
159
|
-
##
|
|
159
|
+
## Complete Example
|
|
160
160
|
|
|
161
|
-
###
|
|
161
|
+
### Input
|
|
162
162
|
```sql
|
|
163
163
|
INSERT OVERWRITE TABLE kscdm.dim_table
|
|
164
164
|
PARTITION(p_date='{{ ds_nodash }}_done', product='done', dt='{{ ds }}')
|
|
@@ -169,7 +169,7 @@ WHERE dt = '{{ ds }}'
|
|
|
169
169
|
AND region = '{{ region }}';
|
|
170
170
|
```
|
|
171
171
|
|
|
172
|
-
###
|
|
172
|
+
### Output (after placeholder replacement)
|
|
173
173
|
```sql
|
|
174
174
|
SELECT id, name,
|
|
175
175
|
CONCAT(DATE_FORMAT(SESSION_CONFIGS()['dt.args.ds_nodash'], 'yyyyMMdd'), '_done') AS p_date,
|
|
@@ -1,20 +1,20 @@
|
|
|
1
|
-
# Dynamic Table Refresh
|
|
1
|
+
# Dynamic Table Refresh and Scheduling File Generation Rules
|
|
2
2
|
|
|
3
|
-
|
|
3
|
+
You are a SQL conversion expert. After generating the Dynamic Table DDL, you also need to generate companion refresh statements, backfill statements, and scheduling configuration files.
|
|
4
4
|
|
|
5
|
-
## Refresh
|
|
5
|
+
## Refresh Statement Generation
|
|
6
6
|
|
|
7
|
-
###
|
|
7
|
+
### Variable Extraction
|
|
8
8
|
|
|
9
|
-
|
|
9
|
+
Extract all variable names XXX from `SESSION_CONFIGS()['dt.args.XXX']` in the converted DDL, deduplicate, and sort.
|
|
10
10
|
|
|
11
|
-
|
|
11
|
+
Note: only extract variable names that actually appear in the DDL. For example, if the DDL only contains `SESSION_CONFIGS()['dt.args.ds_nodash']`, only generate a SET statement for the `ds_nodash` variable.
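Variable extraction is a single regular-expression pass over the converted DDL. The Python sketch below uses a hypothetical name, `extract_variables`, and assumes key names consist of letters, digits, and underscores; it accepts both single-quoted and double-quoted keys.

```python
import re

# Group 1 is the quote character, group 2 the variable name after 'dt.args.'.
REF = re.compile(r"""SESSION_CONFIGS\(\)\[(['"])dt\.args\.([A-Za-z0-9_]+)\1\]""")


def extract_variables(ddl: str) -> list[str]:
    """Return the deduplicated, alphabetically sorted variable names used in the DDL."""
    return sorted({m.group(2) for m in REF.finditer(ddl)})
```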
|
|
12
12
|
|
|
13
|
-
###
|
|
13
|
+
### Three Types of Refresh Files
|
|
14
14
|
|
|
15
|
-
|
|
15
|
+
For each converted table, generate three types of files:
|
|
16
16
|
|
|
17
|
-
#### 1.
|
|
17
|
+
#### 1. Current-cycle refresh (`table_name_refresh.sql`)
|
|
18
18
|
|
|
19
19
|
```sql
|
|
20
20
|
set dt.args.ds = ${ds};
|
|
@@ -22,13 +22,13 @@ set dt.args.region = ${region};
|
|
|
22
22
|
REFRESH DYNAMIC TABLE schema.table_name PARTITION(ds = '${ds}', region = '${region}');
|
|
23
23
|
```
|
|
24
24
|
|
|
25
|
-
|
|
26
|
-
-
|
|
27
|
-
-
|
|
28
|
-
- PARTITION
|
|
29
|
-
-
|
|
25
|
+
Rules:
|
|
26
|
+
- Generate one `set dt.args.variable_name = ${variable_name};` line for each extracted variable
|
|
27
|
+
- Variables sorted alphabetically
|
|
28
|
+
- PARTITION clause includes only static partition columns (extracted from the PARTITION clause of the original INSERT OVERWRITE)
|
|
29
|
+
- Partition values use `'${variable_name}'` format
|
|
30
30
|
|
|
31
|
-
#### 2.
|
|
31
|
+
#### 2. Previous-cycle refresh (`table_name_prev_refresh.sql`)
|
|
32
32
|
|
|
33
33
|
```sql
|
|
34
34
|
set dt.args.ds = ${prev_ds};
|
|
@@ -36,9 +36,9 @@ set dt.args.region = ${prev_region};
|
|
|
36
36
|
REFRESH DYNAMIC TABLE schema.table_name PARTITION(ds = '${prev_ds}', region = '${prev_region}');
|
|
37
37
|
```
|
|
38
38
|
|
|
39
|
-
|
|
39
|
+
Rules: add `prev_` prefix to each variable name.
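The current-cycle and previous-cycle files differ only in the `prev_` prefix, so one generator can produce both. This is a Python sketch with a hypothetical name, `build_refresh_sql`; `static_partitions` is assumed to map each static partition column to the variable that supplies its value, in the order of the original PARTITION clause.

```python
def build_refresh_sql(
    table: str,
    variables: list[str],
    static_partitions: dict[str, str],
    prefix: str = "",
) -> str:
    """Build the SET lines and the REFRESH statement for one refresh file.

    prefix="" gives the current-cycle file, prefix="prev_" the previous-cycle file.
    """
    # One SET line per extracted variable, sorted alphabetically.
    lines = [f"set dt.args.{name} = ${{{prefix}{name}}};" for name in sorted(set(variables))]
    if static_partitions:
        spec = ", ".join(
            f"{col} = '${{{prefix}{var}}}'" for col, var in static_partitions.items()
        )
        lines.append(f"REFRESH DYNAMIC TABLE {table} PARTITION({spec});")
    else:
        # No static partition variables: plain refresh without a PARTITION clause.
        lines.append(f"REFRESH DYNAMIC TABLE {table};")
    return "\n".join(lines)
```

Calling it once with the default prefix and once with `prefix="prev_"` yields the contents of the two refresh files for a table.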
|
|
40
40
|
|
|
41
|
-
#### 3.
|
|
41
|
+
#### 3. Backfill statement (`table_name_backfill.sql`)
|
|
42
42
|
|
|
43
43
|
```sql
|
|
44
44
|
set cz.optimizer.incremental.backfill.enabled = TRUE;
|
|
@@ -49,29 +49,29 @@ FROM ext_schema.table_name
|
|
|
49
49
|
WHERE ds = '${ds}' AND region = '${region}';
|
|
50
50
|
```
|
|
51
51
|
|
|
52
|
-
|
|
53
|
-
-
|
|
54
|
-
-
|
|
55
|
-
- WHERE
|
|
52
|
+
Rules:
|
|
53
|
+
- Fixed backfill switch SET statement
|
|
54
|
+
- SELECT * from extension table (ext_schema) into target table
|
|
55
|
+
- WHERE condition uses static partition columns (extracted from the PARTITION clause of the original INSERT OVERWRITE)
|
|
56
56
|
|
|
57
|
-
###
|
|
57
|
+
### Non-partitioned Tables
|
|
58
58
|
|
|
59
|
-
|
|
60
|
-
-
|
|
61
|
-
-
|
|
59
|
+
If the table has no static partition variables:
|
|
60
|
+
- Only generate current-cycle refresh: `REFRESH DYNAMIC TABLE schema.table_name;`
|
|
61
|
+
- Do not generate prev_refresh and backfill files
|
|
62
62
|
|
|
63
|
-
###
|
|
63
|
+
### Extension Table Name Rules
|
|
64
64
|
|
|
65
|
-
-
|
|
65
|
+
- If `ext_schema` is specified: `ext_schema.table_name`
|
|
66
66
|
|
|
67
|
-
##
|
|
67
|
+
## Complete Example
|
|
68
68
|
|
|
69
|
-
###
|
|
69
|
+
### Input (converted DDL contains the following variables)
|
|
70
70
|
|
|
71
|
-
DDL
|
|
72
|
-
|
|
71
|
+
DDL contains: `SESSION_CONFIGS()['dt.args.ds']` and `SESSION_CONFIGS()['dt.args.region']`
|
|
72
|
+
Original PARTITION: `PARTITION(dt='${ds}', region='${region}')`
|
|
73
73
|
|
|
74
|
-
###
|
|
74
|
+
### Output
|
|
75
75
|
|
|
76
76
|
**refresh.sql:**
|
|
77
77
|
```sql
|