@clickzetta/cz-cli-darwin-x64 0.3.89 → 0.3.90
- package/bin/cz-cli +0 -0
- package/bin/skills/clickzetta-dynamic-table/SKILL.md +169 -169
- package/bin/skills/clickzetta-dynamic-table/best-practices/dimension-table-join-guide.md +126 -126
- package/bin/skills/clickzetta-dynamic-table/best-practices/medallion-and-stream-patterns.md +25 -25
- package/bin/skills/clickzetta-dynamic-table/best-practices/non-partitioned-merge-into-warning.md +48 -48
- package/bin/skills/clickzetta-dynamic-table/best-practices/performance-optimization.md +51 -51
- package/bin/skills/clickzetta-dynamic-table/best-practices/scheduling-guide.md +59 -59
- package/bin/skills/clickzetta-dynamic-table/dt-creator/SKILL.md +8 -7
- package/bin/skills/clickzetta-dynamic-table/dt-creator/references/dt-declaration-strategy.md +99 -99
- package/bin/skills/clickzetta-dynamic-table/dt-creator/references/incremental-config-reference.md +188 -188
- package/bin/skills/clickzetta-dynamic-table/dt-creator/references/refresh-history-guide.md +117 -117
- package/bin/skills/clickzetta-dynamic-table/dt-creator/references/sql-limitations.md +29 -29
- package/bin/skills/clickzetta-dynamic-table/dynamic-table-alter/SKILL.md +80 -79
- package/bin/skills/clickzetta-dynamic-table/sql-to-dt/SKILL.md +15 -15
- package/bin/skills/clickzetta-dynamic-table/sql-to-dt/references/sql2dt-column-validation-rules.md +61 -61
- package/bin/skills/clickzetta-dynamic-table/sql-to-dt/references/sql2dt-conversion-rules.md +100 -100
- package/bin/skills/clickzetta-dynamic-table/sql-to-dt/references/sql2dt-placeholder-rules.md +64 -64
- package/bin/skills/clickzetta-dynamic-table/sql-to-dt/references/sql2dt-refresh-rules.md +32 -32
- package/bin/skills/clickzetta-dynamic-table/sql-to-dt/references/sql2dt-self-reference-rules.md +21 -21
- package/bin/skills/clickzetta-dynamic-table/sql-to-dt/references/sql2dt-workflow.md +71 -71
- package/bin/skills/clickzetta-sql-pipeline-manager/SKILL.md +203 -202
- package/bin/skills/clickzetta-sql-pipeline-manager/references/dynamic-table.md +62 -62
- package/bin/skills/clickzetta-sql-pipeline-manager/references/materialized-view.md +34 -34
- package/bin/skills/clickzetta-sql-pipeline-manager/references/pipe.md +61 -61
- package/bin/skills/clickzetta-sql-pipeline-manager/references/table-stream.md +41 -41
- package/bin/skills/clickzetta-table-stream-pipeline/SKILL.md +103 -101
- package/package.json +1 -1
package/bin/skills/clickzetta-dynamic-table/dt-creator/references/dt-declaration-strategy.md
CHANGED
|
@@ -1,12 +1,12 @@
|
|
|
1
|
-
# Dynamic Table
|
|
1
|
+
# Dynamic Table Declaration Strategy
|
|
2
2
|
|
|
3
|
-
DT
|
|
3
|
+
A Dynamic Table (DT) can be declared in two ways: as a static partition DT or as a dynamic partition DT (a non-partitioned DT can be viewed as a special case of the dynamic partition form). The two differ fundamentally in creation syntax, refresh behavior, and incremental behavior.
|
|
4
4
|
|
|
5
|
-
##
|
|
5
|
+
## Core Concepts
|
|
6
6
|
|
|
7
|
-
###
|
|
7
|
+
### Static Partition DT (Partitioned DT with SESSION_CONFIGS args)
|
|
8
8
|
|
|
9
|
-
SQL
|
|
9
|
+
The SQL reads partition parameters through `SESSION_CONFIGS()`, and each REFRESH names a specific partition value. Partitions refresh independently, so each partition refresh unit can be viewed as an independent DT.
|
|
10
10
|
|
|
11
11
|
```sql
|
|
12
12
|
CREATE DYNAMIC TABLE order_daily (
|
|
@@ -18,24 +18,24 @@ SELECT id, amount, SESSION_CONFIGS()['dt.args.ds'] AS ds
|
|
|
18
18
|
FROM orders
|
|
19
19
|
WHERE ds = SESSION_CONFIGS()['dt.args.ds'];
|
|
20
20
|
|
|
21
|
-
--
|
|
21
|
+
-- Specify partition at refresh time
|
|
22
22
|
set dt.args.ds=2025-01-01
|
|
23
23
|
REFRESH DYNAMIC TABLE order_daily PARTITION(ds = '2025-01-01');
|
|
24
24
|
```
|
|
25
25
|
|
|
26
|
-
###
|
|
26
|
+
### Dynamic Partition DT (Non-partitioned DT / DT without args)
|
|
27
27
|
|
|
28
|
-
SQL
|
|
28
|
+
Either the SQL does not reference `SESSION_CONFIGS()`, or the table is partitioned but the query logic produces the partition values dynamically. Each REFRESH processes all incremental data from all source tables.
|
|
29
29
|
|
|
30
|
-
|
|
30
|
+
Dynamic partition DTs do not allow any command other than REFRESH to modify data (INSERT/UPDATE/DELETE/MERGE are all unavailable); data is driven entirely by REFRESH.
|
|
31
31
|
|
|
32
|
-
|
|
33
|
-
-
|
|
34
|
-
-
|
|
35
|
-
-
|
|
36
|
-
-
|
|
37
|
-
-
|
|
38
|
-
-
|
|
32
|
+
Therefore, the following ETL scenarios are not suitable for dynamic partition DT:
|
|
33
|
+
- Need to manually patch data (e.g., a few rows are found to be incorrect and need to be directly UPDATEd)
|
|
34
|
+
- Need to delete data by condition (e.g., cleaning dirty data, deleting expired records)
|
|
35
|
+
- Need MERGE INTO for upsert (e.g., consuming a stream and merging into a target table in a CDC scenario)
|
|
36
|
+
- Need INSERT INTO to append external data (e.g., manually importing a batch of supplementary data)
|
|
37
|
+
- Need to backfill or re-refresh partitions independently (dynamic partition DT can only do a full table refresh; individual partitions cannot be refreshed separately)
|
|
38
|
+
- Downstream tasks need to write to the same table (DT has exclusive write ownership)
|
|
39
39
|
|
|
40
40
|
```sql
|
|
41
41
|
CREATE DYNAMIC TABLE order_summary (
|
|
@@ -46,140 +46,140 @@ SELECT category, SUM(amount) AS total_amount
|
|
|
46
46
|
FROM orders
|
|
47
47
|
GROUP BY category;
|
|
48
48
|
|
|
49
|
-
--
|
|
49
|
+
-- No partition specified at refresh time
|
|
50
50
|
REFRESH DYNAMIC TABLE order_summary;
|
|
51
51
|
```
|
|
52
52
|
|
|
53
|
-
##
|
|
53
|
+
## Key Differences
|
|
54
54
|
|
|
55
|
-
|
|
|
55
|
+
| Dimension | Static Partition DT | Dynamic Partition DT |
|
|
56
56
|
|------|-----------|-----------|
|
|
57
|
-
| SQL
|
|
58
|
-
| REFRESH
|
|
59
|
-
|
|
|
60
|
-
|
|
|
61
|
-
|
|
|
62
|
-
|
|
|
63
|
-
|
|
|
57
|
+
| Does SQL contain `SESSION_CONFIGS()`? | Yes, used to reference partition parameters | No |
|
|
58
|
+
| REFRESH syntax | `REFRESH ... PARTITION(ds='xxx')` | `REFRESH ...` (no PARTITION) |
|
|
59
|
+
| Incremental scope | Only processes incremental data for the specified partition | Processes all incremental data from all source tables |
|
|
60
|
+
| Scheduling method | External scheduler triggers one partition at a time | External scheduler triggers on a timer |
|
|
61
|
+
| Data lifecycle | Managed per partition; can backfill/delete independently | Managed as a whole table |
|
|
62
|
+
| State tables | Maintained independently per partition | Maintained globally |
|
|
63
|
+
| Suitable data patterns | T+1 batch processing, time-partitioned ETL | Real-time streams, global aggregation, no clear partition key |
|
|
64
64
|
|
|
65
|
-
##
|
|
65
|
+
## Selection Decision Tree
|
|
66
66
|
|
|
67
67
|
```
|
|
68
|
-
|
|
68
|
+
Does your data have a clear time/business partition key?
|
|
69
69
|
│
|
|
70
|
-
├─
|
|
70
|
+
├─ Yes → Was the original ETL doing INSERT OVERWRITE by partition?
|
|
71
71
|
│ │
|
|
72
|
-
│ ├─
|
|
73
|
-
│ │
|
|
72
|
+
│ ├─ Yes → Use static partition DT
|
|
73
|
+
│ │ (maintain the original partition granularity; each partition refreshes independently)
|
|
74
74
|
│ │
|
|
75
|
-
│ └─
|
|
75
|
+
│ └─ No → Is the data volume large? Do you need per-partition lifecycle management?
|
|
76
76
|
│ │
|
|
77
|
-
│ ├─
|
|
78
|
-
│ │
|
|
77
|
+
│ ├─ Yes → Use static partition DT
|
|
78
|
+
│ │ (even if the original was not partitioned, adding partitions is recommended for manageability)
|
|
79
79
|
│ │
|
|
80
|
-
│ └─
|
|
81
|
-
│
|
|
80
|
+
│ └─ No → Use dynamic partition DT
|
|
81
|
+
│ (simple scenario; no partition management needed)
|
|
82
82
|
│
|
|
83
|
-
└─
|
|
84
|
-
|
|
83
|
+
└─ No → Use dynamic partition DT
|
|
84
|
+
(global aggregation, real-time summary, etc.)
|
|
85
85
|
```
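The decision tree above can be sketched as a small function. This is only an illustration: the function name, the parameter names, and the `"static"`/`"dynamic"` return values are hypothetical, and the two questions on the second branch (large data volume, per-partition lifecycle management) are folded into a single flag as a simplifying assumption.

```python
def choose_dt_strategy(
    has_partition_key: bool,
    overwrites_by_partition: bool = False,
    needs_partition_management: bool = False,
) -> str:
    """Walk the decision tree and return "static" or "dynamic".

    All names here are hypothetical; they are not part of any ClickZetta API.
    """
    if not has_partition_key:
        # No clear time/business partition key: global aggregation,
        # real-time summary, and similar cases.
        return "dynamic"
    if overwrites_by_partition:
        # Original ETL did INSERT OVERWRITE by partition: keep that granularity.
        return "static"
    if needs_partition_management:
        # Large data volume or per-partition lifecycle management
        # (assumption: both questions are merged into this one flag).
        return "static"
    # Simple scenario; no partition management needed.
    return "dynamic"
```

For example, `choose_dt_strategy(True, overwrites_by_partition=True)` follows the first "Yes → Yes" path of the tree.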
|
|
86
86
|
|
|
87
|
-
##
|
|
87
|
+
## Static Partition DT — Details
|
|
88
88
|
|
|
89
|
-
###
|
|
89
|
+
### Applicable Scenarios
|
|
90
90
|
|
|
91
|
-
1. **T+1
|
|
92
|
-
-
|
|
93
|
-
-
|
|
94
|
-
-
|
|
91
|
+
1. **T+1 batch ETL migration**
|
|
92
|
+
- Original SQL follows the `INSERT OVERWRITE TABLE t PARTITION(ds='${ds}')` pattern
|
|
93
|
+
- Refreshes once per day/hour by partition
|
|
94
|
+
- Needs to support historical partition backfill
|
|
95
95
|
|
|
96
|
-
2.
|
|
97
|
-
-
|
|
98
|
-
- SQL
|
|
96
|
+
2. **Sliding window computation**
|
|
97
|
+
- E.g., aggregation over the last 7 days, period-over-period comparison
|
|
98
|
+
- SQL uses `SESSION_CONFIGS()['dt.args.ds']` together with `sub_days(...)` to define the window range
|
|
99
99
|
|
|
100
|
-
3.
|
|
101
|
-
-
|
|
102
|
-
-
|
|
100
|
+
3. **Per-partition data lifecycle management**
|
|
101
|
+
- Automatically clean up expired partitions via `data_lifecycle`
|
|
102
|
+
- Can backfill a single partition without affecting others
|
|
103
103
|
|
|
104
|
-
4.
|
|
105
|
-
-
|
|
106
|
-
-
|
|
104
|
+
4. **Self-referencing DT (daily comparison, SCD)**
|
|
105
|
+
- Current partition depends on the result of the previous partition
|
|
106
|
+
- Requires static partitioning, because the "current partition" and the "previous partition" must both be specified explicitly
|
|
107
107
|
|
|
108
|
-
###
|
|
108
|
+
### Refresh Method
|
|
109
109
|
|
|
110
110
|
```sql
|
|
111
|
-
--
|
|
111
|
+
-- Refresh one partition at a time
|
|
112
112
|
set dt.args.ds=2025-01-15
|
|
113
113
|
REFRESH DYNAMIC TABLE my_dt PARTITION(ds = '2025-01-15');
|
|
114
114
|
|
|
115
|
-
--
|
|
115
|
+
-- Multi-level partition
|
|
116
116
|
set dt.args.pt=20250411
|
|
117
117
|
set dt.args.pt_hour=01
|
|
118
118
|
REFRESH DYNAMIC TABLE my_dt PARTITION(pt = '20250411', pt_hour = '01');
|
|
119
119
|
```
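Because each partition is refreshed with its own `set` + `REFRESH` pair, an external scheduler or backfill script usually has to generate those pairs for a range of dates. A minimal sketch follows, assuming a single day-level partition column named `ds`. The helper name `refresh_statements` is hypothetical; it only builds the statement text in the form shown above and does not execute anything.

```python
from datetime import date, timedelta


def refresh_statements(table: str, start: date, end: date) -> list[str]:
    """Build the `set dt.args.ds=...` + REFRESH pair for each day in [start, end].

    Hypothetical helper: it formats strings only and assumes a `ds` partition
    whose values look like 2025-01-15.
    """
    statements = []
    day = start
    while day <= end:
        ds = day.isoformat()  # ISO date, e.g. 2025-01-15
        # Pass the partition parameter, then refresh that one partition.
        statements.append(f"set dt.args.ds={ds}")
        statements.append(
            f"REFRESH DYNAMIC TABLE {table} PARTITION(ds = '{ds}');"
        )
        day += timedelta(days=1)
    return statements


if __name__ == "__main__":
    for stmt in refresh_statements("my_dt", date(2025, 1, 15), date(2025, 1, 17)):
        print(stmt)
```

Generating one pair per partition keeps each refresh independent, which matches the per-partition backfill behavior described for static partition DTs.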
|
|
120
120
|
|
|
121
|
-
###
|
|
121
|
+
### Notes
|
|
122
122
|
|
|
123
|
-
-
|
|
124
|
-
-
|
|
123
|
+
- For backfill, set `cz.optimizer.incremental.backfill.enabled=TRUE`; the refresh then automatically runs as a full refresh
|
|
124
|
+
- Partition parameters are passed via `set dt.args.xxx=value`; the PARTITION clause in the REFRESH statement specifies the partition value
|
|
125
125
|
|
|
126
|
-
##
|
|
126
|
+
## Dynamic Partition DT — Details
|
|
127
127
|
|
|
128
|
-
###
|
|
128
|
+
### Applicable Scenarios
|
|
129
129
|
|
|
130
|
-
1.
|
|
131
|
-
-
|
|
132
|
-
-
|
|
130
|
+
1. **Real-time stream data aggregation**
|
|
131
|
+
- The source table is written to continuously; the DT refreshes on a schedule
|
|
132
|
+
- No partition management needed; each refresh processes all new data
|
|
133
133
|
|
|
134
|
-
2.
|
|
135
|
-
-
|
|
136
|
-
-
|
|
134
|
+
2. **Global summary tables**
|
|
135
|
+
- E.g., global TopN, global count, global deduplication
|
|
136
|
+
- No clear partition key
|
|
137
137
|
|
|
138
|
-
3.
|
|
139
|
-
-
|
|
140
|
-
-
|
|
138
|
+
3. **Simple JOIN + filter**
|
|
139
|
+
- Simple transformations without partition parameters
|
|
140
|
+
- E.g., fact table JOIN dimension table, output wide table
|
|
141
141
|
|
|
142
|
-
4.
|
|
143
|
-
-
|
|
144
|
-
-
|
|
142
|
+
4. **Multi-source merge (UNION ALL)**
|
|
143
|
+
- Data from multiple source tables merged into one table
|
|
144
|
+
- No partition management needed
|
|
145
145
|
|
|
146
|
-
###
|
|
146
|
+
### Refresh Method
|
|
147
147
|
|
|
148
148
|
```sql
|
|
149
|
-
--
|
|
149
|
+
-- Refresh directly; processes all incremental data from all source tables
|
|
150
150
|
REFRESH DYNAMIC TABLE my_dt;
|
|
151
151
|
```
|
|
152
152
|
|
|
153
|
-
###
|
|
153
|
+
### Notes
|
|
154
154
|
|
|
155
|
-
-
|
|
156
|
-
-
|
|
157
|
-
-
|
|
158
|
-
-
|
|
155
|
+
- Each refresh processes all incremental data from all source tables; if source table change volume is large, refresh may be slow
|
|
156
|
+
- State tables are maintained globally and may grow as data volume increases
|
|
157
|
+
- Per-partition backfill is not supported; only full table refresh is possible
|
|
158
|
+
- Suitable for scenarios where the change ratio is small (< 5%)
|
|
159
159
|
|
|
160
|
-
##
|
|
160
|
+
## Partition Granularity Selection
|
|
161
161
|
|
|
162
|
-
|
|
162
|
+
When choosing a static partition DT, you also need to decide on partition granularity:
|
|
163
163
|
|
|
164
|
-
|
|
|
164
|
+
| Data pattern | Recommended granularity | Notes |
|
|
165
165
|
|---------|------------|------|
|
|
166
|
-
|
|
|
167
|
-
|
|
|
168
|
-
| T+1
|
|
169
|
-
|
|
|
170
|
-
|
|
|
166
|
+
| Strictly ordered time series (e.g., logs) | Minute-level (`dt_min`) | High data volume, frequent writes |
|
|
167
|
+
| Roughly ordered, small amount of late data | Hour-level (`dt_hour`) | Balance between granularity and management complexity |
|
|
168
|
+
| T+1 batch import | Day-level (`ds`) | Most common ETL scenario |
|
|
169
|
+
| By business cycle | Weekly/monthly | Reporting scenarios |
|
|
170
|
+
| Multi-level partition | Day + hour (`ds`, `hour`) | Finer-grained lifecycle management needed |
|
|
171
171
|
|
|
172
|
-
|
|
173
|
-
-
|
|
174
|
-
-
|
|
175
|
-
-
|
|
172
|
+
Selection principles:
|
|
173
|
+
- Finer granularity → smaller data volume per refresh → higher incremental efficiency
|
|
174
|
+
- Finer granularity → more partitions → more complex management and scheduling
|
|
175
|
+
- Granularity should match the data write frequency: if data is written hourly, partition granularity should not be finer than hourly
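The last principle sets a lower bound on granularity, which can be illustrated with a short sketch. The function name and the minute/hour/day lengths below are assumptions made for this example; weekly and monthly cycles are not modeled, and the function says nothing about the efficiency versus complexity trade-off in the first two principles.

```python
# Assumed partition lengths in minutes (illustrative only).
GRANULARITY_MINUTES = {"minute": 1, "hour": 60, "day": 1440}


def finest_allowed_granularity(write_interval_minutes: int) -> str:
    """Return the finest granularity that is not finer than the write interval.

    Hypothetical helper: falls back to "day" when data arrives less often
    than daily, since longer cycles are not modeled here.
    """
    allowed = [
        name
        for name, minutes in GRANULARITY_MINUTES.items()
        if minutes >= write_interval_minutes
    ]
    if not allowed:
        return "day"
    # Among the allowed granularities, pick the shortest partition length.
    return min(allowed, key=lambda name: GRANULARITY_MINUTES[name])
```

With hourly writes (`write_interval_minutes=60`) the sketch returns `"hour"`, in line with the rule that the partition should not be finer than the write frequency.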
|
|
176
176
|
|
|
177
|
-
##
|
|
177
|
+
## Determining Partition Strategy from Original ETL
|
|
178
178
|
|
|
179
|
-
|
|
|
179
|
+
| Original ETL pattern | Recommended DT partition strategy |
|
|
180
180
|
|--------------|----------------|
|
|
181
|
-
| `INSERT OVERWRITE TABLE t PARTITION(ds='${ds}')` |
|
|
182
|
-
| `INSERT OVERWRITE TABLE t PARTITION(ds='${ds}', hour='${hour}')` |
|
|
183
|
-
| `INSERT OVERWRITE TABLE t PARTITION(ds)`
|
|
184
|
-
| `INSERT INTO TABLE t SELECT ...`
|
|
185
|
-
| `INSERT OVERWRITE TABLE t SELECT ...`
|
|
181
|
+
| `INSERT OVERWRITE TABLE t PARTITION(ds='${ds}')` | Static partition DT, day-level |
|
|
182
|
+
| `INSERT OVERWRITE TABLE t PARTITION(ds='${ds}', hour='${hour}')` | Static partition DT, day+hour level |
|
|
183
|
+
| `INSERT OVERWRITE TABLE t PARTITION(ds)` (dynamic partition write) | Dynamic partition DT or static partition DT (depends on whether per-partition management is needed) |
|
|
184
|
+
| `INSERT INTO TABLE t SELECT ...` (no partition) | Dynamic partition DT |
|
|
185
|
+
| `INSERT OVERWRITE TABLE t SELECT ...` (full table overwrite) | Dynamic partition DT |
|
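The mapping in the table above can be approximated by inspecting the `PARTITION(...)` clause of the original statement. The sketch below is a rough heuristic, not a SQL parser: the function name and return strings are hypothetical, and treating a clause that mixes fixed and dynamic partition columns as "dynamic or static" is an assumption, since the table does not cover that case.

```python
import re


def recommend_strategy(etl_sql: str) -> str:
    """Suggest a DT partition strategy from an original INSERT statement.

    Hypothetical heuristic based only on the PARTITION(...) clause.
    """
    sql = " ".join(etl_sql.split())  # normalize whitespace
    # Match PARTITION(...) but not window-function "PARTITION BY".
    match = re.search(r"PARTITION\s*\(([^)]*)\)", sql, flags=re.IGNORECASE)
    if match is None:
        # INSERT INTO / INSERT OVERWRITE without a partition clause.
        return "dynamic partition DT"
    specs = [part.strip() for part in match.group(1).split(",") if part.strip()]
    if specs and all("=" in part for part in specs):
        # Every partition column has a fixed value, e.g. ds='${ds}'.
        columns = "+".join(part.split("=")[0].strip() for part in specs)
        return f"static partition DT ({columns})"
    # Dynamic partition write such as PARTITION(ds): the choice depends on
    # whether per-partition management is needed.
    return "dynamic or static partition DT"
```

A caller would still need to decide the "dynamic or static" case manually, based on whether per-partition management is required.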