@clickzetta/cz-cli-darwin-x64 0.3.87 → 0.3.88
- package/bin/cz-cli +0 -0
- package/bin/skills/clickzetta-dynamic-table/SKILL.md +169 -169
- package/bin/skills/clickzetta-dynamic-table/best-practices/dimension-table-join-guide.md +126 -126
- package/bin/skills/clickzetta-dynamic-table/best-practices/medallion-and-stream-patterns.md +25 -25
- package/bin/skills/clickzetta-dynamic-table/best-practices/non-partitioned-merge-into-warning.md +48 -48
- package/bin/skills/clickzetta-dynamic-table/best-practices/performance-optimization.md +51 -51
- package/bin/skills/clickzetta-dynamic-table/best-practices/scheduling-guide.md +59 -59
- package/bin/skills/clickzetta-dynamic-table/dt-creator/SKILL.md +8 -7
- package/bin/skills/clickzetta-dynamic-table/dt-creator/references/dt-declaration-strategy.md +99 -99
- package/bin/skills/clickzetta-dynamic-table/dt-creator/references/incremental-config-reference.md +188 -188
- package/bin/skills/clickzetta-dynamic-table/dt-creator/references/refresh-history-guide.md +117 -117
- package/bin/skills/clickzetta-dynamic-table/dt-creator/references/sql-limitations.md +29 -29
- package/bin/skills/clickzetta-dynamic-table/dynamic-table-alter/SKILL.md +80 -79
- package/bin/skills/clickzetta-dynamic-table/sql-to-dt/SKILL.md +15 -15
- package/bin/skills/clickzetta-dynamic-table/sql-to-dt/references/sql2dt-column-validation-rules.md +61 -61
- package/bin/skills/clickzetta-dynamic-table/sql-to-dt/references/sql2dt-conversion-rules.md +100 -100
- package/bin/skills/clickzetta-dynamic-table/sql-to-dt/references/sql2dt-placeholder-rules.md +64 -64
- package/bin/skills/clickzetta-dynamic-table/sql-to-dt/references/sql2dt-refresh-rules.md +32 -32
- package/bin/skills/clickzetta-dynamic-table/sql-to-dt/references/sql2dt-self-reference-rules.md +21 -21
- package/bin/skills/clickzetta-dynamic-table/sql-to-dt/references/sql2dt-workflow.md +71 -71
- package/bin/skills/clickzetta-sql-pipeline-manager/SKILL.md +203 -202
- package/bin/skills/clickzetta-sql-pipeline-manager/references/dynamic-table.md +62 -62
- package/bin/skills/clickzetta-sql-pipeline-manager/references/materialized-view.md +34 -34
- package/bin/skills/clickzetta-sql-pipeline-manager/references/pipe.md +61 -61
- package/bin/skills/clickzetta-sql-pipeline-manager/references/table-stream.md +41 -41
- package/bin/skills/clickzetta-table-stream-pipeline/SKILL.md +103 -101
- package/package.json +1 -1
package/bin/skills/clickzetta-dynamic-table/dt-creator/references/incremental-config-reference.md
CHANGED
|
@@ -1,29 +1,29 @@
|
|
|
1
|
-
# Dynamic Table
|
|
1
|
+
# Dynamic Table Incremental Computation Configuration Reference
|
|
2
2
|
|
|
3
|
-
|
|
3
|
+
This document lists the configuration options available to users for incremental refresh in Dynamic Tables / Materialized Views. All configurations take effect at the Session level via `SET` statements.
|
|
4
4
|
|
|
5
5
|
---
|
|
6
6
|
|
|
7
|
-
##
|
|
7
|
+
## Refresh Strategy
|
|
8
8
|
|
|
9
|
-
|
|
9
|
+
Controls the switching behavior between incremental and full refresh.
|
|
10
10
|
|
|
11
11
|
### `cz.optimizer.incremental.force.full.refresh`
|
|
12
12
|
|
|
13
|
-
-
|
|
13
|
+
- Type: bool, default: `false`
|
|
14
14
|
|
|
15
|
-
|
|
15
|
+
Forces the current refresh to use full mode, skipping incremental logic and doing a full scan and recomputation of all source tables.
|
|
16
16
|
|
|
17
|
-
|
|
18
|
-
-
|
|
19
|
-
-
|
|
20
|
-
- DT
|
|
17
|
+
**Applicable scenarios:**
|
|
18
|
+
- Incremental refresh results show data anomalies (e.g., missing or duplicate data) and a full repair is needed
|
|
19
|
+
- A dimension table has undergone an important change (e.g., a mapping relationship was corrected) and all historical data needs to re-JOIN to the latest dimension
|
|
20
|
+
- The DT's state table was accidentally deleted or corrupted, causing incremental refresh errors, and a full recomputation from scratch is needed
|
|
21
21
|
|
|
22
|
-
|
|
22
|
+
**Advantage:** Full recomputation guarantees results are completely consistent with directly executing the SQL — it is the most reliable data repair method.
|
|
23
23
|
|
|
24
|
-
|
|
24
|
+
**Risk:** Full refresh requires scanning all data in all source tables; the computation volume and time are far greater than incremental refresh. For DTs with large data volumes, a single full refresh may take minutes or even hours.
|
|
25
25
|
|
|
26
|
-
|
|
26
|
+
**Note:** This switch is intended for one-time use, but it persists for the rest of the Session. After the refresh completes, manually reset it to `false`; otherwise every subsequent REFRESH will use full mode, wasting compute resources.
|
|
27
27
|
|
|
28
28
|
```sql
|
|
29
29
|
SET cz.optimizer.incremental.force.full.refresh = true;
|
|
@@ -33,65 +33,65 @@ SET cz.optimizer.incremental.force.full.refresh = false;
|
|
|
33
33
|
|
|
34
34
|
### `cz.optimizer.incremental.try.incremental.refresh.enabled`
|
|
35
35
|
|
|
36
|
-
-
|
|
36
|
+
- Type: bool, default: `false`
|
|
37
37
|
|
|
38
|
-
|
|
38
|
+
Tries incremental refresh first. If incremental plan generation fails (e.g., the SQL contains operators that do not support incremental computation), the refresh automatically falls back to full refresh instead of reporting an error.
|
|
39
39
|
|
|
40
|
-
|
|
41
|
-
-
|
|
42
|
-
-
|
|
40
|
+
**Applicable scenarios:**
|
|
41
|
+
- Just migrated a complex SQL to a DT and unsure whether all operators support incremental computation; want "incremental if possible, full if not"
|
|
42
|
+
- In production, want to ensure refresh tasks do not fail due to incremental plan generation failures
|
|
43
43
|
|
|
44
|
-
|
|
44
|
+
**Advantage:** Improves fault tolerance of DT refresh. Even if the SQL contains patterns not yet supported by the incremental engine, the refresh task will not fail — it automatically degrades to full refresh.
|
|
45
45
|
|
|
46
|
-
|
|
46
|
+
**Risk:** If incremental plan generation fails continuously, every refresh silently falls back to full refresh, and users may not realize their DT always runs full refreshes, wasting compute resources. Monitor the logs for frequent fallbacks.
|
|
47
47
|
|
|
48
48
|
```sql
|
|
49
|
-
--
|
|
49
|
+
-- Execute before the REFRESH statement
|
|
50
50
|
SET cz.optimizer.incremental.try.incremental.refresh.enabled = true;
|
|
51
51
|
REFRESH DYNAMIC TABLE my_dt;
|
|
52
52
|
```
|
|
53
53
|
|
|
54
54
|
---
|
|
55
55
|
|
|
56
|
-
##
|
|
56
|
+
## Source Table Data Characteristic Declarations
|
|
57
57
|
|
|
58
|
-
|
|
58
|
+
Declare source table data characteristics to guide the incremental engine toward more efficient computation strategies.
|
|
59
59
|
|
|
60
60
|
### `cz.optimizer.incremental.dimension.tables`
|
|
61
61
|
|
|
62
|
-
-
|
|
62
|
+
- Type: string, default: `""`
|
|
63
63
|
|
|
64
|
-
|
|
64
|
+
Marks specified source tables as dimension tables. Once marked, the incremental engine no longer reads change data from those tables; instead, it reads their latest full data directly at each refresh. Only changes in non-dimension tables (fact tables) drive incremental computation.
|
|
65
65
|
|
|
66
|
-
|
|
66
|
+
Format: comma- or colon-separated table names; supports full path `instanceId.ws.schema.table` or short names.
|
|
67
67
|
|
|
68
|
-
|
|
69
|
-
-
|
|
70
|
-
-
|
|
71
|
-
-
|
|
72
|
-
-
|
|
68
|
+
**This is a tradeoff of correctness for performance.** Once marked as a dimension table, any data changes (INSERT/UPDATE/DELETE) to that table will not trigger incremental computation, and already-output result rows will not be updated due to dimension table changes. In return, the incremental engine gains significant performance improvements:
|
|
69
|
+
- Skips scanning change data from dimension tables (no need to read change logs)
|
|
70
|
+
- Reduces the number of state tables (no state tables are needed when one side of a JOIN consists entirely of dimension tables)
|
|
71
|
+
- Simplifies the incremental plan (only need to JOIN fact table change data with dimension table full data; no reverse computation needed)
|
|
72
|
+
- Reduces deduplication and merge operations on incremental data
|
|
73
73
|
|
|
74
|
-
|
|
75
|
-
-
|
|
76
|
-
-
|
|
77
|
-
-
|
|
78
|
-
- T+1
|
|
74
|
+
**Applicable scenarios:**
|
|
75
|
+
- Fact table LEFT JOIN lookup/dictionary tables (e.g., region code table, product category table) where the lookup table rarely changes and its changes don't need to be tracked
|
|
76
|
+
- Large fact table JOIN small dimension table where the core goal is incremental performance on the fact table, and brief inconsistency after occasional dimension table changes is acceptable
|
|
77
|
+
- External tables (e.g., MySQL external tables) that don't support time travel and can't provide change data — marking as dimension table enables normal incremental computation
|
|
78
|
+
- T+1 dimension table + real-time fact table: dimension table updates in batch once per day; can be treated as unchanged between two updates
|
|
79
79
|
|
|
80
|
-
|
|
80
|
+
**Correctness impact:** After a dimension table changes, already-output results will not be automatically updated. For example, if a row's `name` changes from `'A'` to `'B'` in the dimension table, historical results that already JOINed that row will still show `'A'`. Only new fact table increments will JOIN to the latest `'B'`. If historical data correction is needed, a full refresh must be manually executed.
|
|
81
81
|
|
|
82
|
-
|
|
82
|
+
For detailed correctness impact analysis and behavior under each JOIN type, see the Dimension Table JOIN Guide (dimension-table-join-guide).
|
|
83
83
|
|
|
84
84
|
```sql
|
|
85
|
-
--
|
|
85
|
+
-- Recommended: declare via DT table properties (follows DT definition; no need to set before each REFRESH)
|
|
86
86
|
CREATE DYNAMIC TABLE my_dt
|
|
87
87
|
TBLPROPERTIES('mv_const_tables' = 'dim_product,dim_region')
|
|
88
88
|
AS SELECT ...;
|
|
89
89
|
|
|
90
|
-
--
|
|
90
|
+
-- Or via Session configuration (set before REFRESH statement)
|
|
91
91
|
SET cz.optimizer.incremental.dimension.tables = 'dim_product,dim_region';
|
|
92
92
|
REFRESH DYNAMIC TABLE my_dt;
|
|
93
93
|
|
|
94
|
-
--
|
|
94
|
+
-- After an important dimension table change, manually trigger a full refresh to correct data
|
|
95
95
|
SET cz.optimizer.incremental.force.full.refresh = true;
|
|
96
96
|
REFRESH DYNAMIC TABLE my_dt;
|
|
97
97
|
SET cz.optimizer.incremental.force.full.refresh = false;
|
|
@@ -99,83 +99,83 @@ SET cz.optimizer.incremental.force.full.refresh = false;
|
|
|
99
99
|
|
|
100
100
|
### `cz.optimizer.incremental.append.only.tables`
|
|
101
101
|
|
|
102
|
-
-
|
|
102
|
+
- Type: string, default: `""`
|
|
103
103
|
|
|
104
|
-
|
|
104
|
+
Marks specified source tables as "expected append-only". This is an optimization hint telling the optimizer that the table is expected to have INSERT operations only, allowing the optimizer to choose a more efficient incremental plan (e.g., creating intermediate state optimized for append-only scenarios in advance).
|
|
105
105
|
|
|
106
|
-
|
|
106
|
+
**This does not affect correctness.** Even if a table marked as append-only later has actual UPDATE or DELETE operations, the incremental engine will still correctly capture and compute those changes — results will not be wrong. The difference is: when actual UPDATE/DELETE occurs, the plan the optimizer chose based on the "append-only" assumption may not be optimal, and performance may be worse than if the table were not marked.
|
|
107
107
|
|
|
108
|
-
|
|
109
|
-
- Kafka
|
|
110
|
-
-
|
|
108
|
+
**Applicable scenarios:**
|
|
109
|
+
- Kafka consumer landing tables, log tables, event tracking tables, and other data sources that have INSERT operations the vast majority of the time
|
|
110
|
+
- Source tables that may occasionally have a small number of UPDATE/DELETE operations (e.g., data corrections), but whose primary write pattern is INSERT
|
|
111
111
|
|
|
112
|
-
|
|
112
|
+
**Advantage:** The optimizer can choose a more efficient incremental plan based on the "append-only" assumption, reducing unnecessary intermediate state maintenance overhead. For aggregation scenarios, it can directly accumulate without maintaining complete intermediate state. Performance improvement is significant, especially in complex SQL with JOINs and aggregations.
|
|
113
113
|
|
|
114
|
-
|
|
114
|
+
**Risk:** If the table actually has frequent UPDATE/DELETE operations, the plan chosen by the optimizer based on the "append-only" assumption may not be optimal, and incremental refresh performance may be worse than without the marking. However, result correctness is not affected.
|
|
115
115
|
|
|
116
116
|
```sql
|
|
117
|
-
--
|
|
117
|
+
-- Recommended: declare via table properties (permanent; no need to set before each refresh)
|
|
118
118
|
ALTER TABLE event_log SET PROPERTIES('INCR_APPEND_ONLY_TABLE' = 'true');
|
|
119
119
|
|
|
120
|
-
--
|
|
120
|
+
-- Or via Session configuration (set before REFRESH statement)
|
|
121
121
|
SET cz.optimizer.incremental.append.only.tables = 'event_log,click_stream';
|
|
122
122
|
REFRESH DYNAMIC TABLE my_dt;
|
|
123
123
|
```
|
|
124
124
|
|
|
125
125
|
---
|
|
126
126
|
|
|
127
|
-
##
|
|
127
|
+
## Full Refresh Fallback Strategy
|
|
128
128
|
|
|
129
|
-
|
|
129
|
+
Automatically switches from incremental to full refresh when source table change volume is too large or specific tables change.
|
|
130
130
|
|
|
131
131
|
### `cz.optimizer.incremental.full.refresh.if.these.tables.change`
|
|
132
132
|
|
|
133
|
-
-
|
|
133
|
+
- Type: string, default: `""`
|
|
134
134
|
|
|
135
|
-
|
|
135
|
+
Comma-separated list of table names. When any table in the list has data changes during the current refresh cycle, a full refresh is automatically triggered.
|
|
136
136
|
|
|
137
|
-
|
|
138
|
-
- DT
|
|
139
|
-
-
|
|
137
|
+
**Applicable scenarios:**
|
|
138
|
+
- The DT's SQL JOINs a critical dimension table (e.g., a price table or exchange rate table) where any change requires all historical data to be recomputed with the new values
|
|
139
|
+
- Difference from `cz.optimizer.incremental.dimension.tables`: `dimension.tables` ignores changes and continues incremental; this config detects changes and triggers full recomputation
|
|
140
140
|
|
|
141
|
-
|
|
141
|
+
**Advantage:** Guarantees correctness after critical table changes — once a change is detected, full recomputation is automatic without manual intervention.
|
|
142
142
|
|
|
143
|
-
|
|
143
|
+
**Risk:** If the specified tables change frequently (e.g., updated every hour), every refresh will trigger a full refresh, completely losing the performance advantage of incremental refresh. Use this config only for tables that change very rarely but whose changes have a wide impact.
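The difference from `cz.optimizer.incremental.dimension.tables` described above can be sketched as two alternative session setups. This is only an illustration: the table names `dim_region` and `dim_pricing` are hypothetical, and the two alternatives are shown separately because the source does not state whether both settings can be combined in one Session.

```sql
-- Alternative A (hypothetical table dim_region): changes to the table are ignored.
-- Refresh stays incremental; already-output rows keep the old dimension values.
SET cz.optimizer.incremental.dimension.tables = 'dim_region';
REFRESH DYNAMIC TABLE my_dt;

-- Alternative B (hypothetical table dim_pricing): changes to the table are detected.
-- Any change in the current refresh cycle switches this refresh to full mode.
SET cz.optimizer.incremental.full.refresh.if.these.tables.change = 'dim_pricing';
REFRESH DYNAMIC TABLE my_dt;
```

Alternative A favors refresh cost over historical correctness, while Alternative B favors correctness and is only reasonable when the listed table changes rarely.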
|
|
144
144
|
|
|
145
145
|
```sql
|
|
146
|
-
--
|
|
146
|
+
-- Execute before the REFRESH statement
|
|
147
147
|
SET cz.optimizer.incremental.full.refresh.if.these.tables.change = 'dim_pricing,dim_exchange_rate';
|
|
148
148
|
REFRESH DYNAMIC TABLE my_dt;
|
|
149
149
|
```
|
|
150
150
|
|
|
151
151
|
### `cz.optimizer.incremental.full.refresh.if.source.table.changes.significantly`
|
|
152
152
|
|
|
153
|
-
-
|
|
153
|
+
- Type: bool, default: `false`
|
|
154
154
|
|
|
155
|
-
|
|
155
|
+
When enabled, automatically switches to full refresh when the ratio of incremental data volume to total data volume in the source table exceeds a threshold.
|
|
156
156
|
|
|
157
|
-
|
|
158
|
-
-
|
|
159
|
-
-
|
|
157
|
+
**Applicable scenarios:**
|
|
158
|
+
- Source table occasionally has large batch data imports (e.g., historical data backfill), where incremental data volume approaches or exceeds full volume, making incremental refresh actually slower than full
|
|
159
|
+
- Want the system to automatically judge "is incremental worth it?" and automatically switch to full when it's not
|
|
160
160
|
|
|
161
|
-
|
|
161
|
+
**Advantage:** Automatically selects the optimal strategy between incremental and full, avoiding the problem where incremental refresh is actually slower when incremental data volume is too large (incremental has additional overhead for change data computation, deduplication merging, state table read/write, etc.).
|
|
162
162
|
|
|
163
|
-
|
|
163
|
+
**Risk:** Threshold judgment is based on statistics and may not be fully accurate. If statistics are imprecise, unnecessary full refreshes may occur, or a switch to full may not happen when it should.
|
|
164
164
|
|
|
165
|
-
|
|
165
|
+
The threshold is set with `cz.optimizer.incremental.threshold.of.source.table.change.for.full.refresh`.
|
|
166
166
|
|
|
167
167
|
### `cz.optimizer.incremental.threshold.of.source.table.change.for.full.refresh`
|
|
168
168
|
|
|
169
|
-
-
|
|
169
|
+
- Type: double, default: `1.0`
|
|
170
170
|
|
|
171
|
-
|
|
171
|
+
The change ratio threshold that triggers a full refresh. When incremental data volume / total data volume exceeds this value, a full refresh is triggered.
|
|
172
172
|
|
|
173
|
-
- `1.0
|
|
174
|
-
- `0.5
|
|
175
|
-
- `0.1
|
|
173
|
+
- `1.0`: triggers only when incremental data exceeds total (very conservative)
|
|
174
|
+
- `0.5`: triggers when incremental exceeds half of total
|
|
175
|
+
- `0.1`: triggers when incremental exceeds 10% of total (aggressive; suitable for complex SQL with high incremental computation overhead)
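As a rough worked example of the ratio rule, assume (hypothetically) that data volume is counted in rows, that a source table holds 10,000,000 rows, and that 1,500,000 of them have changed since the last refresh. The figures and the unit are illustrative assumptions; the source does not specify how volume is measured.

```sql
-- Hypothetical figures: 1,500,000 changed rows out of 10,000,000 total rows.
-- Change ratio = 1,500,000 / 10,000,000 = 0.15
--   threshold 1.0: 0.15 does not exceed 1.0, refresh stays incremental
--   threshold 0.5: 0.15 does not exceed 0.5, refresh stays incremental
--   threshold 0.1: 0.15 exceeds 0.1, refresh switches to full mode
SET cz.optimizer.incremental.full.refresh.if.source.table.changes.significantly = true;
SET cz.optimizer.incremental.threshold.of.source.table.change.for.full.refresh = 0.1;
REFRESH DYNAMIC TABLE my_dt;
```

With the same change volume, only the most aggressive of the three listed values would trigger a full refresh, which matches its intended use for SQL where incremental computation is expensive.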
|
|
176
176
|
|
|
177
177
|
```sql
|
|
178
|
-
--
|
|
178
|
+
-- Execute before the REFRESH statement
|
|
179
179
|
SET cz.optimizer.incremental.full.refresh.if.source.table.changes.significantly = true;
|
|
180
180
|
SET cz.optimizer.incremental.threshold.of.source.table.change.for.full.refresh = 0.5;
|
|
181
181
|
REFRESH DYNAMIC TABLE my_dt;
|
|
@@ -183,82 +183,82 @@ REFRESH DYNAMIC TABLE my_dt;
|
|
|
183
183
|
|
|
184
184
|
---
|
|
185
185
|
|
|
186
|
-
##
|
|
186
|
+
## State Table Management
|
|
187
187
|
|
|
188
|
-
|
|
188
|
+
State tables are internal tables automatically created by the incremental engine during refresh to store intermediate computation results (e.g., intermediate aggregation state, historical JOIN data, etc.) to accelerate subsequent incremental refreshes.
|
|
189
189
|
|
|
190
190
|
### `cz.optimizer.incremental.enable.state.table`
|
|
191
191
|
|
|
192
|
-
-
|
|
192
|
+
- Type: bool, default: `true`
|
|
193
193
|
|
|
194
|
-
|
|
194
|
+
Master switch for state tables. By default, the system limits each DT to 5 state tables so that state tables do not consume excessive disk storage in extreme scenarios. When the DT's SQL contains more than 5 stateful computation operators (e.g., aggregation, JOIN, window functions) and the user has not explicitly enabled this config, the system will **abandon creating all state tables**, and incremental refresh degrades to recomputing intermediate results from source tables each time.
|
|
195
195
|
|
|
196
|
-
|
|
196
|
+
If the user wants to create state tables for these operators to get better incremental refresh performance, this config must be explicitly set to `true`. **Explicitly enabling this config means the user understands and accepts the tradeoff of additional disk storage for better incremental refresh performance.**
|
|
197
197
|
|
|
198
|
-
|
|
198
|
+
When set to `false`, the incremental engine does not create or reuse any state tables; all intermediate results are recomputed from source tables each time.
|
|
199
199
|
|
|
200
|
-
|
|
200
|
+
**Applicable scenarios:**
|
|
201
201
|
|
|
202
|
-
|
|
203
|
-
- DT
|
|
204
|
-
-
|
|
202
|
+
Set to `true` (explicitly enable):
|
|
203
|
+
- The DT's SQL contains many stateful operators (e.g., multi-level JOIN + aggregation + window functions), and the default 5 state table limit is insufficient to cover all operators; want to create more state tables for optimal incremental performance
|
|
204
|
+
- User has evaluated storage overhead and confirmed the additional state table storage is acceptable
|
|
205
205
|
|
|
206
|
-
|
|
207
|
-
-
|
|
208
|
-
-
|
|
209
|
-
-
|
|
206
|
+
Set to `false` (disable):
|
|
207
|
+
- Troubleshooting state table related issues (e.g., suspecting state table data inconsistency is causing incremental result anomalies)
|
|
208
|
+
- Source table data volume is small; the cost of full recomputation is acceptable; state tables are not needed for acceleration
|
|
209
|
+
- Need to strictly control storage overhead; don't want the system to automatically create additional tables
|
|
210
210
|
|
|
211
|
-
|
|
211
|
+
**Advantage:** When explicitly enabled, the system can create state tables for all stateful operators, maximizing incremental refresh performance gains. When disabled, all storage overhead from state tables is eliminated.
|
|
212
212
|
|
|
213
|
-
|
|
213
|
+
**Risk:** When explicitly enabled, the number of state tables may exceed the default limit of 5, bringing additional disk storage overhead. When disabled, complex DTs with aggregation or multi-table JOINs need to read full data from source tables to recompute intermediate results on every incremental refresh, which may significantly degrade performance.
|
|
214
214
|
|
|
215
215
|
```sql
|
|
216
|
-
--
|
|
216
|
+
-- Explicitly enable: allow the system to create state tables for all stateful operators (execute before REFRESH)
|
|
217
217
|
SET cz.optimizer.incremental.enable.state.table = true;
|
|
218
218
|
REFRESH DYNAMIC TABLE my_dt;
|
|
219
219
|
|
|
220
|
-
--
|
|
220
|
+
-- Disable: do not create or reuse any state tables
|
|
221
221
|
SET cz.optimizer.incremental.enable.state.table = false;
|
|
222
222
|
REFRESH DYNAMIC TABLE my_dt;
|
|
223
223
|
```
|
|
224
224
|
|
|
225
225
|
### `cz.optimizer.incremental.state.table.lifecycle`
|
|
226
226
|
|
|
227
|
-
-
|
|
227
|
+
- Type: string, default: `"3"`
|
|
228
228
|
|
|
229
|
-
|
|
229
|
+
Number of days to retain state table data. Historical version data older than this number of days will be automatically cleaned up.
|
|
230
230
|
|
|
231
|
-
|
|
232
|
-
- DT
|
|
233
|
-
-
|
|
234
|
-
-
|
|
231
|
+
**Applicable scenarios:**
|
|
232
|
+
- The DT's refresh interval is long (e.g., once per week), and the default 3 days will cause state tables to be cleaned up between two refreshes, making them unusable for the next refresh and degrading to full refresh. In this case, increase this value.
|
|
233
|
+
- Want to reduce state table storage overhead; can shorten the retention period appropriately (but not shorter than the refresh interval)
|
|
234
|
+
- State table content is very large; want to reclaim storage space promptly; can explicitly shorten the lifecycle (e.g., set to 1 day) to let expired versions be cleaned up sooner
|
|
235
235
|
|
|
236
|
-
|
|
236
|
+
**Advantage:** Increasing the retention period ensures state tables are not cleaned up within the refresh interval, guaranteeing incremental refresh can normally reuse state.
|
|
237
237
|
|
|
238
|
-
|
|
238
|
+
**Risk:** The longer the retention period, the more storage space state tables occupy. Each version of a state table is retained until expiry; if refresh frequency is high (e.g., hourly) and the retention period is long (e.g., 30 days), state table storage can become substantial.
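For the weekly-refresh scenario mentioned above, a minimal sketch might look like the following. The value `'8'` is an illustrative assumption, chosen only because it is longer than a 7-day refresh interval; it is not a recommended setting.

```sql
-- Assumption: my_dt is refreshed once per week (every 7 days).
-- The default lifecycle of 3 days would let state tables expire between refreshes,
-- so the retention period is raised above the refresh interval.
SET cz.optimizer.incremental.state.table.lifecycle = '8';
REFRESH DYNAMIC TABLE my_dt;
```

Keeping the value only slightly above the refresh interval limits the extra storage that longer retention brings.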
|
|
239
239
|
|
|
240
240
|
```sql
|
|
241
|
-
--
|
|
241
|
+
-- Execute before the REFRESH statement
|
|
242
242
|
SET cz.optimizer.incremental.state.table.lifecycle = '10';
|
|
243
243
|
REFRESH DYNAMIC TABLE my_dt;
|
|
244
244
|
```
|
|
245
245
|
|
|
246
246
|
### `cz.optimizer.incremental.rebuild.rule.based.state.table`
|
|
247
247
|
|
|
248
|
-
-
|
|
248
|
+
- Type: bool, default: `false`
|
|
249
249
|
|
|
250
|
-
|
|
250
|
+
When set to `true`, rebuilds all state tables on the next refresh. The rebuild process clears old state table data and regenerates it based on current source table data.
|
|
251
251
|
|
|
252
|
-
|
|
253
|
-
-
|
|
254
|
-
- DT
|
|
255
|
-
-
|
|
252
|
+
**Applicable scenarios:**
|
|
253
|
+
- State table data is corrupted (e.g., incomplete state table writes due to system anomalies), causing incremental refresh result anomalies
|
|
254
|
+
- The DT's SQL has changed (e.g., aggregation logic was modified), and the old state table schema doesn't match the new SQL
|
|
255
|
+
- Incremental refresh keeps reporting errors; suspecting a state table issue; want to rebuild from scratch
|
|
256
256
|
|
|
257
|
-
|
|
257
|
+
**Advantage:** After rebuilding, state table data is fully consistent with the current source table, eliminating historical accumulated data inconsistencies.
|
|
258
258
|
|
|
259
|
-
|
|
259
|
+
**Risk:** The rebuild process causes that refresh to use full mode, which takes longer. Incremental refresh is unavailable until the rebuild is complete.
|
|
260
260
|
|
|
261
|
-
|
|
261
|
+
**Note:** This is a one-time switch. After the rebuild is complete, it must be reset to `false`; otherwise every refresh will rebuild state tables, completely defeating the purpose of incremental refresh.
|
|
262
262
|
|
|
263
263
|
```sql
|
|
264
264
|
SET cz.optimizer.incremental.rebuild.rule.based.state.table = true;
|
|
@@ -268,17 +268,17 @@ SET cz.optimizer.incremental.rebuild.rule.based.state.table = false;
|
|
|
268
268
|
|
|
269
269
|
### `cz.optimizer.incremental.state.table.specified.schema`
|
|
270
270
|
|
|
271
|
-
-
|
|
271
|
+
- Type: string, default: `""`
|
|
272
272
|
|
|
273
|
-
|
|
273
|
+
Specifies the Schema where state tables are stored. By default, state tables are in the same Schema as the DT target table.
|
|
274
274
|
|
|
275
|
-
|
|
276
|
-
-
|
|
277
|
-
-
|
|
275
|
+
**Applicable scenarios:**
|
|
276
|
+
- Want to isolate state tables from business tables for unified management and monitoring of state table storage overhead
|
|
277
|
+
- Multiple DTs share the same Schema for state tables, making batch cleanup easier
|
|
278
278
|
|
|
279
|
-
|
|
279
|
+
**Advantage:** After separating business tables and state tables, Schema-level permissions, quotas, and lifecycle policies can be set independently, preventing state tables from interfering with business table management.
|
|
280
280
|
|
|
281
|
-
|
|
281
|
+
**Risk:** Cross-Schema access may bring slight metadata query overhead. Additionally, if the specified Schema does not exist or permissions are insufficient, state table creation will fail.
|
|
282
282
|
|
|
283
283
|
```sql
|
|
284
284
|
SET cz.optimizer.incremental.state.table.specified.schema = 'incr_state';
|
|
@@ -286,76 +286,76 @@ SET cz.optimizer.incremental.state.table.specified.schema = 'incr_state';
|
|
|
286
286
|
|
|
287
287
|
---
|
|
288
288
|
|
|
289
|
-
## DT
|
|
289
|
+
## DT Definition Changes
|
|
290
290
|
|
|
291
|
-
|
|
291
|
+
Controls compatibility check behavior when executing `CREATE OR REPLACE DYNAMIC TABLE`.
|
|
292
292
|
|
|
293
293
|
### `cz.sql.mv.check.before.replacing.sql`
|
|
294
294
|
|
|
295
|
-
-
|
|
295
|
+
- Type: bool, default: `true`
|
|
296
296
|
|
|
297
|
-
|
|
297
|
+
Controls whether a compatibility check is performed on the old and new SQL when executing `CREATE OR REPLACE DYNAMIC TABLE`.
|
|
298
298
|
|
|
299
|
-
|
|
299
|
+
**Check enabled (`true`, default):** The system compares the column structure of the old and new SQL to determine compatibility. If judged compatible (e.g., only adding columns), the system retains existing incremental state and continues incremental refresh afterward. However, compatibility judgment is not perfect — for changes judged as "compatible", newly added columns will be filled with NULL in historical data, and existing historical rows will not be recomputed according to the new SQL, which may cause inconsistency between old and new data.
|
|
300
300
|
|
|
301
|
-
|
|
301
|
+
**Check disabled (`false`):** The system skips the compatibility check and directly treats the old and new SQL as incompatible, resetting incremental state (clearing state tables and historical version information). The next refresh after replacement will execute full computation, ensuring all data is regenerated according to the new SQL.
|
|
302
302
|
|
|
303
|
-
|
|
303
|
+
**Applicable scenarios:**
|
|
304
304
|
|
|
305
|
-
|
|
306
|
-
1. **`CREATE OR REPLACE`
|
|
307
|
-
2. **SQL
|
|
305
|
+
Set to `false` (disable check):
|
|
306
|
+
1. **`CREATE OR REPLACE` is stuck or reports an error**: In some cases, the compatibility check itself may take a long time or report an error due to metadata issues, preventing `CREATE OR REPLACE` from completing. Disabling the check skips this step and allows the replacement to complete smoothly. The tradeoff is that the next refresh will be a full refresh.
|
|
307
|
+
2. **SQL has undergone substantive changes and a full recomputation is desired**: E.g., JOIN logic or aggregation method was modified. Disabling the check ensures the system doesn't incorrectly judge it as "compatible" and retain old incremental state.
|
|
308
308
|
|
|
309
|
-
|
|
310
|
-
1.
|
|
311
|
-
2.
|
|
309
|
+
Keep `true` (enable check, default):
|
|
310
|
+
1. **Simple changes like adding columns only**: Want the system to automatically judge compatibility; retain incremental state when compatible to avoid full refresh. Suitable for scenarios where NULL in new columns for historical data is acceptable.
|
|
311
|
+
2. **Frequent DT definition adjustments during daily iteration**: Rely on automatic system judgment to reduce unnecessary full refreshes.
|
|
312
312
|
|
|
313
|
-
|
|
313
|
+
**Risk of enabling check:** Compatibility judgment may classify actually incompatible changes as "compatible", causing new columns to be NULL in historical data, or existing historical rows to never be updated according to the new SQL.
|
|
314
314
|
|
|
315
|
-
|
|
315
|
+
**Risk of disabling check:** The next refresh will execute full computation, which may take a long time for DTs with large data volumes.
 
 ```sql
---
+-- Disable check to ensure full recomputation after replacement
 SET cz.sql.mv.check.before.replacing.sql = false;
 CREATE OR REPLACE DYNAMIC TABLE my_dt AS SELECT ...;
 SET cz.sql.mv.check.before.replacing.sql = true;
---
+-- Note: the next REFRESH will execute a full refresh
 ```
 
 ---
 
-##
+## Historical Partition Backfill
 
 ### `cz.optimizer.incremental.backfill.enabled`
 
--
+- Type: bool, default: `false`
 
-
--
--
--
--
+Enables backfill mode. Used to backfill or correct data in historical partitions of a DT. When enabled, the system automatically performs the following:
+- Forces the current refresh to use full mode (equivalent to enabling `force.full.refresh`)
+- Skips reading incremental data to avoid reading large amounts of historical change logs
+- For partitioned DTs, disables state table creation and matching (backfilled partitions don't need incremental state)
+- Allows DML operations on the DT (e.g., `INSERT OVERWRITE`)
 
-
+**Applicable scenarios:**
 
-
-1.
-2.
-3.
+Set to `true` (enable backfill):
+1. **Historical partition data correction**: A historical partition's data has issues and needs to be regenerated from correct source data.
+2. **Supplement historical data after creating a new DT**: After a DT is created, historical partitions need to have data generated one by one.
+3. **Recompute after source table data backfill**: The source table had historical data backfilled; affected partitions need to be recomputed.
 
-
--
--
--
+**Notes:**
+- Backfill mode is a one-time operation; after backfill is complete, reset to `false`; otherwise every subsequent refresh will use full mode.
+- Backfill mode does not create or update state tables, so it does not affect subsequent normal incremental refresh state.
+- Backfill is typically used with `INSERT OVERWRITE` to overwrite existing data in the target partition.
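
For the second scenario listed above (generating historical partitions one by one after creating a new DT), the sequence might look like the following minimal sketch. It assumes that `SET` values stay in effect for the session until changed, and that `my_dt` is partitioned by a `ds` column driven by the `dt.args.ds` parameter, as in this document's own examples; the two dates are hypothetical.

```sql
-- Sketch only: backfill two historical partitions in sequence.
-- Assumption: my_dt is partitioned by ds and its SQL reads the dt.args.ds parameter.
SET cz.optimizer.incremental.backfill.enabled = true;

SET dt.args.ds = '2025-01-01';
REFRESH DYNAMIC TABLE my_dt PARTITION(ds = '2025-01-01');

SET dt.args.ds = '2025-01-02';
REFRESH DYNAMIC TABLE my_dt PARTITION(ds = '2025-01-02');

-- Reset once all partitions are done; otherwise later refreshes stay in full mode.
SET cz.optimizer.incremental.backfill.enabled = false;
```

Enabling the flag once and resetting it at the end keeps the one-time nature of backfill mode explicit, whatever the number of partitions.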
 
 ```sql
---
+-- Backfill a specified historical partition (execute before REFRESH statement)
 SET cz.optimizer.incremental.backfill.enabled = true;
 SET dt.args.ds = '2025-01-01';
 REFRESH DYNAMIC TABLE my_dt PARTITION(ds = '2025-01-01');
 SET cz.optimizer.incremental.backfill.enabled = false;
 
---
+-- Can also backfill directly via INSERT OVERWRITE
 SET cz.optimizer.incremental.backfill.enabled = true;
 INSERT OVERWRITE TABLE my_dt
 SELECT id, amount, '2025-01-01' AS ds
@@ -366,37 +366,37 @@ SET cz.optimizer.incremental.backfill.enabled = false;
 
 ---
 
-##
+## Write Behavior for Partitioned Tables During Full Refresh
 
 ### `cz.optimizer.incremental.full.refresh.overwrite.partitioned.table`
 
--
+- Type: bool, default: `true`
 
-
+Controls the write mode for partitioned DTs during full refresh.
 
-
+**Background:** For partitioned tables, full refresh (`force.full.refresh = true` or system-triggered full refresh) defaults to overwrite (OVERWRITE) mode. This is standard behavior in big data: full recomputation results overwrite all partitions of the target table. However, in some scenarios, the DT's SQL only computes data for some partitions (e.g., only the last 7 days), and the full refresh result also only includes those partitions. In this case, using overwrite would cause historical partitions (e.g., data older than 7 days) to be cleared.
 
-
+**Overwrite enabled (`true`, default):** During full refresh, all partitions of the target table are overwritten. Partitions not included in the refresh result will be cleared. This is suitable for scenarios where the DT's SQL covers the entire data range of the target table.
 
-
+**Overwrite disabled (`false`):** During full refresh, only the partition data produced by the current computation is written; other existing partitions in the target table are not affected. Historical partition data remains unchanged.
 
-
+**Applicable scenarios:**
 
-
-1. **DT
-2.
-3.
+Set to `false` (disable overwrite):
+1. **DT's SQL only computes some partitions**: E.g., the SQL has a `WHERE ds >= '2025-01-01'` filter condition and only computes recent data. Don't want to clear earlier historical partitions during full refresh.
+2. **DT that accumulates data partition by partition**: Each refresh only produces data for the current partition; historical partitions were produced by previous refreshes. Full refresh should only recompute the current partition without affecting historical partitions.
+3. **Sliding window scenarios**: The DT's SQL computes data within a time window based on partition parameters; full refresh only recomputes partitions within the window.
 
-
-1. **DT
-2.
+Keep `true` (enable overwrite, default):
+1. **DT's SQL covers all data**: SQL has no partition filter conditions; full refresh result includes all data in the target table.
+2. **Need to fully rebuild the target table**: Want the target table data after full refresh to be completely consistent with directly executing the SQL, without retaining any historical residuals.
 
-
--
--
+**Risks:**
+- With overwrite enabled, if the DT's SQL only computes some partitions, full refresh will clear historical partitions not covered by the computation, causing data loss.
+- With overwrite disabled, if the DT's SQL covers all data, old data may remain in the target table after full refresh (because old partitions were not cleared), causing data inconsistency.
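
One way to check which of the two risks above applies is to compare per-partition row counts before and after a full refresh. This is a minimal sketch, assuming `my_dt` is partitioned by a column named `ds` (a hypothetical name borrowed from the backfill example); it uses only a plain aggregate query and no ClickZetta-specific features.

```sql
-- Run once before and once after the full refresh, then compare the two results.
-- Assumption: ds is the partition column of my_dt.
SELECT ds, COUNT(*) AS row_count
FROM my_dt
GROUP BY ds
ORDER BY ds;
```

With overwrite disabled, partitions outside the range computed by the DT's SQL should show the same counts in both runs. With overwrite enabled, those partitions would be expected to disappear from the second result.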
 
 ```sql
---
+-- Disable overwrite: retain historical partitions during full refresh (execute before REFRESH statement)
 SET cz.optimizer.incremental.full.refresh.overwrite.partitioned.table = false;
 SET cz.optimizer.incremental.force.full.refresh = true;
 REFRESH DYNAMIC TABLE my_dt;
@@ -405,23 +405,23 @@ SET cz.optimizer.incremental.force.full.refresh = false;
 
 ---
 
-##
+## Configuration Quick Reference
 
-
+Quickly locate the required configuration by use case:
 
-|
+| Scenario | Configuration | Recommended value |
 |------|--------|--------|
-|
-|
-|
-|
-|
-|
-| SQL
-|
-|
-|
-|
-| `CREATE OR REPLACE`
-|
-|
+| Data anomaly; need full recomputation for repair | `cz.optimizer.incremental.force.full.refresh` | `true` (one-time) |
+| Unsure if SQL supports incremental | `cz.optimizer.incremental.try.incremental.refresh.enabled` | `true` |
+| Small table JOIN; no need to track changes | `cz.optimizer.incremental.dimension.tables` or table property `mv_const_tables` | Table name list |
+| Source table is mainly INSERT; want to optimize incremental performance | `cz.optimizer.incremental.append.only.tables` or table property `INCR_APPEND_ONLY_TABLE` | Table name list / `true` |
+| Must do full recomputation when critical table changes | `cz.optimizer.incremental.full.refresh.if.these.tables.change` | Table name list |
+| Auto-switch to full when incremental data volume is too large | `cz.optimizer.incremental.full.refresh.if.source.table.changes.significantly` + `threshold` | `true` + `0.5` |
+| Many SQL operators; need more state tables for acceleration | `cz.optimizer.incremental.enable.state.table` | `true` (explicitly enable) |
+| No state tables needed, or troubleshooting state table issues | `cz.optimizer.incremental.enable.state.table` | `false` |
+| State table data corrupted; need to rebuild | `cz.optimizer.incremental.rebuild.rule.based.state.table` | `true` (one-time) |
+| Long refresh interval; state tables cleaned up prematurely | `cz.optimizer.incremental.state.table.lifecycle` | Increase to cover refresh interval |
+| Isolate state tables from business tables | `cz.optimizer.incremental.state.table.specified.schema` | Schema name |
+| `CREATE OR REPLACE` is stuck or SQL has substantive changes | `cz.sql.mv.check.before.replacing.sql` | `false` (one-time) |
+| Historical partition data backfill or correction | `cz.optimizer.incremental.backfill.enabled` | `true` (one-time) |
+| Retain historical partition data during full refresh | `cz.optimizer.incremental.full.refresh.overwrite.partitioned.table` | `false` |
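
The rows marked "(one-time)" follow the same set, run, reset pattern that the examples in this file use. A minimal sketch for the first row, assuming the setting applies to the current session until changed and that `my_dt` is a placeholder table name:

```sql
-- Force a single full recomputation, then switch the flag back off
-- so that later refreshes are not forced into full mode.
SET cz.optimizer.incremental.force.full.refresh = true;
REFRESH DYNAMIC TABLE my_dt;
SET cz.optimizer.incremental.force.full.refresh = false;
```

Resetting the flag right after the `REFRESH` keeps the override from leaking into subsequent scheduled refreshes in the same session.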