@clickzetta/cz-cli-darwin-x64 0.3.87 → 0.3.88
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/bin/cz-cli +0 -0
- package/bin/skills/clickzetta-dynamic-table/SKILL.md +169 -169
- package/bin/skills/clickzetta-dynamic-table/best-practices/dimension-table-join-guide.md +126 -126
- package/bin/skills/clickzetta-dynamic-table/best-practices/medallion-and-stream-patterns.md +25 -25
- package/bin/skills/clickzetta-dynamic-table/best-practices/non-partitioned-merge-into-warning.md +48 -48
- package/bin/skills/clickzetta-dynamic-table/best-practices/performance-optimization.md +51 -51
- package/bin/skills/clickzetta-dynamic-table/best-practices/scheduling-guide.md +59 -59
- package/bin/skills/clickzetta-dynamic-table/dt-creator/SKILL.md +8 -7
- package/bin/skills/clickzetta-dynamic-table/dt-creator/references/dt-declaration-strategy.md +99 -99
- package/bin/skills/clickzetta-dynamic-table/dt-creator/references/incremental-config-reference.md +188 -188
- package/bin/skills/clickzetta-dynamic-table/dt-creator/references/refresh-history-guide.md +117 -117
- package/bin/skills/clickzetta-dynamic-table/dt-creator/references/sql-limitations.md +29 -29
- package/bin/skills/clickzetta-dynamic-table/dynamic-table-alter/SKILL.md +80 -79
- package/bin/skills/clickzetta-dynamic-table/sql-to-dt/SKILL.md +15 -15
- package/bin/skills/clickzetta-dynamic-table/sql-to-dt/references/sql2dt-column-validation-rules.md +61 -61
- package/bin/skills/clickzetta-dynamic-table/sql-to-dt/references/sql2dt-conversion-rules.md +100 -100
- package/bin/skills/clickzetta-dynamic-table/sql-to-dt/references/sql2dt-placeholder-rules.md +64 -64
- package/bin/skills/clickzetta-dynamic-table/sql-to-dt/references/sql2dt-refresh-rules.md +32 -32
- package/bin/skills/clickzetta-dynamic-table/sql-to-dt/references/sql2dt-self-reference-rules.md +21 -21
- package/bin/skills/clickzetta-dynamic-table/sql-to-dt/references/sql2dt-workflow.md +71 -71
- package/bin/skills/clickzetta-sql-pipeline-manager/SKILL.md +203 -202
- package/bin/skills/clickzetta-sql-pipeline-manager/references/dynamic-table.md +62 -62
- package/bin/skills/clickzetta-sql-pipeline-manager/references/materialized-view.md +34 -34
- package/bin/skills/clickzetta-sql-pipeline-manager/references/pipe.md +61 -61
- package/bin/skills/clickzetta-sql-pipeline-manager/references/table-stream.md +41 -41
- package/bin/skills/clickzetta-table-stream-pipeline/SKILL.md +103 -101
- package/package.json +1 -1
package/bin/skills/clickzetta-dynamic-table/best-practices/non-partitioned-merge-into-warning.md
CHANGED
@@ -1,53 +1,53 @@
-#
+# Non-partitioned Table + Continuous Writes: DT Risk Alert and MERGE INTO Alternative
 
-##
+## Trigger Conditions
 
-
+When the DT the user is about to create simultaneously meets all of the following conditions, **an alert must be issued to the user**:
 
-1. DT
-2.
-3. SQL
+1. The DT itself is a non-partitioned table (no `PARTITIONED BY` and no `SESSION_CONFIGS()` references)
+2. The source table is also a non-partitioned table with continuous writes (e.g., a Kafka consumer landing table, a CDC detail table)
+3. The SQL contains a window function deduplication pattern by primary key: `ROW_NUMBER() OVER (PARTITION BY key ORDER BY ts DESC) WHERE rn = 1`
 
-##
+## Alert Content
 
-
+Explain the following three risks to the user:
 
-###
+### Risk 1: Unbounded Storage Growth
 
-
--
-- DT
--
--
+Non-partitioned DTs and non-partitioned source tables both lack automatic data lifecycle management (`data_lifecycle` only works for partitioned tables). As data continues to be written:
+- Source table data grows without bound
+- DT state tables are maintained globally and grow linearly with data volume
+- Target table data grows in sync
+- All three combined result in continuously rising and uncontrollable storage costs
 
-###
+### Risk 2: Source Table Archiving Causes Performance Disaster
 
-
+When storage grows to a certain point, operations teams typically archive the source table — migrating historical data to cold storage or an archive table, then deleting it from the source table to free space. At this point:
 
-- DT
-- `ROW_NUMBER() OVER (PARTITION BY key ORDER BY ts DESC) WHERE rn = 1`
--
--
--
--
+- The DT captures the source table's delete events and reflects them in incremental computation results
+- The `ROW_NUMBER() OVER (PARTITION BY key ORDER BY ts DESC) WHERE rn = 1` delete handling cost is extremely high:
+- Window functions cannot handle deletes incrementally — they need to re-read all historical data for that key and re-sort
+- Non-partitioned tables have no partition boundaries to limit the re-read scope; may need to scan the entire table
+- Large-scale archiving produces massive delete changes; each key needs to be independently recomputed
+- A single source table archiving operation can cause DT REFRESH duration to spike from seconds to hours, or even fail
 
-###
+### Risk 3: Cannot Filter Archive-generated Delete Events
 
-DT
+The DT's incremental engine automatically captures all changes to the source table (INSERT / UPDATE / DELETE); users cannot intervene in this process. `WHERE op <> 'DELETE'` in the SQL filters business-level delete markers, not physical deletes from the source table. Users have no way to tell the DT "these deletes are archiving operations; please ignore them."
 
-##
+## Recommended Alternative
 
-
+Suggest the user use MERGE INTO + Table Stream instead:
 
 ```sql
--- Step 1:
+-- Step 1: Enable change tracking on source table
 ALTER TABLE source_table SET PROPERTIES ('change_tracking' = 'true');
 
--- Step 2:
+-- Step 2: Create Table Stream
 CREATE TABLE STREAM source_stream ON TABLE source_table
 WITH (TABLE_STREAM_MODE = 'STANDARD', SHOW_INITIAL_ROWS = TRUE);
 
--- Step 3:
+-- Step 3: Create target table
 CREATE TABLE target_table (
 id BIGINT,
 col1 STRING,
@@ -55,7 +55,7 @@ CREATE TABLE target_table (
 event_time TIMESTAMP
 );
 
--- Step 4:
+-- Step 4: Scheduled MERGE INTO to consume Stream
 MERGE INTO target_table t
 USING (
 SELECT id, col1, col2, event_time,
@@ -68,29 +68,29 @@ WHEN NOT MATCHED AND s.op = 'UPSERT' THEN INSERT
 (id, col1, col2, event_time) VALUES (s.id, s.col1, s.col2, s.event_time);
 ```
 
-MERGE INTO + Table Stream
--
--
--
-- **
+Advantages of MERGE INTO + Table Stream:
+- **Each computation is independent**: only consumes incremental data from the Stream; does not depend on the source table's full state
+- **Archive-immune**: when the source table is archived, archive-generated delete events can be filtered via WHERE conditions in the USING subquery
+- **Independent target table management**: the target table's lifecycle is decoupled from the source table; independent archiving strategies can be set
+- **Offset auto-advances**: after MERGE INTO consumes the Stream, the offset automatically advances; only new changes are processed next time
 
-##
+## Alert Message Template
 
-
+When the user's DT is detected to meet the trigger conditions, use the following message:
 
-> ⚠️
+> ⚠️ **Risk Warning**: You are creating a non-partitioned Dynamic Table, and the source table is also a non-partitioned table with continuous writes. This combination has the following long-term operational risks:
 >
-> 1.
-> 2.
-> 3.
+> 1. **Unbounded storage growth**: the source table, DT target table, and DT state tables will all grow continuously and cannot be automatically cleaned up via `data_lifecycle`
+> 2. **Source table archiving will cause a performance disaster**: when you need to archive the source table (migrate historical data then delete), the DT will capture these delete events. Because the SQL contains `ROW_NUMBER() ... WHERE rn = 1` deduplication logic, each deleted key needs to re-read historical data and re-sort; non-partitioned tables have no boundary limits, which may cause serious REFRESH performance regression
+> 3. **Cannot filter archive deletes**: the DT incremental engine automatically captures all source table changes; you cannot tell the DT to ignore deletes generated by archiving operations
 >
->
+> **Recommendation**: For this type of "merge non-partitioned CDC detail table into result table" scenario, the MERGE INTO + Table Stream approach is recommended. Each run only consumes incremental data; archive-generated delete events can be filtered via WHERE conditions, without affecting downstream.
 
-##
+## Detection Logic
 
-
+When helping users create a DT, check in the following order:
 
-1. DT
-2.
-3. SQL
-4.
+1. Does the DT have `PARTITIONED BY` or `SESSION_CONFIGS()`? → If yes, do not trigger alert
+2. Is the source table a non-partitioned table with continuous writes (e.g., Kafka consumer table, CDC detail table)? → If not, do not trigger alert
+3. Does the SQL contain the `ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ... DESC) WHERE rn = 1` pattern? → If yes, highest risk; must alert
+4. Even without ROW_NUMBER, if conditions 1+2 are met, remind the user of the storage growth risk
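The four-step check in the Detection Logic section can be expressed as a short decision function. The sketch below is a minimal Python illustration, not code shipped in the package. `DtFacts`, `check_dt`, and the three result strings are hypothetical names. The sketch assumes the caller already knows whether the source table is partitioned and continuously written. The `ROW_NUMBER` match is a rough regular-expression heuristic; a real checker would inspect the parsed query instead.

```python
import re
from dataclasses import dataclass


# Hypothetical facts gathered before creating the DT.
@dataclass
class DtFacts:
    dt_sql: str                         # full CREATE DYNAMIC TABLE statement
    source_is_partitioned: bool         # is the source table partitioned?
    source_has_continuous_writes: bool  # e.g. Kafka landing table, CDC detail table


# Step 1 markers: an explicit PARTITIONED BY clause or a SESSION_CONFIGS() reference.
_PARTITIONED = re.compile(r"\bPARTITIONED\s+BY\b|\bSESSION_CONFIGS\s*\(", re.IGNORECASE)

# Rough heuristic for ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ... DESC) ... rn = 1.
_DEDUP = re.compile(
    r"ROW_NUMBER\s*\(\s*\)\s*OVER\s*\(\s*PARTITION\s+BY\s+.+?ORDER\s+BY\s+.+?\bDESC\b.*?\)"
    r".*?\brn\s*=\s*1\b",
    re.IGNORECASE | re.DOTALL,
)


def check_dt(facts: DtFacts) -> str:
    """Return 'no_alert', 'storage_reminder', or 'full_alert'."""
    # Step 1: a partitioned DT never triggers the alert.
    if _PARTITIONED.search(facts.dt_sql):
        return "no_alert"
    # Step 2: the source must be non-partitioned and continuously written.
    if facts.source_is_partitioned or not facts.source_has_continuous_writes:
        return "no_alert"
    # Step 3: the dedup-by-key window pattern is the highest-risk case.
    if _DEDUP.search(facts.dt_sql):
        return "full_alert"
    # Step 4: conditions 1 and 2 alone still warrant a storage-growth reminder.
    return "storage_reminder"
```

The checks run in the same order as the guide. The cheap structural tests in steps 1 and 2 therefore short-circuit before any pattern matching on the SQL text.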
package/bin/skills/clickzetta-dynamic-table/best-practices/performance-optimization.md
CHANGED
@@ -1,109 +1,109 @@
-# Dynamic Table
+# Dynamic Table Performance Optimization Guide
 
-
+This document helps users write DTs with better incremental refresh performance from three dimensions: SQL writing, data characteristics, and pipeline design.
 
-##
+## Core Principle: The Cost Model of Incremental Refresh
 
-
-1.
-2.
-3.
+Incremental refresh performance depends on three factors:
+1. **Change ratio**: how much data in the source table changed during each refresh. The smaller the change volume, the more worthwhile incremental is.
+2. **Operator type**: different SQL operators have very different incremental costs.
+3. **Data locality**: whether the changed data is concentrated on JOIN keys / GROUP BY keys / PARTITION BY keys.
 
-
+When the change volume exceeds a significant proportion of total data, incremental refresh may actually be slower than full refresh, because incremental has additional overhead for change data computation, deduplication merging, state table read/write, etc.
 
-## SQL
+## SQL Writing Optimization
 
-### 1.
+### 1. Prefer INNER JOIN over OUTER JOIN
 
-INNER JOIN
-- INNER JOIN
-- LEFT/RIGHT/FULL OUTER JOIN
+INNER JOIN incremental computation is more efficient than OUTER JOIN:
+- INNER JOIN: only needs to compute A's change data JOIN B's full data + A's full data JOIN B's change data
+- LEFT/RIGHT/FULL OUTER JOIN: also needs to handle NULL filling, reverse retraction, and other logic
 
-
+If business logic can guarantee referential integrity (i.e., JOIN keys will always match), prefer INNER JOIN.
 
 ```sql
--- ❌
+-- ❌ Unnecessary LEFT JOIN (if product is guaranteed to exist)
 SELECT o.*, p.name FROM orders o LEFT JOIN products p ON o.pid = p.id;
 
--- ✅
+-- ✅ Switch to INNER JOIN
 SELECT o.*, p.name FROM orders o INNER JOIN products p ON o.pid = p.id;
 ```
 
-### 2.
+### 2. Reduce Unnecessary DISTINCT
 
-
+On every incremental refresh, DISTINCT needs to recompute affected keys. If upstream data is already deduplicated, or uniqueness can be guaranteed another way, remove DISTINCT.
 
 ```sql
--- ❌
+-- ❌ Redundant DISTINCT
 SELECT DISTINCT user_id, user_name FROM user_events;
 ```
 
-### 3.
+### 3. Window Functions Must Have PARTITION BY
 
-
+Window functions without PARTITION BY cause every incremental refresh to fully recompute the entire window. With PARTITION BY, only affected partitions need to be recomputed.
 
 ```sql
--- ❌
+-- ❌ Global window; every incremental refresh does a full recomputation
 SELECT *, ROW_NUMBER() OVER (ORDER BY created_at DESC) AS rn FROM events;
 
--- ✅
+-- ✅ Add PARTITION BY; only recompute partitions with changes
 SELECT *, ROW_NUMBER() OVER (PARTITION BY category ORDER BY created_at DESC) AS rn FROM events;
 ```
 
-### 4.
+### 4. Use Simple Column References as Aggregation Keys
 
-
+Compound expressions as GROUP BY keys reduce incremental efficiency, because the engine needs to evaluate the expression before determining which keys are affected.
 
 ```sql
--- ❌
+-- ❌ Compound expression as GROUP BY key
 SELECT DATE_TRUNC('hour', ts) AS hour, SUM(amount)
 FROM transactions
 GROUP BY DATE_TRUNC('hour', ts);
 
--- ✅
---
+-- ✅ If possible, pre-compute the key column upstream
+-- Or split into two DTs (see "Pipeline Splitting" below)
 ```
 
-### 5.
+### 5. Use Partition Conditions to Limit Data Range Where Possible
 
-
+Adding partition filter conditions on source tables in the DT's SQL can significantly reduce the amount of data that needs to be scanned on each incremental refresh.
 
 ```sql
--- ❌
+-- ❌ No partition condition; scans the full table every time
 SELECT o.*, p.name
 FROM orders o JOIN products p ON o.pid = p.id;
 
--- ✅
+-- ✅ Limit data range with partition condition
 SELECT o.*, p.name
 FROM orders o JOIN products p ON o.pid = p.id
 WHERE o.ds = SESSION_CONFIGS()['dt.args.ds'];
 ```
 
-##
+## Pipeline Splitting: Break Complex DTs into Multiple Levels
 
-
+When a DT's SQL contains multiple JOINs + aggregations + window functions, consider splitting it into multiple DTs, each doing one thing.
 
-
--
--
--
--
+Benefits:
+- Each DT's incremental computation is simpler and faster
+- Intermediate DTs can be reused by multiple downstream DTs
+- Easier to pinpoint which layer has a problem when issues arise
+- Different layers can use different optimization strategies
 
-##
+## Data Characteristics and Incremental Efficiency
 
-###
+### Change Ratio
 
-
-- < 5
-- 5% ~ 20
-- \> 20
+Incremental refresh works best when the change volume is a small proportion of total data. Rule of thumb:
+- < 5%: incremental refresh is usually significantly better than full
+- 5% ~ 20%: depends on specific operators and data distribution
+- \> 20%: may need to evaluate whether full refresh is more appropriate
 
-### Append-Only
+### Append-Only Source Tables
 
-
--
--
+If the source table only has INSERT and no UPDATE/DELETE, significant optimization is possible:
+- The incremental engine knows change data only has additions (no retractions), and can skip deduplication merging and other operations
+- Aggregation can directly accumulate without maintaining complete intermediate state
 
-###
+### Distribution of Changed Data
 
-
+If changed data is concentrated on a few keys (e.g., recent time periods), incremental efficiency is high. If changes are spread across many keys, aggregation and window functions need to recompute many partitions, reducing efficiency.
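The INNER JOIN bullet in the performance guide states the incremental rule informally. For insert-only changes under bag semantics, one exact form is ΔA ⋈ B_new + A_old ⋈ ΔB. One side of the sum has to use the pre-change state; otherwise the rows of ΔA ⋈ ΔB are counted twice. The Python sketch below checks that identity against a full recomputation. `join` and `incremental_join_delta` are hypothetical helpers. Deletes and updates are out of scope, and the sketch says nothing about how the Lakehouse engine implements it.

```python
from collections import Counter


def join(left, right):
    """Inner-join two bags of (key, value) rows on key; returns a Counter of (key, lv, rv)."""
    by_key = {}
    for k, rv in right:
        by_key.setdefault(k, []).append(rv)
    out = Counter()
    for k, lv in left:
        for rv in by_key.get(k, []):
            out[(k, lv, rv)] += 1
    return out


def incremental_join_delta(a_old, b_old, delta_a, delta_b):
    """Rows to append to A JOIN B after inserting delta_a into A and delta_b into B."""
    b_new = b_old + delta_b
    # delta_a JOIN b_new covers delta_a x b_old and delta_a x delta_b;
    # a_old JOIN delta_b covers the remaining new pairs exactly once.
    return join(delta_a, b_new) + join(a_old, delta_b)


if __name__ == "__main__":
    a_old, b_old = [(1, "a1"), (2, "a2")], [(1, "b1")]
    delta_a, delta_b = [(2, "a3"), (3, "a4")], [(2, "b2"), (3, "b3")]
    full_after = join(a_old + delta_a, b_old + delta_b)
    incremental = join(a_old, b_old) + incremental_join_delta(a_old, b_old, delta_a, delta_b)
    print(incremental == full_after)  # True
```

The work in `incremental_join_delta` scales with the size of the deltas times the matching rows, not with |A| × |B|. This is why a small change ratio makes incremental refresh pay off.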
package/bin/skills/clickzetta-dynamic-table/best-practices/scheduling-guide.md
CHANGED
@@ -1,19 +1,19 @@
-# Dynamic Table
+# Dynamic Table Scheduling Method Selection Guide
 
-##
+## Comparison of Two Scheduling Methods
 
-|
+| Method | Approach | Advantages | Disadvantages |
 |------|------|------|------|
-| **DDL
-| **Studio Task
+| **DDL built-in scheduling** (REFRESH INTERVAL) | Write a `REFRESH INTERVAL` clause in CREATE DYNAMIC TABLE; Lakehouse triggers automatically | Simple; no additional configuration needed | No alerts, no dependency orchestration; refresh status can only be checked via manual SQL |
+| **Studio Task scheduling** (recommended) | Create a scheduled task in Studio; task content is the `REFRESH DYNAMIC TABLE` command | Supports upstream/downstream dependencies, unified alerts, visual monitoring | Requires creating an additional Task |
 
-
+**Studio Task scheduling is recommended for production environments.** DDL built-in scheduling is suitable for quick validation and development/testing phases.
 
 ---
 
-## DDL
+## DDL Built-in Scheduling
 
-
+Define the refresh frequency via the `REFRESH INTERVAL` clause in the CREATE statement; Lakehouse triggers periodically:
 
 ```sql
 CREATE DYNAMIC TABLE sales_daily
@@ -25,111 +25,111 @@ FROM orders
 GROUP BY 1;
 ```
 
-###
+### Drawbacks
 
--
--
--
+- **No alerts**: refresh failures are not proactively notified; status can only be checked by manually executing SQL
+- **No dependency orchestration**: cannot declare "refresh only after upstream task completes"; can only stagger by time interval
+- **High monitoring cost**: need to periodically manually execute the following command to check whether refresh is normal
 
 ```sql
---
+-- View refresh history; confirm state is SUCCEED
 SHOW DYNAMIC TABLE REFRESH HISTORY WHERE name = 'your_dt_name';
 ```
 
-
+Key field descriptions:
 
-|
+| Field | Meaning |
 |------|------|
 | `state` | SUCCEED / FAILED / RUNNING / QUEUED |
 | `refresh_mode` | INCREMENTAL / FULL / NO_DATA |
-| `error_message` |
-| `duration` |
-| `stats` |
+| `error_message` | Error message on failure |
+| `duration` | Duration of this refresh |
+| `stats` | Incremental row count (rows_inserted / rows_deleted) |
 
 ---
 
-## Studio Task
+## Studio Task Scheduling (Recommended for Production)
 
-
+Create a SQL task in Studio; task content is the REFRESH command; managed by Studio's scheduling system.
 
-### Task
+### Task Content
 
-
+**Non-partitioned DT:**
 
 ```sql
 REFRESH DYNAMIC TABLE schema_name.dt_name;
 ```
 
-
+**Partitioned DT (with parameters):**
 
 ```sql
 SET dt.args.ds = '${bizdate}';
 REFRESH DYNAMIC TABLE schema_name.dt_name PARTITION (ds = '${bizdate}');
 ```
 
-`${bizdate}`
+`${bizdate}` is automatically replaced with the business date by the Studio scheduling engine at each execution.
 
-###
+### Must Configure Self-dependency
 
-
+Concurrent REFRESH on the same DT is prohibited (causes write conflicts or data inconsistency). The Task must enable **self-dependency** to ensure the next instance starts only after the previous one completes.
 
-###
+### Upstream Dependency Configuration
 
--
--
+- If the DT's source table data needs to wait for an upstream task to produce before refreshing → configure upstream dependency
+- If source table data does not require synchronized readiness (e.g., real-time write table) → upstream dependency is optional
 
-###
+### Alert Configuration
 
-Studio
+Studio Tasks support the following alert rules; all are recommended for production environments:
 
--
--
--
+- **Failure alert**: notify when task execution fails
+- **Timeout alert**: notify when refresh duration exceeds a threshold (used to detect performance regression)
+- **Not-run alert**: notify when the task has not started within the expected time
 
 ---
 
-##
+## Scheduling Orchestration for Multi-level DT Pipelines
 
-
+When multiple DTs form upstream/downstream dependencies (e.g., DT_A → DT_B → DT_C), each DT corresponds to one Studio Task; task dependency relationships ensure execution order:
 
 ```
 Task_A (REFRESH DT_A)
-└─ Task_B (REFRESH DT_B
-└─ Task_C (REFRESH DT_C
+└─ Task_B (REFRESH DT_B, depends on Task_A)
+└─ Task_C (REFRESH DT_C, depends on Task_B)
 ```
 
-
+REFRESHes for different partitions can run in parallel (assigned to different Task instances); concurrent refresh of the same partition/non-partitioned DT is prohibited.
 
 ---
 
-##
+## Decision Logic: Recommend Scheduling Method to Users
 
-
+When helping users create or configure a DT, recommend based on the following logic:
 
-1.
--
--
+1. **Is Studio available?**
+- Yes → always recommend Studio Task scheduling, regardless of development or production environment
+- No → use DDL built-in scheduling or a third-party scheduling engine
 
-2.
--
--
+2. **Are there upstream/downstream dependencies?**
+- Yes (e.g., source table is produced by another task) → must use Studio Task; configure upstream dependency
+- No → still recommend Studio Task to gain alert capability
 
-3.
--
-- REFRESH INTERVAL
+3. **User has already written a REFRESH INTERVAL clause?**
+- Suggest: the REFRESH INTERVAL clause can be removed and replaced with Studio Task scheduling to gain alert and dependency management capability
+- REFRESH INTERVAL and Studio Task can coexist, but will cause double triggering; choosing one is recommended
 
 ---
 
-##
+## Alert Message Template
 
-
+When the user is using DDL built-in scheduling, use the following message:
 
-> 💡
+> 💡 **Suggestion**: You are currently using DDL built-in scheduling (REFRESH INTERVAL), which has the following limitations:
 >
-> 1.
-> 2.
+> 1. **No alerts**: refresh failures are not proactively notified; you need to manually execute `SHOW DYNAMIC TABLE REFRESH HISTORY` to check status
+> 2. **No dependency orchestration**: upstream/downstream task dependencies cannot be declared; can only stagger by time interval
 >
->
-> -
-> -
-> -
+> **Recommendation**: Create a scheduled task in Studio with content `REFRESH DYNAMIC TABLE schema.dt_name`, and configure:
+> - Self-dependency (prevent concurrent refresh)
+> - Failure alert + timeout alert
+> - Upstream dependency (if source table is produced by other tasks)
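The three questions under "Decision Logic: Recommend Scheduling Method to Users" can be written as a small function. This is a hypothetical sketch. `recommend_scheduling`, `SchedulingAdvice`, and the note strings are illustrative names, not a Studio or Lakehouse API. The function encodes only the rules stated in the scheduling guide, including its requirement that Studio Tasks enable self-dependency.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SchedulingAdvice:
    method: str                                     # "studio_task" or "ddl_or_third_party"
    notes: List[str] = field(default_factory=list)  # extra configuration advice


def recommend_scheduling(studio_available: bool,
                         has_upstream_dependency: bool,
                         has_refresh_interval: bool) -> SchedulingAdvice:
    # Question 1: without Studio, fall back to REFRESH INTERVAL or an external scheduler.
    if not studio_available:
        return SchedulingAdvice("ddl_or_third_party")
    # With Studio, always recommend a Studio Task; self-dependency is mandatory.
    advice = SchedulingAdvice("studio_task", ["enable self-dependency (no concurrent REFRESH)"])
    # Question 2: a source table produced by another task needs an upstream dependency.
    if has_upstream_dependency:
        advice.notes.append("configure upstream dependency")
    else:
        advice.notes.append("still use Studio Task for failure/timeout alerts")
    # Question 3: keeping REFRESH INTERVAL alongside the Task would double-trigger refreshes.
    if has_refresh_interval:
        advice.notes.append("remove REFRESH INTERVAL to avoid double triggering")
    return advice
```

For example, `recommend_scheduling(True, True, True)` yields a Studio Task whose notes ask for self-dependency and an upstream dependency, and for removing the existing `REFRESH INTERVAL` clause.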
package/bin/skills/clickzetta-dynamic-table/dt-creator/SKILL.md
CHANGED
@@ -1,15 +1,16 @@
 ---
 name: dt-creator
 description: |
-
-
+Reference index for creating Dynamic Tables. Covers declaration strategies for static partition DT
+vs dynamic partition DT, SQL patterns supported by incremental computation, incremental refresh
+configuration options, and how to query refresh history.
 ---
 
-# DT Creator —
+# DT Creator — Reference Index
 
 ## references/
 
-- **dt-declaration-strategy.md** — DT
-- **sql-limitations.md** — SQL
-- **incremental-config-reference.md** —
-- **refresh-history-guide.md** —
+- **dt-declaration-strategy.md** — DT declaration strategy (creation syntax and selection between static partition DT and dynamic partition DT)
+- **sql-limitations.md** — SQL support matrix (support status for JOIN, aggregation, window functions, non-deterministic functions, etc.)
+- **incremental-config-reference.md** — Incremental computation configuration reference (refresh strategy, source table characteristic declarations, state table management, etc.)
+- **refresh-history-guide.md** — Incremental refresh history queries (SHOW REFRESH HISTORY / DESC HISTORY / information_schema)