@clickzetta/cz-cli-darwin-x64 0.3.92 → 0.3.94
- package/bin/cz-cli +0 -0
- package/bin/skills/clickzetta-ai-function/SKILL.md +109 -0
- package/bin/skills/clickzetta-ai-function/eval_cases.jsonl +4 -0
- package/bin/skills/clickzetta-ai-function/references/ai-function-ddl.md +106 -0
- package/bin/skills/clickzetta-batch-sync-pipeline/SKILL.md +124 -124
- package/bin/skills/clickzetta-batch-sync-pipeline/eval_cases.jsonl +5 -5
- package/bin/skills/clickzetta-bi-connect/SKILL.md +79 -78
- package/bin/skills/clickzetta-bi-connect/references/bi-tools.md +56 -56
- package/bin/skills/clickzetta-cdc-sync-pipeline/SKILL.md +386 -382
- package/bin/skills/clickzetta-cdc-sync-pipeline/eval_cases.jsonl +5 -5
- package/bin/skills/clickzetta-data-ingest-pipeline/SKILL.md +73 -212
- package/bin/skills/clickzetta-data-science/SKILL.md +57 -56
- package/bin/skills/clickzetta-data-science/references/bitmap-profile.md +38 -38
- package/bin/skills/clickzetta-data-science/references/data-patterns.md +16 -16
- package/bin/skills/clickzetta-data-science/references/setup.md +28 -28
- package/bin/skills/clickzetta-data-science/references/stats-functions.md +44 -44
- package/bin/skills/clickzetta-data-science/references/write-and-infer.md +22 -22
- package/bin/skills/clickzetta-data-science/references/zettapark-api.md +32 -32
- package/bin/skills/clickzetta-dw-modeling/SKILL.md +1 -1
- package/bin/skills/clickzetta-external-function/SKILL.md +51 -109
- package/bin/skills/clickzetta-external-function/eval_cases.jsonl +4 -4
- package/bin/skills/clickzetta-external-function/references/external-function-ddl.md +39 -77
- package/bin/skills/clickzetta-java-sdk/SKILL.md +49 -48
- package/bin/skills/clickzetta-java-sdk/eval_cases.jsonl +12 -12
- package/bin/skills/clickzetta-java-sdk/references/bulkload.md +34 -34
- package/bin/skills/clickzetta-java-sdk/references/realtime.md +44 -44
- package/bin/skills/clickzetta-kafka-ingest-pipeline/SKILL.md +273 -507
- package/bin/skills/clickzetta-kafka-ingest-pipeline/references/kafka-pipe-syntax.md +197 -231
- package/bin/skills/clickzetta-oss-ingest-pipeline/SKILL.md +231 -304
- package/bin/skills/clickzetta-realtime-sync-pipeline/SKILL.md +180 -179
- package/bin/skills/clickzetta-realtime-sync-pipeline/eval_cases.jsonl +5 -5
- package/bin/skills/clickzetta-semantic-view/SKILL.md +74 -72
- package/bin/skills/clickzetta-semantic-view/eval_cases.jsonl +12 -12
- package/bin/skills/clickzetta-semantic-view/references/semantic-view-reference.md +75 -75
- package/bin/skills/clickzetta-sql-migration/SKILL.md +128 -0
- package/bin/skills/clickzetta-sql-migration/eval_cases.jsonl +10 -0
- package/bin/skills/clickzetta-sql-migration/references/ddl-reference.md +350 -0
- package/bin/skills/clickzetta-sql-migration/references/dml-differences.md +192 -0
- package/bin/skills/clickzetta-sql-migration/references/dml-reference.md +279 -0
- package/bin/skills/{clickzetta-sql-syntax-guide → clickzetta-sql-migration}/references/dql-reference.md +128 -128
- package/bin/skills/clickzetta-sql-migration/references/function-mapping.md +194 -0
- package/bin/skills/clickzetta-sql-migration/references/functions-reference.md +372 -0
- package/bin/skills/clickzetta-sql-migration/references/implicit-type-conversion.md +143 -0
- package/bin/skills/clickzetta-sql-migration/references/migration-databricks.md +260 -0
- package/bin/skills/{clickzetta-sql-syntax-guide → clickzetta-sql-migration}/references/migration-snowflake.md +112 -112
- package/bin/skills/clickzetta-sql-migration/references/vs-snowflake.md +346 -0
- package/bin/skills/clickzetta-sql-migration/references/vs-spark.md +229 -0
- package/bin/skills/clickzetta-studio-task-manager/SKILL.md +326 -329
- package/bin/skills/clickzetta-table-lineage/SKILL.md +57 -55
- package/bin/skills/clickzetta-table-lineage/eval_cases.jsonl +1 -1
- package/bin/skills/clickzetta-table-lineage/references/normalize_func.sql +5 -5
- package/bin/skills/clickzetta-table-lineage/references/table_cost.sql +6 -6
- package/bin/skills/clickzetta-table-lineage/references/table_relation.sql +2 -2
- package/bin/skills/clickzetta-volume-manager/SKILL.md +186 -100
- package/bin/skills/clickzetta-volume-manager/references/volume-ddl.md +153 -52
- package/package.json +1 -1
- package/bin/skills/clickzetta-dynamic-table/best-practices/scheduling-guide.md +0 -135
- package/bin/skills/clickzetta-dynamic-table/dt-creator/references/dt-declaration-strategy.md +0 -185
- package/bin/skills/clickzetta-dynamic-table/dt-creator/references/refresh-history-guide.md +0 -260
- package/bin/skills/clickzetta-dynamic-table/dynamic-table-alter/SKILL.md +0 -191
- package/bin/skills/clickzetta-sql-syntax-guide/SKILL.md +0 -249
- package/bin/skills/clickzetta-sql-syntax-guide/eval_cases.jsonl +0 -3
- package/bin/skills/clickzetta-sql-syntax-guide/references/ddl-reference.md +0 -350
- package/bin/skills/clickzetta-sql-syntax-guide/references/dml-reference.md +0 -279
- package/bin/skills/clickzetta-sql-syntax-guide/references/functions-reference.md +0 -372
- package/bin/skills/clickzetta-sql-syntax-guide/references/migration-databricks.md +0 -260
- package/bin/skills/clickzetta-sql-syntax-guide/references/vs-snowflake.md +0 -346
- package/bin/skills/clickzetta-sql-syntax-guide/references/vs-spark.md +0 -229
- /package/bin/skills/{clickzetta-sql-syntax-guide → clickzetta-sql-migration}/LICENSE +0 -0
package/bin/skills/clickzetta-volume-manager/references/volume-ddl.md
CHANGED
@@ -1,20 +1,35 @@
-# Volume
+# Volume Management Reference
 
->
+> Source: https://www.yunqi.tech/documents/datalake_volume_object and others
 
-## Volume
+## Volume Types
 
-|
-
-|
-|
+| Type | Description | Lifecycle |
+|---|---|---|
+| External Volume | Mount OSS/COS/S3 object storage paths via Storage Connection | User creates/drops |
+| Managed Volume | ClickZetta-managed storage, no connection needed | User creates/drops |
+| User Volume | Auto-created per user per workspace, user-scoped access | Auto-managed; data removed when user deleted |
+| Table Volume | Auto-created per table, access tied to table permissions | Auto-managed; data removed when table dropped |
+
+## SQL Reference Patterns
+
+```sql
+-- External Volume / Managed Volume
+VOLUME [[<workspace>].<schema>].volume_name
+
+-- User Volume
+USER VOLUME
+
+-- Table Volume
+TABLE VOLUME [[<workspace>].<schema>].table_name
+```
 
 ---
 
 ## CREATE EXTERNAL VOLUME
 
 ```sql
--- OSS
+-- OSS
 CREATE EXTERNAL VOLUME my_oss_volume
 LOCATION 'oss://<bucket>/<path>'
 USING CONNECTION my_oss_conn
@@ -36,20 +51,33 @@ CREATE EXTERNAL VOLUME my_s3_volume
 RECURSIVE = TRUE;
 ```
 
-
-- `LOCATION
-- `USING CONNECTION
-- `DIRECTORY
-- `RECURSIVE
+Parameters:
+- `LOCATION`: Object storage path
+- `USING CONNECTION`: Name of an existing STORAGE CONNECTION
+- `DIRECTORY`: Directory configuration, `ENABLE=TRUE` enables directory indexing, `AUTO_REFRESH=TRUE` enables auto-refresh
+- `RECURSIVE`: Whether to recursively scan subdirectories
+
+> If new files are not visible via `SHOW VOLUME DIRECTORY` after upload, run `ALTER VOLUME name REFRESH` manually.
+
+---
+
+## CREATE VOLUME (Managed Volume)
+
+Managed Volumes use ClickZetta-managed object storage. No Storage Connection or location is required.
+
+```sql
+CREATE VOLUME my_managed_volume RECURSIVE = TRUE;
+```
 
-
+Parameters:
+- `RECURSIVE`: Whether to recursively scan subdirectories
 
 ---
 
 ## ALTER VOLUME
 
 ```sql
---
+-- Refresh directory metadata
 ALTER VOLUME my_oss_volume REFRESH;
 ```
 
@@ -57,8 +85,14 @@ ALTER VOLUME my_oss_volume REFRESH;
 
 ## DROP VOLUME
 
+Only External Volumes and Managed Volumes can be explicitly dropped. User Volume and Table Volume are auto-managed and cannot be dropped.
+
 ```sql
+-- Drop External Volume
 DROP VOLUME IF EXISTS my_oss_volume;
+
+-- Drop Managed Volume
+DROP VOLUME IF EXISTS my_managed_volume;
 ```
 
 ---
@@ -66,134 +100,201 @@ DROP VOLUME IF EXISTS my_oss_volume;
 ## SHOW / DESC VOLUME
 
 ```sql
---
+-- List all Volumes
 SHOW VOLUMES;
 
---
+-- Filter by condition (SHOW VOLUMES does not support WHERE, use information_schema)
 SELECT volume_name, volume_type, volume_region, volume_creator
 FROM information_schema.volumes
 WHERE volume_type = 'EXTERNAL';
 
---
+-- Find by name
 SELECT * FROM information_schema.volumes
 WHERE volume_name = 'my_oss_volume';
 
---
+-- View Volume details
 DESC VOLUME my_oss_volume;
 
---
+-- View files in Volume directory
 SHOW VOLUME DIRECTORY my_oss_volume;
 ```
 
 ---
 
-##
+## Viewing Directory Metadata (DIRECTORY Function)
 
 ```sql
---
+-- View Volume directory metadata (requires prior ALTER VOLUME REFRESH)
 SELECT * FROM DIRECTORY(VOLUME my_oss_volume);
 ```
 
 ---
 
-## User Volume
+## User Volume Operations
+
+User Volume is auto-created per user per workspace and bound to the user. It can only be accessed by that user. Cannot be explicitly created or dropped. When the user is deleted, the User Volume becomes unavailable and its data is removed.
+
+All four Volume types support file-level operations. `PUT` and `GET` require client-side support (e.g., cz-cli, Java JDBC driver, Python connector). **ClickZetta Studio Web does not support PUT/GET.**
 
 ```sql
---
+-- List files (all types)
+SHOW VOLUME DIRECTORY my_oss_volume;
+SHOW VOLUME DIRECTORY my_managed_volume;
 SHOW USER VOLUME DIRECTORY;
+SHOW TABLE VOLUME DIRECTORY my_table;
 
---
-PUT '/local/path/file.csv' TO
+-- Upload files (External / Managed Volume)
+PUT '/local/path/file.csv' TO VOLUME my_oss_volume;
+PUT '/local/path/file.csv' TO VOLUME my_managed_volume;
 
---
+-- Upload to User Volume
+PUT '/local/path/file.csv' TO USER VOLUME;
 PUT '/local/path/file.csv' TO USER VOLUME FILE 'subdir/file.csv';
-
--- Upload multiple files using a wildcard
 PUT '/local/path/images/*' TO USER VOLUME SUBDIRECTORY 'images/';
 
---
+-- Upload to Table Volume
+PUT '/local/path/file.csv' TO TABLE VOLUME my_table;
+
+-- Download files (External / Managed Volume)
+GET VOLUME my_oss_volume FILE 'subdir/file.csv' TO '/local/output/';
+GET VOLUME my_managed_volume FILE 'subdir/file.csv' TO '/local/output/';
+
+-- Download from User Volume
 GET USER VOLUME FILE 'subdir/file.csv' TO '/local/output/';
 
---
+-- Download from Table Volume
+GET TABLE VOLUME my_table FILE 'subdir/file.csv' TO '/local/output/';
+
+-- Delete files (all types)
+REMOVE VOLUME my_oss_volume FILE 'subdir/file.csv';
+REMOVE VOLUME my_managed_volume FILE 'subdir/file.csv';
 REMOVE USER VOLUME FILE 'subdir/file.csv';
+REMOVE TABLE VOLUME my_table FILE 'subdir/file.csv';
 
---
+-- Delete all files in a directory
 REMOVE USER VOLUME SUBDIRECTORY '/';
 ```
 
 ---
 
-##
+## Querying Data from Volume (SELECT FROM VOLUME)
 
 ```sql
---
+-- Query External Volume files
 SELECT * FROM VOLUME my_oss_volume
 USING CSV
 OPTIONS('header' = 'true', 'sep' = ',')
 SUBDIRECTORY 'data/'
 LIMIT 100;
 
---
+-- Query Managed Volume files
+SELECT * FROM VOLUME my_managed_volume
+USING CSV
+OPTIONS('header' = 'true')
+FILES('data.csv');
+
+-- Query Parquet files
 SELECT * FROM VOLUME my_oss_volume
 USING PARQUET
 FILES('part-00001.parquet', 'part-00002.parquet');
 
---
+-- Regex match files
 SELECT * FROM VOLUME my_oss_volume
 USING PARQUET
 REGEXP '.*2024-0[1-3].parquet';
 
---
+-- Query User Volume files
 SELECT * FROM USER VOLUME
 USING CSV
 OPTIONS('header' = 'true')
 FILES('data.csv')
 LIMIT 10;
+
+-- Query Table Volume files
+SELECT * FROM TABLE VOLUME my_table
+USING CSV
+OPTIONS('header' = 'true')
+FILES('data.csv')
+LIMIT 10;
 ```
 
-
+Supported formats: `CSV`, `PARQUET`, `ORC`, `JSON`, `BSON`
 
-CSV OPTIONS
-- `header
-- `sep
-- `compression
-- `multiLine
+Common CSV OPTIONS parameters:
+- `header`: Whether the file has a header row, default `false`
+- `sep`: Column delimiter, default `,`
+- `compression`: Compression format (gzip/zstd/zlib)
+- `multiLine`: Whether multi-line fields are supported, default `false`
 
 ---
 
-## COPY INTO TABLE
+## COPY INTO TABLE (Import from Volume)
 
 ```sql
+-- Import from External Volume
 COPY INTO my_table
 FROM VOLUME my_oss_volume
 USING CSV
 OPTIONS('header' = 'true')
 SUBDIRECTORY 'data/';
+
+-- Import from Managed Volume
+COPY INTO my_table
+FROM VOLUME my_managed_volume
+USING CSV
+OPTIONS('header' = 'true')
+FILES('data.csv');
+
+-- Import from User Volume
+COPY INTO my_table
+FROM USER VOLUME
+USING CSV
+OPTIONS('header' = 'true')
+FILES('data.csv');
+
+-- Import from Table Volume
+COPY INTO my_table
+FROM TABLE VOLUME source_table
+USING CSV
+OPTIONS('header' = 'true')
+FILES('data.csv');
 ```
 
-## COPY INTO VOLUME
+## COPY INTO VOLUME (Export to Volume)
 
 ```sql
---
+-- Export table to External Volume
 COPY INTO VOLUME my_oss_volume
 SUBDIRECTORY 'export/'
 FROM TABLE my_table
 FILE_FORMAT = (TYPE = CSV);
 
---
+-- Export query result
 COPY INTO VOLUME my_oss_volume
 SUBDIRECTORY 'export/'
 FROM (SELECT * FROM orders WHERE year = 2024)
 FILE_FORMAT = (TYPE = PARQUET COMPRESSION = 'GZIP');
 
---
+-- Export to Managed Volume
+COPY INTO VOLUME my_managed_volume
+SUBDIRECTORY 'export/'
+FROM TABLE my_table
+FILE_FORMAT = (TYPE = CSV);
+
+-- Export to User Volume
 COPY INTO USER VOLUME
 SUBDIRECTORY 'export/'
 FROM TABLE my_table
 FILE_FORMAT = (TYPE = CSV);
+
+-- Export to Table Volume
+COPY INTO TABLE VOLUME target_table
+SUBDIRECTORY 'export/'
+FROM TABLE my_table
+FILE_FORMAT = (TYPE = CSV);
 ```
 
->
-> -
-> -
-> -
+> **Key distinction**:
+> - **Import** (COPY INTO TABLE / SELECT FROM VOLUME): Use `USING CSV/PARQUET/JSON` + `OPTIONS(...)`
+> - **Export** (COPY INTO VOLUME): Use `FILE_FORMAT = (TYPE = CSV/PARQUET/JSON)`
+> - These two syntaxes are not interchangeable!
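Reviewer note: the import/export distinction added at the end of volume-ddl.md can be illustrated with a minimal Python sketch that assembles both statement shapes as strings. The helper names `build_import_sql` and `build_export_sql` are hypothetical and are not part of cz-cli or any ClickZetta SDK. The SQL text only mirrors the patterns shown in this diff and has not been validated against a live Lakehouse; identifiers and option values are not escaped.

```python
def build_import_sql(table, volume, fmt, options=None, subdirectory=None):
    """Assemble an import statement: COPY INTO <table> FROM VOLUME ... USING <fmt> OPTIONS(...)."""
    lines = [
        f"COPY INTO {table}",
        f"FROM VOLUME {volume}",
        f"USING {fmt.upper()}",
    ]
    if options:
        # Import side: key/value pairs go into OPTIONS('k' = 'v', ...).
        rendered = ", ".join(f"'{key}' = '{value}'" for key, value in options.items())
        lines.append(f"OPTIONS({rendered})")
    if subdirectory:
        lines.append(f"SUBDIRECTORY '{subdirectory}'")
    return "\n".join(lines) + ";"


def build_export_sql(volume, table, fmt, subdirectory):
    """Assemble an export statement: COPY INTO VOLUME ... FILE_FORMAT = (TYPE = <fmt>)."""
    lines = [
        f"COPY INTO VOLUME {volume}",
        f"SUBDIRECTORY '{subdirectory}'",
        f"FROM TABLE {table}",
        # Export side: the format is declared with FILE_FORMAT, not USING/OPTIONS.
        f"FILE_FORMAT = (TYPE = {fmt.upper()})",
    ]
    return "\n".join(lines) + ";"


if __name__ == "__main__":
    print(build_import_sql("my_table", "my_oss_volume", "csv", {"header": "true"}, "data/"))
    print(build_export_sql("my_oss_volume", "my_table", "csv", "export/"))
```

Keeping the two builders separate makes it harder to mix `USING`/`OPTIONS` into an export or `FILE_FORMAT` into an import, which is the mistake the note in the diff warns about.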
package/package.json
CHANGED
package/bin/skills/clickzetta-dynamic-table/best-practices/scheduling-guide.md
DELETED
@@ -1,135 +0,0 @@
-# Dynamic Table Scheduling Method Selection Guide
-
-## Comparison of Two Scheduling Methods
-
-| Method | Approach | Advantages | Disadvantages |
-|------|------|------|------|
-| **DDL built-in scheduling** (REFRESH INTERVAL) | Write a `REFRESH INTERVAL` clause in CREATE DYNAMIC TABLE; Lakehouse triggers automatically | Simple; no additional configuration needed | No alerts, no dependency orchestration; refresh status can only be checked via manual SQL |
-| **Studio Task scheduling** (recommended) | Create a scheduled task in Studio; task content is the `REFRESH DYNAMIC TABLE` command | Supports upstream/downstream dependencies, unified alerts, visual monitoring | Requires creating an additional Task |
-
-**Studio Task scheduling is recommended for production environments.** DDL built-in scheduling is suitable for quick validation and development/testing phases.
-
----
-
-## DDL Built-in Scheduling
-
-Define the refresh frequency via the `REFRESH INTERVAL` clause in the CREATE statement; Lakehouse triggers periodically:
-
-```sql
-CREATE DYNAMIC TABLE sales_daily
-REFRESH INTERVAL 1 DAY
-VCLUSTER default
-AS
-SELECT DATE(created_at) AS dt, SUM(amount) AS total
-FROM orders
-GROUP BY 1;
-```
-
-### Drawbacks
-
-- **No alerts**: refresh failures are not proactively notified; status can only be checked by manually executing SQL
-- **No dependency orchestration**: cannot declare "refresh only after upstream task completes"; can only stagger by time interval
-- **High monitoring cost**: need to periodically manually execute the following command to check whether refresh is normal
-
-```sql
--- View refresh history; confirm state is SUCCEED
-SHOW DYNAMIC TABLE REFRESH HISTORY WHERE name = 'your_dt_name';
-```
-
-Key field descriptions:
-
-| Field | Meaning |
-|------|------|
-| `state` | SUCCEED / FAILED / RUNNING / QUEUED |
-| `refresh_mode` | INCREMENTAL / FULL / NO_DATA |
-| `error_message` | Error message on failure |
-| `duration` | Duration of this refresh |
-| `stats` | Incremental row count (rows_inserted / rows_deleted) |
-
----
-
-## Studio Task Scheduling (Recommended for Production)
-
-Create a SQL task in Studio; task content is the REFRESH command; managed by Studio's scheduling system.
-
-### Task Content
-
-**Non-partitioned DT:**
-
-```sql
-REFRESH DYNAMIC TABLE schema_name.dt_name;
-```
-
-**Partitioned DT (with parameters):**
-
-```sql
-SET dt.args.ds = '${bizdate}';
-REFRESH DYNAMIC TABLE schema_name.dt_name PARTITION (ds = '${bizdate}');
-```
-
-`${bizdate}` is automatically replaced with the business date by the Studio scheduling engine at each execution.
-
-### Must Configure Self-dependency
-
-Concurrent REFRESH on the same DT is prohibited (causes write conflicts or data inconsistency). The Task must enable **self-dependency** to ensure the next instance starts only after the previous one completes.
-
-### Upstream Dependency Configuration
-
-- If the DT's source table data needs to wait for an upstream task to produce before refreshing → configure upstream dependency
-- If source table data does not require synchronized readiness (e.g., real-time write table) → upstream dependency is optional
-
-### Alert Configuration
-
-Studio Tasks support the following alert rules; all are recommended for production environments:
-
-- **Failure alert**: notify when task execution fails
-- **Timeout alert**: notify when refresh duration exceeds a threshold (used to detect performance regression)
-- **Not-run alert**: notify when the task has not started within the expected time
-
----
-
-## Scheduling Orchestration for Multi-level DT Pipelines
-
-When multiple DTs form upstream/downstream dependencies (e.g., DT_A → DT_B → DT_C), each DT corresponds to one Studio Task; task dependency relationships ensure execution order:
-
-```
-Task_A (REFRESH DT_A)
-  └─ Task_B (REFRESH DT_B, depends on Task_A)
-       └─ Task_C (REFRESH DT_C, depends on Task_B)
-```
-
-REFRESHes for different partitions can run in parallel (assigned to different Task instances); concurrent refresh of the same partition/non-partitioned DT is prohibited.
-
----
-
-## Decision Logic: Recommend Scheduling Method to Users
-
-When helping users create or configure a DT, recommend based on the following logic:
-
-1. **Is Studio available?**
-   - Yes → always recommend Studio Task scheduling, regardless of development or production environment
-   - No → use DDL built-in scheduling or a third-party scheduling engine
-
-2. **Are there upstream/downstream dependencies?**
-   - Yes (e.g., source table is produced by another task) → must use Studio Task; configure upstream dependency
-   - No → still recommend Studio Task to gain alert capability
-
-3. **User has already written a REFRESH INTERVAL clause?**
-   - Suggest: the REFRESH INTERVAL clause can be removed and replaced with Studio Task scheduling to gain alert and dependency management capability
-   - REFRESH INTERVAL and Studio Task can coexist, but will cause double triggering; choosing one is recommended
-
----
-
-## Alert Message Template
-
-When the user is using DDL built-in scheduling, use the following message:
-
-> 💡 **Suggestion**: You are currently using DDL built-in scheduling (REFRESH INTERVAL), which has the following limitations:
->
-> 1. **No alerts**: refresh failures are not proactively notified; you need to manually execute `SHOW DYNAMIC TABLE REFRESH HISTORY` to check status
-> 2. **No dependency orchestration**: upstream/downstream task dependencies cannot be declared; can only stagger by time interval
->
-> **Recommendation**: Create a scheduled task in Studio with content `REFRESH DYNAMIC TABLE schema.dt_name`, and configure:
-> - Self-dependency (prevent concurrent refresh)
-> - Failure alert + timeout alert
-> - Upstream dependency (if source table is produced by other tasks)
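Reviewer note: the three-question decision logic in the deleted scheduling guide can be sketched as a small Python function. The function name `recommend_scheduling`, its parameters, and the wording of the returned notes are assumptions made for illustration; nothing here is an API of cz-cli or Studio. When Studio is unavailable the sketch simply returns the guide's fallback and does not reason further about dependencies.

```python
def recommend_scheduling(studio_available, has_upstream_dependency, has_refresh_interval):
    """Return (method, notes) following the guide's three questions in order."""
    notes = []

    # Question 1: without Studio, the guide falls back to DDL built-in
    # scheduling or a third-party scheduling engine.
    if not studio_available:
        return "DDL built-in scheduling or third-party scheduler", notes

    # With Studio, the guide always recommends a Studio Task, and the Task
    # must enable self-dependency so refreshes never overlap.
    method = "Studio Task"
    notes.append("enable self-dependency to prevent concurrent REFRESH")

    # Question 2: upstream producers require an explicit upstream dependency.
    if has_upstream_dependency:
        notes.append("configure upstream dependency")

    # Question 3: keeping REFRESH INTERVAL alongside a Task double-triggers.
    if has_refresh_interval:
        notes.append("remove REFRESH INTERVAL to avoid double triggering")

    return method, notes


if __name__ == "__main__":
    method, notes = recommend_scheduling(True, True, True)
    print(method)
    for note in notes:
        print("-", note)
```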
package/bin/skills/clickzetta-dynamic-table/dt-creator/references/dt-declaration-strategy.md
DELETED
@@ -1,185 +0,0 @@
-# Dynamic Table Declaration Strategy
-
-DT has two creation syntaxes: static partition DT and dynamic partition DT (non-partitioned DT can be viewed as a special case of dynamic partition). The two differ fundamentally in creation syntax, refresh behavior, and incremental behavior.
-
-## Core Concepts
-
-### Static Partition DT (Partitioned DT with SESSION_CONFIGS args)
-
-The SQL references partition parameters via `SESSION_CONFIGS()`, and a specific partition value is specified at each REFRESH. Each partition refreshes independently — each partition refresh unit can be viewed as an independent DT.
-
-```sql
-CREATE DYNAMIC TABLE order_daily (
-id BIGINT, amount DECIMAL(12,2), ds STRING
-)
-PARTITIONED BY (ds)
-AS
-SELECT id, amount, SESSION_CONFIGS()['dt.args.ds'] AS ds
-FROM orders
-WHERE ds = SESSION_CONFIGS()['dt.args.ds'];
-
--- Specify partition at refresh time
-set dt.args.ds=2025-01-01
-REFRESH DYNAMIC TABLE order_daily PARTITION(ds = '2025-01-01');
-```
-
-### Dynamic Partition DT (Non-partitioned DT / DT without args)
-
-The SQL does not reference `SESSION_CONFIGS()`, or although partitioned, the partition values are dynamically produced by the query logic. Each REFRESH processes all incremental data from all source tables.
-
-Dynamic partition DTs do not allow any command other than REFRESH to modify data (INSERT/UPDATE/DELETE/MERGE are all unavailable); data is driven entirely by REFRESH.
-
-Therefore, the following ETL scenarios are not suitable for dynamic partition DT:
-- Need to manually patch data (e.g., a few rows are found to be incorrect and need to be directly UPDATEd)
-- Need to delete data by condition (e.g., cleaning dirty data, deleting expired records)
-- Need MERGE INTO for upsert (e.g., consuming a stream and merging into a target table in a CDC scenario)
-- Need INSERT INTO to append external data (e.g., manually importing a batch of supplementary data)
-- Need to backfill or re-refresh partitions independently (dynamic partition DT can only do a full table refresh; individual partitions cannot be refreshed separately)
-- Downstream tasks need to write to the same table (DT has exclusive write ownership)
-
-```sql
-CREATE DYNAMIC TABLE order_summary (
-category STRING, total_amount DECIMAL(12,2)
-)
-AS
-SELECT category, SUM(amount) AS total_amount
-FROM orders
-GROUP BY category;
-
--- No partition specified at refresh time
-REFRESH DYNAMIC TABLE order_summary;
-```
-
-## Key Differences
-
-| Dimension | Static Partition DT | Dynamic Partition DT |
-|------|-----------|-----------|
-| Does SQL contain `SESSION_CONFIGS()`? | Yes, used to reference partition parameters | No |
-| REFRESH syntax | `REFRESH ... PARTITION(ds='xxx')` | `REFRESH ...` (no PARTITION) |
-| Incremental scope | Only processes incremental data for the specified partition | Processes all incremental data from all source tables |
-| Scheduling method | External scheduler triggers one partition at a time | External scheduler triggers on a timer |
-| Data lifecycle | Managed per partition; can backfill/delete independently | Managed as a whole table |
-| State tables | Maintained independently per partition | Maintained globally |
-| Suitable data patterns | T+1 batch processing, time-partitioned ETL | Real-time streams, global aggregation, no clear partition key |
-
-## Selection Decision Tree
-
-```
-Does your data have a clear time/business partition key?
-│
-├─ Yes → Was the original ETL doing INSERT OVERWRITE by partition?
-│   │
-│   ├─ Yes → Use static partition DT
-│   │        (maintain the original partition granularity; each partition refreshes independently)
-│   │
-│   └─ No → Is the data volume large? Do you need per-partition lifecycle management?
-│       │
-│       ├─ Yes → Use static partition DT
-│       │        (even if the original was not partitioned, adding partitions is recommended for manageability)
-│       │
-│       └─ No → Use dynamic partition DT
-│                (simple scenario; no partition management needed)
-│
-└─ No → Use dynamic partition DT
-        (global aggregation, real-time summary, etc.)
-```
-
-## Static Partition DT — Details
-
-### Applicable Scenarios
-
-1. **T+1 batch ETL migration**
-   - Original SQL follows the `INSERT OVERWRITE TABLE t PARTITION(ds='${ds}')` pattern
-   - Refreshes once per day/hour by partition
-   - Needs to support historical partition backfill
-
-2. **Sliding window computation**
-   - E.g., aggregation over the last 7 days, period-over-period comparison
-   - SQL references `SESSION_CONFIGS()['dt.args.ds']` and `sub_days(...)` for window range
-
-3. **Per-partition data lifecycle management**
-   - Automatically clean up expired partitions via `data_lifecycle`
-   - Can backfill a single partition without affecting others
-
-4. **Self-referencing DT (daily comparison, SCD)**
-   - Current partition depends on the result of the previous partition
-   - Must use static partition, because "current partition" and "previous partition" need to be explicitly specified
-
-### Refresh Method
-
-```sql
--- Refresh one partition at a time
-set dt.args.ds=2025-01-15
-REFRESH DYNAMIC TABLE my_dt PARTITION(ds = '2025-01-15');
-
--- Multi-level partition
-set dt.args.pt=20250411
-set dt.args.pt_hour=01
-REFRESH DYNAMIC TABLE my_dt PARTITION(pt = '20250411', pt_hour = '01');
-```
-
-### Notes
-
-- Use `cz.optimizer.incremental.backfill.enabled=TRUE` for backfill; it will automatically use full refresh
-- Partition parameters are passed via `set dt.args.xxx=value`; the PARTITION clause in the REFRESH statement specifies the partition value
-
-## Dynamic Partition DT — Details
-
-### Applicable Scenarios
-
-1. **Real-time stream data aggregation**
-   - Source table continuously writes; DT refreshes on a schedule
-   - No partition management needed; each refresh processes all new data
-
-2. **Global summary tables**
-   - E.g., global TopN, global count, global deduplication
-   - No clear partition key
-
-3. **Simple JOIN + filter**
-   - Simple transformations without partition parameters
-   - E.g., fact table JOIN dimension table, output wide table
-
-4. **Multi-source merge (UNION ALL)**
-   - Data from multiple source tables merged into one table
-   - No partition management needed
-
-### Refresh Method
-
-```sql
--- Refresh directly; processes all incremental data from all source tables
-REFRESH DYNAMIC TABLE my_dt;
-```
-
-### Notes
-
-- Each refresh processes all incremental data from all source tables; if source table change volume is large, refresh may be slow
-- State tables are maintained globally and may grow as data volume increases
-- Per-partition backfill is not supported; only full table refresh is possible
-- Suitable for scenarios where the change ratio is small (< 5%)
-
-## Partition Granularity Selection
-
-When choosing a static partition DT, you also need to decide on partition granularity:
-
-| Data pattern | Recommended granularity | Notes |
-|---------|------------|------|
-| Strictly ordered time series (e.g., logs) | Minute-level (`dt_min`) | High data volume, frequent writes |
-| Roughly ordered, small amount of late data | Hour-level (`dt_hour`) | Balance between granularity and management complexity |
-| T+1 batch import | Day-level (`ds`) | Most common ETL scenario |
-| By business cycle | Weekly/monthly | Reporting scenarios |
-| Multi-level partition | Day + hour (`ds`, `hour`) | Finer-grained lifecycle management needed |
-
-Selection principles:
-- Finer granularity → smaller data volume per refresh → higher incremental efficiency
-- Finer granularity → more partitions → more complex management and scheduling
-- Granularity should match the data write frequency: if data is written hourly, partition granularity should not be finer than hourly
-
-## Determining Partition Strategy from Original ETL
-
-| Original ETL pattern | Recommended DT partition strategy |
-|--------------|----------------|
-| `INSERT OVERWRITE TABLE t PARTITION(ds='${ds}')` | Static partition DT, day-level |
-| `INSERT OVERWRITE TABLE t PARTITION(ds='${ds}', hour='${hour}')` | Static partition DT, day+hour level |
-| `INSERT OVERWRITE TABLE t PARTITION(ds)` (dynamic partition write) | Dynamic partition DT or static partition DT (depends on whether per-partition management is needed) |
-| `INSERT INTO TABLE t SELECT ...` (no partition) | Dynamic partition DT |
-| `INSERT OVERWRITE TABLE t SELECT ...` (full table overwrite) | Dynamic partition DT |
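Reviewer note: the Selection Decision Tree in the deleted dt-declaration-strategy.md reduces to three yes/no questions, which the following Python sketch encodes. The function name `choose_dt_strategy` and its boolean parameters are hypothetical, and the two return labels are shorthand for "static partition DT" and "dynamic partition DT" as the file uses those terms.

```python
def choose_dt_strategy(has_partition_key, overwrites_by_partition, needs_partition_lifecycle):
    """Walk the decision tree and return 'static' or 'dynamic'."""
    # No clear time/business partition key: global aggregation, real-time summary, etc.
    if not has_partition_key:
        return "dynamic"

    # The original ETL already did INSERT OVERWRITE per partition:
    # keep that granularity and refresh each partition independently.
    if overwrites_by_partition:
        return "static"

    # Large data volume or per-partition lifecycle management still favors static.
    if needs_partition_lifecycle:
        return "static"

    # Simple scenario with no partition management needed.
    return "dynamic"


if __name__ == "__main__":
    print(choose_dt_strategy(True, True, False))
    print(choose_dt_strategy(True, False, False))
    print(choose_dt_strategy(False, False, True))
```

The third argument is ignored when there is no partition key, matching the tree, where the lifecycle question only appears on the "Yes" branch.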