@clickzetta/cz-cli-darwin-arm64 0.3.92 → 0.3.94

Files changed (69)
  1. package/bin/cz-cli +0 -0
  2. package/bin/skills/clickzetta-ai-function/SKILL.md +109 -0
  3. package/bin/skills/clickzetta-ai-function/eval_cases.jsonl +4 -0
  4. package/bin/skills/clickzetta-ai-function/references/ai-function-ddl.md +106 -0
  5. package/bin/skills/clickzetta-batch-sync-pipeline/SKILL.md +124 -124
  6. package/bin/skills/clickzetta-batch-sync-pipeline/eval_cases.jsonl +5 -5
  7. package/bin/skills/clickzetta-bi-connect/SKILL.md +79 -78
  8. package/bin/skills/clickzetta-bi-connect/references/bi-tools.md +56 -56
  9. package/bin/skills/clickzetta-cdc-sync-pipeline/SKILL.md +386 -382
  10. package/bin/skills/clickzetta-cdc-sync-pipeline/eval_cases.jsonl +5 -5
  11. package/bin/skills/clickzetta-data-ingest-pipeline/SKILL.md +73 -212
  12. package/bin/skills/clickzetta-data-science/SKILL.md +57 -56
  13. package/bin/skills/clickzetta-data-science/references/bitmap-profile.md +38 -38
  14. package/bin/skills/clickzetta-data-science/references/data-patterns.md +16 -16
  15. package/bin/skills/clickzetta-data-science/references/setup.md +28 -28
  16. package/bin/skills/clickzetta-data-science/references/stats-functions.md +44 -44
  17. package/bin/skills/clickzetta-data-science/references/write-and-infer.md +22 -22
  18. package/bin/skills/clickzetta-data-science/references/zettapark-api.md +32 -32
  19. package/bin/skills/clickzetta-dw-modeling/SKILL.md +1 -1
  20. package/bin/skills/clickzetta-external-function/SKILL.md +51 -109
  21. package/bin/skills/clickzetta-external-function/eval_cases.jsonl +4 -4
  22. package/bin/skills/clickzetta-external-function/references/external-function-ddl.md +39 -77
  23. package/bin/skills/clickzetta-java-sdk/SKILL.md +49 -48
  24. package/bin/skills/clickzetta-java-sdk/eval_cases.jsonl +12 -12
  25. package/bin/skills/clickzetta-java-sdk/references/bulkload.md +34 -34
  26. package/bin/skills/clickzetta-java-sdk/references/realtime.md +44 -44
  27. package/bin/skills/clickzetta-kafka-ingest-pipeline/SKILL.md +273 -507
  28. package/bin/skills/clickzetta-kafka-ingest-pipeline/references/kafka-pipe-syntax.md +197 -231
  29. package/bin/skills/clickzetta-oss-ingest-pipeline/SKILL.md +231 -304
  30. package/bin/skills/clickzetta-realtime-sync-pipeline/SKILL.md +180 -179
  31. package/bin/skills/clickzetta-realtime-sync-pipeline/eval_cases.jsonl +5 -5
  32. package/bin/skills/clickzetta-semantic-view/SKILL.md +74 -72
  33. package/bin/skills/clickzetta-semantic-view/eval_cases.jsonl +12 -12
  34. package/bin/skills/clickzetta-semantic-view/references/semantic-view-reference.md +75 -75
  35. package/bin/skills/clickzetta-sql-migration/SKILL.md +128 -0
  36. package/bin/skills/clickzetta-sql-migration/eval_cases.jsonl +10 -0
  37. package/bin/skills/clickzetta-sql-migration/references/ddl-reference.md +350 -0
  38. package/bin/skills/clickzetta-sql-migration/references/dml-differences.md +192 -0
  39. package/bin/skills/clickzetta-sql-migration/references/dml-reference.md +279 -0
  40. package/bin/skills/{clickzetta-sql-syntax-guide → clickzetta-sql-migration}/references/dql-reference.md +128 -128
  41. package/bin/skills/clickzetta-sql-migration/references/function-mapping.md +194 -0
  42. package/bin/skills/clickzetta-sql-migration/references/functions-reference.md +372 -0
  43. package/bin/skills/clickzetta-sql-migration/references/implicit-type-conversion.md +143 -0
  44. package/bin/skills/clickzetta-sql-migration/references/migration-databricks.md +260 -0
  45. package/bin/skills/{clickzetta-sql-syntax-guide → clickzetta-sql-migration}/references/migration-snowflake.md +112 -112
  46. package/bin/skills/clickzetta-sql-migration/references/vs-snowflake.md +346 -0
  47. package/bin/skills/clickzetta-sql-migration/references/vs-spark.md +229 -0
  48. package/bin/skills/clickzetta-studio-task-manager/SKILL.md +326 -329
  49. package/bin/skills/clickzetta-table-lineage/SKILL.md +57 -55
  50. package/bin/skills/clickzetta-table-lineage/eval_cases.jsonl +1 -1
  51. package/bin/skills/clickzetta-table-lineage/references/normalize_func.sql +5 -5
  52. package/bin/skills/clickzetta-table-lineage/references/table_cost.sql +6 -6
  53. package/bin/skills/clickzetta-table-lineage/references/table_relation.sql +2 -2
  54. package/bin/skills/clickzetta-volume-manager/SKILL.md +186 -100
  55. package/bin/skills/clickzetta-volume-manager/references/volume-ddl.md +153 -52
  56. package/package.json +1 -1
  57. package/bin/skills/clickzetta-dynamic-table/best-practices/scheduling-guide.md +0 -135
  58. package/bin/skills/clickzetta-dynamic-table/dt-creator/references/dt-declaration-strategy.md +0 -185
  59. package/bin/skills/clickzetta-dynamic-table/dt-creator/references/refresh-history-guide.md +0 -260
  60. package/bin/skills/clickzetta-dynamic-table/dynamic-table-alter/SKILL.md +0 -191
  61. package/bin/skills/clickzetta-sql-syntax-guide/SKILL.md +0 -249
  62. package/bin/skills/clickzetta-sql-syntax-guide/eval_cases.jsonl +0 -3
  63. package/bin/skills/clickzetta-sql-syntax-guide/references/ddl-reference.md +0 -350
  64. package/bin/skills/clickzetta-sql-syntax-guide/references/dml-reference.md +0 -279
  65. package/bin/skills/clickzetta-sql-syntax-guide/references/functions-reference.md +0 -372
  66. package/bin/skills/clickzetta-sql-syntax-guide/references/migration-databricks.md +0 -260
  67. package/bin/skills/clickzetta-sql-syntax-guide/references/vs-snowflake.md +0 -346
  68. package/bin/skills/clickzetta-sql-syntax-guide/references/vs-spark.md +0 -229
  69. package/bin/skills/{clickzetta-sql-syntax-guide → clickzetta-sql-migration}/LICENSE +0 -0
package/bin/skills/clickzetta-volume-manager/references/volume-ddl.md CHANGED
@@ -1,20 +1,35 @@
- # Volume Management Reference
+ # Volume Management Reference
 
- > Source: https://www.yunqi.tech/documents/datalake_volume_object
+ > Source: https://www.yunqi.tech/documents/datalake_volume_object and others
 
- ## Volume Types
+ ## Volume Types
 
- | Type | Description |
- |---|---|
- | External Volume | Mounts object storage paths such as OSS/COS/S3 |
- | Internal Volume | System-managed storage, including User Volume, Table Volume, and named Volume |
+ | Type | Description | Lifecycle |
+ |---|---|---|
+ | External Volume | Mount OSS/COS/S3 object storage paths via Storage Connection | User creates/drops |
+ | Managed Volume | ClickZetta-managed storage, no connection needed | User creates/drops |
+ | User Volume | Auto-created per user per workspace, user-scoped access | Auto-managed; data removed when user deleted |
+ | Table Volume | Auto-created per table, access tied to table permissions | Auto-managed; data removed when table dropped |
+
+ ## SQL Reference Patterns
+
+ ```sql
+ -- External Volume / Managed Volume
+ VOLUME [[<workspace>].<schema>].volume_name
+
+ -- User Volume
+ USER VOLUME
+
+ -- Table Volume
+ TABLE VOLUME [[<workspace>].<schema>].table_name
+ ```
 
  ---
 
  ## CREATE EXTERNAL VOLUME
 
  ```sql
- -- OSS (the Connection must use lowercase access_id/access_key)
+ -- OSS
  CREATE EXTERNAL VOLUME my_oss_volume
  LOCATION 'oss://<bucket>/<path>'
  USING CONNECTION my_oss_conn
@@ -36,20 +51,33 @@ CREATE EXTERNAL VOLUME my_s3_volume
  RECURSIVE = TRUE;
  ```
 
- Parameter descriptions:
- - `LOCATION`: Object storage path
- - `USING CONNECTION`: Name of an already created STORAGE CONNECTION
- - `DIRECTORY`: Directory feature configuration; `ENABLE=TRUE` turns on directory indexing, `AUTO_REFRESH=TRUE` refreshes automatically
- - `RECURSIVE`: Whether to scan subdirectories recursively
+ Parameters:
+ - `LOCATION`: Object storage path
+ - `USING CONNECTION`: Name of an existing STORAGE CONNECTION
+ - `DIRECTORY`: Directory configuration, `ENABLE=TRUE` enables directory indexing, `AUTO_REFRESH=TRUE` enables auto-refresh
+ - `RECURSIVE`: Whether to recursively scan subdirectories
+
+ > If new files are not visible via `SHOW VOLUME DIRECTORY` after upload, run `ALTER VOLUME name REFRESH` manually.
+
+ ---
+
+ ## CREATE VOLUME (Managed Volume)
+
+ Managed Volumes use ClickZetta-managed object storage. No Storage Connection or location is required.
+
+ ```sql
+ CREATE VOLUME my_managed_volume RECURSIVE = TRUE;
+ ```
 
- > ⚠️ If `SHOW VOLUME DIRECTORY` does not show new files after upload, run `ALTER VOLUME name REFRESH` to refresh manually.
+ Parameters:
+ - `RECURSIVE`: Whether to recursively scan subdirectories
 
  ---
 
  ## ALTER VOLUME
 
  ```sql
- -- Refresh directory metadata
+ -- Refresh directory metadata
  ALTER VOLUME my_oss_volume REFRESH;
  ```
 
@@ -57,8 +85,14 @@ ALTER VOLUME my_oss_volume REFRESH;
 
  ## DROP VOLUME
 
+ Only External Volumes and Managed Volumes can be explicitly dropped. User Volume and Table Volume are auto-managed and cannot be dropped.
+
  ```sql
+ -- Drop External Volume
  DROP VOLUME IF EXISTS my_oss_volume;
+
+ -- Drop Managed Volume
+ DROP VOLUME IF EXISTS my_managed_volume;
  ```
 
  ---
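The lifecycle rules in the hunks above (External and Managed Volumes are created and dropped by users, while User and Table Volumes are auto-managed) can be captured in a small lookup. The following is a minimal Python sketch for illustration only: `VolumeKind`, `reference`, and `drop_statement` are hypothetical helper names that are not part of cz-cli, and the generated strings simply follow the SQL reference patterns shown in this diff without the optional workspace/schema qualifiers.

```python
from enum import Enum
from typing import Optional


class VolumeKind(Enum):
    EXTERNAL = "external"
    MANAGED = "managed"
    USER = "user"
    TABLE = "table"


# Per the Volume Types table: only these two kinds are created/dropped by users.
USER_DROPPABLE = {VolumeKind.EXTERNAL, VolumeKind.MANAGED}


def reference(kind: VolumeKind, name: Optional[str] = None) -> str:
    """Build the SQL reference for a volume, following the documented patterns."""
    if kind is VolumeKind.USER:
        return "USER VOLUME"  # no name: bound to the current user
    if not name:
        raise ValueError(f"{kind.name} volume reference needs a name")
    if kind is VolumeKind.TABLE:
        return f"TABLE VOLUME {name}"  # name is the owning table
    return f"VOLUME {name}"  # External and Managed share one pattern


def drop_statement(kind: VolumeKind, name: str) -> str:
    """Return a DROP statement, or raise for auto-managed volume kinds."""
    if kind not in USER_DROPPABLE:
        raise ValueError(f"{kind.name} volumes are auto-managed and cannot be dropped")
    return f"DROP VOLUME IF EXISTS {name};"
```

Names are interpolated without quoting or validation, so a helper like this is only suitable for trusted identifiers.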
@@ -66,134 +100,201 @@ DROP VOLUME IF EXISTS my_oss_volume;
  ## SHOW / DESC VOLUME
 
  ```sql
- -- List all Volumes
+ -- List all Volumes
  SHOW VOLUMES;
 
- -- Filter by condition (SHOW VOLUMES does not support WHERE; use information_schema)
+ -- Filter by condition (SHOW VOLUMES does not support WHERE, use information_schema)
  SELECT volume_name, volume_type, volume_region, volume_creator
  FROM information_schema.volumes
  WHERE volume_type = 'EXTERNAL';
 
- -- Look up by name
+ -- Find by name
  SELECT * FROM information_schema.volumes
  WHERE volume_name = 'my_oss_volume';
 
- -- Show Volume details
+ -- View Volume details
  DESC VOLUME my_oss_volume;
 
- -- Show files under the Volume directory
+ -- View files in Volume directory
  SHOW VOLUME DIRECTORY my_oss_volume;
  ```
 
  ---
 
- ## Viewing Directory Metadata (DIRECTORY function)
+ ## Viewing Directory Metadata (DIRECTORY Function)
 
  ```sql
- -- View Volume directory metadata (run ALTER VOLUME REFRESH first)
+ -- View Volume directory metadata (requires prior ALTER VOLUME REFRESH)
  SELECT * FROM DIRECTORY(VOLUME my_oss_volume);
  ```
 
  ---
 
- ## User Volume Operations
+ ## User Volume Operations
+
+ User Volume is auto-created per user per workspace and bound to the user. It can only be accessed by that user. Cannot be explicitly created or dropped. When the user is deleted, the User Volume becomes unavailable and its data is removed.
+
+ All four Volume types support file-level operations. `PUT` and `GET` require client-side support (e.g., cz-cli, Java JDBC driver, Python connector). **ClickZetta Studio Web does not support PUT/GET.**
 
  ```sql
- -- List files in the User Volume
+ -- List files (all types)
+ SHOW VOLUME DIRECTORY my_oss_volume;
+ SHOW VOLUME DIRECTORY my_managed_volume;
  SHOW USER VOLUME DIRECTORY;
+ SHOW TABLE VOLUME DIRECTORY my_table;
 
- -- Upload a file to the User Volume root directory
- PUT '/local/path/file.csv' TO USER VOLUME;
+ -- Upload files (External / Managed Volume)
+ PUT '/local/path/file.csv' TO VOLUME my_oss_volume;
+ PUT '/local/path/file.csv' TO VOLUME my_managed_volume;
 
- -- Upload to a specified target path
+ -- Upload to User Volume
+ PUT '/local/path/file.csv' TO USER VOLUME;
  PUT '/local/path/file.csv' TO USER VOLUME FILE 'subdir/file.csv';
-
- -- Upload multiple files with a wildcard
  PUT '/local/path/images/*' TO USER VOLUME SUBDIRECTORY 'images/';
 
- -- Download a file
+ -- Upload to Table Volume
+ PUT '/local/path/file.csv' TO TABLE VOLUME my_table;
+
+ -- Download files (External / Managed Volume)
+ GET VOLUME my_oss_volume FILE 'subdir/file.csv' TO '/local/output/';
+ GET VOLUME my_managed_volume FILE 'subdir/file.csv' TO '/local/output/';
+
+ -- Download from User Volume
  GET USER VOLUME FILE 'subdir/file.csv' TO '/local/output/';
 
- -- Delete a file
+ -- Download from Table Volume
+ GET TABLE VOLUME my_table FILE 'subdir/file.csv' TO '/local/output/';
+
+ -- Delete files (all types)
+ REMOVE VOLUME my_oss_volume FILE 'subdir/file.csv';
+ REMOVE VOLUME my_managed_volume FILE 'subdir/file.csv';
  REMOVE USER VOLUME FILE 'subdir/file.csv';
+ REMOVE TABLE VOLUME my_table FILE 'subdir/file.csv';
 
- -- Delete all files under a directory
+ -- Delete all files in a directory
  REMOVE USER VOLUME SUBDIRECTORY '/';
  ```
 
  ---
 
- ## Querying Volume Data (SELECT FROM VOLUME)
+ ## Querying Data from Volume (SELECT FROM VOLUME)
 
  ```sql
- -- Query CSV files
+ -- Query External Volume files
  SELECT * FROM VOLUME my_oss_volume
  USING CSV
  OPTIONS('header' = 'true', 'sep' = ',')
  SUBDIRECTORY 'data/'
  LIMIT 100;
 
- -- Query Parquet files
+ -- Query Managed Volume files
+ SELECT * FROM VOLUME my_managed_volume
+ USING CSV
+ OPTIONS('header' = 'true')
+ FILES('data.csv');
+
+ -- Query Parquet files
  SELECT * FROM VOLUME my_oss_volume
  USING PARQUET
  FILES('part-00001.parquet', 'part-00002.parquet');
 
- -- Match files by regular expression
+ -- Regex match files
  SELECT * FROM VOLUME my_oss_volume
  USING PARQUET
  REGEXP '.*2024-0[1-3].parquet';
 
- -- Query User Volume files
+ -- Query User Volume files
  SELECT * FROM USER VOLUME
  USING CSV
  OPTIONS('header' = 'true')
  FILES('data.csv')
  LIMIT 10;
+
+ -- Query Table Volume files
+ SELECT * FROM TABLE VOLUME my_table
+ USING CSV
+ OPTIONS('header' = 'true')
+ FILES('data.csv')
+ LIMIT 10;
  ```
 
- Supported formats: `CSV`, `PARQUET`, `ORC`, `JSON`, `BSON`
+ Supported formats: `CSV`, `PARQUET`, `ORC`, `JSON`, `BSON`
 
- Common CSV OPTIONS parameters:
- - `header`: whether there is a header row, default `false`
- - `sep`: column separator, default `,`
- - `compression`: compression format (gzip/zstd/zlib)
- - `multiLine`: whether multi-line fields are supported, default `false`
+ Common CSV OPTIONS parameters:
+ - `header`: Whether the file has a header row, default `false`
+ - `sep`: Column delimiter, default `,`
+ - `compression`: Compression format (gzip/zstd/zlib)
+ - `multiLine`: Whether multi-line fields are supported, default `false`
 
  ---
 
- ## COPY INTO TABLE (import from Volume)
+ ## COPY INTO TABLE (Import from Volume)
 
  ```sql
+ -- Import from External Volume
  COPY INTO my_table
  FROM VOLUME my_oss_volume
  USING CSV
  OPTIONS('header' = 'true')
  SUBDIRECTORY 'data/';
+
+ -- Import from Managed Volume
+ COPY INTO my_table
+ FROM VOLUME my_managed_volume
+ USING CSV
+ OPTIONS('header' = 'true')
+ FILES('data.csv');
+
+ -- Import from User Volume
+ COPY INTO my_table
+ FROM USER VOLUME
+ USING CSV
+ OPTIONS('header' = 'true')
+ FILES('data.csv');
+
+ -- Import from Table Volume
+ COPY INTO my_table
+ FROM TABLE VOLUME source_table
+ USING CSV
+ OPTIONS('header' = 'true')
+ FILES('data.csv');
  ```
 
- ## COPY INTO VOLUME (export to Volume)
+ ## COPY INTO VOLUME (Export to Volume)
 
  ```sql
- -- Export a table to an External Volume
+ -- Export table to External Volume
  COPY INTO VOLUME my_oss_volume
  SUBDIRECTORY 'export/'
  FROM TABLE my_table
  FILE_FORMAT = (TYPE = CSV);
 
- -- Export a query result
+ -- Export query result
  COPY INTO VOLUME my_oss_volume
  SUBDIRECTORY 'export/'
  FROM (SELECT * FROM orders WHERE year = 2024)
  FILE_FORMAT = (TYPE = PARQUET COMPRESSION = 'GZIP');
 
- -- Export to User Volume
+ -- Export to Managed Volume
+ COPY INTO VOLUME my_managed_volume
+ SUBDIRECTORY 'export/'
+ FROM TABLE my_table
+ FILE_FORMAT = (TYPE = CSV);
+
+ -- Export to User Volume
  COPY INTO USER VOLUME
  SUBDIRECTORY 'export/'
  FROM TABLE my_table
  FILE_FORMAT = (TYPE = CSV);
+
+ -- Export to Table Volume
+ COPY INTO TABLE VOLUME target_table
+ SUBDIRECTORY 'export/'
+ FROM TABLE my_table
+ FILE_FORMAT = (TYPE = CSV);
  ```
 
- > ⚠️ **Key distinction**:
- > - **Import** (COPY INTO TABLE / SELECT FROM VOLUME): use `USING CSV/PARQUET/JSON` + `OPTIONS(...)`
- > - **Export** (COPY INTO VOLUME): use `FILE_FORMAT = (TYPE = CSV/PARQUET/JSON)`
- > - The two syntaxes must not be mixed!
+ > **Key distinction**:
+ > - **Import** (COPY INTO TABLE / SELECT FROM VOLUME): Use `USING CSV/PARQUET/JSON` + `OPTIONS(...)`
+ > - **Export** (COPY INTO VOLUME): Use `FILE_FORMAT = (TYPE = CSV/PARQUET/JSON)`
+ > - These two syntaxes are not interchangeable!
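The import/export distinction called out at the end of this file is easy to get wrong when statements are generated programmatically. As a rough illustration, the Python sketch below keeps the two format clauses in separate functions so they cannot be mixed. All function names here are hypothetical and not part of cz-cli or any ClickZetta SDK; the output only mirrors the statement shapes shown in the diff, and values are interpolated without escaping.

```python
from typing import Dict, Optional, Sequence


def import_format_clause(fmt: str, options: Optional[Dict[str, str]] = None) -> str:
    """Import side (COPY INTO TABLE / SELECT FROM VOLUME): USING <fmt> + OPTIONS(...)."""
    clause = f"USING {fmt.upper()}"
    if options:
        pairs = ", ".join(f"'{key}' = '{value}'" for key, value in options.items())
        clause += f"\nOPTIONS({pairs})"
    return clause


def export_format_clause(fmt: str) -> str:
    """Export side (COPY INTO VOLUME): FILE_FORMAT = (TYPE = <fmt>)."""
    return f"FILE_FORMAT = (TYPE = {fmt.upper()})"


def copy_into_table(table: str, volume_ref: str, fmt: str,
                    options: Optional[Dict[str, str]] = None,
                    files: Sequence[str] = ()) -> str:
    """Assemble an import statement from a volume reference such as 'USER VOLUME'."""
    lines = [f"COPY INTO {table}", f"FROM {volume_ref}", import_format_clause(fmt, options)]
    if files:
        quoted = ", ".join(f"'{name}'" for name in files)
        lines.append(f"FILES({quoted})")
    return "\n".join(lines) + ";"


def copy_into_volume(volume_ref: str, subdirectory: str, table: str, fmt: str) -> str:
    """Assemble an export statement that writes a table into a volume subdirectory."""
    lines = [
        f"COPY INTO {volume_ref}",
        f"SUBDIRECTORY '{subdirectory}'",
        f"FROM TABLE {table}",
        export_format_clause(fmt),
    ]
    return "\n".join(lines) + ";"
```

For example, `copy_into_table("my_table", "USER VOLUME", "csv", {"header": "true"}, ["data.csv"])` reproduces the "Import from User Volume" statement above, and `copy_into_volume("VOLUME my_oss_volume", "export/", "my_table", "csv")` reproduces the first export statement.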
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@clickzetta/cz-cli-darwin-arm64",
- "version": "0.3.92",
+ "version": "0.3.94",
  "description": "cz-cli binary for macOS ARM64 (Apple Silicon)",
  "os": [
  "darwin"
package/bin/skills/clickzetta-dynamic-table/best-practices/scheduling-guide.md DELETED
@@ -1,135 +0,0 @@
- # Dynamic Table Scheduling Method Selection Guide
-
- ## Comparison of Two Scheduling Methods
-
- | Method | Approach | Advantages | Disadvantages |
- |------|------|------|------|
- | **DDL built-in scheduling** (REFRESH INTERVAL) | Write a `REFRESH INTERVAL` clause in CREATE DYNAMIC TABLE; Lakehouse triggers automatically | Simple; no additional configuration needed | No alerts, no dependency orchestration; refresh status can only be checked via manual SQL |
- | **Studio Task scheduling** (recommended) | Create a scheduled task in Studio; task content is the `REFRESH DYNAMIC TABLE` command | Supports upstream/downstream dependencies, unified alerts, visual monitoring | Requires creating an additional Task |
-
- **Studio Task scheduling is recommended for production environments.** DDL built-in scheduling is suitable for quick validation and development/testing phases.
-
- ---
-
- ## DDL Built-in Scheduling
-
- Define the refresh frequency via the `REFRESH INTERVAL` clause in the CREATE statement; Lakehouse triggers periodically:
-
- ```sql
- CREATE DYNAMIC TABLE sales_daily
- REFRESH INTERVAL 1 DAY
- VCLUSTER default
- AS
- SELECT DATE(created_at) AS dt, SUM(amount) AS total
- FROM orders
- GROUP BY 1;
- ```
-
- ### Drawbacks
-
- - **No alerts**: refresh failures are not proactively notified; status can only be checked by manually executing SQL
- - **No dependency orchestration**: cannot declare "refresh only after upstream task completes"; can only stagger by time interval
- - **High monitoring cost**: need to periodically manually execute the following command to check whether refresh is normal
-
- ```sql
- -- View refresh history; confirm state is SUCCEED
- SHOW DYNAMIC TABLE REFRESH HISTORY WHERE name = 'your_dt_name';
- ```
-
- Key field descriptions:
-
- | Field | Meaning |
- |------|------|
- | `state` | SUCCEED / FAILED / RUNNING / QUEUED |
- | `refresh_mode` | INCREMENTAL / FULL / NO_DATA |
- | `error_message` | Error message on failure |
- | `duration` | Duration of this refresh |
- | `stats` | Incremental row count (rows_inserted / rows_deleted) |
-
- ---
-
- ## Studio Task Scheduling (Recommended for Production)
-
- Create a SQL task in Studio; task content is the REFRESH command; managed by Studio's scheduling system.
-
- ### Task Content
-
- **Non-partitioned DT:**
-
- ```sql
- REFRESH DYNAMIC TABLE schema_name.dt_name;
- ```
-
- **Partitioned DT (with parameters):**
-
- ```sql
- SET dt.args.ds = '${bizdate}';
- REFRESH DYNAMIC TABLE schema_name.dt_name PARTITION (ds = '${bizdate}');
- ```
-
- `${bizdate}` is automatically replaced with the business date by the Studio scheduling engine at each execution.
-
- ### Must Configure Self-dependency
-
- Concurrent REFRESH on the same DT is prohibited (causes write conflicts or data inconsistency). The Task must enable **self-dependency** to ensure the next instance starts only after the previous one completes.
-
- ### Upstream Dependency Configuration
-
- - If the DT's source table data needs to wait for an upstream task to produce before refreshing → configure upstream dependency
- - If source table data does not require synchronized readiness (e.g., real-time write table) → upstream dependency is optional
-
- ### Alert Configuration
-
- Studio Tasks support the following alert rules; all are recommended for production environments:
-
- - **Failure alert**: notify when task execution fails
- - **Timeout alert**: notify when refresh duration exceeds a threshold (used to detect performance regression)
- - **Not-run alert**: notify when the task has not started within the expected time
-
- ---
-
- ## Scheduling Orchestration for Multi-level DT Pipelines
-
- When multiple DTs form upstream/downstream dependencies (e.g., DT_A → DT_B → DT_C), each DT corresponds to one Studio Task; task dependency relationships ensure execution order:
-
- ```
- Task_A (REFRESH DT_A)
-   └─ Task_B (REFRESH DT_B, depends on Task_A)
-        └─ Task_C (REFRESH DT_C, depends on Task_B)
- ```
-
- REFRESHes for different partitions can run in parallel (assigned to different Task instances); concurrent refresh of the same partition/non-partitioned DT is prohibited.
-
- ---
-
- ## Decision Logic: Recommend Scheduling Method to Users
-
- When helping users create or configure a DT, recommend based on the following logic:
-
- 1. **Is Studio available?**
-    - Yes → always recommend Studio Task scheduling, regardless of development or production environment
-    - No → use DDL built-in scheduling or a third-party scheduling engine
-
- 2. **Are there upstream/downstream dependencies?**
-    - Yes (e.g., source table is produced by another task) → must use Studio Task; configure upstream dependency
-    - No → still recommend Studio Task to gain alert capability
-
- 3. **User has already written a REFRESH INTERVAL clause?**
-    - Suggest: the REFRESH INTERVAL clause can be removed and replaced with Studio Task scheduling to gain alert and dependency management capability
-    - REFRESH INTERVAL and Studio Task can coexist, but will cause double triggering; choosing one is recommended
-
- ---
-
- ## Alert Message Template
-
- When the user is using DDL built-in scheduling, use the following message:
-
- > 💡 **Suggestion**: You are currently using DDL built-in scheduling (REFRESH INTERVAL), which has the following limitations:
- >
- > 1. **No alerts**: refresh failures are not proactively notified; you need to manually execute `SHOW DYNAMIC TABLE REFRESH HISTORY` to check status
- > 2. **No dependency orchestration**: upstream/downstream task dependencies cannot be declared; can only stagger by time interval
- >
- > **Recommendation**: Create a scheduled task in Studio with content `REFRESH DYNAMIC TABLE schema.dt_name`, and configure:
- > - Self-dependency (prevent concurrent refresh)
- > - Failure alert + timeout alert
- > - Upstream dependency (if source table is produced by other tasks)
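The three-question decision logic in the removed scheduling guide can be read as a small pure function. The sketch below is one possible encoding in Python, written only to make the branching explicit: `Recommendation` and `recommend_scheduling` are hypothetical names, and the setting labels are plain strings taken from the guide's wording rather than identifiers from Studio or cz-cli.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Recommendation:
    method: str  # "studio_task" or "ddl_or_third_party"
    settings: List[str] = field(default_factory=list)
    notes: List[str] = field(default_factory=list)


def recommend_scheduling(studio_available: bool,
                         has_upstream_dependency: bool,
                         has_refresh_interval: bool) -> Recommendation:
    """Apply the guide's decision logic to one Dynamic Table."""
    if not studio_available:
        # Question 1, "No" branch: DDL built-in or a third-party scheduler.
        rec = Recommendation("ddl_or_third_party")
        if has_upstream_dependency:
            rec.notes.append(
                "REFRESH INTERVAL cannot declare dependencies; stagger by time interval"
            )
        return rec

    # Studio is available: always a Studio Task, with self-dependency and alerts.
    rec = Recommendation(
        "studio_task",
        settings=["self-dependency", "failure alert", "timeout alert", "not-run alert"],
    )
    if has_upstream_dependency:
        # Question 2: sources produced by other tasks need an upstream dependency.
        rec.settings.append("upstream dependency")
    if has_refresh_interval:
        # Question 3: keeping both schedulers would trigger each refresh twice.
        rec.notes.append("remove REFRESH INTERVAL to avoid double triggering")
    return rec
```

Calling `recommend_scheduling(True, True, True)` yields a Studio Task recommendation that includes the upstream dependency setting and a single note about removing `REFRESH INTERVAL`.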
package/bin/skills/clickzetta-dynamic-table/dt-creator/references/dt-declaration-strategy.md DELETED
@@ -1,185 +0,0 @@
- # Dynamic Table Declaration Strategy
-
- DT has two creation syntaxes: static partition DT and dynamic partition DT (non-partitioned DT can be viewed as a special case of dynamic partition). The two differ fundamentally in creation syntax, refresh behavior, and incremental behavior.
-
- ## Core Concepts
-
- ### Static Partition DT (Partitioned DT with SESSION_CONFIGS args)
-
- The SQL references partition parameters via `SESSION_CONFIGS()`, and a specific partition value is specified at each REFRESH. Each partition refreshes independently; each partition refresh unit can be viewed as an independent DT.
-
- ```sql
- CREATE DYNAMIC TABLE order_daily (
- id BIGINT, amount DECIMAL(12,2), ds STRING
- )
- PARTITIONED BY (ds)
- AS
- SELECT id, amount, SESSION_CONFIGS()['dt.args.ds'] AS ds
- FROM orders
- WHERE ds = SESSION_CONFIGS()['dt.args.ds'];
-
- -- Specify partition at refresh time
- set dt.args.ds=2025-01-01
- REFRESH DYNAMIC TABLE order_daily PARTITION(ds = '2025-01-01');
- ```
-
- ### Dynamic Partition DT (Non-partitioned DT / DT without args)
-
- The SQL does not reference `SESSION_CONFIGS()`, or although partitioned, the partition values are dynamically produced by the query logic. Each REFRESH processes all incremental data from all source tables.
-
- Dynamic partition DTs do not allow any command other than REFRESH to modify data (INSERT/UPDATE/DELETE/MERGE are all unavailable); data is driven entirely by REFRESH.
-
- Therefore, the following ETL scenarios are not suitable for dynamic partition DT:
- - Need to manually patch data (e.g., a few rows are found to be incorrect and need to be directly UPDATEd)
- - Need to delete data by condition (e.g., cleaning dirty data, deleting expired records)
- - Need MERGE INTO for upsert (e.g., consuming a stream and merging into a target table in a CDC scenario)
- - Need INSERT INTO to append external data (e.g., manually importing a batch of supplementary data)
- - Need to backfill or re-refresh partitions independently (dynamic partition DT can only do a full table refresh; individual partitions cannot be refreshed separately)
- - Downstream tasks need to write to the same table (DT has exclusive write ownership)
-
- ```sql
- CREATE DYNAMIC TABLE order_summary (
- category STRING, total_amount DECIMAL(12,2)
- )
- AS
- SELECT category, SUM(amount) AS total_amount
- FROM orders
- GROUP BY category;
-
- -- No partition specified at refresh time
- REFRESH DYNAMIC TABLE order_summary;
- ```
-
- ## Key Differences
-
- | Dimension | Static Partition DT | Dynamic Partition DT |
- |------|-----------|-----------|
- | Does SQL contain `SESSION_CONFIGS()`? | Yes, used to reference partition parameters | No |
- | REFRESH syntax | `REFRESH ... PARTITION(ds='xxx')` | `REFRESH ...` (no PARTITION) |
- | Incremental scope | Only processes incremental data for the specified partition | Processes all incremental data from all source tables |
- | Scheduling method | External scheduler triggers one partition at a time | External scheduler triggers on a timer |
- | Data lifecycle | Managed per partition; can backfill/delete independently | Managed as a whole table |
- | State tables | Maintained independently per partition | Maintained globally |
- | Suitable data patterns | T+1 batch processing, time-partitioned ETL | Real-time streams, global aggregation, no clear partition key |
-
- ## Selection Decision Tree
-
- ```
- Does your data have a clear time/business partition key?
-
- ├─ Yes → Was the original ETL doing INSERT OVERWRITE by partition?
- │   │
- │   ├─ Yes → Use static partition DT
- │   │         (maintain the original partition granularity; each partition refreshes independently)
- │   │
- │   └─ No → Is the data volume large? Do you need per-partition lifecycle management?
- │       │
- │       ├─ Yes → Use static partition DT
- │       │         (even if the original was not partitioned, adding partitions is recommended for manageability)
- │       │
- │       └─ No → Use dynamic partition DT
- │                 (simple scenario; no partition management needed)
-
- └─ No → Use dynamic partition DT
-          (global aggregation, real-time summary, etc.)
- ```
-
- ## Static Partition DT: Details
-
- ### Applicable Scenarios
-
- 1. **T+1 batch ETL migration**
-    - Original SQL follows the `INSERT OVERWRITE TABLE t PARTITION(ds='${ds}')` pattern
-    - Refreshes once per day/hour by partition
-    - Needs to support historical partition backfill
-
- 2. **Sliding window computation**
-    - E.g., aggregation over the last 7 days, period-over-period comparison
-    - SQL references `SESSION_CONFIGS()['dt.args.ds']` and `sub_days(...)` for window range
-
- 3. **Per-partition data lifecycle management**
-    - Automatically clean up expired partitions via `data_lifecycle`
-    - Can backfill a single partition without affecting others
-
- 4. **Self-referencing DT (daily comparison, SCD)**
-    - Current partition depends on the result of the previous partition
-    - Must use static partition, because "current partition" and "previous partition" need to be explicitly specified
-
- ### Refresh Method
-
- ```sql
- -- Refresh one partition at a time
- set dt.args.ds=2025-01-15
- REFRESH DYNAMIC TABLE my_dt PARTITION(ds = '2025-01-15');
-
- -- Multi-level partition
- set dt.args.pt=20250411
- set dt.args.pt_hour=01
- REFRESH DYNAMIC TABLE my_dt PARTITION(pt = '20250411', pt_hour = '01');
- ```
-
- ### Notes
-
- - Use `cz.optimizer.incremental.backfill.enabled=TRUE` for backfill; it will automatically use full refresh
- - Partition parameters are passed via `set dt.args.xxx=value`; the PARTITION clause in the REFRESH statement specifies the partition value
-
- ## Dynamic Partition DT: Details
-
- ### Applicable Scenarios
-
- 1. **Real-time stream data aggregation**
-    - Source table continuously writes; DT refreshes on a schedule
-    - No partition management needed; each refresh processes all new data
-
- 2. **Global summary tables**
-    - E.g., global TopN, global count, global deduplication
-    - No clear partition key
-
- 3. **Simple JOIN + filter**
-    - Simple transformations without partition parameters
-    - E.g., fact table JOIN dimension table, output wide table
-
- 4. **Multi-source merge (UNION ALL)**
-    - Data from multiple source tables merged into one table
-    - No partition management needed
-
- ### Refresh Method
-
- ```sql
- -- Refresh directly; processes all incremental data from all source tables
- REFRESH DYNAMIC TABLE my_dt;
- ```
-
- ### Notes
-
- - Each refresh processes all incremental data from all source tables; if source table change volume is large, refresh may be slow
- - State tables are maintained globally and may grow as data volume increases
- - Per-partition backfill is not supported; only full table refresh is possible
- - Suitable for scenarios where the change ratio is small (< 5%)
-
- ## Partition Granularity Selection
-
- When choosing a static partition DT, you also need to decide on partition granularity:
-
- | Data pattern | Recommended granularity | Notes |
- |---------|------------|------|
- | Strictly ordered time series (e.g., logs) | Minute-level (`dt_min`) | High data volume, frequent writes |
- | Roughly ordered, small amount of late data | Hour-level (`dt_hour`) | Balance between granularity and management complexity |
- | T+1 batch import | Day-level (`ds`) | Most common ETL scenario |
- | By business cycle | Weekly/monthly | Reporting scenarios |
- | Multi-level partition | Day + hour (`ds`, `hour`) | Finer-grained lifecycle management needed |
-
- Selection principles:
- - Finer granularity → smaller data volume per refresh → higher incremental efficiency
- - Finer granularity → more partitions → more complex management and scheduling
- - Granularity should match the data write frequency: if data is written hourly, partition granularity should not be finer than hourly
-
- ## Determining Partition Strategy from Original ETL
-
- | Original ETL pattern | Recommended DT partition strategy |
- |--------------|----------------|
- | `INSERT OVERWRITE TABLE t PARTITION(ds='${ds}')` | Static partition DT, day-level |
- | `INSERT OVERWRITE TABLE t PARTITION(ds='${ds}', hour='${hour}')` | Static partition DT, day+hour level |
- | `INSERT OVERWRITE TABLE t PARTITION(ds)` (dynamic partition write) | Dynamic partition DT or static partition DT (depends on whether per-partition management is needed) |
- | `INSERT INTO TABLE t SELECT ...` (no partition) | Dynamic partition DT |
- | `INSERT OVERWRITE TABLE t SELECT ...` (full table overwrite) | Dynamic partition DT |
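The selection decision tree and the refresh conventions in the removed declaration-strategy file can be summarized in a few lines of code. The Python sketch below is illustrative only: `choose_dt_strategy` and `refresh_statements` are hypothetical helpers, the third tree question is collapsed into a single flag as an assumption, and the emitted `set dt.args.<name>=<value>` lines copy the form used in that file's examples verbatim rather than asserting the exact syntax the engine requires.

```python
from typing import Dict, List, Optional


def choose_dt_strategy(has_partition_key: bool,
                       was_insert_overwrite_by_partition: bool = False,
                       large_or_needs_partition_lifecycle: bool = False) -> str:
    """Walk the selection decision tree; returns 'static' or 'dynamic'."""
    if not has_partition_key:
        return "dynamic"  # global aggregation, real-time summary, etc.
    if was_insert_overwrite_by_partition:
        return "static"  # keep the original partition granularity
    # Partition key exists but the original ETL was not partition-overwrite.
    return "static" if large_or_needs_partition_lifecycle else "dynamic"


def refresh_statements(table: str,
                       partition: Optional[Dict[str, str]] = None) -> List[str]:
    """Build the refresh commands for one run.

    Static partition DTs pass each partition value twice: once as a
    dt.args.* session parameter and once in the PARTITION clause.
    """
    if not partition:
        return [f"REFRESH DYNAMIC TABLE {table};"]
    statements = [f"set dt.args.{key}={value}" for key, value in partition.items()]
    spec = ", ".join(f"{key} = '{value}'" for key, value in partition.items())
    statements.append(f"REFRESH DYNAMIC TABLE {table} PARTITION({spec});")
    return statements
```

For instance, `refresh_statements("my_dt", {"pt": "20250411", "pt_hour": "01"})` produces the same three lines as the multi-level partition example in the "Refresh Method" section above.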