@clickzetta/cz-cli-darwin-arm64 0.3.87-dev.20260528223948 → 0.3.88

Files changed (27)
  1. package/bin/cz-cli +0 -0
  2. package/bin/skills/clickzetta-dynamic-table/SKILL.md +169 -169
  3. package/bin/skills/clickzetta-dynamic-table/best-practices/dimension-table-join-guide.md +126 -126
  4. package/bin/skills/clickzetta-dynamic-table/best-practices/medallion-and-stream-patterns.md +25 -25
  5. package/bin/skills/clickzetta-dynamic-table/best-practices/non-partitioned-merge-into-warning.md +48 -48
  6. package/bin/skills/clickzetta-dynamic-table/best-practices/performance-optimization.md +51 -51
  7. package/bin/skills/clickzetta-dynamic-table/best-practices/scheduling-guide.md +59 -59
  8. package/bin/skills/clickzetta-dynamic-table/dt-creator/SKILL.md +8 -7
  9. package/bin/skills/clickzetta-dynamic-table/dt-creator/references/dt-declaration-strategy.md +99 -99
  10. package/bin/skills/clickzetta-dynamic-table/dt-creator/references/incremental-config-reference.md +188 -188
  11. package/bin/skills/clickzetta-dynamic-table/dt-creator/references/refresh-history-guide.md +117 -117
  12. package/bin/skills/clickzetta-dynamic-table/dt-creator/references/sql-limitations.md +29 -29
  13. package/bin/skills/clickzetta-dynamic-table/dynamic-table-alter/SKILL.md +80 -79
  14. package/bin/skills/clickzetta-dynamic-table/sql-to-dt/SKILL.md +15 -15
  15. package/bin/skills/clickzetta-dynamic-table/sql-to-dt/references/sql2dt-column-validation-rules.md +61 -61
  16. package/bin/skills/clickzetta-dynamic-table/sql-to-dt/references/sql2dt-conversion-rules.md +100 -100
  17. package/bin/skills/clickzetta-dynamic-table/sql-to-dt/references/sql2dt-placeholder-rules.md +64 -64
  18. package/bin/skills/clickzetta-dynamic-table/sql-to-dt/references/sql2dt-refresh-rules.md +32 -32
  19. package/bin/skills/clickzetta-dynamic-table/sql-to-dt/references/sql2dt-self-reference-rules.md +21 -21
  20. package/bin/skills/clickzetta-dynamic-table/sql-to-dt/references/sql2dt-workflow.md +71 -71
  21. package/bin/skills/clickzetta-sql-pipeline-manager/SKILL.md +203 -202
  22. package/bin/skills/clickzetta-sql-pipeline-manager/references/dynamic-table.md +62 -62
  23. package/bin/skills/clickzetta-sql-pipeline-manager/references/materialized-view.md +34 -34
  24. package/bin/skills/clickzetta-sql-pipeline-manager/references/pipe.md +61 -61
  25. package/bin/skills/clickzetta-sql-pipeline-manager/references/table-stream.md +41 -41
  26. package/bin/skills/clickzetta-table-stream-pipeline/SKILL.md +103 -101
  27. package/package.json +1 -1
@@ -1,14 +1,14 @@
1
- # Dynamic Table(动态表)SQL 参考
1
+ # Dynamic Table SQL Reference
2
2
 
3
- > **⚠️ ClickZetta 特有语法**
4
- > - 刷新调度写法:`REFRESH INTERVAL 5 MINUTE vcluster default`(不是 `TARGET_LAG`)
5
- > - 修改调度周期或计算集群必须用 `CREATE OR REPLACE`,`ALTER` 不支持
6
- > - `ALTER DYNAMIC TABLE` 只支持:SUSPEND / RESUME / SET COMMENT / RENAME COLUMN / CHANGE COLUMN COMMENT / SET/UNSET PROPERTIES
7
- > - 删除用 `DROP DYNAMIC TABLE`(不是 `DROP TABLE`)
8
- > - 恢复用 `UNDROP TABLE`(不是 `UNDROP DYNAMIC TABLE`)
9
- > - DESC 用 `DESC TABLE name`(不支持 `DESC DYNAMIC TABLE name EXTENDED`)
3
+ > **⚠️ ClickZetta-specific syntax**
4
+ > - Refresh schedule syntax: `REFRESH INTERVAL 5 MINUTE vcluster default` (not `TARGET_LAG`)
5
+ > - Modifying the schedule interval or compute cluster requires `CREATE OR REPLACE`; `ALTER` does not support this
6
+ > - `ALTER DYNAMIC TABLE` only supports: SUSPEND / RESUME / SET COMMENT / RENAME COLUMN / CHANGE COLUMN COMMENT / SET/UNSET PROPERTIES
7
+ > - Drop with `DROP DYNAMIC TABLE` (not `DROP TABLE`)
8
+ > - Restore with `UNDROP TABLE` (not `UNDROP DYNAMIC TABLE`)
9
+ > - Describe with `DESC TABLE name` (does not support `DESC DYNAMIC TABLE name EXTENDED`)
10
10
 
11
- 动态表是 ClickZetta Lakehouse 的核心增量计算对象。通过 SQL 查询定义,自动增量刷新,无需手动调度。
11
+ Dynamic Tables are the core incremental computation objects in ClickZetta Lakehouse. Defined by a SQL query, they refresh automatically and incrementally without manual scheduling.
12
12
 
13
13
  ## CREATE DYNAMIC TABLE
14
14
 
@@ -25,15 +25,15 @@ AS
25
25
  <query>;
26
26
  ```
27
27
 
28
- **关键参数:**
29
- - `REFRESH INTERVAL <n> MINUTE`:刷新间隔,最小 1 分钟
30
- - `vcluster`:运行刷新任务的计算集群名称(直接跟名称,不带等号和引号)
31
- - `OR REPLACE`:若同名动态表已存在则替换(修改 SQL 逻辑或调度配置必须用此方式)
32
- - 建议使用 GP 型集群(如 `default`),AP 型集群不支持小文件合并
28
+ **Key parameters:**
29
+ - `REFRESH INTERVAL <n> MINUTE`: refresh interval, minimum 1 minute
30
+ - `vcluster`: name of the compute cluster to run refresh jobs (name directly, no equals sign or quotes)
31
+ - `OR REPLACE`: replaces an existing Dynamic Table with the same name (required when modifying SQL logic or scheduling config)
32
+ - Recommended: use a GP-type cluster (e.g., `default`); AP-type clusters do not support small file compaction
33
33
 
34
- **示例:**
34
+ **Examples:**
35
35
  ```sql
36
- -- 基础示例:每 5 分钟刷新一次订单汇总
36
+ -- Basic example: refresh order summary every 5 minutes
37
37
  CREATE OR REPLACE DYNAMIC TABLE dw.order_summary
38
38
  REFRESH INTERVAL 5 MINUTE vcluster default
39
39
  AS
@@ -45,7 +45,7 @@ SELECT
45
45
  FROM ods.orders
46
46
  GROUP BY 1, 2;
47
47
 
48
- -- 修改调度周期(必须用 CREATE OR REPLACE)
48
+ -- Modify refresh interval (must use CREATE OR REPLACE)
49
49
  CREATE OR REPLACE DYNAMIC TABLE dw.order_summary
50
50
  REFRESH INTERVAL 10 MINUTE vcluster default
51
51
  AS
@@ -61,82 +61,82 @@ GROUP BY 1, 2;
61
61
  ## ALTER DYNAMIC TABLE
62
62
 
63
63
  ```sql
64
- -- 暂停刷新
64
+ -- Suspend refresh
65
65
  ALTER DYNAMIC TABLE <name> SUSPEND;
66
66
 
67
- -- 恢复刷新
67
+ -- Resume refresh
68
68
  ALTER DYNAMIC TABLE <name> RESUME;
69
69
 
70
- -- 修改注释
70
+ -- Modify comment
71
71
  ALTER DYNAMIC TABLE <name> SET COMMENT '<comment>';
72
72
 
73
- -- 修改列名
73
+ -- Rename column
74
74
  ALTER DYNAMIC TABLE <name> RENAME COLUMN <old_col> TO <new_col>;
75
75
 
76
- -- 修改列注释(注意用 CHANGE COLUMN)
76
+ -- Modify column comment (note: use CHANGE COLUMN)
77
77
  ALTER DYNAMIC TABLE <name> CHANGE COLUMN <col_name> COMMENT '<comment>';
78
78
 
79
- -- 修改属性
79
+ -- Modify properties
80
80
  ALTER DYNAMIC TABLE <name> SET PROPERTIES ('key' = 'value');
81
81
  ALTER DYNAMIC TABLE <name> UNSET PROPERTIES ('key');
82
82
  ```
83
83
 
84
- > 注意:修改调度周期、计算集群、SQL 查询逻辑,必须用 `CREATE OR REPLACE DYNAMIC TABLE`,ALTER 不支持这些操作。
84
+ > Note: To modify the refresh interval, compute cluster, or SQL query logic, use `CREATE OR REPLACE DYNAMIC TABLE`. ALTER does not support these operations.
85
85
 
86
- ## REFRESH DYNAMIC TABLE(手动触发)
86
+ ## REFRESH DYNAMIC TABLE (Manual Trigger)
87
87
 
88
88
  ```sql
89
- -- 手动触发一次刷新
89
+ -- Manually trigger a single refresh
90
90
  REFRESH DYNAMIC TABLE <name>;
91
91
  ```
92
92
 
93
93
  ## DROP DYNAMIC TABLE
94
94
 
95
95
  ```sql
96
- -- ⚠️ 必须用 DROP DYNAMIC TABLE,不能用 DROP TABLE
96
+ -- ⚠️ Must use DROP DYNAMIC TABLE, not DROP TABLE
97
97
  DROP DYNAMIC TABLE [ IF EXISTS ] <name>;
98
98
 
99
- -- 恢复已删除的动态表(⚠️ UNDROP TABLE,不是 UNDROP DYNAMIC TABLE)
99
+ -- Restore a dropped Dynamic Table (⚠️ use UNDROP TABLE, not UNDROP DYNAMIC TABLE)
100
100
  UNDROP TABLE <name>;
101
101
  ```
102
102
 
103
103
  ## SHOW / DESC
104
104
 
105
105
  ```sql
106
- -- 列出当前 schema 下所有动态表
106
+ -- List all Dynamic Tables in the current schema
107
107
  SHOW TABLES WHERE is_dynamic = true;
108
108
 
109
- -- 列出指定 schema 下的动态表
109
+ -- List Dynamic Tables in a specific schema
110
110
  SHOW TABLES IN <schema_name> WHERE is_dynamic = true;
111
111
 
112
- -- 查看动态表结构
112
+ -- View Dynamic Table structure
113
113
  DESC TABLE <name>;
114
114
 
115
- -- 查看完整建表语句
115
+ -- View full CREATE statement
116
116
  SHOW CREATE TABLE <name>;
117
117
 
118
- -- 查看刷新历史(状态、耗时、触发方式、增量行数)
118
+ -- View refresh history (status, duration, trigger type, incremental row count)
119
119
  SHOW DYNAMIC TABLE REFRESH HISTORY WHERE name = '<dt_name>' LIMIT 20;
120
120
  ```
121
121
 
122
- > ⚠️ **DESC 注意**:动态表用 `DESC TABLE name`,不支持 `DESC DYNAMIC TABLE name EXTENDED`(EXTENDED 会报错)。
122
+ > ⚠️ **DESC note**: Use `DESC TABLE name` for Dynamic Tables. `DESC DYNAMIC TABLE name EXTENDED` is not supported (EXTENDED will cause an error).
123
123
 
124
- ## 注意事项
124
+ ## Notes
125
125
 
126
- - 修改 SQL 逻辑、调度周期、计算集群用 `CREATE OR REPLACE`,不能用 `ALTER`
127
- - 刷新间隔最小 1 分钟
128
- - 删除用 `DROP DYNAMIC TABLE`(不是 `DROP TABLE`)
129
- - 恢复用 `UNDROP TABLE`(不是 `UNDROP DYNAMIC TABLE`)
130
- - 刷新失败不影响表的可查询性(返回上次成功版本的数据)
131
- - 非简单加列/减列的 `CREATE OR REPLACE` 会触发一次全量刷新
132
- - 建议使用 GP 型集群(如 `default`),AP 型集群不支持小文件合并
126
+ - To modify SQL logic, refresh interval, or compute cluster use `CREATE OR REPLACE`; `ALTER` is not supported for these
127
+ - Minimum refresh interval is 1 minute
128
+ - Drop with `DROP DYNAMIC TABLE` (not `DROP TABLE`)
129
+ - Restore with `UNDROP TABLE` (not `UNDROP DYNAMIC TABLE`)
130
+ - Refresh failures do not affect queryability (returns data from the last successful version)
131
+ - A `CREATE OR REPLACE` that is not a simple add/drop column will trigger a full refresh
132
+ - Recommended: use a GP-type cluster (e.g., `default`); AP-type clusters do not support small file compaction
133
133
 
134
- ## 参数化动态表(SESSION_CONFIGS)
134
+ ## Parameterized Dynamic Table (SESSION_CONFIGS)
135
135
 
136
- 通过 `SESSION_CONFIGS()` 函数定义参数化查询,在刷新时传入分区值控制刷新范围:
136
+ Use the `SESSION_CONFIGS()` function to define parameterized queries, passing partition values at refresh time to control the refresh scope:
137
137
 
138
138
  ```sql
139
- -- 创建参数化动态表
139
+ -- Create a parameterized Dynamic Table
140
140
  CREATE OR REPLACE DYNAMIC TABLE dwd.orders_partitioned
141
141
  REFRESH INTERVAL 30 MINUTE vcluster default
142
142
  AS
@@ -144,42 +144,42 @@ SELECT order_id, user_id, amount, dt
144
144
  FROM ods.orders
145
145
  WHERE dt = SESSION_CONFIGS('target_date', CAST(CURRENT_DATE() AS STRING));
146
146
 
147
- -- 手动触发刷新并传入参数
147
+ -- Manually trigger refresh with parameters
148
148
  REFRESH DYNAMIC TABLE dwd.orders_partitioned
149
149
  WITH PROPERTIES ('target_date' = '2024-06-15');
150
150
  ```
151
151
 
152
- 适用场景:传统按天全量 ETL 改造为增量任务,用 SESSION_CONFIGS 替换调度变量。
152
+ Use case: migrating traditional daily full ETL jobs to incremental jobs, replacing scheduling variables with SESSION_CONFIGS.
153
153
 
154
- ## 动态表 DML 操作
154
+ ## Dynamic Table DML Operations
155
155
 
156
- 动态表默认不支持 DML,需先开启参数(每次 DML 前都需要 SET):
156
+ Dynamic Tables do not support DML by default. You must enable the parameter first (must be set before each DML operation):
157
157
 
158
158
  ```sql
159
- -- ⚠️ 必须在同一会话/批次中先执行 SET,再执行 DML
159
+ -- ⚠️ Must execute SET in the same session/batch before the DML
160
160
  SET cz.sql.dt.allow.dml = true;
161
161
  INSERT INTO <name> VALUES (...);
162
162
 
163
- -- 删除
163
+ -- Delete
164
164
  SET cz.sql.dt.allow.dml = true;
165
165
  DELETE FROM <name> WHERE ...;
166
166
  ```
167
167
 
168
- > ⚠️ **DML 注意事项**:
169
- > - `SET cz.sql.dt.allow.dml = true` 必须与 DML 语句在同一执行批次中
170
- > - 执行 DML 后,下一次自动刷新会触发**全量刷新**(而非增量),可能耗时较长
171
- > - UPDATE 可能因内部隐藏列(`MV__KEY`)报错,建议改用 DELETE + INSERT
172
- > - 仅在数据修正等特殊场景使用 DML
168
+ > ⚠️ **DML notes**:
169
+ > - `SET cz.sql.dt.allow.dml = true` must be in the same execution batch as the DML statement
170
+ > - After a DML operation, the next automatic refresh will trigger a **full refresh** (not incremental), which may take longer
171
+ > - UPDATE may fail due to internal hidden columns (`MV__KEY`); use DELETE + INSERT instead
172
+ > - Use DML only for special cases such as data correction
173
173
 
174
- ## 参考文档
174
+ ## Reference Documentation
175
175
 
176
176
  - [CREATE DYNAMIC TABLE](https://www.yunqi.tech/documents/create-dynamic-table)
177
177
  - [ALTER DYNAMIC TABLE](https://www.yunqi.tech/documents/alter-dynamic-table)
178
178
  - [DROP DYNAMIC TABLE](https://www.yunqi.tech/documents/drop-dynamic-table)
179
179
  - [SHOW DYNAMIC TABLES](https://www.yunqi.tech/documents/show-dynamic-table)
180
180
  - [SHOW DYNAMIC TABLE REFRESH HISTORY](https://www.yunqi.tech/documents/refresh-history)
181
- - [动态表简介](https://www.yunqi.tech/documents/dynamic_table_summary)
182
- - [查看动态表刷新模式](https://www.yunqi.tech/documents/dynamic-table-incre)
183
- - [传统离线任务转增量实践](https://www.yunqi.tech/documents/transformt-dt)
184
- - [动态表支持参数化定义](https://www.yunqi.tech/documents/dynamicTable-parmaters)
185
- - [动态表支持DML语句修改](https://www.yunqi.tech/documents/dynamicTable-dml)
181
+ - [Dynamic Table Overview](https://www.yunqi.tech/documents/dynamic_table_summary)
182
+ - [View Dynamic Table Refresh Mode](https://www.yunqi.tech/documents/dynamic-table-incre)
183
+ - [Migrating Traditional Offline Jobs to Incremental](https://www.yunqi.tech/documents/transformt-dt)
184
+ - [Parameterized Dynamic Table](https://www.yunqi.tech/documents/dynamicTable-parmaters)
185
+ - [Dynamic Table DML Support](https://www.yunqi.tech/documents/dynamicTable-dml)
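
The Dynamic Table rules in this file (the `REFRESH INTERVAL <n> MINUTE vcluster <name>` clause, the 1-minute minimum, and `CREATE OR REPLACE` for any schedule or cluster change) can be sketched as a small statement builder. This is an illustrative Python helper and not part of cz-cli: the function name `render_dynamic_table_ddl` and its parameters are hypothetical, and it only formats text in the shape the reference shows.

```python
def render_dynamic_table_ddl(name: str, query: str, interval_minutes: int,
                             vcluster: str = "default") -> str:
    """Render a CREATE OR REPLACE DYNAMIC TABLE statement (hypothetical helper)."""
    # The reference states a minimum refresh interval of 1 minute.
    if interval_minutes < 1:
        raise ValueError("refresh interval must be at least 1 minute")
    # The cluster name follows `vcluster` directly: no equals sign, no quotes.
    body = query.rstrip().rstrip(";")
    return (
        f"CREATE OR REPLACE DYNAMIC TABLE {name}\n"
        f"REFRESH INTERVAL {interval_minutes} MINUTE vcluster {vcluster}\n"
        f"AS\n"
        f"{body};"
    )
```

Because `ALTER DYNAMIC TABLE` cannot change the interval or cluster, changing a schedule with this sketch means re-rendering the whole statement with a new `interval_minutes`; per the notes above, a replace that is more than a simple column add or drop triggers a full refresh.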
@@ -1,11 +1,11 @@
1
- # Materialized View(物化视图)SQL 参考
1
+ # Materialized View SQL Reference
2
2
 
3
- > **⚠️ ClickZetta 特有语法**
4
- > - 定时刷新:`REFRESH INTERVAL 10 MINUTE vcluster default`(与动态表语法相同)
5
- > - 手动刷新:`REFRESH MATERIALIZED VIEW <name>;`
6
- > - 修改注释用 `ALTER TABLE`,不是 `ALTER MATERIALIZED VIEW`
3
+ > **⚠️ ClickZetta-specific syntax**
4
+ > - Scheduled refresh: `REFRESH INTERVAL 10 MINUTE vcluster default` (same syntax as Dynamic Table)
5
+ > - Manual refresh: `REFRESH MATERIALIZED VIEW <name>;`
6
+ > - Modify comments with `ALTER TABLE`, not `ALTER MATERIALIZED VIEW`
7
7
 
8
- 物化视图将查询结果预计算并物理存储,适合固定维度的聚合加速场景。与动态表的区别:物化视图支持手动或定时刷新,不支持增量刷新。
8
+ Materialized Views pre-compute and physically store query results, making them ideal for fixed-dimension aggregation acceleration. Unlike Dynamic Tables, Materialized Views support manual or scheduled refresh but do not support incremental refresh.
9
9
 
10
10
  ## CREATE MATERIALIZED VIEW
11
11
 
@@ -19,15 +19,15 @@ AS
19
19
  <query>;
20
20
  ```
21
21
 
22
- **关键参数:**
23
- - `REFRESH INTERVAL 10 MINUTE vcluster default`:定时自动刷新(与动态表语法相同)
24
- - 不写 REFRESH 子句:只能手动触发 `REFRESH MATERIALIZED VIEW <name>;`
25
- - `BUILD DEFERRED`:延迟构建,创建时不立即计算结果
26
- - `DISABLE QUERY REWRITE`:禁用查询改写(不自动用 MV 加速查询)
22
+ **Key parameters:**
23
+ - `REFRESH INTERVAL 10 MINUTE vcluster default`: scheduled automatic refresh (same syntax as Dynamic Table)
24
+ - Omitting the REFRESH clause: only manual refresh via `REFRESH MATERIALIZED VIEW <name>;`
25
+ - `BUILD DEFERRED`: deferred build — does not compute results immediately at creation time
26
+ - `DISABLE QUERY REWRITE`: disables query rewrite (MV will not automatically accelerate queries)
27
27
 
28
- **示例:**
28
+ **Examples:**
29
29
  ```sql
30
- -- 定时自动刷新的物化视图(每 10 分钟)
30
+ -- Materialized View with scheduled auto-refresh (every 10 minutes)
31
31
  CREATE MATERIALIZED VIEW mv_dept_stats
32
32
  REFRESH INTERVAL 10 MINUTE vcluster default
33
33
  AS
@@ -40,7 +40,7 @@ FROM departments d
40
40
  JOIN employees e ON d.dept_id = e.dept_id
41
41
  GROUP BY d.dept_id, d.dept_name;
42
42
 
43
- -- 修改刷新周期(需要 CREATE OR REPLACE)
43
+ -- Modify refresh interval (requires CREATE OR REPLACE)
44
44
  CREATE OR REPLACE MATERIALIZED VIEW mv_dept_stats
45
45
  BUILD DEFERRED
46
46
  REFRESH INTERVAL 20 MINUTE vcluster default
@@ -57,32 +57,32 @@ FROM departments d
57
57
  JOIN employees e ON d.dept_id = e.dept_id
58
58
  GROUP BY d.dept_id, d.dept_name, d.location;
59
59
 
60
- -- 手动刷新
60
+ -- Manual refresh
61
61
  REFRESH MATERIALIZED VIEW mv_dept_stats;
62
62
  ```
63
63
 
64
64
  ## ALTER MATERIALIZED VIEW
65
65
 
66
66
  ```sql
67
- -- 暂停自动刷新
67
+ -- Suspend automatic refresh
68
68
  ALTER MATERIALIZED VIEW <name> SUSPEND;
69
69
 
70
- -- 恢复自动刷新
70
+ -- Resume automatic refresh
71
71
  ALTER MATERIALIZED VIEW <name> RESUME;
72
72
 
73
- -- 修改注释
73
+ -- Modify comment
74
74
  ALTER TABLE <mv_name> SET COMMENT '<comment>';
75
75
 
76
- -- 修改列注释(物化视图用 ALTER TABLE 语法)
76
+ -- Modify column comment (Materialized Views use ALTER TABLE syntax)
77
77
  ALTER TABLE <mv_name> CHANGE COLUMN <col_name> COMMENT '<comment>';
78
78
  ```
79
79
 
80
- > 注意:物化视图的注释修改使用 `ALTER TABLE`,不是 `ALTER MATERIALIZED VIEW`。
80
+ > Note: Use `ALTER TABLE` (not `ALTER MATERIALIZED VIEW`) to modify comments on a Materialized View.
81
81
 
82
82
  ## REFRESH MATERIALIZED VIEW
83
83
 
84
84
  ```sql
85
- -- 手动触发全量刷新
85
+ -- Manually trigger a full refresh
86
86
  REFRESH MATERIALIZED VIEW <name>;
87
87
  ```
88
88
 
@@ -95,35 +95,35 @@ DROP MATERIALIZED VIEW [ IF EXISTS ] <name>;
95
95
  ## SHOW / DESC
96
96
 
97
97
  ```sql
98
- -- 列出当前 schema 下所有物化视图
98
+ -- List all Materialized Views in the current schema
99
99
  SHOW TABLES WHERE is_materialized_view = true;
100
100
 
101
- -- 按名称过滤
101
+ -- Filter by name
102
102
  SHOW TABLES LIKE 'mv_%' WHERE is_materialized_view = true;
103
103
 
104
- -- 查看物化视图结构
104
+ -- View Materialized View structure
105
105
  DESC MATERIALIZED VIEW <name>;
106
106
  DESCRIBE MATERIALIZED VIEW <name> EXTENDED;
107
107
 
108
- -- 查看完整建表语句
108
+ -- View full CREATE statement
109
109
  SHOW CREATE TABLE <name>;
110
110
  ```
111
111
 
112
- ## 动态表 vs 物化视图 选择指南
112
+ ## Dynamic Table vs Materialized View — Selection Guide
113
113
 
114
- | 场景 | 推荐 |
114
+ | Scenario | Recommended |
115
115
  |---|---|
116
- | 需要秒/分钟级自动增量刷新 | Dynamic Table |
117
- | 固定聚合,手动或低频刷新 | Materialized View |
118
- | 需要 CDC 变更感知 | Dynamic Table + Table Stream |
119
- | 加速 BI 查询,数据不要求实时 | Materialized View |
116
+ | Need second/minute-level automatic incremental refresh | Dynamic Table |
117
+ | Fixed aggregation, manual or low-frequency refresh | Materialized View |
118
+ | Need CDC change detection | Dynamic Table + Table Stream |
119
+ | Accelerate BI queries, real-time data not required | Materialized View |
120
120
 
121
- ## 参考文档
121
+ ## Reference Documentation
122
122
 
123
123
  - [CREATE MATERIALIZED VIEW](https://www.yunqi.tech/documents/CREATEMATERIALIZEDVIEW)
124
124
  - [ALTER MATERIALIZED VIEW](https://www.yunqi.tech/documents/alter-materialzied-view)
125
125
  - [REFRESH MATERIALIZED VIEW](https://www.yunqi.tech/documents/REFRESH)
126
126
  - [DROP MATERIALIZED VIEW](https://www.yunqi.tech/documents/DROPMATERIALIZEDVIEW)
127
127
  - [SHOW MATERIALIZED VIEWS](https://www.yunqi.tech/documents/show-materialized-view)
128
- - [物化视图概念与场景](https://www.yunqi.tech/documents/MATERIALIZEDVIEW)
129
- - [物化视图 DDL 汇总](https://www.yunqi.tech/documents/materialized_ddl)
128
+ - [Materialized View Concepts and Use Cases](https://www.yunqi.tech/documents/MATERIALIZEDVIEW)
129
+ - [Materialized View DDL Summary](https://www.yunqi.tech/documents/materialized_ddl)
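
The Materialized View file splits its ALTER operations across two statement forms: `ALTER MATERIALIZED VIEW` for SUSPEND and RESUME, and `ALTER TABLE` for comments. A minimal Python sketch of that dispatch follows. The helper name `mv_alter_statement` is hypothetical, and because the reference does not describe quoting rules, the sketch rejects comments that contain a single quote rather than guessing an escape syntax.

```python
def mv_alter_statement(name: str, action: str, comment: str = "") -> str:
    """Pick the statement form the reference documents for each action."""
    if action in ("SUSPEND", "RESUME"):
        # Refresh control uses ALTER MATERIALIZED VIEW.
        return f"ALTER MATERIALIZED VIEW {name} {action};"
    if action == "SET COMMENT":
        # Comments use ALTER TABLE, not ALTER MATERIALIZED VIEW.
        if "'" in comment:
            raise ValueError("quoting rules are not covered by the reference")
        return f"ALTER TABLE {name} SET COMMENT '{comment}';"
    raise ValueError(f"unsupported action: {action}")
```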
@@ -1,14 +1,14 @@
1
- # Pipe SQL 参考
1
+ # Pipe SQL Reference
2
2
 
3
- > **⚠️ ClickZetta 特有语法**
4
- > - Kafka 读取函数是 `read_kafka(...)`,使用**位置参数**(不是命名参数 `=>`)
5
- > - JSON 字段提取用 `parse_json(value::string)['field']::TYPE` 语法
6
- > - Pipe 创建后默认自动启动,无需手动 RESUME
7
- > - OSS Pipe `PURGE=true` 紧跟在 `USING <format>` 之后(如 `USING CSV PURGE=true`)
3
+ > **⚠️ ClickZetta-specific syntax**
4
+ > - The Kafka read function is `read_kafka(...)`, using **positional parameters** (not named parameters with `=>`)
5
+ > - JSON field extraction uses `parse_json(value::string)['field']::TYPE` syntax
6
+ > - A Pipe starts automatically after creation; no manual RESUME is needed
7
+ > - For OSS Pipes, `PURGE=true` follows immediately after `USING <format>` (e.g., `USING CSV PURGE=true`)
8
8
 
9
- Pipe 是 ClickZetta Lakehouse 的持续数据导入对象,通过 SQL 定义从 Kafka 或对象存储(OSS/S3/COS)自动、持续地将数据导入目标表,无需外部调度。
9
+ Pipe is the continuous data ingestion object in ClickZetta Lakehouse. Defined by SQL, it automatically and continuously imports data from Kafka or object storage (OSS/S3/COS) into a target table without external scheduling.
10
10
 
11
- ## CREATE PIPE — Kafka 导入
11
+ ## CREATE PIPE — Ingest from Kafka
12
12
 
13
13
  ```sql
14
14
  CREATE [ OR REPLACE ] PIPE <pipe_name>
@@ -21,22 +21,22 @@ AS
21
21
  COPY INTO <target_table> FROM (
22
22
  SELECT <expr> [, ...]
23
23
  FROM read_kafka(
24
- '<bootstrap_servers>', -- 必填:Kafka 集群地址
25
- '<topic>', -- 必填:Topic 名称
26
- '', -- 保留(填空字符串)
27
- '<group_id>', -- 必填:持久消费者组 ID
28
- '', '', '', '', -- 位置参数留空,由 Pipe 自动管理
29
- 'raw', -- key 格式(目前只支持 raw)
30
- 'raw', -- value 格式(目前只支持 raw)
24
+ '<bootstrap_servers>', -- required: Kafka cluster address
25
+ '<topic>', -- required: topic name
26
+ '', -- reserved (leave empty string)
27
+ '<group_id>', -- required: persistent consumer group ID
28
+ '', '', '', '', -- positional params left empty, managed by Pipe automatically
29
+ 'raw', -- key format (only 'raw' supported currently)
30
+ 'raw', -- value format (only 'raw' supported currently)
31
31
  0, -- max_errors
32
- MAP(<kafka_config>) -- Kafka 配置参数
32
+ MAP(<kafka_config>) -- Kafka configuration parameters
33
33
  )
34
34
  );
35
35
  ```
36
36
 
37
- **示例:**
37
+ **Examples:**
38
38
  ```sql
39
- -- Kafka 持续导入 JSON 数据
39
+ -- Continuously ingest JSON data from Kafka
40
40
  CREATE OR REPLACE PIPE kafka_orders_pipe
41
41
  VIRTUAL_CLUSTER = 'default'
42
42
  BATCH_INTERVAL_IN_SECONDS = '60'
@@ -62,7 +62,7 @@ COPY INTO ods.orders FROM (
62
62
  )
63
63
  );
64
64
 
65
- -- SASL 认证
65
+ -- SASL authentication
66
66
  CREATE PIPE kafka_secure_pipe
67
67
  VIRTUAL_CLUSTER = 'pipe_vc'
68
68
  BATCH_INTERVAL_IN_SECONDS = '60'
@@ -83,12 +83,12 @@ COPY INTO ods.secure_events FROM (
83
83
  );
84
84
  ```
85
85
 
86
- ## 验证 Kafka 连接(创建 Pipe 前)
86
+ ## Verify Kafka Connection (Before Creating a Pipe)
87
87
 
88
- 独立使用 `read_kafka` 探查数据时,可以在 MAP 中设置 `kafka.auto.offset.reset`:
88
+ When using `read_kafka` standalone to explore data, you can set `kafka.auto.offset.reset` in the MAP:
89
89
 
90
90
  ```sql
91
- -- 验证连接和数据格式
91
+ -- Verify connection and data format
92
92
  SELECT value::string
93
93
  FROM read_kafka(
94
94
  'kafka.example.com:9092',
@@ -102,11 +102,11 @@ FROM read_kafka(
102
102
  LIMIT 10;
103
103
  ```
104
104
 
105
- > ⚠️ **独立探查 vs Pipe 中的区别**:
106
- > - 独立探查:可在 MAP 中设置 `kafka.auto.offset.reset` 为 `earliest` 读取历史数据
107
- > - Pipe 中:位置参数必须留空,消费位点由 Pipe 的 `RESET_KAFKA_GROUP_OFFSETS` 参数控制
105
+ > ⚠️ **Standalone exploration vs inside a Pipe**:
106
+ > - Standalone exploration: you can set `kafka.auto.offset.reset` to `earliest` in the MAP to read historical data
107
+ > - Inside a Pipe: positional parameters must be left empty; the consumer offset is controlled by the Pipe's `RESET_KAFKA_GROUP_OFFSETS` parameter
108
108
 
109
- ## CREATE PIPE — 从对象存储导入
109
+ ## CREATE PIPE — Ingest from Object Storage
110
110
 
111
111
  ```sql
112
112
  CREATE [ OR REPLACE ] PIPE [ IF NOT EXISTS ] <pipe_name>
@@ -120,17 +120,17 @@ FROM VOLUME <volume_name>
120
120
  USING <csv | parquet | orc | json> [OPTIONS ('<key>' = '<value>', ...)] PURGE=true;
121
121
  ```
122
122
 
123
- **关键参数:**
124
- - `VIRTUAL_CLUSTER`:指定虚拟集群名称(OSS Pipe 必填)
125
- - `INGEST_MODE = 'LIST_PURGE'`:通用模式,定期扫描文件列表,必须设置 `PURGE=true`
126
- - `INGEST_MODE = 'EVENT_NOTIFICATION'`:事件通知模式,低延迟(仅阿里云 OSS + AWS S3),不需要 `PURGE=true`
127
- - `COMMENT 'text'`:不带等号(`COMMENT = 'text'` 会报错)
128
- - `PURGE=true`:放在最后,OPTIONS 在其之前:`USING CSV OPTIONS (...) PURGE=true`
129
- - PIPE 中的 COPY 语句不支持 `files`、`regexp`、`subdirectory` 参数
123
+ **Key parameters:**
124
+ - `VIRTUAL_CLUSTER`: specifies the virtual cluster name (required for OSS Pipes)
125
+ - `INGEST_MODE = 'LIST_PURGE'`: general mode, periodically scans the file list; `PURGE=true` must be set
126
+ - `INGEST_MODE = 'EVENT_NOTIFICATION'`: event notification mode, low latency (Alibaba Cloud OSS + AWS S3 only); `PURGE=true` is not required
127
+ - `COMMENT 'text'`: no equals sign (`COMMENT = 'text'` will cause an error)
128
+ - `PURGE=true`: placed at the end, after OPTIONS: `USING CSV OPTIONS (...) PURGE=true`
129
+ - COPY statements inside a PIPE do not support `files`, `regexp`, or `subdirectory` parameters
130
130
 
131
- **示例:**
131
+ **Examples:**
132
132
  ```sql
133
- -- LIST_PURGE 模式(带 OPTIONS)
133
+ -- LIST_PURGE mode (with OPTIONS)
134
134
  CREATE OR REPLACE PIPE oss_events_pipe
135
135
  VIRTUAL_CLUSTER = 'default'
136
136
  INGEST_MODE = 'LIST_PURGE'
@@ -140,7 +140,7 @@ COPY INTO ods.events
140
140
  FROM VOLUME my_oss_volume
141
141
  USING PARQUET PURGE=true;
142
142
 
143
- -- CSV 格式带 OPTIONS(OPTIONS 在 PURGE 之前)
143
+ -- CSV format with OPTIONS (OPTIONS before PURGE)
144
144
  CREATE PIPE oss_csv_pipe
145
145
  VIRTUAL_CLUSTER = 'default'
146
146
  INGEST_MODE = 'LIST_PURGE'
@@ -149,7 +149,7 @@ COPY INTO ods.csv_data
149
149
  FROM VOLUME my_csv_volume
150
150
  USING CSV OPTIONS ('header' = 'true', 'sep' = ',') PURGE=true;
151
151
 
152
- -- EVENT_NOTIFICATION 模式(不需要 PURGE)
152
+ -- EVENT_NOTIFICATION mode (PURGE not required)
153
153
  CREATE PIPE oss_event_pipe
154
154
  VIRTUAL_CLUSTER = 'default'
155
155
  INGEST_MODE = 'EVENT_NOTIFICATION'
@@ -160,33 +160,33 @@ FROM VOLUME my_oss_event_volume
160
160
  USING PARQUET;
161
161
  ```
162
162
 
163
- ## 启停 Pipe
163
+ ## Start / Stop a Pipe
164
164
 
165
165
  ```sql
166
- -- 暂停 Pipe
166
+ -- Pause Pipe
167
167
  ALTER PIPE <pipe_name> SET PIPE_EXECUTION_PAUSED = true;
168
168
 
169
- -- 恢复 Pipe
169
+ -- Resume Pipe
170
170
  ALTER PIPE <pipe_name> SET PIPE_EXECUTION_PAUSED = false;
171
171
  ```
172
172
 
173
- ## 修改 Pipe 属性
173
+ ## Modify Pipe Properties
174
174
 
175
175
  ```sql
176
- -- 每次只能修改一个属性
176
+ -- Only one property can be modified at a time
177
177
  ALTER PIPE <pipe_name> SET VIRTUAL_CLUSTER = 'new_vc';
178
178
  ALTER PIPE <pipe_name> SET COPY_JOB_HINT = '{"cz.sql.split.kafka.strategy":"size","cz.mapper.kafka.message.size":"200000"}';
179
179
  ```
180
180
 
181
- > ⚠️ **ALTER PIPE 支持的属性**:
181
+ > ⚠️ **Supported ALTER PIPE properties**:
182
182
  > - ✅ `PIPE_EXECUTION_PAUSED`
183
183
  > - ✅ `VIRTUAL_CLUSTER`
184
184
  > - ✅ `COPY_JOB_HINT`
185
- > - ❌ `BATCH_INTERVAL_IN_SECONDS`(不支持修改,需删除重建)
186
- > - ❌ `BATCH_SIZE_PER_KAFKA_PARTITION`(不支持修改,需删除重建)
185
+ > - ❌ `BATCH_INTERVAL_IN_SECONDS` (not supported; must drop and recreate)
186
+ > - ❌ `BATCH_SIZE_PER_KAFKA_PARTITION` (not supported; must drop and recreate)
187
187
  >
188
- > 不支持修改 COPY/INSERT 语句逻辑,需删除 Pipe 后重建。
189
- > `COPY_JOB_HINT` 修改会覆盖所有已有 hints,需一次性设置全部参数。
188
+ > Modifying the COPY/INSERT statement logic is not supported; drop the Pipe and recreate it.
189
+ > Modifying `COPY_JOB_HINT` overwrites all existing hints; all parameters must be set at once.
190
190
 
191
191
  ## DROP PIPE
192
192
 
@@ -197,26 +197,26 @@ DROP PIPE [ IF EXISTS ] <pipe_name>;
197
197
  ## SHOW PIPE
198
198
 
199
199
  ```sql
200
- -- 列出当前 schema 下所有 Pipe
200
+ -- List all Pipes in the current schema
201
201
  SHOW PIPES;
202
202
 
203
- -- 查看 Pipe 详情(状态、延迟、定义)
203
+ -- View Pipe details (status, latency, definition)
204
204
  DESC PIPE <pipe_name>;
205
205
  DESC PIPE EXTENDED <pipe_name>;
206
206
  ```
207
207
 
208
- ## 注意事项
208
+ ## Notes
209
209
 
210
- - Pipe 创建后默认自动启动,无需手动 RESUME
211
- - Kafka Pipe 使用 consumer group 管理 offset,重建 Pipe 时保持相同 group_id 可从上次位点继续
212
- - 对象存储 Pipe 通过文件列表或事件通知检测新文件,`load_history` 去重记录保留 7 天
213
- - Pipe 不支持修改 AS 子句,需要删除后重建(不是 `CREATE OR REPLACE`)
214
- - Kafka Pipe 仅支持 PLAINTEXT 和 SASL_PLAINTEXT 安全协议,不支持 SSL
210
+ - A Pipe starts automatically after creation; no manual RESUME is needed
211
+ - Kafka Pipes use a consumer group to manage offsets; keeping the same group_id when recreating a Pipe allows resuming from the last offset
212
+ - Object storage Pipes detect new files via file list scanning or event notifications; `load_history` deduplication records are retained for 7 days
213
+ - Pipes do not support modifying the AS clause; drop and recreate (not `CREATE OR REPLACE`)
214
+ - Kafka Pipes only support PLAINTEXT and SASL_PLAINTEXT security protocols; SSL is not supported
215
215
 
216
- ## 参考文档
216
+ ## Reference Documentation
217
217
 
218
- - [Pipe 简介](https://www.yunqi.tech/documents/pipe-summary)
219
- - [借助 read_kafka 函数持续导入](https://www.yunqi.tech/documents/pipe-kafka)
220
- - [借助 Kafka 外表 Table Stream 持续导入](https://www.yunqi.tech/documents/pipe-kafka-table-stream)
221
- - [最佳实践:使用 Pipe 高效接入 Kafka 数据](https://www.yunqi.tech/documents/pipe-kafka-bestpractice-1)
222
- - [使用 Pipe 持续导入对象存储数据](https://www.yunqi.tech/documents/pipe-storage-object)
218
+ - [Pipe Overview](https://www.yunqi.tech/documents/pipe-summary)
219
+ - [Continuous Ingestion with read_kafka](https://www.yunqi.tech/documents/pipe-kafka)
220
+ - [Continuous Ingestion with Kafka External Table Stream](https://www.yunqi.tech/documents/pipe-kafka-table-stream)
221
+ - [Best Practices: Efficient Kafka Ingestion with Pipe](https://www.yunqi.tech/documents/pipe-kafka-bestpractice-1)
222
+ - [Continuous Ingestion from Object Storage with Pipe](https://www.yunqi.tech/documents/pipe-storage-object)
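
Two of the Pipe rules above lend themselves to a guard in tooling: only three properties can be changed with `ALTER PIPE`, and setting `COPY_JOB_HINT` overwrites every existing hint. The Python sketch below is one possible way to encode both. The names `ALTERABLE`, `alter_pipe_statement`, and `merged_copy_job_hint` are hypothetical and not part of cz-cli, and the caller is assumed to already know the hints currently set on the Pipe.

```python
import json

# Properties the reference marks as supported by ALTER PIPE.
ALTERABLE = {"PIPE_EXECUTION_PAUSED", "VIRTUAL_CLUSTER", "COPY_JOB_HINT"}


def merged_copy_job_hint(existing: dict, changes: dict) -> str:
    """Combine current and new hints, since a SET replaces all of them."""
    merged = {**existing, **changes}
    # Compact separators match the JSON shape shown in the reference.
    return json.dumps(merged, separators=(",", ":"))


def alter_pipe_statement(pipe: str, prop: str, value) -> str:
    """Render one ALTER PIPE statement; only one property per statement."""
    if prop not in ALTERABLE:
        raise ValueError(f"{prop} cannot be altered; drop and recreate the Pipe")
    if isinstance(value, bool):
        rendered = "true" if value else "false"
    else:
        rendered = f"'{value}'"
    return f"ALTER PIPE {pipe} SET {prop} = {rendered};"
```

Merging before rendering keeps a previously set hint from being silently dropped when only one hint is meant to change.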