@clickzetta/cz-cli-darwin-arm64 0.3.87 → 0.3.88
- package/bin/cz-cli +0 -0
- package/bin/skills/clickzetta-dynamic-table/SKILL.md +169 -169
- package/bin/skills/clickzetta-dynamic-table/best-practices/dimension-table-join-guide.md +126 -126
- package/bin/skills/clickzetta-dynamic-table/best-practices/medallion-and-stream-patterns.md +25 -25
- package/bin/skills/clickzetta-dynamic-table/best-practices/non-partitioned-merge-into-warning.md +48 -48
- package/bin/skills/clickzetta-dynamic-table/best-practices/performance-optimization.md +51 -51
- package/bin/skills/clickzetta-dynamic-table/best-practices/scheduling-guide.md +59 -59
- package/bin/skills/clickzetta-dynamic-table/dt-creator/SKILL.md +8 -7
- package/bin/skills/clickzetta-dynamic-table/dt-creator/references/dt-declaration-strategy.md +99 -99
- package/bin/skills/clickzetta-dynamic-table/dt-creator/references/incremental-config-reference.md +188 -188
- package/bin/skills/clickzetta-dynamic-table/dt-creator/references/refresh-history-guide.md +117 -117
- package/bin/skills/clickzetta-dynamic-table/dt-creator/references/sql-limitations.md +29 -29
- package/bin/skills/clickzetta-dynamic-table/dynamic-table-alter/SKILL.md +80 -79
- package/bin/skills/clickzetta-dynamic-table/sql-to-dt/SKILL.md +15 -15
- package/bin/skills/clickzetta-dynamic-table/sql-to-dt/references/sql2dt-column-validation-rules.md +61 -61
- package/bin/skills/clickzetta-dynamic-table/sql-to-dt/references/sql2dt-conversion-rules.md +100 -100
- package/bin/skills/clickzetta-dynamic-table/sql-to-dt/references/sql2dt-placeholder-rules.md +64 -64
- package/bin/skills/clickzetta-dynamic-table/sql-to-dt/references/sql2dt-refresh-rules.md +32 -32
- package/bin/skills/clickzetta-dynamic-table/sql-to-dt/references/sql2dt-self-reference-rules.md +21 -21
- package/bin/skills/clickzetta-dynamic-table/sql-to-dt/references/sql2dt-workflow.md +71 -71
- package/bin/skills/clickzetta-sql-pipeline-manager/SKILL.md +203 -202
- package/bin/skills/clickzetta-sql-pipeline-manager/references/dynamic-table.md +62 -62
- package/bin/skills/clickzetta-sql-pipeline-manager/references/materialized-view.md +34 -34
- package/bin/skills/clickzetta-sql-pipeline-manager/references/pipe.md +61 -61
- package/bin/skills/clickzetta-sql-pipeline-manager/references/table-stream.md +41 -41
- package/bin/skills/clickzetta-table-stream-pipeline/SKILL.md +103 -101
- package/package.json +1 -1
|
@@ -1,248 +1,249 @@
|
|
|
1
1
|
---
|
|
2
2
|
name: clickzetta-sql-pipeline-manager
|
|
3
3
|
description: >
|
|
4
|
-
|
|
5
|
-
|
|
6
|
-
|
|
7
|
-
|
|
8
|
-
|
|
9
|
-
|
|
10
|
-
"
|
|
11
|
-
"
|
|
12
|
-
"
|
|
13
|
-
"
|
|
14
|
-
"
|
|
15
|
-
"
|
|
4
|
+
Manage SQL data pipeline objects in ClickZetta Lakehouse, including Dynamic Tables,
|
|
5
|
+
Materialized Views, Table Streams, and Pipes.
|
|
6
|
+
Covers the full lifecycle: create, modify, suspend/resume, drop, and status inspection.
|
|
7
|
+
SQL command operations only — does not cover the Lakehouse Studio GUI.
|
|
8
|
+
|
|
9
|
+
Trigger when the user says "create dynamic table", "create materialized view", "create Pipe",
|
|
10
|
+
"create table stream", "suspend/resume dynamic table", "view refresh history",
|
|
11
|
+
"change refresh interval", "ingest from Kafka", "continuous import from object storage",
|
|
12
|
+
"CDC change capture", "incremental computation", "real-time ETL",
|
|
13
|
+
"data pipeline", "pipeline", "stream processing", "dynamic table refresh failed",
|
|
14
|
+
"help me design ETL", "build a data pipeline", "data ingestion plan",
|
|
15
|
+
"Medallion Architecture", "Bronze Silver Gold", "lakehouse layering",
|
|
16
|
+
"Bronze layer", "Silver layer", "Gold layer".
|
|
16
17
|
Keywords: SQL pipeline, dynamic table, materialized view, table stream, Pipe, data pipeline
|
|
17
18
|
---
|
|
18
19
|
|
|
19
|
-
# ClickZetta SQL
|
|
20
|
+
# ClickZetta SQL Data Pipeline Management
|
|
20
21
|
|
|
21
|
-
## ⚠️ ClickZetta
|
|
22
|
+
## ⚠️ Key Syntax Differences: ClickZetta vs Standard SQL / Snowflake
|
|
22
23
|
|
|
23
|
-
|
|
24
|
+
These are the most common mistakes — always use ClickZetta-specific syntax:
|
|
24
25
|
|
|
25
|
-
|
|
|
26
|
+
| Feature | ❌ Wrong (Snowflake/Standard SQL) | ✅ ClickZetta Correct |
|
|
26
27
|
|---|---|---|
|
|
27
|
-
|
|
|
28
|
-
|
|
|
29
|
-
| Kafka
|
|
30
|
-
|
|
|
31
|
-
|
|
|
32
|
-
|
|
|
33
|
-
| JSON
|
|
34
|
-
| COPY INTO
|
|
35
|
-
| COPY INTO
|
|
28
|
+
| Dynamic Table compute cluster | `WAREHOUSE = compute_wh` | `vcluster default` (name directly, no equals sign) |
|
|
29
|
+
| Dynamic Table refresh schedule | `TARGET_LAG = '1 minutes'` | `REFRESH INTERVAL 1 MINUTE vcluster default` |
|
|
30
|
+
| Kafka read function | `TABLE(READ_KAFKA(KAFKA_BROKER => ...))` | `read_kafka('broker', 'topic', '', 'group', '', '', '', '', 'raw', 'raw', 0, MAP(...))` — positional args |
|
|
31
|
+
| Materialized View scheduled refresh | `REFRESH EVERY 1 HOUR` | `REFRESH INTERVAL 60 MINUTE vcluster default` (same syntax as Dynamic Table) |
|
|
32
|
+
| Materialized View manual refresh | `REFRESH MATERIALIZED VIEW` inside CREATE | Execute `REFRESH MATERIALIZED VIEW <name>;` separately |
|
|
33
|
+
| Modify Dynamic Table SQL | `ALTER DYNAMIC TABLE ... AS ...` | `CREATE OR REPLACE DYNAMIC TABLE ...` (ALTER does not support modifying the AS clause) |
|
|
34
|
+
| JSON field access | `$1:field::TYPE` or `data:key` | `parse_json(value::string)['field']::TYPE` or `data['key']` |
|
|
35
|
+
| COPY INTO import format | `FILE_FORMAT = (TYPE = CSV)` | `USING CSV OPTIONS(...)` |
|
|
36
|
+
| COPY INTO export format | `USING CSV` | `FILE_FORMAT = (TYPE = CSV)` |
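
The table above lends itself to a quick pre-flight check on generated SQL. The following is a minimal sketch only: the helper name `find_snowflake_isms` and the regular expressions are assumptions made for illustration and are not part of cz-cli or ClickZetta. It flags a few of the constructs listed in the "Wrong" column and returns the corresponding hint from the "ClickZetta Correct" column.

```python
import re

# Patterns mirror the "Wrong" column of the table above; hints restate the
# "ClickZetta Correct" column. The helper name and regexes are hypothetical.
SNOWFLAKE_ISMS = [
    (re.compile(r"\bWAREHOUSE\s*=", re.I),
     "use `vcluster <name>` (no equals sign)"),
    (re.compile(r"\bTARGET_LAG\s*=", re.I),
     "use `REFRESH INTERVAL <n> MINUTE vcluster <name>`"),
    (re.compile(r"\bREFRESH\s+EVERY\b", re.I),
     "use `REFRESH INTERVAL <n> MINUTE vcluster <name>`"),
    # [^;]* keeps the match inside a single statement.
    (re.compile(r"\bALTER\s+DYNAMIC\s+TABLE\b[^;]*\bAS\b", re.I),
     "use `CREATE OR REPLACE DYNAMIC TABLE ...`"),
]


def find_snowflake_isms(sql):
    """Return one hint per Snowflake-style construct found in `sql`."""
    return [hint for pattern, hint in SNOWFLAKE_ISMS if pattern.search(sql)]


# Example: both the TARGET_LAG and WAREHOUSE forms are reported.
for hint in find_snowflake_isms(
    "CREATE DYNAMIC TABLE t TARGET_LAG = '1 minutes' WAREHOUSE = wh AS SELECT 1"
):
    print(hint)
```

A regex scan like this is deliberately shallow: it does not parse SQL, so it can miss constructs or flag text inside string literals. It is meant as a reminder of the table, not a validator.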
|
|
36
37
|
|
|
37
38
|
---
|
|
38
39
|
|
|
39
|
-
##
|
|
40
|
+
## Guide: Clarify the User's Intent
|
|
40
41
|
|
|
41
|
-
|
|
42
|
+
After receiving a request, determine the user's intent and choose the corresponding workflow:
|
|
42
43
|
|
|
43
|
-
>
|
|
44
|
+
> What do you want to do?
|
|
44
45
|
>
|
|
45
|
-
> **A.
|
|
46
|
-
> **B.
|
|
47
|
-
> **C.
|
|
46
|
+
> **A. Design and create a new data pipeline** (complete SQL from data source through all layers) → Enter Pipeline Wizard
|
|
47
|
+
> **B. Manage existing pipeline objects** (modify DT refresh interval, suspend/resume, view refresh history) → Execute the corresponding operation directly
|
|
48
|
+
> **C. Troubleshoot pipeline issues** (DT refresh failure, Pipe stopped ingesting, Stream backlog) → Enter troubleshooting flow
|
|
48
49
|
|
|
49
|
-
|
|
50
|
+
**If the user has already stated clearly what they want (e.g., "create a pipeline from Kafka to DWD", "suspend this dynamic table"), proceed directly without asking again.**
|
|
50
51
|
|
|
51
52
|
---
|
|
52
53
|
|
|
53
|
-
## Pipeline Wizard
|
|
54
|
+
## Pipeline Wizard
|
|
54
55
|
|
|
55
|
-
|
|
56
|
-
"
|
|
57
|
-
"Medallion Architecture"
|
|
56
|
+
Use this mode when the user wants to design or build a complete data pipeline. Trigger phrases include:
|
|
57
|
+
"help me design/build ETL", "complete data pipeline", "ingest data from Kafka/OSS", "ODS→DWD→DWS", "end-to-end pipeline",
|
|
58
|
+
"Medallion Architecture", "Bronze/Silver/Gold", "lakehouse layering".
|
|
58
59
|
|
|
59
|
-
###
|
|
60
|
+
### Layer Naming Conventions
|
|
60
61
|
|
|
61
|
-
|
|
62
|
+
Users may use different layer naming schemes with the same meaning — preserve the user's preferred naming:
|
|
62
63
|
|
|
63
|
-
|
|
|
64
|
+
| User says | Meaning | Suggested Schema names |
|
|
64
65
|
|---|---|---|
|
|
65
66
|
| Bronze / Silver / Gold | Medallion Architecture | `bronze` / `silver` / `gold` |
|
|
66
|
-
| ODS / DWD / DWS |
|
|
67
|
-
| Raw / Cleansed / Aggregated |
|
|
67
|
+
| ODS / DWD / DWS | Chinese data warehouse convention | `ods` / `dwd` / `dws` |
|
|
68
|
+
| Raw / Cleansed / Aggregated | Generic English description | `raw` / `cleansed` / `agg` |
|
|
68
69
|
|
|
69
|
-
|
|
70
|
+
**Do not map Bronze to ODS, Silver to DWD, etc. — preserve the user's chosen naming and use the corresponding schema and table name prefixes in SQL.**
|
|
70
71
|
|
|
71
|
-
**Schema
|
|
72
|
+
**Schema names must include a business/project prefix to avoid conflicts with other projects.** If the user has not provided a prefix, ask for the project or business domain name, then generate prefixed schema names:
|
|
72
73
|
|
|
73
74
|
```sql
|
|
74
|
-
-- ❌
|
|
75
|
+
-- ❌ Prone to naming conflicts — avoid this
|
|
75
76
|
CREATE SCHEMA IF NOT EXISTS bronze;
|
|
76
77
|
|
|
77
|
-
-- ✅
|
|
78
|
+
-- ✅ Add a project prefix
|
|
78
79
|
CREATE SCHEMA IF NOT EXISTS ecommerce_bronze;
|
|
79
80
|
CREATE SCHEMA IF NOT EXISTS ecommerce_silver;
|
|
80
81
|
CREATE SCHEMA IF NOT EXISTS ecommerce_gold;
|
|
81
82
|
```
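
The prefixing rule can be sketched as a small generator. This is an illustrative helper, not something shipped with the skill: the function name `prefixed_schemas` and the identifier check are assumptions. It keeps whatever layer names the user chose and only prepends the project prefix.

```python
import re

IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def prefixed_schemas(project, layers):
    """Build CREATE SCHEMA statements named <project>_<layer>.

    Layer names are used exactly as given (bronze/silver/gold, ods/dwd/dws, ...).
    """
    for name in [project, *layers]:
        if not IDENTIFIER.fullmatch(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    return [f"CREATE SCHEMA IF NOT EXISTS {project}_{layer};" for layer in layers]


for stmt in prefixed_schemas("ecommerce", ["bronze", "silver", "gold"]):
    print(stmt)
```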
|
|
82
83
|
|
|
83
|
-
###
|
|
84
|
+
### Requirements Gathering
|
|
84
85
|
|
|
85
|
-
|
|
86
|
+
**If the user has already provided sufficient information (data source, fields, layer requirements, project prefix), generate the complete SQL directly without asking further questions.**
|
|
86
87
|
|
|
87
|
-
|
|
88
|
+
If information is incomplete, use an interactive Q&A tool (e.g., `question`) to collect the following and present option menus; if no such tool is available, list all questions in a single text response:
|
|
88
89
|
|
|
89
90
|
```
|
|
90
91
|
question({
|
|
91
92
|
questions: [
|
|
92
93
|
{
|
|
93
|
-
question: "
|
|
94
|
+
question: "Data source?",
|
|
94
95
|
options: [
|
|
95
|
-
{ label: "Kafka", description: "
|
|
96
|
-
{ label: "
|
|
97
|
-
{ label: "
|
|
98
|
-
{ label: "
|
|
96
|
+
{ label: "Kafka", description: "Provide broker address and topic name" },
|
|
97
|
+
{ label: "Object Storage (OSS/S3/COS)", description: "Provide Volume path and file format" },
|
|
98
|
+
{ label: "Existing Lakehouse table (INSERT only)", description: "Dynamic Table reads directly from source table" },
|
|
99
|
+
{ label: "Existing Lakehouse table (with UPDATE/DELETE)", description: "Requires Table Stream + Dynamic Table" }
|
|
99
100
|
]
|
|
100
101
|
},
|
|
101
102
|
{
|
|
102
|
-
question: "
|
|
103
|
+
question: "Refresh frequency?",
|
|
103
104
|
options: [
|
|
104
|
-
{ label: "
|
|
105
|
-
{ label: "
|
|
106
|
-
{ label: "
|
|
105
|
+
{ label: "Real-time (seconds)", description: "REFRESH INTERVAL 10~60 SECOND" },
|
|
106
|
+
{ label: "Near real-time (minutes)", description: "REFRESH INTERVAL 1~10 MINUTE" },
|
|
107
|
+
{ label: "Low frequency (hourly/daily)", description: "REFRESH INTERVAL 1 HOUR or 1 DAY" }
|
|
107
108
|
]
|
|
108
109
|
}
|
|
109
110
|
]
|
|
110
111
|
})
|
|
111
112
|
```
|
|
112
113
|
|
|
113
|
-
|
|
114
|
+
Also confirm: project/business prefix (for schema naming), layer requirements (how many layers, what each layer does), and target table field structure. These can be asked after the user responds, or inferred from context.
|
|
114
115
|
|
|
115
|
-
###
|
|
116
|
+
### Generate Complete SQL
|
|
116
117
|
|
|
117
|
-
|
|
118
|
+
After receiving answers, generate complete end-to-end SQL including all of the following:
|
|
118
119
|
|
|
119
120
|
```
|
|
120
|
-
1. Schema
|
|
121
|
-
2.
|
|
122
|
-
3.
|
|
123
|
-
4.
|
|
124
|
-
5.
|
|
125
|
-
6.
|
|
126
|
-
7.
|
|
127
|
-
8.
|
|
121
|
+
1. Schema creation (CREATE SCHEMA IF NOT EXISTS, using the user's chosen layer names plus the project prefix)
|
|
122
|
+
2. Ingestion layer table creation (if external ingestion is involved)
|
|
123
|
+
3. Data entry point (Pipe or Table Stream, based on source type)
|
|
124
|
+
4. Intermediate layer Dynamic Tables (cleansing/filtering, REFRESH INTERVAL N MINUTE VCLUSTER name)
|
|
125
|
+
5. Serving layer Dynamic Tables (aggregation/dimensions, REFRESH INTERVAL N MINUTE VCLUSTER name)
|
|
126
|
+
6. Execute REFRESH DYNAMIC TABLE immediately after each Dynamic Table is created (reset refresh baseline)
|
|
127
|
+
7. Verification commands (SHOW + REFRESH HISTORY)
|
|
128
|
+
8. Operations commands (SUSPEND/RESUME)
|
|
128
129
|
```
|
|
129
130
|
|
|
130
|
-
**SQL
|
|
131
|
+
**After generating SQL, save each segment as a Studio task (code as an asset):**
|
|
131
132
|
|
|
132
|
-
|
|
133
|
+
In data pipeline development, save all SQL as Studio tasks so that it is managed as a code asset:
|
|
133
134
|
|
|
134
135
|
```bash
|
|
135
|
-
#
|
|
136
|
+
# DDL SQL → save as DRAFT task (no Cron)
|
|
136
137
|
cz-cli task save-content <ddl_task_name> --content "<ddl_sql>"
|
|
137
138
|
|
|
138
|
-
# ETL
|
|
139
|
+
# ETL/transformation SQL → save as scheduled task (with Cron + dependencies)
|
|
139
140
|
cz-cli task save-content <etl_task_name> --content "<etl_sql>"
|
|
140
141
|
cz-cli task save-cron <etl_task_name> --cron '0 30 2 * * ? *'
|
|
141
142
|
cz-cli task deploy <etl_task_name>
|
|
142
143
|
```
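
The `--cron` value above has seven space-separated fields. The diff does not say which cron dialect cz-cli expects, so the sketch below assumes a Quartz-style layout (second, minute, hour, day of month, month, day of week, year). The field names and the helper `label_cron` are assumptions for illustration.

```python
# Assumed Quartz-style field order; not confirmed by the source document.
QUARTZ_FIELDS = ["second", "minute", "hour", "day_of_month",
                 "month", "day_of_week", "year"]


def label_cron(expr):
    """Split a 7-field cron expression and label each field."""
    parts = expr.split()
    if len(parts) != len(QUARTZ_FIELDS):
        raise ValueError(f"expected {len(QUARTZ_FIELDS)} fields, got {len(parts)}")
    return dict(zip(QUARTZ_FIELDS, parts))


fields = label_cron("0 30 2 * * ? *")
print(fields["hour"], fields["minute"], fields["second"])
```

Under that assumption, the example expression would fire at second 0, minute 30, hour 2 on every day.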
|
|
143
144
|
|
|
144
|
-
> Dynamic Table DDL
|
|
145
|
+
> Dynamic Table DDL should also be saved as a DRAFT task (`03_ddl_dws_ads`) for easy reference and multi-environment migration.
|
|
145
146
|
|
|
146
|
-
**⚠️ DDL
|
|
147
|
+
**⚠️ DDL tasks vs data flow tasks — scheduling rules (hard constraints, must not be violated):**
|
|
147
148
|
|
|
148
|
-
|
|
|
149
|
+
| Task type | Criteria | Scheduling config | Studio status |
|
|
149
150
|
|---|---|---|---|
|
|
150
|
-
| DDL
|
|
151
|
-
|
|
|
152
|
-
| Dynamic Table | DWS/ADS
|
|
151
|
+
| DDL task | Contains `CREATE`, `DROP`, or `ALTER` on a `TABLE` or `SCHEMA` | **No Cron, no dependencies** | DRAFT |
|
|
152
|
+
| Data flow task | Data sync, ETL transformation, data quality checks | Configure Cron + upstream/downstream dependencies | PUBLISHED |
|
|
153
|
+
| Dynamic Table | DWS/ADS aggregation layer | **No scheduled Studio task needed**: the system auto-refreshes (the DDL itself may still be saved as a DRAFT task) | N/A |
|
|
153
154
|
|
|
154
|
-
> AI
|
|
155
|
+
> When AI generates SQL pipelines involving Studio task orchestration, the above rules must be followed. Do not generate Cron scheduling for DDL statements.
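
The scheduling rules in the table can be expressed as a small classifier. This is a sketch under stated assumptions: the function name `scheduling_for`, the returned dictionary shape, and the regex are hypothetical, and the pattern covers only the `TABLE`/`SCHEMA` criteria from the table (it also treats Dynamic Table DDL as a DRAFT task, as the note above suggests).

```python
import re

# CREATE / DROP / ALTER on a TABLE or SCHEMA, optionally OR REPLACE / DYNAMIC.
DDL_PATTERN = re.compile(
    r"^\s*(CREATE|DROP|ALTER)\s+(OR\s+REPLACE\s+)?(DYNAMIC\s+)?(TABLE|SCHEMA)\b",
    re.I | re.M,
)


def scheduling_for(sql):
    """Return the scheduling config implied by the rules table for `sql`."""
    if DDL_PATTERN.search(sql):
        # DDL tasks: never attach Cron or dependencies.
        return {"cron": False, "dependencies": False, "status": "DRAFT"}
    # Everything else is treated as a data flow task.
    return {"cron": True, "dependencies": True, "status": "PUBLISHED"}


print(scheduling_for("CREATE SCHEMA IF NOT EXISTS ecommerce_bronze;"))
print(scheduling_for("INSERT INTO dwd.orders SELECT * FROM ods.orders;"))
```

A script that mixes DDL and DML is classified as DDL here, which errs on the side of not generating Cron for DDL statements.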
|
|
155
156
|
|
|
156
|
-
|
|
157
|
+
**Source → entry object selection rules:**
|
|
157
158
|
- Kafka → `CREATE PIPE ... AS COPY INTO ... FROM (SELECT ... FROM read_kafka('broker', 'topic', '', 'group', '', '', '', '', 'raw', 'raw', 0, MAP(...)))`
|
|
158
|
-
-
|
|
159
|
-
-
|
|
160
|
-
-
|
|
159
|
+
- Object storage (OSS/S3/COS) → `CREATE PIPE ... VIRTUAL_CLUSTER = 'name' INGEST_MODE = 'LIST_PURGE' AS COPY INTO ... FROM VOLUME <volume_name> USING <format> PURGE=true`
|
|
160
|
+
- Existing table + has UPDATE/DELETE → `CREATE TABLE STREAM ... WITH PROPERTIES ('TABLE_STREAM_MODE' = 'STANDARD')`, intermediate layer filters `__change_type IN ('INSERT', 'UPDATE_AFTER', 'DELETE')`
|
|
161
|
+
- Existing table + INSERT only → Dynamic Table reads directly `FROM` source table
|
|
161
162
|
|
|
162
|
-
|
|
163
|
-
-
|
|
164
|
-
-
|
|
163
|
+
**Refresh frequency rules:**
|
|
164
|
+
- First transformation layer (Bronze→Silver or ODS→DWD): use the user-specified refresh frequency (e.g., `REFRESH INTERVAL 1 MINUTE vcluster default`)
|
|
165
|
+
- Downstream layers: set their own refresh frequency based on business requirements (e.g., `REFRESH INTERVAL 5 MINUTE vcluster default`)
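
The refresh clause used throughout this file has a fixed shape, so it can be assembled from three values. The helper below is an illustrative sketch (the name `refresh_clause` and the validation rules are assumptions); the units are the ones that appear in this document, and the `default` cluster matches the examples.

```python
ALLOWED_UNITS = {"SECOND", "MINUTE", "HOUR", "DAY"}


def refresh_clause(n, unit="MINUTE", vcluster="default"):
    """Build 'REFRESH INTERVAL <n> <UNIT> vcluster <name>' as used in this file."""
    unit = unit.upper()
    if unit not in ALLOWED_UNITS:
        raise ValueError(f"unsupported unit: {unit}")
    if not isinstance(n, int) or n <= 0:
        raise ValueError("interval must be a positive integer")
    return f"REFRESH INTERVAL {n} {unit} vcluster {vcluster}"


# First transformation layer uses the user's frequency; downstream sets its own.
print(refresh_clause(1))
print(refresh_clause(5))
```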
|
|
165
166
|
|
|
166
167
|
---
|
|
167
168
|
|
|
168
|
-
##
|
|
169
|
+
## Object Type Quick Reference
|
|
169
170
|
|
|
170
|
-
|
|
|
171
|
+
| Object | Use case | Key characteristics |
|
|
171
172
|
|---|---|---|
|
|
172
|
-
| **Dynamic Table** |
|
|
173
|
-
| **Materialized View** |
|
|
174
|
-
| **Table Stream** | CDC
|
|
175
|
-
| **Pipe** |
|
|
173
|
+
| **Dynamic Table** | Real-time / near real-time incremental ETL | SQL-defined, auto incremental refresh, second/minute-level latency |
|
|
174
|
+
| **Materialized View** | Fixed aggregation to accelerate queries | Pre-computed storage, manual or scheduled full refresh |
|
|
175
|
+
| **Table Stream** | CDC change data capture | Captures INSERT/UPDATE/DELETE, consumed by Dynamic Tables |
|
|
176
|
+
| **Pipe** | Continuous data ingestion | Auto continuous import from Kafka or object storage, no scheduling needed |
|
|
176
177
|
|
|
177
|
-
##
|
|
178
|
+
## Decision Tree
|
|
178
179
|
|
|
179
180
|
```
|
|
180
|
-
|
|
181
|
-
├──
|
|
181
|
+
User requirement
|
|
182
|
+
├── Continuously ingest from external source (Kafka / OSS / S3)
|
|
182
183
|
│ └── → Pipe
|
|
183
|
-
├──
|
|
184
|
-
│ ├──
|
|
185
|
-
│ └──
|
|
186
|
-
├──
|
|
184
|
+
├── Real-time / incremental transformation on existing tables
|
|
185
|
+
│ ├── Need to detect UPDATE/DELETE → Table Stream + Dynamic Table
|
|
186
|
+
│ └── INSERT append only → Dynamic Table (reads source table directly)
|
|
187
|
+
├── Fixed aggregation, real-time not required
|
|
187
188
|
│ └── → Materialized View
|
|
188
|
-
└──
|
|
189
|
-
└── →
|
|
189
|
+
└── Multi-layer ETL (ODS→DWD→DWS or Bronze→Silver→Gold)
|
|
190
|
+
└── → Multiple cascaded Dynamic Tables (each layer with its own REFRESH INTERVAL)
|
|
190
191
|
```
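
The decision tree above can be read as a short function. This is only a sketch: the function name `choose_object` and its boolean parameters are assumptions introduced for illustration, and the branch order is one reasonable reading of the tree.

```python
def choose_object(external_source=False, multi_layer=False,
                  realtime=True, needs_update_delete=False):
    """Map a requirement to the object type suggested by the decision tree."""
    if external_source:
        # Continuous ingestion from Kafka / OSS / S3.
        return "Pipe"
    if multi_layer:
        # ODS→DWD→DWS or Bronze→Silver→Gold.
        return "Multiple cascaded Dynamic Tables"
    if not realtime:
        # Fixed aggregation where real-time is not required.
        return "Materialized View"
    if needs_update_delete:
        return "Table Stream + Dynamic Table"
    # INSERT-append-only source table.
    return "Dynamic Table"


print(choose_object(needs_update_delete=True))
```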
|
|
191
192
|
|
|
192
|
-
##
|
|
193
|
+
## Step 0: Confirm Connection
|
|
193
194
|
|
|
194
|
-
|
|
195
|
+
Before any operation, confirm you are connected to ClickZetta Lakehouse. Refer to the `clickzetta-lakehouse-connect` skill for connection parameters.
|
|
195
196
|
|
|
196
|
-
##
|
|
197
|
+
## Step 1: Select Object Type
|
|
197
198
|
|
|
198
|
-
|
|
199
|
+
Use the decision tree to select the object type, then read the corresponding reference file:
|
|
199
200
|
|
|
200
|
-
|
|
|
201
|
+
| Object | Reference file |
|
|
201
202
|
|---|---|
|
|
202
203
|
| Dynamic Table | [references/dynamic-table.md](references/dynamic-table.md) |
|
|
203
204
|
| Materialized View | [references/materialized-view.md](references/materialized-view.md) |
|
|
204
205
|
| Table Stream | [references/table-stream.md](references/table-stream.md) |
|
|
205
206
|
| Pipe | [references/pipe.md](references/pipe.md) |
|
|
206
207
|
|
|
207
|
-
##
|
|
208
|
+
## Step 2: Generate and Execute SQL
|
|
208
209
|
|
|
209
|
-
|
|
210
|
+
After reading the corresponding reference file, generate complete runnable SQL based on the user's parameters.
|
|
210
211
|
|
|
211
|
-
|
|
212
|
-
- Dynamic Table
|
|
213
|
-
- Table Stream
|
|
214
|
-
- Pipe
|
|
215
|
-
- Pipe
|
|
212
|
+
**Required parameter checklist:**
|
|
213
|
+
- Dynamic Table: `REFRESH INTERVAL N MINUTE vcluster name`, AS query
|
|
214
|
+
- Table Stream: source table name, `TABLE_STREAM_MODE` (STANDARD or APPEND_ONLY)
|
|
215
|
+
- Pipe (Kafka): bootstrap_servers, topic, group_id, target table (positional parameter syntax)
|
|
216
|
+
- Pipe (object storage): Volume path, file format, target table, `PURGE=true` (LIST_PURGE mode)
|
|
216
217
|
|
|
217
|
-
|
|
218
|
+
If the user has not provided a VCLUSTER, default to `default` (GP-type cluster).
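
The checklist above can be applied mechanically before generating SQL. The sketch below is hypothetical: the dictionary keys (`refresh_interval`, `as_query`, and so on) and the function name `missing_params` are labels chosen for illustration, not cz-cli parameters. It applies the `default` VCLUSTER fallback for Dynamic Tables and reports what is still missing.

```python
# Required inputs per object type, paraphrasing the checklist above.
REQUIRED = {
    "dynamic_table": ["refresh_interval", "vcluster", "as_query"],
    "table_stream": ["source_table", "mode"],
    "pipe_kafka": ["bootstrap_servers", "topic", "group_id", "target_table"],
    "pipe_object_storage": ["volume_path", "file_format", "target_table"],
}


def missing_params(object_type, params):
    """Return the required parameter names that are absent or empty."""
    params = dict(params)  # do not mutate the caller's dict
    if object_type == "dynamic_table" and not params.get("vcluster"):
        params["vcluster"] = "default"  # fallback stated in the text above
    return [name for name in REQUIRED[object_type] if not params.get(name)]


print(missing_params("dynamic_table", {"refresh_interval": "1 MINUTE"}))
```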
|
|
218
219
|
|
|
219
|
-
##
|
|
220
|
+
## Step 3: Verify
|
|
220
221
|
|
|
221
222
|
```sql
|
|
222
|
-
--
|
|
223
|
+
-- Verify Dynamic Table
|
|
223
224
|
SHOW TABLES WHERE is_dynamic = true;
|
|
224
225
|
SHOW DYNAMIC TABLE REFRESH HISTORY <name> LIMIT 5;
|
|
225
226
|
|
|
226
|
-
--
|
|
227
|
+
-- Verify Materialized View
|
|
227
228
|
SHOW TABLES WHERE is_materialized_view = true;
|
|
228
229
|
|
|
229
|
-
--
|
|
230
|
+
-- Verify Table Stream
|
|
230
231
|
SHOW TABLE STREAMS;
|
|
231
|
-
SELECT COUNT(*) FROM <stream_name>; --
|
|
232
|
+
SELECT COUNT(*) FROM <stream_name>; -- check pending change count
|
|
232
233
|
|
|
233
|
-
--
|
|
234
|
+
-- Verify Pipe
|
|
234
235
|
SHOW PIPES;
|
|
235
236
|
```
|
|
236
237
|
|
|
237
238
|
---
|
|
238
239
|
|
|
239
|
-
##
|
|
240
|
+
## Typical Scenario Examples
|
|
240
241
|
|
|
241
|
-
###
|
|
242
|
+
### Scenario A: Kafka → Dynamic Table (Real-time ETL)
|
|
242
243
|
|
|
243
244
|
```sql
|
|
244
|
-
-- Step 1:
|
|
245
|
-
-- ⚠️
|
|
245
|
+
-- Step 1: Create Pipe to continuously ingest Kafka data into ODS layer
|
|
246
|
+
-- ⚠️ Note: ClickZetta does not support CREATE OR REPLACE PIPE; use CREATE PIPE or DROP then CREATE
|
|
246
247
|
CREATE PIPE kafka_orders_pipe
|
|
247
248
|
VIRTUAL_CLUSTER = 'default'
|
|
248
249
|
BATCH_INTERVAL_IN_SECONDS = '60'
|
|
@@ -261,14 +262,14 @@ COPY INTO ods.orders FROM (
|
|
|
261
262
|
'orders', -- topic
|
|
262
263
|
'', -- reserved
|
|
263
264
|
'lakehouse_ingest', -- group_id
|
|
264
|
-
'', '', '', '', --
|
|
265
|
+
'', '', '', '', -- positional params left empty, managed by Pipe
|
|
265
266
|
'raw', 'raw', 0,
|
|
266
267
|
MAP('kafka.security.protocol', 'PLAINTEXT')
|
|
267
268
|
)
|
|
268
269
|
)
|
|
269
270
|
);
|
|
270
271
|
|
|
271
|
-
-- Step 2:
|
|
272
|
+
-- Step 2: Dynamic Table for DWD layer cleansing (incremental refresh every minute)
|
|
272
273
|
CREATE OR REPLACE DYNAMIC TABLE dwd.orders_clean
|
|
273
274
|
REFRESH INTERVAL 1 MINUTE vcluster default
|
|
274
275
|
AS
|
|
@@ -282,7 +283,7 @@ SELECT
|
|
|
282
283
|
FROM ods.orders
|
|
283
284
|
WHERE amount > 0;
|
|
284
285
|
|
|
285
|
-
-- Step 3:
|
|
286
|
+
-- Step 3: Dynamic Table for DWS layer aggregation (refresh every 5 minutes)
|
|
286
287
|
CREATE OR REPLACE DYNAMIC TABLE dws.order_hourly
|
|
287
288
|
REFRESH INTERVAL 5 MINUTE vcluster default
|
|
288
289
|
AS
|
|
@@ -295,15 +296,15 @@ FROM dwd.orders_clean
|
|
|
295
296
|
GROUP BY 1, 2;
|
|
296
297
|
```
|
|
297
298
|
|
|
298
|
-
###
|
|
299
|
+
### Scenario B: Table Stream + Dynamic Table (CDC UPSERT)
|
|
299
300
|
|
|
300
301
|
```sql
|
|
301
|
-
-- Step 1:
|
|
302
|
+
-- Step 1: Create Stream on source table to capture changes
|
|
302
303
|
CREATE TABLE STREAM ods.orders_stream
|
|
303
304
|
ON TABLE ods.orders
|
|
304
305
|
WITH PROPERTIES ('TABLE_STREAM_MODE' = 'STANDARD');
|
|
305
306
|
|
|
306
|
-
-- Step 2:
|
|
307
|
+
-- Step 2: Dynamic Table consumes Stream, filters for latest state
|
|
307
308
|
CREATE OR REPLACE DYNAMIC TABLE dwd.orders_latest
|
|
308
309
|
REFRESH INTERVAL 2 MINUTE vcluster default
|
|
309
310
|
AS
|
|
@@ -312,15 +313,15 @@ FROM ods.orders_stream
|
|
|
312
313
|
WHERE __change_type IN ('INSERT', 'UPDATE_AFTER');
|
|
313
314
|
```
|
|
314
315
|
|
|
315
|
-
###
|
|
316
|
+
### Scenario C: Materialized View to Accelerate BI Queries
|
|
316
317
|
|
|
317
318
|
```sql
|
|
318
|
-
--
|
|
319
|
-
-- ⚠️
|
|
320
|
-
--
|
|
319
|
+
-- Create a materialized view with hourly refresh
|
|
320
|
+
-- ⚠️ Note: ClickZetta does not support plain CREATE OR REPLACE MATERIALIZED VIEW (see Method 2 for the one exception)
|
|
321
|
+
-- Method 1: DROP then CREATE (recommended)
|
|
321
322
|
DROP MATERIALIZED VIEW IF EXISTS dws.mv_daily_revenue;
|
|
322
323
|
CREATE MATERIALIZED VIEW dws.mv_daily_revenue
|
|
323
|
-
COMMENT '
|
|
324
|
+
COMMENT 'Daily revenue summary for BI tools'
|
|
324
325
|
REFRESH INTERVAL 60 MINUTE vcluster default
|
|
325
326
|
AS
|
|
326
327
|
SELECT
|
|
@@ -331,41 +332,41 @@ SELECT
|
|
|
331
332
|
FROM dwd.orders_clean
|
|
332
333
|
GROUP BY 1, 2;
|
|
333
334
|
|
|
334
|
-
--
|
|
335
|
+
-- Method 2: Use BUILD DEFERRED + DISABLE QUERY REWRITE (complex, not recommended)
|
|
335
336
|
-- CREATE OR REPLACE MATERIALIZED VIEW ... BUILD DEFERRED DISABLE QUERY REWRITE AS ...
|
|
336
337
|
|
|
337
|
-
--
|
|
338
|
+
-- Manually trigger refresh
|
|
338
339
|
REFRESH MATERIALIZED VIEW dws.mv_daily_revenue;
|
|
339
340
|
|
|
340
|
-
--
|
|
341
|
+
-- Drop materialized view (⚠️ must use DROP MATERIALIZED VIEW, not DROP TABLE)
|
|
341
342
|
DROP MATERIALIZED VIEW dws.mv_daily_revenue;
|
|
342
343
|
```
|
|
343
344
|
|
|
344
|
-
###
|
|
345
|
+
### Scenario D: Operations
|
|
345
346
|
|
|
346
347
|
```sql
|
|
347
|
-
--
|
|
348
|
+
-- Suspend Dynamic Table (e.g., during cluster maintenance)
|
|
348
349
|
ALTER DYNAMIC TABLE dwd.orders_clean SUSPEND;
|
|
349
350
|
|
|
350
|
-
--
|
|
351
|
+
-- Resume
|
|
351
352
|
ALTER DYNAMIC TABLE dwd.orders_clean RESUME;
|
|
352
353
|
|
|
353
|
-
--
|
|
354
|
+
-- View refresh history to troubleshoot failures
|
|
354
355
|
SHOW DYNAMIC TABLE REFRESH HISTORY dwd.orders_clean LIMIT 10;
|
|
355
356
|
|
|
356
|
-
--
|
|
357
|
+
-- Pause Pipe
|
|
357
358
|
ALTER PIPE kafka_orders_pipe SET PIPE_EXECUTION_PAUSED = true;
|
|
358
359
|
|
|
359
|
-
--
|
|
360
|
+
-- Resume Pipe
|
|
360
361
|
ALTER PIPE kafka_orders_pipe SET PIPE_EXECUTION_PAUSED = false;
|
|
361
362
|
```
|
|
362
363
|
|
|
363
|
-
###
|
|
364
|
+
### Scenario E: Parameterized Dynamic Table (Partition-based Refresh)
|
|
364
365
|
|
|
365
|
-
|
|
366
|
+
Use the `SESSION_CONFIGS()` function to define parameterized queries, passing partition values at refresh time to control the refresh scope:
|
|
366
367
|
|
|
367
368
|
```sql
|
|
368
|
-
--
|
|
369
|
+
-- Create a parameterized Dynamic Table (using SESSION_CONFIGS to define parameters)
|
|
369
370
|
CREATE OR REPLACE DYNAMIC TABLE dwd.orders_partitioned
|
|
370
371
|
REFRESH INTERVAL 30 MINUTE vcluster default
|
|
371
372
|
AS
|
|
@@ -373,27 +374,27 @@ SELECT order_id, user_id, amount, status, created_at, DATE(created_at) AS dt
|
|
|
373
374
|
FROM ods.orders
|
|
374
375
|
WHERE dt = SESSION_CONFIGS('target_date', CAST(CURRENT_DATE() AS STRING));
|
|
375
376
|
|
|
376
|
-
--
|
|
377
|
+
-- Manually trigger refresh with parameters
|
|
377
378
|
REFRESH DYNAMIC TABLE dwd.orders_partitioned
|
|
378
379
|
WITH PROPERTIES ('target_date' = '2024-06-15');
|
|
379
380
|
```
|
|
380
381
|
|
|
381
|
-
>
|
|
382
|
+
> **Use case**: When migrating traditional daily/hourly full ETL jobs to incremental jobs, replace scheduling variables (e.g., `${bizdate}`) with SESSION_CONFIGS for parameterized partition refresh.
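
For a backfill over several partitions, the parameterized refresh can be repeated once per date. The sketch below only generates the statement text in the form shown in Scenario E; the helper name `backfill_statements` is an assumption, and executing the statements is left to whatever client is in use.

```python
from datetime import date, timedelta


def backfill_statements(table, start, end, param="target_date"):
    """One REFRESH ... WITH PROPERTIES statement per day in [start, end]."""
    days = (end - start).days
    return [
        f"REFRESH DYNAMIC TABLE {table} "
        f"WITH PROPERTIES ('{param}' = '{(start + timedelta(days=d)).isoformat()}');"
        for d in range(days + 1)
    ]


for stmt in backfill_statements("dwd.orders_partitioned",
                                date(2024, 6, 14), date(2024, 6, 15)):
    print(stmt)
```

An empty list is returned when `end` is earlier than `start`, so a reversed range generates no refreshes.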
|
|
382
383
|
|
|
383
|
-
###
|
|
384
|
+
### Scenario F: Dynamic Table DML Operations (Manual Data Correction)
|
|
384
385
|
|
|
385
|
-
⚠️
|
|
386
|
+
⚠️ **Important**: ClickZetta Dynamic Tables **do not support DML operations** (INSERT/UPDATE/DELETE) by default. For data correction, the following options are available:
|
|
386
387
|
|
|
387
|
-
|
|
388
|
+
**Option 1: Rebuild the Dynamic Table (recommended)**
|
|
388
389
|
```sql
|
|
389
|
-
-- 1.
|
|
390
|
-
-- 2.
|
|
390
|
+
-- 1. Correct data in the source table
|
|
391
|
+
-- 2. Wait for the Dynamic Table to auto-refresh (the next REFRESH INTERVAL will trigger a full refresh)
|
|
391
392
|
```
|
|
392
393
|
|
|
393
|
-
|
|
394
|
+
**Option 2: Use a regular table instead of a Dynamic Table**
|
|
394
395
|
```sql
|
|
395
|
-
--
|
|
396
|
-
--
|
|
396
|
+
-- For scenarios requiring frequent manual corrections, use a regular table + scheduled Studio task
|
|
397
|
+
-- instead of a Dynamic Table
|
|
397
398
|
CREATE TABLE dwd.orders_manual (
|
|
398
399
|
order_id STRING,
|
|
399
400
|
user_id STRING,
|
|
@@ -404,82 +405,82 @@ CREATE TABLE dwd.orders_manual (
|
|
|
404
405
|
);
|
|
405
406
|
```
|
|
406
407
|
|
|
407
|
-
> ⚠️
|
|
408
|
-
> -
|
|
409
|
-
> -
|
|
410
|
-
> -
|
|
408
|
+
> ⚠️ **Dynamic Table limitations**:
|
|
409
|
+
> - Dynamic Tables are read-only; INSERT/UPDATE/DELETE are not supported
|
|
410
|
+
> - Data corrections should be made in the source table; the Dynamic Table will auto-refresh
|
|
411
|
+
> - For manual data control, use a regular table + Studio scheduled task
|
|
411
412
|
|
|
412
413
|
---
|
|
413
414
|
|
|
414
|
-
##
|
|
415
|
+
## Common Errors
|
|
415
416
|
|
|
416
|
-
|
|
|
417
|
+
| Error | Cause | Solution |
|
|
417
418
|
|---|---|---|
|
|
418
|
-
| `VCluster not available` |
|
|
419
|
-
|
|
|
420
|
-
| Stream
|
|
421
|
-
| Pipe
|
|
422
|
-
| `Cannot ALTER AS clause` |
|
|
423
|
-
| `CREATE OR REPLACE PIPE`
|
|
424
|
-
| `CREATE OR REPLACE MATERIALIZED VIEW`
|
|
425
|
-
| `DROP TABLE`
|
|
426
|
-
|
|
|
427
|
-
| `SET cz.sql.dt.allow.dml`
|
|
419
|
+
| `VCluster not available` | Compute cluster not started or name is wrong | Verify VCLUSTER name, check cluster status |
|
|
420
|
+
| Dynamic Table refresh failed | SQL query error or source table schema changed | Run `SHOW DYNAMIC TABLE REFRESH HISTORY WHERE name = 'xxx'` to view error details |
|
|
421
|
+
| Stream data is empty | Already consumed or past retention period | Check source table `data_retention_days`, confirm whether data was consumed |
|
|
422
|
+
| Pipe stopped ingesting | Kafka offset issue or connection dropped | Run `DESC PIPE EXTENDED` to check status, verify Kafka connection |
|
|
423
|
+
| `Cannot ALTER AS clause` | Attempted to modify Dynamic Table SQL via ALTER | Use `CREATE OR REPLACE DYNAMIC TABLE` instead |
|
|
424
|
+
| `CREATE OR REPLACE PIPE` syntax error | ClickZetta does not support this syntax | Use `CREATE PIPE` or `DROP PIPE` then `CREATE` |
|
|
425
|
+
| `CREATE OR REPLACE MATERIALIZED VIEW` syntax error | Supported only together with `BUILD DEFERRED` + `DISABLE QUERY REWRITE` | Use `DROP MATERIALIZED VIEW` + `CREATE MATERIALIZED VIEW` |
|
|
426
|
+
| `DROP TABLE` fails on materialized view | Object type mismatch | Use `DROP MATERIALIZED VIEW` (not `DROP TABLE`) |
|
|
427
|
+
| Dynamic Table DML error `not allowed` | Dynamic Tables do not support DML | Correct data in source table, or use a regular table + scheduled task |
|
|
428
|
+
| `SET cz.sql.dt.allow.dml` error | Session statement not supported | Dynamic Tables do not support DML; use an alternative approach |
|
|
428
429
|
|
|
429
430
|
---
|
|
430
431
|
|
|
431
|
-
##
|
|
432
|
+
## Delivery Acceptance Checklist
|
|
432
433
|
|
|
433
|
-
|
|
434
|
+
After pipeline creation, **verify each item — do not skip**:
|
|
434
435
|
|
|
435
436
|
```sql
|
|
436
|
-
-- 1.
|
|
437
|
-
SELECT COUNT(*) FROM ods.<table>; -- ODS
|
|
438
|
-
SELECT COUNT(*) FROM dwd.<table>; -- DWD
|
|
439
|
-
SELECT COUNT(*) FROM dws.<table>; -- DWS
|
|
437
|
+
-- 1. Row count comparison: each layer's row count matches expectations
|
|
438
|
+
SELECT COUNT(*) FROM ods.<table>; -- ODS count ≈ source
|
|
439
|
+
SELECT COUNT(*) FROM dwd.<table>; -- DWD count ≤ ODS (after cleansing)
|
|
440
|
+
SELECT COUNT(*) FROM dws.<table>; -- DWS count matches aggregation logic
|
|
440
441
|
|
|
441
|
-
-- 2. Dynamic Table
|
|
442
|
+
-- 2. Dynamic Table refresh status
|
|
442
443
|
SHOW DYNAMIC TABLE REFRESH HISTORY <schema>.<table> LIMIT 5;
|
|
443
|
-
--
|
|
444
|
+
-- Confirm latest status = SUCCESS, refresh_mode = INCREMENTAL or FULL
|
|
444
445
|
|
|
445
|
-
-- 3.
|
|
446
|
+
-- 3. Key field non-null rate
|
|
446
447
|
SELECT
|
|
447
448
|
COUNT(*) AS total,
|
|
448
449
|
COUNT(key_field) AS non_null,
|
|
449
450
|
ROUND(COUNT(key_field) * 100.0 / COUNT(*), 2) AS non_null_pct
|
|
450
451
|
FROM <schema>.<table>;
|
|
451
|
-
--
|
|
452
|
+
-- Core business fields should have non-null rate > 99%
|
|
452
453
|
|
|
453
|
-
-- 4.
|
|
454
|
+
-- 4. Primary key uniqueness (DWD fact tables)
|
|
454
455
|
SELECT key_col, COUNT(*) AS cnt
|
|
455
456
|
FROM dwd.<table>
|
|
456
457
|
GROUP BY key_col
|
|
457
458
|
HAVING cnt > 1
|
|
458
459
|
LIMIT 10;
|
|
459
|
-
--
|
|
460
|
+
-- Empty result = no duplicates, as expected
|
|
460
461
|
|
|
461
|
-
-- 5. Pipe
|
|
462
|
+
-- 5. Pipe ingestion status (if applicable)
|
|
462
463
|
SHOW PIPES;
|
|
463
|
-
-- status = RUNNING
|
|
464
|
+
-- status = RUNNING, last_ingested_timestamp continuously updating
|
|
464
465
|
```
|
|
465
466
|
|
|
466
|
-
|
|
467
|
-
- [ ]
|
|
468
|
-
- [ ] Dynamic Table
|
|
469
|
-
- [ ]
|
|
470
|
-
- [ ] DWD
|
|
471
|
-
- [ ] Pipe
|
|
472
|
-
- [ ]
|
|
473
|
-
- [ ] DWS/ADS
|
|
467
|
+
**Acceptance criteria:**
|
|
468
|
+
- [ ] Row counts at each layer match expectations
|
|
469
|
+
- [ ] Dynamic Table latest refresh status is SUCCESS
|
|
470
|
+
- [ ] Key field non-null rate > 99%
|
|
471
|
+
- [ ] DWD layer primary keys have no duplicates
|
|
472
|
+
- [ ] Pipe status is RUNNING (if applicable)
|
|
473
|
+
- [ ] All DDL tasks are in DRAFT status (if Studio tasks are involved)
|
|
474
|
+
- [ ] No redundant Studio scheduled tasks at DWS/ADS layer
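
Once the verification queries have been run, the numeric criteria can be checked in one pass. This is a sketch, assuming the query results have already been collected into a dictionary; the key names (`ods_rows`, `non_null_pct`, and so on) and the function `acceptance_failures` are hypothetical.

```python
def acceptance_failures(m):
    """Return the names of acceptance checks that did not pass."""
    checks = {
        "DWD row count <= ODS row count": m["dwd_rows"] <= m["ods_rows"],
        "latest refresh status is SUCCESS": m["refresh_status"] == "SUCCESS",
        "key field non-null rate > 99%": m["non_null_pct"] > 99.0,
        "no duplicate DWD primary keys": m["duplicate_keys"] == 0,
    }
    return [name for name, passed in checks.items() if not passed]


measured = {
    "ods_rows": 1000,
    "dwd_rows": 980,
    "refresh_status": "SUCCESS",
    "non_null_pct": 98.5,
    "duplicate_keys": 0,
}
print(acceptance_failures(measured))
```

Checks that depend on judgment, such as whether DWS counts match the aggregation logic, are left out and still need manual review.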
|
|
474
475
|
|
|
475
476
|
---
|
|
476
477
|
|
|
477
|
-
##
|
|
478
|
+
## Reference Documentation
|
|
478
479
|
|
|
479
|
-
- [
|
|
480
|
+
- [Incremental Computation Overview](https://www.yunqi.tech/documents/streaming_data_pipeline_overview)
|
|
480
481
|
- [Dynamic Table](https://www.yunqi.tech/documents/dynamic-table)
|
|
481
|
-
- [Table Stream
|
|
482
|
-
- [
|
|
483
|
-
- [Pipe
|
|
484
|
-
- [
|
|
485
|
-
- [LLM
|
|
482
|
+
- [Table Stream Change Data Capture](https://www.yunqi.tech/documents/table_stream)
|
|
483
|
+
- [Materialized View](https://www.yunqi.tech/documents/materialized_ddl)
|
|
484
|
+
- [Pipe Overview](https://www.yunqi.tech/documents/pipe-summary)
|
|
485
|
+
- [Real-time ETL with Dynamic Table](https://www.yunqi.tech/documents/tutorials-streaming-data-pipeline-with_dynamic-table)
|
|
486
|
+
- [LLM Full Documentation Index](https://yunqi.tech/llms-full.txt)
|