@clickzetta/cz-cli-darwin-x64 0.3.89 → 0.3.91

Files changed (27)
  1. package/bin/cz-cli +0 -0
  2. package/bin/skills/clickzetta-dynamic-table/SKILL.md +169 -169
  3. package/bin/skills/clickzetta-dynamic-table/best-practices/dimension-table-join-guide.md +126 -126
  4. package/bin/skills/clickzetta-dynamic-table/best-practices/medallion-and-stream-patterns.md +25 -25
  5. package/bin/skills/clickzetta-dynamic-table/best-practices/non-partitioned-merge-into-warning.md +48 -48
  6. package/bin/skills/clickzetta-dynamic-table/best-practices/performance-optimization.md +51 -51
  7. package/bin/skills/clickzetta-dynamic-table/best-practices/scheduling-guide.md +59 -59
  8. package/bin/skills/clickzetta-dynamic-table/dt-creator/SKILL.md +8 -7
  9. package/bin/skills/clickzetta-dynamic-table/dt-creator/references/dt-declaration-strategy.md +99 -99
  10. package/bin/skills/clickzetta-dynamic-table/dt-creator/references/incremental-config-reference.md +188 -188
  11. package/bin/skills/clickzetta-dynamic-table/dt-creator/references/refresh-history-guide.md +117 -117
  12. package/bin/skills/clickzetta-dynamic-table/dt-creator/references/sql-limitations.md +29 -29
  13. package/bin/skills/clickzetta-dynamic-table/dynamic-table-alter/SKILL.md +80 -79
  14. package/bin/skills/clickzetta-dynamic-table/sql-to-dt/SKILL.md +15 -15
  15. package/bin/skills/clickzetta-dynamic-table/sql-to-dt/references/sql2dt-column-validation-rules.md +61 -61
  16. package/bin/skills/clickzetta-dynamic-table/sql-to-dt/references/sql2dt-conversion-rules.md +100 -100
  17. package/bin/skills/clickzetta-dynamic-table/sql-to-dt/references/sql2dt-placeholder-rules.md +64 -64
  18. package/bin/skills/clickzetta-dynamic-table/sql-to-dt/references/sql2dt-refresh-rules.md +32 -32
  19. package/bin/skills/clickzetta-dynamic-table/sql-to-dt/references/sql2dt-self-reference-rules.md +21 -21
  20. package/bin/skills/clickzetta-dynamic-table/sql-to-dt/references/sql2dt-workflow.md +71 -71
  21. package/bin/skills/clickzetta-sql-pipeline-manager/SKILL.md +203 -202
  22. package/bin/skills/clickzetta-sql-pipeline-manager/references/dynamic-table.md +62 -62
  23. package/bin/skills/clickzetta-sql-pipeline-manager/references/materialized-view.md +34 -34
  24. package/bin/skills/clickzetta-sql-pipeline-manager/references/pipe.md +61 -61
  25. package/bin/skills/clickzetta-sql-pipeline-manager/references/table-stream.md +41 -41
  26. package/bin/skills/clickzetta-table-stream-pipeline/SKILL.md +103 -101
  27. package/package.json +1 -1
@@ -1,12 +1,12 @@
- # Dynamic Table 声明策略
+ # Dynamic Table Declaration Strategy

- DT 有两种创建语法:静态分区 DT 和动态分区 DT(非分区 DT 可视为动态分区的特例)。两者在创建语法、刷新方式、增量行为上有本质区别。
+ DT has two creation syntaxes: static partition DT and dynamic partition DT (non-partitioned DT can be viewed as a special case of dynamic partition). The two differ fundamentally in creation syntax, refresh behavior, and incremental behavior.

- ## 核心概念
+ ## Core Concepts

- ### 静态分区 DT(Partitioned DT with SESSION_CONFIGS args)
+ ### Static Partition DT (Partitioned DT with SESSION_CONFIGS args)

- SQL 中通过 `SESSION_CONFIGS()` 引用分区参数,每次 REFRESH 时指定具体的分区值。每个分区独立刷新,可以视为每个分区刷新单元都是一个彼此独立的 DT
+ The SQL references partition parameters via `SESSION_CONFIGS()`, and a specific partition value is specified at each REFRESH. Each partition refreshes independently; each partition refresh unit can be viewed as an independent DT.

  ```sql
  CREATE DYNAMIC TABLE order_daily (
@@ -18,24 +18,24 @@ SELECT id, amount, SESSION_CONFIGS()['dt.args.ds'] AS ds
  FROM orders
  WHERE ds = SESSION_CONFIGS()['dt.args.ds'];

- -- 刷新时指定分区
+ -- Specify partition at refresh time
  set dt.args.ds=2025-01-01
  REFRESH DYNAMIC TABLE order_daily PARTITION(ds = '2025-01-01');
  ```

- ### 动态分区 DT(Non-partitioned DT / DT without args)
+ ### Dynamic Partition DT (Non-partitioned DT / DT without args)

- SQL 中不引用 `SESSION_CONFIGS()`,或者虽然有分区但分区值由查询逻辑动态产生。每次 REFRESH 处理所有源表的增量数据。
+ The SQL does not reference `SESSION_CONFIGS()`, or the table is partitioned but its partition values are produced dynamically by the query logic. Each REFRESH processes all incremental data from all source tables.

- 动态分区 DT 不允许除 REFRESH 以外的任何命令修改数据(INSERT/UPDATE/DELETE/MERGE 均不可用),数据完全由 REFRESH 驱动。
+ Dynamic partition DTs do not allow any command other than REFRESH to modify data (INSERT/UPDATE/DELETE/MERGE are all unavailable); data is driven entirely by REFRESH.

- 因此以下 ETL 场景不适合使用动态分区 DT:
- - 需要手动修补数据(如发现某几行数据有误,需要直接 UPDATE 修正)
- - 需要按条件删除部分数据(如清理脏数据、删除过期记录)
- - 需要 MERGE INTO upsert(如 CDC 场景中消费 stream 合并到目标表)
- - 需要 INSERT INTO 追加外部数据(如手动导入一批补录数据)
- - 需要按分区独立回填或重刷(动态分区 DT 只能整表全量刷新,无法单独刷某个分区)
- - 下游有其他任务需要往同一张表写入数据(DT 独占写入权)
+ Therefore, the following ETL scenarios are not suitable for dynamic partition DT:
+ - Need to manually patch data (e.g., a few rows are found to be incorrect and must be corrected with a direct UPDATE)
+ - Need to delete data by condition (e.g., cleaning dirty data, deleting expired records)
+ - Need MERGE INTO for upsert (e.g., consuming a stream and merging into a target table in a CDC scenario)
+ - Need INSERT INTO to append external data (e.g., manually importing a batch of supplementary data)
+ - Need to backfill or re-refresh partitions independently (dynamic partition DT can only do a full table refresh; individual partitions cannot be refreshed separately)
+ - Downstream tasks need to write to the same table (DT has exclusive write ownership)

  ```sql
  CREATE DYNAMIC TABLE order_summary (
@@ -46,140 +46,140 @@ SELECT category, SUM(amount) AS total_amount
  FROM orders
  GROUP BY category;

- -- 刷新时不指定分区
+ -- No partition specified at refresh time
  REFRESH DYNAMIC TABLE order_summary;
  ```

- ## 两者的关键区别
+ ## Key Differences

- | 维度 | 静态分区 DT | 动态分区 DT |
+ | Dimension | Static Partition DT | Dynamic Partition DT |
  |------|-----------|-----------|
- | SQL 中是否有 `SESSION_CONFIGS()` | 有,用于引用分区参数 | 无 |
- | REFRESH 语法 | `REFRESH ... PARTITION(ds='xxx')` | `REFRESH ...`(无 PARTITION) |
- | 增量范围 | 只处理指定分区的增量数据 | 处理所有源表的全部增量数据 |
- | 调度方式 | 外部调度器按分区值逐个触发 | 外部调度器定时触发即可 |
- | 数据生命周期 | 按分区管理,可独立回填/删除 | 整表管理 |
- | 状态表 | 按分区独立维护 | 全局维护 |
- | 适合的数据模式 | T+1 批处理、按时间分区的 ETL | 实时流、全局聚合、无明确分区键 |
+ | Does SQL contain `SESSION_CONFIGS()`? | Yes, used to reference partition parameters | No |
+ | REFRESH syntax | `REFRESH ... PARTITION(ds='xxx')` | `REFRESH ...` (no PARTITION) |
+ | Incremental scope | Only processes incremental data for the specified partition | Processes all incremental data from all source tables |
+ | Scheduling method | External scheduler triggers one partition at a time | External scheduler triggers on a timer |
+ | Data lifecycle | Managed per partition; can backfill/delete independently | Managed as a whole table |
+ | State tables | Maintained independently per partition | Maintained globally |
+ | Suitable data patterns | T+1 batch processing, time-partitioned ETL | Real-time streams, global aggregation, no clear partition key |

- ## 选择决策树
+ ## Selection Decision Tree

  ```
- 你的数据有明确的时间/业务分区键吗?
+ Does your data have a clear time/business partition key?

- ├─ 是 → 原始 ETL 是按分区 INSERT OVERWRITE 的吗?
+ ├─ Yes → Did the original ETL use INSERT OVERWRITE by partition?
  │   │
- │   ├─ 是 → 使用静态分区 DT
- │   │        (保持原有的分区粒度,每个分区独立刷新)
+ │   ├─ Yes → Use static partition DT
+ │   │        (maintain the original partition granularity; each partition refreshes independently)
  │   │
- │   └─ 否 → 数据量大吗?需要按分区管理生命周期吗?
+ │   └─ No → Is the data volume large? Do you need per-partition lifecycle management?
  │       │
- │       ├─ 是 → 使用静态分区 DT
- │       │        (即使原来不是分区表,也建议加分区以便管理)
+ │       ├─ Yes → Use static partition DT
+ │       │        (even if the original was not partitioned, adding partitions is recommended for manageability)
  │       │
- │       └─ 否 → 使用动态分区 DT
-                  (简单场景,不需要分区管理)
+ │       └─ No → Use dynamic partition DT
+                  (simple scenario; no partition management needed)

- └─ 否 → 使用动态分区 DT
-          (全局聚合、实时汇总等场景)
+ └─ No → Use dynamic partition DT
+          (global aggregation, real-time summary, etc.)
  ```
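
As a rough illustration, the decision tree above can be written as a small function. This is only a sketch: `DtStrategy`, `choose_dt_strategy`, and the three boolean parameters are hypothetical names introduced here, not part of the package or of any ClickZetta API, and the second-level question (large data volume or per-partition lifecycle management) is collapsed into a single flag.

```python
from enum import Enum


class DtStrategy(Enum):
    STATIC_PARTITION = "static partition DT"
    DYNAMIC_PARTITION = "dynamic partition DT"


def choose_dt_strategy(
    has_partition_key: bool,
    partition_overwrite_etl: bool = False,
    needs_partition_management: bool = False,
) -> DtStrategy:
    """Walk the decision tree top-down and return the suggested strategy."""
    # No clear time/business partition key: global aggregation, real-time summary.
    if not has_partition_key:
        return DtStrategy.DYNAMIC_PARTITION
    # The original ETL already did INSERT OVERWRITE by partition: keep that granularity.
    if partition_overwrite_etl:
        return DtStrategy.STATIC_PARTITION
    # Large data volume or per-partition lifecycle management: add partitions.
    if needs_partition_management:
        return DtStrategy.STATIC_PARTITION
    # Simple scenario with no partition management needed.
    return DtStrategy.DYNAMIC_PARTITION
```

For example, `choose_dt_strategy(True, partition_overwrite_etl=True)` follows the first "Yes, Yes" path of the tree and returns `DtStrategy.STATIC_PARTITION`.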

- ## 静态分区 DT 详解
+ ## Static Partition DT in Detail

- ### 适用场景
+ ### Applicable Scenarios

- 1. **T+1 批处理 ETL 迁移**
-    - 原始 SQL 是 `INSERT OVERWRITE TABLE t PARTITION(ds='${ds}')` 模式
-    - 每天/每小时按分区刷新一次
-    - 需要支持历史分区回填
+ 1. **T+1 batch ETL migration**
+    - Original SQL follows the `INSERT OVERWRITE TABLE t PARTITION(ds='${ds}')` pattern
+    - Refreshes once per day/hour by partition
+    - Needs to support historical partition backfill

- 2. **滑动窗口计算**
-    - 如:最近 7 天的聚合、环比计算
-    - SQL 中引用 `SESSION_CONFIGS()['dt.args.ds']` 和 `sub_days(...)` 做窗口范围
+ 2. **Sliding window computation**
+    - E.g., aggregation over the last 7 days, period-over-period comparison
+    - SQL references `SESSION_CONFIGS()['dt.args.ds']` and `sub_days(...)` to define the window range

- 3. **需要按分区管理数据生命周期**
-    - 通过 `data_lifecycle` 自动清理过期分区
-    - 可以单独回填某个分区而不影响其他分区
+ 3. **Per-partition data lifecycle management**
+    - Automatically clean up expired partitions via `data_lifecycle`
+    - Can backfill a single partition without affecting others

- 4. **自引用 DT(日环比、SCD)**
-    - 当前分区依赖上一个分区的结果
-    - 必须用静态分区,因为需要明确指定"当前分区"和"上一分区"
+ 4. **Self-referencing DT (daily comparison, SCD)**
+    - Current partition depends on the result of the previous partition
+    - Must use static partition, because "current partition" and "previous partition" need to be explicitly specified

- ### 刷新方式
+ ### Refresh Method

  ```sql
- -- 每次刷新一个分区
+ -- Refresh one partition at a time
  set dt.args.ds=2025-01-15
  REFRESH DYNAMIC TABLE my_dt PARTITION(ds = '2025-01-15');

- -- 多级分区
+ -- Multi-level partition
  set dt.args.pt=20250411
  set dt.args.pt_hour=01
  REFRESH DYNAMIC TABLE my_dt PARTITION(pt = '20250411', pt_hour = '01');
  ```
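
Because a static partition DT is refreshed one partition at a time, an external scheduler or backfill script has to emit one `set` / `REFRESH` pair per partition value. The sketch below only builds those statement strings for a range of daily partitions, mirroring the syntax shown in the block above. The helper name `refresh_statements`, the assumption that the partition column is `ds` in `YYYY-MM-DD` form, and the idea of driving it from Python are illustrative assumptions, not something the package prescribes.

```python
from datetime import date, timedelta


def refresh_statements(table: str, start: date, end: date) -> list[str]:
    """Build one (set, REFRESH) statement pair per daily partition in [start, end].

    Hypothetical helper: it only formats strings and does not execute anything.
    The table name is interpolated as-is, so it must come from a trusted source.
    """
    if end < start:
        raise ValueError("end must not be earlier than start")
    statements = []
    day = start
    while day <= end:
        ds = day.isoformat()  # e.g. '2025-01-15'
        statements.append(f"set dt.args.ds={ds}")
        statements.append(f"REFRESH DYNAMIC TABLE {table} PARTITION(ds = '{ds}');")
        day += timedelta(days=1)
    return statements
```

Calling `refresh_statements("my_dt", date(2025, 1, 15), date(2025, 1, 16))` yields four statements, two for each day, in the order they would be submitted.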

- ### 注意事项
+ ### Notes

- - 回填时使用 `cz.optimizer.incremental.backfill.enabled=TRUE`,会自动走全量刷新
- - 分区参数通过 `set dt.args.xxx=value` 传入,REFRESH 语句中的 PARTITION 子句指定分区值
+ - For backfill, set `cz.optimizer.incremental.backfill.enabled=TRUE`; the refresh then automatically runs as a full refresh
+ - Partition parameters are passed via `set dt.args.xxx=value`; the PARTITION clause in the REFRESH statement specifies the partition value

- ## 动态分区 DT 详解
+ ## Dynamic Partition DT in Detail

- ### 适用场景
+ ### Applicable Scenarios

- 1. **实时流数据聚合**
-    - 源表持续写入,DT 定时刷新
-    - 不需要按分区管理,每次处理所有新增数据
+ 1. **Real-time stream data aggregation**
+    - Source table is written to continuously; DT refreshes on a schedule
+    - No partition management needed; each refresh processes all new data

- 2. **全局汇总表**
-    - 如:全局 TopN、全局计数、全局去重
-    - 没有明确的分区键
+ 2. **Global summary tables**
+    - E.g., global TopN, global count, global deduplication
+    - No clear partition key

- 3. **简单的 JOIN + 过滤**
-    - 不涉及分区参数的简单转换
-    - 如:事实表 JOIN 维度表,输出宽表
+ 3. **Simple JOIN + filter**
+    - Simple transformations without partition parameters
+    - E.g., fact table JOIN dimension table, output wide table

- 4. **多源表合并(UNION ALL)**
-    - 多个源表的数据合并到一张表
-    - 不需要按分区管理
+ 4. **Multi-source merge (UNION ALL)**
+    - Data from multiple source tables merged into one table
+    - No partition management needed

- ### 刷新方式
+ ### Refresh Method

  ```sql
- -- 直接刷新,处理所有源表的增量
+ -- Refresh directly; processes all incremental data from all source tables
  REFRESH DYNAMIC TABLE my_dt;
  ```

- ### 注意事项
+ ### Notes

- - 每次刷新处理所有源表的全部增量,如果源表变更量大,刷新可能较慢
- - 状态表全局维护,随着数据量增长可能膨胀
- - 不支持按分区回填,只能全量刷新整表
- - 适合变更量占比小的场景(< 5%)
+ - Each refresh processes all incremental data from all source tables; if source table change volume is large, refresh may be slow
+ - State tables are maintained globally and may grow as data volume increases
+ - Per-partition backfill is not supported; only full table refresh is possible
+ - Suitable for scenarios where the change ratio is small (< 5%)

- ## 分区粒度选择
+ ## Partition Granularity Selection

- 当选择静态分区 DT 时,还需要决定分区粒度:
+ When choosing a static partition DT, you also need to decide on partition granularity:

- | 数据模式 | 推荐分区粒度 | 说明 |
+ | Data pattern | Recommended granularity | Notes |
  |---------|------------|------|
- | 严格有序的时间序列(如日志) | 分钟级 (`dt_min`) | 数据量大、写入频繁 |
- | 大致有序、少量迟到数据 | 小时级 (`dt_hour`) | 平衡粒度和管理复杂度 |
- | T+1 批量导入 | 天级 (`ds`) | 最常见的 ETL 场景 |
- | 按业务周期 | 周/月级 | 报表类场景 |
- | 多级分区 | 天 + 小时 (`ds`, `hour`) | 需要更细粒度的生命周期管理 |
+ | Strictly ordered time series (e.g., logs) | Minute-level (`dt_min`) | High data volume, frequent writes |
+ | Roughly ordered, small amount of late data | Hour-level (`dt_hour`) | Balance between granularity and management complexity |
+ | T+1 batch import | Day-level (`ds`) | Most common ETL scenario |
+ | By business cycle | Weekly/monthly | Reporting scenarios |
+ | Multi-level partition | Day + hour (`ds`, `hour`) | Finer-grained lifecycle management needed |

- 选择原则:
- - 粒度越细,每次刷新处理的数据量越小,增量效率越高
- - 粒度越细,分区数越多,管理和调度越复杂
- - 粒度应与数据写入频率匹配:如果数据每小时写入一次,分区粒度不应细于小时
+ Selection principles:
+ - Finer granularity → smaller data volume per refresh → higher incremental efficiency
+ - Finer granularity → more partitions → more complex management and scheduling
+ - Granularity should match the data write frequency: if data is written hourly, partition granularity should not be finer than hourly

- ## 从原始 ETL 判断分区策略
+ ## Determining Partition Strategy from Original ETL

- | 原始 ETL 模式 | 推荐 DT 分区策略 |
+ | Original ETL pattern | Recommended DT partition strategy |
  |--------------|----------------|
- | `INSERT OVERWRITE TABLE t PARTITION(ds='${ds}')` | 静态分区 DT,天级 |
- | `INSERT OVERWRITE TABLE t PARTITION(ds='${ds}', hour='${hour}')` | 静态分区 DT,天+小时级 |
- | `INSERT OVERWRITE TABLE t PARTITION(ds)` (动态分区写入) | 动态分区 DT 或静态分区 DT(取决于是否需要按分区管理) |
- | `INSERT INTO TABLE t SELECT ...` (无分区) | 动态分区 DT |
- | `INSERT OVERWRITE TABLE t SELECT ...` (全表覆盖) | 动态分区 DT |
+ | `INSERT OVERWRITE TABLE t PARTITION(ds='${ds}')` | Static partition DT, day-level |
+ | `INSERT OVERWRITE TABLE t PARTITION(ds='${ds}', hour='${hour}')` | Static partition DT, day+hour level |
+ | `INSERT OVERWRITE TABLE t PARTITION(ds)` (dynamic partition write) | Dynamic partition DT or static partition DT (depends on whether per-partition management is needed) |
+ | `INSERT INTO TABLE t SELECT ...` (no partition) | Dynamic partition DT |
+ | `INSERT OVERWRITE TABLE t SELECT ...` (full table overwrite) | Dynamic partition DT |
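
The mapping in this table is mechanical enough to sketch as a heuristic. The function below inspects the `PARTITION(...)` clause of a legacy ETL statement and returns the strategy the table suggests. It is a rough sketch under stated assumptions: `suggest_strategy` is a hypothetical name, the regex does not handle partition values that contain `)` or `,`, and a clause mixing fixed and dynamic columns (a case the table does not cover) is treated like a dynamic partition write.

```python
import re

# Matches a PARTITION(...) clause; "PARTITION BY" in window functions does not match.
_PARTITION_CLAUSE = re.compile(r"\bPARTITION\s*\(([^)]*)\)", re.IGNORECASE)


def suggest_strategy(etl_sql: str) -> str:
    """Suggest a DT partition strategy from the shape of a legacy ETL statement."""
    match = _PARTITION_CLAUSE.search(etl_sql)
    if match is None:
        # No partition clause: plain INSERT INTO or full-table INSERT OVERWRITE.
        return "dynamic partition DT"
    columns = [c.strip() for c in match.group(1).split(",") if c.strip()]
    if columns and all("=" in c for c in columns):
        # Every partition column has a fixed value, e.g. ds='${ds}'.
        return "static partition DT"
    # Dynamic partition write: depends on whether per-partition management is needed.
    return "dynamic or static partition DT"
```

The result is a starting point for review rather than a verdict; the granularity (day, day + hour) still has to be read off the partition columns by hand.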