@clickzetta/cz-cli-darwin-arm64 0.3.78 → 0.3.81

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (201)
  1. package/bin/cz-cli +0 -0
  2. package/package.json +1 -1
  3. package/bin/skills/clickzetta-access-control/LICENSE +0 -16
  4. package/bin/skills/clickzetta-access-control/SKILL.md +0 -243
  5. package/bin/skills/clickzetta-access-control/eval_cases.jsonl +0 -3
  6. package/bin/skills/clickzetta-access-control/references/dynamic-masking.md +0 -86
  7. package/bin/skills/clickzetta-access-control/references/grant-revoke.md +0 -103
  8. package/bin/skills/clickzetta-access-control/references/role-management.md +0 -66
  9. package/bin/skills/clickzetta-access-control/references/user-management.md +0 -61
  10. package/bin/skills/clickzetta-app-python-sdk/LICENSE +0 -16
  11. package/bin/skills/clickzetta-app-python-sdk/SKILL.md +0 -153
  12. package/bin/skills/clickzetta-app-python-sdk/eval_cases.jsonl +0 -12
  13. package/bin/skills/clickzetta-app-python-sdk/references/bulkload.md +0 -196
  14. package/bin/skills/clickzetta-app-python-sdk/references/connector.md +0 -143
  15. package/bin/skills/clickzetta-app-python-sdk/references/realtime.md +0 -122
  16. package/bin/skills/clickzetta-batch-sync-pipeline/LICENSE +0 -16
  17. package/bin/skills/clickzetta-batch-sync-pipeline/SKILL.md +0 -227
  18. package/bin/skills/clickzetta-batch-sync-pipeline/eval_cases.jsonl +0 -5
  19. package/bin/skills/clickzetta-bi-connect/LICENSE +0 -16
  20. package/bin/skills/clickzetta-bi-connect/SKILL.md +0 -176
  21. package/bin/skills/clickzetta-bi-connect/eval_cases.jsonl +0 -5
  22. package/bin/skills/clickzetta-bi-connect/references/bi-tools.md +0 -170
  23. package/bin/skills/clickzetta-cdc-sync-pipeline/LICENSE +0 -16
  24. package/bin/skills/clickzetta-cdc-sync-pipeline/SKILL.md +0 -633
  25. package/bin/skills/clickzetta-cdc-sync-pipeline/eval_cases.jsonl +0 -5
  26. package/bin/skills/clickzetta-data-ingest-pipeline/LICENSE +0 -16
  27. package/bin/skills/clickzetta-data-ingest-pipeline/SKILL.md +0 -237
  28. package/bin/skills/clickzetta-data-ingest-pipeline/eval_cases.jsonl +0 -5
  29. package/bin/skills/clickzetta-data-retention/LICENSE +0 -16
  30. package/bin/skills/clickzetta-data-retention/SKILL.md +0 -160
  31. package/bin/skills/clickzetta-data-retention/eval_cases.jsonl +0 -5
  32. package/bin/skills/clickzetta-data-retention/references/lifecycle-reference.md +0 -175
  33. package/bin/skills/clickzetta-data-science/LICENSE +0 -16
  34. package/bin/skills/clickzetta-data-science/SKILL.md +0 -125
  35. package/bin/skills/clickzetta-data-science/eval_cases.jsonl +0 -12
  36. package/bin/skills/clickzetta-data-science/references/bitmap-profile.md +0 -146
  37. package/bin/skills/clickzetta-data-science/references/data-patterns.md +0 -110
  38. package/bin/skills/clickzetta-data-science/references/setup.md +0 -160
  39. package/bin/skills/clickzetta-data-science/references/stats-functions.md +0 -195
  40. package/bin/skills/clickzetta-data-science/references/write-and-infer.md +0 -122
  41. package/bin/skills/clickzetta-data-science/references/zettapark-api.md +0 -156
  42. package/bin/skills/clickzetta-data-sharing/LICENSE +0 -16
  43. package/bin/skills/clickzetta-data-sharing/SKILL.md +0 -160
  44. package/bin/skills/clickzetta-data-sharing/eval_cases.jsonl +0 -3
  45. package/bin/skills/clickzetta-data-sharing/references/share-ddl.md +0 -134
  46. package/bin/skills/clickzetta-dba-guide/LICENSE +0 -16
  47. package/bin/skills/clickzetta-dba-guide/SKILL.md +0 -542
  48. package/bin/skills/clickzetta-dba-guide/eval_cases.jsonl +0 -3
  49. package/bin/skills/clickzetta-dw-modeling/LICENSE +0 -16
  50. package/bin/skills/clickzetta-dw-modeling/SKILL.md +0 -351
  51. package/bin/skills/clickzetta-dw-modeling/eval_cases.jsonl +0 -4
  52. package/bin/skills/clickzetta-dw-modeling/references/modeling-patterns.md +0 -100
  53. package/bin/skills/clickzetta-dynamic-table/LICENSE +0 -16
  54. package/bin/skills/clickzetta-dynamic-table/SKILL.md +0 -230
  55. package/bin/skills/clickzetta-dynamic-table/best-practices/dimension-table-join-guide.md +0 -253
  56. package/bin/skills/clickzetta-dynamic-table/best-practices/medallion-and-stream-patterns.md +0 -124
  57. package/bin/skills/clickzetta-dynamic-table/best-practices/non-partitioned-merge-into-warning.md +0 -96
  58. package/bin/skills/clickzetta-dynamic-table/best-practices/performance-optimization.md +0 -109
  59. package/bin/skills/clickzetta-dynamic-table/best-practices/scheduling-guide.md +0 -135
  60. package/bin/skills/clickzetta-dynamic-table/dt-creator/SKILL.md +0 -15
  61. package/bin/skills/clickzetta-dynamic-table/dt-creator/references/dt-declaration-strategy.md +0 -185
  62. package/bin/skills/clickzetta-dynamic-table/dt-creator/references/incremental-config-reference.md +0 -427
  63. package/bin/skills/clickzetta-dynamic-table/dt-creator/references/refresh-history-guide.md +0 -260
  64. package/bin/skills/clickzetta-dynamic-table/dt-creator/references/sql-limitations.md +0 -80
  65. package/bin/skills/clickzetta-dynamic-table/dynamic-table-alter/SKILL.md +0 -190
  66. package/bin/skills/clickzetta-dynamic-table/eval_cases.jsonl +0 -5
  67. package/bin/skills/clickzetta-dynamic-table/sql-to-dt/SKILL.md +0 -27
  68. package/bin/skills/clickzetta-dynamic-table/sql-to-dt/references/sql2dt-column-validation-rules.md +0 -118
  69. package/bin/skills/clickzetta-dynamic-table/sql-to-dt/references/sql2dt-conversion-rules.md +0 -225
  70. package/bin/skills/clickzetta-dynamic-table/sql-to-dt/references/sql2dt-placeholder-rules.md +0 -182
  71. package/bin/skills/clickzetta-dynamic-table/sql-to-dt/references/sql2dt-refresh-rules.md +0 -98
  72. package/bin/skills/clickzetta-dynamic-table/sql-to-dt/references/sql2dt-self-reference-rules.md +0 -76
  73. package/bin/skills/clickzetta-dynamic-table/sql-to-dt/references/sql2dt-workflow.md +0 -109
  74. package/bin/skills/clickzetta-external-catalog/LICENSE +0 -16
  75. package/bin/skills/clickzetta-external-catalog/SKILL.md +0 -123
  76. package/bin/skills/clickzetta-external-catalog/eval_cases.jsonl +0 -5
  77. package/bin/skills/clickzetta-external-catalog/references/external-catalog-ddl.md +0 -130
  78. package/bin/skills/clickzetta-external-function/LICENSE +0 -16
  79. package/bin/skills/clickzetta-external-function/SKILL.md +0 -203
  80. package/bin/skills/clickzetta-external-function/eval_cases.jsonl +0 -4
  81. package/bin/skills/clickzetta-external-function/references/external-function-ddl.md +0 -171
  82. package/bin/skills/clickzetta-file-import-pipeline/LICENSE +0 -16
  83. package/bin/skills/clickzetta-file-import-pipeline/SKILL.md +0 -190
  84. package/bin/skills/clickzetta-file-import-pipeline/eval_cases.jsonl +0 -5
  85. package/bin/skills/clickzetta-index-manager/LICENSE +0 -16
  86. package/bin/skills/clickzetta-index-manager/SKILL.md +0 -140
  87. package/bin/skills/clickzetta-index-manager/eval_cases.jsonl +0 -5
  88. package/bin/skills/clickzetta-index-manager/references/bloomfilter-index.md +0 -67
  89. package/bin/skills/clickzetta-index-manager/references/index-management.md +0 -73
  90. package/bin/skills/clickzetta-index-manager/references/inverted-index.md +0 -80
  91. package/bin/skills/clickzetta-index-manager/references/vector-index.md +0 -81
  92. package/bin/skills/clickzetta-java-sdk/LICENSE +0 -16
  93. package/bin/skills/clickzetta-java-sdk/SKILL.md +0 -186
  94. package/bin/skills/clickzetta-java-sdk/eval_cases.jsonl +0 -12
  95. package/bin/skills/clickzetta-java-sdk/references/bulkload.md +0 -163
  96. package/bin/skills/clickzetta-java-sdk/references/realtime.md +0 -212
  97. package/bin/skills/clickzetta-kafka-ingest-pipeline/LICENSE +0 -16
  98. package/bin/skills/clickzetta-kafka-ingest-pipeline/SKILL.md +0 -769
  99. package/bin/skills/clickzetta-kafka-ingest-pipeline/eval_cases.jsonl +0 -5
  100. package/bin/skills/clickzetta-kafka-ingest-pipeline/references/kafka-pipe-syntax.md +0 -324
  101. package/bin/skills/clickzetta-lakehouse-connect/LICENSE +0 -16
  102. package/bin/skills/clickzetta-lakehouse-connect/SKILL.md +0 -218
  103. package/bin/skills/clickzetta-lakehouse-connect/eval_cases.jsonl +0 -3
  104. package/bin/skills/clickzetta-lakehouse-connect/evals/evals.json +0 -35
  105. package/bin/skills/clickzetta-lakehouse-connect/references/config-file.md +0 -435
  106. package/bin/skills/clickzetta-lakehouse-connect/references/jdbc.md +0 -478
  107. package/bin/skills/clickzetta-lakehouse-connect/references/python-sdk.md +0 -225
  108. package/bin/skills/clickzetta-lakehouse-connect/references/sqlalchemy.md +0 -468
  109. package/bin/skills/clickzetta-lakehouse-connect/references/zettapark-session.md +0 -445
  110. package/bin/skills/clickzetta-manage-comments/LICENSE +0 -16
  111. package/bin/skills/clickzetta-manage-comments/SKILL.md +0 -219
  112. package/bin/skills/clickzetta-manage-comments/eval_cases.jsonl +0 -3
  113. package/bin/skills/clickzetta-metadata/LICENSE +0 -16
  114. package/bin/skills/clickzetta-metadata/SKILL.md +0 -502
  115. package/bin/skills/clickzetta-metadata/eval_cases.jsonl +0 -5
  116. package/bin/skills/clickzetta-metadata/references/instance-views-reference.md +0 -276
  117. package/bin/skills/clickzetta-metadata/references/metering-views-reference.md +0 -137
  118. package/bin/skills/clickzetta-metadata/references/show-desc-reference.md +0 -326
  119. package/bin/skills/clickzetta-metadata/references/views-reference.md +0 -271
  120. package/bin/skills/clickzetta-monitoring/LICENSE +0 -16
  121. package/bin/skills/clickzetta-monitoring/SKILL.md +0 -215
  122. package/bin/skills/clickzetta-monitoring/eval_cases.jsonl +0 -5
  123. package/bin/skills/clickzetta-monitoring/references/job-history-analysis.md +0 -97
  124. package/bin/skills/clickzetta-monitoring/references/show-jobs.md +0 -48
  125. package/bin/skills/clickzetta-oss-ingest-pipeline/LICENSE +0 -16
  126. package/bin/skills/clickzetta-oss-ingest-pipeline/SKILL.md +0 -562
  127. package/bin/skills/clickzetta-oss-ingest-pipeline/eval_cases.jsonl +0 -5
  128. package/bin/skills/clickzetta-overview/LICENSE +0 -16
  129. package/bin/skills/clickzetta-overview/SKILL.md +0 -102
  130. package/bin/skills/clickzetta-overview/eval_cases.jsonl +0 -5
  131. package/bin/skills/clickzetta-overview/references/brands-and-endpoints.md +0 -79
  132. package/bin/skills/clickzetta-overview/references/object-model.md +0 -311
  133. package/bin/skills/clickzetta-overview/references/studio-modules.md +0 -173
  134. package/bin/skills/clickzetta-pipeline-review/LICENSE +0 -16
  135. package/bin/skills/clickzetta-pipeline-review/SKILL.md +0 -377
  136. package/bin/skills/clickzetta-query-optimizer/LICENSE +0 -16
  137. package/bin/skills/clickzetta-query-optimizer/SKILL.md +0 -156
  138. package/bin/skills/clickzetta-query-optimizer/eval_cases.jsonl +0 -5
  139. package/bin/skills/clickzetta-query-optimizer/references/explain.md +0 -56
  140. package/bin/skills/clickzetta-query-optimizer/references/hints-and-sortkey.md +0 -78
  141. package/bin/skills/clickzetta-query-optimizer/references/optimize.md +0 -65
  142. package/bin/skills/clickzetta-query-optimizer/references/result-cache.md +0 -49
  143. package/bin/skills/clickzetta-query-optimizer/references/show-jobs.md +0 -42
  144. package/bin/skills/clickzetta-realtime-sync-pipeline/LICENSE +0 -16
  145. package/bin/skills/clickzetta-realtime-sync-pipeline/SKILL.md +0 -323
  146. package/bin/skills/clickzetta-realtime-sync-pipeline/eval_cases.jsonl +0 -5
  147. package/bin/skills/clickzetta-semantic-view/LICENSE +0 -16
  148. package/bin/skills/clickzetta-semantic-view/SKILL.md +0 -207
  149. package/bin/skills/clickzetta-semantic-view/eval_cases.jsonl +0 -12
  150. package/bin/skills/clickzetta-semantic-view/references/semantic-view-reference.md +0 -167
  151. package/bin/skills/clickzetta-spark-flink-connector/LICENSE +0 -16
  152. package/bin/skills/clickzetta-spark-flink-connector/SKILL.md +0 -92
  153. package/bin/skills/clickzetta-spark-flink-connector/eval_cases.jsonl +0 -5
  154. package/bin/skills/clickzetta-spark-flink-connector/references/flink.md +0 -147
  155. package/bin/skills/clickzetta-spark-flink-connector/references/spark.md +0 -132
  156. package/bin/skills/clickzetta-sql-pipeline-manager/LICENSE +0 -16
  157. package/bin/skills/clickzetta-sql-pipeline-manager/SKILL.md +0 -485
  158. package/bin/skills/clickzetta-sql-pipeline-manager/eval_cases.jsonl +0 -12
  159. package/bin/skills/clickzetta-sql-pipeline-manager/evals/evals.json +0 -166
  160. package/bin/skills/clickzetta-sql-pipeline-manager/references/dynamic-table.md +0 -185
  161. package/bin/skills/clickzetta-sql-pipeline-manager/references/materialized-view.md +0 -129
  162. package/bin/skills/clickzetta-sql-pipeline-manager/references/pipe.md +0 -222
  163. package/bin/skills/clickzetta-sql-pipeline-manager/references/table-stream.md +0 -125
  164. package/bin/skills/clickzetta-sql-syntax-guide/LICENSE +0 -16
  165. package/bin/skills/clickzetta-sql-syntax-guide/SKILL.md +0 -249
  166. package/bin/skills/clickzetta-sql-syntax-guide/eval_cases.jsonl +0 -3
  167. package/bin/skills/clickzetta-sql-syntax-guide/references/ddl-reference.md +0 -350
  168. package/bin/skills/clickzetta-sql-syntax-guide/references/dml-reference.md +0 -279
  169. package/bin/skills/clickzetta-sql-syntax-guide/references/dql-reference.md +0 -504
  170. package/bin/skills/clickzetta-sql-syntax-guide/references/functions-reference.md +0 -372
  171. package/bin/skills/clickzetta-sql-syntax-guide/references/migration-databricks.md +0 -260
  172. package/bin/skills/clickzetta-sql-syntax-guide/references/migration-snowflake.md +0 -382
  173. package/bin/skills/clickzetta-sql-syntax-guide/references/vs-snowflake.md +0 -346
  174. package/bin/skills/clickzetta-sql-syntax-guide/references/vs-spark.md +0 -229
  175. package/bin/skills/clickzetta-studio-task-manager/LICENSE +0 -16
  176. package/bin/skills/clickzetta-studio-task-manager/SKILL.md +0 -652
  177. package/bin/skills/clickzetta-table-lineage/LICENSE +0 -16
  178. package/bin/skills/clickzetta-table-lineage/SKILL.md +0 -90
  179. package/bin/skills/clickzetta-table-lineage/eval_cases.jsonl +0 -1
  180. package/bin/skills/clickzetta-table-lineage/references/normalize_func.sql +0 -14
  181. package/bin/skills/clickzetta-table-lineage/references/table_cost.sql +0 -38
  182. package/bin/skills/clickzetta-table-lineage/references/table_lineage_standalone.html +0 -562
  183. package/bin/skills/clickzetta-table-lineage/references/table_relation.sql +0 -25
  184. package/bin/skills/clickzetta-table-stream-pipeline/LICENSE +0 -16
  185. package/bin/skills/clickzetta-table-stream-pipeline/SKILL.md +0 -206
  186. package/bin/skills/clickzetta-table-stream-pipeline/eval_cases.jsonl +0 -5
  187. package/bin/skills/clickzetta-vcluster-manager/LICENSE +0 -16
  188. package/bin/skills/clickzetta-vcluster-manager/SKILL.md +0 -212
  189. package/bin/skills/clickzetta-vcluster-manager/eval_cases.jsonl +0 -5
  190. package/bin/skills/clickzetta-vcluster-manager/references/vc-cache.md +0 -54
  191. package/bin/skills/clickzetta-vcluster-manager/references/vcluster-ddl.md +0 -150
  192. package/bin/skills/clickzetta-volume-manager/LICENSE +0 -16
  193. package/bin/skills/clickzetta-volume-manager/SKILL.md +0 -292
  194. package/bin/skills/clickzetta-volume-manager/eval_cases.jsonl +0 -5
  195. package/bin/skills/clickzetta-volume-manager/references/volume-ddl.md +0 -199
  196. package/bin/skills/clickzetta-zettapark/LICENSE +0 -16
  197. package/bin/skills/clickzetta-zettapark/SKILL.md +0 -248
  198. package/bin/skills/clickzetta-zettapark/eval_cases.jsonl +0 -12
  199. package/bin/skills/clickzetta-zettapark/references/zettapark-api.md +0 -283
  200. package/bin/skills/cz-cli/SKILL.md +0 -311
  201. package/bin/skills/cz-cli/references/profile-setup.md +0 -120
@@ -1,652 +0,0 @@
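- ---
- name: clickzetta-studio-task-manager
- description: |
-   Manage ClickZetta Lakehouse Studio tasks: task types (offline sync / multi-table offline sync / real-time sync /
-   multi-table real-time sync / data development), task folder organization, choosing between task types, the cz-cli task command family,
-   schedule configuration, dependency management, and troubleshooting. Implements the "build/manage separation" engineering convention: DDL tasks stay as drafts, ETL tasks are scheduled,
-   and Dynamic Tables refresh automatically.
-   Trigger when the user says "create a Studio task", "task folder", "task scheduling", "cz-cli task", "task dependency",
-   "task failed", "task status", "whole-database sync task", "ETL task orchestration", "task management",
-   "build/manage separation", "DDL task", "scheduling DAG", "task directory", "Studio task",
-   "offline sync", "real-time sync", "multi-table real-time sync", "data development task", "task type",
-   "which sync should I choose", or "difference between sync tasks".
-   Keywords: Studio task, task management, cz-cli task, scheduling, DAG, DDL draft, ETL pipeline, task folder, offline sync, realtime sync, CDC, task types
- ---
-
- # ClickZetta Studio Task Management
-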
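- ## Wizard: Clarify the Intent
-
- On receiving a task-management request, prefer an interactive question tool (such as `question`) to collect the user's intent; if no such tool is available, list the options in plain text:
-
- ```
- question({
-   questions: [{
-     question: "What would you like to do?",
-     options: [
-       { label: "Build a new pipeline from scratch", description: "Create the folder, DDL tasks, sync tasks, and ETL tasks" },
-       { label: "Manage existing tasks", description: "View status, change configuration, configure dependencies, rerun, backfill" },
-       { label: "Troubleshoot a task", description: "Failure diagnosis, dependency checks, log analysis → load clickzetta-pipeline-review" },
-       { label: "Convention check", description: "Check whether existing tasks follow the build/manage separation convention" }
-     ]
-   }]
- })
- ```
-
- **If the user has already said what they want, do it directly without asking.**
-
- For **building from scratch**, also collect: business domain / project name, data source type, and layering structure.
-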
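- ## Data Pipeline Wizard (used when building from scratch)
-
- Full flow: **understand requirements → explore data → select technology → confirm plan → execute**
-
- ---
-
- ### Step 0: Requirements input
-
- **First ask whether the user has a requirements document** (PRD, requirements spec, data warehouse design document, etc.):
-
- ```
- question({
-   questions: [{
-     question: "Before we start, do you have a requirements document or background notes?",
-     options: [
-       { label: "Yes, I'll provide it", description: "Paste the document or upload the file and I will extract the key information" },
-       { label: "No, I'll describe it", description: "I will guide you through a few key questions" }
-     ]
-   }]
- })
- ```
-
- **If a document is provided**: read it, automatically extract the business scenario, data sources, target outputs, and freshness requirements, then skip to Step 1.
-
- **If there is no document**, collect the following business requirements (prefer the interactive tool; otherwise list everything in plain text in a single message):
-
- ```
- question({
-   questions: [
-     {
-       question: "Which business scenario does this pipeline serve?",
-       options: [
-         { label: "BI reports / dashboards", description: "Fixed reports, well-defined metrics, T+1 or hourly" },
-         { label: "Real-time monitoring / operations dashboards", description: "Minute-level latency, focused on live metrics" },
-         { label: "Data science / feature engineering", description: "Feeds model training or inference" },
-         { label: "Data sharing / external delivery", description: "Consumed by other systems or teams" }
-       ]
-     },
-     {
-       question: "Who consumes the data?",
-       options: [
-         { label: "BI tools (Superset/Tableau, etc.)", description: "Needs wide tables or aggregate tables" },
-         { label: "Data analysts (SQL queries)", description: "Needs cleaned detail tables" },
-         { label: "Downstream systems / APIs", description: "Needs structured output" },
-         { label: "Data scientists (Python/ZettaPark)", description: "Needs feature tables or raw detail data" }
-       ]
-     },
-     {
-       question: "What is the data freshness requirement?",
-       options: [
-         { label: "T+1 (available the next day)", description: "Batch runs overnight, data is ready in the morning" },
-         { label: "Hourly", description: "Updated once per hour" },
-         { label: "Minute-level", description: "Near real time, latency < 10 minutes" },
-         { label: "Second-level real time", description: "Continuous CDC sync, latency in seconds" }
-       ]
-     }
-   ]
- })
- ```
-
- Also confirm verbally (plain-text follow-up questions, not menus):
- - **Core metric definitions**: if business metrics such as GMV or active users are involved, confirm how they are calculated
- - **Project / business domain name**: used to name the task folder and schema (e.g. `ecommerce_dw`)
-
- ---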
-
- ### Step 1: Data exploration (performed autonomously by the AI, without asking the user)
-
- Once the requirements are collected, immediately inspect the current state of the data:
-
- ```sql
- -- List the relevant schemas and tables
- SHOW SCHEMAS;
- SHOW TABLES IN <relevant_schema>;
-
- -- Check table sizes and row counts
- SELECT table_schema, table_name,
-        ROUND(bytes/1024.0/1024/1024, 2) AS size_gb, row_count
- FROM information_schema.tables
- WHERE table_type = 'MANAGED_TABLE'
- ORDER BY bytes DESC NULLS LAST LIMIT 20;
-
- -- Sample rows to understand what the columns mean
- SELECT * FROM <schema>.<table> LIMIT 5;
- ```
-
- At the same time, run `cz-cli datasource list` to see which external data sources are already configured.
-
- ---
-
- ### Step 2: Technology selection (choose the data source type and ingestion method)
-
- Based on the requirements and the data exploration results, use the interactive tool to collect the technology choices:
-
- **Choose the data source type:**
- ```
- question({ questions: [{ question: "Where does the data come from?", options: [
-   { label: "External database", description: "MySQL / PostgreSQL / SQL Server / Oracle, etc." },
-   { label: "Kafka message queue", description: "Kafka topic → Lakehouse" },
-   { label: "Object storage", description: "File import from OSS / S3 / COS" },
-   { label: "ETL layering inside Lakehouse", description: "ODS→DWD→DWS/ADS, SQL tasks + Dynamic Table" },
-   { label: "End-to-end pipeline", description: "Ingestion + layered modeling + aggregation" },
-   { label: "Not sure, explore the data first", description: "Look at the existing data, then propose a plan" }
- ]}]})
- ```
-
- **Follow-up questions (only needed for some options):**
-
- If "External database" was chosen:
- ```
- question({ questions: [{ question: "Sync freshness?", options: [
-   { label: "Real-time sync (seconds)", description: "CDC based on Binlog/WAL, runs continuously" },
-   { label: "Offline batch (hourly/daily)", description: "Periodic full sync, configured with Cron" }
- ]}]})
- ```
-
- If "Object storage" was chosen:
- ```
- question({ questions: [{ question: "Ingestion method?", options: [
-   { label: "SQL Pipe (continuous automatic import)", description: "LIST_PURGE or EVENT_NOTIFICATION mode" },
-   { label: "Studio offline sync task", description: "Periodic batch import, configured with Cron" }
- ]}]})
- ```
-
- If "Kafka" was chosen:
- ```
- question({ questions: [{ question: "Ingestion method?", options: [
-   { label: "SQL Pipe (READ_KAFKA)", description: "Pure SQL, flexible, recommended for engineers" },
-   { label: "Studio real-time sync task", description: "Graphical configuration, supports JSONPath computed columns" }
- ]}]})
- ```
-
- ---
-
- ### Step 3: Plan confirmation (mandatory, must not be skipped)
-
- Combine the requirements and technology choices into a complete plan summary and ask the user to confirm it:
-
- ```
- question({
-   questions: [{
-     question: "Confirm the following plan before the build starts:\nBusiness scenario: <scenario>\nData source: <data source name>\nSync method: <offline/real-time/SQL Pipe>\nLayering: <ODS/DWD/DWS or Bronze/Silver/Gold>\nTarget schema: <schema>\nSchedule: <Cron or continuous>\nStart now?",
-     options: [
-       { label: "Confirm, start building", description: "Load the matching skill and start creating tasks" },
-       { label: "Needs changes", description: "Collect the information again" }
-     ]
-   }]
- })
- ```
-
- After the user confirms, load the matching skill according to the routing table:
-
- **Routing table**
-
- | Data source | Freshness / method | Skill to load |
- |---|---|---|
- | External database | Real-time single-table CDC | `clickzetta-realtime-sync-pipeline` |
- | External database | Real-time multi-table / whole-database CDC | `clickzetta-cdc-sync-pipeline` |
- | External database | Offline batch | `clickzetta-batch-sync-pipeline` |
- | Kafka | SQL Pipe | `clickzetta-kafka-ingest-pipeline` |
- | Kafka | Studio real-time sync | `clickzetta-realtime-sync-pipeline` |
- | Object storage | SQL Pipe | `clickzetta-oss-ingest-pipeline` |
- | Object storage | Studio offline sync | `clickzetta-batch-sync-pipeline` |
- | ETL layering inside Lakehouse | N/A | `clickzetta-sql-pipeline-manager` |
- | End-to-end pipeline / not sure | N/A | `clickzetta-dw-modeling` |
-
- > Real-time CDC, single table vs multiple tables: if the user says "whole database" or "several tables" → `cdc-sync-pipeline`; "one table" → `realtime-sync-pipeline`; ask a follow-up question when unsure.
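
The routing table can be read as a plain lookup keyed on data source and method. The sketch below is illustrative only: `route_skill` and the key strings (`"external_db"`, `"sql_pipe"`, and so on) are hypothetical names invented for this example, and only the skill names are taken from the table.

```python
from typing import Dict, Optional, Tuple

# Keys are (data source, method) pairs. The key strings are hypothetical
# labels for this sketch; only the skill names come from the routing table.
ROUTES: Dict[Tuple[str, Optional[str]], str] = {
    ("external_db", "realtime_single"): "clickzetta-realtime-sync-pipeline",
    ("external_db", "realtime_multi"): "clickzetta-cdc-sync-pipeline",
    ("external_db", "offline_batch"): "clickzetta-batch-sync-pipeline",
    ("kafka", "sql_pipe"): "clickzetta-kafka-ingest-pipeline",
    ("kafka", "studio_realtime"): "clickzetta-realtime-sync-pipeline",
    ("object_storage", "sql_pipe"): "clickzetta-oss-ingest-pipeline",
    ("object_storage", "studio_offline"): "clickzetta-batch-sync-pipeline",
    ("lakehouse_etl", None): "clickzetta-sql-pipeline-manager",
    ("end_to_end", None): "clickzetta-dw-modeling",
    ("unsure", None): "clickzetta-dw-modeling",
}


def route_skill(source: str, method: Optional[str] = None) -> str:
    """Return the skill name for a (source, method) pair, or raise ValueError."""
    try:
        return ROUTES[(source, method)]
    except KeyError:
        # An unknown combination means more information is needed from the user.
        raise ValueError(
            f"no route for source={source!r}, method={method!r}; ask a follow-up question"
        ) from None
```

Raising on an unknown combination, rather than falling back to a default skill, mirrors the note above about asking a follow-up question when the choice is unclear.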
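-
- ---
-
- ## Studio Task Types
-
- Studio provides five categories of task; picking the wrong type is the most common engineering mistake:
-
- ### Offline sync (single table)
- Periodically performs a full sync of a single source table into Lakehouse.
-
- - **When to use**: periodic overwrite of a single table, relaxed freshness requirements (daily/hourly batches), resource optimization (real time not needed)
- - **Run mode**: periodic schedule (Cron required); each run overwrites in full or appends
- - **Data sources**: relational databases such as MySQL, PostgreSQL, SQL Server
- - **Matching skill**: `clickzetta-batch-sync-pipeline` (single-table mode)
-
- ### Multi-table offline sync
- Periodically batch-syncs multiple source tables or a whole database into Lakehouse.
-
- - **When to use**:
-   - Whole-database migration (sync all tables in bulk instead of configuring them one by one)
-   - Merging sharded databases and tables (several shards merged into one target table)
-   - Periodic data reconciliation (periodic full sync keeps the target consistent with the source)
- - **Run mode**: periodic schedule (Cron required); supports three modes: whole-database mirror, multi-table mirror, and multi-table merge
- - **Data sources**: MySQL, PostgreSQL, SQL Server, etc.
- - **Matching skill**: `clickzetta-batch-sync-pipeline` (multi-table mode)
-
- ### Real-time sync (single table)
- Continuously syncs the data of a single Kafka topic into Lakehouse in real time.
-
- - **When to use**: real-time ingestion of Kafka message streams, second- or minute-level latency requirements, fine-grained sync of a single topic
- - **Run mode**: continuous (no Cron needed; runs as soon as it is submitted)
- - **Data sources**: **Kafka only** (JSON message parsing, with support for JSONPath computed columns)
- - **Matching skill**: `clickzetta-realtime-sync-pipeline`
-
- ### Multi-table real-time sync (CDC)
- Syncs a whole MySQL / PostgreSQL database, or several of its tables, into Lakehouse in real time through CDC, in two phases: full load followed by incremental changes.
-
- - **When to use**: real-time mirror of a whole database, second-level end-to-end freshness, real-time merging of sharded databases and tables
- - **Run mode**: continuous (no Cron needed; runs as soon as it is submitted)
- - **Data sources**:
-
- | Type | Incremental read mode | Supported versions |
- |---|---|---|
- | MySQL family (including Aurora MySQL, PolarDB MySQL) | Binlog | 5.6 and later, 8.x |
- | PostgreSQL family (including Aurora PG, PolarDB PG) | WAL logs | 14 and later |
-
- - **Matching skill**: `clickzetta-cdc-sync-pipeline`
-
- ### Data development tasks (SQL / Python / Shell)
- Write and schedule data-processing logic in Studio; these tasks are the main vehicle for data warehouse ETL.
-
- - **SQL tasks**: ODS→DWD cleansing and transformation, data quality checks, ad hoc data fixes
- - **Python tasks**: custom data-processing scripts, calls to external APIs, machine learning inference
- - **Shell tasks**: system commands, file operations, calls to external tools
- - **Run mode**: periodic schedule (Cron) or manual trigger
- - **Matching skill**: `clickzetta-studio-task-manager` (this skill)
-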
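- ### The five task types at a glance
-
- | Task type | Data source | Sync granularity | Run mode | Freshness |
- |---|---|---|---|---|
- | Offline sync | Relational database | Single table | Periodic schedule | Hourly / daily |
- | Multi-table offline sync | Relational database | Multiple tables / whole database | Periodic schedule | Hourly / daily |
- | Real-time sync | **Kafka only** | Single topic | Continuous | Seconds / minutes |
- | Multi-table real-time sync | MySQL / PostgreSQL | Multiple tables / whole database | Continuous | Seconds |
- | Data development | Any (SQL/Python/Shell) | Custom logic | Periodic schedule or manual | Depends on schedule frequency |
-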
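- ---
-
- ## Core Principle: Build/Manage Separation
-
- **Different kinds of task need completely different scheduling strategies.** Confusing task kinds is the most common engineering mistake.
-
- | Task kind | Typical content | Studio task type | Schedule configuration | Status |
- |---|---|---|---|---|
- | **DDL table-creation task** | CREATE TABLE / CREATE SCHEMA | SQL task | ❌ No Cron, no dependencies | DRAFT |
- | **Data sync task** | External data source (relational database / object storage) → ODS | **SINGLE_DI / MULTI_DI / REALTIME** (not a SQL task) | ✅ Cron (offline) or continuous (real-time) | PUBLISHED |
- | **ETL transform task** | ODS→DWD cleansing SQL (inside Lakehouse) | SQL task | ✅ Cron + depends on the upstream sync | PUBLISHED |
- | **Data quality task** | Row-count checks, NULL-rate validation | SQL task | ✅ Cron + depends on ETL | PUBLISHED |
- | **DWS/ADS aggregation layer** | Metric rollups, wide report tables | ❌ Use a Dynamic Table, do not create a task | N/A | N/A |
-
- **Data sources supported by data sync tasks:**
- - Offline sync (SINGLE_DI/MULTI_DI): relational databases such as MySQL, PostgreSQL, Oracle, SQL Server, plus OSS/COS/S3 object storage
- - Single-table real-time sync (REALTIME): Kafka
- - Multi-table real-time sync (CDC): MySQL (Binlog, 5.6+/8.x), PostgreSQL (WAL, 14+)
-
- **Other ways to access data (these are not data sync tasks):**
- - Kafka/OSS/S3/COS → SQL Pipe (`READ_KAFKA` / Volume Pipe) is also an option; it and Studio sync tasks are both valid, so choose by scenario
- - Hive/Databricks/Snowflake Open Catalog → read-only federated queries through an External Catalog, with no data sync
-
- > ⚠️ **Never put a Cron schedule on a DDL task**: re-running table-creation statements triggers scheduling conflicts such as `SCHEDULE_TASK_HAD_CHILDREN_NODES_EXCEPTION`. As soon as a DDL task has finished executing, revert it to DRAFT.
-
- > ⚠️ **Do not create scheduled tasks for the DWS/ADS layer**: Dynamic Tables are refreshed automatically by the system, so an extra task is redundant computation and wastes resources.
-
- > ⚠️ **Never use a SQL task in place of a data sync task**: a SQL task cannot emulate a sync with `SELECT FROM EXTERNAL` (the syntax is not supported), and a JDBC task cannot be used either (JDBC only executes SQL on the external database and cannot sync data into Lakehouse).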
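
The scheduling rules in the table above can be expressed as a small lint check. This is a minimal sketch under stated assumptions: the `Task` record, the `kind` labels (`"ddl"`, `"sync"`, `"etl"`, `"dqc"`), and `check_separation` are hypothetical and are not part of cz-cli or Studio. Sync tasks are deliberately not checked for Cron, because offline syncs need one and real-time syncs do not.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Task:
    name: str
    kind: str                      # hypothetical labels: "ddl", "sync", "etl", "dqc"
    cron: Optional[str] = None     # None means no schedule is configured
    depends_on: List[str] = field(default_factory=list)
    status: str = "DRAFT"          # "DRAFT" or "PUBLISHED", as in the table


def check_separation(task: Task) -> List[str]:
    """Return the list of build/manage separation violations for one task."""
    problems: List[str] = []
    if task.kind == "ddl":
        if task.cron is not None:
            problems.append("DDL task must not have a Cron schedule")
        if task.depends_on:
            problems.append("DDL task must not have dependencies")
        if task.status != "DRAFT":
            problems.append("DDL task should be in DRAFT status")
    elif task.kind in ("etl", "dqc"):
        if task.cron is None:
            problems.append(f"{task.kind} task needs a Cron schedule")
        if not task.depends_on:
            problems.append(f"{task.kind} task needs an upstream dependency")
    elif task.kind != "sync":
        problems.append(f"unknown task kind: {task.kind}")
    return problems
```

For example, `check_separation(Task("01_ddl_ods", "ddl", cron="0 2 * * *"))` reports the Cron violation, while the same task without a schedule passes.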
-
- ---
-
- ## Task Folder Organization
-
- Each data warehouse project gets its own task folder in Studio, which holds all of the project's task assets:
-
- ```
- <business_domain>_dw/           ← project task folder (e.g. shenyu_gateway_dw, ecommerce_dw)
- ├── 00_sync_<source>_to_ods     ← data sync (Cron, runs first)
- ├── 01_ddl_ods                  ← ODS table creation (DRAFT, unscheduled, run once by hand)
- ├── 02_ddl_dwd                  ← DWD table creation (DRAFT, unscheduled, run once by hand)
- ├── 03_ddl_dws_ads              ← DWS/ADS Dynamic Table creation (DRAFT, unscheduled)
- ├── 04_transform_ods_to_dwd     ← ODS→DWD cleansing (Cron, depends on 00)
- └── 05_dqc_check                ← data quality check (Cron, depends on 04, optional)
- ```
-
- > The DWS/ADS layer is refreshed automatically by Dynamic Tables, so **no task needs to be created for it**.
-
- ---
-
- ## The cz-cli task Command Family
-
- ### Task folder management
-
- ```bash
- # Create a task folder
- cz-cli task folder create <folder_name>
-
- # List all task folders
- cz-cli task folder list
- ```
-
- ### Querying tasks
-
- ```bash
- # List all tasks
- cz-cli task list
-
- # Filter by folder
- cz-cli task list --folder <folder_name>
-
- # Show task details
- cz-cli task get <task_id>
-
- # Show task status
- cz-cli task status <task_id>
- ```
-
- ### Running tasks
-
- ```bash
- # Trigger a task run manually
- cz-cli task run <task_id>
-
- # Show the task's run logs
- cz-cli task logs <task_id>
-
- # Show the most recent run instances (up to 5)
- cz-cli task instances <task_id> --limit 5
- ```
-
- ### Creating tasks
-
- ```bash
- # Create a SQL task (ETL/DDL)
- cz-cli task create \
-   --name "04_transform_ods_to_dwd" \
-   --type SQL \
-   --folder <folder_name> \
-   --vcluster default \
-   --sql-file ./transform.sql
-
- # Create a data sync task (single table)
- cz-cli task create \
-   --name "00_sync_mysql_to_ods" \
-   --type SINGLE_DI \
-   --folder <folder_name>
- ```
-
- > ⚠️ **Capability limits of whole-database sync tasks (MULTI_DI)**: `cz-cli` can create the task skeleton, but the source/target field mapping **must be configured by hand in the Studio UI**. Recommended SOP:
- > 1. Run `cz-cli task create --type MULTI_DI` to create the task skeleton
- > 2. Copy the task link from the output and open it in a browser
- > 3. In the Studio UI, configure the source database, the target schema, and the field mapping
- > 4. Click publish to run it
-
- ---
-
- ## Scheduling Best Practices
-
- ### Cron expression reference
-
- ```
- # Daily at 02:00 (data sync)
- 0 2 * * *
-
- # Daily at 02:30 (ETL transform, 30 minutes after the sync is scheduled to start)
- 30 2 * * *
-
- # Daily at 03:00 (data quality check)
- 0 3 * * *
-
- # Every hour
- 0 * * * *
- ```
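
The reference above uses five-field Cron expressions, while the `save-cron` examples elsewhere in this file use a seven-field form with a leading seconds field and a trailing year field (for example `'0 30 2 * * ? *'`). This file does not say which form the CLI expects, so the helper below is only a sketch of how one form maps onto the other. `to_seven_field` is a hypothetical name, and the function handles only expressions whose day-of-week field is `*`, because day-of-week numbering can differ between Cron dialects.

```python
def to_seven_field(expr: str) -> str:
    """Map 'minute hour day-of-month month day-of-week' onto
    'second minute hour day-of-month month day-of-week year'."""
    fields = expr.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {expr!r}")
    minute, hour, dom, month, dow = fields
    if dow != "*":
        # Day-of-week values are left out of this sketch on purpose.
        raise ValueError("day-of-week values other than '*' are not handled")
    # Seconds are fixed at 0, day-of-week becomes '?', and the year is '*'.
    return f"0 {minute} {hour} {dom} {month} ? *"


print(to_seven_field("30 2 * * *"))
```

Applied to the 02:30 ETL entry, this yields the same string that the `save-cron` example uses for `04_transform_ods_to_dwd`.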
-
- ### Dependency configuration principles
-
- ```
- Correct dependency chain:
-   00_sync (Cron 02:00)
-     ↓ depended on by
-   04_transform (Cron 02:30)
-     ↓ depended on by
-   05_dqc (Cron 03:00)
-
- Incorrect dependencies:
-   ❌ DDL tasks (01/02/03) must not appear in the dependency chain
-   ❌ Dynamic Tables must not appear in the dependency chain
- ```
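
A dependency chain like the one above can be checked offline before it is published. The sketch below uses Python's standard `graphlib` module (Python 3.9+) to derive a run order and to detect circular dependencies. `run_order` is a hypothetical helper, and spotting DDL tasks by the `_ddl_` substring is an assumption based on the folder naming convention in this file.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set


def run_order(deps: Dict[str, Set[str]]) -> List[str]:
    """deps maps each task to the set of upstream tasks it waits for."""
    names = set(deps) | {up for ups in deps.values() for up in ups}
    # Assumption: DDL tasks are recognizable by "_ddl_" in their name.
    ddl = sorted(n for n in names if "_ddl_" in n)
    if ddl:
        raise ValueError(f"DDL tasks must not appear in the chain: {ddl}")
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError as exc:
        # args[1] holds the list of nodes that form the cycle.
        raise ValueError(f"circular dependency: {exc.args[1]}") from exc


chain = {
    "04_transform_ods_to_dwd": {"00_sync_mysql_to_ods"},
    "05_dqc_check": {"04_transform_ods_to_dwd"},
}
print(run_order(chain))
# ['00_sync_mysql_to_ods', '04_transform_ods_to_dwd', '05_dqc_check']
```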
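-
- ---
-
- ## Choosing a Data Sync Task Type
-
- | Scenario | Task type | Notes |
- |---|---|---|
- | Single MySQL/PG table synced into Lakehouse | `SINGLE_DI` | Simple; can be fully configured from the CLI |
- | Whole MySQL/PG database synced (multi-table mirror) | `MULTI_DI` | Skeleton created from the CLI, mapping configured in the UI |
- | Real-time Kafka ingestion | `REALTIME_SYNC` | Runs continuously, no Cron needed |
- | Batch file import (OSS/S3) | SQL task (COPY INTO) | Run COPY INTO from a SQL task |
-
- ---
-
- ## Troubleshooting
-
- | Problem | Cause | Fix |
- |---|---|---|
- | `SCHEDULE_TASK_HAD_CHILDREN_NODES_EXCEPTION` | A DDL task was given a Cron schedule or dependencies | Clear the DDL task's schedule configuration and revert it to DRAFT |
- | Publishing fails with a circular-dependency message | Task A depends on B and B depends on A | Inspect the dependency chain and remove the cycle |
- | A sync task keeps failing with no clear error | Incompatible column types (e.g. MySQL BIT(1) vs Lakehouse BOOLEAN) | Check the column type mapping against the type mapping table below |
- | A whole-database sync task cannot run after creation | The MULTI_DI task has no field mapping configured | Configure the source/target mapping in the Studio UI, then publish again |
- | An ETL task does not fire on time | The upstream sync task failed, so the dependency is not satisfied | Fix the upstream sync task first, then trigger the ETL manually |
- | DWS-layer data is not updated | A scheduled task was created by mistake while the Dynamic Table is not refreshing | Delete the redundant scheduled task and confirm the Dynamic Table status is RUNNING |
- | The task succeeds but the result is empty | A SQL logic problem (e.g. a LEFT JOIN filter in the wrong place) | Check the SQL; filters on the right-hand table of a LEFT JOIN must go in the ON clause |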
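
The LEFT JOIN pitfall in the last row is standard SQL behavior and can be reproduced anywhere. The sketch below uses Python's built-in `sqlite3` module as a stand-in engine, so it says nothing about Lakehouse-specific syntax; the `orders` and `users` tables and their contents are made up for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER, user_id INTEGER);
CREATE TABLE users  (user_id INTEGER, status TEXT);
INSERT INTO orders VALUES (1, 10), (2, 20), (3, 30);
INSERT INTO users  VALUES (10, 'active'), (20, 'inactive');
""")

# Filter in WHERE: rows whose right side is NULL or non-matching are dropped,
# so the LEFT JOIN silently behaves like an INNER JOIN.
where_rows = conn.execute("""
    SELECT o.order_id
    FROM orders o
    LEFT JOIN users u ON o.user_id = u.user_id
    WHERE u.status = 'active'
    ORDER BY o.order_id
""").fetchall()

# Filter in ON: every left-side row is kept; unmatched rows get NULLs.
on_rows = conn.execute("""
    SELECT o.order_id, u.status
    FROM orders o
    LEFT JOIN users u ON o.user_id = u.user_id AND u.status = 'active'
    ORDER BY o.order_id
""").fetchall()

print(where_rows)  # [(1,)]
print(on_rows)     # [(1, 'active'), (2, None), (3, None)]
```

If no right-hand row passes the filter at all, the WHERE variant returns nothing, which matches the "task succeeds but the result is empty" symptom.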
-
- ### MySQL → Lakehouse column type mapping (common sync-task pitfalls)
-
- | MySQL type | ❌ Do not use | ✅ Use in ODS | Conversion in DWD |
- |---|---|---|---|
- | `BIT(1)` | `BOOLEAN` | `TINYINT` | `CAST(col AS BOOLEAN)` |
- | `DATETIME` | `DATETIME` | `TIMESTAMP` | Use as is |
- | `ENUM('a','b')` | `ENUM` | `STRING` | Use as is |
- | `TEXT` / `LONGTEXT` | `TEXT` | `STRING` | Use as is |
- | `DECIMAL(p,s)` | `FLOAT` | `DECIMAL(p,s)` | Use as is |
- | `TINYINT(1)` | `BOOLEAN` | `TINYINT` | `CAST(col AS BOOLEAN)` |
-
- > **ODS-layer principle: prefer permissive types.** Once the sync succeeds, do the precise type conversion in the DWD layer, so the sync stage does not fail on type incompatibilities.
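
The mapping table can be turned into a lookup for generating ODS DDL. This is a minimal sketch: `ods_type` is a hypothetical helper, and keying on the base type name (so that any `BIT(n)` or `TINYINT(n)` maps to `TINYINT`) generalizes slightly beyond the exact rows in the table, which is an assumption of this example.

```python
import re

# ODS-layer target types, keyed by MySQL base type name (taken from the table).
ODS_TYPE = {
    "BIT": "TINYINT",
    "TINYINT": "TINYINT",
    "DATETIME": "TIMESTAMP",
    "ENUM": "STRING",
    "TEXT": "STRING",
    "LONGTEXT": "STRING",
}


def ods_type(mysql_type: str) -> str:
    """Return the ODS-layer type for a MySQL column type string."""
    normalized = mysql_type.strip().upper()
    match = re.match(r"[A-Z]+", normalized)
    if match is None:
        raise ValueError(f"cannot parse type: {mysql_type!r}")
    base = match.group(0)
    if base == "DECIMAL":
        # DECIMAL keeps its precision and scale unchanged.
        return normalized
    if base not in ODS_TYPE:
        raise KeyError(f"no mapping listed for {base}")
    return ODS_TYPE[base]
```

Types that are not in the table raise an error instead of being guessed, so an unmapped column is caught before the DDL is generated.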
-
- ---
-
- ## Full Engineering SOP
-
- ### Treat code as an asset
-
- **In data pipeline development and data warehouse modeling, all SQL code should be saved as Studio tasks so that it is a manageable code asset.**
-
- - A task is the container for the code, not just a schedule configuration
- - Even DDL that runs only once should be saved as a DRAFT task, so it is easy to look up, reuse, and migrate across environments
- - Cases that do not need to be saved as tasks: SELECT queries, ad hoc fix-up SQL, one-off verification queries
-
- ### Starting a new project (with quick verification checkpoints)
-
- Agile principle: **verify right after every step and know within 30 seconds whether it worked, instead of finding problems only after the whole chain has run.**
-
- ```
- 1. Create the task folder
-    cz-cli task folder create <business_domain>_dw
-
- 2. Create the ODS tables and verify immediately
-    cz-cli task save-content 01_ddl_ods --content "<ods_ddl_sql>"
-    cz-cli task run 01_ddl_ods
-    ✅ Verify: SHOW TABLES IN <ods_schema> → confirm the tables were created
-
- 3. Create the data sync task, trigger it once by hand, and verify immediately
-    - 00_sync: whole-database or single-table sync into ODS (MULTI_DI needs its mapping configured in the UI)
-    cz-cli task execute 00_sync
-    ✅ Verify: SELECT COUNT(*) FROM <ods_schema>.<table> → compare with the source row count
-               SELECT * FROM <ods_schema>.<table> LIMIT 5 → spot-check the columns
-
- 4. Create the DWD tables and verify immediately
-    cz-cli task save-content 02_ddl_dwd --content "<dwd_ddl_sql>"
-    cz-cli task run 02_ddl_dwd
-    ✅ Verify: SHOW TABLES IN <dwd_schema> → confirm the tables were created
-
- 5. Write the ETL transform SQL, run it once by hand to validate the logic, then schedule it
-    cz-cli task save-content 04_transform_ods_to_dwd --content "<etl_sql>"
-    cz-cli task execute 04_transform_ods_to_dwd   ← run it once by hand first
-    ✅ Verify: SELECT COUNT(*) FROM <dwd_schema>.<table> → row count is as expected
-               check the non-null rate of key columns; LEFT JOIN result rows ≥ left-table rows
-    Once it checks out, configure the schedule:
-    cz-cli task save-cron 04_transform_ods_to_dwd --cron '0 30 2 * * ? *'
-    cz-cli task deploy 04_transform_ods_to_dwd
-
- 6. Create the DWS/ADS Dynamic Tables and trigger the first refresh to verify
-    cz-cli task save-content 03_ddl_dws_ads --content "<dws_ads_ddl_sql>"
-    cz-cli task run 03_ddl_dws_ads
-    REFRESH DYNAMIC TABLE <dws_schema>.<table>
-    ✅ Verify: SHOW DYNAMIC TABLE REFRESH HISTORY <schema>.<table> LIMIT 3
-               → status = SUCCESS, row count matches the aggregation logic
-
- 7. Optional: data quality check task (Cron + depends on 04)
-    cz-cli task save-content 05_dqc_check --content "<dqc_sql>"
-    cz-cli task save-cron 05_dqc_check --cron '0 0 3 * * ? *'
-    cz-cli task deploy 05_dqc_check
- ```
-
- > **Fail-fast principle**: if any verification step fails, stop and fix it right away instead of moving on. If the ODS data is wrong, the DWD data will be wrong too.
-
- ---
522
-
523
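- ## Incremental iteration guide
524
-
525
- **When an existing pipeline needs changes, follow the incremental flow instead of repeating the full pipeline build.**
526
-
527
- When the user says "add a table", "add a field", "add a metric", or "change the ETL logic", first use the interactive tool to collect the iteration type: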
528
-
529
- ```
530
- question({
531
- questions: [{
532
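- question: "What change do you want to make to the existing pipeline?",
533
- options: [
534
- { label: "Add a sync table", description: "Add a source table to the existing sync task" },
535
- { label: "Add a field", description: "The source table gained a field and ODS/DWD need to follow" },
536
- { label: "Add a metric / DWS layer", description: "Add aggregation logic or a Dynamic Table" },
537
- { label: "Modify ETL logic", description: "Changes to cleansing rules, filter conditions, or JOIN relationships" }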
538
- ]
539
- }]
540
- })
541
- ```
542
-
543
- ### Add a sync table
544
-
545
- ```
546
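- 1. Check lineage to confirm the scope of impact
547
- Load clickzetta-table-lineage and confirm whether the new table is related to existing tables
548
-
549
- 2. Add the table to the existing sync task (or create a new single-table sync task)
550
- cz-cli task content 00_sync → view the current configuration
551
- Modify as needed, then redeploy
552
-
553
- 3. Trigger the sync manually and verify immediately
554
- cz-cli task execute 00_sync
555
- ✅ SELECT COUNT(*) FROM <ods_schema>.<new_table>
556
-
557
- 4. If DWD-layer processing is needed, add the ETL SQL and append it to the 04_transform task
558
- cz-cli task content 04_transform_ods_to_dwd → view the current SQL
559
- Append the cleansing logic for the new table, verify with a manual run, then redeploy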
560
- ```
561
-
562
- ### Add a field (Schema Evolution)
563
-
564
- ```
565
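- 1. Check lineage to identify every affected downstream task/DT
566
- Load clickzetta-table-lineage
567
-
568
- 2. Update layer by layer (upstream to downstream, without skipping layers)
569
- ODS layer: ALTER TABLE <ods_schema>.<table> ADD COLUMN <col> <type>
570
- ✅ Verify: DESC TABLE <ods_schema>.<table> → confirm the field was added
571
-
572
- DWD layer: update the ETL SQL to include cleansing logic for the new field
573
- Run 04_transform manually to verify, then redeploy
574
- ✅ Verify: SELECT <new_col>, COUNT(*) FROM <dwd_schema>.<table> GROUP BY 1 LIMIT 5
575
-
576
- DWS/ADS layer (if needed): Dynamic Tables do not support ALTER; rebuild with CREATE OR REPLACE
577
- Run REFRESH DYNAMIC TABLE immediately after rebuilding
578
- ✅ Verify: SHOW DYNAMIC TABLE REFRESH HISTORY LIMIT 3 → status = SUCCESS
579
-
580
- 3. Update the Studio task script (keep the code asset in sync)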
581
- cz-cli task save-content <task_name> --content "<updated_sql>"
582
- ```
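The "upstream to downstream, without skipping layers" rule amounts to a topological ordering of the lineage graph. The sketch below uses Python's standard `graphlib`; the table names and the `lineage` mapping are hypothetical stand-ins for whatever the lineage lookup returns.

```python
from graphlib import TopologicalSorter

# Hypothetical lineage: each table maps to the tables it reads from.
lineage = {
    "dwd.orders": {"ods.orders"},
    "dws.order_daily": {"dwd.orders"},
    "ads.order_report": {"dws.order_daily"},
}

# static_order() yields every table after all of its upstream tables,
# which is the order in which the new field has to be rolled out.
update_order = list(TopologicalSorter(lineage).static_order())

for table in update_order:
    print(table)
```

For this linear chain the only valid order is ODS, then DWD, then DWS, then ADS. With a branching lineage several orders are valid, and any of them respects the rule.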
583
-
584
- ### Add a metric / DWS layer
585
-
586
- ```
587
- 1. Confirm the metric definition (agree on the calculation logic with the user to avoid rework later)
588
-
589
- 2. Check whether the DWD layer has the required fields; if not, go through the "Add a field" flow first
590
-
591
- 3. Create the new Dynamic Table
592
- CREATE OR REPLACE DYNAMIC TABLE <dws_schema>.<new_metric_table>
593
- REFRESH INTERVAL <n> <unit> vcluster <gp_cluster>
594
- AS SELECT ...;
595
- REFRESH DYNAMIC TABLE <dws_schema>.<new_metric_table>
596
- ✅ Verify: SELECT COUNT(*), SUM(<metric>) FROM <dws_schema>.<new_metric_table>
597
- Compare against known baseline values
598
-
599
- 4. Save the DDL to the Studio task
600
- cz-cli task save-content 03_ddl_dws_ads --content "<updated_ddl>"
601
- ```
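The baseline comparison in step 3 can be made explicit with a tolerance check. This is a sketch only: the `matches_baseline` helper and the 0.1% default tolerance are assumptions, and the right tolerance depends on the metric.

```python
import math


def matches_baseline(actual: float, baseline: float,
                     rel_tol: float = 0.001) -> bool:
    """True when the new metric is within rel_tol of the known baseline."""
    return math.isclose(actual, baseline, rel_tol=rel_tol)


# Hypothetical values: SUM(<metric>) from the new table vs. a known figure.
close_enough = matches_baseline(1000.5, 1000.0)
too_far = matches_baseline(1100.0, 1000.0)
```

A relative tolerance is used so the same check works for both small counts and large sums.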
602
-
603
- ### Modify ETL logic
604
-
605
- ```
606
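- 1. Check lineage to confirm the downstream impact
607
- Load clickzetta-table-lineage
608
-
609
- 2. Verify the new logic in a dev/test environment first (if one exists)
610
-
611
- 3. Update the ETL SQL
612
- cz-cli task content 04_transform_ods_to_dwd → view the current logic
613
- After modifying, run it manually first to verify:
614
- cz-cli task execute 04_transform_ods_to_dwd
615
- ✅ Verify: compare row counts, sample key fields, and compare with the results from before the change
616
-
617
- 4. Redeploy once verification passes
618
- cz-cli task save-content 04_transform_ods_to_dwd --content "<new_sql>"
619
- cz-cli task deploy 04_transform_ods_to_dwd
620
-
621
- 5. If downstream Dynamic Tables are affected, trigger a full refresh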
622
- SET cz.optimizer.incremental.force.full.refresh = true;
623
- REFRESH DYNAMIC TABLE <dws_schema>.<table>;
624
- SET cz.optimizer.incremental.force.full.refresh = false;
625
- ```
626
-
627
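- ### Delivery verification checklist
628
-
629
- - [ ] Row counts at every layer match expectations
630
- - [ ] The VCluster used by each Dynamic Table exists and has `status = RUNNING` (`SHOW VCLUSTERS`)
631
- - [ ] Dynamic Table refresh history shows SUCCESS
632
- - [ ] NULL rate of key fields is within an acceptable range
633
- - [ ] LEFT JOIN result row count ≥ left table row count
634
- - [ ] All DDL tasks are in DRAFT status
635
- - [ ] No redundant scheduled tasks in the DWS/ADS layer
636
- - [ ] The scheduling DAG has no cyclic dependencies
637
- - [ ] **ETL task dependency chain is complete** (verify with `cz-cli task deps <task>`; `task_dependencies` is not empty)
638
- - [ ] Key tables and fields have comments (load `clickzetta-manage-comments`)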
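Three of the checklist items (NULL rate, LEFT JOIN row count, no cyclic dependencies) are mechanical enough to sketch in code. The helpers below are hypothetical and work on values already fetched from the warehouse; they do not call `cz-cli` or run SQL themselves.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, Optional, Sequence, Set


def null_rate(values: Sequence[Optional[object]]) -> float:
    """Fraction of None entries in a sampled column (0.0 if empty)."""
    if not values:
        return 0.0
    return sum(v is None for v in values) / len(values)


def left_join_rows_ok(left_rows: int, joined_rows: int) -> bool:
    """A LEFT JOIN with no later filter keeps every left-side row,
    so the result should never have fewer rows than the left table."""
    return joined_rows >= left_rows


def has_cycle(deps: Dict[str, Set[str]]) -> bool:
    """True if the task dependency graph contains a cycle."""
    try:
        TopologicalSorter(deps).prepare()
    except CycleError:
        return True
    return False


# Hypothetical sample values.
rate = null_rate([1, None, 3, None])
join_ok = left_join_rows_ok(left_rows=100, joined_rows=120)
cyclic = has_cycle({"a": {"b"}, "b": {"a"}})
```

The row-count check only holds when nothing filters the join output afterwards. A `WHERE` clause on right-side columns can legitimately shrink the result.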
639
-
640
- ---
641
-
642
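- ## Multi-environment management (dev → prod)
643
-
644
- ClickZetta isolates environments through **Workspaces** (dev/staging/prod map to different Workspaces). Pipeline migration across Workspaces currently has limited automation and relies mainly on manual steps.
645
-
646
- **When the user asks for multi-environment migration**, explain the following limitations and guide them:
647
-
648
- - Data source configuration, Schema, and VCluster names are independent in each Workspace and must be confirmed and replaced one by one during migration
649
- - There is currently no one-click migration tool; we recommend contacting **data operations (the lh-dba role)** for help planning a multi-environment strategy
650
- - You can export a task script with `cz-cli task content <task_id>`, adjust it manually, and recreate it in the target Workspace
651
-
652
- > Multi-environment management is a direction in which the platform's capabilities are evolving. For now, we recommend distinguishing environments by Schema naming within a single Workspace (e.g. `ecommerce_ods_dev` vs `ecommerce_ods`) to reduce migration complexity.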
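The naming convention in the note above (`ecommerce_ods_dev` vs `ecommerce_ods`) can be captured in a small helper. This is a sketch: the `schema_for_env` function and the rule that production keeps the bare name are inferred from the single example in the text, not from any documented API.

```python
def schema_for_env(base: str, env: str, prod_env: str = "prod") -> str:
    """Return the schema name for an environment.

    Production keeps the base name; every other environment gets an
    `_<env>` suffix, matching the example in the note above.
    """
    return base if env == prod_env else f"{base}_{env}"


dev_schema = schema_for_env("ecommerce_ods", "dev")
prod_schema = schema_for_env("ecommerce_ods", "prod")
```

Deriving names in one place keeps SQL templates identical across environments, so only the environment value changes during a later migration.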
@@ -1,16 +0,0 @@
1
- ClickZetta Skills License
2
- © 2026 Yunqi Inc. All rights reserved.
3
- LICENSE: Use of these materials (including all code, prompts, assets, files, and other components of these skills (collectively, "Skills")) is governed by your agreement with ClickZetta for the Service. If no separate agreement exists, use is governed by ClickZetta's Terms of Service (available at: https://yunqi.tech/documents/user-aggrement).
4
- Your applicable agreement is referred to as the "Agreement." "Service" is as defined in the Agreement.
5
- ADDITIONAL RESTRICTIONS: Notwithstanding anything in the Agreement to the contrary, you may not:
6
-
7
- Extract from the Service or retain copies of the Skills outside use with the Service;
8
- Reproduce or copy the Skills, except for temporary copies created automatically during authorized use of the Service;
9
- Create derivative works based on the Skills;
10
- Distribute, sublicense, or transfer the Skills to any third party;
11
- Make, offer to sell, sell, or import any inventions embodied in the Skills; nor,
12
- Reverse engineer, decompile, or disassemble the Skills.
13
-
14
- The receipt, viewing, or possession of the Skills does not convey or imply any license or right beyond those expressly granted above.
15
- Yunqi retains all rights, title, and interest in the Skills, including all copyrights, trademarks, patents, and all other applicable intellectual property rights.
16
- THE SKILLS ARE PROVIDED "AS IS," WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SKILLS OR THE USE OR OTHER DEALINGS IN THE SKILLS.