@clickzetta/cz-cli-darwin-arm64 0.3.80 → 0.3.81

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (201)
  1. package/bin/cz-cli +0 -0
  2. package/package.json +1 -1
  3. package/bin/skills/clickzetta-access-control/LICENSE +0 -16
  4. package/bin/skills/clickzetta-access-control/SKILL.md +0 -243
  5. package/bin/skills/clickzetta-access-control/eval_cases.jsonl +0 -3
  6. package/bin/skills/clickzetta-access-control/references/dynamic-masking.md +0 -86
  7. package/bin/skills/clickzetta-access-control/references/grant-revoke.md +0 -103
  8. package/bin/skills/clickzetta-access-control/references/role-management.md +0 -66
  9. package/bin/skills/clickzetta-access-control/references/user-management.md +0 -61
  10. package/bin/skills/clickzetta-app-python-sdk/LICENSE +0 -16
  11. package/bin/skills/clickzetta-app-python-sdk/SKILL.md +0 -153
  12. package/bin/skills/clickzetta-app-python-sdk/eval_cases.jsonl +0 -12
  13. package/bin/skills/clickzetta-app-python-sdk/references/bulkload.md +0 -196
  14. package/bin/skills/clickzetta-app-python-sdk/references/connector.md +0 -143
  15. package/bin/skills/clickzetta-app-python-sdk/references/realtime.md +0 -122
  16. package/bin/skills/clickzetta-batch-sync-pipeline/LICENSE +0 -16
  17. package/bin/skills/clickzetta-batch-sync-pipeline/SKILL.md +0 -227
  18. package/bin/skills/clickzetta-batch-sync-pipeline/eval_cases.jsonl +0 -5
  19. package/bin/skills/clickzetta-bi-connect/LICENSE +0 -16
  20. package/bin/skills/clickzetta-bi-connect/SKILL.md +0 -176
  21. package/bin/skills/clickzetta-bi-connect/eval_cases.jsonl +0 -5
  22. package/bin/skills/clickzetta-bi-connect/references/bi-tools.md +0 -170
  23. package/bin/skills/clickzetta-cdc-sync-pipeline/LICENSE +0 -16
  24. package/bin/skills/clickzetta-cdc-sync-pipeline/SKILL.md +0 -633
  25. package/bin/skills/clickzetta-cdc-sync-pipeline/eval_cases.jsonl +0 -5
  26. package/bin/skills/clickzetta-data-ingest-pipeline/LICENSE +0 -16
  27. package/bin/skills/clickzetta-data-ingest-pipeline/SKILL.md +0 -237
  28. package/bin/skills/clickzetta-data-ingest-pipeline/eval_cases.jsonl +0 -5
  29. package/bin/skills/clickzetta-data-retention/LICENSE +0 -16
  30. package/bin/skills/clickzetta-data-retention/SKILL.md +0 -160
  31. package/bin/skills/clickzetta-data-retention/eval_cases.jsonl +0 -5
  32. package/bin/skills/clickzetta-data-retention/references/lifecycle-reference.md +0 -175
  33. package/bin/skills/clickzetta-data-science/LICENSE +0 -16
  34. package/bin/skills/clickzetta-data-science/SKILL.md +0 -125
  35. package/bin/skills/clickzetta-data-science/eval_cases.jsonl +0 -12
  36. package/bin/skills/clickzetta-data-science/references/bitmap-profile.md +0 -146
  37. package/bin/skills/clickzetta-data-science/references/data-patterns.md +0 -110
  38. package/bin/skills/clickzetta-data-science/references/setup.md +0 -160
  39. package/bin/skills/clickzetta-data-science/references/stats-functions.md +0 -195
  40. package/bin/skills/clickzetta-data-science/references/write-and-infer.md +0 -122
  41. package/bin/skills/clickzetta-data-science/references/zettapark-api.md +0 -156
  42. package/bin/skills/clickzetta-data-sharing/LICENSE +0 -16
  43. package/bin/skills/clickzetta-data-sharing/SKILL.md +0 -160
  44. package/bin/skills/clickzetta-data-sharing/eval_cases.jsonl +0 -3
  45. package/bin/skills/clickzetta-data-sharing/references/share-ddl.md +0 -134
  46. package/bin/skills/clickzetta-dba-guide/LICENSE +0 -16
  47. package/bin/skills/clickzetta-dba-guide/SKILL.md +0 -542
  48. package/bin/skills/clickzetta-dba-guide/eval_cases.jsonl +0 -3
  49. package/bin/skills/clickzetta-dw-modeling/LICENSE +0 -16
  50. package/bin/skills/clickzetta-dw-modeling/SKILL.md +0 -351
  51. package/bin/skills/clickzetta-dw-modeling/eval_cases.jsonl +0 -4
  52. package/bin/skills/clickzetta-dw-modeling/references/modeling-patterns.md +0 -100
  53. package/bin/skills/clickzetta-dynamic-table/LICENSE +0 -16
  54. package/bin/skills/clickzetta-dynamic-table/SKILL.md +0 -230
  55. package/bin/skills/clickzetta-dynamic-table/best-practices/dimension-table-join-guide.md +0 -253
  56. package/bin/skills/clickzetta-dynamic-table/best-practices/medallion-and-stream-patterns.md +0 -124
  57. package/bin/skills/clickzetta-dynamic-table/best-practices/non-partitioned-merge-into-warning.md +0 -96
  58. package/bin/skills/clickzetta-dynamic-table/best-practices/performance-optimization.md +0 -109
  59. package/bin/skills/clickzetta-dynamic-table/best-practices/scheduling-guide.md +0 -135
  60. package/bin/skills/clickzetta-dynamic-table/dt-creator/SKILL.md +0 -15
  61. package/bin/skills/clickzetta-dynamic-table/dt-creator/references/dt-declaration-strategy.md +0 -185
  62. package/bin/skills/clickzetta-dynamic-table/dt-creator/references/incremental-config-reference.md +0 -427
  63. package/bin/skills/clickzetta-dynamic-table/dt-creator/references/refresh-history-guide.md +0 -260
  64. package/bin/skills/clickzetta-dynamic-table/dt-creator/references/sql-limitations.md +0 -80
  65. package/bin/skills/clickzetta-dynamic-table/dynamic-table-alter/SKILL.md +0 -190
  66. package/bin/skills/clickzetta-dynamic-table/eval_cases.jsonl +0 -5
  67. package/bin/skills/clickzetta-dynamic-table/sql-to-dt/SKILL.md +0 -27
  68. package/bin/skills/clickzetta-dynamic-table/sql-to-dt/references/sql2dt-column-validation-rules.md +0 -118
  69. package/bin/skills/clickzetta-dynamic-table/sql-to-dt/references/sql2dt-conversion-rules.md +0 -225
  70. package/bin/skills/clickzetta-dynamic-table/sql-to-dt/references/sql2dt-placeholder-rules.md +0 -182
  71. package/bin/skills/clickzetta-dynamic-table/sql-to-dt/references/sql2dt-refresh-rules.md +0 -98
  72. package/bin/skills/clickzetta-dynamic-table/sql-to-dt/references/sql2dt-self-reference-rules.md +0 -76
  73. package/bin/skills/clickzetta-dynamic-table/sql-to-dt/references/sql2dt-workflow.md +0 -109
  74. package/bin/skills/clickzetta-external-catalog/LICENSE +0 -16
  75. package/bin/skills/clickzetta-external-catalog/SKILL.md +0 -123
  76. package/bin/skills/clickzetta-external-catalog/eval_cases.jsonl +0 -5
  77. package/bin/skills/clickzetta-external-catalog/references/external-catalog-ddl.md +0 -130
  78. package/bin/skills/clickzetta-external-function/LICENSE +0 -16
  79. package/bin/skills/clickzetta-external-function/SKILL.md +0 -203
  80. package/bin/skills/clickzetta-external-function/eval_cases.jsonl +0 -4
  81. package/bin/skills/clickzetta-external-function/references/external-function-ddl.md +0 -171
  82. package/bin/skills/clickzetta-file-import-pipeline/LICENSE +0 -16
  83. package/bin/skills/clickzetta-file-import-pipeline/SKILL.md +0 -190
  84. package/bin/skills/clickzetta-file-import-pipeline/eval_cases.jsonl +0 -5
  85. package/bin/skills/clickzetta-index-manager/LICENSE +0 -16
  86. package/bin/skills/clickzetta-index-manager/SKILL.md +0 -140
  87. package/bin/skills/clickzetta-index-manager/eval_cases.jsonl +0 -5
  88. package/bin/skills/clickzetta-index-manager/references/bloomfilter-index.md +0 -67
  89. package/bin/skills/clickzetta-index-manager/references/index-management.md +0 -73
  90. package/bin/skills/clickzetta-index-manager/references/inverted-index.md +0 -80
  91. package/bin/skills/clickzetta-index-manager/references/vector-index.md +0 -81
  92. package/bin/skills/clickzetta-java-sdk/LICENSE +0 -16
  93. package/bin/skills/clickzetta-java-sdk/SKILL.md +0 -186
  94. package/bin/skills/clickzetta-java-sdk/eval_cases.jsonl +0 -12
  95. package/bin/skills/clickzetta-java-sdk/references/bulkload.md +0 -163
  96. package/bin/skills/clickzetta-java-sdk/references/realtime.md +0 -212
  97. package/bin/skills/clickzetta-kafka-ingest-pipeline/LICENSE +0 -16
  98. package/bin/skills/clickzetta-kafka-ingest-pipeline/SKILL.md +0 -769
  99. package/bin/skills/clickzetta-kafka-ingest-pipeline/eval_cases.jsonl +0 -5
  100. package/bin/skills/clickzetta-kafka-ingest-pipeline/references/kafka-pipe-syntax.md +0 -324
  101. package/bin/skills/clickzetta-lakehouse-connect/LICENSE +0 -16
  102. package/bin/skills/clickzetta-lakehouse-connect/SKILL.md +0 -218
  103. package/bin/skills/clickzetta-lakehouse-connect/eval_cases.jsonl +0 -3
  104. package/bin/skills/clickzetta-lakehouse-connect/evals/evals.json +0 -35
  105. package/bin/skills/clickzetta-lakehouse-connect/references/config-file.md +0 -435
  106. package/bin/skills/clickzetta-lakehouse-connect/references/jdbc.md +0 -478
  107. package/bin/skills/clickzetta-lakehouse-connect/references/python-sdk.md +0 -225
  108. package/bin/skills/clickzetta-lakehouse-connect/references/sqlalchemy.md +0 -468
  109. package/bin/skills/clickzetta-lakehouse-connect/references/zettapark-session.md +0 -445
  110. package/bin/skills/clickzetta-manage-comments/LICENSE +0 -16
  111. package/bin/skills/clickzetta-manage-comments/SKILL.md +0 -219
  112. package/bin/skills/clickzetta-manage-comments/eval_cases.jsonl +0 -3
  113. package/bin/skills/clickzetta-metadata/LICENSE +0 -16
  114. package/bin/skills/clickzetta-metadata/SKILL.md +0 -502
  115. package/bin/skills/clickzetta-metadata/eval_cases.jsonl +0 -5
  116. package/bin/skills/clickzetta-metadata/references/instance-views-reference.md +0 -276
  117. package/bin/skills/clickzetta-metadata/references/metering-views-reference.md +0 -137
  118. package/bin/skills/clickzetta-metadata/references/show-desc-reference.md +0 -326
  119. package/bin/skills/clickzetta-metadata/references/views-reference.md +0 -271
  120. package/bin/skills/clickzetta-monitoring/LICENSE +0 -16
  121. package/bin/skills/clickzetta-monitoring/SKILL.md +0 -215
  122. package/bin/skills/clickzetta-monitoring/eval_cases.jsonl +0 -5
  123. package/bin/skills/clickzetta-monitoring/references/job-history-analysis.md +0 -97
  124. package/bin/skills/clickzetta-monitoring/references/show-jobs.md +0 -48
  125. package/bin/skills/clickzetta-oss-ingest-pipeline/LICENSE +0 -16
  126. package/bin/skills/clickzetta-oss-ingest-pipeline/SKILL.md +0 -562
  127. package/bin/skills/clickzetta-oss-ingest-pipeline/eval_cases.jsonl +0 -5
  128. package/bin/skills/clickzetta-overview/LICENSE +0 -16
  129. package/bin/skills/clickzetta-overview/SKILL.md +0 -102
  130. package/bin/skills/clickzetta-overview/eval_cases.jsonl +0 -5
  131. package/bin/skills/clickzetta-overview/references/brands-and-endpoints.md +0 -79
  132. package/bin/skills/clickzetta-overview/references/object-model.md +0 -311
  133. package/bin/skills/clickzetta-overview/references/studio-modules.md +0 -173
  134. package/bin/skills/clickzetta-pipeline-review/LICENSE +0 -16
  135. package/bin/skills/clickzetta-pipeline-review/SKILL.md +0 -377
  136. package/bin/skills/clickzetta-query-optimizer/LICENSE +0 -16
  137. package/bin/skills/clickzetta-query-optimizer/SKILL.md +0 -156
  138. package/bin/skills/clickzetta-query-optimizer/eval_cases.jsonl +0 -5
  139. package/bin/skills/clickzetta-query-optimizer/references/explain.md +0 -56
  140. package/bin/skills/clickzetta-query-optimizer/references/hints-and-sortkey.md +0 -78
  141. package/bin/skills/clickzetta-query-optimizer/references/optimize.md +0 -65
  142. package/bin/skills/clickzetta-query-optimizer/references/result-cache.md +0 -49
  143. package/bin/skills/clickzetta-query-optimizer/references/show-jobs.md +0 -42
  144. package/bin/skills/clickzetta-realtime-sync-pipeline/LICENSE +0 -16
  145. package/bin/skills/clickzetta-realtime-sync-pipeline/SKILL.md +0 -323
  146. package/bin/skills/clickzetta-realtime-sync-pipeline/eval_cases.jsonl +0 -5
  147. package/bin/skills/clickzetta-semantic-view/LICENSE +0 -16
  148. package/bin/skills/clickzetta-semantic-view/SKILL.md +0 -207
  149. package/bin/skills/clickzetta-semantic-view/eval_cases.jsonl +0 -12
  150. package/bin/skills/clickzetta-semantic-view/references/semantic-view-reference.md +0 -167
  151. package/bin/skills/clickzetta-spark-flink-connector/LICENSE +0 -16
  152. package/bin/skills/clickzetta-spark-flink-connector/SKILL.md +0 -92
  153. package/bin/skills/clickzetta-spark-flink-connector/eval_cases.jsonl +0 -5
  154. package/bin/skills/clickzetta-spark-flink-connector/references/flink.md +0 -147
  155. package/bin/skills/clickzetta-spark-flink-connector/references/spark.md +0 -132
  156. package/bin/skills/clickzetta-sql-pipeline-manager/LICENSE +0 -16
  157. package/bin/skills/clickzetta-sql-pipeline-manager/SKILL.md +0 -485
  158. package/bin/skills/clickzetta-sql-pipeline-manager/eval_cases.jsonl +0 -12
  159. package/bin/skills/clickzetta-sql-pipeline-manager/evals/evals.json +0 -166
  160. package/bin/skills/clickzetta-sql-pipeline-manager/references/dynamic-table.md +0 -185
  161. package/bin/skills/clickzetta-sql-pipeline-manager/references/materialized-view.md +0 -129
  162. package/bin/skills/clickzetta-sql-pipeline-manager/references/pipe.md +0 -222
  163. package/bin/skills/clickzetta-sql-pipeline-manager/references/table-stream.md +0 -125
  164. package/bin/skills/clickzetta-sql-syntax-guide/LICENSE +0 -16
  165. package/bin/skills/clickzetta-sql-syntax-guide/SKILL.md +0 -249
  166. package/bin/skills/clickzetta-sql-syntax-guide/eval_cases.jsonl +0 -3
  167. package/bin/skills/clickzetta-sql-syntax-guide/references/ddl-reference.md +0 -350
  168. package/bin/skills/clickzetta-sql-syntax-guide/references/dml-reference.md +0 -279
  169. package/bin/skills/clickzetta-sql-syntax-guide/references/dql-reference.md +0 -504
  170. package/bin/skills/clickzetta-sql-syntax-guide/references/functions-reference.md +0 -372
  171. package/bin/skills/clickzetta-sql-syntax-guide/references/migration-databricks.md +0 -260
  172. package/bin/skills/clickzetta-sql-syntax-guide/references/migration-snowflake.md +0 -382
  173. package/bin/skills/clickzetta-sql-syntax-guide/references/vs-snowflake.md +0 -346
  174. package/bin/skills/clickzetta-sql-syntax-guide/references/vs-spark.md +0 -229
  175. package/bin/skills/clickzetta-studio-task-manager/LICENSE +0 -16
  176. package/bin/skills/clickzetta-studio-task-manager/SKILL.md +0 -652
  177. package/bin/skills/clickzetta-table-lineage/LICENSE +0 -16
  178. package/bin/skills/clickzetta-table-lineage/SKILL.md +0 -90
  179. package/bin/skills/clickzetta-table-lineage/eval_cases.jsonl +0 -1
  180. package/bin/skills/clickzetta-table-lineage/references/normalize_func.sql +0 -14
  181. package/bin/skills/clickzetta-table-lineage/references/table_cost.sql +0 -38
  182. package/bin/skills/clickzetta-table-lineage/references/table_lineage_standalone.html +0 -562
  183. package/bin/skills/clickzetta-table-lineage/references/table_relation.sql +0 -25
  184. package/bin/skills/clickzetta-table-stream-pipeline/LICENSE +0 -16
  185. package/bin/skills/clickzetta-table-stream-pipeline/SKILL.md +0 -206
  186. package/bin/skills/clickzetta-table-stream-pipeline/eval_cases.jsonl +0 -5
  187. package/bin/skills/clickzetta-vcluster-manager/LICENSE +0 -16
  188. package/bin/skills/clickzetta-vcluster-manager/SKILL.md +0 -212
  189. package/bin/skills/clickzetta-vcluster-manager/eval_cases.jsonl +0 -5
  190. package/bin/skills/clickzetta-vcluster-manager/references/vc-cache.md +0 -54
  191. package/bin/skills/clickzetta-vcluster-manager/references/vcluster-ddl.md +0 -150
  192. package/bin/skills/clickzetta-volume-manager/LICENSE +0 -16
  193. package/bin/skills/clickzetta-volume-manager/SKILL.md +0 -292
  194. package/bin/skills/clickzetta-volume-manager/eval_cases.jsonl +0 -5
  195. package/bin/skills/clickzetta-volume-manager/references/volume-ddl.md +0 -199
  196. package/bin/skills/clickzetta-zettapark/LICENSE +0 -16
  197. package/bin/skills/clickzetta-zettapark/SKILL.md +0 -248
  198. package/bin/skills/clickzetta-zettapark/eval_cases.jsonl +0 -12
  199. package/bin/skills/clickzetta-zettapark/references/zettapark-api.md +0 -283
  200. package/bin/skills/cz-cli/SKILL.md +0 -311
  201. package/bin/skills/cz-cli/references/profile-setup.md +0 -120
@@ -1,562 +0,0 @@
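- ---
- name: clickzetta-oss-ingest-pipeline
- description: |
-   Build ClickZetta data ingestion pipelines for object storage (OSS/S3/COS), covering two scenarios:
-   continuous ingestion (PIPE) and one-off batch import. Continuous ingestion supports the LIST_PURGE
-   scan mode and the EVENT_NOTIFICATION message-notification mode; batch import supports
-   Volume + INSERT INTO and Volume + COPY INTO. Trigger when the user says "object storage import",
-   "OSS data pipeline", "S3 data import", "PIPE continuous ingestion", "automatic file loading",
-   "bucket data sync", "COS import", "batch import from OSS", "load data from OSS", or "Volume import".
-   Covers ClickZetta-specific logic: PIPE continuous ingestion (two INGEST_MODE values), batch import
-   (Volume + COPY/INSERT), Connection/Volume creation, and monitoring and management.
-   Keywords: OSS, S3, COS, object storage, PIPE, COPY INTO, file ingestion
- ---
-
- # Object Storage Data Pipeline Setup Workflow
-
- ## Wizard: Collect the Required Information
-
- Before building an object storage pipeline, prefer an interactive question tool (such as `question`) to collect the following information and present option menus; if no such tool is available, list all the questions at once as plain text:
-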
- ```
- question({
-   questions: [
-     {
-       question: "Cloud platform?",
-       options: [
-         { label: "Alibaba Cloud OSS", description: "Supports both LIST_PURGE and EVENT_NOTIFICATION modes" },
-         { label: "AWS S3", description: "Supports both LIST_PURGE and EVENT_NOTIFICATION modes" },
-         { label: "Tencent Cloud COS", description: "Supports LIST_PURGE mode only" }
-       ]
-     },
-     {
-       question: "Import mode?",
-       options: [
-         { label: "Continuous ingestion (PIPE)", description: "New files trigger ingestion automatically; near real time" },
-         { label: "One-off batch import", description: "Run COPY INTO manually or on a schedule" }
-       ]
-     },
-     {
-       question: "File format?",
-       options: [
-         { label: "CSV", description: "Comma-separated text" },
-         { label: "JSON / JSONL", description: "JSON or newline-delimited JSON" },
-         { label: "Parquet", description: "Columnar storage format" },
-         { label: "ORC", description: "Columnar storage format" }
-       ]
-     }
-   ]
- })
- ```
-
- **If the user has already provided enough information, go straight to the workflow without showing the menu.**
-
- ---
-
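- ## When to Use
-
- - Continuously and automatically ingest data from object storage (Alibaba Cloud OSS / AWS S3 / Tencent Cloud COS) into Lakehouse (PIPE mode)
- - Import data from object storage into Lakehouse as a one-off batch (Volume + COPY/INSERT mode)
- - Load newly added files in micro-batches for near-real-time data sync
- - Choose between scan mode (LIST_PURGE) and message-notification mode (EVENT_NOTIFICATION)
- - Filter or transform data during import (WHERE conditions, specific files)
- - Keywords: OSS PIPE, S3 import, object storage pipeline, automatic file loading, PIPE continuous ingestion, COS data sync, batch import, Volume import
-
- ## Prerequisites
-
- - A ClickZetta Lakehouse account with privileges to create PIPEs, tables, storage connections, Volumes, and related objects
- - A reachable object storage bucket (Endpoint, plus AccessKey or Role ARN)
- - **Execution environment**: cz-cli installed and configured
-
- ## Execution Environment
-
- All SQL is executed through `cz-cli sql`:
-
- ```bash
- cz-cli --version               # confirm cz-cli is available
- cz-cli sql "SELECT 1" --sync   # verify the connection
- ```
-
- cz-cli is required. If it is unavailable, install and configure it by following the official documentation, then retry.
-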
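- ## Core Concepts
-
- ### Choosing an INGEST_MODE
-
- | Mode | Trigger | Use case | Cloud platform support | Authorization |
- |------|---------|----------|------------------------|---------------|
- | `LIST_PURGE` | Periodic directory scan | General purpose; source files are deleted after import | All cloud platforms | Access key or Role ARN |
- | `EVENT_NOTIFICATION` | Message service notification | Low latency; loading starts as soon as a file is uploaded | Alibaba Cloud OSS and AWS S3 only | Role ARN only |
-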
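The selection rules in the table can be written down as a small lookup. The following is a minimal sketch, not part of cz-cli: the function name `allowed_ingest_modes` and the short platform and authorization codes (`oss`, `s3`, `cos`, `key`, `role_arn`) are assumptions made for illustration, and inputs are not validated.

```shell
#!/bin/sh
# Hypothetical helper (not part of cz-cli): list the INGEST_MODE values the
# table allows for a given cloud platform and authorization method.
# Arguments: platform = oss | s3 | cos, auth = key | role_arn
allowed_ingest_modes() {
  platform=$1
  auth=$2
  # LIST_PURGE works on every platform with either authorization method.
  modes="LIST_PURGE"
  # EVENT_NOTIFICATION needs Alibaba Cloud OSS or AWS S3 plus a Role ARN.
  case "$platform:$auth" in
    oss:role_arn|s3:role_arn) modes="$modes EVENT_NOTIFICATION" ;;
  esac
  printf '%s\n' "$modes"
}

allowed_ingest_modes oss role_arn
allowed_ingest_modes cos key
```

A wizard could call this after the first question to hide the EVENT_NOTIFICATION option when it cannot apply.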
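- ### Key Limitations
-
- - Each PIPE needs its own dedicated Volume; Volumes cannot be shared
- - The COPY statement logic cannot be modified; drop and recreate the PIPE instead
- - The COPY statement in a PIPE does not support the `files` / `regexp` / `subdirectory` parameters
- - Strict ordering of loaded data is not guaranteed
- - `load_history` deduplication records are retained for 7 days
- - Modifying `COPY_JOB_HINT` overwrites all existing hints, so set every parameter in one statement
- - **Volume PIPEs do not support Kafka-specific parameters**: `BATCH_INTERVAL_IN_SECONDS`, `BATCH_SIZE_PER_KAFKA_PARTITION`, and `MAX_SKIP_BATCH_COUNT_ON_ERROR` apply only to Kafka PIPEs
- - **`COPY_JOB_HINT` must be valid JSON** with both keys and values in double quotes: `'{"IGNORE_TMP_FILE": "true"}'`; the `KEY=VALUE` form is not accepted
-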
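Because `COPY_JOB_HINT` rejects the `KEY=VALUE` form, it can help to generate the JSON rather than type it by hand. A minimal sketch follows; the helper name `copy_job_hint_json` is hypothetical, and it assumes keys and values contain no double quotes or backslashes.

```shell
#!/bin/sh
# Hypothetical helper: turn KEY=VALUE arguments into the JSON object that
# COPY_JOB_HINT expects, with every key and value double-quoted.
# Assumes keys and values contain no double quotes or backslashes.
copy_job_hint_json() {
  json=""
  for pair in "$@"; do
    key=${pair%%=*}      # text before the first '='
    value=${pair#*=}     # text after the first '='
    json="${json:+$json,}\"$key\":\"$value\""
  done
  printf '{%s}\n' "$json"
}

copy_job_hint_json IGNORE_TMP_FILE=true
```

Passing every hint in a single call mirrors the rule that a new `COPY_JOB_HINT` replaces all existing hints.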
- ### Recommended File Sizes
-
- - gzip-compressed files: about 50 MB
- - Uncompressed CSV / PARQUET files: 128 MB to 256 MB
-
- ## Workflow
-
- ### Mode A: LIST_PURGE Scan Mode (General Purpose)
-
- #### Step 1: Create a Storage Connection
-
- ```sql
- -- Run with: cz-cli sql "<SQL>" --sync
- -- Access key authentication (supported in LIST_PURGE mode)
- CREATE STORAGE CONNECTION IF NOT EXISTS my_oss_connection
-     TYPE OSS
-     access_id = '<your_access_key_id>'
-     access_key = '<your_access_key_secret>'
-     ENDPOINT = 'oss-cn-hangzhou.aliyuncs.com';
- ```
-
- > **Parameters**:
- > - `access_id`: the **AccessKey ID** shown in the Alibaba Cloud console
- > - `access_key`: the **AccessKey Secret** shown in the Alibaba Cloud console
- > - The uppercase forms `ACCESS_KEY_ID` / `ACCESS_KEY_SECRET` are also accepted
- > - ⚠️ `ACCESS_KEY` / `SECRET_KEY` raise an error (the `_ID` / `_SECRET` suffixes are missing)
- >
- > **Tip**: To use Role ARN authentication (required for EVENT_NOTIFICATION mode), see the Connection syntax under "Mode B" below.
-
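One way to keep credentials out of saved scripts is to render the statement from arguments that are filled from environment variables. This is only a sketch: the function name `storage_connection_sql` and the variable names `OSS_ACCESS_ID` / `OSS_ACCESS_KEY` are assumptions, and the helper does no quoting beyond assuming that no argument contains a single quote.

```shell
#!/bin/sh
# Hypothetical helper: render the key-based CREATE STORAGE CONNECTION
# statement shown above from four arguments.
# Assumes no argument contains a single quote.
storage_connection_sql() {
  name=$1 endpoint=$2 access_id=$3 access_key=$4
  cat <<EOF
CREATE STORAGE CONNECTION IF NOT EXISTS $name
    TYPE OSS
    access_id = '$access_id'
    access_key = '$access_key'
    ENDPOINT = '$endpoint';
EOF
}

# Print the statement with placeholder credentials for review.
storage_connection_sql my_oss_connection oss-cn-hangzhou.aliyuncs.com \
  '<your_access_key_id>' '<your_access_key_secret>'
```

With real values, the output could be passed to `cz-cli sql "<SQL>" --sync`, for example by substituting `"$OSS_ACCESS_ID"` and `"$OSS_ACCESS_KEY"` for the placeholders.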
- #### Step 2: Create an External Volume
-
- ```sql
- -- Run with: cz-cli sql "<SQL>" --sync
- CREATE EXTERNAL VOLUME IF NOT EXISTS pipe_volume
-     LOCATION 'oss://my-bucket/data-path/'
-     USING CONNECTION my_oss_connection
-     DIRECTORY = (enable = true, auto_refresh = true)
-     RECURSIVE = true
-     COMMENT 'Volume for OSS PIPE ingestion';
- ```
-
- > **Key parameters**:
- > - `RECURSIVE = true`: scan subdirectories recursively
- > - `DIRECTORY = (enable = true, auto_refresh = true)`: refresh directory metadata automatically
- > - ⚠️ COMMENT takes no equals sign: `COMMENT 'text'` (not `COMMENT = 'text'`)
-
- #### Step 3: Verify That COPY INTO Runs on Its Own
-
- Before creating the PIPE, use COPY INTO to confirm that the data loads correctly:
-
- ```sql
- -- Run with: cz-cli sql "<SQL>" --sync
- COPY INTO my_schema.target_table
- FROM VOLUME pipe_volume
- USING CSV OPTIONS ('header' = 'true', 'delimiter' = ',') PURGE=true;
- ```
-
- > **Important**:
- > - The COPY statement in a PIPE does not support the `files`, `regexp`, or `subdirectory` parameters, so do not use them in this verification either.
- > - OPTIONS goes **before** PURGE=true: `USING CSV OPTIONS (...) PURGE=true`
-
- #### Step 4: Create the PIPE (LIST_PURGE Mode)
-
- ```sql
- -- Run with: cz-cli sql "<SQL>" --sync
- CREATE PIPE IF NOT EXISTS my_oss_pipe
-     INGEST_MODE = 'LIST_PURGE'
-     VIRTUAL_CLUSTER = 'my_vc'
-     COMMENT 'OSS data pipeline - scan mode'
- AS
- COPY INTO my_schema.target_table
- FROM VOLUME pipe_volume
- USING CSV OPTIONS ('header' = 'true') PURGE=true;
- ```
-
- > **⚠️ Syntax essentials**:
- > - `PURGE=true` goes last: `USING <format> [OPTIONS (...)] PURGE=true`
- > - OPTIONS, when present, goes **before** PURGE=true
- > - OPTIONS can be omitted: `USING CSV PURGE=true` (the recommended concise form)
- > - COMMENT takes no equals sign: `COMMENT 'text'`
- > - Write uppercase `PURGE` and lowercase `true`, joined by `=` with no spaces
- > - **LIST_PURGE mode requires** `PURGE=true`: source files are deleted after a successful load (to avoid duplicate imports)
- > - Even if you would rather keep the source files, LIST_PURGE mode still needs this parameter; otherwise the same file is imported repeatedly
- > - `VIRTUAL_CLUSTER`: the virtual cluster that runs the PIPE jobs
- >
- > **Incorrect forms** (these raise syntax errors):
- > ```sql
- > -- ❌ Do not put purge inside OPTIONS
- > OPTIONS ('header' = 'true', 'purge' = 'true')
- > -- ❌ OPTIONS cannot come after PURGE
- > USING CSV PURGE=true OPTIONS ('header' = 'true')
- > -- ❌ Do not use lowercase or quotes
- > 'purge'='true'
- > ```
-
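Since the clause order is the usual source of syntax errors here, a small generator that always emits `USING <format> [OPTIONS (...)] PURGE=true` in that order can be useful. The sketch below is illustrative only: the function name `copy_into_purge_sql` is hypothetical and the arguments are inserted into the statement unchecked.

```shell
#!/bin/sh
# Hypothetical helper: assemble a COPY INTO statement with the clause order
# described above: USING <format> [OPTIONS (...)] PURGE=true.
# Arguments: table, volume, format, optional OPTIONS body (without parentheses).
copy_into_purge_sql() {
  table=$1 volume=$2 format=$3 options=${4:-}
  using="USING $format"
  if [ -n "$options" ]; then
    # OPTIONS is appended before PURGE=true, never after it.
    using="$using OPTIONS ($options)"
  fi
  printf 'COPY INTO %s\nFROM VOLUME %s\n%s PURGE=true;\n' "$table" "$volume" "$using"
}

copy_into_purge_sql my_schema.target_table pipe_volume CSV "'header' = 'true'"
```

Called without the fourth argument, the helper produces the concise `USING CSV PURGE=true` form.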
- #### Step 5: Verify the PIPE Status
-
- ```sql
- -- Run with: cz-cli sql "<SQL>" --sync
- DESC PIPE EXTENDED my_oss_pipe;
- ```
-
- Confirm that `pipe_execution_paused = false` (the PIPE is started and running).
-
- ---
-
- ### Mode B: EVENT_NOTIFICATION Message-Notification Mode (Low Latency)
-
- > Supported only on Alibaba Cloud OSS and AWS S3. After a file is uploaded to the bucket, a message service (MNS/SQS) notifies Lakehouse to load it immediately.
-
- #### Preparation (Alibaba Cloud OSS Example)
-
- 1. **Enable Alibaba Cloud MNS**: enable Message Service (MNS) in the Alibaba Cloud console
- 2. **Configure OSS event notifications**: in the OSS bucket, go to Event Notification → Create Rule, select the `ObjectCreated` event type, and set the target to the MNS queue
- 3. **Grant OSS read access**: create a RAM role, grant it the `oss:GetObject` and `oss:ListBucket` permissions, and record the Role ARN
- 4. **Authorize Lakehouse on MNS**: add the Lakehouse service account to the authorization policy of the MNS queue
-
- #### Step 1: Create a Storage Connection (Role ARN)
-
- ```sql
- -- Run with: cz-cli sql "<SQL>" --sync
- CREATE STORAGE CONNECTION IF NOT EXISTS my_oss_role_connection
-     TYPE OSS
-     ENDPOINT = 'oss-cn-hangzhou.aliyuncs.com'
-     ROLE_ARN = 'acs:ram::1234567890:role/clickzetta-oss-role'
-     REGION = 'cn-hangzhou';
- ```
-
- #### Step 2: Create an External Volume
-
- ```sql
- -- Run with: cz-cli sql "<SQL>" --sync
- CREATE EXTERNAL VOLUME IF NOT EXISTS pipe_event_volume
-     LOCATION 'oss://my-bucket/data-path/'
-     USING CONNECTION my_oss_role_connection
-     DIRECTORY = (enable = true, auto_refresh = true)
-     RECURSIVE = true;
- ```
-
- #### Step 3: Create the PIPE (EVENT_NOTIFICATION Mode)
-
- ```sql
- -- Run with: cz-cli sql "<SQL>" --sync
- CREATE PIPE IF NOT EXISTS my_oss_event_pipe
-     INGEST_MODE = 'EVENT_NOTIFICATION'
-     VIRTUAL_CLUSTER = 'my_vc'
-     ALICLOUD_MNS_QUEUE = 'my-mns-queue-name'
-     COMMENT 'OSS data pipeline - event notification mode'
- AS
- COPY INTO my_schema.target_table
- FROM VOLUME pipe_event_volume
- USING CSV;
- ```
-
- > **Parameters**:
- > - `INGEST_MODE = 'EVENT_NOTIFICATION'`: loading is triggered by message notifications
- > - `ALICLOUD_MNS_QUEUE`: the Alibaba Cloud MNS queue name (AWS uses `AWS_SQS_QUEUE`)
- > - `PURGE=true` is not needed in this mode, because loading is event-driven rather than scan-based
- > - COMMENT takes no equals sign: `COMMENT 'text'`
-
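The queue parameter differs by cloud platform, so a script that generates the PIPE statement needs to pick the right name. A minimal sketch, assuming the short platform codes `oss`, `s3`, and `cos`; the function name `event_queue_param` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: map a cloud platform to the PIPE queue parameter used in
# EVENT_NOTIFICATION mode. Any other platform is reported as unsupported.
event_queue_param() {
  case "$1" in
    oss) echo "ALICLOUD_MNS_QUEUE" ;;
    s3)  echo "AWS_SQS_QUEUE" ;;
    *)   echo "EVENT_NOTIFICATION is not supported for: $1" >&2
         return 1 ;;
  esac
}

# Render the queue line of a CREATE PIPE statement for Alibaba Cloud OSS.
printf "    %s = '%s'\n" "$(event_queue_param oss)" "my-mns-queue-name"
```

Returning a non-zero status for Tencent Cloud COS lets a calling script fall back to LIST_PURGE mode.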
- ---
-
- ### Mode C: Batch Import (One-Off Volume + COPY/INSERT)
-
- > For loading files from object storage in a one-off or periodic batch, with no PIPE required. Supports Alibaba Cloud OSS, Tencent Cloud COS, and AWS S3.
- > A GENERAL PURPOSE virtual cluster is recommended for batch loads.
-
- #### Limitations
-
- - Cross-cloud import is not supported (the source storage and the Lakehouse environment must be on the same cloud platform)
- - Within the same region, use the internal Endpoint (such as `oss-cn-shanghai-internal.aliyuncs.com`) for better speed and stability
-
- #### Step 1: Create the Target Table
-
- ```sql
- -- Run with: cz-cli sql "<SQL>" --sync
- CREATE TABLE IF NOT EXISTS my_schema.target_table (
-     id STRING,
-     name STRING,
-     amount DECIMAL(10,2),
-     created_date STRING
- );
- ```
-
- #### Step 2: Create a Storage Connection (access_id/access_key Syntax)
-
- ```sql
- -- Run with: cz-cli sql "<SQL>" --sync
- CREATE STORAGE CONNECTION IF NOT EXISTS my_batch_conn
-     TYPE OSS
-     ENDPOINT = 'oss-cn-shanghai-internal.aliyuncs.com'
-     access_id = '<your_access_key_id>'
-     access_key = '<your_access_key_secret>';
- ```
-
- > **Connection parameter naming**:
- > - Lowercase form: `access_id` / `access_key` (recommended)
- > - Uppercase form: `ACCESS_KEY_ID` / `ACCESS_KEY_SECRET` (also accepted)
- > - ⚠️ `ACCESS_KEY` / `SECRET_KEY` raise an error (missing suffixes)
-
- #### Step 3: Create an External Volume (with Directory Auto-Refresh)
-
- ```sql
- -- Run with: cz-cli sql "<SQL>" --sync
- CREATE EXTERNAL VOLUME IF NOT EXISTS my_batch_volume
-     LOCATION 'oss://my-bucket/data-path/'
-     USING CONNECTION my_batch_conn
-     DIRECTORY = (enable=true, auto_refresh=true);
- ```
-
- > **Key parameters**:
- > - `LOCATION`: the object storage path, in the form `oss://bucket/path/`
- > - `USING CONNECTION`: references the storage connection created earlier
- > - `DIRECTORY = (enable=true, auto_refresh=true)`: enables directory metadata with automatic refresh, which makes it easy to list the files in the Volume
- >
- > **Volume creation syntax**:
- > - ✅ Recommended: `LOCATION '...' USING CONNECTION conn_name` (the standard form in the official documentation)
- > - ⚠️ Legacy: `STORAGE_CONNECTION = conn_name LOCATION = '...'` (appears in some older documentation and still works)
- > - The two forms are functionally equivalent; standardize on `LOCATION ... USING CONNECTION`
-
- #### Step 4a: INSERT INTO from a Volume (Supports Filtering and Transformation)
-
- ```sql
- -- Run with: cz-cli sql "<SQL>" --sync
- INSERT INTO my_schema.target_table
- SELECT * FROM VOLUME my_batch_volume (
-     id STRING,
-     name STRING,
-     amount DECIMAL(10,2),
-     created_date STRING
- ) USING CSV OPTIONS ('header'='true', 'sep'=',')
- FILES ('data_file_01.csv')
- WHERE amount > 0;
- ```
-
- > **Parameters**:
- > - `VOLUME my_batch_volume (...)`: the Volume and its column definitions (schema-on-read)
- > - `USING CSV OPTIONS (...)`: the file format and parsing options
- > - `FILES ('file1.csv', 'file2.csv')`: the file names to load (optional; all files are loaded when omitted)
- > - `WHERE ...`: filters or transforms the data (optional)
- > - INSERT INTO supports the `FILES` and `WHERE` parameters, which suits cases that need fine-grained control
-
- #### Step 4b: COPY INTO from a Volume (Concise Syntax)
-
- ```sql
- -- Run with: cz-cli sql "<SQL>" --sync
- COPY INTO my_schema.target_table
- FROM VOLUME my_batch_volume (
-     id STRING,
-     name STRING,
-     amount DECIMAL(10,2),
-     created_date STRING
- ) USING CSV OPTIONS ('header'='true', 'sep'=',');
- ```
-
- > **Choosing between INSERT INTO and COPY INTO**:
- > - `INSERT INTO`: supports `FILES()` to select files and `WHERE` to filter or transform; suited to fine-grained control
- > - `COPY INTO`: more concise syntax; suited to full loads
- > - Both support schema-on-read (column definitions in FROM VOLUME)
- > - ⚠️ **load_history difference**: only `COPY INTO` writes to `load_history`; `INSERT INTO ... FROM VOLUME` does not. Use `COPY INTO` when you need deduplication protection
-
- #### Step 5: Verify the Import Result
-
- ```sql
- -- Run with: cz-cli sql "<SQL>" --sync
- SELECT COUNT(*) AS total_rows FROM my_schema.target_table;
- SELECT * FROM my_schema.target_table LIMIT 10;
- ```
-
- ---
-
- ## Monitoring and Operations
-
- ### View Detailed PIPE Status
-
- ```sql
- -- Run with: cz-cli sql "<SQL>" --sync
- DESC PIPE EXTENDED my_oss_pipe;
- ```
-
- Key fields:
- - `pipe_execution_paused`: whether the PIPE is paused
- - `ingest_mode`: the ingestion mode
- - `virtual_cluster`: the cluster that runs the jobs
- - `definition`: the COPY statement definition
-
- ### View Load History
-
- ```sql
- -- Run with: cz-cli sql "<SQL>" --sync
- SELECT * FROM load_history('my_schema.target_table')
- ORDER BY last_load_time DESC
- LIMIT 20;
- ```
-
- > `load_history` deduplication records are retained for 7 days.
-
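The 7-day retention can be checked with plain arithmetic on timestamps. The sketch below is an illustration under assumptions: the function name `dedup_record_retained` is hypothetical, both arguments are epoch seconds, and whether the boundary at exactly 7 days counts as retained is a guess rather than documented behavior.

```shell
#!/bin/sh
# Retention window for load_history deduplication records: 7 days in seconds.
LOAD_HISTORY_RETENTION_SECONDS=$((7 * 24 * 60 * 60))

# Hypothetical check: succeed when a file loaded at epoch second $1 still has
# a deduplication record at epoch second $2.
# Treating exactly 7 days as "retained" is an assumption.
dedup_record_retained() {
  loaded_at=$1
  now=$2
  [ $((now - loaded_at)) -le "$LOAD_HISTORY_RETENTION_SECONDS" ]
}

# A file loaded 8 days ago falls outside the window.
if dedup_record_retained 0 $((8 * 86400)); then
  echo "record retained: a same-named file is skipped"
else
  echo "record expired: a same-named file may be loaded again"
fi
```

In practice the load time would come from the `last_load_time` column returned by `load_history`.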
- ### Filter PIPE Jobs by query_tag
-
- Jobs run by a PIPE are tagged automatically with a `query_tag` in the form `pipe.<workspace_name>.<schema_name>.<pipe_name>`
-
- ```sql
- -- Run with: cz-cli sql "<SQL>" --sync
- -- Filter PIPE-related jobs in the JOBS list
- SHOW JOBS WHERE query_tag = 'pipe.my_workspace.my_schema.my_oss_pipe';
- ```
-
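The tag is a plain dot-joined string, so it can be assembled from the three names before filtering. A small sketch; the function name `pipe_query_tag` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: build the query_tag carried by PIPE jobs,
# pipe.<workspace_name>.<schema_name>.<pipe_name>.
pipe_query_tag() {
  printf 'pipe.%s.%s.%s\n' "$1" "$2" "$3"
}

tag=$(pipe_query_tag my_workspace my_schema my_oss_pipe)
# Print the SHOW JOBS filter for that tag.
printf "SHOW JOBS WHERE query_tag = '%s';\n" "$tag"
```

The printed statement matches the example above and can be passed to `cz-cli sql "<SQL>" --sync`.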
- ---
-
- ## PIPE Management
-
- ### Pause / Resume a PIPE
-
- ```sql
- -- Pause the PIPE
- ALTER PIPE my_oss_pipe SET PIPE_EXECUTION_PAUSED = true;
-
- -- Resume the PIPE
- ALTER PIPE my_oss_pipe SET PIPE_EXECUTION_PAUSED = false;
- ```
-
- ### Modify PIPE Properties
-
- ```sql
- -- Change the virtual cluster
- ALTER PIPE my_oss_pipe SET VIRTUAL_CLUSTER = 'new_vc';
-
- -- Change COPY_JOB_HINT (note: this overwrites all existing hints, so set every parameter at once)
- -- Must be valid JSON with both keys and values in double quotes
- ALTER PIPE my_oss_pipe SET COPY_JOB_HINT = '{"max_file_count":"100","force":"false"}';
- ```
-
- > **Limitation**: each ALTER PIPE can modify only one property. The COPY statement logic cannot be modified; drop and recreate the PIPE instead.
-
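Because each ALTER PIPE changes a single property, a script that updates several properties has to emit one statement per property. A minimal sketch, with a hypothetical function name `alter_pipe_sql` and assignments passed through unchecked:

```shell
#!/bin/sh
# Hypothetical helper: emit one ALTER PIPE statement per property assignment,
# since each ALTER PIPE may change only one property.
# Usage: alter_pipe_sql <pipe> "PROPERTY = value" ["PROPERTY = value" ...]
alter_pipe_sql() {
  pipe=$1
  shift
  for assignment in "$@"; do
    printf 'ALTER PIPE %s SET %s;\n' "$pipe" "$assignment"
  done
}

alter_pipe_sql my_oss_pipe \
  "VIRTUAL_CLUSTER = 'new_vc'" \
  "COPY_JOB_HINT = '{\"max_file_count\":\"100\",\"force\":\"false\"}'"
```

Each printed line is a separate statement; the full hint set still goes into a single `COPY_JOB_HINT` value because a new value replaces the old one.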
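- ### Drop a PIPE
-
- ```sql
- DROP PIPE IF EXISTS my_oss_pipe;
- ```
-
- ---
-
- ## Troubleshooting
-
- | Problem | What to check |
- |---------|---------------|
- | No data is loaded after the PIPE is created | 1. Run `DESC PIPE EXTENDED` to check whether it is paused 2. Confirm there are new files under the Volume path 3. Check that COPY INTO runs on its own |
- | Files are not deleted in LIST_PURGE mode | Confirm that `PURGE=true` is set (after `USING <format> [OPTIONS (...)]`); check that the Connection's AccessKey has delete permission |
- | `PURGE=true` syntax error | OPTIONS must come before PURGE: `USING CSV OPTIONS (...) PURGE=true`. Do not write `USING CSV PURGE=true OPTIONS(...)` |
- | EVENT_NOTIFICATION mode does not trigger | 1. Check whether the MNS/SQS queue receives messages 2. Confirm the OSS event notification rule is configured correctly 3. Check the Role ARN authorization |
- | Data is loaded more than once | `load_history` deduplication records are kept for only 7 days; a same-named file older than 7 days is loaded again |
- | Some parameters are lost after changing COPY_JOB_HINT | `SET COPY_JOB_HINT` overwrites all existing hints; set every parameter in a single ALTER |
- | No load_history record after INSERT INTO FROM VOLUME | Expected behavior: only `COPY INTO` writes to load_history; `INSERT INTO` does not |
- | COPY INTO reports a format error | The Volume contains files in several formats; use `FILES('xxx.json')` to select files |
-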
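- ## Notes
-
- ### PIPE Continuous Ingestion (Modes A / B)
-
- - Each PIPE needs its own dedicated Volume; multiple PIPEs cannot share one Volume
- - The COPY statement in a PIPE does not support the `files` / `regexp` / `subdirectory` parameters
- - Strict ordering of loaded data is not guaranteed (multiple files are loaded in parallel)
- - Recommended file sizes: about 50 MB for gzip-compressed files, 128 MB to 256 MB for uncompressed CSV/Parquet
- - `load_history` deduplication records are retained for 7 days; after that, a same-named file may be loaded again
- - Changing the COPY logic requires dropping and recreating the PIPE; ALTER PIPE cannot modify the COPY statement
-
- ### Batch Import (Mode C)
-
- - Volumes support Alibaba Cloud OSS, Tencent Cloud COS, and AWS S3
- - Cross-cloud import is not supported (the source storage and the Lakehouse environment must be on the same cloud platform)
- - Within the same region, use the internal Endpoint for better transfer speed and stability
- - A GENERAL PURPOSE virtual cluster is recommended for batch load jobs
- - INSERT INTO supports the `FILES()` and `WHERE` parameters; COPY INTO does not
- - For Connection parameters, use `access_id`/`access_key` (lowercase) or `ACCESS_KEY_ID`/`ACCESS_KEY_SECRET` (uppercase); do not use `ACCESS_KEY`/`SECRET_KEY`
- - ⚠️ `INSERT INTO ... FROM VOLUME` does not write to `load_history`; only `COPY INTO` does
- - ⚠️ When a Volume holds files in several formats, a COPY INTO without `FILES()` tries to read every file and may fail on a format mismatch. Use `FILES('xxx.json')` to select files or `SUBDIRECTORY` to select a subdirectory
- - After uploading files to OSS, you may need to run `ALTER VOLUME name REFRESH` to refresh the directory metadata before `SHOW VOLUME DIRECTORY` reflects them
-
- ---
-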
481
- ## cz-cli 执行路径
482
-
483
- ### 模式 A:LIST_PURGE 扫描模式(cz-cli 版)
484
-
- ```bash
- # Step 1: create the storage connection
- cz-cli agent run "Create an OSS Storage Connection named <my_oss_connection>, endpoint <oss-cn-hangzhou.aliyuncs.com>, access_id <id>, access_key <key>" \
-   --format a2a --dangerously-skip-permissions
-
- # Step 2: create the external Volume
- cz-cli agent run "Create an external Volume named <pipe_volume> using Connection <my_oss_connection>, path oss://<bucket>/<data-path>/" \
-   --format a2a --dangerously-skip-permissions
-
- # Step 3: verify that COPY INTO runs on its own
- cz-cli agent run "Use COPY INTO to load data from Volume <pipe_volume> into table <schema>.<table>, file format CSV with a header, and verify that the data loads correctly" \
-   --format a2a --dangerously-skip-permissions
-
- # Step 4: create the LIST_PURGE mode PIPE
- cz-cli agent run "Create PIPE <my_oss_pipe> with INGEST_MODE LIST_PURGE, using VCluster <my_vc>, to continuously ingest data from Volume <pipe_volume> in CSV format (with header, purge=true) into table <schema>.<table>" \
-   --format a2a --dangerously-skip-permissions
-
- # Step 5: verify the PIPE status
- cz-cli agent run "Show the detailed status of PIPE <my_oss_pipe> and confirm that pipe_execution_paused is false" \
-   --format a2a --dangerously-skip-permissions
- ```
-
- ---
-
- ### Mode B: EVENT_NOTIFICATION message notification mode (cz-cli version)
-
- ```bash
- # Step 1: create a storage connection that uses a Role ARN
- cz-cli agent run "Create an OSS Storage Connection named <my_oss_role_connection>, endpoint <oss-cn-hangzhou.aliyuncs.com>, using Role ARN <acs:ram::xxx:role/clickzetta-oss-role>, region cn-hangzhou" \
-   --format a2a --dangerously-skip-permissions
-
- # Step 2: create the external Volume
- cz-cli agent run "Create an external Volume named <pipe_event_volume> using Connection <my_oss_role_connection>, path oss://<bucket>/<data-path>/" \
-   --format a2a --dangerously-skip-permissions
-
- # Step 3: create the EVENT_NOTIFICATION mode PIPE
- cz-cli agent run "Create PIPE <my_oss_event_pipe> with INGEST_MODE EVENT_NOTIFICATION, using VCluster <my_vc>, ALICLOUD_MNS_QUEUE <my-mns-queue-name>, to continuously ingest data from Volume <pipe_event_volume> in CSV format into table <schema>.<table>" \
-   --format a2a --dangerously-skip-permissions
- ```
-
- ---
-
- ### Mode C: batch import (cz-cli version)
-
- ```bash
- # Step 1: create the target table
- cz-cli agent run "Create table <target_table> in schema <my_schema> with columns: id STRING, name STRING, amount DECIMAL(10,2), created_date STRING" \
-   --format a2a --dangerously-skip-permissions
-
- # Steps 2-3: create the storage connection and the Volume
- cz-cli agent run "Create OSS Storage Connection <my_batch_conn>, endpoint <oss-cn-shanghai-internal.aliyuncs.com>, access_id <id>, access_key <key>; then create external Volume <my_batch_volume>, path oss://<bucket>/<data-path>/, with automatic directory refresh enabled" \
-   --format a2a --dangerously-skip-permissions
-
- # Step 4: import data from the Volume
- cz-cli agent run "Import data from Volume <my_batch_volume> in CSV format (with header) into table <my_schema>.<target_table>" \
-   --format a2a --dangerously-skip-permissions
-
- # Step 5: verify the import
- cz-cli agent run "Query the total row count and the first 10 rows of table <my_schema>.<target_table> to verify the import" \
-   --format a2a --dangerously-skip-permissions
- ```
-
- ---
-
- ### Monitoring and operations (cz-cli version)
-
- ```bash
- # Check PIPE status
- cz-cli agent run "Show the detailed status and load history of PIPE <my_oss_pipe>" \
-   --format a2a --dangerously-skip-permissions
-
- # Pause / resume the PIPE
- cz-cli agent run "Pause PIPE <my_oss_pipe>" \
-   --format a2a --dangerously-skip-permissions
-
- cz-cli agent run "Resume PIPE <my_oss_pipe>" \
-   --format a2a --dangerously-skip-permissions
- ```
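Every command in the cz-cli sections has the same shape: `cz-cli agent run "<prompt>" --format a2a --dangerously-skip-permissions`. A script that drives several steps could build the argument list once, as in this Python sketch. The helper name `build_agent_command` is hypothetical, and only the subcommand and flags that appear in the examples above are used.

```python
import shlex


def build_agent_command(prompt: str) -> list:
    """Build the argv for one `cz-cli agent run` call.

    Hypothetical helper; it uses only the subcommand and flags shown in
    the examples above and passes the prompt as a single argument.
    """
    return [
        "cz-cli", "agent", "run", prompt,
        "--format", "a2a",
        "--dangerously-skip-permissions",
    ]


if __name__ == "__main__":
    cmd = build_agent_command("Pause PIPE <my_oss_pipe>")
    # Print a shell-quoted form for inspection before running anything.
    print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` avoids shell quoting problems with prompts that contain angle brackets or quotes.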
@@ -1,5 +0,0 @@
- {"case_id":"001","type":"should_call","user_input":"How do I continuously and automatically import data from Alibaba Cloud OSS into Lakehouse?","expected_skill":"clickzetta-oss-ingest-pipeline","expected_output_contains":["PIPE","LIST_PURGE"]}
- {"case_id":"002","type":"should_call","user_input":"What is the difference between the LIST_PURGE and EVENT_NOTIFICATION modes of an OSS PIPE?","expected_skill":"clickzetta-oss-ingest-pipeline","expected_output_contains":["LIST_PURGE","EVENT_NOTIFICATION"]}
- {"case_id":"003","type":"should_call","user_input":"How do I batch import Parquet files from S3 into Lakehouse?","expected_skill":"clickzetta-oss-ingest-pipeline","expected_output_contains":["Volume","COPY INTO"]}
- {"case_id":"004","type":"should_call","user_input":"What are the prerequisite steps for continuous OSS ingestion? Which objects need to be created first?","expected_skill":"clickzetta-oss-ingest-pipeline","expected_output_contains":["CREATE STORAGE CONNECTION","External Volume"]}
- {"case_id":"005","type":"should_call","user_input":"How do I import data from Tencent Cloud COS into ClickZetta?","expected_skill":"clickzetta-oss-ingest-pipeline","expected_output_contains":["COS","PIPE"]}
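The eval cases above share one schema (`case_id`, `type`, `user_input`, `expected_skill`, `expected_output_contains`). The Python sketch below shows one way such a case could be scored. The evaluation harness itself is not part of this diff, so the function name `case_passes` and the pass rule (correct skill routed and every expected substring present in the output) are assumptions.

```python
import json


def case_passes(case_line: str, routed_skill: str, output: str) -> bool:
    """Score one JSONL eval case against a routed skill and its output.

    Assumed semantics: a "should_call" case passes when the expected skill
    was selected and every expected substring appears in the output.
    """
    case = json.loads(case_line)
    if case["type"] != "should_call":
        # Only "should_call" cases appear in the file above.
        raise ValueError(f"unsupported case type: {case['type']}")
    return routed_skill == case["expected_skill"] and all(
        fragment in output for fragment in case["expected_output_contains"]
    )
```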
@@ -1,16 +0,0 @@
- ClickZetta Skills License
- © 2026 Yunqi Inc. All rights reserved.
- LICENSE: Use of these materials (including all code, prompts, assets, files, and other components of these skills (collectively, "Skills")) is governed by your agreement with ClickZetta for the Service. If no separate agreement exists, use is governed by ClickZetta's Terms of Service (available at: https://yunqi.tech/documents/user-aggrement).
- Your applicable agreement is referred to as the "Agreement." "Service" is as defined in the Agreement.
- ADDITIONAL RESTRICTIONS: Notwithstanding anything in the Agreement to the contrary, you may not:
-
- Extract from the Service or retain copies of the Skills outside use with the Service;
- Reproduce or copy the Skills, except for temporary copies created automatically during authorized use of the Service;
- Create derivative works based on the Skills;
- Distribute, sublicense, or transfer the Skills to any third party;
- Make, offer to sell, sell, or import any inventions embodied in the Skills; nor,
- Reverse engineer, decompile, or disassemble the Skills.
-
- The receipt, viewing, or possession of the Skills does not convey or imply any license or right beyond those expressly granted above.
- Yunqi retains all rights, title, and interest in the Skills, including all copyrights, trademarks, patents, and all other applicable intellectual property rights.
- THE SKILLS ARE PROVIDED "AS IS," WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SKILLS OR THE USE OR OTHER DEALINGS IN THE SKILLS.
@@ -1,102 +0,0 @@
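- ---
- name: clickzetta-overview
- description: |
-   ClickZetta Lakehouse product overview: core concepts, object model, architecture design, and an introduction to Studio features.
-   Covers: the account / instance / workspace / Schema object hierarchy, how Workspace maps to Database/Catalog,
-   the three VCluster types and CRU billing, the Dynamic Table incremental refresh mechanism, Table Stream CDC,
-   the three-tier cache system, Pipe continuous ingestion, Synonym cross-Schema aliases, the permission system (RBAC/ACL),
-   key differences from Snowflake/Databricks, the compute-storage separation architecture,
-   brand relationships (ClickZetta = 云器 (Yunqi) = Singdata) and the service endpoints for each environment,
-   and the six Studio modules (data development IDE, task scheduling, data integration, data catalog, data quality, operations monitoring).
-   Triggered when the user says "what is a workspace", "how are Schema and Database related", "what is a Catalog",
-   "what is a VCluster", "what is a CRU", "difference between internal and external tables", "Lakehouse architecture",
-   "object hierarchy", "permission system", "concept comparison with Snowflake", "concept comparison with Databricks",
-   "compute-storage separation", "what is 云器 (Yunqi)", "what is Singdata", "how are ClickZetta and 云器 (Yunqi) related",
-   "what is Studio", "what features does Studio have", "how to use task scheduling", "how to use data integration",
-   "data catalog", "data quality", or "operations monitoring".
-   Not suitable for: specific SQL syntax (use sql-syntax-guide), specific metadata queries (use metadata),
-   specific data import operations (use the pipeline skill), specific permission operations (use access-control).
-   Keywords: concepts, architecture, workspace, schema, VCluster, Studio, overview, object model
- ---
-
- # ClickZetta Lakehouse Product Overview
-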
- ## Reference documents
-
- | Document | Contents |
- |------|------|
- | [references/object-model.md](references/object-model.md) | Object hierarchy, concept comparison, details of designs unique to ClickZetta |
- | [references/brands-and-endpoints.md](references/brands-and-endpoints.md) | Brand relationships, service endpoints for each environment |
- | [references/studio-modules.md](references/studio-modules.md) | Detailed features of the six Studio modules |
-
- ---
-
- ## Object hierarchy overview
-
- ```
- Account
- └── Service instance (Instance) ← resource isolation unit
-     └── Workspace ← ≈ Snowflake Database / Databricks Catalog
-         ├── Schema ← namespace, permission boundary
-         │   ├── Internal table / external table / view / dynamic table / materialized view
-         │   ├── Volume / Table Stream / Pipe / index / Synonym
-         │   └── Function / External Function
-         ├── Share / Connection / External Catalog
-         └── VCluster (compute cluster)
- ```
-
- ---
-
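- ## Core concepts quick reference
-
- | Concept | Description |
- |------|------|
- | CRU | Unified cross-cloud compute unit, billed by CRU × hours; a stopped cluster is not billed |
- | VCluster | Three types: general-purpose (GP), analytics (AP), and sync (INTEGRATION) |
- | Dynamic Table | Declarative incremental computation; the CBO adaptively chooses incremental or full refresh; minimum refresh interval is 1 minute |
- | Table Stream | CDC change-capture object; change_tracking must be enabled first |
- | Pipe | Continuous ingestion object (Kafka/OSS); each Pipe has its own dedicated Volume |
- | Synonym | Cross-Schema alias; no data copy needed |
- | Three-tier cache | Result cache + metadata cache + local disk cache (AP supports PRELOAD) |
-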
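The CRU row above describes billing as CRU × hours, with no charge while a cluster is stopped. As a worked illustration, the Python sketch below adds up CRU-hours over the periods a cluster was running. The function name `cru_hours` and the input shape are hypothetical, and no unit price is applied because the source does not state one.

```python
def cru_hours(running_periods):
    """Sum CRU x hours over the periods a cluster was running.

    `running_periods` is a list of (cru_size, hours_running) pairs
    (hypothetical input shape). Stopped periods are simply not listed,
    which reflects the rule that a stopped cluster is not billed.
    """
    return sum(cru * hours for cru, hours in running_periods)
```

For example, a 4-CRU cluster that runs for 2 hours and an 8-CRU cluster that runs for half an hour together consume 4 × 2 + 8 × 0.5 = 12 CRU-hours.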
- ---
-
- ## Key differences from Snowflake/Databricks
-
- | ClickZetta | Snowflake | Databricks | Difference |
- |---|---|---|---|
- | Workspace | Database | Catalog | One account can span multiple instances and clouds |
- | VCluster (3 types) | Warehouse | SQL Warehouse | GP/AP/INTEGRATION are separated |
- | Studio (built-in) | Requires third-party tools | Requires third-party tools | Built-in scheduling / integration / quality / catalog |
- | Dynamic Table (CBO) | Dynamic Table | Streaming Table | CBO-based, not streaming |
- | Synonym | N/A | N/A | Specific to ClickZetta |
-
- ---
-
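- ## The six Studio modules
-
- | Module | Core capabilities |
- |------|---------|
- | Data development | Web IDE supporting SQL/Python/Shell/JDBC/dynamic tables/sync tasks |
- | Task scheduling | Cron scheduling + DAG orchestration + task groups + backfill + parameter variables |
- | Data integration | No-code sync across 30+ data sources (batch/real-time/CDC) |
- | Data catalog | Global search, table details, data lineage, data preview |
- | Data quality | Rules across 6 dimensions (completeness/uniqueness/consistency/accuracy/validity/timeliness) |
- | Operations monitoring | Task instance operations + alert rules + Feishu/WeCom notifications |
-
- ---
-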
- ## Brand relationships
-
- ClickZetta (technology brand) = 云器 (Yunqi, China brand) = Singdata (international brand)
-
- See [references/brands-and-endpoints.md](references/brands-and-endpoints.md) for the service endpoints of each environment.
-
- ---
-
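- ## Storage architecture
-
- - Compute-storage separation: a stopped VCluster incurs no compute charges
- - Open format: internal tables are based on Apache Iceberg
- - Multi-cloud, multi-region: Alibaba Cloud / Tencent Cloud / AWS
- - Bring-your-own storage (BYOS): supports your own OSS/S3/COS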