@clickzetta/cz-cli-darwin-x64 0.3.19 → 0.3.20
- package/bin/cz-cli +0 -0
- package/bin/skills/clickzetta-lakehouse-connect/SKILL.md +218 -0
- package/bin/skills/clickzetta-lakehouse-connect/eval_cases.jsonl +3 -0
- package/bin/skills/clickzetta-lakehouse-connect/evals/evals.json +35 -0
- package/bin/skills/clickzetta-lakehouse-connect/references/config-file.md +435 -0
- package/bin/skills/clickzetta-lakehouse-connect/references/jdbc.md +478 -0
- package/bin/skills/clickzetta-lakehouse-connect/references/python-sdk.md +225 -0
- package/bin/skills/clickzetta-lakehouse-connect/references/sqlalchemy.md +468 -0
- package/bin/skills/clickzetta-lakehouse-connect/references/zettapark-session.md +445 -0
- package/bin/skills/clickzetta-manage-comments/SKILL.md +219 -0
- package/bin/skills/clickzetta-manage-comments/eval_cases.jsonl +3 -0
- package/bin/skills/clickzetta-metadata/SKILL.md +483 -0
- package/bin/skills/clickzetta-metadata/eval_cases.jsonl +5 -0
- package/bin/skills/clickzetta-metadata/references/instance-views-reference.md +276 -0
- package/bin/skills/clickzetta-metadata/references/metering-views-reference.md +137 -0
- package/bin/skills/clickzetta-metadata/references/show-desc-reference.md +326 -0
- package/bin/skills/clickzetta-metadata/references/views-reference.md +271 -0
- package/bin/skills/clickzetta-overview/SKILL.md +102 -0
- package/bin/skills/clickzetta-overview/eval_cases.jsonl +5 -0
- package/bin/skills/clickzetta-overview/references/brands-and-endpoints.md +79 -0
- package/bin/skills/clickzetta-overview/references/object-model.md +311 -0
- package/bin/skills/clickzetta-overview/references/studio-modules.md +173 -0
- package/package.json +1 -1
package/bin/cz-cli
CHANGED
|
Binary file
|
|
@@ -0,0 +1,218 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: clickzetta-lakehouse-connect
|
|
3
|
+
description: |
|
|
4
|
+
Guide for connecting to ClickZetta Lakehouse via SDK/JDBC. Covers Python SDK (clickzetta.connect), ZettaPark Session (DataFrame API), SQLAlchemy (ORM/BI tools), and JDBC (Java). Use this skill when user needs to configure a connection from external tools or code. Trigger for: "Python SDK 连接", "JDBC 连接", "SQLAlchemy 配置", "ZettaPark 怎么用", "连接报错", "clickzetta-connector-python", "clickzetta-sqlalchemy".
|
|
5
|
+
Keywords: connection, Python SDK, JDBC, SQLAlchemy, ZettaPark, driver, connect
|
|
6
|
+
---
|
|
7
|
+
|
|
8
|
+
# ClickZetta Lakehouse Connection Guide

## Instructions
|
|
11
|
+
|
|
12
|
+
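### Step 0: Load connection parameters automatically (preferred)

**Before asking the user, try to read the connection parameters from a local configuration file.**

Search for the configuration file in the following order and stop at the first one found:
1. `/app/.clickzetta/lakehouse_connection/connections.json`
2. `config/lakehouse_connection/connections.json`
3. `~/.clickzetta/connections.json`
4. `/app/.clickzetta/connections.json`

Once a configuration file is found:
- Parse the JSON and extract the `connections` array
- Match a connection to the region or environment the user describes (for example, "Alibaba Cloud Shanghai" matches a connection whose `service` contains `cn-shanghai-alicloud`)
- If a connection has `is_default: true` and the user did not specify a region, use the default connection
- **Do not print the password or the full configuration into the conversation**; use them internally only

If no configuration file exists or no connection matches, ask the user for: service, instance, workspace, username, password, schema, vcluster.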
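
The rule about never echoing the password can be enforced with a small masking helper. This is a minimal sketch, not part of the ClickZetta SDK: the name `redact_connection` and the choice to treat only `password` as sensitive are assumptions made for illustration.

```python
import copy

# Assumption: only the `password` field is treated as secret in this sketch.
SENSITIVE_KEYS = {"password"}


def redact_connection(conn_cfg: dict) -> dict:
    """Return a copy of a connection entry that is safe to show to the user."""
    safe = copy.deepcopy(conn_cfg)
    for key in SENSITIVE_KEYS:
        if key in safe:
            safe[key] = "***"
    return safe


example = {
    "connection_name": "dev",
    "service": "cn-shanghai-alicloud.api.clickzetta.com",
    "username": "my_user",
    "password": "my_password",
}
print(redact_connection(example))
```

The original dictionary is left untouched, so it can still be passed to the connection call internally.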
|
|
29
|
+
|
|
30
|
+
### Step 1: Choose a connection method

Pick the connection method that fits the user's scenario and read the corresponding reference file:

| User need | Reference file |
|:--|:--|
| Python scripts / automation / running SQL | [references/python-sdk.md](references/python-sdk.md) |
| DataFrame / data engineering | [references/zettapark-session.md](references/zettapark-session.md) |
| ORM / web applications / BI tools (Superset) | [references/sqlalchemy.md](references/sqlalchemy.md) |
| Java applications / BI tools (DBeaver) | [references/jdbc.md](references/jdbc.md) |
| Multi-environment configuration file management | [references/config-file.md](references/config-file.md) |

When unsure, follow this decision tree:
- DataFrame operations needed → ZettaPark Session
- ORM / SQLAlchemy integration needed → SQLAlchemy
- Java application → JDBC
- Any other Python scenario (including running SQL directly) → Python SDK
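
The decision tree above can be sketched as a function that checks the branches in the listed order. The function name and its boolean parameters are hypothetical; only the reference file paths come from the table.

```python
def choose_reference(needs_dataframe: bool = False,
                     needs_orm: bool = False,
                     is_java: bool = False) -> str:
    """Return the reference file to read, checking branches in the listed order."""
    if needs_dataframe:
        return "references/zettapark-session.md"
    if needs_orm:
        return "references/sqlalchemy.md"
    if is_java:
        return "references/jdbc.md"
    # Every other Python scenario, including running SQL directly
    return "references/python-sdk.md"


print(choose_reference(needs_orm=True))
```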
|
|
47
|
+
|
|
48
|
+
### Step 2: Confirm the service address

The `service` parameter must include a region prefix. Choose it according to the region where the instance is hosted:

**云器 Lakehouse (mainland China edition, `clickzetta.com`)**

| Cloud provider | Region | service address |
|:--|:--|:--|
| Alibaba Cloud | East China 2 (Shanghai) | `cn-shanghai-alicloud.api.clickzetta.com` |
| Tencent Cloud | East China (Shanghai) | `ap-shanghai-tencentcloud.api.clickzetta.com` |
| Tencent Cloud | North China (Beijing) | `ap-beijing-tencentcloud.api.clickzetta.com` |
| Tencent Cloud | South China (Guangzhou) | `ap-guangzhou-tencentcloud.api.clickzetta.com` |
| AWS | China (Beijing) | `cn-north-1-aws.api.clickzetta.com` |

**Singdata Lakehouse (international edition, `singdata.com`)**

| Cloud provider | Region | service address |
|:--|:--|:--|
| Alibaba Cloud | Asia Pacific Southeast 1 (Singapore) | `ap-southeast-1-alicloud.api.singdata.com` |
| AWS | Asia Pacific (Singapore) | `ap-southeast-1-aws.api.singdata.com` |

Console: `https://{instance}.{region}.app.clickzetta.com`
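
A quick check for the missing-region-prefix mistake might look like the sketch below. The lookup keys such as `("alicloud", "shanghai")` and both function names are illustrative assumptions; the addresses themselves are copied from the tables above.

```python
# Addresses copied from the region tables; the (provider, city) keys are
# a naming convention invented for this sketch.
SERVICE_BY_REGION = {
    ("alicloud", "shanghai"): "cn-shanghai-alicloud.api.clickzetta.com",
    ("tencentcloud", "shanghai"): "ap-shanghai-tencentcloud.api.clickzetta.com",
    ("tencentcloud", "beijing"): "ap-beijing-tencentcloud.api.clickzetta.com",
    ("tencentcloud", "guangzhou"): "ap-guangzhou-tencentcloud.api.clickzetta.com",
    ("aws", "beijing"): "cn-north-1-aws.api.clickzetta.com",
    ("alicloud", "singapore"): "ap-southeast-1-alicloud.api.singdata.com",
    ("aws", "singapore"): "ap-southeast-1-aws.api.singdata.com",
}

KNOWN_SUFFIXES = (".api.clickzetta.com", ".api.singdata.com")


def has_region_prefix(service: str) -> bool:
    """True when something precedes one of the known API suffixes."""
    return any(
        service.endswith(suffix) and len(service) > len(suffix)
        for suffix in KNOWN_SUFFIXES
    )


def service_for(provider: str, city: str) -> str:
    """Look up the service address for a provider and city pair."""
    return SERVICE_BY_REGION[(provider.lower(), city.lower())]


print(has_region_prefix("api.clickzetta.com"))
print(service_for("tencentcloud", "shanghai"))
```

A bare `api.clickzetta.com` fails the check, which is the situation the `Connection refused` row in the troubleshooting table describes.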
|
|
70
|
+
|
|
71
|
+
### Step 3: Run the query or provide runnable code

**If the user asks to run a query (such as SHOW SCHEMAS, SELECT, or SHOW TABLES):**

1. Confirm that `clickzetta-connector-python` is installed:
   ```bash
   pip3 show clickzetta-connector-python
   ```
   If it is not installed, run: `pip3 install clickzetta-connector-python --user`

2. Run the query directly with the connection parameters obtained in Step 0, then format the results and show them to the user.

**If the user asks for generated code:**

After reading the corresponding reference file, generate complete, runnable code from the parameters. Pass every parameter explicitly; when the user does not specify `vcluster`, use the default value `default_ap`.

When the password contains special characters (SQLAlchemy URI), remind the user to encode it with `urllib.parse.quote_plus()`.
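
As a short illustration of the encoding step, the sketch below percent-encodes a sample password with the standard library. Only the password component is shown; the full URI layout is described in the SQLAlchemy reference file and is not reproduced here.

```python
from urllib.parse import quote_plus, unquote_plus

raw_password = "P@ss#2024"  # sample value; substitute the real password
encoded_password = quote_plus(raw_password)

print(encoded_password)  # P%40ss%232024
# Decoding restores the original, so nothing is lost by encoding
print(unquote_plus(encoded_password) == raw_password)
```

Place the encoded value, not the raw one, in the password position of the URI.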
|
|
88
|
+
|
|
89
|
+
## Examples

### Example 0: Read the configuration automatically and run a query

```python
import json
import os

import clickzetta

# Look for the configuration file in priority order
config_paths = [
    "/app/.clickzetta/lakehouse_connection/connections.json",
    "config/lakehouse_connection/connections.json",
    os.path.expanduser("~/.clickzetta/connections.json"),
    "/app/.clickzetta/connections.json",
]
config = None
for path in config_paths:
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            config = json.load(f)
        break

# Fail early instead of hitting a TypeError on config["connections"]
if config is None or not config.get("connections"):
    raise FileNotFoundError("No usable connections.json found in the search paths")
connections = config["connections"]

# Pick the target connection (example: match Alibaba Cloud Shanghai),
# then fall back to the default connection, then to the first entry
conn_cfg = next(
    (c for c in connections if "cn-shanghai-alicloud" in c.get("service", "")),
    None,
) or next((c for c in connections if c.get("is_default")), connections[0])

conn = clickzetta.connect(
    service=conn_cfg["service"],
    instance=conn_cfg["instance"],
    workspace=conn_cfg["workspace"],
    schema=conn_cfg.get("schema", "public"),
    username=conn_cfg["username"],
    password=conn_cfg["password"],
    vcluster=conn_cfg.get("vcluster", "default_ap"),
)
cursor = conn.cursor()
cursor.execute("SHOW SCHEMAS")
for row in cursor.fetchall():
    print(row[0])
cursor.close()
conn.close()
```
|
|
132
|
+
|
|
133
|
+
### Example 1: Connect and query with the Python SDK

```python
import clickzetta

# Placeholder values; replace them with the real connection parameters
conn = clickzetta.connect(
    service="cn-shanghai-alicloud.api.clickzetta.com",
    instance="my_instance",
    workspace="my_workspace",
    schema="public",
    username="my_user",
    password="my_password",
    vcluster="default_ap",
)
cursor = conn.cursor()
cursor.execute("SELECT * FROM orders LIMIT 10")
for row in cursor.fetchall():
    print(row)
cursor.close()
conn.close()
```
|
|
154
|
+
|
|
155
|
+
### Example 2: Aggregate revenue by region with ZettaPark

```python
from clickzetta.zettapark.session import Session
from clickzetta.zettapark import functions as F

# Placeholder values; replace them with the real connection parameters
session = Session.builder.configs({
    "service": "cn-shanghai-alicloud.api.clickzetta.com",
    "instance": "my_instance", "workspace": "my_workspace",
    "schema": "public", "username": "my_user",
    "password": "my_password", "vcluster": "default_ap"
}).create()

# Read `sales`, sum revenue per region, and overwrite `sales_summary`
session.table("sales") \
    .group_by(F.col("region")) \
    .agg(F.sum("revenue").as_("total_revenue")) \
    .write.save_as_table("sales_summary", mode="overwrite")
session.close()
```
|
|
174
|
+
|
|
175
|
+
## Troubleshooting

| Error message | Cause | Solution |
|:--|:--|:--|
| `Connection refused` | The service address is wrong or the network is unreachable | Check that service matches the region (see the region tables in Step 2) |
| `Authentication failed` | Wrong username or password | Verify username and password |
| `Workspace not found` | The workspace name does not exist | Confirm the workspace spelling in the console |
| `Instance not found` | The instance name does not exist | Confirm the instance spelling in the console |
| `Timeout` | The query timed out | Increase `sdk.job.timeout` in `hints` (default 300 seconds) |
| `VCluster not available` | The virtual cluster is not running or the name is wrong | Confirm the vcluster name and check the cluster status |
| SQLAlchemy URL parsing error | The password contains special characters | URL-encode the password with `urllib.parse.quote_plus()` |
| `ClassNotFoundException` | The JDBC driver is not on the classpath | Make sure the `clickzetta-java` JAR has been added to the classpath |
|
|
187
|
+
|
|
188
|
+
## Installation

> ⚠️ **Python version requirement**: **Python 3.12** is recommended (3.10 minimum). Python 3.9 and earlier are not supported.

| Connection method | Install command |
|:--|:--|
| Python SDK | `pip install clickzetta-connector-python -i https://pypi.tuna.tsinghua.edu.cn/simple` |
| ZettaPark | `pip install clickzetta-zettapark-python -i https://pypi.tuna.tsinghua.edu.cn/simple` |
| SQLAlchemy | `pip install clickzetta-connector-python clickzetta-sqlalchemy -i https://pypi.tuna.tsinghua.edu.cn/simple` |
| JDBC | Maven: `com.clickzetta:clickzetta-java` |

```bash
# Option 1: venv (built into Python, recommended)
python3.12 -m venv .venv
source .venv/bin/activate # macOS/Linux
# .venv\Scripts\activate # Windows
pip install clickzetta-connector-python clickzetta-zettapark-python \
  -i https://pypi.tuna.tsinghua.edu.cn/simple

# Option 2: pyenv (when you need to switch Python versions)
pyenv install 3.12.9
pyenv local 3.12.9
python -m venv .venv && source .venv/bin/activate
pip install clickzetta-connector-python clickzetta-zettapark-python \
  -i https://pypi.tuna.tsinghua.edu.cn/simple

# Option 3: conda (data science environments)
conda create -n lakehouse python=3.12 -y && conda activate lakehouse
pip install clickzetta-connector-python clickzetta-zettapark-python \
  -i https://pypi.tuna.tsinghua.edu.cn/simple
```
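
Before installing, the interpreter version can be checked against the stated requirement. The following is a rough sketch; the function name and the three status labels are made up for illustration, and the thresholds come from the version note in this section.

```python
import sys

MIN_VERSION = (3, 10)          # lowest supported version per the note above
RECOMMENDED_VERSION = (3, 12)  # recommended version per the note above


def python_status(version=None) -> str:
    """Classify a (major, minor) pair as unsupported, supported or recommended."""
    major_minor = tuple(sys.version_info[:2]) if version is None else tuple(version[:2])
    if major_minor < MIN_VERSION:
        return "unsupported"
    if major_minor == RECOMMENDED_VERSION:
        return "recommended"
    return "supported"


print(python_status())
```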
|
|
@@ -0,0 +1,3 @@
|
|
|
1
|
+
{"case_id":"001","type":"should_call","user_input":"Python SDK 怎么连接 ClickZetta Lakehouse?连接参数有哪些?","expected_skill":"clickzetta-lakehouse-connect","expected_output_contains":["clickzetta","connect"]}
|
|
2
|
+
{"case_id":"002","type":"should_call","user_input":"JDBC 连接 ClickZetta 的 URL 格式是什么?","expected_skill":"clickzetta-lakehouse-connect","expected_output_contains":["jdbc","clickzetta"]}
|
|
3
|
+
{"case_id":"003","type":"should_call","user_input":"SQLAlchemy 怎么配置连接 ClickZetta?","expected_skill":"clickzetta-lakehouse-connect","expected_output_contains":["sqlalchemy","clickzetta"]}
|
|
@@ -0,0 +1,35 @@
|
|
|
1
|
+
{
|
|
2
|
+
"skill_name": "clickzetta-lakehouse-connect",
|
|
3
|
+
"evals": [
|
|
4
|
+
{
|
|
5
|
+
"id": 1,
|
|
6
|
+
"prompt": "我需要用 Python 连接 ClickZetta,实例名是 my_instance,工作空间是 analytics,region 是上海阿里云,用户名 alice,密码 secret123。帮我写一段查询 orders 表前 10 行的代码。",
|
|
7
|
+
"expected_output": "使用 clickzetta.connect() 或 clickzetta-connector-python,包含所有必填参数(service/instance/workspace/schema/username/password/vcluster),并演示 cursor.execute + fetchall 查询",
|
|
8
|
+
"files": []
|
|
9
|
+
},
|
|
10
|
+
{
|
|
11
|
+
"id": 2,
|
|
12
|
+
"prompt": "我想用 ZettaPark 做数据工程,需要读取 sales 表,按 region 分组求 revenue 总和,然后写回到 sales_summary 表。帮我写完整代码。",
|
|
13
|
+
"expected_output": "使用 Session.builder.configs().create(),展示 session.table() + group_by + agg + write.save_as_table(),包含连接参数配置",
|
|
14
|
+
"files": []
|
|
15
|
+
},
|
|
16
|
+
{
|
|
17
|
+
"id": 3,
|
|
18
|
+
"prompt": "我在用 Apache Superset 连接 ClickZetta,SQLAlchemy URI 应该怎么填?密码是 P@ss#2024,需要注意什么?",
|
|
19
|
+
"expected_output": "提供正确的 clickzetta:// URI 格式,指出密码特殊字符需要 quote_plus 编码,给出编码后的示例",
|
|
20
|
+
"files": []
|
|
21
|
+
},
|
|
22
|
+
{
|
|
23
|
+
"id": 4,
|
|
24
|
+
"prompt": "连接云器 Lakehouse 报错 Connection refused,我的 service 填的是 api.clickzetta.com,实例在上海腾讯云,怎么排查?",
|
|
25
|
+
"expected_output": "识别 service 地址填错,给出正确的上海腾讯云地址 ap-shanghai-tencentcloud.api.clickzetta.com,并提供排查步骤",
|
|
26
|
+
"files": []
|
|
27
|
+
},
|
|
28
|
+
{
|
|
29
|
+
"id": 5,
|
|
30
|
+
"prompt": "我有三个环境:dev/staging/prod,都在同一个 ClickZetta 实例上但不同 workspace。想用 connections.json 统一管理,并在代码里切换。怎么配置?",
|
|
31
|
+
"expected_output": "提供 connections.json 多连接配置示例(含 is_default),展示 switch_connection() 用法,说明文件放置路径",
|
|
32
|
+
"files": []
|
|
33
|
+
}
|
|
34
|
+
]
|
|
35
|
+
}
|
|
@@ -0,0 +1,435 @@
|
|
|
1
|
+
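# Configuration File Management: connections.json Reference

This document describes the connection configuration file for ClickZetta Lakehouse: its format, search paths, multi-connection management, and Docker deployment. Everything here is based on the actual code in the `mcp-clickzetta-server` project.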
|
|
4
|
+
|
|
5
|
+
## Contents

- [JSON configuration file format](#json-configuration-file-format)
- [Five-level search path](#five-level-search-path)
- [Managing multiple connections](#managing-multiple-connections)
- [Environment variable fallback](#environment-variable-fallback)
- [Docker / Kubernetes deployment](#docker--kubernetes-deployment)
- [system_config parameters](#system_config-parameters)
- [Use cases](#use-cases)
- [Troubleshooting](#troubleshooting)
- [Complete configuration template](#complete-configuration-template)
|
|
16
|
+
|
|
17
|
+
---
|
|
18
|
+
|
|
19
|
+
## JSON configuration file format

The `connections.json` file has two top-level keys: `connections` (an array of connections) and `system_config` (system-level settings).

### Basic structure

```json
{
  "connections": [
    {
      "is_default": true,
      "connection_name": "dev",
      "service": "cn-shanghai-alicloud.api.clickzetta.com",
      "username": "my_user",
      "password": "my_password",
      "instance": "my_instance",
      "workspace": "my_workspace",
      "schema": "public",
      "vcluster": "default_ap",
      "description": "Development environment",
      "hints": {
        "sdk.job.timeout": 300,
        "query_tag": "dev-queries"
      }
    }
  ],
  "system_config": {
    "allow_write": true,
    "prefetch": false,
    "log_level": "INFO",
    "exclude_tools": []
  }
}
```
|
|
53
|
+
|
|
54
|
+
### Connection fields

| Field | Type | Required | Default | Description |
|:--|:--|:--|:--|:--|
| `connection_name` | string | ✅ | none | Connection name; uniquely identifies the connection among several |
| `service` | string | ✅ | none | API service address, for example `cn-shanghai-alicloud.api.clickzetta.com` |
| `username` | string | ✅ | none | Login username |
| `password` | string | ✅ | none | Login password |
| `instance` | string | ✅ | none | Instance name |
| `workspace` | string | ✅ | none | Workspace name |
| `schema` | string | ✅ | none | Default schema |
| `vcluster` | string | ❌ | `"default_ap"` | Virtual cluster name |
| `is_default` | boolean | ❌ | `false` | Whether this is the default connection |
| `description` | string | ❌ | `""` | Connection description |
| `hints` | object | ❌ | see below | SDK runtime parameters |

### Default hints

When `hints` is not configured, the system automatically applies the following defaults:

```json
{
  "sdk.job.timeout": 300,
  "query_tag": "Query from MCP Server",
  "cz.storage.parquet.vector.index.read.memory.cache": "true",
  "cz.storage.parquet.vector.index.read.local.cache": "false",
  "cz.sql.table.scan.push.down.filter": "true",
  "cz.sql.table.scan.enable.ensure.filter": "true",
  "cz.storage.always.prefetch.internal": "true",
  "cz.optimizer.generate.columns.always.valid": "true",
  "cz.sql.index.prewhere.enabled": "true",
  "cz.storage.parquet.enable.io.prefetch": "false"
}
```
|
|
88
|
+
|
|
89
|
+
---
|
|
90
|
+
|
|
91
|
+
## Five-level search path

`ConnectionManager` searches for the configuration file in the following order (highest priority first) and stops at the first accessible file:

| Priority | Path | Use case |
|:--|:--|:--|
| 1 (highest) | `/app/.clickzetta/lakehouse_connection/connections.json` | Unified Docker path (recommended) |
| 2 | `/app/config/lakehouse_connection/connections.json` | Legacy Docker-compatible path |
| 3 | `config/lakehouse_connection/connections.json` | Path relative to the project directory (local development) |
| 4 | `~/.clickzetta/connections.json` | User home directory (non-Docker environments only) |
| 5 (lowest) | `/app/.clickzetta/connections.json` | Simplified Docker path |

> **Windows Docker environments**: the system detects these automatically and adds extra Windows Docker-friendly paths.
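
The first-accessible-file rule can be sketched independently of `ConnectionManager`. The helper below is an approximation written for illustration, not the project's implementation; in particular, treating "accessible" as "is a regular file and readable" is an assumption.

```python
import os
from typing import Iterable, Optional

# The five paths from the table above, highest priority first
SEARCH_PATHS = [
    "/app/.clickzetta/lakehouse_connection/connections.json",
    "/app/config/lakehouse_connection/connections.json",
    "config/lakehouse_connection/connections.json",
    os.path.expanduser("~/.clickzetta/connections.json"),
    "/app/.clickzetta/connections.json",
]


def find_config_file(paths: Iterable[str] = SEARCH_PATHS) -> Optional[str]:
    """Return the first path that is an existing, readable file, else None."""
    for path in paths:
        # Assumption: "accessible" means a regular file we can read
        if os.path.isfile(path) and os.access(path, os.R_OK):
            return path
    return None


print(find_config_file())
```

Passing a custom list of paths makes the helper easy to exercise in tests without touching the real locations.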
|
|
104
|
+
|
|
105
|
+
### Search logic source reference

```python
# From connection_manager.py
import os


class ConnectionManager:
    UNIFIED_CONFIG_FILE = "/app/.clickzetta/lakehouse_connection/connections.json"
    DOCKER_CONFIG_FILE = "/app/config/lakehouse_connection/connections.json"
    DEFAULT_CONFIG_FILE = "config/lakehouse_connection/connections.json"
    USER_CONFIG_FILE = os.path.expanduser("~/.clickzetta/connections.json")
    SIMPLE_CONFIG_FILE = "/app/.clickzetta/connections.json"
```

### Specifying the configuration file path

You can also pass the configuration file path directly at initialization, which skips the search logic:

```python
from mcp_clickzetta_server.connection_manager import ConnectionManager

# Use an explicit path
conn_manager = ConnectionManager(config_file="/path/to/my/connections.json")
```
|
|
127
|
+
|
|
128
|
+
---
|
|
129
|
+
|
|
130
|
+
## Managing multiple connections

### Configuring several connections

Define several connections in the `connections` array and mark the default one with `is_default`:

```json
{
  "connections": [
    {
      "is_default": true,
      "connection_name": "dev",
      "service": "cn-shanghai-alicloud.api.clickzetta.com",
      "username": "dev_user",
      "password": "dev_password",
      "instance": "dev_instance",
      "workspace": "dev_workspace",
      "schema": "public",
      "vcluster": "default_ap",
      "description": "Development environment"
    },
    {
      "connection_name": "staging",
      "service": "cn-shanghai-alicloud.api.clickzetta.com",
      "username": "staging_user",
      "password": "staging_password",
      "instance": "staging_instance",
      "workspace": "staging_workspace",
      "schema": "public",
      "vcluster": "default_ap",
      "description": "Staging environment"
    },
    {
      "connection_name": "prod",
      "service": "cn-shanghai-alicloud.api.clickzetta.com",
      "username": "prod_user",
      "password": "prod_password",
      "instance": "prod_instance",
      "workspace": "prod_workspace",
      "schema": "public",
      "vcluster": "default_ap",
      "description": "Production environment",
      "hints": {
        "sdk.job.timeout": 600,
        "query_tag": "production-queries"
      }
    }
  ],
  "system_config": {
    "allow_write": false,
    "log_level": "WARNING"
  }
}
```
|
|
184
|
+
|
|
185
|
+
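### Connection selection rules

1. If a connection has `is_default: true`, it automatically becomes the active connection
2. If several connections have `is_default: true`, the last one is used
3. If there is no default connection, the first valid connection is used
4. Placeholder entries whose `service` is `"not_configured"` are skipped automatically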
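
The four rules above can be sketched as a single function. This is an illustrative reading of the rules, not the code in `connection_manager.py`; the function name is hypothetical, and applying the placeholder filter before the default lookup is an assumption about how the rules combine.

```python
from typing import Optional

PLACEHOLDER_SERVICE = "not_configured"


def select_active_connection(connections: list) -> Optional[dict]:
    """Pick the active connection entry following the four rules above."""
    # Rule 4: placeholder entries are skipped (assumed to happen first)
    valid = [c for c in connections if c.get("service") != PLACEHOLDER_SERVICE]
    if not valid:
        return None
    # Rules 1 and 2: a default wins, and the last default wins among several
    defaults = [c for c in valid if c.get("is_default")]
    if defaults:
        return defaults[-1]
    # Rule 3: otherwise the first valid connection
    return valid[0]


sample = [
    {"connection_name": "stub", "service": "not_configured", "is_default": True},
    {"connection_name": "dev", "service": "cn-shanghai-alicloud.api.clickzetta.com"},
    {"connection_name": "prod", "service": "cn-shanghai-alicloud.api.clickzetta.com"},
]
print(select_active_connection(sample)["connection_name"])
```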
|
|
191
|
+
|
|
192
|
+
### Switching connections in code

```python
from mcp_clickzetta_server.connection_manager import ConnectionManager

conn_manager = ConnectionManager()

# List all available connections
connections = conn_manager.list_connections()
for conn in connections:
    print(f"  {conn['name']}: {conn.get('description', '')}")

# Switch to a specific connection
result = conn_manager.switch_connection("prod")
print(f"Switched to: {result}")

# Get the configuration of the currently active connection
active_config = conn_manager.get_active_config()
print(f"Current connection: {active_config.connection_name}")
print(f"Workspace: {active_config.workspace}")

# Set a new default connection
conn_manager.set_default_connection("staging")

# Validate that a connection's configuration is complete
validation = conn_manager.validate_connection("dev")
print(f"Validation result: {validation}")
```
|
|
220
|
+
|
|
221
|
+
---
|
|
222
|
+
|
|
223
|
+
## Environment variable fallback

When no configuration file is found, the system creates a placeholder configuration and starts anyway. The following environment variables can be used to supply connection parameters:

### Connection parameter environment variables

| Environment variable | Field | Description |
|:--|:--|:--|
| `CLICKZETTA_SERVICE` | `service` | API service address |
| `CLICKZETTA_USERNAME` | `username` | Username |
| `CLICKZETTA_PASSWORD` | `password` | Password |
| `CLICKZETTA_INSTANCE` | `instance` | Instance name |
| `CLICKZETTA_WORKSPACE` | `workspace` | Workspace |
| `CLICKZETTA_SCHEMA` | `schema` | Default schema |
| `CLICKZETTA_VCLUSTER` | `vcluster` | Virtual cluster (default `default_ap`) |

### System configuration overrides

The following environment variables override the corresponding `system_config` settings (highest priority):

| Environment variable | Setting | Example values |
|:--|:--|:--|
| `MCP_ALLOW_WRITE` | `allow_write` | `"true"` / `"false"` |
| `MCP_PREFETCH` | `prefetch` | `"true"` / `"false"` |
| `MCP_LOG_LEVEL` | `log_level` | `"DEBUG"` / `"INFO"` / `"WARNING"` / `"ERROR"` |
| `MCP_EXCLUDE_TOOLS` | `exclude_tools` | `"tool1,tool2,tool3"` |

### Similarity search environment variables

| Environment variable | Setting |
|:--|:--|
| `Similar_table_name` | `similar_search.table_name` |
| `Similar_embedding_column_name` | `similar_search.embedding_column_name` |
| `Similar_content_column_name` | `similar_search.content_column_name` |
| `Similar_other_columns_name` | `similar_search.other_columns_name` |
| `Similar_partition_scope` | `similar_search.partition_scope` |

### Cloud storage environment variables

| Environment variable | Setting |
|:--|:--|
| `AWS_ACCESS_KEY_ID` | `cloud_storage.aws_access_key_id` |
| `AWS_SECRET_ACCESS_KEY` | `cloud_storage.aws_secret_access_key` |
| `AWS_REGION` | `cloud_storage.aws_region` |

### Configuration priority

System configuration is loaded in the following order of priority (highest first):

1. **Environment variables** (highest priority)
2. **`system_config` in the configuration file**
3. **Built-in defaults** (lowest priority)
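
The three-layer priority can be sketched as a merge in which later layers overwrite earlier ones. This is an illustration only: the function name and parsing details are assumptions (for example, splitting `MCP_EXCLUDE_TOOLS` on commas follows the example value shown in this document), and the built-in defaults are the ones this document lists for `system_config`.

```python
import os

# Built-in defaults as documented for system_config
DEFAULTS = {"allow_write": False, "prefetch": True, "log_level": "INFO", "exclude_tools": []}


def _parse_bool(text: str) -> bool:
    return text.strip().lower() == "true"


def _parse_list(text: str) -> list:
    return [item.strip() for item in text.split(",") if item.strip()]


# Environment variable -> (system_config key, parser); parsers are assumptions
ENV_OVERRIDES = {
    "MCP_ALLOW_WRITE": ("allow_write", _parse_bool),
    "MCP_PREFETCH": ("prefetch", _parse_bool),
    "MCP_LOG_LEVEL": ("log_level", str.upper),
    "MCP_EXCLUDE_TOOLS": ("exclude_tools", _parse_list),
}


def resolve_system_config(file_config: dict, env=None) -> dict:
    """Merge defaults < file system_config < environment variables."""
    env = os.environ if env is None else env
    resolved = {**DEFAULTS, **file_config}
    resolved["exclude_tools"] = list(resolved["exclude_tools"])  # avoid sharing the default list
    for var, (key, parse) in ENV_OVERRIDES.items():
        if var in env:
            resolved[key] = parse(env[var])
    return resolved


print(resolve_system_config({"allow_write": True}, {"MCP_ALLOW_WRITE": "false"}))
```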
|
|
275
|
+
|
|
276
|
+
---
|
|
277
|
+
|
|
278
|
+
## Docker / Kubernetes deployment

### docker-compose.yml

```yaml
services:
  mcp-server:
    image: mcp-clickzetta-server:latest
    volumes:
      # Recommended: mount to the unified path (priority 1)
      - ./config/connections.json:/app/.clickzetta/lakehouse_connection/connections.json:ro
    environment:
      # Optional: override system configuration through environment variables
      - MCP_ALLOW_WRITE=false
      - MCP_LOG_LEVEL=INFO
```

### Using the legacy path

```yaml
services:
  mcp-server:
    image: mcp-clickzetta-server:latest
    volumes:
      # Legacy-compatible path (priority 2)
      - ./config/connections.json:/app/config/lakehouse_connection/connections.json:ro
```

### Kubernetes ConfigMap

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: clickzetta-config
data:
  connections.json: |
    {
      "connections": [
        {
          "is_default": true,
          "connection_name": "prod",
          "service": "cn-shanghai-alicloud.api.clickzetta.com",
          "username": "prod_user",
          "password": "prod_password",
          "instance": "prod_instance",
          "workspace": "prod_workspace",
          "schema": "public",
          "vcluster": "default_ap"
        }
      ],
      "system_config": {
        "allow_write": false,
        "log_level": "WARNING"
      }
    }
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mcp-server
spec:
  template:
    spec:
      containers:
        - name: mcp-server
          volumeMounts:
            - name: config
              mountPath: /app/.clickzetta/lakehouse_connection/connections.json
              subPath: connections.json
              readOnly: true
      volumes:
        - name: config
          configMap:
            name: clickzetta-config
```

> **Security note**: in production, store passwords in a Kubernetes Secret rather than a ConfigMap.
|
|
356
|
+
|
|
357
|
+
---
|
|
358
|
+
|
|
359
|
+
## system_config parameters

### Core parameters

| Parameter | Type | Default | Description |
|:--|:--|:--|:--|
| `allow_write` | boolean | `false` | Whether write operations (INSERT/UPDATE/DELETE) are allowed |
| `prefetch` | boolean | `true` | Whether data prefetching is enabled |
| `log_level` | string | `"INFO"` | Log level: `DEBUG` / `INFO` / `WARNING` / `ERROR` |
| `exclude_tools` | array | `[]` | List of tools to exclude |
|
|
369
|
+
|
|
370
|
+
---
|
|
371
|
+
|
|
372
|
+
## Use cases

| Scenario | Recommended configuration | Notes |
|:--|:--|:--|
| Local development | `~/.clickzetta/connections.json` | User home directory, personal configuration |
| Team sharing | `config/lakehouse_connection/connections.json` | Project directory, can be placed under version control |
| Docker deployment | Mount to `/app/.clickzetta/lakehouse_connection/` | Unified path, the recommended approach |
| Kubernetes | ConfigMap / Secret | Declarative configuration management |
| CI/CD pipelines | Environment variables | No configuration file needed, suited to automation |
| Switching between environments | Multiple connections + `switch_connection()` | Manage several environments in a single configuration file |
|
|
382
|
+
|
|
383
|
+
---
|
|
384
|
+
|
|
385
|
+
## Troubleshooting

### Configuration file not found

```
WARNING: 未找到任何可用的配置文件
```

**Solution**:
1. Confirm that the file exists at one of the five search paths
2. Check the file permissions: `chmod 600 connections.json`
3. In Docker, check that the mount is correct

### Invalid JSON format

```
ERROR: 配置文件JSON格式错误
```

**Solution**:
1. Check the format with a JSON validation tool
2. Make sure every string uses double quotes
3. Check for trailing commas
4. Make sure the file is saved as UTF-8 if it contains Chinese characters
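
Python's standard `json` module reports where parsing failed, which covers the first three checks above. A minimal sketch follows; `check_json` is a helper name invented for this example.

```python
import json
from typing import Optional


def check_json(text: str) -> Optional[str]:
    """Return None for valid JSON, otherwise a message locating the error."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        return f"line {exc.lineno}, column {exc.colno}: {exc.msg}"
    return None


print(check_json('{"connections": []}'))
print(check_json('{"connections": [],}'))   # trailing comma
print(check_json("{'connections': []}"))    # single quotes
```

To check a file, read it with `open(path, encoding="utf-8")` and pass the contents to the helper.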
|
|
409
|
+
|
|
410
|
+
### Connection is missing required fields

Every connection must include the following 7 required fields:

```
connection_name, service, username, password, instance, workspace, schema
```
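
A connection entry can be checked for these fields before use. The sketch below is not the project's `validate_connection()`; the helper name is hypothetical, and treating an empty string as missing is an assumption.

```python
REQUIRED_FIELDS = (
    "connection_name", "service", "username", "password",
    "instance", "workspace", "schema",
)


def missing_fields(conn: dict) -> list:
    """List required fields that are absent or empty in a connection entry."""
    # Assumption: an empty value counts as missing
    return [field for field in REQUIRED_FIELDS if not conn.get(field)]


entry = {
    "connection_name": "dev",
    "service": "cn-shanghai-alicloud.api.clickzetta.com",
    "username": "my_user",
    "password": "my_password",
    "instance": "my_instance",
    "workspace": "my_workspace",
}
print(missing_fields(entry))  # ['schema']
```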
|
|
417
|
+
|
|
418
|
+
### Placeholder configuration is skipped

Connections whose `service` is set to `"not_configured"` are skipped automatically. Make sure every production connection has a correct `service` address.
|
|
421
|
+
|
|
422
|
+
---
|
|
423
|
+
|
|
424
|
+
## Complete configuration template

See the template file in the project: `config/connections-template.json`

```bash
# Copy the template, then edit it
cp config/connections-template.json ~/.clickzetta/connections.json
# Edit the configuration
vim ~/.clickzetta/connections.json
# Restrict file permissions (protects the password)
chmod 600 ~/.clickzetta/connections.json
```
|