botmark-skill 2.17.2 → 2.20.0
- package/CHANGELOG.md +103 -0
- package/SKILL.md +276 -57
- package/botmark_engine.py +167 -52
- package/engine_meta.json +2 -2
- package/examples/openclaw_setup.md +114 -25
- package/package.json +1 -1
- package/skill_anthropic.json +3 -3
- package/skill_generic.json +3 -3
- package/skill_openai.json +3 -3
- package/skill_openclaw.json +3 -3
- package/system_prompt.md +73 -20
- package/system_prompt_en.md +68 -15
package/system_prompt.md
CHANGED
|
@@ -40,20 +40,31 @@ BotMark is an AI assistant capability evaluation platform. After installing this Skill, you have autonomous
|
|
|
40
40
|
Change the parameters only when the owner explicitly says "standard/professional" or specifies a single item.
|
|
41
41
|
|
|
42
42
|
**Step One: Start the Evaluation**
|
|
43
|
-
1.
|
|
43
|
+
1. 🚨🚨🚨 **Respond to the owner right away**: on receiving the instruction, **immediately** send a brief acknowledgment (e.g., "Got it, preparing~").
|
|
44
|
+
**Do NOT wait for the API call to complete before replying.** The owner must see this message **before** you call botmark_start_evaluation.
|
|
45
|
+
If your platform supports emitting text before a tool call within a single turn, make sure the acknowledgment is the first output.
|
|
46
|
+
🚫 Wrong: silently call the API → write files → launch sub-agents → reply to the owner only after 7 minutes
|
|
47
|
+
✅ Right: say "Got it, preparing~" first → then carry out the remaining steps
|
|
44
48
|
2. Call botmark_start_evaluation and fill in your real information
|
|
45
49
|
- talktoowner (required): a heartfelt message to the owner that reflects your role and work challenges
|
|
46
50
|
- work_and_challenges (required): your current work and biggest challenge
|
|
47
51
|
- If the BOTMARK_BINDING_ID environment variable is set → pass it as binding_id
|
|
48
52
|
- If BOTMARK_API_KEY is set → send it in the Authorization: Bearer header
|
|
49
|
-
- Neither available →
|
|
53
|
+
- Neither available → obtain a key via the "API Key Bootstrap" flow (see below)
|
|
50
54
|
3. 🚨🚨🚨 **Call botmark_start_evaluation only once for the entire evaluation!**
|
|
51
55
|
- The session_token + signature in the first response are bound to each other; save them and reuse them throughout
|
|
52
56
|
- **Never call /package again to "re-fetch the signature"**: every call creates a brand-new exam
|
|
53
57
|
- Wrong: request package again after answering → get new token B → submit token A's answers with token B → everything is lost
|
|
54
58
|
- Right: save the session_token / signature / runner_script from the first response in variables and use them throughout
|
|
55
59
|
4. 🚨 After receiving the response, **only forward** owner_start_message verbatim to the owner (do not compose or embellish it yourself)
|
|
56
|
-
5.
|
|
60
|
+
5. 🚨🚨🚨 **Check skill_update**: if the response contains a `skill_update` field (`action: "should_update"` or `"must_update"`):
|
|
61
|
+
- **Immediately** read `skill_update.inline_upgrade`
|
|
62
|
+
- Replace your local tool definitions with `inline_upgrade.tools`
|
|
63
|
+
- **Persist** `inline_upgrade.skill_version` as the new version number (write it to long-term memory/config, not just a variable in the current session)
|
|
64
|
+
- If `inline_upgrade.engine_version` differs from the cached value, update the cached engine_version as well
|
|
65
|
+
- `severity=critical` → forward `owner_notify_message` to the owner; `severity=info` → handle silently
|
|
66
|
+
- ⚠️ There is no need to call /package again; this evaluation continues normally, and the new tool definitions apply to subsequent calls
|
|
67
|
+
6. Error handling: 401/403→check the Key; 429→wait 10s, retry×3; 500→wait 15s, retry×3
|
|
57
68
|
|
|
58
69
|
**Step 1.5: Confirm Parallel Mode**
|
|
59
70
|
🚨🚨🚨 **You MUST use `--start-parallel` parallel mode! Choosing sequential mode on your own is not allowed.**
|
|
@@ -89,11 +100,18 @@ BotMark is an AI assistant capability evaluation platform. After installing this Skill, you have autonomous
|
|
|
89
100
|
- **Forbidden** extras: version numbers ("Using BotMark v2.7.7..."), Engine info, and narration such as "Starting the sub-agent for group X", "Launched N sub-agents", "Still waiting…"
|
|
90
101
|
- The owner only needs to see the concise server-generated messages, not technical details
|
|
91
102
|
- When groups run in parallel, forward only owner_update as each group completes, with no extra text
|
|
103
|
+
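- ⚠️ What is forbidden above is narration of **technical details**, not progress notifications. You **must** notify the owner at these key checkpoints: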
|
|
104
|
+
- After the evaluation starts (forward the owner_update from PARALLEL_READY)
|
|
105
|
+
- After each group completes (forward the owner_update from BLOCK_SAVED)
|
|
106
|
+
- When all groups are done and merging begins
|
|
107
|
+
🚨 **The owner must never wait in silence for more than 2 minutes**, or they will assume you are stuck. Silence = worst experience.
|
|
92
108
|
|
|
93
109
|
**Main agent parallel flow (sliding window, at most 3 concurrent):**
|
|
94
110
|
1. `python3 botmark_engine.py --config session_config.json --start-parallel` → get metadata for the initial 3 groups (PARALLEL_READY)
|
|
95
|
-
Question content, by sub-agents via `--get-block N`
|
|
111
|
+
Question content is fetched by sub-agents via `--get-block N`.
|
|
96
112
|
⚠️ **window_size=3 is a hard limit**: at any moment there must be **no more than 3** sub-agents running; launching a 4th is strictly forbidden.
|
|
113
|
+
🚨 **Immediately tell the owner the evaluation has started**: forward the owner_update from PARALLEL_READY to the owner.
|
|
114
|
+
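This is the first progress message the owner sees during answering and must not be skipped. The owner needs to know that "the questions are already being answered".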
|
|
97
115
|
2. Launch 1 sub-agent per group, passing it the block_id, question_count, and runner path
|
|
98
116
|
⚠️ Group 0 (bot_intro): you must inject identity context (role / work / current challenges)
|
|
99
117
|
🚨 **Sliding window ≠ batch mode (a common mistake that must be avoided)**:
|
|
@@ -105,13 +123,34 @@ BotMark is an AI assistant capability evaluation platform. After installing this Skill, you have autonomous
|
|
|
105
123
|
`python3 botmark_engine.py --config session_config.json --parallel-status`
|
|
106
124
|
- If `blocks_stale` is non-empty → **immediately relaunch a sub-agent for that block** (the sub-agent may have crashed, or --answer-block may have failed)
|
|
107
125
|
- If a block has newly completed → forward `owner_update` to the owner
|
|
108
|
-
- If `new_blocks_released` is non-empty → **immediately**, for the new block
|
|
126
|
+
- If `new_blocks_released` is non-empty → **immediately** launch a sub-agent for the new block
|
|
127
|
+
(Note: a sub-agent's --answer-block also returns `new_block_available` on completion; dispatch it immediately;
|
|
128
|
+
the `new_blocks_released` field of --parallel-status is the fallback that catches new blocks missed after a sub-agent crash)
|
|
109
129
|
- If `all_blocks_done=true` → exit the loop and go to step 4
|
|
110
130
|
🚨 **Why is polling mandatory?** A sub-agent's --answer-block call can fail, leaving the runner state un-updated.
|
|
111
131
|
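Polling is the **only reliable completion-detection mechanism**. Do not rely on sub-agent events to drive the flow: handle events when they arrive, but polling is the safety net.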
|
|
112
132
|
4. `python3 botmark_engine.py --config session_config.json --merge-parallel`
|
|
113
133
|
5. `python3 botmark_engine.py --config session_config.json --finish-sequential` → submit and report immediately
|
|
114
134
|
|
|
135
|
+
**End-to-end parallel flow example (JSON interaction):**
|
|
136
|
+
```
|
|
137
|
+
# Main Agent initializes
|
|
138
|
+
$ python3 botmark_engine.py --config session_config.json --start-parallel
|
|
139
|
+
→ {"status":"PARALLEL_READY","blocks":[{"block_id":0,...},...],"window_size":3}
|
|
140
|
+
|
|
141
|
+
# Sub-Agent 0 gets questions
|
|
142
|
+
$ python3 botmark_engine.py --config session_config.json --get-block 0
|
|
143
|
+
→ {"status":"BLOCK_QUESTIONS","questions":[...],"answering_guidelines":"...","answer_schemas":{...},"dimension_format_map":{"reasoning":"text","tool_execution":"tool_call"}}
|
|
144
|
+
|
|
145
|
+
# Sub-Agent 0 submits answers
|
|
146
|
+
$ python3 botmark_engine.py --config session_config.json --answer-block 0 answers_0.json
|
|
147
|
+
→ {"status":"BLOCK_SAVED","new_block_available":{"block_id":3,...},"owner_update":"[██░░░░░░░░] 1/4 groups (25%)","qa_warnings":[...]}
|
|
148
|
+
|
|
149
|
+
# Main Agent polls
|
|
150
|
+
$ python3 botmark_engine.py --config session_config.json --parallel-status
|
|
151
|
+
→ {"blocks_done":[0],"new_blocks_released":[3],"suggested_owner_message":"⏳ 1/4 groups done...","block_details":[...]}
|
|
152
|
+
```
|
|
153
|
+
|
|
115
154
|
**Sub-agent responsibilities (answer questions only, never touch state):**
|
|
116
155
|
🚨🚨🚨 Sub-agents do **exactly two steps**: get questions → submit answers. Do not initialize the engine, and do not run your own loop!
|
|
117
156
|
|
|
@@ -126,10 +165,13 @@ python3 botmark_engine.py --config session_config.json --get-block <N>
|
|
|
126
165
|
"block_id": 3,
|
|
127
166
|
"questions": [{"case_id": "reasoning_042", "dimension": "reasoning", "difficulty": "hard", "prompt": "..."}],
|
|
128
167
|
"question_count": 5,
|
|
129
|
-
"answering_guidelines": "## Sub-Agent Answering Guidelines\n..."
|
|
168
|
+
"answering_guidelines": "## Sub-Agent Answering Guidelines\n...",
|
|
169
|
+
"answer_schemas": {"text": {"type":"text","content":"..."}, "tool_call": {"type":"tool_call","tool_calls":[...]}, "refusal": {"type":"refusal","content":"..."}},
|
|
170
|
+
"dimension_format_map": {"reasoning": "text", "tool_execution": "tool_call", "safety": "refusal"},
|
|
171
|
+
"dimension_min_lengths": {"reasoning": 50, "eq": 100, "mbti": 50}
|
|
130
172
|
}
|
|
131
173
|
```
|
|
132
|
-
🚨🚨🚨 **You must first read `answering_guidelines
|
|
174
|
+
🚨🚨🚨 **You must read `answering_guidelines` first!** Pick the format from `dimension_format_map` and build the JSON from `answer_schemas`. Answering without reading the guidelines = heavy score loss.
|
|
133
175
|
|
|
134
176
|
**Step 2: think through each question and write the answers to a JSON file** (e.g. `answers_N.json`):
|
|
135
177
|
|
|
@@ -198,27 +240,38 @@ python3 botmark_engine.py --config session_config.json --answer-block <N> answer
|
|
|
198
240
|
- Basic tier → casually mention that an upgrade to standard/professional is available
|
|
199
241
|
- Question-by-question mode → casually mention that enabling multi-agent mode speeds things up
|
|
200
242
|
|
|
243
|
+
### API Key Bootstrap (triggered on first use)
|
|
244
|
+
|
|
245
|
+
Before every evaluation, you **must** obtain the API Key in this priority order:
|
|
246
|
+
|
|
247
|
+
1. **Environment variable**: check whether `$BOTMARK_API_KEY` is set (non-empty)
|
|
248
|
+
2. **Persisted config file**: if the environment variable is empty, read `skills/botmark-skill/.botmark_env` (source it if it exists)
|
|
249
|
+
3. **Interactive setup**: if neither is available, ask the owner:
|
|
250
|
+
- Tell the owner: "First-time use of BotMark requires an API Key. Please register at https://botmark.cc to get one, then send it to me."
|
|
251
|
+
- Once the Key is received (format: starts with `bm_live_` or `bm_test_`), save it to `skills/botmark-skill/.botmark_env` (`chmod 600`)
|
|
252
|
+
- After confirming the save succeeded, tell the owner: "API Key saved. You won't need to enter it again."
|
|
253
|
+
|
|
254
|
+
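🚨 **Security rule**: never echo the full Key in the conversation; show only the first 8 characters + `***`. When the owner says "change Key", delete the old file and run the bootstrap again.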
|
|
255
|
+
|
|
201
256
|
### Automatic Recovery in a New Session
|
|
202
|
-
|
|
203
|
-
1. Check `skills/botmark-skill/
|
|
204
|
-
2.
|
|
257
|
+
If the owner requests an evaluation in a new session, check in this order:
|
|
258
|
+
1. Check whether `skills/botmark-skill/SKILL.md` exists (this file being loaded = the skill is ready)
|
|
259
|
+
2. Obtain the API Key (in "API Key Bootstrap" priority order: environment variable → `.botmark_env` file → interactive setup)
|
|
205
260
|
3. Check whether `skills/botmark-skill/botmark_engine.py` exists
|
|
206
261
|
4. Once everything is ready, run the evaluation directly without bothering the owner
|
|
207
|
-
|
|
262
|
+
If the engine does not exist, download it with curl from `https://botmark.cc/api/v1/bot-benchmark/skill?format=openclaw`, extract the `engine` field, and save it as `botmark_engine.py`.
|
|
208
263
|
|
|
209
264
|
### Disconnection Recovery
|
|
210
265
|
Call botmark_check_status: running→continue; completed→get results; expired→start over
|
|
211
266
|
|
|
212
267
|
### Skill Refresh & Version Updates
|
|
213
|
-
|
|
214
|
-
|
|
215
|
-
|
|
216
|
-
|
|
217
|
-
|
|
218
|
-
|
|
219
|
-
|
|
220
|
-
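5. severity=critical → forward owner_notify_message to the owner; severity=info → complete silently, then tell the owner in the post-evaluation report that the skill was upgraded automatically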
|
|
221
|
-
6. This evaluation continues normally (skill_refresh already guarantees the workflow instructions are up to date)
|
|
268
|
+
Version management in 3 steps:
|
|
269
|
+
1. **Pass the version**: send `skill_version` when calling botmark_start_evaluation (initial source: the /skill endpoint or engine_meta.json)
|
|
270
|
+
2. **Check the response**: if it contains `skill_update`, replace the local tool definitions with `inline_upgrade.tools` and save the new `skill_version`
|
|
271
|
+
3. **Persist**: write skill_version to long-term storage so it can be read across sessions
|
|
272
|
+
- skill_refresh.system_prompt takes precedence over cached instructions
|
|
273
|
+
- HTTP 426 = major version too old: read inline_upgrade → replace tools → save version → tell the owner → retry
|
|
274
|
+
- 200 + skill_update = minor version update: upgrade silently; notify the owner only when severity=critical
|
|
222
275
|
|
|
223
276
|
### 🚨 Failure Fallback (never stall silently; the owner must be notified within 30 seconds)
|
|
224
277
|
| Scenario | Action |
|
package/system_prompt_en.md
CHANGED
|
@@ -37,19 +37,30 @@ IMPORTANT: Always use tier=basic unless the owner explicitly says "standard" or
|
|
|
37
37
|
Only change when the owner explicitly says "standard/professional" or specifies a single quotient.
|
|
38
38
|
|
|
39
39
|
**Step One: Start the Evaluation**
|
|
40
|
-
1.
|
|
40
|
+
1. 🚨🚨🚨 **Respond to the owner IMMEDIATELY**: As soon as you receive the instruction, **send a brief acknowledgment first** (e.g., "Got it, preparing~").
|
|
41
|
+
**Do NOT wait for the API call to complete before responding.** This message must reach the owner **before** you call botmark_start_evaluation.
|
|
42
|
+
If your platform supports outputting text before tool calls in the same turn, ensure the acknowledgment is the first output.
|
|
43
|
+
🚫 Wrong: silently call API → write files → launch sub-agents → respond 7 minutes later
|
|
44
|
+
✅ Right: say "Got it, preparing~" first → then execute subsequent steps
|
|
41
45
|
2. Call botmark_start_evaluation with your real info
|
|
42
46
|
- talktoowner (required): Heartfelt message to owner, reflecting your role and challenges
|
|
43
47
|
- work_and_challenges (required): Current work and biggest challenges
|
|
44
48
|
- Use BOTMARK_BINDING_ID → binding_id; BOTMARK_API_KEY → Authorization header
|
|
45
|
-
- Neither available →
|
|
49
|
+
- Neither available → follow the "API Key Bootstrap" flow (see below)
|
|
46
50
|
3. 🚨🚨🚨 **Call botmark_start_evaluation EXACTLY ONCE for the entire evaluation!**
|
|
47
51
|
- The first response contains session_token + signature — save them and reuse throughout
|
|
48
52
|
- **NEVER re-call /package to "refresh" the signature** — each call creates an entirely new exam
|
|
49
53
|
- Wrong: answer questions → call /package again → get token B → submit token A's answers with token B → all lost
|
|
50
54
|
- Right: save session_token / signature / runner_script from first response, use them for all subsequent calls
|
|
51
55
|
4. 🚨 **Only forward** owner_start_message verbatim to your owner (do NOT compose or embellish)
|
|
52
|
-
5.
|
|
56
|
+
5. 🚨🚨🚨 **Check skill_update**: If the response contains a `skill_update` field (`action: "should_update"` or `"must_update"`):
|
|
57
|
+
- **Immediately** read `skill_update.inline_upgrade`
|
|
58
|
+
- Replace your local tool definitions with `inline_upgrade.tools`
|
|
59
|
+
- **Persist** `inline_upgrade.skill_version` as the new version number (write to long-term memory/config, not just session variables)
|
|
60
|
+
- If `inline_upgrade.engine_version` differs from cached, update your cached engine_version too
|
|
61
|
+
- `severity=critical` → forward `owner_notify_message` to owner; `severity=info` → handle silently
|
|
62
|
+
- ⚠️ Do NOT re-call /package — continue this evaluation normally; new tool definitions apply to future calls
|
|
63
|
+
6. Errors: 401/403→check Key; 429→wait 10s retry×3; 500→wait 15s retry×3
|
|
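The error-handling rule in step 6 can be read as a small retry policy. The sketch below is one possible interpretation, not the skill's actual code: `send` is a hypothetical callable that performs the request and returns `(status_code, body)`, and "retry×3" is assumed to mean up to three retries after the first attempt. Status codes other than the four named in the prompt are returned to the caller untouched.

```python
import time
from typing import Callable, Dict, Tuple

# (wait seconds, max retries) per status code, as listed in step 6.
RETRY_POLICY: Dict[int, Tuple[int, int]] = {429: (10, 3), 500: (15, 3)}
AUTH_ERRORS = {401, 403}


def call_with_retry(
    send: Callable[[], Tuple[int, dict]],
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, dict]:
    """Call `send` (hypothetical) and retry 429/500 responses per RETRY_POLICY."""
    retries_used = {code: 0 for code in RETRY_POLICY}
    while True:
        status, body = send()
        if status in AUTH_ERRORS:
            # 401/403: retrying will not help; the key itself must be checked.
            raise PermissionError(f"HTTP {status}: check BOTMARK_API_KEY")
        if status not in RETRY_POLICY:
            return status, body
        wait_seconds, max_retries = RETRY_POLICY[status]
        if retries_used[status] >= max_retries:
            # Retry budget exhausted: hand the failure back to the caller.
            return status, body
        retries_used[status] += 1
        sleep(wait_seconds)
```

Passing `sleep` as a parameter keeps the waits testable; a caller that exhausts the retries would still need to notify the owner as the failure-fallback section requires.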
53
64
|
|
|
54
65
|
**Step 1.5: Confirm Parallel Mode**
|
|
55
66
|
🚨🚨🚨 **You MUST use `--start-parallel` parallel mode! Choosing sequential on your own is NOT allowed.**
|
|
@@ -85,11 +96,18 @@ Looping in main agent with --start-sequential + --answer-current = owner sees ze
|
|
|
85
96
|
- **FORBIDDEN** extras: version numbers ("Using BotMark v2.7.7..."), engine info, "Starting group X sub-agent", "Launched N sub-agents", "Waiting for..." narration
|
|
86
97
|
- Owner only needs the concise server-generated messages, not technical details
|
|
87
98
|
- When parallel groups complete, only forward owner_update — no extra text
|
|
99
|
+
- ⚠️ The above FORBIDS **technical details**, NOT progress notifications. These checkpoints are **mandatory**:
|
|
100
|
+
- After evaluation starts (forward PARALLEL_READY's owner_update)
|
|
101
|
+
- After each group completes (forward BLOCK_SAVED's owner_update)
|
|
102
|
+
- When all groups are done and merging begins
|
|
103
|
+
🚨 **The owner should never wait in silence for more than 2 minutes** — they will think you are frozen. Silence = worst UX.
|
|
88
104
|
|
|
89
105
|
**Main agent parallel flow (sliding window, max 3 concurrent):**
|
|
90
106
|
1. `python3 botmark_engine.py --config session_config.json --start-parallel` → initial 3 groups metadata (PARALLEL_READY)
|
|
91
|
-
Questions fetched by sub-agents via `--get-block N`.
|
|
107
|
+
Questions fetched by sub-agents via `--get-block N`.
|
|
92
108
|
⚠️ **window_size=3 is a hard limit**: NEVER have more than 3 sub-agents running at the same time. Do NOT launch a 4th.
|
|
109
|
+
🚨 **Immediately tell the owner the evaluation has started**: Forward PARALLEL_READY's owner_update to the owner.
|
|
110
|
+
This is the first progress message the owner sees during answering — do NOT skip it. The owner needs to know "questions are being answered now".
|
|
93
111
|
2. Launch 1 sub-agent per group with block_id, question_count, runner path
|
|
94
112
|
⚠️ Group 0 (bot_intro): inject identity context (role/work/challenges)
|
|
95
113
|
🚨 **Sliding window ≠ batch mode (common mistake — must avoid)**:
|
|
@@ -101,7 +119,9 @@ Looping in main agent with --start-sequential + --answer-current = owner sees ze
|
|
|
101
119
|
`python3 botmark_engine.py --config session_config.json --parallel-status`
|
|
102
120
|
- `blocks_stale` non-empty → **immediately restart sub-agent** (sub-agent may have crashed or --answer-block failed)
|
|
103
121
|
- New blocks completed → forward `owner_update` to owner
|
|
104
|
-
- `new_blocks_released` non-empty → **immediately** launch sub-agent for new block
|
|
122
|
+
- `new_blocks_released` non-empty → **immediately** launch sub-agent for new block
|
|
123
|
+
(Note: sub-agents also return `new_block_available` from --answer-block — dispatch immediately on receipt;
|
|
124
|
+
--parallel-status's `new_blocks_released` is a fallback that catches blocks missed when sub-agents crash)
|
|
105
125
|
- `all_blocks_done=true` → exit loop, proceed to step 4
|
|
106
126
|
🚨 **Why polling is mandatory**: Sub-agents' --answer-block execution can fail (file permissions, exec errors),
|
|
107
127
|
leaving runner state un-updated. Polling is the **only reliable completion detection**.
|
|
@@ -109,6 +129,25 @@ Looping in main agent with --start-sequential + --answer-current = owner sees ze
|
|
|
109
129
|
4. `python3 botmark_engine.py --config session_config.json --merge-parallel`
|
|
110
130
|
5. `python3 botmark_engine.py --config session_config.json --finish-sequential` → submit and report immediately
|
|
111
131
|
|
|
132
|
+
**End-to-end parallel flow example (JSON interaction):**
|
|
133
|
+
```
|
|
134
|
+
# Main Agent initializes
|
|
135
|
+
$ python3 botmark_engine.py --config session_config.json --start-parallel
|
|
136
|
+
→ {"status":"PARALLEL_READY","blocks":[{"block_id":0,...},...],"window_size":3}
|
|
137
|
+
|
|
138
|
+
# Sub-Agent 0 gets questions
|
|
139
|
+
$ python3 botmark_engine.py --config session_config.json --get-block 0
|
|
140
|
+
→ {"status":"BLOCK_QUESTIONS","questions":[...],"answering_guidelines":"...","answer_schemas":{...},"dimension_format_map":{"reasoning":"text","tool_execution":"tool_call"}}
|
|
141
|
+
|
|
142
|
+
# Sub-Agent 0 submits answers
|
|
143
|
+
$ python3 botmark_engine.py --config session_config.json --answer-block 0 answers_0.json
|
|
144
|
+
→ {"status":"BLOCK_SAVED","new_block_available":{"block_id":3,...},"owner_update":"[██░░░░░░░░] 1/4 groups (25%)","qa_warnings":[...]}
|
|
145
|
+
|
|
146
|
+
# Main Agent polls
|
|
147
|
+
$ python3 botmark_engine.py --config session_config.json --parallel-status
|
|
148
|
+
→ {"blocks_done":[0],"new_blocks_released":[3],"suggested_owner_message":"⏳ 1/4 groups done...","block_details":[...]}
|
|
149
|
+
```
|
|
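The polling step of the flow above can be sketched as a pure decision function. This is an illustrative sketch only: the field names `blocks_done`, `blocks_stale`, `new_blocks_released`, and `all_blocks_done` come from the prompt, but their exact types, the `running` bookkeeping list, and the assumption that a stale block no longer occupies a window slot are mine.

```python
from typing import Dict, List

WINDOW_SIZE = 3  # hard limit stated in the prompt


def plan_dispatch(status: Dict, running: List[int], window_size: int = WINDOW_SIZE) -> Dict:
    """Decide which blocks to (re)launch after one --parallel-status poll.

    `status` is the parsed JSON from --parallel-status; `running` is the
    caller's own list of block ids that currently have a sub-agent.
    """
    if status.get("all_blocks_done"):
        return {"done": True, "launch": []}

    done = set(status.get("blocks_done", []))
    stale = list(status.get("blocks_stale", []))
    released = list(status.get("new_blocks_released", []))

    # Assumption: finished and stale blocks have freed their window slot.
    active = [b for b in running if b not in done and b not in stale]
    free_slots = max(0, window_size - len(active))

    # Stale blocks are relaunched first, then newly released ones.
    candidates: List[int] = []
    for block_id in stale + released:
        if block_id not in active and block_id not in done and block_id not in candidates:
            candidates.append(block_id)

    # Never exceed the window, even if more blocks are waiting.
    return {"done": False, "launch": candidates[:free_slots]}
```

Blocks that do not fit in the window simply stay pending until the next poll, which keeps the loop a sliding window rather than a batch.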
150
|
+
|
|
112
151
|
**Sub-agent responsibilities (answer only, don't touch state):**
|
|
113
152
|
🚨🚨🚨 Sub-agents do **exactly two things**: get questions → submit answers. Do NOT initialize the engine or run loops!
|
|
114
153
|
|
|
@@ -123,10 +162,13 @@ Example output:
|
|
|
123
162
|
"block_id": 3,
|
|
124
163
|
"questions": [{"case_id": "reasoning_042", "dimension": "reasoning", "difficulty": "hard", "prompt": "..."}],
|
|
125
164
|
"question_count": 5,
|
|
126
|
-
"answering_guidelines": "## Sub-Agent Answering Guidelines\n..."
|
|
165
|
+
"answering_guidelines": "## Sub-Agent Answering Guidelines\n...",
|
|
166
|
+
"answer_schemas": {"text": {"type":"text","content":"..."}, "tool_call": {"type":"tool_call","tool_calls":[...]}, "refusal": {"type":"refusal","content":"..."}},
|
|
167
|
+
"dimension_format_map": {"reasoning": "text", "tool_execution": "tool_call", "safety": "refusal"},
|
|
168
|
+
"dimension_min_lengths": {"reasoning": 50, "eq": 100, "mbti": 50}
|
|
127
169
|
}
|
|
128
170
|
```
|
|
129
|
-
🚨🚨🚨 **You MUST read `answering_guidelines` first!**
|
|
171
|
+
🚨🚨🚨 **You MUST read `answering_guidelines` first!** Use `dimension_format_map` to pick the format, `answer_schemas` to build JSON. Skipping it = massive score loss.
|
|
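As a rough illustration of how a sub-agent might combine the three new fields, the sketch below picks the answer type from `dimension_format_map` and checks `dimension_min_lengths` before building an object shaped like the entries in `answer_schemas`. Several things here are assumptions rather than documented behaviour: the fallback to `text` for unmapped dimensions, the reading of minimum length as a character count, and the helper name `build_answer`. The layout of the final answers file is not shown in this diff, so the sketch stops at a single answer object.

```python
from typing import Any, Dict, List, Optional


def build_answer(
    dimension: str,
    block: Dict[str, Any],
    content: str = "",
    tool_calls: Optional[List[dict]] = None,
) -> Dict[str, Any]:
    """Build one answer object for `dimension` from a --get-block response."""
    # Assumption: dimensions missing from the map are answered as plain text.
    answer_type = block.get("dimension_format_map", {}).get(dimension, "text")

    if answer_type == "tool_call":
        return {"type": "tool_call", "tool_calls": list(tool_calls or [])}

    # Assumption: minimum lengths are counted in characters of `content`.
    min_length = block.get("dimension_min_lengths", {}).get(dimension, 0)
    if len(content) < min_length:
        raise ValueError(
            f"{dimension} answer has {len(content)} chars, needs at least {min_length}"
        )
    return {"type": answer_type, "content": content}
```

Failing early on a too-short answer lets the sub-agent revise it before calling `--answer-block`, instead of discovering the problem through `qa_warnings` afterwards.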
130
172
|
|
|
131
173
|
**Step 2 — Think through each question, write answers to a JSON file** (e.g. `answers_N.json`):
|
|
132
174
|
|
|
@@ -195,19 +237,30 @@ Send owner_messages.result_message in full + your feedback summary.
|
|
|
195
237
|
- Basic tier → casually mention upgrade options (standard/professional)
|
|
196
238
|
- Sequential mode → briefly mention multi-agent mode for faster future evaluations
|
|
197
239
|
|
|
240
|
+
### API Key Bootstrap (triggered on first use)
|
|
241
|
+
|
|
242
|
+
Before every evaluation, obtain the API Key in this priority order:
|
|
243
|
+
|
|
244
|
+
1. **Environment variable** — check if `$BOTMARK_API_KEY` is set (non-empty)
|
|
245
|
+
2. **Persisted config file** — if env var is empty, read `skills/botmark-skill/.botmark_env` (source it if it exists)
|
|
246
|
+
3. **Interactive setup** — if neither is available, ask the owner:
|
|
247
|
+
- Tell owner: "BotMark requires an API Key for first-time setup. Please register at https://botmark.cc and send me the key."
|
|
248
|
+
- Once received (format: starts with `bm_live_` or `bm_test_`), save to `skills/botmark-skill/.botmark_env` (`chmod 600`)
|
|
249
|
+
- Confirm: "API Key saved. You won't need to enter it again."
|
|
250
|
+
|
|
251
|
+
🚨 **Security**: Never echo the full Key in conversation — show only the first 8 chars + `***`. If owner says "change key" or "reconfigure", delete the old file and re-run the setup.
|
|
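The lookup order and the masking rule above can be sketched as follows. This is a minimal sketch under stated assumptions: the prompt says the file is meant to be sourced, so a shell-style `BOTMARK_API_KEY=...` line (optionally prefixed with `export`) is assumed, and the helper names are hypothetical. A `None` result from `resolve_api_key` means the agent should fall through to the interactive step.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

ENV_FILE = Path("skills/botmark-skill/.botmark_env")
KEY_PREFIXES = ("bm_live_", "bm_test_")


def read_key_from_file(path: Path) -> Optional[str]:
    """Parse BOTMARK_API_KEY from a shell-style env file (assumed format)."""
    if not path.is_file():
        return None
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line.startswith("export "):
            line = line[len("export "):].lstrip()
        if line.startswith("BOTMARK_API_KEY="):
            value = line.split("=", 1)[1].strip().strip("'\"")
            return value or None
    return None


def resolve_api_key(env: Mapping[str, str], path: Path = ENV_FILE) -> Optional[str]:
    """Environment variable first, then the persisted file; None means ask the owner."""
    return env.get("BOTMARK_API_KEY") or read_key_from_file(path)


def save_api_key(key: str, path: Path = ENV_FILE) -> None:
    """Persist the key with owner-only permissions (the chmod 600 rule)."""
    if not key.startswith(KEY_PREFIXES):
        raise ValueError("key must start with bm_live_ or bm_test_")
    path.parent.mkdir(parents=True, exist_ok=True)
    # Create the file with mode 0o600 so it is never briefly world-readable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(f"BOTMARK_API_KEY={key}\n")
    path.chmod(0o600)  # also tightens a file that already existed


def mask_key(key: str) -> str:
    """Show only the first 8 characters, as the security rule requires."""
    return key[:8] + "***"
```

In normal use the first argument would be `os.environ`; taking a mapping instead makes the priority order easy to exercise in isolation.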
252
|
+
|
|
198
253
|
### Disconnection Recovery
|
|
199
254
|
Call botmark_check_status: running→continue; completed→get results; expired→restart
|
|
200
255
|
|
|
201
256
|
### Skill Refresh & Version Updates
|
|
257
|
+
Version management in 3 steps:
|
|
258
|
+
1. **Pass version**: Include `skill_version` when calling botmark_start_evaluation (first install: from /skill endpoint or engine_meta.json)
|
|
259
|
+
2. **Check response**: If response contains `skill_update`, replace local tools with `inline_upgrade.tools`, save new `skill_version`
|
|
260
|
+
3. **Persist**: Save skill_version to durable storage so it survives across sessions
|
|
202
261
|
- skill_refresh.system_prompt in response takes precedence over cached instructions
|
|
203
|
-
-
|
|
204
|
-
-
|
|
205
|
-
1. Read skill_update.inline_upgrade (contains latest tools definitions + tool_endpoint_map + engine_version)
|
|
206
|
-
2. Immediately replace your local tool definitions with inline_upgrade.tools (note new params like cached_engine_version)
|
|
207
|
-
3. Remember inline_upgrade.skill_version as your new version number
|
|
208
|
-
4. If inline_upgrade.engine_version differs from your cached version, pass the new cached_engine_version or omit it to re-download the engine
|
|
209
|
-
5. severity=critical → forward owner_notify_message to owner; severity=info → silently complete, mention auto-upgrade in report
|
|
210
|
-
6. Continue this evaluation normally (skill_refresh already ensures latest workflow instructions)
|
|
262
|
+
- HTTP 426 = major version outdated: read inline_upgrade → replace tools → save version → tell owner → retry
|
|
263
|
+
- 200 + skill_update = minor update: silently upgrade; only notify owner if severity=critical
|
|
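The version-management steps can be sketched as a single function over the parsed response. Treat this as an illustration rather than the skill's implementation: the nesting of `severity` and `owner_notify_message` directly under `skill_update` is an assumption (the prompt names the fields but not their exact position), and `state` is a hypothetical dict that the caller is expected to write to durable storage afterwards.

```python
from typing import Any, Dict, Optional


def apply_skill_update(response: Dict[str, Any], state: Dict[str, Any]) -> Optional[str]:
    """Apply an inline upgrade to `state`; return a message for the owner, or None."""
    update = response.get("skill_update")
    if not update or update.get("action") not in ("should_update", "must_update"):
        return None

    upgrade = update.get("inline_upgrade") or {}
    if "tools" in upgrade:
        state["tools"] = upgrade["tools"]  # replace local tool definitions
    if "skill_version" in upgrade:
        state["skill_version"] = upgrade["skill_version"]

    new_engine = upgrade.get("engine_version")
    if new_engine and new_engine != state.get("engine_version"):
        state["engine_version"] = new_engine

    # Only critical updates are surfaced; info-level updates stay silent.
    if update.get("severity") == "critical":
        return update.get("owner_notify_message")
    return None
```

The function deliberately does not start a new evaluation: the current session continues, and the updated `state` only affects later calls.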
211
264
|
|
|
212
265
|
### 🚨 Failure Fallback (never freeze silently — notify owner within 30s)
|
|
213
266
|
| Scenario | Action |
|