braintrust-lite 0.1.7 → 0.1.8
- package/README.md +73 -102
- package/bin/braintrust +12 -0
- package/package.json +20 -20
- package/skills/consult/SKILL.md +2 -2
- package/src/config.js +60 -0
- package/src/doctor.js +120 -0
- package/src/format.js +26 -49
- package/src/judge.js +87 -0
- package/src/main.js +332 -0
- package/src/memory/db.js +183 -0
- package/src/memory/index.js +31 -0
- package/src/normalize.js +172 -0
- package/src/normalize.test.js +125 -0
- package/src/prompts/architecture.md +21 -0
- package/src/prompts/code.md +21 -0
- package/src/prompts/general.md +22 -0
- package/src/prompts/index.js +49 -0
- package/src/prompts/writing.md +21 -0
- package/src/providers/claude.js +45 -0
- package/src/providers/codex.js +69 -0
- package/src/providers/gemini.js +81 -0
- package/src/providers/index.js +22 -0
- package/src/reflector.js +244 -0
- package/src/save.js +93 -0
- package/src/server.js +245 -38
- package/LICENSE +0 -21
- package/bin/consult +0 -79
- package/scripts/setup.js +0 -66
- package/src/consult.js +0 -81
- package/src/providers.js +0 -91
package/README.md
CHANGED
@@ -1,149 +1,120 @@
-#
+# brantrust
 
-
+Same-question multi-model fusion: send one question to Claude, Codex, and Gemini at the same time, then have a Judge synthesize a "best-of-all" answer.
 
 ```
-
-├─ Task(subagent_type=Plan, prompt=X) ← regular sub-agent
-└─ mcp__braintrust_lite__consult(prompt=X) ← Codex + Gemini side-channel consultation
-→ main Claude fuses the three perspectives → final plan
+Input → concurrent generation (3) → cleanup and normalization → Judge fusion (1) → output + save to disk
 ```
 
-
+4 API calls per run: low cost, suitable for everyday use.
 
 ---
 
 ## Installation
 
-**Prerequisites**: the `codex` and `gemini` CLIs are both logged in.
-
 ```bash
 # Clone
-git clone https://github.com/HongjieRen/
-cd
-
-# Install dependencies
-npm install
-
-# Optional: symlink the CLI into PATH
-ln -sf "$(pwd)/bin/consult" ~/.local/bin/consult
-chmod +x bin/consult
-```
+git clone https://github.com/HongjieRen/brantrust.git
+cd brantrust
 
-
-
-
-
-```bash
-claude mcp add braintrust-lite node "$(pwd)/src/server.js"
+# Symlink into PATH
+ln -sf "$(pwd)/brantrust" ~/.local/bin/brantrust
+chmod +x brantrust
 ```
 
-
-
-Takes effect after restarting Claude Code.
-
----
-
-## Installing the Skill guide
-
-Symlink the skill into Claude Code's global skill directory so the main Claude knows when to use consult proactively:
-
-```bash
-ln -sf "$(pwd)/skills/consult" ~/.claude/skills/consult
-```
+**Prerequisites** (all three CLIs must be logged in):
 
-
+| Provider | CLI | Verification command |
+|----------|-----|---------|
+| Claude | `claude` | `claude -p "hi" --output-format json` |
+| OpenAI Codex | `codex` | `codex exec "hi" --json --skip-git-repo-check --ephemeral` |
+| Google Gemini | `gemini` | `gemini -p "hi" -o json` |
 
 ---
 
-##
-
-### Inside Claude Code (recommended)
-
-When handling planning or design tasks, Claude makes these calls concurrently, either automatically or when guided by `/consult`:
-
-```
-When you work on an architecture-selection task, Claude will simultaneously:
-1. Launch the Plan sub-agent for in-depth analysis
-2. Call mcp__braintrust_lite__consult to get independent perspectives from Codex + Gemini
-3. Fuse the three outputs into a final plan for you
-```
-
-### Terminal CLI (fallback / debugging)
+## Usage
 
 ```bash
-
-
-
-
-
-
-
+brantrust "Explain the CAP theorem" # Default: 3 generators + 1 judge
+brantrust --no-judge "React vs Vue" # Collect answers concurrently only, no judge
+brantrust --judge-model gemini "Database selection" # Switch the Judge model
+brantrust --skip codex "Quantum computing" # Skip a model (repeatable)
+cat app.ts | brantrust "review this code" # stdin pipe
+brantrust --dir ~/project "Project analysis" # Set the working directory
+brantrust --context-file design.md "Implementation plan" # Attach a context file
+brantrust --timeout 60 "Quick question" # Timeout in seconds
+brantrust --no-save "One-off question" # Do not save to disk
+brantrust --json "Question" # Output the full JSON
+brantrust --list # View past runs
+brantrust --strict "Key decision" # [v2] Full Judge pipeline
 ```
 
-
-
-## Parameters
+### Parameter overview
 
 | Parameter | Default | Description |
-
-| `prompt` | Required |
-|
-| `
-| `
-|
-| `--
+|------|------|------|
+| `"prompt"` | Required | Question text |
+| `--skip <model>` | — | Skip a model: claude / codex / gemini; repeatable |
+| `--judge-model <model>` | `claude` | Model used by the Judge |
+| `--no-judge` | false | Disable the Judge; show only each model's raw answer |
+| `--timeout <sec>` | `120` | Timeout in seconds for each model |
+| `--dir <path>` | cwd | Working directory for the CLI tools |
+| `--context-file <file>` | — | Attach file contents as context (up to 8000 characters) |
+| `--no-save` | false | Do not save results to disk |
+| `--json` | false | Write the full result to stdout as JSON |
+| `--list` | — | List the 20 most recent runs |
+| `--strict` | — | [v2 placeholder] Two-stage Judge + swap-compare |
 
 ---
 
-##
+## Output
 
-
-## CODEX (8.2s)
-
-<full codex answer>
-
----
+**Terminal**: each model's answer + the Judge's fused report (Markdown format)
 
-
+**Saved to disk** (`~/ai-outputs/<timestamp>/`):
 
-<full gemini answer>
 ```
-
-
+~/ai-outputs/2026-04-09T11-23-45-678/
+├── raw/
+│ ├── claude.txt
+│ ├── codex.txt
+│ └── gemini.txt
+├── normalized.json # Structured summaries of the three models
+└── report.md # Final fused report
+```
 
 ---
 
 ## Architecture
 
 ```
-
-
-
-
-
-│ └── format.js Markdown / JSON rendering
-├── bin/
-│ └── consult CLI entry point
-├── skills/
-│ └── consult/
-│ └── SKILL.md Claude Code skill guide
-└── docs/
-└── spec.md design document
+runGenerators() # Call the three CLIs concurrently; AbortController timeouts; Promise.allSettled fault tolerance
+normalizeResults() # Each adapter extracts content / key_claims / assumptions / risks
+runSimpleJudge() # Single Judge call; passes only the normalized summaries (not full text) to limit tokens
+writeRunArtifacts() # Save raw/ + normalized.json + report.md to disk
+runFullJudgePipeline() # [v2 placeholder] Two-stage Judge + swap-compare + bias mitigation
 ```
 
+**Judge prompt anonymization**: candidates are labeled only A / B / C, without exposing provider names, to avoid model bias.
+
 ---
 
-##
+## Cost estimate
+
+Each run = 4 API calls (3 generators + 1 judge):
 
-|
-
-|
-|
-|
+| Question complexity | Estimated cost |
+|-----------|---------|
+| Simple | $0.20 – 0.50 |
+| Medium | $0.50 – 1.00 |
+| Complex | $1.00 – 2.00 |
 
 ---
 
-##
+## V2 roadmap
 
-
+1. `--strict`: two-stage Judge (A+B) + swap-compare + bias mitigation
+2. `--continue`: continue a conversation thread
+3. `--context-file` smart truncation + git diff injection
+4. Cost / token budget controller
+5. More providers (Goose, local models, etc.)
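The architecture notes in the new README describe `runGenerators()` as calling the three CLIs concurrently with `AbortController` timeouts and `Promise.allSettled` fault tolerance. The following is a minimal sketch of that pattern under stated assumptions, not the package's actual code: `fakeProvider` and `runGeneratorsSketch` are hypothetical names, and the providers are simulated with timers instead of spawning `claude`, `codex`, and `gemini`.

```javascript
'use strict';

// Hypothetical stand-in for a provider CLI: resolves or rejects after delayMs.
// The real tool would spawn a child process here instead.
function fakeProvider(name, delayMs, fail = false) {
  return (prompt, signal) =>
    new Promise((resolve, reject) => {
      const timer = setTimeout(() => {
        if (fail) reject(new Error(`${name} crashed`));
        else resolve(`${name} answer to: ${prompt}`);
      }, delayMs);
      // Reject as soon as the caller aborts (used for the timeout below).
      signal.addEventListener('abort', () => {
        clearTimeout(timer);
        reject(new Error('timeout'));
      }, { once: true });
    });
}

// Run every provider concurrently; each gets its own AbortController timeout.
// Promise.allSettled keeps one failure from discarding the other answers.
async function runGeneratorsSketch(prompt, providers, timeoutMs) {
  const names = Object.keys(providers);
  const tasks = names.map(async (name) => {
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), timeoutMs);
    const start = Date.now();
    try {
      const raw = await providers[name](prompt, controller.signal);
      return { provider: name, raw, duration_ms: Date.now() - start };
    } finally {
      clearTimeout(timer);
    }
  });
  const settled = await Promise.allSettled(tasks);
  return settled.map((s, i) =>
    s.status === 'fulfilled'
      ? s.value
      : { provider: names[i], error: s.reason.message });
}
```

In this sketch a provider that exceeds the timeout comes back as `{ provider, error: 'timeout' }` while the others still return their answers, so one slow or failing model does not sink the whole run.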
package/bin/braintrust
ADDED
@@ -0,0 +1,12 @@
+#!/usr/bin/env node
+'use strict';
+
+// Shim: delegates to src/main.js
+// The symlink ~/.local/bin/braintrust → this file remains unchanged.
+
+const { main } = require('../src/main.js');
+
+main(process.argv.slice(2)).catch(e => {
+  process.stderr.write(`[braintrust error] ${e.message}\n`);
+  process.exit(1);
+});
package/package.json
CHANGED
@@ -1,40 +1,40 @@
 {
   "name": "braintrust-lite",
-  "version": "0.1.
-  "description": "
-  "type": "module",
+  "version": "0.1.8",
+  "description": "Multi-model AI consultation MCP for Claude Code — runs Claude, Codex, and Gemini in parallel for Judge-style synthesis",
   "bin": {
-    "
-    "braintrust
-    "braintrust-
+    "braintrust-lite": "./src/server.js",
+    "braintrust": "./bin/braintrust",
+    "braintrust-doctor": "./src/doctor.js"
   },
+  "main": "src/main.js",
+  "files": [
+    "src/",
+    "skills/",
+    "bin/",
+    "README.md"
+  ],
   "scripts": {
-    "
-    "setup": "node scripts/setup.js"
+    "test": "node --test src/normalize.test.js"
   },
   "dependencies": {
-    "
+    "better-sqlite3": "^11.0.0"
   },
   "engines": {
-    "node": ">=18"
+    "node": ">=18.0.0"
   },
   "keywords": [
     "mcp",
     "claude-code",
+    "multi-model",
+    "ai",
     "codex",
     "gemini",
-    "
-    "ai"
+    "braintrust"
   ],
-  "license": "MIT",
   "repository": {
     "type": "git",
-    "url": "
+    "url": "https://github.com/HongjieRen/braintrust.git"
   },
-  "
-    "bin/",
-    "src/",
-    "scripts/",
-    "skills/"
-  ]
+  "license": "MIT"
 }
package/skills/consult/SKILL.md
CHANGED
@@ -9,7 +9,7 @@ description: When handling planning/design/architecture/research tasks, concurrently invoke cod
 ## Preamble (run before every invocation)
 
 ```bash
-_REMOTE_VER=$(curl -sf --max-time 3 "https://raw.githubusercontent.com/HongjieRen/braintrust
+_REMOTE_VER=$(curl -sf --max-time 3 "https://raw.githubusercontent.com/HongjieRen/braintrust/main/skills/consult/SKILL.md" | awk '/^version:/ {print $2; exit}')
 _LOCAL_VER=$(awk '/^version:/ {print $2; exit}' ~/.claude/skills/consult/SKILL.md 2>/dev/null)
 [ -n "$_REMOTE_VER" ] && [ "$_REMOTE_VER" != "$_LOCAL_VER" ] && echo "UPGRADE_AVAILABLE $_LOCAL_VER $_REMOTE_VER" || echo "UP_TO_DATE $_LOCAL_VER"
 ```
@@ -17,7 +17,7 @@ _LOCAL_VER=$(awk '/^version:/ {print $2; exit}' ~/.claude/skills/consult/SKILL.m
 **If the output is `UPGRADE_AVAILABLE <old> <new>`**: update silently and automatically, without interrupting the user:
 ```bash
 cp ~/.claude/skills/consult/SKILL.md ~/.claude/skills/consult/SKILL.md.bak 2>/dev/null || true
-curl -fsSL "https://raw.githubusercontent.com/HongjieRen/braintrust
+curl -fsSL "https://raw.githubusercontent.com/HongjieRen/braintrust/main/skills/consult/SKILL.md" \
 -o ~/.claude/skills/consult/SKILL.md && echo "Updated consult skill $_old → $_new"
 ```
 After the update completes, continue with the current task and append one line at the end of the final reply: `*(consult skill auto-updated v{old} → v{new})*`
package/src/config.js
ADDED
@@ -0,0 +1,60 @@
+'use strict';
+
+const { join } = require('path');
+
+const PROJECT_ROOT = join(__dirname, '..');
+const OUTPUT_DIR = join(PROJECT_ROOT, 'ai-outputs');
+const STATE_DIR = join(OUTPUT_DIR, '.state');
+const DB_PATH = join(STATE_DIR, 'braintrust.sqlite');
+const POLICY_PATH = join(STATE_DIR, 'policy.json');
+const REFLECTOR_LOG = join(STATE_DIR, 'reflector.log');
+
+const DEFAULT_TIMEOUT_S = 120;
+const DEFAULT_JUDGE_MODEL = 'claude';
+const DEFAULT_MEMORY_K = 3;
+const MAX_CONTEXT_CHARS = 30000;
+const CONTEXT_FILE_MAX = 8000;
+
+// Memory injection hard limits (chars)
+const MEMORY_INJECT_LIMIT = 1500;
+const LESSONS_INJECT_LIMIT = 600;
+const SKILLS_INJECT_LIMIT = 800;
+
+// Novelty check threshold: cosine similarity above this → prompt reuse
+const NOVELTY_THRESHOLD = 0.9;
+
+// Critique-revise disagreement threshold
+const DISAGREE_THRESHOLD = 0.5;
+
+// Economy mode: disable all extra LLM calls
+const ECONOMY = process.env.BRAINTRUST_ECONOMY === '1';
+
+// Reflector model: codex with gpt-5.4-mini.
+// Chosen over haiku/flash for better Chinese text quality.
+// Must differ from the default judge model (claude) to avoid self-evaluation bias.
+const REFLECTOR_MODEL = 'gpt-5.4-mini';
+const REFLECTOR_CMD = 'codex';
+const REFLECTOR_ARGS_PREFIX = ['exec', '--json', '--skip-git-repo-check', '--ephemeral', '-m', REFLECTOR_MODEL];
+
+module.exports = {
+  PROJECT_ROOT,
+  OUTPUT_DIR,
+  STATE_DIR,
+  DB_PATH,
+  POLICY_PATH,
+  REFLECTOR_LOG,
+  DEFAULT_TIMEOUT_S,
+  DEFAULT_JUDGE_MODEL,
+  DEFAULT_MEMORY_K,
+  MAX_CONTEXT_CHARS,
+  CONTEXT_FILE_MAX,
+  MEMORY_INJECT_LIMIT,
+  LESSONS_INJECT_LIMIT,
+  SKILLS_INJECT_LIMIT,
+  NOVELTY_THRESHOLD,
+  DISAGREE_THRESHOLD,
+  ECONOMY,
+  REFLECTOR_MODEL,
+  REFLECTOR_CMD,
+  REFLECTOR_ARGS_PREFIX,
+};
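The `NOVELTY_THRESHOLD` comment in `src/config.js` says that a cosine similarity above 0.9 counts as prompt reuse, but the code that computes the similarity is not part of the files shown here. The sketch below illustrates how such a check could work, assuming a simple word-count vector; `termCounts`, `cosineSimilarity`, and `isReusablePrompt` are hypothetical names, and the package may well use a different text representation.

```javascript
'use strict';

const NOVELTY_THRESHOLD = 0.9; // value taken from src/config.js

// Hypothetical vectorizer: lowercase word counts.
// Note: \W+ only keeps ASCII word characters, so this toy version
// ignores Chinese text entirely.
function termCounts(text) {
  const counts = new Map();
  for (const word of text.toLowerCase().split(/\W+/).filter(Boolean)) {
    counts.set(word, (counts.get(word) || 0) + 1);
  }
  return counts;
}

// Cosine similarity between two sparse count vectors.
function cosineSimilarity(a, b) {
  let dot = 0;
  for (const [word, n] of a) dot += n * (b.get(word) || 0);
  const norm = (m) => Math.sqrt([...m.values()].reduce((s, n) => s + n * n, 0));
  const denom = norm(a) * norm(b);
  return denom === 0 ? 0 : dot / denom;
}

// A prompt counts as a reuse candidate when it is nearly identical
// to a previously seen prompt.
function isReusablePrompt(prompt, previous) {
  return cosineSimilarity(termCounts(prompt), termCounts(previous)) > NOVELTY_THRESHOLD;
}
```

Under this assumption, two prompts that differ only in letter case are flagged as reuse, while prompts with no words in common score 0 and are treated as novel.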
package/src/doctor.js
ADDED
@@ -0,0 +1,120 @@
+#!/usr/bin/env node
+'use strict';
+
+const { execFileSync, spawnSync } = require('child_process');
+const { existsSync, readFileSync } = require('fs');
+const { join } = require('path');
+const { version: PKG_VERSION } = require('../package.json');
+
+const GREEN = '\x1b[32m✓\x1b[0m';
+const RED = '\x1b[31m✗\x1b[0m';
+const WARN = '\x1b[33m!\x1b[0m';
+
+function check(label, ok, detail) {
+  const icon = ok === true ? GREEN : ok === 'warn' ? WARN : RED;
+  const line = ` ${icon} ${label.padEnd(28)} ${detail || ''}`;
+  console.log(line);
+  return ok === true;
+}
+
+function getVersion(cmd, args) {
+  try {
+    const result = spawnSync(cmd, args, { timeout: 5000, encoding: 'utf8' });
+    if (result.status === 0) {
+      return (result.stdout || result.stderr || '').split('\n')[0].trim().slice(0, 40);
+    }
+    return null;
+  } catch {
+    return null;
+  }
+}
+
+function getSkillVersion(skillPath) {
+  try {
+    const content = readFileSync(skillPath, 'utf8');
+    const m = content.match(/^version:\s*(.+)$/m);
+    return m ? m[1].trim() : 'unknown';
+  } catch {
+    return null;
+  }
+}
+
+function checkMcpServer() {
+  // Probe MCP server: send initialize, expect a valid JSON-RPC response
+  const serverPath = join(__dirname, 'server.js');
+  if (!existsSync(serverPath)) return { ok: false, detail: 'src/server.js not found' };
+
+  try {
+    const msg = JSON.stringify({
+      jsonrpc: '2.0', id: 1, method: 'initialize',
+      params: { protocolVersion: '2024-11-05', capabilities: {}, clientInfo: { name: 'doctor', version: '0' } },
+    });
+    const result = spawnSync(process.execPath, [serverPath], {
+      input: msg + '\n',
+      timeout: 5000,
+      encoding: 'utf8',
+    });
+    const line = (result.stdout || '').split('\n').find(l => l.trim().startsWith('{'));
+    if (!line) return { ok: false, detail: 'no JSON response from server' };
+    const resp = JSON.parse(line);
+    if (resp.result && resp.result.serverInfo) {
+      return { ok: true, detail: `v${resp.result.serverInfo.version}` };
+    }
+    return { ok: false, detail: 'unexpected response shape' };
+  } catch (err) {
+    return { ok: false, detail: err.message.slice(0, 60) };
+  }
+}
+
+function main() {
+  console.log(`\nbraintrust doctor (package v${PKG_VERSION})\n`);
+
+  let allOk = true;
+
+  // ── CLI tools ──────────────────────────────────────────────────────────────
+  console.log('CLI tools:');
+  for (const [cmd, vArgs, installHint] of [
+    ['claude', ['--version'], 'https://claude.ai/download'],
+    ['codex', ['--version'], 'npm i -g @openai/codex'],
+    ['gemini', ['--version'], 'npm i -g @google/gemini-cli'],
+  ]) {
+    const ver = getVersion(cmd, vArgs);
+    if (ver) {
+      check(cmd, true, ver);
+    } else {
+      check(cmd, false, `not found — ${installHint}`);
+      allOk = false;
+    }
+  }
+
+  // ── MCP server ─────────────────────────────────────────────────────────────
+  console.log('\nMCP server:');
+  const mcp = checkMcpServer();
+  if (!check('braintrust-lite server', mcp.ok, mcp.detail)) allOk = false;
+
+  // ── Skill ──────────────────────────────────────────────────────────────────
+  console.log('\nConsult skill:');
+  const skillPath = join(process.env.HOME || '~', '.claude', 'skills', 'consult', 'SKILL.md');
+  const skillVer = getSkillVersion(skillPath);
+  if (skillVer) {
+    check('SKILL.md installed', true, `v${skillVer} at ${skillPath}`);
+  } else {
+    check('SKILL.md installed', false, `not found at ${skillPath}`);
+    allOk = false;
+  }
+
+  const bakPath = skillPath + '.bak';
+  check('SKILL.md.bak exists', existsSync(bakPath) ? 'warn' : 'warn',
+    existsSync(bakPath) ? 'backup present' : 'no backup yet (created on first auto-update)');
+
+  // ── Summary ────────────────────────────────────────────────────────────────
+  console.log();
+  if (allOk) {
+    console.log(' \x1b[32mAll checks passed — braintrust is ready.\x1b[0m\n');
+  } else {
+    console.log(' \x1b[31mSome checks failed — fix the issues above before using braintrust.\x1b[0m\n');
+    process.exit(1);
+  }
+}
+
+main();
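`checkMcpServer()` in `src/doctor.js` spawns the server, sends one JSON-RPC `initialize` request, and then inspects the first stdout line that looks like JSON. The response handling can be read in isolation as a pure function. The sketch below restates that logic under a hypothetical name, `parseInitializeProbe`, so it can be exercised without spawning a process; it is an illustration of the parsing step, not code from the package.

```javascript
'use strict';

// Hypothetical helper mirroring the response handling in checkMcpServer():
// find the first JSON-looking stdout line and look for result.serverInfo.
function parseInitializeProbe(stdout) {
  const line = (stdout || '')
    .split('\n')
    .find((l) => l.trim().startsWith('{'));
  if (!line) return { ok: false, detail: 'no JSON response from server' };

  let resp;
  try {
    resp = JSON.parse(line);
  } catch (err) {
    // Same truncation as doctor.js uses for error details.
    return { ok: false, detail: err.message.slice(0, 60) };
  }

  if (resp.result && resp.result.serverInfo) {
    return { ok: true, detail: `v${resp.result.serverInfo.version}` };
  }
  return { ok: false, detail: 'unexpected response shape' };
}

// Example: a log line followed by a well-formed initialize response.
const sample = [
  'starting server...',
  JSON.stringify({
    jsonrpc: '2.0',
    id: 1,
    result: { serverInfo: { name: 'braintrust-lite', version: '0.1.8' } },
  }),
].join('\n');
console.log(parseInitializeProbe(sample));
```

Non-JSON log lines ahead of the response are skipped, which is why the probe tolerates a server that prints startup noise to stdout.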
package/src/format.js
CHANGED
@@ -1,53 +1,30 @@
+'use strict';
+
 /**
- * Format
+ * Format a CLI run manifest summary for terminal output.
+ *
+ * @param {{ results: Array, ts: string, judgeModel: string|null, runDir: string }} opts
+ * @returns {string}
  */
-
-const
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-  return `${statusLine}\n${body}${revealSection}\n\n---\n\n${manifest}`;
-}
-
-function buildReveal(mapping) {
-  const rows = Object.entries(mapping)
-    .map(([label, provider]) => `| ${label} | **${provider}** |`)
-    .join('\n');
-  return `\n\n---\n\n## 🔒 REVEAL — 仅在完成评估后阅读
-
-> **Judge 指令**:请先完成你的完整评估和综合输出,再阅读以下映射表,并在回复末尾告知用户每个模型对应的真实身份。
-
-| 匿名标签 | 真实模型 |
-|---------|---------|
-${rows}`;
+function formatManifest({ results, ts, judgeModel, runDir }) {
+  const lines = [
+    '## Run Manifest',
+    '',
+    `Timestamp : ${ts}`,
+    `Judge : ${judgeModel || 'none (--no-judge)'}`,
+    `Saved to : ${runDir}`,
+    '',
+    'Providers:',
+  ];
+
+  for (const r of results) {
+    const status = r.error
+      ? `✗ ${(r.error_type || r.error).padEnd(12)}`
+      : `✓ ${(r.duration_ms / 1000).toFixed(1)}s parse_score=${r.parse_score.toFixed(2)}`;
+    lines.push(` ${r.provider.padEnd(10)} ${status}`);
+  }
+
+  return lines.join('\n');
 }
 
-
-  const ts = new Date().toISOString().slice(0, 19) + 'Z';
-  const degraded = successCount < totalCount;
-  const lines = results.map(r =>
-    r.error
-      ? ` - ${r.provider}: ${r.error_type || r.error}`
-      : ` - ${r.provider}: ${(r.duration_ms / 1000).toFixed(1)}s`
-  ).join('\n');
-  return `**Run manifest** · \`${ts}\` · ${successCount}/${totalCount} models${degraded ? ' ⚠ degraded' : ''}\n${lines}`;
-}
-
-export function formatAsJson(prompt, results, mapping = null) {
-  return JSON.stringify({ prompt, results, mapping }, null, 2);
-}
+module.exports = { formatManifest };
package/src/judge.js
ADDED
@@ -0,0 +1,87 @@
+'use strict';
+
+const { PROVIDERS } = require('./providers/index.js');
+const { summarize } = require('./normalize.js');
+const { LESSONS_INJECT_LIMIT } = require('./config.js');
+
+/**
+ * Build the judge prompt, optionally injecting lessons from memory.
+ * @param {string} question
+ * @param {Array} results - Normalized provider results
+ * @param {{ lessons?: string[], skills?: string[] }} opts
+ * @returns {string}
+ */
+function buildJudgePrompt(question, results, opts = {}) {
+  const valid = results.filter(r => !r.error);
+  const summaries = valid
+    .map((r, i) => `--- 候选 ${String.fromCharCode(65 + i)} (${r.provider}) ---\n${summarize(r)}`)
+    .join('\n\n');
+
+  const lessonsBlock = buildLessonsBlock(opts.lessons || []);
+
+  return `你是一个高级技术评审。${valid.length} 个 AI 模型对同一问题给出了各自的回答。
+${lessonsBlock}
+问题:${question}
+
+${summaries}
+
+请按以下结构输出你的评审(用中文标签分隔):
+
+## 核心共识
+(各模型都认同的关键结论)
+
+## 独特洞见
+(某个模型独有但有价值的见解,注明来自哪个候选)
+
+## 分歧裁决
+(如果存在矛盾,给出你的判断和理由;如无分歧则写"无明显分歧")
+
+## 集大成方案
+(综合各方的最优可执行方案)
+
+## 风险提示
+(需要注意的假设、风险或待验证项)`;
+}
+
+/**
+ * Build a lessons injection block, respecting the hard char limit.
+ * @param {string[]} lessons
+ * @returns {string}
+ */
+function buildLessonsBlock(lessons) {
+  if (!lessons.length) return '';
+  const joined = lessons.slice(0, 5).join('\n');
+  const trimmed = joined.slice(0, LESSONS_INJECT_LIMIT);
+  return `\n<past-lessons>\n${trimmed}\n</past-lessons>\n`;
+}
+
+/**
+ * Run the judge model and return the report text.
+ * @param {string} question
+ * @param {Array} results - Normalized provider results
+ * @param {object} opts
+ * @param {string} [opts.judgeModel='claude'] - Which model to use as judge
+ * @param {Function} opts.runProcess - The process runner function
+ * @param {string[]} [opts.lessons] - Lessons to inject
+ * @returns {Promise<string>}
+ */
+async function runJudge(question, results, opts = {}) {
+  const { judgeModel = 'claude', runProcess, lessons = [] } = opts;
+  const judgePrompt = buildJudgePrompt(question, results, { lessons });
+
+  process.stderr.write(`\n[Judge (${judgeModel}): running...]\n`);
+  const start = Date.now();
+
+  const provider = PROVIDERS[judgeModel];
+  if (!provider) {
+    throw new Error(`Unknown judge model: ${judgeModel}. Use claude|codex|gemini.`);
+  }
+
+  const raw = await runProcess(provider.cmd, provider.getArgs(judgePrompt));
+  const ms = Date.now() - start;
+  process.stderr.write(`[Judge: done ${(ms / 1000).toFixed(1)}s]\n`);
+
+  return provider.extractJudgeText(raw);
+}
+
+module.exports = { buildJudgePrompt, runJudge, buildLessonsBlock };
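The judge prompt in `src/judge.js` is written in Chinese and asks for five sections: core consensus, unique insights, rulings on disagreements, a combined best plan, and risk notes. The README in this release states that candidates are labeled only A / B / C without provider names, whereas `buildJudgePrompt` as shown appends `(${r.provider})` to each candidate header. The sketch below shows one way headers could be built so that provider names stay out of the prompt and the label mapping is kept separately for reporting afterwards; `anonymousHeaders` is a hypothetical helper, not part of the package.

```javascript
'use strict';

// Hypothetical helper: build candidate headers that carry only a letter label,
// and return the label → provider mapping separately so it never enters the prompt.
function anonymousHeaders(results) {
  const valid = results.filter((r) => !r.error); // same filter as buildJudgePrompt
  const mapping = {};
  const headers = valid.map((r, i) => {
    const label = String.fromCharCode(65 + i); // 65 is 'A'
    mapping[label] = r.provider;
    return `--- Candidate ${label} ---`;
  });
  return { headers, mapping };
}

// Example: the failed codex result is dropped, so gemini becomes candidate B.
const { headers, mapping } = anonymousHeaders([
  { provider: 'claude' },
  { provider: 'codex', error: 'timeout' },
  { provider: 'gemini' },
]);
console.log(headers, mapping);
```

Because failed providers are filtered out before labeling, a label's letter depends on which providers succeeded in a given run, so the mapping has to be stored per run rather than assumed fixed.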