braintrust-lite 0.1.7 → 0.1.8

package/README.md CHANGED
@@ -1,149 +1,120 @@
1
- # braintrust-lite
1
+ # braintrust
2
2
 
3
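- A multi-model advisor native to Claude Code: it calls Codex + Gemini concurrently, and the main Claude acts as Judge to fuse the outputs.
3
+ A same-question multi-model fuser: it sends the same question to Claude, Codex, and Gemini at once, then uses a Judge to synthesize a "best-of-all" solution.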
4
4
 
5
5
  ```
6
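- Claude (parallel):
7
- ├─ Task(subagent_type=Plan, prompt=X) ← regular sub-agent
8
- └─ mcp__braintrust_lite__consult(prompt=X) ← Codex + Gemini side-channel consultation
9
- → main Claude fuses the three perspectives → final plan
6
+ Input → concurrent generation (3) → cleaning and normalization → Judge fusion (1) → output + save to disk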
10
7
  ```
11
8
 
12
- vs [`braintrust`](https://github.com/HongjieRen/braintrust): 2 API calls (50% fewer), no separate Judge, no saving to disk, native Claude Code integration.
9
+ 4 API calls per run, low cost, suitable for everyday use.
13
10
 
14
11
  ---
15
12
 
16
13
  ## Installation
17
14
 
18
- **Prerequisites**: both the `codex` and `gemini` CLIs are logged in.
19
-
20
15
  ```bash
21
16
  # Clone
22
- git clone https://github.com/HongjieRen/braintrust-lite.git
23
- cd braintrust-lite
24
-
25
- # Install dependencies
26
- npm install
27
-
28
- # Optional: symlink the CLI into PATH
29
- ln -sf "$(pwd)/bin/consult" ~/.local/bin/consult
30
- chmod +x bin/consult
31
- ```
17
+ git clone https://github.com/HongjieRen/braintrust.git
18
+ cd braintrust
32
19
 
33
- ---
34
-
35
- ## Register with Claude Code (MCP)
36
-
37
- ```bash
38
- claude mcp add braintrust-lite node "$(pwd)/src/server.js"
20
+ # Symlink into PATH
21
+ ln -sf "$(pwd)/braintrust" ~/.local/bin/braintrust
22
+ chmod +x braintrust
39
23
  ```
40
24
 
41
- Once registered, the `mcp__braintrust_lite__consult` tool appears in Claude Code sessions alongside `Read` / `Bash`.
42
-
43
- Takes effect after Claude Code is restarted.
44
-
45
- ---
46
-
47
- ## Install the Skill guide
48
-
49
- Symlink the skill into Claude Code's global skills directory so the main Claude knows when to use consult proactively:
50
-
51
- ```bash
52
- ln -sf "$(pwd)/skills/consult" ~/.claude/skills/consult
53
- ```
25
+ **Prerequisites** (all three CLIs must be logged in):
54
26
 
55
- After installation, the `/consult` slash command activates the "advisor mode" guide.
27
+ | Provider | CLI | Verification command |
28
+ |----------|-----|---------|
29
+ | Claude | `claude` | `claude -p "hi" --output-format json` |
30
+ | OpenAI Codex | `codex` | `codex exec "hi" --json --skip-git-repo-check --ephemeral` |
31
+ | Google Gemini | `gemini` | `gemini -p "hi" -o json` |
56
32
 
57
33
  ---
58
34
 
59
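- ## Usage
60
-
61
- ### Inside Claude Code (recommended)
62
-
63
- When handling planning/design tasks, Claude makes the concurrent calls automatically (or when guided by `/consult`):
64
-
65
- ```
66
- When you work on an architecture-selection task, Claude simultaneously:
67
- 1. Starts a Plan sub-agent for in-depth analysis
68
- 2. Calls mcp__braintrust_lite__consult to get independent perspectives from Codex + Gemini
69
- 3. Fuses the three outputs into a final plan for you
70
- ```
71
-
72
- ### Terminal CLI (fallback / debugging)
35
+ ## Usage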
73
36
 
74
37
  ```bash
75
- consult "Explain the CAP theorem" # both models concurrently, Markdown output
76
- consult --only codex "prompt" # run codex only
77
- consult --skip gemini "prompt" # skip gemini
78
- consult --timeout 60 "prompt" # timeout in seconds
79
- consult --dir ~/myproject "review" # working directory
80
- cat app.ts | consult "review this code" # prompt joined with stdin
81
- consult --json "prompt" # structured JSON output
38
+ braintrust "Explain the CAP theorem" # default: 3 generators + 1 judge
39
+ braintrust --no-judge "React vs Vue" # collect concurrently only, no judge
40
+ braintrust --judge-model gemini "Database selection" # switch the Judge model
41
+ braintrust --skip codex "Quantum computing" # skip a model (repeatable)
42
+ cat app.ts | braintrust "Review this code" # stdin pipe
43
+ braintrust --dir ~/project "Project analysis" # set the working directory
44
+ braintrust --context-file design.md "Implementation plan" # attach a context file
45
+ braintrust --timeout 60 "Quick question" # timeout in seconds
46
+ braintrust --no-save "One-off question" # do not save to disk
47
+ braintrust --json "Question" # print the full JSON
48
+ braintrust --list # list past runs
49
+ braintrust --strict "Key decision" # [v2] full Judge pipeline
82
50
  ```
83
51
 
84
- ---
85
-
86
- ## Parameters
52
+ ### Parameter overview
87
53
 
88
54
  | Parameter | Default | Description |
89
- |---|---|---|
90
- | `prompt` | required | Question text (MCP) / positional argument (CLI) |
91
- | `only` | — | Call only: `codex` \| `gemini` |
92
- | `skip` | | List of models to skip |
93
- | `timeout_sec` | `90` | Per-model timeout in seconds |
94
- | `cwd` | server cwd | Working directory for child processes |
95
- | `--json` | false | CLI only: JSON output |
55
+ |------|------|------|
56
+ | `"prompt"` | required | Question text |
57
+ | `--skip <model>` | — | Skip a model: claude / codex / gemini; may be repeated |
58
+ | `--judge-model <model>` | `claude` | Model used as Judge |
59
+ | `--no-judge` | false | Disable the Judge and show only each model's raw answer |
60
+ | `--timeout <sec>` | `120` | Per-model timeout in seconds |
61
+ | `--dir <path>` | cwd | Working directory for the CLI tools |
62
+ | `--context-file <file>` | — | Attach a file's contents as context (up to 8000 characters) |
63
+ | `--no-save` | false | Do not save results to disk |
64
+ | `--json` | false | Print the full result to stdout as JSON |
65
+ | `--list` | — | List the 20 most recent runs |
66
+ | `--strict` | — | [v2 placeholder] two-stage Judge + swap-compare |
96
67
 
97
68
  ---
98
69
 
99
- ## Output format
70
+ ## Output
100
71
 
101
- ```
102
- ## CODEX (8.2s)
103
-
104
- <full codex answer>
105
-
106
- ---
72
+ **Terminal**: each model's answer + the Judge's fused report (Markdown)
107
73
  
108
- ## GEMINI (6.5s)
74
+ **Saved to disk** (`~/ai-outputs/<timestamp>/`):
109
75
  
110
- <full gemini answer>
111
76
  ```
112
-
113
- A failed provider shows `*调用失败: timeout*` ("call failed: timeout") while the other returns as usual (`Promise.allSettled` fault tolerance).
77
+ ~/ai-outputs/2026-04-09T11-23-45-678/
78
+ ├── raw/
79
+ │ ├── claude.txt
80
+ │ ├── codex.txt
81
+ │ └── gemini.txt
82
+ ├── normalized.json # structured summaries of the three models
83
+ └── report.md # final fused report
84
+ ```
114
85
 
115
86
  ---
116
87
 
117
88
  ## Architecture
118
89
 
119
90
  ```
120
- braintrust-lite/
121
- ├── src/
122
- │ ├── server.js MCP stdio server
123
- │ ├── consult.js core concurrency logic
124
- │ ├── providers.js spawn + Codex/Gemini parsers
125
- │ └── format.js Markdown / JSON rendering
126
- ├── bin/
127
- │ └── consult CLI entry point
128
- ├── skills/
129
- │ └── consult/
130
- │ └── SKILL.md Claude Code skill guide
131
- └── docs/
132
- └── spec.md design document
91
+ runGenerators() # calls the three CLIs concurrently; AbortController timeout, Promise.allSettled fault tolerance
92
+ normalizeResults() # each adapter extracts content / key_claims / assumptions / risks
93
+ runSimpleJudge() # single Judge call; passes only the normalized summaries (not full text) to limit tokens
94
+ writeRunArtifacts() # writes raw/ + normalized.json + report.md to disk
95
+ runFullJudgePipeline() # [v2 placeholder] two-stage Judge + swap-compare + bias mitigation
133
96
  ```
134
97
 
98
+ **Judge prompt anonymization**: candidates are labeled only A / B / C, with no provider names exposed, to avoid model bias.
99
+
135
100
  ---
136
101
 
137
- ## Cost
102
+ ## Cost estimate
103
+
104
+ Each run = 4 API calls (3 generators + 1 judge):
138
105
  
139
- | Scenario | API calls | Estimated cost |
140
- |---|---|---|
141
- | Simple question | 2 | $0.05–0.15 |
142
- | Medium question | 2 | $0.15–0.40 |
143
- | Complex question | 2 | $0.40–0.80 |
106
+ | Question complexity | Estimated cost |
107
+ |-----------|---------|
108
+ | Simple | $0.20–0.50 |
109
+ | Medium | $0.50–1.00 |
110
+ | Complex | $1.00–2.00 |
144
111
 
145
112
  ---
146
113
 
147
- ## License
114
+ ## V2 roadmap
148
115
  
149
- MIT
116
+ 1. `--strict`: two-stage Judge (A+B) + swap-compare + bias mitigation
117
+ 2. `--continue`: continue a conversation thread
118
+ 3. `--context-file` smart truncation + git diff injection
119
+ 4. Cost / token budget controller
120
+ 5. More providers (Goose, local models, etc.)
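
The architecture list in the new README describes `runGenerators()` as calling the three CLIs concurrently with an `AbortController` timeout and `Promise.allSettled` fault tolerance. The implementation is not part of this diff, so the following is only a minimal sketch of that pattern. It assumes stub providers in place of real CLI calls; `withTimeout`, `stubProvider`, and `runAll` are hypothetical names, not functions from the package.

```javascript
'use strict';

// Hypothetical helper: run one async task with an AbortController-based timeout.
function withTimeout(task, ms) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), ms);
  return task(controller.signal).finally(() => clearTimeout(timer));
}

// Hypothetical stand-in for a CLI call: resolves after `delayMs`, rejects on abort.
function stubProvider(name, delayMs) {
  return (signal) => new Promise((resolve, reject) => {
    const t = setTimeout(() => resolve(`${name}: answer`), delayMs);
    signal.addEventListener('abort', () => {
      clearTimeout(t);
      reject(new Error('timeout'));
    });
  });
}

// Run every provider concurrently; one failure does not affect the others.
async function runAll(providers, timeoutMs) {
  const names = Object.keys(providers);
  const settled = await Promise.allSettled(
    names.map((n) => withTimeout(providers[n], timeoutMs))
  );
  return settled.map((s, i) =>
    s.status === 'fulfilled'
      ? { provider: names[i], content: s.value }
      : { provider: names[i], error: s.reason.message }
  );
}
```

With `Promise.allSettled`, a provider that times out yields an `error` entry while the remaining providers still return their content, which matches the degraded-run behaviour the README describes.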
package/bin/braintrust ADDED
@@ -0,0 +1,12 @@
1
+ #!/usr/bin/env node
2
+ 'use strict';
3
+
4
+ // Shim: delegates to src/main.js
5
+ // The symlink ~/.local/bin/braintrust → this file remains unchanged.
6
+
7
+ const { main } = require('../src/main.js');
8
+
9
+ main(process.argv.slice(2)).catch(e => {
10
+ process.stderr.write(`[braintrust error] ${e.message}\n`);
11
+ process.exit(1);
12
+ });
package/package.json CHANGED
@@ -1,40 +1,40 @@
1
1
  {
2
2
  "name": "braintrust-lite",
3
- "version": "0.1.7",
4
- "description": "Lightweight multi-model advisor for Claude Code — parallel Codex + Gemini consultation via MCP",
5
- "type": "module",
3
+ "version": "0.1.8",
4
+ "description": "Multi-model AI consultation MCP for Claude Code — runs Claude, Codex, and Gemini in parallel for Judge-style synthesis",
6
5
  "bin": {
7
- "consult": "bin/consult",
8
- "braintrust-lite": "src/server.js",
9
- "braintrust-setup": "scripts/setup.js"
6
+ "braintrust-lite": "./src/server.js",
7
+ "braintrust": "./bin/braintrust",
8
+ "braintrust-doctor": "./src/doctor.js"
10
9
  },
10
+ "main": "src/main.js",
11
+ "files": [
12
+ "src/",
13
+ "skills/",
14
+ "bin/",
15
+ "README.md"
16
+ ],
11
17
  "scripts": {
12
- "start": "node src/server.js",
13
- "setup": "node scripts/setup.js"
18
+ "test": "node --test src/normalize.test.js"
14
19
  },
15
20
  "dependencies": {
16
- "@modelcontextprotocol/sdk": "^1.10.2"
21
+ "better-sqlite3": "^11.0.0"
17
22
  },
18
23
  "engines": {
19
- "node": ">=18"
24
+ "node": ">=18.0.0"
20
25
  },
21
26
  "keywords": [
22
27
  "mcp",
23
28
  "claude-code",
29
+ "multi-model",
30
+ "ai",
24
31
  "codex",
25
32
  "gemini",
26
- "multi-model",
27
- "ai"
33
+ "braintrust"
28
34
  ],
29
- "license": "MIT",
30
35
  "repository": {
31
36
  "type": "git",
32
- "url": "git+https://github.com/HongjieRen/braintrust-lite.git"
37
+ "url": "https://github.com/HongjieRen/braintrust.git"
33
38
  },
34
- "files": [
35
- "bin/",
36
- "src/",
37
- "scripts/",
38
- "skills/"
39
- ]
39
+ "license": "MIT"
40
40
  }
@@ -9,7 +9,7 @@ description: When handling planning/design/architecture/research tasks, concurrently call cod
9
9
  ## Preamble (run before every invocation)
10
10
 
11
11
  ```bash
12
- _REMOTE_VER=$(curl -sf --max-time 3 "https://raw.githubusercontent.com/HongjieRen/braintrust-lite/main/skills/consult/SKILL.md" | awk '/^version:/ {print $2; exit}')
12
+ _REMOTE_VER=$(curl -sf --max-time 3 "https://raw.githubusercontent.com/HongjieRen/braintrust/main/skills/consult/SKILL.md" | awk '/^version:/ {print $2; exit}')
13
13
  _LOCAL_VER=$(awk '/^version:/ {print $2; exit}' ~/.claude/skills/consult/SKILL.md 2>/dev/null)
14
14
  [ -n "$_REMOTE_VER" ] && [ "$_REMOTE_VER" != "$_LOCAL_VER" ] && echo "UPGRADE_AVAILABLE $_LOCAL_VER $_REMOTE_VER" || echo "UP_TO_DATE $_LOCAL_VER"
15
15
  ```
@@ -17,7 +17,7 @@ _LOCAL_VER=$(awk '/^version:/ {print $2; exit}' ~/.claude/skills/consult/SKILL.m
17
17
  **If the output is `UPGRADE_AVAILABLE <old> <new>`**: update silently and automatically, without interrupting the user:
18
18
  ```bash
19
19
  cp ~/.claude/skills/consult/SKILL.md ~/.claude/skills/consult/SKILL.md.bak 2>/dev/null || true
20
- curl -fsSL "https://raw.githubusercontent.com/HongjieRen/braintrust-lite/main/skills/consult/SKILL.md" \
20
+ curl -fsSL "https://raw.githubusercontent.com/HongjieRen/braintrust/main/skills/consult/SKILL.md" \
21
21
  -o ~/.claude/skills/consult/SKILL.md && echo "Updated consult skill $_old → $_new"
22
22
  ```
23
23
  After the update completes, continue with the current task and append one line at the end of the final reply: `*(consult skill auto-updated v{old} → v{new})*`
package/src/config.js ADDED
@@ -0,0 +1,60 @@
1
+ 'use strict';
2
+
3
+ const { join } = require('path');
4
+
5
+ const PROJECT_ROOT = join(__dirname, '..');
6
+ const OUTPUT_DIR = join(PROJECT_ROOT, 'ai-outputs');
7
+ const STATE_DIR = join(OUTPUT_DIR, '.state');
8
+ const DB_PATH = join(STATE_DIR, 'braintrust.sqlite');
9
+ const POLICY_PATH = join(STATE_DIR, 'policy.json');
10
+ const REFLECTOR_LOG = join(STATE_DIR, 'reflector.log');
11
+
12
+ const DEFAULT_TIMEOUT_S = 120;
13
+ const DEFAULT_JUDGE_MODEL = 'claude';
14
+ const DEFAULT_MEMORY_K = 3;
15
+ const MAX_CONTEXT_CHARS = 30000;
16
+ const CONTEXT_FILE_MAX = 8000;
17
+
18
+ // Memory injection hard limits (chars)
19
+ const MEMORY_INJECT_LIMIT = 1500;
20
+ const LESSONS_INJECT_LIMIT = 600;
21
+ const SKILLS_INJECT_LIMIT = 800;
22
+
23
+ // Novelty check threshold: cosine similarity above this → prompt reuse
24
+ const NOVELTY_THRESHOLD = 0.9;
25
+
26
+ // Critique-revise disagreement threshold
27
+ const DISAGREE_THRESHOLD = 0.5;
28
+
29
+ // Economy mode: disable all extra LLM calls
30
+ const ECONOMY = process.env.BRAINTRUST_ECONOMY === '1';
31
+
32
+ // Reflector model: codex with gpt-5.4-mini.
33
+ // Chosen over haiku/flash for better Chinese text quality.
34
+ // Must differ from the default judge model (claude) to avoid self-evaluation bias.
35
+ const REFLECTOR_MODEL = 'gpt-5.4-mini';
36
+ const REFLECTOR_CMD = 'codex';
37
+ const REFLECTOR_ARGS_PREFIX = ['exec', '--json', '--skip-git-repo-check', '--ephemeral', '-m', REFLECTOR_MODEL];
38
+
39
+ module.exports = {
40
+ PROJECT_ROOT,
41
+ OUTPUT_DIR,
42
+ STATE_DIR,
43
+ DB_PATH,
44
+ POLICY_PATH,
45
+ REFLECTOR_LOG,
46
+ DEFAULT_TIMEOUT_S,
47
+ DEFAULT_JUDGE_MODEL,
48
+ DEFAULT_MEMORY_K,
49
+ MAX_CONTEXT_CHARS,
50
+ CONTEXT_FILE_MAX,
51
+ MEMORY_INJECT_LIMIT,
52
+ LESSONS_INJECT_LIMIT,
53
+ SKILLS_INJECT_LIMIT,
54
+ NOVELTY_THRESHOLD,
55
+ DISAGREE_THRESHOLD,
56
+ ECONOMY,
57
+ REFLECTOR_MODEL,
58
+ REFLECTOR_CMD,
59
+ REFLECTOR_ARGS_PREFIX,
60
+ };
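
`src/config.js` sets `NOVELTY_THRESHOLD = 0.9` with the comment "cosine similarity above this → prompt reuse", but this diff does not show how prompts are turned into vectors. The sketch below illustrates the threshold check under a loud assumption: it uses plain whitespace-tokenized term counts, which may differ from whatever representation the package uses. `termVector`, `cosine`, and `isReuse` are hypothetical names.

```javascript
'use strict';

const NOVELTY_THRESHOLD = 0.9; // value taken from src/config.js in this diff

// Assumption: bag-of-words term counts; the package may use a different embedding.
function termVector(text) {
  const counts = new Map();
  for (const tok of text.toLowerCase().split(/\s+/).filter(Boolean)) {
    counts.set(tok, (counts.get(tok) || 0) + 1);
  }
  return counts;
}

// Cosine similarity between two sparse term-count vectors.
function cosine(a, b) {
  let dot = 0;
  for (const [tok, n] of a) dot += n * (b.get(tok) || 0);
  const norm = (v) => Math.sqrt([...v.values()].reduce((s, n) => s + n * n, 0));
  const denom = norm(a) * norm(b);
  return denom === 0 ? 0 : dot / denom;
}

// A prompt counts as a reuse when similarity exceeds the threshold.
function isReuse(promptA, promptB) {
  return cosine(termVector(promptA), termVector(promptB)) > NOVELTY_THRESHOLD;
}
```

Whitespace tokenization does not segment Chinese text, so a real check for this package's prompts would likely need a different tokenizer or an embedding model.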
package/src/doctor.js ADDED
@@ -0,0 +1,120 @@
1
+ #!/usr/bin/env node
2
+ 'use strict';
3
+
4
+ const { execFileSync, spawnSync } = require('child_process');
5
+ const { existsSync, readFileSync } = require('fs');
6
+ const { join } = require('path');
7
+ const { version: PKG_VERSION } = require('../package.json');
8
+
9
+ const GREEN = '\x1b[32m✓\x1b[0m';
10
+ const RED = '\x1b[31m✗\x1b[0m';
11
+ const WARN = '\x1b[33m!\x1b[0m';
12
+
13
+ function check(label, ok, detail) {
14
+ const icon = ok === true ? GREEN : ok === 'warn' ? WARN : RED;
15
+ const line = ` ${icon} ${label.padEnd(28)} ${detail || ''}`;
16
+ console.log(line);
17
+ return ok === true;
18
+ }
19
+
20
+ function getVersion(cmd, args) {
21
+ try {
22
+ const result = spawnSync(cmd, args, { timeout: 5000, encoding: 'utf8' });
23
+ if (result.status === 0) {
24
+ return (result.stdout || result.stderr || '').split('\n')[0].trim().slice(0, 40);
25
+ }
26
+ return null;
27
+ } catch {
28
+ return null;
29
+ }
30
+ }
31
+
32
+ function getSkillVersion(skillPath) {
33
+ try {
34
+ const content = readFileSync(skillPath, 'utf8');
35
+ const m = content.match(/^version:\s*(.+)$/m);
36
+ return m ? m[1].trim() : 'unknown';
37
+ } catch {
38
+ return null;
39
+ }
40
+ }
41
+
42
+ function checkMcpServer() {
43
+ // Probe MCP server: send initialize, expect a valid JSON-RPC response
44
+ const serverPath = join(__dirname, 'server.js');
45
+ if (!existsSync(serverPath)) return { ok: false, detail: 'src/server.js not found' };
46
+
47
+ try {
48
+ const msg = JSON.stringify({
49
+ jsonrpc: '2.0', id: 1, method: 'initialize',
50
+ params: { protocolVersion: '2024-11-05', capabilities: {}, clientInfo: { name: 'doctor', version: '0' } },
51
+ });
52
+ const result = spawnSync(process.execPath, [serverPath], {
53
+ input: msg + '\n',
54
+ timeout: 5000,
55
+ encoding: 'utf8',
56
+ });
57
+ const line = (result.stdout || '').split('\n').find(l => l.trim().startsWith('{'));
58
+ if (!line) return { ok: false, detail: 'no JSON response from server' };
59
+ const resp = JSON.parse(line);
60
+ if (resp.result && resp.result.serverInfo) {
61
+ return { ok: true, detail: `v${resp.result.serverInfo.version}` };
62
+ }
63
+ return { ok: false, detail: 'unexpected response shape' };
64
+ } catch (err) {
65
+ return { ok: false, detail: err.message.slice(0, 60) };
66
+ }
67
+ }
68
+
69
+ function main() {
70
+ console.log(`\nbraintrust doctor (package v${PKG_VERSION})\n`);
71
+
72
+ let allOk = true;
73
+
74
+ // ── CLI tools ──────────────────────────────────────────────────────────────
75
+ console.log('CLI tools:');
76
+ for (const [cmd, vArgs, installHint] of [
77
+ ['claude', ['--version'], 'https://claude.ai/download'],
78
+ ['codex', ['--version'], 'npm i -g @openai/codex'],
79
+ ['gemini', ['--version'], 'npm i -g @google/gemini-cli'],
80
+ ]) {
81
+ const ver = getVersion(cmd, vArgs);
82
+ if (ver) {
83
+ check(cmd, true, ver);
84
+ } else {
85
+ check(cmd, false, `not found — ${installHint}`);
86
+ allOk = false;
87
+ }
88
+ }
89
+
90
+ // ── MCP server ─────────────────────────────────────────────────────────────
91
+ console.log('\nMCP server:');
92
+ const mcp = checkMcpServer();
93
+ if (!check('braintrust-lite server', mcp.ok, mcp.detail)) allOk = false;
94
+
95
+ // ── Skill ──────────────────────────────────────────────────────────────────
96
+ console.log('\nConsult skill:');
97
+ const skillPath = join(process.env.HOME || '~', '.claude', 'skills', 'consult', 'SKILL.md');
98
+ const skillVer = getSkillVersion(skillPath);
99
+ if (skillVer) {
100
+ check('SKILL.md installed', true, `v${skillVer} at ${skillPath}`);
101
+ } else {
102
+ check('SKILL.md installed', false, `not found at ${skillPath}`);
103
+ allOk = false;
104
+ }
105
+
106
+ const bakPath = skillPath + '.bak';
107
+ check('SKILL.md.bak exists', existsSync(bakPath) ? 'warn' : 'warn',
108
+ existsSync(bakPath) ? 'backup present' : 'no backup yet (created on first auto-update)');
109
+
110
+ // ── Summary ────────────────────────────────────────────────────────────────
111
+ console.log();
112
+ if (allOk) {
113
+ console.log(' \x1b[32mAll checks passed — braintrust is ready.\x1b[0m\n');
114
+ } else {
115
+ console.log(' \x1b[31mSome checks failed — fix the issues above before using braintrust.\x1b[0m\n');
116
+ process.exit(1);
117
+ }
118
+ }
119
+
120
+ main();
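
`checkMcpServer()` combines spawning the server with parsing its reply, which makes the parsing hard to test in isolation. One possible refactoring, sketched here as an assumption rather than as the package's design, is to pull the parsing steps into a pure function. `parseProbeOutput` is a hypothetical name; its logic mirrors the lines shown in the diff.

```javascript
'use strict';

// Hypothetical pure helper: same parsing steps as checkMcpServer(), minus the spawn.
function parseProbeOutput(stdout) {
  // Take the first stdout line that looks like a JSON object.
  const line = (stdout || '').split('\n').find((l) => l.trim().startsWith('{'));
  if (!line) return { ok: false, detail: 'no JSON response from server' };
  try {
    const resp = JSON.parse(line);
    if (resp.result && resp.result.serverInfo) {
      return { ok: true, detail: `v${resp.result.serverInfo.version}` };
    }
    return { ok: false, detail: 'unexpected response shape' };
  } catch (err) {
    return { ok: false, detail: err.message.slice(0, 60) };
  }
}
```

Separately, `process.env.HOME || '~'` falls back to a literal `~`, which `path.join` does not expand; `os.homedir()` is the usual way to resolve the home directory in Node.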
package/src/format.js CHANGED
@@ -1,53 +1,30 @@
1
+ 'use strict';
2
+
1
3
  /**
2
- * Format provider results as human-readable Markdown with run manifest.
4
+ * Format a CLI run manifest summary for terminal output.
5
+ *
6
+ * @param {{ results: Array, ts: string, judgeModel: string|null, runDir: string }} opts
7
+ * @returns {string}
3
8
  */
4
- export function formatAsMarkdown(results, mapping = null, { successCount, totalCount } = {}) {
5
- const total = totalCount ?? results.length;
6
- const succeeded = successCount ?? results.filter(r => !r.error).length;
7
- const degraded = succeeded < total;
8
-
9
- // Status line (mirrors SKILL.md status bar format)
10
- const modelsLabel = degraded ? `⚠ ${succeeded}/${total} models` : `${total} models`;
11
- const statusLine = `[Consult | ${modelsLabel} | responses below]\n`;
12
-
13
- const body = results.map(r => {
14
- const label = r.error
15
- ? `## ${r.provider} (${r.error})`
16
- : `## ${r.provider} (${(r.duration_ms / 1000).toFixed(1)}s)`;
17
- const content = r.error ? `*调用失败: ${r.error}*` : r.content;
18
- return `${label}\n\n${content}`;
19
- }).join('\n\n---\n\n');
20
-
21
- const revealSection = mapping ? buildReveal(mapping) : '';
22
- const manifest = buildManifest(results, { successCount: succeeded, totalCount: total });
23
-
24
- return `${statusLine}\n${body}${revealSection}\n\n---\n\n${manifest}`;
25
- }
26
-
27
- function buildReveal(mapping) {
28
- const rows = Object.entries(mapping)
29
- .map(([label, provider]) => `| ${label} | **${provider}** |`)
30
- .join('\n');
31
- return `\n\n---\n\n## 🔒 REVEAL — 仅在完成评估后阅读
32
-
33
- > **Judge 指令**:请先完成你的完整评估和综合输出,再阅读以下映射表,并在回复末尾告知用户每个模型对应的真实身份。
34
-
35
- | 匿名标签 | 真实模型 |
36
- |---------|---------|
37
- ${rows}`;
9
+ function formatManifest({ results, ts, judgeModel, runDir }) {
10
+ const lines = [
11
+ '## Run Manifest',
12
+ '',
13
+ `Timestamp : ${ts}`,
14
+ `Judge : ${judgeModel || 'none (--no-judge)'}`,
15
+ `Saved to : ${runDir}`,
16
+ '',
17
+ 'Providers:',
18
+ ];
19
+
20
+ for (const r of results) {
21
+ const status = r.error
22
+ ? `✗ ${(r.error_type || r.error).padEnd(12)}`
23
+ : `✓ ${(r.duration_ms / 1000).toFixed(1)}s parse_score=${r.parse_score.toFixed(2)}`;
24
+ lines.push(` ${r.provider.padEnd(10)} ${status}`);
25
+ }
26
+
27
+ return lines.join('\n');
38
28
  }
39
29
 
40
- function buildManifest(results, { successCount, totalCount }) {
41
- const ts = new Date().toISOString().slice(0, 19) + 'Z';
42
- const degraded = successCount < totalCount;
43
- const lines = results.map(r =>
44
- r.error
45
- ? ` - ${r.provider}: ${r.error_type || r.error}`
46
- : ` - ${r.provider}: ${(r.duration_ms / 1000).toFixed(1)}s`
47
- ).join('\n');
48
- return `**Run manifest** · \`${ts}\` · ${successCount}/${totalCount} models${degraded ? ' ⚠ degraded' : ''}\n${lines}`;
49
- }
50
-
51
- export function formatAsJson(prompt, results, mapping = null) {
52
- return JSON.stringify({ prompt, results, mapping }, null, 2);
53
- }
30
+ module.exports = { formatManifest };
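
As written, `formatManifest` calls `r.parse_score.toFixed(2)` on every successful result and `.padEnd` on `r.error_type || r.error`, so a result without a numeric `parse_score`, or with a non-string error, would throw. Whether such results can occur depends on code outside this diff. A defensive variant of the per-provider status cell could look like the sketch below; `statusOf` is a hypothetical name.

```javascript
'use strict';

// Hypothetical defensive variant of the per-provider status cell in formatManifest().
function statusOf(r) {
  if (r.error) {
    // String() guards against error values that are not strings.
    return `✗ ${String(r.error_type || r.error).padEnd(12)}`;
  }
  const secs = (r.duration_ms / 1000).toFixed(1);
  // Fall back to "n/a" when no numeric parse_score is present.
  const score = typeof r.parse_score === 'number' ? r.parse_score.toFixed(2) : 'n/a';
  return `✓ ${secs}s parse_score=${score}`;
}
```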
package/src/judge.js ADDED
@@ -0,0 +1,87 @@
1
+ 'use strict';
2
+
3
+ const { PROVIDERS } = require('./providers/index.js');
4
+ const { summarize } = require('./normalize.js');
5
+ const { LESSONS_INJECT_LIMIT } = require('./config.js');
6
+
7
+ /**
8
+ * Build the judge prompt, optionally injecting lessons from memory.
9
+ * @param {string} question
10
+ * @param {Array} results - Normalized provider results
11
+ * @param {{ lessons?: string[], skills?: string[] }} opts
12
+ * @returns {string}
13
+ */
14
+ function buildJudgePrompt(question, results, opts = {}) {
15
+ const valid = results.filter(r => !r.error);
16
+ const summaries = valid
17
+ .map((r, i) => `--- 候选 ${String.fromCharCode(65 + i)} (${r.provider}) ---\n${summarize(r)}`)
18
+ .join('\n\n');
19
+
20
+ const lessonsBlock = buildLessonsBlock(opts.lessons || []);
21
+
22
+ return `你是一个高级技术评审。${valid.length} 个 AI 模型对同一问题给出了各自的回答。
23
+ ${lessonsBlock}
24
+ 问题:${question}
25
+
26
+ ${summaries}
27
+
28
+ 请按以下结构输出你的评审(用中文标签分隔):
29
+
30
+ ## 核心共识
31
+ (各模型都认同的关键结论)
32
+
33
+ ## 独特洞见
34
+ (某个模型独有但有价值的见解,注明来自哪个候选)
35
+
36
+ ## 分歧裁决
37
+ (如果存在矛盾,给出你的判断和理由;如无分歧则写"无明显分歧")
38
+
39
+ ## 集大成方案
40
+ (综合各方的最优可执行方案)
41
+
42
+ ## 风险提示
43
+ (需要注意的假设、风险或待验证项)`;
44
+ }
45
+
46
+ /**
47
+ * Build a lessons injection block, respecting the hard char limit.
48
+ * @param {string[]} lessons
49
+ * @returns {string}
50
+ */
51
+ function buildLessonsBlock(lessons) {
52
+ if (!lessons.length) return '';
53
+ const joined = lessons.slice(0, 5).join('\n');
54
+ const trimmed = joined.slice(0, LESSONS_INJECT_LIMIT);
55
+ return `\n<past-lessons>\n${trimmed}\n</past-lessons>\n`;
56
+ }
57
+
58
+ /**
59
+ * Run the judge model and return the report text.
60
+ * @param {string} question
61
+ * @param {Array} results - Normalized provider results
62
+ * @param {object} opts
63
+ * @param {string} [opts.judgeModel='claude'] - Which model to use as judge
64
+ * @param {Function} opts.runProcess - The process runner function
65
+ * @param {string[]} [opts.lessons] - Lessons to inject
66
+ * @returns {Promise<string>}
67
+ */
68
+ async function runJudge(question, results, opts = {}) {
69
+ const { judgeModel = 'claude', runProcess, lessons = [] } = opts;
70
+ const judgePrompt = buildJudgePrompt(question, results, { lessons });
71
+
72
+ process.stderr.write(`\n[Judge (${judgeModel}): running...]\n`);
73
+ const start = Date.now();
74
+
75
+ const provider = PROVIDERS[judgeModel];
76
+ if (!provider) {
77
+ throw new Error(`Unknown judge model: ${judgeModel}. Use claude|codex|gemini.`);
78
+ }
79
+
80
+ const raw = await runProcess(provider.cmd, provider.getArgs(judgePrompt));
81
+ const ms = Date.now() - start;
82
+ process.stderr.write(`[Judge: done ${(ms / 1000).toFixed(1)}s]\n`);
83
+
84
+ return provider.extractJudgeText(raw);
85
+ }
86
+
87
+ module.exports = { buildJudgePrompt, runJudge, buildLessonsBlock };
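
The README in this release states that Judge candidates are labeled only A / B / C without provider names, while `buildJudgePrompt` as shown in this diff appends `(${r.provider})` to each candidate header. A sketch of header construction that keeps provider names out of the prompt and returns the label-to-provider mapping separately follows. `labelCandidates` is a hypothetical helper, the English "Candidate" label is illustrative, and the `summarize` argument stands in for the package's own `summarize` from `./normalize.js`.

```javascript
'use strict';

// Hypothetical helper: label valid results A, B, C... and keep the mapping out of the prompt.
function labelCandidates(results, summarize) {
  const valid = results.filter((r) => !r.error);
  const mapping = {};
  const blocks = valid.map((r, i) => {
    const label = String.fromCharCode(65 + i); // 65 is the char code of 'A'
    mapping[label] = r.provider;
    return `--- Candidate ${label} ---\n${summarize(r)}`;
  });
  return { promptSection: blocks.join('\n\n'), mapping };
}
```

Returning the mapping alongside the prompt section lets the caller reveal which provider was behind each label only after the Judge has produced its report.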