@archsight/aios 1.1.0 → 1.3.0
- package/.claude-plugin/marketplace.json +60 -0
- package/.claude-plugin/plugin.json +36 -0
- package/CHANGELOG.md +93 -30
- package/OPENCODE.md +23 -0
- package/README.md +106 -48
- package/RELEASE_NOTES.md +52 -0
- package/adapters/README.md +7 -0
- package/adapters/workbuddy/README.md +43 -0
- package/agents/README.md +6 -3
- package/agents/daedalus/system-prompt.md +2 -0
- package/agents/hestia/constraints.md +7 -0
- package/agents/hestia/responsibilities.md +7 -0
- package/agents/hestia/role.md +12 -0
- package/agents/hestia/system-prompt.md +23 -0
- package/agents/hestia/workflow.md +8 -0
- package/agents/plutus/constraints.md +7 -0
- package/agents/plutus/responsibilities.md +7 -0
- package/agents/plutus/role.md +12 -0
- package/agents/plutus/system-prompt.md +24 -0
- package/agents/plutus/workflow.md +8 -0
- package/agents/themis/constraints.md +7 -0
- package/agents/themis/responsibilities.md +7 -0
- package/agents/themis/role.md +12 -0
- package/agents/themis/system-prompt.md +24 -0
- package/agents/themis/workflow.md +8 -0
- package/bin/archsight-aios.mjs +605 -31
- package/docs/PUBLIC_DISCOVERY.md +207 -0
- package/docs/business-expert-guide.md +5 -3
- package/docs/glossary.md +11 -3
- package/docs/quickstart.md +18 -4
- package/gemini-extension.json +6 -0
- package/package.json +66 -34
- package/prompts/README.md +12 -0
- package/prompts/evaluation-policy.md +70 -0
- package/prompts/evaluations/engineering-business-basic-advisory-validation-2026-06-16.md +87 -0
- package/prompts/evaluations/engineering-business-basic-fixtures.json +375 -0
- package/prompts/evaluations/engineering-business-basic-model-output.example.json +179 -0
- package/prompts/evaluations/engineering-business-basic-prompts-2026-06-16.md +205 -0
- package/prompts/evaluations/engineering-business-basic-scorecard.json +238 -0
- package/prompts/evaluations/engineering-business-public-advisory-fixtures.json +422 -0
- package/prompts/evaluations/public-advisory-md/01-technical-bid.md +63 -0
- package/prompts/evaluations/public-advisory-md/02-contract.md +61 -0
- package/prompts/evaluations/public-advisory-md/03-daily.md +69 -0
- package/prompts/evaluations/public-advisory-md/04-meeting.md +48 -0
- package/prompts/evaluations/public-advisory-md/05-variation.md +63 -0
- package/prompts/evaluations/public-advisory-md/06-scheme.md +60 -0
- package/prompts/failure-cases.md +5 -1
- package/prompts/prompt-registry.md +10 -0
- package/runtime/agent-routing.md +36 -8
- package/runtime/archsight-aios.manifest.json +207 -60
- package/runtime/capability-registry.json +12 -2
- package/runtime/hermes/agent-registry.md +3 -0
- package/runtime/hermes/workspace-binding.md +3 -0
- package/runtime/skill-routing.md +16 -2
- package/scripts/analyze-prompt-run-results.mjs +187 -0
- package/scripts/build-prompt-run-pack.mjs +248 -0
- package/scripts/validate-prompt-fixtures.mjs +225 -0
- package/scripts/validate-prompt-model-outputs.mjs +201 -0
- package/scripts/validate-prompt-run-results.mjs +259 -0
- package/scripts/validate-prompt-scorecard.mjs +133 -0
- package/scripts/validate-skills.mjs +138 -0
- package/skills/README.md +16 -0
- package/skills/aios-commercial-contract/SKILL.md +107 -0
- package/skills/aios-commercial-contract/agents/openai.yaml +4 -0
- package/skills/aios-commercial-contract/prompts/basic-prompt.md +83 -0
- package/skills/aios-commercial-tender/SKILL.md +107 -0
- package/skills/aios-commercial-tender/agents/openai.yaml +4 -0
- package/skills/aios-commercial-tender/prompts/basic-prompt.md +94 -0
- package/skills/aios-commercial-variation/SKILL.md +106 -0
- package/skills/aios-commercial-variation/agents/openai.yaml +4 -0
- package/skills/aios-commercial-variation/prompts/basic-prompt.md +99 -0
- package/skills/aios-construction-daily/SKILL.md +104 -0
- package/skills/aios-construction-daily/agents/openai.yaml +4 -0
- package/skills/aios-construction-daily/prompts/basic-prompt.md +76 -0
- package/skills/aios-construction-meeting/SKILL.md +104 -0
- package/skills/aios-construction-meeting/agents/openai.yaml +4 -0
- package/skills/aios-construction-meeting/prompts/basic-prompt.md +78 -0
- package/skills/aios-construction-scheme/SKILL.md +97 -0
- package/skills/aios-construction-scheme/agents/openai.yaml +4 -0
- package/skills/aios-construction-scheme/prompts/basic-prompt.md +90 -0
- package/skills/aios-prompt-compare/SKILL.md +178 -0
- package/skills/aios-prompt-compare/agents/openai.yaml +4 -0
- package/skills/engineering-business-starter-kit.md +109 -0
- package/templates/README.md +16 -2
- package/templates/project-ai/.ai/ARCHSIGHT_AIOS_RULES.md +5 -4
- package/templates/project-ai/.ai/agent-routing.md +3 -1
- package/templates/project-ai/.ai/profile-detection.md +24 -0
- package/templates/project-ai/.ai/project-context.md +4 -1
- package/templates/project-ai/.ai/skills.md +31 -12
- package/templates/project-ai/.ai/workflows.md +7 -4
- package/templates/project-ai/AGENTS.md +6 -5
- package/templates/project-ai/AI_CODING_RULES.md +1 -1
- package/templates/project-ai/CLAUDE.md +6 -5
- package/templates/project-ai/GEMINI.md +6 -5
- package/templates/project-ai/OPENCODE.md +26 -0
- package/workflows/README.md +3 -0
- package/workflows/site-daily-loop.md +101 -0
|
@@ -0,0 +1,207 @@
|
|
|
1
|
+
# Public Discovery and Listing Checklist
|
|
2
|
+
|
|
3
|
+
This file records the stable entry points the project must provide so that ArchSight AIOS can be discovered by the public skill ecosystem.
|
|
4
|
+
|
|
5
|
+
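Conclusion: public discovery is not automatic inclusion in a single marketplace. AIOS needs to satisfy three kinds of mechanism at once: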
|
|
6
|
+
|
|
7
|
+
1. Local auto-discovery: the host scans `SKILL.md`, `skills/`, `.agents/skills/`, `.claude/skills/`, `.opencode/skills/`, or extension/plugin directories.
|
|
8
|
+
2. Distributable installation: install via GitHub, npm/npx, Antigravity/agy, the Gemini extension compatibility entry, Claude marketplace, WorkBuddy, or `skills.sh` / `npx skills`.
|
|
9
|
+
3. Public search: relies on GitHub topics, manifests, standard directories, README keywords, releases, install counts, stars, and active submission.
|
|
10
|
+
|
|
11
|
+
## In-project entry points
|
|
12
|
+
|
|
13
|
+
| Entry point | File | Purpose |
|
|
14
|
+
| --- | --- | --- |
|
|
15
|
+
| Standard skills directory | `skills/` | Lets `skills.sh`, `npx skills`, Antigravity/agy, Gemini extension, OpenCode, and other standard skill indexers see the `aios-*` Skills directly. |
|
|
16
|
+
| Gemini extension manifest | `gemini-extension.json` | Keeps the Gemini CLI extension compatibility entry and the manifest used by the Gallery / third-party indexes. |
|
|
17
|
+
| Claude marketplace manifest | `.claude-plugin/marketplace.json` | Lets Claude Code users discover this project through the marketplace. |
|
|
18
|
+
| Claude plugin manifest | `.claude-plugin/plugin.json` | Describes plugin metadata and points the plugin's skills at `./skills/`. |
|
|
19
|
+
| OpenCode project entry | `OPENCODE.md` | Lets OpenCode sessions in a business project read the shared rules and the `.ai/` project governance directory. |
|
|
20
|
+
| WorkBuddy adapter | `adapters/workbuddy/README.md` | Explains how to install the `aios-*` Skills into `~/.workbuddy/skills/`. |
|
|
21
|
+
| npm metadata | `package.json` | Provides English search keywords, the distribution file list, and the `validate:skills` check entry. |
|
|
22
|
+
| Discovery validation script | `scripts/validate-skills.mjs` | Checks that the manifest, skill frontmatter, cross-host manifests, and npm metadata are consistent. |
|
|
23
|
+
|
|
24
|
+
## GitHub About recommendations
|
|
25
|
+
|
|
26
|
+
These values must be set manually in the About panel on the right side of the GitHub repository page; code files alone cannot do it.
|
|
27
|
+
|
|
28
|
+
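Strategy: show Chinese by default, followed immediately by an English search phrase. Chinese users can understand the project positioning directly, while English users and indexers can match on terms such as `building-industry AI agent skills`, `project evidence work`, `BIM`, `IFC`, `GraphRAG`, and `code review`.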
|
|
29
|
+
|
|
30
|
+
Description:
|
|
31
|
+
|
|
32
|
+
```text
|
|
33
|
+
面向建筑行业知识工作从业者与 AI 研发团队的 Skills、Workflow 与多 Agent 工具包 / Building-industry AI agent skills for BIM, IFC, RAG, GraphRAG, project evidence work, code review, and runtime governance.
|
|
34
|
+
```
|
|
35
|
+
|
|
36
|
+
Website:
|
|
37
|
+
|
|
38
|
+
```text
|
|
39
|
+
https://github.com/ArchSightLabs/archsight-aios
|
|
40
|
+
```
|
|
41
|
+
|
|
42
|
+
Topics:
|
|
43
|
+
|
|
44
|
+
```text
|
|
45
|
+
agent-skills
|
|
46
|
+
building-ai
|
|
47
|
+
construction-ai
|
|
48
|
+
aec
|
|
49
|
+
bim
|
|
50
|
+
ifc
|
|
51
|
+
building-code
|
|
52
|
+
graphrag
|
|
53
|
+
mcp
|
|
54
|
+
codex
|
|
55
|
+
claude-code
|
|
56
|
+
gemini-cli-extension
|
|
57
|
+
antigravity
|
|
58
|
+
architecture-review
|
|
59
|
+
code-review
|
|
60
|
+
design-review
|
|
61
|
+
runtime-design
|
|
62
|
+
structural-engineering
|
|
63
|
+
tender-review
|
|
64
|
+
construction-management
|
|
65
|
+
```
|
|
66
|
+
|
|
67
|
+
The pinned README search summary can use:
|
|
68
|
+
|
|
69
|
+
```text
|
|
70
|
+
默认中文输出,保留英文检索能力:building-industry AI, project evidence work, BIM, IFC, building code, RAG, GraphRAG, MCP, architecture review, code review, runtime governance, structural review, construction management.
|
|
71
|
+
```
|
|
72
|
+
|
|
73
|
+
## Public install commands
|
|
74
|
+
|
|
75
|
+
`skills.sh` / Vercel skills CLI:
|
|
76
|
+
|
|
77
|
+
```powershell
|
|
78
|
+
npx skills add ArchSightLabs/archsight-aios --list
|
|
79
|
+
npx skills add ArchSightLabs/archsight-aios --skill aios-arch --global
|
|
80
|
+
```
|
|
81
|
+
|
|
82
|
+
Codex:
|
|
83
|
+
|
|
84
|
+
```powershell
|
|
85
|
+
npx @archsight/aios install --target codex --scope user
|
|
86
|
+
```
|
|
87
|
+
|
|
88
|
+
Claude Code user-level skills:
|
|
89
|
+
|
|
90
|
+
```powershell
|
|
91
|
+
npx @archsight/aios install --target claude-code --scope user
|
|
92
|
+
```
|
|
93
|
+
|
|
94
|
+
OpenCode user-level skills:
|
|
95
|
+
|
|
96
|
+
```powershell
|
|
97
|
+
npx @archsight/aios install --target opencode --scope user
|
|
98
|
+
```
|
|
99
|
+
|
|
100
|
+
Antigravity / agy:
|
|
101
|
+
|
|
102
|
+
```powershell
|
|
103
|
+
npx @archsight/aios install --target antigravity --scope user
|
|
104
|
+
```
|
|
105
|
+
|
|
106
|
+
Gemini CLI extension compatibility entry:
|
|
107
|
+
|
|
108
|
+
```powershell
|
|
109
|
+
gemini extensions install https://github.com/ArchSightLabs/archsight-aios
|
|
110
|
+
```
|
|
111
|
+
|
|
112
|
+
Gemini user-level support assets:
|
|
113
|
+
|
|
114
|
+
```powershell
|
|
115
|
+
npx @archsight/aios install --target gemini --scope user
|
|
116
|
+
```
|
|
117
|
+
|
|
118
|
+
WorkBuddy:
|
|
119
|
+
|
|
120
|
+
```powershell
|
|
121
|
+
npx @archsight/aios install --target workbuddy --scope user
|
|
122
|
+
```
|
|
123
|
+
|
|
124
|
+
Claude Code marketplace:
|
|
125
|
+
|
|
126
|
+
```text
|
|
127
|
+
/plugin marketplace add ArchSightLabs/archsight-aios
|
|
128
|
+
/plugin install archsight-aios@archsight
|
|
129
|
+
```
|
|
130
|
+
|
|
131
|
+
## Public search terms
|
|
132
|
+
|
|
133
|
+
The README, releases, npm, GitHub topics, and launch posts should cover these keywords first:
|
|
134
|
+
|
|
135
|
+
```text
|
|
136
|
+
agent skills
|
|
137
|
+
AI agent skills
|
|
138
|
+
Codex skills
|
|
139
|
+
Claude Code skills
|
|
140
|
+
OpenCode skills
|
|
141
|
+
Gemini CLI extension
|
|
142
|
+
Antigravity CLI
|
|
143
|
+
construction AI
|
|
144
|
+
building AI
|
|
145
|
+
building-industry AI
|
|
146
|
+
AEC AI
|
|
147
|
+
project evidence work
|
|
148
|
+
BIM
|
|
149
|
+
IFC
|
|
150
|
+
building code
|
|
151
|
+
building code review
|
|
152
|
+
smart drawing review
|
|
153
|
+
architecture review
|
|
154
|
+
technical design review
|
|
155
|
+
code review
|
|
156
|
+
runtime design
|
|
157
|
+
runtime governance
|
|
158
|
+
MCP
|
|
159
|
+
GraphRAG
|
|
160
|
+
RAG
|
|
161
|
+
structural review
|
|
162
|
+
construction daily
|
|
163
|
+
tender review
|
|
164
|
+
contract evidence chain
|
|
165
|
+
```
|
|
166
|
+
|
|
167
|
+
## find-skills / skills.sh index request template
|
|
168
|
+
|
|
169
|
+
If `npx skills find` cannot find this project, submit an index request to the relevant community or repository:
|
|
170
|
+
|
|
171
|
+
```text
|
|
172
|
+
Repository: https://github.com/ArchSightLabs/archsight-aios
|
|
173
|
+
Package: @archsight/aios
|
|
174
|
+
License: Apache-2.0
|
|
175
|
+
Supported agents: Codex, Claude Code, Antigravity/agy, Gemini CLI, WorkBuddy, OpenCode, Hermes
|
|
176
|
+
Canonical skill path: skills/
|
|
177
|
+
Install command: npx skills add ArchSightLabs/archsight-aios --list
|
|
178
|
+
NPM install command: npx @archsight/aios install --target all --scope user
|
|
179
|
+
Representative skills: aios-arch, aios-design, aios-plan, aios-exec, aios-review, aios-knowledge, aios-structural, aios-runtime, aios-commercial-tender, aios-construction-daily
|
|
180
|
+
Keywords: agent skills, construction AI, BIM, IFC, building code, GraphRAG, architecture review, design review, code review, runtime design, MCP, structural review, tender review, construction management
|
|
181
|
+
```
|
|
182
|
+
|
|
183
|
+
## Pre-release checks
|
|
184
|
+
|
|
185
|
+
Before each release, verify at least:
|
|
186
|
+
|
|
187
|
+
```powershell
|
|
188
|
+
npm run validate:skills
|
|
189
|
+
npx skills add . --list
|
|
190
|
+
npm test
|
|
191
|
+
```
|
|
192
|
+
|
|
193
|
+
If the corresponding CLIs are installed locally, also verify:
|
|
194
|
+
|
|
195
|
+
```powershell
|
|
196
|
+
gemini extensions validate .
|
|
197
|
+
claude plugin validate .
|
|
198
|
+
```
|
|
199
|
+
|
|
200
|
+
## References
|
|
201
|
+
|
|
202
|
+
- Codex Agent Skills: https://developers.openai.com/codex/skills
|
|
203
|
+
- OpenAI skills catalog: https://github.com/openai/skills
|
|
204
|
+
- Claude Code plugin marketplace: https://code.claude.com/docs/en/plugin-marketplaces
|
|
205
|
+
- Claude Code plugin reference: https://code.claude.com/docs/en/plugins-reference
|
|
206
|
+
- Gemini CLI extension releasing: https://github.com/google-gemini/gemini-cli/blob/main/docs/extensions/releasing.md
|
|
207
|
+
- Vercel skills CLI discovery: https://github.com/vercel-labs/skills
|
|
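The entry-point table in this file says `scripts/validate-skills.mjs` checks skill frontmatter, among other things. As a rough illustration of what such a check can look like, the sketch below parses a `SKILL.md` header and reports missing fields. It is not the package's actual script: the function names `parseFrontmatter` and `missingFields`, the required-field list (`name`, `description`), and the sample description text are all assumptions made for illustration.

```javascript
// Hypothetical sketch: NOT the real scripts/validate-skills.mjs.
// Parses simple `key: value` frontmatter between two `---` lines.
function parseFrontmatter(markdown) {
  const lines = markdown.split(/\r?\n/);
  if (lines[0].trim() !== "---") return null; // no frontmatter block
  const end = lines.findIndex((line, i) => i > 0 && line.trim() === "---");
  if (end === -1) return null; // opening fence never closed
  const fields = {};
  for (const line of lines.slice(1, end)) {
    const colon = line.indexOf(":");
    if (colon === -1) continue; // skip lines that are not key: value
    fields[line.slice(0, colon).trim()] = line.slice(colon + 1).trim();
  }
  return fields;
}

// Assumed required fields; the real validator may require others.
function missingFields(fields, required = ["name", "description"]) {
  if (fields === null) return [...required];
  return required.filter((key) => !fields[key]);
}

// Placeholder SKILL.md content (the description text is invented).
const sample = [
  "---",
  "name: aios-arch",
  "description: Placeholder description",
  "---",
  "# Body",
].join("\n");
const parsed = parseFrontmatter(sample);
console.log(parsed.name, missingFields(parsed));
```

A real check would likely also compare the `name` field with the skill's directory name, but that rule is not stated in the source, so it is left out here.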
@@ -46,7 +46,7 @@ AIOS does not replace expert judgment; it turns expert judgment into reusable, reviewable
|
|
|
46
46
|
1. Pick a specific topic, for example "basement weld inspection", "fire compartment review", or "IFC element classification".
|
|
47
47
|
2. Prepare 5 to 20 real samples, including both correct and incorrect ones.
|
|
48
48
|
3. Write down the terminology, judgment criteria, pass conditions, and manual review points clearly.
|
|
49
|
-
4. Have the engineering team use `archsight-aios init
|
|
49
|
+
4. Have the engineering team use `archsight-aios init` to adopt the project rules, and check the auto-detection results in `.ai/profile-detection.md` and `.ai/project-context.md`.
|
|
50
50
|
5. Based on the errors and omissions in the AI output, add counterexamples, rules, and evaluation questions.
|
|
51
51
|
|
|
52
52
|
## What AIOS should not be asked to do
|
|
@@ -58,10 +58,12 @@ AIOS does not replace expert judgment; it turns expert judgment into reusable, reviewable
|
|
|
58
58
|
|
|
59
59
|
## Common project types
|
|
60
60
|
|
|
61
|
-
|
|
61
|
+
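By default `archsight-aios init` detects the profile automatically. Have the engineering team override the profile explicitly only when auto-detection clearly does not match the actual project.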
|
|
62
|
+
|
|
63
|
+
| Project | Auto-detected or overriding profile |
|
|
62
64
|
| --- | --- |
|
|
63
65
|
| BIM / IFC / Revit / CAD platform | `bim-platform` |
|
|
64
66
|
| Construction site images, video, defect detection | `construction-vision` |
|
|
65
67
|
| Building code knowledge base, RAG, GraphRAG | `rag-knowledge` |
|
|
66
68
|
|
|
67
|
-
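Business experts only need to check that `.ai/project-context.md`, `.ai/profiles/*.md`, and the evaluation samples are real, accurate, and reviewable. Engineers are responsible for wiring these materials into code, scripts, tests, and the release process.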
|
|
69
|
+
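Business experts only need to check that `.ai/profile-detection.md`, `.ai/project-context.md`, `.ai/profiles/*.md`, and the evaluation samples are real, accurate, and reviewable. Engineers are responsible for wiring these materials into code, scripts, tests, and the release process.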
|
package/docs/glossary.md
CHANGED
|
@@ -6,7 +6,7 @@ ArchSight AIOS 是一套 AI 规则、Agent、Skill、Workflow 和运行治理工
|
|
|
6
6
|
|
|
7
7
|
## Agent
|
|
8
8
|
|
|
9
|
-
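An Agent is an internal role label, for example building digitalization expert, code reviewer, or AI R&D engineer. An Agent defines responsibilities, boundaries, inputs, and outputs. Ordinary users usually do not need to remember Agent names or specify an Agent manually; AIOS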
|
|
9
|
+
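An Agent is an internal role label, for example building digitalization expert, code reviewer, or AI R&D engineer. An Agent defines responsibilities, boundaries, inputs, and outputs. Ordinary users usually do not need to remember Agent names or specify an Agent manually; AIOS routes based on task type, the auto-detected profile, Skills, and Workflows.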
|
|
10
10
|
|
|
11
11
|
## Skill
|
|
12
12
|
|
|
@@ -18,7 +18,7 @@ A Workflow is a multi-step process describing a task from input, execution, checking
|
|
|
18
18
|
|
|
19
19
|
## Profile
|
|
20
20
|
|
|
21
|
-
Profile
|
|
21
|
+
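A Profile is a set of supplementary rules for a class of business project. AIOS ships profiles as an in-package registry by default, and `archsight-aios init` generates `.ai/profile-detection.md` for auto-detection; users usually do not need to choose one manually. Current profiles: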
|
|
22
22
|
|
|
23
23
|
- `bim-platform`
|
|
24
24
|
- `construction-vision`
|
|
@@ -26,7 +26,11 @@ A Profile is a set of supplementary rules for a class of business project. Current profiles:
|
|
|
26
26
|
|
|
27
27
|
## `.ai/`
|
|
28
28
|
|
|
29
|
-
The AI rules directory in a business project. It stores project facts, AIOS
|
|
29
|
+
The AI rules directory in a business project. It stores project facts, AIOS supplementary rules, auto-detection results, Agent routing, Skills, Workflows, and industry profiles.
|
|
30
|
+
|
|
31
|
+
## `.ai/profile-detection.md`
|
|
32
|
+
|
|
33
|
+
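An auto-detection draft generated when AIOS initializes. It records the matched profiles, Skill candidates, evidence keywords, and manual review boundaries.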
|
|
30
34
|
|
|
31
35
|
## `AGENTS.md`
|
|
32
36
|
|
|
@@ -40,6 +44,10 @@ The project entry file read by Claude Code.
|
|
|
40
44
|
|
|
41
45
|
The project entry file read by Gemini.
|
|
42
46
|
|
|
47
|
+
## `OPENCODE.md`
|
|
48
|
+
|
|
49
|
+
The project entry file read by OpenCode.
|
|
50
|
+
|
|
43
51
|
## `AI_CODING_RULES.md`
|
|
44
52
|
|
|
45
53
|
The project's general AI coding rules. This is the project's own rule body, and AIOS should not overwrite it arbitrarily.
|
package/docs/quickstart.md
CHANGED
|
@@ -8,7 +8,7 @@
|
|
|
8
8
|
npx @archsight/aios install --target all --scope user
|
|
9
9
|
```
|
|
10
10
|
|
|
11
|
-
This step syncs the ArchSight AIOS Skills, Workflows, Runtime, and templates into the current user's directory so that tools such as Codex, Gemini, and Antigravity can read them.
|
|
11
|
+
This step syncs the ArchSight AIOS Skills, Workflows, Runtime, and templates into the current user's directory so that tools such as Codex, Claude Code, OpenCode, Gemini, Antigravity, and WorkBuddy can read them.
|
|
12
12
|
|
|
13
13
|
## 2. Check the installation
|
|
14
14
|
|
|
@@ -35,13 +35,18 @@ cd /work/your-project
|
|
|
35
35
|
npx @archsight/aios init
|
|
36
36
|
```
|
|
37
37
|
|
|
38
|
-
When `--cwd` is not given, `init` uses the current directory. Existing `AGENTS.md`, `CLAUDE.md`, `GEMINI.md`, or `AI_CODING_RULES.md` files in the project are not overwritten.
|
|
38
|
+
When `--cwd` is not given, `init` uses the current directory. Existing `AGENTS.md`, `CLAUDE.md`, `GEMINI.md`, `OPENCODE.md`, or `AI_CODING_RULES.md` files in the project are not overwritten.
|
|
39
39
|
|
|
40
|
-
## 4.
|
|
40
|
+
## 4. Review the auto-detection results
|
|
41
41
|
|
|
42
|
-
|
|
42
|
+
By default `init` generates `.ai/profile-detection.md` and pre-fills `.ai/project-context.md`. Open these two files to check whether AIOS detected a suitable profile, Skill candidates, tech stack, and common commands.
|
|
43
|
+
|
|
44
|
+
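You usually do not need to choose a profile manually. If auto-detection does not match the actual project, override it with one of the commands below: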
|
|
43
45
|
|
|
44
46
|
```bash
|
|
47
|
+
npx @archsight/aios init --profile auto
|
|
48
|
+
npx @archsight/aios init --profile none
|
|
49
|
+
npx @archsight/aios init --profile all
|
|
45
50
|
npx @archsight/aios init --profile bim-platform
|
|
46
51
|
npx @archsight/aios init --profile construction-vision
|
|
47
52
|
npx @archsight/aios init --profile rag-knowledge
|
|
@@ -55,6 +60,7 @@ npx @archsight/aios init --profile rag-knowledge
|
|
|
55
60
|
AGENTS.md
|
|
56
61
|
CLAUDE.md
|
|
57
62
|
GEMINI.md
|
|
63
|
+
OPENCODE.md
|
|
58
64
|
AI_CODING_RULES.md
|
|
59
65
|
.ai/
|
|
60
66
|
```
|
|
@@ -84,6 +90,14 @@ npm test
|
|
|
84
90
|
|
|
85
91
|
### Which profile should I use?
|
|
86
92
|
|
|
93
|
+
By default you do not need to choose. First run:
|
|
94
|
+
|
|
95
|
+
```bash
|
|
96
|
+
npx @archsight/aios init
|
|
97
|
+
```
|
|
98
|
+
|
|
99
|
+
Then check the detection result in `.ai/profile-detection.md`. Override explicitly only when auto-detection clearly does not match the actual project:
|
|
100
|
+
|
|
87
101
|
- BIM / Revit / CAD / IFC platform: `bim-platform`
|
|
88
102
|
- Construction vision AI, detection, segmentation, depth estimation: `construction-vision`
|
|
89
103
|
- Standards and code knowledge base, RAG, GraphRAG: `rag-knowledge`
|
|
@@ -0,0 +1,6 @@
|
|
|
1
|
+
{
|
|
2
|
+
"name": "archsight-aios",
|
|
3
|
+
"version": "1.3.0",
|
|
4
|
+
"description": "面向建筑行业知识工作从业者与 AI 研发团队的 Skills、Workflow 与多 Agent 工具包 / Building-industry AI agent skills for BIM, IFC, RAG, GraphRAG, project evidence work, code review, and runtime governance.",
|
|
5
|
+
"contextFileName": "GEMINI.md"
|
|
6
|
+
}
|
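The new `gemini-extension.json` carries its own `version` field, which has to stay in step with `package.json` on every release. The sketch below shows one way a cross-manifest version check could work. It is a guess at the kind of consistency check `validate:skills` is described as performing, not its actual code: `findVersionMismatches` is a hypothetical name, and the manifests are passed in as plain objects rather than read from disk.

```javascript
// Hypothetical sketch: NOT the real scripts/validate-skills.mjs.
// Reports every host manifest whose version differs from package.json.
function findVersionMismatches(npmPackage, hostManifests) {
  const mismatches = [];
  for (const [file, manifest] of Object.entries(hostManifests)) {
    if (manifest.version !== npmPackage.version) {
      mismatches.push({
        file,
        expected: npmPackage.version,
        actual: manifest.version,
      });
    }
  }
  return mismatches;
}

// Values copied from the manifests shown in this diff.
const npmPackage = { name: "@archsight/aios", version: "1.3.0" };
const hostManifests = {
  "gemini-extension.json": {
    name: "archsight-aios",
    version: "1.3.0",
    contextFileName: "GEMINI.md",
  },
};

const mismatches = findVersionMismatches(npmPackage, hostManifests);
console.log(mismatches.length === 0 ? "versions consistent" : mismatches);
```

Returning a list of mismatches instead of throwing on the first one lets a release check report every out-of-date manifest in a single run.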
package/package.json
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "@archsight/aios",
|
|
3
|
-
"version": "1.
|
|
4
|
-
"description": "
|
|
3
|
+
"version": "1.3.0",
|
|
4
|
+
"description": "面向建筑行业知识工作从业者与 AI 研发团队的 Skills、Workflow 与多 Agent 工具包 / Building-industry AI agent skills for BIM, IFC, RAG, GraphRAG, project evidence work, code review, and runtime governance.",
|
|
5
5
|
"type": "module",
|
|
6
6
|
"homepage": "https://github.com/ArchSightLabs/archsight-aios#readme",
|
|
7
7
|
"repository": {
|
|
@@ -11,34 +11,62 @@
|
|
|
11
11
|
"bugs": {
|
|
12
12
|
"url": "https://github.com/ArchSightLabs/archsight-aios/issues"
|
|
13
13
|
},
|
|
14
|
-
"keywords": [
|
|
15
|
-
"ai",
|
|
16
|
-
"agents",
|
|
17
|
-
"architecture",
|
|
18
|
-
"bim",
|
|
19
|
-
"ifc",
|
|
20
|
-
"rag",
|
|
21
|
-
"graphrag",
|
|
22
|
-
"construction",
|
|
23
|
-
"
|
|
24
|
-
"
|
|
25
|
-
"
|
|
26
|
-
|
|
14
|
+
"keywords": [
|
|
15
|
+
"ai",
|
|
16
|
+
"agents",
|
|
17
|
+
"architecture",
|
|
18
|
+
"bim",
|
|
19
|
+
"ifc",
|
|
20
|
+
"rag",
|
|
21
|
+
"graphrag",
|
|
22
|
+
"construction",
|
|
23
|
+
"construction-ai",
|
|
24
|
+
"building-ai",
|
|
25
|
+
"aec",
|
|
26
|
+
"building-code",
|
|
27
|
+
"project-evidence-work",
|
|
28
|
+
"agent-skills",
|
|
29
|
+
"ai-agent",
|
|
30
|
+
"codex",
|
|
31
|
+
"gemini-cli",
|
|
32
|
+
"antigravity",
|
|
33
|
+
"claude-code",
|
|
34
|
+
"workbuddy",
|
|
35
|
+
"opencode",
|
|
36
|
+
"skills-sh",
|
|
37
|
+
"mcp",
|
|
38
|
+
"runtime-governance",
|
|
39
|
+
"architecture-review",
|
|
40
|
+
"code-review",
|
|
41
|
+
"design-review"
|
|
42
|
+
],
|
|
27
43
|
"bin": {
|
|
28
44
|
"archsight-aios": "./bin/archsight-aios.mjs"
|
|
29
45
|
},
|
|
30
|
-
"scripts": {
|
|
31
|
-
"doctor": "node ./bin/archsight-aios.mjs doctor",
|
|
32
|
-
"install:user": "node ./bin/archsight-aios.mjs install --target all --scope user",
|
|
33
|
-
"smoke:project": "node ./bin/archsight-aios.mjs validate --temp",
|
|
34
|
-
"
|
|
35
|
-
|
|
46
|
+
"scripts": {
|
|
47
|
+
"doctor": "node ./bin/archsight-aios.mjs doctor",
|
|
48
|
+
"install:user": "node ./bin/archsight-aios.mjs install --target all --scope user",
|
|
49
|
+
"smoke:project": "node ./bin/archsight-aios.mjs validate --temp",
|
|
50
|
+
"validate:skills": "node ./scripts/validate-skills.mjs",
|
|
51
|
+
"validate:prompts": "node ./scripts/validate-prompt-fixtures.mjs",
|
|
52
|
+
"validate:prompt-run-pack": "node ./scripts/build-prompt-run-pack.mjs --check",
|
|
53
|
+
"validate:public-advisory-run-pack": "node ./scripts/build-prompt-run-pack.mjs --fixture prompts/evaluations/engineering-business-public-advisory-fixtures.json --check",
|
|
54
|
+
"validate:prompt-run-results": "node ./scripts/validate-prompt-run-results.mjs --check-template",
|
|
55
|
+
"validate:prompt-outputs": "node ./scripts/validate-prompt-model-outputs.mjs",
|
|
56
|
+
"validate:prompt-scorecard": "node ./scripts/validate-prompt-scorecard.mjs",
|
|
57
|
+
"build:prompt-run-pack": "node ./scripts/build-prompt-run-pack.mjs --out prompts/evaluations/engineering-business-basic-run-pack.generated.json",
|
|
58
|
+
"build:public-advisory-run-pack": "node ./scripts/build-prompt-run-pack.mjs --fixture prompts/evaluations/engineering-business-public-advisory-fixtures.json --out prompts/evaluations/engineering-business-public-advisory-run-pack.generated.json",
|
|
59
|
+
"analyze:prompt-run-results": "node ./scripts/analyze-prompt-run-results.mjs",
|
|
60
|
+
"test": "node ./tests/cli.test.mjs"
|
|
61
|
+
},
|
|
36
62
|
"engines": {
|
|
37
63
|
"node": ">=18"
|
|
38
64
|
},
|
|
39
|
-
"files": [
|
|
40
|
-
"bin/",
|
|
41
|
-
"
|
|
65
|
+
"files": [
|
|
66
|
+
"bin/",
|
|
67
|
+
"scripts/",
|
|
68
|
+
"adapters/",
|
|
69
|
+
"skills/",
|
|
42
70
|
"workflows/",
|
|
43
71
|
"templates/",
|
|
44
72
|
"runtime/",
|
|
@@ -52,18 +80,22 @@
|
|
|
52
80
|
"standards/",
|
|
53
81
|
"infra/",
|
|
54
82
|
"prompts/",
|
|
55
|
-
"vision/",
|
|
56
|
-
"docs/",
|
|
57
|
-
"
|
|
58
|
-
"
|
|
59
|
-
"
|
|
83
|
+
"vision/",
|
|
84
|
+
"docs/",
|
|
85
|
+
".claude-plugin/",
|
|
86
|
+
"gemini-extension.json",
|
|
87
|
+
"LICENSE",
|
|
88
|
+
"CHANGELOG.md",
|
|
89
|
+
"RELEASE_NOTES.md",
|
|
90
|
+
"CONTRIBUTING.md",
|
|
60
91
|
"SECURITY.md",
|
|
61
92
|
"CODE_OF_CONDUCT.md",
|
|
62
93
|
"README.md",
|
|
63
|
-
"AI_CODING_RULES.md",
|
|
64
|
-
"AGENTS.md",
|
|
65
|
-
"CLAUDE.md",
|
|
66
|
-
"GEMINI.md"
|
|
67
|
-
|
|
94
|
+
"AI_CODING_RULES.md",
|
|
95
|
+
"AGENTS.md",
|
|
96
|
+
"CLAUDE.md",
|
|
97
|
+
"GEMINI.md",
|
|
98
|
+
"OPENCODE.md"
|
|
99
|
+
],
|
|
68
100
|
"license": "Apache-2.0"
|
|
69
101
|
}
|
package/prompts/README.md
CHANGED
|
@@ -11,3 +11,15 @@ Prompt 会腐化,因此不能只保存文本本身,必须保存评估和维
|
|
|
11
11
|
- [Prompt Registry](prompt-registry.md)
|
|
12
12
|
- [Prompt Evaluation Policy](evaluation-policy.md)
|
|
13
13
|
- [Prompt Failure Cases](failure-cases.md)
|
|
14
|
+
|
|
15
|
+
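The basic engineering business management prompts ship with their corresponding `aios-*` Skills; for entry points see [Prompt Registry](prompt-registry.md) and the [Engineering Business Management Starter Kit](../skills/engineering-business-starter-kit.md).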
|
|
16
|
+
For the comparison validation record, see [Engineering Business Management Basic Prompt Comparison Validation](evaluations/engineering-business-basic-prompts-2026-06-16.md).
|
|
17
|
+
For the advisory source-signal review, see [Engineering Business Basic Prompt Advisory Review Notes](evaluations/engineering-business-basic-advisory-validation-2026-06-16.md).
|
|
18
|
+
For the structured scorecard, see [Engineering Business Management Basic Prompt Scorecard](evaluations/engineering-business-basic-scorecard.json); validate it with `npm run validate:prompt-scorecard`.
|
|
19
|
+
Generate the weak/basic run pack with `npm run build:prompt-run-pack`; before generating, validate with `npm run validate:prompt-run-pack`.
|
|
20
|
+
Generate a weak/basic run results template with `node ./scripts/validate-prompt-run-results.mjs --init <file>`; validate the template and real results with `npm run validate:prompt-run-results` or `--file`.
|
|
21
|
+
Generate the weak/basic run results report with `npm run analyze:prompt-run-results -- --file <results> --out <report>`.
|
|
22
|
+
For a sample of the output structure, see [Engineering Business Management Basic Model Output Example](evaluations/engineering-business-basic-model-output.example.json); validate it with `npm run validate:prompt-outputs`.
|
|
23
|
+
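For public advisory validation cases, see the [public advisory fixture](evaluations/engineering-business-public-advisory-fixtures.json) and the [Markdown-normalized inputs](evaluations/public-advisory-md/). These Markdown files keep fictional clients, projects, people, locations, dates, amounts, and reference numbers and are used to validate extraction quality; parsing of original PDF / DOCX / image files is tested separately.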
|
|
24
|
+
|
|
25
|
+
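To compare weak prompts, portable strong prompts, and real Skill trigger results, use `aios-prompt-compare`. Real Skill results must come from output produced after the host tool actually triggers the corresponding `$aios-*` Skill; output from pasting `SKILL.md` in as an ordinary prompt does not count as an official Skill result.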
|
|
@@ -16,3 +16,73 @@
|
|
|
16
16
|
3. Compare output quality, risk, and adherence.
|
|
17
17
|
4. Record failure cases.
|
|
18
18
|
|
|
19
|
+
## Engineering business basic prompt regression
|
|
20
|
+
|
|
21
|
+
The basic engineering business management prompts use `prompts/evaluations/engineering-business-basic-fixtures.json` as the de-identified regression baseline.
|
|
22
|
+
|
|
23
|
+
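Public advisory validation cases use `prompts/evaluations/engineering-business-public-advisory-fixtures.json`, with the actual inputs placed under `prompts/evaluations/public-advisory-md/*.md`. These public cases use only Markdown-normalized inputs: clients, projects, people, locations, dates, amounts, and reference numbers are all fictional. They validate prompts, agent routing, field extraction, and output boundaries, not the PDF / DOCX / image parsing pipeline.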
|
|
24
|
+
|
|
25
|
+
After modifying `skills/aios-*/prompts/basic-prompt.md`, run:
|
|
26
|
+
|
|
27
|
+
```bash
|
|
28
|
+
npm run validate:prompts
|
|
29
|
+
```
|
|
30
|
+
|
|
31
|
+
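This check does not replace evaluation of real model output, but it ensures that the 6 basic scenario types, abstract source signals, required output structure, prohibited conclusions, and sensitive-information boundaries have not been broken.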
|
|
32
|
+
|
|
33
|
+
The structured comparison of ordinary prompts and basic prompts is stored in `prompts/evaluations/engineering-business-basic-scorecard.json`. After modifying fixtures, basic prompts, or scoring dimensions, run:
|
|
34
|
+
|
|
35
|
+
```bash
|
|
36
|
+
npm run validate:prompt-scorecard
|
|
37
|
+
```
|
|
38
|
+
|
|
39
|
+
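The scorecard fixes the comparison dimensions, weights, ordinary-prompt failure modes, and basic-prompt improvements. It is a design evaluation at the de-identified fixture level and does not replace batch evaluation of real model output.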
|
|
40
|
+
|
|
41
|
+
To run the weak/basic comparison inputs in batch, first generate the run packs:
|
|
42
|
+
|
|
43
|
+
```bash
|
|
44
|
+
npm run validate:prompt-run-pack
|
|
45
|
+
npm run build:prompt-run-pack
|
|
46
|
+
npm run validate:public-advisory-run-pack
|
|
47
|
+
npm run build:public-advisory-run-pack
|
|
48
|
+
```
|
|
49
|
+
|
|
50
|
+
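The basic run pack contains two inputs, one ordinary prompt and one basic prompt, for each of 6 cases, giving 12 run items in total. The public advisory run pack likewise produces 12 run items, but its `sampleInput` comes from the body of the Markdown-normalized inputs. This step only assembles de-identified / fictional inputs and prompt text; it does not call a model.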
|
|
51
|
+
|
|
52
|
+
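To evaluate the differences across ordinary prompts, portable strong prompts, and real Skill results, use `aios-prompt-compare`. The weak/basic runs can reuse the run pack; `skill-runtime` must be archived after the host tool actually triggers the corresponding `$aios-*` Skill, then compared in three columns against the same scorecard. Do not describe output from pasting `SKILL.md` in as an ordinary prompt as a real Skill result.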
|
|
53
|
+
|
|
54
|
+
After the paired weak/basic runs, archive the 12 results in a run results file:
|
|
55
|
+
|
|
56
|
+
```bash
|
|
57
|
+
npm run validate:prompt-run-results
|
|
58
|
+
node ./scripts/validate-prompt-run-results.mjs --init prompts/evaluations/<your-run-results-file>.json
|
|
59
|
+
node ./scripts/validate-prompt-run-results.mjs --file prompts/evaluations/<your-run-results-file>.json
|
|
60
|
+
```
|
|
61
|
+
|
|
62
|
+
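Run results validation requires basic-prompt output to contain the required sections and no prohibited conclusions; ordinary-prompt output is allowed to expose defects, and weak diagnostics are emitted for comparison and review.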
|
|
63
|
+
|
|
64
|
+
After validation passes, generate the run results analysis report:
|
|
65
|
+
|
|
66
|
+
```bash
|
|
67
|
+
npm run analyze:prompt-run-results -- --file prompts/evaluations/<your-run-results-file>.json --out prompts/evaluations/<your-analysis-report>.md
|
|
68
|
+
```
|
|
69
|
+
|
|
70
|
+
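The analysis report summarizes how many basic-prompt outputs passed the gate, the number of ordinary-prompt diagnostics, the scorecard verdict, and per-case differences, to inform later decisions on whether to adjust the basic prompts or fixtures.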
|
|
71
|
+
|
|
72
|
+
If you already have a model output file, validate the output structure against the same fixture:
|
|
73
|
+
|
|
74
|
+
```bash
|
|
75
|
+
npm run validate:prompt-outputs
|
|
76
|
+
node ./scripts/validate-prompt-model-outputs.mjs --file prompts/evaluations/<your-output-file>.json
|
|
77
|
+
```
|
|
78
|
+
|
|
79
|
+
To archive a real output, first generate a template to fill in:
|
|
80
|
+
|
|
81
|
+
```bash
|
|
82
|
+
node ./scripts/validate-prompt-model-outputs.mjs --init prompts/evaluations/<your-output-file>.json
|
|
83
|
+
```
|
|
84
|
+
|
|
85
|
+
The template's `output` is empty by default and will not pass validation; fill in the de-identified real model output, then check it with `--file`.
|
|
86
|
+
A real output file must include a traceable `model` and a parseable `ranAt`, and its `promptVersion` must match the current fixture version.
|
|
87
|
+
|
|
88
|
+
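The default file `engineering-business-basic-model-output.example.json` is only an output skeleton sample, used to validate the format and the checker itself; it does not represent real model evaluation results.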
|
|
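The policy above names three metadata rules for a real output file (a traceable `model`, a parseable `ranAt`, a `promptVersion` that matches the fixture) and adds that an empty `output` must fail. A minimal sketch of those rules follows. It is an illustration under assumptions, not the code of `validate-prompt-model-outputs.mjs`: the function name `checkRunMetadata`, the flat record layout, the error wording, and the sample values are all hypothetical.

```javascript
// Hypothetical sketch: NOT the real scripts/validate-prompt-model-outputs.mjs.
// Returns a list of problems; an empty list means the record passes.
function checkRunMetadata(record, fixtureVersion) {
  const errors = [];
  if (typeof record.model !== "string" || record.model.trim() === "") {
    errors.push("model must be a non-empty string");
  }
  // Date.parse returns NaN when the timestamp cannot be parsed.
  if (typeof record.ranAt !== "string" || Number.isNaN(Date.parse(record.ranAt))) {
    errors.push("ranAt must be a parseable timestamp");
  }
  if (record.promptVersion !== fixtureVersion) {
    errors.push("promptVersion must match the fixture version");
  }
  // A freshly generated template has an empty output and must not pass.
  if (typeof record.output !== "string" || record.output.trim() === "") {
    errors.push("output must not be empty");
  }
  return errors;
}

// Assumed record shapes with invented values, for illustration only.
const emptyTemplate = { model: "", ranAt: "", promptVersion: "", output: "" };
const filled = {
  model: "example-model",
  ranAt: "2026-06-16T10:00:00Z",
  promptVersion: "example-version",
  output: "de-identified model output goes here",
};

console.log(checkRunMetadata(emptyTemplate, "example-version").length);
console.log(checkRunMetadata(filled, "example-version").length);
```

Collecting all errors rather than stopping at the first keeps a single validation run useful when several fields of a hand-filled file are wrong at once.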
@@ -0,0 +1,87 @@
|
|
|
1
|
+
# Engineering Business Basic Prompt Advisory Review Notes
|
|
2
|
+
|
|
3
|
+
> Date: 2026-06-16
|
|
4
|
+
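> Scope: the 6 engineering business basic prompts, plus the legacy prompt pack and the ordinary / optimized output comparison records in the advisory workspace.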
|
|
5
|
+
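> Boundary: this file records only de-identified case shapes, output differences, and consolidation judgments; it does not copy original business materials, contacts, project names, amount details, or complete model output.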
|
|
6
|
+
|
|
7
|
+
## Review conclusion
|
|
8
|
+
|
|
9
|
+
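The current AIOS basic prompts are better suited than the legacy advisory prompts to being consolidated into general-purpose Skills. The reason is not that "the answers are longer" but that the experience-based rules in the legacy prompts have been condensed into a stable procedure: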
|
|
10
|
+
|
|
11
|
+
| Dimension | Legacy advisory prompt pack | AIOS basic prompts |
|
|
12
|
+
|---|---|---|
|
|
13
|
+
| Usage scenario | Serves PPT preparation and on-site sharing; files are relatively independent of each other | A reusable base pattern for `aios-*` Skills |
|
|
14
|
+
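| Input assessment | Each prompt has boundary hints, but they are scattered within individual files | Each Skill always first assesses material type, gaps, and verifiability |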
|
|
15
|
+
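| Output form | Can already produce matrices, checklists, ledgers, and look-back tables | Further unifies the Source Map, main table, items needing confirmation, review roles, and matters on which no conclusion can be drawn |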
|
|
16
|
+
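| Risk boundary | Relies on prompt text and manual usage habits | Fixes in place the prohibited conclusions, manual review roles, L0-L1 capability boundary, and validation scripts |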
|
|
17
|
+
| Degree of assetization | More like a one-off project material pack | Already part of the registry, manifest, install distribution, fixtures, scorecard, and CLI validation |
|
|
18
|
+
|
|
19
|
+
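So the "better" choice is not to copy the legacy advisory prompts directly, but to use the AIOS basic prompts as the general skill-pack version; the legacy advisory prompts continue to serve as source validation and case inspiration.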
|
|
20
|
+
|
|
21
|
+
## Read-only review sources
|
|
22
|
+
|
|
23
|
+
This review looked, read-only, at the following types of asset in advisory:
|
|
24
|
+
|
|
25
|
+
- `source/prompts/README.md`: usage, output validation, and file list for the legacy prompt pack.
|
|
26
|
+
- `source/prompts/01-...` to `source/prompts/06-...`: prompts for the 6 engineering business scenarios.
|
|
27
|
+
- `source/prompts/07-...`: case assignments, prompt optimization directions, and boundary notes from the final-draft stage.
|
|
28
|
+
- `source/prompt-runs/2026-06-14-普通与优化提示词输出对比.md`: scenario-by-scenario comparison of ordinary and optimized prompts.
|
|
29
|
+
|
|
30
|
+
Content not read or copied into AIOS:
|
|
31
|
+
|
|
32
|
+
- Full text of the original docx / pdf files.
|
|
33
|
+
- Real project names, contacts, and internal company terms of address.
|
|
34
|
+
- Complete model output, amount details, full contract clauses, or official document numbers.
|
|
35
|
+
|
|
36
|
+
## Scenario signal mapping
|
|
37
|
+
|
|
38
|
+
| AIOS caseId | Advisory abstract signal | AIOS consolidated result |
|
|
39
|
+
|---|---|---|
|
|
40
|
+
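| `commercial-tender-response-matrix` | Manual check questions raised after trialling a technical-bid tool, plus the scoring-point structure; not a full reading of the original tender | Consolidated into input type assessment, missing verifiable tender basis, a question response matrix, and a scoring-point response matrix |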
|
|
41
|
+
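| `commercial-contract-obligation-nodes` | An engineering contract excerpt with performance milestones, blank fields, and professional review boundaries | Consolidated into a blank-field checklist, key performance milestones, payment and settlement conditions, and matters on which no conclusion can be drawn |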
|
|
42
|
+
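| `construction-daily-issue-tracking` | A daily report with construction content, plus blank fields for resources, materials, photos, and so on | Consolidated into a management summary, an issue tracking table, and a template quality diagnosis; a blank field does not equal a site fact |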
|
|
43
|
+
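| `construction-meeting-action-closure` | Meeting minutes with speakers, statuses, and action items, but owners / deadlines are often incomplete | Consolidated into action-item closure, ownership leads, owners needing confirmation, and deadlines needing confirmation |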
|
|
44
|
+
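| `commercial-variation-evidence-chain` | Fields in a public sample form can illustrate the evidence-chain method but do not prove facts about a specific project | Consolidated into evidence-chain completeness, sample form field structure, process leads, and gaps in formal basis |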
|
|
45
|
+
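| `construction-scheme-assistive-review` | AI generation / review feedback on construction schemes touches the boundaries around parameters, drawings, local standards, and calculation sheets | Consolidated into assistive review criteria, inaccuracy retrospectives, look-back on expert revision notes, and a manual review question list |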
|
|
46
|
+
|
|
47
|
+
## Ordinary prompt failure modes
|
|
48
|
+
|
|
49
|
+
The advisory comparison records show that ordinary prompts share common problems across the 6 scenarios:
|
|
50
|
+
|
|
51
|
+
- They tend to assume the materials are complete and skip assessing input status.
|
|
52
|
+
- They tend to output paragraph summaries rather than matrices, ledgers, or look-back tables that can be divided among people.
|
|
53
|
+
- They tend to present content that was not provided, not filled in, or not seen as factual judgments.
|
|
54
|
+
- They tend to write business risk notes as legal, cost, quality and safety, or approval conclusions.
|
|
55
|
+
- They tend to treat a one-off answer as a tool capability, system capability, or formal delivery capability.
|
|
56
|
+
|
|
57
|
+
The AIOS basic prompts add uniform constraints to address these problems:
|
|
58
|
+
|
|
59
|
+
- `Source Map` and material status assessment.
|
|
60
|
+
- Main output table or checklist.
|
|
61
|
+
- `需补充确认` / `Need verify`。
|
|
62
|
+
- Manual review roles.
|
|
63
|
+
- `不能下结论的事项` (matters on which no conclusion can be drawn).
|
|
64
|
+
- `Claim / Evidence / Tool Result / Decision`。
|
|
65
|
+
|
|
66
|
+
## Impact on current AIOS assets
|
|
67
|
+
|
|
68
|
+
After this review, AIOS keeps the following consolidation approach:
|
|
69
|
+
|
|
70
|
+
- The 6 basic prompts stay in each Skill's `prompts/basic-prompt.md`.
|
|
71
|
+
- `engineering-business-basic-fixtures.json` adds `sourceSignals` and `advisoryComparison` to record de-identified source signals.
|
|
72
|
+
- `engineering-business-basic-scorecard.json` remains the structured judgment of "which set is better".
|
|
73
|
+
- `validate-prompt-fixtures.mjs` checks that source signals use an abstract prefix, to keep real material names from leaking back in.
|
|
74
|
+
|
|
75
|
+
## Validation still outstanding
|
|
76
|
+
|
|
77
|
+
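The current validation shows that prompt design, case coverage, boundary rules, and the asset distribution chain have taken shape, but it cannot claim that batch evaluation against real external models is complete.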
|
|
78
|
+
|
|
79
|
+
A real batch run requires:
|
|
80
|
+
|
|
81
|
+
1. Generate the weak/basic run pack.
|
|
82
|
+
2. Run each of the 12 inputs with the same model.
|
|
83
|
+
3. Fill the outputs into the run results JSON.
|
|
84
|
+
4. Run `validate-prompt-run-results.mjs --file`.
|
|
85
|
+
5. Run `analyze-prompt-run-results.mjs --file ... --out ...`.
|
|
86
|
+
|
|
87
|
+
Without real model output, the scorecard can serve only as a design review and static regression gate, not as a guarantee of model performance.
|
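The first two steps of the batch run rest on simple arithmetic: 6 cases, each paired with a weak and a basic prompt, yield 12 run items. The sketch below makes that pairing concrete using the six `caseId` values from the scenario mapping table. It is an assumption-laden illustration, not the code of `build-prompt-run-pack.mjs`: `buildRunItems`, the `runId` format, the item shape, and the placeholder inputs are all hypothetical.

```javascript
// Hypothetical sketch: NOT the real scripts/build-prompt-run-pack.mjs.
// Case ids are taken from the scenario signal mapping table.
const caseIds = [
  "commercial-tender-response-matrix",
  "commercial-contract-obligation-nodes",
  "construction-daily-issue-tracking",
  "construction-meeting-action-closure",
  "commercial-variation-evidence-chain",
  "construction-scheme-assistive-review",
];

// Pair every case with both prompt variants; no model is called here.
function buildRunItems(cases, variants = ["weak", "basic"]) {
  return cases.flatMap((c) =>
    variants.map((variant) => ({
      runId: `${c.caseId}:${variant}`, // assumed id format
      caseId: c.caseId,
      variant,
      sampleInput: c.sampleInput,
    }))
  );
}

// Placeholder inputs; real packs take sampleInput from the fixtures.
const cases = caseIds.map((caseId) => ({
  caseId,
  sampleInput: `placeholder input for ${caseId}`,
}));

const runItems = buildRunItems(cases);
console.log(runItems.length); // 6 cases x 2 variants = 12
```

Keeping the same `sampleInput` on both variants of a case is what makes the later weak/basic comparison a like-for-like one.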