@archsight/aios 1.2.0 → 1.3.1

Files changed (110)
  1. package/.claude-plugin/plugin.json +1 -1
  2. package/CHANGELOG.md +59 -0
  3. package/OPENCODE.md +23 -0
  4. package/README.md +64 -31
  5. package/RELEASE_NOTES.md +37 -0
  6. package/adapters/workbuddy/README.md +11 -1
  7. package/agents/README.md +6 -3
  8. package/agents/atlas/responsibilities.md +1 -1
  9. package/agents/atlas/system-prompt.md +1 -1
  10. package/agents/daedalus/system-prompt.md +2 -0
  11. package/agents/hestia/constraints.md +7 -0
  12. package/agents/hestia/responsibilities.md +7 -0
  13. package/agents/hestia/role.md +12 -0
  14. package/agents/hestia/system-prompt.md +23 -0
  15. package/agents/hestia/workflow.md +8 -0
  16. package/agents/plutus/constraints.md +7 -0
  17. package/agents/plutus/responsibilities.md +7 -0
  18. package/agents/plutus/role.md +12 -0
  19. package/agents/plutus/system-prompt.md +24 -0
  20. package/agents/plutus/workflow.md +8 -0
  21. package/agents/themis/constraints.md +7 -0
  22. package/agents/themis/responsibilities.md +7 -0
  23. package/agents/themis/role.md +12 -0
  24. package/agents/themis/system-prompt.md +24 -0
  25. package/agents/themis/workflow.md +8 -0
  26. package/bin/archsight-aios.mjs +558 -25
  27. package/docs/PUBLIC_DISCOVERY.md +16 -2
  28. package/docs/business-expert-guide.md +5 -3
  29. package/docs/glossary.md +11 -3
  30. package/docs/quickstart.md +18 -4
  31. package/gemini-extension.json +1 -1
  32. package/governance/README.md +41 -11
  33. package/governance/agent-boundary.md +1 -2
  34. package/governance/ai-review-policy.md +1 -2
  35. package/governance/arbitration-protocol.md +33 -33
  36. package/governance/context-policy.md +2 -3
  37. package/governance/delivery-policy.md +1 -2
  38. package/governance/memory-policy.md +1 -2
  39. package/governance/security-policy.md +1 -2
  40. package/memory/decision-records.md +8 -9
  41. package/package.json +17 -6
  42. package/prompts/README.md +12 -0
  43. package/prompts/evaluation-policy.md +70 -0
  44. package/prompts/evaluations/engineering-business-basic-advisory-validation-2026-06-16.md +87 -0
  45. package/prompts/evaluations/engineering-business-basic-fixtures.json +375 -0
  46. package/prompts/evaluations/engineering-business-basic-model-output.example.json +179 -0
  47. package/prompts/evaluations/engineering-business-basic-prompts-2026-06-16.md +205 -0
  48. package/prompts/evaluations/engineering-business-basic-scorecard.json +238 -0
  49. package/prompts/evaluations/engineering-business-public-advisory-fixtures.json +422 -0
  50. package/prompts/evaluations/public-advisory-md/01-technical-bid.md +63 -0
  51. package/prompts/evaluations/public-advisory-md/02-contract.md +61 -0
  52. package/prompts/evaluations/public-advisory-md/03-daily.md +69 -0
  53. package/prompts/evaluations/public-advisory-md/04-meeting.md +48 -0
  54. package/prompts/evaluations/public-advisory-md/05-variation.md +63 -0
  55. package/prompts/evaluations/public-advisory-md/06-scheme.md +60 -0
  56. package/prompts/failure-cases.md +5 -1
  57. package/prompts/prompt-registry.md +10 -0
  58. package/runtime/agent-routing.md +39 -9
  59. package/runtime/archsight-aios.manifest.json +154 -51
  60. package/runtime/hermes/agent-registry.md +3 -0
  61. package/runtime/hermes/workspace-binding.md +3 -0
  62. package/runtime/skill-routing.md +23 -12
  63. package/scripts/analyze-prompt-run-results.mjs +187 -0
  64. package/scripts/build-prompt-run-pack.mjs +248 -0
  65. package/scripts/validate-prompt-fixtures.mjs +225 -0
  66. package/scripts/validate-prompt-model-outputs.mjs +201 -0
  67. package/scripts/validate-prompt-run-results.mjs +259 -0
  68. package/scripts/validate-prompt-scorecard.mjs +133 -0
  69. package/scripts/validate-skills.mjs +8 -3
  70. package/skills/README.md +12 -6
  71. package/skills/aios/SKILL.md +79 -0
  72. package/skills/aios/agents/openai.yaml +4 -0
  73. package/skills/aios-arch/SKILL.md +14 -14
  74. package/skills/aios-ceo/SKILL.md +13 -13
  75. package/skills/aios-commercial-contract/SKILL.md +32 -14
  76. package/skills/aios-commercial-contract/prompts/basic-prompt.md +83 -0
  77. package/skills/aios-commercial-tender/SKILL.md +31 -13
  78. package/skills/aios-commercial-tender/prompts/basic-prompt.md +94 -0
  79. package/skills/aios-commercial-variation/SKILL.md +33 -15
  80. package/skills/aios-commercial-variation/prompts/basic-prompt.md +99 -0
  81. package/skills/aios-compare/SKILL.md +92 -0
  82. package/skills/aios-compare/agents/openai.yaml +4 -0
  83. package/skills/aios-construction-daily/SKILL.md +32 -14
  84. package/skills/aios-construction-daily/prompts/basic-prompt.md +76 -0
  85. package/skills/aios-construction-meeting/SKILL.md +32 -14
  86. package/skills/aios-construction-meeting/prompts/basic-prompt.md +78 -0
  87. package/skills/aios-construction-scheme/SKILL.md +28 -10
  88. package/skills/aios-construction-scheme/prompts/basic-prompt.md +90 -0
  89. package/skills/aios-plan/SKILL.md +7 -7
  90. package/skills/aios-prompt-compare/SKILL.md +180 -0
  91. package/skills/aios-prompt-compare/agents/openai.yaml +4 -0
  92. package/skills/aios-review/SKILL.md +1 -1
  93. package/skills/aios-structural/SKILL.md +7 -7
  94. package/skills/archsight-aios/SKILL.md +40 -0
  95. package/skills/archsight-aios/agents/openai.yaml +4 -0
  96. package/skills/engineering-business-starter-kit.md +112 -0
  97. package/templates/README.md +16 -2
  98. package/templates/project-ai/.ai/ARCHSIGHT_AIOS_RULES.md +5 -4
  99. package/templates/project-ai/.ai/agent-routing.md +3 -1
  100. package/templates/project-ai/.ai/profile-detection.md +24 -0
  101. package/templates/project-ai/.ai/project-context.md +4 -1
  102. package/templates/project-ai/.ai/skills.md +36 -24
  103. package/templates/project-ai/AGENTS.md +6 -5
  104. package/templates/project-ai/AI_CODING_RULES.md +1 -1
  105. package/templates/project-ai/CLAUDE.md +6 -5
  106. package/templates/project-ai/GEMINI.md +6 -5
  107. package/templates/project-ai/OPENCODE.md +26 -0
  108. package/workflows/README.md +1 -1
  109. package/workflows/architecture-review.md +10 -10
  110. package/workflows/site-daily-loop.md +25 -25
package/scripts/validate-prompt-model-outputs.mjs ADDED
@@ -0,0 +1,201 @@
1
+ #!/usr/bin/env node
2
+
3
+ import fs from "node:fs";
4
+ import path from "node:path";
5
+
6
+ const root = fs.realpathSync(process.cwd());
7
+ const errors = [];
8
+
9
+ const defaultOutputPath = "prompts/evaluations/engineering-business-basic-model-output.example.json";
10
+ const args = parseArgs(process.argv.slice(2));
11
+ const outputPath = args.file ?? repoPath(defaultOutputPath);
12
+ const fixturePath = repoPath("prompts/evaluations/engineering-business-basic-fixtures.json");
13
+ const fixture = readJson(fixturePath);
14
+ const outputFile = args.init ? undefined : readJson(outputPath);
15
+
16
+ const sensitiveTerms = [
17
+ "立信",
18
+ "费敏",
19
+ "闻总",
20
+ "谭总",
21
+ "茅盾中学",
22
+ "鸿益",
23
+ "太鑫",
24
+ "飞双",
25
+ "魔毯",
26
+ "客户内部",
27
+ "培训演示",
28
+ "基础内测",
29
+ "内测模式",
30
+ "内测包"
31
+ ];
32
+
33
+ function repoPath(...parts) {
34
+ const target = path.join(root, ...parts);
35
+ const relative = path.relative(root, target);
36
+ if (relative.startsWith("..") || path.isAbsolute(relative)) {
37
+ throw new Error(`Path traversal detected: ${target}`);
38
+ }
39
+ return target;
40
+ }
41
+
42
+ function parseArgs(argv) {
43
+ const parsed = {
44
+ file: undefined,
45
+ init: undefined,
46
+ force: false
47
+ };
48
+
49
+ for (let index = 0; index < argv.length; index += 1) {
50
+ const arg = argv[index];
51
+ if (arg === "--file") {
52
+ const value = argv[index + 1];
53
+ if (!value) {
54
+ errors.push("--file requires a path");
55
+ } else {
56
+ parsed.file = repoPath(value);
57
+ index += 1;
58
+ }
59
+ } else if (arg === "--init") {
60
+ const value = argv[index + 1];
61
+ if (!value) {
62
+ errors.push("--init requires a path");
63
+ } else {
64
+ parsed.init = repoPath(value);
65
+ index += 1;
66
+ }
67
+ } else if (arg === "--force") {
68
+ parsed.force = true;
69
+ } else {
70
+ errors.push(`Unknown argument: ${arg}`);
71
+ }
72
+ }
73
+
74
+ return parsed;
75
+ }
76
+
77
+ function readJson(filePath) {
78
+ try {
79
+ return JSON.parse(fs.readFileSync(filePath, "utf8"));
80
+ } catch (error) {
81
+ errors.push(`${path.relative(root, filePath)}: invalid JSON (${error.message})`);
82
+ return undefined;
83
+ }
84
+ }
85
+
86
+ function check(condition, message) {
87
+ if (!condition) errors.push(message);
88
+ }
89
+
90
+ function includesSensitiveTerm(value) {
91
+ const raw = typeof value === "string" ? value : JSON.stringify(value);
92
+ return sensitiveTerms.filter((term) => raw.includes(term));
93
+ }
94
+
95
+ function outputText(value) {
96
+ if (Array.isArray(value)) return value.join("\n");
97
+ if (typeof value === "string") return value;
98
+ return "";
99
+ }
100
+
101
+ function createOutputTemplate() {
102
+ if (!fixture) return;
103
+
104
+ if (fs.existsSync(args.init) && !args.force) {
105
+ errors.push(`${path.relative(root, args.init)} already exists; pass --force to overwrite`);
106
+ return;
107
+ }
108
+
109
+ const template = {
110
+ schema: 1,
111
+ name: "engineering-business-basic-model-output-run",
112
+ version: fixture.version ?? "0.1",
113
+ fixture: "prompts/evaluations/engineering-business-basic-fixtures.json",
114
+ isExample: false,
115
+ dataBoundary:
116
+ "Fill this file with de-identified model outputs only. Do not include customer names, contacts, project names, exact amounts, or raw source documents.",
117
+ outputs: (fixture.cases ?? []).map((item) => ({
118
+ caseId: item.id,
119
+ promptVersion: fixture.version ?? "0.1",
120
+ model: "",
121
+ ranAt: "",
122
+ notes: "",
123
+ promptPath: item.promptPath,
124
+ scenario: item.scenario,
125
+ expectedSections: item.expectedStrongSections,
126
+ bannedClaims: item.bannedClaims,
127
+ output: []
128
+ }))
129
+ };
130
+
131
+ fs.mkdirSync(path.dirname(args.init), { recursive: true });
132
+ fs.writeFileSync(args.init, `${JSON.stringify(template, null, 2)}\n`, "utf8");
133
+ console.log(`Prompt model output template written: ${path.relative(root, args.init)}`);
134
+ }
135
+
136
+ if (args.init) {
137
+ if (errors.length === 0) createOutputTemplate();
138
+ if (errors.length > 0) {
139
+ console.error(`Prompt model output validation failed with ${errors.length} error(s):`);
140
+ for (const error of errors) console.error(`- ${error}`);
141
+ process.exit(1);
142
+ }
143
+ process.exit(0);
144
+ }
145
+
146
+ if (fixture && outputFile) {
147
+ check(outputFile.schema === 1, "model output file: schema must be 1");
148
+ check(typeof outputFile.version === "string" && outputFile.version.length > 0, "model output file: version must be a string");
149
+ check(Array.isArray(outputFile.outputs), "model output file: outputs must be an array");
150
+ check(
151
+ outputFile.fixture === "prompts/evaluations/engineering-business-basic-fixtures.json",
152
+ "model output file: fixture path mismatch"
153
+ );
154
+
155
+ const casesById = new Map((fixture.cases ?? []).map((item) => [item.id, item]));
156
+ const expectedIds = [...casesById.keys()].sort();
157
+ const actualIds = (outputFile.outputs ?? []).map((item) => item.caseId).sort();
158
+
159
+ check(JSON.stringify(actualIds) === JSON.stringify(expectedIds), "model output file: case coverage mismatch");
160
+
161
+ const seenIds = new Set();
162
+ for (const item of outputFile.outputs ?? []) {
163
+ check(typeof item.caseId === "string" && item.caseId.length > 0, "model output item: missing caseId");
164
+ check(!seenIds.has(item.caseId), `${item.caseId}: duplicate model output`);
165
+ seenIds.add(item.caseId);
166
+
167
+ const sourceCase = casesById.get(item.caseId);
168
+ check(Boolean(sourceCase), `${item.caseId}: caseId not found in fixtures`);
169
+
170
+ check(typeof item.promptVersion === "string" && item.promptVersion.length > 0, `${item.caseId}: missing promptVersion`);
171
+ check(item.promptVersion === fixture.version, `${item.caseId}: promptVersion must match fixture version ${fixture.version}`);
172
+ check(typeof item.model === "string" && item.model.length > 0, `${item.caseId}: missing model`);
173
+ check(typeof item.ranAt === "string" && item.ranAt.length > 0, `${item.caseId}: missing ranAt`);
174
+ check(!Number.isNaN(Date.parse(item.ranAt)), `${item.caseId}: ranAt must be a parseable timestamp`);
175
+ check(typeof item.notes === "string", `${item.caseId}: notes must be a string`);
176
+ if (outputFile.isExample !== true) {
177
+ check(item.model !== "example-skeleton", `${item.caseId}: non-example output must use a real model identifier`);
178
+ }
179
+
180
+ const text = outputText(item.output);
181
+ check(text.length > 0, `${item.caseId}: output must be a non-empty string or string array`);
182
+
183
+ const sensitiveHits = includesSensitiveTerm(item);
184
+ check(sensitiveHits.length === 0, `${item.caseId}: sensitive terms leaked (${sensitiveHits.join(", ")})`);
185
+
186
+ for (const section of sourceCase?.expectedStrongSections ?? []) {
187
+ check(text.includes(section), `${item.caseId}: output missing expected section "${section}"`);
188
+ }
189
+ for (const bannedClaim of sourceCase?.bannedClaims ?? []) {
190
+ check(!text.includes(bannedClaim), `${item.caseId}: output contains prohibited claim "${bannedClaim}"`);
191
+ }
192
+ }
193
+ }
194
+
195
+ if (errors.length > 0) {
196
+ console.error(`Prompt model output validation failed with ${errors.length} error(s):`);
197
+ for (const error of errors) console.error(`- ${error}`);
198
+ process.exit(1);
199
+ }
200
+
201
+ console.log("Prompt model output validation passed.");
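The per-case checks in the script above come down to substring tests on the joined output text. The following is a minimal sketch of that idea, not the package's code: `checkOutputItem`, the sample case, and its section and claim strings are hypothetical and invented for illustration; only `outputText` mirrors the function shown in the diff.

```javascript
// Join an output that may be a string or an array of strings, as the script does.
function outputText(value) {
  if (Array.isArray(value)) return value.join("\n");
  if (typeof value === "string") return value;
  return "";
}

// Hypothetical helper: collect problems for one output item against its fixture case.
function checkOutputItem(item, sourceCase) {
  const problems = [];
  const text = outputText(item.output);
  if (text.length === 0) problems.push(`${item.caseId}: empty output`);
  for (const section of sourceCase.expectedStrongSections ?? []) {
    if (!text.includes(section)) {
      problems.push(`${item.caseId}: missing section "${section}"`);
    }
  }
  for (const claim of sourceCase.bannedClaims ?? []) {
    if (text.includes(claim)) {
      problems.push(`${item.caseId}: prohibited claim "${claim}"`);
    }
  }
  return problems;
}

// Invented sample data, for illustration only.
const sampleCase = {
  id: "case-1",
  expectedStrongSections: ["Evidence", "Open questions"],
  bannedClaims: ["guaranteed to win"]
};
const sampleItem = {
  caseId: "case-1",
  output: ["Evidence: clause 3 cited", "This bid is guaranteed to win"]
};

const sampleProblems = checkOutputItem(sampleItem, sampleCase);
console.log(sampleProblems);
```

Because the test is a plain `includes`, matching is exact and case-sensitive, so a heading with different wording or casing would be reported as missing.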
package/scripts/validate-prompt-run-results.mjs ADDED
@@ -0,0 +1,259 @@
1
+ #!/usr/bin/env node
2
+
3
+ import fs from "node:fs";
4
+ import path from "node:path";
5
+
6
+ const root = fs.realpathSync(process.cwd());
7
+ const errors = [];
8
+ const diagnostics = [];
9
+
10
+ const fixturePath = repoPath("prompts/evaluations/engineering-business-basic-fixtures.json");
11
+ const fixture = readJson(fixturePath);
12
+ const args = parseArgs(process.argv.slice(2));
13
+
14
+ const sensitiveTerms = [
15
+ "立信",
16
+ "费敏",
17
+ "闻总",
18
+ "谭总",
19
+ "茅盾中学",
20
+ "鸿益",
21
+ "太鑫",
22
+ "飞双",
23
+ "魔毯",
24
+ "客户内部",
25
+ "培训演示",
26
+ "基础内测",
27
+ "内测模式",
28
+ "内测包"
29
+ ];
30
+
31
+ function repoPath(...parts) {
32
+ const target = path.join(root, ...parts);
33
+ const relative = path.relative(root, target);
34
+ if (relative.startsWith("..") || path.isAbsolute(relative)) {
35
+ throw new Error(`Path traversal detected: ${target}`);
36
+ }
37
+ return target;
38
+ }
39
+
40
+ function parseArgs(argv) {
41
+ const parsed = {
42
+ file: undefined,
43
+ init: undefined,
44
+ force: false,
45
+ checkTemplate: argv.length === 0
46
+ };
47
+
48
+ for (let index = 0; index < argv.length; index += 1) {
49
+ const arg = argv[index];
50
+ if (arg === "--file") {
51
+ const value = argv[index + 1];
52
+ if (!value) {
53
+ errors.push("--file requires a path");
54
+ } else {
55
+ parsed.file = repoPath(value);
56
+ index += 1;
57
+ }
58
+ } else if (arg === "--init") {
59
+ const value = argv[index + 1];
60
+ if (!value) {
61
+ errors.push("--init requires a path");
62
+ } else {
63
+ parsed.init = repoPath(value);
64
+ index += 1;
65
+ }
66
+ } else if (arg === "--force") {
67
+ parsed.force = true;
68
+ } else if (arg === "--check-template") {
69
+ parsed.checkTemplate = true;
70
+ } else {
71
+ errors.push(`Unknown argument: ${arg}`);
72
+ }
73
+ }
74
+
75
+ if ([parsed.file, parsed.init, parsed.checkTemplate].filter(Boolean).length > 1) {
76
+ errors.push("Use only one mode: --file, --init, or --check-template");
77
+ }
78
+
79
+ return parsed;
80
+ }
81
+
82
+ function readJson(filePath) {
83
+ try {
84
+ return JSON.parse(fs.readFileSync(filePath, "utf8"));
85
+ } catch (error) {
86
+ errors.push(`${path.relative(root, filePath)}: invalid JSON (${error.message})`);
87
+ return undefined;
88
+ }
89
+ }
90
+
91
+ function check(condition, message) {
92
+ if (!condition) errors.push(message);
93
+ }
94
+
95
+ function includesSensitiveTerm(value) {
96
+ const raw = typeof value === "string" ? value : JSON.stringify(value);
97
+ return sensitiveTerms.filter((term) => raw.includes(term));
98
+ }
99
+
100
+ function outputText(value) {
101
+ if (Array.isArray(value)) return value.join("\n");
102
+ if (typeof value === "string") return value;
103
+ return "";
104
+ }
105
+
106
+ function expectedRuns() {
107
+ if (!fixture) return [];
108
+
109
+ return (fixture.cases ?? []).flatMap((item) => [
110
+ {
111
+ runId: `${item.id}::weak`,
112
+ caseId: item.id,
113
+ variant: "weak",
114
+ promptSource: "fixture.weakPrompt",
115
+ promptVersion: fixture.version,
116
+ expectedStrongSections: item.expectedStrongSections,
117
+ bannedClaims: item.bannedClaims,
118
+ weakFailureModes: item.weakFailureModes
119
+ },
120
+ {
121
+ runId: `${item.id}::basic`,
122
+ caseId: item.id,
123
+ variant: "basic",
124
+ promptSource: item.promptPath,
125
+ promptVersion: fixture.version,
126
+ expectedStrongSections: item.expectedStrongSections,
127
+ bannedClaims: item.bannedClaims,
128
+ weakFailureModes: item.weakFailureModes
129
+ }
130
+ ]);
131
+ }
132
+
133
+ function createTemplate() {
134
+ return {
135
+ schema: 1,
136
+ name: "engineering-business-basic-run-results",
137
+ version: fixture?.version ?? "0.1",
138
+ fixture: "prompts/evaluations/engineering-business-basic-fixtures.json",
139
+ runPack: "prompts/evaluations/engineering-business-basic-run-pack.generated.json",
140
+ isExample: false,
141
+ dataBoundary:
142
+ "Fill this file with de-identified weak/basic model outputs only. Do not include customer names, contacts, project names, exact amounts, or raw source documents.",
143
+ outputs: expectedRuns().map((item) => ({
144
+ runId: item.runId,
145
+ caseId: item.caseId,
146
+ variant: item.variant,
147
+ promptVersion: item.promptVersion,
148
+ model: "",
149
+ ranAt: "",
150
+ notes: "",
151
+ promptSource: item.promptSource,
152
+ expectedStrongSections: item.expectedStrongSections,
153
+ bannedClaims: item.bannedClaims,
154
+ weakFailureModes: item.weakFailureModes,
155
+ output: []
156
+ }))
157
+ };
158
+ }
159
+
160
+ function validateTemplateShape(template) {
161
+ check(template.schema === 1, "run results: schema must be 1");
162
+ check(template.version === fixture?.version, `run results: version must match fixture version ${fixture?.version}`);
163
+ check(template.fixture === "prompts/evaluations/engineering-business-basic-fixtures.json", "run results: fixture path mismatch");
164
+ check(Array.isArray(template.outputs), "run results: outputs must be an array");
165
+ check(template.outputs?.length === expectedRuns().length, "run results: output count must match weak/basic run count");
166
+
167
+ const sensitiveHits = includesSensitiveTerm(template);
168
+ check(sensitiveHits.length === 0, `run results: sensitive terms leaked (${sensitiveHits.join(", ")})`);
169
+
170
+ const expectedByRunId = new Map(expectedRuns().map((item) => [item.runId, item]));
171
+ const actualRunIds = (template.outputs ?? []).map((item) => item.runId).sort();
172
+ const expectedRunIds = [...expectedByRunId.keys()].sort();
173
+ check(JSON.stringify(actualRunIds) === JSON.stringify(expectedRunIds), "run results: runId coverage mismatch");
174
+
175
+ const seen = new Set();
176
+ for (const item of template.outputs ?? []) {
177
+ const expected = expectedByRunId.get(item.runId);
178
+ check(!seen.has(item.runId), `${item.runId}: duplicate output`);
179
+ seen.add(item.runId);
180
+ check(Boolean(expected), `${item.runId}: runId not found in expected run pack`);
181
+ check(item.caseId === expected?.caseId, `${item.runId}: caseId mismatch`);
182
+ check(item.variant === expected?.variant, `${item.runId}: variant mismatch`);
183
+ check(item.promptVersion === expected?.promptVersion, `${item.runId}: promptVersion mismatch`);
184
+ check(item.promptSource === expected?.promptSource, `${item.runId}: promptSource mismatch`);
185
+ check(Array.isArray(item.expectedStrongSections), `${item.runId}: expectedStrongSections must be an array`);
186
+ check(Array.isArray(item.bannedClaims), `${item.runId}: bannedClaims must be an array`);
187
+ check(Array.isArray(item.weakFailureModes), `${item.runId}: weakFailureModes must be an array`);
188
+ }
189
+ }
190
+
191
+ function validateRunResults(results) {
192
+ validateTemplateShape(results);
193
+
194
+ for (const item of results.outputs ?? []) {
195
+ check(typeof item.model === "string" && item.model.length > 0, `${item.runId}: missing model`);
196
+ check(item.model !== "example-skeleton", `${item.runId}: model must be a real model identifier`);
197
+ check(typeof item.ranAt === "string" && item.ranAt.length > 0, `${item.runId}: missing ranAt`);
198
+ check(!Number.isNaN(Date.parse(item.ranAt)), `${item.runId}: ranAt must be a parseable timestamp`);
199
+ check(typeof item.notes === "string", `${item.runId}: notes must be a string`);
200
+
201
+ const text = outputText(item.output);
202
+ check(text.length > 0, `${item.runId}: output must be a non-empty string or string array`);
203
+
204
+ const sensitiveHits = includesSensitiveTerm(item);
205
+ check(sensitiveHits.length === 0, `${item.runId}: sensitive terms leaked (${sensitiveHits.join(", ")})`);
206
+
207
+ const missingSections = (item.expectedStrongSections ?? []).filter((section) => !text.includes(section));
208
+ const prohibitedClaims = (item.bannedClaims ?? []).filter((claim) => text.includes(claim));
209
+
210
+ if (item.variant === "basic") {
211
+ for (const section of missingSections) {
212
+ errors.push(`${item.runId}: basic output missing expected section "${section}"`);
213
+ }
214
+ for (const claim of prohibitedClaims) {
215
+ errors.push(`${item.runId}: basic output contains prohibited claim "${claim}"`);
216
+ }
217
+ } else if (missingSections.length > 0 || prohibitedClaims.length > 0) {
218
+ diagnostics.push({
219
+ runId: item.runId,
220
+ missingSections,
221
+ prohibitedClaims
222
+ });
223
+ }
224
+ }
225
+ }
226
+
227
+ if (args.init) {
228
+ const template = createTemplate();
229
+ validateTemplateShape(template);
230
+ if (fs.existsSync(args.init) && !args.force) {
231
+ errors.push(`${path.relative(root, args.init)} already exists; pass --force to overwrite`);
232
+ }
233
+ if (errors.length === 0) {
234
+ fs.mkdirSync(path.dirname(args.init), { recursive: true });
235
+ fs.writeFileSync(args.init, `${JSON.stringify(template, null, 2)}\n`, "utf8");
236
+ console.log(`Prompt run results template written: ${path.relative(root, args.init)}`);
237
+ process.exit(0);
238
+ }
239
+ } else if (args.file) {
240
+ const results = readJson(args.file);
241
+ if (results) validateRunResults(results);
242
+ } else {
243
+ validateTemplateShape(createTemplate());
244
+ if (errors.length === 0) {
245
+ console.log("Prompt run results template validation passed.");
246
+ process.exit(0);
247
+ }
248
+ }
249
+
250
+ if (errors.length > 0) {
251
+ console.error(`Prompt run results validation failed with ${errors.length} error(s):`);
252
+ for (const error of errors) console.error(`- ${error}`);
253
+ process.exit(1);
254
+ }
255
+
256
+ console.log("Prompt run results validation passed.");
257
+ if (diagnostics.length > 0) {
258
+ console.log(`Weak output diagnostics: ${diagnostics.length} run(s) need comparison review.`);
259
+ }
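The script above expands every fixture case into a weak run and a basic run, then treats the two variants differently: problems in a basic output are errors, while problems in a weak output are only recorded as diagnostics. A minimal sketch of that behaviour, assuming hypothetical helper names (`expandRuns`, `classifyFindings`) and invented case ids and prompt paths:

```javascript
// Expand each fixture case into two runs keyed "<caseId>::<variant>".
function expandRuns(fixture) {
  return (fixture.cases ?? []).flatMap((item) =>
    ["weak", "basic"].map((variant) => ({
      runId: `${item.id}::${variant}`,
      caseId: item.id,
      variant,
      // Weak runs use the fixture's weak prompt; basic runs use the skill's prompt file.
      promptSource: variant === "weak" ? "fixture.weakPrompt" : item.promptPath
    }))
  );
}

// Basic-variant findings fail validation; weak-variant findings are informational.
function classifyFindings(variant, missingSections, prohibitedClaims) {
  if (missingSections.length === 0 && prohibitedClaims.length === 0) return "clean";
  return variant === "basic" ? "error" : "diagnostic";
}

// Invented sample fixture, for illustration only.
const sampleFixture = {
  version: "0.1",
  cases: [
    { id: "tender", promptPath: "skills/example-a/prompts/basic-prompt.md" },
    { id: "contract", promptPath: "skills/example-b/prompts/basic-prompt.md" }
  ]
};

const runs = expandRuns(sampleFixture);
console.log(runs.map((run) => run.runId));
console.log(classifyFindings("weak", ["Evidence"], []));
console.log(classifyFindings("basic", ["Evidence"], []));
```

Under this split a weak output is expected to fall short, so its gaps feed the later comparison review rather than failing the run.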
package/scripts/validate-prompt-scorecard.mjs ADDED
@@ -0,0 +1,133 @@
1
+ #!/usr/bin/env node
2
+
3
+ import fs from "node:fs";
4
+ import path from "node:path";
5
+
6
+ const root = fs.realpathSync(process.cwd());
7
+ const errors = [];
8
+
9
+ const fixturePath = repoPath("prompts/evaluations/engineering-business-basic-fixtures.json");
10
+ const scorecardPath = repoPath("prompts/evaluations/engineering-business-basic-scorecard.json");
11
+ const fixture = readJson(fixturePath);
12
+ const scorecard = readJson(scorecardPath);
13
+
14
+ const sensitiveTerms = [
15
+ "立信",
16
+ "费敏",
17
+ "闻总",
18
+ "谭总",
19
+ "茅盾中学",
20
+ "鸿益",
21
+ "太鑫",
22
+ "飞双",
23
+ "魔毯",
24
+ "客户内部",
25
+ "培训演示",
26
+ "基础内测",
27
+ "内测模式",
28
+ "内测包"
29
+ ];
30
+
31
+ function repoPath(...parts) {
32
+ const target = path.join(root, ...parts);
33
+ const relative = path.relative(root, target);
34
+ if (relative.startsWith("..") || path.isAbsolute(relative)) {
35
+ throw new Error(`Path traversal detected: ${target}`);
36
+ }
37
+ return target;
38
+ }
39
+
40
+ function readJson(filePath) {
41
+ try {
42
+ return JSON.parse(fs.readFileSync(filePath, "utf8"));
43
+ } catch (error) {
44
+ errors.push(`${path.relative(root, filePath)}: invalid JSON (${error.message})`);
45
+ return undefined;
46
+ }
47
+ }
48
+
49
+ function check(condition, message) {
50
+ if (!condition) errors.push(message);
51
+ }
52
+
53
+ function includesSensitiveTerm(value) {
54
+ const raw = typeof value === "string" ? value : JSON.stringify(value);
55
+ return sensitiveTerms.filter((term) => raw.includes(term));
56
+ }
57
+
58
+ function weightedScore(scores, criteria) {
59
+ const totalWeight = criteria.reduce((sum, item) => sum + item.weight, 0);
60
+ return (
61
+ criteria.reduce((sum, item) => {
62
+ return sum + scores[item.id] * item.weight;
63
+ }, 0) / totalWeight
64
+ );
65
+ }
66
+
67
+ if (fixture && scorecard) {
68
+ check(scorecard.schema === 1, "scorecard: schema must be 1");
69
+ check(scorecard.version === fixture.version, `scorecard: version must match fixture version ${fixture.version}`);
70
+ check(scorecard.fixture === "prompts/evaluations/engineering-business-basic-fixtures.json", "scorecard: fixture path mismatch");
71
+ check(Array.isArray(scorecard.criteria) && scorecard.criteria.length > 0, "scorecard: criteria must be a non-empty array");
72
+ check(Array.isArray(scorecard.cases) && scorecard.cases.length === fixture.cases.length, "scorecard: case count mismatch");
73
+
74
+ const sensitiveHits = includesSensitiveTerm(scorecard);
75
+ check(sensitiveHits.length === 0, `scorecard: sensitive terms leaked (${sensitiveHits.join(", ")})`);
76
+
77
+ const criteriaIds = new Set();
78
+ let totalWeight = 0;
79
+ for (const criterion of scorecard.criteria ?? []) {
80
+ check(/^[a-z0-9_]+$/.test(criterion.id), `${criterion.id ?? "unknown"}: invalid criterion id`);
81
+ check(!criteriaIds.has(criterion.id), `${criterion.id}: duplicate criterion`);
82
+ criteriaIds.add(criterion.id);
83
+ check(Number.isInteger(criterion.weight) && criterion.weight > 0, `${criterion.id}: weight must be a positive integer`);
84
+ totalWeight += criterion.weight ?? 0;
85
+ check(typeof criterion.description === "string" && criterion.description.length > 0, `${criterion.id}: missing description`);
86
+ }
87
+ check(totalWeight === 100, `scorecard: criteria weights must total 100, got ${totalWeight}`);
88
+
89
+ const fixtureById = new Map((fixture.cases ?? []).map((item) => [item.id, item]));
90
+ const expectedIds = [...fixtureById.keys()].sort();
91
+ const actualIds = (scorecard.cases ?? []).map((item) => item.caseId).sort();
92
+ check(JSON.stringify(actualIds) === JSON.stringify(expectedIds), "scorecard: case coverage mismatch");
93
+
94
+ for (const item of scorecard.cases ?? []) {
95
+ const sourceCase = fixtureById.get(item.caseId);
96
+ check(Boolean(sourceCase), `${item.caseId}: caseId not found in fixtures`);
97
+ check(item.winner === "basic", `${item.caseId}: winner must be basic`);
98
+ check(typeof item.decisionBasis === "string" && item.decisionBasis.length > 0, `${item.caseId}: missing decisionBasis`);
99
+ check(Array.isArray(item.basicPromptGains) && item.basicPromptGains.length > 0, `${item.caseId}: basicPromptGains must be non-empty`);
100
+ check(
101
+ JSON.stringify(item.observedWeakFailures ?? []) === JSON.stringify(sourceCase?.weakFailureModes ?? []),
102
+ `${item.caseId}: observedWeakFailures must match fixture weakFailureModes`
103
+ );
104
+
105
+ for (const scoreSetName of ["weakScores", "basicScores"]) {
106
+ const scoreSet = item[scoreSetName] ?? {};
107
+ const scoreIds = Object.keys(scoreSet).sort();
108
+ check(JSON.stringify(scoreIds) === JSON.stringify([...criteriaIds].sort()), `${item.caseId}: ${scoreSetName} coverage mismatch`);
109
+ for (const criterionId of criteriaIds) {
110
+ const value = scoreSet[criterionId];
111
+ check(Number.isInteger(value) && value >= 1 && value <= 5, `${item.caseId}: ${scoreSetName}.${criterionId} must be 1-5`);
112
+ }
113
+ }
114
+
115
+ const weakTotal = weightedScore(item.weakScores, scorecard.criteria);
116
+ const basicTotal = weightedScore(item.basicScores, scorecard.criteria);
117
+ check(
118
+ basicTotal - weakTotal >= scorecard.minimumWeightedDelta,
119
+ `${item.caseId}: weighted improvement ${Number(basicTotal - weakTotal).toFixed(2)} is below minimum ${scorecard.minimumWeightedDelta}`
120
+ );
121
+ }
122
+
123
+ check(scorecard.overallDecision?.winner === "basic", "scorecard: overall winner must be basic");
124
+ check(typeof scorecard.overallDecision?.notAClaim === "string", "scorecard: overallDecision.notAClaim is required");
125
+ }
126
+
127
+ if (errors.length > 0) {
128
+ console.error(`Prompt scorecard validation failed with ${errors.length} error(s):`);
129
+ for (const error of errors) console.error(`- ${error}`);
130
+ process.exit(1);
131
+ }
132
+
133
+ console.log("Prompt scorecard validation passed.");
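The scorecard check above compares a weighted average of the weak scores with a weighted average of the basic scores and requires the difference to reach `minimumWeightedDelta`. A small worked sketch follows; the criterion ids, weights, scores, and the threshold of 1 are invented for illustration, and only `weightedScore` follows the function shown in the diff.

```javascript
// Weighted average of per-criterion scores, as in the script.
function weightedScore(scores, criteria) {
  const totalWeight = criteria.reduce((sum, item) => sum + item.weight, 0);
  return criteria.reduce((sum, item) => sum + scores[item.id] * item.weight, 0) / totalWeight;
}

// Hypothetical criteria; the script requires integer weights that total 100.
const criteria = [
  { id: "evidence", weight: 75 },
  { id: "structure", weight: 25 }
];

// Hypothetical scores; the script requires integers from 1 to 5.
const weakScores = { evidence: 2, structure: 4 };
const basicScores = { evidence: 4, structure: 5 };

const weakTotal = weightedScore(weakScores, criteria); // (2*75 + 4*25) / 100 = 2.5
const basicTotal = weightedScore(basicScores, criteria); // (4*75 + 5*25) / 100 = 4.25
const delta = basicTotal - weakTotal; // 1.75

const minimumWeightedDelta = 1; // assumed threshold, for illustration only
console.log({ weakTotal, basicTotal, delta, passes: delta >= minimumWeightedDelta });
```

Note that `weightedScore` indexes into `scores` directly, so a case with a missing score set would need to be caught before this point; the sketch assumes both score sets are present.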
package/scripts/validate-skills.mjs CHANGED
@@ -5,6 +5,7 @@ import path from "node:path";
5
5
 
6
6
  const root = fs.realpathSync(process.cwd());
7
7
  const errors = [];
8
+ const topLevelSkillNames = new Set(["aios", "archsight-aios"]);
8
9
 
9
10
  function repoPath(...parts) {
10
11
  const target = path.join(root, ...parts);
@@ -67,7 +68,7 @@ if (manifest) {
67
68
  const manifestSkillIds = new Set(manifest.skills.map((skill) => skill.id));
68
69
  const skillDirs = fs
69
70
  .readdirSync(repoPath("skills"), { withFileTypes: true })
70
- .filter((entry) => entry.isDirectory() && entry.name.startsWith("aios-"))
71
+ .filter((entry) => entry.isDirectory() && (entry.name.startsWith("aios-") || topLevelSkillNames.has(entry.name)))
71
72
  .map((entry) => entry.name)
72
73
  .sort();
73
74
 
@@ -88,15 +89,19 @@ if (manifest) {
88
89
  check(frontmatter.name === skill.id, `${skillFile}: frontmatter name must be ${skill.id}`);
89
90
  check(Boolean(frontmatter.description), `${skillFile}: missing frontmatter description`);
90
91
  }
92
+
93
+ for (const requiredAsset of manifest.requiredAssets ?? []) {
94
+ check(exists(requiredAsset), `runtime/archsight-aios.manifest.json: required asset missing ${requiredAsset}`);
95
+ }
91
96
  }
92
97
 
93
98
  if (packageJson) {
94
- const requiredFiles = ["skills/", "scripts/", ".claude-plugin/", "gemini-extension.json"];
99
+ const requiredFiles = ["skills/", "scripts/", ".claude-plugin/", "gemini-extension.json", "OPENCODE.md"];
95
100
  for (const requiredFile of requiredFiles) {
96
101
  check(packageJson.files?.includes(requiredFile), `package.json: files must include ${requiredFile}`);
97
102
  }
98
103
 
99
- const requiredKeywords = ["agent-skills", "skills-sh", "gemini-cli", "claude-code", "workbuddy", "construction-ai"];
104
+ const requiredKeywords = ["agent-skills", "skills-sh", "gemini-cli", "claude-code", "workbuddy", "opencode", "construction-ai"];
100
105
  for (const keyword of requiredKeywords) {
101
106
  check(packageJson.keywords?.includes(keyword), `package.json: keywords must include ${keyword}`);
102
107
  }
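The changed filter above widens skill discovery from directories prefixed `aios-` to also include the two top-level entry points. A minimal sketch of the predicate on plain names, assuming a hypothetical `isSkillDir` helper and an invented directory listing (the `isDirectory()` check from the script is left out here):

```javascript
// The two top-level entry points named in the diff.
const topLevelSkillNames = new Set(["aios", "archsight-aios"]);

// Hypothetical predicate: a name counts as a skill directory if it has the
// "aios-" prefix or is one of the top-level entry points.
function isSkillDir(name) {
  return name.startsWith("aios-") || topLevelSkillNames.has(name);
}

// Invented listing, for illustration only.
const names = ["docs", "aios-ceo", "archsight-aios", "aiosx", "aios"];
const skillDirs = names.filter(isSkillDir).sort();
console.log(skillDirs);
```

With the old prefix-only filter, `aios` and `archsight-aios` would have been skipped; a lookalike such as `aiosx` is excluded under both versions.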
package/skills/README.md CHANGED
@@ -4,7 +4,7 @@
4
4
 
5
5
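  Each skill should be distilled into a repeatable, verifiable, governable unit of work rather than a one-line prompt. A Skill is the actual working method used in the project working directory; an Agent is a role identity and a boundary of responsibility.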
6
6
 
7
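- The differentiating goal of AIOS Skills is to give general-purpose AI coding tools more professional default judgment in construction-industry platform development. All `aios-*` Skills inherit this industry orientation; a Skill name indicates only the division of tasks, not that only one particular Skill targets the construction industry.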
7
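+ The differentiating goal of AIOS Skills is to give general-purpose AI coding tools more professional default judgment in construction-industry platform development. `aios` and `archsight-aios` are the top-level routing entry points; the other `aios-*` Skills handle specific tasks. A Skill name indicates only the division of tasks, not that only one particular Skill targets the construction industry.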
8
8
 
9
9
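  When a project involves BIM / IFC, building codes, intelligent drawing review, drawing / model processing, RAG / GraphRAG, task orchestration, audit evidence chains, structural mechanics, or long-term platform evolution, `aios-ceo`, `aios-design`, `aios-plan`, `aios-exec`, `aios-review`, `aios-arch`, `aios-knowledge`, `aios-structural`, and `aios-runtime` should all factor these industry constraints into their judgment. They differ only in focus: `aios-ceo` performs an executive-level, in-depth evaluation of construction-industry software / systems, placing product positioning, industry expertise, engineering credibility, evidence chains, and commercial validation in one decision framework; `aios-design` judges whether a UI proposal can support review, locating, re-checking, traceability, and delivery; `aios-arch` judges boundaries; `aios-knowledge` judges industry semantics; `aios-structural` judges structural-mechanics inputs, the solver chain, and the boundary of human sign-off; `aios-runtime` judges the AI / RAG runtime; `aios-plan` breaks down delivery; `aios-review` looks for risks; and `aios-exec` carries out controlled implementation.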
10
10
 
@@ -16,7 +16,7 @@ AIOS is a construction-industry enhancement layer, not a general-purpose task replacement. With AIOS installed,
16
16
 
17
17
  - The project explicitly enables `bim-platform`, `construction-vision`, `rag-knowledge`, or another construction-industry profile.
18
18
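  - The project context, README, `.ai/project-context.md`, or the user's task explicitly involves BIM / IFC / Revit / CAD, building codes, intelligent drawing review, construction vision, engineering knowledge bases, GraphRAG, drawing / model processing, evidence chains, human review, audit trails, or a construction-industry platform.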
19
- - The user explicitly asks to use ArchSight AIOS, an `aios-*` Skill, or a construction-industry review method.
19
+ - The user explicitly asks to use ArchSight AIOS, `aios`, `archsight-aios`, an `aios-*` Skill, or a construction-industry review method.
20
20
 
21
21
  Cases where industry enhancement is not enabled:
22
22
 
@@ -49,9 +49,11 @@ Skills can still use `SKILL.md` to describe operating methods, but where deterministic tools are involved
49
49
 
50
50
  Phase-one core skill pack:
51
51
 
52
- | Skill | Purpose |
53
- | --- | --- |
54
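- | `aios-ceo` | In-depth evaluation of construction-industry software / systems: product positioning, industry expertise, engineering credibility, evidence chains, commercial validation, scope trade-offs, and phased roadmap. |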
52
+ | Skill | Purpose |
53
+ | --- | --- |
54
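+ | `aios` | Top-level AIOS routing entry point: when the user says only "please analyze this document with the AIOS skill pack", it first identifies the material type and then routes to the appropriate specific Skill. |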
55
+ | `archsight-aios` | Brand-alias entry point for `aios`, used for natural invocations such as "ArchSight AIOS" or "AIOS skill pack". |
56
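+ | `aios-ceo` | In-depth evaluation of construction-industry software / systems: product positioning, industry expertise, engineering credibility, evidence chains, commercial validation, scope trade-offs, and phased roadmap. |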
55
57
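  | `aios-design` | Review of construction-industry platform UI proposals, workbench experience, evidence locating, review traceability, and front-end implementation handoff. |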
56
58
  | `aios-plan` | Delivery planning, task breakdown, dependencies, and verification order. |
57
59
  | `aios-exec` | Bounded code changes, bug fixes, documentation updates, and verification runs. |
@@ -60,9 +62,13 @@ Skills can still use `SKILL.md` to describe operating methods, but where deterministic tools are involved
60
62
  | `aios-knowledge` | BIM, IFC, building codes, drawing-review rules, and knowledge structuring. |
61
63
  | `aios-structural` | Review of structural mechanics, loads, boundary conditions, FEM, and the deterministic solver chain. |
62
64
  | `aios-runtime` | Design of Prompt, Context, Memory, MCP/Tool, RAG/GraphRAG, and multi-Agent Runtime. |
65
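+ | `aios-compare` | Document professionalism comparison: compares two documents, two versions, or two AI outputs to judge which is more professional, more reviewable, and better suited for delivery. |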
66
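+ | `aios-prompt-compare` | Internal Prompt / Skill testing tool: only when a developer explicitly invokes it, evaluates the weak prompt, the portable strong prompt, and the real Skill-triggered result on the same input, to judge whether the prompt should be distilled into a Skill. |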
63
67
 
64
68
  Engineering business management skill pack (Engineering Project Management):
65
69
 
70
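+ For engineering business management scenarios, refer directly to the [Engineering Business Management Starter Kit](engineering-business-starter-kit.md). The starter kit provides L0-L1 general-purpose prompt / Skill template capabilities: organizing engineering materials into matrices, checklists, ledgers, and questions for human review. It does not promise system construction, automated approval, professional conclusions, or a substitute for sign-off.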
71
+
66
72
  | Skill | Purpose |
67
73
  | --- | --- |
68
74
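  | `aios-commercial-tender` | Evidence chain for engineering tender responses; extracts scoring points, qualification requirements, bid-rejection risks, documentation gaps, and items for human review. |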
@@ -72,4 +78,4 @@ Skills can still use `SKILL.md` to describe operating methods, but where deterministic tools are involved
72
78
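  | `aios-commercial-variation` | Review of the documentation chain for engineering variations and site confirmations; sorts out contact sheets, meeting minutes, drawing changes, contract procedures, and documentation breakpoints. |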
73
79
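  | `aios-construction-scheme` | Assisted evidence-chain review of special construction schemes; extracts hazard sources, briefing key points, code verification points, calculation-document gaps, and items for expert review. |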
74
80
 
75
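- Engineering business management Skills only handle extraction of construction-engineering materials, evidence-chain organization, risk flagging, and triage for human review; they do not replace sign-off by legal, cost, supervision, safety, project manager, chief engineer, or expert roles. Where codes, regulations, structural calculations, quality and safety, amounts, schedule claims, or attribution of responsibility are involved, the output must include `Claim / Evidence / Tool Result / Decision`; without tool or human evidence, items may only be marked `Need verify` or `Hold for human`.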
81
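+ Engineering business management Skills only handle extraction of construction-engineering materials, evidence-chain organization, risk flagging, and triage for human review; they do not replace sign-off by legal, cost, supervision, safety, project manager, chief engineer, or expert roles. Where codes, regulations, structural calculations, quality and safety, amounts, schedule claims, or attribution of responsibility are involved, the output must use the Chinese-localized `判断事项 / 证据 / 工具结果 / 处理建议` (Claim / Evidence / Tool Result / Decision); without tool or human evidence, items may only be marked `需核验` (Need verify) or `转人工复核` (Hold for human).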