@hongmaple0820/scale-engine 0.25.0 → 0.26.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/LICENSE +15 -15
- package/README.en.md +368 -346
- package/README.md +548 -529
- package/dist/adapters/AiderAdapter.js +52 -52
- package/dist/adapters/AntigravityAdapter.d.ts +4 -0
- package/dist/adapters/AntigravityAdapter.js +21 -0
- package/dist/adapters/AntigravityAdapter.js.map +1 -0
- package/dist/adapters/ClaudeCodeAdapter.d.ts +4 -1
- package/dist/adapters/ClaudeCodeAdapter.js +34 -34
- package/dist/adapters/ClaudeCodeAdapter.js.map +1 -1
- package/dist/adapters/ClineAdapter.d.ts +4 -0
- package/dist/adapters/ClineAdapter.js +20 -0
- package/dist/adapters/ClineAdapter.js.map +1 -0
- package/dist/adapters/CodexAdapter.js +28 -28
- package/dist/adapters/CursorAdapter.js +26 -26
- package/dist/adapters/DeepSeekTuiAdapter.js +97 -97
- package/dist/adapters/DoubaoAdapter.js +33 -33
- package/dist/adapters/GeminiAdapter.js +26 -26
- package/dist/adapters/GenericProjectAgentAdapter.d.ts +29 -0
- package/dist/adapters/GenericProjectAgentAdapter.js +204 -0
- package/dist/adapters/GenericProjectAgentAdapter.js.map +1 -0
- package/dist/adapters/HermesAdapter.js +26 -26
- package/dist/adapters/JCodeAdapter.d.ts +4 -0
- package/dist/adapters/JCodeAdapter.js +19 -0
- package/dist/adapters/JCodeAdapter.js.map +1 -0
- package/dist/adapters/KiloCodeAdapter.d.ts +4 -0
- package/dist/adapters/KiloCodeAdapter.js +20 -0
- package/dist/adapters/KiloCodeAdapter.js.map +1 -0
- package/dist/adapters/KimiAdapter.js +32 -32
- package/dist/adapters/KiroAdapter.js +26 -26
- package/dist/adapters/OpenClawAdapter.js +26 -26
- package/dist/adapters/OpenCodeAdapter.js +26 -26
- package/dist/adapters/QCoderAdapter.js +26 -26
- package/dist/adapters/QoderAdapter.d.ts +4 -0
- package/dist/adapters/QoderAdapter.js +21 -0
- package/dist/adapters/QoderAdapter.js.map +1 -0
- package/dist/adapters/TraeAdapter.js +26 -26
- package/dist/adapters/VSCAdapter.js +26 -26
- package/dist/adapters/WindsurfAdapter.js +32 -32
- package/dist/adapters/WorkBuddyAdapter.js +26 -26
- package/dist/adapters/index.d.ts +5 -0
- package/dist/adapters/index.js +15 -0
- package/dist/adapters/index.js.map +1 -1
- package/dist/api/cli.js +133 -47
- package/dist/api/cli.js.map +1 -1
- package/dist/api/doctor.js +10 -3
- package/dist/api/doctor.js.map +1 -1
- package/dist/api/quickstart.js +7 -1
- package/dist/api/quickstart.js.map +1 -1
- package/dist/artifact/sqliteStore.js +89 -89
- package/dist/artifact/types.d.ts +1 -1
- package/dist/cli/phaseCommands.js +45 -45
- package/dist/context/AntiPatternRegistry.js +20 -20
- package/dist/context/ContextBuilder.js +155 -155
- package/dist/evolution/EvolutionEngine.js +31 -31
- package/dist/evolution/EvolutionEvaluator.d.ts +2 -0
- package/dist/evolution/EvolutionEvaluator.js +7 -1
- package/dist/evolution/EvolutionEvaluator.js.map +1 -1
- package/dist/fsm/FSMAgentBridge.js +11 -11
- package/dist/hooks/HookGeneratorEnhanced.js +218 -218
- package/dist/index.d.ts +1 -1
- package/dist/index.js +2 -2
- package/dist/index.js.map +1 -1
- package/dist/knowledge/SQLiteKnowledgeBase.js +28 -28
- package/dist/memory/MemoryBrain.js +52 -52
- package/dist/output/GovernanceDashboard.js +44 -44
- package/dist/output/HTMLArtifactLayer.js +31 -31
- package/dist/prompts/VibeTemplateGallery.js +121 -121
- package/dist/skills/SkillDiscovery.js +12 -1
- package/dist/skills/SkillDiscovery.js.map +1 -1
- package/dist/skills/routing/SkillPlanner.js +40 -40
- package/dist/workflow/EngineeringStandards.js +62 -62
- package/dist/workflow/GovernanceTemplatePacks.d.ts +1 -1
- package/dist/workflow/GovernanceTemplatePacks.js +1990 -162
- package/dist/workflow/GovernanceTemplatePacks.js.map +1 -1
- package/dist/workflow/GovernanceTemplates.d.ts +2 -0
- package/dist/workflow/GovernanceTemplates.js +1012 -1001
- package/dist/workflow/GovernanceTemplates.js.map +1 -1
- package/dist/workflow/ResourceGovernance.js +16 -16
- package/dist/workflow/TaskArtifactScaffolder.js +10 -10
- package/dist/workflow/UpgradeManager.d.ts +3 -2
- package/dist/workflow/UpgradeManager.js +134 -49
- package/dist/workflow/UpgradeManager.js.map +1 -1
- package/dist/workflow/WorkspaceTopology.js +18 -15
- package/dist/workflow/WorkspaceTopology.js.map +1 -1
- package/docs/ACTIVE_SECURITY_VISUAL_GATES.md +87 -87
- package/docs/BACKGROUND_HUNTER.md +62 -62
- package/docs/CODE_INTELLIGENCE.md +138 -138
- package/docs/CONTEXT_BUDGET.md +113 -113
- package/docs/DEPENDENCY_AUDIT.md +89 -89
- package/docs/EVOLUTION_SHADOW_MODE.md +63 -63
- package/docs/EXTERNAL_REFERENCES.md +63 -58
- package/docs/GITLAB_FLOW.md +125 -125
- package/docs/GOVERNANCE_DASHBOARD.md +85 -85
- package/docs/MEMORY_BRAIN.md +104 -104
- package/docs/MEMORY_FABRIC.md +134 -134
- package/docs/README.md +101 -92
- package/docs/RUNTIME_EVIDENCE.md +101 -101
- package/docs/SKILL-REPOSITORY.md +57 -57
- package/docs/SKILL_RADAR.md +122 -122
- package/docs/THIRD_PARTY_SKILLS.md +57 -57
- package/docs/WORKFLOW_EVAL.md +151 -151
- package/docs/guides/DEVELOPMENT_WORKFLOW.md +80 -0
- package/docs/guides/GETTING_STARTED.md +50 -0
- package/docs/start/README.md +78 -72
- package/docs/start/agent-governance-demo.md +107 -107
- package/docs/start/quickstart.md +137 -127
- package/docs/start/workflow-upgrade.md +32 -8
- package/docs/workflow/README.md +67 -0
- package/docs/workflow/node-library.md +52 -0
- package/docs/workflow/templates/api-contract.md +29 -0
- package/docs/workflow/templates/architecture-review.md +23 -0
- package/docs/workflow/templates/db-change-plan.md +20 -0
- package/docs/workflow/templates/docs-impact.md +17 -0
- package/docs/workflow/templates/e2e-plan.md +20 -0
- package/docs/workflow/templates/explore.md +16 -0
- package/docs/workflow/templates/github-actions-scale-preflight.yml +32 -0
- package/docs/workflow/templates/mini-prd.md +16 -0
- package/docs/workflow/templates/plan.md +37 -0
- package/docs/workflow/templates/pre-push-scale-preflight.sh +8 -0
- package/docs/workflow/templates/product-smoke.md +61 -0
- package/docs/workflow/templates/reality-check.md +28 -0
- package/docs/workflow/templates/resource-cleanup.md +17 -0
- package/docs/workflow/templates/resource-impact.md +25 -0
- package/docs/workflow/templates/review.md +12 -0
- package/docs/workflow/templates/runtime.md +23 -0
- package/docs/workflow/templates/security-review.md +26 -0
- package/docs/workflow/templates/skill-evidence.md +33 -0
- package/docs/workflow/templates/skill-plan.md +39 -0
- package/docs/workflow/templates/spec.md +17 -0
- package/docs/workflow/templates/standards-impact.md +28 -0
- package/docs/workflow/templates/summary.md +16 -0
- package/docs/workflow/templates/tasks.md +8 -0
- package/docs/workflow/templates/ui-spec.md +29 -0
- package/docs/workflow/templates/verification.md +20 -0
- package/docs/workflow/templates/visual-review.md +20 -0
- package/examples/demo-projects/agent-governance-demo/CONTEXT.md +14 -14
- package/examples/demo-projects/agent-governance-demo/README.md +48 -48
- package/examples/demo-projects/agent-governance-demo/docs/CONTEXT-MAP.md +14 -14
- package/examples/demo-projects/agent-governance-demo/package.json +22 -21
- package/examples/demo-projects/agent-governance-demo/src/oauth-state.ts +39 -39
- package/examples/demo-projects/agent-governance-demo/tests/oauth-state.test.ts +52 -52
- package/package.json +88 -78
package/docs/WORKFLOW_EVAL.md
CHANGED
@@ -1,151 +1,151 @@
# Workflow Eval Harness

Status: implemented baseline
Since: v0.22 development branch

The Workflow Eval Harness measures whether the workflow actually improves an Agent's engineering delivery quality, instead of relying on subjective impressions. It runs lightweight eval suites and records pass@k, fix iterations, tool calls, estimated tokens, and human corrections. Whenever a case fails, it keeps a Failure Replay.

## Commands

Initialize the default baseline suite:

```bash
scale eval init
scale eval init --suite workflow-baseline --json
```

Run a suite:

```bash
scale eval run --suite workflow-baseline
scale eval run --suite workflow-baseline --json
```

Compare two runs:

```bash
scale eval compare --baseline <run-id> --candidate <run-id>
scale eval compare --baseline <run-id> --candidate <run-id> --json
```
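This document does not specify what `scale eval compare` prints. As a rough mental model only, the TypeScript sketch below compares two runs metric by metric, assuming each run exposes the metrics from the Metrics table. `RunMetrics` and `compareRuns` are hypothetical names and are not part of the SCALE API. The rule that every non-rate metric should go down is also an assumption.

```typescript
// Hypothetical run summary: field names follow the Metrics table in this
// document, but this is not SCALE's actual run-record format.
interface RunMetrics {
  passAt1Rate: number;
  passAt3Rate: number;
  averageFixIterations: number;
  totalToolCalls: number;
  estimatedTokens: number;
  humanCorrections: number;
  failureReplayCount: number;
}

type MetricName = keyof RunMetrics;

// Assumption: pass rates should go up; every other metric should go down.
const HIGHER_IS_BETTER: MetricName[] = ["passAt1Rate", "passAt3Rate"];

interface MetricDelta {
  metric: MetricName;
  baseline: number;
  candidate: number;
  delta: number; // candidate minus baseline
  improved: boolean;
}

function compareRuns(baseline: RunMetrics, candidate: RunMetrics): MetricDelta[] {
  const names = Object.keys(baseline) as MetricName[];
  return names.map((metric) => {
    const delta = candidate[metric] - baseline[metric];
    const higherIsBetter = HIGHER_IS_BETTER.indexOf(metric) !== -1;
    return {
      metric,
      baseline: baseline[metric],
      candidate: candidate[metric],
      delta,
      // An unchanged metric (delta === 0) is not counted as an improvement.
      improved: higherIsBetter ? delta > 0 : delta < 0,
    };
  });
}
```

The sketch reports `improved` per metric instead of a single score. That makes it easy to see which costs went down and which went up.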
Generate a Markdown report:

```bash
scale eval report --run <run-id>
scale eval report --run <run-id> --output docs/worklog/eval-report.md
```

Inspect and promote failure replays:

```bash
scale eval failures --since 30d
scale eval replay <failure-id>
scale eval replay --task-id <task-id>
scale eval promote-failure <failure-id>
```

## Failure Replay To Memory

A Failure Replay is first of all local eval evidence. When a failure pattern will be useful for future work, ingest it into Memory Brain as an `incident` candidate:

```bash
scale memory ingest --from failure --failure-id <failure-id>
scale memory query "missing verification evidence"
scale memory promote <memory-node-id>
```

This does not automatically change standards or hooks. It only makes the failure queryable and backed by evidence, so that repeated mistakes can be promoted deliberately after review.
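The candidate-then-review flow described above can be modeled as a tiny state machine. This is a hypothetical sketch, not Memory Brain's data model. `IncidentMemory`, `ingestFailure`, `promote`, and `queryMemories` are illustrative names. The plain substring match in `queryMemories` stands in for whatever retrieval `scale memory query` actually performs.

```typescript
// Hypothetical model of the flow above; not Memory Brain's real schema.
type MemoryStatus = "candidate" | "promoted";

interface IncidentMemory {
  id: string;
  kind: "incident";
  sourceFailureId: string;
  summary: string;
  status: MemoryStatus;
  reviewedBy?: string;
}

// Ingesting a failure only ever creates a candidate; nothing is promoted here.
function ingestFailure(id: string, failureId: string, summary: string): IncidentMemory {
  return { id, kind: "incident", sourceFailureId: failureId, summary, status: "candidate" };
}

// Promotion is a separate, deliberate step that records who reviewed it.
// It returns a new object and leaves the candidate untouched.
function promote(memory: IncidentMemory, reviewer: string): IncidentMemory {
  if (reviewer.trim() === "") {
    throw new Error("promotion requires an explicit reviewer");
  }
  return { ...memory, status: "promoted", reviewedBy: reviewer };
}

// Naive case-insensitive substring search over summaries.
function queryMemories(memories: IncidentMemory[], text: string): IncidentMemory[] {
  const needle = text.toLowerCase();
  return memories.filter((m) => m.summary.toLowerCase().indexOf(needle) !== -1);
}
```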
## Storage

```text
.scale/evals/
├── suites/
├── runs/
├── failures/
└── improvements/
```

These files are local runtime evidence by default. Commit only curated summaries or intentional benchmark fixtures.

## Suite Shape

```json
{
  "version": "1.0",
  "id": "workflow-baseline",
  "name": "SCALE workflow baseline",
  "cases": [
    {
      "id": "governance-command-smoke",
      "type": "bugfix",
      "title": "Command evidence smoke",
      "task": "Verify that a local command can produce concrete eval evidence.",
      "phase": "verify",
      "successCriteria": ["command exits 0"],
      "attempts": [
        {
          "id": "attempt-1",
          "command": "node -e \"console.log('scale-eval-ok')\"",
          "expectedExitCode": 0,
          "outputContains": "scale-eval-ok"
        }
      ]
    }
  ]
}
```
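To make the suite shape concrete, here is a TypeScript sketch of types that mirror the example JSON above. It also includes one plausible pass rule for an attempt: the exit code must equal `expectedExitCode`, and the output must contain `outputContains` when it is set. Two parts are assumptions: treating `outputContains` as optional, and the `CommandResult` type itself. The real schema and checks may differ.

```typescript
// Types mirroring the example suite JSON; the real schema may have more fields.
interface EvalAttempt {
  id: string;
  command: string;
  expectedExitCode: number;
  outputContains?: string; // assumed optional
}

interface EvalCase {
  id: string;
  type: string; // e.g. "bugfix"
  title: string;
  task: string;
  phase: string; // e.g. "verify"
  successCriteria: string[];
  attempts: EvalAttempt[];
}

interface EvalSuite {
  version: string;
  id: string;
  name: string;
  cases: EvalCase[];
}

// Hypothetical captured result of running an attempt's command.
interface CommandResult {
  exitCode: number;
  stdout: string;
}

// Assumed pass rule: exact exit code, plus the output marker when one is given.
function attemptPassed(attempt: EvalAttempt, result: CommandResult): boolean {
  if (result.exitCode !== attempt.expectedExitCode) return false;
  return attempt.outputContains === undefined || result.stdout.indexOf(attempt.outputContains) !== -1;
}

// Minimal structural check; a real loader would validate every field.
function parseSuite(json: string): EvalSuite {
  const suite = JSON.parse(json) as EvalSuite;
  if (typeof suite.id !== "string" || !Array.isArray(suite.cases)) {
    throw new Error("suite needs a string id and a cases array");
  }
  return suite;
}
```

With the example attempt, an exit code of `0` and output containing `scale-eval-ok` pass. A non-zero exit code or missing marker fails.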
## Metrics

| Metric | Meaning |
| --- | --- |
| `passAt1Rate` | Share of cases that pass on the first complete attempt |
| `passAt3Rate` | Share of cases that pass within three attempts |
| `averageFixIterations` | Average number of fix loops after the first failure |
| `totalToolCalls` | Number of eval attempts; a rough proxy for tool-call cost |
| `estimatedTokens` | Estimated token cost of the task and output summaries |
| `humanCorrections` | Number of human corrections |
| `failureReplayCount` | Number of failure replay records |
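The table defines these metrics only in words. The sketch below is one reading of the first four, assuming each case records its attempt outcomes in order:

- pass@1 counts cases whose first attempt passed.
- pass@3 counts cases that passed within the first three attempts.
- A case's fix iterations are the attempts after a failed first attempt, up to the first pass.
- Each attempt counts as one tool call, as the table suggests.

`CaseResult` and `computePassMetrics` are hypothetical names; SCALE's own computation may differ.

```typescript
// One case's attempt outcomes in run order (true = attempt passed).
interface CaseResult {
  caseId: string;
  attempts: boolean[];
}

interface PassMetrics {
  passAt1Rate: number;
  passAt3Rate: number;
  averageFixIterations: number;
  totalToolCalls: number;
}

function computePassMetrics(results: CaseResult[]): PassMetrics {
  const n = results.length;
  let passAt1 = 0;
  let passAt3 = 0;
  let fixSum = 0;
  let fixCases = 0;
  let toolCalls = 0;

  for (const r of results) {
    toolCalls += r.attempts.length; // each attempt counted as one tool call
    const firstPass = r.attempts.indexOf(true); // -1 if the case never passed
    if (firstPass === 0) passAt1++;
    if (firstPass >= 0 && firstPass < 3) passAt3++;
    if (r.attempts.length > 0 && firstPass !== 0) {
      // First attempt failed: count the attempts after it (up to the first pass).
      fixSum += firstPass > 0 ? firstPass : r.attempts.length - 1;
      fixCases++;
    }
  }

  return {
    passAt1Rate: n === 0 ? 0 : passAt1 / n,
    passAt3Rate: n === 0 ? 0 : passAt3 / n,
    averageFixIterations: fixCases === 0 ? 0 : fixSum / fixCases,
    totalToolCalls: toolCalls,
  };
}
```

Under this reading, `averageFixIterations` is averaged only over cases whose first attempt failed. That way, cases that pass immediately do not dilute it.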
## Failure Replay

A failure record keeps more than the final failed status. It also stores:

- task and success criteria
- phase
- wrong turn
- evidence
- correction
- prevention
- replay command
- redaction status

The current failure categories are:

- `wrong-exploration-path`
- `hallucinated-project-fact`
- `missing-codegraph-or-graph-fallback`
- `over-broad-context-load`
- `bad-skill-recommendation`
- `missing-verification-evidence`
- `failed-security-or-resource-gate`
- `human-correction-after-agent-confidence`
- `command-failure`
- `unknown`
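The stored fields and categories listed above map naturally onto a typed record. This TypeScript sketch is illustrative only. The field names in `FailureReplay` and the `toCategory` fallback to `unknown` are assumptions, not SCALE's on-disk format. Only the category strings come from this document.

```typescript
// The documented failure categories, as a readonly tuple plus a literal union.
const FAILURE_CATEGORIES = [
  "wrong-exploration-path",
  "hallucinated-project-fact",
  "missing-codegraph-or-graph-fallback",
  "over-broad-context-load",
  "bad-skill-recommendation",
  "missing-verification-evidence",
  "failed-security-or-resource-gate",
  "human-correction-after-agent-confidence",
  "command-failure",
  "unknown",
] as const;

type FailureCategory = (typeof FAILURE_CATEGORIES)[number];

// Hypothetical record: roughly one field per item in the "also stores" list.
interface FailureReplay {
  id: string;
  category: FailureCategory;
  task: string;
  successCriteria: string[];
  phase: string;
  wrongTurn: string;
  evidence: string[];
  correction: string;
  prevention: string;
  replayCommand: string;
  redacted: boolean; // redaction status
}

// Maps a free-form label to a documented category, falling back to "unknown".
function toCategory(label: string): FailureCategory {
  return (FAILURE_CATEGORIES as readonly string[]).indexOf(label) !== -1
    ? (label as FailureCategory)
    : "unknown";
}
```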
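`scale eval promote-failure` promotes a failure replay to an improvement candidate, but it does not modify project standards automatically. Whether the candidate becomes a long-term standard still has to be confirmed by a person or a later review.

## Governance Use

- The default suite in v0.22 is a lightweight smoke baseline that checks whether the eval pipeline runs at all.
- Real projects should gradually add bugfix, feature, security, frontend, release, and resource cases.
- Failure Replay should work together with Resource Governance. Records stay local by default. Only summaries, benchmarks, or cases explicitly meant for long-term maintenance are committed.
- Workflow Eval data can feed a later Governance ROI analysis. That analysis judges whether a given governance module actually reduces rework, tool calls, tokens, or human corrections.

## Policy

- Eval pass rates must not replace real project verification.
- Command output in failure records gets basic redaction. Even so, avoid writing sensitive raw logs into a suite.
- Low-cost smoke suites can run frequently; heavy project suites should run on demand.
- Without eval evidence, do not claim that workflow capability has improved.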
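The Policy mentions basic redaction of command output in failure records but does not state the rules. The sketch below shows what basic redaction can look like and how it could set the redaction status. The patterns are illustrative assumptions, not the rules SCALE applies. A handful of regexes like these will miss many secret formats, which is why sensitive raw logs should still stay out of suites.

```typescript
// Illustrative rules only: mask a few common secret shapes in command output.
const REDACTION_RULES: Array<[RegExp, string]> = [
  // key=value or key: value pairs whose key looks secret
  [/\b(api[_-]?key|token|secret|password)\b(\s*[:=]\s*)\S+/gi, "$1$2[REDACTED]"],
  // Bearer tokens, e.g. in Authorization headers
  [/\bBearer\s+[A-Za-z0-9._~+\/-]+=*/g, "Bearer [REDACTED]"],
];

interface RedactionResult {
  text: string;
  redacted: boolean; // true if any rule changed the text
}

function redactOutput(output: string): RedactionResult {
  let text = output;
  for (const [pattern, replacement] of REDACTION_RULES) {
    text = text.replace(pattern, replacement);
  }
  return { text, redacted: text !== output };
}
```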
package/docs/guides/DEVELOPMENT_WORKFLOW.md
@@ -0,0 +1,80 @@
# SCALE Engine Development Workflow

This document describes how to do day-to-day work in the `scale-engine` repository under the current engineering workflow.

## Standard Loop

```text
Explore -> Plan -> Execute -> Verify -> Consolidate
```

## 1. Explore

Goal: understand the actual state of the repository before changing anything.

```bash
make new-task NAME=task-slug LEVEL=M
make plan NAME=task-slug LEVEL=M
make explore FILES='AGENTS.md CLAUDE.md README.md package.json src/api/cli.ts' MSG='main contradiction'
make gate-workflow
```

Minimum requirements:

- Read at least 3 relevant files.
- State the main contradiction explicitly instead of only listing file names.
- Flag anything you are unsure about explicitly; do not guess.

## 2. Plan

In `.planning/tasks/<task>/plan.md`, fill in at least the following:

- scope / boundary
- acceptance criteria
- exception / failure path
- rollback / fallback
- verification commands
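A rough way to self-check a plan before running `make gate-workflow` is to look for each required item by keyword. This TypeScript sketch is hypothetical: `missingPlanItems` and its keyword lists are illustrative. It is not necessarily how the repository's gates inspect `plan.md`. A keyword hit only shows that a topic is mentioned, not that the section is adequate.

```typescript
// Required plan items with keywords; the keywords are illustrative guesses.
const REQUIRED_PLAN_ITEMS: Array<{ item: string; keywords: string[] }> = [
  { item: "scope / boundary", keywords: ["scope", "boundary"] },
  { item: "acceptance criteria", keywords: ["acceptance criteria"] },
  { item: "exception / failure path", keywords: ["exception", "failure path"] },
  { item: "rollback / fallback", keywords: ["rollback", "fallback"] },
  { item: "verification commands", keywords: ["verification"] },
];

// Returns the items for which none of the keywords appear (case-insensitive).
function missingPlanItems(planMarkdown: string): string[] {
  const text = planMarkdown.toLowerCase();
  return REQUIRED_PLAN_ITEMS
    .filter((req) => req.keywords.every((k) => text.indexOf(k) === -1))
    .map((req) => req.item);
}
```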
If the task changes releases, permissions, security, credentials, npm publishing, or destructive behavior, treat it as `CRITICAL`.
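The escalation rule can be stated as a small function. In this sketch the `RiskArea` names are shorthand for the areas in the sentence above, and `effectiveLevel` and `isRiskArea` are hypothetical helpers. A task that touches any of those areas is treated as `CRITICAL`. Otherwise it keeps the requested `LEVEL`, such as `M`.

```typescript
// Shorthand for the sensitive areas named above (names are illustrative).
const CRITICAL_AREAS = [
  "release",
  "permissions",
  "security",
  "credentials",
  "npm-publish",
  "destructive-behavior",
] as const;

type RiskArea = (typeof CRITICAL_AREAS)[number];

// Narrows a free-form label to a known sensitive area.
function isRiskArea(label: string): label is RiskArea {
  return (CRITICAL_AREAS as readonly string[]).indexOf(label) !== -1;
}

// Any sensitive area escalates the task; otherwise the requested level stands.
function effectiveLevel(requestedLevel: string, touchedAreas: RiskArea[]): string {
  return touchedAreas.length > 0 ? "CRITICAL" : requestedLevel;
}
```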
## 3. Execute

Principles:

- Make only the minimum necessary changes.
- Reuse existing scripts and `npm` commands first; do not invent a second set of commands.
- When you change behavior in `src/`, update `tests/` in the same change; otherwise gate G3 will block it.

## 4. Verify

Recommended order:

```bash
make gate-quality
make verify PROFILE=default
git diff --check
```

Where:

- `G4` checks that the workflow scripts themselves parse.
- `G5` runs `lint + typecheck + test + build`.
- `G6` checks task evidence and diff hygiene.
- `G7` covers security; by default it runs `npm audit --audit-level=high`.
- `G8` checks basic hygiene of Markdown and workflow documents.

## 5. Consolidate

Keep:

- `verification.md`
- `review.md`
- `summary.md`
- Necessary updates to long-term rule documents

Do not keep:

- Temporary logs
- Worktree state
- Screenshots, traces, caches
- Intermediate files that only matter for a single task
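Before committing, it can help to screen paths against the "do not keep" list. The TypeScript sketch below is a hypothetical pre-commit check. Its regexes are illustrative translations of that list, plus the `.claude/worktrees/` and `.agent/state/` paths that the repository's getting-started guide also says not to commit. They assume forward-slash relative paths and are not the repository's actual ignore rules.

```typescript
// Illustrative rules derived from the "do not keep" list; not a shipped ignore file.
const DO_NOT_COMMIT: Array<[RegExp, string]> = [
  [/\.log$/i, "temporary log"],
  [/(^|\/)\.claude\/worktrees\//, "worktree state"],
  [/(^|\/)\.agent\/state\//, "agent runtime state"],
  [/(^|\/)screenshots?\//i, "screenshot"],
  [/(^|\/)(traces?|\.?cache)\//i, "trace or cache"],
];

// Returns the reason a path should stay out of the commit, or null if none matched.
function rejectReason(path: string): string | null {
  for (const [pattern, reason] of DO_NOT_COMMIT) {
    if (pattern.test(path)) return reason;
  }
  return null;
}
```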
package/docs/guides/GETTING_STARTED.md
@@ -0,0 +1,50 @@
# Getting Started with the SCALE Engine Repository

This document is for people developing the `scale-engine` repository itself, not for end users installing the CLI.

## 15-Minute Path

1. Read the root [README.md](../../README.md) first.
2. Run this repository's workflow preflight:

   ```bash
   make preflight
   ```

3. List the verification surfaces that are currently available:

   ```bash
   make verify-list
   ```

4. Create a task skeleton and record your exploration:

   ```bash
   make new-task NAME=example LEVEL=M
   make plan NAME=example LEVEL=M
   make explore FILES='AGENTS.md CLAUDE.md README.md package.json' MSG='main contradiction'
   make gate-workflow
   ```

5. After making changes, run the quality gates:

   ```bash
   make gate-quality
   make verify PROFILE=default
   git diff --check
   ```

## What You Should See

- `.scale/workspace.json` defines the repository's `dev -> master` branch strategy.
- `.agent/project.json` defines this repository's Node/TypeScript verification commands.
- `scripts/gates/*` and `scripts/workflow/*` are executable entry points, not documentation.
- `.planning/tasks/<date>-<task>/` holds task-level evidence; temporary process notes no longer go into `docs/`.

## Common Pitfalls

- Passing `make gate-workflow` does not mean code quality has passed.
- Passing `make gate-quality` does not mean you have recorded risks, rollback, and unverified items.
- `G8` checks the hygiene of changed Markdown and workflow documents; it does not replace business verification.
- `--dry-run` only proves that an entry point exists; do not report it as "tests passed".
- Do not commit `.claude/worktrees/`, `.agent/state/`, logs, or screenshots to the repository.
package/docs/start/README.md
CHANGED
@@ -1,72 +1,78 @@
# SCALE Engine Onboarding Path

This directory is for new users. The goal is to get one minimal path working first and then understand the full system. You do not need to master every command at the start.

## Recommended Reading Order

1. [3-Minute Quickstart](quickstart.md)
   Initialize the governance workflow from an empty directory and see `.scale`, the templates, the verification profiles, and the status output.

2. [Artifact Lifecycle](artifact-lifecycle.md)
   Walk through Need → Spec → Plan → Task → Change → Evidence → Release end to end. See how the FSM and Guards replace prompt-level suggestions with hard constraints.

3. [Official Demo Walkthrough](agent-governance-demo.md)
   An OAuth state hardening task demonstrates context alignment, a diagnostic plan, TDD slices, HTML artifacts, resource governance, and engineering-standards scanning.

4. Return to the root [README](../../README.md)
   Understand SCALE Engine's core capabilities and how to choose a governance pack.

5. [Workflow Upgrade Guide](workflow-upgrade.md)
   Learn what to do when the workflow or third-party skills/MCP/CLI are updated: check first, generate a plan, refresh clean managed files automatically, and avoid overwriting local changes.

6. See the [documentation map](../README.md)
   Tells apart user guides, reference material, and historical plans and process records.

If you are developing the `scale-engine` repository itself rather than integrating SCALE into another project, read these instead:

- [../guides/GETTING_STARTED.md](../guides/GETTING_STARTED.md)
- [../guides/DEVELOPMENT_WORKFLOW.md](../guides/DEVELOPMENT_WORKFLOW.md)
- [../workflow/README.md](../workflow/README.md)

## 15-Minute Learning Path

```bash
npm install -g @hongmaple0820/scale-engine
scale --version
mkdir scale-demo && cd scale-demo
scale init --governance-pack standard
scale preflight --preflight-profile quick
scale status
```

When it finishes, answer three questions first:

- Which verification profiles does `.scale/verification.json` define?
- Which task artifact templates are in `docs/workflow/templates/`?
- What does `scale status` suggest as the next step?

If you cannot answer these three questions, hold off on the advanced commands.

## What You Should See First

After running the quickstart, you should at least see that:

- `scale preflight --preflight-profile quick` runs.
- `scale status` tells you what the current project should do next.
- `.scale/verification.json` exists and describes the local verification profiles.
- `docs/workflow/templates/` exists and contains templates such as Mini-PRD, plan, verification, review, and summary.
- `scale artifact render` can render task Markdown evidence to HTML.

If any of these steps fails, read the command output first instead of assuming it is an environment problem. SCALE's principle is simple: without a real command result, do not claim success.

## Choosing by Scenario

| Scenario | Recommended entry point |
| --- | --- |
| First trial | [3-Minute Quickstart](quickstart.md) |
| See the Agent governance loop | [Official Demo Walkthrough](agent-governance-demo.md) |
| Frontend project | `scale init --governance-pack frontend-app` |
| Node/TypeScript package | `scale init --governance-pack node-library` |
| Go multi-service backend | `scale init --governance-pack go-service-matrix` |
| Multi-repo/MOE workspace | `scale init --governance-pack moe-workspace` |
| Messy docs, reports, screenshots, and scripts | `scale init --governance-pack resource-governance` |
| Upgrading the workflow or third-party capabilities | `scale upgrade check --lang zh && scale upgrade plan --html --lang zh` |

## Workflow Upgrade Short Path

For an existing project, start with the [SCALE Workflow Upgrade Guide](workflow-upgrade.md). It covers:

- `scale init --interactive`
- `scale upgrade check/plan/apply/rollback`
- bilingual output via `--lang zh/en`
- the repository-local `make workflow-upgrade-*` entry points
- the boundary between regenerated-file updates and project-level verification