@hongmaple0820/scale-engine 0.24.0 → 0.26.0
- package/LICENSE +15 -15
- package/README.en.md +336 -304
- package/README.md +500 -475
- package/dist/adapters/AiderAdapter.js +52 -52
- package/dist/adapters/AntigravityAdapter.d.ts +4 -0
- package/dist/adapters/AntigravityAdapter.js +21 -0
- package/dist/adapters/AntigravityAdapter.js.map +1 -0
- package/dist/adapters/ClaudeCodeAdapter.d.ts +4 -1
- package/dist/adapters/ClaudeCodeAdapter.js +34 -34
- package/dist/adapters/ClaudeCodeAdapter.js.map +1 -1
- package/dist/adapters/ClineAdapter.d.ts +4 -0
- package/dist/adapters/ClineAdapter.js +20 -0
- package/dist/adapters/ClineAdapter.js.map +1 -0
- package/dist/adapters/CodexAdapter.js +28 -28
- package/dist/adapters/CursorAdapter.js +26 -26
- package/dist/adapters/DeepSeekTuiAdapter.js +97 -97
- package/dist/adapters/DoubaoAdapter.js +33 -33
- package/dist/adapters/GeminiAdapter.js +26 -26
- package/dist/adapters/GenericProjectAgentAdapter.d.ts +29 -0
- package/dist/adapters/GenericProjectAgentAdapter.js +204 -0
- package/dist/adapters/GenericProjectAgentAdapter.js.map +1 -0
- package/dist/adapters/HermesAdapter.js +26 -26
- package/dist/adapters/JCodeAdapter.d.ts +4 -0
- package/dist/adapters/JCodeAdapter.js +19 -0
- package/dist/adapters/JCodeAdapter.js.map +1 -0
- package/dist/adapters/KiloCodeAdapter.d.ts +4 -0
- package/dist/adapters/KiloCodeAdapter.js +20 -0
- package/dist/adapters/KiloCodeAdapter.js.map +1 -0
- package/dist/adapters/KimiAdapter.js +32 -32
- package/dist/adapters/KiroAdapter.js +26 -26
- package/dist/adapters/OpenClawAdapter.js +26 -26
- package/dist/adapters/OpenCodeAdapter.js +26 -26
- package/dist/adapters/QCoderAdapter.js +26 -26
- package/dist/adapters/QoderAdapter.d.ts +4 -0
- package/dist/adapters/QoderAdapter.js +21 -0
- package/dist/adapters/QoderAdapter.js.map +1 -0
- package/dist/adapters/TraeAdapter.js +26 -26
- package/dist/adapters/VSCAdapter.js +26 -26
- package/dist/adapters/WindsurfAdapter.js +32 -32
- package/dist/adapters/WorkBuddyAdapter.js +26 -26
- package/dist/adapters/index.d.ts +5 -0
- package/dist/adapters/index.js +15 -0
- package/dist/adapters/index.js.map +1 -1
- package/dist/api/cli.js +226 -48
- package/dist/api/cli.js.map +1 -1
- package/dist/api/doctor.js +10 -3
- package/dist/api/doctor.js.map +1 -1
- package/dist/api/quickstart.js +7 -1
- package/dist/api/quickstart.js.map +1 -1
- package/dist/artifact/sqliteStore.js +89 -89
- package/dist/artifact/types.d.ts +1 -1
- package/dist/cli/phaseCommands.js +45 -45
- package/dist/context/AntiPatternRegistry.js +20 -20
- package/dist/context/ContextBuilder.js +155 -155
- package/dist/evolution/EvolutionEngine.js +31 -31
- package/dist/evolution/EvolutionEvaluator.d.ts +2 -0
- package/dist/evolution/EvolutionEvaluator.js +7 -1
- package/dist/evolution/EvolutionEvaluator.js.map +1 -1
- package/dist/fsm/FSMAgentBridge.js +11 -11
- package/dist/hooks/HookGeneratorEnhanced.js +218 -218
- package/dist/index.d.ts +1 -1
- package/dist/index.js +2 -2
- package/dist/index.js.map +1 -1
- package/dist/knowledge/SQLiteKnowledgeBase.js +28 -28
- package/dist/memory/MemoryBrain.d.ts +1 -0
- package/dist/memory/MemoryBrain.js +55 -52
- package/dist/memory/MemoryBrain.js.map +1 -1
- package/dist/memory/MemoryFabric.d.ts +13 -1
- package/dist/memory/MemoryFabric.js +35 -0
- package/dist/memory/MemoryFabric.js.map +1 -1
- package/dist/memory/MemoryProviders.d.ts +111 -0
- package/dist/memory/MemoryProviders.js +385 -0
- package/dist/memory/MemoryProviders.js.map +1 -0
- package/dist/memory/index.d.ts +1 -0
- package/dist/memory/index.js +1 -0
- package/dist/memory/index.js.map +1 -1
- package/dist/output/GovernanceDashboard.js +44 -44
- package/dist/output/HTMLArtifactLayer.js +31 -31
- package/dist/prompts/VibeTemplateGallery.js +121 -121
- package/dist/skills/SkillDiscovery.js +12 -1
- package/dist/skills/SkillDiscovery.js.map +1 -1
- package/dist/skills/SkillRadar.js +20 -0
- package/dist/skills/SkillRadar.js.map +1 -1
- package/dist/skills/SkillRepository.d.ts +9 -1
- package/dist/skills/SkillRepository.js +70 -0
- package/dist/skills/SkillRepository.js.map +1 -1
- package/dist/skills/routing/SkillPlanner.js +40 -40
- package/dist/workflow/EngineeringStandards.js +62 -62
- package/dist/workflow/GovernanceTemplatePacks.d.ts +1 -1
- package/dist/workflow/GovernanceTemplatePacks.js +1990 -162
- package/dist/workflow/GovernanceTemplatePacks.js.map +1 -1
- package/dist/workflow/GovernanceTemplates.d.ts +2 -0
- package/dist/workflow/GovernanceTemplates.js +1012 -1001
- package/dist/workflow/GovernanceTemplates.js.map +1 -1
- package/dist/workflow/ResourceGovernance.js +16 -16
- package/dist/workflow/TaskArtifactScaffolder.js +10 -10
- package/dist/workflow/UpgradeManager.d.ts +3 -2
- package/dist/workflow/UpgradeManager.js +134 -49
- package/dist/workflow/UpgradeManager.js.map +1 -1
- package/dist/workflow/WorkspaceTopology.js +18 -15
- package/dist/workflow/WorkspaceTopology.js.map +1 -1
- package/docs/CODE_INTELLIGENCE.md +138 -138
- package/docs/CONTEXT_BUDGET.md +81 -81
- package/docs/EXTERNAL_REFERENCES.md +63 -0
- package/docs/GITLAB_FLOW.md +125 -125
- package/docs/GOVERNANCE_DASHBOARD.md +64 -64
- package/docs/MEMORY_BRAIN.md +104 -104
- package/docs/MEMORY_FABRIC.md +134 -107
- package/docs/README.md +79 -68
- package/docs/RUNTIME_EVIDENCE.md +101 -101
- package/docs/SKILL-REPOSITORY.md +57 -0
- package/docs/SKILL_RADAR.md +122 -115
- package/docs/THIRD_PARTY_SKILLS.md +57 -0
- package/docs/WORKFLOW_EVAL.md +151 -151
- package/docs/guides/DEVELOPMENT_WORKFLOW.md +80 -0
- package/docs/guides/GETTING_STARTED.md +50 -0
- package/docs/start/README.md +78 -72
- package/docs/start/agent-governance-demo.md +107 -107
- package/docs/start/quickstart.md +137 -127
- package/docs/start/workflow-upgrade.md +32 -8
- package/docs/workflow/README.md +67 -0
- package/docs/workflow/node-library.md +52 -0
- package/docs/workflow/templates/api-contract.md +29 -0
- package/docs/workflow/templates/architecture-review.md +23 -0
- package/docs/workflow/templates/db-change-plan.md +20 -0
- package/docs/workflow/templates/docs-impact.md +17 -0
- package/docs/workflow/templates/e2e-plan.md +20 -0
- package/docs/workflow/templates/explore.md +16 -0
- package/docs/workflow/templates/github-actions-scale-preflight.yml +32 -0
- package/docs/workflow/templates/mini-prd.md +16 -0
- package/docs/workflow/templates/plan.md +37 -0
- package/docs/workflow/templates/pre-push-scale-preflight.sh +8 -0
- package/docs/workflow/templates/product-smoke.md +61 -0
- package/docs/workflow/templates/reality-check.md +28 -0
- package/docs/workflow/templates/resource-cleanup.md +17 -0
- package/docs/workflow/templates/resource-impact.md +25 -0
- package/docs/workflow/templates/review.md +12 -0
- package/docs/workflow/templates/runtime.md +23 -0
- package/docs/workflow/templates/security-review.md +26 -0
- package/docs/workflow/templates/skill-evidence.md +33 -0
- package/docs/workflow/templates/skill-plan.md +39 -0
- package/docs/workflow/templates/spec.md +17 -0
- package/docs/workflow/templates/standards-impact.md +28 -0
- package/docs/workflow/templates/summary.md +16 -0
- package/docs/workflow/templates/tasks.md +8 -0
- package/docs/workflow/templates/ui-spec.md +29 -0
- package/docs/workflow/templates/verification.md +20 -0
- package/docs/workflow/templates/visual-review.md +20 -0
- package/examples/demo-projects/agent-governance-demo/CONTEXT.md +14 -14
- package/examples/demo-projects/agent-governance-demo/README.md +48 -48
- package/examples/demo-projects/agent-governance-demo/docs/CONTEXT-MAP.md +14 -14
- package/examples/demo-projects/agent-governance-demo/package.json +22 -21
- package/examples/demo-projects/agent-governance-demo/src/oauth-state.ts +39 -39
- package/examples/demo-projects/agent-governance-demo/tests/oauth-state.test.ts +52 -52
- package/package.json +88 -75
@@ -0,0 +1,57 @@
+# SCALE Skill Repository
+
+This repository view lets an Agent progressively discover, activate, and orchestrate skills/MCP/CLI per task, rather than loading every capability into context at once.
+
+## Progressive Disclosure
+
+1. At startup, read only the Skill metadata and one-line description.
+2. Read the full SKILL.md only when a task matches.
+3. Lazy-load scripts, references, and assets only when they are explicitly needed.
+
+## Safe Installation
+
+- A security scan must run before installation, blocking `curl | bash`, `Invoke-Expression`, dangerous deletions, and non-HTTPS sources.
+- npm/npx sources must additionally pass `npm audit signatures`, source repository, license, and version/commit pinning checks.
+- Every third-party Skill goes through isolated review before it is written to the project or global skills directory.
+
+## Supply-Chain Protection Checklist
+
+- review-skill-frontmatter
+- inspect-scripts-directory
+- verify-license-and-source
+- verify-attribution-and-notice
+- pin-source-revision
+- npm-audit-signatures
+
+## Skill Catalog
+
+| ID | Category | Trust | Primary use | Suggested combinations |
+| --- | --- | --- | --- | --- |
+| `planning-with-files` | planning | community | Use persistent planning files, progress logs, findings, active-plan selection, and plan attestation for long-running agent work. | memory-brain, web-access, code-reviewer |
+| `agentmemory` | memory | community | Use as an optional external memory provider via REST or MCP when teams want cross-agent persistent memory beyond SCALE local Memory Brain. | memory-brain, mcp-chrome-devtools, codex-cli |
+| `gbrain` | memory | community | Use as an optional graph-backed memory provider for long-running project knowledge, entity relationships, and background memory maintenance. | memory-brain, agentmemory, codegraph |
+| `frontend-design` | ui | official | UI visual direction, layout, component states, and frontend implementation constraints. | awesome-design-md, ui-ux-pro-max, webapp-testing |
+| `awesome-design-md` | ui | ecosystem | Establish product-grade design guidelines and visual language. | ui-ux-pro-max, frontend-design |
+| `ui-ux-pro-max` | ui | ecosystem | Fill in experience strategy, interaction states, and UI acceptance dimensions. | awesome-design-md, webapp-testing |
+| `webapp-testing` | testing | official | Verify page clicks, forms, console, screenshots, and end-to-end behavior. | agent-browser, mcp-chrome-devtools |
+| `web-access` | browser | ecosystem | Obtain primary sources, dynamic page content, web evidence, and source citations. | agent-browser, mcp-chrome-devtools |
+| `agent-browser` | browser | ecosystem | Interact with real web pages to fill in manual acceptance evidence. | web-access, webapp-testing, mcp-chrome-devtools |
+| `mcp-chrome-devtools` | browser | ecosystem | Debug console errors, network requests, page state, and performance issues. | agent-browser, webapp-testing |
+| `cua` | desktop | ecosystem | Operate desktop applications and collect on-device screenshots, state, and side-effect boundary evidence. | web-access, agent-browser |
+| `code-reviewer` | review | official | Pre-merge, severity-graded review of defects, security, maintainability, and testing risk. | security-and-hardening, update-docs |
+| `fix` | review | official | Clean up formatting and lint issues before commit. | code-reviewer |
+| `pr-creator` | review | official | Generate standard PR descriptions and pre-merge notes. | code-reviewer, update-docs |
+| `update-docs` | docs | official | Find and update long-lived docs affected by code changes. | documentation-and-adrs |
+| `find-skills` | discovery | ecosystem | Search for suitable Skills by task intent, then proceed to the security scan. | web-access |
+| `codex-cli` | agent-cli | official | External CLI review and command-level evidence. | gemini-cli, opencode-cli |
+| `gemini-cli` | agent-cli | official | External CLI review and command-level evidence. | codex-cli, opencode-cli |
+| `opencode-cli` | agent-cli | ecosystem | External CLI review and command-level evidence. | codex-cli, gemini-cli |
+| `agency-agents-zh` | role-library | community | Provides role-preset references for CEO, CTO, engineering, design, product, and similar roles. | skill-safety-scan |
+
+## Third-Party Attribution
+
+| ID | License | Usage | Notice |
+| --- | --- | --- | --- |
+| `planning-with-files` | MIT | adapted-concept | Inspired by and compatible with OthmanAdi/planning-with-files. SCALE should not copy upstream files unless the MIT license text and attribution are included. |
+| `agentmemory` | Apache-2.0 | external-reference | Optional external integration only. Do not vendor agentmemory code into SCALE without preserving Apache-2.0 license text, modification notices, and any upstream NOTICE obligations. |
+| `gbrain` | MIT | external-reference | Optional external provider only. Do not vendor GBrain code into SCALE without preserving MIT license text, source revision, and modification notices. |
package/docs/SKILL_RADAR.md
CHANGED
@@ -1,115 +1,122 @@
-# Skill Radar
-
-Skill Radar is the active capability selection layer for SCALE. It does not auto-install or blindly run skills. It scores relevant skills, MCP servers, browser tools, desktop automation, and external CLIs against the current task, then returns:
-
-- why the capability matches
-- confidence score
-- safety level
-- required evidence
-- fallback path
-- supply-chain checks before installation or promotion
-
-The goal is to make agents actively use useful tools without turning the project into an unsafe prompt or tool bundle.
-
-## Commands
-
-```bash
-scale skill radar --task "Design upload UI and run browser E2E checks" --files src/pages/upload.tsx
-scale skill radar --task "Automate WPS desktop workflow with CUA" --json
-scale skill radar --task "Review release PR" --phase review --level L --output docs/worklog/tasks/release/skill-radar.md
-scale skill doctor --supply-chain
-scale skill doctor --supply-chain --json
-```
-
-## Safety Levels
-
-| Level | Meaning | Default action |
-| --- | --- | --- |
-| `trusted` | Official or low-risk capability with policy enabled | May be recommended when confidence is high |
-| `review-required` | Third-party or ecosystem capability | Require source, license, scripts, and revision review |
-| `restricted` | Browser, desktop, or external execution boundary | Require explicit evidence and side-effect boundaries |
-| `blocked` | Disabled by policy or failed safety review | Do not run; use fallback |
-
-## Confidence
-
-Skill Radar combines:
-
-- task keywords and workflow phase
-- changed file patterns
-- local skill installation
-- tool availability
-- trust level
-- policy status
-- frontend/package evidence
-- safety penalties
-
-The score is not a promise that the tool will work. It is a routing signal. Any recommendation still needs real evidence before the agent can claim success.
-
-## Default Domains
-
-| Domain | Typical triggers | Recommended capability types |
-| --- | --- | --- |
-| `ui` | UI, UX, frontend, component, visual, layout | design skills, visual review, screenshot evidence |
-| `browserAutomation` | browser, E2E, Playwright, Chrome, DevTools | web access, browser automation, DevTools evidence |
-| `desktopAutomation` | desktop, GUI, WPS, WeChat, CUA | disabled by default; manual operator fallback |
-| `externalCli` | Codex, Gemini, OpenCode, external agent CLI | disabled by default; dry-run and output evidence |
-| `review` | PR, merge, release, code review | reviewer skills, severity findings |
-| `docs` | docs, README, ADR, governance asset | doc impact and source-of-truth evidence |
-[lines 58-115 of the previous SKILL_RADAR.md are truncated in this extract]
+# Skill Radar
+
+Skill Radar is the active capability selection layer for SCALE. It does not auto-install or blindly run skills. It scores relevant skills, MCP servers, browser tools, desktop automation, and external CLIs against the current task, then returns:
+
+- why the capability matches
+- confidence score
+- safety level
+- required evidence
+- fallback path
+- supply-chain checks before installation or promotion
+
+The goal is to make agents actively use useful tools without turning the project into an unsafe prompt or tool bundle.
+
+## Commands
+
+```bash
+scale skill radar --task "Design upload UI and run browser E2E checks" --files src/pages/upload.tsx
+scale skill radar --task "Automate WPS desktop workflow with CUA" --json
+scale skill radar --task "Review release PR" --phase review --level L --output docs/worklog/tasks/release/skill-radar.md
+scale skill doctor --supply-chain
+scale skill doctor --supply-chain --json
+```
+
+## Safety Levels
+
+| Level | Meaning | Default action |
+| --- | --- | --- |
+| `trusted` | Official or low-risk capability with policy enabled | May be recommended when confidence is high |
+| `review-required` | Third-party or ecosystem capability | Require source, license, scripts, and revision review |
+| `restricted` | Browser, desktop, or external execution boundary | Require explicit evidence and side-effect boundaries |
+| `blocked` | Disabled by policy or failed safety review | Do not run; use fallback |
+
+## Confidence
+
+Skill Radar combines:
+
+- task keywords and workflow phase
+- changed file patterns
+- local skill installation
+- tool availability
+- trust level
+- policy status
+- frontend/package evidence
+- safety penalties
+
+The score is not a promise that the tool will work. It is a routing signal. Any recommendation still needs real evidence before the agent can claim success.
+
+## Default Domains
+
+| Domain | Typical triggers | Recommended capability types |
+| --- | --- | --- |
+| `ui` | UI, UX, frontend, component, visual, layout | design skills, visual review, screenshot evidence |
+| `browserAutomation` | browser, E2E, Playwright, Chrome, DevTools | web access, browser automation, DevTools evidence |
+| `desktopAutomation` | desktop, GUI, WPS, WeChat, CUA | disabled by default; manual operator fallback |
+| `externalCli` | Codex, Gemini, OpenCode, external agent CLI | disabled by default; dry-run and output evidence |
+| `review` | PR, merge, release, code review | reviewer skills, severity findings |
+| `docs` | docs, README, ADR, governance asset | doc impact and source-of-truth evidence |
+| `planning` | plans, task_plan, findings, progress, long-running work | file-backed planning, progress logs, plan attestation |
+| `memory` | memory, recall, knowledge, persistent memory, agentmemory, gbrain | provider-routed memory through agentmemory, gbrain, or scale-local fallback |
+| `discovery` | skill, MCP, tool, capability discovery | find-skills plus safety review |
+
+## Evidence Contract
+
+Each recommendation carries required evidence. Examples:
+
+- UI work: `ui-spec`, `design-rationale`, `screenshot`, `visual-review`
+- Browser work: `browser-evidence`, `console-summary`, `network-summary`, `scenario-result`
+- Desktop work: `operator-boundary`, `desktop-screenshot`, `affected-app`
+- External CLI work: `cli-version-check`, `command`, `exit-code`, `output-summary`
+- Review work: `review-report`, `finding-list`, `severity`
+- Planning work: `task-plan`, `findings-log`, `progress-log`, `plan-attestation`
+- Memory work: `memory-provider-health`, `privacy-boundary`, `data-retention-policy`, `query-result`
+
+If evidence is missing, the final delivery should list the capability as unverified rather than claiming it was used successfully.
+
+## Supply-Chain Doctor
+
+`scale skill doctor --supply-chain` reviews known skill sources and install commands for:
+
+- HTTPS source requirement
+- `curl | bash`, `wget | sh`, `Invoke-Expression`, and `iex` blocking
+- destructive install patterns
+- npm/npx lifecycle script review
+- required source, license, and revision checks
+- third-party attribution and NOTICE checks
+
+This is intentionally conservative. Third-party skills should start in review-required mode and be promoted only after inspection.
+
+External skill references and acknowledgements are tracked in [Third-Party Skills and External References](THIRD_PARTY_SKILLS.md) and the full [External Reference Inventory](EXTERNAL_REFERENCES.md). SCALE should not vendor community skill code unless the license text, source revision, copyright notice, and modification notes are preserved.
+
+## Policy Integration
+
+Skill Radar reads `.scale/tools.json` through the Tool Policy layer. Defaults:
+
+- UI and browser capabilities are enabled but evidence-required.
+- Desktop CUA is disabled by default.
+- External agent CLIs are disabled by default.
+- Browser tools require captured evidence and should stay in approved domains.
+
+Use Tool Policy to enable a restricted capability deliberately rather than relying on an agent's assumption.
+
+## Fallback Rule
+
+Every recommendation must include a fallback. This prevents tool theater:
+
+```text
+If the capability is missing, unsafe, low-confidence, or policy-blocked,
+the agent must use the fallback and record why the capability was not used.
+```
+
+## Artifact Lifecycle
+
+Skill Radar reports can be written into task artifacts:
+
+```bash
+scale skill radar \
+  --task "Refactor upload page and verify browser flow" \
+  --files src/pages/upload.tsx \
+  --output docs/worklog/tasks/2026-05-19-upload-refactor/skill-radar.md
+```
+
+Keep the report when it is evidence for an M/L/CRITICAL task. Do not commit transient local detection output unless it is part of the reviewed task artifact set.
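The Fallback Rule in this file (use the fallback when a capability is missing, unsafe, low-confidence, or policy-blocked, and record why) can be sketched as a small decision function. Everything named here is an assumption made for illustration: the `Recommendation` and `Decision` shapes, the `decide` function, and the `MIN_CONFIDENCE` threshold of 0.6 do not come from the package.

```typescript
// Hypothetical sketch of the fallback rule; field names and the confidence
// threshold are assumptions, not the package's real data model.
type SafetyLevel = "trusted" | "review-required" | "restricted" | "blocked";

interface Recommendation {
  id: string;
  confidence: number; // assumed to be in the range 0..1
  safetyLevel: SafetyLevel;
  installed: boolean;
  policyEnabled: boolean;
  fallback: string;
}

type Decision =
  | { action: "use"; id: string }
  | { action: "fallback"; id: string; fallback: string; reason: string };

const MIN_CONFIDENCE = 0.6; // assumed threshold

// Picks the first applicable reason to fall back; otherwise allows use.
function decide(rec: Recommendation): Decision {
  const reason =
    !rec.installed ? "capability missing"
    : rec.safetyLevel === "blocked" ? "unsafe or failed safety review"
    : !rec.policyEnabled ? "policy-blocked"
    : rec.confidence < MIN_CONFIDENCE ? "low confidence"
    : null;
  return reason === null
    ? { action: "use", id: rec.id }
    : { action: "fallback", id: rec.id, fallback: rec.fallback, reason };
}

console.log(
  decide({
    id: "cua",
    confidence: 0.9,
    safetyLevel: "restricted",
    installed: true,
    policyEnabled: false,
    fallback: "manual operator",
  }),
);
```

Returning the reason alongside the fallback keeps the "record why the capability was not used" requirement in the data itself, so a report writer does not have to reconstruct it later.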
@@ -0,0 +1,57 @@
+# Third-Party Skills and External References
+
+This document records external skill projects that SCALE may learn from, recommend, or integrate with. It is a governance boundary, not a vendoring manifest. The complete cross-repo inventory is maintained in [External Reference Inventory](EXTERNAL_REFERENCES.md).
+
+## Policy
+
+- Do not vendor third-party skill code, images, logos, examples, or marketing copy unless the license review explicitly allows redistribution.
+- Preserve upstream license text, copyright notices, NOTICE files, source URL, and source revision before any vendored or modified redistribution.
+- Mark modified files and document what changed from upstream.
+- Treat optional external services as review-required until privacy, retention, credential, and delete boundaries are reviewed.
+- `scale skill doctor --supply-chain` must include license, attribution, script, and pinned-revision checks for third-party skills.
+- Community skills start as `review-required`; promotion requires real installation evidence and a recorded safety decision.
+
+## Highlighted External References
+
+| Project | License | Upstream | SCALE usage | Redistribution status |
+| --- | --- | --- | --- | --- |
+| Planning with Files | MIT | [OthmanAdi/planning-with-files](https://github.com/OthmanAdi/planning-with-files) | Adapt concepts for file-backed plans, findings, progress logs, active-plan routing, and plan attestation. | Not vendored. |
+| agentmemory | Apache-2.0 | [rohitg00/agentmemory](https://github.com/rohitg00/agentmemory) | Optional external memory provider via REST or MCP for teams that need cross-agent persistent memory beyond local SCALE Memory Brain. | Not vendored. |
+| GBrain | MIT | [garrytan/gbrain](https://github.com/garrytan/gbrain) | Optional graph memory provider for brain repos, hybrid search, entity relationships, MCP, and background maintenance. | Not vendored. |
+
+Other referenced skills, MCP servers, CLIs, discovery candidates, and adapter targets are listed in [External Reference Inventory](EXTERNAL_REFERENCES.md). Unknown licenses stay `review-required`; do not treat a repository link as redistribution permission.
+
+## Acknowledgements
+
+SCALE acknowledges these upstream projects and contributors:
+
+- `OthmanAdi/planning-with-files`, Copyright (c) 2026 Ahmad Adi.
+- `rohitg00/agentmemory` and its upstream contributors.
+- `garrytan/gbrain` and its upstream contributors.
+- All upstream projects listed in [External Reference Inventory](EXTERNAL_REFERENCES.md) according to their licenses and contribution histories.
+
+The current SCALE implementation records these projects as external references or adapted concepts. It does not copy their source code into this repository.
+
+## Vendoring Checklist
+
+If SCALE later vendors or modifies any third-party skill, the change must include:
+
+1. Full upstream license text in the distributed package.
+2. Upstream copyright and NOTICE material.
+3. Source repository URL and pinned revision.
+4. Modification notes for every copied or changed file.
+5. Tests or doctor checks proving the attribution metadata is present.
+6. README and generated skill repository documentation updates.
+
+## Runtime Boundaries
+
+External memory providers must not be enabled silently. Before use, record:
+
+- provider endpoint and health check evidence
+- project data scope
+- credential boundary
+- retention and deletion policy
+- whether data leaves the local machine or team-controlled infrastructure
+- whether provider writes are disabled, candidate-only, or explicitly enabled
+
+External planning skills must not replace SCALE task evidence. They can improve the plan artifact shape, but final delivery still requires verification output, changed-file evidence, and explicit unverified-risk notes.
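The Runtime Boundaries list in this file reads naturally as a record that must be complete before an external memory provider is enabled. The sketch below shows one way such a check might look. It is an assumption-laden illustration: `ProviderBoundaryRecord`, its field names, and `missingBoundaryFields` are hypothetical and are not part of the package's API.

```typescript
// Hypothetical record mirroring the "Before use, record:" list; all names
// are illustrative assumptions.
type WriteMode = "disabled" | "candidate-only" | "enabled";

interface ProviderBoundaryRecord {
  endpoint?: string;
  healthCheckEvidence?: string;
  dataScope?: string;
  credentialBoundary?: string;
  retentionAndDeletionPolicy?: string;
  dataLeavesControlledInfra?: boolean;
  writeMode?: WriteMode;
}

const REQUIRED_FIELDS: ReadonlyArray<keyof ProviderBoundaryRecord> = [
  "endpoint",
  "healthCheckEvidence",
  "dataScope",
  "credentialBoundary",
  "retentionAndDeletionPolicy",
  "dataLeavesControlledInfra",
  "writeMode",
];

// Lists every boundary item that is still unrecorded (undefined or empty).
function missingBoundaryFields(record: ProviderBoundaryRecord): string[] {
  return REQUIRED_FIELDS.filter((field) => {
    const value = record[field];
    return value === undefined || value === "";
  });
}

const draft: ProviderBoundaryRecord = {
  endpoint: "https://memory.example.internal",
  writeMode: "disabled",
};
console.log(missingBoundaryFields(draft));
```

A provider would only be considered for enabling when the returned list is empty; note that `dataLeavesControlledInfra: false` counts as recorded, since the requirement is to state the answer, not to give a particular one.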
|
package/docs/WORKFLOW_EVAL.md
CHANGED
|
@@ -1,151 +1,151 @@
|
|
|
1
|
-
# Workflow Eval Harness
|
|
2
|
-
|
|
3
|
-
Status: implemented baseline
|
|
4
|
-
Since: v0.22 development branch
|
|
5
|
-
|
|
6
|
-
Workflow Eval Harness 用来证明工作流是否真的提升了 Agent 的工程交付质量,而不是只依赖主观感觉。它会运行轻量 eval suite,记录 pass@k、修复迭代、工具调用、token 估算、人类纠偏次数,并在失败时保留 Failure Replay。
|
|
7
|
-
|
|
8
|
-
## Commands
|
|
9
|
-
|
|
10
|
-
初始化默认基线套件:
|
|
11
|
-
|
|
12
|
-
```bash
|
|
13
|
-
scale eval init
|
|
14
|
-
scale eval init --suite workflow-baseline --json
|
|
15
|
-
```
|
|
16
|
-
|
|
17
|
-
运行套件:
|
|
18
|
-
|
|
19
|
-
```bash
|
|
20
|
-
scale eval run --suite workflow-baseline
|
|
21
|
-
scale eval run --suite workflow-baseline --json
|
|
22
|
-
```
|
|
23
|
-
|
|
24
|
-
对比两次运行:
|
|
25
|
-
|
|
26
|
-
```bash
|
|
27
|
-
scale eval compare --baseline <run-id> --candidate <run-id>
|
|
28
|
-
scale eval compare --baseline <run-id> --candidate <run-id> --json
|
|
29
|
-
```
|
|
30
|
-
|
|
31
|
-
生成 Markdown 报告:
|
|
32
|
-
|
|
33
|
-
```bash
|
|
34
|
-
scale eval report --run <run-id>
|
|
35
|
-
scale eval report --run <run-id> --output docs/worklog/eval-report.md
|
|
36
|
-
```
|
|
37
|
-
|
|
38
|
-
查看和提升失败重放:
|
|
39
|
-
|
|
40
|
-
```bash
|
|
41
|
-
scale eval failures --since 30d
|
|
42
|
-
scale eval replay <failure-id>
|
|
43
|
-
scale eval replay --task-id <task-id>
|
|
44
|
-
scale eval promote-failure <failure-id>
|
|
45
|
-
```
|
|
46
|
-
|
|
47
|
-
## Failure Replay To Memory
|
|
48
|
-
|
|
49
|
-
Failure Replay is local eval evidence first. When a failure pattern is useful for future work, ingest it into Memory Brain as an `incident` candidate:
|
|
50
|
-
|
|
51
|
-
```bash
|
|
52
|
-
scale memory ingest --from failure --failure-id <failure-id>
|
|
53
|
-
scale memory query "missing verification evidence"
|
|
54
|
-
scale memory promote <memory-node-id>
|
|
55
|
-
```
|
|
56
|
-
|
|
57
|
-
This does not auto-change standards or hooks. It only makes the failure queryable and evidence-backed so repeated mistakes can be promoted deliberately after review.
|
|
58
|
-
|
|
59
|
-
## Storage
|
|
60
|
-
|
|
61
|
-
```text
|
|
62
|
-
.scale/evals/
|
|
63
|
-
├── suites/
|
|
64
|
-
├── runs/
|
|
65
|
-
├── failures/
|
|
66
|
-
└── improvements/
|
|
67
|
-
```
|
|
68
|
-
|
|
69
|
-
These files are local runtime evidence by default. Commit only curated summaries or intentional benchmark fixtures.
|
|
70
|
-
|
|
71
|
-
## Suite Shape
|
|
72
|
-
|
|
73
|
-
```json
|
|
74
|
-
{
|
|
75
|
-
"version": "1.0",
|
|
76
|
-
"id": "workflow-baseline",
|
|
77
|
-
"name": "SCALE workflow baseline",
|
|
78
|
-
"cases": [
|
|
79
|
-
{
|
|
80
|
-
"id": "governance-command-smoke",
|
|
81
|
-
"type": "bugfix",
|
|
82
|
-
"title": "Command evidence smoke",
|
|
83
|
-
"task": "Verify that a local command can produce concrete eval evidence.",
|
|
84
|
-
"phase": "verify",
|
|
85
|
-
"successCriteria": ["command exits 0"],
|
|
86
|
-
"attempts": [
|
|
87
|
-
{
|
|
88
|
-
"id": "attempt-1",
|
|
89
|
-
"command": "node -e \"console.log('scale-eval-ok')\"",
|
|
90
|
-
"expectedExitCode": 0,
|
|
91
|
-
"outputContains": "scale-eval-ok"
|
|
92
|
-
}
|
|
93
|
-
]
|
|
94
|
-
}
|
|
95
|
-
]
|
|
96
|
-
}
|
|
97
|
-
```
|
|
98
|
-
|
|
99
|
-
## Metrics
|
|
100
|
-
|
|
101
|
-
| Metric | Meaning |
|
|
102
|
-
| --- | --- |
|
|
103
|
-
| `passAt1Rate` | 一次完整尝试就通过的比例 |
|
|
104
|
-
| `passAt3Rate` | 三次以内通过的比例 |
|
|
105
|
-
| `averageFixIterations` | 首次失败后的平均修复循环 |
|
|
106
|
-
| `totalToolCalls` | eval attempts 数量,可近似衡量工具调用成本 |
|
|
107
|
-
| `estimatedTokens` | task 与输出摘要的估算 token 成本 |
|
|
108
|
-
| `humanCorrections` | 人类纠偏次数 |
|
|
109
|
-
| `failureReplayCount` | 失败重放记录数量 |
|
|
110
|
-
|
|
111
|
-
## Failure Replay
|
|
112
|
-
|
|
113
|
-
失败不只记录最终失败状态,还会保存:
|
|
114
|
-
|
|
115
|
-
- task and success criteria
|
|
116
|
-
- phase
|
|
117
|
-
- wrong turn
|
|
118
|
-
- evidence
|
|
119
|
-
- correction
|
|
120
|
-
- prevention
|
|
121
|
-
- replay command
|
|
122
|
-
- redaction status
|
|
123
|
-
|
|
124
|
-
Failure category 当前包括:
|
|
125
|
-
|
|
126
|
-
- `wrong-exploration-path`
|
|
127
|
-
- `hallucinated-project-fact`
|
|
128
|
-
- `missing-codegraph-or-graph-fallback`
|
|
129
|
-
- `over-broad-context-load`
|
|
130
|
-
- `bad-skill-recommendation`
|
|
131
|
-
- `missing-verification-evidence`
|
|
132
|
-
- `failed-security-or-resource-gate`
|
|
133
|
-
- `human-correction-after-agent-confidence`
|
|
134
|
-
- `command-failure`
|
|
135
|
-
- `unknown`
|
|
136
|
-
|
|
137
|
-
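`scale eval promote-failure` promotes a Failure Replay to an improvement candidate, but it does not modify project standards automatically. Whether the candidate becomes a long-term standard still requires confirmation by a human or a later review.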
|
|
138
|
-
|
|
139
|
-
## Governance Use
|
|
140
|
-
|
|
141
|
-
- The default suite in v0.22 is a lightweight smoke baseline that verifies the eval pipeline can run.
|
|
142
|
-
- Real projects should gradually add cases of type bugfix, feature, security, frontend, release, and resource.
|
|
143
|
-
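- Failure Replay should work together with Resource Governance: keep records local by default, and commit only summaries, benchmarks, or cases that are explicitly meant for long-term maintenance.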
|
|
144
|
-
- Workflow Eval data can feed later Governance ROI analysis, to judge whether a governance module actually reduces rework, tool calls, tokens, or human corrections.
|
|
145
|
-
|
|
146
|
-
## Policy
|
|
147
|
-
|
|
148
|
-
- Do not use eval pass rates as a substitute for verification on the real project.
|
|
149
|
-
- Command output in failure records receives basic redaction, but you should still avoid writing sensitive raw logs into a suite.
|
|
150
|
-
- Low-cost smoke suites can run frequently; heavy project suites should run on demand.
|
|
151
|
-
- Without eval evidence, do not claim that workflow capability has improved.
|
|
1
|
+
# Workflow Eval Harness
|
|
2
|
+
|
|
3
|
+
Status: implemented baseline
|
|
4
|
+
Since: v0.22 development branch
|
|
5
|
+
|
|
6
|
+
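Workflow Eval Harness is used to demonstrate whether a workflow actually improves the engineering delivery quality of an Agent, rather than relying on subjective impressions. It runs a lightweight eval suite, records pass@k, fix iterations, tool calls, token estimates, and human-correction counts, and keeps a Failure Replay when a case fails.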
|
|
7
|
+
|
|
8
|
+
## Commands
|
|
9
|
+
|
|
10
|
+
Initialize the default baseline suite:
|
|
11
|
+
|
|
12
|
+
```bash
|
|
13
|
+
scale eval init
|
|
14
|
+
scale eval init --suite workflow-baseline --json
|
|
15
|
+
```
|
|
16
|
+
|
|
17
|
+
Run a suite:
|
|
18
|
+
|
|
19
|
+
```bash
|
|
20
|
+
scale eval run --suite workflow-baseline
|
|
21
|
+
scale eval run --suite workflow-baseline --json
|
|
22
|
+
```
|
|
23
|
+
|
|
24
|
+
Compare two runs:
|
|
25
|
+
|
|
26
|
+
```bash
|
|
27
|
+
scale eval compare --baseline <run-id> --candidate <run-id>
|
|
28
|
+
scale eval compare --baseline <run-id> --candidate <run-id> --json
|
|
29
|
+
```
|
|
30
|
+
|
|
31
|
+
Generate a Markdown report:
|
|
32
|
+
|
|
33
|
+
```bash
|
|
34
|
+
scale eval report --run <run-id>
|
|
35
|
+
scale eval report --run <run-id> --output docs/worklog/eval-report.md
|
|
36
|
+
```
|
|
37
|
+
|
|
38
|
+
View and promote Failure Replays:
|
|
39
|
+
|
|
40
|
+
```bash
|
|
41
|
+
scale eval failures --since 30d
|
|
42
|
+
scale eval replay <failure-id>
|
|
43
|
+
scale eval replay --task-id <task-id>
|
|
44
|
+
scale eval promote-failure <failure-id>
|
|
45
|
+
```
|
|
46
|
+
|
|
47
|
+
## Failure Replay To Memory
|
|
48
|
+
|
|
49
|
+
Failure Replay is local eval evidence first. When a failure pattern is useful for future work, ingest it into Memory Brain as an `incident` candidate:
|
|
50
|
+
|
|
51
|
+
```bash
|
|
52
|
+
scale memory ingest --from failure --failure-id <failure-id>
|
|
53
|
+
scale memory query "missing verification evidence"
|
|
54
|
+
scale memory promote <memory-node-id>
|
|
55
|
+
```
|
|
56
|
+
|
|
57
|
+
This does not auto-change standards or hooks. It only makes the failure queryable and evidence-backed so repeated mistakes can be promoted deliberately after review.
|
|
58
|
+
|
|
59
|
+
## Storage
|
|
60
|
+
|
|
61
|
+
```text
|
|
62
|
+
.scale/evals/
|
|
63
|
+
├── suites/
|
|
64
|
+
├── runs/
|
|
65
|
+
├── failures/
|
|
66
|
+
└── improvements/
|
|
67
|
+
```
|
|
68
|
+
|
|
69
|
+
These files are local runtime evidence by default. Commit only curated summaries or intentional benchmark fixtures.
|
|
70
|
+
|
|
71
|
+
## Suite Shape
|
|
72
|
+
|
|
73
|
+
```json
|
|
74
|
+
{
|
|
75
|
+
"version": "1.0",
|
|
76
|
+
"id": "workflow-baseline",
|
|
77
|
+
"name": "SCALE workflow baseline",
|
|
78
|
+
"cases": [
|
|
79
|
+
{
|
|
80
|
+
"id": "governance-command-smoke",
|
|
81
|
+
"type": "bugfix",
|
|
82
|
+
"title": "Command evidence smoke",
|
|
83
|
+
"task": "Verify that a local command can produce concrete eval evidence.",
|
|
84
|
+
"phase": "verify",
|
|
85
|
+
"successCriteria": ["command exits 0"],
|
|
86
|
+
"attempts": [
|
|
87
|
+
{
|
|
88
|
+
"id": "attempt-1",
|
|
89
|
+
"command": "node -e \"console.log('scale-eval-ok')\"",
|
|
90
|
+
"expectedExitCode": 0,
|
|
91
|
+
"outputContains": "scale-eval-ok"
|
|
92
|
+
}
|
|
93
|
+
]
|
|
94
|
+
}
|
|
95
|
+
]
|
|
96
|
+
}
|
|
97
|
+
```
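
As a rough illustration of how an attempt like the one in the suite above might be judged, the sketch below types the attempt fields and checks a finished command against `expectedExitCode` and `outputContains`. The `EvalAttempt` interface, the `attemptPassed` helper, and the choice to treat `outputContains` as optional are assumptions made for illustration, not the package's actual implementation.

```typescript
// Field names mirror the suite JSON; the types themselves are an assumption.
interface EvalAttempt {
  id: string;
  command: string;
  expectedExitCode: number;
  outputContains?: string; // assumed optional
}

// Hypothetical helper: decides pass/fail from the observed exit code and output.
function attemptPassed(attempt: EvalAttempt, exitCode: number, output: string): boolean {
  if (exitCode !== attempt.expectedExitCode) {
    return false;
  }
  if (attempt.outputContains === undefined) {
    return true;
  }
  return output.indexOf(attempt.outputContains) !== -1;
}

// The attempt from the baseline suite, expressed as a typed value.
const smokeAttempt: EvalAttempt = {
  id: "attempt-1",
  command: "node -e \"console.log('scale-eval-ok')\"",
  expectedExitCode: 0,
  outputContains: "scale-eval-ok",
};
```

Keeping the check a pure function of exit code and output means it can be exercised without spawning a process.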
|
|
98
|
+
|
|
99
|
+
## Metrics
|
|
100
|
+
|
|
101
|
+
| Metric | Meaning |
|
|
102
|
+
| --- | --- |
|
|
103
|
+
| `passAt1Rate` | Share of cases that pass on the first complete attempt |
|
|
104
|
+
| `passAt3Rate` | Share of cases that pass within three attempts |
|
|
105
|
+
| `averageFixIterations` | Average number of fix iterations after the first failure |
|
|
106
|
+
| `totalToolCalls` | Number of eval attempts, a rough proxy for tool-call cost |
|
|
107
|
+
| `estimatedTokens` | Estimated token cost of the task and output summaries |
|
|
108
|
+
| `humanCorrections` | Number of human corrections |
|
|
109
|
+
| `failureReplayCount` | Number of Failure Replay records |
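
To make the first four metrics concrete, here is a minimal sketch that derives them from per-case attempt outcomes. The `CaseOutcome` shape and the `summarize` function are hypothetical. The definition of a fix iteration used here (each attempt made after a failed first attempt, up to and including the first pass, averaged over all cases) is one plausible reading of the table and is an assumption, not the harness's confirmed formula.

```typescript
// Hypothetical shape: one boolean per attempt, true when that attempt passed.
interface CaseOutcome {
  id: string;
  attempts: boolean[];
}

interface SuiteMetrics {
  passAt1Rate: number;
  passAt3Rate: number;
  averageFixIterations: number;
  totalToolCalls: number;
}

function summarize(cases: CaseOutcome[]): SuiteMetrics {
  const total = cases.length;
  if (total === 0) {
    return { passAt1Rate: 0, passAt3Rate: 0, averageFixIterations: 0, totalToolCalls: 0 };
  }
  // pass@1: the first attempt passed.
  const passAt1 = cases.filter((c) => c.attempts[0] === true).length;
  // pass@3: any of the first three attempts passed.
  const passAt3 = cases.filter((c) => c.attempts.slice(0, 3).some((p) => p)).length;
  // Assumed definition: attempts after a failed first attempt, up to the first pass.
  const fixIterations = cases.map((c) => {
    if (c.attempts.length === 0 || c.attempts[0]) return 0;
    const firstPass = c.attempts.indexOf(true);
    return firstPass === -1 ? c.attempts.length - 1 : firstPass;
  });
  const fixSum = fixIterations.reduce((sum, n) => sum + n, 0);
  // The table describes totalToolCalls as the number of eval attempts.
  const attemptCount = cases.reduce((sum, c) => sum + c.attempts.length, 0);
  return {
    passAt1Rate: passAt1 / total,
    passAt3Rate: passAt3 / total,
    averageFixIterations: fixSum / total,
    totalToolCalls: attemptCount,
  };
}

const sampleMetrics = summarize([
  { id: "a", attempts: [true] },
  { id: "b", attempts: [false, true] },
  { id: "c", attempts: [false, false, false, true] },
  { id: "d", attempts: [false] },
]);
```

With these four sample cases, one of four passes on the first attempt and two of four pass within three attempts, so the two rates diverge in the way pass@k is meant to show.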
|
|
110
|
+
|
|
111
|
+
## Failure Replay
|
|
112
|
+
|
|
113
|
+
A failure record keeps more than the final failed status. It also stores:
|
|
114
|
+
|
|
115
|
+
- task and success criteria
|
|
116
|
+
- phase
|
|
117
|
+
- wrong turn
|
|
118
|
+
- evidence
|
|
119
|
+
- correction
|
|
120
|
+
- prevention
|
|
121
|
+
- replay command
|
|
122
|
+
- redaction status
|
|
123
|
+
|
|
124
|
+
Failure categories currently include:
|
|
125
|
+
|
|
126
|
+
- `wrong-exploration-path`
|
|
127
|
+
- `hallucinated-project-fact`
|
|
128
|
+
- `missing-codegraph-or-graph-fallback`
|
|
129
|
+
- `over-broad-context-load`
|
|
130
|
+
- `bad-skill-recommendation`
|
|
131
|
+
- `missing-verification-evidence`
|
|
132
|
+
- `failed-security-or-resource-gate`
|
|
133
|
+
- `human-correction-after-agent-confidence`
|
|
134
|
+
- `command-failure`
|
|
135
|
+
- `unknown`
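
The stored fields and categories listed above could be modeled roughly as follows. The category strings are taken from the list; the `FailureReplayRecord` property names and the `toFailureCategory` helper are hypothetical renderings for illustration and may not match the package's real types.

```typescript
// Category strings copied from the documented list.
const FAILURE_CATEGORIES = [
  "wrong-exploration-path",
  "hallucinated-project-fact",
  "missing-codegraph-or-graph-fallback",
  "over-broad-context-load",
  "bad-skill-recommendation",
  "missing-verification-evidence",
  "failed-security-or-resource-gate",
  "human-correction-after-agent-confidence",
  "command-failure",
  "unknown",
] as const;

type FailureCategory = (typeof FAILURE_CATEGORIES)[number];

// Hypothetical record shape; property names are illustrative only.
interface FailureReplayRecord {
  task: string;
  successCriteria: string[];
  phase: string;
  wrongTurn: string;
  evidence: string[];
  correction: string;
  prevention: string;
  replayCommand: string;
  redacted: boolean;
  category: FailureCategory;
}

// Maps arbitrary input onto a known category, falling back to "unknown".
function toFailureCategory(value: string): FailureCategory {
  for (const category of FAILURE_CATEGORIES) {
    if (category === value) return category;
  }
  return "unknown";
}
```

Deriving the union type from a single constant array keeps the runtime check and the compile-time type from drifting apart.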
|
|
136
|
+
|
|
137
|
+
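`scale eval promote-failure` promotes a Failure Replay to an improvement candidate, but it does not modify project standards automatically. Whether the candidate becomes a long-term standard still requires confirmation by a human or a later review.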
|
|
138
|
+
|
|
139
|
+
## Governance Use
|
|
140
|
+
|
|
141
|
+
- The default suite in v0.22 is a lightweight smoke baseline that verifies the eval pipeline can run.
|
|
142
|
+
- Real projects should gradually add cases of type bugfix, feature, security, frontend, release, and resource.
|
|
143
|
+
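- Failure Replay should work together with Resource Governance: keep records local by default, and commit only summaries, benchmarks, or cases that are explicitly meant for long-term maintenance.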
|
|
144
|
+
- Workflow Eval data can feed later Governance ROI analysis, to judge whether a governance module actually reduces rework, tool calls, tokens, or human corrections.
|
|
145
|
+
|
|
146
|
+
## Policy
|
|
147
|
+
|
|
148
|
+
- Do not use eval pass rates as a substitute for verification on the real project.
|
|
149
|
+
- Command output in failure records receives basic redaction, but you should still avoid writing sensitive raw logs into a suite.
|
|
150
|
+
- Low-cost smoke suites can run frequently; heavy project suites should run on demand.
|
|
151
|
+
- Without eval evidence, do not claim that workflow capability has improved.
|