npm - @hongmaple0820/scale-engine - Versions diffs - 0.18.0 → 0.20.0 - Mend

@hongmaple0820/scale-engine 0.18.0 → 0.20.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (110)

package/README.en.md +310 -237
package/README.md +255 -63
package/dist/api/cli.js +2656 -1258
package/dist/api/cli.js.map +1 -1
package/dist/api/doctor.d.ts +4 -1
package/dist/api/doctor.js +85 -1
package/dist/api/doctor.js.map +1 -1
package/dist/api/quickstart.d.ts +3 -0
package/dist/api/quickstart.js +9 -4
package/dist/api/quickstart.js.map +1 -1
package/dist/cli/phaseCommands.js +7 -0
package/dist/cli/phaseCommands.js.map +1 -1
package/dist/codegraph/CodeIntelligence.d.ts +135 -0
package/dist/codegraph/CodeIntelligence.js +460 -0
package/dist/codegraph/CodeIntelligence.js.map +1 -0
package/dist/context/ContextBudget.d.ts +90 -0
package/dist/context/ContextBudget.js +322 -0
package/dist/context/ContextBudget.js.map +1 -0
package/dist/eval/WorkflowEval.d.ts +161 -0
package/dist/eval/WorkflowEval.js +379 -0
package/dist/eval/WorkflowEval.js.map +1 -0
package/dist/governance/GovernanceRoi.d.ts +25 -0
package/dist/governance/GovernanceRoi.js +70 -0
package/dist/governance/GovernanceRoi.js.map +1 -0
package/dist/governance/ProgressiveGovernance.d.ts +22 -0
package/dist/governance/ProgressiveGovernance.js +159 -0
package/dist/governance/ProgressiveGovernance.js.map +1 -0
package/dist/index.d.ts +2 -0
package/dist/index.js +4 -0
package/dist/index.js.map +1 -1
package/dist/memory/MemoryBrain.d.ts +135 -0
package/dist/memory/MemoryBrain.js +635 -0
package/dist/memory/MemoryBrain.js.map +1 -0
package/dist/memory/MemoryFabric.d.ts +118 -0
package/dist/memory/MemoryFabric.js +281 -0
package/dist/memory/MemoryFabric.js.map +1 -0
package/dist/memory/MemoryLearning.d.ts +61 -0
package/dist/memory/MemoryLearning.js +203 -0
package/dist/memory/MemoryLearning.js.map +1 -0
package/dist/memory/index.d.ts +3 -0
package/dist/memory/index.js +4 -0
package/dist/memory/index.js.map +1 -0
package/dist/output/GovernanceDashboard.d.ts +57 -0
package/dist/output/GovernanceDashboard.js +250 -0
package/dist/output/GovernanceDashboard.js.map +1 -0
package/dist/output/HTMLArtifactLayer.js +31 -31
package/dist/output/index.d.ts +2 -0
package/dist/output/index.js +1 -0
package/dist/output/index.js.map +1 -1
package/dist/prompts/VibeTemplateGallery.js +121 -121
package/dist/runtime/FinalReportGuard.d.ts +16 -0
package/dist/runtime/FinalReportGuard.js +14 -0
package/dist/runtime/FinalReportGuard.js.map +1 -0
package/dist/runtime/RuntimeDoctor.d.ts +23 -0
package/dist/runtime/RuntimeDoctor.js +151 -0
package/dist/runtime/RuntimeDoctor.js.map +1 -0
package/dist/runtime/RuntimeEvidenceLedger.d.ts +50 -0
package/dist/runtime/RuntimeEvidenceLedger.js +89 -0
package/dist/runtime/RuntimeEvidenceLedger.js.map +1 -0
package/dist/runtime/SessionLedger.d.ts +53 -0
package/dist/runtime/SessionLedger.js +104 -0
package/dist/runtime/SessionLedger.js.map +1 -0
package/dist/runtime/index.d.ts +4 -0
package/dist/runtime/index.js +5 -0
package/dist/runtime/index.js.map +1 -0
package/dist/skills/SkillRadar.d.ts +83 -0
package/dist/skills/SkillRadar.js +384 -0
package/dist/skills/SkillRadar.js.map +1 -0
package/dist/workflow/EngineeringStandards.js +69 -66
package/dist/workflow/EngineeringStandards.js.map +1 -1
package/dist/workflow/GovernanceTemplatePacks.js +126 -126
package/dist/workflow/GovernanceTemplates.d.ts +1 -1
package/dist/workflow/GovernanceTemplates.js +500 -229
package/dist/workflow/GovernanceTemplates.js.map +1 -1
package/dist/workflow/ResourceGovernance.js +27 -18
package/dist/workflow/ResourceGovernance.js.map +1 -1
package/dist/workflow/VerificationCommands.d.ts +11 -0
package/dist/workflow/VerificationCommands.js +2 -0
package/dist/workflow/VerificationCommands.js.map +1 -1
package/dist/workflow/VerificationProfile.d.ts +2 -1
package/dist/workflow/VerificationProfile.js +3 -0
package/dist/workflow/VerificationProfile.js.map +1 -1
package/dist/workflow/WorkflowArtifactWriter.js +2 -1
package/dist/workflow/WorkflowArtifactWriter.js.map +1 -1
package/dist/workflow/WorkflowEngine.js +4 -1
package/dist/workflow/WorkflowEngine.js.map +1 -1
package/dist/workflow/WorkspaceSafety.d.ts +9 -0
package/dist/workflow/WorkspaceSafety.js +49 -0
package/dist/workflow/WorkspaceSafety.js.map +1 -0
package/dist/workflow/gates/GateSystem.d.ts +12 -1
package/dist/workflow/gates/GateSystem.js +106 -0
package/dist/workflow/gates/GateSystem.js.map +1 -1
package/dist/workflow/types.d.ts +1 -1
package/docs/CODE_INTELLIGENCE.md +138 -0
package/docs/CONTEXT_BUDGET.md +87 -0
package/docs/GOVERNANCE_DASHBOARD.md +69 -0
package/docs/MEMORY_BRAIN.md +104 -0
package/docs/MEMORY_FABRIC.md +107 -0
package/docs/README.md +76 -0
package/docs/RUNTIME_EVIDENCE.md +101 -0
package/docs/SKILL_RADAR.md +115 -0
package/docs/WORKFLOW_EVAL.md +151 -0
package/docs/start/README.md +42 -0
package/docs/start/agent-governance-demo.md +107 -0
package/docs/start/quickstart.md +127 -0
package/examples/demo-projects/agent-governance-demo/README.md +37 -0
package/examples/demo-projects/agent-governance-demo/package.json +16 -0
package/examples/demo-projects/agent-governance-demo/src/oauth-state.ts +39 -0
package/examples/demo-projects/agent-governance-demo/tests/oauth-state.test.ts +52 -0
package/package.json +14 -3

package/docs/MEMORY_BRAIN.md ADDED

@@ -0,0 +1,104 @@
+# Memory Brain
+Memory Brain is SCALE's project-scoped long-term memory layer. It is separate from Memory Fabric:
+- Memory Fabric builds a compact context pack for the current task.
+- Memory Brain stores reviewed project knowledge with evidence, confidence, scope, and contradiction checks.
+The first version is local-first and uses SQLite:
+```text
+.scale/memory/brain.sqlite
+.scale/memory/brain-manifest.json
+```
+## Commands
+```bash
+scale memory ingest --from evidence --task-id <task-id>
+scale memory ingest --from candidate --candidate-id <candidate-id>
+scale memory ingest --from failure --failure-id <failure-replay-id>
+scale memory query "OAuth callback state design"
+scale memory contradictions
+scale memory dream
+scale memory promote <memory-node-id-or-candidate-id>
+scale memory export --output .scale/memory/export.jsonl
+scale memory import .scale/memory/export.jsonl
+```
+## Node Contract
+```ts
+interface MemoryNode {
+  id: string
+  type: 'fact' | 'decision' | 'incident' | 'relation' | 'contradiction'
+  title: string
+  summary: string
+  entities: string[]
+  source: 'runtime-evidence' | 'task-artifact' | 'docs' | 'git' | 'manual'
+  evidencePaths: string[]
+  confidence: number
+  scope: 'project' | 'workspace' | 'global-candidate'
+  status: 'candidate' | 'active' | 'stale' | 'rejected'
+  createdAt: string
+  updatedAt: string
+  lastVerifiedAt?: string
+}
+```
+## Evidence Rule
+Active memory must have at least one evidence path. SCALE blocks promotion when this is not true.
+Runtime evidence and learning candidates are ingested as `candidate` records first. `scale memory promote` is the explicit boundary where reviewed memory becomes active.
+Failure replay records can also be ingested as `incident` candidates:
+```bash
+scale eval run --suite workflow-baseline
+scale eval failures --since 30d
+scale memory ingest --from failure --failure-id <failure-replay-id>
+scale memory promote <memory-node-id>
+```
+This connects Eval Harness failures to long-term memory without automatically rewriting project standards. A failure becomes active memory only after promotion and only if the replay artifact is present as evidence.
+## Scope Rule
+Project memory stays project-scoped by default. `global-candidate` is allowed for export and review, but it cannot be activated inside a project brain. This prevents one project's temporary truth from becoming a global rule.
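The Evidence Rule and Scope Rule above can be read together as a guard that runs before a node becomes active. The following is a minimal sketch under stated assumptions, not the package's implementation: `PromotableNode`, `PromotionResult`, and `promote` are hypothetical names, and only the fields taken from the `MemoryNode` contract come from the source.

```ts
// Subset of the MemoryNode contract shown above; only the fields the guard needs.
type Scope = 'project' | 'workspace' | 'global-candidate'
type Status = 'candidate' | 'active' | 'stale' | 'rejected'

interface PromotableNode {
  id: string
  evidencePaths: string[]
  scope: Scope
  status: Status
}

// Hypothetical result shape, for illustration only.
type PromotionResult =
  | { ok: true; node: PromotableNode }
  | { ok: false; reason: string }

// Hypothetical helper: applies the Evidence Rule, then the Scope Rule.
function promote(node: PromotableNode): PromotionResult {
  if (node.evidencePaths.length === 0) {
    return { ok: false, reason: 'active memory needs at least one evidence path' }
  }
  if (node.scope === 'global-candidate') {
    return { ok: false, reason: 'global-candidate cannot be activated inside a project brain' }
  }
  // Return a copy so the caller decides when to persist the status change.
  return { ok: true, node: { ...node, status: 'active' } }
}
```

Returning a reason string instead of throwing keeps the blocked case reportable, which fits a CLI that is expected to explain why a promotion was refused.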
+## Contradiction Rule
+`scale memory contradictions` reports conflicts instead of resolving them automatically. Examples:
+- one memory says a provider is enabled, another says it is disabled
+- one memory says a route exists, another says it is missing
+- one memory says an operation is allowed, another says it is blocked
+The command exits non-zero when active contradictions exist.
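The exit-code behaviour can be sketched as follows. This assumes contradictions are stored as nodes with `type: 'contradiction'` (the type appears in the node contract, but how the command stores or detects conflicts is not stated in the source); `NodeSummary` and `contradictionExitCode` are hypothetical names.

```ts
interface NodeSummary {
  id: string
  type: 'fact' | 'decision' | 'incident' | 'relation' | 'contradiction'
  status: 'candidate' | 'active' | 'stale' | 'rejected'
}

// Hypothetical helper: list active contradictions and derive a process exit code.
// Nothing is resolved or modified here; the conflicts are only reported.
function contradictionExitCode(nodes: NodeSummary[]): { active: string[]; exitCode: number } {
  const active = nodes
    .filter((n) => n.type === 'contradiction' && n.status === 'active')
    .map((n) => n.id)
  return { active, exitCode: active.length > 0 ? 1 : 0 }
}
```

A non-zero exit code lets a CI step or pre-release check fail on unresolved conflicts without the tool deciding which side is correct.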
+## Dream Maintenance
+`scale memory dream` is a maintenance pass. It reports:
+- promotion candidates
+- stale active memories
+- duplicate groups
+- contradictions
+- suggested docs to update
+- active memories missing evidence
+It does not auto-promote standards, rewrite docs, or delete memories.
+## Resource Lifecycle
+Memory Brain files under `.scale/memory/` are local runtime state by default. Commit only curated exports, documented decisions, or task artifacts that were intentionally reviewed.
+Recommended flow:
+```text
+runtime evidence -> memory settle -> memory ingest -> memory promote -> docs/standards update when stable
+eval failure replay -> memory ingest --from failure -> memory promote -> workflow rule update when stable
+```
+This keeps memory useful without turning every session observation into permanent project truth.

package/docs/MEMORY_FABRIC.md ADDED

@@ -0,0 +1,107 @@
+# Memory Fabric
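+Memory Fabric is SCALE's context compression layer for reducing token consumption in long sessions and improving the quality of agent memory. Instead of stuffing every historical document back into the prompt, it generates an auditable context pack scoped to the task.
+It aggregates four kinds of information:
+- Runtime Evidence: evidence from commands, tools, browsers, skills, MCP calls, and manual verification that were actually run.
+- Session Events: phase, tool-usage, and evidence-write events from the current session.
+- Knowledge Recall: verified experience, rules, and past lessons recalled from the project knowledge base.
+- Project Graph: detects `graphify-out/GRAPH_REPORT.md` or `.scale/graph/manifest.json` and references only the graph status and summary, without loading the full text of a large graph into context.
+## Basic Commands
+Generate a context pack: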
+```bash
+scale memory pack \
+  --task-id 2026-05-18-runtime-evidence \
+  --session-id 2026-05-18-runtime-evidence \
+  --task "Continue implementing runtime evidence and the final delivery check" \
+  --level M \
+  --files src/runtime,src/api/cli.ts \
+  --budget 4000
+```
+Emit JSON so that other agents, CLIs, or review tools can read it:
+```bash
+scale memory pack \
+  --task "Fix OAuth callback state expiry handling" \
+  --level M \
+  --budget 4000 \
+  --json
+```
+Check the context budget:
+```bash
+scale memory doctor \
+  --task "Cross-module permission refactor" \
+  --level L \
+  --budget 3000
+```
+Turn the runtime evidence from a completed task into a learning candidate:
+```bash
+scale memory settle \
+  --task-id 2026-05-18-runtime-evidence \
+  --session-id 2026-05-18-runtime-evidence \
+  --task "Continue implementing runtime evidence and the final delivery check" \
+  --level M \
+  --budget 4000
+```
+`settle` writes:
+```text
+.scale/memory/learning-candidates/<candidate-id>.json
+.scale/memory/learning-candidates/<candidate-id>.md
+```
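+These files are local runtime learning candidates and should not be committed directly to Git by default. Their purpose is to let a human or a review agent judge whether a given lesson deserves to enter the long-term knowledge base, engineering standards, or module docs.
+## Budget Strategy
+Memory Fabric uses an estimated token budget to control context size. Priority, from highest to lowest:
+1. Runtime Evidence: failed and passed evidence is kept first.
+2. Session Events: the most recent session events are kept first.
+3. Knowledge Recall: the top K knowledge entries are recalled by task description and file scope.
+4. Project Graph: only the graph report path and a short summary are kept.
+When the budget is insufficient, lower-priority sections are marked as omitted and the reason is recorded. This lets the agent know which context was deliberately trimmed, rather than wrongly assuming the project has no relevant information.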
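The priority-ordered budgeting described above can be sketched as a greedy pass over sections. This is an illustration under assumptions, not the package's algorithm: `Section`, `PackedSection`, `packSections`, and the token figures are hypothetical, and the real estimator and omission format are not given in the source.

```ts
interface Section {
  name: string
  estimatedTokens: number
}

interface PackedSection {
  name: string
  included: boolean
  omittedReason?: string
}

// Hypothetical packer: `sections` must arrive in priority order, highest first.
// A section that does not fit is kept in the result with a recorded reason,
// so the reader can tell "trimmed" apart from "nothing to report".
function packSections(sections: Section[], budget: number): PackedSection[] {
  let remaining = budget
  return sections.map((section) => {
    if (section.estimatedTokens <= remaining) {
      remaining -= section.estimatedTokens
      return { name: section.name, included: true }
    }
    return {
      name: section.name,
      included: false,
      omittedReason: `needs ${section.estimatedTokens} tokens, ${remaining} left`,
    }
  })
}
```

In this sketch a small lower-priority section can still be included after a larger one was omitted; a stricter variant could stop at the first section that does not fit.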
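+## Relationship to the Knowledge Base and Self-Evolution
+Memory Fabric does not replace the knowledge base. It is the read layer between the knowledge base, runtime evidence, and the graph:
+- Runtime Evidence records "what was actually done this time".
+- The Knowledge Base records "experience and rules that are reusable over the long term".
+- Graphify or the project graph records "the structural relationships between modules".
+- Memory Fabric generates the most relevant context pack each time a task starts, resumes, is reviewed, or is about to be released.
+After a task is complete, settle the genuinely stable lessons into the knowledge base or long-lived docs; `.scale/events/` and `.scale/evidence/` remain local runtime artifacts and should not be committed to Git by default.
+The new recommended loop is: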
+```text
+runtime evidence -> memory pack -> memory settle -> human review -> knowledge/docs/rules
+```
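+In other words, Memory Fabric first compresses evidence and context into candidates; it does not automatically upgrade a judgment made in one session into a long-term rule. When failure evidence exists, the candidate is marked `resolve-failures-first`, so that unresolved problems are not settled as "experience".
+## Recommended Use Cases
+- Before resuming a long session: generate a context pack first to avoid re-reading large amounts of documentation.
+- Before multi-agent collaboration: hand the context pack to the review agent or test agent.
+- Before a release: use runtime evidence and session events to check for unresolved failures.
+- After a task ends: use `memory settle` to generate learning candidates, then decide whether they enter the knowledge base, module docs, or engineering standards.
+- Large-project governance: combine the service matrix, resource governance, and engineering standards to generate task-relevant context instead of whole-repository noise.
+## Current Limits
+- The current version has no built-in vector database; if the project has a SQLite knowledge base configured, the existing recall interface is used.
+- The current version only detects whether Graphify output exists and generates a summary; it does not run Graphify itself.
+- HTML visual reports are a good fit to add on top of the context pack later; for now, Memory Fabric's core output stays JSON/Markdown for easy diffing, testing, and CLI integration.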

package/docs/README.md ADDED

@@ -0,0 +1,76 @@
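+# SCALE Engine Documentation Map
+This directory contains user guides, governance capability docs, architecture references, historical plans, and promotional material. New users should start with the getting-started entry points and the current governance capability docs; historical plans are background material only.
+## New User Entry Points
+| Document | Description |
+| --- | --- |
+| [start/README.md](start/README.md) | Overview of the getting-started path |
+| [start/quickstart.md](start/quickstart.md) | 3-minute quickstart |
+| [start/agent-governance-demo.md](start/agent-governance-demo.md) | Official demo walkthrough |
+| [../README.md](../README.md) | Project home page and capability overview |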
+## Current Governance Capabilities
+| Document | Description |
+| --- | --- |
+| [RESOURCE_GOVERNANCE.md](RESOURCE_GOVERNANCE.md) | Lifecycle governance for docs, reports, media, scripts, and temporary artifacts |
+| [ENGINEERING_STANDARDS.md](ENGINEERING_STANDARDS.md) | Engineering standards for logging, security, ORM, frameworks, testing, deployment, and more |
+| [TOOL_ORCHESTRATION.md](TOOL_ORCHESTRATION.md) | Orchestration strategy for skills, MCP, CLI, browser, and desktop automation |
+| [RUNTIME_EVIDENCE.md](RUNTIME_EVIDENCE.md) | Session ledger, runtime evidence, and the final delivery check |
+| [MEMORY_FABRIC.md](MEMORY_FABRIC.md) | Budgeted context packs built from runtime evidence, session events, knowledge recall, and graph status |
+| [MEMORY_BRAIN.md](MEMORY_BRAIN.md) | Evidence-driven long-term memory, contradiction detection, dream maintenance, and failure replay settlement |
+| [CONTEXT_BUDGET.md](CONTEXT_BUDGET.md) | Context Budget, Progressive Governance, Lazy Loading, and Governance ROI |
+| [CODE_INTELLIGENCE.md](CODE_INTELLIGENCE.md) | Code intelligence and exploration ROI with CodeGraph, Graphify, and explicit fallback |
+| [WORKFLOW_EVAL.md](WORKFLOW_EVAL.md) | Workflow Eval, pass@k metrics, Failure Replay, and improvement candidates |
+| [SKILL_RADAR.md](SKILL_RADAR.md) | Skill Radar, capability confidence, evidence requirements, and supply-chain safety checks |
+| [GOVERNANCE_DASHBOARD.md](GOVERNANCE_DASHBOARD.md) | Unified governance dashboard for runtime, eval, memory, resource, and HTML artifacts |
+| [RELEASE_READINESS.md](RELEASE_READINESS.md) | Pre-release quality gates, the official demo, and real-project adoption acceptance |
+| [SKILL-REPOSITORY.md](SKILL-REPOSITORY.md) | Governed skill repository and install safety policy |
+| [VIBE-TEMPLATES.md](VIBE-TEMPLATES.md) | Copyable Vibe Coding prompt templates |
+| [LEADERSHIP-PRESETS.md](LEADERSHIP-PRESETS.md) | Built-in leadership role presets such as CEO, CTO, PM, and Architect |
+## Architecture and Reference
+| Document | Description |
+| --- | --- |
+| [00-OVERVIEW.md](00-OVERVIEW.md) | System overview |
+| [01-ARCHITECTURE.md](01-ARCHITECTURE.md) | Architecture design |
+| [02-DATA-MODEL.md](02-DATA-MODEL.md) | Data model |
+| [03-CORE-MODULES.md](03-CORE-MODULES.md) | Core modules |
+| [04-INTEGRATION.md](04-INTEGRATION.md) | Platforms and integrations |
+| [06-DECISIONS.md](06-DECISIONS.md) | Architecture decision records |
+## Historical Plans and Process Records
+These documents are historical context and do not necessarily represent the current product entry points:
+| Document | Description |
+| --- | --- |
+| [05-ROADMAP.md](05-ROADMAP.md) | Roadmap |
+| [OPTIMIZATION_PLAN.md](OPTIMIZATION_PLAN.md) | Historical optimization plan |
+| [WEEK1-2-REPORT.md](WEEK1-2-REPORT.md) | Phase report |
+| [TASK_GUARD_SUMMARY.md](TASK_GUARD_SUMMARY.md) | Task Guard summary |
+| [TASK_GUARD_WORKFLOW_DEMO.md](TASK_GUARD_WORKFLOW_DEMO.md) | Early workflow demo |
+| [plans/2026-05-19-agent-engineering-os-upgrade-plan.md](plans/2026-05-19-agent-engineering-os-upgrade-plan.md) | Agent Engineering OS upgrade review draft: Context Budget, CodeGraph, Memory Brain, Skill Radar, HTML Artifact, and Eval Harness |
+| [plans/](plans/) | Archive of plans and technical proposals |
+| [superpowers/](superpowers/) | Archive of external methodology comparisons and plans |
+## Promotion and Assets
+| Document | Description |
+| --- | --- |
+| [promote-article-v2.md](promote-article-v2.md) | Promotional article draft v2 |
+| [promote-article-v2.html](promote-article-v2.html) | Promotional article HTML v2 |
+| [promote-article-v3.md](promote-article-v3.md) | Promotional article draft v3 |
+| [promote-article-v3.html](promote-article-v3.html) | Promotional article HTML v3 |
+| [imgs/](imgs/) | Community QR codes and promotional images |
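+## Maintenance Rules
+- Docs aimed at new users go in `docs/start/` first.
+- Currently executable capabilities go in the root README and the current governance capability docs.
+- Do not mix historical plans into beginner tutorials, so that users do not mistake old plans for current facts.
+- If CLI behavior changes, `README.md`, `docs/start/quickstart.md`, and the related reference docs must be updated in sync.
+- If a governance pack is added, `README.md`, `docs/start/README.md`, and the corresponding tests must be updated at the same time.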

package/docs/RUNTIME_EVIDENCE.md ADDED

@@ -0,0 +1,101 @@
+# Runtime Evidence
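+Runtime Evidence is SCALE's runtime evidence layer for recording what an agent actually did. Its goal is direct: without evidence from a real command, tool, browser, skill, or manual verification, an agent cannot claim that a task is complete.
+How it relates to the existing evidence layers:
+- Gate evidence: answers whether gates such as build, lint, test, security, and review passed.
+- Tool evidence: answers whether required skills, MCP, browser, desktop automation, or CLI tools were executed.
+- Runtime evidence: answers whether the current session has trustworthy evidence for final delivery.
+## Storage Location
+Runtime data is written to local runtime directories that SCALE already ignores: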
+```text
+.scale/
+├── events/
+│   ├── current-session.json
+│   └── sessions/<session-id>.jsonl
+└── evidence/
+    └── runtime/<evidence-id>.json
+```
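+These files are local runtime artifacts by default and should not be committed to Git. For long-term retention, settle a summary into the task summary, an ADR, the README, or module docs rather than committing raw logs.
+## Basic Flow
+Start a session: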
+```bash
+scale runtime start \
+  --session-id 2026-05-18-runtime-evidence \
+  --task-id 2026-05-18-runtime-evidence \
+  --level M \
+  --agent codex
+```
+Record evidence after a real command, gate, browser verification, skill execution, MCP call, or manual check:
+```bash
+scale runtime record \
+  --title "build" \
+  --kind command \
+  --status passed \
+  --command "npm run build" \
+  --exit-code 0 \
+  --summary "TypeScript build passed"
+```
+Check whether final delivery is allowed:
+```bash
+scale runtime final-check \
+  --task-id 2026-05-18-runtime-evidence \
+  --session-id 2026-05-18-runtime-evidence \
+  --level M
+```
+Check runtime health:
+```bash
+scale runtime doctor --level M
+scale doctor
+```
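+## Completion Rules
+M, L, and CRITICAL tasks must satisfy the following before final delivery:
+- At least one `passed` runtime evidence record exists within the current task/session scope.
+- No `failed` runtime evidence record exists within the current task/session scope.
+S-level tasks can stay lightweight, but once failure evidence exists they still cannot be claimed as complete.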
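The completion rules above reduce to a small decision function. The sketch below assumes the evidence list has already been filtered to the current task/session scope and that only the `passed` and `failed` statuses matter; `RuntimeEvidence`, `finalCheck`, and the reason strings are hypothetical and do not describe what `scale runtime final-check` actually prints.

```ts
type Level = 'S' | 'M' | 'L' | 'CRITICAL'

interface RuntimeEvidence {
  title: string
  status: 'passed' | 'failed'
}

// Hypothetical check: `evidence` is assumed to be scoped to the current task/session.
function finalCheck(level: Level, evidence: RuntimeEvidence[]): { allowed: boolean; reasons: string[] } {
  const reasons: string[] = []
  const failed = evidence.filter((e) => e.status === 'failed')
  const passed = evidence.filter((e) => e.status === 'passed')
  // Any failed record blocks delivery at every level, including S.
  if (failed.length > 0) {
    reasons.push(`${failed.length} failed evidence record(s) in scope`)
  }
  // M, L, and CRITICAL additionally need at least one passed record.
  if (level !== 'S' && passed.length === 0) {
    reasons.push('no passed evidence in scope')
  }
  return { allowed: reasons.length === 0, reasons }
}
```

Collecting every reason rather than stopping at the first lets a report show all blockers in one pass.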
+## Redaction Rules
+Runtime evidence reuses the redaction model of tool evidence. Before JSON is written, sensitive fields in commands, summaries, artifact paths, and metadata are processed:
+- password
+- token
+- secret
+- authorization
+- cookie
+- credential
+- api key
+- private key
+This keeps useful evidence while avoiding writing tokens, cookies, keys, and similar content into runtime files.
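As an illustration of key-based redaction using the term list above, the sketch below masks metadata values whose key names contain a sensitive term. It is an assumption-laden example: `normalizeKey`, `redactMetadata`, and the `[REDACTED]` placeholder are hypothetical, and the sketch covers metadata keys only, not commands, summaries, or artifact paths.

```ts
// Terms taken from the list above.
const SENSITIVE_TERMS = [
  'password', 'token', 'secret', 'authorization',
  'cookie', 'credential', 'api key', 'private key',
]

// Normalize "apiKey", "api_key", and "API-KEY" to "api key" before matching.
function normalizeKey(key: string): string {
  return key
    .replace(/([a-z0-9])([A-Z])/g, '$1 $2')
    .replace(/[_-]+/g, ' ')
    .toLowerCase()
}

// Hypothetical redactor: returns a copy with sensitive values masked.
function redactMetadata(metadata: Record<string, string>): Record<string, string> {
  const result: Record<string, string> = {}
  for (const key of Object.keys(metadata)) {
    const normalized = normalizeKey(key)
    const sensitive = SENSITIVE_TERMS.some((term) => normalized.indexOf(term) !== -1)
    result[key] = sensitive ? '[REDACTED]' : metadata[key]
  }
  return result
}
```

Matching on normalized key names is deliberately broad: a key such as `sessionCookie` is masked even though it is not an exact term, which errs on the side of not writing secrets to disk.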
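+## Recommended Use Cases
+Scenarios suited to recording runtime evidence:
+- Final delivery checks.
+- Long sessions or multi-phase tasks.
+- Cross-agent or external CLI review.
+- Browser, desktop automation, MCP, and skill verification.
+- Pre-release preflight.
+- Failure, fix, and retry records that need to enter the later learning loop.
+Do not use runtime evidence as a replacement for long-lived docs. Runtime evidence is "proof of operation"; the PRD, ADRs, architecture docs, README, and module docs are the long-term project contract.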

package/docs/SKILL_RADAR.md ADDED

@@ -0,0 +1,115 @@
+# Skill Radar
+Skill Radar is the active capability selection layer for SCALE. It does not auto-install or blindly run skills. It scores relevant skills, MCP servers, browser tools, desktop automation, and external CLIs against the current task, then returns:
+- why the capability matches
+- confidence score
+- safety level
+- required evidence
+- fallback path
+- supply-chain checks before installation or promotion
+The goal is to make agents actively use useful tools without turning the project into an unsafe prompt or tool bundle.
+## Commands
+```bash
+scale skill radar --task "Design upload UI and run browser E2E checks" --files src/pages/upload.tsx
+scale skill radar --task "Automate WPS desktop workflow with CUA" --json
+scale skill radar --task "Review release PR" --phase review --level L --output docs/worklog/tasks/release/skill-radar.md
+scale skill doctor --supply-chain
+scale skill doctor --supply-chain --json
+```
+## Safety Levels
+| Level | Meaning | Default action |
+| --- | --- | --- |
+| `trusted` | Official or low-risk capability with policy enabled | May be recommended when confidence is high |
+| `review-required` | Third-party or ecosystem capability | Require source, license, scripts, and revision review |
+| `restricted` | Browser, desktop, or external execution boundary | Require explicit evidence and side-effect boundaries |
+| `blocked` | Disabled by policy or failed safety review | Do not run; use fallback |
+## Confidence
+Skill Radar combines:
+- task keywords and workflow phase
+- changed file patterns
+- local skill installation
+- tool availability
+- trust level
+- policy status
+- frontend/package evidence
+- safety penalties
+The score is not a promise that the tool will work. It is a routing signal. Any recommendation still needs real evidence before the agent can claim success.
+## Default Domains
+| Domain | Typical triggers | Recommended capability types |
+| --- | --- | --- |
+| `ui` | UI, UX, frontend, component, visual, layout | design skills, visual review, screenshot evidence |
+| `browserAutomation` | browser, E2E, Playwright, Chrome, DevTools | web access, browser automation, DevTools evidence |
+| `desktopAutomation` | desktop, GUI, WPS, WeChat, CUA | disabled by default; manual operator fallback |
+| `externalCli` | Codex, Gemini, OpenCode, external agent CLI | disabled by default; dry-run and output evidence |
+| `review` | PR, merge, release, code review | reviewer skills, severity findings |
+| `docs` | docs, README, ADR, governance asset | doc impact and source-of-truth evidence |
+| `discovery` | skill, MCP, tool, capability discovery | find-skills plus safety review |
+## Evidence Contract
+Each recommendation carries required evidence. Examples:
+- UI work: `ui-spec`, `design-rationale`, `screenshot`, `visual-review`
+- Browser work: `browser-evidence`, `console-summary`, `network-summary`, `scenario-result`
+- Desktop work: `operator-boundary`, `desktop-screenshot`, `affected-app`
+- External CLI work: `cli-version-check`, `command`, `exit-code`, `output-summary`
+- Review work: `review-report`, `finding-list`, `severity`
+If evidence is missing, the final delivery should list the capability as unverified rather than claiming it was used successfully.
+## Supply-Chain Doctor
+`scale skill doctor --supply-chain` reviews known skill sources and install commands for:
+- HTTPS source requirement
+- `curl | bash`, `wget | sh`, `Invoke-Expression`, and `iex` blocking
+- destructive install patterns
+- npm/npx lifecycle script review
+- required source, license, and revision checks
+This is intentionally conservative. Third-party skills should start in review-required mode and be promoted only after inspection.
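Two of the checks listed above, the HTTPS source requirement and the pipe-to-shell and PowerShell-eval blocking, can be sketched with plain regular expressions. The rule names, `Finding`, and `reviewInstall` are hypothetical, and the patterns are an approximation for illustration rather than the doctor's real rule set.

```ts
interface Finding {
  rule: string
  detail: string
}

// Assumed patterns: a download piped into a shell, or PowerShell evaluating text.
const BLOCKED_PATTERNS: { rule: string; pattern: RegExp }[] = [
  { rule: 'pipe-to-shell', pattern: /\b(curl|wget)\b[^|]*\|\s*(sudo\s+)?(ba|z)?sh\b/i },
  { rule: 'powershell-eval', pattern: /\b(Invoke-Expression|iex)\b/i },
]

// Hypothetical reviewer: returns one finding per violated rule, empty when clean.
function reviewInstall(sourceUrl: string, installCommand: string): Finding[] {
  const findings: Finding[] = []
  if (!/^https:\/\//i.test(sourceUrl)) {
    findings.push({ rule: 'https-source', detail: `source is not HTTPS: ${sourceUrl}` })
  }
  for (const { rule, pattern } of BLOCKED_PATTERNS) {
    if (pattern.test(installCommand)) {
      findings.push({ rule, detail: `blocked pattern in: ${installCommand}` })
    }
  }
  return findings
}
```

A pattern check like this can produce false positives, which is consistent with the conservative stance: a flagged command goes to human review instead of being run.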
+## Policy Integration
+Skill Radar reads `.scale/tools.json` through the Tool Policy layer. Defaults:
+- UI and browser capabilities are enabled but evidence-required.
+- Desktop CUA is disabled by default.
+- External agent CLIs are disabled by default.
+- Browser tools require captured evidence and should stay in approved domains.
+Use Tool Policy to enable a restricted capability deliberately rather than relying on an agent's assumption.
+## Fallback Rule
+Every recommendation must include a fallback. This prevents tool theater:
+```text
+If the capability is missing, unsafe, low-confidence, or policy-blocked,
+the agent must use the fallback and record why the capability was not used.
+```
+## Artifact Lifecycle
+Skill Radar reports can be written into task artifacts:
+```bash
+scale skill radar \
+  --task "Refactor upload page and verify browser flow" \
+  --files src/pages/upload.tsx \
+  --output docs/worklog/tasks/2026-05-19-upload-refactor/skill-radar.md
+```
+Keep the report when it is evidence for an M/L/CRITICAL task. Do not commit transient local detection output unless it is part of the reviewed task artifact set.

package/docs/WORKFLOW_EVAL.md ADDED

@@ -0,0 +1,151 @@
+# Workflow Eval Harness
+Status: implemented baseline
+Since: v0.22 development branch
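+Workflow Eval Harness is used to prove whether a workflow actually improves the quality of an agent's engineering delivery, rather than relying on subjective impressions alone. It runs lightweight eval suites, records pass@k, fix iterations, tool calls, token estimates, and human corrections, and keeps a Failure Replay when a case fails.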
+## Commands
+Initialize the default baseline suite:
+```bash
+scale eval init
+scale eval init --suite workflow-baseline --json
+```
+Run a suite:
+```bash
+scale eval run --suite workflow-baseline
+scale eval run --suite workflow-baseline --json
+```
+Compare two runs:
+```bash
+scale eval compare --baseline <run-id> --candidate <run-id>
+scale eval compare --baseline <run-id> --candidate <run-id> --json
+```
+Generate a Markdown report:
+```bash
+scale eval report --run <run-id>
+scale eval report --run <run-id> --output docs/worklog/eval-report.md
+```
+Inspect and promote failure replays:
+```bash
+scale eval failures --since 30d
+scale eval replay <failure-id>
+scale eval replay --task-id <task-id>
+scale eval promote-failure <failure-id>
+```
+## Failure Replay To Memory
+Failure Replay is local eval evidence first. When a failure pattern is useful for future work, ingest it into Memory Brain as an `incident` candidate:
+```bash
+scale memory ingest --from failure --failure-id <failure-id>
+scale memory query "missing verification evidence"
+scale memory promote <memory-node-id>
+```
+This does not auto-change standards or hooks. It only makes the failure queryable and evidence-backed so repeated mistakes can be promoted deliberately after review.
+## Storage
+```text
+.scale/evals/
+├── suites/
+├── runs/
+├── failures/
+└── improvements/
+```
+These files are local runtime evidence by default. Commit only curated summaries or intentional benchmark fixtures.
+## Suite Shape
+```json
+{
+  "version": "1.0",
+  "id": "workflow-baseline",
+  "name": "SCALE workflow baseline",
+  "cases": [
+    {
+      "id": "governance-command-smoke",
+      "type": "bugfix",
+      "title": "Command evidence smoke",
+      "task": "Verify that a local command can produce concrete eval evidence.",
+      "phase": "verify",
+      "successCriteria": ["command exits 0"],
+      "attempts": [
+        {
+          "id": "attempt-1",
+          "command": "node -e \"console.log('scale-eval-ok')\"",
+          "expectedExitCode": 0,
+          "outputContains": "scale-eval-ok"
+        }
+      ]
+    }
+  ]
+}
+```
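Given the attempt fields in the suite above, deciding whether one attempt passed can be sketched as follows. The field names come from the JSON; `CommandResult`, `attemptPassed`, and the treatment of `outputContains` as optional are assumptions, and the sketch does not run the command itself.

```ts
// Attempt shape mirrored from the suite JSON above.
interface EvalAttempt {
  id: string
  command: string
  expectedExitCode: number
  outputContains?: string // assumed optional for this sketch
}

// Hypothetical capture of a finished command.
interface CommandResult {
  exitCode: number
  output: string
}

// An attempt passes when the exit code matches and, if requested,
// the expected substring appears in the captured output.
function attemptPassed(attempt: EvalAttempt, result: CommandResult): boolean {
  if (result.exitCode !== attempt.expectedExitCode) return false
  if (attempt.outputContains === undefined) return true
  return result.output.indexOf(attempt.outputContains) !== -1
}
```

Checking output in addition to the exit code guards against commands that exit 0 without doing the expected work.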
+## Metrics
+| Metric | Meaning |
+| --- | --- |
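+| `passAt1Rate` | Share of cases that pass on the first complete attempt |
+| `passAt3Rate` | Share of cases that pass within three attempts |
+| `averageFixIterations` | Average number of fix loops after the first failure |
+| `totalToolCalls` | Number of eval attempts, an approximate measure of tool-call cost |
+| `estimatedTokens` | Estimated token cost of the task and output summaries |
+| `humanCorrections` | Number of human corrections |
+| `failureReplayCount` | Number of failure replay records |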
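One plausible reading of the first three metrics can be sketched as below. The exact definitions used by the package are not given in the source, so this is an interpretation: each case is reduced to an ordered list of attempt outcomes, and a fix iteration is counted as each attempt made after an initial failure, up to and including the first pass. `CaseAttempts` and `computeMetrics` are hypothetical names.

```ts
// One entry per case: pass/fail of each attempt, in execution order.
type CaseAttempts = boolean[]

interface EvalMetrics {
  passAt1Rate: number
  passAt3Rate: number
  averageFixIterations: number
}

function computeMetrics(cases: CaseAttempts[]): EvalMetrics {
  if (cases.length === 0) {
    return { passAt1Rate: 0, passAt3Rate: 0, averageFixIterations: 0 }
  }
  let passAt1 = 0
  let passAt3 = 0
  let fixIterations = 0
  let casesWithFailure = 0
  for (const attempts of cases) {
    const firstPass = attempts.indexOf(true) // -1 when the case never passes
    if (firstPass === 0) passAt1 += 1
    if (firstPass >= 0 && firstPass < 3) passAt3 += 1
    if (attempts.length > 0 && attempts[0] === false) {
      casesWithFailure += 1
      // Attempts made after the first failure, stopping at the first pass.
      const used = firstPass === -1 ? attempts.length : firstPass + 1
      fixIterations += used - 1
    }
  }
  return {
    passAt1Rate: passAt1 / cases.length,
    passAt3Rate: passAt3 / cases.length,
    averageFixIterations: casesWithFailure === 0 ? 0 : fixIterations / casesWithFailure,
  }
}
```

Averaging fix iterations only over cases that failed at least once keeps clean first-attempt passes from diluting the number; averaging over all cases would be an equally defensible choice.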
+## Failure Replay
+A failure records more than the final failed status. It also saves:
+- task and success criteria
+- phase
+- wrong turn
+- evidence
+- correction
+- prevention
+- replay command
+- redaction status
+Failure categories currently include:
+- `wrong-exploration-path`
+- `hallucinated-project-fact`
+- `missing-codegraph-or-graph-fallback`
+- `over-broad-context-load`
+- `bad-skill-recommendation`
+- `missing-verification-evidence`
+- `failed-security-or-resource-gate`
+- `human-correction-after-agent-confidence`
+- `command-failure`
+- `unknown`
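+`scale eval promote-failure` promotes a failure replay to an improvement candidate but does not automatically modify project standards. Whether it enters long-term standards still requires confirmation by a human or a later review.
+## Governance Use
+- The default suite in v0.22 is a lightweight smoke baseline used to verify that the eval pipeline runs.
+- Real projects should gradually add bugfix, feature, security, frontend, release, and resource cases.
+- Failure Replay should work together with Resource Governance: keep records local by default, and commit only summaries, benchmarks, or cases that are explicitly meant to be maintained long term.
+- Workflow Eval data can feed into Governance ROI later, to judge whether a governance module actually reduces rework, tool calls, tokens, or human corrections.
+## Policy
+- Do not use the eval pass rate as a substitute for real project verification.
+- Command output in failure records receives basic redaction, but avoid writing sensitive raw logs into a suite anyway.
+- Low-cost smoke suites can run frequently; heavy project suites should run on demand.
+- Without eval evidence, do not claim that a workflow capability has improved.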