@hongmaple0820/scale-engine 0.40.1 → 0.43.0

Files changed (228)
  1. package/README.md +30 -2
  2. package/dist/api/cli.js +286 -7
  3. package/dist/api/cli.js.map +1 -1
  4. package/dist/api/doctor.js +1 -1
  5. package/dist/api/doctor.js.map +1 -1
  6. package/dist/api/quickstart.d.ts +11 -0
  7. package/dist/api/quickstart.js +98 -1
  8. package/dist/api/quickstart.js.map +1 -1
  9. package/dist/artifact/fsmDefinitions.js +15 -2
  10. package/dist/artifact/fsmDefinitions.js.map +1 -1
  11. package/dist/artifact/types.d.ts +1 -1
  12. package/dist/artifact/types.js.map +1 -1
  13. package/dist/bootstrap/DependencyBootstrap.d.ts +1 -0
  14. package/dist/bootstrap/DependencyBootstrap.js +137 -25
  15. package/dist/bootstrap/DependencyBootstrap.js.map +1 -1
  16. package/dist/cache/ScanCache.d.ts +41 -0
  17. package/dist/cache/ScanCache.js +120 -0
  18. package/dist/cache/ScanCache.js.map +1 -0
  19. package/dist/capabilities/BrowserQACapability.d.ts +14 -0
  20. package/dist/capabilities/BrowserQACapability.js +94 -0
  21. package/dist/capabilities/BrowserQACapability.js.map +1 -1
  22. package/dist/capabilities/InstalledSkillsIntegration.js +29 -9
  23. package/dist/capabilities/InstalledSkillsIntegration.js.map +1 -1
  24. package/dist/cli/autofixCommands.d.ts +22 -0
  25. package/dist/cli/autofixCommands.js +32 -0
  26. package/dist/cli/autofixCommands.js.map +1 -0
  27. package/dist/cli/cortexCommands.d.ts +71 -0
  28. package/dist/cli/cortexCommands.js +335 -0
  29. package/dist/cli/cortexCommands.js.map +1 -0
  30. package/dist/cli/costCommands.d.ts +13 -0
  31. package/dist/cli/costCommands.js +48 -0
  32. package/dist/cli/costCommands.js.map +1 -0
  33. package/dist/cli/orchCommands.d.ts +43 -0
  34. package/dist/cli/orchCommands.js +135 -0
  35. package/dist/cli/orchCommands.js.map +1 -0
  36. package/dist/cli/phaseCommands.js +1 -2
  37. package/dist/cli/phaseCommands.js.map +1 -1
  38. package/dist/cli/qaCommands.d.ts +22 -0
  39. package/dist/cli/qaCommands.js +84 -0
  40. package/dist/cli/qaCommands.js.map +1 -0
  41. package/dist/cli/quickstartCommands.d.ts +17 -0
  42. package/dist/cli/quickstartCommands.js +47 -0
  43. package/dist/cli/quickstartCommands.js.map +1 -0
  44. package/dist/cli/shieldCommands.d.ts +30 -0
  45. package/dist/cli/shieldCommands.js +212 -0
  46. package/dist/cli/shieldCommands.js.map +1 -0
  47. package/dist/cli/tuiCommands.d.ts +7 -0
  48. package/dist/cli/tuiCommands.js +33 -0
  49. package/dist/cli/tuiCommands.js.map +1 -0
  50. package/dist/config/profiles.js +26 -0
  51. package/dist/config/profiles.js.map +1 -1
  52. package/dist/context/ContextBudget.js +2 -2
  53. package/dist/core/GbrainRuntime.d.ts +25 -0
  54. package/dist/core/GbrainRuntime.js +270 -0
  55. package/dist/core/GbrainRuntime.js.map +1 -0
  56. package/dist/cortex/GovernanceMetrics.d.ts +66 -0
  57. package/dist/cortex/GovernanceMetrics.js +230 -0
  58. package/dist/cortex/GovernanceMetrics.js.map +1 -0
  59. package/dist/cortex/InstinctExtractor.d.ts +61 -0
  60. package/dist/cortex/InstinctExtractor.js +184 -0
  61. package/dist/cortex/InstinctExtractor.js.map +1 -0
  62. package/dist/cortex/InstinctStore.d.ts +54 -0
  63. package/dist/cortex/InstinctStore.js +266 -0
  64. package/dist/cortex/InstinctStore.js.map +1 -0
  65. package/dist/cortex/ReflexionEngine.d.ts +34 -0
  66. package/dist/cortex/ReflexionEngine.js +157 -0
  67. package/dist/cortex/ReflexionEngine.js.map +1 -0
  68. package/dist/cortex/SessionInjector.d.ts +44 -0
  69. package/dist/cortex/SessionInjector.js +127 -0
  70. package/dist/cortex/SessionInjector.js.map +1 -0
  71. package/dist/cortex/adapters/ClaudeAdapter.d.ts +17 -0
  72. package/dist/cortex/adapters/ClaudeAdapter.js +61 -0
  73. package/dist/cortex/adapters/ClaudeAdapter.js.map +1 -0
  74. package/dist/cortex/adapters/CodexAdapter.d.ts +10 -0
  75. package/dist/cortex/adapters/CodexAdapter.js +52 -0
  76. package/dist/cortex/adapters/CodexAdapter.js.map +1 -0
  77. package/dist/cortex/adapters/CursorAdapter.d.ts +10 -0
  78. package/dist/cortex/adapters/CursorAdapter.js +46 -0
  79. package/dist/cortex/adapters/CursorAdapter.js.map +1 -0
  80. package/dist/cortex/adapters/GeminiAdapter.d.ts +11 -0
  81. package/dist/cortex/adapters/GeminiAdapter.js +48 -0
  82. package/dist/cortex/adapters/GeminiAdapter.js.map +1 -0
  83. package/dist/env/EnvironmentDoctor.js +221 -5
  84. package/dist/env/EnvironmentDoctor.js.map +1 -1
  85. package/dist/eval/BenchmarkPublisher.d.ts +25 -0
  86. package/dist/eval/BenchmarkPublisher.js +27 -0
  87. package/dist/eval/BenchmarkPublisher.js.map +1 -0
  88. package/dist/guardrails/DependencyAuditor.js +10 -1
  89. package/dist/guardrails/DependencyAuditor.js.map +1 -1
  90. package/dist/memory/MemoryProviders.js +38 -91
  91. package/dist/memory/MemoryProviders.js.map +1 -1
  92. package/dist/orchestrator/OrchestratorDaemon.d.ts +44 -0
  93. package/dist/orchestrator/OrchestratorDaemon.js +150 -0
  94. package/dist/orchestrator/OrchestratorDaemon.js.map +1 -0
  95. package/dist/orchestrator/PolicyLoader.d.ts +80 -0
  96. package/dist/orchestrator/PolicyLoader.js +229 -0
  97. package/dist/orchestrator/PolicyLoader.js.map +1 -0
  98. package/dist/orchestrator/ReconciliationLoop.d.ts +71 -0
  99. package/dist/orchestrator/ReconciliationLoop.js +266 -0
  100. package/dist/orchestrator/ReconciliationLoop.js.map +1 -0
  101. package/dist/orchestrator/TrackerAdapter.d.ts +60 -0
  102. package/dist/orchestrator/TrackerAdapter.js +147 -0
  103. package/dist/orchestrator/TrackerAdapter.js.map +1 -0
  104. package/dist/orchestrator/WorkspaceManager.d.ts +66 -0
  105. package/dist/orchestrator/WorkspaceManager.js +257 -0
  106. package/dist/orchestrator/WorkspaceManager.js.map +1 -0
  107. package/dist/qa/BrowserDaemon.d.ts +23 -0
  108. package/dist/qa/BrowserDaemon.js +79 -0
  109. package/dist/qa/BrowserDaemon.js.map +1 -0
  110. package/dist/qa/E2ETestOrchestrator.d.ts +14 -0
  111. package/dist/qa/E2ETestOrchestrator.js +19 -0
  112. package/dist/qa/E2ETestOrchestrator.js.map +1 -0
  113. package/dist/review/CrossModelReviewer.d.ts +35 -0
  114. package/dist/review/CrossModelReviewer.js +75 -0
  115. package/dist/review/CrossModelReviewer.js.map +1 -0
  116. package/dist/review/ReviewAggregator.d.ts +13 -0
  117. package/dist/review/ReviewAggregator.js +28 -0
  118. package/dist/review/ReviewAggregator.js.map +1 -0
  119. package/dist/review/reviewCommands.d.ts +15 -0
  120. package/dist/review/reviewCommands.js +24 -0
  121. package/dist/review/reviewCommands.js.map +1 -0
  122. package/dist/routing/LocalModelProvider.d.ts +11 -0
  123. package/dist/routing/LocalModelProvider.js +21 -0
  124. package/dist/routing/LocalModelProvider.js.map +1 -0
  125. package/dist/routing/ModelRouter.d.ts +12 -0
  126. package/dist/routing/ModelRouter.js +31 -4
  127. package/dist/routing/ModelRouter.js.map +1 -1
  128. package/dist/runtime/AiOsRuntime.d.ts +1 -0
  129. package/dist/runtime/AiOsRuntime.js +15 -0
  130. package/dist/runtime/AiOsRuntime.js.map +1 -1
  131. package/dist/runtime/CostAnalyzer.d.ts +53 -0
  132. package/dist/runtime/CostAnalyzer.js +160 -0
  133. package/dist/runtime/CostAnalyzer.js.map +1 -0
  134. package/dist/runtime/CostOptimizer.d.ts +11 -0
  135. package/dist/runtime/CostOptimizer.js +21 -0
  136. package/dist/runtime/CostOptimizer.js.map +1 -0
  137. package/dist/runtime/ModelUsageLedger.d.ts +53 -2
  138. package/dist/runtime/ModelUsageLedger.js +243 -39
  139. package/dist/runtime/ModelUsageLedger.js.map +1 -1
  140. package/dist/setup/SetupVerification.d.ts +42 -0
  141. package/dist/setup/SetupVerification.js +180 -0
  142. package/dist/setup/SetupVerification.js.map +1 -0
  143. package/dist/shield/PolicyCompiler.d.ts +70 -0
  144. package/dist/shield/PolicyCompiler.js +540 -0
  145. package/dist/shield/PolicyCompiler.js.map +1 -0
  146. package/dist/shield/ProtectedPaths.d.ts +39 -0
  147. package/dist/shield/ProtectedPaths.js +179 -0
  148. package/dist/shield/ProtectedPaths.js.map +1 -0
  149. package/dist/shield/ShieldProtocol.d.ts +50 -0
  150. package/dist/shield/ShieldProtocol.js +103 -0
  151. package/dist/shield/ShieldProtocol.js.map +1 -0
  152. package/dist/skills/SkillMdStandard.d.ts +33 -0
  153. package/dist/skills/SkillMdStandard.js +88 -0
  154. package/dist/skills/SkillMdStandard.js.map +1 -0
  155. package/dist/skills/SkillRegistry.d.ts +9 -1
  156. package/dist/skills/SkillRegistry.js +20 -0
  157. package/dist/skills/SkillRegistry.js.map +1 -1
  158. package/dist/skills/interop/GStackInterop.d.ts +15 -0
  159. package/dist/skills/interop/GStackInterop.js +34 -0
  160. package/dist/skills/interop/GStackInterop.js.map +1 -0
  161. package/dist/skills/interop/OMCInterop.d.ts +15 -0
  162. package/dist/skills/interop/OMCInterop.js +34 -0
  163. package/dist/skills/interop/OMCInterop.js.map +1 -0
  164. package/dist/tools/ToolCapabilityRegistry.js +10 -0
  165. package/dist/tools/ToolCapabilityRegistry.js.map +1 -1
  166. package/dist/tui/TuiDashboard.d.ts +3 -0
  167. package/dist/tui/TuiDashboard.js +120 -0
  168. package/dist/tui/TuiDashboard.js.map +1 -0
  169. package/dist/workflow/GateCatalog.d.ts +2 -0
  170. package/dist/workflow/GateCatalog.js +59 -3
  171. package/dist/workflow/GateCatalog.js.map +1 -1
  172. package/dist/workflow/GovernanceTemplatePacks.d.ts +1 -1
  173. package/dist/workflow/GovernanceTemplatePacks.js +15 -0
  174. package/dist/workflow/GovernanceTemplatePacks.js.map +1 -1
  175. package/dist/workflow/TddLoop.d.ts +2 -0
  176. package/dist/workflow/TddLoop.js +2 -0
  177. package/dist/workflow/TddLoop.js.map +1 -1
  178. package/dist/workflow/UpgradeManager.d.ts +10 -1
  179. package/dist/workflow/UpgradeManager.js +55 -0
  180. package/dist/workflow/UpgradeManager.js.map +1 -1
  181. package/dist/workflow/VerificationProfile.d.ts +8 -0
  182. package/dist/workflow/VerificationProfile.js +62 -1
  183. package/dist/workflow/VerificationProfile.js.map +1 -1
  184. package/dist/workflow/VerificationSchema.d.ts +46 -0
  185. package/dist/workflow/VerificationSchema.js +97 -0
  186. package/dist/workflow/VerificationSchema.js.map +1 -0
  187. package/dist/workflow/autofix/AutoFixEngine.d.ts +37 -0
  188. package/dist/workflow/autofix/AutoFixEngine.js +169 -0
  189. package/dist/workflow/autofix/AutoFixEngine.js.map +1 -0
  190. package/dist/workflow/execution/RalphEngine.d.ts +18 -0
  191. package/dist/workflow/execution/RalphEngine.js +22 -0
  192. package/dist/workflow/execution/RalphEngine.js.map +1 -1
  193. package/dist/workflow/gates/EnhancedGates.d.ts +74 -0
  194. package/dist/workflow/gates/EnhancedGates.js +653 -0
  195. package/dist/workflow/gates/EnhancedGates.js.map +1 -0
  196. package/dist/workflow/gates/GateSystem.d.ts +3 -0
  197. package/dist/workflow/gates/GateSystem.js +94 -1
  198. package/dist/workflow/gates/GateSystem.js.map +1 -1
  199. package/dist/workflow/types.d.ts +1 -1
  200. package/docs/README.md +3 -0
  201. package/docs/guides/DEVELOPMENT_WORKFLOW.md +28 -9
  202. package/docs/guides/GETTING_STARTED.md +19 -0
  203. package/docs/guides/MIGRATION.md +119 -0
  204. package/docs/start/quickstart.md +1 -0
  205. package/docs/workflow/GATES_AND_SCORE.md +34 -1
  206. package/docs/workflow/README.md +58 -10
  207. package/package.json +7 -18
  208. package/scripts/workflow/lib/gbrain-runtime.mjs +185 -0
  209. package/scripts/workflow/lib/report-output.mjs +107 -0
  210. package/scripts/workflow/provider-rehearsal.mjs +129 -48
  211. package/scripts/workflow/setup-smoke.mjs +142 -8
  212. package/docs/ACTIVE_SECURITY_VISUAL_GATES.md +0 -87
  213. package/docs/AI_ENGINEERING_OS_POSITIONING.md +0 -607
  214. package/docs/BACKGROUND_HUNTER.md +0 -62
  215. package/docs/CODE_INTELLIGENCE.md +0 -180
  216. package/docs/CONTEXT_BUDGET.md +0 -155
  217. package/docs/DEPENDENCY_AUDIT.md +0 -118
  218. package/docs/EVOLUTION_SHADOW_MODE.md +0 -63
  219. package/docs/GITLAB_FLOW.md +0 -125
  220. package/docs/GOVERNANCE_DASHBOARD.md +0 -85
  221. package/docs/MEMORY_BRAIN.md +0 -104
  222. package/docs/MEMORY_FABRIC.md +0 -161
  223. package/docs/RESOURCE_GOVERNANCE.md +0 -92
  224. package/docs/RUNTIME_EVIDENCE.md +0 -101
  225. package/docs/WORKFLOW_EVAL.md +0 -151
  226. package/image/wechat-public.jpg +0 -0
  227. package/image/wxPay.jpg +0 -0
  228. package/image/zfb.jpg +0 -0
@@ -1,607 +0,0 @@
- # SCALE Engine Strategic Positioning
-
- > Date: 2026-05-20
- > Status: strategic direction with a v0.27.0 runtime baseline
- > Audience: maintainers, contributors, roadmap reviewers, and product-facing documentation owners
-
- SCALE Engine should be positioned as an **Agent Governance Runtime** evolving toward an **AI Engineering OS**.
-
- The project is no longer best described as a prompt toolbox. Its durable value is the runtime layer around AI coding agents:
-
- - workflow state machines
- - hard gates and verification evidence
- - hook-based tool interception
- - role and permission boundaries
- - artifact persistence
- - context budgets
- - memory provider routing
- - skill, MCP, CLI, and adapter orchestration
-
- The core thesis is:
-
- > Use system constraints, evidence, and runtime gates to replace agent self-discipline.
-
- This positioning is intentionally stronger than "prompt engineering", but it must stay evidence-backed. SCALE can describe the direction as AI Engineering OS; it should only describe measurable gains after benchmark, eval, or runtime evidence exists.
-
- ## Reference Inputs
-
- This document consolidates:
-
- - the current SCALE Engine architecture and documentation surfaces
- - maintainer review of SCALE as an Agent Governance Runtime
- - the external Harness Engineering framing in [SCALE OS v10.0: AI 编码的认知操作系统](https://segmentfault.com/a/1190000047756584)
-
- External performance claims from ecosystem articles are positioning inputs, not SCALE Engine release claims. Public SCALE claims should still be backed by local evals, runtime evidence, or release reports.
-
- ## 1. Market Category
-
- SCALE sits between four emerging categories:
-
- | Category | SCALE relationship |
- | --- | --- |
- | AI Engineering OS | Long-term positioning: one governed operating layer for agent-driven engineering |
- | Agent Governance Runtime | Current strongest fit: gates, hooks, evidence, role boundaries, and policy enforcement |
- | Workflow Orchestration Runtime | Current fit: FSM, phase commands, artifacts, verification, and ship flow |
- | Harness Engineering Infrastructure | Methodology fit: constraints + feedback + workflow + continuous improvement |
-
- SCALE should avoid being framed as only:
-
- - prompt templates
- - agent rules
- - a Claude/Cursor/Codex config generator
- - an AutoGPT-style chain executor
- - a generic skills catalog
-
- Those may exist around the project, but they are not the core defensible layer.
-
- ## 2. Problem Definition
-
- AI coding failures are not only model-quality failures. They are engineering runtime failures:
-
- | Failure mode | Typical prompt-only response | SCALE response |
- | --- | --- | --- |
- | Fake completion | "Please verify before finishing" | verification gates and final evidence checks |
- | Skipped tests | reminder text | FSM and verification status before completion |
- | Repeated blind retries | "try a different approach" | retry and behavior detectors |
- | Context overload | longer instructions | context budgets, lazy loading, scoped packs |
- | Agent drift | more rules | persisted workflow state and phase boundaries |
- | Hallucinated delivery | review prompt | runtime evidence ledger and ship gates |
- | Lost learning | chat history | memory artifacts, failure replay, lessons, rule candidates |
- | Multi-agent confusion | role descriptions | role gateway and tool permission boundaries |
- | Tool overreach | trust agent judgment | hook interception and policy gateway |
-
- The strategic target is not to make the model "more obedient". The target is to make non-compliant behavior observable, blockable, and recoverable.
-
- ## 3. Current Strengths
-
- ### 3.1 Runtime Constraints
-
- SCALE already has the right architectural instinct: lower critical rules from prompt text into runtime checks.
-
- Relevant surfaces:
-
- - `docs/ENGINEERING_STANDARDS.md`
- - `docs/RUNTIME_EVIDENCE.md`
- - `docs/DEPENDENCY_AUDIT.md`
- - `src/workflow/gates/GateSystem.ts`
- - `src/guardrails/Gateway.ts`
- - `src/artifact/fsm.ts`
-
- This is the primary moat. Prompt rules can be ignored; runtime gates can block progress.
-
- ### 3.2 Workflow State Machine
-
- The workflow is driven by artifact state, not by chat momentum.
-
- Strategic value:
-
- - prevents premature completion
- - forces phase-specific evidence
- - makes stalled or skipped phases visible
- - supports resume and handoff across long sessions
- - gives agent platforms a shared lifecycle model
-
- The FSM should remain strict at phase boundaries and flexible inside each phase.
-
- ### 3.3 Hook and Gateway Layer
-
- Hooks, pre-tool checks, post-tool checks, stop checks, and role-aware gateway decisions form the AI runtime interceptor layer.
-
- Strategic value:
-
- - agents do not receive raw, unlimited tool authority
- - unsafe operations can be blocked before execution
- - tool output can be converted into evidence
- - repeated failure patterns can be detected outside the model
-
- This layer makes SCALE closer to an admission controller than a prompt pack.
-
- ### 3.4 Evidence-Backed Delivery
-
- SCALE's strongest anti-hallucination capability is engineering hallucination control:
-
- - no test evidence means no verified claim
- - no runtime evidence means no product-smoke claim
- - no reviewed file scope means no governed ship
- - no dependency audit evidence means weaker security confidence
-
- This reduces fake completion more reliably than instruction text.
-
- It does not fully solve reasoning hallucination. Architecture decisions, root-cause analysis, and technical tradeoffs still need evaluator intelligence.
-
- ### 3.5 Adapter and Platform Surface
-
- The agent-platform adapters let SCALE act as a shared governance layer for different coding agents.
-
- Strategic value:
-
- - one governance model across Claude Code, Codex, Cursor, Gemini, Windsurf, Kiro, Cline, and related tools
- - fewer duplicated rule files
- - lower switching cost between agents
- - consistent evidence and workflow semantics
-
- Adapter expansion should not become the main roadmap by itself. The strategic value comes from shared governance semantics, not from the count of supported agents.
-
- ## 4. Honest Capability Assessment
-
- SCALE can already claim:
-
- - stronger engineering governance than prompt-only rules
- - structured workflow execution with phase and artifact state
- - hard verification gates for delivery claims
- - evidence-based runtime reporting
- - first-class supply-chain audit direction
- - growing adapter coverage
- - memory and skill orchestration foundations
-
- SCALE should not yet overclaim:
-
- - fully autonomous self-evolution
- - human-level long-term memory
- - guaranteed token reduction percentages
- - guaranteed hallucination reduction percentages
- - adaptive cognitive planning
- - universal skill routing intelligence
-
- Use target ranges only in roadmap or evaluation documents, not as product claims, until eval evidence supports them.
-
- ## 5. Current Gaps
-
- ### 5.1 Memory Architecture
-
- Current state is closer to engineering knowledge persistence than true cognitive memory.
-
- Existing strengths:
-
- - artifacts persist decisions and work state
- - memory brain stores evidence-backed learnings
- - failure replay can preserve incidents
- - provider routing gives the right extension point
-
- Missing layers:
-
- | Memory type | Target meaning |
- | --- | --- |
- | Working memory | short-lived task context with strict token budget |
- | Episodic memory | past task episodes, failures, fixes, and outcomes |
- | Semantic memory | stable project facts and domain concepts |
- | Procedural memory | reusable ways of doing work |
- | Strategy memory | learned routing, verification, and recovery strategies |
-
- The next memory work should focus on provider-backed retrieval quality, not more local file accumulation.
-
- ### 5.2 Context Compiler
-
- SCALE has context structure and budgets. It does not yet have a full context compiler.
-
- Current capability:
-
- - categorize context
- - budget context
- - lazy-load selected material
- - assemble role/task-specific packs
-
- Target capability:
-
- - rank relevance
- - slice semantically
- - compress adaptively
- - route retrieval by task intent
- - explain why each context item was included
- - measure token saved vs evidence lost
-
- This is the highest-leverage path for token reduction.
-
- ### 5.3 Adaptive Workflow
-
- The current workflow is mostly rule-driven.
-
- The target workflow should adapt based on:
-
- - task risk
- - code ownership boundaries
- - prior failure rate
- - changed-file blast radius
- - missing evidence
- - tool reliability
- - agent capability confidence
-
- The system should not make every task heavy. It should apply stricter gates when risk rises and keep small documentation or config changes lightweight.
-
- ### 5.4 Skill Routing Intelligence
-
- SCALE already models skills, MCP, CLI, browser, desktop automation, and evidence requirements.
-
- The missing layer is strategy:
-
- - when to call a skill
- - why that skill is preferred
- - what evidence it must produce
- - what to do when it fails
- - when to switch to MCP or CLI
- - when to avoid tool use entirely
-
- Skill routing should become a planned execution graph, not an ad hoc recommendation list.
-
- ### 5.5 Evaluator Intelligence
-
- Current gates are strong for engineering completion, but weaker for reasoning quality.
-
- Needed evaluator layers:
-
- - critique loop for architecture and root cause
- - uncertainty scoring
- - adversarial review on high-risk changes
- - tradeoff comparison
- - failure hypothesis ranking
- - "evidence is insufficient" verdicts
-
- This is the path to reducing reasoning hallucination rather than only delivery hallucination.
-
- ### 5.6 Self-Optimization Loop
-
- Evolution should mean more than summarizing lessons.
-
- The target loop:
-
- ```text
- failure evidence
- -> defect record
- -> root-cause classification
- -> lesson candidate
- -> rule candidate
- -> hook or gate proposal
- -> shadow validation
- -> regression check
- -> promoted governance behavior
- ```
-
- The promotion step must remain evidence-backed. Automatically generating rules without validation risks turning mistakes into permanent friction.
-
- ## 6. Roadmap Direction
-
- ### 6.1 Planning Principle
-
- The roadmap has release horizons plus a long-range vision:
-
- | Horizon | Purpose | Claim boundary |
- | --- | --- | --- |
- | 0.27.x baseline | establish the AI OS Runtime primitives and adoption path | "runtime baseline", not "complete AI OS" |
- | 0.28.0 closure | make planning, execution, verification, dashboard, benchmark, and adoption usable as a closed loop | "usable closed-loop beta", not "stable final OS" |
- | 0.29.0 intelligence | make memory, context, and skill routing measurably smarter | "intelligence beta", not proven long-term cognition |
- | 0.30.0 governance maturity | strengthen enterprise governance, upgrade, evaluator, and evolution controls | "governance maturity", not commercial stability |
- | 1.0.0 beta | integrate the loop into a public AI Engineering OS beta | "public beta", backed by demos and benchmark evidence |
- | Long-range vision | keep SCALE moving toward an AI Engineering OS with memory, context, governance, and tool intelligence | directional until backed by eval data |
-
- The near-term work should be aggressive, but public wording must stay precise. SCALE can ship beta capabilities quickly; it should only claim stable, industry-leading AI OS behavior after repeated project evidence, benchmarks, and upgrade validation.
-
- ### 6.2 0.27.0: Cognitive Runtime Layer
-
- Theme: make context, memory, and skill use more intelligent and explainable.
-
- Core work:
-
- | Module | Outcome |
- | --- | --- |
- | Context Compiler | relevance-ranked, budgeted, explainable context packs |
- | Memory Provider Runtime | gbrain, agentmemory, code memory, and local memory as provider choices |
- | Skill Routing Engine | task-intent routing with evidence requirements and fallback decisions |
- | Governance ROI | quantify token cost, evidence quality, and gate friction |
-
- Implemented baseline in v0.27.0:
-
- ```bash
- scale ai-os plan \
- --task-id TASK-123 \
- --task "Fix OAuth callback auth token handling and verify browser flow" \
- --level L \
- --files src/auth/oauth.ts,src/ui/callback.tsx \
- --budget 8000 \
- --json
- ```
-
- The command returns one runtime plan containing:
-
- - `governance`: progressive mode, risk signals, required behaviors
- - `context`: Context Compiler ranking, included sections, omitted sections, token savings
- - `memory`: provider order, selected providers, fallback status, recalled items, memory context pack
- - `skillPlan`: detected intents plus executable skill/artifact/verification steps
- - `adaptiveWorkflow`: risk-adaptive gates and exit criteria for the task
- - `roi`: benefit and overhead modules for context, memory, skill routing, and governance
-
- Exit criteria:
-
- - each context item has an inclusion reason: baseline implemented by `ContextCompiler`
- - memory recall has provider, score, and evidence source: baseline implemented by Memory Provider Router
- - skill recommendations include why, when, and required proof: baseline implemented by skill execution plans
- - context pack generation reports token budget and omissions: baseline implemented by `context.pack.compiler`
-
- ### 6.3 0.27.x: Runtime Baseline and Adoption Path
-
- Theme: make the AI OS Runtime installable, inspectable, and safe to adopt.
-
- Current landing status:
-
- - `scale ai-os plan` exists as the unified planning entry point for governance, context, memory, skill routing, adaptive workflow, and ROI.
- - `scale ai-os run --dry-run` exists as the first beta execution slice.
- - `scale ai-os run --mode guarded --verify "<command>"` executes explicit verification commands through the safe command runner, records each command as runtime evidence, and blocks the run when verification fails.
- - `scale ai-os status --lang zh|en` checks runtime directories, plan/run evidence, guarded verification, dashboard health, benchmark evidence, and adoption evidence in one closed-loop readiness report; when verification evidence is missing, it recommends concrete guarded verification commands from `.scale/verification.json` or `package.json`.
- - `scale ai-os dashboard` summarizes persisted run reports into ready/blocked counts, guarded verification health, pending evidence, failure learning candidates, and next recommendations.
- - `scale ai-os benchmark` runs fixed beta scenarios and reports context token use, estimated savings, memory recall, skill steps, governance modes, and the current dashboard health snapshot.
- - `scale ai-os migrate` creates or verifies the `.scale/ai-os` runtime directories and writes an idempotent migration report.
- - `scale ai-os adopt` runs migrate, the first dry-run, benchmark, and doctor as one adoption path, then writes `.scale/ai-os/adoption.json`.
- - `scale ai-os doctor --lang zh|en` checks AI OS runtime readiness without mutating the project and blocks adoption when required directories or dashboard health are broken.
- - `scale upgrade check/plan` includes AI OS readiness, so existing projects see adoption, migration, and doctor steps through the normal upgrade workflow.
- - The upgrade and adoption CLI surfaces now have human-facing Chinese and English output while preserving JSON for scripts, CI, and agent integrations.
-
- Boundary:
-
- - 0.27.x is the baseline. It proves the runtime surface and adoption path, but it does not yet prove autonomous source mutation, PR creation, long-term memory, or stable commercial AI OS behavior.
-
361
- ### 6.4 0.28.0: Usable Closed-Loop Enhancement
362
-
363
- Theme: turn `ai-os plan` into a runnable beta loop.
364
-
365
- Target timebox: 2-3 weeks.
366
-
367
- Core work:
368
-
369
- | Module | Outcome |
370
- | --- | --- |
371
- | `scale ai-os run` | execute the unified plan through workflow, context, memory, skill routing, and verification steps |
372
- | Runtime Status | show whether plan, run, verification, dashboard, benchmark, adoption, and doctor evidence exist for the project |
373
- | Verification Recommendation | derive suggested verification commands from task level, changed files, project verification profile, and risk signals |
374
- | Failure Learning Closure | convert failed guarded runs, gate failures, and missing evidence into reviewed lesson/rule candidates |
375
- | Closed-Loop Demo Pack | provide repeatable docs and code task demos that exercise plan -> run -> verify -> dashboard -> benchmark |
376
- | Memory Provider Bridge | keep gbrain, agentmemory, code memory, and local memory selectable through one provider contract |
377
- | Context Compiler v2 | merge task intent, risk level, files, memory recall, and role into one explainable context pack |
378
- | Skill Router v2 | create an execution graph for skills, MCP tools, CLIs, artifacts, and required evidence |
379
- | Adaptive Workflow Profiles | choose light, standard, or strict gates from risk and changed-file signals |
380
- | AI OS Dashboard CLI | summarize gate health, memory hits, context budget, skill evidence, and ROI |
381
- | Upgrade/Migration | migrate older `.scale` state and warn about incompatible local governance files |
382
- | AI OS Adoption and Doctor | keep one-command adoption and readiness checks aligned with the normal upgrade workflow |
383
- | Bilingual DX | keep key CLI help, errors, README guidance, and tutorials readable in Chinese and English |
384
- | Benchmark Pack | run fixed samples for token budget, recall, gate pass rate, and skill-routing evidence |
385
-
386
- Exit criteria:
387
-
388
- - `scale ai-os run` can complete at least one documentation task and one code task in dry-run or guarded execution mode
389
- - `scale ai-os status` or equivalent doctor output shows what is missing for a closed loop
390
- - verification recommendations are explainable and can be overridden by explicit `--verify` commands
391
- - execution output records context decisions, memory provider choices, skill decisions, gate results, and failure lessons
392
- - benchmark output compares context token budget against a full-load baseline
393
- - beta docs clearly state what is automated, what is proposed, and what still requires human approval
394
-
395
- Current implementation status:
396
-
397
- - In progress on the post-0.27.1 development branch.
398
- - Runtime baseline, status visibility, verification recommendation, adoption, doctor, dashboard, benchmark, migration, upgrade integration, and bilingual adoption guidance are already landed.
399
- - Remaining 0.28.0 work should focus on failure-learning closure and repeatable end-to-end demo evidence.
400
- - The closed-loop runtime does not yet create PRs or mutate source files; richer skill execution remains a later implementation slice unless explicitly approved.
401
-
402
- Explicitly deferred:
403
-
404
- - default automatic PR creation or merge without review
405
- - deep dynamic dependency sandboxing beyond audit, lockfile diff, and high-risk pattern checks
406
- - full VLM visual judgment beyond screenshot capture and interface placeholders
407
- - claims of human-level long-term memory or fully autonomous engineering
408
-
409
- ### 6.5 0.29.0: Memory, Context, and Skill Intelligence
410
-
411
- Theme: make the beta loop measurably smarter rather than only broader.
412
-
413
- Target timebox: 4-6 weeks.
414
-
415
- Core work:
416
-
417
- | Module | Outcome |
418
- | --- | --- |
419
- | Memory Quality Scoring | score recall precision, contradiction risk, accepted memory rate, and stale-memory risk |
420
- | Provider Fallback Policy | choose between gbrain, agentmemory, code memory, local memory, or no memory with an explicit reason |
421
- | Context Compression | summarize low-risk context while preserving high-risk evidence verbatim |
422
- | Skill Strategy Learning | learn preferred tools from successful evidence, failures, and user overrides |
423
- | Workflow Eval Integration | turn benchmark results into release-gate evidence |
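
A minimal sketch of the Provider Fallback Policy row, assuming a fixed preference order and a set of currently available providers. The order, the `ProviderChoice` shape, and the identifier spellings are hypothetical; the source only states that the choice must carry an explicit reason.

```typescript
// Hypothetical sketch: the preference order and type names are assumptions.
type MemoryProvider = "gbrain" | "agentmemory" | "code-memory" | "local-memory" | "none";

interface ProviderChoice {
  provider: MemoryProvider;
  reason: string; // every decision carries an explicit, human-readable reason
}

const PREFERENCE: MemoryProvider[] = ["gbrain", "agentmemory", "code-memory", "local-memory"];

function chooseMemoryProvider(available: ReadonlySet<MemoryProvider>): ProviderChoice {
  for (const provider of PREFERENCE) {
    if (available.has(provider)) {
      return { provider, reason: `${provider} is the highest-preference provider available` };
    }
  }
  // Falling back to no memory is itself a decision that must be explained.
  return { provider: "none", reason: "no configured memory provider is available" };
}
```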
424
-
425
- Current first slice:
426
-
427
- - `scale ai-os status --json` now includes an `intelligence` report with `memory-recall`, `context-savings`, `skill-routing`, and `benchmark-intelligence` signals; memory recall includes a quality score based on confidence, relevance, and evidence-backed items.
428
- - Context intelligence now reports `contextQuality` with omitted sections, total omitted tokens, compression risk, and evidence-loss warnings when runtime evidence is dropped by budget constraints.
429
- - Human `scale ai-os status --lang zh|en` output surfaces the same intelligence readiness summary so release reviewers can see whether 0.29.0 memory/context/skill gains are backed by run and benchmark evidence.
430
-
431
- Exit criteria:
432
-
433
- - memory recall has acceptance/rejection feedback
434
- - context packs show savings, omissions, and evidence-loss warnings
435
- - skill routing decisions can be compared against outcome quality
436
- - release notes include measured deltas instead of aspirational percentages
437
-
438
- ### 6.6 0.30.0: Enterprise Governance and Upgrade Maturity
439
-
440
- Theme: deepen adaptive governance beyond the v0.27.0 baseline.
441
-
442
- Target timebox: 6-10 weeks.
443
-
444
- Core work:
445
-
446
- | Module | Outcome |
447
- | --- | --- |
448
- | Adaptive Workflow Router | production policy controls for dynamic gate profiles beyond the v0.27.0 planning output |
449
- | Evaluator Intelligence | critique and uncertainty gates for architecture/root-cause work |
450
- | Tool Strategy Planner | cost, retry, fallback, and evidence graph for tools |
451
- | Evolution Shadow Promotion | lessons become rules only after validation |
452
-
453
- Current first slice:
454
-
455
- - `scale ai-os plan` now emits `evaluator` intelligence with strategy `evaluator-intelligence-v1`, required gates, risk level, uncertainty score, drivers, and recommendations.
456
- - Architecture, root-cause, security, and release-risk tasks can automatically add `architecture-critique`, `root-cause-review`, `security-threat-model`, `release-readiness-review`, and `uncertainty-decision-log` gates to the adaptive workflow.
457
- - `scale ai-os run` carries evaluator gates into the executable step list and evidence requirements, so reasoning-heavy work cannot be represented as a plain low-friction run.
458
- - `scale ai-os status` now includes an `evaluator-intelligence` signal plus evaluator gate count and average uncertainty for release and milestone review.
459
- - `scale ai-os plan` now emits `toolStrategy` with strategy `tool-strategy-v1`, converting skill, artifact, and verification steps into a cost, retry, fallback, side-effect, and evidence graph.
460
- - `scale ai-os status` now includes a `tool-strategy` signal plus tool step count, estimated cost units, high-risk step count, and fallback coverage.
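
The evaluator-gate behavior described in this list can be illustrated with a small sketch. The gate names come from the text above; the one-to-one mapping from task kind to gate, the `TaskKind` type, and the uncertainty threshold are assumptions made for illustration only.

```typescript
// Hypothetical sketch: the mapping and the threshold are assumptions; gate names are from the text.
type TaskKind = "architecture" | "root-cause" | "security" | "release-risk" | "routine";

const GATE_FOR_KIND: Partial<Record<TaskKind, string>> = {
  architecture: "architecture-critique",
  "root-cause": "root-cause-review",
  security: "security-threat-model",
  "release-risk": "release-readiness-review",
};

function evaluatorGates(kinds: TaskKind[], uncertainty: number, threshold = 0.5): string[] {
  const gates = new Set<string>();
  for (const kind of kinds) {
    const gate = GATE_FOR_KIND[kind];
    if (gate) gates.add(gate);
  }
  // Assumed rule: reasoning-heavy work with high uncertainty also needs a decision log.
  if (gates.size > 0 && uncertainty >= threshold) gates.add("uncertainty-decision-log");
  return Array.from(gates);
}
```

Under these assumptions a routine task yields no evaluator gates, so small tasks stay lightweight while reasoning-heavy ones escalate.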
461
-
462
- Exit criteria:
463
-
464
- - small tasks can stay lightweight with evidence
465
- - risky tasks escalate automatically
466
- - reasoning-heavy tasks get critique/evaluator gates
467
- - evolution proposals can be traced to failure evidence and validation results
468
-
469
- ### 6.7 1.0.0 Beta: AI Engineering OS
470
-
471
- Theme: integrate governance, memory, context, and tools into an operating layer.
472
-
473
- Target timebox: 8-12 weeks.
474
-
475
- Target capabilities:
476
-
477
- - unified agent workspace policy
478
- - provider-neutral memory and code intelligence
479
- - cross-agent execution ledger
480
- - adaptive workflow templates
481
- - measurable token and quality reports
482
- - ecosystem-safe skill and MCP lifecycle governance
483
-
484
- Release criteria:
485
-
486
- - install, upgrade, run, dashboard, benchmark, and migration flows work on clean projects
487
- - at least three representative project types have documented smoke results
488
- - failure learning produces reviewed rule candidates without silently hardening bad rules
489
- - bilingual docs explain the core workflow without requiring maintainer context
490
- - public claims are tied to `WORKFLOW_EVAL`, benchmark output, or release evidence
491
-
492
- ### 6.8 1.0.0 Stable and Long-Range Vision
493
-
494
- This is the strategic north star, not the 0.28.0 closed-loop promise.
495
-
496
- | Time horizon | Target state | Evidence required before public claim |
497
- | --- | --- | --- |
498
- | 8-12 weeks | AI Engineering OS beta: usable end-to-end loop across planning, execution, verification, memory, and dashboard | repeatable demo projects and benchmark reports |
499
- | 3-6 months | stable governance runtime: upgrades, adapters, memory providers, and eval gates are reliable in real repositories | release-to-release regression data and field reports |
500
- | 6-12 months | industry-leading agent engineering layer: adaptive workflows, strategy memory, tool intelligence, and cross-agent governance mature together | comparative evals, sustained issue closure, external adoption evidence |
501
-
502
- Long-range capability themes:
503
-
504
- - Cognitive memory: working, episodic, semantic, procedural, and strategy memory with explicit source and freshness controls.
505
- - Adaptive orchestration: workflows selected by risk, ownership, failure history, and tool reliability instead of one fixed path.
506
- - Tool intelligence: skills, MCP, CLIs, browser automation, and agent adapters treated as governed capabilities with cost, evidence, and fallback policy.
507
- - Evaluator intelligence: critique loops, uncertainty scoring, adversarial review, and evidence insufficiency verdicts for reasoning-heavy tasks.
508
- - Governance economics: token cost, gate friction, verification quality, and maintenance overhead measured as first-class product metrics.
509
- - Ecosystem governance: external skills, memory providers, adapters, and templates integrated through attribution, license, source pinning, and supply-chain checks.
510
-
511
- Non-negotiable boundary:
512
-
513
- > The long-range vision can guide architecture, but it must not be used as a release claim until the corresponding evidence exists.
514
-
515
- ## 7. Measurement Plan
516
-
517
- Strategic claims must be tied to measurement.
518
-
519
- | Claim | Required metric |
520
- | --- | --- |
521
- | Fewer fake completions | final-check failure rate before/after gates |
522
- | Fewer skipped steps | FSM blocked transition count and successful recovery rate |
523
- | Fewer blind retries | repeated-command detector hits and fix iteration count |
524
- | Lower token use | context pack token count vs baseline full-context load |
525
- | Better memory | recall precision, accepted memory rate, contradiction count |
526
- | Better skill use | recommended skill acceptance rate and evidence completion rate |
527
- | Better workflow quality | pass@1, average fix iterations, failure replay closure rate |
528
- | Safer dependencies | dependency audit block count and reviewed baseline count |
529
-
530
- Target ranges can be tracked internally, but public claims should use measured values from `WORKFLOW_EVAL`, runtime evidence, or release reports.
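
As one worked example of the "Lower token use" row, the savings ratio can be computed from paired token counts. This is a sketch under assumed names (`TokenSample`, `contextSavings`); how `WORKFLOW_EVAL` actually records these values is not specified here.

```typescript
// Hypothetical sketch: the sample shape is an assumption, not the WORKFLOW_EVAL schema.
interface TokenSample {
  contextPackTokens: number; // tokens in the compiled context pack
  fullLoadTokens: number;    // tokens when the full context is loaded (baseline)
}

// Returns the fraction of baseline tokens saved across all samples (0.75 means 75% fewer tokens).
function contextSavings(samples: TokenSample[]): number {
  const pack = samples.reduce((sum, s) => sum + s.contextPackTokens, 0);
  const baseline = samples.reduce((sum, s) => sum + s.fullLoadTokens, 0);
  if (baseline === 0) throw new Error("baseline token count must be positive");
  return 1 - pack / baseline;
}
```

Aggregating totals before dividing keeps large tasks from being outweighed by many small ones, which is one reasonable choice among several.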
531
-
532
- ## 8. Messaging Rules
533
-
534
- Use:
535
-
536
- - "Agent Governance Runtime"
537
- - "AI Engineering OS direction"
538
- - "runtime constraints instead of prompt-only discipline"
539
- - "evidence-backed workflow gates"
540
- - "provider-based memory and context orchestration"
541
-
542
- Avoid:
543
-
544
- - "fully autonomous engineer"
545
- - "guaranteed 90% AI coding rate"
546
- - "eliminates hallucination"
547
- - "zero human governance"
548
- - "universal memory"
549
- - "all tools are safe by default"
550
-
551
- The product message should be ambitious, but the engineering message must stay falsifiable.
552
-
553
- ## 9. Non-Goals
554
-
555
- SCALE should not try to own every layer.
556
-
557
- Non-goals:
558
-
559
- - replacing all agent platforms
560
- - building a full IDE
561
- - becoming a generic automation shell
562
- - implementing every memory backend internally
563
- - copying external skills without attribution
564
- - turning every task into heavyweight enterprise ceremony
565
-
566
- The correct posture is:
567
-
568
- > Govern agent engineering work, integrate external capability providers, and require evidence at the boundaries.
569
-
570
- ## 10. Documentation Placement
571
-
572
- Recommended documentation split:
573
-
574
- | Surface | Content |
575
- | --- | --- |
576
- | `README.md` / `README.en.md` | concise positioning, installation, core value, current capabilities |
577
- | `docs/AI_ENGINEERING_OS_POSITIONING.md` | strategic category, gaps, roadmap, messaging rules |
578
- | `docs/CONTEXT_BUDGET.md` | context budget and compiler mechanics |
579
- | `docs/MEMORY_BRAIN.md` / `docs/MEMORY_FABRIC.md` | memory provider and recall behavior |
580
- | `docs/SKILL_RADAR.md` / `docs/TOOL_ORCHESTRATION.md` | skill and tool routing behavior |
581
- | `docs/WORKFLOW_EVAL.md` | measurable evidence and improvement claims |
582
-
583
- The README should not absorb this whole strategy. It should link here and keep the first screen user-focused.
584
-
585
- ## 11. Strategic Summary
586
-
587
- SCALE's strongest current differentiator is not more prompts. It is a runtime governance model for AI engineering:
588
-
589
- ```text
590
- Agent intent
591
- -> governed workflow state
592
- -> scoped context
593
- -> role/tool policy
594
- -> evidence-producing execution
595
- -> verification gates
596
- -> memory and evolution feedback
597
- ```
598
-
599
- The next stage is to make this runtime more cognitive:
600
-
601
- - compile context, do not just load it
602
- - route memory, do not just store it
603
- - plan skill use, do not just recommend it
604
- - adapt workflow, do not just enforce one path
605
- - validate evolution, do not just summarize lessons
606
-
607
- If these are implemented with measurable evidence, SCALE can credibly move from "AI workflow engine" to "AI Engineering OS".
@@ -1,62 +0,0 @@
1
- # Background Hunter
2
-
3
- Background Hunter is the read-only proactive scan layer for SCALE Engine V2.
4
- It turns existing governance signals into an actionable hunt queue without editing application code.
5
-
6
- ## Boundary
7
-
8
- Default behavior is intentionally conservative:
9
-
10
- - scan only, no automatic code changes
11
- - no automatic LLM repair
12
- - no automatic commit or pull request
13
- - no release bypass
14
- - ignore decisions are explicit and written to `.scale/hunt/ignored-findings.json`
15
-
16
- The hunter reuses existing checks instead of creating a second rule system. The first implementation consumes:
17
-
18
- - `EngineeringStandards`
19
- - `ReviewAnalyzer` when status and diff input are provided by callers
20
-
21
- ## Commands
22
-
23
- ```bash
24
- scale hunt scan
25
- scale hunt scan --json
26
- scale hunt report
27
- scale hunt diagnose <finding-id>
28
- scale hunt ignore <finding-id> --reason "Accepted legacy debt tracked elsewhere"
29
- ```
30
-
31
- `hunt scan` and `hunt report` do not modify source files. They classify findings as `open` or `ignored`.
32
-
33
- `hunt diagnose <finding-id>` creates a normal `DiagnosticLoop` from the finding. This keeps the debugging workflow evidence-first:
34
-
35
- - reproducible command
36
- - expected failure
37
- - changed files
38
- - verification commands
39
- - hypotheses and cleanup checklist
40
-
41
- `hunt ignore` records the finding id and its stable fingerprint. The finding stays visible in the report as `ignored` but is removed from the open queue.
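
The open/ignored split can be sketched as a pure function over the scan results and the recorded ignore entries. The `IgnoredEntry` shape is an assumption; the actual schema of `.scale/hunt/ignored-findings.json` is not shown in this document.

```typescript
// Hypothetical sketch: entry shapes are assumptions, not the real ignored-findings.json schema.
interface Finding {
  id: string;
  fingerprint: string;
}

interface IgnoredEntry {
  id: string;
  fingerprint: string;
  reason: string;
}

type Classified = Finding & { status: "open" | "ignored" };

// Every finding stays in the report; only its status changes.
function classifyFindings(findings: Finding[], ignored: IgnoredEntry[]): Classified[] {
  const ignoredFingerprints = new Set(ignored.map((entry) => entry.fingerprint));
  return findings.map(
    (finding): Classified => ({
      ...finding,
      status: ignoredFingerprints.has(finding.fingerprint) ? "ignored" : "open",
    }),
  );
}

// The open queue is the report minus ignored findings.
function openQueue(report: Classified[]): Classified[] {
  return report.filter((finding) => finding.status === "open");
}
```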
42
-
43
- ## Finding Identity
44
-
45
- Every finding gets:
46
-
47
- - `id`: short, deterministic id derived from a SHA-256 hash of the fingerprint
48
- - `fingerprint`: stable source/rule/path/line/message tuple
49
- - `source`: currently `engineering-standards` or `review-analyzer`
50
- - `diagnosticInput`: ready-to-use `DiagnosticLoopInput`
51
-
52
- This allows repeated scans to avoid noisy duplicates and lets teams explicitly accept or defer known debt.
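
A minimal sketch of how such an identity could be derived, using Node's built-in `crypto` module. The JSON serialization of the tuple and the 12-character id length are assumptions; the document only says the id is short, deterministic, and based on SHA-256.

```typescript
import { createHash } from "node:crypto";

// Hypothetical sketch: field names, serialization, and id length are assumptions.
interface FingerprintParts {
  source: string;
  rule: string;
  path: string;
  line: number;
  message: string;
}

// Serialize the tuple in a fixed order so identical findings always produce the same string.
function fingerprintOf(parts: FingerprintParts): string {
  return JSON.stringify([parts.source, parts.rule, parts.path, parts.line, parts.message]);
}

// Hash the fingerprint and keep a short hex prefix as the finding id.
function findingId(fingerprint: string, length = 12): string {
  return createHash("sha256").update(fingerprint).digest("hex").slice(0, length);
}
```

Because both steps are pure functions of the finding's content, rescanning an unchanged file yields the same id, which is what lets an ignore decision persist across scans.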
53
-
54
- ## Recommended Flow
55
-
56
- 1. Run `scale hunt scan --json`.
57
- 2. Triage open findings.
58
- 3. For real issues, run `scale hunt diagnose <finding-id> --json`.
59
- 4. Fix through the normal plan/TDD/verify workflow.
60
- 5. For accepted legacy debt, run `scale hunt ignore <finding-id> --reason "..."`.
61
-
62
- Do not promote Background Hunter to automatic repair until the project has enough evidence that its findings are stable and low-noise.