@hongmaple0820/scale-engine 0.27.0 → 0.28.0
- package/README.en.md +74 -3
- package/README.md +75 -3
- package/dist/api/cli.js +376 -7
- package/dist/api/cli.js.map +1 -1
- package/dist/runtime/AiOsRuntime.d.ts +291 -0
- package/dist/runtime/AiOsRuntime.js +1031 -1
- package/dist/runtime/AiOsRuntime.js.map +1 -1
- package/dist/workflow/GovernanceTemplatePacks.js +10 -3
- package/dist/workflow/GovernanceTemplatePacks.js.map +1 -1
- package/dist/workflow/UpgradeManager.d.ts +4 -1
- package/dist/workflow/UpgradeManager.js +35 -0
- package/dist/workflow/UpgradeManager.js.map +1 -1
- package/docs/AI_ENGINEERING_OS_POSITIONING.md +141 -11
- package/docs/README.md +1 -1
- package/docs/start/README.md +2 -1
- package/docs/start/quickstart.md +6 -0
- package/docs/start/workflow-upgrade.md +17 -0
- package/docs/workflow/README.md +2 -1
- package/package.json +1 -1
@@ -280,18 +280,20 @@ The promotion step must remain evidence-backed. Automatically generating rules w
 
 ## 6. Roadmap Direction
 
-### 6.1
+### 6.1 Planning Principle
 
-
+The roadmap has release horizons plus a long-range vision:
 
-
-
-
-
-
-
+| Horizon | Purpose | Claim boundary |
+| --- | --- | --- |
+| 0.27.x baseline | establish the AI OS Runtime primitives and adoption path | "runtime baseline", not "complete AI OS" |
+| 0.28.0 closure | make planning, execution, verification, dashboard, benchmark, and adoption usable as a closed loop | "usable closed-loop beta", not "stable final OS" |
+| 0.29.0 intelligence | make memory, context, and skill routing measurably smarter | "intelligence beta", not proven long-term cognition |
+| 0.30.0 governance maturity | strengthen enterprise governance, upgrade, evaluator, and evolution controls | "governance maturity", not commercial stability |
+| 1.0.0 beta | integrate the loop into a public AI Engineering OS beta | "public beta", backed by demos and benchmark evidence |
+| Long-range vision | keep SCALE moving toward an AI Engineering OS with memory, context, governance, and tool intelligence | directional until backed by eval data |
 
-
+The near-term work should be aggressive, but public wording must stay precise. SCALE can ship beta capabilities quickly; it should only claim stable, industry-leading AI OS behavior after repeated project evidence, benchmarks, and upgrade validation.
 
 ### 6.2 0.27.0: Cognitive Runtime Layer
 
@@ -334,10 +336,105 @@ Exit criteria:
 - skill recommendations include why, when, and required proof: baseline implemented by skill execution plans
 - context pack generation reports token budget and omissions: baseline implemented by `context.pack.compiler`
 
-### 6.3 0.
+### 6.3 0.27.x: Runtime Baseline and Adoption Path
+
+Theme: make the AI OS Runtime installable, inspectable, and safe to adopt.
+
+Current landing status:
+
+- `scale ai-os plan` exists as the unified planning entry point for governance, context, memory, skill routing, adaptive workflow, and ROI.
+- `scale ai-os run --dry-run` exists as the first beta execution slice.
+- `scale ai-os run --mode guarded --verify "<command>"` executes explicit verification commands through the safe command runner, records each command as runtime evidence, and blocks the run when verification fails.
+- `scale ai-os status --lang zh|en` checks runtime directories, plan/run evidence, guarded verification, dashboard health, benchmark evidence, and adoption evidence in one closed-loop readiness report; when verification evidence is missing, it recommends concrete guarded verification commands from `.scale/verification.json` or `package.json`.
+- `scale ai-os dashboard` summarizes persisted run reports into ready/blocked counts, guarded verification health, pending evidence, failure learning candidates, and next recommendations.
+- `scale ai-os benchmark` runs fixed beta scenarios and reports context token use, estimated savings, memory recall, skill steps, governance modes, and the current dashboard health snapshot.
+- `scale ai-os migrate` creates or verifies the `.scale/ai-os` runtime directories and writes an idempotent migration report.
+- `scale ai-os adopt` runs migrate, the first dry-run, benchmark, and doctor as one adoption path, then writes `.scale/ai-os/adoption.json`.
+- `scale ai-os doctor --lang zh|en` checks AI OS runtime readiness without mutating the project and blocks adoption when required directories or dashboard health are broken.
+- `scale upgrade check/plan` includes AI OS readiness, so existing projects see adoption, migration, and doctor steps through the normal upgrade workflow.
+- The upgrade and adoption CLI surfaces now have human-facing Chinese and English output while preserving JSON for scripts, CI, and agent integrations.
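
The guarded verification behavior listed above (run explicit commands, record each one as evidence, block the run on failure) can be sketched roughly as follows. This is an illustration under stated assumptions, not SCALE's implementation: `run_guarded`, `GuardedRun`, and the evidence fields are hypothetical names, and plain `subprocess.run` stands in for the safe command runner.

```python
import subprocess
import sys
from dataclasses import dataclass, field


@dataclass
class GuardedRun:
    """Hypothetical record of one guarded run (not SCALE's real schema)."""
    evidence: list = field(default_factory=list)
    blocked: bool = False


def run_guarded(verify_commands):
    """Run each verification command (an argv list), record it as evidence,
    and mark the run as blocked if any command exits non-zero."""
    run = GuardedRun()
    for argv in verify_commands:
        # No shell is involved; the real safe command runner is assumed to
        # apply stricter rules than this sketch does.
        result = subprocess.run(argv, capture_output=True, text=True)
        run.evidence.append({"command": " ".join(argv), "exit_code": result.returncode})
        if result.returncode != 0:
            run.blocked = True
    return run


if __name__ == "__main__":
    passing = [sys.executable, "-c", "raise SystemExit(0)"]
    failing = [sys.executable, "-c", "raise SystemExit(3)"]
    outcome = run_guarded([passing, failing])
    print(outcome.blocked, [e["exit_code"] for e in outcome.evidence])
```

The sketch keeps running after a failure so that every command still leaves an evidence record; whether the real runner stops early is not stated in the source.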
+
+Boundary:
+
+- 0.27.x is the baseline. It proves the runtime surface and adoption path, but it does not yet prove autonomous source mutation, PR creation, long-term memory, or stable commercial AI OS behavior.
+
+### 6.4 0.28.0: Usable Closed-Loop Enhancement
+
+Theme: turn `ai-os plan` into a runnable beta loop.
+
+Target timebox: 2-3 weeks.
+
+Core work:
+
+| Module | Outcome |
+| --- | --- |
+| `scale ai-os run` | execute the unified plan through workflow, context, memory, skill routing, and verification steps |
+| Runtime Status | show whether plan, run, verification, dashboard, benchmark, adoption, and doctor evidence exist for the project |
+| Verification Recommendation | derive suggested verification commands from task level, changed files, project verification profile, and risk signals |
+| Failure Learning Closure | convert failed guarded runs, gate failures, and missing evidence into reviewed lesson/rule candidates |
+| Closed-Loop Demo Pack | provide repeatable docs and code task demos that exercise plan -> run -> verify -> dashboard -> benchmark |
+| Memory Provider Bridge | keep gbrain, agentmemory, code memory, and local memory selectable through one provider contract |
+| Context Compiler v2 | merge task intent, risk level, files, memory recall, and role into one explainable context pack |
+| Skill Router v2 | create an execution graph for skills, MCP tools, CLIs, artifacts, and required evidence |
+| Adaptive Workflow Profiles | choose light, standard, or strict gates from risk and changed-file signals |
+| AI OS Dashboard CLI | summarize gate health, memory hits, context budget, skill evidence, and ROI |
+| Upgrade/Migration | migrate older `.scale` state and warn about incompatible local governance files |
+| AI OS Adoption and Doctor | keep one-command adoption and readiness checks aligned with the normal upgrade workflow |
+| Bilingual DX | keep key CLI help, errors, README guidance, and tutorials readable in Chinese and English |
+| Benchmark Pack | run fixed samples for token budget, recall, gate pass rate, and skill-routing evidence |
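
As one way to picture the Adaptive Workflow Profiles row, the sketch below picks a light, standard, or strict gate profile from a risk level and the changed-file list. The specific rules (docs-only changes count as light, `.scale/` or manifest changes force strict) and the function name `choose_gate_profile` are assumptions made for illustration; the source does not specify the actual signals or thresholds.

```python
def choose_gate_profile(risk_level: str, changed_files: list[str]) -> str:
    """Pick 'light', 'standard', or 'strict' gates (illustrative rules only)."""
    # Hypothetical signal: paths that suggest governance or dependency impact.
    sensitive = any(
        path.startswith(".scale/") or path.endswith(("package.json", ".lock"))
        for path in changed_files
    )
    # Hypothetical signal: a non-empty change set that touches only Markdown.
    docs_only = bool(changed_files) and all(path.endswith(".md") for path in changed_files)

    if risk_level == "high" or sensitive:
        return "strict"
    if risk_level == "low" and docs_only:
        return "light"
    return "standard"


if __name__ == "__main__":
    print(choose_gate_profile("low", ["README.md"]))
    print(choose_gate_profile("medium", ["src/app.ts"]))
    print(choose_gate_profile("low", ["package.json"]))
```

Defaulting to `standard` when no rule matches keeps an unknown change from silently receiving the lightest gates.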
+
+Exit criteria:
+
+- `scale ai-os run` can complete at least one documentation task and one code task in dry-run or guarded execution mode
+- `scale ai-os status` or equivalent doctor output shows what is missing for a closed loop
+- verification recommendations are explainable and can be overridden by explicit `--verify` commands
+- execution output records context decisions, memory provider choices, skill decisions, gate results, and failure lessons
+- benchmark output compares context token budget against a full-load baseline
+- beta docs clearly state what is automated, what is proposed, and what still requires human approval
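
The benchmark criterion above, comparing a context pack's token budget against a full-load baseline, comes down to simple arithmetic. The following sketch shows it with hypothetical field names; the real benchmark report format is not given in the source.

```python
def context_savings(pack_tokens: int, full_load_tokens: int) -> dict:
    """Compare a context pack against loading everything (the full-load baseline)."""
    if full_load_tokens <= 0:
        raise ValueError("full-load baseline must be positive")
    saved = full_load_tokens - pack_tokens
    return {
        "pack_tokens": pack_tokens,
        "full_load_tokens": full_load_tokens,
        "saved_tokens": saved,
        # Fraction of the baseline that the pack avoids loading.
        "saved_ratio": round(saved / full_load_tokens, 3),
    }


if __name__ == "__main__":
    print(context_savings(pack_tokens=3000, full_load_tokens=12000))
```

For a 3,000-token pack against a 12,000-token baseline this reports 9,000 saved tokens, a ratio of 0.75.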
+
+Current implementation status:
+
+- In progress on the post-0.27.1 development branch.
+- Runtime baseline, status visibility, verification recommendation, adoption, doctor, dashboard, benchmark, migration, upgrade integration, and bilingual adoption guidance are already landed.
+- Remaining 0.28.0 work should focus on failure-learning closure and repeatable end-to-end demo evidence.
+- It does not yet create PRs or mutate source files; richer skill execution remains a later implementation slice unless explicitly approved.
+
+Explicitly deferred:
+
+- default automatic PR creation or merge without review
+- deep dynamic dependency sandboxing beyond audit, lockfile diff, and high-risk pattern checks
+- full VLM visual judgment beyond screenshot capture and interface placeholders
+- claims of human-level long-term memory or fully autonomous engineering
+
+### 6.5 0.29.0: Memory, Context, and Skill Intelligence
+
+Theme: make the beta loop measurably smarter rather than only broader.
+
+Target timebox: 4-6 weeks.
+
+Core work:
+
+| Module | Outcome |
+| --- | --- |
+| Memory Quality Scoring | score recall precision, contradiction risk, accepted memory rate, and stale-memory risk |
+| Provider Fallback Policy | choose between gbrain, agentmemory, code memory, local memory, or no memory with an explicit reason |
+| Context Compression | summarize low-risk context while preserving high-risk evidence verbatim |
+| Skill Strategy Learning | learn preferred tools from successful evidence, failures, and user overrides |
+| Workflow Eval Integration | turn benchmark results into release-gate evidence |
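
The Provider Fallback Policy row asks for a choice among providers, or no memory at all, with an explicit reason attached. A minimal sketch of that idea follows. The preference order simply follows the order in which the table lists the providers, which is an assumption, and `choose_memory_provider` is a hypothetical name rather than part of the package.

```python
# Assumed preference order; the source lists these providers but does not rank them.
PROVIDER_ORDER = ["gbrain", "agentmemory", "code memory", "local memory"]


def choose_memory_provider(available, order=PROVIDER_ORDER):
    """Return (provider or None, reason) so the choice is always explainable."""
    skipped = []
    for name in order:
        if name in available:
            reason = f"selected {name}"
            if skipped:
                reason += f" because these were unavailable: {', '.join(skipped)}"
            return name, reason
        skipped.append(name)
    return None, "no configured provider is available; continuing without memory"


if __name__ == "__main__":
    print(choose_memory_provider({"code memory", "local memory"}))
    print(choose_memory_provider(set()))
```

Returning the reason alongside the provider means the decision can be written into run evidence instead of being inferred later.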
+
+Exit criteria:
+
+- memory recall has acceptance/rejection feedback
+- context packs show savings, omissions, and evidence-loss warnings
+- skill routing decisions can be compared against outcome quality
+- release notes include measured deltas instead of aspirational percentages
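
To make the compression rule concrete (low-risk context may be shortened, high-risk evidence stays verbatim, and omissions are reported), here is a small sketch. Truncation stands in for a real summarizer, and the item shape (`risk`, `text`) and the `compress_context` name are assumptions for illustration only.

```python
def compress_context(items, max_chars=80):
    """Shorten long low-risk items, keep high-risk items verbatim, report omissions.

    `items` is a list of dicts with hypothetical keys 'risk' ('low' or 'high')
    and 'text'.
    """
    pack = []
    omitted_chars = 0
    for item in items:
        text = item["text"]
        if item["risk"] == "high" or len(text) <= max_chars:
            # High-risk evidence is never altered, regardless of length.
            pack.append({"risk": item["risk"], "text": text, "summarized": False})
        else:
            # Truncation is a placeholder for an actual summarization step.
            pack.append({"risk": item["risk"], "text": text[:max_chars] + "...", "summarized": True})
            omitted_chars += len(text) - max_chars
    return {"pack": pack, "omitted_chars": omitted_chars}


if __name__ == "__main__":
    result = compress_context([
        {"risk": "high", "text": "x" * 200},
        {"risk": "low", "text": "y" * 200},
    ])
    print(result["omitted_chars"], [entry["summarized"] for entry in result["pack"]])
```

Reporting `omitted_chars` next to the pack is what would let a context pack show its savings and omissions rather than hide them.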
+
+### 6.6 0.30.0: Enterprise Governance and Upgrade Maturity
 
 Theme: deepen adaptive governance beyond the v0.27.0 baseline.
 
+Target timebox: 6-10 weeks.
+
 Core work:
 
 | Module | Outcome |
@@ -354,10 +451,12 @@ Exit criteria:
 - reasoning-heavy tasks get critique/evaluator gates
 - evolution proposals can be traced to failure evidence and validation results
 
-### 6.
+### 6.7 1.0.0 Beta: AI Engineering OS
 
 Theme: integrate governance, memory, context, and tools into an operating layer.
 
+Target timebox: 8-12 weeks.
+
 Target capabilities:
 
 - unified agent workspace policy
@@ -367,6 +466,37 @@ Target capabilities:
 - measurable token and quality reports
 - ecosystem-safe skill and MCP lifecycle governance
 
+Release criteria:
+
+- install, upgrade, run, dashboard, benchmark, and migration flows work on clean projects
+- at least three representative project types have documented smoke results
+- failure learning produces reviewed rule candidates without silently hardening bad rules
+- bilingual docs explain the core workflow without requiring maintainer context
+- public claims are tied to `WORKFLOW_EVAL`, benchmark output, or release evidence
+
+### 6.8 1.0.0 Stable and Long-Range Vision
+
+This is the strategic north star, not the 0.28.0 closed-loop promise.
+
+| Time horizon | Target state | Evidence required before public claim |
+| --- | --- | --- |
+| 8-12 weeks | AI Engineering OS beta: usable end-to-end loop across planning, execution, verification, memory, and dashboard | repeatable demo projects and benchmark reports |
+| 3-6 months | stable governance runtime: upgrades, adapters, memory providers, and eval gates are reliable in real repositories | release-to-release regression data and field reports |
+| 6-12 months | industry-leading agent engineering layer: adaptive workflows, strategy memory, tool intelligence, and cross-agent governance mature together | comparative evals, sustained issue closure, external adoption evidence |
+
+Long-range capability themes:
+
+- Cognitive memory: working, episodic, semantic, procedural, and strategy memory with explicit source and freshness controls.
+- Adaptive orchestration: workflows selected by risk, ownership, failure history, and tool reliability instead of one fixed path.
+- Tool intelligence: skills, MCP, CLIs, browser automation, and agent adapters treated as governed capabilities with cost, evidence, and fallback policy.
+- Evaluator intelligence: critique loops, uncertainty scoring, adversarial review, and evidence insufficiency verdicts for reasoning-heavy tasks.
+- Governance economics: token cost, gate friction, verification quality, and maintenance overhead measured as first-class product metrics.
+- Ecosystem governance: external skills, memory providers, adapters, and templates integrated through attribution, license, source pinning, and supply-chain checks.
+
+Non-negotiable boundary:
+
+> The long-range vision can guide architecture, but it must not be used as a release claim until the corresponding evidence exists.
+
 ## 7. Measurement Plan
 
 Strategic claims must be tied to measurement.
package/docs/README.md
CHANGED
@@ -36,7 +36,7 @@
 | [CODE_INTELLIGENCE.md](CODE_INTELLIGENCE.md) | Code intelligence and exploration ROI with CodeGraph, Graphify, and explicit fallback |
 | [WORKFLOW_EVAL.md](WORKFLOW_EVAL.md) | Workflow Eval, pass@k metrics, Failure Replay, and improvement candidates |
 | [SKILL_RADAR.md](SKILL_RADAR.md) | Skill Radar, capability confidence, evidence requirements, and supply-chain security checks |
-| [AI_ENGINEERING_OS_POSITIONING.md](AI_ENGINEERING_OS_POSITIONING.md) | Agent Governance Runtime / AI Engineering OS
+| [AI_ENGINEERING_OS_POSITIONING.md](AI_ENGINEERING_OS_POSITIONING.md) | Agent Governance Runtime / AI Engineering OS direction, the `scale ai-os plan/run/status/dashboard/benchmark/migrate/adopt/doctor` runtime entry points, the `0.28.0` usable closed-loop enhancement, and the 3-12 month long-range roadmap |
 | [THIRD_PARTY_SKILLS.md](THIRD_PARTY_SKILLS.md) | Third-party skill acknowledgments, licensing boundaries, citation practice, and vendoring policy |
 | [EXTERNAL_REFERENCES.md](EXTERNAL_REFERENCES.md) | Complete list of references to external projects, skills, MCP, CLIs, and adapters |
 | [UPGRADE_MANAGEMENT.md](UPGRADE_MANAGEMENT.md) | Safe upgrade process for the SCALE CLI, governance packs, skills, MCP, and CLI tools |
package/docs/start/README.md
CHANGED
@@ -71,8 +71,9 @@ scale status
 | Multi-repository/MOE workspace | `scale init --governance-pack moe-workspace` |
 | Docs, reports, screenshots, and scripts are disorganized | `scale init --governance-pack resource-governance` |
 | A workflow or third-party capability needs upgrading | `scale upgrade check --lang zh && scale upgrade plan --html --lang zh` |
+| Adopting the AI OS runtime in an existing project | `scale ai-os adopt --task "接入 AI OS runtime" --lang zh` |
 
 
 ## Workflow Upgrade Short Path
 
-For existing projects, start with the [SCALE Workflow Upgrade Guide](workflow-upgrade.md). It covers `scale init --interactive`, `scale upgrade check/plan/apply/rollback`, bilingual `--lang zh/en` output, the repository-local `make workflow-upgrade-*` entry points, and the boundary between generated-file updates and project-level verification.
+For existing projects, start with the [SCALE Workflow Upgrade Guide](workflow-upgrade.md). It covers `scale init --interactive`, `scale upgrade check/plan/apply/rollback`, `scale ai-os adopt`, bilingual `--lang zh/en` output, the repository-local `make workflow-upgrade-*` / `make workflow-aios-adopt` entry points, and the boundary between generated-file updates and project-level verification.
package/docs/start/quickstart.md
CHANGED
@@ -131,6 +131,12 @@ scale upgrade plan --dir . --html --lang zh
 scale upgrade apply --dir . --confirm --lang zh
 ```
 
+If the upgrade plan reports that the AI OS runtime has not been adopted yet, use the one-step adoption command to generate the runtime directories, the first dry-run, the benchmark, and the doctor report:
+
+```bash
+scale ai-os adopt --dir . --task "接入 AI OS runtime" --lang zh
+```
+
 For English output, replace `--lang zh` with `--lang en`. Clean SCALE-managed files can be refreshed automatically; files with local changes go to manual review and are not overwritten automatically.
 
 Continue with the [Official Demo Walkthrough](agent-governance-demo.md) to see how a real task moves from requirement to verification evidence.
@@ -60,6 +60,18 @@ scale upgrade apply --dir . --confirm
 scale preflight --dir . --service all --preflight-profile quick
 ```
 
+If the upgrade plan reports that the AI OS runtime has not been adopted yet, prefer the one-step adoption command. It creates the runtime directories, generates the first `dry-run` run report, writes the benchmark, and uses doctor to recheck readiness:
+
+```bash
+scale ai-os adopt \
+  --dir . \
+  --task "接入 AI OS runtime 并生成首份治理证据" \
+  --files "README.md,AGENTS.md" \
+  --lang zh
+```
+
+When adoption completes, `.scale/ai-os/adoption.json` is written. For subsequent real tasks, use `scale ai-os run --mode guarded` to generate governed execution evidence.
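
The adoption path described above (migrate, first dry-run, benchmark, doctor, then a written `adoption.json`) can be pictured with the sketch below. Only the step order and the report path come from the text; the report fields, the stop-on-first-failure behavior, and the `adopt` / `run_step` names are assumptions, and the real `adoption.json` schema is not documented here.

```python
import json
from pathlib import Path

# Step order as described in the text; everything else is illustrative.
ADOPTION_STEPS = ["migrate", "dry-run", "benchmark", "doctor"]


def adopt(project_dir, run_step):
    """Run the adoption steps in order and write a report.

    `run_step(name) -> bool` is a caller-supplied stand-in for each real step.
    """
    results = []
    for name in ADOPTION_STEPS:
        ok = bool(run_step(name))
        results.append({"step": name, "ok": ok})
        if not ok:
            break  # assumption: later steps are skipped after a failure
    report = {
        "ready": len(results) == len(ADOPTION_STEPS) and all(r["ok"] for r in results),
        "steps": results,
    }
    report_path = Path(project_dir) / ".scale" / "ai-os" / "adoption.json"
    report_path.parent.mkdir(parents=True, exist_ok=True)
    report_path.write_text(json.dumps(report, indent=2), encoding="utf-8")
    return report


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        print(adopt(tmp, lambda name: True)["ready"])
```

Writing the report even when a step fails leaves a record of where adoption stopped.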
+
 Output defaults to Chinese. Add `--lang en` when you need English command hints or an English HTML plan:
 
 ```bash
@@ -67,12 +79,15 @@ scale upgrade check --dir . --lang en
 scale upgrade plan --dir . --html --lang en
 ```
 
+Human-facing upgrade output generates the next-step commands in the current language; for example, the Chinese flow recommends `scale ai-os adopt --task "接入 AI OS runtime" --lang zh`. Use `--json` only when scripts, CI, or agent integrations need a stable structure.
+
 If the repository already has local wrappers, prefer them, because they encode the project defaults:
 
 ```bash
 make workflow-upgrade-check
 make workflow-upgrade-plan
 make workflow-upgrade-apply
+make workflow-aios-adopt
 make workflow-upgrade-verify
 ```
 
@@ -160,6 +175,8 @@ workflow-upgrade-rollback:
 	scale upgrade rollback --dir . --lang zh
 workflow-upgrade-verify:
 	scale preflight --dir . --service all --preflight-profile quick
+workflow-aios-adopt:
+	scale ai-os adopt --dir . --task "$(TASK)" --files "$(FILES)" --level "$(LEVEL)" --budget "$(BUDGET)" --lang zh
 ```
 
 If the Windows environment has no `make`, provide an equivalent PowerShell script, or spell out the raw `scale` commands in the documentation.
package/docs/workflow/README.md
CHANGED
@@ -62,6 +62,7 @@ feature/fix/docs/chore/codex -> dev -> master
 make bootstrap-scale
 make workflow-upgrade-check
 make workflow-upgrade-plan
+make workflow-aios-adopt
 ```
 
-Review the plan first, then decide whether to run `make workflow-upgrade-apply
+Review the plan first, then decide whether to run `make workflow-upgrade-apply`. If the plan reports that the AI OS runtime has not been adopted yet, use `make workflow-aios-adopt` to generate the runtime directories, the first dry-run, the benchmark, and the doctor report.
package/package.json
CHANGED