@hongmaple0820/scale-engine 0.27.0 → 0.28.0

@@ -280,18 +280,20 @@ The promotion step must remain evidence-backed. Automatically generating rules w
 
  ## 6. Roadmap Direction
 
- ### 6.1 Immediate Patch: 0.26.1
+ ### 6.1 Planning Principle
 
- Goal: publish the security patch before expanding strategic scope.
+ The roadmap has release horizons plus a long-range vision:
 
- Primary outcomes:
-
- - remove `verify-task` shell execution risk from the published package
- - document safe verification command semantics
- - pin and override flagged dependency versions where applicable
- - preserve production dependency audit health
+ | Horizon | Purpose | Claim boundary |
+ | --- | --- | --- |
+ | 0.27.x baseline | establish the AI OS Runtime primitives and adoption path | "runtime baseline", not "complete AI OS" |
+ | 0.28.0 closure | make planning, execution, verification, dashboard, benchmark, and adoption usable as a closed loop | "usable closed-loop beta", not "stable final OS" |
+ | 0.29.0 intelligence | make memory, context, and skill routing measurably smarter | "intelligence beta", not proven long-term cognition |
+ | 0.30.0 governance maturity | strengthen enterprise governance, upgrade, evaluator, and evolution controls | "governance maturity", not commercial stability |
+ | 1.0.0 beta | integrate the loop into a public AI Engineering OS beta | "public beta", backed by demos and benchmark evidence |
+ | Long-range vision | keep SCALE moving toward an AI Engineering OS with memory, context, governance, and tool intelligence | directional until backed by eval data |
 
- This is a trust-maintenance release. It should not be mixed with large roadmap changes.
+ The near-term work should be aggressive, but public wording must stay precise. SCALE can ship beta capabilities quickly; it should only claim stable, industry-leading AI OS behavior after repeated project evidence, benchmarks, and upgrade validation.
 
  ### 6.2 0.27.0: Cognitive Runtime Layer
 
@@ -334,10 +336,105 @@ Exit criteria:
  - skill recommendations include why, when, and required proof: baseline implemented by skill execution plans
  - context pack generation reports token budget and omissions: baseline implemented by `context.pack.compiler`
 
- ### 6.3 0.28.0: Adaptive Governance
+ ### 6.3 0.27.x: Runtime Baseline and Adoption Path
+
+ Theme: make the AI OS Runtime installable, inspectable, and safe to adopt.
+
+ Current landing status:
+
+ - `scale ai-os plan` exists as the unified planning entry point for governance, context, memory, skill routing, adaptive workflow, and ROI.
+ - `scale ai-os run --dry-run` exists as the first beta execution slice.
+ - `scale ai-os run --mode guarded --verify "<command>"` executes explicit verification commands through the safe command runner, records each command as runtime evidence, and blocks the run when verification fails.
+ - `scale ai-os status --lang zh|en` checks runtime directories, plan/run evidence, guarded verification, dashboard health, benchmark evidence, and adoption evidence in one closed-loop readiness report; when verification evidence is missing, it recommends concrete guarded verification commands from `.scale/verification.json` or `package.json`.
+ - `scale ai-os dashboard` summarizes persisted run reports into ready/blocked counts, guarded verification health, pending evidence, failure learning candidates, and next recommendations.
+ - `scale ai-os benchmark` runs fixed beta scenarios and reports context token use, estimated savings, memory recall, skill steps, governance modes, and the current dashboard health snapshot.
+ - `scale ai-os migrate` creates or verifies the `.scale/ai-os` runtime directories and writes an idempotent migration report.
+ - `scale ai-os adopt` runs migrate, the first dry-run, benchmark, and doctor as one adoption path, then writes `.scale/ai-os/adoption.json`.
+ - `scale ai-os doctor --lang zh|en` checks AI OS runtime readiness without mutating the project and blocks adoption when required directories or dashboard health are broken.
+ - `scale upgrade check/plan` includes AI OS readiness, so existing projects see adoption, migration, and doctor steps through the normal upgrade workflow.
+ - The upgrade and adoption CLI surfaces now have human-facing Chinese and English output while preserving JSON for scripts, CI, and agent integrations.
+
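The guarded-mode behavior in the landing-status list (run explicit verification commands without a shell, record each one as evidence, block on failure) can be illustrated with a minimal Python sketch. It is not SCALE's safe command runner: `run_guarded` and `VerificationEvidence` are hypothetical names, and running every command before deciding whether the run is blocked is an assumption.

```python
import shlex
import subprocess
import sys
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class VerificationEvidence:
    """One recorded verification command and its outcome (hypothetical shape)."""
    command: str
    exit_code: int
    passed: bool


def run_guarded(commands: List[str]) -> Tuple[bool, List[VerificationEvidence]]:
    """Run each command as an argv list (no shell) and record the result."""
    evidence: List[VerificationEvidence] = []
    for command in commands:
        argv = shlex.split(command)  # tokenize only; nothing is passed to a shell
        if not argv:
            raise ValueError("empty verification command")
        try:
            exit_code = subprocess.run(argv, capture_output=True, text=True).returncode
        except FileNotFoundError:
            exit_code = 127  # conventional "command not found" code
        evidence.append(VerificationEvidence(command, exit_code, exit_code == 0))
    # The run is ready only if every verification command passed.
    ready = all(item.passed for item in evidence)
    return ready, evidence
```

Passing an argv list to `subprocess.run` keeps shell metacharacters in a command string from being interpreted, the same class of risk the removed 0.26.1 notes describe for `verify-task`.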
+ Boundary:
+
+ - 0.27.x is the baseline. It proves the runtime surface and adoption path, but it does not yet prove autonomous source mutation, PR creation, long-term memory, or stable commercial AI OS behavior.
+
+ ### 6.4 0.28.0: Usable Closed-Loop Enhancement
+
+ Theme: turn `ai-os plan` into a runnable beta loop.
+
+ Target timebox: 2-3 weeks.
+
+ Core work:
+
+ | Module | Outcome |
+ | --- | --- |
+ | `scale ai-os run` | execute the unified plan through workflow, context, memory, skill routing, and verification steps |
+ | Runtime Status | show whether plan, run, verification, dashboard, benchmark, adoption, and doctor evidence exist for the project |
+ | Verification Recommendation | derive suggested verification commands from task level, changed files, project verification profile, and risk signals |
+ | Failure Learning Closure | convert failed guarded runs, gate failures, and missing evidence into reviewed lesson/rule candidates |
+ | Closed-Loop Demo Pack | provide repeatable docs and code task demos that exercise plan -> run -> verify -> dashboard -> benchmark |
+ | Memory Provider Bridge | keep gbrain, agentmemory, code memory, and local memory selectable through one provider contract |
+ | Context Compiler v2 | merge task intent, risk level, files, memory recall, and role into one explainable context pack |
+ | Skill Router v2 | create an execution graph for skills, MCP tools, CLIs, artifacts, and required evidence |
+ | Adaptive Workflow Profiles | choose light, standard, or strict gates from risk and changed-file signals |
+ | AI OS Dashboard CLI | summarize gate health, memory hits, context budget, skill evidence, and ROI |
+ | Upgrade/Migration | migrate older `.scale` state and warn about incompatible local governance files |
+ | AI OS Adoption and Doctor | keep one-command adoption and readiness checks aligned with the normal upgrade workflow |
+ | Bilingual DX | keep key CLI help, errors, README guidance, and tutorials readable in Chinese and English |
+ | Benchmark Pack | run fixed samples for token budget, recall, gate pass rate, and skill-routing evidence |
+
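The "Adaptive Workflow Profiles" row in the table above (light, standard, or strict gates chosen from risk and changed-file signals) might look roughly like the following Python sketch. The risk labels, the `HIGH_RISK_PREFIXES` list, and the docs-only rule are all illustrative assumptions; the diff does not specify SCALE's actual selection rules.

```python
from typing import List

# Hypothetical paths treated as high risk regardless of the declared risk level.
HIGH_RISK_PREFIXES = ("package.json", ".github/", "src/security/")


def choose_profile(risk: str, changed_files: List[str]) -> str:
    """Pick a gate profile from a risk label and the list of changed files."""
    touches_high_risk = any(path.startswith(HIGH_RISK_PREFIXES) for path in changed_files)
    if risk == "high" or touches_high_risk:
        return "strict"
    # Only a non-empty, Markdown-only change set qualifies for the light profile.
    docs_only = bool(changed_files) and all(path.endswith(".md") for path in changed_files)
    if risk == "low" and docs_only:
        return "light"
    return "standard"
```

Defaulting to `standard` when the signals are ambiguous (for example, an empty file list) keeps the sketch from relaxing gates on missing information.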
+ Exit criteria:
+
+ - `scale ai-os run` can complete at least one documentation task and one code task in dry-run or guarded execution mode
+ - `scale ai-os status` or equivalent doctor output shows what is missing for a closed loop
+ - verification recommendations are explainable and can be overridden by explicit `--verify` commands
+ - execution output records context decisions, memory provider choices, skill decisions, gate results, and failure lessons
+ - benchmark output compares context token budget against a full-load baseline
+ - beta docs clearly state what is automated, what is proposed, and what still requires human approval
+
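The exit criterion about comparing the context token budget against a full-load baseline comes down to simple arithmetic, sketched below in Python. The function name and the report keys are hypothetical; the real `scale ai-os benchmark` output format is not shown in this diff.

```python
from typing import Dict, Union


def context_savings(full_load_tokens: int, packed_tokens: int) -> Dict[str, Union[int, float]]:
    """Compare a packed context against loading everything (the full-load baseline)."""
    if full_load_tokens <= 0:
        raise ValueError("full_load_tokens must be positive")
    saved = full_load_tokens - packed_tokens
    return {
        "full_load_tokens": full_load_tokens,
        "packed_tokens": packed_tokens,
        "saved_tokens": saved,
        # Fraction of the baseline avoided; negative if the pack is larger.
        "savings_ratio": round(saved / full_load_tokens, 4),
    }
```

For example, a 3,000-token context pack against a 12,000-token full load yields 9,000 saved tokens and a savings ratio of 0.75.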
+ Current implementation status:
+
+ - In progress on the post-0.27.1 development branch.
+ - Runtime baseline, status visibility, verification recommendation, adoption, doctor, dashboard, benchmark, migration, upgrade integration, and bilingual adoption guidance are already landed.
+ - Remaining 0.28.0 work should focus on failure-learning closure and repeatable end-to-end demo evidence.
+ - It does not yet create PRs or mutate source files; richer skill execution remains a later implementation slice unless explicitly approved.
+
+ Explicitly deferred:
+
+ - default automatic PR creation or merge without review
+ - deep dynamic dependency sandboxing beyond audit, lockfile diff, and high-risk pattern checks
+ - full VLM visual judgment beyond screenshot capture and interface placeholders
+ - claims of human-level long-term memory or fully autonomous engineering
+
+ ### 6.5 0.29.0: Memory, Context, and Skill Intelligence
+
+ Theme: make the beta loop measurably smarter rather than only broader.
+
+ Target timebox: 4-6 weeks.
+
+ Core work:
+
+ | Module | Outcome |
+ | --- | --- |
+ | Memory Quality Scoring | score recall precision, contradiction risk, accepted memory rate, and stale-memory risk |
+ | Provider Fallback Policy | choose between gbrain, agentmemory, code memory, local memory, or no memory with an explicit reason |
+ | Context Compression | summarize low-risk context while preserving high-risk evidence verbatim |
+ | Skill Strategy Learning | learn preferred tools from successful evidence, failures, and user overrides |
+ | Workflow Eval Integration | turn benchmark results into release-gate evidence |
+
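The "Provider Fallback Policy" row above (pick one memory provider, or none, and always state why) could be sketched in Python as follows. The preference order, the provider identifiers such as `code-memory`, and the reason strings are assumptions for illustration; the diff names the providers but not how SCALE ranks them.

```python
from typing import Iterable, List, Optional, Sequence, Tuple

# Hypothetical preference order; the real policy is not specified in the roadmap.
DEFAULT_PREFERENCE: Sequence[str] = ("gbrain", "agentmemory", "code-memory", "local-memory")


def choose_provider(
    available: Iterable[str],
    preference: Sequence[str] = DEFAULT_PREFERENCE,
) -> Tuple[Optional[str], str]:
    """Return the first available provider in preference order, plus an explicit reason."""
    available_set = set(available)
    skipped: List[str] = []
    for name in preference:
        if name in available_set:
            reason = f"selected {name}"
            if skipped:
                reason += "; unavailable: " + ", ".join(skipped)
            return name, reason
        skipped.append(name)
    # Falling through means no provider is usable, which is itself a valid outcome.
    return None, "no configured provider is available; running without memory"
```

Returning the reason alongside the choice means the "no memory" case is recorded as a decision rather than a silent default.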
+ Exit criteria:
+
+ - memory recall has acceptance/rejection feedback
+ - context packs show savings, omissions, and evidence-loss warnings
+ - skill routing decisions can be compared against outcome quality
+ - release notes include measured deltas instead of aspirational percentages
+
+ ### 6.6 0.30.0: Enterprise Governance and Upgrade Maturity
 
  Theme: deepen adaptive governance beyond the v0.27.0 baseline.
 
+ Target timebox: 6-10 weeks.
+
  Core work:
 
  | Module | Outcome |
@@ -354,10 +451,12 @@ Exit criteria:
  - reasoning-heavy tasks get critique/evaluator gates
  - evolution proposals can be traced to failure evidence and validation results
 
- ### 6.4 0.29.0+: Agent Engineering OS
+ ### 6.7 1.0.0 Beta: AI Engineering OS
 
  Theme: integrate governance, memory, context, and tools into an operating layer.
 
+ Target timebox: 8-12 weeks.
+
  Target capabilities:
 
  - unified agent workspace policy
@@ -367,6 +466,37 @@ Target capabilities:
  - measurable token and quality reports
  - ecosystem-safe skill and MCP lifecycle governance
 
+ Release criteria:
+
+ - install, upgrade, run, dashboard, benchmark, and migration flows work on clean projects
+ - at least three representative project types have documented smoke results
+ - failure learning produces reviewed rule candidates without silently hardening bad rules
+ - bilingual docs explain the core workflow without requiring maintainer context
+ - public claims are tied to `WORKFLOW_EVAL`, benchmark output, or release evidence
+
+ ### 6.8 1.0.0 Stable and Long-Range Vision
+
+ This is the strategic north star, not the 0.28.0 closed-loop promise.
+
+ | Time horizon | Target state | Evidence required before public claim |
+ | --- | --- | --- |
+ | 8-12 weeks | AI Engineering OS beta: usable end-to-end loop across planning, execution, verification, memory, and dashboard | repeatable demo projects and benchmark reports |
+ | 3-6 months | stable governance runtime: upgrades, adapters, memory providers, and eval gates are reliable in real repositories | release-to-release regression data and field reports |
+ | 6-12 months | industry-leading agent engineering layer: adaptive workflows, strategy memory, tool intelligence, and cross-agent governance mature together | comparative evals, sustained issue closure, external adoption evidence |
+
+ Long-range capability themes:
+
+ - Cognitive memory: working, episodic, semantic, procedural, and strategy memory with explicit source and freshness controls.
+ - Adaptive orchestration: workflows selected by risk, ownership, failure history, and tool reliability instead of one fixed path.
+ - Tool intelligence: skills, MCP, CLIs, browser automation, and agent adapters treated as governed capabilities with cost, evidence, and fallback policy.
+ - Evaluator intelligence: critique loops, uncertainty scoring, adversarial review, and evidence insufficiency verdicts for reasoning-heavy tasks.
+ - Governance economics: token cost, gate friction, verification quality, and maintenance overhead measured as first-class product metrics.
+ - Ecosystem governance: external skills, memory providers, adapters, and templates integrated through attribution, license, source pinning, and supply-chain checks.
+
+ Non-negotiable boundary:
+
+ > The long-range vision can guide architecture, but it must not be used as a release claim until the corresponding evidence exists.
+
  ## 7. Measurement Plan
 
  Strategic claims must be tied to measurement.
package/docs/README.md CHANGED
@@ -36,7 +36,7 @@
  | [CODE_INTELLIGENCE.md](CODE_INTELLIGENCE.md) | Code intelligence and exploration ROI for CodeGraph, Graphify, and explicit fallback |
  | [WORKFLOW_EVAL.md](WORKFLOW_EVAL.md) | Workflow Eval, pass@k metrics, Failure Replay, and improvement candidates |
  | [SKILL_RADAR.md](SKILL_RADAR.md) | Skill Radar, capability confidence, evidence requirements, and supply-chain security checks |
- | [AI_ENGINEERING_OS_POSITIONING.md](AI_ENGINEERING_OS_POSITIONING.md) | Agent Governance Runtime / AI Engineering OS direction, and the unified runtime plan from `scale ai-os plan` |
+ | [AI_ENGINEERING_OS_POSITIONING.md](AI_ENGINEERING_OS_POSITIONING.md) | Agent Governance Runtime / AI Engineering OS direction, the `scale ai-os plan/run/status/dashboard/benchmark/migrate/adopt/doctor` runtime entry points, the `0.28.0` usable closed-loop enhancement, and the 3-12 month long-range roadmap |
  | [THIRD_PARTY_SKILLS.md](THIRD_PARTY_SKILLS.md) | Third-party skill acknowledgements, licensing boundaries, citation practice, and vendoring policy |
  | [EXTERNAL_REFERENCES.md](EXTERNAL_REFERENCES.md) | Complete inventory of referenced external projects, skills, MCP, CLIs, and adapters |
  | [UPGRADE_MANAGEMENT.md](UPGRADE_MANAGEMENT.md) | Safe upgrade process for the SCALE CLI, governance packs, skills, MCP, and CLI tools |
@@ -71,8 +71,9 @@ scale status
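  | Multi-repository / MOE workspace | `scale init --governance-pack moe-workspace` |
  | Disorganized docs, reports, screenshots, and scripts | `scale init --governance-pack resource-governance` |
  | A workflow or third-party capability needs upgrading | `scale upgrade check --lang zh && scale upgrade plan --html --lang zh` |
+ | Adopting the AI OS runtime in an existing project | `scale ai-os adopt --task "接入 AI OS runtime" --lang zh` |
 
 
  ## Workflow Upgrade Quick Path
 
- For existing projects, start with the [SCALE Workflow Upgrade Guide](workflow-upgrade.md). It covers `scale init --interactive`, `scale upgrade check/plan/apply/rollback`, bilingual output via `--lang zh/en`, the repository-local `make workflow-upgrade-*` entry points, and the boundary between generated-file updates and project-level verification.
+ For existing projects, start with the [SCALE Workflow Upgrade Guide](workflow-upgrade.md). It covers `scale init --interactive`, `scale upgrade check/plan/apply/rollback`, `scale ai-os adopt`, bilingual output via `--lang zh/en`, the repository-local `make workflow-upgrade-*` / `make workflow-aios-adopt` entry points, and the boundary between generated-file updates and project-level verification.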
@@ -131,6 +131,12 @@ scale upgrade plan --dir . --html --lang zh
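  scale upgrade apply --dir . --confirm --lang zh
  ```
 
+ If the upgrade plan reports that the AI OS runtime has not been adopted yet, use the one-command adoption path to generate the runtime directories, the first dry-run, the benchmark, and the doctor report:
+
+ ```bash
+ scale ai-os adopt --dir . --task "接入 AI OS runtime" --lang zh
+ ```
+
  For English output, replace `--lang zh` with `--lang en`. Clean SCALE-managed files can be refreshed automatically; files with local changes go to manual review and are never overwritten automatically.
 
  Continue with the [Official Demo Walkthrough](agent-governance-demo.md) to see how a real task moves from requirement to verification evidence.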
@@ -60,6 +60,18 @@ scale upgrade apply --dir . --confirm
  scale preflight --dir . --service all --preflight-profile quick
  ```
 
+ If the upgrade plan reports that the AI OS runtime has not been adopted yet, prefer the one-command adoption path. It creates the runtime directories, generates the first `dry-run` run report, writes the benchmark, and re-checks readiness with doctor:
+
+ ```bash
+ scale ai-os adopt \
+ --dir . \
+ --task "接入 AI OS runtime 并生成首份治理证据" \
+ --files "README.md,AGENTS.md" \
+ --lang zh
+ ```
+
+ When adoption completes, it writes `.scale/ai-os/adoption.json`. For subsequent real tasks, use `scale ai-os run --mode guarded` to generate governed execution evidence.
+
  Output defaults to Chinese. Add `--lang en` for English command hints or an English HTML plan:
 
  ```bash
@@ -67,12 +79,15 @@ scale upgrade check --dir . --lang en
  scale upgrade plan --dir . --html --lang en
  ```
 
+ Human-facing upgrade output generates the next-step commands in the current language; in a Chinese session, for example, it recommends `scale ai-os adopt --task "接入 AI OS runtime" --lang zh`. Use `--json` only when scripts, CI, or agent integrations need a stable structure.
+
  If the repository already has local wrappers, prefer them, because they encode the project defaults:
 
  ```bash
  make workflow-upgrade-check
  make workflow-upgrade-plan
  make workflow-upgrade-apply
+ make workflow-aios-adopt
  make workflow-upgrade-verify
  ```
 
@@ -160,6 +175,8 @@ workflow-upgrade-rollback:
  scale upgrade rollback --dir . --lang zh
  workflow-upgrade-verify:
  scale preflight --dir . --service all --preflight-profile quick
+ workflow-aios-adopt:
+ scale ai-os adopt --dir . --task "$(TASK)" --files "$(FILES)" --level "$(LEVEL)" --budget "$(BUDGET)" --lang zh
  ```
 
  If the Windows environment does not have `make`, provide an equivalent PowerShell script, or spell out the raw `scale` commands in the docs.
@@ -62,6 +62,7 @@ feature/fix/docs/chore/codex -> dev -> master
  make bootstrap-scale
  make workflow-upgrade-check
  make workflow-upgrade-plan
+ make workflow-aios-adopt
  ```
 
- Review the plan first, then decide whether to run `make workflow-upgrade-apply`.
+ Review the plan first, then decide whether to run `make workflow-upgrade-apply`. If the plan reports that the AI OS runtime has not been adopted yet, use `make workflow-aios-adopt` to generate the runtime directories, the first dry-run, the benchmark, and the doctor report.
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@hongmaple0820/scale-engine",
- "version": "0.27.0",
+ "version": "0.28.0",
  "description": "Executable AI agent governance with workflow gates, evidence, skill/tool orchestration, and traceable HTML artifacts",
  "repository": {
  "type": "git",