npm - cc-devflow - Versions diffs - 4.5.8 → 4.5.10 - Mend

cc-devflow 4.5.8 → 4.5.10

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (149)

package/docs/guides/getting-started.zh-CN.md CHANGED

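@@ -11,10 +11,10 @@ CC-DevFlow now has two entry points:
 - `cc-devflow init`: installs the full `.claude` bundle into your project
 - `cc-devflow adapt`: generates platform artifacts for Codex, Cursor, Qwen, Antigravity, and others
-The core workflow consists of 6 visible Skills; complex work can optionally use `cc-review` for a deep review:
+The core workflow can be walked manually through the PDCA/IDCA Skills, or advanced automatically through the PR harness Skills: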
 ```text
-cc-roadmap
+cc-roadmap -> cc-next -> cc-dev
 PDCA: cc-plan -> [cc-review] -> cc-do -> [cc-review] -> cc-check -> cc-act
 IDCA: cc-investigate -> [cc-review] -> cc-do -> [cc-review] -> cc-check -> cc-act
@@ -36,7 +36,7 @@ IDCA: cc-investigate -> [cc-review] -> cc-do -> [cc-review] -> cc-check -> cc-ac
 npx cc-devflow init --dir /path/to/your/project
 ```
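-A full install ships the 6 core workflow skills, the optional `cc-review`, and the maintenance skills `cc-spec-init` and `cc-simplify`.
+A full install ships roadmap, next-work selection, autonomous dev, manual PDCA/IDCA, the optional `cc-review`, PR review/landing, and the maintenance skills `cc-spec-init` and `cc-simplify`.
 ### Installing a single Skill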
@@ -87,9 +87,9 @@ find .codex/skills -mindepth 2 -maxdepth 2 -name SKILL.md | sort
 - `cc-roadmap` produces the editable truth `devflow/roadmap.json`, then generates `devflow/ROADMAP.md` and the deprecated `devflow/BACKLOG.md`
 - `cc-spec-init` produces `devflow/specs/INDEX.md`, capability specs, and `change-meta.json`
-- `cc-plan` produces `planning/design.md`, `planning/tasks.md`, `task-manifest.json`, and `change-meta.json`
-- `cc-investigate` produces `planning/analysis.md`, `planning/tasks.md`, `task-manifest.json`, and `change-meta.json`
-- `cc-review` produces `cc-review-plan.md`, `cc-review-ledger.jsonl`, `cc-review-report.md`, an optional `cc-review-agent-results.jsonl`, and optional structured deep-review findings
+- `cc-plan` produces `planning/tasks.md#Contract Summary`, the CLI-generated `task-manifest.json`, and `change-meta.json`
+- `cc-investigate` produces `planning/tasks.md#Root Cause Contract`, the CLI-generated `task-manifest.json`, and `change-meta.json`
+- `cc-review` produces `review-ledger.jsonl` and an optional `review-findings.json`; Markdown reports are rendered only on demand
 - `cc-check` produces `report-card.json`
 - `cc-act` produces exactly one final handoff file: `handoff/pr-brief.md`, `handoff/resume-index.md`, or `handoff/release-note.md`
@@ -97,7 +97,8 @@ durable truth has two layers:
 - `devflow/specs/`: capability truth; keeps `INDEX.md` and `capabilities/*.md`
 - New change directories must be named `REQ-<number>-<description>` (requirement) or `FIX-<number>-<description>` (fix). `REQ` and `FIX` each maintain their own incrementing sequence, so the same number under different prefixes is not a conflict. When parallel worktrees produce duplicate numbers, the description in the full change key distinguishes the business content. Old lowercase directories are read only for historical compatibility.
-- `devflow/changes/<change>/`: change truth; keeps `change-state.json`, `change-meta.json`, planning documents, `task-manifest.json`, an optional `team-state.json`, task-level `checkpoint.json`, `report-card.json`, and the single final handoff file.
+- `devflow/changes/<change>/`: change truth; keeps `change-meta.json`, `planning/tasks.md`, the CLI-generated `task-manifest.json`, review ledger / findings records, optional CLI logs for debug / failed runs, `report-card.json`, and the single final handoff file. Do not generate task-level `context.md`, `checkpoint.json`, or hand-written AI process files.
+- Historical `planning/design.md`, `planning/analysis.md`, and `cc-review-*.md` are readable fallbacks for old changes and are no longer written by default for new ones.
 - Worker prompts, journals, assignments, and session logs all go under `devflow/workspaces/<change>/` as ephemeral scratch.
 Before implementation begins, the planning handoff should first put its evidence in concrete terms:
@@ -153,7 +154,7 @@ npx cc-devflow adapt --cwd /path/to/your/project --platform codex
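 If your project does not have the optional `.claude/commands/` input directory, that is fine; the compiler still generates the skills registry and mirrors the officially distributed skill set for Codex.
-Codex now mirrors the officially distributed skills from `.claude/skills/<skill>/` to `.codex/skills/<skill>/`. The set contains the 6 core workflow skills, the optional `cc-review`, and the maintenance skills `cc-spec-init` and `cc-simplify`. The mirror is purely additive: custom Codex skills already in the project are not deleted.
+Codex now mirrors the officially distributed skills from `.claude/skills/<skill>/` to `.codex/skills/<skill>/`. The set contains the public workflow skills and the maintenance skills `cc-spec-init` and `cc-simplify`. The mirror is purely additive: custom Codex skills already in the project are not deleted.
 ### Keeping skills and examples in sync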
@@ -172,6 +173,7 @@ npm run verify:publish
 - [CLI and Skills](../commands/README.zh-CN.md)
 - [Workflow Guide](./workflow-guide.md)
 - [Best Practices](./best-practices.md)
+- [Minimal Artifact Contract](./minimize-artifacts.md)
 - [Examples Start Page](../examples/START-HERE.md)
 - [Short Examples List](../examples/README.md)
 - [Project README](../../README.zh-CN.md)
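
The change-directory naming rule in this file (`REQ-<number>-<description>` or `FIX-<number>-<description>`, with a separate sequence per prefix) could be checked with a small helper. The sketch below is illustrative only: `parseChangeKey` and `nextNumber` are hypothetical names, not part of cc-devflow. It also assumes that descriptions are lowercase kebab-case and that numbering starts at 1; the guide itself only fixes the prefix and the number.

```javascript
// Hypothetical helpers for illustration; not part of the cc-devflow API.
// Assumption: the description part is lowercase kebab-case.
const CHANGE_KEY_PATTERN = /^(REQ|FIX)-(\d+)-([a-z0-9]+(?:-[a-z0-9]+)*)$/;

// Split a full change key into its prefix, number, and description.
// Returns null when the key does not follow the naming rule.
function parseChangeKey(changeKey) {
  const match = CHANGE_KEY_PATTERN.exec(changeKey);
  if (!match) return null;
  const [, prefix, number, description] = match;
  return {
    prefix,
    number: Number(number),
    description,
    changeId: `${prefix}-${number}`
  };
}

// REQ and FIX keep independent sequences, so only keys with the
// requested prefix are considered when picking the next number.
// Assumption: an empty sequence starts at 1.
function nextNumber(existingKeys, prefix) {
  const used = existingKeys
    .map((key) => parseChangeKey(key))
    .filter((parsed) => parsed !== null && parsed.prefix === prefix)
    .map((parsed) => parsed.number);
  return used.length === 0 ? 1 : Math.max(...used) + 1;
}
```

With this sketch, `REQ-001-a` and `FIX-001-b` coexist without conflict, because each prefix is counted on its own.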

package/docs/guides/minimize-artifacts.md ADDED

@@ -0,0 +1,123 @@
+# Minimized Workflow Artifacts
+This guide describes the default artifact contract for new cc-devflow changes.
+The goal is simple: keep durable workflow truth readable, small, and measurable.
+## Default Shape
+Each new change keeps durable truth under `devflow/changes/<change-key>/`.
+Default human-authored Markdown:
+- `planning/tasks.md`
+Default machine-owned records:
+- `change-meta.json`
+- `planning/task-manifest.json`
+- `review/review-ledger.jsonl`
+- `review/review-findings.json` when findings exist
+- `execution/tasks/<task-id>/checkpoint.json`
+- `review/report-card.json`
+- one final handoff file under `handoff/`
+Runtime scratch, worker prompts, journals, assignments, and session logs belong
+under `devflow/workspaces/<change-key>/`, not beside durable change truth.
+## Feature Plans
+Feature and scope changes use:
+- `planning/tasks.md#Contract Summary`
+- `planning/task-manifest.json`
+- `change-meta.json`
+`Contract Summary` owns the frozen human-readable plan: user story, non-negotiable
+constraints, decisions that must not be reopened, task slices, and verification
+expectations. The task manifest is generated or validated by CLI tooling and owns
+machine-readable task status.
+## Bug Investigations
+Bug, regression, and unexpected-behavior work uses:
+- `planning/tasks.md#Root Cause Contract`
+- `planning/task-manifest.json`
+- `change-meta.json`
+`Root Cause Contract` owns the symptom, reproduction evidence, confirmed cause,
+rejected near-causes, repair boundary, and task handoff. `cc-do` should implement
+from that frozen contract instead of reopening investigation during execution.
+## Review Records
+`cc-review` writes structured lifecycle events first:
+- `review/review-ledger.jsonl`
+- optional `review/review-findings.json`
+- optional rendered Markdown from `cc-devflow review render`
+Markdown review reports are for human reading when needed. They are not the
+default durable review source.
+Useful commands:
+```bash
+npx cc-devflow review start --change REQ-001 --change-key REQ-001-copy-invite-link --base-sha abc123 --head-sha def456
+npx cc-devflow review record-node --change REQ-001 --change-key REQ-001-copy-invite-link --review-id <review-id> --node-id R001 --target planning/tasks.md --status checked --coverage contract --evidence-ref "cmd:npm run verify"
+npx cc-devflow review add-finding --change REQ-001 --change-key REQ-001-copy-invite-link --review-id <review-id> --finding-id F001 --severity important --confidence 8 --display-tier blocking --fingerprint sha256:<hash> --scope "current change" --path planning/tasks.md --evidence "finding evidence" --recommendation "repair action" --route cc-do
+npx cc-devflow review close --change REQ-001 --change-key REQ-001-copy-invite-link --review-id <review-id> --status clean --blocking-count 0 --warning-count 0 --next cc-check
+npx cc-devflow review render --change REQ-001 --change-key REQ-001-copy-invite-link --review-id <review-id> --output review/review-report.md
+```
+## Legacy Fallback
+Older changes may still contain:
+- `planning/design.md`
+- `planning/analysis.md`
+- `review/cc-review-plan.md`
+- `review/cc-review-report.md`
+- `review/cc-review-agent-results.jsonl`
+Those files remain readable compatibility inputs. New changes should not write
+them by default. When migrating old work, fold feature-plan truth into
+`planning/tasks.md#Contract Summary` and bug-investigation truth into
+`planning/tasks.md#Root Cause Contract`.
+## Validation Gates
+Validate one change:
+```bash
+npx cc-devflow task-contract validate --change REQ-001 --change-key REQ-001-copy-invite-link
+```
+Validate the repository artifact contract:
+```bash
+npm run verify:artifacts
+```
+Measure the contract:
+```bash
+npm run benchmark:artifacts
+```
+The package-level verification command also includes artifact validation:
+```bash
+npm run verify
+```
+## Authoring Rule
+Before adding a durable file under `devflow/changes/<change-key>/`, answer:
+1. Which downstream skill reads it by default?
+2. Which state does it own that no existing artifact owns?
+3. Which command fails if it drifts?
+If those answers are unclear, keep the information in `planning/tasks.md`, a
+machine record, or ephemeral workspace scratch instead.
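
The "Legacy Fallback" list in this guide lends itself to a simple audit. The sketch below reports which legacy files still exist in one change directory. It is not the implementation behind `npm run verify:artifacts`, whose checks this diff does not show; `findLegacyArtifacts` is a hypothetical name, and only the relative paths come from the guide.

```javascript
const fs = require('fs');
const path = require('path');

// Relative paths copied from the guide's "Legacy Fallback" section.
const LEGACY_ARTIFACTS = [
  'planning/design.md',
  'planning/analysis.md',
  'review/cc-review-plan.md',
  'review/cc-review-report.md',
  'review/cc-review-agent-results.jsonl'
];

// Hypothetical helper: return the legacy files present under changeDir,
// for example devflow/changes/REQ-001-copy-invite-link.
function findLegacyArtifacts(changeDir) {
  return LEGACY_ARTIFACTS.filter((relativePath) =>
    fs.existsSync(path.join(changeDir, ...relativePath.split('/')))
  );
}
```

An empty result suggests the change already follows the minimized shape; a non-empty one points at files whose truth could be folded into `planning/tasks.md`.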

package/docs/guides/project-postmortem.md ADDED

@@ -0,0 +1,78 @@
+# Project Postmortem Contract
+cc-devflow treats project postmortems as a durable AI memory surface. They are not
+chat summaries. They are repo-owned evidence that future agents can search before
+planning, investigating, or executing work.
+## Storage Layout
+Project-level postmortems live under `devflow/postmortems/`:
+| Path | Owner | Purpose |
+| --- | --- | --- |
+| `INDEX.md` | `cc-act` | Progressive entry point, latest incidents, tags, and search hints |
+| `principles.md` | `cc-act` | Generalized lessons about recurring model, process, and engineering mistakes |
+| `incidents/<date>-<change-key>.md` | `cc-act` | Immutable-ish factual record for one incident, bug, or repeated AI failure |
+`cc-act` owns writes because it has verified closeout, Git state, review state, and
+ship facts. Earlier skills only read and project the relevant reminders into their
+own artifacts.
+## Progressive Disclosure
+- Default layer: `INDEX.md` gives tags, one-line lessons, severity, affected
+  surfaces, and links to deeper incident files.
+- Principle layer: `principles.md` gives reusable rules such as model failure
+  modes, domain-specific judgment traps, and required countermeasures.
+- Incident layer: `incidents/*.md` gives the detailed facts, Git evidence,
+  timeline, root cause, detection gap, repair, follow-ups, and search terms.
+Agents should start with keyword search over the default and principle layers, then
+open incident files only when the tags or failure class match the current task.
+## Required Incident Evidence
+Every incident file should include:
+- Symptom and impact.
+- Trigger and timeline.
+- Confirmed root cause and rejected near-causes.
+- Why the failure escaped planning, investigation, execution, review, or ship.
+- Git evidence: branch, base, head SHA, PR if any, relevant commits, review range,
+  and dirty-tree notes when they matter.
+- Verification evidence: commands, exit status, key output, and artifact paths.
+- Follow-up actions: root-cause fixes, detection improvements, and backlog items.
+- AI failure mode: model limitation, pattern-matching trap, missing evidence habit,
+  over-broad abstraction, fake compatibility, test-seam mistake, or other reusable
+  class.
+- Search terms future agents should use before repeating similar work.
+## Redaction Guard
+Postmortems are durable repo artifacts, so they must never preserve secrets,
+tokens, private customer data, personal machine paths, or raw private logs unless
+the repository already treats that exact artifact as public source truth.
+- Record the command, file path, commit, or artifact pointer that proves the fact.
+- Quote only the minimal output needed to prove the incident.
+- Replace sensitive values with `<redacted>` and add a short redaction summary.
+- If the only available proof is sensitive, cite the owner artifact and describe
+  the observed shape instead of copying the raw value.
+## Read Gates
+`cc-plan`, `cc-investigate`, and `cc-do` must run a quick local search before they
+freeze direction or touch code:
+```bash
+rg -n "<capability|module|error|failure-class|model-risk>" devflow/postmortems
+```
+If `devflow/postmortems/` does not exist, record `no-project-postmortems-yet`.
+If a match exists, load only `INDEX.md`, `principles.md`, and the one or two
+incident files most relevant to the current work.
+## State Ownership
+Postmortems do not own task status, roadmap progress, review verdicts, or spec sync
+state. They cite those stronger owners by path, commit, or command output.
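
The read gate in this contract uses `rg`. For environments without ripgrep, the same idea can be approximated in Node. The sketch below swaps the regex search for plain case-insensitive substring matching, so it is not equivalent to the documented command. `searchPostmortems` and the `matched` / `no-match` status strings are hypothetical; only `no-project-postmortems-yet` and the `devflow/postmortems` path come from the contract.

```javascript
const fs = require('fs');
const path = require('path');

// Recursively collect Markdown files under a directory.
function listMarkdownFiles(dir) {
  return fs.readdirSync(dir, { withFileTypes: true }).flatMap((entry) => {
    const fullPath = path.join(dir, entry.name);
    if (entry.isDirectory()) return listMarkdownFiles(fullPath);
    return entry.name.endsWith('.md') ? [fullPath] : [];
  });
}

// Hypothetical read gate: search devflow/postmortems for any keyword.
function searchPostmortems(repoRoot, keywords) {
  const root = path.join(repoRoot, 'devflow', 'postmortems');
  if (!fs.existsSync(root)) {
    // The contract asks agents to record this marker when nothing exists yet.
    return { status: 'no-project-postmortems-yet', matches: [] };
  }
  const needles = keywords.map((keyword) => keyword.toLowerCase());
  const matches = [];
  for (const file of listMarkdownFiles(root)) {
    const lines = fs.readFileSync(file, 'utf8').split('\n');
    lines.forEach((text, index) => {
      const lower = text.toLowerCase();
      if (needles.some((needle) => lower.includes(needle))) {
        matches.push({ file: path.relative(root, file), line: index + 1, text });
      }
    });
  }
  return { status: matches.length > 0 ? 'matched' : 'no-match', matches };
}
```

A caller would then open only `INDEX.md`, `principles.md`, and the one or two incident files that appear in `matches`, as the contract requires.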

package/lib/compiler/__tests__/skills-registry.test.js CHANGED

@@ -159,9 +159,9 @@ describe('Skills Registry Generator', () => {
       expect(execute.writes).toEqual(
         expect.arrayContaining([
           expect.objectContaining({
-            path: 'devflow/changes/<change-key>/execution/tasks/<task-id>/checkpoint.json',
+            path: 'devflow/changes/<change-key>/execution/tasks/<task-id>/events.jsonl',
             durability: 'durable',
-            required: true
+            required: false
           })
         ])
       );

package/lib/skill-runtime/CLAUDE.md CHANGED

@@ -4,7 +4,7 @@
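 Responsibility groups
 Entry layer: `cli.js` handles command dispatch; `index.js` is the stable aggregate entry point for tests and internal scripts.
 Foundation layer: `schemas.js`, `store.js`, and `paths.js` govern contracts, persistence, and path rules so the execution layer does not reinvent the wheel.
-State layer: `artifacts.js`, `lifecycle.js`, `query.js`, `review.js`, and `team-state.js` maintain the runtime source of truth and read-only queries.
+State layer: `artifacts.js`, `lifecycle.js`, `query.js`, `workflow-context.js`, `review.js`, and `team-state.js` maintain the runtime source of truth and read-only queries.
 Planning and handoff: `planner.js`, `intent.js`, and `delegation.js` consolidate task parsing, handoff generation, and team/workspace delegation into unified semantics.
 Stage operations: `operations/` is the only stage entry directory; see `operations/CLAUDE.md` for the specific stage boundaries.
 Test layout: `__tests__/` sits next to the modules and holds unit, regression, and integration tests; the top-level `test/` no longer hosts tests private to `skill-runtime`.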

package/lib/skill-runtime/__tests__/autopilot.test.js CHANGED

@@ -9,13 +9,13 @@ const {
   getTaskManifestPath,
   getReportCardPath,
   getReleaseNotePath,
-  getRuntimeStatePath,
-  getCheckpointPath
+  getRuntimeStatePath
 } = require('../store');
 const {
   getIntentResumeIndexPath,
   getIntentPrBriefPath
 } = require('../artifacts');
+const { getChangePaths } = require('../paths');
 jest.setTimeout(20000);
@@ -41,6 +41,41 @@ function markManifestReviewsPassed(repoRoot, changeId) {
   fs.writeFileSync(manifestPath, `${JSON.stringify(manifest, null, 2)}\n`);
 }
+function writeCleanReviewLedger(repoRoot, changeId) {
+  const change = getChangePaths(repoRoot, changeId);
+  const ledgerPath = path.join(change.reviewDir, 'review-ledger.jsonl');
+  fs.mkdirSync(path.dirname(ledgerPath), { recursive: true });
+  fs.writeFileSync(ledgerPath, [
+    JSON.stringify({
+      schema: 'review-ledger.v2',
+      change: change.changeKey,
+      reviewId: 'RVW-20260512-001',
+      createdAt: '2026-05-12T00:00:00.000Z',
+      createdBy: 'cc-devflow-cli',
+      event: 'review-started',
+      mode: 'implementation',
+      scope: 'current-diff',
+      baseSha: 'abc123',
+      headSha: 'def456',
+      selectedNodes: [],
+      skippedNodes: [],
+      riskLanes: []
+    }),
+    JSON.stringify({
+      schema: 'review-ledger.v2',
+      change: change.changeKey,
+      reviewId: 'RVW-20260512-001',
+      createdAt: '2026-05-12T00:01:00.000Z',
+      createdBy: 'cc-devflow-cli',
+      event: 'review-closed',
+      status: 'clean',
+      blockingCount: 0,
+      warningCount: 0,
+      next: 'cc-check'
+    })
+  ].join('\n'));
+}
 describe('runAutopilot', () => {
   test('stops at the approval gate after planning without writing approval-phase handoff markdown', async () => {
     const repoRoot = fs.mkdtempSync(path.join(os.tmpdir(), 'cc-devflow-autopilot-'));
@@ -81,7 +116,7 @@ describe('runAutopilot', () => {
     expect(fs.existsSync(getIntentResumeIndexPath(repoRoot, 'REQ-123'))).toBe(false);
   });
-  test('resumes after approval, executes delegated work, and prepares a PR from checkpoints', async () => {
+  test('resumes after approval, executes delegated work, and prepares a PR from task state', async () => {
     const repoRoot = fs.mkdtempSync(path.join(os.tmpdir(), 'cc-devflow-autopilot-workers-'));
     writeJson(path.join(repoRoot, 'package.json'), {
@@ -123,17 +158,16 @@ describe('runAutopilot', () => {
     });
     const manifest = JSON.parse(fs.readFileSync(getTaskManifestPath(repoRoot, 'REQ-123'), 'utf8'));
-    const delegatedCheckpoint = JSON.parse(fs.readFileSync(getCheckpointPath(repoRoot, 'REQ-123', 'T002'), 'utf8'));
     const report = JSON.parse(fs.readFileSync(getReportCardPath(repoRoot, 'REQ-123'), 'utf8'));
     expect(firstRun.executed).toEqual(expect.arrayContaining(['delegate', 'worker-run', 'dispatch', 'verify']));
     expect(firstRun.currentStage).toBe('verify');
     expect(manifest.tasks.find((task) => task.id === 'T002').status).toBe('passed');
-    expect(delegatedCheckpoint.outputExcerpt).toContain('delegate-ok');
     expect(report.review.status).toBe('blocked');
     expect(fs.existsSync(getIntentPrBriefPath(repoRoot, 'REQ-123'))).toBe(false);
     markManifestReviewsPassed(repoRoot, 'REQ-123');
+    writeCleanReviewLedger(repoRoot, 'REQ-123');
     const secondRun = await runAutopilot({
       repoRoot,
@@ -146,7 +180,8 @@ describe('runAutopilot', () => {
     expect(secondRun.executed).toEqual(expect.arrayContaining(['verify', 'prepare-pr']));
     expect(secondRun.currentStage).toBe('prepare-pr');
-    expect(prBrief).toContain('execution/tasks/T002/checkpoint.json');
+    expect(prBrief).toContain('planning/task-manifest.json');
+    expect(prBrief).not.toContain('checkpoint.json');
   });
   test('runs release after prepare-pr when requested for an approved plan', async () => {
@@ -195,6 +230,7 @@ describe('runAutopilot', () => {
     expect(fs.existsSync(getReleaseNotePath(repoRoot, 'REQ-123'))).toBe(false);
     markManifestReviewsPassed(repoRoot, 'REQ-123');
+    writeCleanReviewLedger(repoRoot, 'REQ-123');
     const result = await runAutopilot({
       repoRoot,
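
The `writeCleanReviewLedger` fixture in this file writes two JSONL events (`review-started`, `review-closed`) before the second autopilot run. A minimal sketch of how such a ledger might be read follows. `summarizeReviewLedger` is a hypothetical function, and the rule that "clean" means `status === 'clean'` with `blockingCount === 0` is an assumption drawn from the fixture's field names, not from the runtime's actual gate.

```javascript
// Hypothetical reader; field names mirror the test fixture, but the
// pass/fail rule here is an assumption, not the cc-devflow implementation.
function summarizeReviewLedger(ledgerText) {
  // JSONL: one JSON object per non-empty line.
  const events = ledgerText
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map((line) => JSON.parse(line));

  // The most recent review-closed event decides the outcome.
  const closedEvents = events.filter((event) => event.event === 'review-closed');
  const latest = closedEvents[closedEvents.length - 1];
  if (!latest) {
    return { closed: false, clean: false, next: null };
  }
  const clean = latest.status === 'clean' && latest.blockingCount === 0;
  return { closed: true, clean, next: latest.next ?? null };
}
```

Reading the last `review-closed` event, rather than the first, lets a later review supersede an earlier blocked one in an append-only ledger.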

package/lib/skill-runtime/__tests__/benchmark-artifacts.test.js ADDED

@@ -0,0 +1,165 @@
+/**
+ * [INPUT]: Depends on runBenchmarkArtifacts exported by scripts/benchmark-artifacts.js and on temporary artifact fixtures.
+ * [OUTPUT]: Verifies that benchmark:artifacts estimates with ceil(len/4) and reports savings against the profile thresholds.
+ * [POS]: Red/Green evidence for REQ-003-minimize-workflow-artifacts T017.
+ * [PROTOCOL]: When changing this file, update this header, then check CLAUDE.md
+ */
+const fs = require('fs');
+const os = require('os');
+const path = require('path');
+const { spawnSync } = require('child_process');
+const { runBenchmarkArtifacts } = require('../../../scripts/benchmark-artifacts');
+const REPO_ROOT = path.resolve(__dirname, '../../..');
+const BENCHMARK_SCRIPT = path.join(REPO_ROOT, 'scripts', 'benchmark-artifacts.js');
+function writeText(filePath, text) {
+  fs.mkdirSync(path.dirname(filePath), { recursive: true });
+  fs.writeFileSync(filePath, text);
+}
+function writeJson(filePath, value) {
+  writeText(filePath, `${JSON.stringify(value, null, 2)}\n`);
+}
+function contractTasks({ changeKey, profile = 'standard', filler = '' }) {
+  return [
+    '# Tasks',
+    '',
+    '## Contract Summary',
+    '',
+    `Change: ${changeKey}`,
+    'Mode: plan',
+    `Profile: ${profile}`,
+    'Approval: approved',
+    '',
+    'Goal:',
+    '- Minimize workflow artifacts.',
+    '',
+    'Do Not Do:',
+    '- Do not change token estimator math.',
+    '',
+    'Approved Direction:',
+    '- Use tasks.md plus generated JSON records.',
+    '',
+    'Acceptance:',
+    '- Benchmark savings stay above threshold.',
+    '',
+    'Verification:',
+    '',
+    '```bash',
+    'npm run benchmark:artifacts',
+    '```',
+    '',
+    'Risk / Escalate If:',
+    '- Savings fall below profile threshold.',
+    '',
+    filler,
+    '## Phase 1',
+    '',
+    '- [ ] T001 benchmark minimized artifact surface',
+    '  Vertical slice: Slice 1',
+    ''
+  ].join('\n');
+}
+function seedLegacyBaseline(repoRoot, changeKey, size = 6000) {
+  const changeDir = path.join(repoRoot, 'devflow', 'changes', changeKey);
+  writeText(path.join(changeDir, 'planning', 'design.md'), `# Design\n\n${'d'.repeat(size)}\n`);
+  writeText(path.join(changeDir, 'planning', 'analysis.md'), `# Analysis\n\n${'a'.repeat(size / 2)}\n`);
+  writeText(path.join(changeDir, 'planning', 'tasks.md'), `# Tasks\n\n${'t'.repeat(size / 2)}\n`);
+  writeJson(path.join(changeDir, 'planning', 'task-manifest.json'), { changeId: changeKey, tasks: [] });
+  writeJson(path.join(changeDir, 'change-meta.json'), { changeId: changeKey, goal: ['legacy'] });
+  writeJson(path.join(changeDir, 'review', 'report-card.json'), { overall: 'pass' });
+}
+function seedMinimizedChange(repoRoot, changeKey, options = {}) {
+  const changeDir = path.join(repoRoot, 'devflow', 'changes', changeKey);
+  writeText(path.join(changeDir, 'planning', 'tasks.md'), contractTasks({ changeKey, ...options }));
+  writeJson(path.join(changeDir, 'planning', 'task-manifest.json'), {
+    changeId: changeKey,
+    metadata: { source: 'tasks.md', generatedBy: 'cc-devflow task-contract', planVersion: 1 },
+    tasks: []
+  });
+  writeJson(path.join(changeDir, 'change-meta.json'), {
+    changeId: changeKey,
+    _meta: { generatedBy: 'cc-devflow task-contract' }
+  });
+  writeJson(path.join(changeDir, 'review', 'review-ledger.jsonl'), { note: 'counted as text by benchmark' });
+  writeJson(path.join(changeDir, 'review', 'review-findings.json'), { findings: [] });
+  writeJson(path.join(changeDir, 'review', 'report-card.json'), { overall: 'pass' });
+}
+describe('benchmark:artifacts', () => {
+  let repoRoot;
+  beforeEach(() => {
+    repoRoot = fs.mkdtempSync(path.join(os.tmpdir(), 'cc-devflow-benchmark-artifacts-'));
+    seedLegacyBaseline(repoRoot, 'REQ-001-legacy-baseline');
+    seedLegacyBaseline(repoRoot, 'REQ-002-legacy-baseline');
+  });
+  afterEach(() => {
+    fs.rmSync(repoRoot, { recursive: true, force: true });
+  });
+  test('reports standard savings >= 30% for REQ-003-example', () => {
+    seedMinimizedChange(repoRoot, 'REQ-003-example', { profile: 'standard' });
+    const result = runBenchmarkArtifacts(repoRoot);
+    const row = result.rows.find((item) => item.changeKey === 'REQ-003-example');
+    expect(result.code).toBe(0);
+    expect(row).toMatchObject({
+      profile: 'standard',
+      threshold_pct: 30,
+      correctness_pass: true
+    });
+    expect(row.savings_vs_baseline_pct).toBeGreaterThanOrEqual(30);
+  });
+  test('reports tiny savings >= 60% for tiny fixture', () => {
+    seedMinimizedChange(repoRoot, 'REQ-004-tiny-example', { profile: 'tiny' });
+    const result = runBenchmarkArtifacts(repoRoot);
+    const row = result.rows.find((item) => item.changeKey === 'REQ-004-tiny-example');
+    expect(result.code).toBe(0);
+    expect(row).toMatchObject({
+      profile: 'tiny',
+      threshold_pct: 60,
+      correctness_pass: true
+    });
+    expect(row.savings_vs_baseline_pct).toBeGreaterThanOrEqual(60);
+  });
+  test('exits 1 when savings are below the profile threshold', () => {
+    seedMinimizedChange(repoRoot, 'REQ-005-bloated-example', {
+      profile: 'standard',
+      filler: 'x'.repeat(20000)
+    });
+    const result = runBenchmarkArtifacts(repoRoot);
+    expect(result.code).toBe(1);
+    expect(result.rows[0]).toMatchObject({ correctness_pass: false });
+  });
+  test('CLI prints stdout JSON array', () => {
+    seedMinimizedChange(repoRoot, 'REQ-003-example', { profile: 'standard' });
+    const result = spawnSync(process.execPath, [BENCHMARK_SCRIPT, repoRoot], { encoding: 'utf8' });
+    const rows = JSON.parse(result.stdout);
+    expect(result.status).toBe(0);
+    expect(Array.isArray(rows)).toBe(true);
+    expect(rows[0]).toHaveProperty('savings_vs_baseline_pct');
+  });
+  test('package.json exposes npm run benchmark:artifacts', () => {
+    const pkg = JSON.parse(fs.readFileSync(path.join(REPO_ROOT, 'package.json'), 'utf8'));
+    expect(pkg.scripts['benchmark:artifacts']).toBe('node scripts/benchmark-artifacts.js');
+  });
+});
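
The test header names a `ceil(len/4)` token estimate, and the assertions pin thresholds of 30% for `standard` and 60% for `tiny`. The arithmetic can be sketched as below. The savings formula (relative reduction against the baseline) and the meaning of `correctness_pass` are assumptions, since `scripts/benchmark-artifacts.js` is not part of this excerpt; `estimateTokens` and `savingsRow` are hypothetical names.

```javascript
// Token estimate named in the test header: ceil(len / 4).
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Thresholds taken from the test assertions.
const THRESHOLD_PCT = { standard: 30, tiny: 60 };

// Assumption: savings is the relative reduction against the baseline,
// and correctness_pass simply means savings >= profile threshold.
function savingsRow(profile, baselineTokens, currentTokens) {
  // Multiply before dividing to limit floating-point noise.
  const savings = ((baselineTokens - currentTokens) * 100) / baselineTokens;
  const threshold = THRESHOLD_PCT[profile];
  return {
    profile,
    threshold_pct: threshold,
    savings_vs_baseline_pct: savings,
    correctness_pass: savings >= threshold
  };
}
```

Under these assumptions, a `standard` change that drops from 1000 to 650 estimated tokens saves 35% and passes, while one that only drops to 800 saves 20% and fails.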

package/lib/skill-runtime/__tests__/cli-bootstrap.integration.test.js CHANGED

@@ -217,9 +217,9 @@ describe('cc-devflow cli distribution bootstrap', () => {
     expect(codexDoSkill.data.writes).toEqual(
       expect.arrayContaining([
         expect.objectContaining({
-          path: 'devflow/changes/<change-key>/execution/tasks/<task-id>/checkpoint.json',
+          path: 'devflow/changes/<change-key>/execution/tasks/<task-id>/events.jsonl',
           durability: 'durable',
-          required: true
+          required: false
         })
       ])
     );

package/lib/skill-runtime/__tests__/dispatch.test.js CHANGED

@@ -7,7 +7,6 @@ const { runResume } = require('../operations/resume');
 const {
   getRuntimeStatePath,
   getTaskManifestPath,
-  getCheckpointPath,
   getEventsPath
 } = require('../store');
@@ -76,7 +75,7 @@ describe('runDispatch', () => {
     expect(nextManifest.tasks[0].status).toBe('pending');
   });
-  test('rejects stale results when planVersion changes during task execution and records it in checkpoint', async () => {
+  test('rejects stale results when planVersion changes during task execution and records it in manifest and events', async () => {
     const repoRoot = fs.mkdtempSync(path.join(os.tmpdir(), 'cc-devflow-dispatch-'));
     const manifestPath = getTaskManifestPath(repoRoot, 'REQ-123');
@@ -133,28 +132,25 @@ describe('runDispatch', () => {
     });
     const nextManifest = JSON.parse(fs.readFileSync(manifestPath, 'utf8'));
-    const checkpoint = JSON.parse(fs.readFileSync(getCheckpointPath(repoRoot, 'REQ-123', 'T001'), 'utf8'));
     const events = fs.readFileSync(getEventsPath(repoRoot, 'REQ-123', 'T001'), 'utf8');
     expect(result.success).toBe(false);
     expect(nextManifest.tasks[0].status).toBe('failed');
     expect(nextManifest.tasks[0].lastError).toContain('Stale result rejected');
-    expect(checkpoint.planVersion).toBe(1);
-    expect(checkpoint.error).toContain('Stale result rejected');
     expect(events).toContain('task_stale_rejected');
   });
-  test('restores unresolved work from the latest stable checkpoint on resume without creating handoff markdown', async () => {
+  test('restores unresolved work from the latest stable manifest state on resume without creating handoff markdown', async () => {
     const repoRoot = fs.mkdtempSync(path.join(os.tmpdir(), 'cc-devflow-resume-stable-'));
     const changeId = 'REQ-123';
     const manifestPath = getTaskManifestPath(repoRoot, changeId);
     writeJson(getRuntimeStatePath(repoRoot, changeId), {
       changeId,
-      changeKey: 'REQ-123-recover-from-stable-checkpoint',
-      slug: 'recover-from-stable-checkpoint',
+      changeKey: 'REQ-123-recover-from-stable-state',
+      slug: 'recover-from-stable-state',
       createdAt: '2026-04-09T01:00:00.000Z',
-      goal: 'Recover from stable checkpoint',
+      goal: 'Recover from stable state',
       status: 'in_progress',
       initializedAt: '2026-04-09T01:00:00.000Z',
       plannedAt: '2026-04-09T01:01:00.000Z',
@@ -169,13 +165,13 @@ describe('runDispatch', () => {
     writeJson(manifestPath, {
       changeId,
-      goal: 'Recover from stable checkpoint',
+      goal: 'Recover from stable state',
       createdAt: '2026-04-09T01:00:00.000Z',
       updatedAt: '2026-04-09T01:02:00.000Z',
       tasks: [
         {
           id: 'T001',
-          title: 'Stable checkpoint task',
+          title: 'Stable completed task',
           type: 'TEST',
           dependsOn: [],
           touches: ['src/a.ts'],
@@ -219,32 +215,6 @@ describe('runDispatch', () => {
       }
     });
-    writeJson(getCheckpointPath(repoRoot, changeId, 'T001'), {
-      changeId,
-      taskId: 'T001',
-      sessionId: 'stable-session',
-      planVersion: 1,
-      status: 'passed',
-      summary: 'Task passed after 1 attempt(s)',
-      error: '',
-      outputExcerpt: '',
-      timestamp: '2026-04-09T01:05:00.000Z',
-      attempt: 1
-    });
-    writeJson(getCheckpointPath(repoRoot, changeId, 'T002'), {
-      changeId,
-      taskId: 'T002',
-      sessionId: 'failed-session',
-      planVersion: 1,
-      status: 'failed',
-      summary: 'Task failed: Command failed',
-      error: 'Command failed',
-      outputExcerpt: 'Command failed',
-      timestamp: '2026-04-09T01:06:00.000Z',
-      attempt: 2
-    });
     const result = await runResume({
       repoRoot,
       changeId,
@@ -255,7 +225,7 @@ describe('runDispatch', () => {
     const nextManifest = JSON.parse(fs.readFileSync(manifestPath, 'utf8'));
     expect(result.success).toBe(true);
-    expect(result.restoredCheckpoint).toMatchObject({
+    expect(result.restoredState).toMatchObject({
       taskId: 'T001',
       status: 'passed'
     });