@snipcodeit/mgw 0.4.0 → 0.6.0

@@ -38,7 +38,7 @@ If no state file exists → issue not triaged yet. Run triage inline:
  - Execute the mgw:issue triage flow (steps from issue.md) inline.
  - After triage, reload state file.
 
- If state file exists → load it. **Run migrateProjectState() to ensure retry fields exist:**
+ If state file exists → load it. **Run migrateProjectState() to ensure retry and checkpoint fields exist:**
  ```bash
  node -e "
  const { migrateProjectState } = require('./lib/state.cjs');
@@ -46,6 +46,161 @@ migrateProjectState();
  " 2>/dev/null || true
  ```
 
+ **Checkpoint detection — check for resumable progress before stage routing:**
+
+ After loading state and running migration, detect whether a prior pipeline run left
+ a checkpoint with meaningful progress (beyond triage). If found, present the user
+ with Resume/Fresh/Skip options before proceeding.
+
+ ```bash
+ # Detect checkpoint with progress beyond triage
+ CHECKPOINT_DATA=$(node -e "
+ const { detectCheckpoint, resumeFromCheckpoint } = require('./lib/state.cjs');
+ const cp = detectCheckpoint(${ISSUE_NUMBER});
+ if (!cp) {
+ console.log('none');
+ } else {
+ const resume = resumeFromCheckpoint(${ISSUE_NUMBER});
+ console.log(JSON.stringify(resume));
+ }
+ " 2>/dev/null || echo "none")
+ ```
+
+ If checkpoint is found (`CHECKPOINT_DATA !== "none"`):
+
+ Parse the checkpoint data and display to the user:
+ ```bash
+ CHECKPOINT_STEP=$(echo "$CHECKPOINT_DATA" | node -e "
+ const d=JSON.parse(require('fs').readFileSync('/dev/stdin','utf-8'));
+ console.log(d.checkpoint.pipeline_step);
+ ")
+ RESUME_ACTION=$(echo "$CHECKPOINT_DATA" | node -e "
+ const d=JSON.parse(require('fs').readFileSync('/dev/stdin','utf-8'));
+ console.log(d.resumeAction);
+ ")
+ RESUME_STAGE=$(echo "$CHECKPOINT_DATA" | node -e "
+ const d=JSON.parse(require('fs').readFileSync('/dev/stdin','utf-8'));
+ console.log(d.resumeStage);
+ ")
+ COMPLETED_STEPS=$(echo "$CHECKPOINT_DATA" | node -e "
+ const d=JSON.parse(require('fs').readFileSync('/dev/stdin','utf-8'));
+ console.log(d.completedSteps.join(', '));
+ ")
+ ARTIFACTS_COUNT=$(echo "$CHECKPOINT_DATA" | node -e "
+ const d=JSON.parse(require('fs').readFileSync('/dev/stdin','utf-8'));
+ console.log(d.checkpoint.artifacts.length);
+ ")
+ STARTED_AT=$(echo "$CHECKPOINT_DATA" | node -e "
+ const d=JSON.parse(require('fs').readFileSync('/dev/stdin','utf-8'));
+ console.log(d.checkpoint.started_at || 'unknown');
+ ")
+ UPDATED_AT=$(echo "$CHECKPOINT_DATA" | node -e "
+ const d=JSON.parse(require('fs').readFileSync('/dev/stdin','utf-8'));
+ console.log(d.checkpoint.updated_at || 'unknown');
+ ")
+ ```
+
+ Display checkpoint state and prompt user:
+ ```
+ AskUserQuestion(
+ header: "Checkpoint Detected for #${ISSUE_NUMBER}",
+ question: "A prior pipeline run left progress at step '${CHECKPOINT_STEP}'.
+
+ | | |
+ |---|---|
+ | **Last step** | ${CHECKPOINT_STEP} |
+ | **Completed steps** | ${COMPLETED_STEPS} |
+ | **Artifacts** | ${ARTIFACTS_COUNT} file(s) |
+ | **Resume action** | ${RESUME_ACTION} → stage: ${RESUME_STAGE} |
+ | **Started** | ${STARTED_AT} |
+ | **Last updated** | ${UPDATED_AT} |
+
+ How would you like to proceed?",
+ options: [
+ { label: "Resume", description: "Resume from checkpoint — skip completed steps (${COMPLETED_STEPS}), jump to ${RESUME_STAGE}" },
+ { label: "Fresh", description: "Discard checkpoint and re-run pipeline from scratch" },
+ { label: "Skip", description: "Skip this issue entirely" }
+ ]
+ )
+ ```
+
+ Handle user choice:
+
+ | Choice | Action |
+ |--------|--------|
+ | **Resume** | Load checkpoint context. Set `pipeline_stage` in state to `${RESUME_STAGE}`. Log: "MGW: Resuming #${ISSUE_NUMBER} from checkpoint (step: ${CHECKPOINT_STEP}, action: ${RESUME_ACTION})." Skip triage/worktree stages that already completed and jump directly to the resume stage in the pipeline. The `resume.context` object carries step-specific data (e.g., `quick_dir`, `plan_num`, `phase_number`) needed by the target stage. |
+ | **Fresh** | Clear checkpoint via `clearCheckpoint()`. Reset `pipeline_stage` to `"triaged"`. Log: "MGW: Checkpoint cleared for #${ISSUE_NUMBER}. Starting fresh." Continue with normal pipeline flow. |
+ | **Skip** | Log: "MGW: Skipping #${ISSUE_NUMBER} per user request." STOP pipeline. |
+
+ ```bash
+ case "$USER_CHOICE" in
+ Resume)
+ # Load resume context and jump to the appropriate stage
+ node -e "
+ const fs = require('fs'), path = require('path');
+ const activeDir = path.join(process.cwd(), '.mgw', 'active');
+ const files = fs.readdirSync(activeDir);
+ const file = files.find(f => f.startsWith('${ISSUE_NUMBER}-') && f.endsWith('.json'));
+ const filePath = path.join(activeDir, file);
+ const state = JSON.parse(fs.readFileSync(filePath, 'utf-8'));
+ // The pipeline_stage already reflects prior progress — do not overwrite
+ // unless the resume target is more advanced than current stage
+ console.log('Resuming from checkpoint: ' + JSON.stringify(state.checkpoint.resume));
+ " 2>/dev/null || true
+ # Set RESUME_MODE=true — downstream stages check this flag to skip completed work
+ RESUME_MODE=true
+ RESUME_CONTEXT="${CHECKPOINT_DATA}"
+ ;;
+ Fresh)
+ node -e "
+ const { clearCheckpoint } = require('./lib/state.cjs');
+ clearCheckpoint(${ISSUE_NUMBER});
+ console.log('Checkpoint cleared for #${ISSUE_NUMBER}');
+ " 2>/dev/null || true
+ # Reset pipeline_stage to triaged for fresh start
+ node -e "
+ const fs = require('fs'), path = require('path');
+ const activeDir = path.join(process.cwd(), '.mgw', 'active');
+ const files = fs.readdirSync(activeDir);
+ const file = files.find(f => f.startsWith('${ISSUE_NUMBER}-') && f.endsWith('.json'));
+ const filePath = path.join(activeDir, file);
+ const state = JSON.parse(fs.readFileSync(filePath, 'utf-8'));
+ state.pipeline_stage = 'triaged';
+ fs.writeFileSync(filePath, JSON.stringify(state, null, 2));
+ " 2>/dev/null || true
+ RESUME_MODE=false
+ ;;
+ Skip)
+ echo "MGW: Skipping #${ISSUE_NUMBER} per user request."
+ exit 0
+ ;;
+ esac
+ ```
+
+ If no checkpoint found (or checkpoint is at triage step only), continue with
+ normal pipeline stage routing below.
+
+ **Initialize checkpoint** when pipeline first transitions past triage:
+ ```bash
+ # Checkpoint initialization — called once when pipeline execution begins.
+ # Sets pipeline_step to "triage" with route selection progress.
+ # Subsequent stages update the checkpoint via updateCheckpoint().
+ # All checkpoint writes are atomic (write to .tmp then rename).
+ node -e "
+ const { updateCheckpoint } = require('./lib/state.cjs');
+ updateCheckpoint(${ISSUE_NUMBER}, {
+ pipeline_step: 'triage',
+ step_progress: {
+ comment_check_done: true,
+ route_selected: '${GSD_ROUTE}'
+ },
+ resume: {
+ action: 'begin-execution',
+ context: { gsd_route: '${GSD_ROUTE}', branch: '${BRANCH_NAME}' }
+ }
+ });
+ " 2>/dev/null || true
+ ```
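
As an editor's illustration (not part of the package diff): the resumability rule the hunk above relies on — a checkpoint counts as resumable only when its `pipeline_step` is beyond `"triage"` — can be sketched in a few lines. `isResumable` is a hypothetical name; the package's actual `detectCheckpoint()` in `lib/state.cjs` also loads the state file.

```javascript
// Minimal sketch of the resumability rule; hypothetical helper, not lib/state.cjs.
const CHECKPOINT_STEP_ORDER = ['triage', 'plan', 'execute', 'verify', 'pr'];

// Resumable only when the checkpoint step is beyond "triage" (index > 0).
function isResumable(checkpoint) {
  if (!checkpoint) return false;
  return CHECKPOINT_STEP_ORDER.indexOf(checkpoint.pipeline_step) > 0;
}

console.log(isResumable(null));                        // false
console.log(isResumable({ pipeline_step: 'triage' })); // false
console.log(isResumable({ pipeline_step: 'execute' })); // true
```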
  Check pipeline_stage:
  - "triaged" → proceed to GSD execution
  - "planning" / "executing" → resume from where we left off
@@ -359,7 +514,37 @@ NEW_COMMENTS=$(gh issue view $ISSUE_NUMBER --json comments \
  --jq "[.comments[-${NEW_COUNT}:]] | .[] | {author: .author.login, body: .body, createdAt: .createdAt}" 2>/dev/null)
  ```
 
- 2. **Spawn classification agent:**
+ 2. **Spawn classification agent (with diagnostic capture):**
+
+ <!-- mgw:criticality=advisory spawn_point=comment-classifier -->
+ <!-- Advisory: comment classification failure does not block the pipeline.
+ If this agent fails, log a warning and treat all new comments as
+ informational (safe default — pipeline continues with stale data).
+
+ Graceful degradation pattern:
+ ```
+ CLASSIFICATION_RESULT=$(wrapAdvisoryAgent(Task(...), 'comment-classifier', {
+ issueNumber: ISSUE_NUMBER,
+ fallback: '{"classification":"informational","reasoning":"comment classifier unavailable","new_requirements":[],"blocking_reason":""}'
+ }))
+ ```
+ -->
+
+ **Pre-spawn diagnostic hook:**
+ ```bash
+ CLASSIFIER_PROMPT="<full classifier prompt assembled above>"
+ DIAG_CLASSIFIER=$(node -e "
+ const dh = require('${REPO_ROOT}/lib/diagnostic-hooks.cjs');
+ const id = dh.beforeAgentSpawn({
+ agentType: 'general-purpose',
+ issueNumber: ${ISSUE_NUMBER},
+ prompt: process.argv[1],
+ repoRoot: '${REPO_ROOT}'
+ });
+ process.stdout.write(id);
+ " "$CLASSIFIER_PROMPT" 2>/dev/null || echo "")
+ ```
+
  ```
  Task(
  prompt="
@@ -414,6 +599,18 @@ Return ONLY valid JSON:
  )
  ```
 
+ **Post-spawn diagnostic hook:**
+ ```bash
+ # Derive the exit reason in shell first — a ${VAR ? ... : ...} ternary is not valid bash
+ EXIT_REASON=$([ -n "$CLASSIFICATION_RESULT" ] && echo "success" || echo "error")
+ node -e "
+ const dh = require('${REPO_ROOT}/lib/diagnostic-hooks.cjs');
+ dh.afterAgentSpawn({
+ diagId: '${DIAG_CLASSIFIER}',
+ exitReason: '${EXIT_REASON}',
+ repoRoot: '${REPO_ROOT}'
+ });
+ " 2>/dev/null || true
+ ```
+
  3. **React based on classification:**
 
  | Classification | Action |
@@ -355,6 +355,77 @@ The agent is read-only (general-purpose, no code execution). It reads project st
  and codebase to classify, then MGW presents the result and offers follow-up actions
  (file new issue, post comment on related issue, etc.).
 
+ ## Diagnostic Capture Hooks
+
+ Every Task() spawn in the mgw:run pipeline SHOULD be instrumented with diagnostic
+ capture hooks using `lib/diagnostic-hooks.cjs`. This provides per-agent telemetry
+ (timing, prompt hash, exit reason, failure classification) without blocking pipeline
+ execution.
+
+ **Required modules:**
+ - `lib/diagnostic-hooks.cjs` — Before/after hooks for Task() spawns
+ - `lib/agent-diagnostics.cjs` — Underlying diagnostic logger (writes to `.mgw/diagnostics/`)
+
+ **Pattern — wrap every Task() spawn:**
+
+ ```bash
+ # 1. Before spawning: record start time and hash prompt
+ DIAG_ID=$(node -e "
+ const dh = require('${REPO_ROOT}/lib/diagnostic-hooks.cjs');
+ const id = dh.beforeAgentSpawn({
+ agentType: '${AGENT_TYPE}', // gsd-planner, gsd-executor, etc.
+ issueNumber: ${ISSUE_NUMBER},
+ prompt: '${PROMPT_SUMMARY}', // short description, not full prompt
+ repoRoot: '${REPO_ROOT}'
+ });
+ process.stdout.write(id);
+ " 2>/dev/null || echo "")
+
+ # 2. Spawn Task() agent (unchanged)
+ Task(
+ prompt="...",
+ subagent_type="${AGENT_TYPE}",
+ description="..."
+ )
+
+ # 3. After agent completes: record exit reason and write diagnostic entry
+ EXIT_REASON=$( <check artifact exists> && echo "success" || echo "error" )
+ node -e "
+ const dh = require('${REPO_ROOT}/lib/diagnostic-hooks.cjs');
+ dh.afterAgentSpawn({
+ diagId: '${DIAG_ID}',
+ exitReason: '${EXIT_REASON}',
+ repoRoot: '${REPO_ROOT}'
+ });
+ " 2>/dev/null || true
+ ```
+
+ **Key design principles:**
+ - **Non-blocking:** All hook calls are wrapped in `2>/dev/null || true` (bash) or
+ try/catch (JS). If diagnostic capture fails, pipeline continues normally.
+ - **No prompt storage:** Only a hash of the prompt is stored, not the full text.
+ Use `shortHash()` from `agent-diagnostics.cjs` for longer prompts.
+ - **Exit reason detection:** Use artifact existence checks (e.g., `PLAN.md` exists
+ means planner succeeded) rather than relying on Task() return values.
+ - **Graceful degradation:** If `agent-diagnostics.cjs` is not available (dependency
+ PR not merged), `diagnostic-hooks.cjs` logs a warning and returns empty handles.
+
+ **Diagnostic entries are written to:** `.mgw/diagnostics/<issueNumber>-<timestamp>.json`
+
+ **Instrumented agent spawns in mgw:run:**
+
+ | Agent | File | Step |
+ |-------|------|------|
+ | Comment classifier | `run/triage.md` | preflight_comment_check |
+ | Planner (quick) | `run/execute.md` | execute_gsd_quick step 3 |
+ | Plan-checker (quick) | `run/execute.md` | execute_gsd_quick step 6 |
+ | Executor (quick) | `run/execute.md` | execute_gsd_quick step 7 |
+ | Verifier (quick) | `run/execute.md` | execute_gsd_quick step 9 |
+ | Planner (milestone) | `run/execute.md` | execute_gsd_milestone step b |
+ | Executor (milestone) | `run/execute.md` | execute_gsd_milestone step d |
+ | Verifier (milestone) | `run/execute.md` | execute_gsd_milestone step e |
+ | PR creator | `run/pr-create.md` | create_pr |
+
  ## Anti-Patterns
 
  - **NEVER** use Skill invocation from within a Task() agent — Skills don't resolve inside subagents
@@ -227,13 +227,186 @@ File: `.mgw/active/<number>-<slug>.json`
  "gsd_route": null,
  "gsd_artifacts": { "type": null, "path": null },
  "pipeline_stage": "new|triaged|needs-info|needs-security-review|discussing|approved|planning|diagnosing|executing|verifying|pr-created|done|failed|blocked",
  "comments_posted": [],
  "linked_pr": null,
  "linked_issues": [],
- "linked_branches": []
+ "linked_branches": [],
+ "checkpoint": null
  }
  ```
 
+ ## Checkpoint Schema
240
+
241
+ The `checkpoint` field in `.mgw/active/<number>-<slug>.json` tracks fine-grained pipeline
242
+ execution progress. It enables resume after failures, context switches, or multi-session
243
+ execution. The field is `null` until pipeline execution begins (set during the triage-to-
244
+ executing transition).
245
+
246
+ ### Checkpoint Object Structure
247
+
248
+ ```json
249
+ {
250
+ "checkpoint": {
251
+ "schema_version": 1,
252
+ "pipeline_step": "triage|plan|execute|verify|pr",
253
+ "step_progress": {},
254
+ "last_agent_output": null,
255
+ "artifacts": [],
256
+ "resume": {
257
+ "action": null,
258
+ "context": {}
259
+ },
260
+ "started_at": "2026-03-06T12:00:00Z",
261
+ "updated_at": "2026-03-06T12:05:00Z",
262
+ "step_history": []
263
+ }
264
+ }
265
+ ```
266
+
267
+ ### Checkpoint Fields
268
+
269
+ | Field | Type | Default | Description |
270
+ |-------|------|---------|-------------|
271
+ | `schema_version` | integer | `1` | Schema version for forward-compatibility. Consumers check this before parsing. New fields can be added without bumping; bump only for breaking structural changes. |
272
+ | `pipeline_step` | string | `"triage"` | Current high-level pipeline step. Values: `"triage"`, `"plan"`, `"execute"`, `"verify"`, `"pr"`. Maps to GSD lifecycle stages but at a coarser grain than `pipeline_stage`. |
273
+ | `step_progress` | object | `{}` | Step-specific progress data. Shape varies by `pipeline_step` (see Step Progress Shapes below). Unknown keys are preserved on read -- consumers must not strip unrecognized fields. |
274
+ | `last_agent_output` | string\|null | `null` | File path (relative to repo root) of the last successful agent output. Updated after each agent spawn completes. Used for resume context injection. |
275
+ | `artifacts` | array | `[]` | Accumulated artifact paths produced during this pipeline run. Each entry is `{ "path": "relative/path", "type": "plan\|summary\|verification\|commit", "created_at": "ISO" }`. Append-only -- never remove entries. |
276
+ | `resume` | object | `{ "action": null, "context": {} }` | Instructions for resuming execution. `action` is a string describing what to do next (e.g., `"spawn-executor"`, `"retry-verifier"`, `"create-pr"`). `context` carries step-specific data needed for resume (e.g., `{ "phase_number": 3, "plan_path": ".planning/..." }`). |
277
+ | `started_at` | string | ISO timestamp | When checkpoint tracking began for this pipeline run. |
278
+ | `updated_at` | string | ISO timestamp | When the checkpoint was last modified. Updated on every checkpoint write. |
279
+ | `step_history` | array | `[]` | Ordered log of completed steps. Each entry: `{ "step": "plan", "completed_at": "ISO", "agent_type": "gsd-planner", "output_path": "..." }`. Append-only. |
280
+
281
+ ### Step Progress Shapes
282
+
283
+ The `step_progress` object has a different shape depending on the current `pipeline_step`.
284
+ These are the documented shapes; future pipeline steps can define their own without breaking
285
+ existing consumers (unknown keys are preserved).
286
+
287
+ **When `pipeline_step` is `"triage"`:**
288
+ ```json
289
+ {
290
+ "comment_check_done": false,
291
+ "route_selected": null
292
+ }
293
+ ```
294
+
295
+ **When `pipeline_step` is `"plan"`:**
296
+ ```json
297
+ {
298
+ "plan_path": null,
299
+ "plan_checked": false,
300
+ "revision_count": 0
301
+ }
302
+ ```
303
+
304
+ **When `pipeline_step` is `"execute"`:**
305
+ ```json
306
+ {
307
+ "gsd_phase": null,
308
+ "total_phases": null,
309
+ "current_task": null,
310
+ "tasks_completed": 0,
311
+ "tasks_total": null,
312
+ "commits": []
313
+ }
314
+ ```
315
+
316
+ **When `pipeline_step` is `"verify"`:**
317
+ ```json
318
+ {
319
+ "verification_path": null,
320
+ "must_haves_checked": false,
321
+ "artifact_check_done": false,
322
+ "keylink_check_done": false
323
+ }
324
+ ```
325
+
326
+ **When `pipeline_step` is `"pr"`:**
327
+ ```json
328
+ {
329
+ "branch_pushed": false,
330
+ "pr_number": null,
331
+ "pr_url": null
332
+ }
333
+ ```
334
+
335
+ ### Forward Compatibility Contract
336
+
337
+ 1. **New fields can be added** to the checkpoint object at any level without incrementing
338
+ `schema_version`. Consumers must tolerate unknown fields (preserve on read-modify-write,
339
+ ignore on read-only access).
340
+
341
+ 2. **New `pipeline_step` values** can be introduced freely. Existing step_progress shapes
342
+ are not affected. The `step_progress` for an unrecognized step should be treated as an
343
+ opaque object (pass through unchanged).
344
+
345
+ 3. **`schema_version` bump** is required only when an existing field changes its type,
346
+ semantics, or is removed. When bumped, `migrateProjectState()` in `lib/state.cjs` must
347
+ handle the migration.
348
+
349
+ 4. **`artifacts` and `step_history` are append-only**. Consumers should never modify or
350
+ remove entries from these arrays. They may be compacted during archival (when pipeline
351
+ reaches `done` stage and state moves to `.mgw/completed/`).
352
+
353
+ 5. **`resume.context` is opaque** to all consumers except the specific resume handler for
354
+ the given `resume.action`. This allows step-specific resume data to evolve independently.
355
+
356
+ ### Checkpoint Lifecycle
357
+
358
+ ```
359
+ triage (checkpoint initialized, pipeline_step="triage")
360
+ |
361
+ v
362
+ plan (pipeline_step="plan", step_progress tracks planning state)
363
+ |
364
+ v
365
+ execute (pipeline_step="execute", step_progress tracks GSD phase/task progress)
366
+ |
367
+ v
368
+ verify (pipeline_step="verify", step_progress tracks verification checks)
369
+ |
370
+ v
371
+ pr (pipeline_step="pr", step_progress tracks PR creation)
372
+ |
373
+ v
374
+ done (checkpoint frozen — archived to .mgw/completed/)
375
+ ```
376
+
377
+ ### Checkpoint Update Pattern
378
+
379
+ ```bash
380
+ # Update checkpoint at key pipeline stages using updateCheckpoint()
381
+ node -e "
382
+ const { updateCheckpoint } = require('./lib/state.cjs');
383
+ updateCheckpoint(${ISSUE_NUMBER}, {
384
+ pipeline_step: 'execute',
385
+ step_progress: {
386
+ gsd_phase: ${PHASE_NUMBER},
387
+ tasks_completed: ${COMPLETED},
388
+ tasks_total: ${TOTAL}
389
+ },
390
+ last_agent_output: '${OUTPUT_PATH}',
391
+ resume: {
392
+ action: 'continue-execution',
393
+ context: { phase_number: ${PHASE_NUMBER} }
394
+ }
395
+ });
396
+ "
397
+ ```
398
+
399
+ ### Consumers
400
+
401
+ | Consumer | Access Pattern |
402
+ |----------|---------------|
403
+ | run/triage.md | Initialize checkpoint at triage (`pipeline_step: "triage"`) |
404
+ | run/execute.md | Update checkpoint after each agent spawn (`pipeline_step: "plan"\|"execute"\|"verify"`) |
405
+ | run/pr-create.md | Update checkpoint at PR creation (`pipeline_step: "pr"`) |
406
+ | milestone.md | Read checkpoint to determine resume point for failed issues |
407
+ | status.md | Read checkpoint for detailed progress display |
408
+ | sync.md | Compare checkpoint state against GitHub for drift detection |
409
+
237
410
  ## Stage Flow Diagram
238
411
 
239
412
  ```
@@ -264,6 +437,84 @@ blocked --> triaged (re-triage after blocker resolved)
  Any stage --> failed (unrecoverable error)
  ```
 
+ ## Pipeline Checkpoints
+
+ Fine-grained pipeline progress tracking within `.mgw/active/<number>-<slug>.json`.
+ The `checkpoint` field starts as `null` and is initialized when the pipeline first
+ transitions past triage. Each subsequent stage writes an atomic checkpoint update.
+
+ ### Checkpoint Schema
+
+ ```json
+ {
+ "checkpoint": {
+ "schema_version": 1,
+ "pipeline_step": "triage|plan|execute|verify|pr",
+ "step_progress": {},
+ "last_agent_output": null,
+ "artifacts": [],
+ "resume": { "action": null, "context": {} },
+ "started_at": "ISO timestamp",
+ "updated_at": "ISO timestamp",
+ "step_history": []
+ }
+ }
+ ```
+
+ | Field | Type | Merge Strategy | Description |
+ |-------|------|---------------|-------------|
+ | `schema_version` | number | — | Checkpoint format version (currently 1) |
+ | `pipeline_step` | string | overwrite | Current pipeline step: `triage`, `plan`, `execute`, `verify`, `pr` |
+ | `step_progress` | object | shallow merge | Step-specific progress (e.g., `{ plan_path: "...", plan_checked: false }`) |
+ | `last_agent_output` | string\|null | overwrite | Path or URL of the last agent's output |
+ | `artifacts` | array | append-only | `[{ path, type, created_at }]` — never removed, only appended |
+ | `resume` | object | full replace | `{ action, context }` — what to do if pipeline restarts |
+ | `started_at` | string | — | ISO timestamp when checkpoint was first created |
+ | `updated_at` | string | auto | ISO timestamp of last update (set automatically) |
+ | `step_history` | array | append-only | `[{ step, completed_at, agent_type, output_path }]` — audit trail |
+
+ ### Atomic Writes
+
+ All checkpoint writes use `atomicWriteJson()` from `lib/state.cjs`:
+
+ ```bash
+ # atomicWriteJson(filePath, data) — write to .tmp then rename.
+ # POSIX rename is atomic on the same filesystem, so a crash mid-write
+ # never leaves a corrupt state file.
+ ```
+
+ The `updateCheckpoint()` function uses `atomicWriteJson()` internally. Commands
+ should always use `updateCheckpoint()` rather than writing checkpoints directly:
+
+ ```bash
+ node -e "
+ const { updateCheckpoint } = require('./lib/state.cjs');
+ updateCheckpoint(${ISSUE_NUMBER}, {
+ pipeline_step: 'plan',
+ step_progress: { plan_path: '...', plan_checked: false },
+ artifacts: [{ path: '...', type: 'plan', created_at: new Date().toISOString() }],
+ step_history: [{ step: 'plan', completed_at: new Date().toISOString(), agent_type: 'gsd-planner', output_path: '...' }],
+ resume: { action: 'spawn-executor', context: { quick_dir: '...' } }
+ });
+ " 2>/dev/null || true
+ ```
+
+ ### Checkpoint Lifecycle
+
+ | Pipeline Step | Checkpoint `pipeline_step` | Resume Action |
+ |--------------|---------------------------|---------------|
+ | Triage complete | `triage` | `begin-execution` |
+ | Planner complete | `plan` | `run-plan-checker` or `spawn-executor` |
+ | Executor complete | `execute` | `spawn-verifier` or `create-pr` |
+ | Verifier complete | `verify` | `create-pr` |
+ | PR created | `pr` | `cleanup` |
+
+ ### Migration
+
+ `migrateProjectState()` adds the `checkpoint: null` field to any issue state
+ files that predate checkpoint support. The field is initialized lazily — it
+ stays `null` until the pipeline actually runs.
+
  ## Slug Generation
 
  Use gsd-tools for consistent slug generation:
@@ -396,6 +647,91 @@ GSD phase directory (`.planning/phases/{NN}-{slug}/`) to operate in.
  Issues created outside of `/mgw:project` (e.g., manually filed bugs) will not have
  a `phase_number`. In this case, `/mgw:run` falls back to the quick pipeline.
 
+ ## Checkpoint Resume Detection
+
+ When `mgw:run` starts for an issue, the validate_and_load step checks whether a prior
+ pipeline run left a checkpoint with progress beyond the initial triage step. This enables
+ resuming interrupted sessions without re-doing completed work.
+
+ ### Resume Detection Functions (lib/state.cjs)
+
+ | Function | Signature | Returns | Description |
+ |----------|-----------|---------|-------------|
+ | `detectCheckpoint` | `(issueNumber)` | `object\|null` | Checks if active state file has a non-null checkpoint with `pipeline_step` beyond `"triage"`. Returns the checkpoint data if resumable, `null` otherwise. |
+ | `resumeFromCheckpoint` | `(issueNumber)` | `object\|null` | Returns checkpoint data plus computed `resumeStage`, `resumeAction`, and `completedSteps`. Maps `resume.action` to the pipeline stage to jump to. |
+ | `clearCheckpoint` | `(issueNumber)` | `{ cleared: boolean }` | Resets the checkpoint field to `null` in the active state file. Used for "Fresh start" option. |
+
+ ### Resume Action to Stage Mapping
+
+ The `resume.action` field in the checkpoint tells `resumeFromCheckpoint()` which pipeline
+ stage to jump to:
+
+ | resume.action | resumeStage | Meaning |
+ |---------------|-------------|---------|
+ | `run-plan-checker` | `planning` | Plan exists, needs quality check |
+ | `spawn-executor` | `executing` | Plan complete, execute next |
+ | `continue-execution` | `executing` | Mid-execution resume |
+ | `spawn-verifier` | `verifying` | Execution done, verify next |
+ | `create-pr` | `pr-pending` | Verification done, create PR |
+ | `begin-execution` | `planning` | Triage done, begin planning |
+ | `null` / unknown | `planning` | Safe default |
+
+ ### Resume Detection Flow
680
+
681
+ ```
682
+ mgw:run #N starts
683
+ |
684
+ v
685
+ Load state file → migrateProjectState()
686
+ |
687
+ v
688
+ detectCheckpoint(N)
689
+ |
690
+ +---> null (no checkpoint or triage-only) → proceed with normal stage routing
691
+ |
692
+ +---> checkpoint found → display state to user
693
+ |
694
+ v
695
+ AskUserQuestion: Resume / Fresh / Skip
696
+ |
697
+ +---> Resume: load checkpoint context, set RESUME_MODE=true,
698
+ | jump to resume.action stage (skip completed steps)
699
+ |
700
+ +---> Fresh: clearCheckpoint(N), reset pipeline_stage to "triaged",
701
+ | continue normal pipeline
702
+ |
703
+ +---> Skip: exit pipeline for this issue
704
+ ```
705
+
706
+ ### Pipeline Step Order
707
+
708
+ The `CHECKPOINT_STEP_ORDER` constant defines the ordered progression of checkpoint steps:
709
+
710
+ ```
711
+ triage → plan → execute → verify → pr
712
+ ```
713
+
714
+ Only checkpoints with `pipeline_step` at index > 0 (beyond `"triage"`) are considered
715
+ resumable. A checkpoint at `"triage"` means nothing meaningful has been completed yet.
716
+
717
+ ### Resume Context
718
+
719
+ When resuming, the `resume.context` object carries step-specific data needed by the
720
+ target stage. The context shape varies by `resume.action`:
721
+
722
+ | resume.action | Context fields |
723
+ |---------------|----------------|
724
+ | `spawn-executor` | `{ quick_dir, plan_num }` |
725
+ | `run-plan-checker` | `{ quick_dir, plan_num }` |
726
+ | `spawn-verifier` | `{ quick_dir, plan_num }` |
727
+ | `create-pr` | `{ quick_dir, plan_num }` |
728
+ | `continue-execution` | `{ phase_number }` |
729
+ | `begin-execution` | `{ gsd_route, branch }` |
730
+
731
+ Downstream pipeline stages read `resume.context` to pick up where the prior run left
732
+ off. For example, the executor stage uses `quick_dir` and `plan_num` to locate the
733
+ existing plan files rather than re-creating them.
734
+
399
735
  ## Consumers
400
736
 
401
737
  | Pattern | Referenced By |
@@ -410,3 +746,6 @@ a `phase_number`. In this case, `/mgw:run` falls back to the quick pipeline.
  | Project state | milestone.md, next.md, ask.md |
  | Gate result schema | issue.md (populate), run.md (validate) |
  | Board status sync | board-sync.md (utility), issue.md (triage transitions), run.md (pipeline transitions) |
+ | Checkpoint resume | run.md (detect + prompt), milestone.md (detect resume point for failed issues) |
+ | Checkpoint writes | triage.md (init), execute.md (plan/execute/verify), pr-create.md (pr) |
+ | Atomic writes | lib/state.cjs (`atomicWriteJson`, `updateCheckpoint`) |
package/dist/bin/mgw.cjs CHANGED
@@ -1,7 +1,7 @@
  #!/usr/bin/env node
  'use strict';
 
- var index = require('../index-B-_JvYpz.cjs');
+ var index = require('../index-CHrVAIMY.cjs');
  var require$$0 = require('commander');
  var require$$1 = require('path');
  var require$$2 = require('fs');
@@ -11,7 +11,7 @@ require('events');
 
  var mgw$1 = {};
 
- var version = "0.4.0";
+ var version = "0.6.0";
  var require$$11 = {
  version: version};