@crewpilot/agent 2.0.0 → 3.0.0

Files changed (27)
  1. package/README.md +131 -131
  2. package/dist-npm/cli.js +5 -5
  3. package/dist-npm/index.js +100 -100
  4. package/package.json +69 -69
  5. package/prompts/agent.md +282 -282
  6. package/prompts/copilot-instructions.md +36 -36
  7. package/prompts/{catalyst.config.json → crewpilot.config.json} +72 -72
  8. package/prompts/skills/assure-code-quality/SKILL.md +112 -112
  9. package/prompts/skills/assure-pr-intelligence/SKILL.md +148 -148
  10. package/prompts/skills/assure-review-functional/SKILL.md +114 -114
  11. package/prompts/skills/assure-review-standards/SKILL.md +106 -106
  12. package/prompts/skills/assure-threat-model/SKILL.md +182 -182
  13. package/prompts/skills/assure-vulnerability-scan/SKILL.md +146 -146
  14. package/prompts/skills/autopilot-meeting/SKILL.md +434 -434
  15. package/prompts/skills/autopilot-worker/SKILL.md +737 -737
  16. package/prompts/skills/daily-digest/SKILL.md +188 -188
  17. package/prompts/skills/deliver-change-management/SKILL.md +132 -132
  18. package/prompts/skills/deliver-deploy-guard/SKILL.md +144 -144
  19. package/prompts/skills/deliver-doc-governance/SKILL.md +130 -130
  20. package/prompts/skills/engineer-feature-builder/SKILL.md +270 -270
  21. package/prompts/skills/engineer-root-cause-analysis/SKILL.md +150 -150
  22. package/prompts/skills/engineer-test-first/SKILL.md +148 -148
  23. package/prompts/skills/insights-knowledge-base/SKILL.md +202 -202
  24. package/prompts/skills/insights-pattern-detection/SKILL.md +142 -142
  25. package/prompts/skills/strategize-architecture-planner/SKILL.md +141 -141
  26. package/prompts/skills/strategize-solution-design/SKILL.md +118 -118
  27. package/scripts/postinstall.js +108 -108
package/prompts/skills/autopilot-worker/SKILL.md

@@ -1,737 +1,737 @@
# Autopilot Worker

> **Pillar**: Orchestrate | **ID**: `autopilot-worker`

## Purpose

Single-command pipeline that creates a board issue, plans the implementation, writes code + tests, applies the full Deliver pipeline (change-management → doc-governance → deploy-guard), opens a reviewed PR, and updates the board. The central human gate is plan approval. The pipeline also pauses for explicit confirmation before creating the board issue, at each label-gated design/architecture/threat-model decision, and before opening the PR. Everything else chains automatically through 12 skills. Includes label-gated design/architecture phases, bug-triggered root-cause analysis, label-gated threat modeling, and a continuous self-improvement loop via pattern detection + knowledge base.

## Activation Triggers

- autopilot, auto, pick up, work on, do this, implement and ship, end to end, full pipeline
- Routed from `feature-builder` Phase 0 when complexity is moderate or complex
- User provides a board issue number ("#42", "issue 42")

## Session Role Exception

This pipeline chains 12 skills across role boundaries (e.g., code-quality and vulnerability-scan in Phase 6 are Review skills, but they run inside the Builder pipeline). **All skills invoked internally by this pipeline are unrestricted by the session role.** Role scoping applies only to user-initiated requests, not to pipeline steps.

## Tools Required

- `catalyst_board_connect` — connect to board provider
- `catalyst_board_create` — create issue on board
- `catalyst_board_get` — read an existing issue (title, description, labels, acceptance criteria)
- `catalyst_board_move` — update issue status
- `catalyst_board_comment` — log progress on the issue
- `catalyst_worker_start` — start orchestrator workflow
- `catalyst_worker_plan` — set execution plan
- `catalyst_worker_approve` — human approval gate
- `catalyst_worker_branch` — create feature branch
- `catalyst_worker_pr` — push + open PR
- `catalyst_worker_review_done` — record review verdict
- `catalyst_worker_complete` — mark workflow done
- `catalyst_worker_fail` — circuit breaker on failure
- `catalyst_git_stage` — stage files
- `catalyst_git_commit` — commit changes
- `catalyst_exec` — run commands (tests, lint, build)
- `catalyst_knowledge_store` — store decisions made during implementation
- `catalyst_git_diff` — analyze changes for change-management
- `catalyst_git_log` — commit history for release notes
- `catalyst_metrics_coverage` — coverage check for deploy-guard
- `catalyst_metrics_complexity` — complexity check for deploy-guard and pattern detection
- `catalyst_worker_preview_pr` — preview changes before PR creation
- `catalyst_worker_push_fixes` — push fixes to existing PR branch (no new PR)
- `catalyst_board_pr_comments` — fetch review comments from a PR
- `catalyst_knowledge_search` — query known patterns, anti-patterns, and past root causes
- `catalyst_artifact_write` — persist phase outputs (analysis, plans, reviews) so downstream phases can read them
- `catalyst_artifact_read` — read artifacts from prior phases (e.g. analysis → plan, plan → implementation)
- `catalyst_artifact_list` — list all artifacts for the current workflow
- `catalyst_dispatch_subagent` — delegate focused work (code review, test writing, security audit) to specialized sub-agents
- `catalyst_dispatch_consensus` — merge sub-agent findings into high-confidence vs disputed issues
- `catalyst_session_save` — save session state for long-running tasks (enables resume across conversations)
- `catalyst_session_restore` — restore a previously saved session to continue work
- `catalyst_session_list` — list all saved sessions
- `mcp_workiq_ask_work_iq` — (optional, requires Work IQ extension) fetch M365 context (emails, docs, meetings) related to the task
- `mcp_workiq_accept_eula` — (optional, requires Work IQ extension) accept the Work IQ EULA before the first query; idempotent

## Methodology

### Process Flow

```dot
digraph autopilot_worker {
  rankdir=TB;
  node [shape=box];

  intake [label="Phase 1\nIntake & Issue Creation"];
  analysis [label="Phase 2\nCodebase Analysis & Planning"];
  design [label="Phase 2.5\nDesign & Architecture\n(label-gated)", style=dashed];
  rca [label="Phase 2.5c\nRoot Cause Analysis\n(bug label-gated)", style=dashed];
  threat [label="Phase 2.5d\nThreat Model\n(security label-gated)", style=dashed];
  plan_gate [label="Phase 3\nHUMAN GATE: Plan Approval", shape=diamond, style=filled, fillcolor="#ffcccc"];
  implement [label="Phase 4\nBranch & Implementation"];
  change_mgmt [label="Phase 5\nChange Management"];
  doc_gov [label="Phase 5b\nDoc Governance"];
  pr_review [label="Phase 6\nPR Creation & Auto-Review\n(5-stage)"];
  deploy_guard [label="Phase 7\nDeploy Guard\n(6 gates)"];
  complete [label="Phase 8\nCompletion & Learning", shape=doublecircle];
  fail [label="FAIL\nCircuit Breaker", shape=octagon, style=filled, fillcolor="#ff9999"];

  intake -> analysis;
  analysis -> design [label="needs-design\nor needs-architecture"];
  analysis -> rca [label="bug/defect/\nregression"];
  analysis -> threat [label="needs-threat-model\nor security-sensitive"];
  analysis -> plan_gate [label="no special labels"];
  design -> plan_gate;
  rca -> plan_gate;
  threat -> plan_gate;
  plan_gate -> implement [label="approved"];
  plan_gate -> fail [label="cancelled"];
  implement -> change_mgmt;
  implement -> fail [label="3 failures"];
  change_mgmt -> doc_gov;
  doc_gov -> pr_review;
  pr_review -> pr_review [label="issues found\nfix & re-run"];
  pr_review -> deploy_guard;
  deploy_guard -> complete [label="GO"];
  deploy_guard -> pr_review [label="NO-GO\nfix blockers"];
  complete -> complete [label="store knowledge\nself-improvement loop"];
}
```
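
The label branching in the graph can be sketched as a small routing helper. This is an illustration only, not part of the Catalyst tool set: `gatedPhases` and `PHASE_TRIGGERS` are hypothetical names. The label strings come from this skill, and the order (design before architecture, threat model last so it can read the architecture artifact) follows the phase descriptions.

```typescript
// Hypothetical helper: decide which label-gated phases to run, in pipeline order.
type GatedPhase = "design" | "architecture" | "rca" | "threat-model";

// Trigger labels per phase, taken from the Process Flow edges.
const PHASE_TRIGGERS: Array<[GatedPhase, string[]]> = [
  ["design", ["needs-design"]],
  ["architecture", ["needs-architecture"]],
  ["rca", ["bug", "defect", "regression"]],
  ["threat-model", ["needs-threat-model", "security-sensitive"]],
];

// Returns [] when no special labels are present, meaning: go straight to the Phase 3 plan gate.
function gatedPhases(labels: string[]): GatedPhase[] {
  const present = new Set(labels.map((label) => label.trim().toLowerCase()));
  return PHASE_TRIGGERS
    .filter(([, triggers]) => triggers.some((t) => present.has(t)))
    .map(([phase]) => phase);
}
```

Because the table is ordered rather than keyed, an issue labelled both `needs-architecture` and `needs-design` still runs design first, as the "BOTH labels" rule requires.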

### Phase 1 — Intake & Issue Creation

**First interaction hint:** If this is the first interaction in the session, start with:
> 💡 *Running Catalyst Autopilot — I'll summarize the task, confirm with you before creating a board issue, plan the work, get your approval, implement, test, review, and open a PR.*

**Entry mode detection** — the worker can be entered four ways:

| Entry Mode | How to Detect | Behavior |
|---|---|---|
| **Direct** | User says "autopilot", "full pipeline", etc. | Run the full pipeline from Phase 1 |
| **Routed from feature-builder** | feature-builder's Phase 0 classified the task as moderate/complex | Skip re-analyzing complexity; it's already assessed. Use the context feature-builder gathered. |
| **Mid-build escalation** | feature-builder discovered more complexity during its Phase 4 | Accept the partial context (files already touched, patterns found). Start from Phase 2 (planning) with what's already known. |
| **Session resume** | User says "resume", "continue", "pick up where I left off" | Call `catalyst_session_restore` with the workflow ID. Read the saved state, load the associated artifacts, and resume from the first pending action. |

**Session resume flow**: When resuming, the agent should:
1. Call `catalyst_session_restore` to get the saved state
2. Call `catalyst_artifact_list` to see what artifacts exist
3. Read the relevant artifacts with `catalyst_artifact_read`
4. **(Optional) Calendar-aware context refresh**: If `mcp_workiq_ask_work_iq` is available and significant time has passed since the session was saved (overnight, weekend, or >4 hours):
   - Call `mcp_workiq_accept_eula` with `eulaUrl: "https://github.com/microsoft/work-iq-mcp"` (idempotent)
   - **Check for new context**: `mcp_workiq_ask_work_iq` → "What meetings, emails, or Teams messages about {issue title / feature} happened since {saved_at timestamp}? Summarize any new decisions, requirement changes, or blockers."
   - **Check calendar conflicts**: `mcp_workiq_ask_work_iq` → "Do I have any meetings in the next 2 hours that might affect my availability?"
   - If new decisions or requirement changes are found, flag them to the user before continuing:
     ```
     📅 Context Update (since session was saved {age} ago):
     - {new decision / requirement change / blocker}
     → Continue with current plan? (yes / re-plan)
     ```
   - If Work IQ is unavailable, skip this step; the resume proceeds without the M365 context refresh.
5. Continue from the first pending action in the saved state
6. Do NOT re-run phases that have already completed (check `artifacts_written`)
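
The two resume decisions above (whether to refresh M365 context, and where to pick up) can be sketched as pure functions. This is a sketch under assumptions: the `SavedSession` shape is hypothetical beyond the `saved_at` and `artifacts_written` names used in this skill, and "overnight or weekend" is approximated as "the local calendar date changed".

```typescript
// Hypothetical subset of a restored session; not the actual catalyst_session_restore schema.
interface SavedSession {
  saved_at: string;            // ISO-8601 timestamp written by catalyst_session_save
  artifacts_written: string[]; // phases whose artifacts already exist, e.g. ["analysis", "plan"]
}

const FOUR_HOURS_MS = 4 * 60 * 60 * 1000;

// Step 4 trigger: more than 4 hours elapsed, or the local calendar date changed
// (an approximation of the "overnight, weekend" cases).
function needsContextRefresh(session: SavedSession, now: Date): boolean {
  const savedAt = new Date(session.saved_at);
  const elapsed = now.getTime() - savedAt.getTime();
  return elapsed > FOUR_HOURS_MS || savedAt.toDateString() !== now.toDateString();
}

// Steps 5-6: the first required phase with no artifact is where work resumes.
// `required` lists only the phases this workflow needs, including any label-gated ones.
function firstPendingPhase(session: SavedSession, required: string[]): string | undefined {
  const done = new Set(session.artifacts_written);
  return required.find((phase) => !done.has(phase));
}
```

Artifacts only approximate progress (Phase 4 writes none, for example), so the saved state's own pending-action list should take precedence when it is present.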

**Complexity check (direct entry only):** If the user enters autopilot directly, quickly assess whether the request warrants the full pipeline:
- If the request is trivial (single file, obvious change) → suggest: *"This is a small change. I can implement it directly without the full pipeline. Want me to do that instead?"*
- If the user accepts (e.g., "just do it") → hand off to `feature-builder`, which will handle it as a trivial/simple-tier task.
- Otherwise → continue with the full pipeline below.

**If the user provides a task description (not an existing issue number):**

1. Parse the user's request to extract:
   - Title (concise, action-oriented)
   - Description (what needs to be built)
   - Acceptance criteria (bullet list; infer from the description if not explicit)
   - Labels (feature, bug, chore; infer from context)

<HARD-GATE>
2. **HUMAN GATE — Task Creation Confirmation**: Present the inferred task summary to the user BEFORE creating the board issue:

   ```
   📋 Before I start, here's what I'll create as a board issue:

   Title: {title}
   Description: {description}

   Acceptance Criteria:
   - [ ] {criterion 1}
   - [ ] {criterion 2}
   - [ ] {criterion 3}

   Labels: {labels}

   → Create this task and start the pipeline? (yes / edit / no)
   ```

   - If **yes** → continue to step 3 to create the issue
   - If **edit** → the user provides corrections; update and re-present
   - If **no** → stop the pipeline and ask the user what they'd like to do instead.
   - Do NOT create the board issue without explicit user confirmation.
</HARD-GATE>

3. Call `catalyst_board_create` with the title, description, and acceptance criteria
4. Note the created issue ID, then continue to Phase 2

**If the user provides an existing issue number (e.g., "#42"):**

1. Call `catalyst_board_get` to read the existing issue
2. Use its title, description, and acceptance criteria as-is
3. No confirmation is needed; the task already exists

### Phase 2 — Codebase Analysis & Planning

1. Read the project structure: scan key files (package.json, tsconfig, src/ layout, existing patterns)
2. Identify:
   - Which files need to be **created**
   - Which files need to be **modified**
   - What patterns/conventions the codebase follows (naming, directory structure, test style)
   - What dependencies might be needed
3. Check the issue labels for `needs-design`, `needs-architecture`, `bug`/`defect`/`regression`, and `needs-threat-model`/`security-sensitive`
4. **Query pattern knowledge** via `catalyst_knowledge_search` (type: `pattern`):
   - Search for known patterns and anti-patterns in the files being modified
   - Search for past root causes in the same area of the codebase
   - Collect any "repeat offender" warnings from previous runs
   - Feed this context into the plan so the worker avoids known mistakes
5. **(Optional) Fetch M365 requirements context**: First call `mcp_workiq_accept_eula` with `eulaUrl: "https://github.com/microsoft/work-iq-mcp"` (idempotent), then use **focused queries** to surface requirements context before planning:
   - **Requirements & specs**: `mcp_workiq_ask_work_iq` → "Find emails, documents, and Teams messages about: {issue title}. Summarize relevant discussions, specs, and design docs."
   - **Meeting decisions**: `mcp_workiq_ask_work_iq` → "What decisions were made about {issue title / feature name} in recent meetings? What requirements were stated?"
   - **Stakeholder expectations**: `mcp_workiq_ask_work_iq` → "What did stakeholders or customers say about {feature} in recent emails or meetings? What was promised or committed?"
   - Feed the M365 context into the analysis artifact so Phase 3's plan addresses stated requirements, not just the issue description.
   - If `mcp_workiq_ask_work_iq` is unavailable, skip; this step is optional.
6. Call `catalyst_worker_start` with the issue ID and title
7. **Write artifact**: Call `catalyst_artifact_write` with `workflow_id={issue_id}`, `phase="analysis"` containing:
   - Files to create/modify
   - Codebase patterns discovered
   - Dependencies needed
   - Label-gated phases to run
   - Known patterns/anti-patterns from knowledge search
   - M365 requirements context (if step 5 ran; Phase 6 step 8b reads it back)
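
One way to picture the `analysis` artifact is as a typed payload, plus a helper for the create-vs-modify split from step 2. The field names below are hypothetical (the real `catalyst_artifact_write` schema is not documented here), and the file-existence check is injected (for example Node's `fs.existsSync`) so the helper itself does no I/O.

```typescript
// Hypothetical payload for the Phase 2 "analysis" artifact; field names are illustrative.
interface AnalysisArtifact {
  filesToCreate: string[];
  filesToModify: string[];
  patterns: string[];         // codebase conventions discovered
  dependencies: string[];
  labelGatedPhases: string[];
  knownPatterns: string[];    // from catalyst_knowledge_search
  m365Context?: string;       // only when the optional Work IQ step ran
}

// Planned paths that already exist are modifications; the rest are new files.
function splitPlannedFiles(
  paths: string[],
  exists: (path: string) => boolean,
): Pick<AnalysisArtifact, "filesToCreate" | "filesToModify"> {
  const unique = Array.from(new Set(paths)); // drop duplicates, keep first-seen order
  return {
    filesToCreate: unique.filter((p) => !exists(p)),
    filesToModify: unique.filter((p) => exists(p)),
  };
}
```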

### Phase 2.5 — Design & Architecture (label-gated)

**Skip this phase entirely if the issue has neither a `needs-design` nor a `needs-architecture` label.**

Check the issue labels (from `catalyst_board_get`, or the labels confirmed in Phase 1 for a newly created issue). Run the applicable skills:

#### If the issue has the `needs-design` label:

**Load and follow** `.github/skills/strategize-solution-design/SKILL.md`:

1. Frame the problem: restate it in one sentence with its constraints
2. Generate 3-4 distinct approaches with strengths, risks, and effort
3. Build a trade-off matrix comparing all options
4. Present to the user:

   ```
   📐 Design Phase for: "{issue title}"

   {trade-off matrix}

   Recommendation: {option} (Confidence: {N}/10)
   Reversal cost: {Low/Medium/High}

   → Which approach? (A / B / C / edit)
   ```

5. **HUMAN GATE**: The user picks an approach
6. Store the decision via `catalyst_knowledge_store` (type: decision)
7. Write the design document to `docs/design/{issue_id}-{slug}.md`:
   ```markdown
   # Design: {issue title}

   **Issue**: #{id}
   **Date**: {date}
   **Decision**: {chosen option}

   ## Problem
   {one-sentence problem statement}

   ## Options Considered
   {options with strengths/risks/effort}

   ## Trade-off Matrix
   {matrix}

   ## Decision
   {chosen option with rationale}
   Confidence: {N}/10 | Reversal cost: {Low/Medium/High}
   ```
8. Stage the design doc; it will be committed alongside the code in Phase 5
9. **Write artifact**: Call `catalyst_artifact_write` with `workflow_id={issue_id}`, `phase="design"` containing the chosen approach, trade-off summary, and design document path
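
The `{slug}` in the design-doc path is not specified further, so here is a minimal sketch under assumed slug rules: lowercase ASCII, accents stripped, runs of other characters collapsed to one hyphen, and a 60-character cap. `slugify` and `designDocPath` are hypothetical helpers.

```typescript
// Assumed slug rules; adjust to match any existing docs/design/ naming in the repo.
function slugify(title: string, maxLength = 60): string {
  return title
    .toLowerCase()
    .normalize("NFKD")               // split accented letters into base + combining mark
    .replace(/[\u0300-\u036f]/g, "") // drop the combining marks
    .replace(/[^a-z0-9]+/g, "-")     // collapse everything else into single hyphens
    .replace(/^-+|-+$/g, "")         // trim leading/trailing hyphens
    .slice(0, maxLength)
    .replace(/-+$/, "");             // avoid a trailing hyphen after truncation
}

// Builds docs/design/{issue_id}-{slug}.md as described in step 7.
function designDocPath(issueId: number | string, title: string): string {
  return `docs/design/${issueId}-${slugify(title)}.md`;
}
```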

#### If the issue has the `needs-architecture` label:

**Load and follow** `.github/skills/strategize-architecture-planner/SKILL.md`:

1. Define scope: system boundaries, actors, quality attributes
2. Decompose into components with responsibilities and interfaces
3. Trace the primary data flow through the system
4. Create an implementation roadmap with milestones
5. Present to the user:

   ```
   📐 Architecture for: "{issue title}"

   Components:
   | Component | Responsibility | Interface | Dependencies |
   |-----------|---------------|-----------|-------------|
   | ... | ... | ... | ... |

   Data Flow:
   1. {step} → {step} → {step}

   → Approve architecture? (yes / edit)
   ```

6. **HUMAN GATE**: The user approves the architecture
7. Store the decision via `catalyst_knowledge_store` (type: decision)
8. Write the ADR to `docs/adr/{NNN}-{slug}.md`, where `{NNN}` is the next unused ADR number:
   ```markdown
   # ADR-{NNN}: {title}

   ## Status: Accepted
   ## Context
   {why this design was needed}
   ## Decision
   {what was decided — components, data flow, interfaces}
   ## Consequences
   {positive and negative trade-offs}
   ## Alternatives Considered
   {rejected options and why}
   ```
9. Stage the ADR; it will be committed alongside the code in Phase 5
10. **Write artifact**: Call `catalyst_artifact_write` with `workflow_id={issue_id}`, `phase="architecture"` containing the component decomposition, data flow, interfaces, and ADR path
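
Choosing `{NNN}` can be sketched as "highest existing number plus one, zero-padded to three digits". The assumptions are that ADR file names start with a number followed by a hyphen (e.g. `007-use-redis.md`) and that the caller passes bare file names from `docs/adr/` (for instance from `fs.readdirSync`). `nextAdrNumber` is a hypothetical helper.

```typescript
// Assumes ADR files are named like "007-use-redis.md"; other files are ignored.
function nextAdrNumber(fileNames: string[]): string {
  const numbers = fileNames
    .map((name) => /^(\d+)-/.exec(name))
    .filter((m): m is RegExpExecArray => m !== null)
    .map((m) => Number(m[1]));
  const next = numbers.length === 0 ? 1 : Math.max(...numbers) + 1;
  return String(next).padStart(3, "0"); // pads to 3 digits; never truncates larger numbers
}
```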

#### If the issue has BOTH labels:

Run `needs-design` first (pick the approach), then `needs-architecture` (detail the design).
The design decision feeds into the architecture. For example, "we chose Redis" → the architecture shows a CacheService component, middleware chain, and config interface.

### Phase 2.5c — Root Cause Analysis (label-gated)

**Skip if the issue does NOT have a `bug`, `defect`, or `regression` label.**

**Load and follow** `.github/skills/engineer-root-cause-analysis/SKILL.md` methodology:

1. **Symptom collection**:
   - Extract the error message, stack trace, and steps to reproduce from the issue description
   - Run `catalyst_git_log` on the affected files to check recent changes
   - Query `catalyst_knowledge_search` for previous root causes in the same area
2. **Hypothesis generation**: generate 2-3 ranked hypotheses:

   ```
   🔍 RCA for: "{issue title}"

   | # | Hypothesis | Likelihood | Evidence | Test Strategy |
   |---|---|---|---|---|
   | H1 | {most likely} | High | {evidence} | {how to test} |
   | H2 | {alternative} | Medium | {evidence} | {how to test} |
   | H3 | {edge case} | Low | {evidence} | {how to test} |
   ```

3. **Systematic elimination**: for each hypothesis (highest likelihood first):
   - Run `catalyst_exec` to test it (add logging, reproduce, check state)
   - Record the result: confirmed / eliminated / narrowed
   - Max 5 attempts in total across all hypotheses (a circuit breaker, like the 3-attempts-per-step limit in Phase 4)
4. **Root cause identification**:
   - State it in one sentence
   - Causal chain: trigger → intermediate effects → symptom
   - Design gap: WHY the code was vulnerable
5. **Feed into the Phase 3 plan**:
   - The plan must fix the root cause, not just the symptom
   - Include a regression test that fails without the fix
   - Phase 5 commit footer: `Root-cause: {one-sentence description}`
6. **Store the root cause** via `catalyst_knowledge_store` (type: `root-cause`):
   - What: the root cause description
   - Where: affected files/modules
   - Why: the design gap
   - Prevention: what would have caught this earlier
7. **Write artifact**: Call `catalyst_artifact_write` with `workflow_id={issue_id}`, `phase="rca"` containing the root cause, causal chain, design gap, prevention strategy, and affected files
8. **If the root cause reveals a systemic issue**, flag it for pattern detection in Phase 6:
   - Add the note `systemic:{description}` for Phase 6 to pick up
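
The elimination loop in step 3 can be sketched as follows. Hypotheses are tested highest likelihood first, and the loop stops on the first confirmation or when the 5-attempt budget is spent. `eliminate` is a hypothetical helper, and `runTest` stands in for whatever `catalyst_exec` probing the agent performs. Treating "narrowed" as "probe the same hypothesis again" is an interpretation, not something this skill spells out.

```typescript
type Likelihood = "High" | "Medium" | "Low";
type Outcome = "confirmed" | "eliminated" | "narrowed";

interface Hypothesis { id: string; likelihood: Likelihood; }

const RANK: Record<Likelihood, number> = { High: 0, Medium: 1, Low: 2 };
const MAX_RCA_ATTEMPTS = 5; // circuit breaker from step 3

function eliminate(
  hypotheses: Hypothesis[],
  runTest: (h: Hypothesis) => Outcome,
): { rootCause?: Hypothesis; attempts: number; tripped: boolean } {
  // Highest likelihood first; copy so the caller's array is not reordered.
  const queue = hypotheses.slice().sort((a, b) => RANK[a.likelihood] - RANK[b.likelihood]);
  let attempts = 0;
  while (queue.length > 0 && attempts < MAX_RCA_ATTEMPTS) {
    const current = queue[0];
    attempts++;
    const outcome = runTest(current);
    if (outcome === "confirmed") return { rootCause: current, attempts, tripped: false };
    if (outcome === "eliminated") queue.shift();
    // "narrowed": keep the same hypothesis and probe it again with a sharper test
  }
  // tripped = budget exhausted while hypotheses remain; an empty queue means "generate new ones"
  return { attempts, tripped: queue.length > 0 };
}
```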

### Phase 2.5d — Threat Modeling (label-gated)

**Skip if the issue does NOT have a `needs-threat-model` or `security-sensitive` label.**

**Load and follow** `.github/skills/assure-threat-model/SKILL.md` methodology:

1. **Read prior artifacts**: Load the `analysis` artifact (and `architecture`, if it exists) with `catalyst_artifact_read` to understand the system being built
2. **Scope the model**: Define the trust boundaries and data flows for the feature being implemented
3. **STRIDE analysis**: For each component and data flow crossing a trust boundary, evaluate all 6 STRIDE categories (Spoofing, Tampering, Repudiation, Information disclosure, Denial of service, Elevation of privilege)
4. **Risk assessment**: Score each threat as Risk = Likelihood × Impact
5. **Mitigation planning**: For threats with risk ≥ 7, propose specific mitigations with effort and implementation phase
6. **Present to the user**:

   ```
   🛡️ Threat Model for: "{issue title}"

   | ID | STRIDE | Component | Threat | Risk Score | Mitigation |
   |----|--------|-----------|--------|------------|------------|
   | T1 | ... | ... | ... | ... | ... |

   Critical threats: {count}
   Required mitigations before implementation: {list}

   → Approve threat model? (yes / edit)
   ```

7. **HUMAN GATE**: The user approves the threat model
8. Store it via `catalyst_knowledge_store` (type: `threat-model`)
9. **Write artifact**: Call `catalyst_artifact_write` with `workflow_id={issue_id}`, `phase="threat-model"` containing the full threat register
10. Feed critical/high-risk mitigations into the Phase 3 plan as mandatory implementation steps
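
Steps 4 and 5 reduce to a score and a filter. The sketch below assumes likelihood and impact are integers from 1 to 5. This skill does not state the scale, so check `assure-threat-model/SKILL.md` before relying on the ≥ 7 threshold. `riskScore` and `needsMitigation` are hypothetical names.

```typescript
// Assumed scales: likelihood and impact are integers 1..5, so risk ranges over 1..25.
interface Threat { id: string; stride: string; likelihood: number; impact: number; }

const MITIGATION_THRESHOLD = 7; // "risk ≥ 7" from step 5

function riskScore(t: Threat): number {
  for (const v of [t.likelihood, t.impact]) {
    if (!Number.isInteger(v) || v < 1 || v > 5) {
      throw new RangeError(`${t.id}: score ${v} is outside the assumed 1..5 scale`);
    }
  }
  return t.likelihood * t.impact;
}

// Threats that need a planned mitigation, highest risk first.
function needsMitigation(threats: Threat[]): Array<Threat & { risk: number }> {
  return threats
    .map((t) => ({ ...t, risk: riskScore(t) }))
    .filter((t) => t.risk >= MITIGATION_THRESHOLD)
    .sort((a, b) => b.risk - a.risk);
}
```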

#### After the design/architecture/RCA/threat-model phases:

The design documents, RCA findings, and threat model inform the implementation plan. Phase 3's plan should reference:
- Which approach was chosen (from the design doc)
- Which components to build (from the architecture)
- Which interfaces to implement (from the ADR)
- What root cause was found (from RCA) and which fix addresses it
- What threats were identified (from the threat model) and which mitigations are required

**Read prior artifacts**: Call `catalyst_artifact_read` to load the `analysis`, `design`, `architecture`, `rca`, and/or `threat-model` artifacts. These contain the full context from earlier phases; do not rely on chat history alone.

### Phase 3 — HUMAN GATE: Plan Approval

<HARD-GATE>
Do NOT proceed to implementation until the user has explicitly approved the plan.
Do NOT skip this gate for any reason, regardless of perceived simplicity.
If the user says "just do it" without seeing the plan, present the plan anyway.
</HARD-GATE>

**STOP HERE. Present the plan to the user:**

```
📋 Autopilot Plan for: "{issue title}"

Issue: #{id} on {board provider}
{if design doc exists: "Design: docs/design/{file}.md"}
{if ADR exists: "Architecture: docs/adr/{file}.md"}
{if RCA ran: "Root cause: {one-sentence root cause}"}
{if threat model exists: "Required mitigations: {list}"}

Steps:
1. {step description}
2. {step description}
...

Files to change:
- {path} (create/modify)
- {path} (create/modify)

Complexity: {trivial|simple|moderate|complex}

Approve? (yes / edit / cancel)
```

- If **yes** → call `catalyst_worker_approve`, continue to Phase 4
- If **edit** → the user provides changes; update the plan and re-present
- If **cancel** → call `catalyst_worker_fail`, stop

**Write artifact**: After approval, call `catalyst_artifact_write` with `workflow_id={issue_id}`, `phase="plan"` containing the approved plan (steps, files, complexity).

**Session checkpoint**: After plan approval, call `catalyst_session_save` with `status="checkpoint"`, `phase="phase-3-approved"`, and the current context. This lets the approved plan be resumed if the session is interrupted.
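
The HARD-GATE can be made concrete with a conservative reply classifier. This is one possible reading, not the worker's actual logic. Nothing counts as approval until the plan has been shown, only an explicit yes approves, "just do it" is treated as ambiguous and re-presents the plan, and any other text is taken as requested edits. The word lists are assumptions.

```typescript
type PlanDecision = "approve" | "edit" | "cancel" | "re-present";

// Assumed vocabularies; extend deliberately, never with vague phrases.
const APPROVALS = ["yes", "y", "approve", "approved"];
const AMBIGUOUS = ["", "just do it"];

function parsePlanReply(reply: string, planWasShown: boolean): PlanDecision {
  if (!planWasShown) return "re-present"; // the plan must be seen before it can be approved
  const r = reply.trim().toLowerCase();
  if (APPROVALS.indexOf(r) !== -1) return "approve"; // → catalyst_worker_approve
  if (r === "cancel") return "cancel";               // → catalyst_worker_fail
  if (AMBIGUOUS.indexOf(r) !== -1) return "re-present";
  return "edit"; // any other text is treated as changes to apply before re-presenting
}
```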

### Phase 4 — Branch & Implementation

**Read prior artifacts**: Call `catalyst_artifact_read` for `plan` (and `analysis`, `design`, `architecture`, `rca`, `threat-model` if they exist) to load the full execution context.

1. Call `catalyst_worker_branch` to create the feature branch
2. Call `catalyst_board_move` to set the issue status to "in-progress"
3. **For each step in the plan:**
   1. Implement the code change (create/modify files)
   2. Follow the existing codebase patterns discovered in Phase 2
   3. After each logical unit, run `catalyst_exec("npm test")` or the project's equivalent test command to verify nothing is broken
   4. If tests fail, diagnose and fix (max 3 attempts per step; see the circuit breaker below)
4. Write tests for new code:
   - Match the existing test framework and conventions
   - Cover the happy path + key edge cases
   - Run the tests to confirm they pass

**Circuit breaker:** If any step fails 3 times consecutively:
- Call `catalyst_board_comment` with details of the failure
- Call `catalyst_worker_fail` with the reason
- Tell the user what went wrong and which step is stuck
- STOP. Do not continue.
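
The per-step circuit breaker can be sketched as a bounded retry that collects failure details for the board comment. `runStepWithBreaker` is a hypothetical helper. `attempt` stands in for "implement or fix, then run the tests", and the board comment and `catalyst_worker_fail` calls are left to the caller.

```typescript
interface AttemptResult { ok: boolean; detail: string; }

type StepOutcome = { ok: true; attempts: number } | { ok: false; failures: string[] };

const MAX_ATTEMPTS_PER_STEP = 3;

function runStepWithBreaker(step: string, attempt: (n: number) => AttemptResult): StepOutcome {
  const failures: string[] = [];
  for (let n = 1; n <= MAX_ATTEMPTS_PER_STEP; n++) {
    const result = attempt(n); // implement or fix, then run the test command
    if (result.ok) return { ok: true, attempts: n };
    failures.push(`${step} attempt ${n}: ${result.detail}`);
  }
  // Breaker tripped: the caller posts `failures` via catalyst_board_comment,
  // calls catalyst_worker_fail, tells the user which step is stuck, and stops.
  return { ok: false, failures };
}
```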

### Phase 5 — Change Management (Deliver Skill #1)

**Load and follow** `.github/skills/deliver-change-management/SKILL.md` methodology:

1. Run `catalyst_git_diff` to analyze all changes, including any design doc or ADR staged in Phase 2.5
2. Categorize changes by type: `feat`, `fix`, `refactor`, `test`, `docs`, `chore`
3. **If changes span multiple logical units** (e.g., new feature + tests + docs or config):
   - Split them into separate commits, calling `catalyst_git_stage` once per group
   - Each commit gets its own conventional commit message
   - Example (the equivalent git commands):
     ```bash
     git add src/feature.ts
     git commit -m "feat(scope): add feature X (closes #ID)"

     git add tests/feature.test.ts
     git commit -m "test(scope): add tests for feature X"

     git add docs/api.md
     git commit -m "docs(scope): update API docs for feature X"
     ```
4. **If changes are a single logical unit**, create one commit:
   - Format: `{type}(scope): description (closes #ID)`, e.g. `feat` for a feature or `fix` for a bug
   - Body: what was implemented and why
   - Footer: `Closes #ID` (plus `Root-cause: {one-sentence description}` when Phase 2.5c ran)
5. Call `catalyst_git_stage` and `catalyst_git_commit` for each logical commit
6. **Write artifact**: Call `catalyst_artifact_write` with `workflow_id={issue_id}`, `phase="change-mgmt"` containing the list of commits created (hash, type, scope, message)
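
The grouping in step 3 can be approximated by classifying changed paths and emitting one commit group per type, in a stable order. The path rules are a heuristic assumption (a `tests/` folder or `.test`/`.spec` suffix means `test`, `docs/` or Markdown means `docs`, package/tsconfig/`.github/` means `chore`). Real grouping should follow the diff's logical units. `groupCommits` and `commitHeader` are hypothetical helpers.

```typescript
type CommitType = "feat" | "fix" | "refactor" | "test" | "docs" | "chore";

const ORDER: CommitType[] = ["feat", "fix", "refactor", "test", "docs", "chore"];

// Heuristic path classifier; anything unmatched counts as the main code change.
function classify(path: string, codeType: "feat" | "fix"): CommitType {
  if (/(^|\/)(tests?|__tests__)\/|\.(test|spec)\.[jt]sx?$/.test(path)) return "test";
  if (/^docs\/|\.md$/i.test(path)) return "docs";
  if (/^(package(-lock)?\.json|tsconfig.*\.json|\.github\/)/.test(path)) return "chore";
  return codeType;
}

function groupCommits(paths: string[], codeType: "feat" | "fix"): Array<{ type: CommitType; files: string[] }> {
  const groups: Partial<Record<CommitType, string[]>> = {};
  for (const p of paths) {
    const t = classify(p, codeType);
    (groups[t] = groups[t] ?? []).push(p);
  }
  return ORDER.filter((t) => groups[t] !== undefined).map((t) => ({ type: t, files: groups[t]! }));
}

// Conventional Commits header, e.g. "feat(cache): add feature X (closes #42)".
function commitHeader(type: CommitType, scope: string, description: string, closes?: number): string {
  return `${type}(${scope}): ${description}` + (closes === undefined ? "" : ` (closes #${closes})`);
}
```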

### Phase 5b — Doc Governance (Deliver Skill #2)

**Load and follow** `.github/skills/deliver-doc-governance/SKILL.md` methodology:

1. Check whether the changes affect any **public interfaces**:
   - New/changed API endpoints
   - New/changed CLI commands
   - New/changed configuration options
   - New/changed tool signatures
   - New/changed exports or public functions
2. If public interfaces changed, run drift detection:
   - Compare the README against the actual project structure and features
   - Compare API docs against actual function signatures
   - Check whether code examples still work
   - Verify install/setup instructions are still accurate
3. **If drift is found:**
   - Fix the documentation directly (same branch)
   - Stage and commit: `docs(scope): sync docs with implementation changes`
   - Add a `### Documentation Updated` section to the PR body listing what was synced
4. **If no public interfaces changed**, skip drift detection and note "No doc changes needed" in the PR body
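
For the "exports or public functions" check in step 1, a cheap first signal is whether a unified diff adds or removes any `export` declaration. This is a heuristic sketch for TypeScript/JavaScript only. It assumes exports start with the `export` keyword on a single line, and it misses forms such as `export { a, b }` or `export abstract class`. `changedExports` is a hypothetical helper.

```typescript
// Returns "+Name" / "-Name" for each added/removed export line in a unified diff.
function changedExports(unifiedDiff: string): string[] {
  const hits: string[] = [];
  const exportLine =
    /^[+-]\s*export\s+(?:default\s+)?(?:async\s+)?(?:function\*?|class|const|let|var|interface|type|enum)\s+([A-Za-z_$][\w$]*)/;
  for (const line of unifiedDiff.split("\n")) {
    if (line.startsWith("+++") || line.startsWith("---")) continue; // file headers, not content
    const m = exportLine.exec(line);
    if (m) hits.push(`${line[0]}${m[1]}`);
  }
  return hits;
}
```

A changed signature shows up as a matching `-Name`/`+Name` pair, which is exactly the case where API docs most often drift.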

### Phase 6 — PR Creation & Auto-Review

1. Call `catalyst_worker_preview_pr` with:
   - Title: the primary commit message
   - Body: markdown with sections:
     - **What**: summary of changes
     - **Why**: linked to issue #{ID}
     - **Changes**: list of commits with descriptions
     - **Documentation Updated**: what docs were synced (or "N/A")
     - **How to test**: steps to verify
     - **Checklist**: tests pass, lint clean, types clean, docs synced
<HARD-GATE>
2. **HUMAN GATE**: The user reviews the preview. Do NOT create the PR until the user approves.
   If the user requests changes, apply them and re-preview. Never skip this gate.
</HARD-GATE>
3. Call `catalyst_worker_pr` to push the branch and create the PR
4. **Run PR Intelligence** (read `.github/skills/assure-pr-intelligence/SKILL.md`):
   - **Change inventory**: categorize changed files (core, api, test, config, docs)
   - **Risk assessment**: evaluate scope, complexity, blast radius, test coverage, and reversibility → Low/Medium/High/Critical risk score
   - **Reviewer guidance**: order files by review priority, flag lines needing attention, list questions the reviewer should ask, and note what's missing from the PR
   - **Merge readiness checklist**: tests pass, security clean, breaking changes documented, PR description matches changes
   - Post the full PR Intelligence report as a **comment on the PR** so the assigned reviewer sees it immediately
5. Read the diff of the PR
6. **Subagent delegation (recommended for moderate/complex changes):** Use `catalyst_dispatch_subagent` to delegate review work in parallel:
   - Delegate the `code-reviewer` role with the diff and file list; it returns correctness, security, and performance findings
   - Delegate the `standards-reviewer` role with the diff and codebase conventions; it returns standards-compliance findings
   - Delegate the `security-auditor` role with the source files and architecture context; it returns STRIDE/OWASP findings
   - Each subagent writes its output as an artifact (e.g. `review-functional`, `review-standards`) for traceability
   - Merge subagent findings using `catalyst_dispatch_consensus` to identify high-confidence vs disputed issues
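
What "high-confidence vs disputed" means is left to `catalyst_dispatch_consensus`. A simple stand-in rule is sketched below: a finding is high-confidence when at least two distinct reviewers report the same file, line, and issue key. Exact key matching is an assumption, since real findings would need fuzzier matching. `mergeFindings` is a hypothetical helper, not that tool's implementation.

```typescript
interface Finding { reviewer: string; file: string; line: number; issue: string; }

// Assumed rule: `quorum` distinct reviewers on the same file:line:issue key => high confidence.
function mergeFindings(findings: Finding[], quorum = 2): { highConfidence: string[]; disputed: string[] } {
  const reviewersByKey: Record<string, Set<string>> = {};
  for (const f of findings) {
    const key = `${f.file}:${f.line}:${f.issue}`;
    (reviewersByKey[key] = reviewersByKey[key] ?? new Set<string>()).add(f.reviewer);
  }
  const highConfidence: string[] = [];
  const disputed: string[] = [];
  for (const key of Object.keys(reviewersByKey).sort()) {
    const count = reviewersByKey[key]?.size ?? 0; // distinct reviewers, so duplicates count once
    (count >= quorum ? highConfidence : disputed).push(key);
  }
  return { highConfidence, disputed };
}
```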
529
-
530
- **Fallback (simple changes):** Run reviews inline without subagent delegation:
531
- 7. Run **code-quality** review internally (read `.github/skills/assure-code-quality/SKILL.md`):
532
- - Correctness: does the code do what the acceptance criteria say?
533
- - Security: any obvious vulnerabilities (SQL injection, XSS, secrets)?
534
- - Performance: any N+1 queries, await-in-loops, unnecessary re-renders?
535
- - Style: does it match codebase conventions?
536
- 7. Run **vulnerability-scan** internally (read `.github/skills/assure-vulnerability-scan/SKILL.md`):
537
- - OWASP Top 10 quick check on new code
538
- - Dependency audit: `npm audit` or `pip audit`
539
- 8. Run `catalyst_exec("npm run lint")` and `catalyst_exec("npm run typecheck")` if available
540
- 8b. **(Optional) Requirements alignment validation**: If M365 context was fetched in Phase 2, validate the implementation against meeting-stated requirements:
541
- - Read the `analysis` artifact to retrieve the M365 requirements context captured earlier
542
- - If the analysis artifact contains meeting decisions or stakeholder expectations, call `mcp_workiq_ask_work_iq` → "What specific requirements and acceptance criteria were stated for {feature} in meetings and emails?"
543
- - Cross-reference each stated requirement against the implementation diff:
544
- - **Covered**: the requirement is addressed by the code changes ✓
545
- - **Partial**: the requirement is partially addressed — flag what's missing
546
- - **Missing**: the requirement is not addressed at all — flag as a review finding
547
- - Include requirements alignment in the PR comment:
548
- ```
549
- 📋 Requirements Alignment:
550
- Meeting requirements checked: {N}
551
- Covered: {count} ✓ | Partial: {count} ⚠️ | Missing: {count} ❌
552
- {list any partial/missing items}
553
- ```
554
- - If critical requirements are missing, flag as a review issue that must be addressed before merge
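The Covered / Partial / Missing tally can be sketched in TypeScript. This is a hypothetical helper, not part of the package. The `Requirement` shape, the `critical` flag and the function names are assumptions. Statuses are taken as already decided by the cross-referencing step.

```typescript
// Hypothetical data model: the skill names the statuses but defines no types.
type AlignmentStatus = "covered" | "partial" | "missing";

interface Requirement {
  text: string;
  status: AlignmentStatus;
  critical: boolean; // assumption: how criticality is marked is not specified
}

// Renders the "Requirements Alignment" block for the PR comment.
function formatAlignment(reqs: Requirement[]): string {
  const count = (s: AlignmentStatus): number =>
    reqs.filter((r) => r.status === s).length;
  const gaps = reqs
    .filter((r) => r.status !== "covered")
    .map((r) => `- [${r.status}] ${r.text}`);
  return [
    "📋 Requirements Alignment:",
    `Meeting requirements checked: ${reqs.length}`,
    `Covered: ${count("covered")} ✓ | Partial: ${count("partial")} ⚠️ | Missing: ${count("missing")} ❌`,
    ...gaps,
  ].join("\n");
}

// A missing critical requirement must be fixed before merge.
function blocksMerge(reqs: Requirement[]): boolean {
  return reqs.some((r) => r.critical && r.status === "missing");
}

// Example usage
const example: Requirement[] = [
  { text: "Export to CSV", status: "covered", critical: true },
  { text: "Filter by date range", status: "partial", critical: false },
];
console.log(formatAlignment(example));
console.log("blocks merge:", blocksMerge(example)); // blocks merge: false
```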
555
- 9. **Run diff-scoped pattern detection** (read `.github/skills/insights-pattern-detection/SKILL.md`):
556
- - Scope: only scan files changed in the diff (NOT full codebase)
557
- - Check for **consistency** with existing codebase patterns:
558
- - Error handling style matches project conventions?
559
- - Data access patterns match?
560
- - Naming conventions followed?
561
- - Test structure matches existing tests?
562
- - Check for **anti-patterns** in changed files:
563
- - God object/file (single file > 500 lines with mixed responsibilities)
564
- - Copy-paste (near-duplicate code blocks)
565
- - Shotgun surgery (small change touching too many files)
566
- - Primitive obsession (strings/numbers where domain types belong)
567
- - **Query knowledge base for repeat offenses**:
568
- - `catalyst_knowledge_search` type: `pattern` — "has this same anti-pattern been flagged before?"
569
- - If a repeat offense is found, flag prominently:
570
- ```
571
- ⚠️ Recurring Pattern Issue: {description}
572
- Previously flagged in: {previous context}
573
- Suggestion: Consider a structural fix.
574
- ```
575
- - Run `catalyst_metrics_complexity` on changed files — flag any function with complexity > threshold
576
- - Include pattern findings in the PR comment:
577
- ```
578
- 🔎 Pattern Detection Results:
579
- Consistency: {✓ follows codebase patterns | ⚠️ deviations found}
580
- Anti-patterns: {✓ none | ⚠️ {list}}
581
- Repeat issues: {✓ none | ⚠️ {count} recurring}
582
- Complexity: {✓ within threshold | ⚠️ {files} above limit}
583
- ```
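This is a minimal sketch of how the pattern-detection findings could be folded into the results block. Everything here is illustrative. `PatternFindings`, the per-file complexity map and the case-insensitive match against previously stored `pattern` knowledge are assumptions. They do not describe the actual behavior of `catalyst_knowledge_search` or `catalyst_metrics_complexity`.

```typescript
// Hypothetical inputs gathered during the diff-scoped scan.
interface PatternFindings {
  deviations: string[];               // consistency deviations in changed files
  antiPatterns: string[];             // e.g. "copy-paste", "god-file"
  complexity: Record<string, number>; // changed file -> highest function complexity
}

// Anti-patterns in this diff that were already stored as `pattern` knowledge.
function findRepeatOffenses(found: string[], known: string[]): string[] {
  const seen = new Set(known.map((k) => k.toLowerCase()));
  return found.filter((f) => seen.has(f.toLowerCase()));
}

function summarizePatterns(
  f: PatternFindings,
  knownPatterns: string[],
  threshold: number,
): string {
  const repeats = findRepeatOffenses(f.antiPatterns, knownPatterns);
  const overLimit = Object.keys(f.complexity).filter(
    (file) => f.complexity[file] > threshold,
  );
  return [
    "🔎 Pattern Detection Results:",
    `Consistency: ${f.deviations.length === 0 ? "✓ follows codebase patterns" : "⚠️ deviations found"}`,
    `Anti-patterns: ${f.antiPatterns.length === 0 ? "✓ none" : "⚠️ " + f.antiPatterns.join(", ")}`,
    `Repeat issues: ${repeats.length === 0 ? "✓ none" : `⚠️ ${repeats.length} recurring`}`,
    `Complexity: ${overLimit.length === 0 ? "✓ within threshold" : "⚠️ " + overLimit.join(", ") + " above limit"}`,
  ].join("\n");
}

// Example usage: one known anti-pattern recurs, one file exceeds the threshold.
console.log(
  summarizePatterns(
    { deviations: [], antiPatterns: ["copy-paste"], complexity: { "src/a.ts": 4, "src/b.ts": 15 } },
    ["Copy-Paste"],
    10,
  ),
);
```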
584
- 10. **If issues found (review, security, or pattern):**
585
- - Fix them directly
586
- - Re-commit: `fix(scope): address review findings`
587
- - Re-push
588
- - Re-run pattern detection on the fix to confirm resolution
589
- 11. **Write artifact**: Call `catalyst_artifact_write` with `workflow_id={issue_id}`, `phase="review-merged"` containing the combined review results (code-quality, vulnerability-scan, pattern detection findings, and fix iterations)
590
- 12. Call `catalyst_worker_review_done` with verdict: "approved" and summary
591
- 12b. Call `catalyst_board_move` to set issue status to "in-review"
592
- 13. Call `catalyst_board_comment`: "PR #{pr_number} opened. Ready for review."
593
-
594
- ### Phase 7 — Deploy Guard (Deliver Skill #3)
595
-
596
- **Load and follow** `.github/skills/deliver-deploy-guard/SKILL.md` methodology:
597
-
598
- Before marking ready to merge, run the 6-gate checklist:
599
-
600
- 1. **Code Quality Gate**: No leftover TODOs, `console.log` calls, or commented-out code in changed files
601
- 2. **Test Integrity Gate**: All tests pass, coverage meets threshold, no `.skip` tests
602
- 3. **Security Gate**: No hardcoded secrets, no critical CVEs, no unsafe patterns
603
- 4. **Configuration Gate**: Env vars documented, no dev config in prod paths
604
- 5. **Breaking Changes Gate**: API contracts backward-compatible, no dropped exports
605
- 6. **Operational Readiness Gate**: Health endpoints, logging, error handling
606
-
607
- Produce a verdict and include in the PR comment:
608
-
609
- ```
610
- 🛡️ Deploy Guard Results:
611
- Code Quality: ✓ pass
612
- Test Integrity: ✓ pass (coverage: 86%)
613
- Security: ✓ pass
614
- Configuration: ✓ pass
615
- Breaking Changes: ✓ pass
616
- Operational: ✓ pass
617
-
618
- Verdict: GO ✅
619
- ```
620
-
621
- - If **GO** → proceed to Phase 8
622
- - If **CONDITIONAL** → list warnings in PR comment, proceed (human decides)
623
- - If **NO-GO** → fix blockers, re-run until GO or escalate to user
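The skill does not spell out how the six gate results map to a verdict. This sketch assumes the following rule: any failed gate yields NO-GO, any warning yields CONDITIONAL, and otherwise the verdict is GO. The `GateResult` values and the function names are hypothetical.

```typescript
type GateResult = "pass" | "warn" | "fail";
type Verdict = "GO" | "CONDITIONAL" | "NO-GO";

// The six gates from the checklist, in order.
const GATES = [
  "Code Quality",
  "Test Integrity",
  "Security",
  "Configuration",
  "Breaking Changes",
  "Operational",
] as const;
type Gate = (typeof GATES)[number];

// Assumed mapping: any fail blocks, any warning needs a human decision.
function deployVerdict(results: Record<Gate, GateResult>): Verdict {
  const values = GATES.map((g) => results[g]);
  if (values.some((v) => v === "fail")) return "NO-GO";
  if (values.some((v) => v === "warn")) return "CONDITIONAL";
  return "GO";
}

function nextAction(verdict: Verdict): string {
  switch (verdict) {
    case "GO":
      return "proceed to Phase 8";
    case "CONDITIONAL":
      return "list warnings in the PR comment and proceed (human decides)";
    case "NO-GO":
      return "fix blockers and re-run the gates, or escalate to the user";
  }
}

// Example: one gate raises a warning.
const results: Record<Gate, GateResult> = {
  "Code Quality": "pass",
  "Test Integrity": "pass",
  Security: "pass",
  Configuration: "warn",
  "Breaking Changes": "pass",
  Operational: "pass",
};
const verdict = deployVerdict(results);
console.log(verdict, "->", nextAction(verdict));
// prints: CONDITIONAL -> list warnings in the PR comment and proceed (human decides)
```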
624
-
625
- **Write artifact**: Call `catalyst_artifact_write` with `workflow_id={issue_id}`, `phase="deploy-guard"` containing the full 6-gate results and verdict.
626
-
627
- ### Phase 8 — Completion & Learning
628
-
629
- 1. Call `catalyst_board_comment` with deploy guard results: "All checks passed. Ready to merge."
630
- 2. **Store knowledge** via `catalyst_knowledge_store`:
631
- - Decisions made during implementation (type: `decision`)
632
- - Root cause findings, if this was a bug fix (type: `root-cause`)
633
- - **Pattern findings** from Phase 6 (type: `pattern`):
634
- - What patterns were followed or violated
635
- - Any anti-patterns found and fixed
636
- - Any repeat offenses detected
637
- - Complexity hotspots
638
- - This creates the **self-improvement loop**: future runs query this data in Phase 2 to avoid repeating the same mistakes
639
- 3. Present final summary to user:
640
-
641
- ```
642
- ✅ Autopilot Complete
643
-
644
- Issue: #{id} — {title}
645
- Branch: {branch_name}
646
- PR: #{pr_number}
647
- Status: Ready to merge
648
-
649
- Changes:
650
- - {N} commits across {M} files
651
- - {file} (created/modified) — {what changed}
652
-
653
- Deliver Pipeline:
654
- Change Mgmt: {N} conventional commits (feat/fix/test/docs)
655
- Doc Sync: {updated | no changes needed}
656
- Deploy Guard: {GO | CONDITIONAL — warnings}
657
-
658
- {if bug fix:}
659
- Root Cause: {one-sentence root cause}
660
- Design Gap: {why it was vulnerable}
661
- Prevention: {what would catch this earlier}
662
-
663
- Tests: {X} passing | Coverage: {Y}%
664
- Review: Auto-reviewed — code-quality + vulnerability-scan
665
- Security: No issues found
666
- Patterns: {✓ clean | ⚠️ {count} findings — stored for future runs}
667
- Repeat Issues: {none | {count} recurring patterns detected}
668
-
669
- → Merge when ready. Board will auto-update on close.
670
- ```
671
-
672
- 4. **Write artifact**: Call `catalyst_artifact_write` with `workflow_id={issue_id}`, `phase="completion"` containing the final summary (PR number, branch, commits, review/deploy-guard results, knowledge stored)
673
- 5. Call `catalyst_worker_complete`
674
-
675
- ### Capability Hints (on completion)
676
-
677
- After presenting the final summary, append **one** contextual hint based on the session. Show each hint at most once per session.
678
-
679
- | Context | Hint |
680
- |---|---|
681
- | First time user ran autopilot | 💡 *I can also parse meeting transcripts into user stories and epics — say "parse meeting" with your notes.* |
682
- | Multiple autopilot runs completed | 💡 *I can generate a daily digest summarizing all your work — say "daily digest" or "eod report".* |
683
- | Knowledge was stored during this run | 💡 *I remember decisions across sessions. Ask "what did we decide about X" anytime to recall.* |
684
- | Pattern issues were detected | 💡 *I can run a full codebase health scan for anti-patterns and tech debt — say "codebase health".* |
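The rule is "one hint per completion, each hint at most once per session." The sketch below implements that rule, but the `SessionContext` fields and hint ids are assumptions. The precedence (table order, first match wins) is also assumed. "First time" is read here as the first run in the current session, which is another assumption.

```typescript
// Hypothetical session facts; the skill describes the conditions, not a data model.
interface SessionContext {
  autopilotRuns: number;          // completed autopilot runs this session
  knowledgeStored: boolean;       // knowledge was stored during this run
  patternIssuesDetected: boolean; // Phase 6 pattern detection flagged something
}

type HintId = "first-run" | "multi-run" | "knowledge" | "patterns";

// Rules in table order; the first matching hint not yet shown wins (assumed precedence).
const HINT_RULES: Array<[HintId, (c: SessionContext) => boolean]> = [
  ["first-run", (c) => c.autopilotRuns === 1],
  ["multi-run", (c) => c.autopilotRuns > 1],
  ["knowledge", (c) => c.knowledgeStored],
  ["patterns", (c) => c.patternIssuesDetected],
];

// Returns at most one hint and records it so it is never repeated in the session.
function pickHint(ctx: SessionContext, shown: Set<HintId>): HintId | undefined {
  for (const [id, matches] of HINT_RULES) {
    if (!shown.has(id) && matches(ctx)) {
      shown.add(id);
      return id;
    }
  }
  return undefined;
}

// Example usage across two completions in the same session.
const shownThisSession = new Set<HintId>();
const ctx: SessionContext = { autopilotRuns: 1, knowledgeStored: true, patternIssuesDetected: false };
console.log(pickHint(ctx, shownThisSession)); // first-run
console.log(pickHint(ctx, shownThisSession)); // knowledge
```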
685
-
686
- ## Output Format
687
-
688
- Always use the structured format shown in each phase. Lead with the status emoji:
689
- - 📋 = planning
690
- - ⚠️ = waiting for approval
691
- - 🔨 = implementing
692
- - 🔍 = reviewing
693
- - ✅ = done
694
- - ✗ = failed
695
-
696
- ## Anti-Patterns
697
-
698
- <HARD-GATE>
699
- - Do NOT skip the human gate (Phase 3). The plan MUST be shown and approved.
700
- - Do NOT auto-merge the PR. Only humans merge.
701
- - Do NOT bypass the PR preview gate (Phase 6). The user MUST see the preview.
702
- </HARD-GATE>
703
- - Do NOT continue after 3 consecutive failures on a step. Escalate to human.
704
- - Do NOT install new dependencies without mentioning them in the plan.
705
- - Do NOT modify files outside the scope of the plan without asking.
706
- - Do NOT generate placeholder/stub code. Every file must be functional.
707
- - Do NOT skip tests. If the project has a test framework, write tests.
708
-
709
- ## No Placeholders
710
-
711
- Every step in the Phase 3 plan and every file produced in Phase 4 must contain real, working content. The following are **plan failures** — never write them:
712
-
713
- | Forbidden Pattern | Why It Fails |
714
- |---|---|
715
- | "TBD", "TODO", "implement later" | Defers work that should be done now |
716
- | "Add appropriate error handling" | Vague — specify which errors and how to handle them |
717
- | "Add validation" | Which inputs? What rules? What error messages? |
718
- | "Handle edge cases" | Name the edge cases or don't mention them |
719
- | "Write tests for the above" | Show the actual test code |
720
- | "Similar to Phase N" | Repeat the details — context resets between phases |
721
- | Steps without code blocks | If a step changes code, show the code |
722
- | References to undefined types/functions | Every symbol must trace back to an earlier step |
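Most rows of the table can be checked mechanically before a plan is shown. This is a hypothetical lint pass, not part of the package. The regexes are rough approximations of the forbidden phrases. The last two rows (steps without code blocks, undefined symbols) need more than a text search, so the sketch leaves them out.

```typescript
// Approximations of the textual rows in the "Forbidden Pattern" table.
const FORBIDDEN: Array<[RegExp, string]> = [
  [/\b(TBD|TODO)\b|implement later/i, "Defers work that should be done now"],
  [/add appropriate error handling/i, "Vague: specify which errors and how to handle them"],
  [/\badd validation\b/i, "Which inputs? What rules? What error messages?"],
  [/handle edge cases/i, "Name the edge cases or don't mention them"],
  [/write tests for the above/i, "Show the actual test code"],
  [/similar to phase \d+/i, "Repeat the details; context resets between phases"],
];

interface PlanViolation {
  line: number; // 1-based line in the plan text
  text: string;
  reason: string;
}

function lintPlan(plan: string): PlanViolation[] {
  const out: PlanViolation[] = [];
  plan.split("\n").forEach((text, i) => {
    for (const [re, reason] of FORBIDDEN) {
      if (re.test(text)) out.push({ line: i + 1, text: text.trim(), reason });
    }
  });
  return out;
}

// Example usage
const plan = [
  "1. Add a POST /export route in src/routes/export.ts",
  "2. Add validation",
  "3. Write tests for the above",
].join("\n");
for (const v of lintPlan(plan)) console.log(`line ${v.line}: ${v.reason}`);
// line 2: Which inputs? What rules? What error messages?
// line 3: Show the actual test code
```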
723
-
724
- ## Chains To
725
-
726
- - `solution-design` — Phase 2.5: generate solution design doc when `needs-design` label detected
727
- - `architecture-planner` — Phase 2.5: generate ADR when `needs-architecture` label detected
728
- - `root-cause-analysis` — Phase 2.5c: systematic RCA when `bug`/`defect`/`regression` label detected
729
- - `threat-model` — Phase 2.5d: STRIDE threat modeling when `needs-threat-model`/`security-sensitive` label detected
730
- - `change-management` — Phase 5: proper conventional commits with multi-commit splitting
731
- - `doc-governance` — Phase 5b: auto-detect and fix documentation drift
732
- - `pr-intelligence` — Phase 6: risk assessment + reviewer guidance posted on PR
733
- - `code-quality` — Phase 6: multi-pass review of the PR
734
- - `vulnerability-scan` — Phase 6: security audit of new code
735
- - `pattern-detection` — Phase 2 (query known patterns) + Phase 6 (diff-scoped scan) + Phase 8 (store findings)
736
- - `deploy-guard` — Phase 7: 6-gate safety check before marking ready to merge
737
- - `knowledge-base` — Phase 2, 2.5c, 6, 8: the memory hub that powers the self-improvement loop
1
+ # Autopilot Worker
2
+
3
+ > **Pillar**: Orchestrate | **ID**: `autopilot-worker`
4
+
5
+ ## Purpose
6
+
7
+ Single-command pipeline that creates a board issue, plans implementation, writes code + tests, applies the full Deliver pipeline (change-management → doc-governance → deploy-guard), opens a reviewed PR, and updates the board. The central human gate is plan approval; other confirmation gates (task creation, and the label-gated design, architecture, and threat-model reviews) appear where noted below. Everything else chains automatically through 12 skills. Includes label-gated design/architecture phases, bug-triggered root-cause analysis, and a continuous self-improvement loop via pattern detection + knowledge base.
8
+
9
+ ## Activation Triggers
10
+
11
+ - autopilot, auto, pick up, work on, do this, implement and ship, end to end, full pipeline
12
+ - Routed from `feature-builder` Phase 0 when complexity is moderate or complex
13
+ - User provides a board issue number ("#42", "issue 42")
14
+
15
+ ## Session Role Exception
16
+
17
+ This pipeline chains 12 skills across role boundaries (e.g. code-quality and vulnerability-scan in Phase 6 are Review skills, but run inside the Builder pipeline). **All skills invoked internally by this pipeline are unrestricted by the session role.** Role scoping only applies to user-initiated requests, not pipeline steps.
18
+
19
+ ## Tools Required
20
+
21
+ - `crewpilot_board_connect` — connect to board provider
22
+ - `crewpilot_board_create` — create issue on board
23
+ - `crewpilot_board_move` — update issue status
24
+ - `crewpilot_board_comment` — log progress on the issue
25
+ - `crewpilot_worker_start` — start orchestrator workflow
26
+ - `crewpilot_worker_plan` — set execution plan
27
+ - `crewpilot_worker_approve` — human approval gate
28
+ - `crewpilot_worker_branch` — create feature branch
29
+ - `crewpilot_worker_pr` — push + open PR
30
+ - `crewpilot_worker_review_done` — record review verdict
31
+ - `crewpilot_worker_complete` — mark workflow done
32
+ - `crewpilot_worker_fail` — circuit breaker on failure
33
+ - `crewpilot_git_stage` — stage files
34
+ - `crewpilot_git_commit` — commit changes
35
+ - `crewpilot_exec` — run commands (tests, lint, build)
36
+ - `crewpilot_knowledge_store` — store decisions made during implementation
37
+ - `crewpilot_git_diff` — analyze changes for change-management
38
+ - `crewpilot_git_log` — commit history for release notes
39
+ - `crewpilot_metrics_coverage` — coverage check for deploy-guard
40
+ - `crewpilot_metrics_complexity` — complexity check for deploy-guard and pattern detection
41
+ - `crewpilot_worker_preview_pr` — preview changes before PR creation
42
+ - `crewpilot_worker_push_fixes` — push fixes to existing PR branch (no new PR)
43
+ - `crewpilot_board_pr_comments` — fetch review comments from a PR
44
+ - `crewpilot_knowledge_search` — query known patterns, anti-patterns, and past root causes
45
+ - `crewpilot_artifact_write` — persist phase outputs (analysis, plans, reviews) so downstream phases can read them
46
+ - `crewpilot_artifact_read` — read artifacts from prior phases (e.g. analysis → plan, plan → implementation)
47
+ - `crewpilot_artifact_list` — list all artifacts for the current workflow
48
+ - `crewpilot_dispatch_subagent` — delegate focused work (code review, test writing, security audit) to specialized sub-agents
49
+ - `crewpilot_session_save` — save session state for long-running tasks (enables resume across conversations)
50
+ - `crewpilot_session_restore` — restore a previously saved session to continue work
51
+ - `crewpilot_session_list` — list all saved sessions
52
+ - `mcp_workiq_ask_work_iq` — (optional, requires Work IQ extension) fetch M365 context (emails, docs, meetings) related to the task
53
+
54
+ ## Methodology
55
+
56
+ ### Process Flow
57
+
58
+ ```dot
59
+ digraph autopilot_worker {
60
+ rankdir=TB;
61
+ node [shape=box];
62
+
63
+ intake [label="Phase 1\nIntake & Issue Creation"];
64
+ analysis [label="Phase 2\nCodebase Analysis & Planning"];
65
+ design [label="Phase 2.5\nDesign & Architecture\n(label-gated)", style=dashed];
66
+ rca [label="Phase 2.5c\nRoot Cause Analysis\n(bug label-gated)", style=dashed];
67
+ threat [label="Phase 2.5d\nThreat Model\n(security label-gated)", style=dashed];
68
+ plan_gate [label="Phase 3\nHUMAN GATE: Plan Approval", shape=diamond, style=filled, fillcolor="#ffcccc"];
69
+ implement [label="Phase 4\nBranch & Implementation"];
70
+ change_mgmt [label="Phase 5\nChange Management"];
71
+ doc_gov [label="Phase 5b\nDoc Governance"];
72
+ pr_review [label="Phase 6\nPR Creation & Auto-Review\n(5-stage)"];
73
+ deploy_guard [label="Phase 7\nDeploy Guard\n(6 gates)"];
74
+ complete [label="Phase 8\nCompletion & Learning", shape=doublecircle];
75
+ fail [label="FAIL\nCircuit Breaker", shape=octagon, style=filled, fillcolor="#ff9999"];
76
+
77
+ intake -> analysis;
78
+ analysis -> design [label="needs-design\nor needs-architecture"];
79
+ analysis -> rca [label="bug/defect/\nregression"];
80
+ analysis -> threat [label="needs-threat-model\nor security-sensitive"];
81
+ analysis -> plan_gate [label="no special labels"];
82
+ design -> plan_gate;
83
+ rca -> plan_gate;
84
+ threat -> plan_gate;
85
+ plan_gate -> implement [label="approved"];
86
+ plan_gate -> fail [label="cancelled"];
87
+ implement -> change_mgmt;
88
+ implement -> fail [label="3 failures"];
89
+ change_mgmt -> doc_gov;
90
+ doc_gov -> pr_review;
91
+ pr_review -> pr_review [label="issues found\nfix & re-run"];
92
+ pr_review -> deploy_guard;
93
+ deploy_guard -> complete [label="GO"];
94
+ deploy_guard -> pr_review [label="NO-GO\nfix blockers"];
95
+ complete -> complete [label="store knowledge\nself-improvement loop"];
96
+ }
97
+ ```
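One way to sanity-check the flow is to transcribe its edges and query them. This TypeScript sketch is illustrative only. It copies the node names and labels from the digraph and uses them to confirm properties such as "`plan_gate` is the only way into `implement`".

```typescript
type Edge = { from: string; to: string; label?: string };

// Edges transcribed from the digraph (labels kept for readability only).
const EDGES: Edge[] = [
  { from: "intake", to: "analysis" },
  { from: "analysis", to: "design", label: "needs-design or needs-architecture" },
  { from: "analysis", to: "rca", label: "bug/defect/regression" },
  { from: "analysis", to: "threat", label: "needs-threat-model or security-sensitive" },
  { from: "analysis", to: "plan_gate", label: "no special labels" },
  { from: "design", to: "plan_gate" },
  { from: "rca", to: "plan_gate" },
  { from: "threat", to: "plan_gate" },
  { from: "plan_gate", to: "implement", label: "approved" },
  { from: "plan_gate", to: "fail", label: "cancelled" },
  { from: "implement", to: "change_mgmt" },
  { from: "implement", to: "fail", label: "3 failures" },
  { from: "change_mgmt", to: "doc_gov" },
  { from: "doc_gov", to: "pr_review" },
  { from: "pr_review", to: "pr_review", label: "issues found, fix & re-run" },
  { from: "pr_review", to: "deploy_guard" },
  { from: "deploy_guard", to: "complete", label: "GO" },
  { from: "deploy_guard", to: "pr_review", label: "NO-GO, fix blockers" },
  { from: "complete", to: "complete", label: "store knowledge" },
];

const predecessors = (node: string): string[] =>
  EDGES.filter((e) => e.to === node).map((e) => e.from);

// Nodes reachable from `start` via depth-first search.
function reachable(start: string): Set<string> {
  const seen = new Set<string>([start]);
  const stack = [start];
  while (stack.length > 0) {
    const node = stack.pop()!;
    for (const e of EDGES) {
      if (e.from === node && !seen.has(e.to)) {
        seen.add(e.to);
        stack.push(e.to);
      }
    }
  }
  return seen;
}

console.log(predecessors("implement")); // [ 'plan_gate' ]
console.log(reachable("intake").has("complete")); // true
```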
98
+
99
+ ### Phase 1 — Intake & Issue Creation
100
+
101
+ **First interaction hint:** If this is the first interaction in the session, start with:
102
+ > 💡 *Running CrewPilot Autopilot — I'll summarize the task, confirm with you before creating a board issue, plan the work, get your approval, implement, test, review, and open a PR.*
103
+
104
+ **Entry mode detection**: the worker can be entered in four ways:
105
+
106
+ | Entry Mode | How to Detect | Behavior |
107
+ |---|---|---|
108
+ | **Direct** | User says "autopilot", "full pipeline", etc. | Run full pipeline from Phase 1 |
109
+ | **Routed from feature-builder** | feature-builder's Phase 0 classified the request as moderate/complex | Skip re-analyzing complexity; it has already been assessed. Use the context feature-builder gathered. |
110
+ | **Mid-build escalation** | feature-builder discovered more complexity during Phase 4 | Accept the partial context (files already touched, patterns found). Start from Phase 2 (planning) with what's already known. |
111
+ | **Session resume** | User says "resume", "continue", "pick up where I left off" | Call `crewpilot_session_restore` with the workflow ID. Read the saved state, load associated artifacts, and resume from the last pending action. |
112
+
113
+ **Session resume flow**: When resuming, the agent should:
114
+ 1. Call `crewpilot_session_restore` to get the saved state
115
+ 2. Call `crewpilot_artifact_list` to see what artifacts exist
116
+ 3. Read relevant artifacts with `crewpilot_artifact_read`
117
+ 4. **(Optional) Calendar-aware context refresh**: If `mcp_workiq_ask_work_iq` is available and significant time has passed since the session was saved (overnight, weekend, or >4 hours):
118
+ - Call `mcp_workiq_accept_eula` with `eulaUrl: "https://github.com/microsoft/work-iq-mcp"` (idempotent)
119
+ - **Check for new context**: `mcp_workiq_ask_work_iq` → "What meetings, emails, or Teams messages about {issue title / feature} happened since {saved_at timestamp}? Summarize any new decisions, requirement changes, or blockers."
120
+ - **Check calendar conflicts**: `mcp_workiq_ask_work_iq` → "Do I have any meetings in the next 2 hours that might affect my availability?"
121
+ - If new decisions or requirement changes are found, flag them to the user before continuing:
122
+ ```
123
+ 📅 Context Update (since session was saved {age} ago):
124
+ - {new decision / requirement change / blocker}
125
+ → Continue with current plan? (yes / re-plan)
126
+ ```
127
+ - If unavailable, skip — resume proceeds without M365 context refresh.
128
+ 5. Continue from the first pending action in the saved state
129
+ 6. Do NOT re-run phases that have already completed (check `artifacts_written`)
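The resume rules can be sketched as two small checks. The `SavedSession` shape is hypothetical and mirrors the `saved_at` and `artifacts_written` fields the steps mention. The refresh rule is reduced to the ">4 hours" case, because overnight and weekend gaps normally exceed it. The sketch only knows about phases that write artifacts.

```typescript
// Hypothetical view of the restored session (mirrors `saved_at` / `artifacts_written`).
interface SavedSession {
  savedAt: Date;
  artifactsWritten: string[]; // phase names passed to crewpilot_artifact_write
}

const REFRESH_AFTER_MS = 4 * 60 * 60 * 1000; // the ">4 hours" rule in step 4

// Step 4: only refresh M365 context after a long enough gap.
function needsContextRefresh(session: SavedSession, now: Date): boolean {
  return now.getTime() - session.savedAt.getTime() > REFRESH_AFTER_MS;
}

// Step 6: first planned artifact-producing phase that has not completed yet.
function nextPhase(planned: string[], session: SavedSession): string | undefined {
  const done = new Set(session.artifactsWritten);
  return planned.find((phase) => !done.has(phase));
}

// Example: saved Friday evening after analysis, resumed Monday morning.
const session: SavedSession = {
  savedAt: new Date("2024-05-03T18:00:00Z"),
  artifactsWritten: ["analysis"],
};
const now = new Date("2024-05-06T09:00:00Z");
console.log(needsContextRefresh(session, now)); // true
console.log(nextPhase(["analysis", "rca", "plan"], session)); // rca
```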
130
+
131
+ **Complexity check (direct entry only):** If the user enters autopilot directly, quickly assess whether the request warrants the full pipeline:
132
+ - If the request is trivial (single file, obvious change) → suggest: *"This is a small change. I can implement it directly without the full pipeline. Want me to do that instead?"*
133
+ - If the user says "just do it" → hand off to `feature-builder` (which will handle it as trivial/simple tier).
134
+ - Otherwise → continue with the full pipeline below.
135
+
136
+ **If user provides a task description (not an existing issue number):**
137
+
138
+ 1. Parse the user's request to extract:
139
+ - Title (concise, action-oriented)
140
+ - Description (what needs to be built)
141
+ - Acceptance criteria (bullet list — infer from description if not explicit)
142
+ - Labels (feature, bug, chore — infer from context)
143
+
144
+ <HARD-GATE>
145
+ 2. **HUMAN GATE — Task Creation Confirmation**: Present the inferred task summary to the user BEFORE creating the board issue:
146
+
147
+ ```
148
+ 📋 Before I start, here's what I'll create as a board issue:
149
+
150
+ Title: {title}
151
+ Description: {description}
152
+
153
+ Acceptance Criteria:
154
+ - [ ] {criterion 1}
155
+ - [ ] {criterion 2}
156
+ - [ ] {criterion 3}
157
+
158
+ Labels: {labels}
159
+
160
+ → Create this task and start the pipeline? (yes / edit / no)
161
+ ```
162
+
163
+ - If **yes** → proceed to step 3 (create the issue), then continue to Phase 2
164
+ - If **edit** → user provides corrections, update and re-present
165
+ - If **no** → stop the pipeline. Ask the user what they'd like to do instead.
166
+ - Do NOT create the board issue without explicit user confirmation.
167
+ </HARD-GATE>
168
+
169
+ 3. Call `crewpilot_board_create` with title, description, acceptance criteria
170
+ 4. Note the created issue ID
171
+
172
+ **If user provides an existing issue number (e.g., "#42"):**
173
+
174
+ 1. Call `crewpilot_board_get` to read the existing issue
175
+ 2. Use its title, description, and acceptance criteria as-is
176
+ 3. No confirmation needed — the task already exists
177
+
178
+ ### Phase 2 — Codebase Analysis & Planning
179
+
180
+ 1. Read the project structure — scan key files (package.json, tsconfig, src/ layout, existing patterns)
181
+ 2. Identify:
182
+ - Which files need to be **created**
183
+ - Which files need to be **modified**
184
+ - What patterns/conventions the codebase follows (naming, directory structure, test style)
185
+ - What dependencies might be needed
186
+ 3. Check issue labels for `needs-design`, `needs-architecture`, `bug`/`defect`/`regression`, and `needs-threat-model`/`security-sensitive`
187
+ 4. **Query pattern knowledge** via `crewpilot_knowledge_search` (type: `pattern`):
188
+ - Search for known patterns and anti-patterns in the files being modified
189
+ - Search for past root causes in the same area of the codebase
190
+ - Collect any "repeat offender" warnings from previous runs
191
+ - Feed this context into the plan so the worker avoids known mistakes
192
+ 5. **(Optional) Fetch M365 requirements context**: First call `mcp_workiq_accept_eula` with `eulaUrl: "https://github.com/microsoft/work-iq-mcp"` (idempotent), then use **focused queries** to surface requirements context before planning:
193
+ - **Requirements & specs**: `mcp_workiq_ask_work_iq` → "Find emails, documents, and Teams messages about: {issue title}. Summarize relevant discussions, specs, and design docs."
194
+ - **Meeting decisions**: `mcp_workiq_ask_work_iq` → "What decisions were made about {issue title / feature name} in recent meetings? What requirements were stated?"
195
+ - **Stakeholder expectations**: `mcp_workiq_ask_work_iq` → "What did stakeholders or customers say about {feature} in recent emails or meetings? What was promised or committed?"
196
+ - Feed the M365 context into the analysis artifact so Phase 3's plan addresses stated requirements, not just the issue description.
197
+ - If `mcp_workiq_ask_work_iq` is unavailable, skip — this step is optional.
198
+ 6. Call `crewpilot_worker_start` with the issue ID and title
199
+ 7. **Write artifact**: Call `crewpilot_artifact_write` with `workflow_id={issue_id}`, `phase="analysis"` containing:
200
+ - Files to create/modify
201
+ - Codebase patterns discovered
202
+ - Dependencies needed
203
+ - Label-gated phases to run
204
+ - Known patterns/anti-patterns from knowledge search
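Step 3's label check decides which optional phases run. This is a minimal sketch, and it assumes case-insensitive label matching, which the skill does not state. The order follows this document: design before architecture, then RCA, then threat modeling.

```typescript
type GatedPhase =
  | "solution-design"       // Phase 2.5
  | "architecture-planner"  // Phase 2.5
  | "root-cause-analysis"   // Phase 2.5c
  | "threat-model";         // Phase 2.5d

// Trigger labels as listed in step 3; rule order is the run order.
const LABEL_RULES: Array<[GatedPhase, string[]]> = [
  ["solution-design", ["needs-design"]],
  ["architecture-planner", ["needs-architecture"]],
  ["root-cause-analysis", ["bug", "defect", "regression"]],
  ["threat-model", ["needs-threat-model", "security-sensitive"]],
];

// Returns the optional phases to run, in order; empty means go straight to Phase 3.
function gatedPhases(labels: string[]): GatedPhase[] {
  const have = new Set(labels.map((l) => l.toLowerCase())); // assumption: case-insensitive
  return LABEL_RULES.filter(([, triggers]) => triggers.some((t) => have.has(t))).map(
    ([phase]) => phase,
  );
}

console.log(gatedPhases(["needs-architecture", "Needs-Design", "feature"]));
// [ 'solution-design', 'architecture-planner' ]
```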
205
+
206
+ ### Phase 2.5 — Design & Architecture (label-gated)
207
+
208
+ **Skip this phase entirely if the issue has neither `needs-design` nor `needs-architecture` label.**
209
+
210
+ Check the issue labels (from `crewpilot_board_get`). Run the applicable skills:
211
+
212
+ #### If issue has `needs-design` label:
213
+
214
+ **Load and follow** `.github/skills/strategize-solution-design/SKILL.md`:
215
+
216
+ 1. Frame the problem — restate in one sentence with constraints
217
+ 2. Generate 3-4 distinct approaches with strengths, risks, and effort
218
+ 3. Build a trade-off matrix comparing all options
219
+ 4. Present to user:
220
+
221
+ ```
222
+ 📐 Design Phase for: "{issue title}"
223
+
224
+ {trade-off matrix}
225
+
226
+ Recommendation: {option} (Confidence: {N}/10)
227
+ Reversal cost: {Low/Medium/High}
228
+
229
+ → Which approach? (A / B / C / edit)
230
+ ```
231
+
232
+ 5. **HUMAN GATE**: User picks an approach
233
+ 6. Store the decision via `crewpilot_knowledge_store` (type: decision)
234
+ 7. Write the design document to `docs/design/{issue_id}-{slug}.md`:
235
+ ```markdown
236
+ # Design: {issue title}
237
+
238
+ **Issue**: #{id}
239
+ **Date**: {date}
240
+ **Decision**: {chosen option}
241
+
242
+ ## Problem
243
+ {one-sentence problem statement}
244
+
245
+ ## Options Considered
246
+ {options with strengths/risks/effort}
247
+
248
+ ## Trade-off Matrix
249
+ {matrix}
250
+
251
+ ## Decision
252
+ {chosen option with rationale}
253
+ Confidence: {N}/10 | Reversal cost: {Low/Medium/High}
254
+ ```
255
+ 8. Stage the design doc — it will be committed alongside the code in Phase 5
256
+ 9. **Write artifact**: Call `crewpilot_artifact_write` with `workflow_id={issue_id}`, `phase="design"` containing the chosen approach, trade-off summary, and design document path
257
+
258
+ #### If issue has `needs-architecture` label:
259
+
260
+ **Load and follow** `.github/skills/strategize-architecture-planner/SKILL.md`:
261
+
262
+ 1. Define scope — system boundaries, actors, quality attributes
263
+ 2. Decompose into components with responsibilities and interfaces
264
+ 3. Trace the primary data flow through the system
265
+ 4. Create an implementation roadmap with milestones
266
+ 5. Present to user:
267
+
268
+ ```
269
+ 📐 Architecture for: "{issue title}"
270
+
271
+ Components:
272
+ | Component | Responsibility | Interface | Dependencies |
273
+ |-----------|---------------|-----------|-------------|
274
+ | ... | ... | ... | ... |
275
+
276
+ Data Flow:
277
+ 1. {step} → {step} → {step}
278
+
279
+ → Approve architecture? (yes / edit)
280
+ ```
281
+
282
+ 6. **HUMAN GATE**: User approves the architecture
283
+ 7. Store as knowledge (type: decision)
284
+ 8. Write the ADR to `docs/adr/{NNN}-{slug}.md`:
285
+ ```markdown
286
+ # ADR-{NNN}: {title}
287
+
288
+ ## Status: Accepted
289
+ ## Context
290
+ {why this design was needed}
291
+ ## Decision
292
+ {what was decided — components, data flow, interfaces}
293
+ ## Consequences
294
+ {positive and negative trade-offs}
295
+ ## Alternatives Considered
296
+ {rejected options and why}
297
+ ```
298
+ 9. Stage the ADR — it will be committed alongside the code in Phase 5
299
+ 10. **Write artifact**: Call `crewpilot_artifact_write` with `workflow_id={issue_id}`, `phase="architecture"` containing the component decomposition, data flow, interfaces, and ADR path
300
+
301
+ #### If issue has BOTH labels:
302
+
303
+ Run `needs-design` first (pick the approach), then `needs-architecture` (detail the design).
304
+ The design decision feeds into the architecture — e.g., "we chose Redis" → architecture shows CacheService component, middleware chain, config interface.
305
+
306
+ ### Phase 2.5c — Root Cause Analysis (label-gated)
307
+
308
+ **Skip if the issue does NOT have a `bug`, `defect`, or `regression` label.**
309
+
310
+ **Load and follow** `.github/skills/engineer-root-cause-analysis/SKILL.md` methodology:
311
+
312
+ 1. **Symptom collection**:
313
+ - Extract error message, stack trace, steps to reproduce from the issue description
314
+ - Run `crewpilot_git_log` on the affected files to check recent changes
315
+ - Query `crewpilot_knowledge_search` for previous root causes in the same area
316
+ 2. **Hypothesis generation** — generate 2-3 ranked hypotheses:
317
+
318
+ ```
319
+ 🔍 RCA for: "{issue title}"
320
+
321
+ | # | Hypothesis | Likelihood | Evidence | Test Strategy |
322
+ |---|---|---|---|---|
323
+ | H1 | {most likely} | High | {evidence} | {how to test} |
324
+ | H2 | {alternative} | Medium | {evidence} | {how to test} |
325
+ | H3 | {edge case} | Low | {evidence} | {how to test} |
326
+ ```
327
+
328
+ 3. **Systematic elimination** — for each hypothesis (highest first):
329
+ - Run `crewpilot_exec` to test (add logging, reproduce, check state)
330
+ - Record result: confirmed / eliminated / narrowed
331
+ - Max 5 attempts total (a circuit breaker like Phase 4's, which allows 3 attempts per step)
332
+ 4. **Root cause identification**:
333
+ - State in one sentence
334
+ - Causal chain: trigger → intermediate effects → symptom
335
+ - Design gap: WHY the code was vulnerable
336
+ 5. **Feed into Phase 3 plan**:
337
+ - The plan must fix the root cause, not just the symptom
338
+ - Include a regression test that fails without the fix
339
+ - Phase 5 commit footer: `Root-cause: {one-sentence description}`
340
+ 6. **Store root cause** via `crewpilot_knowledge_store` (type: `root-cause`):
341
+ - What: the root cause description
342
+ - Where: affected files/modules
343
+ - Why: the design gap
344
+ - Prevention: what would have caught this earlier
345
+ 7. **Write artifact**: Call `crewpilot_artifact_write` with `workflow_id={issue_id}`, `phase="rca"` containing the root cause, causal chain, design gap, prevention strategy, and affected files
346
+ 8. **If root cause reveals a systemic issue**, flag it for pattern detection in Phase 6:
347
+ - Add note: `systemic:{description}` for Phase 6 to pick up
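The elimination loop in step 3 can be sketched as follows. Several parts are assumptions:

- the numeric `likelihood`;
- the synchronous `test` callback, which stands in for `crewpilot_exec` experiments;
- keeping a "narrowed" hypothesis under test until the 5-attempt budget is spent.

```typescript
type Outcome = "confirmed" | "eliminated" | "narrowed";

interface Hypothesis {
  id: string;
  likelihood: number;  // higher is tested first (hypothetical numeric ranking)
  test: () => Outcome; // stand-in for one experiment run via crewpilot_exec
}

interface RcaResult {
  rootCause?: string; // id of the confirmed hypothesis, if any
  attempts: number;
  log: Array<[string, Outcome]>;
}

const MAX_RCA_ATTEMPTS = 5; // circuit breaker across all hypotheses

function eliminate(hypotheses: Hypothesis[]): RcaResult {
  const ordered = [...hypotheses].sort((a, b) => b.likelihood - a.likelihood);
  const log: Array<[string, Outcome]> = [];
  let attempts = 0;
  for (const h of ordered) {
    let outcome = "narrowed" as Outcome;
    // A "narrowed" result keeps the same hypothesis under test.
    while (outcome === "narrowed" && attempts < MAX_RCA_ATTEMPTS) {
      attempts++;
      outcome = h.test();
      log.push([h.id, outcome]);
    }
    if (outcome === "confirmed") return { rootCause: h.id, attempts, log };
    if (attempts >= MAX_RCA_ATTEMPTS) break; // budget spent: escalate
  }
  return { attempts, log };
}

// Example usage: H1 is ruled out, H2 is confirmed on the second attempt overall.
const result = eliminate([
  { id: "H1", likelihood: 3, test: () => "eliminated" },
  { id: "H2", likelihood: 2, test: () => "confirmed" },
  { id: "H3", likelihood: 1, test: () => "eliminated" },
]);
console.log(result.rootCause, result.attempts); // H2 2
```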
348
+
349
+ ### Phase 2.5d — Threat Modeling (label-gated)
350
+
351
+ **Skip if the issue does NOT have a `needs-threat-model` or `security-sensitive` label.**
352
+
353
+ **Load and follow** `.github/skills/assure-threat-model/SKILL.md` methodology:
354
+
355
+ 1. **Read prior artifacts**: Load the `analysis` artifact (and `architecture` if it exists) to understand the system being built
356
+ 2. **Scope the model**: Define the trust boundaries and data flows for the feature being implemented
357
+ 3. **STRIDE analysis**: For each component and data flow crossing a trust boundary, evaluate all 6 STRIDE categories
358
+ 4. **Risk assessment**: Score each threat (Likelihood × Impact = Risk)
359
+ 5. **Mitigation planning**: For threats with risk ≥ 7, propose specific mitigations with effort and implementation phase
360
+ 6. **Present to user**:
361
+
362
+ ```
363
+ 🛡️ Threat Model for: "{issue title}"
364
+
365
+ | ID | STRIDE | Component | Threat | Risk Score | Mitigation |
366
+ |----|--------|-----------|--------|------------|------------|
367
+ | T1 | ... | ... | ... | ... | ... |
368
+
369
+ Critical threats: {count}
370
+ Required mitigations before implementation: {list}
371
+
372
+ → Approve threat model? (yes / edit)
373
+ ```
374
+
375
+ 7. **HUMAN GATE**: User approves the threat model
376
+ 8. Store via `crewpilot_knowledge_store` (type: `threat-model`)
377
+ 9. **Write artifact**: Call `crewpilot_artifact_write` with `workflow_id={issue_id}`, `phase="threat-model"` containing the full threat register
378
+ 10. Feed critical/high-risk mitigations into Phase 3 plan as mandatory implementation steps
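Risk scoring and the mitigation cut-off can be sketched as below. The sketch assumes integer likelihood and impact scores, for example on a 1 to 5 scale. The actual scale is defined by the threat-model skill, not here. Type and function names are hypothetical.

```typescript
type Stride =
  | "Spoofing"
  | "Tampering"
  | "Repudiation"
  | "Information disclosure"
  | "Denial of service"
  | "Elevation of privilege";

interface Threat {
  id: string;
  stride: Stride;
  component: string;
  likelihood: number; // assumed integer scale, e.g. 1 to 5
  impact: number;     // same assumed scale
}

const MITIGATION_THRESHOLD = 7; // "risk ≥ 7" from step 5

const riskScore = (t: Threat): number => t.likelihood * t.impact;

// Threats at or above the threshold become mandatory plan steps, highest risk first.
function mandatoryMitigations(threats: Threat[]): Threat[] {
  return threats
    .filter((t) => riskScore(t) >= MITIGATION_THRESHOLD)
    .sort((a, b) => riskScore(b) - riskScore(a));
}

// Example usage
const register: Threat[] = [
  { id: "T1", stride: "Spoofing", component: "login API", likelihood: 2, impact: 3 },
  { id: "T2", stride: "Information disclosure", component: "export job", likelihood: 3, impact: 4 },
];
for (const t of mandatoryMitigations(register)) {
  console.log(`${t.id} (${t.stride}, risk ${riskScore(t)}) -> mandatory plan step`);
}
// prints: T2 (Information disclosure, risk 12) -> mandatory plan step
```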
379
+
380
+ #### After design/architecture/RCA/threat-model phases:
381
+
382
+ The design documents, RCA findings, and threat model inform the implementation plan. Phase 3's plan should reference:
383
+ - Which approach was chosen (from design doc)
384
+ - Which components to build (from architecture)
385
+ - Which interfaces to implement (from ADR)
386
+ - What root cause was found (from RCA) and what fix addresses it
387
+ - What threats were identified (from threat model) and what mitigations are required
388
+
389
+ **Read prior artifacts**: Call `crewpilot_artifact_read` to load the `analysis`, `design`, `architecture`, `rca`, and/or `threat-model` artifacts. These contain the full context from earlier phases — do not rely on chat history alone.
390
+
391
+ ### Phase 3 — HUMAN GATE: Plan Approval
392
+
393
+ <HARD-GATE>
394
+ Do NOT proceed to implementation until the user has explicitly approved the plan.
395
+ Do NOT skip this gate for any reason, regardless of perceived simplicity.
396
+ If the user says "just do it" without seeing the plan, present the plan anyway.
397
+ </HARD-GATE>
398
+
399
+ **STOP HERE. Present the plan to the user:**
400
+
401
+ ```
402
+ 📋 Autopilot Plan for: "{issue title}"
403
+
404
+ Issue: #{id} on {board provider}
405
+ {if design doc exists: "Design: docs/design/{file}.md"}
406
+ {if ADR exists: "Architecture: docs/adr/{file}.md"}
407
+
408
+ Steps:
409
+ 1. {step description}
410
+ 2. {step description}
411
+ ...
412
+
413
+ Files to change:
414
+ - {path} (create/modify)
415
+ - {path} (create/modify)
416
+
417
+ Complexity: {trivial|simple|moderate|complex}
418
+
419
+ Approve? (yes / edit / cancel)
420
+ ```
421
+
422
+ - If **yes** → call `crewpilot_worker_approve`, continue to Phase 4
423
+ - If **edit** → user provides changes, update plan, re-present
424
+ - If **cancel** → call `crewpilot_worker_fail`, stop
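The gate behaves like a small state machine in which only an explicit **yes** produces an approved plan. The hypothetical TypeScript sketch below illustrates this. The `Plan`, `Reply`, and `GateResult` types are illustrative, and the tool calls appear as comments rather than being invoked.

```typescript
interface Plan {
  steps: string[];
  approved: boolean;
}

// The three replies the gate accepts; anything else should be re-asked.
type Reply =
  | { kind: "yes" }
  | { kind: "edit"; revisedSteps: string[] }
  | { kind: "cancel" };

type GateResult =
  | { next: "phase-4"; plan: Plan }    // caller then invokes crewpilot_worker_approve
  | { next: "re-present"; plan: Plan } // revised plan must be shown again
  | { next: "stop"; reason: string };  // caller then invokes crewpilot_worker_fail

function planGate(plan: Plan, reply: Reply): GateResult {
  if (reply.kind === "yes") {
    return { next: "phase-4", plan: { ...plan, approved: true } };
  }
  if (reply.kind === "edit") {
    // An edited plan is a new plan: it is never approved implicitly.
    return { next: "re-present", plan: { steps: reply.revisedSteps, approved: false } };
  }
  return { next: "stop", reason: "User cancelled at plan approval" };
}
```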
425
+
426
+ **Write artifact**: After approval, call `crewpilot_artifact_write` with `workflow_id={issue_id}`, `phase="plan"` containing the approved plan (steps, files, complexity).
427
+
428
+ **Session checkpoint**: After plan approval, call `crewpilot_session_save` with `status="checkpoint"`, `phase="phase-3-approved"`, and the current context. This ensures the approved plan can be resumed if the session is interrupted.
429
+
430
+ ### Phase 4 — Branch & Implementation
431
+
432
+ **Read prior artifacts**: Call `crewpilot_artifact_read` for `plan` (and `analysis`, `design`, `architecture`, `rca`, `threat-model` if they exist) to load the full execution context.
433
+
434
+ 1. Call `crewpilot_worker_branch` to create feature branch
435
+ 2. Call `crewpilot_board_move` to set issue status to "in-progress"
436
+ 3. **For each step in the plan:**
437
+ a. Implement the code change (create/modify files)
438
+ b. Follow existing codebase patterns discovered in Phase 2
439
+ c. After each logical unit, run `crewpilot_exec("npm test")` or equivalent to verify nothing is broken
440
+ d. If tests fail, diagnose and fix (max 3 attempts per step — circuit breaker)
441
+ 4. Write tests for new code:
442
+ - Match existing test framework and conventions
443
+ - Cover happy path + key edge cases
444
+ - Run tests to confirm they pass
445
+
446
+ **Circuit breaker:** If any step fails 3 times consecutively:
447
+ - Call `crewpilot_board_comment` with details of the failure
448
+ - Call `crewpilot_worker_fail` with reason
449
+ - Tell the user what went wrong and which step is stuck
450
+ - STOP. Do not continue.
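Here is a sketch of the per-step circuit breaker. It assumes a hypothetical synchronous `step` callback that applies a fix and reruns the tests (in practice via `crewpilot_exec`). The breaker stops after three consecutive failed attempts and keeps the last error for the board comment.

```typescript
type StepOutcome = { ok: true } | { ok: false; error: string };

interface BreakerResult {
  ok: boolean;
  attempts: number;
  lastError?: string; // used for the crewpilot_board_comment / crewpilot_worker_fail reason
}

const MAX_ATTEMPTS = 3;

// Runs one plan step; each failed attempt is followed by another try
// (a real run would diagnose and fix in between). After MAX_ATTEMPTS
// consecutive failures the breaker trips and the caller must escalate.
function runWithBreaker(step: (attempt: number) => StepOutcome): BreakerResult {
  let lastError: string | undefined;
  for (let attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
    const outcome = step(attempt);
    if (outcome.ok) return { ok: true, attempts: attempt };
    lastError = outcome.error;
  }
  return { ok: false, attempts: MAX_ATTEMPTS, lastError };
}
```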
451
+
452
+ ### Phase 5 — Change Management (Deliver Skill #1)
453
+
454
+ **Load and follow** `.github/skills/deliver-change-management/SKILL.md` methodology:
455
+
456
+ 1. Run `crewpilot_git_diff` to analyze all changes
457
+ 2. Categorize changes by type: `feat`, `fix`, `refactor`, `test`, `docs`, `chore`
458
+ 3. **If changes span multiple logical units** (e.g., new feature + test + config):
459
+ - Split into separate commits with `crewpilot_git_stage` per group
460
+ - Each commit gets its own conventional message
461
+ - Example:
462
+ ```
463
+ git add src/feature.ts
464
+ → feat(scope): add feature X (closes #ID)
465
+
466
+ git add tests/feature.test.ts
467
+ → test(scope): add tests for feature X
468
+
469
+ git add docs/api.md
470
+ → docs(scope): update API docs for feature X
471
+ ```
472
+ 4. **If changes are a single logical unit**, create one commit:
473
+ - Format: `feat(scope): description (closes #ID)`
474
+ - Body: what was implemented and why
475
+ - Footer: `Closes #ID`
476
+ 5. Call `crewpilot_git_stage` and `crewpilot_git_commit` for each logical commit
477
+ 6. **Write artifact**: Call `crewpilot_artifact_write` with `workflow_id={issue_id}`, `phase="change-mgmt"` containing the list of commits created (hash, type, scope, message)
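The grouping in step 3 can be approximated from file paths. The TypeScript sketch below uses path heuristics that are assumptions, since a real split should follow the diff content rather than file names. It builds Conventional Commits headers and adds the issue reference only to `feat`/`fix` commits, which matches the step 3 example.

```typescript
type CommitType = "feat" | "fix" | "refactor" | "test" | "docs" | "chore";
type PrimaryType = "feat" | "fix" | "refactor";

// Assumed path heuristics; anything unmatched gets the primary change type.
function typeForPath(path: string, primary: PrimaryType): CommitType {
  if (/(^|\/)(tests?|__tests__)\/|\.(test|spec)\.[jt]sx?$/.test(path)) return "test";
  if (/(^|\/)docs\/|\.md$/i.test(path)) return "docs";
  if (/(^|\/)(package(-lock)?\.json|tsconfig\.json)$/.test(path)) return "chore";
  return primary;
}

// One group per commit type; each group becomes one staged commit.
function groupByType(paths: string[], primary: PrimaryType): Map<CommitType, string[]> {
  const groups = new Map<CommitType, string[]>();
  for (const p of paths) {
    const t = typeForPath(p, primary);
    groups.set(t, [...(groups.get(t) ?? []), p]);
  }
  return groups;
}

// Conventional Commits header: type(scope): description [(closes #ID)]
function commitHeader(type: CommitType, scope: string, description: string, issueId?: string | number): string {
  const closes = issueId !== undefined && (type === "feat" || type === "fix") ? ` (closes #${issueId})` : "";
  return `${type}(${scope}): ${description}${closes}`;
}
```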
478
+
479
+ ### Phase 5b — Doc Governance (Deliver Skill #2)
480
+
481
+ **Load and follow** `.github/skills/deliver-doc-governance/SKILL.md` methodology:
482
+
483
+ 1. Check if the changes affect any **public interfaces**:
484
+ - New/changed API endpoints
485
+ - New/changed CLI commands
486
+ - New/changed configuration options
487
+ - New/changed tool signatures
488
+ - New/changed exports or public functions
489
+ 2. If public interfaces changed, run drift detection:
490
+ - Compare README against actual project structure and features
491
+ - Compare API docs against actual function signatures
492
+ - Check if code examples still work
493
+ - Verify install/setup instructions are still accurate
494
+ 3. **If drift found:**
495
+ - Fix the documentation directly (same branch)
496
+ - Stage and commit: `docs(scope): sync docs with implementation changes`
497
+ - Add a `### Documentation Updated` section to the PR body listing what was synced
498
+ 4. **If no public interfaces changed**, skip — note "No doc changes needed" in the PR body
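For exports, detecting a changed public interface in step 1 is partly mechanical. The sketch below compares export-name sets before and after the change. How the names are extracted (for example with the TypeScript compiler API) is left open. The sketch does not detect a signature change on an export whose name stays the same.

```typescript
interface ExportDiff {
  added: string[];
  removed: string[]; // removed names are also a Breaking Changes Gate concern
}

// Compares a module's public export names before and after the change.
function diffExports(before: Iterable<string>, after: Iterable<string>): ExportDiff {
  const b = new Set(before);
  const a = new Set(after);
  return {
    added: Array.from(a).filter((n) => !b.has(n)).sort(),
    removed: Array.from(b).filter((n) => !a.has(n)).sort(),
  };
}

function publicInterfaceChanged(d: ExportDiff): boolean {
  return d.added.length > 0 || d.removed.length > 0;
}
```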
499
+
500
+ ### Phase 6 — PR Creation & Auto-Review
501
+
502
+ 1. Call `crewpilot_worker_preview_pr` with:
503
+ - Title: primary commit message
504
+ - Body: markdown with sections:
505
+ - **What**: summary of changes
506
+ - **Why**: linked to issue #{ID}
507
+ - **Changes**: list of commits with descriptions
508
+ - **Documentation Updated**: what docs were synced (or "N/A")
509
+ - **How to test**: steps to verify
510
+ - **Checklist**: tests pass, lint clean, types clean, docs synced
511
+ <HARD-GATE>
512
+ 2. **HUMAN GATE**: User reviews the preview — do NOT create the PR until the user approves.
513
+ If the user requests changes, apply them and re-preview. Never skip this gate.
514
+ </HARD-GATE>
515
+ 3. Call `crewpilot_worker_pr` to create the PR
516
+ 4. **Run PR Intelligence** (read `.github/skills/assure-pr-intelligence/SKILL.md`):
517
+ - **Change inventory**: categorize changed files (core, api, test, config, docs)
518
+ - **Risk assessment**: evaluate scope, complexity, blast radius, test coverage, reversibility → Low/Medium/High/Critical risk score
519
+ - **Reviewer guidance**: order files by review priority, flag lines needing attention, list questions the reviewer should ask, note what's missing from the PR
520
+ - **Merge readiness checklist**: tests pass, security clean, breaking changes documented, PR description matches changes
521
+ - Post the full PR Intelligence report as a **comment on the PR** so the assigned reviewer sees it immediately
522
+ 5. Read the diff of the PR
523
+ 6. **Subagent delegation (recommended for moderate/complex changes):** Use `crewpilot_dispatch_subagent` to delegate review work in parallel:
524
+ - Delegate the `code-reviewer` role with the diff and file list; it returns correctness, security, and performance findings
525
+ - Delegate the `standards-reviewer` role with the diff and codebase conventions; it returns standards-compliance findings
526
+ - Delegate the `security-auditor` role with source files and architecture context; it returns STRIDE/OWASP findings
527
+ - Each subagent writes its output as an artifact (e.g. `review-functional`, `review-standards`) for traceability
528
+ - Merge subagent findings using `crewpilot_dispatch_consensus` to identify high-confidence vs disputed issues
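This skill does not document how `crewpilot_dispatch_consensus` decides agreement. The following is only an illustration of the idea. It assumes that findings with the same file, line, and category describe the same issue, and that agreement from at least two distinct reviewers makes a finding high-confidence.

```typescript
interface Finding {
  reviewer: string; // e.g. "code-reviewer", "security-auditor"
  file: string;
  line: number;
  category: string; // e.g. "sql-injection"
}

interface Consensus {
  highConfidence: Finding[][]; // same issue raised by >= minAgree reviewers
  disputed: Finding[][];       // raised by fewer reviewers; needs human judgment
}

// Groups findings by (file, line, category), then splits the groups by how
// many distinct reviewers reported them. The matching key is an assumption.
function mergeFindings(findings: Finding[], minAgree = 2): Consensus {
  const groups = new Map<string, Finding[]>();
  for (const f of findings) {
    const key = `${f.file}:${f.line}:${f.category}`;
    const existing = groups.get(key);
    if (existing) existing.push(f);
    else groups.set(key, [f]);
  }
  const result: Consensus = { highConfidence: [], disputed: [] };
  groups.forEach((g) => {
    const reviewers = new Set(g.map((f) => f.reviewer));
    (reviewers.size >= minAgree ? result.highConfidence : result.disputed).push(g);
  });
  return result;
}
```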
529
+
530
+ **Fallback (simple changes):** Run reviews inline without subagent delegation:
531
+ 7. Run **code-quality** review internally (read `.github/skills/assure-code-quality/SKILL.md`):
532
+ - Correctness: does the code do what the acceptance criteria say?
533
+ - Security: any obvious vulnerabilities (SQL injection, XSS, secrets)?
534
+ - Performance: any N+1 queries, await-in-loops, unnecessary re-renders?
535
+ - Style: does it match codebase conventions?
536
+ 7b. Run **vulnerability-scan** internally (read `.github/skills/assure-vulnerability-scan/SKILL.md`):
537
+ - OWASP Top 10 quick check on new code
538
+ - Dependency audit: `npm audit` or `pip-audit`
539
+ 8. Run `crewpilot_exec("npm run lint")` and `crewpilot_exec("npm run typecheck")` if the project defines those scripts (or the equivalent commands for its toolchain)
540
+ 8b. **(Optional) Requirements alignment validation**: If M365 context was fetched in Phase 2, validate the implementation against meeting-stated requirements:
541
+ - Read the `analysis` artifact to retrieve the M365 requirements context captured earlier
542
+ - If the analysis artifact contains meeting decisions or stakeholder expectations, call `mcp_workiq_ask_work_iq` → "What specific requirements and acceptance criteria were stated for {feature} in meetings and emails?"
543
+ - Cross-reference each stated requirement against the implementation diff:
544
+ - **Covered**: the requirement is addressed by the code changes ✓
545
+ - **Partial**: the requirement is partially addressed — flag what's missing
546
+ - **Missing**: the requirement is not addressed at all — flag as a review finding
547
+ - Include requirements alignment in the PR comment:
548
+ ```
549
+ 📋 Requirements Alignment:
550
+ Meeting requirements checked: {N}
551
+ Covered: {count} ✓ | Partial: {count} ⚠️ | Missing: {count} ❌
552
+ {list any partial/missing items}
553
+ ```
554
+ - If critical requirements are missing, flag as a review issue that must be addressed before merge
555
+ 9. **Run diff-scoped pattern detection** (read `.github/skills/insights-pattern-detection/SKILL.md`):
556
+ - Scope: only scan files changed in the diff (NOT full codebase)
557
+ - Check for **consistency** with existing codebase patterns:
558
+ - Error handling style matches project conventions?
559
+ - Data access patterns match?
560
+ - Naming conventions followed?
561
+ - Test structure matches existing tests?
562
+ - Check for **anti-patterns** in changed files:
563
+ - God object/file (single file > 500 lines with mixed responsibilities)
564
+ - Copy-paste (near-duplicate code blocks)
565
+ - Shotgun surgery (small change touching too many files)
566
+ - Primitive obsession (strings/numbers where domain types belong)
567
+ - **Query knowledge base for repeat offenses**:
568
+ - `crewpilot_knowledge_search` type: `pattern` — "has this same anti-pattern been flagged before?"
569
+ - If a repeat offense is found, flag prominently:
570
+ ```
571
+ ⚠️ Recurring Pattern Issue: {description}
572
+ Previously flagged in: {previous context}
573
+ Suggestion: Consider a structural fix.
574
+ ```
575
+ - Run `crewpilot_metrics_complexity` on changed files — flag any function with complexity > threshold
576
+ - Include pattern findings in the PR comment:
577
+ ```
578
+ 🔎 Pattern Detection Results:
579
+ Consistency: {✓ follows codebase patterns | ⚠️ deviations found}
580
+ Anti-patterns: {✓ none | ⚠️ {list}}
581
+ Repeat issues: {✓ none | ⚠️ {count} recurring}
582
+ Complexity: {✓ within threshold | ⚠️ {files} above limit}
583
+ ```
584
+ 10. **If issues found (review, security, or pattern):**
585
+ - Fix them directly
586
+ - Re-commit: `fix(scope): address review findings`
587
+ - Re-push
588
+ - Re-run pattern detection on the fix to confirm resolution
589
+ 11. **Write artifact**: Call `crewpilot_artifact_write` with `workflow_id={issue_id}`, `phase="review-merged"` containing the combined review results (code-quality, vulnerability-scan, pattern detection findings, and fix iterations)
590
+ 12. Call `crewpilot_worker_review_done` with verdict: "approved" and summary
591
+ 13. Call `crewpilot_board_move` to set issue status to "in-review"
592
+ 14. Call `crewpilot_board_comment`: "PR #{pr_number} opened. Ready for review."
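Two parts of the diff-scoped pattern detection are mechanical and can be sketched: the 500-line god-file heuristic and the complexity threshold. `FileMetrics` is a hypothetical shape, because this skill does not show the output format of `crewpilot_metrics_complexity`. The threshold is a parameter since no value is fixed here. A reviewer still has to decide whether a long file really mixes responsibilities.

```typescript
interface FileMetrics {
  path: string;
  lines: number;
  functions: { name: string; complexity: number }[];
}

interface PatternFlags {
  godFileCandidates: string[]; // > 500 lines; "mixed responsibilities" needs human review
  complexFunctions: string[];  // "path#name (complexity)"
}

const GOD_FILE_LINES = 500;

// Only changed files are passed in, which keeps the scan diff-scoped.
function flagChangedFiles(changed: FileMetrics[], complexityThreshold: number): PatternFlags {
  const flags: PatternFlags = { godFileCandidates: [], complexFunctions: [] };
  for (const f of changed) {
    if (f.lines > GOD_FILE_LINES) flags.godFileCandidates.push(f.path);
    for (const fn of f.functions) {
      if (fn.complexity > complexityThreshold) {
        flags.complexFunctions.push(`${f.path}#${fn.name} (${fn.complexity})`);
      }
    }
  }
  return flags;
}
```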
593
+
594
+ ### Phase 7 — Deploy Guard (Deliver Skill #3)
595
+
596
+ **Load and follow** `.github/skills/deliver-deploy-guard/SKILL.md` methodology:
597
+
598
+ Before marking the PR ready to merge, run the 6-gate checklist:
599
+
600
+ 1. **Code Quality Gate**: No leftover TODOs, `console.log` calls, or commented-out code in changed files
601
+ 2. **Test Integrity Gate**: All tests pass, coverage meets threshold, no `.skip` tests
602
+ 3. **Security Gate**: No hardcoded secrets, no critical CVEs, no unsafe patterns
603
+ 4. **Configuration Gate**: Env vars documented, no dev config in prod paths
604
+ 5. **Breaking Changes Gate**: API contracts backward-compatible, no dropped exports
605
+ 6. **Operational Readiness Gate**: Health endpoints, logging, error handling
606
+
607
+ Produce a verdict and include it in the PR comment:
608
+
609
+ ```
610
+ 🛡️ Deploy Guard Results:
611
+ Code Quality: ✓ pass
612
+ Test Integrity: ✓ pass (coverage: 86%)
613
+ Security: ✓ pass
614
+ Configuration: ✓ pass
615
+ Breaking Changes: ✓ pass
616
+ Operational: ✓ pass
617
+
618
+ Verdict: GO ✅
619
+ ```
620
+
621
+ - If **GO** → proceed to Phase 8
622
+ - If **CONDITIONAL** → list warnings in PR comment, proceed (human decides)
623
+ - If **NO-GO** → fix blockers, re-run until GO or escalate to user
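One way to map the six gate results onto the three verdicts is sketched below. The rule is an assumption that matches the outcomes listed above: any failed gate blocks the merge, any warning defers the decision to a human, and only six clean passes give GO. The deploy-guard skill itself may define finer rules.

```typescript
type GateStatus = "pass" | "warn" | "fail";
type Verdict = "GO" | "CONDITIONAL" | "NO-GO";

const GATES = [
  "Code Quality",
  "Test Integrity",
  "Security",
  "Configuration",
  "Breaking Changes",
  "Operational",
] as const;
type Gate = (typeof GATES)[number];

// Record<Gate, ...> forces a result for every one of the six gates.
function deployVerdict(results: Record<Gate, GateStatus>): Verdict {
  const statuses = GATES.map((g) => results[g]);
  if (statuses.some((s) => s === "fail")) return "NO-GO";
  if (statuses.some((s) => s === "warn")) return "CONDITIONAL";
  return "GO";
}
```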
624
+
625
+ **Write artifact**: Call `crewpilot_artifact_write` with `workflow_id={issue_id}`, `phase="deploy-guard"` containing the full 6-gate results and verdict.
626
+
627
+ ### Phase 8 — Completion & Learning
628
+
629
+ 1. Call `crewpilot_board_comment` with the deploy guard results: "All checks passed. Ready to merge." for a GO verdict, or the outstanding warnings for CONDITIONAL
630
+ 2. **Store knowledge** via `crewpilot_knowledge_store`:
631
+ - Decisions made during implementation (type: `decision`)
632
+ - Root cause findings, if this was a bug fix (type: `root-cause`)
633
+ - **Pattern findings** from Phase 6 (type: `pattern`):
634
+ - What patterns were followed or violated
635
+ - Any anti-patterns found and fixed
636
+ - Any repeat offenses detected
637
+ - Complexity hotspots
638
+ - This creates the **self-improvement loop**: future runs query this data in Phase 2 to avoid repeating the same mistakes
639
+ 3. Present final summary to user:
640
+
641
+ ```
642
+ ✅ Autopilot Complete
643
+
644
+ Issue: #{id} — {title}
645
+ Branch: {branch_name}
646
+ PR: #{pr_number}
647
+ Status: Ready to merge
648
+
649
+ Changes:
650
+ - {N} commits across {M} files
651
+ - {file} (created/modified) — {what changed}
652
+
653
+ Deliver Pipeline:
654
+ Change Mgmt: {N} conventional commits (feat/fix/test/docs)
655
+ Doc Sync: {updated | no changes needed}
656
+ Deploy Guard: {GO | CONDITIONAL — warnings}
657
+
658
+ {if bug fix:}
659
+ Root Cause: {one-sentence root cause}
660
+ Design Gap: {why it was vulnerable}
661
+ Prevention: {what would catch this earlier}
662
+
663
+ Tests: {X} passing | Coverage: {Y}%
664
+ Review: Auto-reviewed — code-quality + vulnerability-scan
665
+ Security: No issues found
666
+ Patterns: {✓ clean | ⚠️ {count} findings — stored for future runs}
667
+ Repeat Issues: {none | {count} recurring patterns detected}
668
+
669
+ → Merge when ready. Board will auto-update on close.
670
+ ```
671
+
672
+ 4. **Write artifact**: Call `crewpilot_artifact_write` with `workflow_id={issue_id}`, `phase="completion"` containing the final summary (PR number, branch, commits, review/deploy-guard results, knowledge stored)
673
+ 5. Call `crewpilot_worker_complete`
674
+
675
+ ### Capability Hints (on completion)
676
+
677
+ After presenting the final summary, append **one** contextual hint based on the session. Show each hint at most once per session.
678
+
679
+ | Context | Hint |
680
+ |---|---|
681
+ | First time user ran autopilot | 💡 *I can also parse meeting transcripts into user stories and epics — say "parse meeting" with your notes.* |
682
+ | Multiple autopilot runs completed | 💡 *I can generate a daily digest summarizing all your work — say "daily digest" or "eod report".* |
683
+ | Knowledge was stored during this run | 💡 *I remember decisions across sessions. Ask "what did we decide about X" anytime to recall.* |
684
+ | Pattern issues were detected | 💡 *I can run a full codebase health scan for anti-patterns and tech debt — say "codebase health".* |
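The once-per-session rule can be sketched as a lookup over the table rows that skips hints already shown. The `SessionContext` fields and `HintKey` names are hypothetical. Checking the rows in table order is an assumption, since the table does not define a priority.

```typescript
interface SessionContext {
  firstAutopilotRun: boolean;
  completedRuns: number;
  storedKnowledge: boolean;
  patternIssuesFound: boolean;
}

type HintKey = "parse-meeting" | "daily-digest" | "recall-decisions" | "codebase-health";

// One rule per table row, checked in table order (assumed priority).
const HINT_RULES: { key: HintKey; applies: (c: SessionContext) => boolean }[] = [
  { key: "parse-meeting", applies: (c) => c.firstAutopilotRun },
  { key: "daily-digest", applies: (c) => c.completedRuns > 1 },
  { key: "recall-decisions", applies: (c) => c.storedKnowledge },
  { key: "codebase-health", applies: (c) => c.patternIssuesFound },
];

// Returns at most one hint and remembers it so it is not repeated this session.
function pickHint(ctx: SessionContext, shownThisSession: Set<HintKey>): HintKey | undefined {
  const rule = HINT_RULES.find((r) => r.applies(ctx) && !shownThisSession.has(r.key));
  if (rule === undefined) return undefined;
  shownThisSession.add(rule.key);
  return rule.key;
}
```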
685
+
686
+ ## Output Format
687
+
688
+ Always use the structured format shown in each phase. Lead with the status emoji:
689
+ - 📋 = planning
690
+ - ⚠️ = waiting for approval
691
+ - 🔨 = implementing
692
+ - 🔍 = reviewing
693
+ - ✅ = done
694
+ - ✗ = failed
695
+
696
+ ## Anti-Patterns
697
+
698
+ <HARD-GATE>
699
+ - Do NOT skip the human gate (Phase 3). The plan MUST be shown and approved.
700
+ - Do NOT auto-merge the PR. Only humans merge.
701
+ - Do NOT bypass the PR preview gate (Phase 6). The user MUST see the preview.
702
+ </HARD-GATE>
703
+ - Do NOT continue after 3 consecutive failures on a step. Escalate to human.
704
+ - Do NOT install new dependencies without mentioning them in the plan.
705
+ - Do NOT modify files outside the scope of the plan without asking.
706
+ - Do NOT generate placeholder/stub code. Every file must be functional.
707
+ - Do NOT skip tests. If the project has a test framework, write tests.
708
+
709
+ ## No Placeholders
710
+
711
+ Every step in the Phase 3 plan and every file produced in Phase 4 must contain real, working content. The following are **plan failures** — never write them:
712
+
713
+ | Forbidden Pattern | Why It Fails |
714
+ |---|---|
715
+ | "TBD", "TODO", "implement later" | Defers work that should be done now |
716
+ | "Add appropriate error handling" | Vague — specify which errors and how to handle them |
717
+ | "Add validation" | Which inputs? What rules? What error messages? |
718
+ | "Handle edge cases" | Name the edge cases or don't mention them |
719
+ | "Write tests for the above" | Show the actual test code |
720
+ | "Similar to Phase N" | Repeat the details — context resets between phases |
721
+ | Steps without code blocks | If a step changes code, show the code |
722
+ | References to undefined types/functions | Every symbol must trace back to an earlier step |
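Some of these patterns can be caught with a rough textual check before the plan is shown. The sketch below is a hypothetical helper that flags only the wording-based rows of the table. "Steps without code blocks" and "References to undefined types/functions" need real analysis and are not covered.

```typescript
// Wording-based checks for the forbidden patterns; regexes are approximations.
const FORBIDDEN: { pattern: RegExp; label: string }[] = [
  { pattern: /\b(TBD|TODO)\b|implement later/i, label: "deferred work" },
  { pattern: /add appropriate error handling/i, label: "vague error handling" },
  { pattern: /\badd validation\b/i, label: "unspecified validation" },
  { pattern: /handle edge cases/i, label: "unnamed edge cases" },
  { pattern: /write tests for the above/i, label: "tests not shown" },
  { pattern: /similar to phase \d/i, label: "cross-phase reference" },
];

// Returns one "line N: label" entry per match, in line order.
function findPlaceholders(planText: string): string[] {
  const problems: string[] = [];
  planText.split("\n").forEach((line, i) => {
    for (const { pattern, label } of FORBIDDEN) {
      if (pattern.test(line)) problems.push(`line ${i + 1}: ${label}`);
    }
  });
  return problems;
}
```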
723
+
724
+ ## Chains To
725
+
726
+ - `solution-design` — Phase 2.5: generate solution design doc when `needs-design` label detected
727
+ - `architecture-planner` — Phase 2.5: generate ADR when `needs-architecture` label detected
728
+ - `root-cause-analysis` — Phase 2.5c: systematic RCA when `bug`/`defect`/`regression` label detected
729
+ - `threat-model` — Phase 2.5d: STRIDE threat modeling when `needs-threat-model`/`security-sensitive` label detected
730
+ - `change-management` — Phase 5: proper conventional commits with multi-commit splitting
731
+ - `doc-governance` — Phase 5b: auto-detect and fix documentation drift
732
+ - `pr-intelligence` — Phase 6: risk assessment + reviewer guidance posted on PR
733
+ - `code-quality` — Phase 6: multi-pass review of the PR
734
+ - `vulnerability-scan` — Phase 6: security audit of new code
735
+ - `pattern-detection` — Phase 2 (query known patterns) + Phase 6 (diff-scoped scan) + Phase 8 (store findings)
736
+ - `deploy-guard` — Phase 7: 6-gate safety check before marking ready to merge
737
+ - `knowledge-base` — Phase 2, 2.5c, 6, 8: the memory hub that powers the self-improvement loop