@schilling.mark.a/software-methodology 1.0.0

Files changed (77)
  1. package/.github/copilot-instructions.md +106 -0
  2. package/LICENSE +21 -0
  3. package/README.md +174 -0
  4. package/atdd-workflow/SKILL.md +117 -0
  5. package/atdd-workflow/references/green-phase.md +38 -0
  6. package/atdd-workflow/references/red-phase.md +62 -0
  7. package/atdd-workflow/references/refactor-phase.md +75 -0
  8. package/bdd-specification/SKILL.md +88 -0
  9. package/bdd-specification/references/example-mapping.md +105 -0
  10. package/bdd-specification/references/gherkin-patterns.md +214 -0
  11. package/cicd-pipeline/SKILL.md +64 -0
  12. package/cicd-pipeline/references/deployment-rollback.md +176 -0
  13. package/cicd-pipeline/references/environment-promotion.md +159 -0
  14. package/cicd-pipeline/references/pipeline-stages.md +198 -0
  15. package/clean-code/SKILL.md +77 -0
  16. package/clean-code/references/behavioral-patterns.md +329 -0
  17. package/clean-code/references/creational-patterns.md +197 -0
  18. package/clean-code/references/enterprise-patterns.md +334 -0
  19. package/clean-code/references/solid.md +230 -0
  20. package/clean-code/references/structural-patterns.md +238 -0
  21. package/continuous-improvement/SKILL.md +69 -0
  22. package/continuous-improvement/references/measurement.md +133 -0
  23. package/continuous-improvement/references/process-update.md +118 -0
  24. package/continuous-improvement/references/root-cause-analysis.md +144 -0
  25. package/dist/atdd-workflow.skill +0 -0
  26. package/dist/bdd-specification.skill +0 -0
  27. package/dist/cicd-pipeline.skill +0 -0
  28. package/dist/clean-code.skill +0 -0
  29. package/dist/continuous-improvement.skill +0 -0
  30. package/dist/green-implementation.skill +0 -0
  31. package/dist/product-strategy.skill +0 -0
  32. package/dist/story-mapping.skill +0 -0
  33. package/dist/ui-design-system.skill +0 -0
  34. package/dist/ui-design-workflow.skill +0 -0
  35. package/dist/ux-design.skill +0 -0
  36. package/dist/ux-research.skill +0 -0
  37. package/docs/INTEGRATION.md +229 -0
  38. package/docs/QUICKSTART.md +126 -0
  39. package/docs/SHARING.md +828 -0
  40. package/docs/SKILLS.md +296 -0
  41. package/green-implementation/SKILL.md +155 -0
  42. package/green-implementation/references/angular-patterns.md +239 -0
  43. package/green-implementation/references/common-rejections.md +180 -0
  44. package/green-implementation/references/playwright-patterns.md +321 -0
  45. package/green-implementation/references/rxjs-patterns.md +161 -0
  46. package/package.json +57 -0
  47. package/product-strategy/SKILL.md +71 -0
  48. package/product-strategy/references/business-model-canvas.md +199 -0
  49. package/product-strategy/references/canvas-alignment.md +108 -0
  50. package/product-strategy/references/value-proposition-canvas.md +159 -0
  51. package/project-templates/context.md.template +56 -0
  52. package/project-templates/test-strategy.md.template +87 -0
  53. package/story-mapping/SKILL.md +104 -0
  54. package/story-mapping/references/backbone.md +66 -0
  55. package/story-mapping/references/release-planning.md +92 -0
  56. package/story-mapping/references/task-template.md +78 -0
  57. package/story-mapping/references/walking-skeleton.md +63 -0
  58. package/ui-design-system/SKILL.md +48 -0
  59. package/ui-design-system/references/accessibility.md +134 -0
  60. package/ui-design-system/references/components.md +257 -0
  61. package/ui-design-system/references/design-tokens.md +209 -0
  62. package/ui-design-system/references/layout.md +136 -0
  63. package/ui-design-system/references/typography.md +114 -0
  64. package/ui-design-workflow/SKILL.md +90 -0
  65. package/ui-design-workflow/references/acceptance-targets.md +144 -0
  66. package/ui-design-workflow/references/component-selection.md +108 -0
  67. package/ui-design-workflow/references/scenario-to-ui.md +151 -0
  68. package/ui-design-workflow/references/screen-flows.md +116 -0
  69. package/ux-design/SKILL.md +75 -0
  70. package/ux-design/references/information-architecture.md +144 -0
  71. package/ux-design/references/interaction-patterns.md +141 -0
  72. package/ux-design/references/onboarding.md +159 -0
  73. package/ux-design/references/usability-evaluation.md +132 -0
  74. package/ux-research/SKILL.md +75 -0
  75. package/ux-research/references/journey-mapping.md +168 -0
  76. package/ux-research/references/mental-models.md +106 -0
  77. package/ux-research/references/personas.md +102 -0
@@ -0,0 +1,88 @@
1
+ ---
2
+ name: bdd-specification
3
+ description: Behavior-Driven Development specification skill for writing Gherkin feature files and example mapping documents. Use when creating feature files, writing acceptance criteria, converting story map tasks into scenarios, or when the user says "write specification", "create Gherkin", "define acceptance criteria", or "example mapping". Produces .feature files in /features/ and example mapping docs in /docs/example-mapping/. Requires story-mapping skill context. Output is consumed by atdd-workflow skill.
4
+ ---
5
+
6
+ # BDD Specification by Example
7
+
8
+ ## Overview
9
+
10
+ Translates story map tasks into executable Gherkin specifications. Produces feature files that serve as both living documentation and input for acceptance tests.
11
+
12
+ ## Before Starting — Read Context
13
+
14
+ 1. `/docs/story-map/user-tasks/[task-id].md` — the task being specified
15
+ 2. `/docs/ubiquitous-language.md` — CRITICAL: all Gherkin terms must come from here
16
+ 3. `/docs/value-proposition-canvas.md` — business value context
17
+ 4. `/features/[domain]/` — existing feature files for consistency
18
+
19
+ ## Workflow
20
+
21
+ **Story map task exists?** → Follow "Specification Workflow" below
22
+ **No story map task?** → Use story-mapping skill first, then return here
23
+
24
+ ## Specification Workflow
25
+
26
+ ### Step 1: Example Mapping
27
+
28
+ Before writing Gherkin, run example mapping. See `references/example-mapping.md` for full process.
29
+
30
+ Create: `/docs/example-mapping/[feature]-[YYYY-MM-DD].md`
31
+
32
+ Output: Rules, concrete examples, and any open questions.
33
+
34
+ ### Step 2: Write Feature File
35
+
36
+ Create: `/features/[domain]/[feature-name].feature`
37
+
38
+ See `references/gherkin-patterns.md` for syntax, patterns, and anti-patterns.
39
+
40
+ **Required elements:**
41
+ - Feature header with story format (As a / I want / So that)
42
+ - Story map task ID in a comment at the top
43
+ - At least: one happy path, one alternative path, one error/validation case
44
+ - Background section for setup shared across all scenarios
45
+ - Scenario Outline with Examples table when the same behavior applies to multiple data sets
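The unconditional items in the list above can be checked mechanically before hand-off. The following is a minimal sketch, not part of the package: the function name `check_required_elements` is hypothetical, and it assumes the task ID comment uses the `# Story Map: [TASK-ID]` form shown in `references/gherkin-patterns.md`. It only counts scenarios; it cannot tell whether they are really a happy path, an alternative path, and an error case.

```python
import re


def check_required_elements(feature_text: str) -> list[str]:
    """Return a list of missing required elements (empty list means none missing)."""
    problems = []
    lines = [line.strip() for line in feature_text.splitlines()]

    # Story map task ID in a comment (assumed "# Story Map: <id>" form).
    if not any(re.match(r"#\s*Story Map:\s*\S+", line) for line in lines):
        problems.append("missing story map task ID comment")

    # Feature header in story format.
    for prefix in ("As a", "I want", "So that"):
        if not any(line.startswith(prefix) for line in lines):
            problems.append(f"missing story line starting with '{prefix}'")

    # Happy, alternative, and error paths imply at least three scenarios.
    scenario_count = sum(
        1 for line in lines if re.match(r"Scenario( Outline)?:", line)
    )
    if scenario_count < 3:
        problems.append(f"only {scenario_count} scenario(s); expected at least 3")

    return problems
```

An empty result means the structural minimum is met; the Quality Checklist items about declarative wording and independence still need a human read.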
46
+
47
+ ### Step 3: Validate Against Ubiquitous Language
48
+
49
+ Every domain term in the feature file must exist in `/docs/ubiquitous-language.md`.
50
+
51
+ If a needed term is missing:
52
+ 1. Add it to the glossary first — include definition, context, and related terms
53
+ 2. Then use it in the scenario
54
+
55
+ ### Step 4: Update Story Map
56
+
57
+ Update `/docs/story-map/user-tasks/[task-id].md`:
58
+ - Link to the created feature file
59
+ - Check off "Scenarios written"
60
+
61
+ ### Step 5: Hand Off to ATDD
62
+
63
+ Feature file is now ready as input for atdd-workflow skill.
64
+
65
+ ## Quality Checklist
66
+
67
+ - [ ] Story map task ID referenced in feature file comment
68
+ - [ ] All terms exist in ubiquitous language glossary
69
+ - [ ] Happy path scenario included
70
+ - [ ] Alternative path scenario included
71
+ - [ ] Error/validation scenario included
72
+ - [ ] Scenarios are declarative (not imperative)
73
+ - [ ] Scenarios are independent (no ordering dependency)
74
+ - [ ] Each scenario tests exactly one behavior
75
+ - [ ] Background used only for setup common to ALL scenarios
76
+ - [ ] Scenario Outlines used for data variations
77
+ - [ ] Example mapping document created
78
+ - [ ] Open questions documented
79
+
80
+ ## Integration
81
+
82
+ ```
83
+ story-mapping → identifies task and acceptance criteria
84
+
85
+ bdd-specification (this skill) → writes Gherkin feature file
86
+
87
+ atdd-workflow → implements with RED-GREEN-REFACTOR
88
+ ```
@@ -0,0 +1,105 @@
1
+ # Example Mapping
2
+
3
+ ## Purpose
4
+
5
+ Example mapping discovers the rules, examples, and open questions for a feature before writing Gherkin. It prevents writing scenarios that miss business rules or contain ambiguities.
6
+
7
+ ## Process
8
+
9
+ ### 1. Identify the Story
10
+
11
+ Pull from the story map task:
12
+ ```
13
+ As a [user role]
14
+ I want to [capability]
15
+ So that [business value]
16
+ ```
17
+
18
+ ### 2. Discover Rules
19
+
20
+ Rules are business constraints that govern behavior. Extract from:
21
+ - Value proposition canvas (what gains/pains must be addressed)
22
+ - Story map task acceptance criteria
23
+ - Domain knowledge
24
+
25
+ **Rule format:** "The system must [behavior] when [condition]"
26
+
27
+ **Examples:**
28
+ - "Invoice total must equal the sum of all line item amounts"
29
+ - "A customer cannot be deleted when they have unpaid invoices"
30
+ - "Invoice numbers must be unique and sequential"
31
+
32
+ ### 3. Generate Examples for Each Rule
33
+
34
+ For each rule, create concrete examples covering:
35
+ - **Happy path** — normal successful case
36
+ - **Alternative path** — valid variation of the happy path
37
+ - **Error/edge case** — what happens when constraints are violated
38
+
39
+ **Example format:**
40
+ ```
41
+ Given [concrete initial state]
42
+ When [specific action]
43
+ Then [observable outcome]
44
+ ```
45
+
46
+ Keep values concrete — no variables or placeholders at this stage.
47
+
48
+ ### 4. Capture Questions
49
+
50
+ Document anything uncertain:
51
+ - Business rules that aren't clear
52
+ - Edge cases where behavior is undefined
53
+ - Dependencies on other systems or features
54
+
55
+ **Format:** Track as open items. If a question blocks scenario writing, mark the scenario as TODO and continue with unblocked scenarios.
56
+
57
+ ## Output Document
58
+
59
+ Create: `/docs/example-mapping/[feature]-[YYYY-MM-DD].md`
60
+
61
+ ```markdown
62
+ # Example Mapping: [Feature Name]
63
+
64
+ **Date:** [YYYY-MM-DD]
65
+ **Story Map Task:** [TASK-ID]
66
+
67
+ ## Story
68
+ As a [role]
69
+ I want to [capability]
70
+ So that [value]
71
+
72
+ ## Rules and Examples
73
+
74
+ ### Rule 1: [Business rule statement]
75
+
76
+ **Example 1 — Happy path:** [scenario name]
77
+ - Given [state]
78
+ - When [action]
79
+ - Then [outcome]
80
+
81
+ **Example 2 — Edge case:** [scenario name]
82
+ - Given [state]
83
+ - When [action]
84
+ - Then [outcome]
85
+
86
+ ### Rule 2: [Business rule statement]
87
+
88
+ [continue pattern]
89
+
90
+ ## Questions
91
+ - [ ] [Question] → [Answer or TBD]
92
+ - [ ] [Question] → [Answer or TBD]
93
+
94
+ ## Out of Scope
95
+ - [Explicitly excluded behavior]
96
+ ```
97
+
98
+ ## When to Skip Example Mapping
99
+
100
+ Skip only when:
101
+ - Feature is trivially simple (single CRUD operation, no business rules)
102
+ - Rules are already well-documented in story map task
103
+ - Feature is a minor extension of an already-mapped feature
104
+
105
+ Even then, document at least the rules — they drive scenario count.
@@ -0,0 +1,214 @@
1
+ # Gherkin Patterns
2
+
3
+ ## Feature File Structure
4
+
5
+ ```gherkin
6
+ # Story Map: [TASK-ID]
7
+ # Value Prop: [VP-ID]
8
+
9
+ Feature: [Noun phrase — e.g., "Invoice Creation"]
10
+ As a [user role from value proposition]
11
+ I want to [capability]
12
+ So that [business value]
13
+
14
+ Background:
15
+ Given [setup common to ALL scenarios in this feature]
16
+
17
+ Scenario: [Happy path]
18
+ Given [initial state]
19
+ When [action]
20
+ Then [outcome]
21
+
22
+ Scenario: [Alternative path]
23
+ ...
24
+
25
+ Scenario: [Error case]
26
+ ...
27
+
28
+ Scenario Outline: [Parameterized behavior]
29
+ Given [context with <placeholder>]
30
+ When [action]
31
+ Then [outcome with <expected>]
32
+
33
+ Examples:
34
+ | placeholder | expected |
35
+ | value1 | result1 |
36
+ | value2 | result2 |
37
+ ```
38
+
39
+ ## Given-When-Then Rules
40
+
41
+ ### Given — Establish State
42
+ - Phrase as an existing state (present or present-perfect tense): "Given an invoice exists for customer 'ACME Corp'"
43
+ - Describe state, not how it was created
44
+ - Include only context relevant to this scenario
45
+
46
+ ### When — Single Action
47
+ - Present tense: "When I save the invoice"
48
+ - One user action per When step
49
+ - From the user's perspective, not the system's
50
+
51
+ ### Then — Observable Outcome
52
+ - Present tense: "Then I should see invoice number INV-2024-0001"
53
+ - Only what the user can observe
54
+ - Never internal system state (databases, queues, logs)
55
+
56
+ ### And — Continue Previous Keyword
57
+ - "And" inherits the meaning of the previous Given/When/Then
58
+ - Use for additional setup, actions, or assertions
59
+
60
+ ## Background vs Given
61
+
62
+ **Background:** Setup that applies to every scenario in the feature. Authentication state, system configuration, common data.
63
+
64
+ **Given:** Setup specific to this scenario only.
65
+
66
+ If a Given step appears in every scenario, move it to Background.
67
+
68
+ ## Scenario Outline
69
+
70
+ Use when the **same behavior** applies to **different data**. The Examples table drives one test execution per row.
71
+
72
+ **Good use:**
73
+ ```gherkin
74
+ Scenario Outline: Validate line item amount
75
+ Given I am adding a line item
76
+ When I enter amount "<amount>"
77
+ And I save
78
+ Then I should see error "<error>"
79
+
80
+ Examples:
81
+ | amount | error |
82
+ | -100 | Amount cannot be negative |
83
+ | 0 | Amount must be positive |
84
+ | abc | Amount must be a number |
85
+ ```
86
+
87
+ **Bad use — different behaviors should be separate scenarios:**
88
+ ```gherkin
89
+ # DON'T do this — create vs edit are different behaviors
90
+ Scenario Outline: Invoice operations
91
+ Given I am on the <page> page
92
+ When I click <button>
93
+ Then I should see <result>
94
+
95
+ Examples:
96
+ | page | button | result |
97
+ | new | Save | New invoice |
98
+ | edit | Update | Updated invoice |
99
+ ```
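To make "one test execution per row" concrete, here is a small Python sketch of how an outline expands into concrete scenarios. It illustrates the idea only and is not how any particular Gherkin runner is implemented; `expand_outline` is a hypothetical helper, and the steps and rows are copied from the "Good use" example above.

```python
import re


def expand_outline(steps: list[str], examples: list[dict[str, str]]) -> list[list[str]]:
    """Substitute each <placeholder> in the steps with the values of one Examples row."""

    def fill(step: str, row: dict[str, str]) -> str:
        return re.sub(r"<(\w+)>", lambda m: row[m.group(1)], step)

    # One concrete scenario (list of steps) per Examples row.
    return [[fill(step, row) for step in steps] for row in examples]


steps = [
    "Given I am adding a line item",
    'When I enter amount "<amount>"',
    "And I save",
    'Then I should see error "<error>"',
]
examples = [
    {"amount": "-100", "error": "Amount cannot be negative"},
    {"amount": "0", "error": "Amount must be positive"},
    {"amount": "abc", "error": "Amount must be a number"},
]

for scenario in expand_outline(steps, examples):
    print("\n".join(scenario), end="\n\n")
```

Every expanded scenario has the same steps and differs only in data, which is the test for whether an outline is appropriate. In the "Bad use" example the rows change the behavior itself, so they belong in separate scenarios.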
100
+
101
+ ## Tags
102
+
103
+ ```gherkin
104
+ @invoicing @critical
105
+ Scenario: Create invoice
106
+
107
+ @wip
108
+ Scenario: Tax calculation
109
+ # Not ready for implementation yet
110
+
111
+ @manual
112
+ Scenario: Email delivery confirmation
113
+ # Cannot be automated
114
+
115
+ @slow
116
+ Scenario: Generate annual report
117
+ ```
118
+
119
+ Common tags: `@critical`, `@wip`, `@manual`, `@slow`, `@regression`
120
+
121
+ ## Anti-Patterns
122
+
123
+ ### Imperative Steps (UI-level detail)
124
+ ```gherkin
125
+ ❌ When I click the "Save" button
126
+ And I wait for the loading spinner to disappear
127
+ And I see the green notification bar
128
+
129
+ ✅ When I save the invoice
130
+ Then I should see a confirmation
131
+ ```
132
+
133
+ ### Technical Implementation Leaking Through
134
+ ```gherkin
135
+ ❌ Then the invoice record should be inserted into the database
136
+ And the invoice_total column should equal 1000
137
+
138
+ ✅ Then the invoice should be saved
139
+ And the total should be $1,000.00
140
+ ```
141
+
142
+ ### Conjunction Steps (Multiple Actions)
143
+ ```gherkin
144
+ ❌ When I enter the customer name and select a product and set the quantity
145
+
146
+ ✅ Given I have selected customer "ACME Corp"
147
+ And I have added product "Consulting"
148
+ When I set the quantity to 5
149
+ ```
150
+
151
+ ### Incidental Detail
152
+ ```gherkin
153
+ ❌ Given a customer with ID 12345 and email "test@example.com" and phone "555-1234" and address "123 Main St"
154
+
155
+ ✅ Given customer "ACME Corp" exists
156
+ ```
157
+ Only include data that is relevant to the scenario's behavior.
158
+
159
+ ### Scenario Dependency
160
+ ```gherkin
161
+ ❌ Scenario: Create invoice
162
+ ...creates invoice INV-001...
163
+
164
+ Scenario: Edit invoice
165
+ Given I have invoice INV-001 # ← depends on previous scenario having run
166
+
167
+ ✅ Each scenario sets up its own state independently
168
+ ```
169
+
170
+ ## CRUD Scenario Templates
171
+
172
+ ### Create
173
+ ```gherkin
174
+ Scenario: Create [entity] with valid data
175
+ Given I am authorized to create [entities]
176
+ When I create a new [entity] with [key attributes]
177
+ Then the [entity] should be saved
178
+ And I should see confirmation
179
+ ```
180
+
181
+ ### Read
182
+ ```gherkin
183
+ Scenario: View [entity] details
184
+ Given [entity] "[name]" exists
185
+ When I view the [entity]
186
+ Then I should see [key attributes]
187
+ ```
188
+
189
+ ### Update
190
+ ```gherkin
191
+ Scenario: Update [entity] attribute
192
+ Given [entity] "[name]" exists
193
+ When I update [attribute] to "[new value]"
194
+ Then the [entity] should reflect the change
195
+ ```
196
+
197
+ ### Delete
198
+ ```gherkin
199
+ Scenario: Delete [entity] with no dependencies
200
+ Given [entity] "[name]" exists
201
+ And [entity] has no dependent records
202
+ When I delete the [entity]
203
+ Then the [entity] should no longer exist
204
+ ```
205
+
206
+ ### Validation
207
+ ```gherkin
208
+ Scenario: Reject [entity] with invalid [attribute]
209
+ Given I am creating a new [entity]
210
+ When I enter invalid [attribute] "[bad value]"
211
+ And I attempt to save
212
+ Then I should see error "[message]"
213
+ And the [entity] should not be saved
214
+ ```
@@ -0,0 +1,64 @@
1
+ ---
2
+ name: cicd-pipeline
3
+ description: CI/CD pipeline design and configuration skill. Defines the automated path from committed code to production. Use when setting up a new project's pipeline, when the user says "set up CI/CD", "configure the pipeline", "automate deployment", "define the release gates", or "how does code get to production". Sits between atdd-workflow (which ends at commit) and continuous-improvement (which begins after release ships). This skill defines everything in between — build, test gates, environment promotion, deployment strategy, and rollback. Also defines what "ready to ship" means, which release-planning's Dependencies section references.
4
+ ---
5
+
6
+ # CI/CD Pipeline
7
+
8
+ ## Overview
9
+
10
+ atdd-workflow ends when code is committed green. continuous-improvement begins after a release ships. This skill defines everything in between: how committed code is automatically built, tested, promoted through environments, and deployed to production. It also defines the gates — the points where the pipeline stops and requires the code to prove it is ready before proceeding.
11
+
12
+ The pipeline is not optional. Without it, "deployed" on the task template checklist is a manual, error-prone step. With it, deployment is a consequence of passing all gates — deterministic, repeatable, and auditable.
13
+
14
+ ## When to Run This Skill
15
+
16
+ - **New project:** Design the pipeline before any code is committed. The pipeline configuration is infrastructure, not an afterthought.
17
+ - **Pipeline failure:** A build broke, a deployment failed, or code reached production that should not have. Investigate which gate failed or was missing.
18
+ - **New environment:** Adding staging, canary, or a regional deployment target.
19
+
20
+ ## Workflow
21
+
22
+ ### Step 1: Define the Pipeline Stages
23
+
24
+ The pipeline has a fixed sequence of stages. Each stage has a gate — a pass/fail condition. Code moves forward only if the gate passes. If any gate fails, the pipeline stops and notifies.
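The gate behavior described above can be sketched in a few lines of Python. This is an illustration under assumptions, not the pipeline configuration itself: `run_pipeline`, the `Gate` type, and the `notify` callback are hypothetical names, and real stages would be defined in whatever CI system the project uses.

```python
from typing import Callable

# A gate is any zero-argument check that returns True (pass) or False (fail).
Gate = Callable[[], bool]


def run_pipeline(stages: list[tuple[str, Gate]], notify: Callable[[str], None]) -> bool:
    """Run stages in their fixed order; stop and notify at the first failing gate."""
    for name, gate in stages:
        if not gate():
            notify(f"Pipeline stopped: gate '{name}' failed")
            return False  # later stages never run
    return True
```

The early return is the point of the design: a stage after a failed gate is never executed, so nothing downstream can act on code that has not proven itself.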
25
+
26
+ See `references/pipeline-stages.md` for the full stage sequence, what each gate checks, and how stages connect to the artifacts atdd-workflow already produces.
27
+
28
+ Create: `/docs/cicd/pipeline.md`
29
+
30
+ ### Step 2: Define Environment Promotion Strategy
31
+
32
+ Code does not go directly from commit to production. It moves through environments, each one closer to production and each one with stricter gates. The promotion strategy defines which environments exist, what runs in each, and what approval (if any) is required to move forward.
33
+
34
+ See `references/environment-promotion.md` for environment definitions, promotion rules, and approval gates.
35
+
36
+ Create: `/docs/cicd/environments.md`
37
+
38
+ ### Step 3: Define Deployment and Rollback Strategy
39
+
40
+ How code is deployed to each environment, and how to recover when a deployment fails or a production defect is discovered. The deployment strategy must match the risk profile — a canary deployment for high-traffic features, a blue-green swap for zero-downtime requirements, a simple replace for low-risk internal tools.
41
+
42
+ See `references/deployment-rollback.md` for deployment patterns, rollback triggers, and rollback procedures.
43
+
44
+ Create: `/docs/cicd/deployment.md`
45
+
46
+ ## Output Contract
47
+
48
+ This skill produces three documents that other parts of the methodology reference:
49
+
50
+ - **`/docs/cicd/pipeline.md`** — defines what "ready to ship" means. release-planning's Dependencies section references this.
51
+ - **`/docs/cicd/environments.md`** — defines the promotion path. The task template's `[ ] Deployed` checkbox maps to successful promotion to production.
52
+ - **`/docs/cicd/deployment.md`** — defines rollback. continuous-improvement's root cause analysis references this when a production defect triggers an incident.
53
+
54
+ ## Integration
55
+
56
+ ```
57
+ atdd-workflow → commits green code
58
+
59
+ cicd-pipeline (this skill) → build → test gates → promote → deploy
60
+
61
+ continuous-improvement → measures what shipped, investigates defects
62
+ ```
63
+
64
+ The pipeline is the bridge. atdd-workflow guarantees the code is tested. The pipeline guarantees the tested code reaches production correctly.
@@ -0,0 +1,176 @@
1
+ # Deployment and Rollback
2
+
3
+ ## Deployment Strategies
4
+
5
+ Not all deployments carry the same risk. A small internal tool can tolerate a brief outage during deployment. A high-traffic customer-facing application cannot. The deployment strategy matches the risk profile of the application and the change being deployed.
6
+
7
+ Choose one strategy per environment. Production and staging may use different strategies if their risk profiles differ.
8
+
9
+ ---
10
+
11
+ ### Strategy 1: Replace (Simple Deploy)
12
+
13
+ **What happens:** The old version is stopped. The new version is started in its place. There is a brief period (seconds to minutes) where the application is unavailable.
14
+
15
+ **Use when:**
16
+ - Internal tools with no SLA
17
+ - Development or staging environments
18
+ - Applications with very low traffic where brief downtime is acceptable
19
+
20
+ **Risk:** Downtime during deployment. If the new version fails to start, the application is down until rollback completes.
21
+
22
+ **Rollback:** Stop the new version. Start the previous version. Downtime during rollback as well.
23
+
24
+ ---
25
+
26
+ ### Strategy 2: Blue-Green Deploy
27
+
28
+ **What happens:** Two identical environments exist — "blue" (current production) and "green" (new version). The new version is deployed to green and fully tested there. When ready, traffic is switched from blue to green in a single atomic operation. Blue remains running and idle, ready for instant rollback.
29
+
30
+ **Use when:**
31
+ - Customer-facing applications that require zero downtime
32
+ - Applications where you need an instant rollback capability
33
+ - Deployments where you want to verify the new version is healthy before any real traffic touches it
34
+
35
+ **Risk:** Requires double the infrastructure (both blue and green must run simultaneously). Because the traffic switch is atomic, all users move to the new version at once, so a defect that testing on green missed reaches every user until traffic is switched back.
36
+
37
+ **Rollback:** Switch traffic back to blue. Instant. No downtime. Blue was idle but running the entire time.
38
+
39
+ ---
40
+
41
+ ### Strategy 3: Canary Deploy
42
+
43
+ **What happens:** The new version is deployed to a small percentage of production traffic (typically 1–5%). The pipeline monitors error rates and response times for the canary traffic. If metrics are healthy for the observation period, traffic is gradually shifted to the new version (10% → 25% → 50% → 100%). If metrics degrade at any point, the canary is automatically rolled back.
44
+
45
+ **Use when:**
46
+ - High-traffic applications where even a small error rate affects many users
47
+ - Changes where the failure mode is subtle and only surfaces under real traffic patterns
48
+ - Applications with sophisticated monitoring that can detect error rate changes quickly
49
+
50
+ **Risk:** A small percentage of real users will experience any failure in the new version during the canary period. The observation period must be long enough to surface failures but short enough to limit exposure.
51
+
52
+ **Rollback:** Shift all traffic back to the previous version. The canary instances are stopped. Users on the canary see the previous version within seconds.
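The canary loop can be sketched as follows. This is a simplified model under stated assumptions: `run_canary` and `error_rate_at` are hypothetical names, `error_rate_at(percent)` stands in for watching real metrics over one observation window, and only the error rate is checked (a real setup would also watch response times).

```python
from typing import Callable


def run_canary(
    ramp: list[int],
    error_rate_at: Callable[[int], float],
    baseline: float,
    threshold: float,
) -> tuple[str, int]:
    """Walk the ramp; roll back to 0% once the error rate exceeds baseline + threshold.

    Returns (status, percent of traffic left on the new version).
    """
    if not ramp:
        raise ValueError("ramp must contain at least one traffic percentage")
    for percent in ramp:
        observed = error_rate_at(percent)  # one observation window at this step
        if observed > baseline + threshold:
            return ("rolled_back", 0)
    return ("promoted", ramp[-1])
```

For example, `run_canary([1, 5, 25, 50, 100], measure, baseline=0.01, threshold=0.005)` either walks every step or returns to 0% at the first unhealthy window, where `measure` is whatever function reads the monitoring system.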
53
+
54
+ ---
55
+
56
+ ### Strategy 4: Rolling Deploy
57
+
58
+ **What happens:** Instances are updated one at a time (or in small batches). Each instance is removed from the load balancer, updated to the new version, health-checked, and added back. The application remains available throughout — at any point, some instances run the old version and some run the new version.
59
+
60
+ **Use when:**
61
+ - Applications that can tolerate mixed versions temporarily (stateless, no version-dependent shared state)
62
+ - Applications where zero downtime is required but canary monitoring is not available
63
+ - Large instance counts where replacing all at once is risky
64
+
65
+ **Risk:** During the rollout, both old and new versions serve traffic simultaneously. If the application has version-dependent behavior (e.g., database schema changes that are incompatible with the old version), rolling deploy will cause failures. Never use rolling deploy when a database migration accompanies the code change unless the migration is backward-compatible.
66
+
67
+ **Rollback:** Stop updating remaining instances. Roll back the already-updated instances one at a time using the same process. The application remains available during rollback.
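A rolling deploy and its rollback can be sketched like this. The names `rolling_deploy`, `update`, `healthy`, and `rollback` are hypothetical; removing an instance from the load balancer and adding it back are assumed to happen inside the `update` and `rollback` callbacks.

```python
from typing import Callable


def rolling_deploy(
    instances: list[str],
    update: Callable[[str], None],
    healthy: Callable[[str], bool],
    rollback: Callable[[str], None],
    batch_size: int = 1,
) -> bool:
    """Update instances batch by batch; on a failed health check, undo what was updated."""
    updated: list[str] = []
    for start in range(0, len(instances), batch_size):
        batch = instances[start:start + batch_size]
        for instance in batch:
            update(instance)  # assumed to drain, upgrade, and re-register the instance
            updated.append(instance)
        if not all(healthy(instance) for instance in batch):
            # Stop updating the remaining instances and roll back the
            # already-updated ones, one at a time, most recent first.
            for instance in reversed(updated):
                rollback(instance)
            return False
    return True
```

Instances that were never updated keep serving the old version throughout, which is why the application stays available during both the rollout and the rollback.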
68
+
69
+ ---
70
+
71
+ ## Rollback Triggers
72
+
73
+ Rollback should not depend on someone noticing a problem and making a judgment call. By default it is an automated response to a defined, objective condition. Manual rollback is reserved for the narrow cases listed under Manual Rollback Triggers.
74
+
75
+ ### Automatic Rollback Triggers
76
+
77
+ The pipeline or monitoring system triggers rollback automatically when any of these conditions are met:
78
+
79
+ | Trigger | Condition | Rollback scope |
80
+ |---|---|---|
81
+ | Health check failure | Application does not respond to health endpoint within timeout | Full rollback to previous version |
82
+ | Error rate spike | Production error rate exceeds baseline + threshold for observation window | Full rollback (or canary rollback if using canary) |
83
+ | Latency spike | P95 response time exceeds baseline + threshold for observation window | Full rollback |
84
+ | Canary failure | Canary error rate exceeds baseline + threshold during observation period | Canary rollback — shift traffic back to previous version |
85
+ | Deployment timeout | Deployment does not complete within the maximum allowed time | Rollback the partial deployment |
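The metric-based rows of the table reduce to simple comparisons against "baseline + threshold". A minimal sketch, assuming the metrics have already been aggregated over the observation window; the `Metrics` and `Limits` types and the `rollback_reason` function are hypothetical names, and the deployment timeout trigger is left out because it depends on the deploy tool rather than on runtime metrics.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Metrics:
    healthy: bool          # health endpoint responded within the timeout
    error_rate: float      # fraction of failed requests over the observation window
    p95_latency_ms: float  # P95 response time over the observation window


@dataclass
class Limits:
    error_rate_baseline: float
    error_rate_threshold: float
    latency_baseline_ms: float
    latency_threshold_ms: float


def rollback_reason(m: Metrics, limits: Limits) -> Optional[str]:
    """Return the name of the first trigger that fires, or None if no trigger fires."""
    if not m.healthy:
        return "health check failure"
    if m.error_rate > limits.error_rate_baseline + limits.error_rate_threshold:
        return "error rate spike"
    if m.p95_latency_ms > limits.latency_baseline_ms + limits.latency_threshold_ms:
        return "latency spike"
    return None
```

Because the function takes only numbers agreed on in advance, there is nothing to debate during an incident: either a trigger fires or it does not.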
86
+
87
+ ### Manual Rollback Triggers
88
+
89
+ A team member initiates rollback when:
90
+
91
+ - A production incident is reported by users that monitoring did not catch (rare if monitoring is well-configured)
92
+ - A deployment is proceeding but a critical business decision requires stopping it (e.g., a pricing change deployed by mistake)
93
+ - A security vulnerability is discovered in the deployed version after deployment
94
+
95
+ ### What Rollback Is NOT
96
+
97
+ - Rollback is not "redeploy the old version from scratch." It is "switch back to the previous known-good version that is already available." Blue-green keeps it running. Canary keeps it serving traffic. Rolling keeps it on unupdated instances. Replace restores from the previous artifact.
98
+ - Rollback does not undo database migrations. If a migration ran during deployment, rolling back the application does not roll back the database. The migration must be backward-compatible or the database must be rolled back separately — see Database Migration Rules below.
99
+
100
+ ---
101
+
102
+ ## Database Migration Rules
103
+
104
+ Database migrations are the most dangerous part of any deployment. They run before the new application version starts (or during startup). If a migration fails partway, the database can be left in an inconsistent state. If the migration is incompatible with the old application version, rollback breaks the application.
105
+
106
+ ### Rule 1: Migrations Must Be Backward-Compatible
107
+
108
+ Every migration must leave the database in a state that both the old and new application versions can read. This means:
109
+
110
+ - **Adding a column:** Add it as nullable. The old version ignores it. The new version writes to it.
111
+ - **Removing a column:** Do not remove it in the same deploy as the code that stops using it. Remove the code first (deploy and verify). Remove the column in a subsequent deploy.
112
+ - **Renaming a column:** Do not rename. Add the new column, migrate data, update code to use the new column, deploy and verify, then remove the old column in a subsequent deploy.
113
+ - **Changing a column type:** Same pattern as rename — add, migrate, switch, verify, remove.
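The "adding a column" case can be demonstrated with Python's built-in `sqlite3` module. This is a sketch under assumptions: the `invoices` table and `currency` column are made up for illustration, and the old application version is assumed to name its columns explicitly (code that relies on `SELECT *` or positional inserts could still break).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (id INTEGER PRIMARY KEY, total REAL NOT NULL)")
conn.execute("INSERT INTO invoices (total) VALUES (1000.0)")

# Migration: add the new column as nullable so existing rows and old code stay valid.
conn.execute("ALTER TABLE invoices ADD COLUMN currency TEXT")

# Old application version: reads and writes only the columns it knows about.
conn.execute("INSERT INTO invoices (total) VALUES (250.0)")
old_view = conn.execute("SELECT id, total FROM invoices ORDER BY id").fetchall()

# New application version: writes the new column and tolerates NULL in older rows.
conn.execute("INSERT INTO invoices (total, currency) VALUES (75.0, 'USD')")
new_view = conn.execute(
    "SELECT id, total, currency FROM invoices ORDER BY id"
).fetchall()

print(old_view)
print(new_view)
```

Both versions work against the migrated schema, so rolling the application back after this migration does not require touching the database.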
114
+
115
+ ### Rule 2: Migrations Run Before Application Start
116
+
117
+ The pipeline runs the migration first, then starts the new application version. This means the migration must succeed against the current database state, not the future state.
118
+
119
+ ### Rule 3: Migrations Are Tested in Staging
120
+
121
+ The staging database must have representative data. A migration that works against an empty database but fails against real data will not be caught until production. Run migrations against staging first. Verify the application works against the migrated staging database before promoting to production.
122
+
123
+ ---
124
+
125
+ ## Deployment Configuration Document
126
+
127
+ Create: `/docs/cicd/deployment.md`
128
+
129
+ ```markdown
130
+ # Deployment Configuration
131
+
132
+ **Created:** [Date]
133
+ **Last updated:** [Date]
134
+
135
+ ## Deployment Strategy by Environment
136
+
137
+ | Environment | Strategy | Rationale |
138
+ |---|---|---|
139
+ | Staging | [Strategy] | [Why this strategy for staging] |
140
+ | Production | [Strategy] | [Why this strategy for production] |
141
+
142
+ ## Canary Configuration (if used)
143
+ - **Initial canary percentage:** [e.g., 1%]
144
+ - **Observation window:** [e.g., 10 minutes per step]
145
+ - **Ramp schedule:** [e.g., 1% → 5% → 25% → 50% → 100%]
146
+ - **Automatic rollback condition:** Error rate > [threshold]% for > [duration]
147
+
148
+ ## Health Check Configuration
149
+ - **Endpoint:** [Health check URL or command]
150
+ - **Timeout:** [How long to wait before declaring unhealthy]
151
+ - **Error rate baseline:** [Current baseline error rate]
152
+ - **Error rate threshold:** [Max deviation before rollback triggers]
153
+ - **Latency baseline:** [Current P95 latency]
154
+ - **Latency threshold:** [Max deviation before rollback triggers]
155
+ - **Observation window:** [How long to monitor after deployment before declaring healthy]
156
+
157
+ ## Rollback Procedure
158
+ 1. [Automated or manual trigger fires]
159
+ 2. [Traffic shifted / instances stopped — specific to chosen strategy]
160
+ 3. [Previous version restored — specific to chosen strategy]
161
+ 4. [Health check confirms previous version is healthy]
162
+ 5. [Notification sent to team]
163
+ 6. [Incident logged — triggers continuous-improvement root cause analysis]
164
+
165
+ ## Database Migration Policy
166
+ - All migrations are backward-compatible (see deployment-rollback.md rules)
167
+ - Migrations run in staging first and are verified before production promotion
168
+ - Migration rollback procedure: [Document specific procedure if migrations are not fully backward-compatible]
169
+ ```
170
+
171
+ ## Rules
172
+
173
+ - Choose the deployment strategy based on the application's risk profile, not on what tools are available. If the tools do not support the strategy you need, find tools that do.
174
+ - Rollback triggers are objective conditions, not judgment calls. Define them before the first deployment. Do not change them under pressure during an incident.
175
+ - Database migrations are backward-compatible or they do not ship. No exceptions. A non-backward-compatible migration means the application cannot be rolled back safely — that is an unacceptable risk.
176
+ - Every rollback is logged and triggers a continuous-improvement root cause analysis. Rollbacks are not failures to hide — they are the system working as designed. The investigation asks why the pipeline did not catch the problem, not why the rollback happened.