cpflow 5.1.0 → 5.1.1
- checksums.yaml +4 -4
- data/.github/actions/cpflow-wait-for-health/action.yml +11 -4
- data/.github/workflows/cpflow-promote-staging-to-production.yml +224 -37
- data/.github/workflows/rspec-shared.yml +8 -1
- data/CHANGELOG.md +15 -1
- data/Gemfile.lock +1 -1
- data/README.md +4 -0
- data/docs/assets/logo/favicon.ico +0 -0
- data/docs/assets/logo/icon-1024.png +0 -0
- data/docs/assets/logo/icon-128.png +0 -0
- data/docs/assets/logo/icon-16.png +0 -0
- data/docs/assets/logo/icon-192.png +0 -0
- data/docs/assets/logo/icon-24.png +0 -0
- data/docs/assets/logo/icon-32.png +0 -0
- data/docs/assets/logo/icon-48.png +0 -0
- data/docs/assets/logo/icon-512.png +0 -0
- data/docs/assets/logo/icon-64.png +0 -0
- data/docs/assets/logo/icon-tile.svg +17 -0
- data/docs/assets/logo/mark-transparent.svg +16 -0
- data/docs/ci-automation.md +43 -2
- data/docs/commands.md +5 -1
- data/lib/command/maintenance_off.rb +1 -0
- data/lib/command/maintenance_on.rb +1 -0
- data/lib/command/run.rb +25 -5
- data/lib/core/maintenance_mode.rb +93 -6
- data/lib/cpflow/version.rb +1 -1
- data/lib/github_flow_templates/.github/cpflow-help.md +13 -1
- data/lib/github_flow_templates/.github/workflows/cpflow-promote-staging-to-production.yml +224 -39
- metadata +14 -2
data/docs/commands.md
CHANGED
|
@@ -315,6 +315,7 @@ cpflow maintenance -a $APP_NAME
|
|
|
315
315
|
### `maintenance:off`
|
|
316
316
|
|
|
317
317
|
- Disables maintenance mode for an app
|
|
318
|
+
- Safe to re-run: if a previous run timed out after switching the domain but before stopping the maintenance workload, re-running while maintenance mode is already disabled stops the maintenance workload to finish it (so it is not a pure no-op)
|
|
318
319
|
- Specify the one-off workload through `one_off_workload` in the `.controlplane/controlplane.yml` file
|
|
319
320
|
- Optionally specify the maintenance workload through `maintenance_workload` in the `.controlplane/controlplane.yml` file (defaults to 'maintenance')
|
|
320
321
|
- Maintenance mode is only supported for domains that use path based routing mode and have a route configured for the prefix '/' on either port 80 or 443
|
|
@@ -326,6 +327,7 @@ cpflow maintenance:off -a $APP_NAME
|
|
|
326
327
|
### `maintenance:on`
|
|
327
328
|
|
|
328
329
|
- Enables maintenance mode for an app
|
|
330
|
+
- Safe to re-run: if a previous run timed out after switching the domain but before stopping the app workloads, re-running while maintenance mode is already enabled stops the app workloads to finish it (so it is not a pure no-op)
|
|
329
331
|
- Specify the one-off workload through `one_off_workload` in the `.controlplane/controlplane.yml` file
|
|
330
332
|
- Optionally specify the maintenance workload through `maintenance_workload` in the `.controlplane/controlplane.yml` file (defaults to 'maintenance')
|
|
331
333
|
- Maintenance mode is only supported for domains that use path based routing mode and have a route configured for the prefix '/' on either port 80 or 443
|
|
@@ -466,6 +468,8 @@ timeout 300 cpflow ps:wait -a $APP_NAME
|
|
|
466
468
|
and also overridden per job through `--cpu` and `--memory`)
|
|
467
469
|
- By default, the job is stopped if it takes longer than 6 hours to finish
|
|
468
470
|
(can be configured though `runner_job_timeout` in `controlplane.yml`)
|
|
471
|
+
- Non-interactive jobs return the Control Plane cron job status even when the job finishes before
|
|
472
|
+
Control Plane exposes a runner replica to attach logs to
|
|
469
473
|
|
|
470
474
|
```sh
|
|
471
475
|
# Opens shell (bash by default).
|
|
@@ -550,7 +554,7 @@ cpflow terraform import
|
|
|
550
554
|
Regenerates the generated cpflow GitHub Actions wrappers and helper files
|
|
551
555
|
from the currently installed cpflow gem. Use this after updating the
|
|
552
556
|
cpflow gem so checked-in workflow wrappers move to the matching upstream
|
|
553
|
-
release tag, for example `v5.0
|
|
557
|
+
release tag, for example `v5.1.0`.
|
|
554
558
|
|
|
555
559
|
If the existing generated staging workflow uses a custom single staging
|
|
556
560
|
branch, the command preserves it. Pass `--staging-branch BRANCH` to set or
|
data/lib/command/maintenance_off.rb
CHANGED
|
@@ -10,6 +10,7 @@ module Command
|
|
|
10
10
|
DESCRIPTION = "Disables maintenance mode for an app"
|
|
11
11
|
LONG_DESCRIPTION = <<~DESC
|
|
12
12
|
- Disables maintenance mode for an app
|
|
13
|
+
- Safe to re-run: if a previous run timed out after switching the domain but before stopping the maintenance workload, re-running while maintenance mode is already disabled stops the maintenance workload to finish it (so it is not a pure no-op)
|
|
13
14
|
- Specify the one-off workload through `one_off_workload` in the `.controlplane/controlplane.yml` file
|
|
14
15
|
- Optionally specify the maintenance workload through `maintenance_workload` in the `.controlplane/controlplane.yml` file (defaults to 'maintenance')
|
|
15
16
|
- Maintenance mode is only supported for domains that use path based routing mode and have a route configured for the prefix '/' on either port 80 or 443
|
data/lib/command/maintenance_on.rb
CHANGED
|
@@ -10,6 +10,7 @@ module Command
|
|
|
10
10
|
DESCRIPTION = "Enables maintenance mode for an app"
|
|
11
11
|
LONG_DESCRIPTION = <<~DESC
|
|
12
12
|
- Enables maintenance mode for an app
|
|
13
|
+
- Safe to re-run: if a previous run timed out after switching the domain but before stopping the app workloads, re-running while maintenance mode is already enabled stops the app workloads to finish it (so it is not a pure no-op)
|
|
13
14
|
- Specify the one-off workload through `one_off_workload` in the `.controlplane/controlplane.yml` file
|
|
14
15
|
- Optionally specify the maintenance workload through `maintenance_workload` in the `.controlplane/controlplane.yml` file (defaults to 'maintenance')
|
|
15
16
|
- Maintenance mode is only supported for domains that use path based routing mode and have a route configured for the prefix '/' on either port 80 or 443
|
data/lib/command/run.rb
CHANGED
|
@@ -47,6 +47,8 @@ module Command
|
|
|
47
47
|
and also overridden per job through `--cpu` and `--memory`)
|
|
48
48
|
- By default, the job is stopped if it takes longer than 6 hours to finish
|
|
49
49
|
(can be configured though `runner_job_timeout` in `controlplane.yml`)
|
|
50
|
+
- Non-interactive jobs return the Control Plane cron job status even when the job finishes before
|
|
51
|
+
Control Plane exposes a runner replica to attach logs to
|
|
50
52
|
DESC
|
|
51
53
|
EXAMPLES = <<~EX.freeze
|
|
52
54
|
```sh
|
|
@@ -97,7 +99,7 @@ module Command
|
|
|
97
99
|
|
|
98
100
|
attr_reader :interactive, :detached, :location, :original_workload, :runner_workload,
|
|
99
101
|
:default_image, :default_cpu, :default_memory, :job_timeout, :job_history_limit,
|
|
100
|
-
:container, :job, :replica, :command
|
|
102
|
+
:container, :job, :replica, :command, :job_completed_before_replica_exit_status
|
|
101
103
|
|
|
102
104
|
def call # rubocop:disable Metrics/CyclomaticComplexity, Metrics/MethodLength, Metrics/PerceivedComplexity
|
|
103
105
|
@interactive = config.options[:interactive] || interactive_command?
|
|
@@ -129,6 +131,7 @@ module Command
|
|
|
129
131
|
update_runner_workload
|
|
130
132
|
start_job
|
|
131
133
|
wait_for_replica_for_job
|
|
134
|
+
exit(job_completed_before_replica_exit_status) if job_completed_before_replica_exit_status
|
|
132
135
|
|
|
133
136
|
progress.puts
|
|
134
137
|
if interactive
|
|
@@ -269,7 +272,20 @@ module Command
|
|
|
269
272
|
result = cp.fetch_workload_replicas(runner_workload, location: location)
|
|
270
273
|
@replica = result&.dig("items")&.find { |item| item.include?(job) }
|
|
271
274
|
|
|
272
|
-
replica || false
|
|
275
|
+
replica || completed_job_before_replica? || false
|
|
276
|
+
end
|
|
277
|
+
end
|
|
278
|
+
|
|
279
|
+
def completed_job_before_replica?
|
|
280
|
+
case current_job_status
|
|
281
|
+
when "successful"
|
|
282
|
+
@job_completed_before_replica_exit_status = ExitCode::SUCCESS
|
|
283
|
+
true
|
|
284
|
+
when nil, "active", "pending"
|
|
285
|
+
false
|
|
286
|
+
else
|
|
287
|
+
@job_completed_before_replica_exit_status = ExitCode::ERROR_DEFAULT
|
|
288
|
+
true
|
|
273
289
|
end
|
|
274
290
|
end
|
|
275
291
|
|
|
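The `completed_job_before_replica?` hunk above maps a cron job status to an exit code when the job finishes before a runner replica appears. A minimal sketch of that mapping follows. The `SketchExitCode` module and its numeric values are assumptions for illustration (the values of cpflow's real `ExitCode` constants are not shown in this diff), and `"failed"` is only an example of a status that falls into the `else` branch.

```ruby
# Hypothetical exit codes; the real `ExitCode::SUCCESS` and
# `ExitCode::ERROR_DEFAULT` values are not visible in this diff.
module SketchExitCode
  SUCCESS = 0
  ERROR_DEFAULT = 1
end

# Returns the exit status to use when the job has already finished, or nil
# when the job is still in flight (or unknown) and the caller should keep
# waiting for a replica.
def exit_status_for_finished_job(status)
  case status
  when "successful"
    SketchExitCode::SUCCESS
  when nil, "active", "pending"
    nil
  else
    # Any other status is treated as a failure, mirroring the `else` branch.
    SketchExitCode::ERROR_DEFAULT
  end
end

["successful", "pending", "active", nil, "failed"].each do |status|
  puts "#{status.inspect} -> #{exit_status_for_finished_job(status).inspect}"
end
```

Returning `nil` for the in-flight statuses keeps the "still waiting" case distinct from a real exit code, which is the role the `false` return plays in the hunk.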
@@ -505,9 +521,7 @@ module Command
|
|
|
505
521
|
|
|
506
522
|
def resolve_job_status # rubocop:disable Metrics/MethodLength
|
|
507
523
|
loop do
|
|
508
|
-
|
|
509
|
-
job_details = result&.dig("items")&.find { |item| item["id"] == job }
|
|
510
|
-
status = job_details&.dig("status")
|
|
524
|
+
status = current_job_status
|
|
511
525
|
|
|
512
526
|
Shell.debug("JOB STATUS", status)
|
|
513
527
|
|
|
@@ -522,6 +536,12 @@ module Command
|
|
|
522
536
|
end
|
|
523
537
|
end
|
|
524
538
|
|
|
539
|
+
def current_job_status
|
|
540
|
+
result = cp.fetch_cron_workload(runner_workload, location: location)
|
|
541
|
+
job_details = result&.dig("items")&.find { |item| item["id"] == job }
|
|
542
|
+
job_details&.dig("status")
|
|
543
|
+
end
|
|
544
|
+
|
|
525
545
|
###########################################
|
|
526
546
|
### temporary extaction from run:detached
|
|
527
547
|
###########################################
|
data/lib/core/maintenance_mode.rb
CHANGED
|
@@ -1,8 +1,19 @@
|
|
|
1
1
|
# frozen_string_literal: true
|
|
2
2
|
|
|
3
|
-
class MaintenanceMode
|
|
3
|
+
class MaintenanceMode # rubocop:disable Metrics/ClassLength
|
|
4
4
|
extend Forwardable
|
|
5
5
|
|
|
6
|
+
DOMAIN_WORKLOAD_UPDATE_MAX_POLL_ATTEMPTS = 30
|
|
7
|
+
DOMAIN_WORKLOAD_UPDATE_RETRY_WAIT_SECONDS = 1
|
|
8
|
+
DOMAIN_WORKLOAD_UPDATE_STEP_OPTIONS = {
|
|
9
|
+
retry_on_failure: true,
|
|
10
|
+
# `with_retry` loops while `retry_count <= max_retry_count` starting from 0, so
|
|
11
|
+
# total attempts == max_retry_count + 1. Subtract 1 so the bounded poll runs
|
|
12
|
+
# exactly DOMAIN_WORKLOAD_UPDATE_MAX_POLL_ATTEMPTS times.
|
|
13
|
+
max_retry_count: DOMAIN_WORKLOAD_UPDATE_MAX_POLL_ATTEMPTS - 1,
|
|
14
|
+
wait: DOMAIN_WORKLOAD_UPDATE_RETRY_WAIT_SECONDS
|
|
15
|
+
}.freeze
|
|
16
|
+
|
|
6
17
|
def_delegators :@command, :config, :progress, :cp, :step, :run_cpflow_command
|
|
7
18
|
|
|
8
19
|
def initialize(command)
|
|
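The comment on `max_retry_count` in the hunk above is an off-by-one note: a loop that runs while `retry_count <= max_retry_count`, counting from 0, executes `max_retry_count + 1` times. The sketch below reproduces that arithmetic. `attempts_made` is a hypothetical stand-in for `with_retry`, whose real signature and body are not shown in this diff; only the loop condition described in the comment is modeled.

```ruby
# Hypothetical stand-in for cpflow's `with_retry`. It counts how many times
# the block runs under the loop condition quoted in the diff comment.
def attempts_made(max_retry_count)
  attempts = 0
  retry_count = 0
  while retry_count <= max_retry_count
    attempts += 1
    break if yield(attempts) # a truthy result ends the poll early

    retry_count += 1
  end
  attempts
end

MAX_POLL_ATTEMPTS = 30

# A block that never succeeds exhausts the whole poll window.
without_adjustment = attempts_made(MAX_POLL_ATTEMPTS) { false }
with_adjustment = attempts_made(MAX_POLL_ATTEMPTS - 1) { false }

puts "max_retry_count = #{MAX_POLL_ATTEMPTS} -> #{without_adjustment} attempts"
puts "max_retry_count = #{MAX_POLL_ATTEMPTS - 1} -> #{with_adjustment} attempts"
```

Passing the constant unchanged would give one attempt more than its name promises, which is why the hunk subtracts 1.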
@@ -22,6 +33,7 @@ class MaintenanceMode
|
|
|
22
33
|
def enable!
|
|
23
34
|
if enabled?
|
|
24
35
|
progress.puts("Maintenance mode is already enabled for app '#{config.app}'.")
|
|
36
|
+
ensure_app_workloads_stopped
|
|
25
37
|
else
|
|
26
38
|
enable_maintenance_mode
|
|
27
39
|
end
|
|
@@ -30,6 +42,7 @@ class MaintenanceMode
|
|
|
30
42
|
def disable!
|
|
31
43
|
if disabled?
|
|
32
44
|
progress.puts("Maintenance mode is already disabled for app '#{config.app}'.")
|
|
45
|
+
ensure_maintenance_workload_stopped
|
|
33
46
|
else
|
|
34
47
|
disable_maintenance_mode
|
|
35
48
|
end
|
|
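The two hunks above make the "already enabled" and "already disabled" branches finish the workload stop that a timed-out run left undone. The sketch below models the `enable!` direction only. `MaintenanceSketch`, its keyword arguments, and its log strings are hypothetical and not part of cpflow; the real class drives Control Plane through `ps:stop`.

```ruby
# Hypothetical in-memory model of a re-runnable `enable!`.
class MaintenanceSketch
  attr_reader :log

  def initialize(route_on_maintenance:, app_running:)
    @route_on_maintenance = route_on_maintenance
    @app_running = app_running
    @log = []
  end

  def enable!
    if @route_on_maintenance
      @log << "already enabled"
    else
      @route_on_maintenance = true
      @log << "switched route"
    end
    # Runs on both paths, so a half-finished earlier run gets completed.
    stop_app_workloads
  end

  private

  # Idempotent: does nothing once the app workloads are already stopped.
  def stop_app_workloads
    if @app_running
      @app_running = false
      @log << "stopped app workloads"
    else
      @log << "app workloads already stopped"
    end
  end
end

# State left by a run that switched the route and then timed out.
half_finished = MaintenanceSketch.new(route_on_maintenance: true, app_running: true)
half_finished.enable!
half_finished.enable!
puts half_finished.log
```

The first re-run completes the stop and the second is a no-op, which matches the "safe to re-run, but not a pure no-op" wording in the command descriptions.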
@@ -69,6 +82,28 @@ class MaintenanceMode
|
|
|
69
82
|
cp.fetch_workload!(maintenance_workload)
|
|
70
83
|
end
|
|
71
84
|
|
|
85
|
+
# A run that already switched the route but hit the poll timeout aborts before
|
|
86
|
+
# its final workload-stop step runs. The next `enable!`/`disable!` short-circuits
|
|
87
|
+
# on the route check, so do the matching stop here — once the route is on the
|
|
88
|
+
# target, this brings the workloads into the state that route implies. `ps:stop`
|
|
89
|
+
# is idempotent, so each is a no-op once the target workload is already stopped.
|
|
90
|
+
#
|
|
91
|
+
# The stop target differs by direction. `ps:stop -a` covers only
|
|
92
|
+
# `app_workloads` + `additional_workloads`, never the maintenance workload:
|
|
93
|
+
# - enable!: the route now points at the maintenance workload, so the *app*
|
|
94
|
+
# workloads are the ones left running and `ps:stop -a` is correct.
|
|
95
|
+
# - disable!: the route now points at the app workloads (and a short-circuit
|
|
96
|
+
# `disable!` can run on an app whose app workloads are serving live traffic),
|
|
97
|
+
# so stopping all workloads would cause an outage. The workload a timed-out
|
|
98
|
+
# `disable!` leaves running is the maintenance workload, so stop only that.
|
|
99
|
+
def ensure_app_workloads_stopped
|
|
100
|
+
start_or_stop_all_workloads(:stop)
|
|
101
|
+
end
|
|
102
|
+
|
|
103
|
+
def ensure_maintenance_workload_stopped
|
|
104
|
+
start_or_stop_maintenance_workload(:stop)
|
|
105
|
+
end
|
|
106
|
+
|
|
72
107
|
def start_or_stop_all_workloads(action)
|
|
73
108
|
run_cpflow_command("ps:#{action}", "-a", config.app, "--wait")
|
|
74
109
|
|
|
@@ -82,16 +117,68 @@ class MaintenanceMode
|
|
|
82
117
|
end
|
|
83
118
|
|
|
84
119
|
def switch_domain_workload(to:)
|
|
85
|
-
|
|
86
|
-
|
|
87
|
-
|
|
88
|
-
|
|
89
|
-
|
|
120
|
+
domain_name = domain_data["name"]
|
|
121
|
+
|
|
122
|
+
# Unlike the polling step below, the switch request is intentionally not
|
|
123
|
+
# retried: if it fails, nothing has changed yet, so aborting and letting the
|
|
124
|
+
# user re-run is the safe outcome. (Retrying would not help here anyway —
|
|
125
|
+
# `with_retry` retries on a falsy return, and `set_domain_workload` raises
|
|
126
|
+
# rather than returning false.)
|
|
127
|
+
step("Requesting workload switch for domain '#{domain_name}' to '#{to}'") do
|
|
128
|
+
# `set_domain_workload` mutates the route in place, so send a deep copy
|
|
129
|
+
# (round-tripped through JSON, since the domain is plain parsed-API data
|
|
130
|
+
# with string keys and JSON-native values) to keep the cached
|
|
131
|
+
# `@domain_data` reflecting the real server route. The poll re-fetches and
|
|
132
|
+
# matches on that fresh data, but if every poll times out without a routable
|
|
133
|
+
# fetch, `@domain_data` is what a re-run's `enabled?`/`disabled?` check reads
|
|
134
|
+
# — mutating it here would make that check report the requested route, not
|
|
135
|
+
# the actual one.
|
|
136
|
+
domain_data_for_update = JSON.parse(JSON.generate(domain_data))
|
|
137
|
+
cp.set_domain_workload(domain_data_for_update, to)
|
|
90
138
|
end
|
|
91
139
|
|
|
140
|
+
wait_for_domain_workload_switch(domain_name, to)
|
|
141
|
+
|
|
92
142
|
progress.puts
|
|
93
143
|
end
|
|
94
144
|
|
|
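The comment in `switch_domain_workload` above explains why the domain is round-tripped through JSON before being passed to `set_domain_workload`. The sketch below shows the difference between that deep copy and a shallow `dup`. The payload shape and key names are assumptions for illustration, since the real Control Plane domain structure is not shown in this diff.

```ruby
require "json"

# Illustrative domain payload with string keys and JSON-native values.
cached = {
  "name" => "example.com",
  "spec" => { "ports" => [{ "routes" => [{ "workloadLink" => "app" }] }] }
}

# Deep copy: nested hashes and arrays are rebuilt, so mutating the copy
# leaves the cached data untouched.
deep = JSON.parse(JSON.generate(cached))
deep["spec"]["ports"][0]["routes"][0]["workloadLink"] = "maintenance"
cached_after_deep = cached.dig("spec", "ports", 0, "routes", 0, "workloadLink")

# Shallow copy: only the top-level hash is duplicated, so the nested route
# is shared and the mutation leaks into the cached data.
shallow = cached.dup
shallow["spec"]["ports"][0]["routes"][0]["workloadLink"] = "maintenance"
cached_after_shallow = cached.dig("spec", "ports", 0, "routes", 0, "workloadLink")

puts "after mutating deep copy:    #{cached_after_deep}"
puts "after mutating shallow copy: #{cached_after_shallow}"
```

The JSON round trip only works as a deep copy because the data is plain parsed-API output. Symbols, custom objects, or non-string keys would not survive it unchanged.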
145
|
+
# If the route never switches within the bounded poll window, this step aborts
|
|
146
|
+
# (abort_on_error) before any workloads are stopped, so traffic stays on the
|
|
147
|
+
# current workload. The label tells the user how to recover, since an exhausted
|
|
148
|
+
# poll has no error message of its own to print.
|
|
149
|
+
def wait_for_domain_workload_switch(domain_name, to)
|
|
150
|
+
@last_poll_error = nil # reset the poll-error dedup state for this poll window
|
|
151
|
+
step("Waiting for domain '#{domain_name}' workload to switch to '#{to}' " \
|
|
152
|
+
"(re-run this command if it times out)", **DOMAIN_WORKLOAD_UPDATE_STEP_OPTIONS) do
|
|
153
|
+
domain_workload_update_confirmed?(domain_name, to)
|
|
154
|
+
end
|
|
155
|
+
end
|
|
156
|
+
|
|
157
|
+
# Refetches the domain, refreshes the cached `@domain_data` when the fetch
|
|
158
|
+
# returns a routable domain, and reports whether the route now points at
|
|
159
|
+
# `workload`. Any error — a 5xx mid-propagation, a transient 403
|
|
160
|
+
# (`ForbiddenError < StandardError`, not a `RuntimeError`), or a network blip —
|
|
161
|
+
# is treated as "not switched yet" so the poll keeps retrying. The broad rescue
|
|
162
|
+
# logs the error to the step's stderr, so a latent bug (e.g. `NoMethodError`)
|
|
163
|
+
# surfaces in the "failed!" output on timeout instead of being swallowed.
|
|
164
|
+
def domain_workload_update_confirmed?(domain_name, workload)
|
|
165
|
+
refreshed_domain_data = cp.fetch_domain(domain_name)
|
|
166
|
+
@domain_data = refreshed_domain_data if refreshed_domain_data
|
|
167
|
+
refreshed_domain_data && cp.domain_workload_matches?(refreshed_domain_data, workload)
|
|
168
|
+
rescue StandardError => e
|
|
169
|
+
# A persistent failure (bad domain name, network outage, a latent bug) repeats
|
|
170
|
+
# the same error on every poll attempt, so only log when the message changes —
|
|
171
|
+
# otherwise the timeout output would carry up to MAX_POLL_ATTEMPTS identical
|
|
172
|
+
# lines. Guard on `tmp_stderr` so this stays safe if ever called outside a
|
|
173
|
+
# `step` block, where no tmp stderr is set up.
|
|
174
|
+
message = "#{e.class}: #{e.message} (#{e.backtrace&.first})\n"
|
|
175
|
+
if message != @last_poll_error && Shell.tmp_stderr
|
|
176
|
+
Shell.write_to_tmp_stderr(message)
|
|
177
|
+
@last_poll_error = message
|
|
178
|
+
end
|
|
179
|
+
false
|
|
180
|
+
end
|
|
181
|
+
|
|
95
182
|
def domain_data
|
|
96
183
|
@domain_data ||=
|
|
97
184
|
if config.domain
|
data/lib/cpflow/version.rb
CHANGED
|
data/lib/github_flow_templates/.github/cpflow-help.md
CHANGED
|
@@ -23,11 +23,23 @@ For the normal generated review-app path, GitHub needs one repository secret:
|
|
|
23
23
|
| --- | --- | --- |
|
|
24
24
|
| `CPLN_TOKEN_STAGING` | Repository secret | Control Plane service-account token for the staging/review org. |
|
|
25
25
|
|
|
26
|
+
For public repositories, use a staging/review token that cannot access
|
|
27
|
+
production Control Plane resources. Generated review-app deploys skip fork PR
|
|
28
|
+
heads because Docker builds use repository secrets. If a forked change needs a
|
|
29
|
+
review app, first move the reviewed change to a trusted branch in this
|
|
30
|
+
repository.
|
|
31
|
+
|
|
26
32
|
No repository variables are required for the standard review-app path when
|
|
27
33
|
`.controlplane/controlplane.yml` has exactly one review app entry with
|
|
28
34
|
`match_if_app_name_starts_with: true`. cpflow infers the review-app prefix and
|
|
29
35
|
staging org from that config.
|
|
30
36
|
|
|
37
|
+
Review apps run pull request code. Any value mounted through
|
|
38
|
+
`cpln://secret/...` can be read by that code after the workload starts, so keep
|
|
39
|
+
review-app secret dictionaries limited to disposable databases, review-only
|
|
40
|
+
renderer credentials, and license values that are acceptable for review-app
|
|
41
|
+
exposure.
|
|
42
|
+
|
|
31
43
|
Optional overrides exist for forks, clones, and unusual apps:
|
|
32
44
|
|
|
33
45
|
| Name | Notes |
|
|
@@ -142,7 +154,7 @@ Most apps do not need these:
|
|
|
142
154
|
| Name | Notes |
|
|
143
155
|
| --- | --- |
|
|
144
156
|
| `DOCKER_BUILD_EXTRA_ARGS` | Newline-delimited extra Docker build tokens. |
|
|
145
|
-
| `DOCKER_BUILD_SSH_KEY` |
|
|
157
|
+
| `DOCKER_BUILD_SSH_KEY` | Read-only, revocable deploy key for Docker builds that fetch private dependencies. Do not use a personal SSH key. |
|
|
146
158
|
| `DOCKER_BUILD_SSH_KNOWN_HOSTS` | SSH known_hosts entries when SSH build hosts are not GitHub.com. |
|
|
147
159
|
| `REVIEW_APP_DEPLOYING_ICON_URL` | Cosmetic custom image URL for the animated deploying icon. Set to `none` to use the text fallback icon. |
|
|
148
160
|
| `STAGING_APP_BRANCH` | Custom staging branch. The branch must also appear in `cpflow-deploy-staging.yml`'s push filter. |
|