cpflow 5.1.0 → 5.1.1

data/docs/commands.md CHANGED
@@ -315,6 +315,7 @@ cpflow maintenance -a $APP_NAME
  ### `maintenance:off`
 
  - Disables maintenance mode for an app
+ - Safe to re-run: if a previous run timed out after switching the domain but before stopping the maintenance workload, re-running while maintenance mode is already disabled stops the maintenance workload to finish it (so it is not a pure no-op)
  - Specify the one-off workload through `one_off_workload` in the `.controlplane/controlplane.yml` file
  - Optionally specify the maintenance workload through `maintenance_workload` in the `.controlplane/controlplane.yml` file (defaults to 'maintenance')
  - Maintenance mode is only supported for domains that use path based routing mode and have a route configured for the prefix '/' on either port 80 or 443
@@ -326,6 +327,7 @@ cpflow maintenance:off -a $APP_NAME
  ### `maintenance:on`
 
  - Enables maintenance mode for an app
+ - Safe to re-run: if a previous run timed out after switching the domain but before stopping the app workloads, re-running while maintenance mode is already enabled stops the app workloads to finish it (so it is not a pure no-op)
  - Specify the one-off workload through `one_off_workload` in the `.controlplane/controlplane.yml` file
  - Optionally specify the maintenance workload through `maintenance_workload` in the `.controlplane/controlplane.yml` file (defaults to 'maintenance')
  - Maintenance mode is only supported for domains that use path based routing mode and have a route configured for the prefix '/' on either port 80 or 443
@@ -466,6 +468,8 @@ timeout 300 cpflow ps:wait -a $APP_NAME
  and also overridden per job through `--cpu` and `--memory`)
  - By default, the job is stopped if it takes longer than 6 hours to finish
  (can be configured though `runner_job_timeout` in `controlplane.yml`)
+ - Non-interactive jobs return the Control Plane cron job status even when the job finishes before
+ Control Plane exposes a runner replica to attach logs to
 
  ```sh
  # Opens shell (bash by default).
@@ -550,7 +554,7 @@ cpflow terraform import
  Regenerates the generated cpflow GitHub Actions wrappers and helper files
  from the currently installed cpflow gem. Use this after updating the
  cpflow gem so checked-in workflow wrappers move to the matching upstream
- release tag, for example `v5.0.4`.
+ release tag, for example `v5.1.0`.
 
  If the existing generated staging workflow uses a custom single staging
  branch, the command preserves it. Pass `--staging-branch BRANCH` to set or
@@ -10,6 +10,7 @@ module Command
  DESCRIPTION = "Disables maintenance mode for an app"
  LONG_DESCRIPTION = <<~DESC
  - Disables maintenance mode for an app
+ - Safe to re-run: if a previous run timed out after switching the domain but before stopping the maintenance workload, re-running while maintenance mode is already disabled stops the maintenance workload to finish it (so it is not a pure no-op)
  - Specify the one-off workload through `one_off_workload` in the `.controlplane/controlplane.yml` file
  - Optionally specify the maintenance workload through `maintenance_workload` in the `.controlplane/controlplane.yml` file (defaults to 'maintenance')
  - Maintenance mode is only supported for domains that use path based routing mode and have a route configured for the prefix '/' on either port 80 or 443
@@ -10,6 +10,7 @@ module Command
  DESCRIPTION = "Enables maintenance mode for an app"
  LONG_DESCRIPTION = <<~DESC
  - Enables maintenance mode for an app
+ - Safe to re-run: if a previous run timed out after switching the domain but before stopping the app workloads, re-running while maintenance mode is already enabled stops the app workloads to finish it (so it is not a pure no-op)
  - Specify the one-off workload through `one_off_workload` in the `.controlplane/controlplane.yml` file
  - Optionally specify the maintenance workload through `maintenance_workload` in the `.controlplane/controlplane.yml` file (defaults to 'maintenance')
  - Maintenance mode is only supported for domains that use path based routing mode and have a route configured for the prefix '/' on either port 80 or 443
data/lib/command/run.rb CHANGED
@@ -47,6 +47,8 @@ module Command
  and also overridden per job through `--cpu` and `--memory`)
  - By default, the job is stopped if it takes longer than 6 hours to finish
  (can be configured though `runner_job_timeout` in `controlplane.yml`)
+ - Non-interactive jobs return the Control Plane cron job status even when the job finishes before
+ Control Plane exposes a runner replica to attach logs to
  DESC
  EXAMPLES = <<~EX.freeze
  ```sh
@@ -97,7 +99,7 @@ module Command
 
  attr_reader :interactive, :detached, :location, :original_workload, :runner_workload,
  :default_image, :default_cpu, :default_memory, :job_timeout, :job_history_limit,
- :container, :job, :replica, :command
+ :container, :job, :replica, :command, :job_completed_before_replica_exit_status
 
  def call # rubocop:disable Metrics/CyclomaticComplexity, Metrics/MethodLength, Metrics/PerceivedComplexity
  @interactive = config.options[:interactive] || interactive_command?
@@ -129,6 +131,7 @@ module Command
  update_runner_workload
  start_job
  wait_for_replica_for_job
+ exit(job_completed_before_replica_exit_status) if job_completed_before_replica_exit_status
 
  progress.puts
  if interactive
@@ -269,7 +272,20 @@ module Command
  result = cp.fetch_workload_replicas(runner_workload, location: location)
  @replica = result&.dig("items")&.find { |item| item.include?(job) }
 
- replica || false
+ replica || completed_job_before_replica? || false
+ end
+ end
+
+ def completed_job_before_replica?
+ case current_job_status
+ when "successful"
+ @job_completed_before_replica_exit_status = ExitCode::SUCCESS
+ true
+ when nil, "active", "pending"
+ false
+ else
+ @job_completed_before_replica_exit_status = ExitCode::ERROR_DEFAULT
+ true
  end
  end
 
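The new `completed_job_before_replica?` branch boils down to a three-way classification of the cron job status. A minimal standalone sketch of that mapping follows; the method name `classify_job_status` and the numeric exit codes are assumptions made for illustration, standing in for cpflow's `ExitCode::SUCCESS` and `ExitCode::ERROR_DEFAULT`.

```ruby
# Hypothetical stand-ins for ExitCode::SUCCESS / ExitCode::ERROR_DEFAULT.
# The numeric values are assumptions, not taken from the diff.
EXIT_SUCCESS = 0
EXIT_ERROR_DEFAULT = 64

# Returns [finished, exit_status] for a cron job status string.
# - "successful": the job finished cleanly, so exit with success.
# - nil / "active" / "pending": not finished yet, so keep waiting for a replica.
# - anything else: treated as a finished-but-failed job.
def classify_job_status(status)
  case status
  when "successful"
    [true, EXIT_SUCCESS]
  when nil, "active", "pending"
    [false, nil]
  else
    [true, EXIT_ERROR_DEFAULT]
  end
end
```

Note the design choice visible in the diff: unknown statuses fall into the `else` branch, so an unrecognized status ends the wait with an error exit code instead of polling forever.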
@@ -505,9 +521,7 @@ module Command
 
  def resolve_job_status # rubocop:disable Metrics/MethodLength
  loop do
- result = cp.fetch_cron_workload(runner_workload, location: location)
- job_details = result&.dig("items")&.find { |item| item["id"] == job }
- status = job_details&.dig("status")
+ status = current_job_status
 
  Shell.debug("JOB STATUS", status)
 
@@ -522,6 +536,12 @@ module Command
  end
  end
 
+ def current_job_status
+ result = cp.fetch_cron_workload(runner_workload, location: location)
+ job_details = result&.dig("items")&.find { |item| item["id"] == job }
+ job_details&.dig("status")
+ end
+
  ###########################################
  ### temporary extaction from run:detached
  ###########################################
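The extracted `current_job_status` helper relies on safe navigation so that a missing response, a missing `items` key, or an unknown job id all collapse to `nil`, which `completed_job_before_replica?` treats as "not finished yet". A minimal sketch of that lookup against hand-written hashes follows; the `job_status` name and the response shape are assumptions for illustration, not the Control Plane API contract.

```ruby
# Hypothetical standalone version of the lookup: `result` plays the role of the
# parsed cron workload response, `job` the id of the job being tracked.
def job_status(result, job)
  job_details = result&.dig("items")&.find { |item| item["id"] == job }
  job_details&.dig("status")
end

# Assumed response shape, for illustration only.
response = { "items" => [{ "id" => "job-1", "status" => "active" }] }

job_status(response, "job-1") # status of the matching item
job_status(response, "job-2") # nil: no item with that id
job_status({}, "job-1")       # nil: no "items" key
job_status(nil, "job-1")      # nil: no response at all
```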
@@ -1,8 +1,19 @@
  # frozen_string_literal: true
 
- class MaintenanceMode
+ class MaintenanceMode # rubocop:disable Metrics/ClassLength
  extend Forwardable
 
+ DOMAIN_WORKLOAD_UPDATE_MAX_POLL_ATTEMPTS = 30
+ DOMAIN_WORKLOAD_UPDATE_RETRY_WAIT_SECONDS = 1
+ DOMAIN_WORKLOAD_UPDATE_STEP_OPTIONS = {
+ retry_on_failure: true,
+ # `with_retry` loops while `retry_count <= max_retry_count` starting from 0, so
+ # total attempts == max_retry_count + 1. Subtract 1 so the bounded poll runs
+ # exactly DOMAIN_WORKLOAD_UPDATE_MAX_POLL_ATTEMPTS times.
+ max_retry_count: DOMAIN_WORKLOAD_UPDATE_MAX_POLL_ATTEMPTS - 1,
+ wait: DOMAIN_WORKLOAD_UPDATE_RETRY_WAIT_SECONDS
+ }.freeze
+
  def_delegators :@command, :config, :progress, :cp, :step, :run_cpflow_command
 
  def initialize(command)
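The comment on `max_retry_count` describes an off-by-one in how a retry loop counts attempts. The arithmetic can be checked with a small stand-in loop; `attempts_made` below is a hypothetical helper that only mimics the loop condition quoted in the comment (`retry_count <= max_retry_count`, starting from 0) and is not cpflow's `with_retry`.

```ruby
# Hypothetical stand-in for the retry loop described in the comment above.
# Runs the block until it returns a truthy value or the retries are exhausted,
# and returns how many times the block was called.
def attempts_made(max_retry_count)
  attempts = 0
  retry_count = 0
  while retry_count <= max_retry_count
    attempts += 1
    break if yield

    retry_count += 1
  end
  attempts
end

MAX_POLL_ATTEMPTS = 30

# A poll that never succeeds uses every attempt: max_retry_count + 1 calls.
total = attempts_made(MAX_POLL_ATTEMPTS - 1) { false }
```

Passing `MAX_POLL_ATTEMPTS` directly would yield one extra attempt, which is why the diff subtracts 1.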
@@ -22,6 +33,7 @@ class MaintenanceMode
  def enable!
  if enabled?
  progress.puts("Maintenance mode is already enabled for app '#{config.app}'.")
+ ensure_app_workloads_stopped
  else
  enable_maintenance_mode
  end
@@ -30,6 +42,7 @@ class MaintenanceMode
  def disable!
  if disabled?
  progress.puts("Maintenance mode is already disabled for app '#{config.app}'.")
+ ensure_maintenance_workload_stopped
  else
  disable_maintenance_mode
  end
@@ -69,6 +82,28 @@ class MaintenanceMode
  cp.fetch_workload!(maintenance_workload)
  end
 
+ # A run that already switched the route but hit the poll timeout aborts before
+ # its final workload-stop step runs. The next `enable!`/`disable!` short-circuits
+ # on the route check, so do the matching stop here — once the route is on the
+ # target, this brings the workloads into the state that route implies. `ps:stop`
+ # is idempotent, so each is a no-op once the target workload is already stopped.
+ #
+ # The stop target differs by direction. `ps:stop -a` covers only
+ # `app_workloads` + `additional_workloads`, never the maintenance workload:
+ # - enable!: the route now points at the maintenance workload, so the *app*
+ # workloads are the ones left running and `ps:stop -a` is correct.
+ # - disable!: the route now points at the app workloads (and a short-circuit
+ # `disable!` can run on an app whose app workloads are serving live traffic),
+ # so stopping all workloads would cause an outage. The workload a timed-out
+ # `disable!` leaves running is the maintenance workload, so stop only that.
+ def ensure_app_workloads_stopped
+ start_or_stop_all_workloads(:stop)
+ end
+
+ def ensure_maintenance_workload_stopped
+ start_or_stop_maintenance_workload(:stop)
+ end
+
  def start_or_stop_all_workloads(action)
  run_cpflow_command("ps:#{action}", "-a", config.app, "--wait")
 
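The asymmetry between the two short-circuit paths is easier to see in a toy model. The sketch below is not cpflow code: `ToyMaintenance`, its `route` symbol, and its `running` list are all hypothetical, and it models only which workloads end up stopped, not the domain switch or its polling.

```ruby
# Toy model (all names hypothetical). `route` is :app or :maintenance and
# `running` lists which workload groups are currently running.
class ToyMaintenance
  attr_reader :route, :running

  def initialize(route:, running:)
    @route = route
    @running = running.dup
  end

  def enable!
    @route = :maintenance
    # Fresh run and re-run both end with the app workloads stopped.
    @running.delete(:app)
  end

  def disable!
    @route = :app
    # Stop only the maintenance workload: the app workloads may be serving
    # live traffic, so stopping everything here would cause an outage.
    @running.delete(:maintenance)
  end
end

# State left by an enable! that timed out after the route switched:
stuck = ToyMaintenance.new(route: :maintenance, running: %i[app maintenance])
stuck.enable!
stuck.running # => [:maintenance]

# A healthy app where disable! is re-run: nothing that serves traffic stops.
healthy = ToyMaintenance.new(route: :app, running: %i[app])
healthy.disable!
healthy.running # => [:app]
```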
@@ -82,16 +117,68 @@ class MaintenanceMode
  end
 
  def switch_domain_workload(to:)
- step("Switching workload for domain '#{domain_data['name']}' to '#{to}'") do
- cp.set_domain_workload(domain_data, to)
-
- # Give it a bit of time for the domain to update
- Kernel.sleep(30)
+ domain_name = domain_data["name"]
+
+ # Unlike the polling step below, the switch request is intentionally not
+ # retried: if it fails, nothing has changed yet, so aborting and letting the
+ # user re-run is the safe outcome. (Retrying would not help here anyway —
+ # `with_retry` retries on a falsy return, and `set_domain_workload` raises
+ # rather than returning false.)
+ step("Requesting workload switch for domain '#{domain_name}' to '#{to}'") do
+ # `set_domain_workload` mutates the route in place, so send a deep copy
+ # (round-tripped through JSON, since the domain is plain parsed-API data
+ # with string keys and JSON-native values) to keep the cached
+ # `@domain_data` reflecting the real server route. The poll re-fetches and
+ # matches on that fresh data, but if every poll times out without a routable
+ # fetch, `@domain_data` is what a re-run's `enabled?`/`disabled?` check reads
+ # — mutating it here would make that check report the requested route, not
+ # the actual one.
+ domain_data_for_update = JSON.parse(JSON.generate(domain_data))
+ cp.set_domain_workload(domain_data_for_update, to)
  end
 
+ wait_for_domain_workload_switch(domain_name, to)
+
  progress.puts
  end
 
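The JSON round-trip in `switch_domain_workload` is a deep-copy idiom for data made only of strings, numbers, booleans, nil, arrays, and string-keyed hashes. A minimal sketch of why a shallow `dup` would not be enough follows; the domain hash, its nesting, and the workload links are invented for illustration and do not claim to match the real Control Plane payload.

```ruby
require "json"

# Hypothetical domain payload; the real structure may differ.
domain = {
  "name" => "example.com",
  "spec" => { "ports" => [{ "routes" => [{ "prefix" => "/", "workloadLink" => "workload/rails" }] }] }
}

# Deep copy: nested hashes and arrays are rebuilt, so edits do not leak back.
deep = JSON.parse(JSON.generate(domain))
deep["spec"]["ports"][0]["routes"][0]["workloadLink"] = "workload/maintenance"
link_after_deep_edit = domain["spec"]["ports"][0]["routes"][0]["workloadLink"]

# Shallow copy: only the top-level hash is new; nested objects are shared.
shallow = domain.dup
shallow["spec"]["ports"][0]["routes"][0]["workloadLink"] = "workload/maintenance"
link_after_shallow_edit = domain["spec"]["ports"][0]["routes"][0]["workloadLink"]
```

After the deep edit the original still points at `workload/rails`; after the shallow edit it does not, which is the kind of cache corruption the comment in the diff is guarding against.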
+ # If the route never switches within the bounded poll window, this step aborts
+ # (abort_on_error) before any workloads are stopped, so traffic stays on the
+ # current workload. The label tells the user how to recover, since an exhausted
+ # poll has no error message of its own to print.
+ def wait_for_domain_workload_switch(domain_name, to)
+ @last_poll_error = nil # reset the poll-error dedup state for this poll window
+ step("Waiting for domain '#{domain_name}' workload to switch to '#{to}' " \
+ "(re-run this command if it times out)", **DOMAIN_WORKLOAD_UPDATE_STEP_OPTIONS) do
+ domain_workload_update_confirmed?(domain_name, to)
+ end
+ end
+
+ # Refetches the domain, refreshes the cached `@domain_data` when the fetch
+ # returns a routable domain, and reports whether the route now points at
+ # `workload`. Any error — a 5xx mid-propagation, a transient 403
+ # (`ForbiddenError < StandardError`, not a `RuntimeError`), or a network blip —
+ # is treated as "not switched yet" so the poll keeps retrying. The broad rescue
+ # logs the error to the step's stderr, so a latent bug (e.g. `NoMethodError`)
+ # surfaces in the "failed!" output on timeout instead of being swallowed.
+ def domain_workload_update_confirmed?(domain_name, workload)
+ refreshed_domain_data = cp.fetch_domain(domain_name)
+ @domain_data = refreshed_domain_data if refreshed_domain_data
+ refreshed_domain_data && cp.domain_workload_matches?(refreshed_domain_data, workload)
+ rescue StandardError => e
+ # A persistent failure (bad domain name, network outage, a latent bug) repeats
+ # the same error on every poll attempt, so only log when the message changes —
+ # otherwise the timeout output would carry up to MAX_POLL_ATTEMPTS identical
+ # lines. Guard on `tmp_stderr` so this stays safe if ever called outside a
+ # `step` block, where no tmp stderr is set up.
+ message = "#{e.class}: #{e.message} (#{e.backtrace&.first})\n"
+ if message != @last_poll_error && Shell.tmp_stderr
+ Shell.write_to_tmp_stderr(message)
+ @last_poll_error = message
+ end
+ false
+ end
+
  def domain_data
  @domain_data ||=
  if config.domain
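The rescue branch in `domain_workload_update_confirmed?` only logs an error when its message differs from the previous one. A minimal sketch of that consecutive-duplicate suppression follows, using a hypothetical `PollErrorLog` class that collects lines in an array instead of writing to cpflow's temporary stderr.

```ruby
# Hypothetical collector illustrating "log only when the message changes".
class PollErrorLog
  attr_reader :lines

  def initialize
    @lines = []
    @last_message = nil
  end

  # Records the error unless it repeats the immediately preceding message.
  def record(error)
    message = "#{error.class}: #{error.message}"
    return if message == @last_message

    @lines << message
    @last_message = message
  end
end

log = PollErrorLog.new
3.times { log.record(RuntimeError.new("503 from API")) } # logged once
log.record(IOError.new("connection reset"))              # new message, logged
log.record(RuntimeError.new("503 from API"))             # differs from the last one, logged again
```

Only the most recent message is remembered, so an error that alternates with another one is still logged each time it reappears; the suppression targets the common case of one persistent failure repeating on every poll attempt.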
@@ -1,6 +1,6 @@
  # frozen_string_literal: true
 
  module Cpflow
- VERSION = "5.1.0"
+ VERSION = "5.1.1"
  MIN_CPLN_VERSION = "3.1.0"
  end
@@ -23,11 +23,23 @@ For the normal generated review-app path, GitHub needs one repository secret:
  | --- | --- | --- |
  | `CPLN_TOKEN_STAGING` | Repository secret | Control Plane service-account token for the staging/review org. |
 
+ For public repositories, use a staging/review token that cannot access
+ production Control Plane resources. Generated review-app deploys skip fork PR
+ heads because Docker builds use repository secrets. If a forked change needs a
+ review app, first move the reviewed change to a trusted branch in this
+ repository.
+
  No repository variables are required for the standard review-app path when
  `.controlplane/controlplane.yml` has exactly one review app entry with
  `match_if_app_name_starts_with: true`. cpflow infers the review-app prefix and
  staging org from that config.
 
+ Review apps run pull request code. Any value mounted through
+ `cpln://secret/...` can be read by that code after the workload starts, so keep
+ review-app secret dictionaries limited to disposable databases, review-only
+ renderer credentials, and license values that are acceptable for review-app
+ exposure.
+
  Optional overrides exist for forks, clones, and unusual apps:
 
  | Name | Notes |
@@ -142,7 +154,7 @@ Most apps do not need these:
  | Name | Notes |
  | --- | --- |
  | `DOCKER_BUILD_EXTRA_ARGS` | Newline-delimited extra Docker build tokens. |
- | `DOCKER_BUILD_SSH_KEY` | Private SSH key for Docker builds that fetch private dependencies. |
+ | `DOCKER_BUILD_SSH_KEY` | Read-only, revocable deploy key for Docker builds that fetch private dependencies. Do not use a personal SSH key. |
  | `DOCKER_BUILD_SSH_KNOWN_HOSTS` | SSH known_hosts entries when SSH build hosts are not GitHub.com. |
  | `REVIEW_APP_DEPLOYING_ICON_URL` | Cosmetic custom image URL for the animated deploying icon. Set to `none` to use the text fallback icon. |
  | `STAGING_APP_BRANCH` | Custom staging branch. The branch must also appear in `cpflow-deploy-staging.yml`'s push filter. |