RubyGems - cpflow - Versions diffs - 5.1.0 → 5.1.1 - Mend

cpflow 5.1.0 → 5.1.1


Files changed (29)

checksums.yaml +4 -4
data/.github/actions/cpflow-wait-for-health/action.yml +11 -4
data/.github/workflows/cpflow-promote-staging-to-production.yml +224 -37
data/.github/workflows/rspec-shared.yml +8 -1
data/CHANGELOG.md +15 -1
data/Gemfile.lock +1 -1
data/README.md +4 -0
data/docs/assets/logo/favicon.ico +0 -0
data/docs/assets/logo/icon-1024.png +0 -0
data/docs/assets/logo/icon-128.png +0 -0
data/docs/assets/logo/icon-16.png +0 -0
data/docs/assets/logo/icon-192.png +0 -0
data/docs/assets/logo/icon-24.png +0 -0
data/docs/assets/logo/icon-32.png +0 -0
data/docs/assets/logo/icon-48.png +0 -0
data/docs/assets/logo/icon-512.png +0 -0
data/docs/assets/logo/icon-64.png +0 -0
data/docs/assets/logo/icon-tile.svg +17 -0
data/docs/assets/logo/mark-transparent.svg +16 -0
data/docs/ci-automation.md +43 -2
data/docs/commands.md +5 -1
data/lib/command/maintenance_off.rb +1 -0
data/lib/command/maintenance_on.rb +1 -0
data/lib/command/run.rb +25 -5
data/lib/core/maintenance_mode.rb +93 -6
data/lib/cpflow/version.rb +1 -1
data/lib/github_flow_templates/.github/cpflow-help.md +13 -1
data/lib/github_flow_templates/.github/workflows/cpflow-promote-staging-to-production.yml +224 -39
metadata +14 -2

data/docs/commands.md CHANGED

@@ -315,6 +315,7 @@ cpflow maintenance -a $APP_NAME
 ### `maintenance:off`
 - Disables maintenance mode for an app
+- Safe to re-run: if a previous run timed out after switching the domain but before stopping the maintenance workload, re-running while maintenance mode is already disabled stops the maintenance workload to finish it (so it is not a pure no-op)
 - Specify the one-off workload through `one_off_workload` in the `.controlplane/controlplane.yml` file
 - Optionally specify the maintenance workload through `maintenance_workload` in the `.controlplane/controlplane.yml` file (defaults to 'maintenance')
 - Maintenance mode is only supported for domains that use path based routing mode and have a route configured for the prefix '/' on either port 80 or 443
@@ -326,6 +327,7 @@ cpflow maintenance:off -a $APP_NAME
 ### `maintenance:on`
 - Enables maintenance mode for an app
+- Safe to re-run: if a previous run timed out after switching the domain but before stopping the app workloads, re-running while maintenance mode is already enabled stops the app workloads to finish it (so it is not a pure no-op)
 - Specify the one-off workload through `one_off_workload` in the `.controlplane/controlplane.yml` file
 - Optionally specify the maintenance workload through `maintenance_workload` in the `.controlplane/controlplane.yml` file (defaults to 'maintenance')
 - Maintenance mode is only supported for domains that use path based routing mode and have a route configured for the prefix '/' on either port 80 or 443
@@ -466,6 +468,8 @@ timeout 300 cpflow ps:wait -a $APP_NAME
   and also overridden per job through `--cpu` and `--memory`)
 - By default, the job is stopped if it takes longer than 6 hours to finish
   (can be configured though `runner_job_timeout` in `controlplane.yml`)
+- Non-interactive jobs return the Control Plane cron job status even when the job finishes before
+  Control Plane exposes a runner replica to attach logs to
 ```sh
 # Opens shell (bash by default).
@@ -550,7 +554,7 @@ cpflow terraform import
 Regenerates the generated cpflow GitHub Actions wrappers and helper files
 from the currently installed cpflow gem. Use this after updating the
 cpflow gem so checked-in workflow wrappers move to the matching upstream
-release tag, for example `v5.0.4`.
+release tag, for example `v5.1.0`.
 If the existing generated staging workflow uses a custom single staging
 branch, the command preserves it. Pass `--staging-branch BRANCH` to set or

data/lib/command/maintenance_off.rb CHANGED

@@ -10,6 +10,7 @@ module Command
     DESCRIPTION = "Disables maintenance mode for an app"
     LONG_DESCRIPTION = <<~DESC
       - Disables maintenance mode for an app
+      - Safe to re-run: if a previous run timed out after switching the domain but before stopping the maintenance workload, re-running while maintenance mode is already disabled stops the maintenance workload to finish it (so it is not a pure no-op)
       - Specify the one-off workload through `one_off_workload` in the `.controlplane/controlplane.yml` file
       - Optionally specify the maintenance workload through `maintenance_workload` in the `.controlplane/controlplane.yml` file (defaults to 'maintenance')
       - Maintenance mode is only supported for domains that use path based routing mode and have a route configured for the prefix '/' on either port 80 or 443

data/lib/command/maintenance_on.rb CHANGED

@@ -10,6 +10,7 @@ module Command
     DESCRIPTION = "Enables maintenance mode for an app"
     LONG_DESCRIPTION = <<~DESC
       - Enables maintenance mode for an app
+      - Safe to re-run: if a previous run timed out after switching the domain but before stopping the app workloads, re-running while maintenance mode is already enabled stops the app workloads to finish it (so it is not a pure no-op)
       - Specify the one-off workload through `one_off_workload` in the `.controlplane/controlplane.yml` file
       - Optionally specify the maintenance workload through `maintenance_workload` in the `.controlplane/controlplane.yml` file (defaults to 'maintenance')
       - Maintenance mode is only supported for domains that use path based routing mode and have a route configured for the prefix '/' on either port 80 or 443

data/lib/command/run.rb CHANGED

@@ -47,6 +47,8 @@ module Command
         and also overridden per job through `--cpu` and `--memory`)
       - By default, the job is stopped if it takes longer than 6 hours to finish
         (can be configured though `runner_job_timeout` in `controlplane.yml`)
+      - Non-interactive jobs return the Control Plane cron job status even when the job finishes before
+        Control Plane exposes a runner replica to attach logs to
     DESC
     EXAMPLES = <<~EX.freeze
       ```sh
@@ -97,7 +99,7 @@ module Command
     attr_reader :interactive, :detached, :location, :original_workload, :runner_workload,
                 :default_image, :default_cpu, :default_memory, :job_timeout, :job_history_limit,
-                :container, :job, :replica, :command
+                :container, :job, :replica, :command, :job_completed_before_replica_exit_status
     def call # rubocop:disable Metrics/CyclomaticComplexity, Metrics/MethodLength, Metrics/PerceivedComplexity
       @interactive = config.options[:interactive] || interactive_command?
@@ -129,6 +131,7 @@ module Command
       update_runner_workload
       start_job
       wait_for_replica_for_job
+      exit(job_completed_before_replica_exit_status) if job_completed_before_replica_exit_status
       progress.puts
       if interactive
@@ -269,7 +272,20 @@ module Command
         result = cp.fetch_workload_replicas(runner_workload, location: location)
         @replica = result&.dig("items")&.find { |item| item.include?(job) }
-        replica || false
+        replica || completed_job_before_replica? || false
+      end
+    end
+    def completed_job_before_replica?
+      case current_job_status
+      when "successful"
+        @job_completed_before_replica_exit_status = ExitCode::SUCCESS
+        true
+      when nil, "active", "pending"
+        false
+      else
+        @job_completed_before_replica_exit_status = ExitCode::ERROR_DEFAULT
+        true
       end
     end
@@ -505,9 +521,7 @@ module Command
     def resolve_job_status # rubocop:disable Metrics/MethodLength
       loop do
-        result = cp.fetch_cron_workload(runner_workload, location: location)
-        job_details = result&.dig("items")&.find { |item| item["id"] == job }
-        status = job_details&.dig("status")
+        status = current_job_status
         Shell.debug("JOB STATUS", status)
@@ -522,6 +536,12 @@ module Command
       end
     end
+    def current_job_status
+      result = cp.fetch_cron_workload(runner_workload, location: location)
+      job_details = result&.dig("items")&.find { |item| item["id"] == job }
+      job_details&.dig("status")
+    end
     ###########################################
     ### temporary extaction from run:detached
     ###########################################
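
Note on the `run.rb` change above: the new `completed_job_before_replica?` branch reduces to a mapping from the cron job status to an optional exit code. The following is a minimal standalone sketch of that mapping, not the gem's code. `ExitCodeSketch` and `early_exit_status_for` are hypothetical names, the numeric values are placeholders rather than cpflow's real `ExitCode` constants, and `"failed"` is only an illustrative example of "any other status".

```ruby
# Hypothetical stand-in for cpflow's ExitCode module; the values are placeholders.
module ExitCodeSketch
  SUCCESS = 0
  ERROR_DEFAULT = 1
end

# Returns nil while the job may still expose a replica to attach to,
# otherwise the exit status to use without attaching to logs.
def early_exit_status_for(status)
  case status
  when "successful"
    ExitCodeSketch::SUCCESS
  when nil, "active", "pending"
    # Status unknown or job still in progress: keep waiting for a replica.
    nil
  else
    # Any other status is treated as a finished, unsuccessful job.
    ExitCodeSketch::ERROR_DEFAULT
  end
end

["successful", "active", nil, "failed"].each do |status|
  puts "#{status.inspect} => #{early_exit_status_for(status).inspect}"
end
```

The design point this illustrates is that `nil` means "keep polling", so only a terminal status short-circuits the wait for a replica.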

data/lib/core/maintenance_mode.rb CHANGED

@@ -1,8 +1,19 @@
 # frozen_string_literal: true
-class MaintenanceMode
+class MaintenanceMode # rubocop:disable Metrics/ClassLength
   extend Forwardable
+  DOMAIN_WORKLOAD_UPDATE_MAX_POLL_ATTEMPTS = 30
+  DOMAIN_WORKLOAD_UPDATE_RETRY_WAIT_SECONDS = 1
+  DOMAIN_WORKLOAD_UPDATE_STEP_OPTIONS = {
+    retry_on_failure: true,
+    # `with_retry` loops while `retry_count <= max_retry_count` starting from 0, so
+    # total attempts == max_retry_count + 1. Subtract 1 so the bounded poll runs
+    # exactly DOMAIN_WORKLOAD_UPDATE_MAX_POLL_ATTEMPTS times.
+    max_retry_count: DOMAIN_WORKLOAD_UPDATE_MAX_POLL_ATTEMPTS - 1,
+    wait: DOMAIN_WORKLOAD_UPDATE_RETRY_WAIT_SECONDS
+  }.freeze
   def_delegators :@command, :config, :progress, :cp, :step, :run_cpflow_command
   def initialize(command)
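
Note on the `max_retry_count` arithmetic in the hunk above: the comment says a loop running while `retry_count <= max_retry_count`, starting from 0, performs `max_retry_count + 1` attempts. The sketch below checks that count with a hypothetical loop of the same shape. `attempts_with_retry` is an illustrative name and is not cpflow's actual `with_retry` implementation.

```ruby
# Hypothetical loop with the shape described in the diff comment:
# it runs while retry_count <= max_retry_count, starting from 0.
# Returns the number of attempts that were made.
def attempts_with_retry(max_retry_count:, wait: 0)
  attempts = 0
  retry_count = 0
  while retry_count <= max_retry_count
    attempts += 1
    # A truthy block result means the poll succeeded, so stop early.
    return attempts if yield(attempts)

    sleep(wait) if wait.positive?
    retry_count += 1
  end
  attempts
end

max_poll_attempts = 30
# A block that never succeeds exhausts the whole poll window.
total = attempts_with_retry(max_retry_count: max_poll_attempts - 1) { false }
puts "attempts made: #{total}"
```

Passing `max_poll_attempts - 1` is what keeps the bounded poll at the intended number of attempts instead of one more.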
@@ -22,6 +33,7 @@ class MaintenanceMode
   def enable!
     if enabled?
       progress.puts("Maintenance mode is already enabled for app '#{config.app}'.")
+      ensure_app_workloads_stopped
     else
       enable_maintenance_mode
     end
@@ -30,6 +42,7 @@ class MaintenanceMode
   def disable!
     if disabled?
       progress.puts("Maintenance mode is already disabled for app '#{config.app}'.")
+      ensure_maintenance_workload_stopped
     else
       disable_maintenance_mode
     end
@@ -69,6 +82,28 @@ class MaintenanceMode
     cp.fetch_workload!(maintenance_workload)
   end
+  # A run that already switched the route but hit the poll timeout aborts before
+  # its final workload-stop step runs. The next `enable!`/`disable!` short-circuits
+  # on the route check, so do the matching stop here — once the route is on the
+  # target, this brings the workloads into the state that route implies. `ps:stop`
+  # is idempotent, so each is a no-op once the target workload is already stopped.
+  #
+  # The stop target differs by direction. `ps:stop -a` covers only
+  # `app_workloads` + `additional_workloads`, never the maintenance workload:
+  #   - enable!: the route now points at the maintenance workload, so the *app*
+  #     workloads are the ones left running and `ps:stop -a` is correct.
+  #   - disable!: the route now points at the app workloads (and a short-circuit
+  #     `disable!` can run on an app whose app workloads are serving live traffic),
+  #     so stopping all workloads would cause an outage. The workload a timed-out
+  #     `disable!` leaves running is the maintenance workload, so stop only that.
+  def ensure_app_workloads_stopped
+    start_or_stop_all_workloads(:stop)
+  end
+  def ensure_maintenance_workload_stopped
+    start_or_stop_maintenance_workload(:stop)
+  end
   def start_or_stop_all_workloads(action)
     run_cpflow_command("ps:#{action}", "-a", config.app, "--wait")
@@ -82,16 +117,68 @@ class MaintenanceMode
   end
   def switch_domain_workload(to:)
-    step("Switching workload for domain '#{domain_data['name']}' to '#{to}'") do
-      cp.set_domain_workload(domain_data, to)
-      # Give it a bit of time for the domain to update
-      Kernel.sleep(30)
+    domain_name = domain_data["name"]
+    # Unlike the polling step below, the switch request is intentionally not
+    # retried: if it fails, nothing has changed yet, so aborting and letting the
+    # user re-run is the safe outcome. (Retrying would not help here anyway —
+    # `with_retry` retries on a falsy return, and `set_domain_workload` raises
+    # rather than returning false.)
+    step("Requesting workload switch for domain '#{domain_name}' to '#{to}'") do
+      # `set_domain_workload` mutates the route in place, so send a deep copy
+      # (round-tripped through JSON, since the domain is plain parsed-API data
+      # with string keys and JSON-native values) to keep the cached
+      # `@domain_data` reflecting the real server route. The poll re-fetches and
+      # matches on that fresh data, but if every poll times out without a routable
+      # fetch, `@domain_data` is what a re-run's `enabled?`/`disabled?` check reads
+      # — mutating it here would make that check report the requested route, not
+      # the actual one.
+      domain_data_for_update = JSON.parse(JSON.generate(domain_data))
+      cp.set_domain_workload(domain_data_for_update, to)
     end
+    wait_for_domain_workload_switch(domain_name, to)
     progress.puts
   end
+  # If the route never switches within the bounded poll window, this step aborts
+  # (abort_on_error) before any workloads are stopped, so traffic stays on the
+  # current workload. The label tells the user how to recover, since an exhausted
+  # poll has no error message of its own to print.
+  def wait_for_domain_workload_switch(domain_name, to)
+    @last_poll_error = nil # reset the poll-error dedup state for this poll window
+    step("Waiting for domain '#{domain_name}' workload to switch to '#{to}' " \
+         "(re-run this command if it times out)", **DOMAIN_WORKLOAD_UPDATE_STEP_OPTIONS) do
+      domain_workload_update_confirmed?(domain_name, to)
+    end
+  end
+  # Refetches the domain, refreshes the cached `@domain_data` when the fetch
+  # returns a routable domain, and reports whether the route now points at
+  # `workload`. Any error — a 5xx mid-propagation, a transient 403
+  # (`ForbiddenError < StandardError`, not a `RuntimeError`), or a network blip —
+  # is treated as "not switched yet" so the poll keeps retrying. The broad rescue
+  # logs the error to the step's stderr, so a latent bug (e.g. `NoMethodError`)
+  # surfaces in the "failed!" output on timeout instead of being swallowed.
+  def domain_workload_update_confirmed?(domain_name, workload)
+    refreshed_domain_data = cp.fetch_domain(domain_name)
+    @domain_data = refreshed_domain_data if refreshed_domain_data
+    refreshed_domain_data && cp.domain_workload_matches?(refreshed_domain_data, workload)
+  rescue StandardError => e
+    # A persistent failure (bad domain name, network outage, a latent bug) repeats
+    # the same error on every poll attempt, so only log when the message changes —
+    # otherwise the timeout output would carry up to MAX_POLL_ATTEMPTS identical
+    # lines. Guard on `tmp_stderr` so this stays safe if ever called outside a
+    # `step` block, where no tmp stderr is set up.
+    message = "#{e.class}: #{e.message} (#{e.backtrace&.first})\n"
+    if message != @last_poll_error && Shell.tmp_stderr
+      Shell.write_to_tmp_stderr(message)
+      @last_poll_error = message
+    end
+    false
+  end
   def domain_data
     @domain_data ||=
       if config.domain
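
Note on `domain_workload_update_confirmed?` above: the method treats any `StandardError` as "not switched yet" and logs an error only when its message differs from the previous one. A minimal sketch of that pattern follows, under stated assumptions: `RoutePollSketch`, the `fetch` lambda, the `log` sink, and the flat `{ "workload" => ... }` shape are all hypothetical and stand in for `cp.fetch_domain`, the step's temporary stderr, and the real domain data.

```ruby
require "stringio"

# Hypothetical poller; not cpflow's API.
class RoutePollSketch
  def initialize(fetch:, log:)
    @fetch = fetch
    @log = log
    @last_error = nil
  end

  # True only when the fetched route points at `workload`.
  # Any StandardError counts as "not switched yet" and is logged
  # once per distinct message.
  def confirmed?(workload)
    data = @fetch.call
    !data.nil? && data["workload"] == workload
  rescue StandardError => e
    message = "#{e.class}: #{e.message}\n"
    if message != @last_error
      @log.write(message)
      @last_error = message
    end
    false
  end
end

# Simulated API: two identical failures, a stale route, then the new route.
responses = [
  -> { raise "503 from API" },
  -> { raise "503 from API" },
  -> { { "workload" => "rails" } },
  -> { { "workload" => "maintenance" } }
]
log = StringIO.new
poller = RoutePollSketch.new(fetch: -> { responses.shift.call }, log: log)
results = Array.new(4) { poller.confirmed?("maintenance") }
puts results.inspect
puts log.string
```

Deduplicating on the message keeps a persistent failure from filling the timeout output with identical lines, while a changed error still gets logged.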

data/lib/cpflow/version.rb CHANGED

@@ -1,6 +1,6 @@
 # frozen_string_literal: true
 module Cpflow
-  VERSION = "5.1.0"
+  VERSION = "5.1.1"
   MIN_CPLN_VERSION = "3.1.0"
 end

data/lib/github_flow_templates/.github/cpflow-help.md CHANGED

@@ -23,11 +23,23 @@ For the normal generated review-app path, GitHub needs one repository secret:
 | --- | --- | --- |
 | `CPLN_TOKEN_STAGING` | Repository secret | Control Plane service-account token for the staging/review org. |
+For public repositories, use a staging/review token that cannot access
+production Control Plane resources. Generated review-app deploys skip fork PR
+heads because Docker builds use repository secrets. If a forked change needs a
+review app, first move the reviewed change to a trusted branch in this
+repository.
 No repository variables are required for the standard review-app path when
 `.controlplane/controlplane.yml` has exactly one review app entry with
 `match_if_app_name_starts_with: true`. cpflow infers the review-app prefix and
 staging org from that config.
+Review apps run pull request code. Any value mounted through
+`cpln://secret/...` can be read by that code after the workload starts, so keep
+review-app secret dictionaries limited to disposable databases, review-only
+renderer credentials, and license values that are acceptable for review-app
+exposure.
 Optional overrides exist for forks, clones, and unusual apps:
 | Name | Notes |
@@ -142,7 +154,7 @@ Most apps do not need these:
 | Name | Notes |
 | --- | --- |
 | `DOCKER_BUILD_EXTRA_ARGS` | Newline-delimited extra Docker build tokens. |
-| `DOCKER_BUILD_SSH_KEY` | Private SSH key for Docker builds that fetch private dependencies. |
+| `DOCKER_BUILD_SSH_KEY` | Read-only, revocable deploy key for Docker builds that fetch private dependencies. Do not use a personal SSH key. |
 | `DOCKER_BUILD_SSH_KNOWN_HOSTS` | SSH known_hosts entries when SSH build hosts are not GitHub.com. |
 | `REVIEW_APP_DEPLOYING_ICON_URL` | Cosmetic custom image URL for the animated deploying icon. Set to `none` to use the text fallback icon. |
 | `STAGING_APP_BRANCH` | Custom staging branch. The branch must also appear in `cpflow-deploy-staging.yml`'s push filter. |