solid_queue_autoscaler 1.0.13 → 1.0.16
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/CHANGELOG.md +57 -0
- data/README.md +88 -2
- data/lib/generators/solid_queue_autoscaler/templates/create_solid_queue_autoscaler_locks.rb.erb +30 -0
- data/lib/solid_queue_autoscaler/adapters/heroku.rb +38 -0
- data/lib/solid_queue_autoscaler/advisory_lock.rb +168 -12
- data/lib/solid_queue_autoscaler/version.rb +1 -1
- metadata +17 -2
checksums.yaml
CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 0d2ec8d0897f2312d05ccc10d075e5b90bd7c31e5e699fc34bfff13ba5be513b
+  data.tar.gz: b6e26a9f33e0c86f8809c4c52f04059ed62b4f1c3097ef6421a25cafb5eeed72
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 4d9cd4937e412fded6a640c9044dfa846b7006848647a3ffa2c78a70fe3f0b03bf1b5f34dc0555f0e5489e0d3c4d23fef575c4e7f16ff421ad2385355ea6919f
+  data.tar.gz: 1ed169508ba540f4ec3dcfaf6bcea5d04812f4385c4231f0033a98d10b891d3722f2b0d46184d4bd7ffea9b860b6e70f5b3b07856533104ae06e3d31d09e2be7
data/CHANGELOG.md
CHANGED

@@ -7,6 +7,63 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
+## [1.0.16] - 2025-01-30
+
+### Added
+- **Comprehensive scale-to-zero documentation** - Added dedicated "Scale to Zero" section in README:
+  - Explains how `min_workers = 0` works with Heroku formation behavior
+  - Documents the v1.0.15 fix for graceful 404 handling
+  - Includes configuration examples and cold-start latency considerations
+  - Guidance on where to run the autoscaler (web dyno vs workers)
+- Updated Features list and linked Cost-Optimized example
+
+## [1.0.15] - 2025-01-30
+
+### Fixed
+- **Fixed Heroku adapter 404 error when querying scaled-to-zero dynos** - When a dyno type is scaled to 0 and removed from Heroku's formation, the API returns 404. The adapter now handles this gracefully:
+  - `current_workers` returns 0 instead of raising an error when the formation doesn't exist
+  - `scale` falls back to the `batch_update` API to create the formation when `update` returns 404
+  - Added `create_formation` private method using Heroku's batch_update endpoint
+  - This enables full scale-to-zero support with `min_workers = 0`
+
+## [1.0.14] - 2025-01-18
+
+### Added
+- **SQLite and MySQL support for advisory locks** - AdvisoryLock now supports multiple database adapters:
+  - PostgreSQL: Uses native `pg_try_advisory_lock`/`pg_advisory_unlock`
+  - MySQL/Trilogy: Uses `GET_LOCK`/`RELEASE_LOCK`
+  - SQLite: Uses table-based locking with an auto-created locks table
+  - Other databases: Falls back to table-based locking
+  - Automatic adapter detection via `connection.adapter_name`
+  - Stale lock cleanup (locks older than 5 minutes are removed)
+  - Lock ownership tracking (`hostname:pid:thread_id`)
+
+- **Comprehensive configuration tests** - Added 100+ tests across Rails and Sinatra dummy apps:
+  - Tests for ALL configuration options (job_queue, job_priority, scaling thresholds, cooldowns, etc.)
+  - Decision engine threshold tests verifying scaling logic
+  - End-to-end tests with a mocked Heroku API verifying the full scaling workflow
+  - Queue name and priority regression tests (prevents jobs going to the wrong queue)
+
+- **GitHub Actions integration test workflow** - New CI job that runs dummy app tests:
+  - Runs Rails dummy app tests (62 tests)
+  - Runs Sinatra dummy app tests (58 tests)
+  - Ensures queue name, priority, and E2E scaling tests pass before release
+
+- **Release workflow now requires CI to pass** - Updated release.yml to use the `workflow_run` trigger:
+  - Release only runs after the CI workflow completes successfully
+  - All unit tests, integration tests, and linting must pass before publishing
+
+### Fixed
+- **Fixed test pollution in autoscale_job_spec** - Changed from RSpec's `described_class` (which caches class references) to dynamic constant lookup, preventing stale class reference issues when tests reload the AutoscaleJob class
+
+## [1.0.13] - 2025-01-17
+
+### Fixed
+- **Fixed AutoscaleJob queue_name type mismatch** - The queue name is now converted to a string when set via `apply_job_settings!`
+  - ActiveJob internally uses strings for queue names, but the configuration uses symbols
+  - This caused jobs to have symbol queue names (`:autoscaler`) instead of strings (`"autoscaler"`)
+  - `apply_job_settings!` now calls `.to_s` on the job_queue to ensure a consistent string format
+
 ## [1.0.12] - 2025-01-17
 
 ### Fixed
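
To make the 1.0.13 queue-name entry concrete, here is a minimal sketch of the described fix, assuming an ActiveJob-based `AutoscaleJob` and a config object exposing `job_queue` and `job_priority`; the method body is illustrative, not the gem's actual source:

```ruby
require 'active_job'

class AutoscaleJob < ActiveJob::Base
  # Hypothetical sketch of the v1.0.13 fix: the configured queue symbol is
  # normalized to a string before assignment, because ActiveJob (and the rows
  # Solid Queue stores) use string queue names.
  def self.apply_job_settings!(config)
    self.queue_name = config.job_queue.to_s        # :autoscaler => "autoscaler"
    self.priority   = config.job_priority if config.job_priority
  end
end
```

Without the `.to_s`, enqueued jobs carried the symbol `:autoscaler` as their queue name, which is the mismatch the entry above describes.
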
data/README.md
CHANGED

@@ -10,12 +10,98 @@ A control plane for [Solid Queue](https://github.com/rails/solid_queue) that aut
 - **Metrics-based scaling**: Scales based on queue depth, job latency, and throughput
 - **Multiple scaling strategies**: Fixed increment or proportional scaling based on load
 - **Multi-worker support**: Configure and scale different worker types independently
+- **Scale to zero**: Full support for `min_workers = 0` to eliminate costs during idle periods
 - **Platform adapters**: Native support for Heroku and Kubernetes
 - **Singleton execution**: Uses PostgreSQL advisory locks to ensure only one autoscaler runs at a time
 - **Safety features**: Cooldowns, min/max limits, dry-run mode
 - **Rails integration**: Configuration via initializer, Railtie with rake tasks
 - **Flexible execution**: Run as a recurring Solid Queue job or standalone
 
+## Scale to Zero
+
+The autoscaler fully supports scaling workers to zero (`min_workers = 0`), allowing you to eliminate worker costs during idle periods.
+
+### How It Works
+
+When you configure `min_workers = 0` and the queue becomes idle, the autoscaler will scale your workers down to zero. This is ideal for:
+
+- **Development/staging environments** with sporadic usage
+- **Batch processing workers** that only run when jobs are queued
+- **Cost-sensitive applications** with predictable idle periods
+
+### Heroku Formation Behavior
+
+On Heroku, when a dyno type is scaled to 0, it gets **removed from the formation entirely**. This means:
+
+1. `heroku ps:scale worker=0` removes the `worker` formation
+2. Subsequent API calls to get formation info return **404 Not Found**
+3. When scaling back up, the formation must be **recreated**
+
+As of **v1.0.15**, the autoscaler handles this gracefully:
+
+- When querying a non-existent formation, it returns `0` workers (instead of raising an error)
+- When scaling up a non-existent formation, it automatically creates it using Heroku's batch update API
+- This enables seamless scale-to-zero → scale-up workflows
+
+### Configuration Example
+
+```ruby
+SolidQueueAutoscaler.configure(:batch_worker) do |config|
+  config.adapter = :heroku
+  config.heroku_api_key = ENV['HEROKU_API_KEY']
+  config.heroku_app_name = ENV['HEROKU_APP_NAME']
+  config.process_type = 'batch_worker'
+
+  # Enable scale-to-zero
+  config.min_workers = 0
+  config.max_workers = 5
+
+  # Scale up immediately when any job is queued
+  config.scale_up_queue_depth = 1
+  config.scale_up_latency_seconds = 60
+
+  # Scale down when completely idle
+  config.scale_down_queue_depth = 0
+  config.scale_down_latency_seconds = 10
+
+  # Short scale-up cooldown, longer scale-down cooldown to avoid scaling to zero prematurely
+  config.scale_up_cooldown_seconds = 30
+  config.scale_down_cooldown_seconds = 300 # 5 minutes
+end
+```
+
+### Important Considerations
+
+**Cold-start latency**: When workers are at zero and a job is enqueued, there will be latency before the job is processed:
+1. The autoscaler job must run (depends on your `schedule` interval)
+2. The autoscaler must scale up workers
+3. Heroku must provision and start the dyno (~10-30 seconds)
+4. The worker must boot and start processing
+
+Total cold-start time is typically **30-90 seconds** depending on your configuration and dyno startup time.
+
+**Where to run the autoscaler**: The autoscaler job **must run on a process that's always running** (like your web dyno), NOT on the workers being scaled. If the autoscaler runs on workers and those workers scale to zero, there's nothing to scale them back up!
+
+```yaml
+# config/recurring.yml - runs on whatever process runs the dispatcher
+autoscaler_batch:
+  class: SolidQueueAutoscaler::AutoscaleJob
+  queue: autoscaler
+  schedule: every 30 seconds
+  args: [:batch_worker]
+```
+
+**Procfile setup**: Ensure your web dyno runs the Solid Queue dispatcher (or use a dedicated always-on dyno):
+
+```
+# Procfile
+web: bundle exec puma -C config/puma.rb
+worker: bundle exec rake solid_queue:start
+batch_worker: bundle exec rake solid_queue:start
+```
+
+Alternatively, run the dispatcher in a thread within your web process using `solid_queue.yml` configuration.
+
 ## Installation
 
 Add to your Gemfile:

@@ -308,7 +394,7 @@ autoscaler:
 
 ### Cost-Optimized Setup (Scale to Zero)
 
-For apps with sporadic workloads where you want to minimize costs during idle periods
+For apps with sporadic workloads where you want to minimize costs during idle periods. See the [Scale to Zero](#scale-to-zero) section for full details on how this works.
 
 ```ruby
 SolidQueueAutoscaler.configure do |config|

@@ -337,7 +423,7 @@ SolidQueueAutoscaler.configure do |config|
 end
 ```
 
-**⚠️ Note:** With `min_workers = 0`, there's cold-start latency when the first job arrives. The autoscaler must run on a web dyno or separate process, not on the workers themselves.
+**⚠️ Note:** With `min_workers = 0`, there's cold-start latency (~30-90s) when the first job arrives. The autoscaler must run on a web dyno or separate always-on process, not on the workers themselves. See [Scale to Zero](#scale-to-zero) for details.
 
 ---
 
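
Beyond the README's recurring schedule, one optional way to trim the cold-start wait is to enqueue the autoscale job opportunistically when work arrives. The controller and `GenerateReportJob` below are hypothetical; only `SolidQueueAutoscaler::AutoscaleJob` and the `:batch_worker` argument mirror the recurring.yml example above:

```ruby
class ReportsController < ApplicationController
  def create
    # Enqueue the actual work destined for the batch_worker process type.
    GenerateReportJob.perform_later(params[:report_id])

    # Nudge the autoscaler right away instead of waiting for the next
    # recurring tick, so a scaled-to-zero formation is recreated sooner.
    SolidQueueAutoscaler::AutoscaleJob.perform_later(:batch_worker)

    head :accepted
  end
end
```
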
data/lib/generators/solid_queue_autoscaler/templates/create_solid_queue_autoscaler_locks.rb.erb
ADDED

@@ -0,0 +1,30 @@
+# frozen_string_literal: true
+
+# Migration for SolidQueueAutoscaler locks table.
+# This table is used for advisory locking on databases that don't support
+# native advisory locks (SQLite, etc.).
+#
+# NOTE: This migration is OPTIONAL. The locks table is automatically created
+# when first needed. Only use this migration if you prefer to manage the
+# table schema explicitly.
+#
+# For multi-database setups (SolidQueue in separate database):
+#   This migration should be placed in db/queue_migrate/ (or your queue DB's migration path)
+#   Run with: rails db:migrate:queue
+#
+# For single-database setups:
+#   Place in db/migrate/ and run: rails db:migrate
+#
+class CreateSolidQueueAutoscalerLocks < ActiveRecord::Migration<%= migration_version %>
+  def change
+    create_table :solid_queue_autoscaler_locks, id: false do |t|
+      t.string :lock_key, null: false, primary_key: true
+      t.integer :lock_id, null: false
+      t.datetime :locked_at, null: false
+      t.string :locked_by, null: false
+    end
+
+    # Index for cleanup of stale locks
+    add_index :solid_queue_autoscaler_locks, :locked_at
+  end
+end
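
A small console sketch (not from the gem) for inspecting this table when the table-based strategy is active; the table and column names come from the migration above, the 300-second threshold matches the stale-lock cleanup described in the changelog, and `ActiveRecord::Base.connection` assumes the locks table lives in your primary database (use the queue database's connection in multi-database setups):

```ruby
conn = ActiveRecord::Base.connection
stale = conn.quote(Time.now.utc - 300) # locks older than 5 minutes count as stale

# List the lock rows currently held
conn.select_all('SELECT lock_key, locked_at, locked_by FROM solid_queue_autoscaler_locks').to_a

# Clear abandoned locks by hand (the lock strategy also does this automatically)
conn.execute("DELETE FROM solid_queue_autoscaler_locks WHERE locked_at < #{stale}")
```
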
data/lib/solid_queue_autoscaler/adapters/heroku.rb
CHANGED

@@ -33,6 +33,13 @@ module SolidQueueAutoscaler
        formation['quantity']
      end
    rescue Excon::Error => e
+     # Handle 404 gracefully - formation doesn't exist means 0 workers
+     # This happens when a dyno type is scaled to 0 and removed from formation
+     if e.respond_to?(:response) && e.response&.status == 404
+       logger&.debug("[Autoscaler] Formation '#{process_type}' not found, treating as 0 workers")
+       return 0
+     end
+
      raise HerokuAPIError.new(
        "Failed to get formation info: #{e.message}",
        status_code: e.respond_to?(:response) ? e.response&.status : nil,

@@ -51,6 +58,12 @@ module SolidQueueAutoscaler
      end
      quantity
    rescue Excon::Error => e
+     # Handle 404 by trying to create the formation via batch_update
+     # This happens when scaling up a dyno type that was previously scaled to 0
+     if e.respond_to?(:response) && e.response&.status == 404
+       return create_formation(quantity)
+     end
+
      raise HerokuAPIError.new(
        "Failed to scale #{process_type} to #{quantity}: #{e.message}",
        status_code: e.respond_to?(:response) ? e.response&.status : nil,

@@ -84,6 +97,31 @@ module SolidQueueAutoscaler
 
    private
 
+   # Creates a formation that doesn't exist using batch_update.
+   # This is needed when scaling up a dyno type that was previously scaled to 0.
+   #
+   # @param quantity [Integer] desired worker count
+   # @return [Integer] the new worker count
+   # @raise [HerokuAPIError] if the API call fails
+   def create_formation(quantity)
+     logger&.info("[Autoscaler] Formation '#{process_type}' not found, creating with quantity #{quantity}")
+
+     with_retry(RETRYABLE_ERRORS, retryable_check: method(:retryable_error?)) do
+       client.formation.batch_update(app_name, {
+         updates: [
+           { type: process_type, quantity: quantity }
+         ]
+       })
+     end
+     quantity
+   rescue Excon::Error => e
+     raise HerokuAPIError.new(
+       "Failed to create formation #{process_type} with quantity #{quantity}: #{e.message}",
+       status_code: e.respond_to?(:response) ? e.response&.status : nil,
+       response_body: e.respond_to?(:response) ? e.response&.body : nil
+     )
+   end
+
    # Determines if an error should be retried.
    # Retries timeouts and 5xx errors, but not 4xx client errors.
    def retryable_error?(error)
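
Condensing the two rescue paths above, a hypothetical sketch of the scale-to-zero round trip; `reconcile_workers` and its arguments are invented for illustration, while `current_workers`, `scale`, and the 404 handling come from the hunks above:

```ruby
# adapter: an instance of the Heroku adapter shown above
# config:  any object exposing min_workers / max_workers
def reconcile_workers(adapter, config, desired:)
  current = adapter.current_workers          # missing formation (404) now returns 0
  return current if current == desired

  clamped = desired.clamp(config.min_workers, config.max_workers)
  adapter.scale(clamped)                     # 404 on update falls back to create_formation
  clamped
end
```
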
data/lib/solid_queue_autoscaler/advisory_lock.rb
CHANGED

@@ -1,12 +1,14 @@
 # frozen_string_literal: true
 
 require 'zlib'
+require 'socket'
 
 module SolidQueueAutoscaler
-  #
+  # Advisory lock wrapper for singleton enforcement.
+  # Supports both PostgreSQL (native advisory locks) and SQLite (table-based locks).
   #
-  # IMPORTANT: PgBouncer Compatibility Warning
-  #
+  # IMPORTANT: PgBouncer Compatibility Warning (PostgreSQL only)
+  # ============================================================
   # PostgreSQL advisory locks are connection-scoped (session-level locks).
   # If you're using PgBouncer in transaction pooling mode, advisory locks
   # will NOT work correctly because:

@@ -24,6 +26,10 @@ module SolidQueueAutoscaler
   # lock acquisition always failing, PgBouncer is likely the cause.
   #
   class AdvisoryLock
+    LOCKS_TABLE_NAME = 'solid_queue_autoscaler_locks'
+    # Stale lock timeout - locks older than this are considered abandoned (5 minutes)
+    STALE_LOCK_TIMEOUT_SECONDS = 300
+
     attr_reader :lock_key, :timeout
 
     def initialize(lock_key: nil, timeout: nil, config: nil)

@@ -31,6 +37,7 @@ module SolidQueueAutoscaler
      @lock_key = lock_key || @config.lock_key
      @timeout = timeout || @config.lock_timeout_seconds
      @lock_acquired = false
+     @strategy = nil
    end
 
    def with_lock

@@ -43,20 +50,14 @@ module SolidQueueAutoscaler
    def try_lock
      return false if @lock_acquired
 
-     result = connection.select_value(
-       "SELECT pg_try_advisory_lock(#{lock_id})"
-     )
-     @lock_acquired = [true, 't'].include?(result)
+     @lock_acquired = lock_strategy.try_lock
      @lock_acquired
    end
 
    def acquire!
      return true if @lock_acquired
 
-     result = connection.select_value(
-       "SELECT pg_try_advisory_lock(#{lock_id})"
-     )
-     @lock_acquired = [true, 't'].include?(result)
+     @lock_acquired = lock_strategy.try_lock
 
      raise LockError, "Could not acquire advisory lock '#{lock_key}' (id: #{lock_id})" unless @lock_acquired
 

@@ -66,7 +67,7 @@ module SolidQueueAutoscaler
    def release
      return false unless @lock_acquired
 
-     connection.execute("SELECT pg_advisory_unlock(#{lock_id})")
+     lock_strategy.release
      @lock_acquired = false
      true
    end

@@ -87,5 +88,160 @@ module SolidQueueAutoscaler
        hash & 0x7FFFFFFF
      end
    end
+
+    def lock_strategy
+      @strategy ||= create_lock_strategy
+    end
+
+    def create_lock_strategy
+      adapter_name = connection.adapter_name.downcase
+
+      case adapter_name
+      when /postgresql/, /postgis/
+        PostgreSQLLockStrategy.new(connection: connection, lock_id: lock_id, lock_key: lock_key)
+      when /sqlite/
+        SQLiteLockStrategy.new(connection: connection, lock_id: lock_id, lock_key: lock_key)
+      when /mysql/, /trilogy/
+        MySQLLockStrategy.new(connection: connection, lock_id: lock_id, lock_key: lock_key)
+      else
+        # Fall back to table-based locking for unknown adapters
+        TableBasedLockStrategy.new(connection: connection, lock_id: lock_id, lock_key: lock_key)
+      end
+    end
+
+    # Base class for lock strategies
+    class BaseLockStrategy
+      def initialize(connection:, lock_id:, lock_key:)
+        @connection = connection
+        @lock_id = lock_id
+        @lock_key = lock_key
+      end
+
+      def try_lock
+        raise NotImplementedError, "#{self.class} must implement #try_lock"
+      end
+
+      def release
+        raise NotImplementedError, "#{self.class} must implement #release"
+      end
+
+      protected
+
+      attr_reader :connection, :lock_id, :lock_key
+    end
+
+    # PostgreSQL native advisory locks
+    class PostgreSQLLockStrategy < BaseLockStrategy
+      def try_lock
+        result = connection.select_value(
+          "SELECT pg_try_advisory_lock(#{lock_id})"
+        )
+        [true, 't'].include?(result)
+      end
+
+      def release
+        connection.execute("SELECT pg_advisory_unlock(#{lock_id})")
+        true
+      end
+    end
+
+    # MySQL named locks (GET_LOCK/RELEASE_LOCK)
+    class MySQLLockStrategy < BaseLockStrategy
+      def try_lock
+        # MySQL GET_LOCK returns 1 on success, 0 if timeout, NULL on error
+        result = connection.select_value(
+          "SELECT GET_LOCK(#{connection.quote(lock_key)}, 0)"
+        )
+        result == 1
+      end
+
+      def release
+        connection.execute("SELECT RELEASE_LOCK(#{connection.quote(lock_key)})")
+        true
+      end
+    end
+
+    # Table-based locking for databases without native advisory lock support
+    # Uses a simple locks table with INSERT/DELETE for lock management
+    class TableBasedLockStrategy < BaseLockStrategy
+      def try_lock
+        ensure_locks_table_exists!
+        cleanup_stale_locks!
+
+        # Try to insert a lock record
+        begin
+          connection.execute(<<~SQL)
+            INSERT INTO #{quoted_table_name} (lock_key, lock_id, locked_at, locked_by)
+            VALUES (#{connection.quote(lock_key)}, #{lock_id}, #{connection.quote(Time.now.utc.iso8601)}, #{connection.quote(lock_owner)})
+          SQL
+          true
+        rescue ActiveRecord::RecordNotUnique, ActiveRecord::StatementInvalid => e
+          # Lock already held by another process
+          # StatementInvalid catches SQLite's UNIQUE constraint violation
+          return false if e.message.include?('UNIQUE') || e.message.include?('duplicate')
+
+          raise
+        end
+      end
+
+      def release
+        return true unless table_exists?
+
+        connection.execute(<<~SQL)
+          DELETE FROM #{quoted_table_name}
+          WHERE lock_key = #{connection.quote(lock_key)}
+            AND locked_by = #{connection.quote(lock_owner)}
+        SQL
+        true
+      end
+
+      private
+
+      def ensure_locks_table_exists!
+        return if table_exists?
+
+        create_locks_table!
+      end
+
+      def table_exists?
+        @table_exists ||= connection.table_exists?(LOCKS_TABLE_NAME)
+      end
+
+      def create_locks_table!
+        connection.execute(<<~SQL)
+          CREATE TABLE IF NOT EXISTS #{quoted_table_name} (
+            lock_key VARCHAR(255) NOT NULL PRIMARY KEY,
+            lock_id INTEGER NOT NULL,
+            locked_at DATETIME NOT NULL,
+            locked_by VARCHAR(255) NOT NULL
+          )
+        SQL
+        @table_exists = true
+      end
+
+      def cleanup_stale_locks!
+        # Remove locks older than STALE_LOCK_TIMEOUT_SECONDS
+        stale_threshold = (Time.now.utc - STALE_LOCK_TIMEOUT_SECONDS).iso8601
+        connection.execute(<<~SQL)
+          DELETE FROM #{quoted_table_name}
+          WHERE locked_at < #{connection.quote(stale_threshold)}
+        SQL
+      end
+
+      def quoted_table_name
+        connection.quote_table_name(LOCKS_TABLE_NAME)
+      end
+
+      def lock_owner
+        # Unique identifier for this process/thread
+        @lock_owner ||= "#{Socket.gethostname}:#{Process.pid}:#{Thread.current.object_id}"
+      end
+    end
+
+    # SQLite table-based locking (SQLite doesn't have advisory locks)
+    # Defined after TableBasedLockStrategy since it inherits from it
+    class SQLiteLockStrategy < TableBasedLockStrategy
+      # Inherits all behavior from TableBasedLockStrategy
+    end
  end
 end
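
A usage sketch based on the public surface visible in this diff (`AdvisoryLock.new` keyword arguments and `#with_lock`); it assumes the gem's default configuration is already loaded, and the lock key plus the work inside the block are illustrative:

```ruby
lock = SolidQueueAutoscaler::AdvisoryLock.new(lock_key: 'solid_queue_autoscaler:autoscale')

lock.with_lock do
  # Only one process across the fleet evaluates scaling at a time. Depending on the
  # adapter this maps to pg_try_advisory_lock (PostgreSQL), GET_LOCK (MySQL/Trilogy),
  # or the solid_queue_autoscaler_locks table (SQLite and unknown adapters).
  run_scaling_evaluation # hypothetical method
end
```
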
metadata
CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: solid_queue_autoscaler
 version: !ruby/object:Gem::Version
-  version: 1.0.13
+  version: 1.0.16
 platform: ruby
 authors:
 - reillyse
 autorequire:
 bindir: exe
 cert_chain: []
-date: 2026-01-
+date: 2026-01-31 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: activerecord

@@ -122,6 +122,20 @@ dependencies:
     - - "~>"
       - !ruby/object:Gem::Version
         version: '3.18'
+- !ruby/object:Gem::Dependency
+  name: sqlite3
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
 description: A control plane for Solid Queue on Heroku that automatically scales worker
   dynos based on queue depth, job latency, and throughput. Uses PostgreSQL advisory
   locks for singleton behavior and the Heroku Platform API for scaling.

@@ -143,6 +157,7 @@ files:
 - lib/generators/solid_queue_autoscaler/migration_generator.rb
 - lib/generators/solid_queue_autoscaler/templates/README
 - lib/generators/solid_queue_autoscaler/templates/create_solid_queue_autoscaler_events.rb.erb
+- lib/generators/solid_queue_autoscaler/templates/create_solid_queue_autoscaler_locks.rb.erb
 - lib/generators/solid_queue_autoscaler/templates/create_solid_queue_autoscaler_state.rb.erb
 - lib/generators/solid_queue_autoscaler/templates/initializer.rb
 - lib/solid_queue_autoscaler.rb