async-background 1.0.0 → 1.0.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
data/CHANGELOG.md CHANGED
@@ -1,358 +1,401 @@
1
1
  # Changelog
2
2
 
3
- ## Unreleased
3
+ ## 1.0.1
4
+
5
+ Dashboard security headers and a fiber-native rewrite of the SSE stream.
6
+
7
+ ### Security
8
+
9
+ - HTML shell now ships a strict CSP and `X-Frame-Options: DENY`. The CSP is
10
+ `default-src 'none'; script-src 'self'; style-src 'self'; img-src 'self'
11
+ data:; connect-src 'self'; frame-ancestors 'none'; base-uri 'none';
12
+ form-action 'none'`.
13
+ - Every response carries `X-Content-Type-Options: nosniff`,
14
+ `Referrer-Policy: no-referrer`, and `Cross-Origin-Resource-Policy:
15
+ same-origin`.
16
+ - `401` and `404` responses are now JSON like every other error — minor
17
+ breaking change for clients that parsed the previous `text/plain` body.
18
+ - `HEAD` is accepted on every route that accepts `GET` (RFC 9110 §9.3.2).
19
+ `HEAD /api/stream` returns the SSE headers without opening a stream.
20
+ - `mount_path` validation is stricter: must start with `/`, no trailing
21
+ slash, no control characters, no whitespace.
22
+ - New `Configuration#logger` (any object responding to `#warn` / `#error`).
23
+ When set, auth-callable exceptions, internal `rescue StandardError`, and
24
+ SSE stream errors are surfaced instead of being silently swallowed.
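The header and `HEAD` rules above can be pictured as a thin Rack-style wrapper. This is a minimal sketch, not the gem's code: `SecurityHeaders` is a hypothetical name, the inner app is assumed to return a plain `[status, headers, body]` triple with lowercase header names, and answering `HEAD` by dispatching as `GET` and discarding the body unread is an assumed strategy. Closing a lazy body without iterating it is how a `HEAD /api/stream` could return SSE headers without opening a stream.

```ruby
# Hypothetical Rack-style middleware; not part of async-background's API.
class SecurityHeaders
  CSP = [
    "default-src 'none'", "script-src 'self'", "style-src 'self'",
    "img-src 'self' data:", "connect-src 'self'", "frame-ancestors 'none'",
    "base-uri 'none'", "form-action 'none'"
  ].join('; ').freeze

  # Headers the changelog says every response carries.
  ALWAYS = {
    'x-content-type-options' => 'nosniff',
    'referrer-policy' => 'no-referrer',
    'cross-origin-resource-policy' => 'same-origin'
  }.freeze

  # Extra headers for the HTML shell only.
  HTML_ONLY = {
    'content-security-policy' => CSP,
    'x-frame-options' => 'DENY'
  }.freeze

  def initialize(app)
    @app = app
  end

  def call(env)
    head = env['REQUEST_METHOD'] == 'HEAD'
    # Route HEAD exactly like GET so both share one handler.
    env = env.merge('REQUEST_METHOD' => 'GET') if head
    status, headers, body = @app.call(env)

    headers = headers.merge(ALWAYS)
    headers = headers.merge(HTML_ONLY) if headers['content-type'].to_s.start_with?('text/html')

    if head
      # Keep the headers, but never iterate (or open) the body.
      body.close if body.respond_to?(:close)
      body = []
    end
    [status, headers, body]
  end
end
```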
25
+
26
+ ### Architecture: fiber-native SSE
27
+
28
+ - Removed the per-process monitor `Thread.new` and per-subscription
29
+ `Mutex` + `ConditionVariable`. The dashboard web subsystem spawns **zero**
30
+ native threads of its own.
31
+ - `EventHub` is now a tiny mutex-guarded frame cache keyed by `data_version`.
32
+ No subscriptions, no monitor.
33
+ - `Stream#each` runs the entire poll-and-yield loop inside the per-request
34
+ Falcon fiber: `sleep` (fiber-aware), read `PRAGMA data_version`, yield the
35
+ `overview` frame when the version moves, yield a heartbeat otherwise.
36
+ - Each tab now polls `data_version` independently. At the default 0.5 s
37
+ interval and realistic operator fan-out, that's well under 50 SQLite header
38
+ reads per second per process. JSON rendering still happens at most once per
39
+ version thanks to the hub cache.
40
+ - Shutdown is clean: `App#close` marks the hub closed, the next poll raises
41
+ `ClosedError`, the loop exits. No thread to join, no `Subscription` to
42
+ unsubscribe.
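The poll-and-yield loop described above can be sketched in plain Ruby. `FrameCache` and `PollingStream` are hypothetical simplifications of `EventHub` and `Stream`, not the gem's classes. `read_version` stands in for `PRAGMA data_version`, and `sleeper` stands in for `sleep`, which under Falcon's fiber scheduler yields the request fiber instead of blocking a thread. Emitting a heartbeat after `heartbeat_every` idle polls (50 × 0.5 s ≈ 25 s) is an assumption about how "yield a heartbeat otherwise" is paced.

```ruby
# Hypothetical stand-in for EventHub: a mutex-guarded cache that renders
# the overview JSON at most once per data_version.
class FrameCache
  ClosedError = Class.new(StandardError)

  attr_reader :renders

  def initialize(&render)
    @render = render
    @mutex = Mutex.new
    @version = nil
    @frame = nil
    @closed = false
    @renders = 0
  end

  def close
    @mutex.synchronize { @closed = true }
  end

  # Raises ClosedError once closed, so the next poll ends the stream.
  def frame_for(version)
    @mutex.synchronize do
      raise ClosedError if @closed
      unless version == @version
        @frame = @render.call(version)
        @version = version
        @renders += 1
      end
      @frame
    end
  end
end

# Hypothetical Rack streaming body: the whole loop runs in the caller's fiber.
class PollingStream
  def initialize(cache, read_version:, interval: 0.5, heartbeat_every: 50, sleeper: ->(s) { sleep(s) })
    @cache = cache
    @read_version = read_version
    @interval = interval
    @heartbeat_every = heartbeat_every
    @sleeper = sleeper
  end

  def each
    last = nil
    idle = 0
    loop do
      version = @read_version.call
      frame = @cache.frame_for(version) # cheap when the version is unchanged
      if version != last
        yield "event: overview\ndata: #{frame}\n\n"
        last = version
        idle = 0
      else
        idle += 1
        if idle >= @heartbeat_every
          yield ":keepalive\n\n"
          idle = 0
        end
      end
      @sleeper.call(@interval)
    end
  rescue FrameCache::ClosedError
    nil # hub closed (e.g. by App#close): finish the response cleanly
  end
end
```

No thread or subscription outlives the request here: when the cache is closed, the next `frame_for` raises and `each` simply returns.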
43
+
44
+
45
+
46
+  
4
47
 
5
- ### Dashboard SSE hardening
48
+ ## 1.0.0
6
49
 
7
- - Make SSE the default dashboard transport. `:polling` remains an explicit compatibility fallback.
8
- - Replace one `PRAGMA data_version` loop **per browser connection** with one shared `EventHub` watcher per Rack app process while at least one dashboard tab is connected. It fans complete overview snapshots to every SSE body through latest-value mailboxes, so slow tabs do not accumulate unbounded event queues.
9
- - Use complete snapshots on connect and after a change rather than a replay log: reconnecting EventSource clients cannot miss the current queue state.
10
- - Coalesce client-side list refreshes after a burst of changes, cancel stale list fetches on tab switches, and retain previous pages when `Load more` is used.
11
- - Add SSE retry control, 25-second heartbeat frames, `no-cache, no-transform`, and `X-Accel-Buffering: no`. Remove the hop-by-hop `Connection` response header.
12
- - Fingerprint asset URLs and cache immutable assets by digest, so a dashboard deploy cannot leave a browser on incompatible HTML/JS/CSS.
13
- - Add coverage for event fan-out, latest-value coalescing, clean stream shutdown, forced overview reads, and the SSE configuration constraints.
50
+ First stable release. The queue execution contract from 0.7.2 (claim-token
51
+ CAS, lifecycle columns, barrier-based shutdown drain, per-status partial
52
+ indexes, versioned migrations) is now considered the public API.
53
+
54
+ ### Dashboard
55
+
56
+ A read-only Rack-mountable UI under `require 'async/background/web'`:
57
+ vanilla HTML / CSS / JS, no framework, no npm.
58
+
59
+ - Endpoints: `GET /`, `GET /assets/{app.css,app.js}`,
60
+ `GET /api/{overview,executing,claimed,pending,done,failed,metrics,config,stream}`.
61
+ - The read path runs through `Async::Background::Web::Snapshot`, which
62
+ opens SQLite with `file:?mode=ro`, wraps a `Mutex` around a single shared
63
+ connection, and uses one read transaction per endpoint plus a TTL'd
64
+ overview cache (`counts_cache_ttl`, default 3 s).
65
+ - Distinguishes **executing** (`status='running' AND started_at IS NOT
66
+ NULL`) from **claimed** (`status='running' AND started_at IS NULL`).
67
+ - Cursor pagination for `done` / `failed` / `pending` using
68
+ `(finished_at, id)` / `(run_at, id)` tuples. Stable on ties.
69
+ - Args hidden by default (`expose_args: false`); when enabled, content
70
+ runs through `redact_args`. All user content rendered through
71
+ `textContent`, never `innerHTML`.
72
+ - `auth` is **mandatory**. `Configuration#validate!` rejects an
73
+ unconfigured `auth`. There is no permissive default — a falsey result
74
+ returns `401`.
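Cursor (keyset) pagination on `(finished_at, id)` stays stable on ties because the `id` breaks every tie. A minimal in-memory sketch follows; `page_after` is a hypothetical helper, newest-first ordering is an assumption for `done` / `failed`, and `pending` would use `(run_at, id)` the same way. The SQL in the comment uses SQLite row values (available since 3.15) and is illustrative, not the gem's query.

```ruby
# Rows are hashes with :id and :finished_at (epoch seconds).
# Illustrative SQL equivalent:
#   SELECT ... WHERE status = 'done' AND (finished_at, id) < (?, ?)
#   ORDER BY finished_at DESC, id DESC LIMIT ?
def page_after(rows, cursor:, limit:)
  key = ->(r) { [r[:finished_at], r[:id]] }
  sorted = rows.sort_by(&key).reverse # newest first, id breaks ties
  sorted = sorted.select { |r| (key.(r) <=> cursor).negative? } if cursor
  page = sorted.first(limit)
  # The cursor is the last tuple served; nil once a short page is returned.
  next_cursor = page.size == limit ? key.(page.last) : nil
  [page, next_cursor]
end
```

Three rows sharing `finished_at = 100` split across pages without being skipped or repeated, which an `OFFSET`-based scheme cannot guarantee while rows are being inserted.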
75
+
76
+ ### SSE transport
77
+
78
+ The dashboard uses a single long-lived `text/event-stream` connection per
79
+ browser tab instead of polling `/api/overview` every 2 seconds. One HTTP
80
+ connection per tab regardless of how long it stays open.
81
+
82
+ - `Configuration#transport` accepts `:sse` (default) or `:polling`. Anything
83
+ else raises `ConfigurationError`. The chosen transport is exposed at
84
+ `/api/config` so the client knows which path to take.
85
+ - Client opens `EventSource(mount_path + '/api/stream')` once; the server
86
+ pushes an `overview` event when `PRAGMA data_version` changes and a
87
+ `:keepalive` comment frame every 25 s.
88
+ - Server-supplied 5 s reconnect delay; each reconnect begins from a full
89
+ current snapshot (no event log).
90
+ - Asset URLs are fingerprinted and cached immutably by digest, so a
91
+ dashboard deploy can't leave a browser on incompatible HTML / JS / CSS.
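On the wire, the frames described above are plain text. A minimal sketch of the format: `sse_event` and `sse_comment` are hypothetical helpers, `retry: 5000` encodes the 5 s reconnect delay, and the header set comes from the earlier SSE hardening notes in this diff. Everything else is an assumption about layout, not the gem's code.

```ruby
require 'json'

# Response headers named in the SSE notes for /api/stream.
SSE_HEADERS = {
  'content-type' => 'text/event-stream',
  'cache-control' => 'no-cache, no-transform',
  'x-accel-buffering' => 'no' # stop nginx from buffering the stream
}.freeze

# One SSE event: optional retry hint, event name, one `data:` line per
# payload line, terminated by a blank line.
def sse_event(name, payload, retry_ms: nil)
  out = +''
  out << "retry: #{retry_ms}\n" if retry_ms
  out << "event: #{name}\n"
  JSON.generate(payload).each_line(chomp: true) { |line| out << "data: #{line}\n" }
  out << "\n"
end

# Lines starting with ':' are comments. EventSource ignores them, but they
# keep idle connections from being timed out by proxies.
def sse_comment(text)
  ":#{text}\n\n"
end
```

In the browser, `new EventSource(mount_path + '/api/stream')` with an `overview` listener receives only the parsed `data` payload of each event.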
92
+
93
+ ### Server compatibility for SSE
94
+
95
+ SSE holds the response open for the lifetime of the dashboard tab.
96
+
97
+ - **Falcon** — recommended. Handles long-lived connections via fibers.
98
+ - **Puma** — works. Each open tab holds one worker thread for its lifetime;
99
+ fine for a handful of operators, problematic if many concurrent operators
100
+ would starve the worker pool.
101
+ - **Unicorn** — doesn't work. Blocking worker model can't hold long-lived
102
+ connections without timeouts. Stay on `:polling`.
103
+
104
+ See the README diagram for what each server holds open while a tab is connected.
14
105
 
15
- ## 1.1.0
106
+ ### Configuration
16
107
 
17
- Server-Sent Events transport for the dashboard. Replaces HTTP polling as the recommended transport.
108
+ ```ruby
109
+ require 'async/background/web'
18
110
 
19
- ### Added
111
+ Async::Background::Queue::Store.prepare_dashboard!(path: '/var/lib/app/queue.db')
20
112
 
21
- - **SSE transport for the dashboard.** Set `c.transport = :sse` in `Async::Background::Web.configure` and the dashboard now uses a single long-lived `text/event-stream` connection per browser tab instead of polling `/api/overview` every 2 seconds. The browser opens `EventSource(mount_path + '/api/stream')` once; the server pushes an `overview` event whenever `PRAGMA data_version` changes, and a `:keepalive` comment frame every 30 seconds. Result: 1 HTTP connection per dashboard tab regardless of how long it stays open, instead of 30 req/min per tab.
113
+ Async::Background::Web.configure do |c|
114
+ c.queue_path = '/var/lib/app/queue.db'
115
+ c.auth = ->(env) { env['warden'].user&.admin? }
116
+ c.expose_args = false
117
+ c.metrics_path = '/run/app/async-background.shm'
118
+ c.total_workers = 4
119
+ c.counts_cache_ttl = 3.0
120
+ c.poll_interval_ms = 2000
121
+ c.list_limit = 50
122
+ c.mount_path = '/admin/background'
123
+ c.title = 'My App background jobs'
124
+ end
22
125
 
23
- - New module `Async::Background::Web::Stream` implements the event loop as a Rack streaming body (responds to `#each`, yields SSE frames). Holds no state across requests.
24
- - New route `GET /api/stream` returns `200 text/event-stream` when `transport == :sse`, `404` otherwise. Subject to the same auth gate as every other endpoint.
25
- - New `Response.sse(body)` helper sets the correct headers including `x-accel-buffering: no` (disables nginx buffering for the streaming response).
26
- - JS client (`assets.rb`) detects `state.config.transport === 'sse'` at boot and chooses `EventSource` over `setInterval(tick, ...)`. Both transports share the same `applyOverview()` and `refreshActiveList()` handlers, so the UI behaves identically.
126
+ run Async::Background::Web.app
127
+ ```
27
128
 
28
- - **`Configuration#transport`** with default `:polling` (backward compatible) and accepted values `:polling | :sse`. Validation rejects anything else with `ConfigurationError`. The chosen transport is exposed at `/api/config` so the client knows which path to take.
129
+ ### Dependencies
29
130
 
30
- ### Migration
131
+ `rack` is optional. Required only when `require 'async/background/web'` is
132
+ loaded. Core gem and worker processes don't require it.
31
133
 
32
- Existing deployments keep working unchanged. To opt into SSE:
134
+ ### Breaking changes from 0.7.x
33
135
 
34
- ```ruby
35
- Async::Background::Web.configure do |c|
36
- c.queue_path = ...
37
- c.auth = ->(env) { ... }
38
- c.transport = :sse
39
- end
40
- ```
136
+ None beyond what 0.7.2 already shipped. The 1.0 line locks the existing
137
+ contract:
41
138
 
42
- ### Server compatibility note
139
+ - `Queue::Store#fetch` returns `claim_token` in the result hash.
140
+ - All terminal `Queue::Store` methods (`complete`, `fail`, `retry_or_fail`)
141
+ require the `claim_token:` kwarg and return CAS success boolean /
142
+ `:retried` / `:failed` / `nil`.
143
+ - Schema is versioned via `PRAGMA user_version`. Use
144
+ `Queue::Store.migrate!(path:)` to upgrade. Use
145
+ `Queue::Store.prepare_dashboard!(path:)` from the dashboard process to
146
+ add dashboard-only indexes.
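The claim-token contract can be modelled in memory to show why a stale worker cannot clobber a job it no longer owns. `MemoryStore` is hypothetical and only mirrors the return values listed above (`fail` is omitted for brevity). The conditional `UPDATE` in the comment is an assumed SQL shape, not the gem's statement.

```ruby
require 'securerandom'

# Hypothetical in-memory model of the claim-token CAS. A SQL store could
# express the same check as, e.g.:
#   UPDATE jobs SET status = 'done' WHERE id = ? AND claim_token = ?
class MemoryStore
  def initialize
    @jobs = {}
    @next_id = 0
    @mutex = Mutex.new
  end

  def push(class_name, max_attempts: 3)
    @mutex.synchronize do
      id = (@next_id += 1)
      @jobs[id] = { id: id, class_name: class_name, status: 'pending',
                    attempts: 0, max_attempts: max_attempts, claim_token: nil }
      id
    end
  end

  # Claims the oldest pending job; the returned copy includes claim_token.
  def fetch
    @mutex.synchronize do
      job = @jobs.values.find { |j| j[:status] == 'pending' }
      return nil unless job
      job[:status] = 'running'
      job[:claim_token] = SecureRandom.hex(8)
      job[:attempts] += 1
      job.dup
    end
  end

  # true only if the caller still owns the claim.
  def complete(id, claim_token:)
    transition(id, claim_token) { |job| job[:status] = 'done' } ? true : false
  end

  # :retried / :failed on success, nil if the claim was lost.
  def retry_or_fail(id, claim_token:)
    transition(id, claim_token) do |job|
      if job[:attempts] < job[:max_attempts]
        job[:status] = 'pending'
        :retried
      else
        job[:status] = 'failed'
        :failed
      end
    end
  end

  def status(id)
    @mutex.synchronize { @jobs.fetch(id)[:status] }
  end

  private

  # Compare-and-set: apply the block only if the token still matches.
  def transition(id, token)
    @mutex.synchronize do
      job = @jobs[id]
      return nil unless job && job[:status] == 'running' && job[:claim_token] == token
      job[:claim_token] = nil
      yield job
    end
  end
end
```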
43
147
 
44
- SSE holds the request thread/fiber open for the lifetime of the dashboard tab. **Recommended for Falcon**, which handles long-lived connections natively via fibers. **Puma works** but each open dashboard tab holds one worker thread for its lifetime — fine for an admin dashboard with a handful of operators, problematic if many concurrent operators would starve the worker pool. **Unicorn does not work** for SSE since its blocking worker model can't hold long-lived connections without timeouts; stay on `:polling` there.
45
148
 
46
- ### Backend-side polling
47
149
 
48
- The server still polls `PRAGMA data_version` every 500ms inside the snapshot connection to detect changes. This is a connection-local PRAGMA call, microseconds, never hits a rate limiter. Client-facing transport is push.
150
+  
49
151
 
50
- ### Tests
152
+ ## 0.7.2
51
153
 
52
- - New `spec/async/background/web/stream_spec.rb` — covers overview event on data_version change, heartbeat after idle, graceful exit on `EPIPE`/`IOError`, error frame on `ClosedError`/`UnavailableError`.
53
- - Extended `spec/async/background/web/app_spec.rb` — `/api/stream` returns 404 on polling default, 200 text/event-stream on `:sse`, 401 without auth.
54
- - Extended `spec/async/background/web/configuration_spec.rb` — accepts `:sse`, rejects unknown transports.
154
+ Harden queue execution, retries, shutdown, and metrics. Adds schema v1,
155
+ optional dashboard indexes, and a faster enqueue path.
55
156
 
56
- ## 1.0.0
57
157
 
58
- First stable release. The queue execution contract from 0.7.2 (claim-token CAS, lifecycle columns, barrier-based shutdown drain, per-status partial indexes, versioned migrations) is now considered the public API.
59
158
 
60
- ### Features
159
+  
61
160
 
62
- - **Web dashboard.** Rack-mountable read-only UI under `require 'async/background/web'`. Vanilla HTML/CSS/JS, no JS framework, no npm.
63
- - Endpoints: `GET /`, `GET /assets/app.css`, `GET /assets/app.js`, `GET /api/overview`, `GET /api/executing`, `GET /api/claimed`, `GET /api/pending`, `GET /api/done`, `GET /api/failed`, `GET /api/metrics`, `GET /api/config`.
64
- - Default transport is JSON polling (`poll_interval_ms`, default 2000). SSE adapter for Falcon is intentionally deferred to a later release; the dashboard already coalesces work via a shared overview cache, so adding SSE later is a backward-compatible change.
65
- - Read path runs through `Async::Background::Web::Snapshot`, which opens SQLite with `file:?mode=ro`, wraps a `Mutex` around a single shared connection, and uses one read transaction per endpoint and caches each overview as one consistent snapshot.
66
- - Distinguishes `Executing` (`status='running' AND started_at IS NOT NULL`) from `Claimed` (`status='running' AND started_at IS NULL`).
67
- - Overview snapshot cache for `counts_cache_ttl` seconds (default 3.0) so a busy queue does not turn the dashboard into a hot reader.
68
- - Cursor pagination for `done`/`failed`/`pending` using `(finished_at, id)` / `(run_at, id)` tuples. Stable on ties.
69
- - Args hidden by default (`expose_args: false`); when enabled, content runs through `redact_args`. All user content rendered through `textContent`, never `innerHTML`.
70
- - Auth hook is **mandatory**. `Configuration#validate!` rejects an unconfigured `auth`. There is no permissive default.
161
+ ## 0.7.1
71
162
 
72
- - Add the optional Rack dashboard for the SQLite queue.
73
- - Make sqlite3 an explicit runtime dependency for queue/dashboard installs.
163
+ `Store` exposes three SQLite tuning knobs via `StoreOptions`, validated at
164
+ construction time so misconfigurations fail fast:
74
165
 
75
- ### Configuration
166
+ - `mmap` (`true` / `false`, default `true`) — memory-mapped I/O.
167
+ - `synchronous` (`:normal` / `:full` / `:extra`, default `:normal`) —
168
+ durability vs throughput.
169
+ - `wal_autocheckpoint` (`Integer` in `100..10_000`, default `1_000`) — WAL
170
+ checkpoint frequency in pages.
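Fail-fast validation of these three knobs fits naturally in a `Data.define` value object (Ruby 3.2+). The sketch reuses the name `StoreOptions` and the defaults, enums, and range listed above, but its body is hypothetical; the gem's class may validate differently.

```ruby
# Requires Ruby 3.2+ for Data.define. Allowed values from the changelog.
SYNCHRONOUS_MODES = %i[normal full extra].freeze
WAL_AUTOCHECKPOINT_RANGE = (100..10_000)

StoreOptions = Data.define(:mmap, :synchronous, :wal_autocheckpoint) do
  # Defaults make every key optional; unknown keys raise ArgumentError.
  def initialize(mmap: true, synchronous: :normal, wal_autocheckpoint: 1_000)
    unless [true, false].include?(mmap)
      raise ArgumentError, "mmap must be true or false, got #{mmap.inspect}"
    end
    unless SYNCHRONOUS_MODES.include?(synchronous)
      raise ArgumentError, "synchronous must be one of #{SYNCHRONOUS_MODES.inspect}"
    end
    unless wal_autocheckpoint.is_a?(Integer) && WAL_AUTOCHECKPOINT_RANGE.cover?(wal_autocheckpoint)
      raise ArgumentError, "wal_autocheckpoint must be an Integer in #{WAL_AUTOCHECKPOINT_RANGE}"
    end
    super
  end
end
```

A store receiving `options: { synchronous: :full }` could build `StoreOptions.new(**options)` at construction time, so a bad value such as `wal_autocheckpoint: 100_000` fails at boot rather than after the WAL has grown.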
76
171
 
77
- ```ruby
78
- require 'async/background/web'
172
+ **Breaking change.** `Store.new(path:, mmap:)` → `Store.new(path:, options:
173
+ { mmap: ... })`. The direct `mmap:` kwarg is removed in favor of the
174
+ unified `options:` hash. Update any call site that constructs `Store`
175
+ manually.
79
176
 
80
- Async::Background::Queue::Store.prepare_dashboard!(path: '/var/lib/app/queue.db')
177
+ See [Get Started → Store tuning](docs/GET_STARTED.md#appendix-store-tuning)
178
+ for trade-offs.
81
179
 
82
- Async::Background::Web.configure do |c|
83
- c.queue_path = '/var/lib/app/queue.db'
84
- c.auth = ->(env) { env['warden'].user&.admin? }
85
- c.expose_args = false
86
- c.metrics_path = '/run/app/async-background.shm'
87
- c.total_workers = 4
88
- c.counts_cache_ttl = 3.0
89
- c.poll_interval_ms = 2000
90
- c.list_limit = 50
91
- c.mount_path = '/admin/background'
92
- c.title = 'My App background jobs'
180
+
181
+
182
+  
183
+
184
+ ## 0.6.2
185
+
186
+ Queue jobs gain a **configurable timeout** at three levels — call-site
187
+ `options:`, class-level `.options`, default 120 s — merged at enqueue time
188
+ so the runner just reads the final value from the payload:
189
+
190
+ ```ruby
191
+ class HeavyImportJob
192
+ include Async::Background::Job
193
+ options timeout: 600
93
194
  end
94
195
 
95
- run Async::Background::Web.app
196
+ HeavyImportJob.perform_async(user_id, options: { timeout: 120 }) # wins
96
197
  ```
97
198
 
98
- ### Dependencies
99
-
100
- - `rack` is an optional dependency. Required only when `require 'async/background/web'` is loaded. Core gem and worker processes do not require it.
199
+ Side effects: an `options TEXT` column in SQLite (added idempotently via
200
+ `ALTER TABLE … rescue nil` on existing databases), an extensible `options:`
201
+ hash across the entire enqueue chain, a `Job::Options` schema via
202
+ `Data.define` (unknown keys raise `ArgumentError`), and queue-timeout
203
+ failure logs now include the actual value (`"timed out after 120s"`).
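The three-level precedence and the schema check can be shown in a few lines. `JobOptions` and `resolve_options` are hypothetical stand-ins for `Job::Options` and the enqueue-time merge; the 120 s default follows this entry, and raising `TypeError` for a non-numeric timeout follows the older notes further down. Requires Ruby 3.2+ for `Data.define`.

```ruby
# Hypothetical schema; the gem's Job::Options may differ.
JobOptions = Data.define(:timeout) do
  def initialize(timeout: 120)
    raise TypeError, "timeout must be Numeric, got #{timeout.class}" unless timeout.is_a?(Numeric)
    super
  end
end

# Call-site options win over class-level options, which win over the
# schema default. The merge happens once, at enqueue time, so the runner
# only reads the final hash from the payload.
def resolve_options(class_options, call_options)
  JobOptions.new(**class_options.merge(call_options)).to_h
end
```

The resulting hash, e.g. `{ timeout: 600 }`, is what would be serialized as JSON into the `options TEXT` column.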
101
204
 
102
- ### Breaking changes from 0.7.x
103
205
 
104
- None beyond what 0.7.2 already shipped. The 1.0 line locks the existing contract:
105
206
 
106
- - `Queue::Store#fetch` returns `claim_token` in the result hash.
107
- - All terminal `Queue::Store` methods (`complete`, `fail`, `retry_or_fail`) require the `claim_token:` kwarg and return CAS success boolean / `:retried` / `:failed` / `nil`.
108
- - Schema is versioned via `PRAGMA user_version`. Use `Queue::Store.migrate!(path:)` to upgrade. Use `Queue::Store.prepare_dashboard!(path:)` from the dashboard process to lazily create dashboard-only indexes (per-status partial indexes for `done` / `failed`, plus separate `executing` and `claimed` indexes).
207
+  
109
208
 
110
- ## 0.7.2
209
+ ## 0.6.1
111
210
 
112
- - Harden queue execution, retries, shutdown, and metrics.
113
- - Add schema v1, optional dashboard indexes, and a faster enqueue path.
211
+ Two scheduler fixes and one notification fast path:
114
212
 
115
- ## 0.7.1
213
+ - **Cron busy-loop on overlap skip.** When a scheduled run was skipped
214
+ because the previous one was still active, the entry was re-pushed to the
215
+ heap without `reschedule`. `next_run_at` never advanced, so the next
216
+ iteration picked it up immediately. Skip branch now calls
217
+ `entry.reschedule(monotonic_now)` like the normal path.
218
+ - **Prepared statement reset on fetch error.** `@fetch_stmt.reset!` ran
219
+ after `execute` returned, so an exception inside `execute` left the
220
+ statement dirty and the next `fetch` could fail. Wrapped in
221
+ `begin / ensure`.
222
+ - **SocketNotifier: 1 connect per enqueue.** `notify_all` no longer
223
+ connects to all N worker sockets on every enqueue. Wakes a single worker
224
+ chosen by random offset, falls back through the ring only if the chosen
225
+ worker is dead. Happy path: 1 connect; worst case (all workers down): N.
226
+ - Pending lookup now uses a partial index
227
+ `idx_jobs_pending(run_at, id) WHERE status = 'pending'`. Smaller on disk,
228
+ cheaper to update, and matches the only query that uses it.
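The single-wake strategy reduces to a ring walk from a random start. In this sketch `wake_one` is hypothetical and `connect` is injected so it runs without sockets; in practice it would open a `UNIXSocket` to the path and write one byte. The error list uses the `ENOENT` / `ECONNREFUSED` cases named in the 0.6.0 notes.

```ruby
# Errors meaning "this worker is gone, try the next one".
UNAVAILABLE = [Errno::ECONNREFUSED, Errno::ENOENT].freeze

# Wake exactly one worker: start at a random index and walk the ring,
# stopping at the first worker that accepts. Returns the index woken, or
# nil if every worker is down (the job is still in SQLite for the next poll).
def wake_one(paths, connect:, rng: Random.new)
  return nil if paths.empty?
  start = rng.rand(paths.size)
  paths.size.times do |step|
    index = (start + step) % paths.size
    begin
      connect.call(paths[index])
      return index
    rescue *UNAVAILABLE
      next
    end
  end
  nil
end
```

Because the queue itself lives in SQLite, any live worker can claim the job; the random start only spreads connect load so one socket does not absorb every wake-up.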
116
229
 
117
- ### Features
118
- - **Tunable `Store` options via `StoreOptions`** — three knobs exposed for SQLite tuning, validated at construction time so misconfigurations fail fast at boot:
119
- - `mmap` (`true`/`false`, default `true`) — toggle memory-mapped I/O
120
- - `synchronous` (`:normal`/`:full`/`:extra`, default `:normal`) — durability vs throughput
121
- - `wal_autocheckpoint` (`Integer` in `100..10_000`, default `1_000`) — WAL checkpoint frequency in pages
122
230
 
123
- Range and enum validation prevent foot-guns (e.g. `wal_autocheckpoint: 100_000` would bloat WAL beyond `journal_size_limit`). See [Get Started → Store tuning](docs/GET_STARTED.md) for trade-offs of each knob
124
231
 
125
- ### Breaking changes
126
- - `Store.new(path:, mmap:)` → `Store.new(path:, options: { mmap: ... })`. Direct `mmap:` keyword argument removed in favor of the unified `options:` hash. Users who construct `Store` manually (e.g. for web-worker enqueue) need to update the call site
232
+  
127
233
 
128
- ## 0.6.2
234
+ ## 0.6.0
129
235
 
130
- ### Features
131
- - **Configurable timeout for queue jobs** — queue jobs previously used a hardcoded 30-second timeout (`DEFAULT_TIMEOUT`). Now configurable via `options` hash at two levels:
132
- ```ruby
133
- # Class-level default
134
- class HeavyImportJob
135
- include Async::Background::Job
136
- options timeout: 600
137
-
138
- def perform(user_id) = # ...
139
- end
140
-
141
- # Call-site override (wins over class-level)
142
- HeavyImportJob.perform_async(user_id, options: { timeout: 120 })
143
- ```
144
- Priority: call-site `options:` → class-level `options` → `DEFAULT_TIMEOUT` (30s). Options are merged at enqueue time so the runner simply reads the final value from the payload
145
- - **`options:` hash across the entire enqueue chain** — single extensible contract from `perform_async` through `Client` down to `Store`. Currently supports `:timeout`, designed to accommodate future keys (e.g. `:retry`) without API changes
146
- - **`Job::Options` schema via `Data.define`** — declares known option keys with types and defaults. Unknown keys raise `ArgumentError`, invalid types raise `TypeError`. No manual validation code
147
- - **`options TEXT` column in SQLite** — stores the merged options hash as JSON. Extensible without schema changes when new options are added
148
-
149
- ### Improvements
150
- - **Queue timeout logged on failure** — `run_queue_job` error log now includes actual timeout value: `"timed out after 120s"` instead of generic `"timed out"`
151
- - **Idempotent schema migration** — existing databases get `ALTER TABLE jobs ADD COLUMN options TEXT` on first connection, wrapped in `rescue nil` for safe re-runs. New databases include the column in `CREATE TABLE`
236
+ **Queue notification system rewritten.** The pipe-based `Notifier` is
237
+ replaced with a Unix-domain-socket architecture: each worker listens on its
238
+ own socket (`<dir>/async_bg_worker_N.sock`), producers broadcast wake-ups
239
+ via `SocketNotifier`. Fork-safe by design (no shared FDs), resilient to
240
+ restarts (stale-socket cleanup), and sub-100 µs wake-up latency
241
+ (30–80 µs typical).
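The socket-per-worker idea needs only the standard library. This is a minimal sketch, not the gem's `SocketWaker` / `SocketNotifier`: `Waker` and `notify` are hypothetical, the `async_bg_worker_N.sock` naming comes from the entry above, and stale-socket detection by attempting a connection follows the older 0.6.0 notes. Producers connect by path, so nothing depends on file descriptors inherited across `fork`.

```ruby
require 'socket'
require 'tmpdir'

# Consumer side (hypothetical): one listening socket per worker.
class Waker
  attr_reader :path

  def initialize(dir, worker_index)
    @path = File.join(dir, "async_bg_worker_#{worker_index}.sock")
    cleanup_stale_socket
    @server = UNIXServer.new(@path)
  end

  # Blocks until a producer connects, then drains the wake-up byte.
  def wait
    client = @server.accept
    client.read(1)
  ensure
    client&.close
  end

  def close
    @server.close
    File.unlink(@path) if File.exist?(@path)
  end

  private

  # A leftover socket file is stale only if nobody is accepting on it.
  def cleanup_stale_socket
    return unless File.exist?(@path)
    UNIXSocket.new(@path).close
    raise "worker socket #{@path} is still in use"
  rescue Errno::ECONNREFUSED
    File.unlink(@path)
  end
end

# Producer side: connect by path, send one byte, hang up.
def notify(path)
  sock = UNIXSocket.new(path)
  sock.write('.')
  true
rescue Errno::ENOENT, Errno::ECONNREFUSED
  false # worker gone: the job stays in SQLite for the next poll
ensure
  sock&.close
end
```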
152
242
 
153
- ## 0.6.1
243
+ **Why.** The pipe-based notifier was fundamentally broken in the
244
+ recommended multi-fork setup: `for_consumer!` closed the writer end in each
245
+ child, making `Client#push → notify` fail silently with `IOError`. All
246
+ writes hit `WRITE_DROPPED`, so the queue silently degraded to 5-second
247
+ polling.
154
248
 
155
- ### Bug Fixes
156
- - **Runner: cron jobs busy-loop on overlap skip** — when a scheduled run was skipped because the previous one was still active, the entry was re-pushed to the heap without calling `reschedule`. For cron jobs (where `interval` is `nil`), this meant `next_run_at` was never advanced to the next cron tick, causing the entry to be picked up again immediately on the next loop iteration. Skip branch now calls `entry.reschedule(monotonic_now)` like the normal path
157
- - **Store: prepared statement not reset on fetch error** — `@fetch_stmt.reset!` was called after `execute` returned, so an exception inside `execute` left the statement in a dirty state and the next `fetch` could fail. Wrapped in `begin/ensure` to guarantee reset
249
+ **Breaking changes.** `Runner` now takes `queue_socket_dir:` instead of
250
+ `queue_notifier:`. `Notifier#for_producer!` / `Notifier#for_consumer!` are
251
+ removed. `Client#push` calls `notifier.notify_all`. Environment variable
252
+ `QUEUE_SOCKET_PATH` is replaced by `QUEUE_SOCKET_DIR` (a directory now).
158
253
 
159
- ### Improvements
160
- - **SocketNotifier: non-blocking enqueue with ring fallback** — `notify_all` no longer connects to all N worker sockets on every enqueue. `UNIXSocket.new` is a blocking, non-fiber-aware syscall, and notifying every worker blocked the Falcon reactor for N `connect()` calls on the hot HTTP enqueue path. Now wakes a single worker chosen by random offset, falling back through the ring only if the chosen worker is dead (`ECONNREFUSED` etc.). Happy path: 1 connect. Worst case (all workers down): N connects — same as before, but only when actually needed. Safe because the queue is shared in SQLite, not sharded per worker
161
- - **SocketNotifier: cleaned up `UNAVAILABLE` error list** — removed `IO::WaitWritable` and `Errno::EAGAIN`. They implied "socket buffer full", but `write_nonblock` of a single byte to a freshly-opened connection cannot fill the kernel buffer. Listing them only misled readers
162
- - **Store: partial index for pending lookup** — replaced `idx_jobs_status_run_at_id(status, run_at, id)` with partial index `idx_jobs_pending(run_at, id) WHERE status = 'pending'`. Smaller on disk, cheaper to update, and matches the only query that uses it (`fetch`). `done`/`failed`/`running` rows no longer occupy index pages
163
254
 
164
- ## 0.6.0
165
255
 
166
- ### Breaking Changes
167
- - **Queue notification system completely rewritten** — replaced pipe-based `Notifier` with Unix domain socket-based architecture
168
- - `Runner` now takes `queue_socket_dir:` parameter instead of `queue_notifier:`
169
- - Removed `Notifier#for_producer!` and `Notifier#for_consumer!` — no longer needed
170
- - `Client#push` now calls `notifier.notify_all` instead of `notifier.notify`
171
-
172
- ### Features
173
- - **Unix domain socket-based notifications** — solves all cross-process notification problems
174
- - New `SocketWaker` class (consumer-side) — each worker listens on its own Unix socket (`/tmp/queue/sockets/async_bg_worker_N.sock`)
175
- - New `SocketNotifier` class (producer-side) — connects to all worker sockets to broadcast wake-ups
176
- - **Cross-process wake-up now works correctly** — web workers → background workers, background workers → background workers
177
- - **Fork-safe by design** — no shared file descriptors, each process creates its own socket after fork
178
- - **Resilient to restarts** — stale socket cleanup on worker startup, graceful degradation if worker unavailable
179
- - **Sub-100µs latency** — typical wake-up time 30-80µs vs previous 5-second polling fallback
180
-
181
- ### Bug Fixes
182
- - **CRITICAL: Notifier bug in recommended setup** — the old pipe-based `Notifier` was fundamentally broken in multi-fork scenarios:
183
- - `for_consumer!` closed the writer end in each child process, making `Client#push → notify` fail silently with `IOError`
184
- - All writes were caught by `WRITE_DROPPED` rescue block, causing jobs to use 5-second polling instead of instant wake-up
185
- - Web workers had no way to notify background workers (no shared pipe after fork)
186
- - The bug was masked by `WRITE_DROPPED` silently catching `IOError` — appeared to work but degraded to polling
187
- - **Socket cleanup race conditions** — `SocketWaker#cleanup_stale_socket` now validates if socket is truly stale by attempting connection
188
-
189
- ### Improvements
190
- - Updated `docs/GET_STARTED.md` with new socket-based setup for Falcon
191
- - Added section on web worker → background worker job enqueuing with full example
192
- - Changed environment variable from `QUEUE_SOCKET_PATH` to `QUEUE_SOCKET_DIR` (directory instead of single socket path)
193
- - Better error handling in `SocketWaker` and `SocketNotifier` with comprehensive `UNAVAILABLE` error list
194
- - Integrated with `Async::Notification` for local wake-ups (shutdown signals)
195
-
196
- ### Technical Details
197
- - **Why sockets over pipes?** Pipes require shared FDs across fork boundaries. The recommended Falcon setup calls `for_consumer!` in each child, which closes the writer, breaking the notification chain. Sockets use filesystem paths — any process can connect without inherited FDs.
198
- - **Performance impact:** Adding ~80µs per enqueue for 8 workers (8 socket connections) vs ~100µs for SQLite transaction = negligible overhead
199
- - **Graceful degradation:** If worker socket unavailable (`ENOENT`, `ECONNREFUSED`), producer silently skips — job still in database, will be picked up on next poll (5s max delay)
256
+ &nbsp;
200
257
 
201
258
  ## 0.5.1
202
259
 
203
- ### Testing Infrastructure
204
- - **Comprehensive CI setup** — full Docker-based integration testing environment with `Dockerfile.ci`, `docker-compose.ci.yml`, and `Gemfile.ci`
205
- - **End-to-end scenario testing**: new `ci/scenario_test.rb` validates real-world scenarios with forked workers:
206
- - Normal execution of fast/slow/failing jobs across multiple workers
207
- - Crash recovery after SIGKILL with automatic job pickup by remaining workers
208
- - No duplicate execution guarantees under worker crashes
209
- - Proper job distribution validation across worker pool
210
- - **Test fixtures**: dedicated `ci/fixtures/jobs.rb` and `ci/fixtures/schedule.yml` for scenario testing
211
-
212
- ### Bug Fixes
213
- - **SQLite busy timeout** — added `PRAGMA busy_timeout = 5000` to `Queue::Store` to prevent `SQLITE_BUSY` errors under concurrent multi-process database access
214
- - **Enhanced Queue::Notifier error handling** — restructured IO error handling with clearer categorization:
215
- - `WRITE_DROPPED` for write failures (`IO::WaitWritable`, `Errno::EAGAIN`, `IOError`, `Errno::EPIPE`) — all non-fatal as job is already in store
216
- - `READ_EXHAUSTED` for read exhaustion (`IO::WaitReadable`, `EOFError`, `IOError`) — normal drain completion
217
- - Added explanatory comments for each error type and handling strategy
260
+ CI infrastructure: full Docker-based integration testing (`Dockerfile.ci`,
261
+ `docker-compose.ci.yml`, `Gemfile.ci`) plus an end-to-end scenario test that
262
+ validates forked-worker behavior: normal execution, crash recovery after
263
+ SIGKILL, no duplicate execution under crashes, proper distribution across
264
+ the pool.
265
+
266
+ Also: `PRAGMA busy_timeout = 5000` on `Queue::Store` to prevent
267
+ `SQLITE_BUSY` under concurrent multi-process access; cleaner IO error
268
+ categorization in `Queue::Notifier` (`WRITE_DROPPED` vs `READ_EXHAUSTED`)
269
+ with explanatory comments.
270
+
271
+
272
+
273
+ &nbsp;
218
274
 
219
275
  ## 0.5.0
220
276
 
221
- ### Features
222
- - **Delayed jobs** — full support for scheduling jobs in the future
223
- - `Queue::Client#push_in(delay, class_name, args)` — enqueue with delay in seconds
224
- - `Queue::Client#push_at(time, class_name, args)` — enqueue at a specific time
225
- - `Queue.enqueue_in(delay, job_class, *args)` — class-level delayed enqueue
226
- - `Queue.enqueue_at(time, job_class, *args)` — class-level scheduled enqueue
227
- - New `run_at` column in SQLite `jobs` table — jobs are only fetched when `run_at <= now`
228
- - **Job module**: Sidekiq-like `include Async::Background::Job` interface
229
- - `perform_async(*args)`: immediate queue execution
230
- - `perform_in(delay, *args)` — delayed execution after N seconds
231
- - `perform_at(time, *args)` — scheduled execution at a specific time
232
- - Instance-level `#perform` with class-level `perform_now` delegation
233
- - **Clock module** — shared `monotonic_now` / `realtime_now` helpers extracted into `Async::Background::Clock`, included by `Runner`, `Queue::Store`, and `Queue::Client`
234
-
235
- ### Bug Fixes
236
- - **Runner: incorrect task in `with_timeout`** — `semaphore.async { |job_task| ... }` now correctly receives the child task instead of capturing the parent `task` from the outer scope. Previously, `with_timeout` was applied to the parent task, which could cancel unrelated work
237
-
238
- ### Improvements
239
- - **Job API: `#perform` instead of `#perform_now`** — job classes now define `#perform` instance method. The class-level `perform_now` creates instance and calls `#perform`, aligning with ActiveJob / Sidekiq conventions
240
- - Updated error messages: validation failures now suggest `must include Async::Background::Job` instead of `must implement .perform_now`
241
- - `Queue::Client` — extracted private `ensure_configured!` and `resolve_class_name` methods for cleaner validation and class name resolution logic
242
- - `Queue::Notifier` — extracted `IO_ERRORS` constant (`IO::WaitReadable`, `EOFError`, `IOError`) for cleaner `rescue` in `drain`
243
- - `Queue::Store` — replaced index `idx_jobs_status_id(status, id)` with `idx_jobs_status_run_at_id(status, run_at, id)` for efficient delayed job lookups
244
- - `Queue::Store` — `fetch` SQL now uses `WHERE status = 'pending' AND run_at <= ?` with `ORDER BY run_at, id` to process jobs in scheduled order
245
- - Removed duplicated `monotonic_now` / `realtime_now` from `Runner` and `Store` — now provided by `Clock` module
246
- - Updated documentation: README (Job module examples, Queue architecture diagram, Clock section), GET_STARTED (delayed jobs guide, Job module usage, minimal queue-only example)
277
+ **Delayed jobs.** Full support for scheduling jobs in the future:
278
+
279
+ ```ruby
280
+ SomeJob.perform_in(60, *args)
281
+ SomeJob.perform_at(time, *args)
282
+ ```
283
+
284
+ Backed by a new `run_at` column in the SQLite `jobs` table — jobs are only
285
+ fetched when `run_at <= now`.
286
+
287
+ **Job module.** Sidekiq-like `include Async::Background::Job` adds
288
+ `perform_async`, `perform_in`, `perform_at`, instance-level `#perform`, and
289
+ class-level `perform_now` delegation.
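A minimal sketch of how a mixin like this can wire class-level enqueue helpers to an instance-level `#perform`. `MiniJob`, its in-memory `QUEUE`, and `drain_due` are hypothetical stand-ins invented for illustration; the real module persists jobs to SQLite and its internals may differ.

```ruby
# Hypothetical sketch: MiniJob, QUEUE and drain_due are illustrative
# stand-ins, not Async::Background::Job or its SQLite-backed queue.
module MiniJob
  # In-memory stand-in for the jobs table: [run_at, job_class, args].
  QUEUE = []

  # `include MiniJob` also adds the class-level helpers below.
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Run synchronously: build an instance and call #perform.
    def perform_now(*args)
      new.perform(*args)
    end

    # Enqueue for immediate execution.
    def perform_async(*args)
      perform_at(Time.now, *args)
    end

    # Enqueue to run `delay` seconds from now.
    def perform_in(delay, *args)
      perform_at(Time.now + delay, *args)
    end

    # Enqueue to run at a specific wall-clock time.
    def perform_at(time, *args)
      QUEUE << [time, self, args]
      nil
    end
  end
end

# Worker side: run every job with run_at <= now, earliest first,
# and keep the rest queued (the same `run_at <= now` rule as above).
def drain_due(now = Time.now)
  due, later = MiniJob::QUEUE.partition { |run_at, _klass, _args| run_at <= now }
  MiniJob::QUEUE.replace(later)
  due.sort_by(&:first).map { |_run_at, klass, args| klass.perform_now(*args) }
end

# Usage: a job class only defines the instance-level #perform.
class GreetJob
  include MiniJob

  def perform(name)
    "hello #{name}"
  end
end
```

The in-memory array only demonstrates the delegation and the due-time selection; persistence across processes and restarts is what the SQLite store adds.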
+
+ **Clock module.** Shared `monotonic_now` / `realtime_now` helpers extracted
+ to `Async::Background::Clock` and included by `Runner`, `Queue::Store`, and
+ `Queue::Client`.
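The split between the two clocks can be sketched with `Process.clock_gettime`. `ClockSketch` and `IntervalTimer` are hypothetical names used only for illustration, and the real `Async::Background::Clock` may be implemented differently; the 0.1.0 notes in this changelog do state the intent (monotonic clock for interval jobs, wall clock for cron jobs).

```ruby
# Hypothetical sketch; ClockSketch is not Async::Background::Clock itself.
module ClockSketch
  # Seconds from a monotonic clock: it does not jump when the system
  # wall clock is set, so it is safe for intervals and timeouts.
  def monotonic_now
    Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end

  # Seconds since the Unix epoch (wall clock): suitable for cron-style
  # schedules and persisted timestamps such as run_at.
  def realtime_now
    Process.clock_gettime(Process::CLOCK_REALTIME)
  end
end

# Example consumer: an interval timer that relies only on the monotonic clock.
class IntervalTimer
  include ClockSketch

  def initialize(interval)
    @interval = interval
    @next_at = monotonic_now + interval
  end

  def due?
    monotonic_now >= @next_at
  end
end
```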
+
+
+
+ &nbsp;
 
  ## 0.4.5
 
- ### Breaking Changes
- - `PRAGMAS` is now a frozen lambda `PRAGMAS.call(mmap_size)` instead of a static string if you referenced this constant directly, update your code
+ **Fetch race condition fixed.** Wrapped `UPDATE ... RETURNING` in
+ `BEGIN IMMEDIATE` to prevent two workers from picking up the same job
+ simultaneously.
+
+ **mmap on Docker overlay2.** `overlay2` does not guarantee `write()` /
+ `mmap()` coherence, which corrupts the WAL under concurrent multi-process
+ access. mmap is now configurable via `queue_mmap: false` instead of being
+ hardcoded. Proper Docker setup with named volumes is documented in
+ [Get Started → Docker](docs/GET_STARTED.md#step-3--docker-setup).
+
+ Also: `PRAGMA optimize` on shutdown is wrapped in `rescue nil`;
+ `PRAGMA incremental_vacuum` now actually works
+ (`PRAGMA auto_vacuum = INCREMENTAL` is added to the schema; it only takes
+ effect on new databases, so existing ones need a one-time `VACUUM`); and a
+ composite index `idx_jobs_status_id(status, id)` eliminates a sort in
+ `fetch`. This release also adds `queue_mmap:` / `mmap:` parameters and a
+ public `attr_reader :queue_store` on `Runner`.
+
+ **Breaking-ish.** `PRAGMAS` is now a frozen lambda `PRAGMAS.call(mmap_size)`
+ instead of a static string; update any direct reference.
 
- ### Features
- - New `queue_mmap:` parameter on `Runner` (default: `true`) — allows disabling SQLite mmap for environments where it's unsafe (Docker overlay2)
- - New `mmap:` parameter on `Queue::Store` (default: `true`) — controls `PRAGMA mmap_size` (256 MB when enabled, 0 when disabled)
- - Public `attr_reader :queue_store` on `Runner` — eliminates need for `instance_variable_get` when sharing Store with Client
 
- ### Bug Fixes
- - **CRITICAL: fetch race condition** — wrapped `UPDATE ... RETURNING` in `BEGIN IMMEDIATE` transaction to prevent two workers from picking up the same job simultaneously
- - **CRITICAL: mmap + Docker overlay2** — `overlay2` filesystem does not guarantee `write()`/`mmap()` coherence, causing SQLite WAL corruption under concurrent multi-process access. mmap is now configurable via `queue_mmap: false` instead of being hardcoded. Documented proper Docker setup with named volumes in `docs/GET_STARTED.md`
- - **`PRAGMA optimize` on shutdown** — wrapped in `rescue nil` to prevent `SQLite3::BusyException` when another process holds the write lock during graceful shutdown
- - **`PRAGMA incremental_vacuum` was a no-op** — added `PRAGMA auto_vacuum = INCREMENTAL` to schema. Without it, `incremental_vacuum` does nothing. Note: only takes effect on newly created databases; existing databases require a one-time `VACUUM`
 
- ### Improvements
- - Replaced index `idx_jobs_status(status)` with composite `idx_jobs_status_id(status, id)` — eliminates sort step in `fetch` query (`ORDER BY id LIMIT 1` is now a direct B-tree lookup)
- - Fixed `finalize_statements` — changed `%i[@enqueue_stmt ...]` to `%i[enqueue_stmt ...]` with `:"@#{name}"` interpolation for idiomatic `instance_variable_get`/`set` usage
- - Added documentation: `README.md` (concise, with warning markers) and `docs/GET_STARTED.md` (step-by-step guide covering schedule config, Falcon integration, Docker setup, dynamic queue)
+ &nbsp;
 
  ## 0.4.0
 
- ### Features
- - **Dynamic job queue** — enqueue jobs at runtime from any process (web, console, rake) with automatic execution by background workers
- - `Queue::Store` — SQLite-backed persistent storage with WAL mode, prepared statements, and optimized pragmas
- - `Queue::Notifier` — `IO.pipe`-based zero-cost wakeup between producer and consumer processes (no polling)
- - `Queue::Client` — public API: `Async::Background::Queue.enqueue(JobClass, *args)`
- - Automatic recovery of stale `running` jobs on worker restart
- - Periodic cleanup of completed jobs (piggyback on fetch, every 5 minutes)
- - `PRAGMA incremental_vacuum` when cleanup removes 100+ rows
- - Worker isolation via `ISOLATION_FORKS` env variable — exclude specific workers from queue processing
- - Custom database path via `queue_db_path` parameter
- - Requires optional `sqlite3` gem (`~> 2.0`) — not included by default, must be added to Gemfile explicitly
- - New Runner parameters: `queue_notifier:` and `queue_db_path:`
-
- ### Improvements
- - Unified `monotonic_now` usage across `run_job` and `run_queue_job` (was using direct `Process.clock_gettime` call in `run_job`)
- - `Queue::Notifier#drain` — moved `rescue` inside the loop to avoid stack unwinding on each drain cycle
+ **Dynamic job queue.** Enqueue jobs at runtime from any process (web,
+ console, rake) with automatic execution by background workers.
 
- ## 0.3.0
+ - `Queue::Store` — SQLite-backed persistent storage with WAL mode,
+ prepared statements, and optimized pragmas.
+ - `Queue::Notifier` — `IO.pipe`-based zero-cost wake-up between producer
+ and consumer processes.
+ - `Queue::Client` — public API:
+ `Async::Background::Queue.enqueue(JobClass, *args)`.
+ - Automatic recovery of stale `running` jobs on worker restart.
+ - Periodic cleanup of completed jobs (piggybacked on fetch, every 5 min);
+ `PRAGMA incremental_vacuum` when cleanup removes 100+ rows.
+ - `ISOLATION_FORKS` env var excludes specific workers from queue processing.
+ - Custom database path via `queue_db_path:` on `Runner`.
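The pipe-based wake-up idea from the `Queue::Notifier` entry can be sketched in a few lines: the consumer blocks in `IO.select` on the read end, and a producer writes one byte to wake it. `PipeNotifier` is a hypothetical name, not the gem's `Queue::Notifier`, and this in-process version only shows the mechanism; for cross-process use, the pipe must be created before `fork` so both sides inherit it. (The changelog notes that this pipe design was later superseded by a socket-based one.)

```ruby
# Hypothetical sketch of a pipe-based wake-up; PipeNotifier is not the
# gem's Queue::Notifier. Create it before fork so both sides share the pipe.
class PipeNotifier
  def initialize
    @reader, @writer = IO.pipe
  end

  # Producer: write one byte. If the pipe buffer is full, a wake-up is
  # already pending, so a dropped write (:wait_writable) is harmless.
  def notify
    @writer.write_nonblock(".", exception: false)
  end

  # Consumer: block until notified or `timeout` seconds pass (nil = forever).
  # Returns true when woken by a notification, false on timeout.
  def wait(timeout = nil)
    ready = IO.select([@reader], nil, nil, timeout)
    drain if ready
    !ready.nil?
  end

  # Discard every pending byte so the next #wait blocks again.
  def drain
    loop do
      chunk = @reader.read_nonblock(4096, exception: false)
      # :wait_readable means the pipe is empty; nil means EOF.
      break unless chunk.is_a?(String)
    end
  end

  def close
    @writer.close
    @reader.close
  end
end
```

Several notifications collapse into a single wake-up, so the consumer never counts bytes; it would typically just query the store for all due jobs after waking. This sketch uses `exception: false` instead of rescuing `IO::WaitReadable` / `EOFError`, which avoids raising on every drain.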
 
- ### Features
- - Added optional metrics collection system using shared memory
- - New `Metrics` class with worker-specific performance tracking
- - Public API: `runner.metrics.enabled?`, `runner.metrics.values`, `Metrics.read_all()`
- - Tracks total runs, successes, failures, timeouts, skips, active jobs, and execution times
- - Requires optional `async-utilization` gem dependency
- - Metrics stored in `/tmp/async-background.shm` with lock-free updates per worker
+ Requires the optional `sqlite3` gem (`~> 2.0`).
 
- ## 0.2.6
+ (The 0.6.0 socket-based architecture supersedes the pipe-based notifier
+ introduced here.)
 
- ### Improvements
- - Micro-optimization in `wait_with_shutdown` method: use passed `task` parameter instead of `Async::Task.current` for better consistency and slight performance improvement
 
- ## 0.2.5
 
- ### Features
- - Added graceful shutdown via signal handlers for SIGINT and SIGTERM
- - Enhanced process lifecycle management with proper signal handling using `Signal.trap` and IO.pipe for async communication
- - Improved robustness for production deployments and container orchestration
- - Updated dependencies to work with latest Async 2.x API (removed deprecated `:parent` parameter usage)
+ &nbsp;
 
- ## 0.2.4
+ ## 0.3.0
 
- ### Improvements
- - Removed hardcoded version warning from main module (was checking against fixed list: 0.1.0, 0.2.2, 0.2.3). Use semantic versioning with pre-release suffixes for unstable versions (e.g., 0.3.0.alpha1) instead
- - Removed hardcoded stable versions list from gemspec description — metadata should describe functionality, not versioning
- - Changed `while true` to idiomatic `loop do` in run method
- - Added `Gemfile.lock` to .gitignore (gems should not commit lockfile)
- - Updated README: clarified that job class must respond to `.perform_now` class method (removed confusing mention of instance `#perform`)
+ Optional metrics collection via shared memory. `Metrics` tracks per-worker
+ counters: `total_runs`, `total_successes`, `total_failures`,
+ `total_timeouts`, `total_skips`, `active_jobs`, plus last-run timestamp and
+ duration. Public API: `runner.metrics.enabled?`, `runner.metrics.values`,
+ `Metrics.read_all(total_workers:)`. Requires the optional
+ `async-utilization` gem; absent that, `enabled?` is `false` and `read_all`
+ returns `[]`. Default file: `/tmp/async-background.shm`.
 
- ## 0.2.2
 
- ### Bug Fixes
- - **CRITICAL**: Removed logger parameter from Runner initialize (was unused). Fixed initialization to use Console.logger directly which now properly initializes in forked processes with correct context
 
- ## 0.2.1
+ &nbsp;
 
- ### Bug Fixes
- - **CRITICAL**: Added missing `require 'console'` in main module. Logger was nil because Console gem was not imported, causing `undefined method 'info' for nil` errors on worker initialization
+ ## 0.2.x
 
- ## 0.2.0
+ - **0.2.6** — `wait_with_shutdown` uses the passed `task` parameter
+ instead of `Async::Task.current`.
+ - **0.2.5** — Graceful shutdown via `SIGINT` / `SIGTERM` signal handlers
+ using `Signal.trap` and `IO.pipe`. Compatible with Async 2.x API
+ (removed deprecated `:parent`).
+ - **0.2.4** — Removed hardcoded version warning. Use semver pre-release
+ suffixes for unstable versions (e.g. `0.3.0.alpha1`).
+ - **0.2.2** — Removed unused `logger` parameter from `Runner#initialize`;
+ use `Console.logger` directly, which now initializes correctly in
+ forked processes.
+ - **0.2.1** — Added missing `require 'console'` in main module. Logger
+ was `nil`, causing `undefined method 'info' for nil` on worker
+ initialization.
+ - **0.2.0** — Removed hidden ActiveSupport dependency
+ (`safe_constantize` → `Object.const_get` + `NameError`). Job validation
+ now checks for `.perform_now` (class method) instead of `.perform`
+ (instance method). Fixed a race where an entry could disappear from the
+ heap during execution. Added `stop()` and `running?()` to `Runner`.
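The 0.2.5 entry pairs `Signal.trap` with `IO.pipe`, which is the classic self-pipe trick. A minimal sketch, assuming a hypothetical `ShutdownSignal` class (not the gem's `Runner` code): the trap handler only writes a byte, and the main loop notices the shutdown request by selecting on the read end.

```ruby
# Hypothetical self-pipe sketch; ShutdownSignal is not the gem's Runner.
class ShutdownSignal
  def initialize
    @reader, @writer = IO.pipe
  end

  # Install handlers. Ruby raises ThreadError if a trap handler tries to
  # lock a Mutex, so the handler does nothing but write a byte.
  def install(signals = %w[INT TERM])
    signals.each { |sig| Signal.trap(sig) { trigger } }
    self
  end

  def trigger
    @writer.write_nonblock(".", exception: false)
  end

  # True once a signal arrived; waits up to `timeout` seconds (0 = poll).
  def requested?(timeout = 0)
    !IO.select([@reader], nil, nil, timeout).nil?
  end
end

# A worker loop might then look like:
#   shutdown = ShutdownSignal.new.install
#   until shutdown.requested?(1.0)
#     # ... run due jobs ...
#   end
#   # ... let in-flight jobs finish, then exit ...
```

The byte is never consumed, so `requested?` stays true once triggered, which suits a one-way shutdown flag.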
 
- ### Bug Fixes
- - **CRITICAL**: Removed hidden ActiveSupport dependency. Replaced `safe_constantize` with `Object.const_get` + `NameError` handling
- - **CRITICAL**: Fixed validator mismatch: now validates `.perform_now` (class method) instead of `.perform` (instance method)
- - **CRITICAL**: Fixed race condition where entry could disappear from heap during execution. `reschedule` and `heap.push` now always execute after job processing
- - Added full exception backtrace to error logs for production debugging
- - Improved YAML security by removing `Symbol` from `permitted_classes`
- - Removed Mutex from graceful shutdown (anti-pattern in Async). Boolean assignment is atomic in MRI
 
- ### Features
- - Added optional `logger` parameter to Runner constructor for custom loggers (Rails.logger, etc.)
- - Added `stop()` method for graceful shutdown
- - Added `running?()` method to check scheduler status
 
- ### Breaking Changes
- - Job class validation now checks for `.perform_now` class method (was checking for `.perform` instance method)
+ &nbsp;
 
  ## 0.1.0
 
- - Initial release
- - Single event loop with min-heap timer (O(log N) scheduling)
- - Skip overlapping execution
- - Startup jitter to prevent thundering herd
- - Monotonic clock for interval jobs, wall clock for cron jobs
- - Deterministic worker sharding via Zlib.crc32
- - Semaphore-based concurrency control
- - Per-job timeout protection
- - Structured logging via Console
+ Initial release.
+
+ - Single event loop with min-heap timer (`O(log N)` scheduling).
+ - Skip overlapping execution.
+ - Startup jitter to prevent thundering herd.
+ - Monotonic clock for interval jobs, wall clock for cron jobs.
+ - Deterministic worker sharding via `Zlib.crc32`.
+ - Semaphore-based concurrency control.
+ - Per-job timeout protection.
+ - Structured logging via Console.
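The "single event loop with min-heap timer" bullet can be illustrated with a binary min-heap keyed by each job's next run time: `push` and `pop` are `O(log N)`, and `peek` is `O(1)`. `TimerHeap` is a hypothetical class written for this sketch, not the gem's internal structure.

```ruby
# Hypothetical sketch of a min-heap keyed by next run time; TimerHeap is
# an illustrative name, not the gem's internal heap.
class TimerHeap
  Entry = Struct.new(:at, :name)

  def initialize
    @items = []
  end

  def empty?
    @items.empty?
  end

  # Earliest entry without removing it: O(1).
  def peek
    @items.first
  end

  # Append, then sift up while the parent runs later: O(log N).
  def push(at, name)
    @items << Entry.new(at, name)
    i = @items.size - 1
    while i > 0
      parent = (i - 1) / 2
      break if @items[parent].at <= @items[i].at
      @items[parent], @items[i] = @items[i], @items[parent]
      i = parent
    end
    self
  end

  # Remove the earliest entry, move the last one to the root, sift down: O(log N).
  def pop
    return nil if @items.empty?
    top = @items.first
    last = @items.pop
    unless @items.empty?
      @items[0] = last
      i = 0
      loop do
        left = 2 * i + 1
        right = left + 1
        smallest = i
        smallest = left if left < @items.size && @items[left].at < @items[smallest].at
        smallest = right if right < @items.size && @items[right].at < @items[smallest].at
        break if smallest == i
        @items[i], @items[smallest] = @items[smallest], @items[i]
        i = smallest
      end
    end
    top
  end
end
```

A scheduler loop built on this would sleep until `peek.at`, `pop` the due entry, run it, and only then `push` it back with its next run time; that ordering is what the 0.2.0 heap race fix is about.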