async-background 1.0.0 → 1.0.1
- checksums.yaml +4 -4
- data/CHANGELOG.md +319 -276
- data/README.md +91 -109
- data/lib/async/background/version.rb +1 -1
- data/lib/async/background/web/app.rb +39 -18
- data/lib/async/background/web/auth.rb +7 -2
- data/lib/async/background/web/configuration.rb +17 -3
- data/lib/async/background/web/event_hub.rb +25 -148
- data/lib/async/background/web/response.rb +31 -9
- data/lib/async/background/web/router.rb +3 -1
- data/lib/async/background/web/stream.rb +54 -15
- metadata +2 -2
data/CHANGELOG.md
CHANGED
|
@@ -1,358 +1,401 @@
|
|
|
1
1
|
# Changelog
|
|
2
2
|
|
|
3
|
-
##
|
|
3
|
+
## 1.0.1
|
|
4
|
+
|
|
5
|
+
Dashboard security headers and a fiber-native rewrite of the SSE stream.
|
|
6
|
+
|
|
7
|
+
### Security
|
|
8
|
+
|
|
9
|
+
- HTML shell now ships a strict CSP and `X-Frame-Options: DENY`. The CSP is
|
|
10
|
+
`default-src 'none'; script-src 'self'; style-src 'self'; img-src 'self'
|
|
11
|
+
data:; connect-src 'self'; frame-ancestors 'none'; base-uri 'none';
|
|
12
|
+
form-action 'none'`.
|
|
13
|
+
- Every response carries `X-Content-Type-Options: nosniff`,
|
|
14
|
+
`Referrer-Policy: no-referrer`, and `Cross-Origin-Resource-Policy:
|
|
15
|
+
same-origin`.
|
|
16
|
+
- `401` and `404` responses are now JSON like every other error — minor
|
|
17
|
+
breaking change for clients that parsed the previous `text/plain` body.
|
|
18
|
+
- `HEAD` is accepted on every route that accepts `GET` (RFC 9110 §9.3.2).
|
|
19
|
+
`HEAD /api/stream` returns the SSE headers without opening a stream.
|
|
20
|
+
- `mount_path` validation is stricter: must start with `/`, no trailing
|
|
21
|
+
slash, no control characters, no whitespace.
|
|
22
|
+
- New `Configuration#logger` (any object responding to `#warn` / `#error`).
|
|
23
|
+
When set, auth-callable exceptions, internal `rescue StandardError`, and
|
|
24
|
+
SSE stream errors are surfaced instead of being silently swallowed.
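
As a rough illustration of the header policy listed above, the following sketch merges the headers into a Rack-style `[status, headers, body]` triple. The helper name `with_security_headers` and the constant names are hypothetical and are not taken from the gem; only the header values come from this entry. Lowercase header keys are an assumption (Rack 3 style).

```ruby
# Hypothetical helper, not the gem's code. Header values are copied from
# the 1.0.1 changelog entry; key casing (lowercase) is an assumption.
CSP = [
  "default-src 'none'", "script-src 'self'", "style-src 'self'",
  "img-src 'self' data:", "connect-src 'self'", "frame-ancestors 'none'",
  "base-uri 'none'", "form-action 'none'"
].join('; ').freeze

# Sent on every response.
COMMON_HEADERS = {
  'x-content-type-options' => 'nosniff',
  'referrer-policy' => 'no-referrer',
  'cross-origin-resource-policy' => 'same-origin'
}.freeze

# Sent only with the HTML shell.
HTML_HEADERS = {
  'content-security-policy' => CSP,
  'x-frame-options' => 'DENY'
}.freeze

# Returns a new Rack-style triple with the security headers merged in.
def with_security_headers(response, html: false)
  status, headers, body = response
  extra = html ? COMMON_HEADERS.merge(HTML_HEADERS) : COMMON_HEADERS
  [status, headers.merge(extra), body]
end
```

Calling `with_security_headers([200, { 'content-type' => 'text/html' }, ['...']], html: true)` would add all five headers, while a JSON response passed without `html: true` only gets the three common ones.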
|
|
25
|
+
|
|
26
|
+
### Architecture: fiber-native SSE
|
|
27
|
+
|
|
28
|
+
- Removed the per-process monitor `Thread.new` and per-subscription
|
|
29
|
+
`Mutex` + `ConditionVariable`. The dashboard web subsystem spawns **zero**
|
|
30
|
+
native threads of its own.
|
|
31
|
+
- `EventHub` is now a tiny mutex-guarded frame cache keyed by `data_version`.
|
|
32
|
+
No subscriptions, no monitor.
|
|
33
|
+
- `Stream#each` runs the entire poll-and-yield loop inside the per-request
|
|
34
|
+
Falcon fiber: `sleep` (fiber-aware), read `PRAGMA data_version`, yield the
|
|
35
|
+
`overview` frame when the version moves, yield a heartbeat otherwise.
|
|
36
|
+
- Each tab now polls `data_version` independently. At default 0.5 s and
|
|
37
|
+
realistic operator fan-out that's well under 50 SQLite header reads /
|
|
38
|
+
second per process. JSON rendering still happens at most once per version
|
|
39
|
+
thanks to the hub cache.
|
|
40
|
+
- Shutdown is clean: `App#close` marks the hub closed, the next poll raises
|
|
41
|
+
`ClosedError`, the loop exits. No thread to join, no `Subscription` to
|
|
42
|
+
unsubscribe.
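
The poll-and-yield loop and the version-keyed frame cache described above can be sketched roughly as follows. This is not the gem's implementation: `FrameCache`, `PollingStream`, the injected `read_version` and `render` callables, and the `max_polls` test hook are all hypothetical names, and the version read is stubbed instead of issuing `PRAGMA data_version`. The plain `sleep` is only fiber-aware when a fiber scheduler (as under Falcon) is active.

```ruby
# Hypothetical sketch of the 1.0.1 design, not the gem's code.
class ClosedError < StandardError; end

# Mutex-guarded cache that holds the rendered frame for one data_version.
class FrameCache
  def initialize
    @mutex = Mutex.new
    @version = nil
    @frame = nil
    @closed = false
  end

  def close
    @mutex.synchronize { @closed = true }
  end

  # Returns the frame for `version`, running the block at most once per version.
  def fetch(version)
    @mutex.synchronize do
      raise ClosedError, 'hub closed' if @closed

      unless @version == version
        @frame = yield
        @version = version
      end
      @frame
    end
  end
end

# One instance per request; the loop runs entirely in the caller's fiber.
class PollingStream
  def initialize(cache:, read_version:, render:, interval: 0.5, max_polls: nil)
    @cache = cache
    @read_version = read_version # stands in for reading PRAGMA data_version
    @render = render             # stands in for rendering the overview JSON
    @interval = interval
    @max_polls = max_polls       # test hook only; a real stream runs until closed
  end

  def each
    last_seen = nil
    polls = 0
    until @max_polls && polls >= @max_polls
      polls += 1
      sleep @interval
      version = @read_version.call
      frame = @cache.fetch(version) { @render.call } # raises once the cache is closed
      if version == last_seen
        yield ": keepalive\n\n"
      else
        last_seen = version
        yield "event: overview\ndata: #{frame}\n\n"
      end
    end
  rescue ClosedError
    nil # clean shutdown: nothing to join or unsubscribe
  end
end
```

Because every stream shares one cache, a second tab that observes the same version reuses the already rendered frame, which is the "at most once per version" property the entry describes.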
|
|
43
|
+
|
|
44
|
+
|
|
45
|
+
|
|
46
|
+
|
|
4
47
|
|
|
5
|
-
|
|
48
|
+
## 1.0.0
|
|
6
49
|
|
|
7
|
-
|
|
8
|
-
|
|
9
|
-
|
|
10
|
-
|
|
11
|
-
|
|
12
|
-
|
|
13
|
-
|
|
50
|
+
First stable release. The queue execution contract from 0.7.2 (claim-token
|
|
51
|
+
CAS, lifecycle columns, barrier-based shutdown drain, per-status partial
|
|
52
|
+
indexes, versioned migrations) is now considered the public API.
|
|
53
|
+
|
|
54
|
+
### Dashboard
|
|
55
|
+
|
|
56
|
+
A read-only Rack-mountable UI under `require 'async/background/web'`:
|
|
57
|
+
vanilla HTML / CSS / JS, no framework, no npm.
|
|
58
|
+
|
|
59
|
+
- Endpoints: `GET /`, `GET /assets/{app.css,app.js}`,
|
|
60
|
+
`GET /api/{overview,executing,claimed,pending,done,failed,metrics,config,stream}`.
|
|
61
|
+
- The read path runs through `Async::Background::Web::Snapshot`, which
|
|
62
|
+
opens SQLite with `file:?mode=ro`, wraps a `Mutex` around a single shared
|
|
63
|
+
connection, and uses one read transaction per endpoint plus a TTL'd
|
|
64
|
+
overview cache (`counts_cache_ttl`, default 3 s).
|
|
65
|
+
- Distinguishes **executing** (`status='running' AND started_at IS NOT
|
|
66
|
+
NULL`) from **claimed** (`status='running' AND started_at IS NULL`).
|
|
67
|
+
- Cursor pagination for `done` / `failed` / `pending` using
|
|
68
|
+
`(finished_at, id)` / `(run_at, id)` tuples. Stable on ties.
|
|
69
|
+
- Args hidden by default (`expose_args: false`); when enabled, content
|
|
70
|
+
runs through `redact_args`. All user content rendered through
|
|
71
|
+
`textContent`, never `innerHTML`.
|
|
72
|
+
- `auth` is **mandatory**. `Configuration#validate!` rejects an
|
|
73
|
+
unconfigured `auth`. There is no permissive default — a falsey result
|
|
74
|
+
returns `401`.
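
To illustrate why a `(finished_at, id)` tuple cursor stays stable on ties, here is a minimal in-memory sketch. It is an assumption-laden illustration, not the dashboard's query: `Row`, `page_after`, and the newest-first ordering are hypothetical, and the real read path runs against SQLite instead of an array.

```ruby
# Hypothetical in-memory illustration of tuple (keyset) pagination.
Row = Struct.new(:id, :finished_at)

# Returns [page, next_cursor]. Rows are ordered newest first by
# (finished_at, id); the cursor is the tuple of the last row already seen.
def page_after(rows, cursor:, limit:)
  ordered = rows.sort_by { |r| [r.finished_at, r.id] }.reverse
  if cursor
    # Keep only rows strictly "before" the cursor in tuple order, so rows that
    # share finished_at with the cursor row are separated by id.
    ordered = ordered.select { |r| ([r.finished_at, r.id] <=> cursor) == -1 }
  end
  page = ordered.first(limit)
  next_cursor = page.size == limit ? [page.last.finished_at, page.last.id] : nil
  [page, next_cursor]
end
```

With three rows sharing the same `finished_at`, paging by timestamp alone could skip or repeat rows at a page boundary; comparing the full tuple gives each row a unique position.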
|
|
75
|
+
|
|
76
|
+
### SSE transport
|
|
77
|
+
|
|
78
|
+
The dashboard uses a single long-lived `text/event-stream` connection per
|
|
79
|
+
browser tab instead of polling `/api/overview` every 2 seconds. One HTTP
|
|
80
|
+
connection per tab regardless of how long it stays open.
|
|
81
|
+
|
|
82
|
+
- `Configuration#transport` accepts `:sse` (default) or `:polling`. Anything
|
|
83
|
+
else raises `ConfigurationError`. The chosen transport is exposed at
|
|
84
|
+
`/api/config` so the client knows which path to take.
|
|
85
|
+
- Client opens `EventSource(mount_path + '/api/stream')` once; the server
|
|
86
|
+
pushes an `overview` event when `PRAGMA data_version` changes and a
|
|
87
|
+
`:keepalive` comment frame every 25 s.
|
|
88
|
+
- Server-supplied 5 s reconnect delay; each reconnect begins from a full
|
|
89
|
+
current snapshot (no event log).
|
|
90
|
+
- Asset URLs are fingerprinted and cached immutably by digest, so a
|
|
91
|
+
dashboard deploy can't leave a browser on incompatible HTML / JS / CSS.
|
|
92
|
+
|
|
93
|
+
### Server compatibility for SSE
|
|
94
|
+
|
|
95
|
+
SSE holds the response open for the lifetime of the dashboard tab.
|
|
96
|
+
|
|
97
|
+
- **Falcon** — recommended. Handles long-lived connections via fibers.
|
|
98
|
+
- **Puma** — works. Each open tab holds one worker thread for its lifetime;
|
|
99
|
+
fine for a handful of operators, problematic if many concurrent operators
|
|
100
|
+
would starve the worker pool.
|
|
101
|
+
- **Unicorn** — doesn't work. Blocking worker model can't hold long-lived
|
|
102
|
+
connections without timeouts. Stay on `:polling`.
|
|
103
|
+
|
|
104
|
+
See the picture in the README for what each server is actually holding.
|
|
14
105
|
|
|
15
|
-
|
|
106
|
+
### Configuration
|
|
16
107
|
|
|
17
|
-
|
|
108
|
+
```ruby
|
|
109
|
+
require 'async/background/web'
|
|
18
110
|
|
|
19
|
-
|
|
111
|
+
Async::Background::Queue::Store.prepare_dashboard!(path: '/var/lib/app/queue.db')
|
|
20
112
|
|
|
21
|
-
|
|
113
|
+
Async::Background::Web.configure do |c|
|
|
114
|
+
c.queue_path = '/var/lib/app/queue.db'
|
|
115
|
+
c.auth = ->(env) { env['warden'].user&.admin? }
|
|
116
|
+
c.expose_args = false
|
|
117
|
+
c.metrics_path = '/run/app/async-background.shm'
|
|
118
|
+
c.total_workers = 4
|
|
119
|
+
c.counts_cache_ttl = 3.0
|
|
120
|
+
c.poll_interval_ms = 2000
|
|
121
|
+
c.list_limit = 50
|
|
122
|
+
c.mount_path = '/admin/background'
|
|
123
|
+
c.title = 'My App background jobs'
|
|
124
|
+
end
|
|
22
125
|
|
|
23
|
-
|
|
24
|
-
|
|
25
|
-
- New `Response.sse(body)` helper sets the correct headers including `x-accel-buffering: no` (disables nginx buffering for the streaming response).
|
|
26
|
-
- JS client (`assets.rb`) detects `state.config.transport === 'sse'` at boot and chooses `EventSource` over `setInterval(tick, ...)`. Both transports share the same `applyOverview()` and `refreshActiveList()` handlers, so the UI behaves identically.
|
|
126
|
+
run Async::Background::Web.app
|
|
127
|
+
```
|
|
27
128
|
|
|
28
|
-
|
|
129
|
+
### Dependencies
|
|
29
130
|
|
|
30
|
-
|
|
131
|
+
`rack` is optional. Required only when `require 'async/background/web'` is
|
|
132
|
+
loaded. Core gem and worker processes don't require it.
|
|
31
133
|
|
|
32
|
-
|
|
134
|
+
### Breaking changes from 0.7.x
|
|
33
135
|
|
|
34
|
-
|
|
35
|
-
|
|
36
|
-
c.queue_path = ...
|
|
37
|
-
c.auth = ->(env) { ... }
|
|
38
|
-
c.transport = :sse
|
|
39
|
-
end
|
|
40
|
-
```
|
|
136
|
+
None beyond what 0.7.2 already shipped. The 1.0 line locks the existing
|
|
137
|
+
contract:
|
|
41
138
|
|
|
42
|
-
|
|
139
|
+
- `Queue::Store#fetch` returns `claim_token` in the result hash.
|
|
140
|
+
- All terminal `Queue::Store` methods (`complete`, `fail`, `retry_or_fail`)
|
|
141
|
+
require the `claim_token:` kwarg and return CAS success boolean /
|
|
142
|
+
`:retried` / `:failed` / `nil`.
|
|
143
|
+
- Schema is versioned via `PRAGMA user_version`. Use
|
|
144
|
+
`Queue::Store.migrate!(path:)` to upgrade. Use
|
|
145
|
+
`Queue::Store.prepare_dashboard!(path:)` from the dashboard process to
|
|
146
|
+
add dashboard-only indexes.
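
The claim-token contract above can be illustrated with a small in-memory stand-in. This is only a sketch of the compare-and-set idea under stated assumptions: `TinyStore` is hypothetical, it is not `Queue::Store`, it keeps jobs in a Hash instead of SQLite, and it models only `fetch` and `complete`.

```ruby
require 'securerandom'

# Hypothetical in-memory stand-in used to show the claim-token CAS idea.
class TinyStore
  def initialize
    @jobs = {}
    @next_id = 0
    @mutex = Mutex.new
  end

  def enqueue(payload)
    @mutex.synchronize do
      id = (@next_id += 1)
      @jobs[id] = { status: 'pending', payload: payload, claim_token: nil }
      id
    end
  end

  # Claims the first pending job and returns its claim_token in the result hash.
  def fetch
    @mutex.synchronize do
      id, job = @jobs.find { |_, j| j[:status] == 'pending' }
      return nil unless job

      job[:status] = 'running'
      job[:claim_token] = SecureRandom.hex(8)
      { id: id, payload: job[:payload], claim_token: job[:claim_token] }
    end
  end

  # Compare-and-set: succeeds only for the holder of the current claim token.
  def complete(id, claim_token:)
    @mutex.synchronize do
      job = @jobs[id]
      return false unless job && job[:status] == 'running' && job[:claim_token] == claim_token

      job[:status] = 'done'
      true
    end
  end
end
```

A worker that lost its claim (for example after a stale-job recovery handed the job to another worker) would get `false` back from `complete` and could skip any follow-up work instead of overwriting the newer claim.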
|
|
43
147
|
|
|
44
|
-
SSE holds the request thread/fiber open for the lifetime of the dashboard tab. **Recommended for Falcon**, which handles long-lived connections natively via fibers. **Puma works** but each open dashboard tab holds one worker thread for its lifetime — fine for an admin dashboard with a handful of operators, problematic if many concurrent operators would starve the worker pool. **Unicorn does not work** for SSE since its blocking worker model can't hold long-lived connections without timeouts; stay on `:polling` there.
|
|
45
148
|
|
|
46
|
-
### Backend-side polling
|
|
47
149
|
|
|
48
|
-
|
|
150
|
+
|
|
49
151
|
|
|
50
|
-
|
|
152
|
+
## 0.7.2
|
|
51
153
|
|
|
52
|
-
|
|
53
|
-
|
|
54
|
-
- Extended `spec/async/background/web/configuration_spec.rb` — accepts `:sse`, rejects unknown transports.
|
|
154
|
+
Harden queue execution, retries, shutdown, and metrics. Adds schema v1,
|
|
155
|
+
optional dashboard indexes, and a faster enqueue path.
|
|
55
156
|
|
|
56
|
-
## 1.0.0
|
|
57
157
|
|
|
58
|
-
First stable release. The queue execution contract from 0.7.2 (claim-token CAS, lifecycle columns, barrier-based shutdown drain, per-status partial indexes, versioned migrations) is now considered the public API.
|
|
59
158
|
|
|
60
|
-
|
|
159
|
+
|
|
61
160
|
|
|
62
|
-
|
|
63
|
-
- Endpoints: `GET /`, `GET /assets/app.css`, `GET /assets/app.js`, `GET /api/overview`, `GET /api/executing`, `GET /api/claimed`, `GET /api/pending`, `GET /api/done`, `GET /api/failed`, `GET /api/metrics`, `GET /api/config`.
|
|
64
|
-
- Default transport is JSON polling (`poll_interval_ms`, default 2000). SSE adapter for Falcon is intentionally deferred to a later release; the dashboard already coalesces work via a shared overview cache, so adding SSE later is a backward-compatible change.
|
|
65
|
-
- Read path runs through `Async::Background::Web::Snapshot`, which opens SQLite with `file:?mode=ro`, wraps a `Mutex` around a single shared connection, and uses one read transaction per endpoint and caches each overview as one consistent snapshot.
|
|
66
|
-
- Distinguishes `Executing` (`status='running' AND started_at IS NOT NULL`) from `Claimed` (`status='running' AND started_at IS NULL`).
|
|
67
|
-
- Overview snapshot cache for `counts_cache_ttl` seconds (default 3.0) so a busy queue does not turn the dashboard into a hot reader.
|
|
68
|
-
- Cursor pagination for `done`/`failed`/`pending` using `(finished_at, id)` / `(run_at, id)` tuples. Stable on ties.
|
|
69
|
-
- Args hidden by default (`expose_args: false`); when enabled, content runs through `redact_args`. All user content rendered through `textContent`, never `innerHTML`.
|
|
70
|
-
- Auth hook is **mandatory**. `Configuration#validate!` rejects an unconfigured `auth`. There is no permissive default.
|
|
161
|
+
## 0.7.1
|
|
71
162
|
|
|
72
|
-
|
|
73
|
-
|
|
163
|
+
`Store` exposes three SQLite tuning knobs via `StoreOptions`, validated at
|
|
164
|
+
construction time so misconfigurations fail fast:
|
|
74
165
|
|
|
75
|
-
|
|
166
|
+
- `mmap` (`true` / `false`, default `true`) — memory-mapped I/O.
|
|
167
|
+
- `synchronous` (`:normal` / `:full` / `:extra`, default `:normal`) —
|
|
168
|
+
durability vs throughput.
|
|
169
|
+
- `wal_autocheckpoint` (`Integer` in `100..10_000`, default `1_000`) — WAL
|
|
170
|
+
checkpoint frequency in pages.
|
|
76
171
|
|
|
77
|
-
|
|
78
|
-
|
|
172
|
+
**Breaking change.** `Store.new(path:, mmap:)` → `Store.new(path:, options:
|
|
173
|
+
{ mmap: ... })`. The direct `mmap:` kwarg is removed in favor of the
|
|
174
|
+
unified `options:` hash. Update any call site that constructs `Store`
|
|
175
|
+
manually.
|
|
79
176
|
|
|
80
|
-
|
|
177
|
+
See [Get Started → Store tuning](docs/GET_STARTED.md#appendix-store-tuning)
|
|
178
|
+
for trade-offs.
|
|
81
179
|
|
|
82
|
-
|
|
83
|
-
|
|
84
|
-
|
|
85
|
-
|
|
86
|
-
|
|
87
|
-
|
|
88
|
-
|
|
89
|
-
|
|
90
|
-
|
|
91
|
-
|
|
92
|
-
|
|
180
|
+
|
|
181
|
+
|
|
182
|
+
|
|
183
|
+
|
|
184
|
+
## 0.6.2
|
|
185
|
+
|
|
186
|
+
Queue jobs gain a **configurable timeout** at three levels — call-site
|
|
187
|
+
`options:`, class-level `.options`, default 120 s — merged at enqueue time
|
|
188
|
+
so the runner just reads the final value from the payload:
|
|
189
|
+
|
|
190
|
+
```ruby
|
|
191
|
+
class HeavyImportJob
|
|
192
|
+
include Async::Background::Job
|
|
193
|
+
options timeout: 600
|
|
93
194
|
end
|
|
94
195
|
|
|
95
|
-
|
|
196
|
+
HeavyImportJob.perform_async(user_id, options: { timeout: 120 }) # wins
|
|
96
197
|
```
|
|
97
198
|
|
|
98
|
-
|
|
99
|
-
|
|
100
|
-
|
|
199
|
+
Side effects: an `options TEXT` column in SQLite (added idempotently via
|
|
200
|
+
`ALTER TABLE … rescue nil` on existing databases), an extensible `options:`
|
|
201
|
+
hash across the entire enqueue chain, a `Job::Options` schema via
|
|
202
|
+
`Data.define` (unknown keys raise `ArgumentError`), and queue-timeout
|
|
203
|
+
failure logs now include the actual value (`"timed out after 120s"`).
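
The three-level precedence described in this entry can be sketched with plain hashes. This is an illustrative assumption, not the gem's code: the entry says the real schema is a `Job::Options` built with `Data.define`, whereas `merge_options`, `DEFAULT_OPTIONS`, and `KNOWN_KEYS` below are hypothetical names chosen for the example.

```ruby
# Hypothetical sketch of the enqueue-time merge; the gem uses Job::Options.
DEFAULT_OPTIONS = { timeout: 120 }.freeze # default stated in this entry
KNOWN_KEYS = DEFAULT_OPTIONS.keys.freeze

# Call-site options win over class-level options, which win over the default.
def merge_options(class_level: {}, call_site: {})
  unknown = (class_level.keys + call_site.keys) - KNOWN_KEYS
  raise ArgumentError, "unknown option(s): #{unknown.join(', ')}" unless unknown.empty?

  DEFAULT_OPTIONS.merge(class_level).merge(call_site)
end
```

Because the merge happens once at enqueue time, the stored payload already carries the final `timeout`, and the runner does not need to know where the value came from.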
|
|
101
204
|
|
|
102
|
-
### Breaking changes from 0.7.x
|
|
103
205
|
|
|
104
|
-
None beyond what 0.7.2 already shipped. The 1.0 line locks the existing contract:
|
|
105
206
|
|
|
106
|
-
|
|
107
|
-
- All terminal `Queue::Store` methods (`complete`, `fail`, `retry_or_fail`) require the `claim_token:` kwarg and return CAS success boolean / `:retried` / `:failed` / `nil`.
|
|
108
|
-
- Schema is versioned via `PRAGMA user_version`. Use `Queue::Store.migrate!(path:)` to upgrade. Use `Queue::Store.prepare_dashboard!(path:)` from the dashboard process to lazily create dashboard-only indexes (per-status partial indexes for `done` / `failed`, plus separate `executing` and `claimed` indexes).
|
|
207
|
+
|
|
109
208
|
|
|
110
|
-
## 0.
|
|
209
|
+
## 0.6.1
|
|
111
210
|
|
|
112
|
-
|
|
113
|
-
- Add schema v1, optional dashboard indexes, and a faster enqueue path.
|
|
211
|
+
Two scheduler fixes and one notification fast path:
|
|
114
212
|
|
|
115
|
-
|
|
213
|
+
- **Cron busy-loop on overlap skip.** When a scheduled run was skipped
|
|
214
|
+
because the previous one was still active, the entry was re-pushed to the
|
|
215
|
+
heap without `reschedule`. `next_run_at` never advanced, so the next
|
|
216
|
+
iteration picked it up immediately. Skip branch now calls
|
|
217
|
+
`entry.reschedule(monotonic_now)` like the normal path.
|
|
218
|
+
- **Prepared statement reset on fetch error.** `@fetch_stmt.reset!` ran
|
|
219
|
+
after `execute` returned, so an exception inside `execute` left the
|
|
220
|
+
statement dirty and the next `fetch` could fail. Wrapped in
|
|
221
|
+
`begin / ensure`.
|
|
222
|
+
- **SocketNotifier: 1 connect per enqueue.** `notify_all` no longer
|
|
223
|
+
connects to all N worker sockets on every enqueue. Wakes a single worker
|
|
224
|
+
chosen by random offset, falls back through the ring only if the chosen
|
|
225
|
+
worker is dead. Happy path: 1 connect; worst case (all workers down): N.
|
|
226
|
+
- Pending lookup now uses a partial index
|
|
227
|
+
`idx_jobs_pending(run_at, id) WHERE status = 'pending'`. Smaller on disk,
|
|
228
|
+
cheaper to update, and matches the only query that uses it.
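
The single-worker wake-up with ring fallback can be sketched as below. The function name `notify_one` and the block-based "try to wake worker N" callback are hypothetical; the real `SocketNotifier` connects to Unix sockets, which this sketch deliberately leaves out so the ring logic stands on its own.

```ruby
# Hypothetical sketch of the ring fallback; socket I/O is replaced by a block
# that returns true when worker `index` was woken successfully.
def notify_one(worker_count, offset: rand(worker_count))
  worker_count.times do |step|
    index = (offset + step) % worker_count
    return index if yield(index) # stop at the first live worker
  end
  nil # every worker was unreachable; jobs wait for the next poll
end
```

In the happy path the block runs once; only when the chosen worker is down does the loop walk further around the ring, which matches the "1 connect, worst case N" behaviour described in the entry.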
|
|
116
229
|
|
|
117
|
-
### Features
|
|
118
|
-
- **Tunable `Store` options via `StoreOptions`** — three knobs exposed for SQLite tuning, validated at construction time so misconfigurations fail fast at boot:
|
|
119
|
-
- `mmap` (`true`/`false`, default `true`) — toggle memory-mapped I/O
|
|
120
|
-
- `synchronous` (`:normal`/`:full`/`:extra`, default `:normal`) — durability vs throughput
|
|
121
|
-
- `wal_autocheckpoint` (`Integer` in `100..10_000`, default `1_000`) — WAL checkpoint frequency in pages
|
|
122
230
|
|
|
123
|
-
Range and enum validation prevent foot-guns (e.g. `wal_autocheckpoint: 100_000` would bloat WAL beyond `journal_size_limit`). See [Get Started → Store tuning](docs/GET_STARTED.md) for trade-offs of each knob
|
|
124
231
|
|
|
125
|
-
|
|
126
|
-
- `Store.new(path:, mmap:)` → `Store.new(path:, options: { mmap: ... })`. Direct `mmap:` keyword argument removed in favor of the unified `options:` hash. Users who construct `Store` manually (e.g. for web-worker enqueue) need to update the call site
|
|
232
|
+
|
|
127
233
|
|
|
128
|
-
## 0.6.
|
|
234
|
+
## 0.6.0
|
|
129
235
|
|
|
130
|
-
|
|
131
|
-
|
|
132
|
-
|
|
133
|
-
|
|
134
|
-
|
|
135
|
-
|
|
136
|
-
options timeout: 600
|
|
137
|
-
|
|
138
|
-
def perform(user_id) = # ...
|
|
139
|
-
end
|
|
140
|
-
|
|
141
|
-
# Call-site override (wins over class-level)
|
|
142
|
-
HeavyImportJob.perform_async(user_id, options: { timeout: 120 })
|
|
143
|
-
```
|
|
144
|
-
Priority: call-site `options:` → class-level `options` → `DEFAULT_TIMEOUT` (30s). Options are merged at enqueue time so the runner simply reads the final value from the payload
|
|
145
|
-
- **`options:` hash across the entire enqueue chain** — single extensible contract from `perform_async` through `Client` down to `Store`. Currently supports `:timeout`, designed to accommodate future keys (e.g. `:retry`) without API changes
|
|
146
|
-
- **`Job::Options` schema via `Data.define`** — declares known option keys with types and defaults. Unknown keys raise `ArgumentError`, invalid types raise `TypeError`. No manual validation code
|
|
147
|
-
- **`options TEXT` column in SQLite** — stores the merged options hash as JSON. Extensible without schema changes when new options are added
|
|
148
|
-
|
|
149
|
-
### Improvements
|
|
150
|
-
- **Queue timeout logged on failure** — `run_queue_job` error log now includes actual timeout value: `"timed out after 120s"` instead of generic `"timed out"`
|
|
151
|
-
- **Idempotent schema migration** — existing databases get `ALTER TABLE jobs ADD COLUMN options TEXT` on first connection, wrapped in `rescue nil` for safe re-runs. New databases include the column in `CREATE TABLE`
|
|
236
|
+
**Queue notification system rewritten.** The pipe-based `Notifier` is
|
|
237
|
+
replaced with a Unix-domain-socket architecture: each worker listens on its
|
|
238
|
+
own socket (`<dir>/async_bg_worker_N.sock`), producers broadcast wake-ups
|
|
239
|
+
via `SocketNotifier`. Fork-safe by design (no shared FDs), resilient to
|
|
240
|
+
restarts (stale-socket cleanup), and sub-100 µs wake-up latency
|
|
241
|
+
(30–80 µs typical).
|
|
152
242
|
|
|
153
|
-
|
|
243
|
+
**Why.** The pipe-based notifier was fundamentally broken in the
|
|
244
|
+
recommended multi-fork setup: `for_consumer!` closed the writer end in each
|
|
245
|
+
child, making `Client#push → notify` fail silently with `IOError`. All
|
|
246
|
+
writes hit `WRITE_DROPPED`, so the queue silently degraded to 5-second
|
|
247
|
+
polling.
|
|
154
248
|
|
|
155
|
-
|
|
156
|
-
|
|
157
|
-
|
|
249
|
+
**Breaking changes.** `Runner` now takes `queue_socket_dir:` instead of
|
|
250
|
+
`queue_notifier:`. `Notifier#for_producer!` / `Notifier#for_consumer!` are
|
|
251
|
+
removed. `Client#push` calls `notifier.notify_all`. Environment variable
|
|
252
|
+
`QUEUE_SOCKET_PATH` is replaced by `QUEUE_SOCKET_DIR` (a directory now).
|
|
158
253
|
|
|
159
|
-
### Improvements
|
|
160
|
-
- **SocketNotifier: non-blocking enqueue with ring fallback** — `notify_all` no longer connects to all N worker sockets on every enqueue. `UNIXSocket.new` is a blocking, non-fiber-aware syscall, and notifying every worker blocked the Falcon reactor for N `connect()` calls on the hot HTTP enqueue path. Now wakes a single worker chosen by random offset, falling back through the ring only if the chosen worker is dead (`ECONNREFUSED` etc.). Happy path: 1 connect. Worst case (all workers down): N connects — same as before, but only when actually needed. Safe because the queue is shared in SQLite, not sharded per worker
|
|
161
|
-
- **SocketNotifier: cleaned up `UNAVAILABLE` error list** — removed `IO::WaitWritable` and `Errno::EAGAIN`. They implied "socket buffer full", but `write_nonblock` of a single byte to a freshly-opened connection cannot fill the kernel buffer. Listing them only misled readers
|
|
162
|
-
- **Store: partial index for pending lookup** — replaced `idx_jobs_status_run_at_id(status, run_at, id)` with partial index `idx_jobs_pending(run_at, id) WHERE status = 'pending'`. Smaller on disk, cheaper to update, and matches the only query that uses it (`fetch`). `done`/`failed`/`running` rows no longer occupy index pages
|
|
163
254
|
|
|
164
|
-
## 0.6.0
|
|
165
255
|
|
|
166
|
-
|
|
167
|
-
- **Queue notification system completely rewritten** — replaced pipe-based `Notifier` with Unix domain socket-based architecture
|
|
168
|
-
- `Runner` now takes `queue_socket_dir:` parameter instead of `queue_notifier:`
|
|
169
|
-
- Removed `Notifier#for_producer!` and `Notifier#for_consumer!` — no longer needed
|
|
170
|
-
- `Client#push` now calls `notifier.notify_all` instead of `notifier.notify`
|
|
171
|
-
|
|
172
|
-
### Features
|
|
173
|
-
- **Unix domain socket-based notifications** — solves all cross-process notification problems
|
|
174
|
-
- New `SocketWaker` class (consumer-side) — each worker listens on its own Unix socket (`/tmp/queue/sockets/async_bg_worker_N.sock`)
|
|
175
|
-
- New `SocketNotifier` class (producer-side) — connects to all worker sockets to broadcast wake-ups
|
|
176
|
-
- **Cross-process wake-up now works correctly** — web workers → background workers, background workers → background workers
|
|
177
|
-
- **Fork-safe by design** — no shared file descriptors, each process creates its own socket after fork
|
|
178
|
-
- **Resilient to restarts** — stale socket cleanup on worker startup, graceful degradation if worker unavailable
|
|
179
|
-
- **Sub-100µs latency** — typical wake-up time 30-80µs vs previous 5-second polling fallback
|
|
180
|
-
|
|
181
|
-
### Bug Fixes
|
|
182
|
-
- **CRITICAL: Notifier bug in recommended setup** — the old pipe-based `Notifier` was fundamentally broken in multi-fork scenarios:
|
|
183
|
-
- `for_consumer!` closed the writer end in each child process, making `Client#push → notify` fail silently with `IOError`
|
|
184
|
-
- All writes were caught by `WRITE_DROPPED` rescue block, causing jobs to use 5-second polling instead of instant wake-up
|
|
185
|
-
- Web workers had no way to notify background workers (no shared pipe after fork)
|
|
186
|
-
- The bug was masked by `WRITE_DROPPED` silently catching `IOError` — appeared to work but degraded to polling
|
|
187
|
-
- **Socket cleanup race conditions** — `SocketWaker#cleanup_stale_socket` now validates if socket is truly stale by attempting connection
|
|
188
|
-
|
|
189
|
-
### Improvements
|
|
190
|
-
- Updated `docs/GET_STARTED.md` with new socket-based setup for Falcon
|
|
191
|
-
- Added section on web worker → background worker job enqueuing with full example
|
|
192
|
-
- Changed environment variable from `QUEUE_SOCKET_PATH` to `QUEUE_SOCKET_DIR` (directory instead of single socket path)
|
|
193
|
-
- Better error handling in `SocketWaker` and `SocketNotifier` with comprehensive `UNAVAILABLE` error list
|
|
194
|
-
- Integrated with `Async::Notification` for local wake-ups (shutdown signals)
|
|
195
|
-
|
|
196
|
-
### Technical Details
|
|
197
|
-
- **Why sockets over pipes?** Pipes require shared FDs across fork boundaries. The recommended Falcon setup calls `for_consumer!` in each child, which closes the writer, breaking the notification chain. Sockets use filesystem paths — any process can connect without inherited FDs.
|
|
198
|
-
- **Performance impact:** Adding ~80µs per enqueue for 8 workers (8 socket connections) vs ~100µs for SQLite transaction = negligible overhead
|
|
199
|
-
- **Graceful degradation:** If worker socket unavailable (`ENOENT`, `ECONNREFUSED`), producer silently skips — job still in database, will be picked up on next poll (5s max delay)
|
|
256
|
+
|
|
200
257
|
|
|
201
258
|
## 0.5.1
|
|
202
259
|
|
|
203
|
-
|
|
204
|
-
|
|
205
|
-
|
|
206
|
-
|
|
207
|
-
|
|
208
|
-
|
|
209
|
-
|
|
210
|
-
|
|
211
|
-
|
|
212
|
-
|
|
213
|
-
|
|
214
|
-
|
|
215
|
-
|
|
216
|
-
|
|
217
|
-
- Added explanatory comments for each error type and handling strategy
|
|
260
|
+
CI infrastructure: full Docker-based integration testing (`Dockerfile.ci`,
|
|
261
|
+
`docker-compose.ci.yml`, `Gemfile.ci`) plus an end-to-end scenario test that
|
|
262
|
+
validates forked-worker behavior — normal execution, crash recovery after
|
|
263
|
+
SIGKILL, no duplicate execution under crashes, proper distribution across
|
|
264
|
+
the pool.
|
|
265
|
+
|
|
266
|
+
Also: `PRAGMA busy_timeout = 5000` on `Queue::Store` to prevent
|
|
267
|
+
`SQLITE_BUSY` under concurrent multi-process access; cleaner IO error
|
|
268
|
+
categorization in `Queue::Notifier` (`WRITE_DROPPED` vs `READ_EXHAUSTED`)
|
|
269
|
+
with explanatory comments.
|
|
270
|
+
|
|
271
|
+
|
|
272
|
+
|
|
273
|
+
|
|
218
274
|
|
|
219
275
|
## 0.5.0
|
|
220
276
|
|
|
221
|
-
|
|
222
|
-
|
|
223
|
-
|
|
224
|
-
|
|
225
|
-
|
|
226
|
-
|
|
227
|
-
|
|
228
|
-
|
|
229
|
-
|
|
230
|
-
|
|
231
|
-
|
|
232
|
-
|
|
233
|
-
-
|
|
234
|
-
|
|
235
|
-
|
|
236
|
-
|
|
237
|
-
|
|
238
|
-
|
|
239
|
-
|
|
240
|
-
|
|
241
|
-
|
|
242
|
-
- `Queue::Notifier` — extracted `IO_ERRORS` constant (`IO::WaitReadable`, `EOFError`, `IOError`) for cleaner `rescue` in `drain`
|
|
243
|
-
- `Queue::Store` — replaced index `idx_jobs_status_id(status, id)` with `idx_jobs_status_run_at_id(status, run_at, id)` for efficient delayed job lookups
|
|
244
|
-
- `Queue::Store` — `fetch` SQL now uses `WHERE status = 'pending' AND run_at <= ?` with `ORDER BY run_at, id` to process jobs in scheduled order
|
|
245
|
-
- Removed duplicated `monotonic_now` / `realtime_now` from `Runner` and `Store` — now provided by `Clock` module
|
|
246
|
-
- Updated documentation: README (Job module examples, Queue architecture diagram, Clock section), GET_STARTED (delayed jobs guide, Job module usage, minimal queue-only example)
|
|
277
|
+
**Delayed jobs.** Full support for scheduling jobs in the future:
|
|
278
|
+
|
|
279
|
+
```ruby
|
|
280
|
+
SomeJob.perform_in(60, *args)
|
|
281
|
+
SomeJob.perform_at(time, *args)
|
|
282
|
+
```
|
|
283
|
+
|
|
284
|
+
Backed by a new `run_at` column in the SQLite `jobs` table — jobs are only
|
|
285
|
+
fetched when `run_at <= now`.
|
|
286
|
+
|
|
287
|
+
**Job module.** Sidekiq-like `include Async::Background::Job` adds
|
|
288
|
+
`perform_async`, `perform_in`, `perform_at`, instance-level `#perform`, and
|
|
289
|
+
class-level `perform_now` delegation.
|
|
290
|
+
|
|
291
|
+
**Clock module.** Shared `monotonic_now` / `realtime_now` helpers extracted
|
|
292
|
+
to `Async::Background::Clock` and included by `Runner`, `Queue::Store`, and
|
|
293
|
+
`Queue::Client`.
|
|
294
|
+
|
|
295
|
+
|
|
296
|
+
|
|
297
|
+
|
|
247
298
|
|
|
248
299
|
## 0.4.5
|
|
249
300
|
|
|
250
|
-
|
|
251
|
-
|
|
301
|
+
**Fetch race condition fixed.** Wrapped `UPDATE ... RETURNING` in
|
|
302
|
+
`BEGIN IMMEDIATE` to prevent two workers from picking up the same job
|
|
303
|
+
simultaneously.
|
|
304
|
+
|
|
305
|
+
**mmap on Docker overlay2.** `overlay2` does not guarantee `write()` /
|
|
306
|
+
`mmap()` coherence, which corrupts the WAL under concurrent multi-process
|
|
307
|
+
access. mmap is now configurable via `queue_mmap: false` instead of being
|
|
308
|
+
hardcoded. Proper Docker setup with named volumes is documented in
|
|
309
|
+
[Get Started → Docker](docs/GET_STARTED.md#step-3--docker-setup).
|
|
310
|
+
|
|
311
|
+
Also: `PRAGMA optimize` on shutdown wrapped in `rescue nil`,
|
|
312
|
+
`PRAGMA incremental_vacuum` actually works now (`PRAGMA auto_vacuum =
|
|
313
|
+
INCREMENTAL` added to schema; only takes effect on new databases),
|
|
314
|
+
composite index `idx_jobs_status_id(status, id)` to eliminate a sort in
|
|
315
|
+
`fetch`. New `queue_mmap:` / `mmap:` parameters and a public
|
|
316
|
+
`attr_reader :queue_store` on `Runner`.
|
|
317
|
+
|
|
318
|
+
**Breaking-ish.** `PRAGMAS` is now a frozen lambda `PRAGMAS.call(mmap_size)`
|
|
319
|
+
instead of a static string; update any direct reference.
|
|
252
320
|
|
|
253
|
-
### Features
|
|
254
|
-
- New `queue_mmap:` parameter on `Runner` (default: `true`) — allows disabling SQLite mmap for environments where it's unsafe (Docker overlay2)
|
|
255
|
-
- New `mmap:` parameter on `Queue::Store` (default: `true`) — controls `PRAGMA mmap_size` (256 MB when enabled, 0 when disabled)
|
|
256
|
-
- Public `attr_reader :queue_store` on `Runner` — eliminates need for `instance_variable_get` when sharing Store with Client
|
|
257
321
|
|
|
258
|
-
### Bug Fixes
|
|
259
|
-
- **CRITICAL: fetch race condition** — wrapped `UPDATE ... RETURNING` in `BEGIN IMMEDIATE` transaction to prevent two workers from picking up the same job simultaneously
|
|
260
|
-
- **CRITICAL: mmap + Docker overlay2** — `overlay2` filesystem does not guarantee `write()`/`mmap()` coherence, causing SQLite WAL corruption under concurrent multi-process access. mmap is now configurable via `queue_mmap: false` instead of being hardcoded. Documented proper Docker setup with named volumes in `docs/GET_STARTED.md`
|
|
261
|
-
- **`PRAGMA optimize` on shutdown** — wrapped in `rescue nil` to prevent `SQLite3::BusyException` when another process holds the write lock during graceful shutdown
|
|
262
|
-
- **`PRAGMA incremental_vacuum` was a no-op** — added `PRAGMA auto_vacuum = INCREMENTAL` to schema. Without it, `incremental_vacuum` does nothing. Note: only takes effect on newly created databases; existing databases require a one-time `VACUUM`
|
|
263
322
|
|
|
264
|
-
|
|
265
|
-
- Replaced index `idx_jobs_status(status)` with composite `idx_jobs_status_id(status, id)` — eliminates sort step in `fetch` query (`ORDER BY id LIMIT 1` is now a direct B-tree lookup)
|
|
266
|
-
- Fixed `finalize_statements` — changed `%i[@enqueue_stmt ...]` to `%i[enqueue_stmt ...]` with `:"@#{name}"` interpolation for idiomatic `instance_variable_get`/`set` usage
|
|
267
|
-
- Added documentation: `README.md` (concise, with warning markers) and `docs/GET_STARTED.md` (step-by-step guide covering schedule config, Falcon integration, Docker setup, dynamic queue)
|
|
323
|
+
|
|
268
324
|
|
|
269
325
|
## 0.4.0
|
|
270
326
|
|
|
271
|
-
|
|
272
|
-
|
|
273
|
-
- `Queue::Store` — SQLite-backed persistent storage with WAL mode, prepared statements, and optimized pragmas
|
|
274
|
-
- `Queue::Notifier` — `IO.pipe`-based zero-cost wakeup between producer and consumer processes (no polling)
|
|
275
|
-
- `Queue::Client` — public API: `Async::Background::Queue.enqueue(JobClass, *args)`
|
|
276
|
-
- Automatic recovery of stale `running` jobs on worker restart
|
|
277
|
-
- Periodic cleanup of completed jobs (piggyback on fetch, every 5 minutes)
|
|
278
|
-
- `PRAGMA incremental_vacuum` when cleanup removes 100+ rows
|
|
279
|
-
- Worker isolation via `ISOLATION_FORKS` env variable — exclude specific workers from queue processing
|
|
280
|
-
- Custom database path via `queue_db_path` parameter
|
|
281
|
-
- Requires optional `sqlite3` gem (`~> 2.0`) — not included by default, must be added to Gemfile explicitly
|
|
282
|
-
- New Runner parameters: `queue_notifier:` and `queue_db_path:`
|
|
283
|
-
|
|
284
|
-
### Improvements
|
|
285
|
-
- Unified `monotonic_now` usage across `run_job` and `run_queue_job` (was using direct `Process.clock_gettime` call in `run_job`)
|
|
286
|
-
- `Queue::Notifier#drain` — moved `rescue` inside the loop to avoid stack unwinding on each drain cycle
|
|
327
|
+
**Dynamic job queue.** Enqueue jobs at runtime from any process (web,
|
|
328
|
+
console, rake) with automatic execution by background workers.
|
|
287
329
|
|
|
288
|
-
|
|
330
|
+
- `Queue::Store` — SQLite-backed persistent storage with WAL mode,
|
|
331
|
+
prepared statements, and optimized pragmas.
|
|
332
|
+
- `Queue::Notifier` — `IO.pipe`-based zero-cost wake-up between producer
|
|
333
|
+
and consumer processes.
|
|
334
|
+
- `Queue::Client` — public API: `Async::Background::Queue.enqueue
|
|
335
|
+
(JobClass, *args)`.
|
|
336
|
+
- Automatic recovery of stale `running` jobs on worker restart.
|
|
337
|
+
- Periodic cleanup of completed jobs (piggybacked on fetch, every 5 min);
|
|
338
|
+
`PRAGMA incremental_vacuum` when cleanup removes 100+ rows.
|
|
339
|
+
- `ISOLATION_FORKS` env var excludes specific workers from queue processing.
|
|
340
|
+
- Custom database path via `queue_db_path:` on `Runner`.
|
|
289
341
|
|
|
290
|
-
|
|
291
|
-
- Added optional metrics collection system using shared memory
|
|
292
|
-
- New `Metrics` class with worker-specific performance tracking
|
|
293
|
-
- Public API: `runner.metrics.enabled?`, `runner.metrics.values`, `Metrics.read_all()`
|
|
294
|
-
- Tracks total runs, successes, failures, timeouts, skips, active jobs, and execution times
|
|
295
|
-
- Requires optional `async-utilization` gem dependency
|
|
296
|
-
- Metrics stored in `/tmp/async-background.shm` with lock-free updates per worker
|
|
342
|
+
Requires the optional `sqlite3` gem (`~> 2.0`).
|
|
297
343
|
|
|
298
|
-
|
|
344
|
+
(The 0.6.0 socket-based architecture supersedes the pipe-based notifier
|
|
345
|
+
introduced here.)
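
The pipe-based wake-up described for `Queue::Notifier` can be illustrated with a minimal sketch. This is not the gem's implementation: the class name `PipeNotifier` and its methods `notify` and `wait` are hypothetical, and only `IO.pipe`, `IO.select`, `write_nonblock`, and `read_nonblock` from the Ruby standard library are assumed.

```ruby
# Hypothetical sketch of a pipe-based wake-up; not the gem's Queue::Notifier.
class PipeNotifier
  def initialize
    # The pipe must exist before forking so both processes share its ends.
    @reader, @writer = IO.pipe
  end

  # Producer side: write one byte. A full pipe means a wake-up is already
  # pending, so the error is ignored.
  def notify
    @writer.write_nonblock("\x01")
  rescue IO::WaitWritable
    nil
  end

  # Consumer side: sleep until a byte arrives or the timeout elapses.
  # Returns true when woken by a notification, false on timeout.
  def wait(timeout = nil)
    ready = IO.select([@reader], nil, nil, timeout)
    return false unless ready

    drain
    true
  end

  private

  # Empty the pipe so several notifications collapse into one wake-up.
  def drain
    loop { @reader.read_nonblock(1024) }
  rescue IO::WaitReadable, EOFError
    nil
  end
end
```

The consumer blocks in `IO.select` instead of polling the database, which is what makes the wake-up cheap; draining after each wake-up keeps a burst of enqueues from causing a burst of redundant fetches.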
|
|
299
346
|
|
|
300
|
-
### Improvements
|
|
301
|
-
- Micro-optimization in `wait_with_shutdown` method: use passed `task` parameter instead of `Async::Task.current` for better consistency and slight performance improvement
|
|
302
347
|
|
|
303
|
-
## 0.2.5
|
|
304
348
|
|
|
305
|
-
|
|
306
|
-
- Added graceful shutdown via signal handlers for SIGINT and SIGTERM
|
|
307
|
-
- Enhanced process lifecycle management with proper signal handling using `Signal.trap` and IO.pipe for async communication
|
|
308
|
-
- Improved robustness for production deployments and container orchestration
|
|
309
|
-
- Updated dependencies to work with latest Async 2.x API (removed deprecated `:parent` parameter usage)
|
|
349
|
+
|
|
310
350
|
|
|
311
|
-
## 0.
|
|
351
|
+
## 0.3.0
|
|
312
352
|
|
|
313
|
-
|
|
314
|
-
|
|
315
|
-
|
|
316
|
-
|
|
317
|
-
|
|
318
|
-
-
|
|
353
|
+
Optional metrics collection via shared memory. `Metrics` tracks per-worker
|
|
354
|
+
counters: `total_runs`, `total_successes`, `total_failures`,
|
|
355
|
+
`total_timeouts`, `total_skips`, `active_jobs`, plus last-run timestamp and
|
|
356
|
+
duration. Public API: `runner.metrics.enabled?`, `runner.metrics.values`,
|
|
357
|
+
`Metrics.read_all(total_workers:)`. Requires the optional
|
|
358
|
+
`async-utilization` gem; absent that, `enabled?` is `false` and `read_all`
|
|
359
|
+
returns `[]`. Default file: `/tmp/async-background.shm`.
|
|
319
360
|
|
|
320
|
-
## 0.2.2
|
|
321
361
|
|
|
322
|
-
### Bug Fixes
|
|
323
|
-
- **CRITICAL**: Removed logger parameter from Runner initialize (was unused). Fixed initialization to use Console.logger directly which now properly initializes in forked processes with correct context
|
|
324
362
|
|
|
325
|
-
|
|
363
|
+
|
|
326
364
|
|
|
327
|
-
|
|
328
|
-
- **CRITICAL**: Added missing `require 'console'` in main module. Logger was nil because Console gem was not imported, causing `undefined method 'info' for nil` errors on worker initialization
|
|
365
|
+
## 0.2.x
|
|
329
366
|
|
|
330
|
-
|
|
367
|
+
- **0.2.6** — `wait_with_shutdown` uses the passed `task` parameter
|
|
368
|
+
instead of `Async::Task.current`.
|
|
369
|
+
- **0.2.5** — Graceful shutdown via `SIGINT` / `SIGTERM` signal handlers
|
|
370
|
+
using `Signal.trap` and `IO.pipe`. Compatible with Async 2.x API
|
|
371
|
+
(removed deprecated `:parent`).
|
|
372
|
+
- **0.2.4** — Removed hardcoded version warning. Use semver pre-release
|
|
373
|
+
suffixes for unstable versions (e.g. `0.3.0.alpha1`).
|
|
374
|
+
- **0.2.2** — Removed unused `logger` parameter from `Runner#initialize`;
|
|
375
|
+
use `Console.logger` directly, which now initializes correctly in
|
|
376
|
+
forked processes.
|
|
377
|
+
- **0.2.1** — Added missing `require 'console'` in main module. Logger
|
|
378
|
+
was `nil`, causing `undefined method 'info' for nil` on worker
|
|
379
|
+
initialization.
|
|
380
|
+
- **0.2.0** — Removed hidden ActiveSupport dependency
|
|
381
|
+
(`safe_constantize` → `Object.const_get` + `NameError`). Job validation
|
|
382
|
+
now checks for `.perform_now` (class method) instead of `.perform`
|
|
383
|
+
(instance method). Fixed a race where an entry could disappear from the
|
|
384
|
+
heap during execution. Added `stop()` and `running?()` to `Runner`.
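
The 0.2.0 entry (constant lookup via `Object.const_get` with `NameError` handling, plus a check for the `.perform_now` class method) might look roughly like the sketch below. The helper name `resolve_job_class`, the error type, and the sample job classes are assumptions made for illustration, not the gem's code.

```ruby
# Hypothetical helper; the name and the ArgumentError choice are assumptions.
def resolve_job_class(name)
  klass = begin
    Object.const_get(name)
  rescue NameError
    raise ArgumentError, "unknown job class: #{name}"
  end

  # Validate the class-level entry point, not an instance method.
  unless klass.respond_to?(:perform_now)
    raise ArgumentError, "#{name} must define .perform_now"
  end

  klass
end

# Sample classes for illustration only.
class OkJob
  def self.perform_now
    :done
  end
end

class BadJob
  def perform
    :done
  end
end

resolve_job_class("OkJob").perform_now # runs the job
```

`BadJob` would be rejected here because it only defines an instance-level `perform`, which is the mismatch the changelog entry describes.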
|
|
331
385
|
|
|
332
|
-
### Bug Fixes
|
|
333
|
-
- **CRITICAL**: Removed hidden ActiveSupport dependency. Replaced `safe_constantize` with `Object.const_get` + `NameError` handling
|
|
334
|
-
- **CRITICAL**: Fixed validator mismatch: now validates `.perform_now` (class method) instead of `.perform` (instance method)
|
|
335
|
-
- **CRITICAL**: Fixed race condition where entry could disappear from heap during execution. `reschedule` and `heap.push` now always execute after job processing
|
|
336
|
-
- Added full exception backtrace to error logs for production debugging
|
|
337
|
-
- Improved YAML security by removing `Symbol` from `permitted_classes`
|
|
338
|
-
- Removed Mutex from graceful shutdown (anti-pattern in Async). Boolean assignment is atomic in MRI
|
|
339
386
|
|
|
340
|
-
### Features
|
|
341
|
-
- Added optional `logger` parameter to Runner constructor for custom loggers (Rails.logger, etc.)
|
|
342
|
-
- Added `stop()` method for graceful shutdown
|
|
343
|
-
- Added `running?()` method to check scheduler status
|
|
344
387
|
|
|
345
|
-
|
|
346
|
-
- Job class validation now checks for `.perform_now` class method (was checking for `.perform` instance method)
|
|
388
|
+
|
|
347
389
|
|
|
348
390
|
## 0.1.0
|
|
349
391
|
|
|
350
|
-
|
|
351
|
-
|
|
352
|
-
-
|
|
353
|
-
-
|
|
354
|
-
-
|
|
355
|
-
-
|
|
356
|
-
-
|
|
357
|
-
-
|
|
358
|
-
-
|
|
392
|
+
Initial release.
|
|
393
|
+
|
|
394
|
+
- Single event loop with min-heap timer (`O(log N)` scheduling).
|
|
395
|
+
- Skip overlapping execution.
|
|
396
|
+
- Startup jitter to prevent thundering herd.
|
|
397
|
+
- Monotonic clock for interval jobs, wall clock for cron jobs.
|
|
398
|
+
- Deterministic worker sharding via `Zlib.crc32`.
|
|
399
|
+
- Semaphore-based concurrency control.
|
|
400
|
+
- Per-job timeout protection.
|
|
401
|
+
- Structured logging via Console.
|
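
As an illustration of deterministic worker sharding via `Zlib.crc32`, the sketch below assigns each job name to exactly one worker. The helper name `worker_for` and the zero-based worker index are assumptions; the gem's actual numbering and hashing input may differ.

```ruby
require "zlib"

# Hypothetical helper: maps a job name to a worker index in 0...total_workers.
# CRC32 is stable across processes and Ruby versions, unlike String#hash,
# which is seeded per process.
def worker_for(job_name, total_workers)
  Zlib.crc32(job_name) % total_workers
end

# Each worker keeps only the jobs that hash to its own index.
def jobs_for_worker(job_names, worker_index, total_workers)
  job_names.select { |name| worker_for(name, total_workers) == worker_index }
end
```

Because every process computes the same index for the same name, workers need no coordination to agree on who runs which job.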