jqueue 0.1.0__py3-none-any.whl

@@ -0,0 +1,712 @@
1
+ Metadata-Version: 2.4
2
+ Name: jqueue
3
+ Version: 0.1.0
4
+ Summary: Storage-agnostic job queue using object storage with compare-and-set semantics
5
+ Project-URL: Homepage, https://github.com/janbjorge/jqueue
6
+ Project-URL: Repository, https://github.com/janbjorge/jqueue
7
+ Project-URL: Issues, https://github.com/janbjorge/jqueue/issues
8
+ Author-email: JB <janbjorge@gmail.com>
9
+ License: MIT
10
+ License-File: LICENSE
11
+ Keywords: async,asyncio,gcs,job-queue,object-storage,queue,s3
12
+ Classifier: Development Status :: 4 - Beta
13
+ Classifier: Framework :: AsyncIO
14
+ Classifier: Intended Audience :: Developers
15
+ Classifier: License :: OSI Approved :: MIT License
16
+ Classifier: Programming Language :: Python :: 3
17
+ Classifier: Programming Language :: Python :: 3.12
18
+ Classifier: Topic :: Software Development :: Libraries
19
+ Classifier: Topic :: System :: Distributed Computing
20
+ Classifier: Typing :: Typed
21
+ Requires-Python: >=3.12
22
+ Requires-Dist: pydantic>=2.0
23
+ Provides-Extra: dev
24
+ Requires-Dist: aioboto3>=13.0; extra == 'dev'
25
+ Requires-Dist: google-cloud-storage>=2.0; extra == 'dev'
26
+ Requires-Dist: mypy; extra == 'dev'
27
+ Requires-Dist: pytest-asyncio>=0.24; extra == 'dev'
28
+ Requires-Dist: pytest>=8.0; extra == 'dev'
29
+ Requires-Dist: ruff; extra == 'dev'
30
+ Provides-Extra: gcs
31
+ Requires-Dist: google-cloud-storage>=2.0; extra == 'gcs'
32
+ Provides-Extra: s3
33
+ Requires-Dist: aioboto3>=13.0; extra == 's3'
34
+ Description-Content-Type: text/markdown
35
+
36
+ # jqueue
37
+
38
+ A lightweight, storage-agnostic job queue for Python that runs on top of ordinary object
39
+ storage — S3, GCS, a local file, or an in-memory buffer. No message broker, no database,
40
+ no sidecar process required.
41
+
42
+ Inspired by the [turbopuffer object-storage queue pattern](https://turbopuffer.com/blog/object-storage-queue).
43
+
44
+ ---
45
+
46
+ ## Contents
47
+
48
+ - [Why object storage?](#why-object-storage)
49
+ - [Installation](#installation)
50
+ - [Quick start](#quick-start)
51
+ - [Use cases](#use-cases)
52
+ - [How it works](#how-it-works)
53
+   - [Compare-and-set writes](#compare-and-set-writes)
54
+   - [DirectQueue — one CAS write per operation](#directqueue--one-cas-write-per-operation)
55
+   - [BrokerQueue — group commit](#brokerqueue--group-commit)
56
+   - [Heartbeats and stale-job recovery](#heartbeats-and-stale-job-recovery)
57
+   - [Wire format](#wire-format)
58
+ - [Storage adapters](#storage-adapters)
59
+   - [InMemoryStorage](#inmemorystorage)
60
+   - [LocalFileSystemStorage](#localfilesystemstorage)
61
+   - [S3Storage](#s3storage)
62
+   - [GCSStorage](#gcsstorage)
63
+   - [Custom adapters](#custom-adapters)
64
+ - [API reference](#api-reference)
65
+ - [Error handling](#error-handling)
66
+ - [Architecture](#architecture)
67
+ - [Limitations and trade-offs](#limitations-and-trade-offs)
68
+
69
+ ---
70
+
71
+ ## Why object storage?
72
+
73
+ Traditional job queues add a dependency: Redis, RabbitMQ, SQS, a Postgres table. Each comes
74
+ with operational overhead — provisioning, monitoring, capacity planning, and another failure
75
+ domain to manage.
76
+
77
+ For workloads that don't need sub-millisecond latency or thousands of operations per second,
78
+ object storage is a surprisingly capable alternative:
79
+
80
+ | Property | Object storage queue |
81
+ |---|---|
82
+ | **Durability** | 11 nines (S3/GCS) — survives entire AZ outages |
83
+ | **Cost** | ~$0.05 per 10 000 writes and ~$0.004 per 10 000 reads (S3 Standard request pricing, us-east-1) |
84
+ | **Infrastructure** | Zero — uses storage you already have |
85
+ | **Concurrency safety** | CAS writes (If-Match / if_generation_match) |
86
+ | **Exactly-once claims** | Each queued job is claimed by one worker (CAS protocol); `nack` and stale recovery can re-deliver it |
87
+ | **Ops/sec** | ~1–100 ops/s depending on backend and batching |
88
+
89
+ The queue state lives in **a single JSON file**. Every mutation is a conditional write that
90
+ only succeeds if the file hasn't changed since you last read it. Concurrent writers that lose
91
+ the race retry automatically.
92
+
93
+ ---
94
+
95
+ ## Installation
96
+
97
+ ```bash
98
+ # Core (no optional deps)
99
+ pip install jqueue
100
+
101
+ # With S3 support
102
+ pip install "jqueue[s3]"
103
+
104
+ # With GCS support
105
+ pip install "jqueue[gcs]"
106
+
107
+ # Both
108
+ pip install "jqueue[s3,gcs]"
109
+ ```
110
+
111
+ Requires Python 3.12+.
112
+
113
+ ---
114
+
115
+ ## Quick start
116
+
117
+ ```python
118
+ import asyncio
119
+ from jqueue import BrokerQueue, HeartbeatManager, InMemoryStorage
120
+
121
+ async def main():
122
+     async with BrokerQueue(InMemoryStorage()) as q:
123
+
124
+         # Producer: add jobs to the queue
125
+         await q.enqueue("send_email", b'{"to": "alice@example.com"}')
126
+         await q.enqueue("send_email", b'{"to": "bob@example.com"}')
127
+
128
+         # Consumer: claim and process
129
+         jobs = await q.dequeue("send_email", batch_size=2)
130
+         for job in jobs:
131
+             async with HeartbeatManager(q, job.id):
132
+                 print(f"Sending email: {job.payload}")
133
+             await q.ack(job.id)
134
+
135
+ asyncio.run(main())
136
+ ```
137
+
138
+ Switch to a real backend by swapping the storage adapter — the queue logic is identical:
139
+
140
+ ```python
141
+ # Local file (single machine)
142
+ from jqueue import LocalFileSystemStorage
143
+ storage = LocalFileSystemStorage("/var/lib/myapp/queue.json")
144
+
145
+ # AWS S3
146
+ from jqueue.adapters.storage.s3 import S3Storage
147
+ storage = S3Storage(bucket="my-bucket", key="queues/jobs.json")
148
+
149
+ # Google Cloud Storage
150
+ from jqueue.adapters.storage.gcs import GCSStorage
151
+ storage = GCSStorage(bucket_name="my-bucket", blob_name="queues/jobs.json")
152
+ ```
153
+
154
+ ---
155
+
156
+ ## Use cases
157
+
158
+ ### Background job processing
159
+
160
+ Enqueue work from a web request and process it in a separate worker process:
161
+
162
+ ```python
163
+ # web handler
164
+ await q.enqueue("resize_image", payload=image_bytes, priority=0)
165
+
166
+ # worker loop
167
+ async def worker(q):
168
+     while True:
169
+         jobs = await q.dequeue("resize_image", batch_size=5)
170
+         if not jobs:
171
+             await asyncio.sleep(1)
172
+             continue
173
+         for job in jobs:
174
+             async with HeartbeatManager(q, job.id, interval=timedelta(seconds=30)):
175
+                 await resize(job.payload)
176
+             await q.ack(job.id)
177
+ ```
178
+
179
+ ### Fan-out with multiple queues
180
+
181
+ Use separate JSON files (keys/blobs) to partition workloads:
182
+
183
+ ```python
184
+ email_queue = BrokerQueue(S3Storage(bucket="b", key="queues/email.json"))
185
+ sms_queue = BrokerQueue(S3Storage(bucket="b", key="queues/sms.json"))
186
+ ```
187
+
188
+ ### Priority work
189
+
190
+ Lower `priority` value = processed first (think Unix `nice`):
191
+
192
+ ```python
193
+ await q.enqueue("report", b"payload", priority=0) # urgent
194
+ await q.enqueue("report", b"payload", priority=10) # best-effort
195
+ ```
196
+
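As a rough illustration of that ordering, the sketch below sorts plain dictionaries by `(priority, created_at)`, the key that the API reference gives for `queued_jobs`. The job names and timestamps are made up, and the dictionaries stand in for `Job` objects; this is not jqueue's own code.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical queued jobs; only the fields that affect ordering are shown.
t0 = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
queued = [
    {"name": "best-effort", "priority": 10, "created_at": t0},
    {"name": "urgent-late", "priority": 0, "created_at": t0 + timedelta(seconds=2)},
    {"name": "urgent-early", "priority": 0, "created_at": t0 + timedelta(seconds=1)},
]

# Lower priority value first; ties are broken by age (oldest first).
ordered = sorted(queued, key=lambda job: (job["priority"], job["created_at"]))
names = [job["name"] for job in ordered]
print(names)
```

Both priority-0 jobs come out ahead of the priority-10 job even though it was enqueued first, and the older of the two urgent jobs is served first.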
197
+ ### Long-running jobs with heartbeats
198
+
199
+ Workers on long tasks use `HeartbeatManager` to prevent the broker from re-queuing their
200
+ job as stale. If the worker dies, the heartbeat stops and the job is automatically
201
+ recovered:
202
+
203
+ ```python
204
+ [job] = await q.dequeue("transcode_video")
205
+ try:
206
+     async with HeartbeatManager(q, job.id, interval=timedelta(seconds=30)):
207
+         result = await transcode(job.payload)  # might take minutes
208
+     await q.ack(job.id)
209
+ except Exception:
210
+     await q.nack(job.id)  # return to queue for retry
211
+ ```
212
+
213
+ ### Testing without infrastructure
214
+
215
+ `InMemoryStorage` implements the same interface — no mocking required:
216
+
217
+ ```python
218
+ async def test_email_worker():
219
+     async with BrokerQueue(InMemoryStorage()) as q:
220
+         await q.enqueue("send_email", b'{"to": "test@example.com"}')
221
+         [job] = await q.dequeue("send_email")
222
+         await process_email(job)
223
+         await q.ack(job.id)
224
+         state = await q.read_state()
225
+         assert len(state.jobs) == 0
226
+ ```
227
+
228
+ ### MinIO / self-hosted S3-compatible storage
229
+
230
+ ```python
231
+ storage = S3Storage(
232
+     bucket="my-bucket",
233
+     key="queue.json",
234
+     endpoint_url="http://minio.internal:9000",
235
+     region_name="us-east-1",
236
+ )
237
+ ```
238
+
239
+ ---
240
+
241
+ ## How it works
242
+
243
+ ### Compare-and-set writes
244
+
245
+ The entire queue state is serialized to a single JSON blob on object storage. Every
246
+ mutation follows a three-step cycle:
247
+
248
+ ```
249
+ 1. READ   → fetch the current blob + its etag
250
+ 2. MUTATE → apply the operation in memory (pure function)
251
+ 3. WRITE  → PUT the new blob with If-Match: <etag>
252
+              ✓ etag matches → write succeeds, new etag returned
253
+              ✗ etag changed → CASConflictError, retry from step 1
254
+ ```
255
+
256
+ The **etag** is an opaque version token returned by the storage backend:
257
+
258
+ | Backend | Etag source |
259
+ |---|---|
260
+ | S3 / MinIO / R2 | HTTP `ETag` response header |
261
+ | GCS | Object `generation` number (integer, stringified) |
262
+ | Filesystem | `st_mtime_ns` (nanosecond mtime) |
263
+ | InMemory | Monotonic integer counter |
264
+
265
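+ Because two writers can't both satisfy the same `If-Match` condition, the CAS protocol
266
+ ensures that each queued job is **claimed by exactly one worker**, without any locks, transactions, or coordination service. A job returned to the queue by `nack` or stale recovery can be delivered again, so handlers should be idempotent.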
267
+
268
+ ```
269
+ Writer A                            Writer B
270
+ ────────                            ────────
271
+ read → state₀, etag="abc"
272
+                                     read → state₀, etag="abc"
273
+ mutate → state₁
274
+ write (If-Match: abc) → ✓ "xyz"
275
+                                     mutate → state₁′
276
+                                     write (If-Match: abc) → ✗ CASConflictError
277
+                                     read → state₁, etag="xyz" ← re-reads fresh state
278
+                                     mutate → state₂
279
+                                     write (If-Match: xyz) → ✓ "pqr"
280
+ ```
281
+
282
+ ### DirectQueue — one CAS write per operation
283
+
284
+ `DirectQueue` is the simplest implementation. Every call to `enqueue`, `dequeue`, `ack`,
285
+ `nack`, or `heartbeat` performs its own independent CAS cycle:
286
+
287
+ ```
288
+ enqueue("task", b"payload")
289
+   → read (state₀, etag₀)
290
+   → state₁ = state₀.with_job_added(job)
291
+   → write(state₁, if_match=etag₀) ← one read plus one conditional write per operation
292
+ ```
293
+
294
+ **Retry policy:** up to 10 retries on `CASConflictError`, with linear back-off
295
+ (10 ms × attempt number).
296
+
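To get a feel for what that policy costs in the worst case, here is a small sketch of the delay schedule. It assumes attempt numbers start at 1 and that the delay is exactly `10 ms × attempt`; `linear_backoff_delays` is a hypothetical helper, not part of jqueue.

```python
from datetime import timedelta

BASE_DELAY = timedelta(milliseconds=10)  # from the retry policy above
MAX_RETRIES = 10                         # from the retry policy above


def linear_backoff_delays(
    base: timedelta = BASE_DELAY, retries: int = MAX_RETRIES
) -> list[float]:
    """Seconds to sleep before retry 1..retries (assumed: base × attempt number)."""
    return [base.total_seconds() * attempt for attempt in range(1, retries + 1)]


delays = linear_backoff_delays()
total = sum(delays)
print(f"{len(delays)} retries, {total * 1000:.0f} ms of back-off in total")
```

Under these assumptions a caller that loses every race sleeps for roughly 550 ms in total before `CASConflictError` reaches it, not counting the storage round-trips themselves.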
297
+ Use `DirectQueue` when:
298
+ - throughput is ~1–5 ops/s
299
+ - you want the simplest possible code path
300
+ - you're running a single worker
301
+
302
+ ```python
303
+ from jqueue import DirectQueue, LocalFileSystemStorage
304
+
305
+ q = DirectQueue(LocalFileSystemStorage("queue.json"))
306
+ job = await q.enqueue("task", b"data")
307
+ [claimed] = await q.dequeue("task")
308
+ await q.ack(claimed.id)
309
+ ```
310
+
311
+ ### BrokerQueue — group commit
312
+
313
+ When multiple coroutines (or asyncio tasks) call the queue concurrently, each one would
314
+ normally trigger its own storage round-trip. With a 100 ms S3 latency, 10 concurrent
315
+ enqueues would take 10 × 100 ms = 1 second if serialized naively.
316
+
317
+ `BrokerQueue` solves this with a **group commit loop** (`GroupCommitLoop`) — a single
318
+ background writer task that batches all pending operations into one CAS write:
319
+
320
+ ```
321
+ Caller 1: enqueue() ──────────────────────────────────────> result
322
+ Caller 2: enqueue() ──────────────────────────────────────> result
323
+ Caller 3: dequeue() ──────────────────────────────────────> result
324
+                     ↓ batch = [op₁, op₂, op₃]
325
+ Writer:   read → apply op₁,op₂,op₃ → CAS write → resolve futures
326
+           └─────────── one round-trip ───────────┘
327
+ ```
328
+
329
+ **The algorithm in detail:**
330
+
331
+ 1. Each caller appends its mutation function to a shared `_pending` list and wakes the
332
+ writer via an `asyncio.Event`.
333
+ 2. The caller then `await`s a `Future` that will be resolved by the writer.
334
+ 3. The writer drains `_pending` into a batch, reads the current state once, applies all
335
+ mutations in order, and CAS-writes the new state.
336
+ 4. On success, the writer resolves each future with its result (or exception).
337
+ 5. If the CAS write fails (another writer raced ahead), the entire batch is re-applied to
338
+ the fresh state and retried with exponential back-off (up to 20 retries, capped at
339
+ ~320 ms).
340
+
341
+ **Per-operation error isolation:** if one mutation in a batch raises (e.g., `nack` on a
342
+ job that no longer exists), only that caller's future receives the exception. All other
343
+ operations in the same batch commit normally.
344
+
345
+ ```
346
+ Batch: [valid_enqueue, bad_nack, valid_dequeue]
347
+               ↓            ↓           ↓
348
+            success    JobNotFound   success
349
+ ```
350
+
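The batching and error-isolation behaviour can be sketched with plain asyncio. This is an illustrative toy, not jqueue's `GroupCommitLoop`: `ToyGroupCommit`, `enqueue`, `nack`, and `demo` are hypothetical names, the "storage" is an in-process list, and the CAS retry on conflict is left out to keep the sketch short.

```python
import asyncio
from collections.abc import Callable

State = list[str]  # toy queue state: a list of job names
Mutation = Callable[[State], tuple[State, object]]  # pure: state -> (new state, result)


class ToyGroupCommit:
    def __init__(self) -> None:
        self.state: State = []   # stands in for the stored blob
        self.writes = 0          # number of simulated storage writes
        self._pending: list[tuple[Mutation, asyncio.Future]] = []
        self._wake = asyncio.Event()

    async def submit(self, mutation: Mutation) -> object:
        future = asyncio.get_running_loop().create_future()
        self._pending.append((mutation, future))
        self._wake.set()         # wake the writer
        return await future      # resolved (or failed) by the writer

    async def run(self) -> None:
        while True:
            await self._wake.wait()
            self._wake.clear()
            batch, self._pending = self._pending, []
            state = self.state   # one "read" per batch
            outcomes = []
            for mutation, future in batch:
                try:
                    state, result = mutation(state)
                    outcomes.append((future, result, None))
                except Exception as exc:  # only this caller sees the failure
                    outcomes.append((future, None, exc))
            self.state = state   # one "write" per batch
            self.writes += 1
            for future, result, exc in outcomes:
                if exc is not None:
                    future.set_exception(exc)
                else:
                    future.set_result(result)


def enqueue(name: str) -> Mutation:
    return lambda state: (state + [name], name)


def nack(name: str) -> Mutation:
    def mutation(state: State) -> tuple[State, object]:
        if name not in state:
            raise KeyError(name)  # stand-in for JobNotFoundError
        return state, name
    return mutation


async def demo() -> tuple[State, int, list[object]]:
    committer = ToyGroupCommit()
    writer = asyncio.create_task(committer.run())
    results = await asyncio.gather(
        committer.submit(enqueue("a")),
        committer.submit(nack("missing")),
        committer.submit(enqueue("b")),
        return_exceptions=True,
    )
    writer.cancel()
    await asyncio.gather(writer, return_exceptions=True)
    return committer.state, committer.writes, results


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Because each mutation is a pure function that returns a new state, a failing operation leaves the batch's working state untouched, which is what makes per-operation isolation cheap.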
351
+ `BrokerQueue` collapses N concurrent callers into O(1) storage operations per write
352
+ cycle, making it suitable for ~10–100 ops/s depending on backend latency.
353
+
354
+ ```python
355
+ from jqueue import BrokerQueue, InMemoryStorage
356
+
357
+ async with BrokerQueue(InMemoryStorage()) as q:
358
+     # These three enqueues are batched into a single storage write
359
+     await asyncio.gather(
360
+         q.enqueue("task", b"1"),
361
+         q.enqueue("task", b"2"),
362
+         q.enqueue("task", b"3"),
363
+     )
364
+ ```
365
+
366
+ **Lifecycle:** `BrokerQueue` is an async context manager. On enter it starts the writer
367
+ task; on exit it signals shutdown and drains any buffered operations before stopping.
368
+
369
+ ### Heartbeats and stale-job recovery
370
+
371
+ When a worker claims a job (via `dequeue`), the job's status changes to `IN_PROGRESS`
372
+ and its `heartbeat_at` timestamp is set to `now`. If the worker crashes or hangs, the
373
+ heartbeat stops updating and the job becomes **stale**.
374
+
375
+ **`HeartbeatManager`** is an async context manager that sends periodic heartbeat pings
376
+ for a single job while work is in progress:
377
+
378
+ ```python
379
+ async with HeartbeatManager(q, job.id, interval=timedelta(seconds=30)):
380
+     await long_running_work(job.payload)
381
+ # heartbeat task is cancelled on exit
382
+ ```
383
+
384
+ **Automatic stale recovery:** On every write cycle, `BrokerQueue` (via `GroupCommitLoop`)
385
+ sweeps `IN_PROGRESS` jobs and resets any whose `heartbeat_at` is older than
386
+ `stale_timeout` (default: 5 minutes) back to `QUEUED`. This requires zero extra storage
387
+ operations — the sweep piggybacks on writes that are already happening.
388
+
389
+ `DirectQueue` exposes this as an explicit call:
390
+
391
+ ```python
392
+ requeued = await q.requeue_stale(timeout=timedelta(minutes=5))
393
+ print(f"Recovered {requeued} stale jobs")
394
+ ```
395
+
396
+ ### Wire format
397
+
398
+ The queue state is stored as pretty-printed JSON. Here's a complete example:
399
+
400
+ ```json
401
+ {
402
+   "version": 3,
403
+   "jobs": [
404
+     {
405
+       "id": "550e8400-e29b-41d4-a716-446655440000",
406
+       "entrypoint": "send_email",
407
+       "payload": "eyJ0byI6ICJ1c2VyQGV4YW1wbGUuY29tIn0=",
408
+       "status": "queued",
409
+       "priority": 0,
410
+       "created_at": "2024-01-01T12:00:00+00:00",
411
+       "heartbeat_at": null
412
+     },
413
+     {
414
+       "id": "6ba7b810-9dad-11d1-80b4-00c04fd430c8",
415
+       "entrypoint": "resize_image",
416
+       "payload": "...",
417
+       "status": "in_progress",
418
+       "priority": 5,
419
+       "created_at": "2024-01-01T11:59:00+00:00",
420
+       "heartbeat_at": "2024-01-01T12:00:45+00:00"
421
+     }
422
+   ]
423
+ }
424
+ ```
425
+
426
+ - `version` — monotonically increasing counter, incremented on every successful write.
427
+ - `payload` — arbitrary bytes, base64-encoded for JSON compatibility.
428
+ - `heartbeat_at` — `null` when `QUEUED`; set by `dequeue` and refreshed by `heartbeat`.
429
+
430
+ Pydantic v2 handles all serialization, including base64 encoding of `bytes` fields and
431
+ ISO-8601 datetime formatting.
432
+
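Since the file is plain JSON, it can be inspected without jqueue installed. The sketch below decodes the first job from the example above using only the standard library; it is a debugging aid under the assumption that the field names stay as shown, not a replacement for the library's codec.

```python
import base64
import json
from datetime import datetime

# The first job from the wire-format example, as it would sit in storage.
blob = b"""{
  "version": 3,
  "jobs": [
    {
      "id": "550e8400-e29b-41d4-a716-446655440000",
      "entrypoint": "send_email",
      "payload": "eyJ0byI6ICJ1c2VyQGV4YW1wbGUuY29tIn0=",
      "status": "queued",
      "priority": 0,
      "created_at": "2024-01-01T12:00:00+00:00",
      "heartbeat_at": null
    }
  ]
}"""

state = json.loads(blob)
job = state["jobs"][0]
payload = base64.b64decode(job["payload"])              # the bytes passed to enqueue
created_at = datetime.fromisoformat(job["created_at"])  # timezone-aware datetime
print(state["version"], job["entrypoint"], payload, created_at.isoformat())
```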
433
+ ---
434
+
435
+ ## Storage adapters
436
+
437
+ ### InMemoryStorage
438
+
439
+ ```python
440
+ from jqueue import InMemoryStorage
441
+
442
+ storage = InMemoryStorage()
443
+ # or with pre-populated state:
444
+ storage = InMemoryStorage(initial_content=b'{"version":0,"jobs":[]}')
445
+ ```
446
+
447
+ Uses an `asyncio.Lock` to serialise reads and writes. Safe for concurrent coroutines
448
+ in a single event loop. **Not** safe across processes or threads. Ideal for tests and
449
+ local development.
450
+
451
+ ### LocalFileSystemStorage
452
+
453
+ ```python
454
+ from jqueue import LocalFileSystemStorage
455
+
456
+ storage = LocalFileSystemStorage("/var/lib/myapp/queue.json")
457
+ # accepts str or pathlib.Path; parent directories are created automatically
458
+ ```
459
+
460
+ Uses `fcntl.flock` for POSIX exclusive locking. The etag is the file's `st_mtime_ns`.
461
+ **POSIX-only** (Linux, macOS). Not safe across machines or on NFS.
462
+
463
+ ### S3Storage
464
+
465
+ ```python
466
+ from jqueue.adapters.storage.s3 import S3Storage
467
+
468
+ # Standard AWS — credentials from environment / IAM role
469
+ storage = S3Storage(bucket="my-bucket", key="queues/jobs.json")
470
+
471
+ # Explicit region
472
+ storage = S3Storage(bucket="my-bucket", key="jobs.json", region_name="eu-central-1")
473
+
474
+ # S3-compatible (MinIO, Cloudflare R2, Tigris, …)
475
+ storage = S3Storage(
476
+     bucket="my-bucket",
477
+     key="jobs.json",
478
+     endpoint_url="http://minio.internal:9000",
479
+ )
480
+
481
+ # Bring your own session
482
+ import aioboto3
483
+ storage = S3Storage(
484
+     bucket="my-bucket",
485
+     key="jobs.json",
486
+     session=aioboto3.Session(
487
+         aws_access_key_id="...",
488
+         aws_secret_access_key="...",
489
+     ),
490
+ )
491
+ ```
492
+
493
+ Uses aioboto3 (async) with an `IfMatch` conditional PutObject. S3 conditional writes were
494
+ [introduced in August 2024](https://aws.amazon.com/about-aws/whats-new/2024/08/amazon-s3-conditional-writes/); `If-Match` support for PutObject followed in November 2024.
495
+ The etag is the S3 `ETag` response header.
496
+
497
+ Requires: `pip install "jqueue[s3]"`
498
+
499
+ ### GCSStorage
500
+
501
+ ```python
502
+ from jqueue.adapters.storage.gcs import GCSStorage
503
+
504
+ # Application Default Credentials
505
+ storage = GCSStorage(bucket_name="my-bucket", blob_name="queues/jobs.json")
506
+
507
+ # Explicit client
508
+ from google.cloud import storage as gcs
509
+ storage = GCSStorage(
510
+     bucket_name="my-bucket",
511
+     blob_name="jobs.json",
512
+     client=gcs.Client(project="my-project"),
513
+ )
514
+ ```
515
+
516
+ Uses `if_generation_match` for conditional writes. Generation 0 means "blob must not
517
+ exist yet" — used for the first write. The etag is the GCS object generation number
518
+ (stringified integer).
519
+
520
+ Since `google-cloud-storage` is synchronous, all GCS operations are wrapped in
521
+ `asyncio.to_thread`.
522
+
523
+ Requires: `pip install "jqueue[gcs]"`
524
+
525
+ ### Custom adapters
526
+
527
+ Any object with these two async methods satisfies the `ObjectStoragePort` protocol:
528
+
529
+ ```python
530
+ from jqueue import ObjectStoragePort # for type checking only
531
+
532
+ class MyStorage:
533
+     async def read(self) -> tuple[bytes, str | None]:
534
+         """
535
+         Return (content, etag).
536
+         If the object doesn't exist yet, return (b"", None).
537
+         """
538
+         ...
539
+
540
+     async def write(self, content: bytes, if_match: str | None = None) -> str:
541
+         """
542
+         Conditional write.
543
+         - if_match=None → unconditional put (first write)
544
+         - if_match=etag → only write if current etag matches; raise CASConflictError otherwise
545
+         Returns the new etag.
546
+         """
547
+         ...
548
+ ```
549
+
550
+ No base class, no registration — structural subtyping (duck typing) is sufficient.
551
+ Pass your adapter directly to `DirectQueue` or `BrokerQueue`.
552
+
553
+ ---
554
+
555
+ ## API reference
556
+
557
+ ### `BrokerQueue` / `DirectQueue`
558
+
559
+ Both queues expose the same public interface. `BrokerQueue` must be used as an async
560
+ context manager; `DirectQueue` can be used directly.
561
+
562
+ ```python
563
+ # Enqueue
564
+ job: Job = await q.enqueue(
565
+     entrypoint: str,    # logical handler name
566
+     payload: bytes,     # arbitrary bytes
567
+     priority: int = 0,  # lower = processed first
568
+ )
569
+
570
+ # Dequeue — marks jobs IN_PROGRESS, returns empty list if none available
571
+ jobs: list[Job] = await q.dequeue(
572
+     entrypoint: str | None = None,  # None = any entrypoint
573
+     *,
574
+     batch_size: int = 1,
575
+ )
576
+
577
+ # Acknowledge — remove a completed job
578
+ await q.ack(job_id: str)
579
+
580
+ # Negative-acknowledge — return a job to QUEUED for retry
581
+ await q.nack(job_id: str)
582
+
583
+ # Heartbeat — refresh the IN_PROGRESS timestamp
584
+ await q.heartbeat(job_id: str)
585
+
586
+ # Read-only snapshot (no CAS, no locking)
587
+ state: QueueState = await q.read_state()
588
+
589
+ # DirectQueue only — explicit stale sweep
590
+ requeued: int = await q.requeue_stale(timeout: timedelta)
591
+ ```
592
+
593
+ ### `HeartbeatManager`
594
+
595
+ ```python
596
+ async with HeartbeatManager(
597
+     queue,  # any object with async heartbeat(job_id)
598
+     job_id: str,
599
+     interval: timedelta = timedelta(seconds=60),
600
+ ):
601
+     await do_work()
602
+ ```
603
+
604
+ Starts a background task that calls `queue.heartbeat(job_id)` once per `interval` (a `timedelta`).
605
+ The task is cancelled when the context exits. If `heartbeat` raises `JQueueError`
606
+ (e.g., the job was acked by another process), the task stops silently.
607
+
608
+ ### `Job`
609
+
610
+ ```python
611
+ job.id # str — stable UUID assigned at enqueue time
612
+ job.entrypoint # str
613
+ job.payload # bytes
614
+ job.status # JobStatus.QUEUED | IN_PROGRESS | DEAD
615
+ job.priority # int — lower = higher priority
616
+ job.created_at # datetime (UTC)
617
+ job.heartbeat_at # datetime | None
618
+ ```
619
+
620
+ `Job` is a frozen Pydantic model — all fields are immutable.
621
+
622
+ ### `QueueState`
623
+
624
+ ```python
625
+ state.jobs # tuple[Job, ...]
626
+ state.version # int — incremented on every write
627
+
628
+ state.queued_jobs(entrypoint=None) # sorted by (priority, created_at)
629
+ state.in_progress_jobs()
630
+ state.find(job_id) # Job | None
631
+ ```
632
+
633
+ ---
634
+
635
+ ## Error handling
636
+
637
+ ```python
638
+ from jqueue import (
639
+     JQueueError,       # base class; catches all jqueue errors
640
+     CASConflictError,  # CAS write rejected (etag mismatch); usually retried internally
641
+     JobNotFoundError,  # job_id not in current state; has .job_id attribute
642
+     StorageError,      # I/O failure from the storage backend; has .cause attribute
643
+ )
644
+ ```
645
+
646
+ `CASConflictError` is retried automatically inside `DirectQueue` and `GroupCommitLoop`.
647
+ It only bubbles up to the caller if all retries are exhausted.
648
+
649
+ `JobNotFoundError` is raised by `ack`, `nack`, and `heartbeat` when the job ID is not
650
+ present in the current queue state (e.g., it was already acked by another worker).
651
+
652
+ ```python
653
+ try:
654
+     await q.ack(job.id)
655
+ except JobNotFoundError:
656
+     pass  # already removed, safe to ignore
657
+ ```
658
+
659
+ ---
660
+
661
+ ## Architecture
662
+
663
+ jqueue follows the **Ports & Adapters** (hexagonal) pattern:
664
+
665
+ ```
666
+ ┌─────────────────────────────────────────────────────────────┐
667
+ │  domain/                                                    │
668
+ │  ├── models.py         Job, QueueState, JobStatus           │
669
+ │  └── errors.py         JQueueError hierarchy                │
670
+ ├─────────────────────────────────────────────────────────────┤
671
+ │  ports/                                                     │
672
+ │  └── storage.py        ObjectStoragePort (Protocol)         │
673
+ ├─────────────────────────────────────────────────────────────┤
674
+ │  core/                                                      │
675
+ │  ├── codec.py          QueueState ↔ JSON bytes              │
676
+ │  ├── direct.py         DirectQueue (one CAS per op)         │
677
+ │  ├── group_commit.py   GroupCommitLoop (batched writes)     │
678
+ │  ├── broker.py         BrokerQueue (context manager façade) │
679
+ │  └── heartbeat.py      HeartbeatManager                     │
680
+ ├─────────────────────────────────────────────────────────────┤
681
+ │  adapters/storage/                                          │
682
+ │  ├── memory.py         InMemoryStorage                      │
683
+ │  ├── filesystem.py     LocalFileSystemStorage               │
684
+ │  ├── s3.py             S3Storage                            │
685
+ │  └── gcs.py            GCSStorage                           │
686
+ └─────────────────────────────────────────────────────────────┘
687
+ ```
688
+
689
+ **Key design properties:**
690
+
691
+ - **Pure domain layer.** `Job` and `QueueState` are frozen Pydantic models with no
692
+ I/O dependencies. All mutations return new instances — no side effects.
693
+ - **Protocol-based port.** `ObjectStoragePort` is a `runtime_checkable` Protocol. Any
694
+ two-method object satisfies it without inheritance.
695
+ - **Codec separation.** `codec.encode` / `codec.decode` are the only place that knows
696
+ about the JSON wire format, keeping it easy to evolve.
697
+ - **Zero shared state between writers.** Each CAS cycle reads a fresh snapshot.
698
+ There are no in-process caches that can go stale.
699
+
700
+ ---
701
+
702
+ ## Limitations and trade-offs
703
+
704
+ | Concern | Detail |
705
+ |---|---|
706
+ | **Throughput ceiling** | S3 conditional writes have ~50–200 ms round-trip latency, and every `DirectQueue` operation needs a read plus a conditional write, so a single caller tops out at roughly 2–10 ops/s. `BrokerQueue` can reach ~50–100 ops/s by batching. |
707
+ | **Single-file bottleneck** | All operations contend on one object. This is fine for moderate workloads; for very high throughput, partition into multiple queues (one file per entrypoint). |
708
+ | **Queue size** | The entire state is read and written on every operation. Keep queue depths reasonable (hundreds to low thousands of jobs). |
709
+ | **No push / subscribe** | Workers must poll `dequeue`. There's no server-push mechanism. |
710
+ | **POSIX only (filesystem)** | `LocalFileSystemStorage` uses `fcntl.flock` — Linux and macOS only, not NFS. |
711
+ | **S3 conditional writes** | Requires S3 conditional writes with `If-Match` on PutObject (conditional writes launched in August 2024; `If-Match` support followed in November 2024). Verify your S3-compatible backend supports `IfMatch` on PutObject before using `S3Storage`. |
712
+ | **Not a database** | If you need complex queries, scheduling, or priority queues with millions of jobs, a purpose-built system (Postgres, Redis) is a better fit. |