PyPI - redis-message-queue - Versions diffs - 5.0.0__tar.gz → 6.0.0__tar.gz - Mend

redis-message-queue 5.0.0tar.gz → 6.0.0tar.gz


Files changed (23)

{redis_message_queue-5.0.0 → redis_message_queue-6.0.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: redis-message-queue
-Version: 5.0.0
+Version: 6.0.0
 Summary: Python message queuing with Redis and message deduplication
 License: MIT
 License-File: LICENSE
@@ -26,7 +26,7 @@ Description-Content-Type: text/markdown
 # redis-message-queue
-[![PyPI Version](https://img.shields.io/badge/v5.0.0-version?color=43cd0f&style=flat&label=pypi)](https://pypi.org/project/redis-message-queue)
+[![PyPI Version](https://img.shields.io/badge/v6.0.0-version?color=43cd0f&style=flat&label=pypi)](https://pypi.org/project/redis-message-queue)
 [![PyPI Downloads](https://img.shields.io/pypi/dm/redis-message-queue?color=43cd0f&style=flat&label=downloads)](https://pypistats.org/packages/redis-message-queue)
 [![License: MIT](https://img.shields.io/badge/License-MIT-43cd0f.svg?style=flat&label=license)](LICENSE)
 [![Maintained: yes](https://img.shields.io/badge/yes-43cd0f.svg?style=flat&label=maintained)](https://github.com/Elijas/redis-message-queue/issues)
@@ -37,7 +37,7 @@ Description-Content-Type: text/markdown
 **Lightweight Python message queuing with Redis and built-in publish-side deduplication.** Deduplicate publishes within a TTL window, with optional crash recovery — across any number of producers and consumers.
 ```bash
-pip install "redis-message-queue>=3.0.0,<4.0.0"
+pip install "redis-message-queue>=6.0.0,<7.0.0"
 ```
 Requires Redis server >= 6.2.
@@ -151,6 +151,43 @@ When set, `LTRIM` is called after each message is moved to the completed/failed
 Pass `max_completed_length=None` or `max_failed_length=None` explicitly if you
 want unbounded tracking queues.
+### Publish backpressure
+By default, the pending queue is unbounded (`max_pending_length=None`), matching
+the v5 behavior. Set `max_pending_length` when producers can outrun consumers
+and Redis memory must fail closed before the broker is exhausted:
+```python
+queue = RedisMessageQueue(
+    "q",
+    client=client,
+    max_pending_length=100_000,
+    pending_overload_policy="raise",  # "raise", "drop_oldest", or "block"
+)
+```
+The built-in Redis path checks pending depth and enqueues in the same Lua script,
+so concurrent publishers cannot race above the configured cap. Overload policies:
+- `raise` raises `QueueBackpressureError` and leaves the pending list unchanged.
+- `drop_oldest` removes the oldest pending message (`RPOP`) before enqueueing the
+  new message. This is silent data loss by design; deduplication markers for
+  dropped messages are not removed, so a dropped duplicate may still be
+  suppressed until its dedup TTL expires.
+- `block` retries the atomic check until space opens or
+  `pending_overload_block_timeout_seconds` elapses (default: 1.0), then raises
+  `QueueBackpressureError`.
+These limits apply only to the pending list at publish time. They do not cap
+messages already in `processing`, dead-letter queues, deduplication keys, or
+replay metadata. `max_completed_length` and `max_failed_length` only bound the
+completed/failed history lists. Size pending payload memory separately from the
+dedup/replay metadata described in
+[Redis memory sizing](#redis-memory-sizing-for-deduplication-and-replay-metadata).
+When using `gateway=`, configure backpressure on the gateway directly, for
+example `RedisGateway(redis_client=client, max_pending_length=100_000)`.
 ### Crash recovery with visibility timeout
 ```python
@@ -186,6 +223,43 @@ The callback is **advisory** — it may fire briefly after a successful `process
 Without a visibility timeout, messages already moved to `processing` remain there indefinitely after a consumer crash and are not redelivered, even if the crash happened before your handler started running.
+### Ordering and multi-consumer fairness
+The built-in queue is a shared-pull Redis list. Successful publishes push to the
+left side of the pending list, and claims pop from the right side, so Redis
+grants claims in enqueue order in the no-failure path.
+This is a claim-order guarantee only. It is not a completion-order guarantee:
+multiple consumers process concurrently, handlers can run for different
+durations, and younger messages can finish before older messages.
+With `visibility_timeout_seconds` enabled, expired messages from `processing`
+are reclaimed before fresh pending work on the next consumer poll. A reclaimed
+message may be delivered after younger messages were already processed, and may
+be processed concurrently with a stale original handler if that handler keeps
+running after its lease expires.
+Expired reclaims are ordered by lease deadline within one reclaim batch.
+`CLAIM_MESSAGE_WITH_VISIBILITY_TIMEOUT_LUA_SCRIPT` selects expired leases with
+`ZRANGEBYSCORE ... LIMIT 0, 100` to bound Redis Lua execution time. When more
+than 100 messages expire together, the next poll can append a later reclaim
+batch at the claimable end of the pending list ahead of leftovers from the
+previous batch, so cross-batch redelivery order is not guaranteed.
+`max_delivery_count` can skip over poison messages during a claim poll by moving
+over-limit messages to the dead-letter queue and returning a later pending
+message. Deduplication is publish-side only: duplicate publishes are not
+enqueued and therefore do not occupy a queue position.
+Handler exceptions are not retries: the default behavior removes the message
+from `processing`, or moves it to the failed queue when enabled. Redelivery is
+for crash, stall, or stale-lease paths where cleanup does not complete.
+Multiple consumers contend for the same queue. The next message goes to the
+consumer whose claim request Redis executes next. There is no round-robin,
+equal-share, or starvation-freedom guarantee; faster consumers can receive more
+than 1/N of messages.
 ### Dead-letter queue
 ```python
@@ -230,6 +304,42 @@ while not interrupt.is_interrupted():
 > (for example, a second Ctrl+C raises `KeyboardInterrupt`). If you need multiple
 > shutdown hooks, use a single handler and fan out in your own code.
+There are three distinct shutdown shapes; pick the one that matches your runtime:
+| Shape | Trigger | In-flight handler | Pending claim IDs |
+|---|---|---|---|
+| **Flag-based soft drain** (`GracefulInterruptHandler`) | First SIGINT/SIGTERM flips a flag | Runs to completion | Drained on the next claim call, not on signal arrival |
+| **Async task cancellation** (`asyncio.CancelledError`) | Framework cancels the worker task (Uvicorn/K8s SIGTERM in many setups) | **Hard abort** — message stays in `processing`; with VT it is reclaimed at deadline expiry, without VT it is orphaned | Not drained |
+| **Explicit drain** (`drain()` / `aclose()`) | You call the method | Caller's responsibility to let it finish (drain does **not** cancel) | Drained synchronously via the gateway recovery path |
+Use `drain()` / `aclose()` to bridge K8s `preStop` / SIGTERM grace windows without
+relying on signal interception:
+```python
+# sync — in your SIGTERM handler or preStop hook
+queue.drain(timeout=25)   # refuses new claims, recovers pending claim IDs
+worker_thread.join()      # wait for in-flight process_message to finish
+# async — same shape
+await queue.aclose(timeout=25)
+await worker_task         # task observes ``_draining`` and exits its loop
+```
+`drain()` / `aclose()` set a queue-local flag so subsequent `process_message()`
+calls yield `None` immediately. They do not cancel in-flight handlers — the
+caller must arrange handler exit through normal thread/task coordination.
+Returns `True` if all in-memory pending claim IDs were recovered within the
+timeout; `False` if the deadline fired or transient Redis errors left claim
+IDs pending (call again to retry). `timeout=0` reports current state without
+attempting recovery.
+> **Heartbeat caveat (best-effort stop):** when `heartbeat_interval_seconds` is
+> set, the heartbeat sidecar's `stop()` is bounded but not strictly quiescent —
+> a slow renewal in flight when `process_message` exits may still write to
+> Redis after the caller believes shutdown is complete. The renewal is bounded
+> by the configured visibility timeout and the lease token check on the Redis
+> side, but plan for a small post-shutdown overlap rather than instant quiesce.
 ### Custom gateway
 ```python
@@ -250,12 +360,12 @@ queue = RedisMessageQueue("q", gateway=gateway)
 The retry knobs configure an internal `tenacity` strategy: exponential
 backoff with jitter, retry on transient Redis errors only, capped at
-`retry_budget_seconds`. The budget is wall-clock time from the first attempt (including attempt duration), not inter-attempt delay; a single attempt that takes longer than the budget results in zero retries. Setting `retry_budget_seconds=0` disables retry
+`retry_budget_seconds`. The budget is monotonic elapsed time from the first attempt (including attempt duration), not inter-attempt delay; it is unaffected by Python-host NTP jumps. A single attempt that takes longer than the budget results in zero retries. Setting `retry_budget_seconds=0` disables retry
 entirely (single attempt; exceptions propagate). The library uses
 `retry_budget_seconds` to size the operation-result cache TTL automatically,
 so the previous footgun of an over-long retry budget out-living the cache
 and producing misleading "cleanup was a no-op" warnings is now structurally
-impossible. Note: tenacity may allow one additional attempt beyond the budget if the budget check passes at attempt start — total wall-clock time can exceed `retry_budget_seconds` by the duration of that final attempt.
+impossible. Note: tenacity may allow one additional attempt beyond the budget if the budget check passes at attempt start, so total monotonic elapsed time can exceed `retry_budget_seconds` by the duration of that final attempt.
 To plug in a different retry library (`backoff`, `asyncstdlib.retry`, or your
 own logic) or fundamentally different semantics, subclass
@@ -327,9 +437,126 @@ await client.aclose()
 For the sync Redis client, call `client.close()` during application shutdown when
 you own the client lifecycle.
+## Production notes
+### Fork safety and pre-fork servers
+Construct Redis clients and `RedisMessageQueue` instances after a process forks.
+This is the recommended pattern for `multiprocessing`, `ProcessPoolExecutor`,
+and pre-fork servers such as gunicorn with `--preload`.
+```python
+def worker_main():
+    client = redis.Redis()
+    queue = RedisMessageQueue("jobs", client=client)
+    ...
+```
+Avoid constructing a queue/client in a parent process and then using that same
+object in forked children, especially if the parent has already run any Redis
+command. The queue stores the user-provided Redis client and process-local
+claim-recovery state. Inherited Redis sockets can corrupt the Redis protocol if
+two processes use the same file descriptor.
+Notes:
+- The sync redis-py pooled client attempts to reset its connection pool after
+  fork, but this does not apply to every client shape.
+- The built-in sync gateway rejects `redis.Redis(single_connection_client=True)`
+  because that mode pins one socket instead of using the pool.
+- Do not share `redis.asyncio.Redis` or async queues across fork; create or
+  reconnect them in the child process.
+- If you use `GracefulInterruptHandler`, create it in the worker process after
+  fork so signal ownership is local to that worker.
+- The heartbeat sidecar is lazy and starts only while processing a leased
+  message. Do not call `fork()` from inside active message handlers unless the
+  child exits without using the inherited queue/client.
+### Redis memory sizing for deduplication and replay metadata
+When deduplication is enabled, each distinct dedup key creates one Redis string
+for `message_deduplication_log_ttl_seconds` (default: 3600 seconds). The default
+dedup key is a SHA-256 hash of the canonical message payload, so distinct
+payloads are distinct keys. Size Redis for:
+```text
+peak_unique_publish_rate_per_second
+* message_deduplication_log_ttl_seconds
+* bytes_per_dedup_key
+```
+Use 200 bytes per dedup key as a conservative starting point for short queue
+names, then validate with `MEMORY USAGE` in your Redis version. Example:
+1,000 unique messages/s * 3,600s * 200 B ~= 720 MB for dedup markers alone.
+A 24h dedup window at the same rate is 86.4M keys, or roughly 17 GB before
+message payload lists, lease metadata, completed/failed queues, and allocator
+fragmentation.
+Operation-result replay keys are normally deleted after a successful call, but
+may live until their TTL after ambiguous connection drops or failed cleanup
+deletes. With visibility timeouts, active claims also store replay metadata
+until ack or reclaim. Without visibility timeouts, abandoned claims leave
+`claim_result_ids` and `claim_result_backrefs` fields until the message is
+acked or manually cleaned.
+`max_completed_length` and `max_failed_length` only bound the completed/failed
+lists. They do not bound deduplication keys or replay metadata.
+Avoid sharing queue Redis DBs with unrelated high-cardinality workloads. If
+idempotency matters, prefer explicit capacity planning and `noeviction` with
+alerts over LRU/random eviction policies: evicting dedup/replay keys before
+their TTL can weaken duplicate suppression and retry result replay.
+## Observability
+Queue instances accept an optional `on_event` callback for metrics, tracing, or
+structured logging. The sync queue expects a regular callable; the async queue
+expects an async callable:
+```python
+from redis_message_queue import QueueEvent, RedisMessageQueue
+def on_event(event: QueueEvent) -> None:
+    ...
+queue = RedisMessageQueue("jobs", client=client, on_event=on_event)
+```
+Events cover publish, dedup hits, claim/empty polls, reclaim, ack/nack,
+completed/failed cleanup, DLQ moves, heartbeat renewal, stale leases, cleanup
+and trim failures, and retry attempts. Callback exceptions are logged and
+reported with `RuntimeWarning`, but never propagate into queue operations.
+Package logs remain diagnostic; use `on_event` rather than log parsing for
+metrics.
+```python
+from prometheus_client import Counter
+from redis_message_queue import QueueEvent, RedisMessageQueue
+events_total = Counter(
+    "rmq_events_total",
+    "redis-message-queue lifecycle events",
+    ["queue", "operation", "outcome", "exception_type"],
+)
+def observe(event: QueueEvent) -> None:
+    events_total.labels(
+        event.queue, event.operation, event.outcome, event.exception_type or ""
+    ).inc()
+queue = RedisMessageQueue("jobs", client=client, on_event=observe)
+```
+The public exception hierarchy is rooted at `RedisMessageQueueError`.
+Configuration value/combinations raise `ConfigurationError` (also a
+`ValueError`), custom gateway contract violations raise `GatewayContractError`
+(also a `TypeError`), and Lua `redis.error_reply(...)` failures raise
+`LuaScriptError` (also a redis-py `ResponseError`). Publish overload raises
+`QueueBackpressureError`. `CleanupFailedError` and `RetryBudgetExhaustedError`
+are reserved categories for cleanup and retry surfaces.
 ## Known limitations
-- **No metrics or observability hooks.** The library logs warnings (stale leases, heartbeat failures, transient errors) via Python's `logging` module but does not expose callbacks, event hooks, or metric counters. To monitor queue health, inspect the underlying Redis keys directly or parse log output.
 - **Timed waits use polling claim loops.** To make claims recoverable after ambiguous connection drops, `wait_for_message_and_move()` uses idempotent Lua claim polling instead of raw blocking list-move commands. This adds a small polling cadence during timed waits.
 - **Redis Lua is atomic, not rollback-transactional.** The built-in scripts now preflight queue key types and fail closed on `WRONGTYPE` before mutating queue state, but Redis does not undo earlier writes if a later script command fails for another reason (for example `OOM` under severe memory pressure).
 - **Batch reclaim limit of 100.** The visibility-timeout reclaim Lua script processes at most 100 expired messages per consumer poll. Under extreme backlog this may delay recovery, but prevents any single poll from blocking Redis.
@@ -337,7 +564,7 @@ you own the client lifecycle.
 - **Cluster detection uses `isinstance(client, RedisCluster)`.** Wrapped or instrumented cluster clients that delegate without inheriting will bypass hash-tag validation. Custom gateways should set `is_redis_cluster = True` explicitly.
 - **Redis Cluster requires hash tags.** The built-in queue uses multiple Redis keys per operation. Wrap the queue name in hash tags (for example `{myqueue}`) so every generated key lands in the same slot. When you pass a Redis Cluster client to the built-in queue/gateway path, incompatible names are rejected early.
 - **Non-ASCII payloads use ~2x storage.** The default `ensure_ascii=True` in JSON serialization encodes non-ASCII characters as `\uXXXX` escape sequences. This is a deliberate compatibility choice.
-- **Client-side `Retry` can duplicate non-deduplicated publishes.** If you construct your `redis.Redis` client with `retry=Retry(...)`, redis-py retries `ConnectionError` / `TimeoutError` at the connection layer — *below* this library. Idempotent operations (deduplicated `publish()`, lease-scoped cleanup) are safe because their Lua scripts replay the original result. `add_message()` (used by `publish()` when `deduplication=False`) is a bare `LPUSH`: this library deliberately does not retry it, but a client-level `Retry` will, and if the server executed the command before the response was lost the message is enqueued twice. Leave `retry=None` (the default) if you need strict at-most-once semantics for non-deduplicated publishes, or accept the duplication risk. More broadly, any non-idempotent `LPUSH` path is vulnerable if the connection drops after server execution but before the client receives the response; all other built-in operations (deduplicated publish, lease-scoped ack/move, lease renewal) use replay markers and are safe under client-level `Retry`.
+- **Client-side `Retry` can duplicate non-deduplicated publishes.** If you construct your `redis.Redis` client with `retry=Retry(...)`, redis-py retries `ConnectionError` / `TimeoutError` at the connection layer — *below* this library. Idempotent operations (deduplicated `publish()`, lease-scoped cleanup) are safe because their Lua scripts replay the original result. `add_message()` (used by `publish()` when `deduplication=False`) is a bare `LPUSH` by default, or a single non-idempotent Lua enqueue when `max_pending_length` is set: this library deliberately does not retry it, but a client-level `Retry` will, and if the server executed the command before the response was lost the message is enqueued twice. Leave `retry=None` (the default) if you need strict at-most-once semantics for non-deduplicated publishes, or accept the duplication risk. More broadly, any non-idempotent enqueue path is vulnerable if the connection drops after server execution but before the client receives the response; all other built-in operations (deduplicated publish, lease-scoped ack/move, lease renewal) use replay markers and are safe under client-level `Retry`.
 - **Redis Cluster default retry can stack with this library's retry budget.** In redis-py 6.0+, `RedisCluster()` constructs a default `ExponentialWithJitterBackoff` retry below this library's `retry_budget_seconds`. If you need a single retry surface, pass `retry=Retry(NoBackoff(), 0)` to the cluster client or reduce `retry_budget_seconds` to account for the lower-level retry window.
 For a full analysis, see [docs/production-readiness.md](docs/production-readiness.md).
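
The dedup sizing arithmetic added under "Redis memory sizing for deduplication and replay metadata" in this file can be checked with a few lines of Python. This is a minimal sketch and not part of the package: `estimate_dedup_bytes` is a hypothetical helper name, and the 200-byte default is the README's suggested starting point rather than a measured value.

```python
def estimate_dedup_bytes(
    unique_publishes_per_second: int,
    dedup_ttl_seconds: int = 3600,   # README default for message_deduplication_log_ttl_seconds
    bytes_per_dedup_key: int = 200,  # README's conservative starting point (assumption)
) -> int:
    """Steady-state bytes held by dedup markers: rate * TTL * bytes per key."""
    return unique_publishes_per_second * dedup_ttl_seconds * bytes_per_dedup_key


# Hypothetical helper applied to the two cases quoted in the README text.
one_hour = estimate_dedup_bytes(1_000)
one_day = estimate_dedup_bytes(1_000, dedup_ttl_seconds=24 * 3600)

print(f"1h window:  {one_hour / 1e6:.0f} MB")   # 720 MB
print(f"24h window: {one_day / 1e9:.2f} GB")    # 17.28 GB
```

The result covers dedup markers only. Pending payload lists, lease metadata, completed/failed queues, and allocator fragmentation come on top, so the per-key figure should still be validated with `MEMORY USAGE` against a real deployment.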

{redis_message_queue-5.0.0 → redis_message_queue-6.0.0}/README.md RENAMED

@@ -1,6 +1,6 @@
 # redis-message-queue
-[![PyPI Version](https://img.shields.io/badge/v5.0.0-version?color=43cd0f&style=flat&label=pypi)](https://pypi.org/project/redis-message-queue)
+[![PyPI Version](https://img.shields.io/badge/v6.0.0-version?color=43cd0f&style=flat&label=pypi)](https://pypi.org/project/redis-message-queue)
 [![PyPI Downloads](https://img.shields.io/pypi/dm/redis-message-queue?color=43cd0f&style=flat&label=downloads)](https://pypistats.org/packages/redis-message-queue)
 [![License: MIT](https://img.shields.io/badge/License-MIT-43cd0f.svg?style=flat&label=license)](LICENSE)
 [![Maintained: yes](https://img.shields.io/badge/yes-43cd0f.svg?style=flat&label=maintained)](https://github.com/Elijas/redis-message-queue/issues)
@@ -11,7 +11,7 @@
 **Lightweight Python message queuing with Redis and built-in publish-side deduplication.** Deduplicate publishes within a TTL window, with optional crash recovery — across any number of producers and consumers.
 ```bash
-pip install "redis-message-queue>=3.0.0,<4.0.0"
+pip install "redis-message-queue>=6.0.0,<7.0.0"
 ```
 Requires Redis server >= 6.2.
@@ -125,6 +125,43 @@ When set, `LTRIM` is called after each message is moved to the completed/failed
 Pass `max_completed_length=None` or `max_failed_length=None` explicitly if you
 want unbounded tracking queues.
+### Publish backpressure
+By default, the pending queue is unbounded (`max_pending_length=None`), matching
+the v5 behavior. Set `max_pending_length` when producers can outrun consumers
+and Redis memory must fail closed before the broker is exhausted:
+```python
+queue = RedisMessageQueue(
+    "q",
+    client=client,
+    max_pending_length=100_000,
+    pending_overload_policy="raise",  # "raise", "drop_oldest", or "block"
+)
+```
+The built-in Redis path checks pending depth and enqueues in the same Lua script,
+so concurrent publishers cannot race above the configured cap. Overload policies:
+- `raise` raises `QueueBackpressureError` and leaves the pending list unchanged.
+- `drop_oldest` removes the oldest pending message (`RPOP`) before enqueueing the
+  new message. This is silent data loss by design; deduplication markers for
+  dropped messages are not removed, so a dropped duplicate may still be
+  suppressed until its dedup TTL expires.
+- `block` retries the atomic check until space opens or
+  `pending_overload_block_timeout_seconds` elapses (default: 1.0), then raises
+  `QueueBackpressureError`.
+These limits apply only to the pending list at publish time. They do not cap
+messages already in `processing`, dead-letter queues, deduplication keys, or
+replay metadata. `max_completed_length` and `max_failed_length` only bound the
+completed/failed history lists. Size pending payload memory separately from the
+dedup/replay metadata described in
+[Redis memory sizing](#redis-memory-sizing-for-deduplication-and-replay-metadata).
+When using `gateway=`, configure backpressure on the gateway directly, for
+example `RedisGateway(redis_client=client, max_pending_length=100_000)`.
 ### Crash recovery with visibility timeout
 ```python
@@ -160,6 +197,43 @@ The callback is **advisory** — it may fire briefly after a successful `process
 Without a visibility timeout, messages already moved to `processing` remain there indefinitely after a consumer crash and are not redelivered, even if the crash happened before your handler started running.
+### Ordering and multi-consumer fairness
+The built-in queue is a shared-pull Redis list. Successful publishes push to the
+left side of the pending list, and claims pop from the right side, so Redis
+grants claims in enqueue order in the no-failure path.
+This is a claim-order guarantee only. It is not a completion-order guarantee:
+multiple consumers process concurrently, handlers can run for different
+durations, and younger messages can finish before older messages.
+With `visibility_timeout_seconds` enabled, expired messages from `processing`
+are reclaimed before fresh pending work on the next consumer poll. A reclaimed
+message may be delivered after younger messages were already processed, and may
+be processed concurrently with a stale original handler if that handler keeps
+running after its lease expires.
+Expired reclaims are ordered by lease deadline within one reclaim batch.
+`CLAIM_MESSAGE_WITH_VISIBILITY_TIMEOUT_LUA_SCRIPT` selects expired leases with
+`ZRANGEBYSCORE ... LIMIT 0, 100` to bound Redis Lua execution time. When more
+than 100 messages expire together, the next poll can append a later reclaim
+batch at the claimable end of the pending list ahead of leftovers from the
+previous batch, so cross-batch redelivery order is not guaranteed.
+`max_delivery_count` can skip over poison messages during a claim poll by moving
+over-limit messages to the dead-letter queue and returning a later pending
+message. Deduplication is publish-side only: duplicate publishes are not
+enqueued and therefore do not occupy a queue position.
+Handler exceptions are not retries: the default behavior removes the message
+from `processing`, or moves it to the failed queue when enabled. Redelivery is
+for crash, stall, or stale-lease paths where cleanup does not complete.
+Multiple consumers contend for the same queue. The next message goes to the
+consumer whose claim request Redis executes next. There is no round-robin,
+equal-share, or starvation-freedom guarantee; faster consumers can receive more
+than 1/N of messages.
 ### Dead-letter queue
 ```python
@@ -204,6 +278,42 @@ while not interrupt.is_interrupted():
 > (for example, a second Ctrl+C raises `KeyboardInterrupt`). If you need multiple
 > shutdown hooks, use a single handler and fan out in your own code.
+There are three distinct shutdown shapes; pick the one that matches your runtime:
+| Shape | Trigger | In-flight handler | Pending claim IDs |
+|---|---|---|---|
+| **Flag-based soft drain** (`GracefulInterruptHandler`) | First SIGINT/SIGTERM flips a flag | Runs to completion | Drained on the next claim call, not on signal arrival |
+| **Async task cancellation** (`asyncio.CancelledError`) | Framework cancels the worker task (Uvicorn/K8s SIGTERM in many setups) | **Hard abort** — message stays in `processing`; with VT it is reclaimed at deadline expiry, without VT it is orphaned | Not drained |
+| **Explicit drain** (`drain()` / `aclose()`) | You call the method | Caller's responsibility to let it finish (drain does **not** cancel) | Drained synchronously via the gateway recovery path |
+Use `drain()` / `aclose()` to bridge K8s `preStop` / SIGTERM grace windows without
+relying on signal interception:
+```python
+# sync — in your SIGTERM handler or preStop hook
+queue.drain(timeout=25)   # refuses new claims, recovers pending claim IDs
+worker_thread.join()      # wait for in-flight process_message to finish
+# async — same shape
+await queue.aclose(timeout=25)
+await worker_task         # task observes ``_draining`` and exits its loop
+```
+`drain()` / `aclose()` set a queue-local flag so subsequent `process_message()`
+calls yield `None` immediately. They do not cancel in-flight handlers — the
+caller must arrange handler exit through normal thread/task coordination.
+Returns `True` if all in-memory pending claim IDs were recovered within the
+timeout; `False` if the deadline fired or transient Redis errors left claim
+IDs pending (call again to retry). `timeout=0` reports current state without
+attempting recovery.
+> **Heartbeat caveat (best-effort stop):** when `heartbeat_interval_seconds` is
+> set, the heartbeat sidecar's `stop()` is bounded but not strictly quiescent —
+> a slow renewal in flight when `process_message` exits may still write to
+> Redis after the caller believes shutdown is complete. The renewal is bounded
+> by the configured visibility timeout and the lease token check on the Redis
+> side, but plan for a small post-shutdown overlap rather than instant quiesce.
 ### Custom gateway
 ```python
@@ -224,12 +334,12 @@ queue = RedisMessageQueue("q", gateway=gateway)
 The retry knobs configure an internal `tenacity` strategy: exponential
 backoff with jitter, retry on transient Redis errors only, capped at
-`retry_budget_seconds`. The budget is wall-clock time from the first attempt (including attempt duration), not inter-attempt delay; a single attempt that takes longer than the budget results in zero retries. Setting `retry_budget_seconds=0` disables retry
+`retry_budget_seconds`. The budget is monotonic elapsed time from the first attempt (including attempt duration), not inter-attempt delay; it is unaffected by Python-host NTP jumps. A single attempt that takes longer than the budget results in zero retries. Setting `retry_budget_seconds=0` disables retry
 entirely (single attempt; exceptions propagate). The library uses
 `retry_budget_seconds` to size the operation-result cache TTL automatically,
 so the previous footgun of an over-long retry budget out-living the cache
 and producing misleading "cleanup was a no-op" warnings is now structurally
-impossible. Note: tenacity may allow one additional attempt beyond the budget if the budget check passes at attempt start — total wall-clock time can exceed `retry_budget_seconds` by the duration of that final attempt.
+impossible. Note: tenacity may allow one additional attempt beyond the budget if the budget check passes at attempt start, so total monotonic elapsed time can exceed `retry_budget_seconds` by the duration of that final attempt.
 To plug in a different retry library (`backoff`, `asyncstdlib.retry`, or your
 own logic) or fundamentally different semantics, subclass
@@ -301,9 +411,126 @@ await client.aclose()
 For the sync Redis client, call `client.close()` during application shutdown when
 you own the client lifecycle.
+## Production notes
+### Fork safety and pre-fork servers
+Construct Redis clients and `RedisMessageQueue` instances after a process forks.
+This is the recommended pattern for `multiprocessing`, `ProcessPoolExecutor`,
+and pre-fork servers such as gunicorn with `--preload`.
+```python
+def worker_main():
+    client = redis.Redis()
+    queue = RedisMessageQueue("jobs", client=client)
+    ...
+```
+Avoid constructing a queue/client in a parent process and then using that same
+object in forked children, especially if the parent has already run any Redis
+command. The queue stores the user-provided Redis client and process-local
+claim-recovery state. Inherited Redis sockets can corrupt the Redis protocol if
+two processes use the same file descriptor.
+Notes:
+- The sync redis-py pooled client attempts to reset its connection pool after
+  fork, but this does not apply to every client shape.
+- The built-in sync gateway rejects `redis.Redis(single_connection_client=True)`
+  because that mode pins one socket instead of using the pool.
+- Do not share `redis.asyncio.Redis` or async queues across fork; create or
+  reconnect them in the child process.
+- If you use `GracefulInterruptHandler`, create it in the worker process after
+  fork so signal ownership is local to that worker.
+- The heartbeat sidecar is lazy and starts only while processing a leased
+  message. Do not call `fork()` from inside active message handlers unless the
+  child exits without using the inherited queue/client.
+### Redis memory sizing for deduplication and replay metadata
+When deduplication is enabled, each distinct dedup key creates one Redis string
+for `message_deduplication_log_ttl_seconds` (default: 3600 seconds). The default
+dedup key is a SHA-256 hash of the canonical message payload, so distinct
+payloads are distinct keys. Size Redis for:
+```text
+peak_unique_publish_rate_per_second
+* message_deduplication_log_ttl_seconds
+* bytes_per_dedup_key
+```
+Use 200 bytes per dedup key as a conservative starting point for short queue
+names, then validate with `MEMORY USAGE` in your Redis version. Example:
+1,000 unique messages/s * 3,600s * 200 B ~= 720 MB for dedup markers alone.
+A 24h dedup window at the same rate is 86.4M keys, or roughly 17 GB before
+message payload lists, lease metadata, completed/failed queues, and allocator
+fragmentation.
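The sizing formula above is simple enough to keep as a small script next to your capacity plan. The sketch below reproduces the two worked figures; `dedup_memory_bytes` is an illustrative helper name, and the 200-byte default is the starting assumption from the text, to be replaced by a value measured with `MEMORY USAGE`.

```python
def dedup_memory_bytes(
    unique_per_second: int,
    ttl_seconds: int,
    bytes_per_key: int = 200,  # assumed starting point; measure on your Redis
) -> int:
    """Estimate memory held by dedup markers at steady state."""
    live_keys = unique_per_second * ttl_seconds
    return live_keys * bytes_per_key


hourly = dedup_memory_bytes(1_000, 3_600)    # default 1h TTL
daily = dedup_memory_bytes(1_000, 86_400)    # 24h dedup window

print(f"1h window:  {hourly / 1e6:.0f} MB")   # 720 MB
print(f"24h window: {daily / 1e9:.2f} GB")    # 17.28 GB
```

These numbers cover dedup markers only; payload lists, lease metadata and completed/failed queues come on top.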
+Operation-result replay keys are normally deleted after a successful call, but
+may live until their TTL after ambiguous connection drops or failed cleanup
+deletes. With visibility timeouts, active claims also store replay metadata
+until ack or reclaim. Without visibility timeouts, abandoned claims leave
+`claim_result_ids` and `claim_result_backrefs` fields until the message is
+acked or manually cleaned.
+`max_completed_length` and `max_failed_length` only bound the completed/failed
+lists. They do not bound deduplication keys or replay metadata.
+Avoid sharing queue Redis DBs with unrelated high-cardinality workloads. If
+idempotency matters, prefer explicit capacity planning and `noeviction` with
+alerts over LRU/random eviction policies: evicting dedup/replay keys before
+their TTL can weaken duplicate suppression and retry result replay.
+## Observability
+Queue instances accept an optional `on_event` callback for metrics, tracing, or
+structured logging. The sync queue expects a regular callable; the async queue
+expects an async callable:
+```python
+from redis_message_queue import QueueEvent, RedisMessageQueue
+def on_event(event: QueueEvent) -> None:
+    ...
+queue = RedisMessageQueue("jobs", client=client, on_event=on_event)
+```
+Events cover publish, dedup hits, claim/empty polls, reclaim, ack/nack,
+completed/failed cleanup, DLQ moves, heartbeat renewal, stale leases, cleanup
+and trim failures, and retry attempts. Callback exceptions are logged and
+reported with `RuntimeWarning`, but never propagate into queue operations.
+Package logs remain diagnostic; use `on_event` rather than log parsing for
+metrics.
+```python
+from prometheus_client import Counter
+from redis_message_queue import QueueEvent, RedisMessageQueue
+events_total = Counter(
+    "rmq_events_total",
+    "redis-message-queue lifecycle events",
+    ["queue", "operation", "outcome", "exception_type"],
+)
+def observe(event: QueueEvent) -> None:
+    events_total.labels(
+        event.queue, event.operation, event.outcome, event.exception_type or ""
+    ).inc()
+queue = RedisMessageQueue("jobs", client=client, on_event=observe)
+```
+The public exception hierarchy is rooted at `RedisMessageQueueError`.
+Invalid configuration values or combinations raise `ConfigurationError` (also a
+`ValueError`), custom gateway contract violations raise `GatewayContractError`
+(also a `TypeError`), and Lua `redis.error_reply(...)` failures raise
+`LuaScriptError` (also a redis-py `ResponseError`). Publish overload raises
+`QueueBackpressureError`. `CleanupFailedError` and `RetryBudgetExhaustedError`
+are reserved categories for cleanup and retry surfaces.
 ## Known limitations
-- **No metrics or observability hooks.** The library logs warnings (stale leases, heartbeat failures, transient errors) via Python's `logging` module but does not expose callbacks, event hooks, or metric counters. To monitor queue health, inspect the underlying Redis keys directly or parse log output.
 - **Timed waits use polling claim loops.** To make claims recoverable after ambiguous connection drops, `wait_for_message_and_move()` uses idempotent Lua claim polling instead of raw blocking list-move commands. This adds a small polling cadence during timed waits.
 - **Redis Lua is atomic, not rollback-transactional.** The built-in scripts now preflight queue key types and fail closed on `WRONGTYPE` before mutating queue state, but Redis does not undo earlier writes if a later script command fails for another reason (for example `OOM` under severe memory pressure).
 - **Batch reclaim limit of 100.** The visibility-timeout reclaim Lua script processes at most 100 expired messages per consumer poll. Under extreme backlog this may delay recovery, but prevents any single poll from blocking Redis.
@@ -311,7 +538,7 @@ you own the client lifecycle.
 - **Cluster detection uses `isinstance(client, RedisCluster)`.** Wrapped or instrumented cluster clients that delegate without inheriting will bypass hash-tag validation. Custom gateways should set `is_redis_cluster = True` explicitly.
 - **Redis Cluster requires hash tags.** The built-in queue uses multiple Redis keys per operation. Wrap the queue name in hash tags (for example `{myqueue}`) so every generated key lands in the same slot. When you pass a Redis Cluster client to the built-in queue/gateway path, incompatible names are rejected early.
 - **Non-ASCII payloads use ~2x storage.** The default `ensure_ascii=True` in JSON serialization encodes non-ASCII characters as `\uXXXX` escape sequences. This is a deliberate compatibility choice.
-- **Client-side `Retry` can duplicate non-deduplicated publishes.** If you construct your `redis.Redis` client with `retry=Retry(...)`, redis-py retries `ConnectionError` / `TimeoutError` at the connection layer — *below* this library. Idempotent operations (deduplicated `publish()`, lease-scoped cleanup) are safe because their Lua scripts replay the original result. `add_message()` (used by `publish()` when `deduplication=False`) is a bare `LPUSH`: this library deliberately does not retry it, but a client-level `Retry` will, and if the server executed the command before the response was lost the message is enqueued twice. Leave `retry=None` (the default) if you need strict at-most-once semantics for non-deduplicated publishes, or accept the duplication risk. More broadly, any non-idempotent `LPUSH` path is vulnerable if the connection drops after server execution but before the client receives the response; all other built-in operations (deduplicated publish, lease-scoped ack/move, lease renewal) use replay markers and are safe under client-level `Retry`.
+- **Client-side `Retry` can duplicate non-deduplicated publishes.** If you construct your `redis.Redis` client with `retry=Retry(...)`, redis-py retries `ConnectionError` / `TimeoutError` at the connection layer — *below* this library. Idempotent operations (deduplicated `publish()`, lease-scoped cleanup) are safe because their Lua scripts replay the original result. `add_message()` (used by `publish()` when `deduplication=False`) is a bare `LPUSH` by default, or a single non-idempotent Lua enqueue when `max_pending_length` is set: this library deliberately does not retry it, but a client-level `Retry` will, and if the server executed the command before the response was lost the message is enqueued twice. Leave `retry=None` (the default) if you need strict at-most-once semantics for non-deduplicated publishes, or accept the duplication risk. More broadly, any non-idempotent enqueue path is vulnerable if the connection drops after server execution but before the client receives the response; all other built-in operations (deduplicated publish, lease-scoped ack/move, lease renewal) use replay markers and are safe under client-level `Retry`.
 - **Redis Cluster default retry can stack with this library's retry budget.** In redis-py 6.0+, `RedisCluster()` constructs a default `ExponentialWithJitterBackoff` retry below this library's `retry_budget_seconds`. If you need a single retry surface, pass `retry=Retry(NoBackoff(), 0)` to the cluster client or reduce `retry_budget_seconds` to account for the lower-level retry window.
 For a full analysis, see [docs/production-readiness.md](docs/production-readiness.md).

{redis_message_queue-5.0.0 → redis_message_queue-6.0.0}/pyproject.toml RENAMED

@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "redis-message-queue"
-version = "5.0.0"
+version = "6.0.0"
 description = "Python message queuing with Redis and message deduplication"
 authors = ["Elijas <4084885+Elijas@users.noreply.github.com>"]
 readme = "README.md"

{redis_message_queue-5.0.0 → redis_message_queue-6.0.0}/redis_message_queue/__init__.py RENAMED

@@ -1,4 +1,14 @@
 from redis_message_queue._abstract_redis_gateway import AbstractRedisGateway
+from redis_message_queue._event import EventOperation, EventOutcome, QueueEvent
+from redis_message_queue._exceptions import (
+    CleanupFailedError,
+    ConfigurationError,
+    GatewayContractError,
+    LuaScriptError,
+    QueueBackpressureError,
+    RedisMessageQueueError,
+    RetryBudgetExhaustedError,
+)
 from redis_message_queue._redis_gateway import RedisGateway
 from redis_message_queue._stored_message import ClaimedMessage, MessageData
 from redis_message_queue.interrupt_handler import (
@@ -15,4 +25,14 @@ __all__ = [
     "MessageData",
     "GracefulInterruptHandler",
     "BaseGracefulInterruptHandler",
+    "QueueEvent",
+    "EventOperation",
+    "EventOutcome",
+    "RedisMessageQueueError",
+    "ConfigurationError",
+    "GatewayContractError",
+    "LuaScriptError",
+    "QueueBackpressureError",
+    "CleanupFailedError",
+    "RetryBudgetExhaustedError",
 ]
