redis-message-queue: 6.0.1 (tar.gz) → 7.0.0 (tar.gz)

Files changed (24)

{redis_message_queue-6.0.1 → redis_message_queue-7.0.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: redis-message-queue
-Version: 6.0.1
+Version: 7.0.0
 Summary: Python message queuing with Redis and message deduplication
 License: MIT
 License-File: LICENSE
@@ -17,7 +17,7 @@ Classifier: Programming Language :: Python :: 3.13
 Classifier: Programming Language :: Python :: 3.14
 Classifier: Topic :: Software Development :: Libraries
 Classifier: Topic :: System :: Distributed Computing
-Requires-Dist: redis (>=5.0.0)
+Requires-Dist: redis (>=5.0.0,<8.0.0)
 Requires-Dist: tenacity (>=8.1.0)
 Project-URL: Homepage, https://github.com/Elijas/redis-message-queue
 Project-URL: Issues, https://github.com/Elijas/redis-message-queue/issues
@@ -26,7 +26,7 @@ Description-Content-Type: text/markdown
 # redis-message-queue
-[![PyPI Version](https://img.shields.io/badge/v6.0.1-version?color=43cd0f&style=flat&label=pypi)](https://pypi.org/project/redis-message-queue)
+[![PyPI Version](https://img.shields.io/badge/v7.0.0-version?color=43cd0f&style=flat&label=pypi)](https://pypi.org/project/redis-message-queue)
 [![PyPI Downloads](https://img.shields.io/pypi/dm/redis-message-queue?color=43cd0f&style=flat&label=downloads)](https://pypistats.org/packages/redis-message-queue)
 [![License: MIT](https://img.shields.io/badge/License-MIT-43cd0f.svg?style=flat&label=license)](LICENSE)
 [![Maintained: yes](https://img.shields.io/badge/yes-43cd0f.svg?style=flat&label=maintained)](https://github.com/Elijas/redis-message-queue/issues)
@@ -260,6 +260,22 @@ consumer whose claim request Redis executes next. There is no round-robin,
 equal-share, or starvation-freedom guarantee; faster consumers can receive more
 than 1/N of messages.
+### If you need stronger ordering or fairness guarantees
+- **Strict queue-wide processing order** — use a single consumer per queue.
+  Multiple consumers will interleave handler completions.
+- **Per-key processing order** — partition by key into multiple queues
+  (`queue_<hash(key) % N>`), and consume each partition with a single consumer.
+- **Equal-share / round-robin fairness across consumers** — choose a different
+  scheduler. This queue does not guarantee that any individual consumer makes
+  forward progress at any specific rate.
+- **Cross-batch ordering after reclaim** — accept that reclaimed messages will
+  reappear after newer un-reclaimed messages have been consumed. If your handler
+  must observe original publish order, persist that order in the payload (for
+  example, a sequence number set by the producer). For clock-related operator
+  detail behind reclaim behavior, see
+  [production readiness R11](docs/production-readiness.md#r11-redis-clock-dependencies).
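
Note on the per-key partitioning bullet above: the `queue_<hash(key) % N>` pattern only preserves per-key order if every producer maps a given key to the same partition. Python's built-in `hash()` on strings is salted per process unless `PYTHONHASHSEED` is fixed, so a minimal sketch might use a deterministic hash instead. The helper name `partition_queue_name`, the `prefix` parameter, and the choice of `zlib.crc32` are assumptions for illustration, not part of the package.

```python
import zlib


def partition_queue_name(key: str, partitions: int, prefix: str = "queue") -> str:
    """Map a key to one of `partitions` queue names, stably across processes."""
    if partitions < 1:
        raise ValueError("partitions must be >= 1")
    # crc32 is deterministic across processes and hosts; built-in hash() on
    # str is randomized per interpreter process by default.
    bucket = zlib.crc32(key.encode("utf-8")) % partitions
    return f"{prefix}_{bucket}"


# Every key lands in one of the N partition names (hypothetical keys).
names = {partition_queue_name(f"user-{i}", 4) for i in range(100)}
```

Each resulting name would then be consumed by exactly one consumer, as the bullet describes. Under Redis Cluster the names would additionally need hash tags, per the Known limitations section.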
 ### Dead-letter queue
 ```python
@@ -310,14 +326,14 @@ There are three distinct shutdown shapes; pick the one that matches your runtime
 |---|---|---|---|
 | **Flag-based soft drain** (`GracefulInterruptHandler`) | First SIGINT/SIGTERM flips a flag | Runs to completion | Drained on the next claim call, not on signal arrival |
 | **Async task cancellation** (`asyncio.CancelledError`) | Framework cancels the worker task (Uvicorn/K8s SIGTERM in many setups) | **Hard abort** — message stays in `processing`; with VT it is reclaimed at deadline expiry, without VT it is orphaned | Not drained |
-| **Explicit drain** (`drain()` / `aclose()`) | You call the method | Caller's responsibility to let it finish (drain does **not** cancel) | Drained synchronously via the gateway recovery path |
+| **Explicit drain** (`drain()` / `aclose()`) | You call the method | Caller's responsibility to let it finish (drain does **not** cancel) | Drained synchronously via the gateway recovery path; new publishes are refused |
 Use `drain()` / `aclose()` to bridge K8s `preStop` / SIGTERM grace windows without
 relying on signal interception:
 ```python
 # sync — in your SIGTERM handler or preStop hook
-queue.drain(timeout=25)   # refuses new claims, recovers pending claim IDs
+queue.drain(timeout=25)   # refuses new publishes/claims, recovers pending claim IDs
 worker_thread.join()      # wait for in-flight process_message to finish
 # async — same shape
@@ -326,12 +342,19 @@ await worker_task         # task observes ``_draining`` and exits its loop
 ```
 `drain()` / `aclose()` set a queue-local flag so subsequent `process_message()`
-calls yield `None` immediately. They do not cancel in-flight handlers — the
-caller must arrange handler exit through normal thread/task coordination.
-Returns `True` if all in-memory pending claim IDs were recovered within the
-timeout; `False` if the deadline fired or transient Redis errors left claim
-IDs pending (call again to retry). `timeout=0` reports current state without
-attempting recovery.
+calls yield `None` immediately and subsequent `publish()` calls raise
+`QueueDrainedError("queue is drained")`. Drain also gates the publish path:
+if a publish is already inside the queue instance's publish path, drain waits
+for that publish to finish before it returns; publishes that arrive after the
+drained flag is set are rejected. The drained state is local to that Python
+queue object and is not written to Redis, so constructing a fresh
+`RedisMessageQueue(...)` over the same keys remains usable.
+Drain does not cancel in-flight handlers — the caller must arrange handler
+exit through normal thread/task coordination. Returns `True` if all in-memory
+pending claim IDs were recovered within the timeout; `False` if the deadline
+fired or transient Redis errors left claim IDs pending (call again to retry).
+`timeout=0` reports current state without attempting recovery.
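
Note on the publish gating described above: the semantics (drain waits for a publish that is already in progress, later publishes are rejected, and the state lives only in the Python object) can be modelled with a small toy class. This is a sketch of the described behavior under stated assumptions, not the package's implementation: `DrainGate`, its `send` callback, and the local `QueueDrainedError` stand-in are hypothetical, and claim-ID recovery is deliberately left out.

```python
import threading
from typing import Any, Callable, Optional


class QueueDrainedError(RuntimeError):
    """Local stand-in for the library exception of the same name."""


class DrainGate:
    """Toy model: drain waits for in-flight publishes, then rejects new ones."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._in_flight = 0
        self._drained = False

    def publish(self, send: Callable[[Any], Any], message: Any) -> Any:
        with self._cond:
            if self._drained:
                raise QueueDrainedError("queue is drained")
            self._in_flight += 1  # this publish is now inside the publish path
        try:
            return send(message)
        finally:
            with self._cond:
                self._in_flight -= 1
                self._cond.notify_all()  # wake a drain() that may be waiting

    def drain(self, timeout: Optional[float] = None) -> bool:
        with self._cond:
            self._drained = True  # publishes arriving from now on are rejected
            # True once no publish is in flight, False if the timeout expires.
            return self._cond.wait_for(lambda: self._in_flight == 0, timeout)


sent = []
gate = DrainGate()
gate.publish(sent.append, "before")
drained = gate.drain(timeout=1.0)
try:
    gate.publish(sent.append, "after")
    rejected = None
except QueueDrainedError as exc:
    rejected = str(exc)

# The flag is per object: a fresh instance accepts publishes again.
DrainGate().publish(sent.append, "fresh")
```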
 > **Heartbeat caveat (best-effort stop):** when `heartbeat_interval_seconds` is
 > set, the heartbeat sidecar's `stop()` is bounded but not strictly quiescent —
@@ -472,6 +495,32 @@ Notes:
   message. Do not call `fork()` from inside active message handlers unless the
   child exits without using the inherited queue/client.
+#### Forking after constructing GracefulInterruptHandler
+If your application constructed `GracefulInterruptHandler` in the parent process
+before `os.fork()` (for example, via module import in a pre-fork app server),
+forked children cannot construct a fresh handler for the same signal because the
+inherited signal table still routes to the parent-process handler.
+In each child process, call `parent_handler.reset()` before constructing a fresh
+handler:
+```python
+def worker_main():
+    # Inherited handler from parent - reset it.
+    if shared.interrupt_handler is not None:
+        shared.interrupt_handler.reset()
+    # Now safe to construct a fresh handler for this child.
+    interrupt = GracefulInterruptHandler()
+    queue = RedisMessageQueue("jobs", client=redis.Redis(), interrupt=interrupt)
+    ...
+```
+Alternatively, defer all construction (handler and queue) to inside
+`worker_main()` and pass `--no-preload` (or equivalent) to your app server. That
+avoids the parent-construct hazard entirely.
 ### Redis memory sizing for deduplication and replay metadata
 When deduplication is enabled, each distinct dedup key creates one Redis string
@@ -530,6 +579,7 @@ Package logs remain diagnostic; use `on_event` rather than log parsing for
 metrics.
 ```python
+from opentelemetry import trace
 from prometheus_client import Counter
 from redis_message_queue import QueueEvent, RedisMessageQueue
@@ -543,17 +593,78 @@ def observe(event: QueueEvent) -> None:
     events_total.labels(
         event.queue, event.operation, event.outcome, event.exception_type or ""
     ).inc()
+    if event.error is not None:
+        trace.get_current_span().record_exception(event.error)
 queue = RedisMessageQueue("jobs", client=client, on_event=observe)
 ```
+#### Event dispatch context
+Callbacks fire inline:
+- **Sync queue:** the callback runs in the caller's thread. It sees
+  contextvars, the OpenTelemetry current span, and structlog contextvars bound
+  by the caller.
+- **Async queue:** the callback is awaited in the current asyncio task. It has
+  the same contextvars, span, and structlog visibility.
+- **Sync heartbeat:** heartbeat events fire from a separate
+  `threading.Thread`. That thread does not inherit caller contextvars or the
+  caller's OpenTelemetry current span. Use `event.message_id` and
+  `event.lease_token_hash` for correlation.
+- **Async heartbeat:** heartbeat events fire from an asyncio task. The task
+  copies the context present when the heartbeat was started, so contextvars and
+  OpenTelemetry spans bound at handler entry are visible.
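
Note on the heartbeat bullets above: the difference between the sync and async heartbeat comes from how Python propagates `contextvars`. The stdlib-only sketch below illustrates that general behavior; it does not use the package, and `request_id`, `seen_from_thread`, and `seen_from_task` are hypothetical names. It assumes a standard CPython build, where a new `threading.Thread` starts with an empty context (free-threaded 3.14 builds may default to inheriting it).

```python
import asyncio
import contextvars
import threading

request_id = contextvars.ContextVar("request_id", default="unset")


def seen_from_thread() -> str:
    """Read the variable from a new thread, as a sync heartbeat thread would."""
    seen = []
    worker = threading.Thread(target=lambda: seen.append(request_id.get()))
    worker.start()
    worker.join()
    return seen[0]


async def seen_from_task() -> str:
    """Read the variable from a new asyncio task, as an async heartbeat would."""

    async def read() -> str:
        return request_id.get()

    # create_task() copies the context that is current at creation time.
    return await asyncio.create_task(read())


request_id.set("req-123")           # bound by the caller
thread_view = seen_from_thread()    # new thread: the caller's value is not visible
task_view = asyncio.run(seen_from_task())  # new task: sees the copied value
```

This is why the sync heartbeat guidance above recommends correlating through fields carried on the event itself rather than through ambient context.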
+#### Event timing vs. Redis commit
+Most events are post-commit, emitted after the Redis command or Lua script
+returned: `publish/success`, `publish_dedup_hit`, `claim/success`,
+`claim_empty`, `claim_reclaim`, `ack`, `nack`, `completed`, `dlq`,
+`lease_renew`, `trim_failed`, and `stale_lease_*`.
+Pre-commit and mid-flight exceptions:
+- `failed/failure` fires after the handler raises but before failed-queue
+  cleanup completes. Use `nack` for cleanup-commit metrics; use `failed` for
+  handler-exception attribution.
+- `retry_attempt/failure` and `retry_exhausted` fire on the claim-loop retry
+  path. The first Redis attempt may or may not have committed.
+- `publish/failure`, `claim/failure`, and `cleanup_failed/failure` follow
+  exceptions. Under an ambiguous lost response, Redis may have committed
+  despite the exception. Treat them as "operation did not succeed from the
+  caller's perspective", not "Redis did not commit".
+#### Intentionally silent paths
+The following operations have no `on_event` surface by design:
+- **B1 Cluster `pcall` cleanup failure:** three lease-aware Lua scripts wrap a
+  data-derived `DEL` in `redis.pcall(...)` and ignore the result. This
+  preserves queue safety on Cluster `CROSSSLOT` rejection but cannot be
+  observed through `on_event`. Operators watching key-TTL behavior or Redis
+  slow logs can detect orphans.
+- **VT claim-store OOM compensation:** if the visibility-timeout Lua script
+  cannot store the claim result, it removes the message from processing, pushes
+  it back to pending, and returns `false`. Python translates that into
+  `claim_empty/skipped`, the same shape as an empty poll. This is intentional
+  fail-safe behavior; the message is not lost.
+- **`drain()` / `close()` / `aclose()` lifecycle:** explicit shutdown
+  operations do not emit lifecycle events. Pending-claim-drain recovery work
+  counts as `claim_reclaim` events when reached.
+- **Non-claim-loop retry attempts:** tenacity retries in deduplicated publish,
+  ack/remove, move-to-completed/failed, and lease renewal collapse into the
+  terminal operation's failure event. There is no per-attempt event for those
+  paths.
 The public exception hierarchy is rooted at `RedisMessageQueueError`.
 Configuration value/combinations raise `ConfigurationError` (also a
 `ValueError`), custom gateway contract violations raise `GatewayContractError`
 (also a `TypeError`), and Lua `redis.error_reply(...)` failures raise
 `LuaScriptError` (also a redis-py `ResponseError`). Publish overload raises
-`QueueBackpressureError`. `CleanupFailedError` and `RetryBudgetExhaustedError`
-are reserved categories for cleanup and retry surfaces.
+`QueueBackpressureError`; publish after explicit drain raises
+`QueueDrainedError`. `CleanupFailedError` and `RetryBudgetExhaustedError` are
+reserved categories for cleanup and retry surfaces.
 ## Known limitations
@@ -564,13 +675,46 @@ are reserved categories for cleanup and retry surfaces.
 - **Cluster detection uses `isinstance(client, RedisCluster)`.** Wrapped or instrumented cluster clients that delegate without inheriting will bypass hash-tag validation. Custom gateways should set `is_redis_cluster = True` explicitly.
 - **Redis Cluster requires hash tags.** The built-in queue uses multiple Redis keys per operation. Wrap the queue name in hash tags (for example `{myqueue}`) so every generated key lands in the same slot. When you pass a Redis Cluster client to the built-in queue/gateway path, incompatible names are rejected early.
 - **Non-ASCII payloads use ~2x storage.** The default `ensure_ascii=True` in JSON serialization encodes non-ASCII characters as `\uXXXX` escape sequences. This is a deliberate compatibility choice.
-- **Client-side `Retry` can duplicate non-deduplicated publishes.** If you construct your `redis.Redis` client with `retry=Retry(...)`, redis-py retries `ConnectionError` / `TimeoutError` at the connection layer — *below* this library. Idempotent operations (deduplicated `publish()`, lease-scoped cleanup) are safe because their Lua scripts replay the original result. `add_message()` (used by `publish()` when `deduplication=False`) is a bare `LPUSH` by default, or a single non-idempotent Lua enqueue when `max_pending_length` is set: this library deliberately does not retry it, but a client-level `Retry` will, and if the server executed the command before the response was lost the message is enqueued twice. Leave `retry=None` (the default) if you need strict at-most-once semantics for non-deduplicated publishes, or accept the duplication risk. More broadly, any non-idempotent enqueue path is vulnerable if the connection drops after server execution but before the client receives the response; all other built-in operations (deduplicated publish, lease-scoped ack/move, lease renewal) use replay markers and are safe under client-level `Retry`.
+- **Client-side `Retry` can duplicate non-deduplicated publishes.** If you construct your `redis.Redis` or `redis.asyncio.Redis` client with `retry=Retry(...)`, redis-py retries `ConnectionError` / `TimeoutError` at the connection layer — *below* this library. Idempotent operations (deduplicated `publish()`, lease-scoped cleanup) are safe because their Lua scripts replay the original result. `add_message()` (used by `publish()` when `deduplication=False`) is a bare `LPUSH` by default, or a single non-idempotent Lua enqueue when `max_pending_length` is set: this library deliberately does not retry it, but a client-level `Retry` will, and if the server executed the command before the response was lost the message is enqueued twice. redis-py 6.0+ changed the default standalone `Redis()` / `redis.asyncio.Redis()` retry policy from `None` (no retry) to a 3-attempt `ExponentialWithJitterBackoff`; pass `retry=None` explicitly if you need strict at-most-once semantics for non-deduplicated publishes, or accept the duplication risk. More broadly, any non-idempotent enqueue path is vulnerable if the connection drops after server execution but before the client receives the response; all other built-in operations (deduplicated publish, lease-scoped ack/move, lease renewal) use replay markers and are safe under client-level `Retry`.
+  ```python
+  import redis
+  from redis_message_queue import RedisMessageQueue
+  # Strict at-most-once for non-dedup messages: disable redis-py's
+  # default 3-retry policy explicitly.
+  client = redis.Redis(retry=None)
+  queue = RedisMessageQueue("jobs", client=client)
+  ```
+  ```python
+  import redis.asyncio as redis
+  from redis_message_queue.asyncio import RedisMessageQueue
+  # Strict at-most-once for non-dedup messages: disable redis-py's
+  # default 3-retry policy explicitly.
+  client = redis.Redis(retry=None)
+  queue = RedisMessageQueue("jobs", client=client)
+  ```
 - **Redis Cluster default retry can stack with this library's retry budget.** In redis-py 6.0+, `RedisCluster()` constructs a default `ExponentialWithJitterBackoff` retry below this library's `retry_budget_seconds`. If you need a single retry surface, pass `retry=Retry(NoBackoff(), 0)` to the cluster client or reduce `retry_budget_seconds` to account for the lower-level retry window.
 For a full analysis, see [docs/production-readiness.md](docs/production-readiness.md).
 ## Upgrading
+### v6 to v7 migration
+v7.0.0 changes explicit drain shutdown semantics. After `queue.drain()` /
+`queue.close()` (sync) or `await queue.drain()` / `await queue.aclose()`
+(async), the same queue instance rejects `publish()` with
+`QueueDrainedError("queue is drained")`.
+This state is queue-local and process-local; it is not stored in Redis. If a
+producer must continue publishing after a worker has drained, use a separate
+`RedisMessageQueue(...)` instance for that producer lifecycle. During
+shutdown, catch `QueueDrainedError` only at boundaries where late publishes are
+expected and safe to drop or reschedule.
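
Note on the migration guidance above: one way to catch `QueueDrainedError` only at a shutdown boundary is a thin wrapper that sets late messages aside for rescheduling. The sketch below runs without Redis by using stand-ins: the local `QueueDrainedError` class, `DrainedQueueStub`, and `publish_or_defer` are hypothetical, and real code would import the exception from the package (its exact import path is not shown in this diff).

```python
class QueueDrainedError(RuntimeError):
    """Local stand-in; real code would import the library's exception."""


class DrainedQueueStub:
    """Hypothetical stub that behaves like a queue instance after drain()."""

    def publish(self, message):
        raise QueueDrainedError("queue is drained")


def publish_or_defer(queue, message, deferred):
    """Publish; if this queue instance was drained, keep the message for later."""
    try:
        queue.publish(message)
        return True
    except QueueDrainedError:
        # Only reached at shutdown boundaries where a late publish is expected.
        deferred.append(message)
        return False


deferred = []
published = publish_or_defer(DrainedQueueStub(), "late-job", deferred)
```

Keeping the `except` this narrow means other failures, such as backpressure or connection errors, still surface to the caller.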
 ### Configuration changes on live queues
 > **Warning:** These changes are destructive on live queues. Drain the queue completely before applying them.
@@ -596,6 +740,9 @@ v6.0.0 is a non-breaking-defaults release that adds new public APIs. v5 code con
 - `max_pending_length=N` caps pending-list depth; with `pending_overload_policy="raise"` (default) producers see `QueueBackpressureError` when the cap is hit; `"block"` waits up to `pending_overload_block_timeout_seconds`; `"drop_oldest"` evicts silently, so use it only when data loss is acceptable.
 - `queue.drain(timeout=...)` (sync) and `await queue.aclose(timeout=...)` (async) are explicit graceful-shutdown hooks. They refuse new claims and recover pending claim IDs but do not cancel in-flight handlers; join or await your worker separately.
 - `on_event=callback` receives a `QueueEvent` dataclass for every publish/claim/ack/reclaim/dedup/cleanup lifecycle event. Use it for metrics, tracing, and structured logging. See [`examples/production/observability.py`](examples/production/observability.py) for the adapter pattern.
+- See [`examples/production/backpressure.py`](examples/production/backpressure.py) and [`examples/production/graceful_shutdown.py`](examples/production/graceful_shutdown.py) for sync production patterns, with async siblings under [`examples/production/asyncio/`](examples/production/asyncio/).
+> When using a pre-fork app server (gunicorn `--preload`, uvicorn workers that import the app at master startup), call `make_queue()` from your worker startup hook - NOT at module import. See [Fork safety](#fork-safety-and-pre-fork-servers) for why.
 **New constructor rejections:**

{redis_message_queue-6.0.1 → redis_message_queue-7.0.0}/README.md RENAMED

@@ -1,6 +1,6 @@
 # redis-message-queue
-[![PyPI Version](https://img.shields.io/badge/v6.0.1-version?color=43cd0f&style=flat&label=pypi)](https://pypi.org/project/redis-message-queue)
+[![PyPI Version](https://img.shields.io/badge/v7.0.0-version?color=43cd0f&style=flat&label=pypi)](https://pypi.org/project/redis-message-queue)
 [![PyPI Downloads](https://img.shields.io/pypi/dm/redis-message-queue?color=43cd0f&style=flat&label=downloads)](https://pypistats.org/packages/redis-message-queue)
 [![License: MIT](https://img.shields.io/badge/License-MIT-43cd0f.svg?style=flat&label=license)](LICENSE)
 [![Maintained: yes](https://img.shields.io/badge/yes-43cd0f.svg?style=flat&label=maintained)](https://github.com/Elijas/redis-message-queue/issues)
@@ -234,6 +234,22 @@ consumer whose claim request Redis executes next. There is no round-robin,
 equal-share, or starvation-freedom guarantee; faster consumers can receive more
 than 1/N of messages.
+### If you need stronger ordering or fairness guarantees
+- **Strict queue-wide processing order** — use a single consumer per queue.
+  Multiple consumers will interleave handler completions.
+- **Per-key processing order** — partition by key into multiple queues
+  (`queue_<hash(key) % N>`), and consume each partition with a single consumer.
+- **Equal-share / round-robin fairness across consumers** — choose a different
+  scheduler. This queue does not guarantee that any individual consumer makes
+  forward progress at any specific rate.
+- **Cross-batch ordering after reclaim** — accept that reclaimed messages will
+  reappear after newer un-reclaimed messages have been consumed. If your handler
+  must observe original publish order, persist that order in the payload (for
+  example, a sequence number set by the producer). For clock-related operator
+  detail behind reclaim behavior, see
+  [production readiness R11](docs/production-readiness.md#r11-redis-clock-dependencies).
 ### Dead-letter queue
 ```python
@@ -284,14 +300,14 @@ There are three distinct shutdown shapes; pick the one that matches your runtime
 |---|---|---|---|
 | **Flag-based soft drain** (`GracefulInterruptHandler`) | First SIGINT/SIGTERM flips a flag | Runs to completion | Drained on the next claim call, not on signal arrival |
 | **Async task cancellation** (`asyncio.CancelledError`) | Framework cancels the worker task (Uvicorn/K8s SIGTERM in many setups) | **Hard abort** — message stays in `processing`; with VT it is reclaimed at deadline expiry, without VT it is orphaned | Not drained |
-| **Explicit drain** (`drain()` / `aclose()`) | You call the method | Caller's responsibility to let it finish (drain does **not** cancel) | Drained synchronously via the gateway recovery path |
+| **Explicit drain** (`drain()` / `aclose()`) | You call the method | Caller's responsibility to let it finish (drain does **not** cancel) | Drained synchronously via the gateway recovery path; new publishes are refused |
 Use `drain()` / `aclose()` to bridge K8s `preStop` / SIGTERM grace windows without
 relying on signal interception:
 ```python
 # sync — in your SIGTERM handler or preStop hook
-queue.drain(timeout=25)   # refuses new claims, recovers pending claim IDs
+queue.drain(timeout=25)   # refuses new publishes/claims, recovers pending claim IDs
 worker_thread.join()      # wait for in-flight process_message to finish
 # async — same shape
@@ -300,12 +316,19 @@ await worker_task         # task observes ``_draining`` and exits its loop
 ```
 `drain()` / `aclose()` set a queue-local flag so subsequent `process_message()`
-calls yield `None` immediately. They do not cancel in-flight handlers — the
-caller must arrange handler exit through normal thread/task coordination.
-Returns `True` if all in-memory pending claim IDs were recovered within the
-timeout; `False` if the deadline fired or transient Redis errors left claim
-IDs pending (call again to retry). `timeout=0` reports current state without
-attempting recovery.
+calls yield `None` immediately and subsequent `publish()` calls raise
+`QueueDrainedError("queue is drained")`. Drain also gates the publish path:
+if a publish is already inside the queue instance's publish path, drain waits
+for that publish to finish before it returns; publishes that arrive after the
+drained flag is set are rejected. The drained state is local to that Python
+queue object and is not written to Redis, so constructing a fresh
+`RedisMessageQueue(...)` over the same keys remains usable.
+Drain does not cancel in-flight handlers — the caller must arrange handler
+exit through normal thread/task coordination. Returns `True` if all in-memory
+pending claim IDs were recovered within the timeout; `False` if the deadline
+fired or transient Redis errors left claim IDs pending (call again to retry).
+`timeout=0` reports current state without attempting recovery.
 > **Heartbeat caveat (best-effort stop):** when `heartbeat_interval_seconds` is
 > set, the heartbeat sidecar's `stop()` is bounded but not strictly quiescent —
@@ -446,6 +469,32 @@ Notes:
   message. Do not call `fork()` from inside active message handlers unless the
   child exits without using the inherited queue/client.
+#### Forking after constructing GracefulInterruptHandler
+If your application constructed `GracefulInterruptHandler` in the parent process
+before `os.fork()` (for example, via module import in a pre-fork app server),
+forked children cannot construct a fresh handler for the same signal because the
+inherited signal table still routes to the parent-process handler.
+In each child process, call `parent_handler.reset()` before constructing a fresh
+handler:
+```python
+def worker_main():
+    # Inherited handler from parent - reset it.
+    if shared.interrupt_handler is not None:
+        shared.interrupt_handler.reset()
+    # Now safe to construct a fresh handler for this child.
+    interrupt = GracefulInterruptHandler()
+    queue = RedisMessageQueue("jobs", client=redis.Redis(), interrupt=interrupt)
+    ...
+```
+Alternatively, defer all construction (handler and queue) to inside
+`worker_main()` and pass `--no-preload` (or equivalent) to your app server. That
+avoids the parent-construct hazard entirely.
 ### Redis memory sizing for deduplication and replay metadata
 When deduplication is enabled, each distinct dedup key creates one Redis string
@@ -504,6 +553,7 @@ Package logs remain diagnostic; use `on_event` rather than log parsing for
 metrics.
 ```python
+from opentelemetry import trace
 from prometheus_client import Counter
 from redis_message_queue import QueueEvent, RedisMessageQueue
@@ -517,17 +567,78 @@ def observe(event: QueueEvent) -> None:
     events_total.labels(
         event.queue, event.operation, event.outcome, event.exception_type or ""
     ).inc()
+    if event.error is not None:
+        trace.get_current_span().record_exception(event.error)
 queue = RedisMessageQueue("jobs", client=client, on_event=observe)
 ```
+#### Event dispatch context
+Callbacks fire inline:
+- **Sync queue:** the callback runs in the caller's thread. It sees
+  contextvars, the OpenTelemetry current span, and structlog contextvars bound
+  by the caller.
+- **Async queue:** the callback is awaited in the current asyncio task. It has
+  the same contextvars, span, and structlog visibility.
+- **Sync heartbeat:** heartbeat events fire from a separate
+  `threading.Thread`. That thread does not inherit caller contextvars or the
+  caller's OpenTelemetry current span. Use `event.message_id` and
+  `event.lease_token_hash` for correlation.
+- **Async heartbeat:** heartbeat events fire from an asyncio task. The task
+  copies the context present when the heartbeat was started, so contextvars and
+  OpenTelemetry spans bound at handler entry are visible.
+#### Event timing vs. Redis commit
+Most events are post-commit, emitted after the Redis command or Lua script
+returned: `publish/success`, `publish_dedup_hit`, `claim/success`,
+`claim_empty`, `claim_reclaim`, `ack`, `nack`, `completed`, `dlq`,
+`lease_renew`, `trim_failed`, and `stale_lease_*`.
+Pre-commit and mid-flight exceptions:
+- `failed/failure` fires after the handler raises but before failed-queue
+  cleanup completes. Use `nack` for cleanup-commit metrics; use `failed` for
+  handler-exception attribution.
+- `retry_attempt/failure` and `retry_exhausted` fire on the claim-loop retry
+  path. The first Redis attempt may or may not have committed.
+- `publish/failure`, `claim/failure`, and `cleanup_failed/failure` follow
+  exceptions. Under an ambiguous lost response, Redis may have committed
+  despite the exception. Treat them as "operation did not succeed from the
+  caller's perspective", not "Redis did not commit".
+#### Intentionally silent paths
+The following operations have no `on_event` surface by design:
+- **B1 Cluster `pcall` cleanup failure:** three lease-aware Lua scripts wrap a
+  data-derived `DEL` in `redis.pcall(...)` and ignore the result. This
+  preserves queue safety on Cluster `CROSSSLOT` rejection but cannot be
+  observed through `on_event`. Operators watching key-TTL behavior or Redis
+  slow logs can detect orphans.
+- **VT claim-store OOM compensation:** if the visibility-timeout Lua script
+  cannot store the claim result, it removes the message from processing, pushes
+  it back to pending, and returns `false`. Python translates that into
+  `claim_empty/skipped`, the same shape as an empty poll. This is intentional
+  fail-safe behavior; the message is not lost.
+- **`drain()` / `close()` / `aclose()` lifecycle:** explicit shutdown
+  operations do not emit lifecycle events. Pending-claim-drain recovery work
+  counts as `claim_reclaim` events when reached.
+- **Non-claim-loop retry attempts:** tenacity retries in deduplicated publish,
+  ack/remove, move-to-completed/failed, and lease renewal collapse into the
+  terminal operation's failure event. There is no per-attempt event for those
+  paths.
 The public exception hierarchy is rooted at `RedisMessageQueueError`.
 Configuration value/combinations raise `ConfigurationError` (also a
 `ValueError`), custom gateway contract violations raise `GatewayContractError`
 (also a `TypeError`), and Lua `redis.error_reply(...)` failures raise
 `LuaScriptError` (also a redis-py `ResponseError`). Publish overload raises
-`QueueBackpressureError`. `CleanupFailedError` and `RetryBudgetExhaustedError`
-are reserved categories for cleanup and retry surfaces.
+`QueueBackpressureError`; publish after explicit drain raises
+`QueueDrainedError`. `CleanupFailedError` and `RetryBudgetExhaustedError` are
+reserved categories for cleanup and retry surfaces.
 ## Known limitations
@@ -538,13 +649,46 @@ are reserved categories for cleanup and retry surfaces.
 - **Cluster detection uses `isinstance(client, RedisCluster)`.** Wrapped or instrumented cluster clients that delegate without inheriting will bypass hash-tag validation. Custom gateways should set `is_redis_cluster = True` explicitly.
 - **Redis Cluster requires hash tags.** The built-in queue uses multiple Redis keys per operation. Wrap the queue name in hash tags (for example `{myqueue}`) so every generated key lands in the same slot. When you pass a Redis Cluster client to the built-in queue/gateway path, incompatible names are rejected early.
 - **Non-ASCII payloads use ~2x storage.** The default `ensure_ascii=True` in JSON serialization encodes non-ASCII characters as `\uXXXX` escape sequences. This is a deliberate compatibility choice.
-- **Client-side `Retry` can duplicate non-deduplicated publishes.** If you construct your `redis.Redis` client with `retry=Retry(...)`, redis-py retries `ConnectionError` / `TimeoutError` at the connection layer — *below* this library. Idempotent operations (deduplicated `publish()`, lease-scoped cleanup) are safe because their Lua scripts replay the original result. `add_message()` (used by `publish()` when `deduplication=False`) is a bare `LPUSH` by default, or a single non-idempotent Lua enqueue when `max_pending_length` is set: this library deliberately does not retry it, but a client-level `Retry` will, and if the server executed the command before the response was lost the message is enqueued twice. Leave `retry=None` (the default) if you need strict at-most-once semantics for non-deduplicated publishes, or accept the duplication risk. More broadly, any non-idempotent enqueue path is vulnerable if the connection drops after server execution but before the client receives the response; all other built-in operations (deduplicated publish, lease-scoped ack/move, lease renewal) use replay markers and are safe under client-level `Retry`.
+- **Client-side `Retry` can duplicate non-deduplicated publishes.** If you construct your `redis.Redis` or `redis.asyncio.Redis` client with `retry=Retry(...)`, redis-py retries `ConnectionError` / `TimeoutError` at the connection layer — *below* this library. Idempotent operations (deduplicated `publish()`, lease-scoped cleanup) are safe because their Lua scripts replay the original result. `add_message()` (used by `publish()` when `deduplication=False`) is a bare `LPUSH` by default, or a single non-idempotent Lua enqueue when `max_pending_length` is set: this library deliberately does not retry it, but a client-level `Retry` will, and if the server executed the command before the response was lost the message is enqueued twice. redis-py 6.0+ changed the default standalone `Redis()` / `redis.asyncio.Redis()` retry policy from `None` (no retry) to a 3-attempt `ExponentialWithJitterBackoff`; pass `retry=None` explicitly if you need strict at-most-once semantics for non-deduplicated publishes, or accept the duplication risk. More broadly, any non-idempotent enqueue path is vulnerable if the connection drops after server execution but before the client receives the response; all other built-in operations (deduplicated publish, lease-scoped ack/move, lease renewal) use replay markers and are safe under client-level `Retry`.
+  ```python
+  import redis
+  from redis_message_queue import RedisMessageQueue
+  # Strict at-most-once for non-dedup messages: disable redis-py's
+  # default 3-retry policy explicitly.
+  client = redis.Redis(retry=None)
+  queue = RedisMessageQueue("jobs", client=client)
+  ```
+  ```python
+  import redis.asyncio as redis
+  from redis_message_queue.asyncio import RedisMessageQueue
+  # Strict at-most-once for non-dedup messages: disable redis-py's
+  # default 3-retry policy explicitly.
+  client = redis.Redis(retry=None)
+  queue = RedisMessageQueue("jobs", client=client)
+  ```
 - **Redis Cluster default retry can stack with this library's retry budget.** In redis-py 6.0+, `RedisCluster()` constructs a default `ExponentialWithJitterBackoff` retry below this library's `retry_budget_seconds`. If you need a single retry surface, pass `retry=Retry(NoBackoff(), 0)` to the cluster client or reduce `retry_budget_seconds` to account for the lower-level retry window.
 For a full analysis, see [docs/production-readiness.md](docs/production-readiness.md).
 ## Upgrading
+### v6 to v7 migration
+v7.0.0 changes explicit drain shutdown semantics. After `queue.drain()` /
+`queue.close()` (sync) or `await queue.drain()` / `await queue.aclose()`
+(async), the same queue instance rejects `publish()` with
+`QueueDrainedError("queue is drained")`.
+This state is queue-local and process-local; it is not stored in Redis. If a
+producer must continue publishing after a worker has drained, use a separate
+`RedisMessageQueue(...)` instance for that producer lifecycle. During
+shutdown, catch `QueueDrainedError` only at boundaries where late publishes are
+expected and safe to drop or reschedule.
 ### Configuration changes on live queues
 > **Warning:** These changes are destructive on live queues. Drain the queue completely before applying them.
@@ -570,6 +714,9 @@ v6.0.0 is a non-breaking-defaults release that adds new public APIs. v5 code con
 - `max_pending_length=N` caps pending-list depth; with `pending_overload_policy="raise"` (default) producers see `QueueBackpressureError` when the cap is hit; `"block"` waits up to `pending_overload_block_timeout_seconds`; `"drop_oldest"` evicts silently, so use it only when data loss is acceptable.
 - `queue.drain(timeout=...)` (sync) and `await queue.aclose(timeout=...)` (async) are explicit graceful-shutdown hooks. They refuse new claims and recover pending claim IDs but do not cancel in-flight handlers; join or await your worker separately.
 - `on_event=callback` receives a `QueueEvent` dataclass for every publish/claim/ack/reclaim/dedup/cleanup lifecycle event. Use it for metrics, tracing, and structured logging. See [`examples/production/observability.py`](examples/production/observability.py) for the adapter pattern.
+- See [`examples/production/backpressure.py`](examples/production/backpressure.py) and [`examples/production/graceful_shutdown.py`](examples/production/graceful_shutdown.py) for sync production patterns, with async siblings under [`examples/production/asyncio/`](examples/production/asyncio/).
+> When using a pre-fork app server (gunicorn `--preload`, uvicorn workers that import the app at master startup), call `make_queue()` from your worker startup hook - NOT at module import. See [Fork safety](#fork-safety-and-pre-fork-servers) for why.
 **New constructor rejections:**
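
The v6 to v7 migration note in this hunk says the drained state is queue-local and process-local, and is not stored in Redis. Below is a minimal sketch of what that implies, using a toy in-memory stand-in rather than the library itself. `ToyQueue`, its `backend` dict, and the local `QueueDrainedError` class are hypothetical illustrations; only the exception name and its message come from the diff.

```python
class QueueDrainedError(Exception):
    """Local stand-in for the library exception of the same name."""


class ToyQueue:
    """Hypothetical in-memory model; NOT the real RedisMessageQueue."""

    def __init__(self, name, backend):
        self.name = name
        self._backend = backend   # shared dict, plays the role of Redis
        self._drained = False     # per-instance flag, never written to the backend

    def publish(self, message):
        if self._drained:
            raise QueueDrainedError("queue is drained")
        self._backend.setdefault(self.name, []).append(message)

    def drain(self):
        # Only this instance is affected; the shared backend is untouched.
        self._drained = True


backend = {}
worker_queue = ToyQueue("jobs", backend)
producer_queue = ToyQueue("jobs", backend)   # separate producer lifecycle

worker_queue.publish("a")
worker_queue.drain()

# A late publish on the drained instance is rejected.
try:
    worker_queue.publish("b")
    late_publish_rejected = False
except QueueDrainedError:
    late_publish_rejected = True

# The separate instance keeps publishing to the same queue name.
producer_queue.publish("c")
```

In this model the drained flag lives on the instance, so a second instance pointing at the same queue name is unaffected. That matches the note's advice to use a separate `RedisMessageQueue(...)` instance for a producer that must outlive a drained worker.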

{redis_message_queue-6.0.1 → redis_message_queue-7.0.0}/pyproject.toml RENAMED Viewed

@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "redis-message-queue"
-version = "6.0.1"
+version = "7.0.0"
 description = "Python message queuing with Redis and message deduplication"
 authors = ["Elijas <4084885+Elijas@users.noreply.github.com>"]
 readme = "README.md"
@@ -24,7 +24,7 @@ Issues = "https://github.com/Elijas/redis-message-queue/issues"
 [tool.poetry.dependencies]
 python = "^3.12"
-redis = ">=5.0.0"
+redis = ">=5.0.0,<8.0.0"
 tenacity = ">=8.1.0"
 [tool.poetry.group.test.dependencies]

{redis_message_queue-6.0.1 → redis_message_queue-7.0.0}/redis_message_queue/__init__.py RENAMED Viewed

@@ -6,6 +6,7 @@ from redis_message_queue._exceptions import (
     GatewayContractError,
     LuaScriptError,
     QueueBackpressureError,
+    QueueDrainedError,
     RedisMessageQueueError,
     RetryBudgetExhaustedError,
 )
@@ -33,6 +34,7 @@ __all__ = [
     "GatewayContractError",
     "LuaScriptError",
     "QueueBackpressureError",
+    "QueueDrainedError",
     "CleanupFailedError",
     "RetryBudgetExhaustedError",
 ]

{redis_message_queue-6.0.1 → redis_message_queue-7.0.0}/redis_message_queue/_abstract_redis_gateway.py RENAMED Viewed

@@ -83,12 +83,13 @@ class AbstractRedisGateway(ABC):
         command can silently duplicate the message. The caller can still
         retry (accepting duplicates).
-        Note: a client-level retry policy bypasses this guarantee. If the
-        underlying ``redis.Redis`` / ``redis.asyncio.Redis`` client was
-        constructed with ``retry=Retry(...)``, redis-py retries on
-        ``ConnectionError`` / ``TimeoutError`` below this call and may
-        duplicate. Pass ``retry=None`` (the default) when strict at-most-once
-        is required for non-deduplicated publishes.
+        Note on retries: redis-py 6.0+ changed the default standalone
+        ``Redis()`` / ``redis.asyncio.Redis()`` retry policy from ``None`` (no
+        retry) to a 3-attempt ``ExponentialWithJitterBackoff``. If you need
+        strict at-most-once for non-deduplicated publishes, pass ``retry=None``
+        explicitly when constructing the redis-py client. This library does
+        not configure the redis-py client retry; it only controls its own
+        retry budget on top of the client.
         """
     @abstractmethod
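
The docstring change above warns that a client-level retry can duplicate a non-deduplicated publish when the server ran the command but the response was lost. The failure mode can be sketched with a toy simulation that does not touch Redis. Everything here (`ToyServer`, `LostResponse`, `call_with_retry`, the seen-ID set) is a hypothetical model; the seen-ID set is a simplified stand-in for the replay markers the README mentions, not the library's Lua scripts.

```python
class LostResponse(Exception):
    """Models a connection drop after the server already ran the command."""


class ToyServer:
    def __init__(self):
        self.pending = []
        self.seen_ids = set()
        self.drop_next_response = False

    def _maybe_drop(self):
        if self.drop_next_response:
            self.drop_next_response = False
            raise LostResponse()

    def lpush(self, message):
        # Non-idempotent: the push happens first, then the reply may be lost.
        self.pending.insert(0, message)
        self._maybe_drop()
        return len(self.pending)

    def lpush_once(self, message_id, message):
        # Idempotent variant: a repeated ID does not enqueue again.
        if message_id not in self.seen_ids:
            self.seen_ids.add(message_id)
            self.pending.insert(0, message)
        self._maybe_drop()
        return len(self.pending)


def call_with_retry(fn, attempts=3):
    # Plays the role of a client-level retry policy sitting below the caller.
    for attempt in range(attempts):
        try:
            return fn()
        except LostResponse:
            if attempt == attempts - 1:
                raise


plain = ToyServer()
plain.drop_next_response = True
call_with_retry(lambda: plain.lpush("job-1"))

guarded = ToyServer()
guarded.drop_next_response = True
call_with_retry(lambda: guarded.lpush_once("id-1", "job-1"))
```

After one lost response, the plain push leaves two copies of `job-1` in `plain.pending`, while the guarded push leaves one in `guarded.pending`. With retries disabled (`attempts=1` in this model), the plain path would surface the error to the caller instead of silently enqueueing twice.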
