mongodash 2.1.0 → 2.1.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (68)
  1. package/README.md +70 -5
  2. package/dist/dashboard/index.html +8 -8
  3. package/dist/lib/playground/server.js +1 -8
  4. package/dist/lib/src/ConcurrentRunner.js +1 -8
  5. package/dist/lib/src/OnError.js +1 -8
  6. package/dist/lib/src/OnInfo.js +1 -8
  7. package/dist/lib/src/createContinuousLock.js +3 -8
  8. package/dist/lib/src/createContinuousLock.js.map +1 -1
  9. package/dist/lib/src/cronTasks.js +1 -8
  10. package/dist/lib/src/getCollection.js +1 -8
  11. package/dist/lib/src/getMongoClient.js +1 -8
  12. package/dist/lib/src/globalsCollection.js +1 -8
  13. package/dist/lib/src/index.js +1 -8
  14. package/dist/lib/src/initPromise.js +1 -8
  15. package/dist/lib/src/mongoCompatibility.js +1 -8
  16. package/dist/lib/src/parseInterval.js +1 -8
  17. package/dist/lib/src/prefixFilterKeys.js +1 -8
  18. package/dist/lib/src/processInBatches.js +1 -8
  19. package/dist/lib/src/reactiveTasks/LeaderElector.js +1 -8
  20. package/dist/lib/src/reactiveTasks/MetricsCollector.js +1 -8
  21. package/dist/lib/src/reactiveTasks/MetricsCollector.js.map +1 -1
  22. package/dist/lib/src/reactiveTasks/ReactiveTaskManager.js +1 -8
  23. package/dist/lib/src/reactiveTasks/ReactiveTaskOps.js +1 -8
  24. package/dist/lib/src/reactiveTasks/ReactiveTaskPlanner.js +1 -8
  25. package/dist/lib/src/reactiveTasks/ReactiveTaskReconciler.js +1 -8
  26. package/dist/lib/src/reactiveTasks/ReactiveTaskRegistry.js +1 -8
  27. package/dist/lib/src/reactiveTasks/ReactiveTaskRepository.js +1 -8
  28. package/dist/lib/src/reactiveTasks/ReactiveTaskRetryStrategy.js +1 -8
  29. package/dist/lib/src/reactiveTasks/ReactiveTaskTypes.js +1 -8
  30. package/dist/lib/src/reactiveTasks/ReactiveTaskWorker.js +1 -8
  31. package/dist/lib/src/reactiveTasks/compileWatchProjection.js +1 -8
  32. package/dist/lib/src/reactiveTasks/index.js +1 -8
  33. package/dist/lib/src/reactiveTasks/queryToExpression.js +1 -8
  34. package/dist/lib/src/reactiveTasks/validateTaskFilter.js +1 -8
  35. package/dist/lib/src/task-management/OperationalTaskController.js +1 -8
  36. package/dist/lib/src/task-management/index.js +1 -8
  37. package/dist/lib/src/task-management/serveDashboard.js +1 -9
  38. package/dist/lib/src/task-management/serveDashboard.js.map +1 -1
  39. package/dist/lib/src/task-management/types.js +1 -8
  40. package/dist/lib/src/withLock.js +1 -8
  41. package/dist/lib/src/withTransaction.js +1 -8
  42. package/dist/lib/tools/check-db-connection.js +1 -8
  43. package/dist/lib/tools/clean-testing-databases.js +1 -8
  44. package/dist/lib/tools/prepare-republish.js +1 -8
  45. package/dist/lib/tools/test-matrix-local.js +1 -8
  46. package/dist/lib/tools/testingDatabase.js +1 -8
  47. package/docs/.vitepress/cache/deps/_metadata.json +6 -6
  48. package/docs/.vitepress/config.mts +20 -1
  49. package/docs/.vitepress/theme/style.css +5 -0
  50. package/docs/getting-started.md +75 -9
  51. package/docs/index.md +4 -1
  52. package/docs/initialization.md +1 -1
  53. package/docs/public/logo-backgroundless.png +0 -0
  54. package/docs/public/logo.png +0 -0
  55. package/docs/reactive-tasks/configuration.md +89 -0
  56. package/docs/reactive-tasks/core-concepts.md +56 -0
  57. package/docs/reactive-tasks/evolution.md +62 -0
  58. package/docs/reactive-tasks/examples.md +66 -0
  59. package/docs/reactive-tasks/getting-started.md +54 -0
  60. package/docs/reactive-tasks/guides.md +237 -0
  61. package/docs/reactive-tasks/index.md +44 -0
  62. package/docs/reactive-tasks/management.md +86 -0
  63. package/docs/reactive-tasks/monitoring.md +76 -0
  64. package/docs/reactive-tasks/policy-cleanup.md +70 -0
  65. package/docs/reactive-tasks/policy-retry.md +60 -0
  66. package/docs/reactive-tasks/reconciliation.md +40 -0
  67. package/package.json +11 -10
  68. package/docs/reactive-tasks.md +0 -914
@@ -0,0 +1,237 @@ package/docs/reactive-tasks/guides.md
# Guides & Patterns

## Idempotency & Re-execution

The system is designed with an **At-Least-Once** execution guarantee. This is a fundamental property of distributed systems that value reliability over "exactly-once".

While the system strives to execute your handler exactly once per event, there are specific scenarios where it might execute multiple times for the same document state. Therefore, **your `handler` must be idempotent**.

### Common Re-execution Scenarios

1. **Transient Failures ([Retries](./policy-retry.md))**: If a worker crashes or loses network connectivity during execution (before marking the task `completed`), the lock will expire. Another worker will pick up the task and retry it.
2. **[Reconciliation](./reconciliation.md) Recovery**: If task records are deleted (e.g. by manual cleanup) but the source documents remain, the next reconciliation run recreates them as `pending`.
3. **Filter Re-matching**: If a document stops matching the task filter, its task is deleted (under the **[sourceDocumentDeletedOrNoLongerMatching](./policy-cleanup.md)** cleanup policy), and the document later changes back to match the filter again, the task will be recreated as `pending`.
4. **Explicit Reprocessing**: You might trigger re-execution manually (via `retryReactiveTasks`) or through [schema evolution policies](./evolution.md) (`reprocess_all`).

### Designing Idempotent Handlers

Ensure your handler allows multiple executions without adverse side effects.

**Example**:
```typescript
handler: async (context) => {
    // 1. Fetch document (with verification)
    const order = await context.getDocument();

    // 2. Check if the work is already done
    if (order.emailSent) return;

    // 3. Perform the side-effect
    await sendEmail(order.userId, "Order Received");

    // 4. Mark as done (using atomic update)
    await db.collection('orders').updateOne(
        { _id: order._id },
        { $set: { emailSent: true } }
    );
}
```

## The Handler Context

### `getDocument` & Safety Checks

Critically, the library performs a **runtime check** when you call `await context.getDocument()` inside your handler.

1. **Lock Task**: The worker locks the task.
2. **Fetch & Verify**: When you call `await context.getDocument()`, it performs an atomic fetch that ensures:
    * **Filter Match**: The document still matches your `filter` configuration.
    * **Data Consistency**: The watched fields (`watchProjection`) have NOT changed since the task was triggered (Optimistic Locking).
    * **Existence**: The document still exists.

If any of these conditions fail, `getDocument` throws a `TaskConditionFailedError`. The worker catches this error, effectively **skipping** the task and marking it as `completed`.

**Why is this important?**
* **Race Conditions**: Imagine a "Back-In-Stock" task triggered when `inventory > 0`. If the item sells out immediately (`inventory` returns to `0`) *while* the task is waiting in the queue, this check prevents sending a false notification.
* **Optimistic Concurrency**: If the data changed significantly (e.g. `status` changed from `paid` to `refunded`) between trigger and execution, the task is skipped to effectively "cancel" the stale operation. A new task for the new state (`refunded`) will likely be in the queue anyway.

#### Advanced Usage: Options & Transactions

The `getDocument(options)` method accepts standard MongoDB `FindOptions`, allowing you to optimize performance or ensure consistency.

**1. Projections (Partial Fetch)**
If your source document is large but you only need a few fields, use `projection`.

```typescript
const user = await context.getDocument({
    projection: { email: 1, firstName: 1 }
});
```

**2. Transactions (`session`)**
To ensure atomic updates across multiple collections, pass a `session` to `getDocument`. This ensures that the document fetch and your subsequent writes happen within the same transaction snapshot.

```typescript
import { withTransaction } from 'mongodash';

handler: async (context) => {
    await withTransaction(async (session) => {
        // Pass session to getDocument to participate in the transaction
        const doc = await context.getDocument({ session });

        // Perform other operations in the same transaction
        await otherCollection.updateOne({ _id: doc.refId }, { $set: { ... } }, { session });
    });
}
```

**3. Locking Resources (`withLock`)**
While the *task itself* is locked (ensuring only one worker processes this specific task instance), you might need to lock shared resources if your handler accesses data outside the source document.

You can use `context.watchedValues` to get IDs needed for locking *before* you fetch the document.

```typescript
import { withLock } from 'mongodash';

handler: async (context) => {
    // Use watchedValues to get the ID for locking
    const accountId = context.watchedValues.accountId;

    // Lock a shared resource
    await withLock(`account-update-${accountId}`, async () => {
        // Now it is safe to fetch and process
        const doc = await context.getDocument();
        // ... safe exclusive access to the account ...
    });
}
```

### Flow Control (Defer / Throttle)

Sometimes you need dynamic control over task execution speed based on external factors (e.g., rate limits of a 3rd party API) or business logic.

The `handler` receives a `context` object that exposes flow control methods.

#### 1. Deferral (`deferCurrent`)

Delays the **current** task execution. The task is put back into the queue specifically for this document and will not be picked up again until the specified time.

This is useful for:
* **Rate Limits**: "API returned 429, try again in 30 seconds."
* **Business Waits**: "Customer created, but wait 1 hour before sending first email."

```typescript
await reactiveTask({
    task: 'send-webhook',
    collection: 'events',
    handler: async (context) => {
        const doc = await context.getDocument();
        try {
            await sendWebhook(doc);
        } catch (err) {
            if (err.status === 429) {
                const retryAfter = err.headers['retry-after'] || 30; // seconds

                // Defer THIS task only.
                // It resets status to 'pending' and schedules it for future.
                // It does NOT increment attempt count (it's not a failure).
                context.deferCurrent(retryAfter * 1000);
                return;
            }
            throw err; // Use standard retry policy for other errors
        }
    }
});
```

#### 2. Throttling (`throttleAll`)

Pauses all FUTURE tasks of this type for a specified duration. This serves as a "Circuit Breaker" when an external system (e.g., CRM, Payment Gateway) is unresponsive or returns overload errors (503, 429).

```typescript
context.throttleAll(60 * 1000); // Pause this task type for 1 minute
```

> [!IMPORTANT]
> **Cluster Behavior (Instance-Local)**
> `throttleAll` operates only in the memory of the current instance (worker).
> In a distributed environment (e.g., Kubernetes with multiple pods), other instances will not know about the issue immediately. They will continue processing until they independently encounter the error and trigger their own `throttleAll`.
>
> **Result**: The load on the external service will not drop to zero immediately but will decrease gradually as individual instances hit the "circuit breaker".

> [!NOTE]
> **Current Task**
> `throttleAll` does not affect the currently running task. If you want to postpone the current task (so it counts as pending and retries after the pause), you must explicitly call `deferCurrent()`.

**Example (Service Down):**

```typescript
await reactiveTask({
    task: 'sync-to-crm',
    collection: 'users',
    handler: async (context) => {
        // Note: You can throttle even before fetching the doc if you know the service is down!
        try {
            const doc = await context.getDocument();
            await crmApi.update(doc);
        } catch (err) {
            // If service is unavailable (503) or circuit breaker is open
            if (err.status === 503 || err.isCircuitBreakerOpen) {
                console.warn('CRM is down, pausing tasks for 1 minute.');

                // 1. Stop processing future tasks of this type on this instance
                context.throttleAll(60 * 1000);

                // 2. Defer the CURRENT task so it retries after the pause
                context.deferCurrent(60 * 1000);
                return;
            }
            throw err; // Standard retry policy for other errors
        }
    }
});
```

## Graceful Shutdown

When shutting down your application, call `stopReactiveTasks()` in your termination signal handlers to ensure in-progress tasks complete and resources are released cleanly.

**Recommended Pattern:**

```typescript
import { stopReactiveTasks } from 'mongodash';

const gracefulShutdown = async (signal: string) => {
    console.log(`${signal} received, shutting down...`);

    // Set timeout to force exit if shutdown hangs
    const timeout = setTimeout(() => {
        console.error('Shutdown timeout, forcing exit');
        process.exit(1);
    }, 30000);

    try {
        await stopReactiveTasks(); // Stop tasks BEFORE closing DB
        await server.close();      // Close your HTTP server
        await db.disconnect();     // Close database connection

        clearTimeout(timeout);
        process.exit(0);
    } catch (err) {
        console.error('Shutdown error:', err);
        process.exit(1);
    }
};

process.on('SIGTERM', () => gracefulShutdown('SIGTERM')); // Docker, K8s
process.on('SIGINT', () => gracefulShutdown('SIGINT'));   // Ctrl+C
```

> [!IMPORTANT]
> Always call `stopReactiveTasks()` **before** closing database connections, as the stop process needs to communicate with MongoDB.

> [!NOTE]
> **Self-Healing Design**: While graceful shutdown is recommended best practice, the system is designed to be resilient. If your application crashes or is forcefully terminated, task locks will automatically expire after a timeout (default: 1 minute), allowing other instances to pick up and process the unfinished tasks. Similarly, leadership locks expire, ensuring another instance takes over. This guarantees eventual task processing even in failure scenarios.
@@ -0,0 +1,44 @@ package/docs/reactive-tasks/index.md
# Reactive Tasks

A powerful, distributed task execution system built on top of [MongoDB Change Streams](https://www.mongodb.com/docs/manual/changeStreams/).

Reactive Tasks allow you to define background jobs that trigger automatically when your data changes. This enables **Model Data-Driven Flows**, where business logic is triggered by state changes (e.g., `status: 'paid'`) rather than explicit calls. The system handles **concurrency**, **retries**, **deduplication**, and **monitoring** out of the box.

## Features

- **Reactive**: Tasks triggered instantly (near-real-time) by database changes (insert/update).
- **Distributed**: Safe to run on multiple instances (Kubernetes/Serverless). Only one instance processes a specific task for a specific document at a time.
- **Efficient Listener**: Regardless of the number of application instances, **only one instance (the leader)** listens to the MongoDB Change Stream. This minimizes database load significantly (**O(1) connections**), though it implies that the total ingestion throughput is limited by the single leader instance.
- **[Reliable Retries](./policy-retry.md)**: Built-in retry mechanisms (exponential backoff) and "Dead Letter Queue" logic.
- **Efficient**: Uses MongoDB Driver for low-latency updates and avoids polling where possible.
- **Memory Efficiency**: The system is designed to handle large datasets. During live scheduling (Change Streams), reconciliation, and periodic cleanup, the library **only loads the `_id`'s** of the source documents into memory, keeping the footprint low regardless of the collection size. Note that task *storage* size depends on your `watchProjection` configuration—see [Storage Optimization](./core-concepts.md#change-detection-and-storage-optimization).
- **[Concurrency Control](./configuration.md)**: Limit parallel execution to protect downstream resources.
- **[Deduplication](./guides.md#idempotency--re-execution)**: Automatic debouncing ("wait for data to settle") and task merging.
- **[Observability](./monitoring.md)**: First-class Prometheus metrics support.
- **[Dashboard](../dashboard.md)**: A visual Dashboard to monitor, retry, and debug tasks.
- **Developer Friendly**: Zero-config local development, fully typed with TypeScript.

## Reactive vs Scheduled Tasks

It is important to distinguish between Reactive Tasks and standard schedulers (like Agenda or BullMQ).

- **Reactive Tasks (Reactors)**: Triggered by **state changes** (data). "When Order is Paid, send email". This guarantees consistency with data.
- **Schedulers**: Triggered by **time**. "Send email at 2:00 PM".

Reactive Tasks support time-based operations via `debounce` (e.g., "Wait 1m after a data change to let it settle") and `deferCurrent` (e.g., "Retry in 5m"), but they are fundamentally event-driven. If you need purely time-based jobs (e.g., a "Daily Report" with no data-change trigger), you can trigger them via a [Cron job](../cron-tasks.md), or model them reactively as "run on insert into a `daily_reports` collection".

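To make the distinction concrete, a state-change-driven task with a settle delay might be registered as follows. This is an illustrative sketch only: `reactiveTask` with `filter` and `handler` appears throughout these docs, but the exact shape of the `debounce` option is an assumption here — see the [configuration](./configuration.md) reference for the real option name and format.

```typescript
import { reactiveTask } from 'mongodash';

// Sketch: a data-driven job that waits for the document to settle.
// NOTE: the exact `debounce` option shape is assumed for illustration.
await reactiveTask({
    task: 'recalculate-invoice',
    collection: 'orders',
    filter: { status: 'paid' }, // triggered by a state change, not by time
    debounce: '1m',             // wait for the data to settle before running
    handler: async (context) => {
        const order = await context.getDocument();
        // ... recalculate invoice totals for the settled order ...
    },
});
```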
## Advantages over Standard Messaging

Using Reactive Tasks instead of a traditional message broker (RabbitMQ, Kafka) provides distinct architectural benefits:

1. **Lean Stack & Simplified DevOps**:
    - Eliminates the need to manage, scale, and secure external message brokers.
    - **Zero-Config Development**: Local testing requires only the database connection—no extra Docker containers or infrastructure to spin up.

2. **Transactional Consistency (Solving the "Dual Write" Problem)**:
    - *The Problem*: In standard architectures, writing to the database and publishing an event are two separate operations. If the database write succeeds but the message publish fails (network error, crash), your system enters an inconsistent state.
    - *The Solution*: With Reactive Tasks, the "event" **is** the database write. The task is triggered directly by the MongoDB Oplog. This guarantees that **if and only if** data is persisted, the corresponding task will be scheduled—ensuring 100% data consistency without distributed transactions.

3. **Inspectable State**:
    - The task queue is stored in a standard MongoDB collection (`[collection]_tasks`), not in a hidden broker queue.
    - You can use standard tools (MongoDB Compass, Atlas Data Explorer, simple queries) to inspect pending jobs, debug failures, and analyze queue distribution without needing specialized queue management interfaces.
@@ -0,0 +1,86 @@ package/docs/reactive-tasks/management.md
# Task Management & DLQ

You can programmatically manage tasks, investigate failures, and handle Dead Letter Queues (DLQ) using the exported management API.

These functions allow you to build custom admin UIs or automated recovery workflows.

> [!TIP]
> **Dashboard Available**
> While this page describes the programmatic API, Mongodash also provides a **[Visual Dashboard](../dashboard.md)** (GUI) which wraps these methods in a user-friendly interface. The dashboard allows you to view task lists, filter by status/error, retry failed tasks, and trigger cron jobs without writing any code.

## Listing Tasks

Use `getReactiveTasks` to inspect the queue. You can filter by task name, status, error message, or properties of the **source document**.

```typescript
import { getReactiveTasks } from 'mongodash';

// list currently failed tasks
const failedTasks = await getReactiveTasks({
    task: 'send-welcome-email',
    status: 'failed'
});

// list with pagination
const page1 = await getReactiveTasks(
    { task: 'send-welcome-email' },
    { limit: 50, skip: 0, sort: { scheduledAt: -1 } }
);

// Advanced: Helper to find task by properties of the SOURCE document
// This is powerful: "Find the task associated with Order #123"
const orderTasks = await getReactiveTasks({
    task: 'sync-order',
    sourceDocFilter: { _id: 'order-123' }
});

// Advanced: Find tasks where source document matches complex filter
// "Find sync tasks for all VIP users"
const vipTasks = await getReactiveTasks({
    task: 'sync-order',
    sourceDocFilter: { isVip: true }
});
```

## Counting Tasks

Use `countReactiveTasks` for metrics or UI badges.

```typescript
import { countReactiveTasks } from 'mongodash';

const dlqSize = await countReactiveTasks({
    task: 'send-welcome-email',
    status: 'failed'
});
```

## Retrying Tasks

Use `retryReactiveTasks` to manually re-trigger tasks. This is useful for DLQ recovery after fixing a bug.

This operation is **concurrency-safe**. If a task is currently `processing`, it will be marked to re-run immediately after the current execution finishes (`processing_dirty`), ensuring no race conditions.

```typescript
import { retryReactiveTasks } from 'mongodash';

// Retry ALL failed tasks for a specific job
const result = await retryReactiveTasks({
    task: 'send-welcome-email',
    status: 'failed'
});
console.log(`Retried ${result.modifiedCount} tasks.`);

// Retry specific task by Source Document ID
await retryReactiveTasks({
    task: 'sync-order',
    sourceDocFilter: { _id: 'order-123' }
});

// Bulk Retry: Retry all tasks for "VIP" orders
// This efficiently finds matching tasks and schedules them for execution.
await retryReactiveTasks({
    task: 'sync-order',
    sourceDocFilter: { isVip: true }
});
```
@@ -0,0 +1,76 @@ package/docs/reactive-tasks/monitoring.md
# Monitoring

Mongodash provides built-in Prometheus metrics to monitor your reactive tasks.

> [!NOTE]
> **Dependency Required**: You must install `prom-client` yourself to use this feature. It is an optional peer dependency.
> ```bash
> npm install prom-client
> ```

## Configuration

Monitoring is configured in the initialization options under the `monitoring` key:

```typescript
await mongodash.init({
    // ...
    monitoring: {
        enabled: true,                       // Default: true
        scrapeMode: 'cluster',               // 'cluster' (default) or 'local'
        pushIntervalMs: 60000,               // How often instances synchronize metrics (default: 1m). Relevant only if scrapeMode is 'cluster'.
        readPreference: 'secondaryPreferred' // 'primary', 'secondaryPreferred' etc.
    }
});
```

- **scrapeMode**:
    - `'cluster'` (Default): Returns aggregated system-wide metrics. Any instance can respond to this request (by fetching state from the DB). It aggregates metrics from all other active instances. (Recommended for Load Balancers / Heroku)
    - `'local'`: Returns local metrics for THIS instance. If this instance is the Leader, it ALSO includes Global System Metrics (Queue Depth, Lag) so they are reported exactly once in the cluster. (Recommended for K8s Pod Monitors)

## Retrieving Metrics

Expose the metrics endpoint (e.g., in Express):

```typescript
import { getPrometheusMetrics } from 'mongodash';

app.get('/metrics', async (req, res) => {
    const registry = await getPrometheusMetrics();

    if (registry) {
        res.set('Content-Type', registry.contentType);
        return res.end(await registry.metrics());
    }

    res.status(503).send('Monitoring disabled');
});
```

## Available Metrics

The system exposes the following metrics with standardized labels:

| Metric Name | Type | Labels | Description |
| :--- | :--- | :--- | :--- |
| `reactive_tasks_duration_seconds` | Histogram | `task_name`, `status` | Distribution of task processing time (success/failure). |
| `reactive_tasks_retries_total` | Counter | `task_name` | Total number of retries attempted. |
| `reactive_tasks_queue_depth` | Gauge | `task_name`, `status` | Current number of tasks in the queue, grouped by status (`pending`, `processing`, `processing_dirty`, `failed`). |
| `reactive_tasks_global_lag_seconds` | Gauge | `task_name` | Age of the oldest `pending` task, measured from `initialScheduledAt` (or `scheduledAt` if not deferred). This ensures deferred tasks still reflect their true waiting time. |
| `reactive_tasks_change_stream_lag_seconds` | Gauge | *none* | Time difference between now and the last processed Change Stream event. |
| `reactive_tasks_last_reconciliation_timestamp_seconds` | Gauge | *none* | Timestamp when the last full reconciliation (recovery) finished. |

## Grafana Dashboard

A comprehensive **Grafana Dashboard** ("Reactive Tasks - System Overview") is included with the package.

It provides real-time visibility into:
- System Health & Global Lag
- Throughput & Latency Heatmaps
- Queue Depth & Composition
- Error Rates & Retries

You can find the dashboard JSON file at:
`node_modules/mongodash/grafana/reactive_tasks.json`

Import this file directly into Grafana to get started.
@@ -0,0 +1,70 @@ package/docs/reactive-tasks/policy-cleanup.md
# Cleanup Policy

The Cleanup Policy controls automatic deletion of orphaned task records — tasks whose source documents have been deleted or no longer match the configured filter.

## Configuration

```typescript
cleanupPolicy?: {
    deleteWhen?: 'sourceDocumentDeleted' | 'sourceDocumentDeletedOrNoLongerMatching' | 'never';
    keepFor?: string | number;
}
```

| Property | Type | Default | Description |
|----------|------|---------|-------------|
| `deleteWhen` | `string` | `'sourceDocumentDeleted'` | When to trigger task deletion |
| `keepFor` | `string \| number` | `'24h'` | Grace period before deletion (e.g., `'1h'`, `'7d'`, or `86400000` ms) |

### Deletion Strategies (`deleteWhen`)

| Strategy | Behavior |
|----------|----------|
| `sourceDocumentDeleted` | **Default.** Task deleted only when its source document is deleted from the database. Filter mismatches are ignored. |
| `sourceDocumentDeletedOrNoLongerMatching` | Task deleted when the source document is deleted **OR** when it no longer matches the task's `filter`. Useful when the document change is permanent and the document is not expected to match (and thereby re-trigger) again in the future. Also useful for `$$NOW`-based or dynamic filters. |
| `never` | Tasks are never automatically deleted. Use for audit trails or manual cleanup scenarios. |

### Grace Period Calculation

The `keepFor` grace period is measured from `MAX(updatedAt, lastFinalizedAt)`:

- **`updatedAt`**: When the source document's watched fields (`watchProjection`) last changed
- **`lastFinalizedAt`**: When a worker last completed or failed the task

This ensures tasks are protected if either:
1. The source data changed recently, OR
2. A worker processed the task recently

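The rule above can be sketched as a tiny helper (illustrative only, not library code — `keepForMs` is the `keepFor` value converted to milliseconds, and the two timestamps are the task-record fields described above):

```typescript
// A task is eligible for cleanup only once `keepFor` has elapsed since the
// LATER of its last data change and its last finalization.
function isEligibleForCleanup(
    updatedAt: Date,
    lastFinalizedAt: Date | null,
    keepForMs: number,
    now: Date = new Date(),
): boolean {
    const reference = Math.max(updatedAt.getTime(), lastFinalizedAt?.getTime() ?? 0);
    return now.getTime() - reference >= keepForMs;
}
```

So with the default `keepFor: '24h'`, a task finalized two hours ago stays protected even if its source data last changed days ago.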
### Example: Dynamic Filter Cleanup

```typescript
await reactiveTask({
    task: 'remind-pending-order',
    collection: 'orders',
    // Match orders pending for more than 24 hours
    filter: { $expr: { $gt: ['$$NOW', { $add: ['$createdAt', 24 * 60 * 60 * 1000] }] } },

    cleanupPolicy: {
        deleteWhen: 'sourceDocumentDeletedOrNoLongerMatching',
        keepFor: '1h', // Keep it at least 1 hour after last scheduled matching or finalization
    },

    handler: async (context) => { /* Fetch the order and send a reminder email */ }
});
```

### Scheduler-Level Configuration

Control how often the cleanup runs using `reactiveTaskCleanupInterval` in scheduler options. Cleanup is performed in **batches** (default 1000 items) to ensure stability on large datasets.

```typescript
await mongodash.init({
    // ...
    reactiveTaskCleanupInterval: '12h', // Run cleanup every 12 hours (default: '24h')
});
```

Supported formats:
- Duration string: `'1h'`, `'24h'`, `'7d'`
- Milliseconds: `86400000`
- Cron expression: `'CRON 0 3 * * *'` (e.g., daily at 3 AM)
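For illustration, the duration-string part of this format can be parsed along these lines. This is a sketch only — the library ships its own interval parser, the exact set of accepted units is an assumption, and cron expressions (`'CRON ...'`) are a separate format not handled here.

```typescript
// Sketch: convert '30s' / '5m' / '12h' / '7d' (or a plain millisecond
// number) into milliseconds. The s/m/h/d unit set is assumed.
function durationToMs(value: string | number): number {
    if (typeof value === 'number') return value;
    const match = /^(\d+(?:\.\d+)?)([smhd])$/.exec(value.trim());
    if (!match) throw new Error(`Unsupported duration: ${value}`);
    const unitMs = { s: 1_000, m: 60_000, h: 3_600_000, d: 86_400_000 };
    return parseFloat(match[1]) * unitMs[match[2] as keyof typeof unitMs];
}
```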
@@ -0,0 +1,60 @@ package/docs/reactive-tasks/policy-retry.md
# Retry Policy

You can configure the retry behavior using the `retryPolicy` option.

## General Options

| Option | Type | Default | Description |
| :--- | :--- | :--- | :--- |
| `type` | `string` | **Required** | `'fixed'`, `'linear'`, `'exponential'`, `'series'`, or `'cron'` |
| `maxAttempts` | `number` | `5`* | Maximum total attempts (use `-1` for unlimited). |
| `maxDuration` | `string \| number` | `undefined` | Stop retrying if elapsed time since the **first failure** in the current sequence exceeds this value. |
| `resetRetriesOnDataChange` | `boolean` | `true` | Reset attempt count if the source document changes. |

*\* If `maxDuration` is specified, `maxAttempts` defaults to unlimited.*

### Policy Specific Settings

| Policy | Property | Default | Description |
| :--- | :--- | :--- | :--- |
| **`fixed`** | `interval` | - | Delay between retries (e.g., `'10s'`). |
| **`linear`** | `interval` | - | Base delay multiplied by `attempt` number. |
| **`exponential`** | `min` | `'10s'` | Initial delay for the first retry. |
| **`exponential`** | `max` | `'1d'` | Maximum delay cap for backoff. |
| **`exponential`** | `factor` | `2` | Multiplication factor per attempt. |
| **`series`** | `intervals` | - | Array of fixed delays (e.g., `['1m', '5m', '15m']`). |
| **`cron`** | `expression` | - | Standard cron string for scheduling retries. |

### Examples

```typescript
// 1. Give up after 24 hours (infinite attempts within that window)
retryPolicy: {
    maxDuration: '24h',
    type: 'exponential',
    min: '10s',
    max: '1h'
}

// 2. Exact retry ladder (try after 1m, then 5m, then 15m, then fail)
retryPolicy: {
    maxAttempts: 4, // 1st run + 3 retries
    type: 'series',
    intervals: ['1m', '5m', '15m']
}

// 3. Series with last interval reuse
// Sequence: 1m, 5m, 5m, 5m ... (last one repeats)
retryPolicy: {
    maxAttempts: 10,
    type: 'series',
    intervals: ['1m', '5m']
}

// 4. Permanent retries every hour
retryPolicy: {
    maxAttempts: -1,
    type: 'fixed',
    interval: '1h'
}
```
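To make the schedules above concrete, here is a sketch of how the delay for retry attempt `n` could be derived under the documented semantics. The exact internal formula is an assumption: `exponential` grows from `min` by `factor` up to the `max` cap, and `series` repeats its last interval once the list is exhausted.

```typescript
// Illustrative delay calculation for retry attempt n (n = 1 is the first retry).
// exponential: min * factor^(n - 1), capped at max.
function exponentialDelayMs(n: number, minMs: number, maxMs: number, factor = 2): number {
    return Math.min(minMs * Math.pow(factor, n - 1), maxMs);
}

// series: the nth interval; the last interval repeats when the list runs out.
function seriesDelayMs(n: number, intervalsMs: number[]): number {
    return intervalsMs[Math.min(n, intervalsMs.length) - 1];
}
```

For example, with `min: '10s'`, `max: '1h'`, `factor: 2`, the delays run 10s, 20s, 40s, ... until they hit the 1h cap; the series `['1m', '5m']` yields 1m, 5m, 5m, 5m, ...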
@@ -0,0 +1,40 @@ package/docs/reactive-tasks/reconciliation.md
# Reconciliation & Reliability

The system includes a self-healing mechanism called **Reconciliation**.

## What is it?

It is a "full scan" process that ensures the state of your tasks matches the actual data in your collections. It iterates through your source collections (efficiently, fetching only `_id`) and ensures every document has the correct corresponding tasks planned.

## When does it run?

1. **On Startup (Partial)**: When `startReactiveTasks()` is called, the leader performs a reconciliation only for tasks that have **never been reconciled before**. This ensures that newly added tasks catch up with existing data.
2. **On History Loss**: If the MongoDB Change Stream buffer (Oplog) is full and events are lost (Error code 280), the system automatically triggers full reconciliation to ensure consistency is restored.
3. **On Trigger Evolution**: When you widen a task filter (e.g. `amount > 100` -> `amount > 50`), the system triggers reconciliation to backfill tasks for existing documents that now match the new filter.

## Resilience

Reconciliation is **persistent and resilient**.
- **Checkpoints**: The system saves its progress (`lastId`) periodically to the database (`_mongodash_planner_meta`).
- **Resumable**: If the process is interrupted (e.g., deployment, crash), it effectively **resumes** from the last checkpoint upon restart, preventing re-processing of already reconciled documents.
- **Invalidation**: If the set of tasks being reconciled changes (e.g., you deploy a version with a NEW task definition for the same collection), the system detects this change, invalidates the checkpoint, and restarts reconciliation from the beginning to ensure the new task is applied to the entire collection.

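Conceptually, the resumable scan behaves like the sketch below. This is purely illustrative — the function and parameter names are hypothetical, not the library's API; the real implementation streams `_id`s from MongoDB and persists the checkpoint in `_mongodash_planner_meta`.

```typescript
// Sketch: process sorted ids in batches, recording the last processed id so
// an interrupted run resumes from the checkpoint instead of starting over.
function reconcileIds(
    sortedIds: number[],
    batchSize: number,
    checkpoint: { lastId: number | null },
    onBatch: (batch: number[]) => void,
): void {
    // Resume after the last checkpointed id (skip already-reconciled documents).
    const start = checkpoint.lastId === null ? 0 : sortedIds.findIndex((id) => id > checkpoint.lastId!);
    if (start === -1) return; // nothing left to do
    for (let i = start; i < sortedIds.length; i += batchSize) {
        const batch = sortedIds.slice(i, i + batchSize);
        onBatch(batch);
        checkpoint.lastId = batch[batch.length - 1]; // persist progress
    }
}
```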
## Expectations

- **No Data Loss**: Even if local Oplog history is lost, the system will eventually process every document.
- **Performance**: The scan is optimized (uses batching and projection of `_id` only), but it performs a **full collection scan**. On huge collections (millions of docs), this causes increased database load during startup or recovery.
- **Batch Processing**: Both reconciliation and periodic cleanup process documents in batches to avoid overwhelming the database and the application memory.

> [!CAUTION]
> **Limitations of `$$NOW` in filters**
> MongoDB Change Streams only trigger when a document is physically updated. If your `filter` depends on time passing (e.g., `dueAt: { $lte: '$$NOW' }`), the task **will not** trigger automatically just because time passed. It will only be picked up during:
> 1. A physical update to the source document.
> 2. The next system restart, if the reconciliation is run.
> 3. Manual re-triggers via `retryReactiveTasks()`.

## Configuration Matters

Reconciliation respects your `filter` and `watchProjection`.
- If a document doesn't match the `filter`, no task is planned.
- If the watched values (`watchProjection`) haven't changed since the last run (comparing `lastObservedValues`), the task is **not** re-triggered.
- **Recommendation**: Carefully configure `filter` and `watchProjection` to minimize unnecessary processing during reconciliation.