@cronicorn/mcp-server 1.4.5 → 1.5.1

@@ -0,0 +1,268 @@
1
+ ---
2
+ id: how-scheduling-works
3
+ title: How Scheduling Works
4
+ description: Scheduler loop, Governor safety, and nextRunAt calculation
5
+ tags: [assistant, technical, scheduler]
6
+ sidebar_position: 2
7
+ mcp:
8
+ uri: file:///docs/technical/how-scheduling-works.md
9
+ mimeType: text/markdown
10
+ priority: 0.85
11
+ lastModified: 2025-11-02T00:00:00Z
12
+ ---
13
+
14
+ # How Scheduling Works
15
+
16
+ This document explains how the Scheduler worker executes jobs and calculates next run times. If you haven't read [System Architecture](./system-architecture.md), start there for context on the dual-worker design.
17
+
18
+ ## The Scheduler's Job
19
+
20
+ The Scheduler worker has one responsibility: execute endpoints on time, record results, and schedule the next run. It doesn't analyze patterns, make AI decisions, or try to be clever. It's a reliable execution engine.
21
+
22
+ Every 5 seconds (configurable), the Scheduler wakes up and runs a **tick**:
23
+
24
+ 1. **Claim** due endpoints from the database
25
+ 2. **Execute** each endpoint's HTTP request
26
+ 3. **Record** results (status, duration, response body)
27
+ 4. **Calculate** when to run next using the Governor
28
+ 5. **Update** the database with the new schedule
29
+
30
+ Then it goes back to sleep. Simple, predictable, fast.
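The five steps above can be sketched as a single function with the real work injected. The names and shapes here are illustrative only, not the actual worker API (which talks to a database and makes HTTP calls):

```typescript
// One Scheduler tick, with the real work injected as functions.
// These shapes are assumptions for illustration, not Cronicorn's code.
type TickDeps<E> = {
  claimDue: () => E[];                                         // 1. claim due endpoints
  execute: (endpoint: E) => { ok: boolean };                   // 2. make the HTTP request
  record: (endpoint: E, result: { ok: boolean }) => void;      // 3. write the run record
  nextRunAt: (endpoint: E, result: { ok: boolean }) => number; // 4. ask the Governor
  update: (endpoint: E, nextRunAt: number) => void;            // 5. persist the new schedule
};

function tick<E>(deps: TickDeps<E>): number {
  const due = deps.claimDue();
  for (const endpoint of due) {
    const result = deps.execute(endpoint);
    deps.record(endpoint, result);
    deps.update(endpoint, deps.nextRunAt(endpoint, result));
  }
  return due.length; // endpoints processed this tick
}
```

Everything interesting lives behind those five functions; the loop itself stays trivial, which is the point.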
31
+
32
+ ## The Tick Loop in Detail
33
+
34
+ ### Step 1: Claiming Due Endpoints
35
+
36
+ The Scheduler asks the database: "Which endpoints have `nextRunAt <= now`?"
37
+
38
+ But there's a catch—in production, you might run multiple Scheduler instances for redundancy. To prevent two Schedulers from executing the same endpoint simultaneously, the claim operation uses a **lock**.
39
+
40
+ Here's how claiming works:
41
+
42
+ - Query for endpoints where `nextRunAt <= now` and `_lockedUntil <= now` (not currently locked)
43
+ - Atomically update those endpoints to set `_lockedUntil = now + lockTtlMs` (typically 30 seconds)
44
+ - Return the list of endpoint IDs that were successfully locked
45
+
46
+ The lock TTL serves as a safety mechanism. If a Scheduler crashes while executing an endpoint, the lock expires and another Scheduler can claim it. This prevents endpoints from getting stuck if a worker dies mid-execution.
47
+
48
+ The Scheduler claims up to `batchSize` endpoints per tick (default: 10). This provides flow control—if the system is backlogged, it processes endpoints in manageable batches rather than trying to execute hundreds at once.
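An in-memory sketch of the claim step, assuming epoch-millisecond timestamps and hypothetical field names. In production this is one atomic database update, which is what actually prevents two Schedulers from claiming the same endpoint:

```typescript
// Hypothetical shape; the real claim is a single atomic UPDATE.
type Claimable = { id: string; nextRunAt: number; lockedUntil: number }; // epoch ms

const LOCK_TTL_MS = 30_000; // default lock TTL
const BATCH_SIZE = 10;      // default batch size

// Claim due, unlocked endpoints and lock them for LOCK_TTL_MS.
function claimDue(endpoints: Claimable[], now: number): string[] {
  const claimed: string[] = [];
  for (const e of endpoints) {
    if (claimed.length >= BATCH_SIZE) break;               // flow control
    if (e.nextRunAt > now || e.lockedUntil > now) continue; // not due, or locked
    e.lockedUntil = now + LOCK_TTL_MS;                      // take the lock
    claimed.push(e.id);
  }
  return claimed;
}
```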
49
+
50
+ ### Step 2: Executing the Endpoint
51
+
52
+ For each claimed endpoint, the Scheduler:
53
+
54
+ 1. Reads the full endpoint configuration from the database
55
+ 2. Creates a run record with status `"running"` and the current attempt number
56
+ 3. Makes the HTTP request (GET, POST, etc.) to the endpoint's URL
57
+ 4. Waits for the response (up to the configured timeout)
58
+ 5. Records the outcome: success/failure, duration, status code, response body
59
+
60
+ The response body is stored in the database. This is important—the AI Planner will read these response bodies later to make scheduling decisions. Structure your endpoint responses to include metrics the AI should monitor (queue depth, error rate, load indicators).
61
+
62
+ ### Step 3: Recording Results
63
+
64
+ After execution completes (or times out), the Scheduler writes a complete run record:
65
+
66
+ - Status: `"success"` or `"failure"`
67
+ - Duration in milliseconds
68
+ - HTTP status code
69
+ - Response body (the JSON returned by your endpoint)
70
+ - Error message (if failed)
71
+
72
+ This creates a historical record. You can query past runs to see execution patterns, investigate failures, or debug scheduling behavior.
73
+
74
+ ### Step 4: The Governor Decides Next Run Time
75
+
76
+ After recording results, the Scheduler needs to answer: "When should this endpoint run next?"
77
+
78
+ This is where the **Governor** comes in. The Governor is a pure function that takes three inputs:
79
+
80
+ - **Current time** (`now`)
81
+ - **Endpoint state** (baseline schedule, AI hints, constraints, failure count)
82
+ - **Cron parser** (for evaluating cron expressions)
83
+
84
+ It returns a single output:
85
+
86
+ - **Next run time** and **source** (why this time was chosen)
87
+
88
+ The Governor is deterministic—same inputs always produce the same output. It makes no database calls, has no side effects, and contains no I/O. This makes it easy to test, audit, and understand.
89
+
90
+ Let's walk through how the Governor makes its decision.
91
+
92
+ ### Building Scheduling Candidates
93
+
94
+ The Governor starts by building a list of **candidates**—possible next run times based on different scheduling sources:
95
+
96
+ **Baseline Candidate**
97
+
98
+ Every endpoint has a baseline schedule. This is either:
99
+ - A cron expression: `"0 */5 * * *"` → next cron occurrence after `now`
100
+ - An interval: `300000ms` → `now + 300000ms` (with exponential backoff on failures)
101
+
102
+ The baseline represents your intent. It never expires.
103
+
104
+ If the endpoint uses interval-based scheduling and has failures, the Governor applies **exponential backoff**: `interval * 2^failureCount` (capped at 5 failures = 32x multiplier). This prevents hammering a failing endpoint while still retrying.
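The backoff calculation is small enough to state directly (a sketch; the function name is ours, not the Governor's):

```typescript
// Exponential backoff for interval baselines: interval * 2^failureCount,
// with the exponent capped at 5 (a 32x multiplier).
function backoffIntervalMs(baselineMs: number, failureCount: number): number {
  return baselineMs * 2 ** Math.min(failureCount, 5);
}
```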
105
+
106
+ **AI Interval Candidate**
107
+
108
+ If there's an active AI interval hint (not expired), the Governor creates a candidate:
109
+ - `now + aiHintIntervalMs`
110
+ - Source: `"ai-interval"`
111
+
112
+ This candidate only exists if `aiHintExpiresAt > now`. Expired hints are ignored.
113
+
114
+ **AI One-Shot Candidate**
115
+
116
+ If there's an active AI one-shot hint (not expired), the Governor creates a candidate:
117
+ - `aiHintNextRunAt` (specific timestamp)
118
+ - Source: `"ai-oneshot"`
119
+
120
+ Again, only if `aiHintExpiresAt > now`.
121
+
122
+ ### Choosing the Winning Candidate
123
+
124
+ Now the Governor has 1-3 candidates. Which one wins?
125
+
126
+ **The rules:**
127
+
128
+ 1. **If both AI hints exist**: Choose the earliest between interval and one-shot (baseline ignored)
129
+ 2. **If only AI interval exists**: AI interval wins (baseline ignored)
130
+ 3. **If only AI one-shot exists**: Choose earliest between one-shot and baseline
131
+ 4. **If no AI hints exist**: Baseline wins
132
+
133
+ This logic ensures that:
134
+ - AI interval hints **override** baseline (enables tightening/relaxing schedule)
135
+ - AI one-shot hints **compete** with other candidates (enables immediate runs)
136
+ - Baseline is the fallback when AI has no opinion
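The four rules can be sketched as a pure function over optional candidates. Timestamps are epoch milliseconds; this is our reading of the rules, not the Governor's actual code:

```typescript
type Chosen = { at: number; source: string };

function chooseCandidate(
  baseline: number,
  aiInterval?: number, // now + aiHintIntervalMs, if that hint is active
  aiOneShot?: number,  // aiHintNextRunAt, if that hint is active
): Chosen {
  if (aiInterval !== undefined && aiOneShot !== undefined) {
    // Rule 1: both hints exist -> earliest of the two, baseline ignored.
    return aiOneShot < aiInterval
      ? { at: aiOneShot, source: "ai-oneshot" }
      : { at: aiInterval, source: "ai-interval" };
  }
  if (aiInterval !== undefined) {
    // Rule 2: interval hint overrides baseline.
    return { at: aiInterval, source: "ai-interval" };
  }
  if (aiOneShot !== undefined) {
    // Rule 3: one-shot competes with baseline, earliest wins.
    return aiOneShot < baseline
      ? { at: aiOneShot, source: "ai-oneshot" }
      : { at: baseline, source: "baseline-interval" };
  }
  // Rule 4: no hints -> baseline ("baseline-cron" for cron baselines).
  return { at: baseline, source: "baseline-interval" };
}
```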
137
+
138
+ ### Safety: Handling Past Candidates
139
+
140
+ Sometimes a candidate time is in the past. This happens when:
141
+ - Execution took longer than the interval
142
+ - System was backlogged
143
+ - One-shot hint targeted a time that already passed
144
+
145
+ The Governor handles this by rescheduling from `now`:
146
+
147
+ - **Interval-based** (baseline or AI interval): Add the interval to `now` to preserve cadence
148
+ - **Cron-based**: Use `cron.next()`, which handles past times correctly
149
+ - **One-shot**: Floor to `now` (run immediately)
150
+
151
+ This prevents the endpoint from being immediately reclaimed in the next tick, which would cause a tight execution loop.
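A sketch of the reschedule-from-`now` rules (hypothetical names and shape, epoch-millisecond timestamps; `cronNext` stands in for the cron parser):

```typescript
// If the chosen time is already in the past, reschedule from `now`
// so the endpoint is not immediately reclaimed in a tight loop.
function reschedulePast(
  chosenAt: number,
  now: number,
  kind: "interval" | "cron" | "oneshot",
  intervalMs: number,                  // used for interval candidates
  cronNext: (after: number) => number, // next cron occurrence strictly after `after`
): number {
  if (chosenAt > now) return chosenAt;            // still in the future: keep it
  if (kind === "interval") return now + intervalMs; // preserve cadence
  if (kind === "cron") return cronNext(now);        // next occurrence after now
  return now;                                       // one-shot: run immediately
}
```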
152
+
153
+ ### Applying Constraints (Clamping)
154
+
155
+ After choosing a candidate, the Governor applies **min/max interval constraints** (if configured):
156
+
157
+ - **Min interval**: If `chosen < (now + minIntervalMs)`, move it forward to `now + minIntervalMs` (source becomes `"clamped-min"`)
158
+ - **Max interval**: If `chosen > (now + maxIntervalMs)`, move it back to `now + maxIntervalMs` (source becomes `"clamped-max"`)
159
+
160
+ These constraints are hard limits. They override even AI hints. Use them to enforce invariants like rate limits or maximum staleness.
161
+
162
+ ### The Pause Override
163
+
164
+ After all candidate evaluation and clamping, the Governor checks one final thing: **Is the endpoint paused?**
165
+
166
+ If `pausedUntil` is set and `pausedUntil > now`, the Governor returns:
167
+ - `nextRunAt = pausedUntil`
168
+ - Source: `"paused"`
169
+
170
+ Pause **always wins**. It overrides baseline, AI hints, and constraints. This gives you an emergency brake to stop an endpoint immediately.
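Clamping and the pause override form the tail of the Governor's decision. A sketch assuming epoch-millisecond timestamps and hypothetical names:

```typescript
type Decision = { at: number; source: string };

// Apply min/max constraints, then the pause override (which always wins).
function finalize(
  chosen: Decision,
  now: number,
  minIntervalMs?: number,
  maxIntervalMs?: number,
  pausedUntil?: number,
): Decision {
  let result = chosen;
  // Hard limits: clamp into [now + min, now + max].
  if (minIntervalMs !== undefined && result.at < now + minIntervalMs) {
    result = { at: now + minIntervalMs, source: "clamped-min" };
  }
  if (maxIntervalMs !== undefined && result.at > now + maxIntervalMs) {
    result = { at: now + maxIntervalMs, source: "clamped-max" };
  }
  // Pause overrides everything, including the clamps.
  if (pausedUntil !== undefined && pausedUntil > now) {
    result = { at: pausedUntil, source: "paused" };
  }
  return result;
}
```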
171
+
172
+ ### Step 5: Database Update
173
+
174
+ The Scheduler writes back to the database:
175
+
176
+ - `lastRunAt = now` (when this execution started)
177
+ - `nextRunAt = governor result` (when to run next)
178
+ - `failureCount`: reset to 0 on success, increment on failure
179
+ - Clear expired AI hints (if `aiHintExpiresAt <= now`)
180
+
181
+ The database update is atomic. If two Schedulers somehow claimed the same endpoint (which the locks should prevent, but defensive programming applies), only one update succeeds.
182
+
183
+ After the update, the endpoint's lock expires naturally (when `_lockedUntil` passes), and it becomes claimable again when `nextRunAt` arrives.
184
+
185
+ ## Safety Mechanisms
186
+
187
+ The Scheduler includes several safety mechanisms to handle edge cases:
188
+
189
+ ### Zombie Run Cleanup
190
+
191
+ A **zombie run** is a run record stuck in `"running"` status because the Scheduler crashed mid-execution.
192
+
193
+ The Scheduler periodically (every few minutes) scans for runs where:
194
+ - Status is `"running"`
195
+ - Created more than `zombieThresholdMs` ago (default: 5 minutes)
196
+
197
+ These runs are marked as `"timeout"` or `"cancelled"` to clean up the database and prevent confusion in the UI.
198
+
199
+ ### Lock Expiration
200
+
201
+ The `_lockedUntil` field prevents double execution. Locks are short-lived (30 seconds by default). If a Scheduler crashes, the lock expires and another Scheduler can pick up the work.
202
+
203
+ This means endpoints can't get permanently stuck. At worst, there's a delay equal to the lock TTL before another Scheduler claims the endpoint.
204
+
205
+ ### Past Candidate Protection
206
+
207
+ As mentioned earlier, the Governor reschedules past candidates from `now` rather than allowing them to remain in the past. This prevents immediate reclaiming, which would create a tight loop.
208
+
209
+ The Scheduler also has a second layer of protection: after calling the Governor, it checks if the returned `nextRunAt` is still in the past (due to slow execution). If so, it recalculates from the current time using the intended interval.
210
+
211
+ This double-check ensures that even if execution is very slow, the system doesn't spiral into a reclaim loop.
212
+
213
+ ### Failure Count and Backoff
214
+
215
+ When an endpoint fails repeatedly, the Governor applies exponential backoff to interval-based schedules:
216
+
217
+ - 0 failures: Normal interval
218
+ - 1 failure: 2x interval
219
+ - 2 failures: 4x interval
220
+ - 3 failures: 8x interval
221
+ - 4 failures: 16x interval
222
+ - 5+ failures: 32x interval (capped)
223
+
224
+ This prevents hammering a failing service while still retrying. The backoff resets to 0 on the first success.
225
+
226
+ ## Sources: Tracing Scheduling Decisions
227
+
228
+ Every scheduling decision records its **source**. This tells you why an endpoint ran at a particular time. Possible sources:
229
+
230
+ - `"baseline-cron"`: Ran on cron schedule
231
+ - `"baseline-interval"`: Ran on fixed interval
232
+ - `"ai-interval"`: Ran using AI interval hint
233
+ - `"ai-oneshot"`: Ran using AI one-shot hint
234
+ - `"clamped-min"`: Chosen time was too soon, moved to min interval
235
+ - `"clamped-max"`: Chosen time was too late, moved to max interval
236
+ - `"paused"`: Endpoint is paused
237
+
238
+ These sources are stored in run records and logs. When debugging "why did this endpoint run at 3:47 AM?", check the source. It shows the decision trail.
239
+
240
+ ## What Happens Between Ticks
241
+
242
+ While the Scheduler sleeps (the 5 seconds between ticks), other things can happen:
243
+
244
+ - The AI Planner might write new hints
245
+ - You might update the endpoint configuration via the API
246
+ - You might pause an endpoint
247
+ - External systems might change state
248
+
249
+ None of this disrupts the Scheduler. When it wakes up for the next tick, it reads fresh state from the database and makes decisions based on current reality.
250
+
251
+ This is the power of database-mediated communication: the Scheduler and AI Planner stay in sync without ever talking directly.
252
+
253
+ ## Key Takeaways
254
+
255
+ 1. **The Scheduler is simple**: Claim, execute, record, schedule, repeat
256
+ 2. **The Governor is deterministic**: Pure function, same inputs = same output
257
+ 3. **AI hints override baseline**: This enables adaptation
258
+ 4. **Constraints are hard limits**: Min/max and pause override everything
259
+ 5. **Safety mechanisms prevent failures**: Locks, zombie cleanup, backoff, past protection
260
+ 6. **Sources provide auditability**: Every decision is traceable
261
+
262
+ Understanding how scheduling works gives you the foundation to configure endpoints effectively, debug unexpected behavior, and reason about how AI adaptation affects execution timing.
263
+
264
+ ## Next Steps
265
+
266
+ - **[How AI Adaptation Works](./how-ai-adaptation-works.md)** - Learn how the AI Planner writes hints
267
+ - **[Configuration and Constraints](./configuration-and-constraints.md)** - Guide to setting up endpoints safely
268
+ - **[Reference](./reference.md)** - Quick lookup for fields, defaults, and sources
@@ -0,0 +1,404 @@
1
+ ---
2
+ id: technical-reference
3
+ title: Technical Reference
4
+ description: Quick lookup for schemas, defaults, tools, and state transitions
5
+ tags: [assistant, technical, reference]
6
+ sidebar_position: 6
7
+ mcp:
8
+ uri: file:///docs/technical/reference.md
9
+ mimeType: text/markdown
10
+ priority: 0.75
11
+ lastModified: 2025-11-02T00:00:00Z
12
+ ---
13
+
14
+ # Reference
15
+
16
+ Quick lookup for Cronicorn terminology, schema, defaults, and tools.
17
+
18
+ ## Glossary
19
+
20
+ **Baseline Schedule**
21
+ : The permanent schedule configured for an endpoint (cron expression or interval). Used when no AI hints are active. Never expires unless manually updated.
22
+
23
+ **Governor**
24
+ : Pure function that calculates the next run time for an endpoint. Takes current time, endpoint state, and cron parser as inputs. Returns timestamp and source.
25
+
26
+ **Hint**
27
+ : Temporary AI suggestion for scheduling (interval or one-shot). Has a TTL and expires automatically. Stored in `aiHint*` fields.
28
+
29
+ **Nudging**
30
+ : Updating `nextRunAt` immediately when AI writes a hint, so the change takes effect within seconds instead of waiting for the next baseline execution. Uses `setNextRunAtIfEarlier()`.
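A sketch of the nudge semantics. The real `setNextRunAtIfEarlier()` is a conditional database update; this only shows the comparison it encodes:

```typescript
// Move nextRunAt earlier, never later: a hint can pull a run forward,
// but must not silently delay an already-scheduled run.
function setNextRunAtIfEarlier(currentNextRunAt: number, hintAt: number): number {
  return hintAt < currentNextRunAt ? hintAt : currentNextRunAt;
}
```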
31
+
32
+ **Claiming**
33
+ : The Scheduler's process of acquiring locks on due endpoints to prevent double execution in multi-worker setups. Uses `_lockedUntil` field.
34
+
35
+ **Tick**
36
+ : One iteration of the Scheduler's loop (every 5 seconds). Claims due endpoints, executes them, records results, schedules next run.
37
+
38
+ **Source**
39
+ : The reason a particular next run time was chosen. Values: `baseline-cron`, `baseline-interval`, `ai-interval`, `ai-oneshot`, `clamped-min`, `clamped-max`, `paused`.
40
+
41
+ **TTL (Time To Live)**
42
+ : How long an AI hint remains valid before expiring. After expiration, the system reverts to baseline. Default: 60 minutes for intervals, 30 minutes for one-shots.
43
+
44
+ **Exponential Backoff**
45
+ : Automatic interval increase after failures: `baselineMs × 2^min(failureCount, 5)`. Applies only to baseline interval schedules, not AI hints or cron.
46
+
47
+ **Zombie Run**
48
+ : A run record stuck in `"running"` status because the Scheduler crashed mid-execution. Cleaned up after 5 minutes by default.
49
+
50
+ **Sibling Endpoints**
51
+ : Endpoints in the same job that can coordinate via `get_sibling_latest_responses()`.
52
+
53
+ **Analysis Session**
54
+ : Record of AI analysis for an endpoint, including tools called, reasoning, and token usage. Stored in `ai_analysis_sessions` table.
55
+
56
+ ## Schema: job_endpoints Table
57
+
58
+ | Field | Type | Description |
59
+ |-------|------|-------------|
60
+ | `id` | UUID | Unique endpoint identifier |
61
+ | `jobId` | UUID | Parent job (for grouping and sibling queries) |
62
+ | `tenantId` | UUID | Tenant for multi-tenancy |
63
+ | `name` | String | Display name |
64
+ | `description` | String | Optional context for AI |
65
+ | `url` | String | HTTP endpoint to call |
66
+ | `method` | Enum | HTTP method (GET, POST, PUT, PATCH, DELETE) |
67
+ | `headersJson` | JSON | Request headers |
68
+ | `bodyJson` | JSON | Request body (for POST/PUT/PATCH) |
69
+ | `baselineCron` | String | Cron expression (mutually exclusive with interval) |
70
+ | `baselineIntervalMs` | Integer | Fixed interval in milliseconds |
71
+ | `minIntervalMs` | Integer | Minimum interval constraint (hard limit) |
72
+ | `maxIntervalMs` | Integer | Maximum interval constraint (hard limit) |
73
+ | `timeoutMs` | Integer | Request timeout |
74
+ | `maxResponseSizeKb` | Integer | Response body size limit |
75
+ | `maxExecutionTimeMs` | Integer | Max execution time for lock duration |
76
+ | `aiHintIntervalMs` | Integer | AI-suggested interval (temporary) |
77
+ | `aiHintNextRunAt` | Timestamp | AI-suggested one-shot time (temporary) |
78
+ | `aiHintExpiresAt` | Timestamp | When AI hints expire |
79
+ | `aiHintReason` | String | AI's explanation for hint |
80
+ | `pausedUntil` | Timestamp | Pause until this time (null = not paused) |
81
+ | `lastRunAt` | Timestamp | When endpoint last executed |
82
+ | `nextRunAt` | Timestamp | When to run next (updated after each execution) |
83
+ | `failureCount` | Integer | Consecutive failures (reset on success) |
84
+ | `_lockedUntil` | Timestamp | Lock expiration for claiming |
85
+
86
+ ## Default Values
87
+
88
+ | Setting | Default | Notes |
89
+ |---------|---------|-------|
90
+ | **Scheduler tick interval** | 5 seconds | How often Scheduler wakes up to claim endpoints |
91
+ | **AI analysis interval** | 5 minutes | How often AI Planner discovers and analyzes endpoints |
92
+ | **Lock TTL** | 30 seconds | How long claimed endpoints stay locked |
93
+ | **Batch size** | 10 endpoints | Max endpoints claimed per tick |
94
+ | **Zombie threshold** | 5 minutes | Runs older than this marked as timeout |
95
+ | **AI hint TTL** | 60 minutes | Default for `propose_interval` |
96
+ | **One-shot hint TTL** | 30 minutes | Default for `propose_next_time` |
97
+ | **Timeout** | 30 seconds | Default request timeout |
98
+ | **Max response size** | 100 KB | Default response body limit |
99
+ | **Failure count cap** | 5 | Backoff capped at 2^5 = 32x |
100
+ | **Min interval** | None | No minimum unless configured |
101
+ | **Max interval** | None | No maximum unless configured |
102
+
103
+ ## AI Tool Catalog
104
+
105
+ ### Action Tools (Write Hints)
106
+
107
+ #### propose_interval
108
+
109
+ Adjust endpoint execution frequency.
110
+
111
+ **Parameters**:
112
+ - `intervalMs` (number, required): New interval in milliseconds
113
+ - `ttlMinutes` (number, default: 60): How long hint is valid
114
+ - `reason` (string, optional): Explanation for adjustment
115
+
116
+ **Effect**: Writes `aiHintIntervalMs` and `aiHintExpiresAt`, nudges `nextRunAt`
117
+
118
+ **Override behavior**: Overrides baseline (not one-shot)
119
+
120
+ **Example**:
121
+ ```json
122
+ {
123
+ "intervalMs": 30000,
124
+ "ttlMinutes": 15,
125
+ "reason": "Queue depth increasing - tightening monitoring"
126
+ }
127
+ ```
128
+
129
+ #### propose_next_time
130
+
131
+ Schedule one-shot execution at specific time.
132
+
133
+ **Parameters**:
134
+ - `nextRunAtIso` (ISO 8601 string, required): When to run next
135
+ - `ttlMinutes` (number, default: 30): How long hint is valid
136
+ - `reason` (string, optional): Explanation
137
+
138
+ **Effect**: Writes `aiHintNextRunAt` and `aiHintExpiresAt`, nudges `nextRunAt`
139
+
140
+ **Override behavior**: Competes with baseline (earliest wins)
141
+
142
+ **Example**:
143
+ ```json
144
+ {
145
+ "nextRunAtIso": "2025-11-02T14:30:00Z",
146
+ "ttlMinutes": 5,
147
+ "reason": "Immediate investigation of failure spike"
148
+ }
149
+ ```
150
+
151
+ #### pause_until
152
+
153
+ Pause execution temporarily or resume.
154
+
155
+ **Parameters**:
156
+ - `untilIso` (ISO 8601 string or null, required): Pause until time, or null to resume
157
+ - `reason` (string, optional): Explanation
158
+
159
+ **Effect**: Writes `pausedUntil`
160
+
161
+ **Override behavior**: Overrides everything (highest priority)
162
+
163
+ **Example**:
164
+ ```json
165
+ {
166
+ "untilIso": "2025-11-02T15:00:00Z",
167
+ "reason": "Dependency unavailable until maintenance completes"
168
+ }
169
+ ```
170
+
171
+ ### Query Tools (Read Data)
172
+
173
+ #### get_latest_response
174
+
175
+ Get most recent response body from this endpoint.
176
+
177
+ **Parameters**: None
178
+
179
+ **Returns**:
180
+ ```json
181
+ {
182
+ "found": true,
183
+ "responseBody": { "queue_depth": 45, "status": "healthy" },
184
+ "timestamp": "2025-11-02T14:30:00Z",
185
+ "status": "success"
186
+ }
187
+ ```
188
+
189
+ #### get_response_history
190
+
191
+ Get recent response bodies to identify trends.
192
+
193
+ **Parameters**:
194
+ - `limit` (number, default: 2, max: 10): Number of responses
195
+ - `offset` (number, default: 0): Skip N newest for pagination
196
+
197
+ **Returns**:
198
+ ```json
199
+ {
200
+ "count": 2,
201
+ "hasMore": true,
202
+ "pagination": { "offset": 0, "limit": 2, "nextOffset": 2 },
203
+ "responses": [
204
+ {
205
+ "responseBody": { "queue_depth": 200 },
206
+ "timestamp": "2025-11-02T14:30:00Z",
207
+ "status": "success",
208
+ "durationMs": 120
209
+ }
210
+ ],
211
+ "hint": "More history available - call again with offset: 2"
212
+ }
213
+ ```
214
+
215
+ **Note**: Response bodies truncated at 1000 chars
216
+
217
+ #### get_sibling_latest_responses
218
+
219
+ Get latest responses from all endpoints in same job.
220
+
221
+ **Parameters**: None
222
+
223
+ **Returns**:
224
+ ```json
225
+ {
226
+ "count": 3,
227
+ "siblings": [
228
+ {
229
+ "endpointId": "ep_456",
230
+ "endpointName": "Data Processor",
231
+ "responseBody": { "batch_id": "2025-11-02", "ready": true },
232
+ "timestamp": "2025-11-02T14:25:00Z",
233
+ "status": "success"
234
+ }
235
+ ]
236
+ }
237
+ ```
238
+
239
+ **Note**: Only returns endpoints with same `jobId`
240
+
241
+ ### Final Answer Tool
242
+
243
+ #### submit_analysis
244
+
245
+ Signal analysis completion and provide reasoning.
246
+
247
+ **Parameters**:
248
+ - `reasoning` (string, required): Analysis explanation
249
+ - `actions_taken` (string[], optional): List of tools called
250
+ - `confidence` (enum, optional): 'high' | 'medium' | 'low'
251
+
252
+ **Returns**:
253
+ ```json
254
+ {
255
+ "status": "analysis_complete",
256
+ "reasoning": "...",
257
+ "actions_taken": ["propose_interval"],
258
+ "confidence": "high"
259
+ }
260
+ ```
261
+
262
+ **Note**: Must be called last to complete analysis
263
+
264
+ ## Scheduling Sources
265
+
266
+ Sources explain why a particular next run time was chosen:
267
+
268
+ | Source | Meaning | Priority |
269
+ |--------|---------|----------|
270
+ | `paused` | Endpoint is paused, runs at `pausedUntil` | Highest (overrides all) |
271
+ | `clamped-min` | Chosen time was too soon, moved to `now + minIntervalMs` | High |
272
+ | `clamped-max` | Chosen time was too late, moved to `now + maxIntervalMs` | High |
273
+ | `ai-interval` | AI interval hint overrode baseline | Medium |
274
+ | `ai-oneshot` | AI one-shot hint won competition | Medium |
275
+ | `baseline-cron` | Cron expression determined time | Low (default) |
276
+ | `baseline-interval` | Fixed interval determined time | Low (default) |
277
+
278
+ **Reading sources**: Check run records or logs to see why an endpoint ran at a particular time.
279
+
280
+ ## Constraint Interaction Matrix
281
+
282
+ How different settings interact:
283
+
284
+ | If you set... | AI interval hints... | AI one-shot hints... | Baseline... |
285
+ |---------------|---------------------|---------------------|-------------|
286
+ | `pausedUntil` | Ignored (pause wins) | Ignored (pause wins) | Ignored |
287
+ | `minIntervalMs` | Clamped to minimum | Clamped to minimum | Clamped to minimum |
288
+ | `maxIntervalMs` | Clamped to maximum | Clamped to maximum | Clamped to maximum |
289
+ | Both AI hints active | Compete (earliest wins) | Compete (earliest wins) | Ignored |
290
+ | Cron baseline | Override the cron schedule | Compete with cron (earliest wins) | Used when no hints |
291
+
292
+ **Key insight**: Pause > Constraints > AI hints > Baseline
293
+
294
+ ## Field Constraints
295
+
296
+ Limits enforced by the system:
297
+
298
+ | Field | Min | Max | Units |
299
+ |-------|-----|-----|-------|
300
+ | `baselineIntervalMs` | 1,000 | None | Milliseconds (1 second minimum) |
301
+ | `minIntervalMs` | 0 | None | Milliseconds |
302
+ | `maxIntervalMs` | 0 | None | Milliseconds |
303
+ | `timeoutMs` | 1,000 | 1,800,000 | Milliseconds (30 minutes max) |
304
+ | `maxResponseSizeKb` | 1 | 10,000 | Kilobytes |
305
+ | `maxExecutionTimeMs` | 1,000 | 1,800,000 | Milliseconds (30 minutes max) |
306
+ | `failureCount` | 0 | None | Integer (capped at 5 for backoff) |
307
+ | Hint TTL | 1 | None | Minutes (recommended: 5-240) |
308
+
309
+ ## Common Response Body Patterns
310
+
311
+ Reusable structures for endpoint responses:
312
+
313
+ ### Health Status
314
+ ```json
315
+ {
316
+ "status": "healthy",
317
+ "timestamp": "2025-11-02T14:30:00Z"
318
+ }
319
+ ```
+
+ (`status` is one of `"healthy"`, `"degraded"`, or `"critical"`.)
320
+
321
+ ### Metrics
322
+ ```json
323
+ {
324
+ "queue_depth": 45,
325
+ "processing_rate_per_min": 100,
326
+ "error_rate_pct": 1.2,
327
+ "avg_latency_ms": 150
328
+ }
329
+ ```
330
+
331
+ ### Coordination Signals
332
+ ```json
333
+ {
334
+ "ready_for_processing": true,
335
+ "batch_id": "2025-11-02",
336
+ "dependency_status": "healthy"
337
+ }
338
+ ```
339
+
340
+ ### Cooldown Tracking
341
+ ```json
342
+ {
343
+ "last_action_at": "2025-11-02T12:00:00Z",
344
+ "action_type": "cache_warm",
345
+ "cooldown_minutes": 60
346
+ }
347
+ ```
348
+
349
+ ### Thresholds
350
+ ```json
351
+ {
352
+ "current_value": 250,
353
+ "warning_threshold": 300,
354
+ "critical_threshold": 500
355
+ }
356
+ ```
357
+
358
+ ## Quick Troubleshooting
359
+
360
+ **Endpoint not running**:
361
+ - Check `pausedUntil` (might be paused)
362
+ - Check `nextRunAt` (might be scheduled far in future)
363
+ - Check `_lockedUntil` (might be locked by crashed worker)
364
+
365
+ **AI not adapting**:
366
+ - Check `aiHintExpiresAt` (hints might be expired)
367
+ - Check analysis sessions (AI might not see patterns)
368
+ - Verify response bodies include metrics (AI needs data)
369
+ - Check quota limits (might be exceeded)
370
+
371
+ **Running too frequently**:
372
+ - Check active AI interval hint
373
+ - Verify `minIntervalMs` isn't set too low
374
+ - Check for `propose_next_time` loops
375
+
376
+ **Running too slowly**:
377
+ - Check `maxIntervalMs` constraint
378
+ - Check exponential backoff (high `failureCount`)
379
+ - Verify baseline isn't too long
380
+
381
+ ## Useful Queries
382
+
383
+ Check if endpoint is paused:
384
+ ```
385
+ pausedUntil > now ? "Paused until {pausedUntil}" : "Active"
386
+ ```
387
+
388
+ Check if AI hints are active:
389
+ ```
390
+ aiHintExpiresAt > now ? "Active hints" : "No active hints"
391
+ ```
392
+
393
+ Calculate current backoff multiplier:
394
+ ```
395
+ failureCount > 0 ? 2^min(failureCount, 5) : 1
396
+ ```
397
+
398
+ ## Next Steps
399
+
400
+ For detailed explanations, see:
401
+ - **[System Architecture](./system-architecture.md)** - Understand the big picture
402
+ - **[How Scheduling Works](./how-scheduling-works.md)** - Deep-dive into Governor and Scheduler
403
+ - **[How AI Adaptation Works](./how-ai-adaptation-works.md)** - Learn about hints and tools
404
+ - **[Configuration Guide](./configuration-and-constraints.md)** - Set up endpoints correctly