@cronicorn/mcp-server 1.13.4 → 1.14.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -25,15 +25,21 @@ The AI Planner runs independently from the Scheduler. Typically it wakes up ever
 
 The AI Planner doesn't analyze every endpoint on every cycle. That would be expensive (AI API costs) and unnecessary (most endpoints are stable).
 
- Instead, it uses a **discovery mechanism** to find endpoints that executed recently:
+ Instead, it uses **smart scheduling**, in which the AI decides when each endpoint next needs analysis:
 
- 1. Query the database for endpoints that ran within the last 5-10 minutes
- 2. Filter to endpoints that haven't been analyzed in the last cycle (avoid duplicate work)
- 3. Analyze each discovered endpoint in sequence
+ 1. Query the database for endpoints that ran recently
+ 2. Check whether the endpoint is due for analysis, based on:
+    - **First analysis**: New endpoints that have never been analyzed
+    - **Scheduled time**: The AI-requested re-analysis time has passed
+    - **State change**: New failures since the last analysis (triggers immediate re-analysis)
+ 3. Skip endpoints where none of these conditions are met
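
The three conditions above can be sketched as a small predicate. This is an illustrative TypeScript sketch; the field names (`lastAnalyzedAt`, `nextAnalysisAt`, `failureCountAtLastAnalysis`) are hypothetical, chosen to mirror the conditions, not the actual Cronicorn schema:

```typescript
// Illustrative sketch of the due-for-analysis check.
// Field names are hypothetical, not the actual Cronicorn schema.
interface EndpointAnalysisState {
  lastAnalyzedAt: Date | null;          // null => never analyzed
  nextAnalysisAt: Date | null;          // AI-requested re-analysis time
  failureCount: number;                 // current failure count
  failureCountAtLastAnalysis: number;   // snapshot from the last session
}

function isDueForAnalysis(ep: EndpointAnalysisState, now: Date): boolean {
  if (ep.lastAnalyzedAt === null) return true;                         // first analysis
  if (ep.nextAnalysisAt !== null
      && ep.nextAnalysisAt.getTime() <= now.getTime()) return true;    // scheduled time passed
  if (ep.failureCount > ep.failureCountAtLastAnalysis) return true;    // state change: new failures
  return false;                                                        // skip this cycle
}
```

A planner loop would call a predicate like this per discovered endpoint and skip the AI call entirely when it returns false.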
 
- This approach focuses AI attention on active endpoints where adaptation might be valuable. Dormant endpoints (not running) don't get analyzed until they become active again.
+ This approach lets the AI decide its own analysis frequency:
+ - Stable endpoints: "Check again in 4 hours"
+ - Incidents: "Check again in 5 minutes"
+ - Very stable daily jobs: "Check again in 24 hours"
 
- The discovery window (5-10 minutes) is configurable. Wider windows catch more endpoints but increase batch size. Narrower windows reduce costs but might miss some activity.
+ The AI communicates this via the `next_analysis_in_ms` parameter in `submit_analysis` (see Tools section).
 
 
 ## What the AI Sees: Building Context
 
@@ -49,15 +55,28 @@ For each endpoint, the AI Planner builds a comprehensive analysis prompt contain
 
 This tells the AI what scheduling behavior is currently in effect.
 
- ### Recent Performance (Last 24 Hours)
+ ### Recent Performance (Multi-Window Health)
 
- - **Success rate**: Percentage of successful executions
- - **Total runs**: Number of executions
+ The AI sees health metrics across **three time windows** to accurately detect recovery patterns:
+
+ | Window | Metrics |
+ |--------|---------|
+ | **Last 1 hour** | Success rate, run count |
+ | **Last 4 hours** | Success rate, run count |
+ | **Last 24 hours** | Success rate, run count |
+
+ Plus:
 - **Average duration**: Mean execution time
 - **Failure streak**: Consecutive failures (signals degradation)
- - **Last run status**: Most recent execution outcome
 
- These health metrics help the AI spot trends: improving, degrading, or stable.
+ **Why multiple windows matter**: A single 24-hour window can be misleading during recovery. If an endpoint failed at high frequency (every 5 seconds for 2 hours = 1,440 failures) and then recovered at normal frequency (every 5 minutes for 6 hours = 72 successes), the 24-hour rate shows 4.8% success even though recent performance is 100%.
+
+ With multi-window health, the AI instead sees something like:
+ - Last 1h: 100% success (12 runs)
+ - Last 4h: 85% success (40 runs)
+ - Last 24h: 32% success (500 runs) ← skewed by old failures
+
+ This tells the AI "endpoint has recovered" rather than "endpoint is still failing."
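
The recovery arithmetic above (1,440 old failures vs. 72 recent successes) can be reproduced with a small windowed-rate helper. A minimal TypeScript sketch under an assumed `Run` record shape, not the actual metrics code:

```typescript
// Sketch: success rate over a trailing time window.
// The `Run` shape is hypothetical; the real metrics code may differ.
interface Run {
  finishedAt: Date;
  success: boolean;
}

function windowedRate(runs: Run[], windowMs: number, now: Date): { rate: number; count: number } {
  const cutoff = now.getTime() - windowMs;
  const inWindow = runs.filter(r => r.finishedAt.getTime() > cutoff);
  const ok = inWindow.filter(r => r.success).length;
  // An empty window reports rate 0 with count 0; callers should check count first.
  return { rate: inWindow.length > 0 ? ok / inWindow.length : 0, count: inWindow.length };
}
```

Run over the scenario's history, this yields 100% for the 1-hour window but under 5% for the 24-hour window, which is exactly the skew the multi-window view corrects.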
 
 ### Response Body Data
 
@@ -82,15 +101,33 @@ Structure your response bodies to include the metrics that matter for your use c
 
 ### Job Context
 
- If the endpoint belongs to a job, the AI sees the job's description. This provides high-level intent:
+ If the endpoint belongs to a job, the AI sees:
+
+ - **Job description**: High-level intent (e.g., "Monitors payment queue and triggers processing when depth exceeds threshold")
+ - **Sibling endpoint names**: Other endpoints in the same job (e.g., "3 endpoints [API Monitor, Data Fetcher, Notifier]")
+
+ Knowing sibling names helps the AI:
+ - Understand the endpoint is part of a larger workflow
+ - Decide when to check sibling responses for coordination
+ - Make informed decisions about the `get_sibling_latest_responses` tool
+
+ The AI uses job context to interpret what "good" vs "bad" looks like for specific metrics. A growing queue_depth might be normal for a collector endpoint but alarming for a processor endpoint.
 
- > "Monitors payment queue and triggers processing when depth exceeds threshold"
+ ## Session Constraints
 
- The AI uses this context to interpret what "good" vs "bad" looks like for specific metrics. A growing queue_depth might be normal for a collector endpoint but alarming for a processor endpoint.
+ Each AI analysis session has resource limits to prevent runaway costs:
 
- ## The Three Tools: How AI Takes Action
+ - **Maximum 15 tool calls** per session (hard limit)
+ - **10 history records** is usually sufficient for trend analysis
+ - Sessions that hit the limit are terminated
 
- The AI Planner doesn't write to the database directly. Instead, it has access to **three action tools** that write hints:
+ These constraints prevent the worst-case scenario: an AI session that paginates through hundreds of identical failure records, consuming 42K+ tokens for a decision reachable in 5 tool calls.
+
+ The AI is informed of these limits and prioritizes the most valuable queries.
+
+ ## The Four Tools: How AI Takes Action
+
+ The AI Planner doesn't write to the database directly. Instead, it has access to **four action tools** that write hints:
 
 ### 1. propose_interval
 
@@ -175,6 +212,31 @@ Effect: No executions until 3:30 PM, then resumes baseline
 3. Scheduler's Governor checks pause state—if `pausedUntil > now`, returns that timestamp with source `"paused"`
 4. When pause time passes, Governor resumes normal scheduling
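
Steps 3-4 can be illustrated as a pure function. This is a simplified sketch of the Governor's pause check, not the actual scheduler code:

```typescript
// Simplified, illustrative sketch of the Governor's pause check.
interface NextRunDecision {
  at: Date;
  source: "paused" | "baseline";
}

function resolveNextRun(pausedUntil: Date | null, baselineNextRun: Date, now: Date): NextRunDecision {
  if (pausedUntil !== null && pausedUntil.getTime() > now.getTime()) {
    return { at: pausedUntil, source: "paused" };     // no executions until the pause expires
  }
  return { at: baselineNextRun, source: "baseline" }; // pause passed (or none): normal scheduling
}
```

Because the pause is just a timestamp comparison, resuming requires no cleanup: once `pausedUntil` is in the past, the baseline branch wins again.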
 
+ ### 4. clear_hints
+
+ **Purpose**: Reset endpoint to baseline schedule by clearing all AI hints
+
+ **Parameters**:
+ - `reason`: Explanation for clearing hints
+
+ **When to use**:
+ - AI hints are no longer relevant (situation changed)
+ - Manual intervention resolved the issue
+ - False positive detection (AI over-reacted)
+ - Want to revert to baseline without waiting for TTL expiry
+
+ **Example**:
+ ```
+ AI sees endpoint recovered but has aggressive 30s interval hint active
+ Action: clear_hints(reason="Endpoint recovered, reverting to baseline")
+ Effect: AI hints cleared immediately, baseline schedule resumes
+ ```
+
+ **How it works**:
+ 1. AI calls the tool with a reason
+ 2. Tool clears `aiHintIntervalMs`, `aiHintNextRunAt`, `aiHintExpiresAt`
+ 3. Next execution uses baseline schedule
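
The effect on the endpoint record can be sketched as follows. The three hint field names come from the doc above; the surrounding record shape is illustrative:

```typescript
// Sketch: clearing AI hint fields so the Governor falls back to baseline.
// Hint field names (aiHintIntervalMs, aiHintNextRunAt, aiHintExpiresAt) are from the doc;
// the surrounding record shape is illustrative.
interface EndpointRecord {
  baselineIntervalMs: number;
  aiHintIntervalMs: number | null;
  aiHintNextRunAt: Date | null;
  aiHintExpiresAt: Date | null;
}

function clearHints(ep: EndpointRecord): EndpointRecord {
  return { ...ep, aiHintIntervalMs: null, aiHintNextRunAt: null, aiHintExpiresAt: null };
}
```

This mirrors the TTL path: hint expiry and `clear_hints` both end with the same null fields; the tool just gets there immediately.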
+
 ## Query Tools: Informing Decisions
 
 Before taking action, the AI can query response data using **three query tools**:
@@ -190,7 +252,7 @@ Example: "What's the current queue depth?"
 ### 2. get_response_history
 
 **Parameters**:
- - `limit`: Number of responses (1-10, default 2)
+ - `limit`: Number of responses (1-10, default 10)
 - `offset`: Skip N newest responses for pagination
 
 **Returns**: Array of response bodies with timestamps, newest first
@@ -199,17 +261,39 @@ Example: "What's the current queue depth?"
 
 Example: "Is queue_depth increasing monotonically?"
 
- **Efficiency tip**: Start with `limit=2` to check recent trend. If you need more context, use `offset=2, limit=3` to get the next 3 older responses.
+ **Note**: 10 records is usually sufficient for trend analysis. The AI is encouraged not to paginate endlessly—if patterns are unclear after 10-20 records, more data rarely helps.
 
 Response bodies are truncated at 1000 characters to prevent token overflow.
 
 ### 3. get_sibling_latest_responses
 
- **Returns**: Latest response from each sibling endpoint in the same job
+ **Returns**: For each sibling endpoint in the same job:
+ - **Response data**: Latest response body, timestamp, status
+ - **Schedule info**: Baseline, next run, last run, pause status, failure count
+ - **AI hints**: Active interval/one-shot hints with expiry and reason
+
+ **Use case**: Coordinate across endpoints with full context
 
- **Use case**: Coordinate across endpoints
+ Example: "Is the upstream endpoint healthy and running normally, or is it paused/failing?"
 
- Example: "Did the upstream endpoint finish processing?"
+ The enriched response allows the AI to understand sibling state at a glance:
+ ```json
+ {
+   "endpointId": "ep_456",
+   "endpointName": "Data Fetcher",
+   "responseBody": { "batch_ready": true },
+   "timestamp": "2025-11-02T14:25:00Z",
+   "status": "success",
+   "schedule": {
+     "baseline": "every 5 minutes",
+     "nextRunAt": "2025-11-02T14:30:00Z",
+     "lastRunAt": "2025-11-02T14:25:00Z",
+     "isPaused": false,
+     "failureCount": 0
+   },
+   "aiHints": null
+ }
+ ```
 
 Only useful for workflow endpoints (multiple endpoints in the same job that coordinate).
 
@@ -371,12 +455,15 @@ Every AI analysis creates a session record:
 - **Reasoning** (AI's explanation of its decision)
 - **Token usage** (cost tracking)
 - **Duration**
+ - **Next analysis at** (when AI scheduled its next analysis)
+ - **Endpoint failure count** (snapshot for detecting state changes)
 
 This audit trail helps you:
 - Debug unexpected scheduling behavior
 - Understand why AI tightened/relaxed intervals
 - Review cost (token usage per analysis)
 - Tune prompts or constraints based on AI reasoning
+ - See when AI expects to analyze again
 
 Check the sessions table when an endpoint's schedule changes unexpectedly. The reasoning field shows the AI's thought process.
 
@@ -434,14 +521,15 @@ This ensures that even if AI becomes unavailable or too expensive, your jobs kee
 
 ## Key Takeaways
 
- 1. **AI discovers active endpoints**: Only analyzes what's running
- 2. **AI sees health + response data**: Metrics inform decisions
- 3. **Three action tools**: propose_interval, propose_next_time, pause_until
- 4. **Hints have TTLs**: Auto-revert on expiration (safety)
- 5. **Interval hints override baseline**: Enables adaptation
- 6. **Nudging provides immediacy**: Changes apply within seconds
- 7. **Structure response bodies intentionally**: Include metrics AI should monitor
- 8. **Sessions provide audit trail**: Debug AI reasoning
- 9. **Quota controls costs**: AI unavailable ≠ jobs stop running
+ 1. **AI controls its own analysis frequency**: Schedules re-analysis based on endpoint state
+ 2. **AI sees multi-window health data**: 1h, 4h, 24h windows for accurate recovery detection
+ 3. **Sessions have constraints**: Max 15 tool calls to prevent runaway costs
+ 4. **Four action tools**: propose_interval, propose_next_time, pause_until, clear_hints
+ 5. **Hints have TTLs**: Auto-revert on expiration (safety)
+ 6. **Interval hints override baseline**: Enables adaptation
+ 7. **Nudging provides immediacy**: Changes apply within seconds
+ 8. **Structure response bodies intentionally**: Include metrics AI should monitor
+ 9. **Sessions provide audit trail**: Debug AI reasoning
+ 10. **Quota controls costs**: AI unavailable ≠ jobs stop running
 
 Understanding how AI adaptation works helps you design endpoints that benefit from intelligent scheduling, structure response bodies effectively, and debug unexpected schedule changes.
@@ -50,8 +50,27 @@ Quick lookup for Cronicorn terminology, schema, defaults, and tools.
 **Sibling Endpoints**
 : Endpoints in the same job that can coordinate via `get_sibling_latest_responses()`.
 
+ **Multi-Window Health**
+ : Health metrics shown across three time windows (1h, 4h, 24h) to accurately detect recovery patterns. Prevents old failure bursts from skewing recent health assessment.
+
 **Analysis Session**
- : Record of AI analysis for an endpoint, including tools called, reasoning, and token usage. Stored in `ai_analysis_sessions` table.
+ : Record of AI analysis for an endpoint, including tools called, reasoning, token usage, and when to analyze next. Stored in `ai_analysis_sessions` table.
+
+ **Session Constraints**
+ : Resource limits on AI analysis sessions. Maximum 15 tool calls per session to prevent runaway costs. Sessions that hit the limit are terminated.
+
+ ## Schema: ai_analysis_sessions Table
+
+ | Field | Type | Description |
+ |-------|------|-------------|
+ | `id` | UUID | Unique session identifier |
+ | `endpointId` | UUID | Endpoint that was analyzed |
+ | `createdAt` | Timestamp | When analysis started |
+ | `reasoning` | String | AI's explanation for decision |
+ | `tokenUsage` | Integer | Tokens consumed |
+ | `durationMs` | Integer | How long analysis took |
+ | `nextAnalysisAt` | Timestamp | When AI requested re-analysis |
+ | `endpointFailureCount` | Integer | Failure count snapshot at analysis time |
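
The table above maps naturally onto a record type. A hypothetical TypeScript shape (illustrative column-to-field mapping, not the actual ORM model):

```typescript
// Hypothetical TypeScript mirror of the ai_analysis_sessions columns (illustrative).
interface AiAnalysisSession {
  id: string;                    // UUID
  endpointId: string;            // UUID of the analyzed endpoint
  createdAt: Date;               // when analysis started
  reasoning: string;             // AI's explanation for its decision
  tokenUsage: number;            // tokens consumed
  durationMs: number;            // how long the analysis took
  nextAnalysisAt: Date | null;   // when the AI requested re-analysis
  endpointFailureCount: number;  // failure-count snapshot at analysis time
}
```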
 
 ## Schema: job_endpoints Table
 
@@ -88,7 +107,8 @@ Quick lookup for Cronicorn terminology, schema, defaults, and tools.
 | Setting | Default | Notes |
 |---------|---------|-------|
 | **Scheduler tick interval** | 5 seconds | How often Scheduler wakes up to claim endpoints |
- | **AI analysis interval** | 5 minutes | How often AI Planner discovers and analyzes endpoints |
+ | **AI analysis interval** | 5 minutes | How often AI Planner discovers endpoints (overridden by AI scheduling) |
+ | **AI max tool calls** | 15 | Maximum tool calls per AI session (hard limit) |
 | **Lock TTL** | 30 seconds | How long claimed endpoints stay locked |
 | **Batch size** | 10 endpoints | Max endpoints claimed per tick |
 | **Zombie threshold** | 5 minutes | Runs older than this marked as timeout |
@@ -97,6 +117,7 @@ Quick lookup for Cronicorn terminology, schema, defaults, and tools.
 | **Timeout** | 30 seconds | Default request timeout |
 | **Max response size** | 100 KB | Default response body limit |
 | **Failure count cap** | 5 | Backoff capped at 2^5 = 32x |
+ | **History limit** | 10 | Default records returned by get_response_history |
 | **Min interval** | None | No minimum unless configured |
 | **Max interval** | None | No maximum unless configured |
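
The failure-count cap in the table works out as follows. A minimal sketch (hypothetical helper, assuming the interval doubles per consecutive failure):

```typescript
// Sketch: exponential backoff multiplier with the documented cap of 5,
// so the interval is never stretched beyond 2^5 = 32x baseline.
// Hypothetical helper, assuming doubling per consecutive failure.
function backoffMultiplier(failureCount: number, cap: number = 5): number {
  return 2 ** Math.min(failureCount, cap);
}
```

With the cap, an endpoint at 12 consecutive failures backs off no further than one at 5.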
 
@@ -168,6 +189,24 @@ Pause execution temporarily or resume.
 }
 ```
 
+ #### clear_hints
+
+ Clear all AI hints, reverting to baseline schedule immediately.
+
+ **Parameters**:
+ - `reason` (string, required): Explanation for clearing hints
+
+ **Effect**: Clears `aiHintIntervalMs`, `aiHintNextRunAt`, `aiHintExpiresAt`
+
+ **Use case**: When adaptive hints are no longer relevant (manual intervention, false positive, situation resolved)
+
+ **Example**:
+ ```json
+ {
+   "reason": "Endpoint recovered, reverting to baseline schedule"
+ }
+ ```
+
 ### Query Tools (Read Data)
 
 #### get_latest_response
@@ -191,15 +230,15 @@ Get most recent response body from this endpoint.
 Get recent response bodies to identify trends.
 
 **Parameters**:
- - `limit` (number, default: 2, max: 10): Number of responses
+ - `limit` (number, default: 10, max: 10): Number of responses
 - `offset` (number, default: 0): Skip N newest for pagination
 
 **Returns**:
 ```json
 {
-   "count": 2,
+   "count": 10,
   "hasMore": true,
-   "pagination": { "offset": 0, "limit": 2, "nextOffset": 2 },
+   "pagination": { "offset": 0, "limit": 10, "nextOffset": 10 },
   "responses": [
     {
       "responseBody": { "queue_depth": 200 },
@@ -208,15 +247,15 @@ Get recent response bodies to identify trends.
       "durationMs": 120
     }
   ],
-   "hint": "More history available - call again with offset: 2"
+   "hint": "More history exists if needed, but 10 records is usually sufficient"
 }
 ```
 
- **Note**: Response bodies truncated at 1000 chars
+ **Note**: Response bodies truncated at 1000 chars. 10 records is usually sufficient for trend analysis.
 
 #### get_sibling_latest_responses
 
- Get latest responses from all endpoints in same job.
+ Get latest responses and state from all endpoints in the same job.
 
 **Parameters**: None
 
@@ -230,22 +269,40 @@ Get latest responses from all endpoints in same job.
   "endpointName": "Data Processor",
   "responseBody": { "batch_id": "2025-11-02", "ready": true },
   "timestamp": "2025-11-02T14:25:00Z",
-   "status": "success"
+   "status": "success",
+   "schedule": {
+     "baseline": "every 5 minutes",
+     "nextRunAt": "2025-11-02T14:30:00Z",
+     "lastRunAt": "2025-11-02T14:25:00Z",
+     "isPaused": false,
+     "pausedUntil": null,
+     "failureCount": 0
+   },
+   "aiHints": {
+     "intervalMs": 30000,
+     "nextRunAt": null,
+     "expiresAt": "2025-11-02T15:25:00Z",
+     "reason": "Tightening monitoring"
+   }
   }
 ]
 }
 ```
 
- **Note**: Only returns endpoints with same `jobId`
+ **Note**: Returns schedule info and active AI hints per sibling for full context
 
 ### Final Answer Tool
 
 #### submit_analysis
 
- Signal analysis completion and provide reasoning.
+ Signal analysis completion, provide reasoning, and schedule next analysis.
 
 **Parameters**:
 - `reasoning` (string, required): Analysis explanation
+ - `next_analysis_in_ms` (number, optional): When to analyze this endpoint again
+   - Min: 300000 (5 minutes)
+   - Max: 86400000 (24 hours)
+   - If omitted, uses baseline interval
 - `actions_taken` (string[], optional): List of tools called
 - `confidence` (enum, optional): 'high' | 'medium' | 'low'
 
@@ -254,11 +311,18 @@ Signal analysis completion and provide reasoning.
 {
   "status": "analysis_complete",
   "reasoning": "...",
+   "next_analysis_in_ms": 7200000,
   "actions_taken": ["propose_interval"],
   "confidence": "high"
 }
 ```
 
+ **Scheduling guidance**:
+ - 300000 (5 min): Incident active, need close monitoring
+ - 1800000 (30 min): Recovering, monitoring progress
+ - 7200000 (2 hours): Stable, routine check
+ - 86400000 (24 hours): Very stable or daily job
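
The guidance can be condensed into a lookup. A hypothetical helper — the state labels and the mapping are illustrative; only the 5-minute minimum and 24-hour maximum come from the parameter spec:

```typescript
// Hypothetical helper condensing the scheduling guidance into a lookup.
// State labels and mapping are illustrative; only the min/max bounds
// come from the next_analysis_in_ms parameter spec.
const MIN_NEXT_ANALYSIS_MS = 300_000;     // 5 minutes
const MAX_NEXT_ANALYSIS_MS = 86_400_000;  // 24 hours

type EndpointHealthState = "incident" | "recovering" | "stable" | "daily";

function pickNextAnalysisMs(state: EndpointHealthState): number {
  switch (state) {
    case "incident":   return MIN_NEXT_ANALYSIS_MS;  // close monitoring
    case "recovering": return 1_800_000;             // 30 minutes
    case "stable":     return 7_200_000;             // 2 hours
    case "daily":      return MAX_NEXT_ANALYSIS_MS;  // very stable or daily job
  }
}
```

The returned value would be passed as `next_analysis_in_ms` in the final `submit_analysis` call.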
+
 **Note**: Must be called last to complete analysis
 
 ## Scheduling Sources
package/package.json CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "@cronicorn/mcp-server",
-   "version": "1.13.4",
+   "version": "1.14.0",
   "type": "module",
   "description": "MCP server for Cronicorn - enables AI agents to manage cron jobs via Model Context Protocol",
  "author": "Cronicorn",