@cronicorn/mcp-server 1.13.4 → 1.14.0
@@ -25,15 +25,21 @@ The AI Planner runs independently from the Scheduler. Typically it wakes up ever

 The AI Planner doesn't analyze every endpoint on every cycle. That would be expensive (AI API costs) and unnecessary (most endpoints are stable).

-Instead, it uses
+Instead, it uses **smart scheduling** where the AI controls when it needs to analyze again:

-1. Query the database for endpoints that ran
-2.
-
+1. Query the database for endpoints that ran recently
+2. Check if the endpoint is due for analysis based on:
+   - **First analysis**: New endpoints that have never been analyzed
+   - **Scheduled time**: AI-requested re-analysis time has passed
+   - **State change**: New failures since last analysis (triggers immediate re-analysis)
+3. Skip endpoints where none of these conditions are met

-This approach
+This approach lets the AI decide its own analysis frequency:
+- Stable endpoints: "Check again in 4 hours"
+- Incidents: "Check again in 5 minutes"
+- Very stable daily jobs: "Check again in 24 hours"

-The
+The AI communicates this via the `next_analysis_in_ms` parameter in `submit_analysis` (see Tools section).

 ## What the AI Sees: Building Context

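The discovery logic added above can be pictured as a small predicate. The following is a minimal TypeScript sketch, not the package's actual code; the `EndpointSnapshot` shape and field names are illustrative:

```ts
// Minimal sketch of the "due for analysis" check described above.
// The EndpointSnapshot shape is illustrative, not the package's real schema.
interface EndpointSnapshot {
  failureCount: number;            // current consecutive failures
  lastAnalysis?: {
    nextAnalysisAt: Date;          // AI-requested re-analysis time
    endpointFailureCount: number;  // failure count snapshot at analysis time
  };
}

function isDueForAnalysis(ep: EndpointSnapshot, now: Date = new Date()): boolean {
  if (!ep.lastAnalysis) return true;                                           // first analysis
  if (now.getTime() >= ep.lastAnalysis.nextAnalysisAt.getTime()) return true;  // scheduled time passed
  if (ep.failureCount > ep.lastAnalysis.endpointFailureCount) return true;     // state change
  return false;                                                                // otherwise skip this cycle
}
```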
@@ -49,15 +55,28 @@ For each endpoint, the AI Planner builds a comprehensive analysis prompt contain

 This tells the AI what scheduling behavior is currently in effect.

-### Recent Performance (
+### Recent Performance (Multi-Window Health)

-
-
+The AI sees health metrics across **three time windows** to accurately detect recovery patterns:
+
+| Window | Metrics |
+|--------|--------|
+| **Last 1 hour** | Success rate, run count |
+| **Last 4 hours** | Success rate, run count |
+| **Last 24 hours** | Success rate, run count |
+
+Plus:
 - **Average duration**: Mean execution time
 - **Failure streak**: Consecutive failures (signals degradation)
-- **Last run status**: Most recent execution outcome

-
+**Why multiple windows matter**: A single 24-hour window can be misleading during recovery. If an endpoint failed at high frequency (every 5 seconds for 2 hours = 1,440 failures) and then recovered at normal frequency (every 5 minutes for 6 hours = 72 successes), the 24-hour rate shows 4.8% success even though recent performance is 100%.
+
+With multi-window health, the AI sees:
+- Last 1h: 100% success (12 runs)
+- Last 4h: 85% success (40 runs)
+- Last 24h: 32% success (500 runs) ← skewed by old failures
+
+This tells the AI "endpoint has recovered" rather than "endpoint is still failing."

 ### Response Body Data

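The window arithmetic in the added text can be reproduced in a few lines of TypeScript. This is an illustrative sketch that assumes a simple in-memory list of run records, not the package's actual storage:

```ts
// Sketch: success rate and run count per time window (1h / 4h / 24h as described above).
interface Run { timestamp: number; success: boolean; } // epoch milliseconds

function windowHealth(runs: Run[], windowMs: number, now = Date.now()) {
  const inWindow = runs.filter(r => now - r.timestamp <= windowMs);
  const successes = inWindow.filter(r => r.success).length;
  return {
    runCount: inWindow.length,
    successRate: inWindow.length === 0 ? null : successes / inWindow.length,
  };
}

const HOUR = 3_600_000;
// Recovery example from above: 1,440 old failures plus 72 recent successes gives
// 72 / 1,512 ≈ 4.8% over 24h, while the 1h window shows 100%.
// const health = {
//   lastHour: windowHealth(runs, 1 * HOUR),
//   last4Hours: windowHealth(runs, 4 * HOUR),
//   last24Hours: windowHealth(runs, 24 * HOUR),
// };
```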
@@ -82,15 +101,33 @@ Structure your response bodies to include the metrics that matter for your use c

 ### Job Context

-If the endpoint belongs to a job, the AI sees
+If the endpoint belongs to a job, the AI sees:
+
+- **Job description**: High-level intent (e.g., "Monitors payment queue and triggers processing when depth exceeds threshold")
+- **Sibling endpoint names**: Other endpoints in the same job (e.g., "3 endpoints [API Monitor, Data Fetcher, Notifier]")
+
+Knowing sibling names helps the AI:
+- Understand the endpoint is part of a larger workflow
+- Decide when to check sibling responses for coordination
+- Make informed decisions about the `get_sibling_latest_responses` tool
+
+The AI uses job context to interpret what "good" vs "bad" looks like for specific metrics. A growing queue_depth might be normal for a collector endpoint but alarming for a processor endpoint.

-
+## Session Constraints

-
+Each AI analysis session has resource limits to prevent runaway costs:

-
+- **Maximum 15 tool calls** per session (hard limit)
+- **10 history records** is usually sufficient for trend analysis
+- Sessions that hit the limit are terminated

-
+These constraints prevent the worst-case scenario: an AI session that paginates through hundreds of identical failure records, consuming 42K+ tokens for a decision reachable in 5 tool calls.
+
+The AI is informed of these limits and prioritizes the most valuable queries.
+
+## The Four Tools: How AI Takes Action
+
+The AI Planner doesn't write to the database directly. Instead, it has access to **four action tools** that write hints:

 ### 1. propose_interval

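A sketch of how the 15-tool-call cap described above could be enforced around a session loop; the loop shape and names are illustrative, only the limit itself comes from the documentation:

```ts
// Illustrative enforcement of the per-session tool-call budget (hard limit of 15).
const MAX_TOOL_CALLS = 15;

type Step = { done: true } | { done: false; runTool: () => Promise<unknown> };

async function runAnalysisSession(
  nextStep: () => Promise<Step>,
): Promise<"complete" | "terminated"> {
  let toolCalls = 0;
  for (;;) {
    const step = await nextStep();
    if (step.done) return "complete";                      // submit_analysis ended the session
    if (toolCalls >= MAX_TOOL_CALLS) return "terminated";  // budget exhausted, session cut off
    toolCalls++;
    await step.runTool();
  }
}
```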
@@ -175,6 +212,31 @@ Effect: No executions until 3:30 PM, then resumes baseline
 3. Scheduler's Governor checks pause state—if `pausedUntil > now`, returns that timestamp with source `"paused"`
 4. When pause time passes, Governor resumes normal scheduling

+### 4. clear_hints
+
+**Purpose**: Reset endpoint to baseline schedule by clearing all AI hints
+
+**Parameters**:
+- `reason`: Explanation for clearing hints
+
+**When to use**:
+- AI hints are no longer relevant (situation changed)
+- Manual intervention resolved the issue
+- False positive detection (AI over-reacted)
+- Want to revert to baseline without waiting for TTL expiry
+
+**Example**:
+```
+AI sees endpoint recovered but has aggressive 30s interval hint active
+Action: clear_hints(reason="Endpoint recovered, reverting to baseline")
+Effect: AI hints cleared immediately, baseline schedule resumes
+```
+
+**How it works**:
+1. AI calls the tool with a reason
+2. Tool clears `aiHintIntervalMs`, `aiHintNextRunAt`, `aiHintExpiresAt`
+3. Next execution uses baseline schedule
+
 ## Query Tools: Informing Decisions

 Before taking action, the AI can query response data using **three query tools**:

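To illustrate what the new clear_hints tool does to an endpoint record: the three hint fields are named in the docs above, while the record shape and update style below are assumptions.

```ts
// Illustrative effect of clear_hints on an endpoint record.
interface AiHintFields {
  aiHintIntervalMs: number | null;  // interval override hint
  aiHintNextRunAt: Date | null;     // one-shot "run at" hint
  aiHintExpiresAt: Date | null;     // hint TTL
}

function clearHints(endpoint: AiHintFields, reason: string): AiHintFields {
  console.log(`clear_hints: ${reason}`);
  // With all three fields null, the next execution falls back to the baseline schedule.
  return { ...endpoint, aiHintIntervalMs: null, aiHintNextRunAt: null, aiHintExpiresAt: null };
}
```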
@@ -190,7 +252,7 @@ Example: "What's the current queue depth?"
 ### 2. get_response_history

 **Parameters**:
-- `limit`: Number of responses (1-10, default
+- `limit`: Number of responses (1-10, default 10)
 - `offset`: Skip N newest responses for pagination

 **Returns**: Array of response bodies with timestamps, newest first

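Because get_response_history returns response bodies newest first, a trend check such as "is queue_depth increasing?" is a few lines of client-side logic. A hypothetical sketch, with a HistoryEntry shape that mirrors the JSON examples elsewhere in this diff:

```ts
// Sketch: detecting a monotonically increasing queue_depth from history results.
interface HistoryEntry {
  responseBody: { queue_depth?: number };
  durationMs?: number;
}

function queueDepthIncreasing(responses: HistoryEntry[]): boolean {
  // Responses are returned newest first; reverse for chronological order.
  const depths = [...responses]
    .reverse()
    .map(r => r.responseBody.queue_depth)
    .filter((d): d is number => typeof d === "number");
  return depths.length >= 2 && depths.every((d, i) => i === 0 || d >= depths[i - 1]);
}
```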
@@ -199,17 +261,39 @@ Example: "What's the current queue depth?"

 Example: "Is queue_depth increasing monotonically?"

-**
+**Note**: 10 records is usually sufficient for trend analysis. The AI is encouraged not to paginate endlessly—if patterns are unclear after 10-20 records, more data rarely helps.

 Response bodies are truncated at 1000 characters to prevent token overflow.

 ### 3. get_sibling_latest_responses

-**Returns**:
+**Returns**: For each sibling endpoint in the same job:
+- **Response data**: Latest response body, timestamp, status
+- **Schedule info**: Baseline, next run, last run, pause status, failure count
+- **AI hints**: Active interval/one-shot hints with expiry and reason
+
+**Use case**: Coordinate across endpoints with full context

-
+Example: "Is the upstream endpoint healthy and running normally, or is it paused/failing?"

-
+The enriched response allows the AI to understand sibling state at a glance:
+```json
+{
+  "endpointId": "ep_456",
+  "endpointName": "Data Fetcher",
+  "responseBody": { "batch_ready": true },
+  "timestamp": "2025-11-02T14:25:00Z",
+  "status": "success",
+  "schedule": {
+    "baseline": "every 5 minutes",
+    "nextRunAt": "2025-11-02T14:30:00Z",
+    "lastRunAt": "2025-11-02T14:25:00Z",
+    "isPaused": false,
+    "failureCount": 0
+  },
+  "aiHints": null
+}
+```

 Only useful for workflow endpoints (multiple endpoints in the same job that coordinate).

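A hypothetical consumer of the enriched sibling payload above, deciding whether an upstream endpoint is healthy before coordinating. Field names follow the JSON example; the decision logic itself is an assumption:

```ts
// Sketch: interpreting a sibling entry from get_sibling_latest_responses.
interface SiblingEntry {
  endpointName: string;
  responseBody: Record<string, unknown>;
  status: string;
  schedule: { isPaused: boolean; failureCount: number };
  aiHints: unknown | null;
}

function upstreamReady(siblings: SiblingEntry[], upstreamName: string): boolean {
  const upstream = siblings.find(s => s.endpointName === upstreamName);
  if (!upstream) return false;
  return (
    upstream.status === "success" &&
    !upstream.schedule.isPaused &&
    upstream.schedule.failureCount === 0 &&
    upstream.responseBody["batch_ready"] === true // e.g. the Data Fetcher example above
  );
}
```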
@@ -371,12 +455,15 @@ Every AI analysis creates a session record:
 - **Reasoning** (AI's explanation of its decision)
 - **Token usage** (cost tracking)
 - **Duration**
+- **Next analysis at** (when AI scheduled its next analysis)
+- **Endpoint failure count** (snapshot for detecting state changes)

 This audit trail helps you:
 - Debug unexpected scheduling behavior
 - Understand why AI tightened/relaxed intervals
 - Review cost (token usage per analysis)
 - Tune prompts or constraints based on AI reasoning
+- See when AI expects to analyze again

 Check the sessions table when an endpoint's schedule changes unexpectedly. The reasoning field shows the AI's thought process.

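When debugging, the session records can be queried directly. A sketch assuming a Postgres backend and the ai_analysis_sessions schema listed later in this diff; the actual storage layer and column naming may differ:

```ts
// Illustrative query of recent analysis sessions for one endpoint (assumes Postgres via node-postgres).
import { Pool } from "pg";

const pool = new Pool(); // connection settings come from the standard PG* environment variables

export async function recentSessions(endpointId: string, limit = 5) {
  const { rows } = await pool.query(
    `SELECT "createdAt", reasoning, "tokenUsage", "nextAnalysisAt"
       FROM ai_analysis_sessions
      WHERE "endpointId" = $1
      ORDER BY "createdAt" DESC
      LIMIT $2`,
    [endpointId, limit],
  );
  return rows; // each row carries the AI's reasoning for its decision
}
```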
@@ -434,14 +521,15 @@ This ensures that even if AI becomes unavailable or too expensive, your jobs kee

 ## Key Takeaways

-1. **AI
-2. **AI sees health
-3. **
-4. **
-5. **
-6. **
-7. **
-8. **
-9. **
+1. **AI controls its own analysis frequency**: Schedules re-analysis based on endpoint state
+2. **AI sees multi-window health data**: 1h, 4h, 24h windows for accurate recovery detection
+3. **Sessions have constraints**: Max 15 tool calls to prevent runaway costs
+4. **Four action tools**: propose_interval, propose_next_time, pause_until, clear_hints
+5. **Hints have TTLs**: Auto-revert on expiration (safety)
+6. **Interval hints override baseline**: Enables adaptation
+7. **Nudging provides immediacy**: Changes apply within seconds
+8. **Structure response bodies intentionally**: Include metrics AI should monitor
+9. **Sessions provide audit trail**: Debug AI reasoning
+10. **Quota controls costs**: AI unavailable ≠ jobs stop running

 Understanding how AI adaptation works helps you design endpoints that benefit from intelligent scheduling, structure response bodies effectively, and debug unexpected schedule changes.

@@ -50,8 +50,27 @@ Quick lookup for Cronicorn terminology, schema, defaults, and tools.
 **Sibling Endpoints**
 : Endpoints in the same job that can coordinate via `get_sibling_latest_responses()`.

+**Multi-Window Health**
+: Health metrics shown across three time windows (1h, 4h, 24h) to accurately detect recovery patterns. Prevents old failure bursts from skewing recent health assessment.
+
 **Analysis Session**
-: Record of AI analysis for an endpoint, including tools called, reasoning, and
+: Record of AI analysis for an endpoint, including tools called, reasoning, token usage, and when to analyze next. Stored in `ai_analysis_sessions` table.
+
+**Session Constraints**
+: Resource limits on AI analysis sessions. Maximum 15 tool calls per session to prevent runaway costs. Sessions that hit the limit are terminated.
+
+## Schema: ai_analysis_sessions Table
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `id` | UUID | Unique session identifier |
+| `endpointId` | UUID | Endpoint that was analyzed |
+| `createdAt` | Timestamp | When analysis started |
+| `reasoning` | String | AI's explanation for decision |
+| `tokenUsage` | Integer | Tokens consumed |
+| `durationMs` | Integer | How long analysis took |
+| `nextAnalysisAt` | Timestamp | When AI requested re-analysis |
+| `endpointFailureCount` | Integer | Failure count snapshot at analysis time |

 ## Schema: job_endpoints Table

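For reference, the table above maps naturally onto a TypeScript type. The interface name and exact TS types are illustrative; the fields mirror the documented schema:

```ts
// TypeScript view of an ai_analysis_sessions row (illustrative types).
interface AnalysisSession {
  id: string;                    // UUID: unique session identifier
  endpointId: string;            // UUID: endpoint that was analyzed
  createdAt: Date;               // when analysis started
  reasoning: string;             // AI's explanation for its decision
  tokenUsage: number;            // tokens consumed
  durationMs: number;            // how long the analysis took
  nextAnalysisAt: Date | null;   // when the AI requested re-analysis
  endpointFailureCount: number;  // failure count snapshot at analysis time
}
```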
@@ -88,7 +107,8 @@ Quick lookup for Cronicorn terminology, schema, defaults, and tools.
 | Setting | Default | Notes |
 |---------|---------|-------|
 | **Scheduler tick interval** | 5 seconds | How often Scheduler wakes up to claim endpoints |
-| **AI analysis interval** | 5 minutes | How often AI Planner discovers
+| **AI analysis interval** | 5 minutes | How often AI Planner discovers endpoints (overridden by AI scheduling) |
+| **AI max tool calls** | 15 | Maximum tool calls per AI session (hard limit) |
 | **Lock TTL** | 30 seconds | How long claimed endpoints stay locked |
 | **Batch size** | 10 endpoints | Max endpoints claimed per tick |
 | **Zombie threshold** | 5 minutes | Runs older than this marked as timeout |
@@ -97,6 +117,7 @@ Quick lookup for Cronicorn terminology, schema, defaults, and tools.
 | **Timeout** | 30 seconds | Default request timeout |
 | **Max response size** | 100 KB | Default response body limit |
 | **Failure count cap** | 5 | Backoff capped at 2^5 = 32x |
+| **History limit** | 10 | Default records returned by get_response_history |
 | **Min interval** | None | No minimum unless configured |
 | **Max interval** | None | No maximum unless configured |

@@ -168,6 +189,24 @@ Pause execution temporarily or resume.
 }
 ```

+#### clear_hints
+
+Clear all AI hints, reverting to baseline schedule immediately.
+
+**Parameters**:
+- `reason` (string, required): Explanation for clearing hints
+
+**Effect**: Clears `aiHintIntervalMs`, `aiHintNextRunAt`, `aiHintExpiresAt`
+
+**Use case**: When adaptive hints are no longer relevant (manual intervention, false positive, situation resolved)
+
+**Example**:
+```json
+{
+  "reason": "Endpoint recovered, reverting to baseline schedule"
+}
+```
+
 ### Query Tools (Read Data)

 #### get_latest_response

@@ -191,15 +230,15 @@ Get most recent response body from this endpoint.
 Get recent response bodies to identify trends.

 **Parameters**:
-- `limit` (number, default:
+- `limit` (number, default: 10, max: 10): Number of responses
 - `offset` (number, default: 0): Skip N newest for pagination

 **Returns**:
 ```json
 {
-  "count":
+  "count": 10,
   "hasMore": true,
-  "pagination": { "offset": 0, "limit":
+  "pagination": { "offset": 0, "limit": 10, "nextOffset": 10 },
   "responses": [
     {
       "responseBody": { "queue_depth": 200 },
@@ -208,15 +247,15 @@ Get recent response bodies to identify trends.
       "durationMs": 120
     }
   ],
-  "hint": "More history
+  "hint": "More history exists if needed, but 10 records is usually sufficient"
 }
 ```

-**Note**: Response bodies truncated at 1000 chars
+**Note**: Response bodies truncated at 1000 chars. 10 records is usually sufficient for trend analysis.

 #### get_sibling_latest_responses

-Get latest responses from all endpoints in same job.
+Get latest responses and state from all endpoints in same job.

 **Parameters**: None

@@ -230,22 +269,40 @@ Get latest responses from all endpoints in same job.
       "endpointName": "Data Processor",
       "responseBody": { "batch_id": "2025-11-02", "ready": true },
       "timestamp": "2025-11-02T14:25:00Z",
-      "status": "success"
+      "status": "success",
+      "schedule": {
+        "baseline": "every 5 minutes",
+        "nextRunAt": "2025-11-02T14:30:00Z",
+        "lastRunAt": "2025-11-02T14:25:00Z",
+        "isPaused": false,
+        "pausedUntil": null,
+        "failureCount": 0
+      },
+      "aiHints": {
+        "intervalMs": 30000,
+        "nextRunAt": null,
+        "expiresAt": "2025-11-02T15:25:00Z",
+        "reason": "Tightening monitoring"
+      }
     }
   ]
 }
 ```

-**Note**:
+**Note**: Returns schedule info and active AI hints per sibling for full context

 ### Final Answer Tool

 #### submit_analysis

-Signal analysis completion and
+Signal analysis completion, provide reasoning, and schedule next analysis.

 **Parameters**:
 - `reasoning` (string, required): Analysis explanation
+- `next_analysis_in_ms` (number, optional): When to analyze this endpoint again
+  - Min: 300000 (5 minutes)
+  - Max: 86400000 (24 hours)
+  - If omitted, uses baseline interval
 - `actions_taken` (string[], optional): List of tools called
 - `confidence` (enum, optional): 'high' | 'medium' | 'low'

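The next_analysis_in_ms parameter documented above accepts values between 300000 and 86400000 ms. A hypothetical mapping from endpoint state to that value, following the scheduling guidance shown in the next hunk:

```ts
// Illustrative choice of next_analysis_in_ms (bounds and cadences from the docs).
type EndpointState = "incident" | "recovering" | "stable" | "very_stable";

const MINUTE = 60_000;
const HOUR = 60 * MINUTE;

const NEXT_ANALYSIS_MS: Record<EndpointState, number> = {
  incident: 5 * MINUTE,     // 300000: active incident, close monitoring
  recovering: 30 * MINUTE,  // 1800000: watch the recovery
  stable: 2 * HOUR,         // 7200000: routine check
  very_stable: 24 * HOUR,   // 86400000: very stable or daily job
};

function nextAnalysisInMs(state: EndpointState): number {
  return NEXT_ANALYSIS_MS[state];
}
```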
@@ -254,11 +311,18 @@ Signal analysis completion and provide reasoning.
 {
   "status": "analysis_complete",
   "reasoning": "...",
+  "next_analysis_in_ms": 7200000,
   "actions_taken": ["propose_interval"],
   "confidence": "high"
 }
 ```

+**Scheduling guidance**:
+- 300000 (5 min): Incident active, need close monitoring
+- 1800000 (30 min): Recovering, monitoring progress
+- 7200000 (2 hours): Stable, routine check
+- 86400000 (24 hours): Very stable or daily job
+
 **Note**: Must be called last to complete analysis

 ## Scheduling Sources

package/package.json: CHANGED