@cronicorn/mcp-server 1.4.5 → 1.5.0
- package/dist/docs/README.md +133 -0
- package/dist/docs/core-concepts.md +233 -0
- package/dist/docs/introduction.md +152 -0
- package/dist/docs/quick-start.md +232 -0
- package/dist/docs/technical/_category.yml +8 -0
- package/dist/docs/technical/configuration-and-constraints.md +415 -0
- package/dist/docs/technical/coordinating-multiple-endpoints.md +457 -0
- package/dist/docs/technical/how-ai-adaptation-works.md +453 -0
- package/dist/docs/technical/how-scheduling-works.md +268 -0
- package/dist/docs/technical/reference.md +404 -0
- package/dist/docs/technical/system-architecture.md +306 -0
- package/dist/index.js +123 -8
- package/dist/index.js.map +1 -1
- package/package.json +3 -2
|
@@ -0,0 +1,232 @@
|
|
|
1
|
+
---
|
|
2
|
+
id: quick-start
|
|
3
|
+
title: Quick Start Guide
|
|
4
|
+
description: Create your first scheduled job in 5 minutes
|
|
5
|
+
tags:
|
|
6
|
+
- user
|
|
7
|
+
- essential
|
|
8
|
+
sidebar_position: 3
|
|
9
|
+
mcp:
|
|
10
|
+
uri: file:///docs/quick-start.md
|
|
11
|
+
mimeType: text/markdown
|
|
12
|
+
priority: 0.9
|
|
13
|
+
lastModified: 2025-11-02T00:00:00Z
|
|
14
|
+
---
|
|
15
|
+
|
|
16
|
+
# Quick Start Guide
|
|
17
|
+
|
|
18
|
+
Create your first scheduled job and start monitoring executions.
|
|
19
|
+
|
|
20
|
+
> **Note**: This guide is for using Cronicorn as a hosted service. If you're self-hosting, see the [Technical Documentation](./technical/system-architecture.md).
|
|
21
|
+
|
|
22
|
+
## 1. Sign Up
|
|
23
|
+
|
|
24
|
+
1. Visit [https://app.cronicorn.com](https://app.cronicorn.com)
|
|
25
|
+
2. Click **Sign in with GitHub**
|
|
26
|
+
3. Authorize the application
|
|
27
|
+
|
|
28
|
+
## 2. Create Your First Job
|
|
29
|
+
|
|
30
|
+
Jobs group related endpoints together. Let's create one for API monitoring:
|
|
31
|
+
|
|
32
|
+
1. Click **Create Job**
|
|
33
|
+
2. Fill in the details:
|
|
34
|
+
- **Name**: `API Health Checks`
|
|
35
|
+
- **Description**: `Monitor our API endpoints`
|
|
36
|
+
3. Click **Create**
|
|
37
|
+
|
|
38
|
+
## 3. Add an Endpoint
|
|
39
|
+
|
|
40
|
+
Now let's add an HTTP endpoint to monitor:
|
|
41
|
+
|
|
42
|
+
1. Click on your newly created job
|
|
43
|
+
2. Click **Add Endpoint**
|
|
44
|
+
3. Fill in:
|
|
45
|
+
- **Name**: `Main API Health`
|
|
46
|
+
- **URL**: `https://api.yourapp.com/health`
|
|
47
|
+
- **Method**: `GET`
|
|
48
|
+
- **Baseline Schedule**: Choose either:
|
|
49
|
+
- **Cron**: `*/5 * * * *` (every 5 minutes)
|
|
50
|
+
- **Interval**: `300000` (milliseconds = 5 minutes)
|
|
51
|
+
|
|
52
|
+
4. **(Optional)** Add safety constraints:
|
|
53
|
+
- **Min Interval**: `30000` (30 seconds - prevents over-polling)
|
|
54
|
+
- **Max Interval**: `900000` (15 minutes - ensures regular checks)
|
|
55
|
+
|
|
56
|
+
5. Click **Add Endpoint**
|
|
57
|
+
|
|
58
|
+
## 4. View Execution History
|
|
59
|
+
|
|
60
|
+
Your endpoint will start executing automatically. To monitor it:
|
|
61
|
+
|
|
62
|
+
1. Click on the endpoint name
|
|
63
|
+
2. View the **Runs** tab to see:
|
|
64
|
+
- Execution timestamps
|
|
65
|
+
- Success/failure status
|
|
66
|
+
- Response time
|
|
67
|
+
- Error messages (if any)
|
|
68
|
+
|
|
69
|
+
## 5. Enable AI Adaptation (Optional)
|
|
70
|
+
|
|
71
|
+
Want AI to optimize your schedule automatically?
|
|
72
|
+
|
|
73
|
+
1. Navigate to **Settings** in the top navigation
|
|
74
|
+
2. Find the **AI Features** section
|
|
75
|
+
3. Toggle **Enable AI Scheduling**
|
|
76
|
+
4. Your endpoints will now adapt based on performance patterns
|
|
77
|
+
|
|
78
|
+
**How AI helps:**
|
|
79
|
+
- Increases frequency when errors are detected
|
|
80
|
+
- Backs off when everything is stable
|
|
81
|
+
- Always respects your min/max constraints
|
|
82
|
+
- All hints expire automatically (TTL)
|
|
83
|
+
|
|
84
|
+
## Common Patterns
|
|
85
|
+
|
|
86
|
+
### API Health Check
|
|
87
|
+
|
|
88
|
+
Monitor an API endpoint with adaptive frequency:
|
|
89
|
+
|
|
90
|
+
```
|
|
91
|
+
Name: API Health Check
|
|
92
|
+
URL: https://api.example.com/health
|
|
93
|
+
Method: GET
|
|
94
|
+
Baseline: Every 5 minutes (300000ms)
|
|
95
|
+
Min Interval: 30 seconds (prevents rate limit issues)
|
|
96
|
+
Max Interval: 15 minutes (ensures timely detection)
|
|
97
|
+
```
|
|
98
|
+
|
|
99
|
+
**With AI enabled:**
|
|
100
|
+
- Normal state: Runs every 5 minutes
|
|
101
|
+
- Errors detected: Increases to every 30 seconds
|
|
102
|
+
- All healthy: Backs off to every 15 minutes
|
|
103
|
+
|
|
104
|
+
### Data Sync
|
|
105
|
+
|
|
106
|
+
Synchronize data between systems:
|
|
107
|
+
|
|
108
|
+
```
|
|
109
|
+
Name: User Sync
|
|
110
|
+
URL: https://api.example.com/sync/users
|
|
111
|
+
Method: POST
|
|
112
|
+
Body: {"lastSyncTime": "{{lastRunAt}}"}
|
|
113
|
+
Baseline: Every hour (3600000ms)
|
|
114
|
+
Max Interval: 2 hours (ensures freshness)
|
|
115
|
+
```
|
|
116
|
+
|
|
117
|
+
### Daily Cleanup
|
|
118
|
+
|
|
119
|
+
Run maintenance tasks on a schedule:
|
|
120
|
+
|
|
121
|
+
```
|
|
122
|
+
Name: Database Cleanup
|
|
123
|
+
URL: https://api.example.com/admin/cleanup
|
|
124
|
+
Method: POST
|
|
125
|
+
Baseline: Daily at 2am (cron: "0 2 * * *")
|
|
126
|
+
Timeout: 300000ms (5 minutes)
|
|
127
|
+
```
|
|
128
|
+
|
|
129
|
+
## Using API Keys
|
|
130
|
+
|
|
131
|
+
For programmatic access, create API keys:
|
|
132
|
+
|
|
133
|
+
1. Go to **Settings** → **API Keys**
|
|
134
|
+
2. Click **Create API Key**
|
|
135
|
+
3. Give it a name and copy the key (shown only once!)
|
|
136
|
+
4. Use in your requests:
|
|
137
|
+
|
|
138
|
+
```bash
|
|
139
|
+
curl -X GET https://api.cronicorn.com/api/jobs \
|
|
140
|
+
-H "x-api-key: cron_abc123..."
|
|
141
|
+
```
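The same request can be built from a script. The sketch below uses only Python's standard library. The base URL and the `x-api-key` header come from the curl example; the helper names are illustrative assumptions, and the response shape is not described here, so the code simply decodes whatever JSON comes back.

```python
import json
import urllib.request

API_BASE = "https://api.cronicorn.com/api"  # base URL taken from the curl example


def build_jobs_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for the jobs list."""
    return urllib.request.Request(
        f"{API_BASE}/jobs",
        headers={"x-api-key": api_key},
        method="GET",
    )


def list_jobs(api_key: str):
    """Send the request and decode the JSON body (its shape is not assumed here)."""
    with urllib.request.urlopen(build_jobs_request(api_key), timeout=30) as resp:
        return json.load(resp)
```

Call `list_jobs(...)` with a key read from an environment variable or a secret store rather than hard-coding it, since the key is shown only once.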
|
|
142
|
+
|
|
143
|
+
See the [API Reference](https://app.cronicorn.com/docs/api) for all available endpoints.
|
|
144
|
+
|
|
145
|
+
## Using with AI Assistants
|
|
146
|
+
|
|
147
|
+
Cronicorn provides an MCP server for AI assistants like Claude:
|
|
148
|
+
|
|
149
|
+
### Installation
|
|
150
|
+
|
|
151
|
+
```bash
|
|
152
|
+
npm install -g @cronicorn/mcp-server
|
|
153
|
+
```
|
|
154
|
+
|
|
155
|
+
### Configuration
|
|
156
|
+
|
|
157
|
+
**Claude Desktop** (`~/Library/Application Support/Claude/claude_desktop_config.json`):
|
|
158
|
+
|
|
159
|
+
```json
|
|
160
|
+
{
|
|
161
|
+
"mcpServers": {
|
|
162
|
+
"cronicorn": {
|
|
163
|
+
"command": "cronicorn-mcp"
|
|
164
|
+
}
|
|
165
|
+
}
|
|
166
|
+
}
|
|
167
|
+
```
|
|
168
|
+
|
|
169
|
+
### First Run
|
|
170
|
+
|
|
171
|
+
The MCP server will:
|
|
172
|
+
|
|
173
|
+
1. Display a device code
|
|
174
|
+
2. Open your browser to approve access
|
|
175
|
+
3. Store credentials securely
|
|
176
|
+
|
|
177
|
+
Then ask Claude to manage your jobs:
|
|
178
|
+
|
|
179
|
+
- "Create a health check endpoint for api.example.com"
|
|
180
|
+
- "Show me the run history for my API health check"
|
|
181
|
+
- "Pause all endpoints for the next hour"
|
|
182
|
+
|
|
183
|
+
## Troubleshooting
|
|
184
|
+
|
|
185
|
+
### Endpoint Not Running
|
|
186
|
+
|
|
187
|
+
**Check the endpoint status:**
|
|
188
|
+
1. Open the endpoint details
|
|
189
|
+
2. Look for the **Status** field
|
|
190
|
+
3. If "Paused", click **Resume**
|
|
191
|
+
|
|
192
|
+
**Check execution history:**
|
|
193
|
+
1. View the **Runs** tab
|
|
194
|
+
2. Look for error messages
|
|
195
|
+
3. Check if the URL is accessible
|
|
196
|
+
|
|
197
|
+
### Authentication Errors
|
|
198
|
+
|
|
199
|
+
**For HTTPS endpoints with auth:**
|
|
200
|
+
1. Add authentication headers in the endpoint configuration
|
|
201
|
+
2. Common headers:
|
|
202
|
+
- `Authorization: Bearer <token>`
|
|
203
|
+
- `x-api-key: <key>`
|
|
204
|
+
|
|
205
|
+
### Timeout Errors
|
|
206
|
+
|
|
207
|
+
If requests are timing out:
|
|
208
|
+
|
|
209
|
+
1. Edit the endpoint
|
|
210
|
+
2. Increase **Timeout** (default is 30 seconds)
|
|
211
|
+
3. Consider whether your API needs optimization
|
|
212
|
+
|
|
213
|
+
### Rate Limit Errors
|
|
214
|
+
|
|
215
|
+
If you're hitting rate limits:
|
|
216
|
+
|
|
217
|
+
1. Increase **Min Interval** (e.g., from 30s to 60s)
|
|
218
|
+
2. Adjust **Baseline Schedule** to be less frequent
|
|
219
|
+
3. Let AI adapt (it will back off automatically)
|
|
220
|
+
|
|
221
|
+
## Next Steps
|
|
222
|
+
|
|
223
|
+
- **[Core Concepts](./core-concepts.md)** - Understand jobs, endpoints, and AI scheduling
|
|
224
|
+
- **[API Reference](https://app.cronicorn.com/docs/api)** - Full API documentation
|
|
225
|
+
- **[Self-Hosting Guide](./technical/system-architecture.md)** - Deploy Cronicorn yourself
|
|
226
|
+
|
|
227
|
+
## Getting Help
|
|
228
|
+
|
|
229
|
+
- 📖 [Documentation](https://cronicorn.com/docs)
|
|
230
|
+
- 💬 [Discord Community](https://discord.gg/cronicorn)
|
|
231
|
+
- 📧 [Email Support](mailto:support@cronicorn.com)
|
|
232
|
+
- 🐛 [Report Issues](https://github.com/weskerllc/cronicorn/issues)
|
|
@@ -0,0 +1,415 @@
|
|
|
1
|
+
---
|
|
2
|
+
id: configuration-constraints
|
|
3
|
+
title: Configuration & Constraints
|
|
4
|
+
description: Decision guide for intervals, cron, timeouts, and AI constraints
|
|
5
|
+
tags: [assistant, technical, configuration]
|
|
6
|
+
sidebar_position: 5
|
|
7
|
+
mcp:
|
|
8
|
+
uri: file:///docs/technical/configuration-and-constraints.md
|
|
9
|
+
mimeType: text/markdown
|
|
10
|
+
priority: 0.80
|
|
11
|
+
lastModified: 2025-11-02T00:00:00Z
|
|
12
|
+
---
|
|
13
|
+
|
|
14
|
+
# Configuration and Constraints
|
|
15
|
+
|
|
16
|
+
This document helps you configure endpoints correctly. Instead of explaining every field, it focuses on the decisions you need to make and their consequences.
|
|
17
|
+
|
|
18
|
+
## Decision 1: Cron vs. Interval
|
|
19
|
+
|
|
20
|
+
**Question**: Should I use a cron expression or fixed interval?
|
|
21
|
+
|
|
22
|
+
### Use Cron When:
|
|
23
|
+
|
|
24
|
+
**Scheduled business logic** - Tasks tied to calendar/clock times:
|
|
25
|
+
- Daily reports at 2 AM: `"0 2 * * *"`
|
|
26
|
+
- Hourly data sync: `"0 * * * *"`
|
|
27
|
+
- Weekday batch jobs: `"0 9 * * 1-5"`
|
|
28
|
+
- Monthly processing (midnight on the 1st): `"0 0 1 * *"`
|
|
29
|
+
|
|
30
|
+
**Pros**: Predictable timing, aligns with business schedules, easy to reason about
|
|
31
|
+
|
|
32
|
+
**Cons**: Can't be tightened by AI (AI interval hints don't apply to cron), uneven spacing (monthly crons have 28-31 day gaps)
|
|
33
|
+
|
|
34
|
+
### Use Interval When:
|
|
35
|
+
|
|
36
|
+
**Continuous monitoring** - Tasks measuring ongoing state:
|
|
37
|
+
- Queue depth checks every 5 minutes: `300000ms`
|
|
38
|
+
- Health checks every 30 seconds: `30000ms`
|
|
39
|
+
- Metric collection every 1 minute: `60000ms`
|
|
40
|
+
|
|
41
|
+
**Pros**: AI can tighten/relax (adaptive), consistent spacing, simple math
|
|
42
|
+
|
|
43
|
+
**Cons**: Drifts from clock (5min interval ≠ :00, :05, :10...), no calendar awareness
|
|
44
|
+
|
|
45
|
+
### Decision Tree:
|
|
46
|
+
|
|
47
|
+
```
|
|
48
|
+
Does the task care about specific clock times?
|
|
49
|
+
├─ Yes → Use cron
|
|
50
|
+
│ Example: "Run daily at 3 AM"
|
|
51
|
+
└─ No → Use interval
|
|
52
|
+
├─ Do you want AI adaptation?
|
|
53
|
+
│ ├─ Yes → Use interval (AI can adjust)
|
|
54
|
+
│ └─ No → Either works (prefer interval for simplicity)
|
|
55
|
+
└─ Example: "Check every 5 minutes"
|
|
56
|
+
```
|
|
57
|
+
|
|
58
|
+
### Common Mistake:
|
|
59
|
+
|
|
60
|
+
❌ Using cron for monitoring: `"*/5 * * * *"` (every 5 minutes)
|
|
61
|
+
✅ Use interval instead: `300000` (AI can tighten to 1 minute during incidents)
|
|
62
|
+
|
|
63
|
+
Cron works, but you lose adaptive capability. Reserve cron for calendar-aware tasks.
|
|
64
|
+
|
|
65
|
+
## Decision 2: Setting Min/Max Intervals
|
|
66
|
+
|
|
67
|
+
**Question**: Should I set `minIntervalMs` and `maxIntervalMs` constraints?
|
|
68
|
+
|
|
69
|
+
### When to Set Min Interval:
|
|
70
|
+
|
|
71
|
+
**Rate limiting** - External service has request limits:
|
|
72
|
+
- API allows 100 requests/hour → `minIntervalMs: 36000` (36 seconds)
|
|
73
|
+
- Database writes limited to 1/second → `minIntervalMs: 1000`
|
|
74
|
+
|
|
75
|
+
**Cost control** - Execution is expensive:
|
|
76
|
+
- AI analysis costs $0.10/run → `minIntervalMs: 300000` (5 min minimum)
|
|
77
|
+
- External API charges per call → Set based on budget
|
|
78
|
+
|
|
79
|
+
**Resource protection** - Prevent overwhelming systems:
|
|
80
|
+
- Heavy computation → `minIntervalMs: 60000` (1 min minimum)
|
|
81
|
+
- Large data transfers → Set based on capacity
|
|
82
|
+
|
|
83
|
+
**Effect**: The Governor clamps any schedule (baseline or AI hint) to this minimum. Even if AI proposes a 10-second interval, the endpoint runs no more often than the minimum interval allows.
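The rate-limit arithmetic above can be written as a small helper. This is a sketch, not part of Cronicorn: the function name is hypothetical, and it assumes calls are evenly spaced across the rate-limit window.

```python
import math


def min_interval_ms_for_rate_limit(max_requests: int, window_seconds: float) -> int:
    """Smallest spacing (ms) that keeps evenly spaced calls within the limit."""
    if max_requests <= 0 or window_seconds <= 0:
        raise ValueError("max_requests and window_seconds must be positive")
    # Round up so the result never allows more than max_requests per window.
    return math.ceil(window_seconds * 1000 / max_requests)


print(min_interval_ms_for_rate_limit(100, 3600))  # 100 requests/hour -> 36000
print(min_interval_ms_for_rate_limit(1, 1))       # 1 write/second -> 1000
```

If other clients share the same quota, a larger value than the one computed here is the safer choice.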
|
|
84
|
+
|
|
85
|
+
### When to Set Max Interval:
|
|
86
|
+
|
|
87
|
+
**Staleness limits** - Data can't be too old:
|
|
88
|
+
- Real-time monitoring → `maxIntervalMs: 300000` (5 min max)
|
|
89
|
+
- Fraud detection → `maxIntervalMs: 60000` (1 min max)
|
|
90
|
+
|
|
91
|
+
**SLA requirements** - Guaranteed check frequency:
|
|
92
|
+
- "Must check health every 10 minutes" → `maxIntervalMs: 600000`
|
|
93
|
+
|
|
94
|
+
**Effect**: The Governor clamps any schedule to this maximum. If AI tries to relax to 30 minutes, the endpoint still runs at the maximum interval.
|
|
95
|
+
|
|
96
|
+
### Default Strategy:
|
|
97
|
+
|
|
98
|
+
**Start without constraints**, then add if needed:
|
|
99
|
+
1. Deploy with baseline only (no min/max)
|
|
100
|
+
2. Monitor execution patterns
|
|
101
|
+
3. Add `minIntervalMs` if hitting rate limits or costs spike
|
|
102
|
+
4. Add `maxIntervalMs` if data gets too stale
|
|
103
|
+
|
|
104
|
+
Constraints are hard limits—they override everything, including AI. Use them sparingly.
|
|
105
|
+
|
|
106
|
+
### Example Configurations:
|
|
107
|
+
|
|
108
|
+
**Health check** (must run frequently, no rate limits):
|
|
109
|
+
```json
|
|
110
|
+
{
|
|
111
|
+
"baselineIntervalMs": 30000,
|
|
112
|
+
"minIntervalMs": null,
|
|
113
|
+
"maxIntervalMs": 120000
|
|
114
|
+
}
|
|
115
|
+
```
|
|
116
|
+
Runs every 30 seconds at baseline. AI can relax the interval, but never beyond 2 minutes.
|
|
117
|
+
|
|
118
|
+
**External API call** (rate limited):
|
|
119
|
+
```json
|
|
120
|
+
{
|
|
121
|
+
"baselineIntervalMs": 60000,
|
|
122
|
+
"minIntervalMs": 30000,
|
|
123
|
+
"maxIntervalMs": null
|
|
124
|
+
}
|
|
125
|
+
```
|
|
126
|
+
AI can relax beyond 1 minute but never tighten below 30 seconds.
|
|
127
|
+
|
|
128
|
+
**Expensive operation** (cost control):
|
|
129
|
+
```json
|
|
130
|
+
{
|
|
131
|
+
"baselineIntervalMs": 300000,
|
|
132
|
+
"minIntervalMs": 180000,
|
|
133
|
+
"maxIntervalMs": 900000
|
|
134
|
+
}
|
|
135
|
+
```
|
|
136
|
+
Constrained to a 3-15 minute range; AI operates within that window.
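Before saving a configuration, it can help to sanity-check that the three values are consistent with each other. The checker below is a hypothetical local helper, not a Cronicorn API, and the rules it applies are assumptions drawn from the clamping behavior described above.

```python
from typing import List, Optional


def check_interval_config(
    baseline_ms: int, min_ms: Optional[int], max_ms: Optional[int]
) -> List[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if min_ms is not None and max_ms is not None and min_ms > max_ms:
        problems.append("minIntervalMs is greater than maxIntervalMs")
    if min_ms is not None and baseline_ms < min_ms:
        problems.append("baseline is below minIntervalMs and will be clamped up")
    if max_ms is not None and baseline_ms > max_ms:
        problems.append("baseline is above maxIntervalMs and will be clamped down")
    return problems


# The three example configurations above produce no warnings.
print(check_interval_config(30_000, None, 120_000))
print(check_interval_config(60_000, 30_000, None))
print(check_interval_config(300_000, 180_000, 900_000))
# A seconds-for-milliseconds slip (300 instead of 300000) is flagged.
print(check_interval_config(300, 30_000, None))
```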
|
|
137
|
+
|
|
138
|
+
## Decision 3: Timeout Configuration
|
|
139
|
+
|
|
140
|
+
**Question**: What timeout should I set?
|
|
141
|
+
|
|
142
|
+
### Rule of Thumb:
|
|
143
|
+
|
|
144
|
+
`timeout = p95 latency × 2 + buffer`
|
|
145
|
+
|
|
146
|
+
**Example**: If your endpoint usually completes in 2 seconds (p95), set timeout to 5-10 seconds.
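As a worked version of the rule of thumb, the helper below computes a suggested timeout in milliseconds. The function name and the 1-second default buffer are illustrative assumptions; pick a buffer that reflects how much your latency varies.

```python
def suggest_timeout_ms(p95_latency_ms: int, buffer_ms: int = 1_000) -> int:
    """Apply the rule of thumb: timeout = p95 latency x 2 + buffer."""
    if p95_latency_ms < 0 or buffer_ms < 0:
        raise ValueError("latency and buffer must be non-negative")
    return p95_latency_ms * 2 + buffer_ms


print(suggest_timeout_ms(2_000))           # 2 s p95, 1 s buffer -> 5000
print(suggest_timeout_ms(10_000, 10_000))  # 10 s p95, 10 s buffer -> 30000
```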
|
|
147
|
+
|
|
148
|
+
### Categories:
|
|
149
|
+
|
|
150
|
+
**Fast endpoints** (< 1 second typical):
|
|
151
|
+
- Health checks
|
|
152
|
+
- Simple API calls
|
|
153
|
+
- Cache reads
|
|
154
|
+
- **Timeout**: 3-5 seconds
|
|
155
|
+
|
|
156
|
+
**Medium endpoints** (1-10 seconds typical):
|
|
157
|
+
- Database queries
|
|
158
|
+
- Data transformations
|
|
159
|
+
- Most API integrations
|
|
160
|
+
- **Timeout**: 15-30 seconds
|
|
161
|
+
|
|
162
|
+
**Slow endpoints** (10-60 seconds typical):
|
|
163
|
+
- Heavy computations
|
|
164
|
+
- Large data transfers
|
|
165
|
+
- Batch operations
|
|
166
|
+
- **Timeout**: 60-120 seconds
|
|
167
|
+
|
|
168
|
+
**Very slow endpoints** (> 1 minute typical):
|
|
169
|
+
- ML model training
|
|
170
|
+
- Large file processing
|
|
171
|
+
- **Timeout**: 300+ seconds (5+ minutes)
|
|
172
|
+
|
|
173
|
+
### What Happens on Timeout:
|
|
174
|
+
|
|
175
|
+
1. Request is cancelled
|
|
176
|
+
2. Run status marked as `"timeout"`
|
|
177
|
+
3. Failure count increments
|
|
178
|
+
4. Exponential backoff applies (interval increases)
|
|
179
|
+
|
|
180
|
+
Set timeout high enough to avoid false failures, but low enough to prevent zombie runs consuming resources.
|
|
181
|
+
|
|
182
|
+
### Common Mistake:
|
|
183
|
+
|
|
184
|
+
❌ Setting timeout too low: `timeout: 5000` for a 10-second operation
|
|
185
|
+
Result: Frequent timeout failures trigger exponential backoff, so the endpoint runs less often
|
|
186
|
+
|
|
187
|
+
✅ Set based on actual performance: `timeout: 30000` with 10-second p95
|
|
188
|
+
|
|
189
|
+
## Decision 4: Response Body Design
|
|
190
|
+
|
|
191
|
+
**Question**: What should my endpoint return?
|
|
192
|
+
|
|
193
|
+
### Minimum Viable Response:
|
|
194
|
+
|
|
195
|
+
```json
|
|
196
|
+
{
|
|
197
|
+
"status": "success",
|
|
198
|
+
"timestamp": "2025-11-02T14:30:00Z"
|
|
199
|
+
}
|
|
200
|
+
```
|
|
201
|
+
|
|
202
|
+
AI can work with just execution status, but you get better adaptation with richer data.
|
|
203
|
+
|
|
204
|
+
### Good Response (Monitoring Endpoint):
|
|
205
|
+
|
|
206
|
+
```json
|
|
207
|
+
{
|
|
208
|
+
"status": "healthy",
|
|
209
|
+
"queue_depth": 45,
|
|
210
|
+
"processing_rate_per_min": 100,
|
|
211
|
+
"error_rate_pct": 1.2,
|
|
212
|
+
"avg_latency_ms": 150,
|
|
213
|
+
"capacity_pct": 35,
|
|
214
|
+
"timestamp": "2025-11-02T14:30:00Z"
|
|
215
|
+
}
|
|
216
|
+
```
|
|
217
|
+
|
|
218
|
+
AI can detect:
|
|
219
|
+
- Growing queue (tighten interval)
|
|
220
|
+
- High error rate (investigate immediately)
|
|
221
|
+
- Low capacity usage (relax interval)
|
|
222
|
+
|
|
223
|
+
### Great Response (Coordination Endpoint):
|
|
224
|
+
|
|
225
|
+
```json
|
|
226
|
+
{
|
|
227
|
+
"status": "healthy",
|
|
228
|
+
"metrics": {
|
|
229
|
+
"queue_depth": 45,
|
|
230
|
+
"error_rate_pct": 1.2
|
|
231
|
+
},
|
|
232
|
+
"coordination": {
|
|
233
|
+
"ready_for_downstream": true,
|
|
234
|
+
"upstream_healthy": true,
|
|
235
|
+
"batch_id": "2025-11-02"
|
|
236
|
+
},
|
|
237
|
+
"actions": {
|
|
238
|
+
"last_alert_sent_at": "2025-11-02T12:00:00Z",
|
|
239
|
+
"last_scale_up_at": null
|
|
240
|
+
},
|
|
241
|
+
"timestamp": "2025-11-02T14:30:00Z"
|
|
242
|
+
}
|
|
243
|
+
```
|
|
244
|
+
|
|
245
|
+
AI can:
|
|
246
|
+
- Detect trends in metrics
|
|
247
|
+
- Coordinate with siblings (batch_id, ready flags)
|
|
248
|
+
- Apply cooldowns (check last_alert_sent_at)
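Timestamps such as `last_alert_sent_at` make cooldown checks a simple subtraction. How the AI evaluates them internally is not specified here; the sketch below only shows the kind of computation the field enables, for example inside your own endpoint. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def parse_utc(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp with a trailing 'Z' as an aware UTC datetime."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def cooldown_elapsed(
    last_action_at: Optional[str], now: datetime, cooldown: timedelta
) -> bool:
    """True if the action never happened or happened at least `cooldown` ago."""
    if last_action_at is None:
        return True
    return now - parse_utc(last_action_at) >= cooldown


now = parse_utc("2025-11-02T14:30:00Z")
print(cooldown_elapsed("2025-11-02T12:00:00Z", now, timedelta(hours=1)))  # True
print(cooldown_elapsed("2025-11-02T12:00:00Z", now, timedelta(hours=3)))  # False
print(cooldown_elapsed(None, now, timedelta(hours=1)))                    # True
```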
|
|
249
|
+
|
|
250
|
+
### What to Include:
|
|
251
|
+
|
|
252
|
+
1. **Metrics that change**: Queue depth, error rates, latencies (not static config)
|
|
253
|
+
2. **Status indicators**: healthy/degraded/critical, true/false flags
|
|
254
|
+
3. **Timestamps**: Enable cooldown calculations, staleness checks
|
|
255
|
+
4. **Coordination signals**: For multi-endpoint workflows
|
|
256
|
+
5. **Thresholds**: Show when to take action (queue_max, warning_threshold)
|
|
257
|
+
|
|
258
|
+
### What to Exclude:
|
|
259
|
+
|
|
260
|
+
1. **Large arrays**: Truncated at 1000 chars, wastes tokens
|
|
261
|
+
2. **Deeply nested objects**: Hard for AI to parse
|
|
262
|
+
3. **Static data**: Endpoint name, config (AI already has this)
|
|
263
|
+
4. **Sensitive data**: Logged and stored, consider security
|
|
264
|
+
|
|
265
|
+
## Decision 5: Pause vs. Constraints
|
|
266
|
+
|
|
267
|
+
**Question**: Should I use `pause_until` or constraint clamping?
|
|
268
|
+
|
|
269
|
+
### Use Pause When:
|
|
270
|
+
|
|
271
|
+
**Temporary shutdowns**:
|
|
272
|
+
- Maintenance windows
|
|
273
|
+
- Dependency outages
|
|
274
|
+
- Manual emergency stop
|
|
275
|
+
|
|
276
|
+
**Effect**: Endpoint doesn't run at all until pause expires
|
|
277
|
+
|
|
278
|
+
**Set via**: AI tool `pause_until()` or API
|
|
279
|
+
|
|
280
|
+
### Use Max Interval When:
|
|
281
|
+
|
|
282
|
+
**Slow down but don't stop**:
|
|
283
|
+
- Rate limiting (hit API limits, back off)
|
|
284
|
+
- Cost control (reduce frequency but keep monitoring)
|
|
285
|
+
|
|
286
|
+
**Effect**: Endpoint keeps running; the schedule can relax, but the gap between runs never exceeds the max interval
|
|
287
|
+
|
|
288
|
+
**Set via**: Endpoint configuration
|
|
289
|
+
|
|
290
|
+
### Comparison:
|
|
291
|
+
|
|
292
|
+
| Scenario | Use Pause | Use Max Interval |
|
|
293
|
+
|----------|-----------|------------------|
|
|
294
|
+
| Database maintenance | ✓ | |
|
|
295
|
+
| Rate limit hit (429 response) | ✓ | |
|
|
296
|
+
| Cost budget exceeded | | ✓ |
|
|
297
|
+
| Dependency unavailable | ✓ | |
|
|
298
|
+
| Weekend-only execution | ✓ | |
|
|
299
|
+
| Slow down monitoring | | ✓ |
|
|
300
|
+
|
|
301
|
+
**Rule**: If you want **zero** executions, use pause. If you want **less frequent** executions, use max interval.
|
|
302
|
+
|
|
303
|
+
## Constraint Interaction: How Limits Stack
|
|
304
|
+
|
|
305
|
+
Understanding how constraints interact:
|
|
306
|
+
|
|
307
|
+
### Priority Order:
|
|
308
|
+
|
|
309
|
+
1. **Pause** (highest priority)
|
|
310
|
+
- If `pausedUntil > now`, return that time (source: `"paused"`)
|
|
311
|
+
- Nothing else matters
|
|
312
|
+
|
|
313
|
+
2. **Governor candidate selection**
|
|
314
|
+
- Choose between baseline, AI interval hint, AI one-shot hint
|
|
315
|
+
- AI interval overrides baseline
|
|
316
|
+
- One-shot competes with others
|
|
317
|
+
|
|
318
|
+
3. **Min/Max clamping** (applied last)
|
|
319
|
+
- Clamp chosen candidate to `[now + min, now + max]`
|
|
320
|
+
- Applies to all candidates (baseline, AI hints)
|
|
321
|
+
|
|
322
|
+
### Example Flow:
|
|
323
|
+
|
|
324
|
+
```
|
|
325
|
+
Baseline: 5 minutes (300000ms)
|
|
326
|
+
Min: 1 minute (60000ms)
|
|
327
|
+
Max: 10 minutes (600000ms)
|
|
328
|
+
AI proposes: 30 seconds (30000ms)
|
|
329
|
+
Pause: Not set
|
|
330
|
+
|
|
331
|
+
Governor logic:
|
|
332
|
+
1. Check pause → Not paused, continue
|
|
333
|
+
2. Choose candidate → AI interval (30000ms) overrides baseline
|
|
334
|
+
3. Clamp to min → 30000ms < 60000ms (min), adjust to 60000ms
|
|
335
|
+
4. Clamp to max → 60000ms < 600000ms (max), no change
|
|
336
|
+
5. Result: 60000ms (1 minute), source: "clamped-min"
|
|
337
|
+
```
|
|
338
|
+
|
|
339
|
+
**Key insight**: Min/max are hard limits. AI can propose anything, but Governor enforces bounds.
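The priority order can be sketched in a few lines. This is a simplified model written for illustration, not the Governor's actual code: it ignores one-shot hints and cron baselines, and the source labels other than `"paused"` and `"clamped-min"` are assumed names.

```python
from typing import Optional, Tuple


def plan_next_run(
    now_ms: int,
    baseline_ms: int,
    ai_interval_ms: Optional[int] = None,
    min_ms: Optional[int] = None,
    max_ms: Optional[int] = None,
    paused_until_ms: Optional[int] = None,
) -> Tuple[int, str]:
    """Return (next_run_at_ms, source) following the priority order above."""
    # 1. Pause wins over everything else.
    if paused_until_ms is not None and paused_until_ms > now_ms:
        return paused_until_ms, "paused"

    # 2. An active AI interval hint overrides the baseline.
    if ai_interval_ms is not None:
        interval, source = ai_interval_ms, "ai-interval"  # assumed label
    else:
        interval, source = baseline_ms, "baseline"  # assumed label

    # 3. Clamp the chosen interval into [min, max], measured from now.
    if min_ms is not None and interval < min_ms:
        interval, source = min_ms, "clamped-min"
    if max_ms is not None and interval > max_ms:
        interval, source = max_ms, "clamped-max"  # assumed label

    return now_ms + interval, source


# The example flow: AI proposes 30 s, min is 1 min, max is 10 min.
print(plan_next_run(0, 300_000, ai_interval_ms=30_000, min_ms=60_000, max_ms=600_000))
```

With `now_ms=0`, the call returns `(60000, 'clamped-min')`, matching step 5 of the example flow.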
|
|
340
|
+
|
|
341
|
+
## Common Configuration Patterns
|
|
342
|
+
|
|
343
|
+
### Pattern 1: Adaptive Monitoring
|
|
344
|
+
```json
|
|
345
|
+
{
|
|
346
|
+
"baselineIntervalMs": 300000,
|
|
347
|
+
"minIntervalMs": 30000,
|
|
348
|
+
"maxIntervalMs": null,
|
|
349
|
+
"timeoutMs": 10000
|
|
350
|
+
}
|
|
351
|
+
```
|
|
352
|
+
AI can tighten from 5 minutes to 30 seconds during incidents, but can't go faster (rate limit).
|
|
353
|
+
|
|
354
|
+
### Pattern 2: Scheduled Task with Staleness Limit
|
|
355
|
+
```json
|
|
356
|
+
{
|
|
357
|
+
"baselineCron": "0 2 * * *",
|
|
358
|
+
"minIntervalMs": null,
|
|
359
|
+
"maxIntervalMs": 7200000,
|
|
360
|
+
"timeoutMs": 60000
|
|
361
|
+
}
|
|
362
|
+
```
|
|
363
|
+
Runs daily at 2 AM, but if it fails, retries within 2 hours (not 24 hours later).
|
|
364
|
+
|
|
365
|
+
### Pattern 3: Cost-Controlled Analysis
|
|
366
|
+
```json
|
|
367
|
+
{
|
|
368
|
+
"baselineIntervalMs": 600000,
|
|
369
|
+
"minIntervalMs": 300000,
|
|
370
|
+
"maxIntervalMs": 3600000,
|
|
371
|
+
"timeoutMs": 30000
|
|
372
|
+
}
|
|
373
|
+
```
|
|
374
|
+
The expensive operation normally runs every 10 minutes; AI can relax it to hourly or tighten it to 5 minutes.
|
|
375
|
+
|
|
376
|
+
## Debugging Configuration Issues
|
|
377
|
+
|
|
378
|
+
**Problem**: Endpoint runs too frequently, ignoring baseline
|
|
379
|
+
|
|
380
|
+
**Check**:
|
|
381
|
+
1. Active AI interval hint? (check `aiHintExpiresAt`)
|
|
382
|
+
2. Min interval set too low?
|
|
383
|
+
3. Baseline misconfigured? (wrong units—milliseconds not seconds)
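Unit mistakes are easier to avoid when millisecond values are derived rather than typed by hand. The helper below is a hypothetical convenience, built on Python's standard `timedelta`, that converts a readable duration into the milliseconds the interval fields expect.

```python
from datetime import timedelta


def to_ms(**duration) -> int:
    """Convert timedelta keyword arguments (hours, minutes, seconds, ...) to whole ms."""
    return int(timedelta(**duration) / timedelta(milliseconds=1))


print(to_ms(minutes=5))   # 300000
print(to_ms(seconds=30))  # 30000
print(to_ms(hours=1))     # 3600000
```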
|
|
384
|
+
|
|
385
|
+
**Problem**: Endpoint never tightens during incidents
|
|
386
|
+
|
|
387
|
+
**Check**:
|
|
388
|
+
1. Using cron instead of interval? (AI can't override cron)
|
|
389
|
+
2. Min interval too high? (AI proposals clamped)
|
|
390
|
+
3. AI quota exceeded? (no analysis happening)
|
|
391
|
+
|
|
392
|
+
**Problem**: Endpoint times out frequently
|
|
393
|
+
|
|
394
|
+
**Check**:
|
|
395
|
+
1. Timeout too low for actual performance?
|
|
396
|
+
2. Endpoint performance degraded? (check duration trends)
|
|
397
|
+
3. Network issues? (high latency)
|
|
398
|
+
|
|
399
|
+
## Key Takeaways
|
|
400
|
+
|
|
401
|
+
1. **Cron for calendar tasks, interval for continuous monitoring**
|
|
402
|
+
2. **Start without constraints, add min/max only when needed**
|
|
403
|
+
3. **Timeout = p95 latency × 2 + buffer**
|
|
404
|
+
4. **Include metrics, timestamps, and coordination signals in responses**
|
|
405
|
+
5. **Pause stops execution, max interval slows it down**
|
|
406
|
+
6. **Constraints are hard limits—they override AI**
|
|
407
|
+
7. **Min/max apply relative to `now`, not `lastRunAt`**
|
|
408
|
+
|
|
409
|
+
Configure conservatively. The system is designed to be safe by default. Add constraints when you encounter real problems, not anticipated ones.
|
|
410
|
+
|
|
411
|
+
## Next Steps
|
|
412
|
+
|
|
413
|
+
- **[How Scheduling Works](./how-scheduling-works.md)** - Understand how Governor applies constraints
|
|
414
|
+
- **[How AI Adaptation Works](./how-ai-adaptation-works.md)** - Learn how AI proposes intervals
|
|
415
|
+
- **[Reference](./reference.md)** - Quick lookup for defaults and field ranges
|