@cronicorn/mcp-server 1.18.3 → 1.19.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +87 -436
- package/dist/docs/api-reference.md +558 -0
- package/dist/docs/core-concepts.md +9 -1
- package/dist/docs/introduction.md +15 -5
- package/dist/docs/mcp-server.md +49 -69
- package/dist/docs/quick-start.md +11 -2
- package/dist/docs/self-hosting.md +10 -1
- package/dist/docs/technical/configuration-and-constraints.md +11 -2
- package/dist/docs/technical/coordinating-multiple-endpoints.md +11 -2
- package/dist/docs/technical/how-ai-adaptation-works.md +122 -385
- package/dist/docs/technical/how-scheduling-works.md +76 -2
- package/dist/docs/technical/reference.md +11 -2
- package/dist/docs/technical/system-architecture.md +57 -189
- package/dist/docs/troubleshooting.md +392 -0
- package/dist/docs/use-cases.md +10 -1
- package/dist/index.js +20 -12
- package/dist/index.js.map +1 -1
- package/package.json +1 -1
- package/dist/docs/competitive-analysis.md +0 -324
- package/dist/docs/developers/README.md +0 -29
- package/dist/docs/developers/authentication.md +0 -121
- package/dist/docs/developers/environment-configuration.md +0 -103
- package/dist/docs/developers/quality-checks.md +0 -68
- package/dist/docs/developers/quick-start.md +0 -87
- package/dist/docs/developers/workspace-structure.md +0 -174
--- a/package/dist/docs/technical/how-scheduling-works.md
+++ b/package/dist/docs/technical/how-scheduling-works.md
@@ -7,12 +7,14 @@ sidebar_position: 2
 mcp:
   uri: file:///docs/technical/how-scheduling-works.md
   mimeType: text/markdown
-  priority: 0.
-  lastModified:
+  priority: 0.90
+  lastModified: 2026-02-03T00:00:00Z
 ---
 
 # How Scheduling Works
 
+**TL;DR:** The Scheduler claims due endpoints, executes them, records results, and uses the Governor (a pure function) to calculate the next run time. AI hints override baseline schedules, constraints are hard limits, and the system includes safety mechanisms for locks, failures, and zombie runs.
+
 This document explains how the Scheduler worker executes jobs and calculates next run times. If you haven't read [System Architecture](./system-architecture.md), start there for context on the dual-worker design.
 
 ## The Scheduler's Job
@@ -183,6 +185,70 @@ The database update is atomic. If two Schedulers somehow claimed the same endpoi
 
 After the update, the endpoint's lock expires naturally (when `_lockedUntil` passes), and it becomes claimable again when `nextRunAt` arrives.
 
+## Distributed Locks and Single Execution Guarantee
+
+A critical requirement for any job scheduler is ensuring each job runs **exactly once** per scheduled time—even when multiple Scheduler instances run concurrently for high availability.
+
+Cronicorn uses **database-level distributed locks** via PostgreSQL's atomic operations to achieve this guarantee.
+
+### How Distributed Locks Work
+
+When the Scheduler claims endpoints, it uses an **atomic conditional update**:
+
+```sql
+UPDATE job_endpoints
+SET _lockedUntil = now() + lockTtlMs
+WHERE nextRunAt <= now()
+AND _lockedUntil <= now()
+RETURNING id
+```
+
+This query atomically:
+1. Finds endpoints that are due (`nextRunAt <= now`)
+2. Checks they're not already locked (`_lockedUntil <= now`)
+3. Acquires the lock by setting `_lockedUntil`
+4. Returns only the IDs that were successfully claimed
+
+Because PostgreSQL executes this as a single atomic operation, **only one Scheduler instance can claim each endpoint**, even if multiple instances query simultaneously.
+
+### Multi-Instance Behavior
+
+When running multiple Scheduler instances:
+
+| Time | Scheduler A | Scheduler B | Endpoint State |
+|------|-------------|-------------|----------------|
+| T=0 | Claims ep_123 | Attempts claim | `_lockedUntil = T+30s` |
+| T=0 | Gets ep_123 | Gets nothing | Locked by A |
+| T=5s | Executing | Skips ep_123 | Still locked |
+| T=10s | Completes, releases | Available but not due | `nextRunAt = T+300s` |
+
+**Result**: Endpoint ep_123 executes exactly once, by Scheduler A.
+
+### Lock TTL and Crash Recovery
+
+Locks have a short **Time-To-Live** (default: 30 seconds). This enables crash recovery:
+
+**If Scheduler A crashes mid-execution:**
+1. The lock remains until `_lockedUntil` expires
+2. After 30 seconds, Scheduler B can claim the endpoint
+3. The run is marked as failed (timeout/zombie)
+4. The endpoint recovers automatically
+
+This means endpoints **can't get permanently stuck**. At worst, there's a delay equal to the lock TTL before another instance picks up the work.
+
+### Single Execution Summary
+
+| Guarantee | Mechanism |
+|-----------|-----------|
+| No double execution | Atomic `UPDATE...WHERE _lockedUntil <= now` |
+| Crash recovery | Lock TTL expires, another instance claims |
+| Multi-instance safety | PostgreSQL transaction isolation |
+| Audit trail | Run records show which instance executed |
+
+This design ensures reliable, exactly-once execution across any number of Scheduler instances.
+
+---
+
 ## Safety Mechanisms
 
 The Scheduler includes several safety mechanisms to handle edge cases:
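The claim semantics of the atomic `UPDATE ... RETURNING` above can be sketched with a small in-memory model. This is an illustrative sketch only — `Endpoint`, `claimDue`, and the field names are hypothetical stand-ins, and a real deployment relies on PostgreSQL executing the conditional update atomically rather than on application code:

```typescript
// In-memory model of the atomic claim. In production, the WHERE/SET/RETURNING
// happen inside a single PostgreSQL statement, which is what makes it safe.
type Endpoint = { id: string; nextRunAt: number; lockedUntil: number };

function claimDue(endpoints: Endpoint[], now: number, lockTtlMs: number): string[] {
  const claimed: string[] = [];
  for (const ep of endpoints) {
    // Mirrors: WHERE nextRunAt <= now() AND _lockedUntil <= now()
    if (ep.nextRunAt <= now && ep.lockedUntil <= now) {
      ep.lockedUntil = now + lockTtlMs; // SET _lockedUntil = now() + lockTtlMs
      claimed.push(ep.id);              // RETURNING id
    }
  }
  return claimed;
}

const table: Endpoint[] = [{ id: "ep_123", nextRunAt: 0, lockedUntil: 0 }];
const byA = claimDue(table, 1_000, 30_000); // Scheduler A claims ep_123
const byB = claimDue(table, 1_000, 30_000); // Scheduler B finds it locked
console.log(byA, byB); // ["ep_123"] []
```

The second call returns nothing because the first call already advanced `lockedUntil` past `now` — the same reason a second Scheduler instance gets an empty result set from the real query.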
@@ -261,3 +327,11 @@ This is the power of database-mediated communication: the Scheduler and AI Plann
 6. **Sources provide auditability**: Every decision is traceable
 
 Understanding how scheduling works gives you the foundation to configure endpoints effectively, debug unexpected behavior, and reason about how AI adaptation affects execution timing.
+
+---
+
+## See Also
+
+- **[System Architecture](./system-architecture.md)** - High-level dual-worker design
+- **[How AI Adaptation Works](./how-ai-adaptation-works.md)** - AI tools, response body design, and decision framework
+- **[Configuration and Constraints](./configuration-and-constraints.md)** - Setting up endpoints effectively
--- a/package/dist/docs/technical/reference.md
+++ b/package/dist/docs/technical/reference.md
@@ -7,8 +7,8 @@ sidebar_position: 6
 mcp:
   uri: file:///docs/technical/reference.md
   mimeType: text/markdown
-  priority: 0.
-  lastModified:
+  priority: 0.90
+  lastModified: 2026-02-03T00:00:00Z
 ---
 
 # Reference
@@ -458,3 +458,12 @@ Calculate current backoff multiplier:
 ```
 failureCount > 0 ? 2^min(failureCount, 5) : 1
 ```
+
+---
+
+## See Also
+
+- **[How Scheduling Works](./how-scheduling-works.md)** - Detailed Governor logic
+- **[How AI Adaptation Works](./how-ai-adaptation-works.md)** - AI tools and decision framework
+- **[Configuration and Constraints](./configuration-and-constraints.md)** - Practical configuration guidance
+- **[Troubleshooting](../troubleshooting.md)** - Debugging guide
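The backoff formula in the hunk above translates directly to code. A sketch of the documented expression — not the package's actual implementation:

```typescript
// Exponential backoff multiplier: doubles per consecutive failure, capped at
// 2^5 = 32x; a healthy endpoint (failureCount = 0) gets the 1x baseline.
function backoffMultiplier(failureCount: number): number {
  return failureCount > 0 ? 2 ** Math.min(failureCount, 5) : 1;
}

console.log([0, 1, 3, 5, 9].map(backoffMultiplier)); // [1, 2, 8, 32, 32]
```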
--- a/package/dist/docs/technical/system-architecture.md
+++ b/package/dist/docs/technical/system-architecture.md
@@ -7,229 +7,108 @@ sidebar_position: 1
 mcp:
   uri: file:///docs/technical/system-architecture.md
   mimeType: text/markdown
-  priority: 0.
-  lastModified:
+  priority: 0.80
+  lastModified: 2026-02-03T00:00:00Z
 ---
 
 # System Architecture
 
-**TL;DR:** Cronicorn uses two independent workers (Scheduler and AI Planner) that communicate only through a shared database. The Scheduler executes jobs reliably on schedule, while the AI Planner analyzes execution patterns and suggests schedule adjustments through time-bounded hints.
+**TL;DR:** Cronicorn uses two independent workers (Scheduler and AI Planner) that communicate only through a shared database. The Scheduler executes jobs reliably on schedule, while the AI Planner analyzes execution patterns and suggests schedule adjustments through time-bounded hints.
 
 ---
 
 ## The Big Picture
 
-Cronicorn uses a **dual-worker architecture** where job scheduling is both reliable and intelligent
+Cronicorn uses a **dual-worker architecture** where job scheduling is both reliable and intelligent:
 
 1. **The Scheduler Worker** - Executes jobs on schedule
 2. **The AI Planner Worker** - Analyzes patterns and suggests schedule adjustments
 
-These workers never communicate directly. All coordination happens through the database.
+These workers never communicate directly. All coordination happens through the database.
 
 ## Why Two Workers?
 
-When you build intelligence directly into a scheduler, every job execution must
+When you build intelligence directly into a scheduler, every job execution must analyze history, call AI models, wait for responses, and handle failures—all in the critical path. If the AI is slow, jobs run late. If the AI crashes, the scheduler might crash.
 
-
-- Make API calls to AI models
-- Wait for responses
-- Process recommendations
-- Handle AI failures gracefully
-
-All of this happens *in the critical path*. If the AI is slow, jobs run late. If the AI crashes, the scheduler might crash. If you update AI logic, you risk breaking job execution.
-
-Cronicorn separates execution from decision-making. The Scheduler executes endpoints reliably and on time. The AI Planner analyzes patterns and suggests adjustments. Neither worker depends on the other.
-
-This separation provides:
+Cronicorn separates execution from decision-making:
 
 - **Reliability**: The Scheduler keeps running even if AI fails
 - **Performance**: Jobs execute immediately without waiting for AI analysis
 - **Safety**: Bugs in AI logic can't break job execution
-- **
-- **Scalability**: We can scale Schedulers and AI Planners separately based on load
+- **Scalability**: Scale Schedulers and AI Planners independently based on load
 
-## How Workers Communicate
+## How Workers Communicate
 
-
-
-Here's how it works:
+Workers coordinate through **shared database state**. The database is both storage and message bus.
 
 ### The Scheduler's Perspective
 
-
-
-
-
-
-
-
-After recording results, the Scheduler calculates when each endpoint should run next and updates the `nextRunAt` field. Then it goes back to sleep for 5 seconds.
+Every 5 seconds, the Scheduler:
+1. Claims due endpoints from the database
+2. Executes each endpoint's HTTP request
+3. Writes results back (status, duration, response body)
+4. Calculates next run time using the Governor
+5. Updates `nextRunAt` and goes back to sleep
 
-
+The Scheduler doesn't analyze patterns or make AI decisions. It executes, records, schedules, repeat.
 
 ### The AI Planner's Perspective
 
-The AI Planner wakes up
+The AI Planner wakes up periodically and analyzes recently active endpoints. For each endpoint, it:
+1. Reads execution history (success rates, response bodies, failure streaks)
+2. Sends context to an AI model
+3. Writes **hints** to the database—temporary scheduling suggestions with expiration times
 
-
-- Success rates over the last 24 hours
-- Recent response bodies
-- Current failure streaks
-- Existing schedule configuration
+The AI Planner doesn't execute jobs or manage locks. It analyzes and suggests.
 
-
-- "Tighten the interval to 30 seconds because load is increasing" (writes an interval hint)
-- "Run this immediately to investigate an issue" (writes a one-shot hint)
-- "Pause until the maintenance window ends" (sets pausedUntil)
-- "Everything looks good, no changes needed" (does nothing)
-
-Any decisions get written to the database as **hints**—temporary scheduling suggestions with expiration times. Then the Planner moves to the next endpoint.
-
-Notice what the AI Planner *doesn't* do: execute jobs, manage locks, or worry about reliability. It analyzes and suggests.
-
-### The Database as Coordination Medium
+### Database as Coordination Medium
 
 This database-mediated architecture means:
 
-
-
-
-
-3. **Fault-tolerant**: If the AI Planner crashes, the Scheduler keeps running jobs on baseline schedules. When AI comes back, it resumes making recommendations.
-
-4. **Scalable**: Want faster AI analysis? Run more AI Planner instances. Want to handle more job executions? Run more Scheduler instances. They scale independently.
+- **Eventually consistent**: The Scheduler might execute a job before the AI has analyzed its previous run
+- **Non-blocking**: The Scheduler never waits for AI—it reads hints already in the database
+- **Fault-tolerant**: If AI crashes, the Scheduler keeps running on baseline schedules
+- **Scalable**: Scale Schedulers and AI Planners independently
 
-##
-
-Understanding how the system works requires understanding the three types of scheduling information stored in the database:
+## Three Types of Scheduling Information
 
 ### 1. Baseline Schedule (Permanent)
 
-
-- A cron expression: `"0 */5 * * *"` (every 5 minutes)
-- A fixed interval: `300000` milliseconds
+What you configure when creating an endpoint:
+- A cron expression: `"0 */5 * * *"` (every 5 minutes)
+- A fixed interval: `300000` milliseconds
 
-The baseline
+The baseline never expires or changes unless you update it.
 
 ### 2. AI Hints (Temporary, Time-Bounded)
 
-
+Recommendations from the AI Planner with automatic expiration:
 
 **Interval Hints**: "Run every 30 seconds for the next hour"
 - Used when AI wants to change run frequency
-- Has a TTL (time-to-live)—expires after N minutes
 - Example: Tightening monitoring during a load spike
 
-**One-Shot Hints**: "Run at 2:30 PM today"
+**One-Shot Hints**: "Run at 2:30 PM today"
 - Used when AI wants to trigger a specific execution
-- Has a TTL—expires if not used within N minutes
 - Example: Immediate investigation of a failure
 
-
+When hints expire, the system falls back to baseline. This is a safety mechanism.
 
 ### 3. Pause State (Manual Override)
 
-
-- Maintenance windows
-- Temporarily disabling misbehaving endpoints
-- Coordinating with external system downtime
-
-Setting `pausedUntil = null` resumes the endpoint immediately.
-
-## How Adaptation Happens
-
-Let's walk through a concrete example.
-
-**Scenario**: You have a traffic monitoring endpoint checking visitor counts every 5 minutes (baseline interval).
-
-**T=0**: Normal day, 2,000 visitors per minute
-- Scheduler runs the endpoint on its 5-minute baseline
-- Response body: `{ "visitorsPerMin": 2000, "status": "normal" }`
-- Scheduler calculates next run at T+5min and updates database
-
-**T+5min**: AI Planner analyzes the endpoint
-- Reads last 24 hours of execution history
-- Sees steady 2,000 visitors with 100% success rate
-- AI decision: "Everything looks stable, no changes needed"
-- No hints written to database
-
-**T+10min**: Flash sale starts, traffic spikes
-- Scheduler runs endpoint on 5-minute baseline
-- Response body: `{ "visitorsPerMin": 5500, "status": "elevated" }`
-- Scheduler records results and schedules next run at T+15min
-
-**T+12min**: AI Planner analyzes again
-- Sees visitor count jumped from 2,000 to 5,500
-- Looks at trend over last few runs—increasing
-- AI decision: "High load detected, need tighter monitoring"
-- Writes interval hint to database: 30 seconds, expires in 60 minutes
-- **Nudges** `nextRunAt` to T+12min+30sec
-
-**T+12min+30sec**: Scheduler wakes up, claims endpoint (now due)
-- Reads endpoint state from database
-- Sees fresh AI hint (30-second interval, expires at T+72min)
-- Governor chooses: AI hint (30 sec) overrides baseline (5 min)
-- Executes endpoint, gets response: `{ "visitorsPerMin": 6200, "status": "high" }`
-- Calculates next run: T+13min (30 seconds from now)
-
-**T+13min through T+72min**: Runs every 30 seconds
-- AI hint remains active
-- Scheduler uses 30-second interval for every run
-- System monitors flash sale closely
+Manually pause an endpoint until a specific time. While paused, the endpoint won't run regardless of baseline or hints.
 
-
-- Scheduler reads endpoint state
-- No valid hints found (aiHintExpiresAt < now)
-- Governor chooses: Baseline (5 min)
-- System returns to normal 5-minute interval
-- AI can propose new hints if load remains high
+## The Governor
 
-
+The **Governor** is a pure function inside the Scheduler that decides when a job runs next. After executing an endpoint, the Scheduler calls the Governor with current time, endpoint configuration, and constraints.
 
-
-2. **Hints have TTLs**—Bad AI decisions auto-correct
-3. **Nudging provides immediacy**—Changes take effect within seconds, not minutes
-4. **Eventual consistency works**—There's a delay between analysis and application, but it's acceptable
-5. **System self-heals**—When hints expire, it returns to known-good baseline
+The Governor evaluates all scheduling information and returns: "Run this endpoint next at [timestamp]."
 
-
+The Governor is deterministic—same inputs always produce the same output. It has no side effects and makes no database calls.
 
-
+For detailed Governor logic, see [How Scheduling Works](./how-scheduling-works.md).
 
-
-- Current time
-- Endpoint configuration (baseline, hints, constraints)
-- Cron parser (for cron expressions)
-
-The Governor evaluates all scheduling information and returns a single answer: "Run this endpoint next at [timestamp]."
-
-The Governor is deterministic—same inputs always produce the same output. It has no side effects, makes no database calls, and contains no business logic beyond "what time should this run next?"
-
-This determinism makes the Governor:
-- **Testable**: We can verify scheduling logic with unit tests
-- **Auditable**: Every scheduling decision has a clear source ("baseline-cron", "ai-interval", etc.)
-- **Debuggable**: You can trace why a job ran when it did
-- **Portable**: The algorithm can be understood, documented, and reimplemented
-
-The Governor's logic is covered in detail in [How Scheduling Works](./how-scheduling-works.md).
-
-## Why This Architecture Works for Adaptation
-
-Traditional cron systems are static—you set a schedule and it runs forever on that schedule. Cronicorn's architecture enables adaptive scheduling because:
-
-1. **Separation allows continuous learning**: While the Scheduler executes jobs, the AI Planner can analyze patterns without disrupting execution. Analysis happens in parallel, not blocking execution.
-
-2. **Hints enable safe experimentation**: Because hints have TTLs, the AI can try aggressive schedule changes knowing they'll auto-expire if wrong. This allows quick adaptation without risk.
-
-3. **Database state captures context**: Every execution records response bodies. The AI can see the data returned by endpoints—not just success/failure, but real metrics like queue depths, error rates, latency. This rich context enables intelligent decisions.
-
-4. **Override semantics enable tightening**: AI interval hints *override* baseline (not just compete), so the system can tighten monitoring during incidents. Without this override, the baseline would always win and adaptation would be limited to relaxation only.
-
-5. **Independent scaling supports different workloads**: Execution workload (Scheduler) and analysis workload (AI Planner) have different characteristics. Separating them allows optimizing each independently.
-
-## Data Flows: Putting It All Together
-
-Here's how information flows through the system:
+## Data Flow
 
 ```
 [User Creates Endpoint]
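The five-step Scheduler loop added in the hunk above (claim, execute, record, govern, sleep) can be sketched as a single tick. This is an editor's illustrative sketch: `claimDue`, `recordRun`, `setNextRunAt`, and the callback shapes are hypothetical names standing in for the package's internals, which this diff does not show:

```typescript
// One tick of the Scheduler loop described in the docs. A driver would call
// this every 5 seconds; all coordination goes through the `db` object.
async function schedulerTick(
  db: {
    claimDue: () => Promise<string[]>;                  // 1. claim due endpoints
    recordRun: (id: string, ok: boolean) => Promise<void>;
    setNextRunAt: (id: string, at: Date) => Promise<void>;
  },
  execute: (id: string) => Promise<boolean>,            // 2. run the HTTP request
  governor: (id: string, now: Date) => Date,            // 4. pure next-run calculation
): Promise<void> {
  const due = await db.claimDue();
  for (const id of due) {
    const ok = await execute(id);
    await db.recordRun(id, ok);                         // 3. write results back
    await db.setNextRunAt(id, governor(id, new Date())); // 5. schedule, move on
  }
}
```

Note that nothing in the tick consults an AI model — consistent with the diff's point that the Scheduler only reads hints that are already in the database.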
@@ -252,47 +131,36 @@ Here's how information flows through the system:
 ↓
 Governor sees hints, calculates next run
 ↓
-Database (nextRunAt updated with hint influence)
-↓
 [Cycle continues...]
 ```
 
-
-
-## Trade-offs and Design Decisions
+## Trade-offs
 
-
-
-**✅ Pros:**
+**Pros:**
 - Reliability (execution never blocked by AI)
 - Performance (no AI in critical path)
 - Scalability (independent worker scaling)
 - Safety (AI bugs can't break execution)
 - Testability (deterministic components)
 
-
-- Eventual consistency (hints applied after next execution
+**Cons:**
+- Eventual consistency (hints applied after next execution)
 - Database as bottleneck (all coordination through DB)
-- More complex deployment (two worker types
-- Debugging requires understanding async flows
+- More complex deployment (two worker types)
 
-
+The slight delay in applying AI hints (typically 5-30 seconds) is acceptable because scheduling adjustments aren't time-critical—we're optimizing for hours/days, not milliseconds.
 
 ## What You Need to Know as a User
 
-
-
-
-
-2. **Response bodies matter**: The AI analyzes the JSON you return. Structure it to include metrics the AI should monitor (queue depths, error rates, status flags).
-
-3. **Constraints are hard limits**: Min/max intervals and pause states override even AI hints. Use them to enforce invariants (rate limits, maintenance windows).
-
-4. **Coordination happens via response bodies**: To orchestrate multiple endpoints, have them write coordination signals to their response bodies. Other endpoints can read these via the `get_sibling_latest_responses` tool.
-
-5. **The system is eventually consistent**: Don't expect instant reactions to every change. The AI analyzes every 5 minutes, and hints apply on the next execution. Plan for minutes, not seconds.
-
+1. **Your baseline schedule is your safety net**: The system returns to baseline when hints expire
+2. **Response bodies matter**: The AI analyzes the JSON you return
+3. **Constraints are hard limits**: Min/max intervals and pause states override AI hints
+4. **The system is eventually consistent**: Plan for minutes, not seconds
 
 ---
 
-
+## See Also
+
+- **[How Scheduling Works](./how-scheduling-works.md)** - Detailed Governor logic and safety mechanisms
+- **[How AI Adaptation Works](./how-ai-adaptation-works.md)** - AI tools, response body design, and decision framework
+- **[Configuration and Constraints](./configuration-and-constraints.md)** - Setting up endpoints effectively