@kairos-sdk/core 0.1.1 → 0.2.1

Files changed (15)

package/README.md +109 -15
package/dist/{chunk-IADOKKFO.js → chunk-QQJDLS5A.js} +1190 -110
package/dist/chunk-QQJDLS5A.js.map +1 -0
package/dist/cli.cjs +1349 -132
package/dist/cli.cjs.map +1 -1
package/dist/cli.js +171 -27
package/dist/cli.js.map +1 -1
package/dist/index.cjs +1135 -153
package/dist/index.cjs.map +1 -1
package/dist/index.d.cts +154 -13
package/dist/index.d.ts +154 -13
package/dist/index.js +18 -111
package/dist/index.js.map +1 -1
package/package.json +12 -5
package/dist/chunk-IADOKKFO.js.map +0 -1

package/README.md CHANGED

@@ -1,8 +1,8 @@
 # @kairos-sdk/core
-**Turn plain English into deployed n8n workflows.**
+**Turn plain English into deployed n8n workflows — validated, corrected, and deployed in one call.**
-Kairos is a TypeScript SDK that takes a natural-language description of an automation, calls Claude to generate valid n8n workflow JSON, runs it through a 19-rule validator with an automatic correction loop, and deploys it to your n8n instance via REST API — all in a single `build()` call.
+Kairos is a TypeScript SDK that takes a natural-language description of an automation, calls Claude to generate n8n workflow JSON, runs it through a **23-rule structural validator** with an automatic correction loop (up to 3 attempts), and deploys the result to your n8n instance via REST API. A local workflow library with **hybrid retrieval** (TF-IDF + node fingerprinting + outcome history + cluster reranking) and telemetry-based feedback inject past failure patterns into future generations. With a seeded template library, Kairos achieves a **100% first-try validation pass rate** across 20 benchmark prompts.
 ```ts
 import { Kairos } from '@kairos-sdk/core'
@@ -72,13 +72,70 @@ console.log(deployed.workflowId) // now live in n8n
 ---
+## Benchmark Results
+Tested against 20 workflow prompts of varying complexity (simple triggers, multi-step conditional logic, AI agents with memory). Results measure **structural validation pass rate** — whether the generated workflow passes all 23 validator rules, not end-to-end execution correctness.
+### Before vs After: Template-Seeded Library
+| Metric | Baseline (no library) | With library (105 templates) | Delta |
+|---|---|---|---|
+| **First-try pass rate** | 55% (11/20) | **100% (20/20)** | **+45pp** |
+| Avg attempts | 1.45 | **1.00** | -0.45 |
+| Correction loop usage | 45% | **0%** | -45pp |
+| Avg generation time | 30.6s | **20.7s** | -32% |
+| Failures | 0 | 0 | — |
+The baseline run used Claude with the 23-rule validator and correction loop but no library. The seeded run used the same validator plus a library of 105 workflows (16 organic + 89 ingested from the n8n community). Template seeding eliminated the correction loop entirely and cut generation time by roughly a third (32%).
+> **Note:** These results confirm that generated workflows are structurally valid and deployable to n8n. They do not verify runtime execution correctness, credential configuration, or whether the workflow output matches user intent.
+---
 ## How It Works
-1. **Generate** — Kairos sends your description to Claude with a detailed system prompt and forces a `generate_workflow` tool call, producing structured n8n workflow JSON.
-2. **Validate** — The workflow is checked against 19 rules covering node structure, connection integrity, forbidden fields, trigger presence, AI connection direction, and more.
-3. **Correct** — If validation fails, Kairos automatically sends the issues back to Claude and retries (up to 3 attempts, with tighter temperature on the final try).
-4. **Strip** — Forbidden server-assigned fields (`id`, `createdAt`, `updatedAt`, etc.) are stripped before deployment.
-5. **Deploy** — The validated workflow is posted to your n8n instance via REST API.
+1. **Search** — Kairos searches its local workflow library for similar past builds. Matching workflows and their failure patterns are pulled into context.
+2. **Warn** — Known failure patterns (from library matches and global telemetry rates) are injected into the system prompt so Claude avoids repeating known mistakes.
+3. **Generate** — Your description is sent to Claude with a detailed system prompt, forcing a `generate_workflow` tool call that produces structured n8n workflow JSON.
+4. **Validate** — The workflow is checked against **23 structural rules** covering node IDs, types, versions, names, positions, connections, forbidden fields, trigger presence, AI connection direction, cycle detection, webhook pairing, and required parameters.
+5. **Correct** — If validation fails, the specific rule violations are sent back to Claude for correction (up to 3 attempts, with tighter temperature on the final try).
+6. **Strip** — Forbidden server-assigned fields (`id`, `createdAt`, `updatedAt`, etc.) are stripped before deployment.
+7. **Deploy** — The validated workflow is posted to your n8n instance via REST API.
+8. **Record** — The workflow, its metadata (generation mode, attempt count, failure patterns, credentials needed), and telemetry events are saved locally. Future builds use this history to avoid past mistakes.
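
The Generate, Validate, and Correct steps above can be sketched as a bounded retry loop. This is a minimal illustration under stated assumptions, not the SDK's implementation: `Issue`, `Workflow`, `Generate`, `Validate`, and `buildWithCorrection` are hypothetical names, and the generator is passed in as a function so that no Anthropic or n8n call is implied.

```ts
// Hypothetical types for illustration; they are not exported by @kairos-sdk/core.
type Issue = { rule: number; severity: 'error' | 'warn'; message: string }
type Workflow = { name: string; nodes: unknown[] }

// `feedback` carries the previous attempt's rule violations; `attempt` lets a
// caller adjust sampling (for example, a lower temperature on the final try).
type Generate = (description: string, feedback: Issue[], attempt: number) => Promise<Workflow>
type Validate = (workflow: Workflow) => Issue[]

async function buildWithCorrection(
  description: string,
  generate: Generate,
  validate: Validate,
  maxAttempts = 3,
): Promise<{ workflow: Workflow; attempts: number; warnings: Issue[] }> {
  let feedback: Issue[] = []
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const workflow = await generate(description, feedback, attempt)
    const issues = validate(workflow)
    const errors = issues.filter((i) => i.severity === 'error')
    if (errors.length === 0) {
      // Warnings do not block; they are returned so the caller can record them.
      const warnings = issues.filter((i) => i.severity === 'warn')
      return { workflow, attempts: attempt, warnings }
    }
    feedback = errors // send only the blocking violations back for correction
  }
  const failed = feedback.map((i) => `rule ${i.rule}`).join(', ')
  throw new Error(`Validation failed after ${maxAttempts} attempts: ${failed}`)
}
```

Passing the violations back as structured data, rather than regenerating from scratch, is what lets each retry target the specific rules that failed.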
+---
+## Validator Rules
+The 23-rule validator is the core of what makes Kairos reliable. In baseline testing (no library), Claude needed the correction loop 45% of the time. Each rule targets a specific class of error:
+| Rule | Severity | What it checks |
+|------|----------|----------------|
+| 1 | error | Workflow has a non-empty name |
+| 2 | error | At least one node exists |
+| 3 | error | Every node has a non-empty ID |
+| 4 | error | No duplicate node IDs |
+| 5 | error | Every node has a type string |
+| 6 | error | Every node has a valid typeVersion |
+| 7 | error | Every node has a valid [x, y] position |
+| 8 | error | Every node has a non-empty name |
+| 9 | error | Connections is a plain object |
+| 10 | error | Every connection target exists in nodes |
+| 11 | warn | Non-trigger nodes have incoming connections |
+| 12 | error | No forbidden server-assigned fields |
+| 13 | error | Settings is a valid object |
+| 14 | error | At least one trigger node present |
+| 15 | error | Node type strings match expected format |
+| 16 | error | No duplicate node names |
+| 17 | error | Credentials have valid id/name shape |
+| 18 | error | AI connections originate from sub-nodes, not agent roots |
+| 19 | warn | typeVersion is within known safe range |
+| 20 | warn | No connection cycles (exempts splitInBatches loops) |
+| 21 | warn | Webhook with responseMode="responseNode" has respondToWebhook |
+| 22 | warn | Required parameters present for known node types |
+| 23 | warn | Node type is recognized in the registry (unknown types may not exist in n8n) |
+Errors block deployment. Warnings are recorded and fed back into the prompt for future builds.
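
To make the error/warning split concrete, here is a minimal sketch of a rule-based validator covering four of the rules in the table (1, 4, 10, and 14). All names (`Issue`, `Workflow`, `rules`, `validate`, `canDeploy`) are hypothetical and not taken from the package. The connection shape follows n8n's workflow JSON (connections keyed by source node name), and the trigger check is an assumed heuristic: a node type ending in `Trigger` or `.webhook`.

```ts
type Severity = 'error' | 'warn'
interface Issue { rule: number; severity: Severity; message: string }
interface WfNode { id: string; name: string; type: string }
interface Target { node: string; type: string; index: number }
interface Workflow {
  name: string
  nodes: WfNode[]
  // source node name -> output type (e.g. "main") -> branches -> targets
  connections: Record<string, Record<string, Target[][]>>
}
type Rule = (wf: Workflow) => Issue[]

const rules: Rule[] = [
  // Rule 1: workflow has a non-empty name
  (wf) => (wf.name.trim() === ''
    ? [{ rule: 1, severity: 'error', message: 'Workflow name is empty' }]
    : []),
  // Rule 4: no duplicate node IDs
  (wf) => {
    const seen = new Set<string>()
    const issues: Issue[] = []
    for (const n of wf.nodes) {
      if (seen.has(n.id)) {
        issues.push({ rule: 4, severity: 'error', message: `Duplicate node ID: ${n.id}` })
      }
      seen.add(n.id)
    }
    return issues
  },
  // Rule 10: every connection target exists in nodes
  (wf) => {
    const names = new Set<string>(wf.nodes.map((n) => n.name))
    const issues: Issue[] = []
    for (const outputs of Object.values(wf.connections)) {
      for (const branches of Object.values(outputs)) {
        for (const branch of branches) {
          for (const target of branch) {
            if (!names.has(target.node)) {
              issues.push({ rule: 10, severity: 'error', message: `Unknown target: ${target.node}` })
            }
          }
        }
      }
    }
    return issues
  },
  // Rule 14: at least one trigger node (assumed heuristic on the type string)
  (wf) => (wf.nodes.some((n) => /(trigger|\.webhook)$/i.test(n.type))
    ? []
    : [{ rule: 14, severity: 'error', message: 'No trigger node found' }]),
]

function validate(wf: Workflow): Issue[] {
  const issues: Issue[] = []
  for (const rule of rules) issues.push(...rule(wf))
  return issues
}

// Errors block deployment; warnings alone do not.
function canDeploy(issues: Issue[]): boolean {
  return issues.every((i) => i.severity !== 'error')
}
```

Keeping each rule as an independent function that returns zero or more issues makes it straightforward to report every violation in one pass instead of stopping at the first.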
 ---
@@ -94,7 +151,7 @@ console.log(deployed.workflowId) // now live in n8n
 | `model` | `string` | | Claude model to use (default: `claude-sonnet-4-6`) |
 | `logger` | `ILogger` | | Custom logger (default: silent) |
 | `telemetry` | `boolean \| string` | | Enable JSONL telemetry logging (`true` for default dir, or a path) |
-| `library` | `IWorkflowLibrary` | | Workflow library for RAG (default: `NullLibrary`) |
+| `library` | `IWorkflowLibrary` | | Workflow library for learning loop (default: `NullLibrary`, CLI uses `FileLibrary`) |
 ---
@@ -115,6 +172,7 @@ const result = await kairos.build(description, {
 {
   workflowId: string | null  // null on dry run
   name: string
+  workflow: N8nWorkflow       // the full generated workflow JSON — inspect before deploying
   generationAttempts: number  // 1–3
   activationRequired: boolean // true if workflow needs manual activation
   credentialsNeeded: Array<{
@@ -137,8 +195,8 @@ const workflows = await kairos.list()
 // Get a specific workflow
 const workflow = await kairos.get(workflowId)
-// Update a workflow from a new description
-const updated = await kairos.update(workflowId, 'new description')
+// Replace a workflow with a fresh generation from a new description
+const updated = await kairos.replace(workflowId, 'new description')
 // Activate / deactivate
 await kairos.activate(workflowId)
@@ -211,10 +269,10 @@ try {
 |---|---|
 | `GenerationError` | Anthropic API call failed |
 | `ResponseParseError` | Claude responded but produced no usable tool call |
-| `ValidationError` | Workflow failed 19-rule validation after max retries |
+| `ValidationError` | Workflow failed 23-rule validation after max retries |
 | `ProviderError` | Network/auth failure talking to n8n |
 | `ApiError` | n8n returned a 4xx or 5xx (carries `.statusCode`) |
-| `GuardError` | `delete()` called without `{ confirm: true }` |
+| `GuardError` | Input validation failed (empty description) or `delete()` called without `{ confirm: true }` |
 ---
@@ -229,6 +287,9 @@ kairos build "Every morning at 9am, send a Slack digest to #daily-updates"
 # Dry run only
 kairos build "Monitor a webhook and log payloads" --dry-run
+# Seed library with n8n community templates
+kairos sync-templates --max 200
 # Manage workflows
 kairos list
 kairos get <workflow-id>
@@ -245,6 +306,8 @@ export N8N_BASE_URL=https://your-instance.app.n8n.cloud
 export N8N_API_KEY=your-n8n-key
 ```
+For dry-run mode, only `ANTHROPIC_API_KEY` is required — no n8n setup needed.
 ---
 ## Telemetry
@@ -268,13 +331,15 @@ telemetry: '/path/to/telemetry/dir'
 Each event includes timestamp, session ID, token counts, validation issues, and duration — useful for benchmarking and analyzing the correction loop.
+Kairos also reads telemetry data to compute **per-rule failure rates** across all builds. Rules that fail frequently (>= 15% of builds) are automatically surfaced as warnings in the generation prompt, helping Claude avoid systemic issues. Failure rates use distinct session counting to avoid inflation from retry loops, and results are cached for 5 minutes.
 For CLI usage, set `KAIROS_TELEMETRY=true` in your environment.
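
The per-rule failure rate described above can be sketched as follows. This is an illustration, not the package's code: the event shape (`ValidationEvent`) and function names are hypothetical, it assumes one session ID per build, and it omits the 5-minute cache. Counting distinct sessions means a rule that fails on all three attempts of a single build still counts once.

```ts
// Hypothetical telemetry event: one per validation attempt.
interface ValidationEvent { sessionId: string; failedRules: number[] }

// Fraction of distinct sessions in which each rule failed at least once.
function failureRates(events: ValidationEvent[]): Map<number, number> {
  const allSessions = new Set<string>()
  const sessionsByRule = new Map<number, Set<string>>()
  for (const event of events) {
    allSessions.add(event.sessionId)
    for (const rule of event.failedRules) {
      let sessions = sessionsByRule.get(rule)
      if (!sessions) {
        sessions = new Set<string>()
        sessionsByRule.set(rule, sessions)
      }
      sessions.add(event.sessionId) // a Set ignores repeats from retry attempts
    }
  }
  const rates = new Map<number, number>()
  sessionsByRule.forEach((sessions, rule) => {
    rates.set(rule, sessions.size / allSessions.size)
  })
  return rates
}

// Rules at or above the threshold (15% in the text above) become prompt warnings.
function frequentFailures(events: ValidationEvent[], threshold = 0.15): number[] {
  const frequent: number[] = []
  failureRates(events).forEach((rate, rule) => {
    if (rate >= threshold) frequent.push(rule)
  })
  return frequent.sort((a, b) => a - b)
}
```

For example, with ten sessions where rule 18 fails three times inside one session and rule 12 fails once in each of two sessions, rule 18 has a rate of 0.1 and rule 12 a rate of 0.2, so only rule 12 is surfaced.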
 ---
-## Workflow Library (RAG)
+## Workflow Library & Feedback Loop
-Kairos includes a file-based workflow library that stores successful generations and uses them as few-shot examples for future builds:
+Kairos includes a file-based workflow library that stores every generation and feeds failure patterns back into future builds:
 ```ts
 import { Kairos, FileLibrary } from '@kairos-sdk/core'
@@ -284,10 +349,39 @@ const kairos = new Kairos({
   n8nBaseUrl: '...',
   n8nApiKey: '...',
   library: new FileLibrary(), // stores in ~/.kairos/library/
+  telemetry: true,            // enables failure rate tracking
 })
 ```
-Every successful `build()` is saved automatically. Over time, the library improves generation quality by providing relevant examples to Claude — a self-improving few-shot learning system.
+**What gets stored per workflow:**
+- The full workflow JSON and description
+- Generation mode (`direct`, `reference`, or `scratch` based on library match quality)
+- Number of generation attempts needed
+- Failure patterns — which validation rules failed and how many times
+- Source workflow IDs (which library entries influenced this build)
+- Top match score and credentials needed
+- Outcome tracking: retrieval count, usage as direct/reference source, first-try pass rate, avg attempts, and failed rules when used as a source
+**How retrieval works:**
+Kairos uses a **hybrid retrieval** pipeline with four scoring signals, weighted and combined:
+| Signal | Weight | What it captures |
+|---|---|---|
+| TF-IDF keywords | 0.35 | Text similarity between description and stored workflows |
+| Node fingerprint | 0.30 | Jaccard similarity between expected node types (extracted from query) and actual nodes in stored workflows |
+| Outcome history | 0.20 | First-try pass rate and avg attempts when this workflow was used as a source — proven templates rank higher |
+| Deploy frequency | 0.15 | How often a workflow has been deployed — a proxy for usefulness |
+After hybrid scoring, results are **reranked by cluster**: workflows are grouped by node fingerprint pattern (e.g., webhook→slack, scheduleTrigger→httpRequest→gmail), and cluster-level success stats boost or penalize candidates. Clusters with high failure rates on specific rules surface those as warnings.
+- High-scoring matches (>= 0.92) provide direct structural templates
+- Medium matches (>= 0.72) provide reference examples
+- Failure patterns from matched workflows and cluster-level warnings are injected into Claude's prompt
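
As a rough sketch of the scoring described above, the weighted combination and the match thresholds might look like this. The weights and thresholds come from the table and list above; everything else is an assumption: the names (`Signals`, `hybridScore`, `jaccard`, `modeFor`) are hypothetical, each signal is assumed to be normalized to the range 0 to 1, the thresholds are assumed to apply to the combined score and map onto the `direct`, `reference`, and `scratch` modes, and cluster reranking is omitted.

```ts
// Each signal is assumed to be pre-normalized to [0, 1].
interface Signals {
  tfidf: number
  fingerprint: number
  outcome: number
  deployFrequency: number
}

const WEIGHTS: Signals = { tfidf: 0.35, fingerprint: 0.3, outcome: 0.2, deployFrequency: 0.15 }

// Jaccard similarity between two sets of node types: |A ∩ B| / |A ∪ B|.
function jaccard(a: string[], b: string[]): number {
  const setA = new Set<string>(a)
  const setB = new Set<string>(b)
  if (setA.size === 0 && setB.size === 0) return 0
  let intersection = 0
  setA.forEach((item) => {
    if (setB.has(item)) intersection++
  })
  return intersection / (setA.size + setB.size - intersection)
}

function hybridScore(s: Signals): number {
  return (
    WEIGHTS.tfidf * s.tfidf +
    WEIGHTS.fingerprint * s.fingerprint +
    WEIGHTS.outcome * s.outcome +
    WEIGHTS.deployFrequency * s.deployFrequency
  )
}

type Mode = 'direct' | 'reference' | 'scratch'

// Assumed mapping from score to generation mode.
function modeFor(score: number): Mode {
  if (score >= 0.92) return 'direct'
  if (score >= 0.72) return 'reference'
  return 'scratch'
}

// Usage: a query expecting webhook + slack against a stored webhook/slack/set workflow.
const fingerprint = jaccard(['webhook', 'slack'], ['webhook', 'slack', 'set'])
const score = hybridScore({ tfidf: 0.9, fingerprint, outcome: 1, deployFrequency: 0.5 })
const mode = modeFor(score)
```

In this example the fingerprint similarity is 2/3 and the combined score is about 0.79, which lands in the reference band under the assumed mapping.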
+**Template seeding:** Run `kairos sync-templates` to ingest validated workflows from the n8n community library. Templates are safety-filtered (blocks code/executeCommand/ssh nodes, hardcoded secrets) and tagged with `sourceKind: 'n8n-template'`. In benchmarks, seeding the library with 89 templates improved first-try pass rate from 55% to 100%.
+The CLI automatically enables the library — no configuration needed.
 ---