@kairos-sdk/core 0.1.1 → 0.2.1

package/README.md CHANGED
@@ -1,8 +1,8 @@
  # @kairos-sdk/core
 
- **Turn plain English into deployed n8n workflows.**
+ **Turn plain English into deployed n8n workflows — validated, corrected, and deployed in one call.**
 
- Kairos is a TypeScript SDK that takes a natural-language description of an automation, calls Claude to generate valid n8n workflow JSON, runs it through a 19-rule validator with an automatic correction loop, and deploys it to your n8n instance via REST API all in a single `build()` call.
+ Kairos is a TypeScript SDK that takes a natural-language description of an automation, calls Claude to generate n8n workflow JSON, runs it through a **23-rule structural validator** with an automatic correction loop (up to 3 attempts), and deploys the result to your n8n instance via REST API. A local workflow library with **hybrid retrieval** (TF-IDF + node fingerprinting + outcome history + cluster reranking) and telemetry-based feedback inject past failure patterns into future generations. With a seeded template library, Kairos achieves **100% first-try validation pass rate** across 20 benchmark prompts.
 
  ```ts
  import { Kairos } from '@kairos-sdk/core'
@@ -72,13 +72,70 @@ console.log(deployed.workflowId) // now live in n8n
 
  ---
 
+ ## Benchmark Results
+
+ Tested against 20 workflow prompts of varying complexity (simple triggers, multi-step conditional logic, AI agents with memory). Results measure **structural validation pass rate** — whether the generated workflow passes all 23 validator rules, not end-to-end execution correctness.
+
+ ### Before vs After: Template-Seeded Library
+
+ | Metric | Baseline (no library) | With library (105 workflows) | Delta |
+ |---|---|---|---|
+ | **First-try pass rate** | 55% (11/20) | **100% (20/20)** | **+45pp** |
+ | Avg attempts | 1.45 | **1.00** | -0.45 |
+ | Correction loop usage | 45% | **0%** | -45pp |
+ | Avg generation time | 30.6s | **20.7s** | -32% |
+ | Failures | 0 | 0 | — |
+
+ The baseline run used Claude with the 23-rule validator and correction loop but no library. The seeded run used the same validator plus a library of 105 workflows (16 organic + 89 ingested from the n8n community). Template seeding eliminated the correction loop entirely and cut generation time by roughly a third (32%).
+
+ > **Note:** These results confirm that generated workflows are structurally valid and deployable to n8n. They do not verify runtime execution correctness, credential configuration, or whether the workflow output matches user intent.
+
+ ---
+
  ## How It Works
 
- 1. **Generate** — Kairos sends your description to Claude with a detailed system prompt and forces a `generate_workflow` tool call, producing structured n8n workflow JSON.
- 2. **Validate** — The workflow is checked against 19 rules covering node structure, connection integrity, forbidden fields, trigger presence, AI connection direction, and more.
- 3. **Correct** — If validation fails, Kairos automatically sends the issues back to Claude and retries (up to 3 attempts, with tighter temperature on the final try).
- 4. **Strip** — Forbidden server-assigned fields (`id`, `createdAt`, `updatedAt`, etc.) are stripped before deployment.
- 5. **Deploy** — The validated workflow is posted to your n8n instance via REST API.
+ 1. **Search** — Kairos searches its local workflow library for similar past builds. Matching workflows and their failure patterns are pulled into context.
+ 2. **Warn** — Known failure patterns (from library matches and global telemetry rates) are injected into the system prompt so Claude avoids repeating known mistakes.
+ 3. **Generate** — Your description is sent to Claude with a detailed system prompt, forcing a `generate_workflow` tool call that produces structured n8n workflow JSON.
+ 4. **Validate** — The workflow is checked against **23 structural rules** covering node IDs, types, versions, names, positions, connections, forbidden fields, trigger presence, AI connection direction, cycle detection, webhook pairing, and required parameters.
+ 5. **Correct** — If validation fails, the specific rule violations are sent back to Claude for correction (up to 3 attempts, with tighter temperature on the final try).
+ 6. **Strip** — Forbidden server-assigned fields (`id`, `createdAt`, `updatedAt`, etc.) are stripped before deployment.
+ 7. **Deploy** — The validated workflow is posted to your n8n instance via REST API.
+ 8. **Record** — The workflow, its metadata (generation mode, attempt count, failure patterns, credentials needed), and telemetry events are saved locally. Future builds use this history to avoid past mistakes.
+
+ ---
+
+ ## Validator Rules
+
+ The 23-rule validator is the core of what makes Kairos reliable. In baseline testing (no library), Claude needed the correction loop 45% of the time. Each rule targets a specific class of error:
+
+ | Rule | Severity | What it checks |
+ |------|----------|----------------|
+ | 1 | error | Workflow has a non-empty name |
+ | 2 | error | At least one node exists |
+ | 3 | error | Every node has a non-empty ID |
+ | 4 | error | No duplicate node IDs |
+ | 5 | error | Every node has a type string |
+ | 6 | error | Every node has a valid typeVersion |
+ | 7 | error | Every node has a valid [x, y] position |
+ | 8 | error | Every node has a non-empty name |
+ | 9 | error | Connections is a plain object |
+ | 10 | error | Every connection target exists in nodes |
+ | 11 | warn | Non-trigger nodes have incoming connections |
+ | 12 | error | No forbidden server-assigned fields |
+ | 13 | error | Settings is a valid object |
+ | 14 | error | At least one trigger node present |
+ | 15 | error | Node type strings match expected format |
+ | 16 | error | No duplicate node names |
+ | 17 | error | Credentials have valid id/name shape |
+ | 18 | error | AI connections originate from sub-nodes, not agent roots |
+ | 19 | warn | typeVersion is within known safe range |
+ | 20 | warn | No connection cycles (exempts splitInBatches loops) |
+ | 21 | warn | Webhook with responseMode="responseNode" has respondToWebhook |
+ | 22 | warn | Required parameters present for known node types |
+ | 23 | warn | Node type is recognized in the registry (unknown types may not exist in n8n) |
+
+ Errors block deployment. Warnings are recorded and fed back into the prompt for future builds.
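
Editor's sketch (not part of the package diff): three of the checks in the rule table above (rules 3, 4, and 10) could look roughly like this. The names `SketchWorkflow`, `Issue`, `validateSketch`, and `hasBlockingIssues` are hypothetical and are not the SDK's exports. The connections shape assumes n8n's workflow JSON, in which connections are keyed by source node name and targets reference nodes by name.

```typescript
// Hypothetical types for illustration only; the SDK's real validator API may differ.
type Severity = 'error' | 'warn'

interface Issue {
  rule: number
  severity: Severity
  message: string
}

interface SketchNode {
  id: string
  name: string
  type: string
}

interface SketchTarget {
  node: string // target node name
  type: string
  index: number
}

interface SketchWorkflow {
  name: string
  nodes: SketchNode[]
  // Assumed n8n shape: { [sourceName]: { main: [[target, ...], ...] } }
  connections: Record<string, Record<string, Array<SketchTarget[] | null>>>
}

function validateSketch(wf: SketchWorkflow): Issue[] {
  const issues: Issue[] = []
  const seenIds = new Set<string>()

  for (const node of wf.nodes) {
    // Rule 3: every node has a non-empty ID
    if (!node.id || node.id.trim() === '') {
      issues.push({ rule: 3, severity: 'error', message: `Node "${node.name}" has an empty id` })
      continue
    }
    // Rule 4: no duplicate node IDs
    if (seenIds.has(node.id)) {
      issues.push({ rule: 4, severity: 'error', message: `Duplicate node id "${node.id}"` })
    }
    seenIds.add(node.id)
  }

  // Rule 10: every connection target exists in nodes
  const names = new Set(wf.nodes.map((n) => n.name))
  for (const [source, outputs] of Object.entries(wf.connections)) {
    for (const branches of Object.values(outputs)) {
      for (const branch of branches) {
        for (const target of branch ?? []) {
          if (!names.has(target.node)) {
            issues.push({
              rule: 10,
              severity: 'error',
              message: `"${source}" connects to missing node "${target.node}"`,
            })
          }
        }
      }
    }
  }
  return issues
}

// Errors block deployment; warnings alone do not.
function hasBlockingIssues(issues: Issue[]): boolean {
  return issues.some((issue) => issue.severity === 'error')
}
```

Returning a flat list of issues tagged with rule numbers, rather than throwing on the first failure, is what would let a correction loop send every violation back to the model in one pass.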
 
  ---
 
@@ -94,7 +151,7 @@ console.log(deployed.workflowId) // now live in n8n
  | `model` | `string` | | Claude model to use (default: `claude-sonnet-4-6`) |
  | `logger` | `ILogger` | | Custom logger (default: silent) |
  | `telemetry` | `boolean \| string` | | Enable JSONL telemetry logging (`true` for default dir, or a path) |
- | `library` | `IWorkflowLibrary` | | Workflow library for RAG (default: `NullLibrary`) |
+ | `library` | `IWorkflowLibrary` | | Workflow library for learning loop (default: `NullLibrary`, CLI uses `FileLibrary`) |
 
  ---
 
@@ -115,6 +172,7 @@ const result = await kairos.build(description, {
  {
  workflowId: string | null // null on dry run
  name: string
+ workflow: N8nWorkflow // the full generated workflow JSON — inspect before deploying
  generationAttempts: number // 1–3
  activationRequired: boolean // true if workflow needs manual activation
  credentialsNeeded: Array<{
@@ -137,8 +195,8 @@ const workflows = await kairos.list()
  // Get a specific workflow
  const workflow = await kairos.get(workflowId)
 
- // Update a workflow from a new description
- const updated = await kairos.update(workflowId, 'new description')
+ // Replace a workflow with a fresh generation from a new description
+ const updated = await kairos.replace(workflowId, 'new description')
 
  // Activate / deactivate
  await kairos.activate(workflowId)
@@ -211,10 +269,10 @@ try {
  |---|---|
  | `GenerationError` | Anthropic API call failed |
  | `ResponseParseError` | Claude responded but produced no usable tool call |
- | `ValidationError` | Workflow failed 19-rule validation after max retries |
+ | `ValidationError` | Workflow failed 23-rule validation after max retries |
  | `ProviderError` | Network/auth failure talking to n8n |
  | `ApiError` | n8n returned a 4xx or 5xx (carries `.statusCode`) |
- | `GuardError` | `delete()` called without `{ confirm: true }` |
+ | `GuardError` | Input validation failed (empty description) or `delete()` called without `{ confirm: true }` |
 
  ---
 
@@ -229,6 +287,9 @@ kairos build "Every morning at 9am, send a Slack digest to #daily-updates"
  # Dry run only
  kairos build "Monitor a webhook and log payloads" --dry-run
 
+ # Seed library with n8n community templates
+ kairos sync-templates --max 200
+
  # Manage workflows
  kairos list
  kairos get <workflow-id>
@@ -245,6 +306,8 @@ export N8N_BASE_URL=https://your-instance.app.n8n.cloud
  export N8N_API_KEY=your-n8n-key
  ```
 
+ For dry-run mode, only `ANTHROPIC_API_KEY` is required — no n8n setup needed.
+
  ---
 
  ## Telemetry
@@ -268,13 +331,15 @@ telemetry: '/path/to/telemetry/dir'
 
  Each event includes timestamp, session ID, token counts, validation issues, and duration — useful for benchmarking and analyzing the correction loop.
 
+ Kairos also reads telemetry data to compute **per-rule failure rates** across all builds. Rules that fail frequently (>= 15% of builds) are automatically surfaced as warnings in the generation prompt, helping Claude avoid systemic issues. Failure rates use distinct session counting to avoid inflation from retry loops, and results are cached for 5 minutes.
+
  For CLI usage, set `KAIROS_TELEMETRY=true` in your environment.
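
Editor's sketch (not part of the package diff): the per-rule failure rates described above, with distinct session counting and the 15% cutoff, could be computed along these lines. The `BuildEvent` shape and the function names are hypothetical; the real JSONL schema is not shown in this diff and may differ. Caching is omitted.

```typescript
// Hypothetical event shape; the actual telemetry schema may differ.
interface BuildEvent {
  sessionId: string
  failedRules: number[] // rule numbers that failed validation in this event
}

// Rules failing in at least 15% of builds are surfaced as warnings.
const FAILURE_THRESHOLD = 0.15

function failureRates(events: BuildEvent[]): Map<number, number> {
  const allSessions = new Set<string>()
  const sessionsByRule = new Map<number, Set<string>>()

  for (const event of events) {
    allSessions.add(event.sessionId)
    for (const rule of event.failedRules) {
      let sessions = sessionsByRule.get(rule)
      if (!sessions) {
        sessions = new Set<string>()
        sessionsByRule.set(rule, sessions)
      }
      // A Set keeps one entry per session, so retries in the same
      // session do not inflate the rule's failure count.
      sessions.add(event.sessionId)
    }
  }

  const rates = new Map<number, number>()
  sessionsByRule.forEach((sessions, rule) => {
    rates.set(rule, sessions.size / allSessions.size)
  })
  return rates
}

function frequentFailures(events: BuildEvent[]): number[] {
  return Array.from(failureRates(events).entries())
    .filter(([, rate]) => rate >= FAILURE_THRESHOLD)
    .map(([rule]) => rule)
    .sort((a, b) => a - b)
}
```

Counting sessions instead of events matters because a single build that retries three times on the same rule would otherwise count as three failures.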
 
  ---
 
- ## Workflow Library (RAG)
+ ## Workflow Library & Feedback Loop
 
- Kairos includes a file-based workflow library that stores successful generations and uses them as few-shot examples for future builds:
+ Kairos includes a file-based workflow library that stores every generation and feeds failure patterns back into future builds:
 
  ```ts
  import { Kairos, FileLibrary } from '@kairos-sdk/core'
@@ -284,10 +349,39 @@ const kairos = new Kairos({
  n8nBaseUrl: '...',
  n8nApiKey: '...',
  library: new FileLibrary(), // stores in ~/.kairos/library/
+ telemetry: true, // enables failure rate tracking
  })
  ```
 
- Every successful `build()` is saved automatically. Over time, the library improves generation quality by providing relevant examples to Claude — a self-improving few-shot learning system.
+ **What gets stored per workflow:**
+ - The full workflow JSON and description
+ - Generation mode (`direct`, `reference`, or `scratch` based on library match quality)
+ - Number of generation attempts needed
+ - Failure patterns — which validation rules failed and how many times
+ - Source workflow IDs (which library entries influenced this build)
+ - Top match score and credentials needed
+ - Outcome tracking: retrieval count, usage as direct/reference source, first-try pass rate, avg attempts, and failed rules when used as a source
+
+ **How retrieval works:**
+
+ Kairos uses a **hybrid retrieval** pipeline with four scoring signals, weighted and combined:
+
+ | Signal | Weight | What it captures |
+ |---|---|---|
+ | TF-IDF keywords | 0.35 | Text similarity between description and stored workflows |
+ | Node fingerprint | 0.30 | Jaccard similarity between expected node types (extracted from query) and actual nodes in stored workflows |
+ | Outcome history | 0.20 | First-try pass rate and avg attempts when this workflow was used as a source — proven templates rank higher |
+ | Deploy frequency | 0.15 | How often a workflow has been deployed — a proxy for usefulness |
+
+ After hybrid scoring, results are **reranked by cluster**: workflows are grouped by node fingerprint pattern (e.g., webhook→slack, scheduleTrigger→httpRequest→gmail), and cluster-level success stats boost or penalize candidates. Clusters with high failure rates on specific rules surface those as warnings.
+
+ - High-scoring matches (>= 0.92) provide direct structural templates
+ - Medium matches (>= 0.72) provide reference examples
+ - Failure patterns from matched workflows and cluster-level warnings are injected into Claude's prompt
+
+ **Template seeding:** Run `kairos sync-templates` to ingest validated workflows from the n8n community library. Templates are safety-filtered (blocks code/executeCommand/ssh nodes, hardcoded secrets) and tagged with `sourceKind: 'n8n-template'`. In benchmarks, seeding the library with 89 templates improved first-try pass rate from 55% to 100%.
+
+ The CLI automatically enables the library — no configuration needed.
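
Editor's sketch (not part of the package diff): the weighted combination and match thresholds described above can be illustrated as follows. The weights and thresholds come from the README text. Everything else is an assumption: the `Signals` type, the function names, that each signal is normalized to the range 0 to 1, and that the 0.92 and 0.72 thresholds apply to the combined score and map onto the `direct`, `reference`, and `scratch` generation modes. Cluster reranking is not modeled.

```typescript
// Hypothetical signal container; each value is assumed to be normalized to [0, 1].
interface Signals {
  tfidf: number
  fingerprint: number
  outcome: number
  deployFrequency: number
}

// Weights as listed in the README's retrieval table.
const WEIGHTS: Signals = {
  tfidf: 0.35,
  fingerprint: 0.3,
  outcome: 0.2,
  deployFrequency: 0.15,
}

// Jaccard similarity: |A ∩ B| / |A ∪ B| over node type names.
function jaccard(a: string[], b: string[]): number {
  const setA = new Set(a)
  const setB = new Set(b)
  if (setA.size === 0 && setB.size === 0) return 0
  let shared = 0
  setA.forEach((item) => {
    if (setB.has(item)) shared++
  })
  return shared / (setA.size + setB.size - shared)
}

function hybridScore(s: Signals): number {
  return (
    WEIGHTS.tfidf * s.tfidf +
    WEIGHTS.fingerprint * s.fingerprint +
    WEIGHTS.outcome * s.outcome +
    WEIGHTS.deployFrequency * s.deployFrequency
  )
}

type GenerationMode = 'direct' | 'reference' | 'scratch'

// Assumed mapping from score to mode, based on the thresholds in the README.
function generationMode(score: number): GenerationMode {
  if (score >= 0.92) return 'direct'
  if (score >= 0.72) return 'reference'
  return 'scratch'
}
```

For example, a query expecting `webhook` and `slack` nodes compared against a stored workflow containing `webhook`, `slack`, and `set` shares two of three distinct node types, so its fingerprint signal would be 2/3.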
 
  ---