kairn-cli 2.5.0 → 2.5.2
- package/README.md +499 -74
- package/dist/cli.js +1 -1
- package/dist/cli.js.map +1 -1
- package/package.json +3 -3
package/README.md
CHANGED
|
@@ -1,12 +1,14 @@
|
|
|
1
|
-
# Kairn
|
|
1
|
+
# Kairn — The Agent Environment Compiler
|
|
2
2
|
|
|
3
|
-
>
|
|
3
|
+
> Describe your workflow. Get an optimized Claude Code environment. Then evolve it automatically.
|
|
4
4
|
|
|
5
|
-
Kairn is a CLI that compiles natural language
|
|
5
|
+
Kairn is a CLI that compiles natural language descriptions into minimal, optimal [Claude Code](https://code.claude.com/) agent environments — complete with MCP servers, slash commands, skills, subagents, rules, and security. Then it uses **automated evolution** (inspired by [Meta-Harness](https://yoonholee.com/meta-harness/), Stanford IRIS Lab 2026) to improve them through real-world task execution.
|
|
6
6
|
|
|
7
|
-
**v2.
|
|
7
|
+
**v2.5.0** adds **Intent-Aware Harnesses** — project-specific routing that intercepts natural language and activates the right command. Two-tier: fast regex (Tier 1) + semantic Haiku fallback (Tier 2). Self-learning — the harness learns your vocabulary over time.
|
|
8
8
|
|
|
9
|
-
**No
|
|
9
|
+
**No servers. No accounts. No telemetry. Runs locally with your own LLM key.**
|
|
10
|
+
|
|
11
|
+
---
|
|
10
12
|
|
|
11
13
|
## Install
|
|
12
14
|
|
|
@@ -19,127 +21,267 @@ Requires Node.js 18+. The command is `kairn`.
|
|
|
19
21
|
## Quick Start
|
|
20
22
|
|
|
21
23
|
```bash
|
|
22
|
-
# 1. Set up your LLM
|
|
24
|
+
# 1. Set up your LLM provider (Anthropic, OpenAI, Google, xAI, DeepSeek, Mistral, Groq, or custom)
|
|
23
25
|
kairn init
|
|
24
26
|
|
|
25
|
-
# 2. Describe your workflow
|
|
27
|
+
# 2. Describe your workflow (or scan an existing repo)
|
|
26
28
|
kairn describe "Build a Next.js app with Supabase auth"
|
|
29
|
+
# or
|
|
30
|
+
kairn optimize # scans existing project at cwd
|
|
27
31
|
|
|
28
32
|
# 3. Start coding
|
|
29
33
|
claude
|
|
30
34
|
```
|
|
31
35
|
|
|
32
|
-
Kairn generates the entire `.claude/` directory — CLAUDE.md,
|
|
36
|
+
Kairn generates the entire `.claude/` directory — CLAUDE.md, settings.json, commands, rules, agents, hooks, security policies — tailored to your specific workflow. Then, optionally, evolve it:
|
|
37
|
+
|
|
38
|
+
```bash
|
|
39
|
+
# Set up evolution
|
|
40
|
+
kairn evolve init # auto-generate 3-5 eval tasks
|
|
41
|
+
kairn evolve baseline # snapshot current harness
|
|
42
|
+
|
|
43
|
+
# Optimize
|
|
44
|
+
kairn evolve run --iterations 5 # Run evolution loop
|
|
45
|
+
kairn evolve apply # Accept best harness
|
|
46
|
+
```
|
|
47
|
+
|
|
48
|
+
---
|
|
33
49
|
|
|
34
50
|
## What Gets Generated
|
|
35
51
|
|
|
36
52
|
```
|
|
37
53
|
.claude/
|
|
38
|
-
├── CLAUDE.md # Workflow-specific system prompt
|
|
39
|
-
├── settings.json # Permissions, hooks,
|
|
54
|
+
├── CLAUDE.md # Workflow-specific system prompt (7 sections)
|
|
55
|
+
├── settings.json # Permissions, hooks, security rules, intent routing
|
|
40
56
|
├── commands/ # Slash commands (/project:help, /project:plan, etc.)
|
|
41
|
-
├── rules/ # Auto-loaded instructions (security, continuity)
|
|
42
|
-
├── skills/ # Model-controlled capabilities
|
|
43
|
-
├── agents/ # Specialized subagents
|
|
44
|
-
|
|
57
|
+
├── rules/ # Auto-loaded instructions (security, continuity, paths)
|
|
58
|
+
├── skills/ # Model-controlled capabilities (code, research, writing)
|
|
59
|
+
├── agents/ # Specialized subagents (@architect, @tester, etc.)
|
|
60
|
+
├── docs/ # Pre-initialized project memory
|
|
61
|
+
├── hooks/ # Intent router (Tier 1 regex + Tier 2 Haiku classifier)
|
|
62
|
+
│ ├── intent-router.mjs # Project-specific regex patterns + fallthrough
|
|
63
|
+
│ ├── intent-learner.mjs # Promotes recurring Tier 2 patterns to Tier 1
|
|
64
|
+
│ └── intent-log.jsonl # Log of routed prompts (for learning)
|
|
65
|
+
└── QUICKSTART.md # Interactive startup guide (Level 2-4)
|
|
45
66
|
.mcp.json # Project-scoped MCP server config
|
|
46
67
|
.env # API keys (gitignored, masked in output)
|
|
47
68
|
```
|
|
48
69
|
|
|
49
|
-
|
|
70
|
+
---
|
|
71
|
+
|
|
72
|
+
## Core Commands
|
|
50
73
|
|
|
51
74
|
### `kairn init`
|
|
52
75
|
|
|
53
|
-
Interactive setup. Pick your LLM provider
|
|
76
|
+
Interactive setup. Pick your LLM provider, enter credentials. API key stored locally at `~/.kairn/config.json`.
|
|
54
77
|
|
|
55
|
-
Supported providers
|
|
78
|
+
**Supported providers:**
|
|
56
79
|
- **Anthropic** — Claude Sonnet 4.6, Opus 4.6, Haiku 4.5
|
|
57
80
|
- **OpenAI** — GPT-4.1, GPT-4.1 mini, o4-mini, GPT-5 mini
|
|
58
81
|
- **Google** — Gemini 2.5 Flash, Gemini 3 Flash, Gemini 2.5 Pro, Gemini 3.1 Pro
|
|
59
|
-
- **xAI** — Grok 4.1 Fast, Grok 4.20 (2M context)
|
|
60
|
-
- **DeepSeek** — V3.2 Chat, V3.2 Reasoner (cheapest)
|
|
82
|
+
- **xAI** — Grok 4.1 Fast, Grok 4.20 (2M context, $0.20/M)
|
|
83
|
+
- **DeepSeek** — V3.2 Chat, V3.2 Reasoner (cheapest at $0.28/M)
|
|
61
84
|
- **Mistral** — Large 3, Codestral, Small 4 (open-weight)
|
|
62
85
|
- **Groq** — Llama 4, DeepSeek R1, Qwen 3 (free tier)
|
|
63
|
-
- **Custom** — any OpenAI-compatible endpoint (local Ollama, LM Studio
|
|
86
|
+
- **Custom** — any OpenAI-compatible endpoint (local Ollama, LM Studio)
|
|
64
87
|
|
|
65
|
-
### `kairn describe [intent]`
|
|
88
|
+
### `kairn describe [intent] [options]`
|
|
66
89
|
|
|
67
|
-
The main command
|
|
90
|
+
**The main command.** Describe what you want your agent to do. Kairn compiles an optimal environment.
|
|
68
91
|
|
|
69
92
|
```bash
|
|
70
|
-
kairn describe "
|
|
71
|
-
kairn describe "
|
|
93
|
+
kairn describe "Build a Next.js REST API with PostgreSQL"
|
|
94
|
+
kairn describe "Research ML papers on GRPO training and summarize" --quick
|
|
72
95
|
```
|
|
73
96
|
|
|
74
|
-
Features
|
|
75
|
-
- **Interactive clarification** — 3-5 questions to
|
|
76
|
-
- **Multi-pass compilation** —
|
|
77
|
-
- **Autonomy levels** —
|
|
78
|
-
- **
|
|
97
|
+
**Features:**
|
|
98
|
+
- **Interactive clarification** — 3-5 yes/no questions to refine your workflow (skip with `--quick`)
|
|
99
|
+
- **Multi-pass compilation** — Skeleton pass (tool selection) + Harness pass (content generation) + deterministic settings
|
|
100
|
+
- **Autonomy levels** — Choose how autonomous (1-4, default 2):
|
|
101
|
+
- **Level 1 (Guided):** Manual workflow with `/project:tour`, help, and guidance
|
|
102
|
+
- **Level 2 (Assisted):** `/project:loop` for workflow automation, `@pm` agent for planning
|
|
103
|
+
- **Level 3 (Autonomous):** `/project:auto` for self-directed execution with PR delivery
|
|
104
|
+
- **Level 4 (Full Auto):** `/project:autopilot` for continuous execution with stop conditions
|
|
105
|
+
- **Secrets collection** — Prompted for API keys after generation, written to `.env`
|
|
106
|
+
- **Intent routing** — Auto-generated `/project:*` command routing (both regex and Haiku-based)
|
|
79
107
|
|
|
80
|
-
### `kairn optimize [
|
|
108
|
+
### `kairn optimize [options]`
|
|
81
109
|
|
|
82
110
|
Scan an existing project and optimize its Claude Code environment. Detects language, framework, dependencies, and generates improvements.
|
|
83
111
|
|
|
84
112
|
```bash
|
|
85
|
-
kairn optimize #
|
|
113
|
+
kairn optimize # Scan, audit, and overwrite .claude/
|
|
86
114
|
kairn optimize --diff # Preview changes before writing
|
|
115
|
+
kairn optimize --audit-only # Show issues without generating
|
|
87
116
|
```
|
|
88
117
|
|
|
89
|
-
|
|
118
|
+
**Features:**
|
|
119
|
+
- **Full project scan** — language, framework, dependencies, scripts, env keys, CI/CD, existing harness
|
|
120
|
+
- **Harness audit** — checks CLAUDE.md quality, missing commands/rules, MCP bloat, security configurations
|
|
121
|
+
- **Two modes:**
|
|
122
|
+
- No `.claude/` → generate from scratch
|
|
123
|
+
- Has `.claude/` → optimize + overwrite (shows audit issues first, asks for confirmation)
|
|
124
|
+
- **Diff preview** — see what would change before applying (with `--diff`)
|
|
90
125
|
|
|
91
|
-
|
|
126
|
+
### `kairn templates [options]`
|
|
127
|
+
|
|
128
|
+
Browse pre-built environment templates. Activate one to jumpstart a new project.
|
|
92
129
|
|
|
93
130
|
```bash
|
|
94
|
-
kairn templates
|
|
95
|
-
kairn templates --activate nextjs
|
|
131
|
+
kairn templates # Browse gallery
|
|
132
|
+
kairn templates --activate nextjs # Apply a template
|
|
96
133
|
```
|
|
97
134
|
|
|
98
|
-
Available templates
|
|
135
|
+
**Available templates:**
|
|
136
|
+
- Next.js Full-Stack (React + Node + PostgreSQL + Supabase)
|
|
137
|
+
- API Service (Express/Fastify + database + testing)
|
|
138
|
+
- Research Project (paper analysis, literature review, synthesis)
|
|
139
|
+
- Content Writing (blog, documentation, marketing)
|
|
99
140
|
|
|
100
141
|
### `kairn doctor`
|
|
101
142
|
|
|
102
|
-
Validate the current environment against Claude Code best practices.
|
|
143
|
+
Validate the current environment against Claude Code best practices. Checks:
|
|
144
|
+
- CLAUDE.md structure and token count
|
|
145
|
+
- MCP server configuration completeness
|
|
146
|
+
- Security rules and hooks
|
|
147
|
+
- Command and agent definitions
|
|
148
|
+
- Environment variable references
|
|
103
149
|
|
|
104
|
-
### `kairn keys [
|
|
150
|
+
### `kairn keys [options]`
|
|
105
151
|
|
|
106
|
-
|
|
152
|
+
Manage API keys for MCP servers in the current environment.
|
|
153
|
+
|
|
154
|
+
```bash
|
|
155
|
+
kairn keys # Prompt for missing keys
|
|
156
|
+
kairn keys --show # Show which keys are set vs missing
|
|
157
|
+
```
|
|
107
158
|
|
|
108
159
|
### `kairn list` / `kairn activate <env_id>`
|
|
109
160
|
|
|
110
|
-
Show saved environments and re-deploy them to any directory.
|
|
161
|
+
Show saved environments (stored in `~/.kairn/envs/`) and re-deploy them to any directory.
|
|
111
162
|
|
|
112
|
-
|
|
163
|
+
```bash
|
|
164
|
+
kairn list # List all saved environments
|
|
165
|
+
kairn activate env_abc123 # Copy that environment to .claude/
|
|
166
|
+
```
|
|
167
|
+
|
|
168
|
+
### `kairn evolve` — Automated Harness Optimization
|
|
169
|
+
|
|
170
|
+
The heart of v2.x. Run your agent on real tasks, capture execution traces, diagnose failures, and mutate the harness iteratively.
|
|
171
|
+
|
|
172
|
+
#### `kairn evolve init`
|
|
113
173
|
|
|
114
|
-
|
|
174
|
+
Set up evolution for the current project. Auto-generates 3-5 concrete eval tasks based on your CLAUDE.md and project structure.
|
|
115
175
|
|
|
116
176
|
```bash
|
|
117
|
-
# 1. Initialize — auto-generates project-specific eval tasks via LLM
|
|
118
177
|
kairn evolve init
|
|
178
|
+
```
|
|
179
|
+
|
|
180
|
+
Creates `.kairn-evolve/tasks.yaml` with tasks like:
|
|
181
|
+
- "Add a new feature X to the codebase"
|
|
182
|
+
- "Fix this known bug Y"
|
|
183
|
+
- "Refactor the API layer for clarity"
|
|
184
|
+
- "Write comprehensive test coverage"
|
|
185
|
+
- "Update documentation after feature launch"
|
|
186
|
+
|
|
187
|
+
Uses 6 built-in templates: add-feature, fix-bug, refactor, test-writing, config-change, documentation.
|
|
119
188
|
|
|
120
|
-
|
|
189
|
+
#### `kairn evolve baseline`
|
|
190
|
+
|
|
191
|
+
Snapshot your current `.claude/` directory as iteration 0 (the baseline to improve against).
|
|
192
|
+
|
|
193
|
+
```bash
|
|
121
194
|
kairn evolve baseline
|
|
195
|
+
```
|
|
196
|
+
|
|
197
|
+
#### `kairn evolve run`
|
|
198
|
+
|
|
199
|
+
Run the full evolution loop. Evaluates all tasks, diagnoses failures, proposes mutations, re-evaluates.
|
|
200
|
+
|
|
201
|
+
```bash
|
|
202
|
+
kairn evolve run # 5 iterations (default)
|
|
203
|
+
kairn evolve run --iterations 3 # Custom iteration count
|
|
204
|
+
kairn evolve run --task <task_id> # Run a single task
|
|
205
|
+
kairn evolve run --parallel 4 # Parallel task evaluation (4 concurrent)
|
|
206
|
+
kairn evolve run --runs 3 # Run each task 3 times, report mean ± stddev
|
|
207
|
+
```
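As a rough illustration of what `--runs 3` reports, the sketch below computes the mean and standard deviation of repeated scores for one task. The helper name `summarizeRuns` is hypothetical, and the choice of the sample (n - 1) formula is an assumption; the README does not say which variant Kairn uses.

```javascript
// Hypothetical helper: summarize repeated scores for a single eval task.
// Assumes the sample (n - 1) standard deviation; Kairn's actual formula may differ.
function summarizeRuns(scores) {
  if (scores.length === 0) throw new Error("need at least one score");
  const mean = scores.reduce((sum, s) => sum + s, 0) / scores.length;
  if (scores.length === 1) return { mean, stddev: 0 };
  const variance =
    scores.reduce((sum, s) => sum + (s - mean) ** 2, 0) / (scores.length - 1);
  return { mean, stddev: Math.sqrt(variance) };
}

// Three runs of the same task, scored 0-100.
const summary = summarizeRuns([80, 90, 100]);
console.log(`${summary.mean.toFixed(1)} ± ${summary.stddev.toFixed(1)}`); // 90.0 ± 10.0
```

A wide spread across runs suggests the task itself is noisy, so a score change between iterations may not reflect the harness.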
|
|
208
|
+
|
|
209
|
+
**How it works (the loop):**
|
|
210
|
+
|
|
211
|
+
1. **Evaluate** — Run each eval task by spawning Claude Code in an isolated workspace. Capture full traces:
|
|
212
|
+
- stdout, stderr
|
|
213
|
+
- MCP tool calls (which tools, inputs, outputs)
|
|
214
|
+
- Files changed (diffs)
|
|
215
|
+
- Execution time, pass/fail status
|
|
216
|
+
|
|
217
|
+
2. **Diagnose** — A proposer agent (Opus) reads the full trace filesystem and performs causal reasoning:
|
|
218
|
+
- "Task A failed because CLAUDE.md doesn't mention the /api path"
|
|
219
|
+
- "Task B passed on iteration 1 but regressed on iteration 3 — the new security rule broke it"
|
|
220
|
+
- "Tasks A and C both needed /project:fix but there's no /project:fix command"
|
|
221
|
+
|
|
222
|
+
3. **Mutate** — Propose minimal, targeted changes to the harness:
|
|
223
|
+
- `replace`: Update a section in CLAUDE.md, a command, a rule
|
|
224
|
+
- `add_section`: Insert new guidance into CLAUDE.md
|
|
225
|
+
- `create_file`: Add a new command or rule
|
|
226
|
+
- `delete_section`: Remove contradictory or bloat sections
|
|
227
|
+
- `delete_file`: Remove unused commands/rules
|
|
228
|
+
- `add_intent_pattern`: Add a new natural language pattern (v2.5.0)
|
|
229
|
+
- `modify_intent_prompt`: Improve the Tier 2 Haiku classifier (v2.5.0)
|
|
230
|
+
|
|
231
|
+
4. **Re-evaluate** — Run all tasks again with the mutated harness. If scores improve → accept. If scores regress → rollback to previous best.
|
|
232
|
+
|
|
233
|
+
5. **Repeat** — Iterate N times (default 5). Each iteration cycles through evaluate → diagnose → mutate → re-evaluate.
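The loop above can be sketched in a few lines. This is a toy model, not Kairn's implementation: `evolve`, `evaluate`, and `propose` are hypothetical names, the harness is reduced to an array of section names, and treating a tie as "accept" is an assumption.

```javascript
// Hypothetical sketch of the accept/rollback loop.
// `evaluate` stands in for running all eval tasks; `propose` for diagnose + mutate.
function evolve(baseline, evaluate, propose, iterations = 5) {
  let best = { harness: baseline, score: evaluate(baseline) };
  const history = [{ iteration: 0, score: best.score, accepted: true }];
  for (let i = 1; i <= iterations; i++) {
    const candidate = propose(best.harness); // mutate the best harness so far
    const score = evaluate(candidate);       // re-evaluate with the mutation
    const accepted = score >= best.score;    // assumption: ties are accepted
    if (accepted) best = { harness: candidate, score };
    // When not accepted, `best` is unchanged, which is the rollback.
    history.push({ iteration: i, score, accepted });
  }
  return { best, history };
}

// Toy usage: the score is the number of required sections present.
const required = ["auth", "tests", "deploy"];
const evaluate = (h) => required.filter((r) => h.includes(r)).length;
const propose = (h) => {
  const missing = required.find((r) => !h.includes(r));
  // Once nothing is missing, propose a harmful mutation to show rollback.
  return missing ? [...h, missing] : h.slice(0, -1);
};

const result = evolve(["auth"], evaluate, propose, 3);
console.log(result.best.score, result.history.map((h) => h.accepted));
```

Each candidate is derived from the best harness so far rather than the latest one, so a rejected mutation never contaminates later iterations.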
|
|
122
234
|
|
|
123
|
-
|
|
124
|
-
|
|
125
|
-
|
|
126
|
-
|
|
235
|
+
**Scoring:**
|
|
236
|
+
- **pass/fail** (default) — task passes or fails
|
|
237
|
+
- **llm-judge** — LLM reads task output and scores (0-100)
|
|
238
|
+
- **rubric** — custom weighted scoring function
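A minimal sketch of what a weighted rubric could look like, assuming each criterion carries a weight and a check that returns a value between 0 and 1. The `rubricScore` function and the criterion shape are hypothetical; the actual rubric format is not shown in this README.

```javascript
// Hypothetical weighted rubric: normalizes to a 0-100 scale, like llm-judge.
function rubricScore(criteria, output) {
  const totalWeight = criteria.reduce((sum, c) => sum + c.weight, 0);
  if (totalWeight <= 0) throw new Error("rubric needs a positive total weight");
  const weighted = criteria.reduce(
    (sum, c) => sum + c.weight * c.check(output),
    0
  );
  return Math.round((weighted / totalWeight) * 100);
}

const criteria = [
  { name: "tests pass", weight: 3, check: (o) => (o.testsPassed ? 1 : 0) },
  { name: "docs updated", weight: 1, check: (o) => (o.docsUpdated ? 1 : 0) },
];

const score = rubricScore(criteria, { testsPassed: true, docsUpdated: false });
console.log(score); // 75
```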
|
|
239
|
+
|
|
240
|
+
**Adaptive pruning (v2.2.7):**
|
|
241
|
+
On middle iterations, skip slow/expensive tasks above a confidence threshold. Re-run all tasks on the first and last iteration for rigor.
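One way to read the pruning rule, as a sketch: tasks carry a confidence value, and middle iterations run only the tasks below the threshold. The `selectTasks` name, the `confidence` field, and the 0.9 threshold are all assumptions made for illustration.

```javascript
// Hypothetical task selection for adaptive pruning.
// First and last iterations run everything; middle iterations skip
// tasks whose confidence is already at or above the threshold.
function selectTasks(tasks, iteration, totalIterations, threshold = 0.9) {
  const isEdge = iteration === 1 || iteration === totalIterations;
  if (isEdge) return tasks;
  return tasks.filter((t) => t.confidence < threshold);
}

const tasks = [
  { id: "task-1", confidence: 0.95 },
  { id: "task-2", confidence: 0.5 },
];

console.log(selectTasks(tasks, 3, 5).map((t) => t.id)); // only task-2 runs mid-loop
```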
|
|
242
|
+
|
|
243
|
+
**Anti-regression guards (v2.2.8):**
|
|
244
|
+
- `maxMutationsPerIteration` (default: 3) — cap mutations per step
|
|
245
|
+
- `maxTaskDrop` (default: 20) — if any single task drops >20 points, rollback
|
|
246
|
+
- Loss-weighted proposer focus — proposer reads failures worst-first
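The three guards can be sketched as small pure functions. The option names `maxMutationsPerIteration` and `maxTaskDrop` come from the list above; the function names and the per-task score maps are hypothetical.

```javascript
// Hypothetical guard: roll back if any single task drops by more than maxTaskDrop points.
function shouldRollback(previous, current, maxTaskDrop = 20) {
  return Object.keys(previous).some(
    (task) => previous[task] - (current[task] ?? 0) > maxTaskDrop
  );
}

// Hypothetical guard: keep at most maxMutationsPerIteration proposed mutations.
function capMutations(mutations, maxMutationsPerIteration = 3) {
  return mutations.slice(0, maxMutationsPerIteration);
}

// Hypothetical ordering: lowest-scoring tasks first, so the proposer reads them first.
function worstFirst(scores) {
  return Object.entries(scores)
    .sort((a, b) => a[1] - b[1])
    .map(([task]) => task);
}

const before = { "task-1": 90, "task-2": 70, "task-3": 60 };
const after = { "task-1": 95, "task-2": 45, "task-3": 60 };

console.log(shouldRollback(before, after)); // true: task-2 dropped 25 points
console.log(worstFirst(after));             // [ 'task-2', 'task-3', 'task-1' ]
```

Note that the rollback check is per task, so an overall score gain does not hide a large regression on one task.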
|
|
247
|
+
|
|
248
|
+
#### `kairn evolve report`
|
|
249
|
+
|
|
250
|
+
Generate a human-readable summary of the evolution run.
|
|
251
|
+
|
|
252
|
+
```bash
|
|
253
|
+
kairn evolve report # Markdown to stdout
|
|
254
|
+
kairn evolve report --json # Machine-readable JSON
|
|
127
255
|
```
|
|
128
256
|
|
|
129
|
-
|
|
257
|
+
Shows:
|
|
258
|
+
- Evolution leaderboard (iterations × tasks × scores)
|
|
259
|
+
- Per-task trace diffs (what changed between iterations for the same task)
|
|
260
|
+
- Counterfactual diagnosis (which mutations helped/hurt which tasks)
|
|
261
|
+
- Wall time, token cost, iterations completed
|
|
130
262
|
|
|
131
|
-
|
|
132
|
-
2. **Baseline** — `kairn evolve baseline` snapshots your current `.claude/` directory
|
|
133
|
-
3. **Evaluate** — runs each task by spawning Claude Code in an isolated workspace, capturing full traces (stdout, stderr, tool calls, files changed, timing)
|
|
134
|
-
4. **Diagnose** — a proposer agent (Opus) reads the full traces and performs causal reasoning to identify why tasks fail
|
|
135
|
-
5. **Mutate** — proposes minimal, targeted changes to CLAUDE.md, commands, rules, or agents
|
|
136
|
-
6. **Repeat** — re-evaluates with the mutated harness. Rolls back if scores regress.
|
|
263
|
+
#### `kairn evolve diff <iter1> <iter2>`
|
|
137
264
|
|
|
138
|
-
|
|
265
|
+
Show the harness changes between two iterations.
|
|
266
|
+
|
|
267
|
+
```bash
|
|
268
|
+
kairn evolve diff 0 3 # Show all mutations from baseline to iteration 3
|
|
269
|
+
```
|
|
270
|
+
|
|
271
|
+
#### `kairn evolve apply [--iter N]`
|
|
272
|
+
|
|
273
|
+
Copy the best (or specified) evolved harness back to `.claude/`.
|
|
274
|
+
|
|
275
|
+
```bash
|
|
276
|
+
kairn evolve apply # Copy best iteration to .claude/
|
|
277
|
+
kairn evolve apply --iter 3 # Copy iteration 3 specifically
|
|
278
|
+
```
|
|
279
|
+
|
|
280
|
+
---
|
|
139
281
|
|
|
140
282
|
## Tool Registry
|
|
141
283
|
|
|
142
|
-
Kairn ships with 28 curated
|
|
284
|
+
Kairn ships with **28 curated MCP servers** across 8 categories. Tools are auto-selected based on your workflow — fewer tools = less context bloat = better agent performance.
|
|
143
285
|
|
|
144
286
|
| Category | Tools |
|
|
145
287
|
|----------|-------|
|
|
@@ -152,19 +294,262 @@ Kairn ships with 28 curated tools across 8 categories:
|
|
|
152
294
|
| **Security** | Semgrep, security-guidance |
|
|
153
295
|
| **Design** | Figma, Frontend Design |
|
|
154
296
|
|
|
155
|
-
|
|
297
|
+
---
|
|
298
|
+
|
|
299
|
+
## How the Pipeline Works
|
|
156
300
|
|
|
157
|
-
|
|
301
|
+
### Generation (kairn describe / kairn optimize)
|
|
158
302
|
|
|
159
|
-
1.
|
|
160
|
-
2.
|
|
161
|
-
3. **Pass 1
|
|
162
|
-
4. **Pass 2
|
|
163
|
-
5. **Pass 3
|
|
164
|
-
6.
|
|
165
|
-
7.
|
|
303
|
+
1. **User input** — intent string or scanned project profile
|
|
304
|
+
2. **Clarification** (optional) — 3-5 yes/no questions to refine workflow
|
|
305
|
+
3. **Pass 1: Skeleton** — LLM selects minimal tool set and outlines the project
|
|
306
|
+
4. **Pass 2: Harness** — LLM generates all content (CLAUDE.md, commands, rules, agents, docs)
|
|
307
|
+
5. **Pass 3: Settings** — Deterministic generation of `settings.json` and `.mcp.json` from registry
|
|
308
|
+
6. **Intent patterns** — Compile project-specific regex patterns from command names + synonyms
|
|
309
|
+
7. **Hook templates** — Generate `intent-router.mjs` (Tier 1) and Tier 2 prompt template
|
|
310
|
+
8. **Write files** — `.claude/` directory + `.mcp.json` + `.env` (with masked keys)
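Step 6 can be pictured as turning each command name and its synonyms into one case-insensitive regex. The sketch below is an assumption about the general shape, not the generated `intent-router.mjs`; `compileIntentPatterns` and the input format are hypothetical.

```javascript
// Escape characters that have special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical compiler: one word-bounded, case-insensitive regex per command.
function compileIntentPatterns(commands) {
  return commands.map(({ name, synonyms }) => ({
    command: `/project:${name}`,
    pattern: new RegExp(
      `\\b(${[name, ...synonyms].map(escapeRegExp).join("|")})\\b`,
      "i"
    ),
  }));
}

const patterns = compileIntentPatterns([
  { name: "deploy", synonyms: ["ship", "release"] },
  { name: "test", synonyms: ["run tests"] },
]);

console.log(patterns[0].pattern.test("Ship this to prod")); // true
console.log(patterns[0].pattern.test("write the docs"));    // false
```

Word boundaries keep short synonyms from matching inside longer words, for example "ship" inside "membership".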
|
|
166
311
|
|
|
167
|
-
|
|
312
|
+
### Evolution (kairn evolve run)
|
|
313
|
+
|
|
314
|
+
```
|
|
315
|
+
Baseline (.claude/ snapshot)
|
|
316
|
+
│
|
|
317
|
+
▼
|
|
318
|
+
Iteration 1
|
|
319
|
+
├─ Evaluate: run all tasks, capture traces
|
|
320
|
+
├─ Diagnose: proposer reads traces, reasons about failures
|
|
321
|
+
├─ Mutate: generate 1-3 harness mutations
|
|
322
|
+
├─ Re-evaluate: run all tasks again
|
|
323
|
+
└─ Accept/rollback based on score improvement
|
|
324
|
+
│
|
|
325
|
+
▼
|
|
326
|
+
Iteration 2, 3, 4, 5...
|
|
327
|
+
│
|
|
328
|
+
▼
|
|
329
|
+
Best harness (apply to .claude/)
|
|
330
|
+
```
|
|
331
|
+
|
|
332
|
+
Each iteration is independent and can be retried. Proposer memory of all prior iterations (experience replay) is planned for v2.4.0.
|
|
333
|
+
|
|
334
|
+
### Self-Learning (v2.5.0)
|
|
335
|
+
|
|
336
|
+
```
|
|
337
|
+
Tier 1: regex hook intercepts prompt
|
|
338
|
+
├─ Matches pattern? → route to command + inject context
|
|
339
|
+
└─ No match? → fallthrough to Tier 2
|
|
340
|
+
|
|
341
|
+
Tier 2: Haiku prompt hook
|
|
342
|
+
├─ Classify intent
|
|
343
|
+
├─ Route to command if confident
|
|
344
|
+
└─ Log routing attempt (for learning)
|
|
345
|
+
|
|
346
|
+
SessionStart: intent-learner.mjs
|
|
347
|
+
├─ Read intent-log.jsonl (recent tier 2 routings)
|
|
348
|
+
├─ Promote recurring patterns to regex
|
|
349
|
+
├─ Update intent-router.mjs
|
|
350
|
+
└─ Write audit trail
|
|
351
|
+
```
|
|
352
|
+
|
|
353
|
+
Over time, more patterns become regex (fast, free) instead of Haiku (slow, $0.001).
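A compact sketch of the two tiers and the promotion step, under stated assumptions: `classify` stands in for the Tier 2 Haiku call, the log is an in-memory array rather than `intent-log.jsonl`, and a phrase is promoted after three identical Tier 2 routings. All function names and the threshold are hypothetical.

```javascript
// Hypothetical router: try Tier 1 regex first, fall through to Tier 2 and log it.
function routePrompt(prompt, tier1Patterns, classify, log) {
  for (const { pattern, command } of tier1Patterns) {
    if (pattern.test(prompt)) return { command, tier: 1 };
  }
  const command = classify(prompt); // may be null when the classifier is not confident
  log.push({ prompt, command, tier: 2 });
  return { command, tier: 2 };
}

// Hypothetical learner: promote phrases that Tier 2 routed to the same command
// at least `minCount` times into exact-match Tier 1 patterns.
function promoteRecurring(log, tier1Patterns, minCount = 3) {
  const counts = new Map();
  for (const { prompt, command } of log) {
    if (!command) continue;
    const key = `${command}\u0000${prompt.trim().toLowerCase()}`;
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  for (const [key, count] of counts) {
    if (count < minCount) continue;
    const [command, phrase] = key.split("\u0000");
    const escaped = phrase.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    tier1Patterns.push({ pattern: new RegExp(`^${escaped}$`, "i"), command });
  }
  return tier1Patterns;
}

const tier1 = [{ pattern: /\bdeploy\b/i, command: "/project:deploy" }];
const log = [];
const classify = (p) => (/ship/i.test(p) ? "/project:deploy" : null);

for (let i = 0; i < 3; i++) routePrompt("ship it", tier1, classify, log);
promoteRecurring(log, tier1);

const routed = routePrompt("Ship it", tier1, classify, log);
console.log(routed); // now handled by Tier 1, with no classifier call
```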
|
|
354
|
+
|
|
355
|
+
---
|
|
356
|
+
|
|
357
|
+
## Example Workflow
|
|
358
|
+
|
|
359
|
+
### Scenario: Build a Next.js API
|
|
360
|
+
|
|
361
|
+
```bash
|
|
362
|
+
cd /tmp/my-api
|
|
363
|
+
git init
|
|
364
|
+
|
|
365
|
+
kairn describe "Next.js REST API with Prisma ORM and PostgreSQL. OAuth login, JWT auth, rate limiting."
|
|
366
|
+
|
|
367
|
+
# Output:
|
|
368
|
+
# ✔ Pass 1: Selected 7 tools (GitHub, PostgreSQL, Vercel, Semgrep, Docker, Context7, Sequential Thinking)
|
|
369
|
+
# ✔ Pass 2: Generated 73 lines in CLAUDE.md, 8 commands, 4 rules, 3 agents, 2 skills
|
|
370
|
+
# ✔ Pass 3: Configured 2 MCP servers (PostgreSQL + GitHub)
|
|
371
|
+
#
|
|
372
|
+
# Commands:
|
|
373
|
+
# /project:help Show available commands
|
|
374
|
+
# /project:plan Draft the API spec
|
|
375
|
+
# /project:develop Full development pipeline
|
|
376
|
+
# /project:test Run test suite
|
|
377
|
+
# /project:fix Issue-driven bug fixing
|
|
378
|
+
# /project:deploy Deploy to Vercel
|
|
379
|
+
# /project:security Audit for vulnerabilities
|
|
380
|
+
# /project:batch Run batches of independent tasks
|
|
381
|
+
#
|
|
382
|
+
# Env keys needed:
|
|
383
|
+
# POSTGRES_URL
|
|
384
|
+
# JWT_SECRET
|
|
385
|
+
# GITHUB_TOKEN
|
|
386
|
+
# VERCEL_TOKEN
|
|
387
|
+
#
|
|
388
|
+
# Paste your secrets (or press enter to skip):
|
|
389
|
+
# POSTGRES_URL: ***
|
|
390
|
+
# JWT_SECRET: ***
|
|
391
|
+
# GITHUB_TOKEN: (skipped)
|
|
392
|
+
# VERCEL_TOKEN: (skipped)
|
|
393
|
+
#
|
|
394
|
+
# Ready! Run: $ claude
|
|
395
|
+
|
|
396
|
+
claude # Start Claude Code with the generated harness
|
|
397
|
+
|
|
398
|
+
# In Claude Code:
|
|
399
|
+
# > /project:plan
|
|
400
|
+
# Drafts the API specification with OAuth flow, database schema, endpoint design
|
|
401
|
+
#
|
|
402
|
+
# > /project:develop feature/auth
|
|
403
|
+
# Full pipeline: specs feature in detail, plans implementation, TDD red→green→refactor,
|
|
404
|
+
# writes tests, runs security audit, updates docs
|
|
405
|
+
#
|
|
406
|
+
# > /project:fix
|
|
407
|
+
# Shows recent issues, user picks one, Claude researches the bug, fixes it, runs tests
|
|
408
|
+
```
|
|
409
|
+
|
|
410
|
+
### Scenario: Optimize an Existing Project
|
|
411
|
+
|
|
412
|
+
```bash
|
|
413
|
+
cd /path/to/existing/next-app
|
|
414
|
+
# It has a manual .claude/ directory
|
|
415
|
+
|
|
416
|
+
kairn optimize
|
|
417
|
+
|
|
418
|
+
# Output:
|
|
419
|
+
# ✔ Scan: TypeScript, Next.js, 47 dependencies, 8 scripts
|
|
420
|
+
#
|
|
421
|
+
# Harness Audit:
|
|
422
|
+
# CLAUDE.md: 187 lines ✓ (good)
|
|
423
|
+
# MCP servers: 4
|
|
424
|
+
# Commands: 5 (/help, /plan, /code, /test, /deploy)
|
|
425
|
+
# Rules: 2 (security, continuity)
|
|
426
|
+
#
|
|
427
|
+
# Issues found:
|
|
428
|
+
# ⚠ Missing /project:develop command (full development pipeline)
|
|
429
|
+
# ⚠ No path-scoped rules (api.md, testing.md for different code domains)
|
|
430
|
+
# ⚠ Hooks not configured (missing destructive command blocking)
|
|
431
|
+
#
|
|
432
|
+
# Generate optimized environment? This will overwrite existing .claude/ files.
|
|
433
|
+
# > Yes
|
|
434
|
+
#
|
|
435
|
+
# ✔ Environment compiled in 12s
|
|
436
|
+
# ✔ Files written: 4 new, 3 modified, 1 unchanged
|
|
437
|
+
#
|
|
438
|
+
# Ready! Run: $ claude
|
|
439
|
+
```
|
|
440
|
+
|
|
441
|
+
### Scenario: Evolve the Harness
|
|
442
|
+
|
|
443
|
+
```bash
|
|
444
|
+
# Harness is generated and working. Set up evolution:
|
|
445
|
+
|
|
446
|
+
kairn evolve init
|
|
447
|
+
|
|
448
|
+
# Auto-generated 5 eval tasks based on CLAUDE.md + project structure:
|
|
449
|
+
# task-1: "Implement user profile page"
|
|
450
|
+
# task-2: "Add password reset flow"
|
|
451
|
+
# task-3: "Refactor authentication middleware"
|
|
452
|
+
# task-4: "Write E2E tests for checkout flow"
|
|
453
|
+
# task-5: "Update API documentation after feature release"
|
|
454
|
+
|
|
455
|
+
kairn evolve baseline # Snapshot current .claude/ as iteration 0
|
|
456
|
+
|
|
457
|
+
kairn evolve run --iterations 5
|
|
458
|
+
|
|
459
|
+
# Iteration 1/5
|
|
460
|
+
# Evaluating... [task-1] pass [task-2] fail [task-3] pass [task-4] fail [task-5] pass
|
|
461
|
+
# Score: 3/5 (60%)
|
|
462
|
+
#
|
|
463
|
+
# Diagnosing failures...
|
|
464
|
+
# - Task 2 failed: "password reset" not mentioned in CLAUDE.md. Need /project:email command.
|
|
465
|
+
# - Task 4 failed: E2E tests failed because /project:test was added but never documented.
|
|
466
|
+
#
|
|
467
|
+
# Proposing mutations:
|
|
468
|
+
# - Add /project:email command with SMTP integration guidance
|
|
469
|
+
# - Update CLAUDE.md "Authentication" section with password reset flow
|
|
470
|
+
# - Add e2e.md path-scoped rule with Playwright patterns
|
|
471
|
+
#
|
|
472
|
+
# Iteration 2/5
|
|
473
|
+
# Evaluating with mutated harness...
|
|
474
|
+
# [task-1] pass [task-2] pass [task-3] pass [task-4] pass [task-5] pass
|
|
475
|
+
# Score: 5/5 (100%) ✔ improvement! Accepting mutations.
|
|
476
|
+
#
|
|
477
|
+
# Iteration 3/5
|
|
478
|
+
# Evaluating...
|
|
479
|
+
# [task-1] pass [task-2] pass [task-3] pass [task-4] pass [task-5] pass
|
|
480
|
+
# Score: 5/5 (100%), no regression but no improvement. Proposing refinements...
|
|
481
|
+
# - CLAUDE.md got bloated (142 lines). Moving detail to rules/.
|
|
482
|
+
# Iteration 3 score: 5/5. Accepting.
|
|
483
|
+
#
|
|
484
|
+
# Iterations 4-5: Scores plateau at 5/5. No more mutations.
|
|
485
|
+
#
|
|
486
|
+
# Final leaderboard:
|
|
487
|
+
# Iteration 0 (baseline): 60% (3/5)
|
|
488
|
+
# Iteration 1: 60% (3/5)
|
|
489
|
+
# Iteration 2: 100% (5/5) ← best
|
|
490
|
+
# Iteration 3: 100% (5/5)
|
|
491
|
+
# Iteration 4: 100% (5/5)
|
|
492
|
+
# Iteration 5: 100% (5/5)
|
|
493
|
+
|
|
494
|
+
kairn evolve report # Detailed markdown summary
|
|
495
|
+
kairn evolve apply # Copy iteration 2 to .claude/
|
|
496
|
+
```
|
|
497
|
+
|
|
498
|
+
---
|
|
499
|
+
|
|
500
|
+
## Architecture & Philosophy
|
|
501
|
+
|
|
502
|
+
### Design Principles
|
|
503
|
+
|
|
504
|
+
1. **Minimal over complete.** 5 well-chosen tools beat 50 generic ones.
|
|
505
|
+
2. **Workflow-specific over generic.** Every file generated relates to your actual task.
|
|
506
|
+
3. **Self-improving.** Environments get better with use via the evolution loop and self-learning intent router.
|
|
507
|
+
4. **Local-first.** No accounts, no servers, no telemetry. Runs locally with your own LLM key.
|
|
508
|
+
5. **Transparent.** You can inspect every generated file. Nothing is hidden.
|
|
509
|
+
6. **Security by default.** Every environment includes deny rules, hooks, and guidance.
|
|
510
|
+
7. **Prove it.** Evolved harnesses must demonstrably outperform static ones. Claims require measurement.
|
|
511
|
+
|
|
512
|
+
### What Makes Kairn Unique
|
|
513
|
+
|
|
514
|
+
**vs. Manual `.claude/` directories:**
|
|
515
|
+
- Auto-generated from codebase scan or workflow description
|
|
516
|
+
- Intent routing (don't memorize command names)
|
|
517
|
+
- Automated evolution (harness improves on real tasks)
|
|
518
|
+
|
|
519
|
+
**vs. Other agents (OMC, AutoCoder, etc.):**
|
|
520
|
+
- Kairn manages the *harness* (instructions, MCP, commands, rules, agents), not agents themselves
|
|
521
|
+
- Kairn uses the evolution loop to improve the harness (not the agent capability)
|
|
522
|
+
- Two-tier intent routing (regex + Haiku) is unique to Kairn v2.5.0+
|
|
523
|
+
|
|
524
|
+
**vs. DSPy, Meta-Harness, OpenEvolve:**
|
|
525
|
+
- Kairn is CLI-first and project-scoped (not a framework library)
|
|
526
|
+
- Integrated with Claude Code's native hooks API (not custom inference)
|
|
527
|
+
- Generates MCP configurations alongside harness (full integration)
|
|
528
|
+
|
|
529
|
+
---
|
|
530
|
+
|
|
531
|
+
## Roadmap
|
|
532
|
+
|
|
533
|
+
### v1.x ✅ (Complete)
|
|
534
|
+
Local CLI for generating and managing Claude Code environments. Includes advanced patterns (sprint contracts, multi-agent QA, autonomy levels), templates, secrets management, and Claude Code power patterns (TDD, verification, known gotchas).
|
|
535
|
+
|
|
536
|
+
### v2.x (In Progress)
|
|
537
|
+
**Kairn Evolve** — automated harness optimization.
|
|
538
|
+
|
|
539
|
+
- **v2.0.0** ✅ Task Definition & Trace Infrastructure
|
|
540
|
+
- **v2.1.0** ✅ The Evolution Loop
|
|
541
|
+
- **v2.2.0** ✅ Diagnosis & Reporting
|
|
542
|
+
- **v2.2.1-2.2.8** ✅ Bug fixes & optimizations
|
|
543
|
+
- **v2.3.0** ⏳ Eval Quality & Auth (Claude Code subscription OAuth, prompt caching)
|
|
544
|
+
- **v2.4.0** ⏳ Intelligent Evolution (principal proposer, experience replay, exploration/exploitation)
|
|
545
|
+
- **v2.5.0** 🔄 Intent-Aware Harnesses (in-progress Ralph loop)
|
|
546
|
+
- **v2.6.0** ⏳ Structured Harness IR (mutations on typed IR, not raw text)
|
|
547
|
+
- **v2.7.0** ⏳ Polish & Integration (dashboard, watch mode, CI/CD integration)
|
|
548
|
+
|
|
549
|
+
### v3.x (Aspirational)
|
|
550
|
+
Broader harness scope (plugins, external tools), paid tool connections, hosted platform, learning system.
|
|
551
|
+
|
|
552
|
+
---
|
|
168
553
|
|
|
169
554
|
## Security
|
|
170
555
|
|
|
@@ -173,14 +558,54 @@ The LLM call uses your own API key. Nothing is sent to Kairn servers (there are
|
|
|
173
558
|
- **Curated registry only.** Every MCP server is manually verified.
|
|
174
559
|
- **Environment variable references.** MCP configs use `${ENV_VAR}` syntax — secrets never written to config files.
|
|
175
560
|
- **Path traversal protection.** Evolution mutations are validated against `../` injection.
|
|
561
|
+
- **Hooks in settings.json** — `PreToolUse` hooks block destructive commands, `PostCompact` hooks restore context.
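To illustrate the `${ENV_VAR}` reference style mentioned in the list above, here is a sketch of how such references could be expanded at load time so that the config file itself never contains a secret. The `resolveEnvRefs` function is hypothetical and is not how Claude Code or Kairn necessarily resolve them.

```javascript
// Hypothetical resolver: replace ${NAME} references with values from `env`.
// Throws on a missing variable instead of silently inserting an empty string.
function resolveEnvRefs(value, env) {
  return value.replace(/\$\{([A-Z_][A-Z0-9_]*)\}/g, (match, name) => {
    if (!(name in env)) {
      throw new Error(`missing environment variable: ${name}`);
    }
    return env[name];
  });
}

// The config on disk holds only the reference; the value lives in the environment.
const resolved = resolveEnvRefs("postgres://${DB_USER}@localhost", {
  DB_USER: "app",
});
console.log(resolved); // postgres://app@localhost
```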
|
|
562
|
+
|
|
563
|
+
---
|
|
564
|
+
|
|
565
|
+
## FAQ
|
|
566
|
+
|
|
567
|
+
**Q: Do I need a Kairn account?**
|
|
568
|
+
A: No. Kairn is a local CLI. Your API key for Claude/GPT/Gemini is configured once and stored locally.
|
|
569
|
+
|
|
570
|
+
**Q: Does Kairn send my code to external servers?**
|
|
571
|
+
A: Not to Kairn servers, because there are none. All LLM calls go directly to your chosen provider using your own API key. Kairn CLI has no backend.
|
|
572
|
+
|
|
573
|
+
**Q: Can I use Kairn with Claude Code on a team?**
|
|
574
|
+
A: Yes. Generate the harness locally, commit `.claude/` to git. Team members run `claude` and get the same environment. The evolve loop runs locally per person (results don't auto-merge).
|
|
575
|
+
|
|
576
|
+
**Q: What if I want to keep my manual `.claude/` customizations?**
|
|
577
|
+
A: Use `kairn optimize --diff` to preview changes. You can selectively accept or reject them. For full control, don't use `optimize` — use `describe` once and then hand-edit the generated files.
|
|
578
|
+
|
|
579
|
+
**Q: How much does evolution cost?**
|
|
580
|
+
A: Depends on your model, iteration count, and task volume. A 5-iteration evolution run with 5 tasks on Anthropic:
|
|
581
|
+
- Evaluation: ~100K tokens per iteration (traces logged)
|
|
582
|
+
- Proposer: ~80K tokens per iteration (diagnosis + mutation)
|
|
583
|
+
- Re-evaluation: ~100K tokens per iteration
|
|
584
|
+
- **Total:** ~1.5M tokens = ~$15-50 (Opus/Claude 3) or ~$2-5 (Haiku)
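The token total can be checked with quick arithmetic. The per-iteration figures below are the ones from this answer; the price passed to `estimateCost` is a made-up placeholder, not a quoted rate for any provider.

```javascript
// Per-iteration token estimates taken from the FAQ answer.
const perIteration = {
  evaluation: 100_000,
  proposer: 80_000,
  reEvaluation: 100_000,
};
const iterations = 5;

const tokensPerIteration = Object.values(perIteration).reduce((a, b) => a + b, 0);
const totalTokens = tokensPerIteration * iterations;
console.log(totalTokens); // 1400000, which the FAQ rounds to ~1.5M

// Hypothetical helper: cost in dollars for a blended price per million tokens.
function estimateCost(tokens, dollarsPerMillion) {
  return (tokens * dollarsPerMillion) / 1_000_000;
}

// Placeholder price of $10 per million tokens, for illustration only.
console.log(estimateCost(totalTokens, 10)); // 14
```

Real costs depend on the input/output token split and on the provider's current pricing, so treat any single blended price as a rough bound.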
|
|
176
585
|
|
|
177
|
-
|
|
586
|
+
**Q: Can I evolve just one task?**
|
|
587
|
+
A: Yes. `kairn evolve run --task <task_id>` runs a single task.
|
|
178
588
|
|
|
179
|
-
|
|
180
|
-
|
|
181
|
-
|
|
182
|
-
|
|
183
|
-
|
|
589
|
+
**Q: What's the intent router doing on my prompt?**
|
|
590
|
+
A: When you type a prompt like "deploy this", the intent router:
|
|
591
|
+
1. Checks Tier 1 regex patterns (fast, free)
|
|
592
|
+
2. If no match, sends to Tier 2 (Haiku, ~$0.001)
|
|
593
|
+
3. Injects `/project:deploy` into your message context
|
|
594
|
+
4. Claude reads that and executes the command
|
|
595
|
+
|
|
596
|
+
You can disable the Tier 2 fallback with `"enableTier2": false` in settings.json if you find it intrusive.
|
|
597
|
+
|
|
598
|
+
---
|
|
599
|
+
|
|
600
|
+
## Contributing
|
|
601
|
+
|
|
602
|
+
Kairn is open-source. Contributions welcome:
|
|
603
|
+
- New MCP servers to the registry
|
|
604
|
+
- Eval task templates for new project types
|
|
605
|
+
- Improved proposer prompts
|
|
606
|
+
- Bug reports and UX feedback
|
|
607
|
+
|
|
608
|
+
---
|
|
184
609
|
|
|
185
610
|
## License
|
|
186
611
|
|
|
@@ -188,4 +613,4 @@ MIT
|
|
|
188
613
|
|
|
189
614
|
---
|
|
190
615
|
|
|
191
|
-
*Kairn — from kairos (the right moment) and cairn (the stack of stones marking the path).*
|
|
616
|
+
*Kairn — from kairos (the right moment) and cairn (the stack of stones marking the path). Choose the right moment. Mark the path for others.*
|