kairn-cli 2.5.0 → 2.5.2

package/README.md CHANGED
@@ -1,12 +1,14 @@
1
- # Kairn
1
+ # Kairn — The Agent Environment Compiler
2
2
 
3
- > The agent environment compiler. Describe what you want done — get an optimized Claude Code environment. Then evolve it automatically.
3
+ > Describe your workflow. Get an optimized Claude Code environment. Then evolve it automatically.
4
4
 
5
- Kairn is a CLI that compiles natural language workflow descriptions into minimal, optimal [Claude Code](https://code.claude.com/) agent environments — complete with MCP servers, slash commands, skills, subagents, and security rules.
5
+ Kairn is a CLI that compiles natural language descriptions into minimal, optimal [Claude Code](https://code.claude.com/) agent environments — complete with MCP servers, slash commands, skills, subagents, rules, and security. Then it uses **automated evolution** (inspired by [Meta-Harness](https://yoonholee.com/meta-harness/), Stanford IRIS Lab 2026) to improve them through real-world task execution.
6
6
 
7
- **v2.1** adds **Kairn Evolve** — an automated optimization loop that runs your agent on real tasks, diagnoses failures from full execution traces, and mutates the harness until performance plateaus. Inspired by [Meta-Harness](https://yoonholee.com/meta-harness/) (Stanford IRIS Lab, 2026).
7
+ **v2.5.0** adds **Intent-Aware Harnesses** — project-specific routing that intercepts natural language and activates the right command. Routing is two-tier: fast regex (Tier 1) plus a semantic Haiku fallback (Tier 2). It is also self-learning: the harness learns your vocabulary over time.
8
8
 
9
- **No server. No account. Runs locally with your own LLM key.**
9
+ **No servers. No accounts. No telemetry. Runs locally with your own LLM key.**
10
+
11
+ ---
10
12
 
11
13
  ## Install
12
14
 
@@ -19,127 +21,267 @@ Requires Node.js 18+. The command is `kairn`.
19
21
  ## Quick Start
20
22
 
21
23
  ```bash
22
- # 1. Set up your LLM key
24
+ # 1. Set up your LLM provider (Anthropic, OpenAI, Google, xAI, DeepSeek, Mistral, Groq, or custom)
23
25
  kairn init
24
26
 
25
- # 2. Describe your workflow
27
+ # 2. Describe your workflow (or scan an existing repo)
26
28
  kairn describe "Build a Next.js app with Supabase auth"
29
+ # or
30
+ kairn optimize # scans existing project at cwd
27
31
 
28
32
  # 3. Start coding
29
33
  claude
30
34
  ```
31
35
 
32
- Kairn generates the entire `.claude/` directory — CLAUDE.md, MCP servers, slash commands, skills, agents, rules — tailored to your specific workflow.
36
+ Kairn generates the entire `.claude/` directory — CLAUDE.md, settings.json, commands, rules, agents, hooks, security policies — tailored to your specific workflow. Then, optionally, evolve it:
37
+
38
+ ```bash
39
+ # Set up evolution
40
+ kairn evolve init # auto-generate 3-5 eval tasks
41
+ kairn evolve baseline # snapshot current harness
42
+
43
+ # Optimize
44
+ kairn evolve run --iterations 5 # Run evolution loop
45
+ kairn evolve apply # Accept best harness
46
+ ```
47
+
48
+ ---
33
49
 
34
50
  ## What Gets Generated
35
51
 
36
52
  ```
37
53
  .claude/
38
- ├── CLAUDE.md # Workflow-specific system prompt
39
- ├── settings.json # Permissions, hooks, and security deny rules
54
+ ├── CLAUDE.md # Workflow-specific system prompt (7 sections)
55
+ ├── settings.json # Permissions, hooks, security rules, intent routing
40
56
  ├── commands/ # Slash commands (/project:help, /project:plan, etc.)
41
- ├── rules/ # Auto-loaded instructions (security, continuity)
42
- ├── skills/ # Model-controlled capabilities
43
- ├── agents/ # Specialized subagents
44
- └── docs/ # Pre-initialized project memory
57
+ ├── rules/ # Auto-loaded instructions (security, continuity, paths)
58
+ ├── skills/ # Model-controlled capabilities (code, research, writing)
59
+ ├── agents/ # Specialized subagents (@architect, @tester, etc.)
60
+ ├── docs/ # Pre-initialized project memory
61
+ ├── hooks/ # Intent router (Tier 1 regex + Tier 2 Haiku classifier)
62
+ │ ├── intent-router.mjs # Project-specific regex patterns + fallthrough
63
+ │ ├── intent-learner.mjs # Promotes recurring Tier 2 patterns to Tier 1
64
+ │ └── intent-log.jsonl # Log of routed prompts (for learning)
65
+ └── QUICKSTART.md # Interactive startup guide (Level 2-4)
45
66
  .mcp.json # Project-scoped MCP server config
46
67
  .env # API keys (gitignored, masked in output)
47
68
  ```
48
69
 
49
- ## Commands
70
+ ---
71
+
72
+ ## Core Commands
50
73
 
51
74
  ### `kairn init`
52
75
 
53
- Interactive setup. Pick your LLM provider and model, paste your API key. Key stays local at `~/.kairn/config.json`.
76
+ Interactive setup. Pick your LLM provider, enter credentials. API key stored locally at `~/.kairn/config.json`.
54
77
 
55
- Supported providers:
78
+ **Supported providers:**
56
79
  - **Anthropic** — Claude Sonnet 4.6, Opus 4.6, Haiku 4.5
57
80
  - **OpenAI** — GPT-4.1, GPT-4.1 mini, o4-mini, GPT-5 mini
58
81
  - **Google** — Gemini 2.5 Flash, Gemini 3 Flash, Gemini 2.5 Pro, Gemini 3.1 Pro
59
- - **xAI** — Grok 4.1 Fast, Grok 4.20 (2M context)
60
- - **DeepSeek** — V3.2 Chat, V3.2 Reasoner (cheapest)
82
+ - **xAI** — Grok 4.1 Fast, Grok 4.20 (2M context, $0.20/M)
83
+ - **DeepSeek** — V3.2 Chat, V3.2 Reasoner (cheapest at $0.28/M)
61
84
  - **Mistral** — Large 3, Codestral, Small 4 (open-weight)
62
85
  - **Groq** — Llama 4, DeepSeek R1, Qwen 3 (free tier)
63
- - **Custom** — any OpenAI-compatible endpoint (local Ollama, LM Studio, etc.)
86
+ - **Custom** — any OpenAI-compatible endpoint (local Ollama, LM Studio)
64
87
 
65
- ### `kairn describe [intent]`
88
+ ### `kairn describe [intent] [options]`
66
89
 
67
- The main command. Describe what you want your agent to do, and Kairn compiles an optimal environment.
90
+ **The main command.** Describe what you want your agent to do. Kairn compiles an optimal environment.
68
91
 
69
92
  ```bash
70
- kairn describe "Research ML papers on GRPO training and write a summary"
71
- kairn describe "Build a REST API with Express and PostgreSQL" --quick
93
+ kairn describe "Build a Next.js REST API with PostgreSQL"
94
+ kairn describe "Research ML papers on GRPO training and summarize" --quick
72
95
  ```
73
96
 
74
- Features:
75
- - **Interactive clarification** — 3-5 questions to understand your project (skip with `--quick`)
76
- - **Multi-pass compilation** — skeleton pass (tool selection) + harness pass (content generation) + deterministic settings
77
- - **Autonomy levels** — choose how autonomous the agent should be (1-4)
78
- - **Secrets collection** prompted for API keys after generation, written to `.env`
97
+ **Features:**
98
+ - **Interactive clarification** — 3-5 yes/no questions to refine your workflow (skip with `--quick`)
99
+ - **Multi-pass compilation** — Skeleton pass (tool selection) + Harness pass (content generation) + deterministic settings
100
+ - **Autonomy levels** — Choose how autonomous the agent should be (1-4, default 2):
101
+ - **Level 1 (Guided):** Manual workflow with `/project:tour`, help, and guidance
102
+ - **Level 2 (Assisted):** `/project:loop` for workflow automation, `@pm` agent for planning
103
+ - **Level 3 (Autonomous):** `/project:auto` for self-directed execution with PR delivery
104
+ - **Level 4 (Full Auto):** `/project:autopilot` for continuous execution with stop conditions
105
+ - **Secrets collection** — Prompted for API keys after generation, written to `.env`
106
+ - **Intent routing** — Auto-generated `/project:*` command routing (both regex and Haiku-based)
79
107
 
80
- ### `kairn optimize [--diff]`
108
+ ### `kairn optimize [options]`
81
109
 
82
110
  Scan an existing project and optimize its Claude Code environment. Detects language, framework, dependencies, and generates improvements.
83
111
 
84
112
  ```bash
85
- kairn optimize # Write optimized environment
113
+ kairn optimize # Scan, audit, and overwrite .claude/
86
114
  kairn optimize --diff # Preview changes before writing
115
+ kairn optimize --audit-only # Show issues without generating
87
116
  ```
88
117
 
89
- ### `kairn templates`
118
+ **Features:**
119
+ - **Full project scan** — language, framework, dependencies, scripts, env keys, CI/CD, existing harness
120
+ - **Harness audit** — checks CLAUDE.md quality, missing commands/rules, MCP bloat, security configurations
121
+ - **Two modes:**
122
+ - No `.claude/` → generate from scratch
123
+ - Has `.claude/` → optimize + overwrite (shows audit issues first, asks for confirmation)
124
+ - **Diff preview** — see what would change before applying (with `--diff`)
90
125
 
91
- Browse and activate pre-built environment templates.
126
+ ### `kairn templates [options]`
127
+
128
+ Browse pre-built environment templates. Activate one to jumpstart a new project.
92
129
 
93
130
  ```bash
94
- kairn templates # Browse gallery
95
- kairn templates --activate nextjs # Apply a template
131
+ kairn templates # Browse gallery
132
+ kairn templates --activate nextjs # Apply a template
96
133
  ```
97
134
 
98
- Available templates: Next.js Full-Stack, API Service, Research Project, Content Writing.
135
+ **Available templates:**
136
+ - Next.js Full-Stack (React + Node + PostgreSQL + Supabase)
137
+ - API Service (Express/Fastify + database + testing)
138
+ - Research Project (paper analysis, literature review, synthesis)
139
+ - Content Writing (blog, documentation, marketing)
99
140
 
100
141
  ### `kairn doctor`
101
142
 
102
- Validate the current environment against Claude Code best practices.
143
+ Validate the current environment against Claude Code best practices. Checks:
144
+ - CLAUDE.md structure and token count
145
+ - MCP server configuration completeness
146
+ - Security rules and hooks
147
+ - Command and agent definitions
148
+ - Environment variable references
103
149
 
104
- ### `kairn keys [--show]`
150
+ ### `kairn keys [options]`
105
151
 
106
- Add or update API keys for MCP servers in the current environment.
152
+ Manage API keys for MCP servers in the current environment.
153
+
154
+ ```bash
155
+ kairn keys # Prompt for missing keys
156
+ kairn keys --show # Show which keys are set vs missing
157
+ ```
107
158
 
108
159
  ### `kairn list` / `kairn activate <env_id>`
109
160
 
110
- Show saved environments and re-deploy them to any directory.
161
+ Show saved environments (stored in `~/.kairn/envs/`) and re-deploy them to any directory.
111
162
 
112
- ### `kairn evolve`
163
+ ```bash
164
+ kairn list # List all saved environments
165
+ kairn activate env_abc123 # Copy that environment to .claude/
166
+ ```
167
+
168
+ ### `kairn evolve` — Automated Harness Optimization
169
+
170
+ The heart of v2.x. Run your agent on real tasks, capture execution traces, diagnose failures, and mutate the harness iteratively.
171
+
172
+ #### `kairn evolve init`
113
173
 
114
- Automated harness optimization. Run your agent on real tasks, capture traces, and evolve the environment.
174
+ Set up evolution for the current project. Auto-generates 3-5 concrete eval tasks based on your CLAUDE.md and project structure.
115
175
 
116
176
  ```bash
117
- # 1. Initialize — auto-generates project-specific eval tasks via LLM
118
177
  kairn evolve init
178
+ ```
179
+
180
+ Creates `.kairn-evolve/tasks.yaml` with tasks like:
181
+ - "Add a new feature X to the codebase"
182
+ - "Fix this known bug Y"
183
+ - "Refactor the API layer for clarity"
184
+ - "Write comprehensive test coverage"
185
+ - "Update documentation after feature launch"
186
+
187
+ Uses 6 built-in templates: add-feature, fix-bug, refactor, test-writing, config-change, documentation.
119
188
 
120
- # 2. Snapshot current .claude/ as the baseline
189
+ #### `kairn evolve baseline`
190
+
191
+ Snapshot your current `.claude/` directory as iteration 0 (the baseline to improve against).
192
+
193
+ ```bash
121
194
  kairn evolve baseline
195
+ ```
196
+
197
+ #### `kairn evolve run`
198
+
199
+ Run the full evolution loop. Evaluates all tasks, diagnoses failures, proposes mutations, re-evaluates.
200
+
201
+ ```bash
202
+ kairn evolve run # 5 iterations (default)
203
+ kairn evolve run --iterations 3 # Custom iteration count
204
+ kairn evolve run --task <task_id> # Run a single task
205
+ kairn evolve run --parallel 4 # Parallel task evaluation (4 concurrent)
206
+ kairn evolve run --runs 3 # Run each task 3 times, report mean ± stddev
207
+ ```
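The `--runs` option reports a mean and standard deviation per task. A minimal sketch of that summary, assuming the sample standard deviation is used (the README does not say which estimator Kairn applies); `summarize_runs` is a hypothetical helper, not part of the CLI:

```python
from statistics import mean, stdev


def summarize_runs(scores):
    """Return (mean, sample standard deviation) for repeated runs of one task."""
    if not scores:
        raise ValueError("at least one score is required")
    # stdev() needs two or more data points; report 0.0 spread for a single run.
    spread = stdev(scores) if len(scores) > 1 else 0.0
    return mean(scores), spread


avg, spread = summarize_runs([80, 90, 100])
print(f"{avg:.1f} ± {spread:.1f}")
```

A larger spread across runs suggests the task outcome is noisy, so a single-run score change may not reflect a real improvement.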
208
+
209
+ **How it works (the loop):**
210
+
211
+ 1. **Evaluate** — Run each eval task by spawning Claude Code in an isolated workspace. Capture full traces:
212
+ - stdout, stderr
213
+ - MCP tool calls (which tools, inputs, outputs)
214
+ - Files changed (diffs)
215
+ - Execution time, pass/fail status
216
+
217
+ 2. **Diagnose** — A proposer agent (Opus) reads the full trace filesystem and performs causal reasoning:
218
+ - "Task A failed because CLAUDE.md doesn't mention the /api path"
219
+ - "Task B passed on iteration 1 but regressed on iteration 3 — the new security rule broke it"
220
+ - "Tasks A and C both needed /project:fix but there's no /project:fix command"
221
+
222
+ 3. **Mutate** — Propose minimal, targeted changes to the harness:
223
+ - `replace`: Update a section in CLAUDE.md, a command, a rule
224
+ - `add_section`: Insert new guidance into CLAUDE.md
225
+ - `create_file`: Add a new command or rule
226
+ - `delete_section`: Remove contradictory or bloat sections
227
+ - `delete_file`: Remove unused commands/rules
228
+ - `add_intent_pattern`: Add a new natural language pattern (v2.5.0)
229
+ - `modify_intent_prompt`: Improve the Tier 2 Haiku classifier (v2.5.0)
230
+
231
+ 4. **Re-evaluate** — Run all tasks again with the mutated harness. If scores improve → accept the mutations. If scores regress → roll back to the previous best harness.
232
+
233
+ 5. **Repeat** — Iterate N times (default 5). Each iteration cycles through evaluate → diagnose → mutate → re-evaluate.
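The accept-or-roll-back logic of the loop can be sketched in a few lines. This is an illustrative toy, not Kairn's implementation: `evolve`, `evaluate`, and `propose` are hypothetical names, the harness is modelled as a set of guidance strings, and accepting a tie is an assumption.

```python
def evolve(baseline, evaluate, propose, iterations=5):
    """Keep the best-scoring harness; discard mutations that make it worse."""
    best, best_score = baseline, evaluate(baseline)
    history = [best_score]
    for _ in range(iterations):
        candidate = propose(best)      # diagnose + mutate
        score = evaluate(candidate)    # re-evaluate
        if score >= best_score:        # accept when the score does not regress
            best, best_score = candidate, score
        # otherwise roll back: `best` stays unchanged
        history.append(best_score)
    return best, best_score, history


# Toy stand-ins: a harness "passes" one task per required item it contains.
REQUIRED = {"api-path", "test-command", "deploy-steps"}


def evaluate(harness):
    return len(REQUIRED & harness)


def propose(harness):
    missing = sorted(REQUIRED - harness)
    return harness | {missing[0]} if missing else harness


best, score, history = evolve(frozenset({"api-path"}), evaluate, propose, iterations=3)
print(score, history)  # 3 [1, 2, 3, 3]
```

Because the loop only ever keeps a candidate that scores at least as well as the current best, the recorded history never decreases.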
122
234
 
123
- # 3. Run the evolution loop
124
- kairn evolve run # 5 iterations (default)
125
- kairn evolve run --iterations 3 # Custom iteration count
126
- kairn evolve run --task <id> # Run a single task
235
+ **Scoring:**
236
+ - **pass/fail** (default): the task either passes or fails
237
+ - **llm-judge**: an LLM reads the task output and assigns a score (0-100)
238
+ - **rubric**: custom weighted scoring function
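As a rough illustration of weighted rubric scoring, the sketch below computes a weighted average. The criterion names, the weights, and the 0-100 scale per criterion are all assumptions for the example; the README does not document the rubric format.

```python
def rubric_score(criteria, weights):
    """Weighted average of per-criterion scores, each assumed to be on a 0-100 scale."""
    total_weight = sum(weights[name] for name in criteria)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(criteria[name] * weights[name] for name in criteria) / total_weight


score = rubric_score(
    {"tests_pass": 100, "docs_updated": 50},  # hypothetical criteria
    {"tests_pass": 3, "docs_updated": 1},     # hypothetical weights
)
print(score)  # 87.5
```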
239
+
240
+ **Adaptive pruning (v2.2.7):**
241
+ On middle iterations, skip slow/expensive tasks above a confidence threshold. Re-run all tasks on the first and last iteration for rigor.
242
+
243
+ **Anti-regression guards (v2.2.8):**
244
+ - `maxMutationsPerIteration` (default: 3) — cap mutations per step
245
+ - `maxTaskDrop` (default: 20) — if any single task drops by more than 20 points, roll back
246
+ - Loss-weighted proposer focus — proposer reads failures worst-first
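The first two guards can be expressed as simple checks. The sketch below assumes per-task scores are held in dictionaries keyed by task id; the function names are hypothetical, and only the two option names and their defaults come from the list above.

```python
def tasks_over_max_drop(previous, current, max_task_drop=20):
    """Return the task ids whose score fell by more than max_task_drop points."""
    return [
        task for task, old_score in previous.items()
        if old_score - current[task] > max_task_drop
    ]


def cap_mutations(mutations, max_mutations_per_iteration=3):
    """Keep only the first N proposed mutations for this iteration."""
    return mutations[:max_mutations_per_iteration]


previous = {"task-1": 100, "task-2": 80}
current = {"task-1": 70, "task-2": 75}
offenders = tasks_over_max_drop(previous, current)
print(offenders)  # ['task-1']
```

A non-empty result would trigger a rollback even if the aggregate score improved, which is what protects individual tasks from being sacrificed for the average.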
247
+
248
+ #### `kairn evolve report`
249
+
250
+ Generate a human-readable summary of the evolution run.
251
+
252
+ ```bash
253
+ kairn evolve report # Markdown to stdout
254
+ kairn evolve report --json # Machine-readable JSON
127
255
  ```
128
256
 
129
- **How it works:**
257
+ Shows:
258
+ - Evolution leaderboard (iterations × tasks × scores)
259
+ - Per-task trace diffs (what changed between iterations for the same task)
260
+ - Counterfactual diagnosis (which mutations helped/hurt which tasks)
261
+ - Wall time, token cost, iterations completed
130
262
 
131
- 1. **Define tasks** — `kairn evolve init` reads your CLAUDE.md and project structure, then uses the LLM to generate 3-5 concrete eval tasks from 6 built-in templates (add-feature, fix-bug, refactor, test-writing, config-change, documentation)
132
- 2. **Baseline** — `kairn evolve baseline` snapshots your current `.claude/` directory
133
- 3. **Evaluate** — runs each task by spawning Claude Code in an isolated workspace, capturing full traces (stdout, stderr, tool calls, files changed, timing)
134
- 4. **Diagnose** — a proposer agent (Opus) reads the full traces and performs causal reasoning to identify why tasks fail
135
- 5. **Mutate** — proposes minimal, targeted changes to CLAUDE.md, commands, rules, or agents
136
- 6. **Repeat** — re-evaluates with the mutated harness. Rolls back if scores regress.
263
+ #### `kairn evolve diff <iter1> <iter2>`
137
264
 
138
- Scoring: pass/fail (default), LLM-as-judge, or weighted rubric.
265
+ Show the harness changes between two iterations.
266
+
267
+ ```bash
268
+ kairn evolve diff 0 3 # Show all mutations from baseline to iteration 3
269
+ ```
270
+
271
+ #### `kairn evolve apply [--iter N]`
272
+
273
+ Copy the best (or specified) evolved harness back to `.claude/`.
274
+
275
+ ```bash
276
+ kairn evolve apply # Copy best iteration to .claude/
277
+ kairn evolve apply --iter 3 # Copy iteration 3 specifically
278
+ ```
279
+
280
+ ---
139
281
 
140
282
  ## Tool Registry
141
283
 
142
- Kairn ships with 28 curated tools across 8 categories:
284
+ Kairn ships with **28 curated MCP servers** across 8 categories. Tools are auto-selected based on your workflow — fewer tools = less context bloat = better agent performance.
143
285
 
144
286
  | Category | Tools |
145
287
  |----------|-------|
@@ -152,19 +294,262 @@ Kairn ships with 28 curated tools across 8 categories:
152
294
  | **Security** | Semgrep, security-guidance |
153
295
  | **Design** | Figma, Frontend Design |
154
296
 
155
- Tools are selected based on your workflow description. Fewer tools = less context bloat = better agent performance.
297
+ ---
298
+
299
+ ## How the Pipeline Works
156
300
 
157
- ## How It Works
301
+ ### Generation (kairn describe / kairn optimize)
158
302
 
159
- 1. You describe your workflow in natural language
160
- 2. Kairn asks clarifying questions (or skip with `--quick`)
161
- 3. **Pass 1:** LLM selects the minimal tool set and outlines the project
162
- 4. **Pass 2:** LLM generates all harness content (CLAUDE.md, commands, rules, agents)
163
- 5. **Pass 3:** Settings and MCP config generated deterministically from the registry
164
- 6. Kairn writes the `.claude/` directory and `.mcp.json`
165
- 7. API keys are collected and written to `.env`
303
+ 1. **User input:** intent string or scanned project profile
304
+ 2. **Clarification** (optional): 3-5 yes/no questions to refine the workflow
305
+ 3. **Pass 1: Skeleton** — LLM selects minimal tool set and outlines the project
306
+ 4. **Pass 2: Harness** — LLM generates all content (CLAUDE.md, commands, rules, agents, docs)
307
+ 5. **Pass 3: Settings**: deterministic generation of `settings.json` and `.mcp.json` from the registry
308
+ 6. **Intent patterns:** compile project-specific regex patterns from command names + synonyms
309
+ 7. **Hook templates:** generate `intent-router.mjs` (Tier 1) and the Tier 2 prompt template
310
+ 8. **Write files** — `.claude/` directory + `.mcp.json` + `.env` (with masked keys)
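Step 6 compiles regex patterns from command names and synonyms. A minimal sketch of that idea, written in Python for illustration only (the generated hook is an `.mjs` file, and the synonym table and `compile_intent_patterns` are hypothetical):

```python
import re


def compile_intent_patterns(synonyms):
    """Build one case-insensitive, whole-word regex per command from its synonyms."""
    patterns = {}
    for command, words in synonyms.items():
        # Longest alternatives first so multi-word phrases win over their prefixes.
        ordered = sorted(words, key=len, reverse=True)
        alternation = "|".join(re.escape(word) for word in ordered)
        patterns[command] = re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)
    return patterns


patterns = compile_intent_patterns({
    "/project:deploy": ["deploy", "ship", "release"],
    "/project:test": ["test", "run tests"],
})
print(bool(patterns["/project:deploy"].search("Please ship this")))  # True
```

The word boundaries keep a synonym such as "deploy" from matching inside a longer word such as "deployment".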
166
311
 
167
- The LLM call uses your own API key. Nothing is sent to Kairn servers (there are none).
312
+ ### Evolution (kairn evolve run)
313
+
314
+ ```
315
+ Baseline (.claude/ snapshot)
316
+
317
+
318
+ Iteration 1
319
+ ├─ Evaluate: run all tasks, capture traces
320
+ ├─ Diagnose: proposer reads traces, reasons about failures
321
+ ├─ Mutate: generate 1-3 harness mutations
322
+ ├─ Re-evaluate: run all tasks again
323
+ └─ Accept/rollback based on score improvement
324
+
325
+
326
+ Iteration 2, 3, 4, 5...
327
+
328
+
329
+ Best harness (apply to .claude/)
330
+ ```
331
+
332
+ Each iteration is independent and can be retried. Once v2.4.0 experience replay ships (coming soon), the proposer will also have memory of all prior iterations.
333
+
334
+ ### Self-Learning (v2.5.0)
335
+
336
+ ```
337
+ Tier 1: regex hook intercepts prompt
338
+ ├─ Matches pattern? → route to command + inject context
339
+ └─ No match? → fallthrough to Tier 2
340
+
341
+ Tier 2: Haiku prompt hook
342
+ ├─ Classify intent
343
+ ├─ Route to command if confident
344
+ └─ Log routing attempt (for learning)
345
+
346
+ SessionStart: intent-learner.mjs
347
+ ├─ Read intent-log.jsonl (recent tier 2 routings)
348
+ ├─ Promote recurring patterns to regex
349
+ ├─ Update intent-router.mjs
350
+ └─ Write audit trail
351
+ ```
352
+
353
+ Over time, more patterns become regex (fast, free) instead of Haiku (slow, $0.001).
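The flow above can be sketched as follows. This is an illustration in Python rather than the generated `.mjs` hooks: `classify_with_model` is a stand-in for the Tier 2 Haiku call, the in-memory `log` stands in for `intent-log.jsonl`, and the promotion threshold of two occurrences is an assumption.

```python
import re
from collections import Counter


def classify_with_model(prompt):
    """Stand-in for the Tier 2 classifier; a real hook would call an LLM here."""
    return "/project:test" if "spec" in prompt.lower() else None


def route(prompt, tier1, log):
    """Try Tier 1 regex patterns first, then fall through to Tier 2."""
    for pattern, command in tier1:
        if pattern.search(prompt):
            return command, 1
    command = classify_with_model(prompt)
    if command is not None:
        log.append((prompt.strip().lower(), command))  # record for later learning
    return command, 2


def promote(tier1, log, min_count=2):
    """Turn phrases that Tier 2 routed repeatedly into exact-match Tier 1 patterns."""
    for (phrase, command), seen in Counter(log).items():
        if seen >= min_count:
            pattern = re.compile(rf"^{re.escape(phrase)}$", re.IGNORECASE)
            tier1.append((pattern, command))


tier1 = [(re.compile(r"\b(?:deploy|ship)\b", re.IGNORECASE), "/project:deploy")]
log = []
route("run the specs", tier1, log)
route("run the specs", tier1, log)
promote(tier1, log)
print(route("Run the specs", tier1, log))  # ('/project:test', 1)
```

After promotion, the same phrase is answered by Tier 1 without a model call.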
354
+
355
+ ---
356
+
357
+ ## Example Workflow
358
+
359
+ ### Scenario: Build a Next.js API
360
+
361
+ ```bash
362
+ cd /tmp/my-api
363
+ git init
364
+
365
+ kairn describe "Next.js REST API with Prisma ORM and PostgreSQL. OAuth login, JWT auth, rate limiting."
366
+
367
+ # Output:
368
+ # ✔ Pass 1: Selected 7 tools (GitHub, PostgreSQL, Vercel, Semgrep, Docker, Context7, Sequential Thinking)
369
+ # ✔ Pass 2: Generated 73 lines in CLAUDE.md, 8 commands, 4 rules, 3 agents, 2 skills
370
+ # ✔ Pass 3: Configured 2 MCP servers (PostgreSQL + GitHub)
371
+ #
372
+ # Commands:
373
+ # /project:help Show available commands
374
+ # /project:plan Draft the API spec
375
+ # /project:develop Full development pipeline
376
+ # /project:test Run test suite
377
+ # /project:fix Issue-driven bug fixing
378
+ # /project:deploy Deploy to Vercel
379
+ # /project:security Audit for vulnerabilities
380
+ # /project:batch Run batches of independent tasks
381
+ #
382
+ # Env keys needed:
383
+ # POSTGRES_URL
384
+ # JWT_SECRET
385
+ # GITHUB_TOKEN
386
+ # VERCEL_TOKEN
387
+ #
388
+ # Paste your secrets (or press enter to skip):
389
+ # POSTGRES_URL: ***
390
+ # JWT_SECRET: ***
391
+ # GITHUB_TOKEN: (skipped)
392
+ # VERCEL_TOKEN: (skipped)
393
+ #
394
+ # Ready! Run: $ claude
395
+
396
+ claude # Start Claude Code with the generated harness
397
+
398
+ # In Claude Code:
399
+ # > /project:plan
400
+ # Drafts the API specification with OAuth flow, database schema, endpoint design
401
+ #
402
+ # > /project:develop feature/auth
403
+ # Full pipeline: specs feature in detail, plans implementation, TDD red→green→refactor,
404
+ # writes tests, runs security audit, updates docs
405
+ #
406
+ # > /project:fix
407
+ # Shows recent issues, user picks one, Claude researches the bug, fixes it, runs tests
408
+ ```
409
+
410
+ ### Scenario: Optimize an Existing Project
411
+
412
+ ```bash
413
+ cd /path/to/existing/next-app
414
+ # It has a manual .claude/ directory
415
+
416
+ kairn optimize
417
+
418
+ # Output:
419
+ # ✔ Scan: TypeScript, Next.js, 47 dependencies, 8 scripts
420
+ #
421
+ # Harness Audit:
422
+ # CLAUDE.md: 187 lines ✓ (good)
423
+ # MCP servers: 4
424
+ # Commands: 5 (/help, /plan, /code, /test, /deploy)
425
+ # Rules: 2 (security, continuity)
426
+ #
427
+ # Issues found:
428
+ # ⚠ Missing /project:develop command (full development pipeline)
429
+ # ⚠ No path-scoped rules (api.md, testing.md for different code domains)
430
+ # ⚠ Hooks not configured (missing destructive command blocking)
431
+ #
432
+ # Generate optimized environment? This will overwrite existing .claude/ files.
433
+ # > Yes
434
+ #
435
+ # ✔ Environment compiled in 12s
436
+ # ✔ Files written: 4 new, 3 modified, 1 unchanged
437
+ #
438
+ # Ready! Run: $ claude
439
+ ```
440
+
441
+ ### Scenario: Evolve the Harness
442
+
443
+ ```bash
444
+ # Harness is generated and working. Set up evolution:
445
+
446
+ kairn evolve init
447
+
448
+ # Auto-generated 5 eval tasks based on CLAUDE.md + project structure:
449
+ # task-1: "Implement user profile page"
450
+ # task-2: "Add password reset flow"
451
+ # task-3: "Refactor authentication middleware"
452
+ # task-4: "Write E2E tests for checkout flow"
453
+ # task-5: "Update API documentation after feature release"
454
+
455
+ kairn evolve baseline # Snapshot current .claude/ as iteration 0
456
+
457
+ kairn evolve run --iterations 5
458
+
459
+ # Iteration 1/5
460
+ # Evaluating... [task-1] pass [task-2] fail [task-3] pass [task-4] fail [task-5] pass
461
+ # Score: 3/5 (60%)
462
+ #
463
+ # Diagnosing failures...
464
+ # - Task 2 failed: "password reset" not mentioned in CLAUDE.md. Need /project:email command.
465
+ # - Task 4 failed: E2E tests failed because missing /project:test. Added but not documented.
466
+ #
467
+ # Proposing mutations:
468
+ # - Add /project:email command with SMTP integration guidance
469
+ # - Update CLAUDE.md "Authentication" section with password reset flow
470
+ # - Add e2e.md path-scoped rule with Playwright patterns
471
+ #
472
+ # Iteration 2/5
473
+ # Evaluating with mutated harness...
474
+ # [task-1] pass [task-2] pass [task-3] pass [task-4] pass [task-5] pass
475
+ # Score: 5/5 (100%) ✔ improvement! Accepting mutations.
476
+ #
477
+ # Iteration 3/5
478
+ # Evaluating...
479
+ # [task-1] pass [task-2] pass [task-3] pass [task-4] pass [task-5] pass
480
+ # Score: 5/5 (100%) — no regression, but no improvement. Proposing refinements...
481
+ # - CLAUDE.md got bloated (142 lines). Moving detail to rules/.
482
+ # Iteration 3 score: 5/5. Accepting.
483
+ #
484
+ # Iterations 4-5: Scores plateau at 5/5. No more mutations.
485
+ #
486
+ # Final leaderboard:
487
+ # Iteration 0 (baseline): 60% (3/5)
488
+ # Iteration 1: 60% (3/5)
489
+ # Iteration 2: 100% (5/5) ← best
490
+ # Iteration 3: 100% (5/5)
491
+ # Iteration 4: 100% (5/5)
492
+ # Iteration 5: 100% (5/5)
493
+
494
+ kairn evolve report # Detailed markdown summary
495
+ kairn evolve apply # Copy iteration 2 to .claude/
496
+ ```
497
+
498
+ ---
499
+
500
+ ## Architecture & Philosophy
501
+
502
+ ### Design Principles
503
+
504
+ 1. **Minimal over complete.** 5 well-chosen tools beat 50 generic ones.
505
+ 2. **Workflow-specific over generic.** Every file generated relates to your actual task.
506
+ 3. **Self-improving.** Environments get better with use via the evolution loop and self-learning intent router.
507
+ 4. **Local-first.** No accounts, no servers, no telemetry. Runs locally with your own LLM key.
508
+ 5. **Transparent.** You can inspect every generated file. Nothing is hidden.
509
+ 6. **Security by default.** Every environment includes deny rules, hooks, and guidance.
510
+ 7. **Prove it.** Evolved harnesses must demonstrably outperform static ones. Claims require measurement.
511
+
512
+ ### What Makes Kairn Unique
513
+
514
+ **vs. Manual `.claude/` directories:**
515
+ - Auto-generated from codebase scan or workflow description
516
+ - Intent routing (don't memorize command names)
517
+ - Automated evolution (harness improves on real tasks)
518
+
519
+ **vs. Other agents (OMC, AutoCoder, etc.):**
520
+ - Kairn manages the *harness* (instructions, MCP, commands, rules, agents), not agents themselves
521
+ - Kairn uses the evolution loop to improve the harness (not the agent capability)
522
+ - Two-tier intent routing (regex + Haiku) is unique to Kairn v2.5.0+
523
+
524
+ **vs. DSPy, Meta-Harness, OpenEvolve:**
525
+ - Kairn is CLI-first and project-scoped (not a framework library)
526
+ - Integrated with Claude Code's native hooks API (not custom inference)
527
+ - Generates MCP configurations alongside harness (full integration)
528
+
529
+ ---
530
+
531
+ ## Roadmap
532
+
533
+ ### v1.x ✅ (Complete)
534
+ Local CLI for generating and managing Claude Code environments. Includes advanced patterns (sprint contracts, multi-agent QA, autonomy levels), templates, secrets management, and Claude Code power patterns (TDD, verification, known gotchas).
535
+
536
+ ### v2.x (In Progress)
537
+ **Kairn Evolve** — automated harness optimization.
538
+
539
+ - **v2.0.0** ✅ Task Definition & Trace Infrastructure
540
+ - **v2.1.0** ✅ The Evolution Loop
541
+ - **v2.2.0** ✅ Diagnosis & Reporting
542
+ - **v2.2.1-2.2.8** ✅ Bug fixes & optimizations
543
+ - **v2.3.0** ⏳ Eval Quality & Auth (Claude Code subscription OAuth, prompt caching)
544
+ - **v2.4.0** ⏳ Intelligent Evolution (principal proposer, experience replay, exploration/exploitation)
545
+ - **v2.5.0** 🔄 Intent-Aware Harnesses (in-progress Ralph loop)
546
+ - **v2.6.0** ⏳ Structured Harness IR (mutations on typed IR, not raw text)
547
+ - **v2.7.0** ⏳ Polish & Integration (dashboard, watch mode, CI/CD integration)
548
+
549
+ ### v3.x (Aspirational)
550
+ Broader harness scope (plugins, external tools), paid tool connections, hosted platform, learning system.
551
+
552
+ ---
168
553
 
169
554
  ## Security
170
555
 
@@ -173,14 +558,54 @@ The LLM call uses your own API key. Nothing is sent to Kairn servers (there are
173
558
  - **Curated registry only.** Every MCP server is manually verified.
174
559
  - **Environment variable references.** MCP configs use `${ENV_VAR}` syntax — secrets never written to config files.
175
560
  - **Path traversal protection.** Evolution mutations are validated against `../` injection.
561
+ - **Hooks in settings.json** — `PreToolUse` hooks block destructive commands, `PostCompact` hooks restore context.
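For the path traversal protection, one common way to reject `../` escapes is to resolve the target path and confirm it still sits under the harness root. A minimal sketch under that assumption (the README does not describe how Kairn performs the check, and `is_safe_mutation_path` is a hypothetical name):

```python
from pathlib import Path


def is_safe_mutation_path(harness_root, relative_path):
    """True only if relative_path still points inside harness_root once resolved."""
    root = Path(harness_root).resolve()
    # Joining with an absolute path discards root, so absolute targets fail too.
    target = (root / relative_path).resolve()
    return root in target.parents


print(is_safe_mutation_path("/tmp/project/.claude", "rules/api.md"))      # True
print(is_safe_mutation_path("/tmp/project/.claude", "../../etc/passwd"))  # False
```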
562
+
563
+ ---
564
+
565
+ ## FAQ
566
+
567
+ **Q: Do I need a Kairn account?**
568
+ A: No. Kairn is a local CLI. Your API key for Claude/GPT/Gemini is configured once and stored locally.
569
+
570
+ **Q: Does Kairn send my code to external servers?**
571
+ A: No. All LLM calls use your own API key. Kairn CLI has no backend.
572
+
573
+ **Q: Can I use Kairn with Claude Code on a team?**
574
+ A: Yes. Generate the harness locally, commit `.claude/` to git. Team members run `claude` and get the same environment. The evolve loop runs locally per person (results don't auto-merge).
575
+
576
+ **Q: What if I want to keep my manual `.claude/` customizations?**
577
+ A: Use `kairn optimize --diff` to preview changes. You can selectively accept or reject them. For full control, don't use `optimize` — use `describe` once and then hand-edit the generated files.
578
+
579
+ **Q: How much does evolution cost?**
580
+ A: Depends on your model, iteration count, and task volume. A 5-iteration evolution run with 5 tasks on Anthropic:
581
+ - Evaluation: ~100K tokens per iteration (traces logged)
582
+ - Proposer: ~80K tokens per iteration (diagnosis + mutation)
583
+ - Re-evaluation: ~100K tokens per iteration
584
+ - **Total:** ~1.5M tokens = ~$15-50 (Opus/Claude 3) or ~$2-5 (Haiku)
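The token total can be checked with simple arithmetic. The per-iteration figures below come from the list above; the price passed to `estimate_cost` is a placeholder parameter, not a quoted rate, so look up your provider's current pricing.

```python
EVAL_TOKENS = 100_000       # evaluation, per iteration
PROPOSER_TOKENS = 80_000    # diagnosis + mutation, per iteration
REEVAL_TOKENS = 100_000     # re-evaluation, per iteration


def total_tokens(iterations):
    """Tokens consumed by a run with the given number of iterations."""
    return iterations * (EVAL_TOKENS + PROPOSER_TOKENS + REEVAL_TOKENS)


def estimate_cost(tokens, usd_per_million_tokens):
    """Cost in USD at a blended price per million tokens (caller-supplied)."""
    return tokens * usd_per_million_tokens / 1_000_000


print(total_tokens(5))  # 1400000
```

Five iterations at 280K tokens each come to 1.4M tokens, which matches the rounded ~1.5M figure.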
176
585
 
177
- ## Philosophy
586
+ **Q: Can I evolve just one task?**
587
+ A: Yes. `kairn evolve run --task <task_id>` runs a single task.
178
588
 
179
- - **Minimal over complete.** 5 well-chosen tools beat 50 generic ones.
180
- - **Workflow-specific over generic.** Every file generated relates to your actual task.
181
- - **Self-improving.** Environments should get better with use, not just at generation time.
182
- - **Local-first.** No accounts, no servers, no telemetry.
183
- - **Transparent.** You can inspect every generated file. Nothing is hidden.
589
+ **Q: What's the intent router doing on my prompt?**
590
+ A: When you type a prompt like "deploy this", the intent router:
591
+ 1. Checks Tier 1 regex patterns (fast, free)
592
+ 2. If no match, sends to Tier 2 (Haiku, ~$0.001)
593
+ 3. Injects `/project:deploy` into your message context
594
+ 4. Claude reads that and executes the command
595
+
596
+ If you find the Haiku fallback intrusive, you can disable Tier 2 by setting `"enableTier2": false` in settings.json.
597
+
598
+ ---
599
+
600
+ ## Contributing
601
+
602
+ Kairn is open-source. Contributions welcome:
603
+ - New MCP servers to the registry
604
+ - Eval task templates for new project types
605
+ - Improved proposer prompts
606
+ - Bug reports and UX feedback
607
+
608
+ ---
184
609
 
185
610
  ## License
186
611
 
@@ -188,4 +613,4 @@ MIT
188
613
 
189
614
  ---
190
615
 
191
- *Kairn — from kairos (the right moment) and cairn (the stack of stones marking the path).*
616
+ *Kairn — from kairos (the right moment) and cairn (the stack of stones marking the path). Choose the right moment. Mark the path for others.*