@kernel.chat/kbot 3.97.4 → 3.99.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/dist/agent.js +22 -1
- package/dist/cli.js +163 -0
- package/dist/skills-loader.d.ts +37 -5
- package/dist/skills-loader.js +342 -50
- package/dist/teacher-logger.d.ts +71 -0
- package/dist/teacher-logger.js +162 -0
- package/dist/tools/idempotency-check.d.ts +2 -0
- package/dist/tools/idempotency-check.js +31 -0
- package/dist/tools/schedule-persistence.d.ts +2 -0
- package/dist/tools/schedule-persistence.js +19 -0
- package/dist/train-agent-trace.d.ts +29 -0
- package/dist/train-agent-trace.js +141 -0
- package/dist/train-curate.d.ts +25 -0
- package/dist/train-curate.js +354 -0
- package/dist/train-cycle.d.ts +22 -0
- package/dist/train-cycle.js +230 -0
- package/dist/train-grpo.d.ts +68 -0
- package/dist/train-grpo.js +206 -0
- package/dist/train-merge.d.ts +26 -0
- package/dist/train-merge.js +148 -0
- package/dist/train-self.d.ts +38 -0
- package/dist/train-self.js +232 -0
- package/package.json +2 -1
- package/skills/deployment/daemon-deployment/SKILL.md +70 -0
- package/skills/deployment/ship-pipeline/SKILL.md +81 -0
- package/skills/emergent/forge-reflex/SKILL.md +53 -0
- package/skills/emergent/mimic-hybrid/SKILL.md +56 -0
- package/skills/memory/dream-to-commit/SKILL.md +52 -0
- package/skills/memory/memory-cascade/SKILL.md +59 -0
- package/skills/music-production/ableton-session-build/SKILL.md +61 -0
- package/skills/orchestration/cross-agent-blackboard/SKILL.md +58 -0
- package/skills/orchestration/specialist-routing/SKILL.md +57 -0
- package/skills/self-improvement/autopoiesis-loop/SKILL.md +47 -0
- package/skills/self-improvement/skill-self-authorship/SKILL.md +70 -0
- package/skills/self-improvement/teacher-trace-curation/SKILL.md +54 -0
- package/skills/software-development/systematic-debugging/SKILL.md +86 -0
- package/skills/software-development/test-driven-development/SKILL.md +74 -0
package/skills/music-production/ableton-session-build/SKILL.md
@@ -0,0 +1,61 @@
---
name: ableton-session-build
description: Use when the user wants to build a track, beat, or arrangement in Ableton Live. kbot drives Ableton via OSC — you don't type notes, you describe the idea.
version: 1.0.0
author: kbot
license: MIT
platforms: [darwin]
metadata:
  kbot:
    tags: [ableton, music, osc, m4l, production]
    related_skills: [serum2-preset-craft, dj-set-builder]
---

# Ableton Session Build

kbot has 14 Ableton tools, 9 M4L devices, and a full OSC bridge. You describe a beat; kbot lays it down in a live session.

## Iron Law

```
VERIFY THE OSC BRIDGE FIRST. EVERY TIME.
```

Without a live bridge, every tool call silently fails. Check before you plan.

## Preflight (2 commands)

1. `ableton_session_info` — if this returns tempo + tracks, you're connected.
2. If it times out: remind the user to start Live and enable the AbletonOSC remote script (`Link/Tempo/MIDI → Control Surface → AbletonOSC`).

## Production Flow

1. **Set the frame**: `ableton_transport` to set tempo, key hint via scene naming, time signature.
2. **Create tracks**: one `ableton_create_track` per role (drums, bass, pad, lead, FX).
3. **Load sounds**: `ableton_load_sample` for one-shots; `ableton_load_plugin` for Serum/synths; `ableton_load_preset` for factory patches.
4. **Write patterns**: `ableton_midi` with note arrays. Use `generate_drum_pattern` / `generate_melody_pattern` as starting points.
5. **Arrange**: `ableton_scene` to build verse/chorus/drop scenes; `ableton_clip` to fire them.
6. **Mix**: `ableton_mixer` for levels/pan; `ableton_effect_chain` for returns; `ableton_device` for insert FX.
7. **Capture**: render via transport record, or have the user bounce the arrangement.
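The note-array shape in step 4 can be sketched as plain data. A minimal sketch in TypeScript; the `Note` field names (`pitch`, `start`, `duration`, `velocity`) are assumptions for illustration, not the actual `ableton_midi` schema:

```typescript
// Hypothetical note shape for an `ableton_midi`-style call; the field
// names are illustrative, not the real tool schema.
interface Note {
  pitch: number;    // MIDI note number (36 = C1, a common kick)
  start: number;    // position in beats from clip start
  duration: number; // length in beats
  velocity: number; // 1-127
}

// Build a four-on-the-floor kick pattern (4 beats per bar).
function fourOnTheFloor(bars: number, pitch = 36): Note[] {
  const notes: Note[] = [];
  for (let beat = 0; beat < bars * 4; beat++) {
    notes.push({ pitch, start: beat, duration: 0.5, velocity: 100 });
  }
  return notes;
}

const kicks = fourOnTheFloor(4); // 4 bars = 16 notes, one per beat
```

Generating the pattern in small chunks like this keeps each write short enough to audition and iterate on.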
## Specialist Escalations

- Preset design needed? Use `serum2-preset-craft`.
- Full DJ set needed? Use `dj-set-builder`.
- Sound too generic? Route to `aesthete` specialist with the current session_info for creative direction.

## Anti-Patterns

- Writing MIDI before verifying the bridge responds. Silent failures waste the whole session.
- Loading plugins without checking the user has them installed. Use `ableton_browse` first.
- Generating 32-bar patterns in one call. Work in 4- and 8-bar loops; iterate.

## Known Fragility

- AbletonOSC's `set/notes` endpoint quirks — if clip writes fail, fall back to setting notes via `ableton_clip.write_notes` with explicit velocity + duration.
- Sample loading can 404 if the browser index is stale. `ableton_browse --refresh` fixes it.
- Clip firing during a running session can drop the first beat. Fire on scene boundary, not mid-bar.

## What Emerges

The user stops thinking in Live's UI and starts describing ideas. "Make it darker" becomes a legitimate prompt because kbot knows darker = minor key + sub-bass boost + reverb tail + plate on the snare. This is the skill paying off over sessions.
package/skills/orchestration/cross-agent-blackboard/SKILL.md
@@ -0,0 +1,58 @@
---
name: cross-agent-blackboard
description: Use when more than one agent works on the same problem. The blackboard is the shared context — without it, agents duplicate work and contradict each other.
version: 1.0.0
author: kbot
license: MIT
metadata:
  kbot:
    tags: [agents, coordination, blackboard, multi-agent]
    related_skills: [specialist-routing, agent-handoff]
---

# Cross-Agent Blackboard

Single-agent sessions use memory. Multi-agent sessions need a blackboard — a shared write-read surface every participating agent sees.

## When

- `kbot --architect` (plan + implement by two agents)
- `/team` running all 6 specialists against the same change
- `agent_handoff` passing control with preserved context
- Matrix agents collaborating on a research question

## Iron Law

```
ANY AGENT TAKING OVER MUST READ THE BLACKBOARD BEFORE ACTING.
ANY AGENT LEAVING MUST WRITE TO THE BLACKBOARD BEFORE EXITING.
```

## Protocol

Blackboard entries have four fields: `type` (decision/finding/blocker/artifact), `key` (short slug), `value` (the payload), `author` (agent id).

- `blackboard_write({ type, key, value })` — before handing off or pausing.
- `blackboard_read({ keyPrefix?, type? })` — on entry or after a long subagent call.
- `blackboard_query()` — full dump when context is lost.
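The entry shape and the read filters can be sketched in a few lines. A minimal in-memory sketch in TypeScript; the real `blackboard_*` tools presumably persist entries across agent processes, which this toy version does not:

```typescript
type EntryType = "decision" | "finding" | "blocker" | "artifact";

interface Entry {
  type: EntryType;
  key: string;    // short slug
  value: string;  // the payload
  author: string; // agent id
}

// Toy in-memory store; illustrative only, not the kbot implementation.
const board: Entry[] = [];

function blackboardWrite(e: Entry): void {
  board.push(e);
}

// Filter by key prefix and/or entry type, mirroring blackboard_read.
function blackboardRead(filter: { keyPrefix?: string; type?: EntryType } = {}): Entry[] {
  return board.filter(
    (e) =>
      (filter.keyPrefix === undefined || e.key.startsWith(filter.keyPrefix)) &&
      (filter.type === undefined || e.type === filter.type),
  );
}

blackboardWrite({ type: "finding", key: "axios.v1-vs-v2", value: "v2 breaks the retry middleware", author: "researcher" });
blackboardWrite({ type: "artifact", key: "axios-pin", value: "pinned to 1.6.7 in package.json", author: "coder" });

const findings = blackboardRead({ type: "finding" });
```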
## Example Flow (from a real session)

1. `researcher` investigates a library version incompatibility.
2. Writes `{ type: 'finding', key: 'axios.v1-vs-v2', value: 'v2 breaks the retry middleware' }`.
3. Hands off to `coder` via `agent_handoff`.
4. `coder` reads the blackboard, sees the finding, and skips re-researching.
5. Writes `{ type: 'artifact', key: 'axios-pin', value: 'pinned to 1.6.7 in package.json' }`.
6. `guardian` runs later, reads both entries, confirms no regression, writes `{ type: 'decision', key: 'axios-pin', value: 'approved' }`.

Total time from research to verified fix: 8 minutes. Without the blackboard: each agent would re-derive context, easily 25+ minutes.

## Anti-Patterns

- Passing context through the user ("please tell the next agent X"). The user is not a message bus.
- Writing vague entries ("looked into the bug"). Name the finding concretely or don't write it.
- Reading only your own writes. Other agents' entries are the whole point.

## What Emerges

With the blackboard habit, specialist chains start behaving like one compound agent. The user types one prompt, three specialists take turns, and the handoffs are invisible — because context never dropped.
package/skills/orchestration/specialist-routing/SKILL.md
@@ -0,0 +1,57 @@
---
name: specialist-routing
description: Use when a task clearly belongs to a specialist agent. Route first, reason second — don't let the general agent muddle through a domain it has a specialist for.
version: 1.0.0
author: kbot
license: MIT
metadata:
  kbot:
    tags: [agents, routing, specialists, delegation]
    related_skills: [matrix-agent-spawn, cross-agent-blackboard, agent-handoff]
---

# Specialist Routing

kbot ships with 25+ specialists (run `kbot agents list` for the current roster). The default `kernel` agent is a generalist — competent everywhere, excellent nowhere. The feeling of "kbot is sharp" comes from routing into the right specialist before work starts.

## Iron Law

```
IF A SPECIALIST EXISTS FOR THIS DOMAIN, THE GENERALIST DOES NOT TOUCH IT.
```

## The Roster

| Signal | Specialist | Why |
|---|---|---|
| "review my code", "refactor", "fix this test" | `coder` | strongest code-patching track record |
| "research X", "find me", "what does the literature say" | `researcher` | web + arxiv + citation graph |
| "design this", "make it look right", a11y | `aesthete` | design tokens + spacing + typography |
| "is this safe", secrets, auth, permissions | `guardian` | OWASP checks, dep audit, redact |
| CI, deploys, env vars, launchd, docker | `infrastructure` | full infra toolkit |
| a statistical question, backtesting, distributions | `quant` | stats + finance + probability |
| a 30+ minute deep dive, multi-source | `investigator` | multi-step research workflow |
| "write" anything long-form | `writer` | content creation + editing |
| strategy, tradeoffs, business framing | `strategist` | structured decision support |
| how it's going, "predict X" | `oracle` | forecasting + anticipation |

## Trigger

The moment the user's first message can be classified into the table above. The learned router handles this automatically when confidence ≥ 0.7; below that, you route explicitly: `kbot --agent <id> "..."`.

## Procedure

1. **Classify the task** against the table. If two match, pick the one closer to the *verb* (review → coder; design review → aesthete).
2. **If none match**, stay on `kernel` and invoke the skill that fits instead.
3. **When in doubt**, use `--architect` (plan with one specialist, implement with another) or `--plan` (read-only scoping first).
4. **If the specialist gets stuck**, use `agent_handoff` to pass to another specialist with context — don't fall back to the generalist silently.
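The classify-then-threshold behavior can be sketched as a toy router. A minimal sketch in TypeScript, assuming a hypothetical keyword-overlap score; the actual learned router is presumably statistical rather than keyword-based, and this miniature roster is only a slice of the table above:

```typescript
// Toy signal table distilled from a few roster rows above.
const roster: Record<string, string[]> = {
  coder: ["review my code", "refactor", "fix this test"],
  researcher: ["research", "what does the literature say"],
  guardian: ["is this safe", "secrets", "auth"],
};

interface Route {
  specialist: string | null; // null = stay on the `kernel` generalist
  confidence: number;
}

// Score each specialist by fraction of matched signals; route only
// when the best score clears the threshold (0.7, as in the Trigger).
function route(message: string, threshold = 0.7): Route {
  const text = message.toLowerCase();
  let best: Route = { specialist: null, confidence: 0 };
  for (const [specialist, signals] of Object.entries(roster)) {
    const hits = signals.filter((s) => text.includes(s)).length;
    const confidence = hits / signals.length;
    if (confidence > best.confidence) best = { specialist, confidence };
  }
  return best.confidence >= threshold ? best : { specialist: null, confidence: best.confidence };
}
```

Below the threshold the generalist keeps the task, matching step 2 of the procedure: no confident match means no silent misroute.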
## Anti-Patterns

- Running everything on the default agent "for consistency." You lose every specialist advantage.
- Choosing a specialist by *topic* instead of *verb*. "Music" isn't a specialist; `aesthete` handles creative direction and `coder` handles the OSC scripting.
- Routing between specialists mid-task without writing to the blackboard — the next specialist loses context.

## What Emerges

Users develop muscle memory for `kbot --agent <id>`. Over weeks, the learned router picks up which specialist wins which task *for this user* (not average) and routes preemptively. The experience becomes: "I type the prompt, the right expert answers." That's the routing skill paying compound interest.
package/skills/self-improvement/autopoiesis-loop/SKILL.md
@@ -0,0 +1,47 @@
---
name: autopoiesis-loop
description: Use when planning multi-session work. The autopoiesis loop is kbot using itself to improve itself — every session should end a little sharper than it started.
version: 1.0.0
author: kbot
license: MIT
metadata:
  kbot:
    tags: [self-improvement, meta, dogfood, autopoiesis]
    related_skills: [skill-self-authorship, teacher-trace-curation, dream-to-commit]
---

# The Autopoiesis Loop

kbot is the tool *and* the workbench. Every session has two outputs: the thing the user asked for, and the incremental improvement to kbot itself. Sessions that only produce the first are leaving compound interest on the table.

## The Five Moves (once per session)

1. **Session start** — run `kbot bootstrap`. The bootstrap agent surfaces the highest-leverage improvement based on accumulated signals. Do this before feature work, not instead of it.
2. **During work** — notice repeated patterns. Each repetition is a skill waiting to be written (`skill-self-authorship`).
3. **On friction** — missing tool? `forge-reflex`. Wrong specialist? Update the learned router via corrective feedback.
4. **Session end** — update `SCRATCHPAD.md` with what you learned (not what you did). The next session's opening context reads this file.
5. **Overnight** — the dream engine consolidates transcripts into memory entries. The daemon reviews diffs, runs code quality scans, translates i18n. Work continues while the user sleeps.

## Iron Law

```
NEVER END A SESSION WORSE THAN IT STARTED.
```

If kbot hit a wall and you didn't leave a corrective signal behind (a skill, a memory, a scratchpad note, a corrected learned-router pattern), the loop is broken.

## The Three Signals That Compound

- **Corrections** — user says "no, do X instead." These go into `~/.kbot/corrections/` and load as closed-loop prompts.
- **Teacher traces** — every non-local Claude call is logged to `~/.kbot/teacher/traces.jsonl`. Weekly, `kbot train-self` fine-tunes local models on the best ones.
- **Skills** — successful patterns distilled into `~/.kbot/skills/`. Loaded on relevance.

Each of these runs automatically once wired up. The skill is knowing to wire them up in the first place.
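The corrections signal can be sketched as a tiny record-and-replay store. A minimal sketch in TypeScript; the on-disk layout under `~/.kbot/corrections/` and the closed-loop preamble format are assumptions for illustration, not the actual implementation:

```typescript
// Hypothetical correction record; the real on-disk schema is not shown here.
interface Correction {
  wrong: string;   // what kbot did
  instead: string; // what the user asked for instead
}

const corrections: Correction[] = [];

// "no, do X instead" becomes a stored correction.
function recordCorrection(wrong: string, instead: string): void {
  corrections.push({ wrong, instead });
}

// Replayed as a closed-loop prompt preamble at session start.
function correctionPreamble(): string {
  return corrections
    .map((c) => `Previously corrected: do not "${c.wrong}"; instead "${c.instead}".`)
    .join("\n");
}

recordCorrection("deploy straight to main", "open a PR and wait for CI");
```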
## What Emerges

Three weeks of active use and kbot's answers start feeling tuned to *this user* specifically. Six weeks in, the local model (via `train-self`) is answering basic questions at zero cost. Three months in, kbot's corrections archive has more collective wisdom than the user's own notes.

## Anti-Pattern

Running kbot as a pure consumer — asking questions, using answers, never looking at what's in `~/.kbot/`. You're paying for the loop with every API call but not collecting the dividend.
package/skills/self-improvement/skill-self-authorship/SKILL.md
@@ -0,0 +1,70 @@
---
name: skill-self-authorship
description: Use at the end of any session where you solved something non-obvious or repeated a pattern 3+ times. Write a new skill so the next session doesn't re-derive it.
version: 1.0.0
author: kbot
license: MIT
metadata:
  kbot:
    tags: [skills, meta, autopoiesis, learning]
    related_skills: [forge-reflex, autopoiesis-loop, teacher-trace-curation]
---

# Skill Self-Authorship

The skills that ship with kbot cover known territory. The skills that make *your* kbot feel psychic are the ones it writes for itself during use.

## When to Write a New Skill

At least one of these must be true:
- You solved something that took 5+ tool calls and got it right.
- You repeated the same sequence of tool calls 3+ times this week.
- You corrected kbot twice on the same kind of mistake.
- A user gave explicit feedback "remember this" / "always do X" / "never do Y."
- A daemon surfaced a recurring failure kbot had to work around.

## Skill Anatomy

Use `~/.kbot/skills/<category>/<kebab-name>/SKILL.md`. Frontmatter:

```yaml
---
name: kebab-name
description: One sentence — "when to use."
version: 1.0.0
author: kbot-self
license: MIT
metadata:
  kbot:
    tags: [2-5 lowercase tags]
    related_skills: [other skills this connects to]
---
```

Body structure:
1. **Iron Law** — the one rule that, if broken, invalidates the skill.
2. **Trigger** — the exact situation that should make kbot reach for this.
3. **Procedure** — 3–6 numbered steps. Not prose. Commands and decisions.
4. **Anti-patterns** — the failure modes to refuse.
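The anatomy above is mechanical enough to generate. A minimal scaffold sketch in TypeScript; the emitted fields mirror the template above, but the function itself (and the example skill name) are illustrative, not a kbot tool:

```typescript
interface SkillSpec {
  name: string;          // kebab-case
  description: string;   // the "when to use" sentence
  tags: string[];        // 2-5 lowercase tags
  relatedSkills: string[];
}

// Emit a SKILL.md skeleton matching the anatomy described above.
function scaffoldSkill(spec: SkillSpec): string {
  return [
    "---",
    `name: ${spec.name}`,
    `description: ${spec.description}`,
    "version: 1.0.0",
    "author: kbot-self",
    "license: MIT",
    "metadata:",
    "  kbot:",
    `    tags: [${spec.tags.join(", ")}]`,
    `    related_skills: [${spec.relatedSkills.join(", ")}]`,
    "---",
    "",
    `# ${spec.name}`,
    "",
    "## Iron Law",
    "",
    "## Trigger",
    "",
    "## Procedure",
    "",
    "## Anti-patterns",
  ].join("\n");
}

const md = scaffoldSkill({
  name: "vitest-enoent-triage",
  description: "Use when any vitest test fails with ENOENT.",
  tags: ["vitest", "debugging"],
  relatedSkills: ["systematic-debugging"],
});
```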
## The "When to Use" Line Is Everything

The relevance scorer reads `description` and `tags` first. If your description is "general debugging tips" the skill will never load. If it's "use when any vitest test fails with ENOENT" it loads exactly when needed. Write for the trigger, not the topic.
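A toy version of that scorer makes the point concrete. A minimal sketch in TypeScript, assuming simple token overlap; the real relevance scorer's weighting is not documented here:

```typescript
interface SkillMeta {
  name: string;
  description: string;
  tags: string[];
}

// Token-overlap score between the session context and a skill's
// description + tags. Purely illustrative weighting.
function relevance(context: string, skill: SkillMeta): number {
  const ctx = new Set(context.toLowerCase().split(/\W+/));
  const words = (skill.description + " " + skill.tags.join(" "))
    .toLowerCase()
    .split(/\W+/)
    .filter((w) => w.length > 2);
  const hits = words.filter((w) => ctx.has(w)).length;
  return words.length === 0 ? 0 : hits / words.length;
}

const vague: SkillMeta = { name: "a", description: "general debugging tips", tags: [] };
const sharp: SkillMeta = { name: "b", description: "use when any vitest test fails with ENOENT", tags: ["vitest"] };
const context = "my vitest test fails with ENOENT on CI";
```

Under any overlap-style scorer, the trigger-phrased description wins: the vague one shares no tokens with the failing-test context.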
## Self-Patching

Use the `skill_manage` tool after a skill executes:
- `skill_manage({ action: "get", query: "<skill-id>" })` — check current success rate.
- If the skill failed, `skill_manage({ action: "patch", query: "<skill-id>", issue: "<what went wrong>" })` appends the issue.
- If a better step sequence was discovered, `skill_manage({ action: "patch", query: "<skill-id>", steps: "[...]" })` replaces the steps and bumps version.
- `skill_manage({ action: "report" })` shows the success-rate distribution across all skills.

A skill with a 20% success rate after 10 executions is a bug report — rewrite or delete (`action: "delete"`).

## What Emerges

After 30 days of active use, `~/.kbot/skills/` reflects the user's actual work more precisely than any config file. The skills library is the durable shadow of every problem kbot helped solve.

## Anti-Pattern

Writing a skill for every clever thing you did. Skills have a maintenance cost — each one competes for token budget and relevance scoring attention. Write one skill per *repeated* problem, not per *interesting* problem.
package/skills/self-improvement/teacher-trace-curation/SKILL.md
@@ -0,0 +1,54 @@
---
name: teacher-trace-curation
description: Use weekly. The teacher log captures every non-local Claude call — curate the best ones into training data so the local model learns from your actual work.
version: 1.0.0
author: kbot
license: MIT
metadata:
  kbot:
    tags: [training, fine-tuning, mlx, local-model, dataset]
    related_skills: [autopoiesis-loop, skill-self-authorship]
---

# Teacher Trace Curation

Every time kbot calls Claude, the prompt + response is written to `~/.kbot/teacher/traces.jsonl`. Left alone, this is just a log. Curated, it's the dataset that teaches the local model to answer *your* questions without touching the API.

## Iron Law

```
ONLY SUCCESSFUL, CORRECTED, AND USER-APPROVED TRACES ENTER THE DATASET.
```

Failed traces teach the model to fail. Garbage in is not "more data."

## The Weekly Ritual

1. `kbot train-self --mode default --max-examples 500 --iters 200 --num-layers 8` — curates + fine-tunes in one pass. The curator runs first, scores traces, and writes `~/.kbot/teacher/dataset-default.jsonl`.
2. Review the top 50 entries in the dataset file. Skim titles + first 200 chars.
3. Remove anything you wouldn't want the local model to imitate:
   - Responses you corrected mid-session.
   - Hallucinated library names or APIs.
   - Advice you later decided was wrong.
4. Re-run step 1 with the cleaned dataset if you made significant deletions.
5. Test: `ollama run kernel-self:<timestamp>` on a task from the last week. Compare against the Claude baseline.

For longer cycles of evaluation + retraining, use `kbot train-cycle`, which chains curate → train → evaluate → merge across multiple iterations.

## The Quality Signal That Matters Most

Was this answer *used* without correction? The curator scores partly on: no correction in the next 5 turns, no follow-up question asking for clarification, no user rephrasing. Approved-by-silence is the strongest endorsement.
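The approved-by-silence heuristic can be sketched directly. A minimal sketch in TypeScript; the trace fields and the correction-detection patterns are assumptions here, since the real `traces.jsonl` schema and curator weights are not shown in this skill:

```typescript
// Hypothetical trace shape; the real traces.jsonl schema is not shown here.
interface Trace {
  prompt: string;
  response: string;
  followingTurns: string[]; // the user turns that followed, in order
}

// Approved-by-silence: no correction, clarification request, or
// rephrasing in the next 5 turns.
function approvedBySilence(trace: Trace): boolean {
  const window = trace.followingTurns.slice(0, 5);
  const badSignals = [/^no,/i, /instead/i, /what do you mean/i, /i meant/i];
  return !window.some((turn) => badSignals.some((re) => re.test(turn)));
}

const good: Trace = { prompt: "q", response: "a", followingTurns: ["thanks", "next task"] };
const bad: Trace = { prompt: "q", response: "a", followingTurns: ["no, do it with pnpm instead"] };
```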
## What You're Actually Building

A local model that answers "how do I deploy this?" using *your* deploy flow, not Anthropic's generic best practice. Your infrastructure, your naming, your conventions, your past decisions. That's what the local model becomes over weeks.

## Anti-Pattern

Training on everything. Larger datasets with noisy data fine-tune *worse* models than small curated datasets. 200 excellent examples beat 2,000 mediocre ones every time.

## Integration

- `KBOT_TEACHER_LOG=1` in `~/.zshrc` keeps the logger always-on.
- A `launchd` plist at `~/Library/LaunchAgents/com.kernel.kbot-train-self.plist` runs the curation + training weekly.
- The trained model gets tagged `kernel-self:<timestamp>` in Ollama and becomes available to every kbot command via `--model kernel-self:latest`.
package/skills/software-development/systematic-debugging/SKILL.md
@@ -0,0 +1,86 @@
---
name: systematic-debugging
description: Use when any test fails, any bug surfaces, or any behavior is unexpected. Enforces root-cause analysis before code changes.
version: 1.0.0
author: kbot
license: MIT
metadata:
  kbot:
    tags: [debugging, root-cause, investigation, tdd]
    related_skills: [test-driven-development, ship-pipeline, specialist-routing]
---

# Systematic Debugging

Random fixes waste time and create new bugs. Before writing a single line of fix, you must understand *why* it broke.

## Iron Law

```
NO FIX WITHOUT A NAMED ROOT CAUSE.
```

If you can't finish the sentence "It broke because ___", you are not ready to edit.

## When to Use

- Any failing test
- Any exception in a log
- Any "this used to work"
- Any UI regression
- Any flaky behavior
- Any build failure

Especially use this under time pressure — "quick fix" almost always means "new bug tomorrow."

## Four Phases

### Phase 1 — Reproduce

- Run the failing case *exactly*. Don't paraphrase the command.
- Capture the full error: message, stack, file, line.
- If you can't reproduce, you don't understand the bug yet — stop and gather more evidence.

Tools: `bash` to run the command, `kbot_read` to open the file at the stack trace line, `git_diff` to see what changed recently.

### Phase 2 — Isolate

- Binary-search the change set. Use `git log --oneline -20` and check the last commit that touched the area.
- Trace the data flow upstream from the symptom. The bug almost never lives where the exception fires.
- Find a *working* similar case in the same codebase. What's different?

Tools: `grep` for the bad value, `git_log` for history, subagent with Explore to map the call graph.

### Phase 3 — Hypothesize

- Write down one sentence: "I think this is caused by X because Y."
- Predict what the smallest possible reproduction looks like.
- Predict what the smallest possible fix looks like.

Do **not** write code yet. Share the hypothesis first if the user is watching.

### Phase 4 — Fix

1. Write a failing test that captures the bug (see `test-driven-development`).
2. Make the minimal change that turns it green.
3. Run the full suite. No regressions.
4. Commit with a message that names the root cause, not the symptom.

## The Rule of Three

If your third fix attempt failed, **stop**. Don't try a fourth. The bug is not where you think it is, or the architecture is wrong. Escalate — use `kbot_plan` or hand off to a specialist agent.
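The Rule of Three is easy to enforce mechanically. A minimal sketch in TypeScript; the `attemptFix` and `escalate` hooks are hypothetical placeholders, not kbot tool names:

```typescript
type Outcome = "fixed" | "escalated";

// Try a fix at most three times; after the third failure, force
// escalation instead of allowing a fourth attempt.
function fixWithRuleOfThree(
  attemptFix: () => boolean, // one fix attempt; true = suite green
  escalate: () => void,      // hand off to a specialist / re-plan
): Outcome {
  for (let attempt = 1; attempt <= 3; attempt++) {
    if (attemptFix()) return "fixed";
  }
  escalate(); // the bug is not where you think it is
  return "escalated";
}

// Example: a fix that only lands on the second attempt.
let tries = 0;
const outcome = fixWithRuleOfThree(
  () => ++tries === 2,
  () => {},
);
```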
## Red Flags (stop and restart)

- "Let me just try this and see"
- "I'll comment this out for now"
- Changing more than one thing per commit
- Adding a try/catch to make the error go away
- "It works on my machine"

## How kbot Helps

- `kbot --thinking` shows reasoning — use it when debugging complex issues.
- `kbot --agent coder` routes to the specialist with the strongest code-patching track record.
- `kbot --plan` forces read-only investigation mode for Phases 1 and 2.
- `forge_tool` builds a throwaway diagnostic script when stdlib tools aren't enough.
package/skills/software-development/test-driven-development/SKILL.md
@@ -0,0 +1,74 @@
---
name: test-driven-development
description: Use when adding any new behavior or fixing any bug. Red → Green → Refactor, enforced.
version: 1.0.0
author: kbot
license: MIT
metadata:
  kbot:
    tags: [tdd, testing, quality, vitest]
    related_skills: [systematic-debugging, ship-pipeline]
---

# Test-Driven Development

## Iron Law

```
THE FIRST COMMIT ON ANY BEHAVIOR CHANGE IS A FAILING TEST.
```

No exceptions. Not "I'll add tests after." Not "this is too small." The failing test proves you understood the behavior before you wrote it.

## Why

- Writing the test first forces you to design the interface, not the implementation.
- A failing test that turns green is the only honest proof the fix works.
- Tests written after the fact confirm what you already did, not what you should have done.
- Coverage that grows alongside code never has a "we should add tests" backlog.

## The Cycle

### RED

1. Decide what the new behavior is, in one sentence.
2. Write the smallest test that would fail if the behavior is missing.
3. Run it. Confirm it fails for the *right reason* (not a typo, not a missing import).

### GREEN

1. Write the minimum code that makes the test pass.
2. No extra features. No refactoring yet. No "while I'm here."
3. Run the test. Confirm green.

### REFACTOR

1. Now — and only now — clean up the implementation.
2. Rename, extract, deduplicate. Tests still green the whole time.
3. If a refactor turns a test red, you changed behavior — revert and think.
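The RED and GREEN steps can be condensed into one self-contained sketch. TypeScript, with the Vitest harness replaced by a plain thrown assertion so the example stands alone; `handleRequest` and its 401 behavior are illustrative, not kbot code:

```typescript
// Hypothetical behavior under test: reject unauthenticated requests.
type Request = { token?: string };
type Response = { status: number };

// GREEN: the minimum code that satisfies the test below. No extra
// features, no "while I'm here."
function handleRequest(req: Request): Response {
  if (!req.token) return { status: 401 }; // reject unauthenticated
  return { status: 200 };
}

// RED came first: the smallest check that fails while the behavior is
// missing, named for the behavior (not "it works").
function itRejectsUnauthenticatedRequests(): void {
  const res = handleRequest({});
  if (res.status !== 401) throw new Error("expected 401 for missing token");
}

itRejectsUnauthenticatedRequests(); // green once handleRequest exists
```

Before `handleRequest` checked the token, this test threw; that throw for the right reason is the RED step's whole job.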
## kbot Toolchain

- Framework: **Vitest** (`.test.ts` next to source).
- Runner: `npx vitest run` (one-shot) or `npx vitest` (watch).
- Typecheck gate: `npx tsc --noEmit` before every commit. Strict mode is non-negotiable.
- UI tests: `@testing-library/react`.
- E2E: Playwright (`npm run test:e2e`).

## Mocking Policy

- Mock Supabase. Never call the real API in tests.
- Mock the Claude proxy. Never call the real API in tests.
- Do NOT mock your own pure functions — test them directly.
- Do NOT mock file I/O — use `tmpdir()` fixtures.

## Anti-Patterns

- Tests that assert implementation details (internal method calls) instead of observable behavior.
- Tests that share state between cases. Each test starts clean.
- Test names like `it('works')`. Name the behavior: `it('rejects unauthenticated requests')`.
- "Fix the test" when the test correctly catches a regression. Fix the code.

## When kbot Should Run Tests Autonomously

After any edit under `src/` or `packages/kbot/src/`, kbot runs `npm run typecheck` and `npx vitest run <affected>` before reporting success. No UI claim without a browser check (see `ship-pipeline`).