claudecode-omc 5.3.0 → 5.5.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (91)
  1. package/.local/guidelines/CLAUDE.md +31 -0
  2. package/README.md +57 -1
  3. package/bundled/manifest.json +2 -2
  4. package/bundled/upstream/oh-my-claudecode/agents/analyst.md +1 -1
  5. package/bundled/upstream/oh-my-claudecode/agents/architect.md +1 -1
  6. package/bundled/upstream/oh-my-claudecode/agents/code-reviewer.md +1 -1
  7. package/bundled/upstream/oh-my-claudecode/agents/code-simplifier.md +1 -1
  8. package/bundled/upstream/oh-my-claudecode/agents/critic.md +1 -1
  9. package/bundled/upstream/oh-my-claudecode/agents/debugger.md +1 -1
  10. package/bundled/upstream/oh-my-claudecode/agents/designer.md +1 -1
  11. package/bundled/upstream/oh-my-claudecode/agents/document-specialist.md +1 -1
  12. package/bundled/upstream/oh-my-claudecode/agents/executor.md +1 -1
  13. package/bundled/upstream/oh-my-claudecode/agents/explore.md +1 -1
  14. package/bundled/upstream/oh-my-claudecode/agents/git-master.md +3 -3
  15. package/bundled/upstream/oh-my-claudecode/agents/planner.md +1 -1
  16. package/bundled/upstream/oh-my-claudecode/agents/qa-tester.md +1 -1
  17. package/bundled/upstream/oh-my-claudecode/agents/scientist.md +1 -1
  18. package/bundled/upstream/oh-my-claudecode/agents/security-reviewer.md +1 -1
  19. package/bundled/upstream/oh-my-claudecode/agents/test-engineer.md +1 -75
  20. package/bundled/upstream/oh-my-claudecode/agents/tracer.md +1 -1
  21. package/bundled/upstream/oh-my-claudecode/agents/verifier.md +1 -1
  22. package/bundled/upstream/oh-my-claudecode/agents/writer.md +1 -1
  23. package/bundled/upstream/oh-my-claudecode/hooks/hooks.json +21 -1
  24. package/bundled/upstream/oh-my-claudecode/skills/AGENTS.md +200 -0
  25. package/bundled/upstream/oh-my-claudecode/skills/autopilot/SKILL.md +17 -10
  26. package/bundled/upstream/oh-my-claudecode/skills/autoresearch/SKILL.md +90 -0
  27. package/bundled/upstream/oh-my-claudecode/skills/cancel/SKILL.md +15 -6
  28. package/bundled/upstream/oh-my-claudecode/skills/configure-notifications/SKILL.md +12 -12
  29. package/bundled/upstream/oh-my-claudecode/skills/debug/SKILL.md +35 -0
  30. package/bundled/upstream/oh-my-claudecode/skills/deep-dive/SKILL.md +4 -0
  31. package/bundled/upstream/oh-my-claudecode/skills/deep-interview/SKILL.md +23 -18
  32. package/bundled/upstream/oh-my-claudecode/skills/hud/SKILL.md +23 -101
  33. package/bundled/upstream/oh-my-claudecode/skills/learner/SKILL.md +27 -2
  34. package/bundled/upstream/oh-my-claudecode/skills/mcp-setup/SKILL.md +67 -8
  35. package/bundled/upstream/oh-my-claudecode/skills/omc-doctor/SKILL.md +32 -47
  36. package/bundled/upstream/oh-my-claudecode/skills/omc-setup/SKILL.md +4 -2
  37. package/bundled/upstream/oh-my-claudecode/skills/omc-setup/phases/01-install-claude-md.md +15 -4
  38. package/bundled/upstream/oh-my-claudecode/skills/omc-setup/phases/02-configure.md +9 -9
  39. package/bundled/upstream/oh-my-claudecode/skills/omc-setup/phases/03-integrations.md +13 -13
  40. package/bundled/upstream/oh-my-claudecode/skills/omc-setup/phases/04-welcome.md +3 -3
  41. package/bundled/upstream/oh-my-claudecode/skills/omc-teams/SKILL.md +28 -0
  42. package/bundled/upstream/oh-my-claudecode/skills/plan/SKILL.md +1 -0
  43. package/bundled/upstream/oh-my-claudecode/skills/project-session-manager/SKILL.md +25 -5
  44. package/bundled/upstream/oh-my-claudecode/skills/project-session-manager/lib/config.sh +2 -15
  45. package/bundled/upstream/oh-my-claudecode/skills/project-session-manager/lib/providers/github.sh +1 -1
  46. package/bundled/upstream/oh-my-claudecode/skills/project-session-manager/lib/session.sh +2 -2
  47. package/bundled/upstream/oh-my-claudecode/skills/project-session-manager/lib/tmux.sh +109 -4
  48. package/bundled/upstream/oh-my-claudecode/skills/project-session-manager/lib/worktree.sh +26 -0
  49. package/bundled/upstream/oh-my-claudecode/skills/project-session-manager/psm.sh +46 -5
  50. package/bundled/upstream/oh-my-claudecode/skills/project-session-manager/templates/pr-review.md +5 -2
  51. package/bundled/upstream/oh-my-claudecode/skills/project-session-manager/templates/projects.json +1 -1
  52. package/bundled/upstream/oh-my-claudecode/skills/project-session-manager/tests/test-psm-prompt-injection.sh +336 -0
  53. package/bundled/upstream/oh-my-claudecode/skills/ralph/SKILL.md +18 -9
  54. package/bundled/upstream/oh-my-claudecode/skills/ralplan/SKILL.md +2 -0
  55. package/bundled/upstream/oh-my-claudecode/skills/release/SKILL.md +167 -57
  56. package/bundled/upstream/oh-my-claudecode/skills/remember/SKILL.md +41 -0
  57. package/bundled/upstream/oh-my-claudecode/skills/self-improve/SKILL.md +391 -0
  58. package/bundled/upstream/oh-my-claudecode/skills/self-improve/data_contracts.md +274 -0
  59. package/bundled/upstream/oh-my-claudecode/skills/self-improve/scripts/plot_progress.py +128 -0
  60. package/bundled/upstream/oh-my-claudecode/skills/self-improve/scripts/resolve-paths.mjs +192 -0
  61. package/bundled/upstream/oh-my-claudecode/skills/self-improve/scripts/validate.sh +404 -0
  62. package/bundled/upstream/oh-my-claudecode/skills/self-improve/si-benchmark-builder.md +79 -0
  63. package/bundled/upstream/oh-my-claudecode/skills/self-improve/si-goal-clarifier.md +94 -0
  64. package/bundled/upstream/oh-my-claudecode/skills/self-improve/si-researcher.md +73 -0
  65. package/bundled/upstream/oh-my-claudecode/skills/self-improve/templates/agent-settings.json +14 -0
  66. package/bundled/upstream/oh-my-claudecode/skills/self-improve/templates/goal.md +22 -0
  67. package/bundled/upstream/oh-my-claudecode/skills/self-improve/templates/harness.md +18 -0
  68. package/bundled/upstream/oh-my-claudecode/skills/self-improve/templates/idea.md +5 -0
  69. package/bundled/upstream/oh-my-claudecode/skills/self-improve/templates/settings.json +23 -0
  70. package/bundled/upstream/oh-my-claudecode/skills/skill/SKILL.md +46 -77
  71. package/bundled/upstream/oh-my-claudecode/skills/skillify/SKILL.md +53 -0
  72. package/bundled/upstream/oh-my-claudecode/skills/team/SKILL.md +83 -11
  73. package/bundled/upstream/oh-my-claudecode/skills/trace/SKILL.md +1 -0
  74. package/bundled/upstream/oh-my-claudecode/skills/ultraqa/SKILL.md +1 -0
  75. package/bundled/upstream/oh-my-claudecode/skills/ultrawork/SKILL.md +1 -0
  76. package/bundled/upstream/oh-my-claudecode/skills/verify/SKILL.md +37 -0
  77. package/bundled/upstream/oh-my-claudecode/skills/wiki/SKILL.md +67 -0
  78. package/package.json +3 -1
  79. package/src/cli/artifact.js +63 -2
  80. package/src/cli/guidelines.js +83 -0
  81. package/src/cli/index.js +13 -1
  82. package/src/cli/setup.js +48 -18
  83. package/src/cli/source.js +35 -1
  84. package/src/config/artifact-types.js +12 -2
  85. package/src/config/paths.js +95 -4
  86. package/src/config/sources.js +29 -5
  87. package/src/guidelines/apply.js +152 -0
  88. package/src/guidelines/optimizer.js +325 -0
  89. package/src/merge/claude-md-merger.js +35 -12
  90. package/templates/merge-config.json +12 -1
  91. package/bundled/upstream/oh-my-claudecode/skills/omc-doctor/skill-debugger.md +0 -101
@@ -0,0 +1,391 @@
1
+ ---
2
+ name: self-improve
3
+ description: Autonomous evolutionary code improvement engine with tournament selection
4
+ level: 4
5
+ ---
6
+
7
+ # Self-Improvement Orchestrator
8
+
9
+ You are the **loop controller** for the self-improvement system. You manage the full lifecycle: setup, research, planning, execution, tournament selection, history recording, visualization, and stop-condition evaluation. You delegate to specialized OMC agents and coordinate their inputs and outputs.
10
+
11
+ ---
12
+
13
+ ## Autonomous Execution Policy
14
+
15
+ **NEVER stop or pause to ask the user during the improvement loop.** Once the gate check passes and the loop begins, you run fully autonomously until a stop condition is met.
16
+
17
+ - **Do not ask for confirmation** between iterations or between steps within an iteration.
18
+ - **Do not summarize and wait** — execute the next step immediately.
19
+ - **On agent failure**: retry once, then skip that agent and continue with remaining agents. Log the failure in iteration history.
20
+ - **On all plans rejected**: log it, continue to the next iteration automatically.
21
+ - **On all executors failing**: log it, continue to the next iteration automatically.
22
+ - **On benchmark errors**: log the error, mark the executor as failed, continue with other executors.
23
+ - **The only things that stop the loop** are the stop conditions in Step 11.
24
+ - **Trust boundary**: The loop runs benchmark commands as-is inside the target repo. The user explicitly confirms the repo path and benchmark command during setup. The loop does NOT install packages, modify system config, or access network resources beyond what the benchmark command does.
25
+ - **Sealed files**: validate.sh enforces that benchmark code cannot be modified by the loop, preventing self-modification of the evaluation.
26
+
27
+ ---
28
+
29
+ ## State Tracking
30
+
31
+ Self-improve artifacts live under a resolved root returned by `scripts/resolve-paths.mjs`.
32
+
33
+ - New runs default to `.omc/self-improve/topics/default/`.
34
+ - When the user provides a topic or slug, use `.omc/self-improve/topics/{topic_slug}/`.
35
+ - Legacy single-track state at `.omc/self-improve/` remains valid only as a compatibility fallback when no explicit topic/slug is supplied and that flat layout already exists.
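The three rules above can be read as a small precedence order. The following is a minimal Python sketch of that order, not the logic of `scripts/resolve-paths.mjs` itself. The function name `resolve_self_improve_root` is hypothetical, and detecting the legacy flat layout by the presence of a `config/` directory is an assumption.

```python
from pathlib import Path
from typing import Optional


def resolve_self_improve_root(project_root: str, topic_slug: Optional[str] = None) -> Path:
    """Pick the self-improve root for a run (illustrative only)."""
    base = Path(project_root) / ".omc" / "self-improve"
    if topic_slug:
        # An explicit topic or slug always selects a topic-scoped root.
        return base / "topics" / topic_slug
    # ASSUMPTION: the legacy flat layout is recognized by a config/ directory
    # sitting directly under .omc/self-improve/.
    if (base / "config").is_dir():
        return base
    # New runs without a topic fall back to the default topic.
    return base / "topics" / "default"
```

Under these assumptions, a project with no prior state resolves to `.omc/self-improve/topics/default/`, while a project that already has `.omc/self-improve/config/` keeps using the flat root until a topic is supplied.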
36
+
37
+ Treat `<self-improve-root>/` below as that resolved root:
38
+
39
+ ```
40
+ <self-improve-root>/
41
+ ├── config/ # User configuration
42
+ │ ├── settings.json # agents, benchmark, thresholds, sealed_files
43
+ │ ├── goal.md # Improvement objective + target metric
44
+ │ ├── harness.md # Guardrail rules (H001/H002/H003)
45
+ │ └── idea.md # User experiment ideas
46
+ ├── state/ # Runtime state
47
+ │ ├── agent-settings.json # iterations, best_score, status, counters
48
+ │ ├── iteration_state.json # Within-iteration progress (resumability)
49
+ │ ├── research_briefs/ # Research output per round
50
+ │ ├── iteration_history/ # Full history per round
51
+ │ ├── merge_reports/ # Tournament results
52
+ │ └── plan_archive/ # Archived plans (permanent)
53
+ ├── plans/ # Active plans (current round)
54
+ └── tracking/ # Visualization data
55
+ ├── raw_data.json # All candidate scores
56
+ ├── baseline.json # Initial benchmark score
57
+ ├── events.json # Config changes
58
+ └── progress.png # Generated chart
59
+ ```
60
+
61
+ OMC mode lifecycle: `.omc/state/sessions/{sessionId}/self-improve-state.json`
62
+
63
+ ---
64
+
65
+ ## Agent Mapping
66
+
67
+ All augmentations are delivered via the Task description context at spawn time. No modifications are made to existing agent .md files.
68
+
69
+ | Step | Role | OMC Agent | Model |
70
+ |------|------|-----------|-------|
71
+ | Research | Codebase analysis + hypothesis generation | general-purpose Agent | opus |
72
+ | Planning | Hypothesis → structured plan | oh-my-claudecode:planner | opus |
73
+ | Architecture Review | 6-point plan review | oh-my-claudecode:architect | opus |
74
+ | Critic Review | Harness rule enforcement | oh-my-claudecode:critic | opus |
75
+ | Execution | Implement plan + run benchmark | oh-my-claudecode:executor | opus |
76
+ | Git Operations | Atomic merge/tag/PR | oh-my-claudecode:git-master | sonnet |
77
+ | Goal Setup | Interactive interview | (directly in this skill) | N/A |
78
+ | Benchmark Setup | Create + validate benchmark | custom agent | opus |
79
+
80
+ **Research prompt**: Read `si-researcher.md` from this skill directory and pass its content as the agent prompt.
81
+
82
+ **Benchmark builder**: Read `si-benchmark-builder.md` from this skill directory and pass its content as the agent prompt.
83
+
84
+ **Goal clarifier**: Read `si-goal-clarifier.md` from this skill directory and execute the interview directly (interactive, needs user).
85
+
86
+ ---
87
+
88
+ ## Inputs
89
+
90
+ Read these files at startup and at the beginning of each iteration:
91
+
92
+ | File | Purpose |
93
+ |---|---|
94
+ | `<self-improve-root>/config/settings.json` | User config: `number_of_agents`, `benchmark_command`, `benchmark_format`, `benchmark_direction`, `max_iterations`, `plateau_threshold`, `plateau_window`, `target_value`, `primary_metric`, `sealed_files`, `regression_threshold`, `circuit_breaker_threshold`, `target_branch`, `current_repo_url`, `fork_url`, `upstream_url`, `topic_slug` |
95
+ | `<self-improve-root>/state/agent-settings.json` | Runtime: `iterations`, `best_score`, `plateau_consecutive_count`, `circuit_breaker_count`, `status`, `goal_slug` (derived: lowercase underscore from goal objective, persisted for cross-session consistency) |
96
+ | `<self-improve-root>/state/iteration_state.json` | Per-iteration progress for resumability |
97
+ | `<self-improve-root>/config/goal.md` | Improvement objective, target metric, scope |
98
+ | `<self-improve-root>/config/harness.md` | Guardrail rules (H001, H002, H003) |
99
+
100
+ ---
101
+
102
+ ## Setup Phase
103
+
104
+ 1. Check if target repo path exists. If not configured, ask user for the path to the repository to improve.
105
+ 2. Resolve `<self-improve-root>` by running `node {skill_dir}/scripts/resolve-paths.mjs --project-root {repo_path} [--topic "..."] [--slug "..."] --ensure-dirs`.
106
+ 3. Create the `<self-improve-root>/` directory structure by copying from `templates/` in this skill directory into the resolved `config/` root.
107
+ 4. Read `<self-improve-root>/state/agent-settings.json`. Check `si_setting_goal`, `si_setting_benchmark`, `si_setting_harness`.
108
+ 4. **Trust confirmation** (mandatory, cannot be skipped):
109
+ a. If `trust_confirmed` is already `true` in agent-settings.json, skip to step 5 (resume path).
110
+ b. Display the target repo path and ask user to confirm:
111
+ `"Self-improve will run benchmark commands inside {repo_path}. This executes arbitrary code in that repository. Confirm? [yes/no]"`
112
+ c. If user declines: abort setup and exit. Do NOT proceed.
113
+ d. Record consent: set `trust_confirmed: true` in agent-settings.json.
114
+ 5. Persist `topic_slug` into `config/settings.json` when the resolved root is topic-scoped so future resumes stay on the same track.
115
+ 6. If goal not set → read `si-goal-clarifier.md` from this skill directory and run the 4-dimension Socratic interview directly in this context (Objective, Metric, Target, Scope). Write result to `<self-improve-root>/config/goal.md`.
116
+ 6. If benchmark not set → read `si-benchmark-builder.md` from this skill directory, spawn a custom Agent(model=opus) with its content as prompt. The agent surveys the repo, creates or wraps a benchmark, validates 3x, and records baseline.
117
+ After benchmark is set, confirm the benchmark command with user:
118
+ `"Benchmark command: {benchmark_command}. This will be run repeatedly during the loop. Confirm? [yes/no]"`
119
+ If user declines: abort setup and exit.
120
+ 7. If harness not set → confirm default harness rules (H001/H002/H003) with user or customize.
121
+ 8. **Gate**: All of `si_setting_goal`, `si_setting_benchmark`, `si_setting_harness`, `trust_confirmed` must be true.
122
+ 9. **Create improvement branch** (if it does not exist):
123
+ ```
124
+ git -C {repo_path} checkout -b improve/{goal_slug} {target_branch}
125
+ git -C {repo_path} checkout {target_branch}
126
+ ```
127
+ Where `{goal_slug}` is derived from the goal objective (lowercase, underscored). If the branch already exists, skip creation. Persist `goal_slug` in agent-settings.json.
128
+ 10. **Mode exclusivity**: Call `state_list_active`. If autopilot, ralph, or ultrawork is active, refuse to start.
129
+ 11. Write initial state: `state_write(mode='self-improve', active=true, iteration=0, started_at=<now>)`
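Two pieces of the setup steps above are simple enough to sketch: the gate check and the `goal_slug` derivation. This is a hedged Python illustration, not code from the package. The helper names `gate_passes` and `derive_goal_slug` are hypothetical, and collapsing every run of non-alphanumeric characters into one underscore is an assumed reading of "lowercase, underscored".

```python
import re

# Flag names as listed in the Gate step of the setup phase.
GATE_FLAGS = (
    "si_setting_goal",
    "si_setting_benchmark",
    "si_setting_harness",
    "trust_confirmed",
)


def gate_passes(agent_settings: dict) -> bool:
    """Return True only when every gate flag is literally true."""
    return all(agent_settings.get(flag) is True for flag in GATE_FLAGS)


def derive_goal_slug(objective: str) -> str:
    """Lowercase the objective and join its words with underscores."""
    # ASSUMPTION: any run of characters outside a-z / 0-9 becomes one underscore.
    slug = re.sub(r"[^a-z0-9]+", "_", objective.lower())
    return slug.strip("_")
```

With this reading, an objective such as "Reduce p95 Latency!" would produce the branch name `improve/reduce_p95_latency`.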
130
+
131
+ ---
132
+
133
+ ## Git Strategy
134
+
135
+ All git operations happen inside the target repo, NOT in the OMC project root.
136
+
137
+ - **Improvement branch**: `improve/{goal_slug}` — accumulates winning changes only.
138
+ - **Experiment branches**: `experiment/round_{n}_executor_{id}` — short-lived, per executor.
139
+ - **Archive tags**: `archive/round_{n}_executor_{id}` — losing branches tagged before deletion.
140
+ - **Worktree setup** (SKILL.md creates before each executor):
141
+ ```
142
+ git -C {repo_path} worktree add worktrees/round_{n}_executor_{id} -b experiment/round_{n}_executor_{id} improve/{goal_slug}
143
+ ```
144
+ - **Winner merges** via `oh-my-claudecode:git-master`:
145
+ ```
146
+ Merge experiment/round_{n}_executor_{winner_id} into improve/{goal_slug} with --no-ff
147
+ Message: "Iteration {n}: {hypothesis} (score: {before} → {after})"
148
+ ```
149
+ - **Push after merge**: `git -C {repo_path} push origin improve/{goal_slug}` (backup, non-blocking)
150
+ - **Losers archived**: Tag + delete via git-master.
151
+
152
+ ---
153
+
154
+ ## Improvement Loop
155
+
156
+ **Gate**: All settings must be true. Once the gate passes, execute continuously without stopping.
157
+
158
+ Update `state_write(mode='self-improve', active=true, status="running")`.
159
+
160
+ ### Step 0 — Stale Worktree Cleanup (mandatory, runs every iteration)
161
+
162
+ **PREREQUISITE**: This step MUST run to completion before any other step, including resume logic. It is idempotent and safe to run multiple times.
163
+
164
+ 1. List all worktrees in the target repo: `git -C {repo_path} worktree list`
165
+ 2. For any worktree matching `worktrees/round_*` that does NOT belong to the current iteration: remove it with `git -C {repo_path} worktree remove {path} --force`
166
+ 3. Run `git -C {repo_path} worktree prune` to clean up stale references
167
+ 4. This handles crash recovery — orphaned worktrees from interrupted iterations are cleaned before the new iteration starts
168
+
169
+ ### Step 1 — Refresh State
170
+
171
+ Call `state_write(mode='self-improve', active=true, iteration=N)` to reset the 30-minute TTL.
172
+
173
+ ### Step 2 — Check Stop Request
174
+
175
+ Read state via `state_read(mode='self-improve')`.
176
+
177
+ If state is cleared (cancel was invoked) OR status is `user_stopped`:
178
+ a. Set `status: "user_stopped"` in `<self-improve-root>/state/agent-settings.json`
179
+ b. Update `iteration_state.json`: set `status: "interrupted"`, record `current_step`
180
+ c. Clean up any active worktrees for the current round (Step 0 logic)
181
+ d. Log: `"Self-improve stopped by user at iteration {N}, step {current_step}"`
182
+ e. Exit gracefully — do NOT invoke /cancel again (already cancelled)
183
+
184
+ ### Step 3 — Check User Ideas
185
+
186
+ Read `<self-improve-root>/config/idea.md`. If non-empty, snapshot its contents for the planners. Clear the file after the planners have consumed the snapshot.
187
+
188
+ ### Step 4 — Research
189
+
190
+ Spawn 1 general-purpose Agent(model=opus) with the content of `si-researcher.md` as prompt.
191
+
192
+ Pass in the prompt:
193
+ - Current iteration number
194
+ - Path to target repo
195
+ - Path to `<self-improve-root>/config/goal.md`
196
+ - Path to `<self-improve-root>/state/iteration_history/` (all prior records)
197
+ - Path to `<self-improve-root>/state/research_briefs/` (prior briefs)
198
+ - Content of `data_contracts.md` Section 3 (Research Brief schema)
199
+
200
+ Expected output: research brief JSON → `<self-improve-root>/state/research_briefs/round_{n}.json`
201
+
202
+ If researcher fails, proceed with history only.
203
+
204
+ ### Step 5 — Plan
205
+
206
+ Spawn N `oh-my-claudecode:planner`(model=opus) agents in parallel (N = `number_of_agents` from settings).
207
+
208
+ Pass in each planner's prompt:
209
+ - Planner identity (planner_a, planner_b, planner_c...)
210
+ - Research brief path
211
+ - Iteration history path
212
+ - Harness rules from `<self-improve-root>/config/harness.md`
213
+ - Data contract schema for Plan Document
214
+ - **Override instructions**: Output JSON (not markdown), skip interview mode, generate exactly ONE testable hypothesis per plan, include approach_family tag and history_reference.
215
+ - User ideas (if any, planner_a gets priority)
216
+
217
+ Expected output: Plan Document JSON → `<self-improve-root>/plans/round_{n}/plan_planner_{id}.json`
218
+
219
+ ### Step 6 — Review
220
+
221
+ For each plan, **sequentially** (architect before critic):
222
+
223
+ **6a. Architecture Review**: Spawn `oh-my-claudecode:architect` with the plan + 6-point checklist:
224
+ 1. Testability — is the hypothesis testable?
225
+ 2. Novelty — different from prior attempts?
226
+ 3. Scope — right-sized?
227
+ 4. Target files — exist, not sealed?
228
+ 5. Implementation clarity — executor can implement without guessing?
229
+ 6. Expected outcome — realistic given evidence?
230
+
231
+ Architect verdict is **advisory only**.
232
+
233
+ **6b. Critic Review**: Spawn `oh-my-claudecode:critic` with the plan + harness rules:
234
+ - H001: Exactly one hypothesis (reject if zero or multiple)
235
+ - H002: No approach_family repetition streak >= 3
236
+ - H003: Intra-round diversity (no two plans same family in same round)
237
+ - Schema validation against data_contracts.md
238
+ - History awareness check
239
+
240
+ Critic sets `critic_approved: true` or `false`. Plans with `false` are excluded from execution.
241
+
242
+ If ALL plans are rejected, log and skip to Step 9.
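The harness rules H002 and H003 from the critic review above can be expressed as two small checks. The Python sketch below is one possible reading, not the critic agent's actual logic. It assumes `prior_families` is the ordered list of approach families from earlier rounds and that the candidate plan counts toward the streak. Both function names are hypothetical.

```python
from collections import Counter
from typing import List


def h002_family_streak_ok(prior_families: List[str], candidate_family: str, limit: int = 3) -> bool:
    """H002: reject when the candidate would make a same-family streak reach `limit`."""
    streak = 1  # the candidate plan itself
    for family in reversed(prior_families):
        if family != candidate_family:
            break
        streak += 1
    return streak < limit


def h003_intra_round_diverse(round_families: List[str]) -> bool:
    """H003: no two plans in the same round may share an approach family."""
    return all(count == 1 for count in Counter(round_families).values())
```

For example, after two consecutive `data` rounds a third `data` plan would fail H002 under this reading, while an `optimization` plan would pass.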
243
+
244
+ ### Step 7 — Execute
245
+
246
+ For each approved plan, spawn `oh-my-claudecode:executor`(model=opus) in parallel.
247
+
248
+ **Before spawning**, create worktree:
249
+ ```
250
+ git -C {repo_path} worktree add worktrees/round_{n}_executor_{id} -b experiment/round_{n}_executor_{id} improve/{goal_slug}
251
+ ```
252
+
253
+ Pass in each executor's prompt:
254
+ - The approved plan JSON
255
+ - Worktree directory path
256
+ - Benchmark command from settings
257
+ - Sealed files list from settings
258
+ - Path to `scripts/validate.sh` in this skill directory
259
+ - Data contract schema for Benchmark Result
260
+ - **Override instructions**: Implement the plan faithfully, run validate.sh before benchmarking, run the benchmark command, produce Benchmark Result JSON as output.
261
+
262
+ Expected output: Benchmark Result JSON (written by executor or returned as output).
263
+
264
+ ### Step 8 — Tournament Selection
265
+
266
+ SKILL.md does this directly (not delegated):
267
+
268
+ 1. **Collect** all executor results
269
+ 2. **Filter** to `status: "success"` only. If zero candidates, skip to Step 9 (Record & Visualize).
270
+ 3. **Rank** by `benchmark_score` (respecting `benchmark_direction`)
271
+ 4. **Ranked-candidate loop** — for each candidate in rank order (best first):
272
+ a. **No-regression check**: candidate score must improve or hold even vs `best_score`, respecting `benchmark_direction` (`higher_is_better`: score >= best_score; `lower_is_better`: score <= best_score)
273
+ b. **Merge** via `oh-my-claudecode:git-master`: `git merge experiment/round_{n}_executor_{id} --no-ff -m "Iteration {n}: {hypothesis} (score: {before} → {after})"`
274
+ c. **Re-benchmark** on merged state to confirm improvement
275
+ d. If re-benchmark **confirms** improvement: **accept winner**, break loop
276
+ e. If re-benchmark shows **regression**: **revert merge** via `git -C {repo_path} reset --hard HEAD~1`, continue to next candidate
277
+ f. If merge **conflicts**: `git -C {repo_path} merge --abort`, continue to next candidate
278
+ 5. If a winner was accepted AND `auto_push` is `true` in settings: **Push** improvement branch: `git -C {repo_path} push origin improve/{goal_slug}` (non-blocking).
279
+ If `auto_push` is `false` (default): skip push. Log: `"Push skipped (auto_push: false). Run manually: git -C {repo_path} push origin improve/{goal_slug}"`
280
+ 6. **Archive** all non-winner branches via git-master: tag + delete
281
+ 7. If no candidate survived the loop: no merge this round. Improvement branch stays at prior state.
282
+ 8. **Write Merge Report** JSON to `<self-improve-root>/state/merge_reports/round_{n}.json` (schema: data_contracts.md Section 9).
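The ranked-candidate loop in the tournament steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the orchestrator's implementation. `merge_and_rebenchmark` is a hypothetical callback standing in for steps 4b to 4f: it is assumed to merge the candidate, re-run the benchmark, undo the merge itself on conflict or regression, and return the merged score or `None` on a merge conflict. "Confirms improvement" is read here as "no worse than `best_score`".

```python
from typing import Callable, List, Optional


def is_no_worse(score: float, best: float, direction: str) -> bool:
    """No-regression check, respecting benchmark_direction."""
    if direction == "higher_is_better":
        return score >= best
    if direction == "lower_is_better":
        return score <= best
    raise ValueError(f"unknown benchmark_direction: {direction!r}")


def select_winner(
    results: List[dict],
    best_score: float,
    direction: str,
    merge_and_rebenchmark: Callable[[dict], Optional[float]],
) -> Optional[dict]:
    """Return the first ranked candidate that survives merge and re-benchmark."""
    # Keep only successful runs, then rank best first.
    candidates = [r for r in results if r.get("status") == "success"]
    candidates.sort(
        key=lambda r: r["benchmark_score"],
        reverse=(direction == "higher_is_better"),
    )
    for candidate in candidates:
        if not is_no_worse(candidate["benchmark_score"], best_score, direction):
            continue  # fails the no-regression check
        merged_score = merge_and_rebenchmark(candidate)  # hypothetical callback
        if merged_score is not None and is_no_worse(merged_score, best_score, direction):
            return candidate  # accept winner and stop
    return None  # no merge this round
```

A `None` return corresponds to step 7: the improvement branch stays at its prior state and the round is recorded without a winner.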
283
+
284
+ ### Step 9 — Record & Visualize
285
+
286
+ 1. Write iteration history to `<self-improve-root>/state/iteration_history/round_{n}.json`
287
+ 2. Update `<self-improve-root>/state/agent-settings.json`:
288
+ - Increment `iterations` by 1
289
+ - If winner AND improvement meets or exceeds `plateau_threshold` (`abs(new_score - best_score) >= plateau_threshold`): update `best_score`, reset `plateau_consecutive_count = 0`, reset `circuit_breaker_count = 0`
290
+ - If winner AND improvement below threshold (`abs(new_score - best_score) < plateau_threshold`): update `best_score` if better, increment `plateau_consecutive_count += 1`, reset `circuit_breaker_count = 0`
291
+ - If no winner (all rejected, all failed, or all regressed): increment `circuit_breaker_count += 1` (do NOT increment `plateau_consecutive_count` — plateau tracks stagnating wins, not failures)
292
+ 3. Append to `<self-improve-root>/tracking/raw_data.json` (one entry per candidate)
293
+ 4. Run `python3 {skill_dir}/scripts/plot_progress.py --tracking-dir <self-improve-root>/tracking` for visualization
294
+ 5. Archive plans: copy current round plans to `state/plan_archive/round_{n}/`
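The counter updates in item 2 above amount to a three-way branch. A minimal Python sketch follows, assuming the state and settings are plain dictionaries with the field names used in this document. The function name `update_counters` is hypothetical, and the function returns a new dictionary instead of writing `agent-settings.json`.

```python
from typing import Optional


def update_counters(state: dict, settings: dict, winner_score: Optional[float] = None) -> dict:
    """Apply the per-round counter rules; winner_score is None when no winner emerged."""
    new = dict(state)
    new["iterations"] = state["iterations"] + 1
    if winner_score is None:
        # No winner: only the circuit breaker advances, plateau is untouched.
        new["circuit_breaker_count"] = state["circuit_breaker_count"] + 1
        return new

    best = state["best_score"]
    higher = settings["benchmark_direction"] == "higher_is_better"
    if (winner_score > best) if higher else (winner_score < best):
        new["best_score"] = winner_score

    if abs(winner_score - best) >= settings["plateau_threshold"]:
        new["plateau_consecutive_count"] = 0  # meaningful win
    else:
        new["plateau_consecutive_count"] = state["plateau_consecutive_count"] + 1
    new["circuit_breaker_count"] = 0  # any win resets the breaker
    return new
```

Keeping the plateau and circuit-breaker counters separate matches the note above: plateau tracks stagnating wins, while the circuit breaker tracks rounds with no winner at all.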
295
+
296
+ ### Step 10 — Cleanup
297
+
298
+ Remove worktrees:
299
+ ```
300
+ git -C {repo_path} worktree remove worktrees/round_{n}_executor_{id} --force
301
+ git -C {repo_path} worktree prune
302
+ ```
303
+
304
+ Update `iteration_state.json` status to `completed`.
305
+
306
+ ### Step 11 — Stop Condition Check
307
+
308
+ Evaluate ALL conditions. If ANY is true, exit:
309
+
310
+ | Condition | Check |
311
+ |---|---|
312
+ | User stop | `status == "user_stopped"` in agent-settings or state cleared |
313
+ | Target reached | `best_score` meets/exceeds `target_value` (respecting direction) |
314
+ | Plateau | `plateau_consecutive_count >= plateau_window` |
315
+ | Max iterations | `iterations >= max_iterations` |
316
+ | Circuit breaker | `circuit_breaker_count >= circuit_breaker_threshold` |
317
+
318
+ If NO stop condition: immediately go back to Step 1.
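The stop-condition table above maps directly onto a single function. The Python sketch below is illustrative only. The name `stop_reason` is hypothetical, the order in which conditions are tested simply follows the table, and the "state cleared" half of the user-stop check is left out because it depends on `state_read`.

```python
from typing import Optional


def stop_reason(state: dict, settings: dict) -> Optional[str]:
    """Return the first matching stop condition, or None to keep looping."""
    if state.get("status") == "user_stopped":
        return "user_stopped"

    best = state.get("best_score")
    target = settings.get("target_value")
    if best is not None and target is not None:
        higher = settings["benchmark_direction"] == "higher_is_better"
        if (best >= target) if higher else (best <= target):
            return "target_reached"

    if state["plateau_consecutive_count"] >= settings["plateau_window"]:
        return "plateau"
    if state["iterations"] >= settings["max_iterations"]:
        return "max_iterations"
    if state["circuit_breaker_count"] >= settings["circuit_breaker_threshold"]:
        return "circuit_breaker"
    return None
```

A `None` result corresponds to the final rule above: go straight back to Step 1 without pausing.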
319
+
320
+ ---
321
+
322
+ ## Resumability
323
+
324
+ **PREREQUISITE**: Step 0 (stale worktree cleanup) MUST run to completion before any resume logic executes, regardless of prior state.
325
+
326
+ On invocation, before entering the loop:
327
+
328
+ 1. **Always run Step 0** (stale worktree cleanup) — even on fresh start
329
+ 2. Read `<self-improve-root>/state/agent-settings.json`:
330
+ - If `status: "user_stopped"`: ask user `"Previous run was stopped at iteration {N}. Resume? [yes/no]"`. If no, exit. If yes, continue.
331
+ - If `status: "running"`: session crashed — resume automatically (no user prompt)
332
+ - If `status: "idle"`: fresh start
333
+ 3. Re-confirm trust gate only if `trust_confirmed` is `false` in agent-settings.json
334
+ 4. Read `<self-improve-root>/state/iteration_state.json`:
335
+ - `status: "in_progress"` → resume from `current_step`, skip completed sub-steps
336
+ - `status: "completed"` → start next iteration
337
+ - `status: "failed"` → complete recording step if needed, start next iteration
338
+ - File missing → start from iteration 1
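The resume rules above form two lookup tables, one keyed on the run status and one on the iteration status. A hedged Python sketch follows. The function names and the returned action labels are hypothetical, and any status the text does not list (for example the `interrupted` value written in Step 2) is reported as `unspecified` instead of being guessed at.

```python
from typing import Optional


def run_level_action(agent_status: str) -> str:
    """Map the agent-settings.json status to the documented resume behavior."""
    actions = {
        "user_stopped": "ask_user_to_resume",
        "running": "resume_automatically",  # previous session crashed
        "idle": "fresh_start",
    }
    return actions.get(agent_status, "unspecified")


def iteration_level_action(iteration_state: Optional[dict]) -> str:
    """Map iteration_state.json (None when the file is missing) to the next action."""
    if iteration_state is None:
        return "start_from_iteration_1"
    actions = {
        "in_progress": "resume_from_current_step",
        "completed": "start_next_iteration",
        "failed": "finish_recording_then_start_next_iteration",
    }
    return actions.get(iteration_state.get("status"), "unspecified")
```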
339
+
340
+ ---
341
+
342
+ ## Completion
343
+
344
+ When the loop exits:
345
+
346
+ 1. Update agent-settings.json with final status
347
+ 2. If `target_reached` AND `auto_pr` is `true` in settings: spawn git-master to create PR from `improve/{goal_slug}` to upstream.
348
+ If `auto_pr` is `false` (default): skip PR creation. Log: `"PR creation skipped (auto_pr: false). Run manually: gh pr create --head improve/{goal_slug} --base {target_branch}"`
349
+ 3. Run plot_progress.py one final time
350
+ 4. Print summary report:
351
+ ```
352
+ === Self-Improvement Loop Complete ===
353
+ Status: {status}
354
+ Iterations: {iterations}
355
+ Best Score: {best_score} (baseline: {baseline})
356
+ Improvement: {delta} ({delta_pct}%)
357
+ ```
358
+ 5. Run `/oh-my-claudecode:cancel` for clean state cleanup
359
+
360
+ ---
361
+
362
+ ## Error Handling
363
+
364
+ | Situation | Action |
365
+ |---|---|
366
+ | Agent fails to produce output | Retry once. If still no output, log and continue. |
367
+ | Researcher produces empty brief | Proceed — planners work from history alone. |
368
+ | All plans rejected by critic | Skip execution. Log. Continue to next iteration. |
369
+ | All executors fail | Skip tournament. Record failures. Continue. |
370
+ | Merge conflict | Reject candidate, try next. |
371
+ | Re-benchmark regression | Reject candidate, revert merge, try next. |
372
+ | Push failure | Log warning. Continue — push is backup. |
373
+ | Worktree already exists | Remove and recreate. |
374
+ | Settings corrupted | Report and stop. |
375
+
376
+ ---
377
+
378
+ ## Approach Family Taxonomy
379
+
380
+ Every plan must be tagged with exactly one:
381
+
382
+ | Tag | Description |
383
+ |-----|-------------|
384
+ | `architecture` | Model/component structure changes |
385
+ | `training_config` | Optimizer, LR, scheduler, batch size |
386
+ | `data` | Data loading, augmentation, preprocessing |
387
+ | `infrastructure` | Mixed precision, distributed training, compiled kernels |
388
+ | `optimization` | Algorithmic/numerical optimizations |
389
+ | `testing` | Evaluation methodology changes |
390
+ | `documentation` | Documentation-only changes |
391
+ | `other` | Does not fit above — explain in evidence |
@@ -0,0 +1,274 @@
1
+ # Data Contracts: Inter-Agent Communication Schemas
2
+
3
+ Canonical JSON schemas for all messages exchanged between agents in the self-improvement loop.
4
+
5
+ ## 1. Plan Document
6
+
7
+ **Producer:** planner | **Consumer:** critic, executor
8
+
9
+ ```json
10
+ {
11
+ "plan_id": "round_{N}_{planner_id}",
12
+ "planner_id": "planner_a|planner_b|planner_c",
13
+ "round": 1,
14
+ "hypothesis": "Doing X should improve Y because Z",
15
+ "approach_family": "<taxonomy value>",
16
+ "critic_approved": false,
17
+ "target_files": ["path/to/file1"],
18
+ "steps": [
19
+ { "step": 1, "file": "path/to/file", "change": "exact description" }
20
+ ],
21
+ "expected_outcome": {
22
+ "metric": "<metric from goal>",
23
+ "estimated_impact": "<quantified estimate>",
24
+ "rationale": "<why>",
25
+ "sub_score_expectations": {}
26
+ },
27
+ "history_reference": {
28
+ "builds_on": "<prior success or 'none'>",
29
+ "avoids": "<prior failure or 'none'>"
30
+ },
31
+ "critic_review": {
32
+ "h001_hypothesis_count": "pass|fail",
33
+ "h002_family_streak": "pass|fail",
34
+ "h003_intra_round_diversity": "pass|fail",
35
+ "schema_valid": "pass|fail",
36
+ "history_aware": "pass|fail",
37
+ "verdict": "approved|rejected",
38
+ "rejection_reason": null
39
+ },
40
+ "architect_review": {
41
+ "verdict": "approve|reject",
42
+ "feedback": "",
43
+ "structural_concerns": []
44
+ }
45
+ }
46
+ ```
47
+
48
+ ## 2. Benchmark Result
49
+
50
+ **Producer:** executor | **Consumer:** tournament (SKILL.md)
51
+
52
+ ```json
53
+ {
54
+ "executor_id": "executor_{id}",
55
+ "plan_id": "round_{n}_planner_{x}",
56
+ "benchmark_score": 85.2,
57
+ "benchmark_raw": "full stdout verbatim",
58
+ "status": "success|regression|error|timeout",
59
+ "sub_scores": { "dim_a": 85.2, "dim_b": 42.3 },
60
+ "failure_analysis": null,
61
+ "timestamp": "ISO 8601 UTC"
62
+ }
63
+ ```
64
+
65
+ **Status definitions:**
66
+ - `success` — score improved or held even
67
+ - `regression` — score dropped below baseline
68
+ - `error` — benchmark could not run
69
+ - `timeout` — exceeded time limit
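A consumer of the Benchmark Result contract above might validate incoming results before ranking them. The Python sketch below shows one way to do that. It is not part of the package: `validate_benchmark_result` is a hypothetical helper, treating every key in the example as required is an assumption, and so is requiring a numeric score only for `success` and `regression`.

```python
from numbers import Real
from typing import List

# ASSUMPTION: every key shown in the example payload is mandatory.
REQUIRED_KEYS = (
    "executor_id",
    "plan_id",
    "benchmark_score",
    "benchmark_raw",
    "status",
    "sub_scores",
    "failure_analysis",
    "timestamp",
)
ALLOWED_STATUSES = {"success", "regression", "error", "timeout"}


def validate_benchmark_result(result: dict) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in result]
    status = result.get("status")
    if status not in ALLOWED_STATUSES:
        errors.append(f"invalid status: {status!r}")
    score = result.get("benchmark_score")
    # ASSUMPTION: only scored runs need a numeric benchmark_score.
    if status in {"success", "regression"}:
        if isinstance(score, bool) or not isinstance(score, Real):
            errors.append("benchmark_score must be a number")
    return errors
```

Returning a list of problems instead of raising lets the orchestrator log every issue for a failed executor and still continue with the remaining candidates.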
70
+
71
+ ## 3. Research Brief
72
+
73
+ **Producer:** researcher | **Consumer:** planners
74
+
75
+ ```json
76
+ {
77
+ "iteration": 1,
78
+ "researcher_id": "researcher",
79
+ "repo_analysis_summary": "...",
80
+ "ideas": [
81
+ {
82
+ "title": "Short action name",
83
+ "source": "Specific origin",
84
+ "evidence": "Concrete evidence",
85
+ "approach_family": "<taxonomy value>",
86
+ "confidence": "high|medium|low",
87
+ "estimated_impact": "3-5%"
88
+ }
89
+ ]
90
+ }
91
+ ```
92
+
93
+ ## 4. Iteration History Record
94
+
95
+ **Producer:** orchestrator | **Consumer:** planners, researcher
96
+
97
+ ```json
98
+ {
99
+ "iteration": 1,
100
+ "baseline_score": 80.0,
101
+ "winner": {
102
+ "plan_id": "round_1_planner_a",
103
+ "score": 85.2,
104
+ "approach_family": "training_config",
105
+ "hypothesis": "...",
106
+ "sub_scores": {}
107
+ },
108
+ "losers": [
109
+ {
110
+ "plan_id": "round_1_planner_b",
111
+ "score": 78.5,
112
+ "approach_family": "architecture",
113
+ "hypothesis": "...",
114
+ "sub_scores": {},
115
+ "failure_analysis": {
116
+ "what": "Score dropped",
117
+ "why": "Root cause",
118
+ "category": "regression",
119
+ "lesson": "Actionable lesson"
120
+ }
121
+ }
122
+ ],
123
+ "research_brief_id": "round_1"
124
+ }
125
+ ```
126
+
127
+ ## 5. Visualization Data
128
+
129
+ **File:** `<self-improve-root>/tracking/raw_data.json` — top-level JSON array, append-only.
130
+
131
+ ```json
132
+ [
133
+ {
134
+ "iteration": 1,
135
+ "plan_id": "round_1_planner_a",
136
+ "benchmark_score": 85.2,
137
+ "is_winner": true,
138
+ "approach_family": "training_config",
139
+ "sub_scores": {}
140
+ }
141
+ ]
142
+ ```
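One plausible way to produce these rows is to flatten an Iteration History Record (section 4) into one row per plan. The sketch below assumes that `score` in the history record maps to `benchmark_score` here; the function name `history_to_rows` is hypothetical.

```python
from typing import Any, Dict, List


def history_to_rows(record: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten one history record into visualization rows (winner first)."""

    def row(entry: Dict[str, Any], is_winner: bool) -> Dict[str, Any]:
        return {
            "iteration": record["iteration"],
            "plan_id": entry["plan_id"],
            # Assumption: the history "score" is the plan's benchmark score.
            "benchmark_score": entry["score"],
            "is_winner": is_winner,
            "approach_family": entry["approach_family"],
            "sub_scores": entry.get("sub_scores", {}),
        }

    rows: List[Dict[str, Any]] = []
    winner = record.get("winner")
    if winner:
        rows.append(row(winner, True))
    rows.extend(row(loser, False) for loser in record.get("losers", []))
    return rows
```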
143
+
144
+ ## 6. Approach Family Taxonomy
145
+
146
+ | Tag | Description |
147
+ |-----|-------------|
148
+ | `architecture` | Model/component structure changes |
149
+ | `training_config` | Optimizer, LR, scheduler, batch size, epochs |
150
+ | `data` | Data loading, augmentation, preprocessing |
151
+ | `infrastructure` | Mixed precision, distributed training, checkpointing |
152
+ | `optimization` | Algorithmic/numerical optimizations |
153
+ | `testing` | Evaluation methodology changes |
154
+ | `documentation` | Documentation-only changes |
155
+ | `other` | Does not fit above — explain in evidence |
156
+
157
+ Custom families from harness.md are also valid.
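A minimal sketch of validating an `approach_family` value against the table above. How custom families are declared in harness.md is not specified here, so the sketch takes them as an already-parsed collection; `is_valid_family` and `custom_families` are hypothetical names.

```python
from typing import Iterable

# Built-in tags, copied from the taxonomy table.
BUILTIN_FAMILIES = frozenset({
    "architecture", "training_config", "data", "infrastructure",
    "optimization", "testing", "documentation", "other",
})


def is_valid_family(tag: str, custom_families: Iterable[str] = ()) -> bool:
    """Accept built-in tags plus any custom families supplied by the caller."""
    return tag in BUILTIN_FAMILIES or tag in set(custom_families)
```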
158
+
159
+ ## 7. Failure Analysis Object
160
+
161
+ ```json
162
+ {
163
+ "what": "Factual description with scores/errors",
164
+ "why": "Root cause mechanism",
165
+ "category": "oom|timeout|regression|logic_error|scope_error|infrastructure|benchmark_parse_error|sealed_file_violation",
166
+ "lesson": "Actionable lesson for future planners"
167
+ }
168
+ ```
169
+
170
+ ## 8. Iteration State
171
+
172
+ **File:** `<self-improve-root>/state/iteration_state.json` — tracks within-iteration progress.
173
+
174
+ ```json
175
+ {
176
+ "iteration": 1,
177
+ "status": "in_progress|completed|failed|interrupted",
178
+ "current_step": "research|planning|critic_review|execution|tournament|recording|stop_check",
179
+ "started_at": "ISO 8601",
180
+ "updated_at": "ISO 8601",
181
+ "research": { "status": "pending|in_progress|completed|failed", "output_path": null, "completed_at": null },
182
+ "planning": {
183
+ "status": "pending|in_progress|completed",
184
+ "plans": {
185
+ "planner_a": { "status": "completed", "output_path": "...", "critic_approved": true }
186
+ },
187
+ "approved_count": 2,
188
+ "completed_at": null
189
+ },
190
+ "execution": {
191
+ "status": "pending|in_progress|completed",
192
+ "executors": {
193
+ "executor_1": { "status": "running", "plan_id": "...", "output_path": null, "benchmark_score": null }
194
+ },
195
+ "completed_at": null
196
+ },
197
+ "tournament": { "status": "pending", "winner": null, "winner_score": null, "completed_at": null },
198
+ "recording": { "status": "pending", "history_path": null, "visualization_updated": false, "cleanup_done": false },
199
+ "user_ideas_consumed": []
200
+ }
201
+ ```
202
+
203
+ ## 9. Merge Report
204
+
205
+ **Producer:** tournament (SKILL.md) | **Consumer:** orchestrator
206
+
207
+ ```json
208
+ {
209
+ "iteration": 3,
210
+ "goal_slug": "reduce_latency",
211
+ "winner": {
212
+ "executor_id": "executor_2",
213
+ "branch": "experiment/round_3_executor_2",
214
+ "hypothesis": "Cache intermediate results",
215
+ "score_before": 142.3,
216
+ "score_after": 118.7,
217
+ "sub_scores": {}
218
+ },
219
+ "archived": ["archive/round_3_executor_1"],
220
+ "regressions_detected": false,
221
+ "re_benchmark_score": 118.7,
222
+ "status": "merged|no_improvement|no_winner|all_rejected",
223
+ "reason": null
224
+ }
225
+ ```
226
+
227
+ **Status definitions:**
228
+ - `merged` — a candidate was merged and re-benchmark confirmed improvement
229
+ - `no_improvement` — candidates existed and were tested, but all failed re-benchmark (no merge occurred)
230
+ - `no_winner` — all executors failed or produced non-success status (no candidates to evaluate)
231
+ - `all_rejected` — all plans were rejected by the critic (execution was skipped)
232
+
233
+ `reason` must be a string whenever `status` is anything other than `merged`, and must be `null` when `status` is `merged`.
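The rule tying `reason` to `status` can be sketched as a small check. This is an illustration under the assumption that an empty or whitespace-only string does not count as a reason; `check_merge_report` is a hypothetical helper name.

```python
from typing import List

MERGE_STATUSES = {"merged", "no_improvement", "no_winner", "all_rejected"}


def check_merge_report(report: dict) -> List[str]:
    """Return a list of problems with status/reason; empty means consistent."""
    problems: List[str] = []
    status = report.get("status")
    reason = report.get("reason")
    if status not in MERGE_STATUSES:
        problems.append(f"unknown status: {status!r}")
    elif status == "merged":
        if reason is not None:
            problems.append("reason must be null when status is 'merged'")
    elif not isinstance(reason, str) or not reason.strip():
        # Assumption: a blank string is treated the same as a missing reason.
        problems.append(f"reason is required when status is {status!r}")
    return problems
```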
234
+
235
+
236
+ ## 10. Plan Archive
237
+
238
+ **Location:** `<self-improve-root>/state/plan_archive/round_{n}/`
239
+
240
+ Holds exact copies of all plan JSON files, including their critic and architect reviews. Archived plans are retained permanently.
241
+
242
+ ## 11. Event Log
243
+
244
+ **File:** `<self-improve-root>/tracking/events.json` — append-only array.
245
+
246
+ ```json
247
+ [
248
+ {
249
+ "timestamp": "ISO 8601",
250
+ "event_type": "config_change|phase_transition",
251
+ "iteration": 5,
252
+ "details": {
253
+ "field": "number_of_agents",
254
+ "old_value": 2,
255
+ "new_value": 3,
256
+ "source": "user"
257
+ }
258
+ }
259
+ ]
260
+ ```
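A minimal sketch of appending to the event log, assuming the file is read, extended, and rewritten as a whole JSON array. The function name and signature are hypothetical, and the rewrite is not atomic, so concurrent writers would need extra care.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

EVENT_TYPES = {"config_change", "phase_transition"}


def append_event(path: Path, event_type: str, iteration: int, details: dict) -> list:
    """Append one event to the JSON array at `path` and return the full log."""
    if event_type not in EVENT_TYPES:
        raise ValueError(f"unknown event_type: {event_type!r}")
    # A missing file is treated as an empty log.
    events = json.loads(path.read_text()) if path.exists() else []
    events.append({
        "timestamp": datetime.now(timezone.utc).isoformat(),  # ISO 8601
        "event_type": event_type,
        "iteration": iteration,
        "details": details,
    })
    path.write_text(json.dumps(events, indent=2))
    return events
```

Existing entries are never modified or removed, which keeps the array append-only.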
261
+
262
+ ## 12. Goal Phase
263
+
264
+ Phases are defined in goal.md under `## Phases`. The current phase is tracked in agent-settings.json as `current_phase`.
265
+
266
+ ```markdown
267
+ ## Phases
268
+ | Phase | Focus | Sub-Score Targets | Status |
269
+ |-------|-------|-------------------|--------|
270
+ | phase_1 | Primary dimension | dim_a >= 90.0 | active |
271
+ | phase_2 | Secondary dimension | dim_b <= 50.0 | pending |
272
+ ```
273
+
274
+ Phase transitions are tracked as events but do not affect tournament selection.
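The Sub-Score Targets column can be checked against a `sub_scores` object with a small parser. This sketch assumes every target has the form `<name> <op> <number>`, as in the two example rows; the grammar and the name `target_met` are assumptions, not something the package defines.

```python
import operator
import re

_OPS = {">=": operator.ge, "<=": operator.le, "==": operator.eq,
        ">": operator.gt, "<": operator.lt}
# Two-character operators come first so ">=" is not read as ">".
_TARGET = re.compile(r"^\s*(\w+)\s*(>=|<=|==|>|<)\s*(-?\d+(?:\.\d+)?)\s*$")


def target_met(target: str, sub_scores: dict) -> bool:
    """Evaluate one target such as 'dim_a >= 90.0' against sub_scores."""
    match = _TARGET.match(target)
    if match is None:
        raise ValueError(f"unrecognized target: {target!r}")
    name, op, value = match.groups()
    if name not in sub_scores:
        return False  # a missing sub-score cannot satisfy its target
    return _OPS[op](sub_scores[name], float(value))
```

With the Benchmark Result example values, `target_met("dim_a >= 90.0", {"dim_a": 85.2, "dim_b": 42.3})` is false and `target_met("dim_b <= 50.0", {"dim_a": 85.2, "dim_b": 42.3})` is true.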