niahere 0.2.57 → 0.2.59

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -1,5 +1,9 @@
1
1
  # nia
2
2
 
3
+ [![npm version](https://img.shields.io/npm/v/niahere.svg)](https://www.npmjs.com/package/niahere)
4
+ [![npm downloads](https://img.shields.io/npm/dm/niahere.svg)](https://www.npmjs.com/package/niahere)
5
+ [![license](https://img.shields.io/npm/l/niahere.svg)](https://github.com/onlyoneaman/niahere/blob/main/LICENSE)
6
+
3
7
  A personal AI agent you fork and make your own. Small enough to understand, built for one user. Powered by Claude Agent SDK.
4
8
 
5
9
  - npm package: [`niahere`](https://www.npmjs.com/package/niahere)
@@ -33,7 +37,7 @@ nia start # starts daemon + registers OS service
33
37
  - **Telegram** — message your agent from your phone, typing indicator while processing
34
38
  - **Slack** — Socket Mode bot with thread awareness, thinking emoji, watch channels for proactive monitoring
35
39
  - **Terminal chat** — REPL with session resume support
36
- - **Scheduled jobs** — recurring jobs and crons that run Claude and can message you back
40
+ - **Scheduled jobs** — recurring jobs and crons that run Claude and can message you back. Stateful by default (working memory), per-job model routing for cost savings
37
41
  - **Persona system** — customizable identity, soul, owner profile, rules, and memory (preloaded every session)
38
42
  - **Agents** — domain specialists (marketer, senior-dev) via Claude Agent SDK subagents
39
43
  - **Skills** — loads skills from multiple directories, invokable as slash commands
@@ -63,8 +67,8 @@ nia update — update to latest version (auto-backup + resta
63
67
  nia job list — list all jobs
64
68
  nia job show [name] — full details + recent runs
65
69
  nia job status [name] — quick status check
66
- nia job add <n> <s> <p> — add a job (--type, --always, --agent, --stateless, --prompt-file)
67
- nia job update <name> — update a job (--schedule, --prompt, --prompt-file, --type, --always, --agent, --stateless)
70
+ nia job add <n> <s> <p> — add a job (--type, --always, --agent, --model, --stateless, --prompt-file)
71
+ nia job update <name> — update a job (--schedule, --prompt, --prompt-file, --type, --always, --agent, --model, --stateless)
68
72
  nia job remove <name> — delete a job
69
73
  nia job enable / disable <n> — toggle a job
70
74
  nia job run <name> — run a job once
@@ -106,6 +110,8 @@ All config and data lives in `~/.niahere/`:
106
110
  soul.md — how the agent works
107
111
  rules.md — behavioral instructions (loaded every session)
108
112
  memory.md — persistent facts and context (loaded every session)
113
+ jobs/ — per-job working memory and state (auto-created)
114
+ optimizations/ — optimization loop run workspaces
109
115
  images/
110
116
  reference.webp — visual identity reference image
111
117
  profile.webp — profile picture for Telegram/Slack
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "niahere",
3
- "version": "0.2.57",
3
+ "version": "0.2.59",
4
4
  "description": "A personal AI assistant daemon — chat, scheduled jobs, persona system, extensible via skills.",
5
5
  "type": "module",
6
6
  "scripts": {
@@ -43,7 +43,7 @@
43
43
  "license": "MIT",
44
44
  "private": false,
45
45
  "dependencies": {
46
- "@anthropic-ai/claude-agent-sdk": "^0.2.74",
46
+ "@anthropic-ai/claude-agent-sdk": "^0.2.97",
47
47
  "@modelcontextprotocol/sdk": "^1.27.1",
48
48
  "@slack/bolt": "^4.6.0",
49
49
  "cron-parser": "^5.5.0",
@@ -3,9 +3,8 @@ name: beads-tasks
3
3
  description: >
4
4
  Persistent task management via Beads CLI (bd). Use when user mentions tasks, todos, issues, or tracking work.
5
5
  Check `which bd` first — if missing, offer: `npm install -g @beads/bd`.
6
- Prefer setting `BEATS_DIR` (for example `~/.niahere/beads`) and run commands from there:
7
- `cd "$BEATS_DIR" && bd <command>`. Always label: `--label project:<project-name>`.
8
- Run `cd "$BEATS_DIR" && bd help-all` for available commands. Not for ephemeral in-conversation tracking.
6
+ All commands: run from `$BEATS_DIR` (for example `~/.niahere/beads`) and use `bd <command>`. Always label: `--label project:<project-name>`.
7
+ Run `bd help-all` for available commands. Not for ephemeral in-conversation tracking.
9
8
  ---
10
9
 
11
10
  ## Overview
@@ -19,7 +18,63 @@ Global task manager powered by [Beads](https://github.com/steveyegge/beads). Sto
19
18
  2. Ensure `~/.niahere/beads/.beads` exists — `bd init` if not.
20
19
  3. Set `BEATS_DIR` to your Beads workspace (for example `~/.niahere/beads`).
21
20
  4. All commands: `cd "$BEATS_DIR" && bd <command>`.
22
- 5. Always label with `--label project:<project-name>`.
21
+ 5. Always label with `--label project:<name>`.
22
+ 6. Run `cd "$BEATS_DIR" && bd help-all` for available commands.
23
+
24
+ ## Core Commands
25
+
26
+ ### Creating tasks
27
+
28
+ ```bash
29
+ # Basic
30
+ bd create --title "Fix auth token refresh" --priority P2 --type bug
31
+
32
+ # With parent (subtask)
33
+ bd create --title "Extract shared logic" --priority P2 --type task --parent <parent-id>
34
+
35
+ # With description — ALWAYS add context: what's broken, links, references
36
+ bd create --title "Chat fails on long docs" --type bug --description "Fails on docs >500 pages. Ref: https://..."
37
+
38
+ # Epic (container for related tasks)
39
+ bd create --title "API performance improvements" --type epic --priority P2
40
+ ```
41
+
42
+ ### Updating tasks
43
+
44
+ `bd update` is the workhorse — use it for reparenting, reprioritizing, retyping, renaming:
45
+
46
+ ```bash
47
+ bd update <id> --parent <new-parent-id> # Reparent / move under epic
48
+ bd update <id> --parent "" # Remove parent (make top-level)
49
+ bd update <id> --priority P1 # Change priority
50
+ bd update <id> --type bug # Change type
51
+ bd update <id> --title "Better title" # Rename
52
+ bd update <id> --status in_progress # Start work
53
+ bd update <id> --description "..." # Add/replace description
54
+ bd update <id> --add-label personal # Add label
55
+ bd update <id> --set-labels bug,urgent # Replace all labels
56
+ ```
57
+
58
+ Chain multiple updates: `bd update <id> --priority P1 --type bug --parent <parent-id>`
59
+
60
+ ### Viewing tasks
61
+
62
+ ```bash
63
+ bd list # Open tasks (tree view)
64
+ bd list --all # Include closed/deferred tasks
65
+ bd list --label project:<name> # Filter by project
66
+ bd show <id> # Full details of a task
67
+ bd children <id> # List children of a parent
68
+ ```
69
+
70
+ ### Closing tasks
71
+
72
+ ```bash
73
+ bd close <id> # Close a task
74
+ bd reopen <id> # Reopen if closed prematurely
75
+ ```
76
+
77
+ **Warning:** Closing a parent does NOT close or reparent its children. If a parent epic is done but children remain open, reparent them first or they become orphaned top-level items.
23
78
 
24
79
  ## Decision Points
25
80
 
@@ -29,22 +84,45 @@ Global task manager powered by [Beads](https://github.com/steveyegge/beads). Sto
29
84
  - User says "done with X" / "finished" → `bd close <id>`
30
85
  - User wants to see cross-project work → `bd list` (no project filter)
31
86
  - User wants project-specific view → `bd list --label project:<name>`
32
- - User asks for task in-progress state → use task status:
33
- - `bd update <task-id> --status in_progress`
34
- - `bd set-state ... state=...` is for operational metadata only and does not
35
- change list-visible task status.
36
87
  - bd not installed → offer install, don't silently fail
37
88
  - Ephemeral/conversation-only tracking → use conversation context, not beads
89
+ - `bd set-state ... state=...` is for operational metadata only; it does not change the task status shown in list.
90
+
91
+ ## Hierarchy & Organization
92
+
93
+ ### When to use parent-child vs labels
94
+
95
+ - **Parent-child** (`--parent`): for structural grouping — epics containing subtasks, features broken into steps.
96
+ - **Labels** (`--add-label`): for cross-cutting tags — `personal`, `urgent`, `project:<name>`. A task can have multiple labels but only one parent.
97
+
98
+ ### Epic patterns
99
+
100
+ - Use `--type epic` for containers that group related work.
101
+ - Epics can nest: epic > sub-epic > tasks.
102
+ - Keep epic titles broad ("API improvements"), subtask titles specific ("Reduce /search latency from 2s to 200ms").
103
+
104
+ ### Cleanup & auditing
105
+
106
+ Periodically review with `bd list` and look for:
107
+ - **Orphaned tasks** — top-level items that should be under an epic.
108
+ - **Similar ungrouped tasks** — multiple tasks on the same topic that should share a parent.
109
+ - **Misplaced tasks** — bugs under improvement epics or vice versa.
110
+ - **Stale tasks** — open tasks that are actually done or no longer relevant.
111
+
112
+ When reorganizing, reparent with `bd update <id> --parent <new-parent>` — don't delete and recreate.
38
113
 
39
114
  ## Conventions
40
115
 
41
- - Titles: descriptive, actionable (e.g. "Fix auth token refresh in niahere")
42
- - Labels: `project:<name>`, `bug`, `feature`, `chore`, `urgent`
43
- - Priority: default P2 unless user specifies urgency
44
- - Status flow: `open``in_progress` `closed`
116
+ - **Titles:** descriptive, actionable (e.g. "Fix auth token refresh in niahere")
117
+ - **Descriptions:** always include context what's broken, why it matters, links to references (Canny, threads, logs). Future you needs enough to start working without asking questions.
118
+ - **Types:** `epic`, `bug`, `feature`, `task`, `chore`, `decision`
119
+ - **Priority:** P0 (critical)P4 (nice-to-have). Default P2 unless user specifies.
120
+ - **Labels:** `project:<name>`, `personal`, `bug`, `feature`, `chore`, `urgent`
121
+ - **Status flow:** `open` → `in_progress` → `closed`
45
122
 
46
123
  ## Validation
47
124
 
48
- - `cd "$BEATS_DIR" && bd list` returns results after creating a task
125
+ - `bd list` returns results after creating a task
49
126
  - Labels appear correctly in list output
127
+ - Parent-child relationships show as indented tree in `bd list`
50
128
  - Dependencies show in `bd dep tree`
@@ -0,0 +1,230 @@
1
+ ---
2
+ name: optimization-loop
3
+ description: |
4
+ The iterative optimization pattern (Karpathy Loop / autoresearch). Reference for running
5
+ autonomous experiment loops on any target: modify → score → keep or revert → repeat.
6
+ Use when running multiple iterations of improvement against a measurable metric — code
7
+ benchmarks, prompt quality, copy effectiveness, config tuning, or any scorable target.
8
+ Also known as "autoresearch." Use this skill to understand the pattern and discipline.
9
+ For orchestration (scheduling, user confirmation, job setup), see the "optimize" skill.
10
+ metadata:
11
+ version: 1.0.0
12
+ ---
13
+
14
+ # Optimization Loop
15
+
16
+ The Karpathy Loop: autonomous iterative optimization through disciplined experimentation.
17
+ Modify a target, score the result, keep improvements, revert failures, repeat.
18
+
19
+ This skill defines the **pattern and discipline**. For when/how to schedule and orchestrate
20
+ optimization runs, see the `optimize` skill.
21
+
22
+ ## The Pattern
23
+
24
+ ```
25
+ freeze contract + rubric
26
+ save baseline (never touch again)
27
+ copy baseline → current-best
28
+
29
+ repeat:
30
+ 1. read state — what's been tried, what worked
31
+ 2. hypothesize — form a specific idea, informed by history
32
+ 3. modify — produce a candidate version
33
+ 4. gate check — hard constraints pass? if no → reject
34
+ 5. score — compare candidate vs current-best (pairwise)
35
+ 6. decide — clearly better? keep. otherwise revert.
36
+ 7. log — append to results.jsonl
37
+ 8. update state — what you tried, what happened, what next
38
+
39
+ until: budget exhausted, target reached, or plateau detected
40
+ notify user with summary
41
+ ```
42
+
43
+ ## Workspace Layout
44
+
45
+ Each optimization run gets a dedicated, self-contained directory:
46
+
47
+ ```
48
+ ~/.niahere/optimizations/{slug}-{hex}/
49
+ ├── contract.md # Frozen at start: objective, scope, constraints, metrics, budget
50
+ ├── rubric.md # Frozen at start: scoring criteria (never modify during run)
51
+ ├── baseline.md # Original version (never modify)
52
+ ├── current-best.md # Best version so far (update only on accept)
53
+ ├── accepted/ # Every accepted candidate, numbered
54
+ │ ├── 001.md
55
+ │ ├── 002.md
56
+ │ └── ...
57
+ ├── results.jsonl # One JSON object per experiment (append-only)
58
+ └── state.md # Your working notebook
59
+ ```
60
+
61
+ **The slug** is human-readable (e.g., `signup-prompt`). The hex suffix (4 chars) prevents
62
+ collisions across multiple runs on the same target.
63
+
64
+ ## The Contract (contract.md)
65
+
66
+ Freeze this at the start. Never modify during the run.
67
+
68
+ ```markdown
69
+ # Optimization Contract
70
+
71
+ ## Objective
72
+
73
+ [What we're optimizing and why — one sentence]
74
+
75
+ ## Target
76
+
77
+ [File path or content being modified]
78
+ [Which sections/parts are in scope — be specific]
79
+
80
+ ## Primary Metric
81
+
82
+ [The metric being optimized — what "better" means]
83
+
84
+ ## Secondary Metrics (regression guards)
85
+
86
+ [Metrics that must NOT degrade. Each with a threshold.]
87
+
88
+ - [e.g., "Word count must stay under 200"]
89
+ - [e.g., "All existing tests must pass"]
90
+ - [e.g., "Readability score must stay above grade 8"]
91
+
92
+ ## Hard Constraints
93
+
94
+ [Violations = automatic reject, no exceptions]
95
+
96
+ - [e.g., "Must mention the free trial"]
97
+ - [e.g., "Must pass lint and type check"]
98
+
99
+ ## Soft Preferences
100
+
101
+ [Tiebreakers — not vetoes, but guide decisions]
102
+
103
+ - [e.g., "Prefer shorter over longer"]
104
+ - [e.g., "Prefer simple over clever"]
105
+
106
+ ## Budget
107
+
108
+ - Max iterations: [N]
109
+ - Max wall-clock time: [hours]
110
+
111
+ ## Stop Rules
112
+
113
+ - All iterations completed
114
+ - Target score reached: [if applicable]
115
+ - Plateau: [N] consecutive discards (default 5)
116
+ ```
117
+
118
+ ## Scoring
119
+
120
+ ### For code targets
121
+
122
+ Run a benchmark or test command. Extract the metric. The command is fixed in the contract
123
+ and cannot be modified during the run.
124
+
125
+ ```
126
+ 1. Gate check: tests pass? lint clean? types check? → if any fail, reject immediately
127
+ 2. Run benchmark command → extract primary metric
128
+ 3. Check secondary metrics for regressions → if any violated, reject
129
+ 4. Compare primary metric against current-best
130
+ 5. Accept only if clearly improved (above noise floor)
131
+ ```
132
+
133
+ ### For content targets (prompts, copy, configs)
134
+
135
+ Use pairwise comparison. Never absolute 1-10 scoring.
136
+
137
+ ```
138
+ 1. Gate check: hard constraints met? (word count, required elements, etc.)
139
+ 2. Present both versions side by side:
140
+ - Randomly assign which is "Version A" and "Version B"
141
+ - Do NOT label which is current-best vs candidate
142
+ 3. Evaluate using the frozen rubric criteria
143
+ 4. Pick the winner — candidate must be CLEARLY better, not just different
144
+ 5. If it's a toss-up, reject (bias toward stability)
145
+ 6. Check secondary metrics for regressions
146
+ ```
147
+
148
+ **Anti-bias controls for LLM-as-judge:**
149
+
150
+ - Randomize A/B order every time (prevents position bias)
151
+ - Never reveal which version is "current" vs "candidate"
152
+ - If the margin is slim, run the comparison twice with swapped order
153
+ - The rubric is frozen in `rubric.md` — you cannot modify scoring criteria mid-run
154
+
155
+ ## Exploration Strategy
156
+
157
+ Don't just make incremental tweaks. Use staged exploration:
158
+
159
+ **Early phase (first ~30% of iterations):** Go broad. Try fundamentally different approaches.
160
+ Different structures, different angles, different trade-offs. You're mapping the space.
161
+
162
+ **Exploit phase (middle ~50%):** You've found something that works. Refine around it.
163
+ Incremental improvements, wording tweaks, parameter tuning.
164
+
165
+ **Escape phase (if plateaued):** If you hit 5 consecutive discards, try ONE radical departure
166
+ from current-best — something completely different. If that fails too, stop. You've likely
167
+ found a local optimum.
168
+
169
+ ## The Results Log (results.jsonl)
170
+
171
+ Append one JSON object per experiment. Never edit previous entries.
172
+
173
+ ```json
174
+ {"n": 1, "status": "keep", "hypothesis": "shorter opening hook", "score_note": "candidate clearly more direct", "duration_s": 45, "timestamp": "2026-04-07T02:14:00Z"}
175
+ {"n": 2, "status": "discard", "hypothesis": "add social proof", "score_note": "toss-up, rejected for stability", "duration_s": 38, "timestamp": "2026-04-07T02:21:00Z"}
176
+ {"n": 3, "status": "crash", "hypothesis": "doubled context window", "error": "benchmark timed out", "duration_s": 300, "timestamp": "2026-04-07T02:28:00Z"}
177
+ ```
178
+
179
+ Every entry must include:
180
+
181
+ - `n` — experiment number
182
+ - `status` — `keep`, `discard`, or `crash`
183
+ - `hypothesis` — what you tried and why (one line)
184
+ - `score_note` — why you kept or discarded (one line)
185
+ - `timestamp` — when the experiment completed
186
+
187
+ ## Resumability
188
+
189
+ If the run crashes or is interrupted:
190
+
191
+ 1. Read `current-best.md` — this is always the last accepted version
192
+ 2. Read `results.jsonl` — count completed experiments, review what was tried
193
+ 3. Read `state.md` — pick up your thinking from where you left off
194
+ 4. Continue from the next experiment number
195
+ 5. Do NOT re-run completed experiments
196
+
197
+ ## Scoring Integrity
198
+
199
+ **The scorer and the optimizer must be separated in intent.** You are both proposer and judge,
200
+ so you must be disciplined:
201
+
202
+ - The rubric is frozen. Do not adjust criteria because a candidate "almost" passes.
203
+ - Do not add special cases to make a favorite candidate win.
204
+ - Do not lower the bar after repeated failures. If nothing passes, that's a valid outcome.
205
+ - If you notice you're gaming your own rubric, stop and note it in state.md.
206
+
207
+ ## When Finished
208
+
209
+ 1. Update `state.md` with a final summary:
210
+ - Baseline description vs final best description
211
+ - Total experiments: N run, X accepted, Y discarded, Z crashed
212
+ - Key findings: what worked, what didn't, surprises
213
+ 2. Send a message to the user (via `send_message`):
214
+ ```
215
+ [optimization] Done. Ran N experiments on [target].
216
+ X accepted, Y discarded. [One-line summary of the best version vs baseline].
217
+ Results: ~/.niahere/optimizations/{slug}-{hex}/
218
+ ```
219
+ 3. Do NOT auto-apply the result. The user reviews `current-best.md` and decides
220
+ whether to use it.
221
+
222
+ ## Principles
223
+
224
+ - **Propose, never apply.** The optimization produces a candidate. The user promotes it.
225
+ - **Simplicity criterion.** A marginal improvement that adds complexity isn't worth keeping.
226
+ Removing something while maintaining quality is always a win.
227
+ - **Bias toward stability.** When in doubt, reject. Keeping a good version is better than
228
+ accepting a sideways move.
229
+ - **One target, one metric, one run.** Don't try to optimize multiple things simultaneously.
230
+ Run separate optimizations for separate targets.
@@ -0,0 +1,238 @@
1
+ ---
2
+ name: optimize
3
+ description: |
4
+ Schedule or run an iterative optimization pass on code, prompts, copy, or any scorable
5
+ target. Use when user asks to "optimize this", "run experiments", "autoresearch this",
6
+ "iterate on this overnight", "can this be better", or proactively suggest after completing
7
+ work that could benefit from further iteration. Also use when a job wants to self-optimize
8
+ something within its own run. Handles spec confirmation, scoring setup, job scheduling,
9
+ and result delivery. For the loop discipline itself, references the optimization-loop skill.
10
+ metadata:
11
+ version: 1.0.0
12
+ ---
13
+
14
+ # Optimize
15
+
16
+ Schedule or run autonomous optimization passes. This skill handles the orchestration —
17
+ when to use it, how to confirm specs, how to schedule, how to deliver results.
18
+
19
+ For the loop discipline, scoring methods, and workspace layout, invoke the
20
+ `optimization-loop` skill.
21
+
22
+ ## Two Entry Points
23
+
24
+ ### 1. User explicitly asks
25
+
26
+ User says "autoresearch this", "optimize this overnight", "run experiments on this",
27
+ "can you iterate on this more", or similar.
28
+
29
+ **Don't suggest — confirm and schedule.** The user already wants this. Move to Step 1.
30
+
31
+ ### 2. Proactive suggestion (after immediate work)
32
+
33
+ You just finished a task — rewrote copy, tuned a prompt, optimized a function. The result
34
+ is good, but more iterations could find something better.
35
+
36
+ Suggest briefly:
37
+
38
+ > "This is solid. Want me to schedule an overnight optimization pass? I'll run ~30
39
+ > experiments scoring each version against [brief criteria] and have the best version
40
+ > ready by morning."
41
+
42
+ **Rules for suggesting:**
43
+
44
+ - Only suggest when there's a clear, scorable metric
45
+ - Only suggest when the target is self-contained (one file, one prompt, one section)
46
+ - Don't suggest for trivial tasks or quick fixes
47
+ - Don't push if the user declines — move on immediately
48
+ - Don't suggest if the user said they need this done now and can't wait
49
+
50
+ ## Step 1: Confirm the Setup
51
+
52
+ Before scheduling, confirm these with the user. Be concise — a quick summary, not an
53
+ interrogation.
54
+
55
+ **Target** — What are we optimizing?
56
+
57
+ - A file (code, config, prompt file)
58
+ - A section of content (landing page hero, email subject line)
59
+ - A prompt or template
60
+
61
+ **Scoring method** — How do we know if a version is better?
62
+
63
+ - Code: what benchmark or test command produces a number?
64
+ - Content: what criteria matter? (clarity, persuasiveness, brevity, conversion, etc.)
65
+ - Custom: does the user have a specific scoring script?
66
+
67
+ **Constraints** — What can't change?
68
+
69
+ - Hard constraints (must-haves, test requirements, word limits)
70
+ - Soft preferences (shorter is better, simpler is better)
71
+
72
+ **Secondary metrics** — What must NOT get worse?
73
+
74
+ - Code: performance can't drop, memory can't increase, tests must pass
75
+ - Content: readability, brand voice, required elements
76
+ - These are regression guards — violations veto an otherwise good candidate
77
+
78
+ **Iterations** — How many experiments? Default 30. User can adjust.
79
+
80
+ **When** — Now, or schedule for later? If later, what time?
81
+
82
+ Example confirmation:
83
+
84
+ > "Here's the plan:
85
+ >
86
+ > - **Target**: signup prompt at `src/prompts/signup.md`
87
+ > - **Scoring**: pairwise comparison on clarity, persuasiveness, and brevity
88
+ > - **Constraints**: must mention free trial, keep under 150 words
89
+ > - **Regression guards**: readability must stay above grade 8
90
+ > - **Iterations**: 30 experiments
91
+ > - **When**: tonight at midnight
92
+ >
93
+ > Sound right?"
94
+
95
+ Wait for confirmation before proceeding.
96
+
97
+ ## Step 2: Set Up the Workspace
98
+
99
+ Create the optimization directory:
100
+
101
+ ```
102
+ ~/.niahere/optimizations/{slug}-{hex}/
103
+ ```
104
+
105
+ Where `{slug}` is a short descriptive name and `{hex}` is 4 random hex chars.
106
+
107
+ Create the frozen files:
108
+
109
+ 1. **contract.md** — objective, target, primary metric, secondary metrics, constraints,
110
+ preferences, budget, stop rules (see optimization-loop skill for template)
111
+ 2. **rubric.md** — detailed scoring criteria
112
+ - For code: the benchmark command and how to extract the metric
113
+ - For content: the pairwise comparison rubric with specific criteria and weights
114
+ 3. **baseline.md** — copy the current version of the target (the starting point)
115
+ 4. **current-best.md** — copy of baseline (will be updated during the run)
116
+ 5. **state.md** — initialize with "Run starting. 0 experiments completed."
117
+ 6. **accepted/** — create empty directory
118
+
119
+ ## Step 3: Compose the Job Prompt
120
+
121
+ Build a self-contained job prompt that encodes everything the agent needs to run
122
+ the optimization loop autonomously. The prompt must include:
123
+
124
+ ```
125
+ Job: optimization — {slug}
126
+
127
+ You are running an optimization loop. Follow the optimization-loop pattern strictly.
128
+
129
+ ## Your workspace
130
+ {absolute path to the optimization directory}
131
+
132
+ ## What to optimize
133
+ {description of the target — file path, what it does, context}
134
+
135
+ ## Current version
136
+ {full content of the target}
137
+
138
+ ## Contract
139
+ {contents of contract.md}
140
+
141
+ ## Scoring rubric
142
+ {contents of rubric.md}
143
+
144
+ ## Loop instructions
145
+
146
+ Read your workspace files (contract.md, rubric.md, baseline.md, current-best.md,
147
+ state.md, results.jsonl) to understand the current state.
148
+
149
+ For each iteration:
150
+ 1. Read state.md for context on what's been tried
151
+ 2. Form a hypothesis — what to change and why
152
+ 3. Produce a candidate version
153
+ 4. Gate check — verify all hard constraints from the contract
154
+ 5. Score — compare candidate vs current-best using the rubric (pairwise, randomized order)
155
+ 6. If candidate is clearly better AND no secondary metric regressions:
156
+ - Update current-best.md
157
+ - Save candidate to accepted/{NNN}.md
158
+ - Log {"status": "keep", ...} to results.jsonl
159
+ 7. If not clearly better:
160
+ - Discard candidate
161
+ - Log {"status": "discard", ...} to results.jsonl
162
+ 8. Update state.md with what you tried and learned
163
+
164
+ Stop when:
165
+ - Completed {N} iterations, OR
166
+ - {stop_count} consecutive discards (plateau), OR
167
+ - Target score reached (if specified in contract)
168
+
169
+ When finished, update state.md with a final summary and send a message to the user:
170
+ "[optimization] Done. Ran N experiments on {target}. X accepted, Y discarded.
171
+ {One-line summary}. Results: {workspace path}"
172
+
173
+ IMPORTANT:
174
+ - Do NOT modify contract.md or rubric.md
175
+ - Do NOT auto-apply results to the original file
176
+ - Do NOT stop to ask the user questions — run autonomously until done
177
+ ```
178
+
179
+ ## Step 4: Schedule the Job
180
+
181
+ Use the `add_job` MCP tool (preferred) or `nia job add` CLI:
182
+
183
+ - **name**: `optimize-{slug}` (e.g., `optimize-signup-prompt`)
184
+ - **schedule**: ISO timestamp for the agreed time, or now
185
+ - **schedule_type**: `once`
186
+ - **prompt**: the composed job prompt from Step 3
187
+ - **always**: `true` (overnight runs need to ignore active hours)
188
+ - **stateless**: `yes` (the optimization uses its own workspace, not the job's state.md)
189
+
190
+ Confirm to the user:
191
+
192
+ > "Scheduled. The optimization run starts at {time} and will run ~{N} experiments.
193
+ > I'll message you when it's done with the results."
194
+
195
+ ## Step 5: After Completion
196
+
197
+ When the user asks about results, or when reviewing the notification:
198
+
199
+ 1. Read `~/.niahere/optimizations/{slug}-{hex}/state.md` for the summary
200
+ 2. Read `results.jsonl` for the experiment log
201
+ 3. Show `current-best.md` vs `baseline.md` — the diff is the value
202
+ 4. Show the accepted progression if the user wants to see the journey
203
+ 5. Ask if the user wants to apply the result to the original target
204
+
205
+ ## Running Now vs Later
206
+
207
+ **"Run it now":** Schedule with the current timestamp. The user stays in the conversation
208
+ and can check results when the job finishes. Good for shorter runs (10-15 iterations).
209
+
210
+ **"Schedule for later":** Schedule for a specific time (midnight, after hours). The user
211
+ goes about their day. The notification arrives when done. Good for longer runs (30+ iterations).
212
+
213
+ **"Run it inline":** If the user wants to optimize something RIGHT NOW in this conversation
214
+ (not as a job), you can run the optimization-loop pattern directly without scheduling a job.
215
+ Use this for quick 5-10 iteration runs where the user is watching.
216
+
217
+ ## When a Job Self-Optimizes
218
+
219
+ A running job (e.g., news-curator, prompt-generator) can use this pattern to improve
220
+ its own approach. The flow:
221
+
222
+ 1. Job creates an optimization subdirectory in its workspace or in `~/.niahere/optimizations/`
223
+ 2. Runs the loop inline (not as a sub-job — within its own execution)
224
+ 3. Saves the best version in the workspace
225
+ 4. Does NOT auto-apply changes to its own prompt or config
226
+ 5. Sends a message: "I found a better approach for [X]. Review at [path]."
227
+ 6. The user decides whether to apply it (e.g., via `nia job update`)
228
+
229
+ ## What NOT to Optimize
230
+
231
+ - Things without a clear metric (vague "make it better")
232
+ - Targets that require human judgment with no proxy (art, brand voice decisions)
233
+ - Multi-file changes with complex interdependencies
234
+ - Anything where the scoring takes longer than the modification (defeats the loop)
235
+ - Security-sensitive code where autonomous changes are risky
236
+
237
+ If the target doesn't fit, say so. Not everything benefits from iterative optimization.
238
+ Sometimes the first good version is the right answer.