niahere 0.2.57 → 0.2.59
- package/README.md +9 -3
- package/package.json +2 -2
- package/skills/beads-tasks/SKILL.md +91 -13
- package/skills/optimization-loop/SKILL.md +230 -0
- package/skills/optimize/SKILL.md +238 -0
- package/src/chat/engine.ts +95 -17
- package/src/cli/index.ts +124 -38
- package/src/cli/job.ts +160 -43
- package/src/commands/backup.ts +26 -4
- package/src/core/agents.ts +22 -8
- package/src/core/consolidator.ts +52 -14
- package/src/core/health.ts +92 -23
- package/src/core/runner.ts +74 -15
- package/src/core/skills.ts +18 -6
- package/src/core/summarizer.ts +33 -8
- package/src/db/migrations/013_jobs_model.ts +7 -0
- package/src/db/models/active_engine.ts +5 -3
- package/src/db/models/job.ts +40 -13
- package/src/mcp/server.ts +202 -37
- package/src/mcp/tools.ts +116 -29
- package/src/types/audit.ts +1 -0
- package/src/types/job.ts +2 -0
- package/src/utils/retry.ts +18 -0
package/README.md
CHANGED
@@ -1,5 +1,9 @@
 # nia
 
+[](https://www.npmjs.com/package/niahere)
+[](https://www.npmjs.com/package/niahere)
+[](https://github.com/onlyoneaman/niahere/blob/main/LICENSE)
+
 A personal AI agent you fork and make your own. Small enough to understand, built for one user. Powered by Claude Agent SDK.
 
 - npm package: [`niahere`](https://www.npmjs.com/package/niahere)
@@ -33,7 +37,7 @@ nia start # starts daemon + registers OS service
 - **Telegram** — message your agent from your phone, typing indicator while processing
 - **Slack** — Socket Mode bot with thread awareness, thinking emoji, watch channels for proactive monitoring
 - **Terminal chat** — REPL with session resume support
-- **Scheduled jobs** — recurring jobs and crons that run Claude and can message you back
+- **Scheduled jobs** — recurring jobs and crons that run Claude and can message you back. Stateful by default (working memory), per-job model routing for cost savings
 - **Persona system** — customizable identity, soul, owner profile, rules, and memory (preloaded every session)
 - **Agents** — domain specialists (marketer, senior-dev) via Claude Agent SDK subagents
 - **Skills** — loads skills from multiple directories, invokable as slash commands
@@ -63,8 +67,8 @@ nia update — update to latest version (auto-backup + resta
 nia job list — list all jobs
 nia job show [name] — full details + recent runs
 nia job status [name] — quick status check
-nia job add <n> <s> <p> — add a job (--type, --always, --agent, --stateless, --prompt-file)
-nia job update <name> — update a job (--schedule, --prompt, --prompt-file, --type, --always, --agent, --stateless)
+nia job add <n> <s> <p> — add a job (--type, --always, --agent, --model, --stateless, --prompt-file)
+nia job update <name> — update a job (--schedule, --prompt, --prompt-file, --type, --always, --agent, --model, --stateless)
 nia job remove <name> — delete a job
 nia job enable / disable <n> — toggle a job
 nia job run <name> — run a job once
@@ -106,6 +110,8 @@ All config and data lives in `~/.niahere/`:
 soul.md — how the agent works
 rules.md — behavioral instructions (loaded every session)
 memory.md — persistent facts and context (loaded every session)
+jobs/ — per-job working memory and state (auto-created)
+optimizations/ — optimization loop run workspaces
 images/
 reference.webp — visual identity reference image
 profile.webp — profile picture for Telegram/Slack
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
 "name": "niahere",
-"version": "0.2.
+"version": "0.2.59",
 "description": "A personal AI assistant daemon — chat, scheduled jobs, persona system, extensible via skills.",
 "type": "module",
 "scripts": {
@@ -43,7 +43,7 @@
 "license": "MIT",
 "private": false,
 "dependencies": {
-"@anthropic-ai/claude-agent-sdk": "^0.2.
+"@anthropic-ai/claude-agent-sdk": "^0.2.97",
 "@modelcontextprotocol/sdk": "^1.27.1",
 "@slack/bolt": "^4.6.0",
 "cron-parser": "^5.5.0",
package/skills/beads-tasks/SKILL.md
CHANGED
@@ -3,9 +3,8 @@ name: beads-tasks
 description: >
 Persistent task management via Beads CLI (bd). Use when user mentions tasks, todos, issues, or tracking work.
 Check `which bd` first — if missing, offer: `npm install -g @beads/bd`.
-
-`
-Run `cd "$BEATS_DIR" && bd help-all` for available commands. Not for ephemeral in-conversation tracking.
+All commands: run from `$BEATS_DIR` (for example `~/.niahere/beads`) and use `bd <command>`. Always label: `--label project:<project-name>`.
+Run `bd help-all` for available commands. Not for ephemeral in-conversation tracking.
 ---
 
 ## Overview
@@ -19,7 +18,63 @@ Global task manager powered by [Beads](https://github.com/steveyegge/beads). Sto
 2. Ensure `~/.niahere/beads/.beads` exists — `bd init` if not.
 3. Set `BEATS_DIR` to your Beads workspace (for example `~/.niahere/beads`).
 4. All commands: `cd "$BEATS_DIR" && bd <command>`.
-5. Always label with `--label project:<
+5. Always label with `--label project:<name>`.
+6. Run `cd "$BEATS_DIR" && bd help-all` for available commands.
+
+## Core Commands
+
+### Creating tasks
+
+```bash
+# Basic
+bd create --title "Fix auth token refresh" --priority P2 --type bug
+
+# With parent (subtask)
+bd create --title "Extract shared logic" --priority P2 --type task --parent <parent-id>
+
+# With description — ALWAYS add context: what's broken, links, references
+bd create --title "Chat fails on long docs" --type bug --description "Fails on docs >500 pages. Ref: https://..."
+
+# Epic (container for related tasks)
+bd create --title "API performance improvements" --type epic --priority P2
+```
+
+### Updating tasks
+
+`bd update` is the workhorse — use it for reparenting, reprioritizing, retyping, renaming:
+
+```bash
+bd update <id> --parent <new-parent-id> # Reparent / move under epic
+bd update <id> --parent "" # Remove parent (make top-level)
+bd update <id> --priority P1 # Change priority
+bd update <id> --type bug # Change type
+bd update <id> --title "Better title" # Rename
+bd update <id> --status in_progress # Start work
+bd update <id> --description "..." # Add/replace description
+bd update <id> --add-label personal # Add label
+bd update <id> --set-labels bug,urgent # Replace all labels
+```
+
+Chain multiple updates: `bd update <id> --priority P1 --type bug --parent <parent-id>`
+
+### Viewing tasks
+
+```bash
+bd list # Open tasks (tree view)
+bd list --all # Include closed/deferred tasks
+bd list --label project:<name> # Filter by project
+bd show <id> # Full details of a task
+bd children <id> # List children of a parent
+```
+
+### Closing tasks
+
+```bash
+bd close <id> # Close a task
+bd reopen <id> # Reopen if closed prematurely
+```
+
+**Warning:** Closing a parent does NOT close or reparent its children. If a parent epic is done but children remain open, reparent them first or they become orphaned top-level items.
 
 ## Decision Points
 
@@ -29,22 +84,45 @@ Global task manager powered by [Beads](https://github.com/steveyegge/beads). Sto
 - User says "done with X" / "finished" → `bd close <id>`
 - User wants to see cross-project work → `bd list` (no project filter)
 - User wants project-specific view → `bd list --label project:<name>`
-- User asks for task in-progress state → use task status:
-- `bd update <task-id> --status in_progress`
-- `bd set-state ... state=...` is for operational metadata only and does not
-change list-visible task status.
 - bd not installed → offer install, don't silently fail
 - Ephemeral/conversation-only tracking → use conversation context, not beads
+- `bd set-state ... state=...` is for operational metadata only; it does not change the task status shown in list.
+
+## Hierarchy & Organization
+
+### When to use parent-child vs labels
+
+- **Parent-child** (`--parent`): for structural grouping — epics containing subtasks, features broken into steps.
+- **Labels** (`--add-label`): for cross-cutting tags — `personal`, `urgent`, `project:<name>`. A task can have multiple labels but only one parent.
+
+### Epic patterns
+
+- Use `--type epic` for containers that group related work.
+- Epics can nest: epic > sub-epic > tasks.
+- Keep epic titles broad ("API improvements"), subtask titles specific ("Reduce /search latency from 2s to 200ms").
+
+### Cleanup & auditing
+
+Periodically review with `bd list` and look for:
+- **Orphaned tasks** — top-level items that should be under an epic.
+- **Similar ungrouped tasks** — multiple tasks on the same topic that should share a parent.
+- **Misplaced tasks** — bugs under improvement epics or vice versa.
+- **Stale tasks** — open tasks that are actually done or no longer relevant.
+
+When reorganizing, reparent with `bd update <id> --parent <new-parent>` — don't delete and recreate.
 
 ## Conventions
 
-- Titles
--
--
--
+- **Titles:** descriptive, actionable (e.g. "Fix auth token refresh in niahere")
+- **Descriptions:** always include context — what's broken, why it matters, links to references (Canny, threads, logs). Future you needs enough to start working without asking questions.
+- **Types:** `epic`, `bug`, `feature`, `task`, `chore`, `decision`
+- **Priority:** P0 (critical) → P4 (nice-to-have). Default P2 unless user specifies.
+- **Labels:** `project:<name>`, `personal`, `bug`, `feature`, `chore`, `urgent`
+- **Status flow:** `open` → `in_progress` → `closed`
 
 ## Validation
 
-- `
+- `bd list` returns results after creating a task
 - Labels appear correctly in list output
+- Parent-child relationships show as indented tree in `bd list`
 - Dependencies show in `bd dep tree`
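Reviewer sketch (not part of the package diff): the `bd create` conventions added to the beads-tasks skill above, namely a default priority of P2 and a `project:<name>` label on every task, could be wrapped in a small helper along these lines. `NewTask` and `buildCreateArgs` are hypothetical names that do not appear in the diff. The sketch uses only flags that appear in the skill text, and it assumes `bd create` accepts `--label`, which the skill's "always label" rule implies but does not show directly.

```typescript
// Hypothetical helper, not from the package: builds an argv array for `bd create`.
type TaskType = "epic" | "bug" | "feature" | "task" | "chore" | "decision";
type Priority = "P0" | "P1" | "P2" | "P3" | "P4";

interface NewTask {
  title: string;
  type: TaskType;
  project: string;      // becomes --label project:<name>
  priority?: Priority;  // defaults to P2, per the Conventions section
  parent?: string;      // parent task id, for subtasks
  description?: string; // context: what's broken, links, references
}

function buildCreateArgs(task: NewTask): string[] {
  const args: string[] = [
    "create",
    "--title", task.title,
    "--type", task.type,
    "--priority", task.priority ?? "P2",
    // Assumption: `bd create` takes the same --label flag as `bd list`.
    "--label", `project:${task.project}`,
  ];
  if (task.parent !== undefined) args.push("--parent", task.parent);
  if (task.description !== undefined) args.push("--description", task.description);
  return args;
}

// The result is an argv array meant for a spawn-style API, not a shell string.
const exampleArgs = buildCreateArgs({
  title: "Fix auth token refresh",
  type: "bug",
  project: "niahere",
});
console.log(JSON.stringify(exampleArgs));
```

Returning an array rather than a joined string sidesteps shell quoting for titles and descriptions that contain spaces or quotes.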
package/skills/optimization-loop/SKILL.md
ADDED
@@ -0,0 +1,230 @@
+---
+name: optimization-loop
+description: |
+The iterative optimization pattern (Karpathy Loop / autoresearch). Reference for running
+autonomous experiment loops on any target: modify → score → keep or revert → repeat.
+Use when running multiple iterations of improvement against a measurable metric — code
+benchmarks, prompt quality, copy effectiveness, config tuning, or any scorable target.
+Also known as "autoresearch." Use this skill to understand the pattern and discipline.
+For orchestration (scheduling, user confirmation, job setup), see the "optimize" skill.
+metadata:
+version: 1.0.0
+---
+
+# Optimization Loop
+
+The Karpathy Loop: autonomous iterative optimization through disciplined experimentation.
+Modify a target, score the result, keep improvements, revert failures, repeat.
+
+This skill defines the **pattern and discipline**. For when/how to schedule and orchestrate
+optimization runs, see the `optimize` skill.
+
+## The Pattern
+
+```
+freeze contract + rubric
+save baseline (never touch again)
+copy baseline → current-best
+
+repeat:
+1. read state — what's been tried, what worked
+2. hypothesize — form a specific idea, informed by history
+3. modify — produce a candidate version
+4. gate check — hard constraints pass? if no → reject
+5. score — compare candidate vs current-best (pairwise)
+6. decide — clearly better? keep. otherwise revert.
+7. log — append to results.jsonl
+8. update state — what you tried, what happened, what next
+
+until: budget exhausted, target reached, or plateau detected
+notify user with summary
+```
+
+## Workspace Layout
+
+Each optimization run gets a dedicated, self-contained directory:
+
+```
+~/.niahere/optimizations/{slug}-{hex}/
+├── contract.md # Frozen at start: objective, scope, constraints, metrics, budget
+├── rubric.md # Frozen at start: scoring criteria (never modify during run)
+├── baseline.md # Original version (never modify)
+├── current-best.md # Best version so far (update only on accept)
+├── accepted/ # Every accepted candidate, numbered
+│ ├── 001.md
+│ ├── 002.md
+│ └── ...
+├── results.jsonl # One JSON object per experiment (append-only)
+└── state.md # Your working notebook
+```
+
+**The slug** is human-readable (e.g., `signup-prompt`). The hex suffix (4 chars) prevents
+collisions across multiple runs on the same target.
+
+## The Contract (contract.md)
+
+Freeze this at the start. Never modify during the run.
+
+```markdown
+# Optimization Contract
+
+## Objective
+
+[What we're optimizing and why — one sentence]
+
+## Target
+
+[File path or content being modified]
+[Which sections/parts are in scope — be specific]
+
+## Primary Metric
+
+[The metric being optimized — what "better" means]
+
+## Secondary Metrics (regression guards)
+
+[Metrics that must NOT degrade. Each with a threshold.]
+
+- [e.g., "Word count must stay under 200"]
+- [e.g., "All existing tests must pass"]
+- [e.g., "Readability score must stay above grade 8"]
+
+## Hard Constraints
+
+[Violations = automatic reject, no exceptions]
+
+- [e.g., "Must mention the free trial"]
+- [e.g., "Must pass lint and type check"]
+
+## Soft Preferences
+
+[Tiebreakers — not vetoes, but guide decisions]
+
+- [e.g., "Prefer shorter over longer"]
+- [e.g., "Prefer simple over clever"]
+
+## Budget
+
+- Max iterations: [N]
+- Max wall-clock time: [hours]
+
+## Stop Rules
+
+- All iterations completed
+- Target score reached: [if applicable]
+- Plateau: [N] consecutive discards (default 5)
+```
+
+## Scoring
+
+### For code targets
+
+Run a benchmark or test command. Extract the metric. The command is fixed in the contract
+and cannot be modified during the run.
+
+```
+1. Gate check: tests pass? lint clean? types check? → if any fail, reject immediately
+2. Run benchmark command → extract primary metric
+3. Check secondary metrics for regressions → if any violated, reject
+4. Compare primary metric against current-best
+5. Accept only if clearly improved (above noise floor)
+```
+
+### For content targets (prompts, copy, configs)
+
+Use pairwise comparison. Never absolute 1-10 scoring.
+
+```
+1. Gate check: hard constraints met? (word count, required elements, etc.)
+2. Present both versions side by side:
+- Randomly assign which is "Version A" and "Version B"
+- Do NOT label which is current-best vs candidate
+3. Evaluate using the frozen rubric criteria
+4. Pick the winner — candidate must be CLEARLY better, not just different
+5. If it's a toss-up, reject (bias toward stability)
+6. Check secondary metrics for regressions
+```
+
+**Anti-bias controls for LLM-as-judge:**
+
+- Randomize A/B order every time (prevents position bias)
+- Never reveal which version is "current" vs "candidate"
+- If the margin is slim, run the comparison twice with swapped order
+- The rubric is frozen in `rubric.md` — you cannot modify scoring criteria mid-run
+
+## Exploration Strategy
+
+Don't just make incremental tweaks. Use staged exploration:
+
+**Early phase (first ~30% of iterations):** Go broad. Try fundamentally different approaches.
+Different structures, different angles, different trade-offs. You're mapping the space.
+
+**Exploit phase (middle ~50%):** You've found something that works. Refine around it.
+Incremental improvements, wording tweaks, parameter tuning.
+
+**Escape phase (if plateaued):** If you hit 5 consecutive discards, try ONE radical departure
+from current-best — something completely different. If that fails too, stop. You've likely
+found a local optimum.
+
+## The Results Log (results.jsonl)
+
+Append one JSON object per experiment. Never edit previous entries.
+
+```json
+{"n": 1, "status": "keep", "hypothesis": "shorter opening hook", "score_note": "candidate clearly more direct", "duration_s": 45, "timestamp": "2026-04-07T02:14:00Z"}
+{"n": 2, "status": "discard", "hypothesis": "add social proof", "score_note": "toss-up, rejected for stability", "duration_s": 38, "timestamp": "2026-04-07T02:21:00Z"}
+{"n": 3, "status": "crash", "hypothesis": "doubled context window", "error": "benchmark timed out", "duration_s": 300, "timestamp": "2026-04-07T02:28:00Z"}
+```
+
+Every entry must include:
+
+- `n` — experiment number
+- `status` — `keep`, `discard`, or `crash`
+- `hypothesis` — what you tried and why (one line)
+- `score_note` — why you kept or discarded (one line)
+- `timestamp` — when the experiment completed
+
+## Resumability
+
+If the run crashes or is interrupted:
+
+1. Read `current-best.md` — this is always the last accepted version
+2. Read `results.jsonl` — count completed experiments, review what was tried
+3. Read `state.md` — pick up your thinking from where you left off
+4. Continue from the next experiment number
+5. Do NOT re-run completed experiments
+
+## Scoring Integrity
+
+**The scorer and the optimizer must be separated in intent.** You are both proposer and judge,
+so you must be disciplined:
+
+- The rubric is frozen. Do not adjust criteria because a candidate "almost" passes.
+- Do not add special cases to make a favorite candidate win.
+- Do not lower the bar after repeated failures. If nothing passes, that's a valid outcome.
+- If you notice you're gaming your own rubric, stop and note it in state.md.
+
+## When Finished
+
+1. Update `state.md` with a final summary:
+- Baseline description vs final best description
+- Total experiments: N run, X accepted, Y discarded, Z crashed
+- Key findings: what worked, what didn't, surprises
+2. Send a message to the user (via `send_message`):
+```
+[optimization] Done. Ran N experiments on [target].
+X accepted, Y discarded. [One-line summary of the best version vs baseline].
+Results: ~/.niahere/optimizations/{slug}-{hex}/
+```
+3. Do NOT auto-apply the result. The user reviews `current-best.md` and decides
+whether to use it.
+
+## Principles
+
+- **Propose, never apply.** The optimization produces a candidate. The user promotes it.
+- **Simplicity criterion.** A marginal improvement that adds complexity isn't worth keeping.
+Removing something while maintaining quality is always a win.
+- **Bias toward stability.** When in doubt, reject. Keeping a good version is better than
+accepting a sideways move.
+- **One target, one metric, one run.** Don't try to optimize multiple things simultaneously.
+Run separate optimizations for separate targets.
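Reviewer sketch (not part of the package diff): the keep-or-revert discipline described in the optimization-loop skill above, with its gate check, "clearly better" decision, append-only log, and plateau stop, can be illustrated with a small in-memory loop. This is a minimal sketch under stated assumptions, not the package's implementation. `runLoop`, `LoopOptions`, and the three callbacks are hypothetical stand-ins for the agent's own proposing, constraint checking, and pairwise judging. The log entry fields follow the names required by the skill's results log.

```typescript
// Hypothetical sketch of the loop; the callbacks stand in for the agent's own work.
type Status = "keep" | "discard" | "crash";

interface ResultEntry {
  n: number;
  status: Status;
  hypothesis: string;
  score_note: string;
  timestamp: string;
}

interface LoopOptions<T> {
  baseline: T;
  maxIterations: number;
  plateau: number; // consecutive discards before stopping (the contract template defaults to 5)
  propose: (currentBest: T, history: ResultEntry[]) => { hypothesis: string; candidate: T };
  passesGates: (candidate: T) => boolean;
  isClearlyBetter: (candidate: T, currentBest: T) => boolean;
}

function runLoop<T>(opts: LoopOptions<T>): { currentBest: T; accepted: T[]; log: ResultEntry[] } {
  let currentBest = opts.baseline; // the baseline itself is never modified
  const accepted: T[] = [];
  const log: ResultEntry[] = [];
  let consecutiveDiscards = 0;

  for (let n = 1; n <= opts.maxIterations; n++) {
    if (consecutiveDiscards >= opts.plateau) break; // plateau detected
    const { hypothesis, candidate } = opts.propose(currentBest, log);

    let status: Status = "discard";
    let note = "failed a hard constraint";
    if (opts.passesGates(candidate)) {
      if (opts.isClearlyBetter(candidate, currentBest)) {
        status = "keep";
        note = "candidate clearly better";
      } else {
        note = "not clearly better, rejected for stability";
      }
    }

    if (status === "keep") {
      currentBest = candidate; // update current-best only on accept
      accepted.push(candidate);
      consecutiveDiscards = 0;
    } else {
      consecutiveDiscards++;
    }
    log.push({ n, status, hypothesis, score_note: note, timestamp: new Date().toISOString() });
  }
  return { currentBest, accepted, log };
}

// Toy usage: shorten a line of copy while keeping a required phrase.
const candidates = ["Try the free trial today, friends", "Start your free trial", "Sign up", "Free trial"];
const run = runLoop<string>({
  baseline: "Please consider trying our free trial today",
  maxIterations: candidates.length,
  plateau: 5,
  propose: (_best, history) => ({ hypothesis: "shorter copy", candidate: candidates[history.length] ?? "" }),
  passesGates: (c) => c.toLowerCase().indexOf("free trial") !== -1,
  isClearlyBetter: (c, best) => c.length < best.length - 5,
});
console.log(run.currentBest, run.log.map((e) => e.status));
```

The toy judge uses a fixed length margin as its "clearly better" test. A real content run would replace it with the blinded pairwise comparison the skill describes.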
package/skills/optimize/SKILL.md
ADDED
@@ -0,0 +1,238 @@
+---
+name: optimize
+description: |
+Schedule or run an iterative optimization pass on code, prompts, copy, or any scorable
+target. Use when user asks to "optimize this", "run experiments", "autoresearch this",
+"iterate on this overnight", "can this be better", or proactively suggest after completing
+work that could benefit from further iteration. Also use when a job wants to self-optimize
+something within its own run. Handles spec confirmation, scoring setup, job scheduling,
+and result delivery. For the loop discipline itself, references the optimization-loop skill.
+metadata:
+version: 1.0.0
+---
+
+# Optimize
+
+Schedule or run autonomous optimization passes. This skill handles the orchestration —
+when to use it, how to confirm specs, how to schedule, how to deliver results.
+
+For the loop discipline, scoring methods, and workspace layout, invoke the
+`optimization-loop` skill.
+
+## Two Entry Points
+
+### 1. User explicitly asks
+
+User says "autoresearch this", "optimize this overnight", "run experiments on this",
+"can you iterate on this more", or similar.
+
+**Don't suggest — confirm and schedule.** The user already wants this. Move to Step 1.
+
+### 2. Proactive suggestion (after immediate work)
+
+You just finished a task — rewrote copy, tuned a prompt, optimized a function. The result
+is good, but more iterations could find something better.
+
+Suggest briefly:
+
+> "This is solid. Want me to schedule an overnight optimization pass? I'll run ~30
+> experiments scoring each version against [brief criteria] and have the best version
+> ready by morning."
+
+**Rules for suggesting:**
+
+- Only suggest when there's a clear, scorable metric
+- Only suggest when the target is self-contained (one file, one prompt, one section)
+- Don't suggest for trivial tasks or quick fixes
+- Don't push if the user declines — move on immediately
+- Don't suggest if the user said they need this done now and can't wait
+
+## Step 1: Confirm the Setup
+
+Before scheduling, confirm these with the user. Be concise — a quick summary, not an
+interrogation.
+
+**Target** — What are we optimizing?
+
+- A file (code, config, prompt file)
+- A section of content (landing page hero, email subject line)
+- A prompt or template
+
+**Scoring method** — How do we know if a version is better?
+
+- Code: what benchmark or test command produces a number?
+- Content: what criteria matter? (clarity, persuasiveness, brevity, conversion, etc.)
+- Custom: does the user have a specific scoring script?
+
+**Constraints** — What can't change?
+
+- Hard constraints (must-haves, test requirements, word limits)
+- Soft preferences (shorter is better, simpler is better)
+
+**Secondary metrics** — What must NOT get worse?
+
+- Code: performance can't drop, memory can't increase, tests must pass
+- Content: readability, brand voice, required elements
+- These are regression guards — violations veto an otherwise good candidate
+
+**Iterations** — How many experiments? Default 30. User can adjust.
+
+**When** — Now, or schedule for later? If later, what time?
+
+Example confirmation:
+
+> "Here's the plan:
+>
+> - **Target**: signup prompt at `src/prompts/signup.md`
+> - **Scoring**: pairwise comparison on clarity, persuasiveness, and brevity
+> - **Constraints**: must mention free trial, keep under 150 words
+> - **Regression guards**: readability must stay above grade 8
+> - **Iterations**: 30 experiments
+> - **When**: tonight at midnight
+>
+> Sound right?"
+
+Wait for confirmation before proceeding.
+
+## Step 2: Set Up the Workspace
+
+Create the optimization directory:
+
+```
+~/.niahere/optimizations/{slug}-{hex}/
+```
+
+Where `{slug}` is a short descriptive name and `{hex}` is 4 random hex chars.
+
+Create the frozen files:
+
+1. **contract.md** — objective, target, primary metric, secondary metrics, constraints,
+preferences, budget, stop rules (see optimization-loop skill for template)
+2. **rubric.md** — detailed scoring criteria
+- For code: the benchmark command and how to extract the metric
+- For content: the pairwise comparison rubric with specific criteria and weights
+3. **baseline.md** — copy the current version of the target (the starting point)
+4. **current-best.md** — copy of baseline (will be updated during the run)
+5. **state.md** — initialize with "Run starting. 0 experiments completed."
+6. **accepted/** — create empty directory
+
+## Step 3: Compose the Job Prompt
+
+Build a self-contained job prompt that encodes everything the agent needs to run
+the optimization loop autonomously. The prompt must include:
+
+```
+Job: optimization — {slug}
+
+You are running an optimization loop. Follow the optimization-loop pattern strictly.
+
+## Your workspace
+{absolute path to the optimization directory}
+
+## What to optimize
+{description of the target — file path, what it does, context}
+
+## Current version
+{full content of the target}
+
+## Contract
+{contents of contract.md}
+
+## Scoring rubric
+{contents of rubric.md}
+
+## Loop instructions
+
+Read your workspace files (contract.md, rubric.md, baseline.md, current-best.md,
+state.md, results.jsonl) to understand the current state.
+
+For each iteration:
+1. Read state.md for context on what's been tried
+2. Form a hypothesis — what to change and why
+3. Produce a candidate version
+4. Gate check — verify all hard constraints from the contract
+5. Score — compare candidate vs current-best using the rubric (pairwise, randomized order)
+6. If candidate is clearly better AND no secondary metric regressions:
+- Update current-best.md
+- Save candidate to accepted/{NNN}.md
+- Log {"status": "keep", ...} to results.jsonl
+7. If not clearly better:
+- Discard candidate
+- Log {"status": "discard", ...} to results.jsonl
+8. Update state.md with what you tried and learned
+
+Stop when:
+- Completed {N} iterations, OR
+- {stop_count} consecutive discards (plateau), OR
+- Target score reached (if specified in contract)
+
+When finished, update state.md with a final summary and send a message to the user:
+"[optimization] Done. Ran N experiments on {target}. X accepted, Y discarded.
+{One-line summary}. Results: {workspace path}"
+
+IMPORTANT:
+- Do NOT modify contract.md or rubric.md
+- Do NOT auto-apply results to the original file
+- Do NOT stop to ask the user questions — run autonomously until done
+```
+
+## Step 4: Schedule the Job
+
+Use the `add_job` MCP tool (preferred) or `nia job add` CLI:
+
+- **name**: `optimize-{slug}` (e.g., `optimize-signup-prompt`)
+- **schedule**: ISO timestamp for the agreed time, or now
+- **schedule_type**: `once`
+- **prompt**: the composed job prompt from Step 3
+- **always**: `true` (overnight runs need to ignore active hours)
+- **stateless**: `yes` (the optimization uses its own workspace, not the job's state.md)
+
+Confirm to the user:
+
+> "Scheduled. The optimization run starts at {time} and will run ~{N} experiments.
+> I'll message you when it's done with the results."
+
+## Step 5: After Completion
+
+When the user asks about results, or when reviewing the notification:
+
+1. Read `~/.niahere/optimizations/{slug}-{hex}/state.md` for the summary
+2. Read `results.jsonl` for the experiment log
+3. Show `current-best.md` vs `baseline.md` — the diff is the value
+4. Show the accepted progression if the user wants to see the journey
+5. Ask if the user wants to apply the result to the original target
+
+## Running Now vs Later
+
+**"Run it now":** Schedule with the current timestamp. The user stays in the conversation
+and can check results when the job finishes. Good for shorter runs (10-15 iterations).
+
+**"Schedule for later":** Schedule for a specific time (midnight, after hours). The user
+goes about their day. The notification arrives when done. Good for longer runs (30+ iterations).
+
+**"Run it inline":** If the user wants to optimize something RIGHT NOW in this conversation
+(not as a job), you can run the optimization-loop pattern directly without scheduling a job.
+Use this for quick 5-10 iteration runs where the user is watching.
+
+## When a Job Self-Optimizes
+
+A running job (e.g., news-curator, prompt-generator) can use this pattern to improve
+its own approach. The flow:
+
+1. Job creates an optimization subdirectory in its workspace or in `~/.niahere/optimizations/`
+2. Runs the loop inline (not as a sub-job — within its own execution)
+3. Saves the best version in the workspace
+4. Does NOT auto-apply changes to its own prompt or config
+5. Sends a message: "I found a better approach for [X]. Review at [path]."
+6. The user decides whether to apply it (e.g., via `nia job update`)
+
+## What NOT to Optimize
+
+- Things without a clear metric (vague "make it better")
+- Targets that require human judgment with no proxy (art, brand voice decisions)
+- Multi-file changes with complex interdependencies
+- Anything where the scoring takes longer than the modification (defeats the loop)
+- Security-sensitive code where autonomous changes are risky
+
+If the target doesn't fit, say so. Not everything benefits from iterative optimization.
+Sometimes the first good version is the right answer.
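Reviewer sketch (not part of the package diff): the naming rules in the optimize skill above, a workspace at `~/.niahere/optimizations/{slug}-{hex}/` with four random hex characters and a job named `optimize-{slug}`, could be derived roughly as follows. `slugify`, `randomHex`, and `nameOptimization` are hypothetical helpers that do not appear in the diff. `Math.random` is an assumption chosen for brevity, and the injectable `rand` parameter exists only to make the sketch easy to test.

```typescript
// Hypothetical helpers, not from the package.

// Returns `chars` lowercase hex digits; `rand` must return values in [0, 1).
function randomHex(chars: number, rand: () => number = Math.random): string {
  let out = "";
  for (let i = 0; i < chars; i++) {
    out += Math.floor(rand() * 16).toString(16);
  }
  return out;
}

// Lowercases and replaces runs of other characters with single hyphens.
function slugify(text: string): string {
  return text.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
}

interface OptimizationNames {
  workspaceDir: string; // ~/.niahere/optimizations/{slug}-{hex}/
  jobName: string;      // optimize-{slug}
}

function nameOptimization(target: string, rand: () => number = Math.random): OptimizationNames {
  const slug = slugify(target);
  return {
    // Note: "~" is kept literal here; a caller would expand it to the home directory.
    workspaceDir: `~/.niahere/optimizations/${slug}-${randomHex(4, rand)}/`,
    jobName: `optimize-${slug}`,
  };
}

console.log(nameOptimization("Signup prompt"));
```

Keeping the hex suffix out of the job name matches the skill's example (`optimize-signup-prompt`), while the suffix on the directory is what keeps repeated runs on the same target from colliding.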