@evermore.work/adapter-codex-local 2026.509.0-canary.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (107)
  1. package/dist/cli/format-event.d.ts +2 -0
  2. package/dist/cli/format-event.d.ts.map +1 -0
  3. package/dist/cli/format-event.js +213 -0
  4. package/dist/cli/format-event.js.map +1 -0
  5. package/dist/cli/index.d.ts +2 -0
  6. package/dist/cli/index.d.ts.map +1 -0
  7. package/dist/cli/index.js +2 -0
  8. package/dist/cli/index.js.map +1 -0
  9. package/dist/cli/quota-probe.d.ts +3 -0
  10. package/dist/cli/quota-probe.d.ts.map +1 -0
  11. package/dist/cli/quota-probe.js +97 -0
  12. package/dist/cli/quota-probe.js.map +1 -0
  13. package/dist/index.d.ts +17 -0
  14. package/dist/index.d.ts.map +1 -0
  15. package/dist/index.js +83 -0
  16. package/dist/index.js.map +1 -0
  17. package/dist/server/codex-args.d.ts +11 -0
  18. package/dist/server/codex-args.d.ts.map +1 -0
  19. package/dist/server/codex-args.js +55 -0
  20. package/dist/server/codex-args.js.map +1 -0
  21. package/dist/server/codex-args.test.d.ts +2 -0
  22. package/dist/server/codex-args.test.d.ts.map +1 -0
  23. package/dist/server/codex-args.test.js +63 -0
  24. package/dist/server/codex-args.test.js.map +1 -0
  25. package/dist/server/codex-home.d.ts +15 -0
  26. package/dist/server/codex-home.d.ts.map +1 -0
  27. package/dist/server/codex-home.js +107 -0
  28. package/dist/server/codex-home.js.map +1 -0
  29. package/dist/server/execute.d.ts +15 -0
  30. package/dist/server/execute.d.ts.map +1 -0
  31. package/dist/server/execute.js +669 -0
  32. package/dist/server/execute.js.map +1 -0
  33. package/dist/server/execute.remote.test.d.ts +2 -0
  34. package/dist/server/execute.remote.test.d.ts.map +1 -0
  35. package/dist/server/execute.remote.test.js +382 -0
  36. package/dist/server/execute.remote.test.js.map +1 -0
  37. package/dist/server/index.d.ts +8 -0
  38. package/dist/server/index.d.ts.map +1 -0
  39. package/dist/server/index.js +57 -0
  40. package/dist/server/index.js.map +1 -0
  41. package/dist/server/parse.d.ts +22 -0
  42. package/dist/server/parse.d.ts.map +1 -0
  43. package/dist/server/parse.js +213 -0
  44. package/dist/server/parse.js.map +1 -0
  45. package/dist/server/parse.test.d.ts +2 -0
  46. package/dist/server/parse.test.d.ts.map +1 -0
  47. package/dist/server/parse.test.js +107 -0
  48. package/dist/server/parse.test.js.map +1 -0
  49. package/dist/server/quota-spawn-error.test.d.ts +2 -0
  50. package/dist/server/quota-spawn-error.test.d.ts.map +1 -0
  51. package/dist/server/quota-spawn-error.test.js +77 -0
  52. package/dist/server/quota-spawn-error.test.js.map +1 -0
  53. package/dist/server/quota.d.ts +64 -0
  54. package/dist/server/quota.d.ts.map +1 -0
  55. package/dist/server/quota.js +432 -0
  56. package/dist/server/quota.js.map +1 -0
  57. package/dist/server/skills.d.ts +8 -0
  58. package/dist/server/skills.d.ts.map +1 -0
  59. package/dist/server/skills.js +65 -0
  60. package/dist/server/skills.js.map +1 -0
  61. package/dist/server/test.d.ts +3 -0
  62. package/dist/server/test.d.ts.map +1 -0
  63. package/dist/server/test.js +259 -0
  64. package/dist/server/test.js.map +1 -0
  65. package/dist/ui/build-config.d.ts +3 -0
  66. package/dist/ui/build-config.d.ts.map +1 -0
  67. package/dist/ui/build-config.js +113 -0
  68. package/dist/ui/build-config.js.map +1 -0
  69. package/dist/ui/build-config.test.d.ts +2 -0
  70. package/dist/ui/build-config.test.d.ts.map +1 -0
  71. package/dist/ui/build-config.test.js +49 -0
  72. package/dist/ui/build-config.test.js.map +1 -0
  73. package/dist/ui/index.d.ts +3 -0
  74. package/dist/ui/index.d.ts.map +1 -0
  75. package/dist/ui/index.js +3 -0
  76. package/dist/ui/index.js.map +1 -0
  77. package/dist/ui/parse-stdout.d.ts +3 -0
  78. package/dist/ui/parse-stdout.d.ts.map +1 -0
  79. package/dist/ui/parse-stdout.js +261 -0
  80. package/dist/ui/parse-stdout.js.map +1 -0
  81. package/dist/ui/parse-stdout.test.d.ts +2 -0
  82. package/dist/ui/parse-stdout.test.d.ts.map +1 -0
  83. package/dist/ui/parse-stdout.test.js +77 -0
  84. package/dist/ui/parse-stdout.test.js.map +1 -0
  85. package/package.json +55 -0
  86. package/skills/diagnose-why-work-stopped/SKILL.md +161 -0
  87. package/skills/evermore/SKILL.md +366 -0
  88. package/skills/evermore/references/api-reference.md +899 -0
  89. package/skills/evermore/references/company-skills.md +193 -0
  90. package/skills/evermore/references/issue-workspaces.md +80 -0
  91. package/skills/evermore/references/routines.md +187 -0
  92. package/skills/evermore/references/workflows.md +141 -0
  93. package/skills/evermore-converting-plans-to-tasks/SKILL.md +42 -0
  94. package/skills/evermore-create-agent/SKILL.md +163 -0
  95. package/skills/evermore-create-agent/references/agent-instruction-templates.md +123 -0
  96. package/skills/evermore-create-agent/references/agents/coder.md +64 -0
  97. package/skills/evermore-create-agent/references/agents/qa.md +88 -0
  98. package/skills/evermore-create-agent/references/agents/securityengineer.md +135 -0
  99. package/skills/evermore-create-agent/references/agents/uxdesigner.md +115 -0
  100. package/skills/evermore-create-agent/references/api-reference.md +110 -0
  101. package/skills/evermore-create-agent/references/baseline-role-guide.md +168 -0
  102. package/skills/evermore-create-agent/references/draft-review-checklist.md +95 -0
  103. package/skills/evermore-create-plugin/SKILL.md +101 -0
  104. package/skills/evermore-dev/SKILL.md +267 -0
  105. package/skills/para-memory-files/SKILL.md +104 -0
  106. package/skills/para-memory-files/references/schemas.md +35 -0
  107. package/skills/terminal-bench-loop/SKILL.md +236 -0
@@ -0,0 +1,267 @@ package/skills/evermore-dev/SKILL.md
---
name: evermore-dev
required: false
description: >
  Develop and operate a local Evermore instance — start and stop servers,
  pull updates from master, run builds and tests, manage worktrees, back up
  databases, and diagnose problems. Use whenever you need to work on the
  Evermore codebase itself or keep a running instance healthy.
---

# Evermore Dev

This skill covers the day-to-day workflows for developing and operating a local Evermore instance. It assumes you are working inside the Evermore repo checkout with `origin` pointing to `git@github.com:phuctm97/evermore.git`.

> **OPEN SOURCE HYGIENE:** This repository is public-facing. Treat anything you push to `origin` as publishable. Never commit or push secrets, API keys, tokens, private logs, PII, customer data, or machine-local configuration that should stay private. Keep git history tidy as well: avoid pushing throwaway branches, noisy checkpoint commits, or speculative work that does not need to be shared upstream.

> **MANDATORY:** Before running any CLI command, building, testing, or managing worktrees, you MUST read `doc/DEVELOPING.md` in the Evermore repo. It is the canonical reference for all `evermore` CLI commands, their options, build/test workflows, database operations, worktree management, and diagnostics. Do NOT guess at flags or options — read the doc first.

## Quick Command Reference

These are the most common commands. For full option tables and details, see `doc/DEVELOPING.md`.

| Task | Command |
|------|---------|
| Start server (first time or normal) | `npx evermore.work run` |
| Dev mode with hot reload | `pnpm dev` |
| Stop dev server | `pnpm dev:stop` |
| Build | `pnpm build` |
| Type-check | `pnpm typecheck` |
| Run tests | `pnpm test` |
| Run migrations | `pnpm db:migrate` |
| Regenerate Drizzle client | `pnpm db:generate` |
| Back up database | `npx evermore.work db:backup` |
| Health check | `npx evermore.work doctor --repair` |
| Print env vars | `npx evermore.work env` |
| Trigger agent heartbeat | `npx evermore.work heartbeat run --agent-id <id>` |
| Install agent skills locally | `npx evermore.work agent local-cli <agent> --company-id <id>` |

## Pulling from Master

```bash
git fetch origin && git pull origin master
pnpm install && pnpm build
```

If schema changes landed, also run `pnpm db:generate && pnpm db:migrate`.
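The steps above can be combined into one post-pull script. This is a sketch only; the directory pattern used to decide whether schema changes landed is an assumption, not a documented convention, so adjust it to wherever the Drizzle schema actually lives in the repo:

```shell
#!/bin/sh
# Sketch of the post-pull routine. The path pattern below (directories
# named drizzle/, schema/, or migrations/) is an ASSUMPTION; adjust it
# to this repo's actual layout.
needs_migration() {
  # $1: newline-separated list of files changed by the pull
  printf '%s\n' "$1" | grep -qE '(drizzle|schema|migrations)/'
}

# After `git pull origin master && pnpm install && pnpm build`:
changed=$(git diff --name-only 'HEAD@{1}' HEAD 2>/dev/null || true)
if needs_migration "$changed"; then
  echo "Schema changed: run pnpm db:generate && pnpm db:migrate"
fi
```

The function takes a file list rather than calling git directly, so the "did schema change" rule can be tested on its own.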
## Worktrees

Evermore worktrees combine git worktrees with isolated Evermore instances — each gets its own database, server port, and environment seeded from the primary instance.

> **MANDATORY:** Before creating or managing worktrees, you MUST read the "Worktree-local Instances" and "Worktree CLI Reference" sections in `doc/DEVELOPING.md`. That is the canonical reference for all worktree commands, their options, seed modes, and environment variables.

### When to Use Worktrees

- Starting a feature branch that needs its own Evermore environment
- Running parallel agent work without cross-contaminating the primary instance
- Testing Evermore changes in isolation before merging

### Command Overview

The CLI has two tiers (see `doc/DEVELOPING.md` for full option tables):

| Command | Purpose |
|---------|---------|
| `worktree:make <name>` | Create worktree + isolated instance in one step |
| `worktree:list` | List worktrees and their Evermore status |
| `worktree:merge-history` | Preview/import issue history between worktrees |
| `worktree:cleanup <name>` | Remove worktree, branch, and instance data |
| `worktree init` | Bootstrap instance inside existing worktree |
| `worktree env` | Print shell exports for worktree instance |
| `worktree reseed` | Refresh worktree DB from another instance |
| `worktree repair` | Fix broken/missing worktree instance metadata |

### Typical Workflow

```bash
# 1. Create a worktree for a feature
npx evermore.work worktree:make my-feature --start-point origin/master

# 2. Move into the worktree (path printed by worktree:make) and source the environment
cd <worktree-path>
eval "$(npx evermore.work worktree env)"

# 3. Start the isolated Evermore server
npx evermore.work run

# 4. Do your work

# 5. When done, merge history back if needed
npx evermore.work worktree:merge-history --from evermore-my-feature --to current --apply

# 6. Clean up
npx evermore.work worktree:cleanup my-feature
```

## Forks — Prefer Pushing to a User Fork

If the user has a personal fork of `phuctm97/evermore` configured as a git remote, push your feature branches to **that fork** instead of creating branches on the main repo. This keeps the upstream branch list clean and matches the standard open-source contribution flow.

### Detect a fork remote

Before pushing or creating a PR, list remotes and check for one that points at an `evermore` fork owned by someone other than `phuctm97`:

```bash
git remote -v
```

Treat any remote whose URL points to `github.com:<user>/evermore` (or `github.com/<user>/evermore.git`) as the user's fork. Common names are `fork`, `<username>`, or `myfork`. The remote named `origin` or `upstream` that points at `phuctm97/evermore` is the canonical upstream — do not push feature branches there if a fork exists.
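That detection rule can be scripted. The helper below is a sketch under the URL conventions just described; the remote names and URLs in the example listing are hypothetical:

```shell
#!/bin/sh
# Sketch: given `git remote -v` output, print the name of the first remote
# whose URL is a GitHub "evermore" repo NOT owned by phuctm97 (the fork).
find_fork_remote() {
  printf '%s\n' "$1" | awk '
    /\(fetch\)/ && $2 ~ /github\.com[/:][^ \t]+\/evermore(\.git)?$/ &&
        $2 !~ /phuctm97\// { print $1; exit }'
}

# Hypothetical remote listing, shaped like `git remote -v` output:
remotes='origin git@github.com:phuctm97/evermore.git (fetch)
origin git@github.com:phuctm97/evermore.git (push)
fork git@github.com:alice/evermore.git (fetch)
fork git@github.com:alice/evermore.git (push)'

find_fork_remote "$remotes"   # prints: fork
```

Passing the listing in as a string keeps the rule testable without a real repo; in practice feed it `"$(git remote -v)"`.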

### Pushing to the fork

```bash
# Push the current branch to the user's fork and set upstream
git push -u <fork-remote> HEAD
```

Then create the PR from the fork branch:

```bash
gh pr create --repo phuctm97/evermore --head <fork-owner>:<branch-name> ...
```

`gh pr create` usually figures out the head ref automatically when run from a branch tracking the fork; the explicit `--head <owner>:<branch>` form is the reliable fallback when it does not.

### When no fork exists

If `git remote -v` shows only `phuctm97/evermore` remotes (no user fork), fall back to pushing branches to `origin` as before. Do NOT create a fork on the user's behalf — ask first.

### Keeping the fork up to date

The canonical remote that points at `phuctm97/evermore` may be named `origin` **or** `upstream` depending on how the user set up the repo. Detect it the same way as in the "Detect a fork remote" step, then fetch and push with that remote so the sync works under either convention:

```bash
UPSTREAM_REMOTE=$(git remote -v | awk '/phuctm97\/evermore.*\(fetch\)/{print $1; exit}')
git fetch "$UPSTREAM_REMOTE"
git push <fork-remote> "${UPSTREAM_REMOTE}/master:master"
```

## Pull Requests

> **MANDATORY PRE-FLIGHT:** Before creating ANY pull request, you MUST read the canonical source files listed below. Do NOT run `gh pr create` until you have read these files and verified your PR body matches every required section.

### Step 1 — Read the canonical files

You MUST read all three of these files before creating a PR:

1. **`.github/PULL_REQUEST_TEMPLATE.md`** — the required PR body structure
2. **`CONTRIBUTING.md`** — contribution conventions, PR requirements, and thinking-path examples
3. **`.github/workflows/pr.yml`** — CI checks that gate merge

### Step 2 — Validate your PR body against this checklist

After reading the template, verify your `--body` includes every one of these sections (names must match exactly):

- [ ] `## Thinking Path` — blockquote style, 5-8 reasoning steps
- [ ] `## What Changed` — bullet list of concrete changes
- [ ] `## Verification` — how a reviewer confirms this works
- [ ] `## Risks` — what could go wrong
- [ ] `## Model Used` — provider, model ID, version, capabilities
- [ ] `## Checklist` — copied from the template, items checked off

If any section is missing or empty, do NOT submit the PR. Go back and fill it in.
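The heading check (though not the content quality) can be automated as a pre-flight. This is a sketch; `check_pr_body` and the demo file path are hypothetical names, not part of any tooling:

```shell
#!/bin/sh
# Sketch: fail if a drafted PR body file is missing any required section.
# Only verifies the headings exist; the content still has to be real.
check_pr_body() {
  file="$1"; missing=0
  for section in '## Thinking Path' '## What Changed' '## Verification' \
                 '## Risks' '## Model Used' '## Checklist'; do
    grep -qF "$section" "$file" || { echo "missing: $section"; missing=1; }
  done
  return "$missing"
}

# Demo: a body that omits "## Risks" fails the check.
printf '%s\n' '## Thinking Path' '## What Changed' '## Verification' \
              '## Model Used' '## Checklist' > /tmp/pr-body-demo.md
if check_pr_body /tmp/pr-body-demo.md; then
  echo "complete"
else
  echo "incomplete"   # this branch runs, after "missing: ## Risks"
fi
```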

### Step 3 — Create the PR

Only after completing Steps 1 and 2, run `gh pr create`. Use the template contents as the structure for `--body` — do not write a freeform summary.

## Hard Rules — Do NOT Bypass

These rules exist because agents have caused real damage by improvising around CLI failures. Follow them exactly.

1. **CLI is the only interface to worktrees and databases.** All worktree and database operations MUST go through `npx evermore.work` / `pnpm evermore` commands. You MUST NOT:
   - Run `pg_dump`, `pg_restore`, `psql`, `createdb`, `dropdb`, or any raw postgres commands
   - Manually set `DATABASE_URL` to point a worktree server at another instance's database
   - Run `rm -rf` on any `.evermore/`, `.evermore-worktrees/`, or `db/` directory
   - Directly manipulate embedded postgres data directories
   - Kill postgres processes by PID

2. **If a CLI command fails, stop and report.** Do NOT attempt workarounds. If `worktree:make`, `worktree reseed`, `worktree init`, `worktree:cleanup`, or any other `evermore` command fails:
   - Report the exact error message in your task comment
   - Set the task to `blocked`
   - Suggest running `npx evermore.work doctor --repair` or recreating the worktree from scratch
   - Do NOT try to manually replicate what the CLI does

3. **Never share databases between instances.** Each worktree instance gets its own isolated database. Never override `DATABASE_URL` to point one instance at another's database. This destroys isolation and can corrupt production data.

4. **Starting a dev server in a worktree requires setup first.** The correct sequence is:

   ```bash
   # If the worktree already exists but has no running instance:
   cd <worktree-path>
   eval "$(npx evermore.work worktree env)"
   pnpm install && pnpm build
   npx evermore.work run   # or pnpm dev

   # If the worktree needs a fresh database:
   npx evermore.work worktree reseed --seed-mode full

   # If the worktree is broken beyond repair:
   npx evermore.work worktree:cleanup <name>
   npx evermore.work worktree:make <name> --seed-mode full
   ```

   If any step fails, follow rule 2 — stop and report.

5. **Seeding is a CLI operation.** When asked to seed a worktree database from the main instance, use `worktree reseed` or recreate with `worktree:make --seed-mode full`. Read `doc/DEVELOPING.md` for the full option tables. Never attempt manual database copying.

## Persistent Dev Servers (for Manual Testing)

When an agent needs to start a dev server that outlives the current heartbeat — for example, so a human or QA agent can manually test against it — the server process **must** be launched in a detached session. A process started directly from a heartbeat shell is killed when the heartbeat exits.

### Use `tmux` for persistent servers

```bash
# 1. cd into the worktree (or main repo) and source the environment
cd <worktree-path>
eval "$(npx evermore.work worktree env)"   # skip if using the primary instance

# 2. Start the dev server in a named, detached tmux session
tmux new-session -d -s <session-name> 'pnpm dev'

# Example with a descriptive name:
tmux new-session -d -s auth-fix-3102 'pnpm dev'
```

### Managing the session

| Task | Command |
|------|---------|
| Check if the session is alive | `tmux has-session -t <session-name> 2>/dev/null && echo running` |
| View server output | `tmux capture-pane -t <session-name> -p` |
| Kill the session | `tmux kill-session -t <session-name>` |
| List all tmux sessions | `tmux list-sessions` |

### Verifying the server is reachable

After launching, confirm the port is listening before reporting success:

```bash
# Wait briefly for startup, then verify
sleep 3
curl -sf http://127.0.0.1:<port>/api/health && echo "Server is up"
lsof -nP -iTCP:<port> -sTCP:LISTEN
```

### Key rules

1. **Always use `tmux` (or equivalent)** when a dev server needs to stay running after the heartbeat ends. A server started directly from the agent shell will die when the heartbeat exits, even if it appeared healthy moments before.
2. **Name the session descriptively** — include the worktree name and port (e.g., `auth-fix-3102`).
3. **Verify the server is listening** before reporting the URL to anyone.
4. **Do not use `nohup` or `&` alone** — these are unreliable for agent shells that may have their entire process group killed.
5. **Clean up when done** — kill the tmux session when the testing is complete.

## Common Mistakes

| Mistake | Fix |
|---------|-----|
| Server won't start | Run `npx evermore.work doctor --repair` to diagnose and auto-fix |
| Forgetting to source worktree env | Run `eval "$(npx evermore.work worktree env)"` after cd-ing into the worktree |
| Stale dependencies after pull | Run `pnpm install && pnpm build` after pulling |
| Schema out of date after pull | Run `pnpm db:generate && pnpm db:migrate` |
| Reseeding while target DB is running | Stop the target server first, or use `--allow-live-target` |
| Cleaning up with unmerged commits | Merge or push first, or use `--force` if intentionally discarding |
| Running agents against wrong instance | Verify `EVERMORE_API_URL` points to the correct port |
| CLI command fails | Do NOT work around it — report the error and block (see Hard Rules above) |
| Agent tries manual postgres operations | NEVER do this — all DB ops go through the CLI (see Hard Rules above) |
| Dev server dies between heartbeats | Launch in a detached `tmux` session — see "Persistent Dev Servers" above |
| Pushed feature branch to `phuctm97/evermore` when a fork exists | Push to the user's fork remote instead — see "Forks" above |
@@ -0,0 +1,104 @@ package/skills/para-memory-files/SKILL.md
---
name: para-memory-files
description: >
  File-based memory system using Tiago Forte's PARA method. Use this skill whenever
  you need to store, retrieve, update, or organize knowledge across sessions. Covers
  three memory layers: (1) Knowledge graph in PARA folders with atomic YAML facts,
  (2) Daily notes as raw timeline, (3) Tacit knowledge about user patterns. Also
  handles planning files, memory decay, weekly synthesis, and recall via qmd.
  Trigger on any memory operation: saving facts, writing daily notes, creating
  entities, running weekly synthesis, recalling past context, or managing plans.
---

# PARA Memory Files

Persistent, file-based memory organized by Tiago Forte's PARA method. Three layers: a knowledge graph, daily notes, and tacit knowledge. All paths are relative to `$AGENT_HOME`.

## Three Memory Layers

### Layer 1: Knowledge Graph (`$AGENT_HOME/life/` -- PARA)

Entity-based storage. Each entity gets a folder with two tiers:

1. `summary.md` -- quick context, load first.
2. `items.yaml` -- atomic facts, load on demand.

```text
$AGENT_HOME/life/
  projects/       # Active work with clear goals/deadlines
    <name>/
      summary.md
      items.yaml
  areas/          # Ongoing responsibilities, no end date
    people/<name>/
    companies/<name>/
  resources/      # Reference material, topics of interest
    <topic>/
  archives/       # Inactive items from the other three
  index.md
```

**PARA rules:**

- **Projects** -- active work with a goal or deadline. Move to archives when complete.
- **Areas** -- ongoing (people, companies, responsibilities). No end date.
- **Resources** -- reference material, topics of interest.
- **Archives** -- inactive items from any category.

**Fact rules:**

- Save durable facts immediately to `items.yaml`.
- Weekly: rewrite `summary.md` from active facts.
- Never delete facts. Supersede instead (`status: superseded`, add `superseded_by`).
- When an entity goes inactive, move its folder to `$AGENT_HOME/life/archives/`.

**When to create an entity:**

- Mentioned 3+ times, OR
- Direct relationship to the user (family, coworker, partner, client), OR
- Significant project or company in the user's life.
- Otherwise, note it in daily notes.

For the atomic fact YAML schema and memory decay rules, see [references/schemas.md](references/schemas.md).

### Layer 2: Daily Notes (`$AGENT_HOME/memory/YYYY-MM-DD.md`)

Raw timeline of events -- the "when" layer.

- Write continuously during conversations.
- Extract durable facts to Layer 1 during heartbeats.

### Layer 3: Tacit Knowledge (`$AGENT_HOME/MEMORY.md`)

How the user operates -- patterns, preferences, lessons learned.

- Not facts about the world; facts about the user.
- Update whenever you learn new operating patterns.

## Write It Down -- No Mental Notes

Memory does not survive session restarts. Files do.

- Want to remember something -> WRITE IT TO A FILE.
- "Remember this" -> update `$AGENT_HOME/memory/YYYY-MM-DD.md` or the relevant entity file.
- Learn a lesson -> update AGENTS.md, TOOLS.md, or the relevant skill file.
- Make a mistake -> document it so future-you does not repeat it.
- On-disk text files are always better than holding it in temporary context.
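A minimal sketch of that reflex, appending a timestamped entry to today's daily note (the temp-dir fallback is for illustration only; real use relies on `$AGENT_HOME` being set):

```shell
#!/bin/sh
# Minimal sketch: append a timestamped entry to today's daily note.
# Falls back to a temp dir when AGENT_HOME is unset (illustration only).
AGENT_HOME="${AGENT_HOME:-$(mktemp -d)}"
note="$AGENT_HOME/memory/$(date +%F).md"
mkdir -p "$(dirname "$note")"
printf -- '- %s %s\n' "$(date +%H:%M)" \
  "User prefers worktree-per-feature branches." >> "$note"
```

Appending (rather than overwriting) keeps the day's timeline intact across multiple writes.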
## Memory Recall -- Use qmd

Use `qmd` rather than grepping files:

```bash
qmd query "what happened at Christmas"   # Semantic search with reranking
qmd search "specific phrase"             # BM25 keyword search
qmd vsearch "conceptual question"        # Pure vector similarity
```

Index your personal folder: `qmd index $AGENT_HOME`

Vectors + BM25 + reranking finds things even when the wording differs.

## Planning

Keep plans in timestamped files in `plans/` at the project root (outside personal memory so other agents can access them). Use `qmd` to search plans. Plans go stale -- if a newer plan exists, do not confuse yourself with an older version. If you notice a plan is stale, update the file to note which plan supersedes it.
@@ -0,0 +1,35 @@ package/skills/para-memory-files/references/schemas.md
# Schemas and Memory Decay

## Atomic Fact Schema (items.yaml)

```yaml
- id: entity-001
  fact: "The actual fact"
  category: relationship | milestone | status | preference
  timestamp: "YYYY-MM-DD"
  source: "YYYY-MM-DD"
  status: active          # active | superseded
  superseded_by: null     # e.g. entity-002
  related_entities:
    - companies/acme
    - people/jeff
  last_accessed: "YYYY-MM-DD"
  access_count: 0
```

## Memory Decay

Facts decay in retrieval priority over time so stale info does not crowd out recent context.

**Access tracking:** When a fact is used in conversation, bump `access_count` and set `last_accessed` to today. During heartbeat extraction, scan the session for referenced entity facts and update their access metadata.

**Recency tiers (for summary.md rewriting):**

- **Hot** (accessed in last 7 days) -- include prominently in summary.md.
- **Warm** (8-30 days ago) -- include at lower priority.
- **Cold** (30+ days or never accessed) -- omit from summary.md. Still in items.yaml, retrievable on demand.
- High `access_count` resists decay -- frequently used facts stay warm longer.
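The tier boundaries can be written down directly. A sketch with a pinned "today" so the outputs are reproducible; it assumes GNU `date -d` (BSD/macOS `date` would need `-j -f` instead):

```shell
#!/bin/sh
# Sketch of the recency tiers. TODAY is pinned so the examples below are
# reproducible; in real use substitute $(date +%F). Assumes GNU date -d.
TODAY="2026-02-01"

tier_for() {
  # $1: last_accessed date (YYYY-MM-DD); prints hot | warm | cold
  days=$(( ( $(date -d "$TODAY" +%s) - $(date -d "$1" +%s) ) / 86400 ))
  if [ "$days" -le 7 ]; then echo hot
  elif [ "$days" -le 30 ]; then echo warm
  else echo cold
  fi
}

tier_for "2026-01-30"   # hot  (2 days ago)
tier_for "2026-01-10"   # warm (22 days ago)
tier_for "2025-11-01"   # cold (92 days ago)
```

Weighting by `access_count` within each tier (the "resists decay" rule) would then be a sort key on top of this classification.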

**Weekly synthesis:** Sort by recency tier, then by `access_count` within tier. Cold facts drop out of the summary but remain in items.yaml. Accessing a cold fact reheats it.

No deletion. Decay only affects retrieval priority via summary.md curation. The full record always lives in items.yaml.
@@ -0,0 +1,236 @@ package/skills/terminal-bench-loop/SKILL.md
---
name: terminal-bench-loop
description: >
  Run a single Terminal-Bench problem through Evermore in a bounded,
  human-in-the-loop improvement cycle until the smoke passes, the board
  rejects the next fix, the iteration budget is exhausted, or a real
  blocker is named. Each iteration runs a bounded smoke against an
  isolated Evermore App worktree, captures artifacts, diagnoses the
  exact stop point with `/diagnose-why-work-stopped`, requests board
  confirmation before any product fix, then reruns against the same
  worktree. Use whenever an issue asks to "run Terminal-Bench in a
  loop", "drive Terminal-Bench until it passes", "loop fix-git through
  Evermore", or otherwise points at a Terminal-Bench task and asks for
  bounded iteration with diagnosis.
---

# Terminal-Bench Loop

A repeatable operating skill for driving one Terminal-Bench problem to a passing smoke through Evermore, with explicit issue topology, bounded runs, board-gated product fixes, and worktree continuity.

This skill is **operational + diagnostic**, not engineering. It coordinates issues, artifacts, and approvals around a Terminal-Bench loop. It does not authorize code changes — every accepted product fix lands as a separate implementation child issue after a board confirmation.

Canonical execution model: read `doc/execution-semantics.md` before starting a loop or moving any loop issue. Every loop issue must rest in a state the doc allows: terminal (`done`/`cancelled`), explicitly live (active run / queued wake), explicitly waiting (`in_review` with participant/interaction/approval), or explicit recovery/blocker (`blocked` with `blockedByIssueIds` and a named owner).

## When to use

Trigger on an assignment whose title or body matches any of:

- "run Terminal-Bench in a loop", "loop \<task-name\> through Evermore"
- "drive Terminal-Bench fix-git", "iterate on Terminal-Bench until it passes"
- "Terminal-Bench smoke loop", "bench loop", "smoke loop on \<task-name\>"
- An attached link to a Terminal-Bench loop parent issue, plus a request to do another iteration

Also use when the user hands you an existing top-level loop issue and asks for the next iteration, diagnosis, or rerun.

## When NOT to use

- The assignment is to build or change `evermore-bench` itself (Harbor adapter, wrapper, telemetry). Use normal engineering flow on that repo.
- The assignment is to submit a benchmark result for ranking. This skill produces smoke/non-comparable runs by design — escalate full-suite or comparable runs to BenchmarkQualityManager.
- The assignment is a normal Evermore product bug not surfaced by a Terminal-Bench loop. Use normal investigation.
- You have not been granted permission to install or assign company skills, and the asker actually wants library mutation. Hand that step to an authorized skill-library owner.

## Three invariants you must preserve

Every loop iteration and every proposed product fix must hold these three invariants together. They come from `/diagnose-why-work-stopped`, and the user has restated them across the liveness work:

1. **Productive work continues.** Each loop issue must always have a clear next-action owner — agent, board, user, or named blocker. No silent `in_review` with nothing waiting on it.
2. **Only real blockers stop work.** Stops happen when something genuinely cannot proceed (board confirmation, QA, missing credentials, exhausted budget). Pseudo-stops must be detected and routed.
3. **No infinite loops.** Iteration count, wall-clock budget, and a board gate before product fixes are applied keep the loop bounded.

If a proposed iteration violates any of the three, drop it or rework it. State explicitly in the loop issue how each invariant is held this iteration.

## Inputs

Collect these on the top-level loop issue before iteration 1. Any input that cannot be supplied is a blocker — name the unblock owner and stop.

- **Source issue.** The Evermore issue that asked for the loop. The loop parent links back to it.
- **Terminal-Bench task name.** Single-task identifier (e.g. `terminal-bench/fix-git`). Multi-task suites are out of scope for this skill.
- **Iteration budget.** Maximum number of iterations before the loop must stop without further fixes (typical: 3–5). Also record a per-iteration wall-clock cap.
- **Evermore App worktree issue.** The implementation-side issue under the Evermore App project whose execution workspace owns the isolated worktree. First iteration creates it; later iterations reuse it via `inheritExecutionWorkspaceFromIssueId` or equivalent.
- **Benchmark command.** The exact `evermore-bench` invocation, including the `EVERMORE_CMD` (or equivalent) binding pinned to the Evermore App worktree under test. Record verbatim on the loop issue.
- **Dispatch runner config.** The exact Harbor/Evermore runner dispatch config required for the smoke to actually start an Evermore heartbeat. For the current Harbor wrapper, record the `EVERMORE_HARBOR_RUNNER_CONFIG` JSON (or equivalent config file) verbatim enough to preserve: `assignee`, `heartbeat_strategy`, `agent_adapter` / `agent_adapters`, `reuse_host_home` when local credentials are intentionally needed, and the stop budget. A bare Harbor command that creates `BEN-1` as unassigned `todo` with zero heartbeat-enabled agents is a harness/setup failure, not a valid product diagnosis.
- **Latest artifact root.** Filesystem or storage path under which `evermore-bench` writes run artifacts (manifest, `results.jsonl`, Harbor raw job folders, redacted telemetry). Each iteration appends; nothing is overwritten.
- **Approval policy.** Who must accept a proposed product fix before implementation (default: board via `request_confirmation`; CTO if delegated; never the loop driver alone).
65
+
66
+ Record each input on the top-level loop issue (description or a dedicated `inputs` document). If any input changes mid-loop, note the change and the iteration it took effect.
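As one concrete shape for the recorded dispatch config — the field names come from the list above, but every value here is a hypothetical placeholder, not a real recorded input:

```shell
# Hypothetical recorded dispatch config. Field names follow the inputs list
# above; the assignee, strategy, adapter, and budget values are invented
# placeholders for illustration only.
EVERMORE_HARBOR_RUNNER_CONFIG='{
  "assignee": "CodexCoder",
  "heartbeat_strategy": "interval",
  "agent_adapter": "codex-local",
  "reuse_host_home": false,
  "stop_budget": { "max_runs": 5, "wall_clock_minutes": 90 }
}'
export EVERMORE_HARBOR_RUNNER_CONFIG

# Cheap pre-flight check before pasting the block into the loop issue: the
# config must at least name an assignee, or the smoke cannot dispatch.
printf '%s' "$EVERMORE_HARBOR_RUNNER_CONFIG" | grep -q '"assignee"' \
  && echo "dispatch config names an assignee"
```

A config that fails this check is exactly the no-dispatch failure mode described above: `BEN-1` gets created but never assigned.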

## Issue topology

The loop must be representable as a tree, not as prose in comments:

- **Top-level loop issue.** Long-lived. Holds inputs, the iteration counter, current state, links to every iteration child, and the product-rule history. Rests in `in_progress` while an iteration is running; `in_review` only when a typed waiter sits directly on the loop parent (execution-policy participant, `request_confirmation` / `ask_user_questions` / `suggest_tasks` interaction, approval, or named human owner); `blocked` with `blockedByIssueIds` while a child issue is the gating work (an iteration child holding the fix-proposal `request_confirmation`, or implementation, QA, or CTO review children); `done` on pass; or `cancelled` on board rejection or budget exhaustion.
- **Iteration child issues.** One per iteration. Each carries: a bounded run issue (smoke), a diagnosis issue (applies `/diagnose-why-work-stopped`), a fix-proposal document with a `request_confirmation` interaction, and — only after acceptance — implementation, QA, CTO review, and rerun children. Iteration children are blocked by their predecessors so the executor wakes them in order.
- **Evermore App implementation issue.** The first iteration creates a fresh Evermore App child whose project policy spawns an isolated worktree. Every later iteration's implementation/rerun child references that same execution workspace via `inheritExecutionWorkspaceFromIssueId` so the same worktree is amended and tested.

Wire dependencies with `blockedByIssueIds`, never with prose like "blocked by X". When a dependent child is `done`, the executor auto-wakes the next.

## Procedure

### 0. Read the current execution contract

Before opening or advancing a loop, read `doc/execution-semantics.md`. Use that document's terms intact when classifying loop-issue state: live path / waiting path / recovery path; post-run disposition; bounded continuation; productivity review; pause-hold; watchdog. Do not invent a new state.

### 1. Open or reuse the top-level loop issue

- If an existing loop issue is supplied, read it: inputs, iteration counter, last iteration's stop reason, current Evermore App worktree pointer, latest benchmark command.
- If no loop issue exists, create one under the Evermore App project (or the project the source issue points at). Title: `Terminal-Bench loop: <task-name>`. Description captures the inputs above, the iteration budget, and a link to the source issue.
- Verify the worktree pointer still resolves. If the recorded execution workspace was discarded (worktree pruned, project changed), the loop is blocked — name the unblock owner (CodexCoder or the Evermore App owner) and stop.

### 2. Open the iteration child

- Increment the iteration counter on the loop issue.
- Create an iteration child titled `Iteration N: <task-name>`. Its description repeats the inputs and references the loop parent. Block it on the prior iteration's terminal child (if any) so the executor cannot start two iterations in parallel.
- If the iteration counter would exceed the budget, do not create the child. Move the loop issue to `cancelled` (budget exhausted) or `in_review` if the user must decide whether to extend the budget.
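
The budget gate above can be sketched as a few lines of shell; `ITERATION` and `BUDGET` stand in for the counter and budget recorded on the loop issue, and the names are invented:

```shell
# Budget gate sketch for step 2. ITERATION and BUDGET stand in for the
# counter and budget recorded on the loop issue; the names are invented.
BUDGET=5
ITERATION=3

next_iteration() {
  next=$(( ITERATION + 1 ))
  if [ "$next" -gt "$BUDGET" ]; then
    # Do not create the child: stop as cancelled, or in_review if the
    # user must decide whether to extend the budget.
    echo "stop: budget exhausted at iteration $ITERATION"
  else
    echo "open iteration $next"
  fi
}

next_iteration    # open iteration 4
```

The point of the guard is ordering: the check happens before the child is created, so a budget-exhausted loop never leaves a half-opened iteration behind.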

### 3. Run the bounded smoke

- The benchmark command must use the Evermore App worktree under test. Set `EVERMORE_CMD` (or the equivalent command binding) to the CLI entrypoint inside that worktree. Never let the smoke run against the operator's current Evermore checkout.
- The same command block must include the runner dispatch config that makes the benchmark issue actionable. For the current Harbor wrapper, export `EVERMORE_HARBOR_RUNNER_CONFIG` with the intended assignee, heartbeat strategy, agent adapter, credential/home mode, and stop budget. Do not treat a bare `uvx harbor run ...` as the canonical smoke if it omits the dispatch config; record that as a harness/setup miss and rerun with the recorded config.
- Bound the run by wall-clock and by Evermore's run-budget controls. If the smoke would exceed the per-iteration cap, kill it and record the truncation reason.
- Capture, in the iteration child or a dedicated `run` document:
  - Evermore run id and heartbeat run ids
  - benchmark run id, manifest, `results.jsonl` row, Harbor raw job folder
  - dispatch config used (`EVERMORE_HARBOR_RUNNER_CONFIG` or equivalent), including assignee and adapter type
  - the exact stop reason reported by the harness (pass, harness fail, verifier fail, timeout, agent gave up, infrastructure error)
  - heartbeat-enabled and heartbeat-observed agent counts when Evermore telemetry exports them
  - failure taxonomy bucket (task/model, Evermore product, harness/setup, verifier/infrastructure, security, unclear)
  - artifact paths under the latest artifact root
- Label the iteration as **smoke / non-comparable**. Comparable runs are out of scope for this skill.
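
Putting the two bindings together, the recorded command block might look like the sketch below. The worktree path and config values are assumptions; only the env var names and the `uvx harbor run` entrypoint come from this document, and the Harbor line is shown rather than executed:

```shell
# Hypothetical recorded command block for the smoke. The worktree path and
# config values are illustrative placeholders.
export EVERMORE_CMD="$HOME/worktrees/evermore-app-loop/bin/evermore"   # CLI inside the worktree under test
export EVERMORE_HARBOR_RUNNER_CONFIG='{"assignee":"CodexCoder","heartbeat_strategy":"interval","agent_adapter":"codex-local","stop_budget":{"max_runs":5}}'

SMOKE_CMD='uvx harbor run terminal-bench/fix-git'

# The Harbor line alone is not the smoke: without the exports above it is
# the no-dispatch harness failure described in the bullets above.
echo "recorded smoke: $SMOKE_CMD"
echo "pinned worktree CLI: $EVERMORE_CMD"
```

Whoever reruns the smoke from a different shell copies this whole block, not only the Harbor invocation line.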

### 4. Diagnose the exact stop point

Apply the `/diagnose-why-work-stopped` pattern to the iteration's run, scoped to this loop only — do not pull in unrelated forensic boilerplate. Specifically:

- Walk the Evermore issue tree the smoke produced under the Evermore App worktree, node by node, and find the exact `(issue, status)` combination that stopped progress. Quote evidence: run ids, comment timestamps, status transitions.
- Classify every non-progressing issue in that subtree as **truly needs human/board intervention**, **agent-actionable but not currently routed**, or **already covered**.
- State whether the failure is task/model, Evermore product, harness/setup, verifier/infrastructure, security, or unclear. Be explicit when evidence is inferred (e.g. a cross-company API boundary blocks direct reads).
- If the failure is an Evermore product gap, frame the fix as a **general product rule** stated as a contract, and check it against the three invariants above. If the rule would have blocked a recent productive run, narrow it.

Record the diagnosis on the iteration child as a `diagnosis` document. Do not propose code yet.

### 5. Decide the next move

Based on the diagnosis, the iteration ends in exactly one of these terminal-for-iteration states:

- **Pass.** Smoke verifier reports pass. Move the iteration child and the loop parent toward QA/CTO review (Step 8).
- **Product fix proposed.** An Evermore product gap was identified. Write the fix proposal as a `plan` document on the iteration child, then go to Step 6.
- **Non-product failure with retry.** Failure is harness/setup/infrastructure or model flakiness, the iteration budget is not exhausted, and the loop driver believes a rerun without code changes has signal (e.g. transient infra). Record the rationale on the iteration child and go to Step 7 with no implementation step.
- **Real blocker.** Named external blocker (credentials, quota, third-party outage, security review). Move the loop issue to `blocked`, set `blockedByIssueIds` to the blocker issue (creating one if needed), and name the unblock owner. Stop.
- **Budget or board stop.** Iteration budget reached, or the board has rejected the next fix proposal. Move the loop issue to `cancelled` with a comment that summarizes the run history and the reason for stopping.

### 6. Request board confirmation before any product fix

When the iteration ends in **product fix proposed**:

- Update the iteration child's `plan` document with the proposed contract, the three-invariant check, the affected Evermore surfaces, and the phased subtasks (implementation, QA, CTO review, rerun) — but do not create those subtasks yet.
- Open the `request_confirmation` interaction on the **iteration child** (the same issue that owns the `plan` document), targeting the latest plan revision. Idempotency key: `confirmation:{iterationIssueId}:plan:{revisionId}`. Set `continuationPolicy` to `wake_assignee`.
- Move the **iteration child** to `in_review`. The typed waiter — the `request_confirmation` interaction — sits directly on it, so its `in_review` is healthy. A comment links the plan document and names the pending confirmation.
- Move the **loop parent** to `blocked` with `blockedByIssueIds: [iterationChildId]` and a comment naming the board (or whichever approver the approval policy designates) as the unblock owner. Do not move the loop parent to `in_review` here: the typed waiter lives on the iteration child, not on the parent, so the parent's wait path is the child blocker. This matches the topology rule that the loop parent only sits in `in_review` when a typed waiter is attached directly to the parent.
- Wait for acceptance. If the board posts a superseding comment that changes the plan, revise the document, then open a fresh confirmation tied to the new revision on the iteration child — the prior one is invalidated. The loop parent's `blockedByIssueIds` already points at the iteration child, so it does not need to change.
- On rejection, end the loop per the **Budget or board stop** rule; do not silently retry the same proposal.
- On acceptance, create the implementation, QA, CTO review, and rerun child issues with `blockedByIssueIds` wired in order, and update the loop parent's `blockedByIssueIds` to point at the new gating child (typically the implementation child) so the parent stays `blocked` against real downstream work. The implementation child must inherit the Evermore App execution workspace (`inheritExecutionWorkspaceFromIssueId` to the worktree-owning issue) so the fix lands in the same isolated worktree the smoke ran against.
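
The idempotency key composes mechanically from the two ids; a sketch with placeholder ids (the template itself is the one stated above):

```shell
# Compose the confirmation idempotency key from step 6. The ids are
# placeholder values; the template comes from this document.
iteration_issue_id="ITER-42"
plan_revision_id="rev-7"
confirmation_key="confirmation:${iteration_issue_id}:plan:${plan_revision_id}"
echo "$confirmation_key"    # confirmation:ITER-42:plan:rev-7

# A superseding plan revision yields a new key, so the fresh confirmation
# can never collide with the invalidated one.
plan_revision_id="rev-8"
echo "confirmation:${iteration_issue_id}:plan:${plan_revision_id}"
```

Keying on the revision id is what makes "the prior confirmation is invalidated" safe to implement as a plain create-with-idempotency-key call.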

### 7. Rerun against the same worktree

After implementation and QA complete (or immediately, in the **non-product failure with retry** case), the rerun child runs the same `evermore-bench` invocation with `EVERMORE_CMD` still pinned to the Evermore App worktree under test.

- The rerun must use the same worktree the fix landed in. If the workspace was reset between iterations, the loop is invalid — open a blocker on the loop issue and stop.
- On completion, the rerun child becomes the next iteration's run record. If the smoke now passes, jump to Step 8. Otherwise return to Step 4 with a new iteration child (subject to the iteration budget).

### 8. Pass: QA, CTO review, close

When the smoke passes:

- Create QA and CTO review children if they are not already in the dependency chain (CTO review blocked by QA, so the chain wakes in order). Move the loop parent to `blocked` with `blockedByIssueIds` set to the QA / CTO review chain, and post a comment that names QA and CTO as the unblock owners and links the children. The loop parent stays `blocked` — not `in_review` — because the typed waiter lives on the children, not on the parent.
- If you instead want the loop parent itself to sit in `in_review` during this phase (for example because a board user has explicitly volunteered to drive the review), put a typed waiter directly on the parent — execution-policy participant, `request_confirmation` / `ask_user_questions` / `suggest_tasks` interaction, approval, or named human owner — and do not rely on the child chain alone. Do not combine `in_review` on the parent with QA/CTO children acting as the blocker; that is the ambiguous review shape this skill exists to prevent.
- QA validates the artifacts (manifest, `results.jsonl`, Harbor raw job, redacted telemetry) and the rerun's reproducibility against the same worktree.
- CTO reviews the technical scope of any product fixes that landed during the loop.
- On QA + CTO acceptance, close the loop issue with a board-level summary comment: task name, iteration count, stop reason (pass), worktree pointer, link to the final artifact root, and the list of accepted product fixes (each with its implementation issue id).

### 9. Stop rules

The loop **must** stop, with state explicitly recorded on the loop issue, when any of these is true:

- **Pass.** Smoke verifier reports pass and QA + CTO accept (Step 8). Loop issue → `done`.
- **Board rejection.** Board rejects a fix proposal and does not request a revision. Loop issue → `cancelled`. Comment names the rejected proposal and the reason.
- **Iteration budget reached.** Iteration counter reaches the budget without a pass. Loop issue → `cancelled` (or `in_review` if the user must decide whether to extend the budget). Never silently start iteration N+1.
- **Real blocker named.** External blocker (credentials, quota, infra, security, missing skill) cannot be resolved by the loop driver. Loop issue → `blocked` with `blockedByIssueIds` to the blocker issue and the unblock owner named.

A loop must never end on a prose comment alone. Every stop is a status transition with a named next-action owner.
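
The four stop rules reduce to a small status map. A sketch — the reason tokens are invented labels, while the statuses are the ones the rules above require:

```shell
# Map a stop reason to the loop issue's terminal posture. Reason tokens are
# invented labels; the statuses mirror the stop rules above.
stop_status() {
  case "$1" in
    pass)             echo "done" ;;
    board_rejection)  echo "cancelled" ;;
    budget_exhausted) echo "cancelled" ;;  # or in_review if the user may extend the budget
    real_blocker)     echo "blocked" ;;    # plus blockedByIssueIds and a named unblock owner
    *)                echo "not-a-stop" ;; # anything else is not a valid way to end the loop
  esac
}

stop_status pass            # done
stop_status real_blocker    # blocked
```

The fall-through branch is the point: a stop reason outside the four is, by definition, a pseudo-stop that must be routed rather than recorded.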

## Worktree rule

The loop must not test whatever Evermore checkout happens to be current for the heartbeat. It must test the same isolated Evermore App worktree where proposed fixes are applied.

- The first iteration creates the Evermore App implementation child; that project's git-worktree policy spawns a fresh worktree.
- The loop issue records the worktree-owning issue id and the workspace path (or workspace id).
- Every later implementation, QA, and rerun child sets `inheritExecutionWorkspaceFromIssueId` to that worktree-owning issue, so all subsequent loop work shares one workspace.
- The benchmark command always sets `EVERMORE_CMD` (or the equivalent command binding) to the CLI entrypoint inside that worktree, and it carries the recorded dispatch runner config (`EVERMORE_HARBOR_RUNNER_CONFIG` or equivalent) needed to assign the benchmark issue and start the heartbeat. The benchmark command stored on the loop issue is the source of truth — if a heartbeat needs to run the smoke from a different shell, it copies the recorded command block verbatim, not only the Harbor invocation line.
- If the workspace is pruned or the worktree path no longer resolves, the loop is invalid until rebuilt. Mark the loop `blocked` and name the unblock owner (typically CodexCoder or the Evermore App owner).

## Liveness rule

Every loop issue, at the end of every heartbeat, must rest in one of:

- **Terminal:** `done` or `cancelled`. No further action.
- **Explicitly live:** `in_progress` with an active run, an upcoming queued wake, or a child issue actively executing under it.
- **Explicitly waiting:** `in_review` with a typed waiter — execution-policy participant, `request_confirmation` / `ask_user_questions` / `suggest_tasks` interaction, approval, or a named human owner.
- **Explicit recovery / blocker:** `blocked` with `blockedByIssueIds` set to a real blocking issue, plus a comment naming the unblock owner and the action needed.

If a loop issue does not fit one of these on exit, the heartbeat is not done. Fix the state before exiting.
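
The rule can be checked mechanically at heartbeat exit. In this sketch the function name is invented and `evidence` stands in for whatever witness the status requires — an active run or queued wake, a typed waiter, or a blocking issue id:

```shell
# End-of-heartbeat liveness check. The statuses come from this document;
# the evidence argument is a stand-in for the witness each status requires
# (active run / queued wake, typed waiter, or blocking issue id).
loop_state_ok() {
  status="$1"
  evidence="$2"
  case "$status" in
    done|cancelled) return 0 ;;      # terminal: nothing further needed
    in_progress|in_review|blocked)
      [ -n "$evidence" ] ;;          # live/waiting/blocker states need a witness
    *) return 1 ;;                   # any other resting state fails the rule
  esac
}

loop_state_ok blocked "BLK-9" && echo "ok to exit heartbeat" || echo "fix the state before exiting"
```

An `in_review` with empty evidence fails the check — that is exactly the "silent review" shape the pitfalls section below warns about.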

## Pitfalls

- **Running the smoke against the operator's Evermore checkout.** The whole point of the worktree rule is that the bench tests the worktree the fix lands in. Always set `EVERMORE_CMD` and verify the path before launching the run.
- **Dropping the dispatch config.** A Harbor run that omits `EVERMORE_HARBOR_RUNNER_CONFIG` (or equivalent) may boot Evermore and create `BEN-1`, but leave it unassigned with zero heartbeat-enabled agents. That is not a Terminal-Bench product signal. Preserve and rerun the full command block, including assignee and adapter config.
- **Coding before approval.** No implementation child exists until a board confirmation accepts the iteration's `plan` document. Do not push code in the diagnostic phase.
- **Skipping the recent-work survey.** When proposing an Evermore product rule, check what already shipped in the affected liveness/execution area in the last few days. A rule that contradicts last week's accepted contract is rework.
- **Letting `in_review` mean done.** A loop or iteration child sitting in `in_review` with no participant, no interaction, no approval, and no human owner is a stop, not progress. Treat it as a liveness violation and route it.
- **Silent iteration N+1.** If the iteration budget is reached, never start another iteration without an explicit budget extension recorded on the loop issue.
- **Comparable-run drift.** This skill produces smoke runs only. If the asker wants a comparable benchmark submission, hand off to BenchmarkQualityManager and BenchmarkForensics — do not relabel a smoke as comparable.
- **Recursive recovery.** Stranded-work recovery that recovers its own recovery issues is the canonical infinite loop. If a diagnosis surfaces it inside the smoke's subtree, refuse to deepen and route to `/diagnose-why-work-stopped` for a product-rule fix.
- **Skill-library mutation.** This skill never installs, edits, or assigns company skills as part of a loop iteration. Library changes go to an authorized skill-library owner via a separate issue.
- **Hiding the chain.** Do not silently delete or hide failed iteration children, retracted proposals, or rejected confirmations. The audit trail is the loop's evidence.

## Verification checklist (before exiting a heartbeat that touched the loop)

- [ ] All inputs are recorded on the top-level loop issue, including the exact benchmark command, `EVERMORE_CMD` binding, and dispatch runner config.
- [ ] Iteration counter is up to date and within budget.
- [ ] The Evermore App worktree pointer still resolves, and the iteration's run/implementation/rerun children share that workspace.
- [ ] The smoke run is captured with run ids, manifest, `results.jsonl`, Harbor raw job folder, and stop reason.
- [ ] Evermore telemetry shows the benchmark issue was assigned and a heartbeat was enabled/observed, or the iteration is explicitly classified as harness/setup no-dispatch.
- [ ] Diagnosis applies the `/diagnose-why-work-stopped` pattern, classifies every non-progressing issue, and checks the three invariants.
- [ ] No implementation child exists for an unapproved fix proposal; if one was proposed, a `request_confirmation` is open against the latest plan revision.
- [ ] Every loop and iteration issue rests in a terminal, explicitly-live, explicitly-waiting, or named-blocker state.
- [ ] The stop reason — if the loop stopped this heartbeat — is one of pass, board rejection, budget exhausted, or named real blocker.
- [ ] No company-skill library mutation happened in this heartbeat.

## Deterministic smoke

Run this smoke after installing or changing the skill, before treating it as operational for a live Terminal-Bench loop:

```sh
pnpm smoke:terminal-bench-loop-skill
```

The command uses the current Evermore API token and company from `EVERMORE_API_URL`, `EVERMORE_API_KEY`, and `EVERMORE_COMPANY_ID`. When `EVERMORE_TASK_ID` is set, it attaches the smoke issues under that source issue and inherits its project/goal context. By default it cancels the short-lived smoke issues after verification; pass `-- --keep` to leave the verified `blocked` loop parent, `in_review` iteration child, and pending confirmation available for manual inspection.

The smoke is deterministic and intentionally non-comparable. It does not start Terminal-Bench, Harbor, an agent model, or a provider runtime. It verifies only the control-plane shape:

- local `skills/terminal-bench-loop/SKILL.md` contains the loop contract terms;
- a top-level loop issue can be created and updated into a blocker posture;
- an iteration child issue can be created under the loop parent;
- mocked benchmark artifact paths are recorded on a `run` document;
- a `diagnosis` document names the exact stop point and next-action owner;
- a `request_confirmation` interaction is created and the iteration child rests in `in_review` with a typed waiting path rather than silent review.