agentera 0.0.0 → 3.0.0-dev.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (252)
  1. package/README.md +6 -45
  2. package/bundle/.agentera-npx-bundle.json +4 -0
  3. package/bundle/references/adapters/cursor.md +213 -0
  4. package/bundle/references/adapters/opencode.md +530 -0
  5. package/bundle/references/adapters/package-manifest-interface-model.yaml +337 -0
  6. package/bundle/references/adapters/package-registry.yaml +247 -0
  7. package/bundle/references/adapters/package-surface-characterization.md +48 -0
  8. package/bundle/references/adapters/runtime-adapter-characterization.md +79 -0
  9. package/bundle/references/adapters/runtime-adapter-interface-model.yaml +200 -0
  10. package/bundle/references/adapters/runtime-adapter-registry.yaml +548 -0
  11. package/bundle/references/adapters/runtime-feature-parity.md +189 -0
  12. package/bundle/references/analysis/benchmark.md +267 -0
  13. package/bundle/references/analysis/startup-measurement-contract.yaml +424 -0
  14. package/bundle/references/artifacts/artifact-registry-interface-model.yaml +288 -0
  15. package/bundle/references/cli/agent-ready-state-contract.yaml +950 -0
  16. package/bundle/references/cli/app-lifecycle-vocabulary.yaml +233 -0
  17. package/bundle/references/cli/audience-namespace-cli-migration.yaml +355 -0
  18. package/bundle/references/cli/bundle-skill-vocabulary.yaml +278 -0
  19. package/bundle/references/cli/capability-instruction-contract.yaml +123 -0
  20. package/bundle/references/cli/capability-tool-classification.yaml +53 -0
  21. package/bundle/references/cli/routing-execution-vocabulary.yaml +281 -0
  22. package/bundle/references/cli/update-channels.yaml +120 -0
  23. package/bundle/references/cli/vocabulary-index.yaml +160 -0
  24. package/bundle/references/cli/vocabulary.md +562 -0
  25. package/bundle/references/meta/documentation-inventory.md +43 -0
  26. package/bundle/references/v1-section-mapping.md +47 -0
  27. package/bundle/registry.json +39 -0
  28. package/bundle/skills/agentera/.claude-plugin/plugin.json +27 -0
  29. package/bundle/skills/agentera/SKILL.md +470 -0
  30. package/bundle/skills/agentera/agents/dokumentera.toml +6 -0
  31. package/bundle/skills/agentera/agents/hej.toml +6 -0
  32. package/bundle/skills/agentera/agents/inspektera.toml +6 -0
  33. package/bundle/skills/agentera/agents/inspirera.toml +6 -0
  34. package/bundle/skills/agentera/agents/optimera.toml +6 -0
  35. package/bundle/skills/agentera/agents/orkestrera.toml +6 -0
  36. package/bundle/skills/agentera/agents/planera.toml +6 -0
  37. package/bundle/skills/agentera/agents/profilera.toml +6 -0
  38. package/bundle/skills/agentera/agents/realisera.toml +6 -0
  39. package/bundle/skills/agentera/agents/resonera.toml +6 -0
  40. package/bundle/skills/agentera/agents/visionera.toml +6 -0
  41. package/bundle/skills/agentera/agents/visualisera.toml +6 -0
  42. package/bundle/skills/agentera/capabilities/dokumentera/instructions.md +428 -0
  43. package/bundle/skills/agentera/capabilities/dokumentera/schemas/artifacts.yaml +73 -0
  44. package/bundle/skills/agentera/capabilities/dokumentera/schemas/exit.yaml +35 -0
  45. package/bundle/skills/agentera/capabilities/dokumentera/schemas/triggers.yaml +35 -0
  46. package/bundle/skills/agentera/capabilities/dokumentera/schemas/validation.yaml +139 -0
  47. package/bundle/skills/agentera/capabilities/hej/instructions.md +331 -0
  48. package/bundle/skills/agentera/capabilities/hej/schemas/artifacts.yaml +69 -0
  49. package/bundle/skills/agentera/capabilities/hej/schemas/exit.yaml +32 -0
  50. package/bundle/skills/agentera/capabilities/hej/schemas/triggers.yaml +58 -0
  51. package/bundle/skills/agentera/capabilities/hej/schemas/validation.yaml +55 -0
  52. package/bundle/skills/agentera/capabilities/inspektera/instructions.md +514 -0
  53. package/bundle/skills/agentera/capabilities/inspektera/schemas/artifacts.yaml +76 -0
  54. package/bundle/skills/agentera/capabilities/inspektera/schemas/exit.yaml +36 -0
  55. package/bundle/skills/agentera/capabilities/inspektera/schemas/triggers.yaml +38 -0
  56. package/bundle/skills/agentera/capabilities/inspektera/schemas/validation.yaml +113 -0
  57. package/bundle/skills/agentera/capabilities/inspirera/instructions.md +280 -0
  58. package/bundle/skills/agentera/capabilities/inspirera/schemas/artifacts.yaml +24 -0
  59. package/bundle/skills/agentera/capabilities/inspirera/schemas/exit.yaml +33 -0
  60. package/bundle/skills/agentera/capabilities/inspirera/schemas/triggers.yaml +34 -0
  61. package/bundle/skills/agentera/capabilities/inspirera/schemas/validation.yaml +58 -0
  62. package/bundle/skills/agentera/capabilities/optimera/instructions.md +437 -0
  63. package/bundle/skills/agentera/capabilities/optimera/schemas/artifacts.yaml +69 -0
  64. package/bundle/skills/agentera/capabilities/optimera/schemas/exit.yaml +35 -0
  65. package/bundle/skills/agentera/capabilities/optimera/schemas/triggers.yaml +39 -0
  66. package/bundle/skills/agentera/capabilities/optimera/schemas/validation.yaml +91 -0
  67. package/bundle/skills/agentera/capabilities/orkestrera/instructions.md +433 -0
  68. package/bundle/skills/agentera/capabilities/orkestrera/schemas/artifacts.yaml +64 -0
  69. package/bundle/skills/agentera/capabilities/orkestrera/schemas/exit.yaml +34 -0
  70. package/bundle/skills/agentera/capabilities/orkestrera/schemas/triggers.yaml +42 -0
  71. package/bundle/skills/agentera/capabilities/orkestrera/schemas/validation.yaml +107 -0
  72. package/bundle/skills/agentera/capabilities/planera/instructions.md +368 -0
  73. package/bundle/skills/agentera/capabilities/planera/schemas/artifacts.yaml +62 -0
  74. package/bundle/skills/agentera/capabilities/planera/schemas/exit.yaml +33 -0
  75. package/bundle/skills/agentera/capabilities/planera/schemas/triggers.yaml +34 -0
  76. package/bundle/skills/agentera/capabilities/planera/schemas/validation.yaml +61 -0
  77. package/bundle/skills/agentera/capabilities/profilera/instructions.md +419 -0
  78. package/bundle/skills/agentera/capabilities/profilera/schemas/artifacts.yaml +18 -0
  79. package/bundle/skills/agentera/capabilities/profilera/schemas/exit.yaml +34 -0
  80. package/bundle/skills/agentera/capabilities/profilera/schemas/triggers.yaml +45 -0
  81. package/bundle/skills/agentera/capabilities/profilera/schemas/validation.yaml +57 -0
  82. package/bundle/skills/agentera/capabilities/realisera/instructions.md +403 -0
  83. package/bundle/skills/agentera/capabilities/realisera/schemas/artifacts.yaml +80 -0
  84. package/bundle/skills/agentera/capabilities/realisera/schemas/exit.yaml +35 -0
  85. package/bundle/skills/agentera/capabilities/realisera/schemas/triggers.yaml +39 -0
  86. package/bundle/skills/agentera/capabilities/realisera/schemas/validation.yaml +110 -0
  87. package/bundle/skills/agentera/capabilities/resonera/instructions.md +329 -0
  88. package/bundle/skills/agentera/capabilities/resonera/schemas/artifacts.yaml +47 -0
  89. package/bundle/skills/agentera/capabilities/resonera/schemas/exit.yaml +35 -0
  90. package/bundle/skills/agentera/capabilities/resonera/schemas/triggers.yaml +46 -0
  91. package/bundle/skills/agentera/capabilities/resonera/schemas/validation.yaml +77 -0
  92. package/bundle/skills/agentera/capabilities/visionera/instructions.md +309 -0
  93. package/bundle/skills/agentera/capabilities/visionera/schemas/artifacts.yaml +57 -0
  94. package/bundle/skills/agentera/capabilities/visionera/schemas/exit.yaml +35 -0
  95. package/bundle/skills/agentera/capabilities/visionera/schemas/triggers.yaml +41 -0
  96. package/bundle/skills/agentera/capabilities/visionera/schemas/validation.yaml +74 -0
  97. package/bundle/skills/agentera/capabilities/visualisera/instructions.md +400 -0
  98. package/bundle/skills/agentera/capabilities/visualisera/schemas/artifacts.yaml +44 -0
  99. package/bundle/skills/agentera/capabilities/visualisera/schemas/exit.yaml +34 -0
  100. package/bundle/skills/agentera/capabilities/visualisera/schemas/triggers.yaml +33 -0
  101. package/bundle/skills/agentera/capabilities/visualisera/schemas/validation.yaml +80 -0
  102. package/bundle/skills/agentera/capability_schema_contract.yaml +385 -0
  103. package/bundle/skills/agentera/protocol.yaml +463 -0
  104. package/bundle/skills/agentera/references/contract.md +1039 -0
  105. package/bundle/skills/agentera/schemas/artifacts/changelog.yaml +60 -0
  106. package/bundle/skills/agentera/schemas/artifacts/decisions.yaml +461 -0
  107. package/bundle/skills/agentera/schemas/artifacts/design.yaml +55 -0
  108. package/bundle/skills/agentera/schemas/artifacts/docs.yaml +402 -0
  109. package/bundle/skills/agentera/schemas/artifacts/experiments.yaml +373 -0
  110. package/bundle/skills/agentera/schemas/artifacts/health.yaml +484 -0
  111. package/bundle/skills/agentera/schemas/artifacts/objective.yaml +399 -0
  112. package/bundle/skills/agentera/schemas/artifacts/plan.yaml +342 -0
  113. package/bundle/skills/agentera/schemas/artifacts/progress.yaml +325 -0
  114. package/bundle/skills/agentera/schemas/artifacts/todo.yaml +110 -0
  115. package/bundle/skills/agentera/schemas/artifacts/vision.yaml +262 -0
  116. package/bundle/skills/hej/.claude-plugin/plugin.json +6 -0
  117. package/bundle/skills/hej/SKILL.md +69 -0
  118. package/bundle/skills/hej/agents/hej.toml +11 -0
  119. package/bundle/skills/hej/agents/openai.yaml +8 -0
  120. package/dist/analytics/extractCorpus.js +1791 -0
  121. package/dist/analytics/extractCorpus.js.map +1 -0
  122. package/dist/analytics/usageStats.js +487 -0
  123. package/dist/analytics/usageStats.js.map +1 -0
  124. package/dist/bin/agentera.js +4 -0
  125. package/dist/bin/agentera.js.map +1 -0
  126. package/dist/cli/appContext.js +226 -0
  127. package/dist/cli/appContext.js.map +1 -0
  128. package/dist/cli/argvalidate.js +41 -0
  129. package/dist/cli/argvalidate.js.map +1 -0
  130. package/dist/cli/capabilityContext.js +2421 -0
  131. package/dist/cli/capabilityContext.js.map +1 -0
  132. package/dist/cli/commands/backfill.js +84 -0
  133. package/dist/cli/commands/backfill.js.map +1 -0
  134. package/dist/cli/commands/capability.js +44 -0
  135. package/dist/cli/commands/capability.js.map +1 -0
  136. package/dist/cli/commands/compact.js +148 -0
  137. package/dist/cli/commands/compact.js.map +1 -0
  138. package/dist/cli/commands/doctor.js +180 -0
  139. package/dist/cli/commands/doctor.js.map +1 -0
  140. package/dist/cli/commands/lint.js +179 -0
  141. package/dist/cli/commands/lint.js.map +1 -0
  142. package/dist/cli/commands/prime.js +545 -0
  143. package/dist/cli/commands/prime.js.map +1 -0
  144. package/dist/cli/commands/query.js +346 -0
  145. package/dist/cli/commands/query.js.map +1 -0
  146. package/dist/cli/commands/report.js +210 -0
  147. package/dist/cli/commands/report.js.map +1 -0
  148. package/dist/cli/commands/schema.js +306 -0
  149. package/dist/cli/commands/schema.js.map +1 -0
  150. package/dist/cli/commands/state.js +1012 -0
  151. package/dist/cli/commands/state.js.map +1 -0
  152. package/dist/cli/commands/upgrade.js +49 -0
  153. package/dist/cli/commands/upgrade.js.map +1 -0
  154. package/dist/cli/commands/validate.js +519 -0
  155. package/dist/cli/commands/validate.js.map +1 -0
  156. package/dist/cli/commands/verify.js +204 -0
  157. package/dist/cli/commands/verify.js.map +1 -0
  158. package/dist/cli/dispatch.js +962 -0
  159. package/dist/cli/dispatch.js.map +1 -0
  160. package/dist/cli/orientation.js +595 -0
  161. package/dist/cli/orientation.js.map +1 -0
  162. package/dist/cli/prime-blob.js +3 -0
  163. package/dist/cli/prime-blob.js.map +1 -0
  164. package/dist/cli/stateQuery.js +292 -0
  165. package/dist/cli/stateQuery.js.map +1 -0
  166. package/dist/cli/structured.js +18 -0
  167. package/dist/cli/structured.js.map +1 -0
  168. package/dist/core/difflib.js +274 -0
  169. package/dist/core/difflib.js.map +1 -0
  170. package/dist/core/git.js +43 -0
  171. package/dist/core/git.js.map +1 -0
  172. package/dist/core/paths.js +50 -0
  173. package/dist/core/paths.js.map +1 -0
  174. package/dist/core/pyjson.js +101 -0
  175. package/dist/core/pyjson.js.map +1 -0
  176. package/dist/core/sourceRoot.js +72 -0
  177. package/dist/core/sourceRoot.js.map +1 -0
  178. package/dist/core/toml.js +11 -0
  179. package/dist/core/toml.js.map +1 -0
  180. package/dist/core/yaml.js +25 -0
  181. package/dist/core/yaml.js.map +1 -0
  182. package/dist/eval/evalSkills.js +258 -0
  183. package/dist/eval/evalSkills.js.map +1 -0
  184. package/dist/eval/semanticEval.js +148 -0
  185. package/dist/eval/semanticEval.js.map +1 -0
  186. package/dist/eval/semanticFixtures.js +227 -0
  187. package/dist/eval/semanticFixtures.js.map +1 -0
  188. package/dist/hooks/common.js +160 -0
  189. package/dist/hooks/common.js.map +1 -0
  190. package/dist/hooks/compaction.js +935 -0
  191. package/dist/hooks/compaction.js.map +1 -0
  192. package/dist/hooks/cursorPreToolUse.js +19 -0
  193. package/dist/hooks/cursorPreToolUse.js.map +1 -0
  194. package/dist/hooks/cursorSessionStart.js +71 -0
  195. package/dist/hooks/cursorSessionStart.js.map +1 -0
  196. package/dist/hooks/sessionStart.js +209 -0
  197. package/dist/hooks/sessionStart.js.map +1 -0
  198. package/dist/hooks/sessionStop.js +212 -0
  199. package/dist/hooks/sessionStop.js.map +1 -0
  200. package/dist/hooks/validateArtifact.js +933 -0
  201. package/dist/hooks/validateArtifact.js.map +1 -0
  202. package/dist/registries/artifactRegistry.js +206 -0
  203. package/dist/registries/artifactRegistry.js.map +1 -0
  204. package/dist/registries/capabilityContract.js +310 -0
  205. package/dist/registries/capabilityContract.js.map +1 -0
  206. package/dist/registries/packageRegistry.js +641 -0
  207. package/dist/registries/packageRegistry.js.map +1 -0
  208. package/dist/registries/runtimeAdapterRegistry.js +315 -0
  209. package/dist/registries/runtimeAdapterRegistry.js.map +1 -0
  210. package/dist/setup/codex.js +1052 -0
  211. package/dist/setup/codex.js.map +1 -0
  212. package/dist/setup/copilot.js +227 -0
  213. package/dist/setup/copilot.js.map +1 -0
  214. package/dist/setup/cursor.js +127 -0
  215. package/dist/setup/cursor.js.map +1 -0
  216. package/dist/setup/doctor.js +1269 -0
  217. package/dist/setup/doctor.js.map +1 -0
  218. package/dist/state/installRoot.js +279 -0
  219. package/dist/state/installRoot.js.map +1 -0
  220. package/dist/state/progressCommit.js +289 -0
  221. package/dist/state/progressCommit.js.map +1 -0
  222. package/dist/state/startupAnalysis.js +1953 -0
  223. package/dist/state/startupAnalysis.js.map +1 -0
  224. package/dist/upgrade/appModel.js +189 -0
  225. package/dist/upgrade/appModel.js.map +1 -0
  226. package/dist/upgrade/channels.js +197 -0
  227. package/dist/upgrade/channels.js.map +1 -0
  228. package/dist/upgrade/compatibility.js +197 -0
  229. package/dist/upgrade/compatibility.js.map +1 -0
  230. package/dist/upgrade/doctor.js +368 -0
  231. package/dist/upgrade/doctor.js.map +1 -0
  232. package/dist/upgrade/migrateArtifactsV2ToV3.js +412 -0
  233. package/dist/upgrade/migrateArtifactsV2ToV3.js.map +1 -0
  234. package/dist/upgrade/upgradeCommands.js +40 -0
  235. package/dist/upgrade/upgradeCommands.js.map +1 -0
  236. package/dist/upgrade/upgradeOrchestrator.js +280 -0
  237. package/dist/upgrade/upgradeOrchestrator.js.map +1 -0
  238. package/dist/validate/appHomeContract.js +150 -0
  239. package/dist/validate/appHomeContract.js.map +1 -0
  240. package/dist/validate/capability.js +412 -0
  241. package/dist/validate/capability.js.map +1 -0
  242. package/dist/validate/crossCapability.js +145 -0
  243. package/dist/validate/crossCapability.js.map +1 -0
  244. package/dist/validate/lifecycleAdapters.js +772 -0
  245. package/dist/validate/lifecycleAdapters.js.map +1 -0
  246. package/dist/validate/selfAudit.js +107 -0
  247. package/dist/validate/selfAudit.js.map +1 -0
  248. package/package.json +28 -8
  249. package/LICENSE +0 -201
  250. package/bin/agentera.mjs +0 -50
  251. package/lib/exec.mjs +0 -116
  252. package/lib/resolve.mjs +0 -129
@@ -0,0 +1,189 @@
1
+ # Runtime feature parity reference
2
+
3
+ Tracks release-relevant runtime behavior for the portable agentera suite.
4
+
5
+ This reference distinguishes implemented behavior from host support. A runtime
6
+ may expose an event while agentera still lacks a shipped adapter path for it.
7
+
8
+ ## Summary
9
+
10
+ | Runtime | Skill loading | Session preload | Artifact validation | Session bookmark |
11
+ |---------|---------------|-----------------|---------------------|------------------|
12
+ | Claude Code | Full: marketplace plugin and native skill paths load `skills/<name>/SKILL.md` | Active via `SessionStart` in `hooks/hooks.json` | Advisory after mutation via `PostToolUse` for `Edit` or `Write` | Active via `Stop` |
13
+ | OpenCode | Full: native `skill` tool loads `.opencode`, `.claude`, and `.agents` skill paths | Deferred for session start: `session.created` is observable, but no model-context injection path is verified. Active for compaction through bounded `experimental.session.compacting` context from `agentera hej --format json`. | Conditional hard gate for reconstructable `write` and `edit` candidates via `tool.execute.before`; `tool.execute.after` remains advisory | Active via generic `event` hook on `session.idle` |
14
+ | Copilot CLI | Full for portable skills through plugin or skill-folder install paths | Active via `sessionStart` | Conditional hard gate via `preToolUse` when `toolArgs` include path plus candidate content or exact replacement evidence | Active via `sessionEnd` |
15
+ | Codex CLI | Full for portable skills through plugin install, `.agents/skills`, and `$skill` invocation | Not wired by the shipped hook config | Advisory `apply_patch` path validation through shipped PreToolUse and PostToolUse hooks; final patch content is not reconstructed | Not wired by the shipped hook config |
16
+ | Cursor IDE | Full for portable skills through local plugin (`~/.cursor/plugins/local/agentera` via `.cursor-plugin/plugin.json`), repo-native surfaces, and upgrade-installed `.cursor/` targets | Active via `sessionStart` env export plus optional `additional_context` digest; plugin-root fallback when `AGENTERA_HOME` env and project walk-up fail (`hooks/cursor_session_start.py`) | Conditional hard gate for reconstructable `Write` and `Edit` candidates via `preToolUse`; verified after live preToolUse Write smoke (2026-05-24) | Active via `sessionEnd` |
17
+ | Cursor Agent CLI | Full when workspace surfaces are installed; degraded when launched outside a Cursor project | Degraded relative to IDE sessionStart env export | Degraded hook parity; follows IDE smoke evidence only | Degraded relative to IDE `sessionEnd` wiring |
18
+
19
+ ## Profilera session corpus
20
+
21
+ | Runtime | Local session mining | Evidence |
22
+ | ------- | -------------------- | -------- |
23
+ | Claude Code | Yes: `~/.claude/projects/**/*.jsonl` | `scripts/extract_corpus.py` |
24
+ | Codex CLI | Yes: `~/.codex/sessions/**/*.jsonl` | `scripts/extract_corpus.py` |
25
+ | OpenCode | Yes: `opencode.db` SQLite stores | `scripts/extract_corpus.py`, `references/adapters/opencode.md` |
26
+ | Copilot CLI | Yes: `session-store.db` SQLite stores | `scripts/extract_corpus.py` |
27
+ | Cursor IDE | Yes: `~/.cursor/projects/*/agent-transcripts/*/*.jsonl` | `scripts/extract_corpus.py`, `references/adapters/cursor.md` |
28
+ | Cursor Agent CLI | Yes: gap-fill from `~/.config/cursor/chats/<md5(project)>/<session>/store.db` when no IDE JSONL exists | `scripts/extract_corpus.py`, `references/adapters/cursor.md` |
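The JSONL glob patterns in the session corpus table can be exercised with a short Python sketch. It is not the code in `scripts/extract_corpus.py`. The runtime keys and the `find_session_files` helper are hypothetical names chosen for illustration; only the patterns themselves come from the table.

```python
from pathlib import Path

# Glob patterns copied from the table, relative to the user's home directory.
# The dictionary keys are illustrative labels, not identifiers from Agentera.
JSONL_PATTERNS = {
    "claude-code": ".claude/projects/**/*.jsonl",
    "codex-cli": ".codex/sessions/**/*.jsonl",
    "cursor-ide": ".cursor/projects/*/agent-transcripts/*/*.jsonl",
}


def find_session_files(home: Path) -> dict[str, list[Path]]:
    """Return the JSONL session files found under ``home`` for each runtime."""
    return {
        runtime: sorted(p for p in home.glob(pattern) if p.is_file())
        for runtime, pattern in JSONL_PATTERNS.items()
    }


if __name__ == "__main__":
    for runtime, files in find_session_files(Path.home()).items():
        print(f"{runtime}: {len(files)} session file(s)")
```

The SQLite-backed runtimes (OpenCode, Copilot CLI, and the Cursor Agent CLI gap-fill) are left out because the table does not describe their schemas.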
29
+
30
+ ## Bare `hej` routing
31
+
32
+ | Runtime | Bare text `hej` behavior | Evidence |
33
+ |---------|--------------------------|----------|
34
+ | OpenCode | Deterministic exact-match adapter route through `chat.message`; only a complete lowercase text message `hej` is rewritten to load `agentera` and run the `agentera hej` dashboard path, accepting OpenCode's CLI-added single trailing newline as a transport artifact. | `.opencode/plugins/agentera.js`, `scripts/smoke_opencode_bootstrap.mjs`, OpenCode `packages/plugin/src/index.ts` Hooks interface |
35
+ | Claude Code | Metadata/context only; `UserPromptSubmit` can observe or add context but is not a verified prompt rewrite router. | `skills/agentera/SKILL.md`, marketplace metadata |
36
+ | Copilot CLI | Metadata/context only; skills, prompts, hooks, and plugins expose Agentera but do not guarantee pre-model bare-prompt routing. | `plugin.json`, `.github/plugin/plugin.json`, `.github/hooks` |
37
+ | Codex CLI | Metadata/context only; `$agentera` is explicit and the legacy `$hej` bridge is not implicitly invocable. | `.codex-plugin/plugin.json`, `agents/openai.yaml` |
38
+ | Cursor IDE | Metadata/context only; `beforeSubmitPrompt` is supported by the host but Agentera v1 does not rewrite bare `hej`. | `.cursor-plugin/plugin.json`, `.cursor/agents/*.md` |
39
+ | Cursor Agent CLI | Metadata/context only, as with Claude Code, Copilot CLI, and Codex CLI. | `references/adapters/cursor.md`, eval runner metadata |
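The exact-match rule described for OpenCode can be sketched as a predicate. This is a Python illustration of the stated rule, not the code in `.opencode/plugins/agentera.js`; the function name `is_bare_hej` is hypothetical.

```python
def is_bare_hej(message: str) -> bool:
    """True only for the exact lowercase message "hej".

    A single trailing newline is tolerated as a transport artifact;
    any other surrounding text, casing, or whitespace is rejected.
    """
    return message in ("hej", "hej\n")
```

Matching against a fixed set of strings, rather than trimming whitespace first, keeps the route deterministic: `"hej there"`, `"Hej"`, and `" hej"` all fall through to normal prompt handling.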
40
+
41
+ ## Artifact validation
42
+
43
+ | Runtime | Blocking surface | Implemented gate | Evidence-insufficient paths | Verification surface |
44
+ |---------|------------------|------------------|-----------------------------|----------------------|
45
+ | Claude Code | None in shipped config; validation runs after `Edit` or `Write` | No pre-write hard gate is claimed | Any invalid artifact can already be written before the warning appears | `hooks/hooks.json`, `hooks/validate_artifact.py` |
46
+ | OpenCode | `tool.execute.before` can throw before mutation | Invalid reconstructable artifact `write` and `edit` candidates are blocked | Sparse payloads and `apply_patch` `patchText` without reconstructed full content are allowed | `.opencode/plugins/agentera.js`, `scripts/smoke_opencode_bootstrap.mjs` |
47
+ | Copilot CLI | `preToolUse` returns `permissionDecision: deny` | Invalid reconstructable artifact candidates are denied | Malformed, sparse, or non-reconstructable `toolArgs` are allowed | `.github/hooks/preToolUse.json`, `hooks/validate_artifact.py`, `tests/test_validate_artifact.py` |
48
+ | Codex CLI | `codex_hooks` can run before and after `apply_patch` | No content hard gate is claimed; the copied user hook config parses touched paths and validates existing files; optional plugin-bundled hooks require `[features].plugin_hooks = true` plus `/hooks` review | Add-file targets and final post-patch candidate content are not reconstructed by the adapter | `~/.codex/hooks.json` generated by upgrade, `.codex-plugin/plugin.json` `hooks`, `hooks/codex-plugin-hooks.json`, `hooks/validate_artifact.py`, live apply_patch hook firing smoke |
49
+ | Cursor IDE | `preToolUse` returns `permission: deny` when wired | Invalid reconstructable artifact candidates are denied; live preToolUse Write smoke passed 2026-05-24 | Malformed, sparse, or non-reconstructable tool_input payloads are allowed | `.cursor/hooks.json`, `hooks/cursor_pre_tool_use.py`, `hooks/validate_artifact.py`, `.agentera/smoke-cursor-pretooluse-evidence.txt` |
50
+ | Cursor Agent CLI | None claimed for standalone CLI | No IDE-equivalent hard gate is claimed | CLI may run without project hook wiring | `scripts/eval_skills.py --runtime cursor-agent` |
51
+
52
+ Docs may claim functional hard-gate parity only for closeable paths that are
53
+ implemented and verified. Today that means OpenCode, Copilot, and Cursor IDE
54
+ reconstructable artifact candidates. Claude Code and Codex remain active validation
55
+ surfaces, but neither shipped configuration blocks every invalid artifact candidate
56
+ before mutation.
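The conditional hard-gate behavior described above can be sketched as a small decision function: a candidate is denied only when its full content is reconstructable and fails validation, and anything evidence-insufficient is allowed. This is a minimal Python sketch under assumptions. `ToolCall`, `gate`, and the `is_valid` callback are hypothetical names, not the interface of `hooks/validate_artifact.py`.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ToolCall:
    """Hypothetical, simplified view of a pre-write hook payload."""
    path: Optional[str] = None      # target file, if the payload names one
    content: Optional[str] = None   # full candidate content, if reconstructable


def gate(call: ToolCall, is_valid: Callable[[str, str], bool]) -> str:
    """Return "deny" only for a reconstructable candidate that fails validation."""
    if call.path is None or call.content is None:
        # Evidence-insufficient payload: allow rather than guess.
        return "allow"
    return "allow" if is_valid(call.path, call.content) else "deny"
```

A sparse payload therefore never blocks a write, which is why the reference limits hard-gate parity claims to reconstructable candidates.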
57
+
58
+ ## Lifecycle notes
59
+
60
+ | Runtime | Runtime reason for degraded or blocked capability |
61
+ |---------|---------------------------------------------------|
62
+ | OpenCode preload | The `event` hook observes `session.created`, but no supported adapter path injects text into model context. |
63
+ | OpenCode compaction context | `experimental.session.compacting` appends bounded Agentera state from `agentera hej --format json`; the plugin does not read raw `.agentera` artifacts for compaction. |
64
+ | OpenCode `apply_patch` hard gate | The adapter receives `patchText` without reconstructing full candidate content. It allows that path rather than guessing. |
65
+ | Copilot sparse edits | Copilot `preToolUse` stdin may omit full content or unique old/new replacement evidence. The hook allows those payloads. |
66
+ | Codex preload/bookmarks | `codex_hooks` supports lifecycle events, but Agentera ships only `apply_patch` PreToolUse/PostToolUse wiring for copied user hooks and optional plugin-bundled hooks. |
67
+ | Codex artifact hard gate | The adapter parses patch headers for touched paths, but it does not reconstruct final candidate content for blocking validation. |
68
+ | Codex plugin hook trust | Plugin-bundled hooks require `[features].plugin_hooks = true` and deliberate `/hooks` review; copied `~/.codex/hooks.json` remains the default reliable install path with generated `[hooks.state]` trust hashes. |
69
+ | Cursor cloud agents | Cloud agents are unsupported in v1; repo hooks and managed agents target local IDE sessions only. |
70
+ | Cursor CLI hook parity | `cursor-agent` print mode is eval-covered but hook/session env parity is degraded relative to IDE wiring. |
71
+ | Cursor hard-gate release gate | Live preToolUse Write smoke passed 2026-05-24; release tagging and publication stay blocked pending broader release closeout. |
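The Codex note about parsing patch headers for touched paths can be illustrated with a short Python sketch. It assumes patch headers of the form `*** Add File: <path>`, `*** Update File: <path>`, and `*** Delete File: <path>`; that header convention and the `touched_paths` helper are assumptions for illustration, not details confirmed by this reference.

```python
import re

# Assumed header convention; adjust if the real patch format differs.
HEADER = re.compile(r"^\*\*\* (Add|Update|Delete) File: (.+)$")


def touched_paths(patch_text: str) -> list[tuple[str, str]]:
    """Return (action, path) pairs for each file header found in the patch."""
    found = []
    for line in patch_text.splitlines():
        match = HEADER.match(line)
        if match:
            found.append((match.group(1).lower(), match.group(2).strip()))
    return found
```

Header parsing yields only the paths a patch touches. It does not reconstruct the final file content, which matches the reference's statement that no content hard gate is claimed for Codex.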
72
+
73
+ ## Subagent dispatch
74
+
75
+ | Runtime | Dispatch surface | Descriptor source | Tool access | Verification surface |
76
+ |---------|------------------|-------------------|-------------|----------------------|
77
+ | Claude Code | Native Task/subagent surface | Host-managed; no Agentera descriptor files shipped for this phase | N/A (no descriptors) | RuntimeAdapter registry |
78
+ | OpenCode | `@<capability>` descriptors under `~/.config/opencode/agents` | `.opencode/agents/*.md`, bootstrapped by `.opencode/plugins/agentera.js` | Per-agent `permission` frontmatter | `scripts/smoke_opencode_bootstrap.mjs`, `agentera validate descriptors` |
79
+ | Copilot CLI | User-driven host action such as `/fleet` when available | Host-managed; no Agentera descriptor files shipped for this phase | N/A (no descriptors) | RuntimeAdapter registry |
80
+ | Codex CLI | Native agent descriptors under `~/.codex/agents` or project `.codex/agents` with bounded `[agents]` settings | `skills/agentera/agents/*.toml`, installed by `scripts/setup_codex.py` and `agentera upgrade` | Global sandbox policy (no per-agent) | `agentera validate descriptors`, `tests/test_setup_codex.py`, `tests/test_upgrade_cli.py` |
81
+ | Cursor IDE | Cursor agent picker / @-mention for managed capability descriptors | `.cursor/agents/*.md`, via local plugin or `agentera upgrade --runtime cursor` | Global full access (no per-agent) | `references/adapters/cursor.md`, `scripts/validate_lifecycle_adapters.py`, `tests/test_upgrade_cli.py` |
82
+ | Cursor Agent CLI | Host-managed `cursor-agent -p` print mode | Workspace `.cursor/agents/*.md` when present; no separate CLI descriptor install | Global full access (no per-agent) | `scripts/eval_skills.py --runtime cursor-agent`, `tests/test_eval_skills.py` |
83
+
84
+ Agentera v2 does not write legacy `[agents.<name>]` Codex config blocks. Capability dispatch must use runtime-native subagent descriptors or host Task surfaces, not unsupported `agentera <capability>` CLI commands.
85
+
86
+ ## Copilot install notes
87
+
88
+ Recommended marketplace install:
89
+
90
+ ```bash
91
+ copilot plugin marketplace add jgabor/agentera
92
+ copilot plugin install <skill>@agentera
93
+ ```
94
+
95
+ Umbrella install:
96
+
97
+ ```bash
98
+ copilot plugin install jgabor/agentera
99
+ ```
100
+
101
+ The marketplace install path is verified working. Granular installs avoid the
102
+ umbrella discovery bug `github/copilot-cli#2390`.
103
+
104
+ Granular installs provide core `SKILL.md` behavior. App-home tools such as
105
+ doctor, installer, validators, and shared setup helpers require the managed
106
+ Agentera app or a local clone with the shared `scripts/` directory.
107
+
108
+ Deprecated fallback: `copilot plugin install OWNER/REPO`, Git URLs, and local
109
+ paths still work, but Copilot warns they are deprecated.
110
+
111
+ ## Cursor install notes
112
+
113
+ **Local plugin (no Marketplace listing required)**
114
+
115
+ ```bash
116
+ git clone https://github.com/jgabor/agentera.git ~/.cursor/plugins/local/agentera
117
+ # or: ln -s /path/to/agentera ~/.cursor/plugins/local/agentera
118
+ ```
119
+
120
+ Restart Cursor or run **Developer: Reload Window**. The plugin root must contain
121
+ `.cursor-plugin/plugin.json`. Agentera is not published to the Cursor Marketplace
122
+ yet.
123
+
124
+ The plugin loads skills, managed capability agents, and hooks. When you open a
125
+ project that is not an Agentera install root, `sessionStart` exports
126
+ `AGENTERA_HOME` from the plugin checkout (including a plugin-root fallback when
127
+ env and project walk-up do not resolve a managed root).
128
+
129
+ **Portable skill plus project upgrade**
130
+
131
+ Install the bundled skill, then install managed project surfaces:
132
+
133
+ ```bash
134
+ npx skills add jgabor/agentera -g -a cursor --skill agentera -y
135
+ uv run scripts/agentera upgrade --runtime cursor --dry-run
136
+ uv run scripts/agentera upgrade --runtime cursor --yes
137
+ uv run scripts/agentera doctor --runtime cursor
138
+ ```
139
+
140
+ Use the plugin path for a user-global install. Use upgrade when you need
141
+ project-committed `.cursor/hooks.json` and `.cursor/agents/` copies. Both paths can
142
+ be combined.
143
+
144
+ Repo-native dogfood in this repository uses committed `.cursor/hooks.json` and
145
+ `.cursor/agents/*.md`. Other projects install managed surfaces with the upgrade
146
+ commands above.
147
+
148
+ Cloud agents are unsupported in v1. Conditional hard-gate validation for IDE
149
+ reconstructable Write and Edit candidates is verified after live preToolUse Write
150
+ smoke (2026-05-24); release tagging and publication remain blocked until explicitly
151
+ approved.
152
+
153
+ Eval coverage for automation uses:
154
+
155
+ ```bash
156
+ uv run scripts/eval_skills.py --runtime cursor-agent --dry-run
157
+ ```
158
+
159
+ ## Source of truth
160
+
161
+ Runtime adapter facts are owned by the RuntimeAdapter registry at
162
+ `references/adapters/runtime-adapter-registry.yaml` and loaded through
163
+ `scripts/runtime_adapter_registry.py`. This reference may describe registry
164
+ claims, but changes to runtime identity, lifecycle events, artifact-validation
165
+ support, subagent dispatch, config targets, diagnostics, or documentation claims must be validated
166
+ against the registry rather than duplicated here as an independent table.
167
+
168
+ App-home classification is not runtime-specific. `scripts/install_root.py` is
169
+ the shared Module for `AGENTERA_HOME`, the normal user data root, managed app,
170
+ out-of-date app, unknown directories, and diagnostic semantics. Package metadata
171
+ registry work stays outside both the RuntimeAdapter registry and this shared
172
+ classification Module.
173
+
174
+ | Surface | Path |
175
+ |---------|------|
176
+ | Shared artifact validator | `hooks/validate_artifact.py` |
177
+ | Claude Code hook registry | `hooks/hooks.json` |
178
+ | OpenCode plugin | `.opencode/plugins/agentera.js` |
179
+ | OpenCode agent descriptors | `.opencode/agents/*.md` |
180
+ | Copilot pre-write hook | `.github/hooks/preToolUse.json` |
181
+ | Codex copied user hook config | `~/.codex/hooks.json` generated from `hooks/codex-hooks.json` with a resolved validator command |
182
+ | Codex plugin hook config | `hooks/codex-plugin-hooks.json` via `.codex-plugin/plugin.json` `hooks` |
183
+ | Codex agent descriptors | `skills/agentera/agents/*.toml` |
184
+ | Cursor hook registry | `.cursor/hooks.json` |
185
+ | Cursor agent descriptors | `.cursor/agents/*.md` |
186
+ | Cursor plugin manifest | `.cursor-plugin/plugin.json` |
187
+ | RuntimeAdapter registry | `references/adapters/runtime-adapter-registry.yaml` |
188
+ | RuntimeAdapter registry loader | `scripts/runtime_adapter_registry.py` |
189
+ | Lifecycle metadata validator | `scripts/validate_lifecycle_adapters.py` |
@@ -0,0 +1,267 @@
1
+ # Agentera benchmarks
2
+
3
+ This document indexes benchmark surfaces, execution policy, retained outputs,
4
+ and interpretation rules. Use it when running Agentera benchmarks or adding new
5
+ ones.
6
+
7
+ Scope: maintainer-run benchmarks only. Normal verification still lives in tests,
8
+ contract validators, and runtime smoke checks.
9
+
10
+ ## Authority order
11
+
12
+ | Authority | Owns |
13
+ | --- | --- |
14
+ | `references/analysis/startup-measurement-contract.yaml` | Startup state-access metric contract, benchmark privacy boundary, retained fields, and storage shape. |
15
+ | `scripts/startup_analysis_contract.py` | Startup analyzer implementation, report generation, aggregate row construction, and benchmark persistence. |
16
+ | `magefile.go` | Manual benchmark command surface and non-interactive approval gate. |
17
+ | `tests/test_startup_analysis_contract.py` | Fixture-backed consent, persistence, privacy, and no-repo-output checks. |
18
+ | `references/analysis/benchmark.md` | Human runbook, interpretation guide, and future benchmark documentation pattern. |
19
+
20
+ ## Benchmark surfaces
21
+
22
+ | Surface | Command | Purpose | CI policy |
23
+ | --- | --- | --- | --- |
24
+ | Startup state benchmark | `mage bench:startupState` | Measures how often Agentera CLI state reads are followed by raw artifact access during startup/state gathering. | Manual only; forbidden in normal CI. |
25
+
26
+ The startup benchmark is an optimization signal for Decision 51 and Decision 52.
27
+ It does not implement a startup state envelope or change runtime behavior.
28
+
29
+ ## Startup State Benchmark
30
+
31
+ Run the benchmark with no extra setup:
32
+
33
+ ```bash
+ mage bench:startupState
+ ```
36
+
37
+ The default run uses documented runtime-store defaults, writes retained results
38
+ under the default Agentera benchmark directory, and records unavailable stores as
39
+ bounded degradation evidence. No environment variables are required.
40
+
41
+ To use different runtime history sources, set each runtime label and concrete
42
+ store path explicitly:
43
+
44
+ ```bash
+ AGENTERA_BENCH_RUNTIME_STORES="opencode=/absolute/path/to/opencode.db" mage bench:startupState
+ ```
47
+
48
+ Customize a run when needed:
49
+
50
+ ```bash
+ AGENTERA_BENCH_RUNTIME_STORES="opencode=/absolute/path/to/opencode.db" \
+ AGENTERA_BENCH_SALT="$(openssl rand -hex 32)" \
+ AGENTERA_BENCH_PROJECT_ROOTS="/absolute/project/root" \
+ AGENTERA_BENCH_OUTPUT_DIR="/absolute/benchmark/output" \
+ mage bench:startupState
+ ```
57
+
58
+ Inputs:
59
+
60
+ | Input | Meaning |
61
+ | --- | --- |
62
+ | `AGENTERA_BENCH_RUNTIME_STORES` | Optional. Comma-separated `runtime=/absolute/path` overrides. Use this when the documented runtime-store defaults are not the stores you want measured. A runtime label without its path is not enough. |
63
+ | `AGENTERA_BENCH_SALT` | Optional. Local redaction salt for transient `latest-report.*` pseudonyms. If omitted, Mage generates one. `runs.jsonl` does not retain salts or generated salted hashes, so aggregate history does not require a stable or shared salt. |
64
+ | `AGENTERA_BENCH_PROJECT_ROOTS` | Optional. Absolute project roots for corpus extraction. Use this when the current directory is not the project root you want measured. |
65
+ | `AGENTERA_BENCH_OUTPUT_DIR` | Optional. Absolute override for the durable benchmark directory. Use only when the default user-local directory is not desired. |
66
+
67
+ Supported runtime labels are owned by the analyzer and Mage wrapper. Invalid
68
+ labels or relative paths fail before runtime history is read.
69
+
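The validation rule above (known label, `runtime=/absolute/path` form, failure before any history is read) can be sketched as follows. This is a minimal illustration, not the analyzer's or Mage wrapper's actual code: the function name `parse_runtime_stores` is hypothetical, and the label set is an assumption built from the four runtimes this document names.

```python
import os.path

# Assumed label set: the runtimes named in this document. The authoritative
# list is owned by the analyzer and the Mage wrapper, not by this sketch.
ASSUMED_RUNTIME_LABELS = {"claude-code", "codex", "github-copilot", "opencode"}


def parse_runtime_stores(value: str) -> dict[str, str]:
    """Parse 'runtime=/absolute/path,...' into a {label: path} mapping.

    Raises ValueError on the first invalid entry, so nothing is read from
    any store when the override is malformed.
    """
    stores: dict[str, str] = {}
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        label, sep, path = entry.partition("=")
        if not sep or not path:
            # A runtime label without its path is not enough.
            raise ValueError(f"runtime label without a path: {entry!r}")
        if label not in ASSUMED_RUNTIME_LABELS:
            raise ValueError(f"unsupported runtime label: {label!r}")
        if not os.path.isabs(path):
            raise ValueError(f"store path must be absolute: {path!r}")
        stores[label] = path
    return stores
```

Validating every entry before returning keeps the "fail before runtime history is read" property: the caller only ever receives a fully checked mapping.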
70
+ ### Runtime Extraction Contract
71
+
72
+ The extraction matrix is defined in
73
+ `references/analysis/startup-measurement-contract.yaml` under
74
+ `runtime_extraction_contract`. It lists, for each supported runtime, the accepted
75
+ input schema classes, normalized record fields, status mapping, and redaction
76
+ rules.
77
+
78
+ Status interpretation is intentionally split:
79
+
80
+ | Outcome | Status / reason | Meaning |
81
+ | --- | --- | --- |
82
+ | Schema divergence | `degraded` / `schema_divergent` | Candidate runtime storage was found, but the adapter hit schema errors. Treat this as extraction failure evidence. |
83
+ | No matching records | `sparse` / `no_matching_records` | Candidate storage was readable and schema-compatible, but no supported records were extracted. Treat this as sparse coverage evidence. |
84
+ | Successful zero-record window | `ok` / `records_extracted` with `record_count: 0` and `error_count: 0` | Extraction succeeded, but the incremental benchmark window has no records after the previous watermark. Treat this as compatible successful behavior. |
85
+
86
+ Current known runtime caveats are extraction caveats, not CLI behavior evidence:
87
+ `claude-code` is degraded by `schema_divergent` with 4836 candidates, 0 records,
88
+ and 2 errors; `github-copilot` is degraded by `schema_divergent` with 1
89
+ candidate, 0 records, and 1 error; `opencode` currently contributes records;
90
+ `codex` can validly report `ok` with zero records and zero errors for an empty
91
+ incremental window.
92
+
93
+ Runtime-store runs are incremental by default. The first run for a runtime scope
94
+ measures all records after the v2.3.0 boundary. Later runs for the same
95
+ `runtime_scope` read the previous `benchmark_watermark_at` from `runs.jsonl` and
96
+ measure only records with timestamps strictly after that watermark. This keeps
97
+ new aggregate rows focused on work since the previous successful benchmark run.
98
+
99
+ To start a fresh series, use a different `AGENTERA_BENCH_OUTPUT_DIR` or archive
100
+ the existing `runs.jsonl`. Late-arriving records with timestamps at or before the
101
+ stored watermark are treated as already covered.
102
+
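The incremental window described above can be sketched as a filter with an exclusive lower bound. This is a hedged illustration only: `select_window` and the `(timestamp, payload)` record shape are hypothetical, and keeping the previous watermark when the window is empty is an assumption rather than documented analyzer behavior.

```python
from datetime import datetime, timezone


def select_window(records, window_started_after):
    """Return (records_in_window, new_watermark).

    `records` is an iterable of (timestamp, payload) pairs.
    `window_started_after` is the exclusive lower bound: the previous
    watermark, or the series start boundary on a first run.
    """
    # Strictly-after comparison: records at or before the stored watermark
    # (including late arrivals) count as already covered.
    window = [rec for rec in records if rec[0] > window_started_after]
    # Assumption: an empty window leaves the watermark where it was.
    new_watermark = max((ts for ts, _ in window), default=window_started_after)
    return window, new_watermark


previous = datetime(2026, 1, 10, tzinfo=timezone.utc)
records = [
    (datetime(2026, 1, 9, tzinfo=timezone.utc), "late arrival, already covered"),
    (datetime(2026, 1, 10, tzinfo=timezone.utc), "exactly at watermark, covered"),
    (datetime(2026, 1, 12, tzinfo=timezone.utc), "new work"),
]
window, watermark = select_window(records, previous)
```

In this example only the last record falls in the window, and the new watermark becomes its timestamp.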
103
+ ## How To Reduce Raw Startup Reads
104
+
105
+ Use the startup benchmark as a CLI completeness profiler. The benchmark does not
106
+ measure wall-clock startup speed. It answers whether agents fetch Agentera state
107
+ through the CLI and then still fall back to raw artifact reads, greps, or globs
108
+ during startup/state gathering.
109
+
110
+ Follow this loop:
111
+
112
+ 1. Run `mage bench:startupState` and open the retained latest report from the
113
+ `benchmark.directory` path printed on stdout.
114
+ 2. Check evidence quality before planning product changes. `total_state_sequences`
115
+ must be non-zero, important runtime rows should not be degraded, and
116
+ `confidence_caveats` plus `degradation_reason_counts` must be understood. If
117
+ the report has zero state-gathering sequences, treat it as no evidence rather
118
+ than evidence that the CLI is complete.
119
+ 3. Rank the post-CLI raw artifact reads. The best next CLI field is usually the
120
+ highest repeated artifact in `redundant_raw_artifact_access_counts`, followed
121
+ by `raw_artifact_access_after_cli_counts` when the read is not redundant yet
122
+ but still happens after a successful CLI state call.
123
+ 4. Map each repeated raw read to the CLI state owner. Prefer enriching existing
124
+ routine commands or the `hej` composite startup result. Do not add Decision 43
125
+ slash-route aliases as CLI commands.
126
+ 5. Make CLI completeness explicit. A startup-capable response should say whether
127
+ it is complete for the requested capability, whether raw artifact reads are
128
+ required, what state families are included, what is missing, and which CLI
129
+ fallback command should be tried before raw file access.
130
+ 6. Update guidance and tests so agents trust complete CLI output and use raw
131
+ reads only as a fallback. Rerun the benchmark on representative sessions and
132
+ compare the new rates with prior `runs.jsonl` rows.
133
+
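Step 3 of the loop above can be sketched as a small ranking helper. It assumes both report fields are mappings from canonical artifact label to count, which this document implies but does not spell out; `rank_cli_field_candidates` is a hypothetical name, and every artifact label other than `PLAN.md` is invented for illustration.

```python
def rank_cli_field_candidates(redundant_counts, after_cli_counts):
    """Rank artifacts as candidate CLI fields.

    Redundant reads come first (highest count first), followed by reads that
    happen after CLI state but are not yet redundant.
    """
    def by_count(counts):
        # Sort by descending count, then by label for a stable order.
        return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

    ranked = [(label, count, "redundant") for label, count in by_count(redundant_counts)]
    already_ranked = set(redundant_counts)
    ranked += [
        (label, count, "after_cli")
        for label, count in by_count(after_cli_counts)
        if label not in already_ranked
    ]
    return ranked


# Hypothetical counts; only PLAN.md is a label taken from this document.
candidates = rank_cli_field_candidates(
    {"PLAN.md": 5, "EXAMPLE_A.md": 2},
    {"PLAN.md": 7, "EXAMPLE_B.md": 3},
)
```

The first entry in `candidates` is the artifact to map to a CLI state owner in step 4.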
134
+ Use these fields to decide what to change:
135
+
136
+ | Field | How to use it |
137
+ | --- | --- |
138
+ | `runtime_coverage` | Verify which runtime stores contributed records, which were degraded, and why. Fix extraction or gather more evidence before product work when key stores are degraded. |
139
+ | `total_state_sequences` | Confirm the run actually observed startup/state-gathering sequences. A zero value blocks CLI-completeness conclusions. |
140
+ | `cli_state_command_counts` | Shows which Agentera state commands anchored the measured sequences. These commands are the first candidates for richer structured output. |
141
+ | `raw_artifact_access_after_cli_counts` | Shows which canonical artifacts agents still read after CLI state. These are candidate missing CLI fields or summaries. |
142
+ | `redundant_raw_artifact_access_counts` | Shows post-CLI raw reads that overlap state already covered by the CLI. These are the highest-priority avoidable reads. |
143
+ | `per_capability_state_counts` | Shows whether the gap is broad startup behavior or narrow capability-specific startup context. |
144
+ | `capability_prose_read_counts` | Shows capability prose reads during startup. Use this to decide whether routing/context guidance, not artifact state, is the repeated lookup. |
145
+ | `startup_recommendation` | Records whether the measured evidence supports closing, targeted guidance, or a broader startup state envelope. |
146
+
147
+ ### Token Impact Estimates
148
+
149
+ Token-impact fields are approximate, privacy-safe aggregate estimates. The
150
+ contract-owned estimator version is `approx_bytes_div_4_v1`: the analyzer may
151
+ observe content byte counts transiently, group them by canonical artifact label,
152
+ and estimate tokens as bytes divided by 4. Retained outputs must not include raw
153
+ paths, transcript text, raw tool arguments, private salts, or generated salted
154
+ hashes.
155
+
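A minimal sketch of the `approx_bytes_div_4_v1` idea follows. It assumes byte counts are summed per canonical label and then floor-divided by 4; the contract may round differently, and `estimate_tokens_by_artifact` is a hypothetical name. Only labels and integers pass through, so no raw paths or content are retained.

```python
from collections import defaultdict


def estimate_tokens_by_artifact(reads):
    """Estimate tokens per canonical artifact label as total bytes // 4.

    `reads` is an iterable of (canonical_label, byte_count) pairs. Rounding
    down after summing is an assumption of this sketch.
    """
    byte_totals = defaultdict(int)
    for label, byte_count in reads:
        byte_totals[label] += byte_count
    return {label: total // 4 for label, total in byte_totals.items()}


# EXAMPLE.md is an invented label; PLAN.md is the label used in this document.
by_artifact = estimate_tokens_by_artifact(
    [("PLAN.md", 4096), ("PLAN.md", 1000), ("EXAMPLE.md", 10)]
)
total_estimate = sum(by_artifact.values())
```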
156
+ Retained latest reports and new history rows may include:
157
+
158
+ | Field | Meaning |
159
+ | --- | --- |
160
+ | `token_estimator_version` | Estimator identity, currently `approx_bytes_div_4_v1`. |
161
+ | `estimated_raw_after_cli_tokens` | Aggregate estimated tokens for raw artifact reads after CLI state calls. |
162
+ | `estimated_redundant_raw_tokens` | Aggregate estimated tokens for raw artifact reads that overlap CLI-covered state. |
163
+ | `estimated_raw_after_cli_tokens_by_artifact` | Canonical-label breakdown such as `PLAN.md`; never raw paths. |
164
+ | `estimated_redundant_raw_tokens_by_artifact` | Canonical-label breakdown for redundant raw reads. |
165
+ | `estimated_tokens_saved_vs_previous` | Previous comparable row's redundant-token estimate minus the current row's estimate, or `null`. |
166
+ | `estimated_tokens_saved_vs_previous_null_reason` | Concrete reason when savings are `null`, such as `previous_missing_token_estimates`. |
167
+
168
+ Rows are comparable only when contract version, benchmark mode, runtime scope,
169
+ estimator version, and token-field availability match. Otherwise savings stay
170
+ `null` with a contract-listed reason.
171
+
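The comparability rule can be sketched as below. Treat this as an illustration under stated assumptions: the `contract_version` key and every null reason except `previous_missing_token_estimates` are hypothetical, and the real reasons are the ones listed in the contract.

```python
# Keys that must match for two history rows to be comparable. The name
# `contract_version` is an assumption; the other three appear in this document.
COMPARABILITY_KEYS = (
    "contract_version",
    "benchmark_mode",
    "runtime_scope",
    "token_estimator_version",
)


def tokens_saved_vs_previous(previous, current):
    """Return (savings, null_reason); exactly one of the two is None."""
    if previous is None:
        return None, "no_previous_row"  # hypothetical reason
    if previous.get("estimated_redundant_raw_tokens") is None:
        return None, "previous_missing_token_estimates"
    if current.get("estimated_redundant_raw_tokens") is None:
        return None, "current_missing_token_estimates"  # hypothetical reason
    for key in COMPARABILITY_KEYS:
        if previous.get(key) != current.get(key):
            return None, f"{key}_mismatch"  # hypothetical reason
    saved = (
        previous["estimated_redundant_raw_tokens"]
        - current["estimated_redundant_raw_tokens"]
    )
    return saved, None
```

A positive result means the current row estimates fewer redundant tokens than the previous comparable row; a negative result means redundant reads grew.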
172
+ Stop conditions are as important as action triggers. If a run has zero
173
+ state-gathering sequences, sparse records, or degraded schema extraction, improve
174
+ benchmark coverage first. If only one capability repeatedly reads one artifact,
175
+ prefer targeted CLI output or guidance over a broad startup envelope. If multiple
176
+ capabilities repeatedly read several redundant artifacts after CLI state, plan a
177
+ capability-ready startup state envelope or equivalent composite output.
178
+
179
+ ## Retention Policy
180
+
181
+ Default durable storage is `${AGENTERA_HOME}/benchmarks/startup-state/`.
182
+
183
+ Retained outputs are limited to:
184
+
185
+ | File | Retention role |
186
+ | --- | --- |
187
+ | `runs.jsonl` | Append-only aggregate benchmark history and previous-run watermark source. |
188
+ | `latest-report.json` | Latest redacted structured report. |
189
+ | `latest-report.md` | Latest redacted human-readable report. |
190
+
191
+ Temporary corpus files, intermediates, and per-run detailed reports are not
192
+ durable benchmark history. Failed report generation must not append `runs.jsonl`
193
+ or replace previous latest reports.
194
+
195
+ Aggregate history must not retain raw transcripts, raw corpus files, raw
196
+ intermediates, raw store paths, raw session ids, private salts, or generated
197
+ salted hashes. Benchmark metrics are user-local, uncommitted, unshipped, and not
198
+ part of normal CI.
199
+
200
+ The command prints the retained benchmark directory in `benchmark.directory`.
201
+ Review the latest result from that directory, or from the default location under
202
+ your resolved `AGENTERA_HOME` (see `agentera doctor --json` when unset):
203
+
204
+ ```bash
+ # Abort with a message if AGENTERA_HOME is unset, rather than resolving to /benchmarks/startup-state.
+ BENCH_DIR="${AGENTERA_HOME:?set AGENTERA_HOME first}/benchmarks/startup-state"
+
+ less "$BENCH_DIR/latest-report.md"
+ python3 -m json.tool "$BENCH_DIR/latest-report.json"
+ tail -n 5 "$BENCH_DIR/runs.jsonl"
+ ```
211
+
212
+ Set `AGENTERA_HOME` first, or resolve the platform app home from
213
+ `agentera doctor --json` / `agentera upgrade --dry-run` when unset.
214
+ The stdout `reports` filenames are analyzer report names from Mage's temporary
215
+ work directory. The durable operator-facing files are the `benchmark` paths:
216
+ `runs.jsonl`, `latest-report.json`, and `latest-report.md`.
217
+
218
+ ## Interpretation
219
+
220
+ Interpret the benchmark as a startup state-access optimization signal, not as a
221
+ general runtime performance benchmark.
222
+
223
+ Default runs demonstrate whether new local Agentera startup/state-gathering
224
+ behavior since the previous successful run repeatedly falls back from CLI state
225
+ reads to raw artifact access. Stores that do not exist or cannot be read are
226
+ bounded degradation evidence, not successful behavior evidence.
227
+
228
+ Watermark fields:
229
+
230
+ | Field | Meaning |
231
+ | --- | --- |
232
+ | `benchmark_mode` | `since_previous_benchmark` for Mage runs that use retained history as the previous-run boundary. |
233
+ | `benchmark_previous_watermark_at` | The previous successful watermark for the same `runtime_scope`, or `null` on the first run. |
234
+ | `benchmark_window_started_after` | The exclusive lower timestamp bound used for this run. |
235
+ | `benchmark_watermark_at` | The latest record timestamp covered by this run. The next run for the same `runtime_scope` starts after this value. |
236
+
237
+ Primary rates:
238
+
239
+ | Latest report field | History row field | Meaning |
240
+ | --- | --- | --- |
241
+ | `raw_after_cli_sequence_rate` | `raw_after_cli_rate` | Share of startup state-gathering sequences where any raw Agentera artifact access follows CLI state. |
242
+ | `redundant_raw_sequence_rate` | `redundant_raw_access_rate` | Share of startup state-gathering sequences where raw access overlaps state already covered by the CLI. |
243
+
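Both primary rates are shares of sequences, which can be sketched as follows. The per-sequence boolean flags `raw_after_cli` and `redundant_raw` are a hypothetical input shape, and returning `None` for an empty run is this sketch's way of expressing "no evidence"; the analyzer may represent that case differently.

```python
def sequence_rates(sequences):
    """Compute the two primary rates from per-sequence flags.

    Each sequence is a dict with boolean `raw_after_cli` and `redundant_raw`
    keys (assumed shape). Returns None when there are no sequences, because a
    zero-sequence run is no evidence rather than a 0% rate.
    """
    total = len(sequences)
    if total == 0:
        return None
    raw_after_cli = sum(1 for seq in sequences if seq["raw_after_cli"])
    redundant_raw = sum(1 for seq in sequences if seq["redundant_raw"])
    return {
        "raw_after_cli_sequence_rate": raw_after_cli / total,
        "redundant_raw_sequence_rate": redundant_raw / total,
    }


rates = sequence_rates(
    [
        {"raw_after_cli": True, "redundant_raw": True},
        {"raw_after_cli": True, "redundant_raw": False},
        {"raw_after_cli": False, "redundant_raw": False},
        {"raw_after_cli": False, "redundant_raw": False},
    ]
)
```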
244
+ High rates support follow-up work such as a CLI startup state envelope. Low or
245
+ zero rates can close the measurement loop without implementation if the corpus is
246
+ representative and degradation counts are bounded.
247
+
248
+ Always review runtime coverage and degradation counts before treating a trend as
249
+ actionable. Missing, locked, sparse, or unreadable stores are bounded degradation
250
+ evidence, not product behavior evidence.
251
+
252
+ ## Adding Benchmarks
253
+
254
+ New Agentera benchmarks should follow the same shape:
255
+
256
+ | Requirement | Rule |
257
+ | --- | --- |
258
+ | Contract first | Define metric, privacy boundary, storage, retained fields, and failure behavior before implementation. |
259
+ | No-prerequisite target | Every `mage bench:*` target must run with no environment variables. The default mode should use documented local defaults and report unavailable inputs as bounded degradation evidence. |
260
+ | Manual unless proven safe | Do not add runtime-history or environment-sensitive benchmarks to normal CI. |
261
+ | Explicit local approval | Require concrete paths or resources, not broad consent flags. |
262
+ | User-local outputs | Keep generated benchmark history outside the repository by default. |
263
+ | Bounded retention | Retain aggregate history and latest redacted reports only. |
264
+ | Fixture verification | Test consent refusal, successful synthetic runs, degradation cases, privacy exclusions, and no repository-local outputs. |
265
+
266
+ If a future benchmark needs different retention or CI behavior, document the
267
+ reason in its contract and update this file in the same change.