agentera 0.0.0 → 3.0.0-dev.0
- package/README.md +6 -45
- package/bundle/.agentera-npx-bundle.json +4 -0
- package/bundle/references/adapters/cursor.md +213 -0
- package/bundle/references/adapters/opencode.md +530 -0
- package/bundle/references/adapters/package-manifest-interface-model.yaml +337 -0
- package/bundle/references/adapters/package-registry.yaml +247 -0
- package/bundle/references/adapters/package-surface-characterization.md +48 -0
- package/bundle/references/adapters/runtime-adapter-characterization.md +79 -0
- package/bundle/references/adapters/runtime-adapter-interface-model.yaml +200 -0
- package/bundle/references/adapters/runtime-adapter-registry.yaml +548 -0
- package/bundle/references/adapters/runtime-feature-parity.md +189 -0
- package/bundle/references/analysis/benchmark.md +267 -0
- package/bundle/references/analysis/startup-measurement-contract.yaml +424 -0
- package/bundle/references/artifacts/artifact-registry-interface-model.yaml +288 -0
- package/bundle/references/cli/agent-ready-state-contract.yaml +950 -0
- package/bundle/references/cli/app-lifecycle-vocabulary.yaml +233 -0
- package/bundle/references/cli/audience-namespace-cli-migration.yaml +355 -0
- package/bundle/references/cli/bundle-skill-vocabulary.yaml +278 -0
- package/bundle/references/cli/capability-instruction-contract.yaml +123 -0
- package/bundle/references/cli/capability-tool-classification.yaml +53 -0
- package/bundle/references/cli/routing-execution-vocabulary.yaml +281 -0
- package/bundle/references/cli/update-channels.yaml +120 -0
- package/bundle/references/cli/vocabulary-index.yaml +160 -0
- package/bundle/references/cli/vocabulary.md +562 -0
- package/bundle/references/meta/documentation-inventory.md +43 -0
- package/bundle/references/v1-section-mapping.md +47 -0
- package/bundle/registry.json +39 -0
- package/bundle/skills/agentera/.claude-plugin/plugin.json +27 -0
- package/bundle/skills/agentera/SKILL.md +470 -0
- package/bundle/skills/agentera/agents/dokumentera.toml +6 -0
- package/bundle/skills/agentera/agents/hej.toml +6 -0
- package/bundle/skills/agentera/agents/inspektera.toml +6 -0
- package/bundle/skills/agentera/agents/inspirera.toml +6 -0
- package/bundle/skills/agentera/agents/optimera.toml +6 -0
- package/bundle/skills/agentera/agents/orkestrera.toml +6 -0
- package/bundle/skills/agentera/agents/planera.toml +6 -0
- package/bundle/skills/agentera/agents/profilera.toml +6 -0
- package/bundle/skills/agentera/agents/realisera.toml +6 -0
- package/bundle/skills/agentera/agents/resonera.toml +6 -0
- package/bundle/skills/agentera/agents/visionera.toml +6 -0
- package/bundle/skills/agentera/agents/visualisera.toml +6 -0
- package/bundle/skills/agentera/capabilities/dokumentera/instructions.md +428 -0
- package/bundle/skills/agentera/capabilities/dokumentera/schemas/artifacts.yaml +73 -0
- package/bundle/skills/agentera/capabilities/dokumentera/schemas/exit.yaml +35 -0
- package/bundle/skills/agentera/capabilities/dokumentera/schemas/triggers.yaml +35 -0
- package/bundle/skills/agentera/capabilities/dokumentera/schemas/validation.yaml +139 -0
- package/bundle/skills/agentera/capabilities/hej/instructions.md +331 -0
- package/bundle/skills/agentera/capabilities/hej/schemas/artifacts.yaml +69 -0
- package/bundle/skills/agentera/capabilities/hej/schemas/exit.yaml +32 -0
- package/bundle/skills/agentera/capabilities/hej/schemas/triggers.yaml +58 -0
- package/bundle/skills/agentera/capabilities/hej/schemas/validation.yaml +55 -0
- package/bundle/skills/agentera/capabilities/inspektera/instructions.md +514 -0
- package/bundle/skills/agentera/capabilities/inspektera/schemas/artifacts.yaml +76 -0
- package/bundle/skills/agentera/capabilities/inspektera/schemas/exit.yaml +36 -0
- package/bundle/skills/agentera/capabilities/inspektera/schemas/triggers.yaml +38 -0
- package/bundle/skills/agentera/capabilities/inspektera/schemas/validation.yaml +113 -0
- package/bundle/skills/agentera/capabilities/inspirera/instructions.md +280 -0
- package/bundle/skills/agentera/capabilities/inspirera/schemas/artifacts.yaml +24 -0
- package/bundle/skills/agentera/capabilities/inspirera/schemas/exit.yaml +33 -0
- package/bundle/skills/agentera/capabilities/inspirera/schemas/triggers.yaml +34 -0
- package/bundle/skills/agentera/capabilities/inspirera/schemas/validation.yaml +58 -0
- package/bundle/skills/agentera/capabilities/optimera/instructions.md +437 -0
- package/bundle/skills/agentera/capabilities/optimera/schemas/artifacts.yaml +69 -0
- package/bundle/skills/agentera/capabilities/optimera/schemas/exit.yaml +35 -0
- package/bundle/skills/agentera/capabilities/optimera/schemas/triggers.yaml +39 -0
- package/bundle/skills/agentera/capabilities/optimera/schemas/validation.yaml +91 -0
- package/bundle/skills/agentera/capabilities/orkestrera/instructions.md +433 -0
- package/bundle/skills/agentera/capabilities/orkestrera/schemas/artifacts.yaml +64 -0
- package/bundle/skills/agentera/capabilities/orkestrera/schemas/exit.yaml +34 -0
- package/bundle/skills/agentera/capabilities/orkestrera/schemas/triggers.yaml +42 -0
- package/bundle/skills/agentera/capabilities/orkestrera/schemas/validation.yaml +107 -0
- package/bundle/skills/agentera/capabilities/planera/instructions.md +368 -0
- package/bundle/skills/agentera/capabilities/planera/schemas/artifacts.yaml +62 -0
- package/bundle/skills/agentera/capabilities/planera/schemas/exit.yaml +33 -0
- package/bundle/skills/agentera/capabilities/planera/schemas/triggers.yaml +34 -0
- package/bundle/skills/agentera/capabilities/planera/schemas/validation.yaml +61 -0
- package/bundle/skills/agentera/capabilities/profilera/instructions.md +419 -0
- package/bundle/skills/agentera/capabilities/profilera/schemas/artifacts.yaml +18 -0
- package/bundle/skills/agentera/capabilities/profilera/schemas/exit.yaml +34 -0
- package/bundle/skills/agentera/capabilities/profilera/schemas/triggers.yaml +45 -0
- package/bundle/skills/agentera/capabilities/profilera/schemas/validation.yaml +57 -0
- package/bundle/skills/agentera/capabilities/realisera/instructions.md +403 -0
- package/bundle/skills/agentera/capabilities/realisera/schemas/artifacts.yaml +80 -0
- package/bundle/skills/agentera/capabilities/realisera/schemas/exit.yaml +35 -0
- package/bundle/skills/agentera/capabilities/realisera/schemas/triggers.yaml +39 -0
- package/bundle/skills/agentera/capabilities/realisera/schemas/validation.yaml +110 -0
- package/bundle/skills/agentera/capabilities/resonera/instructions.md +329 -0
- package/bundle/skills/agentera/capabilities/resonera/schemas/artifacts.yaml +47 -0
- package/bundle/skills/agentera/capabilities/resonera/schemas/exit.yaml +35 -0
- package/bundle/skills/agentera/capabilities/resonera/schemas/triggers.yaml +46 -0
- package/bundle/skills/agentera/capabilities/resonera/schemas/validation.yaml +77 -0
- package/bundle/skills/agentera/capabilities/visionera/instructions.md +309 -0
- package/bundle/skills/agentera/capabilities/visionera/schemas/artifacts.yaml +57 -0
- package/bundle/skills/agentera/capabilities/visionera/schemas/exit.yaml +35 -0
- package/bundle/skills/agentera/capabilities/visionera/schemas/triggers.yaml +41 -0
- package/bundle/skills/agentera/capabilities/visionera/schemas/validation.yaml +74 -0
- package/bundle/skills/agentera/capabilities/visualisera/instructions.md +400 -0
- package/bundle/skills/agentera/capabilities/visualisera/schemas/artifacts.yaml +44 -0
- package/bundle/skills/agentera/capabilities/visualisera/schemas/exit.yaml +34 -0
- package/bundle/skills/agentera/capabilities/visualisera/schemas/triggers.yaml +33 -0
- package/bundle/skills/agentera/capabilities/visualisera/schemas/validation.yaml +80 -0
- package/bundle/skills/agentera/capability_schema_contract.yaml +385 -0
- package/bundle/skills/agentera/protocol.yaml +463 -0
- package/bundle/skills/agentera/references/contract.md +1039 -0
- package/bundle/skills/agentera/schemas/artifacts/changelog.yaml +60 -0
- package/bundle/skills/agentera/schemas/artifacts/decisions.yaml +461 -0
- package/bundle/skills/agentera/schemas/artifacts/design.yaml +55 -0
- package/bundle/skills/agentera/schemas/artifacts/docs.yaml +402 -0
- package/bundle/skills/agentera/schemas/artifacts/experiments.yaml +373 -0
- package/bundle/skills/agentera/schemas/artifacts/health.yaml +484 -0
- package/bundle/skills/agentera/schemas/artifacts/objective.yaml +399 -0
- package/bundle/skills/agentera/schemas/artifacts/plan.yaml +342 -0
- package/bundle/skills/agentera/schemas/artifacts/progress.yaml +325 -0
- package/bundle/skills/agentera/schemas/artifacts/todo.yaml +110 -0
- package/bundle/skills/agentera/schemas/artifacts/vision.yaml +262 -0
- package/bundle/skills/hej/.claude-plugin/plugin.json +6 -0
- package/bundle/skills/hej/SKILL.md +69 -0
- package/bundle/skills/hej/agents/hej.toml +11 -0
- package/bundle/skills/hej/agents/openai.yaml +8 -0
- package/dist/analytics/extractCorpus.js +1791 -0
- package/dist/analytics/extractCorpus.js.map +1 -0
- package/dist/analytics/usageStats.js +487 -0
- package/dist/analytics/usageStats.js.map +1 -0
- package/dist/bin/agentera.js +4 -0
- package/dist/bin/agentera.js.map +1 -0
- package/dist/cli/appContext.js +226 -0
- package/dist/cli/appContext.js.map +1 -0
- package/dist/cli/argvalidate.js +41 -0
- package/dist/cli/argvalidate.js.map +1 -0
- package/dist/cli/capabilityContext.js +2421 -0
- package/dist/cli/capabilityContext.js.map +1 -0
- package/dist/cli/commands/backfill.js +84 -0
- package/dist/cli/commands/backfill.js.map +1 -0
- package/dist/cli/commands/capability.js +44 -0
- package/dist/cli/commands/capability.js.map +1 -0
- package/dist/cli/commands/compact.js +148 -0
- package/dist/cli/commands/compact.js.map +1 -0
- package/dist/cli/commands/doctor.js +180 -0
- package/dist/cli/commands/doctor.js.map +1 -0
- package/dist/cli/commands/lint.js +179 -0
- package/dist/cli/commands/lint.js.map +1 -0
- package/dist/cli/commands/prime.js +545 -0
- package/dist/cli/commands/prime.js.map +1 -0
- package/dist/cli/commands/query.js +346 -0
- package/dist/cli/commands/query.js.map +1 -0
- package/dist/cli/commands/report.js +210 -0
- package/dist/cli/commands/report.js.map +1 -0
- package/dist/cli/commands/schema.js +306 -0
- package/dist/cli/commands/schema.js.map +1 -0
- package/dist/cli/commands/state.js +1012 -0
- package/dist/cli/commands/state.js.map +1 -0
- package/dist/cli/commands/upgrade.js +49 -0
- package/dist/cli/commands/upgrade.js.map +1 -0
- package/dist/cli/commands/validate.js +519 -0
- package/dist/cli/commands/validate.js.map +1 -0
- package/dist/cli/commands/verify.js +204 -0
- package/dist/cli/commands/verify.js.map +1 -0
- package/dist/cli/dispatch.js +962 -0
- package/dist/cli/dispatch.js.map +1 -0
- package/dist/cli/orientation.js +595 -0
- package/dist/cli/orientation.js.map +1 -0
- package/dist/cli/prime-blob.js +3 -0
- package/dist/cli/prime-blob.js.map +1 -0
- package/dist/cli/stateQuery.js +292 -0
- package/dist/cli/stateQuery.js.map +1 -0
- package/dist/cli/structured.js +18 -0
- package/dist/cli/structured.js.map +1 -0
- package/dist/core/difflib.js +274 -0
- package/dist/core/difflib.js.map +1 -0
- package/dist/core/git.js +43 -0
- package/dist/core/git.js.map +1 -0
- package/dist/core/paths.js +50 -0
- package/dist/core/paths.js.map +1 -0
- package/dist/core/pyjson.js +101 -0
- package/dist/core/pyjson.js.map +1 -0
- package/dist/core/sourceRoot.js +72 -0
- package/dist/core/sourceRoot.js.map +1 -0
- package/dist/core/toml.js +11 -0
- package/dist/core/toml.js.map +1 -0
- package/dist/core/yaml.js +25 -0
- package/dist/core/yaml.js.map +1 -0
- package/dist/eval/evalSkills.js +258 -0
- package/dist/eval/evalSkills.js.map +1 -0
- package/dist/eval/semanticEval.js +148 -0
- package/dist/eval/semanticEval.js.map +1 -0
- package/dist/eval/semanticFixtures.js +227 -0
- package/dist/eval/semanticFixtures.js.map +1 -0
- package/dist/hooks/common.js +160 -0
- package/dist/hooks/common.js.map +1 -0
- package/dist/hooks/compaction.js +935 -0
- package/dist/hooks/compaction.js.map +1 -0
- package/dist/hooks/cursorPreToolUse.js +19 -0
- package/dist/hooks/cursorPreToolUse.js.map +1 -0
- package/dist/hooks/cursorSessionStart.js +71 -0
- package/dist/hooks/cursorSessionStart.js.map +1 -0
- package/dist/hooks/sessionStart.js +209 -0
- package/dist/hooks/sessionStart.js.map +1 -0
- package/dist/hooks/sessionStop.js +212 -0
- package/dist/hooks/sessionStop.js.map +1 -0
- package/dist/hooks/validateArtifact.js +933 -0
- package/dist/hooks/validateArtifact.js.map +1 -0
- package/dist/registries/artifactRegistry.js +206 -0
- package/dist/registries/artifactRegistry.js.map +1 -0
- package/dist/registries/capabilityContract.js +310 -0
- package/dist/registries/capabilityContract.js.map +1 -0
- package/dist/registries/packageRegistry.js +641 -0
- package/dist/registries/packageRegistry.js.map +1 -0
- package/dist/registries/runtimeAdapterRegistry.js +315 -0
- package/dist/registries/runtimeAdapterRegistry.js.map +1 -0
- package/dist/setup/codex.js +1052 -0
- package/dist/setup/codex.js.map +1 -0
- package/dist/setup/copilot.js +227 -0
- package/dist/setup/copilot.js.map +1 -0
- package/dist/setup/cursor.js +127 -0
- package/dist/setup/cursor.js.map +1 -0
- package/dist/setup/doctor.js +1269 -0
- package/dist/setup/doctor.js.map +1 -0
- package/dist/state/installRoot.js +279 -0
- package/dist/state/installRoot.js.map +1 -0
- package/dist/state/progressCommit.js +289 -0
- package/dist/state/progressCommit.js.map +1 -0
- package/dist/state/startupAnalysis.js +1953 -0
- package/dist/state/startupAnalysis.js.map +1 -0
- package/dist/upgrade/appModel.js +189 -0
- package/dist/upgrade/appModel.js.map +1 -0
- package/dist/upgrade/channels.js +197 -0
- package/dist/upgrade/channels.js.map +1 -0
- package/dist/upgrade/compatibility.js +197 -0
- package/dist/upgrade/compatibility.js.map +1 -0
- package/dist/upgrade/doctor.js +368 -0
- package/dist/upgrade/doctor.js.map +1 -0
- package/dist/upgrade/migrateArtifactsV2ToV3.js +412 -0
- package/dist/upgrade/migrateArtifactsV2ToV3.js.map +1 -0
- package/dist/upgrade/upgradeCommands.js +40 -0
- package/dist/upgrade/upgradeCommands.js.map +1 -0
- package/dist/upgrade/upgradeOrchestrator.js +280 -0
- package/dist/upgrade/upgradeOrchestrator.js.map +1 -0
- package/dist/validate/appHomeContract.js +150 -0
- package/dist/validate/appHomeContract.js.map +1 -0
- package/dist/validate/capability.js +412 -0
- package/dist/validate/capability.js.map +1 -0
- package/dist/validate/crossCapability.js +145 -0
- package/dist/validate/crossCapability.js.map +1 -0
- package/dist/validate/lifecycleAdapters.js +772 -0
- package/dist/validate/lifecycleAdapters.js.map +1 -0
- package/dist/validate/selfAudit.js +107 -0
- package/dist/validate/selfAudit.js.map +1 -0
- package/package.json +28 -8
- package/LICENSE +0 -201
- package/bin/agentera.mjs +0 -50
- package/lib/exec.mjs +0 -116
- package/lib/resolve.mjs +0 -129
@@ -0,0 +1,189 @@
# Runtime feature parity reference

Tracks release-relevant runtime behavior for the portable agentera suite.

This reference distinguishes implemented behavior from host support. A runtime
may expose an event while agentera still lacks a shipped adapter path for it.

## Summary

| Runtime | Skill loading | Session preload | Artifact validation | Session bookmark |
|---------|---------------|-----------------|---------------------|------------------|
| Claude Code | Full: marketplace plugin and native skill paths load `skills/<name>/SKILL.md` | Active via `SessionStart` in `hooks/hooks.json` | Advisory after mutation via `PostToolUse` for `Edit` or `Write` | Active via `Stop` |
| OpenCode | Full: native `skill` tool loads `.opencode`, `.claude`, and `.agents` skill paths | Deferred for session start: `session.created` is observable, but no model-context injection path is verified. Active for compaction through bounded `experimental.session.compacting` context from `agentera hej --format json`. | Conditional hard gate for reconstructable `write` and `edit` candidates via `tool.execute.before`; `tool.execute.after` remains advisory | Active via generic `event` hook on `session.idle` |
| Copilot CLI | Full for portable skills through plugin or skill-folder install paths | Active via `sessionStart` | Conditional hard gate via `preToolUse` when `toolArgs` include path plus candidate content or exact replacement evidence | Active via `sessionEnd` |
| Codex CLI | Full for portable skills through plugin install, `.agents/skills`, and `$skill` invocation | Not wired by the shipped hook config | Advisory `apply_patch` path validation through shipped PreToolUse and PostToolUse hooks; final patch content is not reconstructed | Not wired by the shipped hook config |
| Cursor IDE | Full for portable skills through local plugin (`~/.cursor/plugins/local/agentera` via `.cursor-plugin/plugin.json`), repo-native surfaces, and upgrade-installed `.cursor/` targets | Active via `sessionStart` env export plus optional `additional_context` digest; plugin-root fallback when `AGENTERA_HOME` env and project walk-up fail (`hooks/cursor_session_start.py`) | Conditional hard gate for reconstructable `Write` and `Edit` candidates via `preToolUse`; verified after live preToolUse Write smoke (2026-05-24) | Active via `sessionEnd` |
| Cursor Agent CLI | Full when workspace surfaces are installed; degraded when launched outside a Cursor project | Degraded relative to IDE sessionStart env export | Degraded hook parity; follows IDE smoke evidence only | Degraded relative to IDE `sessionEnd` wiring |

## Profilera session corpus

| Runtime | Local session mining | Evidence |
| ------- | -------------------- | -------- |
| Claude Code | Yes: `~/.claude/projects/**/*.jsonl` | `scripts/extract_corpus.py` |
| Codex CLI | Yes: `~/.codex/sessions/**/*.jsonl` | `scripts/extract_corpus.py` |
| OpenCode | Yes: `opencode.db` SQLite stores | `scripts/extract_corpus.py`, `references/adapters/opencode.md` |
| Copilot CLI | Yes: `session-store.db` SQLite stores | `scripts/extract_corpus.py` |
| Cursor IDE | Yes: `~/.cursor/projects/*/agent-transcripts/*/*.jsonl` | `scripts/extract_corpus.py`, `references/adapters/cursor.md` |
| Cursor Agent CLI | Yes: gap-fill from `~/.config/cursor/chats/<md5(project)>/<session>/store.db` when no IDE JSONL exists | `scripts/extract_corpus.py`, `references/adapters/cursor.md` |
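
The Cursor gap-fill rule in the last row can be sketched as follows. This is only an illustration, not the logic in `scripts/extract_corpus.py`: the helper names `cursor_cli_chat_dir` and `cursor_sources` are hypothetical, and the sketch assumes that `<md5(project)>` is the hex MD5 of the project path string, which the table does not confirm.

```python
import hashlib
from pathlib import Path


def cursor_cli_chat_dir(home: Path, project: str) -> Path:
    """Directory that would hold per-session store.db files for one project.

    Assumption: the <md5(project)> segment is the hex MD5 of the project
    path string encoded as UTF-8. The reference does not say what is hashed.
    """
    digest = hashlib.md5(project.encode("utf-8")).hexdigest()
    return home / ".config" / "cursor" / "chats" / digest


def cursor_sources(home: Path, project: str, ide_jsonl: list[Path]) -> list[Path]:
    """Prefer IDE JSONL transcripts; gap-fill from CLI stores only when none exist."""
    if ide_jsonl:
        return sorted(ide_jsonl)
    chat_dir = cursor_cli_chat_dir(home, project)
    # One store.db per <session> directory under the hashed project directory.
    return sorted(chat_dir.glob("*/store.db"))
```

Passing the already-discovered IDE transcripts in as a list keeps the precedence rule explicit: the CLI stores are consulted only when that list is empty.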

## Bare `hej` routing

| Runtime | Bare text `hej` behavior | Evidence |
|---------|--------------------------|----------|
| OpenCode | Deterministic exact-match adapter route through `chat.message`; only a complete lowercase text message `hej` is rewritten to load `agentera` and run the `agentera hej` dashboard path, accepting OpenCode's CLI-added single trailing newline as a transport artifact. | `.opencode/plugins/agentera.js`, `scripts/smoke_opencode_bootstrap.mjs`, OpenCode `packages/plugin/src/index.ts` Hooks interface |
| Claude Code | Metadata/context only; `UserPromptSubmit` can observe or add context but is not a verified prompt rewrite router. | `skills/agentera/SKILL.md`, marketplace metadata |
| Copilot CLI | Metadata/context only; skills, prompts, hooks, and plugins expose Agentera but do not guarantee pre-model bare-prompt routing. | `plugin.json`, `.github/plugin/plugin.json`, `.github/hooks` |
| Codex CLI | Metadata/context only; `$agentera` is explicit and the legacy `$hej` bridge is not implicitly invocable. | `.codex-plugin/plugin.json`, `agents/openai.yaml` |
| Cursor IDE | Metadata/context only; `beforeSubmitPrompt` is supported by the host but Agentera v1 does not rewrite bare `hej`. | `.cursor-plugin/plugin.json`, `.cursor/agents/*.md` |
| Cursor Agent CLI | Metadata/context only like Claude, Copilot, and Codex. | `references/adapters/cursor.md`, eval runner metadata |
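
The OpenCode exact-match rule is narrow enough to state as a predicate. The sketch below is a Python illustration of that rule only; the shipped route lives in the JavaScript plugin `.opencode/plugins/agentera.js`, and the function name `is_bare_hej` is hypothetical.

```python
def is_bare_hej(message: str) -> bool:
    """True only for the complete lowercase message `hej`.

    A single trailing newline is tolerated as a transport artifact; any
    other whitespace, casing, or extra text disqualifies the message.
    """
    if message.endswith("\n"):
        message = message[:-1]  # strip exactly one newline, never more
    return message == "hej"
```

Stripping exactly one newline, rather than calling `strip()`, keeps the match deterministic: `"hej\n\n"`, `" hej"`, and `"Hej"` all fall through to normal prompt handling.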

## Artifact validation

| Runtime | Blocking surface | Implemented gate | Evidence-insufficient paths | Verification surface |
|---------|------------------|------------------|-----------------------------|----------------------|
| Claude Code | None in shipped config; validation runs after `Edit` or `Write` | No pre-write hard gate is claimed | Any invalid artifact can already be written before the warning appears | `hooks/hooks.json`, `hooks/validate_artifact.py` |
| OpenCode | `tool.execute.before` can throw before mutation | Invalid reconstructable artifact `write` and `edit` candidates are blocked | Sparse payloads and `apply_patch` `patchText` without reconstructed full content are allowed | `.opencode/plugins/agentera.js`, `scripts/smoke_opencode_bootstrap.mjs` |
| Copilot CLI | `preToolUse` returns `permissionDecision: deny` | Invalid reconstructable artifact candidates are denied | Malformed, sparse, or non-reconstructable `toolArgs` are allowed | `.github/hooks/preToolUse.json`, `hooks/validate_artifact.py`, `tests/test_validate_artifact.py` |
| Codex CLI | `codex_hooks` can run before and after `apply_patch` | No content hard gate is claimed; the copied user hook config parses touched paths and validates existing files; optional plugin-bundled hooks require `[features].plugin_hooks = true` plus `/hooks` review | Add-file targets and final post-patch candidate content are not reconstructed by the adapter | `~/.codex/hooks.json` generated by upgrade, `.codex-plugin/plugin.json` `hooks`, `hooks/codex-plugin-hooks.json`, `hooks/validate_artifact.py`, live apply_patch hook firing smoke |
| Cursor IDE | `preToolUse` returns `permission: deny` when wired | Invalid reconstructable artifact candidates are denied; live preToolUse Write smoke passed 2026-05-24 | Malformed, sparse, or non-reconstructable tool_input payloads are allowed | `.cursor/hooks.json`, `hooks/cursor_pre_tool_use.py`, `hooks/validate_artifact.py`, `.agentera/smoke-cursor-pretooluse-evidence.txt` |
| Cursor Agent CLI | None claimed for standalone CLI | No IDE-equivalent hard gate is claimed | CLI may run without project hook wiring | `scripts/eval_skills.py --runtime cursor-agent` |

Docs may claim functional hard-gate parity only for closeable paths that are
implemented and verified. Today that means OpenCode, Copilot, and Cursor IDE
reconstructable artifact candidates. Claude Code and Codex remain active validation
surfaces, but neither shipped configuration blocks every invalid artifact candidate
before mutation.
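
The "conditional hard gate" described above reduces to one decision: deny only when a full candidate can be reconstructed and fails validation, and allow everything else rather than guess. A minimal sketch, assuming a hypothetical `gate_decision` helper and a caller-supplied validator; it is not the code in `hooks/validate_artifact.py`, and the plain `"allow"`/`"deny"` strings stand in for each host's own response shape.

```python
from typing import Callable, Optional


def gate_decision(
    path: Optional[str],
    candidate: Optional[str],
    is_valid: Callable[[str, str], bool],
) -> str:
    """Return "deny" only when a reconstructable candidate fails validation."""
    if not path or candidate is None:
        # Sparse or non-reconstructable payload: allow rather than guess.
        return "allow"
    return "allow" if is_valid(path, candidate) else "deny"
```

An adapter would then translate `"deny"` into its host's form, for example `permissionDecision: deny` for Copilot CLI or `permission: deny` for Cursor IDE, as listed in the table.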

## Lifecycle notes

| Runtime | Runtime reason for degraded or blocked capability |
|---------|---------------------------------------------------|
| OpenCode preload | The `event` hook observes `session.created`, but no supported adapter path injects text into model context. |
| OpenCode compaction context | `experimental.session.compacting` appends bounded Agentera state from `agentera hej --format json`; the plugin does not read raw `.agentera` artifacts for compaction. |
| OpenCode `apply_patch` hard gate | The adapter receives `patchText` without reconstructing full candidate content. It allows that path rather than guessing. |
| Copilot sparse edits | Copilot `preToolUse` stdin may omit full content or unique old/new replacement evidence. The hook allows those payloads. |
| Codex preload/bookmarks | `codex_hooks` supports lifecycle events, but Agentera ships only `apply_patch` PreToolUse/PostToolUse wiring for copied user hooks and optional plugin-bundled hooks. |
| Codex artifact hard gate | The adapter parses patch headers for touched paths, but it does not reconstruct final candidate content for blocking validation. |
| Codex plugin hook trust | Plugin-bundled hooks require `[features].plugin_hooks = true` and deliberate `/hooks` review; copied `~/.codex/hooks.json` remains the default reliable install path with generated `[hooks.state]` trust hashes. |
| Cursor cloud agents | Cloud agents are unsupported in v1; repo hooks and managed agents target local IDE sessions only. |
| Cursor CLI hook parity | `cursor-agent` print mode is eval-covered but hook/session env parity is degraded relative to IDE wiring. |
| Cursor hard-gate release gate | Live preToolUse Write smoke passed 2026-05-24; release tagging and publication stay blocked pending broader release closeout. |
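
The Codex rows say the adapter reads patch headers for touched paths without rebuilding final file content. A rough sketch of that path-only pass follows. The `*** Add/Update/Delete File: <path>` header spelling is an assumption about the `apply_patch` text format, and `touched_paths` is a hypothetical name, so treat this as an illustration rather than the adapter's parser.

```python
import re

# Assumption: file headers look like "*** Update File: <path>". The exact
# header spelling is illustrative and is not stated in this reference.
_HEADER = re.compile(r"^\*\*\* (?:Add|Update|Delete) File: (.+)$")


def touched_paths(patch_text: str) -> list[str]:
    """Collect file paths named in patch headers, in first-seen order."""
    seen: list[str] = []
    for line in patch_text.splitlines():
        match = _HEADER.match(line)
        if match is None:
            continue  # hunk bodies are ignored; content is never reconstructed
        path = match.group(1).strip()
        if path not in seen:
            seen.append(path)
    return seen
```

Because only paths are extracted, a validator built on this can check files that already exist on disk but cannot judge what the patch will produce, which is why the table claims no content hard gate for Codex.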

## Subagent Dispatch

| Runtime | Dispatch surface | Descriptor source | Tool Access | Verification surface |
|---------|------------------|-------------------|-------------|----------------------|
| Claude Code | Native Task/subagent surface | Host-managed; no Agentera descriptor files shipped for this phase | None (no descriptors) | RuntimeAdapter registry |
| OpenCode | `@<capability>` descriptors under `~/.config/opencode/agents` | `.opencode/agents/*.md`, bootstrapped by `.opencode/plugins/agentera.js` | Per-agent `permission` frontmatter | `scripts/smoke_opencode_bootstrap.mjs`, `agentera validate descriptors` |
| Copilot CLI | User-driven host action such as `/fleet` when available | Host-managed; no Agentera descriptor files shipped for this phase | N/A (no descriptors) | RuntimeAdapter registry |
| Codex CLI | Native agent descriptors under `~/.codex/agents` or project `.codex/agents` with bounded `[agents]` settings | `skills/agentera/agents/*.toml`, installed by `scripts/setup_codex.py` and `agentera upgrade` | Global sandbox policy (no per-agent) | `agentera validate descriptors`, `tests/test_setup_codex.py`, `tests/test_upgrade_cli.py` |
| Cursor IDE | Cursor agent picker / @-mention for managed capability descriptors | `.cursor/agents/*.md`, via local plugin or `agentera upgrade --runtime cursor` | Global full access (no per-agent) | `references/adapters/cursor.md`, `scripts/validate_lifecycle_adapters.py`, `tests/test_upgrade_cli.py` |
| Cursor Agent CLI | Host-managed `cursor-agent -p` print mode | Workspace `.cursor/agents/*.md` when present; no separate CLI descriptor install | Global full access (no per-agent) | `scripts/eval_skills.py --runtime cursor-agent`, `tests/test_eval_skills.py` |

Agentera v2 does not write legacy `[agents.<name>]` Codex config blocks. Capability dispatch must use runtime-native subagent descriptors or host Task surfaces, not unsupported `agentera <capability>` CLI commands.

## Copilot install notes

Recommended marketplace install:

```bash
copilot plugin marketplace add jgabor/agentera
copilot plugin install <skill>@agentera
```

Umbrella install:

```bash
copilot plugin install jgabor/agentera
```

The marketplace install path is verified working. Granular installs avoid
umbrella discovery bug `github/copilot-cli#2390`.

Granular installs provide core `SKILL.md` behavior. App-home tools such as
doctor, installer, validators, and shared setup helpers require the managed
Agentera app or a local clone with the shared `scripts/` directory.

Deprecated fallback: `copilot plugin install OWNER/REPO`, Git URLs, and local
paths still work, but Copilot warns they are deprecated.

## Cursor install notes

**Local plugin (no Marketplace listing required)**

```bash
git clone https://github.com/jgabor/agentera.git ~/.cursor/plugins/local/agentera
# or: ln -s /path/to/agentera ~/.cursor/plugins/local/agentera
```

Restart Cursor or run **Developer: Reload Window**. The plugin root must contain
`.cursor-plugin/plugin.json`. Agentera is not published to the Cursor Marketplace
yet.

The plugin loads skills, managed capability agents, and hooks. When you open a
project that is not an Agentera install root, `sessionStart` exports
`AGENTERA_HOME` from the plugin checkout (including a plugin-root fallback when
env and project walk-up do not resolve a managed root).
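
The resolution order implied here (environment first, then project walk-up, then the plugin root) can be sketched as below. This is an assumption-laden illustration, not the logic in `hooks/cursor_session_start.py`: the function name and the marker file used to recognize a managed root are both hypothetical.

```python
from pathlib import Path
from typing import Mapping

# Hypothetical marker file; the reference does not say how a managed root
# is recognized during walk-up.
MARKER = "agentera-managed-root.marker"


def resolve_agentera_home(
    env: Mapping[str, str], project_dir: Path, plugin_root: Path
) -> Path:
    """Resolve AGENTERA_HOME: env value, then walk-up, then plugin root."""
    explicit = env.get("AGENTERA_HOME")
    if explicit:
        return Path(explicit)
    # Walk from the project directory up to the filesystem root.
    for candidate in (project_dir, *project_dir.parents):
        if (candidate / MARKER).exists():
            return candidate
    # Nothing resolved: fall back to the plugin checkout itself.
    return plugin_root
```

Taking `env` as a parameter instead of reading `os.environ` directly makes the order easy to exercise in tests without mutating process state.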

**Portable skill plus project upgrade**

Install the bundled skill, then install managed project surfaces:

```bash
npx skills add jgabor/agentera -g -a cursor --skill agentera -y
uv run scripts/agentera upgrade --runtime cursor --dry-run
uv run scripts/agentera upgrade --runtime cursor --yes
uv run scripts/agentera doctor --runtime cursor
```

Use the plugin path for a user-global install. Use upgrade when you need
project-committed `.cursor/hooks.json` and `.cursor/agents/` copies. Both paths can
be combined.

Repo-native dogfood in this repository uses committed `.cursor/hooks.json` and
`.cursor/agents/*.md`. Other projects install managed surfaces with the upgrade
commands above.

Cloud agents are unsupported in v1. Conditional hard-gate validation for IDE
reconstructable Write and Edit candidates is verified after live preToolUse Write
smoke (2026-05-24); release tagging and publication remain blocked until explicitly
approved.

Eval coverage for automation uses:

```bash
uv run scripts/eval_skills.py --runtime cursor-agent --dry-run
```

## Source of truth

Runtime adapter facts are owned by the RuntimeAdapter registry at
`references/adapters/runtime-adapter-registry.yaml` and loaded through
`scripts/runtime_adapter_registry.py`. This reference may describe registry
claims, but changes to runtime identity, lifecycle events, artifact-validation
support, subagent dispatch, config targets, diagnostics, or documentation claims must be validated
against the registry rather than duplicated here as an independent table.

App-home classification is not runtime-specific. `scripts/install_root.py` is
the shared Module for `AGENTERA_HOME`, the normal user data root, managed app,
out-of-date app, unknown directories, and diagnostic semantics. Package metadata
registry work stays outside both the RuntimeAdapter registry and this shared
classification Module.

| Surface | Path |
|---------|------|
| Shared artifact validator | `hooks/validate_artifact.py` |
| Claude Code hook registry | `hooks/hooks.json` |
| OpenCode plugin | `.opencode/plugins/agentera.js` |
| OpenCode agent descriptors | `.opencode/agents/*.md` |
| Copilot pre-write hook | `.github/hooks/preToolUse.json` |
| Codex copied user hook config | `~/.codex/hooks.json` generated from `hooks/codex-hooks.json` with a resolved validator command |
| Codex plugin hook config | `hooks/codex-plugin-hooks.json` via `.codex-plugin/plugin.json` `hooks` |
| Codex agent descriptors | `skills/agentera/agents/*.toml` |
| Cursor hook registry | `.cursor/hooks.json` |
| Cursor agent descriptors | `.cursor/agents/*.md` |
| Cursor plugin manifest | `.cursor-plugin/plugin.json` |
| RuntimeAdapter registry | `references/adapters/runtime-adapter-registry.yaml` |
| RuntimeAdapter registry loader | `scripts/runtime_adapter_registry.py` |
| Lifecycle metadata validator | `scripts/validate_lifecycle_adapters.py` |
|
|
@@ -0,0 +1,267 @@
|
|
|
1
|
+
# Agentera benchmarks
|
|
2
|
+
|
|
3
|
+
This document indexes benchmark surfaces, execution policy, retained outputs,
|
|
4
|
+
and interpretation rules. Use it when running Agentera benchmarks or adding new
|
|
5
|
+
ones.
|
|
6
|
+
|
|
7
|
+
Scope: maintainer-run benchmarks only. Normal verification still lives in tests,
|
|
8
|
+
contract validators, and runtime smoke checks.
|
|
9
|
+
|
|
10
|
+
## Authority order
|
|
11
|
+
|
|
12
|
+
| Authority | Owns |
|
|
13
|
+
| --- | --- |
|
|
14
|
+
| `references/analysis/startup-measurement-contract.yaml` | Startup state-access metric contract, benchmark privacy boundary, retained fields, and storage shape. |
|
|
15
|
+
| `scripts/startup_analysis_contract.py` | Startup analyzer implementation, report generation, aggregate row construction, and benchmark persistence. |
|
|
16
|
+
| `magefile.go` | Manual benchmark command surface and non-interactive approval gate. |
|
|
17
|
+
| `tests/test_startup_analysis_contract.py` | Fixture-backed consent, persistence, privacy, and no-repo-output checks. |
|
|
18
|
+
| `references/analysis/benchmark.md` | Human runbook, interpretation guide, and future benchmark documentation pattern. |
|
|
19
|
+
|
|
20
|
+
## Benchmark surfaces
|
|
21
|
+
|
|
22
|
+
| Surface | Command | Purpose | CI policy |
|
|
23
|
+
| --- | --- | --- | --- |
|
|
24
|
+
| Startup state benchmark | `mage bench:startupState` | Measures how often Agentera CLI state reads are followed by raw artifact access during startup/state gathering. | Manual only; forbidden in normal CI. |
|
|
25
|
+
|
|
26
|
+
The startup benchmark is an optimization signal for Decision 51 and Decision 52.
|
|
27
|
+
It does not implement a startup state envelope or change runtime behavior.
|
|
28
|
+
|
|
29
|
+
## Startup State Benchmark
|
|
30
|
+
|
|
31
|
+
Run the benchmark with no extra setup:
|
|
32
|
+
|
|
33
|
+
```bash
|
|
34
|
+
mage bench:startupState
|
|
35
|
+
```
|
|
36
|
+
|
|
37
|
+
The default run uses documented runtime-store defaults, writes retained results
|
|
38
|
+
under the default Agentera benchmark directory, and records unavailable stores as
|
|
39
|
+
bounded degradation evidence. No environment variables are required.
|
|
40
|
+
|
|
41
|
+
To use different runtime history sources, set each runtime label and concrete
|
|
42
|
+
store path explicitly:
|
|
43
|
+
|
|
44
|
+
```bash
|
|
45
|
+
AGENTERA_BENCH_RUNTIME_STORES="opencode=/absolute/path/to/opencode.db" mage bench:startupState
|
|
46
|
+
```
|
|
47
|
+
|
|
48
|
+
Customize a run when needed:
|
|
49
|
+
|
|
50
|
+
```bash
|
|
51
|
+
AGENTERA_BENCH_RUNTIME_STORES="opencode=/absolute/path/to/opencode.db" \
|
|
52
|
+
AGENTERA_BENCH_SALT="$(openssl rand -hex 32)" \
|
|
53
|
+
AGENTERA_BENCH_PROJECT_ROOTS="/absolute/project/root" \
|
|
54
|
+
AGENTERA_BENCH_OUTPUT_DIR="/absolute/benchmark/output" \
|
|
55
|
+
mage bench:startupState
|
|
56
|
+
```
|
|
57
|
+
|
|
58
|
+
Inputs:
|
|
59
|
+
|
|
60
|
+
| Input | Meaning |
|
|
61
|
+
| --- | --- |
|
|
62
|
+
| `AGENTERA_BENCH_RUNTIME_STORES` | Optional. Comma-separated `runtime=/absolute/path` overrides. Use this when the documented runtime-store defaults are not the stores you want measured. A runtime label without its path is not enough. |
|
|
63
|
+
| `AGENTERA_BENCH_SALT` | Optional. Local redaction salt for transient `latest-report.*` pseudonyms. If omitted, Mage generates one. `runs.jsonl` does not retain salts or generated salted hashes, so aggregate history does not require a stable or shared salt. |
|
|
64
|
+
| `AGENTERA_BENCH_PROJECT_ROOTS` | Optional. Absolute project roots for corpus extraction. Use this when the current directory is not the project root you want measured. |
|
|
65
|
+
| `AGENTERA_BENCH_OUTPUT_DIR` | Optional. Absolute override for the durable benchmark directory. Use only when the default user-local directory is not desired. |
|
|
66
|
+
|
|
67
|
+
Supported runtime labels are owned by the analyzer and Mage wrapper. Invalid
|
|
68
|
+
labels or relative paths fail before runtime history is read.
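
The fail-fast rule above can be sketched as a small parser. This is a minimal illustration, not the analyzer's or Mage wrapper's actual code: `parse_runtime_stores` and `ASSUMED_RUNTIME_LABELS` are hypothetical names, and the label set is only inferred from the runtimes named in this document.

```python
import os.path

# Assumption: labels inferred from runtimes mentioned in this document.
# The authoritative list is owned by the analyzer and Mage wrapper.
ASSUMED_RUNTIME_LABELS = {"claude-code", "codex", "github-copilot", "opencode"}


def parse_runtime_stores(value: str) -> dict[str, str]:
    """Parse comma-separated `runtime=/absolute/path` overrides.

    Raises ValueError before any store is opened, mirroring the rule that
    invalid labels or relative paths fail before runtime history is read.
    """
    stores: dict[str, str] = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        label, sep, path = item.partition("=")
        if not sep or not path:
            raise ValueError(f"runtime label without a store path: {item!r}")
        if label not in ASSUMED_RUNTIME_LABELS:
            raise ValueError(f"unsupported runtime label: {label!r}")
        if not os.path.isabs(path):
            raise ValueError(f"store path must be absolute: {path!r}")
        stores[label] = path
    return stores
```

Validating every entry before returning keeps a partially valid override from triggering a partial read.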
|
|
69
|
+
|
|
70
|
+
### Runtime Extraction Contract
|
|
71
|
+
|
|
72
|
+
The extraction matrix is defined in
|
|
73
|
+
`references/analysis/startup-measurement-contract.yaml` under
|
|
74
|
+
`runtime_extraction_contract`. It lists, for each supported runtime, the accepted
|
|
75
|
+
input schema classes, normalized record fields, status mapping, and redaction
|
|
76
|
+
rules.
|
|
77
|
+
|
|
78
|
+
Status interpretation is intentionally split:
|
|
79
|
+
|
|
80
|
+
| Outcome | Status / reason | Meaning |
|
|
81
|
+
| --- | --- | --- |
|
|
82
|
+
| Schema divergence | `degraded` / `schema_divergent` | Candidate runtime storage was found, but the adapter hit schema errors. Treat this as extraction failure evidence. |
|
|
83
|
+
| No matching records | `sparse` / `no_matching_records` | Candidate storage was readable and schema-compatible, but no supported records were extracted. Treat this as sparse coverage evidence. |
|
|
84
|
+
| Successful zero-record window | `ok` / `records_extracted` with `record_count: 0` and `error_count: 0` | Extraction succeeded, but the incremental benchmark window has no records after the previous watermark. Treat this as compatible successful behavior. |
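
As a reading aid, the split above can be expressed as a lookup. This is a hedged sketch: `interpret_runtime_row` is a hypothetical helper, and the returned strings paraphrase the table rather than quote contract values.

```python
def interpret_runtime_row(
    status: str, reason: str, record_count: int, error_count: int
) -> str:
    """Map a runtime coverage row to the kind of evidence it represents."""
    if (status, reason) == ("degraded", "schema_divergent"):
        return "extraction failure evidence"
    if (status, reason) == ("sparse", "no_matching_records"):
        return "sparse coverage evidence"
    if (status, reason) == ("ok", "records_extracted"):
        if record_count == 0 and error_count == 0:
            # Empty incremental window: extraction worked, nothing new to read.
            return "successful empty incremental window"
        return "records extracted"
    # Assumption: anything else is left for a human to classify.
    return "unclassified outcome"
```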
|
|
85
|
+
|
|
86
|
+
Current known runtime caveats are extraction caveats, not CLI behavior evidence:
|
|
87
|
+
`claude-code` is degraded by `schema_divergent` with 4836 candidates, 0 records,
|
|
88
|
+
and 2 errors; `github-copilot` is degraded by `schema_divergent` with 1
|
|
89
|
+
candidate, 0 records, and 1 error; `opencode` currently contributes records;
|
|
90
|
+
`codex` can validly report `ok` with zero records and zero errors for an empty
|
|
91
|
+
incremental window.
|
|
92
|
+
|
|
93
|
+
Runtime-store runs are incremental by default. The first run for a runtime scope
|
|
94
|
+
measures all records after the v2.3.0 boundary. Later runs for the same
|
|
95
|
+
`runtime_scope` read the previous `benchmark_watermark_at` from `runs.jsonl` and
|
|
96
|
+
measure only records with timestamps strictly after that watermark. This keeps
|
|
97
|
+
new aggregate rows focused on work since the previous successful benchmark run.
|
|
98
|
+
|
|
99
|
+
To start a fresh series, use a different `AGENTERA_BENCH_OUTPUT_DIR` or archive
|
|
100
|
+
the existing `runs.jsonl`. Late-arriving records with timestamps at or before the
|
|
101
|
+
stored watermark are treated as already covered.
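
The incremental window can be sketched as a strict greater-than filter. This is an illustration under stated assumptions: `select_window` and the `{"timestamp": ...}` record shape are hypothetical, and carrying the lower bound forward when the window is empty is a guess, not documented analyzer behavior.

```python
from datetime import datetime, timezone


def select_window(records: list[dict], window_started_after: datetime):
    """Return (records strictly after the bound, watermark for the next run).

    `window_started_after` is the exclusive lower bound: the previous
    watermark, or the first-run boundary when no history exists.
    """
    fresh = [r for r in records if r["timestamp"] > window_started_after]
    # Assumption: an empty window keeps the old bound as its watermark.
    watermark = max((r["timestamp"] for r in fresh), default=window_started_after)
    return fresh, watermark
```

A record whose timestamp equals the stored watermark is excluded, which matches the rule that late arrivals at or before the watermark count as already covered.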
|
|
102
|
+
|
|
103
|
+
## How To Reduce Raw Startup Reads
|
|
104
|
+
|
|
105
|
+
Use the startup benchmark as a CLI completeness profiler. The benchmark does not
|
|
106
|
+
measure wall-clock startup speed. It answers whether agents fetch Agentera state
|
|
107
|
+
through the CLI and then still fall back to raw artifact reads, greps, or globs
|
|
108
|
+
during startup/state gathering.
|
|
109
|
+
|
|
110
|
+
Follow this loop:
|
|
111
|
+
|
|
112
|
+
1. Run `mage bench:startupState` and open the retained latest report from the
|
|
113
|
+
`benchmark.directory` path printed on stdout.
|
|
114
|
+
2. Check evidence quality before planning product changes. `total_state_sequences`
|
|
115
|
+
must be non-zero, important runtime rows should not be degraded, and
|
|
116
|
+
`confidence_caveats` plus `degradation_reason_counts` must be understood. If
|
|
117
|
+
the report has zero state-gathering sequences, treat it as no evidence rather
|
|
118
|
+
than evidence that the CLI is complete.
|
|
119
|
+
3. Rank the post-CLI raw artifact reads. The best next CLI field is usually the
|
|
120
|
+
highest repeated artifact in `redundant_raw_artifact_access_counts`, followed
|
|
121
|
+
by `raw_artifact_access_after_cli_counts` when the read is not redundant yet
|
|
122
|
+
but still happens after a successful CLI state call.
|
|
123
|
+
4. Map each repeated raw read to the CLI state owner. Prefer enriching existing
|
|
124
|
+
routine commands or the `hej` composite startup result. Do not add Decision 43
|
|
125
|
+
slash-route aliases as CLI commands.
|
|
126
|
+
5. Make CLI completeness explicit. A startup-capable response should say whether
|
|
127
|
+
it is complete for the requested capability, whether raw artifact reads are
|
|
128
|
+
required, what state families are included, what is missing, and which CLI
|
|
129
|
+
fallback command should be tried before raw file access.
|
|
130
|
+
6. Update guidance and tests so agents trust complete CLI output and use raw
|
|
131
|
+
reads only as a fallback. Rerun the benchmark on representative sessions and
|
|
132
|
+
compare the new rates with prior `runs.jsonl` rows.
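
Steps 2 and 3 of the loop can be sketched as a ranking helper over a loaded `latest-report.json`. This is a minimal sketch: `rank_cli_field_candidates` is a hypothetical name, and it assumes both count fields are mappings from canonical artifact label to integer count, which the report schema may not guarantee.

```python
def rank_cli_field_candidates(report: dict) -> list[tuple[str, int, str]]:
    """Rank post-CLI raw reads: redundant reads first, then other after-CLI reads."""
    # Step 2: zero state-gathering sequences means no evidence at all.
    if not report.get("total_state_sequences"):
        return []

    redundant = report.get("redundant_raw_artifact_access_counts", {})
    after_cli = report.get("raw_artifact_access_after_cli_counts", {})

    def by_count(counts: dict) -> list[tuple[str, int]]:
        # Highest count first; label as a stable tie-breaker.
        return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

    ranked = [(label, count, "redundant") for label, count in by_count(redundant)]
    ranked += [
        (label, count, "after_cli")
        for label, count in by_count(after_cli)
        if label not in redundant  # not redundant yet, but still read after CLI state
    ]
    return ranked
```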
|
|
133
|
+
|
|
134
|
+
Use these fields to decide what to change:
|
|
135
|
+
|
|
136
|
+
| Field | How to use it |
|
|
137
|
+
| --- | --- |
|
|
138
|
+
| `runtime_coverage` | Verify which runtime stores contributed records, which were degraded, and why. Fix extraction or gather more evidence before product work when key stores are degraded. |
|
|
139
|
+
| `total_state_sequences` | Confirm the run actually observed startup/state-gathering sequences. A zero value blocks CLI-completeness conclusions. |
|
|
140
|
+
| `cli_state_command_counts` | Shows which Agentera state commands anchored the measured sequences. These commands are the first candidates for richer structured output. |
|
|
141
|
+
| `raw_artifact_access_after_cli_counts` | Shows which canonical artifacts agents still read after CLI state. These are candidate missing CLI fields or summaries. |
|
|
142
|
+
| `redundant_raw_artifact_access_counts` | Shows post-CLI raw reads that overlap state already covered by the CLI. These are the highest-priority avoidable reads. |
|
|
143
|
+
| `per_capability_state_counts` | Shows whether the gap is broad startup behavior or narrow capability-specific startup context. |
|
|
144
|
+
| `capability_prose_read_counts` | Shows capability prose reads during startup. Use this to decide whether routing/context guidance, not artifact state, is the repeated lookup. |
|
|
145
|
+
| `startup_recommendation` | Records whether the measured evidence supports closing, targeted guidance, or a broader startup state envelope. |
|
|
146
|
+
|
|
147
|
+
### Token Impact Estimates
|
|
148
|
+
|
|
149
|
+
Token-impact fields are approximate, privacy-safe aggregate estimates. The
|
|
150
|
+
contract-owned estimator version is `approx_bytes_div_4_v1`: the analyzer may
|
|
151
|
+
observe content byte counts transiently, group them by canonical artifact label,
|
|
152
|
+
and estimate tokens as bytes divided by 4. Retained outputs must not include raw
|
|
153
|
+
paths, transcript text, raw tool arguments, private salts, or generated salted
|
|
154
|
+
hashes.
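
A sketch of the `approx_bytes_div_4_v1` idea follows. The function names are hypothetical, and two details are assumptions not stated by the contract text here: division is floored, and bytes are summed per label before dividing.

```python
def estimate_tokens(byte_count: int) -> int:
    """Approximate tokens as bytes divided by 4 (assumption: floor division)."""
    return byte_count // 4


def estimate_by_artifact(reads: list[tuple[str, int]]) -> dict[str, int]:
    """Group transient byte counts by canonical artifact label, then estimate.

    `reads` holds (canonical_label, byte_count) pairs. Only labels and
    estimates leave this function; no paths or content are retained.
    """
    totals: dict[str, int] = {}
    for label, byte_count in reads:
        totals[label] = totals.get(label, 0) + byte_count
    return {label: estimate_tokens(total) for label, total in totals.items()}
```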
|
|
155
|
+
|
|
156
|
+
Retained latest reports and new history rows may include:
|
|
157
|
+
|
|
158
|
+
| Field | Meaning |
|
|
159
|
+
| --- | --- |
|
|
160
|
+
| `token_estimator_version` | Estimator identity, currently `approx_bytes_div_4_v1`. |
|
|
161
|
+
| `estimated_raw_after_cli_tokens` | Aggregate estimated tokens for raw artifact reads after CLI state calls. |
|
|
162
|
+
| `estimated_redundant_raw_tokens` | Aggregate estimated tokens for raw artifact reads that overlap CLI-covered state. |
|
|
163
|
+
| `estimated_raw_after_cli_tokens_by_artifact` | Canonical-label breakdown such as `PLAN.md`; never raw paths. |
|
|
164
|
+
| `estimated_redundant_raw_tokens_by_artifact` | Canonical-label breakdown for redundant raw reads. |
|
|
165
|
+
| `estimated_tokens_saved_vs_previous` | Previous comparable row's redundant-token estimate minus the current row's estimate, or `null`. |
|
|
166
|
+
| `estimated_tokens_saved_vs_previous_null_reason` | Concrete reason when savings are `null`, such as `previous_missing_token_estimates`. |
|
|
167
|
+
|
|
168
|
+
Rows are comparable only when contract version, benchmark mode, runtime scope,
|
|
169
|
+
estimator version, and token-field availability match. Otherwise savings stay
|
|
170
|
+
`null` with a contract-listed reason.
|
|
171
|
+
|
|
172
|
+
Stop conditions are as important as action triggers. If a run has zero
|
|
173
|
+
state-gathering sequences, sparse records, or degraded schema extraction, improve
|
|
174
|
+
benchmark coverage first. If only one capability repeatedly reads one artifact,
|
|
175
|
+
prefer targeted CLI output or guidance over a broad startup envelope. If multiple
|
|
176
|
+
capabilities repeatedly read several redundant artifacts after CLI state, plan a
|
|
177
|
+
capability-ready startup state envelope or equivalent composite output.
|
|
178
|
+
|
|
179
|
+
## Retention Policy
|
|
180
|
+
|
|
181
|
+
Default durable storage is `${AGENTERA_HOME}/benchmarks/startup-state/`.
|
|
182
|
+
|
|
183
|
+
Retained outputs are limited to:
|
|
184
|
+
|
|
185
|
+
| File | Retention role |
|
|
186
|
+
| --- | --- |
|
|
187
|
+
| `runs.jsonl` | Append-only aggregate benchmark history and previous-run watermark source. |
|
|
188
|
+
| `latest-report.json` | Latest redacted structured report. |
|
|
189
|
+
| `latest-report.md` | Latest redacted human-readable report. |
|
|
190
|
+
|
|
191
|
+
Temporary corpus files, intermediates, and per-run detailed reports are not
|
|
192
|
+
durable benchmark history. Failed report generation must not append `runs.jsonl`
|
|
193
|
+
or replace previous latest reports.
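
One way to honor the failure rule above is to serialize everything before touching durable files. The sketch below is an assumption about how such persistence could look, not the analyzer's implementation; `persist_run` and its parameters are hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path


def persist_run(bench_dir: Path, row: dict, report: dict, report_md: str) -> None:
    """Replace the latest reports and append one history row, or change nothing."""
    # Serialize up front: a generation failure raises before any durable write.
    row_line = json.dumps(row, sort_keys=True)
    report_text = json.dumps(report, indent=2, sort_keys=True)

    bench_dir.mkdir(parents=True, exist_ok=True)
    staged: list[tuple[str, Path]] = []
    try:
        for name, text in (
            ("latest-report.json", report_text),
            ("latest-report.md", report_md),
        ):
            fd, tmp = tempfile.mkstemp(dir=bench_dir, prefix=f".{name}.")
            with os.fdopen(fd, "w", encoding="utf-8") as handle:
                handle.write(text)
            staged.append((tmp, bench_dir / name))
        for tmp, final in staged:
            os.replace(tmp, final)  # atomic swap within one directory
        with open(bench_dir / "runs.jsonl", "a", encoding="utf-8") as history:
            history.write(row_line + "\n")
    finally:
        # Remove staged files that were never swapped into place.
        for tmp, _ in staged:
            if os.path.exists(tmp):
                os.unlink(tmp)
```

Staging inside the benchmark directory keeps `os.replace` on one filesystem, and the cleanup step keeps stray temporaries out of the retained set.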
|
|
194
|
+
|
|
195
|
+
Aggregate history must not retain raw transcripts, raw corpus files, raw
|
|
196
|
+
intermediates, raw store paths, raw session ids, private salts, or generated
|
|
197
|
+
salted hashes. Benchmark metrics are user-local, uncommitted, unshipped, and not
|
|
198
|
+
part of normal CI.
|
|
199
|
+
|
|
200
|
+
The command prints the retained benchmark directory in `benchmark.directory`.
|
|
201
|
+
Review the latest result from that directory, or from the default location under
|
|
202
|
+
your resolved `AGENTERA_HOME` (see `agentera doctor --json` when unset):
|
|
203
|
+
|
|
204
|
+
```bash
|
|
205
|
+
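# Fail fast if AGENTERA_HOME is unset instead of resolving to /benchmarks/startup-state.
BENCH_DIR="${AGENTERA_HOME:?set AGENTERA_HOME first}/benchmarks/startup-state"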
|
|
206
|
+
|
|
207
|
+
less "$BENCH_DIR/latest-report.md"
|
|
208
|
+
python3 -m json.tool "$BENCH_DIR/latest-report.json"
|
|
209
|
+
tail -n 5 "$BENCH_DIR/runs.jsonl"
|
|
210
|
+
```
|
|
211
|
+
|
|
212
|
+
Set `AGENTERA_HOME` first, or resolve the platform app home from
|
|
213
|
+
`agentera doctor --json` / `agentera upgrade --dry-run` when unset.
|
|
214
|
+
The stdout `reports` filenames are analyzer report names from Mage's temporary
|
|
215
|
+
work directory. The durable operator-facing files are the `benchmark` paths:
|
|
216
|
+
`runs.jsonl`, `latest-report.json`, and `latest-report.md`.
|
|
217
|
+
|
|
218
|
+
## Interpretation
|
|
219
|
+
|
|
220
|
+
Interpret the benchmark as a startup state-access optimization signal, not as a
|
|
221
|
+
general runtime performance benchmark.
|
|
222
|
+
|
|
223
|
+
Default runs demonstrate whether new local Agentera startup/state-gathering
|
|
224
|
+
behavior since the previous successful run repeatedly falls back from CLI state
|
|
225
|
+
reads to raw artifact access. Stores that do not exist or cannot be read are
|
|
226
|
+
bounded degradation evidence, not successful behavior evidence.
|
|
227
|
+
|
|
228
|
+
Watermark fields:
|
|
229
|
+
|
|
230
|
+
| Field | Meaning |
|
|
231
|
+
| --- | --- |
|
|
232
|
+
| `benchmark_mode` | `since_previous_benchmark` for Mage runs that use retained history as the previous-run boundary. |
|
|
233
|
+
| `benchmark_previous_watermark_at` | The previous successful watermark for the same `runtime_scope`, or `null` on the first run. |
|
|
234
|
+
| `benchmark_window_started_after` | The exclusive lower timestamp bound used for this run. |
|
|
235
|
+
| `benchmark_watermark_at` | The latest record timestamp covered by this run. The next run for the same `runtime_scope` starts after this value. |
|
|
236
|
+
|
|
237
|
+
Primary rates:
|
|
238
|
+
|
|
239
|
+
| Latest report field | History row field | Meaning |
|
|
240
|
+
| --- | --- | --- |
|
|
241
|
+
| `raw_after_cli_sequence_rate` | `raw_after_cli_rate` | Share of startup state-gathering sequences where any raw Agentera artifact access follows CLI state. |
|
|
242
|
+
| `redundant_raw_sequence_rate` | `redundant_raw_access_rate` | Share of startup state-gathering sequences where raw access overlaps state already covered by the CLI. |
|
|
243
|
+
|
|
244
|
+
High rates support follow-up work such as a CLI startup state envelope. Low or
|
|
245
|
+
zero rates can close the measurement loop without implementation if the corpus is
|
|
246
|
+
representative and degradation counts are bounded.
|
|
247
|
+
|
|
248
|
+
Always review runtime coverage and degradation counts before treating a trend as
|
|
249
|
+
actionable. Missing, locked, sparse, or unreadable stores are bounded degradation
|
|
250
|
+
evidence, not product behavior evidence.
|
|
251
|
+
|
|
252
|
+
## Adding Benchmarks
|
|
253
|
+
|
|
254
|
+
New Agentera benchmarks should follow the same shape:
|
|
255
|
+
|
|
256
|
+
| Requirement | Rule |
|
|
257
|
+
| --- | --- |
|
|
258
|
+
| Contract first | Define metric, privacy boundary, storage, retained fields, and failure behavior before implementation. |
|
|
259
|
+
| No-prerequisite target | Every `mage bench:*` target must run with no environment variables. The default mode should use documented local defaults and report unavailable inputs as bounded degradation evidence. |
|
|
260
|
+
| Manual unless proven safe | Do not add runtime-history or environment-sensitive benchmarks to normal CI. |
|
|
261
|
+
| Explicit local approval | Require concrete paths or resources, not broad consent flags. |
|
|
262
|
+
| User-local outputs | Keep generated benchmark history outside the repository by default. |
|
|
263
|
+
| Bounded retention | Retain aggregate history and latest redacted reports only. |
|
|
264
|
+
| Fixture verification | Test consent refusal, successful synthetic runs, degradation cases, privacy exclusions, and no repository-local outputs. |
|
|
265
|
+
|
|
266
|
+
If a future benchmark needs different retention or CI behavior, document the
|
|
267
|
+
reason in its contract and update this file in the same change.
|