@nexus-cortex/cli 4.26.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.cortex/agents/AGENT_PROFILE_GUIDE.md +307 -0
- package/.cortex/agents/README.md +268 -0
- package/.cortex/agents/a-frontend-landing-page-designer.md +41 -0
- package/.cortex/agents/autoresearch-agent.md +49 -0
- package/.cortex/agents/code-reviewer.md +63 -0
- package/.cortex/agents/context-research.md +26 -0
- package/.cortex/agents/doc-writer.md +92 -0
- package/.cortex/agents/explore.md +63 -0
- package/.cortex/agents/new-model-api-integrator-analyst.md +41 -0
- package/.cortex/agents/plan.md +109 -0
- package/.cortex/agents/pr-architecture-reviewer.md +77 -0
- package/.cortex/agents/pr-code-quality.md +78 -0
- package/.cortex/agents/pr-implementer.md +50 -0
- package/.cortex/agents/pr-security-auditor.md +62 -0
- package/.cortex/agents/pr-test-writer.md +67 -0
- package/.cortex/agents/refactor.md +118 -0
- package/.cortex/agents/test-writer.md +72 -0
- package/.cortex/agents/web-researcher.md +72 -0
- package/.cortex/bench/tasks/sample-tasks.json +20 -0
- package/.cortex/commands/compare.md +14 -0
- package/.cortex/commands/deps.md +16 -0
- package/.cortex/commands/diff.md +14 -0
- package/.cortex/commands/explain.md +16 -0
- package/.cortex/commands/find-bug.md +13 -0
- package/.cortex/commands/profile.md +15 -0
- package/.cortex/commands/review.md +18 -0
- package/.cortex/commands/search.md +16 -0
- package/.cortex/commands/test.md +15 -0
- package/.cortex/permissions.dev.json +20 -0
- package/.cortex/permissions.example.json +71 -0
- package/.cortex/permissions.prod.json +63 -0
- package/.cortex/permissions.test.json +19 -0
- package/.cortex/skills/autoresearch/SKILL.md +77 -0
- package/.cortex/skills/autoresearch/personas/README.md +45 -0
- package/.cortex/skills/autoresearch/personas/aggressive-refactor.md +25 -0
- package/.cortex/skills/autoresearch/personas/creative.md +29 -0
- package/.cortex/skills/autoresearch/personas/perf-hunter.md +27 -0
- package/.cortex/skills/autoresearch/personas/precise.md +23 -0
- package/.cortex/skills/autoresearch/personas/root-cause.md +26 -0
- package/.cortex/skills/autoresearch/personas/security-auditor.md +29 -0
- package/.cortex/skills/autoresearch/personas/skeptic-reviewer.md +31 -0
- package/.cortex/skills/autoresearch/personas/test-first.md +25 -0
- package/.cortex/skills/best-of-n/SKILL.md +76 -0
- package/.cortex/skills/cortex/SKILL.md +834 -0
- package/.cortex/skills/cortex-bench/SKILL.md +354 -0
- package/.cortex/skills/docx/SKILL.md +83 -0
- package/.cortex/skills/pdf-documents/SKILL.md +297 -0
- package/.cortex/skills/pdf-documents/sections/01-image-acquisition.md +132 -0
- package/.cortex/skills/pdf-documents/sections/02-ai-image-generation.md +274 -0
- package/.cortex/skills/pdf-documents/sections/03-paper-sizes.md +89 -0
- package/.cortex/skills/pdf-documents/sections/04-design-system.md +549 -0
- package/.cortex/skills/pdf-documents/sections/05-css-print-rules.md +135 -0
- package/.cortex/skills/pdf-documents/sections/06-svg-charts.md +100 -0
- package/.cortex/skills/pdf-documents/sections/07-templates.md +224 -0
- package/.cortex/skills/pdf-documents/sections/08-scaled-output.md +164 -0
- package/.cortex/skills/pdf-documents/sections/09-preview-qa.md +66 -0
- package/.cortex/skills/pdf-documents/sections/10-reading-pdfs.md +499 -0
- package/.cortex/skills/pdf-documents/sections/11-form-filling.md +241 -0
- package/.cortex/skills/pptx/SKILL.md +90 -0
- package/.cortex/skills/resume-analyst/SKILL.md +373 -0
- package/.cortex/skills/verify-work/SKILL.md +74 -0
- package/.cortex/skills/xlsx/SKILL.md +101 -0
- package/.cortex/system-messages/messages/WORK_QUALITY.md +159 -0
- package/.cortex/system-messages/registry.json +18 -0
- package/LICENSE +202 -0
- package/NOTICE +2 -0
- package/README.md +13 -0
- package/bin/cortex.js +548 -0
- package/dist/agent-mode.d.ts +21 -0
- package/dist/agent-mode.d.ts.map +1 -0
- package/dist/agent-mode.js +511 -0
- package/dist/agent-mode.js.map +1 -0
- package/dist/client/CortexClient.d.ts +84 -0
- package/dist/client/CortexClient.d.ts.map +1 -0
- package/dist/client/CortexClient.js +163 -0
- package/dist/client/CortexClient.js.map +1 -0
- package/dist/commands/artifact/list.d.ts +15 -0
- package/dist/commands/artifact/list.d.ts.map +1 -0
- package/dist/commands/artifact/list.js +89 -0
- package/dist/commands/artifact/list.js.map +1 -0
- package/dist/commands/artifact/restart.d.ts +13 -0
- package/dist/commands/artifact/restart.d.ts.map +1 -0
- package/dist/commands/artifact/restart.js +56 -0
- package/dist/commands/artifact/restart.js.map +1 -0
- package/dist/commands/artifact/status.d.ts +13 -0
- package/dist/commands/artifact/status.d.ts.map +1 -0
- package/dist/commands/artifact/status.js +100 -0
- package/dist/commands/artifact/status.js.map +1 -0
- package/dist/commands/artifact/stop.d.ts +13 -0
- package/dist/commands/artifact/stop.d.ts.map +1 -0
- package/dist/commands/artifact/stop.js +50 -0
- package/dist/commands/artifact/stop.js.map +1 -0
- package/dist/commands/autoresearch/bench.d.ts +32 -0
- package/dist/commands/autoresearch/bench.d.ts.map +1 -0
- package/dist/commands/autoresearch/bench.js +123 -0
- package/dist/commands/autoresearch/bench.js.map +1 -0
- package/dist/commands/autoresearch/commandRunner.d.ts +35 -0
- package/dist/commands/autoresearch/commandRunner.d.ts.map +1 -0
- package/dist/commands/autoresearch/commandRunner.js +91 -0
- package/dist/commands/autoresearch/commandRunner.js.map +1 -0
- package/dist/commands/autoresearch/evaluate.d.ts +18 -0
- package/dist/commands/autoresearch/evaluate.d.ts.map +1 -0
- package/dist/commands/autoresearch/evaluate.js +117 -0
- package/dist/commands/autoresearch/evaluate.js.map +1 -0
- package/dist/commands/autoresearch/experiment.d.ts +38 -0
- package/dist/commands/autoresearch/experiment.d.ts.map +1 -0
- package/dist/commands/autoresearch/experiment.js +168 -0
- package/dist/commands/autoresearch/experiment.js.map +1 -0
- package/dist/commands/autoresearch/fix.d.ts +10 -0
- package/dist/commands/autoresearch/fix.d.ts.map +1 -0
- package/dist/commands/autoresearch/fix.js +86 -0
- package/dist/commands/autoresearch/fix.js.map +1 -0
- package/dist/commands/autoresearch/harnessProcess.d.ts +48 -0
- package/dist/commands/autoresearch/harnessProcess.d.ts.map +1 -0
- package/dist/commands/autoresearch/harnessProcess.js +140 -0
- package/dist/commands/autoresearch/harnessProcess.js.map +1 -0
- package/dist/commands/autoresearch/list.d.ts +6 -0
- package/dist/commands/autoresearch/list.d.ts.map +1 -0
- package/dist/commands/autoresearch/list.js +38 -0
- package/dist/commands/autoresearch/list.js.map +1 -0
- package/dist/commands/autoresearch/loop.d.ts +26 -0
- package/dist/commands/autoresearch/loop.d.ts.map +1 -0
- package/dist/commands/autoresearch/loop.js +242 -0
- package/dist/commands/autoresearch/loop.js.map +1 -0
- package/dist/commands/cache/metrics.d.ts +13 -0
- package/dist/commands/cache/metrics.d.ts.map +1 -0
- package/dist/commands/cache/metrics.js +77 -0
- package/dist/commands/cache/metrics.js.map +1 -0
- package/dist/commands/chat/AgenticChat.d.ts +39 -0
- package/dist/commands/chat/AgenticChat.d.ts.map +1 -0
- package/dist/commands/chat/AgenticChat.js +201 -0
- package/dist/commands/chat/AgenticChat.js.map +1 -0
- package/dist/commands/chat/renderers/CodeRenderer.d.ts +36 -0
- package/dist/commands/chat/renderers/CodeRenderer.d.ts.map +1 -0
- package/dist/commands/chat/renderers/CodeRenderer.js +85 -0
- package/dist/commands/chat/renderers/CodeRenderer.js.map +1 -0
- package/dist/commands/chat/renderers/ToolRenderer.d.ts +30 -0
- package/dist/commands/chat/renderers/ToolRenderer.d.ts.map +1 -0
- package/dist/commands/chat/renderers/ToolRenderer.js +93 -0
- package/dist/commands/chat/renderers/ToolRenderer.js.map +1 -0
- package/dist/commands/chat/single-message.d.ts +15 -0
- package/dist/commands/chat/single-message.d.ts.map +1 -0
- package/dist/commands/chat/single-message.js +85 -0
- package/dist/commands/chat/single-message.js.map +1 -0
- package/dist/commands/config/categories.d.ts +8 -0
- package/dist/commands/config/categories.d.ts.map +1 -0
- package/dist/commands/config/categories.js +75 -0
- package/dist/commands/config/categories.js.map +1 -0
- package/dist/commands/config/category.d.ts +8 -0
- package/dist/commands/config/category.d.ts.map +1 -0
- package/dist/commands/config/category.js +81 -0
- package/dist/commands/config/category.js.map +1 -0
- package/dist/commands/config/get.d.ts +9 -0
- package/dist/commands/config/get.d.ts.map +1 -0
- package/dist/commands/config/get.js +98 -0
- package/dist/commands/config/get.js.map +1 -0
- package/dist/commands/config/reset.d.ts +6 -0
- package/dist/commands/config/reset.d.ts.map +1 -0
- package/dist/commands/config/reset.js +68 -0
- package/dist/commands/config/reset.js.map +1 -0
- package/dist/commands/config/set.d.ts +6 -0
- package/dist/commands/config/set.d.ts.map +1 -0
- package/dist/commands/config/set.js +60 -0
- package/dist/commands/config/set.js.map +1 -0
- package/dist/commands/config/utils.d.ts +14 -0
- package/dist/commands/config/utils.d.ts.map +1 -0
- package/dist/commands/config/utils.js +54 -0
- package/dist/commands/config/utils.js.map +1 -0
- package/dist/commands/context/boundaries.d.ts +13 -0
- package/dist/commands/context/boundaries.d.ts.map +1 -0
- package/dist/commands/context/boundaries.js +45 -0
- package/dist/commands/context/boundaries.js.map +1 -0
- package/dist/commands/context/compact.d.ts +13 -0
- package/dist/commands/context/compact.d.ts.map +1 -0
- package/dist/commands/context/compact.js +41 -0
- package/dist/commands/context/compact.js.map +1 -0
- package/dist/commands/context/savings.d.ts +13 -0
- package/dist/commands/context/savings.d.ts.map +1 -0
- package/dist/commands/context/savings.js +49 -0
- package/dist/commands/context/savings.js.map +1 -0
- package/dist/commands/context/status.d.ts +13 -0
- package/dist/commands/context/status.d.ts.map +1 -0
- package/dist/commands/context/status.js +52 -0
- package/dist/commands/context/status.js.map +1 -0
- package/dist/commands/context/strategy.d.ts +13 -0
- package/dist/commands/context/strategy.d.ts.map +1 -0
- package/dist/commands/context/strategy.js +66 -0
- package/dist/commands/context/strategy.js.map +1 -0
- package/dist/commands/mcp/disable.d.ts +5 -0
- package/dist/commands/mcp/disable.d.ts.map +1 -0
- package/dist/commands/mcp/disable.js +26 -0
- package/dist/commands/mcp/disable.js.map +1 -0
- package/dist/commands/mcp/edit.d.ts +9 -0
- package/dist/commands/mcp/edit.d.ts.map +1 -0
- package/dist/commands/mcp/edit.js +62 -0
- package/dist/commands/mcp/edit.js.map +1 -0
- package/dist/commands/mcp/enable.d.ts +5 -0
- package/dist/commands/mcp/enable.d.ts.map +1 -0
- package/dist/commands/mcp/enable.js +27 -0
- package/dist/commands/mcp/enable.js.map +1 -0
- package/dist/commands/mcp/init.d.ts +9 -0
- package/dist/commands/mcp/init.d.ts.map +1 -0
- package/dist/commands/mcp/init.js +97 -0
- package/dist/commands/mcp/init.js.map +1 -0
- package/dist/commands/mcp/list.d.ts +6 -0
- package/dist/commands/mcp/list.d.ts.map +1 -0
- package/dist/commands/mcp/list.js +56 -0
- package/dist/commands/mcp/list.js.map +1 -0
- package/dist/commands/mcp/server.d.ts +6 -0
- package/dist/commands/mcp/server.d.ts.map +1 -0
- package/dist/commands/mcp/server.js +44 -0
- package/dist/commands/mcp/server.js.map +1 -0
- package/dist/commands/mcp/status.d.ts +6 -0
- package/dist/commands/mcp/status.d.ts.map +1 -0
- package/dist/commands/mcp/status.js +43 -0
- package/dist/commands/mcp/status.js.map +1 -0
- package/dist/commands/mcp/tools.d.ts +7 -0
- package/dist/commands/mcp/tools.d.ts.map +1 -0
- package/dist/commands/mcp/tools.js +82 -0
- package/dist/commands/mcp/tools.js.map +1 -0
- package/dist/commands/mcp/validate.d.ts +8 -0
- package/dist/commands/mcp/validate.d.ts.map +1 -0
- package/dist/commands/mcp/validate.js +121 -0
- package/dist/commands/mcp/validate.js.map +1 -0
- package/dist/commands/middleware/config.d.ts +13 -0
- package/dist/commands/middleware/config.d.ts.map +1 -0
- package/dist/commands/middleware/config.js +87 -0
- package/dist/commands/middleware/config.js.map +1 -0
- package/dist/commands/middleware/disable.d.ts +13 -0
- package/dist/commands/middleware/disable.d.ts.map +1 -0
- package/dist/commands/middleware/disable.js +50 -0
- package/dist/commands/middleware/disable.js.map +1 -0
- package/dist/commands/middleware/enable.d.ts +13 -0
- package/dist/commands/middleware/enable.d.ts.map +1 -0
- package/dist/commands/middleware/enable.js +50 -0
- package/dist/commands/middleware/enable.js.map +1 -0
- package/dist/commands/middleware/list.d.ts +13 -0
- package/dist/commands/middleware/list.d.ts.map +1 -0
- package/dist/commands/middleware/list.js +64 -0
- package/dist/commands/middleware/list.js.map +1 -0
- package/dist/commands/middleware/status.d.ts +13 -0
- package/dist/commands/middleware/status.d.ts.map +1 -0
- package/dist/commands/middleware/status.js +80 -0
- package/dist/commands/middleware/status.js.map +1 -0
- package/dist/commands/models/compare.d.ts +9 -0
- package/dist/commands/models/compare.d.ts.map +1 -0
- package/dist/commands/models/compare.js +76 -0
- package/dist/commands/models/compare.js.map +1 -0
- package/dist/commands/models/cost.d.ts +9 -0
- package/dist/commands/models/cost.d.ts.map +1 -0
- package/dist/commands/models/cost.js +64 -0
- package/dist/commands/models/cost.js.map +1 -0
- package/dist/commands/models/info.d.ts +9 -0
- package/dist/commands/models/info.d.ts.map +1 -0
- package/dist/commands/models/info.js +61 -0
- package/dist/commands/models/info.js.map +1 -0
- package/dist/commands/models/list.d.ts +6 -0
- package/dist/commands/models/list.d.ts.map +1 -0
- package/dist/commands/models/list.js +66 -0
- package/dist/commands/models/list.js.map +1 -0
- package/dist/commands/models/providers.d.ts +13 -0
- package/dist/commands/models/providers.d.ts.map +1 -0
- package/dist/commands/models/providers.js +45 -0
- package/dist/commands/models/providers.js.map +1 -0
- package/dist/commands/models/search.d.ts +10 -0
- package/dist/commands/models/search.d.ts.map +1 -0
- package/dist/commands/models/search.js +56 -0
- package/dist/commands/models/search.js.map +1 -0
- package/dist/commands/models/switch.d.ts +14 -0
- package/dist/commands/models/switch.d.ts.map +1 -0
- package/dist/commands/models/switch.js +67 -0
- package/dist/commands/models/switch.js.map +1 -0
- package/dist/commands/permissions/auto-approve.d.ts +13 -0
- package/dist/commands/permissions/auto-approve.d.ts.map +1 -0
- package/dist/commands/permissions/auto-approve.js +53 -0
- package/dist/commands/permissions/auto-approve.js.map +1 -0
- package/dist/commands/permissions/grant.d.ts +13 -0
- package/dist/commands/permissions/grant.d.ts.map +1 -0
- package/dist/commands/permissions/grant.js +46 -0
- package/dist/commands/permissions/grant.js.map +1 -0
- package/dist/commands/permissions/mode.d.ts +12 -0
- package/dist/commands/permissions/mode.d.ts.map +1 -0
- package/dist/commands/permissions/mode.js +61 -0
- package/dist/commands/permissions/mode.js.map +1 -0
- package/dist/commands/permissions/policies.d.ts +13 -0
- package/dist/commands/permissions/policies.d.ts.map +1 -0
- package/dist/commands/permissions/policies.js +47 -0
- package/dist/commands/permissions/policies.js.map +1 -0
- package/dist/commands/permissions/revoke.d.ts +13 -0
- package/dist/commands/permissions/revoke.d.ts.map +1 -0
- package/dist/commands/permissions/revoke.js +46 -0
- package/dist/commands/permissions/revoke.js.map +1 -0
- package/dist/commands/permissions/set.d.ts +13 -0
- package/dist/commands/permissions/set.d.ts.map +1 -0
- package/dist/commands/permissions/set.js +57 -0
- package/dist/commands/permissions/set.js.map +1 -0
- package/dist/commands/permissions/tools.d.ts +13 -0
- package/dist/commands/permissions/tools.d.ts.map +1 -0
- package/dist/commands/permissions/tools.js +50 -0
- package/dist/commands/permissions/tools.js.map +1 -0
- package/dist/commands/server/start.d.ts +11 -0
- package/dist/commands/server/start.d.ts.map +1 -0
- package/dist/commands/server/start.js +58 -0
- package/dist/commands/server/start.js.map +1 -0
- package/dist/commands/session/checkpoints.d.ts +6 -0
- package/dist/commands/session/checkpoints.d.ts.map +1 -0
- package/dist/commands/session/checkpoints.js +41 -0
- package/dist/commands/session/checkpoints.js.map +1 -0
- package/dist/commands/session/compact.d.ts +13 -0
- package/dist/commands/session/compact.d.ts.map +1 -0
- package/dist/commands/session/compact.js +56 -0
- package/dist/commands/session/compact.js.map +1 -0
- package/dist/commands/session/export.d.ts +6 -0
- package/dist/commands/session/export.d.ts.map +1 -0
- package/dist/commands/session/export.js +31 -0
- package/dist/commands/session/export.js.map +1 -0
- package/dist/commands/session/list.d.ts +7 -0
- package/dist/commands/session/list.d.ts.map +1 -0
- package/dist/commands/session/list.js +63 -0
- package/dist/commands/session/list.js.map +1 -0
- package/dist/commands/session/new.d.ts +8 -0
- package/dist/commands/session/new.d.ts.map +1 -0
- package/dist/commands/session/new.js +23 -0
- package/dist/commands/session/new.js.map +1 -0
- package/dist/commands/session/resume.d.ts +6 -0
- package/dist/commands/session/resume.d.ts.map +1 -0
- package/dist/commands/session/resume.js +32 -0
- package/dist/commands/session/resume.js.map +1 -0
- package/dist/commands/session/search.d.ts +10 -0
- package/dist/commands/session/search.d.ts.map +1 -0
- package/dist/commands/session/search.js +65 -0
- package/dist/commands/session/search.js.map +1 -0
- package/dist/commands/session/stats.d.ts +6 -0
- package/dist/commands/session/stats.d.ts.map +1 -0
- package/dist/commands/session/stats.js +58 -0
- package/dist/commands/session/stats.js.map +1 -0
- package/dist/commands/session/view.d.ts +6 -0
- package/dist/commands/session/view.d.ts.map +1 -0
- package/dist/commands/session/view.js +65 -0
- package/dist/commands/session/view.js.map +1 -0
- package/dist/commands/slash/CommandPalette.d.ts +60 -0
- package/dist/commands/slash/CommandPalette.d.ts.map +1 -0
- package/dist/commands/slash/CommandPalette.js +351 -0
- package/dist/commands/slash/CommandPalette.js.map +1 -0
- package/dist/commands/slash/SlashCommandParser.d.ts +11 -0
- package/dist/commands/slash/SlashCommandParser.d.ts.map +1 -0
- package/dist/commands/slash/SlashCommandParser.js +11 -0
- package/dist/commands/slash/SlashCommandParser.js.map +1 -0
- package/dist/commands/slash/SlashCommandRegistry.d.ts +11 -0
- package/dist/commands/slash/SlashCommandRegistry.d.ts.map +1 -0
- package/dist/commands/slash/SlashCommandRegistry.js +11 -0
- package/dist/commands/slash/SlashCommandRegistry.js.map +1 -0
- package/dist/commands/slash/index.d.ts +11 -0
- package/dist/commands/slash/index.d.ts.map +1 -0
- package/dist/commands/slash/index.js +13 -0
- package/dist/commands/slash/index.js.map +1 -0
- package/dist/commands/system-messages/list.d.ts +13 -0
- package/dist/commands/system-messages/list.d.ts.map +1 -0
- package/dist/commands/system-messages/list.js +54 -0
- package/dist/commands/system-messages/list.js.map +1 -0
- package/dist/commands/system-messages/reload.d.ts +13 -0
- package/dist/commands/system-messages/reload.d.ts.map +1 -0
- package/dist/commands/system-messages/reload.js +36 -0
- package/dist/commands/system-messages/reload.js.map +1 -0
- package/dist/commands/system-messages/view.d.ts +13 -0
- package/dist/commands/system-messages/view.d.ts.map +1 -0
- package/dist/commands/system-messages/view.js +52 -0
- package/dist/commands/system-messages/view.js.map +1 -0
- package/dist/commands/tmux/list.d.ts +13 -0
- package/dist/commands/tmux/list.d.ts.map +1 -0
- package/dist/commands/tmux/list.js +68 -0
- package/dist/commands/tmux/list.js.map +1 -0
- package/dist/commands/tools/info.d.ts +13 -0
- package/dist/commands/tools/info.d.ts.map +1 -0
- package/dist/commands/tools/info.js +82 -0
- package/dist/commands/tools/info.js.map +1 -0
- package/dist/commands/tools/list.d.ts +14 -0
- package/dist/commands/tools/list.d.ts.map +1 -0
- package/dist/commands/tools/list.js +67 -0
- package/dist/commands/tools/list.js.map +1 -0
- package/dist/config/ConfigManager.d.ts +40 -0
- package/dist/config/ConfigManager.d.ts.map +1 -0
- package/dist/config/ConfigManager.js +162 -0
- package/dist/config/ConfigManager.js.map +1 -0
- package/dist/config/extension.d.ts +12 -0
- package/dist/config/extension.d.ts.map +1 -0
- package/dist/config/extension.js +5 -0
- package/dist/config/extension.js.map +1 -0
- package/dist/config/settings.d.ts +42 -0
- package/dist/config/settings.d.ts.map +1 -0
- package/dist/config/settings.js +32 -0
- package/dist/config/settings.js.map +1 -0
- package/dist/index.d.ts +3 -0
- package/dist/index.d.ts.map +1 -0
- package/dist/index.js +883 -0
- package/dist/index.js.map +1 -0
- package/dist/orchestrator/OrchestratorClient.d.ts +385 -0
- package/dist/orchestrator/OrchestratorClient.d.ts.map +1 -0
- package/dist/orchestrator/OrchestratorClient.js +1195 -0
- package/dist/orchestrator/OrchestratorClient.js.map +1 -0
- package/dist/themes/DefaultTheme.d.ts +9 -0
- package/dist/themes/DefaultTheme.d.ts.map +1 -0
- package/dist/themes/DefaultTheme.js +29 -0
- package/dist/themes/DefaultTheme.js.map +1 -0
- package/dist/themes/MinimalTheme.d.ts +9 -0
- package/dist/themes/MinimalTheme.d.ts.map +1 -0
- package/dist/themes/MinimalTheme.js +29 -0
- package/dist/themes/MinimalTheme.js.map +1 -0
- package/dist/themes/Theme.interface.d.ts +36 -0
- package/dist/themes/Theme.interface.d.ts.map +1 -0
- package/dist/themes/Theme.interface.js +5 -0
- package/dist/themes/Theme.interface.js.map +1 -0
- package/dist/themes/ThemeManager.d.ts +63 -0
- package/dist/themes/ThemeManager.d.ts.map +1 -0
- package/dist/themes/ThemeManager.js +257 -0
- package/dist/themes/ThemeManager.js.map +1 -0
- package/dist/themes/colors.d.ts +108 -0
- package/dist/themes/colors.d.ts.map +1 -0
- package/dist/themes/colors.js +284 -0
- package/dist/themes/colors.js.map +1 -0
- package/dist/themes/createTheme.d.ts +40 -0
- package/dist/themes/createTheme.d.ts.map +1 -0
- package/dist/themes/createTheme.js +114 -0
- package/dist/themes/createTheme.js.map +1 -0
- package/dist/themes/themeDefinitions.d.ts +27 -0
- package/dist/themes/themeDefinitions.d.ts.map +1 -0
- package/dist/themes/themeDefinitions.js +244 -0
- package/dist/themes/themeDefinitions.js.map +1 -0
- package/dist/utils/CodeDiffRenderer.d.ts +124 -0
- package/dist/utils/CodeDiffRenderer.d.ts.map +1 -0
- package/dist/utils/CodeDiffRenderer.js +257 -0
- package/dist/utils/CodeDiffRenderer.js.map +1 -0
- package/dist/utils/MarkdownRenderer.d.ts +74 -0
- package/dist/utils/MarkdownRenderer.d.ts.map +1 -0
- package/dist/utils/MarkdownRenderer.js +260 -0
- package/dist/utils/MarkdownRenderer.js.map +1 -0
- package/dist/utils/MessageRenderer.d.ts +200 -0
- package/dist/utils/MessageRenderer.d.ts.map +1 -0
- package/dist/utils/MessageRenderer.js +283 -0
- package/dist/utils/MessageRenderer.js.map +1 -0
- package/dist/utils/ToolFormatter.d.ts +103 -0
- package/dist/utils/ToolFormatter.d.ts.map +1 -0
- package/dist/utils/ToolFormatter.js +357 -0
- package/dist/utils/ToolFormatter.js.map +1 -0
- package/dist/utils/boxDrawing.d.ts +23 -0
- package/dist/utils/boxDrawing.d.ts.map +1 -0
- package/dist/utils/boxDrawing.js +78 -0
- package/dist/utils/boxDrawing.js.map +1 -0
- package/dist/utils/checks.d.ts +9 -0
- package/dist/utils/checks.d.ts.map +1 -0
- package/dist/utils/checks.js +11 -0
- package/dist/utils/checks.js.map +1 -0
- package/dist/utils/events.d.ts +24 -0
- package/dist/utils/events.d.ts.map +1 -0
- package/dist/utils/events.js +17 -0
- package/dist/utils/events.js.map +1 -0
- package/dist/utils/formatters.d.ts +255 -0
- package/dist/utils/formatters.d.ts.map +1 -0
- package/dist/utils/formatters.js +361 -0
- package/dist/utils/formatters.js.map +1 -0
- package/dist/utils/math.d.ts +11 -0
- package/dist/utils/math.d.ts.map +1 -0
- package/dist/utils/math.js +13 -0
- package/dist/utils/math.js.map +1 -0
- package/package.json +82 -0
@@ -0,0 +1,77 @@
+---
+name: autoresearch
+description: The playbook for an agent tasked with acting as the AUTO-RESEARCH PM — investigate, write a measurable experiment plan, delegate to N varied subagents (or relay to the cortex harness), and arbitrate the holdout-verified winner. You orchestrate; you do NOT run the experiments yourself. Plan-gated: no measurable metric, no launch. Pairs with cortex-bench (the benchmarking methodology) and the in-harness autoresearch-agent profile (the worker).
+triggers:
+- act as autoresearch pm
+- autoresearch pm
+- run an autoresearch experiment
+- auto-research campaign
+- run experiments at scale
+- improve X until
+- self-improvement experiment
+- delegate to autoresearch agents
+- experiment plan
+- recursive improvement campaign
+---
+
+# Auto-Research — your role is PM
+
+**Loading this skill puts you in the auto-research PM role.** From now on, for this effort, you orchestrate — you do NOT run the experiments yourself. The target is to improve something until a metric clears a bar (or to audit / self-improve a harness). Your four jobs: **investigate → plan → delegate → arbitrate.** The tool surface + the experiment-running live in the *subagents* (or the cortex harness); keep your own context on the plan and the verdicts.
+
+> **The one rule that prevents the classic failure:** NO MEASURABLE METRIC → DO NOT LAUNCH. A live run once spawned 5 agents on a vague, unmeasurable deficiency; they explored for 5 minutes and produced nothing (0 fixes, 2 timeouts). Auto-research *requires* a base-vs-candidate measurement. If you can't define one, say what's missing (an eval / repro / task-set) and stop.
+
+## 1. PLAN FIRST (the gate)
+Before delegating anything, produce a concrete **experiment plan**. The cortex harness *enforces* this — it blocks the launch until you have:
+- **Interactive (a human is present):** draft the plan in **plan mode** (EnterPlanMode → ExitPlanMode) and get it approved.
+- **Headless (no human):** create a **TodoCreate** planning checklist.
+
+The plan must define:
+1. **Target & metric** — what you're improving, and *how it's measured* (an eval/command that prints a number, or a verifier task-set).
+2. **Pass/fail criterion** — the threshold + the verifier (`exact`/`regex`/`contains`/`numeric`/`llm-judge`).
+3. **Control** — base ref vs candidate, **train + a held-out set** (the held-out set is non-negotiable — see §4).
+4. **Per-subagent variation** — see §2.
+5. **Continue/fail rules** — a turn budget; fail-fast if not measurable; never self-merge.
+
+Do real **base investigation first** — read the project, the backlog (`ResearchBacklog next`/`list`), the existing benchmarks. Triage the *single* highest-value item; don't boil the ocean.
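The five plan items in §1 can be read as a launch precondition. The sketch below is only an illustration of that reading: `ExperimentPlan`, `missing_items`, `may_launch`, and every field name are hypothetical and do not reflect the harness's actual plan schema or gate.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Verifier kinds listed in the plan checklist.
VERIFIERS = {"exact", "regex", "contains", "numeric", "llm-judge"}


@dataclass
class ExperimentPlan:
    """Hypothetical plan record; field names are illustrative only."""

    target: str = ""
    metric_command: str = ""           # eval/command that prints a number
    threshold: Optional[float] = None  # pass/fail bar
    verifier: str = ""
    base_ref: str = ""
    train_tasks: List[str] = field(default_factory=list)
    holdout_tasks: List[str] = field(default_factory=list)
    arms: List[Dict] = field(default_factory=list)  # one dict per subagent
    turn_budget: int = 0


def missing_items(plan: ExperimentPlan) -> List[str]:
    """Return the checklist items the plan does not yet satisfy."""
    missing = []
    if not plan.target or not plan.metric_command:
        missing.append("target & metric")
    if plan.threshold is None or plan.verifier not in VERIFIERS:
        missing.append("pass/fail criterion")
    if not plan.base_ref or not plan.train_tasks or not plan.holdout_tasks:
        missing.append("control")
    elif set(plan.train_tasks) & set(plan.holdout_tasks):
        # Held-out tasks must never appear in the train set.
        missing.append("control")
    distinct = {
        (a.get("strategy"), a.get("model"), a.get("temperature"))
        for a in plan.arms
    }
    if not plan.arms or len(distinct) < len(plan.arms):
        missing.append("per-subagent variation")
    if plan.turn_budget <= 0:
        missing.append("continue/fail rules")
    return missing


def may_launch(plan: ExperimentPlan) -> bool:
    """No measurable metric (or any other gap) means no launch."""
    return not missing_items(plan)
```

Returning the list of gaps rather than a bare boolean matches the instruction to "say what's missing" before stopping.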
+
+## 2. DIVERSIFY the arms (the whole point of N)
+Identical agents on identical prompts waste the parallelism — they trace the same path. Assign each subagent a **distinct strategy/persona**, and vary the levers the dispatch supports:
+- **Strategy/persona** (per Task dispatch — `strategy` label, also in the prompt): use the **arm persona library** in this skill's `personas/` directory (`precise`, `aggressive-refactor`, `root-cause`, `test-first`, `security-auditor`, `perf-hunter`, `creative`, `skeptic-reviewer` — see `personas/README.md`). Embed the chosen persona body in the arm's prompt after the shared plan, and pass its filename as the label (`strategy: "precise"`) so the result is recorded under it.
+- **Model** (per Task dispatch — `model` override): different models genuinely decorrelate. Honor cost/no-xAI constraints.
+- **Temperature** (per Task dispatch — `temperature`): read the model card's valid range first (e.g. DeepSeek 0–2, Anthropic 0–1) and step by tenths across arms — auto-clamped to the model's range.
+
+**Effectiveness learns over time.** Each arm's benchmark is recorded per **(model × temperature × strategy)** in the router matrix, so the harness builds a track record of which *variations* — not just which models — win a given task. When you plan the next round, reuse the strongest known arm and spend the remaining arms exploring new variety (the matrix's `recommendStrategy` surfaces the leader). This is the cortex-bench loop applied to strategies.
+
+**Diversify the SEARCH; keep the EVALUATION identical** — every arm is judged by the *same* metric + the *same* gate (one shared judge). Letting arms pick their own metric is reward-hacking. Keep N small with **sharp** distinctions (4–5 genuinely different approaches beat many near-duplicates) — N arms ≈ N× the spend, so buy breadth, not duplicates.
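As a rough illustration of the "step by tenths, clamped to the model's range" advice in §2, the sketch below assigns one persona per arm and walks the temperature upward. `build_arms`, the dict shape, and the `model-a` name are hypothetical; only the `strategy`, `model`, and `temperature` lever names and the persona filenames come from the text above.

```python
def clamp(value, low, high):
    """Keep value inside the closed interval [low, high]."""
    return max(low, min(high, value))


def build_arms(personas, model, temp_range, start=0.2, step=0.1):
    """Give each persona its own arm, stepping temperature by `step`.

    temp_range is the (low, high) range read from the model card,
    for example (0.0, 1.0) or (0.0, 2.0).
    """
    low, high = temp_range
    arms = []
    for index, persona in enumerate(personas):
        temperature = round(clamp(start + index * step, low, high), 1)
        arms.append(
            {"strategy": persona, "model": model, "temperature": temperature}
        )
    return arms


arms = build_arms(["precise", "root-cause", "creative"], "model-a", (0.0, 1.0), start=0.9)
for arm in arms:
    print(arm)
```

With a 0 to 1 range and a start of 0.9, the third arm would step to 1.1 and is clamped back to 1.0, so two arms end up sharing a temperature. In that situation the persona, or a different model, has to carry the distinction.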
+
+## 3. DELEGATE (pick the execution path by how you're accessed)
+- **Local cortex harness** (you're driving cortex, or inside it): set `AUTORESEARCH_AGENTS=native` and delegate via the **Task tool** (`subagent_type: autoresearch-agent`), one per strategy, each prompt = the plan + that arm's persona/strategy + `EXECUTION MODE: native`. Or drive the CLI directly: `cortex autoresearch fix` / `experiment` / `loop`.
+- **Hosted at scale** (external agent): a hosted auto-research MCP is planned (ships after the npm release). When one is configured, relay the plan to its tools — `EXECUTION MODE: mcp`.
+
+The agents EXPLORE; they do not merge. They each return a candidate + its verdict.
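The prompt recipe in §3 (shared plan, then the arm's persona, then the execution-mode line) can be sketched as plain string assembly. `build_arm_prompt` is a hypothetical helper, and the blank-line separator is an assumption; the text above only fixes the order of the three parts and the `EXECUTION MODE:` marker.

```python
EXECUTION_MODES = {"native", "mcp"}


def build_arm_prompt(plan, persona_body, execution_mode="native"):
    """Compose one arm's prompt: plan first, persona second, mode last."""
    if execution_mode not in EXECUTION_MODES:
        raise ValueError(f"unknown execution mode: {execution_mode!r}")
    parts = [
        plan.strip(),
        persona_body.strip(),
        f"EXECUTION MODE: {execution_mode}",
    ]
    return "\n\n".join(parts)


prompt = build_arm_prompt(
    "PLAN: raise the pass rate on the held-out task set.",
    "PERSONA: precise. Make the smallest change that moves the metric.",
)
print(prompt)
```

Because every arm receives the same plan text, the metric and the gate stay identical across arms and only the persona section varies.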
+
+## 4. ARBITRATE (you, centrally — never the arms)
+Collect every candidate + verdict and keep **only the holdout-verified winner**:
+- **fixed ≠ verified.** A candidate that only passes the task that surfaced the deficiency is `fixed`. It is `verified` ONLY after a **held-out** set it was never tuned against confirms it.
+- **N-aware significance.** With N parallel arms some clear the bar by chance — the gate's family-wise-error (FWER) correction handles this; apply it across *all* arms (including the discarded ones). A single arm "winning" is not enough on its own.
+- **You arbitrate; the arms don't self-merge.** This central single-judge step is what makes aggressive diversity safe.
+- The gate is deterministic code (`cortex autoresearch evaluate` / `AutoResearchGate`) — never an LLM deciding significance.
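To make the N-aware significance point in §4 concrete, here is a minimal sketch that uses the Bonferroni correction. This is an assumption: the text does not say which FWER method `AutoResearchGate` applies. The `bonferroni_survivors` name and the per-arm p-values are likewise hypothetical.

```python
def bonferroni_survivors(p_values, alpha=0.05):
    """Return the arms whose p-value clears alpha divided by the arm count.

    p_values maps arm label -> p-value and must include every arm that ran,
    discarded ones too, so the divisor reflects the real number of attempts.
    """
    arm_count = len(p_values)
    if arm_count == 0:
        return []
    per_arm_alpha = alpha / arm_count
    return sorted(arm for arm, p in p_values.items() if p <= per_arm_alpha)


results = {
    "precise": 0.004,
    "creative": 0.03,
    "root-cause": 0.20,
    "perf-hunter": 0.60,
    "test-first": 0.90,
}
print(bonferroni_survivors(results))
```

With five arms the per-arm threshold drops from 0.05 to 0.01, so the `creative` arm at 0.03 would have looked like a win in isolation but does not survive the correction.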
+
+## 5. Discipline (the overfitting guards — load-bearing)
+- **Human owns the metric.** You (or the operator) define success; the agents optimize against it. An agent that chooses its own metric games the eval.
+- **Train decides, holdout verifies.** "Until it meets the criteria" must mean *on data it never trained against* — for time series, a genuine **walk-forward** split, not a random one. Risk-adjusted metrics (Sharpe, cost-aware) over raw return, or the agents "win" by adding hidden risk.
+- **Fail-fast.** If a deficiency isn't measurable, report it and stop — don't let agents explore indefinitely.
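The walk-forward split mentioned in §5 differs from a random split in that every test window lies strictly after its train window. The sketch below is one simple rolling-window variant, with hypothetical names and window sizes; it is not taken from the package.

```python
def walk_forward_splits(n_samples, train_size, test_size):
    """Yield (train_indices, test_indices) pairs in time order.

    Each test window starts right after its train window, and the whole
    window then rolls forward by test_size samples.
    """
    start = 0
    while start + train_size + test_size <= n_samples:
        train_end = start + train_size
        train = list(range(start, train_end))
        test = list(range(train_end, train_end + test_size))
        yield train, test
        start += test_size


for train, test in walk_forward_splits(10, 4, 2):
    print(train, test)
```

No index in a test window ever precedes an index in its own train window, which is the property a shuffled split cannot guarantee for time series.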
+
+## 6. Improve over time (ties to cortex-bench)
+This is the benchmarking loop from `cortex-bench`, applied recursively: every experiment writes scored records (`router-matrix.jsonl`) and deficiencies (`research-backlog.jsonl`). The effectiveness layer is BUILT: every record carries its **(model × temperature × strategy)** arm, and the matrix ranks them per task (`getStrategyScores` / `recommendStrategy`; the bench/experiment/loop CLIs take `--temperature`/`--strategy`, and Task-dispatched arms are stamped automatically via `CORTEX_SUBAGENT_TEMPERATURE`/`CORTEX_ARM_STRATEGY`). When planning a round, reuse the strongest known arm and spend the remaining arms on new variety — benchmark results → improve output over time.
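The per-task ranking described in §6 amounts to grouping scored records by their (model, temperature, strategy) arm. The sketch below assumes a JSONL record with `task`, `model`, `temperature`, `strategy`, and `score` keys. That schema is hypothetical; the real `router-matrix.jsonl` layout is not shown in this diff. The helper names are deliberately different from `getStrategyScores` / `recommendStrategy`, which this does not claim to reproduce.

```python
import json
from collections import defaultdict


def mean_scores_by_arm(jsonl_text, task):
    """Average the score of every arm that has records for `task`."""
    totals = defaultdict(lambda: [0.0, 0])  # arm -> [score sum, record count]
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        if record["task"] != task:
            continue
        arm = (record["model"], record["temperature"], record["strategy"])
        totals[arm][0] += record["score"]
        totals[arm][1] += 1
    return {arm: total / count for arm, (total, count) in totals.items()}


def strongest_arm(jsonl_text, task):
    """Return the arm with the highest mean score, or None if no records."""
    scores = mean_scores_by_arm(jsonl_text, task)
    return max(scores, key=scores.get) if scores else None
```

A planner following the text would reuse the arm returned by `strongest_arm` for one slot and fill the remaining slots with untried combinations.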
|
|
67
|
+
|
|
68
|
+
## 7. Composing skills inside the pipeline
|
|
69
|
+
- **`best-of-n` is the per-deficiency tactic.** When one deficiency deserves multiple competing fixes, each arm IS a tournament entrant: same plan, same metric, distinct strategy/model/temperature per arm, one central judge. Use its worktree + frozen-criteria discipline for any high-value single task, even outside a formal experiment.
|
|
70
|
+
- **`verify-work` is the arbitration discipline.** Before accepting any arm's "fixed" claim, apply its refute-don't-confirm checklist: independent verifier, evidence-based per-claim verdicts, fixed ≠ verified. The holdout gate is the statistical form; verify-work is the structural form — use both.
|
|
71
|
+
- **The document skills (`docx`/`xlsx`/`pptx`/`pdf-documents`) make excellent bench-task surfaces**: file deliverables are independently verifiable (re-open + assert contents), which is exactly what a graded task needs — real work, deterministic check.
|
|
72
|
+
|
|
73
|
+
## See also
|
|
74
|
+
- `cortex-bench` — the multi-model benchmark methodology + the deficiency-ledger discipline.
|
|
75
|
+
- `best-of-n` — the parallel tournament pattern (per-task form of the arms doctrine).
|
|
76
|
+
- `verify-work` — the adversarial verification subagent (structural form of the holdout gate).
|
|
77
|
+
- In-harness: `AUTORESEARCH_AGENTS` (off|native|mcp), the `autoresearch-agent` profile (the worker), `cortex autoresearch fix/experiment/loop/bench/evaluate`.
|
|
@@ -0,0 +1,45 @@
|
|
|
1
|
+
# Arm Persona Library
|
|
2
|
+
|
|
3
|
+
Named personas for auto-research arms (and any `best-of-n` tournament). Each file
|
|
4
|
+
is a self-contained persona the PM embeds into an arm's Task prompt — and each
|
|
5
|
+
persona's filename IS its **`strategy` label**: pass it on the dispatch
|
|
6
|
+
(`strategy: "precise"`, plus the suggested `temperature`) so every benchmark the
|
|
7
|
+
arm produces is recorded under that arm in the effectiveness matrix
|
|
8
|
+
(`getStrategyScores` / `recommendStrategy` then learn which personas win which
|
|
9
|
+
task types over time).
|
|
10
|
+
|
|
11
|
+
## How the PM uses this library
|
|
12
|
+
|
|
13
|
+
1. **Pick a diverse set** (3–5) whose angles genuinely differ for THIS deficiency —
|
|
14
|
+
sharp distinctions beat near-duplicates. `recommendStrategy(taskType)` tells you
|
|
15
|
+
the strongest proven arm; give it one slot and spend the rest on variety.
|
|
16
|
+
2. **Embed the persona body** in that arm's prompt, after the shared plan
|
|
17
|
+
(metric, acceptance check, continue/fail rules — identical across arms).
|
|
18
|
+
3. **Dispatch with the matching labels**: `strategy` = the persona filename,
|
|
19
|
+
`temperature` from the persona's suggested range (auto-clamped per model),
|
|
20
|
+
`model` varied across arms for real decorrelation.
|
|
21
|
+
4. Every persona obeys the shared contract below — personas change the SEARCH,
|
|
22
|
+
never the EVALUATION.
|
|
23
|
+
|
|
24
|
+
## The shared contract (all personas)
|
|
25
|
+
|
|
26
|
+
- The metric, acceptance check, and judge are FROZEN by the plan — a persona may
|
|
27
|
+
not reinterpret or relax them.
|
|
28
|
+
- Work in YOUR assigned worktree only; commit your candidate; never merge.
|
|
29
|
+
- Report honestly in the structured format your persona defines — a failed
|
|
30
|
+
attempt reported clearly is more valuable than a fake success.
|
|
31
|
+
- Fail fast: if your angle is exhausted or the target is unmeasurable from your
|
|
32
|
+
side, say so and stop within budget.
|
|
33
|
+
|
|
34
|
+
## The personas
|
|
35
|
+
|
|
36
|
+
| File / `strategy` | Angle | Suggested temperature |
|---|---|---|
| `precise` | Minimal, conservative, smallest-possible diff | 0.0–0.3 |
| `aggressive-refactor` | Fix the structure, not the symptom | 0.5–0.8 |
| `root-cause` | Diagnose fully before touching anything | 0.2–0.5 |
| `test-first` | Encode the bug as a failing test, then fix to green | 0.2–0.5 |
| `security-auditor` | Adversarial audit: findings, not fixes | 0.2–0.4 |
| `perf-hunter` | Measure first, optimize the proven hotspot only | 0.3–0.6 |
| `creative` | Divergent: question the framing, try the unconventional | 0.9–1.3 |
| `skeptic-reviewer` | Verification arm: refute the others' candidates | 0.1–0.3 |
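
The dispatch step can be pictured as below. This is a sketch only: the temperature ranges are copied from the table, but `plan_dispatch`, the choice of the range midpoint, and the per-model range argument are assumptions. The README says clamping happens automatically, so a PM would not normally write this by hand.

```python
# Suggested temperature range per persona, as listed in the table.
PERSONA_TEMPS = {
    "precise": (0.0, 0.3),
    "aggressive-refactor": (0.5, 0.8),
    "root-cause": (0.2, 0.5),
    "test-first": (0.2, 0.5),
    "security-auditor": (0.2, 0.4),
    "perf-hunter": (0.3, 0.6),
    "creative": (0.9, 1.3),
    "skeptic-reviewer": (0.1, 0.3),
}


def plan_dispatch(strategy, model, model_temp_range):
    """Build the labels for one arm, clamping temperature to the model's range.

    Using the midpoint of the suggested range is an arbitrary choice here.
    """
    low, high = PERSONA_TEMPS[strategy]
    suggested = (low + high) / 2
    model_low, model_high = model_temp_range
    temperature = min(max(suggested, model_low), model_high)
    return {"strategy": strategy, "model": model, "temperature": temperature}


# A hypothetical model that only accepts temperatures in [0.0, 1.0].
creative_arm = plan_dispatch("creative", "model-a", (0.0, 1.0))
precise_arm = plan_dispatch("precise", "model-b", (0.0, 2.0))
```

Here the `creative` persona's suggested midpoint exceeds the hypothetical model's ceiling, so the arm is dispatched at the ceiling instead.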
|
|
@@ -0,0 +1,25 @@
|
|
|
1
|
+
# Persona: aggressive-refactor — fix the structure, not the symptom
|
|
2
|
+
|
|
3
|
+
**Mission.** Treat the deficiency as evidence of a structural problem and fix the
|
|
4
|
+
structure. Your candidate wins when the metric clears AND the code is simpler
|
|
5
|
+
than before — a fix that *deletes* complexity and still holds is a top-tier result.
|
|
6
|
+
|
|
7
|
+
**Process.**
|
|
8
|
+
1. Map the structure around the failure: who calls this, what state it owns, where
|
|
9
|
+
the responsibility boundaries actually are vs. where they should be.
|
|
10
|
+
2. Form a thesis: "this class of bug exists because X is shaped wrong." Write it
|
|
11
|
+
down in one sentence before editing.
|
|
12
|
+
3. Reshape X — extract/merge/invert as the thesis demands — then make the original
|
|
13
|
+
failure impossible by construction, not patched around.
|
|
14
|
+
4. Run the FULL test suite, not just the acceptance check: a refactor's risk is
|
|
15
|
+
collateral breakage, and an honest report of it is part of your job.
|
|
16
|
+
|
|
17
|
+
**Output contract.** Report: (a) the thesis sentence, (b) the diff stat and net
|
|
18
|
+
lines added/removed, (c) acceptance check + full-suite results verbatim,
|
|
19
|
+
(d) every behavior you knowingly changed beyond the target.
|
|
20
|
+
|
|
21
|
+
**Rules.** Refactor ≠ rewrite: keep the blast radius proportional to the thesis.
|
|
22
|
+
If the structure turns out to be sound and the bug is local, say so and stop —
|
|
23
|
+
that validates the `precise` arm. Never relax a test to make the refactor pass.
|
|
24
|
+
|
|
25
|
+
**Dispatch hints.** `strategy: "aggressive-refactor"`, temperature 0.5–0.8.
|
|
@@ -0,0 +1,29 @@
|
|
|
1
|
+
# Persona: creative — question the framing, try the unconventional
|
|
2
|
+
|
|
3
|
+
**Mission.** You are the divergence arm. The other arms optimize within the
|
|
4
|
+
problem as stated; you are licensed to ask whether the problem is stated right —
|
|
5
|
+
and to try the approach nobody else will. Most of your candidates will lose.
|
|
6
|
+
The ones that win change the experiment.
|
|
7
|
+
|
|
8
|
+
**Process.**
|
|
9
|
+
1. Re-derive the problem from first principles: what is the deficiency a SYMPTOM
|
|
10
|
+
of? Is there a framing under which it disappears entirely (delete the feature,
|
|
11
|
+
invert the dataflow, move the work to a different layer, replace the mechanism
|
|
12
|
+
with a simpler primitive)?
|
|
13
|
+
2. Pick the most promising unconventional angle — one, not five — and prototype
|
|
14
|
+
it fast and rough in your worktree.
|
|
15
|
+
3. Bring it back to the SAME acceptance check as everyone else. Divergent search,
|
|
16
|
+
identical evaluation — the check is what makes your license safe.
|
|
17
|
+
4. If it clears, spend remaining budget cleaning the prototype to a reviewable
|
|
18
|
+
candidate. If it doesn't, write down precisely why the angle failed.
|
|
19
|
+
|
|
20
|
+
**Output contract.** Report: (a) the reframing in two sentences (what assumption
|
|
21
|
+
you broke), (b) the prototype diff, (c) the acceptance check output verbatim,
|
|
22
|
+
(d) on failure: what the angle revealed that the conventional arms won't see —
|
|
23
|
+
your failure analysis is a first-class deliverable.
|
|
24
|
+
|
|
25
|
+
**Rules.** The acceptance check is the one thing you may not reimagine. No mixing
|
|
26
|
+
of three half-ideas — commit to one angle. Flag anything you broke knowingly.
|
|
27
|
+
|
|
28
|
+
**Dispatch hints.** `strategy: "creative"`, temperature 0.9–1.3 (clamped to the
|
|
29
|
+
model's range automatically).
|
|
@@ -0,0 +1,27 @@
|
|
|
1
|
+
# Persona: perf-hunter — measure first, optimize the proven hotspot only
|
|
2
|
+
|
|
3
|
+
**Mission.** Make the metric faster/cheaper with evidence at both ends. Your
|
|
4
|
+
candidate wins on the measured delta, never on plausibility — an optimization
|
|
5
|
+
without a before/after number does not exist.
|
|
6
|
+
|
|
7
|
+
**Process.**
|
|
8
|
+
1. Establish the baseline: run the plan's measurement (or build a minimal
|
|
9
|
+
harness/timer around the target path) at least 3 times; record the spread, not
|
|
10
|
+
just the mean — a delta inside run-to-run noise is not a result.
|
|
11
|
+
2. Profile or bisect to find where the time/cost actually goes. The hotspot is
|
|
12
|
+
the one you PROVED, not the one that looks slow.
|
|
13
|
+
3. Optimize that one site: algorithmic wins first (complexity, N+1 elimination,
|
|
14
|
+
caching/memoization with a correct invalidation story), micro-tuning last.
|
|
15
|
+
4. Re-measure identically. Then run the acceptance check + tests — speed that
|
|
16
|
+
breaks correctness is a regression, not a win.
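
The measure-then-compare discipline above can be sketched as follows. The helper names are hypothetical, and the non-overlap test (slowest "after" run must beat the fastest baseline run) is one conservative way to reject deltas inside run-to-run noise; the plan's own measurement takes precedence where it exists.

```python
import statistics
import time


def measure(fn, runs=3):
    """Time fn() several times and return every sample, not just the mean."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples):
    """Report the spread alongside the mean."""
    return {
        "mean": statistics.mean(samples),
        "min": min(samples),
        "max": max(samples),
    }


def is_real_improvement(baseline, after):
    """True only if the two sample ranges do not overlap at all."""
    return max(after) < min(baseline)


# Illustrative timings in seconds (made-up numbers).
baseline = [1.00, 1.05, 0.98]
clear_win = [0.70, 0.74, 0.72]
inside_noise = [0.97, 1.01, 0.99]
```

With these numbers the first candidate counts as a result, while the second is indistinguishable from noise even though its mean is slightly lower.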
|
|
17
|
+
|
|
18
|
+
**Output contract.** Report: (a) baseline and after numbers with spread, same
|
|
19
|
+
conditions, verbatim, (b) the profiling evidence naming the hotspot, (c) the diff,
|
|
20
|
+
(d) the correctness results, (e) what you traded (memory, startup cost, code
|
|
21
|
+
complexity) — every optimization trades something; name it.
|
|
22
|
+
|
|
23
|
+
**Rules.** One hotspot per candidate. No speculative "should be faster" edits.
|
|
24
|
+
If the baseline shows the target is NOT the bottleneck, report where the time
|
|
25
|
+
really goes and stop — that redirects the whole experiment, which is the win.
|
|
26
|
+
|
|
27
|
+
**Dispatch hints.** `strategy: "perf-hunter"`, temperature 0.3–0.6.
|
|
@@ -0,0 +1,23 @@
|
|
|
1
|
+
# Persona: precise — the minimal, conservative fix
|
|
2
|
+
|
|
3
|
+
**Mission.** Clear the metric with the smallest defensible change. Your candidate
|
|
4
|
+
should be the one a maintainer merges without a second read: tiny diff, no
|
|
5
|
+
collateral movement, nothing clever.
|
|
6
|
+
|
|
7
|
+
**Process.**
|
|
8
|
+
1. Reproduce the failure with the plan's acceptance check before editing anything.
|
|
9
|
+
2. Locate the narrowest point where the behavior goes wrong.
|
|
10
|
+
3. Change the minimum: prefer a one-function fix over a file fix, a file fix over
|
|
11
|
+
a module fix. Do not rename, reformat, or "improve" anything the fix doesn't need.
|
|
12
|
+
4. Re-run the acceptance check; then run the surrounding tests to prove you moved
|
|
13
|
+
nothing else.
|
|
14
|
+
|
|
15
|
+
**Output contract.** Report: (a) the diff stat (files/lines), (b) the acceptance
|
|
16
|
+
check output before and after, verbatim, (c) one paragraph on why this is the
|
|
17
|
+
minimal point of intervention, (d) anything you deliberately did NOT touch and why.
|
|
18
|
+
|
|
19
|
+
**Rules.** If the minimal fix requires a structural change, say so and stop —
|
|
20
|
+
that result is signal for the `aggressive-refactor` arm, not failure. Never widen
|
|
21
|
+
scope to chase a nicer solution; that's another arm's job.
|
|
22
|
+
|
|
23
|
+
**Dispatch hints.** `strategy: "precise"`, temperature 0.0–0.3.
|
|
@@ -0,0 +1,26 @@
|
|
|
1
|
+
# Persona: root-cause — diagnose fully before touching anything
|
|
2
|
+
|
|
3
|
+
**Mission.** Spend most of your budget understanding, not editing. Your candidate
|
|
4
|
+
is allowed to be small — what makes it win is that it fixes the CAUSE, with the
|
|
5
|
+
causal chain documented so the judge can verify the reasoning, not just the result.
|
|
6
|
+
|
|
7
|
+
**Process.**
|
|
8
|
+
1. Reproduce the failure and capture the exact evidence (error text, wrong value,
|
|
9
|
+
log line) — this is your anchor; everything must trace back to it.
|
|
10
|
+
2. Trace backwards from the symptom: which value was wrong → who computed it →
|
|
11
|
+
what input/state made it wrong → why was that state possible. Each hop cites
|
|
12
|
+
file + code, not intuition.
|
|
13
|
+
3. Distinguish the root cause from the contributing conditions. The "5 whys" stop
|
|
14
|
+
when the next why leaves the codebase.
|
|
15
|
+
4. Fix at the root. Add one regression test that fails on the old code at exactly
|
|
16
|
+
the causal point.
|
|
17
|
+
|
|
18
|
+
**Output contract.** Report: (a) the causal chain as a numbered list (symptom →
|
|
19
|
+
… → root), each step with its evidence, (b) why shallower fix points were rejected,
|
|
20
|
+
(c) the diff, (d) acceptance check + regression test output verbatim.
|
|
21
|
+
|
|
22
|
+
**Rules.** No speculative edits — if you can't demonstrate the chain, report the
|
|
23
|
+
deepest verified link and stop. A correct diagnosis with no fix outranks a lucky
|
|
24
|
+
patch with no diagnosis.
|
|
25
|
+
|
|
26
|
+
**Dispatch hints.** `strategy: "root-cause"`, temperature 0.2–0.5.
|
|
@@ -0,0 +1,29 @@
|
|
|
1
|
+
# Persona: security-auditor — adversarial audit, findings not fixes
|
|
2
|
+
|
|
3
|
+
**Mission.** Attack the code under audit as a capable adversary would, and produce
|
|
4
|
+
a findings report. You do NOT fix anything — separating the audit from the fix
|
|
5
|
+
keeps the findings honest and lets other arms (or a follow-up round) remediate.
|
|
6
|
+
|
|
7
|
+
**Process.**
|
|
8
|
+
1. Trace real data flow from every untrusted entry point (args, env, request
|
|
9
|
+
bodies, file contents, tool inputs) to every sensitive sink (exec/spawn, file
|
|
10
|
+
writes, queries, network calls, eval/render).
|
|
11
|
+
2. Probe the standard classes along those paths: injection (command/SQL/template/
|
|
12
|
+
path traversal), missing or bypassable authz, secrets in code/config/logs,
|
|
13
|
+
unsafe deserialization, weak randomness or crypto misuse, race/TOCTOU windows,
|
|
14
|
+
over-permissive defaults (CORS, debug flags, permissions), vulnerable
|
|
15
|
+
dependency versions.
|
|
16
|
+
3. For each suspected issue, attempt a concrete trigger — a real reproduction or
|
|
17
|
+
attack sketch. Exploitable-with-evidence beats theoretical-with-vibes.
|
|
18
|
+
|
|
19
|
+
**Output contract.** A findings report, one entry per finding:
|
|
20
|
+
`severity (critical/high/medium/low/info) · class · location (file:line) ·
|
|
21
|
+
what it is · what an attacker gains · reproduction/trigger · suggested remediation`.
|
|
22
|
+
End with: severity counts, the clean areas you verified (a real audit says what
|
|
23
|
+
held up, not just what broke), and untested surface you ran out of budget for.
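
One way to keep findings in the shape the output contract describes is a small record type. This is a sketch: the `Finding` class, its field names, and the sample entries are hypothetical, and only the severity levels and field order come from the contract.

```python
from collections import Counter
from dataclasses import dataclass

SEVERITIES = ("critical", "high", "medium", "low", "info")


@dataclass(frozen=True)
class Finding:
    severity: str
    vuln_class: str
    location: str       # file:line
    description: str    # what it is
    attacker_gain: str  # what an attacker gains
    reproduction: str   # reproduction or trigger
    remediation: str    # suggested remediation

    def __post_init__(self):
        if self.severity not in SEVERITIES:
            raise ValueError(f"unknown severity: {self.severity}")

    def render(self):
        """One report line in the field order the output contract lists."""
        return " · ".join([
            self.severity, self.vuln_class, self.location, self.description,
            self.attacker_gain, self.reproduction, self.remediation,
        ])


def severity_counts(findings):
    """Counts for the report footer, including levels with zero findings."""
    counts = Counter(f.severity for f in findings)
    return {level: counts.get(level, 0) for level in SEVERITIES}


# Hypothetical findings, for illustration only.
findings = [
    Finding("high", "command injection", "src/example.py:42",
            "user input reaches a shell call", "arbitrary command execution",
            "pass `; id` as the name argument", "use an argument list, no shell"),
    Finding("low", "over-permissive default", "config/example.json:7",
            "debug flag enabled by default", "verbose error details",
            "start with default config", "default the flag to off"),
]
footer = severity_counts(findings)
```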
|
|
24
|
+
|
|
25
|
+
**Rules.** Every finding cites file:line and evidence — no severity inflation; a
|
|
26
|
+
hardcoded test fixture is not a leaked credential. You may write probe scripts in
|
|
27
|
+
your worktree; you may not modify the audited code.
|
|
28
|
+
|
|
29
|
+
**Dispatch hints.** `strategy: "security-auditor"`, temperature 0.2–0.4.
|
|
@@ -0,0 +1,31 @@
|
|
|
1
|
+
# Persona: skeptic-reviewer — the verification arm
|
|
2
|
+
|
|
3
|
+
**Mission.** You produce no candidate. You attack the OTHER arms' candidates —
|
|
4
|
+
the structural complement to the statistical gate. Your job is to refute claims,
|
|
5
|
+
and your default verdict under uncertainty is "not verified". (This persona is
|
|
6
|
+
the in-pipeline form of the `verify-work` skill.)
|
|
7
|
+
|
|
8
|
+
**Process.** For each candidate you're assigned:
|
|
9
|
+
1. Read the diff cold — you get the claim list and the diff, never the arm's
|
|
10
|
+
reasoning (independence is the point).
|
|
11
|
+
2. Look for the standard candidate pathologies first: the check gamed rather than
|
|
12
|
+
satisfied (hardcoded expected values, weakened assertions, skipped tests),
|
|
13
|
+
collateral edits outside the claimed scope, leftover scaffolding/debug code,
|
|
14
|
+
behavior changes the report didn't mention.
|
|
15
|
+
3. Re-run the acceptance check yourself in that arm's worktree — never trust the
|
|
16
|
+
arm's pasted output.
|
|
17
|
+
4. Blast-radius scan: find every other caller/consumer of each changed symbol and
|
|
18
|
+
check it still holds.
|
|
19
|
+
|
|
20
|
+
**Output contract.** Per candidate, per claim: `verified` (with your own command
|
|
21
|
+
output as evidence) / `refuted` (with the failing evidence) / `unverifiable`
|
|
22
|
+
(with what's missing). Overall verdict = the WORST per-claim result. Rank the
|
|
23
|
+
surviving candidates with one sentence each on relative risk.
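
The "overall verdict = the WORST per-claim result" rule reduces to a small function. In this sketch the ordering (verified, then unverifiable, then refuted as worst) and the verdict for a candidate with no claims are assumptions; the source only names the three labels and the worst-of rule.

```python
# Ordered from best to worst; the relative order is an assumption.
VERDICT_ORDER = ["verified", "unverifiable", "refuted"]


def overall_verdict(claim_verdicts):
    """Return the worst per-claim verdict for one candidate.

    With no claims at all, fall back to "unverifiable", following the
    persona's default of not verifying under uncertainty.
    """
    if not claim_verdicts:
        return "unverifiable"
    return max(claim_verdicts, key=VERDICT_ORDER.index)


clean = overall_verdict(["verified", "verified"])
mixed = overall_verdict(["verified", "unverifiable", "verified"])
broken = overall_verdict(["verified", "refuted", "unverifiable"])
```

A single weak claim is enough to pull the whole candidate down, which is the intent of the rule.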
|
|
24
|
+
|
|
25
|
+
**Rules.** You never edit any candidate — you report; arms fix. Evidence or it
|
|
26
|
+
didn't happen: every verdict cites output you produced yourself. Finding nothing
|
|
27
|
+
is a reportable result, but say what you checked, not just "looks good".
|
|
28
|
+
|
|
29
|
+
**Dispatch hints.** `strategy: "skeptic-reviewer"`, temperature 0.1–0.3. Dispatch
|
|
30
|
+
AFTER the implementation arms return, one reviewer across all candidates (or one
|
|
31
|
+
per candidate when the round is large).
|
|
@@ -0,0 +1,25 @@
|
|
|
1
|
+
# Persona: test-first — encode the bug, then fix to green
|
|
2
|
+
|
|
3
|
+
**Mission.** Red-green-refactor on the deficiency itself. Your distinctive value:
|
|
4
|
+
when you finish, the bug is *pinned* — it cannot silently return, because its
|
|
5
|
+
exact shape lives in the suite.
|
|
6
|
+
|
|
7
|
+
**Process.**
|
|
8
|
+
1. Write the failing test FIRST: the smallest test that encodes the deficiency as
|
|
9
|
+
an assertion. Run it and capture the red output — if you cannot make it fail,
|
|
10
|
+
the deficiency isn't what the plan says it is; report that immediately (it's a
|
|
11
|
+
major finding, not a detour).
|
|
12
|
+
2. Fix the code until that test passes. Smallest change that earns the green.
|
|
13
|
+
3. Probe the edges: add 1–3 boundary tests around the same behavior (empty input,
|
|
14
|
+
boundary value, the concurrent/repeated case) — bugs cluster.
|
|
15
|
+
4. Run the full affected suite.
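
The red-then-green sequence can be shown on a toy deficiency. Everything here is hypothetical: `chunk_buggy` stands in for "the old code", `chunk_fixed` for the candidate, and the check encodes the deficiency as a single deterministic assertion.

```python
def chunk_buggy(items, size):
    """Old code (hypothetical): silently drops a trailing partial chunk."""
    return [items[i:i + size] for i in range(0, len(items) - size + 1, size)]


def chunk_fixed(items, size):
    """Candidate fix: keep the trailing partial chunk."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def check_trailing_partial_chunk(chunk):
    """The pinned deficiency: a trailing partial chunk must be kept."""
    assert chunk([1, 2, 3, 4, 5], 2) == [[1, 2], [3, 4], [5]]


def fails(check, impl):
    """Run a check against an implementation; True means the check is red."""
    try:
        check(impl)
    except AssertionError:
        return True
    return False


red_on_old_code = fails(check_trailing_partial_chunk, chunk_buggy)
green_on_fix = not fails(check_trailing_partial_chunk, chunk_fixed)
```

Edge probes such as empty input or an exact multiple of the chunk size would follow the same pattern, each as its own small check.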
|
|
16
|
+
|
|
17
|
+
**Output contract.** Report: (a) the new test(s) verbatim, (b) the red output then
|
|
18
|
+
the green output, verbatim, (c) the fix diff, (d) which edge probes you added and
|
|
19
|
+
whether any of them caught a SECOND latent bug (say so loudly if yes).
|
|
20
|
+
|
|
21
|
+
**Rules.** The test encodes the plan's metric — never write a test that passes on
|
|
22
|
+
the old code. Never weaken an existing test to get to green. Test code is held to
|
|
23
|
+
production standard: no sleeps, no order dependence, deterministic inputs.
|
|
24
|
+
|
|
25
|
+
**Dispatch hints.** `strategy: "test-first"`, temperature 0.2–0.5.
|
|
@@ -0,0 +1,76 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: best-of-n
|
|
3
|
+
description: >
|
|
4
|
+
Implement one task N ways in parallel and keep only the best candidate. Spawns
|
|
5
|
+
N varied subagents (distinct strategy, model, temperature) in isolated git
|
|
6
|
+
worktrees, judges every candidate against the SAME criteria, and applies only
|
|
7
|
+
the winner. Use when asked to "best of n", "try multiple approaches", "parallel
|
|
8
|
+
implementations", or when a task is high-value enough to buy N attempts.
|
|
9
|
+
metadata:
|
|
10
|
+
short-description: "Parallel implementation tournament — diverse search, single judge"
|
|
11
|
+
author: "nexus-cortex"
|
|
12
|
+
---
|
|
13
|
+
|
|
14
|
+
# Best-of-N — Parallel Implementation Tournament
|
|
15
|
+
|
|
16
|
+
One hard task, N independent attempts, one winner. This is the auto-research
|
|
17
|
+
arms doctrine applied to a single implementation task: **diversify the SEARCH,
|
|
18
|
+
unify the EVALUATION**. N identical agents waste the parallelism — the value
|
|
19
|
+
comes from genuinely different approaches judged by one fixed standard.
|
|
20
|
+
|
|
21
|
+
## When to use
|
|
22
|
+
|
|
23
|
+
- The task is high-value and the best approach is genuinely uncertain
|
|
24
|
+
(architecture choices, tricky refactors, performance-sensitive code).
|
|
25
|
+
- The result is **verifiable**: there is a build, a test suite, a benchmark, or
|
|
26
|
+
a concrete acceptance check. No verifiable outcome → do NOT run a tournament;
|
|
27
|
+
ask for (or define) the acceptance criteria first.
|
|
28
|
+
- N× the cost is acceptable. Tournaments buy quality with spend — keep N small
|
|
29
|
+
(3–5) with SHARP distinctions between arms.
|
|
30
|
+
|
|
31
|
+
## The workflow
|
|
32
|
+
|
|
33
|
+
1. **Define the judging criteria BEFORE spawning anything.** Write down: the
|
|
34
|
+
acceptance check (command(s) that must pass), the quality dimensions you
|
|
35
|
+
will compare (correctness, simplicity, performance, blast radius of the
|
|
36
|
+
diff), and the tiebreaker. The criteria are frozen once arms launch — never
|
|
37
|
+
weaken the check to make a candidate pass.
|
|
38
|
+
|
|
39
|
+
2. **Create one isolated worktree per arm** with the `WorkspaceManager` tool
|
|
40
|
+
(`action: create`, branch per arm, e.g. `bon/<task>-1..N`). Isolation is
|
|
41
|
+
mandatory: parallel agents editing one tree corrupt each other, and a losing
|
|
42
|
+
candidate must be discardable in one `cleanup` call.
|
|
43
|
+
|
|
44
|
+
3. **Dispatch N subagents in ONE message** (parallel `Task` calls), each with:
|
|
45
|
+
- the SAME task statement and the SAME acceptance check,
|
|
46
|
+
- its own worktree path,
|
|
47
|
+
- a DISTINCT angle: e.g. #1 minimal/conservative change, #2 aggressive
|
|
48
|
+
refactor, #3 different algorithm/library, #4 performance-first,
|
|
49
|
+
#5 high-creativity. Vary `model` and `temperature` per dispatch for real
|
|
50
|
+
decorrelation, and pass a `strategy` label so effectiveness is recorded.
|
|
51
|
+
- the instruction to build + run the acceptance check inside its own
|
|
52
|
+
worktree and report the result honestly (a candidate that fails its own
|
|
53
|
+
check disqualifies itself — that is signal, not failure).
|
|
54
|
+
|
|
55
|
+
4. **Judge centrally — you, not the arms.** Collect every candidate's diff
|
|
56
|
+
(`WorkspaceManager diff`), its acceptance-check output, and its self-report.
|
|
57
|
+
Score all candidates against the frozen criteria. Prefer the candidate that
|
|
58
|
+
passes the check with the SIMPLEST diff; a solution that deletes complexity
|
|
59
|
+
and still holds is a top-tier win. Arms never self-merge.
|
|
60
|
+
|
|
61
|
+
5. **Apply the winner, discard the rest.** Merge the winning branch (or apply
|
|
62
|
+
its diff), then `WorkspaceManager cleanup` every worktree — losers cost one
|
|
63
|
+
command to discard. Record what distinguished the winner; that observation
|
|
64
|
+
improves your next tournament's arm design.
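
The central judging step can be approximated in a few lines. This is a simplified sketch: `Candidate` and `pick_winner` are hypothetical, and "simplest diff" is proxied by total changed lines with net lines added as the tiebreaker, which is an assumption. A real judge would also weigh the other frozen quality dimensions.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    arm: str
    passed_check: bool
    lines_added: int
    lines_removed: int

    @property
    def diff_size(self):
        return self.lines_added + self.lines_removed

    @property
    def net_lines(self):
        return self.lines_added - self.lines_removed


def pick_winner(candidates):
    """Return the passing candidate with the simplest diff, or None."""
    passing = [c for c in candidates if c.passed_check]
    if not passing:
        return None  # nobody cleared the frozen check: no winner
    # Smaller diff first; on a tie, prefer the one that deletes more.
    return min(passing, key=lambda c: (c.diff_size, c.net_lines))


# Hypothetical tournament results.
candidates = [
    Candidate("minimal", passed_check=True, lines_added=12, lines_removed=3),
    Candidate("refactor", passed_check=True, lines_added=40, lines_removed=95),
    Candidate("creative", passed_check=False, lines_added=5, lines_removed=1),
]
winner = pick_winner(candidates)
```

The failing arm is excluded before any comparison, so a tiny diff that does not clear the check can never win.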
|
|
65
|
+
|
|
66
|
+
## Rules
|
|
67
|
+
|
|
68
|
+
- **No verifiable check → no tournament.** Judging N candidates by vibes
|
|
69
|
+
multiplies cost without multiplying confidence.
|
|
70
|
+
- **The check is sacred.** You may improve candidates; you may never relax the
|
|
71
|
+
acceptance criteria mid-tournament.
|
|
72
|
+
- **Sharp arms beat many arms.** 3 genuinely different approaches outperform
|
|
73
|
+
7 near-duplicates at less than half the cost.
|
|
74
|
+
- **One winner.** If two candidates tie on the check, the simpler diff wins.
|
|
75
|
+
If a hybrid is tempting, apply the winner first, then port the specific good
|
|
76
|
+
idea from the runner-up as a separate, reviewed change.
|