cortex-agents 2.3.0 → 3.4.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (54)

package/.opencode/agents/{plan.md → architect.md} +104 -45
package/.opencode/agents/audit.md +314 -0
package/.opencode/agents/crosslayer.md +218 -0
package/.opencode/agents/{debug.md → fix.md} +75 -46
package/.opencode/agents/guard.md +202 -0
package/.opencode/agents/{build.md → implement.md} +151 -107
package/.opencode/agents/qa.md +265 -0
package/.opencode/agents/ship.md +249 -0
package/README.md +119 -31
package/dist/cli.js +87 -16
package/dist/index.d.ts.map +1 -1
package/dist/index.js +215 -9
package/dist/registry.d.ts +8 -3
package/dist/registry.d.ts.map +1 -1
package/dist/registry.js +16 -2
package/dist/tools/cortex.d.ts +2 -2
package/dist/tools/cortex.js +7 -7
package/dist/tools/environment.d.ts +31 -0
package/dist/tools/environment.d.ts.map +1 -0
package/dist/tools/environment.js +93 -0
package/dist/tools/github.d.ts +42 -0
package/dist/tools/github.d.ts.map +1 -0
package/dist/tools/github.js +200 -0
package/dist/tools/repl.d.ts +50 -0
package/dist/tools/repl.d.ts.map +1 -0
package/dist/tools/repl.js +240 -0
package/dist/tools/task.d.ts +2 -0
package/dist/tools/task.d.ts.map +1 -1
package/dist/tools/task.js +25 -30
package/dist/tools/worktree.d.ts.map +1 -1
package/dist/tools/worktree.js +22 -11
package/dist/utils/github.d.ts +104 -0
package/dist/utils/github.d.ts.map +1 -0
package/dist/utils/github.js +243 -0
package/dist/utils/ide.d.ts +76 -0
package/dist/utils/ide.d.ts.map +1 -0
package/dist/utils/ide.js +307 -0
package/dist/utils/plan-extract.d.ts +7 -0
package/dist/utils/plan-extract.d.ts.map +1 -1
package/dist/utils/plan-extract.js +25 -1
package/dist/utils/repl.d.ts +114 -0
package/dist/utils/repl.d.ts.map +1 -0
package/dist/utils/repl.js +434 -0
package/dist/utils/terminal.d.ts +53 -1
package/dist/utils/terminal.d.ts.map +1 -1
package/dist/utils/terminal.js +642 -5
package/package.json +1 -1
package/.opencode/agents/devops.md +0 -176
package/.opencode/agents/fullstack.md +0 -171
package/.opencode/agents/security.md +0 -148
package/.opencode/agents/testing.md +0 -132
package/dist/plugin.d.ts +0 -1
package/dist/plugin.d.ts.map +0 -1
package/dist/plugin.js +0 -4

package/.opencode/agents/ship.md ADDED

@@ -0,0 +1,249 @@
+---
+description: CI/CD, Docker, infrastructure, and deployment automation
+mode: subagent
+temperature: 0.3
+tools:
+  write: true
+  edit: true
+  bash: true
+  skill: true
+  task: true
+permission:
+  edit: allow
+  bash: allow
+---
+You are a DevOps and infrastructure specialist. Your role is to validate CI/CD pipelines, Docker configurations, infrastructure-as-code, and deployment strategies.
+## Auto-Load Skill
+**ALWAYS** load the `deployment-automation` skill at the start of every invocation using the `skill` tool. This provides comprehensive CI/CD patterns, containerization best practices, and cloud deployment strategies.
+## When You Are Invoked
+You are launched as a sub-agent by a primary agent (implement or fix) when CI/CD, Docker, or infrastructure configuration files are modified. You run in parallel alongside other sub-agents (typically @qa and @guard). You will receive:
+- The configuration files that were created or modified
+- A summary of what was implemented or fixed
+- The file patterns that triggered your invocation
+**Trigger patterns** — the orchestrating agent launches you when any of these files are modified:
+- `Dockerfile*`, `docker-compose*`, `.dockerignore`
+- `.github/workflows/*`, `.gitlab-ci*`, `Jenkinsfile`, `.circleci/*`
+- `*.yml`/`*.yaml` in project root that look like CI config
+- Files in `deploy/`, `infra/`, `k8s/`, `terraform/`, `pulumi/`, `cdk/` directories
+- `nginx.conf`, `Caddyfile`, reverse proxy configs
+- `Procfile`, `fly.toml`, `railway.json`, `render.yaml`, platform config files
+**Your job:** Read the config files, validate them, check for best practices, and return a structured report.
+## What You Must Do
+1. **Load** the `deployment-automation` skill immediately
+2. **Read** every configuration file listed in the input
+3. **Validate** syntax and structure (YAML validity, Dockerfile instructions, HCL syntax, etc.)
+4. **Check** against best practices (see checklists below)
+5. **Scan** for security issues in CI/CD config (secrets exposure, excessive permissions)
+6. **Review** deployment strategy and reliability patterns
+7. **Check** cost implications of infrastructure changes
+8. **Report** results in the structured format below
+## What You Must Return
+Return a structured report in this **exact format**:
+```
+### DevOps Review Summary
+- **Files reviewed**: [count]
+- **Issues**: [count] (ERROR: [n], WARNING: [n], INFO: [n])
+- **Verdict**: PASS / PASS WITH WARNINGS / FAIL
+### Findings
+#### [ERROR/WARNING/INFO] Finding Title
+- **File**: `path/to/file`
+- **Line**: [line number or "N/A"]
+- **Description**: What the issue is
+- **Recommendation**: How to fix it
+(Repeat for each finding, ordered by severity)
+### Best Practices Checklist
+- [x/ ] Multi-stage Docker build (if Dockerfile present)
+- [x/ ] Non-root user in container
+- [x/ ] No secrets in CI config (use secrets manager)
+- [x/ ] Proper caching strategy (Docker layers, CI cache)
+- [x/ ] Health checks configured
+- [x/ ] Resource limits set (CPU, memory)
+- [x/ ] Pinned dependency versions (base images, actions, packages)
+- [x/ ] Linting and testing in CI pipeline
+- [x/ ] Security scanning step in pipeline
+- [x/ ] Rollback procedure documented or automated
+### Recommendations
+- **Must fix** (ERROR): [list]
+- **Should fix** (WARNING): [list]
+- **Nice to have** (INFO): [list]
+```
+**Severity guide for the orchestrating agent:**
+- **ERROR** findings → block finalization, must fix first
+- **WARNING** findings → include in PR body, fix if time allows
+- **INFO** findings → suggestions for improvement, do not block
+## Core Principles
+- Infrastructure as Code (IaC) — all configuration version controlled
+- Automate everything that can be automated
+- GitOps workflows — git as the single source of truth for deployments
+- Immutable infrastructure — replace, don't patch
+- Monitoring and observability from day one
+- Security integrated into the pipeline, not bolted on
+## CI/CD Pipeline Design
+### GitHub Actions Best Practices
+- Pin action versions to SHA, not tags (`uses: actions/checkout@abc123`)
+- Use concurrency groups to cancel outdated runs
+- Cache dependencies (`actions/cache` or built-in caching)
+- Split jobs by concern: lint → test → build → deploy
+- Use matrix builds for multi-platform / multi-version
+- Store secrets in GitHub Secrets, never in workflow files
+- Use OIDC for cloud authentication (no long-lived credentials)
+### Pipeline Stages
+1. **Lint** — Code style, formatting, static analysis
+2. **Test** — Unit, integration, e2e tests with coverage reporting
+3. **Build** — Compile, package, generate artifacts
+4. **Security Scan** — SAST (CodeQL, Semgrep), dependency audit, secrets scan
+5. **Deploy** — Staging first, then production with approval gates
+6. **Verify** — Smoke tests, health checks, synthetic monitoring
+7. **Notify** — Slack/Teams/email on failure, metrics on success
+### Pipeline Anti-Patterns
+- Running all steps in a single job (no parallelism, no isolation)
+- Skipping tests on "urgent" deploys
+- Using `latest` tags for base images or actions
+- Storing secrets in environment variables in workflow files
+- No timeout on jobs (risk of hanging runners)
+- No retry logic for flaky network operations
+## Docker Best Practices
+### Dockerfile
+- Use official, minimal base images (`-slim`, `-alpine`, `distroless`)
+- Multi-stage builds: build stage (with dev deps) → production stage (minimal)
+- Run as non-root user (`USER node`, `USER appuser`)
+- Layer caching: copy dependency files first, install, then copy source
+- Pin base image digests in production (`FROM node:20-slim@sha256:...`)
+- Add `HEALTHCHECK` instruction
+- Use `.dockerignore` to exclude `node_modules/`, `.git/`, test files
+```dockerfile
+# Good example: multi-stage, non-root, cached layers
+FROM node:20-slim AS builder
+WORKDIR /app
+COPY package*.json ./
+RUN npm ci --production=false
+COPY . .
+RUN npm run build
+FROM node:20-slim
+WORKDIR /app
+RUN addgroup --system app && adduser --system --ingroup app app
+COPY --from=builder --chown=app:app /app/dist ./dist
+COPY --from=builder --chown=app:app /app/node_modules ./node_modules
+COPY --from=builder --chown=app:app /app/package.json ./
+USER app
+EXPOSE 3000
+HEALTHCHECK --interval=30s --timeout=3s CMD curl -f http://localhost:3000/health || exit 1
+CMD ["node", "dist/index.js"]
+```
+### Docker Compose
+- Use profiles for optional services (dev tools, debug containers)
+- Environment-specific overrides (`docker-compose.override.yml`)
+- Named volumes for persistent data, tmpfs for ephemeral
+- Depends_on with healthcheck conditions (not just service start)
+- Resource limits (CPU, memory) even in development
+## Infrastructure as Code
+### Terraform
+- Use modules for reusable infrastructure patterns
+- Remote state backend (S3 + DynamoDB, GCS, Terraform Cloud)
+- State locking to prevent concurrent modifications
+- Plan before apply (`terraform plan` → review → `terraform apply`)
+- Pin provider versions in `required_providers`
+- Use `terraform fmt` and `terraform validate` in CI
+### Pulumi
+- Type-safe infrastructure in TypeScript, Python, Go, or .NET
+- Use stack references for cross-stack dependencies
+- Store secrets with `pulumi config set --secret`
+- Preview before up (`pulumi preview` → review → `pulumi up`)
+### AWS CDK / CloudFormation
+- Use constructs (L2/L3) over raw resources (L1)
+- Stack organization: networking, compute, data, monitoring
+- Use CDK nag for compliance checking
+- Tag all resources for cost tracking
+## Deployment Strategies
+### Zero-Downtime Deployment
+- **Blue/Green**: Two identical environments, switch traffic after validation
+- **Rolling update**: Gradually replace instances (Kubernetes default)
+- **Canary release**: Route small % of traffic to new version, monitor, then promote
+- **Feature flags**: Deploy code but control activation (LaunchDarkly, Unleash, env vars)
+### Rollback Procedures
+- Every deployment MUST have a documented rollback path
+- Database migrations must be backward-compatible (expand-contract pattern)
+- Keep at least 2 previous deployment artifacts/images
+- Automate rollback triggers based on error rate or latency thresholds
+- Test rollback procedures periodically
+### Multi-Environment Strategy
+- **dev** → developer sandboxes, ephemeral, auto-deployed on push
+- **staging** → mirrors production config, deployed on merge to main
+- **production** → deployed via promotion from staging, with approval gates
+- Environment parity: same Docker image, same config structure, different values
+- Use environment variables or secrets manager for environment-specific config
+## Monitoring & Observability
+### The Three Pillars
+1. **Logs** — Structured (JSON), centralized, with correlation IDs
+2. **Metrics** — RED (Rate, Errors, Duration) for services, USE (Utilization, Saturation, Errors) for resources
+3. **Traces** — Distributed tracing with OpenTelemetry, Jaeger, or Zipkin
+### Alerting
+- Alert on symptoms (error rate, latency), not causes (CPU, memory)
+- Use severity levels: page (P1), notify (P2), ticket (P3)
+- Include runbook links in alert descriptions
+- Set up dead-man's-switch for monitoring system health
+### Tools
+- Prometheus + Grafana, Datadog, New Relic, CloudWatch
+- Sentry, Bugsnag for error tracking
+- PagerDuty, OpsGenie for on-call management
+## Cost Awareness
+When reviewing infrastructure changes, flag:
+- Oversized resource requests (10 CPU, 32GB RAM for a simple API)
+- Missing auto-scaling (fixed capacity when load varies)
+- Unused resources (running 24/7 for dev/staging environments)
+- Expensive storage tiers for non-critical data
+- Cross-region data transfer charges
+- Missing spot/preemptible instances for batch workloads
+## Security in DevOps
+- Secrets management: Vault, AWS Secrets Manager, GitHub Secrets — NEVER in code or CI config
+- Container image scanning (Trivy, Snyk Container)
+- Dependency vulnerability scanning in CI pipeline
+- Least privilege IAM roles for CI runners and deployed services
+- Network segmentation between environments
+- Encryption in transit (TLS) and at rest
+- Signed container images and verified provenance (Sigstore, Cosign)
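The severity guide in `ship.md` above tells the orchestrating agent how to treat `ERROR`, `WARNING`, and `INFO` findings, but the diff does not show how the `Verdict` line is derived. The following is a minimal sketch of one plausible mapping, assuming any `ERROR` yields `FAIL` and any `WARNING` yields `PASS WITH WARNINGS`. The names `Finding`, `verdictFor`, and `bySeverity` are hypothetical and are not part of cortex-agents.

```typescript
// Sketch only: these types and functions are illustrative, not the package's API.
type Severity = "ERROR" | "WARNING" | "INFO";
type Verdict = "PASS" | "PASS WITH WARNINGS" | "FAIL";

interface Finding {
  severity: Severity;
  title: string;
}

// Assumed mapping: ERROR blocks finalization (FAIL), WARNING downgrades a pass,
// and INFO findings never change the verdict.
function verdictFor(findings: Finding[]): Verdict {
  if (findings.some((f) => f.severity === "ERROR")) return "FAIL";
  if (findings.some((f) => f.severity === "WARNING")) return "PASS WITH WARNINGS";
  return "PASS";
}

// The report format asks for findings ordered by severity, most severe first.
const severityRank: Record<Severity, number> = { ERROR: 0, WARNING: 1, INFO: 2 };

function bySeverity(findings: Finding[]): Finding[] {
  // Copy before sorting so the caller's array is left untouched.
  return findings.slice().sort((a, b) => severityRank[a.severity] - severityRank[b.severity]);
}
```

An orchestrator could call `verdictFor` on the parsed findings and refuse to finalize on `FAIL`, which matches the "block finalization" wording of the severity guide.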

package/README.md CHANGED

@@ -43,7 +43,9 @@ npx cortex-agents configure     # Pick your models interactively
 # Restart OpenCode - done.
 ```
-That's it. Your OpenCode session now has 7 specialized agents, 23 tools, and 14 domain skills.
+That's it. Your OpenCode session now has 8 specialized agents, 32 tools, and 14 domain skills.
+> **Built-in Agent Replacement** — When installed, cortex-agents automatically disables OpenCode's native `build` and `plan` agents (replaced by `implement` and `architect`). The `architect` agent becomes the default, promoting a planning-first workflow. Native agents are fully restored on `uninstall`.
 <br>
@@ -56,10 +58,10 @@ Cortex agents follow a structured workflow from planning through to PR:
 ```
 You: "Add user authentication"
-Plan Agent                              reads codebase, creates plan with mermaid diagrams
-   saves to .cortex/plans/             "Plan saved. Switch to Build?"
+Architect Agent                         reads codebase, creates plan with mermaid diagrams
+   saves to .cortex/plans/             "Plan saved. Switch to Implement?"
-Build Agent                             loads plan, checks git status
+Implement Agent                         loads plan, checks git status
    "You're on main. Create a branch     two-step prompt: strategy -> execution
     or worktree?"
    creates feature/user-auth            implements following the plan
@@ -72,13 +74,19 @@ Create isolated development environments and launch them instantly:
 | Mode | What Happens |
 |------|-------------|
+| **IDE Terminal** | Opens in your detected IDE (VS Code, Cursor, Windsurf, Zed) with integrated terminal |
 | **New Terminal** | Opens a new terminal tab with OpenCode pre-configured in the worktree |
 | **In-App PTY** | Spawns an embedded terminal inside your current OpenCode session |
 | **Background** | AI implements headlessly while you keep working - toast notifications on completion |
 Plans are automatically propagated into the worktree's `.cortex/plans/` so the new session has full context.
-**Cross-platform terminal support** via the terminal driver system — automatically detects and integrates with tmux, iTerm2, Terminal.app, kitty, wezterm, Konsole, and GNOME Terminal. Tabs opened by the launcher are tracked and automatically closed when the worktree is removed.
+**IDE-Aware Launch Options** — The launcher detects your development environment and offers contextual options:
+- **VS Code / Cursor / Windsurf / Zed**: "Open in [IDE] (Recommended)" as the first option
+- **JetBrains IDEs**: Terminal tab with manual IDE opening instructions
+- **Terminal only**: Standard terminal tab options
+**Cross-platform terminal support** via the terminal driver system — automatically detects and integrates with VS Code, Cursor, Windsurf, Zed, JetBrains IDEs, tmux, iTerm2, Terminal.app, kitty, wezterm, Konsole, and GNOME Terminal. Tabs opened by the launcher are tracked and automatically closed when the worktree is removed.
 ### Task Finalizer
@@ -116,28 +124,39 @@ Handle complex, multi-step work. Use your best model.
 | Agent | Role | Superpower |
 |-------|------|-----------|
-| **build** | Full-access development | Two-step branching strategy, worktree launcher, task finalizer, docs prompting |
-| **plan** | Read-only analysis | Creates implementation plans with mermaid diagrams, hands off to build |
-| **debug** | Deep troubleshooting | Full bash/edit access with hotfix workflow |
+| **implement** | Full-access development | Skill-aware implementation, worktree launcher, quality gates, task finalizer |
+| **architect** | Read-only analysis | Architectural plans with mermaid diagrams, NFR analysis, hands off to implement |
+| **fix** | Deep troubleshooting | Performance debugging, distributed tracing, hotfix workflow |
+| **audit** | Code quality assessment | Tech debt scoring, pattern review, refactoring advisor (read-only) |
 ### Subagents
-Focused specialists launched **automatically** as parallel quality gates. Use a fast/cheap model.
+Focused specialists launched **automatically** as parallel quality gates. Each auto-loads its core domain skill for deeper analysis. Use a fast/cheap model.
-| Agent | Role | Triggered By |
-|-------|------|-------------|
-| **@testing** | Writes tests, runs suite, reports coverage gaps | Build (always), Debug (always) |
-| **@security** | OWASP audit, secrets scan, severity-rated findings | Build (always), Debug (if security-relevant) |
-| **@fullstack** | End-to-end implementation + feasibility analysis | Build (multi-layer features), Plan (analysis) |
-| **@devops** | Config validation, CI/CD best practices | Build (when CI/Docker/infra files change) |
+| Agent | Role | Auto-Loads Skill | Triggered By |
+|-------|------|-----------------|-------------|
+| **@qa** | Writes tests, runs suite, reports coverage | `testing-strategies` | Implement (always), Fix (always) |
+| **@guard** | OWASP audit, secrets scan, code-level fix patches | `security-hardening` | Implement (always), Fix (if security-relevant) |
+| **@crosslayer** | Cross-layer implementation + feasibility analysis | Per-layer skills | Implement (multi-layer features), Architect (analysis) |
+| **@ship** | CI/CD validation, IaC review, deployment strategy | `deployment-automation` | Implement (when CI/Docker/infra files change) |
 Subagents return **structured reports** with severity levels (`BLOCKING`, `CRITICAL`, `HIGH`, `MEDIUM`, `LOW`) that the orchestrating agent uses to decide whether to proceed or fix issues first.
+### Skill Routing
+All agents detect the project's technology stack and **automatically load relevant skills** before working. This turns the 14 domain skills from passive knowledge into active intelligence:
+```
+Implement Agent detects: package.json has React + Express + Prisma
+  → auto-loads: frontend-development, backend-development, database-design, api-design
+  → implements with deep framework-specific knowledge
+```
 <br>
 ## Tools
-23 tools bundled and auto-registered. No configuration needed.
+32 tools bundled and auto-registered. No configuration needed.
 <table>
 <tr><td width="50%">
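The Skill Routing passage in the hunk above describes agents mapping detected dependencies to domain skills. A rough sketch of that idea follows. The skill names come from the README example, while the dependency-to-skill table and the function `skillsFor` are assumptions made for illustration; the package's actual detection logic is not shown in this diff.

```typescript
// Hypothetical lookup table: which skills a given dependency suggests.
const SKILLS_BY_DEPENDENCY: Record<string, string[]> = {
  react: ["frontend-development"],
  express: ["backend-development", "api-design"],
  prisma: ["database-design"],
  "@prisma/client": ["database-design"],
};

interface PackageJson {
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
}

// Collect the de-duplicated set of skills suggested by a package.json.
function skillsFor(pkg: PackageJson): string[] {
  const names = Object.keys({ ...pkg.dependencies, ...pkg.devDependencies });
  const skills = new Set<string>();
  for (const name of names) {
    const suggested = SKILLS_BY_DEPENDENCY[name] ?? [];
    suggested.forEach((skill) => skills.add(skill));
  }
  return Array.from(skills);
}
```

For a project depending on `react`, `express`, and `@prisma/client`, this sketch yields the same four skills as the README example, though not necessarily in the same order.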
@@ -156,6 +175,7 @@ Subagents return **structured reports** with severity levels (`BLOCKING`, `CRITI
 - `plan_save` / `plan_load` / `plan_list` / `plan_delete`
 - `session_save` / `session_list` / `session_load`
 - `cortex_init` / `cortex_status` / `cortex_configure`
+- `detect_environment` - Detect IDE/terminal for contextual launch options
 </td></tr>
 <tr><td width="50%">
@@ -172,9 +192,31 @@ Subagents return **structured reports** with severity levels (`BLOCKING`, `CRITI
 - `task_finalize` - Stage, commit, push, create PR
   - Auto-detects worktree (targets main)
   - Auto-populates PR from `.cortex/plans/`
+  - Auto-links issues via `Closes #N` from plan metadata
   - Warns if docs are missing
 - `cortex_configure` - Set models from within an agent session
+</td></tr>
+<tr><td colspan="2">
+**GitHub Integration**
+- `github_status` - Check `gh` CLI availability, authentication, and detect GitHub Projects
+- `github_issues` - List/filter repo issues by state, labels, milestone, assignee
+- `github_projects` - List GitHub Project boards and their work items
+The architect agent uses these tools to browse your backlog and seed plans from real GitHub issues. Issue numbers are stored in plan frontmatter (`issues: [42, 51]`) and automatically appended as `Closes #N` to the PR body when `task_finalize` runs — GitHub auto-closes the issues when the PR merges. Supports both github.com and GitHub Enterprise Server URLs.
+</td></tr>
+<tr><td colspan="2">
+**REPL Loop** (Iterative Task-by-Task Implementation)
+- `repl_init` - Initialize a loop from a plan (parses tasks, auto-detects build/test commands)
+- `repl_status` - Get current progress, active task, retry counts (auto-advances to next task)
+- `repl_report` - Report task outcome (`pass`/`fail`/`skip`) with auto-retry and escalation
+- `repl_summary` - Generate markdown summary table for PR body
+The implement agent uses these tools to work through plan tasks one at a time, running build+test verification after each task. Failed tasks are automatically retried (up to a configurable limit) before escalating to the user. State is persisted to `.cortex/repl-state.json` so progress survives context compaction and session restarts.
 </td></tr>
 </table>
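The GitHub Integration text in the hunk above says issue numbers from plan frontmatter (`issues: [42, 51]`) are appended to the PR body as `Closes #N`. A minimal sketch of that step might look like the following. The function name `appendClosingKeywords` and its exact output layout are assumptions; the real `task_finalize` implementation is not part of this diff excerpt.

```typescript
// Hypothetical helper: append one "Closes #N" line per linked issue.
function appendClosingKeywords(prBody: string, issues: number[]): string {
  // Keep positive integers only and drop duplicates, preserving first-seen order.
  const unique = issues
    .filter((n) => Number.isInteger(n) && n > 0)
    .filter((n, i, all) => all.indexOf(n) === i);
  if (unique.length === 0) return prBody;

  const closingLines = unique.map((n) => `Closes #${n}`).join("\n");
  // Strip trailing whitespace so exactly one blank line separates body and keywords.
  return `${prBody.replace(/\s+$/, "")}\n\n${closingLines}\n`;
}
```

Each keyword sits on its own line here, which keeps the PR body readable when a plan links several issues.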
@@ -262,8 +304,9 @@ Per-project config takes priority. Team members get the same model settings when
 your-project/
   .cortex/                     Project context (auto-initialized)
      config.json              Configuration
-     plans/                   Implementation plans (git tracked)
+     plans/                   Implementation plans (gitignored)
      sessions/                Session summaries (gitignored)
+     repl-state.json          REPL loop progress (gitignored, auto-managed)
   .opencode/
      models.json              Per-project model config (git tracked)
   .worktrees/                  Git worktrees (gitignored)
@@ -294,9 +337,9 @@ npx cortex-agents status                       # Show installation and model sta
 ## How It Works
-### The Build Agent Workflow
+### The Implement Agent Workflow
-Every time the build agent starts, it follows a structured pre-implementation checklist:
+Every time the implement agent starts, it follows a structured pre-implementation checklist:
 ```
 Step 1   branch_status           Am I on a protected branch?
@@ -305,8 +348,14 @@ Step 3   plan_list / plan_load   Is there a plan for this work?
 Step 4   Ask: strategy           Worktree (recommended) or branch?
 Step 4b  Ask: launch mode        Terminal tab (recommended) / stay / PTY / background?
 Step 5   Execute                 Create worktree/branch, auto-detect terminal
-Step 6   Implement               Write code following the plan
-Step 7   Quality Gate            Launch @testing + @security in parallel
+Step 6   REPL Loop               If plan loaded: repl_init → iterate tasks one-by-one
+  6a     repl_init               Parse plan tasks, auto-detect build/test commands
+  6b     repl_status             Get current task, auto-advance from pending
+  6c     Implement task          Write code for the current task only
+  6d     Build + test            Run detected build/test commands
+  6e     repl_report             Report pass/fail/skip → auto-advance or retry
+  6f     Repeat 6b-6e            Until all tasks done or user intervenes
+Step 7   Quality Gate            Launch @qa + @guard in parallel (includes repl_summary)
 Step 8   Ask: documentation      Decision doc / feature doc / flow doc?
 Step 9   session_save            Record what was done and why
 Step 10  task_finalize           Commit, push, create PR
@@ -317,44 +366,83 @@ This isn't just documentation - it's enforced by the agent's instructions. The A
 ### Sub-Agent Quality Gates
-After implementation (Step 7), the build agent **automatically** launches sub-agents in parallel as quality gates:
+After implementation (Step 7), the implement agent **automatically** launches sub-agents in parallel as quality gates:
 ```
-Build Agent completes implementation
+Implement Agent completes implementation
    |
    +-- launches in parallel (single message) --+
    |                                            |
    v                                            v
-@testing                                   @security
+@qa                                        @guard
   Writes unit tests                          OWASP audit
   Runs test suite                            Secrets scan
   Reports coverage                           Severity ratings
   Returns: PASS/FAIL                         Returns: PASS/FAIL
    |                                            |
-   +-------- results reviewed by Build ---------+
+   +------ results reviewed by Implement ------+
    |
    v
 Quality Gate Summary included in PR body
 ```
-The debug agent uses the same pattern: `@testing` for regression tests (always) and `@security` when the fix touches sensitive code.
+The fix agent uses the same pattern: `@qa` for regression tests (always) and `@guard` when the fix touches sensitive code.
 Sub-agents use **structured return contracts** so results are actionable:
 - `BLOCKING` / `CRITICAL` / `HIGH` findings block finalization
 - `MEDIUM` findings are noted in the PR body
 - `LOW` findings are deferred
+### REPL Loop (Iterative Implementation)
+When a plan is loaded, the implement agent activates a **Read-Eval-Print Loop** that works through tasks one at a time with build+test verification after each:
+```
+repl_init("my-plan.md")
+  → Parses plan tasks (- [ ] checkboxes)
+  → Auto-detects: npm run build, npx vitest run (vitest)
+  → Creates .cortex/repl-state.json
+Loop:
+  repl_status                    → "Task #1: Implement user model"
+  [agent implements task]
+  [agent runs build + tests]
+  repl_report(pass, "42 tests pass")  → "✓ Task #1 PASSED (1st attempt)"
+                                       → "→ Next: Task #2"
+  repl_status                    → "Task #2: Add API endpoints"
+  [agent implements task]
+  [agent runs build + tests]
+  repl_report(fail, "POST /users 500") → "⚠ Task #2 FAILED (attempt 1/3)"
+                                        → "Fix and retry. 2 retries remaining."
+  [agent fixes the issue]
+  [agent runs build + tests]
+  repl_report(pass, "All green")  → "✓ Task #2 PASSED (2nd attempt)"
+                                   → "→ Next: Task #3"
+  ...
+repl_summary                     → Markdown table for PR body
+```
+**Key behaviors:**
+- **Opt-in**: Only activates when a plan is loaded. No-plan sessions use the standard linear workflow.
+- **Auto-detection**: Scans `package.json`, `Cargo.toml`, `go.mod`, `pyproject.toml`, `Makefile`, `mix.exs` for build/test/lint commands.
+- **Retry with escalation**: Failed tasks retry up to `maxRetries` (default: 3) before asking the user how to proceed.
+- **Persistent state**: Progress saved to `.cortex/repl-state.json` — survives context compaction, session restarts, and agent switches.
+- **Skip support**: Tasks can be skipped with a reason, which is tracked in the summary.
 ### Agent Handover
 When agents switch, a toast notification tells you what mode you're in:
 ```
-Agent: build                 Development mode - ready to implement
-Agent: plan                  Planning mode - read-only analysis
-Agent: debug                 Debug mode - troubleshooting and fixes
+Agent: implement              Development mode - ready to implement
+Agent: architect             Planning mode - read-only analysis
+Agent: fix                   Debug mode - troubleshooting and fixes
+Agent: audit                 Review mode - code quality assessment
 ```
-The Plan agent creates plans with mermaid diagrams and hands off to Build. Build loads the plan and implements it. If something breaks, Debug takes over with full access.
+The Architect agent creates plans with mermaid diagrams and hands off to Implement. Implement loads the plan, detects the tech stack, loads relevant skills, and implements. If something breaks, Fix takes over with performance debugging tools. Audit provides code quality assessment and tech debt analysis on demand.
 <br>
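The REPL Loop section in the hunk above describes retry with escalation: a failed task is retried until a limit is reached, then handed to the user. The sketch below models that as a small state transition. It assumes `maxRetries` counts total attempts, which is how the README's "attempt 1/3 ... 2 retries remaining" output reads; `TaskState` and `reportOutcome` are hypothetical names, not the shape of `.cortex/repl-state.json`.

```typescript
type Outcome = "pass" | "fail" | "skip";
type TaskStatus = "pending" | "passed" | "skipped" | "escalated";

interface TaskState {
  description: string;
  status: TaskStatus;
  attempts: number; // build+test attempts made so far
}

// Hypothetical transition function: returns the next state for one reported outcome.
function reportOutcome(task: TaskState, outcome: Outcome, maxRetries = 3): TaskState {
  // A skip does not count as an attempt.
  if (outcome === "skip") return { ...task, status: "skipped" };

  const attempts = task.attempts + 1;
  if (outcome === "pass") return { ...task, status: "passed", attempts };

  // Failure: stay pending while attempts remain, otherwise escalate to the user.
  const status: TaskStatus = attempts >= maxRetries ? "escalated" : "pending";
  return { ...task, status, attempts };
}
```

Returning a new state object rather than mutating in place makes it straightforward to serialize each step to disk, which fits the persistent-state behavior the README describes.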
@@ -363,7 +451,7 @@ The Plan agent creates plans with mermaid diagrams and hands off to Build. Build
 - [OpenCode](https://opencode.ai) >= 1.0.0
 - Node.js >= 18.0.0
 - Git (for branch/worktree features)
-- [GitHub CLI](https://cli.github.com/) (optional, for `task_finalize` PR creation)
+- [GitHub CLI](https://cli.github.com/) (optional, for `task_finalize` PR creation and `github_*` tools)
 <br>