npm - shipwright-cli - Versions diffs - 3.1.0 → 3.3.0 - Mend

shipwright-cli 3.1.0 → 3.3.0

Files changed (283)

package/.claude/agents/code-reviewer.md +2 -0
package/.claude/agents/devops-engineer.md +2 -0
package/.claude/agents/doc-fleet-agent.md +2 -0
package/.claude/agents/pipeline-agent.md +2 -0
package/.claude/agents/shell-script-specialist.md +2 -0
package/.claude/agents/test-specialist.md +2 -0
package/.claude/hooks/agent-crash-capture.sh +32 -0
package/.claude/hooks/post-tool-use.sh +3 -2
package/.claude/hooks/pre-tool-use.sh +35 -3
package/README.md +22 -8
package/claude-code/hooks/config-change.sh +18 -0
package/claude-code/hooks/instructions-reloaded.sh +7 -0
package/claude-code/hooks/worktree-create.sh +25 -0
package/claude-code/hooks/worktree-remove.sh +20 -0
package/config/code-constitution.json +130 -0
package/config/defaults.json +25 -2
package/config/policy.json +1 -1
package/dashboard/middleware/auth.ts +134 -0
package/dashboard/middleware/constants.ts +21 -0
package/dashboard/public/index.html +8 -6
package/dashboard/public/styles.css +176 -97
package/dashboard/routes/auth.ts +38 -0
package/dashboard/server.ts +117 -25
package/dashboard/services/config.ts +26 -0
package/dashboard/services/db.ts +118 -0
package/dashboard/src/canvas/pixel-agent.ts +298 -0
package/dashboard/src/canvas/pixel-sprites.ts +440 -0
package/dashboard/src/canvas/shipyard-effects.ts +367 -0
package/dashboard/src/canvas/shipyard-scene.ts +616 -0
package/dashboard/src/canvas/submarine-layout.ts +267 -0
package/dashboard/src/components/header.ts +8 -7
package/dashboard/src/core/api.ts +5 -0
package/dashboard/src/core/router.ts +1 -0
package/dashboard/src/design/submarine-theme.ts +253 -0
package/dashboard/src/main.ts +2 -0
package/dashboard/src/types/api.ts +12 -1
package/dashboard/src/views/activity.ts +2 -1
package/dashboard/src/views/metrics.ts +69 -1
package/dashboard/src/views/shipyard.ts +39 -0
package/dashboard/types/index.ts +166 -0
package/docs/plans/2026-02-28-compound-audit-and-shipyard-design.md +186 -0
package/docs/plans/2026-02-28-skipper-shipwright-implementation-plan.md +1182 -0
package/docs/plans/2026-02-28-skipper-shipwright-integration-design.md +531 -0
package/docs/plans/2026-03-01-ai-powered-skill-injection-design.md +298 -0
package/docs/plans/2026-03-01-ai-powered-skill-injection-plan.md +1109 -0
package/docs/plans/2026-03-01-capabilities-cleanup-plan.md +658 -0
package/docs/plans/2026-03-01-clean-architecture-plan.md +924 -0
package/docs/plans/2026-03-01-compound-audit-cascade-design.md +191 -0
package/docs/plans/2026-03-01-compound-audit-cascade-plan.md +921 -0
package/docs/plans/2026-03-01-deep-integration-plan.md +851 -0
package/docs/plans/2026-03-01-pipeline-audit-trail-design.md +145 -0
package/docs/plans/2026-03-01-pipeline-audit-trail-plan.md +770 -0
package/docs/plans/2026-03-01-refined-depths-brand-design.md +382 -0
package/docs/plans/2026-03-01-refined-depths-implementation.md +599 -0
package/docs/plans/2026-03-01-skipper-kernel-integration-design.md +203 -0
package/docs/plans/2026-03-01-unified-platform-design.md +272 -0
package/docs/plans/2026-03-07-claude-code-feature-integration-design.md +189 -0
package/docs/plans/2026-03-07-claude-code-feature-integration-plan.md +1165 -0
package/docs/research/BACKLOG_QUICK_REFERENCE.md +352 -0
package/docs/research/CUTTING_EDGE_RESEARCH_2026.md +546 -0
package/docs/research/RESEARCH_INDEX.md +439 -0
package/docs/research/RESEARCH_SOURCES.md +440 -0
package/docs/research/RESEARCH_SUMMARY.txt +275 -0
package/docs/superpowers/specs/2026-03-10-pipeline-quality-revolution-design.md +341 -0
package/package.json +2 -2
package/scripts/lib/adaptive-model.sh +427 -0
package/scripts/lib/adaptive-timeout.sh +316 -0
package/scripts/lib/audit-trail.sh +309 -0
package/scripts/lib/auto-recovery.sh +471 -0
package/scripts/lib/bandit-selector.sh +431 -0
package/scripts/lib/bootstrap.sh +104 -2
package/scripts/lib/causal-graph.sh +455 -0
package/scripts/lib/compat.sh +126 -0
package/scripts/lib/compound-audit.sh +337 -0
package/scripts/lib/constitutional.sh +454 -0
package/scripts/lib/context-budget.sh +359 -0
package/scripts/lib/convergence.sh +594 -0
package/scripts/lib/cost-optimizer.sh +634 -0
package/scripts/lib/daemon-adaptive.sh +14 -2
package/scripts/lib/daemon-dispatch.sh +106 -17
package/scripts/lib/daemon-failure.sh +34 -4
package/scripts/lib/daemon-patrol.sh +25 -4
package/scripts/lib/daemon-poll-github.sh +361 -0
package/scripts/lib/daemon-poll-health.sh +299 -0
package/scripts/lib/daemon-poll.sh +27 -611
package/scripts/lib/daemon-state.sh +119 -66
package/scripts/lib/daemon-triage.sh +10 -0
package/scripts/lib/dod-scorecard.sh +442 -0
package/scripts/lib/error-actionability.sh +300 -0
package/scripts/lib/formal-spec.sh +461 -0
package/scripts/lib/helpers.sh +180 -5
package/scripts/lib/intent-analysis.sh +409 -0
package/scripts/lib/loop-convergence.sh +350 -0
package/scripts/lib/loop-iteration.sh +682 -0
package/scripts/lib/loop-progress.sh +48 -0
package/scripts/lib/loop-restart.sh +185 -0
package/scripts/lib/memory-effectiveness.sh +506 -0
package/scripts/lib/mutation-executor.sh +352 -0
package/scripts/lib/outcome-feedback.sh +521 -0
package/scripts/lib/pipeline-cli.sh +336 -0
package/scripts/lib/pipeline-commands.sh +1216 -0
package/scripts/lib/pipeline-detection.sh +101 -3
package/scripts/lib/pipeline-execution.sh +897 -0
package/scripts/lib/pipeline-github.sh +28 -3
package/scripts/lib/pipeline-intelligence-compound.sh +431 -0
package/scripts/lib/pipeline-intelligence-scoring.sh +407 -0
package/scripts/lib/pipeline-intelligence-skip.sh +181 -0
package/scripts/lib/pipeline-intelligence.sh +104 -1138
package/scripts/lib/pipeline-quality-bash-compat.sh +182 -0
package/scripts/lib/pipeline-quality-checks.sh +17 -711
package/scripts/lib/pipeline-quality-gates.sh +563 -0
package/scripts/lib/pipeline-stages-build.sh +730 -0
package/scripts/lib/pipeline-stages-delivery.sh +965 -0
package/scripts/lib/pipeline-stages-intake.sh +1133 -0
package/scripts/lib/pipeline-stages-monitor.sh +407 -0
package/scripts/lib/pipeline-stages-review.sh +1022 -0
package/scripts/lib/pipeline-stages.sh +161 -2901
package/scripts/lib/pipeline-state.sh +36 -5
package/scripts/lib/pipeline-util.sh +487 -0
package/scripts/lib/policy-learner.sh +438 -0
package/scripts/lib/process-reward.sh +493 -0
package/scripts/lib/project-detect.sh +649 -0
package/scripts/lib/quality-profile.sh +334 -0
package/scripts/lib/recruit-commands.sh +885 -0
package/scripts/lib/recruit-learning.sh +739 -0
package/scripts/lib/recruit-roles.sh +648 -0
package/scripts/lib/reward-aggregator.sh +458 -0
package/scripts/lib/rl-optimizer.sh +362 -0
package/scripts/lib/root-cause.sh +427 -0
package/scripts/lib/scope-enforcement.sh +445 -0
package/scripts/lib/session-restart.sh +493 -0
package/scripts/lib/skill-memory.sh +300 -0
package/scripts/lib/skill-registry.sh +775 -0
package/scripts/lib/spec-driven.sh +476 -0
package/scripts/lib/test-helpers.sh +18 -7
package/scripts/lib/test-holdout.sh +429 -0
package/scripts/lib/test-optimizer.sh +511 -0
package/scripts/shipwright-file-suggest.sh +45 -0
package/scripts/skills/adversarial-quality.md +61 -0
package/scripts/skills/api-design.md +44 -0
package/scripts/skills/architecture-design.md +50 -0
package/scripts/skills/brainstorming.md +43 -0
package/scripts/skills/data-pipeline.md +44 -0
package/scripts/skills/deploy-safety.md +64 -0
package/scripts/skills/documentation.md +38 -0
package/scripts/skills/frontend-design.md +45 -0
package/scripts/skills/generated/.gitkeep +0 -0
package/scripts/skills/generated/_refinements/.gitkeep +0 -0
package/scripts/skills/generated/_refinements/adversarial-quality.patch.md +3 -0
package/scripts/skills/generated/_refinements/architecture-design.patch.md +3 -0
package/scripts/skills/generated/_refinements/brainstorming.patch.md +3 -0
package/scripts/skills/generated/cli-version-management.md +29 -0
package/scripts/skills/generated/collection-system-validation.md +99 -0
package/scripts/skills/generated/large-scale-c-refactoring-coordination.md +97 -0
package/scripts/skills/generated/pattern-matching-similarity-scoring.md +195 -0
package/scripts/skills/generated/test-parallelization-detection.md +65 -0
package/scripts/skills/observability.md +79 -0
package/scripts/skills/performance.md +48 -0
package/scripts/skills/pr-quality.md +49 -0
package/scripts/skills/product-thinking.md +43 -0
package/scripts/skills/security-audit.md +49 -0
package/scripts/skills/systematic-debugging.md +40 -0
package/scripts/skills/testing-strategy.md +47 -0
package/scripts/skills/two-stage-review.md +52 -0
package/scripts/skills/validation-thoroughness.md +55 -0
package/scripts/sw +9 -3
package/scripts/sw-activity.sh +9 -8
package/scripts/sw-adaptive.sh +8 -7
package/scripts/sw-adversarial.sh +2 -1
package/scripts/sw-architecture-enforcer.sh +3 -1
package/scripts/sw-auth.sh +12 -2
package/scripts/sw-autonomous.sh +5 -1
package/scripts/sw-changelog.sh +4 -1
package/scripts/sw-checkpoint.sh +2 -1
package/scripts/sw-ci.sh +15 -6
package/scripts/sw-cleanup.sh +4 -26
package/scripts/sw-code-review.sh +45 -20
package/scripts/sw-connect.sh +2 -1
package/scripts/sw-context.sh +2 -1
package/scripts/sw-cost.sh +107 -5
package/scripts/sw-daemon.sh +71 -11
package/scripts/sw-dashboard.sh +3 -1
package/scripts/sw-db.sh +71 -20
package/scripts/sw-decide.sh +8 -2
package/scripts/sw-decompose.sh +360 -17
package/scripts/sw-deps.sh +4 -1
package/scripts/sw-developer-simulation.sh +4 -1
package/scripts/sw-discovery.sh +378 -5
package/scripts/sw-doc-fleet.sh +4 -1
package/scripts/sw-docs-agent.sh +3 -1
package/scripts/sw-docs.sh +2 -1
package/scripts/sw-doctor.sh +453 -2
package/scripts/sw-dora.sh +4 -1
package/scripts/sw-durable.sh +12 -7
package/scripts/sw-e2e-orchestrator.sh +17 -16
package/scripts/sw-eventbus.sh +13 -4
package/scripts/sw-evidence.sh +364 -12
package/scripts/sw-feedback.sh +550 -9
package/scripts/sw-fix.sh +20 -1
package/scripts/sw-fleet-discover.sh +6 -2
package/scripts/sw-fleet-viz.sh +9 -4
package/scripts/sw-fleet.sh +5 -1
package/scripts/sw-github-app.sh +18 -4
package/scripts/sw-github-checks.sh +3 -2
package/scripts/sw-github-deploy.sh +3 -2
package/scripts/sw-github-graphql.sh +18 -7
package/scripts/sw-guild.sh +5 -1
package/scripts/sw-heartbeat.sh +5 -30
package/scripts/sw-hello.sh +67 -0
package/scripts/sw-hygiene.sh +10 -3
package/scripts/sw-incident.sh +273 -5
package/scripts/sw-init.sh +18 -2
package/scripts/sw-instrument.sh +10 -2
package/scripts/sw-intelligence.sh +44 -7
package/scripts/sw-jira.sh +5 -1
package/scripts/sw-launchd.sh +2 -1
package/scripts/sw-linear.sh +4 -1
package/scripts/sw-logs.sh +4 -1
package/scripts/sw-loop.sh +436 -1076
package/scripts/sw-memory.sh +357 -3
package/scripts/sw-mission-control.sh +6 -1
package/scripts/sw-model-router.sh +483 -27
package/scripts/sw-otel.sh +15 -4
package/scripts/sw-oversight.sh +14 -5
package/scripts/sw-patrol-meta.sh +334 -0
package/scripts/sw-pipeline-composer.sh +7 -1
package/scripts/sw-pipeline-vitals.sh +12 -6
package/scripts/sw-pipeline.sh +54 -2653
package/scripts/sw-pm.sh +16 -8
package/scripts/sw-pr-lifecycle.sh +2 -1
package/scripts/sw-predictive.sh +17 -5
package/scripts/sw-prep.sh +185 -2
package/scripts/sw-ps.sh +5 -25
package/scripts/sw-public-dashboard.sh +17 -4
package/scripts/sw-quality.sh +14 -6
package/scripts/sw-reaper.sh +8 -25
package/scripts/sw-recruit.sh +156 -2303
package/scripts/sw-regression.sh +19 -12
package/scripts/sw-release-manager.sh +3 -1
package/scripts/sw-release.sh +4 -1
package/scripts/sw-remote.sh +3 -1
package/scripts/sw-replay.sh +7 -1
package/scripts/sw-retro.sh +158 -1
package/scripts/sw-review-rerun.sh +3 -1
package/scripts/sw-scale.sh +14 -5
package/scripts/sw-security-audit.sh +6 -1
package/scripts/sw-self-optimize.sh +173 -6
package/scripts/sw-session.sh +9 -3
package/scripts/sw-setup.sh +3 -1
package/scripts/sw-stall-detector.sh +406 -0
package/scripts/sw-standup.sh +15 -7
package/scripts/sw-status.sh +3 -1
package/scripts/sw-strategic.sh +14 -6
package/scripts/sw-stream.sh +13 -4
package/scripts/sw-swarm.sh +20 -7
package/scripts/sw-team-stages.sh +13 -6
package/scripts/sw-templates.sh +7 -31
package/scripts/sw-testgen.sh +17 -6
package/scripts/sw-tmux-pipeline.sh +4 -1
package/scripts/sw-tmux-role-color.sh +2 -0
package/scripts/sw-tmux-status.sh +1 -1
package/scripts/sw-tmux.sh +37 -1
package/scripts/sw-trace.sh +3 -1
package/scripts/sw-tracker-github.sh +3 -0
package/scripts/sw-tracker-jira.sh +3 -0
package/scripts/sw-tracker-linear.sh +3 -0
package/scripts/sw-tracker.sh +3 -1
package/scripts/sw-triage.sh +3 -2
package/scripts/sw-upgrade.sh +3 -1
package/scripts/sw-ux.sh +5 -2
package/scripts/sw-webhook.sh +5 -2
package/scripts/sw-widgets.sh +9 -4
package/scripts/sw-worktree.sh +15 -3
package/scripts/test-skill-injection.sh +1233 -0
package/templates/pipelines/autonomous.json +27 -3
package/templates/pipelines/cost-aware.json +34 -8
package/templates/pipelines/deployed.json +12 -0
package/templates/pipelines/enterprise.json +12 -0
package/templates/pipelines/fast.json +6 -0
package/templates/pipelines/full.json +27 -3
package/templates/pipelines/hotfix.json +6 -0
package/templates/pipelines/standard.json +12 -0
package/templates/pipelines/tdd.json +12 -0

package/docs/research/RESEARCH_INDEX.md ADDED

@@ -0,0 +1,439 @@
+# Deep Research: Autonomous Coding Systems 2026 - Complete Index
+**Research Date:** April 4, 2026
+**Scope:** Cutting-edge research on autonomous software engineering, dark factories, RL systems, and agent coordination
+**Status:** Complete (65+ sources, 25+ papers, 10 research areas)
+---
+## Quick Start Guide
+### For Product Strategy (15 min read)
+1. Start with: **RESEARCH_SUMMARY.txt** (executive summary)
+2. Skim: **BACKLOG_QUICK_REFERENCE.md** (priority matrix + ROI)
+3. Deep dive: **CUTTING_EDGE_RESEARCH_2026.md** (sections #1-5)
+### For Implementation Planning (30 min read)
+1. Read: **BACKLOG_QUICK_REFERENCE.md** (full roadmap)
+2. Reference: **RESEARCH_SOURCES.md** (key papers per feature)
+3. Deep dive: **CUTTING_EDGE_RESEARCH_2026.md** (specific gap sections)
+### For Architecture Decisions (60 min read)
+1. Read: **CUTTING_EDGE_RESEARCH_2026.md** (entire document)
+2. Cross-reference: **RESEARCH_SOURCES.md** (full URLs for papers)
+3. Apply: **BACKLOG_QUICK_REFERENCE.md** (implementation checklist)
+---
+## Document Overview
+### 1. CUTTING_EDGE_RESEARCH_2026.md (34 KB) ★ PRIMARY REPORT
+**Content:**
+- 10-area competitive analysis (loop patterns, dark factory, RL, memory, verification, testing, cost, self-healing, multi-agent, reasoning)
+- SOTA systems deep-dive with specific examples and benchmarks
+- Shipwright strengths (8 differentiated capabilities)
+- Shipwright gaps (10 specific missing features)
+- 20-item actionable backlog ranked by impact/effort ratio
+- 3-phase 12-week implementation roadmap
+- ROI analysis (5-7x immediate, 3-4x long-term)
+**Best for:** Strategic decisions, identifying gaps, understanding SOTA landscape, implementation planning
+**Key sections:**
+- Section 1: Autonomous Loop Patterns (SWE-agent, geometric dynamics, convergence)
+- Section 2: Dark Factory Model (BCG Platinion, 3-5 engineer factories)
+- Section 3: RL for Code (FunPRM, SecCoderX, DeepSeek-R1)
+- Section 4: Episodic Memory (Mem0, EM-LLM, active compression)
+- Section 5: Formal Verification (DafnyPro, ATLAS, Dafny benchmarks)
+- Section 6: Mutation Testing (Meta ACH, MutGen, diversity)
+- Section 7: Cost Optimization (Google Cascades, routing frameworks)
+- Section 8: Self-Healing CI/CD (Agentic SRE, Pipeline Doctor, MTTR)
+- Section 9: Multi-Agent Coordination (3-role pattern, frameworks, conflicts)
+- Section 10: Reasoning Models (Claude Opus 4.6, o1-pro, DeepSeek-R1)
+---
+### 2. BACKLOG_QUICK_REFERENCE.md (15 KB) ★ ACTIONABLE PRIORITY LIST
+**Content:**
+- Priority matrix (Rank, ID, Feature, Impact, Effort, ROI, Category)
+- Top 8 items with implementation details
+- 12-week phase-based roadmap
+- Implementation checklist
+- Success metrics
+- Dependency graph
+- Cost-benefit analysis
+- Next steps timeline
+**Best for:** Quick decision-making, sprint planning, ROI justification, tracking progress
+**Key sections:**
+- At-a-glance matrix (20 items ranked)
+- Phase 1 items with full implementation guidance (#1, #5, #2 research)
+- Phase 2 items (#3, #6, #13)
+- Phase 3 items (#4, #7, #8)
+- Tier 2 items summary (brief implementation paths)
+- Dependency relationships
+- Post-implementation success metrics
+- Budget and timeline planning
+---
+### 3. RESEARCH_SOURCES.md (16 KB) ★ COMPLETE BIBLIOGRAPHY
+**Content:**
+- 60+ sources organized by research area
+- Complete URLs for every paper, blog, report, tool
+- Key findings extracted from each source
+- Quick link summary grouped by backlog item
+- Total coverage: 25+ academic papers, 15+ industry reports, 10+ GitHub repos
+**Best for:** Finding original sources, deep diving on specific topics, citation, verification
+**Key sections:**
+- Dark Factory & Autonomous Delivery (BCG, Anthropic, GitHub)
+- Autonomous Loop Patterns (SWE-agent, geometric dynamics, benchmarks)
+- RL for Code Generation (FunPRM, SecCoderX, DeepSeek, Meta ACH)
+- Reasoning Models (Claude, OpenAI o1-pro, alignment science)
+- Memory Systems (Mem0, EM-LLM, episodic learning)
+- Formal Verification (DafnyPro, ATLAS, benchmarks)
+- Test Generation & Mutation (Meta, MutGen, LLMorpheus)
+- Cost Optimization (Google Cascades, routing frameworks)
+- Self-Healing CI/CD (Agentic SRE, AIOps, patterns)
+- Multi-Agent Coordination (frameworks, patterns, DORA)
+- Competitive Analysis (SWE-agent, Claude Code, Aider, Cline)
+---
+### 4. RESEARCH_SUMMARY.txt (plaintext, ~5 KB)
+**Content:**
+- Executive summary of all research
+- Key findings by category
+- Competitive analysis summary
+- 20-item backlog summary
+- ROI analysis
+- Implementation roadmap
+- Next steps
+**Best for:** Email distribution, quick briefing, non-Markdown contexts
+---
+## Research Coverage by Topic
+### Autonomous Loop Patterns & Convergence Detection
+**SOTA Systems:**
+- SWE-agent (NeurIPS 2024) — custom ACI, repository primitives
+- Geometric Dynamics paper (arxiv 2512.10350) — formal regime characterization
+- Anthropic 2026 report — convergence triggers via prompt design
+- 220 loops study — stuck detection empirical data
+**Shipwright Status:** Has basic convergence detection; missing formal regime analysis
+**Backlog Item:** #1 (Semantic trajectory analysis) — Week 1
+---
+### Dark Factory / Lights-Out Delivery
+**SOTA Systems:**
+- BCG Platinion (March 2026) — 3-5 engineers, 650+ PRs/month, Spotify/OpenAI cases
+- GitHub Copilot Agent Mode — Issue-to-PR workflow
+- Project Padawan (upcoming) — fully autonomous issue completion
+**Shipwright Status:** Has 12-stage pipeline; missing Intent Specification Engine
+**Backlog Item:** #2 (Intent Specification Engine) — High impact, research phase Week 2
+---
+### Reinforcement Learning for Code Generation
+**SOTA Systems:**
+- FunPRM — function-as-step process rewards, +15-20% completion
+- SecCoderX — vulnerability reward model, secure code RL
+- Meta ACH — 9,095 mutants + 571 tests on 10K classes
+- DeepSeek-R1 — pure RL without SFT, 2,029 Codeforces Elo
+**Shipwright Status:** Has reward aggregation + policy learning; missing vulnerability signals
+**Backlog Items:** #3 (Vulnerability Reward), #6 (Mutation Feedback), #13 (LLM Mutants)
+---
+### Episodic Memory & Long-Context Learning
+**SOTA Systems:**
+- Mem0 — hybrid storage, episodic + semantic layers
+- EM-LLM — Bayesian surprise + graph refinement for episodes
+- MemRL — agents improve via runtime RL on episodic memory
+- Active compression — consolidate episodes → semantic facts
+**Shipwright Status:** Pattern-based memory only; no execution traces
+**Backlog Items:** #4 (Episodic Memory), #12 (Active Compression), #15 (Fleet Learning)
+---
+### Formal Verification & Specification
+**SOTA Systems:**
+- DafnyPro (POPL 2026) — 86% on DafnyBench via Claude
+- ATLAS — 2.7K verified programs, 19K training examples
+- MiniF2F-Dafny — mathematical theorem proving
+- Vericoding benchmark — 27% Lean, 44% Verus, 82% Dafny
+**Shipwright Status:** Tests only; no formal verification
+**Backlog Item:** #11 (Formal Verification) — High effort, niche but high stakes
+---
+### Test Generation & Mutation Testing
+**SOTA Systems:**
+- Meta ACH — LLM-based test generation + mutant generation
+- MutGen — 89.5% mutation score, outperforms EvoSuite
+- LLMorpheus — open-source LLM-based mutation tool
+- GPT-4o mutants — 57 different AST node types vs 2 for rule-based
+**Shipwright Status:** Has testgen; no mutation feedback loop
+**Backlog Items:** #6 (Mutation Loop), #13 (LLM Mutants), #17 (Privacy Mutations)
+---
+### Cost Optimization & Model Routing
+**SOTA Systems:**
+- Google Speculative Cascades — 30-60% cost reduction
+- Unified routing + cascading — theoretically optimal framework
+- CoSine — 23% latency, 32% throughput improvement
+- Smurfs — adaptive speculation length per query
+**Shipwright Status:** Has model routing; no speculative cascading
+**Backlog Item:** #5 (Cascade Routing) — High ROI, Week 1
+---
+### Self-Healing CI/CD & AIOps
+**SOTA Systems:**
+- Agentic SRE pattern — telemetry → reasoning → automation
+- Pipeline Doctor / Interceptor — repair agent on failure
+- LLM-as-a-Judge — standard 2026 quality gate pattern
+- 67% MTTR drop with AIOps; 60% enterprise adoption (Gartner)
+**Shipwright Status:** Has retry logic; no repair agent or secondary validation
+**Backlog Items:** #7 (CI Repair), #8 (Judge), #14 (Anomaly Detection)
+---
+### Multi-Agent Coordination & Orchestration
+**SOTA Systems:**
+- Standard 3-role (Planner, Worker, Judge)
+- Git worktrees now standard isolation
+- MetaGPT, CrewAI, LangGraph, AutoGen frameworks
+- Google DORA 2025: 20-30% faster, 9% bug rate climb
+**Shipwright Status:** Strong multi-agent support; missing conflict resolution + DAG
+**Backlog Items:** #9 (Conflict Detection), #18 (DAG Scheduler)
+---
+### Reasoning Models with Extended/Adaptive Thinking
+**SOTA Systems:**
+- Claude Opus 4.6 — adaptive thinking (dynamic budget)
+- OpenAI o1-pro — $150/$600 per 1M input/output tokens, 200K context, 89th percentile on Codeforces
+- DeepSeek-R1 — 2,029 Elo, MoE architecture
+- Claude Mythos (unreleased) — recursive self-correction
+**Shipwright Status:** Uses extended thinking; missing budget allocation per query type
+**Backlog Item:** #10 (Reasoning Budget Allocation)
+---
+## Competitive Landscape (2026)
+| System             | SWE-bench | Multi-Agent | RL  | Memory | Cost-Opt | Verification | Notes                                   |
+| ------------------ | --------- | ----------- | --- | ------ | -------- | ------------ | --------------------------------------- |
+| **Claude Code**    | 80.9%     | ❌          | ❌  | ❌     | ✓        | ❌           | Highest score, single-agent             |
+| **SWE-agent**      | 40.6%     | ❌          | ❌  | ❌     | ❌       | ❌           | Best ACI design, NeurIPS 2024           |
+| **Aider**          | 49.2%     | ❌          | ❌  | ❌     | ✓✓       | ❌           | 4.2x token efficient                    |
+| **Cline**          | —         | ❌          | ❌  | ❌     | ✓        | ❌           | 500K downloads, IDE integration         |
+| **GitHub Copilot** | —         | ✓           | ❌  | ❌     | ✓        | ❌           | Project Padawan (autonomous)            |
+| **Shipwright**     | —         | ✓✓          | ✓✓  | ✓      | ✓        | ❌           | **UNIQUE: Platform for dark factories** |
+**Shipwright's positioning:** Only full-stack platform combining multi-agent orchestration + RL optimization + memory system + cost intelligence.
+---
+## Implementation Roadmap at a Glance
+```
+PHASE 1 (Weeks 1-4): CONVERGENCE & COST
+  Week 1-2: #1 Semantic trajectory analysis
+  Week 1-2: #5 Speculative cascade routing
+  Week 2+:  #2 Intent Specification (research phase)
+PHASE 2 (Weeks 5-8): SECURITY & TESTING
+  Week 5-6: #3 Vulnerability Reward Model
+  Week 5-6: #6 Mutation Testing Loop
+  Week 7-8: #13 LLM-based Mutants
+PHASE 3 (Weeks 9-12): MEMORY & SELF-HEALING
+  Week 9-10: #4 Episodic Memory Layer
+  Week 9-10: #7 CI Repair Agent
+  Week 11-12: #8 LLM-as-a-Judge
+TIER 2 (Weeks 13-26): LONGER-TERM
+  #2 Intent Specification (full implementation)
+  #9 Conflict Detection + DAG
+  #10 Reasoning Budget Allocation
+  #11 Formal Verification
+  #12 Active Compression
+  #14 Anomaly Detection
+  #15 Fleet Learning
+```
+---
+## Success Metrics (Post-Implementation)
+| Feature             | Metric            | Target       | Current  |
+| ------------------- | ----------------- | ------------ | -------- |
+| #1 Loop convergence | Iteration waste   | -25% to -40% | Baseline |
+| #5 Cascade routing  | Cost              | -40% to -60% | Baseline |
+| #3 Security         | Bug count         | -30% to -40% | Baseline |
+| #4 Episodic memory  | Solution time     | -20% to -35% | Baseline |
+| #6 Mutation testing | Mutation score    | >80%         | ~60%     |
+| #7 CI repair        | Retry cycles      | -50%         | Baseline |
+| **Overall**         | Pipeline success  | >85%         | ~77%     |
+---
+## Investment & ROI
+**Phase 1-2 (8 weeks, 2 engineers):**
+- Cost: $65K (engineering + compute)
+- Return: $320-440K annually
+- ROI: **5-7x**
+**Long-term (26 weeks):**
+- Additional return: $120-155K/year
+- ROI: **3-4x** on incremental investment
+---
+## How to Use These Documents
+### Weekly Strategy Review
+1. Open **BACKLOG_QUICK_REFERENCE.md** → Priority matrix
+2. Check progress against timeline
+3. Update next week's focus
+### Pre-Sprint Planning
+1. Read relevant sections in **CUTTING_EDGE_RESEARCH_2026.md**
+2. Extract implementation details from "Actionable Gap"
+3. Check **RESEARCH_SOURCES.md** for key papers
+### Deep Technical Design
+1. Read full section in **CUTTING_EDGE_RESEARCH_2026.md**
+2. Review all sources in **RESEARCH_SOURCES.md**
+3. Implement checklist from **BACKLOG_QUICK_REFERENCE.md**
+### Competitive Briefing
+1. Share **RESEARCH_SUMMARY.txt** (5 min read)
+2. Reference SOTA systems from specific sections
+3. Deep dive as needed
+---
+## Notes for Implementation
+### Assumptions Made
+- Shipwright has access to Claude API (embedding, reasoning)
+- GitHub Actions integration complete
+- Current pipeline success rate ~77%
+- Monthly compute budget ~$50K
+### Risk Factors
+- Model API availability (o1-pro limited to ChatGPT Pro)
+- DeepSeek-R1 accessibility (China-based, regulatory risk)
+- Formal verification tools (complex integration)
+- RL training stability (exploration vs exploitation tuning)
+### Mitigation Strategies
+- Start with proven patterns (Google Cascades, Meta ACH)
+- Use open-source where possible (DeepSeek-R1, Dafny, Aider)
+- Prototype before full implementation (#2 Intent Engine research phase)
+- A/B test new features (reasoning budgets, cascade routing)
+- Track metrics continuously (DORA, cost, success rate)
+---
+## Questions for Follow-Up
+1. **Dark Factory Ready:** How aggressively should we pursue the Intent Specification Engine (#2)? It's strategic but high-effort.
+2. **Formal Verification:** Is the cryptographic/payment use case common enough to justify #11 (Dafny/Lean integration)?
+3. **Reasoning Models:** Should we wait for Claude Mythos, or start with o1-pro now?
+4. **Priority Trade-offs:** If we can only do 3 items in Phase 1, should we skip #2 (Intent) research and focus on cost/convergence?
+5. **Multi-Agent Safety:** With Google's DORA report showing a 9% climb in bug rate, how heavily should quality gates (#8 Judge) be weighted?
+---
+**Generated:** April 4, 2026
+**Research effort:** 65+ sources, 25+ papers, 10 research areas, 8 hours
+**Next review:** After Phase 1 completion (Week 4)
+---
+## Document Navigation
+- Primary Report: `CUTTING_EDGE_RESEARCH_2026.md`
+- Quick Reference: `BACKLOG_QUICK_REFERENCE.md`
+- Sources: `RESEARCH_SOURCES.md`
+- Summary: `RESEARCH_SUMMARY.txt`
+- This Index: `RESEARCH_INDEX.md` (you are here)
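
The "Investment & ROI" figures in the added RESEARCH_INDEX.md can be sanity-checked with a few lines of arithmetic. The sketch below assumes that "ROI" there means a simple multiple (annual return divided by one-time cost); the file does not define the formula, so this is an assumption. The function name `roi_multiple` is hypothetical and not part of shipwright-cli.

```python
# Sanity check of the Phase 1-2 ROI claim in RESEARCH_INDEX.md.
# Assumption: ROI is a simple multiple, annual return / one-time cost.
# The source file does not state how its "5-7x" figure was derived.


def roi_multiple(cost_usd: float, annual_return_usd: float) -> float:
    """Return the annual return expressed as a multiple of the cost."""
    if cost_usd <= 0:
        raise ValueError("cost_usd must be positive")
    return annual_return_usd / cost_usd


# Figures quoted in the file: $65K cost, $320-440K annual return.
COST_USD = 65_000
RETURN_LOW_USD, RETURN_HIGH_USD = 320_000, 440_000

low = roi_multiple(COST_USD, RETURN_LOW_USD)
high = roi_multiple(COST_USD, RETURN_HIGH_USD)

print(f"{low:.1f}x to {high:.1f}x")  # prints "4.9x to 6.8x"
```

Under that assumption the quoted range works out to roughly 4.9x to 6.8x, which is consistent with the rounded "5-7x" in the file. The long-term "3-4x" figure cannot be checked the same way because the file does not give the incremental investment.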