shipwright-cli 3.2.0 → 3.3.0

Files changed (279)
  1. package/.claude/agents/code-reviewer.md +2 -0
  2. package/.claude/agents/devops-engineer.md +2 -0
  3. package/.claude/agents/doc-fleet-agent.md +2 -0
  4. package/.claude/agents/pipeline-agent.md +2 -0
  5. package/.claude/agents/shell-script-specialist.md +2 -0
  6. package/.claude/agents/test-specialist.md +2 -0
  7. package/.claude/hooks/agent-crash-capture.sh +32 -0
  8. package/.claude/hooks/post-tool-use.sh +3 -2
  9. package/.claude/hooks/pre-tool-use.sh +35 -3
  10. package/README.md +4 -4
  11. package/claude-code/hooks/config-change.sh +18 -0
  12. package/claude-code/hooks/instructions-reloaded.sh +7 -0
  13. package/claude-code/hooks/worktree-create.sh +25 -0
  14. package/claude-code/hooks/worktree-remove.sh +20 -0
  15. package/config/code-constitution.json +130 -0
  16. package/dashboard/middleware/auth.ts +134 -0
  17. package/dashboard/middleware/constants.ts +21 -0
  18. package/dashboard/public/index.html +2 -6
  19. package/dashboard/public/styles.css +100 -97
  20. package/dashboard/routes/auth.ts +38 -0
  21. package/dashboard/server.ts +66 -25
  22. package/dashboard/services/config.ts +26 -0
  23. package/dashboard/services/db.ts +118 -0
  24. package/dashboard/src/canvas/pixel-agent.ts +298 -0
  25. package/dashboard/src/canvas/pixel-sprites.ts +440 -0
  26. package/dashboard/src/canvas/shipyard-effects.ts +367 -0
  27. package/dashboard/src/canvas/shipyard-scene.ts +616 -0
  28. package/dashboard/src/canvas/submarine-layout.ts +267 -0
  29. package/dashboard/src/components/header.ts +8 -7
  30. package/dashboard/src/core/router.ts +1 -0
  31. package/dashboard/src/design/submarine-theme.ts +253 -0
  32. package/dashboard/src/main.ts +2 -0
  33. package/dashboard/src/types/api.ts +2 -1
  34. package/dashboard/src/views/activity.ts +2 -1
  35. package/dashboard/src/views/shipyard.ts +39 -0
  36. package/dashboard/types/index.ts +166 -0
  37. package/docs/plans/2026-02-28-compound-audit-and-shipyard-design.md +186 -0
  38. package/docs/plans/2026-02-28-skipper-shipwright-implementation-plan.md +1182 -0
  39. package/docs/plans/2026-02-28-skipper-shipwright-integration-design.md +531 -0
  40. package/docs/plans/2026-03-01-ai-powered-skill-injection-design.md +298 -0
  41. package/docs/plans/2026-03-01-ai-powered-skill-injection-plan.md +1109 -0
  42. package/docs/plans/2026-03-01-capabilities-cleanup-plan.md +658 -0
  43. package/docs/plans/2026-03-01-clean-architecture-plan.md +924 -0
  44. package/docs/plans/2026-03-01-compound-audit-cascade-design.md +191 -0
  45. package/docs/plans/2026-03-01-compound-audit-cascade-plan.md +921 -0
  46. package/docs/plans/2026-03-01-deep-integration-plan.md +851 -0
  47. package/docs/plans/2026-03-01-pipeline-audit-trail-design.md +145 -0
  48. package/docs/plans/2026-03-01-pipeline-audit-trail-plan.md +770 -0
  49. package/docs/plans/2026-03-01-refined-depths-brand-design.md +382 -0
  50. package/docs/plans/2026-03-01-refined-depths-implementation.md +599 -0
  51. package/docs/plans/2026-03-01-skipper-kernel-integration-design.md +203 -0
  52. package/docs/plans/2026-03-01-unified-platform-design.md +272 -0
  53. package/docs/plans/2026-03-07-claude-code-feature-integration-design.md +189 -0
  54. package/docs/plans/2026-03-07-claude-code-feature-integration-plan.md +1165 -0
  55. package/docs/research/BACKLOG_QUICK_REFERENCE.md +352 -0
  56. package/docs/research/CUTTING_EDGE_RESEARCH_2026.md +546 -0
  57. package/docs/research/RESEARCH_INDEX.md +439 -0
  58. package/docs/research/RESEARCH_SOURCES.md +440 -0
  59. package/docs/research/RESEARCH_SUMMARY.txt +275 -0
  60. package/docs/superpowers/specs/2026-03-10-pipeline-quality-revolution-design.md +341 -0
  61. package/package.json +2 -2
  62. package/scripts/lib/adaptive-model.sh +427 -0
  63. package/scripts/lib/adaptive-timeout.sh +316 -0
  64. package/scripts/lib/audit-trail.sh +309 -0
  65. package/scripts/lib/auto-recovery.sh +471 -0
  66. package/scripts/lib/bandit-selector.sh +431 -0
  67. package/scripts/lib/bootstrap.sh +104 -2
  68. package/scripts/lib/causal-graph.sh +455 -0
  69. package/scripts/lib/compat.sh +126 -0
  70. package/scripts/lib/compound-audit.sh +337 -0
  71. package/scripts/lib/constitutional.sh +454 -0
  72. package/scripts/lib/context-budget.sh +359 -0
  73. package/scripts/lib/convergence.sh +594 -0
  74. package/scripts/lib/cost-optimizer.sh +634 -0
  75. package/scripts/lib/daemon-adaptive.sh +10 -0
  76. package/scripts/lib/daemon-dispatch.sh +106 -17
  77. package/scripts/lib/daemon-failure.sh +34 -4
  78. package/scripts/lib/daemon-patrol.sh +23 -2
  79. package/scripts/lib/daemon-poll-github.sh +361 -0
  80. package/scripts/lib/daemon-poll-health.sh +299 -0
  81. package/scripts/lib/daemon-poll.sh +27 -611
  82. package/scripts/lib/daemon-state.sh +112 -66
  83. package/scripts/lib/daemon-triage.sh +10 -0
  84. package/scripts/lib/dod-scorecard.sh +442 -0
  85. package/scripts/lib/error-actionability.sh +300 -0
  86. package/scripts/lib/formal-spec.sh +461 -0
  87. package/scripts/lib/helpers.sh +177 -4
  88. package/scripts/lib/intent-analysis.sh +409 -0
  89. package/scripts/lib/loop-convergence.sh +350 -0
  90. package/scripts/lib/loop-iteration.sh +682 -0
  91. package/scripts/lib/loop-progress.sh +48 -0
  92. package/scripts/lib/loop-restart.sh +185 -0
  93. package/scripts/lib/memory-effectiveness.sh +506 -0
  94. package/scripts/lib/mutation-executor.sh +352 -0
  95. package/scripts/lib/outcome-feedback.sh +521 -0
  96. package/scripts/lib/pipeline-cli.sh +336 -0
  97. package/scripts/lib/pipeline-commands.sh +1216 -0
  98. package/scripts/lib/pipeline-detection.sh +100 -2
  99. package/scripts/lib/pipeline-execution.sh +897 -0
  100. package/scripts/lib/pipeline-github.sh +28 -3
  101. package/scripts/lib/pipeline-intelligence-compound.sh +431 -0
  102. package/scripts/lib/pipeline-intelligence-scoring.sh +407 -0
  103. package/scripts/lib/pipeline-intelligence-skip.sh +181 -0
  104. package/scripts/lib/pipeline-intelligence.sh +100 -1136
  105. package/scripts/lib/pipeline-quality-bash-compat.sh +182 -0
  106. package/scripts/lib/pipeline-quality-checks.sh +17 -715
  107. package/scripts/lib/pipeline-quality-gates.sh +563 -0
  108. package/scripts/lib/pipeline-stages-build.sh +730 -0
  109. package/scripts/lib/pipeline-stages-delivery.sh +965 -0
  110. package/scripts/lib/pipeline-stages-intake.sh +1133 -0
  111. package/scripts/lib/pipeline-stages-monitor.sh +407 -0
  112. package/scripts/lib/pipeline-stages-review.sh +1022 -0
  113. package/scripts/lib/pipeline-stages.sh +59 -2929
  114. package/scripts/lib/pipeline-state.sh +36 -5
  115. package/scripts/lib/pipeline-util.sh +487 -0
  116. package/scripts/lib/policy-learner.sh +438 -0
  117. package/scripts/lib/process-reward.sh +493 -0
  118. package/scripts/lib/project-detect.sh +649 -0
  119. package/scripts/lib/quality-profile.sh +334 -0
  120. package/scripts/lib/recruit-commands.sh +885 -0
  121. package/scripts/lib/recruit-learning.sh +739 -0
  122. package/scripts/lib/recruit-roles.sh +648 -0
  123. package/scripts/lib/reward-aggregator.sh +458 -0
  124. package/scripts/lib/rl-optimizer.sh +362 -0
  125. package/scripts/lib/root-cause.sh +427 -0
  126. package/scripts/lib/scope-enforcement.sh +445 -0
  127. package/scripts/lib/session-restart.sh +493 -0
  128. package/scripts/lib/skill-memory.sh +300 -0
  129. package/scripts/lib/skill-registry.sh +775 -0
  130. package/scripts/lib/spec-driven.sh +476 -0
  131. package/scripts/lib/test-helpers.sh +18 -7
  132. package/scripts/lib/test-holdout.sh +429 -0
  133. package/scripts/lib/test-optimizer.sh +511 -0
  134. package/scripts/shipwright-file-suggest.sh +45 -0
  135. package/scripts/skills/adversarial-quality.md +61 -0
  136. package/scripts/skills/api-design.md +44 -0
  137. package/scripts/skills/architecture-design.md +50 -0
  138. package/scripts/skills/brainstorming.md +43 -0
  139. package/scripts/skills/data-pipeline.md +44 -0
  140. package/scripts/skills/deploy-safety.md +64 -0
  141. package/scripts/skills/documentation.md +38 -0
  142. package/scripts/skills/frontend-design.md +45 -0
  143. package/scripts/skills/generated/.gitkeep +0 -0
  144. package/scripts/skills/generated/_refinements/.gitkeep +0 -0
  145. package/scripts/skills/generated/_refinements/adversarial-quality.patch.md +3 -0
  146. package/scripts/skills/generated/_refinements/architecture-design.patch.md +3 -0
  147. package/scripts/skills/generated/_refinements/brainstorming.patch.md +3 -0
  148. package/scripts/skills/generated/cli-version-management.md +29 -0
  149. package/scripts/skills/generated/collection-system-validation.md +99 -0
  150. package/scripts/skills/generated/large-scale-c-refactoring-coordination.md +97 -0
  151. package/scripts/skills/generated/pattern-matching-similarity-scoring.md +195 -0
  152. package/scripts/skills/generated/test-parallelization-detection.md +65 -0
  153. package/scripts/skills/observability.md +79 -0
  154. package/scripts/skills/performance.md +48 -0
  155. package/scripts/skills/pr-quality.md +49 -0
  156. package/scripts/skills/product-thinking.md +43 -0
  157. package/scripts/skills/security-audit.md +49 -0
  158. package/scripts/skills/systematic-debugging.md +40 -0
  159. package/scripts/skills/testing-strategy.md +47 -0
  160. package/scripts/skills/two-stage-review.md +52 -0
  161. package/scripts/skills/validation-thoroughness.md +55 -0
  162. package/scripts/sw +9 -3
  163. package/scripts/sw-activity.sh +9 -2
  164. package/scripts/sw-adaptive.sh +2 -1
  165. package/scripts/sw-adversarial.sh +2 -1
  166. package/scripts/sw-architecture-enforcer.sh +3 -1
  167. package/scripts/sw-auth.sh +12 -2
  168. package/scripts/sw-autonomous.sh +5 -1
  169. package/scripts/sw-changelog.sh +4 -1
  170. package/scripts/sw-checkpoint.sh +2 -1
  171. package/scripts/sw-ci.sh +5 -1
  172. package/scripts/sw-cleanup.sh +4 -26
  173. package/scripts/sw-code-review.sh +10 -4
  174. package/scripts/sw-connect.sh +2 -1
  175. package/scripts/sw-context.sh +2 -1
  176. package/scripts/sw-cost.sh +48 -3
  177. package/scripts/sw-daemon.sh +66 -9
  178. package/scripts/sw-dashboard.sh +3 -1
  179. package/scripts/sw-db.sh +59 -16
  180. package/scripts/sw-decide.sh +8 -2
  181. package/scripts/sw-decompose.sh +360 -17
  182. package/scripts/sw-deps.sh +4 -1
  183. package/scripts/sw-developer-simulation.sh +4 -1
  184. package/scripts/sw-discovery.sh +325 -2
  185. package/scripts/sw-doc-fleet.sh +4 -1
  186. package/scripts/sw-docs-agent.sh +3 -1
  187. package/scripts/sw-docs.sh +2 -1
  188. package/scripts/sw-doctor.sh +453 -2
  189. package/scripts/sw-dora.sh +4 -1
  190. package/scripts/sw-durable.sh +4 -3
  191. package/scripts/sw-e2e-orchestrator.sh +17 -16
  192. package/scripts/sw-eventbus.sh +7 -1
  193. package/scripts/sw-evidence.sh +364 -12
  194. package/scripts/sw-feedback.sh +550 -9
  195. package/scripts/sw-fix.sh +20 -1
  196. package/scripts/sw-fleet-discover.sh +6 -2
  197. package/scripts/sw-fleet-viz.sh +4 -1
  198. package/scripts/sw-fleet.sh +5 -1
  199. package/scripts/sw-github-app.sh +16 -3
  200. package/scripts/sw-github-checks.sh +3 -2
  201. package/scripts/sw-github-deploy.sh +3 -2
  202. package/scripts/sw-github-graphql.sh +18 -7
  203. package/scripts/sw-guild.sh +5 -1
  204. package/scripts/sw-heartbeat.sh +5 -30
  205. package/scripts/sw-hello.sh +67 -0
  206. package/scripts/sw-hygiene.sh +6 -1
  207. package/scripts/sw-incident.sh +265 -1
  208. package/scripts/sw-init.sh +18 -2
  209. package/scripts/sw-instrument.sh +10 -2
  210. package/scripts/sw-intelligence.sh +42 -6
  211. package/scripts/sw-jira.sh +5 -1
  212. package/scripts/sw-launchd.sh +2 -1
  213. package/scripts/sw-linear.sh +4 -1
  214. package/scripts/sw-logs.sh +4 -1
  215. package/scripts/sw-loop.sh +432 -1128
  216. package/scripts/sw-memory.sh +356 -2
  217. package/scripts/sw-mission-control.sh +6 -1
  218. package/scripts/sw-model-router.sh +481 -26
  219. package/scripts/sw-otel.sh +13 -4
  220. package/scripts/sw-oversight.sh +14 -5
  221. package/scripts/sw-patrol-meta.sh +334 -0
  222. package/scripts/sw-pipeline-composer.sh +5 -1
  223. package/scripts/sw-pipeline-vitals.sh +2 -1
  224. package/scripts/sw-pipeline.sh +53 -2664
  225. package/scripts/sw-pm.sh +12 -5
  226. package/scripts/sw-pr-lifecycle.sh +2 -1
  227. package/scripts/sw-predictive.sh +7 -1
  228. package/scripts/sw-prep.sh +185 -2
  229. package/scripts/sw-ps.sh +5 -25
  230. package/scripts/sw-public-dashboard.sh +15 -3
  231. package/scripts/sw-quality.sh +2 -1
  232. package/scripts/sw-reaper.sh +8 -25
  233. package/scripts/sw-recruit.sh +156 -2303
  234. package/scripts/sw-regression.sh +19 -12
  235. package/scripts/sw-release-manager.sh +3 -1
  236. package/scripts/sw-release.sh +4 -1
  237. package/scripts/sw-remote.sh +3 -1
  238. package/scripts/sw-replay.sh +7 -1
  239. package/scripts/sw-retro.sh +158 -1
  240. package/scripts/sw-review-rerun.sh +3 -1
  241. package/scripts/sw-scale.sh +10 -3
  242. package/scripts/sw-security-audit.sh +6 -1
  243. package/scripts/sw-self-optimize.sh +6 -3
  244. package/scripts/sw-session.sh +9 -3
  245. package/scripts/sw-setup.sh +3 -1
  246. package/scripts/sw-stall-detector.sh +406 -0
  247. package/scripts/sw-standup.sh +15 -7
  248. package/scripts/sw-status.sh +3 -1
  249. package/scripts/sw-strategic.sh +4 -1
  250. package/scripts/sw-stream.sh +7 -1
  251. package/scripts/sw-swarm.sh +18 -6
  252. package/scripts/sw-team-stages.sh +13 -6
  253. package/scripts/sw-templates.sh +5 -29
  254. package/scripts/sw-testgen.sh +7 -1
  255. package/scripts/sw-tmux-pipeline.sh +4 -1
  256. package/scripts/sw-tmux-role-color.sh +2 -0
  257. package/scripts/sw-tmux-status.sh +1 -1
  258. package/scripts/sw-tmux.sh +3 -1
  259. package/scripts/sw-trace.sh +3 -1
  260. package/scripts/sw-tracker-github.sh +3 -0
  261. package/scripts/sw-tracker-jira.sh +3 -0
  262. package/scripts/sw-tracker-linear.sh +3 -0
  263. package/scripts/sw-tracker.sh +3 -1
  264. package/scripts/sw-triage.sh +2 -1
  265. package/scripts/sw-upgrade.sh +3 -1
  266. package/scripts/sw-ux.sh +5 -2
  267. package/scripts/sw-webhook.sh +3 -1
  268. package/scripts/sw-widgets.sh +3 -1
  269. package/scripts/sw-worktree.sh +15 -3
  270. package/scripts/test-skill-injection.sh +1233 -0
  271. package/templates/pipelines/autonomous.json +27 -3
  272. package/templates/pipelines/cost-aware.json +34 -8
  273. package/templates/pipelines/deployed.json +12 -0
  274. package/templates/pipelines/enterprise.json +12 -0
  275. package/templates/pipelines/fast.json +6 -0
  276. package/templates/pipelines/full.json +27 -3
  277. package/templates/pipelines/hotfix.json +6 -0
  278. package/templates/pipelines/standard.json +12 -0
  279. package/templates/pipelines/tdd.json +12 -0
package/docs/research/RESEARCH_INDEX.md
@@ -0,0 +1,439 @@
1
+ # Deep Research: Autonomous Coding Systems 2026 - Complete Index
2
+
3
+ **Research Date:** April 4, 2026
4
+ **Scope:** Cutting-edge research on autonomous software engineering, dark factories, RL systems, and agent coordination
5
+ **Status:** Complete (65 sources, 25+ papers, 10 research areas)
6
+
7
+ ---
8
+
9
+ ## Quick Start Guide
10
+
11
+ ### For Product Strategy (15 min read)
12
+
13
+ 1. Start with: **RESEARCH_SUMMARY.txt** (executive summary)
14
+ 2. Skim: **BACKLOG_QUICK_REFERENCE.md** (priority matrix + ROI)
15
+ 3. Deep dive: **CUTTING_EDGE_RESEARCH_2026.md** (sections #1-5)
16
+
17
+ ### For Implementation Planning (30 min read)
18
+
19
+ 1. Read: **BACKLOG_QUICK_REFERENCE.md** (full roadmap)
20
+ 2. Reference: **RESEARCH_SOURCES.md** (key papers per feature)
21
+ 3. Deep dive: **CUTTING_EDGE_RESEARCH_2026.md** (specific gap sections)
22
+
23
+ ### For Architecture Decisions (60 min read)
24
+
25
+ 1. Read: **CUTTING_EDGE_RESEARCH_2026.md** (entire document)
26
+ 2. Cross-reference: **RESEARCH_SOURCES.md** (full URLs for papers)
27
+ 3. Apply: **BACKLOG_QUICK_REFERENCE.md** (implementation checklist)
28
+
29
+ ---
30
+
31
+ ## Document Overview
32
+
33
+ ### 1. CUTTING_EDGE_RESEARCH_2026.md (34 KB) ★ PRIMARY REPORT
34
+
35
+ **Content:**
36
+
37
+ - 10-area competitive analysis (loop patterns, dark factory, RL, memory, verification, testing, cost, self-healing, multi-agent, reasoning)
38
+ - SOTA systems deep-dive with specific examples and benchmarks
39
+ - Shipwright strengths (8 differentiated capabilities)
40
+ - Shipwright gaps (10 specific missing features)
41
+ - 20-item actionable backlog ranked by impact/effort ratio
42
+ - 3-phase 12-week implementation roadmap
43
+ - ROI analysis (5-7x immediate, 3-4x long-term)
44
+
45
+ **Best for:** Strategic decisions, identifying gaps, understanding SOTA landscape, implementation planning
46
+
47
+ **Key sections:**
48
+
49
+ - Section 1: Autonomous Loop Patterns (SWE-agent, geometric dynamics, convergence)
50
+ - Section 2: Dark Factory Model (BCG Platinion, 3-5 engineer factories)
51
+ - Section 3: RL for Code (FunPRM, SecCoderX, DeepSeek-R1)
52
+ - Section 4: Episodic Memory (Mem0, EM-LLM, active compression)
53
+ - Section 5: Formal Verification (DafnyPro, ATLAS, Dafny benchmarks)
54
+ - Section 6: Mutation Testing (Meta ACH, MutGen, diversity)
55
+ - Section 7: Cost Optimization (Google Cascades, routing frameworks)
56
+ - Section 8: Self-Healing CI/CD (Agentic SRE, Pipeline Doctor, MTTR)
57
+ - Section 9: Multi-Agent Coordination (3-role pattern, frameworks, conflicts)
58
+ - Section 10: Reasoning Models (Claude Opus 4.6, o1-pro, DeepSeek-R1)
59
+
60
+ ---
61
+
62
+ ### 2. BACKLOG_QUICK_REFERENCE.md (15 KB) ★ ACTIONABLE PRIORITY LIST
63
+
64
+ **Content:**
65
+
66
+ - Priority matrix (Rank, ID, Feature, Impact, Effort, ROI, Category)
67
+ - Top 8 items with implementation details
68
+ - 12-week phase-based roadmap
69
+ - Implementation checklist
70
+ - Success metrics
71
+ - Dependency graph
72
+ - Cost-benefit analysis
73
+ - Next steps timeline
74
+
75
+ **Best for:** Quick decision-making, sprint planning, ROI justification, tracking progress
76
+
77
+ **Key sections:**
78
+
79
+ - At-a-glance matrix (20 items ranked)
80
+ - Phase 1 items with full implementation guidance (#1, #5, #2 research)
81
+ - Phase 2 items (#3, #6, #13)
82
+ - Phase 3 items (#4, #7, #8)
83
+ - Tier 2 items summary (brief implementation paths)
84
+ - Dependency relationships
85
+ - Post-implementation success metrics
86
+ - Budget and timeline planning
87
+
88
+ ---
89
+
90
+ ### 3. RESEARCH_SOURCES.md (16 KB) ★ COMPLETE BIBLIOGRAPHY
91
+
92
+ **Content:**
93
+
94
+ - 60+ sources organized by research area
95
+ - Complete URLs for every paper, blog, report, tool
96
+ - Key findings extracted from each source
97
+ - Quick link summary grouped by backlog item
98
+ - Total coverage: 25+ academic papers, 15+ industry reports, 10+ GitHub repos
99
+
100
+ **Best for:** Finding original sources, deep diving on specific topics, citation, verification
101
+
102
+ **Key sections:**
103
+
104
+ - Dark Factory & Autonomous Delivery (BCG, Anthropic, GitHub)
105
+ - Autonomous Loop Patterns (SWE-agent, geometric dynamics, benchmarks)
106
+ - RL for Code Generation (FunPRM, SecCoderX, DeepSeek, Meta ACH)
107
+ - Reasoning Models (Claude, OpenAI o1-pro, alignment science)
108
+ - Memory Systems (Mem0, EM-LLM, episodic learning)
109
+ - Formal Verification (DafnyPro, ATLAS, benchmarks)
110
+ - Test Generation & Mutation (Meta, MutGen, LLMorpheus)
111
+ - Cost Optimization (Google Cascades, routing frameworks)
112
+ - Self-Healing CI/CD (Agentic SRE, AIOps, patterns)
113
+ - Multi-Agent Coordination (frameworks, patterns, DORA)
114
+ - Competitive Analysis (SWE-agent, Claude Code, Aider, Cline)
115
+
116
+ ---
117
+
118
+ ### 4. RESEARCH_SUMMARY.txt (plaintext, ~5 KB)
119
+
120
+ **Content:**
121
+
122
+ - Executive summary of all research
123
+ - Key findings by category
124
+ - Competitive analysis summary
125
+ - 20-item backlog summary
126
+ - ROI analysis
127
+ - Implementation roadmap
128
+ - Next steps
129
+
130
+ **Best for:** Email distribution, quick briefing, non-Markdown contexts
131
+
132
+ ---
133
+
134
+ ## Research Coverage by Topic
135
+
136
+ ### Autonomous Loop Patterns & Convergence Detection
137
+
138
+ **SOTA Systems:**
139
+
140
+ - SWE-agent (NeurIPS 2024) — custom agent-computer interface (ACI), repository primitives
141
+ - Geometric Dynamics paper (arxiv 2512.10350) — formal regime characterization
142
+ - Anthropic 2026 report — convergence triggers via prompt design
143
+ - 220 loops study — stuck detection empirical data
144
+
145
+ **Shipwright Status:** Has basic convergence detection; missing formal regime analysis
146
+
147
+ **Backlog Item:** #1 (Semantic trajectory analysis) — Week 1
148
+
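The stuck-detection idea above can be sketched as a similarity check over consecutive iteration outputs. This is a minimal lexical stand-in, assuming each loop iteration yields a text summary; it is not Shipwright's convergence logic and not the semantic trajectory analysis proposed in backlog item #1. The function name, `window`, and `threshold` are hypothetical.

```python
from difflib import SequenceMatcher


def is_stuck(outputs: list[str], window: int = 3, threshold: float = 0.95) -> bool:
    """Return True when the last `window` consecutive output pairs are near-identical."""
    if len(outputs) < window + 1:
        return False  # not enough history to judge
    recent = outputs[-(window + 1):]
    ratios = [
        SequenceMatcher(None, prev, curr).ratio()
        for prev, curr in zip(recent, recent[1:])
    ]
    return all(r >= threshold for r in ratios)


# Hypothetical iteration summaries: the loop repeats "retry build" four times.
history = ["fix import", "fix import, add test",
           "retry build", "retry build", "retry build", "retry build"]
print(is_stuck(history))      # the last three transitions are identical
print(is_stuck(history[:3]))  # too little history to decide
```

A semantic variant would swap `SequenceMatcher` for an embedding distance, leaving the windowing unchanged.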
149
+ ---
150
+
151
+ ### Dark Factory / Lights-Out Delivery
152
+
153
+ **SOTA Systems:**
154
+
155
+ - BCG Platinion (March 2026) — 3-5 engineers, 650+ PRs/month, Spotify/OpenAI cases
156
+ - GitHub Copilot Agent Mode — Issue-to-PR workflow
157
+ - Project Padawan (upcoming) — fully autonomous issue completion
158
+
159
+ **Shipwright Status:** Has 12-stage pipeline; missing Intent Specification Engine
160
+
161
+ **Backlog Item:** #2 (Intent Specification Engine) — High impact, research phase Week 2
162
+
163
+ ---
164
+
165
+ ### Reinforcement Learning for Code Generation
166
+
167
+ **SOTA Systems:**
168
+
169
+ - FunPRM — function-as-step process rewards, +15-20% completion
170
+ - SecCoderX — vulnerability reward model, secure code RL
171
+ - Meta ACH — 9,095 mutants + 571 tests on 10K classes
172
+ - DeepSeek-R1 — pure RL without SFT, 2,029 Codeforces Elo
173
+
174
+ **Shipwright Status:** Has reward aggregation + policy learning; missing vulnerability signals
175
+
176
+ **Backlog Items:** #3 (Vulnerability Reward), #6 (Mutation Feedback), #13 (LLM Mutants)
177
+
178
+ ---
179
+
180
+ ### Episodic Memory & Long-Context Learning
181
+
182
+ **SOTA Systems:**
183
+
184
+ - Mem0 — hybrid storage, episodic + semantic layers
185
+ - EM-LLM — Bayesian surprise + graph refinement for episodes
186
+ - MemRL — agents improve via runtime RL on episodic memory
187
+ - Active compression — consolidate episodes → semantic facts
188
+
189
+ **Shipwright Status:** Pattern-based memory only; no execution traces
190
+
191
+ **Backlog Items:** #4 (Episodic Memory), #12 (Active Compression), #15 (Fleet Learning)
192
+
193
+ ---
194
+
195
+ ### Formal Verification & Specification
196
+
197
+ **SOTA Systems:**
198
+
199
+ - DafnyPro (POPL 2026) — 86% on DafnyBench via Claude
200
+ - ATLAS — 2.7K verified programs, 19K training examples
201
+ - MiniF2F-Dafny — mathematical theorem proving
202
+ - Vericoding benchmark — 27% Lean, 44% Verus, 82% Dafny
203
+
204
+ **Shipwright Status:** Tests only; no formal verification
205
+
206
+ **Backlog Item:** #11 (Formal Verification) — High effort, niche but high stakes
207
+
208
+ ---
209
+
210
+ ### Test Generation & Mutation Testing
211
+
212
+ **SOTA Systems:**
213
+
214
+ - Meta ACH — LLM-based test generation + mutant generation
215
+ - MutGen — 89.5% mutation score, outperforms EvoSuite
216
+ - LLMorpheus — open-source LLM-based mutation tool
217
+ - GPT-4o mutants — 57 different AST node types vs 2 for rule-based
218
+
219
+ **Shipwright Status:** Has testgen; no mutation feedback loop
220
+
221
+ **Backlog Items:** #6 (Mutation Loop), #13 (LLM Mutants), #17 (Privacy Mutations)
222
+
223
+ ---
224
+
225
+ ### Cost Optimization & Model Routing
226
+
227
+ **SOTA Systems:**
228
+
229
+ - Google Speculative Cascades — 30-60% cost reduction
230
+ - Unified routing + cascading — theoretically optimal framework
231
+ - CoSine — 23% latency, 32% throughput improvement
232
+ - Smurfs — adaptive speculation length per query
233
+
234
+ **Shipwright Status:** Has model routing; no speculative cascading
235
+
236
+ **Backlog Item:** #5 (Cascade Routing) — High ROI, Week 1
237
+
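The gap noted above (model routing without cascading) can be illustrated with a plain confidence-threshold cascade: try the cheapest tier first and escalate only when the reply does not clear a bar. This is a sketch under stated assumptions. It does not attempt the speculative part of the cited approach, and the `Reply` type, the stub models, and the 0.8 threshold are all hypothetical.

```python
from typing import Callable, NamedTuple


class Reply(NamedTuple):
    text: str
    confidence: float  # assumed to be a score in [0, 1]


Model = Callable[[str], Reply]


def cascade(prompt: str, tiers: list[tuple[str, Model]],
            threshold: float = 0.8) -> tuple[str, Reply]:
    """Try tiers cheapest-first; return the first reply that clears the threshold."""
    if not tiers:
        raise ValueError("at least one tier is required")
    for name, model in tiers:
        reply = model(prompt)
        if reply.confidence >= threshold:
            return name, reply
    return name, reply  # nothing cleared the bar: keep the last tier's reply


# Hypothetical stub models standing in for real API calls.
def small(prompt: str) -> Reply:
    return Reply("draft", 0.9 if len(prompt) < 20 else 0.4)


def large(prompt: str) -> Reply:
    return Reply("careful answer", 0.95)


tiers = [("small", small), ("large", large)]
print(cascade("rename a variable", tiers)[0])
print(cascade("refactor the pipeline state machine", tiers)[0])
```

The cost saving in such a scheme depends on how often the cheap tier clears the threshold, so the threshold is the main knob to A/B test.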
238
+ ---
239
+
240
+ ### Self-Healing CI/CD & AIOps
241
+
242
+ **SOTA Systems:**
243
+
244
+ - Agentic SRE pattern — telemetry → reasoning → automation
245
+ - Pipeline Doctor / Interceptor — repair agent on failure
246
+ - LLM-as-a-Judge — standard 2026 quality gate pattern
247
+ - 67% MTTR drop with AIOps; 60% enterprise adoption (Gartner)
248
+
249
+ **Shipwright Status:** Has retry logic; no repair agent or secondary validation
250
+
251
+ **Backlog Items:** #7 (CI Repair), #8 (Judge), #14 (Anomaly Detection)
252
+
253
+ ---
254
+
255
+ ### Multi-Agent Coordination & Orchestration
256
+
257
+ **SOTA Systems:**
258
+
259
+ - Standard 3-role (Planner, Worker, Judge)
260
+ - Git worktrees now standard isolation
261
+ - MetaGPT, CrewAI, LangGraph, AutoGen frameworks
262
+ - Google DORA 2025: 20-30% faster, 9% bug rate climb
263
+
264
+ **Shipwright Status:** Strong multi-agent support; missing conflict resolution + DAG
265
+
266
+ **Backlog Items:** #9 (Conflict Detection), #18 (DAG Scheduler)
267
+
268
+ ---
269
+
270
+ ### Reasoning Models with Extended/Adaptive Thinking
271
+
272
+ **SOTA Systems:**
273
+
274
+ - Claude Opus 4.6 — adaptive thinking (dynamic budget)
275
+ - OpenAI o1-pro — $150/$600 per 1M input/output tokens, 200K context, 89th percentile on Codeforces
276
+ - DeepSeek-R1 — 2,029 Elo, MoE architecture
277
+ - Claude Mythos (unreleased) — recursive self-correction
278
+
279
+ **Shipwright Status:** Uses extended thinking; missing budget allocation per query type
280
+
281
+ **Backlog Item:** #10 (Reasoning Budget Allocation)
282
+
283
+ ---
284
+
285
+ ## Competitive Landscape (2026)
286
+
287
+ | System | SWE-bench | Multi-Agent | RL | Memory | Cost-Opt | Verification | Notes |
288
+ | ------------------ | --------- | ----------- | --- | ------ | -------- | ------------ | --------------------------------------- |
289
+ | **Claude Code** | 80.9% | ❌ | ❌ | ❌ | ✓ | ❌ | Highest score, single-agent |
290
+ | **SWE-agent** | 40.6% | ❌ | ❌ | ❌ | ❌ | ❌ | Best ACI design, NeurIPS 2024 |
291
+ | **Aider** | 49.2% | ❌ | ❌ | ❌ | ✓✓ | ❌ | 4.2x token efficient |
292
+ | **Cline** | — | ❌ | ❌ | ❌ | ✓ | ❌ | 500K downloads, IDE integration |
293
+ | **GitHub Copilot** | — | ✓ | ❌ | ❌ | ✓ | ❌ | Project Padawan (autonomous) |
294
+ | **Shipwright** | — | ✓✓ | ✓✓ | ✓ | ✓ | ❌ | **UNIQUE: Platform for dark factories** |
295
+
296
+ **Shipwright's positioning:** Only full-stack platform combining multi-agent orchestration + RL optimization + memory system + cost intelligence.
297
+
298
+ ---
299
+
300
+ ## Implementation Roadmap at a Glance
301
+
302
+ ```
303
+ PHASE 1 (Weeks 1-4): CONVERGENCE & COST
304
+ Week 1-2: #1 Semantic trajectory analysis
305
+ Week 1-2: #5 Speculative cascade routing
306
+ Week 2+: #2 Intent Specification (research phase)
307
+
308
+ PHASE 2 (Weeks 5-8): SECURITY & TESTING
309
+ Week 5-6: #3 Vulnerability Reward Model
310
+ Week 5-6: #6 Mutation Testing Loop
311
+ Week 7-8: #13 LLM-based Mutants
312
+
313
+ PHASE 3 (Weeks 9-12): MEMORY & SELF-HEALING
314
+ Week 9-10: #4 Episodic Memory Layer
315
+ Week 9-10: #7 CI Repair Agent
316
+ Week 11-12: #8 LLM-as-a-Judge
317
+
318
+ TIER 2 (Weeks 13-26): LONGER-TERM
319
+ #2 Intent Specification (full implementation)
320
+ #9 Conflict Detection + #18 DAG Scheduler
321
+ #10 Reasoning Budget Allocation
322
+ #11 Formal Verification
323
+ #12 Active Compression
324
+ #14 Anomaly Detection
325
+ #15 Fleet Learning
326
+ ```
327
+
328
+ ---
329
+
330
+ ## Success Metrics (Post-Implementation)
331
+
332
+ | Feature | Metric | Target | Current |
333
+ | ------------------- | ----------------- | ------- | -------- |
334
+ | #1 Loop convergence | Iteration waste ↓ | -25-40% | Baseline |
335
+ | #5 Cascade routing | Cost reduction | -40-60% | Baseline |
336
+ | #3 Security | Bug reduction | -30-40% | Baseline |
337
+ | #4 Episodic memory | Solution time | -20-35% | Baseline |
338
+ | #6 Mutation testing | Mutation score | >80% | ~60% |
339
+ | #7 CI repair | Retry cycles | -50% | Baseline |
340
+ | **Overall** | Pipeline success | >85% | ~77% |
341
+
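For the mutation-testing row, the metric can be made concrete with the usual definition of mutation score: killed mutants divided by non-equivalent mutants. The sketch below assumes that definition applies here. The function names and mutant counts are illustrative only, chosen to line up with the ~60% and >80% figures in the table.

```python
def mutation_score(killed: int, total: int, equivalent: int = 0) -> float:
    """Fraction of non-equivalent mutants that at least one test detected."""
    scorable = total - equivalent
    if scorable <= 0:
        raise ValueError("no scorable mutants")
    if not 0 <= killed <= scorable:
        raise ValueError("killed must be between 0 and the scorable mutant count")
    return killed / scorable


def meets_target(score: float, target: float = 0.80) -> bool:
    """Check a score against the >80% target from the table."""
    return score > target


# Illustrative numbers only, not measured Shipwright data.
print(f"{mutation_score(120, 200):.0%}")
print(meets_target(mutation_score(120, 200)))
print(meets_target(mutation_score(165, 200)))
```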
342
+ ---
343
+
344
+ ## Investment & ROI
345
+
346
+ **Phase 1-2 (8 weeks, 2 engineers):**
347
+
348
+ - Cost: $65K (engineering + compute)
349
+ - Return: $320-440K annually
350
+ - ROI: **5-7x**
351
+
352
+ **Long-term (26 weeks):**
353
+
354
+ - Additional return: $120-155K/year
355
+ - ROI: **3-4x** on incremental investment
356
+
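The Phase 1-2 multiple can be checked directly from the figures above, assuming ROI here means annual return divided by up-front cost. The helper name is hypothetical. The long-term 3-4x figure cannot be reproduced the same way because the incremental investment is not stated.

```python
def roi_multiple(annual_return: float, cost: float) -> float:
    """Annual return divided by up-front cost."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return annual_return / cost


cost = 65_000  # Phase 1-2 cost from the list above
low = roi_multiple(320_000, cost)
high = roi_multiple(440_000, cost)
print(f"{low:.1f}x to {high:.1f}x")
```

Both ends round to the stated 5-7x range.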
357
+ ---
358
+
359
+ ## How to Use These Documents
360
+
361
+ ### Weekly Strategy Review
362
+
363
+ 1. Open **BACKLOG_QUICK_REFERENCE.md** → Priority matrix
364
+ 2. Check progress against timeline
365
+ 3. Update next week's focus
366
+
367
+ ### Pre-Sprint Planning
368
+
369
+ 1. Read relevant sections in **CUTTING_EDGE_RESEARCH_2026.md**
370
+ 2. Extract implementation details from "Actionable Gap"
371
+ 3. Check **RESEARCH_SOURCES.md** for key papers
372
+
373
+ ### Deep Technical Design
374
+
375
+ 1. Read full section in **CUTTING_EDGE_RESEARCH_2026.md**
376
+ 2. Review all sources in **RESEARCH_SOURCES.md**
377
+ 3. Implement checklist from **BACKLOG_QUICK_REFERENCE.md**
378
+
379
+ ### Competitive Briefing
380
+
381
+ 1. Share **RESEARCH_SUMMARY.txt** (5 min read)
382
+ 2. Reference SOTA systems from specific sections
383
+ 3. Deep dive as needed
384
+
385
+ ---
386
+
387
+ ## Notes for Implementation
388
+
389
+ ### Assumptions Made
390
+
391
+ - Shipwright has access to Claude API (embedding, reasoning)
392
+ - GitHub Actions integration complete
393
+ - Current pipeline success rate ~77%
394
+ - Monthly compute budget ~$50K
395
+
396
+ ### Risk Factors
397
+
398
+ - Model API availability (o1-pro limited to ChatGPT Pro)
399
+ - DeepSeek-R1 accessibility (China-based, regulatory risk)
400
+ - Formal verification tools (complex integration)
401
+ - RL training stability (exploration vs exploitation tuning)
402
+
403
+ ### Mitigation Strategies
404
+
405
+ - Start with proven patterns (Google Cascades, Meta ACH)
406
+ - Use open-source where possible (DeepSeek-R1, Dafny, Aider)
407
+ - Prototype before full implementation (#2 Intent Engine research phase)
408
+ - A/B test new features (reasoning budgets, cascade routing)
409
+ - Track metrics continuously (DORA, cost, success rate)
410
+
411
+ ---
412
+
413
+ ## Questions for Follow-Up
414
+
415
+ 1. **Dark Factory Ready:** How aggressively should we pursue the Intent Specification Engine (#2)? It's strategic but high-effort.
416
+
417
+ 2. **Formal Verification:** Is the cryptographic/payment use case common enough to justify #11 (Dafny/Lean integration)?
418
+
419
+ 3. **Reasoning Models:** Should we wait for Claude Mythos, or start with o1-pro now?
420
+
421
+ 4. **Priority Trade-offs:** If we can only do 3 items in Phase 1, should we skip #2 (Intent) research and focus on cost/convergence?
422
+
423
+ 5. **Multi-Agent Safety:** With Google's DORA showing 9% bug rate climb, how should quality gates (#8 Judge) be weighted?
424
+
425
+ ---
426
+
427
+ **Generated:** April 4, 2026
428
+ **Research effort:** 65+ sources, 25+ papers, 10 research areas, 8 hours
429
+ **Next review:** After Phase 1 completion (Week 4)
430
+
431
+ ---
432
+
433
+ ## Document Navigation
434
+
435
+ - Primary Report: `CUTTING_EDGE_RESEARCH_2026.md`
436
+ - Quick Reference: `BACKLOG_QUICK_REFERENCE.md`
437
+ - Sources: `RESEARCH_SOURCES.md`
438
+ - Summary: `RESEARCH_SUMMARY.txt`
439
+ - This Index: `RESEARCH_INDEX.md` (you are here)