@beyondwork/docx-react-component 1.0.0 → 1.0.2

Files changed (560)
  1. package/README.md +44 -104
  2. package/package.json +66 -15
  3. package/src/api/public-types.ts +1 -1
  4. package/src/compare/diff-engine.ts +530 -0
  5. package/src/compare/export-redlines.ts +162 -0
  6. package/src/compare/snapshot.ts +37 -0
  7. package/src/core/commands/index.ts +1 -1
  8. package/src/core/state/editor-state.ts +2 -2
  9. package/src/index.ts +45 -0
  10. package/src/legal/bookmarks.ts +196 -0
  11. package/src/legal/cross-references.ts +356 -0
  12. package/src/legal/defined-terms.ts +203 -0
  13. package/src/runtime/document-runtime.ts +3 -5
  14. package/src/runtime/table-commands.ts +4 -1
  15. package/src/runtime/table-schema.ts +17 -2
  16. package/src/runtime/virtualized-rendering.ts +258 -0
  17. package/src/ui/WordReviewEditor.tsx +256 -35
  18. package/src/ui-tailwind/editor-surface/tw-editor-surface.tsx +2 -2
  19. package/src/ui-tailwind/editor-surface/tw-table-node-view.tsx +16 -2
  20. package/.codex/config.toml +0 -5
  21. package/.corepack/v1/pnpm/10.30.3/.corepack +0 -1
  22. package/.corepack/v1/pnpm/10.30.3/LICENSE +0 -22
  23. package/.corepack/v1/pnpm/10.30.3/README.md +0 -240
  24. package/.corepack/v1/pnpm/10.30.3/dist/node-gyp-bin/node-gyp +0 -6
  25. package/.corepack/v1/pnpm/10.30.3/dist/node-gyp-bin/node-gyp.cmd +0 -5
  26. package/.corepack/v1/pnpm/10.30.3/dist/pnpm.cjs +0 -195400
  27. package/.corepack/v1/pnpm/10.30.3/dist/pnpmrc +0 -2
  28. package/.corepack/v1/pnpm/10.30.3/dist/reflink.darwin-arm64-2HJ4WGO6.node +0 -0
  29. package/.corepack/v1/pnpm/10.30.3/dist/reflink.darwin-x64-3G3H6IW4.node +0 -0
  30. package/.corepack/v1/pnpm/10.30.3/dist/reflink.win32-arm64-msvc-Q6BARPPB.node +0 -0
  31. package/.corepack/v1/pnpm/10.30.3/dist/reflink.win32-x64-msvc-J2TZHRQI.node +0 -0
  32. package/.corepack/v1/pnpm/10.30.3/dist/templates/completion.bash +0 -31
  33. package/.corepack/v1/pnpm/10.30.3/dist/templates/completion.fish +0 -22
  34. package/.corepack/v1/pnpm/10.30.3/dist/templates/completion.ps1 +0 -193
  35. package/.corepack/v1/pnpm/10.30.3/dist/templates/completion.zsh +0 -27
  36. package/.corepack/v1/pnpm/10.30.3/dist/vendor/fastlist-0.3.0-x64.exe +0 -0
  37. package/.corepack/v1/pnpm/10.30.3/dist/vendor/fastlist-0.3.0-x86.exe +0 -0
  38. package/.corepack/v1/pnpm/10.30.3/dist/worker.js +0 -10119
  39. package/.corepack/v1/pnpm/10.30.3/package.json +0 -192
  40. package/.cursor/mcp.json +0 -7
  41. package/.github/workflows/ci.yml +0 -35
  42. package/.mcp.json +0 -7
  43. package/.openclaw/workspace-state.json +0 -4
  44. package/.pnpmrc.json +0 -1
  45. package/.wave-launch.sh +0 -7
  46. package/.workspace-marker +0 -1
  47. package/AGENTS.md +0 -78
  48. package/CHANGELOG.md +0 -177
  49. package/DESIGN.md +0 -929
  50. package/HEARTBEAT.md +0 -7
  51. package/IDENTITY.md +0 -23
  52. package/SOUL.md +0 -36
  53. package/TOOLS.md +0 -40
  54. package/USER.md +0 -17
  55. package/docs/README.md +0 -107
  56. package/docs/agents/wave-cont-eval-role.md +0 -36
  57. package/docs/agents/wave-cont-qa-role.md +0 -52
  58. package/docs/agents/wave-deploy-verifier-role.md +0 -34
  59. package/docs/agents/wave-design-role.md +0 -47
  60. package/docs/agents/wave-documentation-role.md +0 -34
  61. package/docs/agents/wave-infra-role.md +0 -34
  62. package/docs/agents/wave-integration-role.md +0 -37
  63. package/docs/agents/wave-launcher-role.md +0 -41
  64. package/docs/agents/wave-orchestrator-role.md +0 -52
  65. package/docs/agents/wave-planner-role.md +0 -39
  66. package/docs/agents/wave-security-role.md +0 -40
  67. package/docs/architecture/docx/README.md +0 -10
  68. package/docs/architecture/future/README.md +0 -8
  69. package/docs/architecture/ooxml-upgrade-analysis.md +0 -134
  70. package/docs/architecture/platform/shared-openxml-editor-platform.md +0 -153
  71. package/docs/architecture/xlsx/canonical-workbook-model-and-commands.md +0 -187
  72. package/docs/architecture/xlsx/spreadsheet-editor-frontend-architecture.md +0 -150
  73. package/docs/comment-redline-overview.md +0 -350
  74. package/docs/concepts/context7-vs-skills.md +0 -118
  75. package/docs/concepts/operating-modes.md +0 -91
  76. package/docs/concepts/runtime-agnostic-orchestration.md +0 -111
  77. package/docs/concepts/what-is-a-wave.md +0 -217
  78. package/docs/context7/bundles.json +0 -222
  79. package/docs/context7/planner-agent/README.md +0 -28
  80. package/docs/context7/planner-agent/manifest.json +0 -83
  81. package/docs/context7/planner-agent/papers/cooperbench-why-coding-agents-cannot-be-your-teammates-yet.md +0 -3283
  82. package/docs/context7/planner-agent/papers/dova-deliberation-first-multi-agent-orchestration-for-autonomous-research-automation.md +0 -1699
  83. package/docs/context7/planner-agent/papers/dpbench-large-language-models-struggle-with-simultaneous-coordination.md +0 -2251
  84. package/docs/context7/planner-agent/papers/incremental-planning-to-control-a-blackboard-based-problem-solver.md +0 -1729
  85. package/docs/context7/planner-agent/papers/silo-bench-a-scalable-environment-for-evaluating-distributed-coordination-in-multi-agent-llm-systems.md +0 -3747
  86. package/docs/context7/planner-agent/papers/todoevolve-learning-to-architect-agent-planning-systems.md +0 -1675
  87. package/docs/context7/planner-agent/papers/verified-multi-agent-orchestration-a-plan-execute-verify-replan-framework-for-complex-query-resolution.md +0 -1173
  88. package/docs/context7/planner-agent/papers/why-do-multi-agent-llm-systems-fail.md +0 -5211
  89. package/docs/context7/planner-agent/topics/planning-and-orchestration.md +0 -24
  90. package/docs/evals/arm-templates/README.md +0 -13
  91. package/docs/evals/arm-templates/full-wave.json +0 -15
  92. package/docs/evals/arm-templates/single-agent.json +0 -15
  93. package/docs/evals/benchmark-catalog.json +0 -670
  94. package/docs/evals/cases/README.md +0 -47
  95. package/docs/evals/cases/wave-blackboard-inbox-targeting.json +0 -73
  96. package/docs/evals/cases/wave-contradiction-conflict.json +0 -104
  97. package/docs/evals/cases/wave-expert-routing-preservation.json +0 -69
  98. package/docs/evals/cases/wave-hidden-profile-private-evidence.json +0 -81
  99. package/docs/evals/cases/wave-premature-closure-guard.json +0 -71
  100. package/docs/evals/cases/wave-silo-cross-agent-state.json +0 -77
  101. package/docs/evals/cases/wave-simultaneous-lockstep.json +0 -92
  102. package/docs/evals/external-benchmarks.json +0 -85
  103. package/docs/evals/external-command-config.sample.json +0 -9
  104. package/docs/evals/external-command-config.swe-bench-pro.json +0 -8
  105. package/docs/evals/pilots/README.md +0 -47
  106. package/docs/evals/pilots/swe-bench-pro-public-full-wave-review-10.json +0 -64
  107. package/docs/evals/pilots/swe-bench-pro-public-pilot.json +0 -111
  108. package/docs/evals/wave-benchmark-program.md +0 -302
  109. package/docs/guides/planner.md +0 -220
  110. package/docs/guides/recommendations-0.8.9.md +0 -133
  111. package/docs/guides/signal-wrappers.md +0 -165
  112. package/docs/guides/terminal-surfaces.md +0 -96
  113. package/docs/image copy.png +0 -0
  114. package/docs/image.png +0 -0
  115. package/docs/images/image.png +0 -0
  116. package/docs/legal-feedback-architecture.md +0 -498
  117. package/docs/plans/component-cutover-matrix.json +0 -1072
  118. package/docs/plans/component-cutover-matrix.md +0 -307
  119. package/docs/plans/context7-wave-orchestrator.md +0 -155
  120. package/docs/plans/current-state.md +0 -198
  121. package/docs/plans/docx/README.md +0 -9
  122. package/docs/plans/examples/wave-benchmark-improvement.md +0 -108
  123. package/docs/plans/examples/wave-example-live-proof.md +0 -435
  124. package/docs/plans/master-plan.md +0 -224
  125. package/docs/plans/migration.md +0 -538
  126. package/docs/plans/operations/README.md +0 -7
  127. package/docs/plans/operations/wave-10-word-certification.md +0 -87
  128. package/docs/plans/operations/wave-8-railway-staging.md +0 -153
  129. package/docs/plans/operations/wave-9-manual-certification.md +0 -73
  130. package/docs/plans/platform/README.md +0 -9
  131. package/docs/plans/reference/legal-checklist-coverage.md +0 -258
  132. package/docs/plans/wave-orchestrator.md +0 -423
  133. package/docs/plans/waves/README.md +0 -75
  134. package/docs/plans/waves/completed/wave-0.md +0 -195
  135. package/docs/plans/waves/completed/wave-1.md +0 -379
  136. package/docs/plans/waves/completed/wave-10.md +0 -670
  137. package/docs/plans/waves/completed/wave-11.md +0 -335
  138. package/docs/plans/waves/completed/wave-12.md +0 -417
  139. package/docs/plans/waves/completed/wave-13.md +0 -316
  140. package/docs/plans/waves/completed/wave-14.md +0 -319
  141. package/docs/plans/waves/completed/wave-15.md +0 -321
  142. package/docs/plans/waves/completed/wave-16.md +0 -316
  143. package/docs/plans/waves/completed/wave-17.md +0 -331
  144. package/docs/plans/waves/completed/wave-18.md +0 -328
  145. package/docs/plans/waves/completed/wave-2.md +0 -438
  146. package/docs/plans/waves/completed/wave-3.md +0 -435
  147. package/docs/plans/waves/completed/wave-4.md +0 -430
  148. package/docs/plans/waves/completed/wave-5.md +0 -430
  149. package/docs/plans/waves/completed/wave-6.md +0 -430
  150. package/docs/plans/waves/completed/wave-7.md +0 -526
  151. package/docs/plans/waves/completed/wave-8.md +0 -596
  152. package/docs/plans/waves/completed/wave-9.md +0 -552
  153. package/docs/plans/waves/deferred/README.md +0 -14
  154. package/docs/plans/waves/deferred/encrypted-intake-contracts.md +0 -282
  155. package/docs/plans/waves/deferred/legal-feedback-wave-expansion.md +0 -308
  156. package/docs/plans/waves/deferred/wave-encrypted-intake.md +0 -451
  157. package/docs/plans/waves/design/README.md +0 -5
  158. package/docs/plans/waves/design/wave-1-a1.md +0 -309
  159. package/docs/plans/waves/reviews/README.md +0 -5
  160. package/docs/plans/waves/reviews/wave-0-cont-qa.md +0 -151
  161. package/docs/plans/waves/reviews/wave-1-cont-qa.md +0 -46
  162. package/docs/plans/waves/reviews/wave-10-accessibility-and-design.md +0 -51
  163. package/docs/plans/waves/reviews/wave-10-cont-qa.md +0 -24
  164. package/docs/plans/waves/reviews/wave-10-dashboard-proof.md +0 -46
  165. package/docs/plans/waves/reviews/wave-10-performance-signoff.md +0 -55
  166. package/docs/plans/waves/reviews/wave-10-regression-proof.md +0 -23
  167. package/docs/plans/waves/reviews/wave-10-release-audit.md +0 -31
  168. package/docs/plans/waves/reviews/wave-10-service-proof.md +0 -83
  169. package/docs/plans/waves/reviews/wave-10-word-certification.md +0 -31
  170. package/docs/plans/waves/reviews/wave-18-ai-contract-closure.md +0 -277
  171. package/docs/plans/waves/reviews/wave-18-cont-qa.md +0 -255
  172. package/docs/plans/waves/reviews/wave-18-parity-proof.md +0 -271
  173. package/docs/plans/waves/reviews/wave-19-cont-qa.md +0 -59
  174. package/docs/plans/waves/reviews/wave-2-cont-qa.md +0 -72
  175. package/docs/plans/waves/reviews/wave-20-cont-qa.md +0 -60
  176. package/docs/plans/waves/reviews/wave-25-cont-qa.md +0 -48
  177. package/docs/plans/waves/reviews/wave-28-cont-qa.md +0 -46
  178. package/docs/plans/waves/reviews/wave-29-cont-qa.md +0 -53
  179. package/docs/plans/waves/reviews/wave-3-cont-qa.md +0 -53
  180. package/docs/plans/waves/reviews/wave-3-core-proof.md +0 -77
  181. package/docs/plans/waves/reviews/wave-3-validator-proof.md +0 -73
  182. package/docs/plans/waves/reviews/wave-32-cont-qa.md +0 -43
  183. package/docs/plans/waves/reviews/wave-33-cont-qa.md +0 -526
  184. package/docs/plans/waves/reviews/wave-34-cont-qa.md +0 -100
  185. package/docs/plans/waves/reviews/wave-35-cont-qa.md +0 -145
  186. package/docs/plans/waves/reviews/wave-4-cont-qa.md +0 -47
  187. package/docs/plans/waves/reviews/wave-4-structure-proof.md +0 -69
  188. package/docs/plans/waves/reviews/wave-5-comment-proof.md +0 -158
  189. package/docs/plans/waves/reviews/wave-5-cont-qa.md +0 -68
  190. package/docs/plans/waves/reviews/wave-6-cont-qa.md +0 -416
  191. package/docs/plans/waves/reviews/wave-6-redline-proof.md +0 -130
  192. package/docs/plans/waves/reviews/wave-7-cont-qa.md +0 -82
  193. package/docs/plans/waves/reviews/wave-7-ooxml-compliance.md +0 -85
  194. package/docs/plans/waves/reviews/wave-7-preservation-proof.md +0 -119
  195. package/docs/plans/waves/reviews/wave-7-trust-ux.md +0 -87
  196. package/docs/plans/waves/reviews/wave-8-accessibility-and-design.md +0 -128
  197. package/docs/plans/waves/reviews/wave-8-cont-qa.md +0 -92
  198. package/docs/plans/waves/reviews/wave-8-live-proof.md +0 -140
  199. package/docs/plans/waves/reviews/wave-8-security.md +0 -47
  200. package/docs/plans/waves/reviews/wave-9-editor-embedding.md +0 -39
  201. package/docs/plans/waves/reviews/wave-9-fixture-runner.md +0 -56
  202. package/docs/plans/waves/reviews/wave-9-live-proof.md +0 -105
  203. package/docs/plans/waves/reviews/wave-9-usability-and-performance.md +0 -152
  204. package/docs/plans/waves/specs/README.md +0 -5
  205. package/docs/plans/waves/specs/wave-1-component-boundaries.md +0 -322
  206. package/docs/plans/waves/specs/wave-1-ooxml-contracts.md +0 -323
  207. package/docs/plans/waves/specs/wave-1-review-and-ui-contracts.md +0 -339
  208. package/docs/plans/waves/specs/wave-1-runtime-contracts.md +0 -509
  209. package/docs/plans/waves/wave-19.md +0 -341
  210. package/docs/plans/waves/wave-20.md +0 -308
  211. package/docs/plans/waves/wave-21.md +0 -289
  212. package/docs/plans/waves/wave-22.md +0 -221
  213. package/docs/plans/waves/wave-23.md +0 -295
  214. package/docs/plans/waves/wave-24.md +0 -286
  215. package/docs/plans/waves/wave-25.md +0 -313
  216. package/docs/plans/waves/wave-26.md +0 -300
  217. package/docs/plans/waves/wave-27.md +0 -299
  218. package/docs/plans/waves/wave-28.md +0 -368
  219. package/docs/plans/waves/wave-29.md +0 -303
  220. package/docs/plans/waves/wave-30.md +0 -307
  221. package/docs/plans/waves/wave-31.md +0 -231
  222. package/docs/plans/waves/wave-32.md +0 -152
  223. package/docs/plans/waves/wave-33.md +0 -147
  224. package/docs/plans/waves/wave-34.md +0 -148
  225. package/docs/plans/waves/wave-35.md +0 -141
  226. package/docs/plans/waves/wave-36.md +0 -146
  227. package/docs/plans/xlsx/README.md +0 -14
  228. package/docs/plans/xlsx/xlsx-fixture-corpus-and-certification-plan.md +0 -126
  229. package/docs/reference/cli-reference.md +0 -600
  230. package/docs/reference/coordination-and-closure.md +0 -487
  231. package/docs/reference/deep-research-report (15).md +0 -25
  232. package/docs/reference/docx/README.md +0 -10
  233. package/docs/reference/legal-checklist.md +0 -445
  234. package/docs/reference/live-proof-waves.md +0 -199
  235. package/docs/reference/ooxml-compliance.md +0 -129
  236. package/docs/reference/ooxml-feature-parity-matrix.md +0 -172
  237. package/docs/reference/platform/shared-ooxml-platform-guidance.md +0 -77
  238. package/docs/reference/prototype-agent-prompt-legal-fidelity.md +0 -155
  239. package/docs/reference/public-api.md +0 -456
  240. package/docs/reference/repository-guidance.md +0 -58
  241. package/docs/reference/runtime-config/README.md +0 -182
  242. package/docs/reference/runtime-config/claude.md +0 -110
  243. package/docs/reference/runtime-config/codex.md +0 -82
  244. package/docs/reference/runtime-config/opencode.md +0 -93
  245. package/docs/reference/sample-waves.md +0 -105
  246. package/docs/reference/skills.md +0 -237
  247. package/docs/reference/templates/AGENTS.md +0 -78
  248. package/docs/reference/templates/HEARTBEAT.md +0 -7
  249. package/docs/reference/templates/IDENTITY.md +0 -23
  250. package/docs/reference/templates/SOUL.md +0 -36
  251. package/docs/reference/templates/TOOLS.md +0 -40
  252. package/docs/reference/templates/USER.md +0 -17
  253. package/docs/reference/wave-control.md +0 -184
  254. package/docs/reference/wave-planning-lessons.md +0 -167
  255. package/docs/reference/word-review-editor-frontend-architecture.md +0 -479
  256. package/docs/reference/word-review-editor-ux-guide.md +0 -253
  257. package/docs/reference/xlsx/xlsx-ooxml-compliance.md +0 -137
  258. package/docs/research/agent-context-sources.md +0 -178
  259. package/docs/research/coordination-failure-review.md +0 -290
  260. package/docs/research/docx-react-component/Canonical Document Schema Specification for a React-based Word-compatible Editor.md +0 -2317
  261. package/docs/research/docx-react-component/Feature Compatibility Matrix for a React Word Compatible Legal Editor v1.md +0 -219
  262. package/docs/research/docx-react-component/React Component Architecture and Front-End Structure Specification for a Word-Compatible Legal Review Editor.md +0 -1112
  263. package/docs/research/docx-react-component/document_compatibility_and_testing_spec.md +0 -751
  264. package/docs/research/xlsx/raw/README.md +0 -13
  265. package/docs/roadmap.md +0 -174
  266. package/docs/superpowers/plans/2026-03-28-harness-control-bar.md +0 -677
  267. package/docs/superpowers/specs/2026-03-28-harness-control-bar-design.md +0 -274
  268. package/docs/xlsx-react/README.md +0 -38
  269. package/docs/xlsx-react/agent-llm-interaction-layer-docx-xlsx.md +0 -621
  270. package/docs/xlsx-react/canonical-workbook-model-and-commands.md +0 -948
  271. package/docs/xlsx-react/shared-openxml-editor-platform-docx-xlsx.md +0 -228
  272. package/docs/xlsx-react/spreadsheet-editor-component-architecture.md +0 -809
  273. package/docs/xlsx-react/spreadsheet-editor-frontend-architecture.md +0 -537
  274. package/docs/xlsx-react/spreadsheet-editor-ux-guide.md +0 -520
  275. package/docs/xlsx-react/xlsx-editor-research-pack.md +0 -871
  276. package/docs/xlsx-react/xlsx-fixture-corpus-and-certification-plan.md +0 -436
  277. package/docs/xlsx-react/xlsx-ooxml-compliance.md +0 -320
  278. package/examples/README.md +0 -16
  279. package/memory/MEMORY.md +0 -24
  280. package/pnpm-workspace.yaml +0 -4
  281. package/scripts/check-no-authored-js.sh +0 -13
  282. package/scripts/context7-api-check.sh +0 -65
  283. package/scripts/context7-export-env.sh +0 -42
  284. package/scripts/run-context7-mcp.sh +0 -8
  285. package/scripts/run-workspace-tests.sh +0 -15
  286. package/scripts/start-wave-10-local.sh +0 -189
  287. package/scripts/wave-agent-attach.sh +0 -47
  288. package/scripts/wave-auto-answer.sh +0 -118
  289. package/scripts/wave-dashboard-attach.sh +0 -13
  290. package/scripts/wave-launch.sh +0 -273
  291. package/scripts/wave-overnight-supervisor.sh +0 -145
  292. package/scripts/wave-status.sh +0 -379
  293. package/scripts/wave-watch.sh +0 -231
  294. package/services/README.md +0 -17
  295. package/services/openxml-validator/Dockerfile +0 -29
  296. package/services/openxml-validator/OpenXmlValidator.Api.csproj +0 -12
  297. package/services/openxml-validator/Program.cs +0 -436
  298. package/services/openxml-validator/README.md +0 -152
  299. package/services/openxml-validator/railway.json +0 -16
  300. package/services/react-word-editor/.tmp-a4/src/api/public-types.ts +0 -318
  301. package/services/react-word-editor/.tmp-a4/src/ui/WordReviewEditor.tsx +0 -1302
  302. package/services/react-word-editor/.tmp-a4/src/ui/editor-surface/editor-surface.tsx +0 -546
  303. package/services/react-word-editor/.tmp-a4/test/ui/word-review-editor.test.tsx +0 -146
  304. package/services/react-word-editor/.tmp-a4-build/src/api/public-types.js +0 -2
  305. package/services/react-word-editor/.tmp-a4-build/src/ui/WordReviewEditor.js +0 -818
  306. package/services/react-word-editor/.tmp-a4-build/src/ui/editor-surface/editor-surface.js +0 -229
  307. package/services/react-word-editor/.tmp-a4-build/test/ui/word-review-editor.test.js +0 -121
  308. package/services/react-word-editor/.tmp-wave-4-a3-tsconfig.json +0 -21
  309. package/services/react-word-editor/.tmp-wave-4-a3-tsconfig.tsbuildinfo +0 -1
  310. package/services/react-word-editor/Dockerfile +0 -26
  311. package/services/react-word-editor/README.md +0 -254
  312. package/services/react-word-editor/app/api/certification/route.ts +0 -79
  313. package/services/react-word-editor/app/api/demo-sessions/route.ts +0 -109
  314. package/services/react-word-editor/app/api/deploy-health/route.ts +0 -23
  315. package/services/react-word-editor/app/api/exports/[exportId]/route.ts +0 -34
  316. package/services/react-word-editor/app/api/exports/route.ts +0 -81
  317. package/services/react-word-editor/app/api/fixtures/[fixtureId]/run/route.ts +0 -100
  318. package/services/react-word-editor/app/api/health/route.ts +0 -70
  319. package/services/react-word-editor/app/api/runs/[runId]/route.ts +0 -36
  320. package/services/react-word-editor/app/api/scenarios/[scenarioId]/run/route.ts +0 -85
  321. package/services/react-word-editor/app/api/sessions/[sessionId]/route.ts +0 -199
  322. package/services/react-word-editor/app/api/sessions/[sessionId]/source/route.ts +0 -45
  323. package/services/react-word-editor/app/api/uploads/route.ts +0 -70
  324. package/services/react-word-editor/app/api/validate/route.ts +0 -310
  325. package/services/react-word-editor/app/certification/[runId]/page.tsx +0 -14
  326. package/services/react-word-editor/app/certification/page.tsx +0 -32
  327. package/services/react-word-editor/app/dashboard/page.tsx +0 -7
  328. package/services/react-word-editor/app/demo/page.tsx +0 -30
  329. package/services/react-word-editor/app/demo/prototype-client.tsx +0 -1080
  330. package/services/react-word-editor/app/editor/[sessionId]/page.tsx +0 -33
  331. package/services/react-word-editor/app/fixtures/page.tsx +0 -7
  332. package/services/react-word-editor/app/globals.css +0 -121
  333. package/services/react-word-editor/app/layout.tsx +0 -32
  334. package/services/react-word-editor/app/page.tsx +0 -30
  335. package/services/react-word-editor/app/runs/[runId]/page.tsx +0 -34
  336. package/services/react-word-editor/app/wave-10-word-review/page.tsx +0 -7
  337. package/services/react-word-editor/components/harness-control-bar.tsx +0 -289
  338. package/services/react-word-editor/components/harness-editor-session-client.tsx +0 -1214
  339. package/services/react-word-editor/components/harness-workspace-page.tsx +0 -715
  340. package/services/react-word-editor/components/reduced-motion-toggle.tsx +0 -79
  341. package/services/react-word-editor/components/workspace-certification-panel.tsx +0 -307
  342. package/services/react-word-editor/lib/certification-bundle.ts +0 -796
  343. package/services/react-word-editor/lib/certification-store.ts +0 -661
  344. package/services/react-word-editor/lib/demo-fixtures.test.mjs +0 -195
  345. package/services/react-word-editor/lib/demo-fixtures.ts +0 -1519
  346. package/services/react-word-editor/lib/editor-session-summary.test.mjs +0 -68
  347. package/services/react-word-editor/lib/editor-session-summary.ts +0 -14
  348. package/services/react-word-editor/lib/editor-session.ts +0 -228
  349. package/services/react-word-editor/lib/exports-route.test.mjs +0 -32
  350. package/services/react-word-editor/lib/harness-client.ts +0 -347
  351. package/services/react-word-editor/lib/harness-config.json +0 -30
  352. package/services/react-word-editor/lib/harness-config.test.mjs +0 -31
  353. package/services/react-word-editor/lib/harness-config.ts +0 -21
  354. package/services/react-word-editor/lib/harness-editor-datastore.test.mjs +0 -220
  355. package/services/react-word-editor/lib/harness-editor-datastore.ts +0 -161
  356. package/services/react-word-editor/lib/private-mode.test.mjs +0 -42
  357. package/services/react-word-editor/lib/private-mode.ts +0 -61
  358. package/services/react-word-editor/lib/regression-report.test.mjs +0 -352
  359. package/services/react-word-editor/lib/regression-report.ts +0 -896
  360. package/services/react-word-editor/lib/run-artifacts.ts +0 -934
  361. package/services/react-word-editor/lib/run-history.ts +0 -755
  362. package/services/react-word-editor/lib/scenario-artifacts.test.mjs +0 -41
  363. package/services/react-word-editor/lib/scenario-artifacts.ts +0 -44
  364. package/services/react-word-editor/lib/storage.ts +0 -953
  365. package/services/react-word-editor/lib/validator-client.test.mjs +0 -54
  366. package/services/react-word-editor/lib/validator-client.ts +0 -95
  367. package/services/react-word-editor/lib/workspace-navigation.ts +0 -79
  368. package/services/react-word-editor/middleware.ts +0 -35
  369. package/services/react-word-editor/next-env.d.ts +0 -6
  370. package/services/react-word-editor/next.config.mjs +0 -15
  371. package/services/react-word-editor/package.json +0 -38
  372. package/services/react-word-editor/postcss.config.mjs +0 -8
  373. package/services/react-word-editor/railway.json +0 -21
  374. package/services/react-word-editor/scripts/wave-10-certification.mjs +0 -101
  375. package/services/react-word-editor/scripts/wave-9-live-usability-pilot.mjs +0 -911
  376. package/services/react-word-editor/tsconfig.json +0 -39
  377. package/services/react-word-editor/tsconfig.tsbuildinfo +0 -1
  378. package/skills/README.md +0 -48
  379. package/skills/domain-docx-compatibility/SKILL.md +0 -44
  380. package/skills/domain-docx-compatibility/skill.json +0 -19
  381. package/skills/domain-editor-architecture/SKILL.md +0 -49
  382. package/skills/domain-editor-architecture/skill.json +0 -19
  383. package/skills/domain-legal-review/SKILL.md +0 -39
  384. package/skills/domain-legal-review/skill.json +0 -19
  385. package/skills/provider-aws/SKILL.md +0 -117
  386. package/skills/provider-aws/adapters/claude.md +0 -1
  387. package/skills/provider-aws/adapters/codex.md +0 -1
  388. package/skills/provider-aws/references/service-verification.md +0 -39
  389. package/skills/provider-aws/skill.json +0 -54
  390. package/skills/provider-custom-deploy/SKILL.md +0 -64
  391. package/skills/provider-custom-deploy/skill.json +0 -50
  392. package/skills/provider-docker-compose/SKILL.md +0 -96
  393. package/skills/provider-docker-compose/adapters/local.md +0 -1
  394. package/skills/provider-docker-compose/skill.json +0 -53
  395. package/skills/provider-github-release/SKILL.md +0 -121
  396. package/skills/provider-github-release/adapters/claude.md +0 -1
  397. package/skills/provider-github-release/adapters/codex.md +0 -1
  398. package/skills/provider-github-release/skill.json +0 -55
  399. package/skills/provider-kubernetes/SKILL.md +0 -143
  400. package/skills/provider-kubernetes/adapters/claude.md +0 -1
  401. package/skills/provider-kubernetes/adapters/codex.md +0 -1
  402. package/skills/provider-kubernetes/references/kubectl-patterns.md +0 -58
  403. package/skills/provider-kubernetes/skill.json +0 -52
  404. package/skills/provider-railway/SKILL.md +0 -123
  405. package/skills/provider-railway/adapters/claude.md +0 -1
  406. package/skills/provider-railway/adapters/codex.md +0 -1
  407. package/skills/provider-railway/adapters/local.md +0 -1
  408. package/skills/provider-railway/adapters/opencode.md +0 -1
  409. package/skills/provider-railway/references/verification-commands.md +0 -39
  410. package/skills/provider-railway/skill.json +0 -71
  411. package/skills/provider-ssh-manual/SKILL.md +0 -97
  412. package/skills/provider-ssh-manual/skill.json +0 -54
  413. package/skills/repo-coding-rules/SKILL.md +0 -55
  414. package/skills/repo-coding-rules/skill.json +0 -34
  415. package/skills/role-cont-eval/SKILL.md +0 -91
  416. package/skills/role-cont-eval/adapters/codex.md +0 -1
  417. package/skills/role-cont-eval/skill.json +0 -36
  418. package/skills/role-cont-qa/SKILL.md +0 -100
  419. package/skills/role-cont-qa/adapters/claude.md +0 -1
  420. package/skills/role-cont-qa/skill.json +0 -36
  421. package/skills/role-deploy/SKILL.md +0 -97
  422. package/skills/role-deploy/skill.json +0 -36
  423. package/skills/role-design/SKILL.md +0 -50
  424. package/skills/role-design/skill.json +0 -36
  425. package/skills/role-documentation/SKILL.md +0 -76
  426. package/skills/role-documentation/skill.json +0 -36
  427. package/skills/role-implementation/SKILL.md +0 -45
  428. package/skills/role-implementation/skill.json +0 -36
  429. package/skills/role-infra/SKILL.md +0 -81
  430. package/skills/role-infra/skill.json +0 -36
  431. package/skills/role-integration/SKILL.md +0 -91
  432. package/skills/role-integration/skill.json +0 -36
  433. package/skills/role-planner/SKILL.md +0 -39
  434. package/skills/role-planner/skill.json +0 -21
  435. package/skills/role-research/SKILL.md +0 -65
  436. package/skills/role-research/skill.json +0 -36
  437. package/skills/role-security/SKILL.md +0 -60
  438. package/skills/role-security/skill.json +0 -36
  439. package/skills/runtime-claude/SKILL.md +0 -66
  440. package/skills/runtime-claude/skill.json +0 -36
  441. package/skills/runtime-codex/SKILL.md +0 -58
  442. package/skills/runtime-codex/skill.json +0 -36
  443. package/skills/runtime-local/SKILL.md +0 -46
  444. package/skills/runtime-local/skill.json +0 -36
  445. package/skills/runtime-opencode/SKILL.md +0 -58
  446. package/skills/runtime-opencode/skill.json +0 -36
  447. package/skills/signal-hygiene/SKILL.md +0 -51
  448. package/skills/signal-hygiene/skill.json +0 -20
  449. package/skills/tui-design/SKILL.md +0 -77
  450. package/skills/tui-design/references/tui-design.md +0 -259
  451. package/skills/tui-design/skill.json +0 -36
  452. package/skills/wave-core/SKILL.md +0 -141
  453. package/skills/wave-core/references/marker-syntax.md +0 -70
  454. package/skills/wave-core/skill.json +0 -35
  455. package/test/README.md +0 -16
  456. package/test/core/formatting-commands.test.ts +0 -285
  457. package/test/core/image-commands.test.ts +0 -298
  458. package/test/core/mapping.test.ts +0 -186
  459. package/test/core/text-commands.test.ts +0 -176
  460. package/test/fixtures/docx/F01-basic-contract.docx +0 -0
  461. package/test/fixtures/docx/F01-basic-contract.md +0 -33
  462. package/test/fixtures/docx/F02-headings-styles.docx +0 -0
  463. package/test/fixtures/docx/F02-headings-styles.md +0 -33
  464. package/test/fixtures/docx/F03-legal-outline-numbering.docx +0 -0
  465. package/test/fixtures/docx/F03-legal-outline-numbering.md +0 -34
  466. package/test/fixtures/docx/F04-restart-numbering-schedules.docx +0 -0
  467. package/test/fixtures/docx/F04-restart-numbering-schedules.md +0 -33
  468. package/test/fixtures/docx/F05-table-heavy-agreement.docx +0 -0
  469. package/test/fixtures/docx/F05-table-heavy-agreement.md +0 -34
  470. package/test/fixtures/docx/F06-merged-cells-signature-table.docx +0 -0
  471. package/test/fixtures/docx/F06-merged-cells-signature-table.md +0 -34
  472. package/test/fixtures/docx/F07-inline-images-exhibit.docx +0 -0
  473. package/test/fixtures/docx/F07-inline-images-exhibit.md +0 -34
  474. package/test/fixtures/docx/F08-hyperlinks.docx +0 -0
  475. package/test/fixtures/docx/F08-hyperlinks.md +0 -33
  476. package/test/fixtures/docx/F09-comments-single-paragraph.docx +0 -0
  477. package/test/fixtures/docx/F09-comments-single-paragraph.md +0 -33
  478. package/test/fixtures/docx/F10-threaded-comments-resolve.docx +0 -0
  479. package/test/fixtures/docx/F10-threaded-comments-resolve.md +0 -33
  480. package/test/fixtures/docx/F11-redlines-basic.docx +0 -0
  481. package/test/fixtures/docx/F11-redlines-basic.md +0 -33
  482. package/test/fixtures/docx/F12-redlines-paragraph-joins-splits.docx +0 -0
  483. package/test/fixtures/docx/F12-redlines-paragraph-joins-splits.md +0 -33
  484. package/test/fixtures/docx/F13-comments-on-deleted-text.docx +0 -0
  485. package/test/fixtures/docx/F13-comments-on-deleted-text.md +0 -33
  486. package/test/fixtures/docx/F14-revisions-in-tables-and-lists.docx +0 -0
  487. package/test/fixtures/docx/F14-revisions-in-tables-and-lists.md +0 -33
  488. package/test/fixtures/docx/F15-sections-headers-footers.docx +0 -0
  489. package/test/fixtures/docx/F15-sections-headers-footers.md +0 -33
  490. package/test/fixtures/docx/F16-footnotes-endnotes.docx +0 -0
  491. package/test/fixtures/docx/F16-footnotes-endnotes.md +0 -33
  492. package/test/fixtures/docx/F17-fields-and-toc.docx +0 -0
  493. package/test/fixtures/docx/F17-fields-and-toc.md +0 -33
  494. package/test/fixtures/docx/F18-content-controls-template.docx +0 -0
  495. package/test/fixtures/docx/F18-content-controls-template.md +0 -33
  496. package/test/fixtures/docx/F19-custom-xml-doc-assembly.docx +0 -0
  497. package/test/fixtures/docx/F19-custom-xml-doc-assembly.md +0 -35
  498. package/test/fixtures/docx/F20-unknown-ooxml-and-alternatecontent.docx +0 -0
  499. package/test/fixtures/docx/F20-unknown-ooxml-and-alternatecontent.md +0 -33
  500. package/test/fixtures/docx/F21-malformed-broken-docx.docx +0 -0
  501. package/test/fixtures/docx/F21-malformed-broken-docx.md +0 -33
  502. package/test/fixtures/docx/README.md +0 -74
  503. package/test/fixtures/docx/certification-manifest.json +0 -104
  504. package/test/fixtures/docx/fixtures.manifest.json +0 -196
  505. package/test/fixtures/encrypted-docx/README.md +0 -27
  506. package/test/fixtures/encrypted-docx/certification-manifest.json +0 -9
  507. package/test/fixtures/encrypted-docx/fixtures.manifest.json +0 -47
  508. package/test/fixtures/scenarios/docx/README.md +0 -25
  509. package/test/fixtures/scenarios/docx/S01-sow-template.docx +0 -0
  510. package/test/fixtures/scenarios/docx/S01-sow-template.md +0 -30
  511. package/test/fixtures/scenarios/docx/S02-bw-partner-user-licence-agreement-redlines.docx +0 -0
  512. package/test/fixtures/scenarios/docx/S02-bw-partner-user-licence-agreement-redlines.md +0 -32
  513. package/test/fixtures/scenarios/docx/scenario-manifest.json +0 -53
  514. package/test/formats/xlsx/io/xlsx-import.test.ts +0 -766
  515. package/test/formats/xlsx/model/workbook.test.ts +0 -669
  516. package/test/helpers/dom-setup.ts +0 -124
  517. package/test/io/comment-roundtrip.test.ts +0 -272
  518. package/test/io/complex-content-roundtrip.test.ts +0 -632
  519. package/test/io/docx-compatibility-regression.test.ts +0 -199
  520. package/test/io/docx-session.test.ts +0 -1495
  521. package/test/io/footnotes-roundtrip.test.ts +0 -318
  522. package/test/io/headers-footers-roundtrip.test.ts +0 -547
  523. package/test/io/numbering-roundtrip.test.ts +0 -234
  524. package/test/io/package-reader.test.ts +0 -199
  525. package/test/io/paragraph-properties-roundtrip.test.ts +0 -129
  526. package/test/io/preserved-package-roundtrip.test.ts +0 -365
  527. package/test/io/property-completeness.test.ts +0 -292
  528. package/test/io/revision-roundtrip.test.ts +0 -347
  529. package/test/io/structural-blocks.test.ts +0 -202
  530. package/test/io/table-media-roundtrip.test.ts +0 -448
  531. package/test/io/table-properties-roundtrip.test.ts +0 -569
  532. package/test/io/table-roundtrip.test.ts +0 -302
  533. package/test/io/text-roundtrip.test.ts +0 -344
  534. package/test/model/canonical-document.test.ts +0 -285
  535. package/test/preservation/opaque-fragment-store.test.ts +0 -121
  536. package/test/preservation/package-preservation.test.ts +0 -395
  537. package/test/preservation/store.test.ts +0 -84
  538. package/test/review/comment-remapping.test.ts +0 -220
  539. package/test/review/comment-store.test.ts +0 -180
  540. package/test/review/move-revisions.test.ts +0 -143
  541. package/test/review/property-change-revisions.test.ts +0 -225
  542. package/test/review/revision-actions.test.ts +0 -330
  543. package/test/review/revision-store.test.ts +0 -193
  544. package/test/runtime/session-capabilities.test.ts +0 -260
  545. package/test/runtime/table-commands.test.ts +0 -356
  546. package/test/runtime/table-schema.test.ts +0 -221
  547. package/test/runtime/tracked-changes-toggle.test.ts +0 -107
  548. package/test/ui/comment-review-surface.test.tsx +0 -114
  549. package/test/ui/reduced-motion-toggle.test.tsx +0 -137
  550. package/test/ui/word-review-editor.imported-scenarios.test.tsx +0 -169
  551. package/test/ui/word-review-editor.interaction.test.tsx +0 -1198
  552. package/test/ui/word-review-editor.test.js +0 -188
  553. package/test/ui/word-review-editor.test.tsx +0 -280
  554. package/test/ui-tailwind/search-plugin.test.ts +0 -286
  555. package/test/validation/compatibility-engine.test.ts +0 -336
  556. package/test/validation/compatibility-report.test.ts +0 -189
  557. package/test/validation/low-priority-word-surfaces.test.ts +0 -282
  558. package/test/validation/malformed-doc.test.ts +0 -113
  559. package/test-results/.last-run.json +0 -4
  560. package/wave.config.json +0 -406
@@ -1,1699 +0,0 @@
1
- ---
2
- summary: 'Converted paper text and source links for DOVA: Deliberation-First Multi-Agent Orchestration for Autonomous Research Automation.'
3
- read_when:
4
- - Reviewing harness and coordination research source material in the docs tree
5
- - You want the extracted paper text with source links preserved
6
- topics:
7
- - blackboard-and-shared-workspaces
8
- - harnesses-and-practice
9
- kind: 'paper'
10
- title: 'DOVA: Deliberation-First Multi-Agent Orchestration for Autonomous Research Automation'
11
- ---
12
- # DOVA: Deliberation-First Multi-Agent Orchestration for Autonomous Research Automation
13
-
14
- <Note>
15
- Converted from the source document on 2026-03-21. The repo does not retain downloaded source files; they were fetched transiently, converted to Markdown, and deleted after extraction.
16
- </Note>
17
-
18
- ## Metadata
19
-
20
- | Field | Value |
21
- | --- | --- |
22
- | Content type | Paper / report |
23
- | Authors | Aaron Shen, Alfred Shen |
24
- | Year | 2026 |
25
- | Venue | arXiv 2603.13327 |
26
- | Research bucket | P0 direct hits |
27
- | Maps to | Deliberation-first orchestration, iterative refinement, and transparent coordination for autonomous research. |
28
- | Harness fit | Useful as a modern hybrid between harness design and blackboard-style coordination. |
29
- | Source page | [Open source](https://arxiv.org/abs/2603.13327) |
30
- | Source PDF | [Open PDF](https://arxiv.org/pdf/2603.13327.pdf) |
31
-
32
- ## Extracted text
33
- ### Page 1
34
-
35
- DOVA: Deliberation-First Multi-Agent Orchestration
36
-
37
- for Autonomous Research Automation
38
-
39
- Aaron Shen 1 Alfred Shen 2
40
-
41
- Abstract
42
-
43
- Large language model (LLM) agents have demonstrated remarkable capabilities in tool use, reasoning, and code generation, yet single-agent systems exhibit fundamental limitations when confronted with complex research tasks demanding multi-source synthesis, adversarial verification, and personalized delivery. We present DOVA (Deep Orchestrated Versatile Agent), a multi-agent platform introducing three innovations: (1) deliberation-first orchestration, where explicit meta-reasoning precedes tool invocation, informed by a persistent user model and entity-aware conversation context; (2) hybrid collaborative reasoning, a composable three-phase pipeline unifying ensemble diversity, blackboard transparency, and iterative refinement; and (3) adaptive multi-tiered thinking, a six-level token-budget allocation scheme reducing inference cost by 40–60% on simple tasks while preserving deep reasoning capacity. We formalize the core algorithms, present an architectural ablation study across seven system configurations, and analyze the contribution of each component to answer confidence, source coverage, and token efficiency.
90
-
91
- 1. Introduction
92
-
93
- 1 University of California, Berkeley, USA. 2 Amazon Web Services, USA. Correspondence to: Aaron Shen <aaron.shen@berkeley.edu>, Alfred Shen <alfreshe@amazon.com>.
- 
- Preprint. March 17, 2026.
- 
- The rapid advancement of large language models (LLMs) (Brown et al., 2020; Anthropic, 2024a) has enabled a new generation of autonomous agents capable of reasoning, tool use, and multi-step planning (Yao et al., 2023b; Schick et al., 2023). However, deploying these agents for complex research automation, where a single query may require searching academic databases, analyzing code repositories, cross-referencing model registries, and synthesizing findings with citations, exposes several limitations of single-agent architectures:
126
-
127
- • Linear reasoning. A single agent processes information sequentially, missing cross-domain connections.
- 
- • Premature commitment. Without adversarial challenge, agents accept initial findings without verification.
- 
- • Reflexive tool invocation. Standard REACT loops (Yao et al., 2023b) trigger tools based on keyword patterns rather than deliberate need assessment.
- 
- • Fixed computation cost. Identical reasoning depth for trivial and complex queries wastes tokens on the former and starves the latter.
148
-
149
- We present DOVA, a multi-agent platform designed to ad-
150
-
151
- dress these limitations.
152
-
153
- 1.1. Contributions
154
-
155
- 1. Deliberation-first orchestration (§5.2). A meta-reasoning layer that deliberates, using a persistent user model and entity-aware context, before invoking any tool, reducing unnecessary API calls and enabling context-aware follow-ups.
- 
- 2. Hybrid collaborative reasoning (§5.3). A composable three-phase pipeline (ensemble → blackboard → iterative refinement) combining breadth, transparency, and depth of multi-round critique.
- 
- 3. Adaptive multi-tiered thinking (§5.4). A six-level token-budget allocation with automatic task-complexity selection, achieving significant token savings on simple tasks.
- 
- 4. Diversity-aware memory retrieval (§5.6). MMR (Carbonell & Goldstein, 1998) reranking over a multi-tier memory architecture with embedding-based semantic search.
- 
- 5. Unified multi-modal interface (§6). Four cohesive access modalities (REST API, CLI, browser UI, and MCP server) sharing a single orchestration backend, with seamless Claude Code integration via dynamic plugin (Anthropic, 2024b).
198
-
200
-
201
- arXiv:2603.13327v1 [cs.AI] 4 Mar 2026
202
-
203
- ### Page 2
204
-
205
- DOVA: Deliberation-First Multi-Agent Orchestration
206
-
207
- 2. Preliminaries
208
-
209
- Definition 2.1 (Agent). An agent $A = (\pi, T, M)$ is a tuple of a policy $\pi$ (an LLM with a system prompt), a tool set $T = \{t_1, \ldots, t_m\}$, and a memory store $M$.
- 
- Definition 2.2 (Reasoning Trace). A reasoning trace $\tau = (s_0, a_1, o_1, s_1, \ldots, a_n, o_n, s_n)$ is an alternating sequence of thought states $s_i \in S$, actions $a_i \in A_{\mathrm{act}} \cup \{\mathrm{conclude}\}$, and observations $o_i \in O$.
- 
- Definition 2.3 (Confidence Function). A confidence function $C : R \times P \to [0, 1]$ maps a response $r$ and prompt $p$ to a scalar quality estimate.
228
-
229
- Let Q denote user queries, D the data sources (ArXiv,
230
-
231
- GitHub, HuggingFace, Web), and U a user model capturing
232
-
233
- expertise, preferences, and history.
234
-
235
- Problem. Given query q ∈ Q, user model u ∈ U, and
236
-
237
- context ξ, produce response r∗ maximizing:
238
-
239
- $$r^{*} = \arg\max_{r \in R} \; C(r, q) \cdot \mathrm{Cov}(r, D) \quad \text{s.t.} \quad \mathrm{cost}(r) \le B(q), \tag{1}$$
246
-
247
- where Cov(r, D) measures source coverage and B(q) is a
248
-
249
- query-adaptive token budget.
250
-
251
- 3. Related Work
252
-
253
- LLM Reasoning. Chain-of-thought prompting (Wei et al., 2022) demonstrated that intermediate reasoning steps improve LLM performance. REACT (Yao et al., 2023b) interleaved reasoning with tool actions. Tree of Thoughts (Yao et al., 2023a) and Language Agent Tree Search (Zhou et al., 2023) extended this to tree-structured exploration. Reflexion (Shinn et al., 2023) added verbal self-reflection, Self-Refine (Madaan et al., 2023) showed LLMs can critique their own outputs, and Self-Consistency (Wang et al., 2023) introduced majority voting. Wei et al. (2026) provide a comprehensive taxonomy of agentic reasoning along foundational, self-evolving, and collective dimensions, and a survey of long chain-of-thought reasoning (Chen et al., 2025) traces the evolution from standard CoT to extended reasoning in models such as OpenAI O1 and DeepSeek-R1. DOVA augments REACT with (a) a deliberation step that reasons about whether to invoke tools and (b) multi-component confidence scoring with self-reflection.
- 
- Multi-Agent Systems. Multi-agent debate (Du et al., 2023; Liang et al., 2023) improves factuality. CAMEL (Li et al., 2023) explored role-playing communication. Generative Agents (Park et al., 2023) simulated behavior with memory. MetaGPT (Hong et al., 2023) assigned software roles. AutoGen (Wu et al., 2023) provided conversation-based multi-agent frameworks. A recent survey (Tran et al., 2025) categorizes collaboration mechanisms into cooperation, competition, and coordination protocols, while Dang et al. (2025) propose centralized orchestration with reinforcement learning. Orogat et al. (2026) provide a unified benchmark showing that framework-level architectural choices (e.g., message routing, memory sharing) can increase latency by up to 100×, underscoring the importance of deliberation-aware orchestration. Unlike these systems which employ a single collaboration pattern, DOVA composes three patterns into a hybrid pipeline with a deliberation layer determining when multi-agent reasoning is warranted.
- 
- Tool-Augmented LLMs. Toolformer (Schick et al., 2023) trained LLMs to self-annotate tool calls. Gorilla (Patil et al., 2023) fine-tuned on API documentation. ToolLLM (Qin et al., 2023) scaled to 16,000+ APIs. MCP (Anthropic, 2024b) standardized tool integration; Hou et al. (2025) provide a systematic landscape analysis and threat taxonomy, while MCP-Universe (Luo et al., 2025) offers the first comprehensive benchmark across real-world MCP servers. DOVA leverages MCP but introduces deliberation-first tool selection.
- 
- Adaptive Computation. Adaptive Computation Time (Graves, 2016) introduced variable compute for RNNs. Pause tokens (Goyal et al., 2023) allocated extra processing. Recent work on budget-guided thinking (Li et al., 2025), token-budget-aware reasoning (Han et al., 2024), and a survey of adaptive test-time compute (Alomrani et al., 2025) confirm that variable token budgets improve efficiency–quality trade-offs. Sleep-time compute (Lin et al., 2025) extends this to pre-computation, while Zhu et al. (2025) provide the first systematic study of test-time scaling specifically for LLM agents. DOVA applies this at the system level through a six-tier thinking budget.
370
-
371
- 4. System Architecture
372
-
373
- Figure 1 illustrates the layered architecture.
374
-
375
- 4.1. Agent Layer
376
-
377
- All agents inherit from a common base providing two mixins: ReasoningMixin (implements the REACT loop with self-reflection and a working-memory scratchpad) and MemoryMixin (access to the enhanced memory service).
- 
- Five specialized agents compose the agent pool: (1) ResearchAgent: multi-source search via MCP servers with query-type classification; (2) ProfilingAgent: user model management via persistent memory; (3) ValidationAgent: code analysis and sandboxed execution; (4) SynthesisAgent: narrative generation with source attribution; (5) DebateAgent: adversarial Bull-vs-Bear analysis.
398
-
399
- 2
400
-
401
- ### Page 3
402
-
403
- DOVA: Deliberation-First Multi-Agent Orchestration
404
-
405
- Figure 1. Layered architecture of DOVA. Queries enter through the Interface Layer, pass through Orchestration (with deliberation),
406
-
407
- dispatch to specialized agents, which leverage collaborative reasoning and intelligence services.
408
-
409
- Table 1. Model tier configuration.
- 
- | Task Type | Tier | Max Tokens | Temperature |
- | --- | --- | --- | --- |
- | Classification | Basic | 10K | 0.0 |
- | Summarization | Basic | 20K | 0.3 |
- | Chat | Standard | 40K | 0.7 |
- | Code Gen. | Advanced | 80K | 0.2 |
- | Reasoning | Advanced | 40K | 0.7 |
422
-
423
- 4.2. Model Tiering
424
-
425
- DOVA routes LLM calls through a tiering system that maps
426
-
427
- task types to model classes (Table 1).
428
-
429
- 5. Core Algorithms
430
-
431
- 5.1. ReAct Reasoning with Self-Reflection
432
-
433
- The foundational reasoning loop extends REACT (Yao et al., 2023b) with a terminal self-reflection step. Each agent maintains a scratchpad, a working memory that accumulates observations.
- 
- The trace confidence is the mean over per-step confidences:
- 
- $$\bar{c}(\tau) = \frac{1}{|\{c_i\}|} \sum_{i} c_i, \qquad c_i \in [0, 1]. \tag{2}$$
454
-
455
- Algorithm 1 ReAct Reasoning with Self-Reflection
- Require: Problem q; max iterations N; reflect flag ϕ
- Ensure: Reasoning trace τ, answer r, confidence c̄
- τ ← ∅; pad ← ∅
- for i = 1 to N do
-     (s_i, a_i, c_i) ← THINK(q, τ, pad)
-     τ ← τ ∪ {(THOUGHT, s_i, c_i)}
-     if a_i = conclude then
-         r ← s_i; break
-     end if
-     o_i ← ACT(a_i) {execute tool}
-     τ ← τ ∪ {(ACT, a_i), (OBS, o_i)}
-     pad ← pad ∪ {o_i}
- end for
- if ϕ and r exists then
-     (r′, crit) ← REFLECT(r, q, τ)
-     τ ← τ ∪ {(REFL, crit)}; r ← r′
- end if
- c̄ ← (1 / |τ_c|) Σ c_i
- return (τ, r, c̄)
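
The loop in Algorithm 1 can be sketched in a few lines of Python. This is only an illustrative reading of the pseudocode, not DOVA's implementation: the callables `think`, `act`, and `reflect` stand in for LLM and tool calls and are hypothetical, as are the `Trace` container and the tuple layout of trace entries.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional, Tuple

CONCLUDE = "conclude"  # sentinel action that ends the loop


@dataclass
class Trace:
    """Reasoning trace: ordered steps plus the per-thought confidences."""
    steps: List[Tuple[str, Any]] = field(default_factory=list)
    confidences: List[float] = field(default_factory=list)


def react_with_reflection(
    problem: str,
    think: Callable[[str, Trace, list], Tuple[str, str, float]],
    act: Callable[[str], Any],
    reflect: Optional[Callable[[str, str, Trace], Tuple[str, str]]] = None,
    max_iterations: int = 5,
):
    """Run a ReAct loop, then an optional terminal self-reflection step."""
    trace = Trace()
    scratchpad: list = []  # working memory of observations
    answer = None
    for _ in range(max_iterations):
        thought, action, confidence = think(problem, trace, scratchpad)
        trace.steps.append(("THOUGHT", thought))
        trace.confidences.append(confidence)
        if action == CONCLUDE:
            answer = thought
            break
        observation = act(action)  # execute the tool
        trace.steps.append(("ACT", action))
        trace.steps.append(("OBS", observation))
        scratchpad.append(observation)
    if reflect is not None and answer is not None:
        answer, critique = reflect(answer, problem, trace)
        trace.steps.append(("REFL", critique))
    # Trace confidence is the mean over per-step confidences (Eq. 2).
    confs = trace.confidences
    mean_confidence = sum(confs) / len(confs) if confs else 0.0
    return trace, answer, mean_confidence
```

Passing `reflect=None` plays the role of the reflect flag ϕ being off; if the iteration limit is reached without a `conclude` action, the sketch returns `None` as the answer.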
500
-
501
- 3
502
-
503
- ### Page 4
504
-
505
- DOVA: Deliberation-First Multi-Agent Orchestration
506
-
507
- 5.2. Deliberation-First Orchestration
508
-
509
- The key innovation of DOVA’s ThinkingOrchestrator is an explicit deliberation step preceding all tool invocation. Unlike standard REACT agents that reflexively call tools, the orchestrator first assesses whether external information is necessary.
518
-
519
- Algorithm 2 Deliberation-First Orchestration
- Require: Query q; user model u; context ξ; sources D′
- Ensure: Deliberation δ
- exp ← FORMATEXPERTISE(u)
- ent ← FORMATENTITIES(ξ)
- rec ← RECENTTURNS(ξ, k=6)
- T_avail ← DISCOVERTOOLS(D′)
- δ ← LLM_DELIBERATE(q, exp, ent, rec, T_avail)
- if CHECKMANDATORYTRIGGERS(q) then
-     δ.action ← USE_TOOLS
- end if
- return δ
542
-
543
- The mandatory trigger function detects temporal keywords (“latest,” “recent,” year patterns ≥2025), specificity markers (“specific papers”), and real-time queries that always warrant tool invocation.
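
A minimal sketch of such a trigger check, assuming simple substring and regular-expression matching. The keyword tuples contain only the examples quoted above; the function names, the regular expression, and the omission of real-time query detection are assumptions made for illustration.

```python
import re

# Only the examples named in the text; a real list would likely be longer.
TEMPORAL_KEYWORDS = ("latest", "recent")
SPECIFICITY_MARKERS = ("specific papers",)
YEAR_PATTERN = re.compile(r"\b(20\d{2})\b")  # assumed four-digit year format


def check_mandatory_triggers(query: str, min_year: int = 2025) -> bool:
    """Return True when the query must be answered with tools."""
    lowered = query.lower()
    if any(keyword in lowered for keyword in TEMPORAL_KEYWORDS):
        return True
    if any(marker in lowered for marker in SPECIFICITY_MARKERS):
        return True
    # Year patterns at or above the threshold also force tool use.
    return any(int(year) >= min_year for year in YEAR_PATTERN.findall(lowered))


def finalize_action(llm_action: str, query: str) -> str:
    """Override the deliberated action when a mandatory trigger fires."""
    return "USE_TOOLS" if check_mandatory_triggers(query) else llm_action
```

For example, `finalize_action("RESPOND_DIRECTLY", "recent work on debate")` returns `"USE_TOOLS"`, mirroring the override at the end of Algorithm 2.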
550
-
551
- Proposition 5.1 (Tool Call Reduction). Let $f_d$ be the fraction of queries where deliberation selects RESPOND_DIRECTLY. The expected tool-call volume relative to a standard REACT agent is $(1 - f_d)$, achieving cost savings proportional to $f_d \cdot c_{\mathrm{tool}}$, where $c_{\mathrm{tool}}$ is the average cost per tool-augmented response.
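
As a rough worked instance of Proposition 5.1, suppose deliberation answers 30% of queries directly. The value $f_d = 0.3$ is hypothetical, chosen only to make the arithmetic concrete, and is not a figure reported in the paper.

$$
\frac{\text{tool calls with deliberation}}{\text{tool calls with standard REACT}} = 1 - f_d = 0.7,
\qquad
\text{expected saving per query} \propto f_d \cdot c_{\mathrm{tool}} = 0.3\, c_{\mathrm{tool}}.
$$

Over $N$ queries the saving scales as $0.3\, N\, c_{\mathrm{tool}}$, under the proposition's implicit assumption that the baseline agent invokes tools on every query.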
562
-
563
- 5.3. Hybrid Collaborative Reasoning
564
-
565
- DOVA composes three collaboration patterns into a single
566
-
567
- pipeline.
568
-
569
- Phase 1: Ensemble. Multiple agents solve the problem
570
-
571
- independently in parallel. The agreement score quantifies
572
-
573
- consensus:
574
-
575
- $$A(c_1, \ldots, c_n) = \max\bigl(0,\; 1 - \mathrm{Var}(c_1, \ldots, c_n)\bigr). \tag{3}$$
582
-
583
- Phase 2: Blackboard. Results are posted to a shared
584
-
585
- workspace where agents contribute evidence and votes.
586
-
587
- Each post carries a weighted confidence:
588
-
589
- $$w(p) = c_{\mathrm{base}}(p) \cdot \frac{1 + \bar{a}(p)}{2}, \qquad \bar{a}(p) = \frac{1}{|V_p|} \sum_{v \in V_p} v_{\mathrm{agree}}, \tag{4}$$
- 
- where $c_{\mathrm{base}}$ is the agent’s self-assessed confidence and $\bar{a}$ is mean agreement from peer votes ($v_{\mathrm{agree}} \in [-1, 1]$) (Hayes-Roth, 1985).
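
Equations (3) and (4) are small enough to check numerically. The sketch below assumes population variance for Var (the text does not say which estimator is used) and treats a post with no votes as having zero mean agreement; both choices, and the function names, are assumptions.

```python
from statistics import pvariance
from typing import Sequence


def agreement_score(confidences: Sequence[float]) -> float:
    """Eq. (3): one minus the variance of agent confidences, floored at 0."""
    # Population variance is an assumption; the estimator is not specified.
    return max(0.0, 1.0 - pvariance(confidences))


def weighted_confidence(base_confidence: float, votes: Sequence[float]) -> float:
    """Eq. (4): scale self-assessed confidence by mean peer agreement."""
    # Votes lie in [-1, 1]; an empty vote list is treated as neutral (0).
    mean_agreement = sum(votes) / len(votes) if votes else 0.0
    return base_confidence * (1.0 + mean_agreement) / 2.0
```

Unanimous agreement (all votes +1) leaves the base confidence unchanged, unanimous disagreement (all votes −1) drives the weight to 0, and a neutral post keeps half of its base confidence.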
612
-
613
- Phase 3: Iterative Refinement. The top-ranked synthesis
614
-
615
- is iteratively refined through multi-round critique.
616
-
617
- Algorithm 3 Hybrid Collaborative Reasoning
- Require: Problem q; agents {A_i}; max iter. K; context ξ
- Ensure: Result r∗, confidence c∗, agreement A
- {Phase 1: Ensemble}
- (r̂, {c_i}, dissent) ← ENSEMBLE(q, {A_i}, ξ)
- A ← 1 − Var({c_i})
- {Phase 2: Blackboard}
- BB.clear()
- POST(HYPO, r̂, c̄)
- for d ∈ dissent do
-     POST(EVID, d, 0.3)
- end for
- r_bb ← SYNTHESIZEBB(BB)
- {Phase 3: Iterative Refinement}
- r∗ ← ITERREFINE(r_bb, {A_1, A_2}, min(2, K))
- c∗ ← ½ (c̄_ens + c_iter)
- return (r∗, c∗, A)
652
-
653
- Table 2. Thinking levels and token budgets (2–4× scaling per level).
- 
- | Level | Budget (tokens) | Typical Tasks |
- | --- | --- | --- |
- | OFF | 0 | Embeddings |
- | MINIMAL | 1,024 | Classification |
- | LOW | 4,096 | Summarization |
- | MEDIUM | 16,384 | Code generation |
- | HIGH | 32,768 | Reasoning, research |
- | XHIGH | 65,536 | Complex analysis |
670
-
671
- 5.4. Adaptive Multi-Tiered Thinking
672
-
673
- DOVA allocates reasoning compute via a six-level budget
674
-
675
- (Table 2).
676
-
677
- The selection function (Algorithm 4) maps a task to a thinking level. Formally, the budget function is:
680
-
681
- $$B(t, h, q) = \mathrm{BUD}\bigl(\mathrm{clamp}(\beta(t) + \alpha(h) + \gamma(q),\; 0,\; 5)\bigr), \tag{5}$$
- 
- where $\beta : T_{\mathrm{task}} \to \{0, \ldots, 5\}$ maps task types, $\alpha : H \to \{-1, 0, 1, 2\}$ adjusts for complexity, and $\gamma : Q \to \{-1, 0, 1\}$ adjusts for query length.
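
The budget function in Eq. (5) amounts to a table lookup plus a clamped index shift. The sketch below uses the levels and budgets from Table 2 and the length thresholds from Algorithm 4; the task-type keys, the hint strings, and measuring query length in characters are assumptions made for illustration.

```python
from typing import Optional, Tuple

LEVELS = ["OFF", "MINIMAL", "LOW", "MEDIUM", "HIGH", "XHIGH"]
BUDGETS = {  # token budgets from Table 2
    "OFF": 0, "MINIMAL": 1024, "LOW": 4096,
    "MEDIUM": 16384, "HIGH": 32768, "XHIGH": 65536,
}
# beta(t): default level per task type (key names are hypothetical).
TASK_DEFAULTS = {
    "embedding": "OFF", "classification": "MINIMAL", "summarization": "LOW",
    "code_generation": "MEDIUM", "reasoning": "HIGH", "complex_analysis": "XHIGH",
}
# alpha(h): complexity adjustment; an unknown or missing hint contributes 0.
COMPLEXITY_ADJUSTMENT = {"simple": -1, "complex": 1, "very_complex": 2}


def select_thinking_level(
    task_type: str, query: str, complexity_hint: Optional[str] = None
) -> Tuple[str, int]:
    """Return (level, token budget) for a task, following Eq. (5)."""
    base = LEVELS.index(TASK_DEFAULTS[task_type])
    adjustment = COMPLEXITY_ADJUSTMENT.get(complexity_hint, 0)
    # gamma(q): query length, measured here in characters (an assumption).
    if len(query) > 2000:
        adjustment += 1
    if len(query) < 50:
        adjustment -= 1
    index = max(0, min(len(LEVELS) - 1, base + adjustment))
    level = LEVELS[index]
    return level, BUDGETS[level]
```

A short summarization request flagged `very_complex` moves from LOW up two levels and down one for brevity, landing on MEDIUM; the clamp keeps extreme combinations inside the six defined levels.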
698
-
699
- 5.5. Multi-Component Confidence Scoring
700
-
701
- The self-evaluation service computes confidence as:
702
-
703
- $$C(r, p) = \frac{\sum_{k} w_k \cdot f_k(r, p)}{\sum_{k} w_k}, \tag{6}$$
714
-
715
- 4
716
-
717
- ### Page 5
718
-
719
- DOVA: Deliberation-First Multi-Agent Orchestration
720
-
721
- Algorithm 4 Adaptive Thinking Level Selection
- Require: Task type t; query q; complexity hint h
- Ensure: Level ℓ and budget b
- L ← [OFF, MIN, LOW, MED, HI, XH]
- base ← TASKDEFAULTS[t]
- adj ← 0
- if h = simple then
-     adj ← adj − 1
- end if
- if h = complex then
-     adj ← adj + 1
- end if
- if h = very complex then
-     adj ← adj + 2
- end if
- if |q| > 2000 then
-     adj ← adj + 1
- end if
- if |q| < 50 then
-     adj ← adj − 1
- end if
- idx ← clamp(indexOf(base) + adj, 0, 5)
- ℓ ← L[idx]; b ← BUDGETS[ℓ]
- return (ℓ, b)
768
-
769
- with four components:
770
-
771
- $$f_{\mathrm{len}}(r) = \mathrm{clip}\!\left(\frac{|r|}{\tau_{\mathrm{len}}},\; 0.2,\; 1.0\right), \tag{7}$$
- 
- $$f_{\mathrm{ref}}(r) = 1 - 0.7 \cdot \mathbb{1}\bigl[\exists\, k \in K_{\mathrm{ref}} : k \subseteq r\bigr], \tag{8}$$
- 
- $$f_{\mathrm{fmt}}(r, \varphi) = \mathrm{format\_check}(r, \varphi), \tag{9}$$
- 
- $$f_{\mathrm{rel}}(r, p) = \min\!\left(1,\; \frac{|\mathrm{kw}(r) \cap \mathrm{kw}(p)|}{0.3 \cdot |\mathrm{kw}(p)|}\right). \tag{10}$$
800
-
801
- A response is acceptable when C(r, p) ≥ θmin (default 0.6).
802
-
803
- When C < 0.7, iterative query refinement triggers (up to 2
804
-
805
- rounds).
806
-
807
- 5.6. Diversity-Aware Memory Retrieval
808
-
809
- The enhanced memory stores entries in three tiers: short-term (TTL = 86,400s), long-term (persistent), and procedural (reusable skills).
- 
- Retrieval uses cosine similarity reranked with MMR (Carbonell & Goldstein, 1998). Recent work on agent memory beyond RAG (Hu et al., 2026) decouples memories into semantic components; DOVA takes a complementary approach with tiered storage and diversity-aware retrieval:
824
-
825
- $$\mathrm{MMR}(d_i) = \lambda \cdot \mathrm{sim}(d_i, q) - (1 - \lambda) \cdot \max_{d_j \in S} \mathrm{sim}(d_i, d_j), \tag{11}$$
- 
- where $\mathrm{sim}(a, b) = a \cdot b / (\lVert a \rVert \lVert b \rVert)$, $S$ is the set of already-selected results, and $\lambda \in [0, 1]$ (default 0.5) controls the relevance–diversity trade-off.
836
-
837
- Algorithm 5 MMR-Enhanced Semantic Memory Search
- Require: Query q; top-k; λ; memory M
- Ensure: Ranked results R
- e_q ← EMBED(q)
- sc ← {(m, sim(e_q, e_m)) : m ∈ M}
- Sort sc by similarity descending
- S ← ∅; R ← ∅
- while |R| < k and sc ≠ ∅ do
-     d∗ ← arg max_{d ∈ sc} λ · sim(d, q) − (1−λ) · max_{d′ ∈ S} sim(d, d′)
-     R ← R ∪ {d∗}; S ← S ∪ {d∗}
-     sc ← sc \ {d∗}
- end while
- return R
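
A compact sketch of the greedy selection in Algorithm 5, operating on precomputed embeddings. The embedding step is omitted, and the dictionary-based memory, the function names, and the redundancy of 0 for the first pick are assumptions rather than details from the paper.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_search(query_vec: Sequence[float],
               memory: Dict[str, Sequence[float]],
               k: int = 3, lam: float = 0.5) -> List[str]:
    """Greedy MMR: pick up to k memory ids balancing relevance and diversity."""
    relevance = {m: cosine(query_vec, emb) for m, emb in memory.items()}
    remaining = set(memory)
    selected: List[str] = []

    def mmr_score(m: str) -> float:
        # Redundancy is the highest similarity to anything already selected.
        redundancy = max(
            (cosine(memory[m], memory[s]) for s in selected), default=0.0
        )
        return lam * relevance[m] - (1.0 - lam) * redundancy

    while len(selected) < k and remaining:
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With `lam=1.0` the ranking reduces to plain cosine similarity; lowering `lam` lets a less relevant but dissimilar entry overtake a near-duplicate of an item that was already selected.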
864
-
865
- Table 3. Query type to source routing.
866
-
867
- Type ArXiv GitHub HF Web
868
-
869
- Technical ✓ ✓ ✓ ✓
870
-
871
- News ✓
872
-
873
- Biographical ✓
874
-
875
- Factual ✓ ✓
876
-
877
- General ✓ ✓ ✓ ✓
878
-
879
- 5.7. Query Intent Classification
880
-
881
- The research agent classifies queries to route to appropriate
882
-
883
- sources:
884
-
885
- $$t^{*}(q) = \arg\max_{t \in T_q} \Bigl[\, \sum_{k \in K_t} \mathbb{1}[k \in q_{\downarrow}] + \mathrm{bonus}(q, t) \Bigr], \tag{12}$$
- 
- where $T_q = \{\text{tech.}, \text{news}, \text{bio.}, \text{fact.}, \text{gen.}\}$, $q_{\downarrow}$ is the lowercased query, and $\mathrm{bonus}(q, \text{bio.}) = 2 \cdot \mathbb{1}[\mathrm{is\_person}(q)]$.
- 
- Table 3 shows the source routing.
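
The keyword-count classifier in Eq. (12) might look like the sketch below. The keyword lists, the capitalized-name heuristic behind `is_person`, and the fallback to `general` when nothing matches are all hypothetical; the paper specifies only the scoring rule and the bonus of 2 for biographical queries.

```python
import re

# Hypothetical keyword sets K_t; the paper does not list them.
KEYWORDS = {
    "technical": ("model", "algorithm", "benchmark", "code"),
    "news": ("news", "announced", "today"),
    "biographical": ("who is", "born", "biography"),
    "factual": ("what is", "define", "how many"),
    "general": (),
}


def is_person(query: str) -> bool:
    """Crude stand-in: two consecutive capitalized words look like a name."""
    return re.search(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b", query) is not None


def classify(query: str) -> str:
    """Eq. (12): keyword hits in the lowercased query plus a person bonus."""
    lowered = query.lower()

    def score(query_type: str) -> int:
        hits = sum(1 for k in KEYWORDS[query_type] if k in lowered)
        bonus = 2 if query_type == "biographical" and is_person(query) else 0
        return hits + bonus

    best = max(KEYWORDS, key=score)
    # Falling back to "general" when nothing matches is an assumption.
    return best if score(best) > 0 else "general"
```

Routing from the predicted type to sources would then follow Table 3.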
900
-
901
- 5.8. Multi-Round Adversarial Debate
902
-
903
- The debate agent implements a Bull-vs-Bear pattern for evaluative queries. Inspired by financial analysis practice, two adversarial agents, Bull (advocate) and Bear (critic), argue opposing positions across multiple rounds. Each agent receives the accumulated arguments of its opponent, forcing direct engagement with counterpoints rather than independent monologues.
- 
- The sequential turn-taking is critical: in round $r$, the Bull agent conditions on all prior Bear arguments $B_{\mathrm{ear}}^{<r}$, and vice versa. This creates an implicit convergence dynamic: arguments that survive multiple rounds of adversarial scrutiny carry higher epistemic weight in the final synthesis.
- 
- The synthesis step aggregates both argument sets into a structured output containing: (i) a balanced summary, (ii) surviving strengths (Bull arguments not effectively rebutted),
936
-
937
- 5
938
-
939
- ### Page 6
940
-
941
- DOVA: Deliberation-First Multi-Agent Orchestration
942
-
943
- Algorithm 6 Multi-Round Adversarial Debate
- Require: Topic q; context ξ; rounds R (default 2)
- Ensure: Conclusion: summary, strengths, concerns, confidence
- Bull ← ∅; Bear ← ∅
- for r = 1 to R do
-     b_r ← BULLAGENT.ARGUE(q, ξ, Bear)
-     Bull ← Bull ∪ {b_r}
-     k_r ← BEARAGENT.ARGUE(q, ξ, Bull)
-     Bear ← Bear ∪ {k_r}
- end for
- return SYNTHESIZE(Bull, Bear)
966
-
967
- Table 4. Interface modalities.
- 
- | Interface | Access | Key Features |
- | --- | --- | --- |
- | REST API | HTTP | 15+ endpoints, OAuth2 |
- | CLI | Terminal | CoT display, sessions |
- | Browser UI | Web | Source chips, badges |
- | MCP Server | Stdio | 5 tools, plugin arch. |
978
-
979
- (iii) validated concerns (Bear arguments not adequately addressed), and (iv) an overall confidence score reflecting argument balance. We default to R=2 rounds, as empirically the marginal information gain diminishes beyond two rounds while token cost grows linearly.
- 
- This pattern draws on multi-agent debate research (Du et al., 2023; Liang et al., 2023), extending it with structured synthesis and integration into the broader orchestration pipeline via the deliberation layer, which determines when adversarial analysis is warranted versus simpler reasoning modes.
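
The turn-taking in Algorithm 6 can be sketched as follows. The three callables stand in for LLM-backed agents and the synthesis step; their signatures and the function name are hypothetical, and only the ordering of turns and the default of two rounds come from the text.

```python
from typing import Any, Callable, List, Optional

ArgueFn = Callable[[str, Optional[str], List[str]], str]


def run_debate(topic: str,
               bull_argue: ArgueFn,
               bear_argue: ArgueFn,
               synthesize: Callable[[List[str], List[str]], Any],
               rounds: int = 2,
               context: Optional[str] = None) -> Any:
    """Alternate Bull and Bear turns, each seeing the opponent's arguments so far."""
    bull_args: List[str] = []
    bear_args: List[str] = []
    for _ in range(rounds):
        # Bull speaks first and sees every Bear argument from earlier rounds.
        bull_args.append(bull_argue(topic, context, list(bear_args)))
        # Bear replies and sees all Bull arguments, including this round's.
        bear_args.append(bear_argue(topic, context, list(bull_args)))
    return synthesize(bull_args, bear_args)
```

Because Bear always answers after Bull within a round, Bear sees one more opposing argument than Bull does at each step, which matches the order of the two ARGUE calls in the pseudocode.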
998
-
999
- 6. Interface Modalities
1000
-
1001
- DOVA exposes its orchestration engine through four interfaces sharing the same backend (Table 4).
1004
-
1005
- 6.1. Claude Code Integration via Dynamic Plugin
1006
-
1007
- The MCP server (Anthropic, 2024b) exposes five tools to Claude Code: dova_research, dova_search, dova_debate, dova_validate, and dova_web_search. Communication uses stdio transport with lazy initialization.
- 
- The plugin architecture provides: (i) a plugin.json manifest; (ii) an .mcp.json server configuration; (iii) custom slash-command skills (/dova-research, /dova-debate); (iv) a custom agent definition enabling autonomous multi-source research.
1026
-
1027
- This creates a bidirectional integration: Claude Code in-
1028
-
1029
- vokes DOVA as a tool provider, while DOVA uses Claude
1030
-
1031
- models as its LLM backbone—each system augmenting the
1032
-
1033
- other.
1034
-
1035
- 6.2. Interactive CLI
1036
-
1037
- The interactive CLI provides a seven-step chain-of-thought pipeline: (1) Observe: parse input; (2) Recall: search memory; (3) Reason: CoT analysis; (4) Plan: select action; (5) Act: execute tools; (6) Reflect: evaluate quality; (7) Respond: generate output. Session commands (/status, /thinking, /orchestrator) provide runtime control.
1050
-
1051
- 7. Experiments and Evaluation
1052
-
1053
- We evaluate DOVA through an architectural ablation and
1054
-
1055
- reasoning mode comparison.
1056
-
1057
- 7.1. Setup
1058
-
1059
- Models. Claude Sonnet 4.6 (Standard tier), Claude
1060
-
1061
- Opus 4.6 (Advanced tier), and Claude Haiku 4.5 (Basic
1062
-
1063
- tier).
1064
-
1065
- Baselines. (1) Single-LLM: one Claude Opus call;
1066
-
1067
- (2) REACT-only: standard REACT without deliberation
1068
-
1069
- or collaboration; (3) Ensemble-only: parallel multi-agent
1070
-
1071
- without blackboard or iterative refinement.
1072
-
1073
- Metrics. Answer confidence (C), source coverage (Cov),
1074
-
1075
- token efficiency, latency, refinement rate, and error recovery
1076
-
1077
- rate.
1078
-
1079
- 7.2. Ablation Study
1080
-
1081
- Table 5 presents the architectural ablation across seven con-
1082
-
1083
- figurations.
1084
-
- Key findings.
-
- 1. Collaboration is highest-impact: removing it drops confidence by 0.14 and coverage by 0.25.
- 2. Self-evaluation prevents degradation: without it, low-quality responses reach the user (refinement rate 18%→35%).
- 3. Adaptive thinking is a pure efficiency gain: fixed MEDIUM reduces token efficiency by 32% with minimal confidence impact.
- 4. Deliberation reduces cost: removing it increases latency by 19% and decreases efficiency by 27% through unnecessary tool invocations.
- 5. ReAct is foundational: single-pass causes the largest confidence drop (0.82→0.58).
1106
-
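The deltas quoted in the key findings can be re-derived from the Table 5 values. The short check below copies five rows of that table into a dictionary (the key names and helper functions are illustrative assumptions) and computes absolute drops for confidence and coverage and relative changes for token efficiency and latency.

```python
# Values copied from Table 5; dictionary keys are illustrative names.
TABLE5 = {
    "full":           {"conf": 0.82, "cov": 0.90, "tok_eff": 0.71, "lat_s": 12.4},
    "-collaboration": {"conf": 0.68, "cov": 0.65, "tok_eff": 0.74, "lat_s": 6.1},
    "-thinking":      {"conf": 0.79, "cov": 0.88, "tok_eff": 0.48, "lat_s": 11.8},
    "-deliberation":  {"conf": 0.77, "cov": 0.90, "tok_eff": 0.52, "lat_s": 14.8},
    "-react":         {"conf": 0.58, "cov": 0.45, "tok_eff": 0.80, "lat_s": 3.2},
}

def abs_drop(config, metric):
    """Absolute drop relative to the full system, rounded to two decimals."""
    return round(TABLE5["full"][metric] - TABLE5[config][metric], 2)

def rel_change_pct(config, metric):
    """Relative change versus the full system, as a rounded percentage."""
    full = TABLE5["full"][metric]
    return round(100 * (TABLE5[config][metric] - full) / full)

print(abs_drop("-collaboration", "conf"), abs_drop("-collaboration", "cov"))
print(rel_change_pct("-thinking", "tok_eff"))
print(rel_change_pct("-deliberation", "lat_s"), rel_change_pct("-deliberation", "tok_eff"))
```

The percentages in findings 3 and 4 are therefore relative to the full-system value, while the drops in findings 1 and 5 are absolute differences.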
1107
- 7.3. Reasoning Mode Comparison
1108
-
1109
- Table 6 compares the four reasoning modes that DOVA ex-
1110
-
1111
- poses, each representing a different point on the quality–cost
1112
-
1113
- Pareto frontier.
1114
-
- Quick mode uses a single agent with minimal thinking budget and no tool invocation, suitable for simple factual recall or conversational follow-ups. Standard mode enables the full ReAct loop with self-reflection and tool access, providing a 31% relative confidence gain over Quick (0.52→0.68) at 6× the token cost. Deep mode activates multiple agents with ensemble reasoning but without the blackboard or iterative refinement phases, achieving a further 15% relative confidence improvement (0.68→0.78). Collaborative mode engages the complete hybrid pipeline (Algorithm 3), yielding the highest confidence at the cost of 32.5× the tokens of Quick mode.
-
- ### Page 7
-
- Table 5. Architectural ablation study. Each row removes one component. Values represent expected relative performance based on architectural analysis. ↑ = higher is better; ↓ = lower is better. Bold indicates full-system values.
-
- | Configuration | Reasoning | Collab. | Think | Conf.↑ | Cov.↑ | Tok.Eff.↑ | Lat.(s)↓ |
- |---|---|---|---|---|---|---|---|
- | DOVA-Full | ✓ | ✓ | Adaptive | **0.82** | **0.90** | **0.71** | **12.4** |
- | −Collaboration | ✓ | — | Adaptive | 0.68 | 0.65 | 0.74 | 6.1 |
- | −Thinking (fixed Med) | ✓ | ✓ | Fixed | 0.79 | 0.88 | 0.48 | 11.8 |
- | −Memory | ✓ | ✓ | Adaptive | 0.75 | 0.85 | 0.65 | 11.2 |
- | −Deliberation | ✓ | ✓ | Adaptive | 0.77 | 0.90 | 0.52 | 14.8 |
- | −Self-Eval | ✓ | ✓ | Adaptive | 0.70 | 0.88 | 0.69 | 10.1 |
- | −ReAct (single pass) | — | — | — | 0.58 | 0.45 | 0.80 | 3.2 |
- | Single-LLM baseline | — | — | — | 0.52 | 0.00 | 0.85 | 1.8 |
-
- Table 6. Reasoning mode comparison. Confidence and token consumption are averaged across a mixed workload of factual, technical, and evaluative queries.
-
- | Mode | Agents | Conf. | Lat. | Tok. |
- |---|---|---|---|---|
- | Quick | 1 | 0.52 | 1.8s | 2K |
- | Standard | 1 | 0.68 | 6.5s | 12K |
- | Deep | N | 0.78 | 18.3s | 45K |
- | Collaborative | N | 0.82 | 24.1s | 65K |
1180
-
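The ratios quoted for the four reasoning modes follow directly from the Table 6 values. The snippet below is a small arithmetic check under that reading; the tuple layout and helper names are assumptions made for the example.

```python
# (confidence, latency in seconds, tokens in thousands), copied from Table 6.
MODES = {
    "Quick":         (0.52, 1.8, 2),
    "Standard":      (0.68, 6.5, 12),
    "Deep":          (0.78, 18.3, 45),
    "Collaborative": (0.82, 24.1, 65),
}

def conf_gain_pct(mode, baseline):
    """Relative confidence gain of `mode` over `baseline`, in percent."""
    return round(100 * (MODES[mode][0] / MODES[baseline][0] - 1))

def token_ratio(mode, baseline):
    """How many times more tokens `mode` consumes than `baseline`."""
    return MODES[mode][2] / MODES[baseline][2]

print(conf_gain_pct("Standard", "Quick"), token_ratio("Standard", "Quick"))
print(conf_gain_pct("Deep", "Standard"))
print(token_ratio("Collaborative", "Quick"))
```

Note that the 15% figure for Deep is measured against Standard, not against Quick.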
1181
- The confidence gap between Standard and Collaborative
1182
-
1183
- (0.68 vs. 0.82) highlights the value of multi-agent reason-
1184
-
1185
- ing for complex queries, while the gap between Quick and
1186
-
1187
- Standard (0.52 vs. 0.68) demonstrates that tool access and
1188
-
1189
- self-reflection are individually high-value. The delibera-
1190
-
1191
- tion layer (§5.2) automatically selects the appropriate mode
1192
-
1193
- based on query complexity, ensuring that simple queries de-
1194
-
1195
- fault to Quick or Standard while research-intensive queries
1196
-
1197
- escalate to Deep or Collaborative.
1198
-
1199
- 7.4. Token Efficiency Analysis
1200
-
1201
- Figure 2 illustrates the token savings from adaptive thinking
1202
-
1203
- level selection (Algorithm 4) compared to a fixed MEDIUM
1204
-
1205
- baseline across five representative task types.
1206
-
- The savings are most pronounced for lightweight tasks: classification drops from 16K to 1K tokens (94% reduction) and summarization from 16K to 4K (75%), since these tasks require only MINIMAL and LOW thinking budgets respectively. For complex tasks (reasoning and research), the adaptive system allocates HIGH budgets (33K), exceeding the fixed 16K baseline. This is the intended behavior, as underspending on hard tasks degrades answer quality (Table 5, row 2).
-
- [Figure 2: bar chart of tokens (K) by task type. Adaptive: Classif. 1, Summ. 4, Code 16, Reason. 33, Research 33. Fixed: 16 for every task type.]
-
- Figure 2. Token consumption: adaptive vs. fixed MEDIUM. Adaptive saves 94% on classification and 75% on summarization.
1264
-
1265
- The key insight is that adaptive allocation is not uniformly
1266
-
1267
- cheaper. Rather, it redistributes tokens from tasks that do
1268
-
1269
- not benefit from deep reasoning to tasks that do. Under
1270
-
1271
- a realistic workload where 40–60% of queries are simple
1272
-
1273
- (classification, summarization, or short factual lookups), the
1274
-
1275
- aggregate token savings reach 40–60% with no measurable
1276
-
1277
- confidence loss (Table 5: 0.82 vs. 0.79). Code generation
1278
-
1279
- consumes 16K under both schemes because its default level
1280
-
1281
- (MEDIUM) already matches the fixed baseline.
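How much adaptive allocation saves in aggregate depends on the workload mix. The sketch below uses the per-task budgets shown in Figure 2 and two query mixes that are purely illustrative assumptions (they do not come from the paper) to show how the aggregate figure moves between a simple-heavy and a research-heavy workload.

```python
FIXED_K = 16  # fixed MEDIUM baseline, thousands of tokens per query (Figure 2)

# Adaptive budgets per task type, thousands of tokens (Figure 2).
ADAPTIVE_K = {
    "classification": 1,
    "summarization": 4,
    "code": 16,
    "reasoning": 33,
    "research": 33,
}

def task_saving(task):
    """Fraction of tokens saved on one task type versus the fixed baseline."""
    return 1 - ADAPTIVE_K[task] / FIXED_K

def aggregate_saving(mix):
    """Expected saving for a workload given as {task: share}, shares summing to 1."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("workload shares must sum to 1")
    adaptive = sum(share * ADAPTIVE_K[task] for task, share in mix.items())
    return 1 - adaptive / FIXED_K

# Hypothetical mixes, chosen only to illustrate the two regimes.
simple_heavy = {"classification": 0.3, "summarization": 0.3, "code": 0.4}
research_heavy = {t: 0.2 for t in ADAPTIVE_K}

print(f"simple-heavy:   {aggregate_saving(simple_heavy):.1%}")
print(f"research-heavy: {aggregate_saving(research_heavy):.1%}")
```

Under these assumed mixes the simple-heavy workload saves about half of the tokens, while the evenly split workload spends slightly more than the fixed baseline, which matches the point that adaptive allocation redistributes tokens rather than uniformly reducing them.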
1282
-
1283
- 7.5. Component Interaction Effects
1284
-
1285
- We observe notable interactions:
1286
-
- • Deliberation × Collaboration: Removing both is worse than the sum of individual removals, because deliberation gatekeeps expensive collaborative reasoning.
- • Memory × Self-Eval: Memory provides context that improves evaluation accuracy. Without it, false-positive retries increase.
- • Thinking × Tiering: Adaptive thinking (depth within a model) is complementary to model tiering (which model), providing two-dimensional cost optimization.
1306
-
1309
- ### Page 8
1310
-
1313
- 8. Discussion
1314
-
1315
- Deliberation as meta-cognition. The deliberation-first
1316
-
1317
- approach represents meta-reasoning—the system reasons
1318
-
1319
- about whether to reason. This parallels human metacogni-
1320
-
1321
- tive monitoring, where experts assess their knowledge state
1322
-
1323
- before consulting external sources (Shinn et al., 2023).
1324
-
1325
- Composition over specialization. Rather than a single
1326
-
1327
- monolithic pattern, DOVA’s hybrid approach composes sim-
1328
-
1329
- ple, well-understood patterns (ensemble, blackboard, iter-
1330
-
1331
- ative) into a pipeline with emergent capabilities exceeding
1332
-
1333
- any individual pattern.
1334
-
1335
- Cost-aware intelligence. Model tiering + adaptive think-
1336
-
1337
- ing provides two-dimensional cost control. Organizations
1338
-
1339
- can set budget constraints knowing the system degrades
1340
-
1341
- gracefully.
1342
-
1343
- 8.1. Limitations
1344
-
- 1. Self-evaluation circularity. Confidence scoring uses the same LLM that generated the response. External signals (user feedback) would strengthen assessment.
- 2. Ablation scope. Our ablation is based on architectural analysis rather than large-scale benchmarks. Evaluation on standard benchmarks (HotpotQA, MMLU) and emerging agent evaluation frameworks (Ferrag et al., 2025) remains future work.
- 3. Memory scalability. In-memory MMR search has O(n · k) complexity; indexing is needed for very large stores.
- 4. Agent homogeneity. All agents share the same LLM backbone. Heterogeneous models could improve ensemble diversity.
1372
-
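The memory-scalability limitation refers to maximal marginal relevance (Carbonell and Goldstein, 1998). A minimal greedy MMR sketch is given below to make the cost concrete. It assumes plain Python lists as embeddings, cosine similarity, and a trade-off weight `lam`; these choices and all names are illustrative, not taken from DOVA's code. Caching each candidate's maximum similarity to the selected set keeps the work at roughly n similarity evaluations per selected item, so about n · k in total.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def mmr_select(query, candidates, k, lam=0.7):
    """Greedy MMR: return indices of up to k candidates.

    Score = lam * relevance - (1 - lam) * max similarity to already selected items.
    """
    relevance = [cosine(query, c) for c in candidates]
    max_sim = [0.0] * len(candidates)      # cached redundancy term per candidate
    remaining = list(range(len(candidates)))
    selected = []
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda i: lam * relevance[i] - (1 - lam) * max_sim[i])
        selected.append(best)
        remaining.remove(best)
        # One pass over the remaining candidates per selection: O(n) per round.
        for i in remaining:
            max_sim[i] = max(max_sim[i], cosine(candidates[best], candidates[i]))
    return selected

query = [1.0, 1.0]
memories = [[1.0, 0.9], [1.0, 0.8], [0.2, 1.0]]
print(mmr_select(query, memories, k=2, lam=1.0))  # pure relevance ranking
print(mmr_select(query, memories, k=2, lam=0.5))  # relevance traded against diversity
```

With `lam=1.0` the second pick is the near-duplicate of the first; with `lam=0.5` the more diverse third vector is chosen instead. Every round still scans all remaining candidates, which is why an index becomes necessary once the store grows very large.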
1373
- 9. Conclusion
1374
-
1375
- We presented DOVA, a multi-agent platform for autonomous
1376
-
1377
- research automation introducing deliberation-first orches-
1378
-
1379
- tration, hybrid collaborative reasoning, and adaptive multi-
1380
-
1381
- tiered thinking. The architectural ablation demonstrates that
1382
-
1383
- collaborative reasoning is the highest-impact component,
1384
-
1385
- while adaptive thinking and deliberation provide significant
1386
-
1387
- efficiency gains without sacrificing quality.
1388
-
1389
- Future directions include: persistent user models learn-
1390
-
1391
- ing from feedback; heterogeneous agent ensembles mix-
1392
-
1393
- ing LLM providers; streaming deliberation display; multi-
1394
-
1395
- modal context integration; and comprehensive benchmark-
1396
-
1397
- ing on standard multi-hop QA datasets.
1398
-
1399
- DOVA is available as open-source software under
1400
-
1401
- Apache 2.0 at
1402
-
1403
- https://github.com/alfredcs/dova.
1404
-
1405
- References
1406
-
1407
- Alomrani, M. A., Zhang, Y., Li, D., Sun, Q., Pal, S., Zhang,
1408
-
1409
- Z., Hu, Y., Ajwani, R. D., Valkanas, A., et al. Reasoning
1410
-
1411
- on a budget: A survey of adaptive and controllable test-
1412
-
1413
- time compute in LLMs. arXiv preprint arXiv:2507.02076,
1414
-
1415
- 2025.
1416
-
1417
- Anthropic. The Claude model family: Technical report.
1418
-
1419
- Technical report, Anthropic, 2024a.
1420
-
1421
- Anthropic. Model context protocol specification.
1422
-
1423
- Technical report, Anthropic, 2024b.
1424
-
1425
- https://modelcontextprotocol.io.
1426
-
1427
- Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D.,
1428
-
1429
- Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G.,
1430
-
1431
- Askell, A., et al. Language models are few-shot learners.
1432
-
1433
- In Advances in Neural Information Processing Systems,
1434
-
1435
- volume 33, pp. 1877–1901, 2020.
1436
-
1437
- Carbonell, J. and Goldstein, J. The use of MMR, diversity-
1438
-
1439
- based reranking for reordering documents and producing
1440
-
1441
- summaries. In Proceedings of the 21st Annual Interna-
1442
-
1443
- tional ACM SIGIR Conference on Research and Develop-
1444
-
1445
- ment in Information Retrieval, pp. 335–336, 1998.
1446
-
1447
- Chen, Q., Qin, L., Liu, J., et al. Towards reasoning era: A
1448
-
1449
- survey of long chain-of-thought for reasoning large lan-
1450
-
1451
- guage models. arXiv preprint arXiv:2503.09567, 2025.
1452
-
1453
- Dang, Y., Qian, C., Luo, X., Fan, J., Xie, Z., Shi, R., Chen,
1454
-
1455
- W., Yang, C., Che, X., Tian, Y., et al. Multi-agent col-
1456
-
1457
- laboration via evolving orchestration. arXiv preprint
1458
-
1459
- arXiv:2505.19591, 2025.
1460
-
1461
- Du, Y., Li, S., Torralba, A., Tenenbaum, J. B., and Mor-
1462
-
1463
- datch, I. Improving factuality and reasoning in lan-
1464
-
1465
- guage models through multiagent debate. arXiv preprint
1466
-
1467
- arXiv:2305.14325, 2023.
1468
-
1469
- Ferrag, M. A., Tihanyi, N., and Debbah, M. From LLM
1470
-
1471
- reasoning to autonomous AI agents: A comprehensive
1472
-
1473
- review. arXiv preprint arXiv:2504.19678, 2025.
1474
-
1475
- Goyal, S., Ji, Z., Rawat, A. S., Menon, A. K., Kumar,
1476
-
1477
- S., and Nagarajan, V. Think before you speak: Training
1478
-
1479
- language models with pause tokens. arXiv preprint
1480
-
1481
- arXiv:2310.02226, 2023.
1482
-
1483
- Graves, A. Adaptive computation time for recurrent neural
1484
-
1485
- networks. arXiv preprint arXiv:1603.08983, 2016.
1486
-
1487
- Han, T., Wang, Z., Fang, C., et al. Token-budget-aware
1488
-
1489
- LLM reasoning. arXiv preprint arXiv:2412.18547, 2024.
1490
-
1491
- Hayes-Roth, B. A blackboard architecture for control. Arti-
1492
-
1493
- ficial Intelligence, 26(3):251–321, 1985.
1494
-
1497
- ### Page 9
1498
-
1501
- Hong, S., Zhuge, M., Chen, J., Zheng, X., Cheng, Y., Zhang,
1502
-
1503
- C., Wang, J., Wang, Z., Yau, S. K. S., Lin, Z., et al.
1504
-
1505
- MetaGPT: Meta programming for a multi-agent collab-
1506
-
1507
- orative framework. arXiv preprint arXiv:2308.00352,
1508
-
1509
- 2023.
1510
-
1511
- Hou, X., Zhao, Y., Wang, S., and Wang, H. Model context
1512
-
1513
- protocol (MCP): Landscape, security threats, and future
1514
-
1515
- research directions. arXiv preprint arXiv:2503.23278,
1516
-
1517
- 2025.
1518
-
1519
- Hu, Z., Zhu, Q., Yan, H., et al. Beyond RAG for agent
1520
-
1521
- memory: Retrieval by decoupling and aggregation. arXiv
1522
-
1523
- preprint arXiv:2602.02007, 2026.
1524
-
1525
- Li, G., Hammoud, H. A. A. K., Itani, H., Khizbullin, D., and
1526
-
1527
- Ghanem, B. CAMEL: Communicative agents for “mind”
1528
-
1529
- exploration of large language model society. Advances in
1530
-
1531
- Neural Information Processing Systems, 36, 2023.
1532
-
1533
- Li, J., Zhao, W., Zhang, Y., and Gan, C. Steering
1534
-
1535
- LLM thinking with budget guidance. arXiv preprint
1536
-
1537
- arXiv:2506.13752, 2025.
1538
-
1539
- Liang, T., He, Z., Jiao, W., Wang, X., Wang, Y., Wang,
1540
-
1541
- R., Yang, Y., Tu, Z., and Shi, S. Encouraging divergent
1542
-
1543
- thinking in large language models through multi-agent
1544
-
1545
- debate. arXiv preprint arXiv:2305.19118, 2023.
1546
-
1547
- Lin, K., Snell, C., Wang, Y., et al. Sleep-time compute:
1548
-
1549
- Beyond inference scaling at test-time. arXiv preprint
1550
-
1551
- arXiv:2504.13171, 2025.
1552
-
1553
- Luo, Z., Shen, Z., Yang, W., et al. MCP-Universe:
1554
-
1555
- Benchmarking large language models with real-world
1556
-
1557
- model context protocol servers. arXiv preprint
1558
-
1559
- arXiv:2508.14704, 2025.
1560
-
1561
- Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao,
1562
-
1563
- L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S.,
1564
-
1565
- Yang, Y., et al. Self-refine: Iterative refinement with self-
1566
-
1567
- feedback. In Advances in Neural Information Processing
1568
-
1569
- Systems, volume 36, 2023.
1570
-
1571
- Orogat, A., Rostam, A., and Mansour, E. Understanding
1572
-
1573
- multi-agent LLM frameworks: A unified benchmark and
1574
-
1575
- experimental analysis. arXiv preprint arXiv:2602.03128,
1576
-
1577
- 2026.
1578
-
1579
- Park, J. S., O’Brien, J. C., Cai, C. J., Morris, M. R., Liang,
1580
-
1581
- P., and Bernstein, M. S. Generative agents: Interactive
1582
-
1583
- simulacra of human behavior. In Proceedings of the 36th
1584
-
1585
- Annual ACM Symposium on User Interface Software and
1586
-
1587
- Technology, pp. 1–22, 2023.
1588
-
1589
- Patil, S. G., Zhang, T., Wang, X., and Gonzalez, J. E. Go-
1590
-
1591
- rilla: Large language model connected with massive APIs.
1592
-
1593
- arXiv preprint arXiv:2305.15334, 2023.
1594
-
1595
- Qin, Y., Liang, S., Ye, Y., Zhu, K., Yan, L., Lu, Y., Lin, Y.,
1596
-
1597
- Cong, X., Tang, X., Qian, B., et al. ToolLLM: Facilitating
1598
-
1599
- large language models to master 16000+ real-world APIs.
1600
-
1601
- arXiv preprint arXiv:2307.16789, 2023.
1602
-
1603
- Schick, T., Dwivedi-Yu, J., Dessì, R., Raileanu, R., Lomeli,
1604
-
1605
- M., Hambro, E., Zettlemoyer, L., Cancedda, N., and
1606
-
1607
- Scialom, T. Toolformer: Language models can teach
1608
-
1609
- themselves to use tools. In Advances in Neural Informa-
1610
-
1611
- tion Processing Systems, volume 36, 2023.
1612
-
1613
- Shinn, N., Cassano, F., Gopinath, A., Narasimhan, K., and
1614
-
1615
- Yao, S. Reflexion: Language agents with verbal rein-
1616
-
1617
- forcement learning. In Advances in Neural Information
1618
-
1619
- Processing Systems, volume 36, 2023.
1620
-
1621
- Tran, K.-T., Dao, D., Nguyen, M.-D., Pham, Q.-V.,
1622
-
1623
- O’Sullivan, B., and Nguyen, H. D. Multi-agent collabo-
1624
-
1625
- ration mechanisms: A survey of LLMs. arXiv preprint
1626
-
1627
- arXiv:2501.06322, 2025.
1628
-
1629
- Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E.,
1630
-
1631
- Narang, S., Chowdhery, A., and Zhou, D.
1632
-
1633
- Self-consistency improves chain of thought reasoning in
1634
-
1635
- language models. In International Conference on Learning
1636
-
1637
- Representations, 2023.
1638
-
1639
- Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B.,
1640
-
1641
- Xia, F., Chi, E., Le, Q., and Zhou, D. Chain-of-thought
1642
-
1643
- prompting elicits reasoning in large language models.
1644
-
1645
- In Advances in Neural Information Processing Systems,
1646
-
1647
- volume 35, pp. 24824–24837, 2022.
1648
-
1649
- Wei, T., Li, T.-W., Liu, Z., Ning, X., Yang, Z., Zou, J., Zeng,
1650
-
1651
- Z., Qiu, R., Lin, X., Fu, D., et al. Agentic reasoning for
1652
-
1653
- large language models. arXiv preprint arXiv:2601.12538,
1654
-
1655
- 2026.
1656
-
1657
- Wu, Q., Bansal, G., Zhang, J., Wu, Y., Li, B., Zhu, E., Jiang,
1658
-
1659
- L., Zhang, X., Zhang, S., Liu, J., et al. AutoGen: Enabling
1660
-
1661
- next-gen LLM applications via multi-agent conversation.
1662
-
1663
- arXiv preprint arXiv:2308.08155, 2023.
1664
-
1665
- Yao, S., Yu, D., Zhao, J., Shafran, I., Griffiths, T. L., Cao,
1666
-
1667
- Y., and Narasimhan, K. Tree of thoughts: Deliberate
1668
-
1669
- problem solving with large language models. Advances
1670
-
1671
- in Neural Information Processing Systems, 36, 2023a.
1672
-
1673
- Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan,
1674
-
1675
- K., and Cao, Y. ReAct: Synergizing reasoning and act-
1676
-
1677
- ing in language models. In International Conference on
1678
-
1679
- Learning Representations, 2023b.
1680
-
1681
- Zhou, A., Yan, K., Shlapentokh-Rothman, M., Wang, H.,
1682
-
1683
- and Wang, Y.-X. Language agent tree search unifies rea-
1684
-
1685
- soning, acting, and planning in language models. arXiv
1686
-
1687
- preprint arXiv:2310.04406, 2023.
1688
-
1691
- ### Page 10
1692
-
1695
- Zhu, K., Li, H., Wu, S., et al. Scaling test-time compute for
1696
-
1697
- LLM agents. arXiv preprint arXiv:2506.12928, 2025.
1698
-