groundswell 0.0.2 → 1.0.0

This diff shows the contents of publicly available package versions released to one of the supported registries. It is provided for informational purposes only and reflects the changes between package versions as they appear in their respective public registries.
Files changed (554)
  1. package/LICENSE +21 -0
  2. package/README.md +26 -9
  3. package/dist/cache/cache-key.d.ts +86 -0
  4. package/dist/cache/cache-key.d.ts.map +1 -0
  5. package/dist/cache/cache-key.js +204 -0
  6. package/dist/cache/cache-key.js.map +1 -0
  7. package/dist/cache/cache.d.ts +104 -0
  8. package/dist/cache/cache.d.ts.map +1 -0
  9. package/dist/cache/cache.js +179 -0
  10. package/dist/cache/cache.js.map +1 -0
  11. package/{src/cache/index.ts → dist/cache/index.d.ts} +1 -1
  12. package/dist/cache/index.d.ts.map +1 -0
  13. package/dist/cache/index.js +6 -0
  14. package/dist/cache/index.js.map +1 -0
  15. package/dist/core/agent.d.ts +203 -0
  16. package/dist/core/agent.d.ts.map +1 -0
  17. package/dist/core/agent.js +833 -0
  18. package/dist/core/agent.js.map +1 -0
  19. package/{src/core/context.ts → dist/core/context.d.ts} +16 -67
  20. package/dist/core/context.d.ts.map +1 -0
  21. package/dist/core/context.js +80 -0
  22. package/dist/core/context.js.map +1 -0
  23. package/dist/core/event-tree.d.ts +72 -0
  24. package/dist/core/event-tree.d.ts.map +1 -0
  25. package/dist/core/event-tree.js +211 -0
  26. package/dist/core/event-tree.js.map +1 -0
  27. package/{src/core/factory.ts → dist/core/factory.d.ts} +6 -27
  28. package/dist/core/factory.d.ts.map +1 -0
  29. package/dist/core/factory.js +110 -0
  30. package/dist/core/factory.js.map +1 -0
  31. package/{src/core/index.ts → dist/core/index.d.ts} +2 -10
  32. package/dist/core/index.d.ts.map +1 -0
  33. package/dist/core/index.js +9 -0
  34. package/dist/core/index.js.map +1 -0
  35. package/dist/core/logger.d.ts +50 -0
  36. package/dist/core/logger.d.ts.map +1 -0
  37. package/dist/core/logger.js +91 -0
  38. package/dist/core/logger.js.map +1 -0
  39. package/dist/core/mcp-handler.d.ts +127 -0
  40. package/dist/core/mcp-handler.d.ts.map +1 -0
  41. package/dist/core/mcp-handler.js +323 -0
  42. package/dist/core/mcp-handler.js.map +1 -0
  43. package/dist/core/prompt.d.ts +80 -0
  44. package/dist/core/prompt.d.ts.map +1 -0
  45. package/dist/core/prompt.js +120 -0
  46. package/dist/core/prompt.js.map +1 -0
  47. package/dist/core/workflow-context.d.ts +61 -0
  48. package/dist/core/workflow-context.d.ts.map +1 -0
  49. package/dist/core/workflow-context.js +358 -0
  50. package/dist/core/workflow-context.js.map +1 -0
  51. package/dist/core/workflow.d.ts +543 -0
  52. package/dist/core/workflow.d.ts.map +1 -0
  53. package/dist/core/workflow.js +986 -0
  54. package/dist/core/workflow.js.map +1 -0
  55. package/dist/debugger/event-replayer.d.ts +422 -0
  56. package/dist/debugger/event-replayer.d.ts.map +1 -0
  57. package/dist/debugger/event-replayer.js +639 -0
  58. package/dist/debugger/event-replayer.js.map +1 -0
  59. package/dist/debugger/index.d.ts +2 -0
  60. package/dist/debugger/index.d.ts.map +1 -0
  61. package/{src/debugger/index.ts → dist/debugger/index.js} +1 -0
  62. package/dist/debugger/index.js.map +1 -0
  63. package/dist/debugger/tree-debugger.d.ts +240 -0
  64. package/dist/debugger/tree-debugger.d.ts.map +1 -0
  65. package/dist/debugger/tree-debugger.js +620 -0
  66. package/dist/debugger/tree-debugger.js.map +1 -0
  67. package/dist/decorators/index.d.ts +4 -0
  68. package/dist/decorators/index.d.ts.map +1 -0
  69. package/{src/decorators/index.ts → dist/decorators/index.js} +1 -0
  70. package/dist/decorators/index.js.map +1 -0
  71. package/dist/decorators/observed-state.d.ts +32 -0
  72. package/dist/decorators/observed-state.d.ts.map +1 -0
  73. package/dist/decorators/observed-state.js +79 -0
  74. package/dist/decorators/observed-state.js.map +1 -0
  75. package/dist/decorators/step.d.ts +15 -0
  76. package/dist/decorators/step.d.ts.map +1 -0
  77. package/dist/decorators/step.js +192 -0
  78. package/dist/decorators/step.js.map +1 -0
  79. package/dist/decorators/task.d.ts +50 -0
  80. package/dist/decorators/task.d.ts.map +1 -0
  81. package/dist/decorators/task.js +118 -0
  82. package/dist/decorators/task.js.map +1 -0
  83. package/dist/examples/index.d.ts +3 -0
  84. package/dist/examples/index.d.ts.map +1 -0
  85. package/{src/examples/index.ts → dist/examples/index.js} +1 -0
  86. package/dist/examples/index.js.map +1 -0
  87. package/dist/examples/tdd-orchestrator.d.ts +15 -0
  88. package/dist/examples/tdd-orchestrator.d.ts.map +1 -0
  89. package/dist/examples/tdd-orchestrator.js +121 -0
  90. package/dist/examples/tdd-orchestrator.js.map +1 -0
  91. package/dist/examples/test-cycle-workflow.d.ts +14 -0
  92. package/dist/examples/test-cycle-workflow.d.ts.map +1 -0
  93. package/dist/examples/test-cycle-workflow.js +116 -0
  94. package/dist/examples/test-cycle-workflow.js.map +1 -0
  95. package/dist/harnesses/claude-code-harness.d.ts +391 -0
  96. package/dist/harnesses/claude-code-harness.d.ts.map +1 -0
  97. package/dist/harnesses/claude-code-harness.js +1076 -0
  98. package/dist/harnesses/claude-code-harness.js.map +1 -0
  99. package/dist/harnesses/harness-registry.d.ts +440 -0
  100. package/dist/harnesses/harness-registry.d.ts.map +1 -0
  101. package/dist/harnesses/harness-registry.js +543 -0
  102. package/dist/harnesses/harness-registry.js.map +1 -0
  103. package/dist/harnesses/index.d.ts +12 -0
  104. package/dist/harnesses/index.d.ts.map +1 -0
  105. package/dist/harnesses/index.js +11 -0
  106. package/dist/harnesses/index.js.map +1 -0
  107. package/dist/harnesses/pi-harness.d.ts +219 -0
  108. package/dist/harnesses/pi-harness.d.ts.map +1 -0
  109. package/dist/harnesses/pi-harness.js +676 -0
  110. package/dist/harnesses/pi-harness.js.map +1 -0
  111. package/dist/harnesses/pi-schema-converter.d.ts +24 -0
  112. package/dist/harnesses/pi-schema-converter.d.ts.map +1 -0
  113. package/dist/harnesses/pi-schema-converter.js +81 -0
  114. package/dist/harnesses/pi-schema-converter.js.map +1 -0
  115. package/dist/harnesses/register-defaults.d.ts +24 -0
  116. package/dist/harnesses/register-defaults.d.ts.map +1 -0
  117. package/dist/harnesses/register-defaults.js +40 -0
  118. package/dist/harnesses/register-defaults.js.map +1 -0
  119. package/dist/harnesses/session-store.d.ts +201 -0
  120. package/dist/harnesses/session-store.d.ts.map +1 -0
  121. package/dist/harnesses/session-store.js +254 -0
  122. package/dist/harnesses/session-store.js.map +1 -0
  123. package/dist/index.d.ts +37 -0
  124. package/dist/index.d.ts.map +1 -0
  125. package/dist/index.js +57 -0
  126. package/dist/index.js.map +1 -0
  127. package/dist/reflection/index.d.ts +5 -0
  128. package/dist/reflection/index.d.ts.map +1 -0
  129. package/{src/reflection/index.ts → dist/reflection/index.js} +1 -1
  130. package/dist/reflection/index.js.map +1 -0
  131. package/dist/reflection/reflection.d.ts +84 -0
  132. package/dist/reflection/reflection.d.ts.map +1 -0
  133. package/dist/reflection/reflection.js +344 -0
  134. package/dist/reflection/reflection.js.map +1 -0
  135. package/dist/tools/index.d.ts +6 -0
  136. package/dist/tools/index.d.ts.map +1 -0
  137. package/dist/tools/index.js +11 -0
  138. package/dist/tools/index.js.map +1 -0
  139. package/dist/tools/introspection.d.ts +165 -0
  140. package/dist/tools/introspection.d.ts.map +1 -0
  141. package/dist/tools/introspection.js +324 -0
  142. package/dist/tools/introspection.js.map +1 -0
  143. package/dist/types/agent.d.ts +1317 -0
  144. package/dist/types/agent.d.ts.map +1 -0
  145. package/dist/types/agent.js +423 -0
  146. package/dist/types/agent.js.map +1 -0
  147. package/dist/types/decorators.d.ts +40 -0
  148. package/dist/types/decorators.d.ts.map +1 -0
  149. package/dist/types/decorators.js +2 -0
  150. package/dist/types/decorators.js.map +1 -0
  151. package/dist/types/error-strategy.d.ts +13 -0
  152. package/dist/types/error-strategy.d.ts.map +1 -0
  153. package/dist/types/error-strategy.js +2 -0
  154. package/dist/types/error-strategy.js.map +1 -0
  155. package/dist/types/error.d.ts +20 -0
  156. package/dist/types/error.d.ts.map +1 -0
  157. package/dist/types/error.js +2 -0
  158. package/dist/types/error.js.map +1 -0
  159. package/dist/types/events.d.ts +113 -0
  160. package/dist/types/events.d.ts.map +1 -0
  161. package/dist/types/events.js +2 -0
  162. package/dist/types/events.js.map +1 -0
  163. package/dist/types/harnesses.d.ts +474 -0
  164. package/dist/types/harnesses.d.ts.map +1 -0
  165. package/dist/types/harnesses.js +2 -0
  166. package/dist/types/harnesses.js.map +1 -0
  167. package/dist/types/index.d.ts +23 -0
  168. package/dist/types/index.d.ts.map +1 -0
  169. package/dist/types/index.js +8 -0
  170. package/dist/types/index.js.map +1 -0
  171. package/dist/types/logging.d.ts +24 -0
  172. package/dist/types/logging.d.ts.map +1 -0
  173. package/dist/types/logging.js +2 -0
  174. package/dist/types/logging.js.map +1 -0
  175. package/dist/types/observer.d.ts +18 -0
  176. package/dist/types/observer.d.ts.map +1 -0
  177. package/dist/types/observer.js +2 -0
  178. package/dist/types/observer.js.map +1 -0
  179. package/dist/types/prompt.d.ts +31 -0
  180. package/dist/types/prompt.d.ts.map +1 -0
  181. package/dist/types/prompt.js +6 -0
  182. package/dist/types/prompt.js.map +1 -0
  183. package/dist/types/providers.d.ts +691 -0
  184. package/dist/types/providers.d.ts.map +1 -0
  185. package/dist/types/providers.js +14 -0
  186. package/dist/types/providers.js.map +1 -0
  187. package/dist/types/reflection.d.ts +96 -0
  188. package/dist/types/reflection.d.ts.map +1 -0
  189. package/dist/types/reflection.js +24 -0
  190. package/dist/types/reflection.js.map +1 -0
  191. package/dist/types/restart.d.ts +132 -0
  192. package/dist/types/restart.d.ts.map +1 -0
  193. package/dist/types/restart.js +2 -0
  194. package/dist/types/restart.js.map +1 -0
  195. package/dist/types/sdk-primitives.d.ts +118 -0
  196. package/dist/types/sdk-primitives.d.ts.map +1 -0
  197. package/dist/types/sdk-primitives.js +6 -0
  198. package/dist/types/sdk-primitives.js.map +1 -0
  199. package/{src/types/snapshot.ts → dist/types/snapshot.d.ts} +5 -5
  200. package/dist/types/snapshot.d.ts.map +1 -0
  201. package/dist/types/snapshot.js +2 -0
  202. package/dist/types/snapshot.js.map +1 -0
  203. package/dist/types/streaming.d.ts +194 -0
  204. package/dist/types/streaming.d.ts.map +1 -0
  205. package/dist/types/streaming.js +67 -0
  206. package/dist/types/streaming.js.map +1 -0
  207. package/dist/types/workflow-context.d.ts +275 -0
  208. package/dist/types/workflow-context.d.ts.map +1 -0
  209. package/dist/types/workflow-context.js +8 -0
  210. package/dist/types/workflow-context.js.map +1 -0
  211. package/dist/types/workflow.d.ts +30 -0
  212. package/dist/types/workflow.d.ts.map +1 -0
  213. package/dist/types/workflow.js +2 -0
  214. package/dist/types/workflow.js.map +1 -0
  215. package/dist/utils/agent-validation.d.ts +88 -0
  216. package/dist/utils/agent-validation.d.ts.map +1 -0
  217. package/dist/utils/agent-validation.js +87 -0
  218. package/dist/utils/agent-validation.js.map +1 -0
  219. package/dist/utils/delay.d.ts +7 -0
  220. package/dist/utils/delay.d.ts.map +1 -0
  221. package/dist/utils/delay.js +9 -0
  222. package/dist/utils/delay.js.map +1 -0
  223. package/dist/utils/harness-config.d.ts +180 -0
  224. package/dist/utils/harness-config.d.ts.map +1 -0
  225. package/dist/utils/harness-config.js +311 -0
  226. package/dist/utils/harness-config.js.map +1 -0
  227. package/dist/utils/id.d.ts +6 -0
  228. package/dist/utils/id.d.ts.map +1 -0
  229. package/dist/utils/id.js +12 -0
  230. package/dist/utils/id.js.map +1 -0
  231. package/dist/utils/index.d.ts +13 -0
  232. package/dist/utils/index.d.ts.map +1 -0
  233. package/dist/utils/index.js +11 -0
  234. package/dist/utils/index.js.map +1 -0
  235. package/dist/utils/model-spec.d.ts +110 -0
  236. package/dist/utils/model-spec.d.ts.map +1 -0
  237. package/dist/utils/model-spec.js +149 -0
  238. package/dist/utils/model-spec.js.map +1 -0
  239. package/dist/utils/observable.d.ts +54 -0
  240. package/dist/utils/observable.d.ts.map +1 -0
  241. package/dist/utils/observable.js +82 -0
  242. package/dist/utils/observable.js.map +1 -0
  243. package/dist/utils/provider-config.d.ts +10 -0
  244. package/dist/utils/provider-config.d.ts.map +1 -0
  245. package/dist/utils/provider-config.js +10 -0
  246. package/dist/utils/provider-config.js.map +1 -0
  247. package/dist/utils/restart-analysis.d.ts +202 -0
  248. package/dist/utils/restart-analysis.d.ts.map +1 -0
  249. package/dist/utils/restart-analysis.js +426 -0
  250. package/dist/utils/restart-analysis.js.map +1 -0
  251. package/dist/utils/session-serialization.d.ts +118 -0
  252. package/dist/utils/session-serialization.d.ts.map +1 -0
  253. package/dist/utils/session-serialization.js +217 -0
  254. package/dist/utils/session-serialization.js.map +1 -0
  255. package/dist/utils/workflow-error-utils.d.ts +22 -0
  256. package/dist/utils/workflow-error-utils.d.ts.map +1 -0
  257. package/dist/utils/workflow-error-utils.js +45 -0
  258. package/dist/utils/workflow-error-utils.js.map +1 -0
  259. package/package.json +34 -5
  260. package/.claude/commands/subtask-planning/prp-base-create.md +0 -120
  261. package/.claude/commands/subtask-planning/prp-base-execute.md +0 -65
  262. package/.claude/commands/task-breakdown.md +0 -94
  263. package/.claude/settings.local.json +0 -9
  264. package/.claude/system_prompts/task-breakdown.md +0 -101
  265. package/CHANGELOG.md +0 -188
  266. package/PRD.md +0 -543
  267. package/PRPs/001-hierarchical-workflow-engine.md +0 -2438
  268. package/PRPs/PRDs/002-agent-prompt.md +0 -390
  269. package/PRPs/PRDs/003-agent-prompt.md +0 -943
  270. package/PRPs/PRDs/004-agent-prompt.md +0 -1136
  271. package/PRPs/PRDs/tasks-001.json +0 -492
  272. package/PRPs/README.md +0 -83
  273. package/PRPs/templates/prp_base.md +0 -222
  274. package/docs/agent.md +0 -422
  275. package/docs/prompt.md +0 -419
  276. package/docs/workflow.md +0 -600
  277. package/examples/README.md +0 -258
  278. package/examples/examples/01-basic-workflow.ts +0 -100
  279. package/examples/examples/02-decorator-options.ts +0 -217
  280. package/examples/examples/03-parent-child.ts +0 -241
  281. package/examples/examples/04-observers-debugger.ts +0 -340
  282. package/examples/examples/05-error-handling.ts +0 -387
  283. package/examples/examples/06-concurrent-tasks.ts +0 -352
  284. package/examples/examples/07-agent-loops.ts +0 -432
  285. package/examples/examples/08-sdk-features.ts +0 -667
  286. package/examples/examples/09-reflection.ts +0 -573
  287. package/examples/examples/10-introspection.ts +0 -550
  288. package/examples/examples/11-reparenting-workflows.ts +0 -269
  289. package/examples/index.ts +0 -147
  290. package/examples/utils/helpers.ts +0 -57
  291. package/package-lock.json +0 -2398
  292. package/plan/001_d3bb02af4886/TEST_RESULTS.md +0 -259
  293. package/plan/001_d3bb02af4886/backlog.json +0 -867
  294. package/plan/001_d3bb02af4886/bug_fix_tasks.json +0 -484
  295. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/P1M1T1S1/PRP.md +0 -488
  296. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/P1M1T1S2/PRP.md +0 -581
  297. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/P1M1T1S3/PRP.md +0 -687
  298. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/P1M2T1S1/PRP.md +0 -492
  299. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/P1M2T1S3/PRP.md +0 -932
  300. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/P1M2T1S3/research/concurrent_error_testing_patterns.md +0 -1109
  301. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/P1M2T1S3/research/vitest_concurrent_testing.md +0 -802
  302. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/P1M2T1S3/research/workflow_engine_test_references.md +0 -603
  303. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/P1M2T2S1/PRP.md +0 -564
  304. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/P1M2T2S3/PRP.md +0 -518
  305. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/P1M2T2S4/PRP.md +0 -1252
  306. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/P1M2T3S1/PRP.md +0 -364
  307. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/P1M2T3S1/research/CODEBASE_INVENTORY.md +0 -114
  308. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/P1M2T3S1/research/DECORATOR_DOCUMENTATION_PATTERNS.md +0 -205
  309. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/P1M2T3S1/research/PRD_LOCATION_ANALYSIS.md +0 -199
  310. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/P1M2T3S1/research/ULTRATHINK_PRP_PLAN.md +0 -134
  311. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/P1M3T1S1/PRP.md +0 -495
  312. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/P1M3T1S1/research/console_error_inventory.md +0 -435
  313. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/P1M3T1S2/PRP.md +0 -506
  314. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/P1M3T1S3/PRP.md +0 -612
  315. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/P1M3T2S2/PRP.md +0 -558
  316. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/P1M3T2S2/research/external_research.md +0 -788
  317. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/P1M3T3S2/PRP.md +0 -460
  318. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/P1M3T3S3/PRP.md +0 -454
  319. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/P1M3T4S1/PRP.md +0 -520
  320. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/P1M3T4S1/RECOMMENDATION.md +0 -417
  321. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/P1M3T4S1/research/external_workflow_engines_research.md +0 -760
  322. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/P1M3T4S1/research/security_implications_analysis.md +0 -245
  323. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/P1M3T4S2/PRP.md +0 -792
  324. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/P1M4T1S1/PRP.md +0 -535
  325. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/P1M4T1S1/TEST_EXECUTION_REPORT.md +0 -190
  326. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/P1M4T1S2/PRP.md +0 -654
  327. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/P1M4T1S2/TEST_FIX_REPORT.md +0 -227
  328. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/P1M4T1S2/research/KEY_FINDINGS.md +0 -345
  329. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/P1M4T1S2/research/QUICK_REFERENCE.md +0 -193
  330. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/P1M4T1S2/research/test_maintenance_research.md +0 -1323
  331. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/P1M4T3S1/BREAKING_CHANGES_AUDIT.md +0 -1011
  332. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/P1M4T3S1/PRP.md +0 -927
  333. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/P1M4T3S2/PRP.md +0 -505
  334. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/architecture/logger_child_signature_analysis.md +0 -401
  335. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/docs/P1M1T1S3/child_implementation_research.md +0 -142
  336. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/docs/P1M1T1S3/test_patterns_research.md +0 -112
  337. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/docs/P1M1T1S3/vitest_patterns_research.md +0 -159
  338. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/docs/P1M1T1S4/PRP.md +0 -549
  339. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/docs/P1M1T1S4/VERIFICATION_REPORT.md +0 -368
  340. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/docs/P1M1T1S4/edge_case_analysis.md +0 -172
  341. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/docs/P1M1T1S4/usage_inventory.md +0 -175
  342. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/docs/P1M2T1S2/PRP.md +0 -696
  343. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/docs/P1M2T1S4/PRP.md +0 -860
  344. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/docs/P1M2T2S2/PRP.md +0 -1066
  345. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/docs/P1M2T2S2/research/01-testing-aggregated-errors.md +0 -1103
  346. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/docs/P1M2T2S2/research/01_typescript_error_aggregation_patterns.md +0 -789
  347. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/docs/P1M2T2S2/research/02-error-merge-strategy-testing-guide.md +0 -1098
  348. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/docs/P1M2T2S2/research/02_aggregate_error_patterns.md +0 -1037
  349. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/docs/P1M2T2S2/research/03-promise-allsettled-testing-patterns.md +0 -916
  350. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/docs/P1M2T2S2/research/03_error_merging_strategies.md +0 -1045
  351. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/docs/P1M2T2S2/research/04_github_stackoverflow_examples.md +0 -890
  352. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/docs/P1M2T2S2/research/05_comprehensive_summary.md +0 -822
  353. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/docs/P1M2T2S2/research/INDEX.md +0 -668
  354. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/docs/P1M2T2S2/research/QUICK_REFERENCE.md +0 -706
  355. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/docs/P1M2T2S2/research/README.md +0 -265
  356. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/docs/P1M2T2S2/research/RESEARCH_REPORT.md +0 -655
  357. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/docs/P1M2T2S4/research/vitest_testing_patterns.md +0 -1103
  358. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/docs/P1M2T3S2/PRP.md +0 -426
  359. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/docs/P1M3T1S2/PRP.md +0 -506
  360. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/docs/P1M3T1S2/research/QUICK_REFERENCE.md +0 -114
  361. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/docs/P1M3T1S2/research/RESEARCH_SUMMARY.md +0 -316
  362. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/docs/P1M3T1S2/research/vitest_observer_error_logging_best_practices.md +0 -754
  363. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/docs/P1M3T1S3/PRP.md +0 -612
  364. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/docs/P1M3T2S1/PRP.md +0 -719
  365. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/docs/P1M3T2S1/README.md +0 -215
  366. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/docs/P1M3T2S1/analysis.md +0 -765
  367. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/docs/P1M3T2S3/PRP.md +0 -718
  368. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/docs/P1M3T3S1/DECISION.md +0 -149
  369. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/docs/P1M3T3S1/PRP.md +0 -470
  370. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/docs/P1M3T3S1/research/ULTRATHINK_PLAN.md +0 -332
  371. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/docs/P1M3T3S1/research/codebase_workflow_name_analysis.md +0 -167
  372. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/docs/P1M3T3S1/research/external_best_practices.md +0 -265
  373. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/docs/P1M3T3S1/research/validation_patterns.md +0 -273
  374. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/docs/P1M3T4S1/workflow_engine_ancestry_api_research.md +0 -760
  375. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/docs/P1M3T4S3-PRP.md +0 -434
  376. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/docs/P1M4T2S1/PRP.md +0 -717
  377. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/docs/P1M4T2S2/PRP.md +0 -472
  378. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/docs/P1M4T2S2/VALIDATION_REPORT.md +0 -125
  379. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/docs/P1M4T2S2/research/ULTRATHINK_PRP_PLAN.md +0 -301
  380. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/docs/error-logging-best-practices.md +0 -1170
  381. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/docs/research_typescript_partial_and_overloads.md +0 -940
  382. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/docs/vitest-quick-reference.md +0 -151
  383. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/docs/vitest-research.md +0 -650
  384. package/plan/001_d3bb02af4886/bugfix/001_e8e04329daf3/prd_snapshot.md +0 -259
  385. package/plan/001_d3bb02af4886/bugfix/P1M1T1S1/PRP.md +0 -457
  386. package/plan/001_d3bb02af4886/bugfix/RESEARCH_SUMMARY.md +0 -346
  387. package/plan/001_d3bb02af4886/bugfix/architecture/codebase_structure.md +0 -311
  388. package/plan/001_d3bb02af4886/bugfix/architecture/concurrent_execution_best_practices.md +0 -1565
  389. package/plan/001_d3bb02af4886/bugfix/architecture/error_handling_patterns.md +0 -288
  390. package/plan/001_d3bb02af4886/bugfix/architecture/promise_all_analysis.md +0 -741
  391. package/plan/001_d3bb02af4886/docs/PRP/P1M1T1S4-functional-workflow-error-state-capture-test.md +0 -652
  392. package/plan/001_d3bb02af4886/docs/PRP/P1P2-PRP.md +0 -527
  393. package/plan/001_d3bb02af4886/docs/PRP/P3P4-PRP.md +0 -1388
  394. package/plan/001_d3bb02af4886/docs/PRP/P4P5-PRP.md +0 -1136
  395. package/plan/001_d3bb02af4886/docs/PRP/PRP.md +0 -527
  396. package/plan/001_d3bb02af4886/docs/PRP/bugfix/P1M1T2S1-PRP.md +0 -415
  397. package/plan/001_d3bb02af4886/docs/PRP/bugfix/P1M1T2S2-PRP.md +0 -378
  398. package/plan/001_d3bb02af4886/docs/PRP/bugfix/P1M1T2S4-PRP.md +0 -713
  399. package/plan/001_d3bb02af4886/docs/PRP/bugfix/P1M2T1S4-PRP.md +0 -370
  400. package/plan/001_d3bb02af4886/docs/PRP_P1M3T1S3.md +0 -499
  401. package/plan/001_d3bb02af4886/docs/TEST_RESULTS.md +0 -230
  402. package/plan/001_d3bb02af4886/docs/architecture/external_deps.md +0 -358
  403. package/plan/001_d3bb02af4886/docs/architecture/system_context.md +0 -242
  404. package/plan/001_d3bb02af4886/docs/bugfix/ANALYSIS_PRD_VS_IMPLEMENTATION.md +0 -1134
  405. package/plan/001_d3bb02af4886/docs/bugfix/GAP_ANALYSIS_SUMMARY.md +0 -179
  406. package/plan/001_d3bb02af4886/docs/bugfix/P1M4T2S1/PRP.md +0 -629
  407. package/plan/001_d3bb02af4886/docs/bugfix/P1M4T2S1/validation-report.md +0 -214
  408. package/plan/001_d3bb02af4886/docs/bugfix/PRP_P1M4T2S3.md +0 -629
  409. package/plan/001_d3bb02af4886/docs/bugfix/bugfix_PRP.md +0 -529
  410. package/plan/001_d3bb02af4886/docs/bugfix/bugfix_QUICK_REFERENCE.md +0 -142
  411. package/plan/001_d3bb02af4886/docs/bugfix/bugfix_README.md +0 -304
  412. package/plan/001_d3bb02af4886/docs/bugfix/bugfix_TEST_RESULTS.md +0 -558
  413. package/plan/001_d3bb02af4886/docs/bugfix/bugfix_VALIDATION_SUMMARY.md +0 -256
  414. package/plan/001_d3bb02af4886/docs/bugfix/system_context.md +0 -346
  415. package/plan/001_d3bb02af4886/docs/bugfix-architecture/bug_analysis.md +0 -415
  416. package/plan/001_d3bb02af4886/docs/bugfix-architecture/implementation_patterns.md +0 -489
  417. package/plan/001_d3bb02af4886/docs/bugfix-architecture/system_context.md +0 -218
  418. package/plan/001_d3bb02af4886/docs/bugfix_INITIATION_SUMMARY.md +0 -380
  419. package/plan/001_d3bb02af4886/docs/research/CYCLE_DETECTION_PATTERNS.md +0 -1923
  420. package/plan/001_d3bb02af4886/docs/research/CYCLE_DETECTION_QUICK_REF.md +0 -319
  421. package/plan/001_d3bb02af4886/docs/research/P1M1T2S1/codebase-context.md +0 -115
  422. package/plan/001_d3bb02af4886/docs/research/P1M1T2S1/cycle-detection-algorithms.md +0 -134
  423. package/plan/001_d3bb02af4886/docs/research/P1M1T2S1/test-patterns.md +0 -153
  424. package/plan/001_d3bb02af4886/docs/research/P1M1T2S1/workflow-class.md +0 -132
  425. package/plan/001_d3bb02af4886/docs/research/P1M2T1S4/DECORATOR_DOCUMENTATION_BEST_PRACTICES.md +0 -716
  426. package/plan/001_d3bb02af4886/docs/research/P1M2T1S4/DECORATOR_DOCUMENTATION_QUICK_REF.md +0 -186
  427. package/plan/001_d3bb02af4886/docs/research/P1M2T1S4/GROUNDSWELL_DECORATOR_EXAMPLES.md +0 -604
  428. package/plan/001_d3bb02af4886/docs/research/P1M2T1S4/INDEX.md +0 -213
  429. package/plan/001_d3bb02af4886/docs/research/P1M2T1S4/codebase_structure.md +0 -30
  430. package/plan/001_d3bb02af4886/docs/research/P1M2T1S4/existing_test_pattern.md +0 -56
  431. package/plan/001_d3bb02af4886/docs/research/P1M2T1S4/getRootObservers_implementation.md +0 -53
  432. package/plan/001_d3bb02af4886/docs/research/P1M2T1S4/test_conventions.md +0 -49
  433. package/plan/001_d3bb02af4886/docs/research/P1M3T1S4/PRP.md +0 -958
  434. package/plan/001_d3bb02af4886/docs/research/P1M3T1S4/QUICK_REFERENCE.md +0 -339
  435. package/plan/001_d3bb02af4886/docs/research/P1M3T1S4/README.md +0 -305
  436. package/plan/001_d3bb02af4886/docs/research/P1M3T1S4/SUMMARY.md +0 -433
  437. package/plan/001_d3bb02af4886/docs/research/P1M3T1S4/bidirectional-tree-consistency-testing.md +0 -1574
  438. package/plan/001_d3bb02af4886/docs/research/P1M3T1S4/test-pattern-examples.md +0 -1014
  439. package/plan/001_d3bb02af4886/docs/research/P1P2/LRU_CACHE_BEST_PRACTICES.md +0 -1929
  440. package/plan/001_d3bb02af4886/docs/research/P1P2/LRU_CACHE_CODE_PATTERNS.md +0 -857
  441. package/plan/001_d3bb02af4886/docs/research/P1P2/LRU_CACHE_INTEGRATION_GUIDE.md +0 -738
  442. package/plan/001_d3bb02af4886/docs/research/P1P2/LRU_CACHE_RESEARCH_INDEX.md +0 -424
  443. package/plan/001_d3bb02af4886/docs/research/P1P2/REFLECTION_INDEX.md +0 -291
  444. package/plan/001_d3bb02af4886/docs/research/P1P2/REFLECTION_RESEARCH_REPORT.md +0 -1342
  445. package/plan/001_d3bb02af4886/docs/research/P1P2/RESEARCH_SUMMARY.md +0 -342
  446. package/plan/001_d3bb02af4886/docs/research/P1P2/anthropic-sdk.md +0 -174
  447. package/plan/001_d3bb02af4886/docs/research/P1P2/async-local-storage.md +0 -200
  448. package/plan/001_d3bb02af4886/docs/research/P1P2/reflection-code-patterns.md +0 -1205
  449. package/plan/001_d3bb02af4886/docs/research/P1P2/reflection-decision-matrix.md +0 -421
  450. package/plan/001_d3bb02af4886/docs/research/P1P2/reflection-implementation-guide.md +0 -1341
  451. package/plan/001_d3bb02af4886/docs/research/P1P2/reflection-integration-guide.md +0 -834
  452. package/plan/001_d3bb02af4886/docs/research/P1P2/reflection-patterns.md +0 -1468
  453. package/plan/001_d3bb02af4886/docs/research/P1P2/reflection-quick-reference.md +0 -558
  454. package/plan/001_d3bb02af4886/docs/research/P1P2/zod-schema.md +0 -152
  455. package/plan/001_d3bb02af4886/docs/research/P3P4/caching-lru.md +0 -116
  456. package/plan/001_d3bb02af4886/docs/research/P3P4/introspection-tools.md +0 -177
  457. package/plan/001_d3bb02af4886/docs/research/P3P4/reflection-patterns.md +0 -117
  458. package/plan/001_d3bb02af4886/docs/research/P4P5/RESEARCH_SUMMARY.md +0 -151
  459. package/plan/001_d3bb02af4886/docs/research/PROMISE_ALLSETTLED_QUICK_REF.md +0 -376
  460. package/plan/001_d3bb02af4886/docs/research/PROMISE_ALLSETTLED_RESEARCH.md +0 -1507
  461. package/plan/001_d3bb02af4886/docs/research/bugfix_typescript_patterns.md +0 -949
  462. package/plan/001_d3bb02af4886/docs/research/error-testing-research.md +0 -619
  463. package/plan/001_d3bb02af4886/docs/research/error_handling_patterns.md +0 -723
  464. package/plan/001_d3bb02af4886/docs/research/general/INTROSPECTION_RESEARCH_SUMMARY.md +0 -378
  465. package/plan/001_d3bb02af4886/docs/research/general/README-INTROSPECTION.md +0 -352
  466. package/plan/001_d3bb02af4886/docs/research/general/agent-introspection-patterns.md +0 -1085
  467. package/plan/001_d3bb02af4886/docs/research/general/introspection-security-guide.md +0 -984
  468. package/plan/001_d3bb02af4886/docs/research/general/introspection-tool-examples.md +0 -875
  469. package/plan/001_d3bb02af4886/docs/research/incremental-tree-map-updates/PRP_TEMPLATE.md +0 -460
  470. package/plan/001_d3bb02af4886/docs/research/incremental-tree-map-updates/QUICK_REFERENCE.md +0 -324
  471. package/plan/001_d3bb02af4886/docs/research/incremental-tree-map-updates/README.md +0 -175
  472. package/plan/001_d3bb02af4886/docs/research/incremental-tree-map-updates/RESEARCH_REPORT.md +0 -499
  473. package/plan/001_d3bb02af4886/docs/research/incremental-tree-map-updates/SUMMARY.md +0 -163
  474. package/plan/001_d3bb02af4886/prd_snapshot.md +0 -543
  475. package/plan/bugfix/BUG_FIX_SUMMARY.md +0 -961
  476. package/scripts/generate-llms-full.ts +0 -206
  477. package/src/__tests__/adversarial/attachChild-performance.test.ts +0 -216
  478. package/src/__tests__/adversarial/circular-reference.test.ts +0 -101
  479. package/src/__tests__/adversarial/complex-circular-reference.test.ts +0 -139
  480. package/src/__tests__/adversarial/concurrent-task-failures.test.ts +0 -571
  481. package/src/__tests__/adversarial/deep-analysis.test.ts +0 -729
  482. package/src/__tests__/adversarial/deep-hierarchy-stress.test.ts +0 -213
  483. package/src/__tests__/adversarial/e2e-prd-validation.test.ts +0 -448
  484. package/src/__tests__/adversarial/edge-case.test.ts +0 -703
  485. package/src/__tests__/adversarial/error-merge-strategy.test.ts +0 -760
  486. package/src/__tests__/adversarial/incremental-performance.test.ts +0 -140
  487. package/src/__tests__/adversarial/node-map-update-benchmarks.test.ts +0 -457
  488. package/src/__tests__/adversarial/observer-propagation.test.ts +0 -487
  489. package/src/__tests__/adversarial/parent-validation.test.ts +0 -143
  490. package/src/__tests__/adversarial/prd-12-2-compliance.test.ts +0 -611
  491. package/src/__tests__/adversarial/prd-compliance.test.ts +0 -731
  492. package/src/__tests__/compatibility/backward-compatibility.test.ts +0 -1572
  493. package/src/__tests__/helpers/index.ts +0 -18
  494. package/src/__tests__/helpers/tree-verification.ts +0 -257
  495. package/src/__tests__/integration/agent-workflow.test.ts +0 -256
  496. package/src/__tests__/integration/bidirectional-consistency.test.ts +0 -847
  497. package/src/__tests__/integration/observer-logging.test.ts +0 -643
  498. package/src/__tests__/integration/tree-mirroring.test.ts +0 -151
  499. package/src/__tests__/integration/workflow-reparenting.test.ts +0 -303
  500. package/src/__tests__/unit/agent.test.ts +0 -169
  501. package/src/__tests__/unit/cache-key.test.ts +0 -182
  502. package/src/__tests__/unit/cache.test.ts +0 -172
  503. package/src/__tests__/unit/context.test.ts +0 -217
  504. package/src/__tests__/unit/decorators.test.ts +0 -100
  505. package/src/__tests__/unit/introspection-tools.test.ts +0 -277
  506. package/src/__tests__/unit/logger.test.ts +0 -293
  507. package/src/__tests__/unit/observable.test.ts +0 -321
  508. package/src/__tests__/unit/prompt.test.ts +0 -135
  509. package/src/__tests__/unit/reflection.test.ts +0 -210
  510. package/src/__tests__/unit/tree-debugger-incremental.test.ts +0 -170
  511. package/src/__tests__/unit/tree-debugger.test.ts +0 -85
  512. package/src/__tests__/unit/utils/workflow-error-utils.test.ts +0 -209
  513. package/src/__tests__/unit/workflow-detachChild.test.ts +0 -100
  514. package/src/__tests__/unit/workflow-emitEvent-childDetached.test.ts +0 -153
  515. package/src/__tests__/unit/workflow-isDescendantOf.test.ts +0 -180
  516. package/src/__tests__/unit/workflow.test.ts +0 -357
  517. package/src/cache/cache-key.ts +0 -244
  518. package/src/cache/cache.ts +0 -236
  519. package/src/core/agent.ts +0 -593
  520. package/src/core/event-tree.ts +0 -260
  521. package/src/core/logger.ts +0 -112
  522. package/src/core/mcp-handler.ts +0 -184
  523. package/src/core/prompt.ts +0 -150
  524. package/src/core/workflow-context.ts +0 -351
  525. package/src/core/workflow.ts +0 -540
  526. package/src/debugger/tree-debugger.ts +0 -255
  527. package/src/decorators/observed-state.ts +0 -95
  528. package/src/decorators/step.ts +0 -139
  529. package/src/decorators/task.ts +0 -159
  530. package/src/examples/tdd-orchestrator.ts +0 -65
  531. package/src/examples/test-cycle-workflow.ts +0 -64
  532. package/src/index.ts +0 -142
  533. package/src/reflection/reflection.ts +0 -407
  534. package/src/tools/index.ts +0 -36
  535. package/src/tools/introspection.ts +0 -464
  536. package/src/types/agent.ts +0 -90
  537. package/src/types/decorators.ts +0 -32
  538. package/src/types/error-strategy.ts +0 -13
  539. package/src/types/error.ts +0 -20
  540. package/src/types/events.ts +0 -75
  541. package/src/types/index.ts +0 -55
  542. package/src/types/logging.ts +0 -24
  543. package/src/types/observer.ts +0 -18
  544. package/src/types/prompt.ts +0 -40
  545. package/src/types/reflection.ts +0 -117
  546. package/src/types/sdk-primitives.ts +0 -128
  547. package/src/types/workflow-context.ts +0 -163
  548. package/src/types/workflow.ts +0 -37
  549. package/src/utils/id.ts +0 -11
  550. package/src/utils/index.ts +0 -4
  551. package/src/utils/observable.ts +0 -106
  552. package/src/utils/workflow-error-utils.ts +0 -56
  553. package/tsconfig.json +0 -22
  554. package/vitest.config.ts +0 -16
@@ -1,1468 +0,0 @@
1
- # AI Agent Reflection & Self-Correction Patterns - Research Summary
2
-
3
- **Date**: December 2025
4
- **Focus**: Comprehensive research on reflection and self-correction patterns for AI agent frameworks
5
-
6
- ## Table of Contents
7
-
8
- 1. [Reflection in AI Systems](#reflection-in-ai-systems)
9
- 2. [Reflection Levels & Triggers](#reflection-levels--triggers)
10
- 3. [Implementation Patterns](#implementation-patterns)
11
- 4. [Reflection Prompt Templates](#reflection-prompt-templates)
12
- 5. [Existing Framework Approaches](#existing-framework-approaches)
13
- 6. [Best Practices & Guardrails](#best-practices--guardrails)
14
- 7. [When NOT to Reflect](#when-not-to-reflect)
15
- 8. [State Capture Patterns](#state-capture-patterns)
16
- 9. [Code Implementation Examples](#code-implementation-examples)
17
-
18
- ---
19
-
20
- ## Reflection in AI Systems
21
-
22
- ### What is Reflection?
23
-
24
- Reflection in AI agent contexts refers to an agent's ability to **think about its own actions and results in order to self-correct and improve**. It's essentially the AI analog of human introspection or "System 2" deliberative thinking. Rather than merely reacting instinctively, a reflective AI will pause to analyze what it has done, identify errors or suboptimal steps, and adjust its strategy.
25
-
26
- Key insight: Agents that can check and improve their own output are fundamentally more reliable because they catch mistakes before they compound, self-correct when they drift, and get better as they iterate.
27
-
28
- ### Core Components of Reflection
29
-
30
- The reflection pattern typically follows a three-phase cycle:
31
-
32
- 1. **Generation** - The model creates an initial output based on a prompt
33
- 2. **Reflection** - The AI critiques its own work, identifying areas for improvement
34
- 3. **Iteration/Refinement** - The AI refines its output based on feedback and continues until quality thresholds are met
35
-
36
- ### Why Reflection Matters
37
-
38
- Research demonstrates significant performance improvements:
39
- - **Reflexion (Shinn et al., 2023)**: Improved over baseline agents on decision-making (AlfWorld), reasoning (HotPotQA), and programming (HumanEval) tasks
40
- - **CRITIC (Gou et al., 2024)**: Showed 10-30% improvement in accuracy across multiple domains
41
- - **Reflexion + GPT-4**: Reached 91% pass@1 on the HumanEval coding benchmark vs 80% for GPT-4 without reflection
42
-
43
- ---
44
-
45
- ## Reflection Levels & Triggers
46
-
47
- ### Three Levels of Reflection
48
-
49
- #### 1. **Prompt-Level Reflection**
50
- - Occurs at the level of a single prompt (generate, then self-check)
51
- - Model is prompted to "check your work" after generation
52
- - Lightweight, uses only one additional prompt
53
- - Good for: Basic quality improvement, simple validations
54
- - Cost: 1 additional LLM call
55
-
56
- #### 2. **Agent-Level Reflection**
57
- - Occurs between tool calls or action sequences
58
- - Agent pauses after each major step to evaluate progress
59
- - Can include both self-assessment and external tool feedback
60
- - Good for: Multi-step tasks, tool-based workflows
61
- - Cost: Adds latency but improves task success
62
-
63
- #### 3. **Workflow-Level Reflection**
64
- - Occurs at the orchestration level
65
- - Multiple agents or sub-tasks are evaluated together
66
- - Captures systemic improvements and pattern recognition
67
- - Good for: Complex multi-agent systems, long-running workflows
68
- - Cost: Significant additional compute, reserved for high-value tasks
69
-
70
- ### Trigger Mechanisms
71
-
72
- #### **Error-Driven Reflection** (Most Common)
73
- Triggered when:
74
- - Tool call fails with error
75
- - Output validation rules fail
76
- - Test/assertion fails
77
- - Response status codes indicate problems
78
-
79
- Pattern:
80
- ```
81
- Output → Validate → Error Detected → Reflect → Retry
82
- ```
83
-
84
- #### **Low-Confidence Reflection**
85
- Triggered when:
86
- - Model expresses uncertainty in output
87
- - Confidence score below threshold
88
- - Multiple alternative interpretations exist
89
- - Ambiguous user input detected
90
-
91
- Mechanism:
92
- - Use certainty tokens or confidence metadata
93
- - Leverage model's own uncertainty assessment
94
- - Self-Reflection Certainty: Ask model "Does this seem correct to you?"
95
- - Dynamically adjust confidence as model reasons (chain-of-thought)
96
-
97
- #### **Manual/Explicit Triggers**
98
- - User explicitly requests reflection
99
- - Scheduled checkpoints in workflow
100
- - Budget-based (after N tokens/steps)
101
- - Performance-based (when metric drops below threshold)
102
-
103
- #### **Progress-Based Triggers**
104
- - No progress after N iterations
105
- - State duplication (same output returned twice)
106
- - Repeated error patterns
107
- - Timeout approaching
108
-
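The four trigger families above can be combined into one decision function. The following is a minimal sketch, not an API from any framework: the `StepSignals` shape, the 0.7 confidence floor, and the "80% of the time budget" rule for an approaching timeout are all illustrative assumptions.

```typescript
// Hypothetical signal shape; every field name here is an assumption for illustration.
interface StepSignals {
  toolError: string | null;   // error-driven: tool failure or failed validation
  confidence: number | null;  // low-confidence: 0..1, null when unavailable
  userRequested: boolean;     // manual/explicit trigger
  outputs: string[];          // outputs produced so far, oldest first
  elapsedMs: number;
  timeoutMs: number;
}

type Trigger = "error" | "manual" | "low_confidence" | "no_progress" | "timeout" | null;

// Returns the first trigger that fires, or null when no reflection is needed.
function pickTrigger(s: StepSignals, minConfidence = 0.7): Trigger {
  if (s.toolError !== null) return "error";
  if (s.userRequested) return "manual";
  if (s.confidence !== null && s.confidence < minConfidence) return "low_confidence";
  // State duplication: the same output was returned twice in a row.
  const n = s.outputs.length;
  if (n >= 2 && s.outputs[n - 1] === s.outputs[n - 2]) return "no_progress";
  // Timeout approaching: assumed here to mean 80% of the budget is spent.
  if (s.elapsedMs >= 0.8 * s.timeoutMs) return "timeout";
  return null;
}
```

Checking error-driven triggers first mirrors the ordering in the text, where they are described as the most common case.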
109
- ---
110
-
111
- ## Implementation Patterns
112
-
113
- ### Pattern 1: Error Detection → Reflection → Retry Loop
114
-
115
- **Core Cycle:**
116
- ```
117
- 1. Generate Solution
118
- 2. Execute/Validate → Capture Error
119
- 3. Reflect: "What went wrong? Why did this fail?"
120
- 4. Retry with improved strategy
121
- 5. Loop until success or max attempts reached
122
- ```
123
-
124
- **Key Variables:**
125
- - `max_retry_limit`: Total retry attempts (typical: 3-5)
126
- - `retry_count`: Current attempt number
127
- - `error_context`: Captured error message/state
128
- - `previous_attempts`: History of what was tried
129
-
130
- **Implementation Considerations:**
131
- - Always include error message in reflection prompt
132
- - Maintain history of previous attempts to avoid loops
133
- - Implement exponential backoff for API calls
134
- - Track which approaches failed to suggest different strategies
135
-
136
- ### Pattern 2: Instruction-Following Validation (IFE)
137
-
138
- Treats LLM outputs as **untrusted inputs requiring explicit validation**.
139
-
140
- **Flow:**
141
- ```
142
- Agent Generates Output
143
-
144
- Validation Checkpoint:
145
- - Check instruction compliance
146
- - Verify format requirements
147
- - Validate output constraints
148
-
149
- If Violations Detected:
150
- - Log specific failures
151
- - Refine prompt with violation details
152
- - Retry (up to max attempts)
153
-
154
- If Passed:
155
- - Accept and proceed
156
- ```
157
-
158
- **Example Constraints:**
159
- ```
160
- - Time estimates: numeric only, 0-4.0 range, no units
161
- - Function names: snake_case, no special characters
162
- - Response format: valid JSON, specific schema
163
- - Length: within min/max bounds
164
- ```
165
-
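The example constraints above translate directly into a validation checkpoint. This is a sketch under stated assumptions: the `Candidate` shape, the violation labels, and the default length bounds are hypothetical, chosen only to illustrate the pattern.

```typescript
// Hypothetical candidate output; field names are illustrative only.
interface Candidate {
  timeEstimate: string;  // raw text as produced by the model
  functionName: string;
  payload: string;       // expected to be valid JSON
}

// Returns the list of violated constraints; an empty list means the output passes.
function findViolations(c: Candidate, minLen = 1, maxLen = 2000): string[] {
  const violations: string[] = [];

  // Time estimates: numeric only, 0-4.0 range, no units.
  const isPlainNumber = /^\d+(\.\d+)?$/.test(c.timeEstimate);
  const estimate = Number(c.timeEstimate);
  if (!isPlainNumber || estimate < 0 || estimate > 4.0) violations.push("time_estimate");

  // Function names: snake_case, no special characters.
  if (!/^[a-z][a-z0-9]*(_[a-z0-9]+)*$/.test(c.functionName)) violations.push("function_name");

  // Response format: valid JSON (schema checks would go here as well).
  try {
    JSON.parse(c.payload);
  } catch {
    violations.push("json");
  }

  // Length: within min/max bounds (assumed defaults).
  if (c.payload.length < minLen || c.payload.length > maxLen) violations.push("length");

  return violations;
}
```

The returned labels can be fed back into the retry prompt so the model sees exactly which constraints failed.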
166
- ### Pattern 3: Reflexion Architecture
167
-
168
- Separates three distinct models/roles:
169
-
170
- 1. **Actor** - Generates text and actions using Chain-of-Thought or ReAct
171
- 2. **Evaluator** - Scores outputs by assigning reward signals
172
- 3. **Self-Reflection** - Generates verbal feedback using rewards and memory
173
-
174
- **Flow:**
175
- ```
176
- Task Definition
177
-
178
- Generate Initial Trajectory (Actor)
179
-
180
- Evaluate Outcome (Evaluator assigns reward score)
181
-
182
- Generate Reflection (Self-Reflection creates verbal feedback)
183
-
184
- Store in Memory
185
-
186
- Generate Next Trajectory (with reflection context)
187
- ```
188
-
189
- Advantages:
190
- - Structured feedback mechanism
191
- - Interpretable reflection output
192
- - Can learn from feedback across multiple attempts
193
- - Grounded in external signals (rewards)
194
-
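The Actor / Evaluator / Self-Reflection split can be sketched as a loop over injected functions. This is an illustrative skeleton rather than the Reflexion authors' implementation: the function types, the `passScore` threshold, and the synchronous signatures are assumptions (real actors would be asynchronous LLM calls).

```typescript
// Hypothetical role signatures; in practice each would wrap an LLM call.
type Actor = (task: string, memory: string[]) => string;
type Evaluator = (trajectory: string) => number;               // reward, assumed in [0, 1]
type Reflector = (trajectory: string, reward: number) => string;

interface LoopResult {
  output: string;
  trials: number;
  memory: string[];
}

function reflexionLoop(
  task: string,
  act: Actor,
  evaluate: Evaluator,
  reflect: Reflector,
  maxTrials = 3,
  passScore = 1
): LoopResult {
  const memory: string[] = []; // verbal feedback carried across trials
  let output = "";
  for (let trial = 1; trial <= maxTrials; trial++) {
    output = act(task, memory);                      // generate trajectory
    const reward = evaluate(output);                 // external reward signal
    if (reward >= passScore) return { output, trials: trial, memory };
    memory.push(reflect(output, reward));            // store verbal feedback
  }
  return { output, trials: maxTrials, memory };
}
```

Keeping the three roles as separate parameters makes it easy to swap in a test runner as the evaluator while leaving the actor untouched.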
195
- ### Pattern 4: Tool-Enhanced Reflection
196
-
197
- Agent uses external tools to verify correctness before self-reflection.
198
-
199
- **Tools Used:**
200
- - Unit tests / test cases
201
- - Code linters (for TypeScript/Python)
202
- - Web search to verify facts
203
- - APIs to validate data
204
- - Sandbox execution to catch runtime errors
205
-
206
- **Flow:**
207
- ```
208
- Generate Code
209
-
210
- Run Tests/Linter
211
-
212
- Capture Feedback
213
-
214
- Reflect on Specific Failures
215
-
216
- Revise Based on Concrete Evidence
217
- ```
218
-
219
- **Key Insight**: Type-checked languages (e.g., TypeScript rather than JavaScript) add compile-time type errors as a further layer of automatic feedback on top of tests and linting, which improves reflection quality.
220
-
221
- ### Pattern 5: Multi-Agent Reflection
222
-
223
- Rather than self-reflection, deploy two specialized agents:
224
- 1. **Generator Agent** - Prompted to produce outputs
225
- 2. **Critic Agent** - Prompted to provide constructive criticism
226
-
227
- **Flow:**
228
- ```
229
- Generator creates output
230
-
231
- Critic reviews and provides specific feedback:
232
- - What works well
233
- - What's missing or wrong
234
- - Specific improvement suggestions
235
-
236
- Generator receives critique
237
-
238
- Revised output
239
-
240
- (Loop up to N times or until satisfied)
241
- ```
242
-
243
- Benefits:
244
- - More diverse feedback (different reasoning path)
245
- - Can leverage specialized critic models
246
- - Dialogue creates interactive improvement
247
- - Often produces better results than self-reflection alone
248
-
249
- ---
250
-
251
- ## Reflection Prompt Templates
252
-
253
- ### Template 1: Basic Self-Critique (Lightweight)
254
-
255
- ```
256
- Original Task: [TASK]
257
-
258
- Your previous response:
259
- [RESPONSE]
260
-
261
- Please review your response for:
262
- 1. Accuracy - Is the information correct?
263
- 2. Completeness - Did you address all aspects of the task?
264
- 3. Clarity - Is it easy to understand?
265
- 4. Potential improvements - What could be better?
266
-
267
- Identify any issues and provide a revised response.
268
- ```
269
-
270
- **Cost**: Single additional LLM call
271
- **Best for**: Quick quality improvements, simple tasks
272
-
273
- ---
274
-
275
- ### Template 2: Error-Context Reflection (With Feedback)
276
-
277
- ```
278
- Original Task: [TASK]
279
-
280
- Your previous attempt:
281
- [PREVIOUS_RESPONSE]
282
-
283
- Error encountered: [ERROR_MESSAGE]
284
-
285
- Analysis: What specifically caused this error?
286
- - Root cause analysis
287
- - What assumption was wrong?
288
- - What information was missing?
289
-
290
- Revised approach: Provide a corrected solution that addresses the specific error.
291
- Explain your reasoning for why this approach will work better.
292
- ```
293
-
294
- **Cost**: Single additional LLM call with rich context
295
- **Best for**: Recovery from errors, learning from failures
296
-
297
- ---
298
-
299
- ### Template 3: Expert Persona Reflection
300
-
301
- ```
302
- Original Task: [TASK]
303
-
304
- Response to evaluate:
305
- [RESPONSE]
306
-
307
- You are now a [EXPERT_ROLE: code reviewer | technical architect | quality assurance specialist].
308
- Review the above response from the perspective of [EXPERT_ROLE].
309
-
310
- Specifically evaluate:
311
- 1. [TECHNICAL_CRITERIA]
312
- 2. [BEST_PRACTICES]
313
- 3. [EDGE_CASES]
314
- 4. [PERFORMANCE/QUALITY_METRICS]
315
-
316
- Provide your expert assessment and specific improvements.
317
- ```
318
-
319
- **Cost**: Single additional LLM call
320
- **Best for**: Complex technical outputs, code, architectural decisions
321
-
322
- ---
323
-
324
- ### Template 4: Structured Reflection with Rubric
325
-
326
- ```
327
- Original Task: [TASK]
328
-
329
- Generated Output:
330
- [OUTPUT]
331
-
332
- Evaluation Rubric:
333
- 1. Requirement A: [DESCRIPTION]
334
- Status: ✓ Met / ✗ Not Met
335
- If not met, why?
336
-
337
- 2. Requirement B: [DESCRIPTION]
338
- Status: ✓ Met / ✗ Not Met
339
- If not met, why?
340
-
341
- [... for each requirement ...]
342
-
343
- Summary:
344
- - Which requirements were NOT met?
345
- - Specific fixes needed for each failure
346
- - Revised output addressing all requirements
347
-
348
- Provide corrected output that meets all requirements.
349
- ```
350
-
351
- **Cost**: Single additional LLM call
352
- **Best for**: Tasks with explicit criteria, validation-heavy workflows
353
-
354
- ---
355
-
356
- ### Template 5: Confidence-Triggered Reflection
357
-
358
- ```
359
- Original Task: [TASK]
360
-
361
- Your response:
362
- [RESPONSE]
363
-
364
- Before we proceed, please evaluate your own confidence:
365
- 1. How confident are you that this response is correct? (0-100%)
366
- 2. What aspects are you uncertain about?
367
- 3. What additional information would increase your confidence?
368
-
369
- If confidence < 80%:
370
- - Identify specific sources of uncertainty
371
- - Provide alternative approaches you considered
372
- - Suggest how to verify your answer
373
- - Offer a revised response with higher confidence
374
- ```
375
-
376
- **Cost**: Single additional call with conditional branching
377
- **Best for**: High-stakes decisions, complex problem-solving
378
-
379
- ---
380
-
381
- ### Template 6: Multi-Turn Reflection Loop
382
-
383
- ```
384
- ROUND 1 - Initial Generation:
385
- [INITIAL_PROMPT]
386
-
387
- ROUND 2 - Self-Critique:
388
- "Review your response for: correctness, completeness, clarity, and efficiency.
389
- Identify specific issues."
390
-
391
- [CRITIQUE_FROM_PREVIOUS_ROUND]
392
-
393
- ROUND 3 - Improvement:
394
- "Based on the identified issues, provide an improved version.
395
- Explain what you changed and why."
396
-
397
- [CONTINUE_FOR_UP_TO_N_ROUNDS]
398
-
399
- Quality Checkpoint:
400
- Does current output meet all quality criteria? If yes, finalize.
401
- If no, continue round [N+1].
402
- ```
403
-
404
- **Cost**: Multiple LLM calls (3-5 typically)
405
- **Best for**: Complex writing, algorithm optimization, architectural design
406
-
407
- ---
408
-
409
- ## Existing Framework Approaches
410
-
411
- ### LangChain/LangGraph Reflection
412
-
413
- LangChain implements reflection through **LangGraph**, a stateful graph framework.
414
-
415
- **Three Core Patterns:**
416
-
417
- #### 1. Basic Reflection (MessageGraph)
418
- ```
419
- - State: List of messages
420
- - Generator Node: Produces initial responses
421
- - Reflector Node: Acts as "teacher" providing constructive criticism
422
- - Edges: Loop back up to N times
423
- ```
424
-
425
- #### 2. Reflexion Pattern
426
- ```
427
- - Generator produces draft
428
- - Tools are executed
429
- - Feedback captured
430
- - Revision happens with reflection context
431
- - Conditional loop based on iteration count
432
- ```
433
-
434
- #### 3. Language Agent Tree Search (LATS)
435
- ```
436
- - Combines reflection/evaluation with Monte Carlo tree search
437
- - Four steps: Select → Expand/Simulate → Reflect+Evaluate → Backpropagate
438
- - Uses StateGraph with tree-based exploration
439
- ```
440
-
441
- **Key Implementation Details:**
442
- - Uses `add_node()`, `add_edge()`, `add_conditional_edges()`
443
- - State is shared data structure representing current snapshot
444
- - Nodes encode logic, perform computation, make LLM calls
445
- - Edges define next node based on current state
446
-
447
- **Trade-off**: Reflection requires additional computational time and resources. Each pattern trades latency for higher output quality. Not suitable for low-latency applications.
448
-
449
- ---
450
-
451
- ### Reflexion Framework (Shinn et al., 2023)
452
-
453
- **Design Philosophy**: Keep model frozen, use text-based feedback as reinforcement.
454
-
455
- **Components**:
456
- 1. **Actor** - Attempts task using Chain-of-Thought/ReAct with memory
457
- 2. **Evaluator** - Assigns reward scores to trajectories
458
- 3. **Self-Reflection** - Generates verbal feedback from rewards
459
-
460
- **Key Feature**: Reflexion forces explicit grounding in external data:
461
- - Must cite sources for claims
462
- - Explicitly enumerate superfluous aspects (what's wrong)
463
- - Explicitly enumerate missing aspects (what's needed)
464
-
465
- **Results**:
466
- - 91% pass@1 on HumanEval vs 80% for the GPT-4 baseline
467
- - Strong performance on: AlfWorld (decision-making), HotPotQA (reasoning), HumanEval/MBPP (programming)
468
-
469
- **Best Use Cases**:
470
- - Iterative learning from mistakes
471
- - When traditional RL is impractical
472
- - Tasks where interpretability matters
473
- - Systems requiring nuanced feedback
474
-
475
- ---
476
-
477
- ### Claude/Anthropic Reflection Patterns
478
-
479
- **Philosophy**: Simplicity over complexity. Start with simple prompts, optimize through evaluation, add multi-step systems only when necessary.
480
-
481
- **Core Principles**:
482
- 1. Maintain simplicity in agent design
483
- 2. Prioritize transparency (show planning steps explicitly)
484
- 3. Carefully craft agent-computer interface (ACI) through tool documentation
485
-
486
- **Evaluator-Optimizer Workflow**:
487
- ```
488
- One LLM Call: Generates response
489
-
490
- Another LLM Call: Provides evaluation and feedback
491
-
492
- Loop: Iteratively refine
493
- ```
494
-
495
- **Most Effective When**:
496
- - Clear evaluation criteria exist
497
- - Iterative refinement provides measurable value
498
- - Not implementing complex internal reasoning
499
-
500
- **Extended Thinking Integration**:
501
- - Use extended thinking for complex reasoning within reflection
502
- - Interleaved mode: tool call → tool result → reflection thinking
503
- - Strongly prefer thinking block when uncertain
504
- - Enables "System 2" deliberative thinking in reflection phase
505
-
506
- **Feedback Approaches**:
507
- - **Rules-Based Feedback**: Define explicit rules, explain which failed and why
508
- - **Code Linting**: Type-checked languages (TypeScript) provide automatic feedback layers
509
- - **Sandbox Execution**: Run code to identify bugs
510
-
511
- ---
512
-
513
- ### AutoGPT Self-Correction
514
-
515
- **Approach**: Analyze feedback from errors and adjust strategy.
516
-
517
- **Core Mechanism**:
518
- ```
519
- Execute Step
520
-
521
- Evaluate Outcome
522
-
523
- If Failed:
524
- - Run reflection process
525
- - Diagnose failure points
526
- - Update strategy
527
- - Proceed
528
- ```
529
-
530
- **Key Features**:
531
- - Flexible automation with error analysis
532
- - Requires human oversight to prevent infinite loops
533
- - Handles many errors on client side
534
-
535
- **Recent Innovation: Retrials Without Feedback**
536
- Research shows "retrials without feedback" is effective:
537
- - Retry whenever incorrect answer identified
538
- - No explicit self-reflection needed
539
- - Continue until correct solution found or budget exhausted
540
- - Simpler than Reflexion, surprisingly effective
541
-
542
- ---
543
-
544
- ### Google Agent Development Kit (ADK) - Reflect & Retry
545
-
546
- **Technical Implementation**:
547
-
548
- **Core Mechanism**: Intercepts tool failures, provides structured guidance for correction, retries up to configurable limit.
549
-
550
- **Key Features**:
551
- - Concurrency-safe with locking mechanisms
552
- - Failure tracking per-invocation (default) or global across users
553
- - Custom error extraction by overriding detection methods
554
- - Supports both transient and logical errors
555
-
556
- **Configuration**:
557
- ```
558
- max_retries: 3 (default)
559
- throw_on_exceeded: true (default)
560
- failure_scope: per_invocation or global
561
- ```
562
-
563
- **Advanced Pattern**: Custom error detection
564
- ```
565
- Override extract_error_from_result() to identify:
566
- - HTTP status codes
567
- - Custom response fields
568
- - Error patterns in normal responses
569
- ```
570
-
571
- ---
572
-
573
- ## Best Practices & Guardrails
574
-
575
- ### Maximum Reflection Attempts
576
-
577
- **Common Practice**: 3-5 maximum reflection attempts
578
-
579
- **Recommended Configuration**:
580
- ```
581
- - Basic Tasks (simple validation): 2 attempts
582
- - Standard Tasks (tool-based workflows): 3 attempts
583
- - Complex Reasoning: 4-5 attempts
584
- - Never exceed: 8 attempts
585
- ```
586
-
587
- **Guardrails to Prevent Loops**:
588
- 1. **Hard iteration limit**: `max_rounds` (fixed ceiling)
589
- 2. **No-progress detection**: Stop after K rounds with no improvement
590
- 3. **State-hash deduplication**: Exit if returning to previous state
591
- 4. **Cost budget**: Total token limit across all attempts
592
- 5. **Timeout mechanism**: Overall time limit (not just per-request)
593
-
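The five guardrails can be enforced by one small checker that is called once per reflection round. This is a minimal sketch: `GuardLimits`, `createGuard`, and the stop-reason strings are hypothetical names, and the raw output string stands in for a real state hash.

```typescript
// Hypothetical limits; names and units are assumptions for illustration.
interface GuardLimits {
  maxRounds: number;        // hard iteration limit
  noProgressRounds: number; // K rounds without improvement
  tokenBudget: number;      // total tokens across all attempts
  timeoutMs: number;        // overall time limit
}

type StopReason = "repeated_state" | "max_rounds" | "no_progress" | "token_budget" | "timeout" | null;

function createGuard(limits: GuardLimits, startedAt: number) {
  let rounds = 0;
  let stale = 0;
  let tokens = 0;
  let bestScore = -Infinity;
  const seen = new Set<string>(); // a real system would store a hash of the state

  // Call once per round; returns why to stop, or null to continue.
  return function check(output: string, score: number, tokensUsed: number, now: number): StopReason {
    rounds++;
    tokens += tokensUsed;
    if (seen.has(output)) return "repeated_state";
    seen.add(output);
    if (score > bestScore) {
      bestScore = score;
      stale = 0;
    } else {
      stale++;
    }
    if (rounds >= limits.maxRounds) return "max_rounds";
    if (stale >= limits.noProgressRounds) return "no_progress";
    if (tokens >= limits.tokenBudget) return "token_budget";
    if (now - startedAt >= limits.timeoutMs) return "timeout";
    return null;
  };
}
```

Passing `now` in explicitly (instead of reading the clock inside) keeps the guard deterministic and easy to test.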
594
- ### Error Handling Strategy
595
-
596
- **Distinguish Error Types**:
597
-
598
- | Error Type | Action | Retry? |
599
- |-----------|--------|--------|
600
- | Transient (timeout, rate limit) | Wait with exponential backoff | Yes (2-3x) |
601
- | Logical (wrong approach) | Reflect, change strategy | Yes (up to 3x) |
602
- | Invalid input (bad data) | Return error to user | No |
603
- | Model refusal | Accept result | No |
604
- | Permanent failure (API down) | Escalate/fallback | No |
605
-
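The table above can be encoded as a lookup so that retry decisions stay in one place. This is a sketch: the `ErrorType` labels, action names, and the exact retry caps (taken as the upper end of the ranges in the table) are assumptions.

```typescript
type ErrorType = "transient" | "logical" | "invalid_input" | "model_refusal" | "permanent";

interface Policy {
  action: "backoff" | "reflect" | "return_error" | "accept" | "escalate";
  retry: boolean;
  maxRetries: number;
}

// One row per error type, mirroring the table.
const POLICIES: Record<ErrorType, Policy> = {
  transient: { action: "backoff", retry: true, maxRetries: 3 },
  logical: { action: "reflect", retry: true, maxRetries: 3 },
  invalid_input: { action: "return_error", retry: false, maxRetries: 0 },
  model_refusal: { action: "accept", retry: false, maxRetries: 0 },
  permanent: { action: "escalate", retry: false, maxRetries: 0 },
};

// True when another attempt is allowed for this error type.
function shouldRetry(type: ErrorType, attemptsSoFar: number): boolean {
  const policy = POLICIES[type];
  return policy.retry && attemptsSoFar < policy.maxRetries;
}
```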
606
- **Backoff Strategies**:
607
- - **Constant Backoff**: Fixed delay (e.g., 1 second)
608
- - **Exponential Backoff**: Delay doubles each attempt
609
- - **Jittered Backoff**: Add randomness to prevent thundering herd
610
-
611
- Example exponential backoff:
612
- ```
613
- Attempt 1: Retry immediately
614
- Attempt 2: Wait 1 second
615
- Attempt 3: Wait 2 seconds
616
- Attempt 4: Wait 4 seconds
617
- Attempt 5: Wait 8 seconds
618
- ```
619
-
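The schedule above (immediate retry, then 1, 2, 4, 8 seconds) can be computed with a small helper. This is a sketch: `backoffDelayMs` and its parameters are hypothetical, and jitter is modeled as an optional fraction of the delay added on top.

```typescript
// Delay before the given attempt (1-based). Attempt 1 retries immediately;
// each later attempt doubles the previous delay, starting at baseMs.
function backoffDelayMs(
  attempt: number,
  baseMs = 1000,
  jitterRatio = 0,                      // 0 disables jitter; 0.5 adds up to 50% extra
  random: () => number = Math.random    // injectable for deterministic tests
): number {
  if (attempt <= 1) return 0;
  const delay = baseMs * 2 ** (attempt - 2);
  const jitter = delay * jitterRatio * random();
  return Math.round(delay + jitter);
}
```

With the defaults, attempts 1 through 5 yield 0, 1000, 2000, 4000, and 8000 ms, matching the schedule in the text.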
620
- ### Success Criteria Matter
621
-
622
- Clear success criteria prevent infinite loops:
623
-
624
- **Bad**: "Fix the bug", "optimize the database", "improve the response"
625
- **Good**: "Make test_user_login pass", "reduce query time below 100ms", "increase BLEU score to 0.85+"
626
-
627
- ### State Capture Before Reflection
628
-
629
- **What to Capture**:
630
- 1. **Input Context**: Original request, parameters, user intent
631
- 2. **Execution Snapshot**: Current state at failure point
632
- 3. **Error Details**: Exception, error code, message
633
- 4. **Attempt History**: What was tried before, outcomes
634
- 5. **Decision Metadata**: Why each choice was made, confidence level
635
-
636
- **Storage Strategy**:
637
- - Use lightweight JSON objects
638
- - Store in Redis with expiration matching workflow duration
639
- - Separate learned patterns from temporary processing state
640
- - Keep reasoning chain (why decisions were made) separate
641
-
642
- ---
643
-
644
- ## When NOT to Reflect
645
-
646
- ### Scenarios to Avoid Reflection
647
-
648
- #### 1. **Low-Stakes, High-Velocity Tasks**
649
- - Real-time chat responses
650
- - Autocomplete suggestions
651
- - Quick lookups
652
- - Requirements: <100ms latency
653
-
654
- **Cost/Benefit**: Cost of reflection exceeds value of marginal improvement
655
-
656
- #### 2. **Well-Understood, Deterministic Workflows**
657
- - Simple CRUD operations
658
- - Predictable data transformations
659
- - Tasks with 99%+ baseline accuracy
660
-
661
- **Cost/Benefit**: No errors to fix, reflection wastes tokens
662
-
663
- #### 3. **Clear Model Refusals**
664
- - User asks model to do something against policies
665
- - Model refuses for safety reasons
666
- - No reflection can change this outcome
667
-
668
- **Cost/Benefit**: Reflection won't help
669
-
670
- #### 4. **Ambiguous User Input Without Clarification**
671
- - User request is unclear
672
- - Model can't determine intent
673
-
674
- **Better approach**: Ask clarifying questions, don't reflect
675
-
676
- #### 5. **High-Confidence Outputs with Good Validation**
677
- - Model is highly confident
678
- - Output passes all validation checks
679
- - Tests confirm correctness
680
-
681
- **Cost/Benefit**: Reflection adds latency with no benefit
682
-
683
- #### 6. **Token Budget Constraints**
684
- - Limited tokens remaining in context window
685
- - Reflection would consume majority of remaining budget
686
-
687
- **Cost/Benefit**: Can't afford the cost
688
-
689
- #### 7. **Cascading Failures**
690
- - Reflection failure causes downstream failures
691
- - Loop detection shows same error pattern repeating
692
-
693
- **Better approach**: Escalate to human or fallback
694
-
695
- ### Performance Impact
696
-
697
- **Cost of Reflection**:
698
- - Each reflection attempt = ~1 additional LLM call
699
- - Latency: +200-2000ms per reflection (depends on model)
700
- - Cost: roughly 1-2x the original call's cost added per reflection (depending on output length)
701
-
702
- **When Cost Justifies Benefit**:
703
- - High-value decisions (code generation, critical business logic)
704
- - Complex reasoning tasks
705
- - Where 10-30% improvement is meaningful
706
- - Users can accept a 2-5x latency increase
707
-
708
- ### Confidence-Based Thresholds
709
-
710
- **Reflection Triggers**:
711
- - Model confidence < 70%: Trigger reflection
712
- - Model confidence 70-85%: Optional reflection
713
- - Model confidence > 85%: Skip reflection
714
-
715
- **Implementation**:
716
- - Use model's own uncertainty assessment
717
- - Leverage confidence tokens from extended thinking
718
- - Monitor chain-of-thought for hedging language
719
- - Track prediction confidence scores
720
-
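The three confidence bands map onto a simple classifier. This is a sketch that assumes confidence is expressed as a fraction between 0 and 1; how that number is obtained from the model is outside its scope.

```typescript
type ReflectionDecision = "reflect" | "optional" | "skip";

// Bands from the text: < 70% reflect, 70-85% optional, > 85% skip.
function decideByConfidence(confidence: number): ReflectionDecision {
  if (confidence < 0.7) return "reflect";
  if (confidence <= 0.85) return "optional";
  return "skip";
}
```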
721
- ---
722
-
723
- ## State Capture Patterns
724
-
725
- ### Pre-Reflection State Snapshot
726
-
727
- Capture critical state **before** attempting reflection:
728
-
729
- ```json
730
- {
731
- "attempt_number": 1,
732
- "timestamp": "2025-12-08T12:34:56Z",
733
- "input": {
734
- "user_request": "...",
735
- "context": "...",
736
- "parameters": {...}
737
- },
738
- "generation": {
739
- "output": "...",
740
- "model": "claude-opus-4.5",
741
- "tokens_used": 245,
742
- "confidence": 0.65
743
- },
744
- "validation": {
745
- "passed": false,
746
- "violations": ["format_check_failed", "logic_error"],
747
- "error_message": "..."
748
- },
749
- "error_context": {
750
- "type": "logical_error",
751
- "details": "..."
752
- }
753
- }
754
- ```
755
-
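A typed version of the snapshot above helps keep captures consistent across attempts. This is a sketch: the interface mirrors the JSON fields shown, while `captureSnapshot` and the `"validation_error"` type label are hypothetical additions.

```typescript
// Mirrors the JSON snapshot structure; field names follow the example.
interface ReflectionSnapshot {
  attempt_number: number;
  timestamp: string; // ISO 8601
  input: { user_request: string; context: string; parameters: Record<string, unknown> };
  generation: { output: string; model: string; tokens_used: number; confidence: number };
  validation: { passed: boolean; violations: string[]; error_message: string | null };
  error_context: { type: string; details: string } | null;
}

// Hypothetical helper: builds a snapshot before any reflection is attempted.
function captureSnapshot(
  attemptNumber: number,
  input: ReflectionSnapshot["input"],
  generation: ReflectionSnapshot["generation"],
  violations: string[],
  errorMessage: string | null,
  now: Date = new Date()
): ReflectionSnapshot {
  const passed = violations.length === 0;
  return {
    attempt_number: attemptNumber,
    timestamp: now.toISOString(),
    input,
    generation,
    validation: { passed, violations, error_message: errorMessage },
    // Assumed label; the text's example uses "logical_error" for its own case.
    error_context: passed ? null : { type: "validation_error", details: errorMessage ?? "" },
  };
}
```

Because the result is plain data, it serializes with `JSON.stringify` without further work.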
756
- ### Reasoning Chain Logging
757
-
758
- Separate reasoning metadata from content:
759
-
760
- ```json
761
- {
762
- "decision_point": "tool_selection",
763
- "options_considered": ["approach_a", "approach_b", "approach_c"],
764
- "chosen": "approach_a",
765
- "reasoning": "Approach A is more efficient because...",
766
- "confidence": 0.72,
767
- "alternative_rationale": "Approach B would work but...",
768
- "risk_factors": ["potential_timeout", "edge_case_handling"]
769
- }
770
- ```
771
-
772
- Benefits:
773
- - Recovery doesn't re-analyze same information
774
- - Next attempt picks up decision trail where it left off
775
- - Provides context for reflection prompts
776
-
777
- ### Memory State Preservation
778
-
779
- Distinguish learned patterns from temporary state:
780
-
781
- ```json
782
- {
783
- "learned_patterns": {
784
- "document_structure_insights": ["..."],
785
- "user_preferences": ["..."],
786
- "error_recovery_strategies": ["..."]
787
- },
788
- "temporary_state": {
789
- "current_task_context": "...",
790
- "current_output": "...",
791
- "current_attempt": 2
792
- }
793
- }
794
- ```
795
-
796
- **Key principle**: When individual tasks fail, preserve learned insights while resetting temporary state.
797
-
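The key principle can be sketched as a pure function that rebuilds memory after a failed task. The `AgentMemory` type follows the JSON above; `resetAfterFailure` and its optional `lesson` argument are hypothetical.

```typescript
interface AgentMemory {
  learned_patterns: {
    document_structure_insights: string[];
    user_preferences: string[];
    error_recovery_strategies: string[];
  };
  temporary_state: {
    current_task_context: string;
    current_output: string;
    current_attempt: number;
  };
}

// Keeps learned patterns (optionally adding a new lesson) and clears temporary state.
function resetAfterFailure(memory: AgentMemory, lesson?: string): AgentMemory {
  const strategies = memory.learned_patterns.error_recovery_strategies;
  return {
    learned_patterns: {
      ...memory.learned_patterns,
      error_recovery_strategies: lesson ? [...strategies, lesson] : strategies,
    },
    temporary_state: { current_task_context: "", current_output: "", current_attempt: 0 },
  };
}
```

Returning a new object instead of mutating the input keeps earlier snapshots of memory valid for later inspection.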
798
- ### State for Error Recovery
799
-
800
- Include information needed for intelligent retry:
801
-
802
- ```json
803
- {
804
- "failed_attempt": {
805
- "approach": "web_search_strategy",
806
- "output": "...",
807
- "error": "timeout"
808
- },
809
- "recovery_context": {
810
- "what_worked_before": [
811
- {"approach": "api_call", "result": "success"},
812
- {"approach": "local_cache", "result": "cache_miss"}
813
- ],
814
- "what_failed": [
815
- {"approach": "web_search", "reason": "timeout"}
816
- ],
817
- "suggestion": "Try API call approach next"
818
- }
819
- }
820
- ```
821
-
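One way to act on a recovery context like the one above is to pick the next approach mechanically before asking the model. This is a sketch under assumptions: `RecoveryContext` follows the JSON fields shown, and `chooseNextApproach` with its `candidates` parameter is a hypothetical helper.

```typescript
// Hypothetical shapes mirroring the "recovery_context" JSON above.
interface AttemptRecord {
  approach: string;
  result?: string;
  reason?: string;
}

interface RecoveryContext {
  what_worked_before: AttemptRecord[];
  what_failed: AttemptRecord[];
}

// Prefer an approach that succeeded before and has not failed in this run;
// otherwise fall back to the first untried candidate. Returns null when
// every known option has already failed.
function chooseNextApproach(
  ctx: RecoveryContext,
  candidates: string[]
): string | null {
  const failed = new Set(ctx.what_failed.map((a) => a.approach));

  for (const record of ctx.what_worked_before) {
    if (record.result === "success" && !failed.has(record.approach)) {
      return record.approach;
    }
  }
  for (const candidate of candidates) {
    if (!failed.has(candidate)) {
      return candidate;
    }
  }
  return null;
}
```

With the example data above, this would select `api_call`, matching the `suggestion` field.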
822
- ---
823
-
824
- ## Code Implementation Examples
825
-
826
- ### Example 1: Basic Error-Reflection-Retry Loop (TypeScript)
827
-
828
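- ```typescript
- import Anthropic from "@anthropic-ai/sdk";
-
- // Shared client used by the examples below; reads ANTHROPIC_API_KEY from the environment.
- const client = new Anthropic();
-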
829
- interface ReflectionState {
830
- attempt: number;
831
- maxAttempts: number;
832
- lastError: string | null;
833
- attemptHistory: Array<{
834
- approach: string;
835
- result: string;
836
- error: string | null;
837
- }>;
838
- }
839
-
840
- async function executeWithReflection(
841
- task: string,
842
- maxAttempts: number = 3
843
- ): Promise<string> {
844
- const state: ReflectionState = {
845
- attempt: 0,
846
- maxAttempts,
847
- lastError: null,
848
- attemptHistory: [],
849
- };
850
-
851
- while (state.attempt < state.maxAttempts) {
852
- state.attempt++;
853
-
854
- try {
855
- // Step 1: Generate solution
856
- const solution = await generateSolution(task, state.attemptHistory);
857
-
858
- // Step 2: Validate
859
- const validation = validateOutput(solution);
860
- if (validation.isValid) {
861
- return solution;
862
- }
863
-
864
- // Step 3: Reflect on failure
865
- state.lastError = validation.errors.join("; ");
866
- const reflection = await reflectOnFailure(
867
- task,
868
- solution,
869
- validation.errors,
870
- state.attemptHistory
871
- );
872
-
873
- // Step 4: Update history
874
- state.attemptHistory.push({
875
- approach: reflection.suggestedApproach,
876
- result: solution,
877
- error: state.lastError,
878
- });
879
-
880
- } catch (error) {
881
- state.lastError = String(error);
882
-
883
- // Attempt recovery reflection
884
- const recovery = await reflectOnError(task, error, state.attemptHistory);
885
- state.attemptHistory.push({
886
- approach: recovery.suggestedApproach,
887
- result: "",
888
- error: state.lastError,
889
- });
890
- }
891
- }
892
-
893
- throw new Error(
894
- `Failed after ${state.maxAttempts} attempts. ` +
895
- `Last error: ${state.lastError}`
896
- );
897
- }
898
-
899
- async function generateSolution(
900
- task: string,
901
- history: ReflectionState["attemptHistory"]
902
- ): Promise<string> {
903
- const historyContext = history.length > 0
904
- ? `Previous attempts:\n${history
905
- .map((h, i) => `Attempt ${i + 1} (${h.approach}): ${h.error || "failed"}`)
906
- .join("\n")}\n`
907
- : "";
908
-
909
- const response = await client.messages.create({
910
- model: "claude-opus-4-5",
911
- max_tokens: 1024,
912
- messages: [
913
- {
914
- role: "user",
915
- content: `${historyContext}\nTask: ${task}\n\nGenerate a solution.`,
916
- },
917
- ],
918
- });
919
-
920
- return response.content[0].type === "text" ? response.content[0].text : "";
921
- }
922
-
923
- async function reflectOnFailure(
924
- task: string,
925
- solution: string,
926
- errors: string[],
927
- history: ReflectionState["attemptHistory"]
928
- ): Promise<{ suggestedApproach: string }> {
929
- const response = await client.messages.create({
930
- model: "claude-opus-4-5",
931
- max_tokens: 512,
932
- messages: [
933
- {
934
- role: "user",
935
- content: `Task: ${task}
936
-
937
- Your previous solution failed with these issues:
938
- ${errors.map((e) => `- ${e}`).join("\n")}
939
-
940
- Previous solution:
941
- ${solution}
942
-
943
- Analyze what went wrong and suggest a different approach that would avoid these issues.`,
944
- },
945
- ],
946
- });
947
-
948
- return {
949
- suggestedApproach:
950
- response.content[0].type === "text" ? response.content[0].text : "",
951
- };
952
- }
953
-
954
- async function reflectOnError(
955
- task: string,
956
- error: unknown,
957
- history: ReflectionState["attemptHistory"]
958
- ): Promise<{ suggestedApproach: string }> {
959
- // Stub: a fuller version would prompt the model as reflectOnFailure does, passing the exception details
960
- return {
961
- suggestedApproach: `Error recovery strategy after: ${String(error)}`,
962
- };
963
- }
964
-
965
- function validateOutput(output: string): {
966
- isValid: boolean;
967
- errors: string[];
968
- } {
969
- const errors: string[] = [];
970
-
971
- if (!output || output.trim().length === 0) {
972
- errors.push("Output is empty");
973
- }
974
-
975
- if (output.length < 10) {
976
- errors.push("Output is too short");
977
- }
978
-
979
- return {
980
- isValid: errors.length === 0,
981
- errors,
982
- };
983
- }
984
- ```
985
-
986
- ---
987
-
988
- ### Example 2: Instruction-Following Validation Pattern
989
-
990
- ```typescript
991
- interface ValidationRule {
992
- name: string;
993
- validate: (value: any) => boolean;
994
- errorMessage: string;
995
- }
996
-
997
- interface InstructionFollowingEvaluator {
998
- rules: ValidationRule[];
999
- maxRetries: number;
1000
- }
1001
-
1002
- async function validateWithIFE(
1003
- task: string,
1004
- evaluator: InstructionFollowingEvaluator
1005
- ): Promise<string> {
1006
- let retries = 0;
1007
-
1008
- while (retries < evaluator.maxRetries) {
1009
- // Generate output
1010
- const output = await generateOutput(task);
1011
-
1012
- // Check each rule
1013
- const violations: string[] = [];
1014
- for (const rule of evaluator.rules) {
1015
- if (!rule.validate(output)) {
1016
- violations.push(rule.errorMessage);
1017
- }
1018
- }
1019
-
1020
- // If all rules pass, return
1021
- if (violations.length === 0) {
1022
- return output;
1023
- }
1024
-
1025
- // If violations, refine and retry
1026
- retries++;
1027
- if (retries < evaluator.maxRetries) {
1028
- const refinedTask = await refinePromptWithViolations(
1029
- task,
1030
- output,
1031
- violations
1032
- );
1033
- task = refinedTask;
1034
- } else {
1035
- throw new Error(
1036
- `Validation failed after ${evaluator.maxRetries} attempts. ` +
1037
- `Violations: ${violations.join("; ")}`
1038
- );
1039
- }
1040
- }
1041
-
1042
- throw new Error("Unexpected error in IFE validation");
1043
- }
1044
-
1045
- async function generateOutput(task: string): Promise<string> {
1046
- const response = await client.messages.create({
1047
- model: "claude-opus-4-5",
1048
- max_tokens: 1024,
1049
- messages: [{ role: "user", content: task }],
1050
- });
1051
-
1052
- return response.content[0].type === "text" ? response.content[0].text : "";
1053
- }
1054
-
1055
- async function refinePromptWithViolations(
1056
- originalTask: string,
1057
- output: string,
1058
- violations: string[]
1059
- ): Promise<string> {
1060
- const response = await client.messages.create({
1061
- model: "claude-opus-4-5",
1062
- max_tokens: 512,
1063
- messages: [
1064
- {
1065
- role: "user",
1066
- content: `Original task: ${originalTask}
1067
-
1068
- Your output failed these validation rules:
1069
- ${violations.map((v) => `- ${v}`).join("\n")}
1070
-
1071
- Your output was:
1072
- ${output}
1073
-
1074
- Revise the task/instructions to ensure the next attempt will satisfy all rules.`,
1075
- },
1076
- ],
1077
- });
1078
-
1079
- return response.content[0].type === "text" ? response.content[0].text : "";
1080
- }
1081
-
1082
- // Example usage with specific validation rules
1083
- const codeEvaluator: InstructionFollowingEvaluator = {
1084
- maxRetries: 3,
1085
- rules: [
1086
- {
1087
- name: "valid_syntax",
1088
- validate: (code) => {
1089
- try {
1090
- // Placeholder heuristic: a real check would parse or compile the code
1091
- return code.includes("function") || code.includes("const");
1092
- } catch {
1093
- return false;
1094
- }
1095
- },
1096
- errorMessage: "Code must have valid TypeScript syntax",
1097
- },
1098
- {
1099
- name: "includes_tests",
1100
- validate: (code) => code.includes("test") || code.includes("describe"),
1101
- errorMessage: "Code must include test cases",
1102
- },
1103
- {
1104
- name: "has_comments",
1105
- validate: (code) => code.includes("//") || code.includes("/*"),
1106
- errorMessage: "Code must include comments",
1107
- },
1108
- ],
1109
- };
1110
- ```
1111
-
1112
- ---
1113
-
1114
- ### Example 3: Reflexion-Style Architecture
1115
-
1116
- ```typescript
1117
- interface ReflexionState {
1118
- task: string;
1119
- trajectory: string;
1120
- reward: number;
1121
- reflection: string;
1122
- nextAttempt: string;
1123
- }
1124
-
1125
- class ReflexionAgent {
1126
- private actor: LLMClient;
1127
- private evaluator: (output: string) => number;
1128
- private reflector: ReflectorModel;
1129
- private memory: ReflexionState[] = [];
1130
-
1131
- async runReflexion(task: string, maxIterations: number = 3): Promise<string> {
1132
- let currentTask = task;
1133
-
1134
- for (let i = 0; i < maxIterations; i++) {
1135
- // Step 1: Actor generates trajectory
1136
- const trajectory = await this.actor.generate(currentTask);
1137
-
1138
- // Step 2: Evaluator assigns reward
1139
- const reward = this.evaluator(trajectory);
1140
-
1141
- // Step 3: Reflector generates feedback
1142
- const reflection = await this.reflector.generateReflection(
1143
- task,
1144
- trajectory,
1145
- reward,
1146
- this.memory
1147
- );
1148
-
1149
- // Step 4: Store in memory
1150
- const state: ReflexionState = {
1151
- task,
1152
- trajectory,
1153
- reward,
1154
- reflection,
1155
- nextAttempt: "",
1156
- };
1157
- this.memory.push(state);
1158
-
1159
- // Step 5: Use reflection to improve next attempt
1160
- if (reward > 0.8) {
1161
- // Good enough, return
1162
- return trajectory;
1163
- }
1164
-
1165
- // Prepare for next iteration with reflection context
1166
- currentTask = `${task}
1167
-
1168
- Previous attempt feedback:
1169
- ${reflection}
1170
-
1171
- Generate an improved solution that addresses the feedback above.`;
1172
- }
1173
-
1174
- return this.memory[this.memory.length - 1].trajectory;
1175
- }
1176
- }
1177
-
1178
- class ReflectorModel {
1179
- private client: LLMClient;
1180
-
1181
- async generateReflection(
1182
- task: string,
1183
- trajectory: string,
1184
- reward: number,
1185
- memory: ReflexionState[]
1186
- ): Promise<string> {
1187
- const memoryContext =
1188
- memory.length > 0
1189
- ? `Previous attempts and feedback:\n${memory
1190
- .slice(-2)
1191
- .map((m) => `Reward: ${m.reward}\nFeedback: ${m.reflection}`)
1192
- .join("\n\n")}\n`
1193
- : "";
1194
-
1195
- const response = await this.client.messages.create({
1196
- model: "claude-opus-4-5",
1197
- max_tokens: 512,
1198
- messages: [
1199
- {
1200
- role: "user",
1201
- content: `Task: ${task}
1202
-
1203
- ${memoryContext}
1204
-
1205
- Current attempt (reward score: ${reward}):
1206
- ${trajectory}
1207
-
1208
- Evaluate this attempt:
1209
- 1. What did it do well?
1210
- 2. What are the specific failures or issues?
1211
- 3. What should be tried differently in the next attempt?
1212
- 4. What patterns from previous attempts should be avoided?
1213
-
1214
- Format your response as structured verbal feedback.`,
1215
- },
1216
- ],
1217
- });
1218
-
1219
- return response.content[0].type === "text" ? response.content[0].text : "";
1220
- }
1221
- }
1222
- ```
1223
-
1224
- ---
1225
-
1226
- ### Example 4: Multi-Agent Reflection
1227
-
1228
- ```typescript
1229
- class MultiAgentReflection {
1230
- private generator: LLMClient;
1231
- private critic: LLMClient;
1232
-
1233
- async reflectiveGeneration(
1234
- task: string,
1235
- maxRounds: number = 3
1236
- ): Promise<string> {
1237
- let currentOutput = await this.generate(task);
1238
-
1239
- for (let round = 1; round < maxRounds; round++) {
1240
- // Get critique
1241
- const critique = await this.critique(task, currentOutput);
1242
-
1243
- if (critique.isSatisfactory) {
1244
- return currentOutput;
1245
- }
1246
-
1247
- // Generate improvement
1248
- currentOutput = await this.improve(
1249
- task,
1250
- currentOutput,
1251
- critique.feedback,
1252
- critique.suggestions
1253
- );
1254
- }
1255
-
1256
- return currentOutput;
1257
- }
1258
-
1259
- async generate(task: string): Promise<string> {
1260
- const response = await this.generator.messages.create({
1261
- model: "claude-opus-4-5",
1262
- max_tokens: 1024,
1263
- messages: [{ role: "user", content: task }],
1264
- });
1265
-
1266
- return response.content[0].type === "text" ? response.content[0].text : "";
1267
- }
1268
-
1269
- async improve(
1270
- task: string,
1271
- currentOutput: string,
1272
- feedback: string,
1273
- suggestions: string[]
1274
- ): Promise<string> {
1275
- const response = await this.generator.messages.create({
1276
- model: "claude-opus-4-5",
1277
- max_tokens: 1024,
1278
- messages: [
1279
- {
1280
- role: "user",
1281
- content: `Task: ${task}
1282
-
1283
- Current output:
1284
- ${currentOutput}
1285
-
1286
- Feedback from review:
1287
- ${feedback}
1288
-
1289
- Specific improvements to make:
1290
- ${suggestions.map((s) => `- ${s}`).join("\n")}
1291
-
1292
- Provide an improved version that addresses all feedback.`,
1293
- },
1294
- ],
1295
- });
1296
-
1297
- return response.content[0].type === "text" ? response.content[0].text : "";
1298
- }
1299
-
1300
- async critique(
1301
- task: string,
1302
- output: string
1303
- ): Promise<{
1304
- isSatisfactory: boolean;
1305
- feedback: string;
1306
- suggestions: string[];
1307
- }> {
1308
- const response = await this.critic.messages.create({
1309
- model: "claude-opus-4-5",
1310
- max_tokens: 512,
1311
- messages: [
1312
- {
1313
- role: "user",
1314
- content: `You are a critical reviewer. Evaluate this response:
1315
-
1316
- Task: ${task}
1317
-
1318
- Response:
1319
- ${output}
1320
-
1321
- Provide:
1322
- 1. Overall assessment (satisfactory or needs improvement)
1323
- 2. Specific issues with the current response
1324
- 3. Concrete suggestions for improvement
1325
-
1326
- Format as JSON: { "isSatisfactory": boolean, "feedback": string, "suggestions": string[] }`,
1327
- },
1328
- ],
1329
- });
1330
-
1331
- const text =
1332
- response.content[0].type === "text" ? response.content[0].text : "{}";
1333
- return JSON.parse(text);
1334
- }
1335
- }
1336
- ```
1337
-
1338
- ---
1339
-
1340
- ### Example 5: Confidence-Based Reflection Trigger
1341
-
1342
- ```typescript
1343
- interface ConfidenceMetadata {
1344
- overallConfidence: number;
1345
- uncertaintyAreas: string[];
1346
- alternativesConsidered: string[];
1347
- }
1348
-
1349
- async function confidenceBasedReflection(
1350
- task: string,
1351
- confidenceThreshold: number = 0.75
1352
- ): Promise<string> {
1353
- const response = await client.messages.create({
1354
- model: "claude-opus-4-5",
1355
- max_tokens: 1024,
1356
- messages: [
1357
- {
1358
- role: "user",
1359
- content: `${task}
1360
-
1361
- After your response, provide a JSON block with confidence metadata:
1362
- {
1363
- "overallConfidence": <0-1>,
1364
- "uncertaintyAreas": ["area1", "area2"],
1365
- "alternativesConsidered": ["alternative1", "alternative2"]
1366
- }`,
1367
- },
1368
- ],
1369
- });
1370
-
1371
- const text =
1372
- response.content[0].type === "text" ? response.content[0].text : "";
1373
-
1374
- // Extract response and metadata
1375
- const jsonMatch = text.match(/\{[\s\S]*\}/);
1376
- const metadata: ConfidenceMetadata = jsonMatch
1377
- ? JSON.parse(jsonMatch[0])
1378
- : { overallConfidence: 0.5, uncertaintyAreas: [], alternativesConsidered: [] };
1379
-
1380
- // If confidence too low, reflect
1381
- if (metadata.overallConfidence < confidenceThreshold) {
1382
- const reflection = await reflectWithLowConfidence(
1383
- task,
1384
- text,
1385
- metadata
1386
- );
1387
- return reflection;
1388
- }
1389
-
1390
- return text;
1391
- }
1392
-
1393
- async function reflectWithLowConfidence(
1394
- task: string,
1395
- initialResponse: string,
1396
- metadata: ConfidenceMetadata
1397
- ): Promise<string> {
1398
- const response = await client.messages.create({
1399
- model: "claude-opus-4-5",
1400
- max_tokens: 1024,
1401
- messages: [
1402
- {
1403
- role: "user",
1404
- content: `Original task: ${task}
1405
-
1406
- Your previous response (confidence: ${metadata.overallConfidence}):
1407
- ${initialResponse}
1408
-
1409
- You indicated uncertainty in these areas:
1410
- ${metadata.uncertaintyAreas.map((a) => `- ${a}`).join("\n")}
1411
-
1412
- You considered these alternatives:
1413
- ${metadata.alternativesConsidered.map((a) => `- ${a}`).join("\n")}
1414
-
1415
- Given your own identified uncertainties:
1416
- 1. Identify what specific information would increase your confidence
1417
- 2. Provide a revised response that addresses these uncertainty areas
1418
- 3. Explain how your revised response is more robust`,
1419
- },
1420
- ],
1421
- });
1422
-
1423
- return response.content[0].type === "text" ? response.content[0].text : "";
1424
- }
1425
- ```
1426
-
1427
- ---
1428
-
1429
- ## Summary & Key Takeaways
1430
-
1431
- ### What Is Reflection?
1432
- Self-reflection in AI agents enables error detection, analysis, and correction without human intervention. It's a three-phase cycle: generate → analyze → improve.
1433
-
1434
- ### When to Implement Reflection
1435
- - **Error recovery**: When outputs fail validation
1436
- - **Iterative refinement**: Complex tasks needing multiple passes
1437
- - **High-stakes decisions**: Code generation, critical logic
1438
- - **Low-confidence outputs**: When the model expresses uncertainty
1439
-
1440
- ### When NOT to Implement Reflection
1441
- - Low-latency requirements (<100ms)
1442
- - Simple, deterministic tasks
1443
- - Well-understood workflows with 99%+ baseline accuracy
1444
- - Clear model refusals
1445
- - Token budget constraints
1446
-
1447
- ### Best Practices
1448
- 1. **Limit retries**: 3-5 attempts maximum, never unlimited
1449
- 2. **Clear success criteria**: Specific, measurable goals (not vague)
1450
- 3. **State capture**: Preserve context for intelligent retry
1451
- 4. **Error categorization**: Different strategies for different error types
1452
- 5. **Backoff strategies**: Exponential backoff for transient errors
1453
- 6. **Avoid reflection loops**: Use state deduplication and progress detection
1454
-
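Two of the practices above, exponential backoff and loop avoidance through deduplication, might look roughly like this. Both helpers are hypothetical, and the default base delay and cap are illustrative assumptions, not recommended values.

```typescript
// Exponential backoff for transient errors: the delay doubles with each
// attempt (1-based) and is capped. Defaults are illustrative assumptions.
function backoffDelayMs(
  attempt: number,
  baseMs: number = 500,
  capMs: number = 8000
): number {
  return Math.min(capMs, baseMs * Math.pow(2, attempt - 1));
}

// Progress detection by deduplication: treat an output as "no progress"
// if it matches an earlier output after whitespace and case normalization.
function isRepeatedOutput(history: string[], candidate: string): boolean {
  const normalize = (s: string): string =>
    s.trim().replace(/\s+/g, " ").toLowerCase();
  const key = normalize(candidate);
  return history.some((previous) => normalize(previous) === key);
}
```

A retry loop could stop early when `isRepeatedOutput` returns true, since another reflection round on an identical output is unlikely to help.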
1455
- ### Implementation Hierarchy
1456
- 1. Start simple: Basic self-critique in prompts
1457
- 2. Add validation: Explicit output rules
1458
- 3. Multi-attempt: Error-reflection-retry loop (3 attempts)
1459
- 4. Tool-enhanced: Use linters, tests, execution for feedback
1460
- 5. Multi-agent: Deploy separate critic for complex tasks
1461
- 6. Full Reflexion: Only if the simpler approaches above prove insufficient
1462
-
1463
- ### Framework Selection
1464
- - **LangChain/LangGraph**: Pre-built reflection patterns, good for graph-based workflows
1465
- - **Anthropic/Claude**: Emphasis on simplicity, extended thinking for reflection
1466
- - **Google ADK**: Specialized reflect-and-retry plugin
1467
- - **Custom**: Lightweight TypeScript patterns for specific needs
1468
-