@pencil-agent/nano-pencil 2.0.0 → 2.0.1

Files changed (195)
  1. package/README.md +267 -267
  2. package/dist/build-meta.json +3 -3
  3. package/dist/core/export-html/AGENT.md +11 -11
  4. package/dist/core/export-html/template.css +971 -971
  5. package/dist/core/export-html/template.html +54 -54
  6. package/dist/core/mcp/mcp-client.d.ts +3 -1
  7. package/dist/core/mcp/mcp-client.js +6 -6
  8. package/dist/core/mcp/mcp-config.d.ts +3 -3
  9. package/dist/core/mcp/mcp-config.js +1 -1
  10. package/dist/core/mcp/mcp-manager.d.ts +5 -1
  11. package/dist/core/mcp/mcp-manager.js +1 -1
  12. package/dist/core/platform/config/resource-loader.d.ts +2 -0
  13. package/dist/core/platform/config/resource-loader.js +2 -2
  14. package/dist/core/runtime/agent-session.d.ts +12 -0
  15. package/dist/core/runtime/agent-session.js +8 -8
  16. package/dist/core/runtime/sdk.d.ts +8 -0
  17. package/dist/core/runtime/sdk.js +1 -1
  18. package/dist/extensions/builtin/AGENT.md +115 -115
  19. package/dist/extensions/builtin/browser/AGENT.md +17 -17
  20. package/dist/extensions/builtin/browser/agent-workspace/agent_helpers.py +12 -12
  21. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/amazon/product-search.md +198 -198
  22. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/archive-org/scraping.md +341 -341
  23. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/arxiv/scraping.md +311 -311
  24. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/arxiv-bulk/scraping.md +333 -333
  25. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/atlas/overview.md +70 -70
  26. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/booking-com/scraping.md +578 -578
  27. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/capterra/scraping.md +440 -440
  28. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/centilebrain/generate-estimates.md +110 -110
  29. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/coingecko/scraping.md +325 -325
  30. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/coinmarketcap/scraping.md +463 -463
  31. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/coursera/scraping.md +360 -360
  32. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/craigslist/scraping.md +390 -390
  33. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/crossref/scraping.md +568 -568
  34. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/dev-to/scraping.md +323 -323
  35. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/duckduckgo/scraping.md +349 -349
  36. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/ebay/scraping.md +435 -435
  37. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/etsy/scraping.md +506 -506
  38. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/eventbrite/scraping.md +363 -363
  39. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/expedia/automation.md +168 -168
  40. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/facebook/groups.md +236 -236
  41. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/facebook/pages.md +295 -295
  42. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/framer/editor.md +108 -108
  43. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/fred/scraping.md +493 -493
  44. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/g2/scraping.md +580 -580
  45. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/genius/scraping.md +511 -511
  46. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/github/repo-actions.md +65 -65
  47. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/github/scraping.md +184 -184
  48. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/glassdoor/scraping.md +543 -543
  49. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/gmail/compose.md +122 -122
  50. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/goodreads/scraping.md +461 -461
  51. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/gutenberg/scraping.md +383 -383
  52. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/hackernews/scraping.md +243 -243
  53. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/howlongtobeat/scraping.md +473 -473
  54. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/imdb/scraping.md +271 -271
  55. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/itch-io/scraping.md +436 -436
  56. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/job-boards/indeed-glassdoor.md +1021 -1021
  57. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/letterboxd/scraping.md +349 -349
  58. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/linkedin/invitation-manager.md +109 -109
  59. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/loom/folder-enumeration.md +170 -170
  60. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/macrotrends/scraping.md +537 -537
  61. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/medium/article-hydration.md +120 -120
  62. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/medium/scraping.md +414 -414
  63. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/metacritic/scraping.md +477 -477
  64. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/musicbrainz/scraping.md +478 -478
  65. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/nasa/scraping.md +339 -339
  66. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/news-aggregation/multi-source.md +205 -205
  67. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/open-library/scraping.md +472 -472
  68. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/openalex/scraping.md +470 -470
  69. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/openstreetmap/scraping.md +490 -490
  70. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/package-registries/npm-pypi.md +478 -478
  71. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/polymarket/scraping.md +234 -234
  72. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/producthunt/scraping.md +307 -307
  73. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/pubmed/scraping.md +421 -421
  74. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/quora/scraping.md +364 -364
  75. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/rawg/scraping.md +352 -352
  76. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/reddit/scraping.md +124 -124
  77. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/rest-countries/scraping.md +233 -233
  78. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/sec-edgar/scraping.md +361 -361
  79. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/README.md +36 -36
  80. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/embedded-apps.md +72 -72
  81. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/knowledge-base.md +109 -109
  82. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/polaris-inputs.md +137 -137
  83. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/soundcloud/scraping.md +362 -362
  84. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/spotify/scraping.md +339 -339
  85. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/stackoverflow/scraping.md +435 -435
  86. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/steam/scraping.md +575 -575
  87. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/substack/scraping.md +338 -338
  88. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/thetechgeeks/pricing.md +52 -52
  89. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/tiktok/upload.md +107 -107
  90. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/tradingview/scraping.md +309 -309
  91. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/trello/boards-and-lists.md +88 -88
  92. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/trustpilot/scraping.md +375 -375
  93. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/walmart/scraping.md +444 -444
  94. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/wayback-machine/scraping.md +306 -306
  95. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/weather/scraping.md +398 -398
  96. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/wellfound/scraping.md +596 -596
  97. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/world-bank/scraping.md +356 -356
  98. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/xiaohongshu/scraping.md +84 -84
  99. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/youtube/scraping.md +418 -418
  100. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/zillow/scraping.md +433 -433
  101. package/dist/extensions/builtin/browser/browser.md +73 -73
  102. package/dist/extensions/builtin/browser/install.md +142 -142
  103. package/dist/extensions/builtin/browser/interaction-skills/connection.md +48 -48
  104. package/dist/extensions/builtin/browser/interaction-skills/cookies.md +3 -3
  105. package/dist/extensions/builtin/browser/interaction-skills/cross-origin-iframes.md +3 -3
  106. package/dist/extensions/builtin/browser/interaction-skills/dialogs.md +64 -64
  107. package/dist/extensions/builtin/browser/interaction-skills/downloads.md +3 -3
  108. package/dist/extensions/builtin/browser/interaction-skills/drag-and-drop.md +3 -3
  109. package/dist/extensions/builtin/browser/interaction-skills/dropdowns.md +3 -3
  110. package/dist/extensions/builtin/browser/interaction-skills/iframes.md +3 -3
  111. package/dist/extensions/builtin/browser/interaction-skills/network-requests.md +3 -3
  112. package/dist/extensions/builtin/browser/interaction-skills/print-as-pdf.md +3 -3
  113. package/dist/extensions/builtin/browser/interaction-skills/profile-sync.md +90 -90
  114. package/dist/extensions/builtin/browser/interaction-skills/screenshots.md +17 -17
  115. package/dist/extensions/builtin/browser/interaction-skills/scrolling.md +3 -3
  116. package/dist/extensions/builtin/browser/interaction-skills/shadow-dom.md +3 -3
  117. package/dist/extensions/builtin/browser/interaction-skills/tabs.md +69 -69
  118. package/dist/extensions/builtin/browser/interaction-skills/uploads.md +1 -1
  119. package/dist/extensions/builtin/browser/interaction-skills/viewport.md +3 -3
  120. package/dist/extensions/builtin/browser/src/browser_harness/AGENT.md +15 -15
  121. package/dist/extensions/builtin/browser/src/browser_harness/__init__.py +8 -8
  122. package/dist/extensions/builtin/browser/src/browser_harness/_ipc.py +90 -90
  123. package/dist/extensions/builtin/browser/src/browser_harness/admin.py +722 -722
  124. package/dist/extensions/builtin/browser/src/browser_harness/daemon.py +328 -328
  125. package/dist/extensions/builtin/browser/src/browser_harness/helpers.py +396 -396
  126. package/dist/extensions/builtin/browser/src/browser_harness/run.py +103 -103
  127. package/dist/extensions/builtin/discipline/skills/brainstorming/SKILL.md +33 -33
  128. package/dist/extensions/builtin/discipline/skills/executing-plans/SKILL.md +25 -25
  129. package/dist/extensions/builtin/discipline/skills/finishing-development-branch/SKILL.md +25 -25
  130. package/dist/extensions/builtin/discipline/skills/receiving-code-review/SKILL.md +22 -22
  131. package/dist/extensions/builtin/discipline/skills/requesting-code-review/SKILL.md +31 -31
  132. package/dist/extensions/builtin/discipline/skills/systematic-debugging/SKILL.md +28 -28
  133. package/dist/extensions/builtin/discipline/skills/test-driven-development/SKILL.md +32 -32
  134. package/dist/extensions/builtin/discipline/skills/using-git-worktrees/SKILL.md +25 -25
  135. package/dist/extensions/builtin/discipline/skills/verification-before-completion/SKILL.md +27 -27
  136. package/dist/extensions/builtin/discipline/skills/writing-plans/SKILL.md +26 -26
  137. package/dist/extensions/builtin/goal/README.md +67 -67
  138. package/dist/extensions/builtin/grub/README.md +112 -112
  139. package/dist/extensions/builtin/link-world/agent-workspace/README.md +16 -16
  140. package/dist/extensions/builtin/link-world/internet-search/internet-search.md +65 -65
  141. package/dist/extensions/builtin/link-world/link-world-agent.md +82 -82
  142. package/dist/extensions/builtin/link-world/linkworld.md +313 -313
  143. package/dist/extensions/builtin/link-world/network-routing/network-routing.md +67 -67
  144. package/dist/extensions/builtin/loop/README.md +92 -92
  145. package/dist/extensions/builtin/mcp/figma-design.md +68 -68
  146. package/dist/extensions/builtin/mcp/mcp-management.md +85 -85
  147. package/dist/extensions/builtin/recap/AGENT.md +15 -15
  148. package/dist/extensions/builtin/sal/README.md +72 -72
  149. package/dist/extensions/builtin/security-audit/README.md +289 -289
  150. package/dist/extensions/builtin/team/AGENT.md +112 -112
  151. package/dist/extensions/builtin/team/TESTING.md +299 -299
  152. package/dist/extensions/builtin/token-save/README.md +56 -56
  153. package/dist/extensions/optional/AGENT.md +10 -10
  154. package/dist/modes/interactive/interactive-mode.js +36 -36
  155. package/dist/modes/interactive/theme/dark.json +85 -85
  156. package/dist/modes/interactive/theme/light.json +84 -84
  157. package/dist/modes/interactive/theme/theme-schema.json +335 -335
  158. package/dist/modes/interactive/theme/warm.json +81 -81
  159. package/dist/node_modules/@pencil-agent/agent-core/dist/agent-loop.js +3 -2
  160. package/dist/node_modules/@pencil-agent/agent-core/dist/structured-adaptive-agent-loop.js +2 -1
  161. package/dist/node_modules/@pencil-agent/ai/dist/cli.js +0 -0
  162. package/docs/cc-agent-design.md +1297 -0
  163. package/docs/cc-tui-design.md +1333 -0
  164. package/docs/codex-goal-command-impl.md +1055 -1055
  165. package/docs/codex-goal-vs-grub.md +500 -500
  166. package/docs/custom-provider.md +27 -27
  167. package/docs/extensions.md +27 -27
  168. package/docs/keybindings.md +27 -27
  169. package/docs/loop 重构完成总结.md +250 -250
  170. package/docs/loop 重构完成报告.md +122 -122
  171. package/docs/loop 重构方案.md +1222 -1222
  172. package/docs/loop 重构方案实现报告.md +158 -158
  173. package/docs/loop 重构方案对比分析.md +128 -128
  174. package/docs/loop 重构计划.md +320 -320
  175. package/docs/loop-usage-examples.md +214 -214
  176. package/docs/models.md +27 -27
  177. package/docs/nanoPencil-学习计划.md +170 -0
  178. package/docs/packages.md +27 -27
  179. package/docs/pi-design-philosophy.md +457 -457
  180. package/docs/planmode.md +1987 -1987
  181. package/docs/prompt-templates.md +27 -27
  182. package/docs/providers.md +27 -27
  183. package/docs/scan-report.md +3820 -0
  184. package/docs/sdk.md +27 -27
  185. package/docs/skills.md +27 -27
  186. package/docs/themes.md +27 -27
  187. package/docs/tui.md +27 -27
  188. package/docs/对标Claude-Code.md +1775 -0
  189. package/docs/阿里巴巴财报分析书.md +261 -0
  190. package/package.json +190 -190
  191. package/docs/ACP协议集成开发文档.md +0 -851
  192. package/docs/SDK-TESTING.md +0 -364
  193. package/docs/mem-core技术文档.md +0 -593
  194. package/docs/startup-performance-optimization.md +0 -301
  195. package/docs/认知地图.md +0 -47
@@ -1,500 +1,500 @@
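- # Codex `/goal` vs nanoPencil `/grub`: Long-Running Task Mechanisms with a Shared Origin and Divergent Paths
-
- > Both systems solve the same problem: letting an AI agent iterate autonomously until a complex task is done.
- > Their design philosophies, implementation paths, and constraint models, however, differ sharply.
-
- ---
-
- ## 1. One-Sentence Summary
-
- | | Codex `/goal` | nanoPencil `/grub` |
- |---|---|---|
- | **Core idea** | "Set a goal, and I continue automatically whenever I am idle" | "Set a goal, and I strictly advance one feature per iteration" |
- | **Control granularity** | Token budget + time | Iteration count + consecutive-failure count |
- | **Completion check** | The LLM decides (guided by a completion audit prompt) | Complete only when every item in feature-list.json has passes:true |
- | **Persistence** | SQLite (in-process) | File system (`.grub/` directory) |
-
- ---
-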
- ## 2. Command Comparison
-
- ### 2.1 Command Format
-
- | Action | Codex `/goal` | nanoPencil `/grub` |
- |------|--------------|-------------------|
- | Set a goal | `/goal <objective>` | `/grub <goal>` |
- | Show status | `/goal` (shows a summary menu) | `/grub status` or `/grub status --json` |
- | Pause | `/goal pause` | None (only stop) |
- | Resume | `/goal resume` | `/grub resume` |
- | Stop | `/goal clear` | `/grub stop` |
- | Edit | `/goal edit` | None (stop, then start again) |
- | Help | None (prints usage directly) | `/grub help` |
- | Limit parameters | token_budget (set through an LLM tool) | `--max-iter N`, `--max-fail N` |
-
- ### 2.2 Command Parsing
-
- **Codex**: parsed in the TUI layer and dispatched over the `AppEvent` event bus to `thread_goal_actions` in the `App` layer.
-
- **nanoPencil**: parsed in the extension layer; `parseGrubCommand()` returns a typed command object:
-
- ```typescript
- type ParsedGrubCommand =
-   | { type: "start"; goal: string; maxIterations?: number; maxConsecutiveFailures?: number }
-   | { type: "status"; json?: boolean }
-   | { type: "stop" }
-   | { type: "resume" }
-   | { type: "help"; reason?: string };
- ```
-
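As an illustration of how the typed command object above might be produced, here is a minimal sketch. It assumes the input is the text that follows `/grub`. The function name `parseGrubCommandSketch`, the `StartCommand` alias, and every validation rule are hypothetical; only the `ParsedGrubCommand` shape and the `--max-iter` / `--max-fail` flags are taken from the document, and the real `parseGrubCommand()` may behave differently.

```typescript
// Shape copied from the document; everything below it is an assumed sketch.
type ParsedGrubCommand =
  | { type: "start"; goal: string; maxIterations?: number; maxConsecutiveFailures?: number }
  | { type: "status"; json?: boolean }
  | { type: "stop" }
  | { type: "resume" }
  | { type: "help"; reason?: string };

type StartCommand = Extract<ParsedGrubCommand, { type: "start" }>;

function parseGrubCommandSketch(args: string): ParsedGrubCommand {
  const tokens = args.trim().split(/\s+/).filter((t) => t.length > 0);
  const head = tokens[0];
  if (head === undefined) return { type: "help", reason: "missing goal" };

  // A bare keyword is treated as a subcommand (assumption).
  const rest = tokens.slice(1);
  if (rest.length === 0 && head === "stop") return { type: "stop" };
  if (rest.length === 0 && head === "resume") return { type: "resume" };
  if (rest.length === 0 && head === "help") return { type: "help" };
  if (head === "status" && rest.every((t) => t === "--json")) {
    return { type: "status", json: rest.length > 0 };
  }

  // Anything else starts a task: numeric flags are pulled out,
  // and the remaining words form the goal.
  const start: StartCommand = { type: "start", goal: "" };
  const goalWords: string[] = [];
  const queue = tokens.slice();
  while (queue.length > 0) {
    const token = queue.shift() as string;
    if (token !== "--max-iter" && token !== "--max-fail") {
      goalWords.push(token);
      continue;
    }
    const value = Number(queue.shift()); // NaN when the value is missing
    if (!Number.isInteger(value) || value <= 0) {
      return { type: "help", reason: token + " expects a positive integer" };
    }
    if (token === "--max-iter") start.maxIterations = value;
    else start.maxConsecutiveFailures = value;
  }
  if (goalWords.length === 0) return { type: "help", reason: "missing goal" };
  start.goal = goalWords.join(" ");
  return start;
}
```

Returning a `help` variant with a `reason` instead of throwing keeps every outcome inside the union, so a caller can switch on `type` without a separate error path.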
- ---
-
- ## 3. Data Model Comparison
-
- ### 3.1 Status Enums
-
- | Codex `ThreadGoalStatus` | nanoPencil `GrubStatus` | Correspondence |
- |--------------------------|------------------------|----------|
- | `active` | `running` | Equivalent |
- | `paused` | None | grub has no notion of pausing |
- | `blocked` | `blocked` | Equivalent (but triggered by different conditions) |
- | `usage_limited` | None | grub has no usage limit |
- | `budget_limited` | None | grub has no token budget |
- | `complete` | `complete` | Equivalent |
- | None | `stopped` | grub supports a manual stop |
- | None | `failed` | grub terminates on failure |
-
- **Key difference**: Codex has 6 statuses and grub has 5. Codex's `paused`/`usage_limited`/`budget_limited` do not exist in grub; grub's `stopped`/`failed` do not exist in Codex.
-
- ### 3.2 Entity Structure
-
- **Codex `ThreadGoal`**:
- ```typescript
- interface ThreadGoal {
-   thread_id: string;
-   goal_id: string; // UUID; a new ID is generated on every replace
-   objective: string;
-   status: ThreadGoalStatus;
-   token_budget: number | null;
-   tokens_used: number;
-   time_used_seconds: number;
-   created_at: number; // epoch ms
-   updated_at: number; // epoch ms
- }
- ```
-
- **nanoPencil `GrubTaskState`**:
- ```typescript
- interface GrubTaskState {
-   id: string; // 8 hex characters
-   goal: string;
-   locale: "en" | "zh";
-   status: GrubStatus;
-   phase: "initializer" | "execution"; // ⭐ grub only
-   startedAt: number;
-   updatedAt: number;
-   currentIteration: number; // ⭐ current iteration
-   awaitingTurn: boolean; // ⭐ whether a turn's result is still pending
-   consecutiveFailures: number; // ⭐ consecutive-failure counter
-   maxIterations: number; // default 25
-   maxConsecutiveFailures: number; // default 3
-   maxInitializerFailures?: number; // default 5 (the initializer phase is more lenient)
-   harnessDirectory: string; // ⭐ .grub/<id>/
-   featureChecklistPath: string;
-   featureListPath: string;
-   stateFilePath: string;
-   progressLogPath: string;
-   initScriptPath: string;
-   featureListBaseline?: FeatureList;
-   lastDecision?: GrubDecision;
-   lastError?: string;
- }
- ```
-
- **Key differences**:
- - Codex limits work with a token budget; grub uses iteration and failure counts
- - grub has a `phase` (initializer/execution); Codex does not
- - grub has a complete harness file system (feature-list.json, progress-log.md, init.sh); Codex does not
- - grub keeps a `consecutiveFailures` counter plus `lastDecision`/`lastError` as recovery context
-
- ### 3.3 Persistence
-
- | | Codex | nanoPencil grub |
- |---|---|---|
- | **Storage** | SQLite `thread_goals` table | JSON file `state.json` |
- | **Granularity** | One row per thread | One directory per task |
- | **Transactions** | SQL transactions guarantee atomicity | File writes (best-effort) |
- | **Concurrency** | Row locks + optimistic locking (`expected_goal_id`) | In-memory lock (`GrubController` singleton) |
- | **Cross-session** | Supported natively (SQLite is persistent) | Supported (persistent files + the resume command) |
-
- ---
-
- ## 4. Continuation Mechanism Comparison
-
- This is the most fundamental difference between the two systems.
-
- ### 4.1 Codex: Idle Continuation
-
- ```
- Agent turn ends → idle
-
- on_thread_idle() fires
-
- Check whether the goal is active
-
- Inject the continuation prompt
-
- Start a new turn (automatic, no user intervention needed)
- ```
-
- **Characteristics**:
- - Fully automatic: the agent continues as soon as it goes idle
- - The continuation prompt carries the objective, budget information, and the completion audit rules
- - Token usage is charged against the budget in real time as each tool call finishes
- - When the budget runs out, a `budget_limit_prompt` is injected so the LLM can wrap up
-
- ### 4.2 nanoPencil grub: Controller Loop
-
- ```
- /grub <goal> starts the task
-
- GrubController.start() → create the harness directory + state.json
-
- injectGrubTurn() → inject the initializer prompt → start a turn
-
- Turn ends → extractGrubDecision() parses the <loop-state> block
-
- ┌─ status === "continue"?
- │ ├─ YES → validateFeatureListAfterTurn()
- │ │ → finishTurn(decision) → currentIteration++
- │ │ → injectGrubTurn() → start the next turn
- │ └─ NO → status === "complete"?
- │ ├─ YES → validateCompletion() → check that every feature-list item has passes:true
- │ │ ├─ All pass → stop("complete")
- │ │ └─ Some pending → downgrade to continue and set nextStep
- │ └─ status === "blocked" → stop("blocked")
-
- On failure → recordFailure() → consecutiveFailures++
-
- consecutiveFailures >= maxConsecutiveFailures → stop("failed")
- currentIteration >= maxIterations → stop("failed")
- ```
-
- **Characteristics**:
- - Has an explicit initializer phase and an execution phase
- - Every turn must emit a `<loop-state>` JSON block
- - feature-list.json, not the LLM's say-so, is the ground truth for completion
- - init.sh verifies project health on every iteration
- - progress-log.md records the progress of every iteration
-
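The "parse the `<loop-state>` block" step of the controller loop can be sketched as follows. This is only an assumed illustration: `extractLoopStateSketch` and `GrubDecisionSketch` are hypothetical names, the choice to read the last block in the reply is an assumption, and the real `extractGrubDecision()` is not shown in the document. The three status values and the `summary` / `nextStep` keys come from the prompt excerpt in the document.

```typescript
// Hypothetical decision shape, modelled on the <loop-state> JSON in the document.
type GrubDecisionSketch = {
  status: "continue" | "complete" | "blocked";
  summary: string;
  nextStep?: string;
};

const LOOP_STATUSES = ["continue", "complete", "blocked"] as const;

// Returns the decision carried by the last <loop-state> block, or null when
// the block is missing or malformed (the cases the controller counts as failures).
function extractLoopStateSketch(reply: string): GrubDecisionSketch | null {
  const open = "<loop-state>";
  const close = "</loop-state>";
  const start = reply.lastIndexOf(open);
  if (start === -1) return null;
  const end = reply.indexOf(close, start);
  if (end === -1) return null;

  let parsed: unknown;
  try {
    parsed = JSON.parse(reply.slice(start + open.length, end));
  } catch (_err) {
    return null; // block present but not valid JSON
  }
  if (typeof parsed !== "object" || parsed === null) return null;

  const record = parsed as Record<string, unknown>;
  const status = LOOP_STATUSES.find((s) => s === record["status"]);
  const summary = record["summary"];
  if (status === undefined || typeof summary !== "string") return null;

  const decision: GrubDecisionSketch = { status, summary };
  const nextStep = record["nextStep"];
  if (typeof nextStep === "string") decision.nextStep = nextStep;
  return decision;
}
```

Returning `null` for every malformed case lets the caller map "no usable decision" onto a single failure path, which matches the failure-counting behavior described for grub.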
- ### 4.3 Continuation Prompt Comparison
-
- **Codex continuation prompt** (52 lines):
- ```
- Continue working toward the active thread goal.
- <objective>{{ objective }}</objective>
- - This goal persists across turns.
- - Keep the full objective intact.
- - Temporary rough edges are acceptable.
- Budget: Tokens used: X / Token budget: Y / Remaining: Z
- Work from evidence: Use current worktree as authoritative.
- Completion audit: Derive requirements, verify against actual state.
- Blocked audit: 3+ consecutive turns of same blocker before marking blocked.
- ```
-
- **grub execution prompt** (100+ lines):
- ```
- [GRUB:<id>:<iteration>]
- Autonomous grub goal: <goal>
- You are inside a managed grub harness.
- 1) Run .grub/<id>/init.sh and verify project boots.
- 2) Read feature-list.json. Pick EXACTLY one feature with passes:false.
- 3) Implement + verify that single feature end-to-end.
- 4) Flip ONLY "passes" to true and set "evidence".
- 5) Append to progress-log.md.
- 6) End with <loop-state>{"status":"continue|complete|blocked","summary":"...","nextStep":"..."}</loop-state>
- ```
-
- **Key differences**:
- - Codex's prompt emphasizes staying faithful to the objective and preventing premature completion
- - grub's prompt emphasizes doing exactly one feature per iteration and strictly honoring the feature-list contract
- - Codex wraps user input in an XML `<objective>` tag (a safety boundary)
- - grub uses the `<loop-state>` XML block as a structured agent→system communication protocol
-
- ---
-
- ## 5. Completion Check Comparison
-
- ### 5.1 Codex: LLM Self-Assessment + Prompt Constraints
-
- ```
- The LLM decides the task is complete
-
- It calls update_goal(status: "complete")
-
- The system accepts it (no extra validation)
-
- But if the continuation prompt's completion audit is followed strictly:
- - The LLM should verify every requirement
- - The LLM should check the evidence
- - The LLM should avoid premature completion
- ```
-
- **Problem**: the completion check relies entirely on the LLM's self-discipline. However strict the prompt, the LLM can still cut corners.
-
- ### 5.2 nanoPencil grub: Feature-List Gate
-
- ```
- The LLM decides the task is complete
-
- It emits <loop-state>{"status":"complete",...}
-
- extractGrubDecision() parses it
-
- validateCompletion() checks:
-
- Read feature-list.json
-
- allPassing(list)?
- ├─ YES → accept complete
- └─ NO → downgrade to continue and point at the next pending feature
- ```
-
- **Key difference**: grub has a **hard-coded completion gate**. If the LLM says "complete" while feature-list still contains items with `passes:false`, the system rejects the claim and forces the loop to continue. This is enforced in code, not by the prompt.
-
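The completion gate described above can be illustrated with a short sketch. It is an assumption about the general shape, not grub's actual `validateCompletion()`: the names `gateCompletionSketch`, `GateFeatureSketch`, and `GateDecisionSketch` are hypothetical, and the wording of the generated `nextStep` is invented for the example.

```typescript
// Hypothetical minimal shapes; the real feature entries carry more fields.
interface GateFeatureSketch {
  id: string;
  passes: boolean;
}

type GateDecisionSketch = {
  status: "continue" | "complete" | "blocked";
  summary: string;
  nextStep?: string;
};

// Accepts a "complete" claim only when no feature is still pending;
// otherwise downgrades it to "continue" and names the next pending feature.
function gateCompletionSketch(
  decision: GateDecisionSketch,
  features: GateFeatureSketch[],
): GateDecisionSketch {
  if (decision.status !== "complete") return decision;

  const pending = features.find((f) => !f.passes);
  if (pending === undefined) return decision; // every feature passes

  return {
    status: "continue",
    summary: decision.summary,
    nextStep: "Finish pending feature " + pending.id,
  };
}
```

The point of the design is that the model's claim is an input to the check rather than its result: the feature list alone decides whether `complete` survives.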
- ---
-
- ## 6. Error Recovery Comparison
-
- ### 6.1 Codex
-
- | Scenario | Handling |
- |------|------|
- | Turn fails (not a usage limit) | `on_turn_error` → stop goal for turn error (→ blocked) |
- | Usage limit exceeded | `on_turn_error` → stop goal for usage limit (→ usage_limited) |
- | Provider error | Retried inside the turn (handled by agent-core) |
- | Budget exhausted | Inject budget_limit_prompt; the LLM wraps up |
-
- ### 6.2 nanoPencil grub
-
- | Scenario | Handling |
- |------|------|
- | Turn returns without `<loop-state>` | `recordFailure()` → consecutiveFailures++ |
- | `<loop-state>` fails to parse | `recordFailure()` → consecutiveFailures++ |
- | feature-list modified illegally | `validateFeatureListAfterTurn()` → recordFailure() |
- | Consecutive failures >= maxConsecutiveFailures | stop("failed") |
- | Iterations >= maxIterations | stop("failed") |
- | Consecutive initializer-phase failures >= maxInitializerFailures | stop("failed") (a more lenient budget) |
-
- **Key differences**:
- - Codex's error recovery relies on provider-level retries and LLM self-correction
- - grub has an **explicit failure counter** and **structured validation** (feature-list diff checks)
-
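The failure-counter rows in the grub table can be read as a small state transition. The sketch below is an assumption about how such bookkeeping might look: `afterTurnSketch` and `BudgetStateSketch` are hypothetical names, and two behaviors are guesses that the document does not confirm, namely that a successful turn resets `consecutiveFailures` to zero and that a failed turn does not advance `currentIteration`.

```typescript
// Hypothetical subset of the task state, limited to the budget fields.
interface BudgetStateSketch {
  currentIteration: number;
  consecutiveFailures: number;
  maxIterations: number;
  maxConsecutiveFailures: number;
  status: "running" | "failed";
}

// Applies the outcome of one turn and returns the next state.
function afterTurnSketch(state: BudgetStateSketch, turnOk: boolean): BudgetStateSketch {
  const next: BudgetStateSketch = { ...state };
  if (turnOk) {
    next.currentIteration += 1;
    next.consecutiveFailures = 0; // assumption: success clears the streak
  } else {
    next.consecutiveFailures += 1; // assumption: the iteration counter is unchanged
  }

  // Both limits from the table lead to the same terminal status.
  const outOfFailures = next.consecutiveFailures >= next.maxConsecutiveFailures;
  const outOfIterations = next.currentIteration >= next.maxIterations;
  if (outOfFailures || outOfIterations) next.status = "failed";
  return next;
}
```

Keeping the transition pure (a new state in, a new state out) makes it straightforward to persist the result to `state.json` after each turn.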
- ---
-
- ## 7. Token Budget vs Iteration Budget
-
- ### 7.1 Codex: Token Budget
-
- ```typescript
- // Set at creation time
- create_goal({ objective: "...", token_budget: 50000 });
-
- // Charged each time a tool call finishes
- account_thread_goal_usage(threadId, timeDelta, tokenDelta, "ActiveOnly");
-
- // Budget exhausted → automatically marked budget_limited
- if (tokens_used >= token_budget) {
-   status = "budget_limited";
-   inject_budget_limit_prompt(); // tell the LLM to wrap up
- }
- ```
-
- **Pros**: fine-grained cost control; tokens are hard currency.
- **Cons**: token billing differs across providers, so users find it hard to estimate.
-
- ### 7.2 nanoPencil grub: Iteration Budget
-
- ```typescript
- // Set at launch (a slash command, not TypeScript):
- //   /grub <goal> --max-iter 25 --max-fail 3
-
- // Checked at the end of every iteration
- if (currentIteration >= maxIterations) stop("failed");
- if (consecutiveFailures >= maxConsecutiveFailures) stop("failed");
- ```
-
- **Pros**: intuitive for users ("run at most 25 iterations") and independent of token billing.
- **Cons**: token consumption can vary widely between iterations, so cost cannot be controlled precisely.
-
- ---
-
- ## 8. Harness File System (grub only)
-
- grub creates a complete harness directory:
-
- ```
- .grub/<task-id>/
- ├── feature-list.json # feature list (ground truth)
- ├── feature-checklist.md # human-readable Markdown version of the list
- ├── progress-log.md # per-iteration progress log
- ├── init.sh # project health-check script
- └── state.json # persisted task state
- ```
-
- ### 8.1 feature-list.json
-
- ```json
- {
-   "version": 1,
-   "goal": "Implement a user authentication system",
-   "features": [
-     {
-       "id": "auth-login-endpoint",
-       "category": "functional",
-       "description": "POST /auth/login accepts email+password and returns a JWT",
-       "steps": [
-         "Create the route and controller",
-         "Implement password hash verification",
-         "Generate the JWT token",
-         "Return the token and user object"
-       ],
-       "passes": false,
-       "evidence": null
-     }
-   ]
- }
- ```
-
- **Contract**:
- - Initializer phase: the agent generates 15-40 features, all with `passes: false`
- - Execution phase: in each iteration the agent may change only one feature's `passes` and `evidence` fields
- - All other fields (id, category, description, steps) are immutable
- - The system uses `validateFeatureListDiff()` to detect illegal modifications
-
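The contract above lends itself to a before/after comparison. The following is a minimal sketch under stated assumptions, not the actual `validateFeatureListDiff()`: the name `validateFeatureListDiffSketch`, the `FeatureEntrySketch` interface, the error strings, and the decision to compare features by position are all hypothetical. Only the field names and the "one feature per iteration, other fields immutable" rules come from the document.

```typescript
// Field names follow the feature-list.json example; the interface itself is assumed.
interface FeatureEntrySketch {
  id: string;
  category: string;
  description: string;
  steps: string[];
  passes: boolean;
  evidence: string | null;
}

// Compares the list before and after a turn and returns a list of
// contract violations (empty when the turn respected the contract).
function validateFeatureListDiffSketch(
  before: FeatureEntrySketch[],
  after: FeatureEntrySketch[],
): string[] {
  const errors: string[] = [];
  if (before.length !== after.length) {
    errors.push("features were added or removed");
  }

  let touched = 0;
  before.forEach((prev, index) => {
    const next = after[index];
    if (next === undefined) return; // already reported as a length mismatch

    const immutableSame =
      prev.id === next.id &&
      prev.category === next.category &&
      prev.description === next.description &&
      JSON.stringify(prev.steps) === JSON.stringify(next.steps);
    if (!immutableSame) errors.push("immutable field changed on " + prev.id);

    if (prev.passes !== next.passes || prev.evidence !== next.evidence) touched += 1;
  });

  if (touched > 1) errors.push("more than one feature was updated in a single turn");
  return errors;
}
```

A caller would typically keep the pre-turn list as a baseline (the document's `featureListBaseline` field suggests something similar) and treat a non-empty result as a recorded failure.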
- ### 8.2 init.sh
-
- ```bash
- #!/bin/bash
- pwd
- git log --oneline -n 20
- tail -5 .grub/*/progress-log.md
- grep -c '"passes": true' .grub/*/feature-list.json
- npm test # project-specific smoke test
- ```
-
- It runs before every iteration to make sure the project is healthy.
-
- ### 8.3 The Codex Counterpart
-
- Codex has no harness file system. Its "ground truth" is:
- - The LLM's own memory (the conversation history inside the context window)
- - The objective description in the continuation prompt
- - The verification rules in the completion audit prompt
-
- ---
-
- ## 9. Lifecycle Hook Comparison
-
- ### 9.1 Codex's Hook System
-
- ```typescript
- // 6 extension traits
- ThreadLifecycleContributor: on_thread_start, on_thread_resume, on_thread_idle, on_thread_stop
- ConfigContributor: on_config_changed
- TurnLifecycleContributor: on_turn_start, on_turn_stop, on_turn_abort, on_turn_error
- TokenUsageContributor: on_token_usage
- ToolLifecycleContributor: on_tool_finish
- ToolContributor: tools() // registers get_goal, create_goal, update_goal
- ```
-
- ### 9.2 nanoPencil grub's Hooks
-
- grub does not use lifecycle hooks. In its extension entry point (`index.ts`) it:
- - Registers the `/grub` command and its completions
- - Registers a `user_message` event interceptor (to detect responses to grub turns)
- - Registers a `session_start` event handler (to discover and restore persisted tasks)
- - Calls `injectGrubTurn()` manually to start each iteration
-
- **Key difference**: Codex's goal is deeply integrated into the agent lifecycle; grub is orchestrated from the outside through the extension API.
-
- ---
-
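- ## 10. Differences in Design Philosophy
-
- ### 10.1 Codex: Trust the LLM + Budget Constraints
-
- - **Trust**: the LLM may decide complete/blocked on its own
- - **Constraint**: a hard token-budget limit
- - **Recovery**: the continuation prompt's completion audit is advisory, not mandatory
- - **Philosophy**: "Give the LLM enough context and rules, and let it make the right call"
-
- ### 10.2 nanoPencil grub: Distrust the LLM + Structured Validation
-
- - **Distrust**: when the LLM says complete, the system verifies the feature-list
- - **Constraint**: iteration count + failure count
- - **Recovery**: feature-list diff checks and structured `<loop-state>` parsing
- - **Philosophy**: "The LLM is the executor; the system is the referee"
-
- ### 10.3 What This Reflects
-
- Codex is an OpenAI product and leans toward **making the model stronger and then trusting it**.
- grub is an engineering team's tool and leans toward **using structural constraints to compensate for the model's uncertainty**.
-
- Both philosophies have merit:
- - Codex's approach is more efficient when the model is strong enough (one fewer validation round means one fewer round of tokens)
- - grub's approach is more reliable when the model is not strong enough (no premature completion)
-
- ---
-
- ## 11. Replication Guide: How to Combine the Two in nanoPencil
-
- If you want to bring the strengths of Codex goal into grub, these are the points worth borrowing:
-
- ### 11.1 Worth Borrowing Directly
-
- | Codex feature | How to bring it into grub |
- |-----------|-----------------|
- | Token budget | Add `tokenBudget` and `tokensUsed` fields to `GrubTaskState` |
- | Automatic continuation | On `on_thread_idle`, check for a running task and inject the next iteration automatically |
- | Editing the objective | Add a `/grub edit <new-goal>` subcommand |
- | Pause/resume | Add `/grub pause` + a `paused` status |
- | Status-line indicator | Show the current grub task status in the TUI status bar |
- | Usage accounting | Accumulate token usage on `on_tool_finish` |
-
- ### 11.2 Not Recommended to Borrow
-
- | Codex feature | Reason |
- |-----------|------|
- | LLM self-assessed completion | grub's feature-list gate is more reliable |
- | SQLite storage | The file system fits grub's harness model more naturally (it can be tracked in git) |
- | 6 statuses | grub's 5 statuses + phase are already expressive enough |
-
- ### 11.3 grub-Only Strengths to Keep
-
- | Feature | Why it matters |
- |------|-----------|
- | feature-list.json | Ground truth for completion, independent of the LLM's memory |
- | init.sh | Per-iteration health check that guards against regressions |
- | progress-log.md | Human-readable progress record |
- | initializer/execution phase | Plan first, then execute; keeps the LLM from jumping straight to implementation |
- | `<loop-state>` protocol | Structured agent→system communication |
- | feature-list diff validation | Keeps the LLM from quietly altering the list |
-
- ---
-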
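- ## 12. Summary Matrix
-
- | Dimension | Codex `/goal` | nanoPencil `/grub` | Which is better |
- |------|--------------|-------------------|--------|
- | **Command richness** | 6 subcommands | 5 subcommands | Codex (has edit) |
- | **Status model** | 6 statuses | 5 statuses + 2 phases | grub (phases are clearer) |
- | **Continuation mechanism** | Automatic idle continuation | Driven by the controller loop | Codex (more seamless) |
- | **Completion check** | LLM self-assessment | feature-list gate | grub (more reliable) |
- | **Error recovery** | Provider retries + LLM self-correction | Structured failure counting | grub (more predictable) |
- | **Cost control** | Token budget | Iteration budget | Each has trade-offs |
- | **Persistence** | SQLite | File system | Each has trade-offs |
- | **Auditability** | Low (only DB rows) | High (feature-list + progress-log) | grub |
- | **Integration depth** | Deep (6 lifecycle traits) | Shallow (outer orchestration via the extension API) | Codex |
- | **Model dependence** | High (relies heavily on the model's self-discipline) | Low (structural constraints as a backstop) | grub |
-
- **Final conclusion**: Codex's goal "gives the model freedom"; grub "gives the model a cage". Neither is simply better or worse; they draw the trust boundary in different places. grub's feature-list gate is its greatest structural advantage and should not be replaced by Codex's "trust the LLM" philosophy.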
1
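+ # Codex `/goal` vs nanoPencil `/grub`: Long-Running Task Mechanisms with a Shared Origin and Divergent Designs
2
+
3
+ > Both systems solve the same problem: letting an AI agent iterate autonomously until a complex task is done.
4
+ > Their design philosophies, implementation paths, and constraint models, however, differ sharply.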
5
+
6
+ ---
7
+
8
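+ ## 1. One-Line Summary
9
+
10
+ | | Codex `/goal` | nanoPencil `/grub` |
11
+ |---|---|---|
12
+ | **Core idea** | "Set a goal and I continue automatically whenever I go idle" | "Set a goal and I advance exactly one feature per iteration" |
13
+ | **Control granularity** | Token budget + elapsed time | Iteration count + consecutive failure count |
14
+ | **Completion check** | The LLM decides (guided by a completion audit prompt) | Complete only when every item in feature-list.json has passes:true |
15
+ | **Persistence** | SQLite (in-process) | File system (.grub/ directory) |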
16
+
17
+ ---
18
+
19
+ ## 2. Command Comparison
20
+
21
+ ### 2.1 Command Syntax
22
+
23
+ | Action | Codex `/goal` | nanoPencil `/grub` |
24
+ |------|--------------|-------------------|
25
+ | Set a goal | `/goal <objective>` | `/grub <goal>` |
26
+ | Show status | `/goal` (shows a summary menu) | `/grub status` or `/grub status --json` |
27
+ | Pause | `/goal pause` | None (only stop) |
28
+ | Resume | `/goal resume` | `/grub resume` |
29
+ | Stop | `/goal clear` | `/grub stop` |
30
+ | Edit | `/goal edit` | None (stop, then start again) |
31
+ | Help | None (prints usage directly) | `/grub help` |
32
+ | Limit parameters | token_budget (set through an LLM tool) | `--max-iter N`, `--max-fail N` |
33
+
34
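+ ### 2.2 Command Parsing
35
+
36
+ **Codex**: parsed in the TUI layer and dispatched over the `AppEvent` event bus to `thread_goal_actions` in the `App` layer.
37
+
38
+ **nanoPencil**: parsed in the extension layer; `parseGrubCommand()` returns a typed command object: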
39
+
40
+ ```typescript
41
+ type ParsedGrubCommand =
42
+ | { type: "start"; goal: string; maxIterations?: number; maxConsecutiveFailures?: number }
43
+ | { type: "status"; json?: boolean }
44
+ | { type: "stop" }
45
+ | { type: "resume" }
46
+ | { type: "help"; reason?: string };
47
+ ```
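To make the typed-command idea concrete, here is a minimal sketch of what a parser with this return type could look like. It is not the package's actual `parseGrubCommand()`: the tokenization, the assumption that the input is the text after `/grub`, and the `reason` strings are all illustrative.

```typescript
// Hypothetical sketch; the real parseGrubCommand() may differ.
type ParsedGrubCommand =
  | { type: "start"; goal: string; maxIterations?: number; maxConsecutiveFailures?: number }
  | { type: "status"; json?: boolean }
  | { type: "stop" }
  | { type: "resume" }
  | { type: "help"; reason?: string };

// `input` is assumed to be everything the user typed after "/grub".
function parseGrubCommand(input: string): ParsedGrubCommand {
  const tokens = input.trim().split(/\s+/).filter((t) => t.length > 0);
  if (tokens.length === 0) return { type: "help", reason: "missing goal" };

  const head = tokens[0];
  if (head === "help") return { type: "help" };
  if (head === "stop") return { type: "stop" };
  if (head === "resume") return { type: "resume" };
  if (head === "status") return { type: "status", json: tokens.indexOf("--json") !== -1 };

  // Anything else is treated as a goal with optional numeric flags.
  const command: Extract<ParsedGrubCommand, { type: "start" }> = { type: "start", goal: "" };
  const goalWords: string[] = [];
  const queue = tokens.slice();
  while (queue.length > 0) {
    const token = queue.shift() as string;
    if (token === "--max-iter" || token === "--max-fail") {
      const value = Number(queue.shift()); // NaN when the value is missing
      if (!Number.isInteger(value) || value <= 0) {
        return { type: "help", reason: `${token} expects a positive integer` };
      }
      if (token === "--max-iter") command.maxIterations = value;
      else command.maxConsecutiveFailures = value;
    } else {
      goalWords.push(token);
    }
  }
  if (goalWords.length === 0) return { type: "help", reason: "missing goal" };
  command.goal = goalWords.join(" ");
  return command;
}
```

Returning a `help` command with a `reason` on bad input, rather than throwing, keeps every outcome inside the same discriminated union, so the caller can handle it with a single `switch` on `type`.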
48
+
49
+ ---
50
+
51
+ ## 3. Data Model Comparison
52
+
53
+ ### 3.1 Status Enums
54
+
55
+ | Codex `ThreadGoalStatus` | nanoPencil `GrubStatus` | Mapping |
56
+ |--------------------------|------------------------|----------|
57
+ | `active` | `running` | Equivalent |
58
+ | `paused` | None | grub has no pause concept |
59
+ | `blocked` | `blocked` | Equivalent (but triggered differently) |
60
+ | `usage_limited` | None | grub has no usage limit |
61
+ | `budget_limited` | None | grub has no token budget |
62
+ | `complete` | `complete` | Equivalent |
63
+ | None | `stopped` | grub supports a manual stop |
64
+ | None | `failed` | grub terminates on failure |
65
+
66
+ **Key difference**: Codex has 6 statuses and grub has 5. Codex's `paused`/`usage_limited`/`budget_limited` do not exist in grub; grub's `stopped`/`failed` do not exist in Codex.
67
+
68
+ ### 3.2 Entity Structure
69
+
70
+ **Codex `ThreadGoal`**:
71
+ ```typescript
72
+ interface ThreadGoal {
73
+   thread_id: string;
74
+   goal_id: string; // UUID; a new ID is generated on every replace
75
+   objective: string;
76
+   status: ThreadGoalStatus;
77
+   token_budget: number | null;
78
+   tokens_used: number;
79
+   time_used_seconds: number;
80
+   created_at: number; // epoch ms
81
+   updated_at: number; // epoch ms
82
+ }
83
+ ```
84
+
85
+ **nanoPencil `GrubTaskState`**:
86
+ ```typescript
87
+ interface GrubTaskState {
88
+   id: string; // 8 hex characters
89
+   goal: string;
90
+   locale: "en" | "zh";
91
+   status: GrubStatus;
92
+   phase: "initializer" | "execution"; // ⭐ grub only
93
+   startedAt: number;
94
+   updatedAt: number;
95
+   currentIteration: number; // ⭐ current iteration
96
+   awaitingTurn: boolean; // ⭐ whether a turn is still pending
97
+   consecutiveFailures: number; // ⭐ consecutive failure count
98
+   maxIterations: number; // default 25
99
+   maxConsecutiveFailures: number; // default 3
100
+   maxInitializerFailures?: number; // default 5 (the initializer phase is more lenient)
101
+   harnessDirectory: string; // ⭐ .grub/<id>/
102
+   featureChecklistPath: string;
103
+   featureListPath: string;
104
+   stateFilePath: string;
105
+   progressLogPath: string;
106
+   initScriptPath: string;
107
+   featureListBaseline?: FeatureList;
108
+   lastDecision?: GrubDecision;
109
+   lastError?: string;
110
+ }
111
+ ```
112
+
113
+ **Key differences**:
114
+ - Codex limits work with a token budget; grub uses iteration and failure counts
115
+ - grub has a `phase` (initializer/execution); Codex does not
116
+ - grub has a full harness file system (feature-list.json, progress-log.md, init.sh); Codex does not
117
+ - grub keeps a `consecutiveFailures` counter plus `lastDecision`/`lastError` as recovery context
118
+
119
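+ ### 3.3 Persistence
120
+
121
+ | | Codex | nanoPencil grub |
122
+ |---|---|---|
123
+ | **Storage** | SQLite `thread_goals` table | JSON file `state.json` |
124
+ | **Granularity** | One row per thread | One directory per task |
125
+ | **Transactions** | SQL transactions guarantee atomicity | Plain file writes (best-effort) |
126
+ | **Concurrency** | SQLite transaction locking + optimistic check (`expected_goal_id`) | In-memory lock (`GrubController` singleton) |
127
+ | **Across sessions** | Supported natively (SQLite is persistent) | Supported (persistent files + the resume command) |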
128
+
129
+ ---
130
+
131
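+ ## 4. Continuation Mechanism Comparison
132
+
133
+ This is the most fundamental difference between the two systems.
134
+
135
+ ### 4.1 Codex: Idle Continuation
136
+
137
+ ```
138
+ Agent turn ends → idle
139
+
140
+ on_thread_idle() fires
141
+
142
+ Check whether the goal is active
143
+
144
+ Inject the continuation prompt
145
+
146
+ Start a new turn (automatic, no user action needed)
147
+ ```
148
+
149
+ **Characteristics**:
150
+ - Fully automatic: continuation starts as soon as the agent goes idle
151
+ - The continuation prompt carries the objective, budget information, and the completion audit rules
152
+ - Token usage is accounted in real time each time a tool call finishes
153
+ - When the budget runs out, a `budget_limit_prompt` is injected so the LLM can wrap up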
154
+
155
+ ### 4.2 nanoPencil grub: Controller Loop
156
+
157
+ ```
158
+ /grub <goal> starts the task
159
+
160
+ GrubController.start() → creates the harness directory + state.json
161
+
162
+ injectGrubTurn() → injects the initializer prompt → starts a turn
163
+
164
+ Turn ends → extractGrubDecision() parses the <loop-state> block
165
+
166
+ ┌─ status === "continue"?
167
+ │  ├─ YES → validateFeatureListAfterTurn()
168
+ │  │        → finishTurn(decision) → currentIteration++
169
+ │  │        → injectGrubTurn() → starts the next turn
170
+ │  └─ NO → status === "complete"?
171
+ │       ├─ YES → validateCompletion() → checks that every feature-list item has passes:true
172
+ │       │    ├─ all pass → stop("complete")
173
+ │       │    └─ some pending → downgraded to continue, with nextStep set
174
+ │       └─ status === "blocked" → stop("blocked")
175
+
176
+ On failure → recordFailure() → consecutiveFailures++
177
+
178
+ consecutiveFailures >= maxConsecutiveFailures → stop("failed")
179
+ currentIteration >= maxIterations → stop("failed")
180
+ ```
181
+
182
+ **Characteristics**:
183
+ - Has an explicit initializer phase and execution phase
184
+ - Every iteration must emit a `<loop-state>` JSON block
185
+ - feature-list.json is the ground truth for completion, not the LLM's say-so
186
+ - init.sh verifies project health on every iteration
187
+ - progress-log.md records the progress of every iteration
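The parsing step at the center of this loop can be sketched as follows. This is an illustration under assumptions, not the package's `extractGrubDecision()`: the `GrubDecision` shape is inferred from the `<loop-state>` example in the execution prompt, and taking the last block in the output is a choice made here.

```typescript
// Hypothetical sketch; field names follow the <loop-state> example in the prompt.
type GrubDecision = {
  status: "continue" | "complete" | "blocked";
  summary: string;
  nextStep?: string;
};

function isStatus(value: unknown): value is GrubDecision["status"] {
  return value === "continue" || value === "complete" || value === "blocked";
}

// Returns null when no block is present or the payload is malformed,
// which the controller would count as a failed iteration.
function extractGrubDecision(output: string): GrubDecision | null {
  const pattern = /<loop-state>([\s\S]*?)<\/loop-state>/g;
  let payload: string | null = null;
  let match: RegExpExecArray | null;
  // Keep the last block, in case the model quotes the protocol earlier in its reply.
  while ((match = pattern.exec(output)) !== null) {
    payload = match[1] ?? null;
  }
  if (payload === null) return null;

  let parsed: unknown;
  try {
    parsed = JSON.parse(payload);
  } catch {
    return null;
  }
  if (typeof parsed !== "object" || parsed === null) return null;

  const record = parsed as Record<string, unknown>;
  const status = record["status"];
  const summary = record["summary"];
  const nextStep = record["nextStep"];
  if (!isStatus(status) || typeof summary !== "string") return null;

  const decision: GrubDecision = { status, summary };
  if (typeof nextStep === "string") decision.nextStep = nextStep;
  return decision;
}
```

Treating a missing block and an unparsable block the same way (both return `null`) matches the error table in section 6.2, where both cases end in `recordFailure()`.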
188
+
189
+ ### 4.3 Continuation Prompt Comparison
190
+
191
+ **Codex continuation prompt** (52 lines):
192
+ ```
193
+ Continue working toward the active thread goal.
194
+ <objective>{{ objective }}</objective>
195
+ - This goal persists across turns.
196
+ - Keep the full objective intact.
197
+ - Temporary rough edges are acceptable.
198
+ Budget: Tokens used: X / Token budget: Y / Remaining: Z
199
+ Work from evidence: Use current worktree as authoritative.
200
+ Completion audit: Derive requirements, verify against actual state.
201
+ Blocked audit: 3+ consecutive turns of same blocker before marking blocked.
202
+ ```
203
+
204
+ **grub execution prompt** (100+ lines):
205
+ ```
206
+ [GRUB:<id>:<iteration>]
207
+ Autonomous grub goal: <goal>
208
+ You are inside a managed grub harness.
209
+ 1) Run .grub/<id>/init.sh and verify project boots.
210
+ 2) Read feature-list.json. Pick EXACTLY one feature with passes:false.
211
+ 3) Implement + verify that single feature end-to-end.
212
+ 4) Flip ONLY "passes" to true and set "evidence".
213
+ 5) Append to progress-log.md.
214
+ 6) End with <loop-state>{"status":"continue|complete|blocked","summary":"...","nextStep":"..."}</loop-state>
215
+ ```
216
+
217
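+ **Key differences**:
218
+ - Codex's prompt emphasizes staying faithful to the objective and preventing premature completion
219
+ - grub's prompt emphasizes doing exactly one feature per iteration and strictly honoring the feature-list contract
220
+ - Codex wraps user input in an XML `<objective>` element (a safety boundary)
221
+ - grub uses the `<loop-state>` XML block as a structured agent→system communication protocol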
222
+
223
+ ---
224
+
225
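+ ## 5. Completion Check Comparison
226
+
227
+ ### 5.1 Codex: LLM Self-Judgment + Prompt Constraints
228
+
229
+ ```
230
+ LLM decides the task is done
231
+
232
+ Calls update_goal(status: "complete")
233
+
234
+ System accepts it (no extra verification)
235
+
236
+ If the completion audit in the continuation prompt is followed strictly, though:
237
+ - The LLM should verify every requirement
238
+ - The LLM should check the evidence
239
+ - The LLM should avoid premature completion
240
+ ```
241
+
242
+ **Problem**: the completion check depends entirely on the LLM's self-discipline. However strict the prompt is, the LLM can still cut corners.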
243
+
244
+ ### 5.2 nanoPencil grub: Feature-List Gate
245
+
246
+ ```
247
+ LLM decides the task is done
248
+
249
+ Emits <loop-state>{"status":"complete",...}
250
+
251
+ extractGrubDecision() parses it
252
+
253
+ validateCompletion() checks:
254
+
255
+ Read feature-list.json
256
+
257
+ allPassing(list)?
258
+ ├─ YES → accept complete
259
+ └─ NO → downgrade to continue and name the next pending feature
260
+ ```
261
+
262
+ **Key difference**: grub has a **hard-coded completion gate**. If the LLM says "complete" while feature-list still contains items with `passes:false`, the system rejects the claim and forces the loop to continue. This is a constraint enforced in code, not in the prompt.
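The gate described above can be sketched in a few lines. The types are trimmed to the fields the gate needs, the wording of `nextStep` is invented for illustration, and treating an empty feature list as "not complete" is an assumption made here rather than something the source states.

```typescript
// Hypothetical sketch of the completion gate; not the package's actual code.
type Feature = { id: string; passes: boolean };
type FeatureList = { features: Feature[] };
type Decision = {
  status: "continue" | "complete" | "blocked";
  summary: string;
  nextStep?: string;
};

// Assumption: an empty list never counts as complete.
function allPassing(list: FeatureList): boolean {
  return list.features.length > 0 && list.features.every((f) => f.passes);
}

// Accepts the decision unchanged unless it claims "complete" while
// features are still pending; in that case it is downgraded to "continue".
function validateCompletion(decision: Decision, list: FeatureList): Decision {
  if (decision.status !== "complete" || allPassing(list)) return decision;

  const pending = list.features.filter((f) => !f.passes);
  const next = pending[0];
  return {
    status: "continue",
    summary: `${decision.summary} (completion rejected: ${pending.length} feature(s) pending)`,
    nextStep: next ? `Implement and verify feature "${next.id}"` : "Populate feature-list.json",
  };
}
```

Because the check reads the file-backed list rather than anything the model says, the model cannot argue its way past it; it can only change the outcome by actually flipping `passes` on the remaining features.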
263
+
264
+ ---
265
+
266
+ ## 6. Error Recovery Comparison
267
+
268
+ ### 6.1 Codex
269
+
270
+ | Scenario | Handling |
271
+ |------|------|
272
+ | Turn error (not a usage limit) | `on_turn_error` → stop goal for turn error (→ blocked) |
273
+ | Usage limit exceeded | `on_turn_error` → stop goal for usage limit (→ usage_limited) |
274
+ | Provider error | Retried inside the turn (handled by agent-core) |
275
+ | Budget exhausted | budget_limit_prompt is injected and the LLM wraps up |
276
+
277
+ ### 6.2 nanoPencil grub
278
+
279
+ | Scenario | Handling |
280
+ |------|------|
281
+ | Turn returns without a `<loop-state>` block | `recordFailure()` → consecutiveFailures++ |
282
+ | `<loop-state>` fails to parse | `recordFailure()` → consecutiveFailures++ |
283
+ | feature-list modified illegally | `validateFeatureListAfterTurn()` → recordFailure() |
284
+ | Consecutive failures >= maxConsecutiveFailures | stop("failed") |
285
+ | Iterations >= maxIterations | stop("failed") |
286
+ | Consecutive initializer-phase failures >= maxInitializerFailures | stop("failed") (a more lenient budget) |
287
+
288
+ **Key differences**:
289
+ - Codex relies on provider-level retries and LLM self-correction for error recovery
290
+ - grub has an **explicit failure counter** and **structured validation** (the feature-list diff check)
291
+
292
+ ---
293
+
294
+ ## 7. Token Budget vs Iteration Budget
295
+
296
+ ### 7.1 Codex: Token Budget
297
+
298
+ ```typescript
299
+ // Set at creation time
300
+ create_goal({ objective: "...", token_budget: 50000 });
301
+
302
+ // Accounted each time a tool call finishes
303
+ account_thread_goal_usage(threadId, timeDelta, tokenDelta, "ActiveOnly");
304
+
305
+ // Budget exhausted → automatically marked budget_limited
306
+ if (tokens_used >= token_budget) {
307
+   status = "budget_limited";
308
+   inject_budget_limit_prompt(); // tells the LLM to wrap up
309
+ }
310
+ ```
311
+
312
+ **Pros**: fine-grained cost control; tokens are the hard currency.
313
+ **Cons**: token pricing differs across providers, so users find it hard to estimate a budget.
314
+
315
+ ### 7.2 nanoPencil grub: Iteration Budget
316
+
317
+ ```typescript
318
+ // Set at launch, via the slash command (not TypeScript):
319
+ //   /grub <goal> --max-iter 25 --max-fail 3
320
+
321
+ // Checked at the end of every iteration
322
+ if (currentIteration >= maxIterations) stop("failed");
323
+ if (consecutiveFailures >= maxConsecutiveFailures) stop("failed");
324
+ ```
325
+
326
+ **Pros**: intuitive for users ("run at most 25 iterations") and independent of token pricing.
327
+ **Cons**: token consumption can vary widely between iterations, so cost cannot be controlled precisely.
328
+
329
+ ---
330
+
331
+ ## 8. Harness File System (grub Only)
332
+
333
+ grub creates a complete harness directory:
334
+
335
+ ```
336
+ .grub/<task-id>/
337
+ ├── feature-list.json # feature list (ground truth)
338
+ ├── feature-checklist.md # human-readable markdown version of the list
339
+ ├── progress-log.md # per-iteration progress log
340
+ ├── init.sh # project health-check script
341
+ └── state.json # persisted task state
342
+ ```
343
+
344
+ ### 8.1 feature-list.json
345
+
346
+ ```json
347
+ {
348
+   "version": 1,
349
+   "goal": "Implement a user authentication system",
350
+   "features": [
351
+     {
352
+       "id": "auth-login-endpoint",
353
+       "category": "functional",
354
+       "description": "POST /auth/login accepts email+password and returns a JWT",
355
+       "steps": [
356
+         "Create the route and controller",
357
+         "Implement password hash verification",
358
+         "Generate the JWT token",
359
+         "Return the token and the user object"
360
+       ],
361
+       "passes": false,
362
+       "evidence": null
363
+     }
364
+   ]
365
+ }
366
+ ```
367
+
368
+ **Contract**:
369
+ - Initializer phase: the agent generates 15-40 features, all with `passes: false`
370
+ - Execution phase: in each iteration the agent may change only the `passes` and `evidence` fields of a single feature
371
+ - All other fields (id, category, description, steps) are immutable
372
+ - The system uses `validateFeatureListDiff()` to detect illegal modifications
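The contract above translates fairly directly into a diff check. The sketch below is one possible reading, not the package's `validateFeatureListDiff()`: the signature, the error strings, and the rule that features may not be added, removed, or reordered are assumptions made here.

```typescript
// Hypothetical sketch of a feature-list diff check under the stated contract.
type Feature = {
  id: string;
  category: string;
  description: string;
  steps: string[];
  passes: boolean;
  evidence: string | null;
};
type FeatureList = { version: number; goal: string; features: Feature[] };

// Returns a list of violations; an empty array means the turn was legal.
function validateFeatureListDiff(baseline: FeatureList, current: FeatureList): string[] {
  const errors: string[] = [];
  // Assumption: features may not be added, removed, or reordered during execution.
  if (baseline.features.length !== current.features.length) {
    errors.push("features were added or removed");
    return errors;
  }

  let changed = 0;
  baseline.features.forEach((before, index) => {
    const after = current.features[index];
    if (after === undefined) return;
    const immutableSame =
      before.id === after.id &&
      before.category === after.category &&
      before.description === after.description &&
      JSON.stringify(before.steps) === JSON.stringify(after.steps);
    if (!immutableSame) errors.push(`feature "${before.id}": immutable field changed`);
    if (before.passes !== after.passes || before.evidence !== after.evidence) changed++;
  });

  if (changed > 1) errors.push(`${changed} features changed in one turn; at most 1 is allowed`);
  return errors;
}
```

Comparing against a stored baseline (the `featureListBaseline` field in `GrubTaskState` suggests something similar) is what lets the system notice an edited description or a batch of flipped `passes` flags after the fact.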
373
+
374
+ ### 8.2 init.sh
375
+
376
+ ```bash
377
+ #!/bin/bash
378
+ pwd
379
+ git log --oneline -n 20
380
+ tail -5 .grub/*/progress-log.md
381
+ grep -c '"passes": true' .grub/*/feature-list.json
382
+ npm test # project-specific smoke test
383
+ ```
384
+
385
+ Runs before each iteration to confirm the project is healthy.
386
+
387
+ ### 8.3 The Codex Counterpart
388
+
389
+ Codex has no harness file system. Its "ground truth" is:
390
+ - The LLM's own memory (the conversation history inside the context window)
391
+ - The objective description in the continuation prompt
392
+ - The verification rules in the completion audit prompt
394
+ ---
395
+
396
+ ## 9. Lifecycle Hook Comparison
397
+
398
+ ### 9.1 Codex's Hook System
399
+
400
+ ```typescript
401
+ // 6 extension traits
402
+ ThreadLifecycleContributor: on_thread_start, on_thread_resume, on_thread_idle, on_thread_stop
403
+ ConfigContributor: on_config_changed
404
+ TurnLifecycleContributor: on_turn_start, on_turn_stop, on_turn_abort, on_turn_error
405
+ TokenUsageContributor: on_token_usage
406
+ ToolLifecycleContributor: on_tool_finish
407
+ ToolContributor: tools() // registers get_goal, create_goal, update_goal
408
+ ```
409
+
410
+ ### 9.2 nanoPencil grub's Hooks
411
+
412
+ grub does not use Codex-style lifecycle hooks. In its extension entry point (`index.ts`) it:
413
+ - Registers the `/grub` command and its completions
414
+ - Registers a `user_message` event interceptor (to detect responses to grub turns)
415
+ - Registers a `session_start` event handler (to discover and restore persisted tasks)
416
+ - Calls `injectGrubTurn()` manually to trigger each iteration
417
+
418
+ **Key difference**: Codex's goal is deeply integrated into the agent lifecycle; grub is orchestrated from the outside through the extension API.
419
+
420
+ ---
421
+
422
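+ ## 10. Differences in Design Philosophy
423
+
424
+ ### 10.1 Codex: Trust the LLM + Budget Constraints
425
+
426
+ - **Trust**: the LLM decides complete/blocked on its own
427
+ - **Constraint**: a hard token budget limit
428
+ - **Recovery**: the completion audit in the continuation prompt is advisory, not enforced
429
+ - **Philosophy**: "Give the LLM enough context and rules, and let it make the right call"
430
+
431
+ ### 10.2 nanoPencil grub: Distrust the LLM + Structured Validation
432
+
433
+ - **Distrust**: when the LLM says complete, the system verifies the feature-list
434
+ - **Constraint**: iteration count + failure count
435
+ - **Recovery**: feature-list diff check and structured `<loop-state>` parsing
436
+ - **Philosophy**: "The LLM is the executor; the system is the referee"
437
+
438
+ ### 10.3 What This Reflects
439
+
440
+ Codex is an OpenAI product and leans toward **making the model stronger and then trusting it**.
441
+ grub is an engineering team's tool and leans toward **using structural constraints to offset model uncertainty**.
442
+
443
+ Both philosophies are defensible:
444
+ - Codex's approach is more efficient when the model is strong enough (one fewer verification round means one fewer round of tokens)
445
+ - grub's approach is more reliable when the model is not strong enough (the feature-list gate blocks premature completion)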
446
+
447
+ ---
448
+
449
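+ ## 11. Porting Guide: Combining the Two in nanoPencil
450
+
451
+ If you want to bring the strengths of Codex goal into grub, these are the points worth borrowing:
452
+
453
+ ### 11.1 Worth Borrowing Directly
454
+
455
+ | Codex feature | How to bring it into grub |
456
+ |-----------|-----------------|
457
+ | Token budget | Add `tokenBudget` and `tokensUsed` fields to `GrubTaskState` |
458
+ | Automatic continuation | On an idle event (the counterpart of Codex's `on_thread_idle`), check for a running task and inject the next iteration automatically |
459
+ | Editing the objective | Add a `/grub edit <new-goal>` subcommand |
460
+ | Pause/resume | Add `/grub pause` + a `paused` status |
461
+ | Status-line indicator | Show the current grub task status in the TUI status bar |
462
+ | Usage accounting | Accumulate token usage when a tool call finishes (Codex does this in `on_tool_finish`) |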
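As an illustration of the first and last rows, the sketch below adds the two proposed fields to a trimmed-down task state and accounts usage against them. Everything here is hypothetical: `accountTokens` is an invented helper name, and mapping an exhausted token budget to `failed` simply mirrors how the source treats an exhausted iteration budget.

```typescript
// Hypothetical sketch; only the fields needed for budget accounting are shown.
type BudgetedGrubState = {
  status: "running" | "blocked" | "complete" | "stopped" | "failed";
  tokenBudget: number | null; // proposed field; null means no token limit
  tokensUsed: number; // proposed field
};

// Returns a new state with usage added; stops the task once the budget is spent.
function accountTokens(state: BudgetedGrubState, tokenDelta: number): BudgetedGrubState {
  if (state.status !== "running") return state; // only running tasks accrue usage
  const tokensUsed = state.tokensUsed + Math.max(0, tokenDelta);
  const overBudget = state.tokenBudget !== null && tokensUsed >= state.tokenBudget;
  // Assumption: an exhausted token budget ends the task the same way
  // an exhausted iteration budget does, with status "failed".
  return { ...state, tokensUsed, status: overBudget ? "failed" : state.status };
}
```

A dedicated status such as Codex's `budget_limited` would be more informative than reusing `failed`, but section 11.2 argues against growing the status set, so this sketch stays within the existing five.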
463
+
464
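+ ### 11.2 Not Recommended to Borrow
465
+
466
+ | Codex feature | Reason |
467
+ |-----------|------|
468
+ | LLM self-judged completion | grub's feature-list gate is more reliable |
469
+ | SQLite storage | The file system fits grub's harness model more naturally (trackable in git) |
470
+ | 6 statuses | grub's 5 statuses + phase are already expressive enough |
471
+
472
+ ### 11.3 grub-Only Strengths Worth Keeping
473
+
474
+ | Feature | Why it matters |
475
+ |------|-----------|
476
+ | feature-list.json | Ground truth for completion, independent of the LLM's memory |
477
+ | init.sh | Health check on every iteration, guards against regressions |
478
+ | progress-log.md | Human-readable progress record |
479
+ | initializer/execution phase | Plan first, then execute; stops the LLM from jumping straight to implementation |
480
+ | `<loop-state>` protocol | Structured agent→system communication |
481
+ | feature-list diff validation | Stops the LLM from quietly rewriting the list |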
482
+
483
+ ---
484
+
485
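+ ## 12. Summary Matrix
486
+
487
+ | Dimension | Codex `/goal` | nanoPencil `/grub` | Which is better |
488
+ |------|--------------|-------------------|--------|
489
+ | **Command coverage** | 6 subcommands | 5 subcommands | Codex (has edit) |
490
+ | **State model** | 6 statuses | 5 statuses + 2 phases | grub (phases are clearer) |
491
+ | **Continuation** | Automatic on idle | Driven by the controller loop | Codex (more seamless) |
492
+ | **Completion check** | LLM self-judged | feature-list gate | grub (more reliable) |
493
+ | **Error recovery** | Provider retries + LLM self-correction | Structured failure counting | grub (more predictable) |
494
+ | **Cost control** | Token budget | Iteration budget | Trade-offs on both sides |
495
+ | **Persistence** | SQLite | File system | Trade-offs on both sides |
496
+ | **Auditability** | Low (only DB rows) | High (feature-list + progress-log) | grub |
497
+ | **Integration depth** | Deep (6 lifecycle traits) | Shallow (orchestrated from outside via the extension API) | Codex |
498
+ | **Model dependence** | High (relies heavily on model self-discipline) | Low (structural constraints as a backstop) | grub |
499
+
500
+ **Final conclusion**: Codex's goal "gives the model freedom"; grub "gives the model a cage". Neither is simply better; they draw the trust boundary in different places. grub's feature-list gate is its biggest structural advantage and should not be replaced by Codex's "trust the LLM" philosophy.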