@web-auto/webauto 0.1.1 → 0.1.3

Files changed (354)
  1. package/apps/desktop-console/default-settings.json +1 -0
  2. package/apps/desktop-console/dist/main/index.mjs +1618 -0
  3. package/apps/desktop-console/{src → dist}/main/preload.mjs +10 -0
  4. package/apps/desktop-console/dist/renderer/index.js +3063 -0
  5. package/apps/desktop-console/entry/ui-console.mjs +299 -0
  6. package/apps/webauto/entry/account.mjs +356 -0
  7. package/apps/webauto/entry/lib/account-detect.mjs +160 -0
  8. package/apps/webauto/entry/lib/account-store.mjs +587 -0
  9. package/apps/webauto/entry/lib/profilepool.mjs +1 -1
  10. package/apps/webauto/entry/xhs-install.mjs +27 -3
  11. package/apps/webauto/entry/xhs-status.mjs +152 -0
  12. package/apps/webauto/entry/xhs-unified.mjs +595 -17
  13. package/bin/webauto.mjs +263 -15
  14. package/dist/apps/webauto/server.js +66 -0
  15. package/dist/modules/camo-backend/src/index.js +575 -0
  16. package/dist/modules/camo-backend/src/internal/BrowserSession.js +817 -0
  17. package/dist/modules/camo-backend/src/internal/ElementRegistry.js +61 -0
  18. package/dist/modules/camo-backend/src/internal/ProfileLock.js +85 -0
  19. package/dist/modules/camo-backend/src/internal/SessionManager.js +172 -0
  20. package/dist/modules/camo-backend/src/internal/container-matcher.js +852 -0
  21. package/dist/modules/camo-backend/src/internal/engine-manager.js +258 -0
  22. package/dist/modules/camo-backend/src/internal/fingerprint.js +203 -0
  23. package/dist/modules/camo-backend/src/internal/pageRuntime.js +29 -0
  24. package/dist/modules/camo-backend/src/internal/runtimeInjector.js +30 -0
  25. package/dist/modules/camo-backend/src/internal/state-bus.js +46 -0
  26. package/dist/modules/camo-backend/src/internal/storage-paths.js +36 -0
  27. package/dist/modules/camo-backend/src/internal/ws-server.js +1202 -0
  28. package/dist/modules/camo-runtime/src/utils/browser-service.mjs +423 -0
  29. package/dist/modules/camo-runtime/src/utils/config.mjs +77 -0
  30. package/dist/modules/container-registry/src/index.js +184 -0
  31. package/dist/modules/logging/src/index.js +92 -0
  32. package/dist/modules/operations/src/builtin.js +27 -0
  33. package/dist/modules/operations/src/container-binding.js +75 -0
  34. package/dist/modules/operations/src/executor.js +146 -0
  35. package/dist/modules/operations/src/operations/click.js +167 -0
  36. package/dist/modules/operations/src/operations/extract.js +204 -0
  37. package/dist/modules/operations/src/operations/find-child.js +17 -0
  38. package/dist/modules/operations/src/operations/highlight.js +138 -0
  39. package/dist/modules/operations/src/operations/key.js +61 -0
  40. package/dist/modules/operations/src/operations/navigate.js +148 -0
  41. package/dist/modules/operations/src/operations/scroll.js +126 -0
  42. package/dist/modules/operations/src/operations/type.js +190 -0
  43. package/dist/modules/operations/src/queue.js +100 -0
  44. package/dist/modules/operations/src/registry.js +11 -0
  45. package/dist/modules/operations/src/system/mouse.js +33 -0
  46. package/dist/modules/state/src/atomic-json.js +33 -0
  47. package/dist/modules/workflow/blocks/AnchorVerificationBlock.js +71 -0
  48. package/dist/modules/workflow/blocks/BehaviorRandomizer.js +26 -0
  49. package/dist/modules/workflow/blocks/CallWorkflowBlock.js +38 -0
  50. package/dist/modules/workflow/blocks/CloseDetailBlock.js +209 -0
  51. package/dist/modules/workflow/blocks/CollectBatch.js +137 -0
  52. package/dist/modules/workflow/blocks/CollectCommentsBlock.js +415 -0
  53. package/dist/modules/workflow/blocks/CollectSearchListBlock.js +599 -0
  54. package/dist/modules/workflow/blocks/CollectWeiboPosts.js +229 -0
  55. package/dist/modules/workflow/blocks/DetectPageStateBlock.js +259 -0
  56. package/dist/modules/workflow/blocks/EnsureLoginBlock.js +162 -0
  57. package/dist/modules/workflow/blocks/EnsureSession.js +426 -0
  58. package/dist/modules/workflow/blocks/ErrorClassifier.js +164 -0
  59. package/dist/modules/workflow/blocks/ErrorRecoveryBlock.js +319 -0
  60. package/dist/modules/workflow/blocks/ExpandCommentsBlock.js +1032 -0
  61. package/dist/modules/workflow/blocks/ExtractDetailBlock.js +310 -0
  62. package/dist/modules/workflow/blocks/ExtractPostFields.js +88 -0
  63. package/dist/modules/workflow/blocks/GenerateSmartReplyBlock.js +68 -0
  64. package/dist/modules/workflow/blocks/GoToSearchBlock.js +497 -0
  65. package/dist/modules/workflow/blocks/GracefulFallbackBlock.js +104 -0
  66. package/dist/modules/workflow/blocks/HighlightBlock.js +66 -0
  67. package/dist/modules/workflow/blocks/InitAutoScroll.js +65 -0
  68. package/dist/modules/workflow/blocks/LoadContainerDefinition.js +50 -0
  69. package/dist/modules/workflow/blocks/LoadContainerIndex.js +43 -0
  70. package/dist/modules/workflow/blocks/LocateAndGuardBlock.js +176 -0
  71. package/dist/modules/workflow/blocks/LoginRecoveryBlock.js +242 -0
  72. package/dist/modules/workflow/blocks/MatchContainers.js +64 -0
  73. package/dist/modules/workflow/blocks/MonitoringBlock.js +190 -0
  74. package/dist/modules/workflow/blocks/OpenDetailBlock.js +1240 -0
  75. package/dist/modules/workflow/blocks/OrganizeXhsNotesBlock.js +117 -0
  76. package/dist/modules/workflow/blocks/PersistXhsNoteBlock.js +270 -0
  77. package/dist/modules/workflow/blocks/PickSinglePost.js +69 -0
  78. package/dist/modules/workflow/blocks/ProgressTracker.js +125 -0
  79. package/dist/modules/workflow/blocks/RecordFixtureBlock.js +44 -0
  80. package/dist/modules/workflow/blocks/RenderMarkdown.js +48 -0
  81. package/dist/modules/workflow/blocks/SaveFile.js +54 -0
  82. package/dist/modules/workflow/blocks/ScrollNextBatch.js +72 -0
  83. package/dist/modules/workflow/blocks/SessionHealthBlock.js +73 -0
  84. package/dist/modules/workflow/blocks/StartBrowserService.js +45 -0
  85. package/dist/modules/workflow/blocks/ValidateContainerDefinition.js +67 -0
  86. package/dist/modules/workflow/blocks/ValidateExtract.js +35 -0
  87. package/dist/modules/workflow/blocks/WaitSearchPermitBlock.js +162 -0
  88. package/dist/modules/workflow/blocks/WaitStable.js +74 -0
  89. package/dist/modules/workflow/blocks/WarmupCommentsBlock.js +120 -0
  90. package/dist/modules/workflow/blocks/WorkflowExecutor.js +156 -0
  91. package/dist/modules/workflow/blocks/XiaohongshuCollectFromLinksBlock.js +1004 -0
  92. package/dist/modules/workflow/blocks/XiaohongshuCollectLinksBlock.js +1049 -0
  93. package/dist/modules/workflow/blocks/XiaohongshuFullCollectBlock.js +782 -0
  94. package/dist/modules/workflow/blocks/helpers/anchorVerify.js +198 -0
  95. package/dist/modules/workflow/blocks/helpers/asyncWorkQueue.js +53 -0
  96. package/dist/modules/workflow/blocks/helpers/commentScroller.js +334 -0
  97. package/dist/modules/workflow/blocks/helpers/commentSectionLocator.js +126 -0
  98. package/dist/modules/workflow/blocks/helpers/containerAnchors.js +301 -0
  99. package/dist/modules/workflow/blocks/helpers/debugArtifacts.js +6 -0
  100. package/dist/modules/workflow/blocks/helpers/downloadPaths.js +29 -0
  101. package/dist/modules/workflow/blocks/helpers/expandCommentsController.js +53 -0
  102. package/dist/modules/workflow/blocks/helpers/expandCommentsExtractor.js +129 -0
  103. package/dist/modules/workflow/blocks/helpers/macosVisionOcrPlugin.js +116 -0
  104. package/dist/modules/workflow/blocks/helpers/mergeXhsMarkdown.js +109 -0
  105. package/dist/modules/workflow/blocks/helpers/openDetailController.js +56 -0
  106. package/dist/modules/workflow/blocks/helpers/openDetailTypes.js +7 -0
  107. package/dist/modules/workflow/blocks/helpers/openDetailViewport.js +474 -0
  108. package/dist/modules/workflow/blocks/helpers/openDetailWaiter.js +104 -0
  109. package/dist/modules/workflow/blocks/helpers/operationLogger.js +195 -0
  110. package/dist/modules/workflow/blocks/helpers/persistedNotes.js +107 -0
  111. package/dist/modules/workflow/blocks/helpers/replyExpander.js +260 -0
  112. package/dist/modules/workflow/blocks/helpers/scrollIntoView.js +138 -0
  113. package/dist/modules/workflow/blocks/helpers/searchExecutor.js +328 -0
  114. package/dist/modules/workflow/blocks/helpers/searchGate.js +46 -0
  115. package/dist/modules/workflow/blocks/helpers/searchPageState.js +164 -0
  116. package/dist/modules/workflow/blocks/helpers/searchResultWaiter.js +64 -0
  117. package/dist/modules/workflow/blocks/helpers/simpleAnchor.js +134 -0
  118. package/dist/modules/workflow/blocks/helpers/smartReply.js +40 -0
  119. package/dist/modules/workflow/blocks/helpers/systemInput.js +635 -0
  120. package/dist/modules/workflow/blocks/helpers/targetCountMode.js +9 -0
  121. package/dist/modules/workflow/blocks/helpers/xhsCliArgs.js +80 -0
  122. package/dist/modules/workflow/blocks/helpers/xhsCommentDom.js +805 -0
  123. package/dist/modules/workflow/blocks/helpers/xhsNoteOrganizer.js +140 -0
  124. package/dist/modules/workflow/blocks/restore/RestorePhaseBlock.js +204 -0
  125. package/dist/modules/workflow/config/workflowRegistry.js +32 -0
  126. package/dist/modules/workflow/definitions/batch-collect-workflow.js +63 -0
  127. package/dist/modules/workflow/definitions/scroll-extract-workflow.js +74 -0
  128. package/dist/modules/workflow/definitions/xiaohongshu-collect-workflow-v2.js +81 -0
  129. package/dist/modules/workflow/definitions/xiaohongshu-collect-workflow.js +57 -0
  130. package/dist/modules/workflow/definitions/xiaohongshu-full-collect-workflow-v3.js +68 -0
  131. package/dist/modules/workflow/definitions/xiaohongshu-note-collect.js +49 -0
  132. package/dist/modules/workflow/definitions/xiaohongshu-phase1-workflow-v3.js +30 -0
  133. package/dist/modules/workflow/definitions/xiaohongshu-phase2-links-workflow-v3.js +40 -0
  134. package/dist/modules/workflow/definitions/xiaohongshu-phase3-collect-workflow-v1.js +54 -0
  135. package/dist/modules/workflow/definitions/xiaohongshu-phase34-from-links-workflow-v3.js +25 -0
  136. package/dist/modules/workflow/src/WeiboEventDrivenWorkflowRunner.js +308 -0
  137. package/dist/modules/workflow/src/context.js +70 -0
  138. package/dist/modules/workflow/src/index.js +5 -0
  139. package/dist/modules/workflow/src/orchestrator.js +230 -0
  140. package/dist/modules/workflow/src/runner.js +55 -0
  141. package/dist/modules/workflow/src/runtime.js +70 -0
  142. package/dist/modules/workflow/workflows/WeiboFeedExtractionWorkflow.js +359 -0
  143. package/dist/modules/workflow/workflows/XiaohongshuLoginWorkflow.js +110 -0
  144. package/dist/modules/xiaohongshu/app/src/blocks/MatchCommentsBlock.js +139 -0
  145. package/dist/modules/xiaohongshu/app/src/blocks/Phase1EnsureServicesBlock.js +36 -0
  146. package/dist/modules/xiaohongshu/app/src/blocks/Phase1MonitorCookieBlock.js +213 -0
  147. package/dist/modules/xiaohongshu/app/src/blocks/Phase1StartProfileBlock.js +121 -0
  148. package/dist/modules/xiaohongshu/app/src/blocks/Phase2CollectLinksBlock.js +1249 -0
  149. package/dist/modules/xiaohongshu/app/src/blocks/Phase2SearchBlock.js +703 -0
  150. package/dist/modules/xiaohongshu/app/src/blocks/Phase34CloseDetailBlock.js +41 -0
  151. package/dist/modules/xiaohongshu/app/src/blocks/Phase34CloseTabsBlock.js +44 -0
  152. package/dist/modules/xiaohongshu/app/src/blocks/Phase34CollectCommentsBlock.js +150 -0
  153. package/dist/modules/xiaohongshu/app/src/blocks/Phase34ExtractDetailBlock.js +117 -0
  154. package/dist/modules/xiaohongshu/app/src/blocks/Phase34OpenDetailBlock.js +102 -0
  155. package/dist/modules/xiaohongshu/app/src/blocks/Phase34OpenTabsBlock.js +109 -0
  156. package/dist/modules/xiaohongshu/app/src/blocks/Phase34PersistDetailBlock.js +117 -0
  157. package/dist/modules/xiaohongshu/app/src/blocks/Phase34ProcessSingleNoteBlock.js +114 -0
  158. package/dist/modules/xiaohongshu/app/src/blocks/Phase34ValidateLinksBlock.js +90 -0
  159. package/dist/modules/xiaohongshu/app/src/blocks/Phase3InteractBlock.js +1009 -0
  160. package/dist/modules/xiaohongshu/app/src/blocks/Phase4MultiTabHarvestBlock.js +233 -0
  161. package/dist/modules/xiaohongshu/app/src/blocks/ReplyInteractBlock.js +291 -0
  162. package/dist/modules/xiaohongshu/app/src/blocks/XhsDiscoverFallbackBlock.js +240 -0
  163. package/dist/modules/xiaohongshu/app/src/blocks/helpers/commentMatchDsl.js +126 -0
  164. package/dist/modules/xiaohongshu/app/src/blocks/helpers/commentMatcher.js +99 -0
  165. package/dist/modules/xiaohongshu/app/src/blocks/helpers/evidence.js +27 -0
  166. package/dist/modules/xiaohongshu/app/src/blocks/helpers/sharding.js +42 -0
  167. package/dist/modules/xiaohongshu/app/src/blocks/helpers/xhsComments.js +270 -0
  168. package/dist/modules/xiaohongshu/app/src/index.js +9 -0
  169. package/dist/modules/xiaohongshu/app/src/utils/checkpoints.js +222 -0
  170. package/dist/modules/xiaohongshu/app/src/utils/controllerAction.js +43 -0
  171. package/dist/services/controller/src/controller.js +1476 -0
  172. package/dist/services/controller/src/index.js +2 -0
  173. package/dist/services/controller/src/payload-normalizer.js +129 -0
  174. package/dist/services/shared/heartbeat.js +120 -0
  175. package/dist/services/shared/lib/errorHandler.js +2 -0
  176. package/dist/services/shared/serviceProcessLogger.js +139 -0
  177. package/dist/services/unified-api/RemoteBrowserSession.js +176 -0
  178. package/dist/services/unified-api/RemoteSessionManager.js +148 -0
  179. package/dist/services/unified-api/container-operations-handler.js +115 -0
  180. package/dist/services/unified-api/server.js +652 -0
  181. package/dist/services/unified-api/state-registry.js +274 -0
  182. package/dist/services/unified-api/task-persistence.js +66 -0
  183. package/dist/services/unified-api/task-state.js +130 -0
  184. package/modules/camo-runtime/src/autoscript/action-providers/xhs/search.mjs +12 -5
  185. package/modules/xiaohongshu/app/pnpm-lock.yaml +24 -0
  186. package/package.json +38 -10
  187. package/.beads/README.md +0 -81
  188. package/.beads/config.yaml +0 -67
  189. package/.beads/interactions.jsonl +0 -0
  190. package/.beads/issues.jsonl +0 -180
  191. package/.beads/metadata.json +0 -4
  192. package/.claude/settings.local.json +0 -10
  193. package/.github/workflows/ci.yml +0 -55
  194. package/AGENTS.md +0 -253
  195. package/apps/desktop-console/README.md +0 -27
  196. package/apps/desktop-console/package-lock.json +0 -897
  197. package/apps/desktop-console/package.json +0 -20
  198. package/apps/desktop-console/scripts/build-and-install.mjs +0 -19
  199. package/apps/desktop-console/scripts/build.mjs +0 -45
  200. package/apps/desktop-console/scripts/test-preload.mjs +0 -13
  201. package/apps/desktop-console/src/main/config.mts +0 -26
  202. package/apps/desktop-console/src/main/core-daemon-manager.mts +0 -131
  203. package/apps/desktop-console/src/main/desktop-settings.mts +0 -267
  204. package/apps/desktop-console/src/main/heartbeat-watchdog.mts +0 -50
  205. package/apps/desktop-console/src/main/heartbeat-watchdog.test.mts +0 -68
  206. package/apps/desktop-console/src/main/index-streaming.test.mts +0 -20
  207. package/apps/desktop-console/src/main/index.mts +0 -980
  208. package/apps/desktop-console/src/main/profile-store.mts +0 -239
  209. package/apps/desktop-console/src/main/profile-store.test.mts +0 -54
  210. package/apps/desktop-console/src/main/state-bridge.mts +0 -114
  211. package/apps/desktop-console/src/main/task-state-types.ts +0 -32
  212. package/apps/desktop-console/src/renderer/hooks/use-task-state.mts +0 -120
  213. package/apps/desktop-console/src/renderer/index.mts +0 -133
  214. package/apps/desktop-console/src/renderer/index.test.mts +0 -34
  215. package/apps/desktop-console/src/renderer/path-helpers.mts +0 -46
  216. package/apps/desktop-console/src/renderer/path-helpers.test.mts +0 -14
  217. package/apps/desktop-console/src/renderer/tabs/debug.mts +0 -48
  218. package/apps/desktop-console/src/renderer/tabs/debug.test.mts +0 -22
  219. package/apps/desktop-console/src/renderer/tabs/logs.mts +0 -421
  220. package/apps/desktop-console/src/renderer/tabs/logs.test.mts +0 -27
  221. package/apps/desktop-console/src/renderer/tabs/preflight.mts +0 -486
  222. package/apps/desktop-console/src/renderer/tabs/preflight.test.mts +0 -33
  223. package/apps/desktop-console/src/renderer/tabs/profile-pool.mts +0 -213
  224. package/apps/desktop-console/src/renderer/tabs/results.mts +0 -171
  225. package/apps/desktop-console/src/renderer/tabs/run.test.mts +0 -63
  226. package/apps/desktop-console/src/renderer/tabs/runtime.mts +0 -151
  227. package/apps/desktop-console/src/renderer/tabs/settings.mts +0 -146
  228. package/apps/desktop-console/src/renderer/tabs/xiaohongshu/account-flow.mts +0 -486
  229. package/apps/desktop-console/src/renderer/tabs/xiaohongshu/guide-browser-check.mts +0 -56
  230. package/apps/desktop-console/src/renderer/tabs/xiaohongshu/helpers.mts +0 -262
  231. package/apps/desktop-console/src/renderer/tabs/xiaohongshu/layout-block.mts +0 -430
  232. package/apps/desktop-console/src/renderer/tabs/xiaohongshu/live-stats.mts +0 -847
  233. package/apps/desktop-console/src/renderer/tabs/xiaohongshu/run-flow.mts +0 -443
  234. package/apps/desktop-console/src/renderer/tabs/xiaohongshu-state.mts +0 -425
  235. package/apps/desktop-console/src/renderer/tabs/xiaohongshu.mts +0 -497
  236. package/apps/desktop-console/src/renderer/tabs/xiaohongshu.test.mts +0 -291
  237. package/apps/desktop-console/src/renderer/ui-components.mts +0 -31
  238. package/docs/README_camoufox_chinese.md +0 -141
  239. package/docs/USAGE_V3.md +0 -163
  240. package/docs/arch/OCR_MACOS_PLUGIN.md +0 -39
  241. package/docs/arch/PORTS.md +0 -40
  242. package/docs/arch/REGRESSION_CHECKLIST.md +0 -121
  243. package/docs/arch/SEARCH_GATE.md +0 -224
  244. package/docs/arch/VIEWPORT_SAFETY.md +0 -182
  245. package/docs/arch/XIAOHONGSHU_OFFLINE_MOCK_DESIGN.md +0 -267
  246. package/docs/xiaohongshu-container-driven-summary.md +0 -221
  247. package/docs/xiaohongshu-full-collect-runbook.md +0 -134
  248. package/docs/xiaohongshu-next-steps.md +0 -228
  249. package/docs/xiaohongshu-quickstart.md +0 -73
  250. package/docs/xiaohongshu-workflow-summary.md +0 -227
  251. package/modules/container-registry/tests/container-registry.test.ts +0 -16
  252. package/modules/logging/tests/logging.test.ts +0 -38
  253. package/modules/operations/tests/operations.test.ts +0 -22
  254. package/modules/operations/tests/viewport-filter.test.ts +0 -161
  255. package/modules/operations/tests/visible-only.test.ts +0 -250
  256. package/modules/session-manager/tests/session-manager.test.ts +0 -23
  257. package/modules/state/src/atomic-json.test.ts +0 -30
  258. package/modules/state/src/paths.test.ts +0 -59
  259. package/modules/state/src/xiaohongshu-collect-state.test.ts +0 -259
  260. package/modules/workflow/blocks/AnchorVerificationBlock.d.ts.map +0 -1
  261. package/modules/workflow/blocks/AnchorVerificationBlock.js.map +0 -1
  262. package/modules/workflow/blocks/DetectPageStateBlock.d.ts.map +0 -1
  263. package/modules/workflow/blocks/DetectPageStateBlock.js.map +0 -1
  264. package/modules/workflow/blocks/ErrorRecoveryBlock.d.ts.map +0 -1
  265. package/modules/workflow/blocks/ErrorRecoveryBlock.js.map +0 -1
  266. package/modules/workflow/blocks/WaitSearchPermitBlock.d.ts.map +0 -1
  267. package/modules/workflow/blocks/WaitSearchPermitBlock.js.map +0 -1
  268. package/modules/workflow/blocks/helpers/containerAnchors.d.ts.map +0 -1
  269. package/modules/workflow/blocks/helpers/containerAnchors.js.map +0 -1
  270. package/modules/workflow/blocks/helpers/downloadPaths.test.ts +0 -62
  271. package/modules/workflow/blocks/helpers/mergeXhsMarkdown.test.ts +0 -121
  272. package/modules/workflow/blocks/helpers/operationLogger.d.ts.map +0 -1
  273. package/modules/workflow/blocks/helpers/operationLogger.js.map +0 -1
  274. package/modules/workflow/blocks/helpers/persistedNotes.test.ts +0 -268
  275. package/modules/workflow/blocks/helpers/searchPageState.d.ts.map +0 -1
  276. package/modules/workflow/blocks/helpers/searchPageState.js.map +0 -1
  277. package/modules/workflow/blocks/helpers/targetCountMode.test.ts +0 -29
  278. package/modules/workflow/blocks/helpers/xhsCliArgs.test.ts +0 -75
  279. package/modules/workflow/tests/smartReply.test.ts +0 -32
  280. package/modules/xiaohongshu/app/src/blocks/Phase3Interact.matcher.test.ts +0 -33
  281. package/modules/xiaohongshu/app/src/utils/__tests__/checkpoints.test.ts +0 -141
  282. package/modules/xiaohongshu/app/tests/commentMatchDsl.test.ts +0 -50
  283. package/modules/xiaohongshu/app/tests/commentMatcher.test.ts +0 -46
  284. package/modules/xiaohongshu/app/tests/sharding.test.ts +0 -31
  285. package/package-scripts.json +0 -8
  286. package/runtime/infra/utils/README.md +0 -13
  287. package/runtime/infra/utils/scripts/README.md +0 -0
  288. package/runtime/infra/utils/scripts/development/eval-in-session.mjs +0 -40
  289. package/runtime/infra/utils/scripts/development/highlight-search-containers.mjs +0 -35
  290. package/runtime/infra/utils/scripts/service/kill-port.mjs +0 -24
  291. package/runtime/infra/utils/scripts/service/start-api.mjs +0 -39
  292. package/runtime/infra/utils/scripts/service/start-browser-service.mjs +0 -106
  293. package/runtime/infra/utils/scripts/service/stop-api.mjs +0 -18
  294. package/runtime/infra/utils/scripts/service/stop-browser-service.mjs +0 -104
  295. package/runtime/infra/utils/scripts/test-services.mjs +0 -94
  296. package/services/shared/heartbeat.test.ts +0 -102
  297. package/services/unified-api/__tests__/task-state.test.ts +0 -95
  298. package/sitecustomize.py +0 -19
  299. package/tests/README.md +0 -194
  300. package/tests/e2e/workflows/weibo-feed-extraction.test.ts +0 -171
  301. package/tests/fixtures/data/container-definitions.json +0 -67
  302. package/tests/fixtures/pages/simple-page.html +0 -69
  303. package/tests/integration/01-test-container-match.mjs +0 -188
  304. package/tests/integration/02-test-dom-branch.mjs +0 -161
  305. package/tests/integration/03-test-container-operation-system.mjs +0 -91
  306. package/tests/integration/05-test-container-lifecycle-events.mjs +0 -224
  307. package/tests/integration/05-test-container-lifecycle-with-events.mjs +0 -250
  308. package/tests/integration/06-test-container-dom-tree-drawing.mjs +0 -256
  309. package/tests/integration/07-test-weibo-container-lifecycle.mjs +0 -355
  310. package/tests/integration/08-test-weibo-feed-workflow.test.mjs +0 -164
  311. package/tests/integration/10-test-visual-analyzer.mjs +0 -312
  312. package/tests/integration/11-test-visual-loop.mjs +0 -284
  313. package/tests/integration/12-test-simple-visual-loop.mjs +0 -242
  314. package/tests/integration/13-test-visual-robust.mjs +0 -185
  315. package/tests/integration/14-test-visual-highlight-loop.mjs +0 -271
  316. package/tests/integration/inspect-page.mjs +0 -50
  317. package/tests/integration/run-all-tests.mjs +0 -95
  318. package/tests/patch_verification/CODEX_PATCH_TEST.md +0 -103
  319. package/tests/patch_verification/PHASE2_ANALYSIS.md +0 -179
  320. package/tests/patch_verification/PHASE2_OPTIMIZATION_REPORT.md +0 -55
  321. package/tests/patch_verification/PHASE2_TO_PHASE4_SUMMARY.md +0 -126
  322. package/tests/patch_verification/QUICK_TEST_SEQUENCE.md +0 -262
  323. package/tests/patch_verification/README.md +0 -143
  324. package/tests/patch_verification/RUN_TESTS.md +0 -60
  325. package/tests/patch_verification/TEST_EXECUTION.md +0 -99
  326. package/tests/patch_verification/TEST_PLAN.md +0 -328
  327. package/tests/patch_verification/TEST_RESULTS.md +0 -34
  328. package/tests/patch_verification/TOOL_TEST_PLAN.md +0 -48
  329. package/tests/patch_verification/run-tool-test.mjs +0 -121
  330. package/tests/patch_verification/temp_test_files/test01.txt +0 -1
  331. package/tests/patch_verification/temp_test_files/test02.txt +0 -3
  332. package/tests/patch_verification/temp_test_files/test02_gnu.txt +0 -3
  333. package/tests/patch_verification/temp_test_files/test03.txt +0 -1
  334. package/tests/patch_verification/temp_test_files/test03_multiline.txt +0 -5
  335. package/tests/patch_verification/temp_test_files/test04_function.ts +0 -5
  336. package/tests/patch_verification/temp_test_files/test05_import.ts +0 -4
  337. package/tests/patch_verification/temp_test_files/test06_special_chars.txt +0 -4
  338. package/tests/patch_verification/temp_test_files/test07_indentation.ts +0 -5
  339. package/tests/patch_verification/temp_test_files/test08_mismatch.txt +0 -1
  340. package/tests/patch_verification/temp_test_files/test_add_02.txt +0 -3
  341. package/tests/patch_verification/temp_test_files/test_simple.txt +0 -1
  342. package/tests/runner/TestReporter.mjs +0 -57
  343. package/tests/runner/TestRunner.mjs +0 -244
  344. package/tests/unit/commands/profile.test.mjs +0 -10
  345. package/tests/unit/container/change-notifier.test.mjs +0 -181
  346. package/tests/unit/lifecycle/session-registry.test.mjs +0 -135
  347. package/tests/unit/operations/registry.test.ts +0 -73
  348. package/tests/unit/utils/browser-service.test.mjs +0 -153
  349. package/tests/unit/utils/config.test.mjs +0 -166
  350. package/tests/unit/utils/fingerprint.test.mjs +0 -166
  351. package/tsconfig.json +0 -31
  352. package/tsconfig.services.json +0 -26
  353. /package/apps/desktop-console/{src → dist}/renderer/index.html +0 -0
  354. /package/apps/desktop-console/{src/renderer/tabs → dist/renderer}/run.mts +0 -0
@@ -1,267 +0,0 @@
1
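- # Xiaohongshu Collection: Persistence Node and Offline Simulation Test Design
2
-
3
- > Goal: verify the full "detail extraction + comment collection + persistence to disk" chain without depending on live pages or URL navigation, providing a stable closed loop for later production-scale collection.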
4
-
5
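- ## 1. Persistence node: PersistXhsNoteBlock
6
-
7
- ### 1.1 Responsibilities
8
-
9
- - Single responsibility: write the structured content of the current Note (detail + comments) into a local directory structure;
10
- - No DOM access and no container operations; it handles only plain data and the file system;
11
- - All output paths live under `~/.webauto/download/xiaohongshu/{env}/`.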
12
-
13
- ### 1.2 Input
14
-
15
- Provided by the upstream Workflow context (usually from ExtractDetailBlock / CollectCommentsBlock):
16
-
17
- - `sessionId: string`
18
- - `env: string`: environment tag, e.g. `debug` / `prod`
19
- - `platform?: string`: defaults to `'xiaohongshu'`
20
- - `keyword: string`
21
- - `noteId: string`
22
- - `detailUrl?: string`: URL of the current detail page (carries the xsec token; used for display only, never for navigation)
23
- - `detail: { ... }`:
24
-   - Must contain at least: `title`, `contentText`, `gallery: { images: string[] }`
25
-   - Remaining fields follow the output structure of ExtractDetailBlock
26
- - `commentsResult: { ... }`:
27
-   - Must contain at least:
28
-     - `comments: Array<{ user_name?, user_id?, timestamp?, text? }>`
29
-     - `totalFromHeader?: number`
30
-     - `reachedEnd?: boolean`
31
-     - `emptyState?: boolean`
32
-
33
- ### 1.3 Output
34
-
35
- ```ts
36
- interface PersistXhsNoteOutput {
37
-   success: boolean;
38
-   error?: string;
39
-   outputDir?: string;   // post directory actually written to disk
40
-   contentPath?: string; // path to content.md
41
-   imagesDir?: string;   // path to the images directory
42
- }
43
- ```
44
-
45
- ### 1.4 Directory structure
46
-
47
- - Root directory: `~/.webauto/download/xiaohongshu/{env}/`
48
- - Keyword directory: `{root}/{sanitize(keyword)}/`
49
- - Single Note directory: `{root}/{sanitize(keyword)}/{noteId}/`
50
-   - `content.md`: post + comments as Markdown
51
-   - `images/`: image files
52
-
53
- `sanitize(keyword)`: reuses the existing implementation, replacing characters such as `\/:*?"<>|` with `_` and then trimming.
54
-
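The directory rules in 1.4 can be sketched as a pair of pure functions. This is only an illustration: the "existing implementation" of `sanitize` is not shown in this diff, so `sanitizeKeyword` and `noteDir` are hypothetical names, and the character set is limited to the nine characters the text lists explicitly.

```typescript
// Hypothetical helper: the real sanitize() is not part of this diff.
// Replaces \ / : * ? " < > | with "_" and then trims surrounding whitespace.
function sanitizeKeyword(keyword: string): string {
  return keyword.replace(/[\\\/:*?"<>|]/g, "_").trim();
}

// Hypothetical helper: builds {root}/{sanitize(keyword)}/{noteId}
// from a caller-supplied root such as ~/.webauto/download/xiaohongshu/{env}/.
function noteDir(root: string, keyword: string, noteId: string): string {
  const base = root.replace(/\/+$/, ""); // tolerate a trailing slash on root
  return `${base}/${sanitizeKeyword(keyword)}/${noteId}`;
}
```

Keeping the path logic free of file-system calls makes it easy to test without touching the disk; the actual `mkdir` calls would sit in the Block itself.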
55
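- ### 1.5 Write-to-disk logic
56
-
57
- 1. **Directory creation**
58
-    - Ensure `root/keywordDir/postDir/imagesDir` exist, in that order;
59
-    - Use the Node ESM FS API: `fs.promises.mkdir(dir, { recursive: true })`.
60
-
61
- 2. **Image download**
62
-    - Source: `detail.gallery.images: string[]`;
63
-    - Preprocessing:
64
-      - Drop empty values and trim both ends;
65
-      - Prefix URLs starting with `//` with `https:`;
66
-      - Keep only the `http/https` protocols;
67
-    - Download strategy:
68
-      - Fetch the response with `fetch(url)`, then `arrayBuffer()` → `Buffer`;
69
-      - File name: `images/{index}.jpg` (`01.jpg`, `02.jpg`, ...; preserving order is enough);
70
-      - On a single-image failure: skip that URL and print a warning without failing the whole Block;
71
-    - Returns:
72
-      - A list of local relative paths, e.g. `['images/01.jpg', 'images/02.jpg', ...]`.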
73
-
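The preprocessing and file-naming rules for images can be sketched without any network access. The function names below are hypothetical, and whether `{index}` counts all URLs or only successful downloads is not stated in the design, so this sketch simply numbers by position in the cleaned list.

```typescript
// Hypothetical helper: applies the three preprocessing rules from the design text.
function normalizeImageUrls(images: Array<string | null | undefined>): string[] {
  const out: string[] = [];
  for (const raw of images) {
    const trimmed = (raw || "").trim();
    if (!trimmed) continue; // drop empty values
    // Protocol-relative URLs ("//host/path") get an explicit https: prefix.
    const url = trimmed.slice(0, 2) === "//" ? `https:${trimmed}` : trimmed;
    // Keep only http/https URLs.
    if (/^https?:\/\//i.test(url)) out.push(url);
  }
  return out;
}

// Hypothetical helper: 0-based position -> images/01.jpg, images/02.jpg, ...
function imagePath(index: number): string {
  const n = index + 1;
  return `images/${n < 10 ? "0" + n : String(n)}.jpg`;
}
```

The download step itself (`fetch` followed by `arrayBuffer()`) would wrap each URL in its own try/catch so that one failure only produces a warning.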
74
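- 3. **content.md structure**
75
-
76
- Example structure (consistent with the existing `collect-100-workflow-v2.mjs`, except that the file name is unified as `content.md`). The Chinese strings are literal output labels: 无标题 (untitled), 关键词 (keyword), 链接 (link), 作者 (author), 评论统计 (comment stats), 抓取 (fetched), 未知 (unknown), 是/否 (yes/no), 正文 (body), 无正文 (no body), 图片 (images), 评论 (comments).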
77
-
78
- ```markdown
79
- # {title || '无标题'}
80
-
81
- - Note ID: {noteId}
82
- - 关键词: {keyword}
83
- - 链接: {detailUrl}
84
- - 作者: {author}
85
- - 评论统计: 抓取={comments.length}, header={totalFromHeader|未知}(reachedEnd={是/否}, empty={是/否})
86
-
87
- ## 正文
88
-
89
- {contentText, or the placeholder "(无正文)"}
90
-
91
- ## 图片
92
-
93
- ![](images/01.jpg)
94
- ![](images/02.jpg)
95
- ...
96
-
97
- ## 评论
98
-
99
- - **user name**(user_id) [time]:comment text
100
- ...
101
- ```
102
-
103
- Field selection strategy:
104
-
105
- - `title`: prefer the title field in detail.header/content, falling back to the title of the list item;
106
- - `author`: taken from `author/user_name/nickname` in detail.header;
107
- - `contentText`: assembled from the body-text fields in detail.content;
108
- - `评论统计` (comment stats): filled from `commentsResult.comments/totalFromHeader/reachedEnd/emptyState`.
109
-
110
- Comment rendering rules:
111
-
112
- - Iterate over `commentsResult.comments`:
113
-   - `user = user_name || username || '未知用户'` (`'未知用户'` means "unknown user")
114
-   - `uid = user_id || ''`
115
-   - `ts = timestamp || ''`
116
-   - `text = text || ''`
117
-   - Emit: `- **{user}**({uid}) [ts]:{text}`
118
- - When `comments.length === 0`, write `(无评论)` ("no comments").
119
-
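The comment rendering rules translate almost line for line into code. The sketch below is an assumption about shape only: `XhsComment` and `renderComments` are hypothetical names, while the fallbacks and the literal strings (`'未知用户'`, `(无评论)`, the full-width colon) are taken from the rules above.

```typescript
// Hypothetical type: only the fields named in the rendering rules.
interface XhsComment {
  user_name?: string;
  username?: string;
  user_id?: string;
  timestamp?: string;
  text?: string;
}

// Hypothetical helper: renders the comment list body of content.md.
function renderComments(comments: XhsComment[]): string {
  if (comments.length === 0) return "(无评论)"; // "no comments" placeholder
  return comments
    .map((c) => {
      const user = c.user_name || c.username || "未知用户"; // "unknown user"
      const uid = c.user_id || "";
      const ts = c.timestamp || "";
      const text = c.text || "";
      return `- **${user}**(${uid}) [${ts}]:${text}`;
    })
    .join("\n");
}
```

Because every field falls back to an empty string or a fixed label, a comment object with no fields at all still renders as one well-formed list item.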
120
- ---
121
-
122
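- ## 2. Online data → local fixture JSON
123
-
124
- > Collect online once, replay offline many times.
125
-
126
- ### 2.1 Recording points
127
-
128
- - During the real (online) phase, add debug output after the following Blocks (enabled only under `DEBUG` or in specific environments):
129
-   - after `ExtractDetailBlock` completes;
130
-   - after `CollectCommentsBlock` completes.
131
- - Aggregate the two outputs into a single structure: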
132
-
133
- ```ts
134
- interface XhsNoteFixture {
135
-   noteId: string;
136
-   keyword: string;
137
-   detailUrl: string;
138
-   detail: any;          // full output of ExtractDetailBlock
139
-   commentsResult: any;  // full output of CollectCommentsBlock
140
-   capturedAt: string;   // ISO timestamp
141
- }
142
- ```
143
-
144
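- ### 2.2 Storage path
145
-
146
- - Files are kept under the user directory and never committed to the repository:
147
-   - `~/.webauto/fixtures/xiaohongshu/{noteId}.json`
148
- - Written by a small utility function or by debug logic inside the Block:
149
-   - An optional step, performed only in debug/replay mode so that routine tasks do not generate too many fixtures.
150
-
151
- ### 2.3 Uses
152
-
153
- - Unit/integration tests for PersistXhsNoteBlock take the fixture directly as input, with no dependency on a browser or the DOM;
154
- - It also serves as the raw data source for generating the offline HTML mock page.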
155
-
156
- ---
157
-
158
- ## 3. fixture JSON → mock HTML detail page
159
-
160
- > Goal: build a local HTML page "structured like a Xiaohongshu detail page" so that the container system and Blocks can run the full chain locally.
161
-
162
- ### 3.1 Generation script
163
-
164
- - New script: `scripts/xiaohongshu/tests/generate-detail-mock-page.mjs`
165
- - Input:
166
-   - `--noteId <id>`: reads data from `~/.webauto/fixtures/xiaohongshu/{noteId}.json`;
167
-   - `--output <path>` (optional): defaults to `~/.webauto/fixtures/xiaohongshu/detail-{noteId}.html`.
168
- - Output:
169
-   - A complete HTML document that mimics the layout and class structure of the live detail page.
170
-
171
- ### 3.2 DOM structure design (aligned with containers)
172
-
173
- The mock DOM must align with the following container IDs/selectors:
174
-
175
- - Detail containers:
176
-   - `xiaohongshu_detail.modal_shell` / `xiaohongshu_detail`: outermost modal container;
177
-   - `xiaohongshu_detail.header`: title and author information area;
178
-   - `xiaohongshu_detail.content`: body text area;
179
-   - `xiaohongshu_detail.gallery`: image area.
180
- - Comment containers:
181
-   - `xiaohongshu_detail.comment_section`: root container of the comment section;
182
-   - `xiaohongshu_detail.comment_section.comment_item`: a single comment node;
183
-   - `xiaohongshu_detail.comment_section.show_more_button`: "show more" button;
184
-   - `xiaohongshu_detail.comment_section.end_marker`: end marker (optional).
185
-
186
- Layout notes:
187
-
188
- - Use classes / DOM hierarchy aligned with the selectors in the container JSON;
189
- - Generate one `.comment-item` per comment, containing:
190
-   - a user-name element (e.g. `.user-name`);
191
-   - a user link/ID (in `data-user-id` or `<a href="/user">`);
192
-   - a time element;
193
-   - a text element.
194
-
195
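- ### 3.3 Simulating "show more comments"
196
-
197
- - Insert several `.show-more` buttons and collapsed blocks:
198
-   - The initial comments (e.g. the first N) are directly visible;
199
-   - Subsequent comments are wrapped in a `div` with `style="display:none"`;
200
-   - A `.show-more` element is inserted before it.
201
- - Insert a short inline JS snippet at the bottom of the page: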
202
-
203
- ```js
204
- document.addEventListener('click', (e) => {
205
-   const btn = e.target.closest('.show-more');
206
-   if (!btn) return;
207
-   const block = btn.nextElementSibling;
208
-   if (block) {
209
-     block.style.display = 'block';
210
-     btn.remove();
211
-   }
212
- });
213
- ```
214
-
215
- - Purpose: let `WarmupCommentsBlock` + `CollectCommentsBlock` expand comments automatically through container clicks locally as well, behaving the same as on the live site.
216
-
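The collapsed-block layout from 3.3 could be generated roughly as follows. This is a sketch under assumptions: the function names are hypothetical, the `.time` and `.text` class names are placeholders (the design only says "time element" and "text element"), and a single `.show-more` button is emitted even though the design allows several. The key constraint it does respect is that the hidden `div` is the button's immediate next sibling, which is what the inline click handler relies on.

```typescript
// Hypothetical type: the comment fields used by the mock page.
interface MockComment {
  user_name?: string;
  user_id?: string;
  timestamp?: string;
  text?: string;
}

// Minimal HTML escaping so fixture text cannot break the markup.
function escapeHtml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// One .comment-item per comment; .time and .text are assumed class names.
function renderCommentItem(c: MockComment): string {
  return (
    `<div class="comment-item" data-user-id="${escapeHtml(c.user_id || "")}">` +
    `<span class="user-name">${escapeHtml(c.user_name || "")}</span>` +
    `<span class="time">${escapeHtml(c.timestamp || "")}</span>` +
    `<span class="text">${escapeHtml(c.text || "")}</span>` +
    `</div>`
  );
}

// First visibleCount comments are visible; the rest sit in a hidden div
// placed directly after the .show-more element.
function renderCommentSection(comments: MockComment[], visibleCount: number): string {
  const visible = comments.slice(0, visibleCount).map(renderCommentItem).join("");
  const hidden = comments.slice(visibleCount).map(renderCommentItem).join("");
  if (!hidden) return visible;
  return (
    visible +
    `<div class="show-more">show more</div>` +
    `<div style="display:none">${hidden}</div>`
  );
}
```

When every comment fits in the visible part, no button is emitted at all, which gives the mock page a natural "already at the end" state.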
217
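- ### 3.4 Simulating the image area
218
-
219
- - Use `detail.gallery.images` from the fixture:
220
-   - Generate a list of `<img>` elements under the gallery container, with classes aligned with the container definition, e.g.:
221
-     - `.note-img img`, `.note-scroller img`, etc.;
222
-   - `src` uses the live URL directly (downloading is handled by PersistXhsNoteBlock).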
223
-
224
- ---
225
-
226
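- ## 4. Testing Strategy Based on the Mock Page
227
-
228
- ### 4.1 PersistXhsNoteBlock Single-Block Test
229
-
230
- 1. Use the fixture JSON as direct input, with no dependency on HTML or a browser;
231
- 2. Call `PersistXhsNoteBlock.execute()`;
232
- 3. Assert:
233
- - Directory structure: `~/.webauto/download/xiaohongshu/{env}/{keyword}/{noteId}/` exists;
234
- - `content.md` is complete (title, metadata, body, image references, comments);
235
- - The number of images under `images/` roughly matches the number in `gallery.images` (some download failures are allowed, provided a warning is emitted).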
236
-
237
- ### 4.2 Single-Note Workflow Offline E2E
238
-
239
- 1. Start the Browser Service, but navigate to the locally generated mock HTML:
240
- - e.g. `http://127.0.0.1:port/xhs-mock/detail-{noteId}.html`;
241
- - The URL contains no online domain, and no xsec-less link is constructed.
242
- 2. Run it via `runWorkflowById('xiaohongshu-note-collect', { sessionId, keyword, env, noteId, detailUrl: mockUrl })`:
243
- - Internally, the container system is still used for anchor location, scrolling, and comment expansion;
244
- - CollectCommentsBlock and ExtractDetailBlock both run on the local mock DOM.
245
- 3. Verify:
246
- - Every step in WorkflowExecutionResult reports success;
247
- - The persisted result matches the fixture content (comment count, title, body, etc.).
248
-
249
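- ### 4.3 Full-Chain Integration (Optional)
250
-
251
- - In debug mode, replace the search stage with a simplified Workflow that jumps straight to the local mock detail page, to verify that:
252
- - the top-level Workflow chains correctly with CallWorkflowBlock;
253
- - the note-collect node can be called repeatedly and writes to disk correctly;
254
- - there is no longer any dependency on the real search page or online scrolling.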
255
-
256
- ---
257
-
258
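- ## 5. Impact on Existing Code (Planned)
259
-
260
- 1. New Block: `PersistXhsNoteBlock` (depends only on Node FS and fetch, not on containers or the browser context);
261
- 2. New script: `scripts/xiaohongshu/tests/generate-detail-mock-page.mjs`;
262
- 3. Moderate changes:
263
- - Add fixture recording logic to the online debug scripts / Workflows (controllable via a DEBUG switch);
264
- - Insert `PersistXhsNoteBlock` into the `xiaohongshu-note-collect` Workflow definition.
265
-
266
- With this design, we can reliably replay the complex "Xiaohongshu detail + comments" scenario locally, using a mock DOM driven by real data to verify the containers, Blocks, and persistence logic without depending on online pages or URL navigation, which significantly lowers both debugging cost and risk-control exposure.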
267
-
@@ -1,221 +0,0 @@
1
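- # Xiaohongshu Container-Driven Refactoring Summary
2
-
3
- > Date: 2025-01-06
4
- > Status: ✅ Done
5
- > Goal: switch the Xiaohongshu collection chain entirely to container-driven mode
6
-
7
- ## ✅ Completed Checklist
8
-
9
- ### 1. Login Anchor Model Definition ✅
10
-
11
- **File**: `container-library/xiaohongshu/README.md`
12
-
13
- **Conventions**:
14
- - **Logged-in marker**: `*.login_anchor` (matches the login anchor container on any page)
15
- - **Not-logged-in marker**: `xiaohongshu_login.login_guard` (core control of the login page)
16
- - **Indeterminate state**: neither container matches
17
-
18
- **Container selectors**:
19
- - `*.login_anchor`: `a.link-wrapper[title="我"]`
20
- - `xiaohongshu_login.login_guard`: core control of the login page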
21
-
22
- ### 2. Launcher Login Detection Refactoring ✅
23
-
24
- **File**: `launcher/core/launcher.mjs`
25
-
26
- **Changes**:
27
- - Removed hard-coded DOM queries
28
- - Now calls `containers:match` to obtain the container tree
29
- - Recursively searches for `*.login_anchor` and `xiaohongshu_login.login_guard`
30
- - No longer reads globals such as `__INITIAL_STATE__` directly
31
-
32
- **Key code**:
33
- ```typescript
34
- function findContainer(tree, pattern) {
35
- if (pattern.test(tree.id || tree.defId)) return tree;
36
- // recursive search...
37
- }
38
-
39
- const loginAnchor = findContainer(tree, /\.login_anchor$/);
40
- const loginGuard = findContainer(tree, /xiaohongshu_login\.login_guard$/);
41
- ```
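
The recursive step elided in the snippet above could look like the following. This is a minimal sketch under one loud assumption: that each node of the tree returned by `containers:match` exposes its children as a `children` array. The original only shows the `id` / `defId` check, so the traversal is illustrative rather than the launcher's actual code.

```javascript
// Depth-first search for the first container whose id (or defId) matches.
// ASSUMPTION: child nodes live in `tree.children`; the real field name may differ.
function findContainer(tree, pattern) {
  if (!tree) return null;
  if (pattern.test(tree.id || tree.defId || '')) return tree;
  for (const child of tree.children || []) {
    const hit = findContainer(child, pattern);
    if (hit) return hit;
  }
  return null;
}

// Usage with a hand-written tree (IDs are made up for the example).
const exampleTree = {
  id: 'xiaohongshu_home',
  children: [{ id: 'xiaohongshu_home.login_anchor', children: [] }],
};
const exampleAnchor = findContainer(exampleTree, /\.login_anchor$/);
```

Returning `null` for a missing node keeps the caller's check to a plain truthiness test, which is how the snippet above consumes the result.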
42
-
43
- ### 3. Workflow Block Implementation ✅
44
-
45
- **File**: `modules/workflow/blocks/EnsureLoginBlock.ts`
46
-
47
- **Features**:
48
- - Looks up containers via the `containers:match` API
49
- - Matches `*.login_anchor` → returns `isLoggedIn: true`
50
- - Matches `login_guard` → waits for manual login
51
- - Timeout protection (default 2 minutes)
52
-
53
- **Interface**:
54
- ```typescript
55
- interface EnsureLoginInput {
56
- sessionId: string;
57
- serviceUrl?: string;
58
- maxWaitMs?: number;
59
- checkIntervalMs?: number;
60
- }
61
-
62
- interface EnsureLoginOutput {
63
- isLoggedIn: boolean;
64
- loginMethod: 'container_match' | 'manual_wait' | 'timeout';
65
- matchedContainer?: string;
66
- waitTimeMs?: number;
67
- error?: string;
68
- }
69
- ```
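
One way to read the interface above is as a polling loop bounded by `maxWaitMs`. The sketch below is not the real `EnsureLoginBlock`: `waitForLogin` and the injected `checkLogin` callback are hypothetical names, and the 2000 ms default for `checkIntervalMs` is an assumption (only the 2-minute timeout default comes from the text). `checkLogin` is expected to resolve to the matched container ID, or `null` when no login anchor is matched.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Poll until a login anchor is matched or the wait budget is used up.
// ASSUMPTION: checkLogin() resolves to a container id string or null.
async function waitForLogin(checkLogin, { maxWaitMs = 120000, checkIntervalMs = 2000 } = {}) {
  const startedAt = Date.now();
  let firstCheck = true;
  while (true) {
    const matchedContainer = await checkLogin();
    const waitTimeMs = Date.now() - startedAt;
    if (matchedContainer) {
      return {
        isLoggedIn: true,
        // Matched on the first check: already logged in. Otherwise the user logged in while we waited.
        loginMethod: firstCheck ? 'container_match' : 'manual_wait',
        matchedContainer,
        waitTimeMs,
      };
    }
    if (waitTimeMs >= maxWaitMs) {
      return { isLoggedIn: false, loginMethod: 'timeout', waitTimeMs, error: 'login wait timed out' };
    }
    firstCheck = false;
    await sleep(checkIntervalMs);
  }
}
```

Injecting `checkLogin` keeps the loop independent of how the container tree is fetched, so it can be exercised without a browser session.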
70
-
71
- ### 4. Debug Script Refactoring ✅
72
-
73
- **Files**:
74
- - `scripts/xiaohongshu/tests/status-v2.mjs` - status check
75
- - `scripts/xiaohongshu/tests/phase1-session-login.mjs` - login guard
76
- - `scripts/debug-xhs-search.mjs` - Unattached search verification
77
- - `scripts/debug-xhs-detail.mjs` - Unattached detail page interaction
78
-
79
- **Key points**:
80
- - Removed hard-coded DOM logic (e.g. `if (url.includes('xiaohongshu'))`)
81
- - Based entirely on container ID matching
82
- - Prefer refreshing over re-navigating
83
- - Restore the initial state after testing
84
-
85
- ### 5. Documentation Improvements ✅
86
-
87
- **Files**:
88
- - `container-library/xiaohongshu/README.md` - login anchor conventions
89
- - `AGENTS.md` - Unattached mode rules for debug scripts
90
- - `task.md` - full task tracking
91
-
92
- ## 📊 Container-Driven Comparison
93
-
94
- ### ❌ Old Approach (Hard-Coded DOM)
95
-
96
- ```javascript
97
- // Do not write code like this
98
- if (url.includes('xiaohongshu.com')) {
99
- const avatar = await page.$('a[title="我"]');
100
- if (avatar) return true;
101
- }
102
- ```
103
-
104
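- **Problems**:
105
- - DOM selectors break easily
106
- - Platform-specific logic is scattered
107
- - Hard to test and maintain
108
- - Violates the layering principle
109
-
110
- ### ✅ New Approach (Container-Driven)
111
-
112
- ```typescript
113
- // Recommended: based on container IDs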
114
- const result = await controllerAction('containers:match', { profile, url });
115
- const loginAnchor = findContainer(tree, /\.login_anchor$/);
116
- if (loginAnchor) {
117
- return { isLoggedIn: true };
118
- }
119
- ```
120
-
121
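- **Advantages**:
122
- - Platform-agnostic (the same code supports Weibo, Douyin, etc.)
123
- - Selectors are centralized in the container definitions
124
- - Easy to test and verify
125
- - Conforms to the layered architecture
126
-
127
- ## 🔄 Data Flow
128
-
129
- ### Login Detection Flow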
130
-
131
- ```
132
- 1. Launcher / Workflow
133
-
134
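- 2. Call containers:match
135
- ↓
136
- 3. Obtain the container tree
137
- ↓
138
- 4. Recursively search for *.login_anchor
139
- ↓
140
- 5a. Matched → logged in
141
- ↓
142
- 5b. Not matched, look for xiaohongshu_login.login_guard
143
- ↓
144
- 6a. Matched → not logged in, wait for manual login
145
- ↓
146
- 6b. Not matched → indeterminate state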
147
- ```
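
Steps 4 through 6b above reduce to a three-way classification over the matched container IDs. The sketch below assumes the IDs have already been flattened out of the container tree into an array; the function name and the state labels are hypothetical, chosen only to mirror the three outcomes in the flow.

```javascript
// Classify login state from the IDs of all matched containers.
// The login anchor is checked first, mirroring steps 5a/5b in the flow.
function classifyLoginState(matchedIds) {
  if (matchedIds.some((id) => /\.login_anchor$/.test(id))) return 'logged_in';
  if (matchedIds.some((id) => /xiaohongshu_login\.login_guard$/.test(id))) return 'not_logged_in';
  return 'uncertain';
}
```

Checking the anchor before the guard means a page that somehow matches both is treated as logged in, which follows the ordering in the flow.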
148
-
149
- ### Workflow Execution Flow
150
-
151
- ```
152
- 1. EnsureSessionBlock
153
- ↓
154
- 2. EnsureLoginBlock (container-driven)
155
- ↓
156
- 3. GoToSearchBlock (container-driven)
157
- ↓
158
- 4. PickNoteBlock (container-driven)
159
- ↓
160
- 5. OpenDetailBlock (container-driven)
161
- ↓
162
- 6. ExpandCommentsBlock (container-driven)
163
- ```
164
-
165
- ## 📝 Key File List
166
-
167
- | File | Status | Description |
168
- |------|------|------|
169
- | `container-library/xiaohongshu/README.md` | ✅ | Login anchor convention doc |
170
- | `launcher/core/launcher.mjs` | ✅ | Container-driven login detection |
171
- | `modules/workflow/blocks/EnsureLoginBlock.ts` | ✅ | Generic login Block |
172
- | `scripts/xiaohongshu/tests/status-v2.mjs` | ✅ | Container-driven status check |
173
- | `scripts/xiaohongshu/tests/phase1-session-login.mjs` | ✅ | Container-driven login guard |
174
- | `scripts/debug-xhs-search.mjs` | ✅ | Unattached search verification |
175
- | `scripts/debug-xhs-detail.mjs` | ✅ | Unattached detail page interaction |
176
- | `AGENTS.md` | ✅ | Unattached mode rules |
177
- | `task.md` | ✅ | Full task tracking |
178
-
179
- ## 🎯 Verification Tests
180
-
181
- ### Test Commands
182
-
183
- ```bash
184
- # 1. Check session status (container-driven)
185
- node scripts/xiaohongshu/tests/status-v2.mjs
186
-
187
- # 2. One-click launch (container-driven login detection)
188
- node scripts/start-headful.mjs --profile xiaohongshu_fresh --url https://www.xiaohongshu.com
189
-
190
- # 3. Search verification (Unattached mode)
191
- node scripts/debug-xhs-search.mjs
192
-
193
- # 4. Detail page test (Unattached mode)
194
- node scripts/debug-xhs-detail.mjs
195
- ```
196
-
197
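- ### Expected Results
198
-
199
- - No script hard-codes DOM logic any longer
200
- - Login state is based entirely on container matching
201
- - Debug scripts reuse the existing session
202
- - Workflows can reuse EnsureLoginBlock directly
203
-
204
- ## 🚀 Next Steps
205
-
206
- 1. Run the test scripts to verify the container-driven refactoring
207
- 2. Create the first complete container-driven Workflow
208
- 3. Run a small-scale collection test (5 items)
209
- 4. Optimize XiaohongshuCrawlerBlock to use the new architecture
210
-
211
- ## 📚 Reference Documents
212
-
213
- - `container-library/xiaohongshu/README.md` - container definitions + login anchor conventions
214
- - `task.md` - current task tracking
215
- - `AGENTS.md` - architecture rules
216
- - `docs/xiaohongshu-next-steps.md` - detailed task list
217
-
218
- ---
219
-
220
- **Completed at**: 2025-01-06 09:30
221
- **Result**: the Xiaohongshu chain is 100% container-driven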
@@ -1,134 +0,0 @@
1
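- # Xiaohongshu Full-Flow Collection Runbook (Phase1-4)
2
-
3
- > Goal: ensure that Phase3/4 comment collection still runs when the list falls short of the target, so that abnormal list scrolling does not interrupt the later stages.
4
-
5
- ## I. Pre-flight Checks
6
-
7
- ### 1. Service Status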
8
-
9
- ```bash
10
- curl http://127.0.0.1:7701/health
11
- curl http://127.0.0.1:7704/health
12
- ```
13
-
14
- ### 2. Session Status
15
-
16
- ```bash
17
- node scripts/xiaohongshu/tests/status-v2.mjs
18
- ```
19
-
20
- You must confirm:
21
- - session: `xiaohongshu_fresh` exists
22
- - the login anchor `*.login_anchor` is matched
23
- - the current page is a normal Xiaohongshu page
24
-
25
- ### 3. SearchGate
26
-
27
- If search is needed (Phase2 triggers it):
28
-
29
- ```bash
30
- node scripts/search-gate-server.mjs
31
- ```
32
-
33
- ## II. Starting the Full Flow
34
-
35
- ### 1. Start Full Collection
36
-
37
- ```bash
38
- node scripts/xiaohongshu/tests/phase1-4-full-collect.mjs --keyword "雷军" --count 200
39
- ```
40
-
41
- Output directory (mandatory standard):
42
- ```
43
- ~/.webauto/download/xiaohongshu/{env}/{keyword}/{noteId}/
44
- ```
45
-
46
- The default is env=`download` (adjustable via a parameter).
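
The path convention above can be captured in a single helper so that every phase resolves the same directory. This is a minimal sketch: `noteOutputDir` is a hypothetical name, and the home directory is passed in (for example from `os.homedir()`) rather than resolved inside the function, purely to keep the example free of imports.

```javascript
// Build ~/.webauto/download/xiaohongshu/{env}/{keyword}/{noteId} from its parts.
// `env` defaults to 'download', matching the documented default.
function noteOutputDir({ home, env = 'download', keyword, noteId }) {
  if (!home || !keyword || !noteId) {
    throw new Error('home, keyword and noteId are required');
  }
  return [home, '.webauto', 'download', 'xiaohongshu', env, keyword, noteId].join('/');
}
```

Failing fast on a missing segment avoids silently writing into a parent directory, one likely cause of the "wrong output path" symptom this runbook covers.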
47
-
48
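- ### 2. Phase Descriptions
49
-
50
- | Phase | Function | Exit Condition |
51
- |------|------|-----------|
52
- | Phase1 | Confirm services/session/login | Proceeds to Phase2 on success |
53
- | Phase2 (ListOnly) | Collect the search list + obtain safe-detail-urls | **Abnormal list scrolling does not affect later phases** |
54
- | Phase3 | Open details based on safe-detail-urls | Skipped if safe-detail-urls is empty |
55
- | Phase4 | Collect comments and write to disk | Completed note by note, writing comments.md incrementally |
56
-
57
- ### 3. Key Requirements (Mandatory)
58
-
59
- - **No direct URL jumps**: details must be entered by clicking from the search page
60
- - **SearchGate throttling**: a search must request a permit first
61
- - **Container anchors**: hard-coded DOM is forbidden
62
- - **Scrolling must stay within the viewport**: off-screen operations are forbidden
63
- - **The content file must be named `content.md`** (not README.md)
64
-
65
- ## III. Phase2 Scroll Failure Handling (Post-Fix Behavior)
66
-
67
- ### Goals
68
-
69
- - **The list counts as truly exhausted only when the END marker is detected**
70
- - **If scrolling fails: keep trying to scroll**
71
- - **Even if Phase2 falls short of the target, Phase3/4 still run**
72
-
73
- ### Abnormal Exit Logic (Post-Fix)
74
-
75
- - 3 consecutive rounds of scroll failure (3 retries per round) → Phase2 is marked as an abnormal exit
76
- - **The flow is not interrupted**: Phase3/4 go on to process the safe-detail-urls already collected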
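
The exit rules above can be sketched as a loop with a consecutive-failure counter. This is an illustration, not the project's Phase2 code: `scrollOnce`, `isEndReached`, and `hasEnough` are hypothetical injected callbacks, the status strings are made up, and "3 retries per round" is interpreted here as three scroll attempts per round.

```javascript
// Scroll the list until END, the target count, or three consecutive failed rounds.
// ASSUMPTION: scrollOnce() resolves to true when the list actually moved.
async function runListScroll({
  scrollOnce,
  isEndReached,
  hasEnough = async () => false,
  maxFailedRounds = 3,
  attemptsPerRound = 3,
}) {
  let failedRounds = 0;
  while (true) {
    if (await isEndReached()) return { status: 'end_reached' };
    if (await hasEnough()) return { status: 'target_reached' };

    let scrolled = false;
    for (let attempt = 0; attempt < attemptsPerRound && !scrolled; attempt++) {
      scrolled = await scrollOnce();
    }
    // Any successful round resets the counter, so only consecutive failures count.
    failedRounds = scrolled ? 0 : failedRounds + 1;
    if (failedRounds >= maxFailedRounds) return { status: 'scroll_failed' };
  }
}
```

The function returns a status instead of throwing, so the caller can log an abnormal Phase2 exit and still hand the collected safe-detail-urls to Phase3/4.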
77
-
78
- ## IV. Output Directory Structure
79
-
80
- ```
81
- ~/.webauto/download/xiaohongshu/{env}/{keyword}/{noteId}/
82
- ├── content.md
83
- ├── images/
84
- │ ├── 1.jpg
85
- │ └── ...
86
- └── comments.md # appended by Phase4
87
- ```
88
-
89
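- Other files:
90
- - `.collect-state.json`: checkpoint state
91
- - `safe-detail-urls.jsonl`: index of detail links carrying xsec_token
92
- - `run.log / run-events.jsonl`: flow logs
93
-
94
- ## V. Resuming Collection
95
-
96
- If the run is interrupted, simply rerun the command; the script resumes from where it left off: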
97
-
98
- ```bash
99
- node scripts/xiaohongshu/tests/phase1-4-full-collect.mjs --keyword "雷军" --count 200
100
- ```
101
-
102
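- ## VI. Troubleshooting Common Issues
102
-
103
- ### 1. Phase2 Stops Before Reaching END
104
-
105
- After the fix it no longer stops outright; it only enters "scroll failure retry".
106
- If it still exits:
107
- - Check whether the END marker exists
108
- - Check whether the container structure has changed
109
- - Check the SearchGate status
110
-
111
- ### 2. No comments.md
112
-
113
- This means Phase4 did not finish or the note has no comments:
114
- - Search the logs for the `Phase4` keyword
115
- - Check whether risk control was triggered
116
-
117
- ### 3. Wrong Output Path
118
-
119
- It must be: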
121
- ```
122
- ~/.webauto/download/xiaohongshu/{env}/{keyword}/{noteId}/
123
- ```
124
-
125
- ## VII. Locating Logs
126
-
127
- ```bash
128
- # Latest run log
129
- ls -lt ~/.webauto/download/xiaohongshu/{env}/{keyword}/run*.log | head -1
130
-
131
- # Show key errors
132
- rg "ERROR|WARN|风控|phase2_scroll_failure" ~/.webauto/download/xiaohongshu/{env}/{keyword}/run*.log
133
- ```
134
-