@web-auto/webauto 0.1.1 → 0.1.2
- package/apps/desktop-console/default-settings.json +1 -0
- package/apps/desktop-console/dist/main/index.mjs +1618 -0
- package/apps/desktop-console/{src → dist}/main/preload.mjs +10 -0
- package/apps/desktop-console/dist/renderer/index.js +3063 -0
- package/apps/desktop-console/entry/ui-console.mjs +299 -0
- package/apps/webauto/entry/account.mjs +356 -0
- package/apps/webauto/entry/lib/account-detect.mjs +160 -0
- package/apps/webauto/entry/lib/account-store.mjs +587 -0
- package/apps/webauto/entry/lib/profilepool.mjs +1 -1
- package/apps/webauto/entry/xhs-install.mjs +27 -3
- package/apps/webauto/entry/xhs-status.mjs +152 -0
- package/apps/webauto/entry/xhs-unified.mjs +595 -17
- package/bin/webauto.mjs +247 -12
- package/dist/apps/webauto/server.js +66 -0
- package/dist/modules/camo-backend/src/index.js +575 -0
- package/dist/modules/camo-backend/src/internal/BrowserSession.js +817 -0
- package/dist/modules/camo-backend/src/internal/ElementRegistry.js +61 -0
- package/dist/modules/camo-backend/src/internal/ProfileLock.js +85 -0
- package/dist/modules/camo-backend/src/internal/SessionManager.js +172 -0
- package/dist/modules/camo-backend/src/internal/container-matcher.js +852 -0
- package/dist/modules/camo-backend/src/internal/engine-manager.js +258 -0
- package/dist/modules/camo-backend/src/internal/fingerprint.js +203 -0
- package/dist/modules/camo-backend/src/internal/pageRuntime.js +29 -0
- package/dist/modules/camo-backend/src/internal/runtimeInjector.js +30 -0
- package/dist/modules/camo-backend/src/internal/state-bus.js +46 -0
- package/dist/modules/camo-backend/src/internal/storage-paths.js +36 -0
- package/dist/modules/camo-backend/src/internal/ws-server.js +1202 -0
- package/dist/modules/camo-runtime/src/utils/browser-service.mjs +423 -0
- package/dist/modules/camo-runtime/src/utils/config.mjs +77 -0
- package/dist/modules/container-registry/src/index.js +184 -0
- package/dist/modules/logging/src/index.js +92 -0
- package/dist/modules/operations/src/builtin.js +27 -0
- package/dist/modules/operations/src/container-binding.js +75 -0
- package/dist/modules/operations/src/executor.js +146 -0
- package/dist/modules/operations/src/operations/click.js +167 -0
- package/dist/modules/operations/src/operations/extract.js +204 -0
- package/dist/modules/operations/src/operations/find-child.js +17 -0
- package/dist/modules/operations/src/operations/highlight.js +138 -0
- package/dist/modules/operations/src/operations/key.js +61 -0
- package/dist/modules/operations/src/operations/navigate.js +148 -0
- package/dist/modules/operations/src/operations/scroll.js +126 -0
- package/dist/modules/operations/src/operations/type.js +190 -0
- package/dist/modules/operations/src/queue.js +100 -0
- package/dist/modules/operations/src/registry.js +11 -0
- package/dist/modules/operations/src/system/mouse.js +33 -0
- package/dist/modules/state/src/atomic-json.js +33 -0
- package/dist/modules/workflow/blocks/AnchorVerificationBlock.js +71 -0
- package/dist/modules/workflow/blocks/BehaviorRandomizer.js +26 -0
- package/dist/modules/workflow/blocks/CallWorkflowBlock.js +38 -0
- package/dist/modules/workflow/blocks/CloseDetailBlock.js +209 -0
- package/dist/modules/workflow/blocks/CollectBatch.js +137 -0
- package/dist/modules/workflow/blocks/CollectCommentsBlock.js +415 -0
- package/dist/modules/workflow/blocks/CollectSearchListBlock.js +599 -0
- package/dist/modules/workflow/blocks/CollectWeiboPosts.js +229 -0
- package/dist/modules/workflow/blocks/DetectPageStateBlock.js +259 -0
- package/dist/modules/workflow/blocks/EnsureLoginBlock.js +162 -0
- package/dist/modules/workflow/blocks/EnsureSession.js +426 -0
- package/dist/modules/workflow/blocks/ErrorClassifier.js +164 -0
- package/dist/modules/workflow/blocks/ErrorRecoveryBlock.js +319 -0
- package/dist/modules/workflow/blocks/ExpandCommentsBlock.js +1032 -0
- package/dist/modules/workflow/blocks/ExtractDetailBlock.js +310 -0
- package/dist/modules/workflow/blocks/ExtractPostFields.js +88 -0
- package/dist/modules/workflow/blocks/GenerateSmartReplyBlock.js +68 -0
- package/dist/modules/workflow/blocks/GoToSearchBlock.js +497 -0
- package/dist/modules/workflow/blocks/GracefulFallbackBlock.js +104 -0
- package/dist/modules/workflow/blocks/HighlightBlock.js +66 -0
- package/dist/modules/workflow/blocks/InitAutoScroll.js +65 -0
- package/dist/modules/workflow/blocks/LoadContainerDefinition.js +50 -0
- package/dist/modules/workflow/blocks/LoadContainerIndex.js +43 -0
- package/dist/modules/workflow/blocks/LocateAndGuardBlock.js +176 -0
- package/dist/modules/workflow/blocks/LoginRecoveryBlock.js +242 -0
- package/dist/modules/workflow/blocks/MatchContainers.js +64 -0
- package/dist/modules/workflow/blocks/MonitoringBlock.js +190 -0
- package/dist/modules/workflow/blocks/OpenDetailBlock.js +1240 -0
- package/dist/modules/workflow/blocks/OrganizeXhsNotesBlock.js +117 -0
- package/dist/modules/workflow/blocks/PersistXhsNoteBlock.js +270 -0
- package/dist/modules/workflow/blocks/PickSinglePost.js +69 -0
- package/dist/modules/workflow/blocks/ProgressTracker.js +125 -0
- package/dist/modules/workflow/blocks/RecordFixtureBlock.js +44 -0
- package/dist/modules/workflow/blocks/RenderMarkdown.js +48 -0
- package/dist/modules/workflow/blocks/SaveFile.js +54 -0
- package/dist/modules/workflow/blocks/ScrollNextBatch.js +72 -0
- package/dist/modules/workflow/blocks/SessionHealthBlock.js +73 -0
- package/dist/modules/workflow/blocks/StartBrowserService.js +45 -0
- package/dist/modules/workflow/blocks/ValidateContainerDefinition.js +67 -0
- package/dist/modules/workflow/blocks/ValidateExtract.js +35 -0
- package/dist/modules/workflow/blocks/WaitSearchPermitBlock.js +162 -0
- package/dist/modules/workflow/blocks/WaitStable.js +74 -0
- package/dist/modules/workflow/blocks/WarmupCommentsBlock.js +120 -0
- package/dist/modules/workflow/blocks/WorkflowExecutor.js +156 -0
- package/dist/modules/workflow/blocks/XiaohongshuCollectFromLinksBlock.js +1004 -0
- package/dist/modules/workflow/blocks/XiaohongshuCollectLinksBlock.js +1049 -0
- package/dist/modules/workflow/blocks/XiaohongshuFullCollectBlock.js +782 -0
- package/dist/modules/workflow/blocks/helpers/anchorVerify.js +198 -0
- package/dist/modules/workflow/blocks/helpers/asyncWorkQueue.js +53 -0
- package/dist/modules/workflow/blocks/helpers/commentScroller.js +334 -0
- package/dist/modules/workflow/blocks/helpers/commentSectionLocator.js +126 -0
- package/dist/modules/workflow/blocks/helpers/containerAnchors.js +301 -0
- package/dist/modules/workflow/blocks/helpers/debugArtifacts.js +6 -0
- package/dist/modules/workflow/blocks/helpers/downloadPaths.js +29 -0
- package/dist/modules/workflow/blocks/helpers/expandCommentsController.js +53 -0
- package/dist/modules/workflow/blocks/helpers/expandCommentsExtractor.js +129 -0
- package/dist/modules/workflow/blocks/helpers/macosVisionOcrPlugin.js +116 -0
- package/dist/modules/workflow/blocks/helpers/mergeXhsMarkdown.js +109 -0
- package/dist/modules/workflow/blocks/helpers/openDetailController.js +56 -0
- package/dist/modules/workflow/blocks/helpers/openDetailTypes.js +7 -0
- package/dist/modules/workflow/blocks/helpers/openDetailViewport.js +474 -0
- package/dist/modules/workflow/blocks/helpers/openDetailWaiter.js +104 -0
- package/dist/modules/workflow/blocks/helpers/operationLogger.js +195 -0
- package/dist/modules/workflow/blocks/helpers/persistedNotes.js +107 -0
- package/dist/modules/workflow/blocks/helpers/replyExpander.js +260 -0
- package/dist/modules/workflow/blocks/helpers/scrollIntoView.js +138 -0
- package/dist/modules/workflow/blocks/helpers/searchExecutor.js +328 -0
- package/dist/modules/workflow/blocks/helpers/searchGate.js +46 -0
- package/dist/modules/workflow/blocks/helpers/searchPageState.js +164 -0
- package/dist/modules/workflow/blocks/helpers/searchResultWaiter.js +64 -0
- package/dist/modules/workflow/blocks/helpers/simpleAnchor.js +134 -0
- package/dist/modules/workflow/blocks/helpers/smartReply.js +40 -0
- package/dist/modules/workflow/blocks/helpers/systemInput.js +635 -0
- package/dist/modules/workflow/blocks/helpers/targetCountMode.js +9 -0
- package/dist/modules/workflow/blocks/helpers/xhsCliArgs.js +80 -0
- package/dist/modules/workflow/blocks/helpers/xhsCommentDom.js +805 -0
- package/dist/modules/workflow/blocks/helpers/xhsNoteOrganizer.js +140 -0
- package/dist/modules/workflow/blocks/restore/RestorePhaseBlock.js +204 -0
- package/dist/modules/workflow/config/workflowRegistry.js +32 -0
- package/dist/modules/workflow/definitions/batch-collect-workflow.js +63 -0
- package/dist/modules/workflow/definitions/scroll-extract-workflow.js +74 -0
- package/dist/modules/workflow/definitions/xiaohongshu-collect-workflow-v2.js +81 -0
- package/dist/modules/workflow/definitions/xiaohongshu-collect-workflow.js +57 -0
- package/dist/modules/workflow/definitions/xiaohongshu-full-collect-workflow-v3.js +68 -0
- package/dist/modules/workflow/definitions/xiaohongshu-note-collect.js +49 -0
- package/dist/modules/workflow/definitions/xiaohongshu-phase1-workflow-v3.js +30 -0
- package/dist/modules/workflow/definitions/xiaohongshu-phase2-links-workflow-v3.js +40 -0
- package/dist/modules/workflow/definitions/xiaohongshu-phase3-collect-workflow-v1.js +54 -0
- package/dist/modules/workflow/definitions/xiaohongshu-phase34-from-links-workflow-v3.js +25 -0
- package/dist/modules/workflow/src/WeiboEventDrivenWorkflowRunner.js +308 -0
- package/dist/modules/workflow/src/context.js +70 -0
- package/dist/modules/workflow/src/index.js +5 -0
- package/dist/modules/workflow/src/orchestrator.js +230 -0
- package/dist/modules/workflow/src/runner.js +55 -0
- package/dist/modules/workflow/src/runtime.js +70 -0
- package/dist/modules/workflow/workflows/WeiboFeedExtractionWorkflow.js +359 -0
- package/dist/modules/workflow/workflows/XiaohongshuLoginWorkflow.js +110 -0
- package/dist/modules/xiaohongshu/app/src/blocks/MatchCommentsBlock.js +139 -0
- package/dist/modules/xiaohongshu/app/src/blocks/Phase1EnsureServicesBlock.js +36 -0
- package/dist/modules/xiaohongshu/app/src/blocks/Phase1MonitorCookieBlock.js +213 -0
- package/dist/modules/xiaohongshu/app/src/blocks/Phase1StartProfileBlock.js +121 -0
- package/dist/modules/xiaohongshu/app/src/blocks/Phase2CollectLinksBlock.js +1249 -0
- package/dist/modules/xiaohongshu/app/src/blocks/Phase2SearchBlock.js +703 -0
- package/dist/modules/xiaohongshu/app/src/blocks/Phase34CloseDetailBlock.js +41 -0
- package/dist/modules/xiaohongshu/app/src/blocks/Phase34CloseTabsBlock.js +44 -0
- package/dist/modules/xiaohongshu/app/src/blocks/Phase34CollectCommentsBlock.js +150 -0
- package/dist/modules/xiaohongshu/app/src/blocks/Phase34ExtractDetailBlock.js +117 -0
- package/dist/modules/xiaohongshu/app/src/blocks/Phase34OpenDetailBlock.js +102 -0
- package/dist/modules/xiaohongshu/app/src/blocks/Phase34OpenTabsBlock.js +109 -0
- package/dist/modules/xiaohongshu/app/src/blocks/Phase34PersistDetailBlock.js +117 -0
- package/dist/modules/xiaohongshu/app/src/blocks/Phase34ProcessSingleNoteBlock.js +114 -0
- package/dist/modules/xiaohongshu/app/src/blocks/Phase34ValidateLinksBlock.js +90 -0
- package/dist/modules/xiaohongshu/app/src/blocks/Phase3InteractBlock.js +1009 -0
- package/dist/modules/xiaohongshu/app/src/blocks/Phase4MultiTabHarvestBlock.js +233 -0
- package/dist/modules/xiaohongshu/app/src/blocks/ReplyInteractBlock.js +291 -0
- package/dist/modules/xiaohongshu/app/src/blocks/XhsDiscoverFallbackBlock.js +240 -0
- package/dist/modules/xiaohongshu/app/src/blocks/helpers/commentMatchDsl.js +126 -0
- package/dist/modules/xiaohongshu/app/src/blocks/helpers/commentMatcher.js +99 -0
- package/dist/modules/xiaohongshu/app/src/blocks/helpers/evidence.js +27 -0
- package/dist/modules/xiaohongshu/app/src/blocks/helpers/sharding.js +42 -0
- package/dist/modules/xiaohongshu/app/src/blocks/helpers/xhsComments.js +270 -0
- package/dist/modules/xiaohongshu/app/src/index.js +9 -0
- package/dist/modules/xiaohongshu/app/src/utils/checkpoints.js +222 -0
- package/dist/modules/xiaohongshu/app/src/utils/controllerAction.js +43 -0
- package/dist/services/controller/src/controller.js +1476 -0
- package/dist/services/controller/src/index.js +2 -0
- package/dist/services/controller/src/payload-normalizer.js +129 -0
- package/dist/services/shared/heartbeat.js +120 -0
- package/dist/services/shared/lib/errorHandler.js +2 -0
- package/dist/services/shared/serviceProcessLogger.js +139 -0
- package/dist/services/unified-api/RemoteBrowserSession.js +176 -0
- package/dist/services/unified-api/RemoteSessionManager.js +148 -0
- package/dist/services/unified-api/container-operations-handler.js +115 -0
- package/dist/services/unified-api/server.js +652 -0
- package/dist/services/unified-api/state-registry.js +274 -0
- package/dist/services/unified-api/task-persistence.js +66 -0
- package/dist/services/unified-api/task-state.js +130 -0
- package/modules/camo-runtime/src/autoscript/action-providers/xhs/search.mjs +12 -5
- package/modules/xiaohongshu/app/pnpm-lock.yaml +24 -0
- package/package.json +37 -9
- package/.beads/README.md +0 -81
- package/.beads/config.yaml +0 -67
- package/.beads/interactions.jsonl +0 -0
- package/.beads/issues.jsonl +0 -180
- package/.beads/metadata.json +0 -4
- package/.claude/settings.local.json +0 -10
- package/.github/workflows/ci.yml +0 -55
- package/AGENTS.md +0 -253
- package/apps/desktop-console/README.md +0 -27
- package/apps/desktop-console/package-lock.json +0 -897
- package/apps/desktop-console/package.json +0 -20
- package/apps/desktop-console/scripts/build-and-install.mjs +0 -19
- package/apps/desktop-console/scripts/build.mjs +0 -45
- package/apps/desktop-console/scripts/test-preload.mjs +0 -13
- package/apps/desktop-console/src/main/config.mts +0 -26
- package/apps/desktop-console/src/main/core-daemon-manager.mts +0 -131
- package/apps/desktop-console/src/main/desktop-settings.mts +0 -267
- package/apps/desktop-console/src/main/heartbeat-watchdog.mts +0 -50
- package/apps/desktop-console/src/main/heartbeat-watchdog.test.mts +0 -68
- package/apps/desktop-console/src/main/index-streaming.test.mts +0 -20
- package/apps/desktop-console/src/main/index.mts +0 -980
- package/apps/desktop-console/src/main/profile-store.mts +0 -239
- package/apps/desktop-console/src/main/profile-store.test.mts +0 -54
- package/apps/desktop-console/src/main/state-bridge.mts +0 -114
- package/apps/desktop-console/src/main/task-state-types.ts +0 -32
- package/apps/desktop-console/src/renderer/hooks/use-task-state.mts +0 -120
- package/apps/desktop-console/src/renderer/index.mts +0 -133
- package/apps/desktop-console/src/renderer/index.test.mts +0 -34
- package/apps/desktop-console/src/renderer/path-helpers.mts +0 -46
- package/apps/desktop-console/src/renderer/path-helpers.test.mts +0 -14
- package/apps/desktop-console/src/renderer/tabs/debug.mts +0 -48
- package/apps/desktop-console/src/renderer/tabs/debug.test.mts +0 -22
- package/apps/desktop-console/src/renderer/tabs/logs.mts +0 -421
- package/apps/desktop-console/src/renderer/tabs/logs.test.mts +0 -27
- package/apps/desktop-console/src/renderer/tabs/preflight.mts +0 -486
- package/apps/desktop-console/src/renderer/tabs/preflight.test.mts +0 -33
- package/apps/desktop-console/src/renderer/tabs/profile-pool.mts +0 -213
- package/apps/desktop-console/src/renderer/tabs/results.mts +0 -171
- package/apps/desktop-console/src/renderer/tabs/run.test.mts +0 -63
- package/apps/desktop-console/src/renderer/tabs/runtime.mts +0 -151
- package/apps/desktop-console/src/renderer/tabs/settings.mts +0 -146
- package/apps/desktop-console/src/renderer/tabs/xiaohongshu/account-flow.mts +0 -486
- package/apps/desktop-console/src/renderer/tabs/xiaohongshu/guide-browser-check.mts +0 -56
- package/apps/desktop-console/src/renderer/tabs/xiaohongshu/helpers.mts +0 -262
- package/apps/desktop-console/src/renderer/tabs/xiaohongshu/layout-block.mts +0 -430
- package/apps/desktop-console/src/renderer/tabs/xiaohongshu/live-stats.mts +0 -847
- package/apps/desktop-console/src/renderer/tabs/xiaohongshu/run-flow.mts +0 -443
- package/apps/desktop-console/src/renderer/tabs/xiaohongshu-state.mts +0 -425
- package/apps/desktop-console/src/renderer/tabs/xiaohongshu.mts +0 -497
- package/apps/desktop-console/src/renderer/tabs/xiaohongshu.test.mts +0 -291
- package/apps/desktop-console/src/renderer/ui-components.mts +0 -31
- package/docs/README_camoufox_chinese.md +0 -141
- package/docs/USAGE_V3.md +0 -163
- package/docs/arch/OCR_MACOS_PLUGIN.md +0 -39
- package/docs/arch/PORTS.md +0 -40
- package/docs/arch/REGRESSION_CHECKLIST.md +0 -121
- package/docs/arch/SEARCH_GATE.md +0 -224
- package/docs/arch/VIEWPORT_SAFETY.md +0 -182
- package/docs/arch/XIAOHONGSHU_OFFLINE_MOCK_DESIGN.md +0 -267
- package/docs/xiaohongshu-container-driven-summary.md +0 -221
- package/docs/xiaohongshu-full-collect-runbook.md +0 -134
- package/docs/xiaohongshu-next-steps.md +0 -228
- package/docs/xiaohongshu-quickstart.md +0 -73
- package/docs/xiaohongshu-workflow-summary.md +0 -227
- package/modules/container-registry/tests/container-registry.test.ts +0 -16
- package/modules/logging/tests/logging.test.ts +0 -38
- package/modules/operations/tests/operations.test.ts +0 -22
- package/modules/operations/tests/viewport-filter.test.ts +0 -161
- package/modules/operations/tests/visible-only.test.ts +0 -250
- package/modules/session-manager/tests/session-manager.test.ts +0 -23
- package/modules/state/src/atomic-json.test.ts +0 -30
- package/modules/state/src/paths.test.ts +0 -59
- package/modules/state/src/xiaohongshu-collect-state.test.ts +0 -259
- package/modules/workflow/blocks/AnchorVerificationBlock.d.ts.map +0 -1
- package/modules/workflow/blocks/AnchorVerificationBlock.js.map +0 -1
- package/modules/workflow/blocks/DetectPageStateBlock.d.ts.map +0 -1
- package/modules/workflow/blocks/DetectPageStateBlock.js.map +0 -1
- package/modules/workflow/blocks/ErrorRecoveryBlock.d.ts.map +0 -1
- package/modules/workflow/blocks/ErrorRecoveryBlock.js.map +0 -1
- package/modules/workflow/blocks/WaitSearchPermitBlock.d.ts.map +0 -1
- package/modules/workflow/blocks/WaitSearchPermitBlock.js.map +0 -1
- package/modules/workflow/blocks/helpers/containerAnchors.d.ts.map +0 -1
- package/modules/workflow/blocks/helpers/containerAnchors.js.map +0 -1
- package/modules/workflow/blocks/helpers/downloadPaths.test.ts +0 -62
- package/modules/workflow/blocks/helpers/mergeXhsMarkdown.test.ts +0 -121
- package/modules/workflow/blocks/helpers/operationLogger.d.ts.map +0 -1
- package/modules/workflow/blocks/helpers/operationLogger.js.map +0 -1
- package/modules/workflow/blocks/helpers/persistedNotes.test.ts +0 -268
- package/modules/workflow/blocks/helpers/searchPageState.d.ts.map +0 -1
- package/modules/workflow/blocks/helpers/searchPageState.js.map +0 -1
- package/modules/workflow/blocks/helpers/targetCountMode.test.ts +0 -29
- package/modules/workflow/blocks/helpers/xhsCliArgs.test.ts +0 -75
- package/modules/workflow/tests/smartReply.test.ts +0 -32
- package/modules/xiaohongshu/app/src/blocks/Phase3Interact.matcher.test.ts +0 -33
- package/modules/xiaohongshu/app/src/utils/__tests__/checkpoints.test.ts +0 -141
- package/modules/xiaohongshu/app/tests/commentMatchDsl.test.ts +0 -50
- package/modules/xiaohongshu/app/tests/commentMatcher.test.ts +0 -46
- package/modules/xiaohongshu/app/tests/sharding.test.ts +0 -31
- package/package-scripts.json +0 -8
- package/runtime/infra/utils/README.md +0 -13
- package/runtime/infra/utils/scripts/README.md +0 -0
- package/runtime/infra/utils/scripts/development/eval-in-session.mjs +0 -40
- package/runtime/infra/utils/scripts/development/highlight-search-containers.mjs +0 -35
- package/runtime/infra/utils/scripts/service/kill-port.mjs +0 -24
- package/runtime/infra/utils/scripts/service/start-api.mjs +0 -39
- package/runtime/infra/utils/scripts/service/start-browser-service.mjs +0 -106
- package/runtime/infra/utils/scripts/service/stop-api.mjs +0 -18
- package/runtime/infra/utils/scripts/service/stop-browser-service.mjs +0 -104
- package/runtime/infra/utils/scripts/test-services.mjs +0 -94
- package/services/shared/heartbeat.test.ts +0 -102
- package/services/unified-api/__tests__/task-state.test.ts +0 -95
- package/sitecustomize.py +0 -19
- package/tests/README.md +0 -194
- package/tests/e2e/workflows/weibo-feed-extraction.test.ts +0 -171
- package/tests/fixtures/data/container-definitions.json +0 -67
- package/tests/fixtures/pages/simple-page.html +0 -69
- package/tests/integration/01-test-container-match.mjs +0 -188
- package/tests/integration/02-test-dom-branch.mjs +0 -161
- package/tests/integration/03-test-container-operation-system.mjs +0 -91
- package/tests/integration/05-test-container-lifecycle-events.mjs +0 -224
- package/tests/integration/05-test-container-lifecycle-with-events.mjs +0 -250
- package/tests/integration/06-test-container-dom-tree-drawing.mjs +0 -256
- package/tests/integration/07-test-weibo-container-lifecycle.mjs +0 -355
- package/tests/integration/08-test-weibo-feed-workflow.test.mjs +0 -164
- package/tests/integration/10-test-visual-analyzer.mjs +0 -312
- package/tests/integration/11-test-visual-loop.mjs +0 -284
- package/tests/integration/12-test-simple-visual-loop.mjs +0 -242
- package/tests/integration/13-test-visual-robust.mjs +0 -185
- package/tests/integration/14-test-visual-highlight-loop.mjs +0 -271
- package/tests/integration/inspect-page.mjs +0 -50
- package/tests/integration/run-all-tests.mjs +0 -95
- package/tests/patch_verification/CODEX_PATCH_TEST.md +0 -103
- package/tests/patch_verification/PHASE2_ANALYSIS.md +0 -179
- package/tests/patch_verification/PHASE2_OPTIMIZATION_REPORT.md +0 -55
- package/tests/patch_verification/PHASE2_TO_PHASE4_SUMMARY.md +0 -126
- package/tests/patch_verification/QUICK_TEST_SEQUENCE.md +0 -262
- package/tests/patch_verification/README.md +0 -143
- package/tests/patch_verification/RUN_TESTS.md +0 -60
- package/tests/patch_verification/TEST_EXECUTION.md +0 -99
- package/tests/patch_verification/TEST_PLAN.md +0 -328
- package/tests/patch_verification/TEST_RESULTS.md +0 -34
- package/tests/patch_verification/TOOL_TEST_PLAN.md +0 -48
- package/tests/patch_verification/run-tool-test.mjs +0 -121
- package/tests/patch_verification/temp_test_files/test01.txt +0 -1
- package/tests/patch_verification/temp_test_files/test02.txt +0 -3
- package/tests/patch_verification/temp_test_files/test02_gnu.txt +0 -3
- package/tests/patch_verification/temp_test_files/test03.txt +0 -1
- package/tests/patch_verification/temp_test_files/test03_multiline.txt +0 -5
- package/tests/patch_verification/temp_test_files/test04_function.ts +0 -5
- package/tests/patch_verification/temp_test_files/test05_import.ts +0 -4
- package/tests/patch_verification/temp_test_files/test06_special_chars.txt +0 -4
- package/tests/patch_verification/temp_test_files/test07_indentation.ts +0 -5
- package/tests/patch_verification/temp_test_files/test08_mismatch.txt +0 -1
- package/tests/patch_verification/temp_test_files/test_add_02.txt +0 -3
- package/tests/patch_verification/temp_test_files/test_simple.txt +0 -1
- package/tests/runner/TestReporter.mjs +0 -57
- package/tests/runner/TestRunner.mjs +0 -244
- package/tests/unit/commands/profile.test.mjs +0 -10
- package/tests/unit/container/change-notifier.test.mjs +0 -181
- package/tests/unit/lifecycle/session-registry.test.mjs +0 -135
- package/tests/unit/operations/registry.test.ts +0 -73
- package/tests/unit/utils/browser-service.test.mjs +0 -153
- package/tests/unit/utils/config.test.mjs +0 -166
- package/tests/unit/utils/fingerprint.test.mjs +0 -166
- package/tsconfig.json +0 -31
- package/tsconfig.services.json +0 -26
- package/apps/desktop-console/{src → dist}/renderer/index.html +0 -0
- package/apps/desktop-console/{src/renderer/tabs → dist/renderer}/run.mts +0 -0
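@@ -1,267 +0,0 @@
-# Xiaohongshu Collection: Persistence Node and Offline Mock Test Design
-
-> Goal: verify the full "detail extraction + comment collection + persistence to disk" chain without depending on live pages or URL navigation, giving later production-scale collection a stable closed loop.
-
-## 1. Persistence node: PersistXhsNoteBlock
-
-### 1.1 Responsibilities
-
-- Single responsibility: write the current Note's structured content (detail + comments) into the local directory structure;
-- No DOM access and no container operations; it handles only plain data and the file system;
-- All output paths live under `~/.webauto/download/xiaohongshu/{env}/`.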
-
-### 1.2 Input
-
-Provided by the upstream Workflow context (usually from ExtractDetailBlock / CollectCommentsBlock):
-
-- `sessionId: string`
-- `env: string`: environment tag, e.g. `debug` / `prod`
-- `platform?: string`: defaults to `'xiaohongshu'`
-- `keyword: string`
-- `noteId: string`
-- `detailUrl?: string`: URL of the current detail page (carries the xsec token; used for display only, never for navigation)
-- `detail: { ... }`:
-  - Must contain at least: `title`, `contentText`, `gallery: { images: string[] }`
-  - Other fields follow the output structure of ExtractDetailBlock
-- `commentsResult: { ... }`:
-  - Must contain at least:
-    - `comments: Array<{ user_name?, user_id?, timestamp?, text? }>`
-    - `totalFromHeader?: number`
-    - `reachedEnd?: boolean`
-    - `emptyState?: boolean`
-
-### 1.3 Output
-
-```ts
-interface PersistXhsNoteOutput {
-  success: boolean;
-  error?: string;
-  outputDir?: string; // post directory actually written to disk
-  contentPath?: string; // path to content.md
-  imagesDir?: string; // path to the images directory
-}
-```
-
-### 1.4 Directory structure
-
-- Root directory: `~/.webauto/download/xiaohongshu/{env}/`
-- Keyword directory: `{root}/{sanitize(keyword)}/`
-- Per-Note directory: `{root}/{sanitize(keyword)}/{noteId}/`
-  - `content.md`: post + comments as Markdown
-  - `images/`: image files
-
-`sanitize(keyword)`: reuses the existing implementation, replacing characters such as `\/:*?"<>|` with `_`, then trimming.
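The directory rules in section 1.4 can be sketched as below. This is a minimal illustration, not the package's actual `sanitize` implementation (which is not shown in this diff). The helper name `noteDir` and the choice to join with `/` are assumptions made for the example.

```typescript
// Hypothetical sketch of the section 1.4 rules; the real sanitize() may differ.
function sanitize(keyword: string): string {
  // Replace each of \ / : * ? " < > | with "_", then trim surrounding whitespace.
  return keyword.replace(/[\\/:*?"<>|]/g, '_').trim();
}

// Builds {root}/{sanitize(keyword)}/{noteId}. `noteDir` is an illustrative name.
function noteDir(root: string, keyword: string, noteId: string): string {
  const base = root.replace(/\/+$/, ''); // drop trailing slashes on the root
  return [base, sanitize(keyword), noteId].join('/');
}
```

For example, `noteDir('~/.webauto/download/xiaohongshu/debug/', 'a?b', '123')` yields a path whose keyword segment is `a_b`. Expanding `~` to the real home directory is left to the caller.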
-
-### 1.5 Write-to-disk logic
-
-1. **Directory creation**
-   - Ensure `root/keywordDir/postDir/imagesDir` exist, in that order;
-   - Use the Node ESM FS API: `fs.promises.mkdir(dir, { recursive: true })`.
-
-2. **Image download**
-   - Source: `detail.gallery.images: string[]`;
-   - Preprocessing:
-     - Drop empty values and trim both ends;
-     - Prefix URLs that start with `//` with `https:`;
-     - Keep only `http/https` URLs;
-   - Download strategy:
-     - Use `fetch(url)` to get the response, then `arrayBuffer()` → `Buffer`;
-     - File name: `images/{index}.jpg` (`01.jpg`, `02.jpg`, ...; preserving order is enough);
-     - If a single image fails: skip that URL and log a warning without failing the whole Block;
-   - Returns:
-     - A list of local relative paths, e.g. `['images/01.jpg', 'images/02.jpg', ...]`.
-
-3. **content.md structure**
-
-   Example structure (consistent with the existing `collect-100-workflow-v2.mjs`, but with the file name standardized to `content.md`). The Chinese strings are literal output: 无标题 = untitled, 关键词 = keyword, 链接 = link, 作者 = author, 评论统计 = comment stats, 抓取 = fetched, 未知 = unknown, 是/否 = yes/no, 正文 = body, 无正文 = no body text, 图片 = images, 评论 = comments.
-
-   ```markdown
-   # {title || '无标题'}
-
-   - Note ID: {noteId}
-   - 关键词: {keyword}
-   - 链接: {detailUrl}
-   - 作者: {author}
-   - 评论统计: 抓取={comments.length}, header={totalFromHeader|未知}(reachedEnd={是/否}, empty={是/否})
-
-   ## 正文
-
-   {contentText, or the placeholder "(无正文)"}
-
-   ## 图片
-
-   
-   
-   ...
-
-   ## 评论
-
-   - **username**(user_id) [time]:comment text
-   ...
-   ```
-
-   Field selection strategy:
-
-   - `title`: prefer the title field in detail.header/content, falling back to the list item's title;
-   - `author`: taken from `author/user_name/nickname` in detail.header;
-   - `contentText`: assembled from the body-text fields in detail.content;
-   - `评论统计` (comment stats): filled from `commentsResult.comments/totalFromHeader/reachedEnd/emptyState`.
-
-   Comment rendering rules:
-
-   - Iterate over `commentsResult.comments`:
-     - `user = user_name || username || '未知用户'` ("unknown user")
-     - `uid = user_id || ''`
-     - `ts = timestamp || ''`
-     - `text = text || ''`
-     - Emit: `- **{user}**({uid}) [{ts}]:{text}`
-   - When `comments.length === 0`, write `(无评论)` ("no comments").
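The image preprocessing and comment rendering rules in section 1.5 can be sketched as pure functions. This is a hedged illustration, not the code of `PersistXhsNoteBlock`. The names `normalizeImageUrls`, `imageRelPath`, and `renderComments` are hypothetical, and numbering images by their position in the filtered list is an assumption the design does not spell out.

```typescript
interface XhsComment {
  user_name?: string;
  username?: string;
  user_id?: string;
  timestamp?: string | number;
  text?: string;
}

// Trim, drop empties, add "https:" to protocol-relative URLs, keep http/https only.
function normalizeImageUrls(images: Array<string | null | undefined>): string[] {
  const out: string[] = [];
  for (const raw of images) {
    let url = (raw || '').trim();
    if (!url) continue;
    if (url.indexOf('//') === 0) url = 'https:' + url;
    if (/^https?:\/\//i.test(url)) out.push(url);
  }
  return out;
}

// images/01.jpg, images/02.jpg, ... (assumed: 1-based, two-digit zero padding).
function imageRelPath(index: number): string {
  const n = index + 1;
  return 'images/' + (n < 10 ? '0' + n : String(n)) + '.jpg';
}

// One Markdown bullet per comment; the literal strings come from the design doc.
function renderComments(comments: XhsComment[]): string {
  if (comments.length === 0) return '(无评论)';
  return comments
    .map((c) => {
      const user = c.user_name || c.username || '未知用户';
      const uid = c.user_id || '';
      const ts = c.timestamp || '';
      const text = c.text || '';
      return `- **${user}**(${uid}) [${ts}]:${text}`;
    })
    .join('\n');
}
```

Keeping these steps free of file-system and network calls means they can be tested directly against a fixture. The actual download, with its per-image skip and warning, would wrap `fetch` around the normalized list.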
-
----
-
-## 2. Live data → local fixture JSON
-
-> Collect online once, reuse offline many times.
-
-### 2.1 Where to record
-
-- During the real (online) run, add debug output after the following Blocks (enabled only under `DEBUG` or in specific environments):
-  - after `ExtractDetailBlock` completes;
-  - after `CollectCommentsBlock` completes.
-- Aggregate the two outputs into a single structure:
-
-```ts
-interface XhsNoteFixture {
-  noteId: string;
-  keyword: string;
-  detailUrl: string;
-  detail: any; // full output of ExtractDetailBlock
-  commentsResult: any; // full output of CollectCommentsBlock
-  capturedAt: string; // ISO timestamp
-}
-```
-
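-### 2.2 Output path
-
-- Fixtures are stored under the user's home directory, not in the repository:
-  - `~/.webauto/fixtures/xiaohongshu/{noteId}.json`
-- Written by a small utility function or by debug logic inside the Block:
-  - This step is optional; fixtures are written only in debug/replay mode so routine tasks do not pile up fixtures.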
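A minimal sketch of the "small utility function" mentioned in section 2.2, assuming the fixture shape from section 2.1. The names `buildFixture` and `fixturePath` are hypothetical, the home directory is passed in explicitly rather than looked up, and the actual write (plus the debug-mode gate) is left out.

```typescript
interface XhsNoteFixture {
  noteId: string;
  keyword: string;
  detailUrl: string;
  detail: unknown; // full output of ExtractDetailBlock
  commentsResult: unknown; // full output of CollectCommentsBlock
  capturedAt: string; // ISO timestamp
}

// Aggregates the two Block outputs into one fixture record.
function buildFixture(
  input: { noteId: string; keyword: string; detailUrl: string; detail: unknown; commentsResult: unknown },
  now: Date = new Date(),
): XhsNoteFixture {
  return {
    noteId: input.noteId,
    keyword: input.keyword,
    detailUrl: input.detailUrl,
    detail: input.detail,
    commentsResult: input.commentsResult,
    capturedAt: now.toISOString(),
  };
}

// {home}/.webauto/fixtures/xiaohongshu/{noteId}.json
function fixturePath(homeDir: string, noteId: string): string {
  return homeDir.replace(/\/+$/, '') + '/.webauto/fixtures/xiaohongshu/' + noteId + '.json';
}
```

Passing `now` as a parameter keeps `capturedAt` deterministic in tests.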
-
-### 2.3 Uses
-
-- Unit/integration tests for PersistXhsNoteBlock take the fixture directly as input, with no dependency on a browser or the DOM;
-- It also serves as the raw data source for generating the offline mock HTML page.
-
----
-
-## 3. Fixture JSON → mock HTML detail page
-
-> Goal: build a local HTML page whose structure resembles a Xiaohongshu detail page, so the container system and Blocks can run the full chain locally.
-
-### 3.1 Generator script
-
-- New script: `scripts/xiaohongshu/tests/generate-detail-mock-page.mjs`
-- Input:
-  - `--noteId <id>`: reads data from `~/.webauto/fixtures/xiaohongshu/{noteId}.json`;
-  - `--output <path>` (optional): defaults to `~/.webauto/fixtures/xiaohongshu/detail-{noteId}.html`.
-- Output:
-  - A complete HTML file that mimics the layout and class structure of the live detail page.
-
-### 3.2 DOM structure design (aligned with containers)
-
-The mock DOM must align with the following container IDs/selectors:
-
-- Detail containers:
-  - `xiaohongshu_detail.modal_shell` / `xiaohongshu_detail`: outermost modal container;
-  - `xiaohongshu_detail.header`: title and author info area;
-  - `xiaohongshu_detail.content`: body text area;
-  - `xiaohongshu_detail.gallery`: image area.
-- Comment containers:
-  - `xiaohongshu_detail.comment_section`: root container of the comment section;
-  - `xiaohongshu_detail.comment_section.comment_item`: a single comment node;
-  - `xiaohongshu_detail.comment_section.show_more_button`: the "show more" button;
-  - `xiaohongshu_detail.comment_section.end_marker`: end marker (optional).
-
-Layout notes:
-
-- Use classes / DOM hierarchy that match the selectors in the container JSON;
-- Generate one `.comment-item` per comment, containing:
-  - a user name element (e.g. `.user-name`);
-  - a user link/ID (in `data-user-id` or `<a href="/user">`);
-  - a time element;
-  - a text element.
-
-### 3.3 Simulating "show more comments"
-
-- Insert several `.show-more` buttons and collapsed blocks:
-  - The first comments (e.g. the first N) are visible immediately;
-  - Later comments are wrapped in a `div` with `style="display:none"`;
-  - A `.show-more` element is inserted before it.
-- Insert a small inline JS snippet at the bottom of the page:
-
-```js
-document.addEventListener('click', (e) => {
-  const btn = e.target.closest('.show-more');
-  if (!btn) return;
-  const block = btn.nextElementSibling;
-  if (block) {
-    block.style.display = 'block';
-    btn.remove();
-  }
-});
-```
-
-- Purpose: let `WarmupCommentsBlock` + `CollectCommentsBlock` expand comments locally through container clicks, matching the online behavior.
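The comment markup described in sections 3.2 and 3.3 could be generated along these lines. This is a sketch under assumptions, not the planned `generate-detail-mock-page.mjs`. The class names `.comment-item`, `.user-name`, and `.show-more` and the `data-user-id` attribute come from the design; `.comment-section`, `.time`, `.text`, the button label, and all function names are placeholders that would need to match the real container selectors.

```typescript
interface MockComment {
  user_name?: string;
  user_id?: string;
  timestamp?: string;
  text?: string;
}

// Escape text before embedding it in HTML attributes or element content.
function escapeHtml(s: string): string {
  return s
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

function renderCommentItem(c: MockComment): string {
  return [
    `<div class="comment-item" data-user-id="${escapeHtml(c.user_id || '')}">`,
    `  <span class="user-name">${escapeHtml(c.user_name || '')}</span>`,
    `  <span class="time">${escapeHtml(c.timestamp || '')}</span>`,
    `  <p class="text">${escapeHtml(c.text || '')}</p>`,
    `</div>`,
  ].join('\n');
}

// The first `visibleCount` comments are shown; the rest sit in a hidden div that
// directly follows a .show-more element, so nextElementSibling finds the block.
function renderCommentSection(comments: MockComment[], visibleCount: number): string {
  const parts = comments.slice(0, visibleCount).map((c) => renderCommentItem(c));
  const hidden = comments.slice(visibleCount).map((c) => renderCommentItem(c));
  if (hidden.length > 0) {
    parts.push('<div class="show-more">show more</div>');
    parts.push('<div style="display:none">\n' + hidden.join('\n') + '\n</div>');
  }
  return '<div class="comment-section">\n' + parts.join('\n') + '\n</div>';
}
```

Placing the hidden block immediately after its `.show-more` element is what lets the inline click handler above reveal it with `btn.nextElementSibling`.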
-
-### 3.4 Simulating the image area
-
-- Use `detail.gallery.images` from the fixture:
-  - Generate a list of `<img>` elements under the gallery container, with classes that match the container definition, e.g.:
-    - `.note-img img`, `.note-scroller img`, etc.;
-  - `src` uses the live URL directly (downloading is the job of PersistXhsNoteBlock).
-
----
-
|
|
226
|
-
## 4. 基于仿真页的测试策略
|
|
227
|
-
|
|
228
|
-
### 4.1 PersistXhsNoteBlock 单块测试
|
|
229
|
-
|
|
230
|
-
1. 使用 fixture JSON 作为直接输入,不依赖 HTML/浏览器;
|
|
231
|
-
2. 调用 `PersistXhsNoteBlock.execute()`;
|
|
232
|
-
3. 断言:
|
|
233
|
-
- 目录结构:`~/.webauto/download/xiaohongshu/{env}/{keyword}/{noteId}/` 存在;
|
|
234
|
-
- `content.md` 内容完整(标题、元信息、正文、图片引用、评论);
|
|
235
|
-
- `images/` 下图片数量与 `gallery.images` 数量基本一致(允许部分下载失败但有告警)。
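
The assertions above might be collected in a helper like the one below. It works on values that a test has already read from disk; the fixture fields (`title`, `gallery.images`) and the `maxMissingImages` tolerance are assumptions made for illustration.

```javascript
// Sketch only: fixture fields and the tolerance value are assumptions.
function checkPersistedNote({ contentMd, imageFiles, fixture, maxMissingImages = 1 }) {
  const problems = [];
  if (!contentMd.includes(fixture.title)) {
    problems.push('title missing from content.md');
  }
  const expected = fixture.gallery.images.length;
  const missing = expected - imageFiles.length;
  // A few failed downloads are tolerated; more than that is reported.
  if (missing > maxMissingImages) {
    problems.push(`only ${imageFiles.length} of ${expected} images were saved`);
  }
  return problems;
}
```

Returning a list of problems instead of throwing on the first one lets a single test run report every mismatch at once.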

### 4.2 Offline E2E for the single-note workflow

1. Start the Browser Service, but navigate to the locally generated mock HTML:
   - for example `http://127.0.0.1:port/xhs-mock/detail-{noteId}.html`;
   - the URL contains no online domain, and no xsec-less link is constructed.
2. Run `runWorkflowById('xiaohongshu-note-collect', { sessionId, keyword, env, noteId, detailUrl: mockUrl })`:
   - internally it still uses the container system for anchor location, scrolling, and comment expansion;
   - `CollectCommentsBlock` and `ExtractDetailBlock` both run against the local mock DOM.
3. Verify that:
   - every step in `WorkflowExecutionResult` reports success;
   - the persisted result matches the fixture (comment count, title, body text, and so on).
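
To guard the "no online domain" rule in step 1, a test could build the mock URL and check it before handing it to the workflow. Both helper names below are hypothetical, and the port is whatever the local mock server happens to listen on.

```javascript
// Sketch only: helper names are hypothetical; the path follows the example URL above.
function buildMockDetailUrl(port, noteId) {
  return `http://127.0.0.1:${port}/xhs-mock/detail-${encodeURIComponent(noteId)}.html`;
}

// True only for loopback URLs that point into the mock directory.
function isLocalMockUrl(url) {
  const { hostname, pathname } = new URL(url);
  const isLoopback = hostname === '127.0.0.1' || hostname === 'localhost';
  return isLoopback && pathname.startsWith('/xhs-mock/');
}
```

Calling `isLocalMockUrl(mockUrl)` before `runWorkflowById(...)` makes an accidental online navigation fail fast in the offline suite.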
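
### 4.3 Full-chain integration (optional)

- In debug mode, replace the search stage with a simplified workflow that jumps straight to the local mock detail page, in order to verify that:
  - the top-level workflow chains correctly with `CallWorkflowBlock`;
  - the note-collect node can be invoked repeatedly and writes to disk correctly;
  - the run no longer depends on the real search page or online scrolling.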

---

## 5. Impact on existing code (planned)

1. New block: `PersistXhsNoteBlock` (depends only on the Node FS and fetch, not on containers or the browser context);
2. New script: `scripts/xiaohongshu/tests/generate-detail-mock-page.mjs`;
3. Moderate changes:
   - add fixture-recording logic to the online debug scripts / workflows (can be gated by a DEBUG switch);
   - insert `PersistXhsNoteBlock` into the `xiaohongshu-note-collect` workflow definition.

With this design, the complex "Xiaohongshu detail + comments" scenario can be replayed reliably on a local machine. A mock DOM driven by real data verifies the containers, blocks, and persistence logic without relying on online pages or URL navigation, which significantly reduces debugging cost and the risk of triggering risk control.
@@ -1,221 +0,0 @@
# Xiaohongshu Container-Driven Migration Summary

> Date: 2025-01-06
> Status: ✅ Done
> Goal: switch the Xiaohongshu collection pipeline entirely to container-driven mode

## ✅ Completed items

### 1. Login anchor model definition ✅

**File**: `container-library/xiaohongshu/README.md`

**Conventions**:
- **Logged in**: `*.login_anchor` (matches the login anchor container on any page)
- **Not logged in**: `xiaohongshu_login.login_guard` (the core control of the login page)
- **Indeterminate**: neither container matches

**Container selectors**:
- `*.login_anchor`: `a.link-wrapper[title="我"]`
- `xiaohongshu_login.login_guard`: the core control of the login page
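
The three-state convention can be expressed as a small classifier. This sketch takes a flat list of matched container IDs rather than the container tree, and the return labels (`logged_in`, `logged_out`, `uncertain`) are names chosen here for illustration.

```javascript
// Sketch only: input is a flat list of matched container IDs; labels are assumptions.
function classifyLoginState(matchedIds) {
  // Any "<page>.login_anchor" container means the user is logged in.
  if (matchedIds.some((id) => /\.login_anchor$/.test(id))) return 'logged_in';
  // The login page's guard container means the user is not logged in.
  if (matchedIds.some((id) => /xiaohongshu_login\.login_guard$/.test(id))) return 'logged_out';
  // Neither matched: the state cannot be determined from containers alone.
  return 'uncertain';
}
```

Checking the anchor first mirrors the convention's order: a logged-in signal wins, and the guard is consulted only when no anchor is present.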

### 2. Launcher login detection rework ✅

**File**: `launcher/core/launcher.mjs`

**Changes**:
- Removed hard-coded DOM queries
- Now calls `containers:match` to obtain the container tree
- Recursively searches for `*.login_anchor` and `xiaohongshu_login.login_guard`
- No longer reads globals such as `__INITIAL_STATE__` directly

**Key code**:
```typescript
function findContainer(tree, pattern) {
  if (!tree) return null;
  if (pattern.test(tree.id || tree.defId || '')) return tree;
  // Recursive search; assumes child nodes are listed under `children`.
  for (const child of tree.children ?? []) {
    const found = findContainer(child, pattern);
    if (found) return found;
  }
  return null;
}

const loginAnchor = findContainer(tree, /\.login_anchor$/);
const loginGuard = findContainer(tree, /xiaohongshu_login\.login_guard$/);
```

### 3. Workflow Block implementation ✅

**File**: `modules/workflow/blocks/EnsureLoginBlock.ts`

**Behavior**:
- Looks up containers through the `containers:match` API
- `*.login_anchor` matched → returns `isLoggedIn: true`
- `login_guard` matched → waits for manual login
- Timeout protection (2 minutes by default)

**Interface**:
```typescript
interface EnsureLoginInput {
  sessionId: string;
  serviceUrl?: string;
  maxWaitMs?: number;
  checkIntervalMs?: number;
}

interface EnsureLoginOutput {
  isLoggedIn: boolean;
  loginMethod: 'container_match' | 'manual_wait' | 'timeout';
  matchedContainer?: string;
  waitTimeMs?: number;
  error?: string;
}
```
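
One way to picture how the three `loginMethod` values arise is a per-poll decision step, sketched below. It is not the actual `EnsureLoginBlock` logic: the `state` labels, the `sawGuard` flag, and the decision order are assumptions, and only the output fields and the 2-minute default come from the text above.

```javascript
// Sketch only: state labels, sawGuard, and decision order are assumptions.
function evaluateLoginPoll({ state, waitedMs, sawGuard = false, maxWaitMs = 120000 }) {
  if (state === 'logged_in') {
    return {
      done: true,
      output: {
        isLoggedIn: true,
        // If the login guard was seen earlier, the user logged in manually.
        loginMethod: sawGuard ? 'manual_wait' : 'container_match',
        waitTimeMs: waitedMs,
      },
    };
  }
  if (waitedMs >= maxWaitMs) {
    return {
      done: true,
      output: {
        isLoggedIn: false,
        loginMethod: 'timeout',
        waitTimeMs: waitedMs,
        error: `login not detected within ${maxWaitMs} ms`,
      },
    };
  }
  // Keep polling, remembering whether the login guard has been observed.
  return { done: false, sawGuard: sawGuard || state === 'logged_out' };
}
```

A caller would run this once per `checkIntervalMs` tick, feeding back `sawGuard` from the previous result until `done` is true.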

### 4. Debug script rework ✅

**Files**:
- `scripts/xiaohongshu/tests/status-v2.mjs` - status check
- `scripts/xiaohongshu/tests/phase1-session-login.mjs` - login watchdog
- `scripts/debug-xhs-search.mjs` - Unattached search verification
- `scripts/debug-xhs-detail.mjs` - Unattached detail-page interaction

**Key points**:
- Removed hard-coded DOM logic (e.g. `if (url.includes('xiaohongshu'))`)
- Matching is based entirely on container IDs
- Prefer refreshing the page over re-navigating
- Restore the initial state after each test

### 5. Documentation updates ✅

**Files**:
- `container-library/xiaohongshu/README.md` - login anchor conventions
- `AGENTS.md` - Unattached mode rules for debug scripts
- `task.md` - full task tracking

## 📊 Container-driven comparison

### ❌ Old approach (hard-coded DOM)

```javascript
// Do not write code like this
if (url.includes('xiaohongshu.com')) {
  const avatar = await page.$('a[title="我"]');
  if (avatar) return true;
}
```

**Problems**:
- DOM selectors break easily
- Platform-specific logic is scattered
- Hard to test and maintain
- Violates the layering principle

### ✅ New approach (container-driven)

```typescript
// Recommended: based on container IDs
const result = await controllerAction('containers:match', { profile, url });
// Assumption: the container tree is exposed as `result.tree`; adjust to the actual response shape.
const tree = result.tree;
const loginAnchor = findContainer(tree, /\.login_anchor$/);
if (loginAnchor) {
  return { isLoggedIn: true };
}
```

**Advantages**:
- Platform-agnostic (the same code supports Weibo, Douyin, and others)
- Selectors are centralized in the container definitions
- Easy to test and verify
- Consistent with the layered architecture

## 🔄 Data flow

### Login detection flow

```
1. Launcher / Workflow
   ↓
2. Call containers:match
   ↓
3. Obtain the container tree
   ↓
4. Recursively search for *.login_anchor
   ↓
5a. Matched → logged in
   ↓
5b. Not matched, search for xiaohongshu_login.login_guard
   ↓
6a. Matched → not logged in, wait for manual login
   ↓
6b. Not matched → indeterminate state
```

### Workflow execution flow

```
1. EnsureSessionBlock
   ↓
2. EnsureLoginBlock (container-driven)
   ↓
3. GoToSearchBlock (container-driven)
   ↓
4. PickNoteBlock (container-driven)
   ↓
5. OpenDetailBlock (container-driven)
   ↓
6. ExpandCommentsBlock (container-driven)
```

## 📝 Key files

| File | Status | Description |
|------|--------|-------------|
| `container-library/xiaohongshu/README.md` | ✅ | Login anchor conventions |
| `launcher/core/launcher.mjs` | ✅ | Container-driven login detection |
| `modules/workflow/blocks/EnsureLoginBlock.ts` | ✅ | Generic login block |
| `scripts/xiaohongshu/tests/status-v2.mjs` | ✅ | Container-driven status check |
| `scripts/xiaohongshu/tests/phase1-session-login.mjs` | ✅ | Container-driven login watchdog |
| `scripts/debug-xhs-search.mjs` | ✅ | Unattached search verification |
| `scripts/debug-xhs-detail.mjs` | ✅ | Unattached detail-page interaction |
| `AGENTS.md` | ✅ | Unattached mode rules |
| `task.md` | ✅ | Full task tracking |

## 🎯 Verification tests

### Test commands

```bash
# 1. Check session status (container-driven)
node scripts/xiaohongshu/tests/status-v2.mjs

# 2. One-command startup (container-driven login detection)
node scripts/start-headful.mjs --profile xiaohongshu_fresh --url https://www.xiaohongshu.com

# 3. Search verification (Unattached mode)
node scripts/debug-xhs-search.mjs

# 4. Detail page test (Unattached mode)
node scripts/debug-xhs-detail.mjs
```

### Expected results

- No script hard-codes DOM logic any more
- Login state is determined entirely by container matching
- Debug scripts reuse the existing session
- Workflows can reuse `EnsureLoginBlock` directly

## 🚀 Next steps

1. Run the test scripts to verify the container-driven migration
2. Create the first complete container-driven workflow
3. Run a small-scale collection test (5 items)
4. Refactor `XiaohongshuCrawlerBlock` to use the new architecture

## 📚 Reference documents

- `container-library/xiaohongshu/README.md` - container definitions + login anchor conventions
- `task.md` - current task tracking
- `AGENTS.md` - architecture rules
- `docs/xiaohongshu-next-steps.md` - detailed task list

---

**Completed at**: 2025-01-06 09:30
**Outcome**: the Xiaohongshu pipeline is 100% container-driven
@@ -1,134 +0,0 @@
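# Xiaohongshu Full-Pipeline Collection Runbook (Phase1-4)

> Goal: ensure that Phase3/4 comment collection still runs when the list falls short of the target, so that a list scroll failure does not interrupt the later phases.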

## I. Pre-checks

### 1. Service status

```bash
curl http://127.0.0.1:7701/health
curl http://127.0.0.1:7704/health
```

### 2. Session status

```bash
node scripts/xiaohongshu/tests/status-v2.mjs
```

Confirm that:
- the session `xiaohongshu_fresh` exists
- the login anchor `*.login_anchor` is matched
- the current page is a normal Xiaohongshu page

### 3. SearchGate

If a search is needed (Phase2 triggers one):

```bash
node scripts/search-gate-server.mjs
```

## II. Starting the full pipeline

### 1. Start a full collection run

```bash
node scripts/xiaohongshu/tests/phase1-4-full-collect.mjs --keyword "雷军" --count 200
```

Output directory (mandatory standard):
```
~/.webauto/download/xiaohongshu/{env}/{keyword}/{noteId}/
```

The default is env=`download` (adjustable through a parameter).
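
The output path convention can be captured in one helper, sketched below. The function name and argument shape are hypothetical; a real implementation would more likely use `os.homedir()` and `path.join` from Node's standard library instead of manual string joining.

```javascript
// Sketch only: name and argument shape are assumptions; env defaults to "download".
function noteOutputDir({ homeDir, keyword, noteId, env = 'download' }) {
  const segments = [homeDir, '.webauto', 'download', 'xiaohongshu', env, keyword, noteId];
  return segments.join('/') + '/';
}
```

Keeping the layout in one place helps a writer such as Phase4 and any test that checks the result agree on the same directory.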

### 2. Phases

| Phase | Function | Exit condition |
|-------|----------|----------------|
| Phase1 | Confirm services/session/login | On success, proceed to Phase2 |
| Phase2 (ListOnly) | Collect the search list + obtain safe-detail-urls | **List scroll failures do not affect later phases** |
| Phase3 | Open detail pages based on safe-detail-urls | Skipped if safe-detail-urls is empty |
| Phase4 | Collect comments and write them to disk | Completes note by note, appending to comments.md incrementally |
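
### 3. Key requirements (mandatory)

- **No direct URL jumps**: detail pages must be entered by clicking from the search page
- **SearchGate throttling**: a search must request a permit first
- **Container anchors**: hard-coded DOM is forbidden
- **Scrolling must stay within the viewport**: off-screen operations are forbidden
- **The content file must be named `content.md`** (not README.md)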

## III. Phase2 scroll failure handling (behavior after the fix)

### Goals

- **The list is treated as fully scrolled only when the END marker is detected**
- **If a scroll fails, keep retrying the scroll**
- **Even if Phase2 falls short of the target, Phase3/4 still run**

### Abnormal exit logic (after the fix)

- 3 consecutive failed scroll rounds (3 retries per round) → Phase2 is marked as an abnormal exit
- **The pipeline is not interrupted**: Phase3/4 continue with the safe-detail-urls already collected
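
The retry rules above can be sketched as a loop. The callbacks here are synchronous stand-ins (a real run would await browser actions), and the function name, result labels, and the `maxRounds` safety cap are assumptions added for the sketch; only the 3 rounds × 3 retries and the END-marker rule come from the text.

```javascript
// Sketch only: callbacks are synchronous stand-ins; maxRounds is an assumed safety cap.
function runPhase2Scroll({
  scrollOnce,
  reachedEnd,
  maxFailedRounds = 3,
  retriesPerRound = 3,
  maxRounds = 1000,
}) {
  let failedRounds = 0;
  for (let round = 0; round < maxRounds; round += 1) {
    // Only the END marker counts as truly reaching the bottom.
    if (reachedEnd()) return { status: 'end_reached', rounds: round };
    let scrolled = false;
    for (let attempt = 0; attempt < retriesPerRound && !scrolled; attempt += 1) {
      scrolled = scrollOnce();
    }
    // Failures must be consecutive, so any successful round resets the counter.
    failedRounds = scrolled ? 0 : failedRounds + 1;
    if (failedRounds >= maxFailedRounds) {
      // Abnormal exit is recorded, but the caller still continues with Phase3/4.
      return { status: 'scroll_failed', rounds: round + 1 };
    }
  }
  return { status: 'max_rounds', rounds: maxRounds };
}
```

The caller treats `scroll_failed` as a warning rather than an error and moves on to Phase3/4 with whatever safe-detail-urls were collected.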

## IV. Output directory structure

```
~/.webauto/download/xiaohongshu/{env}/{keyword}/{noteId}/
├── content.md
├── images/
│   ├── 1.jpg
│   └── ...
└── comments.md         # appended by Phase4
```

Other files:
- `.collect-state.json`: checkpoint state
- `safe-detail-urls.jsonl`: index of detail links that carry an xsec_token
- `run.log / run-events.jsonl`: pipeline logs
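
How these files could drive Phase3/4 is sketched below: parse the JSONL index, then drop notes that are already done. The entry fields (`noteId`, `url`) and the idea that the checkpoint yields a list of completed note IDs are assumptions; the actual file schemas are not described here.

```javascript
// Sketch only: entry fields and the completed-ID list are assumptions about the file contents.
function parseJsonl(text) {
  return text
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line));
}

// Return the index entries whose notes have not been collected yet.
function pendingNotes(entries, completedNoteIds) {
  const done = new Set(completedNoteIds);
  return entries.filter((entry) => !done.has(entry.noteId));
}
```

An empty result corresponds to the "skip Phase3" case: either the index is empty or every listed note has already been collected.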

## V. Resuming a collection run

If a run is interrupted, rerun the same command; the script resumes from the checkpoint:

```bash
node scripts/xiaohongshu/tests/phase1-4-full-collect.mjs --keyword "雷军" --count 200
```

## VI. Troubleshooting common issues

### 1. Phase2 stops before reaching END

After the fix, Phase2 no longer stops outright; it only enters "scroll failure retry".
If it still exits:
- Check whether the END marker exists
- Check whether the container structure has changed
- Check the SearchGate status

### 2. No comments.md

This means Phase4 did not finish or the note has no comments:
- Search the log for the `Phase4` keyword
- Check whether risk control was triggered

### 3. Wrong output path

It must be:
```
~/.webauto/download/xiaohongshu/{env}/{keyword}/{noteId}/
```

## VII. Locating logs

```bash
# Latest run log
ls -lt ~/.webauto/download/xiaohongshu/{env}/{keyword}/run*.log | head -1

# Show key errors
rg "ERROR|WARN|风控|phase2_scroll_failure" ~/.webauto/download/xiaohongshu/{env}/{keyword}/run*.log
```