@pencil-agent/nano-pencil 2.0.0 → 2.0.1

Files changed (195)
  1. package/README.md +267 -267
  2. package/dist/build-meta.json +3 -3
  3. package/dist/core/export-html/AGENT.md +11 -11
  4. package/dist/core/export-html/template.css +971 -971
  5. package/dist/core/export-html/template.html +54 -54
  6. package/dist/core/mcp/mcp-client.d.ts +3 -1
  7. package/dist/core/mcp/mcp-client.js +6 -6
  8. package/dist/core/mcp/mcp-config.d.ts +3 -3
  9. package/dist/core/mcp/mcp-config.js +1 -1
  10. package/dist/core/mcp/mcp-manager.d.ts +5 -1
  11. package/dist/core/mcp/mcp-manager.js +1 -1
  12. package/dist/core/platform/config/resource-loader.d.ts +2 -0
  13. package/dist/core/platform/config/resource-loader.js +2 -2
  14. package/dist/core/runtime/agent-session.d.ts +12 -0
  15. package/dist/core/runtime/agent-session.js +8 -8
  16. package/dist/core/runtime/sdk.d.ts +8 -0
  17. package/dist/core/runtime/sdk.js +1 -1
  18. package/dist/extensions/builtin/AGENT.md +115 -115
  19. package/dist/extensions/builtin/browser/AGENT.md +17 -17
  20. package/dist/extensions/builtin/browser/agent-workspace/agent_helpers.py +12 -12
  21. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/amazon/product-search.md +198 -198
  22. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/archive-org/scraping.md +341 -341
  23. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/arxiv/scraping.md +311 -311
  24. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/arxiv-bulk/scraping.md +333 -333
  25. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/atlas/overview.md +70 -70
  26. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/booking-com/scraping.md +578 -578
  27. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/capterra/scraping.md +440 -440
  28. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/centilebrain/generate-estimates.md +110 -110
  29. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/coingecko/scraping.md +325 -325
  30. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/coinmarketcap/scraping.md +463 -463
  31. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/coursera/scraping.md +360 -360
  32. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/craigslist/scraping.md +390 -390
  33. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/crossref/scraping.md +568 -568
  34. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/dev-to/scraping.md +323 -323
  35. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/duckduckgo/scraping.md +349 -349
  36. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/ebay/scraping.md +435 -435
  37. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/etsy/scraping.md +506 -506
  38. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/eventbrite/scraping.md +363 -363
  39. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/expedia/automation.md +168 -168
  40. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/facebook/groups.md +236 -236
  41. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/facebook/pages.md +295 -295
  42. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/framer/editor.md +108 -108
  43. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/fred/scraping.md +493 -493
  44. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/g2/scraping.md +580 -580
  45. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/genius/scraping.md +511 -511
  46. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/github/repo-actions.md +65 -65
  47. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/github/scraping.md +184 -184
  48. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/glassdoor/scraping.md +543 -543
  49. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/gmail/compose.md +122 -122
  50. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/goodreads/scraping.md +461 -461
  51. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/gutenberg/scraping.md +383 -383
  52. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/hackernews/scraping.md +243 -243
  53. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/howlongtobeat/scraping.md +473 -473
  54. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/imdb/scraping.md +271 -271
  55. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/itch-io/scraping.md +436 -436
  56. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/job-boards/indeed-glassdoor.md +1021 -1021
  57. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/letterboxd/scraping.md +349 -349
  58. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/linkedin/invitation-manager.md +109 -109
  59. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/loom/folder-enumeration.md +170 -170
  60. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/macrotrends/scraping.md +537 -537
  61. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/medium/article-hydration.md +120 -120
  62. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/medium/scraping.md +414 -414
  63. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/metacritic/scraping.md +477 -477
  64. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/musicbrainz/scraping.md +478 -478
  65. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/nasa/scraping.md +339 -339
  66. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/news-aggregation/multi-source.md +205 -205
  67. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/open-library/scraping.md +472 -472
  68. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/openalex/scraping.md +470 -470
  69. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/openstreetmap/scraping.md +490 -490
  70. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/package-registries/npm-pypi.md +478 -478
  71. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/polymarket/scraping.md +234 -234
  72. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/producthunt/scraping.md +307 -307
  73. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/pubmed/scraping.md +421 -421
  74. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/quora/scraping.md +364 -364
  75. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/rawg/scraping.md +352 -352
  76. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/reddit/scraping.md +124 -124
  77. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/rest-countries/scraping.md +233 -233
  78. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/sec-edgar/scraping.md +361 -361
  79. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/README.md +36 -36
  80. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/embedded-apps.md +72 -72
  81. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/knowledge-base.md +109 -109
  82. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/polaris-inputs.md +137 -137
  83. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/soundcloud/scraping.md +362 -362
  84. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/spotify/scraping.md +339 -339
  85. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/stackoverflow/scraping.md +435 -435
  86. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/steam/scraping.md +575 -575
  87. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/substack/scraping.md +338 -338
  88. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/thetechgeeks/pricing.md +52 -52
  89. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/tiktok/upload.md +107 -107
  90. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/tradingview/scraping.md +309 -309
  91. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/trello/boards-and-lists.md +88 -88
  92. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/trustpilot/scraping.md +375 -375
  93. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/walmart/scraping.md +444 -444
  94. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/wayback-machine/scraping.md +306 -306
  95. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/weather/scraping.md +398 -398
  96. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/wellfound/scraping.md +596 -596
  97. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/world-bank/scraping.md +356 -356
  98. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/xiaohongshu/scraping.md +84 -84
  99. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/youtube/scraping.md +418 -418
  100. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/zillow/scraping.md +433 -433
  101. package/dist/extensions/builtin/browser/browser.md +73 -73
  102. package/dist/extensions/builtin/browser/install.md +142 -142
  103. package/dist/extensions/builtin/browser/interaction-skills/connection.md +48 -48
  104. package/dist/extensions/builtin/browser/interaction-skills/cookies.md +3 -3
  105. package/dist/extensions/builtin/browser/interaction-skills/cross-origin-iframes.md +3 -3
  106. package/dist/extensions/builtin/browser/interaction-skills/dialogs.md +64 -64
  107. package/dist/extensions/builtin/browser/interaction-skills/downloads.md +3 -3
  108. package/dist/extensions/builtin/browser/interaction-skills/drag-and-drop.md +3 -3
  109. package/dist/extensions/builtin/browser/interaction-skills/dropdowns.md +3 -3
  110. package/dist/extensions/builtin/browser/interaction-skills/iframes.md +3 -3
  111. package/dist/extensions/builtin/browser/interaction-skills/network-requests.md +3 -3
  112. package/dist/extensions/builtin/browser/interaction-skills/print-as-pdf.md +3 -3
  113. package/dist/extensions/builtin/browser/interaction-skills/profile-sync.md +90 -90
  114. package/dist/extensions/builtin/browser/interaction-skills/screenshots.md +17 -17
  115. package/dist/extensions/builtin/browser/interaction-skills/scrolling.md +3 -3
  116. package/dist/extensions/builtin/browser/interaction-skills/shadow-dom.md +3 -3
  117. package/dist/extensions/builtin/browser/interaction-skills/tabs.md +69 -69
  118. package/dist/extensions/builtin/browser/interaction-skills/uploads.md +1 -1
  119. package/dist/extensions/builtin/browser/interaction-skills/viewport.md +3 -3
  120. package/dist/extensions/builtin/browser/src/browser_harness/AGENT.md +15 -15
  121. package/dist/extensions/builtin/browser/src/browser_harness/__init__.py +8 -8
  122. package/dist/extensions/builtin/browser/src/browser_harness/_ipc.py +90 -90
  123. package/dist/extensions/builtin/browser/src/browser_harness/admin.py +722 -722
  124. package/dist/extensions/builtin/browser/src/browser_harness/daemon.py +328 -328
  125. package/dist/extensions/builtin/browser/src/browser_harness/helpers.py +396 -396
  126. package/dist/extensions/builtin/browser/src/browser_harness/run.py +103 -103
  127. package/dist/extensions/builtin/discipline/skills/brainstorming/SKILL.md +33 -33
  128. package/dist/extensions/builtin/discipline/skills/executing-plans/SKILL.md +25 -25
  129. package/dist/extensions/builtin/discipline/skills/finishing-development-branch/SKILL.md +25 -25
  130. package/dist/extensions/builtin/discipline/skills/receiving-code-review/SKILL.md +22 -22
  131. package/dist/extensions/builtin/discipline/skills/requesting-code-review/SKILL.md +31 -31
  132. package/dist/extensions/builtin/discipline/skills/systematic-debugging/SKILL.md +28 -28
  133. package/dist/extensions/builtin/discipline/skills/test-driven-development/SKILL.md +32 -32
  134. package/dist/extensions/builtin/discipline/skills/using-git-worktrees/SKILL.md +25 -25
  135. package/dist/extensions/builtin/discipline/skills/verification-before-completion/SKILL.md +27 -27
  136. package/dist/extensions/builtin/discipline/skills/writing-plans/SKILL.md +26 -26
  137. package/dist/extensions/builtin/goal/README.md +67 -67
  138. package/dist/extensions/builtin/grub/README.md +112 -112
  139. package/dist/extensions/builtin/link-world/agent-workspace/README.md +16 -16
  140. package/dist/extensions/builtin/link-world/internet-search/internet-search.md +65 -65
  141. package/dist/extensions/builtin/link-world/link-world-agent.md +82 -82
  142. package/dist/extensions/builtin/link-world/linkworld.md +313 -313
  143. package/dist/extensions/builtin/link-world/network-routing/network-routing.md +67 -67
  144. package/dist/extensions/builtin/loop/README.md +92 -92
  145. package/dist/extensions/builtin/mcp/figma-design.md +68 -68
  146. package/dist/extensions/builtin/mcp/mcp-management.md +85 -85
  147. package/dist/extensions/builtin/recap/AGENT.md +15 -15
  148. package/dist/extensions/builtin/sal/README.md +72 -72
  149. package/dist/extensions/builtin/security-audit/README.md +289 -289
  150. package/dist/extensions/builtin/team/AGENT.md +112 -112
  151. package/dist/extensions/builtin/team/TESTING.md +299 -299
  152. package/dist/extensions/builtin/token-save/README.md +56 -56
  153. package/dist/extensions/optional/AGENT.md +10 -10
  154. package/dist/modes/interactive/interactive-mode.js +36 -36
  155. package/dist/modes/interactive/theme/dark.json +85 -85
  156. package/dist/modes/interactive/theme/light.json +84 -84
  157. package/dist/modes/interactive/theme/theme-schema.json +335 -335
  158. package/dist/modes/interactive/theme/warm.json +81 -81
  159. package/dist/node_modules/@pencil-agent/agent-core/dist/agent-loop.js +3 -2
  160. package/dist/node_modules/@pencil-agent/agent-core/dist/structured-adaptive-agent-loop.js +2 -1
  161. package/dist/node_modules/@pencil-agent/ai/dist/cli.js +0 -0
  162. package/docs/cc-agent-design.md +1297 -0
  163. package/docs/cc-tui-design.md +1333 -0
  164. package/docs/codex-goal-command-impl.md +1055 -1055
  165. package/docs/codex-goal-vs-grub.md +500 -500
  166. package/docs/custom-provider.md +27 -27
  167. package/docs/extensions.md +27 -27
  168. package/docs/keybindings.md +27 -27
  169. package/docs/loop 重构完成总结.md +250 -250
  170. package/docs/loop 重构完成报告.md +122 -122
  171. package/docs/loop 重构方案.md +1222 -1222
  172. package/docs/loop 重构方案实现报告.md +158 -158
  173. package/docs/loop 重构方案对比分析.md +128 -128
  174. package/docs/loop 重构计划.md +320 -320
  175. package/docs/loop-usage-examples.md +214 -214
  176. package/docs/models.md +27 -27
  177. package/docs/nanoPencil-学习计划.md +170 -0
  178. package/docs/packages.md +27 -27
  179. package/docs/pi-design-philosophy.md +457 -457
  180. package/docs/planmode.md +1987 -1987
  181. package/docs/prompt-templates.md +27 -27
  182. package/docs/providers.md +27 -27
  183. package/docs/scan-report.md +3820 -0
  184. package/docs/sdk.md +27 -27
  185. package/docs/skills.md +27 -27
  186. package/docs/themes.md +27 -27
  187. package/docs/tui.md +27 -27
  188. package/docs//345/257/271/346/240/207Claude-Code.md +1775 -0
  189. package/docs//351/230/277/351/207/214/345/267/264/345/267/264/350/264/242/346/212/245/345/210/206/346/236/220/344/271/246.md +261 -0
  190. package/package.json +190 -190
  191. package/docs/ACP/345/215/217/350/256/256/351/233/206/346/210/220/345/274/200/345/217/221/346/226/207/346/241/243.md +0 -851
  192. package/docs/SDK-TESTING.md +0 -364
  193. package/docs/mem-core/346/212/200/346/234/257/346/226/207/346/241/243.md +0 -593
  194. package/docs/startup-performance-optimization.md +0 -301
  195. package/docs/认知地图.md +0 -47
@@ -1,12 +1,12 @@
- """Agent-editable browser helpers.
-
- Add task-specific browser primitives here. Core helpers from browser_harness.helpers
- load this file when BH_AGENT_WORKSPACE points at this directory, or when this
- repo's default agent-workspace exists.
-
- [WHO]: Provides project-local extension points for reusable Browser Harness helper functions
- [FROM]: Depends on browser_harness.helpers importing this file through BH_AGENT_WORKSPACE
- [TO]: Consumed by browser snippets after NanoPencil seeds .nanopencil/browser-workspace
- [HERE]: extensions/defaults/browser/agent-workspace/agent_helpers.py seed copied into project browser workspace
- """
-
+ """Agent-editable browser helpers.
+
+ Add task-specific browser primitives here. Core helpers from browser_harness.helpers
+ load this file when BH_AGENT_WORKSPACE points at this directory, or when this
+ repo's default agent-workspace exists.
+
+ [WHO]: Provides project-local extension points for reusable Browser Harness helper functions
+ [FROM]: Depends on browser_harness.helpers importing this file through BH_AGENT_WORKSPACE
+ [TO]: Consumed by browser snippets after NanoPencil seeds .nanopencil/browser-workspace
+ [HERE]: extensions/defaults/browser/agent-workspace/agent_helpers.py seed copied into project browser workspace
+ """
+
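The docstring in this hunk says the core helpers load `agent_helpers.py` when `BH_AGENT_WORKSPACE` points at its directory. A minimal sketch of how such a load could work, assuming a plain environment-variable lookup plus `importlib`. The function name `load_agent_helpers` is hypothetical, and this is not claimed to be the loader that `browser_harness.helpers` actually ships:

```python
import importlib.util
import os
from pathlib import Path
from types import ModuleType
from typing import Optional


def load_agent_helpers(env_var: str = "BH_AGENT_WORKSPACE") -> Optional[ModuleType]:
    """Import agent_helpers.py from the directory named by env_var, if present.

    Hypothetical helper for illustration only; the real loader may differ.
    """
    workspace = os.environ.get(env_var)
    if not workspace:
        return None  # variable unset or empty: nothing to load
    helper_path = Path(workspace) / "agent_helpers.py"
    if not helper_path.is_file():
        return None  # workspace exists but has no helper file
    spec = importlib.util.spec_from_file_location("agent_helpers", helper_path)
    if spec is None or spec.loader is None:
        return None
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file so its functions become attributes
    return module
```

Returning `None` instead of raising keeps the workspace optional, which matches the docstring's description of the file as an extension point rather than a hard dependency.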
@@ -1,198 +1,198 @@
- # Amazon — Product Search & Data Extraction
-
- Field-tested against amazon.com on 2025-04-18 using a logged-in Chrome session.
- No CAPTCHA or bot detection was triggered during any test run.
-
- ## Navigation
-
- ### Direct search URL (fastest, always use this)
- ```python
- goto_url("https://www.amazon.com/s?k=mechanical+keyboard")
- wait_for_load()
- wait(2) # dynamic content needs ~2s after readyState=complete
- ```
-
- ### Search box typing (use when you need category filtering)
- ```python
- goto_url("https://www.amazon.com")
- wait_for_load()
- wait(1)
- js("document.querySelector('#twotabsearchtextbox').focus()")
- js("document.querySelector('#twotabsearchtextbox').click()")
- wait(0.3)
- type_text("wireless mouse")
- wait(0.3)
- press_key("Enter")
- wait_for_load()
- wait(2)
- ```
-
- ### Direct product page
- ```python
- # URL pattern: /dp/{ASIN} or /dp/{ASIN}?th=1 (Amazon may redirect to add ?th=1)
- goto_url("https://www.amazon.com/dp/B08Z6X4NK3")
- wait_for_load()
- wait(2)
- ```
-
- ## Session Gotcha
-
- **Always use `new_tab()` when opening Amazon for the first time in a harness session.**
- `goto_url()` can silently fail to navigate if the current tab resists the navigation
- (observed when the daemon attached to a different real tab). The safe pattern:
-
- ```python
- tid = new_tab("https://www.amazon.com/s?k=mechanical+keyboard")
- wait_for_load()
- wait(2)
- ```
-
- After that, `goto_url()` works fine within the same Amazon session.
-
- ## Search Results Extraction
-
- ### Container selector
- `[data-component-type="s-search-result"]` — confirmed working, yields ~22 results per page.
-
- ### Full extraction (field-tested)
- ```python
- results = js("""
- Array.from(document.querySelectorAll('[data-component-type="s-search-result"]')).map(el => ({
- asin: el.getAttribute('data-asin'),
- title: el.querySelector('h2 span')?.innerText?.trim(),
- price: el.querySelector('.a-price .a-offscreen')?.innerText,
- list_price: el.querySelector('.a-text-price .a-offscreen')?.innerText,
- rating: el.querySelector('[aria-label*="out of 5 stars"]')?.getAttribute('aria-label')?.split(' ')[0],
- reviews: el.querySelector('[aria-label*="ratings"]')?.getAttribute('aria-label'),
- is_sponsored: !!el.querySelector('.puis-sponsored-label-text'),
- url: el.querySelector('h2 a')?.href
- }))
- """)
- ```
-
- ### Field notes
- - **`asin`**: `data-asin` attribute on the container div — always present, matches the `/dp/{ASIN}` URL.
- - **`title`**: `h2 span` works consistently. `h2 a.a-link-normal span` also works.
- - **`price`**: `.a-price .a-offscreen` returns the formatted string e.g. `"$69.99"`. Use this, not `.a-price-whole`.
- - **`list_price`**: `.a-text-price .a-offscreen` — only present when item is on sale (was/now pricing).
- - **`rating`**: Use `aria-label` on `[aria-label*="out of 5 stars"]` — gives `"4.5 out of 5 stars, rating details"`, split on space for the number.
- - **`reviews`**: Use `[aria-label*="ratings"]` attribute — gives `"1,514 ratings"`. Do NOT use `.a-size-base.s-underline-text` — that element exists on sponsored results and shows "Xbox" (a cross-sell widget text).
- - **`is_sponsored`**: `.puis-sponsored-label-text` is present on sponsored listings; first 2-3 results are usually sponsored.
- - **`url`**: `h2 a` href — contains the full `/dp/{ASIN}/...` URL.
-
- ## Product Detail Page Extraction
-
- ### Confirmed selectors (field-tested on B08Z6X4NK3)
- ```python
- detail = js("""
- ({
- title: document.querySelector('#productTitle')?.innerText?.trim(),
- price: (function() {
- var whole = document.querySelector('.a-price-whole')?.innerText?.replace(/[\\n.]/g,'');
- var frac = document.querySelector('.a-price-fraction')?.innerText;
- return (whole && frac) ? '$' + whole + '.' + frac
- : document.querySelector('.a-price .a-offscreen')?.innerText || null;
- })(),
- list_price: document.querySelector('.basisPrice .a-offscreen')?.innerText,
- rating: document.querySelector('#acrPopover')?.getAttribute('title'),
- review_count: document.querySelector('#acrCustomerReviewText')?.innerText,
- availability: document.querySelector('#availability span')?.innerText?.trim(),
- brand: document.querySelector('#bylineInfo')?.innerText?.trim(),
- asin: document.querySelector('input[name="ASIN"]')?.value,
- bullet_points: Array.from(document.querySelectorAll('#feature-bullets li span.a-list-item'))
- .map(e => e.innerText?.trim()).filter(t => t)
- })
- """)
- ```
-
- ### Price field notes
- - `#priceblock_ourprice` and `#priceblock_dealprice` are **legacy** — they return `null` on modern product pages.
- - Construct price from `.a-price-whole` + `.a-price-fraction` (both stripped of `\n` and `.`).
- - As a fallback: first `.a-price .a-offscreen` on the page also works (confirmed `$69.99`).
- - `list_price` from `.basisPrice .a-offscreen` shows the crossed-out "was" price when a discount exists.
-
- ## Best Sellers Page
-
- URL: `https://www.amazon.com/Best-Sellers-{Category}/zgbs/{slug}/`
- e.g. `https://www.amazon.com/Best-Sellers-Electronics/zgbs/electronics/`
-
- ### DOM structure (2025)
- `.zg-item-immersion` **does not exist** — Amazon migrated to CSS modules. Use `[data-asin]` anchored on `[id="gridItemRoot"]`:
-
- ```python
- goto_url("https://www.amazon.com/Best-Sellers-Electronics/zgbs/electronics/")
- wait_for_load()
- wait(2)
-
- items = js("""
- Array.from(document.querySelectorAll('[data-asin]')).map(el => {
- var container = el.closest('[id="gridItemRoot"]') || el;
- return {
- asin: el.getAttribute('data-asin'),
- rank: container.querySelector('[class*="zg-bdg-text"]')?.innerText,
- title: container.querySelector('img[alt]')?.getAttribute('alt'),
- price: container.querySelector('.p13n-sc-price, .a-size-base.a-color-price')?.innerText,
- url: 'https://www.amazon.com/dp/' + el.getAttribute('data-asin')
- }
- }).filter(r => r.rank)
- """)
- ```
-
- Note: Title comes from the product image `alt` attribute — the text title elements use obfuscated CSS module class names that change between deployments.
-
- ## Pagination
-
- ```python
- # Get next page URL directly
- next_url = js("document.querySelector('.s-pagination-next')?.href")
- if next_url:
-     goto_url(next_url)
-     wait_for_load()
-     wait(2)
-
- # Or construct by page number
- goto_url("https://www.amazon.com/s?k=wireless+mouse&page=2")
- ```
-
- ## Result Count
-
- ```python
- count_text = js("document.querySelector('[data-component-type=\"s-result-info-bar\"] h1')?.innerText?.trim()")
- # Returns e.g.: '1-16 of over 40,000 results for "wireless mouse"\nSort by:\n...'
- # Extract just the count: count_text.split('\n')[0]
- ```
-
- ## CAPTCHA Detection
-
- No CAPTCHA was encountered during testing with a logged-in Chrome session. To detect defensively:
-
- ```python
- def check_captcha():
-     text = js("document.body.innerText.slice(0,500)") or ""
-     url = page_info()["url"]
-     return (
-         "captcha" in text.lower()
-         or "enter the characters" in text.lower()
-         or "sorry, we just need to make sure" in text.lower()
-         or "captcha" in url.lower()
-         or "validateCaptcha" in url
-     )
-
- if check_captcha():
-     raise RuntimeError("Amazon CAPTCHA hit — stop and notify user")
- ```
-
- Amazon may serve a CAPTCHA on fresh/anonymous sessions. Using the browser's existing logged-in session avoids this in practice.
-
- ## Gotchas
-
- - **`goto_url()` silent failure**: On first visit, use `new_tab(url)` instead. After the tab is on Amazon, `goto_url()` works.
- - **`.zg-item-immersion` is gone**: Best Sellers page uses CSS module classes (obfuscated). Use `[data-asin]` + `img[alt]` for title.
- - **`.a-size-base.s-underline-text` is unreliable for review count**: On sponsored results it shows unrelated text (e.g. "Xbox"). Use `[aria-label*="ratings"]` instead.
- - **`#priceblock_ourprice` is legacy**: Returns `null` on modern pages. Construct from `.a-price-whole` + `.a-price-fraction`.
- - **Sponsored results appear first**: First 2-3 results are almost always `is_sponsored: true`. Filter them out with `!el.querySelector('.puis-sponsored-label-text')` when you need organic results.
- - **`data-asin` can be empty string on non-product rows**: Filter with `.filter(r => r.asin)`.
- - **Price split DOM**: `.a-price-whole` innerText includes a trailing `\n.` — strip it: `.replace(/[\n.]/g,'')`.
- - **ASIN from URL**: Use `/dp/([A-Z0-9]{10})/` regex on the product URL. `data-asin` on search results is always the canonical ASIN.
- - **`?th=1` redirect**: Amazon appends `?th=1` (and sometimes `?psc=1`) to product URLs after redirect. This is normal — `input[name="ASIN"]` always has the clean ASIN.
- - **Wait 2s after `wait_for_load()`**: Amazon search results load the listing cards asynchronously. `readyState=complete` fires before cards render. A hard 2s wait is required.
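The field notes and gotchas in this file describe values that come back as display strings (`"$69.99"`, `"4.5 out of 5 stars, rating details"`, `"1,514 ratings"`) and an ASIN embedded in `/dp/` URLs. A minimal post-processing sketch in Python, assuming exactly those string shapes. The function names are hypothetical and are not part of the browser harness:

```python
import re
from decimal import Decimal
from typing import Optional

# Assumption: ASINs are 10 uppercase alphanumerics after /dp/, per the gotcha above.
ASIN_RE = re.compile(r"/dp/([A-Z0-9]{10})")


def parse_price(text: Optional[str]) -> Optional[Decimal]:
    """Turn a formatted price such as '$1,299.00' into a Decimal."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text or "")
    return Decimal(match.group().replace(",", "")) if match else None


def parse_rating(label: Optional[str]) -> Optional[float]:
    """Read the leading number from '4.5 out of 5 stars, rating details'."""
    match = re.match(r"\s*(\d+(?:\.\d+)?)\s+out of 5", label or "")
    return float(match.group(1)) if match else None


def parse_review_count(label: Optional[str]) -> Optional[int]:
    """Read the count from '1,514 ratings'."""
    match = re.search(r"\d[\d,]*", label or "")
    return int(match.group().replace(",", "")) if match else None


def asin_from_url(url: Optional[str]) -> Optional[str]:
    """Extract the ASIN from a product URL, ignoring ?th=1 / ?psc=1 suffixes."""
    match = ASIN_RE.search(url or "")
    return match.group(1) if match else None
```

Each helper returns `None` on missing or unexpected input, so rows where a selector found nothing (for example an item with no rating yet) can pass through without raising.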
+ # Amazon — Product Search & Data Extraction
+
+ Field-tested against amazon.com on 2025-04-18 using a logged-in Chrome session.
+ No CAPTCHA or bot detection was triggered during any test run.
+
+ ## Navigation
+
+ ### Direct search URL (fastest, always use this)
+ ```python
+ goto_url("https://www.amazon.com/s?k=mechanical+keyboard")
+ wait_for_load()
+ wait(2) # dynamic content needs ~2s after readyState=complete
+ ```
+
+ ### Search box typing (use when you need category filtering)
+ ```python
+ goto_url("https://www.amazon.com")
+ wait_for_load()
+ wait(1)
+ js("document.querySelector('#twotabsearchtextbox').focus()")
+ js("document.querySelector('#twotabsearchtextbox').click()")
+ wait(0.3)
+ type_text("wireless mouse")
+ wait(0.3)
+ press_key("Enter")
+ wait_for_load()
+ wait(2)
+ ```
+
+ ### Direct product page
+ ```python
+ # URL pattern: /dp/{ASIN} or /dp/{ASIN}?th=1 (Amazon may redirect to add ?th=1)
+ goto_url("https://www.amazon.com/dp/B08Z6X4NK3")
+ wait_for_load()
+ wait(2)
+ ```
+
+ ## Session Gotcha
+
+ **Always use `new_tab()` when opening Amazon for the first time in a harness session.**
+ `goto_url()` can silently fail to navigate if the current tab resists the navigation
+ (observed when the daemon attached to a different real tab). The safe pattern:
+
+ ```python
+ tid = new_tab("https://www.amazon.com/s?k=mechanical+keyboard")
+ wait_for_load()
+ wait(2)
+ ```
+
+ After that, `goto_url()` works fine within the same Amazon session.
+
+ ## Search Results Extraction
+
+ ### Container selector
+ `[data-component-type="s-search-result"]` — confirmed working, yields ~22 results per page.
+
+ ### Full extraction (field-tested)
+ ```python
+ results = js("""
+ Array.from(document.querySelectorAll('[data-component-type="s-search-result"]')).map(el => ({
+ asin: el.getAttribute('data-asin'),
+ title: el.querySelector('h2 span')?.innerText?.trim(),
+ price: el.querySelector('.a-price .a-offscreen')?.innerText,
+ list_price: el.querySelector('.a-text-price .a-offscreen')?.innerText,
+ rating: el.querySelector('[aria-label*="out of 5 stars"]')?.getAttribute('aria-label')?.split(' ')[0],
+ reviews: el.querySelector('[aria-label*="ratings"]')?.getAttribute('aria-label'),
+ is_sponsored: !!el.querySelector('.puis-sponsored-label-text'),
+ url: el.querySelector('h2 a')?.href
+ }))
+ """)
+ ```
+
+ ### Field notes
+ - **`asin`**: `data-asin` attribute on the container div — always present, matches the `/dp/{ASIN}` URL.
+ - **`title`**: `h2 span` works consistently. `h2 a.a-link-normal span` also works.
+ - **`price`**: `.a-price .a-offscreen` returns the formatted string e.g. `"$69.99"`. Use this, not `.a-price-whole`.
+ - **`list_price`**: `.a-text-price .a-offscreen` — only present when item is on sale (was/now pricing).
+ - **`rating`**: Use `aria-label` on `[aria-label*="out of 5 stars"]` — gives `"4.5 out of 5 stars, rating details"`, split on space for the number.
+ - **`reviews`**: Use `[aria-label*="ratings"]` attribute — gives `"1,514 ratings"`. Do NOT use `.a-size-base.s-underline-text` — that element exists on sponsored results and shows "Xbox" (a cross-sell widget text).
+ - **`is_sponsored`**: `.puis-sponsored-label-text` is present on sponsored listings; first 2-3 results are usually sponsored.
+ - **`url`**: `h2 a` href — contains the full `/dp/{ASIN}/...` URL.
+
83
+ ## Product Detail Page Extraction
84
+
85
+ ### Confirmed selectors (field-tested on B08Z6X4NK3)
86
+ ```python
87
+ detail = js("""
+ ({
+   title: document.querySelector('#productTitle')?.innerText?.trim(),
+   price: (function() {
+     var whole = document.querySelector('.a-price-whole')?.innerText?.replace(/[\\n.]/g,'');
+     var frac = document.querySelector('.a-price-fraction')?.innerText;
+     return (whole && frac) ? '$' + whole + '.' + frac
+       : document.querySelector('.a-price .a-offscreen')?.innerText || null;
+   })(),
+   list_price: document.querySelector('.basisPrice .a-offscreen')?.innerText,
+   rating: document.querySelector('#acrPopover')?.getAttribute('title'),
+   review_count: document.querySelector('#acrCustomerReviewText')?.innerText,
+   availability: document.querySelector('#availability span')?.innerText?.trim(),
+   brand: document.querySelector('#bylineInfo')?.innerText?.trim(),
+   asin: document.querySelector('input[name="ASIN"]')?.value,
+   bullet_points: Array.from(document.querySelectorAll('#feature-bullets li span.a-list-item'))
+     .map(e => e.innerText?.trim()).filter(t => t)
+ })
+ """)
+ ```
+
+ ### Price field notes
+ - `#priceblock_ourprice` and `#priceblock_dealprice` are **legacy** — they return `null` on modern product pages.
+ - Construct the price from `.a-price-whole` (stripped of `\n` and `.`) + `.a-price-fraction`.
+ - As a fallback, the first `.a-price .a-offscreen` on the page also works (confirmed `$69.99`).
+ - `list_price` from `.basisPrice .a-offscreen` shows the crossed-out "was" price when a discount exists.
+
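The same assembly rule can be applied on the Python side when the raw `innerText` values are fetched separately. A minimal sketch, assuming a US storefront (the `$` symbol is hardcoded in the extraction above as well); `build_price` is a hypothetical helper name:

```python
def build_price(whole_text, fraction_text, offscreen_text=None, symbol="$"):
    """Join the split price parts; fall back to the offscreen string."""
    # .a-price-whole innerText carries a trailing newline and dot, e.g. "69\n."
    whole = (whole_text or "").replace("\n", "").replace(".", "").strip()
    fraction = (fraction_text or "").strip()
    if whole and fraction:
        return f"{symbol}{whole}.{fraction}"
    # Fallback mirrors the `.a-price .a-offscreen` branch of the JS snippet.
    return offscreen_text or None
```

For example, `build_price("69\n.", "99")` yields `"$69.99"`, and `build_price(None, None, "$69.99")` returns the fallback unchanged.
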
+ ## Best Sellers Page
+
+ URL: `https://www.amazon.com/Best-Sellers-{Category}/zgbs/{slug}/`
+ e.g. `https://www.amazon.com/Best-Sellers-Electronics/zgbs/electronics/`
+
+ ### DOM structure (2025)
+ `.zg-item-immersion` **no longer exists**: Amazon migrated to CSS modules. Select `[data-asin]` elements and resolve each one to its enclosing `[id="gridItemRoot"]` container:
+
+ ```python
+ goto_url("https://www.amazon.com/Best-Sellers-Electronics/zgbs/electronics/")
+ wait_for_load()
+ wait(2)
+
+ items = js("""
+ Array.from(document.querySelectorAll('[data-asin]')).map(el => {
+   var container = el.closest('[id="gridItemRoot"]') || el;
+   return {
+     asin: el.getAttribute('data-asin'),
+     rank: container.querySelector('[class*="zg-bdg-text"]')?.innerText,
+     title: container.querySelector('img[alt]')?.getAttribute('alt'),
+     price: container.querySelector('.p13n-sc-price, .a-size-base.a-color-price')?.innerText,
+     url: 'https://www.amazon.com/dp/' + el.getAttribute('data-asin')
+   }
+ }).filter(r => r.rank)
+ """)
+ ```
+
+ Note: Title comes from the product image `alt` attribute — the text title elements use obfuscated CSS module class names that change between deployments.
+
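A possible cleanup pass over `items`, sketched under two assumptions that the notes above do not confirm: that the rank badge text looks like `"#1"`, and that `[data-asin]` may match more than one element per grid item. `clean_best_sellers` is a hypothetical helper name:

```python
import re


def clean_best_sellers(items):
    """Drop empty or duplicate ASINs and convert rank text to an integer."""
    seen = set()
    cleaned = []
    for item in items:
        asin = item.get("asin")
        if not asin or asin in seen:
            continue
        seen.add(asin)
        # Assumed badge format "#1": keep only the digits.
        digits = re.sub(r"\D", "", item.get("rank") or "")
        cleaned.append({**item, "rank": int(digits) if digits else None})
    # Ranked rows first, in ascending rank order.
    return sorted(cleaned, key=lambda r: (r["rank"] is None, r["rank"] or 0))
```
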
+ ## Pagination
+
+ ```python
+ # Get next page URL directly
+ next_url = js("document.querySelector('.s-pagination-next')?.href")
+ if next_url:
+     goto_url(next_url)
+     wait_for_load()
+     wait(2)
+
+ # Or construct by page number
+ goto_url("https://www.amazon.com/s?k=wireless+mouse&page=2")
+ wait_for_load()
+ wait(2)
+ ```
+
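When constructing page URLs by number, the query string is safer to build than to hand-write. A minimal sketch using only the `k` and `page` parameters seen in the example URL above; `search_url` is a hypothetical helper name:

```python
from urllib.parse import urlencode


def search_url(query, page=1):
    """Build an Amazon search URL for a query and a 1-based page number."""
    params = {"k": query}
    if page > 1:
        params["page"] = page
    # urlencode escapes spaces as '+', matching the example URL format.
    return "https://www.amazon.com/s?" + urlencode(params)
```

`search_url("wireless mouse", 2)` reproduces the URL used in the pagination example.
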
+ ## Result Count
+
+ ```python
+ count_text = js("document.querySelector('[data-component-type=\"s-result-info-bar\"] h1')?.innerText?.trim()")
+ # Returns e.g.: '1-16 of over 40,000 results for "wireless mouse"\nSort by:\n...'
+ # Extract just the count: count_text.split('\n')[0]
+ ```
+
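To turn that first line into numbers, a regex over the quoted format is one option. This sketch assumes the English wording shown above (`1-16 of over 40,000 results ...`) and returns `None` for anything else; `parse_result_count` is a hypothetical helper name:

```python
import re

COUNT_RE = re.compile(r"(\d[\d,]*)-(\d[\d,]*) of (over )?(\d[\d,]*) results")


def _to_int(text):
    """'40,000' -> 40000."""
    return int(text.replace(",", ""))


def parse_result_count(count_text):
    """Extract the visible range and total from the info bar text."""
    first_line = (count_text or "").split("\n")[0]
    match = COUNT_RE.search(first_line)
    if not match:
        return None
    return {
        "first": _to_int(match.group(1)),
        "last": _to_int(match.group(2)),
        "total": _to_int(match.group(4)),
        "approximate": match.group(3) is not None,  # "over" was present
    }
```
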
+ ## CAPTCHA Detection
+
+ No CAPTCHA was encountered during testing with a logged-in Chrome session. To detect one defensively:
+
+ ```python
+ def check_captcha():
+     # Compare case-insensitively against known CAPTCHA page markers.
+     text = (js("document.body.innerText.slice(0,500)") or "").lower()
+     url = page_info()["url"].lower()
+     return (
+         "captcha" in text
+         or "enter the characters" in text
+         or "sorry, we just need to make sure" in text
+         or "captcha" in url  # also matches "validateCaptcha"
+     )
+
+ if check_captcha():
+     raise RuntimeError("Amazon CAPTCHA hit: stop and notify user")
+ ```
+
+ Amazon may serve a CAPTCHA on fresh/anonymous sessions. Using the browser's existing logged-in session avoids this in practice.
+
+ ## Gotchas
+
+ - **`goto_url()` silent failure**: On the first visit to Amazon, `goto_url()` can fail silently; open the page with `new_tab(url)` instead. Once the tab is on Amazon, `goto_url()` works.
+ - **`.zg-item-immersion` is gone**: Best Sellers page uses CSS module classes (obfuscated). Use `[data-asin]` + `img[alt]` for title.
+ - **`.a-size-base.s-underline-text` is unreliable for review count**: On sponsored results it shows unrelated text (e.g. "Xbox"). Use `[aria-label*="ratings"]` instead.
+ - **`#priceblock_ourprice` is legacy**: Returns `null` on modern pages. Construct from `.a-price-whole` + `.a-price-fraction`.
+ - **Sponsored results appear first**: First 2-3 results are almost always `is_sponsored: true`. Filter them out with `!el.querySelector('.puis-sponsored-label-text')` when you need organic results.
+ - **`data-asin` can be empty string on non-product rows**: Filter with `.filter(r => r.asin)`.
+ - **Price split DOM**: `.a-price-whole` innerText includes a trailing `\n.` — strip it: `.replace(/[\n.]/g,'')`.
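+ - **ASIN from URL**: Match the pattern `/dp/([A-Z0-9]{10})` against the product URL. Do not require a trailing slash, since the ASIN may be followed directly by a query string such as `?th=1`. On search results, `data-asin` already holds the canonical ASIN.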
+ - **`?th=1` redirect**: Amazon appends `?th=1` (and sometimes `?psc=1`) to product URLs after redirect. This is normal — `input[name="ASIN"]` always has the clean ASIN.
+ - **Wait 2s after `wait_for_load()`**: Amazon search results load the listing cards asynchronously, so `readyState` reaches `complete` before the cards render. A hard 2s wait is required.
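
Two of the gotchas above (ASIN extraction and filtering out sponsored or empty rows) can be handled in Python after extraction. A minimal sketch, assuming rows shaped like the search `results` objects; `asin_from_url` and `organic_results` are hypothetical helper names:

```python
import re

# ASIN is 10 uppercase alphanumerics after /dp/, ended by '/', '?', '#', or end of string.
ASIN_RE = re.compile(r"/dp/([A-Z0-9]{10})(?:[/?#]|$)")


def asin_from_url(url):
    """Return the ASIN embedded in a product URL, or None if there is none."""
    match = ASIN_RE.search(url or "")
    return match.group(1) if match else None


def organic_results(results):
    """Keep rows that have a non-empty ASIN and are not sponsored."""
    return [r for r in results if r.get("asin") and not r.get("is_sponsored")]
```

The pattern tolerates both `/dp/{ASIN}/...` and `/dp/{ASIN}?th=1` forms, so the `?th=1` redirect does not affect extraction.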