@pencil-agent/nano-pencil 2.0.0-beta.9 → 2.0.0

Files changed (207)
  1. package/README.md +267 -267
  2. package/dist/build-meta.json +3 -3
  3. package/dist/core/export-html/AGENT.md +11 -11
  4. package/dist/core/export-html/template.css +971 -971
  5. package/dist/core/export-html/template.html +54 -54
  6. package/dist/core/extensions-host/index.d.ts +1 -1
  7. package/dist/core/extensions-host/types.d.ts +5 -8
  8. package/dist/extensions/builtin/AGENT.md +115 -115
  9. package/dist/extensions/builtin/browser/AGENT.md +17 -17
  10. package/dist/extensions/builtin/browser/agent-workspace/agent_helpers.py +12 -12
  11. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/amazon/product-search.md +198 -198
  12. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/archive-org/scraping.md +341 -341
  13. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/arxiv/scraping.md +311 -311
  14. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/arxiv-bulk/scraping.md +333 -333
  15. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/atlas/overview.md +70 -70
  16. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/booking-com/scraping.md +578 -578
  17. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/capterra/scraping.md +440 -440
  18. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/centilebrain/generate-estimates.md +110 -110
  19. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/coingecko/scraping.md +325 -325
  20. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/coinmarketcap/scraping.md +463 -463
  21. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/coursera/scraping.md +360 -360
  22. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/craigslist/scraping.md +390 -390
  23. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/crossref/scraping.md +568 -568
  24. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/dev-to/scraping.md +323 -323
  25. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/duckduckgo/scraping.md +349 -349
  26. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/ebay/scraping.md +435 -435
  27. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/etsy/scraping.md +506 -506
  28. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/eventbrite/scraping.md +363 -363
  29. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/expedia/automation.md +168 -168
  30. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/facebook/groups.md +236 -236
  31. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/facebook/pages.md +295 -295
  32. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/framer/editor.md +108 -108
  33. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/fred/scraping.md +493 -493
  34. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/g2/scraping.md +580 -580
  35. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/genius/scraping.md +511 -511
  36. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/github/repo-actions.md +65 -65
  37. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/github/scraping.md +184 -184
  38. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/glassdoor/scraping.md +543 -543
  39. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/gmail/compose.md +122 -122
  40. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/goodreads/scraping.md +461 -461
  41. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/gutenberg/scraping.md +383 -383
  42. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/hackernews/scraping.md +243 -243
  43. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/howlongtobeat/scraping.md +473 -473
  44. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/imdb/scraping.md +271 -271
  45. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/itch-io/scraping.md +436 -436
  46. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/job-boards/indeed-glassdoor.md +1021 -1021
  47. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/letterboxd/scraping.md +349 -349
  48. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/linkedin/invitation-manager.md +109 -109
  49. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/loom/folder-enumeration.md +170 -170
  50. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/macrotrends/scraping.md +537 -537
  51. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/medium/article-hydration.md +120 -120
  52. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/medium/scraping.md +414 -414
  53. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/metacritic/scraping.md +477 -477
  54. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/musicbrainz/scraping.md +478 -478
  55. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/nasa/scraping.md +339 -339
  56. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/news-aggregation/multi-source.md +205 -205
  57. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/open-library/scraping.md +472 -472
  58. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/openalex/scraping.md +470 -470
  59. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/openstreetmap/scraping.md +490 -490
  60. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/package-registries/npm-pypi.md +478 -478
  61. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/polymarket/scraping.md +234 -234
  62. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/producthunt/scraping.md +307 -307
  63. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/pubmed/scraping.md +421 -421
  64. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/quora/scraping.md +364 -364
  65. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/rawg/scraping.md +352 -352
  66. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/reddit/scraping.md +124 -124
  67. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/rest-countries/scraping.md +233 -233
  68. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/sec-edgar/scraping.md +361 -361
  69. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/README.md +36 -36
  70. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/embedded-apps.md +72 -72
  71. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/knowledge-base.md +109 -109
  72. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/polaris-inputs.md +137 -137
  73. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/soundcloud/scraping.md +362 -362
  74. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/spotify/scraping.md +339 -339
  75. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/stackoverflow/scraping.md +435 -435
  76. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/steam/scraping.md +575 -575
  77. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/substack/scraping.md +338 -338
  78. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/thetechgeeks/pricing.md +52 -52
  79. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/tiktok/upload.md +107 -107
  80. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/tradingview/scraping.md +309 -309
  81. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/trello/boards-and-lists.md +88 -88
  82. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/trustpilot/scraping.md +375 -375
  83. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/walmart/scraping.md +444 -444
  84. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/wayback-machine/scraping.md +306 -306
  85. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/weather/scraping.md +398 -398
  86. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/wellfound/scraping.md +596 -596
  87. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/world-bank/scraping.md +356 -356
  88. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/xiaohongshu/scraping.md +84 -84
  89. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/youtube/scraping.md +418 -418
  90. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/zillow/scraping.md +433 -433
  91. package/dist/extensions/builtin/browser/browser.md +73 -73
  92. package/dist/extensions/builtin/browser/install.md +142 -142
  93. package/dist/extensions/builtin/browser/interaction-skills/connection.md +48 -48
  94. package/dist/extensions/builtin/browser/interaction-skills/cookies.md +3 -3
  95. package/dist/extensions/builtin/browser/interaction-skills/cross-origin-iframes.md +3 -3
  96. package/dist/extensions/builtin/browser/interaction-skills/dialogs.md +64 -64
  97. package/dist/extensions/builtin/browser/interaction-skills/downloads.md +3 -3
  98. package/dist/extensions/builtin/browser/interaction-skills/drag-and-drop.md +3 -3
  99. package/dist/extensions/builtin/browser/interaction-skills/dropdowns.md +3 -3
  100. package/dist/extensions/builtin/browser/interaction-skills/iframes.md +3 -3
  101. package/dist/extensions/builtin/browser/interaction-skills/network-requests.md +3 -3
  102. package/dist/extensions/builtin/browser/interaction-skills/print-as-pdf.md +3 -3
  103. package/dist/extensions/builtin/browser/interaction-skills/profile-sync.md +90 -90
  104. package/dist/extensions/builtin/browser/interaction-skills/screenshots.md +17 -17
  105. package/dist/extensions/builtin/browser/interaction-skills/scrolling.md +3 -3
  106. package/dist/extensions/builtin/browser/interaction-skills/shadow-dom.md +3 -3
  107. package/dist/extensions/builtin/browser/interaction-skills/tabs.md +69 -69
  108. package/dist/extensions/builtin/browser/interaction-skills/uploads.md +1 -1
  109. package/dist/extensions/builtin/browser/interaction-skills/viewport.md +3 -3
  110. package/dist/extensions/builtin/browser/src/browser_harness/AGENT.md +15 -15
  111. package/dist/extensions/builtin/browser/src/browser_harness/__init__.py +8 -8
  112. package/dist/extensions/builtin/browser/src/browser_harness/_ipc.py +90 -90
  113. package/dist/extensions/builtin/browser/src/browser_harness/admin.py +722 -722
  114. package/dist/extensions/builtin/browser/src/browser_harness/daemon.py +328 -328
  115. package/dist/extensions/builtin/browser/src/browser_harness/helpers.py +396 -396
  116. package/dist/extensions/builtin/browser/src/browser_harness/run.py +103 -103
  117. package/dist/extensions/builtin/discipline/skills/brainstorming/SKILL.md +33 -33
  118. package/dist/extensions/builtin/discipline/skills/executing-plans/SKILL.md +25 -25
  119. package/dist/extensions/builtin/discipline/skills/finishing-development-branch/SKILL.md +25 -25
  120. package/dist/extensions/builtin/discipline/skills/receiving-code-review/SKILL.md +22 -22
  121. package/dist/extensions/builtin/discipline/skills/requesting-code-review/SKILL.md +31 -31
  122. package/dist/extensions/builtin/discipline/skills/systematic-debugging/SKILL.md +28 -28
  123. package/dist/extensions/builtin/discipline/skills/test-driven-development/SKILL.md +32 -32
  124. package/dist/extensions/builtin/discipline/skills/using-git-worktrees/SKILL.md +25 -25
  125. package/dist/extensions/builtin/discipline/skills/verification-before-completion/SKILL.md +27 -27
  126. package/dist/extensions/builtin/discipline/skills/writing-plans/SKILL.md +26 -26
  127. package/dist/extensions/builtin/goal/README.md +67 -67
  128. package/dist/extensions/builtin/goal/goal-controller.js +1 -1
  129. package/dist/extensions/builtin/goal/goal-prompts.js +4 -4
  130. package/dist/extensions/builtin/grub/README.md +112 -112
  131. package/dist/extensions/builtin/link-world/agent-workspace/README.md +16 -16
  132. package/dist/extensions/builtin/link-world/internet-search/internet-search.md +65 -65
  133. package/dist/extensions/builtin/link-world/link-world-agent.md +82 -82
  134. package/dist/extensions/builtin/link-world/linkworld.md +313 -313
  135. package/dist/extensions/builtin/link-world/network-routing/network-routing.md +67 -67
  136. package/dist/extensions/builtin/loop/README.md +92 -92
  137. package/dist/extensions/builtin/mcp/figma-design.md +68 -68
  138. package/dist/extensions/builtin/mcp/mcp-management.md +85 -85
  139. package/dist/extensions/builtin/recap/AGENT.md +15 -15
  140. package/dist/extensions/builtin/sal/README.md +72 -72
  141. package/dist/extensions/builtin/security-audit/README.md +289 -289
  142. package/dist/extensions/builtin/team/AGENT.md +112 -112
  143. package/dist/extensions/builtin/team/TESTING.md +299 -299
  144. package/dist/extensions/builtin/token-save/README.md +56 -56
  145. package/dist/extensions/optional/AGENT.md +10 -10
  146. package/dist/index.d.ts +5 -30
  147. package/dist/index.js +1 -1
  148. package/dist/models.d.ts +7 -0
  149. package/dist/models.js +1 -0
  150. package/dist/modes/interactive/theme/dark.json +85 -85
  151. package/dist/modes/interactive/theme/light.json +84 -84
  152. package/dist/modes/interactive/theme/theme-schema.json +335 -335
  153. package/dist/modes/interactive/theme/warm.json +81 -81
  154. package/dist/node_modules/@pencil-agent/ai/dist/cli.js +0 -0
  155. package/dist/packages/protocol/src/flags.d.ts +20 -0
  156. package/dist/packages/protocol/src/flags.js +0 -0
  157. package/dist/packages/protocol/src/hooks.d.ts +17 -0
  158. package/dist/packages/protocol/src/hooks.js +0 -0
  159. package/dist/packages/protocol/src/index.d.ts +4 -2
  160. package/dist/packages/protocol/src/index.js +1 -1
  161. package/dist/packages/protocol/src/lifecycle.d.ts +11 -21
  162. package/dist/public-config.d.ts +12 -0
  163. package/dist/public-config.js +1 -0
  164. package/dist/runtime.d.ts +9 -0
  165. package/dist/runtime.js +1 -0
  166. package/dist/session-compaction.d.ts +7 -0
  167. package/dist/session-compaction.js +1 -0
  168. package/dist/session.d.ts +7 -0
  169. package/dist/session.js +1 -0
  170. package/dist/skills.d.ts +7 -0
  171. package/dist/skills.js +1 -0
  172. package/dist/tools.d.ts +7 -0
  173. package/dist/tools.js +1 -0
  174. package/docs/ACP/345/215/217/350/256/256/351/233/206/346/210/220/345/274/200/345/217/221/346/226/207/346/241/243.md +851 -0
  175. package/docs/SDK-TESTING.md +364 -0
  176. package/docs/codex-goal-command-impl.md +1055 -1055
  177. package/docs/codex-goal-vs-grub.md +500 -500
  178. package/docs/custom-provider.md +27 -27
  179. package/docs/extensions.md +27 -27
  180. package/docs/keybindings.md +27 -27
  181. package/docs/loop 重构完成总结.md +250 -250
  182. package/docs/loop 重构完成报告.md +122 -122
  183. package/docs/loop 重构方案.md +1222 -1222
  184. package/docs/loop 重构方案实现报告.md +158 -158
  185. package/docs/loop 重构方案对比分析.md +128 -128
  186. package/docs/loop 重构计划.md +320 -320
  187. package/docs/loop-usage-examples.md +214 -214
  188. package/docs/mem-core/346/212/200/346/234/257/346/226/207/346/241/243.md +593 -0
  189. package/docs/models.md +27 -27
  190. package/docs/packages.md +27 -27
  191. package/docs/pi-design-philosophy.md +457 -457
  192. package/docs/planmode.md +1987 -1987
  193. package/docs/prompt-templates.md +27 -27
  194. package/docs/providers.md +27 -27
  195. package/docs/sdk.md +27 -27
  196. package/docs/skills.md +27 -27
  197. package/docs/startup-performance-optimization.md +301 -0
  198. package/docs/themes.md +27 -27
  199. package/docs/tui.md +27 -27
  200. package/docs/认知地图.md +47 -0
  201. package/package.json +190 -162
  202. package/docs/cc-agent-design.md +0 -1297
  203. package/docs/cc-tui-design.md +0 -1333
  204. package/docs/nanoPencil-学习计划.md +0 -170
  205. package/docs/scan-report.md +0 -3820
  206. package/docs//345/257/271/346/240/207Claude-Code.md +0 -1775
  207. package/docs//351/230/277/351/207/214/345/267/264/345/267/264/350/264/242/346/212/245/345/210/206/346/236/220/344/271/246.md +0 -261
@@ -1,198 +1,198 @@
1
- # Amazon — Product Search & Data Extraction
2
-
3
- Field-tested against amazon.com on 2025-04-18 using a logged-in Chrome session.
4
- No CAPTCHA or bot detection was triggered during any test run.
5
-
6
- ## Navigation
7
-
8
- ### Direct search URL (fastest, always use this)
9
- ```python
10
- goto_url("https://www.amazon.com/s?k=mechanical+keyboard")
11
- wait_for_load()
12
- wait(2) # dynamic content needs ~2s after readyState=complete
13
- ```
14
-
15
- ### Search box typing (use when you need category filtering)
16
- ```python
17
- goto_url("https://www.amazon.com")
18
- wait_for_load()
19
- wait(1)
20
- js("document.querySelector('#twotabsearchtextbox').focus()")
21
- js("document.querySelector('#twotabsearchtextbox').click()")
22
- wait(0.3)
23
- type_text("wireless mouse")
24
- wait(0.3)
25
- press_key("Enter")
26
- wait_for_load()
27
- wait(2)
28
- ```
29
-
30
- ### Direct product page
31
- ```python
32
- # URL pattern: /dp/{ASIN} or /dp/{ASIN}?th=1 (Amazon may redirect to add ?th=1)
33
- goto_url("https://www.amazon.com/dp/B08Z6X4NK3")
34
- wait_for_load()
35
- wait(2)
36
- ```
37
-
38
- ## Session Gotcha
39
-
40
- **Always use `new_tab()` when opening Amazon for the first time in a harness session.**
41
- `goto_url()` can silently fail to navigate if the current tab resists the navigation
42
- (observed when the daemon attached to a different real tab). The safe pattern:
43
-
44
- ```python
45
- tid = new_tab("https://www.amazon.com/s?k=mechanical+keyboard")
46
- wait_for_load()
47
- wait(2)
48
- ```
49
-
50
- After that, `goto_url()` works fine within the same Amazon session.
51
-
52
- ## Search Results Extraction
53
-
54
- ### Container selector
55
- `[data-component-type="s-search-result"]` — confirmed working, yields ~22 results per page.
56
-
57
- ### Full extraction (field-tested)
58
- ```python
59
- results = js("""
60
- Array.from(document.querySelectorAll('[data-component-type="s-search-result"]')).map(el => ({
61
- asin: el.getAttribute('data-asin'),
62
- title: el.querySelector('h2 span')?.innerText?.trim(),
63
- price: el.querySelector('.a-price .a-offscreen')?.innerText,
64
- list_price: el.querySelector('.a-text-price .a-offscreen')?.innerText,
65
- rating: el.querySelector('[aria-label*="out of 5 stars"]')?.getAttribute('aria-label')?.split(' ')[0],
66
- reviews: el.querySelector('[aria-label*="ratings"]')?.getAttribute('aria-label'),
67
- is_sponsored: !!el.querySelector('.puis-sponsored-label-text'),
68
- url: el.querySelector('h2 a')?.href
69
- }))
70
- """)
71
- ```
72
-
73
- ### Field notes
74
- - **`asin`**: `data-asin` attribute on the container div — always present, matches the `/dp/{ASIN}` URL.
75
- - **`title`**: `h2 span` works consistently. `h2 a.a-link-normal span` also works.
76
- - **`price`**: `.a-price .a-offscreen` returns the formatted string e.g. `"$69.99"`. Use this, not `.a-price-whole`.
77
- - **`list_price`**: `.a-text-price .a-offscreen` — only present when item is on sale (was/now pricing).
78
- - **`rating`**: Use `aria-label` on `[aria-label*="out of 5 stars"]` — gives `"4.5 out of 5 stars, rating details"`, split on space for the number.
79
- - **`reviews`**: Use `[aria-label*="ratings"]` attribute — gives `"1,514 ratings"`. Do NOT use `.a-size-base.s-underline-text` — that element exists on sponsored results and shows "Xbox" (a cross-sell widget text).
80
- - **`is_sponsored`**: `.puis-sponsored-label-text` is present on sponsored listings; first 2-3 results are usually sponsored.
81
- - **`url`**: `h2 a` href — contains the full `/dp/{ASIN}/...` URL.
82
-
83
- ## Product Detail Page Extraction
84
-
85
- ### Confirmed selectors (field-tested on B08Z6X4NK3)
86
- ```python
87
- detail = js("""
88
- ({
89
- title: document.querySelector('#productTitle')?.innerText?.trim(),
90
- price: (function() {
91
- var whole = document.querySelector('.a-price-whole')?.innerText?.replace(/[\\n.]/g,'');
92
- var frac = document.querySelector('.a-price-fraction')?.innerText;
93
- return (whole && frac) ? '$' + whole + '.' + frac
94
- : document.querySelector('.a-price .a-offscreen')?.innerText || null;
95
- })(),
96
- list_price: document.querySelector('.basisPrice .a-offscreen')?.innerText,
97
- rating: document.querySelector('#acrPopover')?.getAttribute('title'),
98
- review_count: document.querySelector('#acrCustomerReviewText')?.innerText,
99
- availability: document.querySelector('#availability span')?.innerText?.trim(),
100
- brand: document.querySelector('#bylineInfo')?.innerText?.trim(),
101
- asin: document.querySelector('input[name="ASIN"]')?.value,
102
- bullet_points: Array.from(document.querySelectorAll('#feature-bullets li span.a-list-item'))
103
- .map(e => e.innerText?.trim()).filter(t => t)
104
- })
105
- """)
106
- ```
107
-
108
- ### Price field notes
109
- - `#priceblock_ourprice` and `#priceblock_dealprice` are **legacy** — they return `null` on modern product pages.
110
- - Construct price from `.a-price-whole` + `.a-price-fraction` (both stripped of `\n` and `.`).
111
- - As a fallback: first `.a-price .a-offscreen` on the page also works (confirmed `$69.99`).
112
- - `list_price` from `.basisPrice .a-offscreen` shows the crossed-out "was" price when a discount exists.
113
-
114
- ## Best Sellers Page
115
-
116
- URL: `https://www.amazon.com/Best-Sellers-{Category}/zgbs/{slug}/`
117
- e.g. `https://www.amazon.com/Best-Sellers-Electronics/zgbs/electronics/`
118
-
119
- ### DOM structure (2025)
120
- `.zg-item-immersion` **does not exist** — Amazon migrated to CSS modules. Use `[data-asin]` anchored on `[id="gridItemRoot"]`:
121
-
122
- ```python
123
- goto_url("https://www.amazon.com/Best-Sellers-Electronics/zgbs/electronics/")
124
- wait_for_load()
125
- wait(2)
126
-
127
- items = js("""
128
- Array.from(document.querySelectorAll('[data-asin]')).map(el => {
129
- var container = el.closest('[id="gridItemRoot"]') || el;
130
- return {
131
- asin: el.getAttribute('data-asin'),
132
- rank: container.querySelector('[class*="zg-bdg-text"]')?.innerText,
133
- title: container.querySelector('img[alt]')?.getAttribute('alt'),
134
- price: container.querySelector('.p13n-sc-price, .a-size-base.a-color-price')?.innerText,
135
- url: 'https://www.amazon.com/dp/' + el.getAttribute('data-asin')
136
- }
137
- }).filter(r => r.rank)
138
- """)
139
- ```
140
-
141
- Note: Title comes from the product image `alt` attribute — the text title elements use obfuscated CSS module class names that change between deployments.
142
-
143
- ## Pagination
144
-
145
- ```python
146
- # Get next page URL directly
147
- next_url = js("document.querySelector('.s-pagination-next')?.href")
148
- if next_url:
149
-     goto_url(next_url)
150
-     wait_for_load()
151
-     wait(2)
152
-
153
- # Or construct by page number
154
- goto_url("https://www.amazon.com/s?k=wireless+mouse&page=2")
155
- ```
156
-
157
- ## Result Count
158
-
159
- ```python
160
- count_text = js("document.querySelector('[data-component-type=\"s-result-info-bar\"] h1')?.innerText?.trim()")
161
- # Returns e.g.: '1-16 of over 40,000 results for "wireless mouse"\nSort by:\n...'
162
- # Extract just the count: count_text.split('\n')[0]
163
- ```
164
-
165
- ## CAPTCHA Detection
166
-
167
- No CAPTCHA was encountered during testing with a logged-in Chrome session. To detect defensively:
168
-
169
- ```python
170
- def check_captcha():
171
-     text = js("document.body.innerText.slice(0,500)") or ""
172
-     url = page_info()["url"]
173
-     return (
174
-         "captcha" in text.lower()
175
-         or "enter the characters" in text.lower()
176
-         or "sorry, we just need to make sure" in text.lower()
177
-         or "captcha" in url.lower()
178
-         or "validateCaptcha" in url
179
-     )
180
-
181
- if check_captcha():
182
-     raise RuntimeError("Amazon CAPTCHA hit — stop and notify user")
183
- ```
184
-
185
- Amazon may serve a CAPTCHA on fresh/anonymous sessions. Using the browser's existing logged-in session avoids this in practice.
186
-
187
- ## Gotchas
188
-
189
- - **`goto_url()` silent failure**: On first visit, use `new_tab(url)` instead. After the tab is on Amazon, `goto_url()` works.
190
- - **`.zg-item-immersion` is gone**: Best Sellers page uses CSS module classes (obfuscated). Use `[data-asin]` + `img[alt]` for title.
191
- - **`.a-size-base.s-underline-text` is unreliable for review count**: On sponsored results it shows unrelated text (e.g. "Xbox"). Use `[aria-label*="ratings"]` instead.
192
- - **`#priceblock_ourprice` is legacy**: Returns `null` on modern pages. Construct from `.a-price-whole` + `.a-price-fraction`.
193
- - **Sponsored results appear first**: First 2-3 results are almost always `is_sponsored: true`. Filter them out with `!el.querySelector('.puis-sponsored-label-text')` when you need organic results.
194
- - **`data-asin` can be empty string on non-product rows**: Filter with `.filter(r => r.asin)`.
195
- - **Price split DOM**: `.a-price-whole` innerText includes a trailing `\n.` — strip it: `.replace(/[\n.]/g,'')`.
196
- - **ASIN from URL**: Use `/dp/([A-Z0-9]{10})/` regex on the product URL. `data-asin` on search results is always the canonical ASIN.
197
- - **`?th=1` redirect**: Amazon appends `?th=1` (and sometimes `?psc=1`) to product URLs after redirect. This is normal — `input[name="ASIN"]` always has the clean ASIN.
198
- - **Wait 2s after `wait_for_load()`**: Amazon search results load the listing cards asynchronously. `readyState=complete` fires before cards render. A hard 2s wait is required.
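
The field notes and gotchas in the removed side of this hunk describe several raw strings that come back from the page: a formatted price such as `"$69.99"`, a rating label such as `"4.5 out of 5 stars, rating details"`, a review label such as `"1,514 ratings"`, and product URLs containing `/dp/{ASIN}`. As a rough illustration only, the sketch below shows one way those strings might be normalized after extraction. The helper names (`parse_price`, `parse_rating`, `parse_review_count`, `asin_from_url`, `organic_results`) are hypothetical and not part of the package; the sample inputs are the ones quoted in the notes.

```python
import re
from typing import Optional

# Pattern taken from the "ASIN from URL" gotcha: ten uppercase letters/digits after /dp/.
ASIN_RE = re.compile(r"/dp/([A-Z0-9]{10})")


def parse_price(text: Optional[str]) -> Optional[float]:
    """Turn a formatted price like "$69.99" into a float; None if unparseable."""
    if not text:
        return None
    cleaned = re.sub(r"[^\d.]", "", text)  # drop currency symbol and thousands separators
    try:
        return float(cleaned)
    except ValueError:
        return None


def parse_rating(label: Optional[str]) -> Optional[float]:
    """Read the leading number from "4.5 out of 5 stars, rating details"."""
    if not label:
        return None
    m = re.match(r"\s*(\d+(?:\.\d+)?)", label)
    return float(m.group(1)) if m else None


def parse_review_count(label: Optional[str]) -> Optional[int]:
    """Read the count from "1,514 ratings"."""
    if not label:
        return None
    m = re.search(r"\d[\d,]*", label)
    return int(m.group(0).replace(",", "")) if m else None


def asin_from_url(url: Optional[str]) -> Optional[str]:
    """Extract the ASIN from a product URL, ignoring ?th=1 / ?psc=1 suffixes."""
    if not url:
        return None
    m = ASIN_RE.search(url)
    return m.group(1) if m else None


def organic_results(rows: list) -> list:
    """Keep rows that have a non-empty ASIN and are not marked as sponsored."""
    return [r for r in rows if r.get("asin") and not r.get("is_sponsored")]
```

These helpers work on plain Python values, so they can be applied to the list returned by the search-results extraction without touching the browser again.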
1
+ # Amazon — Product Search & Data Extraction
2
+
3
+ Field-tested against amazon.com on 2025-04-18 using a logged-in Chrome session.
4
+ No CAPTCHA or bot detection was triggered during any test run.
5
+
6
+ ## Navigation
7
+
8
+ ### Direct search URL (fastest, always use this)
9
+ ```python
10
+ goto_url("https://www.amazon.com/s?k=mechanical+keyboard")
11
+ wait_for_load()
12
+ wait(2) # dynamic content needs ~2s after readyState=complete
13
+ ```
14
+
15
+ ### Search box typing (use when you need category filtering)
16
+ ```python
17
+ goto_url("https://www.amazon.com")
18
+ wait_for_load()
19
+ wait(1)
20
+ js("document.querySelector('#twotabsearchtextbox').focus()")
21
+ js("document.querySelector('#twotabsearchtextbox').click()")
22
+ wait(0.3)
23
+ type_text("wireless mouse")
24
+ wait(0.3)
25
+ press_key("Enter")
26
+ wait_for_load()
27
+ wait(2)
28
+ ```
29
+
30
+ ### Direct product page
31
+ ```python
32
+ # URL pattern: /dp/{ASIN} or /dp/{ASIN}?th=1 (Amazon may redirect to add ?th=1)
33
+ goto_url("https://www.amazon.com/dp/B08Z6X4NK3")
34
+ wait_for_load()
35
+ wait(2)
36
+ ```
37
+
38
+ ## Session Gotcha
39
+
40
+ **Always use `new_tab()` when opening Amazon for the first time in a harness session.**
41
+ `goto_url()` can silently fail to navigate if the current tab resists the navigation
42
+ (observed when the daemon attached to a different real tab). The safe pattern:
43
+
44
+ ```python
45
+ tid = new_tab("https://www.amazon.com/s?k=mechanical+keyboard")
46
+ wait_for_load()
47
+ wait(2)
48
+ ```
49
+
50
+ After that, `goto_url()` works fine within the same Amazon session.
51
+
52
+ ## Search Results Extraction
53
+
54
+ ### Container selector
55
+ `[data-component-type="s-search-result"]` — confirmed working, yields ~22 results per page.
56
+
57
+ ### Full extraction (field-tested)
58
+ ```python
59
+ results = js("""
60
+ Array.from(document.querySelectorAll('[data-component-type="s-search-result"]')).map(el => ({
61
+ asin: el.getAttribute('data-asin'),
62
+ title: el.querySelector('h2 span')?.innerText?.trim(),
63
+ price: el.querySelector('.a-price .a-offscreen')?.innerText,
64
+ list_price: el.querySelector('.a-text-price .a-offscreen')?.innerText,
65
+ rating: el.querySelector('[aria-label*="out of 5 stars"]')?.getAttribute('aria-label')?.split(' ')[0],
66
+ reviews: el.querySelector('[aria-label*="ratings"]')?.getAttribute('aria-label'),
67
+ is_sponsored: !!el.querySelector('.puis-sponsored-label-text'),
68
+ url: el.querySelector('h2 a')?.href
69
+ }))
70
+ """)
71
+ ```
72
+
73
+ ### Field notes
74
+ - **`asin`**: `data-asin` attribute on the container div — always present, matches the `/dp/{ASIN}` URL.
75
+ - **`title`**: `h2 span` works consistently. `h2 a.a-link-normal span` also works.
76
+ - **`price`**: `.a-price .a-offscreen` returns the formatted string e.g. `"$69.99"`. Use this, not `.a-price-whole`.
77
+ - **`list_price`**: `.a-text-price .a-offscreen` — only present when item is on sale (was/now pricing).
78
+ - **`rating`**: Use `aria-label` on `[aria-label*="out of 5 stars"]` — gives `"4.5 out of 5 stars, rating details"`, split on space for the number.
79
+ - **`reviews`**: Use `[aria-label*="ratings"]` attribute — gives `"1,514 ratings"`. Do NOT use `.a-size-base.s-underline-text` — that element exists on sponsored results and shows "Xbox" (a cross-sell widget text).
80
+ - **`is_sponsored`**: `.puis-sponsored-label-text` is present on sponsored listings; first 2-3 results are usually sponsored.
81
+ - **`url`**: `h2 a` href — contains the full `/dp/{ASIN}/...` URL.
82
+
83
+ ## Product Detail Page Extraction
84
+
85
+ ### Confirmed selectors (field-tested on B08Z6X4NK3)
86
+ ```python
87
+ detail = js("""
88
+ ({
89
+ title: document.querySelector('#productTitle')?.innerText?.trim(),
90
+ price: (function() {
91
+ var whole = document.querySelector('.a-price-whole')?.innerText?.replace(/[\\n.]/g,'');
92
+ var frac = document.querySelector('.a-price-fraction')?.innerText;
93
+ return (whole && frac) ? '$' + whole + '.' + frac
94
+ : document.querySelector('.a-price .a-offscreen')?.innerText || null;
95
+ })(),
96
+ list_price: document.querySelector('.basisPrice .a-offscreen')?.innerText,
97
+ rating: document.querySelector('#acrPopover')?.getAttribute('title'),
98
+ review_count: document.querySelector('#acrCustomerReviewText')?.innerText,
99
+ availability: document.querySelector('#availability span')?.innerText?.trim(),
100
+ brand: document.querySelector('#bylineInfo')?.innerText?.trim(),
101
+ asin: document.querySelector('input[name="ASIN"]')?.value,
102
+ bullet_points: Array.from(document.querySelectorAll('#feature-bullets li span.a-list-item'))
103
+ .map(e => e.innerText?.trim()).filter(t => t)
104
+ })
105
+ """)
106
+ ```
107
+
+ ### Price field notes
+ - `#priceblock_ourprice` and `#priceblock_dealprice` are **legacy**: `querySelector` returns `null` for them on modern product pages.
+ - Construct the price from `.a-price-whole` + `.a-price-fraction`, after stripping `\n` and `.` from the whole part.
+ - As a fallback, the first `.a-price .a-offscreen` on the page also works (confirmed `$69.99`).
+ - `list_price` from `.basisPrice .a-offscreen` shows the crossed-out "was" price when a discount exists.
+
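The same whole-plus-fraction logic can be applied on the Python side when the two parts are fetched separately. This is a sketch only: `build_price` is a hypothetical helper, and the `"$"` default is an assumption that holds for amazon.com listings such as the `$69.99` sample.

```python
from typing import Optional


def build_price(whole: Optional[str], fraction: Optional[str],
                offscreen: Optional[str] = None, symbol: str = "$") -> Optional[str]:
    """Join the split price parts; fall back to the offscreen string."""
    # The whole part arrives with a trailing "\n." that must be stripped.
    clean_whole = (whole or "").replace("\n", "").replace(".", "").strip()
    clean_frac = (fraction or "").strip()
    if clean_whole and clean_frac:
        return f"{symbol}{clean_whole}.{clean_frac}"
    # Either part is missing: use the `.a-price .a-offscreen` text if given.
    return offscreen or None
```

For example, `build_price("69\n.", "99")` yields `"$69.99"`, and `build_price(None, None, "$69.99")` falls back to the offscreen string.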
+ ## Best Sellers Page
+
+ URL: `https://www.amazon.com/Best-Sellers-{Category}/zgbs/{slug}/`
+ e.g. `https://www.amazon.com/Best-Sellers-Electronics/zgbs/electronics/`
+
+ ### DOM structure (2025)
+ `.zg-item-immersion` **does not exist**: Amazon migrated to CSS modules. Use `[data-asin]` anchored on `[id="gridItemRoot"]`:
+
+ ```python
+ goto_url("https://www.amazon.com/Best-Sellers-Electronics/zgbs/electronics/")
+ wait_for_load()
+ wait(2)
+
+ items = js("""
+ Array.from(document.querySelectorAll('[data-asin]')).map(el => {
+   var container = el.closest('[id="gridItemRoot"]') || el;
+   return {
+     asin: el.getAttribute('data-asin'),
+     rank: container.querySelector('[class*="zg-bdg-text"]')?.innerText,
+     title: container.querySelector('img[alt]')?.getAttribute('alt'),
+     price: container.querySelector('.p13n-sc-price, .a-size-base.a-color-price')?.innerText,
+     url: 'https://www.amazon.com/dp/' + el.getAttribute('data-asin')
+   }
+ }).filter(r => r.rank)
+ """)
+ ```
+
+ Note: The title comes from the product image `alt` attribute, because the text title elements use obfuscated CSS module class names that change between deployments.
+
+ ## Pagination
+
+ ```python
+ # Get next page URL directly
+ next_url = js("document.querySelector('.s-pagination-next')?.href")
+ if next_url:
+     goto_url(next_url)
+     wait_for_load()
+     wait(2)
+
+ # Or construct by page number
+ goto_url("https://www.amazon.com/s?k=wireless+mouse&page=2")
+ ```
+
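When constructing page URLs for arbitrary queries, the query string is safer to build with the standard library than by hand. The sketch below assumes only the `k` and `page` parameters shown in the example URL above; `search_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode


def search_url(query: str, page: int = 1) -> str:
    """Build an Amazon search URL; `page` is omitted for the first page."""
    params = {"k": query}
    if page > 1:
        params["page"] = str(page)
    # urlencode() escapes spaces as "+", matching the k=wireless+mouse form.
    return "https://www.amazon.com/s?" + urlencode(params)
```

`search_url("wireless mouse", 2)` reproduces the URL used in the block above.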
+ ## Result Count
+
+ ```python
+ count_text = js("document.querySelector('[data-component-type=\"s-result-info-bar\"] h1')?.innerText?.trim()")
+ # Returns e.g.: '1-16 of over 40,000 results for "wireless mouse"\nSort by:\n...'
+ # Extract just the count: count_text.split('\n')[0]
+ ```
+
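To turn that first line into numbers, a small regex is usually enough. This is a sketch under the assumption that the text follows the `'1-16 of over 40,000 results ...'` shape shown above; other wordings are not covered and return `None`. The name `parse_result_count` is hypothetical.

```python
import re
from typing import Optional

# Matches "1-16 of over 40,000 results"; the word "over" is optional.
_COUNT_RE = re.compile(
    r"(?P<first>[\d,]+)-(?P<last>[\d,]+) of (?P<over>over )?(?P<total>[\d,]+) results"
)


def _to_int(digits: str) -> int:
    return int(digits.replace(",", ""))


def parse_result_count(count_text: Optional[str]) -> Optional[dict]:
    """Parse the first line of the result info bar into integers."""
    if not count_text:
        return None
    first_line = count_text.split("\n")[0]
    match = _COUNT_RE.search(first_line)
    if not match:
        return None
    return {
        "first": _to_int(match.group("first")),
        "last": _to_int(match.group("last")),
        "total": _to_int(match.group("total")),
        # True when Amazon reports a lower bound ("over 40,000").
        "approximate": match.group("over") is not None,
    }
```

The `approximate` flag is kept separate so callers do not treat a lower bound like "over 40,000" as an exact total.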
+ ## CAPTCHA Detection
+
+ No CAPTCHA was encountered during testing with a logged-in Chrome session. To detect defensively:
+
+ ```python
+ def check_captcha():
+     text = js("document.body.innerText.slice(0,500)") or ""
+     url = page_info()["url"]
+     return (
+         "captcha" in text.lower()
+         or "enter the characters" in text.lower()
+         or "sorry, we just need to make sure" in text.lower()
+         or "captcha" in url.lower()
+         or "validateCaptcha" in url
+     )
+
+ if check_captcha():
+     raise RuntimeError("Amazon CAPTCHA hit: stop and notify user")
+ ```
+
+ Amazon may serve a CAPTCHA on fresh/anonymous sessions. Using the browser's existing logged-in session avoids this in practice.
+
+ ## Gotchas
+
+ - **`goto_url()` silent failure**: On first visit, use `new_tab(url)` instead. After the tab is on Amazon, `goto_url()` works.
+ - **`.zg-item-immersion` is gone**: The Best Sellers page uses obfuscated CSS module classes. Use `[data-asin]` + `img[alt]` for the title.
+ - **`.a-size-base.s-underline-text` is unreliable for review count**: On sponsored results it shows unrelated text (e.g. "Xbox"). Use `[aria-label*="ratings"]` instead.
+ - **`#priceblock_ourprice` is legacy**: Returns `null` on modern pages. Construct from `.a-price-whole` + `.a-price-fraction`.
+ - **Sponsored results appear first**: The first 2-3 results are almost always `is_sponsored: true`. Filter them out with `!el.querySelector('.puis-sponsored-label-text')` when you need organic results.
+ - **`data-asin` can be an empty string on non-product rows**: Filter with `.filter(r => r.asin)`.
+ - **Price split DOM**: `.a-price-whole` innerText includes a trailing `\n.`; strip it with `.replace(/[\n.]/g,'')`.
+ - **ASIN from URL**: Use the `/dp/([A-Z0-9]{10})/` regex on the product URL. `data-asin` on search results is always the canonical ASIN.
+ - **`?th=1` redirect**: Amazon appends `?th=1` (and sometimes `?psc=1`) to product URLs after redirect. This is normal, and `input[name="ASIN"]` always has the clean ASIN.
+ - **Wait 2s after `wait_for_load()`**: Amazon search results load the listing cards asynchronously, so `readyState=complete` fires before the cards render. A hard 2s wait is required.
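
Two of the gotchas above (ASIN from URL, and filtering out sponsored or empty rows) can also be handled in Python after extraction. The sketch below assumes result rows are dicts with the `asin` and `is_sponsored` keys used earlier in this document. The helper names are hypothetical, and the regex relaxes the trailing `/` so that URLs ending in `?th=1` also match.

```python
import re
from typing import Optional

# /dp/ followed by a 10-character ASIN, ending at "/", "?", "#" or end of string.
_ASIN_RE = re.compile(r"/dp/([A-Z0-9]{10})(?=[/?#]|$)")


def asin_from_url(url: str) -> Optional[str]:
    """Return the ASIN embedded in a product URL, or None if there is none."""
    match = _ASIN_RE.search(url)
    return match.group(1) if match else None


def organic_results(results: list) -> list:
    """Drop sponsored rows and rows whose data-asin was an empty string."""
    return [r for r in results if r.get("asin") and not r.get("is_sponsored")]
```

For instance, `asin_from_url("https://www.amazon.com/dp/B08Z6X4NK3/?th=1")` returns `"B08Z6X4NK3"`, while a search URL without `/dp/` returns `None`.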