@pencil-agent/nano-pencil 2.0.1 → 2.0.3

Files changed (188)
  1. package/README.md +267 -267
  2. package/dist/build-meta.json +3 -3
  3. package/dist/core/export-html/AGENT.md +11 -11
  4. package/dist/core/export-html/template.css +971 -971
  5. package/dist/core/export-html/template.html +54 -54
  6. package/dist/core/model/custom-providers.js +1 -1
  7. package/dist/core/model-registry.js +5 -5
  8. package/dist/extensions/builtin/AGENT.md +115 -115
  9. package/dist/extensions/builtin/browser/AGENT.md +17 -17
  10. package/dist/extensions/builtin/browser/agent-workspace/agent_helpers.py +12 -12
  11. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/amazon/product-search.md +198 -198
  12. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/archive-org/scraping.md +341 -341
  13. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/arxiv/scraping.md +311 -311
  14. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/arxiv-bulk/scraping.md +333 -333
  15. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/atlas/overview.md +70 -70
  16. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/booking-com/scraping.md +578 -578
  17. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/capterra/scraping.md +440 -440
  18. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/centilebrain/generate-estimates.md +110 -110
  19. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/coingecko/scraping.md +325 -325
  20. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/coinmarketcap/scraping.md +463 -463
  21. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/coursera/scraping.md +360 -360
  22. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/craigslist/scraping.md +390 -390
  23. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/crossref/scraping.md +568 -568
  24. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/dev-to/scraping.md +323 -323
  25. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/duckduckgo/scraping.md +349 -349
  26. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/ebay/scraping.md +435 -435
  27. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/etsy/scraping.md +506 -506
  28. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/eventbrite/scraping.md +363 -363
  29. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/expedia/automation.md +168 -168
  30. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/facebook/groups.md +236 -236
  31. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/facebook/pages.md +295 -295
  32. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/framer/editor.md +108 -108
  33. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/fred/scraping.md +493 -493
  34. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/g2/scraping.md +580 -580
  35. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/genius/scraping.md +511 -511
  36. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/github/repo-actions.md +65 -65
  37. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/github/scraping.md +184 -184
  38. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/glassdoor/scraping.md +543 -543
  39. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/gmail/compose.md +122 -122
  40. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/goodreads/scraping.md +461 -461
  41. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/gutenberg/scraping.md +383 -383
  42. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/hackernews/scraping.md +243 -243
  43. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/howlongtobeat/scraping.md +473 -473
  44. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/imdb/scraping.md +271 -271
  45. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/itch-io/scraping.md +436 -436
  46. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/job-boards/indeed-glassdoor.md +1021 -1021
  47. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/letterboxd/scraping.md +349 -349
  48. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/linkedin/invitation-manager.md +109 -109
  49. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/loom/folder-enumeration.md +170 -170
  50. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/macrotrends/scraping.md +537 -537
  51. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/medium/article-hydration.md +120 -120
  52. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/medium/scraping.md +414 -414
  53. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/metacritic/scraping.md +477 -477
  54. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/musicbrainz/scraping.md +478 -478
  55. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/nasa/scraping.md +339 -339
  56. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/news-aggregation/multi-source.md +205 -205
  57. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/open-library/scraping.md +472 -472
  58. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/openalex/scraping.md +470 -470
  59. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/openstreetmap/scraping.md +490 -490
  60. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/package-registries/npm-pypi.md +478 -478
  61. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/polymarket/scraping.md +234 -234
  62. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/producthunt/scraping.md +307 -307
  63. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/pubmed/scraping.md +421 -421
  64. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/quora/scraping.md +364 -364
  65. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/rawg/scraping.md +352 -352
  66. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/reddit/scraping.md +124 -124
  67. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/rest-countries/scraping.md +233 -233
  68. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/sec-edgar/scraping.md +361 -361
  69. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/README.md +36 -36
  70. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/embedded-apps.md +72 -72
  71. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/knowledge-base.md +109 -109
  72. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/polaris-inputs.md +137 -137
  73. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/soundcloud/scraping.md +362 -362
  74. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/spotify/scraping.md +339 -339
  75. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/stackoverflow/scraping.md +435 -435
  76. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/steam/scraping.md +575 -575
  77. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/substack/scraping.md +338 -338
  78. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/thetechgeeks/pricing.md +52 -52
  79. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/tiktok/upload.md +107 -107
  80. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/tradingview/scraping.md +309 -309
  81. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/trello/boards-and-lists.md +88 -88
  82. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/trustpilot/scraping.md +375 -375
  83. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/walmart/scraping.md +444 -444
  84. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/wayback-machine/scraping.md +306 -306
  85. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/weather/scraping.md +398 -398
  86. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/wellfound/scraping.md +596 -596
  87. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/world-bank/scraping.md +356 -356
  88. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/xiaohongshu/scraping.md +84 -84
  89. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/youtube/scraping.md +418 -418
  90. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/zillow/scraping.md +433 -433
  91. package/dist/extensions/builtin/browser/browser.md +73 -73
  92. package/dist/extensions/builtin/browser/install.md +142 -142
  93. package/dist/extensions/builtin/browser/interaction-skills/connection.md +48 -48
  94. package/dist/extensions/builtin/browser/interaction-skills/cookies.md +3 -3
  95. package/dist/extensions/builtin/browser/interaction-skills/cross-origin-iframes.md +3 -3
  96. package/dist/extensions/builtin/browser/interaction-skills/dialogs.md +64 -64
  97. package/dist/extensions/builtin/browser/interaction-skills/downloads.md +3 -3
  98. package/dist/extensions/builtin/browser/interaction-skills/drag-and-drop.md +3 -3
  99. package/dist/extensions/builtin/browser/interaction-skills/dropdowns.md +3 -3
  100. package/dist/extensions/builtin/browser/interaction-skills/iframes.md +3 -3
  101. package/dist/extensions/builtin/browser/interaction-skills/network-requests.md +3 -3
  102. package/dist/extensions/builtin/browser/interaction-skills/print-as-pdf.md +3 -3
  103. package/dist/extensions/builtin/browser/interaction-skills/profile-sync.md +90 -90
  104. package/dist/extensions/builtin/browser/interaction-skills/screenshots.md +17 -17
  105. package/dist/extensions/builtin/browser/interaction-skills/scrolling.md +3 -3
  106. package/dist/extensions/builtin/browser/interaction-skills/shadow-dom.md +3 -3
  107. package/dist/extensions/builtin/browser/interaction-skills/tabs.md +69 -69
  108. package/dist/extensions/builtin/browser/interaction-skills/uploads.md +1 -1
  109. package/dist/extensions/builtin/browser/interaction-skills/viewport.md +3 -3
  110. package/dist/extensions/builtin/browser/src/browser_harness/AGENT.md +15 -15
  111. package/dist/extensions/builtin/browser/src/browser_harness/__init__.py +8 -8
  112. package/dist/extensions/builtin/browser/src/browser_harness/_ipc.py +90 -90
  113. package/dist/extensions/builtin/browser/src/browser_harness/admin.py +722 -722
  114. package/dist/extensions/builtin/browser/src/browser_harness/daemon.py +328 -328
  115. package/dist/extensions/builtin/browser/src/browser_harness/helpers.py +396 -396
  116. package/dist/extensions/builtin/browser/src/browser_harness/run.py +103 -103
  117. package/dist/extensions/builtin/debug/index.js +9 -9
  118. package/dist/extensions/builtin/discipline/skills/brainstorming/SKILL.md +33 -33
  119. package/dist/extensions/builtin/discipline/skills/executing-plans/SKILL.md +25 -25
  120. package/dist/extensions/builtin/discipline/skills/finishing-development-branch/SKILL.md +25 -25
  121. package/dist/extensions/builtin/discipline/skills/receiving-code-review/SKILL.md +22 -22
  122. package/dist/extensions/builtin/discipline/skills/requesting-code-review/SKILL.md +31 -31
  123. package/dist/extensions/builtin/discipline/skills/systematic-debugging/SKILL.md +28 -28
  124. package/dist/extensions/builtin/discipline/skills/test-driven-development/SKILL.md +32 -32
  125. package/dist/extensions/builtin/discipline/skills/using-git-worktrees/SKILL.md +25 -25
  126. package/dist/extensions/builtin/discipline/skills/verification-before-completion/SKILL.md +27 -27
  127. package/dist/extensions/builtin/discipline/skills/writing-plans/SKILL.md +26 -26
  128. package/dist/extensions/builtin/goal/README.md +67 -67
  129. package/dist/extensions/builtin/goal/index.js +6 -6
  130. package/dist/extensions/builtin/grub/README.md +112 -112
  131. package/dist/extensions/builtin/link-world/agent-workspace/README.md +16 -16
  132. package/dist/extensions/builtin/link-world/internet-search/internet-search.md +65 -65
  133. package/dist/extensions/builtin/link-world/link-world-agent.md +82 -82
  134. package/dist/extensions/builtin/link-world/linkworld.md +313 -313
  135. package/dist/extensions/builtin/link-world/network-routing/network-routing.md +67 -67
  136. package/dist/extensions/builtin/loop/README.md +92 -92
  137. package/dist/extensions/builtin/mcp/figma-design.md +68 -68
  138. package/dist/extensions/builtin/mcp/mcp-management.md +85 -85
  139. package/dist/extensions/builtin/recap/AGENT.md +15 -15
  140. package/dist/extensions/builtin/sal/README.md +72 -72
  141. package/dist/extensions/builtin/security-audit/README.md +289 -289
  142. package/dist/extensions/builtin/team/AGENT.md +112 -112
  143. package/dist/extensions/builtin/team/TESTING.md +299 -299
  144. package/dist/extensions/builtin/token-save/README.md +56 -56
  145. package/dist/extensions/optional/AGENT.md +10 -10
  146. package/dist/modes/interactive/controllers/input-submit-controller.js +2 -2
  147. package/dist/modes/interactive/controllers/stream-render-controller.js +2 -2
  148. package/dist/modes/interactive/interactive-mode.js +19 -19
  149. package/dist/modes/interactive/theme/dark.json +85 -85
  150. package/dist/modes/interactive/theme/light.json +84 -84
  151. package/dist/modes/interactive/theme/theme-schema.json +335 -335
  152. package/dist/modes/interactive/theme/warm.json +81 -81
  153. package/dist/node_modules/@pencil-agent/ai/dist/cli.js +0 -0
  154. package/dist/node_modules/@pencil-agent/ai/dist/models.generated.js +1 -1
  155. package/docs/ACP/345/215/217/350/256/256/351/233/206/346/210/220/345/274/200/345/217/221/346/226/207/346/241/243.md +851 -0
  156. package/docs/SDK-TESTING.md +364 -0
  157. package/docs/codex-goal-command-impl.md +1055 -1055
  158. package/docs/codex-goal-vs-grub.md +500 -500
  159. package/docs/custom-provider.md +27 -27
  160. package/docs/extensions.md +27 -27
  161. package/docs/keybindings.md +27 -27
  162. package/docs/loop 重构完成总结.md +250 -250
  163. package/docs/loop 重构完成报告.md +122 -122
  164. package/docs/loop 重构方案.md +1222 -1222
  165. package/docs/loop 重构方案实现报告.md +158 -158
  166. package/docs/loop 重构方案对比分析.md +128 -128
  167. package/docs/loop 重构计划.md +320 -320
  168. package/docs/loop-usage-examples.md +214 -214
  169. package/docs/mem-core/346/212/200/346/234/257/346/226/207/346/241/243.md +593 -0
  170. package/docs/models.md +27 -27
  171. package/docs/packages.md +27 -27
  172. package/docs/pi-design-philosophy.md +457 -457
  173. package/docs/planmode.md +1987 -1987
  174. package/docs/prompt-templates.md +27 -27
  175. package/docs/providers.md +27 -27
  176. package/docs/sdk.md +27 -27
  177. package/docs/skills.md +27 -27
  178. package/docs/startup-performance-optimization.md +301 -0
  179. package/docs/themes.md +27 -27
  180. package/docs/tui.md +27 -27
  181. package/docs/认知地图.md +47 -0
  182. package/package.json +190 -190
  183. package/docs/cc-agent-design.md +0 -1297
  184. package/docs/cc-tui-design.md +0 -1333
  185. package/docs/nanoPencil-学习计划.md +0 -170
  186. package/docs/scan-report.md +0 -3820
  187. package/docs//345/257/271/346/240/207Claude-Code.md +0 -1775
  188. package/docs//351/230/277/351/207/214/345/267/264/345/267/264/350/264/242/346/212/245/345/210/206/346/236/220/344/271/246.md +0 -261
@@ -1,65 +1,65 @@
1
- # GitHub — Repo actions (star, unstar, watch)
2
-
3
- `https://github.com/{owner}/{repo}` — user-triggered actions on the repo header (Star, Unstar, Watch, Unwatch) are HTML forms that POST back to GitHub with the session's CSRF token already rendered inline. **Submit the form — do not click the button.**
4
-
5
- ## Do this first
6
-
7
- ```python
8
- # Precondition: user is logged in
9
- if not js('!!document.querySelector("meta[name=user-login]")'):
10
-     raise RuntimeError("not logged in to GitHub")
11
-
12
- # Star the current repo
13
- js("""
14
- (()=>{
15
- const f = document.querySelector('form[action$="/star"]');
16
- if (!f) return 'already-starred-or-missing';
17
- f.submit();
18
- return 'submitted';
19
- })()
20
- """)
21
- wait(2)
22
- wait_for_load()
23
-
24
- # Verify — the toggle swaps which form is present
25
- starred = js('!!document.querySelector(\'form[action$="/unstar"]\')')
26
- ```
27
-
28
- Same pattern for the reverse (`form[action$="/unstar"]`) and for watch/unwatch (`form[action$="/subscription"]` + a hidden `_method` field, see below).
29
-
30
- ## Why not click the button
31
-
32
- The visible Star button looks like `button[aria-label^="Star "]`, but that selector has two gotchas on the modern repo header:
33
-
34
- - **There are two matching buttons.** The first one `querySelector` returns is a hidden fallback inside the sticky sub-header form with `getBoundingClientRect() == {x:0, y:0, w:0, h:0}`. Coordinate-clicking it does nothing because it has no geometry.
35
- - **Synthetic `.click()` on the visible React button does not persist the star.** The click fires, `aria-label` stays `Star ...`, network tab shows no POST. GitHub's component swallows the synthetic event somewhere in its React fiber handler.
36
-
37
- `form.submit()` sidesteps both problems — it bypasses React entirely and goes straight to the HTML form's POST. The authenticity token is already in a hidden input inside the form, so there's nothing extra to fetch.
38
-
39
- ## Watch / Unwatch
40
-
41
- The subscription form uses a shared endpoint with a `_method` override:
42
-
43
- ```python
44
- # Watch (all activity)
45
- js("""
46
- (()=>{
47
- const f = document.querySelector('form[action$="/subscription"]');
48
- if (!f) return 'missing';
49
- f.submit();
50
- return 'submitted';
51
- })()
52
- """)
53
- ```
54
-
55
- GitHub renders different form attributes (different `_method` hidden input values) depending on the current state. Re-read the form after every toggle rather than caching a reference.
56
-
57
- ## Gotchas
58
-
59
- - **Star count in the rendered button lags the true count by a hydration tick.** The durable signal that "this worked" is which form is on the page after reload: `form[action$="/star"]` present means unstarred, `form[action$="/unstar"]` means starred. The visible aria-label is reliable once you scroll to the top and wait ~1s after submit; the count inside the button updates on soft navigation and is not a good assertion target.
60
-
61
- - **`form.submit()` bypasses the form's `submit` event listeners** — fine for GitHub's case (the handler is a full navigation), but if a future change wires in `e.preventDefault()` to do an XHR, `form.requestSubmit()` is the safer alternative. Worth trying first if `form.submit()` stops working.
62
-
63
- - **If the user is not logged in the forms are not rendered at all.** `meta[name="user-login"]` is the cheapest pre-check.
64
-
65
- - **For read-only star counts, don't touch the DOM — use the API.** `http_get("https://api.github.com/repos/{owner}/{repo}")` returns `stargazers_count` without any browser interaction. See `scraping.md`. Only use the form-submit pattern when you actually need to *change* state on behalf of the logged-in user.
1
+ # GitHub — Repo actions (star, unstar, watch)
2
+
3
+ `https://github.com/{owner}/{repo}` — user-triggered actions on the repo header (Star, Unstar, Watch, Unwatch) are HTML forms that POST back to GitHub with the session's CSRF token already rendered inline. **Submit the form — do not click the button.**
4
+
5
+ ## Do this first
6
+
7
+ ```python
8
+ # Precondition: user is logged in
9
+ if not js('!!document.querySelector("meta[name=user-login]")'):
10
+     raise RuntimeError("not logged in to GitHub")
11
+
12
+ # Star the current repo
13
+ js("""
14
+ (()=>{
15
+ const f = document.querySelector('form[action$="/star"]');
16
+ if (!f) return 'already-starred-or-missing';
17
+ f.submit();
18
+ return 'submitted';
19
+ })()
20
+ """)
21
+ wait(2)
22
+ wait_for_load()
23
+
24
+ # Verify — the toggle swaps which form is present
25
+ starred = js('!!document.querySelector(\'form[action$="/unstar"]\')')
26
+ ```
27
+
28
+ Same pattern for the reverse (`form[action$="/unstar"]`) and for watch/unwatch (`form[action$="/subscription"]` + a hidden `_method` field, see below).
29
+
30
+ ## Why not click the button
31
+
32
+ The visible Star button looks like `button[aria-label^="Star "]`, but that selector has two gotchas on the modern repo header:
33
+
34
+ - **There are two matching buttons.** The first one `querySelector` returns is a hidden fallback inside the sticky sub-header form with `getBoundingClientRect() == {x:0, y:0, w:0, h:0}`. Coordinate-clicking it does nothing because it has no geometry.
35
+ - **Synthetic `.click()` on the visible React button does not persist the star.** The click fires, `aria-label` stays `Star ...`, network tab shows no POST. GitHub's component swallows the synthetic event somewhere in its React fiber handler.
36
+
37
+ `form.submit()` sidesteps both problems — it bypasses React entirely and goes straight to the HTML form's POST. The authenticity token is already in a hidden input inside the form, so there's nothing extra to fetch.
38
+
39
+ ## Watch / Unwatch
40
+
41
+ The subscription form uses a shared endpoint with a `_method` override:
42
+
43
+ ```python
44
+ # Watch (all activity)
45
+ js("""
46
+ (()=>{
47
+ const f = document.querySelector('form[action$="/subscription"]');
48
+ if (!f) return 'missing';
49
+ f.submit();
50
+ return 'submitted';
51
+ })()
52
+ """)
53
+ ```
54
+
55
+ GitHub renders different form attributes (different `_method` hidden input values) depending on the current state. Re-read the form after every toggle rather than caching a reference.
56
+
57
+ ## Gotchas
58
+
59
+ - **Star count in the rendered button lags the true count by a hydration tick.** The durable signal that "this worked" is which form is on the page after reload: `form[action$="/star"]` present means unstarred, `form[action$="/unstar"]` means starred. The visible aria-label is reliable once you scroll to the top and wait ~1s after submit; the count inside the button updates on soft navigation and is not a good assertion target.
60
+
61
+ - **`form.submit()` bypasses the form's `submit` event listeners** — fine for GitHub's case (the handler is a full navigation), but if a future change wires in `e.preventDefault()` to do an XHR, `form.requestSubmit()` is the safer alternative. Worth trying first if `form.submit()` stops working.
62
+
63
+ - **If the user is not logged in the forms are not rendered at all.** `meta[name="user-login"]` is the cheapest pre-check.
64
+
65
+ - **For read-only star counts, don't touch the DOM — use the API.** `http_get("https://api.github.com/repos/{owner}/{repo}")` returns `stargazers_count` without any browser interaction. See `scraping.md`. Only use the form-submit pattern when you actually need to *change* state on behalf of the logged-in user.
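Review note on the verification rule in this file (`form[action$="/star"]` present means unstarred, `form[action$="/unstar"]` means starred): the rule can be checked offline against a saved page. The sketch below is illustrative only. `FormActionCollector`, `star_state`, and the sample markup are hypothetical names and data, not part of the package, and it assumes exactly one of the two forms is rendered, as the file states.

```python
from html.parser import HTMLParser


class FormActionCollector(HTMLParser):
    """Collect the action attribute of every <form> in an HTML string."""

    def __init__(self):
        super().__init__()
        self.actions = []

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            action = dict(attrs).get("action")
            if action:
                self.actions.append(action)


def star_state(html):
    """Return 'starred', 'unstarred', or 'unknown' from the forms present."""
    collector = FormActionCollector()
    collector.feed(html)
    has_unstar = any(a.endswith("/unstar") for a in collector.actions)
    has_star = any(a.endswith("/star") for a in collector.actions)
    if has_unstar and not has_star:
        return "starred"
    if has_star and not has_unstar:
        return "unstarred"
    # Neither form, or both at once: the rule cannot decide.
    return "unknown"


# Hypothetical markup for illustration, not copied from GitHub.
sample = '<form action="/octo/demo/unstar" method="post"></form>'
print(star_state(sample))  # starred
```

Returning `unknown` when both forms appear keeps the check conservative if the page ever renders the two forms side by side.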
@@ -1,184 +1,184 @@
1
- # GitHub — Scraping & Data Extraction
2
-
3
- `https://github.com` — public data, mix of REST API (fast, rate-limited) and browser (trending page only).
4
-
5
- ## Do this first
6
-
7
- **Use the REST API for repo/user/release data — it's one call, no browser, fully parsed JSON.**
8
-
9
- ```python
10
- import json
11
- data = json.loads(http_get("https://api.github.com/repos/{owner}/{repo}"))
12
- # Key fields: stargazers_count, forks_count, description, language, topics,
13
- # open_issues_count, created_at, updated_at, pushed_at,
14
- # watchers_count, subscribers_count, network_count,
15
- # default_branch, license, homepage, visibility
16
- ```
17
-
18
- Use `raw.githubusercontent.com` for file contents — no rate limit, no auth, no base64 decode:
19
-
20
- ```python
21
- readme = http_get("https://raw.githubusercontent.com/owner/repo/main/README.md")
22
- content = http_get("https://raw.githubusercontent.com/owner/repo/main/pyproject.toml")
23
- ```
24
-
25
- Use the browser **only** for the trending page — it's server-side rendered HTML, no API equivalent.
26
-
27
- ## Common workflows
28
-
29
- ### Repo metadata (API)
30
-
31
- ```python
32
- import json
33
- data = json.loads(http_get("https://api.github.com/repos/browser-use/browser-use"))
34
- print(data['stargazers_count'], data['forks_count'], data['description'])
35
- # returns: 88349 10136 '🌐 Make websites accessible for AI agents.'
36
- ```
37
-
38
- ### User / org profile (API)
39
-
40
- ```python
41
- import json
42
- user = json.loads(http_get("https://api.github.com/users/browser-use"))
43
- print(user['type'], user['followers'], user['public_repos'], user['blog'])
44
- # returns: 'Organization' 3046 39 'https://browser-use.com'
45
- ```
46
-
47
- ### Trending page (browser required)
48
-
49
- The trending page is JS-rendered. `article.Box-row` selector confirmed working (15 results for today/all-languages, 12 for filtered). All fields work in a single JS call — **must navigate and wait in the same script run**, as each run is a separate exec context.
50
-
51
- ```python
52
- import json
53
- goto_url("https://github.com/trending") # or /trending/python?since=weekly
54
- wait_for_load()
55
- wait(2) # extra 2s — React hydration completes after readyState
56
-
57
- result = js("""
58
- (function(){
59
- var rows = Array.from(document.querySelectorAll('article.Box-row'));
60
- return JSON.stringify(rows.map(function(el){
61
- var h2link = el.querySelector('h2 a');
62
- var starLink = el.querySelector('a[href*="/stargazers"]');
63
- var forkLink = el.querySelector('a[href*="/forks"]');
64
- var langEl = el.querySelector('[itemprop="programmingLanguage"]');
65
- var todayEl = el.querySelector('.d-inline-block.float-sm-right');
66
- var descEl = el.querySelector('p');
67
- return {
68
- name: h2link ? h2link.innerText.trim().replace(/\\s+/g,' ') : null,
69
- url: h2link ? 'https://github.com' + h2link.getAttribute('href') : null,
70
- stars_total: starLink ? starLink.innerText.trim() : null,
71
- stars_period: todayEl ? todayEl.innerText.trim() : null,
72
- forks: forkLink ? forkLink.innerText.trim() : null,
73
- language: langEl ? langEl.innerText.trim() : null,
74
- desc: descEl ? descEl.innerText.trim() : null
75
- };
76
- }));
77
- })()
78
- """)
79
- repos = json.loads(result)
80
- # stars_period text is e.g. "737 stars today" or "47,053 stars this week"
81
- ```
82
-
83
- Supported URL params:
84
- - `/trending` — all languages, today
85
- - `/trending/python` — filtered to Python
86
- - `/trending?since=weekly` or `?since=monthly`
87
- - `/trending/python?since=weekly` — combined
88
-
89
- ### Search repositories (API)
90
-
91
- ```python
92
- import json
93
- results = json.loads(http_get(
94
- "https://api.github.com/search/repositories?q=browser+automation+language:python&sort=stars&per_page=10"
95
- ))
96
- print(results['total_count']) # e.g. 3250
97
- for r in results['items']:
98
-     print(r['full_name'], r['stargazers_count'])
99
- ```
100
-
101
- Search API rate limit is **10 req/min** unauthenticated (separate from the 60/hour core limit). Runs out fast if called in a loop.
102
-
103
- ### Commits, releases, issues (API)
104
-
105
- ```python
106
- import json
107
- # Commits
108
- commits = json.loads(http_get("https://api.github.com/repos/owner/repo/commits?per_page=10"))
109
- # Fields: sha, commit.message, commit.author.date, author.login
110
-
111
- # Releases
112
- releases = json.loads(http_get("https://api.github.com/repos/owner/repo/releases?per_page=5"))
113
- # Fields: tag_name, name, published_at, body, assets
114
-
115
- # Issues
116
- issues = json.loads(http_get("https://api.github.com/repos/owner/repo/issues?state=open&per_page=10"))
117
- # Fields: number, title, labels, state, created_at, user.login
118
-
119
- # Contributors
120
- contribs = json.loads(http_get("https://api.github.com/repos/owner/repo/contributors?per_page=10"))
121
- # Fields: login, contributions
122
- ```
123
-
124
- ### File contents via API (base64)
125
-
126
- ```python
127
- import json, base64
128
- resp = json.loads(http_get("https://api.github.com/repos/owner/repo/contents/path/to/file.py"))
129
- content = base64.b64decode(resp['content']).decode()
130
- # resp also has: size, sha, html_url
131
- # Prefer raw.githubusercontent.com for large files — no base64, no rate limit hit
132
- ```
133
-
134
- ### Parallel fetching (multiple repos)
135
-
136
- ```python
137
- import json
138
- from concurrent.futures import ThreadPoolExecutor
139
-
140
- def fetch_repo(name):
141
-     data = json.loads(http_get(f"https://api.github.com/repos/{name}"))
142
-     return {"name": name, "stars": data['stargazers_count'], "lang": data['language']}
143
-
144
- repos = ["owner/repo1", "owner/repo2", "owner/repo3"]
145
- with ThreadPoolExecutor(max_workers=3) as ex:
146
-     results = list(ex.map(fetch_repo, repos))
147
- # Confirmed working; watch rate limit — 60 unauthenticated calls/hour total
148
- ```
149
-
150
- ## Gotchas
151
-
152
- - **Rate limits are per IP, unauthenticated** — Core API: 60 req/hour. Search API: 10 req/min. These are separate pools. Check `/rate_limit` endpoint: `http_get("https://api.github.com/rate_limit")`. With a `GITHUB_TOKEN`, the core limit rises to 5,000/hour and the search limit to 30 req/min.
153
-
154
- - **Token header format** — Use `Authorization: Bearer <token>` (not `token <token>`), plus `X-GitHub-Api-Version: 2022-11-28`:
155
- ```python
156
- import os
157
- token = os.environ.get('GITHUB_TOKEN', '')
158
- headers = {"Authorization": f"Bearer {token}", "X-GitHub-Api-Version": "2022-11-28"} if token else {}
159
- data = json.loads(http_get("https://api.github.com/repos/owner/repo", headers=headers))
160
- ```
161
-
162
- - **404 raises HTTPError, not a JSON error** — Wrap API calls for missing repos:
163
- ```python
164
- try:
165
-     data = json.loads(http_get("https://api.github.com/repos/owner/repo"))
166
- except Exception as e:
167
-     print("Not found or rate limited:", e)
168
- ```
169
-
170
- - **Code search requires auth** — `GET /search/code` returns HTTP 401 without a token. Repo/user/issues search works unauthenticated.
171
-
172
- - **Trending page selectors only work if navigation is in the same script run** — Each `uv run browser-harness` exec is fresh. Selectors that returned 0 results were run in a separate invocation after the page had navigated away. Always include `goto_url()` + `wait_for_load()` + `wait(2)` in the same script.
173
-
174
- - **wait(2) after wait_for_load() on trending** — `document.readyState == 'complete'` fires before React finishes painting repo cards. Without the extra 2s sleep, `article.Box-row` count was 0 even though the DOM technically loaded.
175
-
176
- - **Trending stars field is a string with commas** — `stars_total` comes back as `"4,548"` not `4548`. Parse with `int(r['stars_total'].replace(',', ''))` if you need to sort.
177
-
178
- - **stars_period text includes the period** — Value is `"737 stars today"` or `"47,053 stars this week"` — strip the trailing word if you want just the number.
179
-
180
- - **Repo page DOM is React-heavy, API is better** — Extracting star counts from the repo HTML page (`github.com/owner/repo`) is unreliable because GitHub uses React with server-side hydration and component IDs change. The REST API returns all the same data cleanly.
181
-
182
- - **raw.githubusercontent.com has no rate limit and no auth** — Use it for any public file. It serves the raw bytes, no JSON wrapping or base64.
183
-
184
- - **Trending page article count varies** — Today filter returned 15 articles, weekly Python filter returned 12. Don't assume 25 results; iterate `document.querySelectorAll('article.Box-row')` and take what's there.
1
+ # GitHub — Scraping & Data Extraction
2
+
3
+ `https://github.com` — public data, mix of REST API (fast, rate-limited) and browser (trending page only).
4
+
5
+ ## Do this first
6
+
7
+ **Use the REST API for repo/user/release data — it's one call, no browser, fully parsed JSON.**
8
+
9
+ ```python
10
+ import json
11
+ data = json.loads(http_get("https://api.github.com/repos/{owner}/{repo}"))
12
+ # Key fields: stargazers_count, forks_count, description, language, topics,
13
+ # open_issues_count, created_at, updated_at, pushed_at,
14
+ # watchers_count, subscribers_count, network_count,
15
+ # default_branch, license, homepage, visibility
16
+ ```
17
+
18
+ Use `raw.githubusercontent.com` for file contents. It needs no auth or base64 decoding, and requests do not count against the REST API quota:
19
+
20
+ ```python
21
+ readme = http_get("https://raw.githubusercontent.com/owner/repo/main/README.md")
22
+ content = http_get("https://raw.githubusercontent.com/owner/repo/main/pyproject.toml")
23
+ ```
24
+
25
+ Use the browser **only** for the trending page, which has no API equivalent.
26
+
27
+ ## Common workflows
28
+
29
+ ### Repo metadata (API)
30
+
31
+ ```python
32
+ import json
33
+ data = json.loads(http_get("https://api.github.com/repos/browser-use/browser-use"))
34
+ print(data['stargazers_count'], data['forks_count'], data['description'])
35
+ # returns: 88349 10136 '🌐 Make websites accessible for AI agents.'
36
+ ```
37
+
38
+ ### User / org profile (API)
39
+
40
+ ```python
41
+ import json
42
+ user = json.loads(http_get("https://api.github.com/users/browser-use"))
43
+ print(user['type'], user['followers'], user['public_repos'], user['blog'])
44
+ # returns: 'Organization' 3046 39 'https://browser-use.com'
45
+ ```
46
+
47
+ ### Trending page (browser required)
48
+
49
+ The trending page is JS-rendered. The `article.Box-row` selector is confirmed working (15 results for today/all-languages, 12 for the weekly Python filter). All fields can be extracted in a single JS call, but the navigation and wait **must happen in the same script run**, because each run is a separate exec context.
50
+
51
+ ```python
52
+ import json
53
+ goto_url("https://github.com/trending") # or /trending/python?since=weekly
54
+ wait_for_load()
55
+ wait(2) # extra 2s — React hydration completes after readyState
56
+
57
+ result = js("""
58
+ (function(){
59
+ var rows = Array.from(document.querySelectorAll('article.Box-row'));
60
+ return JSON.stringify(rows.map(function(el){
61
+ var h2link = el.querySelector('h2 a');
62
+ var starLink = el.querySelector('a[href*="/stargazers"]');
63
+ var forkLink = el.querySelector('a[href*="/forks"]');
64
+ var langEl = el.querySelector('[itemprop="programmingLanguage"]');
65
+ var todayEl = el.querySelector('.d-inline-block.float-sm-right');
66
+ var descEl = el.querySelector('p');
67
+ return {
68
+ name: h2link ? h2link.innerText.trim().replace(/\\s+/g,' ') : null,
69
+ url: h2link ? 'https://github.com' + h2link.getAttribute('href') : null,
70
+ stars_total: starLink ? starLink.innerText.trim() : null,
71
+ stars_period: todayEl ? todayEl.innerText.trim() : null,
72
+ forks: forkLink ? forkLink.innerText.trim() : null,
73
+ language: langEl ? langEl.innerText.trim() : null,
74
+ desc: descEl ? descEl.innerText.trim() : null
75
+ };
76
+ }));
77
+ })()
78
+ """)
79
+ repos = json.loads(result)
80
+ # stars_period text is e.g. "737 stars today" or "47,053 stars this week"
81
+ ```
82
+
83
+ Supported URL params:
84
+ - `/trending` — all languages, today
85
+ - `/trending/python` — filtered to Python
86
+ - `/trending?since=weekly` or `?since=monthly`
87
+ - `/trending/python?since=weekly` — combined
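
The URL patterns listed above can be assembled programmatically. The following is a minimal sketch; `trending_url` is a hypothetical helper name (not part of the harness), and it only covers the `weekly` and `monthly` values of `since` documented here.

```python
from urllib.parse import quote, urlencode

def trending_url(language=None, since=None):
    """Build a GitHub trending URL (hypothetical helper, not a harness function).

    language: optional language slug such as "python"
    since:    optional period, "weekly" or "monthly"; None means the default (today)
    """
    if since not in (None, "weekly", "monthly"):
        raise ValueError("since must be None, 'weekly' or 'monthly'")
    url = "https://github.com/trending"
    if language:
        # Percent-encode the slug so unusual characters stay URL-safe.
        url += "/" + quote(language, safe="")
    if since:
        url += "?" + urlencode({"since": since})
    return url

print(trending_url())                    # https://github.com/trending
print(trending_url("python"))            # https://github.com/trending/python
print(trending_url(since="monthly"))     # https://github.com/trending?since=monthly
print(trending_url("python", "weekly"))  # https://github.com/trending/python?since=weekly
```

The result can be passed straight to `goto_url()` in the trending script.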
88
+
89
+ ### Search repositories (API)
90
+
91
+ ```python
92
+ import json
93
+ results = json.loads(http_get(
94
+ "https://api.github.com/search/repositories?q=browser+automation+language:python&sort=stars&per_page=10"
95
+ ))
96
+ print(results['total_count']) # e.g. 3250
97
+ for r in results['items']:
98
+ print(r['full_name'], r['stargazers_count'])
99
+ ```
100
+
101
+ Search API rate limit is **10 req/min** unauthenticated (separate from the 60/hour core limit). Runs out fast if called in a loop.
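
Because the core and search pools are tracked separately, it can help to check the relevant pool before looping. The sketch below assumes the `/rate_limit` response has the shape `{"resources": {"core": {...}, "search": {...}}}` with `remaining` and `reset` (epoch seconds) fields; `seconds_until_allowed` is a hypothetical helper, and the sample payload is made-up data for illustration.

```python
import time

def seconds_until_allowed(payload, pool, now=None):
    """Return 0 if the given pool still has calls left, else seconds until it resets.

    payload: parsed JSON from /rate_limit (shape assumed as described above)
    pool:    "core" or "search"
    """
    now = time.time() if now is None else now
    info = payload["resources"][pool]
    if info["remaining"] > 0:
        return 0
    return max(0, int(info["reset"] - now))

# Illustrative payload only; real values come from the API.
sample = {
    "resources": {
        "core": {"limit": 60, "remaining": 0, "reset": 1700000600},
        "search": {"limit": 10, "remaining": 7, "reset": 1700000060},
    }
}
print(seconds_until_allowed(sample, "core", now=1700000000))    # 600
print(seconds_until_allowed(sample, "search", now=1700000000))  # 0
```

In the harness, the payload would come from `json.loads(http_get("https://api.github.com/rate_limit"))`, and a caller could `time.sleep()` for the returned number of seconds before retrying.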
102
+
103
+ ### Commits, releases, issues (API)
104
+
105
+ ```python
106
+ import json
107
+ # Commits
108
+ commits = json.loads(http_get("https://api.github.com/repos/owner/repo/commits?per_page=10"))
109
+ # Fields: sha, commit.message, commit.author.date, author.login
110
+
111
+ # Releases
112
+ releases = json.loads(http_get("https://api.github.com/repos/owner/repo/releases?per_page=5"))
113
+ # Fields: tag_name, name, published_at, body, assets
114
+
115
+ # Issues
116
+ issues = json.loads(http_get("https://api.github.com/repos/owner/repo/issues?state=open&per_page=10"))
117
+ # Fields: number, title, labels, state, created_at, user.login
118
+
119
+ # Contributors
120
+ contribs = json.loads(http_get("https://api.github.com/repos/owner/repo/contributors?per_page=10"))
121
+ # Fields: login, contributions
122
+ ```
123
+
124
+ ### File contents via API (base64)
125
+
126
+ ```python
127
+ import json, base64
128
+ resp = json.loads(http_get("https://api.github.com/repos/owner/repo/contents/path/to/file.py"))
129
+ content = base64.b64decode(resp['content']).decode()
130
+ # resp also has: size, sha, html_url
131
+ # Prefer raw.githubusercontent.com for large files — no base64, no rate limit hit
132
+ ```
133
+
134
+ ### Parallel fetching (multiple repos)
135
+
136
+ ```python
137
+ import json
138
+ from concurrent.futures import ThreadPoolExecutor
139
+
140
+ def fetch_repo(name):
141
+ data = json.loads(http_get(f"https://api.github.com/repos/{name}"))
142
+ return {"name": name, "stars": data['stargazers_count'], "lang": data['language']}
143
+
144
+ repos = ["owner/repo1", "owner/repo2", "owner/repo3"]
145
+ with ThreadPoolExecutor(max_workers=3) as ex:
146
+ results = list(ex.map(fetch_repo, repos))
147
+ # Confirmed working; watch rate limit — 60 unauthenticated calls/hour total
148
+ ```
149
+
150
+ ## Gotchas
151
+
152
+ - **Rate limits are per IP, unauthenticated** — Core API: 60 req/hour. Search API: 10 req/min. These are separate pools. Check `/rate_limit` endpoint: `http_get("https://api.github.com/rate_limit")`. With a `GITHUB_TOKEN`, the core limit rises to 5,000 req/hour and the search limit to 30 req/min.
153
+
154
+ - **Token header format** — Use `Authorization: Bearer <token>` (`token <token>` is also accepted in most cases, but `Bearer` works for every token type), plus `X-GitHub-Api-Version: 2022-11-28`:
155
+ ```python
156
+ import json, os
157
+ token = os.environ.get('GITHUB_TOKEN', '')
158
+ headers = {"Authorization": f"Bearer {token}", "X-GitHub-Api-Version": "2022-11-28"} if token else {}
159
+ data = json.loads(http_get("https://api.github.com/repos/owner/repo", headers=headers))
160
+ ```
161
+
162
+ - **404 raises HTTPError, not a JSON error** — Wrap API calls for missing repos:
163
+ ```python
164
+ try:
165
+ data = json.loads(http_get("https://api.github.com/repos/owner/repo"))
166
+ except Exception as e:
167
+ print("Not found or rate limited:", e)
168
+ ```
169
+
170
+ - **Code search requires auth** — `GET /search/code` returns HTTP 401 without a token. Repo/user/issues search works unauthenticated.
171
+
172
+ - **Trending page selectors only work if navigation is in the same script run** — Each `uv run browser-harness` exec is fresh. Selectors that returned 0 results were run in a separate invocation after the page had navigated away. Always include `goto_url()` + `wait_for_load()` + `wait(2)` in the same script.
173
+
174
+ - **wait(2) after wait_for_load() on trending** — `document.readyState == 'complete'` fires before React finishes painting repo cards. Without the extra 2s sleep, `article.Box-row` count was 0 even though the DOM technically loaded.
175
+
176
+ - **Trending stars field is a string with commas** — `stars_total` comes back as `"4,548"` not `4548`. Parse with `int(r['stars_total'].replace(',', ''))` if you need to sort.
177
+
178
+ - **stars_period text includes the period** — Value is `"737 stars today"` or `"47,053 stars this week"` — strip the trailing word if you want just the number.
179
+
180
+ - **Repo page DOM is React-heavy, API is better** — Extracting star counts from the repo HTML page (`github.com/owner/repo`) is unreliable because GitHub uses React with server-side rendering and client-side hydration, and component IDs change. The REST API returns all the same data cleanly.
181
+
182
+ - **raw.githubusercontent.com needs no auth and does not use the REST API quota** — Use it for any public file. It serves the raw bytes, no JSON wrapping or base64.
183
+
184
+ - **Trending page article count varies** — Today filter returned 15 articles, weekly Python filter returned 12. Don't assume 25 results; iterate `document.querySelectorAll('article.Box-row')` and take what's there.
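
The two string-format gotchas above (`stars_total` with commas, `stars_period` with trailing words) can be handled by one small parser. This is a minimal sketch: `parse_count` is a hypothetical helper, and the rows are made-up stand-ins for the `repos` list returned by the trending script.

```python
import re

def parse_count(text):
    """Return the first integer found in text like "4,548" or "737 stars today".

    Returns None when text is empty/None or contains no digits.
    """
    if not text:
        return None
    match = re.search(r"\d[\d,]*", text)
    if match is None:
        return None
    return int(match.group().replace(",", ""))

# Illustrative rows only; field names match the trending script's output.
rows = [
    {"name": "a / b", "stars_total": "4,548", "stars_period": "737 stars today"},
    {"name": "c / d", "stars_total": "88,349", "stars_period": "47,053 stars this week"},
    {"name": "e / f", "stars_total": None, "stars_period": None},
]
for row in rows:
    row["stars_total_n"] = parse_count(row["stars_total"])
    row["stars_period_n"] = parse_count(row["stars_period"])

# Sort by total stars, treating missing values as 0.
rows.sort(key=lambda r: r["stars_total_n"] or 0, reverse=True)
print([r["name"] for r in rows])  # ['c / d', 'a / b', 'e / f']
```

Keeping the original string fields alongside the parsed integers makes it easy to spot rows where the selector matched nothing.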