@pencil-agent/nano-pencil 2.0.1 → 2.0.2

Files changed (186)
  1. package/README.md +267 -267
  2. package/dist/build-meta.json +3 -3
  3. package/dist/core/export-html/AGENT.md +11 -11
  4. package/dist/core/export-html/template.css +971 -971
  5. package/dist/core/export-html/template.html +54 -54
  6. package/dist/core/model/custom-providers.js +1 -1
  7. package/dist/core/model-registry.js +5 -5
  8. package/dist/extensions/builtin/AGENT.md +115 -115
  9. package/dist/extensions/builtin/browser/AGENT.md +17 -17
  10. package/dist/extensions/builtin/browser/agent-workspace/agent_helpers.py +12 -12
  11. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/amazon/product-search.md +198 -198
  12. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/archive-org/scraping.md +341 -341
  13. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/arxiv/scraping.md +311 -311
  14. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/arxiv-bulk/scraping.md +333 -333
  15. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/atlas/overview.md +70 -70
  16. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/booking-com/scraping.md +578 -578
  17. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/capterra/scraping.md +440 -440
  18. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/centilebrain/generate-estimates.md +110 -110
  19. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/coingecko/scraping.md +325 -325
  20. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/coinmarketcap/scraping.md +463 -463
  21. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/coursera/scraping.md +360 -360
  22. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/craigslist/scraping.md +390 -390
  23. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/crossref/scraping.md +568 -568
  24. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/dev-to/scraping.md +323 -323
  25. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/duckduckgo/scraping.md +349 -349
  26. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/ebay/scraping.md +435 -435
  27. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/etsy/scraping.md +506 -506
  28. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/eventbrite/scraping.md +363 -363
  29. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/expedia/automation.md +168 -168
  30. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/facebook/groups.md +236 -236
  31. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/facebook/pages.md +295 -295
  32. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/framer/editor.md +108 -108
  33. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/fred/scraping.md +493 -493
  34. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/g2/scraping.md +580 -580
  35. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/genius/scraping.md +511 -511
  36. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/github/repo-actions.md +65 -65
  37. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/github/scraping.md +184 -184
  38. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/glassdoor/scraping.md +543 -543
  39. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/gmail/compose.md +122 -122
  40. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/goodreads/scraping.md +461 -461
  41. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/gutenberg/scraping.md +383 -383
  42. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/hackernews/scraping.md +243 -243
  43. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/howlongtobeat/scraping.md +473 -473
  44. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/imdb/scraping.md +271 -271
  45. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/itch-io/scraping.md +436 -436
  46. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/job-boards/indeed-glassdoor.md +1021 -1021
  47. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/letterboxd/scraping.md +349 -349
  48. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/linkedin/invitation-manager.md +109 -109
  49. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/loom/folder-enumeration.md +170 -170
  50. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/macrotrends/scraping.md +537 -537
  51. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/medium/article-hydration.md +120 -120
  52. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/medium/scraping.md +414 -414
  53. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/metacritic/scraping.md +477 -477
  54. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/musicbrainz/scraping.md +478 -478
  55. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/nasa/scraping.md +339 -339
  56. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/news-aggregation/multi-source.md +205 -205
  57. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/open-library/scraping.md +472 -472
  58. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/openalex/scraping.md +470 -470
  59. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/openstreetmap/scraping.md +490 -490
  60. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/package-registries/npm-pypi.md +478 -478
  61. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/polymarket/scraping.md +234 -234
  62. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/producthunt/scraping.md +307 -307
  63. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/pubmed/scraping.md +421 -421
  64. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/quora/scraping.md +364 -364
  65. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/rawg/scraping.md +352 -352
  66. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/reddit/scraping.md +124 -124
  67. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/rest-countries/scraping.md +233 -233
  68. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/sec-edgar/scraping.md +361 -361
  69. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/README.md +36 -36
  70. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/embedded-apps.md +72 -72
  71. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/knowledge-base.md +109 -109
  72. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/polaris-inputs.md +137 -137
  73. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/soundcloud/scraping.md +362 -362
  74. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/spotify/scraping.md +339 -339
  75. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/stackoverflow/scraping.md +435 -435
  76. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/steam/scraping.md +575 -575
  77. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/substack/scraping.md +338 -338
  78. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/thetechgeeks/pricing.md +52 -52
  79. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/tiktok/upload.md +107 -107
  80. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/tradingview/scraping.md +309 -309
  81. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/trello/boards-and-lists.md +88 -88
  82. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/trustpilot/scraping.md +375 -375
  83. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/walmart/scraping.md +444 -444
  84. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/wayback-machine/scraping.md +306 -306
  85. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/weather/scraping.md +398 -398
  86. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/wellfound/scraping.md +596 -596
  87. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/world-bank/scraping.md +356 -356
  88. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/xiaohongshu/scraping.md +84 -84
  89. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/youtube/scraping.md +418 -418
  90. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/zillow/scraping.md +433 -433
  91. package/dist/extensions/builtin/browser/browser.md +73 -73
  92. package/dist/extensions/builtin/browser/install.md +142 -142
  93. package/dist/extensions/builtin/browser/interaction-skills/connection.md +48 -48
  94. package/dist/extensions/builtin/browser/interaction-skills/cookies.md +3 -3
  95. package/dist/extensions/builtin/browser/interaction-skills/cross-origin-iframes.md +3 -3
  96. package/dist/extensions/builtin/browser/interaction-skills/dialogs.md +64 -64
  97. package/dist/extensions/builtin/browser/interaction-skills/downloads.md +3 -3
  98. package/dist/extensions/builtin/browser/interaction-skills/drag-and-drop.md +3 -3
  99. package/dist/extensions/builtin/browser/interaction-skills/dropdowns.md +3 -3
  100. package/dist/extensions/builtin/browser/interaction-skills/iframes.md +3 -3
  101. package/dist/extensions/builtin/browser/interaction-skills/network-requests.md +3 -3
  102. package/dist/extensions/builtin/browser/interaction-skills/print-as-pdf.md +3 -3
  103. package/dist/extensions/builtin/browser/interaction-skills/profile-sync.md +90 -90
  104. package/dist/extensions/builtin/browser/interaction-skills/screenshots.md +17 -17
  105. package/dist/extensions/builtin/browser/interaction-skills/scrolling.md +3 -3
  106. package/dist/extensions/builtin/browser/interaction-skills/shadow-dom.md +3 -3
  107. package/dist/extensions/builtin/browser/interaction-skills/tabs.md +69 -69
  108. package/dist/extensions/builtin/browser/interaction-skills/uploads.md +1 -1
  109. package/dist/extensions/builtin/browser/interaction-skills/viewport.md +3 -3
  110. package/dist/extensions/builtin/browser/src/browser_harness/AGENT.md +15 -15
  111. package/dist/extensions/builtin/browser/src/browser_harness/__init__.py +8 -8
  112. package/dist/extensions/builtin/browser/src/browser_harness/_ipc.py +90 -90
  113. package/dist/extensions/builtin/browser/src/browser_harness/admin.py +722 -722
  114. package/dist/extensions/builtin/browser/src/browser_harness/daemon.py +328 -328
  115. package/dist/extensions/builtin/browser/src/browser_harness/helpers.py +396 -396
  116. package/dist/extensions/builtin/browser/src/browser_harness/run.py +103 -103
  117. package/dist/extensions/builtin/discipline/skills/brainstorming/SKILL.md +33 -33
  118. package/dist/extensions/builtin/discipline/skills/executing-plans/SKILL.md +25 -25
  119. package/dist/extensions/builtin/discipline/skills/finishing-development-branch/SKILL.md +25 -25
  120. package/dist/extensions/builtin/discipline/skills/receiving-code-review/SKILL.md +22 -22
  121. package/dist/extensions/builtin/discipline/skills/requesting-code-review/SKILL.md +31 -31
  122. package/dist/extensions/builtin/discipline/skills/systematic-debugging/SKILL.md +28 -28
  123. package/dist/extensions/builtin/discipline/skills/test-driven-development/SKILL.md +32 -32
  124. package/dist/extensions/builtin/discipline/skills/using-git-worktrees/SKILL.md +25 -25
  125. package/dist/extensions/builtin/discipline/skills/verification-before-completion/SKILL.md +27 -27
  126. package/dist/extensions/builtin/discipline/skills/writing-plans/SKILL.md +26 -26
  127. package/dist/extensions/builtin/goal/README.md +67 -67
  128. package/dist/extensions/builtin/grub/README.md +112 -112
  129. package/dist/extensions/builtin/link-world/agent-workspace/README.md +16 -16
  130. package/dist/extensions/builtin/link-world/internet-search/internet-search.md +65 -65
  131. package/dist/extensions/builtin/link-world/link-world-agent.md +82 -82
  132. package/dist/extensions/builtin/link-world/linkworld.md +313 -313
  133. package/dist/extensions/builtin/link-world/network-routing/network-routing.md +67 -67
  134. package/dist/extensions/builtin/loop/README.md +92 -92
  135. package/dist/extensions/builtin/mcp/figma-design.md +68 -68
  136. package/dist/extensions/builtin/mcp/mcp-management.md +85 -85
  137. package/dist/extensions/builtin/recap/AGENT.md +15 -15
  138. package/dist/extensions/builtin/sal/README.md +72 -72
  139. package/dist/extensions/builtin/security-audit/README.md +289 -289
  140. package/dist/extensions/builtin/team/AGENT.md +112 -112
  141. package/dist/extensions/builtin/team/TESTING.md +299 -299
  142. package/dist/extensions/builtin/token-save/README.md +56 -56
  143. package/dist/extensions/optional/AGENT.md +10 -10
  144. package/dist/modes/interactive/controllers/input-submit-controller.js +2 -2
  145. package/dist/modes/interactive/controllers/stream-render-controller.js +2 -2
  146. package/dist/modes/interactive/interactive-mode.js +19 -19
  147. package/dist/modes/interactive/theme/dark.json +85 -85
  148. package/dist/modes/interactive/theme/light.json +84 -84
  149. package/dist/modes/interactive/theme/theme-schema.json +335 -335
  150. package/dist/modes/interactive/theme/warm.json +81 -81
  151. package/dist/node_modules/@pencil-agent/ai/dist/cli.js +0 -0
  152. package/dist/node_modules/@pencil-agent/ai/dist/models.generated.js +1 -1
  153. package/docs/ACP/345/215/217/350/256/256/351/233/206/346/210/220/345/274/200/345/217/221/346/226/207/346/241/243.md +851 -0
  154. package/docs/SDK-TESTING.md +364 -0
  155. package/docs/codex-goal-command-impl.md +1055 -1055
  156. package/docs/codex-goal-vs-grub.md +500 -500
  157. package/docs/custom-provider.md +27 -27
  158. package/docs/extensions.md +27 -27
  159. package/docs/keybindings.md +27 -27
  160. package/docs/loop 重构完成总结.md +250 -250
  161. package/docs/loop 重构完成报告.md +122 -122
  162. package/docs/loop 重构方案.md +1222 -1222
  163. package/docs/loop 重构方案实现报告.md +158 -158
  164. package/docs/loop 重构方案对比分析.md +128 -128
  165. package/docs/loop 重构计划.md +320 -320
  166. package/docs/loop-usage-examples.md +214 -214
  167. package/docs/mem-core/346/212/200/346/234/257/346/226/207/346/241/243.md +593 -0
  168. package/docs/models.md +27 -27
  169. package/docs/packages.md +27 -27
  170. package/docs/pi-design-philosophy.md +457 -457
  171. package/docs/planmode.md +1987 -1987
  172. package/docs/prompt-templates.md +27 -27
  173. package/docs/providers.md +27 -27
  174. package/docs/sdk.md +27 -27
  175. package/docs/skills.md +27 -27
  176. package/docs/startup-performance-optimization.md +301 -0
  177. package/docs/themes.md +27 -27
  178. package/docs/tui.md +27 -27
  179. package/docs/认知地图.md +47 -0
  180. package/package.json +190 -190
  181. package/docs/cc-agent-design.md +0 -1297
  182. package/docs/cc-tui-design.md +0 -1333
  183. package/docs/nanoPencil-学习计划.md +0 -170
  184. package/docs/scan-report.md +0 -3820
  185. package/docs//345/257/271/346/240/207Claude-Code.md +0 -1775
  186. package/docs//351/230/277/351/207/214/345/267/264/345/267/264/350/264/242/346/212/245/345/210/206/346/236/220/344/271/246.md +0 -261
@@ -1,120 +1,120 @@
- # Medium — Article Body via DOM
-
- Extract a Medium article's body as clean markdown using the logged-in browser. Use this when API paths in `scraping.md` are blocked or truncated:
-
- - Cloudflare challenge on the `?format=json` endpoint ("Performing security verification")
- - Member-only post that the API returns locked (`isSubscriptionLocked=True`) but the logged-in browser can render in full
- - JS-only variant where the article is gated behind a client-side paywall modal
-
- If the article is free and the API works, prefer `scraping.md` — it's faster and doesn't need a visible tab.
-
- ## URL patterns
-
- - Canonical: `https://medium.com/@<author>/<slug>-<id>`
- - Publication: `https://<pub>.medium.com/<slug>-<id>` or `https://medium.com/<pub>/<slug>-<id>`
- - Custom domain: some publications (e.g. `towardsdatascience.com`) proxy Medium; the same DOM extractor works there.
-
- All variants render the article body inside a single `<article>` element.
-
- ## Site structure
-
- - The article body lives under the page's single `<article>` element.
- - Block-level content: `h1`–`h4`, `p`, `pre`, `blockquote`, `ul`, `ol`, `figure`.
- - Images are always wrapped in `<figure>` with a `<figcaption>` sibling; the real resolution lives on `miro.medium.com/v2/resize:fit:<N>/...`.
- - Code blocks are `<pre>` — no language class is exposed in the DOM, so emit plain fenced blocks.
- - Pull quotes render as `<blockquote>` with nested `<p>`.
-
- ## Cruft to strip
-
- Medium injects engagement UI **inside** `<article>`. The text "6 2 Listen Share More" at the top is the clap/comment/listen/share button row, not content. Also expect a follow button near the author's name and sometimes a "Help" / "Status" footer.
-
- Safe pattern: take the extracted markdown, then drop leading paragraphs that are shorter than ~12 characters until you hit the first real block (the "Last updated" line, the H1, or the first long paragraph).
-
- ## Extractor
-
- ````bash
- browser-harness <<'PY'
- new_tab("https://medium.com/@user/slug-abc123")
- wait_for_load()
- wait(2.0) # Medium hydrates more UI after readyState=complete
-
- md = js(r"""
- (()=>{
-   const article = document.querySelector('article');
-   if(!article) return null;
-   const blocks = article.querySelectorAll('h1, h2, h3, h4, p, pre, blockquote, ul, ol, figure');
-   const out = [];
-   const seen = new Set();
-   for(const el of blocks){
-     let skip = false;
-     for(const s of seen){ if(s.contains(el) && s !== el){ skip=true; break; } }
-     if(skip) continue;
-     seen.add(el);
-     const tag = el.tagName;
-     const txt = (el.innerText || '').trim();
-     if(!txt && tag !== 'FIGURE') continue;
-     if(tag === 'H1') out.push('# ' + txt);
-     else if(tag === 'H2') out.push('## ' + txt);
-     else if(tag === 'H3') out.push('### ' + txt);
-     else if(tag === 'H4') out.push('#### ' + txt);
-     else if(tag === 'PRE') out.push('```\n' + txt + '\n```');
-     else if(tag === 'BLOCKQUOTE') out.push(txt.split('\n').map(l=>'> '+l).join('\n'));
-     else if(tag === 'UL' || tag === 'OL'){
-       const items = [...el.querySelectorAll(':scope > li')].map((li,i)=>{
-         const t = li.innerText.trim();
-         return (tag==='OL' ? (i+1)+'. ' : '- ') + t;
-       });
-       out.push(items.join('\n'));
-     }
-     else if(tag === 'FIGURE'){
-       const img = el.querySelector('img');
-       const cap = el.querySelector('figcaption');
-       if(img && img.src){
-         const alt = img.alt || (cap ? cap.innerText.trim() : '');
-         out.push('![' + alt + '](' + img.src + ')');
-       }
-     }
-     else if(tag === 'P') out.push(txt);
-   }
-   return out.join('\n\n');
- })()
- """)
-
- # Strip engagement-button cruft from the top
- paras = md.split('\n\n')
- while paras and len(paras[0]) < 12:
-     paras.pop(0)
- md = '\n\n'.join(paras)
- print(md)
- PY
- ````
-
- The `seen` set avoids double-emitting when an `<li>` matches the block query inside its `<ul>`.
-
- ## Waits
-
- - `wait_for_load()` is necessary but not sufficient — Medium continues to hydrate author-card and clap widgets after `readyState=complete`. An additional `wait(2.0)` avoids cases where the article outer frame exists but the first few paragraphs are still skeleton `<div>`s.
- - For member-only articles, if `<article>` renders but text length is suspiciously short (<500 chars), the paywall modal intercepted. Confirm the tab is on your logged-in profile and retry.
-
- ## Paywall / login detection
-
- ```python
- state = js("""
- (()=>{
-   const art = document.querySelector('article');
-   const len = art ? art.innerText.length : 0;
-   const hasPaywall = !!document.querySelector('[data-testid*="paywall"], [aria-label*="Sign in" i]');
-   return {len, hasPaywall};
- })()
- """)
- ```
-
- If `hasPaywall` is true or `len < 500`, fall back to `scraping.md` API paths (the article may simply be locked for this account).
-
- ## Traps
-
- - **Don't use `article.innerText` alone.** It drops structure — code blocks lose their fences, lists lose their markers, figures disappear. The block walker above preserves each element kind.
- - **Don't rely on CSS class names.** Medium's class names are hashed (`pw-post-body-paragraph`, etc.) and rotate; select by tag instead.
- - **`<figure>` caption text is often also repeated as `<img alt>`.** Prefer `alt`, fall back to `figcaption`, so you don't emit both.
- - **The article ends before the "About the Author" card sometimes, sometimes not.** The walker captures both, which is fine for archival. If you need body-only, cut at the last `h2`/`h3` before a `<hr>`-equivalent divider, or trim by known footer strings (`Follow`, `More from`, `Written by`).
- - **Tab marker.** `new_tab()` prepends 🟢 to the title. Don't include `document.title` in the emitted markdown — use the article's `<h1>` instead.
+ # Medium — Article Body via DOM
+
+ Extract a Medium article's body as clean markdown using the logged-in browser. Use this when API paths in `scraping.md` are blocked or truncated:
+
+ - Cloudflare challenge on the `?format=json` endpoint ("Performing security verification")
+ - Member-only post that the API returns locked (`isSubscriptionLocked=True`) but the logged-in browser can render in full
+ - JS-only variant where the article is gated behind a client-side paywall modal
+
+ If the article is free and the API works, prefer `scraping.md` — it's faster and doesn't need a visible tab.
+
+ ## URL patterns
+
+ - Canonical: `https://medium.com/@<author>/<slug>-<id>`
+ - Publication: `https://<pub>.medium.com/<slug>-<id>` or `https://medium.com/<pub>/<slug>-<id>`
+ - Custom domain: some publications (e.g. `towardsdatascience.com`) proxy Medium; the same DOM extractor works there.
+
+ All variants render the article body inside a single `<article>` element.
+
+ ## Site structure
+
+ - The article body lives under the page's single `<article>` element.
+ - Block-level content: `h1`–`h4`, `p`, `pre`, `blockquote`, `ul`, `ol`, `figure`.
+ - Images are always wrapped in `<figure>` with a `<figcaption>` sibling; the real resolution lives on `miro.medium.com/v2/resize:fit:<N>/...`.
+ - Code blocks are `<pre>` — no language class is exposed in the DOM, so emit plain fenced blocks.
+ - Pull quotes render as `<blockquote>` with nested `<p>`.
+
+ ## Cruft to strip
+
+ Medium injects engagement UI **inside** `<article>`. The text "6 2 Listen Share More" at the top is the clap/comment/listen/share button row, not content. Also expect a follow button near the author's name and sometimes a "Help" / "Status" footer.
+
+ Safe pattern: take the extracted markdown, then drop leading paragraphs that are shorter than ~12 characters until you hit the first real block (the "Last updated" line, the H1, or the first long paragraph).
+
+ ## Extractor
+
+ ````bash
+ browser-harness <<'PY'
+ new_tab("https://medium.com/@user/slug-abc123")
+ wait_for_load()
+ wait(2.0) # Medium hydrates more UI after readyState=complete
+
+ md = js(r"""
+ (()=>{
+   const article = document.querySelector('article');
+   if(!article) return null;
+   const blocks = article.querySelectorAll('h1, h2, h3, h4, p, pre, blockquote, ul, ol, figure');
+   const out = [];
+   const seen = new Set();
+   for(const el of blocks){
+     let skip = false;
+     for(const s of seen){ if(s.contains(el) && s !== el){ skip=true; break; } }
+     if(skip) continue;
+     seen.add(el);
+     const tag = el.tagName;
+     const txt = (el.innerText || '').trim();
+     if(!txt && tag !== 'FIGURE') continue;
+     if(tag === 'H1') out.push('# ' + txt);
+     else if(tag === 'H2') out.push('## ' + txt);
+     else if(tag === 'H3') out.push('### ' + txt);
+     else if(tag === 'H4') out.push('#### ' + txt);
+     else if(tag === 'PRE') out.push('```\n' + txt + '\n```');
+     else if(tag === 'BLOCKQUOTE') out.push(txt.split('\n').map(l=>'> '+l).join('\n'));
+     else if(tag === 'UL' || tag === 'OL'){
+       const items = [...el.querySelectorAll(':scope > li')].map((li,i)=>{
+         const t = li.innerText.trim();
+         return (tag==='OL' ? (i+1)+'. ' : '- ') + t;
+       });
+       out.push(items.join('\n'));
+     }
+     else if(tag === 'FIGURE'){
+       const img = el.querySelector('img');
+       const cap = el.querySelector('figcaption');
+       if(img && img.src){
+         const alt = img.alt || (cap ? cap.innerText.trim() : '');
+         out.push('![' + alt + '](' + img.src + ')');
+       }
+     }
+     else if(tag === 'P') out.push(txt);
+   }
+   return out.join('\n\n');
+ })()
+ """)
+
+ # Strip engagement-button cruft from the top
+ paras = md.split('\n\n')
+ while paras and len(paras[0]) < 12:
+     paras.pop(0)
+ md = '\n\n'.join(paras)
+ print(md)
+ PY
+ ````
+
+ The `seen` set avoids double-emitting when an `<li>` matches the block query inside its `<ul>`.
+
+ ## Waits
+
+ - `wait_for_load()` is necessary but not sufficient — Medium continues to hydrate author-card and clap widgets after `readyState=complete`. An additional `wait(2.0)` avoids cases where the article outer frame exists but the first few paragraphs are still skeleton `<div>`s.
+ - For member-only articles, if `<article>` renders but text length is suspiciously short (<500 chars), the paywall modal intercepted. Confirm the tab is on your logged-in profile and retry.
+
+ ## Paywall / login detection
+
+ ```python
+ state = js("""
+ (()=>{
+   const art = document.querySelector('article');
+   const len = art ? art.innerText.length : 0;
+   const hasPaywall = !!document.querySelector('[data-testid*="paywall"], [aria-label*="Sign in" i]');
+   return {len, hasPaywall};
+ })()
+ """)
+ ```
+
+ If `hasPaywall` is true or `len < 500`, fall back to `scraping.md` API paths (the article may simply be locked for this account).
+
+ ## Traps
+
+ - **Don't use `article.innerText` alone.** It drops structure — code blocks lose their fences, lists lose their markers, figures disappear. The block walker above preserves each element kind.
+ - **Don't rely on CSS class names.** Medium's class names are hashed (`pw-post-body-paragraph`, etc.) and rotate; select by tag instead.
+ - **`<figure>` caption text is often also repeated as `<img alt>`.** Prefer `alt`, fall back to `figcaption`, so you don't emit both.
+ - **The article ends before the "About the Author" card sometimes, sometimes not.** The walker captures both, which is fine for archival. If you need body-only, cut at the last `h2`/`h3` before a `<hr>`-equivalent divider, or trim by known footer strings (`Follow`, `More from`, `Written by`).
+ - **Tab marker.** `new_tab()` prepends 🟢 to the title. Don't include `document.title` in the emitted markdown — use the article's `<h1>` instead.
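
The post-processing rules in the hunk above (drop short leading paragraphs, fall back when the page looks paywalled) can be tried without a browser. The following is a minimal sketch and not part of the package: `strip_leading_cruft` and `should_fall_back` are hypothetical helper names, and the 12-character and 500-character thresholds are taken from the text of the diff. Unlike the heredoc in the diff, this sketch stops stripping at the first markdown heading, on the assumption that a short H1 should survive, as the "Cruft to strip" prose suggests.

```python
# Hypothetical helpers for illustration; they are not shipped with the package
# or with browser-harness.


def strip_leading_cruft(md: str, min_len: int = 12) -> str:
    """Drop leading paragraphs shorter than min_len characters.

    Stops at the first heading (a paragraph starting with '#') so that a
    short title is not mistaken for button-row residue.
    """
    paras = md.split("\n\n")
    while paras and len(paras[0]) < min_len and not paras[0].startswith("#"):
        paras.pop(0)
    return "\n\n".join(paras)


def should_fall_back(state: dict, min_chars: int = 500) -> bool:
    """Return True when the page looks paywalled or the text is too short.

    `state` is assumed to have the shape returned by the paywall probe in the
    diff: {"len": <int>, "hasPaywall": <bool>}.
    """
    return bool(state.get("hasPaywall")) or state.get("len", 0) < min_chars


if __name__ == "__main__":
    sample = "6\n\n2\n\nListen\n\nShare\n\n# Title\n\nFirst real paragraph of the article."
    print(strip_leading_cruft(sample))
    print(should_fall_back({"len": 120, "hasPaywall": False}))
```

Keeping these two steps as pure functions means the thresholds can be tuned against saved markdown samples before running the extractor against a live tab.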