@pencil-agent/nano-pencil 2.0.0-beta.8 → 2.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (241)
  1. package/README.md +267 -267
  2. package/dist/build-meta.json +3 -3
  3. package/dist/core/export-html/AGENT.md +11 -11
  4. package/dist/core/export-html/template.css +971 -971
  5. package/dist/core/export-html/template.html +54 -54
  6. package/dist/core/extensions-host/index.d.ts +1 -1
  7. package/dist/core/extensions-host/loader.js +1 -1
  8. package/dist/core/extensions-host/runner.d.ts +1 -0
  9. package/dist/core/extensions-host/runner.js +2 -2
  10. package/dist/core/extensions-host/types.d.ts +17 -22
  11. package/dist/core/lib/ai/src/types.d.ts +12 -2
  12. package/dist/core/persona/persona-manager.js +5 -2
  13. package/dist/core/runtime/agent-session.js +3 -3
  14. package/dist/core/runtime/extension-core-bindings.d.ts +1 -0
  15. package/dist/core/runtime/extension-core-bindings.js +2 -2
  16. package/dist/extensions/builtin/AGENT.md +115 -115
  17. package/dist/extensions/builtin/browser/AGENT.md +17 -17
  18. package/dist/extensions/builtin/browser/agent-workspace/agent_helpers.py +12 -12
  19. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/amazon/product-search.md +198 -198
  20. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/archive-org/scraping.md +341 -341
  21. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/arxiv/scraping.md +311 -311
  22. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/arxiv-bulk/scraping.md +333 -333
  23. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/atlas/overview.md +70 -70
  24. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/booking-com/scraping.md +578 -578
  25. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/capterra/scraping.md +440 -440
  26. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/centilebrain/generate-estimates.md +110 -110
  27. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/coingecko/scraping.md +325 -325
  28. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/coinmarketcap/scraping.md +463 -463
  29. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/coursera/scraping.md +360 -360
  30. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/craigslist/scraping.md +390 -390
  31. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/crossref/scraping.md +568 -568
  32. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/dev-to/scraping.md +323 -323
  33. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/duckduckgo/scraping.md +349 -349
  34. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/ebay/scraping.md +435 -435
  35. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/etsy/scraping.md +506 -506
  36. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/eventbrite/scraping.md +363 -363
  37. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/expedia/automation.md +168 -168
  38. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/facebook/groups.md +236 -236
  39. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/facebook/pages.md +295 -295
  40. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/framer/editor.md +108 -108
  41. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/fred/scraping.md +493 -493
  42. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/g2/scraping.md +580 -580
  43. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/genius/scraping.md +511 -511
  44. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/github/repo-actions.md +65 -65
  45. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/github/scraping.md +184 -184
  46. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/glassdoor/scraping.md +543 -543
  47. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/gmail/compose.md +122 -122
  48. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/goodreads/scraping.md +461 -461
  49. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/gutenberg/scraping.md +383 -383
  50. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/hackernews/scraping.md +243 -243
  51. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/howlongtobeat/scraping.md +473 -473
  52. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/imdb/scraping.md +271 -271
  53. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/itch-io/scraping.md +436 -436
  54. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/job-boards/indeed-glassdoor.md +1021 -1021
  55. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/letterboxd/scraping.md +349 -349
  56. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/linkedin/invitation-manager.md +109 -109
  57. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/loom/folder-enumeration.md +170 -170
  58. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/macrotrends/scraping.md +537 -537
  59. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/medium/article-hydration.md +120 -120
  60. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/medium/scraping.md +414 -414
  61. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/metacritic/scraping.md +477 -477
  62. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/musicbrainz/scraping.md +478 -478
  63. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/nasa/scraping.md +339 -339
  64. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/news-aggregation/multi-source.md +205 -205
  65. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/open-library/scraping.md +472 -472
  66. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/openalex/scraping.md +470 -470
  67. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/openstreetmap/scraping.md +490 -490
  68. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/package-registries/npm-pypi.md +478 -478
  69. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/polymarket/scraping.md +234 -234
  70. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/producthunt/scraping.md +307 -307
  71. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/pubmed/scraping.md +421 -421
  72. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/quora/scraping.md +364 -364
  73. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/rawg/scraping.md +352 -352
  74. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/reddit/scraping.md +124 -124
  75. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/rest-countries/scraping.md +233 -233
  76. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/sec-edgar/scraping.md +361 -361
  77. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/README.md +36 -36
  78. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/embedded-apps.md +72 -72
  79. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/knowledge-base.md +109 -109
  80. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/polaris-inputs.md +137 -137
  81. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/soundcloud/scraping.md +362 -362
  82. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/spotify/scraping.md +339 -339
  83. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/stackoverflow/scraping.md +435 -435
  84. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/steam/scraping.md +575 -575
  85. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/substack/scraping.md +338 -338
  86. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/thetechgeeks/pricing.md +52 -52
  87. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/tiktok/upload.md +107 -107
  88. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/tradingview/scraping.md +309 -309
  89. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/trello/boards-and-lists.md +88 -88
  90. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/trustpilot/scraping.md +375 -375
  91. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/walmart/scraping.md +444 -444
  92. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/wayback-machine/scraping.md +306 -306
  93. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/weather/scraping.md +398 -398
  94. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/wellfound/scraping.md +596 -596
  95. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/world-bank/scraping.md +356 -356
  96. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/xiaohongshu/scraping.md +84 -84
  97. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/youtube/scraping.md +418 -418
  98. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/zillow/scraping.md +433 -433
  99. package/dist/extensions/builtin/browser/browser.md +73 -73
  100. package/dist/extensions/builtin/browser/install.md +142 -142
  101. package/dist/extensions/builtin/browser/interaction-skills/connection.md +48 -48
  102. package/dist/extensions/builtin/browser/interaction-skills/cookies.md +3 -3
  103. package/dist/extensions/builtin/browser/interaction-skills/cross-origin-iframes.md +3 -3
  104. package/dist/extensions/builtin/browser/interaction-skills/dialogs.md +64 -64
  105. package/dist/extensions/builtin/browser/interaction-skills/downloads.md +3 -3
  106. package/dist/extensions/builtin/browser/interaction-skills/drag-and-drop.md +3 -3
  107. package/dist/extensions/builtin/browser/interaction-skills/dropdowns.md +3 -3
  108. package/dist/extensions/builtin/browser/interaction-skills/iframes.md +3 -3
  109. package/dist/extensions/builtin/browser/interaction-skills/network-requests.md +3 -3
  110. package/dist/extensions/builtin/browser/interaction-skills/print-as-pdf.md +3 -3
  111. package/dist/extensions/builtin/browser/interaction-skills/profile-sync.md +90 -90
  112. package/dist/extensions/builtin/browser/interaction-skills/screenshots.md +17 -17
  113. package/dist/extensions/builtin/browser/interaction-skills/scrolling.md +3 -3
  114. package/dist/extensions/builtin/browser/interaction-skills/shadow-dom.md +3 -3
  115. package/dist/extensions/builtin/browser/interaction-skills/tabs.md +69 -69
  116. package/dist/extensions/builtin/browser/interaction-skills/uploads.md +1 -1
  117. package/dist/extensions/builtin/browser/interaction-skills/viewport.md +3 -3
  118. package/dist/extensions/builtin/browser/src/browser_harness/AGENT.md +15 -15
  119. package/dist/extensions/builtin/browser/src/browser_harness/__init__.py +8 -8
  120. package/dist/extensions/builtin/browser/src/browser_harness/_ipc.py +90 -90
  121. package/dist/extensions/builtin/browser/src/browser_harness/admin.py +722 -722
  122. package/dist/extensions/builtin/browser/src/browser_harness/daemon.py +328 -328
  123. package/dist/extensions/builtin/browser/src/browser_harness/helpers.py +396 -396
  124. package/dist/extensions/builtin/browser/src/browser_harness/run.py +103 -103
  125. package/dist/extensions/builtin/discipline/skills/brainstorming/SKILL.md +33 -33
  126. package/dist/extensions/builtin/discipline/skills/executing-plans/SKILL.md +25 -25
  127. package/dist/extensions/builtin/discipline/skills/finishing-development-branch/SKILL.md +25 -25
  128. package/dist/extensions/builtin/discipline/skills/receiving-code-review/SKILL.md +22 -22
  129. package/dist/extensions/builtin/discipline/skills/requesting-code-review/SKILL.md +31 -31
  130. package/dist/extensions/builtin/discipline/skills/systematic-debugging/SKILL.md +28 -28
  131. package/dist/extensions/builtin/discipline/skills/test-driven-development/SKILL.md +32 -32
  132. package/dist/extensions/builtin/discipline/skills/using-git-worktrees/SKILL.md +25 -25
  133. package/dist/extensions/builtin/discipline/skills/verification-before-completion/SKILL.md +27 -27
  134. package/dist/extensions/builtin/discipline/skills/writing-plans/SKILL.md +26 -26
  135. package/dist/extensions/builtin/goal/README.md +67 -67
  136. package/dist/extensions/builtin/goal/goal-controller.d.ts +39 -10
  137. package/dist/extensions/builtin/goal/goal-controller.js +1 -1
  138. package/dist/extensions/builtin/goal/goal-format.js +1 -1
  139. package/dist/extensions/builtin/goal/goal-prompts.d.ts +2 -0
  140. package/dist/extensions/builtin/goal/goal-prompts.js +5 -4
  141. package/dist/extensions/builtin/goal/goal-store.js +1 -1
  142. package/dist/extensions/builtin/goal/index.d.ts +1 -1
  143. package/dist/extensions/builtin/goal/index.js +10 -7
  144. package/dist/extensions/builtin/grub/README.md +112 -112
  145. package/dist/extensions/builtin/link-world/agent-workspace/README.md +16 -16
  146. package/dist/extensions/builtin/link-world/index.js +6 -6
  147. package/dist/extensions/builtin/link-world/internet-search/internet-search.md +65 -65
  148. package/dist/extensions/builtin/link-world/link-world-agent.md +82 -82
  149. package/dist/extensions/builtin/link-world/linkworld.md +313 -313
  150. package/dist/extensions/builtin/link-world/{network-routing.md → network-routing/network-routing.md} +67 -67
  151. package/dist/extensions/builtin/loop/README.md +92 -92
  152. package/dist/extensions/builtin/mcp/figma-design.md +68 -68
  153. package/dist/extensions/builtin/mcp/mcp-management.md +85 -85
  154. package/dist/extensions/builtin/plan/index.js +1 -1
  155. package/dist/extensions/builtin/recap/AGENT.md +15 -15
  156. package/dist/extensions/builtin/sal/README.md +72 -72
  157. package/dist/extensions/builtin/security-audit/README.md +289 -289
  158. package/dist/extensions/builtin/task/task-store.d.ts +4 -0
  159. package/dist/extensions/builtin/task/task-store.js +1 -1
  160. package/dist/extensions/builtin/team/AGENT.md +112 -112
  161. package/dist/extensions/builtin/team/TESTING.md +299 -299
  162. package/dist/extensions/builtin/token-save/README.md +56 -56
  163. package/dist/extensions/optional/AGENT.md +10 -10
  164. package/dist/index.d.ts +5 -30
  165. package/dist/index.js +1 -1
  166. package/dist/models.d.ts +7 -0
  167. package/dist/models.js +1 -0
  168. package/dist/modes/interactive/components/footer.js +1 -1
  169. package/dist/modes/interactive/components/task-status-panel.d.ts +36 -0
  170. package/dist/modes/interactive/components/task-status-panel.js +1 -0
  171. package/dist/modes/interactive/controllers/stream-render-controller.d.ts +7 -0
  172. package/dist/modes/interactive/controllers/stream-render-controller.js +2 -2
  173. package/dist/modes/interactive/interactive-mode.js +40 -40
  174. package/dist/modes/interactive/state/interactive-state.d.ts +2 -0
  175. package/dist/modes/interactive/state/interactive-state.js +1 -1
  176. package/dist/modes/interactive/theme/dark.json +85 -85
  177. package/dist/modes/interactive/theme/light.json +84 -84
  178. package/dist/modes/interactive/theme/theme-schema.json +335 -335
  179. package/dist/modes/interactive/theme/warm.json +81 -81
  180. package/dist/node_modules/@pencil-agent/ai/dist/cli.js +0 -0
  181. package/dist/node_modules/@pencil-agent/ai/dist/models.generated.js +1 -1
  182. package/dist/node_modules/@pencil-agent/ai/dist/providers/anthropic.js +2 -2
  183. package/dist/node_modules/@pencil-agent/ai/dist/providers/openai-completions.js +5 -5
  184. package/dist/node_modules/@pencil-agent/ai/dist/providers/openai-responses.js +1 -1
  185. package/dist/node_modules/@pencil-agent/ai/dist/stream.js +1 -1
  186. package/dist/packages/protocol/src/commands.d.ts +33 -0
  187. package/dist/packages/protocol/src/flags.d.ts +20 -0
  188. package/dist/packages/protocol/src/hooks.d.ts +17 -0
  189. package/dist/packages/protocol/src/hooks.js +0 -0
  190. package/dist/packages/{extension-sdk → protocol}/src/index.d.ts +7 -4
  191. package/dist/packages/protocol/src/index.js +1 -0
  192. package/dist/packages/{extension-sdk → protocol}/src/lifecycle.d.ts +15 -27
  193. package/dist/packages/protocol/src/lifecycle.js +0 -0
  194. package/dist/packages/{extension-sdk → protocol}/src/tools.d.ts +1 -1
  195. package/dist/packages/protocol/src/tools.js +0 -0
  196. package/dist/public-config.d.ts +12 -0
  197. package/dist/public-config.js +1 -0
  198. package/dist/runtime.d.ts +9 -0
  199. package/dist/runtime.js +1 -0
  200. package/dist/session-compaction.d.ts +7 -0
  201. package/dist/session-compaction.js +1 -0
  202. package/dist/session.d.ts +7 -0
  203. package/dist/session.js +1 -0
  204. package/dist/skills.d.ts +7 -0
  205. package/dist/skills.js +1 -0
  206. package/dist/tools.d.ts +7 -0
  207. package/dist/tools.js +1 -0
  208. package/docs/ACP/345/215/217/350/256/256/351/233/206/346/210/220/345/274/200/345/217/221/346/226/207/346/241/243.md +851 -0
  209. package/docs/SDK-TESTING.md +364 -0
  210. package/docs/codex-goal-command-impl.md +1055 -1055
  211. package/docs/codex-goal-vs-grub.md +500 -500
  212. package/docs/custom-provider.md +27 -27
  213. package/docs/extensions.md +27 -27
  214. package/docs/keybindings.md +27 -27
  215. package/docs/loop 重构完成总结.md +250 -250
  216. package/docs/loop 重构完成报告.md +122 -122
  217. package/docs/loop 重构方案.md +1222 -1222
  218. package/docs/loop 重构方案实现报告.md +158 -158
  219. package/docs/loop 重构方案对比分析.md +128 -128
  220. package/docs/loop 重构计划.md +320 -320
  221. package/docs/loop-usage-examples.md +214 -214
  222. package/docs/mem-core/346/212/200/346/234/257/346/226/207/346/241/243.md +593 -0
  223. package/docs/models.md +27 -27
  224. package/docs/packages.md +27 -27
  225. package/docs/pi-design-philosophy.md +457 -457
  226. package/docs/planmode.md +1987 -1987
  227. package/docs/prompt-templates.md +27 -27
  228. package/docs/providers.md +27 -27
  229. package/docs/sdk.md +27 -27
  230. package/docs/skills.md +27 -27
  231. package/docs/startup-performance-optimization.md +301 -0
  232. package/docs/themes.md +27 -27
  233. package/docs/tui.md +27 -27
  234. package/docs/认知地图.md +47 -0
  235. package/package.json +190 -162
  236. package/dist/packages/extension-sdk/src/index.js +0 -1
  237. package/docs/cc-agent-design.md +0 -1297
  238. package/docs/cc-tui-design.md +0 -1333
  239. package/docs//345/257/271/346/240/207Claude-Code.md +0 -1775
  240. package/dist/packages/{extension-sdk/src/lifecycle.js → protocol/src/commands.js} +0 -0
  241. package/dist/packages/{extension-sdk/src/tools.js → protocol/src/flags.js} +0 -0
@@ -1,362 +1,362 @@
# SoundCloud: Data Extraction

Field-tested against soundcloud.com on 2026-04-18.
No login or OAuth token is required for any approach documented here. Approach 3 needs only the public `client_id` that SoundCloud embeds in every page. All code uses `http_get` (pure HTTP, no browser).

---

## Approach 1 (Fastest): oEmbed API, No Auth, No Client ID

`https://soundcloud.com/oembed?url=<resource_url>&format=json`

Returns JSON in roughly 0.3 s. Works for **tracks, playlists/sets, and user profiles**, and no key is required. URL-encode `<resource_url>` before placing it in the query string.

```python
from helpers import http_get
import json
from urllib.parse import urlencode

def soundcloud_oembed(resource_url):
    """Fetch oEmbed metadata for any public SoundCloud URL.

    Works for:
      - https://soundcloud.com/{user}/{track-slug}
      - https://soundcloud.com/{user}/sets/{playlist-slug}
      - https://soundcloud.com/{user}
    """
    # URL-encode the target so any query string or special characters in it survive
    qs = urlencode({"url": resource_url, "format": "json"})
    return json.loads(http_get(f"https://soundcloud.com/oembed?{qs}"))

# Track
track = soundcloud_oembed("https://soundcloud.com/forss/flickermood")
# {
#   "version": 1.0,
#   "type": "rich",
#   "provider_name": "SoundCloud",
#   "provider_url": "https://soundcloud.com",
#   "height": 400,
#   "width": "100%",
#   "title": "Flickermood by Forss",
#   "description": "From the Soulhack album...",
#   "thumbnail_url": "https://i1.sndcdn.com/artworks-000067273316-smsiqx-t500x500.jpg",
#   "html": "<iframe width=\"100%\" height=\"400\" scrolling=\"no\" frameborder=\"no\" src=\"https://w.soundcloud.com/player/?visual=true&url=...\">",
#   "author_name": "Forss",
#   "author_url": "https://soundcloud.com/forss"
# }

# Playlist/set
pl = soundcloud_oembed("https://soundcloud.com/forss/sets/soulhack")
# title="Soulhack by Forss", description="My 2003 debut album...", height=450

# User profile
user = soundcloud_oembed("https://soundcloud.com/forss")
# title="Forss", description="Artist & Founder SoundCloud", height=450
```
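
The oEmbed payload mixes display strings with embed markup. Below is a minimal sketch for pulling out the pieces most scrapers need, assuming the response shape shown above. `summarize_oembed` is a hypothetical helper, not part of `helpers`. It strips the trailing " by {author_name}" from `title`, which is present for tracks and sets but absent for users. It also extracts the player URL from the `html` iframe. The `sample` dict is trimmed illustration data.

```python
import html
import re

def summarize_oembed(data):
    """Reduce an oEmbed response dict to a few commonly needed fields."""
    title = data.get("title", "")
    author = data.get("author_name", "")
    suffix = f" by {author}"
    # Tracks and sets use "{Name} by {Artist}"; user profiles use just "{Name}"
    name = title[: -len(suffix)] if author and title.endswith(suffix) else title
    # The embed markup is an <iframe>; its src attribute is the widget player URL
    m = re.search(r'src="([^"]+)"', data.get("html", ""))
    player_src = html.unescape(m.group(1)) if m else None
    return {
        "name": name,
        "author": author,
        "author_url": data.get("author_url"),
        "thumbnail_url": data.get("thumbnail_url"),
        "player_src": player_src,
    }

sample = {
    "title": "Flickermood by Forss",
    "author_name": "Forss",
    "author_url": "https://soundcloud.com/forss",
    "thumbnail_url": "https://i1.sndcdn.com/artworks-000067273316-smsiqx-t500x500.jpg",
    "html": '<iframe width="100%" height="400" src="https://w.soundcloud.com/player/?visual=true&amp;url=x"></iframe>',
}
print(summarize_oembed(sample)["name"])  # Flickermood
```

The result can be passed straight to `soundcloud_oembed(...)` output when only a title, artist, and artwork are needed.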

### oEmbed fields

| Field | Type | Notes |
|-------|------|-------|
| `title` | str | "{Track Title} by {Artist}" for tracks and sets, "{Name}" for users |
| `author_name` | str | Artist/user display name |
| `author_url` | str | Profile URL |
| `thumbnail_url` | str | Artwork at 500×500 px (`t500x500`) |
| `description` | str | Track/set/profile description (may contain HTML entities and tags) |
| `html` | str | Embed `<iframe>` for the SoundCloud player widget |
| `height` | int | 400 for tracks, 450 for playlists and users |
| `width` | str | Always `"100%"` |

---

## Approach 2: Page Hydration (`__sc_hydration`), Rich Metadata, No Client ID

Every SoundCloud page embeds a JSON array in a `<script>` tag, assigned to `window.__sc_hydration`. It contains full, API-grade metadata and requires no key.

```python
from helpers import http_get
import json, re

def extract_hydration(page_url):
    """Extract __sc_hydration JSON from any SoundCloud page."""
    html = http_get(page_url)
    match = re.search(r'window\.__sc_hydration\s*=\s*(\[.*?\]);\s*<', html, re.DOTALL)
    if not match:
        return []
    return json.loads(match.group(1))

def get_hydration_by_type(page_url, hydratable):
    """Get the 'data' dict for a specific hydratable type."""
    for obj in extract_hydration(page_url):
        if obj.get('hydratable') == hydratable:
            return obj.get('data')
    return None

# Track page: hydration key is 'sound'
track = get_hydration_by_type("https://soundcloud.com/forss/flickermood", "sound")
# track['id'] = 293
# track['title'] = "Flickermood"
# track['playback_count'] = 962685
# track['likes_count'] = 2592
# track['duration'] = 213886  (milliseconds)
# track['genre'] = "Electronic"
# track['created_at'] = "2007-09-22T14:45:46Z"
# track['artwork_url'] = "https://i1.sndcdn.com/artworks-000067273316-smsiqx-large.jpg"
# track['waveform_url'] = "https://wave.sndcdn.com/cWHNerOLlkUq_m.json"
# track['streamable'] = True
# track['downloadable'] = True
# track['license'] = "all-rights-reserved"
# track['tag_list'] = "downtempo"
# track['urn'] = "soundcloud:tracks:293"
# track['media'] = {'transcodings': [...]}  (HLS/progressive stream URLs; need auth)
# track['user'] = {full user object nested}

# User page: hydration key is 'user'
user = get_hydration_by_type("https://soundcloud.com/forss", "user")
# user['id'] = 183
# user['username'] = "Forss"
# user['full_name'] = "Eric Quidenus-Wahlforss"
# user['followers_count'] = 132203
# user['track_count'] = 26
# user['verified'] = True
# user['city'] = "Berlin"
# user['country_code'] = "DE"
# user['description'] = "Artist & Founder SoundCloud"
# user['creator_subscription'] = {'product': {'id': 'creator-pro-unlimited'}}
# user['badges'] = {'pro_unlimited': True, 'verified': True}

# Playlist/set page: hydration key is 'playlist'
playlist = get_hydration_by_type("https://soundcloud.com/forss/sets/soulhack", "playlist")
# playlist['id'] = 18
# playlist['title'] = "Soulhack"
# playlist['track_count'] = 11
# playlist['tracks'] = [full track objects list]
# playlist['is_album'] = True/False
# playlist['genre'] = "Electronic"
```
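
Because `extract_hydration` fetches and parses in one step, it is awkward to test or to reuse on saved pages. The sketch below separates the parsing so it runs on raw HTML. `parse_hydration` and `hydration_index` are hypothetical names, and the regex is the same one used above. The `saved` string is a hand-made stand-in for a real page.

```python
import json
import re

HYDRATION_RE = re.compile(r'window\.__sc_hydration\s*=\s*(\[.*?\]);\s*<', re.DOTALL)

def parse_hydration(page_html):
    """Return the __sc_hydration list from raw page HTML, or [] if absent."""
    match = HYDRATION_RE.search(page_html)
    return json.loads(match.group(1)) if match else []

def hydration_index(entries):
    """Map each 'hydratable' key to its 'data' payload (first occurrence wins)."""
    index = {}
    for obj in entries:
        key = obj.get("hydratable")
        if key is not None and key not in index:
            index[key] = obj.get("data")
    return index

saved = """<script>window.__sc_hydration = [
  {"hydratable": "apiClient", "data": {"id": "abc123", "isExpiring": false}},
  {"hydratable": "sound", "data": {"id": 293, "title": "Flickermood"}}
];</script>"""
idx = hydration_index(parse_hydration(saved))
print(idx["sound"]["title"], idx["apiClient"]["id"])  # Flickermood abc123
```

Indexing once avoids re-scanning the list for each key. With `http_get`, the online version becomes `hydration_index(parse_hydration(http_get(url)))`.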

### All hydration keys on a typical page

| `hydratable` | Content |
|---|---|
| `sound` | Full track object (on track pages) |
| `playlist` | Full playlist plus all tracks (on set pages) |
| `user` | Full user object (on any page with a profile) |
| `apiClient` | `{'id': '<client_id>', 'isExpiring': False}`, i.e. the `client_id` |
| `geoip` | Viewer country/city/coordinates |
| `features` | Feature-flags dict |
| `anonymousId` | Session tracking ID (not useful) |

---

## Approach 3: API v2, Full Query Power (Requires Client ID)

The `client_id` lives in every page's `__sc_hydration` under the `apiClient` key. In testing it was the same on every page and across sessions, and it does not appear to rotate on short timescales. Extract it once per run and reuse it.

```python
from helpers import http_get
import json, re
from urllib.parse import urlencode

def get_client_id(page_url="https://soundcloud.com"):
    """Extract client_id from any SoundCloud page's __sc_hydration."""
    html = http_get(page_url)
    match = re.search(r'window\.__sc_hydration\s*=\s*(\[.*?\]);\s*<', html, re.DOTALL)
    if not match:
        raise ValueError("No hydration found")
    for obj in json.loads(match.group(1)):
        if obj.get('hydratable') == 'apiClient':
            return obj['data']['id']
    raise ValueError("apiClient not found in hydration")

CLIENT_ID = get_client_id()  # e.g. "efg2kjLJnAJpInbN6P3hsHzispI1SKQH" (example only; extract fresh)

def sc_api(path, **params):
    """Call api-v2.soundcloud.com. Returns parsed JSON."""
    params['client_id'] = CLIENT_ID
    # urlencode escapes values such as full permalink URLs (resolve) or multi-word queries (search)
    return json.loads(http_get(f"https://api-v2.soundcloud.com/{path}?{urlencode(params)}"))
```

### Resolve any URL to a resource

```python
# Resolve a permalink URL to its resource, with full metadata
track = sc_api("resolve", url="https://soundcloud.com/forss/flickermood")
# Returns: {'kind': 'track', 'id': 293, 'title': 'Flickermood', ...}

user = sc_api("resolve", url="https://soundcloud.com/forss")
# Returns: {'kind': 'user', 'id': 183, 'username': 'Forss', ...}
```
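
Before spending a request on `resolve`, the URL path shape usually indicates which resource, and therefore which hydration key, to expect. The sketch below covers only the three permalink shapes listed in Approach 1. `classify_permalink` is a hypothetical helper. The set of profile sub-pages it excludes is an assumption and not exhaustive. Reserved top-level paths such as `/discover` would be misread as users.

```python
from urllib.parse import urlparse

# Hydration key (Approach 2) that each resource kind is expected to use
HYDRATABLE_BY_KIND = {"track": "sound", "playlist": "playlist", "user": "user"}

# Profile tabs that look like /{user}/{slug} but are not tracks (assumed, not exhaustive)
USER_SUBPAGES = {"tracks", "sets", "albums", "likes", "reposts",
                 "popular-tracks", "followers", "following"}

def classify_permalink(url):
    """Guess 'user', 'track', or 'playlist' from a soundcloud.com URL path; None if unsure."""
    parsed = urlparse(url)
    if parsed.netloc not in ("soundcloud.com", "www.soundcloud.com"):
        return None
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) == 1:
        return "user"                                  # /{user}
    if len(parts) == 2 and parts[1] not in USER_SUBPAGES:
        return "track"                                 # /{user}/{track-slug}
    if len(parts) == 3 and parts[1] == "sets":
        return "playlist"                              # /{user}/sets/{playlist-slug}
    return None

kind = classify_permalink("https://soundcloud.com/forss/sets/soulhack")
print(kind, HYDRATABLE_BY_KIND[kind])  # playlist playlist
```

The result can choose the `hydratable` argument for `get_hydration_by_type`. When it returns `None`, fall back to `resolve`, whose response carries an authoritative `kind`.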

### Track lookup

```python
# Single track by numeric ID
track = sc_api("tracks/293")

# Bulk track lookup (comma-separated IDs; returns a list)
tracks = sc_api("tracks", ids="293,290,48031525")
# Returns a JSON array directly (not wrapped in 'collection')
for t in tracks:
    print(t['id'], t['title'], t['playback_count'])
```
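
For long ID lists, splitting them across several `ids=` requests keeps URLs short. The per-request cap is not documented here. The default of 50 is an assumed, conservative batch size, not a known SoundCloud limit, and `id_batches` is a hypothetical helper.

```python
def id_batches(ids, size=50):
    """Yield comma-separated strings of at most `size` unique IDs each.

    size=50 is an assumption, not a documented SoundCloud limit.
    """
    if size < 1:
        raise ValueError("size must be >= 1")
    seen = set()
    unique = []
    for i in ids:
        if i not in seen:          # drop duplicates, keep first-seen order
            seen.add(i)
            unique.append(i)
    for start in range(0, len(unique), size):
        yield ",".join(str(i) for i in unique[start:start + size])

print(list(id_batches([293, 290, 48031525, 293], size=2)))  # ['293,290', '48031525']
```

The bulk endpoint returns a bare list, so per-batch results can simply be concatenated: `[t for b in id_batches(ids) for t in sc_api("tracks", ids=b)]`.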

### Search

```python
import itertools

# Tracks
results = sc_api("search/tracks", q="jazz", limit=20)
# results['collection'] = list of track objects
# results['total_results'] = 5293248
# results['next_href'] = pagination URL (see below)

# Users
results = sc_api("search/users", q="jazz", limit=10)

# Playlists/sets
results = sc_api("search/playlists", q="jazz", limit=10)

# Paginate with next_href
def paginate(first_response):
    """Lazily yield items from every page of a collection response."""
    yield from first_response.get('collection', [])
    next_href = first_response.get('next_href')
    while next_href:
        sep = '&' if '?' in next_href else '?'  # next_href normally already has a query string
        page = json.loads(http_get(f"{next_href}{sep}client_id={CLIENT_ID}"))
        yield from page.get('collection', [])
        next_href = page.get('next_href')

# Broad queries can have millions of results, so cap the iteration
first_500 = list(itertools.islice(paginate(results), 500))
```
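
The `paginate` helper appends `client_id` by string concatenation, which would duplicate it if `next_href` ever carried one. The hypothetical `with_client_id` below is a slightly more defensive sketch. It rebuilds the query with `urllib.parse` so that an existing `client_id` is replaced rather than repeated.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def with_client_id(href, client_id):
    """Return href with client_id set, replacing any existing value."""
    parts = urlsplit(href)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "client_id"]
    query.append(("client_id", client_id))
    return urlunsplit(parts._replace(query=urlencode(query)))

href = "https://api-v2.soundcloud.com/search/tracks?q=jazz&offset=20&limit=20"
print(with_client_id(href, "abc"))
# https://api-v2.soundcloud.com/search/tracks?q=jazz&offset=20&limit=20&client_id=abc
```

Inside `paginate`, `http_get(with_client_id(next_href, CLIENT_ID))` could replace the manual concatenation.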

### Trending charts

```python
# Trending tracks across all genres
trending = sc_api("charts", kind="trending",
                  genre="soundcloud:genres:all-music", limit=20)
for item in trending['collection']:
    t = item['track']
    print(f"{t['title']} | score={item['score']:.4f}")

# Genre options: soundcloud:genres:all-music, soundcloud:genres:electronic,
# soundcloud:genres:hiphoprap, soundcloud:genres:ambient, etc.
```
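
Chart responses wrap each track in an item that also carries a `score`. The sketch below flattens such a response into rows, assuming the `collection` / `track` / `score` shape used above plus the nested `user.username` from the track object. `chart_rows` is hypothetical, and `fake_chart` is made-up data, not real chart output.

```python
def chart_rows(chart):
    """Flatten a charts response into (rank, title, artist, score) tuples."""
    rows = []
    for rank, item in enumerate(chart.get("collection", []), start=1):
        track = item.get("track") or {}
        user = track.get("user") or {}
        rows.append((rank, track.get("title"), user.get("username"), item.get("score")))
    return rows

fake_chart = {"collection": [
    {"score": 0.91, "track": {"title": "A", "user": {"username": "x"}}},
    {"score": 0.42, "track": {"title": "B", "user": {"username": "y"}}},
]}
for rank, title, artist, score in chart_rows(fake_chart):
    print(f"{rank:>2}. {title} - {artist} (score={score:.2f})")
```

Rank is taken from list order, on the assumption that the API already returns chart items sorted.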

### User resources

```python
user_id = 183  # numeric ID from resolve or hydration

# User's tracks
tracks = sc_api(f"users/{user_id}/tracks", limit=20)
# tracks['collection'] = list of track objects

# User's playlists
playlists = sc_api(f"users/{user_id}/playlists", limit=10)

# User's likes
likes = sc_api(f"users/{user_id}/likes", limit=10)

# Related tracks for a track
related = sc_api("tracks/293/related", limit=10)
# related['collection'] = list of track objects
```

### Waveform data

```python
# The waveform URL comes from track['waveform_url']
waveform_url = "https://wave.sndcdn.com/cWHNerOLlkUq_m.json"
waveform = json.loads(http_get(waveform_url))
# {
#   'width': 1800,                    # number of sample points
#   'height': 140,                    # max amplitude value
#   'samples': [11, 86, 91, 80, ...]  # 1800 amplitude values
# }
```
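
To draw or compare waveforms, the raw `samples` are usually scaled by `height` and reduced to fewer points. Below is a minimal sketch, assuming the `width`/`height`/`samples` shape shown above. `downsample_waveform` and `sparkline` are hypothetical helpers. The first averages each bucket into the 0 to 1 range, and the second renders the result as ASCII. The `wave` dict is made-up data that reuses the first four sample values above.

```python
def downsample_waveform(waveform, buckets=60):
    """Average `samples` into at most `buckets` points, each scaled to 0..1 by `height`."""
    samples = waveform.get("samples", [])
    n = len(samples)
    if n == 0:
        return []
    height = waveform.get("height") or max(samples) or 1   # fall back if height is missing/0
    buckets = max(1, min(buckets, n))
    out = []
    for b in range(buckets):
        chunk = samples[b * n // buckets:(b + 1) * n // buckets]   # never empty when buckets <= n
        out.append(min(1.0, sum(chunk) / len(chunk) / height))
    return out

def sparkline(values, chars=" .:-=+*#%@"):
    """Render 0..1 values as a one-line ASCII sparkline."""
    top = len(chars) - 1
    return "".join(chars[min(int(v * top + 0.5), top)] for v in values)

wave = {"width": 8, "height": 140, "samples": [11, 86, 91, 80, 140, 70, 0, 35]}
print(sparkline(downsample_waveform(wave, buckets=4)))  # -+#.
```

Averaging keeps the overall loudness shape. Taking the maximum per bucket instead would preserve peaks, if those matter more.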

---

## Full track fields from `__sc_hydration` / API v2

```
id                  int   Numeric track ID (e.g. 293)
urn                 str   "soundcloud:tracks:293"
title               str   Track title
description         str   May contain HTML entities/tags
genre               str   Genre string
tag_list            str   Space-separated tags
created_at          str   ISO 8601 UTC
last_modified       str   ISO 8601 UTC
release_date        str   ISO 8601 UTC (original release)
display_date        str   ISO 8601 UTC (shown to users)
duration            int   Milliseconds
full_duration       int   Milliseconds (untruncated)
playback_count      int
likes_count         int
reposts_count       int
comment_count       int
download_count      int
artwork_url         str   e.g. .../artworks-...-large.jpg (replace 'large' with 't500x500' for 500px)
waveform_url        str   https://wave.sndcdn.com/....json
permalink           str   Slug (e.g. "flickermood")
permalink_url       str   Full canonical URL
streamable          bool
downloadable        bool
license             str   e.g. "all-rights-reserved", "cc-by"
sharing             str   "public" or "private"
state               str   "finished" | "processing" | "failed"
monetization_model  str   "AD_SUPPORTED" | "SUB_HIGH_TIER" | "NOT_APPLICABLE"
embeddable_by       str   "all" | "me" | "none"
user                dict  Nested user object (id, username, avatar_url, verified, ...)
user_id             int   Owner numeric ID
publisher_metadata  dict  {artist, publisher, isrc, contains_music, ...}
media               dict  {'transcodings': [...]}: stream URLs (require OAuth; not usable without login)
label_name          str   Record label
purchase_url        str   External buy link
station_urn         str   "soundcloud:system-playlists:track-stations:{id}"
```
316
-
317
- ---
318
-
319
- ## Gotchas
320
-
321
- **client_id is required for api-v2.soundcloud.com** — requests without it return HTTP 401. Always extract from `__sc_hydration['apiClient']['id']`.
322
-
323
- **client_id source: hydration, not JS bundles** — the JS bundles on `a-v2.sndcdn.com` do NOT contain the `client_id` pattern. The only reliable source is the `apiClient` object in the page hydration. It is stable across all pages (same value from homepage, track pages, user pages) and does not appear to rotate on short timescales.
324
-
325
- **Artwork URL sizes** — hydration/API returns `...-large.jpg` (100×100). Replace the size suffix to get larger images:
326
- - `-large.jpg` → 100×100
327
- - `-t300x300.jpg` → 300×300
328
- - `-t500x500.jpg` → 500×500 (oEmbed returns this size)
329
-
330
- **Regex must use `re.DOTALL`** — the `__sc_hydration` JSON spans multiple lines. Without `re.DOTALL`, the `.` in the regex won't match newlines.
331
-
332
- **Stream URLs (media.transcodings) are gated** — the HLS/progressive audio stream URLs in `track['media']['transcodings']` require an OAuth token even to fetch a stream manifest. They cannot be played without a logged-in session.
333
-
334
- **Bulk track lookup returns a list, not collection** — `GET /tracks?ids=...` returns a JSON array directly. Do NOT look for `.get('collection')`.
335
-
336
- **Search `total_results` can be huge** — results like 5M+ are normal for broad queries. Use `next_href` for pagination; do not calculate offsets manually.
337
-
338
- **oEmbed description contains HTML** — SoundCloud descriptions may include `&nbsp;` and anchor tags. Decode with `html.unescape()` if you need plain text.
339
-
340
- **HTTP 400 on some endpoints** — `/tracks/{id}/comments` returns 400 without OAuth headers. Timed comments are not accessible without login.
341
-
342
- **No browser required** — all documented approaches work with plain `http_get`. SoundCloud does not require JavaScript rendering for metadata extraction.
343
-
344
- **Rate limits** — 20 rapid sequential API v2 requests completed without errors in testing. SoundCloud does not publish official rate limits; stay under ~50 req/s for sustained scraping. oEmbed is more lenient than api-v2.
345
-
346
- ---
347
-
348
- ## Quick Reference
349
-
350
- | Goal | Approach | Auth |
351
- |------|----------|------|
352
- | Track title/author/thumbnail from URL | oEmbed | None |
353
- | Full track metadata + play counts | `__sc_hydration` `sound` key | None |
354
- | Full user profile + stats | `__sc_hydration` `user` key | None |
355
- | Full playlist with all tracks | `__sc_hydration` `playlist` key | None |
356
- | Search tracks/users/playlists | API v2 `/search/*` | client_id |
357
- | Trending charts | API v2 `/charts` | client_id |
358
- | Bulk track lookup by IDs | API v2 `/tracks?ids=` | client_id |
359
- | User's track list | API v2 `/users/{id}/tracks` | client_id |
360
- | Resolve permalink to resource | API v2 `/resolve?url=` | client_id |
361
- | Waveform amplitude data | Direct fetch of `waveform_url` | None |
362
- | Audio stream playback | OAuth login required | Login |
1
# SoundCloud — Data Extraction

Field-tested against soundcloud.com on 2026-04-18.
No user login (OAuth) is required for any approach documented here. Approach 3 needs only the public `client_id`, which SoundCloud embeds in every page. All code uses `http_get` (pure HTTP, no browser).

---

## Approach 1 (Fastest): oEmbed API — No Auth, No Client ID

`https://soundcloud.com/oembed?url=<resource_url>&format=json`

Returns JSON in ~0.3s. Works for **tracks, playlists/sets, and user profiles**. No key required.

```python
from helpers import http_get
import json
from urllib.parse import urlencode

def soundcloud_oembed(resource_url):
    """Fetch oEmbed metadata for any public SoundCloud URL.

    Works for:
      - https://soundcloud.com/{user}/{track-slug}
      - https://soundcloud.com/{user}/sets/{playlist-slug}
      - https://soundcloud.com/{user}
    """
    # URL-encode the resource URL so any "?" or "&" in it cannot leak into the oEmbed query
    query = urlencode({"url": resource_url, "format": "json"})
    return json.loads(http_get(f"https://soundcloud.com/oembed?{query}"))

# Track
track = soundcloud_oembed("https://soundcloud.com/forss/flickermood")
# {
#   "version": 1.0,
#   "type": "rich",
#   "provider_name": "SoundCloud",
#   "provider_url": "https://soundcloud.com",
#   "height": 400,
#   "width": "100%",
#   "title": "Flickermood by Forss",
#   "description": "From the Soulhack album...",
#   "thumbnail_url": "https://i1.sndcdn.com/artworks-000067273316-smsiqx-t500x500.jpg",
#   "html": "<iframe width=\"100%\" height=\"400\" scrolling=\"no\" frameborder=\"no\" src=\"https://w.soundcloud.com/player/?visual=true&url=...\">",
#   "author_name": "Forss",
#   "author_url": "https://soundcloud.com/forss"
# }

# Playlist/set
pl = soundcloud_oembed("https://soundcloud.com/forss/sets/soulhack")
# title="Soulhack by Forss", description="My 2003 debut album...", height=450

# User profile
user = soundcloud_oembed("https://soundcloud.com/forss")
# title="Forss", description="Artist & Founder SoundCloud", height=450
```
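If you need the widget player URL itself rather than the whole iframe markup, you can pull the `src` attribute out of the `html` field with the standard-library HTML parser. Below is a minimal sketch. `player_src_from_oembed_html` is a hypothetical helper name, and the code assumes the embed markup contains a single `<iframe>`, as in the sample above.

```python
from html.parser import HTMLParser

class _IframeSrcParser(HTMLParser):
    """Collects the src attribute of the first <iframe> tag encountered."""

    def __init__(self):
        super().__init__()
        self.src = None

    def handle_starttag(self, tag, attrs):
        if tag == "iframe" and self.src is None:
            # attrs is a list of (name, value) pairs with entities already decoded
            self.src = dict(attrs).get("src")

def player_src_from_oembed_html(embed_html):
    """Return the widget player URL from an oEmbed 'html' string, or None if absent."""
    parser = _IframeSrcParser()
    parser.feed(embed_html)
    parser.close()
    return parser.src

# Hand-made sample shaped like the oEmbed 'html' field (the url value is illustrative)
sample = ('<iframe width="100%" height="400" scrolling="no" frameborder="no" '
          'src="https://w.soundcloud.com/player/?visual=true&url=https%3A%2F%2Fapi.soundcloud.com%2Ftracks%2F293"></iframe>')
print(player_src_from_oembed_html(sample))
```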

### oEmbed fields

| Field | Type | Notes |
|-------|------|-------|
| `title` | str | "{Track Title} by {Artist}" for tracks, "{Playlist Title} by {Artist}" for sets, "{Name}" for users |
| `author_name` | str | Artist/user display name |
| `author_url` | str | Profile URL |
| `thumbnail_url` | str | Artwork at 500×500px (t500x500) |
| `description` | str | Track/profile description (may contain HTML entities) |
| `html` | str | Embed iframe for the SoundCloud player widget |
| `height` | int | 400 for tracks, 450 for playlists and users |
| `width` | str | Always `"100%"` |
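Track and set titles come back as "{Title} by {Artist}". To recover the bare title, remove the ` by {author_name}` suffix rather than splitting on " by ", which breaks on titles that themselves contain " by ". Below is a sketch. It assumes `author_name` matches the artist part of the title exactly; that holds for the Forss samples above but has not been verified in general. `split_oembed_title` is a hypothetical helper.

```python
def split_oembed_title(oembed):
    """Return (title, artist) from an oEmbed dict.

    Strips the " by {author_name}" suffix when present; otherwise (e.g. user
    profiles, whose title is just the name) returns the title unchanged.
    """
    title = oembed.get("title", "")
    artist = oembed.get("author_name", "")
    suffix = f" by {artist}"
    if artist and title.endswith(suffix):
        return title[: -len(suffix)], artist
    return title, artist

print(split_oembed_title({"title": "Flickermood by Forss", "author_name": "Forss"}))
# ('Flickermood', 'Forss')
```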

---

## Approach 2: Page Hydration (`__sc_hydration`) — Rich Metadata, No Client ID

Every SoundCloud page embeds a JSON array in a `<script>` tag as `window.__sc_hydration`. It contains full API-grade metadata and requires no key.

```python
from helpers import http_get
import json, re

def extract_hydration(page_url):
    """Extract __sc_hydration JSON from any SoundCloud page."""
    html = http_get(page_url)
    match = re.search(r'window\.__sc_hydration\s*=\s*(\[.*?\]);\s*<', html, re.DOTALL)
    if not match:
        return []
    return json.loads(match.group(1))

def get_hydration_by_type(page_url, hydratable):
    """Get the 'data' dict for a specific hydratable type."""
    for obj in extract_hydration(page_url):
        if obj.get('hydratable') == hydratable:
            return obj.get('data')
    return None

# Track page — hydration key is 'sound'
track = get_hydration_by_type("https://soundcloud.com/forss/flickermood", "sound")
# track['id'] = 293
# track['title'] = "Flickermood"
# track['playback_count'] = 962685
# track['likes_count'] = 2592
# track['duration'] = 213886  (milliseconds)
# track['genre'] = "Electronic"
# track['created_at'] = "2007-09-22T14:45:46Z"
# track['artwork_url'] = "https://i1.sndcdn.com/artworks-000067273316-smsiqx-large.jpg"
# track['waveform_url'] = "https://wave.sndcdn.com/cWHNerOLlkUq_m.json"
# track['streamable'] = True
# track['downloadable'] = True
# track['license'] = "all-rights-reserved"
# track['tag_list'] = "downtempo"
# track['urn'] = "soundcloud:tracks:293"
# track['media'] = {'transcodings': [...]}  (HLS/progressive stream URLs — need auth)
# track['user'] = {full user object nested}

# User page — hydration key is 'user'
user = get_hydration_by_type("https://soundcloud.com/forss", "user")
# user['id'] = 183
# user['username'] = "Forss"
# user['full_name'] = "Eric Quidenus-Wahlforss"
# user['followers_count'] = 132203
# user['track_count'] = 26
# user['verified'] = True
# user['city'] = "Berlin"
# user['country_code'] = "DE"
# user['description'] = "Artist & Founder SoundCloud"
# user['creator_subscription'] = {'product': {'id': 'creator-pro-unlimited'}}
# user['badges'] = {'pro_unlimited': True, 'verified': True}

# Playlist/set page — hydration key is 'playlist'
playlist = get_hydration_by_type("https://soundcloud.com/forss/sets/soulhack", "playlist")
# playlist['id'] = 18
# playlist['title'] = "Soulhack"
# playlist['track_count'] = 11
# playlist['tracks'] = [full track objects list]
# playlist['is_album'] = True/False
# playlist['genre'] = "Electronic"
```

### All hydration keys on a typical page

| `hydratable` | Content |
|---|---|
| `sound` | Full track object (on track pages) |
| `playlist` | Full playlist + all tracks (on set pages) |
| `user` | Full user object (on any page with a profile) |
| `apiClient` | `{'id': '<client_id>', 'isExpiring': False}` — the client_id |
| `geoip` | Viewer country/city/coordinates |
| `features` | Feature flags dict |
| `anonymousId` | Session tracking ID (not useful) |
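Separating the regex step from the network fetch lets you test the extraction against saved HTML. Indexing the array by `hydratable` then turns repeated lookups into dictionary access. Below is a minimal sketch. `parse_hydration` and `index_hydration` are hypothetical names, and the HTML fixture is a hand-made stand-in, not a real SoundCloud page.

```python
import json
import re

# Same pattern as extract_hydration above; DOTALL lets '.' span newlines in the JSON
HYDRATION_RE = re.compile(r'window\.__sc_hydration\s*=\s*(\[.*?\]);\s*<', re.DOTALL)

def parse_hydration(html):
    """Return the __sc_hydration array from raw page HTML ([] if not found)."""
    match = HYDRATION_RE.search(html)
    return json.loads(match.group(1)) if match else []

def index_hydration(entries):
    """Map each 'hydratable' key to its 'data' payload (last entry wins on duplicates)."""
    return {e.get("hydratable"): e.get("data") for e in entries if "hydratable" in e}

fixture = """<script>window.__sc_hydration = [
  {"hydratable": "apiClient", "data": {"id": "abc123", "isExpiring": false}},
  {"hydratable": "sound", "data": {"id": 293, "title": "Flickermood"}}
];</script>"""

by_key = index_hydration(parse_hydration(fixture))
print(by_key["apiClient"]["id"], by_key["sound"]["title"])  # abc123 Flickermood
```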

---

## Approach 3: API v2 — Full Query Power (Requires Client ID)

The `client_id` lives in every page's `__sc_hydration` under the `apiClient` key. In testing it was identical on every page (homepage, track, user) and did not rotate on short timescales. Extract it once per run and reuse it.

```python
from helpers import http_get
import json, re
from urllib.parse import urlencode

def get_client_id(page_url="https://soundcloud.com"):
    """Extract client_id from any SoundCloud page's __sc_hydration."""
    html = http_get(page_url)
    match = re.search(r'window\.__sc_hydration\s*=\s*(\[.*?\]);\s*<', html, re.DOTALL)
    if not match:
        raise ValueError("No hydration found")
    for obj in json.loads(match.group(1)):
        if obj.get('hydratable') == 'apiClient':
            return obj['data']['id']
    raise ValueError("apiClient not found in hydration")

CLIENT_ID = get_client_id()  # e.g. "efg2kjLJnAJpInbN6P3hsHzispI1SKQH" (example only; extract fresh)

def sc_api(path, **params):
    """Call api-v2.soundcloud.com. Returns parsed JSON."""
    params['client_id'] = CLIENT_ID
    # Percent-encode values (spaces in q=, full URLs in url=); keep ',' and ':'
    # unescaped so ID lists and genre URNs stay readable
    qs = urlencode(params, safe=",:")
    return json.loads(http_get(f"https://api-v2.soundcloud.com/{path}?{qs}"))
```
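Query values such as search terms with spaces or `&`, or full permalink URLs, must be percent-encoded before they go into an api-v2 query string. Below is a sketch of the URL construction on its own, with no network call. `build_api_url` is a hypothetical helper, and the client ID is a placeholder.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def build_api_url(path, client_id, **params):
    """Build an api-v2.soundcloud.com URL with every query value percent-encoded."""
    params["client_id"] = client_id
    # safe=",:" keeps comma-separated ID lists and genre URNs readable;
    # spaces, '/', '?' and '&' inside values are still escaped
    return f"https://api-v2.soundcloud.com/{path}?{urlencode(params, safe=',:')}"

url = build_api_url("search/tracks", "PLACEHOLDER_ID", q="deep house & techno", limit=5)
print(url)
# https://api-v2.soundcloud.com/search/tracks?q=deep+house+%26+techno&limit=5&client_id=PLACEHOLDER_ID

# Round-trip check: decoding the query string recovers the original search term
print(parse_qs(urlsplit(url).query)["q"])  # ['deep house & techno']
```

Without encoding, the `&` in that search term would end the `q` parameter early and the API would receive a stray `techno` key.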

### Resolve any URL to a resource

```python
# Resolve a permalink URL to get its resource with full metadata
track = sc_api("resolve", url="https://soundcloud.com/forss/flickermood")
# Returns: {'kind': 'track', 'id': 293, 'title': 'Flickermood', ...}

user = sc_api("resolve", url="https://soundcloud.com/forss")
# Returns: {'kind': 'user', 'id': 183, 'username': 'Forss', ...}
```
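Because `resolve` can return tracks, users or playlists, callers usually branch on the `kind` field. Below is a minimal sketch. `summarize_resolved` is hypothetical. The `'playlist'` kind string is an assumption, since only `'track'` and `'user'` appear in the examples above. The field names come from the hydration samples in this document.

```python
def summarize_resolved(resource):
    """Return a one-line summary of a resolved SoundCloud resource, keyed on 'kind'."""
    kind = resource.get("kind")
    if kind == "track":
        seconds = resource.get("duration", 0) // 1000  # duration is in milliseconds
        return f"track {resource['id']}: {resource.get('title')} ({seconds}s)"
    if kind == "user":
        return f"user {resource['id']}: {resource.get('username')}"
    if kind == "playlist":  # assumed kind value for sets/albums
        return f"playlist {resource['id']}: {resource.get('title')} ({resource.get('track_count')} tracks)"
    raise ValueError(f"unexpected kind: {kind!r}")

print(summarize_resolved({"kind": "track", "id": 293, "title": "Flickermood", "duration": 213886}))
# track 293: Flickermood (213s)
```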

### Track lookup

```python
# Single track by numeric ID
track = sc_api("tracks/293")

# Bulk track lookup (comma-separated IDs; returns a list)
tracks = sc_api("tracks", ids="293,290,48031525")
# Returns a JSON array directly (not wrapped in 'collection')
for t in tracks:
    print(t['id'], t['title'], t['playback_count'])
```
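For long ID lists, it may be safer to split the lookup into several `ids=` requests than to send one very long query string. This document does not establish a per-request cap, so the batch size of 50 below is an untested assumption to tune. `id_batches` is a hypothetical helper.

```python
def id_batches(track_ids, batch_size=50):
    """Yield comma-separated ID strings of at most batch_size IDs each.

    batch_size=50 is an assumed, untested cap; adjust it if the API rejects batches.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    ids = [str(i) for i in track_ids]
    for start in range(0, len(ids), batch_size):
        yield ",".join(ids[start:start + batch_size])

print(list(id_batches([293, 290, 48031525], batch_size=2)))
# ['293,290', '48031525']
```

Each string can be passed directly as `sc_api("tracks", ids=batch)`. Since this endpoint returns a plain list, results from all batches can be concatenated with `list.extend`.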

### Search

```python
# Tracks
results = sc_api("search/tracks", q="jazz", limit=20)
# results['collection']    = list of track objects
# results['total_results'] = 5293248
# results['next_href']     = pagination URL (see below)

# Users
results = sc_api("search/users", q="jazz", limit=10)

# Playlists/sets
results = sc_api("search/playlists", q="jazz", limit=10)

# Paginate with next_href. The generator is lazy, so cap broad queries,
# e.g. itertools.islice(paginate(results), 200)
def paginate(first_response):
    """Yield every item of a collection response, following next_href page by page."""
    yield from first_response.get('collection', [])
    next_href = first_response.get('next_href')
    while next_href:
        # next_href already carries query params, so client_id is appended with '&'
        page = json.loads(http_get(f"{next_href}&client_id={CLIENT_ID}"))
        yield from page.get('collection', [])
        next_href = page.get('next_href')
```
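Appending `&client_id=...` blindly assumes that `next_href` already has a query string and no `client_id` of its own. A more defensive variant parses the URL and sets the parameter explicitly. Below is a sketch using only `urllib.parse`. `with_client_id` is a hypothetical helper name, and the sample `next_href` is made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def with_client_id(href, client_id):
    """Return href with client_id set exactly once, whether or not it had a query string."""
    parts = urlsplit(href)
    # Keep every existing parameter except a stale client_id
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != "client_id"]
    query.append(("client_id", client_id))
    return urlunsplit(parts._replace(query=urlencode(query)))

href = "https://api-v2.soundcloud.com/search/tracks?q=jazz&offset=20&limit=20"
print(with_client_id(href, "PLACEHOLDER_ID"))
# https://api-v2.soundcloud.com/search/tracks?q=jazz&offset=20&limit=20&client_id=PLACEHOLDER_ID
```

Inside `paginate`, the fetch would then become `http_get(with_client_id(next_href, CLIENT_ID))`.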

### Trending charts

```python
# Trending tracks across all genres
trending = sc_api("charts", kind="trending",
                  genre="soundcloud:genres:all-music", limit=20)
for item in trending['collection']:
    t = item['track']
    print(f"{t['title']} — score={item['score']:.4f}")

# Genre options: soundcloud:genres:all-music, soundcloud:genres:electronic,
# soundcloud:genres:hiphoprap, soundcloud:genres:ambient, etc.
```

### User resources

```python
user_id = 183  # numeric ID from resolve or hydration

# User's tracks
tracks = sc_api(f"users/{user_id}/tracks", limit=20)
# tracks['collection'] = list of track objects

# User's playlists
playlists = sc_api(f"users/{user_id}/playlists", limit=10)

# User's likes
likes = sc_api(f"users/{user_id}/likes", limit=10)

# Related tracks for a track
related = sc_api("tracks/293/related", limit=10)
# related['collection'] = list of track objects
```

### Waveform data

```python
# Waveform URL comes from track['waveform_url']
waveform_url = "https://wave.sndcdn.com/cWHNerOLlkUq_m.json"
waveform = json.loads(http_get(waveform_url))
# {
#   'width': 1800,                     # number of sample points
#   'height': 140,                     # max amplitude value
#   'samples': [11, 86, 91, 80, ...]   # 1800 amplitude values
# }
```
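To draw the waveform at a different resolution, you can bucket the `samples` array into a fixed number of bars and scale each bar by `height` into the range 0 to 1. Below is a sketch. Taking the maximum per bucket keeps peaks visible; it is one reasonable choice, not a SoundCloud convention. `waveform_bars` is a hypothetical helper, and the demo data is made up.

```python
def waveform_bars(waveform, bars=60):
    """Downsample waveform['samples'] to `bars` values in [0, 1].

    Each bar is the peak sample in its bucket divided by waveform['height'].
    """
    samples = waveform["samples"]
    height = waveform["height"] or 1  # guard against a zero height
    n = len(samples)
    if n == 0 or bars < 1:
        return []
    bars = min(bars, n)  # never produce empty buckets
    result = []
    for i in range(bars):
        # Integer bucket boundaries cover every sample exactly once
        start = i * n // bars
        end = (i + 1) * n // bars
        result.append(max(samples[start:end]) / height)
    return result

demo = {"width": 8, "height": 140, "samples": [14, 70, 140, 28, 0, 7, 35, 105]}
print(waveform_bars(demo, bars=4))
# [0.5, 1.0, 0.05, 0.75]
```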

---

## Full track fields from `__sc_hydration` / API v2

```
id                  int   Numeric track ID (e.g. 293)
urn                 str   "soundcloud:tracks:293"
title               str   Track title
description         str   May contain HTML entities/tags
genre               str   Genre string
tag_list            str   Space-separated tags
created_at          str   ISO 8601 UTC
last_modified       str   ISO 8601 UTC
release_date        str   ISO 8601 UTC (original release)
display_date        str   ISO 8601 UTC (shown to users)
duration            int   Milliseconds
full_duration       int   Milliseconds (untruncated)
playback_count      int
likes_count         int
reposts_count       int
comment_count       int
download_count      int
artwork_url         str   e.g. .../artworks-...-large.jpg (replace 'large' with 't500x500' for 500px)
waveform_url        str   https://wave.sndcdn.com/....json
permalink           str   Slug (e.g. "flickermood")
permalink_url       str   Full canonical URL
streamable          bool
downloadable        bool
license             str   e.g. "all-rights-reserved", "cc-by"
sharing             str   "public" or "private"
state               str   "finished" | "processing" | "failed"
monetization_model  str   "AD_SUPPORTED" | "SUB_HIGH_TIER" | "NOT_APPLICABLE"
embeddable_by       str   "all" | "me" | "none"
user                dict  Nested user object (id, username, avatar_url, verified, ...)
user_id             int   Owner numeric ID
publisher_metadata  dict  {artist, publisher, isrc, contains_music, ...}
media               dict  {'transcodings': [...]} — stream URLs (require OAuth, not usable without login)
label_name          str   Record label
purchase_url        str   External buy link
station_urn         str   "soundcloud:system-playlists:track-stations:{id}"
```
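Two kinds of field need conversion before display: `duration` is in milliseconds, and the timestamps are ISO 8601 strings with a trailing `Z`. Below is a sketch using the standard library. `datetime.fromisoformat` accepts a bare `Z` only from Python 3.11 onward, so the code swaps it for `+00:00` first. `format_duration` and `parse_sc_time` are hypothetical helper names.

```python
from datetime import datetime, timezone

def format_duration(ms):
    """Format a millisecond duration as m:ss, or h:mm:ss for an hour or more."""
    total_seconds = ms // 1000
    hours, rem = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{seconds:02d}"
    return f"{minutes}:{seconds:02d}"

def parse_sc_time(value):
    """Parse an ISO 8601 UTC string such as '2007-09-22T14:45:46Z' into an aware UTC datetime."""
    # Python < 3.11 fromisoformat rejects a bare 'Z', so normalise it to an explicit offset
    return datetime.fromisoformat(value.replace("Z", "+00:00")).astimezone(timezone.utc)

print(format_duration(213886))                # 3:33
print(parse_sc_time("2007-09-22T14:45:46Z"))  # 2007-09-22 14:45:46+00:00
```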

---

## Gotchas

**client_id is required for api-v2.soundcloud.com** — requests without it return HTTP 401. Always take it from the `data['id']` of the `__sc_hydration` entry whose `hydratable` is `'apiClient'`.

**client_id source: hydration, not JS bundles** — the JS bundles on `a-v2.sndcdn.com` do NOT contain the `client_id` pattern. The only reliable source is the `apiClient` object in the page hydration. It is stable across all pages (same value from homepage, track pages, user pages) and does not appear to rotate on short timescales.

**Artwork URL sizes** — hydration/API returns `...-large.jpg` (100×100). Replace the size suffix to get larger images:
- `-large.jpg` → 100×100
- `-t300x300.jpg` → 300×300
- `-t500x500.jpg` → 500×500 (oEmbed returns this size)
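Swapping the size suffix is a one-line regex substitution at the end of the URL path. Below is a sketch. `artwork_at_size` is a hypothetical helper, and it only recognises the three suffixes listed above.

```python
import re

# The three documented size suffixes; the file extension (group 2) is preserved
_ARTWORK_SIZE_RE = re.compile(r"-(large|t300x300|t500x500)(\.\w+)$")

def artwork_at_size(artwork_url, size="t500x500"):
    """Rewrite a SoundCloud artwork URL to another documented size suffix."""
    if size not in {"large", "t300x300", "t500x500"}:
        raise ValueError(f"unsupported size: {size}")
    new_url, count = _ARTWORK_SIZE_RE.subn(rf"-{size}\2", artwork_url)
    if count == 0:
        raise ValueError(f"no known size suffix in {artwork_url!r}")
    return new_url

print(artwork_at_size("https://i1.sndcdn.com/artworks-000067273316-smsiqx-large.jpg"))
# https://i1.sndcdn.com/artworks-000067273316-smsiqx-t500x500.jpg
```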

**Regex must use `re.DOTALL`** — the `__sc_hydration` JSON spans multiple lines. Without `re.DOTALL`, the `.` in the regex won't match newlines.

**Stream URLs (media.transcodings) are gated** — the HLS/progressive audio stream URLs in `track['media']['transcodings']` require an OAuth token even to fetch a stream manifest. They cannot be played without a logged-in session.

**Bulk track lookup returns a list, not a collection** — `GET /tracks?ids=...` returns a JSON array directly. Do NOT look for `.get('collection')`.

**Search `total_results` can be huge** — totals of 5M+ are normal for broad queries. Use `next_href` for pagination; do not calculate offsets manually.

**oEmbed description contains HTML** — SoundCloud descriptions (the oEmbed `description` and the hydration/API `description` field alike) may include `&nbsp;` and anchor tags. `html.unescape()` decodes the entities but leaves the tags in place, so strip the tags as well if you need plain text.
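A minimal way to get plain text is to let the standard-library HTML parser drop the tags while it decodes entities. Below is a sketch. `description_to_text` is a hypothetical helper that keeps link text but discards link targets, and the sample string is made up.

```python
from html.parser import HTMLParser

class _TextCollector(HTMLParser):
    """Accumulates text content; with convert_charrefs=True (the default) entities arrive decoded."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

def description_to_text(description):
    """Strip tags and decode entities such as &nbsp; from a SoundCloud description."""
    collector = _TextCollector()
    collector.feed(description or "")
    collector.close()
    # &nbsp; decodes to U+00A0; turn it into a normal space, then collapse runs of whitespace
    text = "".join(collector.parts).replace("\xa0", " ")
    return " ".join(text.split())

print(description_to_text('From the&nbsp;Soulhack album. <a href="https://example.com">More info</a>'))
# From the Soulhack album. More info
```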

**HTTP 400 on some endpoints** — `/tracks/{id}/comments` returns 400 without OAuth headers. Timed comments are not accessible without login.

**No browser required** — all documented approaches work with plain `http_get`. SoundCloud does not require JavaScript rendering for metadata extraction.

**Rate limits** — 20 rapid sequential API v2 requests completed without errors in testing. SoundCloud does not publish official rate limits, and nothing beyond that burst was measured. Treat ~50 req/s as a rough upper bound for sustained scraping, not a tested limit. oEmbed appeared more lenient than api-v2.
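For sustained scraping, a simple client-side throttle keeps request spacing below whatever ceiling you choose. Below is a sketch. `Throttle` is a hypothetical class meant for single-threaded use. Its 10 req/s default is an arbitrary conservative value, not a SoundCloud limit. Call `wait()` before each `http_get`.

```python
import time

class Throttle:
    """Enforce a minimum interval between successive calls to wait()."""

    def __init__(self, max_per_second=10.0):
        if max_per_second <= 0:
            raise ValueError("max_per_second must be positive")
        self.min_interval = 1.0 / max_per_second
        self._last = None  # monotonic timestamp of the previous call

    def wait(self):
        now = time.monotonic()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
                now = time.monotonic()
        self._last = now

throttle = Throttle(max_per_second=20)
start = time.monotonic()
for _ in range(5):
    throttle.wait()  # an http_get(...) call would follow here
elapsed = time.monotonic() - start  # roughly 0.2s: four enforced gaps of 0.05s
```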

---

## Quick Reference

| Goal | Approach | Auth |
|------|----------|------|
| Track title/author/thumbnail from URL | oEmbed | None |
| Full track metadata + play counts | `__sc_hydration` `sound` key | None |
| Full user profile + stats | `__sc_hydration` `user` key | None |
| Full playlist with all tracks | `__sc_hydration` `playlist` key | None |
| Search tracks/users/playlists | API v2 `/search/*` | client_id |
| Trending charts | API v2 `/charts` | client_id |
| Bulk track lookup by IDs | API v2 `/tracks?ids=` | client_id |
| User's track list | API v2 `/users/{id}/tracks` | client_id |
| Resolve permalink to resource | API v2 `/resolve?url=` | client_id |
| Waveform amplitude data | Direct fetch of `waveform_url` | None |
| Audio stream playback | OAuth login required | Login |