@pencil-agent/nano-pencil 2.0.0 → 2.0.1
- package/README.md +267 -267
- package/dist/build-meta.json +3 -3
- package/dist/core/export-html/AGENT.md +11 -11
- package/dist/core/export-html/template.css +971 -971
- package/dist/core/export-html/template.html +54 -54
- package/dist/core/mcp/mcp-client.d.ts +3 -1
- package/dist/core/mcp/mcp-client.js +6 -6
- package/dist/core/mcp/mcp-config.d.ts +3 -3
- package/dist/core/mcp/mcp-config.js +1 -1
- package/dist/core/mcp/mcp-manager.d.ts +5 -1
- package/dist/core/mcp/mcp-manager.js +1 -1
- package/dist/core/platform/config/resource-loader.d.ts +2 -0
- package/dist/core/platform/config/resource-loader.js +2 -2
- package/dist/core/runtime/agent-session.d.ts +12 -0
- package/dist/core/runtime/agent-session.js +8 -8
- package/dist/core/runtime/sdk.d.ts +8 -0
- package/dist/core/runtime/sdk.js +1 -1
- package/dist/extensions/builtin/AGENT.md +115 -115
- package/dist/extensions/builtin/browser/AGENT.md +17 -17
- package/dist/extensions/builtin/browser/agent-workspace/agent_helpers.py +12 -12
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/amazon/product-search.md +198 -198
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/archive-org/scraping.md +341 -341
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/arxiv/scraping.md +311 -311
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/arxiv-bulk/scraping.md +333 -333
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/atlas/overview.md +70 -70
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/booking-com/scraping.md +578 -578
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/capterra/scraping.md +440 -440
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/centilebrain/generate-estimates.md +110 -110
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/coingecko/scraping.md +325 -325
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/coinmarketcap/scraping.md +463 -463
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/coursera/scraping.md +360 -360
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/craigslist/scraping.md +390 -390
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/crossref/scraping.md +568 -568
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/dev-to/scraping.md +323 -323
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/duckduckgo/scraping.md +349 -349
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/ebay/scraping.md +435 -435
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/etsy/scraping.md +506 -506
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/eventbrite/scraping.md +363 -363
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/expedia/automation.md +168 -168
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/facebook/groups.md +236 -236
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/facebook/pages.md +295 -295
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/framer/editor.md +108 -108
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/fred/scraping.md +493 -493
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/g2/scraping.md +580 -580
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/genius/scraping.md +511 -511
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/github/repo-actions.md +65 -65
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/github/scraping.md +184 -184
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/glassdoor/scraping.md +543 -543
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/gmail/compose.md +122 -122
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/goodreads/scraping.md +461 -461
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/gutenberg/scraping.md +383 -383
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/hackernews/scraping.md +243 -243
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/howlongtobeat/scraping.md +473 -473
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/imdb/scraping.md +271 -271
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/itch-io/scraping.md +436 -436
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/job-boards/indeed-glassdoor.md +1021 -1021
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/letterboxd/scraping.md +349 -349
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/linkedin/invitation-manager.md +109 -109
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/loom/folder-enumeration.md +170 -170
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/macrotrends/scraping.md +537 -537
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/medium/article-hydration.md +120 -120
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/medium/scraping.md +414 -414
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/metacritic/scraping.md +477 -477
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/musicbrainz/scraping.md +478 -478
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/nasa/scraping.md +339 -339
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/news-aggregation/multi-source.md +205 -205
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/open-library/scraping.md +472 -472
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/openalex/scraping.md +470 -470
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/openstreetmap/scraping.md +490 -490
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/package-registries/npm-pypi.md +478 -478
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/polymarket/scraping.md +234 -234
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/producthunt/scraping.md +307 -307
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/pubmed/scraping.md +421 -421
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/quora/scraping.md +364 -364
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/rawg/scraping.md +352 -352
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/reddit/scraping.md +124 -124
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/rest-countries/scraping.md +233 -233
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/sec-edgar/scraping.md +361 -361
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/README.md +36 -36
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/embedded-apps.md +72 -72
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/knowledge-base.md +109 -109
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/polaris-inputs.md +137 -137
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/soundcloud/scraping.md +362 -362
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/spotify/scraping.md +339 -339
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/stackoverflow/scraping.md +435 -435
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/steam/scraping.md +575 -575
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/substack/scraping.md +338 -338
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/thetechgeeks/pricing.md +52 -52
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/tiktok/upload.md +107 -107
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/tradingview/scraping.md +309 -309
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/trello/boards-and-lists.md +88 -88
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/trustpilot/scraping.md +375 -375
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/walmart/scraping.md +444 -444
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/wayback-machine/scraping.md +306 -306
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/weather/scraping.md +398 -398
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/wellfound/scraping.md +596 -596
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/world-bank/scraping.md +356 -356
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/xiaohongshu/scraping.md +84 -84
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/youtube/scraping.md +418 -418
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/zillow/scraping.md +433 -433
- package/dist/extensions/builtin/browser/browser.md +73 -73
- package/dist/extensions/builtin/browser/install.md +142 -142
- package/dist/extensions/builtin/browser/interaction-skills/connection.md +48 -48
- package/dist/extensions/builtin/browser/interaction-skills/cookies.md +3 -3
- package/dist/extensions/builtin/browser/interaction-skills/cross-origin-iframes.md +3 -3
- package/dist/extensions/builtin/browser/interaction-skills/dialogs.md +64 -64
- package/dist/extensions/builtin/browser/interaction-skills/downloads.md +3 -3
- package/dist/extensions/builtin/browser/interaction-skills/drag-and-drop.md +3 -3
- package/dist/extensions/builtin/browser/interaction-skills/dropdowns.md +3 -3
- package/dist/extensions/builtin/browser/interaction-skills/iframes.md +3 -3
- package/dist/extensions/builtin/browser/interaction-skills/network-requests.md +3 -3
- package/dist/extensions/builtin/browser/interaction-skills/print-as-pdf.md +3 -3
- package/dist/extensions/builtin/browser/interaction-skills/profile-sync.md +90 -90
- package/dist/extensions/builtin/browser/interaction-skills/screenshots.md +17 -17
- package/dist/extensions/builtin/browser/interaction-skills/scrolling.md +3 -3
- package/dist/extensions/builtin/browser/interaction-skills/shadow-dom.md +3 -3
- package/dist/extensions/builtin/browser/interaction-skills/tabs.md +69 -69
- package/dist/extensions/builtin/browser/interaction-skills/uploads.md +1 -1
- package/dist/extensions/builtin/browser/interaction-skills/viewport.md +3 -3
- package/dist/extensions/builtin/browser/src/browser_harness/AGENT.md +15 -15
- package/dist/extensions/builtin/browser/src/browser_harness/__init__.py +8 -8
- package/dist/extensions/builtin/browser/src/browser_harness/_ipc.py +90 -90
- package/dist/extensions/builtin/browser/src/browser_harness/admin.py +722 -722
- package/dist/extensions/builtin/browser/src/browser_harness/daemon.py +328 -328
- package/dist/extensions/builtin/browser/src/browser_harness/helpers.py +396 -396
- package/dist/extensions/builtin/browser/src/browser_harness/run.py +103 -103
- package/dist/extensions/builtin/discipline/skills/brainstorming/SKILL.md +33 -33
- package/dist/extensions/builtin/discipline/skills/executing-plans/SKILL.md +25 -25
- package/dist/extensions/builtin/discipline/skills/finishing-development-branch/SKILL.md +25 -25
- package/dist/extensions/builtin/discipline/skills/receiving-code-review/SKILL.md +22 -22
- package/dist/extensions/builtin/discipline/skills/requesting-code-review/SKILL.md +31 -31
- package/dist/extensions/builtin/discipline/skills/systematic-debugging/SKILL.md +28 -28
- package/dist/extensions/builtin/discipline/skills/test-driven-development/SKILL.md +32 -32
- package/dist/extensions/builtin/discipline/skills/using-git-worktrees/SKILL.md +25 -25
- package/dist/extensions/builtin/discipline/skills/verification-before-completion/SKILL.md +27 -27
- package/dist/extensions/builtin/discipline/skills/writing-plans/SKILL.md +26 -26
- package/dist/extensions/builtin/goal/README.md +67 -67
- package/dist/extensions/builtin/grub/README.md +112 -112
- package/dist/extensions/builtin/link-world/agent-workspace/README.md +16 -16
- package/dist/extensions/builtin/link-world/internet-search/internet-search.md +65 -65
- package/dist/extensions/builtin/link-world/link-world-agent.md +82 -82
- package/dist/extensions/builtin/link-world/linkworld.md +313 -313
- package/dist/extensions/builtin/link-world/network-routing/network-routing.md +67 -67
- package/dist/extensions/builtin/loop/README.md +92 -92
- package/dist/extensions/builtin/mcp/figma-design.md +68 -68
- package/dist/extensions/builtin/mcp/mcp-management.md +85 -85
- package/dist/extensions/builtin/recap/AGENT.md +15 -15
- package/dist/extensions/builtin/sal/README.md +72 -72
- package/dist/extensions/builtin/security-audit/README.md +289 -289
- package/dist/extensions/builtin/team/AGENT.md +112 -112
- package/dist/extensions/builtin/team/TESTING.md +299 -299
- package/dist/extensions/builtin/token-save/README.md +56 -56
- package/dist/extensions/optional/AGENT.md +10 -10
- package/dist/modes/interactive/interactive-mode.js +36 -36
- package/dist/modes/interactive/theme/dark.json +85 -85
- package/dist/modes/interactive/theme/light.json +84 -84
- package/dist/modes/interactive/theme/theme-schema.json +335 -335
- package/dist/modes/interactive/theme/warm.json +81 -81
- package/dist/node_modules/@pencil-agent/agent-core/dist/agent-loop.js +3 -2
- package/dist/node_modules/@pencil-agent/agent-core/dist/structured-adaptive-agent-loop.js +2 -1
- package/dist/node_modules/@pencil-agent/ai/dist/cli.js +0 -0
- package/docs/cc-agent-design.md +1297 -0
- package/docs/cc-tui-design.md +1333 -0
- package/docs/codex-goal-command-impl.md +1055 -1055
- package/docs/codex-goal-vs-grub.md +500 -500
- package/docs/custom-provider.md +27 -27
- package/docs/extensions.md +27 -27
- package/docs/keybindings.md +27 -27
- package/docs/loop 重构完成总结.md +250 -250
- package/docs/loop 重构完成报告.md +122 -122
- package/docs/loop 重构方案.md +1222 -1222
- package/docs/loop 重构方案实现报告.md +158 -158
- package/docs/loop 重构方案对比分析.md +128 -128
- package/docs/loop 重构计划.md +320 -320
- package/docs/loop-usage-examples.md +214 -214
- package/docs/models.md +27 -27
- package/docs/nanoPencil-学习计划.md +170 -0
- package/docs/packages.md +27 -27
- package/docs/pi-design-philosophy.md +457 -457
- package/docs/planmode.md +1987 -1987
- package/docs/prompt-templates.md +27 -27
- package/docs/providers.md +27 -27
- package/docs/scan-report.md +3820 -0
- package/docs/sdk.md +27 -27
- package/docs/skills.md +27 -27
- package/docs/themes.md +27 -27
- package/docs/tui.md +27 -27
- package/docs//345/257/271/346/240/207Claude-Code.md +1775 -0
- package/docs//351/230/277/351/207/214/345/267/264/345/267/264/350/264/242/346/212/245/345/210/206/346/236/220/344/271/246.md +261 -0
- package/package.json +190 -190
- package/docs/ACP/345/215/217/350/256/256/351/233/206/346/210/220/345/274/200/345/217/221/346/226/207/346/241/243.md +0 -851
- package/docs/SDK-TESTING.md +0 -364
- package/docs/mem-core/346/212/200/346/234/257/346/226/207/346/241/243.md +0 -593
- package/docs/startup-performance-optimization.md +0 -301
- package/docs/认知地图.md +0 -47
package/dist/extensions/builtin/browser/agent-workspace/domain-skills/soundcloud/scraping.md
CHANGED
|
@@ -1,362 +1,362 @@
|
|
|
1
|
-
# SoundCloud — Data Extraction
|
|
2
|
-
|
|
3
|
-
Field-tested against soundcloud.com on 2026-04-18.
|
|
4
|
-
No authentication required for any approach documented here. All code uses `http_get` (pure HTTP, no browser).
|
|
5
|
-
|
|
6
|
-
---
|
|
7
|
-
|
|
8
|
-
## Approach 1 (Fastest): oEmbed API — No Auth, No Client ID
|
|
9
|
-
|
|
10
|
-
`https://soundcloud.com/oembed?url=<resource_url>&format=json`
|
|
11
|
-
|
|
12
|
-
Returns JSON in ~0.3s. Works for **tracks, playlists/sets, and user profiles**. No key required.
|
|
13
|
-
|
|
14
|
-
```python
from helpers import http_get
from urllib.parse import quote
import json

def soundcloud_oembed(resource_url):
    """Fetch oEmbed metadata for any public SoundCloud URL.

    Works for:
      - https://soundcloud.com/{user}/{track-slug}
      - https://soundcloud.com/{user}/sets/{playlist-slug}
      - https://soundcloud.com/{user}
    """
    # Percent-encode the resource URL so it survives as a single query parameter.
    encoded = quote(resource_url, safe="")
    url = f"https://soundcloud.com/oembed?url={encoded}&format=json"
    return json.loads(http_get(url))

# Track
track = soundcloud_oembed("https://soundcloud.com/forss/flickermood")
# {
#   "version": 1.0,
#   "type": "rich",
#   "provider_name": "SoundCloud",
#   "provider_url": "https://soundcloud.com",
#   "height": 400,
#   "width": "100%",
#   "title": "Flickermood by Forss",
#   "description": "From the Soulhack album...",
#   "thumbnail_url": "https://i1.sndcdn.com/artworks-000067273316-smsiqx-t500x500.jpg",
#   "html": "<iframe width=\"100%\" height=\"400\" scrolling=\"no\" frameborder=\"no\" src=\"https://w.soundcloud.com/player/?visual=true&url=...\">",
#   "author_name": "Forss",
#   "author_url": "https://soundcloud.com/forss"
# }

# Playlist/set
pl = soundcloud_oembed("https://soundcloud.com/forss/sets/soulhack")
# title="Soulhack by Forss", description="My 2003 debut album...", height=450

# User profile
user = soundcloud_oembed("https://soundcloud.com/forss")
# title="Forss", description="Artist & Founder SoundCloud", height=450
```
|
|
54
|
-
|
|
55
|
-
### oEmbed fields
|
|
56
|
-
|
|
57
|
-
| Field | Type | Notes |
|-------|------|-------|
| `title` | str | "{Track Title} by {Artist}" for tracks, "{Name}" for users |
| `author_name` | str | Artist/user display name |
| `author_url` | str | Profile URL |
| `thumbnail_url` | str | Artwork at 500×500px (t500x500) |
| `description` | str | Track/profile description (may contain HTML entities) |
| `html` | str | Embed iframe for the SoundCloud player widget |
| `height` | int | 400 for tracks, 450 for playlists and users |
| `width` | str | Always `"100%"` |
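
Because track titles come back as "{Track Title} by {Artist}", the bare track title can be recovered by stripping the `author_name` suffix. The helper below is a minimal sketch; `split_oembed_title` is a hypothetical name, and it assumes the suffix always has the exact form ` by {author_name}`, which the table suggests but does not guarantee.

```python
def split_oembed_title(oembed):
    """Return (title_without_artist, artist) from an oEmbed response dict."""
    title = oembed.get("title", "")
    author = oembed.get("author_name", "")
    suffix = f" by {author}"
    # Only strip when the title really ends with " by <author_name>".
    if author and title.endswith(suffix):
        return title[: -len(suffix)], author
    return title, author

# Hand-written sample using the values documented above.
sample = {"title": "Flickermood by Forss", "author_name": "Forss"}
print(split_oembed_title(sample))
```

For user profiles the title is just the name, so the helper returns it unchanged.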
|
|
67
|
-
|
|
68
|
-
---
|
|
69
|
-
|
|
70
|
-
## Approach 2: Page Hydration (`__sc_hydration`) — Rich Metadata, No Client ID
|
|
71
|
-
|
|
72
|
-
Every SoundCloud page embeds a JSON array in a `<script>` tag as `window.__sc_hydration`. This contains full API-grade metadata with no key required.
|
|
73
|
-
|
|
74
|
-
```python
from helpers import http_get
import json, re

def extract_hydration(page_url):
    """Extract __sc_hydration JSON from any SoundCloud page."""
    html = http_get(page_url)
    match = re.search(r'window\.__sc_hydration\s*=\s*(\[.*?\]);\s*<', html, re.DOTALL)
    if not match:
        return []
    return json.loads(match.group(1))

def get_hydration_by_type(page_url, hydratable):
    """Get the 'data' dict for a specific hydratable type."""
    for obj in extract_hydration(page_url):
        if obj.get('hydratable') == hydratable:
            return obj.get('data')
    return None

# Track page: hydration key is 'sound'
track = get_hydration_by_type("https://soundcloud.com/forss/flickermood", "sound")
# track['id']             = 293
# track['title']          = "Flickermood"
# track['playback_count'] = 962685
# track['likes_count']    = 2592
# track['duration']       = 213886  (milliseconds)
# track['genre']          = "Electronic"
# track['created_at']     = "2007-09-22T14:45:46Z"
# track['artwork_url']    = "https://i1.sndcdn.com/artworks-000067273316-smsiqx-large.jpg"
# track['waveform_url']   = "https://wave.sndcdn.com/cWHNerOLlkUq_m.json"
# track['streamable']     = True
# track['downloadable']   = True
# track['license']        = "all-rights-reserved"
# track['tag_list']       = "downtempo"
# track['urn']            = "soundcloud:tracks:293"
# track['media']          = {'transcodings': [...]}  (HLS/progressive stream URLs; need auth)
# track['user']           = {full user object nested}

# User page: hydration key is 'user'
user = get_hydration_by_type("https://soundcloud.com/forss", "user")
# user['id']                   = 183
# user['username']             = "Forss"
# user['full_name']            = "Eric Quidenus-Wahlforss"
# user['followers_count']      = 132203
# user['track_count']          = 26
# user['verified']             = True
# user['city']                 = "Berlin"
# user['country_code']         = "DE"
# user['description']          = "Artist & Founder SoundCloud"
# user['creator_subscription'] = {'product': {'id': 'creator-pro-unlimited'}}
# user['badges']               = {'pro_unlimited': True, 'verified': True}

# Playlist/set page: hydration key is 'playlist'
playlist = get_hydration_by_type("https://soundcloud.com/forss/sets/soulhack", "playlist")
# playlist['id']          = 18
# playlist['title']       = "Soulhack"
# playlist['track_count'] = 11
# playlist['tracks']      = [full track objects list]
# playlist['is_album']    = True/False
# playlist['genre']       = "Electronic"
```
|
|
135
|
-
|
|
136
|
-
### All hydration keys on a typical page
|
|
137
|
-
|
|
138
|
-
| `hydratable` | Content |
|---|---|
| `sound` | Full track object (on track pages) |
| `playlist` | Full playlist + all tracks (on set pages) |
| `user` | Full user object (on any page with a profile) |
| `apiClient` | `{'id': '<client_id>', 'isExpiring': False}` (the client_id) |
| `geoip` | Viewer country/city/coordinates |
| `features` | Feature flags dict |
| `anonymousId` | Session tracking ID (not useful) |
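
When several of these keys are needed from the same page, it can be convenient to index the hydration array once instead of scanning it per key. This is a minimal sketch: `index_hydration` is a hypothetical helper, and the sample entries are hand-written placeholders shaped like the table above, not captured output.

```python
def index_hydration(entries):
    """Map each 'hydratable' key to its 'data' payload."""
    return {e["hydratable"]: e.get("data") for e in entries if "hydratable" in e}

# Placeholder entries; a real page carries many more fields per object.
sample = [
    {"hydratable": "apiClient", "data": {"id": "<client_id>", "isExpiring": False}},
    {"hydratable": "sound", "data": {"id": 293, "title": "Flickermood"}},
]
by_key = index_hydration(sample)
print(by_key["sound"]["title"])
```

If a key appears more than once, the last entry wins, which is an arbitrary choice made for this sketch.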
|
|
147
|
-
|
|
148
|
-
---
|
|
149
|
-
|
|
150
|
-
## Approach 3: API v2 — Full Query Power (Requires Client ID)
|
|
151
|
-
|
|
152
|
-
The `client_id` lives in every page's `__sc_hydration` under the `apiClient` key. It was **the same across all pages and sessions** in testing, so extract it once per run and reuse it.
|
|
153
|
-
|
|
154
|
-
```python
from helpers import http_get
from urllib.parse import urlencode
import json, re

def get_client_id(page_url="https://soundcloud.com"):
    """Extract client_id from any SoundCloud page's __sc_hydration."""
    html = http_get(page_url)
    match = re.search(r'window\.__sc_hydration\s*=\s*(\[.*?\]);\s*<', html, re.DOTALL)
    if not match:
        raise ValueError("No hydration found")
    for obj in json.loads(match.group(1)):
        if obj.get('hydratable') == 'apiClient':
            return obj['data']['id']
    raise ValueError("apiClient not found in hydration")

CLIENT_ID = get_client_id()  # "efg2kjLJnAJpInbN6P3hsHzispI1SKQH" (example; extract fresh)

def sc_api(path, **params):
    """Call api-v2.soundcloud.com. Returns parsed JSON."""
    params['client_id'] = CLIENT_ID
    # urlencode percent-escapes values such as permalink URLs and multi-word queries.
    qs = urlencode(params)
    return json.loads(http_get(f"https://api-v2.soundcloud.com/{path}?{qs}"))
```
|
|
177
|
-
|
|
178
|
-
### Resolve any URL to a resource
|
|
179
|
-
|
|
180
|
-
```python
# Resolve a permalink URL to get its resource with full metadata
track = sc_api("resolve", url="https://soundcloud.com/forss/flickermood")
# Returns: {'kind': 'track', 'id': 293, 'title': 'Flickermood', ...}

user = sc_api("resolve", url="https://soundcloud.com/forss")
# Returns: {'kind': 'user', 'id': 183, 'username': 'Forss', ...}
```
|
|
188
|
-
|
|
189
|
-
### Track lookup
|
|
190
|
-
|
|
191
|
-
```python
# Single track by numeric ID
track = sc_api("tracks/293")

# Bulk track lookup (comma-separated IDs; returns list)
tracks = sc_api("tracks", ids="293,290,48031525")
# Returns a JSON array directly (not wrapped in 'collection')
for t in tracks:
    print(t['id'], t['title'], t['playback_count'])
```
|
|
201
|
-
|
|
202
|
-
### Search
|
|
203
|
-
|
|
204
|
-
```python
# Tracks
results = sc_api("search/tracks", q="jazz", limit=20)
# results['collection']    = list of track objects
# results['total_results'] = 5293248
# results['next_href']     = pagination URL (see below)

# Users
results = sc_api("search/users", q="jazz", limit=10)

# Playlists/sets
results = sc_api("search/playlists", q="jazz", limit=10)

# Paginate with next_href
def paginate(first_response):
    """Yield all pages of a collection response."""
    yield from first_response.get('collection', [])
    next_href = first_response.get('next_href')
    while next_href:
        page = json.loads(http_get(f"{next_href}&client_id={CLIENT_ID}"))
        yield from page.get('collection', [])
        next_href = page.get('next_href')
```
|
|
227
|
-
|
|
228
|
-
### Trending charts
|
|
229
|
-
|
|
230
|
-
```python
# Trending tracks across all genres
trending = sc_api("charts", kind="trending",
                  genre="soundcloud:genres:all-music", limit=20)
for item in trending['collection']:
    t = item['track']
    print(f"{t['title']} - score={item['score']:.4f}")

# Genre options: soundcloud:genres:all-music, soundcloud:genres:electronic,
#   soundcloud:genres:hiphoprap, soundcloud:genres:ambient, etc.
```
|
|
241
|
-
|
|
242
|
-
### User resources
|
|
243
|
-
|
|
244
|
-
```python
user_id = 183  # numeric ID from resolve or hydration

# User's tracks
tracks = sc_api(f"users/{user_id}/tracks", limit=20)
# tracks['collection'] = list of track objects

# User's playlists
playlists = sc_api(f"users/{user_id}/playlists", limit=10)

# User's likes
likes = sc_api(f"users/{user_id}/likes", limit=10)

# Related tracks for a track
related = sc_api("tracks/293/related", limit=10)
# related['collection'] = list of track objects
```
|
|
261
|
-
|
|
262
|
-
### Waveform data
|
|
263
|
-
|
|
264
|
-
```python
# Waveform URL comes from track['waveform_url']
waveform_url = "https://wave.sndcdn.com/cWHNerOLlkUq_m.json"
waveform = json.loads(http_get(waveform_url))
# {
#   'width': 1800,                     # number of sample points
#   'height': 140,                     # max amplitude value
#   'samples': [11, 86, 91, 80, ...]   # 1800 amplitude values
# }
```
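
Since `height` is described as the maximum amplitude value, the samples can be scaled to a 0.0 to 1.0 range before plotting or comparing tracks. The following is a minimal sketch under that assumption; `normalize_waveform` is a hypothetical helper and the sample dict is hand-written, not real waveform data.

```python
def normalize_waveform(waveform):
    """Scale samples to the 0.0-1.0 range using 'height' as the maximum."""
    height = waveform["height"]
    if height <= 0:
        raise ValueError("waveform height must be positive")
    return [sample / height for sample in waveform["samples"]]

# Tiny hand-written waveform; real responses carry 1800 samples.
sample = {"width": 4, "height": 140, "samples": [0, 35, 70, 140]}
print(normalize_waveform(sample))
```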
|
|
274
|
-
|
|
275
|
-
---
|
|
276
|
-
|
|
277
|
-
## Full track fields from `__sc_hydration` / API v2
|
|
278
|
-
|
|
279
|
-
```
id                  int   Numeric track ID (e.g. 293)
urn                 str   "soundcloud:tracks:293"
title               str   Track title
description         str   May contain HTML entities/tags
genre               str   Genre string
tag_list            str   Space-separated tags
created_at          str   ISO 8601 UTC
last_modified       str   ISO 8601 UTC
release_date        str   ISO 8601 UTC (original release)
display_date        str   ISO 8601 UTC (shown to users)
duration            int   Milliseconds
full_duration       int   Milliseconds (untruncated)
playback_count      int
likes_count         int
reposts_count       int
comment_count       int
download_count      int
artwork_url         str   e.g. .../artworks-...-large.jpg (replace 'large' with 't500x500' for 500px)
waveform_url        str   https://wave.sndcdn.com/....json
permalink           str   Slug (e.g. "flickermood")
permalink_url       str   Full canonical URL
streamable          bool
downloadable        bool
license             str   e.g. "all-rights-reserved", "cc-by"
sharing             str   "public" or "private"
state               str   "finished" | "processing" | "failed"
monetization_model  str   "AD_SUPPORTED" | "SUB_HIGH_TIER" | "NOT_APPLICABLE"
embeddable_by       str   "all" | "me" | "none"
user                dict  Nested user object (id, username, avatar_url, verified, ...)
user_id             int   Owner numeric ID
publisher_metadata  dict  {artist, publisher, isrc, contains_music, ...}
media               dict  {'transcodings': [...]}; stream URLs (require OAuth, not usable without login)
label_name          str   Record label
purchase_url        str   External buy link
station_urn         str   "soundcloud:system-playlists:track-stations:{id}"
```
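
Two of these fields usually need light post-processing: `duration` is in milliseconds and `tag_list` is a single string. The sketch below shows one way to handle both. The helper names are hypothetical, and treating double-quoted runs in `tag_list` as multi-word tags is an assumption that goes beyond the "space-separated" note above.

```python
import re

def format_duration(ms):
    """Render a millisecond duration as M:SS."""
    minutes, seconds = divmod(ms // 1000, 60)
    return f"{minutes}:{seconds:02d}"

def split_tags(tag_list):
    """Split on whitespace, keeping double-quoted multi-word tags whole (assumed convention)."""
    pairs = re.findall(r'"([^"]*)"|(\S+)', tag_list)
    return [quoted or bare for quoted, bare in pairs]

print(format_duration(213886))
print(split_tags('downtempo "deep house"'))
```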
|
|
316
|
-
|
|
317
|
-
---
|
|
318
|
-
|
|
319
|
-
## Gotchas
|
|
320
|
-
|
|
321
|
-
**client_id is required for api-v2.soundcloud.com**: requests without it return HTTP 401. Always extract it from the `apiClient` entry of `__sc_hydration` (the `id` field of that entry's `data` object).
|
|
322
|
-
|
|
323
|
-
**client_id source: hydration, not JS bundles** — the JS bundles on `a-v2.sndcdn.com` do NOT contain the `client_id` pattern. The only reliable source is the `apiClient` object in the page hydration. It is stable across all pages (same value from homepage, track pages, user pages) and does not appear to rotate on short timescales.
|
|
324
|
-
|
|
325
|
-
**Artwork URL sizes** — hydration/API returns `...-large.jpg` (100×100). Replace the size suffix to get larger images:
|
|
326
|
-
- `-large.jpg` → 100×100
|
|
327
|
-
- `-t300x300.jpg` → 300×300
|
|
328
|
-
- `-t500x500.jpg` → 500×500 (oEmbed returns this size)
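
The suffix swap can be wrapped in a small helper. This is a minimal sketch: `artwork_at_size` is a hypothetical name, and it only recognises the size tokens listed above (`large` and `t{W}x{H}`); other tokens are left untouched.

```python
import re

def artwork_at_size(artwork_url, size="t500x500"):
    """Swap the trailing size token (e.g. 'large') for another one."""
    return re.sub(
        r"-(large|t\d+x\d+)(\.\w+)$",
        lambda m: f"-{size}{m.group(2)}",  # keep the original file extension
        artwork_url,
    )

url = "https://i1.sndcdn.com/artworks-000067273316-smsiqx-large.jpg"
print(artwork_at_size(url))
print(artwork_at_size(url, "t300x300"))
```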
|
|
329
|
-
|
|
330
|
-
**Regex must use `re.DOTALL`** — the `__sc_hydration` JSON spans multiple lines. Without `re.DOTALL`, the `.` in the regex won't match newlines.
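
The effect of the flag is easy to check offline. The snippet below runs the same pattern against a hand-written, multi-line stand-in for a page (not real SoundCloud markup) with and without `re.DOTALL`.

```python
import json
import re

PATTERN = r'window\.__sc_hydration\s*=\s*(\[.*?\]);\s*<'

# Hand-written stand-in for a page; real pages are much larger.
page = (
    '<script>window.__sc_hydration = [\n'
    '  {"hydratable": "sound", "data": {"id": 293}}\n'
    '];</script>'
)

without_flag = re.search(PATTERN, page)          # '.' stops at the first newline
with_flag = re.search(PATTERN, page, re.DOTALL)  # '.' also matches newlines

print(without_flag is None)
print(json.loads(with_flag.group(1))[0]["hydratable"])
```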
|
|
331
|
-
|
|
332
|
-
**Stream URLs (media.transcodings) are gated** — the HLS/progressive audio stream URLs in `track['media']['transcodings']` require an OAuth token even to fetch a stream manifest. They cannot be played without a logged-in session.
|
|
333
|
-
|
|
334
|
-
**Bulk track lookup returns a list, not collection** — `GET /tracks?ids=...` returns a JSON array directly. Do NOT look for `.get('collection')`.
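
Code that handles both response shapes can normalise them first. This is a minimal sketch; `as_items` is a hypothetical helper, and the sample values are placeholders.

```python
def as_items(response):
    """Return the item list whether the response is a bare array or a collection wrapper."""
    if isinstance(response, list):
        return response
    return response.get("collection", [])

print(as_items([{"id": 293}]))                  # bulk lookup shape
print(as_items({"collection": [{"id": 293}]}))  # search / user-resource shape
```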
|
|
335
|
-
|
|
336
|
-
**Search `total_results` can be huge** — results like 5M+ are normal for broad queries. Use `next_href` for pagination; do not calculate offsets manually.
|
|
337
|
-
|
|
338
|
-
**oEmbed description contains HTML**: SoundCloud descriptions may include `&nbsp;` entities and anchor tags. `html.unescape()` decodes the entities; anchor tags have to be stripped separately if you need plain text.
|
|
339
|
-
|
|
340
|
-
**HTTP 400 on some endpoints** — `/tracks/{id}/comments` returns 400 without OAuth headers. Timed comments are not accessible without login.
|
|
341
|
-
|
|
342
|
-
**No browser required** — all documented approaches work with plain `http_get`. SoundCloud does not require JavaScript rendering for metadata extraction.
|
|
343
|
-
|
|
344
|
-
**Rate limits** — 20 rapid sequential API v2 requests completed without errors in testing. SoundCloud does not publish official rate limits; stay under ~50 req/s for sustained scraping. oEmbed is more lenient than api-v2.
|
|
345
|
-
|
|
346
|
-
---
|
|
347
|
-
|
|
348
|
-
## Quick Reference
|
|
349
|
-
|
|
350
|
-
| Goal | Approach | Auth |
|
|
351
|
-
|------|----------|------|
|
|
352
|
-
| Track title/author/thumbnail from URL | oEmbed | None |
|
|
353
|
-
| Full track metadata + play counts | `__sc_hydration` `sound` key | None |
|
|
354
|
-
| Full user profile + stats | `__sc_hydration` `user` key | None |
|
|
355
|
-
| Full playlist with all tracks | `__sc_hydration` `playlist` key | None |
|
|
356
|
-
| Search tracks/users/playlists | API v2 `/search/*` | client_id |
|
|
357
|
-
| Trending charts | API v2 `/charts` | client_id |
|
|
358
|
-
| Bulk track lookup by IDs | API v2 `/tracks?ids=` | client_id |
|
|
359
|
-
| User's track list | API v2 `/users/{id}/tracks` | client_id |
|
|
360
|
-
| Resolve permalink to resource | API v2 `/resolve?url=` | client_id |
|
|
361
|
-
| Waveform amplitude data | Direct fetch of `waveform_url` | None |
|
|
362
|
-
| Audio stream playback | OAuth login required | Login |
|
|
1
|
+
# SoundCloud — Data Extraction
|
|
2
|
+
|
|
3
|
+
Field-tested against soundcloud.com on 2026-04-18.
|
|
4
|
+
No login is required for any approach documented here; Approach 3 needs only a `client_id`, which is read from public pages. All code uses `http_get` (pure HTTP, no browser).
|
|
5
|
+
|
|
6
|
+
---
|
|
7
|
+
|
|
8
|
+
## Approach 1 (Fastest): oEmbed API — No Auth, No Client ID
|
|
9
|
+
|
|
10
|
+
`https://soundcloud.com/oembed?url=<resource_url>&format=json`
|
|
11
|
+
|
|
12
|
+
Returns JSON in ~0.3s. Works for **tracks, playlists/sets, and user profiles**. No key required.
|
|
13
|
+
|
|
14
|
+
```python
|
|
15
|
+
from helpers import http_get
|
|
16
|
+
import json, urllib.parse
|
|
17
|
+
|
|
18
|
+
def soundcloud_oembed(resource_url):
|
|
19
|
+
    """Fetch oEmbed metadata for any public SoundCloud URL.
|
|
20
|
+
|
|
21
|
+
    Works for:
|
|
22
|
+
    - https://soundcloud.com/{user}/{track-slug}
|
|
23
|
+
    - https://soundcloud.com/{user}/sets/{playlist-slug}
|
|
24
|
+
    - https://soundcloud.com/{user}
|
|
25
|
+
    """
|
|
26
|
+
    query = urllib.parse.urlencode({"url": resource_url, "format": "json"})  # percent-encode the target URL
|
|
27
|
+
    return json.loads(http_get(f"https://soundcloud.com/oembed?{query}"))
|
|
28
|
+
|
|
29
|
+
# Track
|
|
30
|
+
track = soundcloud_oembed("https://soundcloud.com/forss/flickermood")
|
|
31
|
+
# {
|
|
32
|
+
# "version": 1.0,
|
|
33
|
+
# "type": "rich",
|
|
34
|
+
# "provider_name": "SoundCloud",
|
|
35
|
+
# "provider_url": "https://soundcloud.com",
|
|
36
|
+
# "height": 400,
|
|
37
|
+
# "width": "100%",
|
|
38
|
+
# "title": "Flickermood by Forss",
|
|
39
|
+
# "description": "From the Soulhack album...",
|
|
40
|
+
# "thumbnail_url": "https://i1.sndcdn.com/artworks-000067273316-smsiqx-t500x500.jpg",
|
|
41
|
+
# "html": "<iframe width=\"100%\" height=\"400\" scrolling=\"no\" frameborder=\"no\" src=\"https://w.soundcloud.com/player/?visual=true&url=...\">",
|
|
42
|
+
# "author_name": "Forss",
|
|
43
|
+
# "author_url": "https://soundcloud.com/forss"
|
|
44
|
+
# }
|
|
45
|
+
|
|
46
|
+
# Playlist/set
|
|
47
|
+
pl = soundcloud_oembed("https://soundcloud.com/forss/sets/soulhack")
|
|
48
|
+
# title="Soulhack by Forss", description="My 2003 debut album...", height=450
|
|
49
|
+
|
|
50
|
+
# User profile
|
|
51
|
+
user = soundcloud_oembed("https://soundcloud.com/forss")
|
|
52
|
+
# title="Forss", description="Artist & Founder SoundCloud", height=450
|
|
53
|
+
```
|
|
54
|
+
|
|
55
|
+
### oEmbed fields
|
|
56
|
+
|
|
57
|
+
| Field | Type | Notes |
|
|
58
|
+
|-------|------|-------|
|
|
59
|
+
| `title` | str | "{Track Title} by {Artist}" for tracks, "{Name}" for users |
|
|
60
|
+
| `author_name` | str | Artist/user display name |
|
|
61
|
+
| `author_url` | str | Profile URL |
|
|
62
|
+
| `thumbnail_url` | str | Artwork at 500×500px (t500x500) |
|
|
63
|
+
| `description` | str | Track/profile description (may contain HTML entities) |
|
|
64
|
+
| `html` | str | Embed iframe for the SoundCloud player widget |
|
|
65
|
+
| `height` | int | 400 for tracks, 450 for playlists and users |
|
|
66
|
+
| `width` | str | Always `"100%"` |
|
|
67
|
+
|
|
68
|
+
---
|
|
69
|
+
|
|
70
|
+
## Approach 2: Page Hydration (`__sc_hydration`) — Rich Metadata, No Client ID
|
|
71
|
+
|
|
72
|
+
Every SoundCloud page embeds a JSON array in a `<script>` tag as `window.__sc_hydration`. This contains full API-grade metadata with no key required.
|
|
73
|
+
|
|
74
|
+
```python
|
|
75
|
+
from helpers import http_get
|
|
76
|
+
import json, re
|
|
77
|
+
|
|
78
|
+
def extract_hydration(page_url):
|
|
79
|
+
    """Extract __sc_hydration JSON from any SoundCloud page."""
|
|
80
|
+
    html = http_get(page_url)
|
|
81
|
+
    match = re.search(r'window\.__sc_hydration\s*=\s*(\[.*?\]);\s*<', html, re.DOTALL)
|
|
82
|
+
    if not match:
|
|
83
|
+
        return []
|
|
84
|
+
    return json.loads(match.group(1))
|
|
85
|
+
|
|
86
|
+
def get_hydration_by_type(page_url, hydratable):
|
|
87
|
+
    """Get the 'data' dict for a specific hydratable type."""
|
|
88
|
+
    for obj in extract_hydration(page_url):
|
|
89
|
+
        if obj.get('hydratable') == hydratable:
|
|
90
|
+
            return obj.get('data')
|
|
91
|
+
    return None
|
|
92
|
+
|
|
93
|
+
# Track page — hydration key is 'sound'
|
|
94
|
+
track = get_hydration_by_type("https://soundcloud.com/forss/flickermood", "sound")
|
|
95
|
+
# track['id'] = 293
|
|
96
|
+
# track['title'] = "Flickermood"
|
|
97
|
+
# track['playback_count'] = 962685
|
|
98
|
+
# track['likes_count'] = 2592
|
|
99
|
+
# track['duration'] = 213886 (milliseconds)
|
|
100
|
+
# track['genre'] = "Electronic"
|
|
101
|
+
# track['created_at'] = "2007-09-22T14:45:46Z"
|
|
102
|
+
# track['artwork_url'] = "https://i1.sndcdn.com/artworks-000067273316-smsiqx-large.jpg"
|
|
103
|
+
# track['waveform_url'] = "https://wave.sndcdn.com/cWHNerOLlkUq_m.json"
|
|
104
|
+
# track['streamable'] = True
|
|
105
|
+
# track['downloadable'] = True
|
|
106
|
+
# track['license'] = "all-rights-reserved"
|
|
107
|
+
# track['tag_list'] = "downtempo"
|
|
108
|
+
# track['urn'] = "soundcloud:tracks:293"
|
|
109
|
+
# track['media'] = {'transcodings': [...]} (HLS/progressive stream URLs — need auth)
|
|
110
|
+
# track['user'] = {full user object nested}
|
|
111
|
+
|
|
112
|
+
# User page — hydration key is 'user'
|
|
113
|
+
user = get_hydration_by_type("https://soundcloud.com/forss", "user")
|
|
114
|
+
# user['id'] = 183
|
|
115
|
+
# user['username'] = "Forss"
|
|
116
|
+
# user['full_name'] = "Eric Quidenus-Wahlforss"
|
|
117
|
+
# user['followers_count'] = 132203
|
|
118
|
+
# user['track_count'] = 26
|
|
119
|
+
# user['verified'] = True
|
|
120
|
+
# user['city'] = "Berlin"
|
|
121
|
+
# user['country_code'] = "DE"
|
|
122
|
+
# user['description'] = "Artist & Founder SoundCloud"
|
|
123
|
+
# user['creator_subscription'] = {'product': {'id': 'creator-pro-unlimited'}}
|
|
124
|
+
# user['badges'] = {'pro_unlimited': True, 'verified': True}
|
|
125
|
+
|
|
126
|
+
# Playlist/set page — hydration key is 'playlist'
|
|
127
|
+
playlist = get_hydration_by_type("https://soundcloud.com/forss/sets/soulhack", "playlist")
|
|
128
|
+
# playlist['id'] = 18
|
|
129
|
+
# playlist['title'] = "Soulhack"
|
|
130
|
+
# playlist['track_count'] = 11
|
|
131
|
+
# playlist['tracks'] = [full track objects list]
|
|
132
|
+
# playlist['is_album'] = True/False
|
|
133
|
+
# playlist['genre'] = "Electronic"
|
|
134
|
+
```
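The regex above stops at the first `];` that is followed by `<`, so a description that happens to contain that sequence would cut the match short. A minimal alternative sketch, assuming the payload is a plain JSON array assigned to `window.__sc_hydration`, lets `json.JSONDecoder.raw_decode` find the end of the array itself. `parse_hydration` is a hypothetical helper, not part of `helpers`; it takes HTML that was already fetched with `http_get`.

```python
import json

MARKER = "window.__sc_hydration"

def parse_hydration(html):
    """Return the __sc_hydration list embedded in html, or [] if it is absent."""
    start = html.find(MARKER)
    if start == -1:
        return []
    # The JSON array begins at the first '[' after the marker.
    bracket = html.find("[", start)
    if bracket == -1:
        return []
    try:
        # raw_decode parses one JSON value and ignores whatever follows it.
        data, _end = json.JSONDecoder().raw_decode(html, bracket)
    except json.JSONDecodeError:
        return []
    return data

def pick(hydration, hydratable):
    """Return the 'data' payload for one hydratable type, or None."""
    for obj in hydration:
        if obj.get("hydratable") == hydratable:
            return obj.get("data")
    return None
```

Because the JSON parser decides where the array ends, this variant does not depend on `re.DOTALL` or on what follows the closing bracket.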
|
|
135
|
+
|
|
136
|
+
### All hydration keys on a typical page
|
|
137
|
+
|
|
138
|
+
| `hydratable` | Content |
|
|
139
|
+
|---|---|
|
|
140
|
+
| `sound` | Full track object (on track pages) |
|
|
141
|
+
| `playlist` | Full playlist + all tracks (on set pages) |
|
|
142
|
+
| `user` | Full user object (on any page with a profile) |
|
|
143
|
+
| `apiClient` | `{'id': '<client_id>', 'isExpiring': False}` — the client_id |
|
|
144
|
+
| `geoip` | Viewer country/city/coordinates |
|
|
145
|
+
| `features` | Feature flags dict |
|
|
146
|
+
| `anonymousId` | Session tracking ID (not useful) |
|
|
147
|
+
|
|
148
|
+
---
|
|
149
|
+
|
|
150
|
+
## Approach 3: API v2 — Full Query Power (Requires Client ID)
|
|
151
|
+
|
|
152
|
+
The `client_id` lives in every page's `__sc_hydration` under the `apiClient` key. It is **stable across all pages and sessions** — extract once and reuse.
|
|
153
|
+
|
|
154
|
+
```python
|
|
155
|
+
from helpers import http_get
|
|
156
|
+
import json, re, urllib.parse
|
|
157
|
+
|
|
158
|
+
def get_client_id(page_url="https://soundcloud.com"):
|
|
159
|
+
    """Extract client_id from any SoundCloud page's __sc_hydration."""
|
|
160
|
+
    html = http_get(page_url)
|
|
161
|
+
    match = re.search(r'window\.__sc_hydration\s*=\s*(\[.*?\]);\s*<', html, re.DOTALL)
|
|
162
|
+
    if not match:
|
|
163
|
+
        raise ValueError("No hydration found")
|
|
164
|
+
    for obj in json.loads(match.group(1)):
|
|
165
|
+
        if obj.get('hydratable') == 'apiClient':
|
|
166
|
+
            return obj['data']['id']
|
|
167
|
+
    raise ValueError("apiClient not found in hydration")
|
|
168
|
+
|
|
169
|
+
CLIENT_ID = get_client_id() # "efg2kjLJnAJpInbN6P3hsHzispI1SKQH" (example; extract fresh)
|
|
170
|
+
|
|
171
|
+
def sc_api(path, **params):
|
|
172
|
+
    """Call api-v2.soundcloud.com. Returns parsed JSON."""
|
|
173
|
+
    params['client_id'] = CLIENT_ID
|
|
174
|
+
    qs = urllib.parse.urlencode(params)  # percent-encodes values such as URLs and queries with spaces
|
|
175
|
+
    return json.loads(http_get(f"https://api-v2.soundcloud.com/{path}?{qs}"))
|
|
176
|
+
```
|
|
177
|
+
|
|
178
|
+
### Resolve any URL to a resource
|
|
179
|
+
|
|
180
|
+
```python
|
|
181
|
+
# Resolve a permalink URL to get its resource with full metadata
|
|
182
|
+
track = sc_api("resolve", url="https://soundcloud.com/forss/flickermood")
|
|
183
|
+
# Returns: {'kind': 'track', 'id': 293, 'title': 'Flickermood', ...}
|
|
184
|
+
|
|
185
|
+
user = sc_api("resolve", url="https://soundcloud.com/forss")
|
|
186
|
+
# Returns: {'kind': 'user', 'id': 183, 'username': 'Forss', ...}
|
|
187
|
+
```
|
|
188
|
+
|
|
189
|
+
### Track lookup
|
|
190
|
+
|
|
191
|
+
```python
|
|
192
|
+
# Single track by numeric ID
|
|
193
|
+
track = sc_api("tracks/293")
|
|
194
|
+
|
|
195
|
+
# Bulk track lookup (comma-separated IDs — returns list)
|
|
196
|
+
tracks = sc_api("tracks", ids="293,290,48031525")
|
|
197
|
+
# Returns a JSON array directly (not wrapped in 'collection')
|
|
198
|
+
for t in tracks:
|
|
199
|
+
    print(t['id'], t['title'], t['playback_count'])
|
|
200
|
+
```
|
|
201
|
+
|
|
202
|
+
### Search
|
|
203
|
+
|
|
204
|
+
```python
|
|
205
|
+
# Tracks
|
|
206
|
+
results = sc_api("search/tracks", q="jazz", limit=20)
|
|
207
|
+
# results['collection'] = list of track objects
|
|
208
|
+
# results['total_results'] = 5293248
|
|
209
|
+
# results['next_href'] = pagination URL (see below)
|
|
210
|
+
|
|
211
|
+
# Users
|
|
212
|
+
results = sc_api("search/users", q="jazz", limit=10)
|
|
213
|
+
|
|
214
|
+
# Playlists/sets
|
|
215
|
+
results = sc_api("search/playlists", q="jazz", limit=10)
|
|
216
|
+
|
|
217
|
+
# Paginate with next_href
|
|
218
|
+
def paginate(first_response):
|
|
219
|
+
    """Yield all pages of a collection response."""
|
|
220
|
+
    yield from first_response.get('collection', [])
|
|
221
|
+
    next_href = first_response.get('next_href')
|
|
222
|
+
    while next_href:
|
|
223
|
+
        page = json.loads(http_get(f"{next_href}&client_id={CLIENT_ID}"))
|
|
224
|
+
        yield from page.get('collection', [])
|
|
225
|
+
        next_href = page.get('next_href')
|
|
226
|
+
```
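With `total_results` in the millions, following `next_href` to the end is rarely what you want. The sketch below caps the number of items and only fetches the pages it needs. `iter_collection`, `take`, and the `fetch_json` parameter are hypothetical names; `fetch_json` stands in for something like `lambda u: json.loads(http_get(u))`, injected so the logic can be exercised without network access.

```python
from itertools import islice

def iter_collection(first_response, fetch_json, client_id):
    """Yield items page by page, following next_href lazily."""
    yield from first_response.get("collection", [])
    next_href = first_response.get("next_href")
    while next_href:
        # next_href normally carries a query string already; handle both cases.
        separator = "&" if "?" in next_href else "?"
        page = fetch_json(f"{next_href}{separator}client_id={client_id}")
        yield from page.get("collection", [])
        next_href = page.get("next_href")

def take(first_response, fetch_json, client_id, max_items):
    """Return at most max_items results, fetching only the pages needed."""
    items = iter_collection(first_response, fetch_json, client_id)
    return list(islice(items, max_items))
```

Since the generator is lazy, `take(..., max_items=50)` on a 20-per-page search issues two follow-up requests rather than walking the whole result set.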
|
|
227
|
+
|
|
228
|
+
### Trending charts
|
|
229
|
+
|
|
230
|
+
```python
|
|
231
|
+
# Trending tracks across all genres
|
|
232
|
+
trending = sc_api("charts", kind="trending",
|
|
233
|
+
                  genre="soundcloud:genres:all-music", limit=20)
|
|
234
|
+
for item in trending['collection']:
|
|
235
|
+
    t = item['track']
|
|
236
|
+
    print(f"{t['title']} - score={item['score']:.4f}")
|
|
237
|
+
|
|
238
|
+
# Genre options: soundcloud:genres:all-music, soundcloud:genres:electronic,
|
|
239
|
+
# soundcloud:genres:hiphoprap, soundcloud:genres:ambient, etc.
|
|
240
|
+
```
|
|
241
|
+
|
|
242
|
+
### User resources
|
|
243
|
+
|
|
244
|
+
```python
|
|
245
|
+
user_id = 183 # numeric ID from resolve or hydration
|
|
246
|
+
|
|
247
|
+
# User's tracks
|
|
248
|
+
tracks = sc_api(f"users/{user_id}/tracks", limit=20)
|
|
249
|
+
# tracks['collection'] = list of track objects
|
|
250
|
+
|
|
251
|
+
# User's playlists
|
|
252
|
+
playlists = sc_api(f"users/{user_id}/playlists", limit=10)
|
|
253
|
+
|
|
254
|
+
# User's likes
|
|
255
|
+
likes = sc_api(f"users/{user_id}/likes", limit=10)
|
|
256
|
+
|
|
257
|
+
# Related tracks for a track
|
|
258
|
+
related = sc_api("tracks/293/related", limit=10)
|
|
259
|
+
# related['collection'] = list of track objects
|
|
260
|
+
```
|
|
261
|
+
|
|
262
|
+
### Waveform data
|
|
263
|
+
|
|
264
|
+
```python
|
|
265
|
+
# Waveform URL comes from track['waveform_url']
|
|
266
|
+
waveform_url = "https://wave.sndcdn.com/cWHNerOLlkUq_m.json"
|
|
267
|
+
waveform = json.loads(http_get(waveform_url))
|
|
268
|
+
# {
|
|
269
|
+
# 'width': 1800, # number of sample points
|
|
270
|
+
# 'height': 140, # max amplitude value
|
|
271
|
+
# 'samples': [11, 86, 91, 80, ...] # 1800 amplitude values
|
|
272
|
+
# }
|
|
273
|
+
```
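For plotting, the raw samples are usually easier to work with on a 0..1 scale. A minimal sketch, assuming `height` is the maximum amplitude as described above; `normalize_waveform` is a hypothetical helper that operates on the already parsed waveform dict.

```python
def normalize_waveform(waveform):
    """Scale waveform samples to the 0..1 range using the 'height' field."""
    # Guard against a missing or zero height to avoid division by zero.
    height = waveform.get("height") or 1
    return [sample / height for sample in waveform.get("samples", [])]

demo = {"width": 4, "height": 140, "samples": [0, 70, 140, 35]}
print(normalize_waveform(demo))  # [0.0, 0.5, 1.0, 0.25]
```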
|
|
274
|
+
|
|
275
|
+
---
|
|
276
|
+
|
|
277
|
+
## Full track fields from `__sc_hydration` / API v2
|
|
278
|
+
|
|
279
|
+
```
|
|
280
|
+
id int Numeric track ID (e.g. 293)
|
|
281
|
+
urn str "soundcloud:tracks:293"
|
|
282
|
+
title str Track title
|
|
283
|
+
description str May contain HTML entities/tags
|
|
284
|
+
genre str Genre string
|
|
285
|
+
tag_list str Space-separated tags
|
|
286
|
+
created_at str ISO 8601 UTC
|
|
287
|
+
last_modified str ISO 8601 UTC
|
|
288
|
+
release_date str ISO 8601 UTC (original release)
|
|
289
|
+
display_date str ISO 8601 UTC (shown to users)
|
|
290
|
+
duration int Milliseconds
|
|
291
|
+
full_duration int Milliseconds (untruncated)
|
|
292
|
+
playback_count int
|
|
293
|
+
likes_count int
|
|
294
|
+
reposts_count int
|
|
295
|
+
comment_count int
|
|
296
|
+
download_count int
|
|
297
|
+
artwork_url str e.g. .../artworks-...-large.jpg (replace 'large' with 't500x500' for 500px)
|
|
298
|
+
waveform_url str https://wave.sndcdn.com/....json
|
|
299
|
+
permalink str Slug (e.g. "flickermood")
|
|
300
|
+
permalink_url str Full canonical URL
|
|
301
|
+
streamable bool
|
|
302
|
+
downloadable bool
|
|
303
|
+
license str e.g. "all-rights-reserved", "cc-by"
|
|
304
|
+
sharing str "public" or "private"
|
|
305
|
+
state str "finished" | "processing" | "failed"
|
|
306
|
+
monetization_model str "AD_SUPPORTED" | "SUB_HIGH_TIER" | "NOT_APPLICABLE"
|
|
307
|
+
embeddable_by str "all" | "me" | "none"
|
|
308
|
+
user dict Nested user object (id, username, avatar_url, verified, ...)
|
|
309
|
+
user_id int Owner numeric ID
|
|
310
|
+
publisher_metadata dict {artist, publisher, isrc, contains_music, ...}
|
|
311
|
+
media dict {'transcodings': [...]} — stream URLs (require OAuth, not usable without login)
|
|
312
|
+
label_name str Record label
|
|
313
|
+
purchase_url str External buy link
|
|
314
|
+
station_urn str "soundcloud:system-playlists:track-stations:{id}"
|
|
315
|
+
```
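Two of these fields usually need converting before display: `duration` is in milliseconds and the date fields are ISO 8601 strings. The sketch below assumes timestamps always use the `Z`-suffixed form shown in the examples (`2007-09-22T14:45:46Z`); `format_duration` and `parse_timestamp` are hypothetical helper names.

```python
from datetime import datetime, timezone

def format_duration(milliseconds):
    """Render a millisecond duration as M:SS."""
    minutes, seconds = divmod(milliseconds // 1000, 60)
    return f"{minutes}:{seconds:02d}"

def parse_timestamp(value):
    """Parse a 'YYYY-MM-DDTHH:MM:SSZ' string into an aware UTC datetime."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)

print(format_duration(213886))                  # 3:33
print(parse_timestamp("2007-09-22T14:45:46Z"))  # 2007-09-22 14:45:46+00:00
```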
|
|
316
|
+
|
|
317
|
+
---
|
|
318
|
+
|
|
319
|
+
## Gotchas
|
|
320
|
+
|
|
321
|
+
**client_id is required for api-v2.soundcloud.com**: requests without it return HTTP 401. Always extract it from the `apiClient` entry of `__sc_hydration` (its `data['id']` value).
|
|
322
|
+
|
|
323
|
+
**client_id source: hydration, not JS bundles** — the JS bundles on `a-v2.sndcdn.com` do NOT contain the `client_id` pattern. The only reliable source is the `apiClient` object in the page hydration. It is stable across all pages (same value from homepage, track pages, user pages) and does not appear to rotate on short timescales.
|
|
324
|
+
|
|
325
|
+
**Artwork URL sizes** — hydration/API returns `...-large.jpg` (100×100). Replace the size suffix to get larger images:
|
|
326
|
+
- `-large.jpg` → 100×100
|
|
327
|
+
- `-t300x300.jpg` → 300×300
|
|
328
|
+
- `-t500x500.jpg` → 500×500 (oEmbed returns this size)
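The suffix swap can be wrapped in a small helper. This is a sketch covering only the size suffixes listed above (`large` and the `t{W}x{H}` form); `artwork_at_size` is a hypothetical name, and empty values are passed through unchanged as a defensive assumption.

```python
import re

# Matches the trailing size suffix, e.g. "-large.jpg" or "-t300x300.jpg".
SIZE_SUFFIX = re.compile(r"-(?:large|t\d+x\d+)(\.\w+)$")

def artwork_at_size(artwork_url, size="t500x500"):
    """Return artwork_url with its size suffix replaced by `size`."""
    if not artwork_url:
        return artwork_url
    return SIZE_SUFFIX.sub(lambda m: f"-{size}{m.group(1)}", artwork_url)

url = "https://i1.sndcdn.com/artworks-000067273316-smsiqx-large.jpg"
print(artwork_at_size(url))
# https://i1.sndcdn.com/artworks-000067273316-smsiqx-t500x500.jpg
```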
|
|
329
|
+
|
|
330
|
+
**Regex must use `re.DOTALL`** — the `__sc_hydration` JSON spans multiple lines. Without `re.DOTALL`, the `.` in the regex won't match newlines.
|
|
331
|
+
|
|
332
|
+
**Stream URLs (media.transcodings) are gated** — the HLS/progressive audio stream URLs in `track['media']['transcodings']` require an OAuth token even to fetch a stream manifest. They cannot be played without a logged-in session.
|
|
333
|
+
|
|
334
|
+
**Bulk track lookup returns a list, not collection** — `GET /tracks?ids=...` returns a JSON array directly. Do NOT look for `.get('collection')`.
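If the same downstream code handles both bulk lookups and collection-style responses, a small normalizer avoids branching at every call site. A sketch only; `as_items` is a hypothetical helper, not part of the API or of `helpers`.

```python
def as_items(response):
    """Return the list of items from either response shape."""
    if isinstance(response, list):
        # Bulk lookup: the JSON array is the payload itself.
        return response
    if isinstance(response, dict):
        # Search, user tracks, related tracks: items live under 'collection'.
        return response.get("collection", [])
    raise TypeError(f"unexpected response type: {type(response).__name__}")
```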
|
|
335
|
+
|
|
336
|
+
**Search `total_results` can be huge** — results like 5M+ are normal for broad queries. Use `next_href` for pagination; do not calculate offsets manually.
|
|
337
|
+
|
|
338
|
+
**oEmbed description contains HTML**: SoundCloud descriptions may include `&nbsp;` entities and anchor tags. `html.unescape()` decodes the entities; anchor tags have to be stripped separately if you need plain text.
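`html.unescape()` only decodes entities, so tags need a separate pass. The sketch below strips tags with a simple regex and then decodes entities; it assumes descriptions hold only light markup such as anchors, and `description_to_text` is a hypothetical helper name. For arbitrary HTML a real parser would be the safer choice.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")

def description_to_text(description):
    """Convert a description with entities and simple tags to plain text."""
    # Strip tags first so that escaped text like "&lt;b&gt;" survives as literal text.
    without_tags = TAG_RE.sub("", description or "")
    text = html.unescape(without_tags)
    # Collapse whitespace, including the non-breaking spaces from &nbsp;.
    return " ".join(text.split())

raw = 'Out&nbsp;now: <a href="https://example.com">listen</a> &amp; share'
print(description_to_text(raw))  # Out now: listen & share
```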
|
|
339
|
+
|
|
340
|
+
**HTTP 400 on some endpoints** — `/tracks/{id}/comments` returns 400 without OAuth headers. Timed comments are not accessible without login.
|
|
341
|
+
|
|
342
|
+
**No browser required** — all documented approaches work with plain `http_get`. SoundCloud does not require JavaScript rendering for metadata extraction.
|
|
343
|
+
|
|
344
|
+
**Rate limits** — 20 rapid sequential API v2 requests completed without errors in testing. SoundCloud does not publish official rate limits; stay under ~50 req/s for sustained scraping. oEmbed is more lenient than api-v2.
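Since no official limit is published, spacing requests out is a cheap precaution. A minimal throttle sketch: `Throttle` is a hypothetical class, the rate you pass in is your own choice, and the `clock` and `sleep` parameters exist only so the logic can be tested without real waiting.

```python
import time

class Throttle:
    """Enforce a minimum interval between consecutive calls to wait()."""

    def __init__(self, max_per_second, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self.clock = clock
        self.sleep = sleep
        self._last = None  # time of the previous call, None before the first

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self.clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self._last = now
```

Typical use is to create one instance, for example `throttle = Throttle(max_per_second=5)`, and call `throttle.wait()` immediately before each `sc_api(...)` request.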
|
|
345
|
+
|
|
346
|
+
---
|
|
347
|
+
|
|
348
|
+
## Quick Reference
|
|
349
|
+
|
|
350
|
+
| Goal | Approach | Auth |
|
|
351
|
+
|------|----------|------|
|
|
352
|
+
| Track title/author/thumbnail from URL | oEmbed | None |
|
|
353
|
+
| Full track metadata + play counts | `__sc_hydration` `sound` key | None |
|
|
354
|
+
| Full user profile + stats | `__sc_hydration` `user` key | None |
|
|
355
|
+
| Full playlist with all tracks | `__sc_hydration` `playlist` key | None |
|
|
356
|
+
| Search tracks/users/playlists | API v2 `/search/*` | client_id |
|
|
357
|
+
| Trending charts | API v2 `/charts` | client_id |
|
|
358
|
+
| Bulk track lookup by IDs | API v2 `/tracks?ids=` | client_id |
|
|
359
|
+
| User's track list | API v2 `/users/{id}/tracks` | client_id |
|
|
360
|
+
| Resolve permalink to resource | API v2 `/resolve?url=` | client_id |
|
|
361
|
+
| Waveform amplitude data | Direct fetch of `waveform_url` | None |
|
|
362
|
+
| Audio stream playback | OAuth login required | Login |
|