@pencil-agent/nano-pencil 2.0.0 → 2.0.1

Files changed (195)
  1. package/README.md +267 -267
  2. package/dist/build-meta.json +3 -3
  3. package/dist/core/export-html/AGENT.md +11 -11
  4. package/dist/core/export-html/template.css +971 -971
  5. package/dist/core/export-html/template.html +54 -54
  6. package/dist/core/mcp/mcp-client.d.ts +3 -1
  7. package/dist/core/mcp/mcp-client.js +6 -6
  8. package/dist/core/mcp/mcp-config.d.ts +3 -3
  9. package/dist/core/mcp/mcp-config.js +1 -1
  10. package/dist/core/mcp/mcp-manager.d.ts +5 -1
  11. package/dist/core/mcp/mcp-manager.js +1 -1
  12. package/dist/core/platform/config/resource-loader.d.ts +2 -0
  13. package/dist/core/platform/config/resource-loader.js +2 -2
  14. package/dist/core/runtime/agent-session.d.ts +12 -0
  15. package/dist/core/runtime/agent-session.js +8 -8
  16. package/dist/core/runtime/sdk.d.ts +8 -0
  17. package/dist/core/runtime/sdk.js +1 -1
  18. package/dist/extensions/builtin/AGENT.md +115 -115
  19. package/dist/extensions/builtin/browser/AGENT.md +17 -17
  20. package/dist/extensions/builtin/browser/agent-workspace/agent_helpers.py +12 -12
  21. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/amazon/product-search.md +198 -198
  22. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/archive-org/scraping.md +341 -341
  23. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/arxiv/scraping.md +311 -311
  24. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/arxiv-bulk/scraping.md +333 -333
  25. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/atlas/overview.md +70 -70
  26. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/booking-com/scraping.md +578 -578
  27. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/capterra/scraping.md +440 -440
  28. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/centilebrain/generate-estimates.md +110 -110
  29. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/coingecko/scraping.md +325 -325
  30. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/coinmarketcap/scraping.md +463 -463
  31. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/coursera/scraping.md +360 -360
  32. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/craigslist/scraping.md +390 -390
  33. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/crossref/scraping.md +568 -568
  34. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/dev-to/scraping.md +323 -323
  35. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/duckduckgo/scraping.md +349 -349
  36. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/ebay/scraping.md +435 -435
  37. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/etsy/scraping.md +506 -506
  38. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/eventbrite/scraping.md +363 -363
  39. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/expedia/automation.md +168 -168
  40. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/facebook/groups.md +236 -236
  41. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/facebook/pages.md +295 -295
  42. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/framer/editor.md +108 -108
  43. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/fred/scraping.md +493 -493
  44. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/g2/scraping.md +580 -580
  45. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/genius/scraping.md +511 -511
  46. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/github/repo-actions.md +65 -65
  47. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/github/scraping.md +184 -184
  48. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/glassdoor/scraping.md +543 -543
  49. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/gmail/compose.md +122 -122
  50. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/goodreads/scraping.md +461 -461
  51. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/gutenberg/scraping.md +383 -383
  52. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/hackernews/scraping.md +243 -243
  53. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/howlongtobeat/scraping.md +473 -473
  54. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/imdb/scraping.md +271 -271
  55. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/itch-io/scraping.md +436 -436
  56. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/job-boards/indeed-glassdoor.md +1021 -1021
  57. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/letterboxd/scraping.md +349 -349
  58. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/linkedin/invitation-manager.md +109 -109
  59. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/loom/folder-enumeration.md +170 -170
  60. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/macrotrends/scraping.md +537 -537
  61. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/medium/article-hydration.md +120 -120
  62. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/medium/scraping.md +414 -414
  63. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/metacritic/scraping.md +477 -477
  64. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/musicbrainz/scraping.md +478 -478
  65. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/nasa/scraping.md +339 -339
  66. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/news-aggregation/multi-source.md +205 -205
  67. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/open-library/scraping.md +472 -472
  68. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/openalex/scraping.md +470 -470
  69. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/openstreetmap/scraping.md +490 -490
  70. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/package-registries/npm-pypi.md +478 -478
  71. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/polymarket/scraping.md +234 -234
  72. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/producthunt/scraping.md +307 -307
  73. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/pubmed/scraping.md +421 -421
  74. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/quora/scraping.md +364 -364
  75. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/rawg/scraping.md +352 -352
  76. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/reddit/scraping.md +124 -124
  77. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/rest-countries/scraping.md +233 -233
  78. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/sec-edgar/scraping.md +361 -361
  79. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/README.md +36 -36
  80. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/embedded-apps.md +72 -72
  81. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/knowledge-base.md +109 -109
  82. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/polaris-inputs.md +137 -137
  83. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/soundcloud/scraping.md +362 -362
  84. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/spotify/scraping.md +339 -339
  85. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/stackoverflow/scraping.md +435 -435
  86. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/steam/scraping.md +575 -575
  87. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/substack/scraping.md +338 -338
  88. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/thetechgeeks/pricing.md +52 -52
  89. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/tiktok/upload.md +107 -107
  90. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/tradingview/scraping.md +309 -309
  91. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/trello/boards-and-lists.md +88 -88
  92. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/trustpilot/scraping.md +375 -375
  93. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/walmart/scraping.md +444 -444
  94. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/wayback-machine/scraping.md +306 -306
  95. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/weather/scraping.md +398 -398
  96. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/wellfound/scraping.md +596 -596
  97. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/world-bank/scraping.md +356 -356
  98. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/xiaohongshu/scraping.md +84 -84
  99. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/youtube/scraping.md +418 -418
  100. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/zillow/scraping.md +433 -433
  101. package/dist/extensions/builtin/browser/browser.md +73 -73
  102. package/dist/extensions/builtin/browser/install.md +142 -142
  103. package/dist/extensions/builtin/browser/interaction-skills/connection.md +48 -48
  104. package/dist/extensions/builtin/browser/interaction-skills/cookies.md +3 -3
  105. package/dist/extensions/builtin/browser/interaction-skills/cross-origin-iframes.md +3 -3
  106. package/dist/extensions/builtin/browser/interaction-skills/dialogs.md +64 -64
  107. package/dist/extensions/builtin/browser/interaction-skills/downloads.md +3 -3
  108. package/dist/extensions/builtin/browser/interaction-skills/drag-and-drop.md +3 -3
  109. package/dist/extensions/builtin/browser/interaction-skills/dropdowns.md +3 -3
  110. package/dist/extensions/builtin/browser/interaction-skills/iframes.md +3 -3
  111. package/dist/extensions/builtin/browser/interaction-skills/network-requests.md +3 -3
  112. package/dist/extensions/builtin/browser/interaction-skills/print-as-pdf.md +3 -3
  113. package/dist/extensions/builtin/browser/interaction-skills/profile-sync.md +90 -90
  114. package/dist/extensions/builtin/browser/interaction-skills/screenshots.md +17 -17
  115. package/dist/extensions/builtin/browser/interaction-skills/scrolling.md +3 -3
  116. package/dist/extensions/builtin/browser/interaction-skills/shadow-dom.md +3 -3
  117. package/dist/extensions/builtin/browser/interaction-skills/tabs.md +69 -69
  118. package/dist/extensions/builtin/browser/interaction-skills/uploads.md +1 -1
  119. package/dist/extensions/builtin/browser/interaction-skills/viewport.md +3 -3
  120. package/dist/extensions/builtin/browser/src/browser_harness/AGENT.md +15 -15
  121. package/dist/extensions/builtin/browser/src/browser_harness/__init__.py +8 -8
  122. package/dist/extensions/builtin/browser/src/browser_harness/_ipc.py +90 -90
  123. package/dist/extensions/builtin/browser/src/browser_harness/admin.py +722 -722
  124. package/dist/extensions/builtin/browser/src/browser_harness/daemon.py +328 -328
  125. package/dist/extensions/builtin/browser/src/browser_harness/helpers.py +396 -396
  126. package/dist/extensions/builtin/browser/src/browser_harness/run.py +103 -103
  127. package/dist/extensions/builtin/discipline/skills/brainstorming/SKILL.md +33 -33
  128. package/dist/extensions/builtin/discipline/skills/executing-plans/SKILL.md +25 -25
  129. package/dist/extensions/builtin/discipline/skills/finishing-development-branch/SKILL.md +25 -25
  130. package/dist/extensions/builtin/discipline/skills/receiving-code-review/SKILL.md +22 -22
  131. package/dist/extensions/builtin/discipline/skills/requesting-code-review/SKILL.md +31 -31
  132. package/dist/extensions/builtin/discipline/skills/systematic-debugging/SKILL.md +28 -28
  133. package/dist/extensions/builtin/discipline/skills/test-driven-development/SKILL.md +32 -32
  134. package/dist/extensions/builtin/discipline/skills/using-git-worktrees/SKILL.md +25 -25
  135. package/dist/extensions/builtin/discipline/skills/verification-before-completion/SKILL.md +27 -27
  136. package/dist/extensions/builtin/discipline/skills/writing-plans/SKILL.md +26 -26
  137. package/dist/extensions/builtin/goal/README.md +67 -67
  138. package/dist/extensions/builtin/grub/README.md +112 -112
  139. package/dist/extensions/builtin/link-world/agent-workspace/README.md +16 -16
  140. package/dist/extensions/builtin/link-world/internet-search/internet-search.md +65 -65
  141. package/dist/extensions/builtin/link-world/link-world-agent.md +82 -82
  142. package/dist/extensions/builtin/link-world/linkworld.md +313 -313
  143. package/dist/extensions/builtin/link-world/network-routing/network-routing.md +67 -67
  144. package/dist/extensions/builtin/loop/README.md +92 -92
  145. package/dist/extensions/builtin/mcp/figma-design.md +68 -68
  146. package/dist/extensions/builtin/mcp/mcp-management.md +85 -85
  147. package/dist/extensions/builtin/recap/AGENT.md +15 -15
  148. package/dist/extensions/builtin/sal/README.md +72 -72
  149. package/dist/extensions/builtin/security-audit/README.md +289 -289
  150. package/dist/extensions/builtin/team/AGENT.md +112 -112
  151. package/dist/extensions/builtin/team/TESTING.md +299 -299
  152. package/dist/extensions/builtin/token-save/README.md +56 -56
  153. package/dist/extensions/optional/AGENT.md +10 -10
  154. package/dist/modes/interactive/interactive-mode.js +36 -36
  155. package/dist/modes/interactive/theme/dark.json +85 -85
  156. package/dist/modes/interactive/theme/light.json +84 -84
  157. package/dist/modes/interactive/theme/theme-schema.json +335 -335
  158. package/dist/modes/interactive/theme/warm.json +81 -81
  159. package/dist/node_modules/@pencil-agent/agent-core/dist/agent-loop.js +3 -2
  160. package/dist/node_modules/@pencil-agent/agent-core/dist/structured-adaptive-agent-loop.js +2 -1
  161. package/dist/node_modules/@pencil-agent/ai/dist/cli.js +0 -0
  162. package/docs/cc-agent-design.md +1297 -0
  163. package/docs/cc-tui-design.md +1333 -0
  164. package/docs/codex-goal-command-impl.md +1055 -1055
  165. package/docs/codex-goal-vs-grub.md +500 -500
  166. package/docs/custom-provider.md +27 -27
  167. package/docs/extensions.md +27 -27
  168. package/docs/keybindings.md +27 -27
  169. package/docs/loop 重构完成总结.md +250 -250
  170. package/docs/loop 重构完成报告.md +122 -122
  171. package/docs/loop 重构方案.md +1222 -1222
  172. package/docs/loop 重构方案实现报告.md +158 -158
  173. package/docs/loop 重构方案对比分析.md +128 -128
  174. package/docs/loop 重构计划.md +320 -320
  175. package/docs/loop-usage-examples.md +214 -214
  176. package/docs/models.md +27 -27
  177. package/docs/nanoPencil-学习计划.md +170 -0
  178. package/docs/packages.md +27 -27
  179. package/docs/pi-design-philosophy.md +457 -457
  180. package/docs/planmode.md +1987 -1987
  181. package/docs/prompt-templates.md +27 -27
  182. package/docs/providers.md +27 -27
  183. package/docs/scan-report.md +3820 -0
  184. package/docs/sdk.md +27 -27
  185. package/docs/skills.md +27 -27
  186. package/docs/themes.md +27 -27
  187. package/docs/tui.md +27 -27
  188. package/docs//345/257/271/346/240/207Claude-Code.md +1775 -0
  189. package/docs//351/230/277/351/207/214/345/267/264/345/267/264/350/264/242/346/212/245/345/210/206/346/236/220/344/271/246.md +261 -0
  190. package/package.json +190 -190
  191. package/docs/ACP/345/215/217/350/256/256/351/233/206/346/210/220/345/274/200/345/217/221/346/226/207/346/241/243.md +0 -851
  192. package/docs/SDK-TESTING.md +0 -364
  193. package/docs/mem-core/346/212/200/346/234/257/346/226/207/346/241/243.md +0 -593
  194. package/docs/startup-performance-optimization.md +0 -301
  195. package/docs/认知地图.md +0 -47
@@ -1,478 +1,478 @@
1
- # MusicBrainz — Data Extraction
2
-
3
- `https://musicbrainz.org` — open music encyclopedia with a fully free JSON API.
4
- No auth required for reads. No browser needed for any documented workflow.
5
-
6
- Field-tested against musicbrainz.org on 2026-04-18.
7
-
8
- ---
9
-
10
- ## Do this first
11
-
12
- **The MusicBrainz Web Service API (ws/2) returns clean JSON for all entity types — no browser needed.**
13
-
14
- ```python
15
- from helpers import http_get
16
- import json
17
-
18
- # REQUIRED: every request must include this header or you get HTTP 403
19
- UA = {"User-Agent": "browser-harness/1.0 (your@email.com)"}
20
-
21
- data = json.loads(http_get("https://musicbrainz.org/ws/2/artist/?query=queen&fmt=json&limit=5", headers=UA))
22
- for a in data['artists']:
23
-     print(a['id'], a['name'], a.get('type'), a.get('country'), a['score'])
24
- # 0383dadf-2a4e-4d10-a46a-e9e041da8eb3 Queen Group GB 100
25
- # 79239441-bfd5-4981-a70c-55c3f15c1287 Madonna Person US 73
26
- ```
27
-
28
- `User-Agent` is **mandatory** — omitting it returns HTTP 403 immediately. Format: `AppName/Version (contact@email.com)`.
29
-
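The `AppName/Version (contact@email.com)` format described above can be assembled by a small helper. This is a minimal sketch: `make_user_agent` and its parameter names are hypothetical and are not part of `helpers`.

```python
def make_user_agent(app_name: str, version: str, contact: str) -> dict:
    """Build a headers dict in the AppName/Version (contact) format.

    Hypothetical helper for illustration; not provided by `helpers`.
    """
    for label, value in (("app_name", app_name), ("version", version), ("contact", contact)):
        if not value or not value.strip():
            raise ValueError(f"{label} must be a non-empty string")
    return {"User-Agent": f"{app_name.strip()}/{version.strip()} ({contact.strip()})"}


UA = make_user_agent("browser-harness", "1.0", "your@email.com")
print(UA)
```

Rejecting empty parts up front keeps a malformed header from reaching the API, where it would only surface as an HTTP 403.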
30
- ---
31
-
32
- ## Entity types
33
-
34
- | Entity | Endpoint | Key fields |
35
- |---|---|---|
36
- | `artist` | `/ws/2/artist/` | name, sort-name, type (Group/Person/Orchestra/Choir), country, life-span, tags, rating |
37
- | `release-group` | `/ws/2/release-group/` | title, primary-type (Album/Single/EP/Other), first-release-date |
38
- | `release` | `/ws/2/release/` | title, date, country, status (Official/Bootleg/Promotional), barcode, label-info, media |
39
- | `recording` | `/ws/2/recording/` | title, length (milliseconds), artist-credit, releases |
40
- | `label` | `/ws/2/label/` | name, type, country, area |
41
- | `work` | `/ws/2/work/` | title, type (Song/Aria/Soundtrack/etc.), relations |
42
-
43
- All entities share the same MBID (MusicBrainz ID) format: UUID v4, e.g. `0383dadf-2a4e-4d10-a46a-e9e041da8eb3`.
44
-
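Because MBIDs are UUID strings, a value can be checked locally before it is placed in a lookup URL. The sketch below uses only the standard-library `uuid` module; `is_mbid` is a hypothetical name, and it checks the canonical hyphenated form only, not whether the ID exists in MusicBrainz.

```python
import uuid


def is_mbid(value) -> bool:
    """Return True if `value` is a canonical hyphenated UUID string."""
    try:
        parsed = uuid.UUID(value)
    except (ValueError, AttributeError, TypeError):
        return False
    # uuid.UUID also accepts braces, "urn:uuid:" prefixes and un-hyphenated hex;
    # comparing against the canonical string form rejects those variants.
    return str(parsed) == value.lower()


print(is_mbid("0383dadf-2a4e-4d10-a46a-e9e041da8eb3"))
print(is_mbid("queen"))
```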
45
- ---
46
-
47
- ## Common workflows
48
-
49
- ### Artist search
50
-
51
- ```python
52
- from helpers import http_get
53
- import json
54
-
55
- UA = {"User-Agent": "browser-harness/1.0 (your@email.com)"}
56
-
57
- resp = json.loads(http_get(
58
- "https://musicbrainz.org/ws/2/artist/?query=queen&fmt=json&limit=5",
59
- headers=UA
60
- ))
61
- # resp keys: count (total matches), offset, artists (list)
62
- for a in resp['artists']:
63
-     print(a['id'])  # MBID: 0383dadf-2a4e-4d10-a46a-e9e041da8eb3
64
-     print(a['name'])  # Queen
65
-     print(a['sort-name'])  # Queen (differs for persons: "Bowie, David")
66
-     print(a.get('type'))  # Group / Person / Orchestra / Choir
67
-     print(a.get('country'))  # GB
68
-     print(a.get('life-span'))  # {'begin': '1970-06-27', 'end': None, 'ended': True}
69
-     print(a.get('disambiguation', ''))  # e.g. "English singer-songwriter"
70
-     print(a['score'])  # relevance 0-100
71
- ```
72
-
73
- ### Artist by MBID (with related data via `inc=`)
74
-
75
- ```python
76
- # inc= parameters stack with + between them
77
- resp = json.loads(http_get(
78
- "https://musicbrainz.org/ws/2/artist/0383dadf-2a4e-4d10-a46a-e9e041da8eb3"
79
- "?inc=releases+tags+ratings+release-groups&fmt=json",
80
- headers=UA
81
- ))
82
- print(resp['name']) # Queen
83
- print(resp['type']) # Group
84
- print(resp['country']) # GB
85
- print(resp['life-span']) # {'begin': '1970-06-27', 'end': None, 'ended': True}
86
-
87
- # Tags (community-voted genre labels, sorted by count)
88
- tags = sorted(resp.get('tags', []), key=lambda x: x['count'], reverse=True)
89
- print([t['name'] for t in tags[:5]])
90
- # ['rock', 'glam rock', 'hard rock', 'art rock', 'british']
91
-
92
- # Rating (community score, 0-5)
93
- print(resp.get('rating')) # {'votes-count': 43, 'value': 4.7}
94
-
95
- # Direct releases (up to 25 per request — use browse for full list)
96
- for r in resp.get('releases', []):
97
-     print(r['id'], r['title'], r.get('date'))
98
-
99
- # Release groups (albums, singles, EPs — deduplicated by edition)
100
- for rg in resp.get('release-groups', []):
101
-     print(rg['id'], rg['title'], rg.get('primary-type'), rg.get('first-release-date'))
102
- # 6b47c9a0 A Night at the Opera Album 1975-11-21
103
- # 002ed683 Sheer Heart Attack Album 1974-11-01
104
- ```
105
-
106
- ### Browse releases by artist (full list)
107
-
108
- ```python
109
- # Browse API: uses 'artist' param (not 'query') — response key is 'release-count' not 'count'
110
- resp = json.loads(http_get(
111
- "https://musicbrainz.org/ws/2/release/"
112
- "?artist=0383dadf-2a4e-4d10-a46a-e9e041da8eb3&fmt=json&limit=25&offset=0",
113
- headers=UA
114
- ))
115
- print(resp['release-count']) # 1635 — total releases for this artist
116
- for r in resp['releases']:
117
-     print(r['id'], r['title'], r.get('date'), r.get('country'), r.get('status'))
118
-     # Also has: cover-art-archive.artwork (bool), cover-art-archive.front (bool)
119
-     caa = r.get('cover-art-archive', {})
120
-     print(caa.get('artwork'), caa.get('front'), caa.get('count'))
121
-
122
- # Paginate: increment offset by limit
123
- ```
124
-
125
- ### Release search and lookup
126
-
127
- ```python
128
- # Search by title
129
- resp = json.loads(http_get(
130
- "https://musicbrainz.org/ws/2/release/?query=dark+side+of+the+moon&fmt=json&limit=5",
131
- headers=UA
132
- ))
133
- # resp keys: count, offset, releases
134
-
135
- # Full release with track list, artists, and labels
136
- release = json.loads(http_get(
137
- "https://musicbrainz.org/ws/2/release/b84ee12a-09ef-421b-82de-0441a926375b"
138
- "?inc=artists+recordings+labels+release-groups&fmt=json",
139
- headers=UA
140
- ))
141
- print(release['title']) # The Dark Side of the Moon
142
- print(release['date']) # 1973-03-24
143
- print(release['status']) # Official
144
- print(release['country']) # GB
145
-
146
- # Release group (the "album concept", deduplicates editions)
147
- rg = release.get('release-group', {})
148
- print(rg['title'], rg.get('primary-type'), rg['id'])
149
- # The Dark Side of the Moon Album f5093c06-23e3-404f-aeaa-40f72885ee3a
150
-
151
- # Artist credit
152
- for ac in release.get('artist-credit', []):
153
-     if isinstance(ac, dict) and 'artist' in ac:
154
-         print(ac['artist']['name'], ac['artist']['id'])
155
- # Pink Floyd 83d91898-7763-47d7-b03b-b92132375c47
156
-
157
- # Labels
158
- for li in release.get('label-info', []):
159
-     label = li.get('label', {})
160
-     print(label.get('name'), li.get('catalog-number'))
161
- # Harvest SHVL 804
162
-
163
- # Track list (from media[].tracks[])
164
- for disc in release.get('media', []):
165
-     for track in disc.get('tracks', []):
166
-         dur_s = track['length'] // 1000 if track.get('length') else None
167
-         rec = track.get('recording', {})
168
-         print(track['number'], track['title'], dur_s, rec.get('id'))
169
- # A1 Speak to Me 68s bef3fddb-5aca-49f5-b2fd-d56a23268d63
170
- # A2 Breathe 168s ecbc7c9b-e79d-4ec8-ac77-44e4a7f7f1b8
171
- ```
172
-
173
- ### Recording (track) search
174
-
175
- ```python
176
- # Use Lucene field syntax to filter by artist
177
- resp = json.loads(http_get(
178
- "https://musicbrainz.org/ws/2/recording/"
179
- "?query=bohemian+rhapsody+AND+artist:queen&fmt=json&limit=5",
180
- headers=UA
181
- ))
182
- print(resp['count']) # 419
183
- for r in resp['recordings']:
184
-     dur_s = r['length'] // 1000 if r.get('length') else None
185
-     artists = [ac['artist']['name'] for ac in r.get('artist-credit', []) if isinstance(ac, dict)]
186
-     releases = r.get('releases', [])
187
-     print(r['id'], r['title'], dur_s, artists, releases[0]['title'] if releases else None)
188
- # a4803b45 Bohemian Rhapsody 130s ['Queen'] Rhapsody in Red
189
- # 40212eb6 Bohemian Rhapsody 338s ['Queen'] 1986-07: Wembley Stadium
190
- ```
191
-
192
- ### Release-group search (deduplicated albums)
193
-
194
- ```python
195
- # Use release-group endpoint to avoid getting every regional edition
196
- resp = json.loads(http_get(
197
- "https://musicbrainz.org/ws/2/release-group/"
198
- "?query=release-group:\"A+Night+at+the+Opera\"+AND+artist:queen&fmt=json&limit=5",
199
- headers=UA
200
- ))
201
- # resp keys: count, release-groups
202
- for rg in resp.get('release-groups', []):
203
-     print(rg['id'], rg['title'], rg.get('primary-type'), rg.get('first-release-date'), rg['score'])
204
- # 6b47c9a0 A Night at the Opera Album 1975-11-21 100
205
-
206
- # Browse release-groups for an artist
207
- resp = json.loads(http_get(
208
- "https://musicbrainz.org/ws/2/release-group/"
209
- "?artist=0383dadf-2a4e-4d10-a46a-e9e041da8eb3&fmt=json&limit=25",
210
- headers=UA
211
- ))
212
- print(resp['release-group-count']) # 412
213
- for rg in resp.get('release-groups', []):
214
-     print(rg['title'], rg.get('primary-type'), rg.get('first-release-date'))
215
- ```
216
-
217
- ### Label and work lookups
218
-
219
- ```python
220
- # Label search
221
- resp = json.loads(http_get(
222
- "https://musicbrainz.org/ws/2/label/?query=EMI&fmt=json&limit=3",
223
- headers=UA
224
- ))
225
- for l in resp['labels']:
226
-     print(l['id'], l['name'], l.get('type'), l.get('country'), l['score'])
227
- # c029628b EMI Original Production GB 100
228
-
229
- # Work (song composition — author-level, not performance-level)
230
- resp = json.loads(http_get(
231
- "https://musicbrainz.org/ws/2/work/?query=bohemian+rhapsody&fmt=json&limit=3",
232
- headers=UA
233
- ))
234
- for w in resp['works']:
235
-     print(w['id'], w['title'], w.get('type'), w['score'])
236
- # 41c94a08 Bohemian Rhapsody Song 100
237
- ```
238
-
239
- ### Cover Art Archive
240
-
241
- ```python
242
- # Get cover art for a release MBID
243
- # 404 if no artwork has been uploaded for that release
244
- def get_cover_art(release_mbid, size="500"):
245
-     """
246
-     size: '250', '500', '1200', or 'full' (original file)
247
-     Returns the front cover URL, or None if no artwork exists.
248
-     """
249
-     try:
250
-         resp = json.loads(http_get(
251
-             f"https://coverartarchive.org/release/{release_mbid}",
252
-             headers=UA
253
-         ))
254
-     except Exception:
255
-         return None  # 404 = no art uploaded
256
-
257
-     images = resp.get('images', [])
258
-     # Prefer an image flagged as front=True
259
-     front = next((img for img in images if img.get('front')), None)
260
-     img = front or (images[0] if images else None)
261
-     if not img:
262
-         return None
263
-
264
-     if size == 'full':
265
-         return img['image']
266
-     return img['thumbnails'].get(size) or img['thumbnails'].get('large')
267
-
268
- # Thumbnail sizes confirmed: '250', '500', '1200', 'small' (=250), 'large' (=500)
269
-
270
- url = get_cover_art("b84ee12a-09ef-421b-82de-0441a926375b")
271
- # http://coverartarchive.org/release/b84ee12a.../1611507818-500.jpg
272
-
273
- # Full images response structure
274
- resp = json.loads(http_get(
275
- "https://coverartarchive.org/release/b84ee12a-09ef-421b-82de-0441a926375b",
276
- headers=UA
277
- ))
278
- for img in resp['images']:
279
-     print(img.get('types'))  # ['Front'], ['Back'], ['Liner'], ['Poster'], ['Medium'], ['Sticker'], ['Other']
280
-     print(img.get('front'))  # True only for front=True flagged images (not all 'Front' types)
281
-     print(img.get('approved'))  # True/False
282
-     print(img['image'])  # full resolution URL
283
-     print(img['thumbnails'])  # {'small': '...-250.jpg', 'large': '...-500.jpg', '250': ..., '500': ..., '1200': ...}
284
- ```
285
-
286
- ### Lucene query syntax for search
287
-
288
- All search endpoints support Lucene field queries:
289
-
290
- ```python
291
- # Field search: artist:, type:, country:, tag:, release:, date:
292
- resp = json.loads(http_get(
293
- "https://musicbrainz.org/ws/2/artist/"
294
- "?query=artist:queen+AND+type:group+AND+country:GB&fmt=json&limit=5",
295
- headers=UA
296
- ))
297
- # count: 23 (exact matches only)
298
-
299
- # Phrase search with quotes
300
- resp = json.loads(http_get(
301
- "https://musicbrainz.org/ws/2/release/"
302
- '?query=release:"A+Night+at+the+Opera"+AND+artist:queen&fmt=json&limit=5',
303
- headers=UA
304
- ))
305
- ```
306
-
307
- Common Lucene field names per entity:
308
- - artist: `artist:`, `type:`, `country:`, `tag:`, `begin:`, `end:`
309
- - release: `release:`, `artist:`, `date:`, `country:`, `status:`, `label:`, `barcode:`
310
- - recording: `recording:`, `artist:`, `release:`, `dur:` (milliseconds), `tnum:` (track number)
311
- - release-group: `release-group:`, `artist:`, `primarytype:`, `secondarytype:`
312
-
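The field names listed above can be combined programmatically instead of hand-writing `+AND+` strings. The following is a minimal sketch using the standard-library `urllib.parse.urlencode`; `lucene_query` and `search_url` are hypothetical helpers, and only double quotes inside phrases are escaped (other Lucene special characters are not handled).

```python
from urllib.parse import urlencode

BASE = "https://musicbrainz.org/ws/2"


def lucene_query(fields: dict) -> str:
    """Join field:value pairs with AND, quoting values that contain spaces."""
    terms = []
    for name, value in fields.items():
        text = str(value)
        if " " in text:
            # Phrase search: wrap in double quotes, escaping embedded quotes.
            text = '"' + text.replace('"', '\\"') + '"'
        terms.append(f"{name}:{text}")
    return " AND ".join(terms)


def search_url(entity: str, fields: dict, limit: int = 5) -> str:
    """Build a search URL for one entity type (artist, release, recording, ...)."""
    params = {"query": lucene_query(fields), "fmt": "json", "limit": limit}
    return f"{BASE}/{entity}/?{urlencode(params)}"


print(lucene_query({"release": "A Night at the Opera", "artist": "queen"}))
print(search_url("artist", {"artist": "queen", "type": "group", "country": "GB"}))
```

`urlencode` percent-encodes the colons and quotes and turns spaces into `+`, so the resulting URL looks different from the hand-written examples but decodes to the same query string.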
313
- ### Parallel fetching
314
-
315
- ```python
316
- from concurrent.futures import ThreadPoolExecutor
317
-
318
- UA = {"User-Agent": "browser-harness/1.0 (your@email.com)"}
319
-
320
- def fetch_artist(mbid):
321
-     resp = json.loads(http_get(
322
-         f"https://musicbrainz.org/ws/2/artist/{mbid}?inc=tags&fmt=json",
323
-         headers=UA
324
-     ))
325
-     tags = [t['name'] for t in sorted(resp.get('tags', []), key=lambda x: x['count'], reverse=True)[:3]]
326
-     return {"name": resp['name'], "type": resp.get('type'), "tags": tags}
327
-
328
- mbids = [
329
- "0383dadf-2a4e-4d10-a46a-e9e041da8eb3", # Queen
330
- "83d91898-7763-47d7-b03b-b92132375c47", # Pink Floyd
331
- "678d88b2-87b0-403b-b63d-5da7465aecc3", # Led Zeppelin
332
- ]
333
-
334
- with ThreadPoolExecutor(max_workers=3) as ex:
335
-     results = list(ex.map(fetch_artist, mbids))
336
- # 3 artists fetched in ~0.79s total
337
- ```
338
-
339
- Tested: 5-6 rapid sequential requests all succeed. Parallel requests at 3x concurrency succeed. Real 429s (rate-limit blocks) are only hit at very high burst rates; if you do get a 429, add `time.sleep(1)` between requests.
340
-
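If 429 responses do appear, the suggested one-second spacing can be enforced in one place rather than by scattering `time.sleep(1)` calls. This is a minimal sketch of a throttle; the `Throttle` class is hypothetical, and the injectable `clock` and `sleep` arguments exist only so the behaviour can be checked without real waiting.

```python
import time


class Throttle:
    """Block so that successive calls to wait() are at least min_interval seconds apart."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first one

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


throttle = Throttle(min_interval=1.0)
throttle.wait()  # first call returns immediately; call wait() before each request
```

This instance is not thread-safe, so with `ThreadPoolExecutor` it would need a lock around `wait()`.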
341
- ### Pagination
342
-
343
- ```python
344
- import time
345
-
346
- UA = {"User-Agent": "browser-harness/1.0 (your@email.com)"}
347
-
348
- def browse_all_releases(artist_mbid, page_size=25):
349
-     """Fetch all releases for an artist across multiple pages."""
350
-     offset = 0
351
-     total = None
352
-     releases = []
353
-     while total is None or offset < total:
354
-         resp = json.loads(http_get(
355
-             f"https://musicbrainz.org/ws/2/release/"
356
-             f"?artist={artist_mbid}&fmt=json&limit={page_size}&offset={offset}",
357
-             headers=UA
358
-         ))
359
-         total = resp['release-count']
360
-         batch = resp['releases']
361
-         releases.extend(batch)
362
-         offset += len(batch)
363
-         if offset < total:
364
-             time.sleep(1)  # stay within 1 req/s for sequential pagination
365
-     return releases
366
-
367
- # Queen has 1635 releases — use release-groups (412) to get deduplicated albums
368
- ```
369
-
370
- ---
371
-
372
- ## `inc=` parameter reference
373
-
374
- Stack multiple `inc=` values with `+` between them.
375
-
376
- **Artist lookup** (`/ws/2/artist/{mbid}`):
377
- - `releases` — list of releases (max ~25)
378
- - `release-groups` — list of release groups (max ~25)
379
- - `recordings` — list of recordings (max ~25)
380
- - `works` — list of works
381
- - `tags` — community genre tags (name + vote count)
382
- - `ratings` — community rating (value 0-5, votes-count)
383
- - `aliases` — alternative names and transliterations
384
- - `annotation` — free-text editorial note
385
- - `artist-rels`, `release-rels`, `recording-rels`, `work-rels` — relationship data
386
-
387
- **Release lookup** (`/ws/2/release/{mbid}`):
388
- - `artists` — full artist-credit objects
389
- - `recordings` — track list with recording links (populates `media[].tracks[].recording`)
390
- - `labels` — label-info with catalog numbers
391
- - `release-groups` — the release group this belongs to
392
- - `artist-credits` — expanded artist credit with joinphrase
393
- - `media` — disc/format info (always included in lookup, not needed in `inc=`)
394
-
395
- ---
396
-
397
- ## Response shapes cheat sheet
398
-
399
- ```
400
- # MBID format: standard UUID v4
401
- "0383dadf-2a4e-4d10-a46a-e9e041da8eb3"
402
-
403
- # Search response (artist/recording/release/release-group/label/work)
404
- {
405
- "count": 1612, # total matches
406
- "offset": 0,
407
- "<entity-plural>": [...] # e.g. "artists", "releases", "recordings", "release-groups"
408
- }
409
-
410
- # Browse response (using ?artist=MBID or ?label=MBID style)
411
- {
412
- "release-count": 1635, # note: key name changes per entity
413
- "release-offset": 0, # e.g. "release-group-count", "recording-count"
414
- "releases": [...]
415
- }
416
-
417
- # Recording length is always milliseconds
418
- recording['length'] // 1000 # => seconds
419
-
420
- # Artist life-span
421
- life_span = artist['life-span']
422
- # {'begin': '1970-06-27', 'end': None, 'ended': True}
423
- # 'ended': True with 'end': None means end date unknown but band is inactive
424
-
425
- # Artist credit joinphrase (for multi-artist tracks)
426
- # [{"name": "Simon", "artist": {...}, "joinphrase": " & "}, {"name": "Garfunkel", ...}]
427
- ```
428
-
429
- ---
430
-
431
- ## URL patterns
432
-
433
- | Resource | URL |
434
- |---|---|
435
- | Artist search | `https://musicbrainz.org/ws/2/artist/?query={q}&fmt=json&limit=5` |
436
- | Artist by MBID | `https://musicbrainz.org/ws/2/artist/{mbid}?inc=tags+ratings&fmt=json` |
437
- | Browse releases by artist | `https://musicbrainz.org/ws/2/release/?artist={mbid}&fmt=json&limit=25&offset=0` |
438
- | Release search | `https://musicbrainz.org/ws/2/release/?query={q}&fmt=json&limit=5` |
439
- | Release by MBID | `https://musicbrainz.org/ws/2/release/{mbid}?inc=artists+recordings+labels&fmt=json` |
440
- | Release-group browse | `https://musicbrainz.org/ws/2/release-group/?artist={mbid}&fmt=json&limit=25` |
441
- | Recording search | `https://musicbrainz.org/ws/2/recording/?query={q}&fmt=json&limit=5` |
442
- | Label search | `https://musicbrainz.org/ws/2/label/?query={q}&fmt=json&limit=5` |
443
- | Work search | `https://musicbrainz.org/ws/2/work/?query={q}&fmt=json&limit=5` |
444
- | Cover art | `https://coverartarchive.org/release/{release-mbid}` |
445
-
446
- MusicBrainz entity browser URL (human-readable): `https://musicbrainz.org/artist/{mbid}` (replace `artist` with `release`, `recording`, etc.)
447
-
448
- ---
449
-
450
- ## Gotchas
451
-
452
- - **`User-Agent` is mandatory** — without it you get HTTP 403 instantly. The header must include contact info, e.g. `browser-harness/1.0 (you@example.com)`. The default `http_get` UA (`Mozilla/5.0`) also gets 403.
453
-
454
- - **Browse vs search response keys differ** — Search responses use `count` and `offset`; Browse responses (with `?artist=MBID`) use `release-count` / `release-offset` (or `release-group-count` etc.). Accessing `data['count']` on a browse response throws `KeyError`.
455
-
456
- - **`releases` include in artist lookup caps at ~25** — Use the browse endpoint (`?artist=MBID`) with pagination for complete lists. Queen has 1,635 releases total; the `inc=releases` on the artist endpoint only returns ~25.
457
-
458
- - **Use release-groups to avoid edition explosion** — A popular album can have hundreds of release entries (every country's pressing, every remaster, every format). Use `/ws/2/release-group/` to get one entry per "album concept". Queen's "A Night at the Opera" has 75+ release entries but 1 release-group.
459
-
460
- - **Recording length is milliseconds** — `recording['length']` is in milliseconds, not seconds. Divide by 1000.
461
-
462
- - **Sort-name differs from display name for persons** — Artists have both `name` (display: "David Bowie") and `sort-name` (alphabetical: "Bowie, David"). Groups usually have identical values.
463
-
464
- - **Disambiguation in parentheses** — When multiple entities share a name, MusicBrainz adds a `disambiguation` field to distinguish them (e.g. `"English singer-songwriter"` vs a different David Bowie). Always check `a.get('disambiguation', '')` when resolving artist identity.
465
-
466
- - **Score 100 does not mean unique** — Search returns `score: 100` for multiple results when several equally match the query. "dark side of the moon" returns 6 results all scored 100 — they're different regional pressings. Filter by `date`, `country`, or `status` to narrow down.
467
-
468
- - **Recording search: plain query matches titles AND artists broadly** — `?query=bohemian+rhapsody+queen` matches *cover versions* first because "queen" appears in the artist or title of other recordings. Use `AND artist:queen` Lucene syntax to restrict to Queen performances.
469
-
470
- - **Cover Art Archive returns 404 for releases with no uploaded art** — Check `release['cover-art-archive']['artwork']` (boolean) from any release browse/search response before hitting the CAA endpoint. Saves an extra HTTP round-trip.
471
-
472
- - **Cover art `front=True` flag vs `types=['Front']`** — A release can have multiple images typed as 'Front' but only one (or none) flagged `front: true`. Always filter on `img.get('front') == True` for the canonical cover, not on `img.get('types') == ['Front']`.
473
-
474
- - **CAA thumbnail key names** — Both string keys `'small'` (250px) and `'large'` (500px) exist as aliases alongside numeric string keys `'250'`, `'500'`, `'1200'`. Access as `img['thumbnails']['500']` or `img['thumbnails']['large']` — both work.
475
-
476
- - **Rate limit: 1 req/s unauthenticated** — In practice, bursts of 5-6 sequential requests succeed without throttling. True 429s appear at higher rates. For sequential pagination loops, add `time.sleep(1)` between pages. For parallel fetching, limit concurrency to 3-5 workers.
477
-
478
- - **`fmt=json` required** — Omitting it returns XML instead of JSON. Always append `&fmt=json` to every request.
1
+ # MusicBrainz — Data Extraction
2
+
3
+ `https://musicbrainz.org` — open music encyclopedia with a fully free JSON API.
4
+ No auth required for reads. No browser needed for any documented workflow.
5
+
6
+ Field-tested against musicbrainz.org on 2026-04-18.
7
+
8
+ ---
9
+
10
+ ## Do this first
11
+
12
+ **The MusicBrainz Web Service API (ws/2) returns clean JSON for all entity types — no browser needed.**
13
+
14
+ ```python
15
+ from helpers import http_get
16
+ import json
17
+
18
+ # REQUIRED: every request must include this header or you get HTTP 403
19
+ UA = {"User-Agent": "browser-harness/1.0 (your@email.com)"}
20
+
21
+ data = json.loads(http_get("https://musicbrainz.org/ws/2/artist/?query=queen&fmt=json&limit=5", headers=UA))
22
+ for a in data['artists']:
+     print(a['id'], a['name'], a.get('type'), a.get('country'), a['score'])
24
+ # 0383dadf-2a4e-4d10-a46a-e9e041da8eb3 Queen Group GB 100
25
+ # 79239441-bfd5-4981-a70c-55c3f15c1287 Madonna Person US 73
26
+ ```
27
+
28
+ `User-Agent` is **mandatory** — omitting it returns HTTP 403 immediately. Format: `AppName/Version (contact@email.com)`.
29
+
30
+ ---
31
+
32
+ ## Entity types
33
+
34
+ | Entity | Endpoint | Key fields |
35
+ |---|---|---|
36
+ | `artist` | `/ws/2/artist/` | name, sort-name, type (Group/Person/Orchestra/Choir), country, life-span, tags, rating |
37
+ | `release-group` | `/ws/2/release-group/` | title, primary-type (Album/Single/EP/Other), first-release-date |
38
+ | `release` | `/ws/2/release/` | title, date, country, status (Official/Bootleg/Promotional), barcode, label-info, media |
39
+ | `recording` | `/ws/2/recording/` | title, length (milliseconds), artist-credit, releases |
40
+ | `label` | `/ws/2/label/` | name, type, country, area |
41
+ | `work` | `/ws/2/work/` | title, type (Song/Aria/Soundtrack/etc.), relations |
42
+
43
+ All entities share the same MBID (MusicBrainz ID) format: UUID v4, e.g. `0383dadf-2a4e-4d10-a46a-e9e041da8eb3`.
44
+
45
+ ---
46
+
47
+ ## Common workflows
48
+
49
+ ### Artist search
50
+
51
+ ```python
52
+ from helpers import http_get
53
+ import json
54
+
55
+ UA = {"User-Agent": "browser-harness/1.0 (your@email.com)"}
56
+
57
+ resp = json.loads(http_get(
58
+ "https://musicbrainz.org/ws/2/artist/?query=queen&fmt=json&limit=5",
59
+ headers=UA
60
+ ))
61
+ # resp keys: count (total matches), offset, artists (list)
62
+ for a in resp['artists']:
+     print(a['id'])                      # MBID: 0383dadf-2a4e-4d10-a46a-e9e041da8eb3
+     print(a['name'])                    # Queen
+     print(a['sort-name'])               # Queen (differs for persons: "Bowie, David")
+     print(a.get('type'))                # Group / Person / Orchestra / Choir
+     print(a.get('country'))             # GB
+     print(a.get('life-span'))           # {'begin': '1970-06-27', 'end': None, 'ended': True}
+     print(a.get('disambiguation', ''))  # e.g. "English singer-songwriter"
+     print(a['score'])                   # relevance 0-100
71
+ ```
72
+
73
+ ### Artist by MBID (with related data via `inc=`)
74
+
75
+ ```python
76
+ # inc= parameters stack with + between them
77
+ resp = json.loads(http_get(
78
+ "https://musicbrainz.org/ws/2/artist/0383dadf-2a4e-4d10-a46a-e9e041da8eb3"
79
+ "?inc=releases+tags+ratings+release-groups&fmt=json",
80
+ headers=UA
81
+ ))
82
+ print(resp['name']) # Queen
83
+ print(resp['type']) # Group
84
+ print(resp['country']) # GB
85
+ print(resp['life-span']) # {'begin': '1970-06-27', 'end': None, 'ended': True}
86
+
87
+ # Tags (community-voted genre labels, sorted by count)
88
+ tags = sorted(resp.get('tags', []), key=lambda x: x['count'], reverse=True)
89
+ print([t['name'] for t in tags[:5]])
90
+ # ['rock', 'glam rock', 'hard rock', 'art rock', 'british']
91
+
92
+ # Rating (community score, 0-5)
93
+ print(resp.get('rating')) # {'votes-count': 43, 'value': 4.7}
94
+
95
+ # Direct releases (up to 25 per request — use browse for full list)
96
+ for r in resp.get('releases', []):
+     print(r['id'], r['title'], r.get('date'))
98
+
99
+ # Release groups (albums, singles, EPs — deduplicated by edition)
100
+ for rg in resp.get('release-groups', []):
+     print(rg['id'], rg['title'], rg.get('primary-type'), rg.get('first-release-date'))
102
+ # 6b47c9a0 A Night at the Opera Album 1975-11-21
103
+ # 002ed683 Sheer Heart Attack Album 1974-11-01
104
+ ```
105
+
106
+ ### Browse releases by artist (full list)
107
+
108
+ ```python
109
+ # Browse API: uses 'artist' param (not 'query') — response key is 'release-count' not 'count'
110
+ resp = json.loads(http_get(
111
+ "https://musicbrainz.org/ws/2/release/"
112
+ "?artist=0383dadf-2a4e-4d10-a46a-e9e041da8eb3&fmt=json&limit=25&offset=0",
113
+ headers=UA
114
+ ))
115
+ print(resp['release-count']) # 1635 — total releases for this artist
116
+ for r in resp['releases']:
+     print(r['id'], r['title'], r.get('date'), r.get('country'), r.get('status'))
+     # Also has: cover-art-archive.artwork (bool), cover-art-archive.front (bool)
+     caa = r.get('cover-art-archive', {})
+     print(caa.get('artwork'), caa.get('front'), caa.get('count'))
121
+
122
+ # Paginate: increment offset by limit
123
+ ```
124
+
125
+ ### Release search and lookup
126
+
127
+ ```python
128
+ # Search by title
129
+ resp = json.loads(http_get(
130
+ "https://musicbrainz.org/ws/2/release/?query=dark+side+of+the+moon&fmt=json&limit=5",
131
+ headers=UA
132
+ ))
133
+ # resp keys: count, offset, releases
134
+
135
+ # Full release with track list, artists, and labels
136
+ release = json.loads(http_get(
137
+ "https://musicbrainz.org/ws/2/release/b84ee12a-09ef-421b-82de-0441a926375b"
138
+ "?inc=artists+recordings+labels+release-groups&fmt=json",
139
+ headers=UA
140
+ ))
141
+ print(release['title']) # The Dark Side of the Moon
142
+ print(release['date']) # 1973-03-24
143
+ print(release['status']) # Official
144
+ print(release['country']) # GB
145
+
146
+ # Release group (the "album concept", deduplicates editions)
147
+ rg = release.get('release-group', {})
148
+ print(rg['title'], rg.get('primary-type'), rg['id'])
149
+ # The Dark Side of the Moon Album f5093c06-23e3-404f-aeaa-40f72885ee3a
150
+
151
+ # Artist credit
152
+ for ac in release.get('artist-credit', []):
+     if isinstance(ac, dict) and 'artist' in ac:
+         print(ac['artist']['name'], ac['artist']['id'])
155
+ # Pink Floyd 83d91898-7763-47d7-b03b-b92132375c47
156
+
157
+ # Labels
158
+ for li in release.get('label-info', []):
+     label = li.get('label', {})
+     print(label.get('name'), li.get('catalog-number'))
161
+ # Harvest SHVL 804
162
+
163
+ # Track list (from media[].tracks[])
164
+ for disc in release.get('media', []):
+     for track in disc.get('tracks', []):
+         dur_s = track['length'] // 1000 if track.get('length') else None
+         rec = track.get('recording', {})
+         print(track['number'], track['title'], dur_s, rec.get('id'))
169
+ # A1 Speak to Me 68s bef3fddb-5aca-49f5-b2fd-d56a23268d63
170
+ # A2 Breathe 168s ecbc7c9b-e79d-4ec8-ac77-44e4a7f7f1b8
171
+ ```
172
+
173
+ ### Recording (track) search
174
+
175
+ ```python
176
+ # Use Lucene field syntax to filter by artist
177
+ resp = json.loads(http_get(
178
+ "https://musicbrainz.org/ws/2/recording/"
179
+ "?query=bohemian+rhapsody+AND+artist:queen&fmt=json&limit=5",
180
+ headers=UA
181
+ ))
182
+ print(resp['count']) # 419
183
+ for r in resp['recordings']:
+     dur_s = r['length'] // 1000 if r.get('length') else None
+     artists = [ac['artist']['name'] for ac in r.get('artist-credit', []) if isinstance(ac, dict)]
+     releases = r.get('releases', [])
+     print(r['id'], r['title'], dur_s, artists, releases[0]['title'] if releases else None)
188
+ # a4803b45 Bohemian Rhapsody 130s ['Queen'] Rhapsody in Red
189
+ # 40212eb6 Bohemian Rhapsody 338s ['Queen'] 1986-07: Wembley Stadium
190
+ ```
191
+
192
+ ### Release-group search (deduplicated albums)
193
+
194
+ ```python
195
+ # Use release-group endpoint to avoid getting every regional edition
196
+ resp = json.loads(http_get(
197
+ "https://musicbrainz.org/ws/2/release-group/"
198
+ "?query=release-group:\"A+Night+at+the+Opera\"+AND+artist:queen&fmt=json&limit=5",
199
+ headers=UA
200
+ ))
201
+ # resp keys: count, release-groups
202
+ for rg in resp.get('release-groups', []):
+     print(rg['id'], rg['title'], rg.get('primary-type'), rg.get('first-release-date'), rg['score'])
204
+ # 6b47c9a0 A Night at the Opera Album 1975-11-21 100
205
+
206
+ # Browse release-groups for an artist
207
+ resp = json.loads(http_get(
208
+ "https://musicbrainz.org/ws/2/release-group/"
209
+ "?artist=0383dadf-2a4e-4d10-a46a-e9e041da8eb3&fmt=json&limit=25",
210
+ headers=UA
211
+ ))
212
+ print(resp['release-group-count']) # 412
213
+ for rg in resp.get('release-groups', []):
+     print(rg['title'], rg.get('primary-type'), rg.get('first-release-date'))
215
+ ```
216
+
217
+ ### Label and work lookups
218
+
219
+ ```python
220
+ # Label search
221
+ resp = json.loads(http_get(
222
+ "https://musicbrainz.org/ws/2/label/?query=EMI&fmt=json&limit=3",
223
+ headers=UA
224
+ ))
225
+ for l in resp['labels']:
+     print(l['id'], l['name'], l.get('type'), l.get('country'), l['score'])
227
+ # c029628b EMI Original Production GB 100
228
+
229
+ # Work (song composition — author-level, not performance-level)
230
+ resp = json.loads(http_get(
231
+ "https://musicbrainz.org/ws/2/work/?query=bohemian+rhapsody&fmt=json&limit=3",
232
+ headers=UA
233
+ ))
234
+ for w in resp['works']:
+     print(w['id'], w['title'], w.get('type'), w['score'])
236
+ # 41c94a08 Bohemian Rhapsody Song 100
237
+ ```
238
+
239
+ ### Cover Art Archive
240
+
241
+ ```python
242
+ # Get cover art for a release MBID
243
+ # 404 if no artwork has been uploaded for that release
244
+ def get_cover_art(release_mbid, size="500"):
+     """
+     size: '250', '500', '1200', or 'full' (original file)
+     Returns the front cover URL, or None if no artwork exists.
+     """
+     try:
+         resp = json.loads(http_get(
+             f"https://coverartarchive.org/release/{release_mbid}",
+             headers=UA
+         ))
+     except Exception:
+         return None  # 404 = no art uploaded
+
+     images = resp.get('images', [])
+     # Prefer an image flagged as front=True
+     front = next((img for img in images if img.get('front')), None)
+     img = front or (images[0] if images else None)
+     if not img:
+         return None
+
+     if size == 'full':
+         return img['image']
+     return img['thumbnails'].get(size) or img['thumbnails'].get('large')
267
+
268
+ # Thumbnail sizes confirmed: '250', '500', '1200', 'small' (=250), 'large' (=500)
269
+
270
+ url = get_cover_art("b84ee12a-09ef-421b-82de-0441a926375b")
271
+ # http://coverartarchive.org/release/b84ee12a.../1611507818-500.jpg
272
+
273
+ # Full images response structure
274
+ resp = json.loads(http_get(
275
+ "https://coverartarchive.org/release/b84ee12a-09ef-421b-82de-0441a926375b",
276
+ headers=UA
277
+ ))
278
+ for img in resp['images']:
+     print(img.get('types'))     # ['Front'], ['Back'], ['Liner'], ['Poster'], ['Medium'], ['Sticker'], ['Other']
+     print(img.get('front'))     # True only for front=True flagged images (not all 'Front' types)
+     print(img.get('approved'))  # True/False
+     print(img['image'])         # full resolution URL
+     print(img['thumbnails'])    # {'small': '...-250.jpg', 'large': '...-500.jpg', '250': ..., '500': ..., '1200': ...}
284
+ ```
285
+
286
+ ### Lucene query syntax for search
287
+
288
+ All search endpoints support Lucene field queries:
289
+
290
+ ```python
291
+ # Field search: artist:, type:, country:, tag:, release:, date:
292
+ resp = json.loads(http_get(
293
+ "https://musicbrainz.org/ws/2/artist/"
294
+ "?query=artist:queen+AND+type:group+AND+country:GB&fmt=json&limit=5",
295
+ headers=UA
296
+ ))
297
+ # count: 23 (exact matches only)
298
+
299
+ # Phrase search with quotes
300
+ resp = json.loads(http_get(
301
+ "https://musicbrainz.org/ws/2/release/"
302
+ '?query=release:"A+Night+at+the+Opera"+AND+artist:queen&fmt=json&limit=5',
303
+ headers=UA
304
+ ))
305
+ ```
306
+
307
+ Common Lucene field names per entity:
308
+ - artist: `artist:`, `type:`, `country:`, `tag:`, `begin:`, `end:`
309
+ - release: `release:`, `artist:`, `date:`, `country:`, `status:`, `label:`, `barcode:`
310
+ - recording: `recording:`, `artist:`, `release:`, `dur:` (milliseconds), `tnum:` (track number)
311
+ - release-group: `release-group:`, `artist:`, `primarytype:`, `secondarytype:`
312
+
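The field names listed above can also be assembled programmatically. The following is a minimal sketch, assuming two hypothetical helpers, `lucene_query` and `search_url`, that are not part of the MusicBrainz API or of `helpers`. It quotes every value as a phrase, joins the clauses with `AND`, and URL-encodes the result with the standard library.

```python
from urllib.parse import urlencode


def lucene_query(**fields):
    """Build a Lucene query such as 'artist:"queen" AND country:"GB"'.

    Hypothetical helper for illustration. Underscores in keyword names
    become hyphens, so release_group= maps to the release-group: field.
    """
    clauses = []
    for name, value in fields.items():
        field = name.replace("_", "-")
        # Escape backslashes and double quotes so the value stays one phrase.
        escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
        clauses.append(f'{field}:"{escaped}"')
    return " AND ".join(clauses)


def search_url(entity, limit=5, **fields):
    """Return a ws/2 search URL for an entity and a set of field filters."""
    params = {"query": lucene_query(**fields), "fmt": "json", "limit": limit}
    return f"https://musicbrainz.org/ws/2/{entity}/?{urlencode(params)}"
```

Passing the returned URL to `http_get` with the `UA` header works the same way as the hand-written URLs above; `urlencode` percent-encodes the colons and quotes and turns spaces into `+`.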
313
+ ### Parallel fetching
314
+
315
+ ```python
316
+ from concurrent.futures import ThreadPoolExecutor
317
+
318
+ UA = {"User-Agent": "browser-harness/1.0 (your@email.com)"}
319
+
320
+ def fetch_artist(mbid):
+     resp = json.loads(http_get(
+         f"https://musicbrainz.org/ws/2/artist/{mbid}?inc=tags&fmt=json",
+         headers=UA
+     ))
+     tags = [t['name'] for t in sorted(resp.get('tags', []), key=lambda x: x['count'], reverse=True)[:3]]
+     return {"name": resp['name'], "type": resp.get('type'), "tags": tags}
+
+ mbids = [
+     "0383dadf-2a4e-4d10-a46a-e9e041da8eb3",  # Queen
+     "83d91898-7763-47d7-b03b-b92132375c47",  # Pink Floyd
+     "678d88b2-87b0-403b-b63d-5da7465aecc3",  # Led Zeppelin
+ ]
+
+ with ThreadPoolExecutor(max_workers=3) as ex:
+     results = list(ex.map(fetch_artist, mbids))
336
+ # 3 artists fetched in ~0.79s total
337
+ ```
338
+
339
+ Tested: 5-6 rapid sequential requests all succeed. Parallel requests at 3x concurrency succeed. Real 429s (rate-limit blocks) are only hit at very high burst rates; if you do get a 429, add `time.sleep(1)` between requests.
340
+
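If 429s do start to appear, spacing requests out can be centralised instead of scattering `time.sleep(1)` calls. The sketch below is an illustration under stated assumptions: `Throttle` is a hypothetical class, not something provided by `helpers`, and it is not thread-safe, so it suits sequential loops rather than the thread pool above. The clock and sleep functions are injectable only so the behaviour can be checked without waiting.

```python
import time


class Throttle:
    """Keep successive calls at least min_interval seconds apart."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call; None before the first

    def wait(self):
        """Sleep just long enough to respect min_interval, then record the call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Typical use is to create one `Throttle(1.0)` and call its `wait()` method immediately before each `http_get`.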
341
+ ### Pagination
342
+
343
+ ```python
344
+ import json
+ import time
+
+ from helpers import http_get
+
+ UA = {"User-Agent": "browser-harness/1.0 (your@email.com)"}
+
+ def browse_all_releases(artist_mbid, page_size=25):
+     """Fetch all releases for an artist across multiple pages."""
+     offset = 0
+     total = None
+     releases = []
+     while total is None or offset < total:
+         resp = json.loads(http_get(
+             f"https://musicbrainz.org/ws/2/release/"
+             f"?artist={artist_mbid}&fmt=json&limit={page_size}&offset={offset}",
+             headers=UA
+         ))
+         total = resp['release-count']
+         batch = resp['releases']
+         if not batch:
+             break  # empty page: stop instead of looping forever
+         releases.extend(batch)
+         offset += len(batch)
+         if offset < total:
+             time.sleep(1)  # stay within 1 req/s for sequential pagination
+     return releases
366
+
367
+ # Queen has 1635 releases — use release-groups (412) to get deduplicated albums
368
+ ```
369
+
370
+ ---
371
+
372
+ ## `inc=` parameter reference
373
+
374
+ Stack multiple `inc=` values with `+` between them.
375
+
376
+ **Artist lookup** (`/ws/2/artist/{mbid}`):
377
+ - `releases` — list of releases (max ~25)
378
+ - `release-groups` — list of release groups (max ~25)
379
+ - `recordings` — list of recordings (max ~25)
380
+ - `works` — list of works
381
+ - `tags` — community genre tags (name + vote count)
382
+ - `ratings` — community rating (value 0-5, votes-count)
383
+ - `aliases` — alternative names and transliterations
384
+ - `annotation` — free-text editorial note
385
+ - `artist-rels`, `release-rels`, `recording-rels`, `work-rels` — relationship data
386
+
387
+ **Release lookup** (`/ws/2/release/{mbid}`):
388
+ - `artists` — full artist-credit objects
389
+ - `recordings` — track list with recording links (populates `media[].tracks[].recording`)
390
+ - `labels` — label-info with catalog numbers
391
+ - `release-groups` — the release group this belongs to
392
+ - `artist-credits` — expanded artist credit with joinphrase
393
+ - `media` — disc/format info (always included in lookup, not needed in `inc=`)
394
+
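Because `inc=` values are simply joined with `+`, lookup URLs can be built from a list. This is a small sketch; `lookup_url` is a hypothetical helper name introduced here for illustration, and it does not validate that a given `inc` value is allowed for the entity.

```python
def lookup_url(entity, mbid, inc=()):
    """Build a ws/2 lookup URL; inc values are joined with '+'."""
    url = f"https://musicbrainz.org/ws/2/{entity}/{mbid}?fmt=json"
    if inc:
        url += "&inc=" + "+".join(inc)
    return url
```

For example, `lookup_url("release", mbid, ["artists", "recordings", "labels"])` produces the same query parameters as the release lookup shown earlier, with `fmt=json` placed first.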
395
+ ---
396
+
397
+ ## Response shapes cheat sheet
398
+
399
+ ```
400
+ # MBID format: standard UUID v4
401
+ "0383dadf-2a4e-4d10-a46a-e9e041da8eb3"
402
+
403
+ # Search response (artist/recording/release/release-group/label/work)
404
+ {
405
+ "count": 1612, # total matches
406
+ "offset": 0,
407
+ "<entity-plural>": [...] # e.g. "artists", "releases", "recordings", "release-groups"
408
+ }
409
+
410
+ # Browse response (using ?artist=MBID or ?label=MBID style)
411
+ {
412
+ "release-count": 1635, # note: key name changes per entity
413
+ "release-offset": 0, # e.g. "release-group-count", "recording-count"
414
+ "releases": [...]
415
+ }
416
+
417
+ # Recording length is always milliseconds
418
+ recording['length'] // 1000 # => seconds
419
+
420
+ # Artist life-span
421
+ life_span = artist['life-span']
422
+ # {'begin': '1970-06-27', 'end': None, 'ended': True}
423
+ # 'ended': True with 'end': None means end date unknown but band is inactive
424
+
425
+ # Artist credit joinphrase (for multi-artist tracks)
426
+ # [{"name": "Simon", "artist": {...}, "joinphrase": " & "}, {"name": "Garfunkel", ...}]
427
+ ```
428
+
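Two of the shapes above are easy to get wrong when displaying results: joining an artist credit with its `joinphrase` values, and converting millisecond lengths. A minimal sketch follows; `credit_string` and `format_length` are hypothetical helper names, and the sample credit mirrors the Simon & Garfunkel shape shown in the cheat sheet.

```python
def credit_string(artist_credit):
    """Join credited names with their joinphrases, e.g. 'Simon & Garfunkel'."""
    return "".join(
        part.get("name", "") + part.get("joinphrase", "")
        for part in artist_credit
        if isinstance(part, dict)
    )


def format_length(ms):
    """Convert a millisecond length to 'M:SS'; return None when it is missing."""
    if ms is None:
        return None
    minutes, seconds = divmod(ms // 1000, 60)
    return f"{minutes}:{seconds:02d}"


credit = [
    {"name": "Simon", "joinphrase": " & "},
    {"name": "Garfunkel", "joinphrase": ""},
]
print(credit_string(credit))   # Simon & Garfunkel
print(format_length(338000))   # 5:38
```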
429
+ ---
430
+
431
+ ## URL patterns
432
+
433
+ | Resource | URL |
434
+ |---|---|
435
+ | Artist search | `https://musicbrainz.org/ws/2/artist/?query={q}&fmt=json&limit=5` |
436
+ | Artist by MBID | `https://musicbrainz.org/ws/2/artist/{mbid}?inc=tags+ratings&fmt=json` |
437
+ | Browse releases by artist | `https://musicbrainz.org/ws/2/release/?artist={mbid}&fmt=json&limit=25&offset=0` |
438
+ | Release search | `https://musicbrainz.org/ws/2/release/?query={q}&fmt=json&limit=5` |
439
+ | Release by MBID | `https://musicbrainz.org/ws/2/release/{mbid}?inc=artists+recordings+labels&fmt=json` |
440
+ | Release-group browse | `https://musicbrainz.org/ws/2/release-group/?artist={mbid}&fmt=json&limit=25` |
441
+ | Recording search | `https://musicbrainz.org/ws/2/recording/?query={q}&fmt=json&limit=5` |
442
+ | Label search | `https://musicbrainz.org/ws/2/label/?query={q}&fmt=json&limit=5` |
443
+ | Work search | `https://musicbrainz.org/ws/2/work/?query={q}&fmt=json&limit=5` |
444
+ | Cover art | `https://coverartarchive.org/release/{release-mbid}` |
445
+
446
+ MusicBrainz entity browser URL (human-readable): `https://musicbrainz.org/artist/{mbid}` (replace `artist` with `release`, `recording`, etc.)
447
+
448
+ ---
449
+
450
+ ## Gotchas
451
+
452
+ - **`User-Agent` is mandatory** — without it you get HTTP 403 instantly. The header must include contact info, e.g. `browser-harness/1.0 (you@example.com)`. The default `http_get` UA (`Mozilla/5.0`) also gets 403.
453
+
454
+ - **Browse vs search response keys differ** — Search responses use `count` and `offset`; Browse responses (with `?artist=MBID`) use `release-count` / `release-offset` (or `release-group-count` etc.). Accessing `data['count']` on a browse response throws `KeyError`.
455
+
456
+ - **`releases` include in artist lookup caps at ~25** — Use the browse endpoint (`?artist=MBID`) with pagination for complete lists. Queen has 1,635 releases total; the `inc=releases` on the artist endpoint only returns ~25.
457
+
458
+ - **Use release-groups to avoid edition explosion** — A popular album can have hundreds of release entries (every country's pressing, every remaster, every format). Use `/ws/2/release-group/` to get one entry per "album concept". Queen's "A Night at the Opera" has 75+ release entries but 1 release-group.
459
+
460
+ - **Recording length is milliseconds** — `recording['length']` is in milliseconds, not seconds. Divide by 1000.
461
+
462
+ - **Sort-name differs from display name for persons** — Artists have both `name` (display: "David Bowie") and `sort-name` (alphabetical: "Bowie, David"). Groups usually have identical values.
463
+
464
+ - **Disambiguation in parentheses** — When multiple entities share a name, MusicBrainz adds a `disambiguation` field to distinguish them (e.g. `"English singer-songwriter"` vs a different David Bowie). Always check `a.get('disambiguation', '')` when resolving artist identity.
465
+
466
+ - **Score 100 does not mean unique** — Search returns `score: 100` for multiple results when several equally match the query. "dark side of the moon" returns 6 results all scored 100 — they're different regional pressings. Filter by `date`, `country`, or `status` to narrow down.
467
+
468
+ - **Recording search: plain query matches titles AND artists broadly** — `?query=bohemian+rhapsody+queen` matches *cover versions* first because "queen" appears in the artist or title of other recordings. Use `AND artist:queen` Lucene syntax to restrict to Queen performances.
469
+
470
+ - **Cover Art Archive returns 404 for releases with no uploaded art** — Check `release['cover-art-archive']['artwork']` (boolean) from any release browse/search response before hitting the CAA endpoint. Saves an extra HTTP round-trip.
471
+
472
+ - **Cover art `front=True` flag vs `types=['Front']`** — A release can have multiple images typed as 'Front' but only one (or none) flagged `front: true`. Always filter on `img.get('front') == True` for the canonical cover, not on `img.get('types') == ['Front']`.
473
+
474
+ - **CAA thumbnail key names** — Both string keys `'small'` (250px) and `'large'` (500px) exist as aliases alongside numeric string keys `'250'`, `'500'`, `'1200'`. Access as `img['thumbnails']['500']` or `img['thumbnails']['large']` — both work.
475
+
476
+ - **Rate limit: 1 req/s unauthenticated** — In practice, bursts of 5-6 sequential requests succeed without throttling. True 429s appear at higher rates. For sequential pagination loops, add `time.sleep(1)` between pages. For parallel fetching, limit concurrency to 3-5 workers.
477
+
478
+ - **`fmt=json` required** — Omitting it returns XML instead of JSON. Always append `&fmt=json` to every request.
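
The browse-versus-search key difference noted in the gotchas can be hidden behind one accessor. The sketch below assumes a hypothetical helper, `total_count`, and assumes that a browse response carries exactly one top-level key ending in `-count`, which matches the shapes shown in this document but is not verified for every entity.

```python
def total_count(resp):
    """Return the total number of matches from a search or browse response."""
    if "count" in resp:  # search responses use a plain "count" key
        return resp["count"]
    for key, value in resp.items():  # browse responses: "release-count", ...
        if key.endswith("-count"):
            return value
    raise KeyError("no count key found in response")
```

With this, pagination code can use `total_count(resp)` regardless of whether the page came from a `?query=` search or an `?artist=MBID` browse.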