@pencil-agent/nano-pencil 2.0.1 → 2.0.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (186)
  1. package/README.md +267 -267
  2. package/dist/build-meta.json +3 -3
  3. package/dist/core/export-html/AGENT.md +11 -11
  4. package/dist/core/export-html/template.css +971 -971
  5. package/dist/core/export-html/template.html +54 -54
  6. package/dist/core/model/custom-providers.js +1 -1
  7. package/dist/core/model-registry.js +5 -5
  8. package/dist/extensions/builtin/AGENT.md +115 -115
  9. package/dist/extensions/builtin/browser/AGENT.md +17 -17
  10. package/dist/extensions/builtin/browser/agent-workspace/agent_helpers.py +12 -12
  11. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/amazon/product-search.md +198 -198
  12. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/archive-org/scraping.md +341 -341
  13. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/arxiv/scraping.md +311 -311
  14. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/arxiv-bulk/scraping.md +333 -333
  15. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/atlas/overview.md +70 -70
  16. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/booking-com/scraping.md +578 -578
  17. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/capterra/scraping.md +440 -440
  18. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/centilebrain/generate-estimates.md +110 -110
  19. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/coingecko/scraping.md +325 -325
  20. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/coinmarketcap/scraping.md +463 -463
  21. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/coursera/scraping.md +360 -360
  22. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/craigslist/scraping.md +390 -390
  23. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/crossref/scraping.md +568 -568
  24. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/dev-to/scraping.md +323 -323
  25. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/duckduckgo/scraping.md +349 -349
  26. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/ebay/scraping.md +435 -435
  27. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/etsy/scraping.md +506 -506
  28. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/eventbrite/scraping.md +363 -363
  29. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/expedia/automation.md +168 -168
  30. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/facebook/groups.md +236 -236
  31. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/facebook/pages.md +295 -295
  32. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/framer/editor.md +108 -108
  33. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/fred/scraping.md +493 -493
  34. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/g2/scraping.md +580 -580
  35. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/genius/scraping.md +511 -511
  36. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/github/repo-actions.md +65 -65
  37. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/github/scraping.md +184 -184
  38. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/glassdoor/scraping.md +543 -543
  39. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/gmail/compose.md +122 -122
  40. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/goodreads/scraping.md +461 -461
  41. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/gutenberg/scraping.md +383 -383
  42. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/hackernews/scraping.md +243 -243
  43. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/howlongtobeat/scraping.md +473 -473
  44. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/imdb/scraping.md +271 -271
  45. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/itch-io/scraping.md +436 -436
  46. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/job-boards/indeed-glassdoor.md +1021 -1021
  47. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/letterboxd/scraping.md +349 -349
  48. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/linkedin/invitation-manager.md +109 -109
  49. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/loom/folder-enumeration.md +170 -170
  50. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/macrotrends/scraping.md +537 -537
  51. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/medium/article-hydration.md +120 -120
  52. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/medium/scraping.md +414 -414
  53. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/metacritic/scraping.md +477 -477
  54. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/musicbrainz/scraping.md +478 -478
  55. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/nasa/scraping.md +339 -339
  56. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/news-aggregation/multi-source.md +205 -205
  57. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/open-library/scraping.md +472 -472
  58. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/openalex/scraping.md +470 -470
  59. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/openstreetmap/scraping.md +490 -490
  60. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/package-registries/npm-pypi.md +478 -478
  61. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/polymarket/scraping.md +234 -234
  62. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/producthunt/scraping.md +307 -307
  63. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/pubmed/scraping.md +421 -421
  64. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/quora/scraping.md +364 -364
  65. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/rawg/scraping.md +352 -352
  66. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/reddit/scraping.md +124 -124
  67. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/rest-countries/scraping.md +233 -233
  68. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/sec-edgar/scraping.md +361 -361
  69. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/README.md +36 -36
  70. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/embedded-apps.md +72 -72
  71. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/knowledge-base.md +109 -109
  72. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/polaris-inputs.md +137 -137
  73. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/soundcloud/scraping.md +362 -362
  74. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/spotify/scraping.md +339 -339
  75. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/stackoverflow/scraping.md +435 -435
  76. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/steam/scraping.md +575 -575
  77. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/substack/scraping.md +338 -338
  78. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/thetechgeeks/pricing.md +52 -52
  79. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/tiktok/upload.md +107 -107
  80. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/tradingview/scraping.md +309 -309
  81. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/trello/boards-and-lists.md +88 -88
  82. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/trustpilot/scraping.md +375 -375
  83. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/walmart/scraping.md +444 -444
  84. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/wayback-machine/scraping.md +306 -306
  85. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/weather/scraping.md +398 -398
  86. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/wellfound/scraping.md +596 -596
  87. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/world-bank/scraping.md +356 -356
  88. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/xiaohongshu/scraping.md +84 -84
  89. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/youtube/scraping.md +418 -418
  90. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/zillow/scraping.md +433 -433
  91. package/dist/extensions/builtin/browser/browser.md +73 -73
  92. package/dist/extensions/builtin/browser/install.md +142 -142
  93. package/dist/extensions/builtin/browser/interaction-skills/connection.md +48 -48
  94. package/dist/extensions/builtin/browser/interaction-skills/cookies.md +3 -3
  95. package/dist/extensions/builtin/browser/interaction-skills/cross-origin-iframes.md +3 -3
  96. package/dist/extensions/builtin/browser/interaction-skills/dialogs.md +64 -64
  97. package/dist/extensions/builtin/browser/interaction-skills/downloads.md +3 -3
  98. package/dist/extensions/builtin/browser/interaction-skills/drag-and-drop.md +3 -3
  99. package/dist/extensions/builtin/browser/interaction-skills/dropdowns.md +3 -3
  100. package/dist/extensions/builtin/browser/interaction-skills/iframes.md +3 -3
  101. package/dist/extensions/builtin/browser/interaction-skills/network-requests.md +3 -3
  102. package/dist/extensions/builtin/browser/interaction-skills/print-as-pdf.md +3 -3
  103. package/dist/extensions/builtin/browser/interaction-skills/profile-sync.md +90 -90
  104. package/dist/extensions/builtin/browser/interaction-skills/screenshots.md +17 -17
  105. package/dist/extensions/builtin/browser/interaction-skills/scrolling.md +3 -3
  106. package/dist/extensions/builtin/browser/interaction-skills/shadow-dom.md +3 -3
  107. package/dist/extensions/builtin/browser/interaction-skills/tabs.md +69 -69
  108. package/dist/extensions/builtin/browser/interaction-skills/uploads.md +1 -1
  109. package/dist/extensions/builtin/browser/interaction-skills/viewport.md +3 -3
  110. package/dist/extensions/builtin/browser/src/browser_harness/AGENT.md +15 -15
  111. package/dist/extensions/builtin/browser/src/browser_harness/__init__.py +8 -8
  112. package/dist/extensions/builtin/browser/src/browser_harness/_ipc.py +90 -90
  113. package/dist/extensions/builtin/browser/src/browser_harness/admin.py +722 -722
  114. package/dist/extensions/builtin/browser/src/browser_harness/daemon.py +328 -328
  115. package/dist/extensions/builtin/browser/src/browser_harness/helpers.py +396 -396
  116. package/dist/extensions/builtin/browser/src/browser_harness/run.py +103 -103
  117. package/dist/extensions/builtin/discipline/skills/brainstorming/SKILL.md +33 -33
  118. package/dist/extensions/builtin/discipline/skills/executing-plans/SKILL.md +25 -25
  119. package/dist/extensions/builtin/discipline/skills/finishing-development-branch/SKILL.md +25 -25
  120. package/dist/extensions/builtin/discipline/skills/receiving-code-review/SKILL.md +22 -22
  121. package/dist/extensions/builtin/discipline/skills/requesting-code-review/SKILL.md +31 -31
  122. package/dist/extensions/builtin/discipline/skills/systematic-debugging/SKILL.md +28 -28
  123. package/dist/extensions/builtin/discipline/skills/test-driven-development/SKILL.md +32 -32
  124. package/dist/extensions/builtin/discipline/skills/using-git-worktrees/SKILL.md +25 -25
  125. package/dist/extensions/builtin/discipline/skills/verification-before-completion/SKILL.md +27 -27
  126. package/dist/extensions/builtin/discipline/skills/writing-plans/SKILL.md +26 -26
  127. package/dist/extensions/builtin/goal/README.md +67 -67
  128. package/dist/extensions/builtin/grub/README.md +112 -112
  129. package/dist/extensions/builtin/link-world/agent-workspace/README.md +16 -16
  130. package/dist/extensions/builtin/link-world/internet-search/internet-search.md +65 -65
  131. package/dist/extensions/builtin/link-world/link-world-agent.md +82 -82
  132. package/dist/extensions/builtin/link-world/linkworld.md +313 -313
  133. package/dist/extensions/builtin/link-world/network-routing/network-routing.md +67 -67
  134. package/dist/extensions/builtin/loop/README.md +92 -92
  135. package/dist/extensions/builtin/mcp/figma-design.md +68 -68
  136. package/dist/extensions/builtin/mcp/mcp-management.md +85 -85
  137. package/dist/extensions/builtin/recap/AGENT.md +15 -15
  138. package/dist/extensions/builtin/sal/README.md +72 -72
  139. package/dist/extensions/builtin/security-audit/README.md +289 -289
  140. package/dist/extensions/builtin/team/AGENT.md +112 -112
  141. package/dist/extensions/builtin/team/TESTING.md +299 -299
  142. package/dist/extensions/builtin/token-save/README.md +56 -56
  143. package/dist/extensions/optional/AGENT.md +10 -10
  144. package/dist/modes/interactive/controllers/input-submit-controller.js +2 -2
  145. package/dist/modes/interactive/controllers/stream-render-controller.js +2 -2
  146. package/dist/modes/interactive/interactive-mode.js +19 -19
  147. package/dist/modes/interactive/theme/dark.json +85 -85
  148. package/dist/modes/interactive/theme/light.json +84 -84
  149. package/dist/modes/interactive/theme/theme-schema.json +335 -335
  150. package/dist/modes/interactive/theme/warm.json +81 -81
  151. package/dist/node_modules/@pencil-agent/ai/dist/cli.js +0 -0
  152. package/dist/node_modules/@pencil-agent/ai/dist/models.generated.js +1 -1
  153. package/docs/ACP/345/215/217/350/256/256/351/233/206/346/210/220/345/274/200/345/217/221/346/226/207/346/241/243.md +851 -0
  154. package/docs/SDK-TESTING.md +364 -0
  155. package/docs/codex-goal-command-impl.md +1055 -1055
  156. package/docs/codex-goal-vs-grub.md +500 -500
  157. package/docs/custom-provider.md +27 -27
  158. package/docs/extensions.md +27 -27
  159. package/docs/keybindings.md +27 -27
  160. package/docs/loop 重构完成总结.md +250 -250
  161. package/docs/loop 重构完成报告.md +122 -122
  162. package/docs/loop 重构方案.md +1222 -1222
  163. package/docs/loop 重构方案实现报告.md +158 -158
  164. package/docs/loop 重构方案对比分析.md +128 -128
  165. package/docs/loop 重构计划.md +320 -320
  166. package/docs/loop-usage-examples.md +214 -214
  167. package/docs/mem-core/346/212/200/346/234/257/346/226/207/346/241/243.md +593 -0
  168. package/docs/models.md +27 -27
  169. package/docs/packages.md +27 -27
  170. package/docs/pi-design-philosophy.md +457 -457
  171. package/docs/planmode.md +1987 -1987
  172. package/docs/prompt-templates.md +27 -27
  173. package/docs/providers.md +27 -27
  174. package/docs/sdk.md +27 -27
  175. package/docs/skills.md +27 -27
  176. package/docs/startup-performance-optimization.md +301 -0
  177. package/docs/themes.md +27 -27
  178. package/docs/tui.md +27 -27
  179. package/docs/认知地图.md +47 -0
  180. package/package.json +190 -190
  181. package/docs/cc-agent-design.md +0 -1297
  182. package/docs/cc-tui-design.md +0 -1333
  183. package/docs/nanoPencil-学习计划.md +0 -170
  184. package/docs/scan-report.md +0 -3820
  185. package/docs//345/257/271/346/240/207Claude-Code.md +0 -1775
  186. package/docs//351/230/277/351/207/214/345/267/264/345/267/264/350/264/242/346/212/245/345/210/206/346/236/220/344/271/246.md +0 -261
@@ -1,478 +1,478 @@
# MusicBrainz: Data Extraction

`https://musicbrainz.org` is an open music encyclopedia with a fully free JSON API.
No authentication is required for reads, and no browser is needed for any documented workflow.

Field-tested against musicbrainz.org on 2026-04-18.

---

## Do this first

**The MusicBrainz Web Service API (ws/2) returns clean JSON for every entity type, so no browser is needed.**

```python
from helpers import http_get
import json

# REQUIRED: every request must include this header or you get HTTP 403
UA = {"User-Agent": "browser-harness/1.0 (your@email.com)"}

data = json.loads(http_get("https://musicbrainz.org/ws/2/artist/?query=queen&fmt=json&limit=5", headers=UA))
for a in data['artists']:
    print(a['id'], a['name'], a.get('type'), a.get('country'), a['score'])
# 0383dadf-2a4e-4d10-a46a-e9e041da8eb3 Queen Group GB 100
# 79239441-bfd5-4981-a70c-55c3f15c1287 Madonna Person US 73
```

`User-Agent` is **mandatory**: omitting it returns HTTP 403 immediately. Use the format `AppName/Version (contact@email.com)`.

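The examples below build URLs by pasting query strings together by hand. Here is a minimal sketch of doing the same with the standard library, so spaces, quotes, and colons in a Lucene query get percent-encoded for you. `user_agent` and `mb_url` are hypothetical helper names (they are not part of `helpers`). The sketch assumes list-valued parameters such as `inc=` are space-separated names, which `urlencode` writes as `+`, matching the URLs used throughout this file.

```python
from urllib.parse import urlencode

BASE = "https://musicbrainz.org/ws/2"

def user_agent(app: str, version: str, contact: str) -> dict:
    """Build the mandatory header in the AppName/Version (contact@email.com) format."""
    return {"User-Agent": f"{app}/{version} ({contact})"}

def mb_url(entity: str, mbid: str = "", **params) -> str:
    """Build a ws/2 search, lookup, or browse URL with fmt=json always set.

    List values (e.g. inc=["tags", "ratings"]) are joined with spaces, which
    urlencode writes as '+' (inc=tags+ratings). Characters such as ':' and '"'
    are percent-encoded.
    """
    query = {"fmt": "json"}
    for key, value in params.items():
        if isinstance(value, (list, tuple)):
            value = " ".join(value)
        query[key] = value
    # An empty mbid yields ".../artist/?..." (search/browse); a real one yields a lookup URL
    return f"{BASE}/{entity}/{mbid}?{urlencode(query)}"
```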
---

## Entity types

| Entity | Endpoint | Key fields |
|---|---|---|
| `artist` | `/ws/2/artist/` | name, sort-name, type (Group/Person/Orchestra/Choir), country, life-span, tags, rating |
| `release-group` | `/ws/2/release-group/` | title, primary-type (Album/Single/EP/Other), first-release-date |
| `release` | `/ws/2/release/` | title, date, country, status (Official/Bootleg/Promotional), barcode, label-info, media |
| `recording` | `/ws/2/recording/` | title, length (milliseconds), artist-credit, releases |
| `label` | `/ws/2/label/` | name, type, country, area |
| `work` | `/ws/2/work/` | title, type (Song/Aria/Soundtrack/etc.), relations |

All entities share the same MBID (MusicBrainz ID) format: UUID v4, e.g. `0383dadf-2a4e-4d10-a46a-e9e041da8eb3`.

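Before interpolating an ID into a lookup URL such as `/ws/2/artist/{mbid}`, it can help to check its shape locally. Below is a small sketch using Python's `uuid` module. `is_mbid` is a hypothetical helper, and it checks only the canonical hyphenated 36-character form, not the UUID version.

```python
import uuid

def is_mbid(value: str) -> bool:
    """Return True if value is a canonical, hyphenated 36-character UUID string."""
    if not isinstance(value, str) or len(value) != 36:
        return False
    try:
        parsed = uuid.UUID(value)
    except ValueError:
        return False
    # uuid.UUID also accepts braces and missing hyphens; require the canonical form
    return str(parsed) == value.lower()
```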
---

## Common workflows

### Artist search

```python
from helpers import http_get
import json

UA = {"User-Agent": "browser-harness/1.0 (your@email.com)"}

resp = json.loads(http_get(
    "https://musicbrainz.org/ws/2/artist/?query=queen&fmt=json&limit=5",
    headers=UA
))
# resp keys: count (total matches), offset, artists (list)
for a in resp['artists']:
    print(a['id'])                      # MBID: 0383dadf-2a4e-4d10-a46a-e9e041da8eb3
    print(a['name'])                    # Queen
    print(a['sort-name'])               # Queen (differs for persons: "Bowie, David")
    print(a.get('type'))                # Group / Person / Orchestra / Choir
    print(a.get('country'))             # GB
    print(a.get('life-span'))           # {'begin': '1970-06-27', 'end': None, 'ended': True}
    print(a.get('disambiguation', ''))  # e.g. "English singer-songwriter"
    print(a['score'])                   # relevance 0-100
```
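The `life-span` dict is easy to misread when `end` is missing. Here is a sketch that renders it as text, following the rule stated in the response cheat sheet below (`ended: True` with no `end` means the artist is inactive but the end date is unknown). `describe_life_span` is a hypothetical helper, and treating `ended: False` as "present" is an assumption.

```python
def describe_life_span(life_span):
    """Render an artist 'life-span' dict as 'begin to end' text."""
    life_span = life_span or {}
    begin = life_span.get("begin") or "?"
    end = life_span.get("end")
    if end:
        return f"{begin} to {end}"
    if life_span.get("ended"):
        # Ended, but no end date is recorded
        return f"{begin} to unknown (ended)"
    # Not marked as ended: assumed to be still active
    return f"{begin} to present"
```

Inside the loop above, `describe_life_span(a.get('life-span'))` would print `1970-06-27 to unknown (ended)` for the Queen record shown.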

### Artist by MBID (with related data via `inc=`)

```python
# inc= values stack with + between them
resp = json.loads(http_get(
    "https://musicbrainz.org/ws/2/artist/0383dadf-2a4e-4d10-a46a-e9e041da8eb3"
    "?inc=releases+tags+ratings+release-groups&fmt=json",
    headers=UA
))
print(resp['name'])        # Queen
print(resp['type'])        # Group
print(resp['country'])     # GB
print(resp['life-span'])   # {'begin': '1970-06-27', 'end': None, 'ended': True}

# Tags (community-voted genre labels), sorted here by vote count
tags = sorted(resp.get('tags', []), key=lambda x: x['count'], reverse=True)
print([t['name'] for t in tags[:5]])
# ['rock', 'glam rock', 'hard rock', 'art rock', 'british']

# Rating (community score, 0-5)
print(resp.get('rating'))  # {'votes-count': 43, 'value': 4.7}

# Direct releases (up to 25 per request; use the browse API for the full list)
for r in resp.get('releases', []):
    print(r['id'], r['title'], r.get('date'))

# Release groups (albums, singles, EPs; editions are collapsed into one group)
for rg in resp.get('release-groups', []):
    print(rg['id'], rg['title'], rg.get('primary-type'), rg.get('first-release-date'))
# 6b47c9a0 A Night at the Opera Album 1975-11-21
# 002ed683 Sheer Heart Attack Album 1974-11-01
```

### Browse releases by artist (full list)

```python
# Browse API: filter with the 'artist' param (not 'query'); the total is under 'release-count', not 'count'
resp = json.loads(http_get(
    "https://musicbrainz.org/ws/2/release/"
    "?artist=0383dadf-2a4e-4d10-a46a-e9e041da8eb3&fmt=json&limit=25&offset=0",
    headers=UA
))
print(resp['release-count'])  # 1635: total releases for this artist
for r in resp['releases']:
    print(r['id'], r['title'], r.get('date'), r.get('country'), r.get('status'))
    # Each release also carries cover-art-archive.artwork (bool), .front (bool) and .count
    caa = r.get('cover-art-archive', {})
    print(caa.get('artwork'), caa.get('front'), caa.get('count'))

# To paginate, increase offset by limit on each request
```

### Release search and lookup

```python
# Search by title
resp = json.loads(http_get(
    "https://musicbrainz.org/ws/2/release/?query=dark+side+of+the+moon&fmt=json&limit=5",
    headers=UA
))
# resp keys: count, offset, releases

# Full release with track list, artists, and labels
release = json.loads(http_get(
    "https://musicbrainz.org/ws/2/release/b84ee12a-09ef-421b-82de-0441a926375b"
    "?inc=artists+recordings+labels+release-groups&fmt=json",
    headers=UA
))
print(release['title'])    # The Dark Side of the Moon
print(release['date'])     # 1973-03-24
print(release['status'])   # Official
print(release['country'])  # GB

# Release group (the "album concept" that groups all editions)
rg = release.get('release-group', {})
print(rg['title'], rg.get('primary-type'), rg['id'])
# The Dark Side of the Moon Album f5093c06-23e3-404f-aeaa-40f72885ee3a

# Artist credit
for ac in release.get('artist-credit', []):
    if isinstance(ac, dict) and 'artist' in ac:
        print(ac['artist']['name'], ac['artist']['id'])
# Pink Floyd 83d91898-7763-47d7-b03b-b92132375c47

# Labels ('label' can be null when no label is recorded)
for li in release.get('label-info', []):
    label = li.get('label') or {}
    print(label.get('name'), li.get('catalog-number'))
# Harvest SHVL 804

# Track list (from media[].tracks[]); length is in milliseconds
for disc in release.get('media', []):
    for track in disc.get('tracks', []):
        dur_s = track['length'] // 1000 if track.get('length') else None
        rec = track.get('recording', {})
        print(track['number'], track['title'], dur_s, rec.get('id'))
# A1 Speak to Me 68 bef3fddb-5aca-49f5-b2fd-d56a23268d63
# A2 Breathe 168 ecbc7c9b-e79d-4ec8-ac77-44e4a7f7f1b8
```
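The loop above prints whole seconds. Here is a sketch that flattens `media[].tracks[]` into rows and formats lengths as `m:ss`. `tracklist_rows` and `format_length` are hypothetical helpers. They round to the nearest second, whereas the loop above truncates with `// 1000`, so the two can differ by one second.

```python
def format_length(ms):
    """Format a MusicBrainz length in milliseconds as m:ss, or '' if unknown."""
    if not ms:
        return ""
    total_s = (ms + 500) // 1000  # round half up to whole seconds
    return f"{total_s // 60}:{total_s % 60:02d}"

def tracklist_rows(release):
    """Flatten a release lookup into (disc, number, title, length, recording_id) rows."""
    rows = []
    for disc_no, disc in enumerate(release.get("media", []), start=1):
        for track in disc.get("tracks", []):
            rec = track.get("recording") or {}
            rows.append((
                disc.get("position", disc_no),  # fall back to list order if position is absent
                track.get("number"),
                track.get("title"),
                format_length(track.get("length")),
                rec.get("id"),
            ))
    return rows
```

Pass it the `release` dict from the lookup above, e.g. `for row in tracklist_rows(release): print(*row)`.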

### Recording (track) search

```python
# Use Lucene field syntax to filter by artist
resp = json.loads(http_get(
    "https://musicbrainz.org/ws/2/recording/"
    "?query=bohemian+rhapsody+AND+artist:queen&fmt=json&limit=5",
    headers=UA
))
print(resp['count'])  # 419
for r in resp['recordings']:
    dur_s = r['length'] // 1000 if r.get('length') else None
    artists = [ac['artist']['name'] for ac in r.get('artist-credit', []) if isinstance(ac, dict) and 'artist' in ac]
    releases = r.get('releases', [])
    print(r['id'], r['title'], dur_s, artists, releases[0]['title'] if releases else None)
# a4803b45 Bohemian Rhapsody 130 ['Queen'] Rhapsody in Red
# 40212eb6 Bohemian Rhapsody 338 ['Queen'] 1986-07: Wembley Stadium
```

### Release-group search (deduplicated albums)

```python
# Query the release-group endpoint to avoid getting every regional edition
# (the search field for the title is 'releasegroup', without a hyphen)
resp = json.loads(http_get(
    "https://musicbrainz.org/ws/2/release-group/"
    "?query=releasegroup:\"A+Night+at+the+Opera\"+AND+artist:queen&fmt=json&limit=5",
    headers=UA
))
# resp keys: count, offset, release-groups
for rg in resp.get('release-groups', []):
    print(rg['id'], rg['title'], rg.get('primary-type'), rg.get('first-release-date'), rg['score'])
# 6b47c9a0 A Night at the Opera Album 1975-11-21 100

# Browse release groups for an artist
resp = json.loads(http_get(
    "https://musicbrainz.org/ws/2/release-group/"
    "?artist=0383dadf-2a4e-4d10-a46a-e9e041da8eb3&fmt=json&limit=25",
    headers=UA
))
print(resp['release-group-count'])  # 412
for rg in resp.get('release-groups', []):
    print(rg['title'], rg.get('primary-type'), rg.get('first-release-date'))
```
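Release groups still mix studio albums with live albums, compilations, and singles. Here is a heuristic sketch for narrowing a browse result to studio albums: keep groups whose `primary-type` is `Album` and whose `secondary-types` list is empty. `studio_albums` is a hypothetical helper, "studio album" is not an official MusicBrainz type, and a missing `secondary-types` key is treated as empty.

```python
def studio_albums(release_groups):
    """Keep Album groups with no secondary types, oldest first (undated groups last)."""
    albums = [
        rg for rg in release_groups
        if rg.get("primary-type") == "Album" and not rg.get("secondary-types")
    ]
    # first-release-date is 'YYYY', 'YYYY-MM', 'YYYY-MM-DD' or '' when unknown;
    # these ISO-style strings sort correctly as text
    return sorted(
        albums,
        key=lambda rg: (not rg.get("first-release-date"), rg.get("first-release-date") or ""),
    )
```

Applied to the browse response above: `studio_albums(resp.get('release-groups', []))`.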

### Label and work lookups

```python
# Label search
resp = json.loads(http_get(
    "https://musicbrainz.org/ws/2/label/?query=EMI&fmt=json&limit=3",
    headers=UA
))
for lab in resp['labels']:
    print(lab['id'], lab['name'], lab.get('type'), lab.get('country'), lab['score'])
# c029628b EMI Original Production GB 100

# Work (the song as a composition, not a specific performance)
resp = json.loads(http_get(
    "https://musicbrainz.org/ws/2/work/?query=bohemian+rhapsody&fmt=json&limit=3",
    headers=UA
))
for w in resp['works']:
    print(w['id'], w['title'], w.get('type'), w['score'])
# 41c94a08 Bohemian Rhapsody Song 100
```

### Cover Art Archive

```python
# Get cover art for a release MBID.
# The archive returns 404 if no artwork has been uploaded for that release.
def get_cover_art(release_mbid, size="500"):
    """
    size: '250', '500', '1200', or 'full' (original file)
    Returns the front cover URL, or None if no artwork exists.
    """
    try:
        resp = json.loads(http_get(
            f"https://coverartarchive.org/release/{release_mbid}",
            headers=UA
        ))
    except Exception:
        return None  # 404 = no art uploaded (note: this also swallows network errors)

    images = resp.get('images', [])
    # Prefer the image flagged front=True, else fall back to the first image
    front = next((img for img in images if img.get('front')), None)
    img = front or (images[0] if images else None)
    if not img:
        return None

    if size == 'full':
        return img['image']
    return img['thumbnails'].get(size) or img['thumbnails'].get('large')

# Thumbnail keys confirmed: '250', '500', '1200', 'small' (=250), 'large' (=500)

url = get_cover_art("b84ee12a-09ef-421b-82de-0441a926375b")
# http://coverartarchive.org/release/b84ee12a.../1611507818-500.jpg

# Full images response structure
resp = json.loads(http_get(
    "https://coverartarchive.org/release/b84ee12a-09ef-421b-82de-0441a926375b",
    headers=UA
))
for img in resp['images']:
    print(img.get('types'))     # ['Front'], ['Back'], ['Liner'], ['Poster'], ['Medium'], ['Sticker'], ['Other']
    print(img.get('front'))     # True only for the image flagged as the main front cover; other 'Front'-typed images may be False
    print(img.get('approved'))  # True/False
    print(img['image'])         # full-resolution URL
    print(img['thumbnails'])    # {'small': '...-250.jpg', 'large': '...-500.jpg', '250': ..., '500': ..., '1200': ...}
```
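If you only need the front image, the Cover Art Archive also serves fixed per-release redirect URLs (`/release/{mbid}/front`, plus `/front-250`, `/front-500` and `/front-1200`), which skip the JSON index. Below is a sketch that builds them; `front_cover_url` is a hypothetical helper. Such a URL redirects to the image or returns 404, so you give up the `front`/`approved` metadata that the JSON listing provides.

```python
from typing import Optional

CAA = "https://coverartarchive.org"
THUMB_SIZES = ("250", "500", "1200")

def front_cover_url(release_mbid: str, size: Optional[str] = None) -> str:
    """Build a Cover Art Archive front-cover redirect URL for a release.

    size=None requests the original image; '250', '500' or '1200' request a thumbnail.
    """
    if size is not None and size not in THUMB_SIZES:
        raise ValueError(f"size must be one of {THUMB_SIZES} or None")
    suffix = f"front-{size}" if size else "front"
    return f"{CAA}/release/{release_mbid}/{suffix}"
```

A 404 from this URL means the same thing as a 404 from the JSON endpoint: no front cover has been uploaded.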

### Lucene query syntax for search

All search endpoints support Lucene field queries:

```python
# Field search: artist:, type:, country:, tag:, release:, date:
resp = json.loads(http_get(
    "https://musicbrainz.org/ws/2/artist/"
    "?query=artist:queen+AND+type:group+AND+country:GB&fmt=json&limit=5",
    headers=UA
))
# count: 23 (only artists matching all three clauses)

# Phrase search with quotes
resp = json.loads(http_get(
    "https://musicbrainz.org/ws/2/release/"
    '?query=release:"A+Night+at+the+Opera"+AND+artist:queen&fmt=json&limit=5',
    headers=UA
))
```

Common Lucene field names per entity:
- artist: `artist:`, `type:`, `country:`, `tag:`, `begin:`, `end:`
- release: `release:`, `artist:`, `date:`, `country:`, `status:`, `label:`, `barcode:`
- recording: `recording:`, `artist:`, `release:`, `dur:` (milliseconds), `tnum:` (track number)
- release-group: `releasegroup:`, `artist:`, `primarytype:`, `secondarytype:`

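User-supplied text can contain characters that Lucene treats as syntax (`AC/DC`, titles with colons or parentheses). Here is a sketch of escaping them, and of building an `AND`ed field query with each value quoted as a phrase. `lucene_escape`, `lucene_phrase`, and `field_query` are hypothetical helpers. The escaped character set mirrors the standard Lucene query parser, and the sketch assumes MusicBrainz search honours the same escaping.

```python
import re

# Single characters the standard Lucene query parser treats as syntax
_LUCENE_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/&|])')

def lucene_escape(term: str) -> str:
    """Backslash-escape Lucene syntax characters in an unquoted term."""
    return _LUCENE_SPECIAL.sub(r"\\\1", term)

def lucene_phrase(text: str) -> str:
    """Quote text as a phrase; inside quotes only backslash and '"' need escaping."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'

def field_query(**fields: str) -> str:
    """AND together field:"phrase" clauses, in keyword-argument order."""
    return " AND ".join(f"{name}:{lucene_phrase(value)}" for name, value in fields.items())
```

URL-encode the result before appending it as `query=`, for example `urllib.parse.quote_plus(field_query(release="A Night at the Opera", artist="queen"))`. Field names containing hyphens cannot be passed as keyword arguments, which is not a problem for the hyphen-free names listed above.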
### Parallel fetching

```python
from concurrent.futures import ThreadPoolExecutor
from helpers import http_get
import json

UA = {"User-Agent": "browser-harness/1.0 (your@email.com)"}

def fetch_artist(mbid):
    resp = json.loads(http_get(
        f"https://musicbrainz.org/ws/2/artist/{mbid}?inc=tags&fmt=json",
        headers=UA
    ))
    tags = [t['name'] for t in sorted(resp.get('tags', []), key=lambda x: x['count'], reverse=True)[:3]]
    return {"name": resp['name'], "type": resp.get('type'), "tags": tags}

mbids = [
    "0383dadf-2a4e-4d10-a46a-e9e041da8eb3",  # Queen
    "83d91898-7763-47d7-b03b-b92132375c47",  # Pink Floyd
    "678d88b2-87b0-403b-b63d-5da7465aecc3",  # Led Zeppelin
]

with ThreadPoolExecutor(max_workers=3) as ex:
    results = list(ex.map(fetch_artist, mbids))
# 3 artists fetched in ~0.79s total
```

Tested: 5-6 rapid sequential requests all succeed, and parallel requests at 3x concurrency succeed. Rate-limit rejections only appeared at very high burst rates. MusicBrainz documents an average limit of one request per second per IP address and answers excess requests with HTTP 503 (treat a 429 the same way); if you hit either, add `time.sleep(1)` between requests.

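For longer batch jobs, a shared limiter keeps requests near the documented one-per-second average even when several threads are fetching. Here is a sketch of a thread-safe minimum-interval limiter. `MinIntervalLimiter` is a hypothetical class, and `clock`/`sleep` are injectable only so it can be tested without waiting.

```python
import threading
import time

class MinIntervalLimiter:
    """Let at most one call start every `interval` seconds, across all threads."""

    def __init__(self, interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self._clock = clock
        self._sleep = sleep
        self._lock = threading.Lock()
        self._next_slot = 0.0

    def wait(self):
        """Reserve the next free time slot, then sleep until it arrives."""
        with self._lock:
            now = self._clock()
            slot = max(now, self._next_slot)
            self._next_slot = slot + self.interval
        delay = slot - now
        if delay > 0:
            self._sleep(delay)  # sleep outside the lock so other threads can reserve slots
```

Create one instance, e.g. `limiter = MinIntervalLimiter(1.0)`, and call `limiter.wait()` at the top of `fetch_artist` before `http_get`. Requests then start at most once per second regardless of `max_workers`, so the thread pool only overlaps the time spent waiting on responses.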
### Pagination

```python
import json
import time
from helpers import http_get

UA = {"User-Agent": "browser-harness/1.0 (your@email.com)"}

def browse_all_releases(artist_mbid, page_size=25):
    """Fetch all releases for an artist across multiple pages."""
    offset = 0
    total = None
    releases = []
    while total is None or offset < total:
        resp = json.loads(http_get(
            f"https://musicbrainz.org/ws/2/release/"
            f"?artist={artist_mbid}&fmt=json&limit={page_size}&offset={offset}",
            headers=UA
        ))
        total = resp['release-count']
        batch = resp['releases']
        if not batch:
            break  # an empty page would otherwise loop forever
        releases.extend(batch)
        offset += len(batch)
        if offset < total:
            time.sleep(1)  # stay within 1 req/s for sequential pagination
    return releases

# Queen has 1635 releases; browse release-groups (412) instead to get deduplicated albums
```
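`browse_all_releases` is hard-wired to releases. Below is a generalized sketch for any browse endpoint. It takes the fetch function as a parameter, so it can be reused (and tested) without network access. `browse_pages` is a hypothetical helper. It assumes the key pattern from the cheat sheet below (`<entity>-count` for the total, `<entity>s` for the items), which holds for the entities in the table above but not for irregular plurals.

```python
import time
from typing import Callable, Iterator

def browse_pages(fetch: Callable[[int], dict], entity: str,
                 pause: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> Iterator[dict]:
    """Yield every item from a ws/2 browse endpoint, page by page.

    fetch(offset) must return the decoded JSON page starting at `offset`.
    """
    count_key, items_key = f"{entity}-count", f"{entity}s"
    offset = 0
    while True:
        page = fetch(offset)
        items = page.get(items_key, [])
        yield from items
        offset += len(items)
        # Stop on an empty page (avoids looping forever) or once the total is reached
        if not items or offset >= page.get(count_key, 0):
            return
        sleep(pause)  # stay near the documented 1 request/second average

# Usage with the harness (variable names are illustrative; ws/2 caps limit at 100):
# fetch = lambda off: json.loads(http_get(
#     f"https://musicbrainz.org/ws/2/release-group/?artist={artist_mbid}&fmt=json&limit=100&offset={off}",
#     headers=UA))
# groups = list(browse_pages(fetch, "release-group"))
```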

---

## `inc=` parameter reference

Stack multiple `inc=` values with `+` between them.

**Artist lookup** (`/ws/2/artist/{mbid}`):
- `releases`: list of releases (max ~25)
- `release-groups`: list of release groups (max ~25)
- `recordings`: list of recordings (max ~25)
- `works`: list of works
- `tags`: community genre tags (name + vote count)
- `ratings`: community rating (value 0-5, votes-count)
- `aliases`: alternative names and transliterations
- `annotation`: free-text editorial note
- `artist-rels`, `release-rels`, `recording-rels`, `work-rels`: relationship data

**Release lookup** (`/ws/2/release/{mbid}`):
- `artists`: full artist-credit objects
- `recordings`: track list with recording links (populates `media[].tracks[].recording`)
- `labels`: label-info with catalog numbers
- `release-groups`: the release group this release belongs to
- `artist-credits`: expanded artist credit with joinphrase
- `media`: disc/format info (always included in a release lookup, so it is not needed in `inc=`)

395
- ---
396
-
397
- ## Response shapes cheat sheet
398
-
399
- ```
400
- # MBID format: standard UUID v4
401
- "0383dadf-2a4e-4d10-a46a-e9e041da8eb3"
402
-
403
- # Search response (artist/recording/release/release-group/label/work)
404
- {
405
- "count": 1612, # total matches
406
- "offset": 0,
407
- "<entity-plural>": [...] # e.g. "artists", "releases", "recordings", "release-groups"
408
- }
409
-
410
- # Browse response (using ?artist=MBID or ?label=MBID style)
411
- {
412
- "release-count": 1635, # note: key name changes per entity
413
- "release-offset": 0, # e.g. "release-group-count", "recording-count"
414
- "releases": [...]
415
- }
416
-
417
- # Recording length is always milliseconds
418
- recording['length'] // 1000 # => seconds
419
-
420
- # Artist life-span
421
- life_span = artist['life-span']
422
- # {'begin': '1970-06-27', 'end': None, 'ended': True}
423
- # 'ended': True with 'end': None means end date unknown but band is inactive
424
-
425
- # Artist credit joinphrase (for multi-artist tracks)
426
- # [{"name": "Simon", "artist": {...}, "joinphrase": " & "}, {"name": "Garfunkel", ...}]
427
- ```
428
-
429
- ---
430
-
431
- ## URL patterns
432
-
433
- | Resource | URL |
434
- |---|---|
435
- | Artist search | `https://musicbrainz.org/ws/2/artist/?query={q}&fmt=json&limit=5` |
436
- | Artist by MBID | `https://musicbrainz.org/ws/2/artist/{mbid}?inc=tags+ratings&fmt=json` |
437
- | Browse releases by artist | `https://musicbrainz.org/ws/2/release/?artist={mbid}&fmt=json&limit=25&offset=0` |
438
- | Release search | `https://musicbrainz.org/ws/2/release/?query={q}&fmt=json&limit=5` |
439
- | Release by MBID | `https://musicbrainz.org/ws/2/release/{mbid}?inc=artists+recordings+labels&fmt=json` |
440
- | Release-group browse | `https://musicbrainz.org/ws/2/release-group/?artist={mbid}&fmt=json&limit=25` |
441
- | Recording search | `https://musicbrainz.org/ws/2/recording/?query={q}&fmt=json&limit=5` |
442
- | Label search | `https://musicbrainz.org/ws/2/label/?query={q}&fmt=json&limit=5` |
443
- | Work search | `https://musicbrainz.org/ws/2/work/?query={q}&fmt=json&limit=5` |
444
- | Cover art | `https://coverartarchive.org/release/{release-mbid}` |
445
-
446
- MusicBrainz entity browser URL (human-readable): `https://musicbrainz.org/artist/{mbid}` (replace `artist` with `release`, `recording`, etc.)
447
-
448
- ---
449
-
450
- ## Gotchas
451
-
452
- - **`User-Agent` is mandatory** — without it you get HTTP 403 instantly. The header must include contact info, e.g. `browser-harness/1.0 (you@example.com)`. The default `http_get` UA (`Mozilla/5.0`) also gets 403.
453
-
454
- - **Browse vs search response keys differ** — Search responses use `count` and `offset`; Browse responses (with `?artist=MBID`) use `release-count` / `release-offset` (or `release-group-count` etc.). Accessing `data['count']` on a browse response throws `KeyError`.
455
-
456
- - **`releases` include in artist lookup caps at ~25** — Use the browse endpoint (`?artist=MBID`) with pagination for complete lists. Queen has 1,635 releases total; the `inc=releases` on the artist endpoint only returns ~25.
457
-
458
- - **Use release-groups to avoid edition explosion** — A popular album can have hundreds of release entries (every country's pressing, every remaster, every format). Use `/ws/2/release-group/` to get one entry per "album concept". Queen's "A Night at the Opera" has 75+ release entries but 1 release-group.
459
-
460
- - **Recording length is milliseconds** — `recording['length']` is in milliseconds, not seconds. Divide by 1000.
461
-
462
- - **Sort-name differs from display name for persons** — Artists have both `name` (display: "David Bowie") and `sort-name` (alphabetical: "Bowie, David"). Groups usually have identical values.
463
-
464
- - **Disambiguation in parentheses** — When multiple entities share a name, MusicBrainz adds a `disambiguation` field to distinguish them (e.g. `"English singer-songwriter"` vs a different David Bowie). Always check `a.get('disambiguation', '')` when resolving artist identity.
465
-
466
- - **Score 100 does not mean unique** — Search returns `score: 100` for multiple results when several equally match the query. "dark side of the moon" returns 6 results all scored 100 — they're different regional pressings. Filter by `date`, `country`, or `status` to narrow down.
467
-
468
- - **Recording search: plain query matches titles AND artists broadly** — `?query=bohemian+rhapsody+queen` matches *cover versions* first because "queen" appears in the artist or title of other recordings. Use `AND artist:queen` Lucene syntax to restrict to Queen performances.
469
-
470
- - **Cover Art Archive returns 404 for releases with no uploaded art** — Check `release['cover-art-archive']['artwork']` (boolean) from any release browse/search response before hitting the CAA endpoint. Saves an extra HTTP round-trip.
471
-
472
- - **Cover art `front=True` flag vs `types=['Front']`** — A release can have multiple images typed as 'Front' but only one (or none) flagged `front: true`. Always filter on `img.get('front') == True` for the canonical cover, not on `img.get('types') == ['Front']`.
473
-
474
- - **CAA thumbnail key names** — Both string keys `'small'` (250px) and `'large'` (500px) exist as aliases alongside numeric string keys `'250'`, `'500'`, `'1200'`. Access as `img['thumbnails']['500']` or `img['thumbnails']['large']` — both work.
475
-
476
- - **Rate limit: 1 req/s unauthenticated** — In practice, bursts of 5-6 sequential requests succeed without throttling. True 429s appear at higher rates. For sequential pagination loops, add `time.sleep(1)` between pages. For parallel fetching, limit concurrency to 3-5 workers.
477
-
478
- - **`fmt=json` required** — Omitting it returns XML instead of JSON. Always append `&fmt=json` to every request.
# MusicBrainz — Data Extraction

`https://musicbrainz.org` — open music encyclopedia with a fully free JSON API.
No auth required for reads. No browser needed for any documented workflow.

Field-tested against musicbrainz.org on 2026-04-18.

---

## Do this first

**The MusicBrainz Web Service API (ws/2) returns clean JSON for all entity types — no browser needed.**

```python
from helpers import http_get
import json

# REQUIRED: every request must include this header or you get HTTP 403
UA = {"User-Agent": "browser-harness/1.0 (your@email.com)"}

data = json.loads(http_get("https://musicbrainz.org/ws/2/artist/?query=queen&fmt=json&limit=5", headers=UA))
for a in data['artists']:
    print(a['id'], a['name'], a.get('type'), a.get('country'), a['score'])
# 0383dadf-2a4e-4d10-a46a-e9e041da8eb3 Queen Group GB 100
# 79239441-bfd5-4981-a70c-55c3f15c1287 Madonna Person US 73
```

`User-Agent` is **mandatory** — omitting it returns HTTP 403 immediately. Format: `AppName/Version (contact@email.com)`.
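
If the harness's `helpers.http_get` is not available, you can follow the same rule using only the standard library. Below is a minimal sketch that assumes plain `urllib`. `build_request`, `mb_get_json` and the contact string are illustrative names, not part of the harness:

```python
import json
import urllib.request

# Hypothetical contact string: use your own app name, version and email.
USER_AGENT = "browser-harness/1.0 (your@email.com)"


def build_request(url: str) -> urllib.request.Request:
    """Create a GET request carrying the User-Agent MusicBrainz requires."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def mb_get_json(url: str, timeout: float = 30.0) -> dict:
    """Fetch a ws/2 URL and decode its JSON body.

    urlopen raises urllib.error.HTTPError for 403 (missing UA), 404 and
    throttling responses, so callers see failures instead of empty data.
    """
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


req = build_request("https://musicbrainz.org/ws/2/artist/?query=queen&fmt=json&limit=5")
# urllib stores header names capitalize()-d, hence 'User-agent'
print(req.get_header("User-agent"))  # browser-harness/1.0 (your@email.com)
```

`mb_get_json(url)` can then stand in wherever the examples below call `json.loads(http_get(url, headers=UA))`.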

---

## Entity types

| Entity | Endpoint | Key fields |
|---|---|---|
| `artist` | `/ws/2/artist/` | name, sort-name, type (Group/Person/Orchestra/Choir), country, life-span, tags, rating |
| `release-group` | `/ws/2/release-group/` | title, primary-type (Album/Single/EP/Other), first-release-date |
| `release` | `/ws/2/release/` | title, date, country, status (Official/Bootleg/Promotional), barcode, label-info, media |
| `recording` | `/ws/2/recording/` | title, length (milliseconds), artist-credit, releases |
| `label` | `/ws/2/label/` | name, type, country, area |
| `work` | `/ws/2/work/` | title, type (Song/Aria/Soundtrack/etc.), relations |

All entities share the same MBID (MusicBrainz ID) format: UUID v4, e.g. `0383dadf-2a4e-4d10-a46a-e9e041da8eb3`.
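
MBIDs arrive as plain strings, so it can help to validate one before splicing it into a lookup URL. Below is a minimal sketch using the standard `uuid` module. `is_mbid` is a hypothetical helper. It accepts any canonical lowercase UUID and does not check the version digit:

```python
import uuid


def is_mbid(value: str) -> bool:
    """Return True if value is a canonical, lowercase, hyphenated UUID string."""
    try:
        parsed = uuid.UUID(value)
    except (ValueError, AttributeError, TypeError):
        return False
    # uuid.UUID also accepts braces, 'urn:uuid:' prefixes and missing hyphens;
    # require the exact 36-character form used in ws/2 URLs and responses.
    return str(parsed) == value
```

Truncated IDs such as the `6b47c9a0` prefixes in the sample outputs below fail this check. That is the intent, because lookups need the full MBID.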

---

## Common workflows

### Artist search

```python
from helpers import http_get
import json

UA = {"User-Agent": "browser-harness/1.0 (your@email.com)"}

resp = json.loads(http_get(
    "https://musicbrainz.org/ws/2/artist/?query=queen&fmt=json&limit=5",
    headers=UA
))
# resp keys: count (total matches), offset, artists (list)
for a in resp['artists']:
    print(a['id'])                      # MBID: 0383dadf-2a4e-4d10-a46a-e9e041da8eb3
    print(a['name'])                    # Queen
    print(a['sort-name'])               # Queen (differs for persons: "Bowie, David")
    print(a.get('type'))                # Group / Person / Orchestra / Choir
    print(a.get('country'))             # GB
    print(a.get('life-span'))           # {'begin': '1970-06-27', 'end': None, 'ended': True}
    print(a.get('disambiguation', ''))  # e.g. "English singer-songwriter"
    print(a['score'])                   # relevance 0-100
```
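
Search results are ranked by `score`, but the top hit is not always the artist you want. One way to pick deterministically is to require an exact (case-insensitive) name match, apply optional `type`/`country` filters, and only then take the highest score. `best_artist` is an illustrative helper that works on the decoded `resp` dict above:

```python
from typing import Optional


def best_artist(resp: dict, name: str, artist_type: Optional[str] = None,
                country: Optional[str] = None) -> Optional[dict]:
    """Return the highest-scoring exact-name match, or None if nothing qualifies."""
    wanted = name.casefold()
    candidates = [
        a for a in resp.get("artists", [])
        if a.get("name", "").casefold() == wanted
        and (artist_type is None or a.get("type") == artist_type)
        and (country is None or a.get("country") == country)
    ]
    # max() keeps the first of several equal scores, i.e. the API's own ordering
    return max(candidates, key=lambda a: a.get("score", 0), default=None)
```

For example, `best_artist(resp, "Queen", artist_type="Group", country="GB")` should return the Queen entry shown in the sample output, or `None` if no candidate passes the filters. If several candidates remain plausible, compare their `disambiguation` values.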

### Artist by MBID (with related data via `inc=`)

```python
# Reuses http_get, json and UA from the artist-search block above
# inc= parameters stack with + between them
resp = json.loads(http_get(
    "https://musicbrainz.org/ws/2/artist/0383dadf-2a4e-4d10-a46a-e9e041da8eb3"
    "?inc=releases+tags+ratings+release-groups&fmt=json",
    headers=UA
))
print(resp['name'])       # Queen
print(resp['type'])       # Group
print(resp['country'])    # GB
print(resp['life-span'])  # {'begin': '1970-06-27', 'end': None, 'ended': True}

# Tags (community-voted genre labels, sorted by count)
tags = sorted(resp.get('tags', []), key=lambda x: x['count'], reverse=True)
print([t['name'] for t in tags[:5]])
# ['rock', 'glam rock', 'hard rock', 'art rock', 'british']

# Rating (community score, 0-5)
print(resp.get('rating'))  # {'votes-count': 43, 'value': 4.7}

# Direct releases (up to 25 per request — use browse for full list)
for r in resp.get('releases', []):
    print(r['id'], r['title'], r.get('date'))

# Release groups (albums, singles, EPs: one entry per album, not per edition)
for rg in resp.get('release-groups', []):
    print(rg['id'], rg['title'], rg.get('primary-type'), rg.get('first-release-date'))
# 6b47c9a0 A Night at the Opera Album 1975-11-21
# 002ed683 Sheer Heart Attack Album 1974-11-01
```

### Browse releases by artist (full list)

```python
# Browse API: uses 'artist' param (not 'query') — response key is 'release-count' not 'count'
resp = json.loads(http_get(
    "https://musicbrainz.org/ws/2/release/"
    "?artist=0383dadf-2a4e-4d10-a46a-e9e041da8eb3&fmt=json&limit=25&offset=0",
    headers=UA
))
print(resp['release-count'])  # 1635 — total releases for this artist
for r in resp['releases']:
    print(r['id'], r['title'], r.get('date'), r.get('country'), r.get('status'))
    # Also has: cover-art-archive.artwork (bool), cover-art-archive.front (bool)
    caa = r.get('cover-art-archive', {})
    print(caa.get('artwork'), caa.get('front'), caa.get('count'))

# Paginate: advance offset by len(resp['releases']) until it reaches resp['release-count']
# (limit may be raised to 100, the API maximum, to cut the number of requests)
```

### Release search and lookup

```python
# Search by title
resp = json.loads(http_get(
    "https://musicbrainz.org/ws/2/release/?query=dark+side+of+the+moon&fmt=json&limit=5",
    headers=UA
))
# resp keys: count, offset, releases

# Full release with track list, artists, and labels
release = json.loads(http_get(
    "https://musicbrainz.org/ws/2/release/b84ee12a-09ef-421b-82de-0441a926375b"
    "?inc=artists+recordings+labels+release-groups&fmt=json",
    headers=UA
))
print(release['title'])    # The Dark Side of the Moon
print(release['date'])     # 1973-03-24
print(release['status'])   # Official
print(release['country'])  # GB

# Release group (the "album concept", deduplicates editions)
rg = release.get('release-group', {})
print(rg['title'], rg.get('primary-type'), rg['id'])
# The Dark Side of the Moon Album f5093c06-23e3-404f-aeaa-40f72885ee3a

# Artist credit
for ac in release.get('artist-credit', []):
    if isinstance(ac, dict) and 'artist' in ac:
        print(ac['artist']['name'], ac['artist']['id'])
# Pink Floyd 83d91898-7763-47d7-b03b-b92132375c47

# Labels ('label' can be null when only a catalog number is known)
for li in release.get('label-info', []):
    label = li.get('label') or {}
    print(label.get('name'), li.get('catalog-number'))
# Harvest SHVL 804

# Track list (from media[].tracks[]); dur_s is whole seconds
for disc in release.get('media', []):
    for track in disc.get('tracks', []):
        dur_s = track['length'] // 1000 if track.get('length') else None
        rec = track.get('recording', {})
        print(track['number'], track['title'], dur_s, rec.get('id'))
# A1 Speak to Me 68 bef3fddb-5aca-49f5-b2fd-d56a23268d63
# A2 Breathe 168 ecbc7c9b-e79d-4ec8-ac77-44e4a7f7f1b8
```
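
To print durations the way a player shows them, convert the millisecond `length` to `m:ss`. Below is a small sketch that assumes the `release` dict from the lookup above, with `media[].tracks[]` populated via `inc=recordings`. `ms_to_mmss` and `tracklist` are hypothetical helper names:

```python
from typing import List, Optional


def ms_to_mmss(ms: Optional[int]) -> str:
    """Convert a length in milliseconds to 'm:ss' (truncating, like // 1000 above)."""
    if ms is None:
        return "?:??"
    total_s = ms // 1000
    return f"{total_s // 60}:{total_s % 60:02d}"


def tracklist(release: dict) -> List[str]:
    """Flatten media[].tracks[] into 'disc-number title (m:ss)' lines."""
    lines = []
    for disc in release.get("media", []):
        for track in disc.get("tracks", []):
            length = ms_to_mmss(track.get("length"))
            lines.append(f"{disc.get('position', 1)}-{track['number']} {track['title']} ({length})")
    return lines
```

Calling `print("\n".join(tracklist(release)))` after the lookup prints one line per track. A 68000 ms track shows as `1:08`.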

### Recording (track) search

```python
# Use Lucene field syntax to filter by artist
resp = json.loads(http_get(
    "https://musicbrainz.org/ws/2/recording/"
    "?query=bohemian+rhapsody+AND+artist:queen&fmt=json&limit=5",
    headers=UA
))
print(resp['count'])  # 419
for r in resp['recordings']:
    dur_s = r['length'] // 1000 if r.get('length') else None
    artists = [ac['artist']['name'] for ac in r.get('artist-credit', []) if isinstance(ac, dict)]
    releases = r.get('releases', [])
    print(r['id'], r['title'], dur_s, artists, releases[0]['title'] if releases else None)
# a4803b45 Bohemian Rhapsody 130 ['Queen'] Rhapsody in Red
# 40212eb6 Bohemian Rhapsody 338 ['Queen'] 1986-07: Wembley Stadium
```

### Release-group search (deduplicated albums)

```python
# Use release-group endpoint to avoid getting every regional edition
resp = json.loads(http_get(
    "https://musicbrainz.org/ws/2/release-group/"
    "?query=release-group:\"A+Night+at+the+Opera\"+AND+artist:queen&fmt=json&limit=5",
    headers=UA
))
# resp keys: count, offset, release-groups
for rg in resp.get('release-groups', []):
    print(rg['id'], rg['title'], rg.get('primary-type'), rg.get('first-release-date'), rg['score'])
# 6b47c9a0 A Night at the Opera Album 1975-11-21 100

# Browse release-groups for an artist
resp = json.loads(http_get(
    "https://musicbrainz.org/ws/2/release-group/"
    "?artist=0383dadf-2a4e-4d10-a46a-e9e041da8eb3&fmt=json&limit=25",
    headers=UA
))
print(resp['release-group-count'])  # 412
for rg in resp.get('release-groups', []):
    print(rg['title'], rg.get('primary-type'), rg.get('first-release-date'))
```
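
Browse results mix studio albums, singles, live albums and compilations. The sketch below filters them down to a studio discography. It assumes release-group objects carry `primary-type`, `secondary-types` and `first-release-date` as in the responses above, and `studio_albums` is an illustrative name. Dates may be partial (`1975` or `1975-11`), and these still compare sensibly as strings:

```python
from typing import List


def studio_albums(release_groups: List[dict]) -> List[dict]:
    """Keep primary-type 'Album' with no secondary types (drops Live, Compilation, ...),
    ordered by first-release-date with undated groups last."""
    albums = [
        rg for rg in release_groups
        if rg.get("primary-type") == "Album" and not rg.get("secondary-types")
    ]
    return sorted(albums, key=lambda rg: (not rg.get("first-release-date"),
                                          rg.get("first-release-date") or ""))
```

Because a single browse page holds at most 100 groups, combine this filter with the paginated fetch from the Pagination section when an artist has more than that.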

### Label and work search

```python
# Label search
resp = json.loads(http_get(
    "https://musicbrainz.org/ws/2/label/?query=EMI&fmt=json&limit=3",
    headers=UA
))
for label in resp['labels']:
    print(label['id'], label['name'], label.get('type'), label.get('country'), label['score'])
# c029628b EMI Original Production GB 100

# Work = the composition as written, not any particular recorded performance
resp = json.loads(http_get(
    "https://musicbrainz.org/ws/2/work/?query=bohemian+rhapsody&fmt=json&limit=3",
    headers=UA
))
for w in resp['works']:
    print(w['id'], w['title'], w.get('type'), w['score'])
# 41c94a08 Bohemian Rhapsody Song 100
```

### Cover Art Archive

```python
# Get cover art for a release MBID
# 404 if no artwork has been uploaded for that release
def get_cover_art(release_mbid, size="500"):
    """
    size: '250', '500', '1200', or 'full' (original file)
    Returns the front cover URL (or the first image if none is flagged front),
    or None if no artwork exists.
    """
    try:
        resp = json.loads(http_get(
            f"https://coverartarchive.org/release/{release_mbid}",
            headers=UA
        ))
    except Exception:
        return None  # usually a 404 (no art uploaded); other errors are swallowed too

    images = resp.get('images', [])
    # Prefer an image flagged as front=True
    front = next((img for img in images if img.get('front')), None)
    img = front or (images[0] if images else None)
    if not img:
        return None

    if size == 'full':
        return img['image']
    return img['thumbnails'].get(size) or img['thumbnails'].get('large')

# Thumbnail sizes confirmed: '250', '500', '1200', 'small' (=250), 'large' (=500)

url = get_cover_art("b84ee12a-09ef-421b-82de-0441a926375b")
# http://coverartarchive.org/release/b84ee12a.../1611507818-500.jpg

# Full images response structure
resp = json.loads(http_get(
    "https://coverartarchive.org/release/b84ee12a-09ef-421b-82de-0441a926375b",
    headers=UA
))
for img in resp['images']:
    print(img.get('types'))     # ['Front'], ['Back'], ['Liner'], ['Poster'], ['Medium'], ['Sticker'], ['Other']
    print(img.get('front'))     # True only for front=True flagged images (not all 'Front' types)
    print(img.get('approved'))  # True/False
    print(img['image'])         # full resolution URL
    print(img['thumbnails'])    # {'small': '...-250.jpg', 'large': '...-500.jpg', '250': ..., '500': ..., '1200': ...}
```
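
The Cover Art Archive returns 404 when nothing has been uploaded. You can avoid that request by reading the `cover-art-archive` block that release browse and search results already carry. The sketch below does this. `releases_with_front_art` and `front_cover_url` are hypothetical helpers. The `/front-{size}` redirect path comes from the Cover Art Archive API documentation, not from the tests recorded here, so verify it before relying on it:

```python
from typing import List


def releases_with_front_art(releases: List[dict]) -> List[str]:
    """MBIDs of releases whose cover-art-archive block reports a front image."""
    ids = []
    for r in releases:
        caa = r.get("cover-art-archive") or {}
        if caa.get("artwork") and caa.get("front"):
            ids.append(r["id"])
    return ids


def front_cover_url(release_mbid: str, size: str = "500") -> str:
    """Redirecting URL for the front image; size is '250', '500', '1200' or 'full'."""
    suffix = "front" if size == "full" else f"front-{size}"
    return f"https://coverartarchive.org/release/{release_mbid}/{suffix}"
```

This approach saves the JSON round-trip entirely. The trade-off is that you get the CAA's chosen front image without inspecting `types` or `approved`.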

### Lucene query syntax for search

All search endpoints support Lucene field queries:

```python
# Field search: artist:, type:, country:, tag:, release:, date:
resp = json.loads(http_get(
    "https://musicbrainz.org/ws/2/artist/"
    "?query=artist:queen+AND+type:group+AND+country:GB&fmt=json&limit=5",
    headers=UA
))
# count: 23 (only artists matching all three fields)

# Phrase search with quotes
resp = json.loads(http_get(
    "https://musicbrainz.org/ws/2/release/"
    '?query=release:"A+Night+at+the+Opera"+AND+artist:queen&fmt=json&limit=5',
    headers=UA
))
```

Common Lucene field names per entity:
- artist: `artist:`, `type:`, `country:`, `tag:`, `begin:`, `end:`
- release: `release:`, `artist:`, `date:`, `country:`, `status:`, `label:`, `barcode:`
- recording: `recording:`, `artist:`, `release:`, `dur:` (milliseconds), `tnum:` (track number)
- release-group: `release-group:`, `artist:`, `primarytype:`, `secondarytype:`
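
When query strings are built by hand, it is easy to forget escaping. The sketch below quotes phrase values and lets `urllib.parse.urlencode` percent-encode spaces, quotes and colons. `phrase` and `search_url` are hypothetical helpers. The escaping assumes standard Lucene rules, where a backslash precedes `\` and `"` inside a quoted phrase:

```python
from urllib.parse import urlencode


def phrase(field: str, value: str) -> str:
    """Return field:"value" with backslashes and double quotes escaped."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


def search_url(entity: str, *clauses: str, limit: int = 5) -> str:
    """AND the clauses together and percent-encode the whole query."""
    params = {"query": " AND ".join(clauses), "fmt": "json", "limit": limit}
    return f"https://musicbrainz.org/ws/2/{entity}/?{urlencode(params)}"


url = search_url("release", phrase("release", "A Night at the Opera"), "artist:queen")
# https://musicbrainz.org/ws/2/release/?query=release%3A%22A+Night+at+the+Opera%22+AND+artist%3Aqueen&fmt=json&limit=5
```

The server decodes `%3A` and `%22` back to `:` and `"`, so this matches the hand-written phrase search above while staying safe for titles that contain quotes or `&`.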

### Parallel fetching

```python
from concurrent.futures import ThreadPoolExecutor
import json

from helpers import http_get

UA = {"User-Agent": "browser-harness/1.0 (your@email.com)"}

def fetch_artist(mbid):
    resp = json.loads(http_get(
        f"https://musicbrainz.org/ws/2/artist/{mbid}?inc=tags&fmt=json",
        headers=UA
    ))
    tags = [t['name'] for t in sorted(resp.get('tags', []), key=lambda x: x['count'], reverse=True)[:3]]
    return {"name": resp['name'], "type": resp.get('type'), "tags": tags}

mbids = [
    "0383dadf-2a4e-4d10-a46a-e9e041da8eb3",  # Queen
    "83d91898-7763-47d7-b03b-b92132375c47",  # Pink Floyd
    "678d88b2-87b0-403b-b63d-5da7465aecc3",  # Led Zeppelin
]

with ThreadPoolExecutor(max_workers=3) as ex:
    results = list(ex.map(fetch_artist, mbids))
# 3 artists fetched in ~0.79s total
```

Tested: 5-6 rapid sequential requests all succeed, and parallel requests at 3x concurrency succeed. Rate-limit rejections only occur at very high burst rates. They were observed as 429 in testing, but the MusicBrainz rate-limiting documentation describes HTTP 503, so treat both as throttling. If you are throttled, add `time.sleep(1)` between requests.
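
You can also enforce the rate-limit advice inside the thread pool, so that concurrent workers never exceed a chosen request rate. Below is a sketch of a shared minimum-interval limiter. `MinIntervalLimiter` is an illustrative class, not part of the harness:

```python
import threading
import time


class MinIntervalLimiter:
    """Space successive wait() calls at least `interval` seconds apart,
    across every thread that shares this instance."""

    def __init__(self, interval: float = 1.0):
        self.interval = interval
        self._lock = threading.Lock()
        self._next_slot = 0.0  # monotonic time at which the next caller may proceed

    def wait(self) -> None:
        # Reserve a slot under the lock, then sleep outside it so other
        # threads can reserve the following slots concurrently.
        with self._lock:
            now = time.monotonic()
            start = max(now, self._next_slot)
            self._next_slot = start + self.interval
        delay = start - now
        if delay > 0:
            time.sleep(delay)
```

To use it, create one `limiter = MinIntervalLimiter(1.0)` and call `limiter.wait()` as the first line of `fetch_artist`. With three workers, requests then go out about one second apart. This gives up the parallel speed-up measured above in exchange for staying at the documented 1 req/s.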

### Pagination

```python
import json
import time

from helpers import http_get

UA = {"User-Agent": "browser-harness/1.0 (your@email.com)"}

def browse_all_releases(artist_mbid, page_size=25):
    """Fetch all releases for an artist across multiple pages (page_size may be up to 100)."""
    offset = 0
    total = None
    releases = []
    while total is None or offset < total:
        resp = json.loads(http_get(
            "https://musicbrainz.org/ws/2/release/"
            f"?artist={artist_mbid}&fmt=json&limit={page_size}&offset={offset}",
            headers=UA
        ))
        total = resp['release-count']
        batch = resp['releases']
        if not batch:
            break  # avoid looping forever if the API returns an empty page early
        releases.extend(batch)
        offset += len(batch)
        if offset < total:
            time.sleep(1)  # stay within 1 req/s for sequential pagination
    return releases

# Queen has 1635 releases — use release-groups (412) to get deduplicated albums
```
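
`browse_all_releases` only handles releases, and the browse response keys change per entity. The generalized sketch below assumes that every browse response uses `{entity}-count` and `{entity}s` for the entities in the table above. `browse_all` is hypothetical. `fetch_text` is injected so that the harness's `http_get`, or any other client, can be plugged in:

```python
import json
import time
from typing import Callable, Dict, List
from urllib.parse import urlencode


def browse_all(entity: str, params: Dict[str, str], fetch_text: Callable[[str], str],
               page_size: int = 100, pause: float = 1.0) -> List[dict]:
    """Page through a ws/2 browse endpoint such as /release-group/?artist=MBID.

    fetch_text(url) must return the response body as a string.
    Keys are derived from the entity name, e.g. 'release-group' ->
    'release-group-count' and 'release-groups'.
    """
    count_key, list_key = f"{entity}-count", f"{entity}s"
    items: List[dict] = []
    total = None
    while total is None or len(items) < total:
        query = urlencode({**params, "fmt": "json", "limit": page_size, "offset": len(items)})
        page = json.loads(fetch_text(f"https://musicbrainz.org/ws/2/{entity}/?{query}"))
        total = page[count_key]
        batch = page[list_key]
        if not batch:
            break  # defensive: never loop forever on an empty page
        items.extend(batch)
        if len(items) < total:
            time.sleep(pause)  # ~1 req/s for sequential requests
    return items
```

For Queen's release groups, call `browse_all("release-group", {"artist": "0383dadf-2a4e-4d10-a46a-e9e041da8eb3"}, lambda u: http_get(u, headers=UA))`. The default `page_size=100` needs a quarter of the requests that pages of 25 would.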

---

## `inc=` parameter reference

Stack multiple `inc=` values with `+` between them.

**Artist lookup** (`/ws/2/artist/{mbid}`):
- `releases` — list of releases (max ~25)
- `release-groups` — list of release groups (max ~25)
- `recordings` — list of recordings (max ~25)
- `works` — list of works
- `tags` — community genre tags (name + vote count)
- `ratings` — community rating (value 0-5, votes-count)
- `aliases` — alternative names and transliterations
- `annotation` — free-text editorial note
- `artist-rels`, `release-rels`, `recording-rels`, `work-rels` — relationship data

**Release lookup** (`/ws/2/release/{mbid}`):
- `artists` — full artist-credit objects
- `recordings` — track list with recording links (populates `media[].tracks[].recording`)
- `labels` — label-info with catalog numbers
- `release-groups` — the release group this belongs to
- `artist-credits` — expanded artist credit with joinphrase
- `media` — disc/format info (always included in lookup, not needed in `inc=`)
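
Lookup URLs with stacked `inc=` values can be generated rather than typed. The sketch below uses `urllib.parse.urlencode`, which encodes spaces as `+`, so a space-joined include list comes out in the stacked form. `lookup_url` is a hypothetical helper:

```python
from typing import Iterable
from urllib.parse import urlencode


def lookup_url(entity: str, mbid: str, inc: Iterable[str] = (), **params: str) -> str:
    """Build https://musicbrainz.org/ws/2/{entity}/{mbid}?inc=a+b&fmt=json."""
    query = dict(params)
    includes = list(inc)
    if includes:
        # urlencode turns the spaces into '+', giving the stacked inc= form
        query["inc"] = " ".join(includes)
    query["fmt"] = "json"  # without it the API answers in XML
    return f"https://musicbrainz.org/ws/2/{entity}/{mbid}?{urlencode(query)}"
```

For example, `lookup_url("artist", "0383dadf-2a4e-4d10-a46a-e9e041da8eb3", ["tags", "ratings"])` produces the same URL as the "Artist by MBID" row in the URL patterns table.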

---

## Response shapes cheat sheet

```
# MBID format: standard UUID v4
"0383dadf-2a4e-4d10-a46a-e9e041da8eb3"

# Search response (artist/recording/release/release-group/label/work)
{
  "count": 1612,            # total matches
  "offset": 0,
  "<entity-plural>": [...]  # e.g. "artists", "releases", "recordings", "release-groups"
}

# Browse response (using ?artist=MBID or ?label=MBID style)
{
  "release-count": 1635,    # note: key name changes per entity
  "release-offset": 0,      # e.g. "release-group-count", "recording-count"
  "releases": [...]
}

# Recording length is always milliseconds
recording['length'] // 1000  # => seconds

# Artist life-span
life_span = artist['life-span']
# {'begin': '1970-06-27', 'end': None, 'ended': True}
# 'ended': True with 'end': None means end date unknown but band is inactive

# Artist credit joinphrase (for multi-artist tracks)
# [{"name": "Simon", "artist": {...}, "joinphrase": " & "}, {"name": "Garfunkel", ...}]
```
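
The joinphrase rule can be applied directly to rebuild the credit string that a release or track displays. Below is a sketch with a hypothetical `credit_string` helper. It falls back to the linked artist's name if an entry has no credited `name`:

```python
from typing import List


def credit_string(artist_credit: List[dict]) -> str:
    """Render an artist-credit list as displayed, e.g. 'Simon & Garfunkel'.

    Each entry holds the credited name plus a joinphrase that follows it
    (empty or absent on the last entry).
    """
    return "".join(
        ac.get("name", ac.get("artist", {}).get("name", "")) + ac.get("joinphrase", "")
        for ac in artist_credit
    )
```

Using the credited `name` rather than `artist['name']` preserves "credited as" spellings, which can differ from the artist's canonical name.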

---

## URL patterns

| Resource | URL |
|---|---|
| Artist search | `https://musicbrainz.org/ws/2/artist/?query={q}&fmt=json&limit=5` |
| Artist by MBID | `https://musicbrainz.org/ws/2/artist/{mbid}?inc=tags+ratings&fmt=json` |
| Browse releases by artist | `https://musicbrainz.org/ws/2/release/?artist={mbid}&fmt=json&limit=25&offset=0` |
| Release search | `https://musicbrainz.org/ws/2/release/?query={q}&fmt=json&limit=5` |
| Release by MBID | `https://musicbrainz.org/ws/2/release/{mbid}?inc=artists+recordings+labels&fmt=json` |
| Release-group browse | `https://musicbrainz.org/ws/2/release-group/?artist={mbid}&fmt=json&limit=25` |
| Recording search | `https://musicbrainz.org/ws/2/recording/?query={q}&fmt=json&limit=5` |
| Label search | `https://musicbrainz.org/ws/2/label/?query={q}&fmt=json&limit=5` |
| Work search | `https://musicbrainz.org/ws/2/work/?query={q}&fmt=json&limit=5` |
| Cover art | `https://coverartarchive.org/release/{release-mbid}` |

MusicBrainz entity browser URL (human-readable): `https://musicbrainz.org/artist/{mbid}` (replace `artist` with `release`, `recording`, etc.)

---

## Gotchas

- **`User-Agent` is mandatory** — without it you get HTTP 403 instantly. The header must include contact info, e.g. `browser-harness/1.0 (you@example.com)`. The default `http_get` UA (`Mozilla/5.0`) also gets 403.

- **Browse vs search response keys differ** — Search responses use `count` and `offset`; Browse responses (with `?artist=MBID`) use `release-count` / `release-offset` (or `release-group-count` etc.). Accessing `data['count']` on a browse response throws `KeyError`.

- **`releases` include in artist lookup caps at ~25** — Use the browse endpoint (`?artist=MBID`) with pagination for complete lists. Queen has 1,635 releases total; the `inc=releases` on the artist endpoint only returns ~25.

- **Use release-groups to avoid edition explosion** — A popular album can have hundreds of release entries (every country's pressing, every remaster, every format). Use `/ws/2/release-group/` to get one entry per "album concept". Queen's "A Night at the Opera" has 75+ release entries but 1 release-group.

- **Recording length is milliseconds** — `recording['length']` is in milliseconds, not seconds. Divide by 1000.

- **Sort-name differs from display name for persons** — Artists have both `name` (display: "David Bowie") and `sort-name` (alphabetical: "Bowie, David"). Groups usually have identical values.

- **Disambiguation comments** — When multiple entities share a name, MusicBrainz adds a `disambiguation` field to distinguish them (the website shows it in parentheses after the name, e.g. `"English singer-songwriter"` vs a different David Bowie). Always check `a.get('disambiguation', '')` when resolving artist identity.

- **Score 100 does not mean unique** — Search returns `score: 100` for multiple results when several equally match the query. "dark side of the moon" returns 6 results all scored 100 — they're different regional pressings. Filter by `date`, `country`, or `status` to narrow down.

- **Recording search: plain query matches titles AND artists broadly** — `?query=bohemian+rhapsody+queen` matches *cover versions* first because "queen" appears in the artist or title of other recordings. Use `AND artist:queen` Lucene syntax to restrict to Queen performances.

- **Cover Art Archive returns 404 for releases with no uploaded art** — Check `release['cover-art-archive']['artwork']` (boolean) from any release browse/search response before hitting the CAA endpoint. Saves an extra HTTP round-trip.

- **Cover art `front=True` flag vs `types=['Front']`** — A release can have multiple images typed as 'Front' but only one (or none) flagged `front: true`. Always filter on `img.get('front') == True` for the canonical cover, not on `img.get('types') == ['Front']`.

- **CAA thumbnail key names** — Both string keys `'small'` (250px) and `'large'` (500px) exist as aliases alongside numeric string keys `'250'`, `'500'`, `'1200'`. Access as `img['thumbnails']['500']` or `img['thumbnails']['large']` — both work.

- **Rate limit: 1 req/s unauthenticated** — In practice, bursts of 5-6 sequential requests succeed without throttling. Throttling errors only appear at higher rates. They were seen as 429 in testing, but the MusicBrainz rate-limiting docs describe HTTP 503, so handle both. For sequential pagination loops, add `time.sleep(1)` between pages. For parallel fetching, limit concurrency to 3-5 workers.

- **`fmt=json` required** — Omitting it returns XML instead of JSON. Always append `&fmt=json` to every request.
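
The narrowing step for equal-score search hits can be written as a filter. The sketch below assumes release search hits carry `score`, `status`, `country` and `date` as shown earlier. `narrow_releases` is an illustrative name, and the default `status="Official"` is a choice, not an API rule:

```python
from typing import List, Optional


def narrow_releases(releases: List[dict], status: Optional[str] = "Official",
                    country: Optional[str] = None) -> List[dict]:
    """Keep only the top-scoring hits, filter by status/country (None = no filter),
    and order by release date with undated hits last."""
    if not releases:
        return []
    top = max(r.get("score", 0) for r in releases)
    hits = [
        r for r in releases
        if r.get("score", 0) == top
        and (status is None or r.get("status") == status)
        and (country is None or r.get("country") == country)
    ]
    return sorted(hits, key=lambda r: (not r.get("date"), r.get("date") or ""))
```

Taking the first element of `narrow_releases(resp['releases'], country="GB")` then gives the earliest official GB pressing among the best matches. This pressing is often, but not always, the original release.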