@pencil-agent/nano-pencil 2.0.0-beta.8 → 2.0.0

Files changed (241)
  1. package/README.md +267 -267
  2. package/dist/build-meta.json +3 -3
  3. package/dist/core/export-html/AGENT.md +11 -11
  4. package/dist/core/export-html/template.css +971 -971
  5. package/dist/core/export-html/template.html +54 -54
  6. package/dist/core/extensions-host/index.d.ts +1 -1
  7. package/dist/core/extensions-host/loader.js +1 -1
  8. package/dist/core/extensions-host/runner.d.ts +1 -0
  9. package/dist/core/extensions-host/runner.js +2 -2
  10. package/dist/core/extensions-host/types.d.ts +17 -22
  11. package/dist/core/lib/ai/src/types.d.ts +12 -2
  12. package/dist/core/persona/persona-manager.js +5 -2
  13. package/dist/core/runtime/agent-session.js +3 -3
  14. package/dist/core/runtime/extension-core-bindings.d.ts +1 -0
  15. package/dist/core/runtime/extension-core-bindings.js +2 -2
  16. package/dist/extensions/builtin/AGENT.md +115 -115
  17. package/dist/extensions/builtin/browser/AGENT.md +17 -17
  18. package/dist/extensions/builtin/browser/agent-workspace/agent_helpers.py +12 -12
  19. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/amazon/product-search.md +198 -198
  20. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/archive-org/scraping.md +341 -341
  21. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/arxiv/scraping.md +311 -311
  22. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/arxiv-bulk/scraping.md +333 -333
  23. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/atlas/overview.md +70 -70
  24. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/booking-com/scraping.md +578 -578
  25. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/capterra/scraping.md +440 -440
  26. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/centilebrain/generate-estimates.md +110 -110
  27. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/coingecko/scraping.md +325 -325
  28. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/coinmarketcap/scraping.md +463 -463
  29. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/coursera/scraping.md +360 -360
  30. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/craigslist/scraping.md +390 -390
  31. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/crossref/scraping.md +568 -568
  32. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/dev-to/scraping.md +323 -323
  33. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/duckduckgo/scraping.md +349 -349
  34. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/ebay/scraping.md +435 -435
  35. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/etsy/scraping.md +506 -506
  36. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/eventbrite/scraping.md +363 -363
  37. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/expedia/automation.md +168 -168
  38. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/facebook/groups.md +236 -236
  39. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/facebook/pages.md +295 -295
  40. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/framer/editor.md +108 -108
  41. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/fred/scraping.md +493 -493
  42. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/g2/scraping.md +580 -580
  43. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/genius/scraping.md +511 -511
  44. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/github/repo-actions.md +65 -65
  45. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/github/scraping.md +184 -184
  46. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/glassdoor/scraping.md +543 -543
  47. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/gmail/compose.md +122 -122
  48. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/goodreads/scraping.md +461 -461
  49. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/gutenberg/scraping.md +383 -383
  50. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/hackernews/scraping.md +243 -243
  51. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/howlongtobeat/scraping.md +473 -473
  52. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/imdb/scraping.md +271 -271
  53. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/itch-io/scraping.md +436 -436
  54. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/job-boards/indeed-glassdoor.md +1021 -1021
  55. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/letterboxd/scraping.md +349 -349
  56. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/linkedin/invitation-manager.md +109 -109
  57. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/loom/folder-enumeration.md +170 -170
  58. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/macrotrends/scraping.md +537 -537
  59. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/medium/article-hydration.md +120 -120
  60. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/medium/scraping.md +414 -414
  61. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/metacritic/scraping.md +477 -477
  62. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/musicbrainz/scraping.md +478 -478
  63. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/nasa/scraping.md +339 -339
  64. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/news-aggregation/multi-source.md +205 -205
  65. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/open-library/scraping.md +472 -472
  66. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/openalex/scraping.md +470 -470
  67. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/openstreetmap/scraping.md +490 -490
  68. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/package-registries/npm-pypi.md +478 -478
  69. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/polymarket/scraping.md +234 -234
  70. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/producthunt/scraping.md +307 -307
  71. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/pubmed/scraping.md +421 -421
  72. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/quora/scraping.md +364 -364
  73. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/rawg/scraping.md +352 -352
  74. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/reddit/scraping.md +124 -124
  75. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/rest-countries/scraping.md +233 -233
  76. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/sec-edgar/scraping.md +361 -361
  77. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/README.md +36 -36
  78. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/embedded-apps.md +72 -72
  79. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/knowledge-base.md +109 -109
  80. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/polaris-inputs.md +137 -137
  81. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/soundcloud/scraping.md +362 -362
  82. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/spotify/scraping.md +339 -339
  83. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/stackoverflow/scraping.md +435 -435
  84. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/steam/scraping.md +575 -575
  85. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/substack/scraping.md +338 -338
  86. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/thetechgeeks/pricing.md +52 -52
  87. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/tiktok/upload.md +107 -107
  88. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/tradingview/scraping.md +309 -309
  89. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/trello/boards-and-lists.md +88 -88
  90. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/trustpilot/scraping.md +375 -375
  91. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/walmart/scraping.md +444 -444
  92. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/wayback-machine/scraping.md +306 -306
  93. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/weather/scraping.md +398 -398
  94. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/wellfound/scraping.md +596 -596
  95. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/world-bank/scraping.md +356 -356
  96. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/xiaohongshu/scraping.md +84 -84
  97. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/youtube/scraping.md +418 -418
  98. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/zillow/scraping.md +433 -433
  99. package/dist/extensions/builtin/browser/browser.md +73 -73
  100. package/dist/extensions/builtin/browser/install.md +142 -142
  101. package/dist/extensions/builtin/browser/interaction-skills/connection.md +48 -48
  102. package/dist/extensions/builtin/browser/interaction-skills/cookies.md +3 -3
  103. package/dist/extensions/builtin/browser/interaction-skills/cross-origin-iframes.md +3 -3
  104. package/dist/extensions/builtin/browser/interaction-skills/dialogs.md +64 -64
  105. package/dist/extensions/builtin/browser/interaction-skills/downloads.md +3 -3
  106. package/dist/extensions/builtin/browser/interaction-skills/drag-and-drop.md +3 -3
  107. package/dist/extensions/builtin/browser/interaction-skills/dropdowns.md +3 -3
  108. package/dist/extensions/builtin/browser/interaction-skills/iframes.md +3 -3
  109. package/dist/extensions/builtin/browser/interaction-skills/network-requests.md +3 -3
  110. package/dist/extensions/builtin/browser/interaction-skills/print-as-pdf.md +3 -3
  111. package/dist/extensions/builtin/browser/interaction-skills/profile-sync.md +90 -90
  112. package/dist/extensions/builtin/browser/interaction-skills/screenshots.md +17 -17
  113. package/dist/extensions/builtin/browser/interaction-skills/scrolling.md +3 -3
  114. package/dist/extensions/builtin/browser/interaction-skills/shadow-dom.md +3 -3
  115. package/dist/extensions/builtin/browser/interaction-skills/tabs.md +69 -69
  116. package/dist/extensions/builtin/browser/interaction-skills/uploads.md +1 -1
  117. package/dist/extensions/builtin/browser/interaction-skills/viewport.md +3 -3
  118. package/dist/extensions/builtin/browser/src/browser_harness/AGENT.md +15 -15
  119. package/dist/extensions/builtin/browser/src/browser_harness/__init__.py +8 -8
  120. package/dist/extensions/builtin/browser/src/browser_harness/_ipc.py +90 -90
  121. package/dist/extensions/builtin/browser/src/browser_harness/admin.py +722 -722
  122. package/dist/extensions/builtin/browser/src/browser_harness/daemon.py +328 -328
  123. package/dist/extensions/builtin/browser/src/browser_harness/helpers.py +396 -396
  124. package/dist/extensions/builtin/browser/src/browser_harness/run.py +103 -103
  125. package/dist/extensions/builtin/discipline/skills/brainstorming/SKILL.md +33 -33
  126. package/dist/extensions/builtin/discipline/skills/executing-plans/SKILL.md +25 -25
  127. package/dist/extensions/builtin/discipline/skills/finishing-development-branch/SKILL.md +25 -25
  128. package/dist/extensions/builtin/discipline/skills/receiving-code-review/SKILL.md +22 -22
  129. package/dist/extensions/builtin/discipline/skills/requesting-code-review/SKILL.md +31 -31
  130. package/dist/extensions/builtin/discipline/skills/systematic-debugging/SKILL.md +28 -28
  131. package/dist/extensions/builtin/discipline/skills/test-driven-development/SKILL.md +32 -32
  132. package/dist/extensions/builtin/discipline/skills/using-git-worktrees/SKILL.md +25 -25
  133. package/dist/extensions/builtin/discipline/skills/verification-before-completion/SKILL.md +27 -27
  134. package/dist/extensions/builtin/discipline/skills/writing-plans/SKILL.md +26 -26
  135. package/dist/extensions/builtin/goal/README.md +67 -67
  136. package/dist/extensions/builtin/goal/goal-controller.d.ts +39 -10
  137. package/dist/extensions/builtin/goal/goal-controller.js +1 -1
  138. package/dist/extensions/builtin/goal/goal-format.js +1 -1
  139. package/dist/extensions/builtin/goal/goal-prompts.d.ts +2 -0
  140. package/dist/extensions/builtin/goal/goal-prompts.js +5 -4
  141. package/dist/extensions/builtin/goal/goal-store.js +1 -1
  142. package/dist/extensions/builtin/goal/index.d.ts +1 -1
  143. package/dist/extensions/builtin/goal/index.js +10 -7
  144. package/dist/extensions/builtin/grub/README.md +112 -112
  145. package/dist/extensions/builtin/link-world/agent-workspace/README.md +16 -16
  146. package/dist/extensions/builtin/link-world/index.js +6 -6
  147. package/dist/extensions/builtin/link-world/internet-search/internet-search.md +65 -65
  148. package/dist/extensions/builtin/link-world/link-world-agent.md +82 -82
  149. package/dist/extensions/builtin/link-world/linkworld.md +313 -313
  150. package/dist/extensions/builtin/link-world/{network-routing.md → network-routing/network-routing.md} +67 -67
  151. package/dist/extensions/builtin/loop/README.md +92 -92
  152. package/dist/extensions/builtin/mcp/figma-design.md +68 -68
  153. package/dist/extensions/builtin/mcp/mcp-management.md +85 -85
  154. package/dist/extensions/builtin/plan/index.js +1 -1
  155. package/dist/extensions/builtin/recap/AGENT.md +15 -15
  156. package/dist/extensions/builtin/sal/README.md +72 -72
  157. package/dist/extensions/builtin/security-audit/README.md +289 -289
  158. package/dist/extensions/builtin/task/task-store.d.ts +4 -0
  159. package/dist/extensions/builtin/task/task-store.js +1 -1
  160. package/dist/extensions/builtin/team/AGENT.md +112 -112
  161. package/dist/extensions/builtin/team/TESTING.md +299 -299
  162. package/dist/extensions/builtin/token-save/README.md +56 -56
  163. package/dist/extensions/optional/AGENT.md +10 -10
  164. package/dist/index.d.ts +5 -30
  165. package/dist/index.js +1 -1
  166. package/dist/models.d.ts +7 -0
  167. package/dist/models.js +1 -0
  168. package/dist/modes/interactive/components/footer.js +1 -1
  169. package/dist/modes/interactive/components/task-status-panel.d.ts +36 -0
  170. package/dist/modes/interactive/components/task-status-panel.js +1 -0
  171. package/dist/modes/interactive/controllers/stream-render-controller.d.ts +7 -0
  172. package/dist/modes/interactive/controllers/stream-render-controller.js +2 -2
  173. package/dist/modes/interactive/interactive-mode.js +40 -40
  174. package/dist/modes/interactive/state/interactive-state.d.ts +2 -0
  175. package/dist/modes/interactive/state/interactive-state.js +1 -1
  176. package/dist/modes/interactive/theme/dark.json +85 -85
  177. package/dist/modes/interactive/theme/light.json +84 -84
  178. package/dist/modes/interactive/theme/theme-schema.json +335 -335
  179. package/dist/modes/interactive/theme/warm.json +81 -81
  180. package/dist/node_modules/@pencil-agent/ai/dist/cli.js +0 -0
  181. package/dist/node_modules/@pencil-agent/ai/dist/models.generated.js +1 -1
  182. package/dist/node_modules/@pencil-agent/ai/dist/providers/anthropic.js +2 -2
  183. package/dist/node_modules/@pencil-agent/ai/dist/providers/openai-completions.js +5 -5
  184. package/dist/node_modules/@pencil-agent/ai/dist/providers/openai-responses.js +1 -1
  185. package/dist/node_modules/@pencil-agent/ai/dist/stream.js +1 -1
  186. package/dist/packages/protocol/src/commands.d.ts +33 -0
  187. package/dist/packages/protocol/src/flags.d.ts +20 -0
  188. package/dist/packages/protocol/src/hooks.d.ts +17 -0
  189. package/dist/packages/protocol/src/hooks.js +0 -0
  190. package/dist/packages/{extension-sdk → protocol}/src/index.d.ts +7 -4
  191. package/dist/packages/protocol/src/index.js +1 -0
  192. package/dist/packages/{extension-sdk → protocol}/src/lifecycle.d.ts +15 -27
  193. package/dist/packages/protocol/src/lifecycle.js +0 -0
  194. package/dist/packages/{extension-sdk → protocol}/src/tools.d.ts +1 -1
  195. package/dist/packages/protocol/src/tools.js +0 -0
  196. package/dist/public-config.d.ts +12 -0
  197. package/dist/public-config.js +1 -0
  198. package/dist/runtime.d.ts +9 -0
  199. package/dist/runtime.js +1 -0
  200. package/dist/session-compaction.d.ts +7 -0
  201. package/dist/session-compaction.js +1 -0
  202. package/dist/session.d.ts +7 -0
  203. package/dist/session.js +1 -0
  204. package/dist/skills.d.ts +7 -0
  205. package/dist/skills.js +1 -0
  206. package/dist/tools.d.ts +7 -0
  207. package/dist/tools.js +1 -0
  208. package/docs/ACP/345/215/217/350/256/256/351/233/206/346/210/220/345/274/200/345/217/221/346/226/207/346/241/243.md +851 -0
  209. package/docs/SDK-TESTING.md +364 -0
  210. package/docs/codex-goal-command-impl.md +1055 -1055
  211. package/docs/codex-goal-vs-grub.md +500 -500
  212. package/docs/custom-provider.md +27 -27
  213. package/docs/extensions.md +27 -27
  214. package/docs/keybindings.md +27 -27
  215. package/docs/loop 重构完成总结.md +250 -250
  216. package/docs/loop 重构完成报告.md +122 -122
  217. package/docs/loop 重构方案.md +1222 -1222
  218. package/docs/loop 重构方案实现报告.md +158 -158
  219. package/docs/loop 重构方案对比分析.md +128 -128
  220. package/docs/loop 重构计划.md +320 -320
  221. package/docs/loop-usage-examples.md +214 -214
  222. package/docs/mem-core/346/212/200/346/234/257/346/226/207/346/241/243.md +593 -0
  223. package/docs/models.md +27 -27
  224. package/docs/packages.md +27 -27
  225. package/docs/pi-design-philosophy.md +457 -457
  226. package/docs/planmode.md +1987 -1987
  227. package/docs/prompt-templates.md +27 -27
  228. package/docs/providers.md +27 -27
  229. package/docs/sdk.md +27 -27
  230. package/docs/skills.md +27 -27
  231. package/docs/startup-performance-optimization.md +301 -0
  232. package/docs/themes.md +27 -27
  233. package/docs/tui.md +27 -27
  234. package/docs/认知地图.md +47 -0
  235. package/package.json +190 -162
  236. package/dist/packages/extension-sdk/src/index.js +0 -1
  237. package/docs/cc-agent-design.md +0 -1297
  238. package/docs/cc-tui-design.md +0 -1333
  239. package/docs//345/257/271/346/240/207Claude-Code.md +0 -1775
  240. /package/dist/packages/{extension-sdk/src/lifecycle.js → protocol/src/commands.js} +0 -0
  241. /package/dist/packages/{extension-sdk/src/tools.js → protocol/src/flags.js} +0 -0
@@ -1,363 +1,363 @@
- # Eventbrite — Scraping & Data Extraction
-
- `https://www.eventbrite.com` — public event listings and detail pages, no auth required for HTML scraping. REST API requires an OAuth token.
-
- ## Do this first
-
- **Use the search listing URL to get event lists — parse the `ItemList` JSON-LD block, not the HTML.**
-
- ```python
- import re, json
-
- headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36"}
- html = http_get("https://www.eventbrite.com/d/ca--san-francisco/tech/", headers=headers)
-
- ld_blocks = re.findall(r'<script type="application/ld\+json">(.*?)</script>', html, re.DOTALL)
- for block in ld_blocks:
-     parsed = json.loads(block)
-     if isinstance(parsed, dict) and parsed.get('@type') == 'ItemList':
-         for item in parsed['itemListElement']:
-             ev = item['item']
-             print(ev['name'], ev['startDate'], ev['url'])
-         break
- # Returns 18–40 events per page
- ```
-
- **For a single event, fetch the detail page and extract the `Event` JSON-LD block.** It contains all fields including `offers` (pricing). There is also a richer `__NEXT_DATA__` block if you need venue coordinates, refund policy, or sales status.
-
- ## URL structure
-
- ### Search / listing pages
-
- ```
- https://www.eventbrite.com/d/{location}/{category}/
- https://www.eventbrite.com/d/{location}/{category}/?page=2
- https://www.eventbrite.com/d/{location}/{category}/?start_date=2026-05-01&end_date=2026-05-31
- ```
-
- **Location format:** `{state-abbreviation}--{city}` (lowercase, hyphens for spaces)
- - `ca--san-francisco`
- - `ny--new-york`
- - `ca--los-angeles`
- - Use `online` for virtual events
-
- **Category slugs (confirmed working):**
- - `tech` — Technology events
- - `music` — Music
- - `food--drink` — Food & Drink
- - `health` — Health & Wellness
- - `sports--fitness` — Sports & Fitness
- - `arts--entertainment` — Arts & Entertainment
- - `family--education` — Family & Education
- - `business--professional` — Business & Networking
- - `science--tech` — Science & Technology
- - `community--culture` — Community & Culture
- - `networking` — Networking
- - `events` — All events (broadest, returns ~40/page)
-
- **Filter slugs (replace category):**
- - `free--events` — Free events only
- - `events--today` — Today
- - `events--tomorrow` — Tomorrow
- - `events--this-weekend` — This weekend
-
- **Query params:**
- - `?page=N` — Pagination (page 2+ confirmed working, each returns 18–20 events)
- - `?start_date=YYYY-MM-DD&end_date=YYYY-MM-DD` — Date range filter (confirmed, narrows results)
-
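The listing URL pattern, location format, and query parameters described in this part of the hunk can be assembled programmatically. The following is a minimal sketch only: `location_slug` and `build_listing_url` are hypothetical helper names that do not appear in the package, and the sketch encodes nothing beyond the rules listed in the removed lines.

```python
from urllib.parse import urlencode

BASE = "https://www.eventbrite.com/d"


def location_slug(state, city):
    """Format '{state-abbreviation}--{city}': lowercase, hyphens for spaces."""
    return f"{state.strip().lower()}--{'-'.join(city.lower().split())}"


def build_listing_url(location, category, page=None, start_date=None, end_date=None):
    """Build a /d/ listing URL; query parameters are added only when given."""
    params = {}
    if page is not None and page > 1:
        params["page"] = page
    if start_date and end_date:
        # Both bounds are expected as YYYY-MM-DD strings.
        params["start_date"] = start_date
        params["end_date"] = end_date
    query = f"?{urlencode(params)}" if params else ""
    return f"{BASE}/{location}/{category}/{query}"


print(build_listing_url(location_slug("CA", "San Francisco"), "tech", page=2))
# https://www.eventbrite.com/d/ca--san-francisco/tech/?page=2
```

A filter slug such as `free--events` would be passed in place of `category`, since the text says filter slugs replace the category segment.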
- ### Event detail pages
-
- ```
- https://www.eventbrite.com/e/{slug}-tickets-{event_id}
- ```
-
- Example: `https://www.eventbrite.com/e/icontact-the-tactile-tech-opera-tickets-1982861003639`
-
- - `event_id` is a numeric string (10–13 digits)
- - Extract with: `re.search(r'-tickets-(\d+)$', url).group(1)`
- - Extract slug with: `re.search(r'/e/(.+)-tickets-\d+$', url).group(1)`
-
- Other TLDs (`.ca`, `.co.uk`, etc.) use the same structure — event IDs are globally unique across TLDs.
-
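The two extraction regexes quoted in the removed lines are anchored with `$`, so they would not match a URL that still carries a query string such as `?aff=...`. A small sketch of one way to handle that, assuming a hypothetical helper named `parse_event_url` (not part of the package), is to reduce the URL to its path before matching:

```python
import re
from urllib.parse import urlsplit


def parse_event_url(url):
    """Return (slug, event_id) from an /e/ URL, or None if it does not match."""
    # urlsplit drops the query string and fragment; works for relative paths too.
    path = urlsplit(url).path.rstrip("/")
    match = re.search(r"/e/(.+)-tickets-(\d+)$", path)
    if match is None:
        return None
    return match.group(1), match.group(2)


print(parse_event_url(
    "https://www.eventbrite.com/e/icontact-the-tactile-tech-opera-tickets-1982861003639"
))
# ('icontact-the-tactile-tech-opera', '1982861003639')
```

Returning `None` instead of raising keeps the helper usable in a loop over mixed links, where some URLs are listing pages rather than event pages.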
- ## Listing page: JSON-LD `ItemList` schema
-
- The first `<script type="application/ld+json">` block on any `/d/` page is an `ItemList`. Each `itemListElement` contains:
-
- ```json
- {
-   "position": 1,
-   "@type": "ListItem",
-   "item": {
-     "@type": "Event",
-     "name": "iContact the tactile tech opera",
-     "description": "An immersive performance...",
-     "url": "https://www.eventbrite.com/e/icontact-the-tactile-tech-opera-tickets-1982861003639",
-     "image": "https://img.evbuc.com/...",
-     "startDate": "2026-06-21",
-     "endDate": "2026-06-21",
-     "eventAttendanceMode": "https://schema.org/OfflineEventAttendanceMode",
-     "location": {
-       "@type": "Place",
-       "name": "Little Boxes Theater",
-       "address": {
-         "@type": "PostalAddress",
-         "addressLocality": "San Francisco",
-         "addressRegion": "CA",
-         "addressCountry": "US",
-         "streetAddress": "94107 1661 Tennessee Street",
-         "postalCode": "94107"
-       },
-       "geo": {
-         "@type": "GeoCoordinates",
-         "latitude": "37.7508806",
-         "longitude": "-122.3881427"
-       }
-     }
-   }
- }
- ```
-
- Note: listing-page items do NOT include `offers` (pricing) or `organizer`. Fetch the detail page for those.
-
- The second JSON-LD block on listing pages is a `BreadcrumbList` (skip it).
-
- ## Detail page: JSON-LD `Event` schema
-
- The detail page has 4 JSON-LD blocks. The `Event` (or `BusinessEvent`) block is the second one and contains the full schema:
-
- ```python
- import re, json
-
- headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36"}
- html = http_get("https://www.eventbrite.com/e/icontact-the-tactile-tech-opera-tickets-1982861003639", headers=headers)
-
- ld_blocks = re.findall(r'<script type="application/ld\+json">(.*?)</script>', html, re.DOTALL)
- event_data = None
- for block in ld_blocks:
-     parsed = json.loads(block)
-     if isinstance(parsed, dict) and parsed.get('@type') in ('Event', 'BusinessEvent', 'MusicEvent', 'EducationEvent'):
-         event_data = parsed
-         break
-
- print(event_data['name']) # "iContact the tactile tech opera"
- print(event_data['startDate']) # "2026-06-21T17:05:00-07:00" (ISO 8601 with TZ)
- print(event_data['endDate']) # "2026-06-21T20:08:00-07:00"
- print(event_data['eventStatus']) # "https://schema.org/EventScheduled"
- print(event_data['eventAttendanceMode']) # "https://schema.org/OfflineEventAttendanceMode"
- print(event_data['location']['name']) # "Little Boxes Theater"
- print(event_data['location']['address']['streetAddress']) # "94107 1661 Tennessee Street, San Francisco, CA 94107"
- print(event_data['organizer']['name']) # "Beth McNamara"
- print(event_data['organizer']['url']) # "https://www.eventbrite.com/o/beth-mcnamara-120755148166"
- ```
-
- Full confirmed schema on detail page:
- ```
- name                              str   Event title
- description                       str   Short summary
- url                               str   Canonical event URL
- image                             str   Event banner image URL
- startDate                         str   ISO 8601 with timezone offset
- endDate                           str   ISO 8601 with timezone offset
- eventStatus                       str   URI: EventScheduled / EventCancelled / EventPostponed
- eventAttendanceMode               str   URI: OfflineEventAttendanceMode / OnlineEventAttendanceMode / MixedEventAttendanceMode
- location.@type                    str   "Place" (in-person) or "VirtualLocation" (online)
- location.name                     str   Venue name
- location.address.streetAddress    str
- location.address.addressLocality  str   City
- location.address.addressRegion    str   State abbreviation
- location.address.addressCountry   str   Country code
- organizer.name                    str   Organizer display name
- organizer.url                     str   Organizer profile URL
- offers                            list  AggregateOffer object(s)
- ```
-
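The schema above lists `startDate` and `endDate` as ISO 8601 strings with a timezone offset, which Python's standard library can parse directly. The sketch below is illustrative only: `duration_minutes` is a hypothetical helper, and the two timestamps are the sample values shown in the comments of the removed code.

```python
from datetime import datetime


def duration_minutes(start_iso, end_iso):
    """Length of an event in whole minutes, from two ISO 8601 strings."""
    start = datetime.fromisoformat(start_iso)  # offset-aware when an offset is present
    end = datetime.fromisoformat(end_iso)
    return int((end - start).total_seconds() // 60)


print(duration_minutes("2026-06-21T17:05:00-07:00", "2026-06-21T20:08:00-07:00"))
# 183
```

Listing-page items carry date-only values such as `2026-06-21`, which parse to naive datetimes. Subtracting a naive value from an offset-aware one raises `TypeError`, so both arguments should come from the same page type.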
- ### Offers / pricing
-
- ```python
- offers = event_data.get('offers', [])
- if offers:
-     offer = offers[0] # always a list; typically one AggregateOffer
-     print(offer['@type']) # "AggregateOffer"
-     print(offer['lowPrice']) # "50.0" (string, not float)
-     print(offer['highPrice']) # "50.0"
-     print(offer['priceCurrency']) # "USD"
-     print(offer['availability']) # "InStock" / "SoldOut"
-     print(offer['availabilityStarts']) # ISO 8601 UTC
-     print(offer['availabilityEnds']) # ISO 8601 UTC
-
- # Free events: lowPrice="0.0", highPrice="0.0"
- # Free check: float(offer['lowPrice']) == 0.0
- ```
-
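Because the prices arrive as strings and `offers` may be missing, it can help to normalize the first `AggregateOffer` into plain Python values before using it. The following is a sketch under stated assumptions: `summarize_offer` is a hypothetical name, the tolerance for a single non-list object is a defensive guess rather than observed behavior, and `is_free` mirrors the `lowPrice` check quoted in the removed lines.

```python
def summarize_offer(event_ld):
    """Return a small pricing summary from an Event JSON-LD dict, or None."""
    offers = event_ld.get("offers") or []
    if isinstance(offers, dict):  # defensive assumption: tolerate a single object
        offers = [offers]
    if not offers:
        return None
    offer = offers[0]
    low = float(offer["lowPrice"])  # prices are strings such as "50.0"
    high = float(offer.get("highPrice", offer["lowPrice"]))
    return {
        "low": low,
        "high": high,
        "currency": offer.get("priceCurrency"),
        "is_free": low == 0.0,
        "sold_out": offer.get("availability") == "SoldOut",
    }


sample = {
    "offers": [{
        "@type": "AggregateOffer",
        "lowPrice": "50.0",
        "highPrice": "50.0",
        "priceCurrency": "USD",
        "availability": "InStock",
    }]
}
print(summarize_offer(sample))
```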
- `@type` on the event itself varies by format (all scrape identically):
- - `Event` — general
- - `BusinessEvent` — networking, professional
- - `MusicEvent` — concerts
- - `EducationEvent` — classes, workshops
-
- ## Detail page: `__NEXT_DATA__` (richer structured data)
-
- Every event detail page embeds a `<script id="__NEXT_DATA__">` block with additional fields not in JSON-LD:
-
- ```python
- import re, json
-
- nextjs = re.search(r'<script id="__NEXT_DATA__"[^>]*>(.*?)</script>', html, re.DOTALL)
- nd = json.loads(nextjs.group(1))
- context = nd['props']['pageProps']['context']
-
- bi = context['basicInfo']
- print(bi['id']) # "1982861003639" (event ID string)
- print(bi['name']) # event title
- print(bi['isFree']) # bool
- print(bi['isOnline']) # bool
- print(bi['currency']) # "USD"
- print(bi['status']) # "live" / "completed" / "canceled"
- print(bi['organizationId']) # numeric string
- print(bi['formatId']) # numeric string (event format category)
- print(bi['isProtected']) # bool — password-protected events
- print(bi['isSeries']) # bool — recurring series
- print(bi['created']) # ISO 8601 UTC creation timestamp
-
- # Venue with coordinates
- venue = bi['venue']
- print(venue['name']) # "Little Boxes Theater"
- print(venue['address']['city']) # "San Francisco"
- print(venue['address']['region']) # "CA"
- print(venue['address']['latitude']) # "37.7508806"
- print(venue['address']['longitude']) # "-122.3881427"
- print(venue['address']['localizedMultiLineAddressDisplay']) # list of strings
-
- # Organizer details
- org = bi['organizer']
- print(org['name']) # "Beth McNamara"
- print(org['url']) # organizer profile URL
- print(org['numEvents']) # int
- print(org['verified']) # bool
-
- # Sales status
- ss = context['salesStatus']
- print(ss['salesStatus']) # "on_sale" / "sold_out" / "sales_ended"
- print(ss['startSalesDate']['local']) # local datetime string
-
- # Good to know
- gtk = context['goodToKnow']['highlights']
- print(gtk['ageRestriction']) # "18+" or null
- print(gtk['durationInMinutes']) # int (e.g. 183)
- print(gtk['doorTime']) # local datetime string or null
- print(gtk['locationType']) # "in_person" or "online"
-
- # Refund policy
- refund = context['goodToKnow']['refundPolicy']
- print(refund['policyType']) # "custom" / "no_refunds" / "standard"
- print(refund['isRefundAllowed']) # bool
- print(refund['validDays']) # int or null
-
- # Full event description (HTML)
- for module in context['structuredContent']['modules']:
-     if module['type'] == 'text':
-         print(module['text']) # raw HTML, may need BeautifulSoup to strip tags
- ```
-
- ## Complete workflow: scrape events from a category
-
- ```python
- import re, json
-
- def get_events_from_listing(location, category, page=1):
-     """Returns list of event dicts with name, url, startDate, endDate, location."""
-     headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36"}
-     url = f"https://www.eventbrite.com/d/{location}/{category}/?page={page}"
-     html = http_get(url, headers=headers)
-     ld_blocks = re.findall(r'<script type="application/ld\+json">(.*?)</script>', html, re.DOTALL)
-     for block in ld_blocks:
-         parsed = json.loads(block)
-         if isinstance(parsed, dict) and parsed.get('@type') == 'ItemList':
-             return [item['item'] for item in parsed.get('itemListElement', [])]
-     return []
-
- def get_event_detail(event_url):
-     """Returns full Event JSON-LD + NEXT_DATA context for a single event."""
-     headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36"}
-     html = http_get(event_url, headers=headers)
-
-     # JSON-LD Event block
-     ld_blocks = re.findall(r'<script type="application/ld\+json">(.*?)</script>', html, re.DOTALL)
-     event_ld = None
-     for block in ld_blocks:
-         parsed = json.loads(block)
-         if isinstance(parsed, dict) and parsed.get('@type') in ('Event', 'BusinessEvent', 'MusicEvent', 'EducationEvent'):
-             event_ld = parsed
-             break
-
-     # NEXT_DATA context
-     nextjs = re.search(r'<script id="__NEXT_DATA__"[^>]*>(.*?)</script>', html, re.DOTALL)
-     context = None
-     if nextjs:
-         nd = json.loads(nextjs.group(1))
-         context = nd['props']['pageProps']['context']
-
-     return event_ld, context
-
- # Usage
- events = get_events_from_listing("ca--san-francisco", "tech", page=1)
- print(f"Found {len(events)} events") # 18–20 typical
-
- for ev in events[:3]:
-     print(ev['name'], ev['startDate'], ev['url'])
-
- # Deep-fetch one event
- ld, ctx = get_event_detail(events[0]['url'])
- if ld and ld.get('offers'):
-     price = float(ld['offers'][0]['lowPrice'])
-     currency = ld['offers'][0]['priceCurrency']
-     print(f"Price: {price} {currency}") # 0.0 USD (free) or e.g. 50.0 USD
- ```
-
317
- ## Public API: requires auth
318
-
319
- The Eventbrite REST API (`https://www.eventbriteapi.com/v3/`) requires an OAuth token for all endpoints:
320
-
321
- - `GET /v3/events/{id}/` — HTTP 401 without auth
322
- - `GET /v3/events/search/` — HTTP 404 (endpoint changed; auth also required)
323
-
324
- **Use HTML scraping instead** — the JSON-LD and `__NEXT_DATA__` data is equivalent to the API response and requires no credentials.
325
-
326
- If you have a token (`EVENTBRITE_TOKEN`):
327
- ```python
328
- import os
329
- token = os.environ.get('EVENTBRITE_TOKEN')
330
- headers = {
331
- "User-Agent": "Mozilla/5.0",
332
- "Authorization": f"Bearer {token}"
333
- }
334
- data = json.loads(http_get(f"https://www.eventbriteapi.com/v3/events/{event_id}/", headers=headers))
335
- ```
336
-
337
- ## Gotchas
338
-
339
- - **Event URLs in the HTML use relative `/e/` paths, not absolute URLs** — Search listing HTML contains `/e/slug-tickets-id?aff=...` relative paths (with tracking params). Extract event URLs from the JSON-LD `ItemList` instead — they are absolute, clean URLs without tracking params.
340
-
341
- - **`re.findall(r'href="https://www.eventbrite.com/e/...')` returns 0 results** — Confirmed: event cards in the HTML do not have `https://www.eventbrite.com/e/` in href attributes. Use JSON-LD extraction only.
342
-
343
- - **`__SERVER_DATA__` does not exist** — Both search and detail pages were checked. There is no `window.__SERVER_DATA__` or `window.__redux_state__`. The embedded data is in `<script id="__NEXT_DATA__">` (detail pages only) and JSON-LD (both).
344
-
345
- - **Search listing pages have no `__NEXT_DATA__`** — Only event detail pages (`/e/` URLs) have the `__NEXT_DATA__` block. Listing pages (`/d/` URLs) have JSON-LD only.
346
-
347
- - **`@type` varies by event format** — Don't filter JSON-LD blocks with `parsed['@type'] == 'Event'` alone. Check for any of: `Event`, `BusinessEvent`, `MusicEvent`, `EducationEvent`. They have identical field structure.
348
-
349
- - **`startDate` on listing vs. detail pages differs in precision** — Listing page items show date-only (`"2026-06-21"`). Detail page Event block shows full ISO 8601 with timezone offset (`"2026-06-21T17:05:00-07:00"`). Use detail page for scheduling tasks.
350
-
351
- - **`offers` is absent on listing page items** — The `ItemList` does not include pricing. Fetch the detail page for `offers.lowPrice` / `offers.highPrice`.
352
-
353
- - **Free events have `lowPrice: "0.0"` and `highPrice: "0.0"`** — Not null or missing. Check `float(offers[0]['lowPrice']) == 0.0` or use `basicInfo.isFree` from `__NEXT_DATA__`.
354
-
355
- - **`offers` prices are strings, not floats** — `"50.0"` not `50.0`. Cast with `float(offer['lowPrice'])` before arithmetic.
356
-
357
- - **Page size is ~18–20 events per page** — Not a fixed 20. Some pages return fewer. Don't assume page N is empty because it returned < 20.
358
-
359
- - **Date filter works but can still return events outside range** — The `?start_date=` / `?end_date=` params narrow results but are not strict; always validate `startDate` from the returned data.
360
-
361
- - **Eventbrite CA / UK / AU use different TLDs** — Online event listings may surface `eventbrite.ca`, `eventbrite.co.uk` URLs. The `/e/` structure and JSON-LD schema are identical. Fetch them with the same code.
362
-
363
- - **No rate limiting observed** — 8 sequential HTTP requests across 4 pages completed without errors or blocks (avg ~1.5s each). No delay needed for light workloads, but be reasonable for bulk scraping.
1
+ # Eventbrite — Scraping & Data Extraction
2
+
3
+ `https://www.eventbrite.com` — public event listings and detail pages, no auth required for HTML scraping. REST API requires an OAuth token.
4
+
5
+ ## Do this first
6
+
7
+ **Use the search listing URL to get event lists — parse the `ItemList` JSON-LD block, not the HTML.**
8
+
9
+ ```python
10
+ import re, json
11
+
12
+ headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36"}
13
+ html = http_get("https://www.eventbrite.com/d/ca--san-francisco/tech/", headers=headers)
14
+
15
+ ld_blocks = re.findall(r'<script type="application/ld\+json">(.*?)</script>', html, re.DOTALL)
16
+ for block in ld_blocks:
17
+ parsed = json.loads(block)
18
+ if isinstance(parsed, dict) and parsed.get('@type') == 'ItemList':
19
+ for item in parsed['itemListElement']:
20
+ ev = item['item']
21
+ print(ev['name'], ev['startDate'], ev['url'])
22
+ break
23
+ # Returns ~18–20 events per page (the broad `events` slug returns ~40)
24
+ ```
25
+
26
+ **For a single event, fetch the detail page and extract the `Event` JSON-LD block.** It contains all fields including `offers` (pricing). There is also a richer `__NEXT_DATA__` block if you need venue coordinates, refund policy, or sales status.
27
+
28
+ ## URL structure
29
+
30
+ ### Search / listing pages
31
+
32
+ ```
33
+ https://www.eventbrite.com/d/{location}/{category}/
34
+ https://www.eventbrite.com/d/{location}/{category}/?page=2
35
+ https://www.eventbrite.com/d/{location}/{category}/?start_date=2026-05-01&end_date=2026-05-31
36
+ ```
37
+
38
+ **Location format:** `{state-abbreviation}--{city}` (lowercase, hyphens for spaces)
39
+ - `ca--san-francisco`
40
+ - `ny--new-york`
41
+ - `ca--los-angeles`
42
+ - Use `online` for virtual events
43
+
44
+ **Category slugs (confirmed working):**
45
+ - `tech` — Technology events
46
+ - `music` — Music
47
+ - `food--drink` — Food & Drink
48
+ - `health` — Health & Wellness
49
+ - `sports--fitness` — Sports & Fitness
50
+ - `arts--entertainment` — Arts & Entertainment
51
+ - `family--education` — Family & Education
52
+ - `business--professional` — Business & Networking
53
+ - `science--tech` — Science & Technology
54
+ - `community--culture` — Community & Culture
55
+ - `networking` — Networking
56
+ - `events` — All events (broadest, returns ~40/page)
57
+
58
+ **Filter slugs (replace category):**
59
+ - `free--events` — Free events only
60
+ - `events--today` — Today
61
+ - `events--tomorrow` — Tomorrow
62
+ - `events--this-weekend` — This weekend
63
+
64
+ **Query params:**
65
+ - `?page=N` — Pagination (page 2+ confirmed working, each returns 18–20 events)
66
+ - `?start_date=YYYY-MM-DD&end_date=YYYY-MM-DD` — Date range filter (confirmed, narrows results)
67
+
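The URL rules above (location slug, category or filter slug, optional query params) can be sketched as a small builder. This is a minimal sketch, not part of the original helpers: `location_slug` and `build_listing_url` are hypothetical names, and the only parameters assumed are the ones listed above (`page`, `start_date`, `end_date`).

```python
from urllib.parse import urlencode

BASE = "https://www.eventbrite.com/d"


def location_slug(state, city):
    """Build '{state-abbreviation}--{city}': lowercase, hyphens for spaces."""
    return f"{state.strip().lower()}--{'-'.join(city.strip().lower().split())}"


def build_listing_url(location, category, page=None, start_date=None, end_date=None):
    """Assemble a /d/ listing URL from a location slug and a category or filter slug."""
    params = {}
    if page and page > 1:
        params["page"] = page
    # The date filter is only added when both ends of the range are given.
    if start_date and end_date:
        params["start_date"] = start_date
        params["end_date"] = end_date
    query = f"?{urlencode(params)}" if params else ""
    return f"{BASE}/{location}/{category}/{query}"
```

For example, `build_listing_url(location_slug("CA", "San Francisco"), "tech", page=2)` yields the second listing page for San Francisco tech events.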
68
+ ### Event detail pages
69
+
70
+ ```
71
+ https://www.eventbrite.com/e/{slug}-tickets-{event_id}
72
+ ```
73
+
74
+ Example: `https://www.eventbrite.com/e/icontact-the-tactile-tech-opera-tickets-1982861003639`
75
+
76
+ - `event_id` is a numeric string (10–13 digits)
77
+ - Extract with: `re.search(r'-tickets-(\d+)$', url).group(1)`
78
+ - Extract slug with: `re.search(r'/e/(.+)-tickets-\d+$', url).group(1)`
79
+
80
+ Other TLDs (`.ca`, `.co.uk`, etc.) use the same structure — event IDs are globally unique across TLDs.
81
+
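The two regexes above anchor on the end of the string, so they stop matching when a URL carries a query string such as the `?aff=...` tracking params found in listing HTML. A minimal sketch that strips the query first and returns both parts in one call is shown here; `parse_event_url` is a hypothetical helper name, not something defined elsewhere in this guide.

```python
import re
from urllib.parse import urlsplit


def parse_event_url(url):
    """Return (slug, event_id) for an /e/ URL, or None if the URL does not match."""
    # urlsplit drops the query string and fragment; works for relative paths too.
    path = urlsplit(url).path.rstrip("/")
    match = re.search(r"/e/(.+)-tickets-(\d+)$", path)
    if not match:
        return None
    return match.group(1), match.group(2)
```

Because only the path is inspected, the same call works for `.ca` and `.co.uk` URLs.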
82
+ ## Listing page: JSON-LD `ItemList` schema
83
+
84
+ The first `<script type="application/ld+json">` block on any `/d/` page is an `ItemList`. Each `itemListElement` contains:
85
+
86
+ ```json
87
+ {
88
+ "position": 1,
89
+ "@type": "ListItem",
90
+ "item": {
91
+ "@type": "Event",
92
+ "name": "iContact the tactile tech opera",
93
+ "description": "An immersive performance...",
94
+ "url": "https://www.eventbrite.com/e/icontact-the-tactile-tech-opera-tickets-1982861003639",
95
+ "image": "https://img.evbuc.com/...",
96
+ "startDate": "2026-06-21",
97
+ "endDate": "2026-06-21",
98
+ "eventAttendanceMode": "https://schema.org/OfflineEventAttendanceMode",
99
+ "location": {
100
+ "@type": "Place",
101
+ "name": "Little Boxes Theater",
102
+ "address": {
103
+ "@type": "PostalAddress",
104
+ "addressLocality": "San Francisco",
105
+ "addressRegion": "CA",
106
+ "addressCountry": "US",
107
+ "streetAddress": "94107 1661 Tennessee Street",
108
+ "postalCode": "94107"
109
+ },
110
+ "geo": {
111
+ "@type": "GeoCoordinates",
112
+ "latitude": "37.7508806",
113
+ "longitude": "-122.3881427"
114
+ }
115
+ }
116
+ }
117
+ }
118
+ ```
119
+
120
+ Note: listing-page items do NOT include `offers` (pricing) or `organizer`. Fetch the detail page for those.
121
+
122
+ The second JSON-LD block on listing pages is a `BreadcrumbList` (skip it).
123
+
124
+ ## Detail page: JSON-LD `Event` schema
125
+
126
+ The detail page has 4 JSON-LD blocks. The `Event` (or `BusinessEvent`) block is the second one and contains the full schema:
127
+
128
+ ```python
129
+ import re, json
130
+
131
+ headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36"}
132
+ html = http_get("https://www.eventbrite.com/e/icontact-the-tactile-tech-opera-tickets-1982861003639", headers=headers)
133
+
134
+ ld_blocks = re.findall(r'<script type="application/ld\+json">(.*?)</script>', html, re.DOTALL)
135
+ event_data = None
136
+ for block in ld_blocks:
137
+ parsed = json.loads(block)
138
+ if isinstance(parsed, dict) and parsed.get('@type') in ('Event', 'BusinessEvent', 'MusicEvent', 'EducationEvent'):
139
+ event_data = parsed
140
+ break
141
+
142
+ print(event_data['name']) # "iContact the tactile tech opera"
143
+ print(event_data['startDate']) # "2026-06-21T17:05:00-07:00" (ISO 8601 with TZ)
144
+ print(event_data['endDate']) # "2026-06-21T20:08:00-07:00"
145
+ print(event_data['eventStatus']) # "https://schema.org/EventScheduled"
146
+ print(event_data['eventAttendanceMode']) # "https://schema.org/OfflineEventAttendanceMode"
147
+ print(event_data['location']['name']) # "Little Boxes Theater"
148
+ print(event_data['location']['address']['streetAddress']) # "94107 1661 Tennessee Street, San Francisco, CA 94107"
149
+ print(event_data['organizer']['name']) # "Beth McNamara"
150
+ print(event_data['organizer']['url']) # "https://www.eventbrite.com/o/beth-mcnamara-120755148166"
151
+ ```
152
+
153
+ Full confirmed schema on detail page:
154
+ ```
155
+ name str Event title
156
+ description str Short summary
157
+ url str Canonical event URL
158
+ image str Event banner image URL
159
+ startDate str ISO 8601 with timezone offset
160
+ endDate str ISO 8601 with timezone offset
161
+ eventStatus str URI: EventScheduled / EventCancelled / EventPostponed
162
+ eventAttendanceMode str URI: OfflineEventAttendanceMode / OnlineEventAttendanceMode / MixedEventAttendanceMode
163
+ location.@type str "Place" (in-person) or "VirtualLocation" (online)
164
+ location.name str Venue name
165
+ location.address.streetAddress str
166
+ location.address.addressLocality str City
167
+ location.address.addressRegion str State abbreviation
168
+ location.address.addressCountry str Country code
169
+ organizer.name str Organizer display name
170
+ organizer.url str Organizer profile URL
171
+ offers list AggregateOffer object(s)
172
+ ```
173
+
174
+ ### Offers / pricing
175
+
176
+ ```python
177
+ offers = event_data.get('offers', [])
178
+ if offers:
179
+ offer = offers[0] # always a list; typically one AggregateOffer
180
+ print(offer['@type']) # "AggregateOffer"
181
+ print(offer['lowPrice']) # "50.0" (string, not float)
182
+ print(offer['highPrice']) # "50.0"
183
+ print(offer['priceCurrency']) # "USD"
184
+ print(offer['availability']) # "InStock" / "SoldOut"
185
+ print(offer['availabilityStarts']) # ISO 8601 UTC
186
+ print(offer['availabilityEnds']) # ISO 8601 UTC
187
+
188
+ # Free events: lowPrice="0.0", highPrice="0.0"
189
+ # Free check: float(offer['lowPrice']) == 0.0
190
+ ```
191
+
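Since the price fields arrive as strings, it can help to normalize them once before any comparison. The following is a minimal sketch under the assumptions stated above (`offers` is a list of `AggregateOffer` objects with string prices); `summarize_offers` is a hypothetical name, and the single-object branch is a defensive assumption rather than something observed on Eventbrite pages.

```python
def summarize_offers(event_ld):
    """Return {'low', 'high', 'currency', 'is_free'} or None when no pricing is present."""
    offers = event_ld.get("offers") or []
    if isinstance(offers, dict):
        # Assumption: tolerate a bare object, which schema.org permits.
        offers = [offers]
    lows = [float(o["lowPrice"]) for o in offers if o.get("lowPrice") is not None]
    highs = [float(o["highPrice"]) for o in offers if o.get("highPrice") is not None]
    if not lows:
        return None
    low = min(lows)
    high = max(highs) if highs else max(lows)
    return {
        "low": low,
        "high": high,
        "currency": offers[0].get("priceCurrency"),
        # Same free check as above: lowPrice parses to 0.0.
        "is_free": low == 0.0,
    }
```

Listing-page items have no `offers`, so the function returns `None` for them rather than raising.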
192
+ `@type` on the event itself varies by format (all scrape identically):
193
+ - `Event` — general
194
+ - `BusinessEvent` — networking, professional
195
+ - `MusicEvent` — concerts
196
+ - `EducationEvent` — classes, workshops
197
+
198
+ ## Detail page: `__NEXT_DATA__` (richer structured data)
199
+
200
+ Every event detail page embeds a `<script id="__NEXT_DATA__">` block with additional fields not in JSON-LD:
201
+
202
+ ```python
203
+ import re, json
204
+
205
+ nextjs = re.search(r'<script id="__NEXT_DATA__"[^>]*>(.*?)</script>', html, re.DOTALL)
206
+ nd = json.loads(nextjs.group(1))
207
+ context = nd['props']['pageProps']['context']
208
+
209
+ bi = context['basicInfo']
210
+ print(bi['id']) # "1982861003639" (event ID string)
211
+ print(bi['name']) # event title
212
+ print(bi['isFree']) # bool
213
+ print(bi['isOnline']) # bool
214
+ print(bi['currency']) # "USD"
215
+ print(bi['status']) # "live" / "completed" / "canceled"
216
+ print(bi['organizationId']) # numeric string
217
+ print(bi['formatId']) # numeric string (event format category)
218
+ print(bi['isProtected']) # bool — password-protected events
219
+ print(bi['isSeries']) # bool — recurring series
220
+ print(bi['created']) # ISO 8601 UTC creation timestamp
221
+
222
+ # Venue with coordinates
223
+ venue = bi['venue']
224
+ print(venue['name']) # "Little Boxes Theater"
225
+ print(venue['address']['city']) # "San Francisco"
226
+ print(venue['address']['region']) # "CA"
227
+ print(venue['address']['latitude']) # "37.7508806"
228
+ print(venue['address']['longitude']) # "-122.3881427"
229
+ print(venue['address']['localizedMultiLineAddressDisplay']) # list of strings
230
+
231
+ # Organizer details
232
+ org = bi['organizer']
233
+ print(org['name']) # "Beth McNamara"
234
+ print(org['url']) # organizer profile URL
235
+ print(org['numEvents']) # int
236
+ print(org['verified']) # bool
237
+
238
+ # Sales status
239
+ ss = context['salesStatus']
240
+ print(ss['salesStatus']) # "on_sale" / "sold_out" / "sales_ended"
241
+ print(ss['startSalesDate']['local']) # local datetime string
242
+
243
+ # Good to know
244
+ gtk = context['goodToKnow']['highlights']
245
+ print(gtk['ageRestriction']) # "18+" or None (JSON null)
246
+ print(gtk['durationInMinutes']) # int (e.g. 183)
247
+ print(gtk['doorTime']) # local datetime string or None (JSON null)
248
+ print(gtk['locationType']) # "in_person" or "online"
249
+
250
+ # Refund policy
251
+ refund = context['goodToKnow']['refundPolicy']
252
+ print(refund['policyType']) # "custom" / "no_refunds" / "standard"
253
+ print(refund['isRefundAllowed']) # bool
254
+ print(refund['validDays']) # int or None (JSON null)
255
+
256
+ # Full event description (HTML)
257
+ for module in context['structuredContent']['modules']:
258
+ if module['type'] == 'text':
259
+ print(module['text']) # raw HTML, may need BeautifulSoup to strip tags
260
+ ```
261
+
262
+ ## Complete workflow: scrape events from a category
263
+
264
+ ```python
265
+ import re, json
266
+
267
+ def get_events_from_listing(location, category, page=1):
268
+ """Returns list of event dicts with name, url, startDate, endDate, location."""
269
+ headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36"}
270
+ url = f"https://www.eventbrite.com/d/{location}/{category}/?page={page}"
271
+ html = http_get(url, headers=headers)
272
+ ld_blocks = re.findall(r'<script type="application/ld\+json">(.*?)</script>', html, re.DOTALL)
273
+ for block in ld_blocks:
274
+ parsed = json.loads(block)
275
+ if isinstance(parsed, dict) and parsed.get('@type') == 'ItemList':
276
+ return [item['item'] for item in parsed.get('itemListElement', [])]
277
+ return []
278
+
279
+ def get_event_detail(event_url):
280
+ """Returns full Event JSON-LD + NEXT_DATA context for a single event."""
281
+ headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36"}
282
+ html = http_get(event_url, headers=headers)
283
+
284
+ # JSON-LD Event block
285
+ ld_blocks = re.findall(r'<script type="application/ld\+json">(.*?)</script>', html, re.DOTALL)
286
+ event_ld = None
287
+ for block in ld_blocks:
288
+ parsed = json.loads(block)
289
+ if isinstance(parsed, dict) and parsed.get('@type') in ('Event', 'BusinessEvent', 'MusicEvent', 'EducationEvent'):
290
+ event_ld = parsed
291
+ break
292
+
293
+ # NEXT_DATA context
294
+ nextjs = re.search(r'<script id="__NEXT_DATA__"[^>]*>(.*?)</script>', html, re.DOTALL)
295
+ context = None
296
+ if nextjs:
297
+ nd = json.loads(nextjs.group(1))
298
+ context = nd['props']['pageProps']['context']
299
+
300
+ return event_ld, context
301
+
302
+ # Usage
303
+ events = get_events_from_listing("ca--san-francisco", "tech", page=1)
304
+ print(f"Found {len(events)} events") # 18–20 typical
305
+
306
+ for ev in events[:3]:
307
+ print(ev['name'], ev['startDate'], ev['url'])
308
+
309
+ # Deep-fetch one event
310
+ ld, ctx = get_event_detail(events[0]['url'])
311
+ if ld and ld.get('offers'):
312
+ price = float(ld['offers'][0]['lowPrice'])
313
+ currency = ld['offers'][0]['priceCurrency']
314
+ print(f"Price: {price} {currency}") # 0.0 USD (free) or e.g. 50.0 USD
315
+ ```
316
+
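The workflow above fetches a single listing page. To walk several pages, one option is to keep requesting until a page comes back empty, since page size varies and a short page does not mean the listing has ended. This is a minimal sketch: `iter_listing_events` is a hypothetical helper, `fetch_page` is any callable that maps a page number to a list of event dicts, and the de-duplication by `url` is a precaution, not a behavior confirmed for Eventbrite.

```python
def iter_listing_events(fetch_page, max_pages=5):
    """Yield events page by page until an empty page or max_pages is reached."""
    seen = set()
    for page in range(1, max_pages + 1):
        events = fetch_page(page)
        if not events:
            # Stop only on an empty page, never on a short one.
            break
        for ev in events:
            url = ev.get("url")
            if url in seen:
                continue
            seen.add(url)
            yield ev

# Possible usage with the listing function defined above:
# for ev in iter_listing_events(lambda p: get_events_from_listing("ca--san-francisco", "tech", page=p)):
#     print(ev["name"], ev["url"])
```

The `max_pages` cap keeps a bulk run bounded even if the site never returns an empty page.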
317
+ ## Public API: requires auth
318
+
319
+ The Eventbrite REST API (`https://www.eventbriteapi.com/v3/`) requires an OAuth token for all endpoints:
320
+
321
+ - `GET /v3/events/{id}/` — HTTP 401 without auth
322
+ - `GET /v3/events/search/` — HTTP 404 (endpoint changed; auth also required)
323
+
324
+ **Use HTML scraping instead** — the JSON-LD and `__NEXT_DATA__` data is equivalent to the API response and requires no credentials.
325
+
326
+ If you have a token (`EVENTBRITE_TOKEN`):
327
+ ```python
328
+ import os, json
329
+ token = os.environ.get('EVENTBRITE_TOKEN')
330
+ headers = {
331
+ "User-Agent": "Mozilla/5.0",
332
+ "Authorization": f"Bearer {token}"
333
+ }
334
+ data = json.loads(http_get(f"https://www.eventbriteapi.com/v3/events/{event_id}/", headers=headers))
335
+ ```
336
+
337
+ ## Gotchas
338
+
339
+ - **Event URLs in the HTML use relative `/e/` paths, not absolute URLs** — Search listing HTML contains `/e/slug-tickets-id?aff=...` relative paths (with tracking params). Extract event URLs from the JSON-LD `ItemList` instead — they are absolute, clean URLs without tracking params.
340
+
341
+ - **`re.findall(r'href="https://www.eventbrite.com/e/...')` returns 0 results** — Confirmed: event cards in the HTML do not have `https://www.eventbrite.com/e/` in href attributes. Use JSON-LD extraction only.
342
+
343
+ - **`__SERVER_DATA__` does not exist** — Both search and detail pages were checked. There is no `window.__SERVER_DATA__` or `window.__redux_state__`. The embedded data is in `<script id="__NEXT_DATA__">` (detail pages only) and JSON-LD (both).
344
+
345
+ - **Search listing pages have no `__NEXT_DATA__`** — Only event detail pages (`/e/` URLs) have the `__NEXT_DATA__` block. Listing pages (`/d/` URLs) have JSON-LD only.
346
+
347
+ - **`@type` varies by event format** — Don't filter JSON-LD blocks with `parsed['@type'] == 'Event'` alone. Check for any of: `Event`, `BusinessEvent`, `MusicEvent`, `EducationEvent`. They have identical field structure.
348
+
349
+ - **`startDate` on listing vs. detail pages differs in precision** — Listing page items show date-only (`"2026-06-21"`). Detail page Event block shows full ISO 8601 with timezone offset (`"2026-06-21T17:05:00-07:00"`). Use detail page for scheduling tasks.
350
+
351
+ - **`offers` is absent on listing page items** — The `ItemList` does not include pricing. Fetch the detail page for `offers.lowPrice` / `offers.highPrice`.
352
+
353
+ - **Free events have `lowPrice: "0.0"` and `highPrice: "0.0"`** — Not null or missing. Check `float(offers[0]['lowPrice']) == 0.0` or use `basicInfo.isFree` from `__NEXT_DATA__`.
354
+
355
+ - **`offers` prices are strings, not floats** — `"50.0"` not `50.0`. Cast with `float(offer['lowPrice'])` before arithmetic.
356
+
357
+ - **Page size is ~18–20 events per page** — Not a fixed 20. Some pages return fewer. Don't assume the next page is empty because page N returned fewer than 20.
358
+
359
+ - **Date filter works but can still return events outside range** — The `?start_date=` / `?end_date=` params narrow results but are not strict; always validate `startDate` from the returned data.
360
+
361
+ - **Eventbrite CA / UK / AU use different TLDs** — Online event listings may surface `eventbrite.ca`, `eventbrite.co.uk` URLs. The `/e/` structure and JSON-LD schema are identical. Fetch them with the same code.
362
+
363
+ - **No rate limiting observed** — 8 sequential HTTP requests across 4 pages completed without errors or blocks (avg ~1.5s each). No delay needed for light workloads, but be reasonable for bulk scraping.
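
Two of the gotchas above interact: the date filter is not strict, and `startDate` comes as a date-only string on listing pages but as a full ISO 8601 timestamp on detail pages. A minimal sketch of client-side validation that handles both shapes follows; `filter_by_start_date` is a hypothetical name, and comparing on the first ten characters (the local calendar date) is a simplifying assumption.

```python
from datetime import date


def filter_by_start_date(events, start_date, end_date):
    """Keep events whose startDate falls within [start_date, end_date], inclusive."""
    lo = date.fromisoformat(start_date)
    hi = date.fromisoformat(end_date)
    kept = []
    for ev in events:
        raw = ev.get("startDate")
        if not raw:
            continue  # skip items without a usable date
        # The first 10 characters are YYYY-MM-DD in both listing and detail formats.
        if lo <= date.fromisoformat(raw[:10]) <= hi:
            kept.append(ev)
    return kept
```

Apply it to the list returned from a filtered listing URL before trusting that every event lies inside the requested range.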