@pencil-agent/nano-pencil 2.0.1 → 2.0.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (186)
  1. package/README.md +267 -267
  2. package/dist/build-meta.json +3 -3
  3. package/dist/core/export-html/AGENT.md +11 -11
  4. package/dist/core/export-html/template.css +971 -971
  5. package/dist/core/export-html/template.html +54 -54
  6. package/dist/core/model/custom-providers.js +1 -1
  7. package/dist/core/model-registry.js +5 -5
  8. package/dist/extensions/builtin/AGENT.md +115 -115
  9. package/dist/extensions/builtin/browser/AGENT.md +17 -17
  10. package/dist/extensions/builtin/browser/agent-workspace/agent_helpers.py +12 -12
  11. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/amazon/product-search.md +198 -198
  12. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/archive-org/scraping.md +341 -341
  13. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/arxiv/scraping.md +311 -311
  14. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/arxiv-bulk/scraping.md +333 -333
  15. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/atlas/overview.md +70 -70
  16. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/booking-com/scraping.md +578 -578
  17. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/capterra/scraping.md +440 -440
  18. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/centilebrain/generate-estimates.md +110 -110
  19. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/coingecko/scraping.md +325 -325
  20. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/coinmarketcap/scraping.md +463 -463
  21. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/coursera/scraping.md +360 -360
  22. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/craigslist/scraping.md +390 -390
  23. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/crossref/scraping.md +568 -568
  24. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/dev-to/scraping.md +323 -323
  25. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/duckduckgo/scraping.md +349 -349
  26. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/ebay/scraping.md +435 -435
  27. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/etsy/scraping.md +506 -506
  28. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/eventbrite/scraping.md +363 -363
  29. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/expedia/automation.md +168 -168
  30. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/facebook/groups.md +236 -236
  31. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/facebook/pages.md +295 -295
  32. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/framer/editor.md +108 -108
  33. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/fred/scraping.md +493 -493
  34. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/g2/scraping.md +580 -580
  35. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/genius/scraping.md +511 -511
  36. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/github/repo-actions.md +65 -65
  37. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/github/scraping.md +184 -184
  38. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/glassdoor/scraping.md +543 -543
  39. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/gmail/compose.md +122 -122
  40. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/goodreads/scraping.md +461 -461
  41. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/gutenberg/scraping.md +383 -383
  42. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/hackernews/scraping.md +243 -243
  43. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/howlongtobeat/scraping.md +473 -473
  44. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/imdb/scraping.md +271 -271
  45. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/itch-io/scraping.md +436 -436
  46. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/job-boards/indeed-glassdoor.md +1021 -1021
  47. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/letterboxd/scraping.md +349 -349
  48. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/linkedin/invitation-manager.md +109 -109
  49. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/loom/folder-enumeration.md +170 -170
  50. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/macrotrends/scraping.md +537 -537
  51. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/medium/article-hydration.md +120 -120
  52. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/medium/scraping.md +414 -414
  53. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/metacritic/scraping.md +477 -477
  54. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/musicbrainz/scraping.md +478 -478
  55. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/nasa/scraping.md +339 -339
  56. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/news-aggregation/multi-source.md +205 -205
  57. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/open-library/scraping.md +472 -472
  58. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/openalex/scraping.md +470 -470
  59. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/openstreetmap/scraping.md +490 -490
  60. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/package-registries/npm-pypi.md +478 -478
  61. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/polymarket/scraping.md +234 -234
  62. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/producthunt/scraping.md +307 -307
  63. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/pubmed/scraping.md +421 -421
  64. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/quora/scraping.md +364 -364
  65. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/rawg/scraping.md +352 -352
  66. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/reddit/scraping.md +124 -124
  67. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/rest-countries/scraping.md +233 -233
  68. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/sec-edgar/scraping.md +361 -361
  69. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/README.md +36 -36
  70. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/embedded-apps.md +72 -72
  71. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/knowledge-base.md +109 -109
  72. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/polaris-inputs.md +137 -137
  73. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/soundcloud/scraping.md +362 -362
  74. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/spotify/scraping.md +339 -339
  75. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/stackoverflow/scraping.md +435 -435
  76. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/steam/scraping.md +575 -575
  77. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/substack/scraping.md +338 -338
  78. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/thetechgeeks/pricing.md +52 -52
  79. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/tiktok/upload.md +107 -107
  80. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/tradingview/scraping.md +309 -309
  81. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/trello/boards-and-lists.md +88 -88
  82. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/trustpilot/scraping.md +375 -375
  83. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/walmart/scraping.md +444 -444
  84. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/wayback-machine/scraping.md +306 -306
  85. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/weather/scraping.md +398 -398
  86. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/wellfound/scraping.md +596 -596
  87. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/world-bank/scraping.md +356 -356
  88. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/xiaohongshu/scraping.md +84 -84
  89. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/youtube/scraping.md +418 -418
  90. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/zillow/scraping.md +433 -433
  91. package/dist/extensions/builtin/browser/browser.md +73 -73
  92. package/dist/extensions/builtin/browser/install.md +142 -142
  93. package/dist/extensions/builtin/browser/interaction-skills/connection.md +48 -48
  94. package/dist/extensions/builtin/browser/interaction-skills/cookies.md +3 -3
  95. package/dist/extensions/builtin/browser/interaction-skills/cross-origin-iframes.md +3 -3
  96. package/dist/extensions/builtin/browser/interaction-skills/dialogs.md +64 -64
  97. package/dist/extensions/builtin/browser/interaction-skills/downloads.md +3 -3
  98. package/dist/extensions/builtin/browser/interaction-skills/drag-and-drop.md +3 -3
  99. package/dist/extensions/builtin/browser/interaction-skills/dropdowns.md +3 -3
  100. package/dist/extensions/builtin/browser/interaction-skills/iframes.md +3 -3
  101. package/dist/extensions/builtin/browser/interaction-skills/network-requests.md +3 -3
  102. package/dist/extensions/builtin/browser/interaction-skills/print-as-pdf.md +3 -3
  103. package/dist/extensions/builtin/browser/interaction-skills/profile-sync.md +90 -90
  104. package/dist/extensions/builtin/browser/interaction-skills/screenshots.md +17 -17
  105. package/dist/extensions/builtin/browser/interaction-skills/scrolling.md +3 -3
  106. package/dist/extensions/builtin/browser/interaction-skills/shadow-dom.md +3 -3
  107. package/dist/extensions/builtin/browser/interaction-skills/tabs.md +69 -69
  108. package/dist/extensions/builtin/browser/interaction-skills/uploads.md +1 -1
  109. package/dist/extensions/builtin/browser/interaction-skills/viewport.md +3 -3
  110. package/dist/extensions/builtin/browser/src/browser_harness/AGENT.md +15 -15
  111. package/dist/extensions/builtin/browser/src/browser_harness/__init__.py +8 -8
  112. package/dist/extensions/builtin/browser/src/browser_harness/_ipc.py +90 -90
  113. package/dist/extensions/builtin/browser/src/browser_harness/admin.py +722 -722
  114. package/dist/extensions/builtin/browser/src/browser_harness/daemon.py +328 -328
  115. package/dist/extensions/builtin/browser/src/browser_harness/helpers.py +396 -396
  116. package/dist/extensions/builtin/browser/src/browser_harness/run.py +103 -103
  117. package/dist/extensions/builtin/discipline/skills/brainstorming/SKILL.md +33 -33
  118. package/dist/extensions/builtin/discipline/skills/executing-plans/SKILL.md +25 -25
  119. package/dist/extensions/builtin/discipline/skills/finishing-development-branch/SKILL.md +25 -25
  120. package/dist/extensions/builtin/discipline/skills/receiving-code-review/SKILL.md +22 -22
  121. package/dist/extensions/builtin/discipline/skills/requesting-code-review/SKILL.md +31 -31
  122. package/dist/extensions/builtin/discipline/skills/systematic-debugging/SKILL.md +28 -28
  123. package/dist/extensions/builtin/discipline/skills/test-driven-development/SKILL.md +32 -32
  124. package/dist/extensions/builtin/discipline/skills/using-git-worktrees/SKILL.md +25 -25
  125. package/dist/extensions/builtin/discipline/skills/verification-before-completion/SKILL.md +27 -27
  126. package/dist/extensions/builtin/discipline/skills/writing-plans/SKILL.md +26 -26
  127. package/dist/extensions/builtin/goal/README.md +67 -67
  128. package/dist/extensions/builtin/grub/README.md +112 -112
  129. package/dist/extensions/builtin/link-world/agent-workspace/README.md +16 -16
  130. package/dist/extensions/builtin/link-world/internet-search/internet-search.md +65 -65
  131. package/dist/extensions/builtin/link-world/link-world-agent.md +82 -82
  132. package/dist/extensions/builtin/link-world/linkworld.md +313 -313
  133. package/dist/extensions/builtin/link-world/network-routing/network-routing.md +67 -67
  134. package/dist/extensions/builtin/loop/README.md +92 -92
  135. package/dist/extensions/builtin/mcp/figma-design.md +68 -68
  136. package/dist/extensions/builtin/mcp/mcp-management.md +85 -85
  137. package/dist/extensions/builtin/recap/AGENT.md +15 -15
  138. package/dist/extensions/builtin/sal/README.md +72 -72
  139. package/dist/extensions/builtin/security-audit/README.md +289 -289
  140. package/dist/extensions/builtin/team/AGENT.md +112 -112
  141. package/dist/extensions/builtin/team/TESTING.md +299 -299
  142. package/dist/extensions/builtin/token-save/README.md +56 -56
  143. package/dist/extensions/optional/AGENT.md +10 -10
  144. package/dist/modes/interactive/controllers/input-submit-controller.js +2 -2
  145. package/dist/modes/interactive/controllers/stream-render-controller.js +2 -2
  146. package/dist/modes/interactive/interactive-mode.js +19 -19
  147. package/dist/modes/interactive/theme/dark.json +85 -85
  148. package/dist/modes/interactive/theme/light.json +84 -84
  149. package/dist/modes/interactive/theme/theme-schema.json +335 -335
  150. package/dist/modes/interactive/theme/warm.json +81 -81
  151. package/dist/node_modules/@pencil-agent/ai/dist/cli.js +0 -0
  152. package/dist/node_modules/@pencil-agent/ai/dist/models.generated.js +1 -1
  153. package/docs/ACP/345/215/217/350/256/256/351/233/206/346/210/220/345/274/200/345/217/221/346/226/207/346/241/243.md +851 -0
  154. package/docs/SDK-TESTING.md +364 -0
  155. package/docs/codex-goal-command-impl.md +1055 -1055
  156. package/docs/codex-goal-vs-grub.md +500 -500
  157. package/docs/custom-provider.md +27 -27
  158. package/docs/extensions.md +27 -27
  159. package/docs/keybindings.md +27 -27
  160. package/docs/loop 重构完成总结.md +250 -250
  161. package/docs/loop 重构完成报告.md +122 -122
  162. package/docs/loop 重构方案.md +1222 -1222
  163. package/docs/loop 重构方案实现报告.md +158 -158
  164. package/docs/loop 重构方案对比分析.md +128 -128
  165. package/docs/loop 重构计划.md +320 -320
  166. package/docs/loop-usage-examples.md +214 -214
  167. package/docs/mem-core/346/212/200/346/234/257/346/226/207/346/241/243.md +593 -0
  168. package/docs/models.md +27 -27
  169. package/docs/packages.md +27 -27
  170. package/docs/pi-design-philosophy.md +457 -457
  171. package/docs/planmode.md +1987 -1987
  172. package/docs/prompt-templates.md +27 -27
  173. package/docs/providers.md +27 -27
  174. package/docs/sdk.md +27 -27
  175. package/docs/skills.md +27 -27
  176. package/docs/startup-performance-optimization.md +301 -0
  177. package/docs/themes.md +27 -27
  178. package/docs/tui.md +27 -27
  179. package/docs/认知地图.md +47 -0
  180. package/package.json +190 -190
  181. package/docs/cc-agent-design.md +0 -1297
  182. package/docs/cc-tui-design.md +0 -1333
  183. package/docs/nanoPencil-学习计划.md +0 -170
  184. package/docs/scan-report.md +0 -3820
  185. package/docs//345/257/271/346/240/207Claude-Code.md +0 -1775
  186. package/docs//351/230/277/351/207/214/345/267/264/345/267/264/350/264/242/346/212/245/345/210/206/346/236/220/344/271/246.md +0 -261
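The hunk that follows belongs to `domain-skills/eventbrite/scraping.md` (item 28 in the list above). That skill tells the agent to read Eventbrite pages through their embedded schema.org JSON-LD and `__NEXT_DATA__` script blocks instead of the visible HTML. The block below is a minimal, stdlib-only sketch of that technique. It is not code shipped in the package, and every function name in it (`iter_ld_objects`, `listing_events`, `detail_event`, `price_range`, `next_data_context`) is hypothetical. It assumes the page HTML has already been fetched (the skill uses a harness helper, `http_get`, which is not reproduced here), that the four Event `@type` values the skill lists are the only ones to accept, and that `offers` may arrive as a single object rather than a list. JSON-LD permits a single object, even though the skill indexes `offers` as a list.

```python
import json
import re

# Hypothetical helpers (not part of nano-pencil) sketching the skill's approach.
# This regex accepts extra attributes and either quote style; the skill itself
# matches the literal tag text <script type="application/ld+json">.
LD_JSON_RE = re.compile(
    r"""<script[^>]*type=["']application/ld\+json["'][^>]*>(.*?)</script>""",
    re.DOTALL | re.IGNORECASE,
)
NEXT_DATA_RE = re.compile(r'<script id="__NEXT_DATA__"[^>]*>(.*?)</script>', re.DOTALL)

# The four @type values the skill says Eventbrite detail pages use.
EVENT_TYPES = {"Event", "BusinessEvent", "MusicEvent", "EducationEvent"}


def iter_ld_objects(html):
    """Yield each JSON object from the page's JSON-LD blocks, skipping bad JSON."""
    for raw in LD_JSON_RE.findall(html):
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            continue  # one malformed block should not abort the whole page
        for obj in parsed if isinstance(parsed, list) else [parsed]:
            if isinstance(obj, dict):
                yield obj


def listing_events(html):
    """Return the `item` dicts of the first ItemList block (listing pages), or []."""
    for obj in iter_ld_objects(html):
        if obj.get("@type") == "ItemList":
            return [
                el["item"]
                for el in obj.get("itemListElement", [])
                if isinstance(el, dict) and "item" in el
            ]
    return []


def detail_event(html):
    """Return the first Event-like JSON-LD object (detail pages), or None."""
    for obj in iter_ld_objects(html):
        type_ = obj.get("@type")
        # @type can legally be a list in JSON-LD; only plain strings are matched here.
        if isinstance(type_, str) and type_ in EVENT_TYPES:
            return obj
    return None


def price_range(event):
    """Return (low, high, currency) from the first offer, or None if there is none.

    Prices arrive as strings such as "50.0", so they are cast to float here.
    A lone offer object is wrapped in a list before indexing.
    """
    offers = event.get("offers")
    if not offers:
        return None
    if isinstance(offers, dict):
        offers = [offers]
    first = offers[0]
    return float(first["lowPrice"]), float(first["highPrice"]), first.get("priceCurrency")


def next_data_context(html):
    """Return props.pageProps.context from __NEXT_DATA__, or None when absent."""
    match = NEXT_DATA_RE.search(html)
    if not match:
        return None  # the skill reports that listing (/d/) pages have no __NEXT_DATA__
    try:
        data = json.loads(match.group(1))
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    return data.get("props", {}).get("pageProps", {}).get("context")
```

On a `/d/` listing page, `listing_events(html)` returns the event dicts whose absolute `url` values the skill recommends following. On an `/e/` detail page, `detail_event(html)` and `next_data_context(html)` return the JSON-LD `Event` object and the Next.js context respectively. Either may be `None` if the markup changes.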
@@ -1,363 +1,363 @@
- # Eventbrite — Scraping & Data Extraction
-
- `https://www.eventbrite.com` — public event listings and detail pages, no auth required for HTML scraping. REST API requires an OAuth token.
-
- ## Do this first
-
- **Use the search listing URL to get event lists — parse the `ItemList` JSON-LD block, not the HTML.**
-
- ```python
- import re, json
-
- headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36"}
- html = http_get("https://www.eventbrite.com/d/ca--san-francisco/tech/", headers=headers)
-
- ld_blocks = re.findall(r'<script type="application/ld\+json">(.*?)</script>', html, re.DOTALL)
- for block in ld_blocks:
-     parsed = json.loads(block)
-     if isinstance(parsed, dict) and parsed.get('@type') == 'ItemList':
-         for item in parsed['itemListElement']:
-             ev = item['item']
-             print(ev['name'], ev['startDate'], ev['url'])
-         break
- # Returns 18–40 events per page
- ```
-
- **For a single event, fetch the detail page and extract the `Event` JSON-LD block.** It contains all fields including `offers` (pricing). There is also a richer `__NEXT_DATA__` block if you need venue coordinates, refund policy, or sales status.
-
- ## URL structure
-
- ### Search / listing pages
-
- ```
- https://www.eventbrite.com/d/{location}/{category}/
- https://www.eventbrite.com/d/{location}/{category}/?page=2
- https://www.eventbrite.com/d/{location}/{category}/?start_date=2026-05-01&end_date=2026-05-31
- ```
-
- **Location format:** `{state-abbreviation}--{city}` (lowercase, hyphens for spaces)
- - `ca--san-francisco`
- - `ny--new-york`
- - `ca--los-angeles`
- - Use `online` for virtual events
-
- **Category slugs (confirmed working):**
- - `tech` — Technology events
- - `music` — Music
- - `food--drink` — Food & Drink
- - `health` — Health & Wellness
- - `sports--fitness` — Sports & Fitness
- - `arts--entertainment` — Arts & Entertainment
- - `family--education` — Family & Education
- - `business--professional` — Business & Networking
- - `science--tech` — Science & Technology
- - `community--culture` — Community & Culture
- - `networking` — Networking
- - `events` — All events (broadest, returns ~40/page)
-
- **Filter slugs (replace category):**
- - `free--events` — Free events only
- - `events--today` — Today
- - `events--tomorrow` — Tomorrow
- - `events--this-weekend` — This weekend
-
- **Query params:**
- - `?page=N` — Pagination (page 2+ confirmed working, each returns 18–20 events)
- - `?start_date=YYYY-MM-DD&end_date=YYYY-MM-DD` — Date range filter (confirmed, narrows results)
-
- ### Event detail pages
-
- ```
- https://www.eventbrite.com/e/{slug}-tickets-{event_id}
- ```
-
- Example: `https://www.eventbrite.com/e/icontact-the-tactile-tech-opera-tickets-1982861003639`
-
- - `event_id` is a numeric string (10–13 digits)
- - Extract with: `re.search(r'-tickets-(\d+)$', url).group(1)`
- - Extract slug with: `re.search(r'/e/(.+)-tickets-\d+$', url).group(1)`
-
- Other TLDs (`.ca`, `.co.uk`, etc.) use the same structure — event IDs are globally unique across TLDs.
-
- ## Listing page: JSON-LD `ItemList` schema
-
- The first `<script type="application/ld+json">` block on any `/d/` page is an `ItemList`. Each `itemListElement` contains:
-
- ```json
- {
-   "position": 1,
-   "@type": "ListItem",
-   "item": {
-     "@type": "Event",
-     "name": "iContact the tactile tech opera",
-     "description": "An immersive performance...",
-     "url": "https://www.eventbrite.com/e/icontact-the-tactile-tech-opera-tickets-1982861003639",
-     "image": "https://img.evbuc.com/...",
-     "startDate": "2026-06-21",
-     "endDate": "2026-06-21",
-     "eventAttendanceMode": "https://schema.org/OfflineEventAttendanceMode",
-     "location": {
-       "@type": "Place",
-       "name": "Little Boxes Theater",
-       "address": {
-         "@type": "PostalAddress",
-         "addressLocality": "San Francisco",
-         "addressRegion": "CA",
-         "addressCountry": "US",
-         "streetAddress": "94107 1661 Tennessee Street",
-         "postalCode": "94107"
-       },
-       "geo": {
-         "@type": "GeoCoordinates",
-         "latitude": "37.7508806",
-         "longitude": "-122.3881427"
-       }
-     }
-   }
- }
- ```
-
- Note: listing-page items do NOT include `offers` (pricing) or `organizer`. Fetch the detail page for those.
-
- The second JSON-LD block on listing pages is a `BreadcrumbList` (skip it).
-
- ## Detail page: JSON-LD `Event` schema
-
- The detail page has 4 JSON-LD blocks. The `Event` (or `BusinessEvent`) block is the second one and contains the full schema:
-
- ```python
- import re, json
-
- headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36"}
- html = http_get("https://www.eventbrite.com/e/icontact-the-tactile-tech-opera-tickets-1982861003639", headers=headers)
-
- ld_blocks = re.findall(r'<script type="application/ld\+json">(.*?)</script>', html, re.DOTALL)
- event_data = None
- for block in ld_blocks:
-     parsed = json.loads(block)
-     if isinstance(parsed, dict) and parsed.get('@type') in ('Event', 'BusinessEvent', 'MusicEvent', 'EducationEvent'):
-         event_data = parsed
-         break
-
- print(event_data['name']) # "iContact the tactile tech opera"
- print(event_data['startDate']) # "2026-06-21T17:05:00-07:00" (ISO 8601 with TZ)
- print(event_data['endDate']) # "2026-06-21T20:08:00-07:00"
- print(event_data['eventStatus']) # "https://schema.org/EventScheduled"
- print(event_data['eventAttendanceMode']) # "https://schema.org/OfflineEventAttendanceMode"
- print(event_data['location']['name']) # "Little Boxes Theater"
- print(event_data['location']['address']['streetAddress']) # "94107 1661 Tennessee Street, San Francisco, CA 94107"
- print(event_data['organizer']['name']) # "Beth McNamara"
- print(event_data['organizer']['url']) # "https://www.eventbrite.com/o/beth-mcnamara-120755148166"
- ```
-
- Full confirmed schema on detail page:
- ```
- name                              str   Event title
- description                       str   Short summary
- url                               str   Canonical event URL
- image                             str   Event banner image URL
- startDate                         str   ISO 8601 with timezone offset
- endDate                           str   ISO 8601 with timezone offset
- eventStatus                       str   URI: EventScheduled / EventCancelled / EventPostponed
- eventAttendanceMode               str   URI: OfflineEventAttendanceMode / OnlineEventAttendanceMode / MixedEventAttendanceMode
- location.@type                    str   "Place" (in-person) or "VirtualLocation" (online)
- location.name                     str   Venue name
- location.address.streetAddress    str
- location.address.addressLocality  str   City
- location.address.addressRegion    str   State abbreviation
- location.address.addressCountry   str   Country code
- organizer.name                    str   Organizer display name
- organizer.url                     str   Organizer profile URL
- offers                            list  AggregateOffer object(s)
- ```
-
- ### Offers / pricing
-
- ```python
- offers = event_data.get('offers', [])
- if offers:
-     offer = offers[0] # always a list; typically one AggregateOffer
-     print(offer['@type']) # "AggregateOffer"
-     print(offer['lowPrice']) # "50.0" (string, not float)
-     print(offer['highPrice']) # "50.0"
-     print(offer['priceCurrency']) # "USD"
-     print(offer['availability']) # "InStock" / "SoldOut"
-     print(offer['availabilityStarts']) # ISO 8601 UTC
-     print(offer['availabilityEnds']) # ISO 8601 UTC
-
- # Free events: lowPrice="0.0", highPrice="0.0"
- # Free check: float(offer['lowPrice']) == 0.0
- ```
-
- `@type` on the event itself varies by format (all scrape identically):
- - `Event` — general
- - `BusinessEvent` — networking, professional
- - `MusicEvent` — concerts
- - `EducationEvent` — classes, workshops
-
- ## Detail page: `__NEXT_DATA__` (richer structured data)
-
- Every event detail page embeds a `<script id="__NEXT_DATA__">` block with additional fields not in JSON-LD:
-
- ```python
- import re, json
-
- nextjs = re.search(r'<script id="__NEXT_DATA__"[^>]*>(.*?)</script>', html, re.DOTALL)
- nd = json.loads(nextjs.group(1))
- context = nd['props']['pageProps']['context']
-
- bi = context['basicInfo']
- print(bi['id']) # "1982861003639" (event ID string)
- print(bi['name']) # event title
- print(bi['isFree']) # bool
- print(bi['isOnline']) # bool
- print(bi['currency']) # "USD"
- print(bi['status']) # "live" / "completed" / "canceled"
- print(bi['organizationId']) # numeric string
- print(bi['formatId']) # numeric string (event format category)
- print(bi['isProtected']) # bool — password-protected events
- print(bi['isSeries']) # bool — recurring series
- print(bi['created']) # ISO 8601 UTC creation timestamp
-
- # Venue with coordinates
- venue = bi['venue']
- print(venue['name']) # "Little Boxes Theater"
- print(venue['address']['city']) # "San Francisco"
- print(venue['address']['region']) # "CA"
- print(venue['address']['latitude']) # "37.7508806"
- print(venue['address']['longitude']) # "-122.3881427"
- print(venue['address']['localizedMultiLineAddressDisplay']) # list of strings
-
- # Organizer details
- org = bi['organizer']
- print(org['name']) # "Beth McNamara"
- print(org['url']) # organizer profile URL
- print(org['numEvents']) # int
- print(org['verified']) # bool
-
- # Sales status
- ss = context['salesStatus']
- print(ss['salesStatus']) # "on_sale" / "sold_out" / "sales_ended"
- print(ss['startSalesDate']['local']) # local datetime string
-
- # Good to know
- gtk = context['goodToKnow']['highlights']
- print(gtk['ageRestriction']) # "18+" or null
- print(gtk['durationInMinutes']) # int (e.g. 183)
- print(gtk['doorTime']) # local datetime string or null
- print(gtk['locationType']) # "in_person" or "online"
-
- # Refund policy
- refund = context['goodToKnow']['refundPolicy']
- print(refund['policyType']) # "custom" / "no_refunds" / "standard"
- print(refund['isRefundAllowed']) # bool
- print(refund['validDays']) # int or null
-
- # Full event description (HTML)
- for module in context['structuredContent']['modules']:
-     if module['type'] == 'text':
-         print(module['text']) # raw HTML, may need BeautifulSoup to strip tags
- ```
-
- ## Complete workflow: scrape events from a category
-
- ```python
- import re, json
-
- def get_events_from_listing(location, category, page=1):
-     """Returns list of event dicts with name, url, startDate, endDate, location."""
-     headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36"}
-     url = f"https://www.eventbrite.com/d/{location}/{category}/?page={page}"
-     html = http_get(url, headers=headers)
-     ld_blocks = re.findall(r'<script type="application/ld\+json">(.*?)</script>', html, re.DOTALL)
-     for block in ld_blocks:
-         parsed = json.loads(block)
-         if isinstance(parsed, dict) and parsed.get('@type') == 'ItemList':
-             return [item['item'] for item in parsed.get('itemListElement', [])]
-     return []
-
- def get_event_detail(event_url):
-     """Returns full Event JSON-LD + NEXT_DATA context for a single event."""
-     headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36"}
-     html = http_get(event_url, headers=headers)
-
-     # JSON-LD Event block
-     ld_blocks = re.findall(r'<script type="application/ld\+json">(.*?)</script>', html, re.DOTALL)
-     event_ld = None
-     for block in ld_blocks:
-         parsed = json.loads(block)
-         if isinstance(parsed, dict) and parsed.get('@type') in ('Event', 'BusinessEvent', 'MusicEvent', 'EducationEvent'):
-             event_ld = parsed
-             break
-
-     # NEXT_DATA context
-     nextjs = re.search(r'<script id="__NEXT_DATA__"[^>]*>(.*?)</script>', html, re.DOTALL)
-     context = None
-     if nextjs:
-         nd = json.loads(nextjs.group(1))
-         context = nd['props']['pageProps']['context']
-
-     return event_ld, context
-
- # Usage
- events = get_events_from_listing("ca--san-francisco", "tech", page=1)
- print(f"Found {len(events)} events") # 18–20 typical
-
- for ev in events[:3]:
-     print(ev['name'], ev['startDate'], ev['url'])
-
- # Deep-fetch one event
- ld, ctx = get_event_detail(events[0]['url'])
- if ld and ld.get('offers'):
-     price = float(ld['offers'][0]['lowPrice'])
-     currency = ld['offers'][0]['priceCurrency']
-     print(f"Price: {price} {currency}") # 0.0 USD (free) or e.g. 50.0 USD
- ```
-
- ## Public API: requires auth
-
- The Eventbrite REST API (`https://www.eventbriteapi.com/v3/`) requires an OAuth token for all endpoints:
-
- - `GET /v3/events/{id}/` — HTTP 401 without auth
- - `GET /v3/events/search/` — HTTP 404 (endpoint changed; auth also required)
-
- **Use HTML scraping instead** — the JSON-LD and `__NEXT_DATA__` data is equivalent to the API response and requires no credentials.
-
- If you have a token (`EVENTBRITE_TOKEN`):
- ```python
- import os
- token = os.environ.get('EVENTBRITE_TOKEN')
- headers = {
-     "User-Agent": "Mozilla/5.0",
-     "Authorization": f"Bearer {token}"
- }
- data = json.loads(http_get(f"https://www.eventbriteapi.com/v3/events/{event_id}/", headers=headers))
- ```
-
- ## Gotchas
-
- - **Event URLs in the HTML use relative `/e/` paths, not absolute URLs** — Search listing HTML contains `/e/slug-tickets-id?aff=...` relative paths (with tracking params). Extract event URLs from the JSON-LD `ItemList` instead — they are absolute, clean URLs without tracking params.
-
- - **`re.findall(r'href="https://www.eventbrite.com/e/...')` returns 0 results** — Confirmed: event cards in the HTML do not have `https://www.eventbrite.com/e/` in href attributes. Use JSON-LD extraction only.
-
- - **`__SERVER_DATA__` does not exist** — Both search and detail pages were checked. There is no `window.__SERVER_DATA__` or `window.__redux_state__`. The embedded data is in `<script id="__NEXT_DATA__">` (detail pages only) and JSON-LD (both).
-
- - **Search listing pages have no `__NEXT_DATA__`** — Only event detail pages (`/e/` URLs) have the `__NEXT_DATA__` block. Listing pages (`/d/` URLs) have JSON-LD only.
-
- - **`@type` varies by event format** — Don't filter JSON-LD blocks with `parsed['@type'] == 'Event'` alone. Check for any of: `Event`, `BusinessEvent`, `MusicEvent`, `EducationEvent`. They have identical field structure.
-
- - **`startDate` on listing vs. detail pages differs in precision** — Listing page items show date-only (`"2026-06-21"`). Detail page Event block shows full ISO 8601 with timezone offset (`"2026-06-21T17:05:00-07:00"`). Use detail page for scheduling tasks.
-
- - **`offers` is absent on listing page items** — The `ItemList` does not include pricing. Fetch the detail page for `offers.lowPrice` / `offers.highPrice`.
-
- - **Free events have `lowPrice: "0.0"` and `highPrice: "0.0"`** — Not null or missing. Check `float(offers[0]['lowPrice']) == 0.0` or use `basicInfo.isFree` from `__NEXT_DATA__`.
-
- - **`offers` prices are strings, not floats** — `"50.0"` not `50.0`. Cast with `float(offer['lowPrice'])` before arithmetic.
-
- - **Page size is ~18–20 events per page** — Not a fixed 20. Some pages return fewer. Don't assume page N is empty because it returned < 20.
-
- - **Date filter works but can still return events outside range** — The `?start_date=` / `?end_date=` params narrow results but are not strict; always validate `startDate` from the returned data.
-
- - **Eventbrite CA / UK / AU use different TLDs** — Online event listings may surface `eventbrite.ca`, `eventbrite.co.uk` URLs. The `/e/` structure and JSON-LD schema are identical. Fetch them with the same code.
-
- - **No rate limiting observed** — 8 sequential HTTP requests across 4 pages completed without errors or blocks (avg ~1.5s each). No delay needed for light workloads, but be reasonable for bulk scraping.
# Eventbrite — Scraping & Data Extraction

`https://www.eventbrite.com` — public event listings and detail pages, no auth required for HTML scraping. The REST API requires an OAuth token.

## Do this first

**Use the search listing URL to get event lists — parse the `ItemList` JSON-LD block, not the HTML.**

```python
import re, json

# http_get is assumed to be the workspace fetch helper; it returns the page HTML as a string
headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36"}
html = http_get("https://www.eventbrite.com/d/ca--san-francisco/tech/", headers=headers)

ld_blocks = re.findall(r'<script type="application/ld\+json">(.*?)</script>', html, re.DOTALL)
for block in ld_blocks:
    parsed = json.loads(block)
    if isinstance(parsed, dict) and parsed.get('@type') == 'ItemList':
        for item in parsed['itemListElement']:
            ev = item['item']
            print(ev['name'], ev['startDate'], ev['url'])
        break
# Typically ~18–20 events per page for a category slug; the broad `events` slug returns ~40
```
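The regex above relies on the exact tag `<script type="application/ld+json">` and on every block being valid JSON. A slightly more defensive variant is sketched below. The helper names `iter_ld_json` and `find_item_list` are hypothetical, and tolerating extra tag attributes, malformed blocks, top-level lists, and `@graph` wrappers is a precaution, not behavior confirmed on Eventbrite pages.

```python
import json
import re

# Matches <script ... type="application/ld+json" ...> with any other attributes
LD_JSON_RE = re.compile(
    r'<script[^>]*type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
    re.DOTALL | re.IGNORECASE,
)


def iter_ld_json(html):
    """Yield each JSON-LD object on the page, flattening top-level lists and @graph."""
    for raw in LD_JSON_RE.findall(html):
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip a malformed block instead of failing the whole page
        for obj in parsed if isinstance(parsed, list) else [parsed]:
            if not isinstance(obj, dict):
                continue
            graph = obj.get('@graph')
            if isinstance(graph, list):
                yield from (g for g in graph if isinstance(g, dict))
            else:
                yield obj


def find_item_list(html):
    """Return the `item` dicts of the first ItemList block, or [] if none is found."""
    for obj in iter_ld_json(html):
        if obj.get('@type') == 'ItemList':
            return [el['item'] for el in obj.get('itemListElement', [])
                    if isinstance(el, dict) and 'item' in el]
    return []
```

Usage: `events = find_item_list(http_get(url, headers=headers))` replaces the loop above and returns `[]` instead of raising when a page has no usable block.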

**For a single event, fetch the detail page and extract the `Event` JSON-LD block.** It contains the listing fields plus `offers` (pricing) and `organizer`. There is also a richer `__NEXT_DATA__` block if you need venue coordinates, refund policy, or sales status.

## URL structure

### Search / listing pages

```
https://www.eventbrite.com/d/{location}/{category}/
https://www.eventbrite.com/d/{location}/{category}/?page=2
https://www.eventbrite.com/d/{location}/{category}/?start_date=2026-05-01&end_date=2026-05-31
```

**Location format:** `{state-abbreviation}--{city}` (lowercase, hyphens for spaces)
- `ca--san-francisco`
- `ny--new-york`
- `ca--los-angeles`
- Use `online` for virtual events

**Category slugs (confirmed working):**
- `tech` — Technology events
- `music` — Music
- `food--drink` — Food & Drink
- `health` — Health & Wellness
- `sports--fitness` — Sports & Fitness
- `arts--entertainment` — Arts & Entertainment
- `family--education` — Family & Education
- `business--professional` — Business & Networking
- `science--tech` — Science & Technology
- `community--culture` — Community & Culture
- `networking` — Networking
- `events` — All events (broadest, returns ~40/page)

**Filter slugs (replace category):**
- `free--events` — Free events only
- `events--today` — Today
- `events--tomorrow` — Tomorrow
- `events--this-weekend` — This weekend

**Query params:**
- `?page=N` — Pagination (page 2+ confirmed working; each page returns ~18–20 events)
- `?start_date=YYYY-MM-DD&end_date=YYYY-MM-DD` — Date range filter (confirmed to narrow results, but not strictly; see Gotchas)
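A small builder for the listing URLs described above. `location_slug` and `listing_url` are hypothetical helper names; the slug rule (lowercase, hyphens for spaces, `--` between state and city) follows the format listed here, and city names with punctuation are not covered by anything confirmed.

```python
from urllib.parse import urlencode


def location_slug(state, city):
    """Build '{state}--{city}': lowercase, runs of whitespace replaced by one hyphen."""
    return f"{state.strip().lower()}--{'-'.join(city.lower().split())}"


def listing_url(location, category, page=1, start_date=None, end_date=None):
    """Compose a /d/ listing URL with optional pagination and date-range params."""
    params = {}
    if page and page > 1:
        params['page'] = page
    if start_date:
        params['start_date'] = start_date  # 'YYYY-MM-DD'
    if end_date:
        params['end_date'] = end_date      # 'YYYY-MM-DD'
    base = f"https://www.eventbrite.com/d/{location}/{category}/"
    return f"{base}?{urlencode(params)}" if params else base
```

Page 1 is emitted without `?page=1`, matching the first URL pattern above.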

### Event detail pages

```
https://www.eventbrite.com/e/{slug}-tickets-{event_id}
```

Example: `https://www.eventbrite.com/e/icontact-the-tactile-tech-opera-tickets-1982861003639`

- `event_id` is a numeric string (10–13 digits)
- Extract the ID with: `re.search(r'-tickets-(\d+)', url).group(1)` (no `$` anchor, so it also works on URLs that still carry `?aff=...` tracking params)
- Extract the slug with: `re.search(r'/e/(.+)-tickets-\d+', url).group(1)`

Other TLDs (`.ca`, `.co.uk`, etc.) use the same structure; event IDs are globally unique across TLDs.
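Parsing the URL path instead of the whole string keeps the ID extraction independent of tracking params and of the TLD. A sketch; `parse_event_url` is a hypothetical name, and treating any host containing `eventbrite.` (or a relative `/e/` path) as valid is an assumption.

```python
import re
from urllib.parse import urlparse

EVENT_PATH_RE = re.compile(r'^/e/(?P<slug>.+)-tickets-(?P<event_id>\d+)/?$')


def parse_event_url(url):
    """Return (slug, event_id) for an /e/ detail URL on any Eventbrite TLD, else None."""
    parts = urlparse(url)
    if parts.netloc and 'eventbrite.' not in parts.netloc:
        return None
    m = EVENT_PATH_RE.match(parts.path)  # parts.path excludes any ?aff=... query string
    return (m.group('slug'), m.group('event_id')) if m else None
```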

## Listing page: JSON-LD `ItemList` schema

The first `<script type="application/ld+json">` block on any `/d/` page is an `ItemList`. Each `itemListElement` contains:

```json
{
  "position": 1,
  "@type": "ListItem",
  "item": {
    "@type": "Event",
    "name": "iContact the tactile tech opera",
    "description": "An immersive performance...",
    "url": "https://www.eventbrite.com/e/icontact-the-tactile-tech-opera-tickets-1982861003639",
    "image": "https://img.evbuc.com/...",
    "startDate": "2026-06-21",
    "endDate": "2026-06-21",
    "eventAttendanceMode": "https://schema.org/OfflineEventAttendanceMode",
    "location": {
      "@type": "Place",
      "name": "Little Boxes Theater",
      "address": {
        "@type": "PostalAddress",
        "addressLocality": "San Francisco",
        "addressRegion": "CA",
        "addressCountry": "US",
        "streetAddress": "94107 1661 Tennessee Street",
        "postalCode": "94107"
      },
      "geo": {
        "@type": "GeoCoordinates",
        "latitude": "37.7508806",
        "longitude": "-122.3881427"
      }
    }
  }
}
```

Note: listing-page items do NOT include `offers` (pricing) or `organizer`. Fetch the detail page for those.

The second JSON-LD block on listing pages is a `BreadcrumbList` (skip it).
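Listing items nest venue data two levels deep and store coordinates as strings. A flattening sketch; `flatten_listing_item` and its output keys are hypothetical, and every lookup uses `.get()` with type checks on the assumption that online events may lack `address` or `geo`.

```python
def _to_float(value):
    """'37.7508806' -> 37.7508806; None for missing or non-numeric values."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def flatten_listing_item(ev):
    """Flatten one ItemList `item` dict into a single-level record."""
    loc = ev.get('location')
    loc = loc if isinstance(loc, dict) else {}
    addr = loc.get('address') if isinstance(loc.get('address'), dict) else {}
    geo = loc.get('geo') if isinstance(loc.get('geo'), dict) else {}
    # Keep only the last URI segment, e.g. 'OfflineEventAttendanceMode'
    mode = (ev.get('eventAttendanceMode') or '').rsplit('/', 1)[-1]
    return {
        'name': ev.get('name'),
        'url': ev.get('url'),
        'start_date': ev.get('startDate'),  # date-only on listing pages
        'end_date': ev.get('endDate'),
        'attendance_mode': mode or None,
        'venue': loc.get('name'),
        'city': addr.get('addressLocality'),
        'region': addr.get('addressRegion'),
        'country': addr.get('addressCountry'),
        'lat': _to_float(geo.get('latitude')),
        'lon': _to_float(geo.get('longitude')),
    }
```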

## Detail page: JSON-LD `Event` schema

The detail page has 4 JSON-LD blocks. The `Event` (or `BusinessEvent`, etc.) block is the second one and contains the full schema; match it by `@type` rather than by position:

```python
import re, json

headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36"}
html = http_get("https://www.eventbrite.com/e/icontact-the-tactile-tech-opera-tickets-1982861003639", headers=headers)

ld_blocks = re.findall(r'<script type="application/ld\+json">(.*?)</script>', html, re.DOTALL)
event_data = None
for block in ld_blocks:
    parsed = json.loads(block)
    if isinstance(parsed, dict) and parsed.get('@type') in ('Event', 'BusinessEvent', 'MusicEvent', 'EducationEvent'):
        event_data = parsed
        break

if event_data is None:
    raise RuntimeError("No Event JSON-LD block found on the detail page")

print(event_data['name'])                 # "iContact the tactile tech opera"
print(event_data['startDate'])            # "2026-06-21T17:05:00-07:00" (ISO 8601 with TZ)
print(event_data['endDate'])              # "2026-06-21T20:08:00-07:00"
print(event_data['eventStatus'])          # "https://schema.org/EventScheduled"
print(event_data['eventAttendanceMode'])  # "https://schema.org/OfflineEventAttendanceMode"
print(event_data['location']['name'])     # "Little Boxes Theater"
print(event_data['location']['address']['streetAddress'])  # "94107 1661 Tennessee Street, San Francisco, CA 94107"
print(event_data['organizer']['name'])    # "Beth McNamara"
print(event_data['organizer']['url'])     # "https://www.eventbrite.com/o/beth-mcnamara-120755148166"
```

Full confirmed schema on detail page:
```
name                              str   Event title
description                       str   Short summary
url                               str   Canonical event URL
image                             str   Event banner image URL
startDate                         str   ISO 8601 with timezone offset
endDate                           str   ISO 8601 with timezone offset
eventStatus                       str   URI: EventScheduled / EventCancelled / EventPostponed
eventAttendanceMode               str   URI: OfflineEventAttendanceMode / OnlineEventAttendanceMode / MixedEventAttendanceMode
location.@type                    str   "Place" (in-person) or "VirtualLocation" (online)
location.name                     str   Venue name
location.address.streetAddress    str   Street address
location.address.addressLocality  str   City
location.address.addressRegion    str   State abbreviation
location.address.addressCountry   str   Country code
organizer.name                    str   Organizer display name
organizer.url                     str   Organizer profile URL
offers                            list  AggregateOffer object(s)
```
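`eventStatus` and `eventAttendanceMode` are full schema.org URIs. The sketch below reduces them to their last path segment and summarizes a detail-page `Event` block. `schema_term` and `summarize_event` are hypothetical names, and `is_active` only checks the cancelled/postponed values listed in the table above.

```python
def schema_term(uri):
    """'https://schema.org/EventScheduled' -> 'EventScheduled'; falsy input -> None."""
    if not uri:
        return None
    return uri.rstrip('/').rsplit('/', 1)[-1]


def summarize_event(event_data):
    """Pick the commonly used fields out of a detail-page Event JSON-LD dict."""
    status = schema_term(event_data.get('eventStatus'))
    location = event_data.get('location') or {}
    organizer = event_data.get('organizer') or {}
    return {
        'name': event_data.get('name'),
        'start': event_data.get('startDate'),
        'status': status,
        'attendance_mode': schema_term(event_data.get('eventAttendanceMode')),
        'is_virtual': location.get('@type') == 'VirtualLocation',
        'is_active': status not in ('EventCancelled', 'EventPostponed'),
        'venue': location.get('name'),
        'organizer': organizer.get('name'),
    }
```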

### Offers / pricing

```python
offers = event_data.get('offers', [])
if offers:
    offer = offers[0]                    # always a list; typically one AggregateOffer
    print(offer['@type'])                # "AggregateOffer"
    print(offer['lowPrice'])             # "50.0" (string, not float)
    print(offer['highPrice'])            # "50.0"
    print(offer['priceCurrency'])        # "USD"
    print(offer['availability'])         # "InStock" / "SoldOut"
    print(offer['availabilityStarts'])   # ISO 8601 UTC
    print(offer['availabilityEnds'])     # ISO 8601 UTC

# Free events: lowPrice="0.0", highPrice="0.0"
# Free check: float(offer['lowPrice']) == 0.0
```
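Because prices arrive as strings and may be missing, a small parser keeps the casting and the free check in one place. This variant uses `Decimal` instead of `float` so that sums of prices stay exact; that is a design choice, not something the page requires. `parse_offer` is a hypothetical name, and accepting a single offer object (not wrapped in a list) is purely defensive.

```python
from decimal import Decimal, InvalidOperation


def parse_offer(event_data):
    """Return (low, high, currency, is_free) from the first offer, or None if unusable."""
    offers = event_data.get('offers') or []
    if isinstance(offers, dict):  # defensive: schema.org also allows a single object
        offers = [offers]
    if not offers:
        return None
    offer = offers[0]
    try:
        low = Decimal(offer['lowPrice'])                          # "50.0" -> Decimal('50.0')
        high = Decimal(offer.get('highPrice', offer['lowPrice']))
    except (KeyError, TypeError, InvalidOperation):
        return None
    return low, high, offer.get('priceCurrency'), high == 0
```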

`@type` on the event itself varies by format (all scrape identically):
- `Event` — general
- `BusinessEvent` — networking, professional
- `MusicEvent` — concerts
- `EducationEvent` — classes, workshops
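A reusable type check for the four observed types. `EVENT_TYPES` and `is_event_block` are hypothetical names. Accepting a list-valued `@type` (which JSON-LD permits) is an assumption that was not observed on Eventbrite; other schema.org `Event` subtypes could be added to the set if they turn up.

```python
EVENT_TYPES = {'Event', 'BusinessEvent', 'MusicEvent', 'EducationEvent'}


def is_event_block(obj):
    """True if a parsed JSON-LD value is a dict whose @type is an observed event type."""
    if not isinstance(obj, dict):
        return False
    types = obj.get('@type')
    if isinstance(types, str):
        types = [types]
    return isinstance(types, list) and any(t in EVENT_TYPES for t in types)
```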

## Detail page: `__NEXT_DATA__` (richer structured data)

Every event detail page embeds a `<script id="__NEXT_DATA__">` block with additional fields not in JSON-LD:

```python
import re, json

# `html` is the detail-page HTML fetched in the previous example
nextjs = re.search(r'<script id="__NEXT_DATA__"[^>]*>(.*?)</script>', html, re.DOTALL)
if nextjs is None:
    raise RuntimeError("No __NEXT_DATA__ block (listing pages never have one)")
nd = json.loads(nextjs.group(1))
context = nd['props']['pageProps']['context']

bi = context['basicInfo']
print(bi['id'])              # "1982861003639" (event ID string)
print(bi['name'])            # event title
print(bi['isFree'])          # bool
print(bi['isOnline'])        # bool
print(bi['currency'])        # "USD"
print(bi['status'])          # "live" / "completed" / "canceled"
print(bi['organizationId'])  # numeric string
print(bi['formatId'])        # numeric string (event format category)
print(bi['isProtected'])     # bool — password-protected events
print(bi['isSeries'])        # bool — recurring series
print(bi['created'])         # ISO 8601 UTC creation timestamp

# Venue with coordinates
venue = bi['venue']
print(venue['name'])                  # "Little Boxes Theater"
print(venue['address']['city'])       # "San Francisco"
print(venue['address']['region'])     # "CA"
print(venue['address']['latitude'])   # "37.7508806"
print(venue['address']['longitude'])  # "-122.3881427"
print(venue['address']['localizedMultiLineAddressDisplay'])  # list of strings

# Organizer details
org = bi['organizer']
print(org['name'])       # "Beth McNamara"
print(org['url'])        # organizer profile URL
print(org['numEvents'])  # int
print(org['verified'])   # bool

# Sales status
ss = context['salesStatus']
print(ss['salesStatus'])              # "on_sale" / "sold_out" / "sales_ended"
print(ss['startSalesDate']['local'])  # local datetime string

# Good to know
gtk = context['goodToKnow']['highlights']
print(gtk['ageRestriction'])     # "18+" or None
print(gtk['durationInMinutes'])  # int (e.g. 183)
print(gtk['doorTime'])           # local datetime string or None
print(gtk['locationType'])       # "in_person" or "online"

# Refund policy
refund = context['goodToKnow']['refundPolicy']
print(refund['policyType'])       # "custom" / "no_refunds" / "standard"
print(refund['isRefundAllowed'])  # bool
print(refund['validDays'])        # int or None

# Full event description (HTML)
for module in context['structuredContent']['modules']:
    if module['type'] == 'text':
        print(module['text'])  # raw HTML; strip tags (e.g. with BeautifulSoup) for plain text
```
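The description modules hold raw HTML. Instead of BeautifulSoup, the standard library's `html.parser` is enough to get readable text; this is a swapped-in technique, not something the page requires. `html_to_text` and `event_description_text` are hypothetical names, and the set of tags treated as line breaks is a guess at typical description markup.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text nodes, starting a new line at common block-level tags."""

    BLOCK_TAGS = {'p', 'br', 'div', 'li', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6'}

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decodes &amp;, &nbsp;, etc.
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.BLOCK_TAGS:
            self.parts.append('\n')

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(fragment):
    """Strip tags and collapse whitespace, keeping one line per block element."""
    parser = _TextExtractor()
    parser.feed(fragment)
    parser.close()  # flush any buffered text
    lines = (' '.join(line.split()) for line in ''.join(parser.parts).splitlines())
    return '\n'.join(line for line in lines if line)


def event_description_text(context):
    """Plain-text description from the __NEXT_DATA__ structuredContent 'text' modules."""
    modules = (context.get('structuredContent') or {}).get('modules') or []
    texts = (html_to_text(m.get('text') or '') for m in modules if m.get('type') == 'text')
    return '\n'.join(t for t in texts if t)
```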

## Complete workflow: scrape events from a category

```python
import re, json

HEADERS = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36"}
LD_JSON_RE = r'<script type="application/ld\+json">(.*?)</script>'
EVENT_TYPES = ('Event', 'BusinessEvent', 'MusicEvent', 'EducationEvent')

def get_events_from_listing(location, category, page=1):
    """Return a list of event dicts with name, url, startDate, endDate, location."""
    url = f"https://www.eventbrite.com/d/{location}/{category}/?page={page}"
    html = http_get(url, headers=HEADERS)
    for block in re.findall(LD_JSON_RE, html, re.DOTALL):
        parsed = json.loads(block)
        if isinstance(parsed, dict) and parsed.get('@type') == 'ItemList':
            return [item['item'] for item in parsed.get('itemListElement', [])]
    return []

def get_event_detail(event_url):
    """Return (Event JSON-LD dict or None, __NEXT_DATA__ context dict or None)."""
    html = http_get(event_url, headers=HEADERS)

    # JSON-LD Event block (any of the observed event @types)
    event_ld = None
    for block in re.findall(LD_JSON_RE, html, re.DOTALL):
        parsed = json.loads(block)
        if isinstance(parsed, dict) and parsed.get('@type') in EVENT_TYPES:
            event_ld = parsed
            break

    # __NEXT_DATA__ context (detail pages only)
    context = None
    nextjs = re.search(r'<script id="__NEXT_DATA__"[^>]*>(.*?)</script>', html, re.DOTALL)
    if nextjs:
        nd = json.loads(nextjs.group(1))
        context = nd['props']['pageProps']['context']

    return event_ld, context

# Usage
events = get_events_from_listing("ca--san-francisco", "tech", page=1)
print(f"Found {len(events)} events")  # 18–20 typical

for ev in events[:3]:
    print(ev['name'], ev['startDate'], ev['url'])

# Deep-fetch one event (guard against an empty listing page)
if events:
    ld, ctx = get_event_detail(events[0]['url'])
    if ld and ld.get('offers'):
        price = float(ld['offers'][0]['lowPrice'])  # prices are strings
        currency = ld['offers'][0]['priceCurrency']
        print(f"Price: {price} {currency}")  # e.g. "Price: 0.0 USD" (free) or "Price: 50.0 USD"
```
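To collect more than one page, the listing fetch can be wrapped in a pagination loop. This sketch assumes an empty page marks the end (a short page does not, per the page-size gotcha), de-duplicates by `url`, and pauses between requests. `iter_all_events`, `max_pages`, and the 1-second default `delay` are hypothetical choices; no rate limit was observed, so the delay is only a courtesy. The page fetcher is passed in so the loop can be tested without network access.

```python
import time


def iter_all_events(fetch_page, max_pages=20, delay=1.0, sleep=time.sleep):
    """Yield unique listing events across pages until a page comes back empty.

    fetch_page(page) must return a list of event dicts for that page number.
    """
    seen = set()
    for page in range(1, max_pages + 1):
        events = fetch_page(page)
        if not events:
            break  # an empty page, not a short one, marks the end
        for ev in events:
            key = ev.get('url')
            if key is not None:
                if key in seen:
                    continue  # same event surfaced on two pages
                seen.add(key)
            yield ev
        if page < max_pages:
            sleep(delay)  # courtesy pause before the next request
```

Usage with the functions above: `all_events = list(iter_all_events(lambda p: get_events_from_listing("ca--san-francisco", "tech", page=p)))`.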

## Public API: requires auth

The Eventbrite REST API (`https://www.eventbriteapi.com/v3/`) requires an OAuth token for all endpoints:

- `GET /v3/events/{id}/` — HTTP 401 without auth
- `GET /v3/events/search/` — HTTP 404 (endpoint no longer available; auth is also required)

**Use HTML scraping instead** — the JSON-LD and `__NEXT_DATA__` blocks cover the core event data the API returns and require no credentials.

If you have a token (`EVENTBRITE_TOKEN`):
```python
import os, re, json

token = os.environ['EVENTBRITE_TOKEN']  # KeyError if unset, instead of sending "Bearer None"
event_id = re.search(r'-tickets-(\d+)', event_url).group(1)  # event_url: any /e/ detail URL
headers = {
    "User-Agent": "Mozilla/5.0",
    "Authorization": f"Bearer {token}",
}
data = json.loads(http_get(f"https://www.eventbriteapi.com/v3/events/{event_id}/", headers=headers))
```

## Gotchas

- **Event URLs in the listing HTML are relative `/e/` paths, not absolute URLs** — Search listing HTML contains relative `/e/slug-tickets-id?aff=...` paths (with tracking params). Extract event URLs from the JSON-LD `ItemList` instead; those are absolute, clean URLs without tracking params.

- **`re.findall(r'href="https://www.eventbrite.com/e/...')` returns 0 results** — Confirmed: event cards in the HTML do not have `https://www.eventbrite.com/e/` in their href attributes. Use JSON-LD extraction only.

- **`__SERVER_DATA__` does not exist** — Both search and detail pages were checked; there is no `window.__SERVER_DATA__` or `window.__redux_state__`. The embedded data is in `<script id="__NEXT_DATA__">` (detail pages only) and JSON-LD (both page types).

- **Search listing pages have no `__NEXT_DATA__`** — Only event detail pages (`/e/` URLs) have the `__NEXT_DATA__` block. Listing pages (`/d/` URLs) have JSON-LD only.

- **`@type` varies by event format** — Don't filter JSON-LD blocks with `parsed['@type'] == 'Event'` alone. Check for any of `Event`, `BusinessEvent`, `MusicEvent`, `EducationEvent`; they share the same field structure.

- **`startDate` precision differs between listing and detail pages** — Listing items are date-only (`"2026-06-21"`). The detail-page Event block has full ISO 8601 with a timezone offset (`"2026-06-21T17:05:00-07:00"`). Use the detail page for scheduling tasks.

- **`offers` is absent on listing page items** — The `ItemList` does not include pricing. Fetch the detail page for `offers[0]['lowPrice']` / `offers[0]['highPrice']`.

- **Free events have `lowPrice: "0.0"` and `highPrice: "0.0"`** — Not null or missing. Check `float(offers[0]['lowPrice']) == 0.0`, or use `basicInfo.isFree` from `__NEXT_DATA__`.

- **`offers` prices are strings, not floats** — `"50.0"`, not `50.0`. Cast with `float(offer['lowPrice'])` before arithmetic.

- **Page size is ~18–20 events per page** — Not a fixed 20; some pages return fewer. A page with fewer than 20 results is not necessarily the last one, so keep paging until a page returns no events.

- **Date filter narrows results but is not strict** — The `?start_date=` / `?end_date=` params can still return events outside the range; always validate `startDate` in the returned data.

- **Eventbrite CA / UK / AU use different TLDs** — Online event listings may surface `eventbrite.ca` or `eventbrite.co.uk` URLs. The `/e/` structure and JSON-LD schema are identical, so the same code fetches them.

- **No rate limiting observed** — 8 sequential HTTP requests across 4 pages completed without errors or blocks (avg ~1.5 s each). No delay is needed for light workloads, but add a pause between requests when bulk scraping.
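Since the date filter is not strict and listing dates are date-only, a client-side check helps. A minimal sketch; `start_day`, `start_datetime`, and `within_range` are hypothetical names. Inclusive bounds, and comparing on the event's local calendar date (ignoring the offset), are assumptions.

```python
from datetime import date, datetime


def start_day(value):
    """Calendar date of a startDate string.

    The first 10 characters are YYYY-MM-DD in both the date-only listing form and
    the full ISO 8601 detail form, so the time and offset are simply dropped.
    """
    return date.fromisoformat(value[:10])


def start_datetime(value):
    """Timezone-aware datetime for detail-page values; None for date-only values."""
    return datetime.fromisoformat(value) if 'T' in value else None


def within_range(events, start, end):
    """Keep events whose startDate falls within [start, end], inclusive."""
    kept = []
    for ev in events:
        raw = ev.get('startDate')
        if raw and start <= start_day(raw) <= end:
            kept.append(ev)
    return kept
```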