@pencil-agent/nano-pencil 2.0.1 → 2.0.2
- package/README.md +267 -267
- package/dist/build-meta.json +3 -3
- package/dist/core/export-html/AGENT.md +11 -11
- package/dist/core/export-html/template.css +971 -971
- package/dist/core/export-html/template.html +54 -54
- package/dist/core/model/custom-providers.js +1 -1
- package/dist/core/model-registry.js +5 -5
- package/dist/extensions/builtin/AGENT.md +115 -115
- package/dist/extensions/builtin/browser/AGENT.md +17 -17
- package/dist/extensions/builtin/browser/agent-workspace/agent_helpers.py +12 -12
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/amazon/product-search.md +198 -198
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/archive-org/scraping.md +341 -341
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/arxiv/scraping.md +311 -311
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/arxiv-bulk/scraping.md +333 -333
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/atlas/overview.md +70 -70
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/booking-com/scraping.md +578 -578
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/capterra/scraping.md +440 -440
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/centilebrain/generate-estimates.md +110 -110
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/coingecko/scraping.md +325 -325
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/coinmarketcap/scraping.md +463 -463
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/coursera/scraping.md +360 -360
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/craigslist/scraping.md +390 -390
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/crossref/scraping.md +568 -568
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/dev-to/scraping.md +323 -323
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/duckduckgo/scraping.md +349 -349
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/ebay/scraping.md +435 -435
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/etsy/scraping.md +506 -506
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/eventbrite/scraping.md +363 -363
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/expedia/automation.md +168 -168
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/facebook/groups.md +236 -236
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/facebook/pages.md +295 -295
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/framer/editor.md +108 -108
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/fred/scraping.md +493 -493
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/g2/scraping.md +580 -580
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/genius/scraping.md +511 -511
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/github/repo-actions.md +65 -65
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/github/scraping.md +184 -184
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/glassdoor/scraping.md +543 -543
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/gmail/compose.md +122 -122
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/goodreads/scraping.md +461 -461
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/gutenberg/scraping.md +383 -383
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/hackernews/scraping.md +243 -243
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/howlongtobeat/scraping.md +473 -473
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/imdb/scraping.md +271 -271
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/itch-io/scraping.md +436 -436
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/job-boards/indeed-glassdoor.md +1021 -1021
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/letterboxd/scraping.md +349 -349
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/linkedin/invitation-manager.md +109 -109
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/loom/folder-enumeration.md +170 -170
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/macrotrends/scraping.md +537 -537
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/medium/article-hydration.md +120 -120
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/medium/scraping.md +414 -414
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/metacritic/scraping.md +477 -477
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/musicbrainz/scraping.md +478 -478
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/nasa/scraping.md +339 -339
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/news-aggregation/multi-source.md +205 -205
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/open-library/scraping.md +472 -472
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/openalex/scraping.md +470 -470
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/openstreetmap/scraping.md +490 -490
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/package-registries/npm-pypi.md +478 -478
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/polymarket/scraping.md +234 -234
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/producthunt/scraping.md +307 -307
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/pubmed/scraping.md +421 -421
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/quora/scraping.md +364 -364
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/rawg/scraping.md +352 -352
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/reddit/scraping.md +124 -124
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/rest-countries/scraping.md +233 -233
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/sec-edgar/scraping.md +361 -361
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/README.md +36 -36
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/embedded-apps.md +72 -72
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/knowledge-base.md +109 -109
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/polaris-inputs.md +137 -137
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/soundcloud/scraping.md +362 -362
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/spotify/scraping.md +339 -339
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/stackoverflow/scraping.md +435 -435
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/steam/scraping.md +575 -575
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/substack/scraping.md +338 -338
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/thetechgeeks/pricing.md +52 -52
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/tiktok/upload.md +107 -107
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/tradingview/scraping.md +309 -309
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/trello/boards-and-lists.md +88 -88
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/trustpilot/scraping.md +375 -375
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/walmart/scraping.md +444 -444
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/wayback-machine/scraping.md +306 -306
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/weather/scraping.md +398 -398
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/wellfound/scraping.md +596 -596
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/world-bank/scraping.md +356 -356
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/xiaohongshu/scraping.md +84 -84
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/youtube/scraping.md +418 -418
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/zillow/scraping.md +433 -433
- package/dist/extensions/builtin/browser/browser.md +73 -73
- package/dist/extensions/builtin/browser/install.md +142 -142
- package/dist/extensions/builtin/browser/interaction-skills/connection.md +48 -48
- package/dist/extensions/builtin/browser/interaction-skills/cookies.md +3 -3
- package/dist/extensions/builtin/browser/interaction-skills/cross-origin-iframes.md +3 -3
- package/dist/extensions/builtin/browser/interaction-skills/dialogs.md +64 -64
- package/dist/extensions/builtin/browser/interaction-skills/downloads.md +3 -3
- package/dist/extensions/builtin/browser/interaction-skills/drag-and-drop.md +3 -3
- package/dist/extensions/builtin/browser/interaction-skills/dropdowns.md +3 -3
- package/dist/extensions/builtin/browser/interaction-skills/iframes.md +3 -3
- package/dist/extensions/builtin/browser/interaction-skills/network-requests.md +3 -3
- package/dist/extensions/builtin/browser/interaction-skills/print-as-pdf.md +3 -3
- package/dist/extensions/builtin/browser/interaction-skills/profile-sync.md +90 -90
- package/dist/extensions/builtin/browser/interaction-skills/screenshots.md +17 -17
- package/dist/extensions/builtin/browser/interaction-skills/scrolling.md +3 -3
- package/dist/extensions/builtin/browser/interaction-skills/shadow-dom.md +3 -3
- package/dist/extensions/builtin/browser/interaction-skills/tabs.md +69 -69
- package/dist/extensions/builtin/browser/interaction-skills/uploads.md +1 -1
- package/dist/extensions/builtin/browser/interaction-skills/viewport.md +3 -3
- package/dist/extensions/builtin/browser/src/browser_harness/AGENT.md +15 -15
- package/dist/extensions/builtin/browser/src/browser_harness/__init__.py +8 -8
- package/dist/extensions/builtin/browser/src/browser_harness/_ipc.py +90 -90
- package/dist/extensions/builtin/browser/src/browser_harness/admin.py +722 -722
- package/dist/extensions/builtin/browser/src/browser_harness/daemon.py +328 -328
- package/dist/extensions/builtin/browser/src/browser_harness/helpers.py +396 -396
- package/dist/extensions/builtin/browser/src/browser_harness/run.py +103 -103
- package/dist/extensions/builtin/discipline/skills/brainstorming/SKILL.md +33 -33
- package/dist/extensions/builtin/discipline/skills/executing-plans/SKILL.md +25 -25
- package/dist/extensions/builtin/discipline/skills/finishing-development-branch/SKILL.md +25 -25
- package/dist/extensions/builtin/discipline/skills/receiving-code-review/SKILL.md +22 -22
- package/dist/extensions/builtin/discipline/skills/requesting-code-review/SKILL.md +31 -31
- package/dist/extensions/builtin/discipline/skills/systematic-debugging/SKILL.md +28 -28
- package/dist/extensions/builtin/discipline/skills/test-driven-development/SKILL.md +32 -32
- package/dist/extensions/builtin/discipline/skills/using-git-worktrees/SKILL.md +25 -25
- package/dist/extensions/builtin/discipline/skills/verification-before-completion/SKILL.md +27 -27
- package/dist/extensions/builtin/discipline/skills/writing-plans/SKILL.md +26 -26
- package/dist/extensions/builtin/goal/README.md +67 -67
- package/dist/extensions/builtin/grub/README.md +112 -112
- package/dist/extensions/builtin/link-world/agent-workspace/README.md +16 -16
- package/dist/extensions/builtin/link-world/internet-search/internet-search.md +65 -65
- package/dist/extensions/builtin/link-world/link-world-agent.md +82 -82
- package/dist/extensions/builtin/link-world/linkworld.md +313 -313
- package/dist/extensions/builtin/link-world/network-routing/network-routing.md +67 -67
- package/dist/extensions/builtin/loop/README.md +92 -92
- package/dist/extensions/builtin/mcp/figma-design.md +68 -68
- package/dist/extensions/builtin/mcp/mcp-management.md +85 -85
- package/dist/extensions/builtin/recap/AGENT.md +15 -15
- package/dist/extensions/builtin/sal/README.md +72 -72
- package/dist/extensions/builtin/security-audit/README.md +289 -289
- package/dist/extensions/builtin/team/AGENT.md +112 -112
- package/dist/extensions/builtin/team/TESTING.md +299 -299
- package/dist/extensions/builtin/token-save/README.md +56 -56
- package/dist/extensions/optional/AGENT.md +10 -10
- package/dist/modes/interactive/controllers/input-submit-controller.js +2 -2
- package/dist/modes/interactive/controllers/stream-render-controller.js +2 -2
- package/dist/modes/interactive/interactive-mode.js +19 -19
- package/dist/modes/interactive/theme/dark.json +85 -85
- package/dist/modes/interactive/theme/light.json +84 -84
- package/dist/modes/interactive/theme/theme-schema.json +335 -335
- package/dist/modes/interactive/theme/warm.json +81 -81
- package/dist/node_modules/@pencil-agent/ai/dist/cli.js +0 -0
- package/dist/node_modules/@pencil-agent/ai/dist/models.generated.js +1 -1
- package/docs/ACP/345/215/217/350/256/256/351/233/206/346/210/220/345/274/200/345/217/221/346/226/207/346/241/243.md +851 -0
- package/docs/SDK-TESTING.md +364 -0
- package/docs/codex-goal-command-impl.md +1055 -1055
- package/docs/codex-goal-vs-grub.md +500 -500
- package/docs/custom-provider.md +27 -27
- package/docs/extensions.md +27 -27
- package/docs/keybindings.md +27 -27
- package/docs/loop 重构完成总结.md +250 -250
- package/docs/loop 重构完成报告.md +122 -122
- package/docs/loop 重构方案.md +1222 -1222
- package/docs/loop 重构方案实现报告.md +158 -158
- package/docs/loop 重构方案对比分析.md +128 -128
- package/docs/loop 重构计划.md +320 -320
- package/docs/loop-usage-examples.md +214 -214
- package/docs/mem-core/346/212/200/346/234/257/346/226/207/346/241/243.md +593 -0
- package/docs/models.md +27 -27
- package/docs/packages.md +27 -27
- package/docs/pi-design-philosophy.md +457 -457
- package/docs/planmode.md +1987 -1987
- package/docs/prompt-templates.md +27 -27
- package/docs/providers.md +27 -27
- package/docs/sdk.md +27 -27
- package/docs/skills.md +27 -27
- package/docs/startup-performance-optimization.md +301 -0
- package/docs/themes.md +27 -27
- package/docs/tui.md +27 -27
- package/docs/认知地图.md +47 -0
- package/package.json +190 -190
- package/docs/cc-agent-design.md +0 -1297
- package/docs/cc-tui-design.md +0 -1333
- package/docs/nanoPencil-学习计划.md +0 -170
- package/docs/scan-report.md +0 -3820
- package/docs//345/257/271/346/240/207Claude-Code.md +0 -1775
- package/docs//351/230/277/351/207/214/345/267/264/345/267/264/350/264/242/346/212/245/345/210/206/346/236/220/344/271/246.md +0 -261
package/dist/extensions/builtin/browser/agent-workspace/domain-skills/eventbrite/scraping.md
CHANGED
@@ -1,363 +1,363 @@

# Eventbrite: Scraping & Data Extraction

`https://www.eventbrite.com`: public event listings and detail pages, no auth required for HTML scraping. The REST API requires an OAuth token.

## Do this first

**Use the search listing URL to get event lists: parse the `ItemList` JSON-LD block, not the HTML.**

```python
import re, json

headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36"}
html = http_get("https://www.eventbrite.com/d/ca--san-francisco/tech/", headers=headers)

ld_blocks = re.findall(r'<script type="application/ld\+json">(.*?)</script>', html, re.DOTALL)
for block in ld_blocks:
    parsed = json.loads(block)
    if isinstance(parsed, dict) and parsed.get('@type') == 'ItemList':
        for item in parsed['itemListElement']:
            ev = item['item']
            print(ev['name'], ev['startDate'], ev['url'])
        break
# Returns 18–40 events per page
```

**For a single event, fetch the detail page and extract the `Event` JSON-LD block.** It contains all fields including `offers` (pricing). There is also a richer `__NEXT_DATA__` block if you need venue coordinates, refund policy, or sales status.
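
The snippet above calls `json.loads` on every block and matches the `<script>` tag with an exact `type` attribute. A slightly more defensive variant is sketched below. The helper names `iter_ld_blocks` and `find_ld_block` are illustrative and not part of the harness, and the tolerance for extra tag attributes and list-valued blocks is an assumption rather than observed Eventbrite behavior.

```python
import json
import re

# Same idea as the pattern above, but tolerant of extra attributes on the tag.
LD_JSON_RE = re.compile(
    r'<script[^>]*type="application/ld\+json"[^>]*>(.*?)</script>',
    re.DOTALL,
)

def iter_ld_blocks(html):
    """Yield every JSON-LD object in the page, skipping blocks that fail to parse."""
    for raw in LD_JSON_RE.findall(html):
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed block: skip it rather than abort the scrape
        # A block may hold a single object or a list of objects.
        candidates = parsed if isinstance(parsed, list) else [parsed]
        for obj in candidates:
            if isinstance(obj, dict):
                yield obj

def find_ld_block(html, wanted_types):
    """Return the first JSON-LD object whose @type is in wanted_types, else None."""
    wanted = tuple(wanted_types)  # tuple membership compares by equality, no hashing
    for obj in iter_ld_blocks(html):
        if obj.get('@type') in wanted:
            return obj
    return None
```

With this in place, `find_ld_block(html, ('ItemList',))` would replace the loop on a listing page, and the same call with the event types would serve a detail page.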
## URL structure

### Search / listing pages

```
https://www.eventbrite.com/d/{location}/{category}/
https://www.eventbrite.com/d/{location}/{category}/?page=2
https://www.eventbrite.com/d/{location}/{category}/?start_date=2026-05-01&end_date=2026-05-31
```

**Location format:** `{state-abbreviation}--{city}` (lowercase, hyphens for spaces)
- `ca--san-francisco`
- `ny--new-york`
- `ca--los-angeles`
- Use `online` for virtual events

**Category slugs (confirmed working):**
- `tech`: Technology events
- `music`: Music
- `food--drink`: Food & Drink
- `health`: Health & Wellness
- `sports--fitness`: Sports & Fitness
- `arts--entertainment`: Arts & Entertainment
- `family--education`: Family & Education
- `business--professional`: Business & Networking
- `science--tech`: Science & Technology
- `community--culture`: Community & Culture
- `networking`: Networking
- `events`: All events (broadest, returns ~40/page)

**Filter slugs (replace category):**
- `free--events`: Free events only
- `events--today`: Today
- `events--tomorrow`: Tomorrow
- `events--this-weekend`: This weekend

**Query params:**
- `?page=N`: Pagination (page 2+ confirmed working, each returns 18–20 events)
- `?start_date=YYYY-MM-DD&end_date=YYYY-MM-DD`: Date range filter (confirmed, narrows results)
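
The slug and query-parameter rules above can be folded into a small URL builder. This is a minimal sketch: `listing_url` and `location_slug` are hypothetical helper names, and combining `page` with the date range in one request is an assumption that the notes above do not confirm.

```python
from urllib.parse import urlencode

BASE = "https://www.eventbrite.com/d"

def location_slug(state, city):
    """Build a location slug, e.g. ('CA', 'San Francisco') -> 'ca--san-francisco'."""
    return f"{state.strip().lower()}--{'-'.join(city.lower().split())}"

def listing_url(location, category="events", page=1, start_date=None, end_date=None):
    """Build a /d/ listing URL from the documented slugs and query params."""
    params = {}
    if page and page > 1:
        params["page"] = page
    if start_date and end_date:
        params["start_date"] = start_date  # YYYY-MM-DD
        params["end_date"] = end_date      # YYYY-MM-DD
    query = f"?{urlencode(params)}" if params else ""
    return f"{BASE}/{location}/{category}/{query}"

print(listing_url(location_slug("CA", "San Francisco"), "tech", page=2))
```

Filter slugs such as `free--events` go in the `category` position, since they replace the category segment.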
### Event detail pages

```
https://www.eventbrite.com/e/{slug}-tickets-{event_id}
```

Example: `https://www.eventbrite.com/e/icontact-the-tactile-tech-opera-tickets-1982861003639`

- `event_id` is a numeric string (10–13 digits)
- Extract with: `re.search(r'-tickets-(\d+)$', url).group(1)`
- Extract slug with: `re.search(r'/e/(.+)-tickets-\d+$', url).group(1)`
- Both patterns are anchored with `$`, so strip any query string (such as `?aff=...`) from the URL first

Other TLDs (`.ca`, `.co.uk`, etc.) use the same structure; event IDs are globally unique across TLDs.
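
Because the two regexes above end in `$`, they return `None` for URLs that carry tracking parameters. One way around that, sketched here with a hypothetical `parse_event_url` helper, is to match against the URL path only, which also works for relative `/e/` paths and other TLDs.

```python
import re
from urllib.parse import urlsplit

# Matches the path part of an event URL: /e/{slug}-tickets-{event_id}
EVENT_PATH_RE = re.compile(r'^/e/(?P<slug>.+)-tickets-(?P<event_id>\d+)/?$')

def parse_event_url(url):
    """Return (slug, event_id) for an /e/ URL, or None if it does not match."""
    path = urlsplit(url).path  # drops any ?query and #fragment
    match = EVENT_PATH_RE.match(path)
    if not match:
        return None
    return match.group('slug'), match.group('event_id')

print(parse_event_url(
    "https://www.eventbrite.com/e/icontact-the-tactile-tech-opera-tickets-1982861003639?aff=example"
))
```

The `aff=example` value is a placeholder for illustration, not a real tracking code.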
## Listing page: JSON-LD `ItemList` schema

The first `<script type="application/ld+json">` block on any `/d/` page is an `ItemList`. Each `itemListElement` contains:

```json
{
  "position": 1,
  "@type": "ListItem",
  "item": {
    "@type": "Event",
    "name": "iContact the tactile tech opera",
    "description": "An immersive performance...",
    "url": "https://www.eventbrite.com/e/icontact-the-tactile-tech-opera-tickets-1982861003639",
    "image": "https://img.evbuc.com/...",
    "startDate": "2026-06-21",
    "endDate": "2026-06-21",
    "eventAttendanceMode": "https://schema.org/OfflineEventAttendanceMode",
    "location": {
      "@type": "Place",
      "name": "Little Boxes Theater",
      "address": {
        "@type": "PostalAddress",
        "addressLocality": "San Francisco",
        "addressRegion": "CA",
        "addressCountry": "US",
        "streetAddress": "94107 1661 Tennessee Street",
        "postalCode": "94107"
      },
      "geo": {
        "@type": "GeoCoordinates",
        "latitude": "37.7508806",
        "longitude": "-122.3881427"
      }
    }
  }
}
```

Note: listing-page items do NOT include `offers` (pricing) or `organizer`. Fetch the detail page for those.

The second JSON-LD block on listing pages is a `BreadcrumbList` (skip it).
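
For CSV export or deduplication it can help to flatten each `itemListElement` into a single-level record. The sketch below uses only the fields shown in the sample above. `flatten_listing_item` and its output key names are illustrative choices, and treating every nested object as optional is a defensive assumption.

```python
ONLINE_MODE = "https://schema.org/OnlineEventAttendanceMode"

def _to_float(value):
    """Coordinates arrive as strings; return a float, or None if absent or invalid."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None

def flatten_listing_item(list_item):
    """Turn one itemListElement entry into a flat dict; missing keys become None."""
    ev = list_item.get('item') or {}
    loc = ev.get('location') or {}
    addr = loc.get('address') or {}
    geo = loc.get('geo') or {}
    return {
        'position': list_item.get('position'),
        'name': ev.get('name'),
        'url': ev.get('url'),
        'start_date': ev.get('startDate'),  # date-only on listing pages
        'end_date': ev.get('endDate'),
        'online': ev.get('eventAttendanceMode') == ONLINE_MODE,
        'venue': loc.get('name'),
        'city': addr.get('addressLocality'),
        'region': addr.get('addressRegion'),
        'latitude': _to_float(geo.get('latitude')),
        'longitude': _to_float(geo.get('longitude')),
    }
```

A typical call would be `rows = [flatten_listing_item(i) for i in parsed['itemListElement']]`, where `parsed` is the `ItemList` block.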
## Detail page: JSON-LD `Event` schema

The detail page has 4 JSON-LD blocks. The `Event` (or `BusinessEvent`) block is the second one and contains the full schema:

```python
import re, json

headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36"}
html = http_get("https://www.eventbrite.com/e/icontact-the-tactile-tech-opera-tickets-1982861003639", headers=headers)

ld_blocks = re.findall(r'<script type="application/ld\+json">(.*?)</script>', html, re.DOTALL)
event_data = None
for block in ld_blocks:
    parsed = json.loads(block)
    if isinstance(parsed, dict) and parsed.get('@type') in ('Event', 'BusinessEvent', 'MusicEvent', 'EducationEvent'):
        event_data = parsed
        break

print(event_data['name'])                 # "iContact the tactile tech opera"
print(event_data['startDate'])            # "2026-06-21T17:05:00-07:00" (ISO 8601 with TZ)
print(event_data['endDate'])              # "2026-06-21T20:08:00-07:00"
print(event_data['eventStatus'])          # "https://schema.org/EventScheduled"
print(event_data['eventAttendanceMode'])  # "https://schema.org/OfflineEventAttendanceMode"
print(event_data['location']['name'])     # "Little Boxes Theater"
print(event_data['location']['address']['streetAddress'])  # "94107 1661 Tennessee Street, San Francisco, CA 94107"
print(event_data['organizer']['name'])    # "Beth McNamara"
print(event_data['organizer']['url'])     # "https://www.eventbrite.com/o/beth-mcnamara-120755148166"
```

Full confirmed schema on detail page:
```
name                              str   Event title
description                       str   Short summary
url                               str   Canonical event URL
image                             str   Event banner image URL
startDate                         str   ISO 8601 with timezone offset
endDate                           str   ISO 8601 with timezone offset
eventStatus                       str   URI: EventScheduled / EventCancelled / EventPostponed
eventAttendanceMode               str   URI: OfflineEventAttendanceMode / OnlineEventAttendanceMode / MixedEventAttendanceMode
location.@type                    str   "Place" (in-person) or "VirtualLocation" (online)
location.name                     str   Venue name
location.address.streetAddress    str
location.address.addressLocality  str   City
location.address.addressRegion    str   State abbreviation
location.address.addressCountry   str   Country code
organizer.name                    str   Organizer display name
organizer.url                     str   Organizer profile URL
offers                            list  AggregateOffer object(s)
```
### Offers / pricing

```python
offers = event_data.get('offers', [])
if offers:
    offer = offers[0]  # always a list; typically one AggregateOffer
    print(offer['@type'])               # "AggregateOffer"
    print(offer['lowPrice'])            # "50.0" (string, not float)
    print(offer['highPrice'])           # "50.0"
    print(offer['priceCurrency'])       # "USD"
    print(offer['availability'])        # "InStock" / "SoldOut"
    print(offer['availabilityStarts'])  # ISO 8601 UTC
    print(offer['availabilityEnds'])    # ISO 8601 UTC

# Free events: lowPrice="0.0", highPrice="0.0"
# Free check: float(offer['lowPrice']) == 0.0
```

`@type` on the event itself varies by format (all scrape identically):
- `Event`: general
- `BusinessEvent`: networking, professional
- `MusicEvent`: concerts
- `EducationEvent`: classes, workshops
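
The `offers` fields and `@type` variants above can be combined into one small pricing summary. This is a sketch under stated assumptions: `summarize_offers` is a hypothetical helper, and accepting a single offer object in place of a list is a defensive guess, since the notes above only report lists.

```python
EVENT_TYPES = ('Event', 'BusinessEvent', 'MusicEvent', 'EducationEvent')

def summarize_offers(event_ld):
    """Return a pricing summary dict from an Event JSON-LD dict, or None."""
    if not event_ld or event_ld.get('@type') not in EVENT_TYPES:
        return None
    offers = event_ld.get('offers') or []
    if isinstance(offers, dict):  # defensive: tolerate a single offer object
        offers = [offers]
    # Prices are strings such as "50.0", so cast before comparing.
    lows = [float(o['lowPrice']) for o in offers if o.get('lowPrice') is not None]
    highs = [float(o['highPrice']) for o in offers if o.get('highPrice') is not None]
    if not lows:
        return None
    low, high = min(lows), max(highs or lows)
    return {
        'low': low,
        'high': high,
        'currency': offers[0].get('priceCurrency'),
        'is_free': high == 0.0,  # free events report "0.0" for both bounds
        'sold_out': all(o.get('availability') == 'SoldOut' for o in offers),
    }
```

Returning `None` for a missing or empty `offers` list keeps "no pricing data" distinct from "free", which matters because listing-page items carry no `offers` at all.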
## Detail page: `__NEXT_DATA__` (richer structured data)

Every event detail page embeds a `<script id="__NEXT_DATA__">` block with additional fields not in JSON-LD:

```python
import re, json

nextjs = re.search(r'<script id="__NEXT_DATA__"[^>]*>(.*?)</script>', html, re.DOTALL)
nd = json.loads(nextjs.group(1))
context = nd['props']['pageProps']['context']

bi = context['basicInfo']
print(bi['id'])              # "1982861003639" (event ID string)
print(bi['name'])            # event title
print(bi['isFree'])          # bool
print(bi['isOnline'])        # bool
print(bi['currency'])        # "USD"
print(bi['status'])          # "live" / "completed" / "canceled"
print(bi['organizationId'])  # numeric string
print(bi['formatId'])        # numeric string (event format category)
print(bi['isProtected'])     # bool: password-protected events
print(bi['isSeries'])        # bool: recurring series
print(bi['created'])         # ISO 8601 UTC creation timestamp

# Venue with coordinates
venue = bi['venue']
print(venue['name'])                  # "Little Boxes Theater"
print(venue['address']['city'])       # "San Francisco"
print(venue['address']['region'])     # "CA"
print(venue['address']['latitude'])   # "37.7508806"
print(venue['address']['longitude'])  # "-122.3881427"
print(venue['address']['localizedMultiLineAddressDisplay'])  # list of strings

# Organizer details
org = bi['organizer']
print(org['name'])       # "Beth McNamara"
print(org['url'])        # organizer profile URL
print(org['numEvents'])  # int
print(org['verified'])   # bool

# Sales status
ss = context['salesStatus']
print(ss['salesStatus'])              # "on_sale" / "sold_out" / "sales_ended"
print(ss['startSalesDate']['local'])  # local datetime string

# Good to know
gtk = context['goodToKnow']['highlights']
print(gtk['ageRestriction'])     # "18+" or None (JSON null)
print(gtk['durationInMinutes'])  # int (e.g. 183)
print(gtk['doorTime'])           # local datetime string or None
print(gtk['locationType'])       # "in_person" or "online"

# Refund policy
refund = context['goodToKnow']['refundPolicy']
print(refund['policyType'])       # "custom" / "no_refunds" / "standard"
print(refund['isRefundAllowed'])  # bool
print(refund['validDays'])        # int or None

# Full event description (HTML)
for module in context['structuredContent']['modules']:
    if module['type'] == 'text':
        print(module['text'])  # raw HTML, may need BeautifulSoup to strip tags
```
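
If BeautifulSoup is not available, the description HTML can be reduced to plain text with the standard library. The sketch below swaps in `html.parser.HTMLParser` for that job. `html_to_text` and `description_text` are hypothetical helpers, and the set of tags treated as line breaks is an arbitrary choice, not something derived from Eventbrite's markup.

```python
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Collects text nodes and inserts a newline before block-level tags."""
    BLOCK_TAGS = {'p', 'br', 'div', 'li', 'h1', 'h2', 'h3', 'h4'}

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decodes &amp; and similar entities
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.BLOCK_TAGS:
            self.parts.append('\n')

    def handle_data(self, data):
        self.parts.append(data)

def html_to_text(fragment):
    """Strip tags from an HTML fragment and collapse whitespace per line."""
    parser = _TextExtractor()
    parser.feed(fragment)
    parser.close()  # flushes any buffered trailing text
    lines = (' '.join(line.split()) for line in ''.join(parser.parts).splitlines())
    return '\n'.join(line for line in lines if line)

def description_text(context):
    """Join the text modules of a __NEXT_DATA__ context into plain text."""
    modules = (context.get('structuredContent') or {}).get('modules') or []
    return '\n\n'.join(
        html_to_text(m['text'])
        for m in modules
        if m.get('type') == 'text' and m.get('text')
    )
```

`description_text(context)` takes the same `context` dict as the block above and returns an empty string when no text modules are present.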
## Complete workflow: scrape events from a category

```python
import re, json

def get_events_from_listing(location, category, page=1):
    """Returns list of event dicts with name, url, startDate, endDate, location."""
    headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36"}
    url = f"https://www.eventbrite.com/d/{location}/{category}/?page={page}"
    html = http_get(url, headers=headers)
    ld_blocks = re.findall(r'<script type="application/ld\+json">(.*?)</script>', html, re.DOTALL)
    for block in ld_blocks:
        parsed = json.loads(block)
        if isinstance(parsed, dict) and parsed.get('@type') == 'ItemList':
            return [item['item'] for item in parsed.get('itemListElement', [])]
    return []

def get_event_detail(event_url):
    """Returns full Event JSON-LD + NEXT_DATA context for a single event."""
    headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36"}
    html = http_get(event_url, headers=headers)

    # JSON-LD Event block
    ld_blocks = re.findall(r'<script type="application/ld\+json">(.*?)</script>', html, re.DOTALL)
    event_ld = None
    for block in ld_blocks:
        parsed = json.loads(block)
        if isinstance(parsed, dict) and parsed.get('@type') in ('Event', 'BusinessEvent', 'MusicEvent', 'EducationEvent'):
            event_ld = parsed
            break

    # NEXT_DATA context
    nextjs = re.search(r'<script id="__NEXT_DATA__"[^>]*>(.*?)</script>', html, re.DOTALL)
    context = None
    if nextjs:
        nd = json.loads(nextjs.group(1))
        context = nd['props']['pageProps']['context']

    return event_ld, context

# Usage
events = get_events_from_listing("ca--san-francisco", "tech", page=1)
print(f"Found {len(events)} events")  # 18–20 typical

for ev in events[:3]:
    print(ev['name'], ev['startDate'], ev['url'])

# Deep-fetch one event (skipped when the listing came back empty)
if events:
    ld, ctx = get_event_detail(events[0]['url'])
    if ld and ld.get('offers'):
        price = float(ld['offers'][0]['lowPrice'])
        currency = ld['offers'][0]['priceCurrency']
        print(f"Price: {price} {currency}")  # 0.0 USD (free) or e.g. 50.0 USD
```
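
To collect more than one page, the listing function above can be driven by a small pagination loop. In this sketch `collect_events` is a hypothetical helper that takes any `fetch_page(page)` callable. The stop condition (a page that adds no new URLs) and the one-second delay are assumptions, not documented Eventbrite limits.

```python
import time

def collect_events(fetch_page, max_pages=5, delay_seconds=1.0):
    """Walk listing pages until one is empty or adds nothing new; dedupe by URL."""
    seen = set()
    events = []
    for page in range(1, max_pages + 1):
        added = 0
        for ev in fetch_page(page):
            url = ev.get('url')
            if url and url not in seen:
                seen.add(url)
                events.append(ev)
                added += 1
        if added == 0:
            break  # empty page, or a page that only repeats earlier results
        if page < max_pages:
            time.sleep(delay_seconds)  # pause between requests
    return events
```

With the workflow above this would be called as `collect_events(lambda p: get_events_from_listing("ca--san-francisco", "tech", page=p))`. Passing the fetcher as a callable keeps the loop testable without network access.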
## Public API: requires auth

The Eventbrite REST API (`https://www.eventbriteapi.com/v3/`) requires an OAuth token for all endpoints:

- `GET /v3/events/{id}/`: HTTP 401 without auth
- `GET /v3/events/search/`: HTTP 404 (endpoint changed; auth also required)

**Use HTML scraping instead.** The JSON-LD and `__NEXT_DATA__` data is equivalent to the API response and requires no credentials.

If you have a token (`EVENTBRITE_TOKEN`):
```python
import json
import os

token = os.environ.get('EVENTBRITE_TOKEN')
event_id = "1982861003639"  # example ID taken from the detail-page URL above
headers = {
    "User-Agent": "Mozilla/5.0",
    "Authorization": f"Bearer {token}"
}
data = json.loads(http_get(f"https://www.eventbriteapi.com/v3/events/{event_id}/", headers=headers))
```
## Gotchas

- **Event URLs in the HTML use relative `/e/` paths, not absolute URLs**: Search listing HTML contains `/e/slug-tickets-id?aff=...` relative paths (with tracking params). Extract event URLs from the JSON-LD `ItemList` instead; they are absolute, clean URLs without tracking params.

- **`re.findall(r'href="https://www.eventbrite.com/e/...')` returns 0 results**: Confirmed: event cards in the HTML do not have `https://www.eventbrite.com/e/` in href attributes. Use JSON-LD extraction only.

- **`__SERVER_DATA__` does not exist**: Both search and detail pages were checked. There is no `window.__SERVER_DATA__` or `window.__redux_state__`. The embedded data is in `<script id="__NEXT_DATA__">` (detail pages only) and JSON-LD (both).

- **Search listing pages have no `__NEXT_DATA__`**: Only event detail pages (`/e/` URLs) have the `__NEXT_DATA__` block. Listing pages (`/d/` URLs) have JSON-LD only.

- **`@type` varies by event format**: Don't filter JSON-LD blocks with `parsed['@type'] == 'Event'` alone. Check for any of: `Event`, `BusinessEvent`, `MusicEvent`, `EducationEvent`. They have identical field structure.

- **`startDate` on listing vs. detail pages differs in precision**: Listing page items show date-only (`"2026-06-21"`). The detail page `Event` block shows full ISO 8601 with timezone offset (`"2026-06-21T17:05:00-07:00"`). Use the detail page for scheduling tasks.

- **`offers` is absent on listing page items**: The `ItemList` does not include pricing. Fetch the detail page for `offers[0]['lowPrice']` / `offers[0]['highPrice']`.

- **Free events have `lowPrice: "0.0"` and `highPrice: "0.0"`**: Not null or missing. Check `float(offers[0]['lowPrice']) == 0.0` or use `basicInfo.isFree` from `__NEXT_DATA__`.

- **`offers` prices are strings, not floats**: `"50.0"` not `50.0`. Cast with `float(offer['lowPrice'])` before arithmetic.
|
|
356
|
-
|
|
357
|
-
- **Page size is ~18–20 events per page** — Not a fixed 20. Some pages return fewer. Don't assume page N is empty because it returned < 20.
|
|
358
|
-
|
|
359
|
-
- **Date filter works but can still return events outside range** — The `?start_date=` / `?end_date=` params narrow results but are not strict; always validate `startDate` from the returned data.
|
|
360
|
-
|
|
361
|
-
- **Eventbrite CA / UK / AU use different TLDs** — Online event listings may surface `eventbrite.ca`, `eventbrite.co.uk` URLs. The `/e/` structure and JSON-LD schema are identical. Fetch them with the same code.
|
|
362
|
-
|
|
363
|
-
- **No rate limiting observed** — 8 sequential HTTP requests across 4 pages completed without errors or blocks (avg ~1.5s each). No delay needed for light workloads, but be reasonable for bulk scraping.
|
|
1
|
+
# Eventbrite — Scraping & Data Extraction
|
|
2
|
+
|
|
3
|
+
`https://www.eventbrite.com` — public event listings and detail pages, no auth required for HTML scraping. REST API requires an OAuth token.
|
|
4
|
+
|
|
5
|
+
## Do this first
|
|
6
|
+
|
|
7
|
+
**Use the search listing URL to get event lists — parse the `ItemList` JSON-LD block, not the HTML.**
|
|
8
|
+
|
|
9
|
+
```python
|
|
10
|
+
import re, json
|
|
11
|
+
|
|
12
|
+
headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36"}
|
|
13
|
+
html = http_get("https://www.eventbrite.com/d/ca--san-francisco/tech/", headers=headers)
|
|
14
|
+
|
|
15
|
+
ld_blocks = re.findall(r'<script type="application/ld\+json">(.*?)</script>', html, re.DOTALL)
|
|
16
|
+
for block in ld_blocks:
|
|
17
|
+
parsed = json.loads(block)
|
|
18
|
+
if isinstance(parsed, dict) and parsed.get('@type') == 'ItemList':
|
|
19
|
+
for item in parsed['itemListElement']:
|
|
20
|
+
ev = item['item']
|
|
21
|
+
print(ev['name'], ev['startDate'], ev['url'])
|
|
22
|
+
break
|
|
23
|
+
# Returns 18–40 events per page
|
|
24
|
+
```
|
|
25
|
+
|
|
26
|
+
**For a single event, fetch the detail page and extract the `Event` JSON-LD block.** It contains all fields including `offers` (pricing). There is also a richer `__NEXT_DATA__` block if you need venue coordinates, refund policy, or sales status.
|
|
27
|
+
|
|
28
|
+
## URL structure
|
|
29
|
+
|
|
30
|
+
### Search / listing pages
|
|
31
|
+
|
|
32
|
+
```
|
|
33
|
+
https://www.eventbrite.com/d/{location}/{category}/
|
|
34
|
+
https://www.eventbrite.com/d/{location}/{category}/?page=2
|
|
35
|
+
https://www.eventbrite.com/d/{location}/{category}/?start_date=2026-05-01&end_date=2026-05-31
|
|
36
|
+
```
|
|
37
|
+
|
|
38
|
+
**Location format:** `{state-abbreviation}--{city}` (lowercase, hyphens for spaces)
|
|
39
|
+
- `ca--san-francisco`
|
|
40
|
+
- `ny--new-york`
|
|
41
|
+
- `ca--los-angeles`
|
|
42
|
+
- Use `online` for virtual events
|
|
43
|
+
|
|
44
|
+
**Category slugs (confirmed working):**
|
|
45
|
+
- `tech` — Technology events
|
|
46
|
+
- `music` — Music
|
|
47
|
+
- `food--drink` — Food & Drink
|
|
48
|
+
- `health` — Health & Wellness
|
|
49
|
+
- `sports--fitness` — Sports & Fitness
|
|
50
|
+
- `arts--entertainment` — Arts & Entertainment
|
|
51
|
+
- `family--education` — Family & Education
|
|
52
|
+
- `business--professional` — Business & Networking
|
|
53
|
+
- `science--tech` — Science & Technology
|
|
54
|
+
- `community--culture` — Community & Culture
|
|
55
|
+
- `networking` — Networking
|
|
56
|
+
- `events` — All events (broadest, returns ~40/page)
|
|
57
|
+
|
|
58
|
+
**Filter slugs (replace category):**
|
|
59
|
+
- `free--events` — Free events only
|
|
60
|
+
- `events--today` — Today
|
|
61
|
+
- `events--tomorrow` — Tomorrow
|
|
62
|
+
- `events--this-weekend` — This weekend
|
|
63
|
+
|
|
64
|
+
**Query params:**
|
|
65
|
+
- `?page=N` — Pagination (page 2+ confirmed working, each returns 18–20 events)
|
|
66
|
+
- `?start_date=YYYY-MM-DD&end_date=YYYY-MM-DD` — Date range filter (confirmed, narrows results)
|
|
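The path and query patterns above can be assembled programmatically. This is a minimal sketch rather than part of the skill file itself: `build_listing_url` is a hypothetical helper name, and combining `page` with the date parameters in a single query string is an assumption that the notes above do not confirm.

```python
from urllib.parse import urlencode

BASE = "https://www.eventbrite.com/d"


def build_listing_url(location, category, page=1, start_date=None, end_date=None):
    """Build a /d/ listing URL (hypothetical helper; dates are 'YYYY-MM-DD' strings)."""
    params = {}
    if page and page > 1:
        params["page"] = page  # page 1 is the bare URL, so omit the parameter
    if start_date and end_date:
        params["start_date"] = start_date
        params["end_date"] = end_date
    url = f"{BASE}/{location}/{category}/"
    if params:
        url += "?" + urlencode(params)
    return url


print(build_listing_url("ca--san-francisco", "tech"))
print(build_listing_url("ny--new-york", "free--events", page=2))
```

Filter slugs such as `free--events` go in the `category` position, as described above.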
67
|
+
|
|
68
|
+
### Event detail pages
|
|
69
|
+
|
|
70
|
+
```
|
|
71
|
+
https://www.eventbrite.com/e/{slug}-tickets-{event_id}
|
|
72
|
+
```
|
|
73
|
+
|
|
74
|
+
Example: `https://www.eventbrite.com/e/icontact-the-tactile-tech-opera-tickets-1982861003639`
|
|
75
|
+
|
|
76
|
+
- `event_id` is a numeric string (10–13 digits)
|
|
77
|
+
- Extract with: `re.search(r'-tickets-(\d+)$', url).group(1)`
|
|
78
|
+
- Extract slug with: `re.search(r'/e/(.+)-tickets-\d+$', url).group(1)`
|
|
79
|
+
|
|
80
|
+
Other TLDs (`.ca`, `.co.uk`, etc.) use the same structure — event IDs are globally unique across TLDs.
|
|
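The two regexes above are anchored with `$`, so they only match clean URLs. The sketch below strips the query string first, so it also accepts relative `/e/` paths that carry tracking parameters. `parse_event_url` is a hypothetical helper, and the `aff=example` value is made up for illustration.

```python
import re
from urllib.parse import urlsplit


def parse_event_url(url):
    """Return (slug, event_id) for an /e/ URL, or None if the path does not match."""
    # urlsplit drops the query string and fragment, leaving only the path
    path = urlsplit(url).path.rstrip("/")
    match = re.search(r"/e/(.+)-tickets-(\d+)$", path)
    if match is None:
        return None
    return match.group(1), match.group(2)


print(parse_event_url(
    "https://www.eventbrite.com/e/icontact-the-tactile-tech-opera-tickets-1982861003639"
))
print(parse_event_url("/e/icontact-the-tactile-tech-opera-tickets-1982861003639?aff=example"))
```

Because only the path is inspected, the same function should work unchanged for the other TLDs mentioned above.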
81
|
+
|
|
82
|
+
## Listing page: JSON-LD `ItemList` schema
|
|
83
|
+
|
|
84
|
+
The first `<script type="application/ld+json">` block on any `/d/` page is an `ItemList`. Each `itemListElement` contains:
|
|
85
|
+
|
|
86
|
+
```json
|
|
87
|
+
{
|
|
88
|
+
"position": 1,
|
|
89
|
+
"@type": "ListItem",
|
|
90
|
+
"item": {
|
|
91
|
+
"@type": "Event",
|
|
92
|
+
"name": "iContact the tactile tech opera",
|
|
93
|
+
"description": "An immersive performance...",
|
|
94
|
+
"url": "https://www.eventbrite.com/e/icontact-the-tactile-tech-opera-tickets-1982861003639",
|
|
95
|
+
"image": "https://img.evbuc.com/...",
|
|
96
|
+
"startDate": "2026-06-21",
|
|
97
|
+
"endDate": "2026-06-21",
|
|
98
|
+
"eventAttendanceMode": "https://schema.org/OfflineEventAttendanceMode",
|
|
99
|
+
"location": {
|
|
100
|
+
"@type": "Place",
|
|
101
|
+
"name": "Little Boxes Theater",
|
|
102
|
+
"address": {
|
|
103
|
+
"@type": "PostalAddress",
|
|
104
|
+
"addressLocality": "San Francisco",
|
|
105
|
+
"addressRegion": "CA",
|
|
106
|
+
"addressCountry": "US",
|
|
107
|
+
"streetAddress": "94107 1661 Tennessee Street",
|
|
108
|
+
"postalCode": "94107"
|
|
109
|
+
},
|
|
110
|
+
"geo": {
|
|
111
|
+
"@type": "GeoCoordinates",
|
|
112
|
+
"latitude": "37.7508806",
|
|
113
|
+
"longitude": "-122.3881427"
|
|
114
|
+
}
|
|
115
|
+
}
|
|
116
|
+
}
|
|
117
|
+
}
|
|
118
|
+
```
|
|
119
|
+
|
|
120
|
+
Note: listing-page items do NOT include `offers` (pricing) or `organizer`. Fetch the detail page for those.
|
|
121
|
+
|
|
122
|
+
The second JSON-LD block on listing pages is a `BreadcrumbList` (skip it).
|
|
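The `ItemList` extraction can be wrapped in a function that searches by `@type` instead of relying on block order, and that skips any block that fails to parse. This is a sketch under assumptions: `extract_item_list` is a hypothetical name, and the sample HTML is a tiny hand-written stand-in for a real listing page, not captured output.

```python
import json
import re

LD_RE = re.compile(r'<script type="application/ld\+json">(.*?)</script>', re.DOTALL)


def extract_item_list(html):
    """Return the event dicts from the first ItemList JSON-LD block, or []."""
    for block in LD_RE.findall(html):
        try:
            parsed = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip a malformed block instead of aborting the whole page
        if isinstance(parsed, dict) and parsed.get("@type") == "ItemList":
            elements = parsed.get("itemListElement", [])
            return [el["item"] for el in elements if "item" in el]
    return []


# Hand-written sample: one ItemList block followed by a BreadcrumbList block
sample_html = (
    '<script type="application/ld+json">{"@type": "ItemList", "itemListElement": ['
    '{"@type": "ListItem", "position": 1, "item": {"@type": "Event", '
    '"name": "Sample event", "startDate": "2026-06-21", '
    '"url": "https://www.eventbrite.com/e/sample-event-tickets-1234567890"}}]}</script>'
    '<script type="application/ld+json">{"@type": "BreadcrumbList"}</script>'
)
events = extract_item_list(sample_html)
print(len(events), events[0]["name"])
```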
123
|
+
|
|
124
|
+
## Detail page: JSON-LD `Event` schema
|
|
125
|
+
|
|
126
|
+
The detail page has 4 JSON-LD blocks. The `Event` (or `BusinessEvent`) block is the second one and contains the full schema:
|
|
127
|
+
|
|
128
|
+
```python
|
|
129
|
+
import re, json
|
|
130
|
+
|
|
131
|
+
headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36"}
|
|
132
|
+
html = http_get("https://www.eventbrite.com/e/icontact-the-tactile-tech-opera-tickets-1982861003639", headers=headers)
|
|
133
|
+
|
|
134
|
+
ld_blocks = re.findall(r'<script type="application/ld\+json">(.*?)</script>', html, re.DOTALL)
|
|
135
|
+
event_data = None
|
|
136
|
+
for block in ld_blocks:
|
|
137
|
+
parsed = json.loads(block)
|
|
138
|
+
if isinstance(parsed, dict) and parsed.get('@type') in ('Event', 'BusinessEvent', 'MusicEvent', 'EducationEvent'):
|
|
139
|
+
event_data = parsed
|
|
140
|
+
break
|
|
141
|
+
|
|
142
|
+
print(event_data['name']) # "iContact the tactile tech opera"
|
|
143
|
+
print(event_data['startDate']) # "2026-06-21T17:05:00-07:00" (ISO 8601 with TZ)
|
|
144
|
+
print(event_data['endDate']) # "2026-06-21T20:08:00-07:00"
|
|
145
|
+
print(event_data['eventStatus']) # "https://schema.org/EventScheduled"
|
|
146
|
+
print(event_data['eventAttendanceMode']) # "https://schema.org/OfflineEventAttendanceMode"
|
|
147
|
+
print(event_data['location']['name']) # "Little Boxes Theater"
|
|
148
|
+
print(event_data['location']['address']['streetAddress']) # "94107 1661 Tennessee Street, San Francisco, CA 94107"
|
|
149
|
+
print(event_data['organizer']['name']) # "Beth McNamara"
|
|
150
|
+
print(event_data['organizer']['url']) # "https://www.eventbrite.com/o/beth-mcnamara-120755148166"
|
|
151
|
+
```
|
|
152
|
+
|
|
153
|
+
Full confirmed schema on detail page:
|
|
154
|
+
```
|
|
155
|
+
name str Event title
|
|
156
|
+
description str Short summary
|
|
157
|
+
url str Canonical event URL
|
|
158
|
+
image str Event banner image URL
|
|
159
|
+
startDate str ISO 8601 with timezone offset
|
|
160
|
+
endDate str ISO 8601 with timezone offset
|
|
161
|
+
eventStatus str URI: EventScheduled / EventCancelled / EventPostponed
|
|
162
|
+
eventAttendanceMode str URI: OfflineEventAttendanceMode / OnlineEventAttendanceMode / MixedEventAttendanceMode
|
|
163
|
+
location.@type str "Place" (in-person) or "VirtualLocation" (online)
|
|
164
|
+
location.name str Venue name
|
|
165
|
+
location.address.streetAddress str
|
|
166
|
+
location.address.addressLocality str City
|
|
167
|
+
location.address.addressRegion str State abbreviation
|
|
168
|
+
location.address.addressCountry str Country code
|
|
169
|
+
organizer.name str Organizer display name
|
|
170
|
+
organizer.url str Organizer profile URL
|
|
171
|
+
offers list AggregateOffer object(s)
|
|
172
|
+
```
|
|
173
|
+
|
|
174
|
+
### Offers / pricing
|
|
175
|
+
|
|
176
|
+
```python
|
|
177
|
+
offers = event_data.get('offers', [])
|
|
178
|
+
if offers:
|
|
179
|
+
offer = offers[0] # always a list; typically one AggregateOffer
|
|
180
|
+
print(offer['@type']) # "AggregateOffer"
|
|
181
|
+
print(offer['lowPrice']) # "50.0" (string, not float)
|
|
182
|
+
print(offer['highPrice']) # "50.0"
|
|
183
|
+
print(offer['priceCurrency']) # "USD"
|
|
184
|
+
print(offer['availability']) # "InStock" / "SoldOut"
|
|
185
|
+
print(offer['availabilityStarts']) # ISO 8601 UTC
|
|
186
|
+
print(offer['availabilityEnds']) # ISO 8601 UTC
|
|
187
|
+
|
|
188
|
+
# Free events: lowPrice="0.0", highPrice="0.0"
|
|
189
|
+
# Free check: float(offer['lowPrice']) == 0.0
|
|
190
|
+
```
|
|
191
|
+
|
|
192
|
+
`@type` on the event itself varies by format (all scrape identically):
|
|
193
|
+
- `Event` — general
|
|
194
|
+
- `BusinessEvent` — networking, professional
|
|
195
|
+
- `MusicEvent` — concerts
|
|
196
|
+
- `EducationEvent` — classes, workshops
|
|
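The pricing notes above (string prices, `"0.0"` for free events, `offers` as a list) can be folded into one defensive helper. This is a sketch, not the file's own method: `summarize_offer` is a hypothetical name, and accepting a bare dict for `offers` is a precaution of this sketch, since the notes above only report a list.

```python
def summarize_offer(event_ld):
    """Return low/high price as floats plus currency and a free flag, or None."""
    offers = event_ld.get("offers") or []
    if isinstance(offers, dict):
        offers = [offers]  # assumption: tolerate a single object as well as a list
    if not offers:
        return None
    offer = offers[0]
    try:
        low = float(offer["lowPrice"])  # prices arrive as strings such as "50.0"
        high = float(offer.get("highPrice", offer["lowPrice"]))
    except (KeyError, TypeError, ValueError):
        return None
    return {
        "low": low,
        "high": high,
        "currency": offer.get("priceCurrency"),
        "is_free": low == 0.0,
    }


paid = {"offers": [{"@type": "AggregateOffer", "lowPrice": "50.0",
                    "highPrice": "50.0", "priceCurrency": "USD"}]}
print(summarize_offer(paid))
```

Returning `None` for missing or unparsable pricing keeps "no price data" distinct from "free", which matters because free events carry an explicit `"0.0"`.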
197
|
+
|
|
198
|
+
## Detail page: `__NEXT_DATA__` (richer structured data)
|
|
199
|
+
|
|
200
|
+
Every event detail page embeds a `<script id="__NEXT_DATA__">` block with additional fields not in JSON-LD:
|
|
201
|
+
|
|
202
|
+
```python
|
|
203
|
+
import re, json
|
|
204
|
+
|
|
205
|
+
nextjs = re.search(r'<script id="__NEXT_DATA__"[^>]*>(.*?)</script>', html, re.DOTALL)
|
|
206
|
+
nd = json.loads(nextjs.group(1))
|
|
207
|
+
context = nd['props']['pageProps']['context']
|
|
208
|
+
|
|
209
|
+
bi = context['basicInfo']
|
|
210
|
+
print(bi['id']) # "1982861003639" (event ID string)
|
|
211
|
+
print(bi['name']) # event title
|
|
212
|
+
print(bi['isFree']) # bool
|
|
213
|
+
print(bi['isOnline']) # bool
|
|
214
|
+
print(bi['currency']) # "USD"
|
|
215
|
+
print(bi['status']) # "live" / "completed" / "canceled"
|
|
216
|
+
print(bi['organizationId']) # numeric string
|
|
217
|
+
print(bi['formatId']) # numeric string (event format category)
|
|
218
|
+
print(bi['isProtected']) # bool — password-protected events
|
|
219
|
+
print(bi['isSeries']) # bool — recurring series
|
|
220
|
+
print(bi['created']) # ISO 8601 UTC creation timestamp
|
|
221
|
+
|
|
222
|
+
# Venue with coordinates
|
|
223
|
+
venue = bi['venue']
|
|
224
|
+
print(venue['name']) # "Little Boxes Theater"
|
|
225
|
+
print(venue['address']['city']) # "San Francisco"
|
|
226
|
+
print(venue['address']['region']) # "CA"
|
|
227
|
+
print(venue['address']['latitude']) # "37.7508806"
|
|
228
|
+
print(venue['address']['longitude']) # "-122.3881427"
|
|
229
|
+
print(venue['address']['localizedMultiLineAddressDisplay']) # list of strings
|
|
230
|
+
|
|
231
|
+
# Organizer details
|
|
232
|
+
org = bi['organizer']
|
|
233
|
+
print(org['name']) # "Beth McNamara"
|
|
234
|
+
print(org['url']) # organizer profile URL
|
|
235
|
+
print(org['numEvents']) # int
|
|
236
|
+
print(org['verified']) # bool
|
|
237
|
+
|
|
238
|
+
# Sales status
|
|
239
|
+
ss = context['salesStatus']
|
|
240
|
+
print(ss['salesStatus']) # "on_sale" / "sold_out" / "sales_ended"
|
|
241
|
+
print(ss['startSalesDate']['local']) # local datetime string
|
|
242
|
+
|
|
243
|
+
# Good to know
|
|
244
|
+
gtk = context['goodToKnow']['highlights']
|
|
245
|
+
print(gtk['ageRestriction']) # "18+" or null
|
|
246
|
+
print(gtk['durationInMinutes']) # int (e.g. 183)
|
|
247
|
+
print(gtk['doorTime']) # local datetime string or null
|
|
248
|
+
print(gtk['locationType']) # "in_person" or "online"
|
|
249
|
+
|
|
250
|
+
# Refund policy
|
|
251
|
+
refund = context['goodToKnow']['refundPolicy']
|
|
252
|
+
print(refund['policyType']) # "custom" / "no_refunds" / "standard"
|
|
253
|
+
print(refund['isRefundAllowed']) # bool
|
|
254
|
+
print(refund['validDays']) # int or null
|
|
255
|
+
|
|
256
|
+
# Full event description (HTML)
|
|
257
|
+
for module in context['structuredContent']['modules']:
|
|
258
|
+
if module['type'] == 'text':
|
|
259
|
+
print(module['text']) # raw HTML, may need BeautifulSoup to strip tags
|
|
260
|
+
```
|
|
261
|
+
|
|
262
|
+
## Complete workflow: scrape events from a category
|
|
263
|
+
|
|
264
|
+
```python
|
|
265
|
+
import re, json
|
|
266
|
+
|
|
267
|
+
def get_events_from_listing(location, category, page=1):
|
|
268
|
+
"""Returns list of event dicts with name, url, startDate, endDate, location."""
|
|
269
|
+
headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36"}
|
|
270
|
+
url = f"https://www.eventbrite.com/d/{location}/{category}/?page={page}"
|
|
271
|
+
html = http_get(url, headers=headers)
|
|
272
|
+
ld_blocks = re.findall(r'<script type="application/ld\+json">(.*?)</script>', html, re.DOTALL)
|
|
273
|
+
for block in ld_blocks:
|
|
274
|
+
parsed = json.loads(block)
|
|
275
|
+
if isinstance(parsed, dict) and parsed.get('@type') == 'ItemList':
|
|
276
|
+
return [item['item'] for item in parsed.get('itemListElement', [])]
|
|
277
|
+
return []
|
|
278
|
+
|
|
279
|
+
def get_event_detail(event_url):
|
|
280
|
+
"""Returns full Event JSON-LD + NEXT_DATA context for a single event."""
|
|
281
|
+
headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36"}
|
|
282
|
+
html = http_get(event_url, headers=headers)
|
|
283
|
+
|
|
284
|
+
# JSON-LD Event block
|
|
285
|
+
ld_blocks = re.findall(r'<script type="application/ld\+json">(.*?)</script>', html, re.DOTALL)
|
|
286
|
+
event_ld = None
|
|
287
|
+
for block in ld_blocks:
|
|
288
|
+
parsed = json.loads(block)
|
|
289
|
+
if isinstance(parsed, dict) and parsed.get('@type') in ('Event', 'BusinessEvent', 'MusicEvent', 'EducationEvent'):
|
|
290
|
+
event_ld = parsed
|
|
291
|
+
break
|
|
292
|
+
|
|
293
|
+
# NEXT_DATA context
|
|
294
|
+
nextjs = re.search(r'<script id="__NEXT_DATA__"[^>]*>(.*?)</script>', html, re.DOTALL)
|
|
295
|
+
context = None
|
|
296
|
+
if nextjs:
|
|
297
|
+
nd = json.loads(nextjs.group(1))
|
|
298
|
+
context = nd['props']['pageProps']['context']
|
|
299
|
+
|
|
300
|
+
return event_ld, context
|
|
301
|
+
|
|
302
|
+
# Usage
|
|
303
|
+
events = get_events_from_listing("ca--san-francisco", "tech", page=1)
|
|
304
|
+
print(f"Found {len(events)} events") # 18–20 typical
|
|
305
|
+
|
|
306
|
+
for ev in events[:3]:
|
|
307
|
+
print(ev['name'], ev['startDate'], ev['url'])
|
|
308
|
+
|
|
309
|
+
# Deep-fetch one event
|
|
310
|
+
ld, ctx = get_event_detail(events[0]['url'])
|
|
311
|
+
if ld and ld.get('offers'):
|
|
312
|
+
price = float(ld['offers'][0]['lowPrice'])
|
|
313
|
+
currency = ld['offers'][0]['priceCurrency']
|
|
314
|
+
print(f"Price: {price} {currency}") # 0.0 USD (free) or e.g. 50.0 USD
|
|
315
|
+
```
|
|
316
|
+
|
|
317
|
+
## Public API: requires auth
|
|
318
|
+
|
|
319
|
+
The Eventbrite REST API (`https://www.eventbriteapi.com/v3/`) requires an OAuth token for all endpoints:
|
|
320
|
+
|
|
321
|
+
- `GET /v3/events/{id}/` — HTTP 401 without auth
|
|
322
|
+
- `GET /v3/events/search/` — HTTP 404 (endpoint changed; auth also required)
|
|
323
|
+
|
|
324
|
+
**Use HTML scraping instead** — the JSON-LD and `__NEXT_DATA__` data is equivalent to the API response and requires no credentials.
|
|
325
|
+
|
|
326
|
+
If you have a token (`EVENTBRITE_TOKEN`):
|
|
327
|
+
```python
|
|
328
|
+
import os, json
|
|
329
|
+
token = os.environ.get('EVENTBRITE_TOKEN')
|
|
330
|
+
headers = {
|
|
331
|
+
"User-Agent": "Mozilla/5.0",
|
|
332
|
+
"Authorization": f"Bearer {token}"
|
|
333
|
+
}
|
|
334
|
+
data = json.loads(http_get(f"https://www.eventbriteapi.com/v3/events/{event_id}/", headers=headers))
|
|
335
|
+
```
|
|
336
|
+
|
|
337
|
+
## Gotchas
|
|
338
|
+
|
|
339
|
+
- **Event URLs in the HTML use relative `/e/` paths, not absolute URLs** — Search listing HTML contains `/e/slug-tickets-id?aff=...` relative paths (with tracking params). Extract event URLs from the JSON-LD `ItemList` instead — they are absolute, clean URLs without tracking params.
|
|
340
|
+
|
|
341
|
+
- **`re.findall(r'href="https://www.eventbrite.com/e/...')` returns 0 results** — Confirmed: event cards in the HTML do not have `https://www.eventbrite.com/e/` in href attributes. Use JSON-LD extraction only.
|
|
342
|
+
|
|
343
|
+
- **`__SERVER_DATA__` does not exist** — Both search and detail pages were checked. There is no `window.__SERVER_DATA__` or `window.__redux_state__`. The embedded data is in `<script id="__NEXT_DATA__">` (detail pages only) and JSON-LD (both).
|
|
344
|
+
|
|
345
|
+
- **Search listing pages have no `__NEXT_DATA__`** — Only event detail pages (`/e/` URLs) have the `__NEXT_DATA__` block. Listing pages (`/d/` URLs) have JSON-LD only.
|
|
346
|
+
|
|
347
|
+
- **`@type` varies by event format** — Don't filter JSON-LD blocks with `parsed['@type'] == 'Event'` alone. Check for any of: `Event`, `BusinessEvent`, `MusicEvent`, `EducationEvent`. They have identical field structure.
|
|
348
|
+
|
|
349
|
+
- **`startDate` on listing vs. detail pages differs in precision** — Listing page items show date-only (`"2026-06-21"`). Detail page Event block shows full ISO 8601 with timezone offset (`"2026-06-21T17:05:00-07:00"`). Use detail page for scheduling tasks.
|
|
350
|
+
|
|
351
|
+
- **`offers` is absent on listing page items** — The `ItemList` does not include pricing. Fetch the detail page for `offers.lowPrice` / `offers.highPrice`.
|
|
352
|
+
|
|
353
|
+
- **Free events have `lowPrice: "0.0"` and `highPrice: "0.0"`** — Not null or missing. Check `float(offers[0]['lowPrice']) == 0.0` or use `basicInfo.isFree` from `__NEXT_DATA__`.
|
|
354
|
+
|
|
355
|
+
- **`offers` prices are strings, not floats** — `"50.0"` not `50.0`. Cast with `float(offer['lowPrice'])` before arithmetic.
|
|
356
|
+
|
|
357
|
+
- **Page size is ~18–20 events per page** — Not a fixed 20, and some pages return fewer. Don't treat a page that returned fewer than 20 events as the last page.
|
|
358
|
+
|
|
359
|
+
- **Date filter works but can still return events outside range** — The `?start_date=` / `?end_date=` params narrow results but are not strict; always validate `startDate` from the returned data.
|
|
360
|
+
|
|
361
|
+
- **Eventbrite CA / UK / AU use different TLDs** — Online event listings may surface `eventbrite.ca`, `eventbrite.co.uk` URLs. The `/e/` structure and JSON-LD schema are identical. Fetch them with the same code.
|
|
362
|
+
|
|
363
|
+
- **No rate limiting observed** — 8 sequential HTTP requests across 4 pages completed without errors or blocks (avg ~1.5s each). No delay needed for light workloads, but be reasonable for bulk scraping.
|
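Because the date filter is not strict, results can be re-checked on the client. The sketch below is one way to do that under stated assumptions: `filter_by_date` is a hypothetical helper, the sample events are made up, and the comparison uses the calendar date as written in `startDate`, so it accepts both the date-only listing format and the full ISO 8601 detail format.

```python
from datetime import date, datetime


def filter_by_date(events, start, end):
    """Keep events whose startDate falls within [start, end], inclusive."""
    kept = []
    for ev in events:
        raw = ev.get("startDate")
        if not raw:
            continue  # no date to validate, so drop the event
        try:
            # Parses both "2026-06-21" and "2026-06-21T17:05:00-07:00"
            day = datetime.fromisoformat(raw).date()
        except ValueError:
            continue
        if start <= day <= end:
            kept.append(ev)
    return kept


sample = [
    {"name": "In range", "startDate": "2026-05-10"},
    {"name": "Out of range", "startDate": "2026-06-21T17:05:00-07:00"},
    {"name": "No date"},
]
in_may = filter_by_date(sample, date(2026, 5, 1), date(2026, 5, 31))
print([ev["name"] for ev in in_may])
```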