devpost-scraper 0.1.0__tar.gz → 0.3.0__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {devpost_scraper-0.1.0 → devpost_scraper-0.3.0}/.env.example +5 -0
- {devpost_scraper-0.1.0 → devpost_scraper-0.3.0}/.gitignore +3 -1
- devpost_scraper-0.3.0/PKG-INFO +181 -0
- devpost_scraper-0.3.0/README.md +169 -0
- {devpost_scraper-0.1.0 → devpost_scraper-0.3.0}/pyproject.toml +2 -1
- {devpost_scraper-0.1.0 → devpost_scraper-0.3.0}/src/devpost_scraper/cli.py +297 -1
- devpost_scraper-0.3.0/src/devpost_scraper/customerio.py +87 -0
- devpost_scraper-0.3.0/src/devpost_scraper/db.py +224 -0
- {devpost_scraper-0.1.0 → devpost_scraper-0.3.0}/src/devpost_scraper/models.py +30 -1
- {devpost_scraper-0.1.0 → devpost_scraper-0.3.0}/src/devpost_scraper/scraper.py +53 -0
- devpost_scraper-0.1.0/PKG-INFO +0 -101
- devpost_scraper-0.1.0/README.md +0 -89
- {devpost_scraper-0.1.0 → devpost_scraper-0.3.0}/src/devpost_scraper/__init__.py +0 -0
- {devpost_scraper-0.1.0 → devpost_scraper-0.3.0}/src/devpost_scraper/backboard_client.py +0 -0
- {devpost_scraper-0.1.0 → devpost_scraper-0.3.0}/src/devpost_scraper/csv_export.py +0 -0
{devpost_scraper-0.1.0 → devpost_scraper-0.3.0}/.env.example

@@ -11,3 +11,8 @@ DEVPOST_SESSION=
 # GitHub personal access token for higher API rate limits (5000/hr vs 60/hr)
 # Generate at https://github.com/settings/tokens (no scopes needed, public data only)
 GITHUB_TOKEN=
+
+# Customer.io Track API credentials (required for --emit-events)
+# Find at https://fly.customer.io/settings/api_credentials
+CUSTOMERIO_SITE_ID=
+CUSTOMERIO_API_KEY=
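For context only (this is an illustration, not part of the package diff): a minimal sketch of how the new Customer.io variables are consumed, assuming the python-dotenv pattern the package's CLI modules already use. The variable names match the diff; the script itself is hypothetical.

```python
# Illustrative only: load the new Customer.io credentials the same way the
# package's CLI entry points do (python-dotenv is already a dependency).
import os

from dotenv import load_dotenv

load_dotenv(".env")  # assumption: run from the project root

site_id = os.getenv("CUSTOMERIO_SITE_ID", "").strip()
api_key = os.getenv("CUSTOMERIO_API_KEY", "").strip()
if not site_id or not api_key:
    raise SystemExit("CUSTOMERIO_SITE_ID and CUSTOMERIO_API_KEY must be set for --emit-events")
```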
devpost_scraper-0.3.0/PKG-INFO

@@ -0,0 +1,181 @@
+Metadata-Version: 2.4
+Name: devpost-scraper
+Version: 0.3.0
+Summary: CLI for extracting Devpost data with Backboard tool-calling and exporting results to CSV.
+Requires-Python: >=3.11
+Requires-Dist: backboard-sdk>=1.5.9
+Requires-Dist: beautifulsoup4>=4.12.0
+Requires-Dist: httpx>=0.27.0
+Requires-Dist: pydantic>=2.7.0
+Requires-Dist: python-dotenv>=1.0.1
+Description-Content-Type: text/markdown
+
+# Devpost Scraper
+
+CLI toolkit for extracting Devpost hackathon data, enriching participants with emails,
+storing results in SQLite, and emitting Customer.io events.
+
+Three commands:
+
+| Command | Purpose |
+|---|---|
+| `devpost-scraper` | Search Devpost projects by keyword, enrich with emails, export CSV |
+| `devpost-participants` | Scrape a single hackathon's participant list, export CSV |
+| `devpost-harvest` | Walk the hackathon listing, scrape all participants, store in SQLite, emit delta events |
+
+## Requirements
+
+- Python 3.11+
+- [`uv`](https://docs.astral.sh/uv/)
+- A [Backboard](https://app.backboard.io) API key (for `devpost-scraper` only)
+
+## Install
+
+```bash
+uv sync
+```
+
+## Environment
+
+Copy `.env.example` → `.env` and fill in:
+
+| Variable | Required for | Notes |
+|---|---|---|
+| `BACKBOARD_API_KEY` | `devpost-scraper` | Backboard account key |
+| `DEVPOST_ASSISTANT_ID` | auto | Persisted on first run |
+| `DEVPOST_SESSION` | `devpost-participants`, `devpost-harvest` | `_devpost` cookie from browser DevTools |
+| `GITHUB_TOKEN` | optional | GitHub PAT for 5000 req/hr (vs 60). No scopes needed |
+| `CUSTOMERIO_SITE_ID` | `--emit-events` | Customer.io Track API |
+| `CUSTOMERIO_API_KEY` | `--emit-events` | Customer.io Track API |
+
+---
+
+## devpost-scraper
+
+Search Devpost projects by keyword, enrich each with detail page + author email, export CSV.
+
+```bash
+uv run devpost-scraper "ai agents" --output results.csv
+uv run devpost-scraper "climate tech" "developer tools" -o results.csv
+
+# Or via start.sh
+./start.sh "ai agents" --output results.csv
+```
+
+---
+
+## devpost-participants
+
+Scrape a single hackathon's participant list and export to CSV.
+
+```bash
+# First time — pass session cookie
+uv run devpost-participants "https://authorizedtoact.devpost.com/participants" \
+  --jwt "<_devpost cookie value>" -o participants.csv
+
+# Reuse saved session from .env
+uv run devpost-participants "https://authorizedtoact.devpost.com/participants" -o out.csv
+
+# Skip email enrichment
+uv run devpost-participants "https://..." --no-email -o out.csv
+
+# Emit Customer.io events after scrape
+uv run devpost-participants "https://..." --emit-events -o out.csv
+```
+
+---
+
+## devpost-harvest
+
+Automated pipeline: walk the hackathon listing → scrape participants → store in SQLite → emit Customer.io events for delta (new) participants.
+
+### Basic usage
+
+```bash
+# Scrape 3 pages of open hackathons (27 hackathons), enrich new participants, emit events
+uv run devpost-harvest --emit-events
+
+# Fast first run — scrape without email enrichment
+uv run devpost-harvest --no-email
+```
+
+### Flags
+
+| Flag | Default | Description |
+|---|---|---|
+| `--pages N` | `3` | Number of hackathon listing pages to fetch (9 per page) |
+| `--hackathons N` | `0` (all) | Only process the first N hackathons from the listing |
+| `--jwt TOKEN` | `.env` | Devpost `_devpost` session cookie |
+| `--db PATH` | `devpost_harvest.db` | SQLite database path |
+| `--status {open,ended,upcoming}` | `open` | Hackathon status filter (repeatable) |
+| `--max-participants N` | `0` (unlimited) | Cap participants scraped per hackathon |
+| `--no-email` | off | Skip email enrichment entirely (even for new participants) |
+| `--emit-events` | off | Emit Customer.io events for unemitted participants during scrape |
+| `--emit-unsent` | off | Skip scraping — just emit events for all unsent participants in DB |
+| `--rescrape` | off | Re-scrape hackathons already scraped in a previous run |
+
+### How it works
+
+```
+Phase 1: Discover hackathons
+  GET /api/hackathons?status[]=open → paginated JSON listing
+
+Phase 2: Per hackathon
+  2a. Fast scan — scrape all participant pages (no enrichment, ~1 req per 20 participants)
+  2b. Upsert into SQLite → detect delta (new participants not previously in DB)
+  2c. Email-enrich delta only — GitHub API + link walking (skipped with --no-email)
+  2d. Emit Customer.io events for unemitted participants (only with --emit-events)
+```
+
+### Delta logic
+
+On subsequent runs, the fast scan re-fetches participant lists but only new participants
+(not previously in SQLite) get the expensive email enrichment. Already-emitted participants
+are never re-emitted. This makes re-runs fast and safe to repeat.
+
+### Common workflows
+
+```bash
+# Initial bulk scrape (no events yet)
+uv run devpost-harvest --pages 5
+
+# Emit all unsent events from the DB (no scraping, no JWT needed)
+uv run devpost-harvest --emit-unsent
+
+# Quick delta check on first hackathon only
+uv run devpost-harvest --hackathons 1 --rescrape --emit-events
+
+# Re-scan all hackathons for new participants, enrich + emit
+uv run devpost-harvest --rescrape --emit-events
+
+# Include ended hackathons
+uv run devpost-harvest --status open --status ended
+
+# Fast delta scan (skip email enrichment for new participants too)
+uv run devpost-harvest --rescrape --no-email
+```
+
+### SQLite schema
+
+The database (`devpost_harvest.db`) has two tables:
+
+- **`hackathons`** — id, url, title, org, state, dates, registrations, prize, themes.
+  `last_scraped_at` is set after participants are scraped.
+- **`participants`** — (hackathon_url, username) primary key, enrichment fields,
+  `first_seen_at`, `last_seen_at`, `event_emitted_at`.
+
+### Customer.io events
+
+Event name: `devpost_hackathon`. Uses participant email as the Customer.io user ID.
+
+Event data: hackathon_url, hackathon_title, username, name, specialty, profile_url, github_url, linkedin_url.
+
+Email templates in `emails/` use `{{customer.first_name}}` and `{{event.*}}` Liquid variables.
+
+---
+
+## Development
+
+```bash
+uv run python -m devpost_scraper.cli "ai agents" --output out.csv
+```
devpost_scraper-0.3.0/README.md

@@ -0,0 +1,169 @@
+# Devpost Scraper
+
+CLI toolkit for extracting Devpost hackathon data, enriching participants with emails,
+storing results in SQLite, and emitting Customer.io events.
+
+Three commands:
+
+| Command | Purpose |
+|---|---|
+| `devpost-scraper` | Search Devpost projects by keyword, enrich with emails, export CSV |
+| `devpost-participants` | Scrape a single hackathon's participant list, export CSV |
+| `devpost-harvest` | Walk the hackathon listing, scrape all participants, store in SQLite, emit delta events |
+
+## Requirements
+
+- Python 3.11+
+- [`uv`](https://docs.astral.sh/uv/)
+- A [Backboard](https://app.backboard.io) API key (for `devpost-scraper` only)
+
+## Install
+
+```bash
+uv sync
+```
+
+## Environment
+
+Copy `.env.example` → `.env` and fill in:
+
+| Variable | Required for | Notes |
+|---|---|---|
+| `BACKBOARD_API_KEY` | `devpost-scraper` | Backboard account key |
+| `DEVPOST_ASSISTANT_ID` | auto | Persisted on first run |
+| `DEVPOST_SESSION` | `devpost-participants`, `devpost-harvest` | `_devpost` cookie from browser DevTools |
+| `GITHUB_TOKEN` | optional | GitHub PAT for 5000 req/hr (vs 60). No scopes needed |
+| `CUSTOMERIO_SITE_ID` | `--emit-events` | Customer.io Track API |
+| `CUSTOMERIO_API_KEY` | `--emit-events` | Customer.io Track API |
+
+---
+
+## devpost-scraper
+
+Search Devpost projects by keyword, enrich each with detail page + author email, export CSV.
+
+```bash
+uv run devpost-scraper "ai agents" --output results.csv
+uv run devpost-scraper "climate tech" "developer tools" -o results.csv
+
+# Or via start.sh
+./start.sh "ai agents" --output results.csv
+```
+
+---
+
+## devpost-participants
+
+Scrape a single hackathon's participant list and export to CSV.
+
+```bash
+# First time — pass session cookie
+uv run devpost-participants "https://authorizedtoact.devpost.com/participants" \
+  --jwt "<_devpost cookie value>" -o participants.csv
+
+# Reuse saved session from .env
+uv run devpost-participants "https://authorizedtoact.devpost.com/participants" -o out.csv
+
+# Skip email enrichment
+uv run devpost-participants "https://..." --no-email -o out.csv
+
+# Emit Customer.io events after scrape
+uv run devpost-participants "https://..." --emit-events -o out.csv
+```
+
+---
+
+## devpost-harvest
+
+Automated pipeline: walk the hackathon listing → scrape participants → store in SQLite → emit Customer.io events for delta (new) participants.
+
+### Basic usage
+
+```bash
+# Scrape 3 pages of open hackathons (27 hackathons), enrich new participants, emit events
+uv run devpost-harvest --emit-events
+
+# Fast first run — scrape without email enrichment
+uv run devpost-harvest --no-email
+```
+
+### Flags
+
+| Flag | Default | Description |
+|---|---|---|
+| `--pages N` | `3` | Number of hackathon listing pages to fetch (9 per page) |
+| `--hackathons N` | `0` (all) | Only process the first N hackathons from the listing |
+| `--jwt TOKEN` | `.env` | Devpost `_devpost` session cookie |
+| `--db PATH` | `devpost_harvest.db` | SQLite database path |
+| `--status {open,ended,upcoming}` | `open` | Hackathon status filter (repeatable) |
+| `--max-participants N` | `0` (unlimited) | Cap participants scraped per hackathon |
+| `--no-email` | off | Skip email enrichment entirely (even for new participants) |
+| `--emit-events` | off | Emit Customer.io events for unemitted participants during scrape |
+| `--emit-unsent` | off | Skip scraping — just emit events for all unsent participants in DB |
+| `--rescrape` | off | Re-scrape hackathons already scraped in a previous run |
+
+### How it works
+
+```
+Phase 1: Discover hackathons
+  GET /api/hackathons?status[]=open → paginated JSON listing
+
+Phase 2: Per hackathon
+  2a. Fast scan — scrape all participant pages (no enrichment, ~1 req per 20 participants)
+  2b. Upsert into SQLite → detect delta (new participants not previously in DB)
+  2c. Email-enrich delta only — GitHub API + link walking (skipped with --no-email)
+  2d. Emit Customer.io events for unemitted participants (only with --emit-events)
+```
+
+### Delta logic
+
+On subsequent runs, the fast scan re-fetches participant lists but only new participants
+(not previously in SQLite) get the expensive email enrichment. Already-emitted participants
+are never re-emitted. This makes re-runs fast and safe to repeat.
+
+### Common workflows
+
+```bash
+# Initial bulk scrape (no events yet)
+uv run devpost-harvest --pages 5
+
+# Emit all unsent events from the DB (no scraping, no JWT needed)
+uv run devpost-harvest --emit-unsent
+
+# Quick delta check on first hackathon only
+uv run devpost-harvest --hackathons 1 --rescrape --emit-events
+
+# Re-scan all hackathons for new participants, enrich + emit
+uv run devpost-harvest --rescrape --emit-events
+
+# Include ended hackathons
+uv run devpost-harvest --status open --status ended
+
+# Fast delta scan (skip email enrichment for new participants too)
+uv run devpost-harvest --rescrape --no-email
+```
+
+### SQLite schema
+
+The database (`devpost_harvest.db`) has two tables:
+
+- **`hackathons`** — id, url, title, org, state, dates, registrations, prize, themes.
+  `last_scraped_at` is set after participants are scraped.
+- **`participants`** — (hackathon_url, username) primary key, enrichment fields,
+  `first_seen_at`, `last_seen_at`, `event_emitted_at`.
+
+### Customer.io events
+
+Event name: `devpost_hackathon`. Uses participant email as the Customer.io user ID.
+
+Event data: hackathon_url, hackathon_title, username, name, specialty, profile_url, github_url, linkedin_url.
+
+Email templates in `emails/` use `{{customer.first_name}}` and `{{event.*}}` Liquid variables.
+
+---
+
+## Development
+
+```bash
+uv run python -m devpost_scraper.cli "ai agents" --output out.csv
+```
{devpost_scraper-0.1.0 → devpost_scraper-0.3.0}/pyproject.toml

@@ -1,6 +1,6 @@
 [project]
 name = "devpost-scraper"
-version = "0.1.0"
+version = "0.3.0"
 description = "CLI for extracting Devpost data with Backboard tool-calling and exporting results to CSV."
 readme = "README.md"
 requires-python = ">=3.11"
@@ -15,6 +15,7 @@ dependencies = [
 [project.scripts]
 devpost-scraper = "devpost_scraper.cli:main"
 devpost-participants = "devpost_scraper.cli:participants_main"
+devpost-harvest = "devpost_scraper.cli:harvest_main"
 
 [build-system]
 requires = ["hatchling"]
{devpost_scraper-0.1.0 → devpost_scraper-0.3.0}/src/devpost_scraper/cli.py

@@ -17,13 +17,15 @@ from devpost_scraper.backboard_client import (
     ensure_assistant,
     run_in_thread,
 )
+from devpost_scraper.customerio import emit_hackathon_events
 from devpost_scraper.csv_export import write_projects
-from devpost_scraper.models import DevpostProject, HackathonParticipant
+from devpost_scraper.models import DevpostProject, Hackathon, HackathonParticipant
 from devpost_scraper.scraper import (
     find_author_email,
     find_participant_email,
     get_hackathon_participants,
     get_project_details,
+    list_hackathons,
     search_projects,
 )
 
@@ -228,6 +230,7 @@ async def _run_participants(
     jwt_token: str,
     output: str | None,
     no_email: bool,
+    emit_events: bool = False,
 ) -> None:
     all_participants: list[HackathonParticipant] = []
     page = 1
@@ -303,6 +306,9 @@ async def _run_participants(
         writer.writerows(rows)
         print(buf.getvalue())
 
+    if emit_events:
+        await emit_hackathon_events(all_participants)
+
 
 def participants_main() -> None:
     load_dotenv(_ENV_FILE, override=True)
@@ -334,6 +340,12 @@ def participants_main() -> None:
         default=False,
         help="Skip email enrichment (faster)",
     )
+    parser.add_argument(
+        "--emit-events",
+        action="store_true",
+        default=False,
+        help="Emit devpost_hackathon events to Customer.io (requires CUSTOMERIO_SITE_ID and CUSTOMERIO_API_KEY in .env)",
+    )
     args = parser.parse_args()
 
     if not args.output:
@@ -360,5 +372,289 @@ def participants_main() -> None:
             jwt_token=jwt_token,
             output=args.output,
             no_email=args.no_email,
+            emit_events=args.emit_events,
+        )
+    )
+
+
+# ---------------------------------------------------------------------------
+# devpost-harvest: walk hackathon listing → scrape participants → delta emit
+# ---------------------------------------------------------------------------
+
+async def _run_harvest(
+    pages: int,
+    jwt_token: str,
+    db_path: str,
+    no_email: bool,
+    emit_events: bool,
+    rescrape: bool,
+    max_participants: int = 0,
+    max_hackathons: int = 0,
+    statuses: list[str] | None = None,
+) -> None:
+    from devpost_scraper.db import HarvestDB
+
+    db = HarvestDB(db_path)
+
+    # Phase 1: discover hackathons
+    all_hackathons: list[Hackathon] = []
+    for page in range(1, pages + 1):
+        print(f"[harvest] Fetching hackathon listing page {page}…", file=sys.stderr)
+        data = await list_hackathons(page=page, statuses=statuses)
+        batch = data.get("hackathons", [])
+        if not batch:
+            print(f"[harvest] No hackathons on page {page}, stopping.", file=sys.stderr)
+            break
+        for raw in batch:
+            h = Hackathon(**raw)
+            if h.invite_only:
+                print(f" [skip] invite-only: {h.title}", file=sys.stderr)
+                continue
+            db.upsert_hackathon(h)
+            all_hackathons.append(h)
+            if max_hackathons and len(all_hackathons) >= max_hackathons:
+                break
+        print(f"[harvest] Page {page}: {len(batch)} hackathons ({len(all_hackathons)} total)", file=sys.stderr)
+        if max_hackathons and len(all_hackathons) >= max_hackathons:
+            break
+
+    if not all_hackathons:
+        print("[harvest] No hackathons found.", file=sys.stderr)
+        db.close()
+        return
+
+    # Phase 2: for each hackathon, scrape participants
+    total_new = 0
+    total_emitted = 0
+
+    for h in all_hackathons:
+        if not rescrape and db.hackathon_scraped(h.url):
+            print(f" [cached] {h.title} — already scraped, skipping (use --rescrape to force)", file=sys.stderr)
+            continue
+
+        print(f"\n[harvest] {h.title} ({h.url})", file=sys.stderr)
+        print(f" registrations: {h.registrations_count}, state: {h.open_state}", file=sys.stderr)
+
+        # Phase 2a: fast scan — scrape all participant pages (no enrichment)
+        participants: list[HackathonParticipant] = []
+        ppage = 1
+        while True:
+            try:
+                data = await get_hackathon_participants(h.url, jwt_token, page=ppage)
+            except Exception as exc:
+                print(f" [warn] participants fetch failed page {ppage}: {exc}", file=sys.stderr)
+                break
+
+            batch = data.get("participants", [])
+            has_more = data.get("has_more", False)
+
+            if not batch:
+                if ppage == 1:
+                    print(f" [info] No participants found (may need auth)", file=sys.stderr)
+                break
+
+            if max_participants and len(participants) + len(batch) > max_participants:
+                batch = batch[:max_participants - len(participants)]
+                has_more = False
+
+            for raw in batch:
+                participants.append(
+                    HackathonParticipant(
+                        hackathon_url=h.url,
+                        hackathon_title=h.title,
+                        username=raw.get("username", ""),
+                        name=raw.get("name", ""),
+                        specialty=raw.get("specialty", ""),
+                        profile_url=raw.get("profile_url", ""),
+                    )
+                )
+
+            if not has_more:
+                break
+            ppage += 1
+
+        if not participants:
+            db.mark_hackathon_scraped(h.url)
+            continue
+
+        print(f" [scan] {len(participants)} participants across {ppage} pages", file=sys.stderr)
+
+        # Phase 2b: upsert → detect delta
+        new_participants = db.upsert_participants(participants)
+        total_new += len(new_participants)
+        print(f" [db] {len(new_participants)} new, {len(participants) - len(new_participants)} existing", file=sys.stderr)
+
+        # Phase 2c: email-enrich only the delta
+        if new_participants and not no_email:
+            print(f" [enrich] enriching {len(new_participants)} new participants…", file=sys.stderr)
+            for p in new_participants:
+                if not p.profile_url:
+                    continue
+                try:
+                    email_data = await find_participant_email(p.profile_url)
+                    p.email = email_data.get("email", "")
+                    p.github_url = email_data.get("github_url", "")
+                    p.linkedin_url = email_data.get("linkedin_url", "")
+                    if p.email:
+                        print(f" [email] {p.email} ← {p.username}", file=sys.stderr)
+                    db.update_participant_enrichment(p)
+                except Exception as exc:
+                    print(f" [warn] enrich failed for {p.username}: {exc}", file=sys.stderr)
+
+        # Phase 2d: emit events for unemitted participants
+        if emit_events:
+            unemitted = db.get_unemitted_participants(h.url)
+            if unemitted:
+                print(f" [cio] Emitting events for {len(unemitted)} unemitted participants…", file=sys.stderr)
+                await emit_hackathon_events(unemitted)
+                for p in unemitted:
+                    db.mark_event_emitted(h.url, p.username)
+                total_emitted += len(unemitted)
+
+        db.mark_hackathon_scraped(h.url)
+
+    # Summary
+    stats = db.stats()
+    print(f"\n{'=' * 60}", file=sys.stderr)
+    print(f"[harvest] Done.", file=sys.stderr)
+    print(f" hackathons in db: {stats['hackathons']}", file=sys.stderr)
+    print(f" participants in db: {stats['participants']}", file=sys.stderr)
+    print(f" with email: {stats['with_email']}", file=sys.stderr)
+    print(f" new this run: {total_new}", file=sys.stderr)
+    print(f" events emitted (total): {stats['events_emitted']}", file=sys.stderr)
+    if total_emitted:
+        print(f" events emitted (this run): {total_emitted}", file=sys.stderr)
+    print(f" db: {db_path}", file=sys.stderr)
+    db.close()
+
+
+async def _run_emit_unsent(db_path: str) -> None:
+    from devpost_scraper.db import HarvestDB
+
+    db = HarvestDB(db_path)
+    unemitted = db.all_unemitted_participants()
+
+    if not unemitted:
+        print("[emit-unsent] No unsent participants with emails in DB.", file=sys.stderr)
+        db.close()
+        return
+
+    print(f"[emit-unsent] {len(unemitted)} participants to emit", file=sys.stderr)
+    await emit_hackathon_events(unemitted)
+
+    for p in unemitted:
+        db.mark_event_emitted(p.hackathon_url, p.username)
+
+    stats = db.stats()
+    print(f"\n[emit-unsent] Done. {len(unemitted)} events emitted.", file=sys.stderr)
+    print(f" events emitted (total): {stats['events_emitted']}", file=sys.stderr)
+    db.close()
+
+
+def harvest_main() -> None:
+    load_dotenv(_ENV_FILE, override=True)
+
+    parser = argparse.ArgumentParser(
+        prog="devpost-harvest",
+        description=(
+            "Walk the Devpost hackathon listing, scrape participants per hackathon, "
+            "store in SQLite, and emit Customer.io events for new (delta) participants."
+        ),
+    )
+    parser.add_argument(
+        "--pages",
+        type=int,
+        default=3,
+        help="Number of hackathon listing pages to fetch (9 hackathons/page, default: 3)",
+    )
+    parser.add_argument(
+        "--jwt",
+        metavar="TOKEN",
+        default=None,
+        help="Value of the _devpost session cookie. Falls back to DEVPOST_SESSION in .env",
+    )
+    parser.add_argument(
+        "--db",
+        metavar="PATH",
+        default="devpost_harvest.db",
+        help="SQLite database path (default: devpost_harvest.db)",
+    )
+    parser.add_argument(
+        "--no-email",
+        action="store_true",
+        default=False,
+        help="Skip email enrichment (much faster)",
+    )
+    parser.add_argument(
+        "--emit-events",
+        action="store_true",
+        default=False,
+        help="Emit Customer.io events for delta participants during scrape",
+    )
+    parser.add_argument(
+        "--emit-unsent",
+        action="store_true",
+        default=False,
+        help="Skip scraping — just emit Customer.io events for all unsent participants in the DB",
+    )
+    parser.add_argument(
+        "--rescrape",
+        action="store_true",
+        default=False,
+        help="Re-scrape hackathons that were already scraped in a previous run",
+    )
+    parser.add_argument(
+        "--max-participants",
+        type=int,
+        default=0,
+        metavar="N",
+        help="Cap participants scraped per hackathon (0 = unlimited, default: 0)",
+    )
+    parser.add_argument(
+        "--hackathons",
+        type=int,
+        default=0,
+        metavar="N",
+        help="Only process the first N hackathons from the listing (0 = all, default: 0)",
+    )
+    parser.add_argument(
+        "--status",
+        action="append",
+        choices=["open", "ended", "upcoming"],
+        default=None,
+        dest="statuses",
+        help="Hackathon status filter (repeatable, default: open). e.g. --status open --status ended",
+    )
+    args = parser.parse_args()
+
+    if args.statuses is None:
+        args.statuses = ["open"]
+
+    if args.emit_unsent:
+        asyncio.run(_run_emit_unsent(db_path=args.db))
+        return
+
+    jwt_token = args.jwt or os.getenv(_PARTICIPANTS_JWT_KEY, "").strip()
+    if not jwt_token:
+        raise SystemExit(
+            "[error] No session cookie. Pass --jwt TOKEN or set DEVPOST_SESSION in .env\n"
+            " Copy the _devpost cookie value from browser DevTools → Application → Cookies"
+        )
+
+    if args.jwt:
+        _ENV_FILE.touch(exist_ok=True)
+        set_key(str(_ENV_FILE), _PARTICIPANTS_JWT_KEY, args.jwt)
+
+    asyncio.run(
+        _run_harvest(
+            pages=args.pages,
+            jwt_token=jwt_token,
+            db_path=args.db,
+            no_email=args.no_email,
+            emit_events=args.emit_events,
+            rescrape=args.rescrape,
+            max_participants=args.max_participants,
+            max_hackathons=args.hackathons,
+            statuses=args.statuses,
         )
     )
devpost_scraper-0.3.0/src/devpost_scraper/customerio.py

@@ -0,0 +1,87 @@
+from __future__ import annotations
+
+import logging
+import os
+import sys
+
+import httpx
+
+from devpost_scraper.models import DevpostHackathonEvent, HackathonParticipant
+
+logger = logging.getLogger(__name__)
+
+_TRACK_API_URL = "https://track.customer.io/api/v1"
+_EVENT_NAME = "devpost_hackathon"
+
+
+class CustomerIOService:
+    """Async Customer.io Track API client (httpx + basic auth)."""
+
+    def __init__(self, site_id: str, api_key: str) -> None:
+        self._auth = (site_id, api_key)
+
+    async def identify_user(self, user_id: str, email: str, **attrs: str) -> bool:
+        payload = {"email": email, **{k: v for k, v in attrs.items() if v}}
+        url = f"{_TRACK_API_URL}/customers/{user_id}"
+        async with httpx.AsyncClient() as client:
+            resp = await client.put(url, json=payload, auth=self._auth, timeout=10.0)
+        if resp.status_code == 200:
+            return True
+        logger.error("[cio] identify %s → %s %s", user_id, resp.status_code, resp.text)
+        return False
+
+    async def track_event(self, user_id: str, event_name: str, data: dict) -> bool:
+        url = f"{_TRACK_API_URL}/customers/{user_id}/events"
+        payload = {"name": event_name, "data": data}
+        async with httpx.AsyncClient() as client:
+            resp = await client.post(url, json=payload, auth=self._auth, timeout=10.0)
+        if resp.status_code == 200:
+            return True
+        logger.error("[cio] track %s/%s → %s %s", user_id, event_name, resp.status_code, resp.text)
+        return False
+
+
+def _build_service() -> CustomerIOService:
+    site_id = os.getenv("CUSTOMERIO_SITE_ID", "").strip()
+    api_key = os.getenv("CUSTOMERIO_API_KEY", "").strip()
+    if not site_id or not api_key:
+        raise SystemExit(
+            "[error] CUSTOMERIO_SITE_ID and CUSTOMERIO_API_KEY must be set in .env"
+        )
+    return CustomerIOService(site_id, api_key)
+
+
+async def emit_hackathon_events(participants: list[HackathonParticipant]) -> None:
+    eligible = [p for p in participants if p.email]
+    if not eligible:
+        print("[cio] No participants with emails — skipping event emission", file=sys.stderr)
+        return
+
+    svc = _build_service()
+    sent = 0
+
+    for p in eligible:
+        event = DevpostHackathonEvent(
+            hackathon_url=p.hackathon_url,
+            hackathon_title=p.hackathon_title,
+            username=p.username,
+            name=p.name,
+            specialty=p.specialty,
+            profile_url=p.profile_url,
+            github_url=p.github_url,
+            linkedin_url=p.linkedin_url,
+        )
+
+        name_parts = p.name.split(maxsplit=1)
+        first = name_parts[0] if name_parts else ""
+        last = name_parts[1] if len(name_parts) > 1 else ""
+
+        await svc.identify_user(p.email, email=p.email, first_name=first, last_name=last)
+        ok = await svc.track_event(p.email, _EVENT_NAME, event.model_dump())
+        if ok:
+            sent += 1
+            print(f" [cio] {_EVENT_NAME} → {p.email}", file=sys.stderr)
+        else:
+            print(f" [cio] FAILED {p.email}", file=sys.stderr)
+
+    print(f"[cio] Emitted {sent}/{len(eligible)} events", file=sys.stderr)
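A hedged usage sketch of the new customerio module shown above, not part of the diff itself: the import paths, model, and function signature are taken from this diff, while the participant data is invented purely for illustration.

```python
# Illustrative driver for the new customerio module in this diff.
# Requires CUSTOMERIO_SITE_ID / CUSTOMERIO_API_KEY in the environment.
import asyncio

from devpost_scraper.customerio import emit_hackathon_events
from devpost_scraper.models import HackathonParticipant

participants = [
    HackathonParticipant(
        hackathon_url="https://example.devpost.com",  # sample data, not from the diff
        hackathon_title="Example Hackathon",
        username="jdoe",
        name="Jane Doe",
        email="jane@example.com",  # only participants with an email are emitted
    )
]

asyncio.run(emit_hackathon_events(participants))
```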
devpost_scraper-0.3.0/src/devpost_scraper/db.py

@@ -0,0 +1,224 @@
+from __future__ import annotations
+
+import sqlite3
+from datetime import datetime, timezone
+from pathlib import Path
+
+from devpost_scraper.models import Hackathon, HackathonParticipant
+
+_DEFAULT_DB = "devpost_harvest.db"
+
+_SCHEMA = """\
+CREATE TABLE IF NOT EXISTS hackathons (
+    id INTEGER PRIMARY KEY,
+    url TEXT UNIQUE NOT NULL,
+    title TEXT,
+    organization_name TEXT,
+    open_state TEXT,
+    submission_period_dates TEXT,
+    registrations_count INTEGER,
+    prize_amount TEXT,
+    themes TEXT,
+    invite_only INTEGER,
+    first_seen_at TEXT NOT NULL,
+    last_scraped_at TEXT
+);
+
+CREATE TABLE IF NOT EXISTS participants (
+    hackathon_url TEXT NOT NULL,
+    hackathon_title TEXT,
+    username TEXT NOT NULL,
+    name TEXT,
+    specialty TEXT,
+    profile_url TEXT,
+    github_url TEXT,
+    linkedin_url TEXT,
+    email TEXT,
+    first_seen_at TEXT NOT NULL,
+    last_seen_at TEXT NOT NULL,
+    event_emitted_at TEXT,
+    PRIMARY KEY (hackathon_url, username)
+);
+"""
+
+
+class HarvestDB:
+    def __init__(self, db_path: str = _DEFAULT_DB) -> None:
+        self._path = Path(db_path)
+        self._conn = sqlite3.connect(str(self._path))
+        self._conn.row_factory = sqlite3.Row
+        self._conn.executescript(_SCHEMA)
+        self._migrate()
+
+    def _migrate(self) -> None:
+        cols = {r[1] for r in self._conn.execute("PRAGMA table_info(participants)").fetchall()}
+        if "hackathon_title" not in cols:
+            self._conn.execute("ALTER TABLE participants ADD COLUMN hackathon_title TEXT")
+            self._conn.commit()
+
+    def close(self) -> None:
+        self._conn.close()
+
+    def upsert_hackathon(self, h: Hackathon) -> None:
+        now = _now_iso()
+        self._conn.execute(
+            """INSERT INTO hackathons
+               (id, url, title, organization_name, open_state,
+                submission_period_dates, registrations_count, prize_amount,
+                themes, invite_only, first_seen_at)
+               VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
+               ON CONFLICT(id) DO UPDATE SET
+                 title=excluded.title,
+                 organization_name=excluded.organization_name,
+                 open_state=excluded.open_state,
+                 submission_period_dates=excluded.submission_period_dates,
+                 registrations_count=excluded.registrations_count,
+                 prize_amount=excluded.prize_amount,
+                 themes=excluded.themes,
+                 invite_only=excluded.invite_only
+            """,
+            (
+                h.id, h.url, h.title, h.organization_name, h.open_state,
+                h.submission_period_dates, h.registrations_count, h.prize_amount,
+                h.themes, int(h.invite_only), now,
+            ),
+        )
+        self._conn.commit()
+
+    def upsert_participants(
+        self, participants: list[HackathonParticipant],
+    ) -> list[HackathonParticipant]:
+        """Insert or update participants. Returns only the NEW ones (not previously seen)."""
+        now = _now_iso()
+        new: list[HackathonParticipant] = []
+
+        for p in participants:
+            existing = self._conn.execute(
+                "SELECT 1 FROM participants WHERE hackathon_url=? AND username=?",
+                (p.hackathon_url, p.username),
+            ).fetchone()
+
+            if existing is None:
+                new.append(p)
+                self._conn.execute(
+                    """INSERT INTO participants
+                       (hackathon_url, hackathon_title, username, name, specialty,
+                        profile_url, github_url, linkedin_url, email,
+                        first_seen_at, last_seen_at)
+                       VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
+                    """,
+                    (
+                        p.hackathon_url, p.hackathon_title, p.username, p.name,
+                        p.specialty, p.profile_url, p.github_url, p.linkedin_url,
+                        p.email, now, now,
+                    ),
+                )
+            else:
+                self._conn.execute(
+                    """UPDATE participants
+                       SET hackathon_title=?, name=?, specialty=?, profile_url=?,
+                           github_url=?, linkedin_url=?, email=?, last_seen_at=?
+                       WHERE hackathon_url=? AND username=?
+                    """,
+                    (
+                        p.hackathon_title, p.name, p.specialty, p.profile_url,
+                        p.github_url, p.linkedin_url, p.email, now,
+                        p.hackathon_url, p.username,
+                    ),
+                )
+
+        self._conn.commit()
+        return new
+
+    def update_participant_enrichment(self, p: HackathonParticipant) -> None:
+        """Update email/github/linkedin fields for an existing participant."""
+        self._conn.execute(
+            """UPDATE participants
+               SET email=?, github_url=?, linkedin_url=?, last_seen_at=?
+               WHERE hackathon_url=? AND username=?""",
+            (p.email, p.github_url, p.linkedin_url, _now_iso(), p.hackathon_url, p.username),
+        )
+        self._conn.commit()
+
+    def mark_event_emitted(self, hackathon_url: str, username: str) -> None:
+        self._conn.execute(
+            "UPDATE participants SET event_emitted_at=? WHERE hackathon_url=? AND username=?",
+            (_now_iso(), hackathon_url, username),
+        )
+        self._conn.commit()
+
+    def get_unemitted_participants(self, hackathon_url: str) -> list[HackathonParticipant]:
+        """Return participants that have an email but haven't had events emitted yet."""
+        rows = self._conn.execute(
+            """SELECT * FROM participants
+               WHERE hackathon_url=? AND email != '' AND event_emitted_at IS NULL""",
+            (hackathon_url,),
+        ).fetchall()
+        return [
+            HackathonParticipant(
+                hackathon_url=r["hackathon_url"],
+                hackathon_title=r["hackathon_title"] or "",
+                username=r["username"],
+                name=r["name"] or "",
+                specialty=r["specialty"] or "",
+                profile_url=r["profile_url"] or "",
+                github_url=r["github_url"] or "",
+                linkedin_url=r["linkedin_url"] or "",
+                email=r["email"] or "",
+            )
+            for r in rows
+        ]
+
+    def all_unemitted_participants(self) -> list[HackathonParticipant]:
+        """Return all participants across all hackathons with email but no event emitted."""
+        rows = self._conn.execute(
+            "SELECT * FROM participants WHERE email != '' AND event_emitted_at IS NULL"
+        ).fetchall()
+        return [
+            HackathonParticipant(
+                hackathon_url=r["hackathon_url"],
+                hackathon_title=r["hackathon_title"] or "",
+                username=r["username"],
+                name=r["name"] or "",
+                specialty=r["specialty"] or "",
+                profile_url=r["profile_url"] or "",
+                github_url=r["github_url"] or "",
+                linkedin_url=r["linkedin_url"] or "",
+                email=r["email"] or "",
+            )
+            for r in rows
+        ]
+
+    def hackathon_scraped(self, hackathon_url: str) -> bool:
+        row = self._conn.execute(
+            "SELECT last_scraped_at FROM hackathons WHERE url=?",
+            (hackathon_url,),
+        ).fetchone()
+        return row is not None and row["last_scraped_at"] is not None
+
+    def mark_hackathon_scraped(self, hackathon_url: str) -> None:
+        self._conn.execute(
+            "UPDATE hackathons SET last_scraped_at=? WHERE url=?",
+            (_now_iso(), hackathon_url),
+        )
+        self._conn.commit()
+
+    def stats(self) -> dict[str, int]:
+        hcount = self._conn.execute("SELECT COUNT(*) FROM hackathons").fetchone()[0]
+        pcount = self._conn.execute("SELECT COUNT(*) FROM participants").fetchone()[0]
+        emitted = self._conn.execute(
+            "SELECT COUNT(*) FROM participants WHERE event_emitted_at IS NOT NULL"
+        ).fetchone()[0]
+        with_email = self._conn.execute(
+            "SELECT COUNT(*) FROM participants WHERE email != ''"
+        ).fetchone()[0]
+        return {
+            "hackathons": hcount,
+            "participants": pcount,
+            "with_email": with_email,
+            "events_emitted": emitted,
+        }
+
+
+def _now_iso() -> str:
+    return datetime.now(timezone.utc).isoformat()
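A hedged sketch of inspecting the harvest database with the new HarvestDB class above, not part of the diff: class, method names, and the default path come from this diff; the snippet is illustrative.

```python
# Illustrative: open the default SQLite database created by devpost-harvest
# and print the same counters the CLI summary reports.
from devpost_scraper.db import HarvestDB

db = HarvestDB("devpost_harvest.db")  # default path from the diff
print(db.stats())  # {"hackathons": ..., "participants": ..., "with_email": ..., "events_emitted": ...}
for p in db.all_unemitted_participants():
    print(p.hackathon_title, p.username, p.email)
db.close()
```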
{devpost_scraper-0.1.0 → devpost_scraper-0.3.0}/src/devpost_scraper/models.py

@@ -3,10 +3,26 @@ from __future__ import annotations
 from pydantic import BaseModel, ConfigDict
 
 
+class Hackathon(BaseModel):
+    model_config = ConfigDict(extra="ignore")
+
+    id: int
+    url: str
+    title: str = ""
+    organization_name: str = ""
+    open_state: str = ""
+    submission_period_dates: str = ""
+    registrations_count: int = 0
+    prize_amount: str = ""
+    themes: str = ""
+    invite_only: bool = False
+
+
 class HackathonParticipant(BaseModel):
     model_config = ConfigDict(extra="ignore")
 
     hackathon_url: str = ""
+    hackathon_title: str = ""
     username: str = ""
     name: str = ""
     specialty: str = ""
@@ -17,7 +33,20 @@ class HackathonParticipant(BaseModel):
 
     @classmethod
     def fieldnames(cls) -> list[str]:
-        return ["hackathon_url", "username", "name", "specialty", "profile_url", "github_url", "linkedin_url", "email"]
+        return ["hackathon_url", "hackathon_title", "username", "name", "specialty", "profile_url", "github_url", "linkedin_url", "email"]
+
+
+class DevpostHackathonEvent(BaseModel):
+    """Payload for the Customer.io ``devpost_hackathon`` event."""
+
+    hackathon_url: str
+    hackathon_title: str
+    username: str
+    name: str
+    specialty: str
+    profile_url: str
+    github_url: str
+    linkedin_url: str
 
 
 class DevpostProject(BaseModel):
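A hedged sketch of the event payload produced by the new DevpostHackathonEvent model above, not part of the diff: the field names come from this diff, the values are sample data.

```python
# Illustrative: the dict that emit_hackathon_events() passes as Customer.io event data.
from devpost_scraper.models import DevpostHackathonEvent

event = DevpostHackathonEvent(
    hackathon_url="https://example.devpost.com",  # sample values, not from the diff
    hackathon_title="Example Hackathon",
    username="jdoe",
    name="Jane Doe",
    specialty="Full-stack developer",
    profile_url="https://devpost.com/jdoe",
    github_url="",
    linkedin_url="",
)
print(event.model_dump())
```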
{devpost_scraper-0.1.0 → devpost_scraper-0.3.0}/src/devpost_scraper/scraper.py

@@ -23,6 +23,7 @@ _WALKABLE_DOMAINS = {
 }
 
 _SEARCH_URL = "https://devpost.com/software/search"
+_HACKATHONS_API_URL = "https://devpost.com/api/hackathons"
 _GITHUB_API_URL = "https://api.github.com/users"
 
 
@@ -83,6 +84,58 @@ async def search_projects(query: str, page: int = 1) -> dict[str, Any]:
     }
 
 
+async def list_hackathons(
+    page: int = 1,
+    statuses: list[str] | None = None,
+) -> dict[str, Any]:
+    """Fetch one page of hackathons from the Devpost API.
+    Returns {"hackathons": [...], "total_count": int, "per_page": int}.
+    """
+    if statuses is None:
+        statuses = ["open"]
+
+    params: list[tuple[str, str]] = [("page", str(page))]
+    for s in statuses:
+        params.append(("status[]", s))
+
+    async with httpx.AsyncClient(timeout=30.0, follow_redirects=True) as client:
+        resp = await client.get(
+            _HACKATHONS_API_URL,
+            params=params,
+            headers={**_JSON_HEADERS, "Accept": "application/json"},
+        )
+        resp.raise_for_status()
+        data = resp.json()
+
+    hackathons = []
+    for item in data.get("hackathons", []):
+        themes = item.get("themes") or []
+        theme_names = ", ".join(t.get("name", "") for t in themes if t.get("name"))
+
+        prize_raw = item.get("prize_amount", "") or ""
+        prize_clean = re.sub(r"<[^>]+>", "", prize_raw).strip()
+
+        hackathons.append({
+            "id": item.get("id", 0),
+            "title": item.get("title", ""),
+            "url": item.get("url", "").rstrip("/"),
+            "organization_name": item.get("organization_name", ""),
+            "open_state": item.get("open_state", ""),
+            "submission_period_dates": item.get("submission_period_dates", ""),
+            "registrations_count": item.get("registrations_count", 0),
+            "prize_amount": prize_clean,
+            "themes": theme_names,
+            "invite_only": bool(item.get("invite_only")),
+        })
+
+    meta = data.get("meta", {})
+    return {
+        "hackathons": hackathons,
+        "total_count": meta.get("total_count", 0),
+        "per_page": meta.get("per_page", 9),
+    }
+
+
 async def get_project_details(url: str) -> dict[str, Any]:
     """Fetch a Devpost project page and extract detail fields."""
     async with httpx.AsyncClient(timeout=30.0, follow_redirects=True) as client:
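A hedged sketch exercising the new list_hackathons helper added above, not part of the diff: the function name and return shape are from this diff; which pages actually return data depends on Devpost at run time.

```python
# Illustrative: fetch the first listing page of open hackathons and print titles.
import asyncio

from devpost_scraper.scraper import list_hackathons

async def main() -> None:
    data = await list_hackathons(page=1, statuses=["open"])
    print(f"{data['total_count']} hackathons, {data['per_page']} per page")
    for h in data["hackathons"]:
        print(h["title"], h["url"], h["prize_amount"])

asyncio.run(main())
```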
devpost_scraper-0.1.0/PKG-INFO (DELETED)

@@ -1,101 +0,0 @@
-Metadata-Version: 2.4
-Name: devpost-scraper
-Version: 0.1.0
-Summary: CLI for extracting Devpost data with Backboard tool-calling and exporting results to CSV.
-Requires-Python: >=3.11
-Requires-Dist: backboard-sdk>=1.5.9
-Requires-Dist: beautifulsoup4>=4.12.0
-Requires-Dist: httpx>=0.27.0
-Requires-Dist: pydantic>=2.7.0
-Requires-Dist: python-dotenv>=1.0.1
-Description-Content-Type: text/markdown
-
-# Devpost Scraper
-
-CLI for extracting Devpost project data with a Backboard assistant that can call a Devpost MCP tool server and export structured results to CSV.
-
-## Requirements
-
-- Python 3.11+
-- `uv`
-- Node.js / `npx` available on your machine
-- A Backboard API key
-
-## Environment
-
-Create a `.env` file from `.env.example` and set:
-
-- `BACKBOARD_API_KEY`
-- `BACKBOARD_MODEL` (optional)
-- `DEVPOST_ASSISTANT_NAME` (optional)
-
-## MCP server
-
-This project is designed to use a Devpost MCP server with this configuration:
-
-```json
-{
-  "mcpServers": {
-    "devpost": {
-      "command": "npx",
-      "args": ["devpost-mcp-server"]
-    }
-  }
-}
-```
-
-## Install
-
-```bash
-uv sync
-```
-
-## Run
-
-```bash
-uv run devpost-scraper "ai agents" --output ai_agents.csv
-uv run devpost-scraper "developer tools" "climate tech" --output results.csv
-```
-
-You can also use the startup script:
-
-```bash
-./start.sh "ai agents" --output ai_agents.csv
-```
-
-## What it does
-
-1. Creates or reuses a Backboard assistant configured for Devpost extraction.
-2. Creates a thread for the run.
-3. Sends a prompt that asks the assistant to use the Devpost MCP toolset.
-4. Handles tool-calling loops until the assistant returns completed structured content.
-5. Parses the structured JSON result.
-6. Writes the extracted rows to CSV.
-
-## Expected output shape
-
-Each extracted row should contain fields like:
-
-- `search_term`
-- `project_title`
-- `tagline`
-- `project_url`
-- `hackathon_name`
-- `hackathon_url`
-- `summary`
-- `built_with`
-- `prizes`
-- `submission_date`
-- `team_size`
-
-## Notes
-
-- The CLI is intentionally API-heavy and UI-free.
-- The Backboard assistant must have access to the Devpost MCP tools in the environment where it runs.
-- If your Backboard account or environment requires additional tool registration, wire that into the assistant creation flow in the client module.
-
-## Development
-
-```bash
-uv run python -m devpost_scraper.cli "ai agents" --output out.csv
-```
devpost_scraper-0.1.0/README.md (DELETED)

@@ -1,89 +0,0 @@
-# Devpost Scraper
-
-CLI for extracting Devpost project data with a Backboard assistant that can call a Devpost MCP tool server and export structured results to CSV.
-
-## Requirements
-
-- Python 3.11+
-- `uv`
-- Node.js / `npx` available on your machine
-- A Backboard API key
-
-## Environment
-
-Create a `.env` file from `.env.example` and set:
-
-- `BACKBOARD_API_KEY`
-- `BACKBOARD_MODEL` (optional)
-- `DEVPOST_ASSISTANT_NAME` (optional)
-
-## MCP server
-
-This project is designed to use a Devpost MCP server with this configuration:
-
-```json
-{
-  "mcpServers": {
-    "devpost": {
-      "command": "npx",
-      "args": ["devpost-mcp-server"]
-    }
-  }
-}
-```
-
-## Install
-
-```bash
-uv sync
-```
-
-## Run
-
-```bash
-uv run devpost-scraper "ai agents" --output ai_agents.csv
-uv run devpost-scraper "developer tools" "climate tech" --output results.csv
-```
-
-You can also use the startup script:
-
-```bash
-./start.sh "ai agents" --output ai_agents.csv
-```
-
-## What it does
-
-1. Creates or reuses a Backboard assistant configured for Devpost extraction.
-2. Creates a thread for the run.
-3. Sends a prompt that asks the assistant to use the Devpost MCP toolset.
-4. Handles tool-calling loops until the assistant returns completed structured content.
-5. Parses the structured JSON result.
-6. Writes the extracted rows to CSV.
-
-## Expected output shape
-
-Each extracted row should contain fields like:
-
-- `search_term`
-- `project_title`
-- `tagline`
-- `project_url`
-- `hackathon_name`
-- `hackathon_url`
-- `summary`
-- `built_with`
-- `prizes`
-- `submission_date`
-- `team_size`
-
-## Notes
-
-- The CLI is intentionally API-heavy and UI-free.
-- The Backboard assistant must have access to the Devpost MCP tools in the environment where it runs.
-- If your Backboard account or environment requires additional tool registration, wire that into the assistant creation flow in the client module.
-
-## Development
-
-```bash
-uv run python -m devpost_scraper.cli "ai agents" --output out.csv
-```
Files without changes:

- {devpost_scraper-0.1.0 → devpost_scraper-0.3.0}/src/devpost_scraper/__init__.py
- {devpost_scraper-0.1.0 → devpost_scraper-0.3.0}/src/devpost_scraper/backboard_client.py
- {devpost_scraper-0.1.0 → devpost_scraper-0.3.0}/src/devpost_scraper/csv_export.py