beatbot 0.1.0__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- beatbot-0.1.0/PKG-INFO +267 -0
- beatbot-0.1.0/README.md +230 -0
- beatbot-0.1.0/beatbot/__init__.py +3 -0
- beatbot-0.1.0/beatbot/cli.py +322 -0
- beatbot-0.1.0/beatbot/extractor/__init__.py +0 -0
- beatbot-0.1.0/beatbot/extractor/extractor.py +769 -0
- beatbot-0.1.0/beatbot/track.py +123 -0
- beatbot-0.1.0/pyproject.toml +52 -0
beatbot-0.1.0/PKG-INFO
ADDED
@@ -0,0 +1,267 @@
Metadata-Version: 2.3
Name: beatbot
Version: 0.1.0
Summary: BeatBot CLI — local audio feature extraction for cloud cue-point prediction
Author: Romanos Chiliarchopoulos
Author-email: romanoshiliarhopoulos@gmail.com
Requires-Python: >=3.10
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Provides-Extra: dev
Provides-Extra: server
Requires-Dist: fastapi (>=0.115.0,<1.0.0) ; extra == "dev"
Requires-Dist: fastapi (>=0.115.0,<1.0.0) ; extra == "server"
Requires-Dist: firebase-admin (>=6.5.0,<7.0.0) ; extra == "dev"
Requires-Dist: firebase-admin (>=6.5.0,<7.0.0) ; extra == "server"
Requires-Dist: ipywidgets (>=8.1.0) ; extra == "dev"
Requires-Dist: librosa (>=0.11.0,<0.12.0)
Requires-Dist: lightgbm (>=4.6.0,<5.0.0) ; extra == "dev"
Requires-Dist: lightgbm (>=4.6.0,<5.0.0) ; extra == "server"
Requires-Dist: matplotlib (>=3.7.0) ; extra == "dev"
Requires-Dist: numpy (>=1.26.0,<2.4.0)
Requires-Dist: pandas (>=2.2.0,<3.0.0) ; extra == "dev"
Requires-Dist: pandas (>=2.2.0,<3.0.0) ; extra == "server"
Requires-Dist: plotly (>=5.0.0) ; extra == "dev"
Requires-Dist: scipy (>=1.11.0)
Requires-Dist: seaborn (>=0.13.0) ; extra == "dev"
Requires-Dist: streamlit (>=1.30.0) ; extra == "dev"
Requires-Dist: tqdm (>=4.67.3,<5.0.0)
Requires-Dist: uvicorn[standard] (>=0.30.0,<1.0.0) ; extra == "dev"
Requires-Dist: uvicorn[standard] (>=0.30.0,<1.0.0) ; extra == "server"
Requires-Dist: yt-dlp (>=2024.0.0) ; extra == "dev"
Description-Content-Type: text/markdown

# BeatBot — AI-Powered DJ Mixing Tool

BeatBot is an AI-powered mixing assistant that analyses house music tracks and automatically selects the optimal **entry** and **exit** cue points for seamless DJ transitions. A React frontend gives the user full visibility into the model's predictions and lets them trigger or override crossfades in real time.

---

## Table of Contents

- [Use Case](#use-case)
- [Project Structure](#project-structure)
- [The Model](#the-model)
- [Feature Engineering](#feature-engineering)
- [API](#api)
- [Frontend](#frontend)
- [Running Locally](#running-locally)
- [Data](#data)

---

## Use Case

A user loads a playlist of house music tracks into the queue. BeatBot:

1. Analyses each track with `librosa` and the `FeatureExtractor` pipeline.
2. Scores every bar in the track with the trained **dual LambdaRank** model.
3. Surfaces the best **entry** and **exit** bar for each track in the UI.
4. Automatically crossfades to the next track when the exit cue approaches, or immediately on user request.

The user can inspect the scoring charts, manually drag cue points, skip the upcoming track, reorder the queue, and adjust the crossfade duration — all without touching the model.

---

## Project Structure

```
BeatBot/
├── src/                      # Python back-end
│   ├── api/                  # FastAPI application
│   │   ├── main.py           # App factory, CORS, router registration
│   │   ├── state.py          # Shared runtime state (queue, cue cache, predict_cues)
│   │   ├── schemas.py        # Pydantic request / response models
│   │   ├── ws_manager.py     # WebSocket connection manager
│   │   └── routes/
│   │       ├── audio.py      # GET /audio/{track_id} — streams MP3
│   │       ├── tracks.py     # GET /tracks — lists library
│   │       ├── queue.py      # Queue CRUD + reorder
│   │       ├── predict.py    # POST /predict/{track_id}
│   │       ├── cues.py       # PATCH /cues/{track_id}
│   │       ├── session.py    # WebSocket /ws/session
│   │       └── transition.py # POST /transition/now
│   ├── model/
│   │   └── lightgbm.py       # BeatBotModel — dual LambdaRank wrapper
│   ├── extractor/
│   │   └── extractor.py      # Audio → Track pipeline (librosa)
│   ├── features.py           # FeatureExtractor — 40+ features per bar
│   ├── track.py              # Track dataclass
│   └── annotator.py          # Annotation helper (JAMS format)
│
├── frontend/                 # React + TypeScript UI
│   └── src/
│       ├── App.tsx           # Root: queue state, deck routing, crossfade logic
│       ├── api/client.ts     # Typed fetch helpers for every API route
│       ├── hooks/
│       │   ├── useAudioEngine.ts  # Web Audio API playback engine
│       │   └── useWebSocket.ts    # WS client with exponential-backoff reconnect
│       ├── components/
│       │   ├── Deck.tsx           # NOW PLAYING / UP NEXT panel
│       │   ├── CueChart.tsx       # Recharts score visualisation (entry + exit)
│       │   ├── FeatureCharts.tsx  # Energy, beat strength, vocal confidence
│       │   ├── WaveformView.tsx   # WaveSurfer.js waveform with cue markers
│       │   ├── Queue.tsx          # Drag-and-drop queue list
│       │   ├── Transport.tsx      # Play / Stop / Mix Now controls
│       │   └── ErrorBoundary.tsx
│       └── types/            # Shared TypeScript interfaces
│
├── data/
│   ├── custom/
│   │   ├── house_music_personal.csv  # Personal track library
│   │   └── annotations/              # JAMS annotation files
│   ├── M-DJCUE/              # Academic dataset (EDM)
│   ├── models/               # Serialised model runs (.pkl)
│   └── processed/            # Pre-extracted feature cache
│
├── mds/                      # Design and architecture notes
├── pyproject.toml
└── makefile
```

---

## The Model

BeatBot uses a **Learning-to-Rank (LambdaRank)** approach implemented in LightGBM (`src/model/lightgbm.py`).

### Why Learning-to-Rank?

DJing is inherently a ranking problem, not a classification one. Some bars are _perfect_ cue points, others are _acceptable_, and most are irrelevant. LambdaRank directly optimises **NDCG** (Normalized Discounted Cumulative Gain), which rewards pushing the best bars to the top of the ranked list.

### Dual Rankers

Two separate models are trained for the two halves of the mixing decision:

| Model | Goal | Configuration |
| --- | --- | --- |
| **Entry Ranker** | Structural beginnings — intros, breakdowns | High regularisation (`reg_lambda=15`), shallow trees (`max_depth=3`) to learn general structural rules rather than overfitting |
| **Exit Ranker** | Structural endings — outros, post-chorus | Lower regularisation (`reg_lambda=5`), deeper trees (`max_depth=4`) to capture complex energy dynamics |
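
As a sketch, the two configurations could be expressed as parameter sets for `lightgbm.LGBMRanker`. Only the objective, `reg_lambda`, and `max_depth` follow from the table above; the remaining values are illustrative placeholders, not the tuned settings in `src/model/lightgbm.py`:

```python
# Hypothetical parameter sets mirroring the Dual Rankers table. Only the
# objective, reg_lambda, and max_depth are documented; n_estimators and
# learning_rate are illustrative placeholders.
def ranker_params(reg_lambda: float, max_depth: int) -> dict:
    return {
        "objective": "lambdarank",  # LambdaRank: optimises NDCG directly
        "metric": "ndcg",
        "reg_lambda": reg_lambda,
        "max_depth": max_depth,
        "n_estimators": 200,        # placeholder
        "learning_rate": 0.05,      # placeholder
    }

# Entry: heavily regularised, shallow; Exit: lighter regularisation, deeper.
ENTRY_PARAMS = ranker_params(reg_lambda=15, max_depth=3)
EXIT_PARAMS = ranker_params(reg_lambda=5, max_depth=4)
```

Each dictionary would be passed as `lgb.LGBMRanker(**params)` and fitted with per-track query groups, since NDCG is computed within each track's ranked list.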

### Training Labels

Each bar in a training track is given a **graded relevance label**:

- `2` — Perfect cue (exact human annotation)
- `1` — Acceptable (within ±2 bars of annotation)
- `0` — Not a cue point
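
The scheme above can be sketched as a small labelling helper (the set-of-bar-indices representation of annotations is an assumption for illustration):

```python
def relevance_label(bar_idx: int, annotated_bars: set[int], tolerance: int = 2) -> int:
    """Graded relevance for LambdaRank training: 2 for an exact human
    annotation, 1 within +/- tolerance bars of one, 0 otherwise."""
    if bar_idx in annotated_bars:
        return 2
    if any(abs(bar_idx - a) <= tolerance for a in annotated_bars):
        return 1
    return 0

# Example: a track annotated with cue points at bars 16 and 96.
labels = [relevance_label(i, {16, 96}) for i in range(128)]
```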

### Inference

At inference time (`src/api/state.py → predict_cues`):

1. `FeatureExtractor.extract(track)` produces a feature matrix (one row per bar).
2. Both rankers score every bar.
3. A **positional weight** discourages exit cues in the final ~15% of the track (where the model would otherwise exploit the structural similarity of outros).
4. If the selected entry and exit are implausibly close, the exit score is masked within `min_sep_bars` of the entry and the best remaining candidate is chosen.
5. Results are cached per track and returned to the frontend within ~200 ms.
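
Steps 3 and 4 can be sketched roughly as follows. The tail penalty weight and the symmetric masking window are assumptions; the actual selection logic lives in `predict_cues`:

```python
import numpy as np

def select_cues(entry_scores: np.ndarray, exit_scores: np.ndarray,
                min_sep_bars: int = 16, tail_frac: float = 0.15,
                tail_weight: float = 0.5) -> tuple[int, int]:
    """Pick (entry_bar, exit_bar) from per-bar ranker scores.
    tail_weight and the masking window are illustrative assumptions."""
    n = len(entry_scores)
    # Step 3: down-weight exit scores in the final ~15% of the track.
    weighted_exit = exit_scores.astype(float).copy()
    weighted_exit[int(n * (1 - tail_frac)):] *= tail_weight
    entry = int(np.argmax(entry_scores))
    exit_bar = int(np.argmax(weighted_exit))
    # Step 4: if entry and exit are implausibly close, mask exit scores
    # within min_sep_bars of the entry and take the best remaining bar.
    if abs(exit_bar - entry) < min_sep_bars:
        masked = weighted_exit.copy()
        masked[max(0, entry - min_sep_bars):entry + min_sep_bars + 1] = -np.inf
        exit_bar = int(np.argmax(masked))
    return entry, exit_bar
```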

### Model Artefacts

Trained models are saved under `data/models/`. Each run directory contains:

- `beatbot_model.pkl` — serialised `BeatBotModel`
- `evaluation.json` — NDCG scores and feature importances
- `figures/` — training curves and prediction plots

---

## Feature Engineering

`src/features.py` computes **40+ features per bar**, organised into 9 tiers:

| Tier | Features | Purpose |
| --- | --- | --- |
| 1 – Structure | `bar_pos_norm`, `dist_to_section`, `phrase_pos`, `duration` | "Where am I in the song?" |
| 2 – Energy | `energy_prev_8`, `energy_next_8`, `energy_volatility`, `energy_derivative`, `beat_strength` | "How energetic is this section?" |
| 3 – Timbre | `spectral_centroid`, `vocal_conf`, `harmonic_ratio`, `high_band_energy` | "What does it sound like?" |
| 4 – Chroma | `chroma_rel_0/3/7/9/11` | Key-invariant harmonic function (Tonic, Minor-3rd, Dominant…) |
| 5 – Rhythmic Grid | `is_4_bar`, `bar_mod_8/16/32` | Phrasing alignment — mixes should land on the "1" |
| 6 – Flux | `energy_flux`, `spectral_flux` | Instantaneous change (drops, crashes) |
| 7 – Advanced Context | `energy_contrast_future`, `is_likely_breakdown`, `vocal_future_8`, `vocal_past_8` | Look-ahead / look-behind "human" features |
| 8 – Metadata | `is_section_start`, `beat_consistency`, `percussion_intensity`, `spectral_rolloff` | Structural and rhythmic metadata |
| 9 – Composite | `phrase_boundary_strength` | Count of grid alignments (0–5) — strong downbeat signal |

Chroma features are **key-invariant**: the raw 12-bin chroma vector is rotated by the track's detected tonic so the model learns harmonic _function_ (Dominant, Subdominant) rather than absolute pitch class.
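
A minimal sketch of that rotation, assuming the detected tonic is given as a pitch-class index (0 = C … 11 = B):

```python
import numpy as np

def key_invariant_chroma(chroma: np.ndarray, tonic: int) -> np.ndarray:
    """Rotate a 12-bin chroma vector so index 0 is always the detected tonic.
    After rotation, index 7 is the dominant and index 3 the minor third for
    every track, regardless of its absolute key."""
    return np.roll(chroma, -tonic)

# Example: a track in A minor (tonic pitch class 9). Energy on the A bin
# lands on chroma_rel_0 after rotation.
raw = np.zeros(12)
raw[9] = 1.0
rotated = key_invariant_chroma(raw, tonic=9)
```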

---

## API

The backend is a **FastAPI** app (`src/api/`) served by uvicorn.

```bash
PYTHONPATH=src .venv/bin/uvicorn api.main:app --reload --app-dir src
# Runs on http://localhost:8000
```

Key endpoints:

| Method | Path | Description |
| --- | --- | --- |
| `GET` | `/tracks` | List all tracks in the library |
| `GET` | `/audio/{track_id}` | Stream the MP3 file |
| `POST` | `/predict/{track_id}` | Run cue prediction; returns scores + selected cues |
| `PATCH` | `/cues/{track_id}` | Override a cue point; validates and broadcasts via WS |
| `GET/POST/DELETE` | `/queue` | Queue management |
| `PATCH` | `/queue/reorder` | Reorder two queue positions |
| `POST` | `/transition/now` | Trigger immediate crossfade |
| `WS` | `/ws/session` | Real-time push events (`queue.updated`, `cues.accepted`) |
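
A hypothetical stdlib-only client for two of these endpoints. The request and response field names are assumptions; the real typed client is `frontend/src/api/client.ts`:

```python
import json
import urllib.request

BASE = "http://localhost:8000"  # default uvicorn address from the snippet above

def predict_cues(track_id: str, base: str = BASE) -> dict:
    """POST /predict/{track_id}: run cue prediction and return the parsed
    JSON body (per-bar scores plus the selected cues)."""
    req = urllib.request.Request(f"{base}/predict/{track_id}", method="POST")
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)

def override_cue(track_id: str, payload: dict, base: str = BASE) -> None:
    """PATCH /cues/{track_id}: override a cue point; the server validates it
    and broadcasts the change to /ws/session subscribers."""
    req = urllib.request.Request(
        f"{base}/cues/{track_id}",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="PATCH",
    )
    urllib.request.urlopen(req, timeout=10)
```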

---

## Frontend

The UI is a **React 19 + TypeScript** single-page app built with Vite 6.

```bash
cd frontend && pnpm install && pnpm dev
# Runs on http://localhost:5173
```

Key design decisions:

- **Two physical decks (A / B)** alternate roles as NOW PLAYING and UP NEXT. The `activeDeck` ref drives all routing logic so async crossfades never touch the wrong slot.
- **Web Audio engine** (`useAudioEngine`) handles all playback, crossfading, and elapsed-time reporting.
- **WaveSurfer.js** renders the waveform but is staggered 3.5 s after deck load to avoid a simultaneous double PCM-decode that triggers Chrome OOM crashes.
- **WebSocket** (`useWebSocket`) reconnects with exponential backoff (150 ms → 5 s cap) so uvicorn `--reload` restarts are transparent.
- **Recharts** charts (cue scores + feature charts) share a `syncId` for synchronised hover cursors and render a live playhead `ReferenceLine`.
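
The reconnect schedule can be sketched in Python for illustration (the real logic is TypeScript in `useWebSocket`; only the 150 ms base and 5 s cap are stated above, so the doubling factor is an assumption):

```python
def backoff_delay_ms(attempt: int, base_ms: int = 150, cap_ms: int = 5000) -> int:
    """Delay before reconnect attempt n: start at 150 ms, double each
    retry, and cap at 5 s so a long outage never stalls reconnection."""
    return min(base_ms * 2 ** attempt, cap_ms)

# First seven attempts: 150, 300, 600, 1200, 2400, 4800, then capped at 5000.
delays = [backoff_delay_ms(n) for n in range(7)]
```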

---

## Running Locally

**Prerequisites:** Python ≥ 3.10 (the commands below use `python3.13`), Node.js ≥ 20, pnpm.

```bash
# 1. Python environment
python3.13 -m venv .venv
source .venv/bin/activate
pip install -e .

# 2. Backend
PYTHONPATH=src uvicorn api.main:app --reload --app-dir src

# 3. Frontend (separate terminal)
cd frontend
pnpm install
pnpm dev
```

Open [http://localhost:5173](http://localhost:5173).

---

## Data

| Path | Contents |
| --- | --- |
| `data/custom/annotations/` | JAMS files — manually annotated cue points for ~100 house tracks |
| `data/custom/house_music_personal.csv` | Track metadata (BPM, key, duration, file path) |
| `data/M-DJCUE/` | Academic EDM dataset used for additional training signal |
| `data/models/` | Serialised model runs; the active model path is configured in `src/api/state.py` |
| `data/processed/` | Pre-extracted feature DataFrames cached as Parquet — regenerated by `src/extractor/extractor.py` if missing |
beatbot-0.1.0/README.md
ADDED
@@ -0,0 +1,230 @@
(The README content is embedded verbatim as the long description in PKG-INFO above.)