broadcastx 0.1.0__tar.gz

@@ -0,0 +1,10 @@
+ Metadata-Version: 2.4
+ Name: broadcastx
+ Version: 0.1.0
+ Summary: Discover and download X/Twitter broadcast videos from user timelines
+ Requires-Python: >=3.11
+ Requires-Dist: playwright>=1.40
+ Requires-Dist: rich>=13.0
+ Requires-Dist: click>=8.0
+ Requires-Dist: httpx>=0.27
+ Requires-Dist: twscrape>=0.18
@@ -0,0 +1,196 @@
+ # BroadcastX
+
+ <a href="README_ZH.md">🇨🇳 中文版</a> | <a href="README.md">🇬🇧 English</a>
+
+ [![Python 3.11+](https://img.shields.io/badge/python-3.11+-blue.svg)](https://www.python.org/downloads/) [![PyPI](https://img.shields.io/pypi/v/broadcastx.svg)](https://pypi.org/project/broadcastx/)
+
+ Discover, monitor, and download X/Twitter broadcast videos from user timelines.
+
+ **BroadcastX** is a CLI tool that helps you:
+
+ - **Scan** - Find broadcast links in a user's timeline
+ - **Download** - Download broadcast videos with automatic phone-rotation correction
+ - **Monitor** - Watch a profile for live broadcasts and auto-download replays
+
+ ## Features
+
+ ### Scan
+
+ Uses Playwright browser automation to scroll through a user's X profile and intercept GraphQL API responses to extract broadcast URLs. This is more reliable than DOM scraping.
+
+ ### Download with Auto-Rotation
+
+ Downloads broadcast videos via `yt-dlp` and post-processes the video to correct phone orientation. Broadcasts streamed from a phone in portrait mode appear upright after processing. A `.rotation.jsonl` sidecar file is written alongside the video for inspection.
+
+ ### Monitor
+
+ Continuously monitors a user's profile. When a live broadcast is detected, periodically checks its status. When the broadcast ends, automatically downloads the replay.
+
+
+ ## Prerequisites
+
+ - **Python 3.11+**
+ - **[yt-dlp](https://github.com/yt-dlp/yt-dlp)** - `brew install yt-dlp`
+ - **[ffmpeg](https://ffmpeg.org/)** - `brew install ffmpeg`
+ - **Google Chrome** (installed separately)
+
+ ## Installation
+
+ ```bash
+ # Create virtual environment
+ python3 -m venv .venv
+ source .venv/bin/activate
+
+ # Install BroadcastX and dependencies
+ pip install -e .
+
+ # Install Playwright's browser driver
+ playwright install chromium
+ ```
+
+ ## Quick Start
+
+ ```bash
+ # Scan a user's timeline for broadcast links
+ broadcastx scan @username
+
+ # Download broadcasts from scan results
+ broadcastx download --from output/broadcasts.json
+
+ # Monitor a user for live broadcasts
+ broadcastx monitor @username
+ ```
+
+ ## Usage
+
+ ### Scan a timeline for broadcasts
+
+ ```bash
+ broadcastx scan @username
+
+ # Options:
+ #   --max-scrolls 100     Maximum scroll actions
+ #   --scroll-delay 2.0    Seconds between scrolls
+ #   --idle-timeout 10.0   Stop after N seconds with no new data
+ #   --output FILE         Output path (default: output/broadcasts.json)
+ #   --headless            Run browser without visible window
+ ```
+
+ The scanner opens the user's X profile in Chrome, scrolls through the timeline, and intercepts API responses. Broadcast URLs are extracted from tweet cards. If you are not logged in, the browser shows the login page; log in manually, then press Enter in the terminal to continue. Your session is saved to `~/.broadcastx/chrome-profile/` for future runs.
+
+ ### Download broadcasts
+
+ ```bash
+ # Single broadcast
+ broadcastx download https://x.com/i/broadcasts/1vAxRkBbDRzKl
+
+ # From scan results
+ broadcastx download --from output/broadcasts.json
+
+ # Multiple concurrent downloads
+ broadcastx download --from output/broadcasts.json -p 3
+
+ # Custom output directory
+ broadcastx download --from output/broadcasts.json -o ./videos
+
+ # Use Firefox cookies
+ broadcastx download --from output/broadcasts.json --browser firefox
+
+ # Verbose yt-dlp output
+ broadcastx download --from output/broadcasts.json -v
+ ```
+
+ BroadcastX **automatically corrects phone rotation**: if the broadcast carries phone-orientation metadata in the HLS stream, the video is re-encoded so it displays upright in any player.
+
+ ### Monitor a profile for live broadcasts
+
+ ```bash
+ broadcastx monitor @username
+
+ # One-shot test cycle (no loop)
+ broadcastx monitor @username --once
+
+ # Download to a custom directory
+ broadcastx monitor @username -o ./my_videos
+
+ # Custom check intervals (seconds)
+ broadcastx monitor @username --check-interval 1800 --live-interval 300
+
+ # Detect only, skip download
+ broadcastx monitor @username --no-download
+ ```
+
+ The monitor runs in a loop:
+
+ 1. **Profile check** (every `check-interval`, default 30 min) - Opens the profile and looks for broadcast cards.
+ 2. **Live detection** - When a candidate is found, checks whether it is currently live.
+ 3. **Live check** (every `live-interval`, default 5 min) - Re-checks status until the broadcast ends.
+ 4. **Download** - Downloads the replay automatically.
+
+ Events are logged to `output/monitor_events.json`.
+ ### Scrape all past broadcasts
+
+ ```bash
+ broadcastx scrape @username
+
+ # Ignore saved state and start from the beginning
+ broadcastx scrape @username --fresh
+
+ # Add delay and verbose output
+ broadcastx scrape @username --delay 2.0 -v
+
+ # Supply credentials directly (skips browser login)
+ broadcastx scrape @username \
+   --auth-token "your_auth_token" \
+   --csrf-token "your_ct0" \
+   --user-id "1234567890"
+ ```
+
+ Uses GraphQL API pagination with cursor-based resumption for full history traversal. State is saved locally, so you can pause and resume after rate limits.
+
+ ## Output Structure
+
+ ```
+ output/
+ ├── broadcasts.json          # Scan results
+ ├── monitor_events.json      # Monitor event log
+ └── videos/
+     ├── [title] [id].mp4     # Downloaded broadcast
+     ├── [id].rotation.jsonl  # Rotation timeline sidecar
+     └── ...
+ ```
+
+ ## Pipeline Examples
+
+ ```bash
+ # Scan + download all found broadcasts
+ broadcastx scan @username
+ broadcastx download --from output/broadcasts.json
+
+ # Monitor with auto-download
+ broadcastx monitor @username -o ./videos
+
+ # Bulk scrape + download
+ broadcastx scrape @username
+ broadcastx download --from output/username_broadcasts.json
+ ```
+
+ ## How It Works
+
+ ### Scanner
+ Uses Playwright to intercept Twitter's GraphQL API responses (`UserTweets` / `TweetDetail`). This is more stable than DOM scraping because JSON response structures change less frequently than HTML.
+
+ ### Downloader
+ Wraps `yt-dlp` (which has a built-in `TwitterBroadcastIE` extractor) and adds:
+ - **Rotation sidecar extraction** - Parses timed-ID3 metadata from HLS segments
+ - **Auto-rotation** - Re-encodes the video with correct orientation via ffmpeg
+
+ ### Rotation Sidecar
+ The JSONL sidecar (`[id].rotation.jsonl`) contains one record per HLS segment:
+ - `raw_rotation` - Original sensor angle from Periscope
+ - `rotation` - Quantized to 0°, 90°, 180°, or 270° with hysteresis
+ - `ntp` - NTP timestamp for timeline reconstruction
+
+ ## License
+
+ MIT
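The README's Rotation Sidecar section says the `rotation` field is quantized to 0°, 90°, 180°, or 270° with hysteresis. The downloader code that does this is not part of this listing, so the following is only a minimal sketch of what such a quantizer could look like. The function name `quantize_with_hysteresis` and the `margin` parameter are hypothetical and are not taken from BroadcastX.

```python
def quantize_with_hysteresis(raw_angles, margin=15.0):
    """Snap raw angles (degrees) to 0/90/180/270 with a switching dead band.

    `margin` is a hypothetical tuning knob: the output only changes once the
    raw angle is within (45 - margin) degrees of a different right angle.
    """
    current = None
    quantized = []
    for raw in raw_angles:
        raw = raw % 360.0
        # Nearest right angle, wrapping 360 back to 0.
        nearest = (round(raw / 90.0) % 4) * 90
        if current is None:
            # First sample: accept the nearest right angle as-is.
            current = nearest
        elif nearest != current:
            # Circular distance between the raw angle and the candidate.
            dist = abs((raw - nearest + 180.0) % 360.0 - 180.0)
            if dist <= 45.0 - margin:
                current = nearest
        quantized.append(current)
    return quantized
```

With the default margin, a reading of 50° right after a run at 0° does not flip the output to 90°, because it is still 40° away from 90°. The switch happens only once the phone is clearly past the midpoint, which is the point of the dead band: brief wobble around 45° should not cause repeated re-orientation.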
@@ -0,0 +1,3 @@
+ """BroadcastX - Discover and download X/Twitter broadcast videos."""
+
+ __version__ = "0.1.0"
@@ -0,0 +1,244 @@
+ """
+ BroadcastX CLI - Discover and download X/Twitter broadcast videos.
+
+ Usage:
+     broadcastx scan @username
+     broadcastx download https://x.com/i/broadcasts/...
+     broadcastx download --from broadcasts.json
+ """
+
+ import asyncio
+ from pathlib import Path
+
+ import click
+ from rich.console import Console
+
+ from . import __version__
+ from .config import DEFAULT_BROADCASTS_FILE, DEFAULT_BROWSER, DEFAULT_VIDEOS_DIR
+ from .downloader import check_ffmpeg, check_yt_dlp, download_all, download_broadcast
+ from .monitor import monitor_user
+ from .pause_detector import detect_pauses, pause_report, trim_intervals
+ from .scanner import scan_user
+ from .scrape_broadcasts import scrape_broadcasts
+
+ console = Console()
+
+
+ @click.group()
+ @click.version_option(version=__version__, prog_name="broadcastx")
+ def main():
+     """BroadcastX - Discover and download X/Twitter broadcast videos."""
+     pass
+
+
+ @main.command()
+ @click.argument("username")
+ @click.option("--max-scrolls", "-n", default=100, help="Maximum scroll actions (default: 100)")
+ @click.option("--scroll-delay", "-d", default=2.0, help="Delay between scrolls in seconds (default: 2.0)")
+ @click.option("--idle-timeout", "-t", default=10.0, help="Stop after N seconds with no new data (default: 10)")
+ @click.option("--output", "-o", default=None, help="Output JSON file path")
+ @click.option("--headless/--no-headless", default=False, help="Run browser headless (default: visible)")
+ def scan(username, max_scrolls, scroll_delay, idle_timeout, output, headless):
+     """Scan a user's timeline for broadcast links.
+
+     USERNAME can be with or without @ (e.g., @elonmusk or elonmusk).
+     """
+     asyncio.run(scan_user(
+         username=username,
+         max_scrolls=max_scrolls,
+         scroll_delay=scroll_delay,
+         idle_timeout=idle_timeout,
+         headless=headless,
+         output_file=output,
+     ))
+
+
+ @main.command()
+ @click.argument("username")
+ @click.option("--output", "-o", default=None, help="Output JSON file path")
+ @click.option("--delay", default=1.0, help="Delay between API calls in seconds (default: 1.0)")
+ @click.option("--headless/--no-headless", default=False, help="Run browser headless (default: visible)")
+ @click.option("--verbose", "-v", is_flag=True, help="Show detailed output")
+ @click.option("--fresh", is_flag=True, help="Ignore saved state, start from beginning")
+ @click.option("--auth-token", default=None, help="Manual auth_token cookie (skips browser)")
+ @click.option("--csrf-token", default=None, help="Manual ct0/CSRF token (skips browser)")
+ @click.option("--user-id", default=None, help="Manual user ID (skips user ID lookup)")
+ def scrape(username, output, delay, headless, verbose, fresh, auth_token, csrf_token, user_id):
+     """Scrape ALL past broadcasts from a user's timeline.
+
+     Uses GraphQL API pagination. Saves cursor state so you can resume
+     after rate limits. Run the same command again to continue.
+
+     USERNAME can be with or without @ (e.g., @SpaceX or SpaceX).
+
+     Examples:
+
+       broadcastx scrape @SpaceX
+
+       broadcastx scrape @SpaceX --fresh # ignore saved state
+
+       broadcastx scrape @SpaceX --delay 2.0 -v
+     """
+     if fresh:
+         from .scrape_broadcasts import _state_file
+         state_path = _state_file(username.lstrip("@"))
+         if state_path.exists():
+             state_path.unlink()
+             console.print(f"[dim]Cleared saved state: {state_path}[/dim]")
+
+     asyncio.run(scrape_broadcasts(
+         username=username,
+         headless=headless,
+         output_file=output,
+         delay=delay,
+         verbose=verbose,
+         auth_token=auth_token,
+         csrf_token=csrf_token,
+         user_id=user_id,
+     ))
+
+
+ @main.command()
+ @click.argument("username")
+ @click.option("--check-interval", default=30 * 60, help="Seconds between profile checks (default: 1800)")
+ @click.option("--live-interval", default=5 * 60, help="Seconds between live-status checks (default: 300)")
+ @click.option("--output", "-o", default=None, help="Monitor event JSON file path")
+ @click.option("--output-dir", default=None, help="Directory for downloaded videos")
+ @click.option("--browser", "-b", default=DEFAULT_BROWSER, help=f"Browser for yt-dlp cookies (default: {DEFAULT_BROWSER})")
+ @click.option("--headless/--no-headless", default=False, help="Run browser headless (default: visible)")
+ @click.option("--download/--no-download", default=True, help="Download when broadcast ends (default: download)")
+ @click.option("--once", is_flag=True, help="Run one detection cycle, useful for testing")
+ def monitor(username, check_interval, live_interval, output, output_dir, browser, headless, download, once):
+     """Monitor a profile for current live broadcasts and download ended replays.
+
+     USERNAME can be with or without @ (e.g., @SpaceX or SpaceX).
+     """
+     if download and not check_yt_dlp():
+         console.print("[red]✗ yt-dlp not found.[/red]")
+         console.print(" Install with: [bold]brew install yt-dlp[/bold]")
+         raise SystemExit(1)
+
+     if download and not check_ffmpeg():
+         console.print("[red]✗ ffmpeg not found.[/red]")
+         console.print(" Install with: [bold]brew install ffmpeg[/bold]")
+         raise SystemExit(1)
+
+     asyncio.run(monitor_user(
+         username=username,
+         check_interval=check_interval,
+         live_interval=live_interval,
+         headless=headless,
+         output_file=output,
+         output_dir=output_dir,
+         browser=browser,
+         download=download,
+         once=once,
+     ))
+
+
+ @main.command()
+ @click.argument("urls", nargs=-1)
+ @click.option("--from", "from_file", default=None, type=click.Path(), help="Load URLs from a JSON file")
+ @click.option("--output-dir", "-o", default=None, help="Output directory for videos")
+ @click.option("--browser", "-b", default=DEFAULT_BROWSER, help=f"Browser for cookies (default: {DEFAULT_BROWSER})")
+ @click.option("--verbose", "-v", is_flag=True, help="Show yt-dlp output")
+ @click.option("--parallel", "-p", default=1, help="Number of concurrent downloads (default: 1)")
+ def download(urls, from_file, output_dir, browser, verbose, parallel):
+     """Download broadcast video(s).
+
+     Pass one or more broadcast URLs directly, or use --from to load from a JSON file.
+
+     Examples:
+
+       broadcastx download https://x.com/i/broadcasts/1vAxRkBbDRzKl
+
+       broadcastx download --from output/broadcasts.json
+
+       broadcastx download --from output/broadcasts.json -o ./my_videos
+
+     Rotation correction is applied automatically: if the broadcast carries
+     phone-orientation metadata, the downloaded video is re-encoded so it
+     displays upright. A `.rotation.jsonl` sidecar is also written alongside
+     the video for inspection.
+     """
+     # Pre-flight checks
+     if not check_yt_dlp():
+         console.print("[red]✗ yt-dlp not found.[/red]")
+         console.print(" Install with: [bold]brew install yt-dlp[/bold]")
+         raise SystemExit(1)
+
+     if not check_ffmpeg():
+         console.print("[red]✗ ffmpeg not found.[/red]")
+         console.print(" Install with: [bold]brew install ffmpeg[/bold]")
+         raise SystemExit(1)
+
+     if not urls and not from_file:
+         console.print("[yellow]Provide URLs or use --from <file>.[/yellow]")
+         raise SystemExit(1)
+
+     out = Path(output_dir) if output_dir else DEFAULT_VIDEOS_DIR
+
+     results = download_all(
+         urls=list(urls),
+         from_file=from_file,
+         output_dir=out,
+         browser=browser,
+         verbose=verbose,
+         parallel=parallel,
+     )
+
+     # Exit with error code if any downloads failed
+     if any(not r.success for r in results):
+         raise SystemExit(1)
+
+
+ @main.command()
+ @click.argument("broadcast_url")
+ @click.option("--browser", "-b", default=DEFAULT_BROWSER, help=f"Browser for cookies (default: {DEFAULT_BROWSER})")
+ @click.option("--trim/--detect-only", default=False, help="Actually trim paused sections (default: detect only)")
+ @click.option("--output", "-o", default=None, help="Output video for --trim (default: <video>.trimmed.mp4)")
+ @click.option("--size-ratio", default=0.50, help="Size-drop threshold (default 0.50)")
+ @click.option("--gap-density", default=0.50, help="PDT-gap density threshold (default 0.50)")
+ @click.option("--min-pause", default=10.0, help="Minimum pause duration in seconds (default 10)")
+ def trim_pauses(broadcast_url, browser, trim, output, size_ratio, gap_density, min_pause):
+     """Detect (and optionally trim) paused sections in a broadcast.
+
+     Analyses HLS segments via HTTP HEAD requests (no full download) and
+     playlist PDT timestamps to find sections where the video was paused
+     while audio continued. Default: detect-only. Pass --trim to cut.
+     """
+     if trim and not check_ffmpeg():
+         console.print("[red]ffmpeg not found - install with: brew install ffmpeg[/red]")
+         raise SystemExit(1)
+
+     console.print("[bold]Analysing HLS segments for pauses...[/bold]")
+
+     pauses = detect_pauses(
+         broadcast_url,
+         browser=browser,
+         size_ratio_threshold=size_ratio,
+         gap_density_threshold=gap_density,
+         min_pause_sec=min_pause,
+     )
+
+     console.print(pause_report(pauses))
+
+     if trim and pauses:
+         video_path = Path("output") / "videos" / f"{broadcast_url.split('/')[-1]}.mp4"
+         if not video_path.exists():
+             console.print(f"[red]Video not found: {video_path}")
+             console.print(" Download first: broadcastx download <url>")
+             raise SystemExit(1)
+
+         out = Path(output) if output else Path(str(video_path).replace(".mp4", ".trimmed.mp4"))
+         console.print(f"\n[bold]Trimming -> {out}...")
+         try:
+             trim_intervals(video_path, pauses, out)
+             console.print(f" [green]Done -> {out}")
+         except Exception as e:
+             console.print(f" [red]Failed: {e}")
+             raise SystemExit(1)
+     elif trim and not pauses:
+         console.print("[green]Nothing to trim.")
+
+
+ # Keep the entry-point guard last so every command above is registered
+ # before main() runs when the module is executed directly.
+ if __name__ == "__main__":
+     main()
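The `trim_pauses` docstring says pauses are found from HLS segment sizes and playlist PDT timestamps, but `detect_pauses` lives in `pause_detector`, which is not part of this listing. As a rough illustration of the size-drop criterion and the minimum-duration filter only, here is a sketch under stated assumptions: the function name `find_pauses`, the `(start_sec, duration_sec, size_bytes)` tuple layout, and the use of the median as the "typical" segment size are all hypothetical, and the PDT gap-density check is left out entirely.

```python
from statistics import median


def find_pauses(segments, size_ratio=0.50, min_pause_sec=10.0):
    """Return (start, end) intervals where segments are unusually small.

    `segments` is a list of (start_sec, duration_sec, size_bytes) tuples in
    playlist order. This layout is an assumption made for the sketch.
    """
    if not segments:
        return []
    # Treat the median segment size as the "normal" size for this stream.
    typical = median(size for _, _, size in segments)
    pauses = []
    run_start = run_end = None
    for start, duration, size in segments:
        if size < size_ratio * typical:
            # Small segment: open a run or extend the current one.
            if run_start is None:
                run_start = start
            run_end = start + duration
        elif run_start is not None:
            # Normal segment closes the current run.
            pauses.append((run_start, run_end))
            run_start = None
    if run_start is not None:
        pauses.append((run_start, run_end))
    # Drop runs shorter than the minimum pause duration.
    return [(s, e) for s, e in pauses if e - s >= min_pause_sec]
```

The defaults mirror the CLI's `--size-ratio 0.50` and `--min-pause 10` options. A frozen video track usually compresses to far fewer bytes per segment, which is why a size drop is a plausible cheap signal that needs only HEAD requests.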
@@ -0,0 +1,67 @@
+ """Shared configuration and constants for BroadcastX."""
+
+ import re
+ from pathlib import Path
+
+ # Default output directory (relative to cwd)
+ DEFAULT_OUTPUT_DIR = Path("output")
+ DEFAULT_VIDEOS_DIR = DEFAULT_OUTPUT_DIR / "videos"
+ DEFAULT_BROADCASTS_FILE = DEFAULT_OUTPUT_DIR / "broadcasts.json"
+
+ # Browser to extract cookies from (for yt-dlp)
+ DEFAULT_BROWSER = "chrome"
+
+ # X broadcast IDs observed in real broadcast URLs are opaque alphanumeric
+ # tokens, e.g. 1vAxRkBbDRzKl. Reject tiny fragments such as /broadcasts/1.
+ BROADCAST_ID_RE = r"[A-Za-z0-9]{8,}"
+
+ # Broadcast URL patterns to match
+ BROADCAST_PATTERNS = [
+     re.compile(rf"https?://(?:x|twitter)\.com/i/broadcasts/({BROADCAST_ID_RE})(?![A-Za-z0-9_])"),
+     re.compile(rf"https?://(?:www\.)?pscp\.tv/w/({BROADCAST_ID_RE})(?![A-Za-z0-9_])"),
+ ]
+
+ # Twitter GraphQL endpoints to intercept
+ GRAPHQL_ENDPOINTS = [
+     "UserTweets",
+     "UserTweetsAndReplies",
+     "TweetDetail",
+     "SearchTimeline",
+ ]
+
+ # Scanner defaults
+ DEFAULT_MAX_SCROLLS = 100  # Maximum number of scroll actions
+ DEFAULT_SCROLL_DELAY = 2.0  # Seconds between scrolls
+ DEFAULT_IDLE_TIMEOUT = 10.0  # Stop after N seconds with no new tweets
+ DEFAULT_HEADLESS = False  # Show browser by default (useful for login)
+
+ # yt-dlp output template: broadcast ID, then timestamp, then title
+ YTDLP_OUTPUT_TEMPLATE = "%(id)s [%(timestamp>%Y-%m-%d %H.%M.%S)s] %(title)s.%(ext)s"
+
+
+ def ensure_output_dirs():
+     """Create output directories if they don't exist."""
+     DEFAULT_OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
+     DEFAULT_VIDEOS_DIR.mkdir(parents=True, exist_ok=True)
+
+
+ def extract_broadcast_id(url: str) -> str | None:
+     """Extract broadcast ID from a broadcast URL."""
+     for pattern in BROADCAST_PATTERNS:
+         match = pattern.search(url)
+         if match:
+             return match.group(1)
+     return None
+
+
+ def is_broadcast_url(url: str) -> bool:
+     """Check if a URL is a broadcast URL."""
+     return extract_broadcast_id(url) is not None
+
+
+ def normalize_broadcast_url(url: str) -> str | None:
+     """Normalize a broadcast URL to the canonical x.com format."""
+     bid = extract_broadcast_id(url)
+     if bid:
+         return f"https://x.com/i/broadcasts/{bid}"
+     return None
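To see how `BROADCAST_PATTERNS` behaves on concrete inputs, the snippet below copies the two regexes, `extract_broadcast_id`, and `normalize_broadcast_url` from the listing above into a standalone script. Apart from the broadcast ID `1vAxRkBbDRzKl`, which appears in the README, the sample URLs in the note that follows are made up for illustration.

```python
import re

# Copied from the config module shown above.
BROADCAST_ID_RE = r"[A-Za-z0-9]{8,}"
BROADCAST_PATTERNS = [
    re.compile(rf"https?://(?:x|twitter)\.com/i/broadcasts/({BROADCAST_ID_RE})(?![A-Za-z0-9_])"),
    re.compile(rf"https?://(?:www\.)?pscp\.tv/w/({BROADCAST_ID_RE})(?![A-Za-z0-9_])"),
]


def extract_broadcast_id(url):
    """Return the broadcast ID from a matching URL, or None."""
    for pattern in BROADCAST_PATTERNS:
        match = pattern.search(url)
        if match:
            return match.group(1)
    return None


def normalize_broadcast_url(url):
    """Rewrite any recognised broadcast URL to the x.com form, or None."""
    bid = extract_broadcast_id(url)
    return f"https://x.com/i/broadcasts/{bid}" if bid else None
```

Three behaviours are worth noting. A short fragment such as `https://x.com/i/broadcasts/1` is rejected by the `{8,}` length floor. A legacy `https://www.pscp.tv/w/1vAxRkBbDRzKl` link normalizes to the `x.com` form. The negative lookahead `(?![A-Za-z0-9_])` refuses an ID that runs straight into an underscore, so `.../1vAxRkBbDRzKl_x` yields no match rather than a truncated ID.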