ef-dl 1.0.0

package/LICENSE ADDED
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2026 EF-DL

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

package/README.md ADDED
@@ -0,0 +1,521 @@
# EF-DL: Epstein Files Downloader

```
 ______     ______   _____     __
/\  ___\   /\  ___\ /\  __-.  /\ \
\ \  __\   \ \  __\ \ \ \/\ \ \ \ \____
 \ \_____\  \ \_\    \ \____-  \ \_____\
  \/_____/   \/_/     \/____/   \/_____/
```

> **DISCLAIMER**: This application is for **EDUCATIONAL PURPOSES ONLY**. By using this tool, you certify that you are 18 years of age or older and will use it responsibly and legally.

An interactive CLI tool for downloading documents from the DOJ Epstein Files search portal. It automates searching the portal, saving the result metadata, and downloading the PDF files, with support for pagination, filename prefixes, and deduplication.

## Table of Contents

- [Features](#features)
- [Installation](#installation)
  - [Option 1: Docker (Recommended)](#option-1-docker-recommended)
  - [Option 2: Bun Package Manager](#option-2-bun-package-manager)
  - [Option 3: Local Development](#option-3-local-development)
- [Quick Start](#quick-start)
  - [Start with interactive mode (default)](#start-with-interactive-mode-default)
  - [Interactive mode with pre-filled values](#interactive-mode-with-pre-filled-values)
  - [Download all pages](#download-all-pages)
  - [Download a specific page](#download-a-specific-page)
- [Usage](#usage)
  - [Command Line Options](#command-line-options)
  - [Interactive Mode](#interactive-mode)
  - [Examples](#examples)
- [Docker Usage](#docker-usage)
  - [Quick Start with Docker](#quick-start-with-docker)
  - [Docker Commands](#docker-commands)
  - [Development with Docker](#development-with-docker)
- [Download Flow](#download-flow)
- [File Organization](#file-organization)
- [Tech Stack](#tech-stack)
- [Development](#development)
- [Important Notes](#important-notes)
- [Contributing](#contributing)
- [License](#license)

## Features

- **Search Portal Integration**: Automatically searches the justice.gov Epstein Files portal
- **PDF Downloads**: Downloads PDFs with automatic deduplication based on filename and file size
- **Progress Tracking**: Visual progress bars for JSON fetching and PDF downloads
- **Parallel Workers**: Multi-process downloads with a queue-backed resume system
- **Resume Support**: Restart interrupted runs from the saved queue state
- **Custom Prefixes**: Prefix PDF filenames with the page number (default) or a custom string (sequential mode)
- **Smart Deduplication**: Detects existing files and skips/renames them appropriately
- **Batch Processing**: Download a single page or all pages at once
- **Interactive Mode**: Guided prompts for configuration (great for first-time users)
- **Age Verification**: Built-in age verification for legal compliance
- **Security Handling**: Automatically handles CAPTCHA and age verification challenges

## Installation

> **Bun Runtime Required**: This package uses Bun-specific APIs (`bun:sqlite`) and requires the Bun runtime. It will not work with Node.js.

### Option 1: Docker (Recommended)

No local runtime installation is needed, only Docker:

- [Docker](https://www.docker.com/) v20.0.0 or higher
- [Docker Compose](https://docs.docker.com/compose/) v2.0.0 or higher (optional)

Docker images:

- Docker Hub: `iammorpheus/ef-dl:latest`
- GitHub Container Registry: `ghcr.io/iammorpheuszion/ef-dl:latest`

```bash
# Option A: Using docker compose (recommended)
# Download docker-compose.yml, then run it
curl -O https://raw.githubusercontent.com/iammorpheuszion/ef-dl/main/docker-compose.yml

docker compose run -it --rm ef-dl
```

```bash
# Option B: Using docker run directly
# Choose the download location with the -v flag (host path:container path)
docker run -it --rm -v "$(pwd)/downloads:/app/downloads" iammorpheus/ef-dl
```

See [Docker Usage](#docker-usage) for more details.

### Option 2: Bun Package Manager

Install from npm using Bun:

```bash
# Using bunx (no installation needed), the Bun equivalent of npx
bunx ef-dl -s "your search term" -d ./downloads

# Or install globally with Bun
bun install -g ef-dl
ef-dl -s "your search term" -d ./downloads
```

### Option 3: Local Development

Clone and run from source:

<details>
<summary>Local installation steps</summary>

**Prerequisites:** [Bun](https://bun.sh/) v1.0.0 or higher

1. Clone the repository:

   ```bash
   git clone https://github.com/iammorpheuszion/ef-dl.git
   cd ef-dl
   ```

2. Install dependencies:

   ```bash
   bun install
   ```

3. Verify the installation:

   ```bash
   bun run typecheck
   ```

</details>

## Quick Start

The commands below run from a local clone (Option 3). If you installed globally with Bun, replace `bun run start` with `ef-dl`; with `bunx`, use `bunx ef-dl`.

### Start with interactive mode (default)

Running without arguments automatically starts interactive mode:

```bash
bun run start
```

### Interactive mode with pre-filled values

```bash
bun run start -i -s "your search term" -p 5
```

### Download all pages

```bash
bun run start -s "your search term" -d ./downloads
```

### Download a specific page

```bash
bun run start -s "your search term" -p 5 -d ./downloads
```

## Usage

### Command Line Options

| Flag            | Short | Description                                   | Required | Default     |
| --------------- | ----- | --------------------------------------------- | -------- | ----------- |
| `--search`      | `-s`  | Search term to query the portal               | Yes      | -           |
| `--directory`   | `-d`  | Download directory path                       | Yes      | -           |
| `--page`        | `-p`  | Page number to download                       | -        | All pages   |
| `--all`         | `-a`  | Download all pages starting from `--page`     | -        | `false`     |
| `--prefix`      | -     | Custom filename prefix (sequential mode only) | -        | Page number |
| `--workers`     | -     | Number of parallel workers (1-10)             | -        | `5`         |
| `--fresh`       | -     | Force a fresh start, ignore resume state      | -        | `false`     |
| `--sequential`  | -     | Use sequential download (disable workers)     | -        | `false`     |
| `--verbose`     | `-v`  | Enable verbose debug output                   | -        | `false`     |
| `--interactive` | `-i`  | Interactive mode with prompts                 | -        | `false`     |
| `--help`        | `-h`  | Show help menu                                | -        | -           |
| `--version`     | `-V`  | Show version number                           | -        | -           |

`--search` and `--directory` are required only in non-interactive mode; interactive mode prompts for any missing values.
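
The defaults and limits in this table fit together as a small validation step. The sketch below is hypothetical: `RawOptions` and `resolveOptions` are illustrative names, not code from the project's `index.ts` (which parses flags with `commander`), and the rule that pages start at 1 is an assumption. It only shows how the documented defaults (`--workers 5` within 1-10, boolean flags `false`, no `--page` meaning all pages) combine for a non-interactive run.

```typescript
// Hypothetical flag shape; property names mirror the long flags in the table.
interface RawOptions {
  search?: string;
  directory?: string;
  page?: number;
  all?: boolean;
  prefix?: string;
  workers?: number;
  fresh?: boolean;
  sequential?: boolean;
  verbose?: boolean;
}

interface ResolvedOptions {
  search: string;
  directory: string;
  page: number | undefined; // undefined = all pages
  all: boolean;
  prefix: string | undefined; // undefined = use the page number
  workers: number;
  fresh: boolean;
  sequential: boolean;
  verbose: boolean;
}

// Apply the documented defaults and limits for a non-interactive run.
function resolveOptions(raw: RawOptions): ResolvedOptions {
  if (!raw.search) throw new Error("--search is required in non-interactive mode");
  if (!raw.directory) throw new Error("--directory is required in non-interactive mode");

  const workers = raw.workers ?? 5;
  if (!Number.isInteger(workers) || workers < 1 || workers > 10) {
    throw new Error("--workers must be an integer from 1 to 10");
  }
  // Assumption: result pages are numbered from 1.
  if (raw.page !== undefined && (!Number.isInteger(raw.page) || raw.page < 1)) {
    throw new Error("--page must be a positive integer");
  }

  return {
    search: raw.search,
    directory: raw.directory,
    page: raw.page,
    all: raw.all ?? false,
    prefix: raw.prefix,
    workers,
    fresh: raw.fresh ?? false,
    sequential: raw.sequential ?? false,
    verbose: raw.verbose ?? false,
  };
}

const resolved = resolveOptions({ search: "your search term", directory: "./downloads" });
console.log(resolved.workers, resolved.page); // 5 undefined
```

With only `-s` and `-d` supplied, the resolved run uses 5 workers and fetches every page.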

### Interactive Mode

Interactive mode provides guided prompts for all configuration options. **Running the tool without any arguments automatically enters interactive mode.**

```bash
# Start interactive mode (no arguments needed)
bun run start

# Explicit interactive mode
bun run start -i

# Interactive with pre-filled values
bun run start -i -s "your search term"
```

**Interactive prompts:**

1. Search term
2. Download directory
3. Page number (leave empty for all pages)
4. Download mode (single page, or all pages from that page onward)
5. Custom prefix (leave empty to use the page number)
6. Verbose mode (yes/no)

### Examples

<details>
<summary>Click to see all example commands</summary>

```bash
# Download all pages with parallel workers (default: 5)
bun run start -s "your search term" -d ./downloads

# Download with 10 parallel workers
bun run start -s "your search term" -d ./downloads --workers 10

# Download with sequential mode (no parallelism)
bun run start -s "your search term" -d ./downloads --sequential

# Download only page 5 (files are prefixed with the page number)
bun run start -s "your search term" -p 5 -d ./downloads
# Results in: 5-EFTA00000001.pdf

# Download all pages starting from page 5
bun run start -s "your search term" -p 5 -a -d ./downloads

# Download with a custom prefix (custom prefixes apply in sequential mode)
bun run start -s "your search term" -p 5 -d ./downloads --sequential --prefix EPSTEIN
# Results in: EPSTEIN-EFTA00000001.pdf

# Download with verbose output
bun run start -s "your search term" -d ./downloads -v

# Force a fresh start (ignore saved resume state)
bun run start -s "your search term" -d ./downloads --fresh

# Interactive mode (prompts for all options)
bun run start -i

# Interactive mode with pre-filled values
bun run start -i -s "your search term" -d ./downloads
```

</details>
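
The `-p` and `-a` combinations in these examples select pages as sketched below. `pagesToFetch` is a hypothetical helper, not the tool's own code; it assumes pages are numbered from 1 and that the total page count is already known (the parallel flow's "Discover totals" step).

```typescript
// Hypothetical: which result pages to fetch, given the total page count
// and the --page / --all flags. Assumes pages are numbered from 1.
function pagesToFetch(totalPages: number, page?: number, all = false): number[] {
  const start = page ?? 1; // no --page: start from the first page
  if (start < 1 || start > totalPages) return [];
  // No --page, or --page with --all: run to the last page.
  // --page alone: fetch only that page.
  const end = page === undefined || all ? totalPages : start;
  const pages: number[] = [];
  for (let p = start; p <= end; p++) pages.push(p);
  return pages;
}

console.log(pagesToFetch(6).join(", "));          // 1, 2, 3, 4, 5, 6
console.log(pagesToFetch(6, 5).join(", "));       // 5
console.log(pagesToFetch(6, 5, true).join(", ")); // 5, 6
```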

## Docker Usage

You can also run EF-DL with Docker, without installing Bun locally.

### Quick Start with Docker

```bash
# Run in interactive mode
docker compose run -it --rm ef-dl

# Download a specific search term
docker compose run -it --rm ef-dl bun index.ts -s "your search term" -d ./downloads
```

### Docker Commands

**Volume Binding:** Use `-v` to map a local directory to the container's download location (`/app/downloads`). Downloads are then saved to your local machine.

<details>
<summary>Click to see all example commands</summary>

```bash
# Build the image
docker build -t ef-dl .

# Run interactively; downloads go to ./downloads on your machine
docker run -it --rm -v "$(pwd)/downloads:/app/downloads" ef-dl

# Run with arguments; downloads go to ./downloads on your machine
docker run -it --rm -v "$(pwd)/downloads:/app/downloads" ef-dl bun index.ts -s "your search term" -d ./downloads

# Custom download location (use an absolute host path)
docker run -it --rm -v /path/to/your/downloads:/app/downloads ef-dl bun index.ts -s "your search term" -d ./downloads

# Windows users (PowerShell)
docker run -it --rm -v ${PWD}/downloads:/app/downloads ef-dl

# Use the production-optimized image
docker build -f Dockerfile.production -t ef-dl:prod .
docker run -it --rm -v "$(pwd)/downloads:/app/downloads" ef-dl:prod
```

</details>

### Development with Docker

```bash
# Run with hot reload
docker compose --profile dev run --rm ef-dl-dev
```

## Download Flow

Parallel mode (default) uses a producer-consumer pipeline: a coordinator fetches JSON result pages and inserts PDF tasks into a SQLite queue, while a pool of workers claims and downloads them. Use `--sequential` to run the legacy single-process flow.
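
A minimal sketch of the producer-consumer idea, under loud assumptions: an in-memory array stands in for the SQLite queue, async functions in a single process stand in for the separate worker processes, and `TaskQueue`, `produce`, `work`, and `run` are hypothetical names rather than the contents of `src/workers/`. The exit rule follows the diagram below: a worker stops only once `json_fetch_complete` is set and no tasks are pending.

```typescript
// Hypothetical in-memory stand-in for the SQLite task queue.
interface PdfTask {
  url: string;
  page: number;
}

class TaskQueue {
  private pending: PdfTask[] = [];
  jsonFetchComplete = false;

  insert(tasks: PdfTask[]): void {
    this.pending.push(...tasks);
  }

  // Hand out the oldest pending task, or undefined if none is waiting.
  claim(): PdfTask | undefined {
    return this.pending.shift();
  }

  hasPending(): boolean {
    return this.pending.length > 0;
  }
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Producer: "fetch" each results page, enqueue its PDFs, then raise the completion flag.
async function produce(queue: TaskQueue, pages: PdfTask[][]): Promise<void> {
  for (const pageTasks of pages) {
    await sleep(5); // placeholder for fetching and caching one JSON page
    queue.insert(pageTasks);
  }
  queue.jsonFetchComplete = true;
}

// Consumer: claim tasks until the producer is done AND nothing is pending.
async function work(queue: TaskQueue, download: (task: PdfTask) => Promise<void>): Promise<number> {
  let downloaded = 0;
  for (;;) {
    const task = queue.claim();
    if (task) {
      await download(task);
      downloaded++;
    } else if (queue.jsonFetchComplete && !queue.hasPending()) {
      return downloaded;
    } else {
      await sleep(1); // queue is empty for now; poll again
    }
  }
}

// Run one producer alongside `workers` consumers; resolves to the total download count.
async function run(pages: PdfTask[][], workers: number): Promise<number> {
  const queue = new TaskQueue();
  const fakeDownload = (_task: PdfTask) => sleep(2); // placeholder for a real PDF download
  const counts = await Promise.all([
    produce(queue, pages).then(() => 0),
    ...Array.from({ length: workers }, () => work(queue, fakeDownload)),
  ]);
  return counts.reduce((sum, n) => sum + n, 0);
}

const pages: PdfTask[][] = [
  [{ url: "https://example.com/a.pdf", page: 1 }, { url: "https://example.com/b.pdf", page: 1 }],
  [{ url: "https://example.com/c.pdf", page: 2 }],
];
run(pages, 2).then((total) => console.log(total)); // 3
```

A worker may find the queue empty while the producer is still fetching pages, so it polls instead of exiting; that is why the exit check consults the completion flag as well as the queue.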

<details>
<summary>View detailed flow diagram</summary>

```mermaid
flowchart TD
    A[Start CLI] --> B{Resume check}
    B -->|No queue or --fresh| C[Initialize queue DB + cache]
    B -->|Queue exists| D[Show resume prompt]
    D -->|Resume| E[Reset in-progress -> pending]
    D -->|Fresh| C
    D -->|Abort| Z[Exit]

    C --> F[Discover totals]
    E --> F
    F --> G[Start worker pool]
    G --> H[Init progress bars]

    subgraph Producer[Coordinator: JSON producer]
        H --> I[Fetch JSON pages]
        I --> J[Save JSON to cache]
        J --> K[Extract PDFs]
        K --> L[Insert tasks into queue DB]
        L --> M[Update JSON progress]
        M --> I
    end

    subgraph Queue[SQLite queue DB]
        L --> Q[(pdf_tasks + metadata)]
        Q --> N[Workers claim tasks]
    end

    subgraph Workers[Worker pool]
        N --> O[Download PDF]
        O --> P[Mark complete/failed]
        P --> N
        P --> R{json_fetch_complete?}
        R -->|No| N
        R -->|Yes & no pending| S[Worker exits]
    end

    I --> T[Set json_fetch_complete = true]
    T --> U[Wait for workers]

    subgraph Progress[Progress tracking]
        H --> V[Add JSON + PDF bars]
        V --> W[Poll queue progress every 1s]
        W --> X[Update PDF bar]
        M --> Y[Update JSON bar]
    end

    U --> AA[Show summary]
    AA --> AB{Cleanup cache?}
    AB -->|Yes| AC[Delete cache + queue DB]
    AB -->|No| AD[Keep cache for resume]
    AC --> AE[Done]
    AD --> AE[Done]
```

</details>
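
The resume path in this diagram depends on each queued task carrying a status. The sketch below models that lifecycle in memory. The status names follow the diagram (pending, in progress, complete, failed); the `ResumableQueue` class, its methods, and the idea of a single status field per task are assumptions. In the real SQLite-backed queue (`pdf_tasks`), `resetInProgress` would presumably be a single `UPDATE` on such a column.

```typescript
type TaskStatus = "pending" | "in_progress" | "complete" | "failed";

interface QueuedTask {
  id: number;
  url: string;
  status: TaskStatus;
}

// Hypothetical in-memory model of the queue's per-task status.
class ResumableQueue {
  private tasks: QueuedTask[] = [];

  add(url: string): QueuedTask {
    const task: QueuedTask = { id: this.tasks.length + 1, url, status: "pending" };
    this.tasks.push(task);
    return task;
  }

  // A worker claims the oldest pending task and marks it in progress.
  claim(): QueuedTask | undefined {
    const task = this.tasks.find((t) => t.status === "pending");
    if (task) task.status = "in_progress";
    return task;
  }

  // "Mark complete/failed" in the diagram.
  finish(id: number, ok: boolean): void {
    const task = this.tasks.find((t) => t.id === id);
    if (task) task.status = ok ? "complete" : "failed";
  }

  // "Reset in-progress -> pending": downloads interrupted mid-run are retried on resume.
  resetInProgress(): number {
    const interrupted = this.tasks.filter((t) => t.status === "in_progress");
    interrupted.forEach((t) => (t.status = "pending"));
    return interrupted.length;
  }

  count(status: TaskStatus): number {
    return this.tasks.filter((t) => t.status === status).length;
  }
}

// Simulate a run that stops after one of two claimed downloads finishes.
const queue = new ResumableQueue();
["a.pdf", "b.pdf", "c.pdf"].forEach((url) => queue.add(url));
const first = queue.claim();
queue.claim(); // b.pdf is still in progress when the run stops
if (first) queue.finish(first.id, true);

console.log(queue.resetInProgress()); // 1
console.log(queue.count("pending"));  // 2
```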

## File Organization

**JSON Metadata:** Search results are saved automatically as JSON, including document metadata, URLs, file sizes, and excerpts.

**PDF Files:** Prefixed with the page number by default (e.g., `5-filename.pdf`). Custom prefixes are supported in sequential mode. Duplicates are detected by filename AND file size.
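
The naming rule above can be sketched as follows. `prefixedFilename` and `filenameFromUrl` are hypothetical helpers, and taking the original name from the last URL path segment is an assumption about where names like `EFTA00000001.pdf` come from.

```typescript
// Hypothetical: take the original file name from the last path segment of a PDF URL.
function filenameFromUrl(url: string): string {
  const lastSegment = new URL(url).pathname.split("/").pop() ?? "";
  return decodeURIComponent(lastSegment);
}

// Hypothetical: "{prefix}-{original name}", where the prefix defaults to the page number.
function prefixedFilename(originalName: string, page: number, customPrefix?: string): string {
  const prefix = customPrefix ? customPrefix : String(page);
  return `${prefix}-${originalName}`;
}

const original = filenameFromUrl("https://example.com/files/EFTA00000001.pdf");
console.log(prefixedFilename(original, 5));            // 5-EFTA00000001.pdf
console.log(prefixedFilename(original, 5, "EPSTEIN")); // EPSTEIN-EFTA00000001.pdf
```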

<details>
<summary>View directory structures</summary>

### Parallel mode (default)

```
{download-directory}/
├── cache/
│   └── {search-term}/
│       ├── json/
│       │   ├── search-{term}-page-1-{timestamp}.json
│       │   ├── search-{term}-page-2-{timestamp}.json
│       │   └── ...
│       └── {search-term}.db
└── files/
    └── {search-term}/
        ├── {page}-EFTA00000001.pdf
        ├── {page}-EFTA00000002.pdf
        └── ...
```

### Sequential mode (`--sequential`)

```
{download-directory}/
└── {search-term}/
    ├── json/
    │   ├── search-{term}-page-1-{timestamp}.json
    │   ├── search-{term}-page-2-{timestamp}.json
    │   └── ...
    └── pdfs/
        ├── {prefix}-EFTA00000001.pdf
        ├── {prefix}-EFTA00000002.pdf
        └── ...
```

</details>
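
As a rough sketch, the two layouts map to paths like this. `layoutFor` and `jsonPageName` are hypothetical; the sketch assumes the search term is used verbatim as a folder name and that `{timestamp}` is a millisecond epoch value, neither of which the layout above confirms.

```typescript
import { join } from "node:path";

type Mode = "parallel" | "sequential";

interface Layout {
  jsonDir: string;
  pdfDir: string;
  queueDb?: string; // only parallel mode keeps a queue database
}

// Hypothetical: build the directories shown in the trees above.
// Assumption: the search term is used verbatim (the real tool may sanitize it).
function layoutFor(downloadDir: string, searchTerm: string, mode: Mode): Layout {
  if (mode === "parallel") {
    const cacheDir = join(downloadDir, "cache", searchTerm);
    return {
      jsonDir: join(cacheDir, "json"),
      pdfDir: join(downloadDir, "files", searchTerm),
      queueDb: join(cacheDir, `${searchTerm}.db`),
    };
  }
  const termDir = join(downloadDir, searchTerm);
  return { jsonDir: join(termDir, "json"), pdfDir: join(termDir, "pdfs") };
}

// JSON page files follow search-{term}-page-{n}-{timestamp}.json (timestamp format assumed).
function jsonPageName(searchTerm: string, page: number, timestamp: number): string {
  return `search-${searchTerm}-page-${page}-${timestamp}.json`;
}

const layout = layoutFor("downloads", "example", "parallel");
console.log(layout.pdfDir); // downloads/files/example (on POSIX systems)
```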

## Tech Stack

**Core:** TypeScript, Bun, Puppeteer and browserless for browser automation

<details>
<summary>View all dependencies</summary>

### Dependencies

| Package               | Version  | Purpose                                         |
| --------------------- | -------- | ----------------------------------------------- |
| `@inquirer/prompts`   | ^8.2.0   | Interactive CLI prompts and user input handling |
| `browserless`         | ^10.9.18 | Headless browser automation for web scraping    |
| `chalk`               | ^5.6.2   | Terminal string styling and colors              |
| `commander`           | ^14.0.3  | CLI argument parsing and command structure      |
| `figlet`              | ^1.10.0  | ASCII art text generation for headers           |
| `multi-progress-bars` | ^5.0.3   | Multiple concurrent progress bar display        |
| `puppeteer`           | ^24.36.1 | Browser automation and PDF downloads            |

### Development Dependencies

| Package         | Version | Purpose                                     |
| --------------- | ------- | ------------------------------------------- |
| `@types/bun`    | latest  | TypeScript type definitions for Bun runtime |
| `@types/figlet` | ^1.7.0  | TypeScript type definitions for Figlet      |

</details>

## Development

<details>
<summary>Scripts and project structure</summary>

### Scripts

| Script         | Command                            | Description              |
| -------------- | ---------------------------------- | ------------------------ |
| `dev`          | `bun --watch --hot index.ts`       | Run with hot reloading   |
| `start`        | `bun index.ts`                     | Run the application      |
| `build`        | `bun build index.ts --outdir dist` | Build for production     |
| `typecheck`    | `tsc --noEmit`                     | TypeScript type checking |
| `test:browser` | `bun src/browser-client.ts`        | Test browser client      |

### Project Structure

```
ef-dl/
├── index.ts                  # Main application entry point
├── src/
│   ├── browser-client.ts     # Web scraping and PDF download logic
│   ├── progress.ts           # Progress bar management
│   ├── types/
│   │   ├── enums.ts          # Shared enums (prompt types)
│   │   └── browserless.d.ts  # Browserless module typings
│   ├── utils/
│   │   ├── ascii.ts          # ASCII art header generation
│   │   ├── logger.ts         # Centralized logging utilities
│   │   └── prompt.ts         # Unified prompt handling
│   └── workers/
│       ├── coordinator.ts    # Producer logic
│       ├── task-queue.ts     # SQLite operations
│       ├── worker-pool.ts    # Worker management
│       ├── worker.ts         # Worker process
│       └── types.ts          # Worker types
├── downloads/                # Default download directory (created on first run)
├── package.json
├── tsconfig.json
└── README.md
```

</details>

## Important Notes

- **Age Requirement**: You must be 18+ to use this application
- **Educational Use**: For educational purposes only
- **Default Behavior**: Running without arguments starts interactive mode
- **Parallel by Default**: The worker pipeline is the default; use `--sequential` for single-process downloads
- **File Deduplication**: Files are matched by filename AND size to prevent duplicates

<details>
<summary>Troubleshooting</summary>

### "required option not specified" error

This error only occurs in non-interactive mode. Do one of the following:

- Run without arguments to use interactive mode: `bun index.ts`
- Provide all required flags: `bun index.ts -s "term" -d ./downloads`
- Use interactive mode explicitly: `bun index.ts -i`

### Download fails

- Check your internet connection
- Run with the `-v` (verbose) flag to see detailed error messages
- Ensure you have sufficient disk space

### Files not being detected as duplicates

The tool checks both filename AND file size. If a file with the same name exists but has a different size, it is downloaded again.
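
The duplicate rule can be expressed as a single check. This is a hypothetical sketch using `node:fs` calls (which Bun also supports); `isAlreadyDownloaded` is an illustrative name, and the sketch assumes the expected size comes from the file-size field saved in the JSON metadata.

```typescript
import { existsSync, mkdtempSync, statSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical: skip a download only when a file with the same name already exists
// AND its size matches the size recorded in the search metadata.
function isAlreadyDownloaded(targetPath: string, expectedBytes: number): boolean {
  if (!existsSync(targetPath)) return false;
  return statSync(targetPath).size === expectedBytes;
}

// Usage: a 4-byte file counts as downloaded only if the expected size is 4.
const dir = mkdtempSync(join(tmpdir(), "ef-dl-"));
const pdf = join(dir, "5-EFTA00000001.pdf");
writeFileSync(pdf, "%PDF");
console.log(isAlreadyDownloaded(pdf, 4));  // true (same name, same size: skip)
console.log(isAlreadyDownloaded(pdf, 10)); // false (size differs: download again)
```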

</details>

## Contributing

Contributions are welcome. Please feel free to submit a Pull Request.

## License

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.

---

**Disclaimer**: This is an independent educational tool and is not affiliated with or endorsed by the US Department of Justice. Use responsibly and in accordance with all applicable laws and terms of service.