ghostfetch 1.0.0__tar.gz

MIT License

Copyright (c) 2026 Arsalan

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

Metadata-Version: 2.4
Name: ghostfetch
Version: 1.0.0
Summary: A stealthy headless browser service for AI agents. Bypasses anti-bot protections to fetch content and convert to clean Markdown.
Author-email: Arsalan Shah <iarsalanshah@gmail.com>
License: MIT
Project-URL: Homepage, https://github.com/iArsalanshah/GhostFetch
Project-URL: Documentation, https://github.com/iArsalanshah/GhostFetch#readme
Project-URL: Repository, https://github.com/iArsalanshah/GhostFetch
Project-URL: Issues, https://github.com/iArsalanshah/GhostFetch/issues
Keywords: ai-agents,web-scraping,stealth,playwright,markdown,llm
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Internet :: WWW/HTTP :: Browsers
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: playwright>=1.40.0
Requires-Dist: fastapi>=0.100.0
Requires-Dist: uvicorn>=0.23.0
Requires-Dist: beautifulsoup4>=4.12.0
Requires-Dist: html2text>=2020.1.16
Requires-Dist: requests>=2.31.0
Requires-Dist: prometheus-client>=0.17.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.21.0; extra == "dev"
Requires-Dist: httpx>=0.24.0; extra == "dev"
Dynamic: license-file

# GhostFetch

A stealthy headless browser service for AI agents. Bypasses anti-bot protections to fetch content from sites like X.com and converts it to clean Markdown.

## Features
- **Zero Setup**: Install with pip, browsers auto-install on first run
- **Synchronous API**: Single request returns content directly (no polling needed)
- **Ghost Protocol**: Advanced proxy rotation and cohesive browser fingerprinting
- **Stealth Browsing**: Uses Playwright with custom flags and canvas noise injection
- **Markdown Output**: Automatically converts HTML to Markdown for easy LLM consumption
- **Metadata Extraction**: Automatically extracts title, author, publish date, and images
- **X.com Support**: Logic to wait for dynamic content on Twitter/X
- **Async Job Queue**: Process multiple requests concurrently with intelligent retry
- **Persistent Sessions**: Cookie/localStorage persistence per domain
- **Webhook Callbacks**: Get notified via HTTP when jobs complete
- **GitHub Integration**: Post results directly to GitHub issues
- **Dual Mode**: CLI tool or REST API service
- **Docker Ready**: Pre-configured Docker setup with docker-compose

## Quick Start

### For AI Agents (Simplest)

```bash
# Install from source
pip install -e .

# Fetch any URL (auto-installs browsers on first run)
ghostfetch "https://x.com/user/status/123"

# Or use the Python SDK
python -c "from ghostfetch import fetch; print(fetch('https://example.com')['markdown'])"
```

### For API Usage

```bash
# Start the server
ghostfetch serve

# Fetch synchronously (blocks until done)
curl "http://localhost:8000/fetch/sync?url=https://example.com"
```

## Installation

### Option 1: Docker Hub (Fastest)

```bash
# Pull and run
docker run -p 8000:8000 iarsalanshah/ghostfetch

# Or with docker-compose
docker-compose up
```

### Option 2: pip install

```bash
# From PyPI (when published)
pip install ghostfetch

# Or from source
git clone https://github.com/iArsalanshah/GhostFetch.git
cd GhostFetch
pip install -e .

# Browsers install automatically on first use, or run:
ghostfetch setup
```

### Option 3: Manual Setup

```bash
cd GhostFetch

# Create virtual environment (optional)
python3 -m venv venv
source venv/bin/activate

# Install packages & browser
pip install -r requirements.txt
playwright install chromium
```

## Usage

### 1. CLI Mode (Zero Setup)

Using the `ghostfetch` CLI (after pip install):

```bash
# Basic fetch
ghostfetch "https://x.com/user/status/123"

# JSON output (for parsing)
ghostfetch "https://example.com" --json

# Metadata only
ghostfetch "https://example.com" --metadata-only

# Quiet mode (no progress messages)
ghostfetch "https://example.com" --quiet
```

Using the legacy module directly:
```bash
python -m src.core.scraper "https://x.com/user/status/123"
```

Output:
```
--- Metadata ---
{
  "title": "...",
  "author": "...",
  "publish_date": "...",
  "images": [...]
}

--- Markdown ---
[converted markdown content]
```

### 2. API Mode (Service for Agents)

Start the server:
```bash
# Using CLI
ghostfetch serve

# Or directly
python main.py
```
The server will start at `http://localhost:8000`.

## API Endpoints

### Synchronous Fetch (Recommended for AI Agents)
- **POST** `/fetch/sync` — blocks until content is ready
- **GET** `/fetch/sync?url=...` — same, but via query parameter

**Example (POST):**
```bash
curl -X POST "http://localhost:8000/fetch/sync" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "timeout": 60}'
```

**Example (GET):**
```bash
curl "http://localhost:8000/fetch/sync?url=https://example.com"
```

**Response:**
```json
{
  "metadata": {
    "title": "Example Domain",
    "author": "",
    "publish_date": "",
    "images": []
  },
  "markdown": "# Example Domain\n\nThis domain is for use in illustrative examples..."
}
```

### Async Fetch (For Background Processing)
- **POST** `/fetch` (returns `202 Accepted`)
- **Body**:
```json
{
  "url": "https://example.com",
  "callback_url": "https://your-server.com/webhook",
  "github_issue": 123
}
```

**Example:**
```bash
curl -X POST "http://localhost:8000/fetch" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://x.com/user/status/123"}'
```

**Response:**
```json
{
  "job_id": "a1b2c3d4-e5f6-7890",
  "url": "https://x.com/user/status/123",
  "status": "queued"
}
```

### Check Job Status
- **GET** `/job/{job_id}`

**Example:**
```bash
curl "http://localhost:8000/job/a1b2c3d4-e5f6-7890"
```

**Response (Completed):**
```json
{
  "id": "a1b2c3d4-e5f6-7890",
  "url": "https://x.com/mrnacknack/status/2016134416897360212",
  "status": "completed",
  "result": {
    "metadata": {
      "title": "...",
      "author": "...",
      "publish_date": "...",
      "images": [...]
    },
    "markdown": "..."
  },
  "created_at": 1706000000,
  "started_at": 1706000001,
  "completed_at": 1706000010
}
```

### Health Check
- **GET** `/health`

**Response:**
```json
{
  "status": "ok",
  "browser_connected": true,
  "active_jobs_queue": 2,
  "active_browser_contexts": 1,
  "concurrency_limit": 2
}
```

## Integration Examples

### Python Agent with Job Polling
```python
import requests
import time

def fetch_content_async(url, timeout=120):
    # Submit job
    response = requests.post(
        "http://localhost:8000/fetch",
        json={"url": url}
    )
    response.raise_for_status()
    job_id = response.json()["job_id"]

    # Poll until completed (bounded, so a lost job cannot hang the agent)
    deadline = time.time() + timeout
    while time.time() < deadline:
        job = requests.get(f"http://localhost:8000/job/{job_id}").json()

        if job["status"] == "completed":
            return job["result"]["markdown"]
        if job["status"] == "failed":
            raise RuntimeError(f"Job failed: {job['error']}")

        time.sleep(1)  # Poll every second
    raise TimeoutError(f"Job {job_id} did not complete within {timeout}s")
```
303
+
304
+ ### Using Webhook Callbacks
305
+ ```python
306
+ import requests
307
+
308
+ # Your webhook endpoint receives:
309
+ # POST to callback_url with:
310
+ # {
311
+ # "job_id": "...",
312
+ # "url": "...",
313
+ # "status": "completed",
314
+ # "data": {"metadata": {...}, "markdown": "..."},
315
+ # "error": null,
316
+ # "error_details": null
317
+ # }
318
+
319
+ requests.post(
320
+ "http://localhost:8000/fetch",
321
+ json={
322
+ "url": "https://example.com",
323
+ "callback_url": "https://your-server.com/webhooks/ghostfetch"
324
+ }
325
+ )
326
+ ```
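
On the receiving side, a handler only needs to branch on the `status` field of the payload shape shown above. A minimal sketch (`handle_callback` is an illustrative name, not part of GhostFetch):

```python
def handle_callback(payload):
    """Process a GhostFetch callback body; return the markdown on success."""
    if payload.get("status") == "completed":
        return payload["data"]["markdown"]
    # Failed jobs carry `error` (and optionally `error_details`)
    raise RuntimeError(payload.get("error") or "job failed with no error message")
```

Wire this into whatever web framework serves your `callback_url`; the body arrives as ordinary JSON.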

### GitHub Integration
When you include a `github_issue` parameter, GhostFetch will post results as comments:

```python
import requests

requests.post(
    "http://localhost:8000/fetch",
    json={
        "url": "https://example.com",
        "github_issue": 42  # Post result as comment on issue #42
    }
)
```

**Requires:**
- GitHub CLI (`gh` command) installed
- `GITHUB_TOKEN` environment variable set
- `GITHUB_REPO` configured

## Configuration

GhostFetch is configured via environment variables (see `src/utils/config.py`) or the `proxies.txt` file.

- **Proxies**: Add one proxy per line to `proxies.txt` in the format `http://user:pass@host:port`.
- **Strategy**: Set `PROXY_STRATEGY` to `round_robin` or `random`.
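
The two strategies behave as the names suggest; a small sketch of the rotation logic (illustrative only — the real implementation lives in the package, and `load_proxies`/`make_rotator` are made-up names):

```python
import itertools
import random

def load_proxies(lines):
    """Parse proxies.txt-style lines: one http://user:pass@host:port per line."""
    return [ln.strip() for ln in lines if ln.strip() and not ln.strip().startswith("#")]

def make_rotator(proxies, strategy="round_robin"):
    """Return a callable that yields the next proxy under the chosen strategy."""
    if strategy == "round_robin":
        cycle = itertools.cycle(proxies)   # a, b, c, a, b, c, ...
        return lambda: next(cycle)
    return lambda: random.choice(proxies)  # uniform random pick
```

Round-robin spreads load evenly; random makes request ordering harder to fingerprint.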

### Environment Variables

```bash
# API Server
GHOSTFETCH_HOST=0.0.0.0
GHOSTFETCH_PORT=8000

# Scraper Settings
MAX_CONCURRENT_BROWSERS=2    # Number of concurrent browser contexts
MIN_DOMAIN_DELAY=10          # Minimum seconds between requests to same domain
MAX_REQUESTS_PER_BROWSER=50  # Restart browser after N requests
MAX_RETRIES=3                # Retry attempts for failed requests

# Sync Endpoint Settings
SYNC_TIMEOUT_DEFAULT=120     # Default timeout for /fetch/sync (seconds)
MAX_SYNC_TIMEOUT=300         # Maximum allowed timeout (5 minutes)

# GitHub Integration
GITHUB_REPO=iArsalanshah/GhostFetch  # Owner/repo for issue comments

# Persistence
DATABASE_URL=sqlite:///./storage/jobs.db
STORAGE_DIR=storage

# Job Lifecycle
JOB_TTL_SECONDS=86400        # Delete completed jobs after 24 hours
```
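
These are plain environment variables, so overriding any of them is just reading a value with a fallback. A sketch of the pattern (the helper name is hypothetical; the real parsing lives in `src/utils/config.py`):

```python
import os

def int_setting(name, default):
    """Read an integer setting from the environment, falling back to a default."""
    raw = os.environ.get(name, "").strip()
    return int(raw) if raw else default

# Examples mirroring the variables above
MAX_CONCURRENT_BROWSERS = int_setting("MAX_CONCURRENT_BROWSERS", 2)
MIN_DOMAIN_DELAY = int_setting("MIN_DOMAIN_DELAY", 10)
```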

### Docker Environment
Create a `.env` file for docker-compose:

```bash
MAX_CONCURRENT_BROWSERS=2
MIN_DOMAIN_DELAY=10
GITHUB_REPO=your-org/your-repo
JOB_TTL_SECONDS=86400
```

Then run:
```bash
docker-compose --env-file .env up
```

## Specific Handling
- **X/Twitter**: The scraper waits for `[data-testid="tweetText"]` to ensure the tweet content is loaded before capturing.
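
In Playwright terms this comes down to picking a readiness selector per domain before capturing. A hedged sketch of the idea (the helper is illustrative, not the package's actual code):

```python
# Selector the scraper waits on for X/Twitter pages (from the note above)
X_TWEET_SELECTOR = '[data-testid="tweetText"]'

def readiness_selector(url):
    """Pick the element to wait for before capturing a page."""
    host = url.split("/")[2] if "://" in url else url
    if host in ("x.com", "twitter.com") or host.endswith((".x.com", ".twitter.com")):
        return X_TWEET_SELECTOR
    return "body"  # generic pages: wait for the document body
```

With a Playwright `page` in hand, the wait itself would be `page.wait_for_selector(readiness_selector(url))`.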

## ⚠️ Important: Rate Limiting & Ethics

This tool bypasses anti-bot protections. **Use responsibly:**

- **Respect robots.txt**: Check site policies before scraping
- **Implement delays**: Use `MIN_DOMAIN_DELAY` (default: 10 seconds) to avoid overloading servers
- **Throttle requests**: Reduce `MAX_CONCURRENT_BROWSERS` for high-volume scraping
- **Terms of Service**: Ensure your use complies with the target site's ToS
- **Authentication**: When possible, use authorized access instead of bypassing protections
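
The delay guidance above amounts to simple per-host bookkeeping. A sketch of the idea (class name and details are illustrative, not GhostFetch's internals):

```python
import time
from urllib.parse import urlparse

class DomainThrottle:
    """Track the last request per host and report how long to wait before the next."""

    def __init__(self, min_delay=10.0, clock=time.monotonic):
        self.min_delay = min_delay
        self.clock = clock  # injectable for testing
        self._last = {}

    def reserve(self, url):
        """Record a request to url's host; return seconds the caller should sleep first."""
        host = urlparse(url).netloc
        now = self.clock()
        prev = self._last.get(host)
        self._last[host] = now
        if prev is None:
            return 0.0
        return max(0.0, self.min_delay - (now - prev))
```

The caller sleeps for the returned duration, so two hosts never block each other while the same host is paced.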

### Recommended Settings for Production
```bash
# Conservative (respectful scraping)
MIN_DOMAIN_DELAY=30
MAX_CONCURRENT_BROWSERS=1

# Moderate
MIN_DOMAIN_DELAY=15
MAX_CONCURRENT_BROWSERS=2

# Aggressive (only for your own content)
MIN_DOMAIN_DELAY=5
MAX_CONCURRENT_BROWSERS=4
```
450
+
451
+ ## Production Deployment Guide
452
+
453
+ ### 1. Proxy Support (Recommended for High-Volume)
454
+
455
+ For serious stealth, rotate through residential proxies:
456
+
457
+ ```python
458
+ # Configure proxies.txt with your proxy list
459
+ # GhostFetch will automatically rotate and track health.
460
+ ```
461
+
462
+ **Recommended proxy providers:**
463
+ - BrightData (datacenter/residential)
464
+ - ScrapingBee (cloud-based)
465
+ - Oxylabs (residential networks)
466
+ - Local proxy rotation with tools like `scrapy-proxy-pool`

### 2. Caching Layer (Reduce Redundant Requests)

For repeated fetches, implement Redis caching:

```python
import json

import redis

cache = redis.Redis(host='localhost', port=6379)

async def fetch_with_cache(url, ttl=3600):
    cached = cache.get(url)
    if cached:
        return json.loads(cached)

    # `scraper` is your running GhostFetch scraper instance
    result = await scraper.fetch(url)
    cache.setex(url, ttl, json.dumps(result))
    return result
```

**Docker Compose with Redis:**
```yaml
services:
  ghostfetch:
    build: .
    ports:
      - "8000:8000"
  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
```

### 3. Security & Authentication

Add API key authentication before exposing the service publicly:

```python
import os

from fastapi import Header, HTTPException

VALID_API_KEYS = set(os.getenv("API_KEYS", "").split(","))

@app.post("/fetch")
async def fetch_endpoint(request: FetchRequest, x_api_key: str = Header(None)):
    if not x_api_key or x_api_key not in VALID_API_KEYS:
        raise HTTPException(status_code=401, detail="Invalid API key")
    # ... rest of endpoint
```

Usage:
```bash
curl -X POST "http://localhost:8000/fetch" \
  -H "x-api-key: your-secret-key" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}'
```

### 4. Monitoring & Observability

**Log rotation** (automatically configured):
- Logs stored in `storage/scraper.log`
- Max 5MB per file, keeps 5 backups
- Check for errors: `tail -f storage/scraper.log | grep ERROR`

**Database queries for analytics:**
```bash
sqlite3 storage/jobs.db "SELECT status, COUNT(*) FROM jobs GROUP BY status;"
```

**Health check monitoring:**
```bash
while true; do
  curl http://localhost:8000/health | jq .
  sleep 30
done
```

### 5. Model Context Protocol (MCP)

GhostFetch includes an MCP server for integration with Claude Desktop and other MCP-aware agents.

Configuration (`claude_desktop_config.json`):

```json
{
  "mcpServers": {
    "ghostfetch": {
      "command": "python",
      "args": ["-m", "ghostfetch.mcp_server"],
      "env": {
        "SYNC_TIMEOUT_DEFAULT": "120"
      }
    }
  }
}
```

This exposes a `ghostfetch` tool to the agent:
- `url`: The URL to fetch
- `context_id`: Optional session ID
- `timeout`: Optional timeout (seconds)

## Performance & Monitoring

### Logging
Logs are written to `storage/scraper.log` with rotation (5MB max):
- Stream output to console (INFO level)
- File output with detailed format

### Load Testing
Run included load tests:
```bash
# Python async load test
python scripts/load_test.py
```

### Database
Job history is stored in `storage/jobs.db` (SQLite):
- Persistent across restarts
- Automatic cleanup of old jobs (configurable TTL)
- Query jobs directly for analytics/debugging
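
The TTL cleanup reduces to a single `DELETE` against that table. A self-contained sketch (the column names here are assumptions — inspect `storage/jobs.db` for the real schema):

```python
import sqlite3
import time

def purge_old_jobs(conn, ttl_seconds=86400, now=None):
    """Delete completed jobs whose completed_at is older than the TTL (JOB_TTL_SECONDS)."""
    now = time.time() if now is None else now
    cur = conn.execute(
        "DELETE FROM jobs WHERE status = 'completed' AND completed_at < ?",
        (now - ttl_seconds,),
    )
    conn.commit()
    return cur.rowcount  # number of jobs removed
```

Running jobs are untouched because the `status` filter only matches completed ones.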

## Troubleshooting

**Playwright Error: Executable doesn't exist**
If you see an error about the browser executable not being found, run:
```bash
playwright install chromium
```

**Timeout Errors**
If fetching times out, it might be due to a slow network or heavy anti-bot protections. You can try:
- Increasing the timeout in `src/core/scraper.py` (default is 60000ms)
- Increasing `MIN_DOMAIN_DELAY` to avoid rate-limiting

**Job Stuck in "Processing"**
Check logs in `storage/scraper.log` for errors. If stuck, restart the service.

**GitHub Comments Not Posting**
Ensure:
- `gh` CLI is installed: `brew install gh` (macOS) or `apt install gh` (Linux)
- You're authenticated: `gh auth login`
- `GITHUB_REPO` is set correctly
- `GITHUB_TOKEN` is in your environment

**High Memory Usage**
Reduce `MAX_CONCURRENT_BROWSERS` or `MAX_REQUESTS_PER_BROWSER` in configuration.

## Publishing Setup

### Docker Hub

To enable automated Docker image publishing:

1. Create a Docker Hub account and repository (`your-username/ghostfetch`)
2. Generate an access token at https://hub.docker.com/settings/security
3. Add these secrets to your GitHub repository:
   - `DOCKERHUB_USERNAME`: Your Docker Hub username
   - `DOCKERHUB_TOKEN`: Your access token

Images will be published automatically on pushes to `main` and version tags.

### PyPI (Trusted Publishing)

To enable automated PyPI publishing:

1. Go to https://pypi.org/manage/account/publishing/
2. Add a new pending publisher:
   - **PyPI Project Name**: `ghostfetch`
   - **Owner**: `iArsalanshah`
   - **Repository**: `GhostFetch`
   - **Workflow name**: `pypi-publish.yml`
   - **Environment**: `pypi`
3. Create a GitHub Release to trigger publishing

No API tokens are needed; publishing uses OIDC trusted publishing.

## Legal Disclaimer

GhostFetch is provided for educational and research purposes only. Users are solely responsible for ensuring their use complies with:
1. The Terms of Service of target websites
2. Applicable laws regarding data access and automation (including the CFAA in the US)
3. The robots.txt and scraping policies of target domains

This tool should not be used to:
- Scrape private or authenticated content without authorization
- Circumvent security measures on sites where such circumvention violates applicable law
- Violate the Terms of Service of social media platforms (including X/Twitter)

The authors assume no liability for misuse of this software.

## License

This project is licensed under the MIT License; see the [LICENSE](LICENSE) file for details.