captcha-bypass 0.1.0__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- captcha_bypass-0.1.0/.claude/agents/senior-python-scraping-dev.md +54 -0
- captcha_bypass-0.1.0/.claude/settings.local.json +11 -0
- captcha_bypass-0.1.0/.gitignore +207 -0
- captcha_bypass-0.1.0/CLAUDE.md +104 -0
- captcha_bypass-0.1.0/Dockerfile +22 -0
- captcha_bypass-0.1.0/LICENSE +14 -0
- captcha_bypass-0.1.0/PKG-INFO +430 -0
- captcha_bypass-0.1.0/README.md +390 -0
- captcha_bypass-0.1.0/TASKS.md +107 -0
- captcha_bypass-0.1.0/captcha_bypass/__init__.py +3 -0
- captcha_bypass-0.1.0/captcha_bypass/__main__.py +6 -0
- captcha_bypass-0.1.0/captcha_bypass/cli.py +143 -0
- captcha_bypass-0.1.0/captcha_bypass/server.py +284 -0
- captcha_bypass-0.1.0/captcha_bypass/solver.py +592 -0
- captcha_bypass-0.1.0/captcha_bypass/storage.py +273 -0
- captcha_bypass-0.1.0/docker-compose.yml +20 -0
- captcha_bypass-0.1.0/example/async_solve.py +262 -0
- captcha_bypass-0.1.0/example/results/.gitignore +2 -0
- captcha_bypass-0.1.0/pyproject.toml +40 -0
@@ -0,0 +1,54 @@
+---
+name: senior-python-scraping-dev
+description: Use this agent when the user needs to write, review, or fix Python code related to web scraping, anti-bot bypass, browser automation, async programming, or high-load systems. This includes code reviews after implementing features, debugging stealth/anti-detection issues, writing production-ready scraping code, fixing bugs in HTTP/cookie handling, or reviewing task queue implementations.\n\nExamples:\n\n1. Code Review After Implementation:\n user: "Please implement a function to extract cookies from Camoufox browser session"\n assistant: "Here is the implementation:"\n <function implementation>\n assistant: "Now let me use the senior-python-scraping-dev agent to review the code for production-readiness and anti-detection best practices"\n\n2. Writing New Code:\n user: "I need a retry mechanism for handling Cloudflare challenges"\n assistant: "I'll use the senior-python-scraping-dev agent to implement this with proper async patterns and reliability considerations"\n\n3. Bug Fixing:\n user: "The cookie extraction is failing intermittently"\n assistant: "I'll use the senior-python-scraping-dev agent to diagnose and fix this bug, checking for race conditions and proper async handling"\n\n4. Task Review:\n user: "Can you review the solve endpoint implementation I just finished?"\n assistant: "I'll use the senior-python-scraping-dev agent to review the implementation for performance, reliability, and adherence to anti-detection patterns"
+model: opus
+color: red
+---
+
+You are a senior Python developer with deep expertise in web scraping, anti-bot bypass systems, and high-load architectures. Your experience spans:
+
+**Core Domains:**
+- Web scraping and parsing with anti-bot bypass, browser automation, and stealth techniques
+- High-load systems including async programming, task queues, and horizontal scaling
+- Network protocols (HTTP/HTTPS, cookies, headers manipulation)
+- Anti-detection tools (Camoufox, Playwright, Puppeteer patterns)
+
+**Your Approach:**
+You write pragmatic, production-ready code. You prioritize reliability and performance over premature abstractions. You understand that in scraping systems, edge cases are the norm, not the exception.
+
+**When Writing Code:**
+1. Always consider anti-detection implications - fingerprinting, timing patterns, request sequences
+2. Use async/await properly - avoid blocking calls, handle cancellation gracefully
+3. Implement proper error handling with retry logic and exponential backoff
+4. Keep code simple and maintainable - avoid over-engineering
+5. Consider resource management - browser instances, connections, memory
+6. Add type hints for clarity and IDE support
+7. Follow Python 3.14 idioms and best practices
+
+**When Reviewing Code:**
+1. Check for anti-detection issues - suspicious patterns, missing stealth measures
+2. Verify async correctness - race conditions, proper awaiting, resource cleanup
+3. Assess error handling completeness - what happens when things fail?
+4. Evaluate performance implications - connection pooling, caching, batching
+5. Look for security issues - credential handling, injection vulnerabilities
+6. Ensure code follows project conventions from CLAUDE.md
+7. Be specific about issues and provide concrete fixes
+
+**When Fixing Bugs:**
+1. Reproduce the issue first - understand the failure mode
+2. Check for intermittent/timing-related causes in async code
+3. Verify network-related assumptions - timeouts, retries, connection states
+4. Consider browser state and lifecycle issues with Camoufox
+5. Test the fix under realistic conditions
+
+**Quality Standards:**
+- Code must be production-ready, not prototype quality
+- Prefer explicit over implicit behavior
+- Handle edge cases that are common in scraping (timeouts, partial responses, rate limits)
+- Include meaningful logging for debugging in production
+- Write code that fails gracefully and provides useful error messages
+
+**Project Context:**
+You are working on a self-hosted captcha bypass service with HTTP API for circumventing Cloudflare/Amazon challenges. The stack is Python 3.14 with Camoufox (stealth Firefox) and Docker. The API has /health, /solve, and /result/{id} endpoints.
+
+Always verify your assumptions by checking the actual code. Never guess about implementation details - read the source. If requirements are unclear, ask for clarification rather than making assumptions.
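Reviewer note: the agent prompt in this hunk asks for "retry logic and exponential backoff" and for cancellation to be handled gracefully, but the diff shown here does not include code illustrating either. The following is a minimal, generic sketch of that pattern, not code taken from the package. The name `retry_with_backoff` and all of its parameters are hypothetical and chosen only for illustration.

```python
import asyncio
import random
from collections.abc import Awaitable, Callable
from typing import TypeVar

T = TypeVar("T")


async def retry_with_backoff(
    operation: Callable[[], Awaitable[T]],
    *,
    attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    retry_on: tuple[type[Exception], ...] = (TimeoutError, ConnectionError),
) -> T:
    """Await operation(), retrying listed exceptions with capped exponential backoff.

    Hypothetical helper for illustration; it is not part of captcha_bypass.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return await operation()
        except retry_on:
            # asyncio.CancelledError derives from BaseException, so it is not
            # caught here and cancellation propagates immediately.
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error unchanged
            # The delay ceiling doubles each attempt, capped at max_delay.
            ceiling = min(max_delay, base_delay * 2**attempt)
            # "Full jitter": sleep a random amount up to the ceiling so that
            # concurrent callers do not retry in lockstep.
            await asyncio.sleep(random.uniform(0.0, ceiling))
    raise AssertionError("unreachable")
```

Only exceptions listed in `retry_on` are retried, so programming errors such as `TypeError` fail fast instead of being masked by the retry loop.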
@@ -0,0 +1,207 @@
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[codz]
+*$py.class
+
+# C extensions
+*.so
+
+# Distribution / packaging
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+share/python-wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+MANIFEST
+
+# PyInstaller
+# Usually these files are written by a python script from a template
+# before PyInstaller builds the exe, so as to inject date/other infos into it.
+*.manifest
+*.spec
+
+# Installer logs
+pip-log.txt
+pip-delete-this-directory.txt
+
+# Unit test / coverage reports
+htmlcov/
+.tox/
+.nox/
+.coverage
+.coverage.*
+.cache
+nosetests.xml
+coverage.xml
+*.cover
+*.py.cover
+.hypothesis/
+.pytest_cache/
+cover/
+
+# Translations
+*.mo
+*.pot
+
+# Django stuff:
+*.log
+local_settings.py
+db.sqlite3
+db.sqlite3-journal
+
+# Flask stuff:
+instance/
+.webassets-cache
+
+# Scrapy stuff:
+.scrapy
+
+# Sphinx documentation
+docs/_build/
+
+# PyBuilder
+.pybuilder/
+target/
+
+# Jupyter Notebook
+.ipynb_checkpoints
+
+# IPython
+profile_default/
+ipython_config.py
+
+# pyenv
+# For a library or package, you might want to ignore these files since the code is
+# intended to run in multiple environments; otherwise, check them in:
+# .python-version
+
+# pipenv
+# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
+# However, in case of collaboration, if having platform-specific dependencies or dependencies
+# having no cross-platform support, pipenv may install dependencies that don't work, or not
+# install all needed dependencies.
+#Pipfile.lock
+
+# UV
+# Similar to Pipfile.lock, it is generally recommended to include uv.lock in version control.
+# This is especially recommended for binary packages to ensure reproducibility, and is more
+# commonly ignored for libraries.
+#uv.lock
+
+# poetry
+# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
+# This is especially recommended for binary packages to ensure reproducibility, and is more
+# commonly ignored for libraries.
+# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
+#poetry.lock
+#poetry.toml
+
+# pdm
+# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
+# pdm recommends including project-wide configuration in pdm.toml, but excluding .pdm-python.
+# https://pdm-project.org/en/latest/usage/project/#working-with-version-control
+#pdm.lock
+#pdm.toml
+.pdm-python
+.pdm-build/
+
+# pixi
+# Similar to Pipfile.lock, it is generally recommended to include pixi.lock in version control.
+#pixi.lock
+# Pixi creates a virtual environment in the .pixi directory, just like venv module creates one
+# in the .venv directory. It is recommended not to include this directory in version control.
+.pixi
+
+# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
+__pypackages__/
+
+# Celery stuff
+celerybeat-schedule
+celerybeat.pid
+
+# SageMath parsed files
+*.sage.py
+
+# Environments
+.env
+.envrc
+.venv
+env/
+venv/
+ENV/
+env.bak/
+venv.bak/
+
+# Spyder project settings
+.spyderproject
+.spyproject
+
+# Rope project settings
+.ropeproject
+
+# mkdocs documentation
+/site
+
+# mypy
+.mypy_cache/
+.dmypy.json
+dmypy.json
+
+# Pyre type checker
+.pyre/
+
+# pytype static type analyzer
+.pytype/
+
+# Cython debug symbols
+cython_debug/
+
+# PyCharm
+# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
+# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
+# and can be added to the global gitignore or merged into this file. For a more nuclear
+# option (not recommended) you can uncomment the following to ignore the entire idea folder.
+.idea/
+
+# Abstra
+# Abstra is an AI-powered process automation framework.
+# Ignore directories containing user credentials, local state, and settings.
+# Learn more at https://abstra.io/docs
+.abstra/
+
+# Visual Studio Code
+# Visual Studio Code specific template is maintained in a separate VisualStudioCode.gitignore
+# that can be found at https://github.com/github/gitignore/blob/main/Global/VisualStudioCode.gitignore
+# and can be added to the global gitignore or merged into this file. However, if you prefer,
+# you could uncomment the following to ignore the entire vscode folder
+# .vscode/
+
+# Ruff stuff:
+.ruff_cache/
+
+# PyPI configuration file
+.pypirc
+
+# Cursor
+# Cursor is an AI-powered code editor. `.cursorignore` specifies files/directories to
+# exclude from AI features like autocomplete and code analysis. Recommended for sensitive data
+# refer to https://docs.cursor.com/context/ignore-files
+.cursorignore
+.cursorindexingignore
+
+# Marimo
+marimo/_static/
+marimo/_lsp/
+__marimo__/
@@ -0,0 +1,104 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code when working with code in this repository.
+
+## ROLE
+
+You are a senior Python developer with extensive experience in:
+- Web scraping and parsing (anti-bot bypass, browser automation, stealth techniques)
+- High-load systems (async programming, task queues, horizontal scaling)
+- Network protocols (HTTP/HTTPS, cookies, headers manipulation)
+- Anti-detection tools (Camoufox, Playwright, Puppeteer patterns)
+
+Approach: pragmatic, production-ready code. Prioritize reliability and performance over premature abstractions.
+
+## PROJECT
+
+Self-hosted captcha bypass service with HTTP API for circumventing Cloudflare/Amazon challenges.
+
+### Purpose
+
+Parsing websites often requires bypassing anti-bot protection. This service:
+- Solves captcha challenges using Camoufox (stealth Firefox)
+- Returns headers + cookies for subsequent API/HTML requests
+- Runs as a standalone microservice accessible via HTTP
+
+### Current Limitations
+
+- Only GET requests are supported (POST/PUT with body and custom headers planned for future releases)
+
+### Tech Stack
+
+- Python 3.x
+- Camoufox (anti-detect browser)
+- HTTP API server
+- Docker
+
+### API Endpoints
+
+1. `GET /health` — service status and metrics
+2. `POST /solve` — queue captcha bypass task, returns task ID
+3. `GET /result/{task_id}` — get task status/result by ID
+4. `DELETE /task/{task_id}` — cancel running task or delete completed result
+
+### Installation Options
+
+1. **Docker Compose** — `docker-compose up -d` (supports env vars: WORKERS, PORT, RESULT_TTL, MAX_QUEUE_SIZE)
+2. **pip** — `pip install .` then run `captcha-bypass` command
+
+### Response Data
+
+Successful bypass returns:
+- `cookies` — array of cookie objects from browser context
+- `request_headers` — browser request headers for reuse in subsequent requests
+- `response_headers` — response headers from navigation
+- `status_code` — HTTP status code
+- `html` — page HTML content
+- `url` — final URL after redirects
+- `timeout_reached` — whether task waited full timeout
+- `validation` — match info (matched, match_type, matched_condition)
+
+## BASIC
+
+Basic Claude Setup - foundational configuration and protocols.
+
+### Memory Files Management
+
+- When asked to add information to memory files - ALWAYS read the file first and search for existing information
+- If found - update it, DO NOT duplicate. If not found - add to the specified location
+- Report what was done: "Updated X" or "Added X"
+- Information in memory files must always be written in English
+
+### Modular CLAUDE.md Files
+
+- MANDATORY: before working with any module, check for CLAUDE.md in its directory
+- Use LS or Glob tools to search for local CLAUDE.md files
+- Local CLAUDE.md supplements and refines the main file for its module
+- Usage examples:
+* Working with tests → first read /tests/CLAUDE.md
+* Working with a specific service → look for CLAUDE.md in its folder
+- Create modular CLAUDE.md only for complex modules with special rules
+- Priority is determined according to "Specification Conflict Handling" section rules
+- Ignoring modular instructions often leads to errors - always check for their presence
+
+### User Communication
+
+- Internal thinking (reasoning) must be in English
+- Respond to the user in their language
+- Goal: context token economy (non-English languages have higher token consumption)
+
+### Critical Thinking
+
+- Think critically, question ambiguous information
+- Never assume: if information is unclear or missing - always ask, do not guess
+- Verify facts: do not rely on memory about the project - always check in code/documentation
+- Clarify uncertainty: e.g., which version to use if there is a choice between two options
+- Do not hallucinate: do not invent information that does not exist
+
+## TECHNICAL
+
+Technical context - project stack, structure, and deployment instructions.
+
+## TESTING
+
+Testing rules - test environment setup, TDD practices, test isolation, and coverage requirements.
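Reviewer note: the "Response Data" list in this hunk names the fields of a successful result but gives no types. The sketch below writes that shape down as a `TypedDict`, which may help when reading the rest of the diff. Only the field names come from CLAUDE.md. Every type annotation, the class names `SolveResult` and `Validation`, and the helper `missing_fields` are assumptions made for illustration and may not match what `server.py` or `solver.py` actually produce.

```python
from typing import Any, Optional, TypedDict


class Validation(TypedDict):
    # Key names come from CLAUDE.md; the value types are assumed.
    matched: bool
    match_type: Optional[str]
    matched_condition: Optional[str]


class SolveResult(TypedDict):
    # Field names come from the "Response Data" list; the types are assumed.
    cookies: list[dict[str, Any]]
    request_headers: dict[str, str]
    response_headers: dict[str, str]
    status_code: int
    html: str
    url: str
    timeout_reached: bool
    validation: Validation


def missing_fields(payload: dict[str, Any]) -> list[str]:
    """Return the documented top-level field names absent from payload, sorted."""
    return sorted(SolveResult.__required_keys__ - set(payload))
```

A check like `missing_fields` only compares top-level key names. It does not validate value types or the nested `validation` object.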
@@ -0,0 +1,22 @@
+FROM python:3.12-slim
+
+# Install system dependencies for Camoufox
+RUN apt-get update && apt-get install -y --no-install-recommends \
+libgtk-3-0 \
+libx11-xcb1 \
+libasound2 \
+&& rm -rf /var/lib/apt/lists/*
+
+WORKDIR /app
+
+# Copy project files
+COPY pyproject.toml README.md LICENSE ./
+COPY captcha_bypass/ ./captcha_bypass/
+
+# Install package
+RUN pip install --no-cache-dir .
+
+# Fetch Camoufox browser
+RUN python -m camoufox fetch
+
+CMD ["captcha-bypass"]
@@ -0,0 +1,14 @@
+Captcha-Bypass License v1.0
+
+Copyright (c) 2025 Maksym Panchenko
+
+Permission is hereby granted to any individual or organization to use, copy, modify, and distribute this software and its documentation, provided that:
+
+1. The software is used solely for academic research, educational purposes, or penetration testing with written authorization from the system owner.
+2. Any use for commercial purposes, malicious activity, or actions in violation of applicable laws and regulations is strictly prohibited.
+3. The authors and contributors shall not be held liable for any misuse, damage, or legal consequences arising from the use of this software.
+
+By using this software, you agree to comply with this license.
+If you do not agree, you are not permitted to use the software.
+
+This license does not grant any trademark rights, and it does not constitute an Open Source license as defined by the Open Source Initiative (OSI).