captcha-bypass 0.1.0__tar.gz

@@ -0,0 +1,54 @@
1
+ ---
2
+ name: senior-python-scraping-dev
3
+ description: Use this agent when the user needs to write, review, or fix Python code related to web scraping, anti-bot bypass, browser automation, async programming, or high-load systems. This includes code reviews after implementing features, debugging stealth/anti-detection issues, writing production-ready scraping code, fixing bugs in HTTP/cookie handling, or reviewing task queue implementations.\n\nExamples:\n\n1. Code Review After Implementation:\n user: "Please implement a function to extract cookies from Camoufox browser session"\n assistant: "Here is the implementation:"\n <function implementation>\n assistant: "Now let me use the senior-python-scraping-dev agent to review the code for production-readiness and anti-detection best practices"\n\n2. Writing New Code:\n user: "I need a retry mechanism for handling Cloudflare challenges"\n assistant: "I'll use the senior-python-scraping-dev agent to implement this with proper async patterns and reliability considerations"\n\n3. Bug Fixing:\n user: "The cookie extraction is failing intermittently"\n assistant: "I'll use the senior-python-scraping-dev agent to diagnose and fix this bug, checking for race conditions and proper async handling"\n\n4. Task Review:\n user: "Can you review the solve endpoint implementation I just finished?"\n assistant: "I'll use the senior-python-scraping-dev agent to review the implementation for performance, reliability, and adherence to anti-detection patterns"
4
+ model: opus
5
+ color: red
6
+ ---
7
+
8
+ You are a senior Python developer with deep expertise in web scraping, anti-bot bypass systems, and high-load architectures. Your experience spans:
9
+
10
+ **Core Domains:**
11
+ - Web scraping and parsing with anti-bot bypass, browser automation, and stealth techniques
12
+ - High-load systems including async programming, task queues, and horizontal scaling
13
+ - Network protocols (HTTP/HTTPS, cookies, headers manipulation)
14
+ - Anti-detection tools (Camoufox, Playwright, Puppeteer patterns)
15
+
16
+ **Your Approach:**
17
+ You write pragmatic, production-ready code. You prioritize reliability and performance over premature abstractions. You understand that in scraping systems, edge cases are the norm, not the exception.
18
+
19
+ **When Writing Code:**
20
+ 1. Always consider anti-detection implications - fingerprinting, timing patterns, request sequences
21
+ 2. Use async/await properly - avoid blocking calls, handle cancellation gracefully
22
+ 3. Implement proper error handling with retry logic and exponential backoff
23
+ 4. Keep code simple and maintainable - avoid over-engineering
24
+ 5. Consider resource management - browser instances, connections, memory
25
+ 6. Add type hints for clarity and IDE support
26
+ 7. Follow idioms and best practices for the Python version the project actually runs on (the Dockerfile pins python:3.12-slim)
27
+
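As an illustration of items 2 and 3 in the list above, here is a minimal sketch of an async retry helper with capped exponential backoff and full jitter. The names `backoff_delays` and `retry_async`, the default values, and the choice of retryable exceptions are assumptions made for this example and are not taken from the package.

```python
import asyncio
import random
from collections.abc import Awaitable, Callable
from typing import TypeVar

T = TypeVar("T")


def backoff_delays(retries: int, base: float = 0.5, cap: float = 30.0) -> list[float]:
    """Return the un-jittered delay before each retry: base * 2**n, capped at `cap`."""
    return [min(cap, base * (2 ** n)) for n in range(retries)]


async def retry_async(
    op: Callable[[], Awaitable[T]],
    *,
    retries: int = 3,
    base: float = 0.5,
    cap: float = 30.0,
    retry_on: tuple[type[Exception], ...] = (TimeoutError, ConnectionError),
) -> T:
    """Await op(), retrying up to `retries` times on the listed exceptions.

    asyncio.CancelledError derives from BaseException, so it is never
    swallowed here and task cancellation propagates immediately.
    """
    delays = backoff_delays(retries, base, cap)
    for attempt in range(retries + 1):
        try:
            return await op()
        except retry_on:
            if attempt == retries:
                raise  # out of retries: surface the last error unchanged
            # Full jitter: sleep a random amount up to the computed delay.
            await asyncio.sleep(random.uniform(0, delays[attempt]))
    raise AssertionError("unreachable")
```

The delay schedule is kept in a separate pure function so it can be checked without sleeping; the jitter spreads out retries from concurrent tasks so they do not fire in lockstep.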
28
+ **When Reviewing Code:**
29
+ 1. Check for anti-detection issues - suspicious patterns, missing stealth measures
30
+ 2. Verify async correctness - race conditions, proper awaiting, resource cleanup
31
+ 3. Assess error handling completeness - what happens when things fail?
32
+ 4. Evaluate performance implications - connection pooling, caching, batching
33
+ 5. Look for security issues - credential handling, injection vulnerabilities
34
+ 6. Ensure code follows project conventions from CLAUDE.md
35
+ 7. Be specific about issues and provide concrete fixes
36
+
37
+ **When Fixing Bugs:**
38
+ 1. Reproduce the issue first - understand the failure mode
39
+ 2. Check for intermittent/timing-related causes in async code
40
+ 3. Verify network-related assumptions - timeouts, retries, connection states
41
+ 4. Consider browser state and lifecycle issues with Camoufox
42
+ 5. Test the fix under realistic conditions
43
+
44
+ **Quality Standards:**
45
+ - Code must be production-ready, not prototype quality
46
+ - Prefer explicit over implicit behavior
47
+ - Handle edge cases that are common in scraping (timeouts, partial responses, rate limits)
48
+ - Include meaningful logging for debugging in production
49
+ - Write code that fails gracefully and provides useful error messages
50
+
51
+ **Project Context:**
52
+ You are working on a self-hosted captcha bypass service with HTTP API for circumventing Cloudflare/Amazon challenges. The stack is Python 3.x with Camoufox (stealth Firefox) and Docker. The API has /health, /solve, /result/{task_id}, and /task/{task_id} endpoints.
53
+
54
+ Always verify your assumptions by checking the actual code. Never guess about implementation details - read the source. If requirements are unclear, ask for clarification rather than making assumptions.
@@ -0,0 +1,11 @@
1
+ {
2
+ "permissions": {
3
+ "allow": [
4
+ "WebFetch(domain:camoufox.com)",
5
+ "WebFetch(domain:datawookie.dev)",
6
+ "WebFetch(domain:playwright.dev)",
7
+ "Bash(python -m py_compile:*)",
8
+ "Bash(python -c:*)"
9
+ ]
10
+ }
11
+ }
@@ -0,0 +1,207 @@
1
+ # Byte-compiled / optimized / DLL files
2
+ __pycache__/
3
+ *.py[codz]
4
+ *$py.class
5
+
6
+ # C extensions
7
+ *.so
8
+
9
+ # Distribution / packaging
10
+ .Python
11
+ build/
12
+ develop-eggs/
13
+ dist/
14
+ downloads/
15
+ eggs/
16
+ .eggs/
17
+ lib/
18
+ lib64/
19
+ parts/
20
+ sdist/
21
+ var/
22
+ wheels/
23
+ share/python-wheels/
24
+ *.egg-info/
25
+ .installed.cfg
26
+ *.egg
27
+ MANIFEST
28
+
29
+ # PyInstaller
30
+ # Usually these files are written by a python script from a template
31
+ # before PyInstaller builds the exe, so as to inject date/other infos into it.
32
+ *.manifest
33
+ *.spec
34
+
35
+ # Installer logs
36
+ pip-log.txt
37
+ pip-delete-this-directory.txt
38
+
39
+ # Unit test / coverage reports
40
+ htmlcov/
41
+ .tox/
42
+ .nox/
43
+ .coverage
44
+ .coverage.*
45
+ .cache
46
+ nosetests.xml
47
+ coverage.xml
48
+ *.cover
49
+ *.py.cover
50
+ .hypothesis/
51
+ .pytest_cache/
52
+ cover/
53
+
54
+ # Translations
55
+ *.mo
56
+ *.pot
57
+
58
+ # Django stuff:
59
+ *.log
60
+ local_settings.py
61
+ db.sqlite3
62
+ db.sqlite3-journal
63
+
64
+ # Flask stuff:
65
+ instance/
66
+ .webassets-cache
67
+
68
+ # Scrapy stuff:
69
+ .scrapy
70
+
71
+ # Sphinx documentation
72
+ docs/_build/
73
+
74
+ # PyBuilder
75
+ .pybuilder/
76
+ target/
77
+
78
+ # Jupyter Notebook
79
+ .ipynb_checkpoints
80
+
81
+ # IPython
82
+ profile_default/
83
+ ipython_config.py
84
+
85
+ # pyenv
86
+ # For a library or package, you might want to ignore these files since the code is
87
+ # intended to run in multiple environments; otherwise, check them in:
88
+ # .python-version
89
+
90
+ # pipenv
91
+ # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
92
+ # However, in case of collaboration, if having platform-specific dependencies or dependencies
93
+ # having no cross-platform support, pipenv may install dependencies that don't work, or not
94
+ # install all needed dependencies.
95
+ #Pipfile.lock
96
+
97
+ # UV
98
+ # Similar to Pipfile.lock, it is generally recommended to include uv.lock in version control.
99
+ # This is especially recommended for binary packages to ensure reproducibility, and is more
100
+ # commonly ignored for libraries.
101
+ #uv.lock
102
+
103
+ # poetry
104
+ # Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
105
+ # This is especially recommended for binary packages to ensure reproducibility, and is more
106
+ # commonly ignored for libraries.
107
+ # https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
108
+ #poetry.lock
109
+ #poetry.toml
110
+
111
+ # pdm
112
+ # Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
113
+ # pdm recommends including project-wide configuration in pdm.toml, but excluding .pdm-python.
114
+ # https://pdm-project.org/en/latest/usage/project/#working-with-version-control
115
+ #pdm.lock
116
+ #pdm.toml
117
+ .pdm-python
118
+ .pdm-build/
119
+
120
+ # pixi
121
+ # Similar to Pipfile.lock, it is generally recommended to include pixi.lock in version control.
122
+ #pixi.lock
123
+ # Pixi creates a virtual environment in the .pixi directory, just like venv module creates one
124
+ # in the .venv directory. It is recommended not to include this directory in version control.
125
+ .pixi
126
+
127
+ # PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
128
+ __pypackages__/
129
+
130
+ # Celery stuff
131
+ celerybeat-schedule
132
+ celerybeat.pid
133
+
134
+ # SageMath parsed files
135
+ *.sage.py
136
+
137
+ # Environments
138
+ .env
139
+ .envrc
140
+ .venv
141
+ env/
142
+ venv/
143
+ ENV/
144
+ env.bak/
145
+ venv.bak/
146
+
147
+ # Spyder project settings
148
+ .spyderproject
149
+ .spyproject
150
+
151
+ # Rope project settings
152
+ .ropeproject
153
+
154
+ # mkdocs documentation
155
+ /site
156
+
157
+ # mypy
158
+ .mypy_cache/
159
+ .dmypy.json
160
+ dmypy.json
161
+
162
+ # Pyre type checker
163
+ .pyre/
164
+
165
+ # pytype static type analyzer
166
+ .pytype/
167
+
168
+ # Cython debug symbols
169
+ cython_debug/
170
+
171
+ # PyCharm
172
+ # JetBrains specific template is maintained in a separate JetBrains.gitignore that can
173
+ # be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
174
+ # and can be added to the global gitignore or merged into this file. For a more nuclear
175
+ # option (not recommended) you can uncomment the following to ignore the entire idea folder.
176
+ .idea/
177
+
178
+ # Abstra
179
+ # Abstra is an AI-powered process automation framework.
180
+ # Ignore directories containing user credentials, local state, and settings.
181
+ # Learn more at https://abstra.io/docs
182
+ .abstra/
183
+
184
+ # Visual Studio Code
185
+ # Visual Studio Code specific template is maintained in a separate VisualStudioCode.gitignore
186
+ # that can be found at https://github.com/github/gitignore/blob/main/Global/VisualStudioCode.gitignore
187
+ # and can be added to the global gitignore or merged into this file. However, if you prefer,
188
+ # you could uncomment the following to ignore the entire vscode folder
189
+ # .vscode/
190
+
191
+ # Ruff stuff:
192
+ .ruff_cache/
193
+
194
+ # PyPI configuration file
195
+ .pypirc
196
+
197
+ # Cursor
198
+ # Cursor is an AI-powered code editor. `.cursorignore` specifies files/directories to
199
+ # exclude from AI features like autocomplete and code analysis. Recommended for sensitive data
200
+ # refer to https://docs.cursor.com/context/ignore-files
201
+ .cursorignore
202
+ .cursorindexingignore
203
+
204
+ # Marimo
205
+ marimo/_static/
206
+ marimo/_lsp/
207
+ __marimo__/
@@ -0,0 +1,104 @@
1
+ # CLAUDE.md
2
+
3
+ This file provides guidance to Claude Code when working with code in this repository.
4
+
5
+ ## ROLE
6
+
7
+ You are a senior Python developer with extensive experience in:
8
+ - Web scraping and parsing (anti-bot bypass, browser automation, stealth techniques)
9
+ - High-load systems (async programming, task queues, horizontal scaling)
10
+ - Network protocols (HTTP/HTTPS, cookies, headers manipulation)
11
+ - Anti-detection tools (Camoufox, Playwright, Puppeteer patterns)
12
+
13
+ Approach: pragmatic, production-ready code. Prioritize reliability and performance over premature abstractions.
14
+
15
+ ## PROJECT
16
+
17
+ Self-hosted captcha bypass service with HTTP API for circumventing Cloudflare/Amazon challenges.
18
+
19
+ ### Purpose
20
+
21
+ Parsing websites often requires bypassing anti-bot protection. This service:
22
+ - Solves captcha challenges using Camoufox (stealth Firefox)
23
+ - Returns headers + cookies for subsequent API/HTML requests
24
+ - Runs as a standalone microservice accessible via HTTP
25
+
26
+ ### Current Limitations
27
+
28
+ - Only GET requests are supported (POST/PUT with body and custom headers planned for future releases)
29
+
30
+ ### Tech Stack
31
+
32
+ - Python 3.x
33
+ - Camoufox (anti-detect browser)
34
+ - HTTP API server
35
+ - Docker
36
+
37
+ ### API Endpoints
38
+
39
+ 1. `GET /health` — service status and metrics
40
+ 2. `POST /solve` — queue captcha bypass task, returns task ID
41
+ 3. `GET /result/{task_id}` — get task status/result by ID
42
+ 4. `DELETE /task/{task_id}` — cancel running task or delete completed result
43
+
44
+ ### Installation Options
45
+
46
+ 1. **Docker Compose** — `docker-compose up -d` (supports env vars: WORKERS, PORT, RESULT_TTL, MAX_QUEUE_SIZE)
47
+ 2. **pip** — `pip install .` then run `captcha-bypass` command
48
+
49
+ ### Response Data
50
+
51
+ Successful bypass returns:
52
+ - `cookies` — array of cookie objects from browser context
53
+ - `request_headers` — browser request headers for reuse in subsequent requests
54
+ - `response_headers` — response headers from navigation
55
+ - `status_code` — HTTP status code
56
+ - `html` — page HTML content
57
+ - `url` — final URL after redirects
58
+ - `timeout_reached` — whether task waited full timeout
59
+ - `validation` — match info (matched, match_type, matched_condition)
60
+
61
+ ## BASIC
62
+
63
+ Basic Claude Setup - foundational configuration and protocols.
64
+
65
+ ### Memory Files Management
66
+
67
+ - When asked to add information to memory files - ALWAYS read the file first and search for existing information
68
+ - If found - update it, DO NOT duplicate. If not found - add to the specified location
69
+ - Report what was done: "Updated X" or "Added X"
70
+ - Information in memory files must always be written in English
71
+
72
+ ### Modular CLAUDE.md Files
73
+
74
+ - MANDATORY: before working with any module, check for CLAUDE.md in its directory
75
+ - Use LS or Glob tools to search for local CLAUDE.md files
76
+ - Local CLAUDE.md supplements and refines the main file for its module
77
+ - Usage examples:
78
+ * Working with tests → first read /tests/CLAUDE.md
79
+ * Working with a specific service → look for CLAUDE.md in its folder
80
+ - Create modular CLAUDE.md only for complex modules with special rules
81
+ - Priority is determined according to "Specification Conflict Handling" section rules
82
+ - Ignoring modular instructions often leads to errors - always check for their presence
83
+
84
+ ### User Communication
85
+
86
+ - Internal thinking (reasoning) must be in English
87
+ - Respond to the user in their language
88
+ - Goal: context token economy (non-English languages have higher token consumption)
89
+
90
+ ### Critical Thinking
91
+
92
+ - Think critically, question ambiguous information
93
+ - Never assume: if information is unclear or missing - always ask, do not guess
94
+ - Verify facts: do not rely on memory about the project - always check in code/documentation
95
+ - Clarify uncertainty: e.g., which version to use if there is a choice between two options
96
+ - Do not hallucinate: do not invent information that does not exist
97
+
98
+ ## TECHNICAL
99
+
100
+ Technical context - project stack, structure, and deployment instructions.
101
+
102
+ ## TESTING
103
+
104
+ Testing rules - test environment setup, TDD practices, test isolation, and coverage requirements.
@@ -0,0 +1,22 @@
1
+ FROM python:3.12-slim
2
+
3
+ # Install system dependencies for Camoufox
4
+ RUN apt-get update && apt-get install -y --no-install-recommends \
5
+ libgtk-3-0 \
6
+ libx11-xcb1 \
7
+ libasound2 \
8
+ && rm -rf /var/lib/apt/lists/*
9
+
10
+ WORKDIR /app
11
+
12
+ # Copy project files
13
+ COPY pyproject.toml README.md LICENSE ./
14
+ COPY captcha_bypass/ ./captcha_bypass/
15
+
16
+ # Install package
17
+ RUN pip install --no-cache-dir .
18
+
19
+ # Fetch Camoufox browser
20
+ RUN python -m camoufox fetch
21
+
22
+ CMD ["captcha-bypass"]
@@ -0,0 +1,14 @@
1
+ Captcha-Bypass License v1.0
2
+
3
+ Copyright (c) 2025 Maksym Panchenko
4
+
5
+ Permission is hereby granted to any individual or organization to use, copy, modify, and distribute this software and its documentation, provided that:
6
+
7
+ 1. The software is used solely for academic research, educational purposes, or penetration testing with written authorization from the system owner.
8
+ 2. Any use for commercial purposes, malicious activity, or actions in violation of applicable laws and regulations is strictly prohibited.
9
+ 3. The authors and contributors shall not be held liable for any misuse, damage, or legal consequences arising from the use of this software.
10
+
11
+ By using this software, you agree to comply with this license.
12
+ If you do not agree, you are not permitted to use the software.
13
+
14
+ This license does not grant any trademark rights, and it does not constitute an Open Source license as defined by the Open Source Initiative (OSI).