skeleton-key-http 0.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (42)
  1. skeleton_key_http-0.1.0/.gitignore +14 -0
  2. skeleton_key_http-0.1.0/CHANGELOG.md +107 -0
  3. skeleton_key_http-0.1.0/LICENSE +21 -0
  4. skeleton_key_http-0.1.0/PKG-INFO +120 -0
  5. skeleton_key_http-0.1.0/README.md +87 -0
  6. skeleton_key_http-0.1.0/docs/ARCHITECTURE.md +191 -0
  7. skeleton_key_http-0.1.0/docs/COMPARISON.md +248 -0
  8. skeleton_key_http-0.1.0/docs/CONCEPTS.md +451 -0
  9. skeleton_key_http-0.1.0/docs/QUICKSTART.md +215 -0
  10. skeleton_key_http-0.1.0/docs/REGISTRY_FORMAT.md +459 -0
  11. skeleton_key_http-0.1.0/docs/WRITING_A_PICK.md +356 -0
  12. skeleton_key_http-0.1.0/docs/benchmarks.md +80 -0
  13. skeleton_key_http-0.1.0/docs/egress.md +75 -0
  14. skeleton_key_http-0.1.0/docs/honesty-contract.md +100 -0
  15. skeleton_key_http-0.1.0/docs/index.md +36 -0
  16. skeleton_key_http-0.1.0/examples/01_open_a_door.py +78 -0
  17. skeleton_key_http-0.1.0/examples/02_json_api.py +78 -0
  18. skeleton_key_http-0.1.0/examples/03_add_a_custom_pick.py +152 -0
  19. skeleton_key_http-0.1.0/examples/04_identity_rotation.py +106 -0
  20. skeleton_key_http-0.1.0/pyproject.toml +97 -0
  21. skeleton_key_http-0.1.0/src/skeleton_key/__init__.py +92 -0
  22. skeleton_key_http-0.1.0/src/skeleton_key/__main__.py +105 -0
  23. skeleton_key_http-0.1.0/src/skeleton_key/_paths.py +45 -0
  24. skeleton_key_http-0.1.0/src/skeleton_key/detect.py +313 -0
  25. skeleton_key_http-0.1.0/src/skeleton_key/engine.py +399 -0
  26. skeleton_key_http-0.1.0/src/skeleton_key/handlers.py +183 -0
  27. skeleton_key_http-0.1.0/src/skeleton_key/registry/__init__.py +99 -0
  28. skeleton_key_http-0.1.0/src/skeleton_key/registry/data/contrib_example.yaml +36 -0
  29. skeleton_key_http-0.1.0/src/skeleton_key/registry/data/picks.yaml +65 -0
  30. skeleton_key_http-0.1.0/src/skeleton_key/registry/data/schema.json +110 -0
  31. skeleton_key_http-0.1.0/src/skeleton_key/registry/data/tumblers.yaml +180 -0
  32. skeleton_key_http-0.1.0/src/skeleton_key/resilience/__init__.py +634 -0
  33. skeleton_key_http-0.1.0/src/skeleton_key/schema.py +155 -0
  34. skeleton_key_http-0.1.0/src/skeleton_key/shims/__init__.py +535 -0
  35. skeleton_key_http-0.1.0/src/skeleton_key/template_pick.py +71 -0
  36. skeleton_key_http-0.1.0/src/skeleton_key/transport/__init__.py +629 -0
  37. skeleton_key_http-0.1.0/tests/smoke_doors.py +113 -0
  38. skeleton_key_http-0.1.0/tests/test_detect.py +331 -0
  39. skeleton_key_http-0.1.0/tests/test_honesty_invariants.py +248 -0
  40. skeleton_key_http-0.1.0/tests/test_matrix.py +213 -0
  41. skeleton_key_http-0.1.0/tests/test_registry_schema.py +181 -0
  42. skeleton_key_http-0.1.0/tests/test_vocabulary.py +219 -0
@@ -0,0 +1,14 @@
1
+ __pycache__/
2
+ *.pyc
3
+ *.egg-info/
4
+ build/
5
+ dist/
6
+ .venv/
7
+ venv/
8
+ *.bak-*
9
+ .skeleton_key_cache/
10
+ .pytest_cache/
11
+ .mypy_cache/
12
+ .ruff_cache/
13
+ site/
14
+ _site/
@@ -0,0 +1,107 @@
1
+ # Changelog
2
+
3
+ All notable changes to **Skeleton Key** are documented here.
4
+
5
+ > Naming note: during initial development this project carried the working name
6
+ > `skeleton-key`; it ships as **Skeleton Key** (`pip install skeleton-key`,
7
+ > `import skeleton_key`). No `skeleton-key` identifiers remain in the code.
8
+
9
+ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
10
+ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
11
+
12
+ Until `1.0.0`, the public API and the registry schema may change between minor
13
+ versions. Breaking changes are called out explicitly in each entry.
14
+
15
+ ## [Unreleased]
16
+
17
+ Nothing yet. See [ROADMAP.md](ROADMAP.md) for what is planned next.
18
+
19
+ ## [0.1.0] - 2026-05-30
20
+
21
+ Initial scaffold — the first packaged extraction of the access-resilience engine
22
+ into a standalone, src-layout library under the import name `skeleton_key`.
23
+
24
+ Skeleton Key is an access-resilience engine that presents an authentic browser TLS
25
+ fingerprint so legitimate clients aren't misclassified as bots — with a portable,
26
+ declarative registry of access-walls and the ranked ladder to open each one.
27
+
28
+ ### Added
29
+
30
+ - **The registry — the killer feature.** A Sigma/YARA-style portable signature
31
+ format for anti-bot walls: declarative YAML that detects an access-wall
32
+ (a *tumbler*) and binds the tools that open it (*picks*), with a **derived,
33
+ cannot-drift detection-to-remediation matrix**. Ships **12 tumblers** across
34
+ five categories (`cryptographic`, `challenge`, `temporal`, `reputation`,
35
+ `behavioral`) and **6 reference picks** (`header.negotiate`, `timing.backoff`,
36
+ `wrench.rotate_profile`, `wrench.swap_egress`, `solver.reddit_js`,
37
+ `browser.js_runtime`). The match-DSL semantics in `registry/detect.py` are the
38
+ interop contract; a JSON-Schema (`draft 2020-12`) ships the structural contract.
39
+ - **Impersonate-only transport — the key.** TLS/JA3 + HTTP/2 mimicry via
40
+ `curl_cffi` with a CLI cascade (curl-impersonate). It raises `FetchError` rather
41
+ than ever fall back to a plain `urllib`/`requests`/`curl` path: Skeleton Key fails
42
+ loudly instead of sending a forged or non-authentic fingerprint. Includes a
43
+ cookie cache and the `fetch_raw` seam that surfaces the raw response so detection
44
+ lives in the registry, not the transport.
45
+ - **Resilience layer.** Identity rotation (`Egress` / `Identity` / `Pool`) that
46
+ mints fresh identities by rotating egress, impersonation profile, and headers,
47
+ with a bounded swap loop, jittered backoff that honors `Retry-After`, and a
48
+ block detector. Egress sources are free/owned-only and env-gated
49
+ (`direct` / `warp` / `tor` / `proton`), documented with their honest tradeoffs.
50
+ An optional patchright browser fallback (`[browser]` extra) is the last-resort
51
+ rung.
52
+ - **Shim layer.** A fidelity-ordered, last-resort alternate-route layer
53
+ (underlying endpoint → reader proxy → alt frontend → Wayback archive) that fires
54
+ **only after** picks, rotation, and the browser pass are exhausted. Every shim
55
+ result carries its source, fidelity class, and as-of timestamp; a non-equivalent
56
+ route is reported as a structural miss, never passed off as the real resource.
57
+ - **One-import public API.** `from skeleton_key import open_door` runs the registry's
58
+ *matrix-only* drive loop — fetch → detect → pick → re-detect — and returns an
59
+ `OpenResult` exposing `.opened`, `.opened_by`, and the underlying response. The
60
+ **shim-aware** pipeline (shims-only-after-picks) is a distinct function in a
61
+ separate module, `skeleton_key.shims.open_door`, for when an honestly-labeled
62
+ last-resort alternate route is acceptable.
63
+ - **The honesty contract, enforced in code.** Four invariants — impersonate-only,
64
+ shims-after-picks, fidelity-labeling, and no-silent-degradation — each tied to
65
+ the code location that enforces it. Skeleton Key fails closed on auth gates
66
+ (`401` / `407`) and on challenges that require a human, and labels everything else.
67
+ - **Extensibility on-ramp.** A drop-in overlay (`contrib_example.yaml`) and a pick
68
+ template demonstrate that adding a wall is often data, not code: append a tumbler,
69
+ append a pick manifest, optionally add a handler — the matrix re-derives and the
70
+ loader validates the whole registry at startup.
71
+ - **Cache-path parameterization.** A `_paths` module resolves the cache directory
72
+ from `$SKELETON_KEY_CACHE` if set, else `platformdirs.user_cache_dir("skeleton_key")`,
73
+ decoupling the library from any host-specific location.
74
+ - **Project scaffolding.** `pyproject.toml` (src-layout, MIT, py3.10+), an offline-by-
75
+ default pytest suite (match-DSL golden tests, registry load-time validation, matrix
76
+ ranking, drop-in overlay, the impersonate-only and shim-equivalence guards, and a
77
+ banned-vocabulary check), a CI workflow, pre-commit hooks, docs, runnable examples
78
+ against defensible targets, and the responsible-use / contributing / security /
79
+ conduct docs.
80
+
81
+ ### Benchmark (point-in-time)
82
+
83
+ - On a one-off run dated **2026-05-30**, Skeleton Key opened **22 of 24** tested doors
84
+ where plain curl (user-agent spoof only) failed on every one — the TLS/JA3 + HTTP/2
85
+ mimicry was the load-bearing difference. The remaining **2 were BOTH_BLOCKED**:
86
+ IP-gated and auth-gated, **not** fingerprint-gated, so Skeleton Key does **not** open
87
+ any arbitrary door. Numbers are point-in-time (sites change their posture; one site
88
+ shifted from an IP gate to a challenge within a day) and are re-run periodically.
89
+ See [docs/benchmarks.md](docs/benchmarks.md) for the dated, caveated table and the
90
+ defensible-targets stance in [RESPONSIBLE_USE.md](RESPONSIBLE_USE.md).
91
+
92
+ ### Notes on scope (honest limits at 0.1.0)
93
+
94
+ - The format is **portable by construction** — language-agnostic YAML plus a
95
+ JSON-Schema contract — but there is **one** engine today (this Python one) and
96
+ **no second-language implementation yet**, so portability is a design property,
97
+ not a proven one.
98
+ - Vendor challenges (Cloudflare Turnstile, DataDome, Akamai, PerimeterX, Imperva)
99
+ are **detected** but only carry a generic browser-pick ladder; they are **not**
100
+ claimed as solved. The bundled patchright pass does not pass Cloudflare Turnstile,
101
+ even headful.
102
+ - The cookie cache is shared per-domain (not per-identity), and all egress is
103
+ `en-US`-only today. The docs state these limits rather than imply isolation or
104
+ geo-diversity the code does not deliver.
105
+
106
+ [Unreleased]: https://github.com/veific/skeleton-key/compare/v0.1.0...HEAD
107
+ [0.1.0]: https://github.com/veific/skeleton-key/releases/tag/v0.1.0
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 Mark Bernhardt
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
@@ -0,0 +1,120 @@
1
+ Metadata-Version: 2.4
2
+ Name: skeleton-key-http
3
+ Version: 0.1.0
4
+ Summary: Skeleton Key — an HTTP access-resilience / locksmith library for Python: it presents an authentic browser TLS fingerprint and ships a portable, declarative registry of access-walls and the ranked ladder to open each one.
5
+ Project-URL: Homepage, https://github.com/veific/skeleton-key
6
+ Project-URL: Repository, https://github.com/veific/skeleton-key
7
+ Project-URL: Issues, https://github.com/veific/skeleton-key/issues
8
+ Author: Mark Bernhardt
9
+ License-Expression: MIT
10
+ License-File: LICENSE
11
+ Keywords: access-resilience,anti-bot,curl-cffi,curl-impersonate,fingerprint,http,impersonate,ja3,tls
12
+ Classifier: Development Status :: 4 - Beta
13
+ Classifier: Intended Audience :: Developers
14
+ Classifier: License :: OSI Approved :: MIT License
15
+ Classifier: Operating System :: OS Independent
16
+ Classifier: Programming Language :: Python :: 3
17
+ Classifier: Programming Language :: Python :: 3.10
18
+ Classifier: Programming Language :: Python :: 3.11
19
+ Classifier: Programming Language :: Python :: 3.12
20
+ Classifier: Topic :: Internet :: WWW/HTTP
21
+ Classifier: Topic :: Software Development :: Libraries :: Python Modules
22
+ Requires-Python: >=3.10
23
+ Requires-Dist: curl-cffi>=0.7
24
+ Requires-Dist: platformdirs
25
+ Requires-Dist: pyyaml
26
+ Provides-Extra: browser
27
+ Requires-Dist: patchright; extra == 'browser'
28
+ Provides-Extra: dev
29
+ Requires-Dist: jsonschema; extra == 'dev'
30
+ Requires-Dist: pytest; extra == 'dev'
31
+ Requires-Dist: ruff; extra == 'dev'
32
+ Description-Content-Type: text/markdown
33
+
34
+ # Skeleton Key
35
+
36
+ [![CI](https://github.com/veific/skeleton-key/actions/workflows/ci.yml/badge.svg)](https://github.com/veific/skeleton-key/actions/workflows/ci.yml)
37
+ [![License: MIT](https://img.shields.io/badge/license-MIT-yellow.svg)](LICENSE)
38
+ [![Python 3.10–3.13](https://img.shields.io/badge/python-3.10%E2%80%933.13-blue.svg)](https://pypi.org/project/skeleton-key/)
39
+ [![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)
40
+ [![PRs welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg)](CONTRIBUTING.md)
41
+
42
+ > Your authentic browser key, for doors you're wrongly locked out of.
43
+
44
+ Skeleton Key is an access-resilience engine that presents an authentic browser
45
+ TLS fingerprint so legitimate clients aren't misclassified as bots — with a
46
+ portable, declarative registry of access-walls and the ranked ladder to open
47
+ each one.
48
+
49
+ It is built for legitimate clients that are wrongly fingerprint-blocked: your
50
+ own data and APIs behind a fingerprinting WAF, public data sources that
51
+ misclassify non-browser clients, research, archival, and accessibility tooling.
52
+
53
+ ## Why
54
+
55
+ Anti-bot systems increasingly fingerprint the TLS/JA3 + HTTP/2 handshake, so a
56
+ legitimate script fetching your own data or a public API gets a challenge page
57
+ that a real browser never sees. Skeleton Key solves this honestly: it speaks an
58
+ **authentic** browser handshake via curl-impersonate (never a forged or
59
+ plain-curl one — it fails loudly rather than send a fake fingerprint), rotates
60
+ unlinkable identities when an IP or profile is burned, and — uniquely — ships a
61
+ Sigma/YARA-style portable registry: declarative YAML that detects an access-wall
62
+ and derives the ranked ladder of tools to open it.
63
+
64
+ Skeleton Key **never mutates your machine** to get in — no system tweaks, no
65
+ installed root certs, no background daemon. It opens a door from inside the
66
+ process and nothing else.
67
+
68
+ When a door genuinely cannot be opened, Skeleton Key says so. It never passes a
69
+ challenge page, a stale archive, or an alternate-route copy off as the real,
70
+ fresh resource. A wall it detects but cannot pick is reported as
71
+ *detected-but-not-solved*, never silently degraded. **Honesty is the feature.**
72
+
73
+ ## Install
74
+
75
+ ```bash
76
+ pip install skeleton-key-http
77
+ ```
78
+
79
+ ## Quick start
80
+
81
+ ```python
82
+ from skeleton_key import open_door
83
+
84
+ result = open_door("https://www.cloudflare.com/cdn-cgi/trace")
85
+ print(result.opened, result.opened_by, result.resp.status_code)
86
+ ```
87
+
88
+ This top-level `open_door` is the registry loop: it presents the authentic key and
89
+ walks the ranked tumbler→pick ladder against the front door. `result.resp` is always
90
+ the real front door — it does not rotate identity on its own or fall back to an
91
+ alternate route. For the opt-in ladder that adds identity rotation, a browser
92
+ last-resort, and labeled alternate-route shims (each stamped with source, fidelity
93
+ class, and as-of timestamp), import `from skeleton_key.shims import open_door` — see the
94
+ [Quickstart](docs/QUICKSTART.md#two-open_doors-which-one-you-get).
95
+
96
+ Or from the command line:
97
+
98
+ ```bash
99
+ python -m skeleton_key https://www.cloudflare.com/cdn-cgi/trace
100
+ ```
101
+
102
+ ## Responsible use
103
+
104
+ Skeleton Key is a tool, in the same category as curl-impersonate, curl_cffi, and
105
+ FlareSolverr. TLS impersonation itself is legal; what you do with the access is
106
+ your responsibility. Be rate-limit aware, treat a site's Terms of Service and
107
+ `robots.txt` as a stance worth respecting, and point it at things you have a
108
+ right to reach — your own data and APIs, public data, research, and archival.
109
+ See [ETHICS.md](ETHICS.md). This is not legal advice.
110
+
111
+ ## Status
112
+
113
+ Young project (v0.1). The portable registry format is the novel part; the
114
+ *content* is small (a handful of tumblers and picks) and grows by contribution.
115
+ Vendor challenges (e.g. Cloudflare Turnstile, DataDome) are detected but not
116
+ claimed solved.
117
+
118
+ ## License
119
+
120
+ MIT — see [LICENSE](LICENSE).
@@ -0,0 +1,87 @@
1
+ # Skeleton Key
2
+
3
+ [![CI](https://github.com/veific/skeleton-key/actions/workflows/ci.yml/badge.svg)](https://github.com/veific/skeleton-key/actions/workflows/ci.yml)
4
+ [![License: MIT](https://img.shields.io/badge/license-MIT-yellow.svg)](LICENSE)
5
+ [![Python 3.10–3.13](https://img.shields.io/badge/python-3.10%E2%80%933.13-blue.svg)](https://pypi.org/project/skeleton-key/)
6
+ [![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)
7
+ [![PRs welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg)](CONTRIBUTING.md)
8
+
9
+ > Your authentic browser key, for doors you're wrongly locked out of.
10
+
11
+ Skeleton Key is an access-resilience engine that presents an authentic browser
12
+ TLS fingerprint so legitimate clients aren't misclassified as bots — with a
13
+ portable, declarative registry of access-walls and the ranked ladder to open
14
+ each one.
15
+
16
+ It is built for legitimate clients that are wrongly fingerprint-blocked: your
17
+ own data and APIs behind a fingerprinting WAF, public data sources that
18
+ misclassify non-browser clients, research, archival, and accessibility tooling.
19
+
20
+ ## Why
21
+
22
+ Anti-bot systems increasingly fingerprint the TLS/JA3 + HTTP/2 handshake, so a
23
+ legitimate script fetching your own data or a public API gets a challenge page
24
+ that a real browser never sees. Skeleton Key solves this honestly: it speaks an
25
+ **authentic** browser handshake via curl-impersonate (never a forged or
26
+ plain-curl one — it fails loudly rather than send a fake fingerprint), rotates
27
+ unlinkable identities when an IP or profile is burned, and — uniquely — ships a
28
+ Sigma/YARA-style portable registry: declarative YAML that detects an access-wall
29
+ and derives the ranked ladder of tools to open it.
30
+
31
+ Skeleton Key **never mutates your machine** to get in — no system tweaks, no
32
+ installed root certs, no background daemon. It opens a door from inside the
33
+ process and nothing else.
34
+
35
+ When a door genuinely cannot be opened, Skeleton Key says so. It never passes a
36
+ challenge page, a stale archive, or an alternate-route copy off as the real,
37
+ fresh resource. A wall it detects but cannot pick is reported as
38
+ *detected-but-not-solved*, never silently degraded. **Honesty is the feature.**
39
+
40
+ ## Install
41
+
42
+ ```bash
43
+ pip install skeleton-key-http
44
+ ```
45
+
46
+ ## Quick start
47
+
48
+ ```python
49
+ from skeleton_key import open_door
50
+
51
+ result = open_door("https://www.cloudflare.com/cdn-cgi/trace")
52
+ print(result.opened, result.opened_by, result.resp.status_code)
53
+ ```
54
+
55
+ This top-level `open_door` is the registry loop: it presents the authentic key and
56
+ walks the ranked tumbler→pick ladder against the front door. `result.resp` is always
57
+ the real front door — it does not rotate identity on its own or fall back to an
58
+ alternate route. For the opt-in ladder that adds identity rotation, a browser
59
+ last-resort, and labeled alternate-route shims (each stamped with source, fidelity
60
+ class, and as-of timestamp), import `from skeleton_key.shims import open_door` — see the
61
+ [Quickstart](docs/QUICKSTART.md#two-open_doors-which-one-you-get).
62
+
63
+ Or from the command line:
64
+
65
+ ```bash
66
+ python -m skeleton_key https://www.cloudflare.com/cdn-cgi/trace
67
+ ```
68
+
69
+ ## Responsible use
70
+
71
+ Skeleton Key is a tool, in the same category as curl-impersonate, curl_cffi, and
72
+ FlareSolverr. TLS impersonation itself is legal; what you do with the access is
73
+ your responsibility. Be rate-limit aware, treat a site's Terms of Service and
74
+ `robots.txt` as a stance worth respecting, and point it at things you have a
75
+ right to reach — your own data and APIs, public data, research, and archival.
76
+ See [ETHICS.md](ETHICS.md). This is not legal advice.
77
+
78
+ ## Status
79
+
80
+ Young project (v0.1). The portable registry format is the novel part; the
81
+ *content* is small (a handful of tumblers and picks) and grows by contribution.
82
+ Vendor challenges (e.g. Cloudflare Turnstile, DataDome) are detected but not
83
+ claimed solved.
84
+
85
+ ## License
86
+
87
+ MIT — see [LICENSE](LICENSE).
@@ -0,0 +1,191 @@
1
+ # Architecture
2
+
3
+ > **Skeleton Key** — *Your authentic browser key, for doors you're wrongly locked out of.*
4
+
5
+ Skeleton Key is an access-resilience engine that presents an authentic browser TLS fingerprint so legitimate clients aren't misclassified as bots — with a portable, declarative registry of access-walls and the ranked ladder to open each one.
6
+
7
+ This document explains how that engine is built: the three layers and where each one's responsibility starts and stops, the `open_door` data flow end to end, and — the load-bearing claim — exactly how the *impersonate-only* invariant holds **by construction** rather than by discipline.
8
+
9
+ If you want the vocabulary first (door, key, tumbler, pick, shim), read [CONCEPTS.md](CONCEPTS.md). If you want the killer feature — the portable signature format — read [REGISTRY_FORMAT.md](REGISTRY_FORMAT.md). If you want the promises Skeleton Key makes about never lying to you, read [honesty-contract.md](honesty-contract.md). This file is the map that ties them together.
10
+
11
+ ---
12
+
13
+ ## 1. The layered system
14
+
15
+ Skeleton Key is three layers with strict, one-directional responsibility boundaries. Each layer depends only on the layers beneath it; nothing reaches up.
16
+
17
+ ```
18
+ ┌────────────────────────────────────────────────────────────────────────────┐
19
+ │ skeleton_key.registry THE FORMAT — the declarative, portable engine │
20
+ │ data/*.yaml tumblers (detection rules) · picks (manifests) │
21
+ │ detect.py the match-DSL evaluator → which tumbler fired │
22
+ │ engine.py load + validate + derive matrix + open_door loop │
23
+ │ handlers.py the per-language pick bindings (the only │
24
+ │ language-bound file) │
25
+ ├────────────────────────────────────────────────────────────────────────────┤
26
+ │ skeleton_key.resilience RESILIENCE — unlinkable-identity rotation │
27
+ │ identity.py Egress / Identity / Pool, minting identities │
28
+ │ fetch.py the rotation drive loop + is_blocked() │
29
+ │ browser.py the patchright last-resort browser fallback │
30
+ ├────────────────────────────────────────────────────────────────────────────┤
31
+ │ skeleton_key.transport MECHANISM — the KEY: impersonate-only transport │
32
+ │ fetch.py curl_cffi → CLI cascade · cookie cache · │
33
+ │ fetch_raw seam · the FetchError boundary │
34
+ │ solvers.py the AccessSolver registry (one reference solver) │
35
+ └────────────────────────────────────────────────────────────────────────────┘
36
+ (above) ── skeleton_key.shims ── the labeled last-resort alternate-route layer
37
+ ```
38
+
39
+ A fourth module, `skeleton_key.shims`, sits beside the registry rather than inside the stack: it is tried **only after** the whole pick ladder is exhausted, and it walks to a *sibling* door (a syndication endpoint, a reader proxy, an alternate frontend, a Wayback archive) for the same content. It never touches a front-door tumbler. The canonical public entry point — `from skeleton_key import open_door` — is the registry's *matrix-only* drive loop; the **shim-aware** loop is a separate function living in this fourth module, `skeleton_key.shims.open_door`, which runs the registry pipeline first and only then considers a labeled shim. They are different modules, not two names for one loop. See [§5](#5-the-public-entry-point).
40
+
41
+ ### 1.1 Responsibility boundaries — where each layer stops
42
+
43
+ The discipline that keeps the design honest is *what each layer refuses to do*.
44
+
45
+ | Layer | Owns | Does **not** own |
46
+ |---|---|---|
47
+ | **transport** (mechanism) | The authentic TLS/JA3 + HTTP/2 handshake. The `curl_cffi → CLI` cascade. The cookie cache. `fetch_raw` (the raw, single-attempt, no-solver primitive). The `FetchError` boundary. | *Which* wall it hit. *What* to try next. *Where* it exits (egress is not its concern). |
48
+ | **resilience** (identity rotation) | `Egress` / `Identity` / `Pool`. Minting fresh, unlinkable identities (new exit IP + profile + headers). The bounded rotation loop. `browser_fetch` (the patchright fallback). `is_blocked()`. | The taxonomy of walls. The escalation *order*. The transport mechanism itself (it delegates down). |
49
+ | **registry** (the format) | The tumbler taxonomy. The pick catalog. The derived tumbler→pick **matrix**. The `open_door` drive loop. The match-DSL semantics (the interop contract). | The bytes on the wire — every fetch and re-fetch delegates downward. A second transport. |
50
+ | **shims** (last resort) | The four fidelity-ordered alternate routes. The structural equivalence gate. Fidelity + freshness labelling. | Anything the picks can do. It never runs unless the picks are exhausted, and it never claims a sibling route is the real door. |
51
+
52
+ The registry **consumes** the two lower layers — transport and resilience — and never reimplements transport. It does **not** consume the fourth row, `shims`: the registry's own `open_door` is front-door-only and never reaches the shim layer — that wiring lives solely in the separate `skeleton_key.shims.open_door` loop (see [§5](#5-the-public-entry-point)). A pick that needs a fresh exit IP calls into `skeleton_key.resilience`'s pool; a pick that needs a real browser calls `browser_fetch`; **every** re-attempt — without exception — goes back through the transport's impersonate-only path. That single fact is what makes the impersonate-only invariant structural rather than aspirational ([§4](#4-the-impersonate-only-invariant-by-construction)).
53
+
54
+ ### 1.2 Why three layers (and not one)
55
+
56
+ The split exists so that each concern can be reasoned about — and tested — in isolation:
57
+
58
+ - **Mechanism** can be proven impersonate-only without knowing anything about walls or rotation. It has exactly one job and one escape (`FetchError`).
59
+ - **Resilience** can rotate identities without knowing the wall taxonomy — it only needs `is_blocked()` to tell it *whether* to rotate, not *why*.
60
+ - **The registry** can grow new walls and tools as **data**, never touching mechanism or resilience code. Adding a wall is a YAML append, not a control-flow edit scattered across files. This is the property that makes the format portable and the contribution flywheel possible.
61
+
62
+ ---
63
+
64
+ ## 2. The `open_door` data flow
65
+
66
+ `open_door` is the data-driven drive loop. It replaces what used to be a hardcoded escalation `while`-loop with one that asks the **derived matrix** what to try and in what order. The control flow is fixed and small; all the *knowledge* lives in the YAML.
67
+
68
+ ```
69
+ door (url)
70
+
71
+ │ fetch_raw(url) ← KEY (authentic TLS) + WRENCH (impersonation profile),
72
+ ▼ NO hidden solver — the raw response, wall and all
73
+ Resp ──────► detect(resp, tumblers) ← match-DSL, evaluated highest-severity-first;
74
+ │ │ the first tumbler that matches wins
75
+ │ ▼ TumblerHit (id, category, severity)
76
+ │ resolve(matrix, hit) ← the ranked picks for THIS tumbler (derived data)
77
+ │ │
78
+ │ ▼ [pick₁, pick₂, …] exact-target → cheapest → … → heavy (browser)
79
+ │ handler(ctx) → Resp | None ← the PICK acts; re-attempts only via
80
+ │ │ ctx.refetch (= fetch_raw, impersonate-only)
81
+ │ ▼
82
+ └────────── re-detect on the new Resp ──────► escalate to the next *untried* pick
83
+ repeat until OPEN (no tumbler fires)
84
+ or the picks are exhausted
85
+ ```
86
+
87
+ The loop is bounded by `max_rounds` (default 6). Each round does exactly one thing: detect the current wall, resolve its ranked picks, apply the first one not yet tried, then re-detect on whatever came back.
88
+
89
+ ### 2.1 Step by step
90
+
91
+ **1 — `fetch_raw` (the seam).** The door is fetched through the transport's `fetch_raw`: a single impersonated attempt with **no access-solvers and no escalation**. This is deliberate. The raw response — challenge page and all — is exactly what the registry needs to *see*, so detection happens in the registry where it is declarative and inspectable, not invisibly inside the transport. This is the **`fetch_raw` seam**: the transport surfaces the raw tumbler upward instead of quietly solving it.
92
+
93
+ **2 — `detect`.** The response is run against the loaded tumblers, sorted highest-severity-first. Each tumbler is a match-tree (pure data; see [REGISTRY_FORMAT.md](REGISTRY_FORMAT.md)). The first tumbler that matches wins and yields a `TumblerHit` carrying the tumbler's id, category, and severity. Severity-ordering is why a specific named wall (say `challenge.reddit_js`) is checked before a generic fallback (`reputation.forbidden`) — the specific rule gets first refusal. If **nothing** matches, the door is open and the loop returns immediately.
94
+
95
+ **3 — `resolve`.** Given the hit, `resolve` returns the picks that address it — already ranked — from the **derived matrix** ([§3](#3-the-derived-matrix)). This is a dictionary lookup, not a search: the ordering was computed once at load time.
96
+
97
+ **4 — the pick acts.** The loop takes the first ranked pick **not yet tried this round** and calls its bound handler with a `PickContext` (the url, the current response, the hit, and — critically — `ctx.refetch`). The handler does its work and returns either a new `Resp` or `None`. `None` means *this pick could not help*; the loop records that and moves to the next untried pick. A handler that raises is caught, logged in the trace, and treated as no-help — one failing pick never aborts the door.
98
+
99
+ **5 — re-detect and escalate.** If the pick returned a new response, the loop re-runs `detect` on it. If no tumbler fires, the door opened and the result is returned with `opened_by` set to the pick's id. If a tumbler still fires, the loop escalates to the next untried pick and repeats. Because each pick is marked tried, the ladder always makes forward progress and terminates.
100
+
101
+ **6 — termination.** The loop ends when the door opens, when every matching pick has been tried (`exhausted`), or when `max_rounds` is reached. The return value is an `OpenResult(resp, opened, hit, opened_by, trace)`. The `trace` is the round-by-round record — tumbler hit → pick chosen → outcome — so a caller can see exactly how the matrix resolved, which is invaluable both for debugging and for trust.
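The six steps above can be sketched as a small, self-contained loop. This is an illustration under stated assumptions, not the code in `engine.py`: the `drive` function, the `Resp` and `PickContext` stand-ins, the severity numbers, and the fake handlers are all hypothetical. Only the control flow (detect, resolve, apply the first untried pick, re-detect, stop on open, exhausted, or `max_rounds`) follows the description.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Resp:  # hypothetical stand-in for the transport's response object
    status_code: int
    body: str = ""


@dataclass
class PickContext:  # hypothetical shape: url, current response, hit, refetch
    url: str
    resp: Resp
    hit: str
    refetch: Callable[[str], Resp]


@dataclass
class OpenResult:
    resp: Resp
    opened: bool
    hit: Optional[str]
    opened_by: Optional[str]
    trace: list = field(default_factory=list)


def drive(url, fetch_raw, detect, resolve, handlers, max_rounds=6):
    resp = fetch_raw(url)                      # step 1: raw, single attempt
    tried, trace, last_pick = set(), [], None
    for _ in range(max_rounds):
        hit = detect(resp)                     # step 2: which tumbler fired?
        if hit is None:                        # nothing fired: the door is open
            return OpenResult(resp, True, None, last_pick, trace)
        untried = [p for p in resolve(hit) if p not in tried]  # step 3
        if not untried:
            trace.append((hit, None, "exhausted"))
            return OpenResult(resp, False, hit, None, trace)
        pick = untried[0]
        tried.add(pick)
        try:                                   # step 4: the pick acts
            new_resp = handlers[pick](PickContext(url, resp, hit, fetch_raw))
        except Exception as exc:               # a raising pick counts as no-help
            trace.append((hit, pick, f"error: {exc}"))
            continue
        if new_resp is None:
            trace.append((hit, pick, "no-help"))
            continue
        trace.append((hit, pick, "new-response"))
        resp, last_pick = new_resp, pick       # step 5: re-detect next round
    hit = detect(resp)                         # step 6: max_rounds reached
    return OpenResult(resp, hit is None, hit, last_pick if hit is None else None, trace)


# Demo data: tumbler ids come from the docs; severities and predicates are made up.
TUMBLERS = [
    ("reputation.forbidden", 10, lambda r: r.status_code == 403),
    ("challenge.reddit_js", 50, lambda r: r.status_code == 403 and "js_challenge" in r.body),
]


def detect(resp):
    # Highest severity first; the first tumbler that matches wins.
    for tumbler_id, _severity, matches in sorted(TUMBLERS, key=lambda t: -t[1]):
        if matches(resp):
            return tumbler_id
    return None


LADDERS = {"reputation.forbidden": ["header.negotiate", "wrench.swap_egress"]}
HANDLERS = {
    "header.negotiate": lambda ctx: None,             # could not help
    "wrench.swap_egress": lambda ctx: Resp(200, "ok"),  # pretend a new exit worked
}

RESULT = drive(
    "https://example.invalid/",
    fetch_raw=lambda url: Resp(403, "forbidden"),
    detect=detect,
    resolve=lambda hit: LADDERS.get(hit, []),
    handlers=HANDLERS,
)
print(RESULT.opened, RESULT.opened_by, RESULT.trace)
```

The `tried` set is what guarantees termination in this sketch: each round either opens the door, consumes one pick, or reports `exhausted`.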
102
+
103
+ ### 2.2 When the key itself fails
104
+
105
+ If the very first `fetch_raw` raises `FetchError` — meaning every impersonation backend failed, typically a TLS-handshake-level refusal — the loop does **not** crash and does **not** fall back to a plain request. Instead it synthesizes a `cryptographic.handshake` tumbler hit and routes *that* through the same matrix. The cryptographic category's ranked picks are exactly the right response to a refused handshake: rotate the tension wrench (a different impersonation profile = a different JA3) or swap egress (a different exit IP). A key failure is just another tumbler, handled by the same uniform machinery. (If the registry somehow has no cryptographic tumbler to synthesize, the `FetchError` is re-raised honestly rather than swallowed.)
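A minimal sketch of that behaviour, under stated assumptions: `first_attempt`, the `tumblers` mapping, and its `severity` field are hypothetical, and `FetchError` and `TumblerHit` are redefined locally as stand-ins so the block runs on its own. It shows only the two outcomes described above: synthesize a `cryptographic.handshake` hit when the registry has that tumbler, otherwise re-raise.

```python
from dataclasses import dataclass


class FetchError(Exception):
    """Local stand-in for the transport's error type."""


@dataclass
class TumblerHit:
    id: str
    category: str
    severity: int


def first_attempt(url, fetch_raw, detect, tumblers):
    """Return (resp, hit); a failed key becomes a synthetic hit instead of a crash."""
    try:
        resp = fetch_raw(url)
    except FetchError:
        spec = tumblers.get("cryptographic.handshake")
        if spec is None:
            raise                      # nothing to synthesize: surface the error
        hit = TumblerHit("cryptographic.handshake", "cryptographic", spec["severity"])
        return None, hit               # routed through the same matrix as any hit
    return resp, detect(resp)


def refused(url):
    raise FetchError("every impersonation backend failed")


resp, hit = first_attempt(
    "https://example.invalid/",
    refused,
    detect=lambda r: None,
    tumblers={"cryptographic.handshake": {"severity": 90}},   # illustrative value
)
print(hit.id)
```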
106
+
107
+ ---
108
+
109
+ ## 3. The derived matrix
110
+
111
+ The tumbler→pick matrix is the heart of the registry, and its defining property is that it is **derived, never authored**. No human writes "for wall X, try pick A then pick B." The matrix is computed from the picks' declared `targets` and `cost` at load time, and it is never persisted — so it cannot drift out of sync with the data it came from.
112
+
113
+ For each tumbler, every pick that *could* address it is collected and ranked by this key:
114
+
115
+ 1. **Exact-target first.** A pick whose `targets` names this tumbler's id is purpose-built domain knowledge — it is tried before any broad pick. (A pick targeting `cat:<category>` or the wildcard `*` is "broad".)
116
+ 2. **Then cost** (`light` < `moderate` < `heavy`). Among the broad picks, cheapest first — so fast attempts fail fast before the heavy browser render is ever spun up. You don't launch a headless browser to solve something a header negotiation would have cleared.
117
+ 3. **Then latency, specificity, and id** — stable, deterministic tie-breaks, so the same registry always produces the same ladder.
118
+
119
+ This is why the ordering is trustworthy: it is a pure function of declared facts. Add a new, cheaper pick that targets a category, and it slots into *every* wall in that category at the right rung automatically — no matrix edit, no risk of a stale hand-written ladder. The matrix is introspectable at runtime (`get_registry().matrix`), which is how `examples/03_add_a_custom_pick.py` prints each tumbler's ranked ladder without making a single network call.
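The ranking key can be sketched as a plain sort. The `Pick` dataclass, the `latency_ms` field, the reading of "specificity" as category target before wildcard, and the sample pick ids are assumptions made for illustration; only `targets`, `cost`, the `cat:<category>` and `*` forms, and the tie-break order come from the text.

```python
from dataclasses import dataclass

COST_RANK = {"light": 0, "moderate": 1, "heavy": 2}


@dataclass(frozen=True)
class Pick:
    id: str
    targets: tuple
    cost: str
    latency_ms: int = 0            # hypothetical field used for the latency tie-break


def addresses(pick, tumbler_id, category):
    """A pick applies if it names the tumbler, its category, or the wildcard."""
    return (
        tumbler_id in pick.targets
        or f"cat:{category}" in pick.targets
        or "*" in pick.targets
    )


def rank_key(pick, tumbler_id, category):
    exact = 0 if tumbler_id in pick.targets else 1                # exact-target first
    specificity = 0 if f"cat:{category}" in pick.targets else 1   # category before "*"
    return (exact, COST_RANK[pick.cost], pick.latency_ms, specificity, pick.id)


def derive_matrix(tumblers, picks):
    """Map each tumbler id to its ranked pick ids; computed, never stored."""
    return {
        tid: [
            p.id
            for p in sorted(
                (p for p in picks if addresses(p, tid, cat)),
                key=lambda p: rank_key(p, tid, cat),
            )
        ]
        for tid, cat in tumblers.items()
    }


tumblers = {"challenge.reddit_js": "challenge", "reputation.forbidden": "reputation"}
picks = [
    Pick("browser.render", ("*",), "heavy"),
    Pick("headers.negotiate", ("cat:challenge",), "light"),
    Pick("reddit.js_cookie", ("challenge.reddit_js",), "moderate"),
]
matrix = derive_matrix(tumblers, picks)
print(matrix)
```

In this toy registry the moderate-cost exact-target pick still outranks the lighter category pick, and the heavy wildcard pick lands last, which is the ordering the three rules describe.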
120
+
121
+ ---
122
+
123
+ ## 4. The impersonate-only invariant, by construction
124
+
125
+ This is the project's spine, and the architecture is arranged so that the invariant holds **by construction, not by discipline.** The claim is precise: there is no code path, anywhere in the engine, that sends a request without an authentic browser fingerprint. That is true because of *how the modules are wired*, not because every contributor remembers a rule.
126
+
127
+ Here is the argument, layer by layer.
128
+
129
+ **The transport is the only thing that opens a socket.** All outbound HTTP in Skeleton Key funnels through `skeleton_key.transport`. Its fetch functions try the impersonation backends in a fixed cascade and nothing else:
130
+
131
+ ```
132
+ curl_cffi → curl-impersonate CLI wrapper → raise FetchError
133
+ ```
134
+
135
+ Both backends speak an authentic browser TLS/JA3 + HTTP/2 handshake. The cascade only ever escalates *between impersonation backends*. When both fail, it raises `FetchError`; it **never** falls through to `urllib`, `requests`, or plain `curl`. Failing loudly is the designed behavior: a plain-curl request would carry a non-authentic fingerprint that could fail anyway and would also reveal the caller, so the transport refuses to send one. The raw `fetch_raw` cascade has the same three steps, `_via_cffi → _via_cli → FetchError`, and no fourth step that could add a plain path.
136
+
137
+ **Every re-attempt is forced back through that transport.** A pick handler does not get a raw socket or an HTTP client of its own. It receives a `PickContext` whose only fetch capability is `ctx.refetch` — and `ctx.refetch` is a thin closure over `fetch_raw`. So a pick *cannot* make a non-impersonated request even if it wanted to; the only fetch primitive it is handed is the impersonate-only one. This is the structural core: the registry's drive loop fetches via `fetch_raw`, and picks re-fetch via `ctx.refetch` which *is* `fetch_raw`. There is no second door.
138
+
139
+ **The handler contract removes the temptation.** A handler that cannot help returns `None`; it never fabricates a `Resp` and never reaches for an alternate transport. The resilience layer's most invasive tool — the patchright browser — is itself a real browser, so it satisfies the invariant trivially. The contributor starting point (`examples/03_add_a_custom_pick.py`) bakes this in: the `@handler` skeleton a new author copies already routes through `ctx.refetch` and returns `None` on failure.
140
+
141
+ **The boundary is enforced, not just documented.** A static test (`tests/test_impersonate_only.py`) asserts that the transport module does not import any plain-fetch library, and that the cascade raises `FetchError` rather than degrade. Because *all* fetching is concentrated in one module with one cascade, that single check covers the whole engine. You cannot satisfy the test and also smuggle in a plain path, because there is exactly one path to audit.
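One way such a static check could look, as a sketch only: the contents of `tests/test_impersonate_only.py` are not shown here, so the function name and the `FORBIDDEN` set below are assumptions. It parses source text with the standard-library `ast` module and reports any top-level import of a plain-fetch library.

```python
import ast

# Assumed deny-list; the real test may name a different set of modules.
FORBIDDEN = {"urllib", "urllib3", "requests", "httpx"}


def plain_fetch_imports(source):
    """Return the forbidden root modules imported anywhere in `source`."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]      # relative imports have no module name
        else:
            continue
        for name in names:
            root = name.split(".")[0]
            if root in FORBIDDEN:
                found.add(root)
    return found


clean = "from curl_cffi import requests\nimport subprocess\n"
dirty = "import requests\nfrom urllib import request\nimport json\n"
print(plain_fetch_imports(clean), plain_fetch_imports(dirty))
```

Matching on the root module means `from curl_cffi import requests` passes while a bare `import requests` does not. A check this coarse would also flag `urllib.parse`, so a real test would need to decide whether that is acceptable.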
142
+
143
+ The result: impersonate-only is a property of the module graph. To break it you would have to add a brand-new transport and rewire the drive loop and the pick context to use it — a change no honest pick needs and that the test suite and the responsible-use contract both reject.
144
+
145
+ ### 4.1 The auth-gate carve-out (honest, not a loophole)
146
+
147
+ One subtlety preserves honesty rather than violating it. The transport's block detector deliberately carves out **`401` and `407`** as *credential-shaped* responses, not fingerprint walls. A `401 Unauthorized` or `407 Proxy Authentication Required` means the door wants credentials you don't have — rotating a profile or swapping an exit IP will never help, because the wall isn't about *what you look like*, it's about *who you are*. So the engine fails **closed** on these immediately instead of grinding through the whole ladder pretending an identity swap could open an account gate. This is the honesty contract in action: Skeleton Key does not waste your rate budget on a door it structurally cannot open, and it does not pretend an auth gate is a fingerprint problem. (See [honesty-contract.md](honesty-contract.md) for the full fail-closed posture, including human-CAPTCHA.)
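The carve-out amounts to a check that runs before the ladder is consulted. The function name and return labels below are hypothetical; only the two status codes and the fail-closed behaviour come from the text.

```python
CREDENTIAL_SHAPED = frozenset({401, 407})   # auth gates, not fingerprint walls


def next_action(status_code, tumbler_hit=None):
    """Decide what to do with a response before any pick is tried."""
    if status_code in CREDENTIAL_SHAPED:
        return "fail-closed"        # no identity swap can supply missing credentials
    if tumbler_hit is not None:
        return "walk-ladder"        # a fingerprint-shaped wall: try the ranked picks
    return "open"


print(next_action(401), next_action(403, "reputation.forbidden"), next_action(200))
```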
148
+
149
+ ---
150
+
151
+ ## 5. The public entry point
152
+
153
+ Two `open_door` functions exist, and the distinction matters:
154
+
155
+ - **`skeleton_key.open_door`** (the canonical public one) — the **matrix-only** drive loop described in [§2](#2-the-open_door-data-flow). Fetch → detect → resolve → pick → re-detect → escalate. It opens a door by *picking its tumblers*, and it stops there: the real front door or an honest "stuck", no alternate routes. This is what most callers want, and the loop the example below shows.
156
+ - **`skeleton_key.shims.open_door`** — the **shim-aware** loop, a separate function in a separate module. It runs the full registry pipeline first, and only if the picks are genuinely exhausted does it consider a labeled last-resort shim to a sibling door. Reach for it when an honestly-labeled alternate route is acceptable. (Note: these are two different modules — there is no `skeleton_key.registry.open_door`; `registry/` is data-only YAML + JSON.)
157
+
158
+ ```python
159
+ from skeleton_key import open_door
160
+
161
+ r = open_door("https://api.github.com") # a defensible public API
162
+ print(r.opened)        # True if the door opened (depends on the live network)
163
+ print(r.opened_by) # the pick id that cleared it, or None if it was already open
164
+ print(r.resp.status_code)
165
+ ```
166
+
167
+ The ordering — **picks before shims, always** — is an architectural invariant, not a runtime preference. A shim is the last thing tried, it walks to a different URL for the same content, and whatever it returns is stamped with its fidelity class and an as-of timestamp. A non-equivalent route is reported as a structural MISS, never quietly passed off as the real, fresh resource. The shim layer's full equivalence-gating and labelling rules live in [honesty-contract.md](honesty-contract.md).
168
+
169
+ ---
170
+
171
+ ## 6. Honest caveats, per layer
172
+
173
+ An architecture document that only listed strengths would itself violate the honesty contract. Here is what each layer does **not** yet deliver, stated plainly.
174
+
175
+ - **Transport.** The cookie cache is keyed per registered domain and is **shared** across the toolkit's worker threads (read-mostly). It is not a per-identity, isolated cookie jar — so the cache is a point of linkage, and no part of Skeleton Key claims otherwise. The domain key uses a simple registered-domain derivation; exotic multi-label public suffixes are a known sharp edge.
176
+ - **Resilience.** Identity rotation freshens the exit IP, the impersonation profile, and the headers — but because the cookie cache is shared per domain (above), rotation does not yet deliver *cookie-state* unlinkability. All bundled egress sources resolve to **US** exits today, so there is no geographic diversity to claim. The egress options carry honest tradeoffs (a datacenter exit is blocked *more* often than a residential one; some anonymity-network exits are themselves widely blocked) — see [egress.md](egress.md). The identity pool's thread-safety is documented, not assumed.
177
+ - **Registry.** The *format* is novel and designed to be portable by construction — but there is **one** engine (this Python one) today and no second-language implementation, so portability is a design property, not a proven fact. The shipped content is small and honest about it: a handful of tumblers and picks at v0.1, framed as "a novel portable format," never "comprehensive coverage."
178
+ - **Shims & vendor challenges.** Several commercial challenge systems are **detected** but only addressable by the heavy browser pick, which does not reliably open them even headful — they are honestly marked *detected-but-gap*, never "solved." When the picks and the browser both come up short, a shim to a sibling door is the labeled last resort, and a third-party reader/mirror/archive route always carries its intermediary-trust caveat.
179
+
180
+ The measured result Skeleton Key stands behind, and leads with: **22 doors opened where a plain User-Agent-spoofed request failed on every one; 2 doors `BOTH_BLOCKED`** — IP- or auth-gated, not fingerprint-gated, so neither the authentic key nor identity rotation could open them (point-in-time, 2026-05-30; see [benchmarks.md](benchmarks.md)). The TLS/JA3 mimicry is the load-bearing difference on the 22; the 2 are the proof that Skeleton Key does not open *every* door, and would never claim to.
181
+
182
+ ---
183
+
184
+ ## See also
185
+
186
+ - [CONCEPTS.md](CONCEPTS.md) — the lockpicking vocabulary, with a worked example per term.
187
+ - [REGISTRY_FORMAT.md](REGISTRY_FORMAT.md) — the match-DSL, the tumbler/pick shapes, the derived matrix, and the JSON-Schema contract.
188
+ - [honesty-contract.md](honesty-contract.md) — the four invariants and the code locations that enforce each one.
189
+ - [egress.md](egress.md) — the identity-rotation egress catalog and its honest tradeoffs.
190
+ - [benchmarks.md](benchmarks.md) — the dated, caveated results table.
191
+ - [RESPONSIBLE_USE.md](https://github.com/veific/skeleton-key/blob/main/RESPONSIBLE_USE.md) — who Skeleton Key is for, and the access-resilience-not-evasion stance.