@rester159/blacktip 0.2.0 → 0.5.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (53)
  1. package/CHANGELOG.md +222 -0
  2. package/README.md +25 -0
  3. package/dist/akamai-sensor.d.ts +128 -0
  4. package/dist/akamai-sensor.d.ts.map +1 -0
  5. package/dist/akamai-sensor.js +190 -0
  6. package/dist/akamai-sensor.js.map +1 -0
  7. package/dist/behavioral/parsers.d.ts +89 -0
  8. package/dist/behavioral/parsers.d.ts.map +1 -0
  9. package/dist/behavioral/parsers.js +223 -0
  10. package/dist/behavioral/parsers.js.map +1 -0
  11. package/dist/blacktip.d.ts +68 -1
  12. package/dist/blacktip.d.ts.map +1 -1
  13. package/dist/blacktip.js +140 -1
  14. package/dist/blacktip.js.map +1 -1
  15. package/dist/browser-core.d.ts +10 -0
  16. package/dist/browser-core.d.ts.map +1 -1
  17. package/dist/browser-core.js +49 -0
  18. package/dist/browser-core.js.map +1 -1
  19. package/dist/diagnostics.d.ts +31 -0
  20. package/dist/diagnostics.d.ts.map +1 -1
  21. package/dist/diagnostics.js +146 -0
  22. package/dist/diagnostics.js.map +1 -1
  23. package/dist/identity-pool.d.ts +160 -0
  24. package/dist/identity-pool.d.ts.map +1 -0
  25. package/dist/identity-pool.js +288 -0
  26. package/dist/identity-pool.js.map +1 -0
  27. package/dist/index.d.ts +11 -2
  28. package/dist/index.d.ts.map +1 -1
  29. package/dist/index.js +11 -1
  30. package/dist/index.js.map +1 -1
  31. package/dist/tls-rewriter.d.ts +74 -0
  32. package/dist/tls-rewriter.d.ts.map +1 -0
  33. package/dist/tls-rewriter.js +203 -0
  34. package/dist/tls-rewriter.js.map +1 -0
  35. package/dist/tls-side-channel.d.ts +91 -0
  36. package/dist/tls-side-channel.d.ts.map +1 -0
  37. package/dist/tls-side-channel.js +248 -0
  38. package/dist/tls-side-channel.js.map +1 -0
  39. package/dist/types.d.ts +46 -0
  40. package/dist/types.d.ts.map +1 -1
  41. package/dist/types.js.map +1 -1
  42. package/docs/akamai-bypass.md +257 -0
  43. package/docs/akamai-sensor.md +183 -0
  44. package/docs/anti-bot-validation.md +84 -0
  45. package/docs/calibration-validation.md +93 -0
  46. package/docs/identity-pool.md +176 -0
  47. package/docs/tls-rewriting.md +121 -0
  48. package/docs/tls-side-channel.md +83 -0
  49. package/native/tls-client/go.mod +21 -0
  50. package/native/tls-client/go.sum +36 -0
  51. package/native/tls-client/main.go +216 -0
  52. package/package.json +8 -2
  53. package/scripts/fit-cmu-keystroke.mjs +186 -0
@@ -0,0 +1,183 @@
1
+ # Akamai sensor challenge solver (v0.5.0)
2
+
3
+ `bt.solveAkamaiChallenge(url)` is the v0.5.0 answer to "I want to call Akamai-protected APIs from a sessionless TLS daemon, but the first request always 403s because Akamai gates everything behind a sensor data POST." Solve the challenge once in a real browser, get back the validated cookies plus a recommended header set, then replay arbitrary requests via `bt.fetchWithTls()` (or any other HTTP client) for as long as the cookies stay valid. **Empirically validated against OpenTable Akamai Bot Manager: 5/5 replay calls return 200 with real content. ~600ms per replay vs ~4s per browser launch.**
4
+
5
+ ## Why this isn't a pure-Go solver
6
+
7
+ Reverse-engineering Akamai's `bm.js` to generate sensor data without a browser is intentionally hostile work and the maintenance economics are bad:
8
+
9
+ 1. **bm.js is heavily obfuscated.** OpenTable's current sensor JS is 26 KB of hex-encoded string array references with no recognizable function names. References to `gyroscope`, `hardwareConcurrency`, `selenium`, `Chrome`, `vendor`, `ShockwaveFlash` are scattered through it — clearly the sensor collector — but extracting them requires symbolic execution, not just regex.
10
+ 2. **Sensor data is encrypted with a runtime-derived key** that lives inside the obfuscated code. You can't just capture the POST body Chrome sends and replay it — the key changes per session.
11
+ 3. **Akamai rotates the obfuscation monthly.** A pure-Go reimplementation would be a 1–2 week reverse engineering project, and the result would rot in ~6 weeks. Bad ROI.
12
+
13
+ What works instead, and what BlackTip ships in v0.5.0: launch a real BlackTip browser, navigate to the URL, let Akamai's bm.js execute naturally (real Chrome runs the JS, generates the sensor payload, POSTs it back), and capture the validated cookies. The caller then injects those cookies into thousands of sessionless TLS-daemon API calls until they expire.
14
+
15
+ **This is NOT "no browser needed for Akamai." It IS "amortize browser cost across many subsequent API calls instead of paying it per request."** For most use cases that's the same thing — you pay one browser session per hour and run hundreds of API calls in between.
16
+
17
+ ## Quick start
18
+
19
+ ```typescript
20
+ import { BlackTip } from '@rester159/blacktip';
21
+
22
+ const bt = new BlackTip({ logLevel: 'info' });
23
+ await bt.launch();
24
+
25
+ // 1. Solve the Akamai challenge in the browser. ~15s.
26
+ const solved = await bt.solveAkamaiChallenge(
27
+ 'https://www.opentable.com/booking/restref/availability?rid=76651&restref=76651&partySize=2&dateTime=2026-04-11T19:00',
28
+ );
29
+
30
+ console.log('Validated:', solved.validated);
31
+ console.log('Akamai cookies:', solved.cookies.map(c => c.name));
32
+ // → [ 'bm_ss', 'bm_so', 'bm_mi', 'bm_sz', 'ak_bmsc', 'bm_s', 'bm_sv', '_abck' ]
33
+
34
+ // 2. Replay arbitrary requests via the TLS daemon. ~600ms each, no browser.
35
+ for (let i = 0; i < 100; i++) {
36
+ const resp = await bt.fetchWithTls({
37
+ url: 'https://www.opentable.com/api/some-endpoint',
38
+ headers: solved.recommendedHeaders, // Cookie + Sec-Ch-Ua + Sec-Fetch-* baked in
39
+ });
40
+ console.log('Call', i, '→', resp.status);
41
+ }
42
+
43
+ await bt.close();
44
+ ```
45
+
46
+ ## Result shape
47
+
48
+ ```typescript
49
+ interface AkamaiChallengeResult {
50
+ /**
51
+ * True when EITHER:
52
+ * - _abck reached validated state (~0~), OR
53
+ * - The page rendered without an Akamai block (sensor not enforced).
54
+ *
55
+ * Akamai's sensor validation is only enforced when other signals
56
+ * (TLS, IP, behavior) look suspicious. For real-Chrome sessions on
57
+ * residential connections, Akamai often admits the request without
58
+ * ever requiring the JS-layer sensor POST.
59
+ */
60
+ validated: boolean;
61
+
62
+ /**
63
+ * Actual sensor validation state:
64
+ * 0 → validated as human (gold standard)
65
+ * -1 → sensor not enforced (page admitted without it)
66
+ * 1+ → flagged as bot
67
+ * null → no _abck cookie set (target may not be Akamai-protected)
68
+ */
69
+ abckState: -1 | 0 | 1 | null;
70
+
71
+ /** The full _abck cookie value at the end of the wait window. */
72
+ abckValue: string | null;
73
+
74
+ /** Whether the rendered page is the Akamai Access Denied block page. */
75
+ blocked: boolean;
76
+
77
+ /** All Akamai-related cookies, ready to inject into other sessions. */
78
+ cookies: Array<{ name: string; value: string; domain: string; path: string }>;
79
+
80
+ /**
81
+ * Pre-built header set for replay calls. Includes Cookie, User-Agent,
82
+ * Accept, Accept-Language, Sec-Ch-Ua, Sec-Ch-Ua-Mobile, Sec-Ch-Ua-Platform,
83
+ * Sec-Fetch-Dest, Sec-Fetch-Mode, Sec-Fetch-Site, Sec-Fetch-User,
84
+ * Upgrade-Insecure-Requests. Pass directly to `bt.fetchWithTls()`.
85
+ *
86
+ * Replays without these headers will 403 even with valid cookies —
87
+ * Akamai validates the full request shape, not just the cookie jar.
88
+ */
89
+ recommendedHeaders: Record<string, string>;
90
+
91
+ finalUrl: string;
92
+ title: string;
93
+ durationMs: number;
94
+ notes: string[];
95
+ }
96
+ ```
97
+
98
+ ## The cost amortization story
99
+
100
+ OpenTable's Gjelina booking endpoint, measured on a residential connection:
101
+
102
+ | Approach | Cost per call | 100 calls |
103
+ |---|---|---|
104
+ | Browser launch + navigate per call | ~4s | ~400s |
105
+ | `solveAkamaiChallenge` once + 100 daemon replays | 15s + 100×0.6s = 75s | **75s** |
106
+ | **Speedup** | | **~5.3x** |
107
+
108
+ Larger N gets better. At 1,000 calls, it's 15s + 600s = 615s vs 4,000s, about 6.5x. The crossover is at ~5 calls (below that, browser-per-call is faster because the solve overhead dominates).
109
+
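The arithmetic in the table can be reproduced with a small sketch. The three constants are the rounded timings quoted above (~4s per browser launch, ~15s per solve, ~0.6s per replay); the function names are illustrative and not part of the BlackTip API.

```typescript
// Rounded timings taken from the table above (seconds).
const BROWSER_PER_CALL_S = 4;
const SOLVE_ONCE_S = 15;
const REPLAY_PER_CALL_S = 0.6;

// Total time when every call launches a browser.
function browserPerCallSeconds(calls: number): number {
  return calls * BROWSER_PER_CALL_S;
}

// Total time when the challenge is solved once and every call is a daemon replay.
function solveThenReplaySeconds(calls: number): number {
  return SOLVE_ONCE_S + calls * REPLAY_PER_CALL_S;
}

// How many times faster solve-then-replay is for a given call count.
function amortizedSpeedup(calls: number): number {
  return browserPerCallSeconds(calls) / solveThenReplaySeconds(calls);
}

// Smallest call count at which solve-then-replay is strictly faster.
function crossoverCalls(): number {
  return Math.floor(SOLVE_ONCE_S / (BROWSER_PER_CALL_S - REPLAY_PER_CALL_S)) + 1;
}
```

With these inputs the break-even point sits between 4 and 5 calls, which is where the "~5 calls" figure comes from. Real timings will vary per target and connection, so treat the constants as placeholders.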
110
+ ## What "validated" actually means
111
+
112
+ Akamai's sensor validation has three observable states encoded in the second `~`-delimited field of the `_abck` cookie:
113
+
114
+ | `_abck` state | Meaning | What you can do |
115
+ |---|---|---|
116
+ | `~0~` | Sensor data validated as human | Maximum trust — replay anything |
117
+ | `~-1~` | Sensor not enforced (Akamai admitted on other signals) | Replay safely — same as `~0~` for most APIs |
118
+ | `~1~` (or higher) | Sensor flagged as bot | Session burned — solve again with a different identity |
119
+
120
+ The interesting case is `~-1~`. On every empirical test against OpenTable from a residential connection, Akamai admitted the request without ever requiring sensor validation — `_abck` stayed at `~-1~` but the page rendered fine and the cookies worked for daemon replays. This is consistent with Akamai's own marketing: the sensor JS is one layer of a multi-factor decision, and high-confidence requests (good TLS, good IP, real Chrome behavior) get admitted without it.
121
+
122
+ That's why `validated` is `true` for both `~0~` and `~-1~` outcomes — both unlock the replay path.
123
+
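As a rough illustration of the state table above, the following sketch reads the second `~`-delimited field of an `_abck` value and maps it to the three outcomes. `parseAbckState` and `classifyAbck` are hypothetical helpers, not BlackTip exports, and nothing is assumed about the cookie layout beyond what this section states.

```typescript
// Hypothetical helper types; not part of the BlackTip API.
type AbckVerdict = 'validated' | 'not-enforced' | 'flagged' | 'unknown';

// Returns the numeric state from the second `~`-delimited field, or null
// when the cookie is absent or the field is not a number.
function parseAbckState(abckValue: string | null): number | null {
  if (abckValue === null) return null;
  const fields = abckValue.split('~');
  if (fields.length < 2) return null;
  const state = parseInt(fields[1] ?? '', 10);
  return isNaN(state) ? null : state;
}

// Maps the numeric state onto the rows of the table above.
function classifyAbck(abckValue: string | null): AbckVerdict {
  const state = parseAbckState(abckValue);
  if (state === null) return 'unknown';
  if (state === 0) return 'validated';
  if (state === -1) return 'not-enforced';
  return state >= 1 ? 'flagged' : 'unknown';
}
```

Both `'validated'` and `'not-enforced'` correspond to `validated: true` on the solver result; only `'flagged'` means the session is burned.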
124
+ ## Replay headers — why all of them matter
125
+
126
+ The first thing I tried after solving was naive: solve in browser, copy cookies, pass them through `Cookie:` header to the daemon. **It 403'd.** Then I added the full Chrome header set: `Sec-Ch-Ua`, `Sec-Ch-Ua-Mobile`, `Sec-Ch-Ua-Platform`, `Sec-Fetch-Dest`, `Sec-Fetch-Mode`, `Sec-Fetch-Site`, `Sec-Fetch-User`, `Upgrade-Insecure-Requests`. **It returned 200.**
127
+
128
+ Akamai is validating the full request shape, not just the cookie jar. Without the Sec-Fetch-* headers, the request looks like a programmatic fetch and Akamai blocks it even with valid cookies. With them, the request looks like a navigation from a real Chrome and Akamai admits it.
129
+
130
+ The `recommendedHeaders` field on the solver result includes all of these. Don't strip them; pass the whole object to `bt.fetchWithTls({ url, headers: solved.recommendedHeaders })`.
131
+
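If headers pass through other code before the replay, a pre-flight check can catch accidental stripping. This is a minimal sketch: the header names are the ones listed for `recommendedHeaders` in the result shape, while `missingReplayHeaders` itself is a hypothetical helper. The comparison is case-insensitive because the exact key casing BlackTip emits is not documented here.

```typescript
// Header names listed for `recommendedHeaders` in the result shape above.
const REQUIRED_REPLAY_HEADERS: string[] = [
  'Cookie',
  'User-Agent',
  'Accept',
  'Accept-Language',
  'Sec-Ch-Ua',
  'Sec-Ch-Ua-Mobile',
  'Sec-Ch-Ua-Platform',
  'Sec-Fetch-Dest',
  'Sec-Fetch-Mode',
  'Sec-Fetch-Site',
  'Sec-Fetch-User',
  'Upgrade-Insecure-Requests',
];

// Returns the required header names that are absent from `headers`.
// Hypothetical helper; not part of the BlackTip API.
function missingReplayHeaders(headers: Record<string, string>): string[] {
  const present = Object.keys(headers).map((key) => key.toLowerCase());
  return REQUIRED_REPLAY_HEADERS.filter(
    (name) => present.indexOf(name.toLowerCase()) === -1,
  );
}
```

A cookie-only header object, which is the naive attempt described above, would report eleven missing names.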
132
+ ## Combining with IdentityPool
133
+
134
+ If you're running long-lived flows with identity rotation, the natural pattern is to attach the solved Akamai cookies to an IdentityPool snapshot:
135
+
136
+ ```typescript
137
+ import { BlackTip, IdentityPool } from '@rester159/blacktip';
138
+
139
+ const pool = new IdentityPool({ storePath: './.bt/identities.json' });
140
+ const identity = pool.acquire('opentable.com')!;
141
+
142
+ const config = pool.applyToConfig(identity);
143
+ const bt = new BlackTip(config);
144
+ await bt.launch();
145
+ await pool.restoreSnapshot(bt, identity);
146
+
147
+ // Solve Akamai once for this identity
148
+ const solved = await bt.solveAkamaiChallenge('https://www.opentable.com/booking/...');
149
+ if (!solved.validated) {
150
+ pool.markBurned(identity.id, 'Akamai blocked', 'opentable.com');
151
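+ throw new Error('Akamai challenge not validated'); // a top-level `return` is a syntax error in an ES module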
152
+ }
153
+
154
+ // Save the post-solve session state into the identity for next time
155
+ await pool.captureSnapshot(bt, identity);
156
+
157
+ // Run N daemon replays. When _abck eventually expires, re-solve.
158
+ for (let i = 0; i < 100; i++) {
159
+ const resp = await bt.fetchWithTls({
160
+ url: 'https://www.opentable.com/api/...',
161
+ headers: solved.recommendedHeaders,
162
+ });
163
+ // ...
164
+ }
165
+
166
+ await bt.close();
167
+ ```
168
+
169
+ Now your identity is durable: cookies, storage, and solved Akamai state are all persisted and ready to resume later. Once the `_abck` session window lapses (typically ~1 hour, see Limitations), expect to re-solve.
170
+
171
+ ## Limitations
172
+
173
+ - **Still requires a browser to solve.** This is the whole point of the architecture decision documented above. If you need pure-Go API access without ever launching a browser, you're going to write a lot of obfuscation reverse-engineering code that breaks every 6 weeks. v0.5.0 doesn't ship that path.
174
+ - **Cookies expire.** Akamai's session window is typically ~1 hour for the validated state. After that, replays start returning 403 again and you need to re-solve. Re-solving from the same browser context is fast (~3s on subsequent calls because the browser already has the prior state).
175
+ - **`_abck` flagged state means session burned.** If `abckState === 1`, the cookies are useless — Akamai marked you as a bot and even the browser session won't recover. You need a fresh BlackTip launch with a different identity (different proxy, fresh user data dir, possibly different device profile).
176
+ - **Per-domain.** This solver is tested against OpenTable. The pattern works against any Akamai Bot Manager target but the specific URL format and timing may vary. Adjust `dwellMsBeforePolling` and `timeoutMs` per-target.
177
+
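The expiry behavior described above suggests a "re-solve on 403" wrapper. The sketch below is one way to structure it, under the assumption that a 403 on replay means the cookies have lapsed. The `solve` and `replay` callbacks are injected so the logic stays independent of BlackTip; `createResolvingReplayer` and the two interfaces are hypothetical names.

```typescript
// Minimal shapes for this sketch; the real result types carry more fields.
interface SolvedSession {
  validated: boolean;
  recommendedHeaders: Record<string, string>;
}
interface ReplayResult {
  status: number;
}
type SolveFn = () => Promise<SolvedSession>;
type ReplayFn = (url: string, headers: Record<string, string>) => Promise<ReplayResult>;

// Returns a function that replays a URL and re-solves once when it sees a 403.
function createResolvingReplayer(solve: SolveFn, replay: ReplayFn) {
  let session: SolvedSession | null = null;

  // Runs the solver and caches the result; throws if the solve was not validated.
  async function freshSession(): Promise<SolvedSession> {
    const solved = await solve();
    if (!solved.validated) {
      throw new Error('Akamai solve was not validated; rotate to another identity');
    }
    session = solved;
    return solved;
  }

  return async function replayWithResolve(url: string): Promise<ReplayResult> {
    const current = session !== null ? session : await freshSession();
    let resp = await replay(url, current.recommendedHeaders);
    if (resp.status === 403) {
      // Assumed to mean the cookies expired: solve again and retry exactly once.
      const renewed = await freshSession();
      resp = await replay(url, renewed.recommendedHeaders);
    }
    return resp;
  };
}
```

In practice `solve` would wrap `bt.solveAkamaiChallenge(url)` and `replay` would wrap `bt.fetchWithTls({ url, headers })`. A second 403 after re-solving is returned to the caller rather than retried, since it more likely indicates a burned session than an expired one.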
178
+ ## See also
179
+
180
+ - `docs/tls-side-channel.md` — the underlying `bt.fetchWithTls()` daemon
181
+ - `docs/tls-rewriting.md` — the v0.5.0 full-rewriting mode (TLS rewriter intercepts every browser request)
182
+ - `docs/identity-pool.md` — long-running session and identity rotation
183
+ - `docs/akamai-bypass.md` — the v0.2.0 plan that documents Akamai's detection layer stack and the L016 fix that opened the door
@@ -0,0 +1,84 @@
1
+ # Anti-bot validation scoreboard
2
+
3
+ This document records BlackTip's results against commercial anti-bot vendors on real, in-the-wild targets. Every entry is generated by `bt.testAgainstAntiBot(url)`, which both detects challenge/block pages AND captures vendor signals (cookies, scripts) on a passing page so we can prove the target is actually protected — not a false negative on an unprotected URL.
4
+
5
+ Methodology and reproduction recipe at the bottom.
6
+
7
+ ## Live scoreboard — 2026-04-09 (BlackTip 0.2.0)
8
+
9
+ | Target | Vendors detected on page | Block? | Vendor signals on success | Notes |
10
+ |---|---|---|---|---|
11
+ | **vinted.com** | DataDome | pass | datadome cookie, cloudflare script | Real catalog renders with prices, brands, listings |
12
+ | **bestbuy.com** | Akamai | pass | akamai cookie + script | Earlier classification as PerimeterX was wrong — BestBuy is Akamai |
13
+ | **walmart.com** | Akamai + PerimeterX | pass | akamai cookie + script, perimeterx cookie + script | Walmart runs both vendors simultaneously; BlackTip slides past both |
14
+ | **crunchbase.com** | Cloudflare | pass | cloudflare script | Real homepage content visible |
15
+ | **ticketmaster.com** | none detected on homepage | pass | none on `/` | TM only arms PerimeterX on event/checkout pages, not the marketing homepage |
16
+ | **opentable.com** (Gjelina deep link, restref=76651) | Akamai | pass | akamai cookie + script | Real time slots render; Reservation at Gjelina, Apr 11 2026 |
17
+ | **chatgpt.com** | Cloudflare | pass | cloudflare script + cf_clearance cookie | **Cloudflare managed challenge auto-passed**; cf_clearance issued silently |
18
+ | **twitch.tv** | Kasada | pass | kasada script signal | **Kasada validated** — first published BlackTip pass against a real Kasada-armed target |
19
+ | **canadagoose.com** | (no live signals on homepage) | pass | none | Was historically Kasada but the homepage no longer surfaces Kasada cookies/scripts; may be lazy-loaded on cart/checkout |
20
+ | **hyatt.com** | (no live signals on landing page) | pass | none on `/loyalty/en-US` | RT (Akamai mPulse RUM) present, no Bot Manager indicators |
21
+ | **footlocker.com** | (no live signals on homepage) | pass | none | Same as Canada Goose — Kasada may arm only on PDP/cart |
22
+ | **datadome.co/bot-tester** | DataDome (bypassed via marketing redirect) | pass | none captured | Redirected to /signup/ with marketing content; DataDome did not arm a challenge |
23
+ | **antoinevastel.com/bots-vue.html** | none | pass | none | Demo page no longer arms a probe — author moved to a marketing homepage |
24
+ | **nowsecure.nl** (Cloudflare bot fight, nodriver author's benchmark) | none | pass | none on this barebones page | Passes regression — earlier validation in v0.1.0 |
25
+
26
+ ### Cloudflare managed challenge — silent auto-pass evidence
27
+
28
+ The `cf_clearance` cookie is set ONLY after Cloudflare's managed challenge accepts a request as human. Across the v0.3.0 validation run, BlackTip earned `cf_clearance` cookies on multiple Cloudflare-protected domains without ever surfacing a "Just a moment..." interstitial to the user:
29
+
30
+ - `.vinted.com` (cf_clearance httpOnly)
31
+ - `.crunchbase.com` (cf_clearance httpOnly)
32
+ - `.chatgpt.com` (cf_clearance httpOnly + `__cf_bm` + `_cfuvid`)
33
+
34
+ These cookies are not visible to `document.cookie` because they are httpOnly — the v0.2.0 detector missed them. v0.3.0 fixed the detector to read via the BlackTip cookies API, surfacing this evidence. **A `cf_clearance` cookie is the strongest possible proof of a Cloudflare managed-challenge pass on a real Chrome session.**
35
+
36
+ **Eight commercial-detector targets, eight passes.** The most load-bearing validations are Walmart (Akamai + PerimeterX simultaneously) and OpenTable (Akamai's full Bot Manager on a high-value booking endpoint that previously hard-blocked v0.1.0).
37
+
38
+ ## What "passing" means here
39
+
40
+ `bt.testAgainstAntiBot(url)` returns `passed: true` when all of the following hold:
41
+ 1. The page title does not match a known vendor block pattern (Akamai "Access Denied", Cloudflare "Just a moment...", PerimeterX "Press & Hold", DataDome captcha-delivery interstitial, Imperva incident page, etc.)
42
+ 2. The body text does not contain vendor block markers
43
+ 3. Real page content is rendered (verifiable in `bodyPreview`)
44
+
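The three conditions above can be approximated with a simple string check. This is a sketch, not BlackTip's detector: the marker strings are a subset taken from the "Vendors recognised" table, `probePassed` and `PageProbe` are hypothetical names, and "real content rendered" is reduced to a non-empty body, which is far cruder than inspecting `bodyPreview` by eye.

```typescript
// Hypothetical input shape for this sketch.
interface PageProbe {
  title: string;
  bodyText: string;
}

// Block-page title patterns (subset of the vendor table).
const BLOCK_TITLES: string[] = [
  'Access Denied',
  'Just a moment...',
  'Access to this page has been denied',
];

// Block-page body markers (subset of the vendor table).
const BLOCK_BODY_MARKERS: string[] = [
  'errors.edgesuite.net',
  'geo.captcha-delivery.com',
  'cf-browser-verification',
  'Sorry, you have been blocked',
  'Press & Hold',
  'Incapsula incident ID',
];

// Case-insensitive substring match against any needle.
function containsAny(haystack: string, needles: string[]): boolean {
  const lower = haystack.toLowerCase();
  return needles.some((needle) => lower.indexOf(needle.toLowerCase()) !== -1);
}

// True only when no title pattern matches, no body marker matches,
// and the body is non-empty.
function probePassed(page: PageProbe): boolean {
  const titleBlocked = containsAny(page.title, BLOCK_TITLES);
  const bodyBlocked = containsAny(page.bodyText, BLOCK_BODY_MARKERS);
  const hasContent = page.bodyText.trim().length > 0;
  return !titleBlocked && !bodyBlocked && hasContent;
}
```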
45
+ The `vendorSignals` field separately reports what vendor cookies and scripts are present even on a passing page. If a target shows `vendorSignals: []`, either the vendor doesn't arm protection on that URL (Ticketmaster homepage), or BlackTip's signal patterns missed something — note both honestly here.
46
+
47
+ ## Vendors recognised
48
+
49
+ | Vendor | Block-page tells | Cookie tells | Script tells |
50
+ |---|---|---|---|
51
+ | Akamai Bot Manager | title `Access Denied`, body `errors.edgesuite.net`, `Reference #...` | `_abck`, `bm_sz`, `ak_bmsc`, `bm_sv` | `akam/`, `ak.bmpsdk`, `akamaihd.net/sensor` |
52
+ | DataDome | body `geo.captcha-delivery.com`, `datado.me` | `datadome`, `dd_s`, `dd_cookie_test_` | `js.datadome.co`, `datado.me` |
53
+ | Cloudflare Bot Fight / Turnstile | title `Just a moment...`, body `cf-browser-verification`, `Sorry, you have been blocked` | `cf_clearance`, `__cf_bm`, `__cflb` | `challenges.cloudflare.com`, `cdn-cgi/challenge-platform` |
54
+ | HUMAN / PerimeterX | title `Access to this page has been denied`, body `Press & Hold` | `_px*`, `_pxhd` | `perimeterx`, `px-cdn`, `px-captcha`, `human-security` |
55
+ | Imperva / Incapsula | body `Request unsuccessful. Incapsula incident ID` | `visid_incap_`, `incap_ses_` | (mostly server-side) |
56
+ | Kasada | body `kpsdk` | (none captured) | `x-kpsdk-` headers, `ips.js` |
57
+ | Arkose Labs / FunCaptcha | body `client-api.arkoselabs.com`, `funcaptcha` | (none captured) | `client-api.arkoselabs.com` |
58
+
59
+ ## How to reproduce
60
+
61
+ ```bash
62
+ # Terminal 1
63
+ cd /path/to/blacktip
64
+ node dist/cli.js serve
65
+
66
+ # Terminal 2 — single target
67
+ node dist/cli.js send "return await bt.testAgainstAntiBot('https://www.vinted.com/')" --pretty
68
+
69
+ # Or batch:
70
+ node dist/cli.js batch '[
71
+ "return await bt.testAgainstAntiBot(\"https://www.vinted.com/\")",
72
+ "return await bt.testAgainstAntiBot(\"https://www.bestbuy.com/\")",
73
+ "return await bt.testAgainstAntiBot(\"https://www.walmart.com/\")",
74
+ "return await bt.testAgainstAntiBot(\"https://www.crunchbase.com/\")",
75
+ "return await bt.testAgainstAntiBot(\"https://www.opentable.com/booking/restref/availability?rid=76651&restref=76651&partySize=2&dateTime=2026-04-11T19%3A00\")"
76
+ ]'
77
+ ```
78
+
79
+ ## Caveats
80
+
81
+ - **One-shot probes.** This scoreboard records single navigations from a residential IP. Anti-bot vendors profile sessions over time and across requests; a target that passes a one-shot probe may still flag a multi-step automation flow where the behavioral signature looks too clean. The deep validation is the OpenTable booking flow itself, where we drove the form to completion through Akamai's Bot Manager.
82
+ - **Vendor classification can be wrong.** Targets like Walmart run multiple vendors in parallel. The `vendorSignals` field is the source of truth; the "expected vendor" column above is informational.
83
+ - **IP matters.** Run from a residential network or known-clean residential proxy. A datacenter IP will fail every entry on this scoreboard regardless of how good BlackTip's stealth is. Use `bt.checkIpReputation()` first.
84
+ - **The scoreboard goes stale.** Vendors update their detection logic constantly. Re-run on each release and update this doc. If a target moves from pass to fail, file it against the next BlackTip version, capture the failing fingerprint, and dig in.
@@ -0,0 +1,93 @@
1
+ # Behavioral calibration validation (v0.3.0)
2
+
3
+ This document records the result of fitting BlackTip's behavioral profile against the real CMU Keystroke Dynamics dataset (Killourhy & Maxion 2009) and validating the fit against held-out subjects.
4
+
5
+ ## TL;DR
6
+
7
+ The calibrated profile measurably beats BlackTip's canonical `HUMAN_PROFILE` on a held-out subject set:
8
+
9
+ | Metric | Canonical KS distance | Calibrated KS distance | Improvement |
10
+ |---|---|---|---|
11
+ | **Hold time** | 0.4297 | 0.2018 | **53% closer to real humans** |
12
+ | **Flight time** | 0.4811 | 0.4152 | 13.7% closer to real humans |
13
+
14
+ This is the first time the BlackTip behavioral pipeline has been validated end-to-end against a real public dataset. Up through v0.2.0, the engine's parameters were sane defaults; v0.3.0 makes them empirically grounded.
15
+
16
+ ## Methodology
17
+
18
+ 1. **Dataset**: CMU Keystroke Dynamics (`DSL-StrongPasswordData.csv`): 51 subjects each typing the fixed phrase `.tie5Roanl` 50 times in each of 8 sessions (400 reps per subject), for 20,400 total phrase reps.
19
+ 2. **Split**: deterministic 80/20 by subject. 40 subjects (16,000 phrases) → training. 11 subjects (4,400 phrases) → held-out test.
20
+ 3. **Fit**: training set → `fitTypingDynamics()` → empirical hold-time and flight-time distributions plus per-digraph latencies.
21
+ 4. **Compare**: synthesized 5,000 samples from each of (a) BlackTip's canonical `HUMAN_PROFILE` ranges and (b) the fitted `[p5, p95]` ranges. Computed Kolmogorov–Smirnov distance (max empirical CDF gap) against the held-out test set.
22
+ 5. **Report**: lower KS distance → closer to real human distribution.
23
+
24
+ The KS test is the standard non-parametric goodness-of-fit measure. It does not assume any particular distribution shape, which matters here because keystroke timings are right-skewed log-normal-ish, not Gaussian. The improvement ratio is `1 - calibrated / canonical`.
25
+
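For readers who want to check the metric, here is a minimal sketch of the two-sample KS distance (the largest gap between two empirical CDFs) and the improvement ratio defined above. It is an illustrative implementation, not the code in `scripts/fit-cmu-keystroke.mjs`; the function names are assumptions.

```typescript
// Two-sample Kolmogorov-Smirnov distance: the maximum absolute difference
// between the empirical CDFs of `a` and `b`.
function ksDistance(a: number[], b: number[]): number {
  if (a.length === 0 || b.length === 0) {
    throw new Error('ksDistance needs two non-empty samples');
  }
  const xs = a.slice().sort((p, q) => p - q);
  const ys = b.slice().sort((p, q) => p - q);
  let i = 0;
  let j = 0;
  let maxGap = 0;
  while (i < xs.length && j < ys.length) {
    // Advance both cursors past the next smallest value, so ties are
    // counted in both CDFs before the gap is measured.
    const value = Math.min(xs[i], ys[j]);
    while (i < xs.length && xs[i] <= value) i++;
    while (j < ys.length && ys[j] <= value) j++;
    const gap = Math.abs(i / xs.length - j / ys.length);
    if (gap > maxGap) maxGap = gap;
  }
  return maxGap;
}

// Improvement ratio as defined above: 1 - calibrated / canonical.
function improvementRatio(canonicalKs: number, calibratedKs: number): number {
  return 1 - calibratedKs / canonicalKs;
}
```

Plugging in the table values, `improvementRatio(0.4297, 0.2018)` is roughly 0.53 and `improvementRatio(0.4811, 0.4152)` is roughly 0.137, matching the 53% and 13.7% figures.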
26
+ ## Fitted parameters
27
+
28
+ ```
29
+ Hold time:
30
+ mean = 90.3 ms
31
+ p5 = 48.3 ms
32
+ p50 = 85.8 ms
33
+ p95 = 148.8 ms
34
+
35
+ Flight time:
36
+ mean = 151.4 ms
37
+ p5 = 0.0 ms (some adjacent keystrokes overlap — concurrent press/release)
38
+ p50 = 91.3 ms
39
+ p95 = 513.5 ms
40
+
41
+ Digraphs fit: 6 (the unique a–z transitions in the phrase)
42
+ ```
43
+
44
+ The fitted profile is saved to `data/cmu-keystroke/calibrated-profile.json` and ready to load:
45
+
46
+ ```typescript
47
+ import calibrated from './data/cmu-keystroke/calibrated-profile.json' with { type: 'json' };
48
+ import { BlackTip } from '@rester159/blacktip';
49
+
50
+ const bt = new BlackTip({
51
+ behaviorProfile: calibrated.profileConfig,
52
+ // ... rest of your config
53
+ });
54
+ ```
55
+
56
+ ## Why this matters
57
+
58
+ Behavioral biometrics services (BioCatch, NuData, SecuredTouch) profile users on dimensions like:
59
+
60
+ - Hold time mean and variance per key
61
+ - Flight time distributions per digraph
62
+ - Tap pressure (mobile only — n/a here)
63
+ - Mouse curvature, click dwell, scroll deceleration
64
+
65
+ A bot that types with uniform 100 ms holds and flat flight times stands out instantly because real humans have right-skewed log-normal distributions with subject-specific clustering. BlackTip's canonical `HUMAN_PROFILE` was already in the right ballpark, but the empirically-fitted hold-time range `[48, 149]` cuts the KS distance to the real distribution by 53% relative to the canonical `[50, 200]` (0.4297 → 0.2018). The fitted range is tighter and centered correctly, so BlackTip's keystroke output now sits inside the real human distribution rather than scattered across a too-wide canonical range.
66
+
67
+ ## Reproducing the result
68
+
69
+ ```bash
70
+ cd /path/to/blacktip
71
+ mkdir -p data/cmu-keystroke
72
+ curl -fsSL -o data/cmu-keystroke/DSL-StrongPasswordData.csv \
73
+ https://www.cs.cmu.edu/~keystroke/DSL-StrongPasswordData.csv
74
+ npm run build
75
+ node scripts/fit-cmu-keystroke.mjs
76
+ ```
77
+
78
+ The script writes its output to `data/cmu-keystroke/calibrated-profile.json` and prints the validation table to stdout. Re-runs are deterministic (the train/test split is sorted, not random) so the numbers match this document byte-for-byte.
79
+
80
+ ## What this does NOT prove
81
+
82
+ - The KS test compares marginal distributions, not joint ones. A profile that matches the marginals perfectly could still have unrealistic correlation structure (e.g. correct hold times but uncorrelated with flight times). A real biometrics test against a commercial service would catch this; we don't have one.
83
+ - The CMU dataset is 400 reps (50 in each of 8 sessions) of one fixed phrase from each of 51 American English typists. The fitted profile generalises best to American English long-form typing; non-Latin scripts and very short fields may need a different calibration.
84
+ - Flight time fit improvement (13.7%) is much smaller than hold time (53%). The CMU phrase is short and contains transitions that aren't representative of free-text typing — the held-out flights span a wide range that the canonical `[80, 150]` and fitted `[0, 514]` are both bad fits for. A larger free-text dataset (e.g. Buffalo or GREYC) would likely produce a better flight fit. Future work.
85
+
86
+ ## Future calibration sources
87
+
88
+ Once a parser exists for each, the same pipeline applies:
89
+
90
+ - **Balabit Mouse Dynamics Challenge** — for `fitMouseDynamics()`. Parser exists in `parseBalabitMouseCsv()`; needs an actual fit run against the real dataset.
91
+ - **GREYC-NISLAB** — free-text keystroke dynamics from 110 subjects. Better representative coverage than CMU's fixed phrase.
92
+ - **Buffalo Free-Text** — multi-session keystroke data across 148 subjects. The canonical reference for keystroke behavioral biometrics literature.
93
+ - Your own telemetry — `parseGenericTelemetryJson()` accepts the normalized `MouseMovement` / `TypingSession` shapes directly. Bring your own data.
@@ -0,0 +1,176 @@
1
+ # IdentityPool — long-running session and identity rotation (v0.4.0)
2
+
3
+ `IdentityPool` is BlackTip's answer to the question "how do I rotate across many identities cleanly without my whole flow looking like one bot retried under different IPs?" An identity is the union of everything that makes a session look like one specific human: cookies, localStorage, proxy, device profile, behavior profile, locale, timezone. The pool persists to a JSON file so identities survive restarts, and each identity has a per-domain burn list so an identity blocked on opentable.com is still eligible for amazon.com.
4
+
5
+ ## When you need this
6
+
7
+ Most BlackTip flows do not need an IdentityPool. A single launch with the right device profile and a residential connection covers the common case. The pool earns its keep when:
8
+
9
+ 1. You're running many flows against the same target and need to look like many different users (price scraping, market research, multi-account ops on services where multi-account is allowed).
10
+ 2. You want resilience: when identity A gets blocked on opentable.com, you want identity B to take over without manual intervention.
11
+ 3. You want session persistence across process restarts so a logged-in identity from yesterday is still logged in today.
12
+ 4. You want a feedback loop: when an identity gets burned, the proxy bound to it should be marked dirty in `ProxyPool` so it isn't reused for the same target until the ban window decays.
13
+
14
+ ## Composition
15
+
16
+ `IdentityPool` does not reinvent persistence or proxy selection. It composes:
17
+
18
+ - **`SnapshotManager`** for cookies + localStorage + sessionStorage. The pool calls `captureSnapshot(bt, identity)` after a successful flow to save state.
19
+ - **`ProxyPool`** for proxy selection and ban tracking. New identities draw a proxy from the pool at creation time. When an identity is burned per-domain, the pool reports a ban on that proxy/domain pair so future selections skip it.
20
+ - **`BlackTipConfig`** is produced by `pool.applyToConfig(identity)` and passed to `new BlackTip(config)`.
21
+
22
+ ## Quick start
23
+
24
+ ```typescript
25
+ import { BlackTip, IdentityPool, ProxyPool, ProxyProviders } from '@rester159/blacktip';
26
+
27
+ // 1. Build a ProxyPool from whatever provider you use.
28
+ const proxyPool = new ProxyPool([
29
+ ProxyProviders.brightData('your-customer-id', 'your-password', 'residential'),
30
+ ProxyProviders.oxylabs('your-username', 'your-password'),
31
+ ]);
32
+
33
+ // 2. Build the IdentityPool, backed by a JSON file on disk.
34
+ const pool = new IdentityPool({
35
+ storePath: './.blacktip/identities.json',
36
+ proxyPool,
37
+ rotationPolicy: {
38
+ maxUses: 50, // burn after 50 uses
39
+ maxAgeMs: 7 * 24 * 60 * 60 * 1000, // burn after 7 days idle
40
+ },
41
+ });
42
+
43
+ // 3. First time only: seed the pool with N identities. Subsequent runs
44
+ // load from the store file.
45
+ if (pool.size() === 0) {
46
+ for (let i = 0; i < 5; i++) {
47
+ pool.add({
48
+ deviceProfile: i % 2 === 0 ? 'desktop-windows' : 'desktop-macos',
49
+ label: `identity-${i + 1}`,
50
+ locale: 'en-US',
51
+ timezone: 'America/New_York',
52
+ });
53
+ }
54
+ }
55
+
56
+ // 4. For each flow: acquire, launch, run, capture, release.
57
+ const identity = pool.acquire('opentable.com');
58
+ if (!identity) throw new Error('No eligible identity for opentable.com — pool exhausted');
59
+
60
+ const config = pool.applyToConfig(identity, { logLevel: 'info', timeout: 15_000 });
61
+ const bt = new BlackTip(config);
62
+ await bt.launch();
63
+
64
+ // Restore the identity's prior session (cookies, storage). No-op if first use.
65
+ await pool.restoreSnapshot(bt, identity);
66
+
67
+ try {
68
+ await bt.navigate('https://www.opentable.com/');
69
+ await bt.waitForStable();
70
+ // ... rest of the flow
71
+
72
+ // On success, save the updated session state back into the identity.
73
+ await pool.captureSnapshot(bt, identity);
74
+ } catch (err) {
75
+ // On failure, mark this identity burned for this domain. The proxy
76
+ // gets banned in ProxyPool too, so the next identity drawn from the
77
+ // pool won't reuse the same proxy on this target.
78
+ pool.markBurned(identity.id, err instanceof Error ? err.message : String(err), 'opentable.com');
79
+ } finally {
80
+ await bt.close();
81
+ }
82
+ ```
83
+
84
+ ## API
85
+
86
+ ### `new IdentityPool(options)`
87
+
88
+ ```typescript
89
+ {
90
+ storePath: string; // required — JSON file path
91
+ proxyPool?: ProxyPool; // optional — for proxy binding & feedback
92
+ rotationPolicy?: {
93
+ maxUses?: number; // default: Infinity
94
+ maxAgeMs?: number; // default: Infinity
95
+ preferLeastRecentlyUsed?: boolean; // default: true
96
+ };
97
+ }
98
+ ```
99
+
100
+ ### `add(init)` → `Identity`
101
+
102
+ Create a new identity. `deviceProfile` is required. If a `proxyPool` was supplied to the IdentityPool and `proxy` is omitted, the pool draws one. Auto-saves to disk.
103
+
104
+ ### `acquire(domain?)` → `Identity | null`
105
+
106
+ Pick an identity for use. Skips identities burned on the requested domain. Applies rotation policy: identities exceeding `maxUses` or `maxAgeMs` are auto-burned. Returns null if no eligible identity exists.
107
+
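The selection rules described for `acquire` can be sketched as a pure function. This is an illustration of the documented behavior, not the shipped implementation. Several details are assumptions and are marked as such in the comments: age is measured from `createdAt`, "exceeding `maxUses`" is read as `useCount >= maxUses`, and timestamps are plain epoch milliseconds rather than the ISO strings used in the store file.

```typescript
// Simplified identity shape for this sketch (timestamps in epoch ms).
interface PoolIdentity {
  id: string;
  createdAt: number;
  lastUsedAt: number | null;
  useCount: number;
  burnedAt: number | null;
  burnedDomains: string[];
}

interface PoolRotationPolicy {
  maxUses: number;
  maxAgeMs: number;
}

// Hypothetical stand-in for `pool.acquire(domain)`.
function pickIdentity(
  identities: PoolIdentity[],
  policy: PoolRotationPolicy,
  now: number,
  domain?: string,
): PoolIdentity | null {
  const eligible: PoolIdentity[] = [];
  for (const candidate of identities) {
    if (candidate.burnedAt !== null) continue; // fully burned
    // Assumption: age counts from creation and use limit is inclusive.
    const tooOld = now - candidate.createdAt > policy.maxAgeMs;
    const overUsed = candidate.useCount >= policy.maxUses;
    if (tooOld || overUsed) {
      candidate.burnedAt = now; // auto-burn under the rotation policy
      continue;
    }
    if (domain !== undefined && candidate.burnedDomains.indexOf(domain) !== -1) {
      continue; // burned for this domain only
    }
    eligible.push(candidate);
  }
  if (eligible.length === 0) return null;
  // Prefer the least recently used identity; never-used identities go first.
  eligible.sort((a, b) => (a.lastUsedAt ?? 0) - (b.lastUsedAt ?? 0));
  const chosen = eligible[0];
  chosen.useCount += 1;
  chosen.lastUsedAt = now;
  return chosen;
}
```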
108
+ ### `markBurned(id, reason, domain?)` → `boolean`
109
+
110
+ Mark an identity burned. With `domain`, only burns for that domain (per-domain burn list). Without `domain`, fully burns the identity. Per-domain burns also report a proxy ban back to `ProxyPool` if one is wired up.
111
+
112
+ ### `clearBurn(id, domain?)` → `boolean`
113
+
114
+ Manually unban. Useful when you know the burn was a transient issue.
115
+
116
+ ### `applyToConfig(identity, baseConfig?)` → `BlackTipConfig`
117
+
118
+ Build a `BlackTipConfig` from an identity. Sets `deviceProfile`, `behaviorProfile`, `locale`, `timezone`, and `proxy` (URL-formatted via `proxyToUrl`). Other base config fields pass through unchanged.
119
+
120
+ ### `restoreSnapshot(bt, identity)` → `Promise<void>`
121
+
122
+ After `bt.launch()`, apply the identity's saved cookies + localStorage to the running browser. No-op if the identity has no snapshot yet.
123
+
124
+ ### `captureSnapshot(bt, identity)` → `Promise<void>`
125
+
126
+ Save the current BlackTip session state into the identity. Call after a successful flow so the next acquire of this identity starts from a known-good logged-in state.
127
+
128
+ ### `list()`, `available()`, `size()`, `remove(id)`
129
+
130
+ Standard inspection. `available()` returns identities not fully burned (per-domain burns don't count).
131
+
132
+ ## Persistence format
133
+
134
+ The store file is plain JSON with a schema version. Sample:
135
+
136
+ ```json
137
+ {
138
+ "version": 1,
139
+ "savedAt": "2026-04-10T22:30:00.000Z",
140
+ "identities": [
141
+ {
142
+ "id": "1f2e3d4c-...",
143
+ "label": "identity-1",
144
+ "createdAt": "2026-04-10T20:00:00.000Z",
145
+ "lastUsedAt": "2026-04-10T22:25:00.000Z",
146
+ "useCount": 12,
147
+ "burnedAt": null,
148
+ "burnedReason": null,
149
+ "burnedDomains": ["sears.com"],
150
+ "snapshot": { /* SessionSnapshot */ },
151
+ "proxy": { "id": "brightdata-residential", "...": "..." },
152
+ "behaviorProfile": "human",
153
+ "deviceProfile": "desktop-windows",
154
+ "locale": "en-US",
155
+ "timezone": "America/New_York"
156
+ }
157
+ ]
158
+ }
159
+ ```
160
+
161
+ The schema is versioned so future migrations are explicit. Don't hand-edit the file while a process is reading it — use the API.
162
+
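A loader that respects the schema version might look like the sketch below. It assumes only the top-level fields shown in the sample (`version`, `savedAt`, `identities`); `parseIdentityStore` is a hypothetical helper, and how BlackTip itself handles unknown versions is not documented here.

```typescript
// Top-level shape of the store file, per the sample above.
interface IdentityStoreFile {
  version: number;
  savedAt: string;
  identities: unknown[];
}

// Parses the store JSON and refuses any schema version other than 1.
// Hypothetical helper; not part of the BlackTip API.
function parseIdentityStore(json: string): IdentityStoreFile {
  const data = JSON.parse(json) as Partial<IdentityStoreFile>;
  if (data.version !== 1) {
    throw new Error('Unsupported identity store version: ' + String(data.version));
  }
  if (!Array.isArray(data.identities)) {
    throw new Error('Identity store is missing the "identities" array');
  }
  return {
    version: 1,
    savedAt: String(data.savedAt ?? ''),
    identities: data.identities,
  };
}
```

Failing loudly on an unknown version keeps a newer store file from being silently truncated by an older reader.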
163
+ ## IP reputation gate
164
+
165
+ v0.4.0 also adds `BlackTipConfig.requireResidentialIp`. When set, BlackTip runs `bt.checkIpReputation()` immediately after `launch()` and either warns or throws based on the verdict:
166
+
167
+ ```typescript
168
+ new BlackTip({
169
+ // 'throw': refuse to launch if egress IP is on a known datacenter ASN.
170
+ // 'warn': log a warning but allow the launch.
171
+ // false / unset: no check.
172
+ requireResidentialIp: 'throw',
173
+ });
174
+ ```
175
+
176
+ Use `'throw'` in production / CI where a flagged IP would burn a real account. Use `'warn'` for local dev. Combine with `IdentityPool` and `ProxyPool` to ensure every launch goes through a residential exit before touching the target.