npm - @rester159/blacktip - Versions diffs - 0.2.0 → 0.4.0 - Mend

@rester159/blacktip 0.2.0 → 0.4.0


Files changed (39)

package/CHANGELOG.md +190 -0
package/README.md +21 -0
package/dist/behavioral/parsers.d.ts +89 -0
package/dist/behavioral/parsers.d.ts.map +1 -0
package/dist/behavioral/parsers.js +223 -0
package/dist/behavioral/parsers.js.map +1 -0
package/dist/blacktip.d.ts +34 -1
package/dist/blacktip.d.ts.map +1 -1
package/dist/blacktip.js +105 -1
package/dist/blacktip.js.map +1 -1
package/dist/diagnostics.d.ts +31 -0
package/dist/diagnostics.d.ts.map +1 -1
package/dist/diagnostics.js +146 -0
package/dist/diagnostics.js.map +1 -1
package/dist/identity-pool.d.ts +160 -0
package/dist/identity-pool.d.ts.map +1 -0
package/dist/identity-pool.js +288 -0
package/dist/identity-pool.js.map +1 -0
package/dist/index.d.ts +7 -2
package/dist/index.d.ts.map +1 -1
package/dist/index.js +7 -1
package/dist/index.js.map +1 -1
package/dist/tls-side-channel.d.ts +82 -0
package/dist/tls-side-channel.d.ts.map +1 -0
package/dist/tls-side-channel.js +241 -0
package/dist/tls-side-channel.js.map +1 -0
package/dist/types.d.ts +15 -0
package/dist/types.d.ts.map +1 -1
package/dist/types.js.map +1 -1
package/docs/akamai-bypass.md +257 -0
package/docs/anti-bot-validation.md +84 -0
package/docs/calibration-validation.md +93 -0
package/docs/identity-pool.md +176 -0
package/docs/tls-side-channel.md +83 -0
package/native/tls-client/go.mod +21 -0
package/native/tls-client/go.sum +36 -0
package/native/tls-client/main.go +216 -0
package/package.json +8 -2
package/scripts/fit-cmu-keystroke.mjs +186 -0

package/docs/akamai-bypass.md ADDED

@@ -0,0 +1,257 @@
+# Defeating Akamai Bot Manager
+> Status as of v0.2.0: **passing on the User-Agent / Sec-Ch-Ua consistency layer** that previously blocked us. Validated against OpenTable (which uses Akamai Bot Manager). Future detection layers (sensor data, behavioral biometrics, IP reputation) are tracked below as the next areas to harden.
+This is the BlackTip team's working plan against Akamai Bot Manager, the most layered commercial anti-bot service in the wild. It's structured so you can use it as a reference whether you're a contributor improving BlackTip or a user diagnosing why a specific Akamai-protected target isn't working for you.
+## What Akamai Bot Manager actually is
+Akamai Bot Manager runs a stack of detection layers, scored independently and combined into a "bot probability" that decides whether you get the page, get a JavaScript challenge, or get blocked at the edge with `Access Denied`. The layers, in the order they fire:
+1. **TCP/IP layer** — IP reputation database. Datacenter ASNs (AWS, GCP, OVH, DigitalOcean) flagged automatically. Residential IPs scored by historical bot behavior on the same /24 block. Tor exit nodes blocked outright. **Cheapest signal, runs first.**
+2. **TLS layer** — JA3, JA4, GREASE position and rotation pattern, cipher ordering, extension ordering, signature algorithms, EC curves, ALPN. Akamai is one of the few that checks GREASE *position* (Chrome puts GREASE first in both ciphers and extensions).
+3. **HTTP/2 layer** — Akamai's own passive fingerprint format, `SETTINGS|WINDOW_UPDATE|PRIORITY|PSEUDO_HEADER_ORDER`. It tracks the SETTINGS frame as `id:value` pairs (HEADER_TABLE_SIZE, ENABLE_PUSH, INITIAL_WINDOW_SIZE, MAX_FRAME_SIZE, MAX_HEADER_LIST_SIZE, ...), the WINDOW_UPDATE increment, PRIORITY frame patterns, and pseudo-header order. Chrome's pseudo-header order is `m,a,s,p` (method/authority/scheme/path).
+4. **HTTP header layer** — header order, presence and consistency of `Sec-Fetch-*`, `Sec-Ch-Ua-*`, `Accept-Language`, `Accept-Encoding`, `User-Agent`. **This is where v0.1.0 was being caught.** See L016 below.
+5. **Sensor data (the JavaScript challenge)** — Akamai injects a script that collects ~80 browser signals (mouse traces, keystroke timings, performance.now() resolution, Battery API, screen properties, WebGL info, canvas hash, audio fingerprint, plugins, fonts, timezone math, navigator properties) and POSTs them as a 30–50 KB blob to `/akam/11/...`. The server validates the blob and either sets a valid `_abck` cookie or marks the session as a bot. **All subsequent requests need a valid `_abck` cookie.**
+6. **Cookie continuity** — `_abck`, `bm_sz`, `bm_sv`, `ak_bmsc`. They expire, rotate, and need session affinity. Sessions that don't carry the cookies properly are flagged on the next request.
+7. **Behavioral patterns** — after passing the initial gate, Akamai still profiles mouse dynamics, keystroke flight times, scroll patterns, and click timing distributions. Bot-like distributions get reclassified as bots even after passing the initial probe.
+If your block happens **before any JavaScript runs** (you see the `Access Denied` page directly with a `Reference #...` and `errors.edgesuite.net` URL), Akamai flagged you at one of layers 1–4. The sensor never executed.
+If your block happens **after the page partially loads** or you get a CAPTCHA challenge, you made it past layers 1–4 but the sensor data validation failed.
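To make the HTTP/2 layer concrete, here is a sketch that splits an Akamai-format fingerprint string into its four pipe-separated fields. An example input is the Chrome string `1:65536;2:0;4:6291456;6:262144|15663105|0|m,a,s,p` quoted later in this doc. The sketch assumes the `SETTINGS|WINDOW_UPDATE|PRIORITY|PSEUDO_HEADER_ORDER` layout. `parseAkamaiH2` and `SETTINGS_NAMES` are hypothetical names, not BlackTip APIs. The SETTINGS identifiers are the ones defined by the HTTP/2 specification.

```typescript
// Sketch: decode an Akamai-style HTTP/2 fingerprint string.
// Assumed layout: SETTINGS|WINDOW_UPDATE|PRIORITY|PSEUDO_HEADER_ORDER
interface Http2Fingerprint {
  settings: Map<number, number>; // SETTINGS identifier -> value, in the order sent
  windowUpdate: number;          // connection-level WINDOW_UPDATE increment (0 = none)
  priority: string;              // PRIORITY frame summary ("0" = none sent)
  pseudoHeaderOrder: string[];   // e.g. ["m", "a", "s", "p"]
}

// SETTINGS identifiers defined by the HTTP/2 spec (RFC 9113, section 6.5.2).
const SETTINGS_NAMES: Record<number, string> = {
  1: 'HEADER_TABLE_SIZE',
  2: 'ENABLE_PUSH',
  3: 'MAX_CONCURRENT_STREAMS',
  4: 'INITIAL_WINDOW_SIZE',
  5: 'MAX_FRAME_SIZE',
  6: 'MAX_HEADER_LIST_SIZE',
};

function parseAkamaiH2(fingerprint: string): Http2Fingerprint {
  const fields = fingerprint.split('|');
  if (fields.length !== 4) {
    throw new Error(`expected 4 '|'-separated fields, got ${fields.length}`);
  }
  const [settingsField, windowField, priorityField, pseudoField] = fields;

  const settings = new Map<number, number>();
  for (const pair of settingsField.split(';')) {
    const [idText, valueText] = pair.split(':');
    const id = Number(idText);
    const value = Number(valueText); // Number(undefined) is NaN, so a missing ':' is caught below
    if (Number.isNaN(id) || Number.isNaN(value)) {
      throw new Error(`malformed SETTINGS pair: ${pair}`);
    }
    settings.set(id, value);
  }

  return {
    settings,
    windowUpdate: Number(windowField),
    priority: priorityField,
    pseudoHeaderOrder: pseudoField.split(','),
  };
}

// Usage: decode Chrome's fingerprint and print each SETTINGS entry by name.
const chrome = parseAkamaiH2('1:65536;2:0;4:6291456;6:262144|15663105|0|m,a,s,p');
for (const [id, value] of chrome.settings) {
  console.log(`${SETTINGS_NAMES[id] ?? `UNKNOWN_${id}`} = ${value}`);
}
console.log(chrome.pseudoHeaderOrder.join(',')); // m,a,s,p
```

Reading the decoded SETTINGS by name makes it easier to see which field diverges when two fingerprints differ.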
+## Why v0.1.0 was blocked
+When the BlackTip team first ran v0.1.0 against OpenTable in development, every request was rejected at the edge with Akamai's `Access Denied` page. We spent 30 minutes ruling out hypotheses one by one:
+- **TLS fingerprint:** Captured via `tls.peet.ws/api/all`. Result: **byte-perfect match** for real Chrome 125 on Windows. JA4 `t13d1516h2_8daaf6152771_d8a2da3f94cd`, GREASE in position 0 of both ciphers and extensions, 16 ciphers, 18 extensions (JA4's `1516` counts exclude GREASE values, hence 15 and 16). Not the issue.
+- **HTTP/2 fingerprint:** Akamai HTTP/2 string `1:65536;2:0;4:6291456;6:262144|15663105|0|m,a,s,p`. **Byte-perfect match** for real Chrome. Not the issue.
+- **IP reputation:** Residential Frontier Communications IP in Los Angeles, not on a known datacenter ASN. A plausible suspect, but we couldn't confirm it with free tools.
+- **HTTP headers:** Captured via `httpbin.org/headers`. **FOUND IT.**
+The `httpbin.org/headers` capture showed:
+```
+User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) ... Chrome/125.0.0.0 ...
+Sec-Ch-Ua: "Chromium";v="146", "Not-A.Brand";v="24", "Google Chrome";v="146"
+```
+**Chrome/125 in `User-Agent` but Chrome/146 in `Sec-Ch-Ua`.** Real Chrome never sends these out of sync. Akamai treats the mismatch as a textbook spoofing tell: it doesn't even need to run JavaScript, because this header pair alone is enough.
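The cross-check that catches this needs no JavaScript; comparing the two headers is enough. Below is a minimal sketch of such a check, assuming Chrome-style headers. `uaMatchesClientHints` and its helpers are hypothetical names. This is not necessarily how BlackTip computes `uaConsistent`. It also compares only the major version, not the platform or mobile hints.

```typescript
// Sketch: does the Chrome major version in User-Agent agree with Sec-Ch-Ua?
// Hypothetical helpers, not BlackTip's uaConsistent implementation.

function chromeMajorFromUserAgent(userAgent: string): number | null {
  // Chrome UA strings carry the version as "Chrome/<major>.<minor>.<build>.<patch>".
  const match = /Chrome\/(\d+)\./.exec(userAgent);
  return match ? Number(match[1]) : null;
}

function chromeMajorFromSecChUa(secChUa: string): number | null {
  // Sec-Ch-Ua is a list of "Brand";v="<major>" entries, including a GREASE brand
  // such as "Not-A.Brand" that must be ignored.
  const brandPattern = /"([^"]+)";\s*v="(\d+)"/g;
  let match: RegExpExecArray | null;
  while ((match = brandPattern.exec(secChUa)) !== null) {
    const brand = match[1];
    if (brand === 'Google Chrome' || brand === 'Chromium') {
      return Number(match[2]);
    }
  }
  return null;
}

function uaMatchesClientHints(userAgent: string, secChUa: string): boolean {
  const fromUa = chromeMajorFromUserAgent(userAgent);
  const fromHints = chromeMajorFromSecChUa(secChUa);
  return fromUa !== null && fromUa === fromHints;
}

// Usage: an illustrative Windows Chrome UA template; only the major version varies.
const secChUa = '"Chromium";v="146", "Not-A.Brand";v="24", "Google Chrome";v="146"';
const uaFor = (major: number): string =>
  `Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/${major}.0.0.0 Safari/537.36`;
console.log(uaMatchesClientHints(uaFor(125), secChUa)); // false (the v0.1.0 mismatch)
console.log(uaMatchesClientHints(uaFor(146), secChUa)); // true
```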
+### Root cause
+`browser-core.ts` was setting `userAgent` at the Playwright context level via `newContext({userAgent: ...})`. Playwright's `userAgent` option overrides the `User-Agent` HTTP header value, but it does NOT update the `Sec-Ch-Ua` / `Sec-Ch-Ua-Mobile` / `Sec-Ch-Ua-Platform` client hint headers. Those come from the Chromium binary that is actually running: Chromium 146 when using the build patchright bundles, or the installed Chrome Stable version when launched with `channel: 'chrome'`.
+The result: BlackTip was broadcasting "I am Chrome 125 (UA) but also Chrome 146 (client hints)" to every site since v0.1.0. Detectors that don't cross-check (CreepJS, bot.sannysoft, browserleaks) didn't notice. Detectors that do (Akamai, DataDome, PerimeterX) flagged it instantly.
+### The v0.2.0 fix (L016)
+Remove the `userAgent` context override entirely. Let real Chrome's natural User-Agent come through. UA and Sec-Ch-Ua match because they come from the same source (the actual Chromium binary).
+```typescript
+// browser-core.ts
+this.context = await this.browser.newContext({
+  viewport: {...},
+  // userAgent: this.deviceProfile.userAgent,   // ← REMOVED in v0.2.0
+  locale: this.config.locale,
+  timezoneId: this.config.timezone,
+  ...
+});
+```
+**Result:** OpenTable's Akamai Bot Manager went from blocking us at the edge to letting us into the booking flow on the very next request. Same machine, same network, same IP — only the UA override removed.
+### Side effect: cross-platform UA spoofing is no longer supported
+Previously you could declare a `desktop-macos` device profile while running on Linux and BlackTip would set the User-Agent to a macOS Chrome string. That doesn't work in v0.2.0 — your reported UA matches the actual Chrome binary on the host machine.
+If you need cross-platform spoofing, you have to override BOTH the User-Agent header AND all Sec-Ch-Ua-* headers in lockstep using `setExtraHTTPHeaders`. v0.2.0 doesn't ship a helper for this; v0.3.0 will.
+For most production use cases, you want Chrome-on-your-platform anyway, so this isn't a meaningful loss.
+## The phased response plan against Akamai
+This is the BlackTip team's running plan against Akamai's full layer stack. Phases marked DONE shipped in the version noted; phases marked NEXT are the team's next priorities.
+### Phase 1 — Diagnostics (DONE in v0.2.0)
+You can't fix what you can't see. v0.2.0 ships diagnostic primitives that capture exactly what BlackTip is sending across the TLS, HTTP/2, and HTTP header layers, plus IP reputation queries.
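+```typescript
+import { BlackTip } from '@rester159/blacktip';
+const bt = new BlackTip({ logLevel: 'info' });
+await bt.launch();
+// Capture our actual TLS / HTTP2 / header fingerprint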
+const fp = await bt.captureFingerprint();
+console.log(fp.tls.ja4);                       // 't13d1516h2_8daaf6152771_d8a2da3f94cd'
+console.log(fp.http2.akamaiFingerprint);       // '1:65536;2:0;4:6291456;...'
+console.log(fp.headers.userAgent);             // 'Mozilla/5.0 ... Chrome/146.0.0.0 ...'
+console.log(fp.headers.secChUa);               // '"Google Chrome";v="146", ...'
+console.log(fp.headers.uaConsistent);          // true (the L016 check)
+// Check our IP reputation
+const ip = await bt.checkIpReputation();
+console.log(ip.ip);                            // '47.150.34.38'
+console.log(ip.asn);                           // 'AS5650'
+console.log(ip.org);                           // 'Frontier Communications of America, Inc.'
+console.log(ip.isDatacenter);                  // false
+console.log(ip.isResidential);                 // true
+// Test against an Akamai-protected URL with diagnosis
+const result = await bt.testAgainstAkamai('https://www.opentable.com/');
+console.log(result.passed);                    // true
+console.log(result.title);                     // 'Restaurants and Restaurant Bookings | OpenTable'
+console.log(result.akamaiReference);           // null (no block)
+```
+### Phase 2 — Quick wins (DONE in v0.2.0)
+Cheap fixes applied directly:
+1. **L016 (UA / Sec-Ch-Ua consistency)** — described above, the load-bearing fix
+2. **Aggressive Chrome flag cleanup** — minimum flags only, match Chrome's natural launch
+3. **Optional persistent user-data-dir** — `BlackTipConfig.userDataDir` lets you carry cookies, history, and visited-sites context across sessions, which makes Akamai's "first request from unknown session" challenge less likely to fire
+### Phase 3 — Session warming (DONE in v0.2.0)
+Akamai's "first request" challenge is harder to pass than the second. Solution: warm the session before hitting the target.
+```typescript
+await bt.launch();
+await bt.warmSession({
+  sites: [
+    'https://www.google.com/',
+    'https://www.wikipedia.org/',
+    'https://news.ycombinator.com/',
+  ],
+  dwellMsRange: [3000, 8000],   // human-like reading time on each site
+});
+// Now navigate to the target — the browser has cookies, history, and a
+// realistic activity pattern.
+await bt.navigate('https://target-protected-by-akamai.com/');
+```
+The warming visits accumulate cookies, populate the History API, and trigger the natural behavioral signals Akamai's profiler expects to see from a real user.
+### Phase 4 — TLS-rewriting proxy (DEFERRED to v0.3.0)
+For cases where the host machine's installed Chrome version is OLDER than what we want to declare, OR where the host has no Chrome installed at all and we're falling back to patchright's bundled Chromium with a different TLS profile, we need byte-level TLS impersonation. The plan:
+- **Use [bogdanfinn/tls-client](https://github.com/bogdanfinn/tls-client)** as a local MITM proxy
+- Spawn it as a subprocess on `bt.launch()`
+- Generate a self-signed root CA, install it into Chrome's cert store at launch
+- Point Chrome via `--proxy-server` at the local proxy
+- Verify via `bt.captureFingerprint()` that the JA4 matches the desired Chrome version
+Latency cost: ~5–20 ms per connection. Platform binaries: separate Linux/macOS/Windows × x64/arm64 builds. Will ship as an **optional dependency** so users who don't need this don't pay for it.
+### Phase 5 — Sensor data (DEFERRED to v0.3.0+)
+Akamai's JavaScript challenge collects ~80 signals and POSTs a 30–50 KB blob. To pass:
+- Either let the real script run with a real environment (best, but requires every JS-level signal to be perfect)
+- Or replay a pre-captured sensor payload from a real Chrome session (works once, then session expires)
+Plan:
+1. Run the Akamai sensor script in a controlled BlackTip session against a known-protected URL
+2. Capture the full payload and the resulting `_abck` cookie
+3. Identify which signals are flagged by analyzing the payload bytes
+4. Patch those specific signals at the patchright layer
+5. Re-test, repeat
+This is reverse-engineering work and takes weeks. Until then, BlackTip relies on its native browser environment being good enough to pass the sensor naturally — which it does in many cases now that L016 is fixed.
+### Phase 6 — Behavioral biometrics (DEFERRED to v0.3.0+)
+Once past the gate, Akamai still profiles mouse dynamics and keystroke timing. BlackTip's `BehavioralEngine` already handles this with Bézier mouse paths, Fitts' Law movement time, and digraph-aware typing. Tier 2 calibration against real datasets (Balabit, CMU Keystroke) will tighten the distributions further.
+The current behavioral engine is sufficient for most Akamai targets. The Tier 2 calibration is a "best-in-the-world" upgrade, not a "passes Akamai" requirement.
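For reference, Fitts' law in the Shannon formulation common in HCI predicts pointing time as MT = a + b · log2(D / W + 1). Here D is the distance to the target and W is its width. The sketch below evaluates that formula. The intercept `a` and slope `b` are fitted per user and device. The defaults here are illustrative placeholders, not BlackTip's parameters. `fittsMovementTimeMs` is a hypothetical name.

```typescript
// Fitts' law, Shannon formulation: MT = a + b * log2(D / W + 1).
// a (ms) and b (ms per bit) are fitted per user/device; the defaults are placeholders.
function fittsMovementTimeMs(
  distancePx: number,
  targetWidthPx: number,
  a = 50,
  b = 150,
): number {
  if (distancePx < 0 || targetWidthPx <= 0) {
    throw new RangeError('distance must be >= 0 and target width must be > 0');
  }
  const indexOfDifficultyBits = Math.log2(distancePx / targetWidthPx + 1);
  return a + b * indexOfDifficultyBits;
}

// Usage: a target 3x its own width away has ID = log2(4) = 2 bits.
console.log(fittsMovementTimeMs(300, 100)); // 350
console.log(fittsMovementTimeMs(700, 100)); // 500 (ID = 3 bits)
```

Far or small targets cost more time than near or large ones, and the increase is logarithmic rather than linear in distance.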
+### Phase 7 — IP reputation (USER-PROVIDED)
+This is the one layer BlackTip can't fix in code. If your IP is on Akamai's flagged list, no amount of fingerprint patching will help — you need a different network. Options:
+1. **Use a different connection** (mobile hotspot, different ISP) for the affected sessions
+2. **Use a residential proxy provider** (BrightData, Oxylabs, Smartproxy) — `BlackTipConfig.proxy` accepts the URL, and the `ProxyPool` class handles per-domain affinity
+3. **Wait 24–48 hours** for Akamai's reputation cache to expire if you've been hammering a target
+BlackTip's `bt.checkIpReputation()` will tell you if your current IP is on a known flagged list, but it can't fix it.
+## Currently passing / failing matrix
+As of v0.2.0:
+| Akamai layer | Status | Notes |
+|---|---|---|
+| TCP/IP reputation | User-dependent | BlackTip can't fix; use `checkIpReputation()` to diagnose |
+| TLS fingerprint | ✓ Passing | Real Chrome via `channel: 'chrome'` provides byte-perfect Chrome TLS |
+| HTTP/2 fingerprint | ✓ Passing | Same — real Chrome HTTP/2 stack |
+| HTTP headers (UA / Sec-Ch-Ua consistency) | ✓ Passing in v0.2.0 | The L016 fix |
+| HTTP headers (Sec-Fetch-*, order) | ✓ Passing | Real Chrome emits these naturally |
+| Sensor data validation | Best-effort | Native browser environment passes most Akamai sensors; sites with deeper sensor analysis may still flag |
+| Cookie continuity | ✓ Passing | Real Chrome handles cookies normally; persistent profile via `userDataDir` improves it further |
+| Behavioral patterns | Mostly passing | Behavioral engine generates plausible distributions; Tier 2 dataset calibration tightens further |
+**Validated against:**
+- ✓ **OpenTable** (Akamai Bot Manager) — passing as of v0.2.0
+- (More targets to be added as the team validates)
+## Recipe: get into an Akamai-protected target
+```typescript
+import { BlackTip } from '@rester159/blacktip';
+async function bookOnAkamaiSite() {
+  const bt = new BlackTip({
+    logLevel: 'info',
+    timeout: 15_000,
+    retryAttempts: 2,
+    behaviorProfile: 'human',
+    // userDataDir: './.bt-profile',   // optional: persist Chrome state across runs
+  });
+  await bt.launch();
+  // Verify we're set up correctly before touching the target
+  const fp = await bt.captureFingerprint();
+  if (!fp.headers.uaConsistent) {
+    throw new Error('UA / Sec-Ch-Ua mismatch — upgrade BlackTip to v0.2.0+');
+  }
+  const ip = await bt.checkIpReputation();
+  if (ip.isDatacenter) {
+    console.warn(`IP is on a datacenter ASN (${ip.asn}). Akamai will likely block.`);
+  }
+  // Warm the session before the target
+  await bt.warmSession({
+    sites: ['https://www.google.com/', 'https://en.wikipedia.org/wiki/Special:Random'],
+    dwellMsRange: [3000, 6000],
+  });
+  // Navigate to the target
+  await bt.navigate('https://www.opentable.com/');
+  await bt.waitForStable({ networkIdleMs: 1000, maxMs: 10_000 });
+  // Drive the booking flow as a normal user would
+  // ...
+}
+```
+## When BlackTip is NOT enough
+If `bt.testAgainstAkamai(targetUrl)` reports a block, walk this checklist:
+1. **Run `bt.captureFingerprint()`** — does `headers.uaConsistent` say `true`? If `false`, you're on v0.1.0 or older — upgrade.
+2. **Run `bt.checkIpReputation()`** — is `isDatacenter: true`? Then the IP itself is the problem. Switch networks or use a residential proxy.
+3. **Test in your normal Chrome from the same machine.** If your normal Chrome ALSO gets blocked, the IP is flagged regardless of what BlackTip does. You need a different network.
+4. **Try a session warm-up** — call `bt.warmSession()` before the target navigation.
+5. **Try a persistent profile** — set `userDataDir` in `BlackTipConfig` and let cookies accumulate across runs.
+6. **Try with a residential proxy** — configure via `BlackTipConfig.proxy`.
+7. **If all else fails, file an issue** at https://github.com/rester159/blacktip/issues with the output of `bt.captureFingerprint()` and `bt.checkIpReputation()` so the team can investigate.
+## What we learned
+The most important lesson from the v0.1.0 → v0.2.0 transition is that **fingerprint consistency matters more than fingerprint stealth**. We were emitting byte-perfect Chrome TLS, byte-perfect Chrome HTTP/2, and byte-perfect Chrome headers — except for ONE inconsistency between User-Agent and Sec-Ch-Ua. That single bug invalidated everything else against the highest-tier detectors.
+Top-tier commercial detectors (Akamai, DataDome, PerimeterX) don't just look at individual fingerprint values — they cross-check that values from different layers tell the same story. A "Chrome 125" UA and "Chrome 146" client hints together is a louder signal than either value being slightly off would be alone.
+**Implication:** if you're building stealth, prioritize consistency over richness. A complete, internally-consistent Chrome 125 fingerprint beats a perfectly-tuned Chrome 130 fingerprint that disagrees with itself somewhere.

package/docs/anti-bot-validation.md ADDED

@@ -0,0 +1,84 @@
+# Anti-bot validation scoreboard
+This document records BlackTip's results against commercial anti-bot vendors on real, in-the-wild targets. Every entry is generated by `bt.testAgainstAntiBot(url)`, which both detects challenge/block pages AND captures vendor signals (cookies, scripts) on a passing page so we can prove the target is actually protected — not a false negative on an unprotected URL.
+Methodology and reproduction recipe at the bottom.
+## Live scoreboard — 2026-04-09 (BlackTip 0.2.0)
+| Target | Vendors detected on page | Block? | Vendor signals on success | Notes |
+|---|---|---|---|---|
+| **vinted.com** | DataDome | pass | datadome cookie, cloudflare script | Real catalog renders with prices, brands, listings |
+| **bestbuy.com** | Akamai | pass | akamai cookie + script | Earlier classification as PerimeterX was wrong — BestBuy is Akamai |
+| **walmart.com** | Akamai + PerimeterX | pass | akamai cookie + script, perimeterx cookie + script | Walmart runs both vendors simultaneously; BlackTip slides past both |
+| **crunchbase.com** | Cloudflare | pass | cloudflare script | Real homepage content visible |
+| **ticketmaster.com** | none detected on homepage | pass | none on `/` | TM only arms PerimeterX on event/checkout pages, not the marketing homepage |
+| **opentable.com** (Gjelina deep link, restref=76651) | Akamai | pass | akamai cookie + script | Real time slots render; Reservation at Gjelina, Apr 11 2026 |
+| **chatgpt.com** | Cloudflare | pass | cloudflare script + cf_clearance cookie | **Cloudflare managed challenge auto-passed**; cf_clearance issued silently |
+| **twitch.tv** | Kasada | pass | kasada script signal | **Kasada validated** — first published BlackTip pass against a real Kasada-armed target |
+| **canadagoose.com** | (no live signals on homepage) | pass | none | Was historically Kasada but the homepage no longer surfaces Kasada cookies/scripts; may be lazy-loaded on cart/checkout |
+| **hyatt.com** | (no live signals on landing page) | pass | none on `/loyalty/en-US` | RT (Akamai mPulse RUM) present, no Bot Manager indicators |
+| **footlocker.com** | (no live signals on homepage) | pass | none | Same as Canada Goose — Kasada may arm only on PDP/cart |
+| **datadome.co/bot-tester** | DataDome (bypassed via marketing redirect) | pass | none captured | Redirected to /signup/ with marketing content; DataDome did not arm a challenge |
+| **antoinevastel.com/bots-vue.html** | none | pass | none | Demo page no longer arms a probe — author moved to a marketing homepage |
+| **nowsecure.nl** (Cloudflare bot fight, nodriver author's benchmark) | none | pass | none on this barebones page | Passes regression — earlier validation in v0.1.0 |
+### Cloudflare managed challenge — silent auto-pass evidence
+The `cf_clearance` cookie is set ONLY after Cloudflare's managed challenge accepts a request as human. Across the v0.3.0 validation run, BlackTip earned `cf_clearance` cookies on multiple Cloudflare-protected domains without ever surfacing a "Just a moment..." interstitial to the user:
+- `.vinted.com` (cf_clearance httpOnly)
+- `.crunchbase.com` (cf_clearance httpOnly)
+- `.chatgpt.com` (cf_clearance httpOnly + cf_bm + cfuvid)
+These cookies are not visible to `document.cookie` because they are httpOnly — the v0.2.0 detector missed them. v0.3.0 fixed the detector to read via the BlackTip cookies API, surfacing this evidence. **A `cf_clearance` cookie is the strongest possible proof of a Cloudflare managed-challenge pass on a real Chrome session.**
+**Eight commercial-detector targets, eight passes.** The most load-bearing validations are Walmart (Akamai + PerimeterX simultaneously) and OpenTable (Akamai's full Bot Manager on a high-value booking endpoint that previously hard-blocked v0.1.0).
+## What "passing" means here
+`bt.testAgainstAntiBot(url)` returns `passed: true` when:
+1. The page title does not match a known vendor block pattern (Akamai "Access Denied", Cloudflare "Just a moment...", PerimeterX "Press & Hold", DataDome captcha-delivery interstitial, Imperva incident page, etc.)
+2. The body text does not contain vendor block markers
+3. Real page content is rendered (verifiable in `bodyPreview`)
+The `vendorSignals` field separately reports what vendor cookies and scripts are present even on a passing page. If a target shows `vendorSignals: []`, either the vendor doesn't arm protection on that URL (Ticketmaster homepage), or BlackTip's signal patterns missed something — note both honestly here.
+## Vendors recognised
+| Vendor | Block-page tells | Cookie tells | Script tells |
+|---|---|---|---|
+| Akamai Bot Manager | title `Access Denied`, body `errors.edgesuite.net`, `Reference #...` | `_abck`, `bm_sz`, `ak_bmsc`, `bm_sv` | `akam/`, `ak.bmpsdk`, `akamaihd.net/sensor` |
+| DataDome | body `geo.captcha-delivery.com`, `datado.me` | `datadome`, `dd_s`, `dd_cookie_test_` | `js.datadome.co`, `datado.me` |
+| Cloudflare Bot Fight / Turnstile | title `Just a moment...`, body `cf-browser-verification`, `Sorry, you have been blocked` | `cf_clearance`, `__cf_bm`, `__cflb` | `challenges.cloudflare.com`, `cdn-cgi/challenge-platform` |
+| HUMAN / PerimeterX | title `Access to this page has been denied`, body `Press & Hold` | `_px*`, `_pxhd` | `perimeterx`, `px-cdn`, `px-captcha`, `human-security` |
+| Imperva / Incapsula | body `Request unsuccessful. Incapsula incident ID` | `visid_incap_`, `incap_ses_` | (mostly server-side) |
+| Kasada | body `kpsdk` | (none captured) | `x-kpsdk-` headers, `ips.js` |
+| Arkose Labs / FunCaptcha | body `client-api.arkoselabs.com`, `funcaptcha` | (none captured) | `client-api.arkoselabs.com` |
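The block-page columns above reduce to simple substring checks. The sketch below shows one way to apply them to a page's title and visible text. `detectBlockPage` and `BLOCK_TELLS` are hypothetical names. The markers are copied from the table, and this is a simplified illustration, not the actual `testAgainstAntiBot()` logic. Cookie and script tells are left out because they also appear on passing pages.

```typescript
// Block-page tells from the "Vendors recognised" table (title and body columns only).
interface BlockTells {
  title: string[]; // substrings that mark a block page when found in the page title
  body: string[];  // substrings that mark a block page when found in the page text
}

const BLOCK_TELLS: Record<string, BlockTells> = {
  'Akamai Bot Manager': { title: ['Access Denied'], body: ['errors.edgesuite.net'] },
  DataDome: { title: [], body: ['geo.captcha-delivery.com', 'datado.me'] },
  Cloudflare: {
    title: ['Just a moment...'],
    body: ['cf-browser-verification', 'Sorry, you have been blocked'],
  },
  'HUMAN / PerimeterX': { title: ['Access to this page has been denied'], body: ['Press & Hold'] },
  'Imperva / Incapsula': { title: [], body: ['Incapsula incident ID'] },
  Kasada: { title: [], body: ['kpsdk'] },
  'Arkose Labs': { title: [], body: ['client-api.arkoselabs.com', 'funcaptcha'] },
};

// Returns every vendor whose block-page tells match; an empty array means no known block page.
function detectBlockPage(title: string, bodyText: string): string[] {
  const matches: string[] = [];
  for (const [vendor, tells] of Object.entries(BLOCK_TELLS)) {
    const titleHit = tells.title.some((marker) => title.includes(marker));
    const bodyHit = tells.body.some((marker) => bodyText.includes(marker));
    if (titleHit || bodyHit) matches.push(vendor);
  }
  return matches;
}

// Usage with illustrative page text.
console.log(detectBlockPage('Access Denied', 'See errors.edgesuite.net for details.').join(', ')); // Akamai Bot Manager
console.log(detectBlockPage('Restaurants and Restaurant Bookings | OpenTable', 'Find your table').length); // 0
```

Plain substring matching can misfire; for example, a page might mention `funcaptcha` in ordinary text. This is why "passing" above also requires that real page content rendered.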
+## How to reproduce
+```bash
+# Terminal 1
+cd /path/to/blacktip
+node dist/cli.js serve
+# Terminal 2 — single target
+node dist/cli.js send "return await bt.testAgainstAntiBot('https://www.vinted.com/')" --pretty
+# Or batch:
+node dist/cli.js batch '[
+  "return await bt.testAgainstAntiBot(\"https://www.vinted.com/\")",
+  "return await bt.testAgainstAntiBot(\"https://www.bestbuy.com/\")",
+  "return await bt.testAgainstAntiBot(\"https://www.walmart.com/\")",
+  "return await bt.testAgainstAntiBot(\"https://www.crunchbase.com/\")",
+  "return await bt.testAgainstAntiBot(\"https://www.opentable.com/booking/restref/availability?rid=76651&restref=76651&partySize=2&dateTime=2026-04-11T19%3A00\")"
+]'
+```
+## Caveats
+- **One-shot probes.** This scoreboard records single navigations from a residential IP. Anti-bot vendors profile sessions over time and across requests; a target that passes a one-shot probe may still flag a multi-step automation flow where the behavioral signature looks too clean. The deep validation is the OpenTable booking flow itself, where we drove the form to completion through Akamai's Bot Manager.
+- **Vendor classification can be wrong.** Targets like Walmart run multiple vendors in parallel. The `vendorSignals` field is the source of truth; the "Vendors detected on page" column above is informational.
+- **IP matters.** Run from a residential network or known-clean residential proxy. A datacenter IP will fail every entry on this scoreboard regardless of how good BlackTip's stealth is. Use `bt.checkIpReputation()` first.
+- **The scoreboard goes stale.** Vendors update their detection logic constantly. Re-run on each release and update this doc. If a target moves from pass to fail, file it against the next BlackTip version, capture the failing fingerprint, and dig in.

package/docs/calibration-validation.md ADDED

@@ -0,0 +1,93 @@
+# Behavioral calibration validation (v0.3.0)
+This document records the result of fitting BlackTip's behavioral profile against the real CMU Keystroke Dynamics dataset (Killourhy & Maxion 2009) and validating the fit against held-out subjects.
+## TL;DR
+The calibrated profile measurably beats BlackTip's canonical `HUMAN_PROFILE` on a held-out subject set:
+| Metric | Canonical KS distance | Calibrated KS distance | Improvement |
+|---|---|---|---|
+| **Hold time** | 0.4297 | 0.2018 | **53% closer to real humans** |
+| **Flight time** | 0.4811 | 0.4152 | 13.7% closer to real humans |
+This is the first time the BlackTip behavioral pipeline has been validated end-to-end against a real public dataset. Up through v0.2.0, the engine's parameters were sane defaults; v0.3.0 makes them empirically grounded.
+## Methodology
+1. **Dataset**: CMU Keystroke Dynamics (`DSL-StrongPasswordData.csv`): 51 subjects, each typing the fixed phrase `.tie5Roanl` 50 times in each of 8 sessions (400 reps per subject), for 20,400 total phrase reps.
+2. **Split**: deterministic 80/20 by subject. 40 subjects (16,000 phrases) → training. 11 subjects (4,400 phrases) → held-out test.
+3. **Fit**: training set → `fitTypingDynamics()` → empirical hold-time and flight-time distributions plus per-digraph latencies.
+4. **Compare**: synthesized 5,000 samples from each of (a) BlackTip's canonical `HUMAN_PROFILE` ranges and (b) the fitted `[p5, p95]` ranges. Computed Kolmogorov–Smirnov distance (max empirical CDF gap) against the held-out test set.
+5. **Report**: lower KS distance → closer to real human distribution.
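Step 2's deterministic split can be sketched as follows. Collect the distinct subject IDs, sort them, and assign the first 80% to training. The doc only says the split is sorted rather than random, so this exact rule is an assumption about what `scripts/fit-cmu-keystroke.mjs` does. `splitBySubject` is a hypothetical name. With 51 subjects, `Math.floor(51 * 0.8)` gives the 40/11 split listed in step 2.

```typescript
// Deterministic train/test split by subject: no randomness, so re-runs are identical.
interface Row {
  subject: string; // subject identifier for one phrase repetition
}

function splitBySubject<T extends Row>(rows: T[], trainFraction = 0.8): { train: T[]; test: T[] } {
  const subjects = Array.from(new Set(rows.map((r) => r.subject))).sort();
  const trainCount = Math.floor(subjects.length * trainFraction);
  const trainSubjects = new Set(subjects.slice(0, trainCount));
  return {
    train: rows.filter((r) => trainSubjects.has(r.subject)),
    test: rows.filter((r) => !trainSubjects.has(r.subject)),
  };
}

// Usage: 51 synthetic subjects with 400 reps each reproduce the 40/11 subject split.
const rows: Row[] = [];
for (let s = 0; s < 51; s++) {
  for (let rep = 0; rep < 400; rep++) rows.push({ subject: `s${String(s).padStart(3, '0')}` });
}
const { train, test } = splitBySubject(rows);
console.log(train.length, test.length); // 16000 4400
```

Splitting by subject rather than by row keeps every repetition from a held-out typist out of training. Otherwise the test would reward memorising individual typists.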
+The KS test is the standard non-parametric goodness-of-fit measure. It does not assume any particular distribution shape, which matters here because keystroke timings are right-skewed log-normal-ish, not Gaussian. The improvement ratio is `1 - calibrated / canonical`.
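The two-sample KS distance in step 4 is the largest vertical gap between two empirical CDFs. Here is a minimal sketch, where `ksDistance` and `improvement` are hypothetical helper names, not functions exported by BlackTip. The last two usage lines re-derive the TL;DR percentages from the reported distances.

```typescript
// Two-sample Kolmogorov-Smirnov distance: max |F_a(x) - F_b(x)| over all x.
function ksDistance(a: number[], b: number[]): number {
  if (a.length === 0 || b.length === 0) throw new RangeError('both samples must be non-empty');
  const x = [...a].sort((p, q) => p - q);
  const y = [...b].sort((p, q) => p - q);
  let i = 0;
  let j = 0;
  let maxGap = 0;
  // Walk both sorted samples together; at each distinct value v, advance past every
  // sample equal to v so i/x.length and j/y.length are the two CDFs evaluated at v.
  while (i < x.length && j < y.length) {
    const v = Math.min(x[i], y[j]);
    while (i < x.length && x[i] === v) i++;
    while (j < y.length && y[j] === v) j++;
    maxGap = Math.max(maxGap, Math.abs(i / x.length - j / y.length));
  }
  // Once one sample is exhausted its CDF is 1 and the gap can only shrink, so we can stop.
  return maxGap;
}

// Improvement ratio as defined above: 1 - calibrated / canonical.
const improvement = (canonical: number, calibrated: number): number => 1 - calibrated / canonical;

console.log(ksDistance([1, 2, 3, 4], [3, 4, 5, 6])); // 0.5
console.log((improvement(0.4297, 0.2018) * 100).toFixed(1)); // 53.0 (hold time)
console.log((improvement(0.4811, 0.4152) * 100).toFixed(1)); // 13.7 (flight time)
```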
+## Fitted parameters
+```
+Hold time:
+  mean = 90.3 ms
+  p5   = 48.3 ms
+  p50  = 85.8 ms
+  p95  = 148.8 ms
+Flight time:
+  mean = 151.4 ms
+  p5   = 0.0   ms   (some adjacent keystrokes overlap — concurrent press/release)
+  p50  = 91.3  ms
+  p95  = 513.5 ms
+Digraphs fit: 6 (the unique a–z transitions in the phrase)
+```
+The fitted profile is saved to `data/cmu-keystroke/calibrated-profile.json` and ready to load:
+```typescript
+import calibrated from './data/cmu-keystroke/calibrated-profile.json' with { type: 'json' };
+import { BlackTip } from '@rester159/blacktip';
+const bt = new BlackTip({
+  behaviorProfile: calibrated.profileConfig,
+  // ... rest of your config
+});
+```
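The p5 / p50 / p95 values in the fitted parameters are empirical percentiles of the training timings. The sketch below interpolates linearly between closest ranks, which is NumPy's default `percentile` method. Whether `fitTypingDynamics()` uses this or another percentile definition is an assumption, and `percentile` is a hypothetical helper.

```typescript
// Empirical percentile with linear interpolation between closest ranks
// (Hyndman-Fan type 7, NumPy's default "linear" method).
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new RangeError('samples must be non-empty');
  if (p < 0 || p > 100) throw new RangeError('p must be within [0, 100]');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = (p / 100) * (sorted.length - 1); // fractional index into sorted
  const lower = Math.floor(rank);
  const upper = Math.ceil(rank);
  return sorted[lower] + (sorted[upper] - sorted[lower]) * (rank - lower);
}

// Usage with a handful of made-up hold times in ms (not CMU data).
const holdTimesMs = [62, 71, 80, 85, 88, 93, 101, 118, 140];
console.log(percentile(holdTimesMs, 50)); // 88
console.log(percentile([10, 20, 30, 40], 50)); // 25
```

Different percentile definitions give slightly different p5/p95 values on small samples. At the CMU training-set size the difference should be negligible.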
+## Why this matters
+Behavioral biometrics services (BioCatch, NuData, SecuredTouch) profile users on dimensions like:
+- Hold time mean and variance per key
+- Flight time distributions per digraph
+- Tap pressure (mobile only — n/a here)
+- Mouse curvature, click dwell, scroll deceleration
+A bot that types with uniform 100 ms holds and flat flight times stands out instantly because real humans have right-skewed, log-normal-ish distributions with subject-specific clustering. BlackTip's canonical `HUMAN_PROFILE` was already in the right ballpark, but the canonical hold-time range `[50, 200]` had more than twice the KS distance (0.4297) of the empirically fitted `[48, 149]` (0.2018). The fitted range is tighter and centered correctly, so BlackTip's keystroke output now sits inside the real human distribution rather than scattered across a too-wide canonical range.
+## Reproducing the result
+```bash
+cd /path/to/blacktip
+mkdir -p data/cmu-keystroke
+curl -fsSL -o data/cmu-keystroke/DSL-StrongPasswordData.csv \
+  https://www.cs.cmu.edu/~keystroke/DSL-StrongPasswordData.csv
+npm run build
+node scripts/fit-cmu-keystroke.mjs
+```
+The script writes its output to `data/cmu-keystroke/calibrated-profile.json` and prints the validation table to stdout. Re-runs are deterministic (the train/test split is sorted, not random) so the numbers match this document byte-for-byte.
+## What this does NOT prove
+- The KS test compares marginal distributions, not joint ones. A profile that matches the marginals perfectly could still have unrealistic correlation structure (e.g. correct hold times but uncorrelated with flight times). A real biometrics test against a commercial service would catch this; we don't have one.
+- The CMU dataset is 400 reps (8 sessions × 50) of one fixed phrase from each of 51 American English typists. Because that phrase is short and heavily rehearsed, the fitted profile is best suited to similar short, practised input; free-text typing and non-Latin scripts may need a different calibration.
+- Flight time fit improvement (13.7%) is much smaller than hold time (53%). The CMU phrase is short and contains transitions that aren't representative of free-text typing — the held-out flights span a wide range that the canonical `[80, 150]` and fitted `[0, 514]` are both bad fits for. A larger free-text dataset (e.g. Buffalo or GREYC) would likely produce a better flight fit. Future work.
+## Future calibration sources
+Once a parser exists for each, the same pipeline applies:
+- **Balabit Mouse Dynamics Challenge** — for `fitMouseDynamics()`. Parser exists in `parseBalabitMouseCsv()`; needs an actual fit run against the real dataset.
+- **GREYC-NISLAB** — free-text keystroke dynamics from 110 subjects. Better representative coverage than CMU's fixed phrase.
+- **Buffalo Free-Text** — multi-session keystroke data across 148 subjects. The canonical reference for keystroke behavioral biometrics literature.
+- **Your own telemetry** — `parseGenericTelemetryJson()` accepts the normalized `MouseMovement` / `TypingSession` shapes directly. Bring your own data.

package/docs/identity-pool.md ADDED

@@ -0,0 +1,176 @@
+# IdentityPool — long-running session and identity rotation (v0.4.0)
+`IdentityPool` is BlackTip's answer to the question "how do I rotate across many identities cleanly without my whole flow looking like one bot retried under different IPs?" An identity is the union of everything that makes a session look like one specific human: cookies, localStorage, proxy, device profile, behavior profile, locale, timezone. The pool persists to a JSON file so identities survive restarts, and each identity has a per-domain burn list so an identity blocked on opentable.com is still eligible for amazon.com.
+## When you need this
+Most BlackTip flows do not need an IdentityPool. A single launch with the right device profile and a residential connection covers the common case. The pool earns its keep when:
+1. You're running many flows against the same target and need to look like many different users (price scraping, market research, multi-account operations on services that permit multiple accounts).
+2. You want resilience: when identity A gets blocked on opentable.com, you want identity B to take over without manual intervention.
+3. You want session persistence across process restarts so a logged-in identity from yesterday is still logged in today.
+4. You want a feedback loop: when an identity gets burned, the proxy bound to it should be marked dirty in `ProxyPool` so it isn't reused for the same target until the ban window decays.
+## Composition
+`IdentityPool` does not reinvent persistence or proxy selection. It composes:
+- **`SnapshotManager`** for cookies + localStorage + sessionStorage. Call the pool's `captureSnapshot(bt, identity)` after a successful flow to save that state into the identity.
+- **`ProxyPool`** for proxy selection and ban tracking. New identities draw a proxy from the pool at creation time. When an identity is burned per-domain, the pool reports a ban on that proxy/domain pair so future selections skip it.
+- **`BlackTipConfig`** is produced by `pool.applyToConfig(identity)` and passed to `new BlackTip(config)`.
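The feedback loop between the two pools can be pictured with a small in-memory tracker. This is a hypothetical sketch and not `ProxyPool`'s real API. `BanTracker`, `reportBan`, `isBanned`, `burnForDomain`, and the fixed `banWindowMs` are names and assumptions introduced here. They show how a per-domain identity burn becomes a proxy/domain ban that expires after a window.

```typescript
// Hypothetical stand-in for ProxyPool's ban tracking; not the library's API.
class BanTracker {
  private readonly bans = new Map<string, number>(); // "proxyId|domain" -> time of ban (ms)
  private readonly banWindowMs: number;

  constructor(banWindowMs: number) {
    this.banWindowMs = banWindowMs;
  }

  reportBan(proxyId: string, domain: string, now: number = Date.now()): void {
    this.bans.set(`${proxyId}|${domain}`, now);
  }

  // A ban "decays" once banWindowMs has elapsed since it was reported.
  isBanned(proxyId: string, domain: string, now: number = Date.now()): boolean {
    const bannedAt = this.bans.get(`${proxyId}|${domain}`);
    return bannedAt !== undefined && now - bannedAt < this.banWindowMs;
  }
}

interface BoundIdentity {
  id: string;
  proxyId?: string;
  burnedDomains: string[];
}

// Per-domain burn: record the domain on the identity, then report the same
// proxy/domain pair so the proxy is skipped for that target only.
function burnForDomain(identity: BoundIdentity, domain: string, tracker: BanTracker, now: number = Date.now()): void {
  if (identity.burnedDomains.indexOf(domain) === -1) identity.burnedDomains.push(domain);
  if (identity.proxyId !== undefined) tracker.reportBan(identity.proxyId, domain, now);
}
```

Keying bans by proxy *and* domain is what keeps a proxy that was blocked on one target usable on others.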
+## Quick start
+```typescript
+import { BlackTip, IdentityPool, ProxyPool, ProxyProviders } from '@rester159/blacktip';
+// 1. Build a ProxyPool from whatever provider you use.
+const proxyPool = new ProxyPool([
+  ProxyProviders.brightData('your-customer-id', 'your-password', 'residential'),
+  ProxyProviders.oxylabs('your-username', 'your-password'),
+]);
+// 2. Build the IdentityPool, backed by a JSON file on disk.
+const pool = new IdentityPool({
+  storePath: './.blacktip/identities.json',
+  proxyPool,
+  rotationPolicy: {
+    maxUses: 50,            // burn after 50 uses
+    maxAgeMs: 7 * 24 * 60 * 60 * 1000, // burn after 7 days idle
+  },
+});
+// 3. First time only: seed the pool with N identities. Subsequent runs
+// load from the store file.
+if (pool.size() === 0) {
+  for (let i = 0; i < 5; i++) {
+    pool.add({
+      deviceProfile: i % 2 === 0 ? 'desktop-windows' : 'desktop-macos',
+      label: `identity-${i + 1}`,
+      locale: 'en-US',
+      timezone: 'America/New_York',
+    });
+  }
+}
+// 4. For each flow: acquire, launch, run, capture, release.
+const identity = pool.acquire('opentable.com');
+if (!identity) throw new Error('No eligible identity for opentable.com — pool exhausted');
+const config = pool.applyToConfig(identity, { logLevel: 'info', timeout: 15_000 });
+const bt = new BlackTip(config);
+await bt.launch();
+// Restore the identity's prior session (cookies, storage). No-op if first use.
+await pool.restoreSnapshot(bt, identity);
+try {
+  await bt.navigate('https://www.opentable.com/');
+  await bt.waitForStable();
+  // ... rest of the flow
+  // On success, save the updated session state back into the identity.
+  await pool.captureSnapshot(bt, identity);
+} catch (err) {
+  // On failure, mark this identity burned for this domain. The proxy
+  // gets banned in ProxyPool too, so the next identity drawn from the
+  // pool won't reuse the same proxy on this target.
+  pool.markBurned(identity.id, err instanceof Error ? err.message : String(err), 'opentable.com');
+} finally {
+  await bt.close();
+}
+```
+## API
+### `new IdentityPool(options)`
+```typescript
+{
+  storePath: string;              // required — JSON file path
+  proxyPool?: ProxyPool;          // optional — for proxy binding & feedback
+  rotationPolicy?: {
+    maxUses?: number;             // default: Infinity
+    maxAgeMs?: number;            // default: Infinity
+    preferLeastRecentlyUsed?: boolean; // default: true
+  };
+}
+```
+### `add(init)` → `Identity`
+Create a new identity. `deviceProfile` is required. If a `proxyPool` was supplied to the IdentityPool and `proxy` is omitted, the pool draws one. Auto-saves to disk.
+### `acquire(domain?)` → `Identity | null`
+Pick an identity for use. Skips identities burned on the requested domain. Applies the rotation policy: identities exceeding `maxUses` or `maxAgeMs` are auto-burned. Returns `null` if no eligible identity exists.
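One way the selection described above could work is shown in this hypothetical TypeScript sketch, which is not `IdentityPool.acquire()`. The `PoolIdentity` shape mirrors fields from the persistence format sample. Three choices are our assumptions: `maxAgeMs` is measured from `lastUsedAt` (falling back to `createdAt`), following the "7 days idle" comment in the quick start; an identity is burned once `useCount` reaches `maxUses`; and least-recently-used identities win ties.

```typescript
interface PoolIdentity {
  id: string;
  createdAt: string; // ISO timestamps, as in the JSON store
  lastUsedAt: string | null;
  useCount: number;
  burnedAt: string | null;
  burnedReason: string | null;
  burnedDomains: string[];
}

interface RotationPolicy {
  maxUses?: number;
  maxAgeMs?: number;
  preferLeastRecentlyUsed?: boolean;
}

// Hypothetical re-implementation for illustration only.
function acquireFrom(identities: PoolIdentity[], policy: RotationPolicy, domain?: string, now: number = Date.now()): PoolIdentity | null {
  const maxUses = policy.maxUses ?? Infinity;
  const maxAgeMs = policy.maxAgeMs ?? Infinity;
  const preferLru = policy.preferLeastRecentlyUsed ?? true;
  const lastUse = (i: PoolIdentity): number => Date.parse(i.lastUsedAt ?? i.createdAt);

  const eligible: PoolIdentity[] = [];
  for (const identity of identities) {
    if (identity.burnedAt !== null) continue; // fully burned
    if (domain !== undefined && identity.burnedDomains.indexOf(domain) !== -1) continue; // burned on this domain only
    // Rotation policy: over-used or idle-too-long identities are auto-burned.
    if (identity.useCount >= maxUses || now - lastUse(identity) > maxAgeMs) {
      identity.burnedAt = new Date(now).toISOString();
      identity.burnedReason = "rotation policy";
      continue;
    }
    eligible.push(identity);
  }
  if (eligible.length === 0) return null;
  if (preferLru) eligible.sort((a, b) => lastUse(a) - lastUse(b));
  const chosen = eligible[0];
  chosen.useCount += 1;
  chosen.lastUsedAt = new Date(now).toISOString();
  return chosen;
}
```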
+### `markBurned(id, reason, domain?)` → `boolean`
+Mark an identity burned. With `domain`, only burns for that domain (per-domain burn list). Without `domain`, fully burns the identity. Per-domain burns also report a proxy ban back to `ProxyPool` if one is wired up.
+### `clearBurn(id, domain?)` → `boolean`
+Manually lift a burn. Useful when you know the burn was caused by a transient issue.
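The two burn levels behind `markBurned()` and `clearBurn()` can be sketched as operations on the `burnedAt`, `burnedReason`, and `burnedDomains` fields from the persistence format. This is a hypothetical illustration using the invented helper names `applyBurn`, `liftBurn`, and `isEligibleFor`. Two behaviors are our assumptions rather than documented ones: a per-domain burn leaves `burnedReason` untouched, as in the sample store; and clearing without a domain lifts only the full burn.

```typescript
interface BurnState {
  burnedAt: string | null;
  burnedReason: string | null;
  burnedDomains: string[];
}

// With a domain: add it to the per-domain burn list. Without: burn the identity everywhere.
function applyBurn(s: BurnState, reason: string, domain?: string, now: Date = new Date()): void {
  if (domain !== undefined) {
    if (s.burnedDomains.indexOf(domain) === -1) s.burnedDomains.push(domain);
    return;
  }
  s.burnedAt = now.toISOString();
  s.burnedReason = reason;
}

// With a domain: remove just that entry. Without: lift the full burn only (assumption).
function liftBurn(s: BurnState, domain?: string): void {
  if (domain !== undefined) {
    s.burnedDomains = s.burnedDomains.filter((d) => d !== domain);
    return;
  }
  s.burnedAt = null;
  s.burnedReason = null;
}

// A fully burned identity is never eligible; otherwise only its burned domains are excluded.
function isEligibleFor(s: BurnState, domain?: string): boolean {
  if (s.burnedAt !== null) return false;
  return domain === undefined || s.burnedDomains.indexOf(domain) === -1;
}
```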
+### `applyToConfig(identity, baseConfig?)` → `BlackTipConfig`
+Build a `BlackTipConfig` from an identity. Sets `deviceProfile`, `behaviorProfile`, `locale`, `timezone`, and `proxy` (URL-formatted via `proxyToUrl`). Other base config fields pass through unchanged.
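`proxyToUrl` is not documented here, so the following is only a sketch of the kind of formatting such a helper typically performs. It is not BlackTip's implementation. It renders a hypothetical `ProxyLike` shape (`protocol`, `host`, `port`, optional `username`/`password`) as a standard proxy URL and percent-encodes the credentials, so characters like `@` or `:` in a password don't break URL parsing.

```typescript
// Hypothetical proxy shape; BlackTip's real proxy type may use different fields.
interface ProxyLike {
  protocol: "http" | "https" | "socks5";
  host: string;
  port: number;
  username?: string;
  password?: string;
}

function proxyLikeToUrl(p: ProxyLike): string {
  // Percent-encode credentials so reserved characters survive URL parsing.
  const pass = p.password !== undefined ? `:${encodeURIComponent(p.password)}` : "";
  const auth = p.username !== undefined ? `${encodeURIComponent(p.username)}${pass}@` : "";
  return `${p.protocol}://${auth}${p.host}:${p.port}`;
}
```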
+### `restoreSnapshot(bt, identity)` → `Promise<void>`
+After `bt.launch()`, apply the identity's saved cookies + localStorage to the running browser. No-op if the identity has no snapshot yet.
+### `captureSnapshot(bt, identity)` → `Promise<void>`
+Save the current BlackTip session state into the identity. Call after a successful flow so the next acquire of this identity starts from a known-good logged-in state.
+### `list()`, `available()`, `size()`, `remove(id)`
+Standard inspection helpers. `available()` returns identities that are not fully burned (per-domain burns don't count).
+## Persistence format
+The store file is plain JSON with a schema version. Sample:
+```json
+{
+  "version": 1,
+  "savedAt": "2026-04-10T22:30:00.000Z",
+  "identities": [
+    {
+      "id": "1f2e3d4c-...",
+      "label": "identity-1",
+      "createdAt": "2026-04-10T20:00:00.000Z",
+      "lastUsedAt": "2026-04-10T22:25:00.000Z",
+      "useCount": 12,
+      "burnedAt": null,
+      "burnedReason": null,
+      "burnedDomains": ["sears.com"],
+      "snapshot": { /* SessionSnapshot */ },
+      "proxy": { "id": "brightdata-residential", "...": "..." },
+      "behaviorProfile": "human",
+      "deviceProfile": "desktop-windows",
+      "locale": "en-US",
+      "timezone": "America/New_York"
+    }
+  ]
+}
+```
+The schema is versioned so future migrations are explicit. Don't hand-edit the file while a process is reading it — use the API.
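A versioned store of this kind is commonly loaded with an explicit version check and written atomically, so a concurrent reader never sees a half-written file. The sketch below shows that general pattern with Node's `fs` module. The `StoreFile` shape follows the sample above. The temp-file-and-rename write, the parent-directory creation, and the refusal to load unknown versions are our assumptions, not documented `IdentityPool` behavior.

```typescript
import { existsSync, mkdirSync, readFileSync, renameSync, writeFileSync } from "node:fs";
import { dirname } from "node:path";

interface StoreFile {
  version: number;
  savedAt: string;
  identities: unknown[];
}

const CURRENT_VERSION = 1;

function loadStore(path: string): StoreFile {
  // A missing file means a fresh, empty pool.
  if (!existsSync(path)) return { version: CURRENT_VERSION, savedAt: new Date().toISOString(), identities: [] };
  const data = JSON.parse(readFileSync(path, "utf8")) as StoreFile;
  if (data.version !== CURRENT_VERSION) {
    // An explicit migration step would go here; refusing is the safe default.
    throw new Error(`Unsupported identity store version ${data.version}`);
  }
  return data;
}

function saveStore(path: string, identities: unknown[]): void {
  mkdirSync(dirname(path), { recursive: true });
  const data: StoreFile = { version: CURRENT_VERSION, savedAt: new Date().toISOString(), identities };
  const tmp = `${path}.${process.pid}.tmp`;
  writeFileSync(tmp, JSON.stringify(data, null, 2));
  // Renaming over the target replaces it in one step on POSIX filesystems.
  renameSync(tmp, path);
}
```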
+## IP reputation gate
+v0.4.0 also adds `BlackTipConfig.requireResidentialIp`. When set, BlackTip runs `bt.checkIpReputation()` immediately after `launch()` and either warns or throws based on the verdict:
+```typescript
+new BlackTip({
+  // 'throw': refuse to launch if egress IP is on a known datacenter ASN.
+  // 'warn':  log a warning but allow the launch.
+  // false / unset: no check.
+  requireResidentialIp: 'throw',
+});
+```
+Use `'throw'` in production / CI where a flagged IP would burn a real account. Use `'warn'` for local dev. Combine with `IdentityPool` and `ProxyPool` to ensure every launch goes through a residential exit before touching the target.
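The `'throw'` / `'warn'` / off behavior amounts to a small decision function. Here is a hypothetical TypeScript sketch of that logic, not BlackTip's code. The `IpVerdict` shape, an `isDatacenter` flag plus an optional ASN string, is an assumption, because this document doesn't show the return type of `checkIpReputation()`.

```typescript
type ResidentialIpMode = "throw" | "warn" | false | undefined;

// Assumed verdict shape for illustration; not the documented return type
// of bt.checkIpReputation().
interface IpVerdict {
  ip: string;
  isDatacenter: boolean;
  asn?: string;
}

// Hypothetical gate: returns true when the launch may proceed, throws in 'throw' mode.
function applyResidentialGate(
  mode: ResidentialIpMode,
  verdict: IpVerdict,
  warn: (msg: string) => void = console.warn,
): boolean {
  // Gate disabled, or the IP does not look like a datacenter exit.
  if (!mode || !verdict.isDatacenter) return true;
  const asn = verdict.asn !== undefined ? ` (${verdict.asn})` : "";
  const msg = `Egress IP ${verdict.ip} is on a known datacenter ASN${asn}`;
  if (mode === "throw") throw new Error(msg);
  warn(msg);
  return true;
}
```

In the real flow the check is skipped entirely when the option is unset. In this sketch the verdict is simply ignored in that case.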