statswhatshesaid 0.1.0 → 0.3.0

package/CHANGELOG.md ADDED
@@ -0,0 +1,151 @@
1
+ # statswhatshesaid
2
+
3
+ ## 0.3.0
4
+
5
+ ### Minor Changes
6
+
7
+ - a404924: **Library — opt-in shared-salt mode + raw HLL sketch export**
8
+
9
+ Add a new `saltSecret` option (env: `STATS_SALT_SECRET`). When set, the
10
+ daily HLL salt is derived as `HMAC-SHA-256(saltSecret, utcDate)` instead of
11
+ random per-process bytes. Replicas configured with the same secret then
12
+ produce identical daily salts — the mathematical precondition for an
13
+ external tool to merge HLL sketches across replicas. Cross-day
14
+ unlinkability is preserved (the salt still rotates daily).
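A minimal sketch of the salt derivation described above, using only Web Crypto. The function names `deriveDailySalt` and `toHex` are hypothetical and are not taken from the package. The sketch assumes a runtime with a global `crypto.subtle` (Edge, or Node 20+), and assumes the UTC date is encoded as a `YYYY-MM-DD` string.

```typescript
// Hypothetical helper names; only the HMAC-SHA-256(saltSecret, utcDate)
// construction itself comes from the changelog.
const encoder = new TextEncoder();

async function deriveDailySalt(saltSecret: string, utcDate: string): Promise<Uint8Array> {
  // Import the shared secret as a non-extractable HMAC-SHA-256 key.
  const key = await crypto.subtle.importKey(
    "raw",
    encoder.encode(saltSecret),
    { name: "HMAC", hash: "SHA-256" },
    false,
    ["sign"],
  );
  // The message is the UTC date, so the output changes once per day.
  const mac = await crypto.subtle.sign("HMAC", key, encoder.encode(utcDate));
  return new Uint8Array(mac); // 32 bytes for SHA-256
}

// Hex encoding, handy for comparing or logging salts in a test.
function toHex(bytes: Uint8Array): string {
  return Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");
}
```

Two replicas calling `deriveDailySalt` with the same secret and date get the same 32 bytes, while a different date yields an unrelated value.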
15
+
16
+ When shared-salt mode is on, `GET /stats?format=raw` additionally returns
17
+ the raw 16,384-byte HLL register array (base64) plus an 8-byte
18
+ `saltFingerprint` so a collector can verify replicas are using the same
19
+ salt before merging. When the secret is unset, behavior is unchanged.
20
+
21
+ This is fully backwards-compatible: existing deployments need no changes.
22
+
23
+ **New package — `statswhatshesaid-collector`**
24
+
25
+ External one-shot CLI (`swhsd-collect`) that polls one or more deployed
26
+ `statswhatshesaid` apps and persists their results to a local SQLite
27
+ database. Solves what the in-memory library deliberately does not:
28
+
29
+ - multi-app aggregation
30
+ - best-effort persistence across app restarts
31
+ - long-term retention beyond the library's in-memory window
32
+ - multi-replica merging (opt-in, requires `STATS_SALT_SECRET`)
33
+
34
+ Schedule it however you like — cron, systemd timer, launchd, GitHub
35
+ Actions. See `packages/collector/README.md` for examples.
36
+
37
+ ## 0.2.0
38
+
39
+ ### Minor Changes
40
+
41
+ - c5437a3: **One-line drop-in.** statswhatshesaid is now truly one line:
42
+
43
+ ```ts
44
+ // middleware.ts
45
+ export { default } from "statswhatshesaid";
46
+ ```
47
+
48
+ No `runtime: 'nodejs'` config, no `matcher`, no `experimental` flags, no Next.js 15.2+ requirement. It just works.
49
+
50
+ ## Breaking changes
51
+
52
+ This is a major architectural change disguised as a `minor` bump because we're still in `0.x`. The headline change: **all persistence is gone**. Counts and history live in process memory only and reset on every restart. This is intentional — see "Why" below.
53
+
54
+ - **Removed** `node:fs` entirely. No more snapshot file. No more `.statswhatshesaid.json`. No more atomic-rename writes.
55
+ - **Removed** `node:crypto`. All hashing now uses Web Crypto (`crypto.subtle.digest`, `crypto.getRandomValues`).
56
+ - **Removed** the `runtime: 'nodejs'` middleware-config requirement. The library runs in **both** Edge and Node runtimes since it only uses Web APIs.
57
+ - **Removed** the `PersistAdapter` interface, `FileSnapshotAdapter`, `SnapshotV1` type, and the `persist`, `snapshotPath`, `flushIntervalMs` options.
58
+ - **Removed** `process.on('SIGTERM' | 'SIGINT' | 'beforeExit')` handlers. There's nothing to flush.
59
+ - **Removed** the periodic flush timer.
60
+ - **Removed** the Node-runtime guard (`assertNodeRuntime`). The library no longer cares which runtime you use.
61
+ - **Lowered** the Next.js peer dependency from `>=15.2.0` to `>=13.0.0`.
62
+ - **Default export** of the main package is now a pre-instantiated middleware function (was previously the `stats` object). To customize options, import `createMiddleware`:
63
+ ```ts
64
+ import { createMiddleware } from "statswhatshesaid";
65
+ export default createMiddleware({ filterBots: false });
66
+ ```
67
+ - **`createMiddleware` now returns an `async` function** since `crypto.subtle.digest` is async. Next.js middleware natively supports async.
68
+ - **`trackRequest` is now async** for the same reason.
69
+
70
+ ## New features
71
+
72
+ - **Self-filters common static paths** before tracking (`/_next/static/*`, `/_next/image/*`, `/favicon.ico`, `/robots.txt`, `/sitemap.xml`, `/manifest.json`, etc.) so users don't need a custom `matcher` to skip static assets.
73
+ - **One-line integration**: `export { default } from 'statswhatshesaid'` is the entire `middleware.ts`.
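As an illustration of the static-path self-filter, here is a small sketch. `isStaticPath` and both constants are hypothetical names, and the lists contain only the paths named above; the package's real list is longer (the "etc.") and may be matched differently.

```typescript
// Hypothetical sketch: only the paths named in the changelog are listed.
const STATIC_PREFIXES: readonly string[] = ["/_next/static/", "/_next/image/"];
const STATIC_EXACT: ReadonlySet<string> = new Set([
  "/favicon.ico",
  "/robots.txt",
  "/sitemap.xml",
  "/manifest.json",
]);

// Returns true when a request path should be skipped before tracking.
function isStaticPath(pathname: string): boolean {
  if (STATIC_EXACT.has(pathname)) return true;
  return STATIC_PREFIXES.some((prefix) => pathname.startsWith(prefix));
}
```

A middleware would call this on `new URL(request.url).pathname` and return early for static assets, which is what removes the need for a custom `matcher`.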
74
+
75
+ ## Why (the short version)
76
+
77
+ The previous version's promise of "drop in" was undermined by the four lines of `export const config = { matcher, runtime: 'nodejs' }` boilerplate users had to write, plus the Next.js 15.2+ requirement and the directory of snapshot/WAL/SHM-equivalent file artifacts. The user explicitly wanted a true drop-in for monitoring freshly launched apps and accepted the trade-off of in-memory-only state.
78
+
79
+ Edge runtime is now a first-class target. You can deploy this on Vercel Edge Middleware, on a Docker scratch image, on Cloudflare Pages — anywhere modern JS runs.
80
+
81
+ ## Bundle size
82
+
83
+ ESM bundle: 18.7 KB → **12.2 KB** (smaller because `snapshot.ts`, `FileSnapshotAdapter`, the persist abstraction, and the lifecycle plumbing are all gone).
84
+
85
+ ## Tests
86
+
87
+ 89 unit and integration tests, all passing. The 5-test snapshot suite was deleted along with the file adapter. Persistence-restart and corruption-recovery tests were dropped (no persistence to test). New tests cover the static-path filter, the Web Crypto hash path, and the async constant-time compare.
88
+
89
+ End-to-end smoke tested via the `examples/basic` Next.js app: 2 distinct visitors counted, dedup correct, bot filtered, favicon skipped, both query and `Authorization` token paths working, wrong token rejected.
90
+
91
+ ## 0.1.0
92
+
93
+ ### Minor Changes
94
+
95
+ - 7880aa7: Initial release of `statswhatshesaid` — a super minimal drop-in unique-visitors-per-day stats library for self-hosted Next.js.
96
+
97
+ **Features:**
98
+
99
+ - One-line integration via Next.js middleware (`export default stats.middleware()`).
100
+ - Single `/stats?t=<token>` endpoint returning JSON (today's estimate + history).
101
+ - Cookieless visitor identification: `SHA-256(ip + ua + dailySalt)`, salt rotates at UTC midnight.
102
+ - HyperLogLog (p=14) cardinality estimation — fixed 16 KB per day, ~0.8% standard error.
103
+ - Single JSON snapshot file (~22 KB) with atomic `.tmp` + rename writes. Default `./.statswhatshesaid.json`.
104
+ - Pluggable `PersistAdapter` for bring-your-own backends (Redis, KV, S3).
105
+ - Edge-runtime guard with a clear, actionable error message.
106
+ - **Zero runtime dependencies.** No native modules, no Docker volume gymnastics. Works on Alpine, slim, distroless.
107
+ - Requires Next.js ≥ 15.2 for the `nodejs` middleware runtime.
108
+
109
+ - 3b295d6: Second-pass hardening found via a targeted re-audit. These fixes all sit inside the existing v0.1.0 window (still unreleased), so they roll into the first published release.
110
+
111
+ **Length-prefixed visitor hash construction.** `computeVisitorHash` now prepends a big-endian length header for each variable-length component (ip, ua) before feeding them into SHA-256. The previous `ip + ":" + ua + ":" + salt` encoding was input-ambiguous: with IPv6 addresses containing colons, two distinct `(ip, ua)` pairs could produce the same pre-image and therefore the same hash. Length-prefixing makes the pre-image unambiguous.
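The ambiguity is easy to reproduce. In the sketch below, `joinedPreimage` mirrors the old `ip + ":" + ua + ":" + salt` encoding and `lengthPrefixedPreimage` shows one possible length-prefixed layout. Both function names are hypothetical, and the 4-byte width of the length header is an assumption; the changelog only says the header is big-endian.

```typescript
const enc = new TextEncoder();

// Old-style encoding: components joined with ":" (ambiguous for IPv6).
function joinedPreimage(ip: string, ua: string, salt: string): string {
  return ip + ":" + ua + ":" + salt;
}

// Length-prefixed encoding: a big-endian byte length (assumed 4 bytes wide)
// before each variable-length component, followed by the salt bytes.
function lengthPrefixedPreimage(ip: string, ua: string, salt: Uint8Array): Uint8Array {
  const parts = [enc.encode(ip), enc.encode(ua)];
  const total = parts.reduce((n, p) => n + 4 + p.length, 0) + salt.length;
  const out = new Uint8Array(total);
  const view = new DataView(out.buffer);
  let offset = 0;
  for (const part of parts) {
    view.setUint32(offset, part.length, false); // false = big-endian
    offset += 4;
    out.set(part, offset);
    offset += part.length;
  }
  out.set(salt, offset);
  return out;
}

// Two distinct (ip, ua) pairs that collide under the old encoding.
const oldA = joinedPreimage("::1", "1:bot", "salt");
const oldB = joinedPreimage("::1:1", "bot", "salt");
const newA = lengthPrefixedPreimage("::1", "1:bot", enc.encode("salt"));
const newB = lengthPrefixedPreimage("::1:1", "bot", enc.encode("salt"));
```

Here `oldA` and `oldB` are the same string, so they would hash identically, while `newA` and `newB` differ in their very first length header.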
112
+
113
+ **Snapshot load is now crash-proof.** `VisitorStore.fromSnapshot` wraps both the same-day and cross-day branches in try/catch and degrades gracefully:
114
+
115
+ - Decoded salt length is validated (must be exactly 32 bytes) before use.
116
+ - Decoded HLL register length is validated (must be exactly `HLL_REGISTER_COUNT` bytes) before use. `Buffer.from(x, 'base64')` silently ignores malformed characters, so the base64 string-length check in `isValidSnapshot` alone was insufficient.
117
+ - On any decode/validation failure, the store starts fresh for the current day rather than throwing out of init. Up to a few minutes of same-day dedupe state is lost; the process stays up.
118
+
119
+ **Strict history validation.** `sanitizeHistory` drops any entry that isn't a real `YYYY-MM-DD` calendar date (validated via `Date.UTC` round-trip, rejecting `2026-02-30` and `2025-02-29`) mapped to a non-negative integer count. Entries for `currentDate` itself are dropped (today's count is owned by the live HLL). Protects against snapshot files poisoned by whoever has write access.
120
+
121
+ **Snapshot validator rejects arrays.** `isValidSnapshot` now explicitly rejects arrays for the `history` field. Previously `typeof [] === 'object'` let arrays through, which would then be iterated with numeric-string keys.
122
+
123
+ **Config sanity checks.** `resolveConfig` now validates:
124
+
125
+ - `flushIntervalMs` must be a positive integer ≥ 1000 ms (prevents `setInterval(tick, 0)` hot loops from a bad config).
126
+ - `historyDays` and `maxHistoryDays` must be non-negative integers.
127
+ - `endpointPath` must match `^/[A-Za-z0-9\-._~/]*$` — no whitespace, CR/LF, or shell metacharacters.
128
+
129
+ All throw loud, clear errors at config resolution time.
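The checks above could look roughly like this. `checkConfigSketch` and `ConfigSketch` are hypothetical names, the error messages are invented for illustration, and only the three rules themselves come from the changelog.

```typescript
// Hypothetical shape covering just the validated fields.
interface ConfigSketch {
  endpointPath: string;
  flushIntervalMs: number;
  historyDays: number;
}

// Same pattern as in the changelog, written as a JS regex literal.
const ENDPOINT_PATH_RE = /^\/[A-Za-z0-9\-._~\/]*$/;

function checkConfigSketch(cfg: ConfigSketch): void {
  if (!ENDPOINT_PATH_RE.test(cfg.endpointPath)) {
    throw new Error(`invalid endpointPath: ${JSON.stringify(cfg.endpointPath)}`);
  }
  // Positive integer of at least 1000 ms, so a bad value cannot cause a hot loop.
  if (!Number.isInteger(cfg.flushIntervalMs) || cfg.flushIntervalMs < 1000) {
    throw new Error(`flushIntervalMs must be an integer >= 1000, got ${cfg.flushIntervalMs}`);
  }
  if (!Number.isInteger(cfg.historyDays) || cfg.historyDays < 0) {
    throw new Error(`historyDays must be a non-negative integer, got ${cfg.historyDays}`);
  }
}
```

Failing at resolution time means a typo in an env var surfaces on boot rather than as odd behavior later.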
130
+
131
+ **New `isValidUtcDate` helper.** Shared between snapshot validation and history sanitization. Rejects calendrically-impossible dates like `2026-02-30` via `Date.UTC` round-trip, not just via regex.
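A sketch of the `Date.UTC` round-trip technique. This is one plausible way to write such a check, not the package's actual `isValidUtcDate` source, so it is named `isValidUtcDateSketch` here.

```typescript
// Accepts only real YYYY-MM-DD calendar dates.
function isValidUtcDateSketch(s: string): boolean {
  const m = /^(\d{4})-(\d{2})-(\d{2})$/.exec(s);
  if (!m) return false;
  const year = Number(m[1]);
  const month = Number(m[2]);
  const day = Number(m[3]);
  // Date.UTC silently normalizes overflow (Feb 30 becomes Mar 2), so a date
  // is valid only if the components survive the round-trip unchanged.
  const d = new Date(Date.UTC(year, month - 1, day));
  return (
    d.getUTCFullYear() === year &&
    d.getUTCMonth() === month - 1 &&
    d.getUTCDate() === day
  );
}
```

The regex alone would accept `2026-02-30`; the round-trip comparison is what rejects it.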
132
+
133
+ **Tests.** 23 new hardening tests across four describe blocks covering the hash input-ambiguity fix, date validation, config validation, and graceful snapshot-load degradation. Total test count: 76 → 99, all green.
134
+
135
+ - 64b583f: Security hardening pass before the first public release.
136
+
137
+ **New `trustProxy` option** (default: `1`). Determines how many reverse-proxy hops to skip when resolving the client IP from `X-Forwarded-For`. The library now walks the XFF chain from the RIGHT (instead of blindly taking the leftmost entry), which defeats the standard client-side XFF spoofing attack when at least one trusted proxy sits in front of the process. Set to `0` to ignore forwarding headers entirely, or to `N > 1` for chained proxies (e.g. Cloudflare → nginx → app = `2`). Configurable via `STATS_TRUST_PROXY` env var. See the README Security section for recipes.
138
+
139
+ **`/stats` now accepts `Authorization: Bearer <token>`** in addition to the `?t=<token>` query string. The header is preferred in production because it does not leak into access logs, browser history, or Referer headers. If both are provided, the header wins.
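The precedence rule can be sketched as follows. `extractToken` is a hypothetical name, and falling back to the query string when the header is present but is not a `Bearer` credential is an assumption of this sketch, not documented behavior.

```typescript
// Returns the token from the Authorization header if it carries a Bearer
// credential, otherwise from the ?t= query parameter, otherwise null.
function extractToken(authorization: string | null, url: string): string | null {
  if (authorization !== null) {
    const m = /^Bearer\s+(.+)$/i.exec(authorization.trim());
    if (m) return m[1] ?? null; // header wins when both are present
  }
  return new URL(url).searchParams.get("t");
}
```

In a middleware this would be called as `extractToken(request.headers.get("authorization"), request.url)`.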
140
+
141
+ **Weak-token warning.** The library emits a one-time `console.warn` at init time if the token is shorter than 32 characters, with guidance to run `openssl rand -hex 32`. The library does NOT reject short tokens — you may deliberately pick a memorable one for ad-hoc browser access.
142
+
143
+ **Snapshot file is now written with mode `0o600`** (owner read/write only). The snapshot contains the current day's visitor-hashing salt and should not be world-readable.
144
+
145
+ **User-Agent truncation.** Incoming `User-Agent` headers are truncated to 512 bytes before hashing and bot filtering, bounding per-request CPU cost regardless of the upstream header-size limit.
146
+
147
+ **Constant-time token comparison.** Token validation now prehashes both sides with SHA-256 before `timingSafeEqual`, so the comparison no longer branches on token length.
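A minimal sketch of the prehash-then-compare technique, using `node:crypto` as the 0.1.0 line did. `tokensMatch` is a hypothetical name. Hashing first is what makes `timingSafeEqual` usable here, since it throws when its two inputs differ in length.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Both sides are reduced to 32-byte SHA-256 digests, so the comparison
// always runs over equal-length buffers regardless of token length.
function tokensMatch(provided: string, expected: string): boolean {
  const a = createHash("sha256").update(provided, "utf8").digest();
  const b = createHash("sha256").update(expected, "utf8").digest();
  return timingSafeEqual(a, b);
}
```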
148
+
149
+ **Process signal handler leak fix.** `shutdown()` now calls `process.removeListener` for its own handlers, fixing a `MaxListenersExceededWarning` that appeared when many init/shutdown cycles ran in the same process (e.g. dev-mode HMR, test suites).
150
+
151
+ **README: new Security section** covering the threat model, `trustProxy` semantics with nginx/Caddy/Cloudflare recipes, token handling, flooding limitations, snapshot file contents, and privacy properties.
package/README.md CHANGED
@@ -1,14 +1,14 @@
1
1
  # statswhatshesaid
2
2
 
3
- A super minimal drop-in stats library for **self-hosted Next.js**. One metric, one line of integration, **zero runtime dependencies**.
3
+ A super minimal **one-line** drop-in stats library for Next.js. One metric, one line of integration, **zero runtime dependencies**, in-memory only, runs in **both** the Edge and Node runtimes.
4
4
 
5
5
  - Tracks **unique visitors per day** — that's it.
6
6
  - No tracking pixel, no client JS, no cookies.
7
- - **Zero dependencies.** No native modules, no SQLite, no Docker volume gymnastics.
8
- - Single ~22KB JSON file for persistence. Atomic writes. Put it anywhere or nowhere.
7
+ - **Zero dependencies.** No native modules, no filesystem, no SQLite, no Docker volume gymnastics.
8
+ - **Works anywhere.** Edge runtime, Node runtime, Vercel, self-hosted, Docker, scratch images. The library uses only Web APIs (`crypto.subtle`, `crypto.getRandomValues`, `globalThis.fetch`).
9
9
  - Read your stats by visiting `myapp.com/stats?t=<your-secret>` — JSON response.
10
10
 
11
- > **Designed for freshly launched apps.** Once traffic gets serious you should graduate to a proper analytics suite (Plausible, Umami, PostHog, ...). This library is the thing you drop in on day one so you can tell whether anyone's visiting yet, with absolutely no setup ceremony.
11
+ > **Designed for freshly launched apps.** Counts and history live in process memory. They survive across requests within a single worker but reset on every deploy / restart. That's the trade-off for "drop in and forget." Once your traffic warrants real analytics, graduate to Plausible / Umami / PostHog.
12
12
 
13
13
  ## Install
14
14
 
@@ -18,21 +18,14 @@ npm install statswhatshesaid
18
18
 
19
19
  ## Use it
20
20
 
21
- Add **one line** to your `middleware.ts`:
21
+ **One line.** That's it.
22
22
 
23
23
  ```ts
24
24
  // middleware.ts
25
- import stats from 'statswhatshesaid'
26
-
27
- export default stats.middleware()
28
-
29
- export const config = {
30
- matcher: ['/((?!_next/static|_next/image|favicon.ico).*)'],
31
- runtime: 'nodejs', // REQUIRED — see "Edge Runtime" below
32
- }
25
+ export { default } from 'statswhatshesaid'
33
26
  ```
34
27
 
35
- Set your secret in the environment:
28
+ Set your secret:
36
29
 
37
30
  ```bash
38
31
  STATS_TOKEN=pick-a-long-random-string
@@ -57,34 +50,50 @@ You'll get JSON back:
57
50
  }
58
51
  ```
59
52
 
60
- That's the whole library.
53
+ That's the whole library. No `runtime: 'nodejs'` config, no `matcher`, no `experimental`, no `next.config` flags. Just one re-export line.
61
54
 
62
- ## Edge Runtime — read this first
55
+ ## Customizing options
63
56
 
64
- Next.js middleware defaults to the **Edge runtime**, which can't run `node:crypto` or `node:fs`. You **must** opt into the Node runtime:
57
+ If you need to change defaults (bot filter, endpoint path, history retention, `trustProxy` hops), import `createMiddleware` instead:
65
58
 
66
59
  ```ts
60
+ // middleware.ts
61
+ import { createMiddleware } from 'statswhatshesaid'
62
+
63
+ export default createMiddleware({
64
+ endpointPath: '/_internal/stats',
65
+ filterBots: false,
66
+ trustProxy: 2,
67
+ })
68
+ ```
69
+
70
+ You can also set a custom `matcher` if you want the middleware to run on a narrower path set than "everything":
71
+
72
+ ```ts
73
+ import { createMiddleware } from 'statswhatshesaid'
74
+
75
+ export default createMiddleware()
76
+
67
77
  export const config = {
68
- matcher: [...],
69
- runtime: 'nodejs',
78
+ matcher: ['/((?!api).*)'],
70
79
  }
71
80
  ```
72
81
 
73
- This is stable in **Next.js 15.2 and newer**.
74
-
75
82
  ## How a "unique visitor" is counted
76
83
 
77
84
  Cookieless, Plausible-style:
78
85
 
79
86
  ```
80
- visitorHash = SHA-256( ip + ":" + userAgent + ":" + dailySalt )
87
+ visitorHash = SHA-256( length-prefixed( ip ) + length-prefixed( userAgent ) + dailySalt )
81
88
  ```
82
89
 
83
- - `dailySalt` is generated in process memory and rotates at every UTC midnight.
90
+ - `dailySalt` is generated in process memory at startup and rotated lazily at every UTC midnight.
84
91
  - The hash is fed into a [**HyperLogLog** sketch](https://en.wikipedia.org/wiki/HyperLogLog) with 16384 one-byte registers (16 KB fixed per day, forever).
85
- - At UTC midnight the day's estimate is written to a historical map and the sketch is reset with a fresh salt.
92
+ - At UTC midnight the day's estimate is moved to an in-memory historical map and the sketch is reset with a fresh salt.
86
93
  - Cross-day unlinkability: because the salt is regenerated, hashes from different days can't be correlated back to the same visitor.
94
+ - The hash inputs are length-prefixed so two distinct `(ip, ua)` pairs can never collide via separator ambiguity.
87
95
  - Common bot User-Agents are filtered out by default.
96
+ - Common static asset paths (`/_next/static/*`, `/_next/image/*`, `/favicon.ico`, `/robots.txt`, `/sitemap.xml`, `/manifest.json`, etc.) are filtered out before tracking, so you don't need a custom `matcher`.
88
97
 
89
98
  ### About accuracy
90
99
 
@@ -94,122 +103,57 @@ If you need exact counts down to the last human, don't use this library — grad
94
103
 
95
104
  ## Storage
96
105
 
97
- A single JSON file. Default location: `./.statswhatshesaid.json`.
98
-
99
- ```jsonc
100
- {
101
- "version": 1,
102
- "today": "2026-04-07",
103
- "salt": "<base64 32 bytes>",
104
- "hllRegisters": "<base64 16 KB>",
105
- "history": { "2026-04-06": 388, "2026-04-05": 401 }
106
- }
107
- ```
108
-
109
- - **One file.** Not a directory, not a DB, no WAL/SHM sidecars.
110
- - **~22 KB today + 20 bytes per historical day.** Never grows beyond a few hundred KB, ever.
111
- - **Atomic writes** via write-to-`.tmp` + `rename`. Crash-safe.
112
- - **Flushed every hour** (tunable) and on `SIGTERM`/`SIGINT`/`beforeExit`.
113
- - **Nothing on the hot path touches disk.** Tracking a visit is: one SHA-256, one HLL register update. Sub-millisecond.
114
-
115
- ### Docker / containers
116
-
117
- Because it's one small file, you have options:
118
-
119
- ```dockerfile
120
- # Option A: persist it on a volume
121
- VOLUME /data
122
- ENV STATS_SNAPSHOT_PATH=/data/stats.json
123
- ```
124
-
125
- ```dockerfile
126
- # Option B: bind-mount a single file from the host
127
- # docker run -v $(pwd)/stats.json:/app/stats.json \
128
- # -e STATS_SNAPSHOT_PATH=/app/stats.json ...
129
- ```
130
-
131
- ```dockerfile
132
- # Option C: accept ephemerality. Losing "today" on a redeploy is often fine
133
- # for a small app. The snapshot is flushed on SIGTERM when Node is PID 1,
134
- # so graceful stops keep the latest data.
135
- ENV STATS_SNAPSHOT_PATH=/tmp/statswhatshesaid.json
136
- ```
137
-
138
- Works fine on `node:20-alpine`, `node:20-slim`, distroless — there are no native modules to compile.
106
+ **There is none.** Counts and history live in module-level memory inside whichever Next.js worker is running your middleware.
139
107
 
140
- ### Bring your own backend
108
+ - ✅ State **survives across requests** within a single worker / Edge isolate (which is what makes the counter actually count).
109
+ - ❌ State is **lost on every deploy**, process restart, or worker recycle.
110
+ - ❌ State is **per-instance**: if you're running multiple replicas behind a load balancer, each replica has its own counter and they don't sync. Run a single instance, or use a real analytics tool.
141
111
 
142
- If you want to stash the snapshot in Redis, Vercel KV, S3, or anything else, pass a `persist` adapter:
143
-
144
- ```ts
145
- import stats from 'statswhatshesaid'
146
- import type { PersistAdapter, SnapshotV1 } from 'statswhatshesaid'
147
-
148
- const redisPersist: PersistAdapter = {
149
- load: () => {
150
- const raw = redisClient.get('statswhatshesaid:snap') // your sync/blocking client
151
- return raw ? (JSON.parse(raw) as SnapshotV1) : null
152
- },
153
- save: (snap) => {
154
- redisClient.set('statswhatshesaid:snap', JSON.stringify(snap))
155
- },
156
- }
157
-
158
- export default stats.middleware({ persist: redisPersist })
159
- ```
160
-
161
- The adapter interface is synchronous on purpose so the shutdown handler can flush deterministically.
112
+ This is intentional. The library exists to give freshly launched apps an "is anybody home?" signal in 30 seconds with zero infrastructure. Persistence and replication are a different problem class — graduate when you need them.
162
113
 
163
114
  ## Configuration
164
115
 
165
- Configure via env vars (preferred) or by passing options to `stats.middleware({...})`. Options override env.
116
+ Configure via env vars (preferred for `STATS_TOKEN`) or by passing options to `createMiddleware({...})`. Options override env.
166
117
 
167
118
  | Option | Env var | Default |
168
119
  | --- | --- | --- |
169
120
  | `token` | `STATS_TOKEN` | **required** |
170
- | `snapshotPath` | `STATS_SNAPSHOT_PATH` | `./.statswhatshesaid.json` |
171
- | `persist` | — | file adapter at `snapshotPath` |
172
- | `flushIntervalMs` | `STATS_FLUSH_INTERVAL_MS` | `3600000` (1 hour) |
173
121
  | `endpointPath` | `STATS_ENDPOINT_PATH` | `/stats` |
174
122
  | `historyDays` | — | `90` (returned from `/stats`) |
175
- | `maxHistoryDays` | — | `365` (kept in snapshot) |
123
+ | `maxHistoryDays` | — | `365` (kept in memory) |
176
124
  | `filterBots` | — | `true` |
177
125
  | `trustProxy` | `STATS_TRUST_PROXY` | `1` (see [Security](#security) below) |
126
+ | `saltSecret` | `STATS_SALT_SECRET` | unset (see [Multi-replica deployments](#multi-replica-deployments) below) |
178
127
 
179
- ```ts
180
- export default stats.middleware({
181
- endpointPath: '/_internal/stats',
182
- flushIntervalMs: 5 * 60 * 1000,
183
- historyDays: 30,
184
- trustProxy: 1,
185
- })
186
- ```
128
+ ## Multi-replica deployments
129
+
130
+ The default in-memory design is single-instance: each Next.js worker has its own HyperLogLog sketch. If you run multiple replicas, each replica counts the visitors it serves, with no awareness of the others — visitor numbers across `/stats` will differ from replica to replica.
131
+
132
+ If you want a single consolidated number across replicas, you can opt in to **shared-salt mode** and pair it with an external collector that merges sketches:
133
+
134
+ 1. Set `STATS_SALT_SECRET` (any long random string — `openssl rand -hex 32`) to the **same value** on every replica. The daily HLL salt then becomes `HMAC-SHA-256(saltSecret, utcDate)` — deterministic across replicas, still rotating daily, so cross-day unlinkability is preserved.
135
+ 2. Run [`statswhatshesaid-collector`](../collector) — an external CLI — on a machine you control. Configure it with the per-replica URLs and the `STATS_TOKEN`. The collector polls `/stats?format=raw` from each replica, fetches the raw HLL register array plus a salt fingerprint, verifies the fingerprints match, merges the sketches register-wise (element-wise max), and stores the merged daily number in a local SQLite database.
136
+
137
+ If you don't set `STATS_SALT_SECRET`, the library behaves exactly as before — random per-process salts, `/stats?format=raw` simply ignored — and you can run a single-replica deployment without any of this.
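The register-wise merge in step 2 is a standard HyperLogLog operation and can be sketched in a few lines. `mergeRegisters` is a hypothetical name, not the collector's API. The sketch assumes every input uses the same register count and was built with the same daily salt, which is what the fingerprint check is for.

```typescript
// Merges HLL sketches by taking the element-wise maximum of their registers.
function mergeRegisters(sketches: Uint8Array[]): Uint8Array {
  const [first, ...rest] = sketches;
  if (first === undefined) throw new Error("need at least one sketch");
  const merged = Uint8Array.from(first); // copy so the inputs stay untouched
  for (const sketch of rest) {
    if (sketch.length !== merged.length) {
      throw new Error("register count mismatch");
    }
    for (let i = 0; i < merged.length; i++) {
      merged[i] = Math.max(merged[i] ?? 0, sketch[i] ?? 0);
    }
  }
  return merged;
}
```

Because the same visitor hashes to the same register and rank on every replica, the maximum yields the sketch a single instance would have built from the combined traffic. With different salts per replica that property is lost and the merged estimate is meaningless.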
187
138
 
188
139
  ## Security
189
140
 
190
- This is a minimal library, but it runs inside your app's request path and writes to your filesystem, so its defaults matter. Read this section before deploying.
141
+ This is a minimal library, but it runs inside your app's request path, so its defaults matter. Read this section before deploying.
191
142
 
192
143
  ### Threat model
193
144
 
194
145
  - **In scope:** preventing trivial forging of visitor counts, protecting the `/stats` endpoint from unauthorized reads, keeping the process alive under abuse, making visitor hashes cross-day unlinkable.
195
- - **Out of scope:** preventing a determined attacker with unlimited resources from skewing the numbers. statswhatshesaid is for day-one visibility on small, self-hosted apps. Once your traffic is big enough that someone would bother flooding your stats, you should be on Plausible / Umami / PostHog anyway.
146
+ - **Out of scope:** preventing a determined attacker with unlimited resources from skewing the numbers. statswhatshesaid is for day-one visibility on small apps. Once your traffic is big enough that someone would bother flooding your stats, you should be on Plausible / Umami / PostHog anyway.
196
147
 
197
148
  ### 1. `trustProxy` — who decides the client IP?
198
149
 
199
150
  Unique-visitor dedup hashes the client IP alongside the User-Agent. If the attacker controls the IP you hash with, they control the count.
200
151
 
201
- **The problem:** `X-Forwarded-For` is a list of IPs separated by commas. Each reverse proxy in the chain **appends** the IP of *its own peer* (the thing that spoke TCP to it). The *leftmost* entry is whatever the original client claimed — i.e. attacker-controlled. The *rightmost N entries* are what trusted proxies added, so they're authentic.
202
-
203
- To pick the real client IP safely you must **walk the chain from the right, skipping one entry per trusted proxy**.
204
-
205
- **Configuration:**
206
-
207
- - `trustProxy: 0` — Never read forwarding headers. Every request hashes to a single constant peer. `uniqueVisitors` will under-count dramatically (ideally it collapses to 1), but **nothing an attacker sends can forge it**. Use this only if (a) your process is directly exposed to untrusted clients, or (b) you're OK with a "did anybody visit today?" binary signal.
208
-
209
- - `trustProxy: 1` **(default)** — One trusted reverse proxy sits in front of this Node process. The library takes the **rightmost** entry of `X-Forwarded-For`. This is correct for the single most common self-hosted shape: `client → nginx → next`, or `client → Caddy → next`, or `client → Traefik → next`.
210
-
211
- - `trustProxy: 2` — Two trusted hops. The library takes the **second-from-right** entry of `X-Forwarded-For`. Use this for setups like `client → Cloudflare → nginx → next` where Cloudflare is ALSO adding to XFF.
152
+ `X-Forwarded-For` is a list of IPs separated by commas. Each reverse proxy in the chain **appends** the IP of *its own peer*. The *leftmost* entry is whatever the original client claimed — i.e. attacker-controlled. The *rightmost N entries* are what trusted proxies added, so they're authentic. To pick the real client IP safely you must **walk the chain from the right, skipping one entry per trusted proxy**.
212
153
 
154
+ - `trustProxy: 0` — Never read forwarding headers. Every request hashes to a single constant peer. `uniqueVisitors` will under-count, but **nothing an attacker sends can forge it**.
155
+ - `trustProxy: 1` **(default)** — One trusted reverse proxy in front of this process (`client → nginx → next`). Library takes the **rightmost** entry of `X-Forwarded-For`.
156
+ - `trustProxy: 2` — Two trusted hops (`client → Cloudflare → nginx → next`). Library takes the **second-from-right** entry.
213
157
  - `trustProxy: N` — Generalizes to N trusted hops.
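The right-to-left walk can be sketched as below. `resolveClientIp` is a hypothetical name, and returning `null` when the chain is shorter than `trustProxy` is an assumption of this sketch; the README does not say what the library does in that case.

```typescript
// Picks the entry that the outermost trusted proxy appended, counting
// `trustProxy` entries from the right of X-Forwarded-For.
function resolveClientIp(xForwardedFor: string | null, trustProxy: number): string | null {
  // trustProxy = 0 means forwarding headers are never read.
  if (trustProxy <= 0 || xForwardedFor === null) return null;
  const hops = xForwardedFor
    .split(",")
    .map((entry) => entry.trim())
    .filter((entry) => entry.length > 0);
  const index = hops.length - trustProxy;
  if (index < 0) return null; // fewer entries than trusted hops
  return hops[index] ?? null;
}
```

With `trustProxy: 1` and a header of `6.6.6.6, 203.0.113.7`, the spoofed leftmost value is ignored and `203.0.113.7` is used.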
214
158
 
215
159
  **nginx recipe (trustProxy = 1):**
@@ -222,93 +166,64 @@ location / {
222
166
  }
223
167
  ```
224
168
 
225
- `$proxy_add_x_forwarded_for` appends the client's socket IP to whatever XFF the client sent. With `trustProxy: 1`, statswhatshesaid ignores whatever the client sent and takes the rightmost entry (which is what nginx appended). The client's spoofed values sit uselessly to the left.
226
-
227
- **Caddy recipe (trustProxy = 1):**
228
-
229
- ```caddyfile
230
- example.com {
231
- reverse_proxy 127.0.0.1:3000
232
- }
233
- ```
234
-
235
- Caddy automatically appends the client IP to `X-Forwarded-For` by default.
236
-
237
- **Cloudflare + nginx recipe (trustProxy = 2):**
238
-
239
- ```nginx
240
- # nginx behind Cloudflare
241
- location / {
242
- proxy_pass http://127.0.0.1:3000;
243
- proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
244
- }
245
- ```
246
-
247
- With `trustProxy: 2`, the second-from-right entry is the real client: `attacker-spoof, real-client, cloudflare-edge`.
169
+ `$proxy_add_x_forwarded_for` appends the client's socket IP to whatever XFF the client sent. With `trustProxy: 1`, statswhatshesaid takes the rightmost entry (nginx's appended value), and the client's spoofed values sit uselessly to the left.
248
170
 
249
- **Direct-exposed (no proxy) warning:** If you're running `next start` straight on `0.0.0.0:3000` with no proxy, **any header you see is attacker-controlled**. Set `trustProxy: 0` and accept that visitor dedup won't work, OR put any reverse proxy in front.
171
+ **Direct-exposed (no proxy) warning:** If you're running Next.js straight on `0.0.0.0:3000` with no proxy in front, **any header you see is attacker-controlled**. Set `trustProxy: 0` and accept that visitor dedup won't work, OR put any reverse proxy in front.
250
172
 
251
173
  ### 2. Token strength and rate limiting
252
174
 
253
- `/stats` is protected by a single static token. A short token is brute-forceable if an attacker hammers the endpoint.
175
+ `/stats` is protected by a single static token. A short token is brute-forceable.
254
176
 
255
- - statswhatshesaid **warns** at startup if your token is shorter than 32 characters. It does not reject — you might deliberately pick a memorable token for ad-hoc browser access from anywhere.
177
+ - statswhatshesaid **warns** at startup if your token is shorter than 32 characters. It does not reject — you might pick a memorable token for ad-hoc browser access.
256
178
  - A safer choice: `openssl rand -hex 32` → a 64-char hex string.
257
- - The library does **not** rate-limit `/stats`. That's your CDN / reverse-proxy / application middleware's job. For nginx: [`limit_req`](https://nginx.org/en/docs/http/ngx_http_limit_req_module.html). For Cloudflare: [rate limiting rules](https://developers.cloudflare.com/waf/rate-limiting-rules/). For Next.js middleware chains: [`@upstash/ratelimit`](https://github.com/upstash/ratelimit-js).
179
+ - The library does **not** rate-limit `/stats`. That's your CDN / reverse-proxy / application middleware's job ([nginx `limit_req`](https://nginx.org/en/docs/http/ngx_http_limit_req_module.html), [Cloudflare rate limiting](https://developers.cloudflare.com/waf/rate-limiting-rules/), [`@upstash/ratelimit`](https://github.com/upstash/ratelimit-js)).
258
180
 
259
181
  ### 3. Passing the token: `Authorization` header vs query string
260
182
 
261
- You can pass the token two ways:
183
+ Two ways to pass the token:
262
184
 
263
185
  | Method | Use when |
264
186
  | --- | --- |
265
187
  | `Authorization: Bearer <token>` header | **Production** — doesn't leak to access logs, browser history, or Referer |
266
- | `?t=<token>` query string | Ad-hoc browser checks where typing a header is annoying |
188
+ | `?t=<token>` query string | Ad-hoc browser checks |
267
189
 
268
- Both are accepted. If both are present, the `Authorization` header wins. Example production check:
190
+ Both are accepted. If both are present, the `Authorization` header wins.
269
191
 
270
192
  ```bash
271
193
  curl -H "Authorization: Bearer $STATS_TOKEN" https://myapp.com/stats
272
194
  ```
273
195
 
274
- The query string is convenient but ends up in **nginx/CDN access logs, browser history, and Referer headers**. Don't link to `/stats?t=...` from any page.
275
-
276
196
  ### 4. Count inflation by flooding
277
197
 
278
- An attacker who can send arbitrary (IP, User-Agent) pairs — even behind a correctly configured `trustProxy` — can insert arbitrarily many distinct "visitors" into the HLL sketch. Memory doesn't blow up (HLL is fixed 16 KB/day), but the reported `uniqueVisitors` becomes meaningless during the attack. The library cannot prevent this at the middleware layer. **Defense:** rate-limit tracked routes at the same layer that protects the rest of your app. Don't treat the number as authoritative during a suspected abuse event.
198
+ An attacker who can send arbitrary `(IP, User-Agent)` pairs can insert arbitrarily many distinct "visitors" into the HLL sketch. Memory doesn't blow up (HLL is fixed 16 KB/day), but the reported count becomes meaningless during the attack. The library can't prevent this at the middleware layer; rate-limit at your CDN / reverse proxy instead.
279
199
 
280
- ### 5. Snapshot file permissions and contents
200
+ ### 5. Privacy properties
281
201
 
282
- - The snapshot file is written with mode `0o600` (owner read/write only). It contains the current day's salt, which would make visitor hashes linkable back to their `(ip, ua)` tuples if disclosed alongside an independent request log.
283
- - Write is atomic via `.tmp` + `rename`. A crash mid-write leaves the previous snapshot intact.
284
- - The snapshot file contains **no personal data**: just the HLL registers, the salt, and per-day visitor counts. No IPs or User-Agents are stored.
202
+ - **Cookieless.** The library never sets or reads cookies.
203
+ - **No personal data persisted.** Hashes go into the HLL (which discards them) and are never written anywhere. No filesystem, no remote calls.
204
+ - **Cross-day unlinkability.** The salt rotates at every UTC midnight. Yesterday's hash of `(ip, ua)` is unrelated to today's hash of the same tuple.
205
+ - **No telemetry.** The library makes zero outbound network requests.
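
To make the cross-day unlinkability property concrete, here is a minimal sketch using Node's built-in `crypto` module. It is not the library's implementation: the helper names, the `YYYY-MM-DD` date format, and the way `(ip, ua)` is combined and hashed are all assumptions. Only the shared-secret derivation `HMAC-SHA-256(saltSecret, utcDate)` comes from the changelog.

```typescript
import { createHmac, randomBytes } from 'node:crypto'

// Hypothetical helpers for illustration; not the library's internals.

// UTC calendar date, e.g. "2024-05-01" (format is an assumption).
function utcDate(now: Date): string {
  return now.toISOString().slice(0, 10)
}

// With a shared secret the salt is deterministic per UTC day.
// Without one it is random bytes, which the caller would keep for the day.
function dailySalt(now: Date, saltSecret?: string): Buffer {
  if (saltSecret) {
    return createHmac('sha256', saltSecret).update(utcDate(now)).digest()
  }
  return randomBytes(32)
}

// Assumed hashing scheme: keyed hash of the (ip, ua) tuple under the daily salt.
function visitorHash(salt: Buffer, ip: string, ua: string): string {
  return createHmac('sha256', salt).update(`${ip}\n${ua}`).digest('hex')
}
```

Because the salt changes with the UTC date, the same `(ip, ua)` tuple hashes to unrelated values on consecutive days, while two requests on the same day hash identically.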
285
206
 
286
207
  ### 6. User-Agent length cap
287
208
 
288
- Incoming User-Agent headers are truncated to **512 bytes** before hashing and bot-filter checks. Node already caps total header size at ~16 KB, but this bounds per-request CPU regardless.
289
-
290
- ### 7. Privacy properties
291
-
292
- - **Cookieless.** The library never sets or reads cookies.
293
- - **No personal data persisted.** Hashes go into the HLL (which discards them) and are never written to disk.
294
- - **Cross-day unlinkability.** The salt rotates at every UTC midnight. Yesterday's hash of `(ip, ua)` is unrelated to today's hash of the same tuple.
295
- - **Mid-day restart caveat.** If the process restarts within the same UTC day, the restored salt (from the snapshot file) is the same, so the same visitor returning after the restart doesn't get double-counted. This means the salt IS on disk for the current day. Rotate `STATS_TOKEN` and delete the snapshot file if you think the file was exposed.
209
+ Incoming User-Agent headers are truncated to **512 bytes** before hashing and bot-filter checks. Bounds per-request CPU regardless of upstream limits.
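
A byte-level cap like the one described can be sketched as follows. This is an assumption-laden illustration, not the library's code: `capUserAgentBytes` is a hypothetical name, and encoding the header as UTF-8 before truncating is a guess about how bytes are counted.

```typescript
import { Buffer } from 'node:buffer'

const MAX_UA_BYTES = 512

// Hypothetical helper: returns at most MAX_UA_BYTES bytes of the header,
// ready to feed into a hash. Truncation is byte-wise, so a multi-byte
// character at the boundary may be cut in half.
function capUserAgentBytes(ua: string): Buffer {
  const bytes = Buffer.from(ua, 'utf8')
  return bytes.length <= MAX_UA_BYTES ? bytes : bytes.subarray(0, MAX_UA_BYTES)
}
```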
296
210
 
297
211
  ## Where it works
298
212
 
299
- - ✅ **Self-hosted Next.js**: `next start` on a VPS, Docker, Fly.io, Railway, etc. Single long-running Node process.
300
- - **Vercel / Netlify / serverless by default**: ephemeral filesystem and per-request lambdas mean the in-memory HLL doesn't survive. You *could* make this work with a custom `persist` adapter pointing at Vercel KV or Upstash Redis, but at that point you're probably better off with a hosted analytics service.
213
+ - ✅ **Self-hosted Next.js** (`next start` on a VPS, Docker, Fly.io, Railway, etc.): single instance.
214
+ - **Vercel** and other serverless platforms: works in Edge middleware. Counts persist for the lifetime of each isolate; expect them to reset more often than on a long-running self-hosted process.
215
+ - ❌ **Multi-instance deployments** — each replica has its own in-memory counter and they don't sync. The library is single-process by design.
301
216
 
302
217
  ## Escape hatch (non-middleware integration)
303
218
 
304
- If you can't use `runtime: 'nodejs'` in middleware, call the tracker manually from a route handler or `instrumentation.ts`:
219
+ If you need to call from a route handler or `instrumentation.ts`:
305
220
 
306
221
  ```ts
307
- import stats from 'statswhatshesaid'
222
+ import { trackRequest } from 'statswhatshesaid'
308
223
  import type { NextRequest } from 'next/server'
309
224
 
310
- export function GET(req: NextRequest) {
311
-   stats.track(req)
225
+ export async function GET(req: NextRequest) {
226
+   await trackRequest(req)
312
227
    return new Response('ok')
313
228
  }
314
229
  ```
@@ -328,7 +243,7 @@ The example app under `examples/basic` is the simplest way to smoke-test changes
328
243
 
329
244
  ## Releasing
330
245
 
331
- Versioning and publishing are managed with [Changesets](https://github.com/changesets/changesets) and automated via GitHub Actions.
246
+ Versioning and publishing are managed with [Changesets](https://github.com/changesets/changesets) and automated via the GitHub Actions Release workflow using **npm trusted publishing** (OIDC). No long-lived npm tokens live in the repo.
332
247
 
333
248
  **Day-to-day flow:**
334
249
 
@@ -337,26 +252,8 @@ Versioning and publishing are managed with [Changesets](https://github.com/chang
337
252
  ```bash
338
253
  npx changeset
339
254
  ```
340
- Pick the bump type (patch / minor / major) and write a short summary. Commit the generated `.changeset/*.md` file.
341
- 3. Merge the PR into `main`. The `Release` workflow will open (or update) a **"chore(release): version packages"** PR that bumps `package.json` and updates `CHANGELOG.md`.
342
- 4. When you merge the release PR, the workflow publishes the new version to npm with [provenance](https://docs.npmjs.com/generating-provenance-statements) attached.
343
-
344
- **One-time setup:**
345
-
346
- - The unscoped package name `statswhatshesaid` must be available on npm (`npm view statswhatshesaid` — a 404 means it's yours for the taking on first publish).
347
- - Add an automation token to the GitHub repo as the `NPM_TOKEN` secret (`Settings → Secrets and variables → Actions`). Use a **granular** token scoped to publish the `statswhatshesaid` package.
348
- - In `Settings → Actions → General`, under *Workflow permissions*, allow GitHub Actions to **create and approve pull requests** so the release bot can open the version PR.
349
-
350
- **Manual publishing (escape hatch):**
351
-
352
- If you ever need to cut a release locally:
353
-
354
- ```bash
355
- npx changeset version # bumps package.json + updates CHANGELOG
356
- git commit -am "chore(release): version packages"
357
- git push
358
- npm run release # verify + changeset publish
359
- ```
255
+ 3. Merge the PR into `main`. The Release workflow opens (or updates) a "chore(release): version packages" PR that bumps `package.json` and updates `CHANGELOG.md`.
256
+ 4. When you merge the release PR, the workflow publishes the new version to npm with provenance attached.
360
257
 
361
258
  ## License
362
259