statswhatshesaid 0.1.0 → 0.3.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +151 -0
- package/README.md +82 -185
- package/dist/index.cjs +216 -350
- package/dist/index.cjs.map +1 -1
- package/dist/index.d.cts +68 -72
- package/dist/index.d.ts +68 -72
- package/dist/index.js +214 -349
- package/dist/index.js.map +1 -1
- package/package.json +11 -10
package/CHANGELOG.md
ADDED
@@ -0,0 +1,151 @@
+# statswhatshesaid
+
+## 0.3.0
+
+### Minor Changes
+
+- a404924: **Library — opt-in shared-salt mode + raw HLL sketch export**
+
+Add a new `saltSecret` option (env: `STATS_SALT_SECRET`). When set, the
+daily HLL salt is derived as `HMAC-SHA-256(saltSecret, utcDate)` instead of
+random per-process bytes. Replicas configured with the same secret then
+produce identical daily salts — the mathematical precondition for an
+external tool to merge HLL sketches across replicas. Cross-day
+unlinkability is preserved (the salt still rotates daily).
+
+When shared-salt mode is on, `GET /stats?format=raw` additionally returns
+the raw 16,384-byte HLL register array (base64) plus an 8-byte
+`saltFingerprint` so a collector can verify replicas are using the same
+salt before merging. When the secret is unset, behavior is unchanged.
+
+This is fully backwards-compatible: existing deployments need no changes.
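The salt derivation described in this entry can be sketched with Web Crypto. This is a minimal illustration, not the library's actual code: the function name `deriveDailySalt`, the `YYYY-MM-DD` form of `utcDate`, and the UTF-8 encoding of both inputs are assumptions, and it needs a runtime that exposes a global `crypto` object (recent Node, Edge runtimes, browsers).

```ts
// Hypothetical helper: derive a deterministic daily salt from a shared secret.
// Assumes utcDate is a "YYYY-MM-DD" string and both inputs are UTF-8 encoded.
async function deriveDailySalt(saltSecret: string, utcDate: string): Promise<Uint8Array> {
  const encoder = new TextEncoder();
  const key = await crypto.subtle.importKey(
    "raw",
    encoder.encode(saltSecret),
    { name: "HMAC", hash: "SHA-256" },
    false, // the key does not need to be extractable
    ["sign"],
  );
  const mac = await crypto.subtle.sign("HMAC", key, encoder.encode(utcDate));
  return new Uint8Array(mac); // HMAC-SHA-256 output is 32 bytes
}
```

Two replicas that call this with the same secret and the same UTC date get byte-identical salts, which is what makes their sketches mergeable; a new date yields an unrelated salt.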
+
+**New package — `statswhatshesaid-collector`**
+
+External one-shot CLI (`swhsd-collect`) that polls one or more deployed
+`statswhatshesaid` apps and persists their results to a local SQLite
+database. Solves what the in-memory library deliberately does not:
+
+- multi-app aggregation
+- best-effort persistence across app restarts
+- long-term retention beyond the library's in-memory window
+- multi-replica merging (opt-in, requires `STATS_SALT_SECRET`)
+
+Schedule it however you like — cron, systemd timer, launchd, GitHub
+Actions. See `packages/collector/README.md` for examples.
+
+## 0.2.0
+
+### Minor Changes
+
+- c5437a3: **One-line drop-in.** statswhatshesaid is now truly one line:
+
+```ts
+// middleware.ts
+export { default } from "statswhatshesaid";
+```
+
+No `runtime: 'nodejs'` config, no `matcher`, no `experimental` flags, no Next.js 15.2+ requirement. It just works.
+
+## Breaking changes
+
+This is a major architectural change disguised as a `minor` bump because we're still in `0.x`. The headline change: **all persistence is gone**. Counts and history live in process memory only and reset on every restart. This is intentional — see "Why" below.
+
+- **Removed** `node:fs` entirely. No more snapshot file. No more `.statswhatshesaid.json`. No more atomic-rename writes.
+- **Removed** `node:crypto`. All hashing now uses Web Crypto (`crypto.subtle.digest`, `crypto.getRandomValues`).
+- **Removed** the `runtime: 'nodejs'` middleware-config requirement. The library runs in **both** Edge and Node runtimes since it only uses Web APIs.
+- **Removed** the `PersistAdapter` interface, `FileSnapshotAdapter`, `SnapshotV1` type, and the `persist`, `snapshotPath`, `flushIntervalMs` options.
+- **Removed** `process.on('SIGTERM' | 'SIGINT' | 'beforeExit')` handlers. There's nothing to flush.
+- **Removed** the periodic flush timer.
+- **Removed** the Node-runtime guard (`assertNodeRuntime`). The library no longer cares which runtime you use.
+- **Lowered** the Next.js peer dependency from `>=15.2.0` to `>=13.0.0`.
+- **Default export** of the main package is now a pre-instantiated middleware function (was previously the `stats` object). To customize options, import `createMiddleware`:
+```ts
+import { createMiddleware } from "statswhatshesaid";
+export default createMiddleware({ filterBots: false });
+```
+- **`createMiddleware` now returns an `async` function** since `crypto.subtle.digest` is async. Next.js middleware natively supports async.
+- **`trackRequest` is now async** for the same reason.
+
+## New features
+
+- **Self-filters common static paths** before tracking (`/_next/static/*`, `/_next/image/*`, `/favicon.ico`, `/robots.txt`, `/sitemap.xml`, `/manifest.json`, etc.) so users don't need a custom `matcher` to skip static assets.
+- **One-line integration**: `export { default } from 'statswhatshesaid'` is the entire `middleware.ts`.
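The static-path self-filter listed above could look roughly like the following. This is a hedged sketch: the helper name `isStaticPath` and the two constants are hypothetical, and the path list only covers the entries the changelog names explicitly (the real list continues past the "etc.").

```ts
// Illustrative list only; limited to the paths named in the changelog entry.
const STATIC_EXACT_PATHS = new Set(["/favicon.ico", "/robots.txt", "/sitemap.xml", "/manifest.json"]);
const STATIC_PREFIXES = ["/_next/static/", "/_next/image/"];

// Hypothetical helper: true when a request should be skipped before tracking.
function isStaticPath(pathname: string): boolean {
  if (STATIC_EXACT_PATHS.has(pathname)) return true;
  return STATIC_PREFIXES.some((prefix) => pathname.startsWith(prefix));
}
```

Doing this check inside the middleware, rather than in a `matcher`, is what lets the integration stay a single re-export line.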
+
+## Why (the short version)
+
+The previous version's promise of "drop in" was undermined by the four lines of `export const config = { matcher, runtime: 'nodejs' }` boilerplate users had to write, plus the Next.js 15.2+ requirement and the directory of snapshot/WAL/SHM-equivalent file artifacts. The user explicitly wanted a true drop-in for monitoring freshly launched apps and accepted the trade-off of in-memory-only state.
+
+Edge runtime is now a first-class target. You can deploy this on Vercel Edge Middleware, on a Docker scratch image, on Cloudflare Pages — anywhere modern JS runs.
+
+## Bundle size
+
+ESM bundle: 18.7 KB → **12.2 KB** (smaller because `snapshot.ts`, `FileSnapshotAdapter`, the persist abstraction, and the lifecycle plumbing are all gone).
+
+## Tests
+
+89 unit and integration tests, all passing. The 5-test snapshot suite was deleted along with the file adapter. Persistence-restart and corruption-recovery tests were dropped (no persistence to test). New tests cover the static-path filter, the Web Crypto hash path, and the async constant-time compare.
+
+End-to-end smoke tested via the `examples/basic` Next.js app: 2 distinct visitors counted, dedup correct, bot filtered, favicon skipped, both query and `Authorization` token paths working, wrong token rejected.
+
+## 0.1.0
+
+### Minor Changes
+
+- 7880aa7: Initial release of `statswhatshesaid` — a super minimal drop-in unique-visitors-per-day stats library for self-hosted Next.js.
+
+**Features:**
+
+- One-line integration via Next.js middleware (`export default stats.middleware()`).
+- Single `/stats?t=<token>` endpoint returning JSON (today's estimate + history).
+- Cookieless visitor identification: `SHA-256(ip + ua + dailySalt)`, salt rotates at UTC midnight.
+- HyperLogLog (p=14) cardinality estimation — fixed 16 KB per day, ~0.8% standard error.
+- Single JSON snapshot file (~22 KB) with atomic `.tmp` + rename writes. Default `./.statswhatshesaid.json`.
+- Pluggable `PersistAdapter` for bring-your-own backends (Redis, KV, S3).
+- Edge-runtime guard with a clear, actionable error message.
+- **Zero runtime dependencies.** No native modules, no Docker volume gymnastics. Works on Alpine, slim, distroless.
+- Requires Next.js ≥ 15.2 for the `nodejs` middleware runtime.
+
+- 3b295d6: Second-pass hardening found via a targeted re-audit. These fixes all sit inside the existing v0.1.0 window (still unreleased), so they roll into the first published release.
+
+**Length-prefixed visitor hash construction.** `computeVisitorHash` now prepends a big-endian length header for each variable-length component (ip, ua) before feeding them into SHA-256. The previous `ip + ":" + ua + ":" + salt` encoding was input-ambiguous: with IPv6 addresses containing colons, two distinct `(ip, ua)` pairs could produce the same pre-image and therefore the same hash. Length-prefixing makes the pre-image unambiguous.
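To make the ambiguity concrete, here is a small sketch of both encodings. It is not the library's `computeVisitorHash`: the helper names are hypothetical, and the 4-byte width of the length header is an assumption (the changelog only says "big-endian length header").

```ts
const encoder = new TextEncoder();

// Encode one variable-length component as a 4-byte big-endian length plus its bytes.
// The 4-byte header width is an assumption made for this sketch.
function lengthPrefixed(value: string): Uint8Array {
  const bytes = encoder.encode(value);
  const out = new Uint8Array(4 + bytes.length);
  new DataView(out.buffer).setUint32(0, bytes.length, false); // false = big-endian
  out.set(bytes, 4);
  return out;
}

// Hypothetical pre-image builder: len(ip) | ip | len(ua) | ua | salt.
function buildPreimage(ip: string, ua: string, salt: Uint8Array): Uint8Array {
  const parts = [lengthPrefixed(ip), lengthPrefixed(ua), salt];
  const out = new Uint8Array(parts.reduce((total, part) => total + part.length, 0));
  let offset = 0;
  for (const part of parts) {
    out.set(part, offset);
    offset += part.length;
  }
  return out;
}

// The old separator-based encoding, kept only to demonstrate the ambiguity.
function naivePreimage(ip: string, ua: string, salt: string): string {
  return ip + ":" + ua + ":" + salt;
}
```

With the separator encoding, `("2001:db8::1", "a:Mozilla")` and `("2001:db8::1:a", "Mozilla")` both flatten to `2001:db8::1:a:Mozilla:<salt>`. With length prefixes the two pre-images already differ in the first header (11 versus 13 bytes), so they cannot coincide.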
+
+**Snapshot load is now crash-proof.** `VisitorStore.fromSnapshot` wraps both the same-day and cross-day branches in try/catch and degrades gracefully:
+
+- Decoded salt length is validated (must be exactly 32 bytes) before use.
+- Decoded HLL register length is validated (must be exactly `HLL_REGISTER_COUNT` bytes) before use. `Buffer.from(x, 'base64')` silently ignores malformed characters, so the base64 string-length check in `isValidSnapshot` alone was insufficient.
+- On any decode/validation failure, the store starts fresh for the current day rather than throwing out of init. Up to a few minutes of same-day dedupe state is lost; the process stays up.
+
+**Strict history validation.** `sanitizeHistory` drops any entry that isn't a real `YYYY-MM-DD` calendar date (validated via `Date.UTC` round-trip, rejecting `2026-02-30` and `2025-02-29`) mapped to a non-negative integer count. Entries for `currentDate` itself are dropped (today's count is owned by the live HLL). Protects against snapshot files poisoned by whoever has write access.
+
+**Snapshot validator rejects arrays.** `isValidSnapshot` now explicitly rejects arrays for the `history` field. Previously `typeof [] === 'object'` let arrays through, which would then be iterated with numeric-string keys.
+
+**Config sanity checks.** `resolveConfig` now validates:
+
+- `flushIntervalMs` must be a positive integer ≥ 1000 ms (prevents `setInterval(tick, 0)` hot loops from a bad config).
+- `historyDays` and `maxHistoryDays` must be non-negative integers.
+- `endpointPath` must match `^/[A-Za-z0-9\-._~/]*$` — no whitespace, CR/LF, or shell metacharacters.
+
+All throw loud, clear errors at config resolution time.
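Two of the config checks above can be sketched as follows. The pattern is the one quoted in the changelog; the function names and error messages are hypothetical and only illustrate the "fail at config resolution" behaviour, not the real `resolveConfig`.

```ts
// Pattern quoted from the changelog entry above.
const ENDPOINT_PATH_PATTERN = /^\/[A-Za-z0-9\-._~\/]*$/;

// Hypothetical validator: rejects whitespace, CR/LF and shell metacharacters.
function assertValidEndpointPath(endpointPath: string): void {
  if (!ENDPOINT_PATH_PATTERN.test(endpointPath)) {
    throw new Error(`Invalid endpointPath: ${JSON.stringify(endpointPath)}`);
  }
}

// Hypothetical validator: an integer of at least 1000 ms, so a bad value cannot
// turn the flush timer into a hot loop.
function assertValidFlushIntervalMs(flushIntervalMs: number): void {
  if (!Number.isInteger(flushIntervalMs) || flushIntervalMs < 1000) {
    throw new Error(`flushIntervalMs must be an integer >= 1000, got ${flushIntervalMs}`);
  }
}
```

Because a JavaScript `$` without the `m` flag only matches at the very end of the input, a value such as `"/stats\n"` is rejected rather than slipping through on a trailing newline.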
+
+**New `isValidUtcDate` helper.** Shared between snapshot validation and history sanitization. Rejects calendrically-impossible dates like `2026-02-30` via `Date.UTC` round-trip, not just via regex.
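A `Date.UTC` round-trip check of the kind described here might look like this. It is a sketch under the assumption that dates are strict `YYYY-MM-DD` strings; the name `isValidUtcDateSketch` is deliberately different from the library's helper, whose exact implementation is not shown in the changelog.

```ts
// Hypothetical re-implementation of the idea: parse, rebuild via Date.UTC,
// and accept only if the rebuilt date has the same year, month and day.
function isValidUtcDateSketch(value: string): boolean {
  const match = /^(\d{4})-(\d{2})-(\d{2})$/.exec(value);
  if (!match) return false;
  const year = Number(match[1]);
  const month = Number(match[2]);
  const day = Number(match[3]);
  // Date.UTC normalizes overflow (Feb 30 becomes Mar 2), so a mismatch after
  // the round-trip means the input was not a real calendar date.
  const rebuilt = new Date(Date.UTC(year, month - 1, day));
  return (
    rebuilt.getUTCFullYear() === year &&
    rebuilt.getUTCMonth() === month - 1 &&
    rebuilt.getUTCDate() === day
  );
}
```

A regex alone would accept `2026-02-30`; the round-trip is what catches it, along with `2025-02-29` in a non-leap year.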
+
+**Tests.** 23 new hardening tests across four describe blocks covering the hash input-ambiguity fix, date validation, config validation, and graceful snapshot-load degradation. Total test count: 76 → 99, all green.
+
+- 64b583f: Security hardening pass before the first public release.
+
+**New `trustProxy` option** (default: `1`). Determines how many reverse-proxy hops to skip when resolving the client IP from `X-Forwarded-For`. The library now walks the XFF chain from the RIGHT (instead of blindly taking the leftmost entry), which defeats the standard client-side XFF spoofing attack when at least one trusted proxy sits in front of the process. Set to `0` to ignore forwarding headers entirely, or to `N > 1` for chained proxies (e.g. Cloudflare → nginx → app = `2`). Configurable via `STATS_TRUST_PROXY` env var. See the README Security section for recipes.
+
+**`/stats` now accepts `Authorization: Bearer <token>`** in addition to the `?t=<token>` query string. The header is preferred in production because it does not leak into access logs, browser history, or Referer headers. If both are provided, the header wins.
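The "header wins" rule can be sketched as a small pure function. The name `extractToken` is hypothetical, and falling back to the query string when the header is present but not a well-formed `Bearer` value is an assumption of this sketch, not documented library behaviour.

```ts
// Hypothetical helper: pick the token from the Authorization header if present,
// otherwise from the `t` query parameter. `url` must be an absolute URL.
function extractToken(url: string, authorization: string | null): string | null {
  if (authorization) {
    const match = /^Bearer\s+(.+)$/i.exec(authorization.trim());
    if (match) return match[1] ?? null; // header takes precedence over ?t=
  }
  return new URL(url).searchParams.get("t");
}
```

For example, a request to `https://myapp.com/stats?t=from-query` that also carries `Authorization: Bearer from-header` resolves to `from-header`.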
+
+**Weak-token warning.** The library emits a one-time `console.warn` at init time if the token is shorter than 32 characters, with guidance to run `openssl rand -hex 32`. The library does NOT reject short tokens — you may deliberately pick a memorable one for ad-hoc browser access.
+
+**Snapshot file is now written with mode `0o600`** (owner read/write only). The snapshot contains the current day's visitor-hashing salt and should not be world-readable.
+
+**User-Agent truncation.** Incoming `User-Agent` headers are truncated to 512 bytes before hashing and bot filtering, bounding per-request CPU cost regardless of the upstream header-size limit.
+
+**Constant-time token comparison.** Token validation now prehashes both sides with SHA-256 before `timingSafeEqual`, so the comparison no longer branches on token length.
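The prehash-then-compare idea can be illustrated without `node:crypto`. Note the substitution: this sketch uses Web Crypto for the digests and a manual XOR accumulator in place of `timingSafeEqual`, and the function names are hypothetical. JavaScript gives no hard timing guarantees, so treat it as an illustration of the technique rather than a vetted primitive.

```ts
// Hash a string to a fixed 32-byte digest so both sides have equal length.
async function sha256Bytes(text: string): Promise<Uint8Array> {
  const data = new TextEncoder().encode(text);
  return new Uint8Array(await crypto.subtle.digest("SHA-256", data));
}

// Hypothetical comparison: no early exit, and no branch on the token lengths,
// because only the fixed-size digests are compared.
async function tokensMatch(provided: string, expected: string): Promise<boolean> {
  const [a, b] = await Promise.all([sha256Bytes(provided), sha256Bytes(expected)]);
  let diff = 0;
  for (let i = 0; i < a.length; i++) {
    diff |= a[i] ^ b[i]; // accumulate differences across all 32 bytes
  }
  return diff === 0;
}
```

Prehashing is what removes the length dependence: whatever the caller sends, the loop always runs over exactly 32 bytes.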
+
+**Process signal handler leak fix.** `shutdown()` now calls `process.removeListener` for its own handlers, fixing a `MaxListenersExceededWarning` that appeared when many init/shutdown cycles ran in the same process (e.g. dev-mode HMR, test suites).
+
+**README: new Security section** covering the threat model, `trustProxy` semantics with nginx/Caddy/Cloudflare recipes, token handling, flooding limitations, snapshot file contents, and privacy properties.
package/README.md
CHANGED
@@ -1,14 +1,14 @@
 # statswhatshesaid
 
-A super minimal drop-in stats library for
+A super minimal **one-line** drop-in stats library for Next.js. One metric, one line of integration, **zero runtime dependencies**, in-memory only, runs in **both** the Edge and Node runtimes.
 
 - Tracks **unique visitors per day** — that's it.
 - No tracking pixel, no client JS, no cookies.
-- **Zero dependencies.** No native modules, no SQLite, no Docker volume gymnastics.
--
+- **Zero dependencies.** No native modules, no filesystem, no SQLite, no Docker volume gymnastics.
+- **Works anywhere.** Edge runtime, Node runtime, Vercel, self-hosted, Docker, scratch images. The library uses only Web APIs (`crypto.subtle`, `crypto.getRandomValues`, `globalThis.fetch`).
 - Read your stats by visiting `myapp.com/stats?t=<your-secret>` — JSON response.
 
-> **Designed for freshly launched apps.**
+> **Designed for freshly launched apps.** Counts and history live in process memory. They survive across requests within a single worker but reset on every deploy / restart. That's the trade-off for "drop in and forget." Once your traffic warrants real analytics, graduate to Plausible / Umami / PostHog.
 
 ## Install
 
@@ -18,21 +18,14 @@ npm install statswhatshesaid
 
 ## Use it
 
-
+**One line.** That's it.
 
 ```ts
 // middleware.ts
-
-
-export default stats.middleware()
-
-export const config = {
-  matcher: ['/((?!_next/static|_next/image|favicon.ico).*)'],
-  runtime: 'nodejs', // REQUIRED — see "Edge Runtime" below
-}
+export { default } from 'statswhatshesaid'
 ```
 
-Set your secret
+Set your secret:
 
 ```bash
 STATS_TOKEN=pick-a-long-random-string
@@ -57,34 +50,50 @@ You'll get JSON back:
 }
 ```
 
-That's the whole library.
+That's the whole library. No `runtime: 'nodejs'` config, no `matcher`, no `experimental`, no `next.config` flags. Just one re-export line.
 
-##
+## Customizing options
 
-
+If you need to change defaults — bot filter, endpoint path, history retention, trustProxy hops — import `createMiddleware` instead:
 
 ```ts
+// middleware.ts
+import { createMiddleware } from 'statswhatshesaid'
+
+export default createMiddleware({
+  endpointPath: '/_internal/stats',
+  filterBots: false,
+  trustProxy: 2,
+})
+```
+
+You can also set a custom `matcher` if you want the middleware to run on a narrower path set than "everything":
+
+```ts
+import { createMiddleware } from 'statswhatshesaid'
+
+export default createMiddleware()
+
 export const config = {
-  matcher: [
-  runtime: 'nodejs',
+  matcher: ['/((?!api).*)'],
 }
 ```
 
-This is stable in **Next.js 15.2 and newer**.
-
 ## How a "unique visitor" is counted
 
 Cookieless, Plausible-style:
 
 ```
-visitorHash = SHA-256( ip +
+visitorHash = SHA-256( length-prefixed( ip ) + length-prefixed( userAgent ) + dailySalt )
 ```
 
-- `dailySalt` is generated in process memory and
+- `dailySalt` is generated in process memory at startup and rotated lazily at every UTC midnight.
 - The hash is fed into a [**HyperLogLog** sketch](https://en.wikipedia.org/wiki/HyperLogLog) with 16384 one-byte registers (16 KB fixed per day, forever).
-- At UTC midnight the day's estimate is
+- At UTC midnight the day's estimate is moved to an in-memory historical map and the sketch is reset with a fresh salt.
 - Cross-day unlinkability: because the salt is regenerated, hashes from different days can't be correlated back to the same visitor.
+- The hash inputs are length-prefixed so two distinct `(ip, ua)` pairs can never collide via separator ambiguity.
 - Common bot User-Agents are filtered out by default.
+- Common static asset paths (`/_next/static/*`, `/_next/image/*`, `/favicon.ico`, `/robots.txt`, `/sitemap.xml`, `/manifest.json`, etc.) are filtered out before tracking, so you don't need a custom `matcher`.
 
 ### About accuracy
 
@@ -94,122 +103,57 @@ If you need exact counts down to the last human, don't use this library — grad
 
 ## Storage
 
-
-
-```jsonc
-{
-  "version": 1,
-  "today": "2026-04-07",
-  "salt": "<base64 32 bytes>",
-  "hllRegisters": "<base64 16 KB>",
-  "history": { "2026-04-06": 388, "2026-04-05": 401 }
-}
-```
-
-- **One file.** Not a directory, not a DB, no WAL/SHM sidecars.
-- **~22 KB today + 20 bytes per historical day.** Never grows beyond a few hundred KB, ever.
-- **Atomic writes** via write-to-`.tmp` + `rename`. Crash-safe.
-- **Flushed every hour** (tunable) and on `SIGTERM`/`SIGINT`/`beforeExit`.
-- **Nothing on the hot path touches disk.** Tracking a visit is: one SHA-256, one HLL register update. Sub-millisecond.
-
-### Docker / containers
-
-Because it's one small file, you have options:
-
-```dockerfile
-# Option A: persist it on a volume
-VOLUME /data
-ENV STATS_SNAPSHOT_PATH=/data/stats.json
-```
-
-```dockerfile
-# Option B: bind-mount a single file from the host
-# docker run -v $(pwd)/stats.json:/app/stats.json \
-# -e STATS_SNAPSHOT_PATH=/app/stats.json ...
-```
-
-```dockerfile
-# Option C: accept ephemerality. Losing "today" on a redeploy is often fine
-# for a small app. The snapshot is flushed on SIGTERM when Node is PID 1,
-# so graceful stops keep the latest data.
-ENV STATS_SNAPSHOT_PATH=/tmp/statswhatshesaid.json
-```
-
-Works fine on `node:20-alpine`, `node:20-slim`, distroless — there are no native modules to compile.
+**There is none.** Counts and history live in module-level memory inside whichever Next.js worker is running your middleware.
 
-
+- ✅ State **survives across requests** within a single worker / Edge isolate (which is what makes the counter actually count).
+- ❌ State is **lost on every deploy**, process restart, or worker recycle.
+- ❌ State is **per-instance**: if you're running multiple replicas behind a load balancer, each replica has its own counter and they don't sync. Run a single instance, or use a real analytics tool.
 
-
-
-```ts
-import stats from 'statswhatshesaid'
-import type { PersistAdapter, SnapshotV1 } from 'statswhatshesaid'
-
-const redisPersist: PersistAdapter = {
-  load: () => {
-    const raw = redisClient.get('statswhatshesaid:snap') // your sync/blocking client
-    return raw ? (JSON.parse(raw) as SnapshotV1) : null
-  },
-  save: (snap) => {
-    redisClient.set('statswhatshesaid:snap', JSON.stringify(snap))
-  },
-}
-
-export default stats.middleware({ persist: redisPersist })
-```
-
-The adapter interface is synchronous on purpose so the shutdown handler can flush deterministically.
+This is intentional. The library exists to give freshly launched apps an "is anybody home?" signal in 30 seconds with zero infrastructure. Persistence and replication are a different problem class — graduate when you need them.
 
 ## Configuration
 
-Configure via env vars (preferred) or by passing options to `
+Configure via env vars (preferred for `STATS_TOKEN`) or by passing options to `createMiddleware({...})`. Options override env.
 
 | Option | Env var | Default |
 | --- | --- | --- |
 | `token` | `STATS_TOKEN` | **required** |
-| `snapshotPath` | `STATS_SNAPSHOT_PATH` | `./.statswhatshesaid.json` |
-| `persist` | — | file adapter at `snapshotPath` |
-| `flushIntervalMs` | `STATS_FLUSH_INTERVAL_MS` | `3600000` (1 hour) |
 | `endpointPath` | `STATS_ENDPOINT_PATH` | `/stats` |
 | `historyDays` | — | `90` (returned from `/stats`) |
-| `maxHistoryDays` | — | `365` (kept in
+| `maxHistoryDays` | — | `365` (kept in memory) |
 | `filterBots` | — | `true` |
 | `trustProxy` | `STATS_TRUST_PROXY` | `1` (see [Security](#security) below) |
+| `saltSecret` | `STATS_SALT_SECRET` | unset (see [Multi-replica deployments](#multi-replica-deployments) below) |
 
-
-
-
-
-
-
-
-
+## Multi-replica deployments
+
+The default in-memory design is single-instance: each Next.js worker has its own HyperLogLog sketch. If you run multiple replicas, each replica counts the visitors it serves, with no awareness of the others — visitor numbers across `/stats` will differ from replica to replica.
+
+If you want a single consolidated number across replicas, you can opt in to **shared-salt mode** and pair it with an external collector that merges sketches:
+
+1. Set `STATS_SALT_SECRET` (any long random string — `openssl rand -hex 32`) to the **same value** on every replica. The daily HLL salt then becomes `HMAC-SHA-256(saltSecret, utcDate)` — deterministic across replicas, still rotating daily, so cross-day unlinkability is preserved.
+2. Run [`statswhatshesaid-collector`](../collector) — an external CLI — on a machine you control. Configure it with the per-replica URLs and the `STATS_TOKEN`. The collector polls `/stats?format=raw` from each replica, fetches the raw HLL register array plus a salt fingerprint, verifies the fingerprints match, merges the sketches register-wise (element-wise max), and stores the merged daily number in a local SQLite database.
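The register-wise merge in step 2 is simple enough to sketch. This is not the collector's code: the `ReplicaSketch` shape and the function name are hypothetical, the fingerprint is treated as an opaque string, and the registers are assumed to be already base64-decoded.

```ts
// Hypothetical shape of one replica's raw response after decoding.
interface ReplicaSketch {
  saltFingerprint: string;
  registers: Uint8Array;
}

// Merge HLL sketches by taking the element-wise maximum of their registers.
// Refuses to merge when fingerprints differ, since sketches built with
// different salts hash the same visitor to different values.
function mergeSketches(replicas: ReplicaSketch[]): Uint8Array {
  const first = replicas[0];
  if (first === undefined) throw new Error("no sketches to merge");
  const merged = new Uint8Array(first.registers); // copy, do not mutate the input
  for (const replica of replicas.slice(1)) {
    if (replica.saltFingerprint !== first.saltFingerprint) {
      throw new Error("salt fingerprint mismatch: replicas are not using the same salt");
    }
    if (replica.registers.length !== merged.length) {
      throw new Error("register count mismatch");
    }
    for (let i = 0; i < merged.length; i++) {
      if (replica.registers[i] > merged[i]) merged[i] = replica.registers[i];
    }
  }
  return merged;
}
```

Element-wise max is correct because each HLL register stores the largest rank observed for its bucket; the largest rank over the union of two visitor sets is the larger of the two per-replica maxima, provided both replicas hashed with the same salt.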
+
+If you don't set `STATS_SALT_SECRET`, the library behaves exactly as before — random per-process salts, `/stats?format=raw` simply ignored — and you can run a single-replica deployment without any of this.
 
 ## Security
 
-This is a minimal library, but it runs inside your app's request path
+This is a minimal library, but it runs inside your app's request path, so its defaults matter. Read this section before deploying.
 
 ### Threat model
 
 - **In scope:** preventing trivial forging of visitor counts, protecting the `/stats` endpoint from unauthorized reads, keeping the process alive under abuse, making visitor hashes cross-day unlinkable.
-- **Out of scope:** preventing a determined attacker with unlimited resources from skewing the numbers. statswhatshesaid is for day-one visibility on small
+- **Out of scope:** preventing a determined attacker with unlimited resources from skewing the numbers. statswhatshesaid is for day-one visibility on small apps. Once your traffic is big enough that someone would bother flooding your stats, you should be on Plausible / Umami / PostHog anyway.
 
 ### 1. `trustProxy` — who decides the client IP?
 
 Unique-visitor dedup hashes the client IP alongside the User-Agent. If the attacker controls the IP you hash with, they control the count.
 
-
-
-To pick the real client IP safely you must **walk the chain from the right, skipping one entry per trusted proxy**.
-
-**Configuration:**
-
-- `trustProxy: 0` — Never read forwarding headers. Every request hashes to a single constant peer. `uniqueVisitors` will under-count dramatically (ideally it collapses to 1), but **nothing an attacker sends can forge it**. Use this only if (a) your process is directly exposed to untrusted clients, or (b) you're OK with a "did anybody visit today?" binary signal.
-
-- `trustProxy: 1` **(default)** — One trusted reverse proxy sits in front of this Node process. The library takes the **rightmost** entry of `X-Forwarded-For`. This is correct for the single most common self-hosted shape: `client → nginx → next`, or `client → Caddy → next`, or `client → Traefik → next`.
-
-- `trustProxy: 2` — Two trusted hops. The library takes the **second-from-right** entry of `X-Forwarded-For`. Use this for setups like `client → Cloudflare → nginx → next` where Cloudflare is ALSO adding to XFF.
+`X-Forwarded-For` is a list of IPs separated by commas. Each reverse proxy in the chain **appends** the IP of *its own peer*. The *leftmost* entry is whatever the original client claimed — i.e. attacker-controlled. The *rightmost N entries* are what trusted proxies added, so they're authentic. To pick the real client IP safely you must **walk the chain from the right, skipping one entry per trusted proxy**.
 
+- `trustProxy: 0` — Never read forwarding headers. Every request hashes to a single constant peer. `uniqueVisitors` will under-count, but **nothing an attacker sends can forge it**.
+- `trustProxy: 1` **(default)** — One trusted reverse proxy in front of this process (`client → nginx → next`). Library takes the **rightmost** entry of `X-Forwarded-For`.
+- `trustProxy: 2` — Two trusted hops (`client → Cloudflare → nginx → next`). Library takes the **second-from-right** entry.
 - `trustProxy: N` — Generalizes to N trusted hops.
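The right-to-left walk over `X-Forwarded-For` can be written as a short pure function. This is a sketch of the rule stated above, not the library's implementation: the name `resolveClientIp` is hypothetical, and returning `null` (meaning "use a constant peer") when the header is missing or shorter than `trustProxy` is an assumption.

```ts
// Hypothetical helper: pick the entry that the outermost trusted proxy appended.
// trustProxy = 1 -> rightmost entry, trustProxy = 2 -> second from right, and so on.
function resolveClientIp(xForwardedFor: string | null, trustProxy: number): string | null {
  if (trustProxy <= 0 || !xForwardedFor) return null; // ignore forwarding headers entirely
  const hops = xForwardedFor
    .split(",")
    .map((hop) => hop.trim())
    .filter((hop) => hop.length > 0);
  const index = hops.length - trustProxy;
  // Fewer entries than trusted hops: nothing trustworthy to read (assumed behaviour).
  return index >= 0 ? (hops[index] ?? null) : null;
}
```

With the header `attacker-spoof, real-client, cloudflare-edge` and `trustProxy: 2`, the function returns `real-client`; anything the client injected stays to the left of the selected entry and is never read.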
|
|
214
158
|
|
|
215
159
|
**nginx recipe (trustProxy = 1):**
|
|
@@ -222,93 +166,64 @@ location / {
|
|
|
222
166
|
}
|
|
223
167
|
```
|
|
224
168
|
|
|
225
|
-
`$proxy_add_x_forwarded_for` appends the client's socket IP to whatever XFF the client sent. With `trustProxy: 1`, statswhatshesaid
|
|
226
|
-
|
|
227
|
-
**Caddy recipe (trustProxy = 1):**
|
|
228
|
-
|
|
229
|
-
```caddyfile
|
|
230
|
-
example.com {
|
|
231
|
-
reverse_proxy 127.0.0.1:3000
|
|
232
|
-
}
|
|
233
|
-
```
|
|
234
|
-
|
|
235
|
-
Caddy automatically appends the client IP to `X-Forwarded-For` by default.
|
|
236
|
-
|
|
237
|
-
**Cloudflare + nginx recipe (trustProxy = 2):**
|
|
238
|
-
|
|
239
|
-
```nginx
|
|
240
|
-
# nginx behind Cloudflare
|
|
241
|
-
location / {
|
|
242
|
-
proxy_pass http://127.0.0.1:3000;
|
|
243
|
-
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
|
|
244
|
-
}
|
|
245
|
-
```
|
|
246
|
-
|
|
247
|
-
With `trustProxy: 2`, the second-from-right entry is the real client: `attacker-spoof, real-client, cloudflare-edge`.
|
|
169
|
+
`$proxy_add_x_forwarded_for` appends the client's socket IP to whatever XFF the client sent. With `trustProxy: 1`, statswhatshesaid takes the rightmost entry (nginx's appended value), and the client's spoofed values sit uselessly to the left.
|
|
248
170
|
|
|
249
|
-
**Direct-exposed (no proxy) warning:** If you're running
|
|
171
|
+
**Direct-exposed (no proxy) warning:** If you're running Next.js straight on `0.0.0.0:3000` with no proxy in front, **any header you see is attacker-controlled**. Set `trustProxy: 0` and accept that visitor dedup won't work, OR put any reverse proxy in front.
|
|
250
172
|
|
|
251
173
|
### 2. Token strength and rate limiting
|
|
252
174
|
|
|
253
|
-
`/stats` is protected by a single static token. A short token is brute-forceable
|
|
175
|
+
`/stats` is protected by a single static token. A short token is brute-forceable.
|
|
254
176
|
|
|
255
|
-
- statswhatshesaid **warns** at startup if your token is shorter than 32 characters. It does not reject — you might
|
|
177
|
+
- statswhatshesaid **warns** at startup if your token is shorter than 32 characters. It does not reject — you might pick a memorable token for ad-hoc browser access.
|
|
256
178
|
- A safer choice: `openssl rand -hex 32` → a 64-char hex string.
|
|
257
|
-
- The library does **not** rate-limit `/stats`. That's your CDN / reverse-proxy / application middleware's job
|
|
179
|
+
- The library does **not** rate-limit `/stats`. That's your CDN / reverse-proxy / application middleware's job ([nginx `limit_req`](https://nginx.org/en/docs/http/ngx_http_limit_req_module.html), [Cloudflare rate limiting](https://developers.cloudflare.com/waf/rate-limiting-rules/), [`@upstash/ratelimit`](https://github.com/upstash/ratelimit-js)).
|
|
258
180
|
|
|
259
181
|
### 3. Passing the token: `Authorization` header vs query string
|
|
260
182
|
|
|
261
|
-
|
|
183
|
+
Two ways to pass the token:
|
|
262
184
|
|
|
263
185
|
| Method | Use when |
|
|
264
186
|
| --- | --- |
|
|
265
187
|
| `Authorization: Bearer <token>` header | **Production** — doesn't leak to access logs, browser history, or Referer |
|
|
266
|
-
| `?t=<token>` query string | Ad-hoc browser checks
|
|
188
|
+
| `?t=<token>` query string | Ad-hoc browser checks |
|
|
267
189
|
|
|
268
|
-
Both are accepted. If both are present, the `Authorization` header wins.
|
|
190
|
+
Both are accepted. If both are present, the `Authorization` header wins.
|
|
269
191
|
|
|
270
192
|
```bash
|
|
271
193
|
curl -H "Authorization: Bearer $STATS_TOKEN" https://myapp.com/stats
|
|
272
194
|
```
|
|
273
195
|
|
|
274
|
-
The query string is convenient but ends up in **nginx/CDN access logs, browser history, and Referer headers**. Don't link to `/stats?t=...` from any page.
|
|
275
|
-
|
|
276
196
|
### 4. Count inflation by flooding
|
|
277
197
|
|
|
278
|
-
An attacker who can send arbitrary (IP, User-Agent) pairs
|
|
198
|
+
An attacker who can send arbitrary `(IP, User-Agent)` pairs can insert arbitrarily many distinct "visitors" into the HLL sketch. Memory doesn't blow up (HLL is fixed 16 KB/day), but the reported count becomes meaningless during the attack. The library can't prevent this at the middleware layer — rate-limit at your CDN / reverse proxy.
|
|
279
199
|
|
|
280
|
-
### 5.
|
|
200
|
+
### 5. Privacy properties
|
|
281
201
|
|
|
282
|
-
- The
|
|
283
|
-
-
|
|
284
|
-
-
|
|
202
|
+
- **Cookieless.** The library never sets or reads cookies.
|
|
203
|
+
- **No personal data persisted.** Hashes go into the HLL (which discards them) and are never written anywhere. No filesystem writes, no remote calls.
|
|
204
|
+
- **Cross-day unlinkability.** The salt rotates at every UTC midnight. Yesterday's hash of `(ip, ua)` is unrelated to today's hash of the same tuple.
|
|
205
|
+
- **No telemetry.** The library makes zero outbound network requests.
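The cross-day unlinkability property can be illustrated with a short sketch. It assumes a visitor key is a salted SHA-256 over the `(ip, ua)` tuple; the library's actual hash construction is not shown here and may differ. `visitorHash` is a hypothetical name.

```ts
import { createHash, randomBytes } from 'node:crypto'

// Assumption: salted SHA-256 of (ip, ua). Illustrative only.
function visitorHash(salt: Uint8Array, ip: string, ua: string): string {
  return createHash('sha256')
    .update(salt)
    .update(ip)
    .update('\0') // separator so ("ab", "c") and ("a", "bc") hash differently
    .update(ua)
    .digest('hex')
}

// Two independent random salts stand in for two consecutive UTC days.
const yesterdaySalt = randomBytes(32)
const todaySalt = randomBytes(32)

const hashYesterday = visitorHash(yesterdaySalt, '203.0.113.7', 'Mozilla/5.0')
const hashToday = visitorHash(todaySalt, '203.0.113.7', 'Mozilla/5.0')
// Same visitor, different salts: the two hex strings share no usable relation,
// so records from different days cannot be joined on the hash.
```

Within a single day the salt is fixed, so the same tuple always maps to the same hash and a returning visitor is counted once.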
|
|
285
206
|
|
|
286
207
|
### 6. User-Agent length cap
|
|
287
208
|
|
|
288
|
-
Incoming User-Agent headers are truncated to **512 bytes** before hashing and bot-filter checks.
|
|
289
|
-
|
|
290
|
-
### 7. Privacy properties
|
|
291
|
-
|
|
292
|
-
- **Cookieless.** The library never sets or reads cookies.
|
|
293
|
-
- **No personal data persisted.** Hashes go into the HLL (which discards them) and are never written to disk.
|
|
294
|
-
- **Cross-day unlinkability.** The salt rotates at every UTC midnight. Yesterday's hash of `(ip, ua)` is unrelated to today's hash of the same tuple.
|
|
295
|
-
- **Mid-day restart caveat.** If the process restarts within the same UTC day, the restored salt (from the snapshot file) is the same, so the same visitor returning after the restart doesn't get double-counted. This means the salt IS on disk for the current day. Rotate `STATS_TOKEN` and delete the snapshot file if you think the file was exposed.
|
|
209
|
+
Incoming User-Agent headers are truncated to **512 bytes** before hashing and bot-filter checks. This bounds per-request CPU cost regardless of upstream limits.
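A byte-length cap of this kind might look like the sketch below. It assumes the limit is measured in UTF-8 bytes; `truncateUserAgent` and `MAX_UA_BYTES` are hypothetical names, and the library's own truncation may be implemented differently.

```ts
const MAX_UA_BYTES = 512 // the cap stated above

// Hypothetical helper: returns at most the first 512 UTF-8 bytes of the header.
function truncateUserAgent(ua: string): Uint8Array {
  const bytes = new TextEncoder().encode(ua)
  return bytes.length > MAX_UA_BYTES ? bytes.subarray(0, MAX_UA_BYTES) : bytes
}
```

Because the truncated bytes are only hashed and pattern-matched, cutting in the middle of a multi-byte character is harmless in this sketch.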
|
|
296
210
|
|
|
297
211
|
## Where it works
|
|
298
212
|
|
|
299
|
-
- ✅ **Self-hosted Next.js**
|
|
300
|
-
-
|
|
213
|
+
- ✅ **Self-hosted Next.js** (`next start` on a VPS, Docker, Fly.io, Railway, etc.) — single instance.
|
|
214
|
+
- ✅ **Vercel** and other serverless platforms — works in Edge middleware. Counts persist for the lifetime of each isolate; expect them to reset more often than on a long-running self-hosted process.
|
|
215
|
+
- ❌ **Multi-instance deployments** — each replica has its own in-memory counter and they don't sync. The library is single-process by design.
|
|
301
216
|
|
|
302
217
|
## Escape hatch (non-middleware integration)
|
|
303
218
|
|
|
304
|
-
If you
|
|
219
|
+
If you need to call `trackRequest` from a route handler or `instrumentation.ts`:
|
|
305
220
|
|
|
306
221
|
```ts
|
|
307
|
-
import
|
|
222
|
+
import { trackRequest } from 'statswhatshesaid'
|
|
308
223
|
import type { NextRequest } from 'next/server'
|
|
309
224
|
|
|
310
|
-
export function GET(req: NextRequest) {
|
|
311
|
-
|
|
225
|
+
export async function GET(req: NextRequest) {
|
|
226
|
+
await trackRequest(req)
|
|
312
227
|
return new Response('ok')
|
|
313
228
|
}
|
|
314
229
|
```
|
|
@@ -328,7 +243,7 @@ The example app under `examples/basic` is the simplest way to smoke-test changes
|
|
|
328
243
|
|
|
329
244
|
## Releasing
|
|
330
245
|
|
|
331
|
-
Versioning and publishing are managed with [Changesets](https://github.com/changesets/changesets) and automated via GitHub Actions.
|
|
246
|
+
Versioning and publishing are managed with [Changesets](https://github.com/changesets/changesets) and automated via the GitHub Actions Release workflow using **npm trusted publishing** (OIDC). No long-lived npm tokens live in the repo.
|
|
332
247
|
|
|
333
248
|
**Day-to-day flow:**
|
|
334
249
|
|
|
@@ -337,26 +252,8 @@ Versioning and publishing are managed with [Changesets](https://github.com/chang
|
|
|
337
252
|
```bash
|
|
338
253
|
npx changeset
|
|
339
254
|
```
|
|
340
|
-
|
|
341
|
-
|
|
342
|
-
4. When you merge the release PR, the workflow publishes the new version to npm with [provenance](https://docs.npmjs.com/generating-provenance-statements) attached.
|
|
343
|
-
|
|
344
|
-
**One-time setup:**
|
|
345
|
-
|
|
346
|
-
- The unscoped package name `statswhatshesaid` must be available on npm (`npm view statswhatshesaid` — a 404 means it's yours for the taking on first publish).
|
|
347
|
-
- Add an automation token to the GitHub repo as the `NPM_TOKEN` secret (`Settings → Secrets and variables → Actions`). Use a **granular** token scoped to publish the `statswhatshesaid` package.
|
|
348
|
-
- In `Settings → Actions → General`, under *Workflow permissions*, allow GitHub Actions to **create and approve pull requests** so the release bot can open the version PR.
|
|
349
|
-
|
|
350
|
-
**Manual publishing (escape hatch):**
|
|
351
|
-
|
|
352
|
-
If you ever need to cut a release locally:
|
|
353
|
-
|
|
354
|
-
```bash
|
|
355
|
-
npx changeset version # bumps package.json + updates CHANGELOG
|
|
356
|
-
git commit -am "chore(release): version packages"
|
|
357
|
-
git push
|
|
358
|
-
npm run release # verify + changeset publish
|
|
359
|
-
```
|
|
255
|
+
3. Merge the PR into `main`. The Release workflow opens (or updates) a "chore(release): version packages" PR that bumps `package.json` and updates `CHANGELOG.md`.
|
|
256
|
+
4. When you merge the release PR, the workflow publishes the new version to npm with provenance attached.
|
|
360
257
|
|
|
361
258
|
## License
|
|
362
259
|
|