statswhatshesaid 0.1.0

package/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 Les
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
package/README.md ADDED
@@ -0,0 +1,363 @@
1
+ # statswhatshesaid
2
+
3
+ A super minimal drop-in stats library for **self-hosted Next.js**. One metric, one line of integration, **zero runtime dependencies**.
4
+
5
+ - Tracks **unique visitors per day** — that's it.
6
+ - No tracking pixel, no client JS, no cookies.
7
+ - **Zero dependencies.** No native modules, no SQLite, no Docker volume gymnastics.
8
+ - Single ~22KB JSON file for persistence. Atomic writes. Put it anywhere or nowhere.
9
+ - Read your stats by visiting `myapp.com/stats?t=<your-secret>` — JSON response.
10
+
11
+ > **Designed for freshly launched apps.** Once traffic gets serious you should graduate to a proper analytics suite (Plausible, Umami, PostHog, ...). This library is the thing you drop in on day one so you can tell whether anyone's visiting yet, with absolutely no setup ceremony.
12
+
13
+ ## Install
14
+
15
+ ```bash
16
+ npm install statswhatshesaid
17
+ ```
18
+
19
+ ## Use it
20
+
21
+ Add **one line** to your `middleware.ts`:
22
+
23
+ ```ts
24
+ // middleware.ts
25
+ import stats from 'statswhatshesaid'
26
+
27
+ export default stats.middleware()
28
+
29
+ export const config = {
30
+ matcher: ['/((?!_next/static|_next/image|favicon.ico).*)'],
31
+ runtime: 'nodejs', // REQUIRED — see "Edge Runtime" below
32
+ }
33
+ ```
34
+
35
+ Set your secret in the environment:
36
+
37
+ ```bash
38
+ STATS_TOKEN=pick-a-long-random-string
39
+ ```
40
+
41
+ Then visit:
42
+
43
+ ```
44
+ https://myapp.com/stats?t=pick-a-long-random-string
45
+ ```
46
+
47
+ You'll get JSON back:
48
+
49
+ ```json
50
+ {
51
+ "today": { "date": "2026-04-07", "uniqueVisitors": 412 },
52
+ "history": [
53
+ { "date": "2026-04-06", "uniqueVisitors": 388 },
54
+ { "date": "2026-04-05", "uniqueVisitors": 401 }
55
+ ],
56
+ "generatedAt": "2026-04-07T14:23:10.000Z"
57
+ }
58
+ ```
59
+
60
+ That's the whole library.
61
+
62
+ ## Edge Runtime — read this first
63
+
64
+ Next.js middleware defaults to the **Edge runtime**, which can't run `node:crypto` or `node:fs`. You **must** opt into the Node runtime:
65
+
66
+ ```ts
67
+ export const config = {
68
+ matcher: [...],
69
+ runtime: 'nodejs',
70
+ }
71
+ ```
72
+
73
+ Node.js middleware support is experimental in **Next.js 15.2** and stable from **Next.js 15.5**.
74
+
75
+ ## How a "unique visitor" is counted
76
+
77
+ Cookieless, Plausible-style:
78
+
79
+ ```
80
+ visitorHash = SHA-256( ip + ":" + userAgent + ":" + dailySalt )
81
+ ```
82
+
83
+ - `dailySalt` is generated in process memory and rotates at every UTC midnight.
84
+ - The hash is fed into a [**HyperLogLog** sketch](https://en.wikipedia.org/wiki/HyperLogLog) with 16384 one-byte registers (16 KB fixed per day, forever).
85
+ - At UTC midnight the day's estimate is written to a historical map and the sketch is reset with a fresh salt.
86
+ - Cross-day unlinkability: because the salt is regenerated, hashes from different days can't be correlated back to the same visitor.
87
+ - Common bot User-Agents are filtered out by default.
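The hashing step above can be sketched with `node:crypto` alone. This is an illustration, not the library's source: the helper names `newDailySalt` and `visitorHash` are hypothetical, and the 32-byte base64 salt is assumed from the `salt` field shown in the snapshot format.

```ts
import { createHash, randomBytes } from 'node:crypto'

// Hypothetical helper: a fresh random salt, regenerated at each UTC midnight.
// 32 random bytes, base64-encoded (assumed from the snapshot's salt field).
function newDailySalt(): string {
  return randomBytes(32).toString('base64')
}

// Hypothetical helper: SHA-256 over "ip:userAgent:dailySalt", hex-encoded.
function visitorHash(ip: string, userAgent: string, dailySalt: string): string {
  return createHash('sha256')
    .update(`${ip}:${userAgent}:${dailySalt}`)
    .digest('hex')
}

// Same visitor, same day: identical hash. Same visitor, new salt: unrelated hash.
const saltToday = newDailySalt()
const saltTomorrow = newDailySalt()
const hashA = visitorHash('203.0.113.7', 'Mozilla/5.0', saltToday)
const hashB = visitorHash('203.0.113.7', 'Mozilla/5.0', saltToday)
const hashC = visitorHash('203.0.113.7', 'Mozilla/5.0', saltTomorrow)
```

Because the salt is the only thing that changes between days, discarding it is what makes the previous day's hashes impossible to recompute.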
88
+
89
+ ### About accuracy
90
+
91
+ HyperLogLog **estimates** cardinality; it doesn't count exactly. The expected standard error at `p=14` is **~0.8%**, so with 1,000 true unique visitors `/stats` will typically report a value within about ±8 of that (roughly 992–1008), and occasionally a little further off. For a "how are we doing?" dashboard this is fine; it's what Plausible, Redis `PFCOUNT`, and BigQuery's `APPROX_COUNT_DISTINCT` use under the hood.
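For reference, the ~0.8% figure is the usual HyperLogLog relative standard error for `m = 2^p` registers. The `1.04/√m` constant is the standard published bound for the algorithm, not something specific to this library:

```latex
\[
\sigma \approx \frac{1.04}{\sqrt{m}}
       = \frac{1.04}{\sqrt{2^{14}}}
       = \frac{1.04}{128}
       \approx 0.0081
\]
```

One standard error on 1,000 visitors is therefore about 8, which is where the 992–1008 range comes from.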
92
+
93
+ If you need exact counts down to the last human, don't use this library — graduate to a real analytics suite.
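To make the estimator concrete, here is a toy HyperLogLog with 16384 one-byte registers. It is a minimal sketch under stated assumptions, not the library's implementation: the class name `ToyHll`, the way the register index and rank are taken from the SHA-256 digest, and the small-range linear-counting correction are all choices made for this example.

```ts
import { createHash } from 'node:crypto'

const P = 14        // index bits
const M = 1 << P    // 16384 registers, one byte each

class ToyHll {
  readonly registers = new Uint8Array(M)

  add(item: string): void {
    const digest = createHash('sha256').update(item).digest()
    // Top 14 bits of the first 32-bit word choose the register.
    const index = digest.readUInt32BE(0) >>> (32 - P)
    // Rank = 1 + number of leading zero bits in the second 32-bit word (1..33).
    const rank = Math.clz32(digest.readUInt32BE(4)) + 1
    const current = this.registers[index] ?? 0
    if (rank > current) this.registers[index] = rank
  }

  estimate(): number {
    let sum = 0
    let zeros = 0
    for (let i = 0; i < M; i++) {
      const r = this.registers[i] ?? 0
      sum += Math.pow(2, -r)
      if (r === 0) zeros++
    }
    const alpha = 0.7213 / (1 + 1.079 / M)
    const raw = (alpha * M * M) / sum
    // Small-range correction: fall back to linear counting while registers are mostly empty.
    if (raw <= 2.5 * M && zeros > 0) return M * Math.log(M / zeros)
    return raw
  }
}

// Usage: 1,000 distinct visitors, each seen twice.
const hll = new ToyHll()
for (let i = 0; i < 1000; i++) hll.add(`visitor-${i}`)
const firstEstimate = hll.estimate()
for (let i = 0; i < 1000; i++) hll.add(`visitor-${i}`)
const secondEstimate = hll.estimate()
```

Re-adding the same items leaves every register unchanged, which is why repeat visits within a day don't inflate the count.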
94
+
95
+ ## Storage
96
+
97
+ A single JSON file. Default location: `./.statswhatshesaid.json`.
98
+
99
+ ```jsonc
100
+ {
101
+ "version": 1,
102
+ "today": "2026-04-07",
103
+ "salt": "<base64 32 bytes>",
104
+ "hllRegisters": "<base64 16 KB>",
105
+ "history": { "2026-04-06": 388, "2026-04-05": 401 }
106
+ }
107
+ ```
108
+
109
+ - **One file.** Not a directory, not a DB, no WAL/SHM sidecars.
110
+ - **~22 KB today + 20 bytes per historical day.** At the default `maxHistoryDays` of 365 that is about 7 KB of history, so the whole file stays around 30 KB.
111
+ - **Atomic writes** via write-to-`.tmp` + `rename`. Crash-safe.
112
+ - **Flushed every hour** (tunable) and on `SIGTERM`/`SIGINT`/`beforeExit`.
113
+ - **Nothing on the hot path touches disk.** Tracking a visit is: one SHA-256, one HLL register update. Sub-millisecond.
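The write-to-`.tmp` + `rename` pattern mentioned above can be sketched as follows. The function name `writeSnapshotAtomically` is hypothetical, and the `0o600` mode is taken from the Security section; the library's actual file adapter may differ in detail.

```ts
import { mkdtempSync, readFileSync, renameSync, writeFileSync } from 'node:fs'
import { tmpdir } from 'node:os'
import { join } from 'node:path'

// Hypothetical helper: synchronous so it can also run from a shutdown handler.
function writeSnapshotAtomically(path: string, snapshot: unknown): void {
  const tmpPath = `${path}.tmp`
  // Write the full payload to a sibling temp file first (owner read/write only).
  writeFileSync(tmpPath, JSON.stringify(snapshot), { mode: 0o600 })
  // Renaming within one filesystem swaps the file in a single step, so a reader
  // sees either the previous snapshot or the new one, never a partial write.
  renameSync(tmpPath, path)
}

// Usage: write into a scratch directory and read the snapshot back.
const scratchDir = mkdtempSync(join(tmpdir(), 'stats-'))
const snapshotPath = join(scratchDir, 'stats.json')
writeSnapshotAtomically(snapshotPath, { version: 1, today: '2026-04-07', history: {} })
const restored = JSON.parse(readFileSync(snapshotPath, 'utf8'))
```

The temp file sits next to the target on purpose: a rename across filesystems is not atomic.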
114
+
115
+ ### Docker / containers
116
+
117
+ Because it's one small file, you have options:
118
+
119
+ ```dockerfile
120
+ # Option A: persist it on a volume
121
+ VOLUME /data
122
+ ENV STATS_SNAPSHOT_PATH=/data/stats.json
123
+ ```
124
+
125
+ ```dockerfile
126
+ # Option B: bind-mount a single file from the host
127
+ # docker run -v $(pwd)/stats.json:/app/stats.json \
128
+ # -e STATS_SNAPSHOT_PATH=/app/stats.json ...
129
+ ```
130
+
131
+ ```dockerfile
132
+ # Option C: accept ephemerality. Losing "today" on a redeploy is often fine
133
+ # for a small app. The snapshot is flushed on SIGTERM when Node is PID 1,
134
+ # so graceful stops keep the latest data.
135
+ ENV STATS_SNAPSHOT_PATH=/tmp/statswhatshesaid.json
136
+ ```
137
+
138
+ Works fine on `node:20-alpine`, `node:20-slim`, distroless — there are no native modules to compile.
139
+
140
+ ### Bring your own backend
141
+
142
+ If you want to stash the snapshot in Redis, Vercel KV, S3, or anything else, pass a `persist` adapter:
143
+
144
+ ```ts
145
+ import stats from 'statswhatshesaid'
146
+ import type { PersistAdapter, SnapshotV1 } from 'statswhatshesaid'
147
+
148
+ const redisPersist: PersistAdapter = {
149
+ load: () => {
150
+ const raw = redisClient.get('statswhatshesaid:snap') // your sync/blocking client
151
+ return raw ? (JSON.parse(raw) as SnapshotV1) : null
152
+ },
153
+ save: (snap) => {
154
+ redisClient.set('statswhatshesaid:snap', JSON.stringify(snap))
155
+ },
156
+ }
157
+
158
+ export default stats.middleware({ persist: redisPersist })
159
+ ```
160
+
161
+ The adapter interface is synchronous on purpose so the shutdown handler can flush deterministically.
162
+
163
+ ## Configuration
164
+
165
+ Configure via env vars (preferred) or by passing options to `stats.middleware({...})`. Options override env.
166
+
167
+ | Option | Env var | Default |
168
+ | --- | --- | --- |
169
+ | `token` | `STATS_TOKEN` | **required** |
170
+ | `snapshotPath` | `STATS_SNAPSHOT_PATH` | `./.statswhatshesaid.json` |
171
+ | `persist` | — | file adapter at `snapshotPath` |
172
+ | `flushIntervalMs` | `STATS_FLUSH_INTERVAL_MS` | `3600000` (1 hour) |
173
+ | `endpointPath` | `STATS_ENDPOINT_PATH` | `/stats` |
174
+ | `historyDays` | — | `90` (returned from `/stats`) |
175
+ | `maxHistoryDays` | — | `365` (kept in snapshot) |
176
+ | `filterBots` | — | `true` |
177
+ | `trustProxy` | `STATS_TRUST_PROXY` | `1` (see [Security](#security) below) |
178
+
179
+ ```ts
180
+ export default stats.middleware({
181
+ endpointPath: '/_internal/stats',
182
+ flushIntervalMs: 5 * 60 * 1000,
183
+ historyDays: 30,
184
+ trustProxy: 1,
185
+ })
186
+ ```
187
+
188
+ ## Security
189
+
190
+ This is a minimal library, but it runs inside your app's request path and writes to your filesystem, so its defaults matter. Read this section before deploying.
191
+
192
+ ### Threat model
193
+
194
+ - **In scope:** preventing trivial forging of visitor counts, protecting the `/stats` endpoint from unauthorized reads, keeping the process alive under abuse, making visitor hashes cross-day unlinkable.
195
+ - **Out of scope:** preventing a determined attacker with unlimited resources from skewing the numbers. statswhatshesaid is for day-one visibility on small, self-hosted apps. Once your traffic is big enough that someone would bother flooding your stats, you should be on Plausible / Umami / PostHog anyway.
196
+
197
+ ### 1. `trustProxy` — who decides the client IP?
198
+
199
+ Unique-visitor dedup hashes the client IP alongside the User-Agent. If the attacker controls the IP you hash with, they control the count.
200
+
201
+ **The problem:** `X-Forwarded-For` is a list of IPs separated by commas. Each reverse proxy in the chain **appends** the IP of *its own peer* (the thing that spoke TCP to it). The *leftmost* entry is whatever the original client claimed — i.e. attacker-controlled. The *rightmost N entries* are what trusted proxies added, so they're authentic.
202
+
203
+ To pick the real client IP safely you must **walk the chain from the right, skipping one entry per trusted proxy**.
204
+
205
+ **Configuration:**
206
+
207
+ - `trustProxy: 0` — Never read forwarding headers. Every request hashes with a single constant peer address, so `uniqueVisitors` under-counts dramatically: it collapses toward the number of distinct User-Agents, because the User-Agent is still part of the hash. In exchange, **no forwarding header an attacker sends can forge the IP**. Use this only if (a) your process is directly exposed to untrusted clients, or (b) you're OK with a "did anybody visit today?" binary signal.
208
+
209
+ - `trustProxy: 1` **(default)** — One trusted reverse proxy sits in front of this Node process. The library takes the **rightmost** entry of `X-Forwarded-For`. This is correct for the single most common self-hosted shape: `client → nginx → next`, or `client → Caddy → next`, or `client → Traefik → next`.
210
+
211
+ - `trustProxy: 2` — Two trusted hops. The library takes the **second-from-right** entry of `X-Forwarded-For`. Use this for setups like `client → Cloudflare → nginx → next` where Cloudflare is ALSO adding to XFF.
212
+
213
+ - `trustProxy: N` — Generalizes to N trusted hops.
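The right-to-left walk can be sketched like this. `clientIpFromXff` is a hypothetical name, and returning `null` when the header has fewer entries than trusted hops is an assumption of this example; the source doesn't say what the library does in that case.

```ts
// Pick the entry added by the outermost trusted proxy, counting from the right.
function clientIpFromXff(xff: string | null, trustProxy: number): string | null {
  // trustProxy 0: never read forwarding headers.
  if (trustProxy <= 0 || !xff) return null
  const hops = xff
    .split(',')
    .map((entry) => entry.trim())
    .filter((entry) => entry.length > 0)
  // trustProxy 1 -> rightmost entry, trustProxy 2 -> second from right, and so on.
  const index = hops.length - trustProxy
  // Assumption: a chain shorter than the trusted hop count is treated as unusable.
  if (index < 0) return null
  return hops[index] ?? null
}

// client -> nginx -> next
const viaNginx = clientIpFromXff('6.6.6.6, 203.0.113.7', 1)
// client -> Cloudflare -> nginx -> next
const viaCloudflare = clientIpFromXff('6.6.6.6, 198.51.100.4, 172.64.0.1', 2)
```

In both calls the leading `6.6.6.6` stands for a value the client spoofed; it is never selected because it sits to the left of the trusted entries.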
214
+
215
+ **nginx recipe (trustProxy = 1):**
216
+
217
+ ```nginx
218
+ location / {
219
+ proxy_pass http://127.0.0.1:3000;
220
+ proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
221
+ proxy_set_header Host $host;
222
+ }
223
+ ```
224
+
225
+ `$proxy_add_x_forwarded_for` appends the client's socket IP to whatever XFF the client sent. With `trustProxy: 1`, statswhatshesaid ignores whatever the client sent and takes the rightmost entry (which is what nginx appended). The client's spoofed values sit uselessly to the left.
226
+
227
+ **Caddy recipe (trustProxy = 1):**
228
+
229
+ ```caddyfile
230
+ example.com {
231
+ reverse_proxy 127.0.0.1:3000
232
+ }
233
+ ```
234
+
235
+ Caddy's `reverse_proxy` sets `X-Forwarded-For` by default, so the rightmost entry is the client's socket IP.
236
+
237
+ **Cloudflare + nginx recipe (trustProxy = 2):**
238
+
239
+ ```nginx
240
+ # nginx behind Cloudflare
241
+ location / {
242
+ proxy_pass http://127.0.0.1:3000;
243
+ proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
244
+ }
245
+ ```
246
+
247
+ With `trustProxy: 2`, the second-from-right entry is the real client: `attacker-spoof, real-client, cloudflare-edge`.
248
+
249
+ **Direct-exposed (no proxy) warning:** If you're running `next start` straight on `0.0.0.0:3000` with no proxy, **any header you see is attacker-controlled**. Set `trustProxy: 0` and accept that visitor dedup won't work, OR put any reverse proxy in front.
250
+
251
+ ### 2. Token strength and rate limiting
252
+
253
+ `/stats` is protected by a single static token. A short token is brute-forceable if an attacker hammers the endpoint.
254
+
255
+ - statswhatshesaid **warns** at startup if your token is shorter than 32 characters. It does not reject — you might deliberately pick a memorable token for ad-hoc browser access from anywhere.
256
+ - A safer choice: `openssl rand -hex 32` → a 64-char hex string.
257
+ - The library does **not** rate-limit `/stats`. That's your CDN / reverse-proxy / application middleware's job. For nginx: [`limit_req`](https://nginx.org/en/docs/http/ngx_http_limit_req_module.html). For Cloudflare: [rate limiting rules](https://developers.cloudflare.com/waf/rate-limiting-rules/). For Next.js middleware chains: [`@upstash/ratelimit`](https://github.com/upstash/ratelimit-js).
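If `openssl` isn't at hand, the same kind of token can be produced from Node. This is a small sketch with hypothetical helper names; the 32-character threshold simply mirrors the startup warning described above.

```ts
import { randomBytes } from 'node:crypto'

// Node equivalent of `openssl rand -hex 32`: 32 random bytes as 64 hex characters.
function generateStatsToken(): string {
  return randomBytes(32).toString('hex')
}

// Hypothetical check matching the documented warning (shorter than 32 characters).
function isShortToken(token: string): boolean {
  return token.length < 32
}

const token = generateStatsToken()
```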
258
+
259
+ ### 3. Passing the token: `Authorization` header vs query string
260
+
261
+ You can pass the token two ways:
262
+
263
+ | Method | Use when |
264
+ | --- | --- |
265
+ | `Authorization: Bearer <token>` header | **Production** — doesn't leak to access logs, browser history, or Referer |
266
+ | `?t=<token>` query string | Ad-hoc browser checks where typing a header is annoying |
267
+
268
+ Both are accepted. If both are present, the `Authorization` header wins. Example production check:
269
+
270
+ ```bash
271
+ curl -H "Authorization: Bearer $STATS_TOKEN" https://myapp.com/stats
272
+ ```
273
+
274
+ The query string is convenient but ends up in **nginx/CDN access logs, browser history, and Referer headers**. Don't link to `/stats?t=...` from any page.
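The precedence rule (header first, query string as fallback) can be sketched as below. `extractToken` is a hypothetical helper, and the case-insensitive `Bearer` match is an assumption of this example rather than documented behavior.

```ts
// Return the token from the Authorization header if present, else from ?t=.
function extractToken(url: string, authorization: string | null): string | null {
  const match = authorization ? /^Bearer\s+(.+)$/i.exec(authorization.trim()) : null
  if (match && match[1]) return match[1]
  return new URL(url).searchParams.get('t')
}

const fromHeader = extractToken('https://myapp.com/stats?t=from-query', 'Bearer from-header')
const fromQuery = extractToken('https://myapp.com/stats?t=from-query', null)
const missing = extractToken('https://myapp.com/stats', null)
```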
275
+
276
+ ### 4. Count inflation by flooding
277
+
278
+ An attacker who can send arbitrary (IP, User-Agent) pairs — even behind a correctly configured `trustProxy` — can insert arbitrarily many distinct "visitors" into the HLL sketch. Memory doesn't blow up (HLL is fixed 16 KB/day), but the reported `uniqueVisitors` becomes meaningless during the attack. The library cannot prevent this at the middleware layer. **Defense:** rate-limit tracked routes at the same layer that protects the rest of your app. Don't treat the number as authoritative during a suspected abuse event.
279
+
280
+ ### 5. Snapshot file permissions and contents
281
+
282
+ - The snapshot file is written with mode `0o600` (owner read/write only). It contains the current day's salt, which would make visitor hashes linkable back to their `(ip, ua)` tuples if disclosed alongside an independent request log.
283
+ - Write is atomic via `.tmp` + `rename`. A crash mid-write leaves the previous snapshot intact.
284
+ - The snapshot file contains **no personal data** — just the HLL registers, the salt, and per-day visitor counts. No IPs or User-Agents are stored.
285
+
286
+ ### 6. User-Agent length cap
287
+
288
+ Incoming User-Agent headers are truncated to **512 bytes** before hashing and bot-filter checks. Node already caps total header size at ~16 KB, but this bounds per-request CPU regardless.
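A byte-level cap like the one described can be sketched as follows. `capUserAgent` is a hypothetical name, and how the library handles a multi-byte character split at the boundary is not stated in the source; this version simply lets Node decode the truncated bytes.

```ts
// Truncate a User-Agent to at most maxBytes of UTF-8 before hashing or bot checks.
function capUserAgent(userAgent: string, maxBytes = 512): string {
  const bytes = Buffer.from(userAgent, 'utf8')
  if (bytes.length <= maxBytes) return userAgent
  // If the cut lands inside a multi-byte character, decoding yields U+FFFD there.
  return bytes.subarray(0, maxBytes).toString('utf8')
}

const shortUa = capUserAgent('Mozilla/5.0')
const longUa = capUserAgent('a'.repeat(600))
```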
289
+
290
+ ### 7. Privacy properties
291
+
292
+ - **Cookieless.** The library never sets or reads cookies.
293
+ - **No personal data persisted.** Hashes go into the HLL (which discards them) and are never written to disk.
294
+ - **Cross-day unlinkability.** The salt rotates at every UTC midnight. Yesterday's hash of `(ip, ua)` is unrelated to today's hash of the same tuple.
295
+ - **Mid-day restart caveat.** If the process restarts within the same UTC day, the restored salt (from the snapshot file) is the same, so the same visitor returning after the restart doesn't get double-counted. This means the salt IS on disk for the current day. Rotate `STATS_TOKEN` and delete the snapshot file if you think the file was exposed.
296
+
297
+ ## Where it works
298
+
299
+ - ✅ **Self-hosted Next.js** — `next start` on a VPS, Docker, Fly.io, Railway, etc. Single long-running Node process.
300
+ - ❌ **Vercel / Netlify / serverless by default** — ephemeral filesystem and per-request lambdas mean the in-memory HLL doesn't survive. You *could* make this work with a custom `persist` adapter pointing at Vercel KV or Upstash Redis, but at that point you're probably better off with a hosted analytics service.
301
+
302
+ ## Escape hatch (non-middleware integration)
303
+
304
+ If you can't use `runtime: 'nodejs'` in middleware, call the tracker manually from a route handler or `instrumentation.ts`:
305
+
306
+ ```ts
307
+ import stats from 'statswhatshesaid'
308
+ import type { NextRequest } from 'next/server'
309
+
310
+ export function GET(req: NextRequest) {
311
+ stats.track(req)
312
+ return new Response('ok')
313
+ }
314
+ ```
315
+
316
+ ## Development
317
+
318
+ ```bash
319
+ npm install
320
+ npm run typecheck
321
+ npm run build
322
+ npm test
323
+ # or, all at once:
324
+ npm run verify
325
+ ```
326
+
327
+ The example app under `examples/basic` is the simplest way to smoke-test changes end-to-end.
328
+
329
+ ## Releasing
330
+
331
+ Versioning and publishing are managed with [Changesets](https://github.com/changesets/changesets) and automated via GitHub Actions.
332
+
333
+ **Day-to-day flow:**
334
+
335
+ 1. Make your changes on a branch and open a PR.
336
+ 2. Add a changeset describing what changed:
337
+ ```bash
338
+ npx changeset
339
+ ```
340
+ Pick the bump type (patch / minor / major) and write a short summary. Commit the generated `.changeset/*.md` file.
341
+ 3. Merge the PR into `main`. The `Release` workflow will open (or update) a **"chore(release): version packages"** PR that bumps `package.json` and updates `CHANGELOG.md`.
342
+ 4. When you merge the release PR, the workflow publishes the new version to npm with [provenance](https://docs.npmjs.com/generating-provenance-statements) attached.
343
+
344
+ **One-time setup:**
345
+
346
+ - The unscoped package name `statswhatshesaid` must be available on npm (`npm view statswhatshesaid` — a 404 means it's yours for the taking on first publish).
347
+ - Add an automation token to the GitHub repo as the `NPM_TOKEN` secret (`Settings → Secrets and variables → Actions`). Use a **granular** token scoped to publish the `statswhatshesaid` package.
348
+ - In `Settings → Actions → General`, under *Workflow permissions*, allow GitHub Actions to **create and approve pull requests** so the release bot can open the version PR.
349
+
350
+ **Manual publishing (escape hatch):**
351
+
352
+ If you ever need to cut a release locally:
353
+
354
+ ```bash
355
+ npx changeset version # bumps package.json + updates CHANGELOG
356
+ git commit -am "chore(release): version packages"
357
+ git push
358
+ npm run release # verify + changeset publish
359
+ ```
360
+
361
+ ## License
362
+
363
+ MIT