@claude-code-mastery/starter-kit 1.2.1 → 1.3.0

@@ -0,0 +1,69 @@
1
+ ---
2
+ name: mongodb-backups
3
+ description: Production MongoDB backup and restore practices that the documentation gets wrong. Use when writing a mongodump/mongorestore pipeline, a backup cron job, an S3 backup, or a selective restore, or when planning recovery. Covers streaming to object storage with no temp file, saving a collection inventory, the --nsInclude trap on gzipped archives, collection tiering for fast restores, and matching write concern to data criticality. Defers replica-set topology and tuning to mongodb-replica-sets, and query patterns to mongodb-rules.
4
+ when_to_use: |
5
+ - Writing or reviewing a mongodump / mongorestore pipeline or backup cron job
6
+ - Streaming a backup to S3 or other object storage
7
+ - Doing a selective restore (only some collections) from a gzipped archive
8
+ - Planning recovery, retention, or what to restore first during an incident
9
+ - Do NOT use for replica-set setup or tuning (mongodb-replica-sets) or query shape (mongodb-rules)
10
+ ---
11
+
12
+ # MongoDB Backups and Restore
13
+
14
+ Lessons from running thousands of production backups: the defaults and the docs leave out the parts that bite during an actual restore.
15
+
16
+ ## Dump from a secondary, stream straight to object storage
17
+
18
+ Point mongodump at a secondary so the backup doesn't add load to the primary, and pipe the archive directly to S3 with no intermediate file on disk. On a replica set, `--oplog` captures a consistent point-in-time snapshot, but only for a full-instance dump: mongodump rejects `--oplog` combined with `--db` or `--collection`, so this pipeline dumps the whole instance (`$DB` here only names the S3 prefix).
19
+
20
+ ```bash
21
+ mongodump --host mongodb-secondary.internal:27017 \
22
+ --username "$U" --password "$P" --authenticationDatabase admin \
23
+ --oplog --gzip --archive \
24
+ | aws s3 cp - "s3://bucket/$DB/$(date +%Y%m%d_%H%M%S).dump.gz"
25
+ ```
26
+
27
+ Keep a `latest.dump.gz` alias next to the timestamped file so restore scripts always know where to look.
28
+
29
+ ## Save a collection inventory with every backup
30
+
31
+ You cannot inspect a `--gzip --archive` after the fact. There is no list, peek, or inspect flag; the archive is an opaque binary blob, and `--dryRun` finishes before the archive is demuxed, so it tells you nothing. Write the collection list at backup time, right beside the dump:
32
+
33
+ ```bash
34
+ mongosh --quiet --host "$HOST" -u "$U" -p "$P" --authenticationDatabase admin \
35
+ --eval "db.getSiblingDB('$DB').getCollectionNames().forEach(c => print(c))" \
36
+ | aws s3 cp - "s3://bucket/$DB/$(date +%Y%m%d_%H%M%S).collections.txt"
37
+ ```
38
+
39
+ Six months later when you need a selective restore, you read the file instead of trying to remember what was in the archive.
40
+
41
+ ## The `--nsInclude` trap on gzipped archives
42
+
43
+ The docs say `--nsInclude` filters a restore to specific collections. It does, from a directory dump (one `.bson` per collection). But from a `--gzip --archive`, which is what almost every production pipeline uses, `--nsInclude` (and `--nsFrom`/`--nsTo`) silently restore everything anyway and throw duplicate-key errors on the collections you never asked for. The archive is a single multiplexed stream that mongorestore can't seek, so the namespace filter doesn't hold. This is real and long-standing (JIRA TOOLS-2023, open over six years).
44
+
45
+ The reliable approach is the inverse: `--nsExclude` every collection you don't want, generated from the inventory file. Don't hand-build 100+ exclude flags at 2 AM; script it to read the inventory and emit the restore command.
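
As a rough illustration of generating the exclude list from the inventory, here is a minimal sketch. The helper name `buildRestoreArgs`, the database name `shop`, and the sample collection names are all assumptions made up for this example; only `--gzip`, `--archive`, and `--nsExclude` are real mongorestore flags.

```javascript
// Hypothetical helper: turn the saved inventory into mongorestore arguments.
// `keep` lists the collections to restore; every other collection in the
// inventory becomes an --nsExclude flag.
function buildRestoreArgs(db, inventoryText, keep) {
  const wanted = new Set(keep);
  const collections = inventoryText
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0);

  // Fail early if a requested collection was never in the backup.
  const missing = keep.filter((name) => !collections.includes(name));
  if (missing.length > 0) {
    throw new Error(`not in inventory: ${missing.join(", ")}`);
  }

  const excludes = collections
    .filter((name) => !wanted.has(name))
    .map((name) => `--nsExclude=${db}.${name}`);

  // --archive with no value makes mongorestore read the archive from stdin.
  return ["--gzip", "--archive", ...excludes];
}

// Sample inventory, as it would look in the .collections.txt file.
const inventory = "orders\ncustomers\nsessions\naudit_log\n";
const args = buildRestoreArgs("shop", inventory, ["orders", "customers"]);
console.log(["mongorestore", ...args].join(" "));
```

The printed command is meant to sit at the end of a pipe that streams the archive from object storage. Throwing on a name that is missing from the inventory catches a typo in the keep list before anything is restored.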
46
+
47
+ ## Tier collections so restore is a command, not improvisation
48
+
49
+ Decide ahead of time what gets restored, and keep the lists in the restore script:
50
+
51
+ - **Tier 1, critical business data** (orders, customers, products, inventory): always restore.
52
+ - **Tier 2, regenerable** (sessions, caches, tokens, search indexes): never restore. Restoring stale sessions is worse than having none, because you log people back into dead state.
53
+ - **Tier 3, historical/analytical** (audit logs, history, analytics rollups): restore only on demand. This is the bulk of the exclude list.
54
+
55
+ When the incident hits you want to run a command, not write one.
56
+
57
+ ## A replica set is only a backup at `w:"majority"`
58
+
59
+ Write concern quietly decides whether replication is durability or just a live mirror. `w:1` acknowledges on the primary alone, so a write lost before it replicates never existed anywhere else. `w:"majority"` means the data is on a majority of members before the app gets the OK. Match it to data criticality rather than setting one value globally: `w:"majority"` for data you can't lose, `w:1` for the regenerable and the disposable. See mongodb-replica-sets for the full read/write semantics.
60
+
61
+ ## Test the restore before you need it
62
+
63
+ A backup you've never restored is a hypothesis. Practice the selective restore in staging, and time how long a replica-set member rebuilds from zero (delete a secondary's data dir, restart, and watch `db.adminCommand({ replSetGetStatus: 1, initialSync: 1 }).initialSyncStatus`). That number is what tells you, at 2 AM, whether to wait for self-healing or restore from backup. Point-in-time recovery needs `--oplog` backups plus `mongorestore --oplogReplay --oplogLimit "<ts>:<inc>"`.
64
+
65
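+ One adjacent footgun that kills backups: unrotated MongoDB logs fill the disk, and the primary stops taking writes or crashes outright. Set `systemLog.logRotate: rename`, schedule the rotation itself (the `logRotate` command or `SIGUSR1`, since mongod does not rotate on its own), and alert on disk at 80%.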
66
+
67
+ ---
68
+
69
+ This skill is built to grow. Add a rule when a real restore surprise has a stable, defensible fix.
@@ -0,0 +1,73 @@
1
+ ---
2
+ name: mongodb-replica-sets
3
+ description: "Production MongoDB replica-set operation: topology, durability, host tuning, and the container-specific gotchas Claude gets wrong. Use when setting up or configuring a replica set, writing its connection string, choosing read preference or write concern, running MongoDB in Docker or Swarm, or planning failover and upgrades. Covers odd-member quorum, connecting via the set rather than one node, WiredTiger cache sizing, the OS tuning MongoDB requires, why ingress-mode ports break a replica set, and oplog/failover discipline. Defers query and index shape to mongodb-rules and backup pipelines to mongodb-backups."
4
+ when_to_use: |
5
+ - Setting up, configuring, or initializing a replica set, or writing its connection string
6
+ - Choosing read preference or write concern, or reasoning about staleness and durability
7
+ - Running MongoDB in Docker or Docker Swarm
8
+ - Sizing the WiredTiger cache, tuning the host, sizing the oplog, or planning failover/upgrades
9
+ - Do NOT use for query/aggregation/index shape (mongodb-rules) or backup pipelines (mongodb-backups)
10
+ ---
11
+
12
+ # MongoDB Replica Sets in Production
13
+
14
+ How to run a replica set, not how to query it (that's mongodb-rules). These are the operational decisions Claude tends to get wrong.
15
+
16
+ ## Topology: odd voting members, real hostnames, never localhost
17
+
18
+ Use an odd number of voting members, typically three, so a majority can still elect a primary when one is lost. A two-member set has no majority if either dies, so the survivor steps down and the set goes read-only. Avoid arbiters unless you truly must: an arbiter holds no data, so a three-node primary-secondary-arbiter set that loses its one data secondary can no longer satisfy `w:"majority"`. Address every member by a hostname that all other members and all clients can resolve and reach; `localhost` in the config is a classic break because the other members can't reach it. Initialize once, from a single member, after all members are up, and wait for the election before creating users.
19
+
20
+ ```javascript
21
+ rs.initiate({ _id: "rs0", members: [
22
+ { _id: 0, host: "mongo1.internal:27017", priority: 2 },
23
+ { _id: 1, host: "mongo2.internal:27017", priority: 1 },
24
+ { _id: 2, host: "mongo3.internal:27017", priority: 1 }
25
+ ]})
26
+ ```
27
+
28
+ ## Connect to the set, not to one node
29
+
30
+ The connection string must list the seed members and name the set, so the driver can find the current primary and follow failover. Pointing the app at a single host throws away the high availability the replica set exists to provide.
31
+
32
+ ```
33
+ mongodb://mongo1,mongo2,mongo3/db?replicaSet=rs0
34
+ ```
35
+
36
+ ## Durability and reads
37
+
38
+ Writes always go to the primary. `w:"majority"` means a majority of members acknowledged before the app gets the OK; that is durability, so use it for data you can't lose. `w:1` (primary only) is fine for the regenerable. Add `wtimeoutMS` so a degraded set doesn't block writes forever. Reads default to the primary and are strongly consistent. Secondaries are eventually consistent because they lag, so only route reads to them (`secondaryPreferred`, `secondary`, or `nearest`) when stale data is acceptable: analytics and reporting, not a read-after-write the user just made.
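
One way to keep these choices explicit is to build them into the connection string per workload. The sketch below is only illustrative: `buildMongoUri`, the `shop` database, and the 5000 ms timeout are assumptions, while `replicaSet`, `w`, `wtimeoutMS`, and `readPreference` are standard connection-string options.

```javascript
// Hypothetical helper: build a replica-set connection string whose options
// match how much the data matters. Hostnames and defaults are placeholders.
function buildMongoUri({ hosts, db, replicaSet, critical, staleReadsOk = false }) {
  const params = new URLSearchParams({
    replicaSet,
    w: critical ? "majority" : "1",
    wtimeoutMS: "5000", // assumed value: fail the write instead of hanging on a degraded set
    readPreference: staleReadsOk ? "secondaryPreferred" : "primary",
  });
  return `mongodb://${hosts.join(",")}/${db}?${params.toString()}`;
}

const ordersUri = buildMongoUri({
  hosts: ["mongo1.internal:27017", "mongo2.internal:27017", "mongo3.internal:27017"],
  db: "shop",
  replicaSet: "rs0",
  critical: true,
});
console.log(ordersUri);
```

A reporting job could call the same helper with `critical: false, staleReadsOk: true`, which keeps the durability and staleness trade-off visible at the call site instead of buried in a global default.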
39
+
40
+ ## Authentication between members
41
+
42
+ A replica set needs internal auth or any host can join. Generate a keyfile (`openssl rand -base64 756`), copy the same file to every member with owner-only permissions (`chmod 400`), and point `security.keyFile` at it; setting a keyfile also turns on `security.authorization`. Never expose 27017 to the internet: bind to the private network and firewall it. Use TLS for client and inter-node traffic when the network isn't fully trusted.
43
+
44
+ ## OS tuning MongoDB actually requires (self-hosted)
45
+
46
+ These aren't optional polish, MongoDB warns about them and they cause real instability if skipped:
47
+
48
+ - **Disable Transparent Huge Pages (THP).** THP hurts database memory access patterns badly; set `enabled` and `defrag` to `never` at boot.
49
+ - **`vm.swappiness=1`.** Keep the working set in RAM instead of swapping out the cache.
50
+ - **Raise ulimits.** `nofile` and `nproc` to 64000/32000, the defaults are too low for a busy mongod's connections and threads.
51
+ - **XFS for the data volume.** MongoDB recommends XFS over ext4 for WiredTiger. Never put data on NFS or network storage, the latency wrecks it.
52
+
53
+ ## WiredTiger cache, especially in a container
54
+
55
+ Set `--wiredTigerCacheSizeGB` explicitly from the container's memory limit, roughly half of it minus 1GB, and cap the container's memory in the deploy block. Don't rely on the default: it targets about half of system RAM, and the cache is only part of mongod's footprint (connections, aggregation, and sort buffers live outside it). An unset cache sized against the host plus those buffers will exceed a container limit and get the container OOM-killed. Leave headroom on purpose.
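
The sizing rule can be written down as a small calculation. This sketch assumes "half of it minus 1GB" means 50% of (limit minus 1 GB), which mirrors the formula mongod uses for its own default, and it applies the 0.25 GB minimum that `--wiredTigerCacheSizeGB` accepts. The function name is hypothetical.

```javascript
// Sketch only: derive a WiredTiger cache size from the container memory limit.
// Assumes the rule is 50% of (limit - 1 GB), floored at 0.25 GB.
function wiredTigerCacheGB(containerLimitGB) {
  if (!(containerLimitGB > 0)) {
    throw new RangeError("container memory limit must be a positive number of GB");
  }
  const half = (containerLimitGB - 1) / 2;
  // Round to two decimals so the value is tidy on a command line.
  return Math.max(0.25, Math.round(half * 100) / 100);
}

for (const limit of [2, 4, 8]) {
  console.log(`limit ${limit}GB -> --wiredTigerCacheSizeGB ${wiredTigerCacheGB(limit)}`);
}
```

Under this reading a 4 GB container gets a 1.5 GB cache, which leaves the rest for connections, sorts, and aggregation buffers.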
56
+
57
+ ## Running it in Docker or Swarm
58
+
59
+ The replica-set-specific traps on top of the general docker and docker-swarm skills:
60
+
61
+ - **Publish the port in `mode: host`, never ingress.** The routing mesh load-balances `27017` across all three members, which breaks the replica set, clients and members must reach a specific member. Use `ports: [{ target: 27017, published: 27017, mode: host }]`.
62
+ - **Pin each member to its own node.** Without placement constraints all three can land on one host and you have zero fault tolerance. Label nodes and constrain each service (`node.labels.mongo.replica == 1`), one member per host.
63
+ - **Bind-mount the data directory to a host path.** Anonymous or named-without-bind volumes risk data loss on recreation and are hard to back up. Pre-create the dir and `chown 999:999` (the image runs as UID 999). XFS host filesystem.
64
+ - **Real healthcheck, with a start period.** `test: ["CMD","mongosh","--eval","db.adminCommand('ping')"]` and `start_period: 60s`, so a slow first start isn't read as unhealthy.
65
+ - **Keyfile and root password via Docker secrets**, not plaintext env (which shows in `docker inspect` and image history).
66
+
67
+ ## Oplog and failover discipline
68
+
69
+ The oplog is your recovery window: a secondary that falls further behind than the oplog covers needs a full resync, and point-in-time recovery can only reach back as far as the oplog. Size it for your write volume (24h minimum, 48 to 72h is a safer target; `db.getReplicationInfo()` shows the current window). Test failover on a schedule with `rs.stepDown()`, don't discover at an outage that the app doesn't reconnect. Roll upgrades secondaries-first, one at a time, step the primary down last, and let each node resync before moving on.
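
A monitoring check for the oplog window might look like the following sketch. It assumes the input is the object that `db.getReplicationInfo()` returns in mongosh and reads only its `timeDiffHours` field; the function name and the status labels are invented for illustration, and the 24h and 48h thresholds come from the paragraph above.

```javascript
// Sketch: classify an oplog window against a minimum and a target, in hours.
function oplogWindowStatus(info, { minimumHours = 24, targetHours = 48 } = {}) {
  const hours = Number(info.timeDiffHours);
  if (!Number.isFinite(hours)) {
    throw new TypeError("expected a numeric timeDiffHours field");
  }
  if (hours < minimumHours) return "too-small";   // resync risk after a long outage
  if (hours < targetHours) return "acceptable";   // above the floor, below the target
  return "healthy";
}

console.log(oplogWindowStatus({ timeDiffHours: 30.5 }));
```

Wiring the result into an alert is left out on purpose, since the alerting tool is environment-specific.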
70
+
71
+ ---
72
+
73
+ This skill is built to grow. Add a rule when a real replica-set operation has a stable, defensible fix.
@@ -0,0 +1,148 @@
1
+ ---
2
+ name: nginx
3
+ description: Production NGINX configuration best practices, especially as a reverse proxy in front of containerized backends. Use when writing or editing nginx.conf, server blocks, upstreams, SSL, proxy caching, security headers, structured logging, or stream (TCP/UDP) proxying. Covers the Docker-DNS resolver that keeps upstreams from going stale, separate access-controlled health ports, the stream-block placement gotcha, and headers that must be sent on errors too. Kept separate from the docker and docker-swarm skills.
4
+ when_to_use: |
5
+ - Writing or editing nginx.conf, a server block, an upstream, or an SSL block
6
+ - Putting NGINX in front of containerized services as a reverse proxy
7
+ - Upstreams that resolve once at startup and then break when a container is replaced
8
+ - Structured logging, proxy caching, security headers, or TCP/UDP (stream) proxying
9
+ - Do NOT use for Dockerfile or Swarm deploy concerns, those are the docker and docker-swarm skills
10
+ ---
11
+
12
+ # NGINX: Production Reverse-Proxy Config
13
+
14
+ Aimed at NGINX in front of containerized backends. The defaults are fine for a static site, these are the things that bite in production.
15
+
16
+ ## Resolve upstreams through Docker DNS with a short TTL
17
+
18
+ By default NGINX resolves an upstream host once, at startup, and caches the IP forever. In Docker that IP belongs to a container that will be replaced, so the proxy keeps sending traffic to a dead address. Point NGINX at Docker's internal DNS and force re-resolution:
19
+
20
+ ```nginx
21
+ http {
22
+ resolver 127.0.0.11 ipv6=off valid=10s; # Docker DNS, re-resolve every 10s
23
+ }
24
+ ```
25
+
26
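+ `valid=10s` caps how long NGINX trusts a cached DNS answer, but the `resolver` is only consulted for names resolved at request time: a `proxy_pass` whose host comes from a variable, or an upstream `server` marked `resolve` (which needs a shared-memory `zone`, and before open-source NGINX 1.27.3 was NGINX Plus only). A literal hostname in `proxy_pass` or a plain upstream `server` line is still resolved once at startup. This is the NGINX side of "never hardcode IPs"; see the docker-swarm skill for the principle.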
27
+
28
+ ## Upstreams by service name, with keepalive
29
+
30
+ Reference backends by service name, never IP. Reuse connections with `keepalive`, which needs HTTP/1.1 and a cleared Connection header:
31
+
32
+ ```nginx
33
+ upstream backend {
34
+ server backend-service:8080;
35
+ keepalive 32;
36
+ }
37
+ server {
38
+ location / {
39
+ proxy_pass http://backend;
40
+ proxy_http_version 1.1;
41
+ proxy_set_header Connection "";
42
+ }
43
+ }
44
+ ```
45
+
46
+ ## Structured JSON logs to stdout/stderr
47
+
48
+ Log JSON so an aggregator can parse it, and write to stdout/stderr so Docker's logging driver captures it. Never log to a file inside the container.
49
+
50
+ ```nginx
51
+ log_format json_log escape=json '{'
52
+ '"time":$msec,"method":"$request_method","status":$status,'
53
+ '"uri":"$request_uri","rt":$request_time,'
54
+ '"upstream":"$upstream_addr","cache":"$upstream_cache_status",'
55
+ '"client":"$remote_addr","xff":"$http_x_forwarded_for"'
56
+ '}';
57
+ access_log /dev/stdout json_log;
58
+ error_log /dev/stderr warn;
59
+ ```
60
+
61
+ ## Health and status on separate, access-restricted ports
62
+
63
+ Keep health checks and metrics off the production port: different access control, no log noise, no interference with real traffic. Restrict to internal networks and turn off access logging.
64
+
65
+ ```nginx
66
+ server { # load balancer health check
67
+ listen 82;
68
+ allow 10.0.0.0/8; allow 172.16.0.0/12; allow 127.0.0.1; deny all;
69
+ location /health { access_log off; return 200 "OK"; }
70
+ }
71
+ server { # stub_status for Prometheus/Datadog
72
+ listen 81;
73
+ allow 10.0.0.0/8; allow 127.0.0.1; deny all;
74
+ location /nginx_status { stub_status on; }
75
+ }
76
+ ```
77
+
78
+ A deep health check that proxies an upstream's own `/health` is worth a third port when a service's liveness depends on its backend being reachable.
79
+
80
+ ## Stream (TCP/UDP) blocks go OUTSIDE the http block
81
+
82
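+ Proxying a non-HTTP protocol such as MongoDB or another database uses the `stream` module, which is a top-level block, not inside `http`. Putting it inside `http` fails the config test with `"stream" directive is not allowed here`, which is easy to trigger by accident when the block arrives through an `include`. HTTP services (an Elasticsearch REST proxy, say) stay inside `http`.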
83
+
84
+ ```nginx
85
+ load_module modules/ngx_stream_module.so;
86
+ include /etc/nginx/mongo.conf; # stream { ... } OUTSIDE http
87
+ http {
88
+ include /etc/nginx/elasticsearch.conf; # HTTP proxy, INSIDE http
89
+ }
90
+ ```
91
+
92
+ ## SSL and security headers
93
+
94
+ Modern protocols and ciphers, session cache, OCSP stapling. When certs come from Docker secrets they are mounted at `/run/secrets/<name>`:
95
+
96
+ ```nginx
97
+ ssl_certificate /run/secrets/server_pem;
98
+ ssl_certificate_key /run/secrets/server_key;
99
+ ssl_protocols TLSv1.2 TLSv1.3;
100
+ ssl_session_cache shared:SSL:60m;
101
+ ssl_stapling on; ssl_stapling_verify on;
102
+ ```
103
+
104
+ Send security headers with `always` so they are present on error responses too, not just 2xx/3xx:
105
+
106
+ ```nginx
107
+ add_header X-Frame-Options "SAMEORIGIN" always;
108
+ add_header X-Content-Type-Options "nosniff" always;
109
+ add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
110
+ ```
111
+
112
+ ## Always add a Content-Security-Policy, this is the one that gets skipped
113
+
114
+ CSP is the highest-value security header and the one almost always left out. The others harden edges; CSP is the browser-side backstop against XSS and content injection. It tells the browser which sources are allowed to load scripts, styles, images, and frames, so an injected `<script>` from an attacker simply doesn't execute. A site without a CSP has no second line of defense once markup injection gets through. Add it by default; do not wait to be asked.
115
+
116
+ `default-src 'self'` alone is technically a CSP but it breaks most real apps (CDNs, inline styles, analytics) and lulls you into thinking you're covered, so set the directives explicitly:
117
+
118
+ ```nginx
119
+ add_header Content-Security-Policy "default-src 'self'; script-src 'self'; style-src 'self'; img-src 'self' data:; font-src 'self'; connect-src 'self'; object-src 'none'; base-uri 'self'; frame-ancestors 'self'" always;
120
+ ```
121
+
122
+ Rules that matter:
123
+ - **Avoid `'unsafe-inline'` and `'unsafe-eval'` in `script-src`.** They re-open the XSS hole CSP exists to close. If you have inline scripts, use a per-request nonce or a hash, not a blanket unsafe allow.
124
+ - **`object-src 'none'` and `base-uri 'self'`** are free wins that block plugin and base-tag injection. Set them every time.
125
+ - **`frame-ancestors`** controls who can iframe you and supersedes `X-Frame-Options`, so put your clickjacking policy here.
126
+ - **Roll out with report-only first.** A too-strict CSP breaks the page silently. Ship `Content-Security-Policy-Report-Only` to collect violations without enforcing, watch what trips, tighten, then promote to the enforcing header. The real policy is app-specific and is built by tuning, not guessed in one line.
127
+
128
+ ## Proxy caching for read-heavy upstreams
129
+
130
+ Cache GET/HEAD, and serve stale on upstream error or timeout so a backend hiccup doesn't reach users:
131
+
132
+ ```nginx
133
+ proxy_cache es_cache;
134
+ proxy_cache_methods GET HEAD;
135
+ proxy_cache_valid 200 1m;
136
+ proxy_cache_key $host$uri$is_args$args;
137
+ proxy_cache_use_stale updating error timeout http_500 http_502 http_503 http_504;
138
+ proxy_hide_header X-Powered-By;
139
+ add_header X-Proxy-Cache $upstream_cache_status;
140
+ ```
141
+
142
+ ## Watch line endings in config files
143
+
144
+ NGINX config copied in with Windows CRLF line endings can fail to parse or behave oddly in a Linux container, which is why production NGINX images often run `dos2unix` on the configs at build time. If `nginx -t` reports something that makes no sense, check the line endings first, see the dev-pitfalls skill.
145
+
146
+ ---
147
+
148
+ This skill is built to grow. Add a directive when a real production NGINX problem has a stable, defensible fix. ModSecurity/WAF setup (build as a dynamic module, `load_module`) is deep enough to deserve its own section when needed.
@@ -0,0 +1,121 @@
1
+ ---
2
+ name: nodejs
3
+ description: Node.js backend runtime and process-lifecycle rules that Claude reliably gets wrong. Use when writing a Node server or long-running script, an Express app, worker threads, or process signal handling, and when choosing packages. Covers correct graceful shutdown, crash-on-fault instead of swallowing errors, not blocking the event loop, loading instrumentation first, securing the session cookie, and replacing deprecated packages with built-ins. Defers MongoDB to mongodb-rules, schema/validation to schema-source-of-truth, and image/deploy to docker and docker-swarm.
4
+ when_to_use: |
5
+ - Writing a Node server (`server.js`, Express) or a long-running script or worker
6
+ - Adding or reviewing process signal handling, graceful shutdown, or error handling
7
+ - CPU-heavy work that might block the event loop
8
+ - Choosing an HTTP client, UUID lib, date lib, or any dependency with a modern built-in
9
+ - Do NOT use for Mongo query shape (mongodb-rules), validation schemas (schema-source-of-truth), or Dockerfiles (docker)
10
+ ---
11
+
12
+ # Node.js: Process Lifecycle and Runtime Rules
13
+
14
+ Claude writes Node services that work in a demo and fall over in production, almost always around the process lifecycle. These are the parts to get right.
15
+
16
+ ## Graceful shutdown, done correctly
17
+
18
+ A server must shut down cleanly on `SIGTERM` (what `docker stop`, Swarm, and Kubernetes send) and `SIGINT` (Ctrl-C). Claude usually omits this entirely, so the orchestrator waits the grace period and then SIGKILLs, dropping in-flight requests. The correct sequence is: stop accepting new connections, drain in-flight ones, close dependencies, exit 0, with a hard timeout so a stuck connection can't block shutdown forever.
19
+
20
+ ```javascript
21
+ const server = app.listen(port);
22
+ let shuttingDown = false;
23
+
24
+ async function shutdown(signal) {
25
+ if (shuttingDown) return; // ignore repeat signals
26
+ shuttingDown = true;
27
+ logger.info("shutting down", { signal });
28
+
29
+ const force = setTimeout(() => { // drain hung? force it
30
+ logger.error("shutdown timed out, forcing exit");
31
+ process.exit(1);
32
+ }, 10_000);
33
+ force.unref();
34
+
35
+ try {
36
+ await new Promise((r) => server.close(r)); // stop new conns, let in-flight finish
37
+ await db.close(); // then close DB, redis, change streams
38
+ clearTimeout(force);
39
+ process.exit(0);
40
+ } catch (err) {
41
+ logger.error("error during shutdown", { err });
42
+ process.exit(1);
43
+ }
44
+ }
45
+
46
+ process.on("SIGTERM", () => shutdown("SIGTERM"));
47
+ process.on("SIGINT", () => shutdown("SIGINT"));
48
+ ```
49
+
50
+ Two things that look fine but are bugs. Do not put async cleanup in a `process.on("exit", ...)` handler: the event loop is already stopped, so nothing async runs, and `exit` is for synchronous work only. And do not trap `SIGUSR1`, which Node reserves for the debugger. All of this only works if the process actually receives the signal, which means an exec-form `ENTRYPOINT` so Node is started directly rather than under a shell that swallows signals (see the docker skill), and, with `init: true`, an init process as PID 1 that forwards signals to Node (see docker-swarm).
51
+
52
+ ## Let it crash, never swallow a fault
53
+
54
+ On `uncaughtException` or `unhandledRejection` the process is in an unknown, possibly corrupt state. Log it and exit non-zero so the orchestrator restarts a clean process. Do not catch-and-continue. Node 15 and later already terminate on an unhandled rejection by default, so code that relies on swallowing one is both wrong and fragile.
55
+
56
+ ```javascript
57
+ process.on("uncaughtException", (err) => {
58
+ logger.error("uncaught exception", { err });
59
+ process.exit(1); // exit non-zero so restart_policy: on-failure restarts
60
+ });
61
+ process.on("unhandledRejection", (reason) => {
62
+ logger.error("unhandled rejection", { reason });
63
+ process.exit(1);
64
+ });
65
+ ```
66
+
67
+ The non-zero exit is what makes Swarm/K8s self-healing fire, an `exit(0)` on a crash reads as success and the dead service is never restarted (see docker-swarm). You can route these through `shutdown()` to drain first, but never let the process keep serving after one.
68
+
69
+ ## Don't block the event loop
70
+
71
+ Node runs your JavaScript on a single thread. A CPU-bound stretch (parsing a huge payload, hashing, image work, a tight loop over a large array) freezes every concurrent request until it finishes. I/O is already async and is not the problem. For real CPU work in JavaScript, offload to `worker_threads` rather than spawning a `child_process`, and don't "just make it async": wrapping a synchronous loop in an `async` function does not make it yield.
72
+
73
+ ```javascript
74
+ const { Worker } = require("node:worker_threads");
75
+ new Worker("./workers/process.js", {
76
+ workerData,
77
+ resourceLimits: { maxOldGenerationSizeMb: 512 }, // cap so one worker can't OOM the host
78
+ });
79
+ ```
80
+
81
+ ## Load instrumentation before anything else
82
+
83
+ APM and tracing libraries (`dd-trace`, the OpenTelemetry SDK) work by monkey-patching `http`, `express`, and your DB driver. They can only patch modules loaded after them, so the init call must be the very first thing in the entry file, before any `require("express")`. Required late, it silently instruments nothing.
84
+
85
+ ```javascript
86
+ // server.js, line 1
87
+ require("dd-trace").init({ /* ... */ });
88
+ const express = require("express"); // now traced
89
+ ```
90
+
91
+ ## Lock down the session cookie
92
+
93
+ When Claude sets up sessions it usually sets `httpOnly` and stops. Set all three: `httpOnly` (no JS access), `secure` in production (HTTPS only), and `sameSite` (CSRF defense), which is the one that gets missed.
94
+
95
+ ```javascript
96
+ cookie: { httpOnly: true, secure: isProd, sameSite: "lax", maxAge: 86_400_000 }
97
+ ```
98
+
99
+ Related CORS gotcha: `credentials: true` cannot be combined with `origin: "*"`, the browser rejects it. Echo a specific allowed origin instead.
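
Echoing a specific origin can be as simple as an allow-list lookup. The sketch below is framework-agnostic and only computes the response headers; `ALLOWED_ORIGINS`, `corsHeaders`, and the example domains are hypothetical names chosen for illustration.

```javascript
// Sketch: echo the request origin only when it is on an allow-list.
const ALLOWED_ORIGINS = new Set(["https://app.example.com", "https://admin.example.com"]);

function corsHeaders(requestOrigin) {
  if (!requestOrigin || !ALLOWED_ORIGINS.has(requestOrigin)) {
    return {}; // no CORS headers: the browser blocks the cross-origin read
  }
  return {
    "Access-Control-Allow-Origin": requestOrigin, // a specific origin, never "*"
    "Access-Control-Allow-Credentials": "true",
    Vary: "Origin", // keep caches from serving one origin's response to another
  };
}

console.log(corsHeaders("https://app.example.com"));
console.log(corsHeaders("https://evil.example.net"));
```

The `Vary: Origin` header matters as soon as a cache or CDN sits in front of the service, because the response now differs per requesting origin.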
100
+
101
+ ## Reach for built-ins, replace deprecated packages
102
+
103
+ Claude's training pulls in libraries that are now deprecated or unnecessary. Prefer the platform:
104
+
105
+ - `crypto.randomUUID()` over the `uuid` package for a v4 id, and `uuid` over `node-uuid`
106
+ - native `fetch` (Node 18+) or `axios` over `request` (unmaintained since 2020)
107
+ - `node:test` + `node:assert` for simple suites, `structuredClone()` over a deep-clone dep
108
+ - the Intl APIs or `date-fns`/Temporal over `moment` (in maintenance mode)
109
+ - `@aws-sdk/client-*` v3 (modular) over the monolithic `aws-sdk` v2
110
+ - `sass` (dart-sass) over the deprecated `node-sass`
111
+
112
+ Use the `node:` prefix on built-in imports (`require("node:fs")`) so there's no ambiguity with a same-named package.
113
+
114
+ ## Two smaller ones
115
+
116
+ - **Logging:** a structured logger (pino or winston) emitting JSON to stdout in production, never `console.log` in a hot path. stdout because the container's logging driver collects it (see docker).
117
+ - **PM2:** cluster mode is for using all cores on a VM or bare-metal host. Inside a Swarm or K8s container, run one Node process and scale with replicas plus `init: true`, don't stack two process managers that both try to own restarts.
118
+
119
+ ---
120
+
121
+ This skill is built to grow. Add a rule when a real Node production failure has a stable, defensible fix.
@@ -0,0 +1,128 @@
1
+ ---
2
+ name: responsive-css
3
+ description: Writing CSS and markup that works on phone and desktop at the same time, the responsive failures Claude repeats. Use when building or editing any web page or component, or when something overflows horizontally, a code block blows out the page, text is huge on mobile, or content won't scroll on touch. Covers the viewport meta tag, the flex/grid min-width:0 rule that fixes most overflow, code blocks that scroll instead of overflowing, fluid type with clamp(), and mobile-first breakpoints. This is authoring guidance, design-review evaluates the result.
4
+ when_to_use: |
5
+ - Writing or editing CSS or HTML for a page or component that renders in a browser
6
+ - Anything that has to look right on both a phone and a desktop
7
+ - Fixing horizontal page overflow, a code block or table that overflows, text too large on mobile, or a region that won't scroll on touch
8
+ - The whole page suddenly scrolls, jumps, or rubber-bands on iPhone/Safari, or background scrolls under an open modal
9
+ - Building docs/landing pages with code blocks, which overflow on mobile constantly
10
+ - Do NOT use for native mobile (React Native, Flutter) or non-visual code
11
+ ---
12
+
13
+ # Responsive CSS: Mobile and Desktop at Once
14
+
15
+ Claude writes CSS for the desktop it's picturing and never checks the narrow viewport, so the same few things break every time: content overflows sideways, code blocks blow out the page, and text that's right on desktop is huge on a phone. Design for the small screen first and these mostly disappear.
16
+
17
+ ## Set the viewport, stop iOS inflating text
18
+
19
+ Without the viewport meta tag, mobile browsers lay the page out at a ~980px width and scale the result down to fit, which is why a phone shows a shrunken desktop layout with unreadably small text. This one line is non-negotiable on every page:
20
+
21
+ ```html
22
+ <meta name="viewport" content="width=device-width, initial-scale=1" />
23
+ ```
24
+
25
+ And stop iOS from auto-enlarging text:
26
+
27
+ ```css
28
+ html { -webkit-text-size-adjust: 100%; text-size-adjust: 100%; }
29
+ ```
30
+
31
+ ## The flex/grid `min-width: 0` rule, this fixes most overflow
32
+
33
+ This is the single most common cause of mysterious horizontal scroll. Flex and grid children default to `min-width: auto`, which means they refuse to shrink below their content's intrinsic width. So one long line, a URL, or a `<pre>` inside a flex/grid item pushes the whole layout wider than the screen. Set `min-width: 0` on the child (or `overflow: hidden`) and it shrinks correctly.
34
+
35
+ ```css
36
+ .flex-child, .grid-child { min-width: 0; } /* lets long content shrink instead of overflowing */
37
+ ```
38
+
39
+ If you fix nothing else, fix this. It's behind the code-block overflow in most "works on desktop, scrolls sideways on mobile" pages.
40
+
41
+ ## Code blocks scroll, they don't overflow the page
42
+
43
+ A `<pre>` doesn't wrap and has no scroll affordance by default, so long lines expand the page. Make the block scroll inside itself, and never line-wrap code (wrapping changes what the code means). The parent needs `min-width: 0` per the rule above, or this still overflows.
44
+
45
+ ```css
46
+ pre {
47
+ overflow-x: auto; /* scroll inside the block */
48
+ max-width: 100%;
49
+ }
50
+ pre code { white-space: pre; } /* keep code on its own lines, scroll horizontally */
51
+ ```
52
+
53
+ For prose, the opposite: let long words and URLs break instead of overflowing.
54
+
55
+ ```css
56
+ p, li, h1, h2, h3 { overflow-wrap: break-word; }
57
+ ```
58
+
59
+ Wrap wide tables the same way you handle code, in a scroll container: `<div style="overflow-x:auto">…table…</div>`.
60
+
61
+ ## Fluid type with `clamp()`, not fixed desktop sizes
62
+
63
+ A heading sized for desktop is enormous on a phone and forces wrapping and overflow. `clamp()` scales the size smoothly between a mobile floor and a desktop ceiling with no breakpoints to juggle:
64
+
65
+ ```css
66
+ h1 { font-size: clamp(1.75rem, 4vw + 1rem, 3rem); }
67
+ body { font-size: clamp(1rem, 0.5vw + 0.9rem, 1.125rem); line-height: 1.5; }
68
+ ```
69
+
70
+ The middle value does the scaling; the floor and ceiling keep it readable at both ends.
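
To see what the `h1` rule above resolves to at a given width, the arithmetic can be sketched in JavaScript. This is an illustration only: it assumes the browser default of 1rem = 16px, and `fluidSizePx` and its parameter names are hypothetical helpers, not part of any API.

```javascript
// Resolve clamp(minRem, vwCoeff*vw + baseRem, maxRem) to pixels for a given viewport width.
// Assumption: 1rem = 16px (browser default). All names are illustrative.
const REM_PX = 16;

function fluidSizePx(viewportWidthPx, { minRem, vwCoeff, baseRem, maxRem }) {
  // 1vw is 1% of the viewport width, so vwCoeff vw = vwCoeff * width / 100.
  const preferred = (vwCoeff * viewportWidthPx) / 100 + baseRem * REM_PX;
  const min = minRem * REM_PX;
  const max = maxRem * REM_PX;
  return Math.min(Math.max(preferred, min), max);
}

// h1 { font-size: clamp(1.75rem, 4vw + 1rem, 3rem); }
const h1 = { minRem: 1.75, vwCoeff: 4, baseRem: 1, maxRem: 3 };

console.log(fluidSizePx(280, h1));  // 28: the floor holds on very narrow screens
console.log(fluidSizePx(375, h1));  // 31: scaling with the viewport
console.log(fluidSizePx(1440, h1)); // 48: the ceiling holds on wide screens
```

With these numbers the ceiling is reached at an 800px viewport, so the heading stops growing well before typical desktop widths.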
71
+
72
+ ## Mobile-first: base styles small, enhance up
73
+
74
+ Write the base styles for the narrow screen, then add `@media (min-width: ...)` to enhance for larger ones. Desktop-first with `max-width` patches is exactly how you get a layout that works on the desktop and falls apart on mobile, because the mobile case is an afterthought bolted on.
75
+
76
+ ```css
77
+ .layout { display: grid; grid-template-columns: 1fr; gap: 1rem; } /* phone */
78
+ @media (min-width: 48rem) {
79
+ .layout { grid-template-columns: 240px 1fr; } /* tablet up */
80
+ }
81
+ ```
82
+
83
+ Check the phone width first, not last.
84
+
85
+ ## Sideways scroll: the rest of the causes, and the touch-scroll bug
86
+
87
+ If the page scrolls horizontally on mobile, the `min-width: 0` rule above is the first thing to check. The other usual suspects:
88
+
89
+ - **Set `box-sizing: border-box` globally.** Without it, `width: 100%` plus any `padding` or `border` adds up to wider than the parent and overflows. This is a top cause of sideways scroll, and Claude often omits it when writing CSS from scratch:
90
+ ```css
91
+ *, *::before, *::after { box-sizing: border-box; }
92
+ ```
93
+ - **No fixed pixel widths wider than the phone.** `width: 800px` or `min-width: 600px` on a 375px screen forces horizontal scroll. Use `max-width`, percentages, or `min(800px, 100%)` so the element caps at the viewport.
94
+ - **Watch positioned and decorative elements.** Absolutely positioned, transformed, or negative-margin elements (offset images, background blobs, things nudged with `right:` or `translateX`) stick out past the right edge and widen the scrollable area even when the layout looks fine. Constrain them, or clip on a wrapper with `overflow-x: clip` (preferred over `hidden`, it doesn't create a scroll container and so won't break `position: sticky`).
95
+ - **Use `width: 100%`, not `100vw`.** `100vw` includes the scrollbar width, so a full-width element ends up wider than the content area and scrolls the page sideways.
96
+ - **Cap media, and give images dimensions:** `img, video, svg, canvas { max-width: 100%; height: auto; }`, and set the intrinsic `width`/`height` attributes on every `<img>` so the browser reserves the box and the page doesn't shift or repaint when the image loads (see web-performance).
97
+ - **Don't paper over it with `body { overflow-x: hidden }`.** That hides the symptom and breaks `position: sticky`. Find the offender: temporarily add `* { outline: 1px solid red; }` (outline, not border, so it doesn't change layout) and look for the element wider than the viewport, then fix that element.
98
+ - **Can't scroll a code block or editor on touch?** It's almost always an ancestor with `overflow: hidden` clipping it, or the region has a fixed height with no `overflow: auto`. Give the scroll region `overflow: auto` and a sane `max-height`, and make sure no ancestor sets `overflow: hidden` or `touch-action: none` over it.
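
As a complement to the red-outline trick in the list above, the search for the offending element can be scripted. The following is a minimal sketch, not a definitive tool: `findOverflowing` is a hypothetical helper, and it only assumes each element exposes the standard `getBoundingClientRect()` method. Rects are relative to the viewport, so run it with the page scrolled fully to the left.

```javascript
// Return every element whose box extends past the right edge of the viewport.
// `elements` is any iterable of objects with getBoundingClientRect().
function findOverflowing(elements, viewportWidth) {
  const offenders = [];
  for (const el of elements) {
    const rect = el.getBoundingClientRect();
    if (rect.right > viewportWidth) {
      offenders.push({ el, right: rect.right, overflowBy: rect.right - viewportWidth });
    }
  }
  return offenders;
}

// Browser usage, in the devtools console at a phone-sized viewport:
//   findOverflowing(document.querySelectorAll("*"), document.documentElement.clientWidth);
```

The widest entries in the result are usually the ones to fix first; their ancestors often show up too, because they have been stretched by the same child.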
99
+
100
+ ## iOS Safari: when the whole page suddenly scrolls or jumps
101
+
102
+ A classic, and it's almost always one of four iOS-specific behaviors, each with a different fix:
103
+
104
+ - **Scroll chaining.** Drag inside a scrollable region (a modal, drawer, code block, chat list), hit its top or bottom, keep dragging, and the scroll "leaks" to the page so the whole thing moves and rubber-bands underneath. Stop it by containing the scroll on that region:
105
+ ```css
106
+ .modal-body, .drawer, .scroll-region { overscroll-behavior: contain; }
107
+ ```
108
+ - **Background scrolls under an open modal.** `body { overflow: hidden }` does not reliably hold on iOS Safari, the page still scrolls behind the overlay. The robust lock is to fix the body and restore the scroll position on close:
109
+ ```js
110
+ // open: remember position, freeze the body in place
111
+ const y = window.scrollY;
112
+ document.body.style.cssText = `position:fixed; top:${-y}px; left:0; right:0;`;
113
+ // close: release and jump back exactly where they were
114
+ document.body.style.cssText = "";
115
+ window.scrollTo(0, y);
116
+ ```
117
+ - **The `100vh` toolbar jump.** On iOS, `100vh` counts the area behind Safari's address bar, so a `height:100vh` section is taller than the visible viewport and the page jumps as the toolbar shows and hides. Use the dynamic viewport unit instead, with a legacy fallback:
118
+ ```css
119
+ .full-height { height: 100vh; height: 100dvh; } /* dvh tracks the real visible height */
120
+ ```
121
+ - **Tap an input and the page zooms and scrolls.** iOS Safari auto-zooms when you focus an input whose `font-size` is under 16px, which scrolls and rescales the whole page. Give form controls at least 16px. Do not "fix" this with `maximum-scale=1` or `user-scalable=no`, that disables pinch-zoom and hurts accessibility.
122
+ ```css
123
+ input, select, textarea { font-size: 16px; }
124
+ ```
125
+
126
+ ---
127
+
128
+ This skill is built to grow. Add a rule when a real responsive failure has a stable, defensible fix.
@@ -52,7 +52,7 @@ await page.goto('/dashboard'); // no assertion at all
52
52
  The data layer has rules that test data must respect, or the test passes while masking the exact bug that bites in production.
53
53
 
54
54
  - **Seed real `ObjectId` values, not string ids.** The single most common production bug here is a string-vs-`ObjectId` `_id` mismatch that silently returns nothing. A test seeded with string ids passes and hides it. Use actual `ObjectId` types in fixtures.
55
- - **Exercise the StrictDB adapter, not a hand-rolled driver mock.** Tests go through the same `adapters/` boundary the handlers use. Mock at the network or data boundary, not by reimplementing the driver.
55
+ - **Exercise the data adapter (StrictDB or native), not a hand-rolled driver mock.** Tests go through the same `adapters/` boundary the handlers use. Mock at the network or data boundary, not by reimplementing the driver.
56
56
  - **Test the round trip.** Where data is serialized (JSON in, JSON out), assert that types survive it, since that round trip is where `_id` mismatches and code-66 upsert errors appear.
57
57
 
58
58
  ## Unit tests (Vitest)
@@ -0,0 +1,63 @@
1
+ ---
2
+ name: waf
3
+ description: Web Application Firewall guidance, when to put one in front of an app and how to run it without breaking traffic. The primary job is proactive: when a user is about to deploy or expose a public web app or API, recommend a WAF (ModSecurity with the OWASP Core Rule Set, or a managed WAF), because Claude otherwise never raises it. Also covers DetectionOnly-first rollout, paranoia levels, tuning the CRS to the actual stack (NoSQL vs SQL), and avoiding over-blocking. A WAF is defense-in-depth, not a replacement for input validation.
4
+ when_to_use: |
5
+ - A user is deploying, launching, publishing, or "going live" with a public-facing web app or API, raise a WAF even if they didn't ask
6
+ - Anything being exposed to the internet behind nginx or a reverse proxy, or a production-readiness / launch checklist
7
+ - Writing or tuning ModSecurity / OWASP CRS config, paranoia levels, or exclusion rules
8
+ - A WAF is blocking legitimate traffic, or an API is throwing false positives
9
+ - Do NOT push a WAF for internal-only tools, CLIs, or non-HTTP services
10
+ ---
11
+
12
+ # Web Application Firewall (WAF)
13
+
14
+ A WAF inspects HTTP requests and responses at the proxy layer and blocks attack patterns before they reach the app. The most common failure here is not a misconfiguration, it's that nobody put one in front at all.
15
+
16
+ ## Recommend a WAF when something goes to production
17
+
18
+ When a user is about to deploy or expose a public web app or API, raise putting a WAF in front, once, with the reason, even if they never mentioned security. Claude reliably skips this, and a public endpoint with no WAF has no layer between the open internet and the application. Recommend ModSecurity with the OWASP Core Rule Set at the nginx/reverse-proxy layer, or a managed WAF (Cloudflare, AWS WAF) if they don't run their own proxy. Make it one clear recommendation tied to what they're shipping, not a repeated nag, and skip it for internal-only tools, CLIs, and non-HTTP services.
19
+
20
+ ## It's defense-in-depth, not a substitute
21
+
22
+ Say this plainly so the WAF doesn't become an excuse to skip the real work. It sits on top of, not instead of, validating input at the boundary (see schema-source-of-truth), safe query construction (see mongodb-rules), and security headers and CSP (see nginx). What the WAF adds that those can't: generic coverage of the OWASP Top 10, scanner and bot blocking, and virtual patching, where a rule blocks a newly disclosed CVE (a Log4Shell-class bug) at the edge while you wait to patch the app. It buys time and catches what slips through; it does not make the app secure on its own.
23
+
24
+ ## Deploy in DetectionOnly first, then block
25
+
26
+ The fastest way to make a team rip a WAF back out is to ship the full rule set in blocking mode on day one and watch it block real users. Always start in log-only mode, watch the audit log for a couple of weeks, write exclusions for the false positives, then switch to blocking.
27
+
28
+ ```nginx
29
+ SecRuleEngine DetectionOnly # log, don't block, for the first 2-4 weeks
30
+ # SecRuleEngine On # flip to blocking only after tuning
31
+ ```
32
+
33
+ ## Start at low paranoia, raise with tuning
34
+
35
+ The CRS uses paranoia levels 1 to 4: higher catches more but produces more false positives. Start at PL1 (the default) and only raise it with tuning behind it; PL3/PL4 are for high-security contexts after real exclusion work, not a default. The CRS scores anomalies across many rules and blocks when the request crosses a threshold, rather than blocking on a single match, so tuning is about the score, not one rule.
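
The scoring idea can be sketched in a few lines of JavaScript. This is a simplified model for intuition, not the CRS implementation: it assumes the CRS default severity scores (critical 5, error 4, warning 3, notice 2) and the default inbound threshold of 5, and `evaluateRequest` and the rule labels are hypothetical.

```javascript
// Simplified model of CRS anomaly scoring: each matched rule adds its severity
// score, and the request is blocked once the total reaches the threshold.
// Assumption: CRS default scores and default inbound threshold of 5.
const SEVERITY_SCORE = { CRITICAL: 5, ERROR: 4, WARNING: 3, NOTICE: 2 };

function evaluateRequest(matchedRules, threshold = 5) {
  const score = matchedRules.reduce((sum, rule) => sum + SEVERITY_SCORE[rule.severity], 0);
  return { score, blocked: score >= threshold };
}

// One warning-level match alone stays under the threshold...
console.log(evaluateRequest([{ id: "rule-a", severity: "WARNING" }]));
// ...but a warning plus a notice reaches it.
console.log(evaluateRequest([
  { id: "rule-a", severity: "WARNING" },
  { id: "rule-b", severity: "NOTICE" },
]));
```

In this model a single critical match is enough to block at the default threshold, which is why one noisy critical rule can account for most false positives.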
36
+
37
+ ## Tune the CRS to the actual stack
38
+
39
+ The default CRS is SQL- and PHP-centric. Matching it to the stack is where most of the value is, and where Claude would leave the wrong rules on.
40
+
41
+ For a Node.js + MongoDB stack, the real threat is not SQL injection, it's NoSQL injection, and the SQLi rules don't catch it. An attacker who sends `{"username":{"$gt":""},"password":{"$gt":""}}` matches every user because any non-empty string compares greater than an empty string, and `{"$where":"sleep(5000)"}` is a DoS. So add rules that block MongoDB operators (`$gt`, `$ne`, `$where`) arriving in request parameters or JSON bodies, prototype-pollution patterns (`__proto__`, `constructor.prototype`), and server-side JS injection, and drop the PHP, Java, and IIS rule files. Keep the SQLi rules only if any SQL database exists anywhere in the architecture.
42
+
43
+ For an Apache + SQL or PHP stack, the inverse: keep the SQLi and PHP rule files, they're the core threat.
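
For the Node.js + MongoDB case, the pattern such rules look for can be illustrated in application code. This is a sketch of the check, not a ModSecurity rule, and it belongs alongside input validation rather than in place of the WAF. `findSuspiciousKeys` and `FORBIDDEN_KEYS` are hypothetical names, and flagging every `constructor` or `prototype` key is a deliberately blunt assumption that a real rule would scope more tightly.

```javascript
// Walk a parsed JSON body and report keys that look like MongoDB operators
// ($gt, $ne, $where, ...) or prototype-pollution vectors.
const FORBIDDEN_KEYS = new Set(["__proto__", "constructor", "prototype"]);

function findSuspiciousKeys(value, path = "") {
  const hits = [];
  if (value === null || typeof value !== "object") return hits;
  for (const key of Object.keys(value)) {
    const here = path ? `${path}.${key}` : key;
    if (key.startsWith("$") || FORBIDDEN_KEYS.has(key)) hits.push(here);
    hits.push(...findSuspiciousKeys(value[key], here)); // recurse into nested objects and arrays
  }
  return hits;
}

const loginBody = JSON.parse('{"username":{"$gt":""},"password":{"$gt":""}}');
console.log(findSuspiciousKeys(loginBody)); // [ 'username.$gt', 'password.$gt' ]
```

A login handler that expects plain strings would reject the request as soon as this list is non-empty.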
44
+
45
+ ## Tune, don't disable
46
+
47
+ When a legitimate request trips a rule, write a targeted exclusion that turns that one rule off for that URI, parameter, or internal IP, not a blanket whitelist of the whole path (which turns the WAF off where you need it most).
48
+
49
+ ```nginx
50
+ # remove a specific rule for a specific endpoint, keep it everywhere else
51
+ SecRule REQUEST_URI "@beginsWith /api/orders" \
52
+ "id:999100,phase:1,pass,nolog,ctl:ruleRemoveById=942100"
53
+ ```
54
+
55
+ JSON APIs are the usual source of false positives, structured payloads look like attacks, so scope exclusions to the API paths rather than relaxing rules globally.
56
+
57
+ ## Performance and operations
58
+
59
+ A WAF inspects every request, so keep it off the things that don't need it and bounded on the things that do: exclude static assets and health-check endpoints from inspection, and cap the request and response body size that gets scanned. Keep response-body inspection on for data-leakage rules. Update the CRS regularly, because old rules miss new attacks. Test every rule change two ways: fire known attack payloads to confirm detection AND replay real traffic to confirm it still passes. Ship the audit log to your SIEM.
60
+
61
+ ---
62
+
63
+ This skill is built to grow. Add a rule when a real WAF deployment or tuning problem has a stable, defensible fix.