pompelmi 1.5.0 → 1.7.0

@@ -0,0 +1,190 @@
1
+ # Docker Compose — Production Setup
2
+
3
+ Production-grade docker-compose configuration for running pompelmi with a ClamAV sidecar. This setup includes health checks, restart policy, persistent virus definition storage, and environment variable configuration.
4
+
5
+ ---
6
+
7
+ ## Complete `docker-compose.yml`
8
+
9
+ ```yaml
10
+ services:
11
+   app:
12
+     build: .
13
+     ports:
14
+       - "3000:3000"
15
+     environment:
16
+       NODE_ENV: production
17
+       CLAMAV_HOST: clamav
18
+       CLAMAV_PORT: "3310"
19
+       CLAMAV_TIMEOUT: "30000"
20
+     depends_on:
21
+       clamav:
22
+         condition: service_healthy
23
+     restart: unless-stopped
24
+     volumes:
25
+       - uploads:/app/uploads
26
+
27
+   clamav:
28
+     image: clamav/clamav:stable
29
+     ports:
30
+       - "3310:3310"
31
+     restart: unless-stopped
32
+     volumes:
33
+       - clamav_db:/var/lib/clamav # persist virus definitions across restarts
34
+     healthcheck:
35
+       test: ["CMD", "clamdcheck"]
36
+       interval: 30s
37
+       timeout: 10s
38
+       retries: 5
39
+       start_period: 120s # first start downloads ~300 MB of definitions
40
+
41
+ volumes:
42
+   clamav_db:
43
+   uploads:
44
+ ```
45
+
46
+ ---
47
+
48
+ ## Application code
49
+
50
+ Read options from environment variables so the same image works in all environments:
51
+
52
+ ```js
53
+ const { scan, scanBuffer, scanStream, Verdict } = require('pompelmi');
54
+
55
+ const SCAN_OPTS = {
56
+   host: process.env.CLAMAV_HOST || '127.0.0.1',
57
+   port: Number(process.env.CLAMAV_PORT) || 3310,
58
+   timeout: Number(process.env.CLAMAV_TIMEOUT) || 15_000,
59
+ };
60
+
61
+ const result = await scan('/uploads/file.pdf', SCAN_OPTS);
62
+ ```
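The `Number(x) || default` pattern above silently falls back to the default when a variable is malformed (for example `CLAMAV_PORT=33l0`). A minimal sketch of a stricter parser follows; `loadScanOptions` is a hypothetical helper, not part of pompelmi, and the defaults simply mirror the ones shown above.

```javascript
// Hypothetical helper (not a pompelmi API): parses the CLAMAV_* variables
// from the compose file and fails fast on malformed values.
function loadScanOptions(env = process.env) {
  const parsePositiveInt = (name, fallback) => {
    const raw = env[name];
    if (raw === undefined || raw === '') return fallback; // unset: use default
    const value = Number(raw);
    if (!Number.isInteger(value) || value <= 0) {
      throw new Error(`${name} must be a positive integer, got "${raw}"`);
    }
    return value;
  };

  return {
    host: env.CLAMAV_HOST || '127.0.0.1',
    port: parsePositiveInt('CLAMAV_PORT', 3310),
    timeout: parsePositiveInt('CLAMAV_TIMEOUT', 15_000),
  };
}
```

Calling `loadScanOptions()` once at startup surfaces a bad deployment value immediately instead of at the first scan.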
63
+
64
+ ---
65
+
66
+ ## Dockerfile example
67
+
68
+ ```dockerfile
69
+ FROM node:22-alpine
70
+
71
+ WORKDIR /app
72
+
73
+ COPY package*.json ./
74
+ RUN npm ci --omit=dev
75
+
76
+ COPY . .
77
+
78
+ RUN mkdir -p uploads
79
+
80
+ EXPOSE 3000
81
+ CMD ["node", "src/server.js"]
82
+ ```
83
+
84
+ Note: `clamscan` does **not** need to be installed in the application container when using TCP mode. The ClamAV sidecar handles all scanning.
85
+
86
+ ---
87
+
88
+ ## Health check explanation
89
+
90
+ `clamdcheck` is a shell script bundled inside the `clamav/clamav:stable` image. It sends a `PING` to the local clamd socket and checks the response. The `start_period: 120s` gives clamd time to download virus definitions on first start before health checks begin counting failures.
91
+
92
+ If you need to check from outside the container, you can also use TCP:
93
+
94
+ ```bash
95
+ echo -n "PING" | nc -q1 localhost 3310
96
+ # expected response: PONG
97
+ ```
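The same check can be done from Node.js, for example in an application readiness probe. This is a sketch under assumptions: `pingClamd` is a hypothetical helper (not a pompelmi API), the defaults are illustrative, and it uses clamd's null-terminated command form (`zPING\0`), to which clamd replies `PONG`.

```javascript
const net = require('node:net');

// Sends clamd's null-terminated PING command and resolves true on PONG.
// Resolves false on timeout, connection error, or early close.
function pingClamd({ host = '127.0.0.1', port = 3310, timeout = 2000 } = {}) {
  return new Promise((resolve) => {
    const socket = net.connect({ host, port });
    let reply = '';
    let settled = false;

    const finish = (ok) => {
      if (settled) return; // first outcome wins
      settled = true;
      socket.destroy();
      resolve(ok);
    };

    socket.setTimeout(timeout, () => finish(false));
    socket.on('connect', () => socket.write('zPING\0'));
    socket.on('data', (chunk) => {
      reply += chunk.toString('latin1');
      if (reply.includes('PONG')) finish(true);
    });
    socket.on('error', () => finish(false));
    socket.on('close', () => finish(false));
  });
}
```

Inside the Compose network the call would be `pingClamd({ host: 'clamav' })`; from the host, the default `127.0.0.1` works only while port 3310 is published.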
98
+
99
+ ---
100
+
101
+ ## `depends_on` with health check
102
+
103
+ ```yaml
104
+ depends_on:
105
+   clamav:
106
+     condition: service_healthy
107
+ ```
108
+
109
+ This prevents the application container from starting until clamd passes its health check. Without this, your app may start and immediately fail its first scan with "connection refused."
110
+
111
+ ---
112
+
113
+ ## Scaling considerations
114
+
115
+ ### Vertical scaling
116
+
117
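+ Each scan runs on a single clamd thread, so a larger container does not speed up an individual scan; extra CPU only helps when several scans run concurrently. For high-throughput use cases, run multiple clamd containers behind a load balancer rather than trying to parallelise within one instance.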
118
+
119
+ ### Horizontal scaling
120
+
121
+ ```yaml
122
+ services:
123
+   clamav:
124
+     image: clamav/clamav:stable
125
+     deploy:
126
+       replicas: 3
127
+     volumes:
128
+       - clamav_db:/var/lib/clamav
129
+ ```
130
+
131
+ Point your application at a load balancer in front of the clamd replicas. Note: each clamd replica downloads its own virus database on startup unless you share the volume (which requires care with concurrent freshclam writes).
132
+
133
+ ### Alternative: one clamd per app instance
134
+
135
+ For simpler setups, co-deploy one clamd container with each app container. Each pair shares a `clamav_db` volume scoped to the pair.
136
+
137
+ ---
138
+
139
+ ## Resource limits
140
+
141
+ ClamAV can use significant memory when unpacking large archives:
142
+
143
+ ```yaml
144
+ clamav:
145
+   image: clamav/clamav:stable
146
+   deploy:
147
+     resources:
148
+       limits:
149
+         memory: 1g
150
+         cpus: '1.0'
151
+ ```
152
+
153
+ Tune based on your expected file sizes. Scanning uncompressed archives >100 MB may require more.
154
+
155
+ ---
156
+
157
+ ## Keeping virus definitions fresh
158
+
159
+ The `clamav/clamav:stable` image runs `freshclam` on a schedule automatically. To check the installed definitions against the mirror with detailed output (this also downloads any pending update):
160
+
161
+ ```bash
162
+ docker compose exec clamav freshclam --verbose
163
+ ```
164
+
165
+ Or trigger a manual update:
166
+
167
+ ```bash
168
+ docker compose exec clamav freshclam
169
+ ```
170
+
171
+ To refresh definitions without restarting the app container, restart only the clamav container (freshclam updates on startup). Connections are refused while clamd restarts and reloads its database, so expect a short scanning outage:
172
+
173
+ ```bash
174
+ docker compose restart clamav
175
+ ```
176
+
177
+ The named volume preserves the downloaded definitions across restarts, so only incremental updates are downloaded after the first start.
178
+
179
+ ---
180
+
181
+ ## Production checklist
182
+
183
+ - [ ] `restart: unless-stopped` on both services
184
+ - [ ] `healthcheck` configured on clamav with `start_period` ≥ 90s
185
+ - [ ] `depends_on: condition: service_healthy` on app
186
+ - [ ] `CLAMAV_HOST`, `CLAMAV_PORT`, `CLAMAV_TIMEOUT` via environment variables
187
+ - [ ] Named volume for `clamav_db` (not anonymous)
188
+ - [ ] File size limits in your HTTP server (before the scan is reached)
189
+ - [ ] Upload directory in a named volume (survives container restarts)
190
+ - [ ] Log aggregation: capture `app` and `clamav` container logs
@@ -0,0 +1,178 @@
1
+ # Docker Setup
2
+
3
+ Run ClamAV as a Docker sidecar so your application host requires no local ClamAV installation. pompelmi's TCP mode streams files directly to the clamd daemon — the API is identical to local mode.
4
+
5
+ ---
6
+
7
+ ## Why a Docker sidecar?
8
+
9
+ - **No local install** — the application container stays lean; ClamAV and its virus definitions live in a dedicated sidecar.
10
+ - **Always up-to-date definitions** — the official `clamav/clamav:stable` image runs `freshclam` on startup and periodically refreshes the database.
11
+ - **Isolation** — ClamAV runs in its own process/container; a crash or restart does not affect your application.
12
+ - **Consistent environments** — same image in development, staging, and production.
13
+
14
+ ---
15
+
16
+ ## docker-compose.yml
17
+
18
+ ```yaml
19
+ services:
20
+   app:
21
+     build: .
22
+     ports:
23
+       - "3000:3000"
24
+     environment:
25
+       CLAMAV_HOST: clamav
26
+       CLAMAV_PORT: 3310
27
+     depends_on:
28
+       clamav:
29
+         condition: service_healthy
30
+
31
+   clamav:
32
+     image: clamav/clamav:stable
33
+     ports:
34
+       - "3310:3310"
35
+     restart: unless-stopped
36
+     volumes:
37
+       - clamav_db:/var/lib/clamav # persist virus definitions across restarts
38
+     healthcheck:
39
+       test: ["CMD", "clamdcheck"] # bundled check script in clamav/clamav image
40
+       interval: 30s
41
+       timeout: 10s
42
+       retries: 5
43
+       start_period: 120s # freshclam download takes time on first boot
44
+
45
+ volumes:
46
+   clamav_db:
47
+ ```
48
+
49
+ > **First boot:** The image downloads the full virus database (~300 MB) before clamd starts accepting connections. `start_period: 120s` gives it time. On subsequent restarts the volume already holds the database, so the download is skipped and startup is much faster.
50
+
51
+ ---
52
+
53
+ ## Pointing pompelmi at clamd
54
+
55
+ Pass `host` and `port` to any pompelmi function. No other code changes are needed.
56
+
57
+ ```js
58
+ const { scan, scanBuffer, scanStream, scanDirectory, Verdict } = require('pompelmi');
59
+
60
+ const CLAMAV_OPTS = {
61
+   host: process.env.CLAMAV_HOST || '127.0.0.1',
62
+   port: Number(process.env.CLAMAV_PORT) || 3310,
63
+   timeout: 30_000, // ms — increase for large files
64
+ };
65
+
66
+ // scan a file by path
67
+ const fileResult = await scan('/uploads/report.pdf', CLAMAV_OPTS);
68
+
69
+ // scan an in-memory Buffer (multer memoryStorage)
70
+ const bufferResult = await scanBuffer(req.file.buffer, CLAMAV_OPTS);
71
+
72
+ // scan a Readable stream (S3, HTTP, pipes)
73
+ const stream = s3.getObject({ Bucket, Key }).createReadStream();
74
+ const streamResult = await scanStream(stream, CLAMAV_OPTS);
75
+
76
+ // recursively scan a directory
77
+ const results = await scanDirectory('/uploads', CLAMAV_OPTS);
78
+ ```
79
+
80
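+ `scan()`, `scanBuffer()`, and `scanStream()` resolve to one of the `Verdict.Clean`, `Verdict.Malicious`, or `Verdict.ScanError` Symbols; `scanDirectory()` resolves to a results object that includes an `errors` array for per-file failures. No code changes are required when switching between local and TCP mode.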
81
+
82
+ ---
83
+
84
+ ## Configuring timeout for large files
85
+
86
+ The `timeout` option sets the socket idle timeout in milliseconds (default: 15 000 ms). Increase it when scanning large archives or slow network links.
87
+
88
+ ```js
89
+ const result = await scan('/uploads/large-archive.zip', {
90
+   host: 'clamav',
91
+   port: 3310,
92
+   timeout: 120_000, // 2 minutes
93
+ });
94
+ ```
95
+
96
+ If clamd takes longer than `timeout` ms without sending data, pompelmi rejects with:
97
+
98
+ ```
99
+ clamd connection timed out after 120000ms
100
+ ```
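When file sizes vary widely, one option is to start with a short timeout and retry with a longer one only after the timeout rejection shown above. The sketch below is an illustration, not pompelmi behaviour: `scanWithEscalatingTimeout` and its timeout ladder are hypothetical, the scan function is injected so the example runs on its own, and it assumes the rejection message keeps the `timed out after Nms` wording.

```javascript
// Hypothetical wrapper: retries only on the timeout rejection, moving to the
// next (longer) timeout each time. Any other rejection is rethrown at once.
// Pass pompelmi's `scan` (or scanBuffer/scanStream) as `scanFn`.
async function scanWithEscalatingTimeout(
  scanFn,
  target,
  opts,
  timeouts = [15_000, 60_000, 120_000]
) {
  let lastError;
  for (const timeout of timeouts) {
    try {
      return await scanFn(target, { ...opts, timeout });
    } catch (err) {
      const message = String(err && err.message);
      if (!/timed out after \d+ms/.test(message)) throw err;
      lastError = err; // timeout: try again with the next, longer value
    }
  }
  throw lastError; // every timeout in the ladder was exceeded
}
```

Note that a Readable stream cannot be replayed, so this pattern suits file paths and Buffers rather than `scanStream()`.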
101
+
102
+ ---
103
+
104
+ ## Production tips
105
+
106
+ ### Health checks
107
+
108
+ The `healthcheck` in the example above uses the `clamdcheck` script bundled in the official image. Your application container uses `depends_on: condition: service_healthy` so it only starts once clamd is ready.
109
+
110
+ ### Restart policy
111
+
112
+ ```yaml
113
+ restart: unless-stopped
114
+ ```
115
+
116
+ This ensures clamd comes back up after host reboots or OOM kills without manual intervention.
117
+
118
+ ### Persisting virus definitions
119
+
120
+ The named volume `clamav_db` mounts to `/var/lib/clamav` inside the container. This means:
121
+
122
+ - First start downloads definitions once (~300 MB).
123
+ - Subsequent restarts reuse the cache; `freshclam` only downloads incremental updates.
124
+ - The volume survives `docker compose down` (use `docker compose down -v` to wipe it).
125
+
126
+ ### Resource limits (optional)
127
+
128
+ ClamAV can be memory-hungry when scanning large ZIP archives. Set a limit if needed:
129
+
130
+ ```yaml
131
+ clamav:
132
+   image: clamav/clamav:stable
133
+   deploy:
134
+     resources:
135
+       limits:
136
+         memory: 1g
137
+ ```
138
+
139
+ ---
140
+
141
+ ## Troubleshooting
142
+
143
+ ### clamd not ready on startup
144
+
145
+ **Symptom:** Application starts before clamd is accepting connections; first scan fails with connection refused.
146
+
147
+ **Fix:** Add `depends_on` with `condition: service_healthy` (see example above) and ensure the `healthcheck` is configured on the clamav service. The `start_period` must be long enough for the initial database download.
148
+
149
+ ### Connection refused
150
+
151
+ **Symptom:** `ECONNREFUSED 127.0.0.1:3310`
152
+
153
+ **Causes and fixes:**
154
+
155
+ 1. clamd container is not running — `docker compose ps` to check.
156
+ 2. Wrong host — if the app is inside Docker, use the service name (`clamav`), not `127.0.0.1`.
157
+ 3. Port not published: if the app runs on the host rather than inside the Compose network, verify the `ports` mapping in `docker-compose.yml`.
158
+ 4. clamd is still loading the virus database — add the `healthcheck` and `depends_on` described above.
159
+
160
+ ### Timeout errors
161
+
162
+ **Symptom:** `clamd connection timed out after 15000ms`
163
+
164
+ **Fixes:**
165
+
166
+ 1. Increase `timeout` in the options object (e.g. `timeout: 60_000`).
167
+ 2. Check clamd resource limits — if it is CPU- or memory-constrained it will scan slowly.
168
+ 3. Check network latency between app and clamav containers.
169
+
170
+ ### Virus definitions out of date
171
+
172
+ The official image runs `freshclam` periodically. If you see scan errors mentioning outdated definitions, exec into the container and run it manually:
173
+
174
+ ```bash
175
+ docker compose exec clamav freshclam
176
+ ```
177
+
178
+ Or restart the container; `freshclam` runs at startup.
@@ -0,0 +1,242 @@
1
+ # Error Handling
2
+
3
+ pompelmi has two distinct failure modes that require different handling: **rejected Promises** (the function threw) and **`Verdict.ScanError`** (the scan completed but could not determine safety). Understanding the difference is critical to building a secure upload pipeline.
4
+
5
+ ---
6
+
7
+ ## The two failure modes
8
+
9
+ ### 1. Rejected Promise (the scan function threw)
10
+
11
+ `scan()`, `scanBuffer()`, `scanStream()`, and `scanDirectory()` reject when something prevents the scan from running at all — not when the scan completes and finds a problem.
12
+
13
+ Common rejection causes:
14
+
15
+ | Error message | Cause |
16
+ |---------------|-------|
17
+ | `filePath must be a string` | Wrong argument type |
18
+ | `File not found: <path>` | File does not exist |
19
+ | `ENOENT` | `clamscan` not installed or not in PATH |
20
+ | `Unexpected exit code: N` | ClamAV internal error |
21
+ | `Process killed by signal: SIGTERM` | Process killed (OOM, timeout) |
22
+ | `clamd connection timed out after Nms` | TCP timeout exceeded |
23
+ | `buffer must be a Buffer` | Wrong argument to `scanBuffer()` |
24
+ | `stream must be a Readable` | Wrong argument to `scanStream()` |
25
+ | `dirPath must be a string` | Wrong argument to `scanDirectory()` |
26
+ | `Directory not found: <path>` | Directory does not exist |
27
+
28
+ These are programming errors or infrastructure failures. Handle them with `try/catch`.
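One way to act on the table is to group rejections by message before deciding how to respond. The helper below is a hypothetical sketch, not part of pompelmi; it assumes the message texts listed above stay stable, which the documentation does not promise, and the category names are invented for illustration.

```javascript
// Hypothetical helper: buckets a rejection using the documented messages.
// Anything unrecognised is treated as an infrastructure failure.
function classifyRejection(err) {
  const message = String(err && err.message);
  if (/must be a (string|Buffer|Readable)/.test(message)) return 'argument';
  if (/^(File|Directory) not found: /.test(message)) return 'not_found';
  return 'infrastructure'; // ENOENT, exit codes, signals, clamd timeouts
}
```

An `'argument'` result points at a bug in the calling code, while `'infrastructure'` is the category worth alerting on.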
29
+
30
+ ### 2. `Verdict.ScanError` (scan completed, result unknown)
31
+
32
+ `Verdict.ScanError` resolves (does not throw) and indicates ClamAV ran but could not produce a clean/malicious verdict. Common causes: encrypted archives, corrupt files, permission errors, I/O issues.
33
+
34
+ ---
35
+
36
+ ## The secure default: reject on both
37
+
38
+ The safest policy: any outcome other than `Verdict.Clean` results in rejection.
39
+
40
+ ```js
41
+ const { scan, Verdict } = require('pompelmi');
42
+ const fs = require('fs');
43
+
44
+ async function scanAndAccept(filePath) {
45
+   try {
46
+     const result = await scan(filePath, { host: 'clamav', port: 3310 });
47
+
48
+     if (result === Verdict.Malicious) {
49
+       fs.unlinkSync(filePath);
50
+       throw new Error('Malicious file rejected.');
51
+     }
52
+
53
+     if (result === Verdict.ScanError) {
54
+       fs.unlinkSync(filePath);
55
+       throw new Error('Scan incomplete — file rejected as precaution.');
56
+     }
57
+
58
+     return filePath; // Verdict.Clean
59
+   } catch (err) {
60
+     // Covers both scan() rejections and our own thrown Errors above.
61
+     // Delete the file defensively if it still exists.
62
+     try { fs.unlinkSync(filePath); } catch {}
63
+     throw err;
64
+   }
65
+ }
66
+ ```
67
+
68
+ ---
69
+
70
+ ## When to retry `ScanError`
71
+
72
+ A `ScanError` caused by a transient network blip or a momentary clamd overload is worth one retry. A `ScanError` caused by a corrupt file or encrypted archive will always return `ScanError` — retrying wastes time.
73
+
74
+ ```js
75
+ async function scanWithRetry(filePath, opts, retries = 1) {
76
+   for (let attempt = 0; attempt <= retries; attempt++) {
77
+     try {
78
+       const result = await scan(filePath, opts);
79
+       if (result !== Verdict.ScanError || attempt === retries) {
80
+         return result;
81
+       }
82
+       // ScanError on non-final attempt — wait briefly and retry
83
+       await new Promise(r => setTimeout(r, 500));
84
+     } catch (err) {
85
+       if (attempt === retries) throw err;
86
+     }
87
+   }
88
+ }
89
+ ```
90
+
91
+ Do not retry `Verdict.Malicious` — the signature match is deterministic.
92
+
93
+ ---
94
+
95
+ ## Cleanup with `finally`
96
+
97
+ Ensure temp files are always deleted regardless of scan outcome:
98
+
99
+ ```js
100
+ const os = require('os');
101
+ const fs = require('fs');
102
+ const path = require('path');
103
+ const { scan, Verdict } = require('pompelmi');
104
+
105
+ async function scanBuffer_manual(buffer) {
106
+   const tmpPath = path.join(os.tmpdir(), `scan-${process.pid}-${Date.now()}-${Math.random().toString(36).slice(2)}.tmp`); // unique per call, so concurrent scans never share a path
107
+   fs.writeFileSync(tmpPath, buffer);
108
+
109
+   try {
110
+     return await scan(tmpPath);
111
+   } finally {
112
+     try { fs.unlinkSync(tmpPath); } catch {}
113
+   }
114
+ }
115
+ ```
116
+
117
+ `scanBuffer()` handles this `finally` pattern internally in local mode — you don't need to replicate it when using the API directly.
118
+
119
+ ---
120
+
121
+ ## Express error handling pattern
122
+
123
+ ```js
124
+ const express = require('express');
125
+ const multer = require('multer');
126
+ const fs = require('fs');
127
+ const { scan, Verdict } = require('pompelmi');
128
+
129
+ const app = express();
130
+ const upload = multer({ dest: './uploads', limits: { fileSize: 10 * 1024 * 1024 } });
131
+
132
+ app.post('/upload', upload.single('file'), async (req, res, next) => {
133
+   if (!req.file) return res.status(400).json({ error: 'No file uploaded.' });
134
+
135
+   try {
136
+     const result = await scan(req.file.path, { host: 'clamav', port: 3310 });
137
+
138
+     if (result === Verdict.Malicious) {
139
+       fs.unlinkSync(req.file.path);
140
+       return res.status(422).json({ error: 'Malicious file rejected.' });
141
+     }
142
+
143
+     if (result === Verdict.ScanError) {
144
+       fs.unlinkSync(req.file.path);
145
+       return res.status(422).json({ error: 'Scan failed — file rejected.' });
146
+     }
147
+
148
+     return res.json({ ok: true, filename: req.file.filename });
149
+   } catch (err) {
150
+     try { fs.unlinkSync(req.file.path); } catch {}
151
+     next(err); // forward to Express error middleware
152
+   }
153
+ });
154
+
155
+ // Global error handler
156
+ app.use((err, req, res, next) => {
157
+   console.error(err);
158
+   res.status(500).json({ error: 'Internal scan error.' });
159
+ });
160
+ ```
161
+
162
+ ---
163
+
164
+ ## Logging best practices
165
+
166
+ Log rejections with enough context to investigate later — but never log file contents.
167
+
168
+ ```js
169
+ const logger = require('./logger'); // pino, winston, etc.
170
+
171
+ if (result === Verdict.Malicious) {
172
+   logger.warn({
173
+     event: 'malware_detected',
174
+     filePath,
175
+     originalname: req.file.originalname,
176
+     mimetype: req.file.mimetype,
177
+     size: req.file.size,
178
+     userId: req.user?.id,
179
+     ip: req.ip,
180
+   });
181
+ }
182
+ ```
183
+
184
+ For `ScanError`:
185
+
186
+ ```js
187
+ if (result === Verdict.ScanError) {
188
+   logger.warn({
189
+     event: 'scan_error',
190
+     filePath,
191
+     mimetype: req.file.mimetype,
192
+     size: req.file.size,
193
+   });
194
+ }
195
+ ```
196
+
197
+ For scan function rejections:
198
+
199
+ ```js
200
+ } catch (err) {
201
+   logger.error({
202
+     event: 'scan_threw',
203
+     message: err.message,
204
+     filePath,
205
+   });
206
+ }
207
+ ```
208
+
209
+ ---
210
+
211
+ ## HTTP status code conventions
212
+
213
+ | Situation | Recommended status |
214
+ |-----------|-------------------|
215
+ | No file in request | `400 Bad Request` |
216
+ | Wrong argument (programming error) | `400 Bad Request` |
217
+ | `Verdict.Malicious` | `422 Unprocessable Entity` |
218
+ | `Verdict.ScanError` (reject policy) | `422 Unprocessable Entity` |
219
+ | `scan()` throws (infra error) | `500 Internal Server Error` |
220
+ | File too large (pre-scan) | `413 Content Too Large` |
221
+ | `Verdict.Clean` | `200 OK` / `201 Created` |
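The verdict rows of the table can be kept in one place as a small mapping function. This is a sketch only: `statusForVerdict` is a hypothetical helper, and the `Verdict` object below is a stand-in made of plain Symbols so the example runs without pompelmi installed; in an application, use the `Verdict` exported by pompelmi instead.

```javascript
// Stand-in for pompelmi's Verdict export (assumption for a runnable example).
const Verdict = {
  Clean: Symbol('Clean'),
  Malicious: Symbol('Malicious'),
  ScanError: Symbol('ScanError'),
};

// Hypothetical helper: maps a verdict to the status recommended in the table.
function statusForVerdict(verdict, { created = false } = {}) {
  switch (verdict) {
    case Verdict.Clean:
      return created ? 201 : 200;
    case Verdict.Malicious:
    case Verdict.ScanError: // reject policy: unknown is treated like malicious
      return 422;
    default:
      return 500; // unrecognised value: fail closed as an internal error
  }
}
```

Rejections from `scan()` never reach this function; they stay on the `catch` path and map to `500` as the table describes.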
222
+
223
+ ---
224
+
225
+ ## `scanDirectory()` error handling
226
+
227
+ Per-file failures in `scanDirectory()` go into the `errors` array — the function itself only rejects on argument errors or missing directory.
228
+
229
+ ```js
230
+ const { scanDirectory } = require('pompelmi');
231
+
232
+ try {
233
+   const results = await scanDirectory('/uploads');
234
+   if (results.errors.length > 0) {
235
+     logger.warn({ event: 'scan_errors', paths: results.errors });
236
+     // Decide: reject the whole batch, or only reject the errored files
237
+   }
238
+ } catch (err) {
239
+   // dirPath not a string, or directory not found
240
+   logger.error({ event: 'scan_threw', message: err.message });
241
+ }
242
+ ```