pompelmi 1.5.0 → 1.6.0

@@ -0,0 +1,179 @@
1
+ # Local vs TCP Mode
2
+
3
+ pompelmi supports two scanning modes. Which one you use is controlled entirely by the options you pass — no configuration files, no environment flags in pompelmi itself.
4
+
5
+ ---
6
+
7
+ ## Summary
8
+
9
+ | | Local mode | TCP mode |
10
+ |---|---|---|
11
+ | **How it works** | Spawns `clamscan` as a child process | Streams file to `clamd` daemon over TCP |
12
+ | **ClamAV requirement** | `clamscan` binary in PATH | Running `clamd` daemon reachable over TCP |
13
+ | **Startup time** | Slow — loads virus DB each invocation | Fast — daemon keeps DB in memory |
14
+ | **Throughput** | Low — one process per scan | High — persistent connection |
15
+ | **Disk I/O** | Reads file from disk | Reads file from disk (or buffer/stream with no disk) |
16
+ | **Docker** | Requires ClamAV in app container | Use ClamAV as a sidecar |
17
+ | **Zero-copy scan** | Not possible | `scanBuffer()` and `scanStream()` with no disk I/O |
18
+
19
+ ---
20
+
21
+ ## Enabling local mode
22
+
23
+ Do not pass `host` or `port`. pompelmi spawns `clamscan --no-summary <filePath>`:
24
+
25
+ ```js
26
+ const { scan, Verdict } = require('pompelmi');
27
+
28
+ // Local mode — no options, or empty options
29
+ const result = await scan('/uploads/file.pdf');
30
+ const sameResult = await scan('/uploads/file.pdf', {}); // equivalent: empty options
31
+ ```
32
+
33
+ `clamscan` must be in `PATH`. Install it with:
34
+
35
+ ```bash
36
+ # macOS
37
+ brew install clamav && freshclam
38
+
39
+ # Linux
40
+ sudo apt-get install -y clamav && sudo freshclam
41
+ ```
42
+
43
+ ---
44
+
45
+ ## Enabling TCP mode
46
+
47
+ Pass `host` (and optionally `port`) to any scan function:
48
+
49
+ ```js
50
+ const result = await scan('/uploads/file.pdf', {
51
+ host: '127.0.0.1',
52
+ port: 3310, // default 3310
53
+ timeout: 30_000, // socket idle timeout ms, default 15000
54
+ });
55
+ ```
56
+
57
+ Setting `host` switches all four functions — `scan`, `scanBuffer`, `scanStream`, `scanDirectory` — to TCP mode.
58
+
59
+ ---
60
+
61
+ ## How local mode works
62
+
63
+ ```
64
+ pompelmi OS
65
+ │ │
66
+ ├── spawn clamscan ─┤
67
+ │ ├── load virus DB (~300 MB) into memory
68
+ │ ├── scan file
69
+ │ ├── exit 0 / 1 / 2
70
+ ├── read exit code ─┘
71
+
72
+ └── resolve Verdict
73
+ ```
74
+
75
+ Each scan call:
76
+ 1. Spawns a new `clamscan` process
77
+ 2. `clamscan` loads the full virus database into memory
78
+ 3. Scans the file
79
+ 4. Exits with code 0 (clean), 1 (malicious), or 2 (error)
80
+ 5. pompelmi maps the exit code to a Verdict Symbol
81
+
82
+ **Typical latency:** 400–800 ms per scan (dominated by database load time).
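
The exit-code mapping in step 5 can be sketched as a small pure function. This is an illustration only, not pompelmi's actual source: the `Verdict` object below is a hypothetical stand-in for the symbols exported by the package, and `verdictFromExitCode` is an assumed name.

```javascript
// Hypothetical stand-ins for pompelmi's Verdict symbols (assumed shape only).
const Verdict = Object.freeze({
  Clean: Symbol('Clean'),
  Malicious: Symbol('Malicious'),
  ScanError: Symbol('ScanError'),
});

// Map a clamscan exit code to a verdict: 0 = clean, 1 = malicious, 2 = error.
function verdictFromExitCode(code) {
  switch (code) {
    case 0:
      return Verdict.Clean;
    case 1:
      return Verdict.Malicious;
    default:
      // Exit code 2, or anything unexpected, is treated as a scan error.
      return Verdict.ScanError;
  }
}
```

Treating every unknown exit code as a scan error keeps the mapping fail-closed: a file is only reported clean when `clamscan` explicitly says so.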
83
+
84
+ ---
85
+
86
+ ## How TCP mode works
87
+
88
+ ```
89
+ pompelmi clamd daemon
90
+ │ │
91
+ ├── TCP connect ───►│ (keep-alive daemon)
92
+ ├── INSTREAM ───────┤
93
+ ├── stream chunks ──┤ scan in memory
94
+ │ ├── "stream: OK" / "stream: X FOUND"
95
+ ├── read response ──┘
96
+
97
+ └── resolve Verdict
98
+ ```
99
+
100
+ Each scan call:
101
+ 1. Opens a TCP connection to clamd
102
+ 2. Sends `zINSTREAM\0` command
103
+ 3. Streams the file in 64 KB chunks, each prefixed with a 4-byte big-endian length header
104
+ 4. Sends 4 zero bytes to signal end of stream
105
+ 5. Reads the response line
106
+ 6. Maps response to Verdict Symbol
107
+
108
+ **Typical latency:** 5–50 ms per scan (clamd keeps DB in memory; network is the bottleneck).
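
The framing in steps 2 to 5 can be sketched with plain Node.js `Buffer` operations. This is a minimal illustration of the wire format described above, not pompelmi's implementation; `frameInstream` and `parseInstreamReply` are hypothetical names, and a real client would write the chunks to a socket incrementally instead of concatenating them.

```javascript
const CHUNK_SIZE = 64 * 1024; // 64 KB, as described in the steps above

// Build the byte sequence for one INSTREAM exchange:
// command, then [4-byte big-endian length][chunk] pairs, then 4 zero bytes.
function frameInstream(data, chunkSize = CHUNK_SIZE) {
  const parts = [Buffer.from('zINSTREAM\0', 'latin1')];
  for (let offset = 0; offset < data.length; offset += chunkSize) {
    const chunk = data.subarray(offset, offset + chunkSize);
    const header = Buffer.alloc(4);
    header.writeUInt32BE(chunk.length, 0);
    parts.push(header, chunk);
  }
  parts.push(Buffer.alloc(4)); // zero-length chunk marks end of stream
  return Buffer.concat(parts);
}

// Interpret the reply line, e.g. "stream: OK" or "stream: X FOUND".
function parseInstreamReply(line) {
  const text = line.replace(/\0+$/, '').trim();
  if (text === 'stream: OK') return { status: 'clean' };
  const found = /^stream: (.+) FOUND$/.exec(text);
  if (found) return { status: 'malicious', signature: found[1] };
  return { status: 'error', detail: text };
}
```

Any reply that is neither `OK` nor `FOUND` falls through to the error branch, which mirrors the `Verdict.ScanError` outcome for unscannable input.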
109
+
110
+ ---
111
+
112
+ ## Performance comparison
113
+
114
+ | Metric | Local mode | TCP mode |
115
+ |--------|-----------|----------|
116
+ | First scan latency | ~600 ms | ~10 ms |
117
+ | Subsequent scan latency | ~600 ms | ~10 ms |
118
+ | Concurrent scans (4-core) | ~4 (CPU-bound) | ~50+ |
119
+ | Memory per scan | ~300 MB (DB load) | ~0 (clamd holds DB) |
120
+
121
+ Local mode is fine for low-traffic applications (< 10 uploads/minute). TCP mode is required for any sustained upload throughput.
122
+
123
+ ---
124
+
125
+ ## Switching modes without changing application code
126
+
127
+ Structure your options from environment variables so the same code runs in local mode during development and TCP mode in production:
128
+
129
+ ```js
130
+ const SCAN_OPTS = process.env.CLAMAV_HOST
131
+ ? {
132
+ host: process.env.CLAMAV_HOST,
133
+ port: Number(process.env.CLAMAV_PORT) || 3310,
134
+ timeout: Number(process.env.CLAMAV_TIMEOUT) || 15_000,
135
+ }
136
+ : {}; // local mode — empty options
137
+
138
+ const result = await scan('/uploads/file.pdf', SCAN_OPTS);
139
+ ```
140
+
141
+ Set `CLAMAV_HOST=clamav` in your Docker environment; leave it unset in local development.
142
+
143
+ ---
144
+
145
+ ## Timeout differences
146
+
147
+ | | Local mode | TCP mode |
148
+ |---|---|---|
149
+ | **`timeout` option** | Ignored | Socket idle timeout in ms |
150
+ | **Default timeout** | None; runs until `clamscan` exits or is killed | 15 000 ms |
151
+ | **Timeout error** | `Process killed by signal: SIGTERM` | `clamd connection timed out after Nms` |
152
+
153
+ In local mode, the process runs until `clamscan` finishes or the OS kills it. In TCP mode, pompelmi sets a socket idle timeout — if clamd stops sending data for longer than `timeout` ms, the connection is closed and the promise rejects.
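
Because `timeout` is ignored in local mode, a caller that needs an upper bound there could add its own deadline around the promise. The helper below is a hedged sketch of that idea, not a pompelmi feature: `withTimeout` is a hypothetical name, and it only stops waiting for the result. It does not terminate the spawned `clamscan` process.

```javascript
// Reject if `promise` has not settled within `ms` milliseconds.
// Note: this only stops waiting; it does not kill a child process.
function withTimeout(promise, ms) {
  let timer;
  const deadline = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`scan timed out after ${ms}ms`)), ms);
  });
  // Whichever settles first wins; the timer is cleared either way.
  return Promise.race([promise, deadline]).finally(() => clearTimeout(timer));
}

// Usage (assumes `scan` from pompelmi is in scope):
//   const verdict = await withTimeout(scan('/uploads/file.pdf'), 30_000);
```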
154
+
155
+ ---
156
+
157
+ ## Error behaviour differences
158
+
159
+ | Condition | Local mode error | TCP mode error |
160
+ |-----------|-----------------|----------------|
161
+ | Service unavailable | `ENOENT` (clamscan not found) | `ECONNREFUSED` |
162
+ | Service slow | Process runs to completion | `clamd connection timed out` |
163
+ | File not scannable | `Verdict.ScanError` (exit code 2) | `Verdict.ScanError` (error response) |
164
+
165
+ ---
166
+
167
+ ## When to use local mode
168
+
169
+ - Development and testing on a developer's machine
170
+ - Low-traffic applications (< 10 uploads/minute)
171
+ - Environments where Docker is unavailable
172
+ - Simple scripts and one-off scans
173
+
174
+ ## When to use TCP mode
175
+
176
+ - Production applications with concurrent uploads
177
+ - Docker or Kubernetes deployments
178
+ - Scanning in-memory buffers or streams with zero disk I/O
179
+ - Environments where the application container cannot install ClamAV
@@ -0,0 +1,166 @@
1
+ # multer Memory Storage
2
+
3
+ When you use `multer({ storage: multer.memoryStorage() })`, the uploaded file is never written to disk. It lives entirely in `req.file.buffer` as a Node.js `Buffer`. This page explains why `scan()` does not work in this case and how `scanBuffer()` solves it.
4
+
5
+ ---
6
+
7
+ ## Why `scan()` doesn't work with memoryStorage
8
+
9
+ `scan()` requires a file path — it calls `clamscan <filePath>` or streams the file at that path to clamd. With `memoryStorage`, there is no path:
10
+
11
+ ```js
12
+ const upload = multer({ storage: multer.memoryStorage() });
13
+
14
+ app.post('/upload', upload.single('file'), async (req, res) => {
15
+ console.log(req.file.path); // undefined — no file on disk
16
+ console.log(req.file.buffer); // <Buffer 25 50 44 ...> — file is here
17
+
18
+ // This throws "filePath must be a string"
19
+ await scan(req.file.path);
20
+ });
21
+ ```
22
+
23
+ ---
24
+
25
+ ## Use `scanBuffer()` instead
26
+
27
+ ```js
28
+ const express = require('express');
29
+ const multer = require('multer');
30
+ const { scanBuffer, Verdict } = require('pompelmi');
31
+
32
+ const app = express();
33
+ const upload = multer({
34
+ storage: multer.memoryStorage(),
35
+ limits: { fileSize: 10 * 1024 * 1024 }, // 10 MB
36
+ });
37
+
38
+ const SCAN_OPTS = {
39
+ host: process.env.CLAMAV_HOST,
40
+ port: Number(process.env.CLAMAV_PORT) || 3310,
41
+ timeout: 30_000,
42
+ };
43
+
44
+ app.post('/upload', upload.single('file'), async (req, res) => {
45
+ if (!req.file) {
46
+ return res.status(400).json({ error: 'No file uploaded.' });
47
+ }
48
+
49
+ let result;
50
+ try {
51
+ result = await scanBuffer(req.file.buffer, SCAN_OPTS);
52
+ } catch (err) {
53
+ return res.status(500).json({ error: `Scan failed: ${err.message}` });
54
+ }
55
+
56
+ if (result === Verdict.Malicious) {
57
+ return res.status(422).json({ error: 'Malicious file rejected.' });
58
+ }
59
+
60
+ if (result === Verdict.ScanError) {
61
+ return res.status(422).json({ error: 'Scan incomplete — file rejected.' });
62
+ }
63
+
64
+ // Verdict.Clean — buffer is available for forwarding to S3, DB, etc.
65
+ return res.json({ ok: true, size: req.file.size });
66
+ });
67
+
68
+ app.listen(3000);
69
+ ```
70
+
71
+ ---
72
+
73
+ ## How `scanBuffer()` works in each mode
74
+
75
+ | Mode | Behaviour |
76
+ |------|-----------|
77
+ | **TCP** (`host` set) | Buffer is streamed directly to clamd via INSTREAM — zero disk I/O on the application host. |
78
+ | **Local** (no `host`) | Buffer is written to a temp file in `os.tmpdir()`, scanned with `clamscan`, then deleted in a `finally` block. |
79
+
80
+ For `memoryStorage` workloads, TCP mode is strongly recommended: the whole point of keeping the file in memory is to avoid touching disk, and TCP mode preserves that guarantee.
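
The local-mode behaviour in the table (write to a temp file, scan, delete in `finally`) follows a common pattern that can be sketched with Node.js built-ins. This is an illustration of the pattern, not pompelmi's source; `withTempFile` and the `scan-*.tmp` file naming are assumptions.

```javascript
const fs = require('node:fs/promises');
const os = require('node:os');
const path = require('node:path');
const crypto = require('node:crypto');

// Write `buffer` to a uniquely named temp file, run `fn(filePath)`,
// and always delete the file afterwards, even if `fn` throws.
async function withTempFile(buffer, fn) {
  const filePath = path.join(os.tmpdir(), `scan-${crypto.randomUUID()}.tmp`);
  await fs.writeFile(filePath, buffer, { mode: 0o600 }); // owner-only access
  try {
    return await fn(filePath);
  } finally {
    await fs.rm(filePath, { force: true });
  }
}

// Usage (assumes `scan` from pompelmi is in scope):
//   const verdict = await withTempFile(req.file.buffer, (p) => scan(p));
```

The sketch makes the cost visible: in local mode the upload touches disk once for the write and once more when `clamscan` reads it back.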
81
+
82
+ ---
83
+
84
+ ## Multiple files with `upload.array()`
85
+
86
+ ```js
87
+ app.post('/upload-many', upload.array('files', 10), async (req, res) => {
88
+ if (!req.files || req.files.length === 0) {
89
+ return res.status(400).json({ error: 'No files uploaded.' });
90
+ }
91
+
92
+ const scanResults = await Promise.allSettled(
93
+ req.files.map(async (file) => {
94
+ const verdict = await scanBuffer(file.buffer, SCAN_OPTS);
95
+ return { originalname: file.originalname, verdict };
96
+ })
97
+ );
98
+
99
+ const accepted = [];
100
+ const rejected = [];
101
+
102
+   for (const [i, r] of scanResults.entries()) {
103
+     if (r.status === 'rejected') {
104
+       rejected.push({ originalname: req.files[i].originalname, reason: 'scan_failed' });
105
+ continue;
106
+ }
107
+ const { originalname, verdict } = r.value;
108
+ if (verdict === Verdict.Clean) {
109
+ accepted.push(originalname);
110
+ } else {
111
+ rejected.push({ originalname, reason: verdict.description });
112
+ }
113
+ }
114
+
115
+ if (rejected.length > 0) {
116
+ return res.status(422).json({ accepted, rejected });
117
+ }
118
+ return res.json({ ok: true, accepted });
119
+ });
120
+ ```
121
+
122
+ ---
123
+
124
+ ## Memory usage considerations
125
+
126
+ With `memoryStorage`, every uploaded file occupies memory for the duration of the request. For large files or high concurrency, this can exhaust the Node.js heap. Options:
127
+
128
+ 1. **Set a file size limit** — always set `limits.fileSize` on multer.
129
+ 2. **Use disk storage for large files** — fall back to `diskStorage` for files above a threshold.
130
+ 3. **Use `scanStream()` instead** — pipe the multipart stream directly to `scanStream()` without buffering the entire file. This requires bypassing multer and parsing multipart manually (e.g. with `busboy`).
131
+
132
+ ```js
133
+ const { scanStream } = require('pompelmi'); // streamed scan via busboy, no full buffering
134
+ const busboy = require('busboy');
135
+
136
+ app.post('/upload-stream', (req, res) => {
137
+ const bb = busboy({ headers: req.headers });
138
+
139
+ bb.on('file', async (_name, fileStream, _info) => {
140
+     const result = await scanStream(fileStream, SCAN_OPTS).catch(() => Verdict.ScanError); // fail closed on scan errors
141
+
142
+ if (result !== Verdict.Clean) {
143
+ return res.status(422).json({ error: result.description });
144
+ }
145
+ return res.json({ ok: true });
146
+ });
147
+
148
+ req.pipe(bb);
149
+ });
150
+ ```
151
+
152
+ ---
153
+
154
+ ## After scanning: forward to S3
155
+
156
+ ```js
157
+ const { PutObjectCommand } = require('@aws-sdk/client-s3');
158
+
159
+ // After confirmed Verdict.Clean
160
+ await s3.send(new PutObjectCommand({
161
+ Bucket: process.env.S3_BUCKET,
162
+ Key: `uploads/${Date.now()}-${req.file.originalname}`,
163
+ Body: req.file.buffer,
164
+ ContentType: req.file.mimetype,
165
+ }));
166
+ ```
@@ -0,0 +1,228 @@
1
+ # NestJS Integration
2
+
3
+ Integrate pompelmi into a NestJS application using a custom `PipeTransform` or `Interceptor` to scan uploaded files before they reach your controller.
4
+
5
+ ---
6
+
7
+ ## Setup
8
+
9
+ ```bash
10
+ npm install pompelmi @nestjs/platform-express multer
11
+ npm install -D @types/multer
12
+ ```
13
+
14
+ ---
15
+
16
+ ## Custom pipe: `FileScanPipe`
17
+
18
+ A `PipeTransform` that receives the `Express.Multer.File` object from `FileInterceptor` and rejects it if ClamAV detects malware. Throw `BadRequestException` or `UnprocessableEntityException` as appropriate.
19
+
20
+ ```ts
21
+ // file-scan.pipe.ts
22
+ import { PipeTransform, Injectable, BadRequestException } from '@nestjs/common';
23
+ import { scan, Verdict } from 'pompelmi';
24
+
25
+ @Injectable()
26
+ export class FileScanPipe implements PipeTransform {
27
+ private readonly opts = {
28
+ host: process.env.CLAMAV_HOST,
29
+ port: Number(process.env.CLAMAV_PORT) || 3310,
30
+ timeout: 30_000,
31
+ };
32
+
33
+ async transform(file: Express.Multer.File): Promise<Express.Multer.File> {
34
+ if (!file) throw new BadRequestException('No file uploaded.');
35
+
36
+ let result: symbol;
37
+ try {
38
+ result = await scan(file.path, this.opts);
39
+ } catch (err) {
40
+ throw new BadRequestException(`Scan failed: ${(err as Error).message}`);
41
+ }
42
+
43
+ if (result === Verdict.Malicious) {
44
+ throw new BadRequestException('Malicious file rejected.');
45
+ }
46
+
47
+ if (result === Verdict.ScanError) {
48
+ throw new BadRequestException('Scan incomplete — file rejected.');
49
+ }
50
+
51
+ return file;
52
+ }
53
+ }
54
+ ```
55
+
56
+ ---
57
+
58
+ ## Using the pipe in a controller
59
+
60
+ ```ts
61
+ // upload.controller.ts
62
+ import {
63
+ Controller,
64
+ Post,
65
+ UploadedFile,
66
+ UseInterceptors,
67
+ } from '@nestjs/common';
68
+ import { FileInterceptor } from '@nestjs/platform-express';
69
+ import { FileScanPipe } from './file-scan.pipe';
70
+ import * as multer from 'multer';
71
+
72
+ @Controller('upload')
73
+ export class UploadController {
74
+ @Post()
75
+ @UseInterceptors(
76
+ FileInterceptor('file', {
77
+ dest: './uploads',
78
+ limits: { fileSize: 10 * 1024 * 1024 },
79
+ }),
80
+ )
81
+ async uploadFile(
82
+ @UploadedFile(FileScanPipe) file: Express.Multer.File,
83
+ ) {
84
+ return { ok: true, filename: file.filename };
85
+ }
86
+ }
87
+ ```
88
+
89
+ `FileInterceptor` writes the file to `dest` before the pipe runs, so `file.path` is available.
90
+
91
+ ---
92
+
93
+ ## Memory storage with `scanBuffer()`
94
+
95
+ If you use `multer.memoryStorage()` the file is in `file.buffer`. Update the pipe:
96
+
97
+ ```ts
98
+ // file-scan-buffer.pipe.ts
99
+ import { PipeTransform, Injectable, BadRequestException } from '@nestjs/common';
100
+ import { scanBuffer, Verdict } from 'pompelmi';
101
+
102
+ @Injectable()
103
+ export class FileScanBufferPipe implements PipeTransform {
104
+ private readonly opts = {
105
+ host: process.env.CLAMAV_HOST,
106
+ port: Number(process.env.CLAMAV_PORT) || 3310,
107
+ };
108
+
109
+ async transform(file: Express.Multer.File): Promise<Express.Multer.File> {
110
+ if (!file) throw new BadRequestException('No file uploaded.');
111
+
112
+ const result = await scanBuffer(file.buffer, this.opts).catch((err) => {
113
+ throw new BadRequestException(`Scan failed: ${err.message}`);
114
+ });
115
+
116
+ if (result !== Verdict.Clean) {
117
+ throw new BadRequestException(`Upload rejected: ${result.description}`);
118
+ }
119
+
120
+ return file;
121
+ }
122
+ }
123
+ ```
124
+
125
+ Controller stays the same; swap `FileScanPipe` for `FileScanBufferPipe`:
126
+
127
+ ```ts
128
+ @UseInterceptors(
129
+ FileInterceptor('file', {
130
+ storage: multer.memoryStorage(),
131
+ limits: { fileSize: 10 * 1024 * 1024 },
132
+ }),
133
+ )
134
+ async uploadFile(@UploadedFile(FileScanBufferPipe) file: Express.Multer.File) {
135
+ // file.buffer is the scanned content
136
+ return { ok: true, size: file.size };
137
+ }
138
+ ```
139
+
140
+ ---
141
+
142
+ ## Module setup
143
+
144
+ Register the pipe as a provider if you want it injectable via DI:
145
+
146
+ ```ts
147
+ // upload.module.ts
148
+ import { Module } from '@nestjs/common';
149
+ import { UploadController } from './upload.controller';
150
+ import { FileScanPipe } from './file-scan.pipe';
151
+
152
+ @Module({
153
+ controllers: [UploadController],
154
+ providers: [FileScanPipe],
155
+ })
156
+ export class UploadModule {}
157
+ ```
158
+
159
+ Or use it directly inline without DI — the `new FileScanPipe()` form works equally well:
160
+
161
+ ```ts
162
+ @UploadedFile(new FileScanPipe())
163
+ ```
164
+
165
+ ---
166
+
167
+ ## Interceptor approach
168
+
169
+ A `NestInterceptor` runs around the entire handler. Use this approach if you also want to clean up the file after the handler completes, regardless of outcome.
170
+
171
+ ```ts
172
+ // scan.interceptor.ts
173
+ import {
174
+ Injectable,
175
+ NestInterceptor,
176
+ ExecutionContext,
177
+ CallHandler,
178
+ BadRequestException,
179
+ } from '@nestjs/common';
180
+ import { Observable } from 'rxjs';
181
+ import { scan, Verdict } from 'pompelmi';
182
+
183
+ @Injectable()
184
+ export class ScanInterceptor implements NestInterceptor {
185
+ async intercept(ctx: ExecutionContext, next: CallHandler): Promise<Observable<any>> {
186
+ const req = ctx.switchToHttp().getRequest();
187
+ const file: Express.Multer.File | undefined = req.file;
188
+
189
+ if (!file) return next.handle();
190
+
191
+ const result = await scan(file.path, {
192
+ host: process.env.CLAMAV_HOST,
193
+ port: 3310,
194
+ });
195
+
196
+ if (result !== Verdict.Clean) {
197
+ throw new BadRequestException(`Upload rejected: ${result.description}`);
198
+ }
199
+
200
+ return next.handle();
201
+ }
202
+ }
203
+ ```
204
+
205
+ Apply it at the controller or handler level:
206
+
207
+ ```ts
208
+ @UseInterceptors(FileInterceptor('file', { dest: './uploads' }), ScanInterceptor)
209
+ @Post()
210
+ async uploadFile(@UploadedFile() file: Express.Multer.File) {
211
+ return { ok: true, filename: file.filename };
212
+ }
213
+ ```
214
+
215
+ ---
216
+
217
+ ## Cleanup on rejection
218
+
219
+ When the file is rejected, delete it to avoid accumulating rejected uploads on disk:
220
+
221
+ ```ts
222
+ import * as fs from 'fs';
223
+
224
+ if (result !== Verdict.Clean) {
225
+ try { fs.unlinkSync(file.path); } catch {}
226
+   throw new BadRequestException(`Upload rejected: ${result.description}`);
227
+ }
228
+ ```
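
A non-blocking variant of the cleanup step can be sketched in plain JavaScript with `fs/promises`. This is one possible approach under stated assumptions, not part of pompelmi: `discardUpload` is a hypothetical helper, and the guard covers the memory-storage case where `file.path` is undefined.

```javascript
const { rm } = require('node:fs/promises');

// Delete a rejected upload without blocking the event loop.
// `force: true` makes a missing file a no-op instead of an error.
async function discardUpload(filePath) {
  if (typeof filePath !== 'string' || filePath === '') return false; // nothing on disk
  await rm(filePath, { force: true });
  return true;
}

// Usage inside a pipe or interceptor, before throwing:
//   await discardUpload(file.path);
```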