npm - pompelmi - Versions diffs - 1.4.0 → 1.6.0 - Mend

pompelmi 1.4.0 → 1.6.0

Files changed (31)

package/README.md +96 -40
package/llms.txt +22 -99
package/package.json +4 -1
package/release-notes-v1.4.0.md +25 -0
package/release-notes-v1.5.0.md +37 -0
package/src/BufferScanner.js +20 -17
package/src/ClamAVScanner.js +42 -5
package/src/ClamdScanner.js +18 -15
package/src/StreamScanner.js +20 -17
package/src/index.js +3 -3
package/wiki/api-reference.md +268 -0
package/wiki/cli-usage.md +263 -0
package/wiki/concurrent-scanning.md +199 -0
package/wiki/docker-compose-production.md +190 -0
package/wiki/docker-setup.md +178 -0
package/wiki/error-handling.md +242 -0
package/wiki/express-integration.md +227 -0
package/wiki/fastify-integration.md +207 -0
package/wiki/home.md +0 -0
package/wiki/local-vs-tcp-mode.md +179 -0
package/wiki/multer-memory-storage.md +166 -0
package/wiki/nestjs-integration.md +228 -0
package/wiki/nextjs-integration.md +209 -0
package/wiki/performance.md +178 -0
package/wiki/quarantine-workflow.md +260 -0
package/wiki/rest-api-server.md +297 -0
package/wiki/s3-integration.md +233 -0
package/wiki/security-considerations.md +192 -0
package/wiki/typescript-usage.md +239 -0
package/wiki/verdicts.md +192 -0
package/wiki/virus-definitions.md +194 -0

package/wiki/rest-api-server.md ADDED

@@ -0,0 +1,297 @@
+# REST API Scan Server
+Build a standalone HTTP microservice that exposes a `POST /scan` endpoint. Other services send files to it and receive a JSON verdict. This pattern lets you share one clamd instance and one scan service across multiple applications.
+---
+## Minimal implementation (Node.js built-ins)
+No framework required — just `node:http` and `busboy` for multipart parsing:
+```bash
+npm install pompelmi busboy
+```
+```js
+// scan-server.js
+const http   = require('http');
+const busboy = require('busboy');
+const os     = require('os');
+const fs     = require('fs');
+const path   = require('path');
+const { scan, Verdict } = require('pompelmi');
+const PORT = Number(process.env.PORT) || 4000;
+const SCAN_OPTS = {
+  host:    process.env.CLAMAV_HOST,
+  port:    Number(process.env.CLAMAV_PORT) || 3310,
+  timeout: Number(process.env.CLAMAV_TIMEOUT) || 30_000,
+};
+function json(res, status, body) {
+  const payload = JSON.stringify(body);
+  res.writeHead(status, {
+    'Content-Type':   'application/json',
+    'Content-Length': Buffer.byteLength(payload),
+  });
+  res.end(payload);
+}
+const server = http.createServer((req, res) => {
+  if (req.method !== 'POST' || req.url !== '/scan') {
+    return json(res, 404, { error: 'Not found.' });
+  }
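+  let bb;
+  try {
+    // busboy throws synchronously on a missing or non-multipart Content-Type
+    bb = busboy({ headers: req.headers, limits: { files: 1 } });
+  } catch (err) {
+    return json(res, 400, { error: `Multipart error: ${err.message}` });
+  }
+  let handled = false;
+  let sawFile = false;
+  bb.on('file', (_name, fileStream, info) => {
+    sawFile = true;
+    const tmpPath = path.join(os.tmpdir(), `scan-${Date.now()}-${info.filename}`);
+    const ws      = fs.createWriteStream(tmpPath);
+    fileStream.pipe(ws);
+    ws.on('finish', async () => {
+      try {
+        // If a response was already sent, skip the scan; finally still removes the temp file
+        if (handled) return;
+        handled = true;
+        const result = await scan(tmpPath, SCAN_OPTS);
+        json(res, result === Verdict.Clean ? 200 : 422, {
+          verdict:  result.description,
+          filename: info.filename,
+        });
+      } catch (err) {
+        json(res, 500, { error: err.message });
+      } finally {
+        try { fs.unlinkSync(tmpPath); } catch {}
+      }
+    });
+    ws.on('error', (err) => {
+      if (!handled) {
+        handled = true;
+        json(res, 500, { error: err.message });
+      }
+    });
+  });
+  bb.on('error', (err) => {
+    if (!handled) {
+      handled = true;
+      json(res, 400, { error: `Multipart error: ${err.message}` });
+    }
+  });
+  // Without this, a multipart request that contains no file part never gets a response
+  bb.on('close', () => {
+    if (!sawFile && !handled) {
+      handled = true;
+      json(res, 400, { error: 'No file uploaded.' });
+    }
+  });
+  req.pipe(bb);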
+});
+server.listen(PORT, () => {
+  console.log(`Scan server listening on :${PORT}`);
+});
+```
+---
+## Express implementation
+```bash
+npm install pompelmi express multer
+```
+```js
+// scan-server-express.js
+const express = require('express');
+const multer  = require('multer');
+const fs      = require('fs');
+const { scan, Verdict } = require('pompelmi');
+const app    = express();
+const upload = multer({
+  dest:   require('os').tmpdir(),
+  limits: { fileSize: 100 * 1024 * 1024 },
+});
+const SCAN_OPTS = {
+  host:    process.env.CLAMAV_HOST,
+  port:    Number(process.env.CLAMAV_PORT) || 3310,
+  timeout: 30_000,
+};
+app.post('/scan', upload.single('file'), async (req, res) => {
+  if (!req.file) {
+    return res.status(400).json({ error: 'No file uploaded. Send a multipart/form-data request with field name "file".' });
+  }
+  const filePath = req.file.path;
+  try {
+    const result = await scan(filePath, SCAN_OPTS);
+    return res.status(result === Verdict.Clean ? 200 : 422).json({
+      verdict:  result.description,
+      filename: req.file.originalname,
+      size:     req.file.size,
+    });
+  } catch (err) {
+    return res.status(500).json({ error: err.message });
+  } finally {
+    try { fs.unlinkSync(filePath); } catch {}
+  }
+});
+app.get('/health', (_req, res) => res.json({ ok: true }));
+app.listen(Number(process.env.PORT) || 4000, () => {
+  console.log('Scan server ready.');
+});
+```
+---
+## JSON response format
+**Clean file (200):**
+```json
+{
+  "verdict":  "Clean",
+  "filename": "report.pdf",
+  "size":     245760
+}
+```
+**Malicious file (422):**
+```json
+{
+  "verdict":  "Malicious",
+  "filename": "evil.exe",
+  "size":     16384
+}
+```
+**Scan error (422):**
+```json
+{
+  "verdict":  "ScanError",
+  "filename": "protected.zip",
+  "size":     8192
+}
+```
+**Server error (500):**
+```json
+{
+  "error": "clamd connection timed out after 30000ms"
+}
+```
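A minimal sketch, not taken from the package, of how a caller might turn the status codes and bodies above into an accept/reject decision. The name `interpretScanResponse` and the shape of its return value are assumptions made for illustration; the sketch fails closed, so anything other than a 200 with a `Clean` verdict is rejected.

```javascript
// Hypothetical helper: maps the documented status/body pairs to a decision.
function interpretScanResponse(status, body) {
  // Only a 200 with an explicit Clean verdict is accepted
  if (status === 200 && body && body.verdict === 'Clean') {
    return { accept: true, reason: 'Clean' };
  }
  // 422 carries a verdict such as 'Malicious' or 'ScanError'
  if (status === 422 && body && typeof body.verdict === 'string') {
    return { accept: false, reason: body.verdict };
  }
  // Everything else (500, 404, malformed body) is treated as a service failure
  const detail = body && body.error ? body.error : `unexpected status ${status}`;
  return { accept: false, reason: `ServiceError: ${detail}` };
}
```

A caller would pass `res.status` and the parsed JSON body from its HTTP client, then reject the upload whenever `accept` is `false`.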
+---
+## Calling from another service
+### curl
+```bash
+curl -X POST http://scan-service:4000/scan \
+  -F "file=@/path/to/file.pdf" \
+  -w "\n%{http_code}"
+```
+### Node.js (using `form-data`)
+```bash
+# node-fetch v3 is ESM-only; v2 still supports require()
+npm install form-data node-fetch@2
+```
+```js
+const FormData = require('form-data');
+const fetch    = require('node-fetch');
+const fs       = require('fs');
+async function remoteVerdictCheck(filePath) {
+  const form = new FormData();
+  form.append('file', fs.createReadStream(filePath));
+  const res = await fetch('http://scan-service:4000/scan', {
+    method:  'POST',
+    body:    form,
+    headers: form.getHeaders(),
+  });
+  const body = await res.json();
+  return body.verdict; // 'Clean' | 'Malicious' | 'ScanError'
+}
+```
+### Python
+```python
+import requests
+with open('/path/to/file.pdf', 'rb') as f:
+    response = requests.post(
+        'http://scan-service:4000/scan',
+        files={'file': f},
+    )
+verdict = response.json()['verdict']
+print(verdict)  # Clean / Malicious / ScanError
+```
+---
+## Docker deployment
+```yaml
+# docker-compose.yml
+services:
+  scan-service:
+    build: .
+    ports:
+      - "4000:4000"
+    environment:
+      PORT: 4000
+      CLAMAV_HOST: clamav
+      CLAMAV_PORT: 3310
+      CLAMAV_TIMEOUT: 30000
+    depends_on:
+      clamav:
+        condition: service_healthy
+  clamav:
+    image: clamav/clamav:stable
+    volumes:
+      - clamav_db:/var/lib/clamav
+    healthcheck:
+      test: ["CMD", "clamdcheck.sh"]
+      interval: 30s
+      timeout: 10s
+      retries: 5
+      start_period: 120s
+volumes:
+  clamav_db:
+```
+---
+## Security considerations for the scan service
+- **Network access:** Expose the scan service only within your internal network or VPC. Never expose it to the public internet.
+- **Authentication:** Add an API key or mTLS for service-to-service authentication.
+- **File size limits:** Set `limits.fileSize` on multer to prevent the scan service from being used as a DoS vector.
+- **Rate limiting:** Add rate limiting per caller IP or API key.
+```js
+const rateLimit = require('express-rate-limit');
+app.use('/scan', rateLimit({
+  windowMs: 60_000,
+  max: 60, // 60 scans per minute per IP
+}));
+```
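A minimal sketch, not taken from the package, of the API-key option from the **Authentication** bullet above. The header name `x-api-key`, the environment variable `SCAN_API_KEY`, and both function names are assumptions. Both keys are hashed before comparison so that `crypto.timingSafeEqual` always receives buffers of equal length.

```javascript
const crypto = require('crypto');

// Hypothetical helper: constant-time comparison of a supplied key with the expected one.
function isValidApiKey(provided, expected) {
  if (typeof provided !== 'string' || typeof expected !== 'string' || expected.length === 0) {
    return false;
  }
  // timingSafeEqual throws on length mismatch, so compare fixed-length digests
  const a = crypto.createHash('sha256').update(provided).digest();
  const b = crypto.createHash('sha256').update(expected).digest();
  return crypto.timingSafeEqual(a, b);
}

// Express-style middleware factory; responds 401 when the key is missing or wrong.
function requireApiKey(expected) {
  return (req, res, next) => {
    if (isValidApiKey(req.headers['x-api-key'], expected)) return next();
    res.statusCode = 401;
    res.setHeader('Content-Type', 'application/json');
    res.end(JSON.stringify({ error: 'Unauthorized.' }));
  };
}
```

In the Express server this could be mounted ahead of the route, for example `app.use('/scan', requireApiKey(process.env.SCAN_API_KEY))`.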

package/wiki/s3-integration.md ADDED

@@ -0,0 +1,233 @@
+# S3 Integration
+This page covers common patterns for using pompelmi with Amazon S3: scanning before upload (local scan, then `PutObject`), scanning an object already in S3 (`GetObject` stream, then `scanStream()`), and moving malicious objects to a quarantine bucket.
+---
+## Pattern 1: Scan locally, then upload to S3 if clean
+The file is scanned before it ever reaches S3. Malicious files are rejected and never uploaded.
+```js
+const fs = require('fs');
+const path = require('path');
+const { S3Client, PutObjectCommand } = require('@aws-sdk/client-s3');
+const { scan, Verdict } = require('pompelmi');
+const s3 = new S3Client({ region: process.env.AWS_REGION });
+const SCAN_OPTS = {
+  host: process.env.CLAMAV_HOST,
+  port: Number(process.env.CLAMAV_PORT) || 3310,
+  timeout: 30_000,
+};
+async function scanThenUpload(localPath, s3Key) {
+  const result = await scan(localPath, SCAN_OPTS);
+  if (result === Verdict.Malicious) {
+    throw new Error(`Malicious file rejected: ${localPath}`);
+  }
+  if (result === Verdict.ScanError) {
+    throw new Error(`Scan incomplete — rejecting file: ${localPath}`);
+  }
+  // Only reached if Verdict.Clean
+  const fileStream = fs.createReadStream(localPath);
+  await s3.send(new PutObjectCommand({
+    Bucket: process.env.S3_BUCKET,
+    Key:    s3Key,
+    Body:   fileStream,
+  }));
+  return s3Key;
+}
+```
+### In an Express upload route
+```js
+const express = require('express');
+const multer  = require('multer');
+const { scan, Verdict } = require('pompelmi');
+const { S3Client, PutObjectCommand } = require('@aws-sdk/client-s3');
+const fs = require('fs');
+const upload = multer({ dest: '/tmp/uploads' });
+const s3     = new S3Client({ region: process.env.AWS_REGION });
+const app    = express();
+app.post('/upload', upload.single('file'), async (req, res) => {
+  if (!req.file) return res.status(400).json({ error: 'No file.' });
+  const filePath = req.file.path;
+  try {
+    const result = await scan(filePath, { host: process.env.CLAMAV_HOST, port: 3310 });
+    if (result !== Verdict.Clean) {
+      fs.unlinkSync(filePath);
+      return res.status(422).json({ error: `Upload rejected: ${result.description}` });
+    }
+    const key = `uploads/${Date.now()}-${req.file.originalname}`;
+    await s3.send(new PutObjectCommand({
+      Bucket:      process.env.S3_BUCKET,
+      Key:         key,
+      Body:        fs.createReadStream(filePath),
+      ContentType: req.file.mimetype,
+    }));
+    fs.unlinkSync(filePath); // clean up temp file after upload
+    return res.json({ ok: true, key });
+  } catch (err) {
+    try { fs.unlinkSync(filePath); } catch {}
+    return res.status(500).json({ error: err.message });
+  }
+});
+```
+---
+## Pattern 2: Scan an S3 object stream
+Scan a file that already exists in S3 by streaming the `GetObjectCommand` response body through `scanStream()`. No data is written to the application host's disk.
+```js
+const { S3Client, GetObjectCommand } = require('@aws-sdk/client-s3');
+const { scanStream, Verdict } = require('pompelmi');
+const s3 = new S3Client({ region: process.env.AWS_REGION });
+const SCAN_OPTS = {
+  host: process.env.CLAMAV_HOST,
+  port: 3310,
+  timeout: 60_000,
+};
+async function scanS3Object(bucket, key) {
+  const response = await s3.send(new GetObjectCommand({ Bucket: bucket, Key: key }));
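+  // In Node.js, response.Body is a Readable stream with SDK helper methods mixed in
+  const result = await scanStream(response.Body, SCAN_OPTS);
+  return result;
+}
+// Usage (wrapped in an async function: top-level await is unavailable in CommonJS)
+async function main() {
+  const verdict = await scanS3Object('my-bucket', 'uploads/user-file.pdf');
+  if (verdict === Verdict.Malicious) {
+    // Move to quarantine bucket; moveToQuarantine() is an application-defined helper
+    await moveToQuarantine('my-bucket', 'uploads/user-file.pdf');
+  }
+}
+main().catch(console.error);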
+```
+### AWS SDK v3 stream compatibility
+In Node.js, the AWS SDK v3 returns `response.Body` as a `Readable` stream extended with the SDK's `SdkStreamMixin` helper methods. Pass it directly to `scanStream()`:
+```js
+const response = await s3.send(new GetObjectCommand({ Bucket, Key }));
+const result = await scanStream(response.Body, SCAN_OPTS);
+```
+For the older AWS SDK v2:
+```js
+const request = s3.getObject({ Bucket, Key }); // returns an AWS.Request, not a response
+const result = await scanStream(request.createReadStream(), SCAN_OPTS);
+```
+---
+## Pattern 3: Quarantine bucket
+Move malicious files to a separate quarantine bucket instead of deleting them, for forensic review.
+```js
+const {
+  S3Client,
+  GetObjectCommand,
+  CopyObjectCommand,
+  DeleteObjectCommand,
+} = require('@aws-sdk/client-s3');
+const { scanStream, Verdict } = require('pompelmi');
+const s3 = new S3Client({ region: process.env.AWS_REGION });
+async function scanAndQuarantine(sourceBucket, key) {
+  const response = await s3.send(new GetObjectCommand({
+    Bucket: sourceBucket,
+    Key:    key,
+  }));
+  const result = await scanStream(response.Body, {
+    host: process.env.CLAMAV_HOST,
+    port: 3310,
+  });
+  if (result === Verdict.Malicious) {
+    const quarantineKey = `quarantine/${Date.now()}-${key}`;
+    // Copy to quarantine bucket
+    await s3.send(new CopyObjectCommand({
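+      // CopySource must be URL-encoded; encode each path segment of the key
+      CopySource:  `${sourceBucket}/${key.split('/').map(encodeURIComponent).join('/')}`,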
+      Bucket:      process.env.QUARANTINE_BUCKET,
+      Key:         quarantineKey,
+    }));
+    // Delete from source
+    await s3.send(new DeleteObjectCommand({
+      Bucket: sourceBucket,
+      Key:    key,
+    }));
+    console.warn({ event: 'quarantined', sourceBucket, key, quarantineKey });
+    return { quarantined: true, quarantineKey };
+  }
+  return { quarantined: false, verdict: result.description };
+}
+```
+---
+## S3 trigger pattern (Lambda or background job)
+Scan every object as it arrives in an upload bucket using an S3 event trigger or a polling job:
+```js
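+// Lambda handler (or background worker)
+// moveToQuarantine() and notifyAdmin() are application-defined helpers.
+const { S3Client, GetObjectCommand } = require('@aws-sdk/client-s3');
+const { scanStream, Verdict } = require('pompelmi');
+const s3 = new S3Client({ region: process.env.AWS_REGION });
+async function processUpload(event) {
+  const record = event.Records[0];
+  const bucket = record.s3.bucket.name;
+  // S3 event keys are URL-encoded, with spaces sent as '+'
+  const key    = decodeURIComponent(record.s3.object.key.replace(/\+/g, ' '));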
+  const response = await s3.send(new GetObjectCommand({ Bucket: bucket, Key: key }));
+  const result   = await scanStream(response.Body, {
+    host: process.env.CLAMAV_HOST,
+    port: 3310,
+  });
+  if (result !== Verdict.Clean) {
+    await moveToQuarantine(bucket, key);
+    await notifyAdmin(key, result.description);
+  }
+}
+```
+> **Note on Lambda:** ClamAV cannot run inside a standard Lambda function. Use TCP mode pointing to clamd on a persistent host (EC2, ECS, Fargate) or a dedicated scan microservice.
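A minimal sketch, not taken from the package, of pulling every bucket/key pair out of an S3 event notification rather than only the first record. The function name `extractS3Objects` is an assumption; the event shape (`Records[].s3.bucket.name`, `Records[].s3.object.key`) follows the handler above. Keys arrive URL-encoded with spaces sent as `+`, so they are decoded before use.

```javascript
// Hypothetical helper: returns [{ bucket, key }] for every S3 record in the event.
function extractS3Objects(event) {
  const records = (event && event.Records) || [];
  return records
    // Skip records that are not S3 notifications
    .filter((r) => r && r.s3 && r.s3.bucket && r.s3.object)
    .map((r) => ({
      bucket: r.s3.bucket.name,
      // Convert '+' to space first, then percent-decode
      key: decodeURIComponent(r.s3.object.key.replace(/\+/g, ' ')),
    }));
}
```

A handler could then loop over the result and scan each object in turn, instead of reading `event.Records[0]` only.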
+---
+## Environment variables
+```
+CLAMAV_HOST=clamav.internal
+CLAMAV_PORT=3310
+AWS_REGION=us-east-1
+S3_BUCKET=uploads-bucket
+QUARANTINE_BUCKET=quarantine-bucket
+```
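A minimal sketch, not taken from the package, of validating these variables once at startup so that a missing value fails fast instead of surfacing as a confusing S3 or clamd error later. The function name `loadS3ScanConfig` and the returned object shape are assumptions; the 3310 default matches the examples above.

```javascript
// Hypothetical helper: reads and validates the environment variables listed above.
function loadS3ScanConfig(env = process.env) {
  const required = ['CLAMAV_HOST', 'AWS_REGION', 'S3_BUCKET', 'QUARANTINE_BUCKET'];
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
  return {
    // CLAMAV_PORT is optional and falls back to the clamd default
    scanOpts: { host: env.CLAMAV_HOST, port: Number(env.CLAMAV_PORT) || 3310 },
    region: env.AWS_REGION,
    bucket: env.S3_BUCKET,
    quarantineBucket: env.QUARANTINE_BUCKET,
  };
}
```

The returned `scanOpts` object has the same `host`/`port` shape as the `SCAN_OPTS` objects used throughout this page.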

package/wiki/security-considerations.md ADDED

@@ -0,0 +1,192 @@
+# Security Considerations
+pompelmi is one layer in a secure file upload pipeline — not a complete solution on its own. This page covers what ClamAV protects against, what it does not, and how to build a genuinely secure upload endpoint.
+---
+## What ClamAV detects
+ClamAV is a signature-based antivirus. It detects:
+- **Known malware** — executables, scripts, and documents matching its signature database
+- **Known malware inside archives** — ZIP, RAR, TAR, PDF, Office documents (recursive scanning)
+- **EICAR test files** — for verifying your integration
+- **Some heuristic patterns** — suspicious bytecode, known malware families
+ClamAV does **not** detect:
+- **Zero-day malware** — novel malware without a signature
+- **Obfuscated malware** — some malware evades signature matching through packing or encryption
+- **Logic bombs** — malicious code that only activates under specific conditions
+- **Malicious content that is not malware** — spam, phishing text, NSFW images, copyright violations
+**ClamAV is a necessary but not sufficient safeguard.** Use it as one layer in a defence-in-depth strategy.
+---
+## Reject on `ScanError`
+`Verdict.ScanError` means the scan did not complete. Password-protected archives, corrupt files, and oversized archives all return `ScanError`. These are common evasion techniques — always reject `ScanError` files:
+```js
+if (result !== Verdict.Clean) {
+  fs.unlinkSync(filePath);
+  return res.status(422).json({ error: 'Upload rejected.' });
+}
+```
+Never serve a file whose safety status is unknown.
+---
+## Validate MIME type and extension
+ClamAV checks content, not metadata. But your application may have legitimate business reasons to restrict file types. Add MIME validation as a complementary check:
+```js
+const path = require('path');
+const ALLOWED_MIME_TYPES = new Set([
+  'application/pdf',
+  'image/jpeg',
+  'image/png',
+  'image/gif',
+  'image/webp',
+]);
+const ALLOWED_EXTENSIONS = new Set(['.pdf', '.jpg', '.jpeg', '.png', '.gif', '.webp']);
+function validateFile(file) {
+  const ext = path.extname(file.originalname).toLowerCase();
+  if (!ALLOWED_MIME_TYPES.has(file.mimetype)) {
+    throw new Error(`File type not allowed: ${file.mimetype}`);
+  }
+  if (!ALLOWED_EXTENSIONS.has(ext)) {
+    throw new Error(`File extension not allowed: ${ext}`);
+  }
+}
+```
+Note: the MIME type in `req.file.mimetype` is supplied by the client and can be spoofed. For stronger validation, use `file-type` to detect the real type from the file's magic bytes:
+```bash
+npm install file-type
+```
+```js
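+// file-type v17+ is ESM-only, so load it with a dynamic import() from CommonJS
+const { fileTypeFromBuffer } = await import('file-type');
+// req.file.buffer is only populated when multer uses memoryStorage()
+const detected = await fileTypeFromBuffer(req.file.buffer);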
+if (!ALLOWED_MIME_TYPES.has(detected?.mime)) {
+  throw new Error('File type mismatch or not allowed.');
+}
+```
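To illustrate what magic-byte detection means, here is a minimal built-in-only sketch, not taken from the package and not a replacement for `file-type`. It recognises only the five types in the allowlist above by their leading bytes; the names `SIGNATURES` and `sniffMime` are assumptions.

```javascript
// Hypothetical sniffer: leading-byte signatures for the allowlisted types.
const SIGNATURES = [
  { mime: 'application/pdf', bytes: [0x25, 0x50, 0x44, 0x46, 0x2d] },                  // "%PDF-"
  { mime: 'image/png',       bytes: [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a] },
  { mime: 'image/jpeg',      bytes: [0xff, 0xd8, 0xff] },
  { mime: 'image/gif',       bytes: [0x47, 0x49, 0x46, 0x38] },                        // "GIF8"
];

function sniffMime(buffer) {
  if (!Buffer.isBuffer(buffer)) return undefined;
  for (const { mime, bytes } of SIGNATURES) {
    if (buffer.length >= bytes.length && bytes.every((b, i) => buffer[i] === b)) {
      return mime;
    }
  }
  // WebP is a RIFF container: "RIFF", a 4-byte size, then "WEBP"
  if (
    buffer.length >= 12 &&
    buffer.toString('latin1', 0, 4) === 'RIFF' &&
    buffer.toString('latin1', 8, 12) === 'WEBP'
  ) {
    return 'image/webp';
  }
  return undefined; // unknown types should be rejected by the caller
}
```

The result can be checked against the same allowlist, for example `ALLOWED_MIME_TYPES.has(sniffMime(buf))`. A maintained library covers far more formats and edge cases than this sketch.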
+---
+## Set file size limits
+Never let a file reach ClamAV if it exceeds your maximum allowed size. Check size before scanning:
+```js
+// multer
+const upload = multer({
+  dest: './uploads',
+  limits: { fileSize: 10 * 1024 * 1024 }, // 10 MB
+});
+// Manual check
+if (req.file.size > 10 * 1024 * 1024) {
+  fs.unlinkSync(req.file.path);
+  return res.status(413).json({ error: 'File too large.' });
+}
+```
+Large files slow down scans and can exhaust ClamAV's memory when unpacking archives.
+---
+## Never serve files from the upload directory directly
+After a file is uploaded and scanned, do not serve it directly from the upload directory as a static file. Instead:
+1. Move it to a separate storage location (S3, a content-addressed store, or a named directory).
+2. Serve files through a route handler that validates authorisation before returning the file.
+Serving uploads as static files bypasses all access control and lets any user download any uploaded file if they know or guess the path.
+```js
+// BAD — serves all uploads publicly
+app.use('/uploads', express.static('./uploads'));
+// GOOD — validate before serving
+app.get('/files/:id', authenticate, async (req, res) => {
+  const file = await db.files.findById(req.params.id);
+  if (!file || file.userId !== req.user.id) {
+    return res.status(404).end();
+  }
+  res.sendFile(file.storagePath);
+});
+```
+---
+## Store files with randomised names
+Never use the original filename from the upload. Sanitise or replace it entirely to prevent path traversal, null byte injection, and social engineering attacks:
+```js
+const { randomBytes } = require('crypto');
+const path = require('path');
+function safeFilename(originalname) {
+  const ext = path.extname(originalname).toLowerCase().replace(/[^a-z0-9.]/g, '');
+  return `${randomBytes(16).toString('hex')}${ext}`;
+}
+const storedName = safeFilename(req.file.originalname);
+```
+---
+## OWASP file upload security checklist
+| Control | How to implement with pompelmi |
+|---------|-------------------------------|
+| Validate file type | `file-type` for magic bytes + MIME allowlist |
+| Validate file size | `multer limits.fileSize` + pre-scan size check |
+| Scan for malware | `scan()` / `scanBuffer()` / `scanStream()` |
+| Rename uploaded files | Generate random names — never use original filename |
+| Store outside webroot | Use S3 or a non-public directory |
+| Serve through auth-gated handler | Route handler with session/token check |
+| Limit upload rate | Express rate limiting middleware |
+| Log all upload attempts | Log verdict, user, IP, original filename |
+| Reject `ScanError` | `if (result !== Verdict.Clean)` → reject |
+| Set Content-Security-Policy | Prevent XSS from served HTML files |
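A minimal sketch, not taken from the package, of response headers that address the last row of the checklist when a stored upload is served back to a user. The function name `safeDownloadHeaders` is an assumption; the headers themselves are standard HTTP headers. Forcing a download and disabling content sniffing keeps an uploaded HTML or SVG file from rendering in the application's origin.

```javascript
// Hypothetical helper: headers for serving a stored upload as a download.
function safeDownloadHeaders(storedName) {
  // Keep the filename parameter to a conservative character set
  const safeName = String(storedName).replace(/[^A-Za-z0-9._-]/g, '_');
  return {
    'Content-Type': 'application/octet-stream',
    // "attachment" asks the browser to save the file rather than render it
    'Content-Disposition': `attachment; filename="${safeName}"`,
    // Stop the browser from guessing a different, renderable type
    'X-Content-Type-Options': 'nosniff',
    // If the response is rendered anyway, block scripts and subresources
    'Content-Security-Policy': "default-src 'none'; sandbox",
  };
}
```

In an Express handler these could be applied with `res.set(safeDownloadHeaders(name))` before sending the file.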
+---
+## Defence in depth
+pompelmi sits at layer 5 of a multi-layer defence:
+```
+Layer 1: TLS — encrypted transport
+Layer 2: Authentication — only authorised users can upload
+Layer 3: Size limits — reject oversized files before processing
+Layer 4: Extension / MIME allowlist — reject obviously wrong file types
+Layer 5: pompelmi — ClamAV signature scan
+Layer 6: Random storage name — no path traversal possible
+Layer 7: Auth-gated serving — no direct URL access to upload directory
+Layer 8: CSP headers — limit damage if a malicious file is served
+```
+If any one of these layers fails, the others still limit the damage. pompelmi does not replace them; it adds to them.
+---
+## Privacy and data handling
+pompelmi scans files on infrastructure you control. In local mode, files are passed to `clamscan` as a path argument. In TCP mode, file content is streamed to your own clamd instance. **No file content is sent to any third party**, which helps when working under GDPR, HIPAA, and other privacy-sensitive requirements.
+In TCP mode with `scanBuffer()` or `scanStream()`, no data is written to the application host's disk at all — the content goes directly from memory to the clamd daemon.