npm - pompelmi - Versions diffs - 1.5.0 → 1.7.0 - Mend

pompelmi 1.5.0 → 1.7.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (35)

package/README.md +113 -195
package/action/Dockerfile +24 -0
package/action/entrypoint.sh +23 -0
package/action/scanner.js +89 -0
package/action.yml +29 -0
package/llms.txt +22 -99
package/package.json +1 -1
package/pr_info.tmp +2 -0
package/release-notes-v1.4.0.md +25 -0
package/release-notes-v1.5.0.md +37 -0
package/src/BufferScanner.js +20 -17
package/src/ClamAVScanner.js +4 -4
package/src/ClamdScanner.js +18 -15
package/src/StreamScanner.js +20 -17
package/wiki/api-reference.md +268 -0
package/wiki/cli-usage.md +263 -0
package/wiki/concurrent-scanning.md +199 -0
package/wiki/docker-compose-production.md +190 -0
package/wiki/docker-setup.md +178 -0
package/wiki/error-handling.md +242 -0
package/wiki/express-integration.md +227 -0
package/wiki/fastify-integration.md +207 -0
package/wiki/home.md +0 -0
package/wiki/local-vs-tcp-mode.md +179 -0
package/wiki/multer-memory-storage.md +166 -0
package/wiki/nestjs-integration.md +228 -0
package/wiki/nextjs-integration.md +209 -0
package/wiki/performance.md +178 -0
package/wiki/quarantine-workflow.md +260 -0
package/wiki/rest-api-server.md +297 -0
package/wiki/s3-integration.md +233 -0
package/wiki/security-considerations.md +192 -0
package/wiki/typescript-usage.md +239 -0
package/wiki/verdicts.md +192 -0
package/wiki/virus-definitions.md +194 -0

package/wiki/docker-compose-production.md ADDED

@@ -0,0 +1,190 @@
+# Docker Compose — Production Setup
+Production-grade docker-compose configuration for running pompelmi with a ClamAV sidecar. This setup includes health checks, restart policy, persistent virus definition storage, and environment variable configuration.
+---
+## Complete `docker-compose.yml`
+```yaml
+services:
+  app:
+    build: .
+    ports:
+      - "3000:3000"
+    environment:
+      NODE_ENV: production
+      CLAMAV_HOST: clamav
+      CLAMAV_PORT: "3310"
+      CLAMAV_TIMEOUT: "30000"
+    depends_on:
+      clamav:
+        condition: service_healthy
+    restart: unless-stopped
+    volumes:
+      - uploads:/app/uploads
+  clamav:
+    image: clamav/clamav:stable
+    ports:
+      - "3310:3310"
+    restart: unless-stopped
+    volumes:
+      - clamav_db:/var/lib/clamav    # persist virus definitions across restarts
+    healthcheck:
+      test: ["CMD", "clamdcheck"]
+      interval: 30s
+      timeout: 10s
+      retries: 5
+      start_period: 120s             # first start downloads ~300 MB of definitions
+volumes:
+  clamav_db:
+  uploads:
+```
+---
+## Application code
+Read options from environment variables so the same image works in all environments:
+```js
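+const { scan } = require('pompelmi');
+const SCAN_OPTS = {
+  host:    process.env.CLAMAV_HOST    || '127.0.0.1',
+  port:    Number(process.env.CLAMAV_PORT)    || 3310,
+  timeout: Number(process.env.CLAMAV_TIMEOUT) || 15_000,
+};
+// CommonJS has no top-level await, so run the scan inside an async function.
+async function main() {
+  // Path matches the `uploads:/app/uploads` volume mount in the compose file.
+  const result = await scan('/app/uploads/file.pdf', SCAN_OPTS);
+  console.log(result);
+}
+main().catch(console.error);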
+```
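A minimal sketch, not taken from the package, of stricter environment parsing. The `Number(...) || default` pattern above silently falls back to the default when a variable contains a typo (for example `331O`), which can hide a misconfigured deployment. The helper name `loadScanOptions` is hypothetical; the variable names and defaults are the ones used above.

```js
// Hypothetical helper (not part of pompelmi): build scan options from env vars
// and fail fast on malformed values instead of silently using defaults.
function loadScanOptions(env = process.env) {
  // Parse a positive integer; fall back only when the variable is unset or empty.
  const toPositiveInt = (name, fallback) => {
    const raw = env[name];
    if (raw === undefined || raw === '') return fallback;
    const value = Number(raw);
    if (!Number.isInteger(value) || value <= 0) {
      throw new Error(`${name} must be a positive integer, got "${raw}"`);
    }
    return value;
  };

  const port = toPositiveInt('CLAMAV_PORT', 3310);
  if (port > 65535) throw new Error(`CLAMAV_PORT out of range: ${port}`);

  return {
    host: env.CLAMAV_HOST || '127.0.0.1',
    port,
    timeout: toPositiveInt('CLAMAV_TIMEOUT', 15000),
  };
}
```

Calling `loadScanOptions()` once at startup and passing the result to every scan call means a bad value stops the container at boot instead of surfacing later as a confusing connection error.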
+---
+## Dockerfile example
+```dockerfile
+FROM node:22-alpine
+WORKDIR /app
+COPY package*.json ./
+RUN npm ci --omit=dev
+COPY . .
+RUN mkdir -p uploads
+EXPOSE 3000
+CMD ["node", "src/server.js"]
+```
+Note: `clamscan` does **not** need to be installed in the application container when using TCP mode. The ClamAV sidecar handles all scanning.
+---
+## Health check explanation
+`clamdcheck` is a shell script bundled inside the `clamav/clamav:stable` image. It sends a `PING` to the local clamd socket and checks the response. The `start_period: 120s` gives clamd time to download virus definitions on first start before health checks begin counting failures.
+If you need to check from outside the container, you can also use TCP:
+```bash
+echo -n "PING" | nc -q1 localhost 3310
+# expected response: PONG
+```
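The same TCP check can be done from Node.js, for example as an application readiness probe where `nc` is not installed. This is a sketch under assumptions rather than anything shipped with pompelmi: `pingClamd` is a hypothetical name, and it uses the NUL-terminated `zPING` form of the clamd command, to which clamd is expected to answer `PONG`.

```js
const net = require('node:net');

// Hypothetical helper: resolves true if clamd answers PONG, and false on
// connection refusal, idle timeout, or any other reply.
function pingClamd(host, port, timeoutMs = 5000) {
  return new Promise((resolve) => {
    const socket = net.createConnection({ host, port });
    let reply = '';
    let settled = false;

    // Resolve exactly once and release the socket.
    const finish = (ok) => {
      if (settled) return;
      settled = true;
      socket.destroy();
      resolve(ok);
    };

    socket.setTimeout(timeoutMs, () => finish(false));
    socket.on('connect', () => socket.write('zPING\0'));
    socket.on('data', (chunk) => {
      reply += chunk.toString('latin1');
      // A NUL byte marks the end of the reply to a z-prefixed command.
      if (reply.includes('\0')) finish(reply.startsWith('PONG'));
    });
    socket.on('error', () => finish(false));
    socket.on('close', () => finish(reply.startsWith('PONG')));
  });
}
```

A readiness endpoint could call `pingClamd(process.env.CLAMAV_HOST, 3310)` and report unhealthy whenever it resolves `false`.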
+---
+## `depends_on` with health check
+```yaml
+depends_on:
+  clamav:
+    condition: service_healthy
+```
+This prevents the application container from starting until clamd passes its health check. Without this, your app may start and immediately fail its first scan with "connection refused."
+---
+## Scaling considerations
+### Vertical scaling
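+Each scan runs on a single clamd thread, so extra CPU cores do not speed up an individual scan. A single clamd instance can still serve several scans concurrently (bounded by `MaxThreads` in `clamd.conf`); once one instance is saturated, run multiple clamd containers behind a load balancer.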
+### Horizontal scaling
+```yaml
+services:
+  clamav:
+    image: clamav/clamav:stable
+    deploy:
+      replicas: 3
+    volumes:
+      - clamav_db:/var/lib/clamav
+```
+Point your application at a load balancer in front of the clamd replicas. Note: each clamd replica downloads its own virus database on startup unless you share the volume (which requires care with concurrent freshclam writes).
+### Alternative: one clamd per app instance
+For simpler setups, co-deploy one clamd container with each app container. Each pair shares a `clamav_db` volume scoped to the pair.
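Where no load balancer is available, the application can rotate across clamd hosts itself. The sketch below swaps in client-side round-robin for the load balancer described above; it is not a pompelmi feature. `createHostPicker` and the host names `clamav-1` and `clamav-2` are hypothetical, and the returned object is simply the `host`/`port`/`timeout` options shape used elsewhere in this page.

```js
// Hypothetical helper: returns a function that yields scan options pointing at
// the next clamd host in round-robin order.
function createHostPicker(hosts, baseOpts = {}) {
  if (!Array.isArray(hosts) || hosts.length === 0) {
    throw new Error('hosts must be a non-empty array');
  }
  let next = 0;
  return function pick() {
    const host = hosts[next];
    next = (next + 1) % hosts.length; // wrap around after the last host
    return { ...baseOpts, host };
  };
}
```

Usage would look like `const pick = createHostPicker(['clamav-1', 'clamav-2'], { port: 3310 })` followed by `scan(filePath, pick())`. Round-robin does not skip unhealthy hosts, so a real deployment would still need health checking or retries on rejection.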
+---
+## Resource limits
+ClamAV can use significant memory when unpacking large archives:
+```yaml
+clamav:
+  image: clamav/clamav:stable
+  deploy:
+    resources:
+      limits:
+        memory: 1g
+        cpus: '1.0'
+```
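+Tune the limit to your expected file sizes; scanning archives that expand to more than 100 MB may require more. clamd also keeps the full signature database loaded in memory, so a limit set too low can get the container OOM-killed while it loads or reloads definitions.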
+---
+## Keeping virus definitions fresh
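+The `clamav/clamav:stable` image runs `freshclam` on a schedule automatically. To check whether definitions are current, run `freshclam` in verbose mode; it reports the state of each database and downloads anything that is out of date:
+```bash
+docker compose exec clamav freshclam --verbose
+```
+To trigger a manual update without the extra output: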
+```bash
+docker compose exec clamav freshclam
+```
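+To force a definition refresh without restarting the app container, restart only the clamav container (freshclam updates on startup). Scans fail while clamd is restarting and reloading its database, so this is zero-downtime only if another clamd instance can serve requests in the meantime: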
+```bash
+docker compose restart clamav
+```
+The named volume preserves the downloaded definitions across restarts, so only incremental updates are downloaded after the first start.
+---
+## Production checklist
+- [ ] `restart: unless-stopped` on both services
+- [ ] `healthcheck` configured on clamav with `start_period` ≥ 90s
+- [ ] `depends_on: condition: service_healthy` on app
+- [ ] `CLAMAV_HOST`, `CLAMAV_PORT`, `CLAMAV_TIMEOUT` via environment variables
+- [ ] Named volume for `clamav_db` (not anonymous)
+- [ ] File size limits in your HTTP server (before the scan is reached)
+- [ ] Upload directory in a named volume (survives container restarts)
+- [ ] Log aggregation: capture `app` and `clamav` container logs

package/wiki/docker-setup.md ADDED

@@ -0,0 +1,178 @@
+# Docker Setup
+Run ClamAV as a Docker sidecar so your application host requires no local ClamAV installation. pompelmi's TCP mode streams files directly to the clamd daemon — the API is identical to local mode.
+---
+## Why a Docker sidecar?
+- **No local install** — the application container stays lean; ClamAV and its virus definitions live in a dedicated sidecar.
+- **Always up-to-date definitions** — the official `clamav/clamav:stable` image runs `freshclam` on startup and periodically refreshes the database.
+- **Isolation** — ClamAV runs in its own process/container; a crash or restart does not affect your application.
+- **Consistent environments** — same image in development, staging, and production.
+---
+## docker-compose.yml
+```yaml
+services:
+  app:
+    build: .
+    ports:
+      - "3000:3000"
+    environment:
+      CLAMAV_HOST: clamav
+      CLAMAV_PORT: 3310
+    depends_on:
+      clamav:
+        condition: service_healthy
+  clamav:
+    image: clamav/clamav:stable
+    ports:
+      - "3310:3310"
+    restart: unless-stopped
+    volumes:
+      - clamav_db:/var/lib/clamav   # persist virus definitions across restarts
+    healthcheck:
+      test: ["CMD", "clamdcheck"]   # bundled check script in clamav/clamav image
+      interval: 30s
+      timeout: 10s
+      retries: 5
+      start_period: 120s            # freshclam download takes time on first boot
+volumes:
+  clamav_db:
+```
+> **First boot:** The image downloads the full virus database (~300 MB) before clamd starts accepting connections. `start_period: 120s` gives it time. On subsequent restarts the database is read from the volume instead of being downloaded again, so startup is much faster.
+---
+## Pointing pompelmi at clamd
+Pass `host` and `port` to any pompelmi function. No other code changes are needed.
+```js
+const { scan, scanBuffer, scanStream, scanDirectory, Verdict } = require('pompelmi');
+const CLAMAV_OPTS = {
+  host: process.env.CLAMAV_HOST || '127.0.0.1',
+  port: Number(process.env.CLAMAV_PORT) || 3310,
+  timeout: 30_000,  // ms — increase for large files
+};
+// scan a file by path
+const fileResult = await scan('/uploads/report.pdf', CLAMAV_OPTS);
+// scan an in-memory Buffer (multer memoryStorage)
+const bufferResult = await scanBuffer(req.file.buffer, CLAMAV_OPTS);
+// scan a Readable stream (S3, HTTP, pipes); this call is AWS SDK v2 style
+const stream = s3.getObject({ Bucket, Key }).createReadStream();
+const streamResult = await scanStream(stream, CLAMAV_OPTS);
+// recursively scan a directory
+const dirResults = await scanDirectory('/uploads', CLAMAV_OPTS);
+```
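+`scan()`, `scanBuffer()`, and `scanStream()` resolve to one of the `Verdict.Clean`, `Verdict.Malicious`, or `Verdict.ScanError` Symbols; `scanDirectory()` resolves to a results object that collects per-file failures in an `errors` array. No code changes are required when switching between local and TCP mode.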
+---
+## Configuring timeout for large files
+The `timeout` option sets the socket idle timeout in milliseconds (default: 15 000 ms). Increase it when scanning large archives or slow network links.
+```js
+const result = await scan('/uploads/large-archive.zip', {
+  host: 'clamav',
+  port: 3310,
+  timeout: 120_000,  // 2 minutes
+});
+```
+If clamd takes longer than `timeout` ms without sending data, pompelmi rejects with:
+```
+clamd connection timed out after 120000ms
+```
+---
+## Production tips
+### Health checks
+The `healthcheck` in the example above uses the `clamdcheck` script bundled in the official image. Your application container uses `depends_on: condition: service_healthy` so it only starts once clamd is ready.
+### Restart policy
+```yaml
+restart: unless-stopped
+```
+This ensures clamd comes back up after host reboots or OOM kills without manual intervention.
+### Persisting virus definitions
+The named volume `clamav_db` mounts to `/var/lib/clamav` inside the container. This means:
+- First start downloads definitions once (~300 MB).
+- Subsequent restarts reuse the cache; `freshclam` only downloads incremental updates.
+- The volume survives `docker compose down` (use `docker compose down -v` to wipe it).
+### Resource limits (optional)
+ClamAV can be memory-hungry when scanning large ZIP archives. Set a limit if needed:
+```yaml
+clamav:
+  image: clamav/clamav:stable
+  deploy:
+    resources:
+      limits:
+        memory: 1g
+```
+---
+## Troubleshooting
+### clamd not ready on startup
+**Symptom:** Application starts before clamd is accepting connections; first scan fails with connection refused.
+**Fix:** Add `depends_on` with `condition: service_healthy` (see example above) and ensure the `healthcheck` is configured on the clamav service. The `start_period` must be long enough for the initial database download.
+### Connection refused
+**Symptom:** `ECONNREFUSED 127.0.0.1:3310`
+**Causes and fixes:**
+1. clamd container is not running: run `docker compose ps` to check.
+2. Wrong host: if the app is inside Docker, use the service name (`clamav`), not `127.0.0.1`.
+3. Port not published: this only matters when the app runs on the host rather than in the Compose network; in that case verify the `ports` mapping in `docker-compose.yml`.
+4. clamd is still loading the virus database: add the `healthcheck` and `depends_on` described above.
+### Timeout errors
+**Symptom:** `clamd connection timed out after 15000ms`
+**Fixes:**
+1. Increase `timeout` in the options object (e.g. `timeout: 60_000`).
+2. Check clamd resource limits — if it is CPU- or memory-constrained it will scan slowly.
+3. Check network latency between app and clamav containers.
+### Virus definitions out of date
+The official image runs `freshclam` periodically. If you see scan errors mentioning outdated definitions, exec into the container and run it manually:
+```bash
+docker compose exec clamav freshclam
+```
+Or restart the container; `freshclam` runs at startup.

package/wiki/error-handling.md ADDED

@@ -0,0 +1,242 @@
+# Error Handling
+pompelmi has two distinct failure modes that require different handling: **rejected Promises** (the function threw) and **`Verdict.ScanError`** (the scan completed but could not determine safety). Understanding the difference is critical to building a secure upload pipeline.
+---
+## The two failure modes
+### 1. Rejected Promise (the scan function threw)
+`scan()`, `scanBuffer()`, `scanStream()`, and `scanDirectory()` reject when something prevents the scan from running at all — not when the scan completes and finds a problem.
+Common rejection causes:
+| Error message | Cause |
+|---------------|-------|
+| `filePath must be a string` | Wrong argument type |
+| `File not found: <path>` | File does not exist |
+| `ENOENT` | `clamscan` not installed or not in PATH |
+| `Unexpected exit code: N` | ClamAV internal error |
+| `Process killed by signal: SIGTERM` | Process killed (OOM, timeout) |
+| `clamd connection timed out after Nms` | TCP timeout exceeded |
+| `buffer must be a Buffer` | Wrong argument to `scanBuffer()` |
+| `stream must be a Readable` | Wrong argument to `scanStream()` |
+| `dirPath must be a string` | Wrong argument to `scanDirectory()` |
+| `Directory not found: <path>` | Directory does not exist |
+These are programming errors or infrastructure failures. Handle them with `try/catch`.
+### 2. `Verdict.ScanError` (scan completed, result unknown)
+A scan that ends in `Verdict.ScanError` resolves rather than rejects: ClamAV ran but could not produce a clean/malicious verdict. Common causes: encrypted archives, corrupt files, permission errors, I/O issues.
+---
+## The secure default: reject on both
+The safest policy: any outcome other than `Verdict.Clean` results in rejection.
+```js
+const { scan, Verdict } = require('pompelmi');
+const fs = require('fs');
+async function scanAndAccept(filePath) {
+  try {
+    const result = await scan(filePath, { host: 'clamav', port: 3310 });
+    if (result === Verdict.Malicious) {
+      fs.unlinkSync(filePath);
+      throw new Error('Malicious file rejected.');
+    }
+    if (result === Verdict.ScanError) {
+      fs.unlinkSync(filePath);
+      throw new Error('Scan incomplete — file rejected as precaution.');
+    }
+    return filePath; // Verdict.Clean
+  } catch (err) {
+    // Covers both scan() rejections and our own thrown Errors above.
+    // Delete the file defensively if it still exists.
+    try { fs.unlinkSync(filePath); } catch {}
+    throw err;
+  }
+}
+```
+---
+## When to retry `ScanError`
+A `ScanError` caused by a transient network blip or a momentary clamd overload is worth one retry. A `ScanError` caused by a corrupt file or encrypted archive will always return `ScanError` — retrying wastes time.
+```js
+const { scan, Verdict } = require('pompelmi');
+async function scanWithRetry(filePath, opts, retries = 1) {
+  for (let attempt = 0; attempt <= retries; attempt++) {
+    try {
+      const result = await scan(filePath, opts);
+      // Return Clean/Malicious at once; return ScanError only on the final attempt.
+      if (result !== Verdict.ScanError || attempt === retries) {
+        return result;
+      }
+    } catch (err) {
+      // Rethrow once the attempts are used up.
+      if (attempt === retries) throw err;
+    }
+    // Pause briefly before retrying, after a ScanError or a rejection alike.
+    await new Promise(r => setTimeout(r, 500));
+  }
+}
+```
+Do not retry `Verdict.Malicious` — the signature match is deterministic.
+---
+## Cleanup with `finally`
+Ensure temp files are always deleted regardless of scan outcome:
+```js
+const crypto = require('crypto');
+const os   = require('os');
+const fs   = require('fs');
+const path = require('path');
+const { scan, Verdict } = require('pompelmi');
+async function scanBufferManual(buffer) {
+  // Random suffix: Date.now() alone can collide when two scans start in the same millisecond.
+  const tmpPath = path.join(os.tmpdir(), `scan-${crypto.randomUUID()}.tmp`);
+  fs.writeFileSync(tmpPath, buffer);
+  try {
+    return await scan(tmpPath);
+  } finally {
+    try { fs.unlinkSync(tmpPath); } catch {}
+  }
+}
+```
+`scanBuffer()` handles this `finally` pattern internally in local mode — you don't need to replicate it when using the API directly.
+---
+## Express error handling pattern
+```js
+const express = require('express');
+const multer  = require('multer');
+const fs      = require('fs');
+const { scan, Verdict } = require('pompelmi');
+const app    = express();
+const upload = multer({ dest: './uploads', limits: { fileSize: 10 * 1024 * 1024 } });
+app.post('/upload', upload.single('file'), async (req, res, next) => {
+  if (!req.file) return res.status(400).json({ error: 'No file uploaded.' });
+  try {
+    const result = await scan(req.file.path, { host: 'clamav', port: 3310 });
+    if (result === Verdict.Malicious) {
+      fs.unlinkSync(req.file.path);
+      return res.status(422).json({ error: 'Malicious file rejected.' });
+    }
+    if (result === Verdict.ScanError) {
+      fs.unlinkSync(req.file.path);
+      return res.status(422).json({ error: 'Scan failed — file rejected.' });
+    }
+    return res.json({ ok: true, filename: req.file.filename });
+  } catch (err) {
+    try { fs.unlinkSync(req.file.path); } catch {}
+    next(err); // forward to Express error middleware
+  }
+});
+// Global error handler
+app.use((err, req, res, next) => {
+  console.error(err);
+  res.status(500).json({ error: 'Internal scan error.' });
+});
+```
+---
+## Logging best practices
+Log rejections with enough context to investigate later — but never log file contents.
+```js
+const logger = require('./logger'); // pino, winston, etc.
+if (result === Verdict.Malicious) {
+  logger.warn({
+    event:       'malware_detected',
+    filePath,
+    originalname: req.file.originalname,
+    mimetype:     req.file.mimetype,
+    size:         req.file.size,
+    userId:       req.user?.id,
+    ip:           req.ip,
+  });
+}
+```
+For `ScanError`:
+```js
+if (result === Verdict.ScanError) {
+  logger.warn({
+    event:    'scan_error',
+    filePath,
+    mimetype: req.file.mimetype,
+    size:     req.file.size,
+  });
+}
+```
+For scan function rejections:
+```js
+} catch (err) {
+  logger.error({
+    event:   'scan_threw',
+    message: err.message,
+    filePath,
+  });
+}
+```
+---
+## HTTP status code conventions
+| Situation | Recommended status |
+|-----------|-------------------|
+| No file in request | `400 Bad Request` |
+| Wrong argument (programming error) | `400 Bad Request` |
+| `Verdict.Malicious` | `422 Unprocessable Entity` |
+| `Verdict.ScanError` (reject policy) | `422 Unprocessable Entity` |
+| `scan()` throws (infra error) | `500 Internal Server Error` |
+| File too large (pre-scan) | `413 Content Too Large` |
+| `Verdict.Clean` | `200 OK` / `201 Created` |
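The verdict rows of the table can be centralised in one small function so every route answers consistently. This is a sketch, not part of pompelmi: `createStatusMapper` is a hypothetical name, and it takes the `Verdict` object as an argument (in an application, the one exported by `require('pompelmi')`) so the mapping stays easy to unit-test. Any value it does not recognise is treated as a failure.

```js
// Hypothetical helper: maps a scan outcome (a Verdict Symbol, or the Error from
// a rejected scan) to an HTTP status, following the table above.
function createStatusMapper(Verdict) {
  return function statusFor(outcome) {
    if (outcome instanceof Error) return 500; // scan function rejected
    switch (outcome) {
      case Verdict.Clean:
        return 200;
      case Verdict.Malicious:
      case Verdict.ScanError:
        return 422; // reject policy: anything not clean is refused
      default:
        return 500; // unknown value: fail closed
    }
  };
}
```

In a route handler this could be used as `res.status(statusFor(result))`, with the rejection case handled by passing the caught error to the same function.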
+---
+## `scanDirectory()` error handling
+Per-file failures in `scanDirectory()` go into the `errors` array — the function itself only rejects on argument errors or missing directory.
+```js
+const { scanDirectory } = require('pompelmi');
+try {
+  const results = await scanDirectory('/uploads');
+  if (results.errors.length > 0) {
+    logger.warn({ event: 'scan_errors', paths: results.errors });
+    // Decide: reject the whole batch, or only reject the errored files
+  }
+} catch (err) {
+  // dirPath not a string, or directory not found
+  logger.error({ event: 'scan_threw', message: err.message });
+}
+```