npm - pompelmi - Versions diffs - 1.5.0 → 1.6.0 - Mend

pompelmi 1.5.0 → 1.6.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (30)

package/README.md +53 -59
package/llms.txt +22 -99
package/package.json +1 -1
package/release-notes-v1.4.0.md +25 -0
package/release-notes-v1.5.0.md +37 -0
package/src/BufferScanner.js +20 -17
package/src/ClamAVScanner.js +4 -4
package/src/ClamdScanner.js +18 -15
package/src/StreamScanner.js +20 -17
package/wiki/api-reference.md +268 -0
package/wiki/cli-usage.md +263 -0
package/wiki/concurrent-scanning.md +199 -0
package/wiki/docker-compose-production.md +190 -0
package/wiki/docker-setup.md +178 -0
package/wiki/error-handling.md +242 -0
package/wiki/express-integration.md +227 -0
package/wiki/fastify-integration.md +207 -0
package/wiki/home.md +0 -0
package/wiki/local-vs-tcp-mode.md +179 -0
package/wiki/multer-memory-storage.md +166 -0
package/wiki/nestjs-integration.md +228 -0
package/wiki/nextjs-integration.md +209 -0
package/wiki/performance.md +178 -0
package/wiki/quarantine-workflow.md +260 -0
package/wiki/rest-api-server.md +297 -0
package/wiki/s3-integration.md +233 -0
package/wiki/security-considerations.md +192 -0
package/wiki/typescript-usage.md +239 -0
package/wiki/verdicts.md +192 -0
package/wiki/virus-definitions.md +194 -0

package/wiki/concurrent-scanning.md ADDED

@@ -0,0 +1,199 @@
+# Concurrent Scanning
+Scanning multiple files in parallel improves throughput but introduces tradeoffs around resource usage, partial failures, and connection limits. This page covers the main patterns.
+---
+## `Promise.all` — scan multiple files in parallel
+`Promise.all` runs all scans concurrently and resolves when every scan completes. If any scan rejects (throws), the entire `Promise.all` rejects immediately; the scans still in flight are not cancelled and keep running in the background.
+```js
+const { scan, Verdict } = require('pompelmi');
+const files = [
+  '/uploads/document.pdf',
+  '/uploads/photo.jpg',
+  '/uploads/archive.zip',
+];
+const results = await Promise.all(files.map(f => scan(f)));
+results.forEach((result, i) => {
+  if (result === Verdict.Malicious) {
+    console.log(`${files[i]} is malicious.`);
+  }
+});
+```
+Use `Promise.all` when:
+- All files must be accepted for the request to succeed.
+- You want to fail fast if any scan throws.
+---
+## `Promise.allSettled` — partial failures
+`Promise.allSettled` waits for all scans to complete regardless of individual failures. Each result has a `status` of `'fulfilled'` or `'rejected'`.
+```js
+const { scan, Verdict } = require('pompelmi');
+const files = ['/uploads/a.pdf', '/uploads/b.zip', '/uploads/c.png'];
+// allSettled preserves input order, so settled[i] belongs to files[i].
+const settled = await Promise.allSettled(files.map(f => scan(f)));
+const accepted = [];
+const rejected = [];
+settled.forEach((r, i) => {
+  const path = files[i];
+  if (r.status === 'rejected') {
+    // scan() threw: keep the path so the failure can be reported per file
+    rejected.push({ path, reason: r.reason.message });
+    return;
+  }
+  if (r.value === Verdict.Clean) {
+    accepted.push(path);
+  } else {
+    rejected.push({ path, reason: r.value.description });
+  }
+});
+console.log({ accepted, rejected });
+```
+Use `Promise.allSettled` when:
+- You want to process as many files as possible even if some fail.
+- You need to report which specific files were rejected.
+---
+## `scanDirectory()` — scan an entire folder
+`scanDirectory()` handles concurrent scanning of every file in a directory internally. It catches per-file errors and collects them into the `errors` array rather than throwing.
+```js
+const fs = require('fs');
+const { scanDirectory } = require('pompelmi');
+const results = await scanDirectory('/uploads', {
+  host: process.env.CLAMAV_HOST,
+  port: 3310,
+});
+console.log(`Clean: ${results.clean.length}`);
+console.log(`Malicious: ${results.malicious.length}`);
+console.log(`Errors: ${results.errors.length}`);
+// Auto-delete malicious files
+results.malicious.forEach(f => fs.unlinkSync(f));
+```
+Use `scanDirectory()` when:
+- You have an existing folder of files to audit.
+- You want a single-call interface with clean/malicious/errors output.
+---
+## Rate limiting concurrent scans with `p-limit`
+Unbounded `Promise.all` with a large number of files can overwhelm clamd or exhaust the OS file descriptor limit. Use `p-limit` to cap concurrency.
+```bash
+npm install p-limit@3   # v3 is the last CommonJS release; v4+ is ESM-only
+```
+```js
+const pLimit = require('p-limit');
+const { scan, Verdict } = require('pompelmi');
+const files = getFilePaths(); // array of N paths
+const limit = pLimit(5);      // at most 5 concurrent scans
+const results = await Promise.all(
+  files.map(f => limit(() => scan(f, { host: 'clamav', port: 3310 })))
+);
+```
+Recommended concurrency limits:
+| Mode | Suggested concurrency |
+|------|----------------------|
+| Local (`clamscan`) | 2–4 (CPU-bound) |
+| TCP (single clamd) | 5–10 |
+| TCP (multiple clamd replicas) | 20–50 |
+Tune based on your hardware and observed clamd CPU usage.
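If adding a dependency is not desirable, the same cap can be sketched with a small worker loop. `mapWithConcurrency` below is a hypothetical helper, not part of pompelmi or p-limit. It takes the scan function as a parameter, so any async function can stand in for `scan`.

```javascript
// Hypothetical helper (not part of pompelmi or p-limit).
// Runs fn over items with at most `limit` calls in flight.
// Results keep the order of the input array.
async function mapWithConcurrency(items, limit, fn) {
  const results = new Array(items.length);
  let next = 0; // index of the next item to hand out

  async function worker() {
    // Each worker pulls the next unclaimed index until none are left.
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i], i);
    }
  }

  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

// Usage with pompelmi (not executed here):
// const verdicts = await mapWithConcurrency(files, 5,
//   f => scan(f, { host: 'clamav', port: 3310 }));
```

Like `Promise.all`, this sketch rejects as soon as one call throws. Wrap `fn` in a `try`/`catch` that returns the error if per-file failures should be collected instead.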
+---
+## Concurrently scanning buffers
+```js
+const { scanBuffer, Verdict } = require('pompelmi');
+// req.files from multer.array()
+const results = await Promise.allSettled(
+  req.files.map(file =>
+    scanBuffer(file.buffer, { host: 'clamav', port: 3310 })
+      .then(verdict => ({ name: file.originalname, verdict }))
+  )
+);
+```
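The snippet above stops once the settled results are collected. A minimal sketch of turning them into a per-file report follows. `summarizeSettled` is a hypothetical helper; the clean verdict is passed in as an argument (it would be `Verdict.Clean` with pompelmi), and verdicts are assumed to be Symbols so that `.description` yields a readable reason.

```javascript
// Hypothetical helper: pairs each settled result with its upload name by index,
// relying on Promise.allSettled preserving input order.
function summarizeSettled(names, settled, cleanVerdict) {
  const accepted = [];
  const rejected = [];
  settled.forEach((r, i) => {
    const name = names[i];
    if (r.status === 'rejected') {
      // The scan itself failed: report the error message.
      const err = r.reason;
      rejected.push({ name, reason: err instanceof Error ? err.message : String(err) });
    } else if (r.value.verdict === cleanVerdict) {
      accepted.push(name);
    } else {
      // Any non-clean verdict is treated as a rejection.
      rejected.push({ name, reason: String(r.value.verdict.description) });
    }
  });
  return { accepted, rejected };
}

// Usage with the multer example (not executed here):
// summarizeSettled(req.files.map(f => f.originalname), results, Verdict.Clean);
```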
+---
+## Performance considerations
+### Local mode
+Each `scan()` in local mode spawns a `clamscan` child process. Spawning processes is expensive — ClamAV loads its virus database into memory on each invocation. For high-throughput local scanning, consider switching to TCP mode where a persistent `clamd` daemon keeps the database in memory.
+### TCP mode
+In TCP mode, pompelmi opens a new TCP connection per scan call. For sustained high-throughput workloads, the connection overhead is measurable. Options:
+1. **Increase concurrency gradually** — start at 5, measure clamd CPU, increase until you see degradation.
+2. **Scale clamd horizontally** — run multiple clamd containers behind a load balancer.
+3. **Connection pooling** — pompelmi does not pool connections. For extremely high throughput, implement a connection pool that keeps sockets open and reuses them.
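Where a load balancer is not available for option 2, scans can also be spread across replicas on the client side. This is a sketch under assumptions: `createRoundRobin` is a hypothetical helper, the host names are placeholders, and each returned object is meant to be passed as the `host`/`port` options used elsewhere on this page.

```javascript
// Hypothetical helper: cycles through clamd endpoints so that consecutive
// scans go to different replicas. It does no health checking.
function createRoundRobin(endpoints) {
  if (!Array.isArray(endpoints) || endpoints.length === 0) {
    throw new Error('at least one endpoint is required');
  }
  let index = 0;
  return function nextEndpoint() {
    const endpoint = endpoints[index];
    index = (index + 1) % endpoints.length;
    return { ...endpoint }; // copy so callers cannot mutate the shared list
  };
}

// Placeholder host names; replace with your own replica addresses.
const nextClamd = createRoundRobin([
  { host: 'clamav-1', port: 3310 },
  { host: 'clamav-2', port: 3310 },
]);

// Usage with pompelmi (not executed here):
// const verdict = await scan(filePath, nextClamd());
```

Because the rotation ignores replica health, a stopped replica still receives its share of scans, so pair this with error handling on the caller side.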
+### Memory
+`scanBuffer()` holds the full file content in memory. For large files (>50 MB), prefer `scan()` (from disk) or `scanStream()` (streaming, no full buffering in TCP mode).
+---
+## Example: batch-scan upload queue
+```js
+const pLimit = require('p-limit');
+const { scan, Verdict } = require('pompelmi');
+const fs = require('fs');
+async function processBatch(filePaths) {
+  const limit = pLimit(8);
+  const results = await Promise.allSettled(
+    filePaths.map(filePath =>
+      limit(async () => {
+        const verdict = await scan(filePath, { host: 'clamav', port: 3310 });
+        return { filePath, verdict };
+      })
+    )
+  );
+  for (const r of results) {
+    if (r.status === 'rejected') {
+      console.error('Scan error:', r.reason.message);
+      continue;
+    }
+    const { filePath, verdict } = r.value;
+    if (verdict !== Verdict.Clean) {
+      fs.unlinkSync(filePath);
+      console.warn('Rejected:', filePath, verdict.description);
+    }
+  }
+}
+```

package/wiki/docker-compose-production.md ADDED

@@ -0,0 +1,190 @@
+# Docker Compose — Production Setup
+Production-grade docker-compose configuration for running pompelmi with a ClamAV sidecar. This setup includes health checks, restart policy, persistent virus definition storage, and environment variable configuration.
+---
+## Complete `docker-compose.yml`
+```yaml
+services:
+  app:
+    build: .
+    ports:
+      - "3000:3000"
+    environment:
+      NODE_ENV: production
+      CLAMAV_HOST: clamav
+      CLAMAV_PORT: "3310"
+      CLAMAV_TIMEOUT: "30000"
+    depends_on:
+      clamav:
+        condition: service_healthy
+    restart: unless-stopped
+    volumes:
+      - uploads:/app/uploads
+  clamav:
+    image: clamav/clamav:stable
+    ports:
+      - "3310:3310"
+    restart: unless-stopped
+    volumes:
+      - clamav_db:/var/lib/clamav    # persist virus definitions across restarts
+    healthcheck:
+      test: ["CMD", "clamdcheck"]
+      interval: 30s
+      timeout: 10s
+      retries: 5
+      start_period: 120s             # first start downloads ~300 MB of definitions
+volumes:
+  clamav_db:
+  uploads:
+```
+---
+## Application code
+Read options from environment variables so the same image works in all environments:
+```js
+const { scan, scanBuffer, scanStream, Verdict } = require('pompelmi');
+const SCAN_OPTS = {
+  host:    process.env.CLAMAV_HOST    || '127.0.0.1',
+  port:    Number(process.env.CLAMAV_PORT)    || 3310,
+  timeout: Number(process.env.CLAMAV_TIMEOUT) || 15_000,
+};
+const result = await scan('/uploads/file.pdf', SCAN_OPTS);
+```
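The `Number(...) || default` pattern above silently falls back to the default when a variable is set to something unusable, such as `CLAMAV_PORT=abc`. A stricter variant that fails loudly at startup could look like the sketch below. `readScanOptions` is a hypothetical helper; the defaults mirror the ones in the snippet above.

```javascript
// Hypothetical helper: builds scan options from an env-like object and
// throws on values that are present but invalid, instead of silently
// falling back to defaults.
function readScanOptions(env) {
  const readInt = (name, fallback, min, max) => {
    const raw = env[name];
    if (raw === undefined || raw === '') return fallback; // unset: use default
    const value = Number(raw);
    if (!Number.isInteger(value) || value < min || value > max) {
      throw new Error(`${name} must be an integer between ${min} and ${max}, got "${raw}"`);
    }
    return value;
  };
  return {
    host: env.CLAMAV_HOST || '127.0.0.1',
    port: readInt('CLAMAV_PORT', 3310, 1, 65535),
    timeout: readInt('CLAMAV_TIMEOUT', 15000, 1, Number.MAX_SAFE_INTEGER),
  };
}

// Usage (not executed here): const SCAN_OPTS = readScanOptions(process.env);
```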
+---
+## Dockerfile example
+```dockerfile
+FROM node:22-alpine
+WORKDIR /app
+COPY package*.json ./
+RUN npm ci --omit=dev
+COPY . .
+RUN mkdir -p uploads
+EXPOSE 3000
+CMD ["node", "src/server.js"]
+```
+Note: `clamscan` does **not** need to be installed in the application container when using TCP mode. The ClamAV sidecar handles all scanning.
+---
+## Health check explanation
+`clamdcheck` is a shell script bundled inside the `clamav/clamav:stable` image. It sends a `PING` to the local clamd socket and checks the response. The `start_period: 120s` gives clamd time to download virus definitions on first start before health checks begin counting failures.
+If you need to check from outside the container, you can also use TCP:
+```bash
+echo -n "PING" | nc -q1 localhost 3310
+# expected response: PONG
+```
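The same probe can be run from Node, for example in an application readiness endpoint. The sketch below uses only the built-in `net` module and clamd's null-delimited `zPING` command, which clamd answers with `PONG`. `pingClamd` is a hypothetical helper and not part of pompelmi.

```javascript
const net = require('net');

// Sketch of a clamd liveness probe (hypothetical helper, not a pompelmi API).
// Sends the null-delimited PING command and resolves true only if the
// reply starts with PONG. Resolves false on error, timeout, or early close.
function pingClamd({ host = '127.0.0.1', port = 3310, timeout = 5000 } = {}) {
  return new Promise((resolve) => {
    const socket = net.createConnection({ host, port });
    let reply = '';
    let done = false;

    const finish = (ok) => {
      if (done) return; // settle only once
      done = true;
      socket.destroy();
      resolve(ok);
    };

    socket.setTimeout(timeout, () => finish(false)); // socket idle timeout
    socket.on('connect', () => socket.write('zPING\0'));
    socket.on('data', (chunk) => {
      reply += chunk.toString('utf8');
      if (reply.length >= 4) finish(reply.startsWith('PONG'));
    });
    socket.on('error', () => finish(false));
    socket.on('close', () => finish(reply.startsWith('PONG')));
  });
}

// Usage (not executed here):
// const healthy = await pingClamd({ host: 'clamav', port: 3310 });
```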
+---
+## `depends_on` with health check
+```yaml
+depends_on:
+  clamav:
+    condition: service_healthy
+```
+This prevents the application container from starting until clamd passes its health check. Without this, your app may start and immediately fail its first scan with "connection refused."
+---
+## Scaling considerations
+### Vertical scaling
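+ClamAV is single-threaded per scan, so adding CPU cores does not make an individual scan faster; it only lets one clamd instance serve more scans at once (bounded by its `MaxThreads` setting). For high-throughput use cases, run multiple clamd containers behind a load balancer rather than trying to parallelise within one instance.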
+### Horizontal scaling
+```yaml
+services:
+  clamav:
+    image: clamav/clamav:stable
+    deploy:
+      replicas: 3
+    volumes:
+      - clamav_db:/var/lib/clamav
+```
+Point your application at a load balancer in front of the clamd replicas. Note: each clamd replica downloads its own virus database on startup unless you share the volume (which requires care with concurrent freshclam writes).
+### Alternative: one clamd per app instance
+For simpler setups, co-deploy one clamd container with each app container. Each pair shares a `clamav_db` volume scoped to the pair.
+---
+## Resource limits
+ClamAV can use significant memory when unpacking large archives:
+```yaml
+clamav:
+  image: clamav/clamav:stable
+  deploy:
+    resources:
+      limits:
+        memory: 1g
+        cpus: '1.0'
+```
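+Tune based on your expected file sizes and on observed usage. clamd also keeps the full signature database resident in memory, so a limit set too low gets the container OOM-killed. Scanning uncompressed archives >100 MB may require more.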
+---
+## Keeping virus definitions fresh
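+The `clamav/clamav:stable` image runs `freshclam` on a schedule automatically. To see in detail whether the local definitions are current, run it with verbose output (this also downloads any pending update):
+```bash
+docker compose exec clamav freshclam --verbose
+```
+Or trigger the same update with default output: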
+```bash
+docker compose exec clamav freshclam
+```
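+To pick up new definitions without restarting the app container, restart only the clamav container (freshclam updates on startup). Scans sent while clamd is restarting fail with connection errors, so this is zero-downtime for the app but not for scanning: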
+```bash
+docker compose restart clamav
+```
+The named volume preserves the downloaded definitions across restarts, so only incremental updates are downloaded after the first start.
+---
+## Production checklist
+- [ ] `restart: unless-stopped` on both services
+- [ ] `healthcheck` configured on clamav with `start_period` ≥ 90s
+- [ ] `depends_on: condition: service_healthy` on app
+- [ ] `CLAMAV_HOST`, `CLAMAV_PORT`, `CLAMAV_TIMEOUT` via environment variables
+- [ ] Named volume for `clamav_db` (not anonymous)
+- [ ] File size limits in your HTTP server (before the scan is reached)
+- [ ] Upload directory in a named volume (survives container restarts)
+- [ ] Log aggregation: capture `app` and `clamav` container logs

package/wiki/docker-setup.md ADDED

@@ -0,0 +1,178 @@
+# Docker Setup
+Run ClamAV as a Docker sidecar so your application host requires no local ClamAV installation. pompelmi's TCP mode streams files directly to the clamd daemon — the API is identical to local mode.
+---
+## Why a Docker sidecar?
+- **No local install** — the application container stays lean; ClamAV and its virus definitions live in a dedicated sidecar.
+- **Always up-to-date definitions** — the official `clamav/clamav:stable` image runs `freshclam` on startup and periodically refreshes the database.
+- **Isolation** — ClamAV runs in its own process/container; a crash or restart does not affect your application.
+- **Consistent environments** — same image in development, staging, and production.
+---
+## docker-compose.yml
+```yaml
+services:
+  app:
+    build: .
+    ports:
+      - "3000:3000"
+    environment:
+      CLAMAV_HOST: clamav
+      CLAMAV_PORT: 3310
+    depends_on:
+      clamav:
+        condition: service_healthy
+  clamav:
+    image: clamav/clamav:stable
+    ports:
+      - "3310:3310"
+    restart: unless-stopped
+    volumes:
+      - clamav_db:/var/lib/clamav   # persist virus definitions across restarts
+    healthcheck:
+      test: ["CMD", "clamdcheck"]   # bundled check script in clamav/clamav image
+      interval: 30s
+      timeout: 10s
+      retries: 5
+      start_period: 120s            # freshclam download takes time on first boot
+volumes:
+  clamav_db:
+```
+> **First boot:** The image downloads the full virus database (~300 MB) before clamd starts accepting connections. `start_period: 120s` gives it time. On subsequent restarts the definitions are already in the volume, so startup skips the full download and is much faster.
+---
+## Pointing pompelmi at clamd
+Pass `host` and `port` to any pompelmi function. No other code changes are needed.
+```js
+const { scan, scanBuffer, scanStream, scanDirectory, Verdict } = require('pompelmi');
+const CLAMAV_OPTS = {
+  host: process.env.CLAMAV_HOST || '127.0.0.1',
+  port: Number(process.env.CLAMAV_PORT) || 3310,
+  timeout: 30_000,  // ms — increase for large files
+};
+// scan a file by path
+const fileResult = await scan('/uploads/report.pdf', CLAMAV_OPTS);
+// scan an in-memory Buffer (multer memoryStorage)
+const bufferResult = await scanBuffer(req.file.buffer, CLAMAV_OPTS);
+// scan a Readable stream (S3, HTTP, pipes)
+const stream = s3.getObject({ Bucket, Key }).createReadStream();
+const streamResult = await scanStream(stream, CLAMAV_OPTS);
+// recursively scan a directory
+const dirResults = await scanDirectory('/uploads', CLAMAV_OPTS);
+```
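+`scan`, `scanBuffer`, and `scanStream` resolve to the same `Verdict.Clean`, `Verdict.Malicious`, or `Verdict.ScanError` Symbols; `scanDirectory` resolves to an object with `clean`, `malicious`, and `errors` arrays. No code changes are required when switching between local and TCP mode.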
+---
+## Configuring timeout for large files
+The `timeout` option sets the socket idle timeout in milliseconds (default: 15 000 ms). Increase it when scanning large archives or slow network links.
+```js
+const result = await scan('/uploads/large-archive.zip', {
+  host: 'clamav',
+  port: 3310,
+  timeout: 120_000,  // 2 minutes
+});
+```
+If clamd takes longer than `timeout` ms without sending data, pompelmi rejects with:
+```
+clamd connection timed out after 120000ms
+```
+---
+## Production tips
+### Health checks
+The `healthcheck` in the example above uses the `clamdcheck` script bundled in the official image. Your application container uses `depends_on: condition: service_healthy` so it only starts once clamd is ready.
+### Restart policy
+```yaml
+restart: unless-stopped
+```
+This ensures clamd comes back up after host reboots or OOM kills without manual intervention.
+### Persisting virus definitions
+The named volume `clamav_db` mounts to `/var/lib/clamav` inside the container. This means:
+- First start downloads definitions once (~300 MB).
+- Subsequent restarts reuse the cache; `freshclam` only downloads incremental updates.
+- The volume survives `docker compose down` (use `docker compose down -v` to wipe it).
+### Resource limits (optional)
+ClamAV can be memory-hungry when scanning large ZIP archives. Set a limit if needed:
+```yaml
+clamav:
+  image: clamav/clamav:stable
+  deploy:
+    resources:
+      limits:
+        memory: 1g
+```
+---
+## Troubleshooting
+### clamd not ready on startup
+**Symptom:** Application starts before clamd is accepting connections; first scan fails with connection refused.
+**Fix:** Add `depends_on` with `condition: service_healthy` (see example above) and ensure the `healthcheck` is configured on the clamav service. The `start_period` must be long enough for the initial database download.
+### Connection refused
+**Symptom:** `ECONNREFUSED 127.0.0.1:3310`
+**Causes and fixes:**
+1. clamd container is not running — `docker compose ps` to check.
+2. Wrong host — if the app is inside Docker, use the service name (`clamav`), not `127.0.0.1`.
+3. Port not exposed — verify the `ports` mapping in `docker-compose.yml`.
+4. clamd is still loading the virus database — add the `healthcheck` and `depends_on` described above.
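For cause 4, an application-side retry can complement the health check, for example when clamd restarts while the app keeps running. The sketch below is a generic retry with exponential backoff. `retryWhile` and `isConnectionRefused` are hypothetical helpers, and it is an assumption that the error rejected by the scan call carries Node's `code` property.

```javascript
// Hypothetical helper: retries an async operation while it fails with a
// retryable error, doubling the wait between attempts.
async function retryWhile(operation, { retries = 5, delayMs = 1000, isRetryable = () => true } = {}) {
  let wait = delayMs;
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation();
    } catch (err) {
      // Give up when attempts are exhausted or the error is not retryable.
      if (attempt >= retries || !isRetryable(err)) throw err;
      await new Promise((resolve) => setTimeout(resolve, wait));
      wait *= 2;
    }
  }
}

// Assumption: the connection error exposes Node's `code` property.
const isConnectionRefused = (err) => Boolean(err) && err.code === 'ECONNREFUSED';

// Usage with pompelmi (not executed here):
// const verdict = await retryWhile(
//   () => scan('/uploads/file.pdf', { host: 'clamav', port: 3310 }),
//   { isRetryable: isConnectionRefused }
// );
```

Restricting retries to connection errors keeps genuine scan failures visible instead of repeating them.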
+### Timeout errors
+**Symptom:** `clamd connection timed out after 15000ms`
+**Fixes:**
+1. Increase `timeout` in the options object (e.g. `timeout: 60_000`).
+2. Check clamd resource limits — if it is CPU- or memory-constrained it will scan slowly.
+3. Check network latency between app and clamav containers.
+### Virus definitions out of date
+The official image runs `freshclam` periodically. If you see scan errors mentioning outdated definitions, exec into the container and run it manually:
+```bash
+docker compose exec clamav freshclam
+```
+Or restart the container; `freshclam` runs at startup.