npm - pompelmi - Versions diffs - 1.5.0 → 1.7.0 - Mend

pompelmi 1.5.0 → 1.7.0

Files changed (35)

package/README.md +113 -195
package/action/Dockerfile +24 -0
package/action/entrypoint.sh +23 -0
package/action/scanner.js +89 -0
package/action.yml +29 -0
package/llms.txt +22 -99
package/package.json +1 -1
package/pr_info.tmp +2 -0
package/release-notes-v1.4.0.md +25 -0
package/release-notes-v1.5.0.md +37 -0
package/src/BufferScanner.js +20 -17
package/src/ClamAVScanner.js +4 -4
package/src/ClamdScanner.js +18 -15
package/src/StreamScanner.js +20 -17
package/wiki/api-reference.md +268 -0
package/wiki/cli-usage.md +263 -0
package/wiki/concurrent-scanning.md +199 -0
package/wiki/docker-compose-production.md +190 -0
package/wiki/docker-setup.md +178 -0
package/wiki/error-handling.md +242 -0
package/wiki/express-integration.md +227 -0
package/wiki/fastify-integration.md +207 -0
package/wiki/home.md +0 -0
package/wiki/local-vs-tcp-mode.md +179 -0
package/wiki/multer-memory-storage.md +166 -0
package/wiki/nestjs-integration.md +228 -0
package/wiki/nextjs-integration.md +209 -0
package/wiki/performance.md +178 -0
package/wiki/quarantine-workflow.md +260 -0
package/wiki/rest-api-server.md +297 -0
package/wiki/s3-integration.md +233 -0
package/wiki/security-considerations.md +192 -0
package/wiki/typescript-usage.md +239 -0
package/wiki/verdicts.md +192 -0
package/wiki/virus-definitions.md +194 -0

package/wiki/api-reference.md ADDED

@@ -0,0 +1,268 @@
+# API Reference
+Complete reference for all public functions exported by pompelmi.
+---
+## Installation
+```bash
+npm install pompelmi
+```
+```js
+const { scan, scanBuffer, scanStream, scanDirectory, Verdict } = require('pompelmi');
+```
+---
+## `scan(filePath, [options])`
+Scan a file by absolute or relative path. In local mode it spawns `clamscan`; in TCP mode it streams the file to clamd via INSTREAM.
+```ts
+scan(
+  filePath: string,
+  options?: {
+    host?: string;
+    port?: number;
+    timeout?: number;
+  }
+): Promise<symbol>
+```
+### Parameters
+| Parameter | Type | Required | Description |
+|-----------|------|----------|-------------|
+| `filePath` | `string` | Yes | Path to the file to scan. Use `path.resolve()` for safety. |
+| `options.host` | `string` | No | clamd hostname. Setting this enables TCP mode. |
+| `options.port` | `number` | No | clamd port. Default: `3310`. |
+| `options.timeout` | `number` | No | Socket idle timeout in ms (TCP mode only). Default: `15000`. |
+### Returns
+`Promise<symbol>` — resolves to one of the three `Verdict` Symbols:
+| Verdict | Local exit code | TCP response | Meaning |
+|---------|-----------------|--------------|---------|
+| `Verdict.Clean` | `0` | `stream: OK` | No threats found. |
+| `Verdict.Malicious` | `1` | `stream: <name> FOUND` | Known malware signature matched. |
+| `Verdict.ScanError` | `2` | other response | Scan could not complete. Treat as untrusted. |
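The TCP column of the table above can be read as a small parser. The sketch below is not pompelmi's implementation: `parseClamdReply` is a hypothetical helper, it assumes the full reply has already been read from the socket as a string, and it returns the verdict names as plain strings (the same values as the Symbols' `.description`) so it runs without ClamAV installed.

```js
// Hypothetical helper: map a raw clamd INSTREAM reply to a verdict name.
function parseClamdReply(reply) {
  // Drop the trailing NUL or newline that terminates a clamd reply.
  const text = reply.replace(/[\0\r\n]+$/, '').trim();
  if (text === 'stream: OK') {
    return { verdict: 'Clean', signature: null };
  }
  const found = /^stream: (.+) FOUND$/.exec(text);
  if (found) {
    return { verdict: 'Malicious', signature: found[1] };
  }
  // Any other reply means the scan did not complete: treat as untrusted.
  return { verdict: 'ScanError', signature: null };
}
```

Returning the signature name alongside the verdict is a design choice of this sketch; `scan()` itself resolves to the verdict Symbol only.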
+### Rejects with
+| Message | Cause |
+|---------|-------|
+| `filePath must be a string` | First argument is not a string. |
+| `File not found: <path>` | File does not exist at the given path. |
+| `ENOENT` | `clamscan` binary not found in PATH (local mode). |
+| `Unexpected exit code: N` | ClamAV exited with an undocumented code. |
+| `Process killed by signal: <SIG>` | Process was killed (timeout, OOM, SIGTERM). |
+| `clamd connection timed out after Nms` | TCP socket idle timeout exceeded. |
+### Examples
+```js
+// Local mode
+const localResult = await scan('/uploads/report.pdf');
+// TCP mode
+const result = await scan('/uploads/report.pdf', {
+  host: '127.0.0.1',
+  port: 3310,
+  timeout: 30_000,
+});
+if (result === Verdict.Clean)     console.log('Safe.');
+if (result === Verdict.Malicious) throw new Error('Malware detected.');
+if (result === Verdict.ScanError) console.warn('Scan incomplete — treat as untrusted.');
+```
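A scan can end in two ways: it resolves to a verdict, or it rejects with one of the errors listed above. A caller that gates uploads usually wants a single accept-or-reject decision covering both. The following is a minimal sketch under stated assumptions: `decide` is a hypothetical helper, `scanFn` is whatever scan function you inject (for example pompelmi's `scan`), and the verdict is compared through its documented `.description` value.

```js
// Hypothetical helper: collapse verdicts and rejections into one decision.
// Only an explicit Clean verdict is accepted; everything else fails closed.
async function decide(scanFn, filePath, options) {
  try {
    const verdict = await scanFn(filePath, options);
    return { accept: verdict.description === 'Clean', reason: verdict.description };
  } catch (err) {
    // A rejection (missing binary, timeout, ...) never means "clean".
    return { accept: false, reason: err.message };
  }
}
```

Injecting `scanFn` keeps the helper testable without a running ClamAV instance.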
+---
+## `scanBuffer(buffer, [options])`
+Scan an in-memory `Buffer` without writing to disk (TCP mode) or via a temp file that is deleted automatically (local mode).
+```ts
+scanBuffer(
+  buffer: Buffer,
+  options?: {
+    host?: string;
+    port?: number;
+    timeout?: number;
+  }
+): Promise<symbol>
+```
+### Parameters
+| Parameter | Type | Required | Description |
+|-----------|------|----------|-------------|
+| `buffer` | `Buffer` | Yes | The in-memory buffer to scan. |
+| `options` | `object` | No | Same options as `scan()`. |
+### Returns
+Same three `Verdict` Symbols as `scan()`.
+### Rejects with
+Everything `scan()` can reject with, plus:
+| Message | Cause |
+|---------|-------|
+| `buffer must be a Buffer` | First argument is not a `Buffer` instance. |
+| `buffer is empty` | Zero-length Buffer passed. |
+### Notes
+- **TCP mode:** buffer is streamed to clamd via INSTREAM — no disk I/O.
+- **Local mode:** buffer is written to a temp file in `os.tmpdir()`, scanned, then deleted in a `finally` block.
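The local-mode behaviour described above (write to a temp file, scan, delete in `finally`) follows a common pattern. The sketch below illustrates that pattern only; it is not taken from pompelmi's source, and `withTempFile` and the `scan-<uuid>.tmp` naming are hypothetical.

```js
const fs = require('fs/promises');
const os = require('os');
const path = require('path');
const crypto = require('crypto');

// Hypothetical helper: write `buffer` to a unique temp file, run `fn(tempPath)`,
// and always remove the file afterwards, even when `fn` throws.
async function withTempFile(buffer, fn) {
  const tempPath = path.join(os.tmpdir(), `scan-${crypto.randomUUID()}.tmp`);
  // 'wx' fails if the path already exists; 0o600 restricts access to the owner.
  await fs.writeFile(tempPath, buffer, { mode: 0o600, flag: 'wx' });
  try {
    return await fn(tempPath);
  } finally {
    await fs.rm(tempPath, { force: true });
  }
}
```

A caller could pass a function that scans the temp path, for example `withTempFile(buf, p => scan(p))`.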
+### Example
+```js
+// multer memoryStorage
+const result = await scanBuffer(req.file.buffer, {
+  host: process.env.CLAMAV_HOST,
+  port: 3310,
+});
+```
+---
+## `scanStream(stream, [options])`
+Scan any Node.js `Readable` stream. In TCP mode the stream is piped directly to clamd — no disk I/O. In local mode it is written to a temp file that is deleted automatically.
+```ts
+scanStream(
+  stream: Readable,
+  options?: {
+    host?: string;
+    port?: number;
+    timeout?: number;
+  }
+): Promise<symbol>
+```
+### Parameters
+| Parameter | Type | Required | Description |
+|-----------|------|----------|-------------|
+| `stream` | `stream.Readable` | Yes | The readable stream to scan. |
+| `options` | `object` | No | Same options as `scan()`. |
+### Returns
+Same three `Verdict` Symbols as `scan()`.
+### Rejects with
+Everything `scan()` can reject with, plus:
+| Message | Cause |
+|---------|-------|
+| `stream must be a Readable` | First argument is not a `stream.Readable`. |
+| stream error | Any error emitted by the stream is propagated as-is. |
+### Example
+```js
+// S3 getObject stream
+const { GetObjectCommand } = require('@aws-sdk/client-s3');
+const response = await s3.send(new GetObjectCommand({ Bucket, Key }));
+const result = await scanStream(response.Body, { host: 'clamav', port: 3310 });
+```
+---
+## `scanDirectory(dirPath, [options])`
+Recursively scan every file in a directory. Returns three arrays of absolute paths; per-file failures are collected rather than thrown.
+```ts
+scanDirectory(
+  dirPath: string,
+  options?: {
+    host?: string;
+    port?: number;
+    timeout?: number;
+  }
+): Promise<{
+  clean: string[];
+  malicious: string[];
+  errors: string[];
+}>
+```
+### Parameters
+| Parameter | Type | Required | Description |
+|-----------|------|----------|-------------|
+| `dirPath` | `string` | Yes | Path to the directory to scan recursively. |
+| `options` | `object` | No | Same options as `scan()`. |
+### Returns
+| Field | Type | Description |
+|-------|------|-------------|
+| `clean` | `string[]` | Absolute paths of files with no threats. |
+| `malicious` | `string[]` | Absolute paths of files with matched signatures. |
+| `errors` | `string[]` | Absolute paths of files that could not be scanned. |
+### Rejects with
+| Message | Cause |
+|---------|-------|
+| `dirPath must be a string` | First argument is not a string. |
+| `Directory not found: <path>` | Directory does not exist. |
+Individual file scan failures do **not** cause the function to reject — they appear in `errors`.
+### Example
+```js
+const fs = require('fs');
+const results = await scanDirectory('/uploads', { host: 'clamav', port: 3310 });
+console.log(`${results.clean.length} clean, ${results.malicious.length} malicious`);
+results.malicious.forEach(f => fs.unlinkSync(f));
+```
+---
+## `Verdict`
+The `Verdict` object exported by pompelmi contains three Symbols:
+```js
+const { Verdict } = require('pompelmi');
+Verdict.Clean     // Symbol(Clean)
+Verdict.Malicious // Symbol(Malicious)
+Verdict.ScanError // Symbol(ScanError)
+```
+Each Symbol has a `.description` property for safe serialisation:
+```js
+Verdict.Clean.description     // 'Clean'
+Verdict.Malicious.description // 'Malicious'
+Verdict.ScanError.description // 'ScanError'
+```
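The `.description` property matters because `JSON.stringify` silently drops Symbol values. The sketch below shows the difference; `toRecord` is a hypothetical helper and the local `clean` Symbol is a stand-in for `Verdict.Clean` so the snippet runs on its own.

```js
// Hypothetical helper: build a JSON-safe record from a verdict Symbol.
function toRecord(filePath, verdict) {
  return { path: filePath, verdict: verdict.description };
}

const clean = Symbol('Clean'); // stand-in for Verdict.Clean

console.log(JSON.stringify({ verdict: clean }));
// prints {}  (the Symbol-valued property is omitted)

console.log(JSON.stringify(toRecord('/uploads/a.pdf', clean)));
// prints {"path":"/uploads/a.pdf","verdict":"Clean"}
```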
+---
+## Options reference
+All four functions accept the same options object:
+| Option | Type | Default | Description |
+|--------|------|---------|-------------|
+| `host` | `string` | — | clamd hostname. Setting this enables TCP mode. |
+| `port` | `number` | `3310` | clamd port. |
+| `timeout` | `number` | `15000` | Socket idle timeout in ms. TCP mode only. |
+When neither `host` nor `port` is set, pompelmi uses local mode (spawns `clamscan`).
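Because the mode depends on which options are present, an options object built from the environment should stay empty when no clamd host is configured. A minimal sketch, assuming the environment variable names `CLAMAV_HOST`, `CLAMAV_PORT`, and `CLAMAV_TIMEOUT` and a hypothetical helper `optionsFromEnv`:

```js
// Hypothetical helper: derive scan options from environment variables.
function optionsFromEnv(env = process.env) {
  // No host configured: return no options at all so local mode is used.
  if (!env.CLAMAV_HOST) return {};
  return {
    host: env.CLAMAV_HOST,
    port: Number(env.CLAMAV_PORT) || 3310,        // documented default port
    timeout: Number(env.CLAMAV_TIMEOUT) || 15000, // documented default timeout
  };
}
```

The result can be passed straight to any of the four functions, for example `scan(filePath, optionsFromEnv())`.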

package/wiki/cli-usage.md ADDED

@@ -0,0 +1,263 @@
+# CLI Usage
+pompelmi can be used as a command-line tool for scripting, CI pipelines, and interactive scanning. This page shows how to build a CLI scanner with pompelmi and how to use it in shell scripts.
+---
+## Minimal CLI scanner
+```js
+#!/usr/bin/env node
+// cli-scan.js
+const { scan, scanDirectory, Verdict } = require('pompelmi');
+const path = require('path');
+const fs   = require('fs');
+const args = process.argv.slice(2);
+if (args.length === 0) {
+  console.error('Usage: node cli-scan.js <file-or-dir> [file2] ...');
+  process.exit(2);
+}
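+// Only pass TCP options when CLAMAV_HOST is set; an empty object keeps local mode
+const SCAN_OPTS = process.env.CLAMAV_HOST
+  ? {
+      host:    process.env.CLAMAV_HOST,
+      port:    Number(process.env.CLAMAV_PORT) || 3310,
+      timeout: Number(process.env.CLAMAV_TIMEOUT) || 30_000,
+    }
+  : {};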
+async function main() {
+  let anyMalicious = false;
+  let anyError = false;
+  for (const target of args) {
+    const resolved = path.resolve(target);
+    let stat;
+    try {
+      stat = fs.statSync(resolved);
+    } catch {
+      console.error(`Not found: ${resolved}`);
+      process.exit(2);
+    }
+    if (stat.isDirectory()) {
+      const results = await scanDirectory(resolved, SCAN_OPTS);
+      for (const f of results.clean)     console.log(`CLEAN     ${f}`);
+      for (const f of results.malicious) console.log(`MALICIOUS ${f}`);
+      for (const f of results.errors)    console.log(`ERROR     ${f}`);
+      if (results.malicious.length > 0) anyMalicious = true;
+      if (results.errors.length > 0)    anyError = true;
+    } else {
+      const result = await scan(resolved, SCAN_OPTS);
+      const label  = result.description.toUpperCase().padEnd(9);
+      console.log(`${label} ${resolved}`);
+      if (result === Verdict.Malicious) anyMalicious = true;
+      if (result === Verdict.ScanError) anyError = true;
+    }
+  }
+  // Exit 1 if any file is malicious, 2 if any scan could not complete, else 0
+  process.exit(anyMalicious ? 1 : anyError ? 2 : 0);
+}
+main().catch(err => {
+  console.error(err.message);
+  process.exit(2);
+});
+```
+Make it executable:
+```bash
+chmod +x cli-scan.js
+```
+---
+## Usage examples
+```bash
+# Scan a single file
+node cli-scan.js /path/to/file.pdf
+# Scan multiple files
+node cli-scan.js /uploads/a.pdf /uploads/b.zip
+# Scan a directory recursively
+node cli-scan.js /uploads/
+# TCP mode (set env vars)
+CLAMAV_HOST=127.0.0.1 node cli-scan.js /uploads/file.pdf
+```
+---
+## Exit codes
+| Exit code | Meaning |
+|-----------|---------|
+| `0` | All scanned files are clean |
+| `1` | One or more malicious files found |
+| `2` | Scan failed or argument error |
+A non-zero exit code signals failure to the shell, so the scanner works directly in `if` statements and fails CI jobs. Using `1` for malicious also matches `clamscan`'s own exit code.
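The table can be expressed as a small function over the object that `scanDirectory()` returns. This is a sketch, not part of pompelmi: `exitCodeFor` is a hypothetical helper, and letting a malicious finding take precedence over scan errors is an assumption of the sketch.

```js
// Hypothetical helper: derive a process exit code from a scanDirectory() result.
function exitCodeFor(results) {
  if (results.malicious.length > 0) return 1; // at least one malicious file
  if (results.errors.length > 0) return 2;    // at least one file could not be scanned
  return 0;                                   // every file was scanned and is clean
}
```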
+---
+## Shell scripting
+```bash
+#!/bin/bash
+set -e
+FILE="$1"
+if [ -z "$FILE" ]; then
+  echo "Usage: $0 <file>"
+  exit 2
+fi
+# With `set -e`, a bare non-zero exit would abort the script before STATUS is read
+STATUS=0
+node /usr/local/bin/cli-scan.js "$FILE" || STATUS=$?
+if [ $STATUS -eq 1 ]; then
+  echo "Upload rejected: malware detected."
+  exit 1
+elif [ $STATUS -eq 2 ]; then
+  echo "Scan failed."
+  exit 2
+else
+  echo "File is clean."
+fi
+```
+---
+## CI pipeline integration
+### GitHub Actions
+```yaml
+# .github/workflows/scan-artifacts.yml
+name: Scan artifacts
+on:
+  workflow_run:
+    workflows: ["Build"]
+    types: [completed]
+jobs:
+  scan:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
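+      - name: Install ClamAV
+        run: |
+          sudo apt-get update
+          sudo apt-get install -y clamav
+          # The freshclam service started by the package can hold the update lock
+          sudo systemctl stop clamav-freshclam || true
+          sudo freshclam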
+      - name: Download build artifacts
+        uses: actions/download-artifact@v4
+        with:
+          name: dist
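+          path: dist/
+          # Artifacts belong to the triggering "Build" run, not this one (needs actions: read)
+          run-id: ${{ github.event.workflow_run.id }}
+          github-token: ${{ secrets.GITHUB_TOKEN }}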
+      - name: Scan artifacts
+        run: |
+          npm ci
+          node cli-scan.js dist/
+          # Exits 1 if malicious files found — fails the job
+```
+### Pre-commit hook
+```bash
+#!/bin/bash
+# .git/hooks/pre-commit (the shebang must be the first line of the file)
+# Scan staged files before commit
+STAGED=$(git diff --cached --name-only --diff-filter=ACM)
+if [ -z "$STAGED" ]; then
+  exit 0
+fi
+# NUL-delimited output keeps file names with spaces intact
+git diff --cached --name-only --diff-filter=ACM -z | xargs -0 node cli-scan.js
+if [ $? -ne 0 ]; then
+  echo "Commit blocked: malware detected or scan failed."
+  exit 1
+fi
+---
+## Adding a global `pompelmi-scan` command
+Add a `bin` entry to the `package.json` of the project that contains `cli-scan.js` to expose it as a package binary:
+```json
+{
+  "bin": {
+    "pompelmi-scan": "./cli-scan.js"
+  }
+}
+```
+Install that project globally from its own directory:
+```bash
+npm install -g .
+pompelmi-scan /path/to/file.pdf
+```
+---
+## Scanning directories and deleting malicious files
+```js
+#!/usr/bin/env node
+// cli-purge.js: scan a directory and delete malicious files
+const { scanDirectory } = require('pompelmi');
+const fs   = require('fs');
+const path = require('path');
+// CommonJS has no top-level await, so the work runs inside an async function
+async function main() {
+  const dir = path.resolve(process.argv[2] || '.');
+  const results = await scanDirectory(dir, {
+    host: process.env.CLAMAV_HOST,
+    port: 3310,
+  });
+  console.log(`Scanned: ${results.clean.length + results.malicious.length + results.errors.length} files`);
+  console.log(`Clean:     ${results.clean.length}`);
+  console.log(`Malicious: ${results.malicious.length}`);
+  console.log(`Errors:    ${results.errors.length}`);
+  for (const f of results.malicious) {
+    fs.unlinkSync(f);
+    console.log(`Deleted: ${f}`);
+  }
+  process.exit(results.malicious.length > 0 ? 1 : 0);
+}
+main().catch(err => {
+  console.error(err.message);
+  process.exit(2);
+});
+```
+---
+## JSON output for piping to other tools
+```js
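+// cli-scan-json.js: outputs JSON for use with jq
+const { scanDirectory } = require('pompelmi');
+const opts = { host: process.env.CLAMAV_HOST, port: 3310 };
+scanDirectory(require('path').resolve(process.argv[2] || '.'), opts).then(r => {
+  const tag = (paths, verdict) => paths.map(p => ({ path: p, verdict }));
+  const results = [...tag(r.clean, 'Clean'), ...tag(r.malicious, 'Malicious'), ...tag(r.errors, 'ScanError')];
+  process.stdout.write(JSON.stringify({ files: results }, null, 2));
+  // exitCode (not exit()) lets stdout flush fully when piped
+  process.exitCode = r.malicious.length > 0 ? 1 : 0;
+}).catch(err => { console.error(err.message); process.exit(2); });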
+```
+```bash
+node cli-scan-json.js /uploads/ | jq '.files[] | select(.verdict == "Malicious") | .path'
+```

package/wiki/concurrent-scanning.md ADDED

@@ -0,0 +1,199 @@
+# Concurrent Scanning
+Scanning multiple files in parallel improves throughput but introduces tradeoffs around resource usage, partial failures, and connection limits. This page covers the main patterns.
+---
+## `Promise.all` — scan multiple files in parallel
+`Promise.all` runs all scans concurrently and resolves when every scan completes. If any scan rejects (throws), the entire `Promise.all` rejects immediately; the remaining scans keep running, but their results are discarded.
+```js
+const { scan, Verdict } = require('pompelmi');
+const files = [
+  '/uploads/document.pdf',
+  '/uploads/photo.jpg',
+  '/uploads/archive.zip',
+];
+const results = await Promise.all(files.map(f => scan(f)));
+results.forEach((result, i) => {
+  if (result === Verdict.Malicious) {
+    console.log(`${files[i]} is malicious.`);
+  }
+});
+```
+Use `Promise.all` when:
+- All files must be accepted for the request to succeed.
+- You want to fail fast if any scan throws.
+---
+## `Promise.allSettled` — partial failures
+`Promise.allSettled` waits for all scans to complete regardless of individual failures. Each result has a `status` of `'fulfilled'` or `'rejected'`.
+```js
+const { scan, Verdict } = require('pompelmi');
+const files = ['/uploads/a.pdf', '/uploads/b.zip', '/uploads/c.png'];
+const settled = await Promise.allSettled(
+  files.map(async (f) => ({ path: f, verdict: await scan(f) }))
+);
+const accepted = [];
+const rejected = [];
+for (const [i, r] of settled.entries()) {
+  if (r.status === 'rejected') {
+    // allSettled preserves input order, so the index recovers the path
+    rejected.push({ path: files[i], reason: r.reason.message });
+    continue;
+  }
+  const { path, verdict } = r.value;
+  if (verdict === Verdict.Clean) {
+    accepted.push(path);
+  } else {
+    rejected.push({ path, reason: verdict.description });
+  }
+}
+console.log({ accepted, rejected });
+```
+Use `Promise.allSettled` when:
+- You want to process as many files as possible even if some fail.
+- You need to report which specific files were rejected.
+---
+## `scanDirectory()` — scan an entire folder
+`scanDirectory()` handles concurrent scanning of every file in a directory internally. It catches per-file errors and collects them into the `errors` array rather than throwing.
+```js
+const fs = require('fs');
+const { scanDirectory } = require('pompelmi');
+const results = await scanDirectory('/uploads', {
+  host: process.env.CLAMAV_HOST,
+  port: 3310,
+});
+console.log(`Clean: ${results.clean.length}`);
+console.log(`Malicious: ${results.malicious.length}`);
+console.log(`Errors: ${results.errors.length}`);
+// Auto-delete malicious files
+results.malicious.forEach(f => fs.unlinkSync(f));
+```
+Use `scanDirectory()` when:
+- You have an existing folder of files to audit.
+- You want a single-call interface with clean/malicious/errors output.
+---
+## Rate limiting concurrent scans with `p-limit`
+Unbounded `Promise.all` with a large number of files can overwhelm clamd or exhaust the OS file descriptor limit. Use `p-limit` to cap concurrency.
+```bash
+# p-limit 4 and later are ESM-only; version 3 still supports require()
+npm install p-limit@3
+```
+```js
+const pLimit = require('p-limit');
+const { scan, Verdict } = require('pompelmi');
+const files = getFilePaths(); // array of N paths
+const limit = pLimit(5);      // at most 5 concurrent scans
+const results = await Promise.all(
+  files.map(f => limit(() => scan(f, { host: 'clamav', port: 3310 })))
+);
+```
+Recommended concurrency limits:
+| Mode | Suggested concurrency |
+|------|----------------------|
+| Local (`clamscan`) | 2–4 (CPU-bound) |
+| TCP (single clamd) | 5–10 |
+| TCP (multiple clamd replicas) | 20–50 |
+Tune based on your hardware and observed clamd CPU usage.
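If you would rather not add a dependency, the same cap can be sketched with a small worker-pool helper. This is an illustrative alternative to `p-limit`, not something pompelmi provides; `mapWithLimit` is a hypothetical name, and `worker` stands in for a call such as `f => scan(f, opts)`.

```js
// Hypothetical helper: run `worker(item)` for every item with at most
// `limit` calls in flight. Results keep the input order.
async function mapWithLimit(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;
  async function runner() {
    while (next < items.length) {
      const i = next++; // claim an index synchronously, before awaiting
      results[i] = await worker(items[i], i);
    }
  }
  // Start up to `limit` runners that pull from the shared index.
  const runners = Array.from({ length: Math.min(limit, items.length) }, () => runner());
  await Promise.all(runners);
  return results;
}
```

Like `Promise.all`, this rejects as soon as one `worker` call rejects. Wrap the worker in `try`/`catch` if you need per-file outcomes instead.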
+---
+## Concurrently scanning buffers
+```js
+const { scanBuffer, Verdict } = require('pompelmi');
+// req.files from multer.array()
+const results = await Promise.allSettled(
+  req.files.map(file =>
+    scanBuffer(file.buffer, { host: 'clamav', port: 3310 })
+      .then(verdict => ({ name: file.originalname, verdict }))
+  )
+);
+```
+---
+## Performance considerations
+### Local mode
+Each `scan()` in local mode spawns a `clamscan` child process. Spawning processes is expensive — ClamAV loads its virus database into memory on each invocation. For high-throughput local scanning, consider switching to TCP mode where a persistent `clamd` daemon keeps the database in memory.
+### TCP mode
+In TCP mode, pompelmi opens a new TCP connection per scan call. For sustained high-throughput workloads, the connection overhead is measurable. Options:
+1. **Increase concurrency gradually** — start at 5, measure clamd CPU, increase until you see degradation.
+2. **Scale clamd horizontally** — run multiple clamd containers behind a load balancer.
+3. **Connection pooling** — pompelmi does not pool connections. For extremely high throughput, implement a connection pool that keeps sockets open and reuses them.
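For the horizontal-scaling option, a load balancer is the usual answer. Where one is not available, requests can be spread across replicas on the client side. The sketch below is one simple way to do that and is not a pompelmi feature: `makeHostPicker` and the replica host names are hypothetical.

```js
// Hypothetical helper: rotate through clamd replicas in round-robin order.
// Each call returns an options object suitable for scan(), scanBuffer(), etc.
function makeHostPicker(hosts, port = 3310) {
  if (!Array.isArray(hosts) || hosts.length === 0) {
    throw new Error('hosts must be a non-empty array');
  }
  let index = 0;
  return function pick() {
    const host = hosts[index];
    index = (index + 1) % hosts.length; // wrap around after the last replica
    return { host, port };
  };
}

// Example replica names are placeholders for your own clamd hosts.
const pick = makeHostPicker(['clamav-1', 'clamav-2', 'clamav-3']);
```

Each scan then takes the next replica, for example `scan(filePath, pick())`. Round-robin ignores replica load and health, so a real load balancer remains the more robust choice.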
+### Memory
+`scanBuffer()` holds the full file content in memory. For large files (>50 MB), prefer `scan()` (from disk) or `scanStream()` (streaming, no full buffering in TCP mode).
+---
+## Example: batch-scan upload queue
+```js
+const pLimit = require('p-limit');
+const { scan, Verdict } = require('pompelmi');
+const fs = require('fs');
+async function processBatch(filePaths) {
+  const limit = pLimit(8);
+  const results = await Promise.allSettled(
+    filePaths.map(filePath =>
+      limit(async () => {
+        const verdict = await scan(filePath, { host: 'clamav', port: 3310 });
+        return { filePath, verdict };
+      })
+    )
+  );
+  for (const [i, r] of results.entries()) {
+    if (r.status === 'rejected') {
+      // allSettled preserves input order, so the index recovers the path
+      console.error('Scan error:', filePaths[i], r.reason.message);
+      continue;
+    }
+    const { filePath, verdict } = r.value;
+    if (verdict !== Verdict.Clean) {
+      fs.unlinkSync(filePath);
+      console.warn('Rejected:', filePath, verdict.description);
+    }
+  }
+}
+```