pompelmi 1.5.0 → 1.7.0

# Performance

Understanding pompelmi's performance characteristics helps you choose the right mode, concurrency level, and file handling strategy for your workload.

---

## Latency: local mode vs TCP mode

| Scenario | Local mode | TCP mode (LAN) |
|----------|-----------|----------------|
| Small file (< 1 MB) | 400–800 ms | 5–20 ms |
| Medium file (5–10 MB) | 800–1500 ms | 20–80 ms |
| Large file (50 MB) | 2000–4000 ms | 100–400 ms |
| ZIP archive (1 MB compressed) | 600–1200 ms | 15–60 ms |

Local mode is dominated by the time ClamAV takes to load the virus database (~300 MB) into memory on each invocation. TCP mode reuses a persistent clamd daemon that keeps the database resident.

> These are rough estimates. Actual latency depends on disk I/O speed, CPU, ClamAV version, and virus definition size.
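
A common pattern is to select the mode from configuration rather than hard-coding it. The sketch below assumes, as the examples in this guide do, that passing `{ host, port }` selects TCP mode and that omitting them falls back to local mode; verify this against your pompelmi version before relying on it:

```js
// Sketch: pick scan options from the environment. Assumption: { host, port }
// selects TCP mode; an empty options object selects local mode.
function scanOptions(env = process.env) {
  if (env.CLAMAV_HOST) {
    return { host: env.CLAMAV_HOST, port: Number(env.CLAMAV_PORT || 3310) };
  }
  return {}; // local mode: slower per call, but no daemon required
}

// Usage: const result = await scan(filePath, scanOptions());
```

This lets development machines run without clamd while production uses the fast path.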

---

## Throughput: concurrent scans

### Local mode

Each local scan spawns a `clamscan` process that loads the database from disk. On a 4-core machine:

```
~2–4 concurrent scans before CPU saturation
~1–2 scans/second sustained throughput
```

Increasing concurrency beyond 4 in local mode degrades performance rather than improving it — processes compete for disk and CPU.

### TCP mode

clamd keeps the virus database in memory and serves requests from a pool of worker threads (`MaxThreads` in `clamd.conf`, default 10). Connections beyond the pool size are accepted and queued:

```
~5–10 concurrent scans before clamd is saturated (single instance)
~50–200 scans/second sustained throughput (single clamd, depends on file size)
```

Scale horizontally by running multiple clamd instances behind a load balancer.
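
If a load balancer is not available, client-side round-robin achieves the same distribution. A minimal sketch; the host names and the `{ host, port }` option shape are assumptions following the examples in this guide:

```js
// Sketch: rotate scan requests across several clamd instances.
function makeRoundRobin(hosts, port = 3310) {
  let i = 0;
  return function next() {
    const host = hosts[i % hosts.length];
    i += 1;
    return { host, port };
  };
}

// Usage: const pick = makeRoundRobin(['clamav-1', 'clamav-2']);
// await scan(filePath, pick());
```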

---

## Memory usage

### `scan()` (file path)

Memory usage is minimal in the application process — pompelmi reads a path and delegates. ClamAV allocates memory to load the database and scan the file (especially for archive extraction).

### `scanBuffer()` with large files

The full file content is held in memory as a Node.js `Buffer` for the duration of the scan. For a 50 MB upload:

- Application process: ~50 MB Buffer
- clamd (TCP mode): streams the buffer, does not accumulate it all at once
- Local mode: writes the buffer to a temp file; little memory is needed beyond the Buffer the app already holds

**Avoid `scanBuffer()` for files > 50 MB.** Use `scan()` (disk) or `scanStream()` (streaming) instead.

### `scanStream()` with TCP mode

The stream is piped directly to clamd in 64 KB chunks. The application process never holds the full file in memory — peak memory usage is approximately 64 KB for the chunk buffer plus stream buffering overhead. This is the most memory-efficient option for large files.

---

## Temp file cleanup in local mode

`scanBuffer()` and `scanStream()` in local mode write a temp file to `os.tmpdir()` before scanning. pompelmi deletes the temp file in a `finally` block — it is always removed regardless of scan outcome.

However, if your process is killed with `SIGKILL` (not `SIGTERM`), the `finally` block does not run and the temp file persists. Add a startup cleanup or use a system temp cleaner (Linux `systemd-tmpfiles`, macOS `/tmp` auto-clean) to handle this case.

```js
const os = require('os');
const fs = require('fs');
const path = require('path');

function cleanTempFiles() {
  const tmpDir = os.tmpdir();
  const files = fs.readdirSync(tmpDir);
  const stale = files.filter(f => f.startsWith('scan-') && f.endsWith('.tmp'));

  for (const f of stale) {
    const full = path.join(tmpDir, f);
    const age = Date.now() - fs.statSync(full).mtimeMs;
    if (age > 60_000) { // older than 1 minute
      try { fs.unlinkSync(full); } catch {}
    }
  }
}

// Run at startup
cleanTempFiles();
```

---

## Connection considerations for TCP mode

pompelmi opens a new TCP connection per scan call. For sporadic uploads, this is fine — the connection overhead is small (< 1 ms on LAN).

For sustained high-throughput workloads (hundreds of scans per second), the connection overhead accumulates. Options:

1. **Keep-alive / connection reuse:** pompelmi does not implement connection pooling. If this becomes a bottleneck, implement a pool using Node.js `net.Socket` that reuses open connections.

2. **Increase clamd's concurrency limits:** Check `MaxThreads` (default: 10) and `MaxConnectionQueueLength` (default: 200) in `clamd.conf`. Raise them if you are running many concurrent scans.

3. **Scale horizontally:** Run multiple clamd instances behind a load balancer and distribute scan requests across them.
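
A pool for option 1 can start from a generic sketch like the one below. The connection factory is injected (e.g. `() => net.createConnection(3310, 'clamav')`), but note that clamd's session protocol (`IDSESSION`/`END`) is not handled here; treat this as a starting point, not a drop-in:

```js
// Minimal connection pool sketch: reuse idle connections, cap total size.
class Pool {
  constructor(factory, max = 5) {
    this.factory = factory; // () => a fresh connection
    this.max = max;
    this.idle = [];
    this.size = 0;
  }

  acquire() {
    if (this.idle.length > 0) return this.idle.pop();
    if (this.size < this.max) {
      this.size += 1;
      return this.factory();
    }
    throw new Error('pool exhausted');
  }

  release(conn) {
    this.idle.push(conn); // return a healthy connection for reuse
  }
}
```

A production pool would also queue callers instead of throwing when exhausted, and evict connections that error or go stale.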

---

## `scanDirectory()` performance

`scanDirectory()` scans all files concurrently (bounded internally). For very large directories (thousands of files), it may open many simultaneous connections to clamd.

If you observe clamd connection errors with large directories, use `p-limit` to wrap individual `scan()` calls instead:

```js
const pLimit = require('p-limit'); // p-limit v3 — later versions are ESM-only
const { scan } = require('pompelmi');
const fs = require('fs');
const path = require('path');

async function scanDirLimited(dirPath, concurrency = 5) {
  const limit = pLimit(concurrency);
  const files = fs.readdirSync(dirPath, { recursive: true })
    .map(f => path.join(dirPath, f))
    .filter(f => !fs.statSync(f).isDirectory());

  return Promise.allSettled(
    files.map(f => limit(async () => ({
      path: f,
      verdict: await scan(f, { host: 'clamav', port: 3310 }),
    })))
  );
}
```

---

## Profiling scan latency in production

Wrap your scan calls with timing instrumentation:

```js
const fs = require('fs');
const { scan } = require('pompelmi');
// `logger` is your application's structured logger (pino, winston, ...)

async function timedScan(filePath, opts) {
  const start = Date.now();
  const result = await scan(filePath, opts);
  const ms = Date.now() - start;

  logger.info({
    event: 'scan_complete',
    filePath,
    verdict: result.description,
    ms,
    size: fs.statSync(filePath).size,
  });

  return result;
}
```

Track the `ms` metric in your observability system. Sudden increases indicate clamd overload, disk I/O contention, or stale virus definitions.

---

## Choosing the right function for your workload

| Scenario | Recommended function | Reason |
|----------|---------------------|--------|
| File uploaded to disk | `scan(filePath)` | Zero buffer overhead |
| multer memoryStorage, small files (< 10 MB) | `scanBuffer(buffer)` | Simple, no temp file in TCP mode |
| multer memoryStorage, large files | `scanStream(stream)` | No full buffer in memory |
| S3 getObject | `scanStream(response.Body)` | No disk, no full buffer |
| Batch of files in a folder | `scanDirectory(dirPath)` | Single call, concurrent |
| High-throughput uploads | TCP mode + `scanStream()` | Lowest latency, no disk |
# Quarantine Workflow

Deleting malicious files immediately is the simplest response, but a quarantine folder lets you retain infected files for forensic review, audit logging, and pattern analysis before permanent deletion.

---

## Basic quarantine: move instead of delete

```js
const fs = require('fs');
const path = require('path');
const { scan, Verdict } = require('pompelmi');

const QUARANTINE_DIR = path.join(__dirname, 'quarantine');
fs.mkdirSync(QUARANTINE_DIR, { recursive: true });

async function scanAndQuarantine(filePath) {
  const result = await scan(filePath, { host: process.env.CLAMAV_HOST, port: 3310 });

  if (result === Verdict.Malicious) {
    const filename = path.basename(filePath);
    const dest = path.join(QUARANTINE_DIR, `${Date.now()}-${filename}`);

    fs.renameSync(filePath, dest);

    console.warn({
      event: 'quarantined',
      original: filePath,
      dest,
      verdict: result.description,
    });

    return { quarantined: true, dest };
  }

  if (result === Verdict.ScanError) {
    fs.unlinkSync(filePath);
    return { quarantined: false, deleted: true, reason: 'scan_error' };
  }

  return { quarantined: false, verdict: result.description };
}
```

`fs.renameSync` is atomic on the same filesystem. If `filePath` and `QUARANTINE_DIR` are on different filesystems, copy then delete:

```js
fs.copyFileSync(filePath, dest);
fs.unlinkSync(filePath);
```

---

## Quarantine folder structure

Organise quarantine files for easy review. A date-based hierarchy keeps any single directory manageable:

```
quarantine/
  2024/
    04/
      28/
        1714300800000-invoice.pdf
        1714301200000-resume.doc
```

```js
function quarantinePath(originalPath) {
  const now = new Date();
  const year = now.getFullYear();
  const month = String(now.getMonth() + 1).padStart(2, '0');
  const day = String(now.getDate()).padStart(2, '0');
  const dir = path.join(QUARANTINE_DIR, String(year), month, day);
  const filename = `${Date.now()}-${path.basename(originalPath)}`;

  fs.mkdirSync(dir, { recursive: true });
  return path.join(dir, filename);
}
```

---

## Logging quarantined files to a database

Store a record of every quarantined file for audit and reporting:

```js
const fs = require('fs');
const { scan, Verdict } = require('pompelmi');

async function scanAndLog(filePath, db, userId) {
  let result;
  try {
    result = await scan(filePath, { host: 'clamav', port: 3310 });
  } catch (err) {
    await db.scanEvents.insert({
      filePath,
      userId,
      event: 'scan_error',
      error: err.message,
      createdAt: new Date(),
    });
    throw err;
  }

  if (result === Verdict.Malicious) {
    const dest = quarantinePath(filePath);
    fs.renameSync(filePath, dest);

    await db.scanEvents.insert({
      originalPath: filePath,
      quarantinePath: dest,
      userId,
      event: 'quarantined',
      verdict: 'malicious',
      createdAt: new Date(),
    });

    return { quarantined: true, dest };
  }

  await db.scanEvents.insert({
    filePath,
    userId,
    event: 'clean',
    verdict: 'clean',
    createdAt: new Date(),
  });

  return { quarantined: false };
}
```

---

## Alerting on quarantine events

Send a notification when malware is detected. Use any alerting mechanism — email, Slack, PagerDuty, a webhook:

```js
async function notifyAdmin(event) {
  const message = [
    `Malicious file quarantined`,
    `Original path: ${event.originalPath}`,
    `Quarantine path: ${event.quarantinePath}`,
    `User: ${event.userId}`,
    `Time: ${event.createdAt.toISOString()}`,
  ].join('\n');

  // global fetch requires Node.js 18+
  await fetch(process.env.SLACK_WEBHOOK_URL, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ text: message }),
  });
}
```

---

## Express integration with quarantine

```js
const express = require('express');
const multer = require('multer');
const fs = require('fs');
const { scan, Verdict } = require('pompelmi');
// `quarantinePath()` is defined above; `logger` is your app's logger

const app = express();
const upload = multer({ dest: './uploads' });

app.post('/upload', upload.single('file'), async (req, res) => {
  if (!req.file) return res.status(400).json({ error: 'No file.' });

  const filePath = req.file.path;
  const result = await scan(filePath, { host: 'clamav', port: 3310 }).catch(err => {
    try { fs.unlinkSync(filePath); } catch {}
    throw err;
  });

  if (result === Verdict.Malicious) {
    const dest = quarantinePath(filePath);
    fs.renameSync(filePath, dest);
    logger.warn({ event: 'quarantined', dest, userId: req.user?.id });
    return res.status(422).json({ error: 'Malicious file rejected.' });
  }

  if (result === Verdict.ScanError) {
    fs.unlinkSync(filePath);
    return res.status(422).json({ error: 'Scan incomplete — file rejected.' });
  }

  return res.json({ ok: true, filename: req.file.filename });
});
```

---

## Reviewing quarantined files

To review what was quarantined:

```bash
# List quarantined files with sizes
find quarantine/ -type f -exec ls -lh {} \;

# Count by day
find quarantine/ -type f | cut -d/ -f2-4 | sort | uniq -c
```

From a Node.js admin script:

```js
const { scanDirectory } = require('pompelmi');

// Re-scan the quarantine folder to verify signatures (optional)
const results = await scanDirectory('./quarantine', { host: 'clamav', port: 3310 });
console.log(`Quarantine: ${results.malicious.length} confirmed malicious, ${results.clean.length} clean`);
```

---

## Cleanup policy

Quarantined files should not accumulate indefinitely. Implement a retention policy:

```js
const fs = require('fs');
const path = require('path');

const RETENTION_DAYS = 30;

function pruneQuarantine(dir) {
  const cutoff = Date.now() - RETENTION_DAYS * 24 * 60 * 60 * 1000;

  for (const file of fs.readdirSync(dir, { recursive: true })) {
    const fullPath = path.join(dir, file);
    const stat = fs.statSync(fullPath);

    if (stat.isFile() && stat.mtimeMs < cutoff) {
      fs.unlinkSync(fullPath);
      console.log(`Deleted expired quarantine file: ${fullPath}`);
    }
  }
}

pruneQuarantine('./quarantine');
```

Run this as a daily cron job. Adjust `RETENTION_DAYS` based on your audit or compliance requirements.

---

## Permissions

Ensure the quarantine directory is not web-accessible. Never serve files from the quarantine folder through your web server. Set restrictive filesystem permissions:

```bash
mkdir -p quarantine
chmod 700 quarantine
```

On Linux, assign ownership to the user running your Node.js process and deny access to all others.