npm - cognium-ai - Versions diffs - 2.7.0 → 2.7.1 - Mend

cognium-ai 2.7.0 → 2.7.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (2)

package/README.md +44 -0
package/package.json +1 -1

package/README.md CHANGED

@@ -83,6 +83,50 @@ export LLM_ENRICHMENT_MODEL=cognium/gpt-oss-120b
 | Ollama (local) | `http://localhost:11434/v1` | `llama3` |
 | Together AI | `https://api.together.xyz/v1` | `meta-llama/Llama-3-70b` |
+## Performance: local Ollama vs cloud LLM
+LLM-enriched scans dispatch ~3 LLM calls per source file (role classification, source discovery, sink discovery). Throughput is dominated by the LLM endpoint's per-call latency, not the SAST analysis. Practical guidance:
+| Setup | Per-call latency | Practical throughput | Recommended for |
+|---|---|---|---|
+| **Cognium proxy** (`http://localhost:4000/v1`) | ~1–3s | ~10–30 files/min | Daily scans, CI |
+| **Cloud (OpenAI gpt-4o-mini, GitHub Models)** | ~1–4s | ~10–30 files/min | Daily scans, CI |
+| **Ollama 7B+** (`llama3:8b`, `qwen2.5-coder:7b`) | ~5–15s | ~3–10 files/min | Small repos, local development |
+| **Ollama 1.5B–3B** (`llama3.2:3b`, `qwen2.5-coder:1.5b`) | ~3–10s | ~5–15 files/min | Development only — JSON quality is unreliable (#25, #37) |
+| **Ollama reasoning** (`deepseek-r1`, `o1`) | 30–120s+ | <1 file/min | Not recommended (#25 — `<think>` blocks break JSON parser) |
+| **Static-only** (`--no-llm`) | n/a | 100s of files/sec | CI gates, large repos, air-gapped |
+For a medium JS repo (~1000 source files), expect:
+- Cognium / cloud: 30 sec – 5 min
+- Ollama 7B: 2–6 hours
+- Ollama 3B: probably similar to the 7B estimate, but with degraded finding quality
+- Static-only: <1 min
+### Tuning knobs
+If you must run LLM-enriched scans against a slow endpoint:
+```bash
+# Raise per-call timeout for slow models (default 60s)
+cognium-ai scan ./src --llm-timeout 180
+# Concurrency control (env vars, default LLM_MAX_CONCURRENT=5, LLM_RATE_LIMIT=10)
+LLM_MAX_CONCURRENT=2 LLM_RATE_LIMIT=4 cognium-ai scan ./src
+# Bound the file count
+cognium-ai scan ./src --max-files 100
+# Or skip LLM entirely on large/CI runs
+cognium-ai scan ./src --no-llm
+```
+### When to choose which
+- **Daily / CI:** Cognium proxy or a small cloud model (`gpt-4o-mini`, or `openai/gpt-4o-mini` via GitHub Models, which has a generous free tier)
+- **Local development:** static-only by default; use a 7B+ Ollama model for occasional LLM-augmented runs
+- **Air-gapped / sensitive repos:** static-only; the SAST core covers OWASP Top 10 / Juliet at >97% accuracy without LLM
+- **Reasoning models (`deepseek-r1`, `o1`):** route through the cognium proxy — direct calls hit the JSON parser issue documented in #25
 ## CI/CD with GitHub Actions
 Run LLM-enhanced SAST in CI using [GitHub Models](https://github.com/marketplace?type=models) free tier -- no API keys to configure:
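
The README hunk's 1000-file estimates can be checked against its own throughput column. The following is a minimal sketch, assuming scan time is simply file count divided by files per minute, with no caching, startup cost, or skipped files. `estimate_minutes` is a hypothetical helper, not part of cognium-ai.

```bash
#!/usr/bin/env bash
# Rough scan-time estimate: files / throughput (files per minute), rounded up.
# Both arguments must be positive integers. Hypothetical helper for illustration.
estimate_minutes() {
  local files=$1 files_per_min=$2
  echo $(( (files + files_per_min - 1) / files_per_min ))
}

# ~1000 source files at the throughput ranges quoted in the table
echo "Cognium / cloud (10-30 files/min): $(estimate_minutes 1000 30)-$(estimate_minutes 1000 10) min"
echo "Ollama 7B+ (3-10 files/min): $(estimate_minutes 1000 10)-$(estimate_minutes 1000 3) min"
```

At the table's rates this gives roughly 34 to 100 minutes for Cognium / cloud and 100 to 334 minutes (about 1.7 to 5.6 hours) for Ollama 7B+. The Ollama result lines up with the README's 2–6 hours. The Cognium / cloud result is well above the README's 30 sec – 5 min, so one of those two figures may rest on assumptions the diff does not state.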

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "cognium-ai",
-  "version": "2.7.0",
+  "version": "2.7.1",
   "description": "AI-powered static analysis CLI with LLM-enhanced vulnerability detection",
   "main": "dist/index.js",
   "types": "dist/index.d.ts",