cognium-ai 2.7.0 → 2.7.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (2)
  1. package/README.md +44 -0
  2. package/package.json +2 -2
package/README.md CHANGED
@@ -83,6 +83,50 @@ export LLM_ENRICHMENT_MODEL=cognium/gpt-oss-120b
  | Ollama (local) | `http://localhost:11434/v1` | `llama3` |
  | Together AI | `https://api.together.xyz/v1` | `meta-llama/Llama-3-70b` |
 
+ ## Performance: local Ollama vs cloud LLM
+
+ LLM-enriched scans dispatch ~3 LLM calls per source file (role classification, source discovery, sink discovery). Throughput is dominated by the LLM endpoint's per-call latency, not the SAST analysis. Practical guidance:
+
+ | Setup | Per-call latency | Practical throughput | Recommended for |
+ |---|---|---|---|
+ | **Cognium proxy** (`http://localhost:4000/v1`) | ~1–3s | ~10–30 files/min | Daily scans, CI |
+ | **Cloud (OpenAI gpt-4o-mini, GitHub Models)** | ~1–4s | ~10–30 files/min | Daily scans, CI |
+ | **Ollama 7B+** (`llama3:8b`, `qwen2.5-coder:7b`) | ~5–15s | ~3–10 files/min | Small repos, local development |
+ | **Ollama 1.5B–3B** (`llama3.2:3b`, `qwen2.5-coder:1.5b`) | ~3–10s | ~5–15 files/min | Development only — JSON quality is unreliable (#25, #37) |
+ | **Ollama reasoning** (`deepseek-r1`, `o1`) | 30–120s+ | <1 file/min | Not recommended (#25 — `<think>` blocks break JSON parser) |
+ | **Static-only** (`--no-llm`) | n/a | 100s files/sec | CI gates, large repos, air-gapped |
+
+ For a medium JS repo (~1000 source files), expect:
+ - Cognium / cloud: 30 sec – 5 min
+ - Ollama 7B: 2–6 hours
+ - Ollama 3B: probably similar but with degraded finding quality
+ - Static-only: <1 min
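The estimates in the list above follow from dividing the file count by the table's throughput figures. As a rough worked example for the Ollama 7B row (~3–10 files/min), here is a minimal sketch; `estimate_minutes` is a hypothetical helper written for illustration and is not part of cognium-ai:

```bash
# Hypothetical helper (not part of cognium-ai): estimate wall-clock minutes
# for a scan from a file count and a files-per-minute throughput figure.
estimate_minutes() {
  local files=$1 files_per_min=$2
  # Integer ceiling division: (a + b - 1) / b
  echo $(( (files + files_per_min - 1) / files_per_min ))
}

# Ollama 7B row from the table: ~3-10 files/min, ~1000 source files.
slow=$(estimate_minutes 1000 3)    # pessimistic end of the range
fast=$(estimate_minutes 1000 10)   # optimistic end of the range
echo "Ollama 7B: ${fast}-${slow} min"   # prints "Ollama 7B: 100-334 min"
```

That is roughly 1.7 to 5.6 hours, in line with the 2–6 hour figure quoted for Ollama 7B.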
+
+ ### Tuning knobs
+
+ If you must run LLM-enriched scans against a slow endpoint:
+
+ ```bash
+ # Raise per-call timeout for slow models (default 60s)
+ cognium-ai scan ./src --llm-timeout 180
+
+ # Concurrency control (env vars, default LLM_MAX_CONCURRENT=5, LLM_RATE_LIMIT=10)
+ LLM_MAX_CONCURRENT=2 LLM_RATE_LIMIT=4 cognium-ai scan ./src
+
+ # Bound the file count
+ cognium-ai scan ./src --max-files 100
+
+ # Or skip LLM entirely on large/CI runs
+ cognium-ai scan ./src --no-llm
+ ```
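The knobs above could be wrapped in a small script that picks a mode from the repository size. The sketch below only prints the command it would run. `build_scan_cmd` and the 500-file threshold are assumptions made for illustration, and combining the environment variables with `--llm-timeout` in one invocation is assumed to work rather than confirmed by the README:

```bash
# Hypothetical wrapper (not part of cognium-ai): choose a scan command
# from a file count. The default 500-file threshold is an arbitrary assumption.
build_scan_cmd() {
  local target=$1 file_count=$2 threshold=${3:-500}
  if [ "$file_count" -gt "$threshold" ]; then
    # Large repo: skip LLM enrichment entirely.
    echo "cognium-ai scan $target --no-llm"
  else
    # Small repo on a slow endpoint: lower concurrency, longer timeout.
    echo "LLM_MAX_CONCURRENT=2 LLM_RATE_LIMIT=4 cognium-ai scan $target --llm-timeout 180"
  fi
}

build_scan_cmd ./src 1200   # prints the --no-llm command
build_scan_cmd ./src 80     # prints the throttled LLM command
```

Printing the command instead of executing it keeps the sketch safe to try; piping the output to `sh` would run the selected scan.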
+
+ ### When to choose which
+
+ - **Daily / CI:** cognium proxy or a small cloud model (`gpt-4o-mini`, `openai/gpt-4o-mini` via GitHub Models — generous free tier)
+ - **Local development:** static-only by default; use a 7B+ Ollama model for occasional LLM-augmented runs
+ - **Air-gapped / sensitive repos:** static-only; the SAST core covers OWASP Top 10 / Juliet at >97% accuracy without LLM
+ - **Reasoning models (`deepseek-r1`, `o1`):** route through the cognium proxy — direct calls hit the JSON parser issue documented in #25
+
  ## CI/CD with GitHub Actions
 
  Run LLM-enhanced SAST in CI using [GitHub Models](https://github.com/marketplace?type=models) free tier -- no API keys to configure:
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "cognium-ai",
- "version": "2.7.0",
+ "version": "2.7.2",
  "description": "AI-powered static analysis CLI with LLM-enhanced vulnerability detection",
  "main": "dist/index.js",
  "types": "dist/index.d.ts",
@@ -42,7 +42,7 @@
  ],
  "dependencies": {
  "circle-ir": "^3.20.0",
- "circle-ir-ai": "^2.7.0",
+ "circle-ir-ai": "^2.7.1",
  "commander": "^14.0.3",
  "minimatch": "^10.2.5"
  },