circle-ir-ai 2.8.3 → 2.8.6

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (2)
  1. package/CHANGELOG.md +126 -0
  2. package/package.json +1 -1
package/CHANGELOG.md CHANGED
@@ -5,6 +5,132 @@ All notable changes to this project will be documented in this file.
  The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
  and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+ ## [2.8.6] - 2026-06-10
+
+ ### Fixed
+
+ - **#84 postmortem: parse-retry budget was entangled with HTTP retry
+ budget in v2.8.5.** Investigation after the v2.8.5 gemma3:12b
+ benchmark re-run showed an unchanged score (94/120, same 2 parse
+ errors). Inspection of the v2.8.5 verbose trail revealed:
+
+ ```
+ [LLM] JSON parse error (array) — parse-retry 1/1
+ [LLM] Error: This operation was aborted — retry 2/3 in 10s
+ [LLM] JSON parse error (array) — parse-retries exhausted
+ ```
+
+ The `parseRetries` counter in `benchmarks/runners/run-cwe-bench-java.ts`
+ was declared outside the `for (attempt...)` loop and was consumed by
+ the first parse error. When an AbortError subsequently triggered an
+ HTTP retry, the retry consumed an `attempt` slot but did not refresh
+ the parse-retry budget — so the next parse error had no budget left.
+
+ Fix: decoupled the two counters. Converted the loop to `while
+ (attempt <= MAX_RETRIES)` and now only HTTP-retry sites
+ (`attempt++`) consume the HTTP budget. Parse-retry `continue`
+ statements leave `attempt` untouched. Each error type now has its
+ own independent budget (max 4 HTTP attempts + 1 parse retry per
+ call). Maximum loop iterations bounded at 5.
+
+ ### Notes
+
+ - The v2.8.6 gemma3:12b benchmark (re-run 2026-06-10) scored the
+ same 94/120 with 2 parse errors. The retry-logic fix is correct
+ but didn't recover the lost detection because gemma3:12b
+ consistently fails on the same 2 specific prompts under the
+ benchmark's actual conditions (sustained corpus load + cross-file
+ context augmentation). Standalone repros with `num_ctx=32768`
+ succeed 5/5 and 6/6 respectively, indicating the failure is
+ contextual to the runner's full augmented prompt + position in
+ the sequence rather than a logic bug.
+
+ - The fix is the right semantic to ship regardless — removes a
+ latent bug from the retry path that would have masked any future
+ retry-recoverable failure.
+
+ - For users seeking zero parse errors, recommend `qwen3-coder:30b`
+ (0/113 failures, 91/120 score, ~2x faster per call) over
+ `gemma3:12b` (2/109 failures, 94/120 score).
+
+ ## [2.8.5] - 2026-06-09
+
+ ### Fixed
+
+ - **#84: CWE-Bench-Java runner produced 2 unrecoverable JSON parse
+ errors with `gemma3:12b`** (and any other local Ollama model) on
+ the 2026-06-09 run. Root cause turned out to be two distinct bugs
+ in `benchmarks/runners/run-cwe-bench-java.ts`:
+
+ **(1) Deterministic context overflow on large files (#118
+ rocketmq).** The Ollama `/v1/chat/completions` (OpenAI-compat)
+ endpoint defaults to `num_ctx=8192` — much smaller than the
+ model's native context window. `AdminBrokerProcessor.java`
+ (2655 lines, ~35K tokens) filled the entire prompt buffer,
+ leaving exactly 1 token for the response ("Okay"). The parser
+ then logged `No JSON found in response`.
+
+ Repro confirmed with `num_ctx∈{8192,16384}` → `eval_count=1`,
+ `num_ctx∈{32768,49152}` → `eval_count≈630`, valid array of 5
+ entries.
+
+ Fix: the runner now sets `options.num_ctx=32768` for any
+ `localhost:11434` / `127.0.0.1:11434` base URL. Honors
+ `LLM_OLLAMA_NUM_CTX` override for users with smaller VRAM or
+ models that don't support 32K (rare). 32K covers every file
+ in CWE-Bench-Java; gemma3:12b / qwen3-coder:30b / llama3 all
+ support 32K natively in <10GB VRAM.
+
+ **(2) Transient temp=0 stochasticity on tiny files (#109
+ spring-security).** `DefaultHttpFirewall.java` is 68 lines and
+ the parse error did NOT reproduce on 3 fresh repro attempts —
+ diagnosed as KV-cache / batch-grouping non-determinism that
+ surfaces at ~1% rate even with `temperature=0`.
+
+ Fix: added a single retry on any JSON parse failure
+ (`PARSE_ERR_ARRAY`, `PARSE_ERR_OBJECT`, `NO_JSON`). One retry
+ is sufficient because the failure is non-deterministic; a
+ second consecutive failure indicates a real prompt/model
+ problem worth recording as `parseError` in stats. Adds at
+ most ~1% extra LLM calls in the worst case, ~0 in the
+ common case.
+
+ Together these fixes should drop gemma3:12b's failure rate
+ from 2/109 (1.8%) to ~0/109. Smoke-tested on #118 only —
+ full re-run will happen on next benchmark cycle.
+
+ New env var: `LLM_OLLAMA_NUM_CTX` (integer, defaults to
+ 32768). Only consulted when the LLM base URL is local
+ Ollama.
+
+ ## [2.8.4] - 2026-06-09
+
+ ### Fixed
+
+ - **#72: benchmark runners ignored externally-set env vars (e.g.
+ `LLM_ENRICHMENT_MODEL`).** Symptom: `LLM_ENRICHMENT_MODEL=gpt-oss-120b
+ npm run benchmark:cwe` silently used whatever value was in the local
+ `.env` instead — masking LLM uplift in CWE-Bench-Java runs and
+ producing static-only numbers when the user had explicitly requested
+ an LLM model on the command line.
+
+ Root cause: 4 benchmark runners loaded `.env` via `dotenv.config()`
+ with its default `override: true` behavior, so `.env` clobbered any
+ pre-existing `process.env` value (the opposite of POSIX
+ precedence).
+
+ Fix: pass `{ override: false }` in all four call sites:
+ - `benchmarks/runners/run-cwe-bench-java.ts`
+ - `benchmarks/runners/run-all-benchmarks-parallel.ts`
+ - `benchmarks/instruction-safety/run-benchmark.ts`
+ - `benchmarks/skills/run-skills-benchmark.ts`
+
+ External env vars (CLI invocation, exported shell vars) now win;
+ `.env` is consulted only for keys not already set. `circle-pack`'s
+ `src/api/server.ts` is intentionally left as-is — different threat
+ model (production REST server where `.env` is the canonical config
+ source).
+
  ## [2.7.19] - 2026-05-28
 
  ### Versioning policy
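The retry-budget change described in the 2.8.6 and 2.8.5 entries can be sketched in TypeScript as follows. This is a minimal sketch, not the code in `run-cwe-bench-java.ts`: `callWithRetries`, `ParseError`, and `parseJsonArray` are hypothetical names, backoff delays (the "in 10s" in the log) are omitted, and `MAX_RETRIES = 3` is an assumption read off the "retry 2/3" log line and the stated budget of 4 HTTP attempts plus 1 parse retry. What it illustrates is that only the request-failure path advances `attempt`, while a parse failure spends its own separate counter.

```typescript
// Hypothetical error type for replies that contain no parseable JSON array.
class ParseError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "ParseError";
  }
}

const MAX_RETRIES = 3;       // assumed: HTTP retries after the first try (4 attempts in total)
const MAX_PARSE_RETRIES = 1; // one retry on a JSON parse failure

// Example parser (hypothetical): anything that is not a JSON array is a parse failure.
function parseJsonArray(raw: string): unknown[] {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    throw new ParseError("No JSON found in response");
  }
  if (!Array.isArray(value)) throw new ParseError("JSON parse error (array)");
  return value;
}

async function callWithRetries<T>(
  sendRequest: () => Promise<string>,
  parse: (raw: string) => T,
): Promise<T> {
  let attempt = 0;      // HTTP budget: advanced only when the request itself fails
  let parseRetries = 0; // parse budget: advanced only when the reply is not valid JSON
  while (attempt <= MAX_RETRIES) {
    let raw: string;
    try {
      raw = await sendRequest();
    } catch (err) {
      attempt++; // e.g. an AbortError on timeout
      if (attempt > MAX_RETRIES) throw err;
      continue;
    }
    try {
      return parse(raw);
    } catch (err) {
      if (!(err instanceof ParseError)) throw err;
      if (parseRetries >= MAX_PARSE_RETRIES) throw err; // second failure: give up
      parseRetries++; // `attempt` is deliberately left untouched here
    }
  }
  throw new Error("HTTP retry budget exhausted");
}
```

With these budgets the loop runs at most 5 times, and a call that sees one parse failure followed by three aborted requests still reaches a fifth, successful iteration. If a parse retry also advanced `attempt`, as a `continue` inside a `for (let attempt = 0; attempt <= MAX_RETRIES; attempt++)` loop would, the same sequence would exhaust the HTTP budget after four iterations.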
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "circle-ir-ai",
- "version": "2.8.3",
+ "version": "2.8.6",
  "description": "LLM-enhanced SAST analysis built on circle-ir",
  "main": "dist/index.js",
  "module": "dist/index.js",
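The two environment-related fixes in the CHANGELOG diff (2.8.4's `.env` precedence and 2.8.5's local-Ollama `num_ctx` default) can be illustrated with a small TypeScript sketch. In the runners themselves the first is the call-site change to `dotenv.config({ override: false })`; `mergeDotenv` below only mimics that precedence on a plain object so it can be exercised without touching `process.env`. `Env`, `mergeDotenv`, `resolveOllamaNumCtx`, the substring match on the base URL, and the fallback for a non-numeric override are assumptions for illustration, not the runner's actual logic, and how the value is attached to the request body is not shown.

```typescript
// Hypothetical environment shape: unset keys are `undefined`.
type Env = Record<string, string | undefined>;

const DEFAULT_OLLAMA_NUM_CTX = 32768;
// Base-URL fragments treated as "local Ollama", per the 2.8.5 entry.
const LOCAL_OLLAMA_HOSTS = ["localhost:11434", "127.0.0.1:11434"];

// Mimics `override: false`: keys already present in `env` win,
// and `.env` entries only fill in keys that are not set.
function mergeDotenv(env: Env, dotenvEntries: Record<string, string>): Env {
  const merged: Env = { ...env };
  for (const key of Object.keys(dotenvEntries)) {
    if (merged[key] === undefined) {
      merged[key] = dotenvEntries[key];
    }
  }
  return merged;
}

// Returns the num_ctx to use for local Ollama, or undefined for any other
// provider, whose request options should be left untouched.
function resolveOllamaNumCtx(baseUrl: string, env: Env): number | undefined {
  const isLocalOllama = LOCAL_OLLAMA_HOSTS.some((host) => baseUrl.includes(host));
  if (!isLocalOllama) return undefined;
  // LLM_OLLAMA_NUM_CTX overrides the default; a missing or non-numeric value falls back.
  const override = parseInt(env.LLM_OLLAMA_NUM_CTX ?? "", 10);
  return override > 0 ? override : DEFAULT_OLLAMA_NUM_CTX;
}
```

For example, with `LLM_ENRICHMENT_MODEL=gpt-oss-120b` set in the shell and a different value in `.env`, `mergeDotenv` keeps `gpt-oss-120b`; `resolveOllamaNumCtx("http://localhost:11434/v1", {})` yields 32768, and a hosted base URL yields `undefined`.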