@3030-labs/wotw 0.8.4 → 0.9.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/CHANGELOG.md CHANGED
@@ -11,6 +11,60 @@ always non-breaking.
11
11
 
12
12
  ---
13
13
 
14
+ ## [0.9.0] — 2026-05-31 — Multi-LLM + verify-and-harness + Group C + redaction security
15
+
16
+ This release ships the multi-provider LLM abstraction, the standing gold-fact
17
+ regression harness and its CI, the Group C residuals (token-estimation accuracy
18
+ + per-provider extraction quality), and two credential-redaction security fixes.
19
+ All live provider baselines (Anthropic / OpenAI / Gemini) are recorded and
20
+ committed; Ollama ships untested (no local server at release time).
21
+
22
+ ### Security
23
+ - **`AQ.`-format Google API keys are now redacted.** The `gemini-api-key`
24
+ redaction rule matched only the legacy `AIza…` format; Google's current AI
25
+ Studio keys (`AQ.` + ~40–80 chars, rolled out 2024–2025) passed through
26
+ unredacted — a credential-leak vector for any content sent to an LLM. The rule
27
+ now matches both formats (additive; same `cloud_rule_id`, legacy `AIza…`
28
+ redaction preserved verbatim).
29
+ - **Removed real API keys from the redaction test fixtures.** `sanitize.test.ts`
30
+ had used two real keys (Google + OpenAI) as examples; replaced with synthetic
31
+ fixtures. The exposed keys have been rotated. (Git history still contains the
32
+ revoked values — rotation is the remediation; an optional history scrub is
33
+ tracked separately.)
34
+
35
+ ### Added
36
+ - **Multi-LLM provider abstraction (Option B, single-pass).** A `complete()`-level
37
+ `LLMProvider` interface with OpenAI, Gemini, and Ollama wrappers alongside the
38
+ Anthropic provider, wired into all callers via `runtimeAwareComplete`. The
39
+ single-pass ingestion architecture is preserved (no agent-loop reintroduced;
40
+ the no-agent-SDK-import invariant test stays green).
41
+ - **Gold-fact regression harness + cassette CI** (verify-and-harness arc). A
42
+ semantic fact-level precision/recall scorer, a 20-fixture / 105-fact gold
43
+ corpus, regression-from-baseline gating, accepted-delta normalization, and a
44
+ committed-cassette offline PR-gate (zero API calls in CI) plus a scheduled,
45
+ key-gated live-drift workflow.
46
+ - **GC1 — token-estimation accuracy CI.** A ±15% offline PR-gate that compares
47
+ the `estimate_query_cost` 4-char heuristic against recorded provider-native
48
+ token counts (Anthropic + Gemini), gating on regression-from-baseline (compliant
49
+ fixtures held to ±15%, known exceedances pinned against worsening), plus a
50
+ scheduled live-drift tier and a committed accuracy report. Measured result:
51
+ 30/40 within ±15% (the heuristic under-estimates dense/code content; documented).
52
+ OpenAI / Ollama have no native tokenizer — an explicit, documented gap.
53
+ - **GC2 — per-provider fact-extraction quality report.** An operator-facing
54
+ per-provider quality summary with a recall-keyed recommend-disable signal
55
+ (precision is expected-low by design; the gold set is a curated subset). All
56
+ three cloud providers score healthy; Ollama is reported untested.
57
+
58
+ ### Fixed
59
+ - **Per-provider API-key wiring in the harness recorder.** API mode read a single
60
+ `execution.api_key_env` for every provider, so non-Anthropic recordings were
61
+ handed the Anthropic key (401). The recorder now maps each provider to its own
62
+ env var — the reason non-Anthropic recording had never been exercised.
63
+ - Two redaction-security defects (leaked test keys; `AQ.`-format gap) — see
64
+ **Security** above.
65
+
66
+ ---
67
+
14
68
  ## [0.8.4] — 2026-05-26 — Public-launch readiness (PASS-023)
15
69
 
16
70
  ### Added