npm - @3030-labs/wotw - Versions diffs - 0.8.4 → 0.9.0 - Mend

@3030-labs/wotw 0.8.4 → 0.9.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (9) hide show

package/CHANGELOG.md CHANGED Viewed

@@ -11,6 +11,60 @@ always non-breaking.
 ---
+## [0.9.0] — 2026-05-31 — Multi-LLM + verify-and-harness + Group C + redaction security
+This release ships the multi-provider LLM abstraction, the standing gold-fact
+regression harness and its CI, the Group C residuals (token-estimation accuracy
++ per-provider extraction quality), and two credential-redaction security fixes.
+All live provider baselines (Anthropic / OpenAI / Gemini) are recorded and
+committed; Ollama ships untested (no local server at release time).
+### Security
+- **`AQ.`-format Google API keys are now redacted.** The `gemini-api-key`
+  redaction rule matched only the legacy `AIza…` format; Google's current AI
+  Studio keys (`AQ.` + ~40–80 chars, rolled out 2024–2025) passed through
+  unredacted — a credential-leak vector for any content sent to an LLM. The rule
+  now matches both formats (additive; same `cloud_rule_id`, legacy `AIza…`
+  redaction preserved verbatim).
+- **Removed real API keys from the redaction test fixtures.** `sanitize.test.ts`
+  had used two real keys (Google + OpenAI) as examples; replaced with synthetic
+  fixtures. The exposed keys have been rotated. (Git history still contains the
+  revoked values — rotation is the remediation; an optional history scrub is
+  tracked separately.)
+### Added
+- **Multi-LLM provider abstraction (Option B, single-pass).** A `complete()`-level
+  `LLMProvider` interface with OpenAI, Gemini, and Ollama wrappers alongside the
+  Anthropic provider, wired into all callers via `runtimeAwareComplete`. The
+  single-pass ingestion architecture is preserved (no agent-loop reintroduced;
+  the no-agent-SDK-import invariant test stays green).
+- **Gold-fact regression harness + cassette CI** (verify-and-harness arc). A
+  semantic fact-level precision/recall scorer, a 20-fixture / 105-fact gold
+  corpus, regression-from-baseline gating, accepted-delta normalization, and a
+  committed-cassette offline PR-gate (zero API calls in CI) plus a scheduled,
+  key-gated live-drift workflow.
+- **GC1 — token-estimation accuracy CI.** A ±15% offline PR-gate that compares
+  the `estimate_query_cost` 4-char heuristic against recorded provider-native
+  token counts (Anthropic + Gemini), gating on regression-from-baseline (compliant
+  fixtures held to ±15%, known exceedances pinned against worsening), plus a
+  scheduled live-drift tier and a committed accuracy report. Measured result:
+  30/40 within ±15% (the heuristic under-estimates dense/code content; documented).
+  OpenAI / Ollama have no native tokenizer — an explicit, documented gap.
+- **GC2 — per-provider fact-extraction quality report.** An operator-facing
+  per-provider quality summary with a recall-keyed recommend-disable signal
+  (precision is expected-low by design; the gold set is a curated subset). All
+  three cloud providers score healthy; Ollama is reported untested.
+### Fixed
+- **Per-provider API-key wiring in the harness recorder.** API mode read a single
+  `execution.api_key_env` for every provider, so non-Anthropic recordings were
+  handed the Anthropic key (401). The recorder now maps each provider to its own
+  env var — the reason non-Anthropic recording had never been exercised.
+- Two redaction-security defects (leaked test keys; `AQ.`-format gap) — see
+  **Security** above.
+---
 ## [0.8.4] — 2026-05-26 — Public-launch readiness (PASS-023)
 ### Added