npm - audrey - Versions diffs - 0.20.0 → 0.23.1 - Mend

audrey 0.20.0 → 0.23.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (156)

package/CHANGELOG.md +191 -0
package/README.md +216 -117
package/SECURITY.md +29 -0
package/dist/mcp-server/config.d.ts +29 -4
package/dist/mcp-server/config.d.ts.map +1 -1
package/dist/mcp-server/config.js +100 -17
package/dist/mcp-server/config.js.map +1 -1
package/dist/mcp-server/index.d.ts +302 -25
package/dist/mcp-server/index.d.ts.map +1 -1
package/dist/mcp-server/index.js +1077 -74
package/dist/mcp-server/index.js.map +1 -1
package/dist/src/adaptive.d.ts.map +1 -1
package/dist/src/adaptive.js +3 -1
package/dist/src/adaptive.js.map +1 -1
package/dist/src/affect.d.ts +4 -1
package/dist/src/affect.d.ts.map +1 -1
package/dist/src/affect.js +6 -4
package/dist/src/affect.js.map +1 -1
package/dist/src/audrey.d.ts +58 -4
package/dist/src/audrey.d.ts.map +1 -1
package/dist/src/audrey.js +469 -62
package/dist/src/audrey.js.map +1 -1
package/dist/src/capsule.d.ts +2 -1
package/dist/src/capsule.d.ts.map +1 -1
package/dist/src/capsule.js +14 -4
package/dist/src/capsule.js.map +1 -1
package/dist/src/causal.d.ts.map +1 -1
package/dist/src/causal.js +20 -2
package/dist/src/causal.js.map +1 -1
package/dist/src/confidence.d.ts.map +1 -1
package/dist/src/confidence.js +3 -0
package/dist/src/confidence.js.map +1 -1
package/dist/src/consolidate.d.ts +1 -0
package/dist/src/consolidate.d.ts.map +1 -1
package/dist/src/consolidate.js +35 -19
package/dist/src/consolidate.js.map +1 -1
package/dist/src/controller.d.ts +38 -0
package/dist/src/controller.d.ts.map +1 -0
package/dist/src/controller.js +169 -0
package/dist/src/controller.js.map +1 -0
package/dist/src/db.d.ts.map +1 -1
package/dist/src/db.js +12 -0
package/dist/src/db.js.map +1 -1
package/dist/src/decay.d.ts.map +1 -1
package/dist/src/decay.js +57 -50
package/dist/src/decay.js.map +1 -1
package/dist/src/embedding.d.ts.map +1 -1
package/dist/src/embedding.js +31 -3
package/dist/src/embedding.js.map +1 -1
package/dist/src/encode.d.ts +9 -2
package/dist/src/encode.d.ts.map +1 -1
package/dist/src/encode.js +21 -8
package/dist/src/encode.js.map +1 -1
package/dist/src/export.d.ts.map +1 -1
package/dist/src/export.js +5 -3
package/dist/src/export.js.map +1 -1
package/dist/src/feedback.d.ts +29 -0
package/dist/src/feedback.d.ts.map +1 -0
package/dist/src/feedback.js +123 -0
package/dist/src/feedback.js.map +1 -0
package/dist/src/forget.d.ts.map +1 -1
package/dist/src/forget.js +58 -50
package/dist/src/forget.js.map +1 -1
package/dist/src/fts.js +1 -1
package/dist/src/fts.js.map +1 -1
package/dist/src/hybrid-recall.d.ts +2 -1
package/dist/src/hybrid-recall.d.ts.map +1 -1
package/dist/src/hybrid-recall.js +35 -26
package/dist/src/hybrid-recall.js.map +1 -1
package/dist/src/impact.d.ts +47 -0
package/dist/src/impact.d.ts.map +1 -0
package/dist/src/impact.js +146 -0
package/dist/src/impact.js.map +1 -0
package/dist/src/import.d.ts +177 -1
package/dist/src/import.d.ts.map +1 -1
package/dist/src/import.js +206 -17
package/dist/src/import.js.map +1 -1
package/dist/src/index.d.ts +8 -0
package/dist/src/index.d.ts.map +1 -1
package/dist/src/index.js +4 -0
package/dist/src/index.js.map +1 -1
package/dist/src/interference.d.ts +5 -2
package/dist/src/interference.d.ts.map +1 -1
package/dist/src/interference.js +27 -20
package/dist/src/interference.js.map +1 -1
package/dist/src/llm.d.ts.map +1 -1
package/dist/src/llm.js +1 -0
package/dist/src/llm.js.map +1 -1
package/dist/src/migrate.d.ts.map +1 -1
package/dist/src/migrate.js +21 -9
package/dist/src/migrate.js.map +1 -1
package/dist/src/preflight.d.ts +52 -0
package/dist/src/preflight.d.ts.map +1 -0
package/dist/src/preflight.js +221 -0
package/dist/src/preflight.js.map +1 -0
package/dist/src/profile.d.ts +23 -0
package/dist/src/profile.d.ts.map +1 -0
package/dist/src/profile.js +51 -0
package/dist/src/profile.js.map +1 -0
package/dist/src/promote.d.ts.map +1 -1
package/dist/src/promote.js +2 -3
package/dist/src/promote.js.map +1 -1
package/dist/src/prompts.d.ts.map +1 -1
package/dist/src/prompts.js +76 -47
package/dist/src/prompts.js.map +1 -1
package/dist/src/recall.d.ts +9 -6
package/dist/src/recall.d.ts.map +1 -1
package/dist/src/recall.js +182 -40
package/dist/src/recall.js.map +1 -1
package/dist/src/redact.d.ts +7 -1
package/dist/src/redact.d.ts.map +1 -1
package/dist/src/redact.js +94 -11
package/dist/src/redact.js.map +1 -1
package/dist/src/reflexes.d.ts +35 -0
package/dist/src/reflexes.d.ts.map +1 -0
package/dist/src/reflexes.js +87 -0
package/dist/src/reflexes.js.map +1 -0
package/dist/src/rollback.d.ts.map +1 -1
package/dist/src/rollback.js +9 -4
package/dist/src/rollback.js.map +1 -1
package/dist/src/routes.d.ts +1 -0
package/dist/src/routes.d.ts.map +1 -1
package/dist/src/routes.js +267 -11
package/dist/src/routes.js.map +1 -1
package/dist/src/rules-compiler.d.ts.map +1 -1
package/dist/src/rules-compiler.js +36 -6
package/dist/src/rules-compiler.js.map +1 -1
package/dist/src/server.d.ts +2 -1
package/dist/src/server.d.ts.map +1 -1
package/dist/src/server.js +42 -4
package/dist/src/server.js.map +1 -1
package/dist/src/tool-trace.d.ts.map +1 -1
package/dist/src/tool-trace.js +42 -29
package/dist/src/tool-trace.js.map +1 -1
package/dist/src/types.d.ts +28 -1
package/dist/src/types.d.ts.map +1 -1
package/dist/src/ulid.d.ts.map +1 -1
package/dist/src/ulid.js +52 -2
package/dist/src/ulid.js.map +1 -1
package/dist/src/utils.d.ts.map +1 -1
package/dist/src/utils.js +8 -1
package/dist/src/utils.js.map +1 -1
package/dist/src/validate.d.ts +2 -0
package/dist/src/validate.d.ts.map +1 -1
package/dist/src/validate.js +60 -29
package/dist/src/validate.js.map +1 -1
package/docs/assets/audrey-feature-grid.jpg +0 -0
package/docs/assets/audrey-logo.svg +45 -0
package/docs/assets/audrey-wordmark.png +0 -0
package/examples/ollama-memory-agent.js +326 -0
package/package.json +35 -22
package/docs/assets/benchmarks/local-benchmark.svg +0 -45
package/docs/assets/benchmarks/operations-benchmark.svg +0 -45
package/docs/assets/benchmarks/published-memory-standards.svg +0 -50
package/docs/benchmarking.md +0 -151
package/docs/production-readiness.md +0 -124

package/CHANGELOG.md ADDED

@@ -0,0 +1,191 @@
+# Changelog
+## 0.23.1 - 2026-05-08
+### Added - Audrey Guard chassis
+- Added `MemoryController` as the first orchestration layer for memory-before-action workflows. `beforeAction()` returns `allow` / `warn` / `block` with evidence, reflexes, recommendations, and an optional capsule; `afterAction()` records redacted tool outcomes and turns failures into tool-result memories.
+- Added `audrey guard --tool <Tool> "<action>"` with `--json`, `--explain`, `--override`, and `--fail-on-warn`.
+- Added `audrey demo --scenario repeated-failure`, a deterministic no-network demo where Audrey records a failed deploy, blocks the repeat attempt, validates the lesson, and prints impact.
+- `Audrey.encodeBatch()` now uses provider-level `embedBatch()` and validates the batch before embedding, avoiding N sequential cloud embedding calls for valid batches.
+- Recall now surfaces partial vector/FTS failures on the returned result array. Capsules preserve those diagnostics, strict Guard preflights block when recall is degraded, and `/v1/status` / `memory_status` expose the latest recall degradation signal.
+- Added `docs/AUDREY_PAPER_OUTLINE.md`, framing Audrey Guard as local-first pre-action memory control for tool-using agents and outlining the GuardBench evaluation plan.
+### Fixed
+- Docker Compose now requires `AUDREY_API_KEY` instead of starting a non-loopback unauthenticated REST sidecar that the server correctly refuses.
+- Guard exact-failure matching now redacts before trimming, matches tool names case-insensitively, and includes file scope in the action hash.
+- Redaction-aware truncation keeps complete `[REDACTED:*]` markers in long tool errors and output summaries.
+- `npm test` and `npm run test:watch` now set a repo-local Vitest temp directory before Vitest starts, avoiding locked-down Windows user-temp failures.
+- `npm audit --omit=dev --audit-level=moderate` is clean after refreshing Hono, Zod, and transitive rate-limit packages.
+- README benchmark sample values now match `benchmarks/snapshots/perf-0.22.2.json`; the paper evidence ledger was re-checked for the repeated-failure demo line range and live bibliography URLs before release prep.
+## 0.22.2 - 2026-05-01
+### Correctness — second CodeRabbit review pass and code-scanning audit
+- `src/forget.ts` `WHERE v.state ...` was filtering on the denormalized state column on `vec_semantics` / `vec_procedures`. That column is only populated at INSERT and never updated, so dormant or superseded rows were still passing the filter. Switched to `s.state` / `p.state`. Same fix applied to `src/interference.ts` after the second review pass caught the duplicate.
+- Wrapped `forgetMemory`, `purgeMemories`, `applyDecay`, `applyInterference`, and the contradiction insert + state update in `src/validate.ts` in transactions so partial failures can't leave inconsistent counts or orphan contradictions.
+- `mcp-server/index.ts` `VALID_SOURCES` and `VALID_TYPES` were object literals fed to `z.enum()`, which expects a tuple. Converted to const tuples so the MCP schemas validate correctly.
+- `src/utils.ts` `cosineSimilarity` now throws on length mismatch instead of silently returning NaN; `daysBetween` throws on invalid date strings.
+- `src/ulid.ts` `generateDeterministicId` rebuilt as canonicalize → SHA-256 → first 16 bytes → Crockford Base32. The previous shape used `JSON.stringify` (object-key-order-unstable) and emitted hex characters, neither of which produced a real ULID. `canonicalize` now also rejects circular references.
+- `src/audrey.ts` constructor and `consolidate`/`decay` now use `??` for default fallbacks so an explicit `0` survives. The previous `||` short-circuit silently replaced valid zero-value config.
+- `src/audrey.ts` `recallStream` now respects `options.agent` (was hardcoded to `this.agent`) and waits for embedding warmup like the non-streaming path.
+- `src/confidence.ts` `recencyDecay` throws `RangeError` on `halfLifeDays <= 0` to surface NaN/Infinity earlier in the pipeline.
+- `src/causal.ts` and `src/validate.ts` now validate the LLM response shape before reading fields. `causal` rejects non-finite confidence; `validate` rejects non-object/array conditions and only counts new evidence toward `supporting_count`.
+- `src/rollback.ts` UPDATEs now check `.changes` and aggregate real counts. Rolling back ids that don't exist no longer reports false success.
+- `src/rules-compiler.ts` `quoteString` now also escapes newline, carriage return, and tab so promoted rule content with multiline values produces valid double-quoted YAML.
+- `src/decay.ts` and `src/forget.ts purgeMemories` moved their SELECTs inside the surrounding transaction so concurrent writers can't slip rows in or out between read and write.
+- `src/migrate.ts` `reembedAll` chunks `embedBatch` calls into 256-row batches and labels failures by kind + row range. Pre-fix a partial embed failure on a 50K-episode reembed printed a bare provider error and lost the location. `EpisodeMigrateRow.consolidated` was also retyped to `number | null` to match runtime usage.
+- `src/embedding.ts` `embedBatch` validates response shape with clear errors instead of mapping over a missing or malformed `data` field.
+- `src/encode.ts` `effectiveSalience` clamped to `[0, 1]`. The previous formula could go negative on a sufficiently negative arousal boost.
+- `src/affect.ts` `timeDeltaDays` no longer propagates NaN from invalid `created_at`.
+- `src/capsule.ts` failure entry `memory_id` no longer interpolates `'undefined'` when `tool_name` is missing; recall spread order keeps `scope: 'agent'` from being overridden by caller options.
+- `src/import.ts` `isDatabaseEmpty` now also checks `memory_events`. Pre-fix you could `restore` into a "fresh" store that already contained audit-trail rows.
+- `src/server.ts` shutdown awaits `server.close` (was fire-and-forget) and surfaces `audrey.closeAsync` errors to stderr instead of silently swallowing them. `ERR_SERVER_NOT_RUNNING` is treated as success.
+- `src/feedback.ts` replaced a `findRow(id)!.row` non-null assertion with a defensive null check; if the row was concurrently forgotten between UPDATE and re-read, returns the values just written rather than crashing.
+- `src/promote.ts` folded `trigger_conditions` into the main SELECT (was an N+1).
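The `generateDeterministicId` entry above describes a pipeline of canonicalize → SHA-256 → first 16 bytes → Crockford Base32. A minimal sketch of that shape follows, assuming a sorted-key serializer; `canonicalizeSketch` and `deterministicIdSketch` are hypothetical names, and this is not the package's actual code.

```typescript
import { createHash } from "node:crypto";

// Crockford Base32 alphabet (omits I, L, O, U).
const CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ";

// Hypothetical helper: serialize with sorted object keys so that key order
// cannot change the hash, and reject circular references.
function canonicalizeSketch(value: unknown, ancestors: object[] = []): string {
  if (value === null || typeof value !== "object") {
    // At runtime JSON.stringify yields undefined for undefined and functions.
    return JSON.stringify(value) ?? "null";
  }
  if (ancestors.indexOf(value) !== -1) {
    throw new TypeError("canonicalize: circular reference");
  }
  const path = ancestors.concat([value]);
  if (Array.isArray(value)) {
    return "[" + value.map((item) => canonicalizeSketch(item, path)).join(",") + "]";
  }
  const record = value as Record<string, unknown>;
  const body = Object.keys(record)
    .sort()
    .map((key) => JSON.stringify(key) + ":" + canonicalizeSketch(record[key], path))
    .join(",");
  return "{" + body + "}";
}

// Hypothetical helper: hash the canonical form, keep 16 bytes (128 bits),
// and emit 26 Crockford characters (130 bits, two leading zero bits).
function deterministicIdSketch(input: unknown): string {
  const bytes = createHash("sha256")
    .update(canonicalizeSketch(input))
    .digest()
    .subarray(0, 16);
  let id = "";
  let acc = 0;
  let bitCount = 2; // the two zero pad bits in front of the 128 hash bits
  for (let i = 0; i < bytes.length; i++) {
    acc = (acc << 8) | bytes.readUInt8(i);
    bitCount += 8;
    while (bitCount >= 5) {
      bitCount -= 5;
      id += CROCKFORD.charAt((acc >>> bitCount) & 31);
    }
    acc &= (1 << bitCount) - 1; // keep only the bits not yet emitted
  }
  return id;
}
```

With this sketch, `{ a: 1, b: 2 }` and `{ b: 2, a: 1 }` map to the same 26-character ID, which is the property the changelog says the earlier `JSON.stringify`-based shape lacked.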
+### Security
+- `src/routes.ts` API key auth uses padded-buffer constant-time comparison. The previous `provided.length !== expected.length || !timingSafeEqual(...)` shape leaked the expected key length via response timing on local untrusted callers. Both buffers are now padded to 1 KiB before `timingSafeEqual`, so the comparison runs identically regardless of header length.
+- `src/redact.ts` raised the hex-secret length threshold from 40 to 80 chars so 40-character git SHAs and 64-character SHA-256 checksums are no longer redacted as secrets.
+- The "Protect master" GitHub ruleset was updated to drop the stale `Node 18 on Ubuntu` required check (CI dropped Node 18 from the matrix in 0.22.1 to match `engines.node >=20`, but the protection rule kept requiring a check that would never run).
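The padded-buffer comparison described in the `src/routes.ts` entry above can be sketched as follows. This is an illustration under assumptions, not the package's implementation: `padKeySketch` and `apiKeysMatchSketch` are hypothetical names, and the separate length check is one possible way to keep padding from making two different keys compare equal.

```typescript
import { timingSafeEqual } from "node:crypto";

// 1 KiB, the pad size named in the changelog entry.
const PAD_BYTES = 1024;

// Hypothetical helper: copy the key into a fixed-size, zero-filled buffer.
function padKeySketch(value: string): Buffer {
  const padded = Buffer.alloc(PAD_BYTES);
  Buffer.from(value, "utf8").subarray(0, PAD_BYTES).copy(padded);
  return padded;
}

// Hypothetical helper: both comparisons are evaluated before they are combined,
// so the byte comparison always runs over the same 1 KiB regardless of input length.
function apiKeysMatchSketch(provided: string, expected: string): boolean {
  const sameBytes = timingSafeEqual(padKeySketch(provided), padKeySketch(expected));
  const sameLength =
    Buffer.byteLength(provided, "utf8") === Buffer.byteLength(expected, "utf8");
  return sameBytes && sameLength;
}
```

`timingSafeEqual` throws when its two arguments differ in byte length, which is why the earlier shape needed a length pre-check; padding both sides to a fixed size removes that early exit.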
+### Added — closed-loop visibility on REST and Python
+- New `GET /v1/impact` route that mirrors `Audrey.impact()` and the `audrey impact` CLI. Bounds `windowDays` to 1-365 and `limit` to 1-100.
+- Python sync and async clients gained an `impact(window_days=, limit=)` method. The previous `analytics()` no longer raises `NotImplementedError`; it's an alias of `impact()` for older callers.
+- Python integration tests are no longer skipped. The suite spins up the real TS REST sidecar via `node dist/mcp-server/index.js serve` and exercises encode → recall → mark_used → impact → snapshot → restore end-to-end.
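The `GET /v1/impact` entry above bounds `windowDays` to 1-365 and `limit` to 1-100. A minimal sketch of that bounding follows, assuming out-of-range values are clamped rather than rejected and assuming fallback defaults of 7 and 10; the changelog states neither detail, and `boundIntSketch` and `parseImpactQuerySketch` are hypothetical names.

```typescript
// Hypothetical helper: coerce a query value to an integer inside [min, max].
function boundIntSketch(raw: unknown, min: number, max: number, fallback: number): number {
  const n = typeof raw === "string" ? Number(raw) : typeof raw === "number" ? raw : NaN;
  if (!isFinite(n)) return fallback; // missing or non-numeric input
  return Math.min(max, Math.max(min, Math.trunc(n)));
}

// Hypothetical helper: the defaults 7 and 10 are assumptions for illustration.
function parseImpactQuerySketch(query: { windowDays?: unknown; limit?: unknown }): {
  windowDays: number;
  limit: number;
} {
  return {
    windowDays: boundIntSketch(query.windowDays, 1, 365, 7),
    limit: boundIntSketch(query.limit, 1, 100, 10),
  };
}
```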
+### Benchmarks — legitimate performance snapshot, no marketing graphs
+- New `npm run bench:perf-snapshot` (`benchmarks/perf-snapshot.js`) reports encode and hybrid-recall p50/p95/p99 across multiple corpus sizes (default 100, 1000, 5000) with full machine provenance (Node version, CPU model, RAM, git SHA) so the numbers are reproducible.
+- Removed the synthetic-baseline SVG charts (`docs/assets/benchmarks/local-benchmark.svg`, `operations-benchmark.svg`, `published-memory-standards.svg`) from the repo and from the npm package's `files` field. They claimed Audrey beat naive baselines on 12 hand-crafted scenarios, which is not a useful marketing signal. The behavioral regression suite (`npm run bench:memory:check`) still runs as a release gate; it just no longer ships chart artifacts to the README.
+- Removed the `bench:memory:readme-assets` script (it generated the SVGs above).
+- README's Benchmarks section rewritten around the perf snapshot with explicit caveats about embedding-provider cost and what the numbers do and don't cover.
+### Fixed
+- `mcp-server/index.ts` help banner: `memory_validate` was already registered but was missing from the in-session tool list.
+- `CHANGELOG.md` 0.22.1 contradicted itself by stating `mark_used()` was both upgraded to a real call and still raises `NotImplementedError`. Removed the stale duplicate.
+### Personal-data cleanup
+- `tests/http-api.test.js` no longer references "Tyler" — replaced with generic test fixtures so the public test suite has no personal identifiers.
+## 0.22.1 - 2026-04-30
+### Added — `audrey impact` report
+- New `audrey impact` CLI command (also `--json` for automation, `--window N` for the lookback window in days, `--limit N` for how many rows in each list).
+- Shows: total memories by type, all-time validated count, recent validations, top-N most-used memories, weakest-N (lowest salience — candidates to forget), and recent activity timeline.
+- Backed by `src/impact.ts` (`buildImpactReport`, `formatImpactReport`) and `Audrey.impact({ windowDays, limit })`.
+- This is the marketing surface the adversary called for: vital signs over CI verdicts. As agents start calling `memory_validate`, the report accumulates the "X failures prevented this week, Y procedures auto-promoted" story.
+### Added — closed-loop feedback (the "memory before action" wedge)
+- New `memory_validate(id, outcome)` MCP tool. `outcome` is one of:
+  - `"helpful"` — the recalled memory drove a correct action. Reinforces salience and bumps `retrieval_count` for semantic/procedural rows.
+  - `"wrong"` — the memory was misleading. Decreases salience and bumps `challenge_count` for semantic memories.
+  - `"used"` — neutral signal that the memory was referenced (smaller salience delta than `helpful`).
+- New REST endpoints `POST /v1/validate` (canonical) and `POST /v1/mark-used` (legacy alias defaulting to `outcome=used`).
+- New `Audrey.validate({ id, outcome })` SDK method emits a `'validate'` event so consumers can audit feedback flow.
+- New `src/feedback.ts` module with the `applyFeedback()` primitive — kept out of `audrey.ts` per architecture review (god-class concern).
+- Python client `mark_used()` is no longer a `NotImplementedError`; calls `/v1/mark-used`. New `validate(memory_id, outcome="used"|"helpful"|"wrong")` method on both sync and async clients.
+- 10 new tests (6 SDK math, 1 MCP enum, 3 HTTP roundtrip including 404 path).
+This is the P0#1 item from `docs/PRODUCTION_BACKLOG.md` — the closed feedback loop that lifts the autopilot rubric's ALIVE dimension from 4 to 7+. The math reuses the existing `confidence.ts` reinforcement formula; the new column work is a no-op (`usage_count` and `last_used_at` were already added by migration 10 in v0.21).
+### Security
+- HTTP `/v1/recall` and `/v1/capsule` no longer body-spread caller options into `audrey.recall()`. Pre-fix, `includePrivate: true` and `confidenceConfig` overrides could be passed in HTTP bodies, bypassing the private-memory ACL and integrity controls. The new `sanitizeRecallOptions()` allowlist drops anything not in a known-safe key set.
+- `audrey serve` defaults to binding `127.0.0.1` (was `0.0.0.0`). Refuses to start on a non-loopback host without `AUDREY_API_KEY` unless `AUDREY_ALLOW_NO_AUTH=1`. New `AUDREY_HOST` env var explicitly opts in to network exposure.
+- HTTP API key comparison uses `crypto.timingSafeEqual` instead of string `!==` to avoid prefix-match timing leaks on local untrusted callers.
+- `audrey promote --yes` refuses to write `.claude/rules/*.md` outside `process.cwd()` unless the target path is in `AUDREY_PROMOTE_ROOTS`. Prevents a malicious MCP caller from writing persistent prompt-injection files into the user's `~/.claude/` directory.
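The allowlist approach described for `sanitizeRecallOptions()` above can be sketched like this. The key set is an assumption: `retrieval` appears elsewhere in this changelog, while `limit` and `types` are hypothetical placeholders, and the function name carries a `Sketch` suffix to mark it as illustrative rather than the shipped helper.

```typescript
// Assumed known-safe keys; the real allowlist is not shown in this diff.
const SAFE_RECALL_KEYS = ["limit", "types", "retrieval"];

// Copy only allowlisted keys from the HTTP body, so caller-supplied options
// such as includePrivate or confidenceConfig never reach the recall call.
function sanitizeRecallOptionsSketch(body: Record<string, unknown>): Record<string, unknown> {
  const safe: Record<string, unknown> = {};
  for (let i = 0; i < SAFE_RECALL_KEYS.length; i++) {
    const key = SAFE_RECALL_KEYS[i];
    if (Object.prototype.hasOwnProperty.call(body, key)) {
      safe[key] = body[key];
    }
  }
  return safe;
}
```

An allowlist fails closed: an option added to the SDK later stays unreachable over HTTP until someone deliberately adds it to the list, which a denylist of known-dangerous keys would not guarantee.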
+### First-contact UX
+- `audrey --help`, `audrey --version`, and `audrey help`/`audrey version` now print help/version and exit 0 instead of silently dropping into the MCP stdio server. Unknown subcommands print error + help and exit 2.
+- ONNX runtime EP-assignment warnings ("Some nodes were not assigned to the preferred execution providers...") are suppressed by default via per-session `logSeverityLevel`. Set `AUDREY_ONNX_VERBOSE=1` to restore the original behavior.
+- `[audrey-mcp]` info boot logs (server started, connected via stdio, warmup completed) are gated behind `AUDREY_DEBUG=1`. Warmup-failure errors continue to log unconditionally.
+### Reliability
+- `audrey.close()` now warns to stderr when called with pending post-encode consolidation work. New `audrey.closeAsync()` awaits `drainPostEncodeQueue()` before closing the database. All CLI subcommands (`reembed`, `dream`, `greeting`, `reflect`, `demo`, `observe-tool`, `promote`) use `closeAsync` to prevent the silent-data-loss race introduced in v0.22.0 where post-encode validation/interference could hit a closed DB.
+- `_emitQueueError` reverted to the standard EventEmitter idiom: emit `error` when a listener is attached, fall back to `console.error` otherwise. v0.22.0 always called `console.error` and produced duplicate stderr lines for apps with structured error pipelines.
+- `encodeBatch` now reuses the encode vector across post-encode stages and routes through `_enqueuePostEncode` (matching `encode`). Pre-fix, batch callers paid 4× embed cost per item and silently bypassed interference/resonance — a behavior divergence from single-encode that the v0.22.0 perf pass missed.
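The "standard EventEmitter idiom" mentioned for `_emitQueueError` above can be sketched as follows, using only Node's `events` API. `emitQueueErrorSketch` and the log prefix are hypothetical and stand in for the package's private method.

```typescript
import { EventEmitter } from "node:events";

// Emit "error" only when someone is listening; otherwise log to stderr.
// Emitting "error" with no listener would throw, and always logging would
// duplicate output for apps that already handle the event.
function emitQueueErrorSketch(emitter: EventEmitter, err: Error): boolean {
  if (emitter.listenerCount("error") > 0) {
    emitter.emit("error", err);
    return true; // handled by the application's own error pipeline
  }
  console.error("[queue-error]", err.message);
  return false; // fell back to stderr
}
```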
+### Performance
+- SQLite PRAGMA tuning at db creation: `synchronous=NORMAL` (durable under WAL), 64 MiB page cache, 256 MiB mmap, `temp_store=MEMORY`. Set `AUDREY_PRAGMA_DEFAULTS=0` to revert to better-sqlite3 defaults. Expected impact: 2-5× recall p95 at >10K episodes; 30-50% improvement on encode under sustained load.
+### Dependencies
+- `sqlite-vec`: `0.1.7-alpha.2` → `0.1.9` (alpha to stable; the prior pin was 15 months old).
+- `@modelcontextprotocol/sdk`: `1.26.0` → `1.29.0` (stricter schema validation, transport stability).
+- `zod` `4.3.6` → `4.4.1`, `better-sqlite3` `12.6.2` → `12.9.0`, `hono` `4.12.14` → `4.12.15`, `@hono/node-server` `1.19.13` → `1.19.14`, `vitest` `4.0.18` → `4.1.5`, `typescript` `6.0.2` → `6.0.3`.
+- `npm audit`: 0 vulnerabilities (production); transitive postcss CVE in vitest's vite resolved via `npm audit fix`.
+### SDK contract fixes (Python ↔ TS server)
+- Python client `DEFAULT_BASE_URL` corrected from `http://127.0.0.1:3487` to `http://127.0.0.1:7437` to match the TS server's default port. Pre-fix, calling `Audrey()` with no args connected to nothing.
+- Python `recall()` and `recall_response()` now decode the bare-list payload that `/v1/recall` actually returns, then wrap into `RecallResponse` client-side. Pre-fix, `recall_response()` would raise a Pydantic validation error against the real server.
+- Python `restore()` now wraps the snapshot in `{"snapshot": ...}` to match the TS `/v1/import` handler that reads `body.snapshot`. Pre-fix, the server received `body.snapshot === undefined` and `audrey.import(undefined)` failed.
+- Python `analytics()` raises `NotImplementedError` with a pointer to `docs/PRODUCTION_BACKLOG.md` until the analytics endpoint ships. Pre-fix, it produced a cryptic 404 from the TS sidecar that doesn't expose that endpoint. (Note: `mark_used()` was upgraded to a real call against `/v1/mark-used` in this same release — see the closed-loop section above.)
+- README REST API row no longer claims `/openapi.json` or `/docs` — those routes aren't currently wired. The README now matches the actual surface (`/health` + `/v1/*`).
+### Removed
+- `hybrid_strict` retrieval mode (was a silent alias of `hybrid` with no behavioral difference). Use `hybrid` (default) or `vector`.
+### Internal
+- New `closeAsync(timeoutMs?: number)` on `Audrey`.
+- New `sanitizeRecallOptions()` allowlist helper in `src/routes.ts`.
+- `startServer` returns `hostname` alongside `port`.
+- 5 new tests: CLI surface (`--help`/`--version`/unknown), HTTP recall sanitizer (privacy ACL, integrity, retrieval enum), HTTP bind safety (no-auth on LAN refused, `AUDREY_ALLOW_NO_AUTH` override).
+## 0.22.0 - 2026-04-28
+### Performance
+- Encode response time: 24.7ms to 15.2ms p50, about 40% faster.
+- Cold-start first encode: 525ms to 28ms with warmup, about 18.7x faster.
+- Hybrid recall: 30.2ms to 14.3ms p50, about 2.1x faster.
+- Eliminated 3 of 4 redundant embedding calls during encode. Validation, interference, and affect resonance now reuse the main content vector.
+### Added
+- Added `memory_encode.wait_for_consolidation` parameter, default `false`, for opt-in read-after-write semantics.
+- Added `memory_recall.retrieval` parameter with `"hybrid"` default and `"vector"` (FTS-bypass fast path).
+- Added `pending_consolidation_count`, `embedding_warm`, `warmup_duration_ms`, and `default_retrieval_mode` to `memory_status`.
+- Added background embedding pipeline warmup after MCP `server.connect()`.
+- Added `AUDREY_PROFILE=1` for per-stage timings in MCP `_meta.diagnostics`.
+- Added `AUDREY_DISABLE_WARMUP=1` to opt out of background embedding warmup.
+- Added `benchmarks/perf.bench.js` and `npm run bench:perf` as a mock-embedding CI perf gate.
+### Changed
+- Moved post-encode validation, interference, and affect resonance onto a serialized async queue so `memory_encode` no longer blocks on downstream consolidation work by default.
+- Folded recall's three healthy-store vec-table count queries into one SQL roundtrip before KNN.
+- Process shutdown now drains the post-encode consolidation queue with a 5-second timeout and logs pending row IDs if work remains.
+### Internal
+- Added `src/profile.ts` with `ProfileRecorder`.
+- Added `encodeWithDiagnostics()` and `recallWithDiagnostics()` for MCP profiling-mode response metadata.
+## 0.21.0 - Release Diagnostics and Host Setup
+- Added `npx audrey doctor` for first-contact diagnostics, JSON automation, provider checks, MCP entrypoint validation, memory-store health, and host config generation.
+- Added `npx audrey install --host <host> --dry-run` so Codex, Claude Code, Claude Desktop, Cursor, Windsurf, VS Code, JetBrains, and generic MCP hosts can preview setup without accidental config writes.
+- Updated docs around the recommended first run: `doctor`, `demo`, safe host install preview, then host-specific verification.
+- Kept Claude Code's direct installer intact while making the default release story host-neutral.
+- Refreshed lockfile transitive packages through the npm resolver; vulnerability audit remains clean.
+## 0.20.0 - Memory Reflexes
+- Added Memory Preflight and Memory Reflexes so agents can check memory before acting and turn repeated failures into trigger-response guidance.
+- Added Ollama/local-agent guidance and runnable local-agent example.
+- Expanded host-neutral MCP docs and Audrey for Dummies onboarding.

package/README.md CHANGED

@@ -1,83 +1,148 @@
-# Audrey
+<div align="center">
+  <img src="docs/assets/audrey-wordmark.png" alt="Audrey wordmark" width="760">
-[![CI](https://github.com/Evilander/Audrey/actions/workflows/ci.yml/badge.svg?branch=master)](https://github.com/Evilander/Audrey/actions/workflows/ci.yml)
-[![npm version](https://img.shields.io/npm/v/audrey.svg)](https://www.npmjs.com/package/audrey)
-[![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)
+  <p><strong>The local-first memory firewall for AI agents.</strong></p>
-Audrey is a persistent memory and continuity engine for Claude Code and AI agents.
+  <p>
+    Give Codex, Claude Code, Claude Desktop, Cursor, Windsurf, VS Code, JetBrains, Ollama-backed agents,
+    and custom agent services one durable memory layer they can check before they touch tools.
+  </p>
-It gives an agent a local memory store, durable recall, consolidation, contradiction handling, a REST sidecar, MCP tools, and benchmark gates without adding external infrastructure.
+  <p>
+    <a href="https://github.com/Evilander/Audrey/actions/workflows/ci.yml"><img alt="CI" src="https://github.com/Evilander/Audrey/actions/workflows/ci.yml/badge.svg?branch=master"></a>
+    <a href="https://www.npmjs.com/package/audrey"><img alt="npm version" src="https://img.shields.io/npm/v/audrey.svg"></a>
+    <a href="LICENSE"><img alt="MIT license" src="https://img.shields.io/badge/license-MIT-blue.svg"></a>
+  </p>
+</div>
-Requires Node.js 20+.
+## Why Audrey Exists
+Agents forget the exact mistakes they made yesterday. They repeat broken commands, lose project-specific rules, miss contradictions, and treat every new session like a cold start.
+Audrey Guard is the headline loop: record what happened, remember what mattered, check before action, return `allow`, `warn`, or `block` with evidence, then validate whether the memory helped.
+Audrey turns those hard-won lessons into a local memory runtime:
+- `audrey guard --tool Bash "npm run deploy"` runs memory-before-action from the terminal.
+- `memory_recall` finds durable context by semantic similarity.
+- `memory_preflight` checks prior failures, risks, rules, and relevant procedures before an action.
+- `memory_reflexes` converts remembered evidence into trigger-response guidance agents can follow.
+- `memory_validate` closes the loop after the action — `helpful`, `used`, or `wrong` outcomes feed salience and decay.
+- `memory_dream` consolidates episodes into principles and applies decay.
+- `audrey impact` and `audrey doctor` tell a human or CI system whether the runtime is doing real work and is actually ready.
+It is not a hosted vector database, a notes app, or a Claude-only plugin. Audrey is a SQLite-backed continuity layer that can sit under any local or sidecar agent loop.
+<div align="center">
+  <img src="docs/assets/audrey-feature-grid.jpg" alt="Audrey feature marks: memory continuity, archive signal, recall loop, layered evidence, local node, and remembering before acting" width="760">
+</div>
 ## Quick Start
-### Claude Code
+Requires Node.js 20+.
 ```bash
-npx audrey init
 npx audrey doctor
+npx audrey demo --scenario repeated-failure
+npx audrey guard --tool Bash "npm run deploy"
+```
+`doctor` verifies Node, the MCP entrypoint, provider selection, memory-store health, and host config generation. The repeated-failure demo is no-key, no-host, and no-network: it creates a temporary store, records a failed deploy, teaches Audrey the fix, then shows Audrey Guard blocking the repeat attempt with evidence.
+Expected first-run shape:
+```text
+Audrey Doctor v0.23.1
+Store health: not initialized
+Verdict: ready
 ```
-This uses the default `local-offline` preset:
+After the first real memory write, `doctor` should report the store as healthy.
-- registers Audrey with Claude Code
-- installs hooks for automatic recall and reflection
-- uses local embeddings by default
-- stores memory in one local SQLite-backed data directory
+## Install Into Agent Hosts
-### REST or Docker Sidecar
+Preview host setup without editing config files:
 ```bash
-npx audrey init sidecar-prod
-docker compose up -d --build
+npx audrey install --host codex --dry-run
+npx audrey install --host claude-code --dry-run
+npx audrey install --host generic --dry-run
 ```
-Then verify:
+Generate raw config blocks:
 ```bash
-npx audrey doctor
-curl http://localhost:3487/health
+npx audrey mcp-config codex
+npx audrey mcp-config generic
+npx audrey mcp-config vscode
 ```
-## Why Audrey
+Claude Code can be registered directly:
-- Local-first: memory lives in SQLite with `sqlite-vec`, not a hosted vector database.
-- Practical: MCP, CLI, REST, JavaScript, Python, and Docker are all first-class.
-- Durable: snapshot, restore, health checks, benchmark gates, and graceful shutdown are built in.
-- Structured: Audrey does more than save notes. It consolidates, decays, tracks contradictions, and supports procedural memory.
+```bash
+npx audrey install
+claude mcp list
+```
-## What Ships
+All local MCP paths default to local embeddings and one shared SQLite-backed memory directory. Use `AUDREY_DATA_DIR` to isolate projects, tenants, or host identities.
-- Claude Code MCP server with 13 memory tools
-- Automatic hook-based recall and reflection for Claude Code sessions
-- JavaScript SDK
-- Python SDK packaged as `audrey-memory`
-- REST API for sidecar deployment
-- Docker and Compose deployment path
-- Snapshot and restore for portable memory state
-- Machine-readable health and benchmark gates
-- Local benchmark harness with retrieval and lifecycle-operation tracks
+Installer-generated host config does not include provider API keys by default. Prefer setting `ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, `GOOGLE_API_KEY`, or `GEMINI_API_KEY` in the host runtime environment; use `npx audrey install --include-secrets` only if you explicitly accept argv/config exposure.
-## Setup Presets
+## Use With Ollama And Local Agents
-`npx audrey init` supports four named presets:
+Ollama runs models; Audrey supplies memory. Start Audrey as a local REST sidecar and expose its routes as tools in your agent loop:
-| Preset | Best For | Behavior |
-|---|---|---|
-| `local-offline` | Claude Code on one machine | Local embeddings, MCP install, hooks install |
-| `hosted-fast` | Claude Code with provider keys already present | Auto-picks hosted providers from env, MCP install, hooks install |
-| `ci-mock` | CI and smoke tests | Mock embedding + LLM providers, no Claude-specific setup |
-| `sidecar-prod` | REST API and Docker deployment | Sidecar-oriented defaults, no Claude-specific setup |
+```bash
+AUDREY_AGENT=ollama-local-agent npx audrey serve
+curl http://localhost:7437/health
+curl http://localhost:7437/v1/status
+```
-Useful checks:
+Runnable example:
 ```bash
-npx audrey doctor
-npx audrey status
-npx audrey status --json --fail-on-unhealthy
+AUDREY_AGENT=ollama-local-agent npx audrey serve
+OLLAMA_MODEL=qwen3 node examples/ollama-memory-agent.js "What should you remember about Audrey?"
 ```
+Core sidecar tools:
+| Agent Need | REST Route |
+|---|---|
+| Check memory before acting | `POST /v1/preflight` |
+| Get reflex rules for an action | `POST /v1/reflexes` |
+| Store a useful observation | `POST /v1/encode` |
+| Recall relevant context | `POST /v1/recall` |
+| Get a turn-sized memory packet | `POST /v1/capsule` |
+| Check health | `GET /v1/status` |
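The routes in the table above could be wrapped as agent tools along the following lines. This is a minimal sketch using only the Python standard library, not Audrey's own client. The payload field names (`query`, `limit`), the `Authorization: Bearer` header shape, and the assumption that responses are JSON are illustrative guesses; only the route paths and the default address come from this README.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:7437"  # Node sidecar default stated in this README


def build_request(route, payload=None, api_key=None, base_url=BASE_URL):
    """Build (but do not send) a request for one sidecar route.

    A payload makes it a JSON POST; no payload makes it a GET.
    """
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Assumption: the API key is sent as a standard Bearer token.
        headers["Authorization"] = f"Bearer {api_key}"
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    method = "POST" if payload is not None else "GET"
    return urllib.request.Request(
        base_url + route, data=data, headers=headers, method=method
    )


def call(request, timeout=10.0):
    """Send a prepared request and decode the body, assuming it is JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical payload: "query" and "limit" are assumed field names.
recall_request = build_request(
    "/v1/recall", {"query": "stripe rate limit", "limit": 5}
)
status_request = build_request("/v1/status")
```

With a sidecar running, `call(status_request)` would perform the health check. Keeping request construction separate from sending makes the tool definitions easy to inspect before any network traffic happens.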
+## What Ships
+| Surface | Status |
+|---|---|
+| MCP stdio server | 20 tools plus status/recent/principles resources and briefing/recall/reflection prompts |
+| CLI | `doctor`, `demo`, `guard`, `install`, `mcp-config`, `status`, `dream`, `reembed`, `observe-tool`, `promote`, `impact` |
+| REST API | Hono server with `/health` and `/v1/*` routes |
+| JavaScript SDK | Direct TypeScript/Node import from `audrey` |
+| Python client | `pip install audrey-memory`, calls the REST sidecar |
+| Storage | Local SQLite plus `sqlite-vec`, no hosted database required |
+| Deployment | npm package, Docker, Compose, host-specific MCP config generation |
+| Safety loop | preflight warnings, reflexes, redacted tool traces, contradiction handling |
+## Memory Model
+Audrey is built around the parts of memory that matter for agents:
+- Episodic memory: specific observations, tool results, preferences, and session facts.
+- Semantic memory: consolidated principles extracted from repeated evidence.
+- Procedural memory: remembered ways to act, avoid, retry, or verify.
+- Affect and salience: emotional weight and importance influence recall.
+- Interference and decay: stale, conflicting, or low-confidence memories lose authority over time.
+- Contradiction handling: competing claims are tracked instead of silently overwritten.
+- Tool-trace learning: failed commands and risky actions become future preflight warnings.
+The product bet is simple: the next generation of useful agents will not just retrieve facts. They will remember what happened, decide whether a memory is still trustworthy, and use that memory before touching tools.
 ## Use Audrey From Code
 ### JavaScript
@@ -112,119 +177,153 @@ pip install audrey-memory
 ```python
 from audrey_memory import Audrey
-brain = Audrey(
-    base_url="http://127.0.0.1:3487",
-    api_key="secret",
-    agent="support-agent",
-)
-memory_id = brain.encode(
-    "Stripe returns HTTP 429 above 100 req/s",
-    source="direct-observation",
-)
+brain = Audrey(base_url="http://127.0.0.1:7437", agent="support-agent")
+memory_id = brain.encode("Stripe returns HTTP 429 above 100 req/s", source="direct-observation")
 results = brain.recall("stripe rate limit", limit=5)
 brain.close()
 ```
-## Key Commands
+## Production Readiness
-```bash
-# Setup
-npx audrey init
-npx audrey init hosted-fast
-npx audrey init ci-mock
-npx audrey init sidecar-prod
+Audrey is close to a 1.0-ready local memory runtime, but production readiness depends on how it is embedded. Treat it like stateful infrastructure.
-# Claude Code integration
-npx audrey install
-npx audrey hooks install
-npx audrey hooks uninstall
-npx audrey uninstall
+Release gates used for this package:
-# Health and maintenance
+```bash
+npm run release:gate
 npx audrey doctor
-npx audrey status
-npx audrey dream
-npx audrey reembed
+npx audrey demo
+```
-# Versioning
-npx audrey snapshot
-npx audrey restore backup.json --force
+Recommended runtime checks:
-# Sidecar
-npx audrey serve
-docker compose up -d --build
+```bash
+npx audrey doctor --json
+npx audrey status --json --fail-on-unhealthy
+npx audrey install --host codex --dry-run
 ```
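The runtime checks above can be wired into a deploy script by looking only at the process exit code. The sketch below assumes that `--fail-on-unhealthy` makes the command exit non-zero when Audrey is unhealthy, which the flag name suggests but this diff does not spell out. It deliberately does not parse the JSON output, whose schema is not shown here.

```python
import subprocess
import sys


def run_gate(command, timeout=120):
    """Run a check command and return True only when it exits with status 0."""
    completed = subprocess.run(
        command, capture_output=True, text=True, timeout=timeout
    )
    if completed.returncode != 0:
        # Surface the failing command's stderr for the operator.
        sys.stderr.write(completed.stderr)
    return completed.returncode == 0


# Command copied from the block above. On Windows, `npx` may need to be `npx.cmd`.
STATUS_CHECK = ["npx", "audrey", "status", "--json", "--fail-on-unhealthy"]
```

A deploy step could then call `run_gate(STATUS_CHECK)` and abort when it returns `False`.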
+Production controls you still own:
+- Set one `AUDREY_DATA_DIR` per tenant, environment, or isolation boundary.
+- Pin `AUDREY_EMBEDDING_PROVIDER` and `AUDREY_LLM_PROVIDER` explicitly.
+- Back up the SQLite data directory before provider or dimension changes.
+- Keep API keys and raw credentials out of encoded memory content.
+- Use `AUDREY_API_KEY` if the REST sidecar is reachable beyond the local process boundary.
+- Run `npx audrey dream` on a schedule so consolidation and decay stay current.
+- Add application-level encryption, retention, access control, and audit logging for regulated environments.
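The first two controls in the list above can be sketched as a small helper that builds a per-tenant environment. The function name `tenant_env`, the directory layout (`<root>/<tenant>`), and the tenant-id validation rule are assumptions made for illustration; only the variable names come from this README.

```python
import os
import re
from pathlib import Path


def tenant_env(tenant_id, root, base_env=None,
               embedding_provider="local", llm_provider=None):
    """Return an environment copy with an isolated AUDREY_DATA_DIR.

    Providers are pinned explicitly instead of relying on auto-detection.
    """
    # Reject ids that could escape the root directory (hypothetical policy).
    if not re.fullmatch(r"[A-Za-z0-9_-]+", tenant_id):
        raise ValueError(f"unsafe tenant id: {tenant_id!r}")
    env = dict(os.environ if base_env is None else base_env)
    env["AUDREY_DATA_DIR"] = str(Path(root) / tenant_id)
    env["AUDREY_EMBEDDING_PROVIDER"] = embedding_provider
    if llm_provider is not None:
        env["AUDREY_LLM_PROVIDER"] = llm_provider
    return env
```

The returned mapping can be passed as `env=` to `subprocess.Popen(["npx", "audrey", "serve"], ...)` so that each tenant's sidecar writes to its own SQLite directory.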
+## Environment Variables
+| Variable | Default | Purpose |
+|---|---|---|
+| `AUDREY_DATA_DIR` | `~/.audrey/data` | SQLite memory store path. Use one per tenant or agent identity for isolation. |
+| `AUDREY_AGENT` | `local-agent` | Logical agent identity stamped on writes. |
+| `AUDREY_EMBEDDING_PROVIDER` | `local` | `local`, `gemini`, `openai`, or `mock`. Cloud providers require explicit opt-in. |
+| `AUDREY_LLM_PROVIDER` | auto | `anthropic`, `openai`, or `mock`. |
+| `AUDREY_DEVICE` | `gpu` | Local embedding device (`gpu` or `cpu`). Falls back to CPU if GPU init fails. |
+| `AUDREY_PORT` | `7437` | REST sidecar port. |
+| `AUDREY_HOST` | `127.0.0.1` | REST sidecar bind address. Set to `0.0.0.0` only with `AUDREY_API_KEY`. |
+| `AUDREY_API_KEY` | unset | Bearer token required for non-loopback REST traffic. |
+| `AUDREY_ALLOW_NO_AUTH` | `0` | Set to `1` to allow non-loopback bind without an API key. Don't. |
+| `AUDREY_ENABLE_ADMIN_TOOLS` | `0` | Set to `1` to enable export, import, and forget routes/tools. Disabled by default. |
+| `AUDREY_PROMOTE_ROOTS` | unset | Colon/semicolon-separated extra roots for `audrey promote --yes` writes. By default writes are restricted to `process.cwd()`. |
+| `AUDREY_DEBUG` | `0` | Set to `1` to print MCP info logs (server started, warmup completed). Errors always log. |
+| `AUDREY_PROFILE` | `0` | Set to `1` to emit per-stage timings via MCP `_meta.diagnostics`. |
+| `AUDREY_DISABLE_WARMUP` | `0` | Set to `1` to skip background embedding warmup at MCP boot. |
+| `AUDREY_ONNX_VERBOSE` | `0` | Set to `1` to restore ONNX runtime EP-assignment warnings (suppressed by default). |
+| `AUDREY_PRAGMA_DEFAULTS` | `1` | Set to `0` to revert SQLite PRAGMA tuning to better-sqlite3 defaults. |
+| `AUDREY_CONTEXT_BUDGET_CHARS` | `4000` | Default Memory Capsule character budget. |
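The bind-address rule described in the table (loopback by default, an API key for anything else, `AUDREY_ALLOW_NO_AUTH=1` as the explicit override) can be restated as a pre-flight check in a launcher script. This is a sketch of the rule as the table reads, not Audrey's actual enforcement code. The function names and the set of hosts treated as loopback are assumptions.

```python
import os

# Assumption: these are the hosts treated as loopback for this check.
LOOPBACK_HOSTS = {"127.0.0.1", "::1", "localhost"}


def resolve_sidecar_config(env=None):
    """Read sidecar settings, falling back to the defaults in the table."""
    env = os.environ if env is None else env
    return {
        "host": env.get("AUDREY_HOST", "127.0.0.1"),
        "port": int(env.get("AUDREY_PORT", "7437")),
        "api_key": env.get("AUDREY_API_KEY") or None,
        "allow_no_auth": env.get("AUDREY_ALLOW_NO_AUTH", "0") == "1",
    }


def check_bind_safety(config):
    """Raise if a non-loopback bind has neither an API key nor the override."""
    if config["host"] in LOOPBACK_HOSTS:
        return
    if config["api_key"] or config["allow_no_auth"]:
        return
    raise RuntimeError(
        f"refusing to bind {config['host']}:{config['port']} without AUDREY_API_KEY"
    )
```

Running `check_bind_safety(resolve_sidecar_config())` before starting the sidecar catches an accidental `AUDREY_HOST=0.0.0.0` early, whatever the server itself does at startup.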
 ## Benchmarks
-Audrey ships with a benchmark harness and release gate:
+Audrey ships two benchmark commands.
+### Performance snapshot
+`npm run bench:perf-snapshot` measures encode and hybrid recall latency at multiple corpus sizes against the in-process mock provider. It reports p50/p95/p99 plus machine provenance so the numbers are reproducible and honest about what they cover.
 ```bash
-npm run bench:memory
-npm run bench:memory:check
+npm run build
+npm run bench:perf-snapshot                                 # default sizes 100, 1000, 5000
+node benchmarks/perf-snapshot.js --sizes 1000,10000 --json  # custom shape
 ```
-The benchmark suite measures:
-- retrieval behavior
-- update and overwrite behavior
-- delete and abstain behavior
-- semantic and procedural merge behavior
+Sample output from `benchmarks/snapshots/perf-0.22.2.json` (12-core/24-thread Ryzen 9 7900X3D, Node 25.5.0, mock 64-dim embedding, hybrid recall, limit 5):
-Current repo snapshot:
+| Corpus size | Encode p50 (ms) | Encode p95 (ms) | Recall p50 (ms) | Recall p95 (ms) | Recall p99 (ms) |
+|---|---|---|---|---|---|
+| 100 | 0.33 | 0.59 | 0.54 | 1.82 | 2.71 |
+| 1,000 | 0.31 | 2.15 | 1.57 | 2.36 | 21.18 |
+| 5,000 | 0.31 | 1.84 | 2.09 | 3.42 | 16.58 |
-![Audrey local benchmark](docs/assets/benchmarks/local-benchmark.svg)
+These numbers cover Audrey's own pipeline (SQLite + sqlite-vec + hybrid ranking) and exclude embedding-provider cost. Real-world recall p95 with a local 384-dim provider is typically 5-15x higher; with a hosted provider it is dominated by the API round-trip. Run on your own hardware before quoting numbers anywhere.
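For readers reproducing the table on their own hardware, p50/p95/p99 values can be derived from raw latency samples as sketched below. This uses the nearest-rank definition of a percentile. Whether `benchmarks/perf-snapshot.js` uses nearest-rank or an interpolating method is not stated in this diff, so treat the choice as an assumption.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    # Multiply before dividing to avoid float error in the rank computation.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def summarize(latencies_ms):
    """Return the three percentiles reported in the table above."""
    return {f"p{p}": percentile(latencies_ms, p) for p in (50, 95, 99)}
```

Because p99 is set by a handful of the slowest samples, it is the least stable column. That may be why recall p99 in the table is higher at 1,000 memories (21.18 ms) than at 5,000 (16.58 ms).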
-For detailed methodology, published comparison anchors, and generated reports, see [docs/benchmarking.md](docs/benchmarking.md).
+### Behavioral regression suite
-## Production
+`npm run bench:memory:check` is a release gate. It runs a small set of retrieval and lifecycle scenarios (information extraction, knowledge updates, multi-session reasoning, conflict resolution, privacy boundary, overwrite, delete-and-abstain, semantic/procedural merge) against Audrey and three weak baselines (vector-only, keyword+recency, recent-window) and asserts Audrey doesn't regress. The baseline comparisons exist to catch correctness regressions in retrieval logic, not to make marketing claims.
-Audrey is strongest in workflows where memory must stay local, reviewable, and durable. It already fits well as a sidecar for internal agents in operational domains like financial services and healthcare operations, but it is a memory layer, not a compliance boundary.
+```bash
+npm run bench:memory          # full regression suite (writes JSON + report)
+npm run bench:memory:check    # release gate, exits non-zero on regression
+```
-Production guide: [docs/production-readiness.md](docs/production-readiness.md)
+## Command Reference
-Examples:
+```bash
+# First contact
+npx audrey doctor
+npx audrey demo
-- [examples/fintech-ops-demo.js](examples/fintech-ops-demo.js)
-- [examples/healthcare-ops-demo.js](examples/healthcare-ops-demo.js)
-- [examples/stripe-demo.js](examples/stripe-demo.js)
+# MCP setup
+npx audrey install --host codex --dry-run
+npx audrey mcp-config codex
+npx audrey mcp-config generic
+npx audrey install
+npx audrey uninstall
-## Environment
+# Health and maintenance
+npx audrey status
+npx audrey status --json --fail-on-unhealthy
+npx audrey dream
+npx audrey reembed
-Starter config:
+# Closed-loop visibility
+npx audrey impact
+npx audrey impact --json --window 7 --limit 5
-- [.env.example](.env.example)
-- [.env.docker.example](.env.docker.example)
+# Tool-trace learning
+npx audrey observe-tool --event PostToolUse --tool Bash --outcome failed
+npx audrey promote --dry-run
-Key environment variables:
+# REST sidecar
+npx audrey serve
+cp .env.docker.example .env   # Windows cmd: copy .env.docker.example .env
+# edit AUDREY_API_KEY in .env
+docker compose up -d --build
+```
-- `AUDREY_DATA_DIR`
-- `AUDREY_EMBEDDING_PROVIDER`
-- `AUDREY_LLM_PROVIDER`
-- `AUDREY_DEVICE`
-- `AUDREY_API_KEY`
-- `AUDREY_HOST`
-- `AUDREY_PORT`
+The Node sidecar defaults to `127.0.0.1:7437`. The Docker image intentionally binds inside the container on `3487`, so Compose requires `AUDREY_API_KEY` in `.env` before startup. Override the published host port with `AUDREY_PUBLISHED_PORT` when using Compose.
 ## Documentation
-- [docs/benchmarking.md](docs/benchmarking.md)
-- [docs/production-readiness.md](docs/production-readiness.md)
-- [CONTRIBUTING.md](CONTRIBUTING.md)
-- [SECURITY.md](SECURITY.md)
+- [Security policy](SECURITY.md)
+- [Audrey paper outline](docs/AUDREY_PAPER_OUTLINE.md)
+- Public setup, runtime, benchmark, and command guidance is maintained in this README.
 ## Development
 ```bash
 npm ci
-npm test
-npm run bench:memory:check
-npm run pack:check
+npm run release:gate
 python -m unittest discover -s python/tests -v
 python -m build --no-isolation python
 ```
+`npm test` uses a repo-local Vitest launcher so locked-down Windows temp
+directories do not block test startup. `npm run release:gate:sandbox` remains
+available for hosts that block child-process spawning entirely.
 ## License
 MIT. See [LICENSE](LICENSE).

package/SECURITY.md ADDED

@@ -0,0 +1,29 @@
+# Security Policy
+## Supported Versions
+Security fixes are best-effort for the current published release line and the current default branch.
+| Version | Supported |
+|---|---|
+| `0.22.x` | Yes |
+| `< 0.22.0` | No |
+## Reporting a Vulnerability
+Do not open a public GitHub issue for a security vulnerability.
+Report vulnerabilities privately through this channel:
+- GitHub Security Advisories for this repository
+Include:
+- affected version
+- reproduction steps or proof of concept
+- impact description
+- suggested mitigation, if you have one
+## Scope Notes
+Audrey is a memory layer. Security posture also depends on the host application, deployment environment, provider configuration, access controls, and data-handling rules around it.