npm - audrey - Versions diffs - 0.23.1 → 1.0.0 - Mend

audrey 0.23.1 → 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (250)

package/CHANGELOG.md +81 -19
package/LICENSE +21 -21
package/README.md +209 -5
package/SECURITY.md +2 -1
package/benchmarks/adapter-kit.mjs +20 -0
package/benchmarks/adapter-self-test.mjs +166 -0
package/benchmarks/adapters/example-allow.mjs +28 -0
package/benchmarks/adapters/mem0-platform.mjs +267 -0
package/benchmarks/adapters/registry.json +51 -0
package/benchmarks/adapters/zep-cloud.mjs +280 -0
package/benchmarks/baselines.js +169 -0
package/benchmarks/build-leaderboard.mjs +170 -0
package/benchmarks/cases.js +537 -0
package/benchmarks/create-conformance-card.mjs +139 -0
package/benchmarks/create-submission-bundle.mjs +176 -0
package/benchmarks/dry-run-external-adapters.mjs +165 -0
package/benchmarks/guardbench.js +1035 -0
package/benchmarks/output/adapter-self-test/guardbench-adapter-self-test.json +50 -0
package/benchmarks/output/external/guardbench-external-dry-run.json +69 -0
package/benchmarks/output/external/guardbench-external-evidence.json +56 -0
package/benchmarks/output/guardbench-conformance-card.json +63 -0
package/benchmarks/output/guardbench-manifest.json +414 -0
package/benchmarks/output/guardbench-raw.json +1171 -0
package/benchmarks/output/guardbench-summary.json +1981 -0
package/benchmarks/output/leaderboard/guardbench-leaderboard.json +93 -0
package/benchmarks/output/leaderboard/guardbench-leaderboard.md +7 -0
package/benchmarks/output/submission-bundle/guardbench-conformance-card.json +63 -0
package/benchmarks/output/submission-bundle/guardbench-manifest.json +414 -0
package/benchmarks/output/submission-bundle/guardbench-raw.json +1171 -0
package/benchmarks/output/submission-bundle/guardbench-summary.json +1981 -0
package/benchmarks/output/submission-bundle/schemas/guardbench-adapter-registry.schema.json +69 -0
package/benchmarks/output/submission-bundle/schemas/guardbench-adapter-self-test.schema.json +156 -0
package/benchmarks/output/submission-bundle/schemas/guardbench-conformance-card.schema.json +184 -0
package/benchmarks/output/submission-bundle/schemas/guardbench-external-dry-run.schema.json +74 -0
package/benchmarks/output/submission-bundle/schemas/guardbench-external-evidence.schema.json +108 -0
package/benchmarks/output/submission-bundle/schemas/guardbench-external-run.schema.json +160 -0
package/benchmarks/output/submission-bundle/schemas/guardbench-leaderboard.schema.json +179 -0
package/benchmarks/output/submission-bundle/schemas/guardbench-manifest.schema.json +213 -0
package/benchmarks/output/submission-bundle/schemas/guardbench-publication-verification.schema.json +47 -0
package/benchmarks/output/submission-bundle/schemas/guardbench-raw.schema.json +164 -0
package/benchmarks/output/submission-bundle/schemas/guardbench-submission-manifest.schema.json +151 -0
package/benchmarks/output/submission-bundle/schemas/guardbench-summary.schema.json +228 -0
package/benchmarks/output/submission-bundle/submission-manifest.json +131 -0
package/benchmarks/output/submission-bundle/validation-report.json +31 -0
package/benchmarks/output/summary.json +2354 -0
package/benchmarks/perf-snapshot.js +304 -0
package/benchmarks/perf.bench.js +161 -0
package/benchmarks/public-paths.mjs +78 -0
package/benchmarks/reference-results.js +70 -0
package/benchmarks/report.js +259 -0
package/benchmarks/run-external-guardbench.mjs +281 -0
package/benchmarks/run.js +682 -0
package/benchmarks/schemas/guardbench-adapter-registry.schema.json +69 -0
package/benchmarks/schemas/guardbench-adapter-self-test.schema.json +156 -0
package/benchmarks/schemas/guardbench-conformance-card.schema.json +184 -0
package/benchmarks/schemas/guardbench-external-dry-run.schema.json +74 -0
package/benchmarks/schemas/guardbench-external-evidence.schema.json +108 -0
package/benchmarks/schemas/guardbench-external-run.schema.json +160 -0
package/benchmarks/schemas/guardbench-leaderboard.schema.json +179 -0
package/benchmarks/schemas/guardbench-manifest.schema.json +213 -0
package/benchmarks/schemas/guardbench-publication-verification.schema.json +47 -0
package/benchmarks/schemas/guardbench-raw.schema.json +164 -0
package/benchmarks/schemas/guardbench-submission-manifest.schema.json +151 -0
package/benchmarks/schemas/guardbench-summary.schema.json +228 -0
package/benchmarks/snapshots/perf-0.22.2.json +123 -0
package/benchmarks/snapshots/perf-0.23.0.json +123 -0
package/benchmarks/validate-adapter-module.mjs +104 -0
package/benchmarks/validate-adapter-registry.mjs +134 -0
package/benchmarks/validate-adapter-self-test.mjs +96 -0
package/benchmarks/validate-guardbench-artifacts.mjs +343 -0
package/benchmarks/verify-external-evidence.mjs +296 -0
package/benchmarks/verify-publication-artifacts.mjs +286 -0
package/benchmarks/verify-submission-bundle.mjs +167 -0
package/dist/mcp-server/config.d.ts +1 -1
package/dist/mcp-server/config.d.ts.map +1 -1
package/dist/mcp-server/config.js +1 -1
package/dist/mcp-server/config.js.map +1 -1
package/dist/mcp-server/index.d.ts +65 -3
package/dist/mcp-server/index.d.ts.map +1 -1
package/dist/mcp-server/index.js +675 -157
package/dist/mcp-server/index.js.map +1 -1
package/dist/src/action-key.d.ts +9 -0
package/dist/src/action-key.d.ts.map +1 -0
package/dist/src/action-key.js +49 -0
package/dist/src/action-key.js.map +1 -0
package/dist/src/adaptive.js +5 -5
package/dist/src/affect.js +8 -8
package/dist/src/audrey.d.ts +3 -0
package/dist/src/audrey.d.ts.map +1 -1
package/dist/src/audrey.js +55 -3
package/dist/src/audrey.js.map +1 -1
package/dist/src/capsule.js +4 -4
package/dist/src/causal.js +3 -3
package/dist/src/consolidate.js +48 -48
package/dist/src/controller.d.ts +61 -5
package/dist/src/controller.d.ts.map +1 -1
package/dist/src/controller.js +230 -49
package/dist/src/controller.js.map +1 -1
package/dist/src/db.js +172 -172
package/dist/src/decay.js +8 -8
package/dist/src/embedding.d.ts +2 -1
package/dist/src/embedding.d.ts.map +1 -1
package/dist/src/embedding.js +39 -29
package/dist/src/embedding.js.map +1 -1
package/dist/src/encode.js +6 -6
package/dist/src/feedback.d.ts +6 -0
package/dist/src/feedback.d.ts.map +1 -1
package/dist/src/feedback.js +6 -0
package/dist/src/feedback.js.map +1 -1
package/dist/src/forget.js +12 -12
package/dist/src/hybrid-recall.js +9 -9
package/dist/src/impact.js +6 -6
package/dist/src/import.d.ts +3 -3
package/dist/src/import.js +41 -41
package/dist/src/index.d.ts +3 -3
package/dist/src/index.d.ts.map +1 -1
package/dist/src/index.js +2 -2
package/dist/src/index.js.map +1 -1
package/dist/src/interference.js +14 -14
package/dist/src/introspect.js +18 -18
package/dist/src/preflight.d.ts.map +1 -1
package/dist/src/preflight.js +41 -0
package/dist/src/preflight.js.map +1 -1
package/dist/src/promote.js +7 -7
package/dist/src/prompts.js +118 -118
package/dist/src/recall.js +30 -30
package/dist/src/reflexes.d.ts +1 -0
package/dist/src/reflexes.d.ts.map +1 -1
package/dist/src/reflexes.js +3 -0
package/dist/src/reflexes.js.map +1 -1
package/dist/src/rollback.js +4 -4
package/dist/src/routes.d.ts.map +1 -1
package/dist/src/routes.js +67 -1
package/dist/src/routes.js.map +1 -1
package/dist/src/validate.js +25 -25
package/docs/AUDREY_PAPER_OUTLINE.md +175 -0
package/docs/MEMORY_BENCHMARKING.md +59 -0
package/docs/PRODUCTION_BACKLOG.md +304 -0
package/docs/paper/00-master.md +48 -0
package/docs/paper/01-introduction.md +27 -0
package/docs/paper/02-related-work.md +47 -0
package/docs/paper/03-problem-definition.md +108 -0
package/docs/paper/04-design.md +164 -0
package/docs/paper/05-guardbench-spec.md +412 -0
package/docs/paper/06-implementation.md +113 -0
package/docs/paper/07-evaluation.md +168 -0
package/docs/paper/08-discussion-limitations.md +61 -0
package/docs/paper/09-conclusion.md +11 -0
package/docs/paper/SUBMISSION_README.md +162 -0
package/docs/paper/appendix-a-demo-transcript.md +114 -0
package/docs/paper/arxiv-compile-report.schema.json +116 -0
package/docs/paper/arxiv-source.schema.json +61 -0
package/docs/paper/audrey-paper-v1.md +1106 -0
package/docs/paper/browser-launch-plan.json +209 -0
package/docs/paper/browser-launch-plan.schema.json +100 -0
package/docs/paper/browser-launch-results.json +86 -0
package/docs/paper/browser-launch-results.schema.json +66 -0
package/docs/paper/claim-register.json +138 -0
package/docs/paper/claim-register.schema.json +81 -0
package/docs/paper/evidence-ledger.md +103 -0
package/docs/paper/output/arxiv/README-arxiv.txt +8 -0
package/docs/paper/output/arxiv/arxiv-manifest.json +41 -0
package/docs/paper/output/arxiv/main.tex +949 -0
package/docs/paper/output/arxiv/references.bib +222 -0
package/docs/paper/output/arxiv-compile-report.json +24 -0
package/docs/paper/output/submission-bundle/LICENSE +21 -0
package/docs/paper/output/submission-bundle/README.md +533 -0
package/docs/paper/output/submission-bundle/benchmarks/output/adapter-self-test/guardbench-adapter-self-test.json +50 -0
package/docs/paper/output/submission-bundle/benchmarks/output/external/guardbench-external-dry-run.json +69 -0
package/docs/paper/output/submission-bundle/benchmarks/output/external/guardbench-external-evidence.json +56 -0
package/docs/paper/output/submission-bundle/benchmarks/output/guardbench-conformance-card.json +63 -0
package/docs/paper/output/submission-bundle/benchmarks/output/guardbench-manifest.json +414 -0
package/docs/paper/output/submission-bundle/benchmarks/output/guardbench-raw.json +1171 -0
package/docs/paper/output/submission-bundle/benchmarks/output/guardbench-summary.json +1981 -0
package/docs/paper/output/submission-bundle/benchmarks/output/leaderboard/guardbench-leaderboard.json +93 -0
package/docs/paper/output/submission-bundle/benchmarks/output/leaderboard/guardbench-leaderboard.md +7 -0
package/docs/paper/output/submission-bundle/benchmarks/output/submission-bundle/submission-manifest.json +131 -0
package/docs/paper/output/submission-bundle/benchmarks/output/submission-bundle/validation-report.json +31 -0
package/docs/paper/output/submission-bundle/benchmarks/output/summary.json +2354 -0
package/docs/paper/output/submission-bundle/benchmarks/schemas/guardbench-adapter-registry.schema.json +69 -0
package/docs/paper/output/submission-bundle/benchmarks/schemas/guardbench-adapter-self-test.schema.json +156 -0
package/docs/paper/output/submission-bundle/benchmarks/schemas/guardbench-conformance-card.schema.json +184 -0
package/docs/paper/output/submission-bundle/benchmarks/schemas/guardbench-external-dry-run.schema.json +74 -0
package/docs/paper/output/submission-bundle/benchmarks/schemas/guardbench-external-evidence.schema.json +108 -0
package/docs/paper/output/submission-bundle/benchmarks/schemas/guardbench-external-run.schema.json +160 -0
package/docs/paper/output/submission-bundle/benchmarks/schemas/guardbench-leaderboard.schema.json +179 -0
package/docs/paper/output/submission-bundle/benchmarks/schemas/guardbench-manifest.schema.json +213 -0
package/docs/paper/output/submission-bundle/benchmarks/schemas/guardbench-publication-verification.schema.json +47 -0
package/docs/paper/output/submission-bundle/benchmarks/schemas/guardbench-raw.schema.json +164 -0
package/docs/paper/output/submission-bundle/benchmarks/schemas/guardbench-submission-manifest.schema.json +151 -0
package/docs/paper/output/submission-bundle/benchmarks/schemas/guardbench-summary.schema.json +228 -0
package/docs/paper/output/submission-bundle/docs/AUDREY_PAPER_OUTLINE.md +175 -0
package/docs/paper/output/submission-bundle/docs/paper/00-master.md +48 -0
package/docs/paper/output/submission-bundle/docs/paper/01-introduction.md +27 -0
package/docs/paper/output/submission-bundle/docs/paper/02-related-work.md +47 -0
package/docs/paper/output/submission-bundle/docs/paper/03-problem-definition.md +108 -0
package/docs/paper/output/submission-bundle/docs/paper/04-design.md +164 -0
package/docs/paper/output/submission-bundle/docs/paper/05-guardbench-spec.md +412 -0
package/docs/paper/output/submission-bundle/docs/paper/06-implementation.md +113 -0
package/docs/paper/output/submission-bundle/docs/paper/07-evaluation.md +168 -0
package/docs/paper/output/submission-bundle/docs/paper/08-discussion-limitations.md +61 -0
package/docs/paper/output/submission-bundle/docs/paper/09-conclusion.md +11 -0
package/docs/paper/output/submission-bundle/docs/paper/SUBMISSION_README.md +162 -0
package/docs/paper/output/submission-bundle/docs/paper/appendix-a-demo-transcript.md +114 -0
package/docs/paper/output/submission-bundle/docs/paper/arxiv-compile-report.schema.json +116 -0
package/docs/paper/output/submission-bundle/docs/paper/arxiv-source.schema.json +61 -0
package/docs/paper/output/submission-bundle/docs/paper/audrey-paper-v1.md +1106 -0
package/docs/paper/output/submission-bundle/docs/paper/browser-launch-plan.json +209 -0
package/docs/paper/output/submission-bundle/docs/paper/browser-launch-plan.schema.json +100 -0
package/docs/paper/output/submission-bundle/docs/paper/browser-launch-results.json +86 -0
package/docs/paper/output/submission-bundle/docs/paper/browser-launch-results.schema.json +66 -0
package/docs/paper/output/submission-bundle/docs/paper/claim-register.json +138 -0
package/docs/paper/output/submission-bundle/docs/paper/claim-register.schema.json +81 -0
package/docs/paper/output/submission-bundle/docs/paper/evidence-ledger.md +103 -0
package/docs/paper/output/submission-bundle/docs/paper/output/arxiv/README-arxiv.txt +8 -0
package/docs/paper/output/submission-bundle/docs/paper/output/arxiv/arxiv-manifest.json +41 -0
package/docs/paper/output/submission-bundle/docs/paper/output/arxiv/main.tex +949 -0
package/docs/paper/output/submission-bundle/docs/paper/output/arxiv/references.bib +222 -0
package/docs/paper/output/submission-bundle/docs/paper/output/arxiv-compile-report.json +24 -0
package/docs/paper/output/submission-bundle/docs/paper/paper-submission-bundle.schema.json +70 -0
package/docs/paper/output/submission-bundle/docs/paper/publication-pack.json +81 -0
package/docs/paper/output/submission-bundle/docs/paper/publication-pack.schema.json +60 -0
package/docs/paper/output/submission-bundle/docs/paper/references.bib +222 -0
package/docs/paper/output/submission-bundle/package.json +212 -0
package/docs/paper/output/submission-bundle/paper-submission-manifest.json +379 -0
package/docs/paper/paper-submission-bundle.schema.json +70 -0
package/docs/paper/publication-pack.json +81 -0
package/docs/paper/publication-pack.schema.json +60 -0
package/docs/paper/references.bib +222 -0
package/package.json +87 -4
package/scripts/audit-release-completion.mjs +362 -0
package/scripts/create-arxiv-source.mjs +362 -0
package/scripts/create-paper-submission-bundle.mjs +210 -0
package/scripts/finalize-release.mjs +526 -0
package/scripts/prepare-release-cut.mjs +269 -0
package/scripts/publish-release-bundle.mjs +209 -0
package/scripts/publish-release-github-api.mjs +429 -0
package/scripts/run-vitest.mjs +34 -0
package/scripts/smoke-cli.js +72 -0
package/scripts/sync-paper-artifacts.mjs +109 -0
package/scripts/verify-arxiv-compile.mjs +440 -0
package/scripts/verify-arxiv-source.mjs +194 -0
package/scripts/verify-browser-launch-plan.mjs +237 -0
package/scripts/verify-browser-launch-results.mjs +285 -0
package/scripts/verify-paper-artifacts.mjs +338 -0
package/scripts/verify-paper-claims.mjs +226 -0
package/scripts/verify-paper-submission-bundle.mjs +207 -0
package/scripts/verify-publication-pack.mjs +196 -0
package/scripts/verify-python-package.py +201 -0
package/scripts/verify-release-readiness.mjs +741 -0

package/docs/paper/output/submission-bundle/docs/AUDREY_PAPER_OUTLINE.md ADDED

@@ -0,0 +1,175 @@
+# Audrey Paper Outline
+## Working Title
+Audrey Guard: Local-First Pre-Action Memory Control for Tool-Using Agents
+## One-Sentence Thesis
+Long-term memory for agents should not stop at recall; it should run before tool use, connect prior outcomes to the next action, and return an auditable `allow`, `warn`, or `block` decision with evidence.
+## Abstract Draft
+Tool-using agents repeatedly fail in ways that are avoidable: they rerun broken commands, ignore project-specific procedures, lose the context behind prior failures, and trust degraded retrieval paths as if they were complete. Existing agent-memory systems focus mainly on storing and retrieving conversational facts. Audrey reframes memory as a local-first control loop for action: observe tool outcomes, encode durable lessons, build a memory capsule before the next action, generate reflexes, decide whether to allow, warn, or block, and validate whether the memory changed the result.
+This paper introduces Audrey Guard, a SQLite-backed memory controller for Model Context Protocol and CLI agents. Audrey Guard combines hybrid vector/FTS recall, memory capsules, preflight warnings, tool-trace learning, redaction-first audit logging, and evidence-linked impact measurement. The evaluation plan measures repeated-failure prevention, false-block rate, degraded-recall fail-closed behavior, redaction safety, and overhead. The result is a practical memory firewall for local agent work: not a replacement for general memory platforms, but an auditable layer that helps agents avoid repeating known mistakes before they touch tools.
+## Core Contributions
+1. Define pre-action memory control as a distinct problem from generic long-term memory retrieval.
+2. Present the Audrey Guard loop: `PostToolUse` observation -> memory encoding -> preflight/capsule/reflex generation -> `allow` / `warn` / `block` -> validation/impact.
+3. Show a local-first implementation over SQLite, vector search, FTS, MCP, CLI, REST, and Python clients.
+4. Introduce GuardBench, an evaluation suite focused on tool-use risk reduction rather than chat-memory accuracy alone.
+5. Measure safety properties that memory systems usually underreport: repeated-failure prevention, recall degradation handling, secret redaction, and audit lineage.
+## Paper Structure
+### 1. Introduction
+- Agents now operate tools, not just text conversations.
+- The failure mode is operational: the agent knows less than yesterday's run.
+- Generic memory recall is necessary but insufficient; the memory must participate before action.
+- Audrey's claim: a local memory controller can prevent repeated tool failures with low overhead and inspectable evidence.
+### 2. Background and Related Work
+- Agent-memory systems: Mem0, Letta/MemGPT, LangMem, Zep/Graphiti, Supermemory, OpenMemory, Cognee, LlamaIndex memory.
+- Memory-as-system-resource work: MemOS, procedural memory, evidence-driven retention, temporal graphs.
+- MCP tool safety: tool annotations, tool poisoning, descriptor drift, open-world tool risk.
+- Hook runtimes: Claude Code `PreToolUse`, `PostToolUse`, and `PostToolUseFailure` make pre-action memory control deployable.
+### 3. Problem Definition
+- Input: proposed agent action, tool name, command/action text, cwd, file scope, session id, and current memory store.
+- Output: decision, risk score, summary, evidence ids, recommended actions, reflexes, optional capsule, and preflight event id.
+- Desired behavior:
+  - Block exact repeated failures unless the action changed.
+  - Warn on relevant prior failures, must-follow procedures, contradictions, and degraded recall.
+  - Preserve evidence lineage and redact secrets before durable storage.
+  - Add low enough latency to run inside tool hooks.
+### 4. Audrey Guard Design
+- Memory substrate: episodic, semantic, procedural, event log, validation, decay, consolidation.
+- Recall: hybrid vector + FTS with tag/source/date filters and partial-failure diagnostics.
+- Capsules: budgeted evidence assembly for action context.
+- Preflight: warnings and risk scoring from capsule sections, status, and recent tool failures.
+- Reflexes: action-oriented responses generated from preflight evidence.
+- Controller: `beforeAction()` and `afterAction()` over existing Audrey primitives.
+- Audit safety: redaction-before-truncation, action hashing, file-scope hashing, event ids.
+### 5. Implementation
+- Runtime: Node.js 20+, TypeScript, SQLite, sqlite-vec, Hono REST, MCP stdio, Python client.
+- CLI:
+  - `audrey guard --tool Bash "npm run deploy"`
+  - `audrey demo --scenario repeated-failure`
+- MCP surfaces:
+  - Tools for recall, preflight, reflexes, observe-tool, impact, status.
+  - Resources for status, recent memories, and principles.
+  - Prompts for session briefing, recall, and reflection.
+- Docker behavior: fail-closed non-loopback REST sidecar with required API key.
+### 6. Evaluation: GuardBench
+Baselines:
+- No memory.
+- Recent-window memory.
+- Vector-only recall.
+- Keyword/FTS-only recall.
+- Audrey Guard with hybrid recall and exact-failure matching.
+Scenarios:
+- Repeated failed shell command.
+- Required preflight procedure missing.
+- Same command in a different file scope.
+- Same tool/action with changed command.
+- Prior failure plus successful fix.
+- Recall vector table missing.
+- FTS failure under hybrid recall.
+- Long secret near truncation boundary.
+- Conflicting project instructions.
+- High-volume irrelevant memory noise.
+Metrics:
+- Repeated-failure prevention rate.
+- False-block rate.
+- Useful-warning precision.
+- Evidence recall: whether the blocking evidence is surfaced.
+- Redaction safety: raw secret leakage count.
+- Recall-degradation detection rate.
+- Runtime overhead p50/p95.
+- Validation-linked impact count.
+### 7. Results Plan
+- Use the existing repeated-failure demo as the first qualitative figure.
+- Run `npm run bench:memory:check` as the memory-regression baseline.
+- Keep the `bench:guard` command wired into release evidence before paper submission.
+- Report machine provenance for all timings, matching the existing 0.22.2 benchmark snapshot style.
+- Include ablations:
+  - Without exact action hash.
+  - Without file scope.
+  - Without recall degradation warnings.
+  - Without redaction-aware truncation.
+### 8. Discussion
+- Why Audrey should not compete as "the best general memory store."
+- Why local-first matters for tool traces: secrets, filesystem paths, project rules, and private failures.
+- Why tool annotations are hints, not policy guarantees.
+- What Audrey borrows from graph memory without adding a graph database to the core.
+- Limitations:
+  - Claude Code hook config can be applied with a guarded settings merge, but
+    equivalent Codex hook wiring still depends on a stable host hook surface.
+  - Validation lineage is bound to exact preflight event evidence, but feedback
+    does not yet tune risk scoring.
+  - Local comparative GuardBench numbers exist; no external-system numbers yet.
+  - Temporal belief fields are still future work.
+### 9. Conclusion
+- Agent memory should be judged by whether it changes future actions, not just whether it retrieves relevant text.
+- Audrey Guard demonstrates a practical local loop for using memory as a pre-action control layer.
+- The next publishable milestone is live external-adapter GuardBench output plus broader host-hook integration.
+## Figures and Tables
+1. Guard loop diagram: observe -> encode -> capsule/preflight/reflex -> decision -> validate.
+2. Architecture diagram: SQLite store, event log, recall, controller, MCP/CLI/REST clients.
+3. Repeated-failure demo transcript with evidence ids.
+4. GuardBench table by baseline and scenario.
+5. Redaction/truncation safety table.
+6. Latency table: preflight p50/p95 by memory count.
+## Artifact Checklist Before Submission
+- `bench:guard` script and JSON output.
+- Public GuardBench scenario manifest, comparative adapter package, and external-run metadata bundle.
+- Reproducible benchmark snapshot with Node version, CPU, RAM, git SHA.
+- CLI smoke transcript for `audrey demo --scenario repeated-failure`.
+- MCP smoke transcript for `tools/list`, `resources/list`, `prompts/list`, and `memory_status`.
+- Python integration proof.
+- Docker fail-closed auth proof.
+- Paper appendix with exact commands.
+## Submission Strategy
+1. Publish an arXiv preprint after GuardBench exists.
+2. Submit to an agent-systems, AI engineering, or LLM applications workshop.
+3. Keep the first version implementation-centered, not theory-heavy.
+4. Release the evaluation artifact with the paper so the claim is falsifiable.
+## Source Map
+- MCP tool annotations and trust model: https://modelcontextprotocol.io/specification/2025-11-25/server/tools and https://modelcontextprotocol.io/specification/2025-11-25/schema
+- MCP annotation risk vocabulary: https://blog.modelcontextprotocol.io/posts/2026-03-16-tool-annotations/
+- Claude Code hooks: https://code.claude.com/docs/en/hooks
+- Mem0 token-efficient memory algorithm: https://mem0.ai/blog/mem0-the-token-efficient-memory-algorithm
+- MemOS: https://huggingface.co/papers/2507.03724
+- MCP Security Bench: https://huggingface.co/papers/2510.15994
+- Securing MCP against tool poisoning: https://papers.cool/arxiv/2512.06556
+- Zep/Graphiti temporal knowledge graph: https://help.getzep.com/graphiti/graphiti/overview
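Reviewer note (annotation, not part of the package diff): the "Problem Definition" bullets in the outline above say the controller should block exact repeated failures unless the action changed, and warn on relevant prior failures, must-follow procedures, contradictions, and degraded recall. The following is a minimal TypeScript sketch of that decision rule only. All names here (`MemorySignals`, `GuardResult`, `decideSketch`) are hypothetical and chosen for illustration; this is not Audrey's `beforeAction()` implementation, and it omits risk scoring, evidence ids, and capsules.

```typescript
// Hypothetical types for illustration; not Audrey's real API.
type Decision = "allow" | "warn" | "block";

interface MemorySignals {
  // True when the same action identity failed before and has not changed since.
  exactRepeatedFailure: boolean;
  // Count of related (non-identical) prior failures found in memory.
  relevantPriorFailures: number;
  // Count of must-follow procedures that appear unmet for this action.
  unmetProcedures: number;
  // Count of conflicting remembered instructions.
  contradictions: number;
  // True when the recall path reported errors (for example a missing vector table).
  recallDegraded: boolean;
}

interface GuardResult {
  decision: Decision;
  reasons: string[];
}

function decideSketch(signals: MemorySignals): GuardResult {
  // An exact repeated failure is the only signal that blocks in this sketch.
  if (signals.exactRepeatedFailure) {
    return { decision: "block", reasons: ["exact repeated failure"] };
  }

  // Every other signal contributes a warning reason.
  const reasons: string[] = [];
  if (signals.relevantPriorFailures > 0) reasons.push("relevant prior failure");
  if (signals.unmetProcedures > 0) reasons.push("must-follow procedure");
  if (signals.contradictions > 0) reasons.push("contradiction");
  if (signals.recallDegraded) reasons.push("degraded recall");

  return { decision: reasons.length > 0 ? "warn" : "allow", reasons };
}
```

In this sketch, blocking is reserved for the exact-match case, which is one way to keep the false-block rate listed among the outline's metrics low; whether Audrey escalates other signals to `block` (for example under a strict mode) is not something this sketch claims.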

package/docs/paper/output/submission-bundle/docs/paper/00-master.md ADDED

@@ -0,0 +1,48 @@
+# Agent Memory Should Control Tool Use: Audrey Guard and Pre-Action Memory Control
+## Abstract
+Agent memory should be judged by whether it changes future tool actions, not only by whether it retrieves relevant text. Audrey implements a local-first pre-action memory controller that converts prior tool outcomes, procedures, contradictions, recall health, and redacted traces into auditable `allow`, `warn`, or `block` decisions before an agent acts. The system builds bounded memory capsules, scores preflight risk, generates evidence-linked reflexes, blocks exact repeated failures through deterministic action identity hashing, and closes the loop through post-action validation and impact reporting. This paper frames the scientific category as pre-action memory control and the artifact as Audrey Guard. The Stage-A version reports implemented Audrey evidence: the controller and CLI, redaction-first tool tracing, recall-degradation handling, the canonical 0.22.2 performance snapshot, the current behavioral regression gate output, the local comparative GuardBench run, and the deterministic repeated-failure demo. It also specifies GuardBench as the evaluation methodology for future cross-system comparison.
+## Table of Contents and Authoring Status
+| Section | File | Status | Owner |
+|---|---|---|---|
+| 0. Master, abstract, status | `00-master.md` | Draft initialized | Codex |
+| 1. Introduction | `01-introduction.md` | Draft complete | Claude strategy, Codex draft |
+| 2. Related Work | `02-related-work.md` | Draft complete | Claude citation strategy, Codex draft |
+| 3. Problem Definition | `03-problem-definition.md` | Draft complete | Codex |
+| 4. Design | `04-design.md` | Draft complete | Codex |
+| 5. GuardBench Specification | `05-guardbench-spec.md` | Draft complete | Claude spec review, Codex draft |
+| 6. Implementation | `06-implementation.md` | Draft complete | Codex |
+| 7. Evaluation | `07-evaluation.md` | Draft complete | Codex with Claude anti-claim review |
+| 8. Discussion and Limitations | `08-discussion-limitations.md` | Draft complete | Claude review, Codex draft |
+| 9. Conclusion | `09-conclusion.md` | Draft complete | Codex |
+| Consolidated v1 master | `audrey-paper-v1.md` | Assembled | Codex |
+| Appendix A. Demo Transcript | `appendix-a-demo-transcript.md` | Draft complete | Codex |
+| Appendix B. Evidence Ledger | `evidence-ledger.md` | Initialized and populated | Codex |
+| References | `references.bib` | Initialized with primary URLs; benchmark citations added | Codex |
+## Current Draft Constraints
+- Quote benchmark numbers from `benchmarks/snapshots/perf-0.22.2.json`, not the README sample table (Ledger: E28).
+- Treat GuardBench Stage A as a specification contribution plus local comparative result, not completed external-system results.
+- Cite external claims only from primary papers, official documentation, official repositories, or first-party project posts.
+- Keep claims about Audrey tied to evidence-ledger IDs.
+- Keep section-body ledger references while drafting; remove them during final submission polish after claims are stable.
+## Assembled Draft Preview
+| Order | File | Lines |
+|---|---|---:|
+| Master | `audrey-paper-v1.md` | 921 |
+| 1 | `01-introduction.md` | 27 |
+| 2 | `02-related-work.md` | 47 |
+| 3 | `03-problem-definition.md` | 108 |
+| 4 | `04-design.md` | 162 |
+| 5 | `05-guardbench-spec.md` | 242 |
+| 6 | `06-implementation.md` | 113 |
+| 7 | `07-evaluation.md` | 124 |
+| 8 | `08-discussion-limitations.md` | 61 |
+| 9 | `09-conclusion.md` | 11 |
+| Appendix A | `appendix-a-demo-transcript.md` | 114 |

package/docs/paper/output/submission-bundle/docs/paper/01-introduction.md ADDED

@@ -0,0 +1,27 @@
+# 1. Introduction
+Tool-using agents fail in ways that ordinary chat-memory evaluation does not measure. They repeat broken shell commands after a previous run already exposed the error. They ignore project-specific setup rules that were learned in an earlier session. They lose the causal link between a failed action and the fix that made a later action safe. They treat degraded retrieval as complete memory and act anyway. In Audrey's repeated-failure demo, an agent first runs `npm run deploy` and fails because the Prisma client was not generated. Audrey records the failed tool event, stores the operational rule, and blocks the same action when it is proposed again. The transcript ends with the intended behavior of pre-action memory control: "Audrey saw the agent fail once. Audrey stopped it from failing twice." (Ledger: E25, E42)
+Most memory evaluation frames do not test this behavior. MTEB evaluates text embeddings across retrieval and representation tasks [@muennighoff2023mteb]. LongMemEval evaluates chat assistants on information extraction, multi-session reasoning, temporal reasoning, knowledge updates, and abstention over long interaction histories [@wu2025longmemeval]. LoCoMo evaluates very long-term conversational memory through question answering, summarization, and multimodal dialogue generation [@maharana2024locomo]. These benchmarks are valuable, but their output target is retrieved context or an answer. They do not ask whether memory changed a future tool action before the action reached the shell, file system, browser, API, or MCP server.
+This paper defines pre-action memory control as a distinct systems problem. A controller receives a proposed tool action and remembered state before execution, then returns an auditable `allow`, `warn`, or `block` decision with evidence. Section 3 gives the formal input and output contract, desired behavior properties, threat model, and scope boundaries. The key shift is the evaluation target: memory is judged by its effect on action selection, not only by the relevance of retrieved text.
+Audrey Guard is the artifact studied in this paper. It is a local-first memory controller for agents that observes tool outcomes, redacts traces, retrieves relevant memory, constructs a bounded capsule, scores preflight risk, generates reflexes, returns `allow`/`warn`/`block`, and validates whether memory helped after the action path completes (Ledger: E1-E17). The implementation is exposed through MCP, CLI, REST, and Python client surfaces, while the core guard path runs host-side before tool execution (Ledger: E18-E19, E26, E32-E36).
+This paper makes six contributions:
+1. It formalizes pre-action memory control as a problem separate from chat recall, retrieval accuracy, and long-context question answering (Section 3).
+2. It presents Audrey Guard, a local-first controller that converts remembered failures, procedures, contradictions, recall health, and redacted tool traces into `allow`, `warn`, or `block` decisions before tool use (Sections 4 and 6; Ledger: E1-E15, E29-E40).
+3. It introduces deterministic action identity for repeated-failure prevention: tool, redacted command, normalized working directory, and sorted file scope are hashed and matched against prior failed tool events (Sections 4, 6, and 7; Ledger: E3, E25, E42).
+4. It implements a redaction-first tool-trace path so guard evidence can reference prior tool input, output, and error summaries without storing raw secrets in durable memory (Sections 4 and 6; Ledger: E12-E13).
+5. It treats recall degradation as a control signal: missing vector tables, KNN failures, and FTS failures propagate as `RecallError[]`, appear in capsules, and become high-severity preflight warnings under strict guard mode (Sections 4 and 6; Ledger: E7, E9-E10, E15, E40).
+6. It specifies GuardBench, a reproducibility contract for measuring whether memory changes future tool actions, including scenarios, baselines, metrics, redaction sweeps, machine provenance, and raw per-scenario outputs (Section 5).
+The empirical scope is Stage A. This paper reports implemented Audrey evidence: the controller and CLI guard path, redaction-first tool tracing, recall-degradation handling, the canonical 0.22.2 performance snapshot, the current `bench:memory:check` regression output, the local comparative GuardBench run, and the deterministic repeated-failure demo transcript (Ledger: E20-E26, E41-E42, E46). It does not report external-system GuardBench comparisons, production-load measurements, or real-provider embedding latency. The external adapter contract, Mem0 adapter, and evidence-bundle runner now exist, but live external-system scores belong in a v2 paper after credentialed runs publish raw outputs under the contract in Section 5 (Ledger: E47-E50).
+Section 2 positions Audrey against memory systems, memory benchmarks, graph-memory systems, and MCP safety work. Section 3 defines the pre-action memory-control problem. Section 4 describes Audrey Guard's design. Section 5 specifies GuardBench. Section 6 documents the implementation. Section 7 reports Stage-A evaluation artifacts. Section 8 discusses limitations and open problems. Section 9 concludes.

package/docs/paper/output/submission-bundle/docs/paper/02-related-work.md ADDED

@@ -0,0 +1,47 @@
+# 2. Related Work
+This section organizes prior work by the behavior each system optimizes. The point of comparison is not whether a system has memory. It is whether memory runs before tool use and produces an evidence-linked action decision.
+## Conversational and Scalable Memory Systems
+Mem0 optimizes scalable long-term memory for multi-session agents. Its paper frames the problem as extracting, consolidating, and retrieving salient information from ongoing conversations, including a graph-memory variant for relational structure, and evaluates on LoCoMo-style conversational memory tasks [@chhikara2025mem0]. Its 2026 algorithm post emphasizes token-efficient retrieval through hierarchical memory, ADD-only extraction, entity linking, and multi-signal retrieval [@mem02026tokenefficient]. Audrey differs at the control boundary: Mem0 optimizes what context to retrieve for response generation, while Audrey evaluates whether remembered context changes a proposed tool action before execution.
+MemGPT, now associated with Letta, optimizes virtual context management. The MemGPT paper treats limited context windows as an operating-system-style memory hierarchy problem, moving information between memory tiers so an LLM can operate beyond its immediate context window [@packer2024memgpt]. This is an architectural framing for extended context and multi-session chat. Audrey borrows the systems instinct but changes the target: its controller is not a virtual-context manager for the model; it is a host-side guard that returns `allow`, `warn`, or `block` for a proposed action.
+LangMem optimizes memory as a reusable agent-runtime primitive. Its documentation describes tooling for extracting important information from conversations, optimizing agent behavior through prompt refinement, maintaining long-term memory, and providing hot-path memory tools agents can call during active conversations [@langchain2026langmem]. This is close to an agent developer's integration layer. Audrey differs because the guard path does not depend on the language model deciding to call a memory tool; the host asks memory before the tool call proceeds.
+Supermemory optimizes a developer memory API and context stack. Its documentation positions the service as long-term and short-term memory and context infrastructure, with ingestion, extraction, graph memory, user profiles, connectors, and managed RAG [@supermemory2026docs]. Its repository describes persistent memory for AI tools and an API that returns user profiles and relevant memories [@supermemory2026repo]. Audrey differs by keeping the pre-action controller local and by treating prior tool outcomes, redaction state, and recall degradation as enforceable control inputs rather than retrieved context alone.
+## Memory as System Resource and Graph Systems
+MemOS optimizes memory as a system resource. The paper introduces a memory operating system that manages heterogeneous memory forms across temporal scales, with memory units carrying content and metadata such as provenance and versioning [@li2025memos]. This is the broadest systems framing among the related memory papers. Audrey's scope is narrower and more operational: it does not manage parameter-level memories or schedule heterogeneous memory resources; it inserts a local memory-derived decision layer before agent tool use.
+Zep optimizes temporal knowledge graphs for agent memory. Its paper presents Graphiti as a temporally aware knowledge-graph engine that synthesizes unstructured conversations and structured business data while maintaining historical relationships, then evaluates retrieval over Deep Memory Retrieval and LongMemEval-style tasks [@rasmussen2025zep]. Audrey uses contradictions, recent tool events, and typed memory, but it does not claim to be a temporal knowledge graph service. Its central output is a guard decision, not a retrieved graph context.
+Graphiti optimizes real-time temporal context graphs. Its repository describes context graphs that track how facts change over time, maintain provenance to source data, and support semantic, keyword, and graph traversal retrieval [@zep2026graphiti]. This is valuable for evolving facts and historical queries. Audrey's use of evidence is different: evidence is attached to an action decision and to recommendations that a host can enforce.
+Cognee optimizes knowledge infrastructure for agent memory. The Cognee repository describes an open-source memory control plane that ingests data, combines embeddings and graphs, supports local execution, and provides traceability and cross-agent knowledge sharing [@cognee2026repo]. The Cognee paper studies hyperparameter optimization for graph construction, retrieval, and prompting in multi-hop question answering [@markovic2025cognee]. Audrey does not optimize knowledge-graph retrieval quality. It uses local memory to decide whether an agent action should proceed.
+## Memory Benchmarks and Evaluation
+MTEB optimizes broad evaluation of embedding models. It spans embedding tasks such as retrieval, clustering, reranking, and semantic textual similarity across many datasets and languages [@muennighoff2023mteb]. It is relevant because many memory systems rely on embeddings, but it evaluates representation quality rather than the behavioral effect of memory on a tool-using agent.
+LongMemEval optimizes long-term chat-assistant memory evaluation. It tests information extraction, multi-session reasoning, temporal reasoning, knowledge updates, and abstention across sustained user-assistant histories [@wu2025longmemeval]. GuardBench is orthogonal. It starts after a system has some memory state and asks whether that state changes a future tool action.
+LoCoMo optimizes very long-term conversational memory. It provides long dialogues across many sessions and evaluates question answering, event summarization, and multimodal dialogue generation [@maharana2024locomo]. GuardBench does not replace LoCoMo. It tests a separate failure surface: repeated actions, missing procedures, degraded recall, secret redaction, and contradictions at the tool boundary.
+MemoryBench optimizes continual-learning evaluation from accumulated user feedback. Its paper argues that many memory benchmarks focus on homogeneous reading-comprehension tasks and introduces a user-feedback simulation framework across domains, languages, and task types [@ai2026memorybench]. Audrey's validation loop is smaller: it records whether a memory was used, helpful, or wrong, then updates salience and bookkeeping (Ledger: E16). GuardBench evaluates the control effect of those memories, not general continual-learning quality.
+## MCP Tool Safety and Pre-Action Runtimes
+The Model Context Protocol standardizes how clients discover and call tools. The 2025-06-18 schema defines `tools/list`, tool metadata, `tools/call`, input and output schemas, and tool annotations [@mcp2025schema]. MCP creates an interoperable tool surface; it does not define a memory-derived policy for whether a call should happen. Audrey fits beside MCP as a local controller that runs before a host invokes a tool.
+MCP Security Bench optimizes evaluation of MCP-specific attacks. It introduces attack categories across task planning, tool invocation, and response handling, and evaluates LLM agents with real benign and malicious MCP tools [@zhang2026mcpsecuritybench]. Audrey is not an MCP attack benchmark and not a complete MCP defense system. It addresses one component inside a defensive host: memory-derived pre-action control with evidence and redaction.
+The tool-poisoning paper studies semantic attacks against MCP-integrated systems, including malicious tool descriptors, shadowing through contaminated context, and descriptor changes after approval [@jamshidi2025toolpoisoning]. Audrey's trusted-control-source gate responds to a related risk inside memory: untrusted memories tagged as must-follow are not promoted into control rules (Ledger: E6). This does not solve tool poisoning. It reduces one path by which remembered content becomes operational instruction.
+The MCP tool-annotations blog frames annotations as a risk vocabulary. It states that annotations such as read-only, destructive, idempotent, and open-world are hints, not guaranteed descriptions, and that clients should not base tool-use decisions on annotations from untrusted servers [@mcp2026toolannotations]. Audrey's decision layer is complementary: it uses remembered outcomes, procedures, contradictions, and recall health, not only static tool metadata.
+## What Is Missing
+Across the primary sources reviewed here, memory systems optimize extraction, retrieval, persistence, graph structure, context assembly, personalization, or continual learning. Safety work around MCP optimizes attack detection, tool metadata, and protocol-level risk. The missing evaluation target is action effect: whether memory changes what an agent does next. Audrey implements a redaction-first, evidence-linked, host-side controller that runs before tool use and returns `allow`, `warn`, or `block`; GuardBench specifies how to evaluate that category.

package/docs/paper/output/submission-bundle/docs/paper/03-problem-definition.md ADDED

@@ -0,0 +1,108 @@
+# 3. Problem Definition: Pre-Action Memory Control
+Long-term memory systems for agents are usually evaluated as retrieval systems: given prior interaction history and a current query, the system returns facts, summaries, graph neighborhoods, or memory-tool results that improve the next model response. Mem0 evaluates scalable conversational memory and token efficiency [@chhikara2025mem0; @mem02026tokenefficient]. MemGPT/Letta frames memory as virtual context management across tiers [@packer2024memgpt]. Zep and Graphiti model changing facts as temporal knowledge graphs [@rasmussen2025zep; @zep2026graphiti]. MemOS treats memory as a manageable system resource [@li2025memos]. LangMem, Supermemory, and Cognee expose memory management, search, and graph/context layers for agents [@langchain2026langmem; @supermemory2026docs; @markovic2025cognee; @cognee2026repo].
+Those systems make recall, memory formation, or memory organization the central artifact. This paper studies a different artifact: a controller that runs before an agent uses a tool. The relevant question is not only whether memory returns useful text, but whether memory changes the next external action.
+## Problem Statement
+A tool-using agent repeatedly converts model state into external actions: shell commands, file edits, API calls, browser operations, MCP tool calls, or domain-specific side effects. These actions are not only language outputs. They can change local files, spend API budget, mutate remote systems, publish content, delete data, and expose credentials. MCP standardizes tool discovery and invocation through a JSON-RPC protocol surface [@mcp2025schema]. Claude Code hooks expose pre-tool and post-tool extension points around tool calls [@anthropic2026claudecodehooks]. These interfaces make tool use observable and interceptable; they do not by themselves decide whether prior failures, remembered constraints, stale recall, or contradictions should stop the next action.
+The pre-action memory control problem is:
+Given an intended agent action and a local memory state, return an auditable decision before tool execution: `allow`, `warn`, or `block`, with evidence and repair guidance.
+Audrey implements this decision as the `GuardResult` contract with `decision`, `riskScore`, `summary`, `evidenceIds`, `recommendedActions`, `capsule`, `reflexes`, and `preflightEventId` fields (Ledger: E1). Its current controller calls preflight/reflex generation before action, records a preflight event, and scopes recall to the current agent (Ledger: E2). It also records tool outcomes after execution and turns failures into future tool-result memories (Ledger: E4).
+## Formal Interface
+Let an intended action at time `t` be:
+```text
+a_t = (tool, action, command, cwd, files, session_id)
+```
+where `tool` names the external capability, `action` is the human-readable intended operation, `command` is the concrete command when present, `cwd` is the execution directory, `files` is the known file scope, and `session_id` identifies the agent session. Audrey represents this shape in `AgentAction` (Ledger: E1).
+Let `M_t` be the memory store visible to the agent, `T_t` be the tool-trace event history, and `H_t` be the current recall-health state. A pre-action memory controller is a function:
+```text
+G(M_t, T_t, H_t, a_t) -> (d_t, r_t, E_t, R_t, C_t)
+```
+where:
+- `d_t in {allow, warn, block}` is the action decision.
+- `r_t in [0,1]` is the risk score.
+- `E_t` is a set of evidence identifiers.
+- `R_t` is a set of recommended repair or mitigation actions.
+- `C_t` is an optional memory capsule containing the evidence packet.
+The controller is useful only if `d_t` is consumed before the side effect occurs. If the decision is displayed after execution, the system is post-hoc logging, not pre-action control.
+After the tool executes, let the observed outcome be:
+```text
+o_t = (outcome, output, error_summary, metadata)
+```
+where `outcome` includes success, failure, or unknown status. A closed-loop memory controller also defines an update function:
+```text
+U(M_t, T_t, a_t, o_t) -> (M_{t+1}, T_{t+1})
+```
+Audrey implements this post-action update by recording redacted tool events and encoding failures as tool-result memories for later preflight use (Ledger: E4, E12).
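The two functions above can be read as a small typed contract. The following is a minimal sketch, not Audrey's implementation: every type and function name here (`State`, `guard`, `update`, `sameAction`) is hypothetical, the state models only the tool-event history `T_t` (the memory store `M_t` and recall health `H_t` are omitted), and the matching rule is plain field equality rather than a hashed action key.

```typescript
// Hypothetical, simplified rendering of G and U. Names are illustrative only.
type Decision = "allow" | "warn" | "block";

interface AgentAction {
  tool: string;
  action: string;
  command?: string;
  cwd?: string;
  files?: string[];
  sessionId?: string;
}

interface Outcome {
  outcome: "success" | "failure" | "unknown";
  output?: string;
  errorSummary?: string;
}

interface ToolEvent { id: string; action: AgentAction; outcome: Outcome; }
interface State { events: ToolEvent[]; }

interface GuardOutput {
  decision: Decision;        // d_t
  risk: number;              // r_t in [0, 1]
  evidenceIds: string[];     // E_t
  recommendations: string[]; // R_t
}

// Toy identity check: same tool, command, and working directory.
function sameAction(a: AgentAction, b: AgentAction): boolean {
  return a.tool === b.tool && a.command === b.command && a.cwd === b.cwd;
}

// G: decide before execution, using only recorded history.
function guard(state: State, a: AgentAction): GuardOutput {
  const failures = state.events.filter(
    (e) => e.outcome.outcome === "failure" && sameAction(e.action, a)
  );
  if (failures.length > 0) {
    return {
      decision: "block",
      risk: 0.9,
      evidenceIds: failures.map((e) => e.id),
      recommendations: ["Fix the recorded failure before retrying this action."],
    };
  }
  return { decision: "allow", risk: 0, evidenceIds: [], recommendations: [] };
}

// U: fold the observed outcome back into the history without mutating the input.
function update(state: State, a: AgentAction, o: Outcome): State {
  const event: ToolEvent = { id: "evt-" + (state.events.length + 1), action: a, outcome: o };
  return { events: state.events.concat([event]) };
}
```

In this sketch the first `guard` call on an empty history allows the action, and the same call after `update` records a failure returns `block` with the failure event as evidence. The decision only has an effect if the host consults `guard` before running the tool.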
+## Desired Behavior Properties
+**Pre-action placement.** The controller runs after the agent proposes tool parameters and before the tool runs. Audrey's `beforeAction()` path calls reflex/preflight generation before execution (Ledger: E2). This placement separates memory control from answer generation.
+**Evidence-linked decisions.** Every warning or block is backed by memory IDs, failure IDs, recall diagnostics, or preflight event IDs. Audrey preflight returns warnings, evidence IDs, recommended actions, recent failures, and an optional capsule (Ledger: E8-E10). Reflex generation preserves evidence IDs and reasons (Ledger: E11).
+**Action identity.** A repeated-failure detector needs an action identity stricter than a natural-language similarity match and more robust than raw string equality. Audrey hashes tool name, redacted command/action text, normalized working directory, and sorted normalized files; it then compares the hash with prior failed tool events for the same agent and tool (Ledger: E3).
+**Conservative control-source handling.** A memory tagged as a rule is not automatically trusted. Audrey treats `must-follow` style tags as control signals only when the source is `direct-observation` or `told-by-user`; untrusted control-looking memories are routed to uncertain/disputed context (Ledger: E6).
+**Recall-degradation awareness.** If retrieval partially fails, a memory controller should not silently proceed as though recall were complete. Audrey represents recall errors, propagates partial failures, exposes recent recall degradation in status, carries recall errors into capsules, and turns capsule recall errors into high-severity preflight warnings (Ledger: E7, E9, E15).
+**Redaction before persistence.** Tool traces are high-risk memory inputs because they can contain commands, environment output, stack traces, credentials, and file paths. Audrey's tool-trace contract states that raw tool input, output, and error text do not leave the module without redaction; it stores hashes, redacted summaries, redacted metadata, file fingerprints, and redaction state (Ledger: E12). The redaction layer covers common credentials, bearer/basic auth, private keys, JWTs, URL credentials, password assignments, payment/PII patterns, signed URL signatures, session cookies, high-entropy secrets, and sensitive JSON keys (Ledger: E13).
+**Bounded context assembly.** A pre-action controller must fit within model and tool budgets. Audrey capsules include `budget_chars`, `used_chars`, and `truncated` fields and organize evidence into typed sections rather than returning an unstructured recall list (Ledger: E5).
+**Agent scoping.** A local memory runtime used by multiple agents should not leak another agent's private operational history into a current preflight. Audrey capsule recall forces `scope: 'agent'` (Ledger: E7).
+**Closed-loop validation.** A memory controller needs feedback after action because the controller's evidence can be helpful, merely used, or wrong. Audrey validation accepts `used`, `helpful`, and `wrong` outcomes and updates salience and bookkeeping fields; impact reporting summarizes validation and recent activity (Ledger: E16-E17).
+## Threat Model
+The controller assumes an agent with legitimate access to local tools and MCP tools. The main hazards are action-selection hazards, not cryptographic compromise.
+The in-scope hazards are:
+- Repeating an exact action that already failed.
+- Ignoring a remembered must-follow procedure or project rule.
+- Acting on a memory set with open contradictions.
+- Treating degraded recall as complete recall.
+- Persisting credentials or sensitive tool output into long-lived memory.
+- Allowing untrusted tool outputs or descriptors to influence future actions as if they were trusted rules.
+- Losing auditability between a warning/block and the evidence that caused it.
+The MCP ecosystem makes these hazards concrete. The MCP specification standardizes the schema for tools and messages [@mcp2025schema]. MCP security benchmarks and tool-poisoning work evaluate attacks against MCP-integrated agents, including adversarial tool descriptions, shadowing through shared context, and changed tool descriptors after approval [@zhang2026mcpsecuritybench; @jamshidi2025toolpoisoning]. The MCP tool-annotations discussion frames annotations as risk hints that matter only when a client performs a concrete action based on them [@mcp2026toolannotations]. Audrey's controller fits this pattern: memory risk is useful when it changes the host-side decision.
+The out-of-scope hazards are:
+- First-time mistakes with no relevant prior evidence.
+- Malicious host compromise, database tampering, or filesystem attacks outside Audrey's process boundary.
+- Formal verification that a tool action is semantically safe.
+- Permission enforcement, sandboxing, rate limiting, or policy execution outside the memory controller.
+- Model alignment, deception detection, and general prompt-injection defense independent of remembered evidence.
+- Complete prevention of secret exposure after a caller explicitly stores unredacted data outside Audrey's tool-trace path.
+These boundaries matter for evaluation. Audrey Guard is not a replacement for sandboxing, MCP permission systems, static analysis, or human approval. It is a memory-derived pre-action control layer: it catches hazards that are visible in prior outcomes, stored rules, contradictions, tool traces, and recall health.
+## Stage-A Evaluation Target
+The first paper version evaluates implemented mechanisms and specifies GuardBench rather than claiming full benchmark results across external systems. The implemented evidence available today is: the controller and CLI guard path (Ledger: E1-E4, E26), the redacted tool-trace path (Ledger: E12-E13), preflight/reflex behavior (Ledger: E8-E11), recall degradation handling (Ledger: E15), closed-loop validation and impact reporting (Ledger: E16-E17), the canonical performance snapshot (Ledger: E20-E22), the current behavioral regression gate output (Ledger: E23-E24), and the deterministic repeated-failure demo (Ledger: E25).
+GuardBench therefore belongs in this version as a specification for future comparative evaluation, not as a completed empirical result table. The empirical claims in the first version use existing Audrey artifacts only.

package/docs/paper/output/submission-bundle/docs/paper/04-design.md ADDED

@@ -0,0 +1,164 @@
+# 4. Design: Audrey Guard as a Pre-Action Memory Controller
+Audrey Guard is the pre-action control layer of Audrey, not a separate memory store. It uses the same local memory runtime, tool-event log, recall path, validation feedback, and MCP/CLI surfaces, then adds one controller loop around tool use. The controller's output is deliberately narrow: `allow`, `warn`, or `block`, plus risk score, evidence, reflexes, recommendations, and an optional capsule (Ledger: E1).
+The design has four layers:
+1. Tool events enter through a redaction-first trace layer.
+2. Recall and event history are assembled into a bounded memory capsule.
+3. Preflight turns capsule entries, recent failures, contradictions, memory health, and recall errors into a risk-scored decision.
+4. Reflexes and the top-level controller convert the preflight into host-facing guidance, repeated-failure blocking, and post-action learning.
+This section describes the implemented design as of the current repository state. Each Audrey implementation claim references the evidence ledger.
+## Architecture Overview
+Audrey's public surface includes MCP tools for observation, capsules, preflight, reflexes, validation, and dream/consolidation (Ledger: E18-E19). The package metadata describes the system as a local-first memory runtime with recall, consolidation, memory reflexes, contradiction detection, and tool-trace learning (Ledger: E27). The Guard design composes these existing mechanisms rather than introducing a separate policy engine.
+The runtime path is:
+```text
+agent proposes action
+        |
+        v
+MemoryController.beforeAction(a_t)
+        |
+        v
+audrey.reflexes(action, strict=true, includePreflight=true, includeCapsule=true)
+        |
+        v
+buildPreflight(...) -> buildCapsule(...) -> recall + events + status
+        |
+        v
+GuardResult { allow | warn | block, evidence, recommendations }
+        |
+        v
+tool executes only if host permits it
+        |
+        v
+MemoryController.afterAction(o_t) -> observeTool(...) + optional failure memory
+```
+The controller is host-side. It does not ask the language model to decide whether memory matters. It converts memory state into a small decision object that a CLI, hook, MCP client, or agent runtime can enforce.
+## Controller Loop
+`MemoryController.beforeAction()` is the orchestration point. It calls `audrey.reflexes()` with `strict: true`, `includePreflight: true`, `includeCapsule: true`, `recordEvent: true`, and `scope: 'agent'` (Ledger: E2). This means a pre-action check records its own trace, returns the underlying evidence packet, and blocks on high-severity warnings rather than treating them as advisory text.
+The controller then checks exact repeated failures outside the general preflight path. If a matching failed tool event exists, the controller returns `block`, raises the risk score to at least `0.9`, prepends a recommendation not to repeat the exact failed action, and merges the prior failure event IDs with reflex evidence IDs (Ledger: E3). This rule gives repeated-failure prevention a deterministic path that does not depend on embedding similarity, lexical search, or model interpretation.
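The repeated-failure override described above can be sketched as a pure function over the reflex result. This is an illustration under assumptions: the `GuardDecision` shape, the function name, the recommendation wording, and the order in which evidence IDs are merged are hypothetical; only the `block` decision, the `0.9` floor, the prepended recommendation, and the merging of IDs come from the text.

```typescript
// Hypothetical shape; field names loosely follow the GuardResult contract in the text.
interface GuardDecision {
  decision: "allow" | "warn" | "block";
  riskScore: number;
  evidenceIds: string[];
  recommendedActions: string[];
}

// Apply the deterministic override when prior failed events match the action key.
function applyRepeatedFailureOverride(
  base: GuardDecision,
  failedEventIds: string[]
): GuardDecision {
  if (failedEventIds.length === 0) return base; // no exact repeat: keep the reflex result

  // Merge failure IDs with reflex evidence IDs, dropping duplicates.
  // Putting failure IDs first is an assumption of this sketch.
  const merged = failedEventIds.concat(base.evidenceIds);
  const unique = merged.filter((id, i) => merged.indexOf(id) === i);

  return {
    decision: "block",
    riskScore: Math.max(base.riskScore, 0.9), // raise to at least 0.9, never lower
    evidenceIds: unique,
    recommendedActions: [
      "Do not repeat the exact failed action; address the recorded failure first.",
    ].concat(base.recommendedActions),
  };
}
```

Because the override uses `Math.max`, a reflex result that already scored above `0.9` keeps its higher score, and a result with no matching failures passes through unchanged.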
+`MemoryController.afterAction()` closes the loop after execution. It records the tool outcome through `observeTool()`, attaches the action hash as `audrey_guard_action_key`, and, when the outcome is failed and a redacted error summary exists, encodes a high-salience `tool-result` memory containing the failure, command, and error summary (Ledger: E4). The next preflight can therefore use both structured event history and ordinary recall.
+This loop makes memory an action governor. The system remembers not only what was said, but what happened when an agent touched a tool.
+## Capsule Construction
+The capsule is the evidence packet that feeds preflight. It replaces a loose recall list with a typed, budgeted object containing:
+- `must_follow`
+- `project_facts`
+- `user_preferences`
+- `procedures`
+- `risks`
+- `recent_changes`
+- `contradictions`
+- `uncertain_or_disputed`
+- `evidence_ids`
+- optional `recall_errors`
+The capsule also reports `budget_chars`, `used_chars`, and `truncated` so downstream callers know whether the packet was pruned (Ledger: E5).
+Capsule construction starts with recall and then enriches results with tags, provenance, state, evidence IDs, recent failure events, and open contradictions. The implementation forces recall to `scope: 'agent'` even if a caller passes a different scope through capsule options (Ledger: E7). This is a security and relevance decision: the controller should not block or guide one agent using another agent's unrelated private work.
+The capsule's control-source rule is conservative. A memory tagged as `must-follow`, `must`, `required`, `never`, `always`, or `policy` becomes a must-follow control signal only when its source is `direct-observation` or `told-by-user`; the same tag from an untrusted source is classified as uncertain/disputed context (Ledger: E6). This prevents tool output, imported text, or adversarial memory content from becoming a rule merely by containing a policy-looking tag.
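The control-source rule can be expressed as a two-step classification: does the memory carry a control-looking tag, and if so, is its source trusted? The sketch below uses the tag and source names listed above; the function name, the `Section` type, and the example source `tool-output` are hypothetical.

```typescript
// Tag and source names are taken from the text; everything else is illustrative.
const CONTROL_TAGS = ["must-follow", "must", "required", "never", "always", "policy"];
const TRUSTED_SOURCES = ["direct-observation", "told-by-user"];

type Section = "must_follow" | "uncertain_or_disputed" | "other";

// Classify a memory by its tags and provenance.
function classifyControl(tags: string[], source: string): Section {
  const looksLikeControl = tags.some((t) => CONTROL_TAGS.indexOf(t) !== -1);
  if (!looksLikeControl) return "other"; // routed by ordinary section rules
  return TRUSTED_SOURCES.indexOf(source) !== -1
    ? "must_follow"            // trusted source: becomes a control signal
    : "uncertain_or_disputed"; // untrusted source: kept as disputed context
}
```

For example, `classifyControl(["policy"], "told-by-user")` yields `must_follow`, while the same tag from a hypothetical `tool-output` source yields `uncertain_or_disputed`.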
+The capsule includes two non-recall evidence classes. First, recent tool failures are inserted into the `risks` section as `tool_failure` entries with confidence derived from failure count (Ledger: E5). Second, open contradictions are inserted into the `contradictions` section with both sides referenced as evidence (Ledger: E5). These are control-relevant facts even when the current semantic query does not retrieve them.
+Finally, capsule pruning uses section priority. Must-follow rules, risks, contradictions, and procedures are retained before general project facts and preferences when the character budget is tight (Ledger: E5). This is the right asymmetry for pre-action control: the controller loses optional context before losing stop conditions.
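Priority-based pruning can be sketched as a greedy fill over entries ordered by section. This is an assumption-laden illustration: the exact priority order, the greedy strategy, and the `Entry` shape are hypothetical; the text only states that must-follow rules, risks, contradictions, and procedures are retained before project facts and preferences, and that the capsule reports `budget_chars`, `used_chars`, and `truncated`.

```typescript
interface Entry { section: string; text: string; }

interface PrunedCapsule {
  entries: Entry[];
  budget_chars: number;
  used_chars: number;
  truncated: boolean;
}

// Assumed priority order: stop conditions first, optional context last.
const SECTION_PRIORITY = [
  "must_follow", "risks", "contradictions", "procedures",
  "project_facts", "user_preferences",
];

function rank(section: string): number {
  const i = SECTION_PRIORITY.indexOf(section);
  return i === -1 ? SECTION_PRIORITY.length : i; // unknown sections go last
}

// Keep entries in priority order until the character budget is exhausted.
function pruneCapsule(entries: Entry[], budgetChars: number): PrunedCapsule {
  const ordered = entries.slice().sort((a, b) => rank(a.section) - rank(b.section));
  const kept: Entry[] = [];
  let used = 0;
  let truncated = false;
  for (const e of ordered) {
    if (used + e.text.length <= budgetChars) {
      kept.push(e);
      used += e.text.length;
    } else {
      truncated = true; // at least one entry was dropped
    }
  }
  return { entries: kept, budget_chars: budgetChars, used_chars: used, truncated };
}
```

With a tight budget, a `must_follow` entry survives even if it was recalled after a longer `project_facts` entry, and `truncated` tells the caller that the packet is incomplete.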
+## Preflight Risk Scoring
+Preflight is the decision layer over the capsule. The output contract includes the action, query, tool, working directory, generated timestamp, decision, verdict, `ok_to_proceed`, numeric `risk_score`, summary, warnings, recent failures, optional status, recommended actions, evidence IDs, optional preflight event ID, and optional capsule (Ledger: E8).
+`buildPreflight()` constructs its query from the action, tool, and working directory, then requests a conservative capsule with risks and contradictions enabled (Ledger: E9). It adds warnings from seven sources:
+- memory health failures or re-embedding recommendations;
+- recall errors carried by the capsule;
+- recent failed tool events matching the action or tool;
+- must-follow capsule entries;
+- remembered risks and tool failures;
+- remembered procedures;
+- open contradictions and uncertain/disputed entries.
+Each warning has a type, severity, message, reason, optional evidence ID, and optional recommended action (Ledger: E8-E9). The decision rule is intentionally simple. Warnings are sorted by severity. The risk score is the maximum severity score. In strict mode, any high-severity warning returns `block`; high or medium warnings outside strict mode return `caution`; absence of such warnings returns `go` (Ledger: E10). The controller maps `go` to `allow`, `caution` to `warn`, and `block` to `block` (Ledger: E1).
+This scoring rule trades statistical sophistication for auditability. A user can inspect the warning type and evidence ID that produced the decision. The rule is also stable under small recall-score perturbations: once an item enters the capsule and is categorized as high-severity, the block decision does not depend on an opaque model score.
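The decision rule is small enough to write out. The sketch below follows the description above (sort by severity, take the maximum severity score, block on high severity in strict mode, then map the verdict to a guard decision). The severity labels, the numeric scores per severity, and the type names are assumptions of this sketch, as is the choice that a medium warning in strict mode yields `caution`.

```typescript
type Severity = "info" | "medium" | "high"; // labels are assumed
type Verdict = "go" | "caution" | "block";
type Decision = "allow" | "warn" | "block";

interface PreflightWarning { type: string; severity: Severity; evidenceId?: string; }

// Hypothetical severity scores; the source does not state the actual values.
const SEVERITY_SCORE: Record<Severity, number> = { info: 0.2, medium: 0.6, high: 0.9 };

const VERDICT_TO_DECISION: Record<Verdict, Decision> = {
  go: "allow",
  caution: "warn",
  block: "block",
};

function decide(warnings: PreflightWarning[], strict: boolean) {
  // Sort a copy so the most severe warning comes first.
  const sorted = warnings
    .slice()
    .sort((a, b) => SEVERITY_SCORE[b.severity] - SEVERITY_SCORE[a.severity]);
  // Risk score is the maximum severity score, or 0 with no warnings.
  const riskScore = sorted.length > 0 ? SEVERITY_SCORE[sorted[0].severity] : 0;
  const hasHigh = sorted.some((w) => w.severity === "high");
  const hasMedium = sorted.some((w) => w.severity === "medium");

  let verdict: Verdict = "go";
  if (strict && hasHigh) verdict = "block";
  else if (hasHigh || hasMedium) verdict = "caution";

  return { verdict, decision: VERDICT_TO_DECISION[verdict], riskScore, warnings: sorted };
}
```

The same high-severity warning therefore produces `block` under strict mode and `warn` otherwise, and the evidence ID on the first sorted warning identifies what drove the score.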
+## Reflex Generation
+Reflexes are the host-readable form of preflight warnings. `buildReflexReport()` calls `buildPreflight()` and maps each warning into a `MemoryReflex` with a stable hash ID, trigger text, response type, severity, source warning type, response, reason, evidence ID, action, tool, and working directory (Ledger: E11).
+The response type is derived from decision and warning semantics. A high-severity warning under a blocking preflight becomes `block`; an informational procedure becomes `guide`; other warnings become `warn` (Ledger: E11). The report returns the preflight decision, risk score, summary, reflexes, evidence IDs, recommended actions, and optionally the embedded preflight (Ledger: E11).
+This layer separates the controller's enforcement object from user-facing guidance. The host can enforce the top-level decision while still showing a concise list of trigger-response reflexes that explain what memory changed.
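The mapping from a warning to a reflex response type can be sketched in a few lines. The warning type name `procedure` and the severity label `info` are assumptions used to stand in for "an informational procedure"; the function name and input shapes are also hypothetical.

```typescript
type ResponseType = "block" | "guide" | "warn";

interface WarningLike { type: string; severity: string; }

// Derive the reflex response type from the preflight verdict and the warning.
function reflexResponseType(
  warning: WarningLike,
  preflightVerdict: "go" | "caution" | "block"
): ResponseType {
  // High-severity warning under a blocking preflight: the reflex blocks.
  if (preflightVerdict === "block" && warning.severity === "high") return "block";
  // Informational procedure: the reflex guides rather than warns.
  if (warning.type === "procedure" && warning.severity === "info") return "guide";
  // Everything else is surfaced as a warning.
  return "warn";
}
```

Note that a high-severity warning under a non-blocking preflight falls through to `warn`, which matches the non-strict behavior of the decision rule.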
+## Action Identity Hashing
+Repeated-failure prevention requires stable action identity. Audrey's action key is a SHA-256 hash over:
+- lower-cased tool name;
+- redacted command or action text normalized for whitespace and case;
+- normalized working directory;
+- sorted normalized file paths.
+The implementation resolves real paths when available, removes Windows extended path prefixes, normalizes slashes, lowercases paths on Windows, and sorts the file set before hashing (Ledger: E3). The repeated-failure matcher then scans failed tool events for the same tool and agent and checks whether event metadata contains the same `audrey_guard_action_key` (Ledger: E3).
+This design avoids two failure modes. Raw command matching leaks secrets and treats path spelling differences as different actions. Pure semantic matching catches near neighbors but cannot prove that the exact failed operation is being repeated. Audrey uses a redacted deterministic key for the exact-repeat case and leaves broader similarity risks to capsule/preflight.
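A minimal sketch of such a key, using Node's built-in `crypto` module, is shown below. It assumes the command text has already been redacted, skips real-path resolution to stay deterministic, and serializes the four components with `JSON.stringify`; the serialization format, the `ActionIdentity` shape, and both function names are hypothetical rather than Audrey's actual code.

```typescript
import { createHash } from "node:crypto";

interface ActionIdentity {
  tool: string;
  command: string; // assumed to be redacted already
  cwd: string;
  files: string[];
}

// Normalize a path: strip the Windows extended prefix (\\?\), unify slashes,
// and lowercase on Windows. Real-path resolution is omitted in this sketch.
function normalizePath(p: string, isWindows: boolean): string {
  const stripped = p.replace(/^\\\\\?\\/, "");
  const slashed = stripped.replace(/\\/g, "/");
  return isWindows ? slashed.toLowerCase() : slashed;
}

// Build a SHA-256 action key from the four normalized components.
function actionKey(a: ActionIdentity, isWindows: boolean = false): string {
  const parts = [
    a.tool.toLowerCase(),
    a.command.replace(/\s+/g, " ").trim().toLowerCase(), // whitespace and case
    normalizePath(a.cwd, isWindows),
    a.files.map((f) => normalizePath(f, isWindows)).sort().join("\n"),
  ];
  return createHash("sha256").update(JSON.stringify(parts)).digest("hex");
}
```

Two proposals that differ only in tool-name case, command spacing, or file order hash to the same key, while any change to the command text or working directory produces a different key.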
+## Redaction Discipline
+Tool traces are both valuable and dangerous. The trace layer states an explicit contract: raw tool input, output, and error text do not leave `tool-trace.ts` without redaction (Ledger: E12). By default, tool tracing stores hashes and summaries rather than full payloads. When callers opt into retained details, those details still pass through JSON redaction before persistence (Ledger: E12-E13).
+The redaction layer is rule-based and conservative. It covers provider keys, GitHub and Slack tokens, Stripe keys, bearer/basic auth, private key blocks, JWTs, URL credentials, password and secret assignments, credit cards, CVV, US SSNs, signed URL signatures, session cookies, high-entropy secrets, and sensitive JSON keys (Ledger: E13). Truncation preserves redaction markers so an audit trail still records what class of secret was removed even when the surrounding text is shortened (Ledger: E13).
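To illustrate what rule-based redaction with class markers looks like, here is a small sketch covering four of the listed classes. The patterns, the `[REDACTED:<class>]` marker format, and the rule names are assumptions made for this example; they are not Audrey's actual rules, and marker-preserving truncation is not shown.

```typescript
// Hypothetical rule table: each rule names a secret class and a pattern.
interface RedactionRule {
  label: string;
  pattern: RegExp;
}

const RULES: RedactionRule[] = [
  { label: "github_token", pattern: /\bghp_[A-Za-z0-9]{36}\b/g },
  { label: "bearer", pattern: /\bBearer\s+[A-Za-z0-9._~+\/-]+=*/g },
  { label: "password", pattern: /\b(password|secret)\s*[=:]\s*\S+/gi },
  { label: "us_ssn", pattern: /\b\d{3}-\d{2}-\d{4}\b/g },
];

// Apply every rule in order, replacing each match with a marker that
// records which class of secret was removed.
function redact(text: string): string {
  return RULES.reduce(
    (acc, rule) => acc.replace(rule.pattern, `[REDACTED:${rule.label}]`),
    text,
  );
}
```

Because the marker carries the class label, an audit reader can still see that a bearer token or password was present without recovering its value.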
+The trace layer also computes file fingerprints from files under the current working directory, capped at 50 files, rather than storing raw file contents (Ledger: E12). This gives preflight evidence enough identity to connect a future action with a prior failure without turning the memory log into an uncontrolled data sink.
+## Recall and Degradation Handling
+Audrey recall uses hybrid retrieval: vector KNN and FTS5 BM25 are fused with reciprocal rank fusion, using `RRF_K = 60`, vector weight `0.3`, FTS weight `0.7`, and mode-specific behavior for vector, keyword, and hybrid retrieval (Ledger: E14). The paper should treat these weights as implementation choices, not theoretical claims.
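For readers unfamiliar with reciprocal rank fusion, the sketch below shows one common weighted form, in which each list contributes `weight / (RRF_K + rank)` per item. Only the three constants come from the text above; the exact formula, the function name, and the ID-list inputs are assumptions and may differ from Audrey's implementation.

```typescript
const RRF_K = 60;
const VECTOR_WEIGHT = 0.3;
const FTS_WEIGHT = 0.7;

interface FusedHit {
  id: string;
  score: number;
}

// Inputs are memory IDs ordered best-first by each retriever.
function fuseRankings(vectorIds: string[], ftsIds: string[]): FusedHit[] {
  const scores = new Map<string, number>();
  const accumulate = (ids: string[], weight: number): void => {
    ids.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + weight / (RRF_K + rank));
    });
  };
  accumulate(vectorIds, VECTOR_WEIGHT);
  accumulate(ftsIds, FTS_WEIGHT);
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}
```

Under this form, an item ranked by both retrievers accumulates two contributions, and the larger FTS weight means a top keyword hit outranks a top vector-only hit.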
+More important for Guard is recall degradation. Audrey's recall result type carries `partialFailure` and `errors`; memory status exposes `recall_degraded` and `last_recall_errors`; capsules preserve recall errors; preflight turns capsule recall errors into high-severity memory-health warnings with repair guidance (Ledger: E7, E9, E15). In strict guard mode, this becomes a block because high-severity warnings block (Ledger: E10).
+This behavior is a core distinction between recall as context and memory as control. A chat assistant can degrade gracefully by answering from partial recall. A pre-action controller that cannot inspect part of memory should not report an action as clear, as if recall had been complete.
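The degradation path can be sketched as two small steps: turn recall errors into high-severity memory-health warnings, then let strict mode block on any high-severity warning. The `partialFailure` and `errors` fields come from the description above; the warning shape, the `memory_health` label, the repair text, and the non-strict fallback to `warn` are assumptions of this sketch.

```typescript
interface RecallStatus {
  partialFailure: boolean;
  errors: string[];
}

// Hypothetical warning shape for illustration.
interface HealthWarning {
  type: "memory_health";
  severity: "high";
  message: string;
  repair: string;
}

type GuardDecision = "allow" | "warn" | "block";

// Every recall error becomes a high-severity memory-health warning.
function healthWarnings(recall: RecallStatus): HealthWarning[] {
  if (!recall.partialFailure && recall.errors.length === 0) return [];
  const messages = recall.errors.length > 0 ? recall.errors : ["recall returned partial results"];
  return messages.map((message) => ({
    type: "memory_health",
    severity: "high",
    message,
    repair: "inspect memory status and re-run preflight after repair", // assumed guidance text
  }));
}

// Strict mode blocks on any high-severity warning; otherwise warn (assumed).
function decide(warnings: HealthWarning[], strict: boolean): GuardDecision {
  if (warnings.length === 0) return "allow";
  return strict && warnings.some((w) => w.severity === "high") ? "block" : "warn";
}
```

The point of the sketch is that a degraded recall never maps to `allow`: the absence of matching memories is only trusted when recall completed without errors.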
+## Closed-Loop Validation
+Audrey includes explicit post-hoc validation. `memory_validate` accepts `used`, `helpful`, or `wrong`; the implementation updates salience, usage count, retrieval count, challenge count, and last-use/reinforcement fields according to memory type (Ledger: E16). Impact reporting then aggregates totals, validation windows, semantic challenge counts, validation outcomes from audit events, top-used memories, weakest memories, and recent activity (Ledger: E17).
+Guard validation can also bind the feedback event to the exact `preflight_event_id`, evidence set, and action fingerprint that surfaced a memory. This keeps post-hoc feedback attached to the pre-action decision it is judging instead of treating validation as an unscoped memory tap (Ledger: E44).
+The controller uses validation in the repeated-failure demo: after Guard blocks the repeat action, the demo validates the operational lesson as helpful and reports impact counts (Ledger: E25). This is qualitative evidence, not a benchmark score. Its role in the paper is to show the end-to-end control loop: failure, memory write, preflight block, evidence, validation.
+## Interfaces
+Audrey exposes Guard through both MCP and CLI surfaces. MCP registers the lower-level tools needed to assemble the loop: `memory_observe_tool`, `memory_capsule`, `memory_preflight`, `memory_reflexes`, `memory_validate`, and `memory_dream` (Ledger: E18-E19). The CLI exposes `audrey guard`, which parses tool/action/file/cwd/session options, runs `MemoryController.beforeAction()`, prints JSON or formatted output, and exits with code `2` on block or fail-on-warn unless overridden (Ledger: E26).
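The exit-code behavior lends itself to a short sketch. The option names below, including the `noExitCode` override, are hypothetical stand-ins for whatever flags `audrey guard` actually accepts, and the success code `0` is an assumption; only the code `2` on block or fail-on-warn comes from the description above.

```typescript
type CliDecision = "allow" | "warn" | "block";

// Hypothetical option names; the real CLI flags may differ.
interface GuardCliOptions {
  failOnWarn: boolean;
  noExitCode: boolean; // assumed override that suppresses the non-zero exit
}

function guardExitCode(decision: CliDecision, opts: GuardCliOptions): number {
  if (opts.noExitCode) return 0;
  if (decision === "block") return 2;
  if (decision === "warn" && opts.failOnWarn) return 2;
  return 0;
}
```

A distinct non-zero code lets shell hooks and CI steps tell a Guard block apart from an ordinary tool crash.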
+The deterministic demo, `audrey demo --scenario repeated-failure`, constructs a temporary mock-provider store, records a failed deploy, encodes the required remediation as a must-follow memory, reruns preflight on the same action, validates the lesson, and reports whether a repeated failure was prevented (Ledger: E25). This demo is the right Stage-A qualitative figure because it exercises the implemented controller path without external API keys or hosted services.
+## Existing Empirical Hooks
+The current paper version has two implemented empirical anchors. First, `benchmarks/snapshots/perf-0.22.2.json` reports canonical local performance under the mock-provider methodology: generated on 2026-05-01 from git SHA `e2e821b`, using mock 64-dimensional in-process embeddings, hybrid recall limit 5, and corpus sizes 100, 1,000, and 5,000 on Node 25.5.0 with a Ryzen 9 7900X3D (24 logical cores) and 62.9 GB RAM (Ledger: E20). Under that methodology, hybrid recall p95 is 1.82 ms, 2.364 ms, and 3.417 ms for those three sizes, and encode p95 is 0.589 ms, 2.147 ms, and 1.838 ms (Ledger: E21-E22).
+Second, `bench:memory:check` is wired into the release gate and enforces retrieval/lifecycle benchmark guardrails against weak local baselines (Ledger: E23). The current checked-in output reports a 2026-05-08 mock-provider run in which Audrey scores 100% with a 100% pass rate, while the strongest listed local baselines score 41.67% with a 25% pass rate in that output (Ledger: E24). These numbers keep the regression gate honest; they do not replace GuardBench.
+The README benchmark table currently differs from the canonical JSON snapshot, so the paper quotes only the JSON snapshot and tracks the README correction as a follow-up (Ledger: E28).
+## Design Consequence
+The central design consequence is that memory is not treated as passive context. It becomes a control signal with a lifecycle:
+```text
+observe -> redact -> remember -> retrieve -> capsule -> preflight -> reflex -> allow/warn/block -> validate
+```
+This lifecycle is the paper's contribution. The Stage-A paper should present GuardBench as the evaluation specification for this lifecycle, while reporting only the implemented Audrey artifacts and current Audrey-only measurements that already exist in the repository.