npm - haechi - Versions diffs - 1.2.0 → 1.3.0 - Mend

haechi 1.2.0 → 1.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (30)

package/README.ko.md +46 -11
package/README.md +46 -11
package/docs/current/config-version.ko.md +2 -2
package/docs/current/config-version.md +2 -2
package/docs/current/configuration.ko.md +26 -10
package/docs/current/configuration.md +26 -10
package/docs/current/operations-runbook.ko.md +36 -2
package/docs/current/operations-runbook.md +39 -2
package/docs/current/release-process.ko.md +5 -1
package/docs/current/release-process.md +5 -1
package/docs/current/risk-register-release-gate.ko.md +4 -3
package/docs/current/risk-register-release-gate.md +4 -3
package/docs/current/shared-responsibility.ko.md +2 -2
package/docs/current/shared-responsibility.md +2 -2
package/docs/current/threat-model.ko.md +4 -3
package/docs/current/threat-model.md +4 -3
package/examples/local-proxy-demo/README.md +51 -0
package/examples/local-proxy-demo/demo.mjs +144 -0
package/examples/local-proxy-demo/demo.tape +19 -0
package/examples/local-proxy-demo/live-demo.mjs +121 -0
package/examples/local-proxy-demo/live-demo.tape +25 -0
package/haechi.config.example.json +2 -1
package/package.json +3 -1
package/packages/cli/bin/haechi.mjs +3 -2
package/packages/cli/runtime.mjs +12 -1
package/packages/filter/index.mjs +679 -6
package/packages/privacy-profiles/index.mjs +72 -3
package/packages/protocol-adapters/index.mjs +99 -1
package/packages/proxy/index.mjs +7 -1
package/packages/stream-filter/index.mjs +69 -7

package/docs/current/release-process.ko.md CHANGED

@@ -1,6 +1,6 @@
 # Haechi Release Process
-- 문서 상태: Living document (core 1.2.x 추적)
+- 문서 상태: Living document (core 1.3.x 추적)
 - 작성일: 2026-06-10
 ## 1. 로컬 릴리즈 검증
@@ -70,6 +70,7 @@ npm audit signatures
 | `.github/workflows/auth-jwt-publish.yml` | `haechi-auth-jwt` | `auth-jwt-v<semver>` | satellite publish, 동일한 서명 아티팩트 경로 |
 | `.github/workflows/dashboard-publish.yml` | `haechi-dashboard` | `dashboard-v<semver>` | satellite publish, 동일한 서명 아티팩트 경로 |
 | `.github/workflows/auth-oidc-publish.yml` | `haechi-auth-oidc` | `auth-oidc-v<semver>` | satellite publish, 동일한 서명 아티팩트 경로 |
+| `.github/workflows/ratelimit-redis-publish.yml` | `haechi-ratelimit-redis` | `ratelimit-redis-v<semver>` | satellite publish, 동일한 서명 아티팩트 경로 |
 각 publish 워크플로는 `release: published`에서 트리거되지만 **가드**되어 둘이 교차 발화하지 않습니다. core job은 `v`로 시작하는 태그에서만 실행되고(그리고 `^v[0-9]+\.[0-9]+\.[0-9]+$` 재검증), satellite job은 `crypto-kms-v…`에서만 실행됩니다(그리고 `^crypto-kms-v[0-9]+\.[0-9]+\.[0-9]+$` 재검증 **및** 태그 버전이 satellite `package.json` 버전과 일치하는지 검증). npmjs.com Trusted Publisher는 각 패키지의 **특정 워크플로 파일명**에 바인딩됩니다 — 워크플로 파일 rename은 npm 설정을 갱신할 때까지 OIDC publish를 깨뜨립니다.
@@ -92,6 +93,7 @@ Satellite는 npm workspaces 모노레포의 `satellites/*`에 살며 core와 **
 | `haechi-auth-jwt` | `auth-jwt-v<semver>` | `auth-jwt-publish.yml` | `satellites/auth-jwt/package.json` |
 | `haechi-dashboard` | `dashboard-v<semver>` | `dashboard-publish.yml` | `satellites/dashboard/package.json` |
 | `haechi-auth-oidc` | `auth-oidc-v<semver>` | `auth-oidc-publish.yml` | `satellites/auth-oidc/package.json` |
+| `haechi-ratelimit-redis` | `ratelimit-redis-v<semver>` | `ratelimit-redis-publish.yml` | `satellites/ratelimit-redis/package.json` |
 **satellite 릴리스 검증** (core와 동일한 신뢰 앵커):
@@ -104,6 +106,8 @@ npm view haechi-crypto-kms --json   # dist.attestations 존재 확인; access "p
 **0.9 satellite(새 unscoped 이름 — 첫 태그 *전에* Trusted Publisher 설정):** `haechi-dashboard`와 `haechi-auth-oidc`는 0.9에서 첫 발행되며 위의 satellite별 부트스트랩 순서를 동일하게 따릅니다. 0.8 satellite와 마찬가지로 unscoped 이름은 첫 OIDC publish 시 확보되므로, 각각의 npmjs.com Trusted Publisher를 첫 태그 **전에** 설정해야 합니다 — `raeseoklee/haechi` 저장소와 정확한 워크플로 파일명(`haechi-dashboard`는 `dashboard-publish.yml`, `haechi-auth-oidc`는 `auth-oidc-publish.yml`)을 연결한 뒤, 접두사 태그(`dashboard-v0.1.0`, `auth-oidc-v0.1.0`)를 push하고 GitHub Release를 발행합니다. 기존 두 satellite는 이미 부트스트랩된 태그/워크플로를 그대로 사용합니다: `haechi-auth-jwt@0.2.0`은 `auth-jwt-v<semver>`(`auth-jwt-publish.yml`), `haechi-crypto-kms@0.2.0`은 `crypto-kms-v<semver>`(`crypto-kms-publish.yml`) — 이 둘은 새 Trusted Publisher 설정이 필요 없습니다.
+**`haechi-ratelimit-redis`(새 unscoped 이름 — 첫 태그 *전에* Trusted Publisher 설정):** 공유 저장소 rate-limiter satellite는 고유의 `ratelimit-redis-v<semver>` 태그에서 첫 발행되며 위의 satellite별 부트스트랩 순서를 동일하게 따릅니다. unscoped 이름은 첫 OIDC publish 시 확보되므로, npmjs.com Trusted Publisher를 첫 태그 **전에** 설정해야 합니다 — `raeseoklee/haechi` 저장소와 정확한 워크플로 파일명 `ratelimit-redis-publish.yml`을 연결한 뒤, 접두사 태그(`ratelimit-redis-v0.1.0`)를 push하고 GitHub Release를 발행합니다. `redis` 클라이언트는 **optional peer dependency**이며 번들된 Redis 어댑터를 쓰는 소비자만 import합니다(store/client는 주입됩니다). 따라서 core는 zero-dependency로 유지됩니다.
 ## 6. 배포 차단 조건
 다음 중 하나라도 실패하면 npm publish를 하지 않습니다.

package/docs/current/release-process.md CHANGED

@@ -1,6 +1,6 @@
 # Haechi Release Process
-- Status: Living document (tracks core 1.2.x)
+- Status: Living document (tracks core 1.3.x)
 - Date: 2026-06-10
 ## 1. Local Release Verification
@@ -70,6 +70,7 @@ npm audit signatures
 | `.github/workflows/auth-jwt-publish.yml` | `haechi-auth-jwt` | `auth-jwt-v<semver>` | satellite publish, same signed-artifacts path |
 | `.github/workflows/dashboard-publish.yml` | `haechi-dashboard` | `dashboard-v<semver>` | satellite publish, same signed-artifacts path |
 | `.github/workflows/auth-oidc-publish.yml` | `haechi-auth-oidc` | `auth-oidc-v<semver>` | satellite publish, same signed-artifacts path |
+| `.github/workflows/ratelimit-redis-publish.yml` | `haechi-ratelimit-redis` | `ratelimit-redis-v<semver>` | satellite publish, same signed-artifacts path |
 Each publish workflow triggers on `release: published` but is **guarded** so the two never cross-fire: the core job runs only for tags starting `v` (and re-validates `^v[0-9]+\.[0-9]+\.[0-9]+$`); the satellite job runs only for `crypto-kms-v…` (and re-validates `^crypto-kms-v[0-9]+\.[0-9]+\.[0-9]+$` **and** that the tag version equals the satellite's `package.json` version). The npmjs.com Trusted Publisher for each package is bound to its **specific workflow filename** — renaming a workflow file breaks its OIDC publish until the npm config is updated.
@@ -92,6 +93,7 @@ No manual `npm publish` from a laptop is needed. Because the names are unscoped
 | `haechi-auth-jwt` | `auth-jwt-v<semver>` | `auth-jwt-publish.yml` | `satellites/auth-jwt/package.json` |
 | `haechi-dashboard` | `dashboard-v<semver>` | `dashboard-publish.yml` | `satellites/dashboard/package.json` |
 | `haechi-auth-oidc` | `auth-oidc-v<semver>` | `auth-oidc-publish.yml` | `satellites/auth-oidc/package.json` |
+| `haechi-ratelimit-redis` | `ratelimit-redis-v<semver>` | `ratelimit-redis-publish.yml` | `satellites/ratelimit-redis/package.json` |
 **Verify a satellite release** (same anchors as core):
@@ -104,6 +106,8 @@ npm view haechi-crypto-kms --json   # dist.attestations present; access "public"
 **0.9 satellites (new unscoped names — configure Trusted Publisher *before* the first tag):** `haechi-dashboard` and `haechi-auth-oidc` are first-published in 0.9 and follow the same per-satellite bootstrap order above. As with the 0.8 satellites, the unscoped name is claimed on first OIDC publish, so the npmjs.com Trusted Publisher for each must be configured **before** its first tag — link `raeseoklee/haechi` and the exact workflow filename (`dashboard-publish.yml` for `haechi-dashboard`, `auth-oidc-publish.yml` for `haechi-auth-oidc`), then push the prefixed tag (`dashboard-v0.1.0`, `auth-oidc-v0.1.0`) and publish the GitHub Release. The two existing satellites ride their already-bootstrapped tags/workflows: `haechi-auth-jwt@0.2.0` on `auth-jwt-v<semver>` (`auth-jwt-publish.yml`) and `haechi-crypto-kms@0.2.0` on `crypto-kms-v<semver>` (`crypto-kms-publish.yml`) — no new Trusted Publisher configuration is required for those two.
+**`haechi-ratelimit-redis` (new unscoped name — configure Trusted Publisher *before* the first tag):** the shared-store rate-limiter satellite is first-published from its own `ratelimit-redis-v<semver>` tag and follows the same per-satellite bootstrap order above. The unscoped name is claimed on its first OIDC publish, so its npmjs.com Trusted Publisher must be configured **before** its first tag — link `raeseoklee/haechi` and the exact workflow filename `ratelimit-redis-publish.yml`, then push the prefixed tag (`ratelimit-redis-v0.1.0`) and publish the GitHub Release. The `redis` client is an **optional peer dependency**, imported only by consumers using the bundled Redis adapter (the store/client is injected), so core stays zero-dependency.
 ## 6. Deployment block conditions
 npm publish is not performed if any of the following fail.
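
A note on the publish guard described earlier in this file's diff (a prefixed tag re-validated against `^<prefix>-v[0-9]+\.[0-9]+\.[0-9]+$` and checked against the satellite's `package.json` version): the check can be sketched roughly as below. This is a minimal illustration, not the workflow's actual implementation; `checkSatelliteTag` and its return shape are hypothetical names introduced here.

```javascript
// Hypothetical helper (not from the Haechi repository): mirrors the documented
// guard "tag matches <prefix>-v<semver> AND equals the package.json version".
function checkSatelliteTag(tag, prefix, packageVersion) {
  // Escape regex metacharacters so the prefix is matched literally.
  const escaped = prefix.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const pattern = new RegExp(`^${escaped}-v([0-9]+\\.[0-9]+\\.[0-9]+)$`);
  const match = pattern.exec(tag);
  if (!match) {
    return { ok: false, reason: "tag does not match <prefix>-v<semver>" };
  }
  if (match[1] !== packageVersion) {
    return { ok: false, reason: "tag version differs from package.json version" };
  }
  return { ok: true, version: match[1] };
}

// Example: the first ratelimit-redis release tag named in the docs.
console.log(checkSatelliteTag("ratelimit-redis-v0.1.0", "ratelimit-redis", "0.1.0"));
```

Because the pattern is anchored at both ends, a core tag such as `v1.3.0` or a pre-release suffix fails the satellite check instead of cross-firing.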

package/docs/current/risk-register-release-gate.ko.md CHANGED

@@ -1,8 +1,8 @@
 # Haechi 리스크 레지스터 및 릴리스 게이트
-- 문서 상태: Living document(core 1.2.x 추적)
+- 문서 상태: Living document(core 1.3.x 추적)
 - 작성일: 2026-06-11
-- 기준 버전: 1.2.x
+- 기준 버전: 1.3.x
 - 기준 브랜치: `main`
 ## 1. 현재 판단
@@ -23,11 +23,12 @@ Haechi는 `1.x` stable 라인을 출시했습니다. developer preview 게이트
 | G0 | GitHub source 공개 | 테스트 통과, 보안 한계 문서화, 평문 audit leak 없음 | Pass |
 | G1 | GitHub pre-release | P0 코드 리스크 해결, production-ready 표현 없음 | Pass |
 | G2 | npm developer preview | P0 해결, preflight/SBOM/provenance 경로 준비, npm auth 확인 | Pass (`haechi@0.3.2` 2026-06-10 배포) |
-| G3 | npm stable | P1 운영 reference, stream-aware enforcement, API stability 강화 | Blocked |
+| G3 | npm stable | P1 운영 reference, stream-aware enforcement, API stability 강화 | Pass (1.0.0 stable 컷에서 달성 — streaming inspection은 0.5, API freeze는 1.0.0에서 출시; G5 참조. G5–G7로 대체됨.) |
 | G4 | 0.9.0 observability + interactive-auth 위성 컷 | P1-SEC-026 / P1-OPS-009 mitigated 및 P2-CRYPTO-001 accepted; `haechi-dashboard` + `haechi-auth-oidc` + `haechi-crypto-kms@0.2.0` 테스트 통과; 위성 tarball zero-dep; core 0.9.0 bump(추가적 FORBIDDEN_KEYS audit 강화만) | Pass |
 | G5 | 1.0.0 stable API contract + signed-plugin sandbox | P1-SEC-024 / P1-SEC-025 mitigated, P2-API-001 / P2-OPS-006 resolved; API freeze + deprecation policy + `tests/api-contract.test.mjs` 통과; Ed25519 signed-plugin contract + `assertAuthProviderConformance` + worker-isolated `authProvider` sandbox 테스트 통과; PR0 위성 peer-range를 `>=0.8.0 <2.0.0`로 확대 및 `check-satellite-peer-ranges.mjs` preflight 게이트 통과; core는 zero runtime dependency 유지; core 1.0.0 bump | Pass |
 | G6 | 1.1.0 plugin capability 강제 (`process-isolated`) | P1-SEC-027 / P1-SEC-028 mitigated; `process-isolated` 런타임(`--permission` 하 자식, 부여 0, `data:` URL 로드, stdio 무시, JSON-string IPC) + fail-closed `--allow-net` 기능 탐지(`netEnforcement:"require-permission"`) + 코어 `haechi/ssrf` 가드 + 호스트 중개 키 자료 + spawn-storm 서킷 브레이커; fs/net/stdio 레드팀 + SSRF + config 테스트 통과(행동 스위트는 `--allow-net` Node에서 실행, 아니면 fail-closed로 skip); API freeze 통과 유지(additive `./ssrf` export + additive config 키); core는 zero runtime dependency 유지; core 1.1.0 bump(additive + opt-in 마이너) | Pass |
 | G7 | 1.2.0 신뢰성 강화 트랙 (WS1–WS6) | 탐지 품질 측정+강화(WS2: 라벨 코퍼스 precision/recall `bench:detection` 게이트, 자격증명+국제 PII 커버리지, 하드블록 타입 불변식이 적용된 `filters.minConfidence` / `filters.allowlist`, offset 무결성을 갖춘 NFKC 유니코드 회피 폴딩); WS3 주입 가능한 `rateLimiter` 시임 + bounded fixed-window map; WS4 운영성(`/__haechi/live`+`/ready` 분리, 주입 가능한 `/metrics`, 구조적 로그 + 요청별 `correlationId`, graceful drain, max-in-flight backpressure, env overlay, 하드닝 Dockerfile/compose/runbook, `configVersion`); WS6 proxy TLS / remote-bind 하드닝(`proxy.tls` / `proxy.trustForwardedProto`, fail-closed `assertSafeProxyTransport`) + OWASP-LLM/NIST 컨트롤 매핑 백서 + RFC 9116 `security.txt` + 취약점 공개 경로. 모든 변경은 1.1 동작을 보존하는 기본값 뒤의 additive(`tests/api-contract.test.mjs` 통과); no-plaintext-in-audit 불변식이 텔레메트리까지 확장; core는 zero runtime dependency 유지; core 1.2.0 bump(additive 마이너) | Pass |
+| G8 | 1.3.0 백엔드 + 탐지 커버리지 확장 | **Anthropic Messages API**(`/v1/messages`, content-block + SSE `delta.text`, `event:` 라인 보존 재직렬화)와 **Google Gemini API**(model-in-path `:generateContent`/`:streamGenerateContent`, 기존 정확-매칭 어댑터를 바이트 동일하게 두는 additive `:method`-suffix 라우트 매처) 프로토콜 어댑터 추가; 탐지 커버리지 확장 — 클라우드/SaaS provider 키(OpenAI/Anthropic/Google-OAuth/SendGrid/Twilio/npm/Azure, anchored)와 국제 PII(FR/ES/JP + IT/SG/IN/DE/NL 국가 ID, 체크섬 validator), 각 하드블록-대-dial-eligible 결정은 측정된 충돌률 기반(하드블록은 비숫자 앵커 또는 비현실적으로 드문 형태가 필요; 흔한 길이의 bare-digit run은 allowlist로 정리 가능 유지); `bench:throughput` proxy 부하 벤치; `haechi-ratelimit-redis` 공유 저장소 rate-limiter 위성(WS3 시임의 운영 소비자; proxy가 이제 `rateLimiter.allow`를 `await`); `haechi-dashboard`가 요청별 `correlationId` 노출. 모든 변경은 additive — 새 `target.type`/탐지타입/`privacy.profile` *값*이며 새 config 키가 아님(`configVersion`은 `1` 유지); `tests/api-contract.test.mjs` 통과; core는 zero runtime dependency 유지; core 1.3.0 bump(additive 마이너) | Pass |
 ## 3. P0 배포 차단 리스크 상태

package/docs/current/risk-register-release-gate.md CHANGED

@@ -1,8 +1,8 @@
 # Haechi Risk Register and Release Gates
-- Status: Living document (tracks core 1.2.x)
+- Status: Living document (tracks core 1.3.x)
 - Date: 2026-06-11
-- Target version: 1.2.x
+- Target version: 1.3.x
 - Branch: `main`
 ## 1. Current Assessment
@@ -23,11 +23,12 @@ Haechi has shipped its `1.x` stable line. The developer-preview gate (G2, `haech
 | G0 | GitHub source publication | Tests pass, security limitations documented, no plaintext audit leak | Pass |
 | G1 | GitHub pre-release | P0 code risks resolved, no production-ready language | Pass |
 | G2 | npm developer preview | P0 resolved, preflight/SBOM/provenance paths ready, npm auth confirmed | Pass (`haechi@0.3.2` published 2026-06-10) |
-| G3 | npm stable | P1 production reference, stream-aware enforcement, API stability hardened | Blocked |
+| G3 | npm stable | P1 production reference, stream-aware enforcement, API stability hardened | Pass (achieved at the 1.0.0 stable cut — streaming inspection shipped in 0.5, the API freeze in 1.0.0; see G5. Superseded by G5–G7.) |
 | G4 | 0.9.0 observability + interactive-auth satellite cut | P1-SEC-026 / P1-OPS-009 mitigated and P2-CRYPTO-001 accepted; `haechi-dashboard` + `haechi-auth-oidc` + `haechi-crypto-kms@0.2.0` tests green; satellite tarballs zero-dep; core bumped to 0.9.0 (only an additive FORBIDDEN_KEYS audit hardening) | Pass |
 | G5 | 1.0.0 stable API contract + signed-plugin sandbox | P1-SEC-024 / P1-SEC-025 mitigated, P2-API-001 / P2-OPS-006 resolved; the API freeze + deprecation policy + `tests/api-contract.test.mjs` green; the Ed25519 signed-plugin contract + `assertAuthProviderConformance` + the worker-isolated `authProvider` sandbox tests green; PR0 satellite peer-ranges widened to `>=0.8.0 <2.0.0` and the `check-satellite-peer-ranges.mjs` preflight gate green; core stays zero runtime dependency; core bumped to 1.0.0 | Pass |
 | G6 | 1.1.0 plugin capability enforcement (`process-isolated`) | P1-SEC-027 / P1-SEC-028 mitigated; the `process-isolated` runtime (child under `--permission`, zero grants, `data:`-URL load, stdio-ignored, JSON-string IPC) + the fail-closed `--allow-net` feature detection (`netEnforcement:"require-permission"`) + the core `haechi/ssrf` guard + host-mediated key material + the spawn-storm circuit breaker; the fs/net/stdio red-team + SSRF + config tests green (the behavioral suite runs on a `--allow-net` Node and skips fail-closed otherwise); the API freeze stays green (additive `./ssrf` export + additive config keys); core stays zero runtime dependency; core bumped to 1.1.0 (additive + opt-in minor) | Pass |
 | G7 | 1.2.0 Reliability Hardening Track (WS1–WS6) | Detection quality measured + tightened (WS2: a labeled-corpus precision/recall `bench:detection` gate, credential + international-PII coverage, `filters.minConfidence` / `filters.allowlist` with the hard-block-types invariant, NFKC unicode-evasion folding with offset-integrity); WS3 injectable `rateLimiter` seam + bounded fixed-window map; WS4 operability (`/__haechi/live`+`/ready` split, injectable `/metrics`, structured logs + per-request `correlationId`, graceful drain, max-in-flight backpressure, env overlay, hardened Dockerfile/compose/runbook, `configVersion`); WS6 proxy TLS / remote-bind hardening (`proxy.tls` / `proxy.trustForwardedProto`, fail-closed `assertSafeProxyTransport`) + OWASP-LLM/NIST control-mapping whitepaper + RFC 9116 `security.txt` + vulnerability-disclosure path. Every change is additive behind 1.1-preserving defaults (`tests/api-contract.test.mjs` green); the no-plaintext-in-audit invariant extends to telemetry; core stays zero runtime dependency; core bumped to 1.2.0 (additive minor) | Pass |
+| G8 | 1.3.0 backend + detection coverage expansion | New protocol adapters for the **Anthropic Messages API** (`/v1/messages`, content-block + SSE `delta.text` with `event:`-line-preserving re-serialize) and the **Google Gemini API** (model-in-path `:generateContent`/`:streamGenerateContent` via an additive `:method`-suffix route matcher that leaves the exact-match adapters byte-identical); detection coverage expansion — cloud/SaaS provider keys (OpenAI/Anthropic/Google-OAuth/SendGrid/Twilio/npm/Azure, anchored) and international PII (FR/ES/JP + IT/SG/IN/DE/NL national IDs with checksum validators), each hard-block-vs-dial-eligible decision driven by measured collision rates (a non-numeric anchor or implausibly-rare shape is required for hard-block; a bare-digit run over a common length stays allowlist-clearable); a `bench:throughput` proxy load benchmark; the `haechi-ratelimit-redis` shared-store rate-limiter satellite (the WS3 seam's production consumer; the proxy now `await`s `rateLimiter.allow`); `haechi-dashboard` surfaces the per-request `correlationId`. Every change is additive — new `target.type`/detection-type/`privacy.profile` *values*, not new config keys (`configVersion` stays `1`); `tests/api-contract.test.mjs` green; core stays zero runtime dependency; core bumped to 1.3.0 (additive minor) | Pass |
 ## 3. P0 Distribution-Blocking Risk Status

package/docs/current/shared-responsibility.ko.md CHANGED

@@ -1,6 +1,6 @@
 # Haechi Shared Responsibility
-- 문서 상태: Living document (core 1.2.x 추적)
+- 문서 상태: Living document (core 1.3.x 추적)
 - 작성일: 2026-06-10
 ## 1. 책임 매트릭스
@@ -41,7 +41,7 @@
 Haechi의 상태 보유 통제는 설계상 단일 프로세스입니다. 로드밸런서 뒤에서 복제본을 2개 이상 실행하면, 운영자가 공유 인프라를 제공하지 않는 한 이들이 **무음으로 약화**됩니다.
-- **Rate limit**은 프로세스별·인메모리이므로 전체 처리량이 복제본 수만큼 배가됩니다. identity별 한도를 공유 front door에서 강제하거나, `createRuntime(config, { rateLimiter })`를 통해 공유 저장소 기반 `rateLimiter`를 주입하세요(이 시임은 `allow(key, limit)` 계약을 만족합니다. [`configuration.md` → Rate limiter 주입](./configuration.ko.md#rate-limiter-주입) 참고). 기본 프로세스별 limiter는 window map도 bounding하므로 identity 기준 무한 메모리 증가가 없습니다.
+- **Rate limit**은 프로세스별·인메모리이므로 전체 처리량이 복제본 수만큼 배가됩니다. identity별 한도를 공유 front door에서 강제하거나, `createRuntime(config, { rateLimiter })`를 통해 공유 저장소 기반 `rateLimiter`를 주입하세요(이 시임은 `allow(key, limit)` 계약을 만족하며, `boolean` 또는 `Promise<boolean>`을 반환할 수 있습니다. [`configuration.md` → Rate limiter 주입](./configuration.ko.md#rate-limiter-주입) 참고). [`haechi-ratelimit-redis`](https://github.com/raeseoklee/haechi/tree/main/satellites/ratelimit-redis) satellite가 레퍼런스 공유 저장소(Redis 기반) 구현입니다 — 주입된 클라이언트 위의 fixed-window 카운터입니다. 기본 프로세스별 limiter는 window map도 bounding하므로 identity 기준 무한 메모리 증가가 없습니다.
 - **Audit hash chain + anchor**는 단일 작성자입니다. 각 복제본에 **고유한** `audit.path`(및 anchor 경로)를 주세요. 하나의 audit 파일을 복제본 간에 공유하면 체인이 분기되어 검증 불가 상태가 됩니다.
 - **TokenVault와 auth store**는 whole-file 로컬 저장소입니다 — 단일 호스트에서는 올바르지만 공유 다중 작성자 저장소는 아닙니다. 다중 복제 토큰화에는 공유 `tokenVault`를 주입하세요.
 - 파일 락은 `O_EXCL` + atomic rename에 의존하며 NFS/공유 파일시스템에서는 보장되지 않습니다 — 이 저장소들은 로컬 디스크에 두세요.

package/docs/current/shared-responsibility.md CHANGED

@@ -1,6 +1,6 @@
 # Haechi Shared Responsibility
-- Status: Living document (tracks core 1.2.x)
+- Status: Living document (tracks core 1.3.x)
 - Date: 2026-06-10
 ## 1. Responsibility Matrix
@@ -41,7 +41,7 @@
 Haechi's stateful controls are single-process by design. Running 2+ replicas behind a load balancer **silently weakens** them unless the operator supplies shared infrastructure:
-- **Rate limit** is per-process and in-memory — total throughput multiplies by the replica count. Enforce a per-identity limit at a shared front door, or inject a shared-store `rateLimiter` via `createRuntime(config, { rateLimiter })` (the seam satisfies the `allow(key, limit)` contract; see [`configuration.md` → Rate limiter injection](./configuration.md#rate-limiter-injection)). The default per-process limiter also bounds its window map (no unbounded memory growth keyed by identity).
+- **Rate limit** is per-process and in-memory — total throughput multiplies by the replica count. Enforce a per-identity limit at a shared front door, or inject a shared-store `rateLimiter` via `createRuntime(config, { rateLimiter })` (the seam satisfies the `allow(key, limit)` contract, which may return `boolean` or `Promise<boolean>`; see [`configuration.md` → Rate limiter injection](./configuration.md#rate-limiter-injection)). The [`haechi-ratelimit-redis`](https://github.com/raeseoklee/haechi/tree/main/satellites/ratelimit-redis) satellite is the reference shared-store (Redis-backed) implementation — a fixed-window counter over an injected client. The default per-process limiter also bounds its window map (no unbounded memory growth keyed by identity).
 - **Audit hash chain + anchor** are single-writer. Give each replica its **own** `audit.path` (and anchor path); never share one audit file across replicas, or the chain forks into an unverifiable state.
 - **TokenVault and the auth store** are whole-file local stores — correct for one host, but not a shared multi-writer store. For multi-replica tokenization, inject a shared `tokenVault`.
 - File locking relies on `O_EXCL` + atomic rename, which do not hold on NFS / shared filesystems — keep these stores on local disk.
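
The `allow(key, limit)` contract in the rate-limit bullet above (returning `boolean` or `Promise<boolean>`) might be satisfied by a fixed-window counter over an injected client, roughly as follows. This is a sketch under assumptions, not the `haechi-ratelimit-redis` source: `createFixedWindowLimiter`, the `windowMs` option, the `rl:` key layout, and the `incr`/`expire` client methods are all illustrative, and `createMemoryClient` is an in-memory stand-in used only so the example runs without Redis.

```javascript
// Hypothetical factory: a fixed-window counter over an injected client.
// The client is assumed to expose async incr(key) and expire(key, seconds).
function createFixedWindowLimiter(client, { windowMs = 60_000, now = Date.now } = {}) {
  return {
    async allow(key, limit) {
      // All requests in the same window share one counter bucket.
      const windowId = Math.floor(now() / windowMs);
      const bucket = `rl:${key}:${windowId}`;
      const count = await client.incr(bucket);
      // The first hit in a window sets a TTL so stale buckets are dropped.
      if (count === 1) await client.expire(bucket, Math.ceil(windowMs / 1000));
      return count <= limit;
    },
  };
}

// In-memory stand-in for the injected client (illustration only; it is not
// shared across processes, so it does not solve the multi-replica problem).
function createMemoryClient() {
  const counts = new Map();
  return {
    async incr(key) {
      const next = (counts.get(key) ?? 0) + 1;
      counts.set(key, next);
      return next;
    },
    async expire() {
      return 1;
    },
  };
}

const rateLimiter = createFixedWindowLimiter(createMemoryClient(), { windowMs: 1000 });
rateLimiter.allow("alice", 2).then((allowed) => console.log("first request allowed:", allowed));
```

An object of this shape is what the text describes passing as `createRuntime(config, { rateLimiter })`; since `allow` here returns a promise, it relies on the proxy awaiting the result, as the 1.3.0 notes state.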

package/docs/current/threat-model.ko.md CHANGED

@@ -1,6 +1,6 @@
 # Haechi Threat Model
-- 문서 상태: Living document(core 1.2.x 추적)
+- 문서 상태: Living document(core 1.3.x 추적)
 - 작성일: 2026-06-10
 ## 1. 보호 대상
@@ -46,7 +46,8 @@ Haechi가 보호하려는 주요 자산은 다음과 같습니다.
 | 행 걸린 upstream | proxy 연결 고갈 | `limits.upstreamTimeoutMs` 기본 120s, 초과 시 504 fail |
 | signing/encryption 키 혼용 | key separation 위반 | policy bundle 서명 키를 domain-separated 파생 키로 분리 |
 | JSON number/object key 은닉 | 카드번호 등 비문자열 leaf 미탐지 | number leaf와 object key도 detection/transform 대상 |
-| 모든 정규식 규칙을 우회하는 유니코드 난독화 | card/RRN/phone/email/secret을 시각·의미상 동등한 비ASCII 유니코드 형태(전각 숫자 `４２４２…`, 전각 `＠`, 수학·원문자 영숫자)로 보내 모든 탐지 규칙을 무력화 | **매칭 전 각 string leaf의 NFKC 정규화**(WS2d)입니다. 정규화가 무변환인 경우(leaf의 약 99%) 탐지는 이전과 바이트 단위로 동일합니다. 접힘이 **위치 안정적**인 경우(모든 코드포인트가 같은 UTF-16 길이로 접히고 코드포인트별 접힘이 전체 정규화를 그대로 재구성) 정규화 사본에서 탐지하고 원본의 정확한 구간을 redact/block하며, 기록되는 값은 접힌 형태가 아니라 원본 바이트입니다. 그 외 — 길이가 달라지거나(수학 숫자·합자) 총 길이는 같지만 내부 offset을 이동시키는 수축+확장 보상 — 의 경우 offset을 원본에 매핑할 수 없으므로 탐지가 **fail closed**되어 leaf 전체를 덮는 단일 탐지로 처리합니다(leaf 전체 redact/block — 우회 시도를 과도 redact하는 것이 안전한 실패입니다). `String.prototype.normalize` 빌트인을 사용하므로 새 의존성은 없습니다. **수용된 잔여:** base64/percent-encoded 페이로드는 여전히 디코딩 후 재검사하지 않습니다(§4 참조) |
+| 모든 정규식 규칙을 우회하는 유니코드 난독화 | card/RRN/phone/email/secret을 시각·의미상 동등한 비ASCII 유니코드 형태(전각 숫자 `４２４２…`, 전각 `＠`, 수학·원문자 영숫자)로 보내 모든 탐지 규칙을 무력화 | **매칭 전 각 string leaf의 NFKC 정규화**(WS2d)입니다. 정규화가 무변환인 경우(leaf의 약 99%) 탐지는 이전과 바이트 단위로 동일합니다. 접힘이 **위치 안정적**인 경우(모든 코드포인트가 같은 UTF-16 길이로 접히고 코드포인트별 접힘이 전체 정규화를 그대로 재구성) 정규화 사본에서 탐지하고 원본의 정확한 구간을 redact/block하며, 기록되는 값은 접힌 형태가 아니라 원본 바이트입니다. 그 외 — 길이가 달라지거나(수학 숫자·합자) 총 길이는 같지만 내부 offset을 이동시키는 수축+확장 보상 — 의 경우 offset을 원본에 매핑할 수 없으므로 탐지가 **fail closed**되어 leaf 전체를 덮는 단일 탐지로 처리합니다(leaf 전체 redact/block — 우회 시도를 과도 redact하는 것이 안전한 실패입니다). `String.prototype.normalize` 빌트인을 사용하므로 새 의존성은 없습니다. **잔여는 이제 opt-in 통제입니다:** base64/percent-encoded 페이로드는 `filters.decodeAndRescan`이 활성화된 경우에만 디코딩 후 재검사합니다(다음 행 및 §4 참조) |
+| 모든 정규식 규칙을 우회하는 base64/percent-encoded 페이로드 | 전송 전 base64·percent로 인코딩된 card/RRN/secret은 모든 규칙을 통과합니다(Haechi는 NFKC 텍스트에서 매칭하지만 디코딩하지 않습니다) | **opt-in `filters.decodeAndRescan`**입니다(기본 OFF → 이전과 바이트 단위로 동일). ON일 때, 일반 NFKC 스캔 이후 base64/base64url로 **보이는** string leaf(고정 알파벳, 유효한 길이, `16…8192` 바이트 범위, 같은 leaf로 round-trip, `node:buffer` `isUtf8`로 **유효한 UTF-8** 디코딩)이거나 `%XX` 이스케이프를 포함하는 leaf(try/catch 안의 `decodeURIComponent`)를 디코딩하여 같은 규칙·validator로 재검사합니다. **offset 처리는 fail closed입니다:** 디코딩된 매칭은 인코딩된 leaf에 유효한 offset이 없으므로, 원본 인코딩 leaf 전체를 덮는 **WHOLE-LEAF** 탐지(`start:0, end:leaf.length`)를 발생시킵니다 — transform이 leaf 전체를 redact/block하며, 디코딩된 offset을 원본으로 되돌려 매핑하지 않습니다. **정밀도 가드:** 디코딩된 매칭은 **validator 기반이거나 하드 블록 타입**일 때만 발생합니다(Luhn 통과 `card`, 체크섬 `kr_rrn`/`us_ssn`, IBAN mod-97, 또는 앵커된 규칙의 `secret`/`api_key`). validator 없는 디코딩된 소프트 타입 매칭(맨 전화번호 형태 등)은 발생하지 **않으므로** 무작위 base64는 오탐하지 않습니다. 새 의존성은 없습니다(`node:buffer` Buffer + `decodeURIComponent` 빌트인). **수용된 잔여:** Haechi가 디코딩하지 않는 인코딩(gzip, hex, 중첩/이중 인코딩, 커스텀 알파벳), 그리고 양성 텍스트 안에 Luhn-유효 16자리 런으로 디코딩되도록 의도적으로 조작된 평문(이에 발생하는 것은 오탐이 아니라 올바른 동작) |
 | 인증 없는 멀티 클라이언트 접근 | 로컬 프로세스가 upstream / token round-trip 경로를 무단 사용 | 선택적 bearer auth (`auth.provider: bearer`); 없거나 잘못된 경우 → 바디 읽기 전 401; identity별 rate limit 및 model allowlist |
 | Audit tail truncation | 꼬리 audit 레코드의 무음 삭제 | 추가 전용/별도 미디어의 `audit.anchor` head-hash anchoring으로 마지막 anchor까지의 절단 탐지 (0.7) |
 | Local dev key in production | 소프트웨어 키의 운영 custody 오용 | `assertCryptoProviderConformance`를 통한 외부 `cryptoProvider` 주입; reference KMS adapter (envelope 암호화) |
@@ -87,7 +88,7 @@ Haechi는 다음을 보장하지 않습니다.
 - 법적 컴플라이언스 인증
 - 모델 hallucination, prompt injection 완전 방어
 - 외부 MCP server의 OAuth/resource binding 검증
-- base64/percent-encoded 값의 **디코딩 후** 검사 — Haechi는 NFKC 정규화 텍스트에서 매칭하지만(§3의 유니코드 난독화 행 참조) base64/URL 디코딩 후 재검사는 하지 **않습니다**. 전송 전 base64·percent로 인코딩된 값은 검사되지 않습니다. (WS2d는 디코딩-후-재검사 패스를 보류했습니다. 상시 디코딩은 오탐이 많고, recall-safe한 opt-in을 범위 내에서 precision-neutral하게 만들 수 없어 문서화된 제외로 남깁니다.)
+- base64/percent-encoded 값의 **기본** 디코딩 후 검사 — Haechi는 NFKC 정규화 텍스트에서 매칭하며(§3의 유니코드 난독화 행 참조) opt-in `filters.decodeAndRescan`(기본 OFF)을 활성화하지 않는 한 base64/URL 디코딩 후 재검사는 하지 **않습니다**. OFF이면 전송 전 base64·percent로 인코딩된 값은 검사되지 않습니다. ON이면 §3에 설명된 정밀도 가드(validator 기반 / 하드 블록 매칭만, WHOLE-LEAF fail-closed)와 함께 디코딩-후-재검사 패스가 동작합니다. WS2d는 *상시* 디코딩을 보류했고(오탐이 많고 범위 내에서 precision-neutral하지 않음), opt-in 통제는 트레이드오프를 수용하는 운영자를 위해 그 잔여를 닫습니다. 다른 인코딩(gzip/hex/중첩/커스텀 알파벳)은 여전히 범위 밖입니다.
 - URL query string 내 민감값 검사 (JSON body만 검사)
 - 마지막 anchor 이후의 audit tail truncation — `audit.anchor`(0.7)는 anchor가 추가 전용/별도 미디어에 있을 때 마지막 anchor까지의 레코드 삭제를 탐지합니다. 마지막 anchor 이후 기록된 레코드와 동일 파일시스템 anchor는 대상에서 제외됩니다
 - JSON-RPC batch 메시지 처리 (MCP stdio filter는 batch를 fail-closed로 거부)

package/docs/current/threat-model.md CHANGED

@@ -1,6 +1,6 @@
 # Haechi Threat Model
-- Status: Living document (tracks core 1.2.x)
+- Status: Living document (tracks core 1.3.x)
 - Date: 2026-06-10
 ## 1. Assets Under Protection
@@ -46,7 +46,8 @@ The primary assets Haechi protects are:
 | Hung upstream | Proxy connection exhaustion | `limits.upstreamTimeoutMs` default 120 s; 504 fail on timeout |
 | Signing/encryption key conflation | Key separation violation | Policy bundle signing key isolated as a domain-separated derived key |
 | JSON number / object key concealment | Undetected non-string leaves such as card numbers | Number leaves and object keys included in detection/transform scope |
-| Unicode-obfuscation evasion of every regex rule | A card/RRN/phone/email/secret sent in a visually/semantically equivalent non-ASCII Unicode form (full-width digits `４２４２…`, full-width `＠`, mathematical/enclosed alphanumerics) defeats every detection rule | **NFKC normalization of each string leaf before matching** (WS2d). When the normalization is a no-op (~99% of leaves) detection is byte-identical to before. When the fold is **position-stable** (every codepoint folds to the same UTF-16 length and the per-codepoint folds reconstruct the whole normalization), detection runs on the normalized copy and the exact original span is redacted/blocked (the recorded value is the original bytes, never the fold). Otherwise — a length change (mathematical digits/ligatures) **or** a compensating contraction+expansion that keeps the total length equal while shifting interior offsets — offsets cannot map back, so detection **fails closed** to a single whole-leaf detection (the entire leaf is redacted/blocked — over-redacting an evasion attempt is the safe failure). Uses the `String.prototype.normalize` builtin (no new dependency). **Accepted residual:** base64/percent-encoded payloads are still not decoded-and-rescanned (see §4) |
+| Unicode-obfuscation evasion of every regex rule | A card/RRN/phone/email/secret sent in a visually/semantically equivalent non-ASCII Unicode form (full-width digits `４２４２…`, full-width `＠`, mathematical/enclosed alphanumerics) defeats every detection rule | **NFKC normalization of each string leaf before matching** (WS2d). When the normalization is a no-op (~99% of leaves) detection is byte-identical to before. When the fold is **position-stable** (every codepoint folds to the same UTF-16 length and the per-codepoint folds reconstruct the whole normalization), detection runs on the normalized copy and the exact original span is redacted/blocked (the recorded value is the original bytes, never the fold). Otherwise — a length change (mathematical digits/ligatures) **or** a compensating contraction+expansion that keeps the total length equal while shifting interior offsets — offsets cannot map back, so detection **fails closed** to a single whole-leaf detection (the entire leaf is redacted/blocked — over-redacting an evasion attempt is the safe failure). Uses the `String.prototype.normalize` builtin (no new dependency). **Residual now an opt-in control:** base64/percent-encoded payloads are decoded-and-rescanned only when `filters.decodeAndRescan` is enabled (see the next row and §4) |
+| Base64/percent-encoded payload evades every regex rule | A card/RRN/secret base64- or percent-encoded before sending passes every rule (Haechi matches the NFKC text but does not decode) | **Opt-in `filters.decodeAndRescan`** (default OFF → byte-identical to before). When ON, after the normal NFKC scan a string leaf that LOOKS base64/base64url (anchored alphabet, valid length, within `16…8192` bytes, round-trips to the same leaf, decodes to **valid UTF-8** via `node:buffer` `isUtf8`) or contains a `%XX` escape (`decodeURIComponent` in try/catch) is decoded and rescanned with the same rules + validators. **Offset handling fails closed:** a decoded hit has no offset in the encoded leaf, so it emits a **WHOLE-LEAF** detection of the original encoded leaf (`start:0, end:leaf.length`) — the transform redacts/blocks the entire leaf; a decoded offset is never mapped back. **Precision guard:** a decoded hit only fires when it is **validator-backed or a hard-block type** (a Luhn-passing `card`, a checksum `kr_rrn`/`us_ssn`, an IBAN mod-97, or a `secret`/`api_key` on its anchored rule). A decoded soft-type-without-validator match (a bare phone-shaped run) does **not** fire, so random base64 does not false-positive. Zero new dependency (`node:buffer` Buffer + the `decodeURIComponent` builtin). **Accepted residual:** an encoding Haechi does not decode (gzip, hex, nested/double-encoding, a custom alphabet), and a deliberately contrived plaintext that decodes to a Luhn-valid 16-digit run inside benign text (firing on it is correct, not a false positive) |
 | Unauthenticated multi-client access | Any local process uses the upstream / token round-trip | Optional bearer auth (`auth.provider: bearer`); missing/invalid → 401 before body read; per-identity rate limit and model allowlist |
 | Audit tail truncation | Silent deletion of trailing audit records | `audit.anchor` head-hash anchoring on append-only/separate media detects truncation back to the last anchor (0.7) |
 | Local dev key in production | Software key misused as production custody | External `cryptoProvider` injection with `assertCryptoProviderConformance`; reference KMS adapter (envelope encryption) |
@@ -87,7 +88,7 @@ Haechi does not guarantee:
 - Legal compliance certification
 - Complete defense against model hallucination or prompt injection
 - OAuth/resource binding validation for external MCP servers
-- Inspection of base64/percent-encoded values **after decoding** — Haechi matches on the NFKC-normalized text (see the Unicode-evasion row in §3) but does **not** base64/URL-decode-and-rescan. A value that is base64- or percent-encoded before sending is not inspected. (WS2d deferred a decode-and-rescan pass: an always-on decode is false-positive-prone, and a recall-safe opt-in could not be made precision-neutral within scope; it remains a documented exclusion.)
+- Inspection of base64/percent-encoded values **after decoding by default**: Haechi matches on the NFKC-normalized text (see the Unicode-evasion row in §3) and does **not** base64/URL-decode-and-rescan unless the opt-in `filters.decodeAndRescan` is enabled (default OFF). With it OFF, a value that is base64- or percent-encoded before sending is not inspected. With it ON, the decode-and-rescan pass runs with the precision guard described in §3 (validator-backed / hard-block hits only, whole-leaf fail-closed). WS2d deferred an *always-on* decode (false-positive-prone, not precision-neutral within scope); the opt-in control closes that residual for operators who accept the trade-off, and other encodings (gzip/hex/nested/custom-alphabet) remain out of scope.
 - Detection of sensitive values in URL query strings (JSON body only)
 - Audit tail truncation beyond the last anchor — `audit.anchor` (0.7) detects deletion of records back to the last anchor when the anchor is on append-only/separate media; records written after the last anchor, and same-filesystem anchors, are not covered
 - JSON-RPC batch message processing (the MCP stdio filter rejects batches fail-closed)
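
The `filters.decodeAndRescan` row in the threat-model hunk above describes when a string leaf counts as a decode candidate. A minimal sketch of that candidate check follows, assuming the `16…8192` byte bound applies to the encoded leaf and that the standard and URL-safe alphabets are tried in turn. The names `tryDecodeBase64`, `tryDecodePercent`, and `wholeLeafDetection` are hypothetical and are not taken from `packages/filter/index.mjs`; only `Buffer`, `isUtf8`, and `decodeURIComponent` are named in the row itself.

```javascript
// isUtf8 is exported by node:buffer in recent Node.js releases (18.14+ / 19.4+).
import { Buffer, isUtf8 } from "node:buffer";

// Assumption: the 16..8192 byte window is measured on the encoded leaf.
const MIN_BYTES = 16;
const MAX_BYTES = 8192;

// Anchored alphabets plus a length rule for each encoding.
const CANDIDATES = [
  { encoding: "base64", shape: /^[A-Za-z0-9+/]+={0,2}$/, lengthOk: (n) => n % 4 === 0 },
  { encoding: "base64url", shape: /^[A-Za-z0-9_-]+$/, lengthOk: (n) => n % 4 !== 1 },
];

// Returns the decoded text, or null when the leaf does not look like base64.
function tryDecodeBase64(leaf) {
  if (typeof leaf !== "string") return null;
  const size = Buffer.byteLength(leaf, "utf8");
  if (size < MIN_BYTES || size > MAX_BYTES) return null;
  for (const { encoding, shape, lengthOk } of CANDIDATES) {
    if (!shape.test(leaf) || !lengthOk(leaf.length)) continue;
    const bytes = Buffer.from(leaf, encoding);
    // Round-trip check: re-encoding must reproduce the leaf exactly.
    if (bytes.toString(encoding) !== leaf) continue;
    // Only decodings that are valid UTF-8 are worth rescanning as text.
    if (!isUtf8(bytes)) continue;
    return bytes.toString("utf8");
  }
  return null;
}

// Returns the percent-decoded text, or null when there is no %XX escape
// or the escapes are malformed (decodeURIComponent throws URIError).
function tryDecodePercent(leaf) {
  if (typeof leaf !== "string" || !/%[0-9A-Fa-f]{2}/.test(leaf)) return null;
  try {
    return decodeURIComponent(leaf);
  } catch {
    return null;
  }
}

// A decoded hit has no offset in the encoded leaf, so the detection
// covers the whole original leaf (start 0, end leaf.length).
function wholeLeafDetection(leaf, type) {
  return { type, start: 0, end: leaf.length };
}
```

A string returned by either helper would still have to pass the precision guard from the same row (validator-backed or hard-block type) before a detection fires; that step is not shown here.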

package/examples/local-proxy-demo/README.md ADDED

@@ -0,0 +1,51 @@
+# Local end-to-end demo
+A self-contained, **reproducible** walkthrough of Haechi — no remote model required.
+It stands up a tiny OpenAI-compatible *stub* upstream and the **real** Haechi proxy
+in front of it (in `enforce` mode), then narrates what happens to a payload carrying
+an email, a phone number, an API key, and a card number.
+```bash
+node examples/local-proxy-demo/demo.mjs
+# or, from the repo root:
+npm run demo
+```
+What it shows, in order:
+1. **The model only sees protected values** — the proxy detects and transforms the
+   payload *before* forwarding, so the stub (standing in for the model) receives
+   `[TOKEN:…]` for the email, a masked phone, and `[REDACTED:api_key]` for the key.
+2. **The token round-trip** — because the email was *tokenized* (reversible), the
+   caller gets `minji.kim@example.com` back, while the masked phone and redacted
+   secret stay protected. The model's own leaked secret in its reply is
+   response-protected too.
+3. **The audit log** carries detection metadata and is hash-chained — and never any
+   plaintext email/phone/key.
+4. **Day-2 operability** — the live `/__haechi/ready` readiness probe and the
+   Prometheus `/__haechi/metrics` surface.
+5. **A card number is blocked outright** (`403`, fail-closed) — it never reaches the
+   model.
+Zero dependencies (only `node:` builtins + the in-repo `haechi` packages). The demo
+is programmatic for reproducibility; for the real CLI invocation see the
+[Quickstart](../../README.md#quickstart) and
+[`docs/current/configuration.md`](../../docs/current/configuration.md).
+## Live demo against a real model
+`live-demo.mjs` runs the same flow against a **real** upstream (vLLM / Ollama / any
+OpenAI-compatible server) instead of the stub. It asks the model to repeat the phone
+number it was given — and the model can only return the *masked* form, because the
+real number never reached it. This is the run recorded in the README GIF
+(`demo.tape` records the stub demo; `live-demo.tape` records this one).
+```bash
+HAECHI_LIVE_UPSTREAM=http://127.0.0.1:8000 \
+HAECHI_LIVE_MODEL="Qwen/Qwen3.6-35B-A3B-FP8" \
+node examples/local-proxy-demo/live-demo.mjs
+```
+`HAECHI_LIVE_TYPE` (default `vllm-openai`) and `HAECHI_LIVE_MODEL` override the target.
+For Qwen3-style reasoning servers the request sets `chat_template_kwargs.enable_thinking
+= false` so the reply is a terse line; non-reasoning servers ignore it.
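
Item 3 of the README above says the audit log is hash-chained. The record layout is not shown in this diff, so the following is only a generic sketch of the idea. The field names (`payload`, `prevHash`, `hash`), the all-zero genesis value, and the SHA-256 link are assumptions for illustration and may differ from what Haechi actually writes.

```javascript
import { createHash } from "node:crypto";

// Hypothetical starting value for the first record's prevHash.
const GENESIS = "0".repeat(64);

// Each link hashes the previous hash together with the record payload.
function linkHash(prevHash, payload) {
  return createHash("sha256")
    .update(prevHash)
    .update(JSON.stringify(payload))
    .digest("hex");
}

// Appends a record whose hash depends on every record before it.
function appendRecord(chain, payload) {
  const prevHash = chain.length ? chain[chain.length - 1].hash : GENESIS;
  chain.push({ payload, prevHash, hash: linkHash(prevHash, payload) });
  return chain;
}

// Returns the index of the first record that fails verification,
// or -1 when every link checks out.
function firstBrokenLink(chain) {
  let prevHash = GENESIS;
  for (let i = 0; i < chain.length; i++) {
    const rec = chain[i];
    if (rec.prevHash !== prevHash || rec.hash !== linkHash(prevHash, rec.payload)) {
      return i;
    }
    prevHash = rec.hash;
  }
  return -1;
}
```

Editing a record in the middle breaks verification at that index, while dropping records from the tail leaves a chain that still verifies. That second case is the gap the threat model assigns to `audit.anchor`.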

package/examples/local-proxy-demo/demo.mjs ADDED

@@ -0,0 +1,144 @@
+#!/usr/bin/env node
+// Self-contained, reproducible Haechi demo — no remote model required.
+//
+// It stands up a tiny OpenAI-compatible *stub* upstream and the REAL Haechi proxy
+// in front of it, then walks through what Haechi does to a payload that carries an
+// email, a phone number, an API key, and a card:
+//   1. the model only ever sees redacted/tokenized values (proven by echoing the
+//      exact body the stub received),
+//   2. the caller gets the original email back (the token round-trip),
+//   3. the audit log carries no plaintext,
+//   4. the live /__haechi/metrics + /__haechi/ready operability surface,
+//   5. a card is blocked outright (fail-closed).
+//
+// Run:  node examples/local-proxy-demo/demo.mjs   (or: npm run demo)
+// Zero dependencies — only node: builtins and the in-repo haechi packages.
+import { createServer } from "node:http";
+import { mkdtemp, readFile } from "node:fs/promises";
+import { tmpdir } from "node:os";
+import { join } from "node:path";
+import { createRuntime } from "../../packages/cli/runtime.mjs";
+import { createHaechiProxy } from "../../packages/proxy/index.mjs";
+import { initLocalKeyFile } from "../../packages/crypto/index.mjs";
+const B = "\x1b[1m", D = "\x1b[2m", G = "\x1b[32m", Y = "\x1b[33m", C = "\x1b[36m", R = "\x1b[31m", X = "\x1b[0m";
+const rule = () => console.log(D + "─".repeat(64) + X);
+const scene = (n, t) => { console.log(); rule(); console.log(`${B}${C}  ${n}. ${t}${X}`); rule(); };
+const pause = (ms) => new Promise((r) => setTimeout(r, ms));
+// A minimal OpenAI-compatible stub. It records the EXACT body it receives (which is
+// whatever the proxy forwarded, i.e. the protected payload) and replies with a
+// canned assistant message that itself leaks a secret, to exercise response protection.
+function startStubUpstream() {
+  let lastReceived = null;
+  const server = createServer((req, res) => {
+    let body = "";
+    req.on("data", (c) => (body += c));
+    req.on("end", () => {
+      lastReceived = body;
+      // Echo the (already-protected) user content back so the response exercises the
+      // token round-trip, and append a leaked secret so response protection fires.
+      let echoed = "";
+      try { echoed = JSON.parse(body).messages.at(-1).content; } catch { /* ignore */ }
+      res.writeHead(200, { "content-type": "application/json" });
+      res.end(JSON.stringify({
+        id: "chatcmpl-demo",
+        object: "chat.completion",
+        choices: [{ index: 0, message: { role: "assistant",
+          content: `Noted — I will follow up. You wrote: "${echoed}" (our ref: token=DEMOleak9876543210notRealzyxwvu)` } }]
+      }));
+    });
+  });
+  return new Promise((resolve) => {
+    server.listen(0, "127.0.0.1", () => resolve({ server, url: `http://127.0.0.1:${server.address().port}`, received: () => lastReceived }));
+  });
+}
+async function main() {
+  console.log(`\n${B}🛡  Haechi — local end-to-end demo${X}  ${D}(stub upstream, real proxy, enforce mode)${X}`);
+  const dir = await mkdtemp(join(tmpdir(), "haechi-demo-"));
+  const keyFile = join(dir, ".haechi", "dev.keys.json");
+  const auditPath = join(dir, ".haechi", "audit.jsonl");
+  await initLocalKeyFile(keyFile, { force: true });
+  const stub = await startStubUpstream();
+  const runtime = createRuntime({
+    mode: "enforce",
+    target: { type: "openai-compatible", upstream: stub.url },
+    policy: {
+      mode: "enforce",
+      presets: ["llm-redact"],
+      actions: { email: "tokenize", phone: "mask", secret: "redact", api_key: "redact", card: "block" }
+    },
+    tokenVault: { detokenizeResponses: true },
+    responseProtection: { enabled: true, mode: "enforce", failureMode: "fail-closed" },
+    keys: { keyFile },
+    audit: { path: auditPath }
+  });
+  const proxy = createHaechiProxy({ runtime, port: 0 });
+  const addr = await proxy.listen();
+  const base = `http://127.0.0.1:${addr.port}`;
+  // ── Scene 1 ───────────────────────────────────────────────────────────────
+  scene(1, "A prompt with an email, a phone number, and a deploy secret");
+  const userText = "Contact minji.kim@example.com or 010-1234-5678. Deploy api_key=DEMOkey0123456789notARealSecretabcdef.";
+  console.log(`${Y}you send →${X} ${userText}`);
+  await pause(700);
+  const r1 = await fetch(`${base}/v1/chat/completions`, {
+    method: "POST", headers: { "content-type": "application/json" },
+    body: JSON.stringify({ model: "demo", messages: [{ role: "user", content: userText }] })
+  });
+  const out1 = await r1.json();
+  scene(2, "What the MODEL actually received (the proxy protected it first)");
+  const forwarded = JSON.parse(stub.received());
+  console.log(`${G}model sees →${X} ${forwarded.messages[0].content}`);
+  console.log(`${D}  (email → [TOKEN:…], phone → masked, secret → [REDACTED])${X}`);
+  await pause(700);
+  scene(3, "What YOU get back — the email token is restored (round-trip)");
+  console.log(`${G}you receive →${X} ${out1.choices[0].message.content}`);
+  console.log(`${D}  (email restored from its token; phone stays masked; keys stay redacted both ways)${X}`);
+  await pause(700);
+  // ── Scene 4 ───────────────────────────────────────────────────────────────
+  scene(4, "The audit log — tamper-evident, and never any plaintext");
+  const audit = (await readFile(auditPath, "utf8")).trim().split("\n");
+  const ev = JSON.parse(audit[0]);
+  console.log(`${D}detections:${X} ${ev.detections.map((d) => `${d.type}→${d.action}`).join("  ")}`);
+  console.log(`${D}leaks the email/secret/phone?${X} ${audit.join("").match(/minji\.kim@|DEMOkey0123|010-1234-5678/) ? R + "YES" + X : G + "no — clean" + X}`);
+  await pause(700);
+  // ── Scene 5 ───────────────────────────────────────────────────────────────
+  scene(5, "Day-2 operability — live health + Prometheus metrics");
+  const ready = await (await fetch(`${base}/__haechi/ready`)).json();
+  console.log(`${D}/__haechi/ready →${X} ${ready.ready ? G + "ready" : R + "not ready"}${X} ${D}(audit writable: ${ready.checks?.auditWritable})${X}`);
+  const metrics = await (await fetch(`${base}/__haechi/metrics`)).text();
+  for (const line of metrics.split("\n").filter((l) => /^haechi_requests_total\{|^haechi_blocks_total /.test(l)).slice(0, 4)) {
+    console.log(`${D}metric:${X} ${line}`);
+  }
+  await pause(700);
+  // ── Scene 6 ───────────────────────────────────────────────────────────────
+  scene(6, "A card number is blocked outright (fail-closed)");
+  const r2 = await fetch(`${base}/v1/chat/completions`, {
+    method: "POST", headers: { "content-type": "application/json" },
+    body: JSON.stringify({ model: "demo", messages: [{ role: "user", content: "charge card 4242 4242 4242 4242 now" }] })
+  });
+  console.log(`${Y}you send →${X} "charge card 4242 4242 4242 4242 now"`);
+  console.log(`${G}proxy →${X} HTTP ${r2.status} ${r2.status === 403 ? R + B + "BLOCKED" + X : ""} ${D}(the card never reaches the model)${X}`);
+  console.log();
+  rule();
+  console.log(`${B}${G}  ✓ done${X}  ${D}— detection → redact/tokenize/block → forward → audit, all local.${X}`);
+  rule();
+  console.log(`${D}  config reference: haechi.config.example.json   ·   docs/current/configuration.md${X}\n`);
+  await proxy.close();
+  stub.server.close();
+}
+main().then(() => process.exit(0)).catch((e) => { console.error("demo failed:", e); process.exit(1); });
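
Scene 6 sends `4242 4242 4242 4242`, and the threat-model row earlier in this diff refers to a Luhn-passing `card`. As an illustration of that checksum only, here is a standalone sketch. `passesLuhn` is a hypothetical name, and the 12 to 19 digit bound and the separator stripping are assumptions, not Haechi's validator.

```javascript
// Luhn checksum: starting from the rightmost digit, double every second
// digit, subtract 9 from any result above 9, and require sum % 10 === 0.
function passesLuhn(input) {
  // Assumption: spaces and hyphens are separators and can be dropped.
  const digits = String(input).replace(/[\s-]/g, "");
  if (!/^\d{12,19}$/.test(digits)) return false;
  let sum = 0;
  for (let i = 0; i < digits.length; i++) {
    let d = digits.charCodeAt(digits.length - 1 - i) - 48;
    if (i % 2 === 1) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
  }
  return sum % 10 === 0;
}

console.log(passesLuhn("4242 4242 4242 4242")); // true
console.log(passesLuhn("4242 4242 4242 4241")); // false
```

For the demo card the undoubled digits contribute 16 and the doubled digits contribute 64, so the sum is 80 and the check passes; changing the last digit to 1 gives 79 and fails.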

package/examples/local-proxy-demo/demo.tape ADDED

@@ -0,0 +1,19 @@
+# VHS tape for the Haechi local end-to-end demo.
+# Regenerate the README GIF with:  vhs examples/local-proxy-demo/demo.tape
+# (run from the repo root; requires vhs + ttyd + ffmpeg)
+Output docs/assets/haechi-demo.gif
+Set Shell "bash"
+Set FontSize 15
+Set Width 1180
+Set Height 840
+Set Padding 18
+Set Theme "Catppuccin Mocha"
+Set TypingSpeed 55ms
+Sleep 500ms
+Type "node examples/local-proxy-demo/demo.mjs"
+Sleep 600ms
+Enter
+Sleep 9s

package/examples/local-proxy-demo/live-demo.mjs ADDED

@@ -0,0 +1,121 @@
+#!/usr/bin/env node
+// Live end-to-end demo against a REAL upstream model (vLLM / Ollama / any
+// OpenAI-compatible server). Unlike demo.mjs (which uses a deterministic stub),
+// this proves protection against an actual model: it asks the model to repeat the
+// phone number it was given, and the model can only return the *masked* form —
+// the real number never reached it.
+//
+//   HAECHI_LIVE_UPSTREAM=http://127.0.0.1:8000 \
+//   HAECHI_LIVE_MODEL="Qwen/Qwen3.6-35B-A3B-FP8" \
+//   node examples/local-proxy-demo/live-demo.mjs
+//
+// Defaults: type=vllm-openai. HAECHI_LIVE_TYPE and HAECHI_LIVE_MODEL override.
+// Zero dependencies — only node: builtins + the in-repo haechi packages.
+import { mkdtemp, readFile } from "node:fs/promises";
+import { tmpdir } from "node:os";
+import { join } from "node:path";
+import { createRuntime } from "../../packages/cli/runtime.mjs";
+import { createHaechiProxy } from "../../packages/proxy/index.mjs";
+import { initLocalKeyFile } from "../../packages/crypto/index.mjs";
+const B = "\x1b[1m", D = "\x1b[2m", G = "\x1b[32m", Y = "\x1b[33m", C = "\x1b[36m", R = "\x1b[31m", X = "\x1b[0m";
+const rule = () => console.log(D + "─".repeat(64) + X);
+const scene = (n, t) => { console.log(); rule(); console.log(`${B}${C}  ${n}. ${t}${X}`); rule(); };
+const pause = (ms) => new Promise((r) => setTimeout(r, ms));
+const UPSTREAM = process.env.HAECHI_LIVE_UPSTREAM;
+const TYPE = process.env.HAECHI_LIVE_TYPE || "vllm-openai";
+const MODEL = process.env.HAECHI_LIVE_MODEL || "Qwen/Qwen3.6-35B-A3B-FP8";
+if (!UPSTREAM) {
+  console.error("Set HAECHI_LIVE_UPSTREAM (e.g. http://127.0.0.1:8000) to a reachable OpenAI-compatible server.");
+  console.error("For a no-backend reproducible run, use:  npm run demo");
+  process.exit(2);
+}
+async function chat(base, content, extra = {}) {
+  const t0 = Date.now();
+  const res = await fetch(`${base}/v1/chat/completions`, {
+    method: "POST", headers: { "content-type": "application/json" },
+    body: JSON.stringify({ model: MODEL, max_tokens: 128, temperature: 0,
+      // Qwen3 reasoning models: ask for a direct answer (no chain-of-thought) so
+      // the demo gets a terse content reply. Ignored by non-reasoning servers.
+      chat_template_kwargs: { enable_thinking: false },
+      messages: [{ role: "user", content }], ...extra })
+  });
+  const body = await res.json();
+  return { status: res.status, ms: Date.now() - t0, text: body.choices?.[0]?.message?.content ?? body.error?.message ?? "(no content)" };
+}
+async function main() {
+  console.log(`\n${B}🛡  Haechi — LIVE end-to-end demo${X}  ${D}(real model: ${MODEL} via ${TYPE}, enforce mode)${X}`);
+  const dir = await mkdtemp(join(tmpdir(), "haechi-live-"));
+  const keyFile = join(dir, ".haechi", "dev.keys.json");
+  const auditPath = join(dir, ".haechi", "audit.jsonl");
+  await initLocalKeyFile(keyFile, { force: true });
+  const runtime = createRuntime({
+    mode: "enforce",
+    target: { type: TYPE, upstream: UPSTREAM },
+    policy: { mode: "enforce", presets: ["llm-redact"], actions: { email: "tokenize", phone: "mask", secret: "redact", api_key: "redact", card: "block" } },
+    tokenVault: { detokenizeResponses: true },
+    responseProtection: { enabled: true, mode: "enforce", failureMode: "fail-closed" },
+    keys: { keyFile }, audit: { path: auditPath }
+  });
+  const proxy = createHaechiProxy({ runtime, port: 0 });
+  const addr = await proxy.listen();
+  const base = `http://127.0.0.1:${addr.port}`;
+  // ── Scene 1 ────────────────────────────────────────────────────────────────
+  scene(1, "Ask a REAL model to repeat the phone number you give it");
+  const prompt = "Reply in one short line: repeat the phone number you were given. Phone: 010-1234-5678, email minji.kim@example.com";
+  console.log(`${Y}you send →${X} ${prompt}`);
+  await pause(700);
+  const r1 = await chat(base, prompt);
+  scene(2, "Haechi detected + protected the prompt BEFORE it left your machine");
+  const events = (await readFile(auditPath, "utf8")).trim().split("\n").map((l) => JSON.parse(l));
+  const ev = events.find((e) => Array.isArray(e.detections) && e.detections.length) ?? events[0];
+  console.log(`${D}detections:${X} ${(ev.detections ?? []).map((d) => `${G}${d.type}→${d.action}${X}`).join("  ")}`);
+  console.log(`${D}the model only ever saw:${X} email → ${C}[TOKEN:…]${X},  phone → ${C}01*********78${X}`);
+  await pause(700);
+  scene(3, "The real model replies — it can only return the MASKED phone");
+  console.log(`${G}${MODEL.split("/").pop()} →${X} ${B}${r1.text}${X}   ${D}(${r1.ms} ms)${X}`);
+  console.log(`${D}  your real number 010-1234-5678 never reached the model — it cannot reveal it.${X}`);
+  await pause(700);
+  // ── Scene 4 ────────────────────────────────────────────────────────────────
+  scene(4, "The audit log — hash-chained, and never any plaintext");
+  const auditRaw = await readFile(auditPath, "utf8");
+  console.log(`${D}leaks the real email/phone?${X} ${/minji\.kim@example|010-1234-5678/.test(auditRaw) ? R + "YES" + X : G + "no — clean" + X}`);
+  await pause(700);
+  // ── Scene 5 ────────────────────────────────────────────────────────────────
+  scene(5, "Day-2 operability — live readiness + Prometheus metrics");
+  const ready = await (await fetch(`${base}/__haechi/ready`)).json();
+  console.log(`${D}/__haechi/ready →${X} ${ready.ready ? G + "ready" : R + "not ready"}${X}`);
+  const metrics = await (await fetch(`${base}/__haechi/metrics`)).text();
+  for (const line of metrics.split("\n").filter((l) => /^haechi_requests_total\{/.test(l)).slice(0, 3)) {
+    console.log(`${D}metric:${X} ${line}`);
+  }
+  await pause(700);
+  // ── Scene 6 ────────────────────────────────────────────────────────────────
+  scene(6, "A card number is blocked before it ever reaches the model");
+  const r2 = await chat(base, "charge card 4242 4242 4242 4242 now");
+  console.log(`${Y}you send →${X} "charge card 4242 4242 4242 4242 now"`);
+  console.log(`${G}proxy →${X} HTTP ${r2.status} ${r2.status === 403 ? R + B + "BLOCKED" + X : ""} ${D}(no upstream call made)${X}`);
+  console.log();
+  rule();
+  console.log(`${B}${G}  ✓ live${X}  ${D}— a real model, and your PII never left the gateway in the clear.${X}`);
+  rule();
+  console.log();
+  await proxy.close();
+}
+main().then(() => process.exit(0)).catch((e) => { console.error("live demo failed:", e); process.exit(1); });
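
Scene 2 of the live demo prints `01*********78` as the masked form of `010-1234-5678`. That string is hard-coded in the script, and the mask transform itself is not part of this diff, so the sketch below only assumes a keep-two-characters-at-each-end rule that appears consistent with it. `maskMiddle` is a hypothetical name.

```javascript
// Keeps `keep` characters at each end and replaces everything between
// them, separators included, with the mask character.
function maskMiddle(value, keep = 2, maskChar = "*") {
  const text = String(value);
  // Values too short to keep both ends are masked entirely.
  if (text.length <= keep * 2) return maskChar.repeat(text.length);
  return (
    text.slice(0, keep) +
    maskChar.repeat(text.length - keep * 2) +
    text.slice(-keep)
  );
}

const original = "010-1234-5678";
const masked = maskMiddle(original);
// Same style of check the demo applies to the audit log: the original
// value must not appear in the protected output.
console.log(masked, masked.includes(original) ? "leaks" : "clean");
```

Because the masked string keeps the original length, a model asked to repeat the number can only echo this form back, which is the behaviour scene 3 demonstrates.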

package/examples/local-proxy-demo/live-demo.tape ADDED

@@ -0,0 +1,25 @@
+# VHS tape for the Haechi LIVE demo (real upstream model).
+# Regenerate the README GIF (run from the repo root) with:
+#   vhs examples/local-proxy-demo/live-demo.tape
+# HAECHI_LIVE_UPSTREAM is set below via Env so it stays out of the recording.
+Output docs/assets/haechi-demo.gif
+Set Shell "bash"
+Set FontSize 15
+Set Width 1180
+Set Height 840
+Set Padding 18
+Set Theme "Catppuccin Mocha"
+Set TypingSpeed 55ms
+# Point these at a reachable OpenAI-compatible server before recording. Using Env
+# (not the typed command) keeps the upstream URL out of the captured GIF.
+Env HAECHI_LIVE_UPSTREAM "http://127.0.0.1:8000"
+Env HAECHI_LIVE_MODEL "Qwen/Qwen3.6-35B-A3B-FP8"
+Sleep 500ms
+Type "node examples/local-proxy-demo/live-demo.mjs"
+Sleep 600ms
+Enter
+Sleep 9s