npm - haechi - Versions diffs - 0.4.0 → 0.5.0 - Mend

haechi 0.4.0 → 0.5.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (23)

package/README.ko.md +227 -0
package/README.md +13 -4
package/docs/README.md +3 -6
package/docs/current/api-stability.ko.md +2 -1
package/docs/current/api-stability.md +1 -0
package/docs/current/configuration.ko.md +210 -0
package/docs/current/configuration.md +210 -0
package/docs/current/release-0.5-implementation-scope.ko.md +69 -0
package/docs/current/release-0.5-implementation-scope.md +69 -0
package/docs/current/release-process.ko.md +2 -2
package/docs/current/release-process.md +2 -2
package/docs/current/risk-register-release-gate.ko.md +2 -2
package/docs/current/risk-register-release-gate.md +2 -2
package/docs/current/threat-model.ko.md +6 -4
package/docs/current/threat-model.md +5 -3
package/haechi.config.example.json +3 -1
package/package.json +3 -2
package/packages/cli/bin/haechi.mjs +163 -22
package/packages/cli/runtime.mjs +10 -2
package/packages/core/index.mjs +110 -1
package/packages/protocol-adapters/index.mjs +33 -14
package/packages/proxy/index.mjs +108 -1
package/packages/stream-filter/index.mjs +194 -0

package/docs/current/configuration.md ADDED

@@ -0,0 +1,210 @@
+# Haechi Configuration Reference
+- Status: Living document
+- Target version: 0.5.0
+`haechi init` writes `haechi.config.json`; a non-secret template is at `haechi.config.example.json`. Every command reads it with `--config <path>` (default `haechi.config.json`). Configuration is **validated fail-closed**: unknown providers, out-of-range numbers, and malformed values throw at load time rather than degrading silently. `haechi config` prints this reference; `haechi status` prints the *effective* state of a given config.
+## Full default
+```json
+{
+  "mode": "dry-run",
+  "target": { "type": "llm-http", "adapter": "openai-compatible", "upstream": "http://127.0.0.1:9999" },
+  "proxy": { "host": "127.0.0.1", "port": 1016 },
+  "responseProtection": { "enabled": false, "mode": "enforce", "failureMode": "fail-closed", "allowNonJson": false, "allowCompressed": false, "maxBytes": 1048576 },
+  "streaming": { "requestMode": "block" },
+  "limits": { "maxRequestBytes": 1048576, "upstreamTimeoutMs": 120000 },
+  "policy": { "mode": "dry-run", "presets": ["korean-pii", "secrets-only", "llm-redact"], "defaultAction": "redact", "actions": { "card": "block" } },
+  "filters": { "customRules": [] },
+  "keys": { "provider": "local", "keyFile": ".haechi/dev.keys.json" },
+  "audit": { "sink": "jsonl", "path": ".haechi/audit.jsonl" },
+  "tokenVault": { "provider": "local", "path": ".haechi/token-vault.json", "revealPolicy": "disabled", "retentionDays": 30, "deterministic": false, "deterministicTypes": null, "detokenizeResponses": false },
+  "privacy": { "profile": null },
+  "mcp": { "allowedMethods": ["initialize", "tools/call", "resources/read", "prompts/get"], "protectParams": true, "protectResults": true, "requireJsonRpc": true }
+}
+```
+## Top level
+| Key | Type / values | Default | Notes |
+|---|---|---|---|
+| `mode` | `dry-run` \| `report-only` \| `enforce` | `dry-run` | Global enforcement mode. `dry-run`/`report-only` detect + audit only; `enforce` transforms/blocks. Overridden by `policy.mode` when set. |
+## `target`
+| Key | Type / values | Default | Notes |
+|---|---|---|---|
+| `target.type` | `llm-http` \| `openai-compatible` \| `vllm-openai` \| `ollama` \| `llama-cpp` | `llm-http` | Selects the protocol adapter. `llm-http` aliases `openai-compatible`. Unknown values **fail closed** at load. |
+| `target.adapter` | same set | `openai-compatible` | Explicit adapter override; usually leave unset and let `type` decide. |
+| `target.upstream` | URL string | `http://127.0.0.1:9999` | The only upstream the proxy forwards to. Request targets must be origin-form paths; absolute-URL targets are rejected (SSRF guard). |
+## `proxy`
+| Key | Type / values | Default | Notes |
+|---|---|---|---|
+| `proxy.host` | non-empty string | `127.0.0.1` | Bind address. Non-loopback hosts require the `--allow-remote-bind` CLI flag — config alone will not start (see [Binding beyond loopback](#binding-beyond-loopback)). |
+| `proxy.port` | integer 0–65535 | `1016` | Listen port (`0` = ephemeral). Override per-run with `--port`. |
+## `responseProtection`
+Inspects upstream JSON responses (off by default — turn on to protect what comes *back* from the model).
+| Key | Type / values | Default | Notes |
+|---|---|---|---|
+| `responseProtection.enabled` | boolean | `false` | Master switch. Required for `detokenizeResponses` to do anything. |
+| `responseProtection.mode` | `dry-run` \| `report-only` \| `enforce` | `enforce` | Enforcement mode for the response direction. |
+| `responseProtection.failureMode` | `fail-closed` \| `allow` | `fail-closed` | What to do with an *uninspectable* response (non-JSON, invalid JSON, compressed). `fail-closed` returns 502; `allow` passes it through (audited). |
+| `responseProtection.allowNonJson` | boolean | `false` | Permit non-JSON responses through without inspection. |
+| `responseProtection.allowCompressed` | boolean | `false` | Permit compressed responses through without inspection. |
+| `responseProtection.maxBytes` | positive integer | `1048576` | Hard response size cap. Enforced even under `failureMode: allow` — oversized responses are always denied. |
+## `streaming`
+| Key | Type / values | Default | Notes |
+|---|---|---|---|
+| `streaming.requestMode` | `block` \| `pass-through` \| `inspect` | `block` | `block` → `501` for streaming; `inspect` → stream-filter SSE/NDJSON responses (bounded cross-frame buffer); `pass-through` → forward uninspected (audited). Ollama `/api/chat` and `/api/generate` are treated as streaming unless `stream: false` is set. |
+| `streaming.responseMode` | `dry-run` \| `report-only` \| `enforce` | `enforce` | Enforcement mode applied to inspected streams (independent of the request direction). |
+| `streaming.maxMatchBytes` | positive integer | `256` | Cross-frame match window for `inspect`. A held tail of this size lets a detection spanning frames be caught before emission; a single match longer than this may still split across frames. |
+## `limits`
+| Key | Type / values | Default | Notes |
+|---|---|---|---|
+| `limits.maxRequestBytes` | positive integer | `1048576` | Request body cap; over the limit returns `413`. Enforced incrementally (the body is not fully buffered first). |
+| `limits.upstreamTimeoutMs` | positive integer | `120000` | Upstream request timeout; on expiry returns `504 haechi_upstream_timeout`. Connection failure returns `502 haechi_upstream_unreachable`. |
+## `policy`
+The detect→decide core. See [Detection types & actions](#detection-types--actions).
+| Key | Type / values | Default | Notes |
+|---|---|---|---|
+| `policy.mode` | `dry-run` \| `report-only` \| `enforce` | `dry-run` | Effective enforcement mode (`policy.mode ?? mode`). |
+| `policy.presets` | array of preset names | `["korean-pii", "secrets-only", "llm-redact"]` | Bundled action sets, merged in order. See [Presets](#presets). |
+| `policy.defaultAction` | an action | `redact` | Action for a detected type with no explicit mapping. |
+| `policy.actions` | `{ <type>: <action> }` | `{ "card": "block" }` | Per-type overrides. Merges may **strengthen** but never weaken (see [Action strength](#action-strength)); `injection` defaults to `allow` unless set. |
+| `policy.allowUnsafeOverrides` | boolean | `false` | Permit a weaker action to override a stronger one. Off by default; turning it on removes a safety guard. |
+| `policy.bundlePath` | path | unset | Load a signed policy bundle instead of inline policy (verified against `keys.keyFile`). |
+## `filters`
+| Key | Type / values | Default | Notes |
+|---|---|---|---|
+| `filters.customRules` | array of rule objects | `[]` | Extra detection rules: `{ id, type, pattern, flags?, confidence? }`. Patterns are ReDoS-screened (≤500 chars, no nested quantifiers, no backreferences) and rejected at load if unsafe. |
+## `keys`
+| Key | Type / values | Default | Notes |
+|---|---|---|---|
+| `keys.provider` | `local` \| `external` | `local` | `local` uses a software AES-256-GCM key file (dev only). `external` ships no key material and **requires** injecting a crypto provider via `createRuntime(config, { cryptoProvider })`. |
+| `keys.keyFile` | path | `.haechi/dev.keys.json` | Local key file (mode `0600`). `haechi init --force` rotates it, retiring prior keys so existing ciphertext/tokens stay decryptable by `kid`. |
+## `audit`
+| Key | Type / values | Default | Notes |
+|---|---|---|---|
+| `audit.sink` | `jsonl` | `jsonl` | Only `jsonl` is supported. |
+| `audit.path` | path | `.haechi/audit.jsonl` | SHA-256 hash-chained log; verify with `haechi audit-verify`. Never contains plaintext/PII. |
+## `tokenVault`
+| Key | Type / values | Default | Notes |
+|---|---|---|---|
+| `tokenVault.provider` | `local` | `local` | Only `local` is supported. |
+| `tokenVault.path` | path | `.haechi/token-vault.json` | Encrypted token store (atomic writes, file-locked). |
+| `tokenVault.revealPolicy` | `disabled` \| `local-dev` | `disabled` | Gates **manual** reveal (`token-reveal`). Every reveal/purge decision is audited. Independent of detokenization. |
+| `tokenVault.retentionDays` | positive number | `30` | Token TTL. Expired tokens are deleted on vault writes or via `token-purge --expired`. |
+| `tokenVault.deterministic` | boolean | `false` | Equal `(type, value)` → equal token (HMAC over a domain-separated derived key). Needed for multi-turn. **Trade-off:** equal values become linkable. |
+| `tokenVault.deterministicTypes` | `null` \| non-empty string array | `null` | `null` = all types when deterministic; otherwise limit determinism to listed types (e.g. `["email"]`). |
+| `tokenVault.detokenizeResponses` | boolean | `false` | Restore request-issued tokens in that request's response. Only the tokens issued while protecting the same request are restored; requires `responseProtection.enabled`. Audited by count. |
+## `privacy`
+| Key | Type / values | Default | Notes |
+|---|---|---|---|
+| `privacy.profile` | `null` \| `kr-pipa` \| `eu-gdpr` \| `us-general` | `null` | Applies a regional baseline action set before enforcement. Profiles may **strengthen** but never weaken your explicit actions. Engineering defaults, not legal advice. |
+## `mcp`
+Applies to `mcp-stdio` and `mcp-wrap`.
+| Key | Type / values | Default | Notes |
+|---|---|---|---|
+| `mcp.allowedMethods` | non-empty string array | `["initialize", "tools/call", "resources/read", "prompts/get"]` | Client-callable method allowlist (`"*"` allows all). Server-initiated requests bypass the allowlist but are still params-protected. |
+| `mcp.protectParams` | boolean | `true` | Protect request `params`. |
+| `mcp.protectResults` | boolean | `true` | Protect response `result` (and run injection heuristics on it). |
+| `mcp.requireJsonRpc` | boolean | `true` | Require `jsonrpc: "2.0"`; non-conforming messages are rejected. |
+## Detection types & actions
+Built-in detection `type` values: `email`, `phone`, `kr_rrn`, `card`, `api_key`, `secret`, and `injection` (response-direction heuristic, report-only by default). Custom rules may introduce new types.
+Actions (weakest → strongest):
+| Action | Effect |
+|---|---|
+| `allow` | No change (still detected and audited). |
+| `redact` | Replace with `[REDACTED:<type>]`. |
+| `mask` | Partially mask (values ≤8 chars are fully masked). |
+| `tokenize` | Replace with a vault token; reversible via the token vault. |
+| `encrypt` | Replace with an inline AES-256-GCM envelope. |
+| `block` | Reject the whole payload (`403`/`-32001`/exit 3). |
+### Action strength
+When a preset and an override (or a privacy profile) disagree, the **stronger** action wins, and trying to weaken a stronger action throws unless `policy.allowUnsafeOverrides` is `true`. Strength: `allow`(0) < `redact`/`mask`(1) < `tokenize`/`encrypt`(2) < `block`(3).
+### Presets
+| Preset | Effect |
+|---|---|
+| `llm-redact` | default `redact`; `email: redact`, `phone: mask` |
+| `korean-pii` | `kr_rrn: block`, `phone: mask`, `email: redact` |
+| `secrets-only` | `api_key: block`, `secret: block` |
+| `strict-block` | default `block` |
+| `mcp-basic` | default `redact`; `api_key`/`secret`/`kr_rrn: block` |
+| `local-inference` | default `redact`; `email: tokenize`, `phone: mask`, secrets/`kr_rrn: block` |
+| `local-only` | marks transfer as non-external (metadata) |
+## Common setups
+**Protect requests in enforce mode (minimal):**
+```json
+{ "mode": "enforce", "policy": { "mode": "enforce" } }
+```
+**Local inference with response protection + token round-trip:**
+```json
+{
+  "mode": "enforce",
+  "target": { "type": "vllm-openai", "upstream": "http://127.0.0.1:8000" },
+  "policy": { "mode": "enforce", "presets": ["local-inference"] },
+  "responseProtection": { "enabled": true, "mode": "enforce" },
+  "tokenVault": { "deterministic": true, "detokenizeResponses": true }
+}
+```
+**EU profile, secrets blocked, injection flagged:**
+```json
+{
+  "mode": "enforce",
+  "privacy": { "profile": "eu-gdpr" },
+  "policy": { "mode": "enforce", "actions": { "injection": "redact" } },
+  "responseProtection": { "enabled": true }
+}
+```
+## Binding beyond loopback
+The proxy refuses non-loopback hosts unless the CLI flag is passed explicitly — `proxy.host: "0.0.0.0"` in config alone will not start, by design:
+```bash
+haechi proxy --config haechi.config.json --host 0.0.0.0 --allow-remote-bind
+```
+**The proxy has no client authentication yet** (planned for 0.6): anyone who can reach the port can use your upstream and the token round-trip path. Use `--allow-remote-bind` only behind explicit network controls — bind `0.0.0.0` inside a container and restrict the host port mapping (`-p 127.0.0.1:1016:1016`), or front it with a firewall/VPN/authenticating reverse proxy.
+## Validation cheatsheet
+These throw at load (fail-closed): unknown `keys.provider`; empty `proxy.host`; out-of-range `proxy.port`; non-`jsonl` `audit.sink`; non-`local` `tokenVault.provider`; bad `revealPolicy`; non-positive `retentionDays`; non-boolean `deterministic`/`detokenizeResponses`; empty/non-string `deterministicTypes`; empty/non-string `mcp.allowedMethods`; non-boolean `mcp.*` flags; unknown `privacy.profile`; bad `responseProtection.failureMode`; non-positive `responseProtection.maxBytes`; bad `streaming.requestMode`/`streaming.responseMode`; non-positive `streaming.maxMatchBytes`; non-positive `limits.*`; unknown `target.type`/`adapter`; unsafe custom regex; weakening action without `allowUnsafeOverrides`.
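
Reviewer note (not part of the package diff): the "Action strength" rule in `configuration.md` can be read as a small ordering check. The sketch below is one possible reading, not Haechi's implementation. The strength tiers are copied from the document; the function names `strongerOf` and `applyOverride`, the error message, and the split between "presets keep the stronger action" and "explicit overrides throw on weakening" are assumptions made for illustration.

```javascript
// Hypothetical sketch: names and error text are illustrative, not Haechi's API.
// Strength tiers as listed in configuration.md:
// allow(0) < redact/mask(1) < tokenize/encrypt(2) < block(3).
const STRENGTH = new Map([
  ["allow", 0],
  ["redact", 1],
  ["mask", 1],
  ["tokenize", 2],
  ["encrypt", 2],
  ["block", 3],
]);

function strengthOf(action) {
  if (!STRENGTH.has(action)) throw new Error(`unknown action: ${action}`);
  return STRENGTH.get(action);
}

// Assumed rule for merging two preset actions: keep the stronger one
// (on a tie, keep the first).
function strongerOf(a, b) {
  return strengthOf(b) > strengthOf(a) ? b : a;
}

// Assumed rule for an explicit override: same or stronger tier is accepted,
// a weaker tier throws unless allowUnsafeOverrides is set.
function applyOverride(base, override, { allowUnsafeOverrides = false } = {}) {
  if (strengthOf(override) >= strengthOf(base) || allowUnsafeOverrides) {
    return override;
  }
  throw new Error(`refusing to weaken "${base}" to "${override}"`);
}
```

Under this reading, `applyOverride("redact", "block")` is accepted, while `applyOverride("block", "redact")` throws unless the caller passes `{ allowUnsafeOverrides: true }`.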

package/docs/current/release-0.5-implementation-scope.ko.md ADDED

@@ -0,0 +1,69 @@
+# Haechi 0.5 Implementation Scope
+- 문서 상태: Final
+- 작성일: 2026-06-10
+- 기준 버전: 0.5.0 (0.4.0 이후)
+- 성격: streaming hardening
+- 구현 완료: 2026-06-10 — PR #14 (streaming inspection)
+## 1. 릴리스 목표
+streaming 보호 공백을 메운다: SSE/NDJSON 응답 stream을 차단하거나 무보호로 통과시키는 대신 직접 검사(inspect)한다. Streaming은 실제 LLM 사용에서 가장 흔한 전송 방식이므로 "streaming을 쓰면 보호를 포기해야 한다"는 구조가 핵심 잔여 취약점이었다.
+## 2. 범위
+### 2.1 Streaming 응답 검사
+- 새 `streaming.requestMode: "inspect"` (`block` 및 `pass-through`와 병존).
+- `packages/stream-filter`: 두 가지 wire format에 대한 점진적 frame parser — SSE (`data: …\n\n`)와 NDJSON (`{…}\n`). `[DONE]`, keep-alive comment, 비-JSON frame은 원문 그대로 통과한다.
+- 각 protocol-adapter streaming 라우트는 `{ format, deltaPath }` — 주 점진적 텍스트 채널 — 을 선언한다.
+  - OpenAI-compatible / vLLM / llama.cpp chat-completions: SSE, `choices[0].delta.content`
+  - completions: SSE, `choices[0].text`
+  - llama.cpp `/completion`: SSE, `content`
+  - Ollama `/api/chat`: NDJSON, `message.content`
+  - Ollama `/api/generate`: NDJSON, `response`
+  - OpenAI `/v1/responses`: SSE, 고정 delta path 없음 (frame 전체 보호만)
+### 2.2 Cross-frame 정확성 (sliding buffer)
+Stream으로 전송된 바이트는 회수할 수 없으므로 탐지는 값이 방출되기 전에 이루어져야 한다. `core`에 `createStreamProtector`가 추가된다. 이 상태 유지(stateful) protector는 다음과 같이 동작한다.
+- delta 채널의 bounded **raw tail**을 보관한다. push가 들어올 때마다 누적된 pending 텍스트에 대해 탐지를 수행하고, `len - maxMatchBytes`를 commit point로 계산하며, 이를 가로지르는 탐지가 발생하기 전에 commit point를 후퇴시킨다. committed prefix만 변환하여 방출하고 tail은 다음 frame을 위해 보관한다.
+- stream 끝에서 보관 중인 tail을 합성된 최종 frame으로 flush한다.
+- frame의 그 밖의 모든 문자열 리프(tool-call argument 등)에 대해 `protectFrameExtras`를 실행하며, within-frame 보호를 적용한다.
+- `streaming.maxMatchBytes` (기본값 256)는 **보장 범위의 경계**다: window보다 긴 단일 match는 여전히 frame 사이에서 분할될 수 있다. 문서화된 한계 사항.
+### 2.3 Enforcement와 audit
+- Streaming 호출의 request body는 일반 JSON이며 포워딩 전에 일반 요청과 동일하게 보호된다.
+- `block` action은 문제가 되는 값이 방출되기 전에 stream을 중단한다(buffer에 보관 중이며 commit되지 않은 상태). 연결은 종료된다. 이미 방출된 바이트는 회수할 수 없다 — streaming의 문서화된 한계.
+- stream 전체에 대해 한 번 audit 기록: `stream_inspected` 또는 `stream_blocked`, 집계 탐지 횟수만 기록(평문 없음). `identity: null`은 다른 곳과 동일하게 예약.
+- 새 `streaming.responseMode` (`dry-run` | `report-only` | `enforce`, 기본값 `enforce`)로 응답 방향 enforcement 모드를 독립적으로 제어한다.
+### 2.4 Adapter 라우팅 수정
+특정 `target.type` (`ollama`, `vllm-openai`, `llama-cpp`)이 이제 deep-merge된 기본 `target.adapter` (`openai-compatible`)보다 우선된다. 기존에는 `target.type: "ollama"`만 설정한 config가 기본 adapter가 merge 후에도 살아남아 OpenAI 경로로 조용히 라우팅되었다 — 이로 인해 streaming 분류도 무력화되었다.
+## 3. 명시적 비범위 (0.5에서 하지 않음)
+- Stream sequence AAD 및 replay cache (보류; encryption-on-stream 필요 시점에 해당).
+- Per-choice (`n > 1`) cross-frame buffering — secondary choice는 within-frame 보호만 적용.
+- Stream 내부의 base64/인코딩 값 디코딩 (비-streaming과 동일한 제외 항목).
+- MCP의 양방향 streaming (stdio filter는 line-framed JSON-RPC로 이미 처리됨).
+## 4. 테스트 기준
+- Within-frame 및 cross-frame (byte 단위 분할 포함) PII를 SSE와 NDJSON 모두에서 탐지.
+- `[DONE]` / keep-alive / 비-JSON frame 보존.
+- delta 외 PII (tool-call argument) within-frame 보호.
+- `block`은 값 방출 전에 stream 중단; `report-only`는 변환 없이 탐지.
+- Proxy e2e: 요청 보호, 응답 stream-filter, audit chain 유효, audit에 평문 없음.
+- `inspect` 하에서 검사 불가 라우트는 fail-closed (501).
+- `requestMode: inspect`, `responseMode`, `maxMatchBytes`에 대한 config 검증.
+## 5. 문서 영향
+- README: streaming inspection 섹션, config 참조 행, `configuration.md` 업데이트.
+- threat-model: streaming이 "검사 불가, 차단됨"에서 "검사됨(bounded)"으로 이동; `maxMatchBytes` 한계와 block 시 방출된 바이트 한계를 문서화된 제외 항목으로 기재.
+- risk-register: 0.5.0 백로그 행 완료 처리.
+- api-stability: `haechi/stream-filter`와 `createStreamProtector`를 experimental로 표기.

package/docs/current/release-0.5-implementation-scope.md ADDED

@@ -0,0 +1,69 @@
+# Haechi 0.5 Implementation Scope
+- Status: Final
+- Date: 2026-06-10
+- Target version: 0.5.0 (after 0.4.0)
+- Type: streaming hardening
+- Shipped: 2026-06-10 — PR #14 (streaming inspection)
+## 1. Release Goal
+Close the streaming protection gap: inspect SSE/NDJSON response streams instead of only blocking them or passing them through unprotected. Streaming is the common transport for real LLM usage, so "use streaming and give up protection" was the main remaining hole.
+## 2. Scope
+### 2.1 Streaming response inspection
+- New `streaming.requestMode: "inspect"` (alongside `block` and `pass-through`).
+- `packages/stream-filter`: incremental frame parser for two wire formats — SSE (`data: …\n\n`) and NDJSON (`{…}\n`). `[DONE]`, keep-alive comments, and non-JSON frames pass through verbatim.
+- Each protocol-adapter streaming route declares `{ format, deltaPath }`: the primary incremental-text channel.
+  - OpenAI-compatible / vLLM / llama.cpp chat-completions: SSE, `choices[0].delta.content`
+  - completions: SSE, `choices[0].text`
+  - llama.cpp `/completion`: SSE, `content`
+  - Ollama `/api/chat`: NDJSON, `message.content`
+  - Ollama `/api/generate`: NDJSON, `response`
+  - OpenAI `/v1/responses`: SSE, no fixed delta path (whole-frame protection only)
+### 2.2 Cross-frame correctness (sliding buffer)
+Streamed bytes cannot be retracted, so detection must happen before a value is emitted. `core` gains `createStreamProtector`, a stateful protector that:
+- Holds a bounded **raw tail** of the delta channel. On each push it detects on the accumulated pending text, computes a commit point of `len - maxMatchBytes`, and pulls the commit point back before any detection that straddles it. Only the committed prefix is transformed and emitted; the tail is held for the next frame.
+- Flushes the held tail as a synthesized final frame at end of stream.
+- Runs `protectFrameExtras` for all other string leaves of a frame (tool-call arguments, etc.) with within-frame protection.
+- `streaming.maxMatchBytes` (default 256) **bounds the guarantee**: a single match longer than the window may still split across frames. Documented limitation.
+### 2.3 Enforcement and audit
+- The request body of a streaming call is ordinary JSON and is protected like any request before forwarding.
+- `block` actions stop the stream before the offending value is emitted (held in the buffer, never committed); the connection is ended. Bytes already emitted cannot be retracted — a documented limit of streaming.
+- The whole stream is audited once: `stream_inspected` or `stream_blocked`, with aggregate detection counts only (no plaintext). `identity: null` reserved as elsewhere.
+- New `streaming.responseMode` (`dry-run` | `report-only` | `enforce`, default `enforce`) controls the response-direction enforcement mode independently.
+### 2.4 Adapter routing fix
+A specific `target.type` (`ollama`, `vllm-openai`, `llama-cpp`) now takes precedence over a deep-merged default `target.adapter` (`openai-compatible`). Previously a config that set only `target.type: "ollama"` was silently routed to OpenAI paths because the default adapter survived the merge — which also defeated streaming classification.
+## 3. Explicit non-scope (not in 0.5)
+- Stream sequence AAD and replay cache (deferred; relevant once encryption-on-stream is needed).
+- Per-choice (`n > 1`) cross-frame buffering — secondary choices get within-frame protection only.
+- Decoding base64/encoded values inside streams (same exclusion as non-streaming).
+- Bidirectional streaming for MCP (the stdio filter is line-framed JSON-RPC, already handled).
+## 4. Test criteria
+- Within-frame and cross-frame (including byte-by-byte split) PII caught in both SSE and NDJSON.
+- `[DONE]` / keep-alive / non-JSON frames preserved.
+- Non-delta PII (tool-call args) protected within-frame.
+- `block` stops the stream before emitting the value; `report-only` detects without transforming.
+- Proxy e2e: request protected, response stream-filtered, audit chain valid, no plaintext in audit.
+- Uninspectable route under `inspect` fails closed (501).
+- Config validation for `requestMode: inspect`, `responseMode`, `maxMatchBytes`.
+## 5. Documentation impact
+- README: streaming inspection section, config reference rows, `configuration.md` updates.
+- threat-model: streaming moves from "uninspectable, blocked" to "inspected (bounded)"; the `maxMatchBytes` limit and emitted-bytes-on-block limit are documented exclusions.
+- risk-register: 0.5.0 backlog row checked off.
+- api-stability: `haechi/stream-filter` and `createStreamProtector` marked experimental.
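
Reviewer note (not part of the package diff): section 2.2 of the scope document describes holding a bounded raw tail and pulling the commit point back before any detection that straddles it. The sketch below illustrates that idea only; it is not the code in `packages/core`. The names `createTailBuffer` and `detectEmails`, the simplified email regex, and the use of string length (UTF-16 code units) in place of bytes are all assumptions. The `[REDACTED:<type>]` replacement format is taken from the configuration reference.

```javascript
// Hypothetical sketch of the held-tail technique; not Haechi's createStreamProtector.
// Simplified email pattern, for illustration only.
const EMAIL = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

// Returns non-overlapping matches in ascending order of position.
function detectEmails(text) {
  return [...text.matchAll(EMAIL)].map((m) => ({
    type: "email",
    start: m.index,
    end: m.index + m[0].length,
  }));
}

function createTailBuffer({ detect = detectEmails, maxMatchBytes = 256 } = {}) {
  let pending = ""; // raw text received but not yet emitted

  // Emit pending[0, upTo) with complete matches redacted; keep the rest as the tail.
  function emit(upTo, matches) {
    let out = "";
    let cursor = 0;
    for (const m of matches) {
      if (m.end > upTo) break; // this match lies in the held tail
      out += pending.slice(cursor, m.start) + `[REDACTED:${m.type}]`;
      cursor = m.end;
    }
    out += pending.slice(cursor, upTo);
    pending = pending.slice(upTo);
    return out;
  }

  return {
    // Called once per frame with that frame's delta text.
    push(chunk) {
      pending += chunk;
      const matches = detect(pending);
      // Default commit point: everything except the last maxMatchBytes units.
      let commit = Math.max(0, pending.length - maxMatchBytes);
      // Pull the commit point back before a match that straddles it.
      for (const m of matches) {
        if (m.start < commit && m.end > commit) commit = m.start;
      }
      return emit(commit, matches);
    },
    // Called at end of stream: the held tail goes out as a final piece.
    flush() {
      return emit(pending.length, detect(pending));
    },
  };
}
```

A caller would forward the return value of each `push` downstream and send the result of `flush` when the upstream closes. As the scope document notes for `maxMatchBytes`, a value longer than the window can still be split: its first part may be committed before the full match becomes detectable.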

package/docs/current/release-process.ko.md CHANGED

@@ -24,11 +24,11 @@ npm run release:preflight:npm
 의도된 publish 경로는 GitHub Actions trusted publishing이다: npm이 release workflow를 OIDC로 인증하고 provenance 증명을 자동 생성한다. 공식 npm 요구사항에 따라 GitHub-hosted runner, `id-token: write`, 연결된 workflow에서의 publish가 필요하다.
-**현재 상태: trusted publishing 구성 완료, 첫 증명 릴리스 대기.** `haechi@0.3.2`는 로컬 머신에서 패스키 인증과 `--provenance=false`로 배포되어 해당 버전의 provenance 증명이 존재하지 않는다. 활성화 runbook과 진행 상태:
+**현재 상태: trusted publishing 구성 및 검증 완료.** `haechi@0.3.2`는 로컬 머신에서 패스키 인증과 `--provenance=false`로 배포되어 해당 버전의 provenance 증명이 존재하지 않는다. 활성화 runbook과 진행 상태:
 1. ✅ npmjs.com에서: package settings → Trusted Publisher → `raeseoklee/haechi` 저장소와 `npm-publish.yml` workflow 연결 (2026-06-10).
 2. ✅ `.github/workflows/npm-publish.yml` OIDC 인증 전환 (2026-06-10): `NODE_AUTH_TOKEN`과 `registry-url` 제거, runner의 npm CLI를 `>= 11.5.1`로 업그레이드.
-3. ⏳ 다음 릴리스 후 `npm view haechi --json`(`dist.attestations`)으로 증명을 확인. OIDC 경로는 아직 실제 publish를 수행한 적이 없으며, 잘못 구성된 경우 publish 시점에 fail-closed로 실패한다.
+3. ✅ `haechi@0.4.0`으로 검증 완료 (2026-06-10): `npm view haechi --json`에서 SLSA provenance v1 predicate를 가진 `dist.attestations` 확인. 로컬 패스키로 배포한 `haechi@0.3.2`만 비증명 상태로 남는다.
 provenance 없이 수행한 publish는 release note에 갭을 명시적으로 기록해야 한다(`CONTRIBUTING.md` 참조).

package/docs/current/release-process.md CHANGED

@@ -24,11 +24,11 @@ Before the first publish, it is normal for `npm view <package> version` to retur
 The intended publish path is GitHub Actions trusted publishing: npm authenticates the release workflow via OIDC and generates a provenance statement automatically. Per the official npm requirements this needs a GitHub-hosted runner, `id-token: write`, and a publish from the linked workflow.
-**Current state: trusted publishing is configured; first attested release pending.** `haechi@0.3.2` was published from a local machine using passkey authentication with `--provenance=false`, so no provenance attestation exists for that version. The enablement runbook and its status:
+**Current state: trusted publishing is configured and verified.** `haechi@0.3.2` was published from a local machine using passkey authentication with `--provenance=false`, so no provenance attestation exists for that version. The enablement runbook and its status:
 1. ✅ On npmjs.com: package settings → Trusted Publisher → linked the `raeseoklee/haechi` repository and the `npm-publish.yml` workflow (2026-06-10).
 2. ✅ `.github/workflows/npm-publish.yml` authenticates via OIDC (2026-06-10): `NODE_AUTH_TOKEN` and `registry-url` removed, npm CLI upgraded to `>= 11.5.1` in the runner.
-3. ⏳ After the next release, verify the attestation with `npm view haechi --json` (`dist.attestations`). The OIDC path has not carried a real publish yet; if misconfigured it fails closed at publish time.
+3. ✅ Verified with `haechi@0.4.0` (2026-06-10): `npm view haechi --json` shows `dist.attestations` with a SLSA provenance v1 predicate. Only `haechi@0.3.2` remains unattested (published via local passkey).
 Any publish performed without provenance must record the gap explicitly in the release notes (see `CONTRIBUTING.md`).

package/docs/current/risk-register-release-gate.ko.md CHANGED

@@ -2,7 +2,7 @@
 - 문서 상태: Draft 0.3
 - 작성일: 2026-06-10
-- 기준 버전: 0.4.0
+- 기준 버전: 0.5.0
 - 기준 브랜치: `main`
 ## 1. 현재 판단
@@ -127,7 +127,7 @@ base64/인코딩 값 디코딩 검사, query string 검사, audit tail truncatio
 | 버전 | 목표 | 남은 범위 |
 |---|---|---|
 | 0.4.0 ✅ | token round-trip and adoption | 2026-06-10 구현 완료: 요청 스코프 response detokenization, deterministic tokenization(파생 키), `haechi mcp-wrap`, `haechi audit-verify`/`haechi status`, injection detection type(기본 allow), `identity`/`authProvider` 계약 예약. `docs/current/release-0.4-implementation-scope.md` 참조 |
-| 0.5.0 | streaming hardening | SSE/NDJSON stream inspection, stream sequence AAD, replay cache, stronger remote deployment guide |
+| 0.5.0 ✅ | streaming hardening | 2026-06-10 출시: bounded cross-frame 버퍼를 사용한 SSE/NDJSON 스트리밍 응답 검사(`streaming.requestMode: inspect`). stream sequence AAD, replay cache, 강화된 원격 배포 가이드는 0.6+으로 이월. `docs/current/release-0.5-implementation-scope.md` 참조 |
 | 0.6.0 | auth and 운영 통제 | built-in bearer auth, client별 policy scope, model allowlist/rate budget, Vault/AWS KMS reference adapter, external append-only audit sink, signed release artifacts, npm org(`@haechi/*`) 확보 |
 | 0.7.0 | observability | npm workspaces 전환, `@haechi/dashboard` read-only audit viewer (hash chain 무결성 표시, 요약/검색/타임라인) |
 | 1.0.0 | stable API contract | migration policy, long-term audit schema, plugin sandbox/runtime conformance 및 allowlist/manifest 통과 외부 auth/classifier package 동적 로딩 |

package/docs/current/risk-register-release-gate.md CHANGED

@@ -2,7 +2,7 @@
 - Status: Draft 0.3
 - Date: 2026-06-10
-- Target version: 0.4.0
+- Target version: 0.5.0
 - Branch: `main`
 ## 1. Current Assessment
@@ -127,7 +127,7 @@ All checklist items below were completed for 0.3.2 on 2026-06-10 except the prov
 | Version | Goal | Remaining scope |
 |---|---|---|
 | 0.4.0 ✅ | Token round-trip and adoption | Shipped 2026-06-10: request-scoped response detokenization, deterministic tokenization (derived key), `haechi mcp-wrap`, `haechi audit-verify`/`haechi status`, injection detection type (default allow), `identity`/`authProvider` contracts reserved. See `docs/current/release-0.4-implementation-scope.md` |
-| 0.5.0 | Streaming hardening | SSE/NDJSON stream inspection, stream sequence AAD, replay cache, stronger remote deployment guide |
+| 0.5.0 ✅ | Streaming hardening | Shipped 2026-06-10: SSE/NDJSON streaming response inspection with bounded cross-frame buffer (`streaming.requestMode: inspect`). Stream sequence AAD, replay cache, stronger remote deployment guide deferred to 0.6+. See `docs/current/release-0.5-implementation-scope.md` |
 | 0.6.0 | Auth and operational controls | Built-in bearer auth, per-client policy scope, model allowlist/rate budget, Vault/AWS KMS reference adapter, external append-only audit sink, signed release artifacts, npm org (`@haechi/*`) acquisition |
 | 0.7.0 | Observability | npm workspaces migration, `@haechi/dashboard` read-only audit viewer (hash chain integrity display, summary/search/timeline) |
 | 1.0.0 | Stable API contract | Migration policy, long-term audit schema, plugin sandbox/runtime conformance, and dynamic loading of external auth/classifier packages that pass allowlist/manifest |

package/docs/current/threat-model.ko.md CHANGED

@@ -2,7 +2,7 @@
 - 문서 상태: Draft 0.1
 - 작성일: 2026-06-10
-- 기준 버전: 0.4.0
+- 기준 버전: 0.5.0
 ## 1. 보호 대상
@@ -24,7 +24,7 @@ Haechi가 보호하려는 주요 자산은 다음이다.
 | CLI local process | 개발자 로컬 신뢰 | dev key 경고, dry-run 기본값 |
 | HTTP proxy listener | 비신뢰 client 입력 | loopback bind 기본, remote bind 명시 플래그 |
 | Upstream model/tool server | 비신뢰 또는 부분 신뢰 | request/response protection, uninspectable response fail-closed |
-| Streaming response | 현재 비검사 영역 | `stream: true` 기본 차단 |
+| Streaming response | 검사(bounded) 또는 차단 | `inspect` 모드는 bounded cross-frame 버퍼로 SSE/NDJSON을 stream-filter함; `block`(기본값)은 거부 |
 | MCP stdio peer | 부분 신뢰 | JSON-RPC 2.0 요구, method allowlist |
 | Local filesystem | 부분 신뢰 | local key/token vault 0600, audit hash chain |
 | External provider/plugin | 비신뢰 | provider method contract, plugin manifest-only gate |
@@ -34,7 +34,7 @@ Haechi가 보호하려는 주요 자산은 다음이다.
 | 위협 | 영향 | 현재 통제 |
 |---|---|---|
 | 인터넷 노출 proxy | 인증 없는 LLM gateway | non-loopback bind 기본 실패 |
-| streaming 우회 | SSE/NDJSON 평문 유출 | streaming request 기본 실패 |
+| streaming 우회 | SSE/NDJSON 평문 유출 | `inspect` 모드는 SSE/NDJSON을 stream-filter함; `block`(기본값)은 거부; `pass-through`는 명시적으로 감사된 opt-out |
 | Ollama 암묵 streaming 우회 | `stream` 생략 시 NDJSON 평문 유출 | `/api/chat`·`/api/generate`는 `stream: false` 명시 없으면 streaming으로 간주해 기본 차단 |
 | 비JSON/압축/대용량 응답 | responseProtection 우회 | fail-closed response policy |
 | token reveal 남용 | tokenized PII 복원 | revealPolicy 기본 disabled, reveal/purge 결정 audit 기록 |
@@ -56,7 +56,9 @@ Haechi가 보호하려는 주요 자산은 다음이다.
 - 운영 KMS/HSM/Vault adapter 자체 제공
 - internet-facing gateway 인증/인가
-- SSE/NDJSON stream inspection
+- `streaming.maxMatchBytes`보다 긴 cross-frame 매칭(스트림 프레임에 걸쳐 분할될 수 있음)
+- `block`이 발동되기 전에 이미 방출된 스트림 바이트의 회수
+- 스트림에서 choice별(`n > 1`) cross-frame 버퍼링(보조 choice는 프레임 내 보호만 적용)
 - 법적 컴플라이언스 인증
 - 모델 hallucination, prompt injection 완전 방어
 - 외부 MCP server의 OAuth/resource binding 검증

package/docs/current/threat-model.md CHANGED

@@ -24,7 +24,7 @@ The primary assets Haechi protects are:
 | CLI local process | Developer local trust | Dev key warning, dry-run default |
 | HTTP proxy listener | Untrusted client input | Loopback bind by default, remote bind requires explicit flag |
 | Upstream model/tool server | Untrusted or partially trusted | Request/response protection, uninspectable response fail-closed |
-| Streaming response | Currently uninspected | `stream: true` blocked by default |
+| Streaming response | Inspected (bounded) or blocked | `inspect` stream-filters SSE/NDJSON with a bounded cross-frame buffer; `block` (default) refuses |
 | MCP stdio peer | Partially trusted | JSON-RPC 2.0 required, method allowlist |
 | Local filesystem | Partially trusted | Local key/token vault at 0600, audit hash chain |
 | External provider/plugin | Untrusted | Provider method contract, plugin manifest-only gate |
@@ -34,7 +34,7 @@ The primary assets Haechi protects are:
 | Threat | Impact | Current Control |
 |---|---|---|
 | Internet-exposed proxy | Unauthenticated LLM gateway | Non-loopback bind fails by default |
-| Streaming bypass | SSE/NDJSON plaintext leak | Streaming requests fail by default |
+| Streaming bypass | SSE/NDJSON plaintext leak | `inspect` mode stream-filters SSE/NDJSON; `block` (default) refuses; `pass-through` is an explicit audited opt-out |
 | Ollama implicit streaming bypass | NDJSON plaintext leak when `stream` is omitted | `/api/chat` and `/api/generate` are treated as streaming unless `stream: false` is explicit; blocked by default |
 | Non-JSON / compressed / oversized response | responseProtection bypass | Fail-closed response policy |
 | Token reveal abuse | Restoration of tokenized PII | `revealPolicy` disabled by default; reveal/purge decisions recorded in audit |
@@ -56,7 +56,9 @@ The primary assets Haechi protects are:
 - A production KMS/HSM/Vault adapter
 - Authentication/authorization for internet-facing gateways
-- SSE/NDJSON stream inspection
+- Cross-frame matches longer than `streaming.maxMatchBytes` (may still split across stream frames)
+- Retraction of stream bytes already emitted before a `block` fires
+- Per-choice (`n > 1`) cross-frame buffering in streams (secondary choices get within-frame protection only)
 - Legal compliance certification
 - Complete defense against model hallucination or prompt injection
 - OAuth/resource binding validation for external MCP servers

package/haechi.config.example.json CHANGED

@@ -18,7 +18,9 @@
     "maxBytes": 1048576
   },
   "streaming": {
-    "requestMode": "block"
+    "requestMode": "block",
+    "responseMode": "enforce",
+    "maxMatchBytes": 256
   },
   "limits": {
     "maxRequestBytes": 1048576,

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "haechi",
-  "version": "0.4.0",
+  "version": "0.5.0",
   "description": "Experimental developer preview for self-hosted AI context enforcement across LLM, MCP, vLLM, Ollama, and agent traffic.",
   "license": "Apache-2.0",
   "type": "module",
@@ -44,7 +44,8 @@
     "./protocol-adapters": "./packages/protocol-adapters/index.mjs",
     "./proxy": "./packages/proxy/index.mjs",
     "./runtime": "./packages/cli/runtime.mjs",
-    "./token-vault": "./packages/token-vault/index.mjs"
+    "./token-vault": "./packages/token-vault/index.mjs",
+    "./stream-filter": "./packages/stream-filter/index.mjs"
   },
   "files": [
     "README.md",