npm - haechi - Versions diffs - 0.4.0 → 0.6.0 - Mend

haechi 0.4.0 → 0.6.0


Files changed (27)

package/README.ko.md +264 -0
package/README.md +50 -4
package/docs/README.md +4 -6
package/docs/current/api-stability.ko.md +4 -1
package/docs/current/api-stability.md +4 -1
package/docs/current/configuration.ko.md +233 -0
package/docs/current/configuration.md +233 -0
package/docs/current/release-0.5-implementation-scope.ko.md +69 -0
package/docs/current/release-0.5-implementation-scope.md +69 -0
package/docs/current/release-0.6-implementation-scope.ko.md +151 -0
package/docs/current/release-0.6-implementation-scope.md +151 -0
package/docs/current/release-process.ko.md +2 -2
package/docs/current/release-process.md +2 -2
package/docs/current/risk-register-release-gate.ko.md +3 -3
package/docs/current/risk-register-release-gate.md +4 -3
package/docs/current/threat-model.ko.md +8 -4
package/docs/current/threat-model.md +8 -4
package/haechi.config.example.json +13 -1
package/package.json +4 -2
package/packages/auth/index.mjs +170 -0
package/packages/cli/bin/haechi.mjs +253 -27
package/packages/cli/runtime.mjs +113 -7
package/packages/core/index.mjs +126 -6
package/packages/policy/index.mjs +82 -0
package/packages/protocol-adapters/index.mjs +33 -14
package/packages/proxy/index.mjs +237 -4
package/packages/stream-filter/index.mjs +194 -0

package/docs/current/configuration.ko.md ADDED

@@ -0,0 +1,233 @@
+# Haechi 설정 레퍼런스
+- 문서 상태: Living document
+- 기준 버전: 0.6.0
+`haechi init`은 `haechi.config.json`을 생성하며, 비밀 정보를 포함하지 않는 템플릿은 `haechi.config.example.json`에 있다. 모든 커맨드는 `--config <path>`로 설정 파일을 읽는다(기본값: `haechi.config.json`). 설정은 **fail-closed 방식으로 검증**된다: 알 수 없는 provider, 범위를 벗어난 숫자, 잘못된 형식의 값은 자동으로 무시되지 않고 로드 시점에 오류를 발생시킨다. `haechi config`는 이 레퍼런스를 출력하며, `haechi status`는 특정 설정 파일의 *실제 적용* 상태를 출력한다.
+## 전체 기본값
+```json
+{
+  "mode": "dry-run",
+  "target": { "type": "llm-http", "adapter": "openai-compatible", "upstream": "http://127.0.0.1:9999" },
+  "proxy": { "host": "127.0.0.1", "port": 1016 },
+  "responseProtection": { "enabled": false, "mode": "enforce", "failureMode": "fail-closed", "allowNonJson": false, "allowCompressed": false, "maxBytes": 1048576 },
+  "streaming": { "requestMode": "block" },
+  "limits": { "maxRequestBytes": 1048576, "upstreamTimeoutMs": 120000 },
+  "policy": { "mode": "dry-run", "presets": ["korean-pii", "secrets-only", "llm-redact"], "defaultAction": "redact", "actions": { "card": "block" } },
+  "filters": { "customRules": [] },
+  "keys": { "provider": "local", "keyFile": ".haechi/dev.keys.json" },
+  "audit": { "sink": "jsonl", "path": ".haechi/audit.jsonl" },
+  "tokenVault": { "provider": "local", "path": ".haechi/token-vault.json", "revealPolicy": "disabled", "retentionDays": 30, "deterministic": false, "deterministicTypes": null, "detokenizeResponses": false },
+  "privacy": { "profile": null },
+  "mcp": { "allowedMethods": ["initialize", "tools/call", "resources/read", "prompts/get"], "protectParams": true, "protectResults": true, "requireJsonRpc": true }
+}
+```
+## 최상위
+| 키 | 타입 / 값 | 기본값 | 설명 |
+|---|---|---|---|
+| `mode` | `dry-run` \| `report-only` \| `enforce` | `dry-run` | 전역 집행 모드. `dry-run`/`report-only`는 탐지 및 audit만 수행하며, `enforce`는 변환/차단을 적용한다. `policy.mode`가 설정된 경우 해당 값이 우선한다. |
+## `target`
+| 키 | 타입 / 값 | 기본값 | 설명 |
+|---|---|---|---|
+| `target.type` | `llm-http` \| `openai-compatible` \| `vllm-openai` \| `ollama` \| `llama-cpp` | `llm-http` | 프로토콜 adapter를 선택한다. `llm-http`는 `openai-compatible`의 별칭이다. 알 수 없는 값은 로드 시 **fail-closed**로 처리된다. |
+| `target.adapter` | 동일한 값 집합 | `openai-compatible` | adapter를 명시적으로 지정한다. 보통은 설정하지 않고 `type`이 결정하도록 두면 된다. |
+| `target.upstream` | URL 문자열 | `http://127.0.0.1:9999` | proxy가 요청을 전달하는 유일한 upstream. 요청 대상은 origin-form 경로여야 하며, 절대 URL 대상은 거부된다(SSRF 방어). |
+## `proxy`
+| 키 | 타입 / 값 | 기본값 | 설명 |
+|---|---|---|---|
+| `proxy.host` | 비어 있지 않은 문자열 | `127.0.0.1` | 바인드 주소. loopback이 아닌 host를 사용하려면 `--allow-remote-bind` CLI 플래그가 필요하다 — 설정 파일만으로는 시작되지 않는다([loopback 밖으로 바인딩](#binding-beyond-loopback) 참고). |
+| `proxy.port` | 정수 0–65535 | `1016` | 리슨 포트(`0` = 임시 포트). `--port`로 실행 시마다 덮어쓸 수 있다. |
+## `responseProtection`
+upstream JSON 응답을 검사한다(기본적으로 꺼져 있음 — 모델로부터 *돌아오는* 내용을 보호하려면 활성화한다).
+| 키 | 타입 / 값 | 기본값 | 설명 |
+|---|---|---|---|
+| `responseProtection.enabled` | boolean | `false` | 마스터 스위치. `detokenizeResponses`가 작동하려면 반드시 활성화되어 있어야 한다. |
+| `responseProtection.mode` | `dry-run` \| `report-only` \| `enforce` | `enforce` | 응답 방향의 집행 모드. |
+| `responseProtection.failureMode` | `fail-closed` \| `allow` | `fail-closed` | *검사 불가능한* 응답(비JSON, 잘못된 JSON, 압축)에 대한 처리 방식. `fail-closed`는 502를 반환하고, `allow`는 통과시킨다(audit 기록됨). |
+| `responseProtection.allowNonJson` | boolean | `false` | 비JSON 응답을 검사 없이 통과시킨다. |
+| `responseProtection.allowCompressed` | boolean | `false` | 압축 응답을 검사 없이 통과시킨다. |
+| `responseProtection.maxBytes` | 양의 정수 | `1048576` | 응답 크기의 상한. `failureMode: allow` 상태에서도 적용되며, 크기를 초과한 응답은 항상 거부된다. |
+## `streaming`
+| 키 | 타입 / 값 | 기본값 | 설명 |
+|---|---|---|---|
+| `streaming.requestMode` | `block` \| `pass-through` \| `inspect` | `block` | `block`은 스트리밍 요청에 `501`을 반환한다; `inspect`는 bounded cross-frame 버퍼로 SSE/NDJSON 응답을 stream-filter한다; `pass-through`는 검사 없이 전달한다(감사됨). Ollama의 `/api/chat`과 `/api/generate`는 `stream: false`가 명시되지 않으면 streaming으로 간주된다. |
+| `streaming.responseMode` | `dry-run` \| `report-only` \| `enforce` | `enforce` | 검사된 스트림에 적용되는 집행 모드(요청 방향과 독립적). |
+| `streaming.maxMatchBytes` | 양의 정수 | `256` | inspect 시 cross-frame 매칭 윈도우. 이 크기의 tail을 보유하여 프레임에 걸친 탐지를 방출 전에 포착할 수 있다; 이 값보다 긴 단일 매칭은 프레임에 걸쳐 분할될 수 있다. |
+## `limits`
+| 키 | 타입 / 값 | 기본값 | 설명 |
+|---|---|---|---|
+| `limits.maxRequestBytes` | 양의 정수 | `1048576` | 요청 바디 크기 상한. 초과 시 `413`을 반환한다. 바디를 전부 버퍼링하지 않고 증분 방식으로 적용된다. |
+| `limits.upstreamTimeoutMs` | 양의 정수 | `120000` | upstream 요청 타임아웃. 만료 시 `504 haechi_upstream_timeout`을 반환한다. 연결 실패 시에는 `502 haechi_upstream_unreachable`을 반환한다. |
+## `policy`
+탐지→결정의 핵심. [Detection type과 action](#detection-types--actions) 참고.
+| 키 | 타입 / 값 | 기본값 | 설명 |
+|---|---|---|---|
+| `policy.mode` | `dry-run` \| `report-only` \| `enforce` | `dry-run` | 실제 적용되는 집행 모드(`policy.mode ?? mode`). |
+| `policy.presets` | preset 이름 배열 | `["korean-pii", "secrets-only", "llm-redact"]` | 순서대로 병합되는 내장 action 집합. [Presets](#presets) 참고. |
+| `policy.defaultAction` | action | `redact` | 명시적 매핑이 없는 탐지 type에 적용되는 action. |
+| `policy.actions` | `{ <type>: <action> }` | `{ "card": "block" }` | type별 개별 재정의. 병합 시 **강화**는 가능하지만 약화는 불가([Action strength](#action-strength) 참고). `injection`은 설정하지 않으면 기본적으로 `allow`이다. |
+| `policy.allowUnsafeOverrides` | boolean | `false` | 더 약한 action이 더 강한 action을 덮어쓰는 것을 허용한다. 기본적으로 꺼져 있으며, 활성화하면 안전 장치가 제거된다. |
+| `policy.bundlePath` | 경로 | 미설정 | 인라인 정책 대신 서명된 policy bundle을 로드한다(`keys.keyFile`에 대해 검증됨). |
+## `filters`
+| 키 | 타입 / 값 | 기본값 | 설명 |
+|---|---|---|---|
+| `filters.customRules` | 규칙 객체 배열 | `[]` | 추가 탐지 규칙: `{ id, type, pattern, flags?, confidence? }`. 패턴은 ReDoS 검사를 통과해야 하며(≤500자, 중첩 한정자 없음, 역참조 없음), 안전하지 않으면 로드 시 거부된다. |
+## `keys`
+| 키 | 타입 / 값 | 기본값 | 설명 |
+|---|---|---|---|
+| `keys.provider` | `local` \| `external` | `local` | `local`은 소프트웨어 AES-256-GCM 키 파일을 사용한다(개발 전용). `external`은 키 자료를 포함하지 않으며, `createRuntime(config, { cryptoProvider })`를 통해 crypto provider를 주입해야 한다. |
+| `keys.keyFile` | 경로 | `.haechi/dev.keys.json` | 로컬 키 파일(모드 `0600`). `haechi init --force`는 키를 교체하며, 기존 키는 `kid`로 기존 암호문/token이 복호화 가능하도록 퇴역 상태로 보관된다. |
+## `audit`
+| 키 | 타입 / 값 | 기본값 | 설명 |
+|---|---|---|---|
+| `audit.sink` | `jsonl` | `jsonl` | `jsonl`만 지원된다. |
+| `audit.path` | 경로 | `.haechi/audit.jsonl` | SHA-256 hash chain 로그. `haechi audit-verify`로 검증한다. 평문/PII를 포함하지 않는다. |
+## `tokenVault`
+| 키 | 타입 / 값 | 기본값 | 설명 |
+|---|---|---|---|
+| `tokenVault.provider` | `local` | `local` | `local`만 지원된다. |
+| `tokenVault.path` | 경로 | `.haechi/token-vault.json` | 암호화된 token 저장소(원자적 쓰기, 파일 락). |
+| `tokenVault.revealPolicy` | `disabled` \| `local-dev` | `disabled` | **수동** reveal(`token-reveal`)을 허용할지 결정한다. 모든 reveal/purge 결정은 audit 기록된다. detokenization과는 독립적이다. |
+| `tokenVault.retentionDays` | 양의 수 | `30` | Token TTL. 만료된 token은 vault 쓰기 시 또는 `token-purge --expired`로 삭제된다. |
+| `tokenVault.deterministic` | boolean | `false` | 동일한 `(type, value)` → 동일한 token(도메인 분리된 파생 키로 HMAC). 멀티턴에 필요하다. **트레이드오프:** 동일한 값이 연결 가능해진다. |
+| `tokenVault.deterministicTypes` | `null` \| 비어 있지 않은 문자열 배열 | `null` | `null`이면 deterministic 활성화 시 모든 type에 적용. 그렇지 않으면 열거된 type에만 determinism을 제한한다(예: `["email"]`). |
+| `tokenVault.detokenizeResponses` | boolean | `false` | 해당 요청을 처리하며 발급한 token을 응답에서 복원한다. 동일 요청을 보호하며 발급된 token만 복원되며, `responseProtection.enabled`가 필요하다. 개수 단위로 audit 기록된다. |
+## `privacy`
+| 키 | 타입 / 값 | 기본값 | 설명 |
+|---|---|---|---|
+| `privacy.profile` | `null` \| `kr-pipa` \| `eu-gdpr` \| `us-general` | `null` | 집행 전에 지역별 기준 action 집합을 적용한다. 프로필은 명시적 action을 **강화**할 수는 있지만 약화할 수는 없다. 엔지니어링 기본값이며, 법적 자문이 아니다. |
+## `mcp`
+`mcp-stdio`와 `mcp-wrap`에 적용된다.
+| 키 | 타입 / 값 | 기본값 | 설명 |
+|---|---|---|---|
+| `mcp.allowedMethods` | 비어 있지 않은 문자열 배열 | `["initialize", "tools/call", "resources/read", "prompts/get"]` | 클라이언트가 호출할 수 있는 method allowlist(`"*"`는 전체 허용). 서버가 먼저 시작하는 요청은 allowlist를 우회하지만 params는 여전히 보호된다. |
+| `mcp.protectParams` | boolean | `true` | 요청 `params`를 보호한다. |
+| `mcp.protectResults` | boolean | `true` | 응답 `result`를 보호한다(injection 휴리스틱도 실행). |
+| `mcp.requireJsonRpc` | boolean | `true` | `jsonrpc: "2.0"`을 요구하며, 규격에 맞지 않는 메시지는 거부된다. |
+## `auth`
+| 키 | 타입 / 값 | 기본값 | 설명 |
+|---|---|---|---|
+| `auth.provider` | `none` \| `bearer` \| `external` | `none` | `none` = 인증 없음(identity null). `bearer` = 내장 token auth. `external`은 `createRuntime(config, { authProvider })`를 통해 `authProvider`를 주입해야 한다. |
+| `auth.store` | 경로 | `.haechi/auth.json` | Bearer token 저장소(모드 `0600`). Token은 keyed-HMAC 해시로만 보관되며, 평문은 `haechi auth add` 실행 시 한 번만 표시된다. |
+| `auth.allowedLabelKeys` | 문자열 배열 | `["team", "env", "tier", "role"]` | Token이 가질 수 있는 label 키; 값은 길이가 제한되며 PII를 포함하면 안 된다. |
+## `policy` profiles & limits
+기본 `policy` 위에 클라이언트별 통제를 레이어로 추가한다. [Named profiles](#named-profiles) 참고.
+| 키 | 타입 / 값 | 기본값 | 설명 |
+|---|---|---|---|
+| `policy.profiles` | `{ <name>: { presets?, actions?, modelAllowlist?, rate? } }` | `{}` | Named profile; 각각 기본 policy를 재정의한다. |
+| `policy.profileBinding` | `{ byScope?, byLabel?, default }` | 미설정 | identity scope/label(`"k=v"` 형태)을 profile 이름으로 매핑한다. `profiles`가 설정된 경우 `default`는 **필수**이며 가장 엄격한 profile이어야 한다(fail-closed). |
+| `policy.modelAllowlist` | 문자열 배열 | 미설정 | 허용된 `model` 값(기본 레벨; profile별로도 설정 가능). 허용되지 않은 모델 → `403`. 비어 있거나 없으면 모두 허용. |
+| `policy.rate` | `{ requestsPerMinute }` | 미설정 | identity별 요청 rate limit(기본 레벨 또는 profile별). 초과 시 → `429`. 인메모리, 프로세스별. |
+### Named profiles
+identity가 인증되면 **scope → label → `default`** 순으로 profile이 resolve된다; scope가 label보다 우선하며 첫 번째 매칭이 적용된다. `profiles`가 없거나 `auth.provider: none`인 경우 기본 policy가 적용된다. Resolve된 profile의 policy 엔진, `modelAllowlist`, `rate`가 해당 요청을 처리한다.
+## Detection type과 action
+내장 탐지 `type` 값: `email`, `phone`, `kr_rrn`, `card`, `api_key`, `secret`, `injection`(응답 방향 휴리스틱, 기본 report-only). 커스텀 규칙으로 새로운 type을 추가할 수 있다.
+Action(약한 것 → 강한 것 순):
+| Action | 효과 |
+|---|---|
+| `allow` | 변경 없음(탐지 및 audit은 기록됨). |
+| `redact` | `[REDACTED:<type>]`으로 교체한다. |
+| `mask` | 부분 마스킹한다(값이 8자 이하이면 전체 마스킹). |
+| `tokenize` | vault token으로 교체한다. token vault를 통해 복원 가능하다. |
+| `encrypt` | 인라인 AES-256-GCM 봉투로 교체한다. |
+| `block` | 전체 payload를 거부한다(`403`/`-32001`/exit 3). |
+### Action strength
+preset과 override(또는 privacy profile)가 충돌할 경우 **강한** action이 우선하며, `policy.allowUnsafeOverrides`가 `true`가 아니면 더 강한 action을 약화하려 할 경우 오류가 발생한다. 강도 순: `allow`(0) < `redact`/`mask`(1) < `tokenize`/`encrypt`(2) < `block`(3).
+### Presets
+| Preset | 효과 |
+|---|---|
+| `llm-redact` | 기본 `redact`; `email: redact`, `phone: mask` |
+| `korean-pii` | `kr_rrn: block`, `phone: mask`, `email: redact` |
+| `secrets-only` | `api_key: block`, `secret: block` |
+| `strict-block` | 기본 `block` |
+| `mcp-basic` | 기본 `redact`; `api_key`/`secret`/`kr_rrn: block` |
+| `local-inference` | 기본 `redact`; `email: tokenize`, `phone: mask`, secrets/`kr_rrn: block` |
+| `local-only` | 전송을 외부 전송이 아닌 것으로 표시(메타데이터) |
+## 자주 쓰는 설정
+**enforce 모드에서 요청 보호(최소 설정):**
+```json
+{ "mode": "enforce", "policy": { "mode": "enforce" } }
+```
+**로컬 inference, response protection + token round-trip:**
+```json
+{
+  "mode": "enforce",
+  "target": { "type": "vllm-openai", "upstream": "http://127.0.0.1:8000" },
+  "policy": { "mode": "enforce", "presets": ["local-inference"] },
+  "responseProtection": { "enabled": true, "mode": "enforce" },
+  "tokenVault": { "deterministic": true, "detokenizeResponses": true }
+}
+```
+**EU 프로필, secret 차단, injection 플래그:**
+```json
+{
+  "mode": "enforce",
+  "privacy": { "profile": "eu-gdpr" },
+  "policy": { "mode": "enforce", "actions": { "injection": "redact" } },
+  "responseProtection": { "enabled": true }
+}
+```
+## loopback 밖으로 바인딩
+proxy는 CLI 플래그를 명시적으로 전달하지 않으면 loopback이 아닌 host를 거부한다 — 설정 파일에 `proxy.host: "0.0.0.0"`을 지정해도 의도적으로 시작되지 않는다:
+```bash
+haechi proxy --config haechi.config.json --host 0.0.0.0 --allow-remote-bind
+```
+**proxy는 아직 클라이언트 인증을 제공하지 않는다**(0.6 계획): 포트에 접근할 수 있는 누구든 upstream과 token round-trip 경로를 사용할 수 있다. `--allow-remote-bind`는 명시적인 네트워크 통제 하에서만 사용해야 한다 — 컨테이너 내에서 `0.0.0.0`으로 바인드하고 host 포트 매핑을 제한하거나(`-p 127.0.0.1:1016:1016`), 방화벽/VPN/인증 reverse proxy 뒤에 두어야 한다.
+## 검증 요약
+다음은 로드 시 오류(fail-closed)를 발생시킨다: 알 수 없는 `keys.provider`; 빈 `proxy.host`; 범위를 벗어난 `proxy.port`; `jsonl`이 아닌 `audit.sink`; `local`이 아닌 `tokenVault.provider`; 잘못된 `revealPolicy`; 양수가 아닌 `retentionDays`; boolean이 아닌 `deterministic`/`detokenizeResponses`; 비어 있거나 문자열이 아닌 `deterministicTypes`; 비어 있거나 문자열이 아닌 `mcp.allowedMethods`; boolean이 아닌 `mcp.*` 플래그; 알 수 없는 `privacy.profile`; 잘못된 `responseProtection.failureMode`; 양수가 아닌 `responseProtection.maxBytes`; 잘못된 `streaming.requestMode`; 잘못된 `streaming.responseMode`; 양수가 아닌 `streaming.maxMatchBytes`; 잘못된 `auth.provider`; 빈 `auth.store`; 문자열이 아닌 `auth.allowedLabelKeys`; 객체가 아닌 `policy.profiles`; 유효한 `default` 없는 `policy.profileBinding`; 문자열이 아닌 `policy.modelAllowlist`; 양수가 아닌 `policy.rate.requestsPerMinute`; 양수가 아닌 `limits.*`; 알 수 없는 `target.type`/`adapter`; 안전하지 않은 커스텀 정규식; `allowUnsafeOverrides` 없이 action을 약화하려는 시도.
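
The named-profile resolution order described in this reference (scope → label → `default`, scope before label, first match wins) can be sketched as below. This is only an illustration under stated assumptions: `resolveProfile` is a hypothetical name, the identity shape `{ scopes, labels }` is assumed, and `byScope`/`byLabel` are assumed to be plain objects keyed by scope and by `"k=v"` strings. Returning `null` stands for "use the base policy", which the reference says applies when there is no authenticated identity.

```javascript
// Hypothetical sketch of profile resolution; not Haechi's actual code.
// Assumed identity shape: { scopes: string[], labels: { [key]: value } }.

// Own-property lookup so keys like "constructor" never match by accident.
function lookup(map, key) {
  return map && Object.hasOwn(map, key) ? map[key] : undefined;
}

function resolveProfile(identity, binding) {
  // `default` is required whenever profiles are in use (fail-closed).
  if (!binding || typeof binding.default !== "string") {
    throw new Error("policy.profileBinding.default is required");
  }
  // No authenticated identity: the caller falls back to the base policy.
  if (identity === null) return null;

  // 1. Scopes take precedence; the first matching scope wins.
  for (const scope of identity.scopes ?? []) {
    const name = lookup(binding.byScope, scope);
    if (name !== undefined) return name;
  }
  // 2. Labels are matched as "key=value" strings.
  for (const [key, value] of Object.entries(identity.labels ?? {})) {
    const name = lookup(binding.byLabel, `${key}=${value}`);
    if (name !== undefined) return name;
  }
  // 3. Nothing matched: use the default, which should be the strictest profile.
  return binding.default;
}

const binding = {
  byScope: { admin: "relaxed" },
  byLabel: { "team=research": "research" },
  default: "strict",
};
console.log(resolveProfile({ scopes: ["admin"], labels: { team: "research" } }, binding)); // relaxed
console.log(resolveProfile({ scopes: [], labels: { team: "research" } }, binding)); // research
console.log(resolveProfile({ scopes: ["other"], labels: {} }, binding)); // strict
```

Falling through to `default` rather than to the base policy is what makes an unmatched but authenticated identity land on the strictest profile.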

package/docs/current/configuration.md ADDED

@@ -0,0 +1,233 @@
+# Haechi Configuration Reference
+- Status: Living document
+- Target version: 0.6.0
+`haechi init` writes `haechi.config.json`; a non-secret template is at `haechi.config.example.json`. Every command reads it with `--config <path>` (default `haechi.config.json`). Configuration is **validated fail-closed**: unknown providers, out-of-range numbers, and malformed values throw at load time rather than degrading silently. `haechi config` prints this reference; `haechi status` prints the *effective* state of a given config.
+## Full default
+```json
+{
+  "mode": "dry-run",
+  "target": { "type": "llm-http", "adapter": "openai-compatible", "upstream": "http://127.0.0.1:9999" },
+  "proxy": { "host": "127.0.0.1", "port": 1016 },
+  "responseProtection": { "enabled": false, "mode": "enforce", "failureMode": "fail-closed", "allowNonJson": false, "allowCompressed": false, "maxBytes": 1048576 },
+  "streaming": { "requestMode": "block" },
+  "limits": { "maxRequestBytes": 1048576, "upstreamTimeoutMs": 120000 },
+  "policy": { "mode": "dry-run", "presets": ["korean-pii", "secrets-only", "llm-redact"], "defaultAction": "redact", "actions": { "card": "block" } },
+  "filters": { "customRules": [] },
+  "keys": { "provider": "local", "keyFile": ".haechi/dev.keys.json" },
+  "audit": { "sink": "jsonl", "path": ".haechi/audit.jsonl" },
+  "tokenVault": { "provider": "local", "path": ".haechi/token-vault.json", "revealPolicy": "disabled", "retentionDays": 30, "deterministic": false, "deterministicTypes": null, "detokenizeResponses": false },
+  "privacy": { "profile": null },
+  "mcp": { "allowedMethods": ["initialize", "tools/call", "resources/read", "prompts/get"], "protectParams": true, "protectResults": true, "requireJsonRpc": true }
+}
+```
+## Top level
+| Key | Type / values | Default | Notes |
+|---|---|---|---|
+| `mode` | `dry-run` \| `report-only` \| `enforce` | `dry-run` | Global enforcement mode. `dry-run`/`report-only` detect + audit only; `enforce` transforms/blocks. Overridden by `policy.mode` when set. |
+## `target`
+| Key | Type / values | Default | Notes |
+|---|---|---|---|
+| `target.type` | `llm-http` \| `openai-compatible` \| `vllm-openai` \| `ollama` \| `llama-cpp` | `llm-http` | Selects the protocol adapter. `llm-http` aliases `openai-compatible`. Unknown values **fail closed** at load. |
+| `target.adapter` | same set | `openai-compatible` | Explicit adapter override; usually leave unset and let `type` decide. |
+| `target.upstream` | URL string | `http://127.0.0.1:9999` | The only upstream the proxy forwards to. Request targets must be origin-form paths; absolute-URL targets are rejected (SSRF guard). |
+## `proxy`
+| Key | Type / values | Default | Notes |
+|---|---|---|---|
+| `proxy.host` | non-empty string | `127.0.0.1` | Bind address. Non-loopback hosts require the `--allow-remote-bind` CLI flag — config alone will not start (see [Binding beyond loopback](#binding-beyond-loopback)). |
+| `proxy.port` | integer 0–65535 | `1016` | Listen port (`0` = ephemeral). Override per-run with `--port`. |
+## `responseProtection`
+Inspects upstream JSON responses (off by default — turn on to protect what comes *back* from the model).
+| Key | Type / values | Default | Notes |
+|---|---|---|---|
+| `responseProtection.enabled` | boolean | `false` | Master switch. Required for `detokenizeResponses` to do anything. |
+| `responseProtection.mode` | `dry-run` \| `report-only` \| `enforce` | `enforce` | Enforcement mode for the response direction. |
+| `responseProtection.failureMode` | `fail-closed` \| `allow` | `fail-closed` | What to do with an *uninspectable* response (non-JSON, invalid JSON, compressed). `fail-closed` returns 502; `allow` passes it through (audited). |
+| `responseProtection.allowNonJson` | boolean | `false` | Permit non-JSON responses through without inspection. |
+| `responseProtection.allowCompressed` | boolean | `false` | Permit compressed responses through without inspection. |
+| `responseProtection.maxBytes` | positive integer | `1048576` | Hard response size cap. Enforced even under `failureMode: allow` — oversized responses are always denied. |
+## `streaming`
+| Key | Type / values | Default | Notes |
+|---|---|---|---|
+| `streaming.requestMode` | `block` \| `pass-through` \| `inspect` | `block` | `block` → `501` for streaming; `inspect` → stream-filter SSE/NDJSON responses (bounded cross-frame buffer); `pass-through` → forward uninspected (audited). Ollama `/api/chat` and `/api/generate` are treated as streaming unless `stream: false` is set. |
+| `streaming.responseMode` | `dry-run` \| `report-only` \| `enforce` | `enforce` | Enforcement mode applied to inspected streams (independent of the request direction). |
+| `streaming.maxMatchBytes` | positive integer | `256` | Cross-frame match window for `inspect`. A held tail of this size lets a detection spanning frames be caught before emission; a single match longer than this may still split across frames. |
+## `limits`
+| Key | Type / values | Default | Notes |
+|---|---|---|---|
+| `limits.maxRequestBytes` | positive integer | `1048576` | Request body cap; over the limit returns `413`. Enforced incrementally (the body is not fully buffered first). |
+| `limits.upstreamTimeoutMs` | positive integer | `120000` | Upstream request timeout; on expiry returns `504 haechi_upstream_timeout`. Connection failure returns `502 haechi_upstream_unreachable`. |
+## `policy`
+The detect→decide core. See [Detection types & actions](#detection-types--actions).
+| Key | Type / values | Default | Notes |
+|---|---|---|---|
+| `policy.mode` | `dry-run` \| `report-only` \| `enforce` | `dry-run` | Effective enforcement mode (`policy.mode ?? mode`). |
+| `policy.presets` | array of preset names | `["korean-pii", "secrets-only", "llm-redact"]` | Bundled action sets, merged in order. See [Presets](#presets). |
+| `policy.defaultAction` | an action | `redact` | Action for a detected type with no explicit mapping. |
+| `policy.actions` | `{ <type>: <action> }` | `{ "card": "block" }` | Per-type overrides. Merges may **strengthen** but never weaken (see [Action strength](#action-strength)); `injection` defaults to `allow` unless set. |
+| `policy.allowUnsafeOverrides` | boolean | `false` | Permit a weaker action to override a stronger one. Off by default; turning it on removes a safety guard. |
+| `policy.bundlePath` | path | unset | Load a signed policy bundle instead of inline policy (verified against `keys.keyFile`). |
+## `filters`
+| Key | Type / values | Default | Notes |
+|---|---|---|---|
+| `filters.customRules` | array of rule objects | `[]` | Extra detection rules: `{ id, type, pattern, flags?, confidence? }`. Patterns are ReDoS-screened (≤500 chars, no nested quantifiers, no backreferences) and rejected at load if unsafe. |
+## `keys`
+| Key | Type / values | Default | Notes |
+|---|---|---|---|
+| `keys.provider` | `local` \| `external` | `local` | `local` uses a software AES-256-GCM key file (dev only). `external` ships no key material and **requires** injecting a crypto provider via `createRuntime(config, { cryptoProvider })`. |
+| `keys.keyFile` | path | `.haechi/dev.keys.json` | Local key file (mode `0600`). `haechi init --force` rotates it, retiring prior keys so existing ciphertext/tokens stay decryptable by `kid`. |
+## `audit`
+| Key | Type / values | Default | Notes |
+|---|---|---|---|
+| `audit.sink` | `jsonl` | `jsonl` | Only `jsonl` is supported. |
+| `audit.path` | path | `.haechi/audit.jsonl` | SHA-256 hash-chained log; verify with `haechi audit-verify`. Never contains plaintext/PII. |
+## `tokenVault`
+| Key | Type / values | Default | Notes |
+|---|---|---|---|
+| `tokenVault.provider` | `local` | `local` | Only `local` is supported. |
+| `tokenVault.path` | path | `.haechi/token-vault.json` | Encrypted token store (atomic writes, file-locked). |
+| `tokenVault.revealPolicy` | `disabled` \| `local-dev` | `disabled` | Gates **manual** reveal (`token-reveal`). Every reveal/purge decision is audited. Independent of detokenization. |
+| `tokenVault.retentionDays` | positive number | `30` | Token TTL. Expired tokens are deleted on vault writes or via `token-purge --expired`. |
+| `tokenVault.deterministic` | boolean | `false` | Equal `(type, value)` → equal token (HMAC over a domain-separated derived key). Needed for multi-turn. **Trade-off:** equal values become linkable. |
+| `tokenVault.deterministicTypes` | `null` \| non-empty string array | `null` | `null` = all types when deterministic; otherwise limit determinism to listed types (e.g. `["email"]`). |
+| `tokenVault.detokenizeResponses` | boolean | `false` | Restore request-issued tokens in that request's response. Only the tokens issued while protecting the same request are restored; requires `responseProtection.enabled`. Audited by count. |
+## `privacy`
+| Key | Type / values | Default | Notes |
+|---|---|---|---|
+| `privacy.profile` | `null` \| `kr-pipa` \| `eu-gdpr` \| `us-general` | `null` | Applies a regional baseline action set before enforcement. Profiles may **strengthen** but never weaken your explicit actions. Engineering defaults, not legal advice. |
+## `mcp`
+Applies to `mcp-stdio` and `mcp-wrap`.
+| Key | Type / values | Default | Notes |
+|---|---|---|---|
+| `mcp.allowedMethods` | non-empty string array | `["initialize", "tools/call", "resources/read", "prompts/get"]` | Client-callable method allowlist (`"*"` allows all). Server-initiated requests bypass the allowlist but are still params-protected. |
+| `mcp.protectParams` | boolean | `true` | Protect request `params`. |
+| `mcp.protectResults` | boolean | `true` | Protect response `result` (and run injection heuristics on it). |
+| `mcp.requireJsonRpc` | boolean | `true` | Require `jsonrpc: "2.0"`; non-conforming messages are rejected. |
+## `auth`
+| Key | Type / values | Default | Notes |
+|---|---|---|---|
+| `auth.provider` | `none` \| `bearer` \| `external` | `none` | `none` = no auth (identity null). `bearer` = built-in token auth. `external` requires injecting an `authProvider` via `createRuntime(config, { authProvider })`. |
+| `auth.store` | path | `.haechi/auth.json` | Bearer token store (mode `0600`). Tokens are kept only as keyed-HMAC hashes; the plaintext is shown once by `haechi auth add`. |
+| `auth.allowedLabelKeys` | string array | `["team", "env", "tier", "role"]` | Label keys a token may carry; values are length-limited and must not contain PII. |
+## `policy` profiles & limits
+Per-client controls layered on top of the base `policy`. See [Named profiles](#named-profiles).
+| Key | Type / values | Default | Notes |
+|---|---|---|---|
+| `policy.profiles` | `{ <name>: { presets?, actions?, modelAllowlist?, rate? } }` | `{}` | Named profiles; each overrides the base policy. |
+| `policy.profileBinding` | `{ byScope?, byLabel?, default }` | unset | Maps identity scopes/labels (`"k=v"` for labels) to profile names. `default` is **required** when `profiles` is set and should be the strictest profile (fail-closed). |
+| `policy.modelAllowlist` | string array | unset | Allowed `model` values (base level; also settable per profile). A disallowed model → `403`. Empty/absent = allow all. |
+| `policy.rate` | `{ requestsPerMinute }` | unset | Per-identity request rate limit (base level or per profile). Over the limit → `429`. In-memory, per-process. |
+### Named profiles
+When an identity authenticates, its profile resolves in order **scope → label → `default`**; scope precedes label and the first match wins. Without `profiles`, or under `auth.provider: none`, the base policy applies. The resolved profile's policy engine, `modelAllowlist`, and `rate` govern that request.
+## Detection types & actions
+Built-in detection `type` values: `email`, `phone`, `kr_rrn`, `card`, `api_key`, `secret`, and `injection` (response-direction heuristic, report-only by default). Custom rules may introduce new types.
+Actions (weakest → strongest):
+| Action | Effect |
+|---|---|
+| `allow` | No change (still detected and audited). |
+| `redact` | Replace with `[REDACTED:<type>]`. |
+| `mask` | Partially mask (values ≤8 chars are fully masked). |
+| `tokenize` | Replace with a vault token; reversible via the token vault. |
+| `encrypt` | Replace with an inline AES-256-GCM envelope. |
+| `block` | Reject the whole payload (`403`/`-32001`/exit 3). |
+### Action strength
+When a preset and an override (or a privacy profile) disagree, the **stronger** action wins, and trying to weaken a stronger action throws unless `policy.allowUnsafeOverrides` is `true`. Strength: `allow`(0) < `redact`/`mask`(1) < `tokenize`/`encrypt`(2) < `block`(3).
+### Presets
+| Preset | Effect |
+|---|---|
+| `llm-redact` | default `redact`; `email: redact`, `phone: mask` |
+| `korean-pii` | `kr_rrn: block`, `phone: mask`, `email: redact` |
+| `secrets-only` | `api_key: block`, `secret: block` |
+| `strict-block` | default `block` |
+| `mcp-basic` | default `redact`; `api_key`/`secret`/`kr_rrn: block` |
+| `local-inference` | default `redact`; `email: tokenize`, `phone: mask`, secrets/`kr_rrn: block` |
+| `local-only` | marks transfer as non-external (metadata) |
+## Common setups
+**Protect requests in enforce mode (minimal):**
+```json
+{ "mode": "enforce", "policy": { "mode": "enforce" } }
+```
+**Local inference with response protection + token round-trip:**
+```json
+{
+  "mode": "enforce",
+  "target": { "type": "vllm-openai", "upstream": "http://127.0.0.1:8000" },
+  "policy": { "mode": "enforce", "presets": ["local-inference"] },
+  "responseProtection": { "enabled": true, "mode": "enforce" },
+  "tokenVault": { "deterministic": true, "detokenizeResponses": true }
+}
+```
+**EU profile, secrets blocked, injection flagged:**
+```json
+{
+  "mode": "enforce",
+  "privacy": { "profile": "eu-gdpr" },
+  "policy": { "mode": "enforce", "actions": { "injection": "redact" } },
+  "responseProtection": { "enabled": true }
+}
+```
+## Binding beyond loopback
+The proxy refuses non-loopback hosts unless the CLI flag is passed explicitly — `proxy.host: "0.0.0.0"` in config alone will not start, by design:
+```bash
+haechi proxy --config haechi.config.json --host 0.0.0.0 --allow-remote-bind
+```
+**The proxy has no client authentication yet** (planned for 0.6): anyone who can reach the port can use your upstream and the token round-trip path. Use `--allow-remote-bind` only behind explicit network controls — bind `0.0.0.0` inside a container and restrict the host port mapping (`-p 127.0.0.1:1016:1016`), or front it with a firewall/VPN/authenticating reverse proxy.
+## Validation cheatsheet
+These throw at load (fail-closed): unknown `keys.provider`; empty `proxy.host`; out-of-range `proxy.port`; non-`jsonl` `audit.sink`; non-`local` `tokenVault.provider`; bad `revealPolicy`; non-positive `retentionDays`; non-boolean `deterministic`/`detokenizeResponses`; empty/non-string `deterministicTypes`; empty/non-string `mcp.allowedMethods`; non-boolean `mcp.*` flags; unknown `privacy.profile`; bad `responseProtection.failureMode`; non-positive `responseProtection.maxBytes`; bad `streaming.requestMode`/`streaming.responseMode`; non-positive `streaming.maxMatchBytes`; bad `auth.provider`; empty `auth.store`; non-string `auth.allowedLabelKeys`; non-object `policy.profiles`; `policy.profileBinding` without a valid `default`; non-string `policy.modelAllowlist`; non-positive `policy.rate.requestsPerMinute`; non-positive `limits.*`; unknown `target.type`/`adapter`; unsafe custom regex; weakening action without `allowUnsafeOverrides`.
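
The action-strength rule in this reference can be sketched as follows. This is a minimal illustration, assuming presets are plain `{ type: action }` maps. `mergePresets` and `applyOverrides` are hypothetical names, not functions exported by `haechi`, and the real policy engine may differ in its details. Only the strength ranking and the preset contents are taken from the reference.

```javascript
// Hypothetical sketch of the action-strength merge; not Haechi's actual code.
const STRENGTH = new Map([
  ["allow", 0],
  ["redact", 1], ["mask", 1],
  ["tokenize", 2], ["encrypt", 2],
  ["block", 3],
]);

function strengthOf(action) {
  if (!STRENGTH.has(action)) throw new Error(`unknown action: ${action}`);
  return STRENGTH.get(action);
}

// Presets merge in order; when two presets disagree, the stronger action is kept.
function mergePresets(presets) {
  const merged = {};
  for (const preset of presets) {
    for (const [type, action] of Object.entries(preset)) {
      const incoming = strengthOf(action);
      const current = merged[type];
      if (current === undefined || incoming >= strengthOf(current)) {
        merged[type] = action;
      }
    }
  }
  return merged;
}

// Explicit overrides may strengthen; weakening throws unless explicitly allowed.
function applyOverrides(base, overrides, allowUnsafeOverrides = false) {
  const merged = { ...base };
  for (const [type, action] of Object.entries(overrides)) {
    const incoming = strengthOf(action);
    const current = merged[type];
    if (current !== undefined && incoming < strengthOf(current) && !allowUnsafeOverrides) {
      throw new Error(`refusing to weaken "${type}" from ${current} to ${action}`);
    }
    merged[type] = action;
  }
  return merged;
}

// Default presets and the default `policy.actions` from the reference.
const base = mergePresets([
  { kr_rrn: "block", phone: "mask", email: "redact" }, // korean-pii
  { api_key: "block", secret: "block" },               // secrets-only
  { email: "redact", phone: "mask" },                  // llm-redact
]);
console.log(applyOverrides(base, { card: "block" }));
```

Treating equal-strength actions (`redact` vs `mask`) as a free replacement is a choice made for this sketch; the reference only ranks them as equal.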

package/docs/current/release-0.5-implementation-scope.ko.md ADDED

@@ -0,0 +1,69 @@
+# Haechi 0.5 Implementation Scope
+- 문서 상태: Final
+- 작성일: 2026-06-10
+- 기준 버전: 0.5.0 (0.4.0 이후)
+- 성격: streaming hardening
+- 구현 완료: 2026-06-10 — PR #14 (streaming inspection)
+## 1. 릴리스 목표
+streaming 보호 공백을 메운다: SSE/NDJSON 응답 stream을 차단하거나 무보호로 통과시키는 대신 직접 검사(inspect)한다. Streaming은 실제 LLM 사용에서 가장 흔한 전송 방식이므로 "streaming을 쓰면 보호를 포기해야 한다"는 구조가 핵심 잔여 취약점이었다.
+## 2. 범위
+### 2.1 Streaming 응답 검사
+- 새 `streaming.requestMode: "inspect"` (`block` 및 `pass-through`와 병존).
+- `packages/stream-filter`: 두 가지 wire format에 대한 점진적 frame parser — SSE (`data: …\n\n`)와 NDJSON (`{…}\n`). `[DONE]`, keep-alive comment, 비-JSON frame은 원문 그대로 통과한다.
+- 각 protocol-adapter streaming 라우트는 `{ format, deltaPath }` — 주 점진적 텍스트 채널 — 을 선언한다.
+  - OpenAI-compatible / vLLM / llama.cpp chat-completions: SSE, `choices[0].delta.content`
+  - completions: SSE, `choices[0].text`
+  - llama.cpp `/completion`: SSE, `content`
+  - Ollama `/api/chat`: NDJSON, `message.content`
+  - Ollama `/api/generate`: NDJSON, `response`
+  - OpenAI `/v1/responses`: SSE, 고정 delta path 없음 (frame 전체 보호만)
+### 2.2 Cross-frame correctness (sliding buffer)
+Bytes sent on a stream cannot be retracted, so detection must happen before a value is emitted. `core` gains `createStreamProtector`. This stateful protector works as follows.
+- Holds a bounded **raw tail** of the delta channel. On each push it runs detection on the accumulated pending text, computes `len - maxMatchBytes` as the commit point, and pulls the commit point back to before any detection that straddles it. Only the committed prefix is transformed and emitted; the tail is held for the next frame.
+- At end of stream, flushes the held tail as a synthesized final frame.
+- Runs `protectFrameExtras` on every other string leaf of the frame (tool-call arguments, etc.), applying within-frame protection.
+- `streaming.maxMatchBytes` (default 256) is the **boundary of the guarantee**: a single match longer than the window may still be split across frames. Documented limitation.
+### 2.3 Enforcement and audit
+- The request body of a streaming call is ordinary JSON and is protected before forwarding, the same as a regular request.
+- A `block` action stops the stream before the offending value is emitted (it is held in the buffer and not yet committed). The connection is closed. Bytes already emitted cannot be retracted, a documented limit of streaming.
+- One audit record for the whole stream: `stream_inspected` or `stream_blocked`, recording aggregate detection counts only (no plaintext). `identity: null` is reserved as elsewhere.
+- The new `streaming.responseMode` (`dry-run` | `report-only` | `enforce`, default `enforce`) independently controls the response-direction enforcement mode.
+### 2.4 Adapter routing fix
+A specific `target.type` (`ollama`, `vllm-openai`, `llama-cpp`) now takes precedence over the deep-merged default `target.adapter` (`openai-compatible`). Previously, a config that set only `target.type: "ollama"` was silently routed to OpenAI paths because the default adapter survived the merge, which also defeated streaming classification.
+## 3. Explicit non-scope (not done in 0.5)
+- Stream sequence AAD and replay cache (deferred; relevant once encryption-on-stream is needed).
+- Per-choice (`n > 1`) cross-frame buffering: secondary choices get within-frame protection only.
+- Decoding base64/encoded values inside a stream (same exclusion as non-streaming).
+- Bidirectional streaming for MCP (the stdio filter is line-framed JSON-RPC and is already handled).
+## 4. Test criteria
+- Within-frame and cross-frame (including byte-by-byte splits) PII detected in both SSE and NDJSON.
+- `[DONE]` / keep-alive / non-JSON frames preserved.
+- Non-delta PII (tool-call arguments) protected within-frame.
+- `block` stops the stream before the value is emitted; `report-only` detects without transforming.
+- Proxy e2e: request protected, response stream-filtered, audit chain valid, no plaintext in audit.
+- Uninspectable routes under `inspect` fail closed (501).
+- Config validation for `requestMode: inspect`, `responseMode`, and `maxMatchBytes`.
+## 5. Documentation impact
+- README: streaming inspection section, config reference rows, `configuration.md` updates.
+- threat-model: streaming moves from "uninspectable, blocked" to "inspected (bounded)"; the `maxMatchBytes` limit and the emitted-bytes-on-block limit are listed as documented exclusions.
+- risk-register: 0.5.0 backlog row marked complete.
+- api-stability: `haechi/stream-filter` and `createStreamProtector` marked experimental.

package/docs/current/release-0.5-implementation-scope.md ADDED

@@ -0,0 +1,69 @@
+# Haechi 0.5 Implementation Scope
+- Status: Final
+- Date: 2026-06-10
+- Target version: 0.5.0 (after 0.4.0)
+- Type: streaming hardening
+- Shipped: 2026-06-10 — PR #14 (streaming inspection)
+## 1. Release Goal
+Close the streaming protection gap: inspect SSE/NDJSON response streams instead of only blocking them or passing them through unprotected. Streaming is the most common transport in real LLM usage, so "use streaming and give up protection" was the main remaining hole.
+## 2. Scope
+### 2.1 Streaming response inspection
+- New `streaming.requestMode: "inspect"` (alongside `block` and `pass-through`).
+- `packages/stream-filter`: incremental frame parser for two wire formats — SSE (`data: …\n\n`) and NDJSON (`{…}\n`). `[DONE]`, keep-alive comments, and non-JSON frames pass through verbatim.
+- Each protocol-adapter streaming route declares `{ format, deltaPath }`, where `deltaPath` locates the primary incremental-text channel:
+  - OpenAI-compatible / vLLM / llama.cpp chat-completions: SSE, `choices[0].delta.content`
+  - completions: SSE, `choices[0].text`
+  - llama.cpp `/completion`: SSE, `content`
+  - Ollama `/api/chat`: NDJSON, `message.content`
+  - Ollama `/api/generate`: NDJSON, `response`
+  - OpenAI `/v1/responses`: SSE, no fixed delta path (whole-frame protection only)
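
The framing and `deltaPath` lookup listed above can be sketched roughly as follows. This is an illustration only, not the code in `packages/stream-filter`: the names `createFrameSplitter`, `parseFramePayload`, and `getByPath` are hypothetical, and the sketch assumes `\n`-only line endings and single-line `data:` fields.

```javascript
// Hypothetical sketch: incremental framing for SSE ("\n\n") and NDJSON ("\n").
function createFrameSplitter(format) {
  const delimiter = format === "sse" ? "\n\n" : "\n";
  let buffer = "";
  return {
    // Accepts an arbitrary chunk and returns only the frames that are complete.
    push(chunk) {
      buffer += chunk;
      const frames = [];
      let index;
      while ((index = buffer.indexOf(delimiter)) !== -1) {
        frames.push(buffer.slice(0, index));
        buffer = buffer.slice(index + delimiter.length);
      }
      return frames;
    },
  };
}

// Returns the parsed JSON payload, or null for frames that should pass through
// verbatim: `[DONE]`, keep-alive comments, and anything that is not JSON.
function parseFramePayload(format, frame) {
  let text = frame;
  if (format === "sse") {
    if (!frame.startsWith("data:")) return null;
    text = frame.slice("data:".length).trim();
  }
  if (text === "[DONE]") return null;
  try {
    return JSON.parse(text);
  } catch {
    return null;
  }
}

// Resolves a path such as "choices[0].delta.content" against a parsed frame.
function getByPath(value, path) {
  const keys = path.match(/[^.[\]]+/g) ?? [];
  let current = value;
  for (const key of keys) {
    if (current === null || typeof current !== "object") return undefined;
    current = current[key];
  }
  return current;
}

// Usage: a chat-completions delta split across two network chunks.
const splitter = createFrameSplitter("sse");
splitter.push('data: {"choices":[{"delta":{"content":"Hel');
for (const frame of splitter.push('lo"}}]}\n\ndata: [DONE]\n\n')) {
  const payload = parseFramePayload("sse", frame);
  console.log(payload ? getByPath(payload, "choices[0].delta.content") : "(pass-through)");
}
```

Returning `null` for unparseable frames keeps the pass-through decision in one place, so the caller can forward those frames untouched.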
+### 2.2 Cross-frame correctness (sliding buffer)
+Streamed bytes cannot be retracted, so detection must happen before a value is emitted. `core` gains `createStreamProtector`, a stateful protector that:
+- Holds a bounded **raw tail** of the delta channel. On each push it runs detection on the accumulated pending text, computes a commit point of `len - maxMatchBytes`, and pulls the commit point back to the start of any detection that straddles it. Only the committed prefix is transformed and emitted; the tail is held for the next frame.
+- Flushes the held tail as a synthesized final frame at end of stream.
+- Runs `protectFrameExtras` for all other string leaves of a frame (tool-call arguments, etc.) with within-frame protection.
+- `streaming.maxMatchBytes` (default 256) **bounds the guarantee**: a single match longer than the window may still split across frames. Documented limitation.
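
The commit-point rule above can be illustrated with a small sketch. It is not the package's `createStreamProtector`: `createTailBuffer`, `detectPhone`, and the `[REDACTED]` mask are hypothetical names chosen for the example. It also assumes detections arrive sorted and non-overlapping, and it measures the window in string length rather than encoded bytes.

```javascript
// Hypothetical sketch of a sliding tail buffer with a pulled-back commit point.
function createTailBuffer({ detect, maxMatchBytes = 256, mask = "[REDACTED]" }) {
  let pending = "";

  // Emits pending[0..commit) with every detection inside it masked,
  // and keeps the remainder as the held tail.
  function commitPrefix(commit, detections) {
    let out = "";
    let cursor = 0;
    for (const { start, end } of detections) {
      if (end > commit) break; // belongs to the held tail
      out += pending.slice(cursor, start) + mask;
      cursor = end;
    }
    out += pending.slice(cursor, commit);
    pending = pending.slice(commit);
    return out;
  }

  return {
    push(chunk) {
      pending += chunk;
      const detections = detect(pending);
      let commit = Math.max(0, pending.length - maxMatchBytes);
      // Never commit through the middle of a detection: pull back to its start.
      for (const { start, end } of detections) {
        if (start < commit && end > commit) commit = start;
      }
      return commitPrefix(commit, detections);
    },
    // End of stream: nothing more can arrive, so the whole tail is committed.
    flush() {
      return commitPrefix(pending.length, detect(pending));
    },
  };
}

// Example detector (assumption): a 7-digit phone pattern such as 555-1234.
function detectPhone(text) {
  return [...text.matchAll(/\d{3}-\d{4}/g)].map((m) => ({
    start: m.index,
    end: m.index + m[0].length,
  }));
}

// Usage: the value is split across two chunks but never emitted in the clear.
const protector = createTailBuffer({ detect: detectPhone, maxMatchBytes: 8 });
const emitted = [
  protector.push("call 555"),
  protector.push("-1234 now please"),
  protector.flush(),
].join("");
console.log(emitted); // call [REDACTED] now please
```

Because the held tail is always at least as long as the window, a match no longer than `maxMatchBytes` cannot have its beginning emitted before the rest arrives. Longer matches fall outside that guarantee, which is the limitation the last bullet describes.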
+### 2.3 Enforcement and audit
+- The request body of a streaming call is ordinary JSON and is protected like any request before forwarding.
+- `block` actions stop the stream before the offending value is emitted (held in the buffer, never committed); the connection is ended. Bytes already emitted cannot be retracted — a documented limit of streaming.
+- The whole stream is audited once: `stream_inspected` or `stream_blocked`, with aggregate detection counts only (no plaintext). `identity: null` reserved as elsewhere.
+- New `streaming.responseMode` (`dry-run` | `report-only` | `enforce`, default `enforce`) controls the response-direction enforcement mode independently.
+### 2.4 Adapter routing fix
+A specific `target.type` (`ollama`, `vllm-openai`, `llama-cpp`) now takes precedence over a deep-merged default `target.adapter` (`openai-compatible`). Previously a config that set only `target.type: "ollama"` was silently routed to OpenAI paths because the default adapter survived the merge — which also defeated streaming classification.
+## 3. Explicit non-scope (not in 0.5)
+- Stream sequence AAD and replay cache (deferred; relevant once encryption-on-stream is needed).
+- Per-choice (`n > 1`) cross-frame buffering — secondary choices get within-frame protection only.
+- Decoding base64/encoded values inside streams (same exclusion as non-streaming).
+- Bidirectional streaming for MCP (the stdio filter is line-framed JSON-RPC, already handled).
+## 4. Test criteria
+- Within-frame and cross-frame (including byte-by-byte split) PII caught in both SSE and NDJSON.
+- `[DONE]` / keep-alive / non-JSON frames preserved.
+- Non-delta PII (tool-call args) protected within-frame.
+- `block` stops the stream before emitting the value; `report-only` detects without transforming.
+- Proxy e2e: request protected, response stream-filtered, audit chain valid, no plaintext in audit.
+- Uninspectable route under `inspect` fails closed (501).
+- Config validation for `requestMode: inspect`, `responseMode`, `maxMatchBytes`.
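
The config-validation criterion can be pictured with a fail-closed check like the one below. This is a sketch under assumptions, not the package's loader: `validateStreamingConfig` is a hypothetical name, the error messages are invented, and only the allowed values and defaults stated in this document (`responseMode` default `enforce`, `maxMatchBytes` default 256) are used. No default for `requestMode` is assumed.

```javascript
// Hypothetical sketch: reject invalid streaming settings at load time
// instead of silently ignoring them (fail-closed).
const REQUEST_MODES = ["block", "pass-through", "inspect"];
const RESPONSE_MODES = ["dry-run", "report-only", "enforce"];

function validateStreamingConfig(streaming = {}) {
  const { requestMode, responseMode = "enforce", maxMatchBytes = 256 } = streaming;

  // Only validated when present; this sketch does not guess a default.
  if (requestMode !== undefined && !REQUEST_MODES.includes(requestMode)) {
    throw new Error(`invalid streaming.requestMode: ${String(requestMode)}`);
  }
  if (!RESPONSE_MODES.includes(responseMode)) {
    throw new Error(`invalid streaming.responseMode: ${String(responseMode)}`);
  }
  if (typeof maxMatchBytes !== "number" || !Number.isFinite(maxMatchBytes) || maxMatchBytes <= 0) {
    throw new Error("streaming.maxMatchBytes must be a positive number");
  }
  return { requestMode, responseMode, maxMatchBytes };
}

// Usage: a valid config is normalized, an invalid one throws.
console.log(validateStreamingConfig({ requestMode: "inspect" }));
try {
  validateStreamingConfig({ requestMode: "inspect", maxMatchBytes: 0 });
} catch (error) {
  console.log(error.message);
}
```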
+## 5. Documentation impact
+- README: streaming inspection section, config reference rows, `configuration.md` updates.
+- threat-model: streaming moves from "uninspectable, blocked" to "inspected (bounded)"; the `maxMatchBytes` limit and emitted-bytes-on-block limit are documented exclusions.
+- risk-register: 0.5.0 backlog row checked off.
+- api-stability: `haechi/stream-filter` and `createStreamProtector` marked experimental.