npm - haechi - Versions diffs - 0.3.2 - Mend

haechi 0.3.2

Files changed (54)

package/LICENSE +154 -0
package/README.md +102 -0
package/SECURITY.md +31 -0
package/docs/README.md +35 -0
package/docs/current/api-stability.ko.md +48 -0
package/docs/current/api-stability.md +48 -0
package/docs/current/expert-gap-review-ai-llm-mcp-encryption.ko.md +107 -0
package/docs/current/expert-gap-review-ai-llm-mcp-encryption.md +107 -0
package/docs/current/global-privacy-compliance-review.ko.md +110 -0
package/docs/current/global-privacy-compliance-review.md +110 -0
package/docs/current/initial-plan-ai-llm-mcp-encryption.ko.md +214 -0
package/docs/current/initial-plan-ai-llm-mcp-encryption.md +214 -0
package/docs/current/mvp-0.1-implementation-scope.ko.md +79 -0
package/docs/current/mvp-0.1-implementation-scope.md +79 -0
package/docs/current/open-source-modular-architecture.ko.md +387 -0
package/docs/current/open-source-modular-architecture.md +387 -0
package/docs/current/prd-ai-llm-mcp-encryption.ko.md +260 -0
package/docs/current/prd-ai-llm-mcp-encryption.md +262 -0
package/docs/current/privacy-filtering-policy-draft.ko.md +307 -0
package/docs/current/privacy-filtering-policy-draft.md +307 -0
package/docs/current/release-0.2-implementation-scope.ko.md +46 -0
package/docs/current/release-0.2-implementation-scope.md +46 -0
package/docs/current/release-0.3-implementation-scope.ko.md +86 -0
package/docs/current/release-0.3-implementation-scope.md +86 -0
package/docs/current/release-0.3.2-hardening-scope.ko.md +64 -0
package/docs/current/release-0.3.2-hardening-scope.md +64 -0
package/docs/current/release-0.4-implementation-scope.ko.md +121 -0
package/docs/current/release-0.4-implementation-scope.md +121 -0
package/docs/current/release-process.ko.md +48 -0
package/docs/current/release-process.md +48 -0
package/docs/current/risk-register-release-gate.ko.md +154 -0
package/docs/current/risk-register-release-gate.md +154 -0
package/docs/current/shared-responsibility.ko.md +38 -0
package/docs/current/shared-responsibility.md +38 -0
package/docs/current/threat-model.ko.md +68 -0
package/docs/current/threat-model.md +68 -0
package/examples/llm-prompt-filtering/input.json +13 -0
package/examples/plugins/custom-filter.plugin.json +29 -0
package/haechi.config.example.json +70 -0
package/package.json +74 -0
package/packages/audit/index.mjs +262 -0
package/packages/cli/bin/haechi.mjs +341 -0
package/packages/cli/runtime.mjs +287 -0
package/packages/core/index.mjs +309 -0
package/packages/crypto/index.mjs +142 -0
package/packages/filter/index.mjs +189 -0
package/packages/mcp-stdio/index.mjs +105 -0
package/packages/plugin/index.mjs +83 -0
package/packages/policy/index.mjs +165 -0
package/packages/policy-bundle/index.mjs +91 -0
package/packages/privacy-profiles/index.mjs +92 -0
package/packages/protocol-adapters/index.mjs +111 -0
package/packages/proxy/index.mjs +534 -0
package/packages/token-vault/index.mjs +262 -0

package/docs/current/prd-ai-llm-mcp-encryption.md ADDED

@@ -0,0 +1,262 @@
+# PRD: AI/LLM/MCP Encryption Solution
+- Status: Draft 0.1
+- Date: 2026-06-08
+- Product: Haechi
+- Target version: Realignment of the initial PRD/SRS/security-review archive toward AI/LLM/MCP-specific requirements
+## 1. Purpose
+Haechi is an encryption solution that protects prompts, contexts, tool calls, resources, retrieval snippets, artifacts, and streaming events flowing through AI applications, LLM gateways, MCP clients/servers, agent runtimes, and A2A agent networks.
+The core objective is not merely transport-layer protection. Haechi elevates the semantic units of AI workflows (`message`, `tool call`, `resource`, `task`, `context`, `artifact`, and `agent identity`) to first-class subjects of encryption policy and key management.
+The initial product form is open-source/self-hosted security infrastructure, not SaaS. Users attach Haechi to their AI applications as a library, CLI, sidecar proxy, or MCP wrapper, and must be able to swap out the encryption scheme, policy evaluation, privacy filtering, and audit store to fit their own environment.
+## 2. Background
+Standard TLS protects the transport channel between client and server. In LLM/agent systems, however, plaintext exposure recurs at the following points:
+- LLM gateways, prompt routers, and observability pipelines
+- JSON-RPC messages between MCP clients and servers
+- MCP tool inputs/outputs, resource content, and prompt templates
+- RAG retrieval snippets and vector metadata
+- A2A agent messages, task state, and artifacts
+- gRPC streaming chunks and metadata
+- Agent tool-call logs, traces, and replay records
+- Prompt/context transformation points before and after model provider transmission
+Haechi is the layer that encrypts, tokenizes, redacts, evaluates permissions, and audits these AI-native data units based on policy.
+Privacy filtering is a core capability of Haechi. Encryption protects data, but the moment it is exposed in plaintext to a model or tool, separate controls are required. Haechi detects PII in prompts, contexts, MCP tool inputs/outputs, resources, retrieval snippets, and generated artifacts, then handles each finding with one of: `allow`, `redact`, `mask`, `tokenize`, `encrypt`, `block`, or `human-review`.
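The per-finding actions named above can be illustrated with a minimal sketch. It assumes a single email detector and covers only `allow`, `redact`, `mask`, `tokenize`, and `block`; `encrypt` and `human-review` are left out because they need a key provider and a review queue. The names `apply_action`, `Blocked`, and the `tok_` token format are hypothetical and are not taken from the Haechi packages.

```python
import hashlib
import hmac
import re

# Hypothetical detector: a deliberately simple email pattern for illustration.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SUPPORTED = {"allow", "redact", "mask", "tokenize", "block"}


class Blocked(Exception):
    """Raised when policy says the payload must not be forwarded at all."""


def apply_action(text: str, action: str, token_key: bytes = b"demo-key") -> str:
    """Apply one policy action to every email finding in `text`."""
    if action not in SUPPORTED:
        # Fail closed on unknown actions instead of passing text through.
        raise ValueError(f"unsupported action: {action}")
    if action == "block" and EMAIL.search(text):
        raise Blocked("email finding blocked by policy")

    def replace(match: re.Match) -> str:
        value = match.group(0)
        if action == "redact":
            return "[EMAIL]"
        if action == "mask":
            local, _, domain = value.partition("@")
            return local[0] + "***@" + domain
        if action == "tokenize":
            # Keyed, deterministic token so equal values map to equal tokens.
            digest = hmac.new(token_key, value.encode("utf-8"), hashlib.sha256)
            return "tok_" + digest.hexdigest()[:12]
        return value  # "allow" (or "block" with no finding)

    return EMAIL.sub(replace, text)
```

A real token vault would also store the token-to-value mapping under access control; this sketch only shows how the action decides what the model gets to see.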
+## 3. Product Positioning
+Haechi does not replace general-purpose security solutions, LLM gateways, MCP frameworks, or agent frameworks. It is an encryption and context protection module that augments existing AI/LLM/MCP-targeted solutions.
+Commercialization and a SaaS control plane are not initial goals. The initial positioning is: "a small but verifiable OSS core + swappable reference engines + deployment-oriented AI/MCP examples." The default implementation Haechi ships is a reference, not a prescription — users must be able to inject their own implementations through boundaries such as `CryptoProvider`, `PolicyEngine`, `FilterEngine`, `KeyProvider`, and `AuditSink`.
+| Target Solution | Integration Approach |
+|---|---|
+| LLM gateway | OpenAI-compatible/Anthropic-compatible HTTP adapter or middleware |
+| MCP host/client | MCP client proxy, SDK wrapper, policy interceptor |
+| MCP server | tool/resource/prompt response encryption wrapper |
+| Agent runtime | task/context/artifact scoped encryption |
+| A2A server/client | AgentCard verification, message/task/artifact encryption adapter |
+| gRPC AI service | protobuf field encryption, streaming message protection |
+| RAG pipeline | retrieval snippet, source metadata, artifact encryption |
+| Observability platform | prompt/tool/result redaction, sealed audit events |
+## 4. Key Differentiators
+- **AI semantic unit protection**: prompt, system message, tool args, tool result, MCP resource, and A2A artifact are treated as distinct protection subjects.
+- **Context-bound encryption**: tenant, user, agent, model, task, context, tool, and resource URI are included in AAD and decryption authorization evaluation.
+- **Policy before model**: policy determines what information may be exposed in plaintext to a model.
+- **Selective reveal**: only the portions a model strictly needs to see are exposed in plaintext; the rest is kept tokenized or as ciphertext.
+- **MCP/A2A native**: JSON-RPC id, method, session id, task id, artifact id, AgentCard, and streaming events are used as policy context.
+- **Observability-safe**: prompts and tool outputs are not retained in plaintext in logs, traces, metrics, or replay artifacts.
+- **Privacy filtering first**: PII is detected and policy is applied before data is passed to a model, agent, tool, or log.
+- **Easy adoption**: attachable to existing AI applications with minimal changes via proxy, middleware, SDK wrapper, sidecar, or config preset.
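The context-bound encryption item above depends on every party deriving byte-identical AAD from the same context. A minimal sketch of one way to do that, assuming sorted-key compact JSON plus NFC normalization (this is not RFC 8785 JCS, and the field names and the `build_aad` helper are illustrative assumptions, not the shipped `CryptoProvider` contract):

```python
import hashlib
import json
import unicodedata

# Context fields named in the differentiator list; the key spellings are assumed.
AAD_FIELDS = ("tenant", "user", "agent", "model", "task", "context", "tool", "resource_uri")


def build_aad(ctx: dict) -> bytes:
    """Serialize the protection context into deterministic AAD bytes."""
    missing = [name for name in AAD_FIELDS if name not in ctx]
    if missing:
        # Fail closed: never encrypt with a partially specified context.
        raise ValueError(f"missing AAD fields: {missing}")
    normalized = {
        name: unicodedata.normalize("NFC", str(ctx[name])) for name in AAD_FIELDS
    }
    return json.dumps(
        normalized, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def aad_digest(ctx: dict) -> str:
    """Short fingerprint of the AAD, safe to place in audit records."""
    return hashlib.sha256(build_aad(ctx)).hexdigest()
```

The resulting bytes would be passed as associated data to an AEAD cipher, so a ciphertext created for one task or tool fails authentication when someone tries to decrypt it under a different context.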
+## 5. Non-Goals
+- General-purpose cryptography that enables LLMs to directly understand ciphertext is not a goal.
+- Fully homomorphic encryption-based LLM inference is out of MVP scope.
+- End-to-end invisibility to all external LLM providers is not claimed.
+- Complete prevention of prompt injection is not claimed.
+- MCP or A2A protocols themselves are not replaced.
+- The initial version does not provide hosted SaaS, multi-tenant control planes, billing, SLAs, or commercial compliance packs.
+- No novel cryptographic primitives are invented. Validated standards and libraries are composed, and implementation boundaries are tested.
+## 6. User Segments
+| User Segment | Need |
+|---|---|
+| AI platform security owner | Minimize plaintext exposure of prompts/contexts/tool calls |
+| LLM gateway operator | Per-provider policy, redaction, encryption, and audit |
+| MCP server developer | Protect tool inputs/outputs and resource content |
+| Agent framework developer | Task/context/artifact-level encryption |
+| RAG operator | Protect retrieval snippets and source metadata |
+| Compliance/audit officer | Track who exposed which context to which model/agent/tool |
+| Privacy officer | Filter and control the exposure scope of PII, unique identifiers, and sensitive data before AI processing |
+| Open-source adopter | Reference the default implementation while easily replacing crypto, policy, filtering, and audit components |
+| OSS contributor/reviewer | Verify that the project demonstrates sound security design, testing, and documentation quality, even within a narrow scope |
+## 7. Business Requirements
+| ID | Requirement | Priority | Verification |
+|---|---|---:|---|
+| BR-AI-001 | The product must be delivered as an encryption module that can be added to AI/LLM/MCP-targeted solutions. | Must | PoC |
+| BR-AI-002 | The product must reduce plaintext exposure points for prompts, contexts, tool calls, resources, and artifacts. | Must | threat model |
+| BR-AI-003 | The product must provide an adapter strategy that covers both MCP stdio and Streamable HTTP. | Must | MCP PoC |
+| BR-AI-004 | The product should support gRPC streaming and A2A message/task/artifact protection. | Should | protocol PoC |
+| BR-AI-005 | The product must enforce policy-based control over the plaintext exposure scope when using external LLM providers. | Must | policy test |
+| BR-AI-006 | The product must remove or seal AI sensitive data from logs, traces, replays, and metrics. | Must | observability test |
+| BR-AI-007 | The product must support KMS/HSM/Vault-based key management and per-tenant/agent/task key separation. | Must | key test |
+| BR-AI-008 | The product must detect PII, unique identifiers, sensitive data, credentials, and secrets, and apply policy before exposing them to models/tools/agents. | Must | privacy filtering test |
+| BR-AI-009 | The product should support detection rules adapted to the Korean privacy environment and customer-specific custom entity rules. | Should | Korean PII fixture test |
+| BR-AI-010 | The product must support selecting privacy regulatory profiles for major markets — Korea, EU/UK, US, Japan, Singapore, Canada, and Brazil — as policy. | Must | regional profile test |
+| BR-AI-011 | The product must include cross-border transfer, data residency, subprocessors, and model provider region in the policy decision context. | Must | transfer policy test |
+| BR-AI-012 | The product should produce decision records and data flow evidence sufficient to support data subject rights responses, audits, DPIA/PIA, and DSAR exports. | Should | audit evidence review |
+| BR-AI-013 | The product must support per-customer/tenant custom filtering rules, dictionaries, classifiers, and action overrides. | Must | custom filter test |
+| BR-AI-014 | The product must provide AAD canonicalization, nonce/replay defense, key lifecycle, and signed policy distribution as first-class cryptographic security requirements. | Must | crypto negative test |
+| BR-AI-015 | The product must define security contracts covering transport, auth, lifecycle, and metadata scrubbing for each MCP, A2A, gRPC, and LLM gateway adapter. | Must | protocol contract test |
+| BR-AI-016 | The product must be usable as a library, CLI, local proxy, or self-hosted sidecar without depending on hosted SaaS, and must explicitly state key custody and telemetry boundaries. | Must | deployment review |
+| BR-AI-017 | The product must prioritize OSS trust artifacts over commercial evidence packs: `SECURITY.md`, threat model, conformance tests, SBOM, signed release artifacts, and security test results. | Must | OSS trust review |
+| BR-AI-018 | The product must test for plaintext leaks, policy conflicts, KMS faults, replay, region-deny, and custom DSL bypass in build-blocking security tests. | Must | CI gate test |
+| BR-AI-019 | The product must separate encryption, key management, policy evaluation, privacy filtering, token vault, audit, and protocol adapters into swappable provider interfaces. | Must | plugin boundary review |
+| BR-AI-020 | The product must provide a default reference implementation while offering dependency injection, plugin manifests, and compatibility contracts so users can inject their own implementations. | Must | plugin conformance test |
+| BR-AI-021 | The product must require plugins to declare capabilities such as plaintext access, network egress, file writes, and audit logging, and evaluate them under a fail-closed policy. | Must | plugin security test |
+| BR-AI-022 | The product must apply to existing AI/LLM/MCP solutions with low change cost, targeting a 5-minute local demo, 30-minute MCP/LLM PoC, and same-day custom filter PoC. | Must | adoption test |
+| BR-AI-023 | The product must lower the adoption barrier through `init`, preset policies, dry-run, report-only mode, and copy-paste middleware examples while maintaining secure defaults. | Must | quickstart review |
+## 8. Product Requirements
+| ID | Requirement | Priority |
+|---|---|---:|
+| PRD-AI-001 | The product must support per-role encryption, redaction, and tokenization policies for prompt messages. | Must |
+| PRD-AI-002 | The product must support per-method policies for MCP JSON-RPC. | Must |
+| PRD-AI-003 | The product must classify MCP tool inputs/outputs, resource content, and prompt templates as protection subjects. | Must |
+| PRD-AI-004 | The product should include A2A agent id, task id, context id, and artifact id in encryption AAD and decryption authorization evaluation. | Should |
+| PRD-AI-005 | The product should have an architecture capable of supporting both gRPC protobuf field encryption and opaque message encryption. | Should |
+| PRD-AI-006 | The product should support per-chunk nonces and stream/session binding for streaming chunks. | Should |
+| PRD-AI-007 | The product must produce a policy decision record before sending plaintext to a model provider. | Must |
+| PRD-AI-008 | The product must apply redaction by default to prompt/tool/resource/artifact logs. | Must |
+| PRD-AI-009 | The product must distinguish between customer-managed keys and provider-managed keys. | Must |
+| PRD-AI-010 | The product should support trust verification and allowlisting for MCP/A2A discovery metadata. | Should |
+| PRD-AI-011 | The product must provide a privacy filtering pipeline that combines deterministic rules, checksum validation, dictionary/NER, and pluggable classifiers. | Must |
+| PRD-AI-012 | The product must include as default detection targets: resident registration numbers, alien registration numbers, passport numbers, driver's license numbers, mobile phone numbers, email addresses, physical addresses, bank account numbers, card numbers, health information, biometric data, authentication credentials, and API keys/secrets. | Must |
+| PRD-AI-013 | The product must support specifying the handling action for each detected PII finding as one of: `redact`, `mask`, `tokenize`, `encrypt`, `block`, or `human-review`. | Must |
+| PRD-AI-014 | The product must support both pre-filter (before model invocation) and post-filter (after model/tool response). | Must |
+| PRD-AI-015 | The product must not retain the original text of filtered findings in logs; it must audit only entity type, confidence, rule id, action, and decision id. | Must |
+| PRD-AI-016 | The product must support switching the detection catalog, default action, transfer rules, retention rules, and audit fields through regional privacy profiles. | Must |
+| PRD-AI-017 | The product must be able to express GDPR/UK GDPR requirements for personal data, special category data, pseudonymisation, and international transfer as policy items. | Must |
+| PRD-AI-018 | The product should be able to express CCPA/CPRA requirements for sensitive personal information and limit-use as policy items. | Should |
+| PRD-AI-019 | The product should apply detection, redaction, tokenization, and logging-prohibition policies to HIPAA PHI and PCI cardholder data as separate sector profiles. | Should |
+| PRD-AI-020 | The product must be able to enforce per-tenant data residency and model provider region allowlists in global deployments. | Must |
+| PRD-AI-021 | The product must provide a custom filter DSL combining regex, checksum validators, keyword dictionaries, deny/allow lists, JSONPath/protobuf path, and semantic classifiers. | Must |
+| PRD-AI-022 | The product must support a draft, validate, test, approve, publish, and rollback lifecycle for custom filter rules. | Must |
+| PRD-AI-023 | The product must apply a clear priority order on custom filter rule conflicts: global profile, sector profile, tenant rule, app rule, emergency rule. | Must |
+| PRD-AI-024 | The product should encrypt customer-supplied custom dictionaries and fixtures at rest and enforce auditable access control. | Should |
+| PRD-AI-025 | The product should support deploying custom classifier plugins as one of: local-only, customer-managed endpoint, or external endpoint. | Should |
+| PRD-AI-026 | The product must fix encryption AAD using canonical JSON, Unicode normalization, and tenant/user/agent/model/task/tool/resource/policy version. | Must |
+| PRD-AI-027 | The product must enforce nonce uniqueness, stream sequence integrity, and replay cache across streaming chunks, retries, cancellations, and partial deliveries. | Must |
+| PRD-AI-028 | The product must manage key generation, rotation, rewrap, retirement, destruction evidence, and backup/restore drills as a key lifecycle. | Must |
+| PRD-AI-029 | The product must support token vault retention, deletion, DSAR export, re-identification approval, and access audit. | Must |
+| PRD-AI-030 | The product must support policy bundle signing, version pinning, emergency block, fail-closed validation, and stale policy rejection. | Must |
+| PRD-AI-031 | The product must enforce MCP authorization, token passthrough prohibition, per-client consent, stdio credential handling, and protocol version negotiation. | Must |
+| PRD-AI-032 | The product should verify A2A AgentCard signatures, authenticated extended cards, transport parity, and push notification security. | Should |
+| PRD-AI-033 | The product must remove original sensitive data from OpenTelemetry baggage, span attributes, metric labels, exceptions, crash dumps, and replay artifacts. | Must |
+| PRD-AI-034 | The product should have a provider-neutral LLM message schema and provider adapter mapping. | Should |
+| PRD-AI-035 | The product should support RAG/vector namespace, embedding/source metadata, citation, and index deletion propagation policies. | Should |
+| PRD-AI-036 | The product should classify agent memory as ephemeral or durable and support TTL, purge, export, and cross-task recall blocking. | Should |
+| PRD-AI-037 | The product must provide per-tenant config store, audit sink, quota, admin RBAC, and blast-radius limits. | Must |
+| PRD-AI-038 | The product should provide SBOM, artifact signing, provenance, dependency vulnerability policy, and classifier/plugin trust policy. | Should |
+| PRD-AI-039 | The product must provide envelope encryption, decrypt, rewrap, and key id resolution as swappable operations through the `CryptoProvider` interface. | Must |
+| PRD-AI-040 | The product must allow connecting local key, Vault, KMS, HSM, and test key providers under the same contract through the `KeyProvider` interface. | Must |
+| PRD-AI-041 | The product must allow connecting CEL, OPA/Rego, and user-supplied policy engines — in addition to JSON/YAML reference policies — through the `PolicyEngine` interface. | Must |
+| PRD-AI-042 | The product must allow replacing the rule/checksum/dictionary-based reference filter and user-supplied classifiers through the `FilterEngine` interface. | Must |
+| PRD-AI-043 | The product should allow replacing local encrypted vault, DB-backed vault, and external vault through the `TokenVault` interface. | Should |
+| PRD-AI-044 | The product must allow replacing JSONL, OpenTelemetry-safe exporter, SIEM webhook, and custom sinks through the `AuditSink` interface. | Must |
+| PRD-AI-045 | The product must ensure that MCP, LLM HTTP, gRPC, and A2A adapters all use the same protect/reveal pipeline through the `ProtocolAdapter` interface. | Must |
+| PRD-AI-046 | The product must provide golden fixtures, negative fixtures, capability manifests, and compatibility version tests for all providers/plugins. | Must |
+| PRD-AI-047 | The product must provide a local proxy mode that requires minimal changes to existing code. Users must be able to reroute LLM/MCP requests by specifying only a target base URL and a policy file. | Must |
+| PRD-AI-048 | The product must be applicable to Node and Python AI applications with SDK wrapper/middleware examples of ten lines or fewer. | Must |
+| PRD-AI-049 | The product must generate a sample policy, local key, audit path, and MCP/LLM presets via `haechi init` or an equivalent CLI command. | Must |
+| PRD-AI-050 | The product must provide a `dry-run` or `report-only` mode to inspect which prompts/tools/resources would be detected before actual blocking or encryption takes effect. | Must |
+| PRD-AI-051 | The product must provide the following default presets: `mcp-basic`, `llm-redact`, `korean-pii`, `secrets-only`, `local-only`, `strict-block`. | Must |
+| PRD-AI-052 | The product should provide a path for users to swap `CryptoProvider`, `PolicyEngine`, `FilterEngine`, and `AuditSink` from config without code changes. | Should |
+| PRD-AI-053 | The product must prefer blocking requests over leaking plaintext on application failure, and must explain the cause and remediation in development mode without including plaintext data. | Must |
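PRD-AI-006 and PRD-AI-027 in the table above ask for per-chunk nonces, stream/session binding, sequence integrity, and a replay cache. A minimal sketch of that bookkeeping follows. It assumes a 96-bit nonce built from a random 4-byte per-stream prefix plus an 8-byte chunk counter, which is a common construction for AEAD modes but is not something the PRD specifies. `chunk_nonce`, `StreamGuard`, and `ReplayError` are hypothetical names.

```python
import os


class ReplayError(Exception):
    """Raised for replayed, reordered, or cross-stream chunks."""


def new_stream_prefix() -> bytes:
    """Random 4-byte prefix chosen once per stream."""
    return os.urandom(4)


def chunk_nonce(prefix: bytes, seq: int) -> bytes:
    """Derive a 12-byte nonce: 4-byte stream prefix + 8-byte big-endian counter."""
    if len(prefix) != 4:
        raise ValueError("prefix must be 4 bytes")
    return prefix + seq.to_bytes(8, "big")


class StreamGuard:
    """Receiver-side check that chunks arrive once, in order, on the right stream."""

    def __init__(self, stream_id: str) -> None:
        self.stream_id = stream_id
        self.next_seq = 0
        self.seen_nonces = set()  # replay cache for this stream

    def accept(self, stream_id: str, seq: int, nonce: bytes) -> None:
        if stream_id != self.stream_id:
            raise ReplayError("chunk is bound to a different stream")
        if nonce in self.seen_nonces:
            raise ReplayError("nonce reuse detected")
        if seq != self.next_seq:
            raise ReplayError(f"expected seq {self.next_seq}, got {seq}")
        self.seen_nonces.add(nonce)
        self.next_seq += 1
```

Retries and partial deliveries need an explicit policy on top of this (for example, resuming from `next_seq` with a fresh stream prefix), which the sketch does not attempt to model.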
+## 9. MVP Scope
+The MVP starts narrow.
+The actual 0.1 implementation scope is governed by `docs/current/mvp-0.1-implementation-scope.md`. Among the items below, the Python SDK, Vault/KMS adapter, MCP stdio wrapper, and RAG sample may be deferred beyond 0.1.
+Included:
+- TypeScript/Node SDK
+- Python SDK
+- core provider interface package
+- plugin manifest schema
+- provider conformance test harness
+- local CLI demo
+- one-command `init` quickstart
+- dry-run/report-only mode
+- copy-paste Node/Python middleware examples
+- MCP/LLM preset policy files
+- MCP Streamable HTTP proxy
+- MCP stdio wrapper
+- OpenAI-compatible HTTP request/response adapter
+- prompt/tool/resource redaction and envelope encryption
+- privacy filtering pipeline
+- Korean PII default detection rules
+- either a Vault or an AWS KMS adapter
+- local software key provider
+- JSON policy file
+- audit event JSON Lines
+- MCP tool-call sample
+- RAG snippet protection sample
+- reference `CryptoProvider`, `PolicyEngine`, `FilterEngine`, `KeyProvider`, `AuditSink`
+Excluded from MVP:
+- Fully homomorphic encryption or ciphertext LLM inference
+- Native SDK support for all LLM providers
+- Built-in KCMVP provider
+- Full A2A server implementation
+- gRPC bidirectional streaming production adapter
+- GUI management console
+- Hosted SaaS control plane
+- Billing, tenant admin portal, SLA
+- SOC 2/ISO commercial evidence pack
+## 10. Core Security Principles
+- Separate values the model must process from values the model does not need to see.
+- Classify PII before exposing it to a model; apply default blocking or tokenization when the purpose of exposure is not clear.
+- Explicitly state that data sent in plaintext to a model provider can no longer be guaranteed invisible by Haechi alone.
+- Treat tool-call and resource results as sensitive by default.
+- Include agent/task/context boundaries in AAD and decryption authorization.
+- Treat the observability pipeline as a first-class security boundary of the product.
+- Prompt injection defense and encryption are separate controls; neither substitutes for the other.
+- Swappable providers/plugins are trust boundaries. Expose and test capabilities such as plaintext access, network egress, file writes, and audit manipulation.
+- The reference implementation must be replaceable by users, but conformance tests and security negative tests remain as a fixed baseline that is not replaced.
+## 11. Open Questions
+- Decide whether the MCP adapter should be proxy-first or SDK wrapper-first.
+- Decide whether to use an OpenAI-compatible API as the primary LLM adapter or to define a provider-agnostic schema first.
+- Decide whether to protect embeddings themselves in RAG vector search or focus on protecting source text and metadata.
+- Decide whether A2A remains at the adapter level or evolves into a full protocol gateway.
+- Decide whether to include confidential computing/TEE in the secondary roadmap.
+- Decide whether an ML/LLM classifier used for PII detection may itself receive plaintext PII.
+- Decide whether high-risk identifiers such as resident registration numbers default to block, or whether tokenization may be permitted under customer policy.
+- Decide whether the product supports EU/UK data transfer mechanisms (SCC/IDTA) as evidence only or enforces them as policy.
+- Decide whether HIPAA/PCI sector profiles are included in the MVP or deferred as later examples.
+- Decide whether the custom filter DSL uses a product-specific syntax or partially adopts an existing policy language such as OPA/Rego or CEL.
+- Decide whether external endpoint calls by custom classifier plugins default to prohibiting PII transmission or require customer opt-in.
+- Decide whether to stabilize the provider/plugin API TypeScript-first and then align Python, or to define a language-neutral IDL first.
+- Decide whether the open-source license is Apache-2.0 or MIT.
+## 12. References
+- Model Context Protocol Specification, latest: https://modelcontextprotocol.io/specification/
+- Model Context Protocol Authorization: https://modelcontextprotocol.io/specification/2025-11-25/basic/authorization
+- Model Context Protocol Security Best Practices: https://modelcontextprotocol.io/docs/tutorials/security/security_best_practices
+- Model Context Protocol official repository: https://github.com/modelcontextprotocol/modelcontextprotocol
+- NSA MCP Security Design Considerations: https://www.nsa.gov/Portals/75/documents/Cybersecurity/CSI_MCP_SECURITY.pdf
+- NSA/Five Eyes Careful Adoption of Agentic AI Services: https://media.defense.gov/2026/Apr/30/2003922823/-1/-1/0/CAREFUL%20ADOPTION%20OF%20AGENTIC%20AI%20SERVICES_FINAL.PDF
+- A2A Agent2Agent Protocol: https://a2a-protocol.org/latest/specification/
+- gRPC Core Concepts: https://grpc.io/docs/what-is-grpc/core-concepts/
+- Korean Personal Information Safety Measures Standard: https://law.go.kr/LSW/admRulInfoP.do?admRulSeq=2100000192069&chrClsCd=010201
+- KISA Cryptography FAQ: https://seed.kisa.or.kr/kisa/bbs/faq.do
+- European Commission GDPR overview: https://commission.europa.eu/law/law-topic/data-protection/reform/what-does-general-data-protection-regulation-gdpr-govern_en
+- European Commission Standard Contractual Clauses: https://commission.europa.eu/law/law-topic/data-protection/international-dimension-data-protection/standard-contractual-clauses-scc_en
+- California CCPA: https://www.oag.ca.gov/privacy/ccpa
+- HHS HIPAA Privacy Rule: https://www.hhs.gov/hipaa/for-professionals/privacy/index.html
+- NIST Privacy Framework: https://www.nist.gov/privacy-framework
+- RFC 8446, TLS 1.3
+- RFC 7516, JSON Web Encryption
+- RFC 9180, Hybrid Public Key Encryption

package/docs/current/privacy-filtering-policy-draft.ko.md ADDED

@@ -0,0 +1,307 @@
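+# Privacy Filtering Policy Draft
+- Status: Draft 0.1
+- Date: 2026-06-08
+- Product: Haechi
+## 1. Purpose
+This document defines a draft policy for detecting and handling personal information and high-risk sensitive data in AI/LLM/MCP environments before it is passed to models, tools, agents, logs, traces, or replay artifacts.
+Privacy filtering is not a substitute for encryption. Filtering decides whether data may be exposed in plaintext, while encryption protects storage, transmission, and authorization boundaries. Haechi applies both together.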
+## 2. Filtering Points
+| Point | Description | Default Policy |
+|---|---|---|
+| Pre-model | Filter prompts/messages before calling the LLM provider | Must |
+| Post-model | Filter LLM responses before delivering them to users, agents, or tools | Must |
+| MCP tool input | Filter MCP tools/call arguments | Must |
+| MCP tool output | Filter tool results and resource content | Must |
+| RAG input | Filter retrieval queries, snippets, and source metadata | Should |
+| A2A message | Filter agent messages, tasks, and artifacts | Should |
+| Observability | Filter logs, traces, replays, and metric labels | Must |
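+## 3. Default Detection Catalog
+| Category | Examples | Default Action |
+|---|---|---|
+| Unique identifiers | Resident registration number, alien registration number, passport number, driver's license number | block or tokenize |
+| Sensitive data | Health information, biometric data, genetic data, criminal records, information on political views, union membership, or religion | block or human-review |
+| Contact information | Mobile phone number, phone number, email, address | mask or tokenize |
+| Financial information | Bank account number, card number, card expiration date, CVC-like values | tokenize or block |
+| Authentication credentials | Password, access token, refresh token, API key, private key | block |
+| Customer data | Customer number, contract number, order number, internal identifier | tokenize or encrypt |
+| AI-specific sensitive data | Secrets in prompts, personal information in tool output, personal information in RAG snippets | redact, tokenize, encrypt |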
+## 4. Global Regulatory Profiles
+Each regional default profile changes the detection catalog, default actions, transfer restrictions, audit fields, and retention policy. This table is a starting point for product policy design and does not replace legal advice.
+| Profile | Key Criteria | Default Hardening Items |
+|---|---|---|
+| KR-PIPA | personal information, unique identifiers, sensitive information, safety measures | block/tokenize unique identifiers, encryption key management, access records |
+| EU-GDPR | personal data, special categories, pseudonymisation, data minimisation, international transfer | special categories default to block/human-review, SCC/adequacy evidence, DPIA evidence |
+| UK-GDPR | UK GDPR, IDTA/Addendum, special category data | UK transfer mechanism evidence, stricter special category handling |
+| US-CCPA-CPRA | personal information, sensitive personal information, limit use/disclosure | SPI limit-use flag, consumer request evidence |
+| US-HIPAA | PHI, covered entity/business associate, de-identification | PHI default block/tokenize, Safe Harbor style identifier catalog, BAA evidence |
+| PCI-DSS | cardholder data, sensitive authentication data | PAN tokenize/mask, CVC block, no logging of payment data |
+| JP-APPI | personal information, special care-required personal information, anonymized/pseudonymized information | human-review for special care-required information, cross-border consent/evidence |
+| SG-PDPA | consent, purpose limitation, protection, retention, transfer limitation | purpose binding, transfer limitation evidence |
+| CA-PIPEDA | consent, limiting collection/use/disclosure, safeguards, cross-border handling | consent/purpose evidence, safeguard audit |
+| BR-LGPD | personal data, sensitive personal data, international transfer | stricter sensitive data handling, ANPD transfer mechanism evidence |
+## 5. Global policy decision context
+| Context | Description |
+|---|---|
+| data_subject_region | Estimated or explicitly stated region of the data subject |
+| controller_region | Region of the customer/controller |
+| processor_region | Region where Haechi/the processor handles the data |
+| model_provider_region | Region where the LLM provider handles the data |
+| transfer_mechanism | SCC, IDTA, adequacy, BCR, consent, local-only, etc. |
+| sector_profile | healthcare, payment, finance, education, public sector, etc. |
+| lawful_basis_or_purpose | Customer-defined value expressing the processing purpose or lawful basis |
+| residency_policy | local-only, region-locked, allowed-regions |
+| retention_policy | Retention policy for audit records and the token vault |
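
The context fields above could travel as one immutable record per request. The following is a minimal Python sketch under stated assumptions: `DecisionContext`, `residency_allows`, and the extra `allowed_regions` field are hypothetical names, and the semantics of the three `residency_policy` modes are guesses, since the table only lists their names.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class DecisionContext:
    """Hypothetical container for the context fields in the table above."""
    data_subject_region: str
    controller_region: str
    processor_region: str
    model_provider_region: str
    transfer_mechanism: Optional[str] = None    # e.g. "SCC", "IDTA", "adequacy"
    sector_profile: Optional[str] = None        # e.g. "healthcare", "payment"
    lawful_basis_or_purpose: Optional[str] = None
    residency_policy: str = "allowed-regions"   # local-only | region-locked | allowed-regions
    allowed_regions: Tuple[str, ...] = ()       # assumption: not a field in the table
    retention_policy: Optional[str] = None


def residency_allows(ctx: DecisionContext) -> bool:
    """Assumed semantics: may the model provider region be used under the policy?"""
    if ctx.residency_policy == "local-only":
        return ctx.model_provider_region == "local"
    if ctx.residency_policy == "region-locked":
        return ctx.model_provider_region == ctx.data_subject_region
    if ctx.residency_policy == "allowed-regions":
        return ctx.model_provider_region in ctx.allowed_regions
    return False  # unknown policy values fail closed
```

Failing closed on an unrecognized policy value is a design choice of this sketch: a typo in configuration then results in a denial rather than an unintended transfer.
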
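+## 6. Detection methods
+| Method | Applies to | Requirement |
+|---|---|---|
+| Deterministic rule | resident registration numbers, card numbers, email addresses, phone numbers, API keys | Rule IDs and versions must be managed. |
+| Checksum validation | candidate resident registration numbers, candidate card numbers | Candidates that fail validation receive a lower confidence. |
+| Dictionary | organization names, internal system names, prohibited terms | Per-tenant dictionaries are supported. |
+| NER/classifier | names, addresses, medical/health context, sensitive inference | Local-first by default; sending data externally requires separate consent/policy. |
+| Custom entity rule | customer-specific identifiers, contract numbers, ticket numbers | The policy defines the schema and action. |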
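
The checksum row above can be illustrated with the Luhn check used for payment card numbers. This is a minimal sketch, not Haechi's detector: the candidate regex, the function names, the `CARD_NUMBER` label, and the two confidence values (0.9 and 0.3) are all assumptions chosen for illustration.

```python
import re

# Assumed candidate pattern: 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:      # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def detect_card_numbers(text, base_confidence=0.9, failed_confidence=0.3):
    """Find card number candidates; a failed checksum lowers confidence instead of dropping the hit."""
    findings = []
    for m in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", m.group())
        ok = luhn_valid(digits)
        findings.append({
            "entity_type": "CARD_NUMBER",   # hypothetical entity label
            "span": m.span(),               # offsets only; the raw value is not stored
            "checksum_valid": ok,
            "confidence": base_confidence if ok else failed_confidence,
        })
    return findings
```

Keeping the failed candidate with a lower confidence, rather than discarding it, lets a rule with a `minConfidence` threshold decide what to do with it.
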
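+## 7. Custom filtering
+The default regulatory profiles cannot know enough about a customer's internal data. Haechi must provide per-tenant custom filters as a first-class feature.
+### 7.1 Custom detection targets
+| Target | Examples |
+|---|---|
+| Internal identifiers | customer number, employee number, membership ID, contract number, order number, ticket number |
+| Product/project confidential information | codename, product launch name, internal roadmap keyword |
+| Internal system information | internal hostname, repository name, table name, service name |
+| Industry-specific data | medical chart id, insurance policy number, invoice number, account alias |
+| AI-specific data | prompt template secret, tool name, private skill name, vector collection name |
+| Security information | internal API key prefix, service account, private endpoint, secret naming pattern |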
+### 7.2 Custom filter DSL requirements
+| Feature | Description |
+|---|---|
+| regex | Regex-based detection |
+| checksum | Customer-defined checksum or validator function |
+| dictionary | Per-tenant word/phrase dictionary |
+| allowlist | Exceptions for false positives |
+| denylist | Targets to block immediately |
+| path scope | Scoping by JSONPath, protobuf field path, MCP method, or A2A part type |
+| context condition | Conditions on tenant, app, environment, model provider, region, and purpose |
+| action override | Applies an action stronger than the default profile action |
+| confidence override | Per-rule confidence calculation or fixed value |
+| test fixture | positive/negative samples and the expected action |
+### 7.3 Rule lifecycle
+| Stage | Requirement |
+|---|---|
+| draft | Rule authors must be able to create drafts without affecting production traffic. |
+| validate | Schema, regex safety, catastrophic backtracking, and action conflicts must be checked. |
+| test | False positives/negatives must be measured with fixtures and shadow traffic. |
+| approve | High-risk actions, such as block, external classifier, and region override, require an approval process. |
+| publish | Versioned rollout and tenant/app/environment scope are required. |
+| monitor | Hit rate, action rate, and override rate must be observed. |
+| rollback | It must be possible to roll back to the previous rule version immediately. |
+### 7.4 Priority
+Stronger protection takes precedence over weaker protection.
+1. Emergency global block rule
+2. Legal/regional profile mandatory rule
+3. Sector profile mandatory rule
+4. Tenant custom rule
+5. Application custom rule
+6. Allowlist exception
+7. Default profile rule
+An allowlist must not be able to bypass hard-block entities such as unique identification information, PHI, card security codes, and secrets.
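
The ordering above can be sketched as a small resolver. This is a sketch under assumptions, not the product's algorithm: the tier names, the action strength ranking used to break ties inside one tier, and the entity labels other than `KR_RRN` are hypothetical.

```python
# Tiers in the order listed above; a lower index wins.
TIER_ORDER = [
    "emergency_global_block",
    "regional_mandatory",
    "sector_mandatory",
    "tenant_custom",
    "application_custom",
    "allowlist_exception",
    "default_profile",
]

# Assumed ranking, used only to pick the stronger action within the same tier.
ACTION_STRENGTH = {
    "allow": 0, "mask": 1, "redact": 2, "tokenize": 3,
    "encrypt": 4, "human-review": 5, "block": 6,
}

# Hypothetical labels for the hard-block entities named in the text.
HARD_BLOCK_ENTITIES = {"KR_RRN", "PHI", "CARD_SECURITY_CODE", "SECRET"}


def resolve(entity_type, matched_rules):
    """Pick the winning rule for one detected entity, or None if nothing matched."""
    candidates = [
        r for r in matched_rules
        # An allowlist exception is ignored for hard-block entities.
        if not (r["tier"] == "allowlist_exception" and entity_type in HARD_BLOCK_ENTITIES)
    ]
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda r: (TIER_ORDER.index(r["tier"]), -ACTION_STRENGTH[r["action"]]),
    )
```

With an allowlist exception and a default `block` rule both matching, `resolve("EMAIL", ...)` returns the exception, while `resolve("KR_RRN", ...)` falls through to the default rule.
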
+### 7.5 Custom rule examples
+```yaml
+customRules:
+  - id: tenant-contract-id
+    version: 3
+    owner: privacy-team
+    match:
+      regex: "\\bCTR-[0-9]{4}-[A-Z0-9]{8}\\b"
+      entityType: TENANT_CONTRACT_ID
+      scope:
+        sources:
+          - pre_model
+          - mcp_tool_input
+          - a2a_artifact
+    action: tokenize
+    tests:
+      positive:
+        - input: "The contract number is CTR-2026-AB12CD34"
+          expectedEntity: TENANT_CONTRACT_ID
+          expectedAction: tokenize
+      negative:
+        - input: "CTR-ABCD is a product code"
+          expectedEntity: null
+  - id: internal-project-codename
+    version: 1
+    match:
+      dictionaryRef: dict://tenant-a/project-codenames
+      caseSensitive: false
+    action: block
+    appliesTo:
+      modelProviderRegionNotIn:
+        - local
+        - private-cloud
+```
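
The `tests` block of the first rule can be exercised by a small fixture runner. The sketch below transcribes that rule into a Python dict to stay dependency-free (fixture inputs are written in English here); `evaluate` and `run_fixtures` are hypothetical helper names, and only the `regex` matcher is handled.

```python
import re

# The tenant-contract-id rule from the YAML above, transcribed by hand.
RULE = {
    "id": "tenant-contract-id",
    "match": {
        "regex": r"\bCTR-[0-9]{4}-[A-Z0-9]{8}\b",
        "entityType": "TENANT_CONTRACT_ID",
    },
    "action": "tokenize",
    "tests": {
        "positive": [{
            "input": "The contract number is CTR-2026-AB12CD34",
            "expectedEntity": "TENANT_CONTRACT_ID",
            "expectedAction": "tokenize",
        }],
        "negative": [{
            "input": "CTR-ABCD is a product code",
            "expectedEntity": None,
        }],
    },
}


def evaluate(rule, text):
    """Return (entity_type, action) if the rule's regex matches, else (None, None)."""
    if re.search(rule["match"]["regex"], text):
        return rule["match"]["entityType"], rule["action"]
    return None, None


def run_fixtures(rule):
    """Run every positive and negative fixture; return a list of failure messages."""
    failures = []
    for kind in ("positive", "negative"):
        for case in rule["tests"].get(kind, []):
            entity, action = evaluate(rule, case["input"])
            if entity != case.get("expectedEntity"):
                failures.append(f"{rule['id']} {kind}: entity {entity!r}")
            elif "expectedAction" in case and action != case["expectedAction"]:
                failures.append(f"{rule['id']} {kind}: action {action!r}")
    return failures
```

An empty list from `run_fixtures(RULE)` means every fixture behaved as declared; a non-empty list could gate the validate or test stage of the rule lifecycle.
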
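+## 8. Handling actions
+| Action | Meaning |
+|---|---|
+| allow | Lets the detected value through. An audit event is still recorded. |
+| mask | Keeps only some characters and masks the rest. |
+| redact | Removes the value and replaces it with a placeholder. |
+| tokenize | Replaces the value with a reversible token. Reveal is performed only after an authorization check. |
+| encrypt | Replaces the value with envelope ciphertext. |
+| block | Blocks delivery of the request, response, tool call, or artifact. |
+| human-review | Sends the item to an approval workflow. It is not delivered automatically. |
+| region-deny | Blocks because of a region/transfer policy violation. |
+| local-only | Allows only local processing, with no external provider call. |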
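
Three of the value-level actions above (mask, redact, tokenize) can be sketched as follows. This is illustrative only: the placeholder format, the token format, and the in-memory `TokenVault` class are assumptions, and a real vault would need persistence, encryption, and a retention policy.

```python
import secrets


def mask(value: str, keep_last: int = 4, mask_char: str = "*") -> str:
    """Keep only the last few characters; values too short to mask safely are fully masked."""
    if keep_last <= 0 or len(value) <= keep_last:
        return mask_char * len(value)
    return mask_char * (len(value) - keep_last) + value[-keep_last:]


def redact(entity_type: str) -> str:
    """Replace the value with a placeholder that names only the entity type."""
    return f"[{entity_type}]"


class TokenVault:
    """Hypothetical in-memory vault: token -> original value."""

    def __init__(self):
        self._store = {}

    def tokenize(self, entity_type: str, value: str) -> str:
        # Random token, so the token itself reveals nothing about the value.
        token = f"tok_{entity_type}_{secrets.token_hex(8)}"
        self._store[token] = value
        return token

    def reveal(self, token: str, purpose: str, allowed_purposes) -> str:
        # Reveal only after the purpose check passes.
        if purpose not in allowed_purposes:
            raise PermissionError(f"purpose {purpose!r} may not reveal this token")
        return self._store[token]
```

For example, `mask("4111111111111111")` yields twelve `*` characters followed by `1111`, and `reveal` raises `PermissionError` for any purpose outside the allowed set.
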
+## 9. Policy example
+```yaml
+profiles:
+  active:
+    - KR-PIPA
+    - EU-GDPR
+    - US-CCPA-CPRA
+rules:
+  - id: kr-unique-id-default
+    match:
+      entityTypes:
+        - KR_RRN
+        - KR_ALIEN_REG_NO
+        - PASSPORT_NO
+        - DRIVER_LICENSE_NO
+    action: block
+    appliesTo:
+      - pre_model
+      - mcp_tool_input
+      - observability
+  - id: contact-info-tokenize
+    match:
+      entityTypes:
+        - EMAIL
+        - PHONE
+        - ADDRESS
+    action: tokenize
+    reveal:
+      allowedPurposes:
+        - customer_support
+      requireAudit: true
+  - id: tool-output-redact
+    match:
+      source: mcp_tool_output
+      minConfidence: 0.65
+    action: redact
+  - id: eu-special-category-transfer-guard
+    match:
+      profiles:
+        - EU-GDPR
+      entityTypes:
+        - HEALTH_DATA
+        - BIOMETRIC_ID
+        - POLITICAL_OPINION
+        - UNION_MEMBERSHIP
+      destination:
+        modelProviderRegionNotIn:
+          - EU
+          - EEA
+    action: region-deny
+    require:
+      transferMechanism:
+        - adequacy
+        - SCC
+        - BCR
+```
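
The policy above does not spell out how `require.transferMechanism` interacts with `action: region-deny`. One possible reading of the `eu-special-category-transfer-guard` rule, sketched here as an assumption, is: deny the transfer unless one of the listed mechanisms is present. The function name `transfer_guard` is hypothetical.

```python
SPECIAL_CATEGORIES = {"HEALTH_DATA", "BIOMETRIC_ID", "POLITICAL_OPINION", "UNION_MEMBERSHIP"}
IN_REGION = {"EU", "EEA"}
ACCEPTED_MECHANISMS = {"adequacy", "SCC", "BCR"}


def transfer_guard(active_profiles, entity_type, model_provider_region, transfer_mechanism=None):
    """Return "region-deny" when the guard fires, else None (assumed semantics)."""
    if "EU-GDPR" not in active_profiles:
        return None                      # rule is scoped to the EU-GDPR profile
    if entity_type not in SPECIAL_CATEGORIES:
        return None                      # only special category entities are guarded
    if model_provider_region in IN_REGION:
        return None                      # destination stays inside EU/EEA
    if transfer_mechanism in ACCEPTED_MECHANISMS:
        return None                      # assumption: a listed mechanism lifts the denial
    return "region-deny"
```

Under this reading, `HEALTH_DATA` sent to a `US` provider is denied without a mechanism and passes the guard with `SCC`; whether a mechanism alone should be sufficient is a policy decision the sketch does not settle.
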
+## 10. Audit events
+Audit events must not contain the original text.
+| Field | Description |
+|---|---|
+| decision_id | Policy decision ID |
+| entity_type | Detected entity type |
+| rule_id | Applied rule |
+| confidence | Detection confidence |
+| action | Action applied |
+| source | pre_model, mcp_tool_input, etc. |
+| tenant_id_hash | Hash of the tenant identifier |
+| agent_id_hash | Hash of the agent identifier |
+| request_id | correlation id |
+| profile | Applied regional/sector profile |
+| transfer_mechanism | Applied transfer mechanism |
+| residency_decision | Regional policy decision |
+| custom_rule_id | Rule id, when a custom rule was applied |
+| custom_rule_version | Custom rule version |
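
An event with these fields can be assembled without ever copying the detected value. The sketch below is an assumption-laden illustration: `hash_id` and `build_audit_event` are hypothetical names, and the keyed hash (HMAC-SHA256) is a choice made here, since the table only says "hash".

```python
import hashlib
import hmac


def hash_id(value: str, key: bytes) -> str:
    """Keyed hash of an identifier, so raw tenant/agent ids stay out of the log."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def build_audit_event(finding: dict, ctx: dict, key: bytes) -> dict:
    """Build an audit event from a finding and request context, never copying the raw value."""
    return {
        "decision_id": ctx["decision_id"],
        "entity_type": finding["entity_type"],
        "rule_id": finding["rule_id"],
        "confidence": finding["confidence"],
        "action": finding["action"],
        "source": ctx["source"],
        "tenant_id_hash": hash_id(ctx["tenant_id"], key),
        "agent_id_hash": hash_id(ctx["agent_id"], key),
        "request_id": ctx["request_id"],
        "profile": ctx.get("profile"),
        "transfer_mechanism": ctx.get("transfer_mechanism"),
        "residency_decision": ctx.get("residency_decision"),
        "custom_rule_id": finding.get("custom_rule_id"),
        "custom_rule_version": finding.get("custom_rule_version"),
    }
```

Because the function copies an explicit list of fields, a stray `value` key on the finding cannot leak into the event, which is the property a snapshot test would assert.
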
+## 11. Test criteria
+- Maintain Korean personal information fixtures.
+- Maintain fixtures for EU special category, US sensitive personal information, HIPAA PHI, PCI card data, and Japan/Singapore/Brazil/Canada.
+- Include both checksum positive and negative fixtures for resident registration numbers, alien registration numbers, and card numbers.
+- Require positive/negative fixtures for every custom rule.
+- Check for regex catastrophic backtracking and excessive CPU/memory use.
+- Test that an allowlist cannot bypass hard-block entities.
+- Keep separate fixtures for prompts, MCP tool input/output, resources, artifacts, and log lines.
+- Measure false positives and false negatives separately.
+- Run snapshot tests confirming that no original text remains in the results before and after filtering.
+- When an external classifier is used, separately check whether personal information is sent in the classifier request payload.
+- Run negative tests for region-deny, local-only, allowed-regions, and a missing transfer mechanism.
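+## 12. Open questions
+- Decide whether to block high-risk identifiers such as resident registration numbers in every environment, or to allow tokenization in air-gapped/customer-managed-key environments.
+- Decide whether name/address detection should rely mainly on deterministic rules or also include an NER classifier.
+- Decide whether the privacy filtering confidence threshold is a global default or set per tenant.
+- Decide whether filtered values are explained to the LLM with a placeholder or removed entirely.
+- Decide whether the product hard-enforces the GDPR/UK GDPR transfer mechanism or only validates customer-provided evidence.
+- Decide whether to include the HIPAA/PCI sector profiles in the MVP.
+- Decide whether to keep the custom filter DSL as its own YAML schema or to adopt an existing expression language such as CEL/OPA/Rego in a limited form.
+- Decide whether customer-provided dictionaries are encrypted with a product-managed KMS or only customer-managed keys are allowed.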
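+## 13. References
+- Standards for Measures to Ensure the Safety of Personal Information (개인정보의 안전성 확보조치 기준): https://law.go.kr/LSW/admRulInfoP.do?admRulSeq=2100000192069&chrClsCd=010201
+- Personal Information Protection Commission, Personal Information Protection Guidelines (개인정보보호지침): https://law.go.kr/LSW/admRulLsInfoP.do?admRulSeq=2100000240116
+- KISA cryptography usage FAQ: https://seed.kisa.or.kr/kisa/bbs/faq.do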
+- European Commission GDPR overview: https://commission.europa.eu/law/law-topic/data-protection/reform/what-does-general-data-protection-regulation-gdpr-govern_en
+- European Commission SCC: https://commission.europa.eu/law/law-topic/data-protection/international-dimension-data-protection/standard-contractual-clauses-scc_en
+- California CCPA: https://www.oag.ca.gov/privacy/ccpa
+- HHS HIPAA Privacy Rule: https://www.hhs.gov/hipaa/for-professionals/privacy/index.html
+- NIST Privacy Framework: https://www.nist.gov/privacy-framework
+- Japan PPC APPI: https://www.ppc.go.jp/en/legal/
+- Singapore PDPC: https://www.imda.gov.sg/About-IMDA/Data-Protection/personal-data-protection
+- Canada PIPEDA: https://www.priv.gc.ca/en/privacy-topics/privacy-laws-in-canada/the-personal-information-protection-and-electronic-documents-act-pipeda/pipeda_brief
+- Brazil LGPD: https://www.gov.br/anpd/pt-br/centrais-de-conteudo/outros-documentos-e-publicacoes-institucionais/lgpd-en-lei-no-13-709-capa.pdf/view
+- PCI DSS: https://www.pcisecuritystandards.org/standards/pci-dss/