npm - haechi - Versions diffs - 0.3.2 - Mend

haechi 0.3.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (54)

package/LICENSE +154 -0
package/README.md +102 -0
package/SECURITY.md +31 -0
package/docs/README.md +35 -0
package/docs/current/api-stability.ko.md +48 -0
package/docs/current/api-stability.md +48 -0
package/docs/current/expert-gap-review-ai-llm-mcp-encryption.ko.md +107 -0
package/docs/current/expert-gap-review-ai-llm-mcp-encryption.md +107 -0
package/docs/current/global-privacy-compliance-review.ko.md +110 -0
package/docs/current/global-privacy-compliance-review.md +110 -0
package/docs/current/initial-plan-ai-llm-mcp-encryption.ko.md +214 -0
package/docs/current/initial-plan-ai-llm-mcp-encryption.md +214 -0
package/docs/current/mvp-0.1-implementation-scope.ko.md +79 -0
package/docs/current/mvp-0.1-implementation-scope.md +79 -0
package/docs/current/open-source-modular-architecture.ko.md +387 -0
package/docs/current/open-source-modular-architecture.md +387 -0
package/docs/current/prd-ai-llm-mcp-encryption.ko.md +260 -0
package/docs/current/prd-ai-llm-mcp-encryption.md +262 -0
package/docs/current/privacy-filtering-policy-draft.ko.md +307 -0
package/docs/current/privacy-filtering-policy-draft.md +307 -0
package/docs/current/release-0.2-implementation-scope.ko.md +46 -0
package/docs/current/release-0.2-implementation-scope.md +46 -0
package/docs/current/release-0.3-implementation-scope.ko.md +86 -0
package/docs/current/release-0.3-implementation-scope.md +86 -0
package/docs/current/release-0.3.2-hardening-scope.ko.md +64 -0
package/docs/current/release-0.3.2-hardening-scope.md +64 -0
package/docs/current/release-0.4-implementation-scope.ko.md +121 -0
package/docs/current/release-0.4-implementation-scope.md +121 -0
package/docs/current/release-process.ko.md +48 -0
package/docs/current/release-process.md +48 -0
package/docs/current/risk-register-release-gate.ko.md +154 -0
package/docs/current/risk-register-release-gate.md +154 -0
package/docs/current/shared-responsibility.ko.md +38 -0
package/docs/current/shared-responsibility.md +38 -0
package/docs/current/threat-model.ko.md +68 -0
package/docs/current/threat-model.md +68 -0
package/examples/llm-prompt-filtering/input.json +13 -0
package/examples/plugins/custom-filter.plugin.json +29 -0
package/haechi.config.example.json +70 -0
package/package.json +74 -0
package/packages/audit/index.mjs +262 -0
package/packages/cli/bin/haechi.mjs +341 -0
package/packages/cli/runtime.mjs +287 -0
package/packages/core/index.mjs +309 -0
package/packages/crypto/index.mjs +142 -0
package/packages/filter/index.mjs +189 -0
package/packages/mcp-stdio/index.mjs +105 -0
package/packages/plugin/index.mjs +83 -0
package/packages/policy/index.mjs +165 -0
package/packages/policy-bundle/index.mjs +91 -0
package/packages/privacy-profiles/index.mjs +92 -0
package/packages/protocol-adapters/index.mjs +111 -0
package/packages/proxy/index.mjs +534 -0
package/packages/token-vault/index.mjs +262 -0

package/docs/current/expert-gap-review-ai-llm-mcp-encryption.md ADDED

@@ -0,0 +1,107 @@
+# Expert Parallel Gap Review: AI/LLM/MCP-Specific Encryption Solution
+- Status: Draft 0.1
+- Date: 2026-06-08
+- Product: Haechi
+- Review method: Parallel review across security/crypto, AI/MCP/A2A architecture, global compliance, product/business, and test strategy personas
+## 1. Summary
+The current document has a solid grasp of `what to protect`. However, the latest direction is an open-source / self-hosted security infrastructure, not a SaaS offering. The gaps below should therefore be read in two categories: "security gaps that are mandatory for the initial OSS core" and "an optional backlog for commercialization or enterprise adoption."
+Even for the initial OSS core, the following six axes are required.
+1. Crypto/policy bypass-resistance: canonical AAD, nonce/replay, key lifecycle, signed policy, fail-closed.
+2. Per-protocol operational contracts: MCP stdio/Streamable HTTP, A2A, gRPC, LLM gateway, RAG/vector, agent memory.
+3. OSS distribution/trust model: self-hosted mode, key custody, telemetry boundary, SECURITY.md, conformance tests, SBOM, signed release.
+4. Global AI/privacy governance: EU AI Act, NIST AI RMF/CSF, OWASP LLM/Agentic, US state privacy, sector profiles.
+5. Adoptability: 5-minute local demo, 30-minute MCP/LLM PoC, 1-day custom filter PoC, proxy/middleware/SDK/sidecar adoption paths.
+6. Verification automation: per-surface plaintext leak sentinel, policy conflict matrix, KMS fault injection, streaming chaos, red-team corpus.
+## 2. P0: Must Address Before Requirements Freeze
+| ID | Missing / Needs Strengthening | Why It Matters | Acceptance Criteria |
+|---|---|---|---|
+| GAP-P0-001 | AAD canonicalization | JSON reordering, Unicode variants, role/source spoofing, and policy version changes can destabilize the decryption context. | AAD schema, canonical JSON, Unicode normalization, policy version, and tenant/user/agent/model/task/tool/resource bindings are fixed; malformed inputs fail decryption. |
+| GAP-P0-002 | nonce/replay/stream sequencing | Nonce reuse or replay in streaming chunks, retries, partial delivery, and duplicate requests is catastrophic. | Identical nonce, cross-session replay, cross-tenant replay, out-of-order chunks, and duplicate chunks are all rejected. |
+| GAP-P0-003 | Key lifecycle | Using KMS/HSM/Vault alone does not define key creation, rotation, revocation, rewrap, backup/restore, or blast radius. | Tenant key rotation, retired key rejection, plaintext key non-export, rewrap job, restore drill, and destruction evidence are all tested. |
+| GAP-P0-004 | Token vault governance | Tokenization must be connected to a DSAR, deletion, retention, and re-identification authorization model. | Token mapping purge, dual-control re-identification, DSAR export, retention expiry, and decision record linkage are verified. |
+| GAP-P0-005 | Signed policy distribution | Client-supplied metadata, stale policy cache, and allowlist misuse can bypass hard-blocks. | Signed policy bundle, version pinning, server-side source classification, emergency rule precedence, and fail-closed validation are enforced. |
+| GAP-P0-006 | MCP transport security contract | The MCP specification is still actively evolving its authorization, lifecycle, protocol version, and security best-practice guidance. | `initialize`/`initialized`, `MCP-Protocol-Version`, OAuth resource binding, stdio env allowlist, token passthrough prohibition, and per-client consent are tested. |
+| GAP-P0-007 | A2A discovery/auth parity | Weak AgentCard, authenticated extended card, SSE/gRPC/REST binding, or push notification boundaries enable agent impersonation. | AgentCard signature/verification, security scheme parity, authenticated extended card, and streaming/resubscribe/push security are consistent across all adapters. |
+| GAP-P0-008 | Observability boundary | Plaintext leaks not only through logs but also through trace baggage, headers, URL query strings, stack traces, crash dumps, metric labels, and replay artifacts. | No sentinel PII/secret plaintext appears in any telemetry sink; only hashes, IDs, and redaction metadata are retained. |
+| GAP-P0-009 | OSS/self-hosted deployment modes | Library, CLI, local proxy, sidecar, and self-hosted service each have different key custody, egress, update, telemetry, and failure boundaries. | Per-mode key custody, network egress, telemetry path, upgrade/rollback, and local-only behavior are documented. |
+| GAP-P0-010 | Open-source shared responsibility note | OSS users must know the scope of their own responsibility for legal decisions, transfer evidence, key custody, DSAR, and incident response. | README/SECURITY/docs explicitly state maintainer responsibility, user responsibility, and a non-compliance disclaimer. |
+| GAP-P0-011 | EU AI Act / AI governance mapping | When Haechi is used as an AI system or AI governance component, interpretations around transparency, role determination, and incidents may be required. | A provider/deployer/GPAI role decision table, transparency note, incident log, and AI risk register template are provided as reference materials. |
+| GAP-P0-012 | Build-blocking security tests | Current acceptance criteria are declarative. A security product must be explicit about which failures block the build. | Plaintext leak, policy conflict, KMS fault, replay, global profile violations, and hard-block bypass failures all become CI gates. |
+| GAP-P0-013 | Easy adoption path | Strong security matters little if the project is too hard to adopt for it to spread as OSS and see real-world use. | `haechi init`, dry-run/report-only, preset policy, local proxy, copy-paste middleware, and 5-minute/30-minute/1-day adoption targets are validated through documentation and examples. |
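As an illustration of GAP-P0-001, the following is a minimal sketch of AAD canonicalization, not Haechi's actual implementation. The field list `AAD_FIELDS` and the function names `canonicalize` and `buildAad` are assumptions made for this example; the review only states that the bindings must be fixed and that malformed inputs must fail. The sketch sorts object keys, applies Unicode NFC normalization, and rejects missing or unexpected fields.

```javascript
// Hypothetical AAD field list; the real schema is not specified in this review.
const AAD_FIELDS = ["tenant", "user", "agent", "model", "task", "tool", "resource", "policyVersion"];

// Serialize a value as canonical JSON: sorted object keys, NFC-normalized strings.
function canonicalize(value) {
  if (typeof value === "string") return JSON.stringify(value.normalize("NFC"));
  if (typeof value === "number" && Number.isFinite(value)) return JSON.stringify(value);
  if (typeof value === "boolean" || value === null) return JSON.stringify(value);
  if (Array.isArray(value)) return "[" + value.map(canonicalize).join(",") + "]";
  if (typeof value === "object") {
    const keys = Object.keys(value).sort();
    const members = keys.map((k) => JSON.stringify(k.normalize("NFC")) + ":" + canonicalize(value[k]));
    return "{" + members.join(",") + "}";
  }
  throw new TypeError("unsupported AAD value type");
}

// Build the AAD string. Missing, empty, or unexpected fields fail closed by throwing.
function buildAad(context) {
  for (const key of Object.keys(context)) {
    if (!AAD_FIELDS.includes(key)) throw new Error("unexpected AAD field: " + key);
  }
  for (const field of AAD_FIELDS) {
    if (typeof context[field] !== "string" || context[field].length === 0) {
      throw new Error("missing AAD field: " + field);
    }
  }
  return canonicalize(context);
}
```

With this shape, two contexts that differ only in key order or in composed versus decomposed Unicode produce the same AAD string, so an authenticated-encryption call bound to that string would not be destabilized by JSON reordering.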
+## 3. P1: Address Before OSS Adoption and Production Use
+| ID | Missing / Needs Strengthening | Why It Matters | Acceptance Criteria |
+|---|---|---|---|
+| GAP-P1-001 | Provider-neutral LLM message model | An OpenAI-compatible schema alone cannot uniformly govern Anthropic, multimodal, structured output, tool/function calls, and streaming chunks. | An internal canonical message schema, provider adapter mapping, pre/post policy hooks, and chunk-level decision records exist. |
+| GAP-P1-002 | MCP registry/provenance/cache invalidation | A stale or poisoned tools/resources/prompts listing or `listChanged` event results in incorrect tool policy being applied. | Registry entry owner, version, hash, scope, cache invalidation, discovery auth, and deterministic ordering are verified. |
+| GAP-P1-003 | gRPC streaming semantics | Field encryption alone cannot handle deadline, cancellation, retry, ordering, and metadata leakage. | Deadline propagation, cancel audit, retry idempotency, metadata scrub, and partial delivery semantics are tested. |
+| GAP-P1-004 | RAG/vector DB protection | Protecting snippets alone leaves embedding, vector namespace, citation, and deletion propagation unaddressed. | Tenant-scoped namespace, embedding/source metadata policy, index deletion propagation, and citation redaction are verified. |
+| GAP-P1-005 | Agent memory lifecycle | Long-term memory and recall cannot be governed by task/context encryption alone. | Ephemeral/durable memory distinction, TTL/purge/export, per-tenant/agent namespace, and cross-task recall denial all function correctly. |
+| GAP-P1-006 | Multi-tenant isolation | Key separation alone is insufficient to isolate policy, audit, memory, rate limits, and the admin plane. | Tenant config store, audit sink, quota, provider allowlist, admin RBAC, and blast-radius limits are isolated per tenant. |
+| GAP-P1-007 | SDK/proxy/plugin deployment model | Customers must be able to decide whether to adopt in-process SDK, sidecar, gateway plugin, or server middleware. | Compatibility, failure boundaries, performance cost, rollback, and upgrade policy for each mode are documented. |
+| GAP-P1-008 | Custom DSL safety | Custom filters are an attack surface for parser bugs, regex DoS, allowlist bypass, and external classifier leakage. | DSL fuzzing, regex resource limits, conflict golden tests, and classifier egress tests run in CI. |
+| GAP-P1-009 | Supply chain integrity | A compromised SDK, connector, classifier plugin, or policy package turns the security product into an attack vector. | SBOM, provenance, artifact signing, dependency vulnerability policy, and plugin trust policy are required. |
+| GAP-P1-010 | AI red-team corpus | Treating prompt injection as a non-goal leaves tool-output injection, resource poisoning, and exfiltration unaddressed. | A red-team corpus mapped to OWASP LLM/Agentic threats exists, and block rationale is recorded in decision records. |
+| GAP-P1-011 | US privacy expansion | CCPA/CPRA alone is insufficient for full US market coverage. | Profiles or exclusion rationale for Colorado, Connecticut, Virginia, Texas, and Washington My Health My Data Act are defined. |
+| GAP-P1-012 | Sector operational controls | HIPAA/PCI require more than identifier detection. | BAA, ePHI audit, breach workflow, SAD storage prohibition, retention/disposal, and MFA evidence are verified. |
+| GAP-P1-013 | OSS trust evidence | Commercial certifications are a lower priority, but a security OSS project requires threat model, security policy, SBOM, release provenance, and test results as prerequisites for trust. | SECURITY.md, threat model, SBOM, signed release, and conformance results are publicly available. |
+| GAP-P1-014 | Adoption packaging | No commercial SKUs are needed, but users must have the information to decide which modules to integrate and how. | A core/filter/policy/crypto/mcp/llm/audit package matrix and a per-example integration guide are provided. |
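The acceptance criteria for GAP-P1-005 (agent memory lifecycle) can be sketched as a small in-memory store. This is an illustration under stated assumptions, not the project's design: the class name `AgentMemoryStore`, its methods, and the rule that only durable entries may be recalled from a different task are all hypothetical choices made for this example.

```javascript
// Hypothetical in-memory store; a real one would persist and encrypt entries.
class AgentMemoryStore {
  constructor(now = () => Date.now()) {
    this.now = now; // injectable clock so TTL expiry can be tested
    this.entries = new Map();
  }

  // Namespace key: entries never collide across tenants or agents.
  static key(scope, name) {
    return JSON.stringify([scope.tenant, scope.agent, name]);
  }

  put(scope, name, value, { durable = false, ttlMs = 60000 } = {}) {
    this.entries.set(AgentMemoryStore.key(scope, name), {
      task: scope.task,
      value,
      durable,
      expiresAt: this.now() + ttlMs,
    });
  }

  // Returns undefined for unknown, expired, or cross-task ephemeral entries.
  recall(scope, name) {
    const key = AgentMemoryStore.key(scope, name);
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key);
      return undefined;
    }
    // Assumption: ephemeral memory is bound to the task that wrote it.
    if (!entry.durable && entry.task !== scope.task) return undefined;
    return entry.value;
  }

  // Remove every expired entry and report how many were purged.
  purgeExpired() {
    let purged = 0;
    for (const [key, entry] of this.entries) {
      if (this.now() >= entry.expiresAt) {
        this.entries.delete(key);
        purged += 1;
      }
    }
    return purged;
  }
}
```

Export and audit hooks are omitted here; the point is only that TTL, namespace, and cross-task denial can each be expressed as a testable rule.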
+## 4. P2: Roadmap / Hardening
+| ID | Item | Rationale | Completion Criteria |
+|---|---|---|---|
+| GAP-P2-001 | Crypto agility / PQC migration | The envelope format may handle long-retention data, so algorithm deprecation and a PQC migration plan are needed. | Envelope versioning, deprecation window, rewrap, and an HPKE/PQC review note exist. |
+| GAP-P2-002 | STPA/CAST-based system-theoretic risk analysis | Agentic AI is more vulnerable to structural failures and cascading failures than to individual vulnerabilities. | STPA-Sec or CAST format hazard/control loop analysis is performed for high-risk use cases. |
+| GAP-P2-003 | China/India/Australia market matrix | A global product must decide whether to support major markets with profiles or explicitly exclude them. | A profile or exclusion rationale for PIPL, DPDP Act, and Australia Privacy Act is documented. |
+| GAP-P2-004 | Optional commercialization path | OSS/self-hosted is the current priority, but commercial support, consulting, and managed offerings may be chosen long-term. | Partner/channel, support policy, SLA, and pricing/SKU are documented separately only when commercialization is revisited. |
+| GAP-P2-005 | Performance/soak budget | Filtering and encryption directly affect latency and cost. | p95/p99 latency, throughput, memory, regex CPU limits, and telemetry overhead budget are verified through CI/soak tests. |
+## 5. Key Findings by Expert Persona
+| Persona | Key Findings |
+|---|---|
+| Security/Crypto | AAD canonicalization, nonce/replay, key lifecycle, token vault, and signed policy are all P0. |
+| AI/MCP/A2A Architecture | Per-protocol transport/auth/lifecycle contracts, observability boundaries, RAG/vector, agent memory, and multi-tenancy are all underdefined. |
+| Global Compliance | EU AI Act, ISO 27001/27701/42001, NIST AI RMF/CSF, OWASP GenAI/Agentic, and US state privacy expansions require reference profiles and exclusion rationale. |
+| Product/Business | Adoption is the priority over sales for now. Quickstart, package boundaries, examples, README, SECURITY.md, and plugin conformance are the key deliverables. |
+| Test Strategy | Security requirements must be enforced as build-blocking CI gates. Plaintext leak, policy conflict, KMS fault, replay, global profile violations, and DSL fuzzing are the critical test areas. |
+## 6. Documents to Add Immediately
+| Priority | Document | Purpose |
+|---|---|---|
+| 1 | `crypto-envelope-spec.md` | Define AAD canonicalization, envelope version, nonce, replay, and key lifecycle |
+| 2 | `security-test-spec-ai-llm-mcp.md` | Define build-blocking negative tests and red-team corpus |
+| 3 | `protocol-security-contract-mcp-a2a-grpc.md` | Per-adapter transport, auth, lifecycle, and metadata scrub contracts for MCP/A2A/gRPC/LLM |
+| 4 | `open-source-modular-architecture.md` | OSS/self-hosted package boundaries, provider/plugin API, conformance tests |
+| 5 | `self-hosted-shared-responsibility.md` | Per-mode key custody, telemetry, and user/maintainer responsibility for library/CLI/proxy/sidecar |
+| 6 | `rag-agent-memory-protection-design.md` | Protection for RAG/vector DB, source metadata, citations, and agent memory lifecycle |
+| 7 | `optional-enterprise-evidence-pack.md` | SOC 2, ISO, DPA, SCC/IDTA, and BAA evidence for when commercialization is revisited |
+## 7. Official References
+- Model Context Protocol latest specification: https://modelcontextprotocol.io/specification/
+- MCP authorization: https://modelcontextprotocol.io/specification/2025-11-25/basic/authorization
+- MCP security best practices: https://modelcontextprotocol.io/docs/tutorials/security/security_best_practices
+- NSA, Model Context Protocol Security Design Considerations, May 2026: https://www.nsa.gov/Portals/75/documents/Cybersecurity/CSI_MCP_SECURITY.pdf
+- NSA/Five Eyes, Careful Adoption of Agentic AI Services, April 2026: https://media.defense.gov/2026/Apr/30/2003922823/-1/-1/0/CAREFUL%20ADOPTION%20OF%20AGENTIC%20AI%20SERVICES_FINAL.PDF
+- A2A Protocol latest specification: https://a2a-protocol.org/latest/specification/
+- NIST AI Risk Management Framework: https://www.nist.gov/itl/ai-risk-management-framework
+- NIST Cybersecurity Framework 2.0: https://www.nist.gov/cyberframework
+- NIST Generative AI Profile, AI 600-1: https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.600-1.pdf
+- European Commission, AI Act enters into force: https://commission.europa.eu/news-and-media/news/ai-act-enters-force-2024-08-01_en
+- OWASP Top 10 for LLM Applications 2025: https://genai.owasp.org/resource/owasp-top-10-for-llm-applications-2025/
+- OWASP Top 10 for Agentic Applications 2026: https://genai.owasp.org/resource/owasp-top-10-for-agentic-applications-for-2026/
+- ISO/IEC 27001:2022: https://www.iso.org/standard/27001
+- ISO/IEC 27701:2025: https://www.iso.org/standard/27701
+- ISO/IEC 42001:2023: https://www.iso.org/standard/42001
+- AICPA SOC Suite of Services: https://www.aicpa-cima.com/resources/landing/system-and-organization-controls-soc-suite-of-services

package/docs/current/global-privacy-compliance-review.ko.md ADDED

@@ -0,0 +1,110 @@
+# Global Privacy and AI Compliance Review
+- Status: Draft 0.1
+- Date: 2026-06-08
+- Product: Haechi
+## 1. Conclusion
+Haechi can start as a product for Korea, but designing it as a global product requires implementing privacy filtering as `regional privacy profile`s rather than a single ruleset. More than which data was detected, global customers ask the following questions:
+- Which region's data subjects does the data belong to?
+- To which region's model provider, MCP server, agent, or subprocessor is the data sent?
+- Is plaintext disclosure required, or is tokenization sufficient?
+- Can DSAR, deletion, audit, and DPIA/PIA evidence be produced?
+- Does the data include sector data such as PHI, cardholder data, or education/financial data?
+## 2. Priority Profiles to Support
+| Priority | Profile | Rationale |
+|---|---|---|
+| P0 | KR-PIPA | Baseline requirement for the domestic market |
+| P0 | EU-GDPR / UK-GDPR | Global privacy benchmark; cross-border transfer requirements |
+| P0 | US-CCPA-CPRA | US consumer privacy and sensitive personal information coverage |
+| P1 | US-HIPAA | PHI protection for healthcare AI/MCP adoption |
+| P1 | PCI-DSS | Card data protection for payment/commerce agents |
+| P1 | JP-APPI | Adoption by Japanese users and enterprises |
+| P1 | SG-PDPA | APAC regional hub and self-hosted adoption |
+| P2 | CA-PIPEDA | Canadian commercial customers |
+| P2 | BR-LGPD | Brazil/Latin America market entry |
+## 3. Product Requirements Impact
+| Area | Globalization impact |
+|---|---|
+| Privacy detection | Per-country identifier, sensitive category, and sector data fixtures required |
+| Custom filtering | Per-customer internal identifiers, internal code names, and proprietary data rules required |
+| Policy engine | Region, data subject, provider region, and transfer mechanism used as context |
+| Encryption | Key separation per tenant/region/profile required |
+| Token vault | DSAR, deletion, retention, and re-identification permission model required |
+| MCP/A2A | Region and provider allowlist included in AAD/permissions in addition to agent/task/context |
+| Logging/audit | Decision records without raw content, plus profile, transfer mechanism, and residency decision required |
+| Deployment | Local-only, region-locked, and allowed-regions modes required |
+| AI governance | EU AI Act role determination, transparency notice, incident log, and AI risk register required |
+| Governance/certification | At the OSS stage, SECURITY.md, threat model, SBOM, and signed release come first; ISO 27001/27701/42001 and SOC 2 readiness are lower-priority reference material |
+| US expansion | State privacy laws beyond CCPA/CPRA, Washington consumer health data, and HIPAA/PCI operational requirements needed |
+## 4. Cross-Border Transfer Design
+Haechi does not replace legal contracts. However, it must be able to technically enforce or produce evidence for the following:
+| Requirement | Product behavior |
+|---|---|
+| Region allowlist | Only call model providers/MCP servers/agents in allowed regions |
+| Transfer mechanism evidence | Customer-supplied values such as SCC, IDTA, adequacy, BCR, and consent included in decision records |
+| Local-only | Block external provider calls for certain profiles |
+| Tokenization before transfer | Tokenize direct identifiers before overseas transfer |
+| Encrypted artifact transfer | A2A/MCP artifacts encrypted with task/context-scoped keys before transfer |
+## 5. Global Data Categories
+| Category | Examples | Default handling |
+|---|---|---|
+| Direct identifiers | national ID, SSN, passport, driver's license, alien registration | block/tokenize |
+| Contact identifiers | email, phone, address | mask/tokenize |
+| Online identifiers | IP, cookie id, device id, advertising id | tokenize/redact |
+| Financial data | bank account, card number, payment token | tokenize/block |
+| Health/biometric/genetic | PHI, biometric template, genetic data | block/human-review |
+| Children data | minor/child related data | block/human-review |
+| Sensitive beliefs/status | religion, union, politics, ethnicity, immigration/citizenship | block/human-review |
+| AI-specific | prompt secrets, tool output PII, RAG snippets, generated artifact | redact/tokenize/encrypt |
+## 6. Global Validation Criteria
+- Maintain positive/negative fixtures per region.
+- Maintain tenant custom rule positive/negative fixtures and rule lifecycle audits.
+- Keep GDPR special category, CCPA sensitive personal information, HIPAA PHI, and PCI card data fixtures separate.
+- Test that calls are blocked when the model provider region is outside the allowlist.
+- Test that EU/UK/BR profiles return a region-deny decision when the transfer mechanism is missing.
+- Test that token vault deletion and DSAR export are linked to decision records.
+- Run snapshot tests to verify that audit logs contain no raw personal data.
+- Validate the EU AI Act role determination table, transparency notice, synthetic content label, and incident record.
+- Review whether OSS trust evidence includes SECURITY.md, threat model, SBOM, signed release, and conformance test results.
+- Write the ISO/SOC 2 evidence pack at lower priority, when commercialization or enterprise support is revisited.
+- Record US state privacy and Washington consumer health data fixtures, or the rationale for excluding them, in the market-expansion matrix.
+- HIPAA/PCI sector profiles include BAA, ePHI audit, SAD storage prohibition, and retention/disposal evidence.
+## 7. Official References
+- European Commission GDPR overview: https://commission.europa.eu/law/law-topic/data-protection/reform/what-does-general-data-protection-regulation-gdpr-govern_en
+- European Commission SCC: https://commission.europa.eu/law/law-topic/data-protection/international-dimension-data-protection/standard-contractual-clauses-scc_en
+- EU AI Act: https://eur-lex.europa.eu/eli/reg/2024/1689/oj/eng
+- European Commission AI Act overview: https://commission.europa.eu/news-and-media/news/ai-act-enters-force-2024-08-01_en
+- NIST AI Risk Management Framework: https://www.nist.gov/itl/ai-risk-management-framework
+- NIST Cybersecurity Framework 2.0: https://www.nist.gov/cyberframework
+- OWASP Top 10 for LLM Applications 2025: https://genai.owasp.org/resource/owasp-top-10-for-llm-applications-2025/
+- OWASP Top 10 for Agentic Applications 2026: https://genai.owasp.org/resource/owasp-top-10-for-agentic-applications-for-2026/
+- ISO/IEC 27001:2022: https://www.iso.org/standard/27001
+- ISO/IEC 27701:2025: https://www.iso.org/standard/27701
+- ISO/IEC 42001:2023: https://www.iso.org/standard/42001
+- AICPA SOC Suite of Services: https://www.aicpa-cima.com/resources/landing/system-and-organization-controls-soc-suite-of-services
+- California CCPA: https://www.oag.ca.gov/privacy/ccpa
+- California CPPA FAQ: https://cppa.ca.gov/faq
+- HHS HIPAA Privacy Rule: https://www.hhs.gov/hipaa/for-professionals/privacy/index.html
+- HHS HIPAA De-identification: https://www.hhs.gov/hipaa/for-professionals/privacy/special-topics/de-identification/index.html
+- NIST Privacy Framework: https://www.nist.gov/privacy-framework
+- Japan PPC APPI: https://www.ppc.go.jp/en/legal/
+- Singapore PDPC: https://www.imda.gov.sg/About-IMDA/Data-Protection/personal-data-protection
+- Canada PIPEDA: https://www.priv.gc.ca/en/privacy-topics/privacy-laws-in-canada/the-personal-information-protection-and-electronic-documents-act-pipeda/pipeda_brief
+- Brazil LGPD: https://www.gov.br/anpd/pt-br/centrais-de-conteudo/outros-documentos-e-publicacoes-institucionais/lgpd-en-lei-no-13-709-capa.pdf/view
+- PCI DSS: https://www.pcisecuritystandards.org/standards/pci-dss/

package/docs/current/global-privacy-compliance-review.md ADDED

@@ -0,0 +1,110 @@
+# Global Privacy and AI Compliance Review
+- Status: Draft 0.1
+- Date: 2026-06-08
+- Product: Haechi
+## 1. Summary
+Haechi can launch as a Korea-first product, but designing it as a global product requires implementing privacy filtering not as a single ruleset but as `regional privacy profiles`. Global customers care less about which specific data items are detected and more about the following questions:
+- Which region's data subjects are involved?
+- To which region's model provider, MCP server, agent, or subprocessor is data being sent?
+- Is plaintext disclosure required, or is tokenization sufficient?
+- Can DSAR, deletion, audit, and DPIA/PIA evidence be produced?
+- Does the data include sector-specific data such as PHI, cardholder data, or education/financial data?
+## 2. Priority Profiles to Support
+| Priority | Profile | Rationale |
+|---|---|---|
+| P0 | KR-PIPA | Core requirement for the domestic market |
+| P0 | EU-GDPR / UK-GDPR | Global privacy benchmark; cross-border transfer requirements |
+| P0 | US-CCPA-CPRA | US consumer privacy and sensitive personal information coverage |
+| P1 | US-HIPAA | PHI protection for healthcare AI/MCP adoption |
+| P1 | PCI-DSS | Card data protection for payment/commerce agents |
+| P1 | JP-APPI | Adoption by Japanese users and enterprises |
+| P1 | SG-PDPA | APAC regional hub and self-hosted deployments |
+| P2 | CA-PIPEDA | Canadian commercial customers |
+| P2 | BR-LGPD | Brazil/Latin America market entry |
+## 3. Product Requirements Impact
+| Area | Globalization impact |
+|---|---|
+| Privacy detection | Per-country identifier, sensitive category, and sector data fixtures required |
+| Custom filtering | Per-customer internal identifiers, internal code names, and proprietary data rules required |
+| Policy engine | Region, data subject, provider region, and transfer mechanism used as policy context |
+| Encryption | Per-tenant/region/profile key separation required |
+| Token vault | DSAR, deletion, retention, and re-identification permission model required |
+| MCP/A2A | Region and provider allowlists included in AAD/permissions alongside agent/task/context |
+| Logging/audit | Decision records without raw content required, including profile, transfer mechanism, and residency decision |
+| Deployment | Local-only, region-locked, and allowed-regions modes required |
+| AI governance | EU AI Act role determination, transparency notice, incident log, and AI risk register required |
+| Governance/certification | At OSS stage, SECURITY.md, threat model, SBOM, and signed releases are the priority; ISO 27001/27701/42001 and SOC 2 readiness are secondary reference material |
+| US expansion | State privacy laws beyond CCPA/CPRA, Washington consumer health data, and HIPAA/PCI operational requirements needed |
+## 4. Cross-Border Transfer Design
+Haechi does not replace legal contracts. However, it must be able to technically enforce or produce evidence for the following:
+| Requirement | Product behavior |
+|---|---|
+| Region allowlist | Only call model providers/MCP servers/agents in allowed regions |
+| Transfer mechanism evidence | Customer-supplied values (SCC, IDTA, adequacy, BCR, consent, etc.) included in decision records |
+| Local-only | Block external provider calls for certain profiles |
+| Tokenization before transfer | Tokenize direct identifiers before cross-border transmission |
+| Encrypted artifact transfer | A2A/MCP artifacts encrypted with task/context-scoped keys before transmission |
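The region allowlist, local-only, and transfer mechanism rows above could be combined into a single policy decision, roughly as follows. This is a hedged sketch: `decideTransfer`, the `policy` shape (`allowedRegions`, `localOnlyProfiles`), the region strings, and the reason codes are hypothetical names for illustration. The set of profiles that require a transfer mechanism is taken from the validation criteria in this review, which name the EU, UK, and BR profiles.

```javascript
// Profiles that require a transfer mechanism for cross-border calls (assumed
// from the validation criteria, which name the EU/UK/BR profiles).
const TRANSFER_MECHANISM_REQUIRED = new Set(["EU-GDPR", "UK-GDPR", "BR-LGPD"]);

// Return a decision record containing no raw content, only context and outcome.
function decideTransfer(request, policy) {
  const record = {
    profile: request.profile,
    dataRegion: request.dataRegion,
    providerRegion: request.providerRegion,
    transferMechanism: request.transferMechanism || null,
    decision: "allow",
    reason: "region-allowed",
  };
  if (policy.localOnlyProfiles.includes(request.profile)) {
    return Object.assign(record, { decision: "deny", reason: "local-only" });
  }
  if (!policy.allowedRegions.includes(request.providerRegion)) {
    return Object.assign(record, { decision: "deny", reason: "region-not-allowlisted" });
  }
  const crossBorder = request.dataRegion !== request.providerRegion;
  if (crossBorder && TRANSFER_MECHANISM_REQUIRED.has(request.profile) && !request.transferMechanism) {
    return Object.assign(record, { decision: "deny", reason: "transfer-mechanism-missing" });
  }
  return record;
}
```

The customer-supplied mechanism (for example `"SCC"`) is copied into the record as evidence only; the sketch does not attempt to judge whether the mechanism is legally valid.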
+## 5. Global Data Categories
+| Category | Examples | Default handling |
+|---|---|---|
+| Direct identifiers | National ID, SSN, passport, driver's license, alien registration | block/tokenize |
+| Contact identifiers | Email, phone, address | mask/tokenize |
+| Online identifiers | IP, cookie ID, device ID, advertising ID | tokenize/redact |
+| Financial data | Bank account, card number, payment token | tokenize/block |
+| Health/biometric/genetic | PHI, biometric template, genetic data | block/human-review |
+| Children's data | Minor/child-related data | block/human-review |
+| Sensitive beliefs/status | Religion, union membership, politics, ethnicity, immigration/citizenship | block/human-review |
+| AI-specific | Prompt secrets, tool output PII, RAG snippets, generated artifacts | redact/tokenize/encrypt |
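One way to carry the table above into configuration is a lookup from category to its listed handling options. The category keys, the `defaultAction` helper, the choice of the first listed option as the default, and the fail-closed `block` for unknown categories are all assumptions of this sketch rather than behavior documented in the review.

```javascript
// Category keys are hypothetical slugs; the actions mirror the table's
// "Default handling" column in the order listed there.
const DEFAULT_HANDLING = new Map([
  ["direct-identifier", ["block", "tokenize"]],
  ["contact-identifier", ["mask", "tokenize"]],
  ["online-identifier", ["tokenize", "redact"]],
  ["financial", ["tokenize", "block"]],
  ["health-biometric-genetic", ["block", "human-review"]],
  ["children", ["block", "human-review"]],
  ["sensitive-belief-status", ["block", "human-review"]],
  ["ai-specific", ["redact", "tokenize", "encrypt"]],
]);

// Pick the first listed action; categories not in the map are blocked
// (an assumed fail-closed default).
function defaultAction(category) {
  const actions = DEFAULT_HANDLING.get(category);
  return actions ? actions[0] : "block";
}
```

A `Map` is used instead of a plain object so that strings such as `"constructor"` cannot resolve to inherited properties.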
+## 6. Global Validation Criteria
+- Maintain positive/negative fixtures per region.
+- Maintain tenant custom rule positive/negative fixtures and rule lifecycle audits.
+- Keep GDPR special category, CCPA sensitive personal information, HIPAA PHI, and PCI card data fixtures separate.
+- Test that calls are blocked when the model provider region is outside the allowlist.
+- Test that EU/UK/BR profiles return a region-deny decision when the transfer mechanism is missing.
+- Test that token vault deletion and DSAR export are linked to decision records.
+- Run snapshot tests to verify that raw personal data is absent from audit logs.
+- Validate EU AI Act role determination table, transparency notice, synthetic content label, and incident records.
+- Review whether OSS trust evidence includes SECURITY.md, threat model, SBOM, signed release, and conformance test results.
+- Defer ISO/SOC 2 evidence pack to when commercialization or enterprise support is revisited.
+- Record US state privacy and Washington consumer health data fixtures or exclusion rationale in the market-expansion matrix.
+- HIPAA/PCI sector profiles must include BAA, ePHI audit, SAD storage prohibition, and retention/disposal evidence.
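The audit-log criterion above (no raw personal data in audit logs) can be approximated with a sentinel scan: feed known fake values through the system and assert that none of them appear in any serialized audit record. The sentinel strings and the `findLeaks` helper are hypothetical, and the sketch assumes audit records are JSON-serializable.

```javascript
// Synthetic sentinel values; they are fake and exist only to be searched for.
const SENTINELS = ["SENTINEL-SSN-000-00-0000", "sentinel.user@example.invalid"];

// Return every sentinel that appears anywhere in the serialized audit records.
function findLeaks(auditRecords, sentinels = SENTINELS) {
  const serialized = JSON.stringify(auditRecords);
  return sentinels.filter((sentinel) => serialized.includes(sentinel));
}

// Example records: the first keeps only a hash and redaction metadata,
// the second leaks a raw value through an error message.
const cleanRecord = { decision: "tokenize", category: "contact-identifier", valueHash: "a1b2c3" };
const leakyRecord = { decision: "error", message: "failed on sentinel.user@example.invalid" };
```

A CI gate would fail the build whenever `findLeaks` returns a non-empty array. Plain substring matching does not catch encoded or transformed copies (for example base64), so a fuller check would also scan for those forms.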
+## 7. Official References
+- European Commission GDPR overview: https://commission.europa.eu/law/law-topic/data-protection/reform/what-does-general-data-protection-regulation-gdpr-govern_en
+- European Commission SCC: https://commission.europa.eu/law/law-topic/data-protection/international-dimension-data-protection/standard-contractual-clauses-scc_en
+- EU AI Act: https://eur-lex.europa.eu/eli/reg/2024/1689/oj/eng
+- European Commission AI Act overview: https://commission.europa.eu/news-and-media/news/ai-act-enters-force-2024-08-01_en
+- NIST AI Risk Management Framework: https://www.nist.gov/itl/ai-risk-management-framework
+- NIST Cybersecurity Framework 2.0: https://www.nist.gov/cyberframework
+- OWASP Top 10 for LLM Applications 2025: https://genai.owasp.org/resource/owasp-top-10-for-llm-applications-2025/
+- OWASP Top 10 for Agentic Applications 2026: https://genai.owasp.org/resource/owasp-top-10-for-agentic-applications-for-2026/
+- ISO/IEC 27001:2022: https://www.iso.org/standard/27001
+- ISO/IEC 27701:2025: https://www.iso.org/standard/27701
+- ISO/IEC 42001:2023: https://www.iso.org/standard/42001
+- AICPA SOC Suite of Services: https://www.aicpa-cima.com/resources/landing/system-and-organization-controls-soc-suite-of-services
+- California CCPA: https://www.oag.ca.gov/privacy/ccpa
+- California CPPA FAQ: https://cppa.ca.gov/faq
+- HHS HIPAA Privacy Rule: https://www.hhs.gov/hipaa/for-professionals/privacy/index.html
+- HHS HIPAA De-identification: https://www.hhs.gov/hipaa/for-professionals/privacy/special-topics/de-identification/index.html
+- NIST Privacy Framework: https://www.nist.gov/privacy-framework
+- Japan PPC APPI: https://www.ppc.go.jp/en/legal/
+- Singapore PDPC: https://www.imda.gov.sg/About-IMDA/Data-Protection/personal-data-protection
+- Canada PIPEDA: https://www.priv.gc.ca/en/privacy-topics/privacy-laws-in-canada/the-personal-information-protection-and-electronic-documents-act-pipeda/pipeda_brief
+- Brazil LGPD: https://www.gov.br/anpd/pt-br/centrais-de-conteudo/outros-documentos-e-publicacoes-institucionais/lgpd-en-lei-no-13-709-capa.pdf/view
+- PCI DSS: https://www.pcisecuritystandards.org/standards/pci-dss/

package/docs/current/initial-plan-ai-llm-mcp-encryption.ko.md ADDED

@@ -0,0 +1,214 @@
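+# Initial Plan: Encryption Solution Specialized for AI/LLM/MCP
+- Document status: Draft 0.1
+- Date written: 2026-06-08
+- Product name: Haechi
+## 1. Direction Assessment
+An encryption solution specialized for AI, LLM, and MCP has a sharper market position than general-purpose in-transit encryption. General API encryption products often stop at protecting the HTTP payload, but AI systems have new units of sensitive data: prompt, context, tool-call, resource, retrieval snippet, artifact, and streaming event.
+With MCP and A2A in particular, security boundaries blur as the agent and tool ecosystem expands. Haechi can be the product that sits at this boundary and turns the question "what may be shown in plaintext to which agent, tool, model, or provider?" into policy.
+The initial direction is an open-source/self-hosted security project, not SaaS. The first goal is therefore not a sellable control plane but to publish a core interface with a clear security design, a replaceable reference engine, conformance tests, and real-world MCP/LLM examples.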
+## 2. Core Hypotheses
+| ID | Hypothesis | Validation method |
+|---|---|---|
+| HYP-001 | Enterprises want to reduce how often prompts and tool output are left in plaintext in LLM gateway logs. | AI gateway PoC |
+| HYP-002 | MCP server developers want a common module for safely exposing tool input/output and resource content. | MCP server sample |
+| HYP-003 | Agent-to-agent systems need decryption permissions scoped per task, context, and artifact. | A2A adapter PoC |
+| HYP-004 | Separating "information the model must see" from "information the system only needs to store or forward" increases the value of an encryption product. | selective reveal demo |
+| HYP-005 | Security teams and OSS adopters want to control KMS/HSM, audit, policy, and redaction outside the agent framework. | OSS adopter / security reviewer interview |
+| HYP-006 | When adopting LLM/MCP, enterprises require privacy filtering as a baseline control as important as encryption. | Korean PII filtering PoC |
+| HYP-007 | Global customers require per-region privacy profiles and control over data residency and model provider region. | regional profile PoC |
+| HYP-008 | Customers require a custom filtering feature for registering internal identifiers and confidential names themselves. | custom rule DSL PoC |
+| HYP-009 | OSS adopters prefer a small core whose crypto, policy, filtering, and audit implementations can be swapped to fit their environment over a full-featured SaaS. | plugin API PoC + conformance tests |
+| HYP-010 | Even a security tool will not spread if it is hard to apply. A 5-minute local demo, a 30-minute MCP/LLM PoC, and a 1-day custom filter PoC must be possible. | quickstart usability test |
+## 3. Priority Use Cases
+### 3.1 MCP Tool-call Protection
+- The MCP client creates a tool call.
+- The Haechi policy evaluates the tool name, arguments schema, tenant, user, and agent id.
+- Sensitive arguments are protected with tokenization or envelope encryption.
+- The MCP server decrypts only in an allowed context.
+- The tool result is returned to the agent after default redaction.
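The tool-call flow above can be sketched roughly as follows. This is a minimal illustration, not Haechi's actual API: `ToolCallContext`, `ToolPolicy`, `TokenVault`, and `protect_tool_call` are hypothetical names, the vault is an in-memory stand-in, and only the tokenization branch is shown (envelope encryption is omitted).

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class ToolCallContext:
    tenant: str
    user: str
    agent_id: str
    tool_name: str


@dataclass
class ToolPolicy:
    # Hypothetical policy shape: which agents may call the tool and
    # which argument names must be tokenized before leaving the client.
    allowed_agents: set[str]
    sensitive_args: set[str]


@dataclass
class TokenVault:
    # In-memory stand-in for a real token vault (assumption).
    _store: dict[str, str] = field(default_factory=dict)

    def tokenize(self, value: str) -> str:
        token = "tok_" + secrets.token_hex(8)
        self._store[token] = value
        return token

    def detokenize(self, token: str) -> str:
        return self._store[token]


def protect_tool_call(ctx: ToolCallContext, args: dict[str, str],
                      policies: dict[str, ToolPolicy], vault: TokenVault) -> dict[str, str]:
    policy = policies.get(ctx.tool_name)
    # Unknown tools and disallowed agents are rejected rather than passed through.
    if policy is None or ctx.agent_id not in policy.allowed_agents:
        raise PermissionError(f"tool call denied: {ctx.tool_name}")
    # Sensitive arguments are replaced by opaque tokens; others pass unchanged.
    return {
        name: vault.tokenize(value) if name in policy.sensitive_args else value
        for name, value in args.items()
    }
```

In this sketch the MCP server side would call `vault.detokenize` only after its own context check passes, which corresponds to the "decrypts only in an allowed context" step.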
+### 3.2 MCP Resource Protection
+- Map the resource URI and content classification to policy.
+- Encrypt resource content with a tenant/resource scope key.
+- Provide the LLM with a redacted summary or reference token instead of the original text.
+- Record only the resource URI hash, policy id, key id, and decision id in the audit log.
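The last bullet can be illustrated with a small sketch of an audit record that carries a hash of the resource URI rather than the URI itself. The function name and the JSON field names are assumptions made for illustration; the source only lists which values the record may contain.

```python
import hashlib
import json


def resource_audit_record(resource_uri: str, policy_id: str,
                          key_id: str, decision_id: str) -> str:
    # Only a SHA-256 digest of the URI is stored, never the URI or the content.
    uri_hash = hashlib.sha256(resource_uri.encode("utf-8")).hexdigest()
    record = {
        "resource_uri_hash": uri_hash,
        "policy_id": policy_id,
        "key_id": key_id,
        "decision_id": decision_id,
    }
    # One JSON object per line, with a stable key order.
    return json.dumps(record, sort_keys=True)
```

An unkeyed hash of a guessable URI can be reversed by brute force, so a keyed hash (for example HMAC with a deployment secret) may be the safer choice in practice.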
+### 3.3 LLM Gateway Prompt Protection
+- Separate system/developer/user/tool messages in the HTTP request.
+- Detect PII, secrets, credentials, source code, and customer data.
+- Before sending to the provider, decide on one of reveal, redact, tokenize, or block.
+- Record the plaintext disclosure scope and the audit event per provider.
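As a rough sketch of the per-message decision step, the example below runs two illustrative detectors over one message and returns a decision. The regular expressions, the `POLICY` mapping, and the `apply_policy` name are all assumptions, and the tokenize action is left out to keep the example short.

```python
import re

# Illustrative detectors only; a real pattern library would be far broader.
DETECTORS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "api_key": re.compile(r"\bsk-[A-Za-z0-9]{16,}\b"),
}

# Hypothetical policy: which action applies to each detected entity type.
POLICY = {"email": "redact", "api_key": "block"}

SEVERITY = ["reveal", "redact", "block"]


def apply_policy(role: str, text: str) -> dict:
    decision = "reveal"
    out = text
    for entity, pattern in DETECTORS.items():
        if not pattern.search(out):
            continue
        action = POLICY[entity]
        if action == "block":
            # A blocked message is not forwarded to the provider at all.
            return {"role": role, "decision": "block", "content": None}
        if action == "redact":
            out = pattern.sub(f"[{entity.upper()}]", out)
        # Keep the most severe decision seen so far.
        decision = max(decision, action, key=SEVERITY.index)
    return {"role": role, "decision": decision, "content": out}
```

Running the decision per message role keeps system, developer, user, and tool content separately auditable, which is what the first bullet asks for.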
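+### 3.4 Privacy Filtering
+- Collect prompts, MCP tool arguments, resource content, RAG snippets, and generated artifacts as filtering targets.
+- Detect structured identifiers such as resident registration numbers, alien registration numbers, and card numbers first, using deterministic rules and checksums.
+- Detect emails, phone numbers, addresses, bank account numbers, API keys, access tokens, and secrets with rules and a pattern library.
+- Detect names, organizations, medical/health information, biometric information, and sensitive inferred information with a dictionary, NER, or a pluggable classifier.
+- Handle each detection according to policy with one of mask, redact, tokenize, encrypt, block, or human-review.
+- The filtering audit log keeps no original text, only entity type, rule id, confidence, action, and decision id.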
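The "deterministic rule plus checksum" idea can be shown with card numbers, which use the Luhn checksum. The sketch below is illustrative: the candidate regex, the `luhn-v1` rule id, and the finding shape are assumptions, and resident registration numbers would need their own validator.

```python
import re

# Candidate pattern: 13 to 19 digits, optionally separated by spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> list[dict]:
    findings = []
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if luhn_valid(digits):
            # The finding records where and what, but never the matched text.
            findings.append({
                "entity_type": "card_number",
                "rule_id": "luhn-v1",
                "span": match.span(),
            })
    return findings
```

The checksum step removes most random digit runs (order ids, timestamps) that a regex alone would flag, which is why the source puts checksum-backed identifiers first.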
+### 3.5 A2A Task/Artifact Protection
+- Validate AgentCard discovery results.
+- Include the task id, context id, source agent, and target agent in the AAD.
+- Encrypt artifacts with a task-scoped key.
+- Reject any attempt to decrypt an artifact from a different task/context/agent.
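The context binding above depends on serializing the AAD the same way on both sides. The sketch below shows one possible canonical form (sorted-key compact JSON) and a context check. It is stdlib-only, so an HMAC tag over the AAD and an opaque ciphertext stands in for the tag that an AEAD cipher such as AES-GCM would compute; the function names and the AAD layout are assumptions.

```python
import hashlib
import hmac
import json


def canonical_aad(task_id: str, context_id: str,
                  source_agent: str, target_agent: str) -> bytes:
    fields = {
        "task_id": task_id,
        "context_id": context_id,
        "source_agent": source_agent,
        "target_agent": target_agent,
    }
    # Sorted keys and fixed separators give byte-identical output on both sides.
    return json.dumps(fields, sort_keys=True, separators=(",", ":")).encode("utf-8")


def seal(key: bytes, aad: bytes, ciphertext: bytes) -> bytes:
    # Length-prefix the AAD so the AAD/ciphertext boundary is unambiguous.
    message = len(aad).to_bytes(4, "big") + aad + ciphertext
    return hmac.new(key, message, hashlib.sha256).digest()


def open_artifact(key: bytes, aad: bytes, ciphertext: bytes, tag: bytes) -> bytes:
    # Constant-time comparison; a mismatched context fails exactly like tampering.
    if not hmac.compare_digest(seal(key, aad, ciphertext), tag):
        raise PermissionError("artifact is bound to a different task/context/agent")
    return ciphertext
```

With a real AEAD the same `canonical_aad` bytes would be passed as associated data, and the cipher itself would reject a mismatched context.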
+### 3.6 gRPC Streaming Protection
+- Use the service/method/message type as policy context.
+- Keep the stream/session key and the chunk nonce separate.
+- Record cancellation, retry, redelivery, and partial delivery as audit events.
+- Inspect metadata leakage separately.
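One way to keep the stream key and the chunk nonce separate is sketched below: a per-stream key derived from a master key, and a 12-byte nonce built from a random per-stream prefix plus a monotonically increasing counter. The `StreamSession` class is hypothetical, and the single HMAC call is a simplified stand-in for a proper KDF such as HKDF.

```python
import hashlib
import hmac
import os
import struct


class StreamSession:
    def __init__(self, master_key: bytes, stream_id: str):
        # Per-stream key: simplified HMAC-based derivation (assumption, not HKDF).
        self.key = hmac.new(master_key, b"stream:" + stream_id.encode("utf-8"),
                            hashlib.sha256).digest()
        self._prefix = os.urandom(4)  # random per-stream nonce prefix
        self._counter = 0             # chunk sequence number

    def next_nonce(self) -> bytes:
        if self._counter > 2**64 - 1:
            raise OverflowError("nonce counter exhausted; rotate the stream key")
        # 4-byte prefix + 8-byte big-endian counter = 12 bytes, never reused in a stream.
        nonce = self._prefix + struct.pack(">Q", self._counter)
        self._counter += 1
        return nonce
```

Because the counter is part of the nonce, a receiver can also use it to detect reordered or redelivered chunks, which ties into the audit events listed above.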
+## 4. Architecture Draft
+```text
+AI App / Agent Runtime / MCP Host
+        |
+        v
+Haechi SDK / CLI / Local Proxy / Sidecar
+        |
+        +-- Core Pipeline
+        |      +-- normalize protocol message
+        |      +-- classify/filter
+        |      +-- decide policy
+        |      +-- encrypt/tokenize/redact
+        |      +-- emit safe audit
+        |
+        +-- Pluggable Providers
+        |      +-- CryptoProvider
+        |      +-- KeyProvider
+        |      +-- PolicyEngine
+        |      +-- FilterEngine
+        |      +-- TokenVault
+        |      +-- AuditSink
+        |
+        +-- Reference Engines
+        |      +-- JSON/YAML policy
+        |      +-- local key provider
+        |      +-- envelope crypto
+        |      +-- Korean/global PII filters
+        |      +-- JSONL audit
+        |
+        +-- MCP Adapter
+        +-- LLM HTTP Adapter
+        +-- gRPC Adapter
+        +-- A2A Adapter
+        |
+        v
+MCP Server / LLM Provider / Remote Agent / Tool API
+```
+## 5. Design Principles
+- Protocol-aware: understands MCP methods, A2A tasks, gRPC methods, and LLM message roles rather than a plain byte stream.
+- Context-bound: ciphertext is bound to tenant, user, agent, model, task, context, tool, and resource.
+- Selective reveal: disclose in plaintext only the minimum information the model needs.
+- Observability-safe: traces and replays carry no sensitive information by default.
+- Provider-neutral: not tied to a specific LLM vendor.
+- Fail-closed: payloads with a sensitive classification are blocked when policy evaluation fails.
+- OSS-first: works as a library, CLI, local proxy, and self-hosted sidecar without a hosted SaaS.
+- Pluggable by default: for crypto, key, policy, filtering, and audit, the interface and test contract matter more than the default implementation.
+- Reference implementation is replaceable: the default implementation is a baseline for learning and PoCs and must be replaceable to fit the user's environment.
+- Test fixtures as API: plugin authors must be able to verify compatibility with fixtures and conformance tests.
+- Easy adoption: must attach to an existing app at low change cost through one of proxy, middleware, SDK wrapper, sidecar, or preset policy.
+- Progressive hardening: start with dry-run/report-only to review detection results, then enforce redact/tokenize/encrypt/block in stages.
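The progressive-hardening and fail-closed principles can be combined in a small mode switch, sketched below. The mode names follow the list above, but the `process` function, its return shape, and the choice to let report-only mode forward traffic even when the detector fails are assumptions made for illustration.

```python
from typing import Callable


def process(payload: str, detect: Callable[[str], list[str]],
            mode: str = "report-only") -> dict:
    if mode not in ("report-only", "enforce"):
        raise ValueError(f"unknown mode: {mode}")
    try:
        findings = detect(payload)  # entity types only, never raw text
        error = None
    except Exception as exc:
        findings, error = [], type(exc).__name__
    if mode == "report-only":
        # Dry run: always forward, but report what would have been enforced.
        forward = True
    else:
        # Enforce: block on any finding, and fail closed on detector errors.
        forward = not findings and error is None
    return {"forward": forward, "findings": findings, "error": error, "mode": mode}
```

A team could run in `report-only` until the findings look right on real traffic and then switch the same pipeline to `enforce` without changing the detector.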
+## 6. Proposed Initial Technology Stack
+| Area | Proposal |
+|---|---|
+| SDK | TypeScript/Node, Python |
+| Policy | JSON/YAML + JSON Schema |
+| Crypto format | JWE JSON serialization or compact envelope |
+| KMS | Vault or AWS KMS |
+| MCP | Streamable HTTP proxy, stdio wrapper |
+| LLM adapter | OpenAI-compatible HTTP schema first |
+| Redaction | deterministic detector + pluggable classifier |
+| Privacy filtering | Korean PII rules + checksum validators + custom entity rules |
+| Audit | JSON Lines + hash chain option |
+| Tests | golden fixtures, tamper/replay/cross-context negative tests |
+| Plugin contract | TypeScript interface + JSON Schema manifest + conformance test |
+| Distribution | GitHub repository, package examples, local CLI, SECURITY.md |
+| Developer UX | `haechi init`, preset policy, dry-run/report-only, copy-paste middleware |
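The "JSON Lines + hash chain option" row in the table above can be sketched as follows: each audit line stores the hash of the previous line, so editing or removing an earlier entry breaks verification. The record layout (`event`, `prev_hash`, `hash`) and the all-zero genesis value are assumptions.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed starting value for an empty log


def _digest(body: dict) -> str:
    data = json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


def append_event(lines: list[str], event: dict) -> str:
    prev_hash = json.loads(lines[-1])["hash"] if lines else GENESIS
    body = {"event": event, "prev_hash": prev_hash}
    # The stored hash covers the event and the link to the previous line.
    line = json.dumps({**body, "hash": _digest(body)},
                      sort_keys=True, separators=(",", ":"))
    lines.append(line)
    return line


def verify_chain(lines: list[str]) -> bool:
    prev_hash = GENESIS
    for line in lines:
        record = json.loads(line)
        body = {"event": record["event"], "prev_hash": record["prev_hash"]}
        if record["prev_hash"] != prev_hash or record["hash"] != _digest(body):
            return False
        prev_hash = record["hash"]
    return True
```

A hash chain alone only detects tampering by someone who cannot rewrite the whole file; anchoring the latest hash somewhere external would be needed to cover that case.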
+## 7. MVP Milestones
+| Stage | Deliverable | Completion criteria |
+|---|---|---|
+| M0 | Developer quickstart | `haechi init`, local key, sample policy, dry-run, and the MCP/LLM demo run within 5 minutes |
+| M1 | MCP proxy skeleton | initialize, tools/call, and resource read flows observed |
+| M2 | Policy engine | allow/block/redact/encrypt decision per method/tool/resource |
+| M3 | Privacy filtering | detection and handling of Korean PII fixtures and secret fixtures |
+| M4 | Global privacy profile | EU-GDPR, US-CCPA-CPRA, US-HIPAA/PCI fixtures and region-deny |
+| M5 | Custom filter DSL | regex/dictionary/path-scope/action override and fixture tests |
+| M6 | Envelope crypto | context-bound encrypt/decrypt and tamper test |
+| M7 | KMS adapter | local provider + one of Vault/AWS KMS |
+| M8 | LLM HTTP adapter | chat/completion message redaction/encryption policy |
+| M9 | Audit | verification that prompt/tool/resource/PII plaintext is not exposed |
+| M10 | Security negative tests | replay, wrong context, wrong agent, wrong tool, log leakage |
+| M11 | Crypto envelope hardening | canonical AAD, nonce/replay cache, key lifecycle, signed policy |
+| M12 | Protocol security contracts | auth/lifecycle/metadata scrub contract per MCP/A2A/gRPC/LLM adapter |
+| M13 | OSS modular package | `core`, `crypto`, `policy`, `filter`, `mcp`, `llm`, `audit`, `examples` package boundary |
+| M14 | Plugin examples | custom `PolicyEngine`, custom `FilterEngine`, custom `AuditSink` examples and conformance tests |
+| M15 | Build-blocking QA gate | plaintext leak, policy conflict, KMS fault, region-deny, DSL fuzzing, plugin capability violation |
+## 8. Biggest Risks
+| Risk | Description | Mitigation |
+|---|---|---|
+| Models cannot understand ciphertext | Semantic information that the LLM must process ultimately requires reveal. | selective reveal, tokenization, TEE roadmap |
+| MCP/A2A spec changes | The protocols are still changing quickly. | adapter isolation, spec version field |
+| Tool-call log leakage | The agent framework may write its own separate logs. | framework-specific log hook, redaction test |
+| Confusion with prompt injection | Encryption does not replace prompt injection defense. | separate prompt security gate |
+| Difficulty of protecting embeddings | Encryption makes similarity search difficult. | protect source text first, handle embedding policy separately |
+| Privacy filtering false positives/negatives | False positives degrade work quality; false negatives lead to personal data leakage. | confidence threshold, human-review, fixture test |
+| Personal data processing by the filter itself | Using an external classifier can re-expose personal data during filtering. | local-first detector, classifier privacy policy |
+| Global regulatory differences | GDPR, CCPA, HIPAA, APPI, PDPA, and LGPD differ in definitions, rights, and transfer conditions. | regional profile abstraction |
+| Cross-border transfer failure | The region of an external LLM provider may cause a violation of transfer restrictions in the EU, UK, BR, and elsewhere. | region-aware provider allowlist |
+| Custom rule malfunction | A bad regex or allowlist can cause missed blocks or work stoppages. | validate/test/approve/rollback lifecycle |
+| Custom dictionary leakage | The customer dictionary itself may be a trade secret. | dictionary encryption, access audit |
+| Adoption difficulty | If installation and configuration are hard, both OSS adoption and real-world use fail. | 5-minute quickstart, dry-run, preset, minimal config, copy-paste examples |
+| AAD/nonce/replay vulnerabilities | Context-bound encryption implemented without canonicalization and a replay cache can be bypassed. | crypto envelope spec, stream sequencing test |
+| Policy distribution contamination | A stale policy, client-supplied source label, or unsigned rule package can bypass hard-block. | signed policy bundle, fail-closed validation |
+| Observability leakage | Plaintext prompts and tool output can remain in trace baggage, metric labels, exceptions, and crash dumps. | telemetry sentinel test |
+| Over-abstraction | If the project creates too many interfaces early on, a working demo is delayed. | implement the core pipeline, MCP proxy, and filter/crypto reference first |
+| Plugin safety | A user-written plugin can send plaintext externally or bypass audit. | capability manifest, fail-closed loading, conformance/negative test |
+| OSS maintenance burden | As docs and examples grow, security updates and compatibility management get harder. | narrow MVP, semantic versioning, compatibility matrix |
+| Commercial/compliance misunderstanding | OSS documentation may be mistaken for a guarantee of regulatory compliance. | state a non-compliance disclaimer in README and SECURITY.md |
+## 9. Next Documentation Tasks
+- AI threat model
+- MCP adapter SRS
+- LLM gateway policy schema
+- A2A task/artifact encryption design
+- redaction/tokenization policy spec
+- privacy filtering policy spec
+- custom filtering DSL spec
+- global privacy compliance matrix
+- crypto envelope spec
+- audit event schema
+- expert gap review backlog
+- OSS modular architecture
+- easy adoption guide and quickstart
+- plugin API and conformance test spec
+- protocol security contract spec
+- self-hosted usage and shared responsibility note
+- optional enterprise procurement evidence pack
+- security test spec and red-team corpus
+- RAG/vector and agent memory protection design