maskcloud 0.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (37)
  1. maskcloud-0.1.0/.gitignore +11 -0
  2. maskcloud-0.1.0/.mask_audit.db +0 -0
  3. maskcloud-0.1.0/.python-version +1 -0
  4. maskcloud-0.1.0/LICENSE +201 -0
  5. maskcloud-0.1.0/PKG-INFO +332 -0
  6. maskcloud-0.1.0/README.md +284 -0
  7. maskcloud-0.1.0/desktop.ini +2 -0
  8. maskcloud-0.1.0/examples/secure_vault/__init__.py +13 -0
  9. maskcloud-0.1.0/examples/secure_vault/email_tool.py +28 -0
  10. maskcloud-0.1.0/examples/secure_vault/vault_agent.py +70 -0
  11. maskcloud-0.1.0/examples/server.py +54 -0
  12. maskcloud-0.1.0/examples/test_agent.py +55 -0
  13. maskcloud-0.1.0/mask/__init__.py +19 -0
  14. maskcloud-0.1.0/mask/client.py +90 -0
  15. maskcloud-0.1.0/mask/core/__init__.py +3 -0
  16. maskcloud-0.1.0/mask/core/crypto.py +82 -0
  17. maskcloud-0.1.0/mask/core/fpe.py +113 -0
  18. maskcloud-0.1.0/mask/core/scanner.py +128 -0
  19. maskcloud-0.1.0/mask/core/utils.py +60 -0
  20. maskcloud-0.1.0/mask/core/vault.py +431 -0
  21. maskcloud-0.1.0/mask/integrations/__init__.py +3 -0
  22. maskcloud-0.1.0/mask/integrations/adk_hooks.py +70 -0
  23. maskcloud-0.1.0/mask/integrations/langchain_hooks.py +98 -0
  24. maskcloud-0.1.0/mask/integrations/llamaindex_hooks.py +128 -0
  25. maskcloud-0.1.0/mask/telemetry/__init__.py +3 -0
  26. maskcloud-0.1.0/mask/telemetry/audit_logger.py +266 -0
  27. maskcloud-0.1.0/pyproject.toml +81 -0
  28. maskcloud-0.1.0/tests/__init__.py +0 -0
  29. maskcloud-0.1.0/tests/conftest.py +12 -0
  30. maskcloud-0.1.0/tests/test_audit_logger.py +115 -0
  31. maskcloud-0.1.0/tests/test_fpe.py +79 -0
  32. maskcloud-0.1.0/tests/test_hooks.py +100 -0
  33. maskcloud-0.1.0/tests/test_langchain.py +72 -0
  34. maskcloud-0.1.0/tests/test_llamaindex.py +80 -0
  35. maskcloud-0.1.0/tests/test_vault.py +118 -0
  36. maskcloud-0.1.0/tests/test_vault_backends.py +134 -0
  37. maskcloud-0.1.0/uv.lock +5639 -0
@@ -0,0 +1,11 @@
1
+ # Python-generated files
2
+ __pycache__/
3
+ *.py[oc]
4
+ build/
5
+ dist/
6
+ wheels/
7
+ *.egg-info
8
+
9
+ # Virtual environments
10
+ .venv
11
+ .env
Binary file
@@ -0,0 +1 @@
1
+ 3.13.3
@@ -0,0 +1,201 @@
1
+ Apache License
2
+ Version 2.0, January 2004
3
+ http://www.apache.org/licenses/
4
+
5
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
6
+
7
+ 1. Definitions.
8
+
9
+ "License" shall mean the terms and conditions for use, reproduction,
10
+ and distribution as defined by Sections 1 through 9 of this document.
11
+
12
+ "Licensor" shall mean the copyright owner or entity authorized by
13
+ the copyright owner that is granting the License.
14
+
15
+ "Legal Entity" shall mean the union of the acting entity and all
16
+ other entities that control, are controlled by, or are under common
17
+ control with that entity. For the purposes of this definition,
18
+ "control" means (i) the power, direct or indirect, to cause the
19
+ direction or management of such entity, whether by contract or
20
+ otherwise, or (ii) ownership of fifty percent (50%) or more of the
21
+ outstanding shares, or (iii) beneficial ownership of such entity.
22
+
23
+ "You" (or "Your") shall mean an individual or Legal Entity
24
+ exercising permissions granted by this License.
25
+
26
+ "Source" form shall mean the preferred form for making modifications,
27
+ including but not limited to software source code, documentation
28
+ source, and configuration files.
29
+
30
+ "Object" form shall mean any form resulting from mechanical
31
+ transformation or translation of a Source form, including but
32
+ not limited to compiled object code, generated documentation,
33
+ and conversions to other media types.
34
+
35
+ "Work" shall mean the work of authorship, whether in Source or
36
+ Object form, made available under the License, as indicated by a
37
+ copyright notice that is included in or attached to the work
38
+ (an example is provided in the Appendix below).
39
+
40
+ "Derivative Works" shall mean any work, whether in Source or Object
41
+ form, that is based on (or derived from) the Work and for which the
42
+ editorial revisions, annotations, elaborations, or other modifications
43
+ represent, as a whole, an original work of authorship. For the purposes
44
+ of this License, Derivative Works shall not include works that remain
45
+ separable from, or merely link (or bind by name) to the interfaces of,
46
+ the Work and Derivative Works thereof.
47
+
48
+ "Contribution" shall mean any work of authorship, including
49
+ the original version of the Work and any modifications or additions
50
+ to that Work or Derivative Works thereof, that is intentionally
51
+ submitted to Licensor for inclusion in the Work by the copyright owner
52
+ or by an individual or Legal Entity authorized to submit on behalf of
53
+ the copyright owner. For the purposes of this definition, "submitted"
54
+ means any form of electronic, verbal, or written communication sent
55
+ to the Licensor or its representatives, including but not limited to
56
+ communication on electronic mailing lists, source code control systems,
57
+ and issue tracking systems that are managed by, or on behalf of, the
58
+ Licensor for the purpose of discussing and improving the Work, but
59
+ excluding communication that is conspicuously marked or otherwise
60
+ designated in writing by the copyright owner as "Not a Contribution."
61
+
62
+ "Contributor" shall mean Licensor and any individual or Legal Entity
63
+ on behalf of whom a Contribution has been received by Licensor and
64
+ subsequently incorporated within the Work.
65
+
66
+ 2. Grant of Copyright License. Subject to the terms and conditions of
67
+ this License, each Contributor hereby grants to You a perpetual,
68
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
69
+ copyright license to reproduce, prepare Derivative Works of,
70
+ publicly display, publicly perform, sublicense, and distribute the
71
+ Work and such Derivative Works in Source or Object form.
72
+
73
+ 3. Grant of Patent License. Subject to the terms and conditions of
74
+ this License, each Contributor hereby grants to You a perpetual,
75
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
76
+ (except as stated in this section) patent license to make, have made,
77
+ use, offer to sell, sell, import, and otherwise transfer the Work,
78
+ where such license applies only to those patent claims licensable
79
+ by such Contributor that are necessarily infringed by their
80
+ Contribution(s) alone or by combination of their Contribution(s)
81
+ with the Work to which such Contribution(s) was submitted. If You
82
+ institute patent litigation against any entity (including a
83
+ cross-claim or counterclaim in a lawsuit) alleging that the Work
84
+ or a Contribution incorporated within the Work constitutes direct
85
+ or contributory patent infringement, then any patent licenses
86
+ granted to You under this License for that Work shall terminate
87
+ as of the date such litigation is filed.
88
+
89
+ 4. Redistribution. You may reproduce and distribute copies of the
90
+ Work or Derivative Works thereof in any medium, with or without
91
+ modifications, and in Source or Object form, provided that You
92
+ meet the following conditions:
93
+
94
+ (a) You must give any other recipients of the Work or
95
+ Derivative Works a copy of this License; and
96
+
97
+ (b) You must cause any modified files to carry prominent notices
98
+ stating that You changed the files; and
99
+
100
+ (c) You must retain, in the Source form of any Derivative Works
101
+ that You distribute, all copyright, patent, trademark, and
102
+ attribution notices from the Source form of the Work,
103
+ excluding those notices that do not pertain to any part of
104
+ the Derivative Works; and
105
+
106
+ (d) If the Work includes a "NOTICE" text file as part of its
107
+ distribution, then any Derivative Works that You distribute must
108
+ include a readable copy of the attribution notices contained
109
+ within such NOTICE file, excluding those notices that do not
110
+ pertain to any part of the Derivative Works, in at least one
111
+ of the following places: within a NOTICE text file distributed
112
+ as part of the Derivative Works; within the Source form or
113
+ documentation, if provided along with the Derivative Works; or,
114
+ within a display generated by the Derivative Works, if and
115
+ wherever such third-party notices normally appear. The contents
116
+ of the NOTICE file are for informational purposes only and
117
+ do not modify the License. You may add Your own attribution
118
+ notices within Derivative Works that You distribute, alongside
119
+ or as an addendum to the NOTICE text from the Work, provided
120
+ that such additional attribution notices cannot be construed
121
+ as modifying the License.
122
+
123
+ You may add Your own copyright statement to Your modifications and
124
+ may provide additional or different license terms and conditions
125
+ for use, reproduction, or distribution of Your modifications, or
126
+ for any such Derivative Works as a whole, provided Your use,
127
+ reproduction, and distribution of the Work otherwise complies with
128
+ the conditions stated in this License.
129
+
130
+ 5. Submission of Contributions. Unless You explicitly state otherwise,
131
+ any Contribution intentionally submitted for inclusion in the Work
132
+ by You to the Licensor shall be under the terms and conditions of
133
+ this License, without any additional terms or conditions.
134
+ Notwithstanding the above, nothing herein shall supersede or modify
135
+ the terms of any separate license agreement you may have executed
136
+ with Licensor regarding such Contributions.
137
+
138
+ 6. Trademarks. This License does not grant permission to use the trade
139
+ names, trademarks, service marks, or product names of the Licensor,
140
+ except as required for reasonable and customary use in describing the
141
+ origin of the Work and reproducing the content of the NOTICE file.
142
+
143
+ 7. Disclaimer of Warranty. Unless required by applicable law or
144
+ agreed to in writing, Licensor provides the Work (and each
145
+ Contributor provides its Contributions) on an "AS IS" BASIS,
146
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
147
+ implied, including, without limitation, any warranties or conditions
148
+ of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
149
+ PARTICULAR PURPOSE. You are solely responsible for determining the
150
+ appropriateness of using or redistributing the Work and assume any
151
+ risks associated with Your exercise of permissions under this License.
152
+
153
+ 8. Limitation of Liability. In no event and under no legal theory,
154
+ whether in tort (including negligence), contract, or otherwise,
155
+ unless required by applicable law (such as deliberate and grossly
156
+ negligent acts) or agreed to in writing, shall any Contributor be
157
+ liable to You for damages, including any direct, indirect, special,
158
+ incidental, or consequential damages of any character arising as a
159
+ result of this License or out of the use or inability to use the
160
+ Work (including but not limited to damages for loss of goodwill,
161
+ work stoppage, computer failure or malfunction, or any and all
162
+ other commercial damages or losses), even if such Contributor
163
+ has been advised of the possibility of such damages.
164
+
165
+ 9. Accepting Warranty or Additional Liability. While redistributing
166
+ the Work or Derivative Works thereof, You may choose to offer,
167
+ and charge a fee for, acceptance of support, warranty, indemnity,
168
+ or other liability obligations and/or rights consistent with this
169
+ License. However, in accepting such obligations, You may act only
170
+ on Your own behalf and on Your sole responsibility, not on behalf
171
+ of any other Contributor, and only if You agree to indemnify,
172
+ defend, and hold each Contributor harmless for any liability
173
+ incurred by, or claims asserted against, such Contributor by reason
174
+ of your accepting any such warranty or additional liability.
175
+
176
+ END OF TERMS AND CONDITIONS
177
+
178
+ APPENDIX: How to apply the Apache License to your work.
179
+
180
+ To apply the Apache License to your work, attach the following
181
+ boilerplate notice, with the fields enclosed by brackets "[]"
182
+ replaced with your own identifying information. (Don't include
183
+ the brackets!) The text should be enclosed in the appropriate
184
+ comment syntax for the file format. We also recommend that a
185
+ file or class name and description of purpose be included on the
186
+ same "printed page" as the copyright notice for easier
187
+ identification within third-party archives.
188
+
189
+ Copyright 2026 Mask AI Solutions
190
+
191
+ Licensed under the Apache License, Version 2.0 (the "License");
192
+ you may not use this file except in compliance with the License.
193
+ You may obtain a copy of the License at
194
+
195
+ http://www.apache.org/licenses/LICENSE-2.0
196
+
197
+ Unless required by applicable law or agreed to in writing, software
198
+ distributed under the License is distributed on an "AS IS" BASIS,
199
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
200
+ See the License for the specific language governing permissions and
201
+ limitations under the License.
@@ -0,0 +1,332 @@
1
+ Metadata-Version: 2.4
2
+ Name: maskcloud
3
+ Version: 0.1.0
4
+ Summary: Just-In-Time Privacy Middleware for AI Agents. Format-preserving encryption with pluggable vault backends.
5
+ Author: Mask AI Solutions
6
+ License-Expression: Apache-2.0
7
+ License-File: LICENSE
8
+ Keywords: ai-agents,encryption,fpe,hipaa,llm,pii,privacy,soc2
9
+ Classifier: Development Status :: 3 - Alpha
10
+ Classifier: Intended Audience :: Developers
11
+ Classifier: Programming Language :: Python :: 3
12
+ Classifier: Topic :: Security :: Cryptography
13
+ Requires-Python: >=3.10
14
+ Requires-Dist: cryptography>=46.0.5
15
+ Requires-Dist: presidio-analyzer>=2.2.361
16
+ Requires-Dist: presidio-anonymizer>=2.2.361
17
+ Requires-Dist: pydantic>=2.12.5
18
+ Provides-Extra: adk
19
+ Requires-Dist: google-adk>=1.0; extra == 'adk'
20
+ Provides-Extra: all
21
+ Requires-Dist: boto3>=1.34; extra == 'all'
22
+ Requires-Dist: google-adk>=1.0; extra == 'all'
23
+ Requires-Dist: langchain-core>=0.2; extra == 'all'
24
+ Requires-Dist: llama-index-core>=0.10; extra == 'all'
25
+ Requires-Dist: pymemcache>=4.0; extra == 'all'
26
+ Requires-Dist: redis>=5.0; extra == 'all'
27
+ Requires-Dist: spacy>=3.4.4; extra == 'all'
28
+ Provides-Extra: dynamodb
29
+ Requires-Dist: boto3>=1.34; extra == 'dynamodb'
30
+ Provides-Extra: examples
31
+ Requires-Dist: fastapi>=0.111; extra == 'examples'
32
+ Requires-Dist: httpx>=0.27; extra == 'examples'
33
+ Requires-Dist: litellm>=1.74; extra == 'examples'
34
+ Requires-Dist: python-dotenv>=1.0; extra == 'examples'
35
+ Requires-Dist: redis>=5.0.0; extra == 'examples'
36
+ Requires-Dist: uvicorn>=0.30; extra == 'examples'
37
+ Provides-Extra: langchain
38
+ Requires-Dist: langchain-core>=0.2; extra == 'langchain'
39
+ Provides-Extra: llamaindex
40
+ Requires-Dist: llama-index-core>=0.10; extra == 'llamaindex'
41
+ Provides-Extra: memcached
42
+ Requires-Dist: pymemcache>=4.0; extra == 'memcached'
43
+ Provides-Extra: redis
44
+ Requires-Dist: redis>=5.0; extra == 'redis'
45
+ Provides-Extra: spacy
46
+ Requires-Dist: spacy>=3.4.4; extra == 'spacy'
47
+ Description-Content-Type: text/markdown
48
+
49
+ # Mask: Just-in-Time AI Agent Security
50
+
51
+ Contact: millingtonsully@gmail.com
52
+
53
+ [![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
54
+
55
+ Mask is enterprise-grade AI Data Loss Prevention (DLP) infrastructure. It acts as the runtime enforcement layer between your Large Language Models (LLMs) and your tool execution environment, ensuring that LLMs never see raw PII or sensitive financial records while your tools still execute on the real values.
56
+
57
+ ---
58
+
59
+ ## The Problem Space: LLM Data Leakage
60
+
61
+ As Large Language Model (LLM) agents gain autonomy, they become deeply integrated into enterprise systems, often requiring access to highly sensitive information such as Personally Identifiable Information (PII) and confidential financial records.
62
+
63
+ The core vulnerability in standard agentic architectures is that sensitive data retrieved by tools is injected as plain text directly into the LLM's context window. This creates severe compliance and security risks:
64
+ - **Data Leakage:** Plain-text PII can be logged by external LLM providers, violating data residency laws or compliance frameworks (SOC2, HIPAA, PCI-DSS).
65
+ - **Inadvertent Disclosure:** If an agent is compromised via prompt injection or malicious instructions, it can be manipulated into exfiltrating the plain-text data it actively holds in its context.
66
+
67
+ ## The Solution: Privacy by Design
68
+
69
+ Mask utilizes a **Two-Layer Strategy** to solve the data leakage problem, splitting responsibilities between a local runtime environment (The Data Plane) and a centralized governance platform (The Control Plane).
70
+
71
+ Instead of trusting the LLM to safeguard plain-text data, the system strictly enforces cryptographic boundaries using **Just-In-Time (JIT) Encryption and Decryption Middleware**.
72
+ 1. The LLM only ever "sees" and reasons over format-preserving tokens; the real values stay encrypted in the vault.
73
+ 2. When the LLM decides to call a specific authorized tool (e.g., querying a database), a **Pre-Tool Decryption Hook** intercepts the call. It decrypts the specific parameters required by the tool, allowing the backend function to execute securely with real data.
74
+ 3. Once the tool finishes, a **Post-Tool Encryption Hook** intercepts the output, detects sensitive entities, and replaces them with tokens (storing the encrypted originals in the vault) *before* the result is returned to the LLM's context window.
75
+
76
+ This guarantees that the LLM can orchestrate workflows involving sensitive data without ever actually exposing the raw data to the model or its remote provider logs.
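The three-step flow above can be sketched as a small decorator. This is a minimal, dependency-free illustration, not Mask's implementation: `jit_tool`, `VAULT`, and the two stand-in tools are hypothetical, only email addresses are detected (with a simple regex rather than Presidio), and the vault is a plain dict rather than an encrypted store. The token shape `tkn-<hex>@email.com` follows the format described in the test section of this README.

```python
import re
import secrets
from functools import wraps

# Hypothetical token -> real-value store. Mask keeps real values encrypted in
# its vault; a plain dict keeps this sketch dependency-free.
VAULT: dict[str, str] = {}

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
TOKEN_RE = re.compile(r"tkn-[0-9a-f]{16}@email\.com")


def tokenize_emails(text: str) -> str:
    """Post-tool hook: replace every raw email address with a token."""
    def swap(match: re.Match) -> str:
        found = match.group(0)
        if TOKEN_RE.fullmatch(found):  # already a token: leave it alone
            return found
        token = f"tkn-{secrets.token_hex(8)}@email.com"
        VAULT[token] = found
        return token
    return EMAIL_RE.sub(swap, text)


def detokenize(text: str) -> str:
    """Pre-tool hook: replace known tokens with their real values."""
    return TOKEN_RE.sub(lambda m: VAULT.get(m.group(0), m.group(0)), text)


def jit_tool(fn):
    """Run fn on real data while only ever returning tokens to the LLM."""
    @wraps(fn)
    def wrapper(*args: str) -> str:
        real_args = [detokenize(a) for a in args]  # decrypt just before the call
        result = fn(*real_args)                     # the tool sees plaintext
        return tokenize_emails(result)              # re-tokenize before the LLM sees it
    return wrapper


@jit_tool
def lookup_contact(name: str) -> str:
    # Stand-in for a real CRM or database query.
    return f"{name}'s email is alice@example.com"


@jit_tool
def send_email(address: str) -> str:
    # Stand-in for a real mail API; it needs the genuine address to work.
    if address != "alice@example.com":
        return "unknown recipient"
    return f"sent to {address}"


reply = lookup_contact("Alice")  # "Alice's email is tkn-...@email.com"
token = TOKEN_RE.search(reply).group(0)
print(send_email(token))         # "sent to tkn-...", never the raw address
```

The LLM only ever handles `reply` and the tool's tokenized return value; the real address exists only inside the wrapped call.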
77
+
78
+ Additionally, we solve two critical sub-issues to make this enterprise-ready:
79
+ 1. **The Statefulness Trap**: In-process vaults break down in multi-node Kubernetes environments, because a token minted on one pod cannot be detokenized on another. We support pluggable distributed vaults (Redis, DynamoDB, Memcached) so detokenization state is shared across all your horizontally scaled pods.
80
+ 2. **The Schema Trap**: Strict downstream tools will crash if handed a random token. We use Format-Preserving Tokenization backed by an encrypted vault to generate tokens that retain the exact format of the original data (Emails, US Phones, SSNs, 16-digit Credit Cards, 9-digit Routing Numbers). Tokens look like real data; the real values are stored encrypted and retrieved via the vault.
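Format-preserving tokens can be minted from a cryptographically secure random source with fixed, reserved prefixes. The sketch below assumes the shapes this README mentions (`tkn-<hex>@email.com`, `000-00-<4 digits>`, a `4000-0000-0000` card prefix, a `0000` routing prefix). The README does not document the phone-number token format, so phones are omitted; `make_token`, `TOKEN_MAKERS`, and `SCHEMAS` are hypothetical names, and storing the encrypted original in the vault is left out.

```python
import re
import secrets


def random_digits(n: int) -> str:
    """n decimal digits drawn from a CSPRNG."""
    return "".join(secrets.choice("0123456789") for _ in range(n))


# Assumed token shapes, based on the prefixes and formats this README describes.
TOKEN_MAKERS = {
    "email": lambda: f"tkn-{secrets.token_hex(8)}@email.com",
    "ssn": lambda: f"000-00-{random_digits(4)}",
    "credit_card": lambda: f"4000-0000-0000-{random_digits(4)}",
    "routing_number": lambda: f"0000{random_digits(5)}",
}

# The kind of shape checks a strict downstream schema might enforce.
SCHEMAS = {
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}"),
    "ssn": re.compile(r"\d{3}-\d{2}-\d{4}"),
    "credit_card": re.compile(r"\d{4}-\d{4}-\d{4}-\d{4}"),
    "routing_number": re.compile(r"\d{9}"),
}


def make_token(kind: str) -> str:
    """Mint a fresh token that keeps the original value's format."""
    return TOKEN_MAKERS[kind]()


for kind in TOKEN_MAKERS:
    token = make_token(kind)
    assert SCHEMAS[kind].fullmatch(token), (kind, token)
    print(f"{kind:15} {token}")
```

Because every token passes the same shape checks as real data, a validator downstream of the tool has no reason to reject it.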
81
+
82
+ ### How We Handle Data
83
+
84
+ * **If the LLM needs to think about the value:** We tokenize it so the LLM only sees a fake-looking value, and we keep the real value encrypted in a vault.
85
+ * **If something is so sensitive that the LLM should never see it at all:** A future version will support skipping the LLM entirely for that field and only sending it to tools/backends.
86
+
87
+ Real math and real business logic always happen inside tools, after detokenization and decryption, not inside the LLM on fake numbers.
88
+
89
+ ---
90
+
91
+ ## Architectural Overview
92
+
93
+ ### 1. The Data Plane (Mask Open Source SDK)
94
+ The Data Plane is the open-source, transparent, auditable runtime execution layer. It lives inside your secure VPC or Kubernetes clusters alongside your AI agents. It acts as the Trojan Horse of security, providing frictionless adoption for engineers while proving cryptographic soundness to security reviewers.
95
+
96
+ * **JIT Cryptography Engine:** The core pre-tool decryption and post-tool encryption hooks that intercept and mutate data in-flight.
97
+ * **Format-Preserving Tokenization Router:** Ensures downstream databases and strict schemas don't break when handed a token. Tokens look like real data; the real values are stored encrypted and retrieved via the vault.
98
+ * **Pluggable Distributed Vaults:** Support for enterprise-native caching layers (Redis, DynamoDB, Memcached) to ensure horizontally-scaled edge agents have synchronized access to detokenization mapping.
99
+ * **Telemetry Remote Forwarder:** An asynchronous AuditLogger that buffers privacy events and securely POSTs them to the Control Plane API without blocking LLM execution.
100
+
101
+ ### 2. The Control Plane (Mask Enterprise Platform)
102
+ The Control Plane is our Enterprise SaaS offering, a centralized governance platform for security orchestration. Coming soon.
103
+
104
+ The Control Plane manages:
105
+ * **Unified Dashboard:** Visualizes usage and storage metrics across your environment.
106
+ * **Tenant & API Key Management:** Manage API keys and role-based access control for isolated environments.
107
+ * **Vault Configuration UI:** Provision and monitor our managed, hosted vaults. We offer Ephemeral Memory Vaults cleared daily, and highly available Persistent Vaults with customizable retention policies based on your subscription tier.
108
+ * **Audit Log Viewer:** Explore telemetry events and generate one-click compliance reports for SOC2, HIPAA, and PCI-DSS.
109
+ * **Key Management Center:** Automate rotation of symmetric encryption keys and track key status without relying on static local environment variables.
110
+ * **Billing & Subscriptions:** Transparent tracking of Protected Entities and metered overage.
111
+
112
+ ---
113
+
114
+ ## Advanced Architecture & Security Guarantees
115
+
116
+ Mask can be configured globally via environment variables, but the SDK also supports explicit, per-client configuration designed for multi-tenant, zero-trust environments.
117
+
118
+ ### 1. Deterministic Token Deduplication
119
+ Mask vaults use cryptographic hashing to perform reverse lookups during tokenization. If the same email address appears 50 times in a single session, Mask retrieves and reuses the *exact same Format-Preserving Token*. This limits vault storage to one entry per distinct value, avoids redundant encryption work, and, crucially, gives the LLM one consistent token per entity, so inconsistent tokens for the same value cannot confuse its reasoning.
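The README does not say which hash Mask uses for its reverse lookup. A common choice, assumed here, is a keyed HMAC-SHA256 index: equal plaintexts produce equal digests, so a repeat sighting returns the stored token instead of minting a new one. `DedupVault` and `tokenize_email` are hypothetical names.

```python
import hashlib
import hmac
import secrets


class DedupVault:
    """Hypothetical vault that reuses one token per distinct plaintext."""

    def __init__(self, index_key: bytes) -> None:
        self._index_key = index_key
        self._token_by_digest: dict[str, str] = {}  # HMAC(plaintext) -> token
        self._value_by_token: dict[str, str] = {}   # token -> plaintext (Mask encrypts this)

    def _digest(self, plaintext: str) -> str:
        # Keyed hash: equal inputs give equal digests, but without the key the
        # index cannot be brute-forced for short values such as SSNs.
        return hmac.new(self._index_key, plaintext.encode(), hashlib.sha256).hexdigest()

    def tokenize_email(self, email: str) -> str:
        digest = self._digest(email)
        token = self._token_by_digest.get(digest)
        if token is None:  # first sighting: mint and store a new token
            token = f"tkn-{secrets.token_hex(8)}@email.com"
            self._token_by_digest[digest] = token
            self._value_by_token[token] = email
        return token

    def __len__(self) -> int:
        return len(self._value_by_token)


vault = DedupVault(index_key=secrets.token_bytes(32))
tokens = {vault.tokenize_email("jane@corp.com") for _ in range(50)}
print(len(tokens), len(vault))  # 1 1
```

A keyed hash rather than a bare SHA-256 matters here: low-entropy values like SSNs could otherwise be recovered from the index by enumeration.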
120
+
121
+ ### 2. The Explicit `MaskClient` API
122
+ For enterprise backend services handling multiple tenants at once, global singletons (environment configurations) are dangerous. Mask natively supports explicit client instantiation. Developers can isolate vaults, crypto engines, and NLP scanners on a per-request basis.
123
+
124
+ ```python
125
+ from cryptography.fernet import Fernet
126
+ from mask.client import MaskClient
127
+ from mask.core.crypto import CryptoEngine
128
+ from mask.core.vault import MemoryVault
129
+ tenant_specific_key = Fernet.generate_key()  # illustrative: load each tenant's own Fernet key from a secret store
130
+ client = MaskClient(  # fully isolated instance for strict multi-tenancy
131
+     vault=MemoryVault(),
132
+     crypto=CryptoEngine(tenant_specific_key),
133
+     ttl=3600
134
+ )
135
+
136
+ safe_token = client.encode("user@tenant.com")
137
+ ```
138
+
139
+ ### 3. Heuristic Safety by Construction
140
+ It is catastrophic if an SDK misidentifies a user's *real* SSN as a "token" and accidentally passes it in plaintext to an LLM. Mask's `looks_like_token()` heuristic only treats a value as a token if it carries a prefix that real issuers do not assign:
141
+ * SSN tokens always begin with `000` (The Social Security Administration has never issued an Area Number of 000).
142
+ * Routing tokens always begin with `0000` (commercial bank routing numbers start at `01`; the `00` prefix is reserved for U.S. government use).
143
+ * Credit Card tokens use the `4000-0000-0000` Visa reserved test BIN.
144
+ By drawing every token from these reserved ranges, Mask keeps `looks_like_token()` from mistaking ordinary customer PII for a token.
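A check in the spirit of `looks_like_token()` reduces to anchored pattern matches on those reserved prefixes. The following is a hypothetical re-implementation for illustration only; the SDK's own function may accept different separators or cover more entity types.

```python
import re

# Hypothetical re-implementation of the prefix rules above; the SDK's own
# looks_like_token() may differ in detail.
TOKEN_PATTERNS = (
    re.compile(r"000-00-\d{4}"),              # SSN tokens: area number 000
    re.compile(r"0000\d{5}"),                 # routing tokens: 0000 + 5 digits
    re.compile(r"4000-?0000-?0000-?\d{4}"),   # card tokens: 4000-0000-0000 prefix
    re.compile(r"tkn-[0-9a-f]+@email\.com"),  # email tokens
)


def looks_like_token(value: str) -> bool:
    """True only when the whole value matches one of the reserved token shapes."""
    candidate = value.strip()
    return any(pattern.fullmatch(candidate) for pattern in TOKEN_PATTERNS)


print(looks_like_token("000-00-4821"))       # True
print(looks_like_token("123-45-6789"))       # False: a real-format SSN is left alone
print(looks_like_token("4000000000001234"))  # True
```

Using `fullmatch` rather than `search` means a real value that merely contains a token-like substring is never classified as a token.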
145
+
146
+ ---
147
+
148
+ ## Installation and Setup
149
+
150
+ Install the Data Plane core SDK. Core features require `cryptography` and Presidio; Redis, DynamoDB, Memcached, LangChain, LlamaIndex, and ADK support are optional extras:
151
+ ```bash
152
+ pip install maskcloud
153
+ ```
154
+
155
+ Add optional extras depending on your infrastructure and framework:
156
+ ```bash
157
+ pip install "maskcloud[redis]" # For Redis vaults
158
+ pip install "maskcloud[dynamodb]" # For AWS DynamoDB vaults
159
+ pip install "maskcloud[memcached]" # For Memcached vaults
160
+ pip install "maskcloud[langchain]" # For LangChain hooks
161
+ pip install "maskcloud[llamaindex]" # For LlamaIndex hooks
162
+ pip install "maskcloud[adk]" # For Google ADK hooks
163
+ ```
164
+
165
+
166
+ ### Installing AI Models
167
+ Mask uses Presidio with a spaCy NLP model for PII detection. Install the `spacy` extra and then download your preferred model:
168
+ ```bash
169
+ # 1. Install with spaCy support
170
+ pip install "maskcloud[spacy]"
171
+
172
+ # 2. Download the NLP model (choose one)
173
+ python -m spacy download en_core_web_sm # Small (~12MB, Fast)
174
+ python -m spacy download en_core_web_md # Standard (~40MB, Balanced)
175
+ python -m spacy download en_core_web_lg # Large (~560MB, High Accuracy)
176
+ ```
177
+
178
+ For a typical production environment, you might combine extras:
179
+ ```bash
180
+ pip install "maskcloud[spacy,redis]"
181
+ python -m spacy download en_core_web_lg
182
+ ```
183
+
184
+
185
+ ### Environment Configuration
186
+
187
+ Before running your agents, Mask requires an encryption key and a vault backend selection.
188
+
189
+ ```bash
190
+ # 1. Provide your encryption key
191
+ # Generate a static key for local development:
192
+ python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"
193
+ export MASK_ENCRYPTION_KEY="..."
194
+ # For Cloud Platform users, the KMS can securely distribute this upon initialization.
195
+
196
+ # 2. Select your vault type
197
+ export MASK_VAULT_TYPE=redis # Options: memory, redis, dynamodb, memcached, cloud
198
+
199
+ # 3. Configure your chosen vault backend
200
+ # For Mask Cloud Managed Vaults:
201
+ export MASK_API_KEY="..."
202
+ # For self-hosted Redis:
203
+ export MASK_REDIS_URL=redis://localhost:6379/0
204
+ # For self-hosted DynamoDB:
205
+ export MASK_DYNAMODB_TABLE=mask-vault
206
+ export MASK_DYNAMODB_REGION=us-east-1
207
+ # For self-hosted Memcached:
208
+ export MASK_MEMCACHED_HOST=localhost
209
+ export MASK_MEMCACHED_PORT=11211
210
+ ```
211
+
212
+ For production and staging environments, `MASK_ENCRYPTION_KEY` **must** be set;
213
+ the SDK will not start without it. This environment-based configuration is designed
214
+ for single-tenant deployments where one global vault and key serve a single financial
215
+ institution or environment; multi-tenant services should use the explicit `MaskClient` API.
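A fail-fast settings loader along these lines catches a missing key or an incomplete vault configuration at startup rather than on the first tool call. This is a sketch under stated assumptions: `load_settings`, `MaskSettings`, and `REQUIRED_VARS` are hypothetical, the per-backend required variables are inferred from the configuration block above, and because the README does not say how the SDK distinguishes production from development, the sketch requires the key unconditionally.

```python
import os
from dataclasses import dataclass
from typing import Mapping

VAULT_TYPES = {"memory", "redis", "dynamodb", "memcached", "cloud"}

# Assumed per-backend requirements, inferred from the variables listed above.
REQUIRED_VARS = {
    "memory": (),
    "redis": ("MASK_REDIS_URL",),
    "dynamodb": ("MASK_DYNAMODB_TABLE", "MASK_DYNAMODB_REGION"),
    "memcached": ("MASK_MEMCACHED_HOST", "MASK_MEMCACHED_PORT"),
    "cloud": ("MASK_API_KEY",),
}


@dataclass(frozen=True)
class MaskSettings:
    encryption_key: str
    vault_type: str


def load_settings(env: Mapping[str, str] = os.environ) -> MaskSettings:
    """Validate Mask's environment variables once, at process start."""
    key = env.get("MASK_ENCRYPTION_KEY")
    if not key:
        raise RuntimeError("MASK_ENCRYPTION_KEY is not set")
    vault_type = env.get("MASK_VAULT_TYPE", "").strip().lower()
    if vault_type not in VAULT_TYPES:
        raise ValueError(f"MASK_VAULT_TYPE must be one of {sorted(VAULT_TYPES)}, got {vault_type!r}")
    missing = [name for name in REQUIRED_VARS[vault_type] if not env.get(name)]
    if missing:
        raise RuntimeError(f"{vault_type} vault needs: {', '.join(missing)}")
    return MaskSettings(encryption_key=key, vault_type=vault_type)
```

Passing the environment as a `Mapping` keeps the loader testable with a plain dict instead of mutating `os.environ`.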
216
+
217
+ ---
218
+
219
+ ## Framework Integrations
220
+
221
+ Mask integrates by injecting recursive hooks into your agent's tool-execution pipeline:
222
+ * **Pre-Hooks (Decoding)**: Scans the incoming tool arguments, looks up tokens in the Vault, and replaces them with plaintext *before* the function executes.
223
+ * **Post-Hooks (Encoding)**: Scans data returning from the tool, tokenizes any raw PII found (storing the encrypted original in the Vault), and hands only the tokens back to the LLM.
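The recursive part of these hooks amounts to a copy-on-write walk over nested JSON-like data. The sketch below illustrates the general technique under stated assumptions; it is not the code of `mask.core.utils`. `deep_map_strings` is a hypothetical name, tokens are matched only as whole strings, and dictionary keys are left untouched.

```python
from typing import Any, Callable


def deep_map_strings(obj: Any, fn: Callable[[str], str]) -> Any:
    """Return a copy of obj with fn applied to every string at any depth.

    Containers are rebuilt rather than edited in place, so the framework's
    original payload is never mutated.
    """
    if isinstance(obj, str):
        return fn(obj)
    if isinstance(obj, dict):
        return {key: deep_map_strings(value, fn) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        mapped = [deep_map_strings(item, fn) for item in obj]
        return mapped if isinstance(obj, list) else tuple(mapped)
    return obj  # numbers, booleans, None, etc. pass through unchanged


# Hypothetical vault contents and tool arguments produced by the LLM.
vault = {"tkn-1a2b3c4d@email.com": "bob@example.com"}
args = {"to": ["tkn-1a2b3c4d@email.com"], "meta": {"retries": 3}}

# Pre-hook: swap whole-string tokens for plaintext before the tool runs.
real_args = deep_map_strings(args, lambda s: vault.get(s, s))
print(real_args)  # {'to': ['bob@example.com'], 'meta': {'retries': 3}}
print(args)       # {'to': ['tkn-1a2b3c4d@email.com'], 'meta': {'retries': 3}}
```

A post-hook is the same traversal with a tokenizing function in place of the vault lookup.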
224
+
225
+ ### 1. LangChain
226
+ To detokenize tool arguments and re-tokenize tool outputs, wrap each tool with `MaskToolWrapper`. `MaskCallbackHandler` is for logging/audit only.
227
+ ```python
228
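+ from langchain.agents import AgentExecutor  # from the `langchain` package (0.x); the `maskcloud[langchain]` extra installs only `langchain-core`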
229
+ from mask.integrations.langchain_hooks import MaskCallbackHandler, MaskToolWrapper
230
+
231
+ # Wrap your tools so arguments are automatically detokenized and outputs re-tokenized
232
+ secure_tools = [MaskToolWrapper(my_email_tool)]
233
+
234
+ # Add the callback handler (for logging/audit only)
235
+ agent_executor = AgentExecutor(
236
+     agent=my_agent,
236
+     tools=secure_tools,
237
+     callbacks=[MaskCallbackHandler()]
239
+ )
240
+ ```
241
+
242
+ ### 2. LlamaIndex
243
+ Use `MaskToolWrapper` and/or `MaskCallbackHandler` to ensure inputs are detokenized and outputs are re-tokenized.
244
+ ```python
245
+ from llama_index.core.tools import FunctionTool
246
+ from mask.integrations.llamaindex_hooks import MaskToolWrapper
247
+
248
+ # Wrap the callable directly for input detokenization and output tokenization
249
+ secure_email_tool = FunctionTool.from_defaults(
250
+     fn=MaskToolWrapper(my_email_function),
251
+     name="send_email",
252
+     description="Sends a secure email"
253
+ )
254
+ ```
255
+
256
+ ### 3. Google ADK
257
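+ Register `decrypt_before_tool` and `encrypt_after_tool` as the agent's tool callbacks: the first detokenizes arguments before the tool runs, and the second tokenizes responses (strings, dicts, and lists) before they reach the LLM.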
258
+ ```python
259
+ from google.adk.agents import Agent
260
+ from mask.integrations.adk_hooks import decrypt_before_tool, encrypt_after_tool
261
+
262
+ secure_agent = Agent(
263
+     name="secure_assistant",
264
+     model=...,
265
+     tools=[...],
266
+     before_tool_callback=decrypt_before_tool,  # Protects arguments
267
+     after_tool_callback=encrypt_after_tool,  # Protects responses
268
+ )
269
+ ```
270
+
271
+ ---
272
+
273
+ ## Testing and Verification
274
+
275
+ ### The Test Suite
276
+ The SDK ships with a native `pytest` suite covering cryptographic integrity, FPE format compliance, asynchronous telemetry, and distributed vault TTL expiry across all layers.
277
+
278
+ #### Core Tests (`test_fpe.py`, `test_vault.py`, `test_vault_backends.py`)
279
+ - **Format-Preserving Tokenization Integrity:** Validates that tokens preserve their original formats (e.g., emails become `tkn-<hex>@email.com`, SSNs become `000-00-<4 digits>`) to ensure downstream regex and schema validators do not break.
280
+ - **Memory Vaults:** Verifies fundamental `store()`, `retrieve()`, `delete()`, TTL mechanics, and clean token/plaintext roundtrips via the `encode()` and `decode()` API. The public `decode()` helper is **strict** and raises on failure; callers that prefer lenient behaviour should catch `DecodeError` and fall back to the original token themselves.
281
+ - **Distributed Vaults:** Mocks `boto3` and `pymemcache` to verify that the production backends (DynamoDB and Memcached) respect TTL expirations and auto-delete stale rows.
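The strict-versus-lenient decoding split can be illustrated with a toy TTL vault. Everything here is hypothetical: the SDK's real `DecodeError` has its own import path, which this section does not give, so the sketch defines a local stand-in, and `TTLMemoryVault`, `decode`, and `decode_or_passthrough` only mirror the `store()`/`retrieve()`/`delete()` and strict `decode()` semantics described above. An injectable clock keeps expiry testable without sleeping.

```python
import time
from typing import Callable, Optional


class DecodeError(Exception):
    """Local stand-in for the SDK's error: token unknown or expired."""


class TTLMemoryVault:
    """Hypothetical in-memory vault with per-entry expiry."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: dict[str, tuple[str, float]] = {}  # token -> (value, expires_at)

    def store(self, token: str, value: str, ttl: float) -> None:
        self._entries[token] = (value, self._clock() + ttl)

    def retrieve(self, token: str) -> Optional[str]:
        entry = self._entries.get(token)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            self.delete(token)  # lazily purge stale entries on read
            return None
        return value

    def delete(self, token: str) -> None:
        self._entries.pop(token, None)


def decode(vault: TTLMemoryVault, token: str) -> str:
    """Strict: raise rather than silently hand a token to a tool."""
    value = vault.retrieve(token)
    if value is None:
        raise DecodeError(f"cannot decode {token!r}")
    return value


def decode_or_passthrough(vault: TTLMemoryVault, token: str) -> str:
    """Lenient caller-side wrapper: fall back to the original token."""
    try:
        return decode(vault, token)
    except DecodeError:
        return token


now = [0.0]
vault = TTLMemoryVault(clock=lambda: now[0])
vault.store("000-00-1234", "123-45-6789", ttl=600)
print(decode(vault, "000-00-1234"))                 # 123-45-6789
now[0] = 601.0                                      # move past the TTL
print(decode_or_passthrough(vault, "000-00-1234"))  # 000-00-1234
```

Keeping the strict version as the default means an expired token surfaces as an error instead of being silently passed to a downstream system.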
282
+
283
+ #### Telemetry Tests (`test_audit_logger.py`)
284
+ - **SOC2/HIPAA Audit Trail:** Validates asynchronous audit event buffering.
285
+ - **Resilience:** Proves that network timeouts (`urllib.error.URLError`) when POSTing to the Mask Control Plane are safely swallowed by the daemon thread and will *never* crash the host application.
286
+
287
+ #### Framework Integrations (`test_hooks.py`, `test_langchain.py`, `test_llamaindex.py`)
288
+ - **Recursive Scanners:** Tests `deep_decode` and `deep_encode_pii` (from `mask.core.utils`) to prove nested dictionaries/lists in JSON payloads are correctly scrubbed without mutating the underlying framework data structures.
289
+ - **Framework-specific hooks:** Validates that LangChain `MaskToolWrapper`, LlamaIndex `FunctionTool` wrappers, and Google ADK pre/post hooks correctly intercept inputs and outputs to enforce the JIT Privacy Middleware.
290
+
291
+ ```bash
292
+ uv run pytest tests/ -v
293
+ ```
294
+
295
+ ### The Interactive Demo (examples/test_agent.py)
296
+ You can observe Mask's privacy middleware in action by running the demo script:
297
+ ```bash
298
+ uv run python examples/test_agent.py
299
+ ```
300
+
301
+ **What is REAL vs MOCKED in the demo?**
302
+ * **REAL**: The Format-Preserving Tokenization generation, the storage of the token into the Vault, and the hook's recursive detokenization algorithm are all executing genuinely.
303
+ * **MOCKED**: To save time and API credits for a local demo, the script does not make a real HTTP call to an LLM provider, nor does the mock tool perform real downstream actions. It simulates the LLM's decision so you can watch the middleware pipeline run end to end.
304
+
305
+ ---
306
+
307
+ ## Telemetry and Compliance
308
+ The SDK includes a built-in, thread-safe, asynchronous `AuditLogger` (`mask/telemetry/audit_logger.py`).
309
+
310
+ As your agents encrypt and decrypt data, the logger buffers these privacy events (e.g., Action: Tokenized Email, Agent: SalesBot, TTL: 600s). **Raw PII is never logged.**
311
+
312
+ In enterprise deployments, these logs are automatically forwarded to the **Mask Control Plane** via our Telemetry Ingestion API to power your compliance dashboard and Audit Storage database. Retention policies vary by Cloud tier (7-day history for Basic, 30-day for Pro, Unlimited for Enterprise).
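Non-blocking forwarding that swallows network errors is commonly built from a queue and a daemon thread. The sketch below assumes that design; `BackgroundForwarder` and `post_json` are hypothetical, and the Control Plane endpoint URL is deliberately not filled in because this README does not give it. The example event carries only metadata, never the raw value.

```python
import json
import queue
import threading
import urllib.error
import urllib.request
from typing import Callable

Batch = list[dict]


def post_json(url: str, timeout: float = 5.0) -> Callable[[Batch], None]:
    """Build a sender that POSTs a batch of events as JSON."""
    def send(batch: Batch) -> None:
        request = urllib.request.Request(
            url,
            data=json.dumps(batch).encode(),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(request, timeout=timeout):
            pass
    return send


class BackgroundForwarder:
    """Buffer audit events and ship them in batches from a daemon thread."""

    def __init__(self, send: Callable[[Batch], None], batch_size: int = 50) -> None:
        self._send = send
        self._batch_size = batch_size
        self._queue: "queue.Queue[dict]" = queue.Queue()
        self.dropped_batches = 0
        threading.Thread(target=self._run, daemon=True).start()

    def log(self, event: dict) -> None:
        # Called on the agent's hot path: an O(1) enqueue, no network I/O.
        self._queue.put(event)

    def flush(self) -> None:
        # Block until every queued event has been sent or dropped.
        self._queue.join()

    def _run(self) -> None:
        while True:
            batch = [self._queue.get()]  # wait for at least one event
            while len(batch) < self._batch_size:
                try:
                    batch.append(self._queue.get_nowait())
                except queue.Empty:
                    break
            try:
                self._send(batch)
            except OSError:  # includes urllib.error.URLError and socket timeouts
                self.dropped_batches += 1  # never propagate to the host app
            finally:
                for _ in batch:
                    self._queue.task_done()


def failing_send(batch: Batch) -> None:
    raise urllib.error.URLError("control plane unreachable")


forwarder = BackgroundForwarder(failing_send)
forwarder.log({"action": "tokenized", "entity": "EMAIL_ADDRESS", "agent": "SalesBot", "ttl": 600})
forwarder.flush()
print(forwarder.dropped_batches)  # 1
```

For local evaluation, a sender such as `lambda batch: print(json.dumps(batch))` turns the same loop into a structured-JSON stdout emitter.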
313
+
314
+ For open-source or local telemetry evaluation, events can be flushed to stdout as structured JSON and piped into your existing Datadog or Splunk agents. From there you can generate compliance reports that show your SOC2, HIPAA, or PCI-DSS auditors that your LLM infrastructure properly isolates sensitive data.
315
+
316
+ If your environment does not permit on-disk storage of audit events, you can
317
+ disable the local SQLite buffer by setting:
318
+
319
+ ```bash
320
+ export MASK_DISABLE_AUDIT_DB=true
321
+ ```
322
+
323
+ In this mode, events are still emitted via the logger but never persisted to
324
+ `.mask_audit.db` on disk.
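One way to honor `MASK_DISABLE_AUDIT_DB` is to decide at construction time whether a SQLite connection is opened at all. `LocalAuditBuffer` is a hypothetical sketch, not the SDK's logger; which flag spellings count as "true" is an assumption, and the example uses `:memory:` databases so nothing is written to disk.

```python
import json
import os
import sqlite3
from typing import Mapping, Optional

TRUTHY = {"1", "true", "yes", "on"}  # assumed spellings the flag accepts


class LocalAuditBuffer:
    """Hypothetical audit sink: always emits JSON, persists to SQLite unless disabled."""

    def __init__(self, path: str = ".mask_audit.db", env: Mapping[str, str] = os.environ) -> None:
        disabled = env.get("MASK_DISABLE_AUDIT_DB", "").strip().lower() in TRUTHY
        self._conn: Optional[sqlite3.Connection] = None
        if not disabled:
            self._conn = sqlite3.connect(path)
            self._conn.execute("CREATE TABLE IF NOT EXISTS events (payload TEXT NOT NULL)")

    def record(self, event: dict) -> None:
        line = json.dumps(event)
        print(line)  # the event is still emitted either way
        if self._conn is not None:
            with self._conn:  # commits the insert, or rolls back on error
                self._conn.execute("INSERT INTO events (payload) VALUES (?)", (line,))

    def persisted_count(self) -> int:
        if self._conn is None:
            return 0
        return self._conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]


event = {"action": "tokenized", "entity": "EMAIL_ADDRESS", "agent": "SalesBot", "ttl": 600}
enabled = LocalAuditBuffer(path=":memory:", env={})
disabled = LocalAuditBuffer(path=":memory:", env={"MASK_DISABLE_AUDIT_DB": "true"})
enabled.record(event)
disabled.record(event)
print(enabled.persisted_count(), disabled.persisted_count())  # 1 0
```

Deciding once in the constructor means the disabled path never opens a file handle, so no `.mask_audit.db` is created as a side effect.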
325
+
326
+ ---
327
+
328
+ ## License
329
+
330
+ This project is licensed under the Apache License, Version 2.0 - see the [LICENSE](LICENSE) file for details.
331
+
332
+ Copyright (c) 2026 Mask AI Solutions