hipaa-mcp 0.1.0__tar.gz

@@ -0,0 +1,33 @@
1
+ {
2
+ "permissions": {
3
+ "allow": [
4
+ "Bash(uv sync *)",
5
+ "Bash(python3 -m pip install -e \".[dev]\")",
6
+ "Bash(curl -Ls https://astral.sh/uv/install.sh)",
7
+ "Bash(sh)",
8
+ "Bash(pip install *)",
9
+ "Bash(uv venv *)",
10
+ "Bash(uv run *)",
11
+ "Bash(uv add *)",
12
+ "Bash(curl -s \"https://www.ecfr.gov/api/versioner/v1/full/2026-04-16/title-45.xml\" --range 0-3000)",
13
+ "Bash(.venv/bin/python *)",
14
+ "Bash(.venv/bin/pytest *)",
15
+ "Bash(timeout 10 .venv/bin/hipaa-mcp serve)",
16
+ "Bash(grep -v '^$')",
17
+ "Bash(python3 -c \"import sys,json; [print\\(json.dumps\\(json.loads\\(l\\), indent=2\\)\\) for l in sys.stdin if l.strip\\(\\) and '\\\\\"id\\\\\":2' in l]\")",
18
+ "Bash(claude mcp *)",
19
+ "Read(//home/zachw/.claude/**)",
20
+ "Skill(update-config)",
21
+ "mcp__hipaa-mcp__search_regulations",
22
+ "mcp__hipaa-mcp__list_glossary_terms",
23
+ "mcp__hipaa-mcp__get_section",
24
+ "Bash(python -m spacy download en_core_web_sm)",
25
+ "mcp__hipaa-mcp__add_glossary_term",
26
+ "mcp__hipaa-mcp__explain_search",
27
+ "Bash(grep -E '\\\\.\\(toml|md|cfg|txt\\)$')"
28
+ ]
29
+ },
30
+ "enabledMcpjsonServers": [
31
+ "hipaa-mcp"
32
+ ]
33
+ }
@@ -0,0 +1,30 @@
1
+ # deps / build
2
+ .venv/
3
+ __pycache__/
4
+ *.pyc
5
+ *.pyo
6
+ dist/
7
+ build/
8
+ *.egg-info/
9
+
10
+ # tool caches
11
+ .ruff_cache/
12
+ .mypy_cache/
13
+ .pytest_cache/
14
+
15
+ # user data (chromadb, bm25 index, user glossary)
16
+ data/chroma/
17
+ data/bm25_index.pkl
18
+ data/user_glossary.yaml
19
+ *.db
20
+
21
+ # env
22
+ .env
23
+
24
+ # claude instructions (local only)
25
+ claude.md
26
+ CLAUDE.md
27
+
28
+ # OS
29
+ .DS_Store
30
+ Thumbs.db
@@ -0,0 +1,9 @@
1
+ {
2
+ "mcpServers": {
3
+ "hipaa-mcp": {
4
+ "type": "stdio",
5
+ "command": "uv",
6
+ "args": ["run", "--directory", "/mnt/c/users/zachw/repos/hipaa-2-vec", "hipaa-mcp", "serve"]
7
+ }
8
+ }
9
+ }
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 CodePapayas
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
@@ -0,0 +1,16 @@
1
+ Metadata-Version: 2.4
2
+ Name: hipaa-mcp
3
+ Version: 0.1.0
4
+ License-File: LICENSE
5
+ Requires-Python: >=3.12
6
+ Requires-Dist: chromadb>=0.5.0
7
+ Requires-Dist: click>=8.1
8
+ Requires-Dist: fastmcp>=0.1.0
9
+ Requires-Dist: httpx>=0.27.0
10
+ Requires-Dist: lxml>=5.0
11
+ Requires-Dist: platformdirs>=4.0
12
+ Requires-Dist: pydantic-settings>=2.0
13
+ Requires-Dist: pydantic>=2.0
14
+ Requires-Dist: pyyaml>=6.0
15
+ Requires-Dist: rank-bm25>=0.2.2
16
+ Requires-Dist: spacy>=3.7
@@ -0,0 +1,244 @@
1
+ # 🏥 hipaa-mcp
2
+
3
+ > **Ask HIPAA questions in plain English. Get exact citations back. Nothing else.**
4
+
5
+ A local-first MCP server that searches **45 CFR Part 164 (HIPAA)** and **42 CFR Part 2** and returns precise regulatory citations like `§ 164.308(a)(1)(ii)(A)`: not summaries, not interpretations, not vibes.
6
+
7
+ Built for healthtech developers who need to answer "do I need a BAA for this vendor?" without reading 200 pages of CFR or trusting a Reddit thread.
8
+
9
+ ---
10
+
11
+ > ⚠️ **This is a reference tool, not a compliance tool.** It retrieves and cites regulation text. It does not tell you what the regulation means for your situation. When in doubt, talk to a lawyer.
12
+
13
+ ---
14
+
15
+ ## ✨ What it does
16
+
17
+ | Tool | What it returns |
18
+ |---|---|
19
+ | `search_regulations("do I need a BAA for my analytics vendor?")` | Ranked `§ X.Y` citations with full regulation text |
20
+ | `get_section("§ 164.308(a)(1)")` | Full text of that specific section |
21
+ | `explain_search("why did my microservice query return these results?")` | Same results + full provenance: which glossary terms fired, confidence scores, per-hit vector/BM25 scores |
22
+ | `add_glossary_term` / `list_glossary_terms` / `remove_glossary_term` | Tune how your developer vocabulary maps to regulatory terms |
23
+
24
+ **How search works:** your query is first expanded (e.g. "vendor" → "business associate"), then run against both a vector index and a BM25 index; the two ranked lists are merged with reciprocal rank fusion, and results are returned in order of combined score. No cloud, no OpenAI, no Anthropic. Everything runs on your machine.
25
+
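The merge step described above can be sketched as follows. This is a minimal illustration of reciprocal rank fusion, not the package's actual code: the function name, the sample section ids, and the constant `k=60` (a commonly used default) are all assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of section ids into one scored list.

    Each list contributes 1 / (k + rank) for every id it contains, where
    rank starts at 1. k=60 is an assumed, conventional constant.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, section_id in enumerate(ranking, start=1):
            scores[section_id] += 1.0 / (k + rank)
    # Highest combined score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical result lists from the two retrievers.
vector_hits = ["164.308", "164.314", "164.502"]
bm25_hits = ["164.308", "164.504", "164.314"]

fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
for section_id, score in fused:
    print(f"{section_id}  rrf={score:.4f}")
```

A section that appears near the top of both lists outranks one that appears in only one, which is why hybrid hits tend to lead the results.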
26
+ ---
27
+
28
+ ## 🚀 Quick start
29
+
30
+ ### Prerequisites
31
+
32
+ | Dependency | Install |
33
+ |---|---|
34
+ | Python 3.12+ | [python.org](https://www.python.org/downloads/) or `pyenv install 3.12` |
35
+ | `uv` (package manager) | `curl -LsSf https://astral.sh/uv/install.sh \| sh` |
36
+ | Ollama *(optional, improves search)* | [ollama.com](https://ollama.com) |
37
+
38
+ ### 1. Install
39
+
40
+ ```bash
41
+ git clone https://github.com/CodePapayas/hipaa-2-vec
42
+ cd hipaa-2-vec
43
+ uv sync
44
+ ```
45
+
46
+ ### 2. Download the spaCy language model
47
+
48
+ ```bash
49
+ uv run python -m spacy download en_core_web_sm
50
+ ```
51
+
52
+ > This is used for POS tagging so "building a SaaS" doesn't match regulation text about building facilities.
53
+
54
+ ### 3. Index the regulations
55
+
56
+ ```bash
57
+ uv run hipaa-mcp reindex
58
+ ```
59
+
60
+ This downloads the eCFR XML from the federal government, parses it into chunks, and builds a local ChromaDB vector index + BM25 index. Takes a minute or two. Only needs to run once (or when you want fresh regulation text).
61
+
62
+ ### 4. *(Optional)* Pull the LLM for smarter query rewriting
63
+
64
+ ```bash
65
+ ollama pull gemma4:e4b
66
+ ```
67
+
68
+ Without this, glossary-based expansion still runs; you just won't get LLM-assisted query rewriting. Works fine either way.
69
+
70
+ ---
71
+
72
+ ## 🔌 Connect to Claude Desktop (or any MCP client)
73
+
74
+ Add this to your MCP config file:
75
+
76
+ **Mac:** `~/Library/Application Support/Claude/claude_desktop_config.json`
77
+ **Windows:** `%APPDATA%\Claude\claude_desktop_config.json`
78
+
79
+ ```json
80
+ {
81
+ "mcpServers": {
82
+ "hipaa-mcp": {
83
+ "command": "uv",
84
+ "args": ["run", "hipaa-mcp", "serve"],
85
+ "cwd": "/path/to/hipaa-2-vec"
86
+ }
87
+ }
88
+ }
89
+ ```
90
+
91
+ Restart Claude Desktop. You'll see the 🔨 tools icon; `search_regulations`, `get_section`, `explain_search`, and the glossary tools will be available.
92
+
93
+ ---
94
+
95
+ ## 💬 Example queries
96
+
97
+ ```
98
+ "Do I need a BAA with my logging vendor?"
99
+ "What are the minimum necessary standards?"
100
+ "Can I share patient data with a data analytics subprocessor?"
101
+ "What does HIPAA say about breach notification timelines?"
102
+ "What's required for de-identified data?"
103
+ ```
104
+
105
+ Each returns the matching regulation sections verbatim with their `§` citations. The tool never interprets; it just retrieves.
106
+
107
+ ---
108
+
109
+ ## 🗂️ CLI reference
110
+
111
+ ```bash
112
+ # Start MCP server over stdio (used by Claude Desktop / MCP clients)
113
+ hipaa-mcp serve
114
+
115
+ # Rebuild the index (re-downloads eCFR XML, rebuilds ChromaDB + BM25)
116
+ hipaa-mcp reindex
117
+ hipaa-mcp reindex --date 2026-01-01 # pin to a specific regulation date
118
+
119
+ # Glossary management
120
+ hipaa-mcp glossary list # show all term mappings
121
+ hipaa-mcp glossary path # show where the YAML file lives
122
+ ```
123
+
124
+ ---
125
+
126
+ ## 📖 The glossary: why it exists and how to use it
127
+
128
+ HIPAA uses different words than developers do. The glossary bridges that gap at query time, so no re-indexing is required when you change it.
129
+
130
+ ### Built-in mappings (sample)
131
+
132
+ | What you say | What HIPAA says |
133
+ |---|---|
134
+ | SaaS, vendor, contractor | business associate |
135
+ | share, send, transmit | disclosure |
136
+ | delete, purge, wipe | destruction |
137
+ | consent, opt-in | authorization |
138
+ | logging, audit log | audit controls |
139
+ | least privilege | minimum necessary |
140
+ | breach, data leak | breach notification |
141
+ | de-identified | *(anti)* not PHI |
142
+
143
+ ### Relationship types
144
+
145
+ | Type | Behavior |
146
+ |---|---|
147
+ | `synonym` | Expand in both directions |
148
+ | `hyponym` | One-way only (your term → regulatory term) |
149
+ | `contextual` | Only expand if a scope keyword appears in the query |
150
+ | `anti` | When your term is present, *exclude* the target from expansion |
151
+
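The four relationship types in the table above can be sketched in a few lines. This is a hedged illustration only: `GlossaryEntry`, `expand_query`, the sample entries, and the whole-word matching are assumptions made for the example, not the package's data model or matching logic.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class GlossaryEntry:
    """Hypothetical glossary record; field names are assumptions."""
    phrase: str
    maps_to: str
    relationship: str  # "synonym", "hyponym", "contextual", or "anti"
    scope: tuple = ()  # scope keywords, used only by "contextual"


def _mentions(text, phrase):
    # Whole-word, case-insensitive match (text is already lowercased).
    return re.search(rf"\b{re.escape(phrase.lower())}\b", text) is not None


def expand_query(query, entries):
    """Return the extra terms the glossary would add to a query."""
    text = query.lower()
    words = set(re.findall(r"[\w-]+", text))
    added, blocked = [], set()
    for entry in entries:
        has_phrase = _mentions(text, entry.phrase)
        if entry.relationship == "anti":
            # Presence of the phrase suppresses the target term.
            if has_phrase:
                blocked.add(entry.maps_to.lower())
        elif entry.relationship == "synonym":
            # Expand in both directions.
            if has_phrase:
                added.append(entry.maps_to)
            elif _mentions(text, entry.maps_to):
                added.append(entry.phrase)
        elif entry.relationship == "hyponym":
            # One-way: developer term to regulatory term.
            if has_phrase:
                added.append(entry.maps_to)
        elif entry.relationship == "contextual":
            # Expand only when a scope keyword is also in the query.
            if has_phrase and any(s.lower() in words for s in entry.scope):
                added.append(entry.maps_to)
    unique = dict.fromkeys(added)  # de-duplicate, keep first-seen order
    return [term for term in unique if term.lower() not in blocked]


# Illustrative entries; the relationship assigned to each is assumed.
ENTRIES = [
    GlossaryEntry("vendor", "business associate", "hyponym"),
    GlossaryEntry("share", "disclosure", "synonym"),
    GlossaryEntry("microservice", "business associate", "contextual", ("PHI",)),
    GlossaryEntry("patient data", "PHI", "hyponym"),
    GlossaryEntry("de-identified", "PHI", "anti"),
]

print(expand_query("Can I share records with my vendor?", ENTRIES))
```

In this sketch an `anti` entry wins over any other entry that would add the same target, which mirrors the "exclude the target from expansion" behavior described in the table.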
152
+ ### Inspecting expansion with `explain_search`
153
+
154
+ When you want to understand *why* a query returned specific results, use `explain_search` instead of `search_regulations`. It returns the same hits plus:
155
+
156
+ - **`glossary_matches`**: every glossary entry that fired, with `confidence` (0–1), the relationship type, and which `scope_triggered` words caused a contextual match
157
+ - **`vector_score`**: cosine similarity (0–1) between the query and the chunk
158
+ - **`bm25_score`**: lexical match score normalized to the top BM25 result (0–1)
159
+ - **`rrf_score`**: the final merged rank fusion score
160
+
161
+ ```
162
+ explain_search("does my microservice need a BAA if it processes PHI?")
163
+ → glossary_matches:
164
+ "microservice" → "business associate" [contextual, scope: PHI] confidence: 0.95
165
+ "processes" → "use" [synonym, VERB subst.] confidence: 1.0
166
+ → hits:
167
+ § 164.308 vector=0.71 bm25=1.00 rrf=0.032 [hybrid]
168
+ § 164.314 vector=0.65 bm25=0.84 rrf=0.031 [hybrid]
169
+ ```
170
+
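The `bm25_score` field is described as normalized to the top BM25 result. One plausible way to do that is sketched below; the function name and the raw scores are hypothetical, chosen only so that the output lines up with the `1.00` and `0.84` values in the sample above.

```python
def normalize_to_top(raw_scores):
    """Scale raw BM25 scores so the best hit is 1.0 and the rest fall in 0-1."""
    top = max(raw_scores.values(), default=0.0)
    if top <= 0:
        # No lexical match at all: report zeros instead of dividing by zero.
        return {section_id: 0.0 for section_id in raw_scores}
    return {section_id: score / top for section_id, score in raw_scores.items()}


# Hypothetical raw BM25 scores for two sections.
raw = {"164.308": 12.5, "164.314": 10.5}
normalized = normalize_to_top(raw)
for section_id, score in normalized.items():
    print(f"{section_id}  bm25={score:.2f}")
```

Because the scale is relative to the best hit of each query, normalized BM25 scores are comparable within one result set but not across different queries.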
171
+ ### Adding your own mappings
172
+
173
+ ```bash
174
+ # Via MCP tool (works inside Claude)
175
+ add_glossary_term(phrase="my term", maps_to="regulatory term", relationship="synonym")
176
+
177
+ # Or edit the YAML directly
178
+ hipaa-mcp glossary path # shows the file location
179
+ ```
180
+
181
+ The glossary lives in your platform's user data directory, so it won't be overwritten by upgrades.
182
+
183
+ ---
184
+
185
+ ## ⚙️ Configuration
186
+
187
+ All env vars are prefixed `HIPAA_MCP_`. You can set them in a `.env` file in the project root.
188
+
189
+ | Variable | Default | What it does |
190
+ |---|---|---|
191
+ | `HIPAA_MCP_OLLAMA_URL` | `http://localhost:11434` | Ollama endpoint |
192
+ | `HIPAA_MCP_LLM_MODEL` | `gemma4:e4b` | Model used for query rewriting |
193
+ | `HIPAA_MCP_USE_LLM_FOR_QUERY_UNDERSTANDING` | `true` | Set `false` to skip LLM rewriting (glossary expansion still runs) |
194
+ | `HIPAA_MCP_DATA_DIR` | platform user data dir | Where ChromaDB, BM25 index, and glossary are stored |
195
+ | `HIPAA_MCP_TOP_K_DEFAULT` | `5` | Default number of results returned |
196
+
197
+ **Example `.env`:**
198
+ ```env
199
+ HIPAA_MCP_USE_LLM_FOR_QUERY_UNDERSTANDING=false
200
+ HIPAA_MCP_TOP_K_DEFAULT=10
201
+ ```
202
+
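To make the prefix convention concrete, here is a standard-library sketch of how `HIPAA_MCP_`-prefixed variables could be resolved against the defaults in the table. The package lists `pydantic-settings` as a dependency and most likely relies on it instead; the `Settings` class and `load_settings` function below are assumptions for illustration, and `HIPAA_MCP_DATA_DIR` is left out because its default depends on the platform.

```python
import os
from dataclasses import dataclass

PREFIX = "HIPAA_MCP_"


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings holder; defaults copied from the table above."""
    ollama_url: str = "http://localhost:11434"
    llm_model: str = "gemma4:e4b"
    use_llm_for_query_understanding: bool = True
    top_k_default: int = 5


def load_settings(environ=None):
    """Build Settings from a mapping of environment variables."""
    env = os.environ if environ is None else environ
    defaults = Settings()
    raw_flag = env.get(PREFIX + "USE_LLM_FOR_QUERY_UNDERSTANDING")
    raw_top_k = env.get(PREFIX + "TOP_K_DEFAULT")
    return Settings(
        ollama_url=env.get(PREFIX + "OLLAMA_URL", defaults.ollama_url),
        llm_model=env.get(PREFIX + "LLM_MODEL", defaults.llm_model),
        use_llm_for_query_understanding=(
            defaults.use_llm_for_query_understanding
            if raw_flag is None
            else raw_flag.strip().lower() in {"1", "true", "yes", "on"}
        ),
        top_k_default=defaults.top_k_default if raw_top_k is None else int(raw_top_k),
    )


# Same values as the example .env above, passed in as a plain dict.
settings = load_settings({
    "HIPAA_MCP_USE_LLM_FOR_QUERY_UNDERSTANDING": "false",
    "HIPAA_MCP_TOP_K_DEFAULT": "10",
})
print(settings)
```

Passing the environment in as a mapping keeps the function easy to test without touching the real process environment.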
203
+ ---
204
+
205
+ ## 🧪 Running tests
206
+
207
+ ```bash
208
+ uv run pytest
209
+ ```
210
+
211
+ Tests use in-memory ChromaDB and a stub LLM: no real Ollama calls, no network required.
212
+
213
+ ---
214
+
215
+ ## 🗺️ What's in scope / not in scope
216
+
217
+ | ✅ In scope | ❌ Not in scope |
218
+ |---|---|
219
+ | HIPAA 45 CFR Part 164 | Legal interpretation of any kind |
220
+ | 42 CFR Part 2 (substance use records) | Cloud inference of any kind |
221
+ | Plain-English → citation search | A web UI |
222
+ | Local-only, air-gappable | Authentication |
223
+ | Glossary-tunable query expansion | Regs beyond HIPAA + Part 2 |
224
+
225
+ ---
226
+
227
+ ## 📦 Stack
228
+
229
+ `Python 3.12` · `FastMCP` · `ChromaDB` · `rank_bm25` · `Pydantic v2` · `spaCy` · `lxml` · `Ollama (Gemma 4 E4B)` · `uv`
230
+
231
+ ---
232
+
233
+ ## 🗒️ TODO
234
+
235
+ - **Glossary expansion preview during reindex**: while `hipaa-mcp reindex` runs, sample ~1 in 5 glossary mappings and print them as they're applied, e.g.:
236
+
237
+ ```
238
+ expanding "vendor" → "business associate"
239
+ expanding "share" → "disclosure"
240
+ expanding "delete" → "destruction"
241
+ ...
242
+ ```
243
+
244
+ Goal: visual confirmation that the glossary is wired up correctly, plus a way to teach developers the regulatory vocabulary while they wait. Not all terms, just a representative sample: whatever looks good in the terminal.