@xdev-asia/xdev-knowledge-mcp 1.0.39 → 1.0.41
- package/content/series/architecture/xay-dung-he-thong-y-te-microservices/chapters/01-phan-1-kien-truc-nen-tang/lessons/04-bai-4-threat-modeling-stride-dread.md +41 -52
- package/content/series/architecture/xay-dung-he-thong-y-te-microservices/chapters/02-phan-2-iam-keycloak/lessons/01-bai-5-setup-keycloak-realm-benh-vien.md +33 -84
- package/content/series/architecture/xay-dung-he-thong-y-te-microservices/chapters/02-phan-2-iam-keycloak/lessons/02-bai-6-phan-quyen-rbac-abac.md +6 -23
- package/content/series/architecture/xay-dung-he-thong-y-te-microservices/chapters/02-phan-2-iam-keycloak/lessons/03-bai-7-smart-on-fhir-oauth2-oidc.md +25 -36
- package/content/series/architecture/xay-dung-he-thong-y-te-microservices/chapters/02-phan-2-iam-keycloak/lessons/04-bai-8-mfa-passkeys-emergency-access.md +7 -23
- package/content/series/architecture/xay-dung-he-thong-y-te-microservices/chapters/03-phan-3-data-layer-postgresql/lessons/01-bai-9-postgresql-security-hardening.md +23 -69
- package/content/series/architecture/xay-dung-he-thong-y-te-microservices/chapters/03-phan-3-data-layer-postgresql/lessons/02-bai-10-ma-hoa-du-lieu-postgresql.md +25 -80
- package/content/series/architecture/xay-dung-he-thong-y-te-microservices/chapters/03-phan-3-data-layer-postgresql/lessons/03-bai-11-row-level-security-column-encryption.md +26 -55
- package/content/series/architecture/xay-dung-he-thong-y-te-microservices/chapters/03-phan-3-data-layer-postgresql/lessons/04-bai-12-audit-logging-cdc-pgaudit.md +51 -87
- package/content/series/architecture/xay-dung-he-thong-y-te-microservices/chapters/04-phan-4-microservices-quarkus/lessons/03-bai-15-ma-hoa-end-to-end-microservices.md +18 -63
- package/content/series/architecture/xay-dung-he-thong-y-te-microservices/chapters/04-phan-4-microservices-quarkus/lessons/04-bai-16-mtls-service-mesh.md +26 -88
- package/content/series/architecture/xay-dung-he-thong-y-te-microservices/chapters/05-phan-5-compliance-audit/lessons/01-bai-17-hipaa-technical-safeguards.md +50 -61
- package/content/series/architecture/xay-dung-he-thong-y-te-microservices/chapters/05-phan-5-compliance-audit/lessons/02-bai-18-audit-trail-opentelemetry-elk.md +11 -34
- package/content/series/architecture/xay-dung-he-thong-y-te-microservices/chapters/05-phan-5-compliance-audit/lessons/03-bai-19-data-masking-anonymization.md +113 -223
- package/content/series/architecture/xay-dung-he-thong-y-te-microservices/chapters/05-phan-5-compliance-audit/lessons/04-bai-20-backup-disaster-recovery.md +92 -149
- package/content/series/architecture/xay-dung-he-thong-y-te-microservices/chapters/06-phan-6-production-van-hanh/lessons/01-bai-21-zero-trust-architecture.md +126 -271
- package/content/series/architecture/xay-dung-he-thong-y-te-microservices/chapters/06-phan-6-production-van-hanh/lessons/02-bai-22-container-kubernetes-security.md +10 -52
- package/content/series/architecture/xay-dung-he-thong-y-te-microservices/chapters/06-phan-6-production-van-hanh/lessons/03-bai-23-penetration-testing.md +51 -90
- package/content/series/architecture/xay-dung-he-thong-y-te-microservices/chapters/06-phan-6-production-van-hanh/lessons/04-bai-24-capstone-deploy-production.md +137 -232
- package/data/settings.json +2 -1
- package/package.json +1 -1
|
@@ -23,60 +23,38 @@ course:
|
|
|
23
23
|
|
|
24
24
|

|
|
25
25
|
|
|
26
|
-
|
|
27
26
|
HIPAA permits the use and sharing of health data **without patient consent** if the data has been **de-identified**, that is, it can no longer be used to identify the patient. This is the foundation for medical research, population health analytics, and machine learning in healthcare.
|
|
28
27
|
|
|
29
28
|
### 1.1. HIPAA De-identification Standards — §164.514
|
|
30
29
|
|
|
31-44
|
-
[removed lines 31-44: content not preserved in this view]
|
|
45
|
-
│ └── Documents methods and results │
|
|
46
|
-
│ │
|
|
47
|
-
│ ▼ │
|
|
48
|
-
│ De-identified Data │
|
|
49
|
-
│ └── NOT considered PHI │
|
|
50
|
-
│ └── NOT subject to HIPAA Privacy Rule │
|
|
51
|
-
│ └── Can be shared freely for research │
|
|
52
|
-
└─────────────────────────────────────────────────────────────┘
|
|
53
|
-
```
|
|
30
|
+

|
|
31
|
+
|
|
32
|
+
**PHI (Protected Health Information)** can be de-identified by two methods under §164.514(a)-(b):
|
|
33
|
+
|
|
34
|
+
- **Method 1: Safe Harbor** §164.514(b)(2)
|
|
35
|
+
- Remove all **18 identifiers**
|
|
36
|
+
- No actual knowledge that the remaining information could be used for re-identification
|
|
37
|
+
- Deterministic, rules-based
|
|
38
|
+
- **Method 2: Expert Determination** §164.514(b)(1)
|
|
39
|
+
- Statistical/scientific expert certifies
|
|
40
|
+
- Risk of re-identification is **"very small"**
|
|
41
|
+
- Document the methods and results
|
|
42
|
+
|
|
43
|
+
**→ De-identified Data**: NOT considered PHI, NOT subject to the HIPAA Privacy Rule, and can be shared freely for research
|
|
54
44
|
|
|
55
45
|
### 1.2. Data Protection Spectrum
|
|
56
46
|
|
|
57-67
|
-
[removed lines 57-67: content not preserved in this view]
|
|
68
|
-
│ Fake data Cannot 18 identifiers Partial Full │
|
|
69
|
-
│ generated reverse removed hiding data │
|
|
70
|
-
│ from to (Safe Harbor) (SSN: visible │
|
|
71
|
-
│ patterns original ***-4567) │
|
|
72
|
-
│ data │
|
|
73
|
-
│ │
|
|
74
|
-
│ Use cases: Research Research Production Testing │
|
|
75
|
-
│ Dev/Test Analytics Sharing Display │
|
|
76
|
-
│ Training Population Publications Logs │
|
|
77
|
-
│ Health │
|
|
78
|
-
└─────────────────────────────────────────────────────────────┘
|
|
79
|
-
```
|
|
47
|
+

|
|
48
|
+
|
|
49
|
+
| Level | Description | Use Cases |
|
|
50
|
+
|-------|--------|----------|
|
|
51
|
+
| **Synthetic Data** | Fake data generated from patterns | Dev/Test, Training |
|
|
52
|
+
| **Anonymized Data** | Cannot reverse to original | Research, Analytics, Population Health |
|
|
53
|
+
| **De-identified Data** | 18 identifiers removed (Safe Harbor) | Research Sharing, Publications |
|
|
54
|
+
| **Masked Data** | Partial hiding (SSN: `***-4567`) | Production Display, Logs |
|
|
55
|
+
| **Original PHI** | Full data visible | Testing (restricted) |
|
|
56
|
+
|
|
57
|
+
◄── **More Privacy** ─────────────────── **Less Privacy** ──►
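
The "Masked Data" row can be illustrated with a small sketch. `MaskingSketch` and `maskKeepLast4` are hypothetical names that do not come from the lesson; the sketch assumes that masking keeps the last four digits and replaces every other digit with `*`, leaving separators untouched.

```java
// Hypothetical helper (not from the lesson): partial masking for display and logs.
final class MaskingSketch {
    private MaskingSketch() {}

    /** Replaces every digit except the last four with '*'; separators are kept. */
    static String maskKeepLast4(String value) {
        int digitsSeen = 0;
        // Walk the reversed string so the "last four" digits are met first.
        StringBuilder out = new StringBuilder(value).reverse();
        for (int i = 0; i < out.length(); i++) {
            if (Character.isDigit(out.charAt(i))) {
                digitsSeen++;
                if (digitsSeen > 4) {
                    out.setCharAt(i, '*');
                }
            }
        }
        return out.reverse().toString();
    }

    public static void main(String[] args) {
        System.out.println(maskKeepLast4("123-45-4567")); // prints ***-**-4567
    }
}
```

Masking of this kind only hides the value on screen or in logs. The stored value is still PHI, which is why the table places it next to "Original PHI" on the spectrum.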
|
|
80
58
|
|
|
81
59
|
## 2. HIPAA Safe Harbor Method — 18 Identifiers
|
|
82
60
|
|
|
@@ -341,30 +319,13 @@ Expert Determination method cho phép linh hoạt hơn Safe Harbor nhưng yêu c
|
|
|
341
319
|
3. Document the methods and results of the assessment
|
|
342
320
|
|
|
343
321
|
```
|
|
344-350
|
-
[removed lines 344-350: content not preserved in this view]
|
|
351
|
-
│ Step 2: Apply statistical methods │
|
|
352
|
-
│ └── k-anonymity (k ≥ 5 recommended) │
|
|
353
|
-
│ └── l-diversity │
|
|
354
|
-
│ └── t-closeness │
|
|
355
|
-
│ │
|
|
356
|
-
│ Step 3: Re-identification risk assessment │
|
|
357
|
-
│ └── Prosecutor risk < 0.04 (1/25) │
|
|
358
|
-
│ └── Journalist risk < 0.04 │
|
|
359
|
-
│ └── Marketer risk < 0.04 │
|
|
360
|
-
│ │
|
|
361
|
-
│ Step 4: Document and certify │
|
|
362
|
-
│ └── Expert's qualifications │
|
|
363
|
-
│ └── Methods used │
|
|
364
|
-
│ └── Risk assessment results │
|
|
365
|
-
│ └── Signed certification │
|
|
366
|
-
└─────────────────────────────────────────────────────────────┘
|
|
367
|
-
```
|
|
322
|
+
|
|
323
|
+
**Expert Determination Process:**
|
|
324
|
+
|
|
325
|
+
1. **Identify quasi-identifiers**: combinations of fields that can re-identify a person (for example, ZIP + DOB + Gender is unique for roughly 87% of the US population)
|
|
326
|
+
2. **Apply statistical methods** — k-anonymity (k ≥ 5 recommended), l-diversity, t-closeness
|
|
327
|
+
3. **Re-identification risk assessment** — Prosecutor/Journalist/Marketer risk < 0.04 (1/25)
|
|
328
|
+
4. **Document and certify** — Expert’s qualifications, methods used, risk assessment results, signed certification
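
Steps 1 to 3 can be sketched in a few lines. The class `RiskSketch` is hypothetical, and the risk definition is an assumption: the worst-case (prosecutor-style) risk is taken as 1 divided by the size of the smallest equivalence class. The lesson's own metric may be defined differently, for example as an average.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: worst-case re-identification risk from equivalence class sizes.
final class RiskSketch {
    private RiskSketch() {}

    /** Counts how many records share each quasi-identifier combination. */
    static Map<String, Integer> classSizes(List<String> quasiIdKeys) {
        Map<String, Integer> sizes = new HashMap<>();
        for (String key : quasiIdKeys) {
            sizes.merge(key, 1, Integer::sum);
        }
        return sizes;
    }

    /** Assumed definition: maximum risk = 1 / size of the smallest class. */
    static double maxRisk(List<String> quasiIdKeys) {
        int smallest = classSizes(quasiIdKeys).values().stream()
                .mapToInt(Integer::intValue)
                .min()
                .orElseThrow(() -> new IllegalArgumentException("empty dataset"));
        return 1.0 / smallest;
    }

    public static void main(String[] args) {
        // Each string stands for one record's generalized quasi-identifiers.
        List<String> keys = List.of("25-35|7**|*", "25-35|7**|*", "25-35|7**|*", "36-45|7**|*");
        double risk = maxRisk(keys);
        System.out.println("max risk = " + risk + ", below 0.04: " + (risk < 0.04));
    }
}
```

Under this worst-case definition, k = 5 corresponds to a risk of 1/5 = 0.2, and a risk below 0.04 would need every class to hold more than 25 records. The two numbers in the list are therefore consistent only if the 0.04 threshold refers to a different (for instance averaged) risk measure, which the expert's documentation should state.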
|
|
368
329
|
|
|
369
330
|
### 3.2. Quasi-Identifier Analysis
|
|
370
331
|
|
|
@@ -544,35 +505,24 @@ $$ LANGUAGE plpgsql IMMUTABLE;
|
|
|
544
505
|
### 5.1. Masking Pipeline cho Non-Production
|
|
545
506
|
|
|
546
507
|
```
|
|
547-564
|
-
[removed lines 547-564: content not preserved in this view]
|
|
565
|
-
│ │ ├── No real PHI remaining │
|
|
566
|
-
│ │ ├── Referential integrity preserved │
|
|
567
|
-
│ │ └── Data distributions similar │
|
|
568
|
-
│ │ │
|
|
569
|
-
│ ▼ │
|
|
570
|
-
│ Dev/Test DB │
|
|
571
|
-
│ └── Safe to use without HIPAA constraints │
|
|
572
|
-
│ │
|
|
573
|
-
│ ⚠ Staging area is securely deleted after masking │
|
|
574
|
-
└─────────────────────────────────────────────────────────────┘
|
|
575
|
-
```
|
|
508
|
+
|
|
509
|
+
**Static Data Masking Pipeline:**
|
|
510
|
+
|
|
511
|
+
1. **Production DB** → `pg_dump` (logical backup)
|
|
512
|
+
2. **Staging Area** (temporary, isolated network)
|
|
513
|
+
3. **Apply masking transformations:**
|
|
514
|
+
- Names → Faker-generated names
|
|
515
|
+
- SSN → Random SSN format
|
|
516
|
+
- Dates → Shifted by random offset
|
|
517
|
+
- Addresses → Randomized
|
|
518
|
+
- MRN → Re-keyed
|
|
519
|
+
4. **Validate masked data:**
|
|
520
|
+
- No real PHI remaining
|
|
521
|
+
- Referential integrity preserved
|
|
522
|
+
- Data distributions similar
|
|
523
|
+
5. **Dev/Test DB** — Safe to use without HIPAA constraints
|
|
524
|
+
|
|
525
|
+
> ⚠️ Staging area is securely deleted after masking
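
The "Dates → Shifted by random offset" transformation can be sketched as follows. `DateShiftSketch` is a hypothetical class, and the design is an assumption rather than the lesson's script: instead of a fresh random number per row, the offset is derived from the patient id with HMAC-SHA256, so every date of one patient moves by the same number of days and intervals between visits stay intact.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.time.LocalDate;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch: per-patient date shifting for a masked copy.
final class DateShiftSketch {
    private final byte[] secret;

    DateShiftSketch(byte[] secret) {
        this.secret = secret.clone();
    }

    /** Derives a stable offset of 1..365 days from the patient id. */
    int offsetDays(String patientId) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret, "HmacSHA256"));
            byte[] digest = mac.doFinal(patientId.getBytes(StandardCharsets.UTF_8));
            int value = ((digest[0] & 0xFF) << 8) | (digest[1] & 0xFF); // 0..65535
            return 1 + value % 365;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HMAC-SHA256 unavailable", e);
        }
    }

    /** Shifts a date back by the patient's offset. */
    LocalDate shift(String patientId, LocalDate date) {
        return date.minusDays(offsetDays(patientId));
    }
}
```

The secret would have to be discarded together with the staging area; anyone holding it can recompute the offsets and undo the shift.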
|
|
576
526
|
|
|
577
527
|
### 5.2. SQL-Based Static Masking Script
|
|
578
528
|
|
|
@@ -640,31 +590,26 @@ COMMIT;
|
|
|
640
590
|
**Definition**: A dataset satisfies k-anonymity if every combination of quasi-identifiers appears at least **k times**. In other words, each record is indistinguishable from at least k-1 other records.
|
|
641
591
|
|
|
642
592
|
```
|
|
643-662
|
-
[removed lines 643-662: content not preserved in this view]
|
|
663
|
-
│ │
|
|
664
|
-
│ Techniques: Generalization (age ranges, ZIP truncation) │
|
|
665
|
-
│ Suppression (remove rare values) │
|
|
666
|
-
└─────────────────────────────────────────────────────────────┘
|
|
667
|
-
```
|
|
593
|
+
|
|
594
|
+

|
|
595
|
+
|
|
596
|
+
**BEFORE (k=1, not anonymous):**
|
|
597
|
+
|
|
598
|
+
| Age | ZIP | Gender | Diagnosis |
|
|
599
|
+
|-----|-----|--------|----------|
|
|
600
|
+
| 28 | 700 | M | Diabetes ← Unique! |
|
|
601
|
+
| 29 | 700 | M | Heart |
|
|
602
|
+
| 35 | 700 | F | Cancer ← Unique! |
|
|
603
|
+
|
|
604
|
+
**AFTER (k=3, generalized):**
|
|
605
|
+
|
|
606
|
+
| Age Range | ZIP | Gender | Diagnosis |
|
|
607
|
+
|-----------|-----|--------|----------|
|
|
608
|
+
| 25-35 | 7** | * | Diabetes ← 3 matches |
|
|
609
|
+
| 25-35 | 7** | * | Heart ← 3 matches |
|
|
610
|
+
| 25-35 | 7** | * | Cancer ← 3 matches |
|
|
611
|
+
|
|
612
|
+
**Techniques:** Generalization (age ranges, ZIP truncation), Suppression (remove rare values)
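
The BEFORE/AFTER tables can be reproduced with a short sketch. `GeneralizeSketch` and its method names are hypothetical, and the band parameters (start 25, width 11) are chosen only so that the output matches the `25-35` range shown in the table.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch: generalize quasi-identifiers, then measure k.
final class GeneralizeSketch {
    private GeneralizeSketch() {}

    /** Maps an age to a band of the given width, e.g. (28, 25, 11) -> "25-35". */
    static String ageRange(int age, int start, int width) {
        int low = start + Math.floorDiv(age - start, width) * width;
        return low + "-" + (low + width - 1);
    }

    /** Keeps the first character of a ZIP code and stars the rest, e.g. "700" -> "7**". */
    static String truncateZip(String zip) {
        return zip.charAt(0) + "*".repeat(zip.length() - 1);
    }

    /** k = size of the smallest group of identical generalized keys (0 if empty). */
    static long k(List<String> generalizedKeys) {
        Map<String, Long> groups = generalizedKeys.stream()
                .collect(Collectors.groupingBy(key -> key, Collectors.counting()));
        return groups.values().stream().mapToLong(Long::longValue).min().orElse(0);
    }

    public static void main(String[] args) {
        // Ages from the BEFORE table; gender is suppressed to "*".
        List<String> keys = List.of(28, 29, 35).stream()
                .map(age -> ageRange(age, 25, 11) + "|" + truncateZip("700") + "|*")
                .collect(Collectors.toList());
        System.out.println(keys.get(0) + " -> k = " + k(keys));
    }
}
```

Band boundaries matter: with plain decade bands (20-29, 30-39) the 35-year-old would still sit alone in a class, so the choice of generalization hierarchy has to be checked against the resulting k.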
|
|
668
613
|
|
|
669
614
|
### 6.2. K-Anonymity Implementation
|
|
670
615
|
|
|
@@ -832,28 +777,22 @@ public class KAnonymityService {
|
|
|
832
777
|
**T-Closeness**: The distribution of the sensitive attribute within each equivalence class must be close to its distribution in the whole dataset (distance ≤ t). This prevents the **skewness attack**.
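
A minimal sketch of the distance check, under a stated assumption: the sensitive attribute is categorical and all categories are equally far apart, in which case the Earth Mover's Distance commonly used for t-closeness reduces to half the sum of absolute differences between the two distributions. `TClosenessSketch` is a hypothetical class, not the lesson's implementation.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: distance between a class's distribution and the overall one.
final class TClosenessSketch {
    private TClosenessSketch() {}

    /** Relative frequency of each value in the list. */
    static Map<String, Double> distribution(List<String> values) {
        Map<String, Double> dist = new HashMap<>();
        double share = 1.0 / values.size();
        for (String value : values) {
            dist.merge(value, share, Double::sum);
        }
        return dist;
    }

    /** Half the L1 distance between the class distribution and the overall distribution. */
    static double distance(List<String> classValues, List<String> allValues) {
        Map<String, Double> p = distribution(classValues);
        Map<String, Double> q = distribution(allValues);
        Set<String> keys = new HashSet<>(p.keySet());
        keys.addAll(q.keySet());
        double sum = 0.0;
        for (String key : keys) {
            sum += Math.abs(p.getOrDefault(key, 0.0) - q.getOrDefault(key, 0.0));
        }
        return sum / 2.0;
    }
}
```

A class satisfies the requirement when `distance(classValues, allValues)` is at most t. A class in which everyone has the same diagnosis, inside a dataset where that diagnosis is rare, yields a distance close to 1.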
|
|
833
778
|
|
|
834
779
|
```
|
|
835-850
|
-
[removed lines 835-850: content not preserved in this view]
|
|
851
|
-
│ │ 25-35 │ Diabetes │ ← 3 different diagnoses │
|
|
852
|
-
│ │ 25-35 │ Heart │ ← Attacker cannot infer │
|
|
853
|
-
│ │ 25-35 │ Cold │ ← which one belongs to target │
|
|
854
|
-
│ └──────────┴─────────────┘ │
|
|
855
|
-
└─────────────────────────────────────────────────────────────┘
|
|
856
|
-
```
|
|
780
|
+
|
|
781
|
+
**K-Anonymity (k=3) — VULNERABLE (Homogeneity Attack):**
|
|
782
|
+
|
|
783
|
+
| Age Range | Diagnosis |
|
|
784
|
+
|-----------|----------|
|
|
785
|
+
| 25-35 | HIV ← All have HIV! |
|
|
786
|
+
| 25-35 | HIV ← Attacker knows diagnosis |
|
|
787
|
+
| 25-35 | HIV ← even without knowing WHO |
|
|
788
|
+
|
|
789
|
+
**L-Diversity (l=3) — PROTECTED:**
|
|
790
|
+
|
|
791
|
+
| Age Range | Diagnosis |
|
|
792
|
+
|-----------|----------|
|
|
793
|
+
| 25-35 | Diabetes ← 3 different diagnoses |
|
|
794
|
+
| 25-35 | Heart ← Attacker cannot infer |
|
|
795
|
+
| 25-35 | Cold ← which one belongs to target |
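
The difference between the two tables can be checked mechanically. The sketch below assumes the simplest variant, distinct l-diversity, where l is the smallest number of different sensitive values found in any equivalence class; `LDiversitySketch` is a hypothetical name.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: distinct l-diversity of a generalized dataset.
final class LDiversitySketch {
    private LDiversitySketch() {}

    /**
     * Each row is {equivalenceClassKey, sensitiveValue}.
     * Returns the smallest count of distinct sensitive values in any class (0 if empty).
     */
    static int l(List<String[]> rows) {
        Map<String, Set<String>> valuesByClass = new HashMap<>();
        for (String[] row : rows) {
            valuesByClass.computeIfAbsent(row[0], key -> new HashSet<>()).add(row[1]);
        }
        return valuesByClass.values().stream().mapToInt(Set::size).min().orElse(0);
    }
}
```

For the first table (three rows of `25-35` with `HIV`) this returns 1, which exposes the homogeneity attack even though k = 3. For the second table it returns 3.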
|
|
857
796
|
|
|
858
797
|
| Method | Protects Against | Weakness |
|
|
859
798
|
|--------|-----------------|----------|
|
|
@@ -865,33 +804,16 @@ public class KAnonymityService {
|
|
|
865
804
|
|
|
866
805
|
### 7.1. Tokenization Architecture
|
|
867
806
|
|
|
868-877
|
-
[removed lines 868-877: content not preserved in this view]
|
|
878
|
-
│ │ Encryption - FPE) │ │
|
|
879
|
-
│ └───────────┬────────────┘ │
|
|
880
|
-
│ │ │
|
|
881
|
-
│ ┌──────┴──────┐ │
|
|
882
|
-
│ ▼ ▼ │
|
|
883
|
-
│ Token: 248-971-832145 Token Vault (mapping) │
|
|
884
|
-
│ (same format, different ┌──────────────────┐ │
|
|
885
|
-
│ value, reversible │ Token → Original │ │
|
|
886
|
-
│ with key) │ Encrypted store │ │
|
|
887
|
-
│ └──────────────────┘ │
|
|
888
|
-
│ │
|
|
889
|
-
│ Advantages: │
|
|
890
|
-
│ • Same format → existing systems work (validation, UI) │
|
|
891
|
-
│ • Reversible → authorized users can detokenize │
|
|
892
|
-
│ • Referential integrity → same SSN always → same token │
|
|
893
|
-
└─────────────────────────────────────────────────────────────┘
|
|
894
|
-
```
|
|
807
|
+
**Flow:**
|
|
808
|
+
- **Original SSN**: `079-123-456789`
|
|
809
|
+
- → **Tokenization Service** (Format-Preserving Encryption — FPE)
|
|
810
|
+
- **Token**: `248-971-832145` (same format, different value, reversible with key)
|
|
811
|
+
- **Token Vault**: Encrypted mapping Token → Original
|
|
812
|
+
|
|
813
|
+
**Advantages:**
|
|
814
|
+
- Same format → existing systems work (validation, UI)
|
|
815
|
+
- Reversible → authorized users can detokenize
|
|
816
|
+
- Referential integrity → the same SSN always maps to the same token
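
The three properties in the list (same format, reversible, deterministic) can be demonstrated with a simplified stand-in. The sketch below is not format-preserving encryption: the JDK ships no FF1/FF3-1 implementation, so it derives replacement digits from an HMAC and relies on an in-memory vault for reversal. `TokenVaultSketch` and its methods are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.HashMap;
import java.util.Map;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch: vault-based tokenization that keeps the input's layout.
// NOT format-preserving encryption; reversal depends on the vault, not on the key alone.
final class TokenVaultSketch {
    private final byte[] key;
    private final Map<String, String> vault = new HashMap<>(); // token -> original

    TokenVaultSketch(byte[] key) {
        this.key = key.clone();
    }

    /** Replaces each digit with an HMAC-derived digit; other characters are kept. */
    String tokenize(String original) {
        byte[] digest = hmac(original);
        StringBuilder token = new StringBuilder();
        int next = 0;
        for (char c : original.toCharArray()) {
            if (Character.isDigit(c)) {
                int digit = (digest[next++ % digest.length] & 0xFF) % 10;
                token.append((char) ('0' + digit));
            } else {
                token.append(c);
            }
        }
        String result = token.toString();
        String previous = vault.putIfAbsent(result, original);
        if (previous != null && !previous.equals(original)) {
            throw new IllegalStateException("token collision"); // a real service must resolve this
        }
        return result;
    }

    /** Looks the original up in the vault; returns null for unknown tokens. */
    String detokenize(String token) {
        return vault.get(token);
    }

    private byte[] hmac(String value) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            return mac.doFinal(value.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HMAC-SHA256 unavailable", e);
        }
    }
}
```

In a production design the vault would be the encrypted store named in the flow, and access to `detokenize` would be restricted to authorized callers and audited.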
|
|
895
817
|
|
|
896
818
|
### 7.2. FPE Tokenization Service
|
|
897
819
|
|
|
@@ -1280,55 +1202,23 @@ public class DeidentificationResource {
|
|
|
1280
1202
|
|
|
1281
1203
|
### 10.1. Complete Pipeline
|
|
1282
1204
|
|
|
1283-1299
|
-
[removed lines 1283-1299: content not preserved in this view]
|
|
1300
|
-
│ │ │
|
|
1301
|
-
│ ├──────────────────────────────────────────────────────┐ │
|
|
1302
|
-
│ │ Step 3: Apply Masking Rules │ │
|
|
1303
|
-
│ │ ├── Direct identifiers → Remove/Replace │ │
|
|
1304
|
-
│ │ ├── Quasi-identifiers → Generalize │ │
|
|
1305
|
-
│ │ ├── Dates → Shift by random offset │ │
|
|
1306
|
-
│ │ ├── Free text → NLP scrubbing │ │
|
|
1307
|
-
│ │ └── Verify: k-anonymity check │ │
|
|
1308
|
-
│ └──────────────────────────────────────────────┬───────┘ │
|
|
1309
|
-
│ │ │
|
|
1310
|
-
│ ├──────────────────────────────────────────────────────┐ │
|
|
1311
|
-
│ │ Step 4: Validate │ │
|
|
1312
|
-
│ │ ├── No real PHI remaining (regex scan) │ │
|
|
1313
|
-
│ │ ├── Referential integrity preserved │ │
|
|
1314
|
-
│ │ ├── Statistical properties maintained │ │
|
|
1315
|
-
│ │ └── K-anonymity verified (k ≥ 5) │ │
|
|
1316
|
-
│ └──────────────────────────────────────────────┬───────┘ │
|
|
1317
|
-
│ │ │
|
|
1318
|
-
│ ├──────────────────────────────────────────────────────┐ │
|
|
1319
|
-
│ │ Step 5: Export │ │
|
|
1320
|
-
│ │ pg_dump staging → masked dump file │ │
|
|
1321
|
-
│ │ Load into dev/test/research databases │ │
|
|
1322
|
-
│ └──────────────────────────────────────────────┬───────┘ │
|
|
1323
|
-
│ │ │
|
|
1324
|
-
│ ├──────────────────────────────────────────────────────┐ │
|
|
1325
|
-
│ │ Step 6: Cleanup │ │
|
|
1326
|
-
│ │ DROP staging database │ │
|
|
1327
|
-
│ │ Securely delete temp files │ │
|
|
1328
|
-
│ │ Audit log the entire process │ │
|
|
1329
|
-
│ └─────────────────────────────────────────────────────┘ │
|
|
1330
|
-
└─────────────────────────────────────────────────────────────┘
|
|
1331
|
-
```
|
|
1205
|
+
**Data Masking & De-identification Pipeline:**
|
|
1206
|
+
|
|
1207
|
+
1. **Extract** — `pg_dump --format=custom --compress=9` → encrypted dump file
|
|
1208
|
+
2. **Load into Staging** — `pg_restore` → staging database (isolated network, no external access)
|
|
1209
|
+
3. **Apply Masking Rules:**
|
|
1210
|
+
- Direct identifiers → Remove/Replace
|
|
1211
|
+
- Quasi-identifiers → Generalize
|
|
1212
|
+
- Dates → Shift by random offset
|
|
1213
|
+
- Free text → NLP scrubbing
|
|
1214
|
+
- Verify: k-anonymity check
|
|
1215
|
+
4. **Validate:**
|
|
1216
|
+
- No real PHI remaining (regex scan)
|
|
1217
|
+
- Referential integrity preserved
|
|
1218
|
+
- Statistical properties maintained
|
|
1219
|
+
- K-anonymity verified (k ≥ 5)
|
|
1220
|
+
5. **Export** — `pg_dump staging` → masked dump file → load into dev/test/research databases
|
|
1221
|
+
6. **Cleanup** — DROP staging database, securely delete temp files, audit log the entire process
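
The "regex scan" in step 4 might look like the sketch below. `PhiScanSketch` is hypothetical, and the pattern is an assumption based on the 3-3-6 digit layout of the example value `079-123-456789` used in the tokenization section; a real scan would carry one pattern per identifier type.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: find identifier-shaped strings left in free-text columns.
final class PhiScanSketch {
    // Assumed layout: three digits, three digits, six digits, separated by hyphens.
    private static final Pattern ID_LIKE = Pattern.compile("\\b\\d{3}-\\d{3}-\\d{6}\\b");

    private PhiScanSketch() {}

    /** Returns every substring of the text that matches the identifier layout. */
    static List<String> findSuspects(String text) {
        List<String> hits = new ArrayList<>();
        Matcher matcher = ID_LIKE.matcher(text);
        while (matcher.find()) {
            hits.add(matcher.group());
        }
        return hits;
    }
}
```

A scan like this is most useful on free text that the NLP scrubbing was meant to clean. In structured columns the masked values keep the same layout by design, so a match there does not by itself indicate leaked PHI; those columns need a comparison against the original values instead.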
|
|
1332
1222
|
|
|
1333
1223
|
### 10.2. Comparison Table
|
|
1334
1224
|
|