@xdev-asia/xdev-knowledge-mcp 1.0.39 → 1.0.40

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (20)
  1. package/content/series/architecture/xay-dung-he-thong-y-te-microservices/chapters/01-phan-1-kien-truc-nen-tang/lessons/04-bai-4-threat-modeling-stride-dread.md +41 -52
  2. package/content/series/architecture/xay-dung-he-thong-y-te-microservices/chapters/02-phan-2-iam-keycloak/lessons/01-bai-5-setup-keycloak-realm-benh-vien.md +33 -84
  3. package/content/series/architecture/xay-dung-he-thong-y-te-microservices/chapters/02-phan-2-iam-keycloak/lessons/02-bai-6-phan-quyen-rbac-abac.md +6 -23
  4. package/content/series/architecture/xay-dung-he-thong-y-te-microservices/chapters/02-phan-2-iam-keycloak/lessons/03-bai-7-smart-on-fhir-oauth2-oidc.md +25 -36
  5. package/content/series/architecture/xay-dung-he-thong-y-te-microservices/chapters/02-phan-2-iam-keycloak/lessons/04-bai-8-mfa-passkeys-emergency-access.md +7 -23
  6. package/content/series/architecture/xay-dung-he-thong-y-te-microservices/chapters/03-phan-3-data-layer-postgresql/lessons/01-bai-9-postgresql-security-hardening.md +23 -69
  7. package/content/series/architecture/xay-dung-he-thong-y-te-microservices/chapters/03-phan-3-data-layer-postgresql/lessons/02-bai-10-ma-hoa-du-lieu-postgresql.md +25 -80
  8. package/content/series/architecture/xay-dung-he-thong-y-te-microservices/chapters/03-phan-3-data-layer-postgresql/lessons/03-bai-11-row-level-security-column-encryption.md +26 -55
  9. package/content/series/architecture/xay-dung-he-thong-y-te-microservices/chapters/03-phan-3-data-layer-postgresql/lessons/04-bai-12-audit-logging-cdc-pgaudit.md +51 -87
  10. package/content/series/architecture/xay-dung-he-thong-y-te-microservices/chapters/04-phan-4-microservices-quarkus/lessons/03-bai-15-ma-hoa-end-to-end-microservices.md +18 -63
  11. package/content/series/architecture/xay-dung-he-thong-y-te-microservices/chapters/04-phan-4-microservices-quarkus/lessons/04-bai-16-mtls-service-mesh.md +26 -88
  12. package/content/series/architecture/xay-dung-he-thong-y-te-microservices/chapters/05-phan-5-compliance-audit/lessons/01-bai-17-hipaa-technical-safeguards.md +50 -61
  13. package/content/series/architecture/xay-dung-he-thong-y-te-microservices/chapters/05-phan-5-compliance-audit/lessons/02-bai-18-audit-trail-opentelemetry-elk.md +11 -34
  14. package/content/series/architecture/xay-dung-he-thong-y-te-microservices/chapters/05-phan-5-compliance-audit/lessons/03-bai-19-data-masking-anonymization.md +113 -223
  15. package/content/series/architecture/xay-dung-he-thong-y-te-microservices/chapters/05-phan-5-compliance-audit/lessons/04-bai-20-backup-disaster-recovery.md +92 -149
  16. package/content/series/architecture/xay-dung-he-thong-y-te-microservices/chapters/06-phan-6-production-van-hanh/lessons/01-bai-21-zero-trust-architecture.md +126 -271
  17. package/content/series/architecture/xay-dung-he-thong-y-te-microservices/chapters/06-phan-6-production-van-hanh/lessons/02-bai-22-container-kubernetes-security.md +10 -52
  18. package/content/series/architecture/xay-dung-he-thong-y-te-microservices/chapters/06-phan-6-production-van-hanh/lessons/03-bai-23-penetration-testing.md +51 -90
  19. package/content/series/architecture/xay-dung-he-thong-y-te-microservices/chapters/06-phan-6-production-van-hanh/lessons/04-bai-24-capstone-deploy-production.md +137 -232
  20. package/package.json +1 -1
@@ -23,60 +23,38 @@ course:
 
  ![HIPAA De-identification — Safe Harbor vs Expert Determination](/storage/uploads/2026/04/healthcare-data-deidentification.png)
 
-
  HIPAA allows health data to be used and shared **without patient consent** once it has been **de-identified**, i.e., once it can no longer be used to identify the patient. This is the foundation for medical research, population health analytics, and machine learning in healthcare.
 
  ### 1.1. HIPAA De-identification Standards — §164.514
 
- ```
- ┌─────────────────────────────────────────────────────────────┐
- │ HIPAA De-identification §164.514(a) │
- │ │
- │ PHI (Protected Health Information) │
- │ │ │
- │ ├── Method 1: Safe Harbor §164.514(b) │
- │ │ └── Remove all 18 identifiers │
- │ │ └── No actual knowledge of re-identification │
- │ │ └── Deterministic, rules-based │
- │ │ │
- │ └── Method 2: Expert Determination §164.514(b)(1) │
- │ └── Statistical/scientific expert certifies │
- │ └── Risk of re-identification is "very small" │
- │ └── Documents methods and results │
- │ │
- │ ▼ │
- │ De-identified Data │
- │ └── NOT considered PHI │
- │ └── NOT subject to HIPAA Privacy Rule │
- │ └── Can be shared freely for research │
- └─────────────────────────────────────────────────────────────┘
- ```
+ ![HIPAA De-identification — Safe Harbor vs Expert Determination flow](/storage/uploads/2026/04/healthcare-safe-harbor-flow.png)
+
+ **PHI (Protected Health Information)** can be de-identified by 2 methods under §164.514(a):
+
+ - **Method 1: Safe Harbor** §164.514(b)(2)
+   - Remove all **18 identifiers**
+   - No actual knowledge that the remaining information could identify the patient
+   - Deterministic, rules-based
+ - **Method 2: Expert Determination** §164.514(b)(1)
+   - Statistical/scientific expert certifies
+   - Risk of re-identification is **"very small"**
+   - Documents the methods and results
+
+ **→ De-identified Data**: NOT considered PHI, NOT subject to the HIPAA Privacy Rule, can be shared freely for research
 
  ### 1.2. Data Protection Spectrum
 
- ```
- ┌─────────────────────────────────────────────────────────────┐
- │ Data Protection Spectrum │
- │ │
- │ ◄─── More Privacy ────────────────── Less Privacy ──► │
- │ │
- │ Synthetic Anonymized De-identified Masked Original │
- │ Data Data Data Data PHI │
- │ │ │ │ │ │ │
- │ │ │ │ │ │ │
- │ ▼ ▼ ▼ ▼ ▼ │
- │ Fake data Cannot 18 identifiers Partial Full │
- │ generated reverse removed hiding data │
- │ from to (Safe Harbor) (SSN: visible │
- │ patterns original ***-4567) │
- │ data │
- │ │
- │ Use cases: Research Research Production Testing │
- │ Dev/Test Analytics Sharing Display │
- │ Training Population Publications Logs │
- │ Health │
- └─────────────────────────────────────────────────────────────┘
- ```
+ ![Data Protection Spectrum — from Synthetic Data to Original PHI](/storage/uploads/2026/04/healthcare-data-protection-spectrum.png)
+
+ | Level | Description | Use Cases |
+ |-------|--------|----------|
+ | **Synthetic Data** | Fake data generated from patterns | Dev/Test, Training |
+ | **Anonymized Data** | Cannot reverse to original | Research, Analytics, Population Health |
+ | **De-identified Data** | 18 identifiers removed (Safe Harbor) | Research Sharing, Publications |
+ | **Masked Data** | Partial hiding (SSN: ***-4567) | Production Display, Logs |
+ | **Original PHI** | Full data visible | Testing (restricted) |
+
+ ◄── **More Privacy** ─────────────────── **Less Privacy** ──►
 
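For the **Masked Data** row of the spectrum table above (`SSN: ***-4567`), partial masking for display and logs might look like the minimal Java sketch below. `SsnMasker` is a hypothetical name rather than code from the lesson, and it assumes the US `NNN-NN-NNNN` layout: it keeps both dashes, so it prints `***-**-6789` rather than the shorter `***-4567`, and it hides any value in an unrecognized layout completely.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: partial masking of an SSN for display and logs. */
final class SsnMasker {
    // Assumed input layout: US SSN "NNN-NN-NNNN" (matches() requires the whole string to fit).
    private static final Pattern SSN = Pattern.compile("(\\d{3})-(\\d{2})-(\\d{4})");

    private SsnMasker() {}

    /** Keeps only the last 4 digits; anything that is not a well-formed SSN is fully hidden. */
    static String mask(String ssn) {
        if (ssn == null) {
            return null;
        }
        Matcher m = SSN.matcher(ssn);
        if (!m.matches()) {
            return "*".repeat(ssn.length()); // unknown format: reveal nothing
        }
        return "***-**-" + m.group(3);
    }
}
```

Masking unknown formats completely (failing closed) avoids leaking digits from values the pattern was not designed for.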
  ## 2. HIPAA Safe Harbor Method — 18 Identifiers
 
@@ -341,30 +319,13 @@ The Expert Determination method is more flexible than Safe Harbor but requires
  3. Document the methods and the assessment results
 
  ```
- ┌─────────────────────────────────────────────────────────────┐
- │ Expert Determination Process │
- │ │
- │ Step 1: Identify quasi-identifiers │
- │ └── Combinations of fields that can re-identify │
- │ └── Example: ZIP + DOB + Gender → 87% population unique │
- │ │
- │ Step 2: Apply statistical methods │
- │ └── k-anonymity (k ≥ 5 recommended) │
- │ └── l-diversity │
- │ └── t-closeness │
- │ │
- │ Step 3: Re-identification risk assessment │
- │ └── Prosecutor risk < 0.04 (1/25) │
- │ └── Journalist risk < 0.04 │
- │ └── Marketer risk < 0.04 │
- │ │
- │ Step 4: Document and certify │
- │ └── Expert's qualifications │
- │ └── Methods used │
- │ └── Risk assessment results │
- │ └── Signed certification │
- └─────────────────────────────────────────────────────────────┘
- ```
+
+ **Expert Determination Process:**
+
+ 1. **Identify quasi-identifiers** — Combinations of fields that can re-identify a patient (e.g., ZIP + DOB + Gender → about 87% of the US population is unique)
+ 2. **Apply statistical methods:** k-anonymity (k ≥ 5 recommended), l-diversity, t-closeness
+ 3. **Re-identification risk assessment:** Prosecutor/Journalist/Marketer risk < 0.04 (1/25)
+ 4. **Document and certify** — Expert’s qualifications, methods used, risk assessment results, signed certification
 
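Step 3's prosecutor-risk threshold can be checked mechanically once records are grouped into equivalence classes (records that share the same quasi-identifier values). The sketch below assumes the common formulation in which a record's prosecutor risk is 1 divided by the size of its class. Under that formulation the maximum risk is 1 over the smallest class size, and the average risk reduces to the number of classes divided by the number of records. `ProsecutorRisk` is a hypothetical helper; journalist and marketer risk also depend on population-level data and are not covered.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: prosecutor re-identification risk from equivalence-class sizes. */
final class ProsecutorRisk {
    private ProsecutorRisk() {}

    /** Counts how many records share each quasi-identifier combination (equivalence class). */
    static Map<List<String>, Integer> classSizes(List<List<String>> quasiIds) {
        if (quasiIds.isEmpty()) {
            throw new IllegalArgumentException("dataset is empty");
        }
        Map<List<String>, Integer> sizes = new HashMap<>();
        for (List<String> qi : quasiIds) {
            sizes.merge(qi, 1, Integer::sum);
        }
        return sizes;
    }

    /** Worst case: a record in the smallest class has risk 1 / (smallest class size). */
    static double maxRisk(List<List<String>> quasiIds) {
        int smallest = classSizes(quasiIds).values().stream()
                .mapToInt(Integer::intValue)
                .min()
                .getAsInt();
        return 1.0 / smallest;
    }

    /** Average over records of 1 / (own class size), which simplifies to classes / records. */
    static double averageRisk(List<List<String>> quasiIds) {
        return (double) classSizes(quasiIds).size() / quasiIds.size();
    }
}
```

Under this formulation, a maximum prosecutor risk below 0.04 requires every equivalence class to contain more than 25 records, since a class of exactly 25 gives 1/25 = 0.04.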
  ### 3.2. Quasi-Identifier Analysis
 
@@ -544,35 +505,24 @@ $$ LANGUAGE plpgsql IMMUTABLE;
  ### 5.1. Masking Pipeline for Non-Production
 
  ```
- ┌─────────────────────────────────────────────────────────────┐
- │ Static Data Masking Pipeline │
- │ │
- │ Production DB │
- │ │ │
- │ ├── 1. pg_dump (logical backup) │
- │ │ │
- │ ▼ │
- │ Staging Area (temporary) │
- │ │ │
- │ ├── 2. Apply masking transformations │
- │ │ ├── Names → Faker-generated names │
- │ │ ├── SSN → Random SSN format │
- │ │ ├── Dates → Shifted by random offset │
- │ │ ├── Addresses → Randomized │
- │ │ └── MRN → Re-keyed │
- │ │ │
- │ ├── 3. Validate masked data │
- │ │ ├── No real PHI remaining │
- │ │ ├── Referential integrity preserved │
- │ │ └── Data distributions similar │
- │ │ │
- │ ▼ │
- │ Dev/Test DB │
- │ └── Safe to use without HIPAA constraints │
- │ │
- │ ⚠ Staging area is securely deleted after masking │
- └─────────────────────────────────────────────────────────────┘
- ```
+
+ **Static Data Masking Pipeline:**
+
+ 1. **Production DB** → `pg_dump` (logical backup)
+ 2. **Staging Area** (temporary, isolated network)
+ 3. **Apply masking transformations:**
+    - Names → Faker-generated names
+    - SSN → Random SSN format
+    - Dates → Shifted by random offset
+    - Addresses → Randomized
+    - MRN → Re-keyed
+ 4. **Validate masked data:**
+    - No real PHI remaining
+    - Referential integrity preserved
+    - Data distributions similar
+ 5. **Dev/Test DB** — Safe to use without HIPAA constraints
+
+ > ⚠️ Staging area is securely deleted after masking
 
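The "Dates → Shifted by random offset" step is commonly implemented with one offset per patient, so intervals within a patient's record (for example, length of stay) survive masking. The following is a minimal sketch under that assumption. `DateShifter` and its in-memory offset table are hypothetical, and because the offset table can reverse the shift, it belongs in the staging area that is securely deleted afterwards.

```java
import java.time.LocalDate;
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

/** Hypothetical date shifter: one random, non-zero offset per patient, reused for all of that patient's dates. */
final class DateShifter {
    private final Random random;
    private final int maxDays;
    // patientId -> offset in days; it can undo the masking, so it must never leave staging
    private final Map<String, Integer> offsets = new HashMap<>();

    DateShifter(Random random, int maxDays) {
        if (maxDays < 1) {
            throw new IllegalArgumentException("maxDays must be >= 1");
        }
        this.random = random;
        this.maxDays = maxDays;
    }

    /** Shifts a date by the patient's offset, drawing that offset on first use. */
    LocalDate shift(String patientId, LocalDate date) {
        int offset = offsets.computeIfAbsent(patientId, id -> {
            int magnitude = random.nextInt(maxDays) + 1; // 1..maxDays, never 0
            return random.nextBoolean() ? magnitude : -magnitude;
        });
        return date.plusDays(offset);
    }
}
```

In a real run the generator would be a `java.security.SecureRandom`, which extends `java.util.Random` and can be passed to the constructor unchanged. A seeded `Random` is only useful for repeatable tests.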
  ### 5.2. SQL-Based Static Masking Script
 
@@ -640,31 +590,26 @@ COMMIT;
  **Definition**: A dataset satisfies k-anonymity if every combination of quasi-identifier values appears at least **k times**, so each record is indistinguishable from at least k-1 other records.
 
  ```
- ┌─────────────────────────────────────────────────────────────┐
- │ K-Anonymity Example (k=3) │
- │ │
- │ BEFORE (k=1, not anonymous): │
- │ ┌──────────┬────┬────────┬─────────────┐ │
- │ │ Age │ ZIP│ Gender │ Diagnosis │ │
- │ ├──────────┼────┼────────┼─────────────┤ │
- │ │ 28 │ 700│ M │ Diabetes │ ← Unique! │
- │ │ 29 │ 700│ M │ Heart │ │
- │ │ 35 │ 700│ F │ Cancer │ ← Unique! │
- │ └──────────┴────┴────────┴─────────────┘ │
- │ │
- │ AFTER (k=3, generalized): │
- │ ┌──────────┬────┬────────┬─────────────┐ │
- │ │ Age Range│ ZIP│ Gender │ Diagnosis │ │
- │ ├──────────┼────┼────────┼─────────────┤ │
- │ │ 25-35 │7** │ * │ Diabetes │ ← 3 matches │
- │ │ 25-35 │7** │ * │ Heart │ ← 3 matches │
- │ │ 25-35 │7** │ * │ Cancer │ ← 3 matches │
- │ └──────────┴────┴────────┴─────────────┘ │
- │ │
- │ Techniques: Generalization (age ranges, ZIP truncation) │
- │ Suppression (remove rare values) │
- └─────────────────────────────────────────────────────────────┘
- ```
+
+ ![K-Anonymity Example — Before (k=1) vs After (k=3) Generalization](/storage/uploads/2026/04/healthcare-k-anonymity-example.png)
+
+ **BEFORE (k=1, not anonymous):**
+
+ | Age | ZIP | Gender | Diagnosis |
+ |-----|-----|--------|----------|
+ | 28 | 700 | M | Diabetes ← Unique! |
+ | 29 | 700 | M | Heart |
+ | 35 | 700 | F | Cancer ← Unique! |
+
+ **AFTER (k=3, generalized):**
+
+ | Age Range | ZIP | Gender | Diagnosis |
+ |-----------|-----|--------|----------|
+ | 25-35 | 7** | * | Diabetes ← 3 matches |
+ | 25-35 | 7** | * | Heart ← 3 matches |
+ | 25-35 | 7** | * | Cancer ← 3 matches |
+
+ **Techniques:** Generalization (age ranges, ZIP truncation), Suppression (remove rare values)
 
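The k-anonymity definition above turns directly into a check: group records by their quasi-identifier values and take the size of the smallest group. The sketch below is a hypothetical helper, not the lesson's `KAnonymityService`. Besides the check, it includes only the ZIP truncation step from the example and assumes age banding and gender suppression were applied beforehand.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical k-anonymity helpers; not the lesson's KAnonymityService. */
final class KAnonymityCheck {
    private KAnonymityCheck() {}

    /** Returns k: the size of the smallest group of records with identical quasi-identifiers. */
    static int k(List<List<String>> quasiIds) {
        if (quasiIds.isEmpty()) {
            throw new IllegalArgumentException("dataset is empty");
        }
        Map<List<String>, Integer> groupSizes = new HashMap<>();
        for (List<String> qi : quasiIds) {
            groupSizes.merge(qi, 1, Integer::sum);
        }
        return groupSizes.values().stream().mapToInt(Integer::intValue).min().getAsInt();
    }

    /** Generalizes a ZIP code by keeping the first {@code keep} characters, e.g. "700" -> "7**". */
    static String truncateZip(String zip, int keep) {
        if (keep >= zip.length()) {
            return zip;
        }
        return zip.substring(0, keep) + "*".repeat(zip.length() - keep);
    }
}
```

With the BEFORE rows `(28, 700, M)`, `(29, 700, M)`, `(35, 700, F)` as input, `k` returns 1 because every row is unique. With the three generalized `(25-35, 7**, *)` rows it returns 3, and `truncateZip("700", 1)` yields `7**`.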
  ### 6.2. K-Anonymity Implementation
 
@@ -832,28 +777,22 @@ public class KAnonymityService {
  **T-Closeness**: The distribution of the sensitive attribute within each equivalence class must be close to its distribution in the whole dataset (distance ≤ t). This prevents **skewness attacks**.
 
  ```
- ┌─────────────────────────────────────────────────────────────┐
- │ K-Anonymity vs L-Diversity │
- │ │
- │ K-Anonymity (k=3) — VULNERABLE: │
- │ ┌──────────┬─────────────┐ │
- │ │ Age 25-35 │ Diagnosis │ │
- │ ├──────────┼─────────────┤ │
- │ │ 25-35 │ HIV │ ← All have HIV! │
- │ │ 25-35 │ HIV │ ← Attacker knows diagnosis │
- │ │ 25-35 │ HIV │ ← even without knowing WHO │
- │ └──────────┴─────────────┘ │
- │ │
- │ L-Diversity (l=3) — PROTECTED: │
- │ ┌──────────┬─────────────┐ │
- │ │ Age 25-35 │ Diagnosis │ │
- │ ├──────────┼─────────────┤ │
- │ │ 25-35 │ Diabetes │ ← 3 different diagnoses │
- │ │ 25-35 │ Heart │ ← Attacker cannot infer │
- │ │ 25-35 │ Cold │ ← which one belongs to target │
- │ └──────────┴─────────────┘ │
- └─────────────────────────────────────────────────────────────┘
- ```
+
+ **K-Anonymity (k=3) — VULNERABLE (Homogeneity Attack):**
+
+ | Age Range | Diagnosis |
+ |-----------|----------|
+ | 25-35 | HIV ← All have HIV! |
+ | 25-35 | HIV ← Attacker knows diagnosis |
+ | 25-35 | HIV ← even without knowing WHO |
+
+ **L-Diversity (l=3), PROTECTED:**
+
+ | Age Range | Diagnosis |
+ |-----------|----------|
+ | 25-35 | Diabetes ← 3 different diagnoses |
+ | 25-35 | Heart ← Attacker cannot infer |
+ | 25-35 | Cold ← which one belongs to target |
 
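The homogeneity problem in the tables above can be detected by counting distinct sensitive values inside each equivalence class. This sketch implements *distinct* l-diversity, the simplest variant (entropy l-diversity and recursive (c,l)-diversity are stricter). The `LDiversityCheck` and `Row` names are hypothetical, and the `record` type needs Java 16 or later.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical distinct l-diversity check. */
final class LDiversityCheck {
    private LDiversityCheck() {}

    /** One record: its generalized quasi-identifiers plus a single sensitive attribute. */
    record Row(List<String> quasiIds, String sensitive) {}

    /** Returns l: the fewest distinct sensitive values found in any equivalence class. */
    static int l(List<Row> rows) {
        if (rows.isEmpty()) {
            throw new IllegalArgumentException("dataset is empty");
        }
        Map<List<String>, Set<String>> valuesPerClass = new HashMap<>();
        for (Row row : rows) {
            valuesPerClass.computeIfAbsent(row.quasiIds(), key -> new HashSet<>())
                          .add(row.sensitive());
        }
        return valuesPerClass.values().stream().mapToInt(Set::size).min().getAsInt();
    }
}
```

On the HIV table it returns 1, so that group is 3-anonymous yet only 1-diverse. On the Diabetes/Heart/Cold table it returns 3.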
  | Method | Protects Against | Weakness |
  |--------|-----------------|----------|
@@ -865,33 +804,16 @@ public class KAnonymityService {
 
  ### 7.1. Tokenization Architecture
 
- ```
- ┌─────────────────────────────────────────────────────────────┐
- │ Tokenization Architecture │
- │ │
- │ Original SSN: 079-123-456789 │
- │ │ │
- │ ▼ │
- │ ┌────────────────────────┐ │
- │ │ Tokenization Service │ │
- │ │ (Format-Preserving │ │
- │ │ Encryption - FPE) │ │
- │ └───────────┬────────────┘ │
- │ │ │
- │ ┌──────┴──────┐ │
- │ ▼ ▼ │
- │ Token: 248-971-832145 Token Vault (mapping) │
- │ (same format, different ┌──────────────────┐ │
- │ value, reversible │ Token → Original │ │
- │ with key) │ Encrypted store │ │
- │ └──────────────────┘ │
- │ │
- │ Advantages: │
- │ • Same format → existing systems work (validation, UI) │
- │ • Reversible → authorized users can detokenize │
- │ • Referential integrity → same SSN always → same token │
- └─────────────────────────────────────────────────────────────┘
- ```
+ **Flow:**
+ - **Original SSN**: `079-123-456789`
+ - → **Tokenization Service** (Format-Preserving Encryption — FPE)
+ - **Token**: `248-971-832145` (same format, different value, reversible with key)
+ - **Token Vault**: Encrypted mapping Token → Original
+
+ **Advantages:**
+ - Same format → existing systems work (validation, UI)
+ - Reversible → authorized users can detokenize
+ - Referential integrity → same SSN always → same token
 
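The flow above combines FPE with a token vault. A real FPE mode (for example FF1 from NIST SP 800-38G) does not fit in a short example, so the sketch below deliberately swaps in a different, simpler technique: vault-based random tokenization that preserves the input's format. Each digit is replaced with a random digit and separators are kept. The same input always returns the same token, and reversal goes through the vault instead of a key. `TokenVault` is hypothetical, and its in-memory maps stand in for the encrypted store.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Random;

/** Hypothetical vault-based tokenizer (not FPE): format-preserving random digits plus a lookup table. */
final class TokenVault {
    private final Random random;
    // In-memory stand-ins for the encrypted Token Vault store.
    private final Map<String, String> tokenByValue = new HashMap<>();
    private final Map<String, String> valueByToken = new HashMap<>();

    TokenVault(Random random) {
        this.random = random;
    }

    /** Returns the existing token for a value, or mints a new one with the same layout. */
    synchronized String tokenize(String value) {
        String existing = tokenByValue.get(value);
        if (existing != null) {
            return existing; // same input -> same token (referential integrity)
        }
        if (value.chars().noneMatch(c -> c >= '0' && c <= '9')) {
            throw new IllegalArgumentException("value contains no digits to tokenize");
        }
        String token;
        do {
            token = randomDigitsLike(value);
        } while (token.equals(value) || valueByToken.containsKey(token)); // no identity, no collisions
        tokenByValue.put(value, token);
        valueByToken.put(token, value);
        return token;
    }

    /** Authorized callers map a token back to the original value. */
    synchronized Optional<String> detokenize(String token) {
        return Optional.ofNullable(valueByToken.get(token));
    }

    /** Replaces every ASCII digit with a random digit and keeps all other characters. */
    private String randomDigitsLike(String value) {
        StringBuilder sb = new StringBuilder(value.length());
        for (char c : value.toCharArray()) {
            sb.append(c >= '0' && c <= '9' ? (char) ('0' + random.nextInt(10)) : c);
        }
        return sb.toString();
    }
}
```

Because the token space is finite, a production vault would also cap the retry loop and persist both maps in encrypted storage. With very short values (a single digit, say), the loop above would spin forever once every possible token is taken.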
  ### 7.2. FPE Tokenization Service
 
@@ -1280,55 +1202,23 @@ public class DeidentificationResource {
 
  ### 10.1. Complete Pipeline
 
- ```
- ┌─────────────────────────────────────────────────────────────┐
- │ Data Masking & De-identification Pipeline │
- │ │
- │ Source: Production Database │
- │ │ │
- │ ├──────────────────────────────────────────────────────┐ │
- │ │ Step 1: Extract │ │
- │ │ pg_dump --format=custom --compress=9 │ │
- │ │ Output: encrypted dump file │ │
- │ └──────────────────────────────────────────────┬───────┘ │
- │ │ │
- │ ├──────────────────────────────────────────────────────┐ │
- │ │ Step 2: Load into Staging │ │
- │ │ pg_restore → staging database │ │
- │ │ (isolated network, no external access) │ │
- │ └──────────────────────────────────────────────┬───────┘ │
- │ │ │
- │ ├──────────────────────────────────────────────────────┐ │
- │ │ Step 3: Apply Masking Rules │ │
- │ │ ├── Direct identifiers → Remove/Replace │ │
- │ │ ├── Quasi-identifiers → Generalize │ │
- │ │ ├── Dates → Shift by random offset │ │
- │ │ ├── Free text → NLP scrubbing │ │
- │ │ └── Verify: k-anonymity check │ │
- │ └──────────────────────────────────────────────┬───────┘ │
- │ │ │
- │ ├──────────────────────────────────────────────────────┐ │
- │ │ Step 4: Validate │ │
- │ │ ├── No real PHI remaining (regex scan) │ │
- │ │ ├── Referential integrity preserved │ │
- │ │ ├── Statistical properties maintained │ │
- │ │ └── K-anonymity verified (k ≥ 5) │ │
- │ └──────────────────────────────────────────────┬───────┘ │
- │ │ │
- │ ├──────────────────────────────────────────────────────┐ │
- │ │ Step 5: Export │ │
- │ │ pg_dump staging → masked dump file │ │
- │ │ Load into dev/test/research databases │ │
- │ └──────────────────────────────────────────────┬───────┘ │
- │ │ │
- │ ├──────────────────────────────────────────────────────┐ │
- │ │ Step 6: Cleanup │ │
- │ │ DROP staging database │ │
- │ │ Securely delete temp files │ │
- │ │ Audit log the entire process │ │
- │ └─────────────────────────────────────────────────────┘ │
- └─────────────────────────────────────────────────────────────┘
- ```
+ **Data Masking & De-identification Pipeline:**
+
+ 1. **Extract:** `pg_dump --format=custom --compress=9` → dump file, then encrypted (`pg_dump` compresses but does not encrypt)
+ 2. **Load into Staging** — `pg_restore` → staging database (isolated network, no external access)
+ 3. **Apply Masking Rules:**
+    - Direct identifiers → Remove/Replace
+    - Quasi-identifiers → Generalize
+    - Dates → Shift by random offset
+    - Free text → NLP scrubbing
+    - Verify: k-anonymity check
+ 4. **Validate:**
+    - No real PHI remaining (regex scan)
+    - Referential integrity preserved
+    - Statistical properties maintained
+    - K-anonymity verified (k ≥ 5)
+ 5. **Export:** `pg_dump staging` → masked dump file → load into dev/test/research databases
+ 6. **Cleanup** — DROP staging database, securely delete temp files, audit log the entire process
 
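The "No real PHI remaining (regex scan)" check in the Validate step can be sketched as a scanner that runs identifier-shaped patterns over exported text and reports every hit. The two rules below (US SSN and email address) and the `PhiScanner` name are illustrative assumptions. Because the masking step replaces SSNs with random values in the same format, a scan like this is aimed at free-text columns, where no identifier-shaped value should survive. It also cannot catch identifiers without a fixed pattern, such as names, which is why NLP scrubbing remains part of the masking rules.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical validation step: flags identifier-shaped strings left in masked free text. */
final class PhiScanner {
    /** One suspicious match: which rule fired and the text it matched. */
    record Finding(String rule, String match) {}

    // Illustrative rules only, checked in insertion order; real scans need site-specific patterns.
    private static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("US_SSN", Pattern.compile("\\b\\d{3}-\\d{2}-\\d{4}\\b"));
        RULES.put("EMAIL", Pattern.compile("\\b[\\w.+-]+@[\\w-]+(\\.[\\w-]+)+\\b"));
    }

    private PhiScanner() {}

    /** Returns every match of every rule; an empty list means the text passed the scan. */
    static List<Finding> scan(String text) {
        List<Finding> findings = new ArrayList<>();
        for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
            Matcher m = rule.getValue().matcher(text);
            while (m.find()) {
                findings.add(new Finding(rule.getKey(), m.group()));
            }
        }
        return findings;
    }
}
```

A non-empty result would fail the validation step before anything is exported.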
  ### 10.2. Comparison Table