npm - tech-hub-skills - Versions diffs - 1.0.0 - Mend

tech-hub-skills 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (133)

package/tech_hub_skills/roles/security-architect/skills/01-pii-detection/README.md ADDED

@@ -0,0 +1,319 @@
+# Skill 1: PII Detection & Data Privacy
+## 🎯 Overview
+Automated PII detection, masking, and GDPR compliance tools.
+## 🔗 Connections
+- **Data Engineer**: PII masking in data pipelines (de-01, de-02, de-03)
+- **AI Engineer**: PII filtering before RAG indexing (ai-02, ai-03)
+- **ML Engineer**: Remove PII before model training (ml-01, ml-02)
+- **Data Scientist**: PII detection in analysis datasets (ds-01)
+- **DevOps**: Automated PII scanning in CI/CD (do-01, do-02)
+- **FinOps**: Track compliance audit costs (fo-01)
+- **All Roles**: GDPR compliance and data protection
+## 🛠️ Tools Included
+### 1. `pii_detector.py`
+PII detection using Microsoft Presidio and custom patterns.
+### 2. `data_anonymizer.py`
+Data anonymization with multiple strategies (masking, hashing, generalization).
+### 3. `gdpr_compliance_checker.py`
+GDPR compliance validation and audit trails.
+### 4. `consent_manager.py`
+User consent tracking and right-to-erasure automation.
+### 5. `pii_audit_queries.sql`
+SQL queries for PII inventory and audit logs.
+## 📊 PII Types Detected
+- Email addresses
+- Phone numbers
+- Credit cards
+- SSN / National IDs
+- IP addresses
+- Postal addresses
+- Names
+- Dates of birth
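The README says detection combines Microsoft Presidio with custom patterns but does not show the patterns. The following is a hedged, stdlib-only sketch of what such custom patterns could look like for a few of the listed types. It is not the package's `pii_detector` API; `PATTERNS`, `Finding`, and `detect_pii` are hypothetical names, and the regexes are deliberately simple. A Luhn checksum discards digit runs that cannot be card numbers, and an octet check discards impossible IPv4 addresses.

```python
import re
from dataclasses import dataclass

# Hypothetical custom patterns: simple and far from exhaustive.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "CREDIT_CARD": re.compile(r"\b\d(?:[ -]?\d){12,18}\b"),  # 13-19 digits
    "IPV4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


@dataclass
class Finding:
    type: str
    value: str
    start: int
    end: int


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(d) for d in number if d.isdigit()]
    total = 0
    # Double every second digit from the right; subtract 9 if it exceeds 9.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def detect_pii(text: str) -> list[Finding]:
    """Return pattern matches in order of appearance, after validity checks."""
    findings = []
    for pii_type, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            value = m.group()
            # Validity checks cut false positives from the loose regexes.
            if pii_type == "CREDIT_CARD" and not luhn_valid(value):
                continue
            if pii_type == "IPV4" and any(int(p) > 255 for p in value.split(".")):
                continue
            findings.append(Finding(pii_type, value, m.start(), m.end()))
    return sorted(findings, key=lambda f: f.start)
```

Pattern matching like this is cheap but context-blind (it cannot recognize names, for example), which is why it complements a model-based analyzer rather than replacing it.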
+## 🚀 Quick Start
+```python
+import pandas as pd
+from pii_detector import PIIDetector
+from data_anonymizer import DataAnonymizer
+# Detect PII in free text
+detector = PIIDetector()
+pii_findings = detector.analyze_text(
+    "Contact John Smith at john.smith@email.com or 555-123-4567"
+)
+print(pii_findings)  # inspect the findings
+# Anonymize tabular data (customers.csv is a placeholder input)
+customer_df = pd.read_csv("customers.csv")
+anonymizer = DataAnonymizer()
+anonymized = anonymizer.mask_dataframe(
+    df=customer_df,
+    pii_columns=["email", "phone", "ssn"]
+)
+```
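`data_anonymizer.py` is described as supporting masking, hashing, and generalization. A minimal stdlib sketch of those three strategies follows; the function names and the `HASH_KEY` constant are hypothetical and not the package's API. Hashing uses a keyed HMAC rather than a bare hash.

```python
import hashlib
import hmac

# Hypothetical key: in practice load it from a secret store, never from source code.
HASH_KEY = b"replace-with-secret-from-key-vault"


def mask(value: str, visible: int = 4, char: str = "*") -> str:
    """Masking: hide all but the last `visible` characters."""
    if len(value) <= visible:
        return char * len(value)
    return char * (len(value) - visible) + value[-visible:]


def hash_value(value: str, key: bytes = HASH_KEY) -> str:
    """Hashing: keyed, deterministic pseudonym, so equal inputs still join."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def generalize_age(age: int, bucket: int = 10) -> str:
    """Generalization: replace an exact age with a range, e.g. 37 -> '30-39'."""
    low = (age // bucket) * bucket
    return f"{low}-{low + bucket - 1}"
```

The key matters for low-entropy fields such as phone numbers or SSNs. With an unkeyed hash, an attacker could hash every possible value and reverse the pseudonyms by lookup.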
+## 📚 Best Practices
+### Integration with Data Pipelines (Data Engineer)
+1. **Bronze Layer PII Scanning**
+   - Scan all raw data at ingestion
+   - Tag datasets containing PII
+   - Block high-risk PII from pipeline
+   - Maintain PII inventory
+   - Reference: Data Engineer de-01 (Lakehouse Architecture)
+2. **Silver Layer PII Masking**
+   - Apply masking transformations
+   - Implement k-anonymity for aggregations
+   - Track masked vs raw data lineage
+   - Validate masking effectiveness
+   - Reference: Data Engineer de-01, de-03
+3. **Gold Layer Compliance**
+   - Ensure no PII in analytics layers
+   - Implement row-level security
+   - Audit PII access logs
+   - Enable right-to-erasure automation
+   - Reference: Data Engineer de-01
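Item 2 calls for k-anonymity in aggregations. A dataset is k-anonymous when every combination of quasi-identifier values in it is shared by at least k records. A minimal sketch of the check, assuming records are dictionaries and that quasi-identifiers such as ZIP code and age have already been generalized (both assumptions, not part of the package):

```python
from collections import Counter
from typing import Iterable, Mapping, Sequence


def _group_counts(records: Iterable[Mapping[str, object]],
                  quasi_identifiers: Sequence[str]) -> Counter:
    # Count how many records share each combination of quasi-identifier values.
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def is_k_anonymous(records, quasi_identifiers, k: int) -> bool:
    """True if every quasi-identifier group has at least k records."""
    return all(n >= k for n in _group_counts(records, quasi_identifiers).values())


def small_groups(records, quasi_identifiers, k: int) -> dict:
    """Groups below k: candidates for further generalization or suppression."""
    counts = _group_counts(records, quasi_identifiers)
    return {key: n for key, n in counts.items() if n < k}
```

Returning the violating groups, not just a boolean, tells the Silver-layer job which rows to generalize further or suppress before publishing.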
+### AI/ML Integration
+4. **Pre-Training PII Removal**
+   - Scan training data before ML experiments
+   - Remove PII from feature engineering
+   - Anonymize datasets for model development
+   - Track data provenance for compliance
+   - Reference: ML Engineer ml-01, ml-02
+5. **RAG Knowledge Base Protection**
+   - Scan documents before embedding
+   - Prevent PII indexing in vector databases
+   - Filter PII from LLM context
+   - Audit knowledge base for compliance
+   - Reference: AI Engineer ai-02 (RAG Pipeline)
+6. **LLM Input/Output Filtering**
+   - Detect PII in user prompts
+   - Redact PII from LLM responses
+   - Log PII exposure incidents
+   - Implement real-time PII alerts
+   - Reference: AI Engineer ai-01, ai-07
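For item 6, one plausible shape for an LLM input/output filter is to replace detected PII with typed placeholders and log only counts. The sketch below is hypothetical: the regexes stand in for the real detector, and `redact` is not a package function.

```python
import logging
import re

logger = logging.getLogger("pii_filter")

# Illustrative patterns only; a production filter would call the full detector.
# SSN is checked before PHONE so a 3-2-4 digit group is labeled as an SSN.
REDACTION_PATTERNS = [
    ("EMAIL", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
    ("SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("PHONE", re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")),
]


def redact(text: str, direction: str = "input") -> tuple[str, dict[str, int]]:
    """Replace PII with <TYPE> placeholders; log counts, never the values."""
    counts: dict[str, int] = {}
    for label, pattern in REDACTION_PATTERNS:
        text, n = pattern.subn(f"<{label}>", text)
        if n:
            counts[label] = n
    if counts:
        # The incident log records what was found, not the PII itself.
        logger.warning("PII redacted from LLM %s: %s", direction, counts)
    return text, counts
```

Typical use is to call `redact(prompt, "input")` before sending a prompt to the model and `redact(response, "output")` before returning the answer. Typed placeholders keep enough context for the model to answer sensibly.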
+### Automation & CI/CD (DevOps Integration)
+7. **Automated PII Scanning**
+   - Integrate PII detection in CI/CD pipelines
+   - Block commits containing PII
+   - Scan code, configs, and test data
+   - Automate compliance reports
+   - Reference: DevOps do-01 (CI/CD), do-02 (Testing)
+8. **Continuous Compliance Monitoring**
+   - Schedule regular PII scans
+   - Alert on new PII discoveries
+   - Track remediation progress
+   - Generate audit trails
+   - Reference: DevOps do-08 (Monitoring)
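A scanner that blocks PII from being committed can be as small as a script that exits with status 1 on any finding, which fails a pre-commit hook or CI step. The following hypothetical sketch checks only two high-confidence patterns and reports locations and types, never the matched values.

```python
import re
import sys
from pathlib import Path

# Hypothetical high-confidence patterns for code, config, and test data.
BLOCKING_PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def scan_text(text: str) -> list[tuple[int, str]]:
    """Return (line_number, pii_type) for every pattern hit."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for pii_type, pattern in BLOCKING_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, pii_type))
    return hits


def scan_files(paths) -> list[tuple[str, int, str]]:
    findings = []
    for path in paths:
        try:
            text = Path(path).read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        findings.extend((str(path), lineno, t) for lineno, t in scan_text(text))
    return findings


def main(argv: list[str]) -> int:
    findings = scan_files(argv)
    for path, lineno, pii_type in findings:
        print(f"{path}:{lineno}: possible {pii_type}", file=sys.stderr)
    return 1 if findings else 0  # non-zero exit blocks the commit or build


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```

Email matching in source code catches legitimate addresses such as maintainers' contacts, so a real deployment would need an allowlist.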
+### Cost Management (FinOps Integration)
+9. **Optimize PII Scanning Costs**
+   - Use sampling for large datasets
+   - Cache PII detection results
+   - Right-size scanning compute
+   - Monitor compliance operation costs
+   - Reference: FinOps fo-01, fo-06
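"Cache PII detection results" can be read as: do not rescan content that has not changed. A minimal in-memory sketch, assuming the scanner is a function of the text alone. `ScanCache` is hypothetical, and a real pipeline would persist the cache between runs.

```python
import hashlib
from typing import Callable


class ScanCache:
    """Cache PII scan results keyed by a hash of the content.

    The detector version is part of the key, so updating patterns or models
    invalidates old results instead of silently reusing them.
    """

    def __init__(self, scan_fn: Callable[[str], list], detector_version: str):
        self._scan_fn = scan_fn
        self._version = detector_version
        self._results: dict[str, list] = {}
        self.hits = 0
        self.misses = 0

    def _key(self, text: str) -> str:
        payload = f"{self._version}\x00{text}".encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def scan(self, text: str) -> list:
        key = self._key(text)
        if key in self._results:
            self.hits += 1
        else:
            self.misses += 1
            self._results[key] = self._scan_fn(text)
        # Return a copy so callers cannot mutate the cached result.
        return list(self._results[key])
```

The `hits` and `misses` counters make the saving measurable, which feeds the "Monitor compliance operation costs" bullet.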
+### Enterprise Governance
+10. **Data Governance Framework**
+    - Classify data by sensitivity level
+    - Implement data handling policies
+    - Track PII across all systems
+    - Enable compliance reporting
+    - Reference: Security Architect sa-06 (Data Governance)
+11. **GDPR Right-to-Erasure**
+    - Automate data deletion requests
+    - Track PII deletion across systems
+    - Verify erasure completeness
+    - Maintain deletion audit logs
+    - Reference: Security Architect sa-06
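The four erasure steps (delete, track across systems, verify, log) can be sketched against in-memory stand-ins for real systems. The following is hypothetical: each "system" here is a dict mapping subject IDs to records, whereas real stores would need their own delete and lookup calls.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ErasureReport:
    """Audit record for one erasure request (contains no PII beyond the ID)."""
    subject_id: str
    deleted: dict = field(default_factory=dict)  # system name -> records deleted
    verified: bool = False
    completed_at: str = ""


def erase_subject(subject_id: str, systems: dict) -> ErasureReport:
    """Delete one data subject from every registered system, then verify.

    `systems` maps a system name to a dict of subject_id -> list of records.
    """
    report = ErasureReport(subject_id)
    for name, store in systems.items():
        removed = store.pop(subject_id, [])
        report.deleted[name] = len(removed)
    # Verification pass: the subject must now be absent everywhere.
    report.verified = all(subject_id not in store for store in systems.values())
    report.completed_at = datetime.now(timezone.utc).isoformat()
    return report
```

Each returned `ErasureReport` would be appended to the deletion audit log. Recording per-system counts shows where data actually lived, which helps keep the PII inventory accurate.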
+## 💰 Cost Optimization Examples
+### Efficient PII Scanning
+```python
+import pandas as pd
+from pii_detector import PIIDetector
+from finops_tracker import ComplianceCostTracker
+detector = PIIDetector()
+cost_tracker = ComplianceCostTracker()
+@cost_tracker.track_scan_cost
+def smart_pii_scan(df: pd.DataFrame, sample_size: int = 10000):
+    if len(df) > sample_size:
+        # Sample rows to find candidate PII columns cheaply; columns where
+        # PII is rare can be missed by the sample
+        sample_df = df.sample(n=sample_size, random_state=42)
+        pii_columns = detector.find_pii_columns(sample_df)
+        # Full scan only on the suspected PII columns
+        results = {}
+        for col in pii_columns:
+            results[col] = detector.analyze_column(df[col])
+    else:
+        # Small datasets: scan everything
+        results = detector.analyze_dataframe(df)
+    return results
+# Cost report
+report = cost_tracker.monthly_report()
+print(f"PII scanning costs: ${report.total_cost:.2f}")
+print(f"Datasets scanned: {report.datasets_scanned}")
+```
+## 🚀 Automated PII Protection Pipeline
+### CI/CD Integration
+```yaml
+# .github/workflows/pii-protection.yml
+name: PII Protection
+on:
+  push:
+    paths:
+      - 'data/**'
+      - 'pipelines/**'
+  pull_request:
+jobs:
+  pii-scan:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-python@v5
+        with:
+          python-version: '3.11'
+      - name: Install dependencies
+        # Assumes the scan scripts' dependencies are listed in requirements.txt
+        run: pip install -r requirements.txt
+      - name: Scan code for PII patterns
+        run: |
+          python scripts/scan_code_for_pii.py \
+            --fail-on-detection \
+            --exclude-patterns .gitignore
+      - name: Scan test data
+        run: |
+          python scripts/scan_test_data.py \
+            --redact-if-found \
+            --report-path reports/pii_scan.json
+      - name: Validate data pipelines
+        run: |
+          python scripts/validate_pii_masking.py \
+            --pipeline-config pipelines/config.yaml
+      - name: Generate compliance report
+        run: python scripts/generate_compliance_report.py
+      - name: Upload scan results
+        # Run even if a scan step failed, so the reports are still available
+        if: always()
+        uses: actions/upload-artifact@v4
+        with:
+          name: pii-scan-results
+          path: reports/
+```
+### Data Pipeline Integration
+```python
+from datetime import datetime, timezone
+from bronze_ingestion import BronzeLoader
+from pii_detector import PIIDetector
+from data_anonymizer import DataAnonymizer
+# log_pii_detection and send_security_alert are assumed project helpers
+# (audit logging and alerting); they are not defined in this README.
+detector = PIIDetector()
+anonymizer = DataAnonymizer()
+def secure_data_pipeline(source_data: str, output_table: str):
+    # Bronze: ingest raw data
+    bronze = BronzeLoader()
+    df = bronze.ingest(source_data)
+    # Detect PII
+    pii_findings = detector.analyze_dataframe(df)
+    if pii_findings:
+        # Log for compliance (PII types only, never the values)
+        log_pii_detection(
+            dataset=output_table,
+            pii_types=sorted({f.type for f in pii_findings}),
+            timestamp=datetime.now(timezone.utc)
+        )
+        # Silver: mask PII columns (deduplicated; several findings can share a column)
+        df_masked = anonymizer.mask_dataframe(
+            df,
+            pii_columns=sorted({f.column for f in pii_findings}),
+            strategy="hash"  # deterministic, so masked keys still join
+        )
+        # Store raw and masked copies; the raw table must be written to
+        # encrypted, access-restricted storage (the name alone does not encrypt it)
+        bronze.save(df, f"{output_table}_raw_encrypted")
+        bronze.save(df_masked, f"{output_table}_masked")
+        # Alert the security team on high-severity findings
+        if any(f.severity == "high" for f in pii_findings):
+            send_security_alert(pii_findings)
+    else:
+        bronze.save(df, output_table)
+    return pii_findings
+```
+## 📊 Enhanced Metrics
+| Metric | Target | Tool |
+|--------|--------|------|
+| **PII Detection Coverage** | 100% of datasets | Automated scanning |
+| **False Positive Rate** | <5% | Model tuning |
+| **Detection Latency** | <1 min per GB | Performance monitoring |
+| **Masking Accuracy** | >99.9% | Validation tests |
+| **Compliance Audit Pass Rate** | 100% | Audit logs |
+| **Mean Time to Remediate** | <24 hours | Incident tracking |
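Two of these targets need a precise definition before they can be measured. The sketch below makes two assumptions that the table does not state. "False Positive Rate" is taken to mean the share of detections that reviewers mark as not PII, because true negatives are ill-defined for span detection. Incidents are taken to be (detected, remediated) timestamp pairs, with `None` for incidents that are still open.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional


def false_positive_share(true_positives: int, false_positives: int) -> float:
    """Share of flagged items that were not PII: FP / (TP + FP)."""
    flagged = true_positives + false_positives
    return false_positives / flagged if flagged else 0.0


def mean_time_to_remediate(
    incidents: Iterable[tuple[datetime, Optional[datetime]]],
) -> timedelta:
    """Average of (remediated - detected) over closed incidents only."""
    durations = [done - found for found, done in incidents if done is not None]
    return sum(durations, timedelta()) / len(durations) if durations else timedelta()
```

Excluding open incidents flatters the average, so it is worth reporting the count of open incidents next to it.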
+## 🔄 Integration Workflow
+### End-to-End PII Protection
+```
+1. Data Ingestion (de-01)
+   ↓
+2. PII Detection (sa-01) → Log Finding
+   ↓
+3. Risk Assessment (High/Medium/Low)
+   ↓
+4. Masking/Encryption (sa-01)
+   ↓
+5. Quality Validation (de-03)
+   ↓
+6. Compliance Audit Log (sa-06)
+   ↓
+7. Downstream Processing (ML, Analytics)
+   ├── Model Training (ml-01) - PII-free
+   ├── RAG Indexing (ai-02) - PII-free
+   └── EDA Reports (ds-01) - Masked
+   ↓
+8. Continuous Monitoring (do-08)
+   ↓
+9. Cost Tracking (fo-01)
+```
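Step 3 (Risk Assessment) decides how strongly each dataset is protected in step 4. A hypothetical sketch of that routing follows. The severity policy and action names are illustrative assumptions to be replaced by your own data classification scheme. Unknown PII types default to high so that new PII is never under-protected.

```python
from enum import Enum


class Risk(Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


# Hypothetical policy mapping PII types to risk and risk to protection action.
SEVERITY = {
    "SSN": Risk.HIGH,
    "CREDIT_CARD": Risk.HIGH,
    "DATE_OF_BIRTH": Risk.MEDIUM,
    "PHONE": Risk.MEDIUM,
    "EMAIL": Risk.MEDIUM,
    "IP_ADDRESS": Risk.LOW,
}
ACTION = {Risk.HIGH: "encrypt_and_alert", Risk.MEDIUM: "hash", Risk.LOW: "mask"}
_ORDER = [Risk.HIGH, Risk.MEDIUM, Risk.LOW]


def assess(pii_types):
    """Return (highest risk, action) for a dataset's detected PII types."""
    levels = {SEVERITY.get(t, Risk.HIGH) for t in pii_types}
    if not levels:
        return None, "pass_through"
    risk = next(r for r in _ORDER if r in levels)
    return risk, ACTION[risk]
```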
+## 🎯 Quick Wins
+1. **Integrate PII scanning in CI/CD** - Prevent PII commits
+2. **Automate Bronze layer scanning** - Detect PII at ingestion
+3. **Implement PII masking in Silver** - Protect downstream systems
+4. **Enable LLM input filtering** - Prevent PII in prompts
+5. **Set up compliance dashboards** - Real-time PII tracking
+6. **Automate right-to-erasure** - GDPR compliance automation

package/tech_hub_skills/roles/security-architect/skills/02-threat-modeling/README.md ADDED

@@ -0,0 +1,264 @@
+# Skill 02: Threat Modeling & Risk Assessment
+## 🎯 Overview
+STRIDE threat model generator, attack surface analyzer, and risk scoring.
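The core idea behind a STRIDE generator can be sketched with the widely used STRIDE-per-element mapping popularized by Microsoft's SDL. That mapping checks each data-flow-diagram element only against the threat categories that typically apply to its kind. `Element` and `generate_threats` are hypothetical names, not this package's API.

```python
from dataclasses import dataclass

STRIDE = {
    "S": "Spoofing",
    "T": "Tampering",
    "R": "Repudiation",
    "I": "Information disclosure",
    "D": "Denial of service",
    "E": "Elevation of privilege",
}

# STRIDE-per-element applicability for data flow diagram elements.
# Repudiation on a data store matters mainly when the store holds audit logs.
APPLICABLE = {
    "external_entity": "SR",
    "process": "STRIDE",
    "data_store": "TRID",
    "data_flow": "TID",
}


@dataclass
class Element:
    name: str
    kind: str  # one of the APPLICABLE keys


def generate_threats(elements):
    """List candidate (element, threat) pairs: a checklist to review, not a verdict."""
    return [
        (e.name, STRIDE[letter])
        for e in elements
        for letter in APPLICABLE[e.kind]
    ]
```

The output is a starting checklist. Each candidate still needs a human decision on whether it applies and how it is mitigated.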
+## 🔗 Connections
+- **Data Engineer**: Data foundation and pipelines (de-01, de-02, de-03)
+- **Security Architect**: Compliance, PII detection, access control (sa-01, sa-02)
+- **ML Engineer**: Model lifecycle and serving (ml-01, ml-04)
+- **AI Engineer**: LLM integration and automation (ai-01, ai-02, ai-07)
+- **MLOps**: Experiment tracking and monitoring (mo-01, mo-03, mo-06)
+- **FinOps**: Cost optimization and tracking (fo-01, fo-07)
+- **DevOps**: CI/CD, containerization, monitoring (do-01, do-03, do-08)
+- **System Design**: Architecture patterns (sd-01)
+- **Dependencies**: sd-01
+## 🛠️ Tools Included
+### 1. Primary Implementation Script
+Core implementation: STRIDE threat model generation, attack surface analysis, and risk scoring.
+### 2. Configuration Manager
+Manage configuration and settings for threat modeling & risk assessment.
+### 3. Integration Connector
+Connect with other Tech Hub skills and external services.
+### 4. Monitoring & Metrics
+Track performance, costs, and quality metrics.
+### 5. Automation Scripts
+Automate common workflows and tasks.
+## 📊 Key Metrics
+- Implementation quality score
+- Performance benchmarks
+- Cost efficiency
+- Security compliance rate
+- Integration test coverage
+## 🚀 Quick Start
+```python
+# Example implementation for Threat Modeling & Risk Assessment
+# (hypothetical module path: Python identifiers cannot start with a digit,
+# so a module named "02_threat_modeling" cannot be imported with a plain import)
+from security_architect.threat_modeling import ThreatModelingService
+# Initialize
+service = ThreatModelingService()
+# Execute
+result = service.execute(
+    config={
+        "environment": "production",
+        "enable_monitoring": True
+    }
+)
+print(f"Status: {result.status}")
+print(f"Metrics: {result.metrics}")
+```
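The overview lists risk scoring but does not define a scale. One common approach multiplies likelihood by impact, each rated 1 to 5, giving a score from 1 to 25. The sketch below assumes that scale. The band thresholds are hypothetical, and `Threat` and `prioritize` are not package names.

```python
from dataclasses import dataclass


@dataclass
class Threat:
    title: str
    likelihood: int  # 1 (rare) .. 5 (almost certain)
    impact: int      # 1 (negligible) .. 5 (severe)

    def __post_init__(self):
        for name in ("likelihood", "impact"):
            value = getattr(self, name)
            if not 1 <= value <= 5:
                raise ValueError(f"{name} must be 1-5, got {value}")

    @property
    def score(self) -> int:
        return self.likelihood * self.impact

    @property
    def band(self) -> str:
        # Hypothetical thresholds on the 1-25 scale.
        if self.score >= 15:
            return "High"
        if self.score >= 8:
            return "Medium"
        return "Low"


def prioritize(threats):
    """Highest-scoring threats first."""
    return sorted(threats, key=lambda t: t.score, reverse=True)
```

Validating the inputs in `__post_init__` prevents an out-of-range rating from silently inflating a score.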
+## 📚 Best Practices
+### Cost Optimization (FinOps Integration)
+1. **Monitor Resource Costs**
+   - Track costs per execution
+   - Set budget alerts
+   - Optimize resource utilization
+   - Reference: FinOps fo-01 (Cost Monitoring)
+2. **Right-size Resources**
+   - Use appropriate compute sizes
+   - Implement auto-scaling
+   - Leverage spot/reserved instances where applicable
+   - Reference: FinOps fo-06, fo-07
+### Security & Privacy (Security Architect Integration)
+3. **Implement Access Control**
+   - Use least privilege principle
+   - Enable Microsoft Entra ID (formerly Azure AD) authentication
+   - Audit access logs
+   - Reference: Security Architect sa-02 (IAM), sa-04
+4. **Data Protection**
+   - Encrypt data at rest and in transit
+   - Scan for PII before processing
+   - Implement data retention policies
+   - Reference: Security Architect sa-01 (PII Detection)
+### Quality & Governance (Data Engineer Integration)
+5. **Ensure Data Quality**
+   - Validate inputs and outputs
+   - Implement quality gates
+   - Monitor data freshness
+   - Reference: Data Engineer de-03 (Data Quality)
+### Lifecycle Management (MLOps Integration)
+6. **Version Control**
+   - Version all configurations
+   - Track changes over time
+   - Enable rollback capability
+   - Reference: MLOps mo-03 (Versioning)
+7. **Continuous Monitoring**
+   - Track performance metrics
+   - Set up alerting
+   - Monitor for drift
+   - Reference: MLOps mo-06 (Monitoring)
+### Deployment & Operations (DevOps Integration)
+8. **Automate Deployment**
+   - Implement CI/CD pipelines
+   - Use infrastructure as code
+   - Enable blue-green deployments
+   - Reference: DevOps do-01 (CI/CD), do-03 (IaC)
+9. **Observability**
+   - Implement distributed tracing
+   - Set up dashboards
+   - Enable logging and metrics
+   - Reference: DevOps do-08 (Monitoring)
+### Azure-Specific Best Practices
+10. **Leverage Azure Services**
+    - Use managed services where possible
+    - Implement Azure Policy for governance
+    - Enable Azure Monitor integration
+    - Use managed identities for authentication
+## 💰 Cost Optimization Examples
+### Cost Tracking
+```python
+from finops_tracker import CostTracker
+tracker = CostTracker()
+@tracker.track_costs
+def run_operation(params):
+    # Your operation here
+    result = execute_operation(params)
+    return result
+# Monthly report
+report = tracker.monthly_report()
+print(f"Total cost: ${report.total_cost:.2f}")
+print(f"Cost per operation: ${report.avg_cost:.4f}")
+```
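`finops_tracker.CostTracker` is not defined in this README. The following hypothetical, stdlib-only sketch shows how such a decorator could attribute cost to each call. It assumes cost is proportional to wall-clock time at a flat hourly rate. Real cloud billing also depends on SKU, storage, and egress, so treat the numbers as estimates.

```python
import functools
import time


class SimpleCostTracker:
    """Estimate per-call cost from elapsed time and an assumed hourly rate."""

    def __init__(self, hourly_rate_usd: float):
        self.hourly_rate_usd = hourly_rate_usd  # assumption: flat compute price
        self.costs: list[float] = []

    def track_costs(self, func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Record cost even when the call raises.
                hours = (time.perf_counter() - start) / 3600
                self.costs.append(hours * self.hourly_rate_usd)
        return wrapper

    @property
    def total_cost(self) -> float:
        return sum(self.costs)

    @property
    def avg_cost(self) -> float:
        return self.total_cost / len(self.costs) if self.costs else 0.0
```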
+## 🔒 Security Best Practices Examples
+### Access Control Implementation
+```python
+from azure.identity import DefaultAzureCredential
+from security_manager import AccessControl
+credential = DefaultAzureCredential()
+access_control = AccessControl(credential)
+# Validate access before operation
+@access_control.require_role("operator")
+def sensitive_operation(data):
+    # Operation logic
+    return process_data(data)
+```
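`security_manager.AccessControl` is likewise not shown. A role-check decorator can be sketched independently of any identity provider. The following is hypothetical: `get_roles` stands for whatever returns the current principal's roles, for example the app roles carried in a validated access token.

```python
import functools


class AccessDenied(PermissionError):
    pass


def require_role(role: str, get_roles):
    """Run the wrapped function only if `role` is among the caller's roles."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Fail closed: any caller without the role is rejected.
            if role not in get_roles():
                raise AccessDenied(f"{func.__name__} requires role '{role}'")
            return func(*args, **kwargs)
        return wrapper
    return decorator
```

Looking the roles up on every call, rather than once at decoration time, means that revoking a role takes effect immediately.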
+## 📊 Enhanced Metrics & Monitoring
+| Metric Category | Metric | Target | Tool |
+|-----------------|--------|--------|------|
+| **Performance** | Execution time (p95) | <5s | Azure Monitor |
+| | Success rate | >99% | Custom metrics |
+| **Cost** | Cost per operation | <$0.05 | FinOps dashboard |
+| | Resource utilization | >75% | Azure Monitor |
+| **Quality** | Error rate | <1% | App Insights |
+| | Data quality score | >95% | Quality tracker |
+| **Security** | Access violations | 0 | Security logs |
+| | Compliance score | 100% | Audit system |
+## 🚀 Deployment Pipeline
+### CI/CD Example
+```yaml
+# .github/workflows/deploy-02-threat-modeling.yml
+name: Deploy Threat Modeling & Risk Assessment
+on:
+  push:
+    paths:
+      - 'security-architect/skills/02-threat-modeling/**'
+    branches:
+      - main
+jobs:
+  test:
+    runs-on: ubuntu-latest
+    steps:
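+      - uses: actions/checkout@v4
+      - uses: actions/setup-python@v5
+        with:
+          python-version: '3.11'
+      - name: Install dependencies
+        # Assumes project dependencies are listed in requirements.txt
+        run: pip install -r requirements.txt pytest
+      - name: Run tests
+        run: pytest tests/ -v
+      - name: Security scan
+        run: python scripts/security_scan.py
+      - name: Cost validation
+        run: python scripts/validate_costs.py
+  deploy:
+    needs: test
+    runs-on: ubuntu-latest
+    steps:
+      # Each job runs on a fresh runner, so deploy needs its own checkout
+      # to find infra/main.bicep and scripts/monitor_health.py
+      - uses: actions/checkout@v4
+      - name: Azure login
+        uses: azure/login@v2
+        with:
+          creds: ${{ secrets.AZURE_CREDENTIALS }}  # secret name is an assumption
+      - name: Deploy to Azure
+        run: |
+          az deployment group create \
+            --resource-group rg-security-architect \
+            --template-file infra/main.bicep
+      - name: Monitor deployment
+        run: python scripts/monitor_health.py --duration 10m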
+```
+## 🔄 Integration Workflow
+### End-to-End Process
+```
+1. Input Validation
+   ↓
+2. Security Checks (sa-01, sa-02)
+   ↓
+3. Main Processing
+   ↓
+4. Quality Validation (de-03)
+   ↓
+5. Cost Tracking (fo-01)
+   ↓
+6. Monitoring & Logging (do-08)
+   ↓
+7. Output Delivery
+```
+## 🎯 Quick Wins
+1. **Enable cost tracking** - Monitor spending from day one
+2. **Implement security scanning** - Catch vulnerabilities early
+3. **Set up monitoring** - Full visibility into operations
+4. **Automate deployment** - Faster, safer releases
+5. **Add quality gates** - Prevent bad data from propagating
+6. **Enable caching** - Reduce redundant operations
+7. **Implement retries** - Improve reliability
+8. **Set up alerting** - Know about issues immediately
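For "Implement retries", a common pattern is exponential backoff with full jitter, which spreads retries out so that clients do not hammer a recovering service in lockstep. The following is a hypothetical sketch, not a package utility. `sleep` is injectable so that tests do not actually wait.

```python
import functools
import random
import time


def retry(max_attempts=3, base_delay=0.5, max_delay=8.0,
          exceptions=(Exception,), sleep=time.sleep):
    """Retry on `exceptions` with capped exponential backoff and full jitter."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == max_attempts:
                        raise  # out of attempts: surface the last error
                    # Full jitter: wait a random time up to the capped delay.
                    delay = min(max_delay, base_delay * 2 ** (attempt - 1))
                    sleep(random.uniform(0, delay))
        return wrapper
    return decorator
```

Narrow `exceptions` to transient errors such as timeouts or throttling, and retry only idempotent operations. Retrying an authorization failure or a non-idempotent write can make things worse.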
+## 🔗 Related Skills
+- sd-01
+---
+**Skill ID**: `02-threat-modeling`
+**Complexity**: Medium
+**Dependencies**: sd-01
+**Business Value**: High
+**Estimated Implementation Time**: 4-8 hours