@xdev-asia/xdev-knowledge-mcp 1.0.43 → 1.0.45
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/content/pages/xoa-du-lieu-nguoi-dung.md +68 -0
- package/content/series/luyen-thi/luyen-thi-aws-ml-specialty/chapters/01-phan-1-data-engineering/lessons/01-bai-1-data-repositories-ingestion.md +5 -0
- package/content/series/luyen-thi/luyen-thi-aws-ml-specialty/chapters/01-phan-1-data-engineering/lessons/02-bai-2-data-transformation.md +5 -0
- package/content/series/luyen-thi/luyen-thi-aws-ml-specialty/chapters/01-phan-1-data-engineering/lessons/03-bai-3-data-analysis.md +159 -0
- package/content/series/luyen-thi/luyen-thi-aws-ml-specialty/chapters/02-phan-2-modeling/lessons/04-bai-4-sagemaker-built-in-algorithms.md +186 -0
- package/content/series/luyen-thi/luyen-thi-aws-ml-specialty/chapters/02-phan-2-modeling/lessons/05-bai-5-training-hyperparameter-tuning.md +159 -0
- package/content/series/luyen-thi/luyen-thi-aws-ml-specialty/chapters/02-phan-2-modeling/lessons/06-bai-6-model-evaluation.md +169 -0
- package/content/series/luyen-thi/luyen-thi-aws-ml-specialty/chapters/03-phan-3-implementation-operations/lessons/07-bai-7-model-deployment.md +193 -0
- package/content/series/luyen-thi/luyen-thi-aws-ml-specialty/chapters/03-phan-3-implementation-operations/lessons/08-bai-8-model-monitoring-mlops.md +184 -0
- package/content/series/luyen-thi/luyen-thi-aws-ml-specialty/chapters/03-phan-3-implementation-operations/lessons/09-bai-9-security-cost.md +166 -0
- package/content/series/luyen-thi/luyen-thi-aws-ml-specialty/chapters/04-phan-4-on-tap/lessons/10-bai-10-bai-toan-thuong-gap.md +181 -0
- package/content/series/luyen-thi/luyen-thi-aws-ml-specialty/chapters/04-phan-4-on-tap/lessons/11-bai-11-cheat-sheet.md +110 -0
- package/content/series/luyen-thi/luyen-thi-aws-ml-specialty/chapters/04-phan-4-on-tap/lessons/12-bai-12-chien-luoc-thi.md +113 -0
- package/content/series/luyen-thi/luyen-thi-aws-ml-specialty/index.md +1 -1
- package/content/series/luyen-thi/luyen-thi-cka/chapters/01-cluster-architecture/lessons/01-kien-truc-cka-kubeadm.md +133 -0
- package/content/series/luyen-thi/luyen-thi-cka/chapters/01-cluster-architecture/lessons/02-cluster-upgrade-kubeadm.md +147 -0
- package/content/series/luyen-thi/luyen-thi-cka/chapters/01-cluster-architecture/lessons/03-rbac-cka.md +152 -0
- package/content/series/luyen-thi/luyen-thi-cka/chapters/02-workloads-scheduling/lessons/04-deployments-daemonsets-statefulsets.md +186 -0
- package/content/series/luyen-thi/luyen-thi-cka/chapters/02-workloads-scheduling/lessons/05-scheduling-taints-affinity.md +163 -0
- package/content/series/luyen-thi/luyen-thi-cka/chapters/03-services-networking/lessons/06-services-endpoints-coredns.md +145 -0
- package/content/series/luyen-thi/luyen-thi-cka/chapters/03-services-networking/lessons/07-ingress-networkpolicies-cni.md +172 -0
- package/content/series/luyen-thi/luyen-thi-cka/chapters/04-storage/lessons/08-persistent-volumes-storageclass.md +159 -0
- package/content/series/luyen-thi/luyen-thi-cka/chapters/05-troubleshooting/lessons/09-etcd-backup-restore.md +149 -0
- package/content/series/luyen-thi/luyen-thi-cka/chapters/05-troubleshooting/lessons/10-troubleshooting-nodes.md +153 -0
- package/content/series/luyen-thi/luyen-thi-cka/chapters/05-troubleshooting/lessons/11-troubleshooting-workloads.md +146 -0
- package/content/series/luyen-thi/luyen-thi-cka/chapters/05-troubleshooting/lessons/12-troubleshooting-networking-exam.md +170 -0
- package/content/series/luyen-thi/luyen-thi-cka/index.md +217 -0
- package/content/series/luyen-thi/luyen-thi-ckad/chapters/01-app-design-build/lessons/01-multi-container-pods.md +146 -0
- package/content/series/luyen-thi/luyen-thi-ckad/chapters/01-app-design-build/lessons/02-jobs-cronjobs-resources.md +174 -0
- package/content/series/luyen-thi/luyen-thi-ckad/chapters/02-app-deployment/lessons/03-rolling-updates-rollbacks.md +148 -0
- package/content/series/luyen-thi/luyen-thi-ckad/chapters/02-app-deployment/lessons/04-helm-kustomize.md +181 -0
- package/content/series/luyen-thi/luyen-thi-ckad/chapters/03-app-observability/lessons/05-probes-logging-debugging.md +183 -0
- package/content/series/luyen-thi/luyen-thi-ckad/chapters/04-app-environment-config/lessons/06-configmaps-secrets.md +182 -0
- package/content/series/luyen-thi/luyen-thi-ckad/chapters/04-app-environment-config/lessons/07-securitycontext-pod-security.md +168 -0
- package/content/series/luyen-thi/luyen-thi-ckad/chapters/04-app-environment-config/lessons/08-resources-qos.md +168 -0
- package/content/series/luyen-thi/luyen-thi-ckad/chapters/05-services-networking/lessons/09-services-ingress.md +182 -0
- package/content/series/luyen-thi/luyen-thi-ckad/chapters/05-services-networking/lessons/10-networkpolicies-exam-strategy.md +236 -0
- package/content/series/luyen-thi/luyen-thi-ckad/index.md +199 -0
- package/content/series/luyen-thi/luyen-thi-gcp-ml-engineer/chapters/01-phan-1-problem-framing/lessons/01-bai-1-framing-ml-problems.md +136 -0
- package/content/series/luyen-thi/luyen-thi-gcp-ml-engineer/chapters/01-phan-1-problem-framing/lessons/02-bai-2-gcp-ai-ml-ecosystem.md +160 -0
- package/content/series/luyen-thi/luyen-thi-gcp-ml-engineer/chapters/02-phan-2-data-engineering/lessons/03-bai-3-data-pipeline.md +174 -0
- package/content/series/luyen-thi/luyen-thi-gcp-ml-engineer/chapters/02-phan-2-data-engineering/lessons/04-bai-4-feature-engineering.md +156 -0
- package/content/series/luyen-thi/luyen-thi-gcp-ml-engineer/chapters/03-phan-3-model-development/lessons/05-bai-5-vertex-ai-training.md +155 -0
- package/content/series/luyen-thi/luyen-thi-gcp-ml-engineer/chapters/03-phan-3-model-development/lessons/06-bai-6-bigquery-ml-tensorflow.md +141 -0
- package/content/series/luyen-thi/luyen-thi-gcp-ml-engineer/chapters/04-phan-4-deployment-mlops/lessons/07-bai-7-model-deployment.md +134 -0
- package/content/series/luyen-thi/luyen-thi-gcp-ml-engineer/chapters/04-phan-4-deployment-mlops/lessons/08-bai-8-vertex-ai-pipelines-mlops.md +149 -0
- package/content/series/luyen-thi/luyen-thi-gcp-ml-engineer/chapters/05-phan-5-responsible-ai/lessons/09-bai-9-responsible-ai.md +128 -0
- package/content/series/luyen-thi/luyen-thi-gcp-ml-engineer/chapters/05-phan-5-responsible-ai/lessons/10-bai-10-cheat-sheet-chien-luoc-thi.md +108 -0
- package/content/series/luyen-thi/luyen-thi-gcp-ml-engineer/index.md +1 -1
- package/content/series/luyen-thi/luyen-thi-kcna/chapters/01-kubernetes-fundamentals/lessons/01-kien-truc-kubernetes.md +137 -0
- package/content/series/luyen-thi/luyen-thi-kcna/chapters/01-kubernetes-fundamentals/lessons/02-pods-workloads-controllers.md +142 -0
- package/content/series/luyen-thi/luyen-thi-kcna/chapters/01-kubernetes-fundamentals/lessons/03-services-networking-storage.md +155 -0
- package/content/series/luyen-thi/luyen-thi-kcna/chapters/01-kubernetes-fundamentals/lessons/04-rbac-security.md +137 -0
- package/content/series/luyen-thi/luyen-thi-kcna/chapters/02-container-orchestration/lessons/05-container-runtimes-oci.md +137 -0
- package/content/series/luyen-thi/luyen-thi-kcna/chapters/02-container-orchestration/lessons/06-orchestration-patterns.md +147 -0
- package/content/series/luyen-thi/luyen-thi-kcna/chapters/03-cloud-native-architecture/lessons/07-cloud-native-architecture.md +143 -0
- package/content/series/luyen-thi/luyen-thi-kcna/chapters/04-observability-delivery/lessons/08-observability.md +143 -0
- package/content/series/luyen-thi/luyen-thi-kcna/chapters/04-observability-delivery/lessons/09-helm-gitops-cicd.md +162 -0
- package/content/series/luyen-thi/luyen-thi-kcna/index.md +168 -0
- package/data/quizzes.json +1059 -0
- package/package.json +1 -1
package/content/series/luyen-thi/luyen-thi-gcp-ml-engineer/chapters/03-phan-3-model-development/lessons/05-bai-5-vertex-ai-training.md

@@ -0,0 +1,155 @@
---
id: 019c9619-lt03-l05
title: 'Bài 5: Vertex AI Training — Custom & AutoML'
slug: bai-5-vertex-ai-training
description: >-
  Custom Training Jobs: pre-built containers, custom containers.
  Distributed training trên GPU/TPU. AutoML: Tabular, Image, Text, Video.
  Training pipeline setup. Hyperparameter tuning service.
duration_minutes: 60
is_free: true
video_url: null
sort_order: 5
section_title: "Phần 3: Model Development trên Vertex AI"
course:
  id: 019c9619-lt03-7003-c003-lt0300000003
  title: 'Luyện thi Google Cloud Professional Machine Learning Engineer'
  slug: luyen-thi-gcp-ml-engineer
---

<div style="text-align: center; margin: 2rem 0;">
  <img src="/storage/uploads/2026/04/gcp-mle-bai5-vertex-training.png" alt="Vertex AI Custom Training" style="max-width: 800px; width: 100%; border-radius: 12px;" />
  <p><em>Vertex AI Custom Training: training jobs, AutoML, distributed training, and optimization</em></p>
</div>

<h2 id="custom-training"><strong>1. Vertex AI Custom Training</strong></h2>

<p>Custom Training lets you run your own training code on Google Cloud infrastructure. There are two ways to package that code:</p>

<table>
<thead><tr><th>Method</th><th>Description</th><th>When to Use</th></tr></thead>
<tbody>
<tr><td><strong>Pre-built containers</strong></td><td>GCP-provided containers: TF, PyTorch, Scikit-learn, XGBoost</td><td>Standard ML frameworks, fast setup</td></tr>
<tr><td><strong>Custom containers</strong></td><td>Build your own Docker image</td><td>Custom dependencies, special environments</td></tr>
</tbody>
</table>

<pre><code class="language-text">Custom Training Job Structure:

training_package/          (Python package or Docker image)
│
├── trainer/
│   ├── __init__.py
│   ├── task.py      ← entry point (main training script)
│   └── model.py     ← model definition
│
└── setup.py

Arguments passed via:
  TRAINING_DATA_URI:   gs://bucket/data/
  TRAINING_OUTPUT_URI: gs://bucket/model/
  Hyperparameters:     --learning-rate=0.001
</code></pre>
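<p>As a rough illustration of how such a package is submitted, the sketch below uses the Vertex AI Python SDK with a pre-built TensorFlow container. The project ID, bucket, and container tag are placeholder assumptions, not values from this lesson.</p>

<pre><code class="language-python"># Minimal sketch: submit the trainer/ package above as a Custom Training Job.
from google.cloud import aiplatform

aiplatform.init(
    project="my-project",            # placeholder project ID
    location="us-central1",
    staging_bucket="gs://bucket",    # staging bucket for the packaged code
)

job = aiplatform.CustomTrainingJob(
    display_name="tf-custom-train",
    script_path="trainer/task.py",   # entry point from the structure above
    container_uri="us-docker.pkg.dev/vertex-ai/training/tf-cpu.2-11:latest",  # illustrative tag
    requirements=["pandas"],
)

job.run(
    args=["--learning-rate=0.001"],  # hyperparameters passed as CLI args
    replica_count=1,
    machine_type="n1-standard-4",
)
</code></pre>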
<h2 id="compute-options"><strong>2. Compute Options</strong></h2>

<table>
<thead><tr><th>Hardware</th><th>Best For</th><th>Notes</th></tr></thead>
<tbody>
<tr><td><strong>CPU</strong></td><td>Scikit-learn, small tabular</td><td>Cheapest, no GPU parallelism</td></tr>
<tr><td><strong>GPU (T4, A100, V100)</strong></td><td>Deep learning, NLP, CV</td><td>10-100x faster than CPU for DL</td></tr>
<tr><td><strong>TPU v3, v4</strong></td><td>TensorFlow large-scale training</td><td>Google-specific; very fast for TF/JAX</td></tr>
</tbody>
</table>

<blockquote>
<p><strong>Exam tip:</strong> TPU is Google-specific hardware optimized for TensorFlow and JAX. GPUs work with all frameworks. TPUs are most cost-effective for very large TF models; GPUs are more versatile. Exam may ask "most cost-effective for TensorFlow large-scale" → TPU.</p>
</blockquote>

<h2 id="distributed-training"><strong>3. Distributed Training on Vertex AI</strong></h2>

<table>
<thead><tr><th>Strategy</th><th>Description</th><th>Use Case</th></tr></thead>
<tbody>
<tr><td><strong>Data Parallelism</strong></td><td>Split data across workers, same model</td><td>Most DL training scenarios</td></tr>
<tr><td><strong>Model Parallelism</strong></td><td>Split model layers across workers</td><td>Model too large for one GPU</td></tr>
<tr><td><strong>MirroredStrategy (TF)</strong></td><td>Multi-GPU, single machine</td><td>Single node, multiple GPUs</td></tr>
<tr><td><strong>MultiWorkerMirroredStrategy</strong></td><td>Multi-GPU, multi-machine</td><td>Cluster training</td></tr>
<tr><td><strong>ParameterServerStrategy</strong></td><td>Async updates via parameter server</td><td>Very large models (legacy)</td></tr>
</tbody>
</table>
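<p>To make the strategy rows concrete, here is a minimal TensorFlow sketch (not from the exam guide): switching between single-machine and multi-machine synchronous training is mostly a one-line change, with the model built inside the strategy scope. The toy model and dataset names are placeholders.</p>

<pre><code class="language-python"># Sketch: synchronous data-parallel training with tf.distribute strategies.
import tensorflow as tf

# Single machine, multiple GPUs:
#   strategy = tf.distribute.MirroredStrategy()
# Multiple machines, multiple GPUs each (cluster layout read from TF_CONFIG):
strategy = tf.distribute.MultiWorkerMirroredStrategy()

with strategy.scope():
    # Placeholder Keras model; variables created here are mirrored across workers.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")

# model.fit(train_dataset, epochs=10)  # gradients are all-reduced across replicas
</code></pre>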
<h2 id="automl"><strong>4. Vertex AI AutoML</strong></h2>

<table>
<thead><tr><th>AutoML Type</th><th>Input Data</th><th>Supported Tasks</th></tr></thead>
<tbody>
<tr><td><strong>AutoML Tabular</strong></td><td>CSV, BigQuery table</td><td>Classification, Regression, Forecasting</td></tr>
<tr><td><strong>AutoML Image</strong></td><td>JPEG, PNG, BMP</td><td>Classification (single/multi), Object Detection, Segmentation</td></tr>
<tr><td><strong>AutoML Text</strong></td><td>Text documents</td><td>Classification, Entity Extraction, Sentiment</td></tr>
<tr><td><strong>AutoML Video</strong></td><td>MP4, AVI, MOV</td><td>Classification, Object Detection, Action Recognition</td></tr>
</tbody>
</table>

<h2 id="hyperparameter-tuning"><strong>5. Vertex AI Hyperparameter Tuning</strong></h2>

<p>Vertex AI Hyperparameter Tuning automatically searches for the best hyperparameter combinations.</p>

<table>
<thead><tr><th>Search Algorithm</th><th>Description</th></tr></thead>
<tbody>
<tr><td><strong>Grid Search</strong></td><td>Exhaustive, expensive; small search space</td></tr>
<tr><td><strong>Random Search</strong></td><td>Random sampling; often better than grid</td></tr>
<tr><td><strong>Bayesian Optimization</strong></td><td>Smart search using Gaussian Process; most efficient</td></tr>
</tbody>
</table>

<pre><code class="language-text">HPT Job Setup:

hyperparameters:
  - parameter_id: learning_rate
    type: DOUBLE
    min_value: 0.0001
    max_value: 0.1
    scale: LOG            ← log scale for LR

  - parameter_id: batch_size
    type: INTEGER
    values: [32, 64, 128, 256]

metric:
  metric_id: val_accuracy
  goal: MAXIMIZE

max_trial_count: 50
parallel_trial_count: 5
</code></pre>
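<p>The same spec can be expressed with the Vertex AI Python SDK. The sketch below is an assumption-laden illustration: the trainer image, machine type, and project are placeholders, and the parameter-spec classes are shown as commonly used in the SDK, not taken from this lesson.</p>

<pre><code class="language-python"># Sketch: the HPT spec above, written against the Vertex AI SDK.
from google.cloud import aiplatform
from google.cloud.aiplatform import hyperparameter_tuning as hpt

trial_job = aiplatform.CustomJob(
    display_name="hpt-trial",
    worker_pool_specs=[{
        "machine_spec": {"machine_type": "n1-standard-8"},
        "replica_count": 1,
        "container_spec": {"image_uri": "gcr.io/my-project/trainer:latest"},  # placeholder
    }],
)

hpt_job = aiplatform.HyperparameterTuningJob(
    display_name="hpt-job",
    custom_job=trial_job,
    metric_spec={"val_accuracy": "maximize"},
    parameter_spec={
        "learning_rate": hpt.DoubleParameterSpec(min=0.0001, max=0.1, scale="log"),
        "batch_size": hpt.DiscreteParameterSpec(values=[32, 64, 128, 256], scale="linear"),
    },
    max_trial_count=50,
    parallel_trial_count=5,
)
# hpt_job.run()
</code></pre>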
<h2 id="practice"><strong>6. Practice Questions</strong></h2>

<p><strong>Q1:</strong> A team wants to train a custom TensorFlow model across multiple machines with 8 GPUs each. They want gradients synchronized across all workers without a parameter server. Which TensorFlow distribution strategy should they use?</p>
<ul>
<li>A) MirroredStrategy</li>
<li>B) MultiWorkerMirroredStrategy ✓</li>
<li>C) ParameterServerStrategy</li>
<li>D) TPUStrategy</li>
</ul>
<p><em>Explanation: MultiWorkerMirroredStrategy enables synchronous data-parallel training across multiple machines, each with multiple GPUs. MirroredStrategy is single-machine multi-GPU only. ParameterServerStrategy uses asynchronous updates. TPUStrategy is for TPU pods.</em></p>

<p><strong>Q2:</strong> A company needs to train an image classification model but their team has no deep learning expertise. They have 5,000 labeled product images. Which Vertex AI option requires the LEAST ML expertise?</p>
<ul>
<li>A) Vertex AI Custom Training with TensorFlow CNN</li>
<li>B) Vertex AI AutoML Image Classification ✓</li>
<li>C) Dataproc Spark ML</li>
<li>D) BigQuery ML</li>
</ul>
<p><em>Explanation: AutoML Image Classification handles architecture selection, hyperparameter tuning, and training automatically. A team just needs to upload labeled images and specify the task. No code or deep learning expertise is required.</em></p>

<p><strong>Q3:</strong> Which hyperparameter search strategy is MOST efficient when evaluating expensive-to-train deep learning models with a large search space?</p>
<ul>
<li>A) Grid Search — tests all combinations</li>
<li>B) Random Search — samples uniformly</li>
<li>C) Bayesian Optimization — uses past trial results to guide search ✓</li>
<li>D) Manual tuning — expert selects parameters</li>
</ul>
<p><em>Explanation: Bayesian Optimization builds a probabilistic model of the objective function using Gaussian Processes to intelligently select the next hyperparameter configuration to evaluate, based on past trial results. It finds good configurations with far fewer trials than grid or random search.</em></p>
package/content/series/luyen-thi/luyen-thi-gcp-ml-engineer/chapters/03-phan-3-model-development/lessons/06-bai-6-bigquery-ml-tensorflow.md

@@ -0,0 +1,141 @@
---
id: 019c9619-lt03-l06
title: 'Bài 6: BigQuery ML & TensorFlow on GCP'
slug: bai-6-bigquery-ml-tensorflow
description: >-
  BigQuery ML: CREATE MODEL syntax, supported models.
  TensorFlow Extended (TFX) pipeline components.
  TFServing, TFLite. Model optimization techniques.
duration_minutes: 60
is_free: true
video_url: null
sort_order: 6
section_title: "Phần 3: Model Development trên Vertex AI"
course:
  id: 019c9619-lt03-7003-c003-lt0300000003
  title: 'Luyện thi Google Cloud Professional Machine Learning Engineer'
  slug: luyen-thi-gcp-ml-engineer
---

<div style="text-align: center; margin: 2rem 0;">
  <img src="/storage/uploads/2026/04/gcp-mle-bai6-bqml-tfx.png" alt="BigQuery ML & TFX Pipeline" style="max-width: 800px; width: 100%; border-radius: 12px;" />
  <p><em>BigQuery ML and TFX pipelines: training models with SQL, model optimization, and production ML pipelines</em></p>
</div>

<h2 id="bigquery-ml"><strong>1. BigQuery ML (BQML)</strong></h2>

<p>BigQuery ML lets data analysts train and serve ML models with SQL inside BigQuery: no data export and no ML framework knowledge required.</p>

<pre><code class="language-text">BigQuery ML Workflow:

1. CREATE MODEL          → train
2. ML.EVALUATE()         → evaluate metrics
3. ML.PREDICT()          → generate predictions
4. ML.EXPLAIN_PREDICT()  → SHAP-based explanations
5. EXPORT MODEL          → export to Cloud Storage (TF SavedModel format)
</code></pre>

<table>
<thead><tr><th>Model Type</th><th>BQML Option</th><th>Task</th></tr></thead>
<tbody>
<tr><td>Linear Regression</td><td>LINEAR_REG</td><td>Regression</td></tr>
<tr><td>Logistic Regression</td><td>LOGISTIC_REG</td><td>Binary/Multiclass classification</td></tr>
<tr><td>K-Means</td><td>KMEANS</td><td>Clustering</td></tr>
<tr><td>XGBoost</td><td>BOOSTED_TREE_CLASSIFIER / BOOSTED_TREE_REGRESSOR</td><td>Tabular classification/regression</td></tr>
<tr><td>Random Forest</td><td>RANDOM_FOREST_CLASSIFIER / RANDOM_FOREST_REGRESSOR</td><td>Tabular classification/regression</td></tr>
<tr><td>DNN</td><td>DNN_CLASSIFIER / DNN_REGRESSOR</td><td>Complex patterns</td></tr>
<tr><td>Wide & Deep</td><td>WIDE_AND_DEEP_CLASSIFIER</td><td>Recommendations (memorization + generalization)</td></tr>
<tr><td>AutoML</td><td>AUTOML_CLASSIFIER / AUTOML_REGRESSOR</td><td>Automated model selection</td></tr>
<tr><td>Time Series</td><td>ARIMA_PLUS</td><td>Forecasting</td></tr>
<tr><td>Matrix Factorization</td><td>MATRIX_FACTORIZATION</td><td>Collaborative filtering</td></tr>
</tbody>
</table>

<blockquote>
<p><strong>Exam tip:</strong> BQML ARIMA_PLUS automatically handles seasonality, holiday effects, and trend decomposition. When a question asks to "forecast using BigQuery data" → ARIMA_PLUS. When it asks for a "recommendation system in BigQuery" → MATRIX_FACTORIZATION.</p>
</blockquote>
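<p>For a concrete picture of the CREATE MODEL workflow, the sketch below submits the BQML SQL through the BigQuery Python client. The dataset, table, and column names are hypothetical and only illustrate the ARIMA_PLUS options; they are not part of this lesson.</p>

<pre><code class="language-python"># Sketch: train and query a BQML ARIMA_PLUS model from Python.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder project

create_model_sql = """
CREATE OR REPLACE MODEL `my_dataset.sales_forecast`
OPTIONS(
  model_type = 'ARIMA_PLUS',
  time_series_timestamp_col = 'order_date',
  time_series_data_col = 'daily_sales'
) AS
SELECT order_date, daily_sales
FROM `my_dataset.daily_sales`
"""
client.query(create_model_sql).result()  # blocks until training finishes

# 30-day forecast once the model exists.
forecast = client.query(
    "SELECT * FROM ML.FORECAST(MODEL `my_dataset.sales_forecast`, "
    "STRUCT(30 AS horizon))"
).result()
</code></pre>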
<h2 id="tfx"><strong>2. TensorFlow Extended (TFX)</strong></h2>

<p>TFX is a production ML pipeline library for TensorFlow. It provides standard components for each step of the ML lifecycle.</p>

<table>
<thead><tr><th>TFX Component</th><th>Purpose</th></tr></thead>
<tbody>
<tr><td><strong>ExampleGen</strong></td><td>Ingest data from CSV, BigQuery, Avro, Parquet</td></tr>
<tr><td><strong>StatisticsGen</strong></td><td>Compute statistics over the training data</td></tr>
<tr><td><strong>SchemaGen</strong></td><td>Infer a schema from the statistics</td></tr>
<tr><td><strong>ExampleValidator</strong></td><td>Detect anomalies: missing values, distribution skew</td></tr>
<tr><td><strong>Transform</strong></td><td>Feature engineering (Apache Beam-based)</td></tr>
<tr><td><strong>Trainer</strong></td><td>Train TF model (EvalSpec + TrainSpec)</td></tr>
<tr><td><strong>Tuner</strong></td><td>Hyperparameter tuning (KerasTuner)</td></tr>
<tr><td><strong>Evaluator</strong></td><td>Evaluate model against baseline</td></tr>
<tr><td><strong>ModelValidator</strong></td><td>Validate model meets quality thresholds</td></tr>
<tr><td><strong>Pusher</strong></td><td>Push model to serving (TF Serving, Vertex AI)</td></tr>
</tbody>
</table>

<pre><code class="language-text">TFX Pipeline (simplified):

ExampleGen → StatisticsGen → SchemaGen → ExampleValidator
        ↓
    Transform (feature engineering)
        ↓
    Trainer (model training)
        ↓
    Evaluator (metrics vs baseline)
        ↓ (if pass)
    Pusher → TF Serving / Vertex AI Endpoint
</code></pre>
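<p>As a rough sketch of how the front of this pipeline is wired in code (assuming the TFX 1.x Python API and a hypothetical GCS data path; the training and serving stages are omitted):</p>

<pre><code class="language-python"># Sketch: data ingestion and validation stages of a TFX pipeline.
from tfx import v1 as tfx

example_gen = tfx.components.CsvExampleGen(input_base="gs://bucket/data/")
statistics_gen = tfx.components.StatisticsGen(
    examples=example_gen.outputs["examples"])
schema_gen = tfx.components.SchemaGen(
    statistics=statistics_gen.outputs["statistics"])
example_validator = tfx.components.ExampleValidator(   # flags anomalies and skew
    statistics=statistics_gen.outputs["statistics"],
    schema=schema_gen.outputs["schema"])

pipeline = tfx.dsl.Pipeline(
    pipeline_name="tfx-demo",
    pipeline_root="gs://bucket/pipeline-root/",
    components=[example_gen, statistics_gen, schema_gen, example_validator],
)
# tfx.orchestration.LocalDagRunner().run(pipeline)
</code></pre>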
<h2 id="tf-serving"><strong>3. TF Serving & TFLite</strong></h2>

<table>
<thead><tr><th>Option</th><th>Use Case</th></tr></thead>
<tbody>
<tr><td><strong>TF Serving</strong></td><td>High-performance serving on servers/cloud (gRPC or REST)</td></tr>
<tr><td><strong>TFLite</strong></td><td>Mobile devices, edge devices, microcontrollers</td></tr>
<tr><td><strong>TF.js</strong></td><td>Browser-based inference</td></tr>
</tbody>
</table>

<h2 id="model-optimization"><strong>4. Model Optimization Techniques</strong></h2>

<table>
<thead><tr><th>Technique</th><th>Description</th><th>Trade-off</th></tr></thead>
<tbody>
<tr><td><strong>Quantization</strong></td><td>Float32 → INT8 weights</td><td>4x smaller, ~2x faster, slight accuracy loss</td></tr>
<tr><td><strong>Pruning</strong></td><td>Remove low-weight connections</td><td>Smaller model, preserves accuracy</td></tr>
<tr><td><strong>Knowledge Distillation</strong></td><td>Train small "student" model from large "teacher"</td><td>Smaller + faster, slight accuracy loss</td></tr>
<tr><td><strong>TensorRT</strong></td><td>NVIDIA GPU optimization (layer fusion)</td><td>3-5x inference speedup on NVIDIA GPUs</td></tr>
</tbody>
</table>
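<p>A minimal TFLite sketch of the quantization row, assuming a SavedModel already exists at a placeholder path. With only <code>Optimize.DEFAULT</code> this performs dynamic-range (weight) quantization; full integer quantization additionally needs a representative dataset.</p>

<pre><code class="language-python"># Sketch: post-training quantization with the TFLite converter.
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("/tmp/saved_model")  # placeholder path
converter.optimizations = [tf.lite.Optimize.DEFAULT]   # quantize weights
tflite_model = converter.convert()

with open("model_quantized.tflite", "wb") as f:
    f.write(tflite_model)   # roughly 4x smaller than the Float32 SavedModel
</code></pre>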
<h2 id="practice"><strong>5. Practice Questions</strong></h2>

<p><strong>Q1:</strong> A data analyst team needs to build a sales forecasting model on data already in BigQuery. They are comfortable with SQL but have no Python/ML framework experience. Which BigQuery ML model type should they use for time series forecasting?</p>
<ul>
<li>A) KMEANS</li>
<li>B) LOGISTIC_REG</li>
<li>C) ARIMA_PLUS ✓</li>
<li>D) MATRIX_FACTORIZATION</li>
</ul>
<p><em>Explanation: BigQuery ML ARIMA_PLUS is designed for time series forecasting and automatically handles seasonality, trend, and holiday effects. It can be trained with a simple CREATE MODEL statement in SQL, requiring no Python expertise.</em></p>

<p><strong>Q2:</strong> A TFX pipeline is detecting that the distribution of the "age" feature in new production data differs significantly from the training data distribution. Which TFX component is responsible for detecting this anomaly?</p>
<ul>
<li>A) StatisticsGen</li>
<li>B) SchemaGen</li>
<li>C) ExampleValidator ✓</li>
<li>D) Transform</li>
</ul>
<p><em>Explanation: ExampleValidator compares data statistics against the expected schema and flags anomalies including distribution skew (significant difference between training and serving data distributions). StatisticsGen computes statistics; SchemaGen creates the schema; Transform does feature engineering.</em></p>

<p><strong>Q3:</strong> A team needs to deploy a TensorFlow image classification model to mobile devices with limited compute resources. They need to reduce model size by 4x with minimal accuracy loss. Which technique should they apply?</p>
<ul>
<li>A) Knowledge Distillation</li>
<li>B) Model Pruning</li>
<li>C) Post-training quantization (INT8) ✓</li>
<li>D) TensorRT optimization</li>
</ul>
<p><em>Explanation: Post-training quantization converts Float32 weights to INT8, reducing model size by approximately 4x and improving inference speed by 2x, with minimal accuracy loss for most models. TFLite supports INT8 quantization for mobile/edge deployment. TensorRT is for NVIDIA GPUs, not mobile.</em></p>
package/content/series/luyen-thi/luyen-thi-gcp-ml-engineer/chapters/04-phan-4-deployment-mlops/lessons/07-bai-7-model-deployment.md

@@ -0,0 +1,134 @@
---
id: 019c9619-lt03-l07
title: 'Bài 7: Model Deployment & Prediction'
slug: bai-7-model-deployment
description: >-
  Vertex AI Endpoints: online, batch prediction.
  Model versioning, traffic splitting. Edge deployment.
  Scaling config, GPU allocation.
duration_minutes: 60
is_free: true
video_url: null
sort_order: 7
section_title: "Phần 4: Model Deployment & MLOps"
course:
  id: 019c9619-lt03-7003-c003-lt0300000003
  title: 'Luyện thi Google Cloud Professional Machine Learning Engineer'
  slug: luyen-thi-gcp-ml-engineer
---

<div style="text-align: center; margin: 2rem 0;">
  <img src="/storage/uploads/2026/04/gcp-mle-bai7-deployment.png" alt="Vertex AI Model Deployment" style="max-width: 800px; width: 100%; border-radius: 12px;" />
  <p><em>Vertex AI deployment: Online Prediction, Batch Prediction, traffic splitting, and edge deployment</em></p>
</div>

<h2 id="deployment-types"><strong>1. Prediction Types on Vertex AI</strong></h2>

<table>
<thead><tr><th>Type</th><th>Latency</th><th>When to Use</th></tr></thead>
<tbody>
<tr><td><strong>Online Prediction</strong></td><td>Milliseconds (sync)</td><td>Real-time apps, user-facing APIs</td></tr>
<tr><td><strong>Batch Prediction</strong></td><td>Minutes/Hours (async)</td><td>Large datasets, scheduled scoring</td></tr>
<tr><td><strong>Streaming Prediction</strong></td><td>Near real-time</td><td>Pub/Sub events + Dataflow + Vertex AI</td></tr>
</tbody>
</table>

<h2 id="vertex-endpoints"><strong>2. Vertex AI Endpoints</strong></h2>

<pre><code class="language-text">Vertex AI Endpoint Architecture:

Client Request
      ↓
Vertex AI Endpoint (load balancer)
├── Model Version A (70% traffic)
│   └── Deployed Model (e.g., v1.0)
└── Model Version B (30% traffic)   ← Canary / A-B test
    └── Deployed Model (e.g., v1.1)
</code></pre>

<p>Each Endpoint can host <strong>multiple model versions</strong> with <strong>traffic splitting</strong>, which is used for A/B testing and canary deployments.</p>

<table>
<thead><tr><th>Feature</th><th>Details</th></tr></thead>
<tbody>
<tr><td><strong>Dedicated Endpoint</strong></td><td>Dedicated resources, lowest latency, higher cost</td></tr>
<tr><td><strong>Shared Endpoint</strong></td><td>Multi-tenant, lower cost, potential cold start</td></tr>
<tr><td><strong>Explanation</strong></td><td>Enable Vertex Explainability per deployed model</td></tr>
<tr><td><strong>Min/Max Replicas</strong></td><td>Autoscaling based on request rate</td></tr>
<tr><td><strong>GPU allocation</strong></td><td>Specify GPU type (NVIDIA T4, A100) per deployment</td></tr>
</tbody>
</table>

<blockquote>
<p><strong>Exam tip:</strong> Traffic splitting on Vertex AI Endpoints is how you implement a <strong>canary deployment</strong> or <strong>A/B test</strong>. For a question like "roll out new model version safely" → traffic splitting (for example 90% old, 10% new).</p>
</blockquote>
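<p>A minimal SDK sketch of that canary pattern, assuming two models are already registered (the resource names, machine type, and project are placeholders):</p>

<pre><code class="language-python"># Sketch: canary rollout via traffic splitting on a Vertex AI Endpoint.
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

endpoint = aiplatform.Endpoint.create(display_name="churn-endpoint")

model_v1 = aiplatform.Model("projects/my-project/locations/us-central1/models/1111")  # placeholder
model_v2 = aiplatform.Model("projects/my-project/locations/us-central1/models/2222")  # placeholder

# v1 takes all traffic initially.
model_v1.deploy(endpoint=endpoint, machine_type="n1-standard-4",
                min_replica_count=1, max_replica_count=3)

# Canary: route 10% of traffic to v2, keep 90% on the existing deployment.
model_v2.deploy(endpoint=endpoint, machine_type="n1-standard-4",
                traffic_percentage=10)
</code></pre>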
<h2 id="batch-prediction"><strong>3. Batch Prediction</strong></h2>

<table>
<thead><tr><th>Property</th><th>Value</th></tr></thead>
<tbody>
<tr><td><strong>Input</strong></td><td>Cloud Storage (CSV, JSON, JSONL, TFRecords, Avro)</td></tr>
<tr><td><strong>Output</strong></td><td>Cloud Storage (predictions as JSON/CSV)</td></tr>
<tr><td><strong>No Endpoint needed</strong></td><td>Runs directly from Model Registry, no persistent endpoint</td></tr>
<tr><td><strong>Auto-scaling</strong></td><td>Scales to zero when done (cost-efficient)</td></tr>
<tr><td><strong>Accelerators</strong></td><td>Supports GPU/TPU for batch inference</td></tr>
</tbody>
</table>
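<p>For orientation, a sketch of launching such a job from the SDK; the model resource name, bucket paths, and machine type are placeholder assumptions:</p>

<pre><code class="language-python"># Sketch: launch a Vertex AI Batch Prediction job from a registered model.
from google.cloud import aiplatform

model = aiplatform.Model("projects/my-project/locations/us-central1/models/123")  # placeholder

batch_job = model.batch_predict(
    job_display_name="churn-batch-scoring",
    gcs_source="gs://bucket/input/customers.jsonl",
    gcs_destination_prefix="gs://bucket/output/",
    instances_format="jsonl",
    machine_type="n1-standard-4",
    starting_replica_count=1,
    max_replica_count=10,
    sync=False,   # asynchronous; resources are released when the job completes
)
</code></pre>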
<h2 id="model-versioning"><strong>4. Model Versioning & Registry</strong></h2>

<pre><code class="language-text">Vertex AI Model Registry:

Model: churn-predictor
├── v1 (Logistic Regression)   ← Champion in production
│     - Accuracy: 0.87
│     - Deployed to: endpoint/prod (70% traffic)
│
└── v2 (XGBoost)                ← Challenger
      - Accuracy: 0.91
      - Deployed to: endpoint/prod (30% traffic)

After validation: promote v2 to Champion
</code></pre>

<h2 id="edge-deployment"><strong>5. Edge Deployment</strong></h2>

<table>
<thead><tr><th>Platform</th><th>Solution</th></tr></thead>
<tbody>
<tr><td>Mobile (Android/iOS)</td><td>TFLite + Vertex AI model export</td></tr>
<tr><td>Edge devices (IoT)</td><td>TFLite Micro / Edge TPU (Coral)</td></tr>
<tr><td>On-premise servers</td><td>TF Serving in Docker container</td></tr>
<tr><td>Kubernetes</td><td>KServe (formerly KFServing) on GKE</td></tr>
</tbody>
</table>

<h2 id="practice"><strong>6. Practice Questions</strong></h2>

<p><strong>Q1:</strong> A company needs to score 50 million customer records for churn risk. Results are needed within 2 hours but not in real time. Which Vertex AI prediction option is MOST cost-effective?</p>
<ul>
<li>A) Online Prediction with high replica count</li>
<li>B) Batch Prediction ✓</li>
<li>C) Streaming prediction via Dataflow</li>
<li>D) Deploy on dedicated GPU endpoint</li>
</ul>
<p><em>Explanation: Batch Prediction is designed for large-scale asynchronous scoring. It scales compute resources up during the job and back to zero when done, with no persistent endpoint cost. Online Prediction would be wasteful since real-time response isn't needed for batch scoring.</em></p>

<p><strong>Q2:</strong> A team is deploying a new model version. They want to gradually route 10% of production traffic to the new version while the old version handles 90%, allowing comparison of performance metrics before full rollout. Which Vertex AI feature enables this?</p>
<ul>
<li>A) Model Registry versioning</li>
<li>B) Traffic splitting on Vertex AI Endpoints ✓</li>
<li>C) Batch Prediction comparison</li>
<li>D) Vertex AI Experiments</li>
</ul>
<p><em>Explanation: Vertex AI Endpoints support deploying multiple model versions simultaneously with configurable traffic splits (e.g., 90%/10%). This enables canary deployments and A/B testing to compare live performance before committing to a full rollout.</em></p>

<p><strong>Q3:</strong> A retail company wants to detect product defects on a factory floor without network connectivity to the cloud. Which deployment approach should they use?</p>
<ul>
<li>A) Vertex AI Online Prediction Endpoint</li>
<li>B) AutoML Edge model deployed to the device using TFLite ✓</li>
<li>C) BigQuery ML batch prediction</li>
<li>D) TF Serving on Cloud Run</li>
</ul>
<p><em>Explanation: Edge deployment with TFLite (or an AutoML Edge model) runs inference locally on the device without network connectivity. TFLite supports on-device inference for computer vision models, suitable for factory floor equipment with no internet access.</em></p>
package/content/series/luyen-thi/luyen-thi-gcp-ml-engineer/chapters/04-phan-4-deployment-mlops/lessons/08-bai-8-vertex-ai-pipelines-mlops.md

@@ -0,0 +1,149 @@
---
id: 019c9619-lt03-l08
title: 'Bài 8: Vertex AI Pipelines & MLOps'
slug: bai-8-vertex-ai-pipelines-mlops
description: >-
  Vertex AI Pipelines (Kubeflow Pipelines SDK).
  Model Registry, Experiments, Metadata Store.
  Vertex AI Model Monitoring: skew, drift detection.
  CI/CD cho ML: Cloud Build + Vertex AI.
duration_minutes: 60
is_free: true
video_url: null
sort_order: 8
section_title: "Phần 4: Model Deployment & MLOps"
course:
  id: 019c9619-lt03-7003-c003-lt0300000003
  title: 'Luyện thi Google Cloud Professional Machine Learning Engineer'
  slug: luyen-thi-gcp-ml-engineer
---

<div style="text-align: center; margin: 2rem 0;">
  <img src="/storage/uploads/2026/04/gcp-mle-bai8-mlops-cicd.png" alt="Vertex AI Pipelines & MLOps" style="max-width: 800px; width: 100%; border-radius: 12px;" />
  <p><em>Vertex AI MLOps: Pipelines, CI/CD, Model Registry, and monitoring for production ML</em></p>
</div>

<h2 id="mlops-maturity"><strong>1. MLOps Maturity Levels</strong></h2>

<table>
<thead><tr><th>Level</th><th>Description</th><th>Automation</th></tr></thead>
<tbody>
<tr><td><strong>Level 0</strong></td><td>Manual process, scripts only</td><td>None</td></tr>
<tr><td><strong>Level 1</strong></td><td>ML pipeline automation, continuous training</td><td>Training pipeline</td></tr>
<tr><td><strong>Level 2</strong></td><td>Full CI/CD for ML, automated retraining triggers</td><td>Everything</td></tr>
</tbody>
</table>

<h2 id="vertex-pipelines"><strong>2. Vertex AI Pipelines</strong></h2>

<p>Vertex AI Pipelines is a managed execution environment for <strong>Kubeflow Pipelines (KFP)</strong>. Pipelines are defined with the Python SDK and compiled to YAML.</p>

<pre><code class="language-text">Vertex AI Pipeline Structure:

@component (preprocess_data)
        ↓
@component (train_model)
        ↓
@component (evaluate_model)
        ↓ (if accuracy > threshold)
@component (deploy_model)

Each component = isolated Docker container
Artifacts (data, models) stored in Cloud Storage
Metadata tracked in Vertex ML Metadata Store
</code></pre>

<table>
<thead><tr><th>Pipeline SDK</th><th>Notes</th></tr></thead>
<tbody>
<tr><td><strong>Kubeflow Pipelines SDK v2</strong></td><td>Primary SDK for Vertex AI Pipelines</td></tr>
<tr><td><strong>TFX</strong></td><td>TensorFlow-specific pipeline components</td></tr>
<tr><td><strong>Google Cloud Pipeline Components</strong></td><td>Pre-built components for Vertex AI services</td></tr>
</tbody>
</table>
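<p>To show what "defined with the Python SDK and compiled to YAML" looks like in practice, here is a two-step sketch with KFP SDK v2 and a Vertex AI PipelineJob. The component bodies, bucket, and names are placeholders, not material from this lesson.</p>

<pre><code class="language-python"># Sketch: a tiny Vertex AI Pipeline with the Kubeflow Pipelines SDK v2.
from kfp import dsl, compiler
from google.cloud import aiplatform


@dsl.component(base_image="python:3.10")
def preprocess_data(message: str) -> str:
    # Placeholder preprocessing step; runs in its own container.
    return message.upper()


@dsl.component(base_image="python:3.10")
def train_model(data: str) -> str:
    # Placeholder training step.
    return "model trained on " + data


@dsl.pipeline(name="demo-training-pipeline")
def pipeline(message: str = "raw data"):
    prep = preprocess_data(message=message)
    train_model(data=prep.output)


compiler.Compiler().compile(pipeline_func=pipeline, package_path="pipeline.yaml")

job = aiplatform.PipelineJob(
    display_name="demo-training-pipeline",
    template_path="pipeline.yaml",
    pipeline_root="gs://bucket/pipeline-root/",   # placeholder bucket
)
# job.run()
</code></pre>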
<h2 id="model-monitoring"><strong>3. Vertex AI Model Monitoring</strong></h2>

<table>
<thead><tr><th>Monitoring Type</th><th>What It Detects</th></tr></thead>
<tbody>
<tr><td><strong>Feature Skew Monitoring</strong></td><td>Serving feature distribution ≠ training baseline</td></tr>
<tr><td><strong>Feature Drift Monitoring</strong></td><td>Serving feature distribution changes over time</td></tr>
<tr><td><strong>Prediction Drift</strong></td><td>Model output distribution changes (indirect label drift)</td></tr>
</tbody>
</table>

<pre><code class="language-text">Model Monitoring Workflow:

Training Data Baseline (BigQuery/GCS)
      ↓ (establish distribution)
Deploy to Endpoint with Monitoring enabled
      ↓ (collect serving requests)
Periodic Analysis (hourly/daily)
      ↓ (compare distributions)
Alert if skew/drift > threshold
      ↓
Retrain trigger → new Pipeline run
</code></pre>

<h2 id="experiments-metadata"><strong>4. Vertex AI Experiments & Metadata</strong></h2>

<table>
<thead><tr><th>Component</th><th>Purpose</th></tr></thead>
<tbody>
<tr><td><strong>Vertex AI Experiments</strong></td><td>Track hyperparameters, metrics, artifacts across runs</td></tr>
<tr><td><strong>ML Metadata Store</strong></td><td>Track lineage: data → model → endpoint</td></tr>
<tr><td><strong>Vertex AI TensorBoard</strong></td><td>Visualize training metrics (loss, accuracy curves)</td></tr>
</tbody>
</table>
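<p>A minimal sketch of run tracking with Vertex AI Experiments from the SDK (experiment name, run name, and values are placeholders):</p>

<pre><code class="language-python"># Sketch: log parameters and metrics of one training run to Vertex AI Experiments.
from google.cloud import aiplatform

aiplatform.init(
    project="my-project",          # placeholder project
    location="us-central1",
    experiment="churn-experiments",
)

aiplatform.start_run("run-xgboost-001")
aiplatform.log_params({"learning_rate": 0.1, "max_depth": 6})
# ... train and evaluate the model here ...
aiplatform.log_metrics({"val_accuracy": 0.91, "val_auc": 0.95})
aiplatform.end_run()
</code></pre>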
<h2 id="cicd-ml"><strong>5. CI/CD for ML on GCP</strong></h2>

<pre><code class="language-text">ML CI/CD Pipeline on GCP:

Code Push to Cloud Source Repositories
      ↓
Cloud Build trigger (CI)
├── Unit tests for ML components
├── Data validation tests
└── Build Docker image → push to Artifact Registry
      ↓
Vertex AI Pipeline trigger (CD/CT)
├── Data preprocessing
├── Model training
├── Model evaluation
└── Conditional deployment → Vertex AI Endpoint
</code></pre>

<blockquote>
<p><strong>Exam tip:</strong> CI/CD for ML = Cloud Build (code testing + Docker build) + Vertex AI Pipelines (training + deployment orchestration). Cloud Source Repositories is GCP's Git hosting. Artifact Registry replaces Container Registry for storing Docker images.</p>
</blockquote>
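<p>The continuous-training half can also be triggered by data arrival rather than by a code push. A sketch of that event-driven pattern, assuming a background Cloud Function wired to a Cloud Storage "object finalized" trigger (bucket, template path, and project are placeholders):</p>

<pre><code class="language-python"># Sketch: event-driven continuous training. A Cloud Function reacts to a new
# object in Cloud Storage and starts a Vertex AI Pipeline run.
from google.cloud import aiplatform


def trigger_retraining(event, context):
    """Background Cloud Function for a GCS finalize event."""
    new_file = "gs://" + event["bucket"] + "/" + event["name"]

    aiplatform.init(project="my-project", location="us-central1")
    job = aiplatform.PipelineJob(
        display_name="retrain-on-new-data",
        template_path="gs://bucket/pipelines/training_pipeline.yaml",   # placeholder
        pipeline_root="gs://bucket/pipeline-root/",                     # placeholder
        parameter_values={"input_data": new_file},
    )
    job.submit()   # fire and forget so the function returns quickly
</code></pre>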
<h2 id="practice"><strong>6. Practice Questions</strong></h2>

<p><strong>Q1:</strong> A production ML model's prediction distribution has shifted significantly over 3 weeks, but ground truth labels are not yet available to measure accuracy directly. Which Vertex AI monitoring type detects this?</p>
<ul>
<li>A) Feature Skew Monitoring</li>
<li>B) Prediction Drift Monitoring ✓</li>
<li>C) Training data validation</li>
<li>D) Vertex AI Experiments baseline comparison</li>
</ul>
<p><em>Explanation: Prediction Drift Monitoring tracks how the model's output distribution changes over time, serving as an indirect signal of model degradation even when ground truth labels are unavailable. Feature Skew compares serving vs training feature distributions (requires known training baseline).</em></p>

<p><strong>Q2:</strong> A team is building a Vertex AI Pipeline that includes data preprocessing, model training, and deployment. They need to track all inputs, outputs, and model artifacts for auditability and reproducibility. Which service stores this lineage information?</p>
<ul>
<li>A) Cloud Logging</li>
<li>B) Vertex AI ML Metadata Store ✓</li>
<li>C) Cloud Storage versioning</li>
<li>D) Vertex AI Experiments dashboard</li>
</ul>
<p><em>Explanation: Vertex AI ML Metadata Store (also called Vertex ML Metadata) automatically tracks lineage: which datasets produced which models, which models were deployed to which endpoints, including hyperparameters and evaluation metrics — enabling full provenance tracking.</em></p>

<p><strong>Q3:</strong> A company wants to automatically retrain their ML model whenever new training data is available in Cloud Storage. The retraining should run a Vertex AI Pipeline and deploy if metrics pass thresholds. Which GCP service should trigger the pipeline?</p>
<ul>
<li>A) Vertex AI Schedules</li>
<li>B) Cloud Storage notifications + Cloud Functions/Eventarc → Vertex AI Pipelines ✓</li>
<li>C) BigQuery scheduled queries</li>
<li>D) Cloud Scheduler alone</li>
</ul>
<p><em>Explanation: Cloud Storage object finalize notifications can trigger Cloud Functions or Eventarc, which then programmatically start a Vertex AI Pipeline run. This creates event-driven continuous training (MLOps Level 1). Cloud Scheduler triggers on time, not on data availability.</em></p>