@xdev-asia/xdev-knowledge-mcp 1.0.41 → 1.0.43
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/content/series/luyen-thi/luyen-thi-aws-ai-practitioner/chapters/01-domain-1-fundamentals-ai-ml/lessons/01-bai-1-ai-ml-deep-learning-concepts.md +287 -0
- package/content/series/luyen-thi/luyen-thi-aws-ai-practitioner/chapters/01-domain-1-fundamentals-ai-ml/lessons/02-bai-2-ml-lifecycle-aws-services.md +258 -0
- package/content/series/luyen-thi/luyen-thi-aws-ai-practitioner/chapters/02-domain-2-fundamentals-generative-ai/lessons/03-bai-3-generative-ai-foundation-models.md +218 -0
- package/content/series/luyen-thi/luyen-thi-aws-ai-practitioner/chapters/02-domain-2-fundamentals-generative-ai/lessons/04-bai-4-llm-transformers-multimodal.md +232 -0
- package/content/series/luyen-thi/luyen-thi-aws-ai-practitioner/chapters/03-domain-3-applications-foundation-models/lessons/05-bai-5-prompt-engineering-techniques.md +254 -0
- package/content/series/luyen-thi/luyen-thi-aws-ai-practitioner/chapters/03-domain-3-applications-foundation-models/lessons/06-bai-6-rag-vector-databases-knowledge-bases.md +244 -0
- package/content/series/luyen-thi/luyen-thi-aws-ai-practitioner/chapters/03-domain-3-applications-foundation-models/lessons/07-bai-7-fine-tuning-model-customization.md +247 -0
- package/content/series/luyen-thi/luyen-thi-aws-ai-practitioner/chapters/03-domain-3-applications-foundation-models/lessons/08-bai-8-amazon-bedrock-deep-dive.md +276 -0
- package/content/series/luyen-thi/luyen-thi-aws-ai-practitioner/chapters/04-domain-4-responsible-ai/lessons/09-bai-9-responsible-ai-fairness-bias-transparency.md +224 -0
- package/content/series/luyen-thi/luyen-thi-aws-ai-practitioner/chapters/04-domain-4-responsible-ai/lessons/10-bai-10-aws-responsible-ai-tools.md +252 -0
- package/content/series/luyen-thi/luyen-thi-aws-ai-practitioner/chapters/05-domain-5-security-compliance/lessons/11-bai-11-ai-security-data-privacy-compliance.md +279 -0
- package/content/series/luyen-thi/luyen-thi-aws-ai-practitioner/chapters/05-domain-5-security-compliance/lessons/12-bai-12-exam-strategy-cheat-sheet.md +229 -0
- package/content/series/luyen-thi/luyen-thi-aws-ai-practitioner/index.md +257 -0
- package/content/series/luyen-thi/luyen-thi-aws-ml-specialty/chapters/01-phan-1-data-engineering/lessons/01-bai-1-data-repositories-ingestion.md +193 -0
- package/content/series/luyen-thi/luyen-thi-aws-ml-specialty/chapters/01-phan-1-data-engineering/lessons/02-bai-2-data-transformation.md +178 -0
- package/content/series/luyen-thi/luyen-thi-aws-ml-specialty/index.md +240 -0
- package/content/series/luyen-thi/luyen-thi-gcp-ml-engineer/index.md +225 -0
- package/data/categories.json +16 -4
- package/data/quizzes/aws-ai-practitioner.json +362 -0
- package/data/quizzes/aws-ml-specialty.json +200 -0
- package/data/quizzes/gcp-ml-engineer.json +200 -0
- package/data/quizzes.json +764 -0
- package/package.json +1 -1
@@ -0,0 +1,287 @@
---
id: 019c9619-lt01-d1-l01
title: 'Bài 1: AI, ML & Deep Learning — Concepts and Terminology'
slug: bai-1-ai-ml-deep-learning-concepts
description: >-
  AI vs ML vs DL. Supervised, Unsupervised, Reinforcement Learning.
  Classification, Regression, Clustering. Neural Networks basics.
  Training, Validation, Test sets. Bias-Variance tradeoff.
duration_minutes: 60
is_free: true
video_url: null
sort_order: 1
section_title: "Domain 1: Fundamentals of AI and ML (20%)"
course:
  id: 019c9619-lt01-7001-c001-lt0100000001
  title: 'Luyện thi AWS Certified AI Practitioner (AIF-C01)'
  slug: luyen-thi-aws-ai-practitioner
---

<div style="text-align: center; margin: 2rem 0;">
  <img src="/storage/uploads/2026/04/aws-aif-bai1-ai-ml-dl-hierarchy.png" alt="AI, ML, and Deep Learning hierarchy" style="max-width: 800px; width: 100%; border-radius: 12px;" />
  <p><em>AI, ML, and Deep Learning: the nested relationship and the three ML paradigms</em></p>
</div>

<h2 id="overview"><strong>Domain 1 Overview</strong></h2>

<p>Domain 1 accounts for <strong>20% of the AIF-C01 exam</strong>. You need a solid grasp of the foundational concepts of AI, ML, and Deep Learning. No coding is required, but you must be able to tell which approach fits which use case.</p>

<blockquote>
<p><strong>Exam tip:</strong> This domain often asks questions of the form "Which type of machine learning is BEST suited for...", which require you to pick the right paradigm for the use case.</p>
</blockquote>

<h2 id="ai-vs-ml-vs-dl"><strong>1. AI vs Machine Learning vs Deep Learning</strong></h2>

<p>These three concepts form a nested relationship:</p>

<pre><code class="language-text">┌─────────────────────────────────────────────┐
│ Artificial Intelligence (AI)                │
│ "Machines that mimic human intelligence"    │
│  ┌───────────────────────────────────────┐  │
│  │ Machine Learning (ML)                 │  │
│  │ "Learning from data without           │  │
│  │  explicit programming"                │  │
│  │  ┌─────────────────────────────────┐  │  │
│  │  │ Deep Learning (DL)              │  │  │
│  │  │ "Neural networks with many      │  │  │
│  │  │  layers"                        │  │  │
│  │  └─────────────────────────────────┘  │  │
│  └───────────────────────────────────────┘  │
└─────────────────────────────────────────────┘
</code></pre>

<table>
<thead><tr><th>Concept</th><th>Definition</th><th>Example</th></tr></thead>
<tbody>
<tr><td><strong>AI</strong></td><td>Broad field — machines performing tasks that typically require human intelligence</td><td>Chatbot, self-driving car, chess engine</td></tr>
<tr><td><strong>ML</strong></td><td>Subset of AI — algorithms learn patterns from data</td><td>Spam filter, recommendation engine</td></tr>
<tr><td><strong>DL</strong></td><td>Subset of ML — neural networks with multiple layers</td><td>Image recognition, language translation</td></tr>
</tbody>
</table>

<h3 id="key-differences"><strong>Key Differences for the Exam</strong></h3>

<ul>
<li><strong>Traditional Programming</strong>: Rules + Data → Output</li>
<li><strong>Machine Learning</strong>: Data + Output → Rules (model learns the rules)</li>
<li><strong>Deep Learning</strong>: Automatically extracts features from raw data (no manual feature engineering needed)</li>
</ul>

<h2 id="ml-paradigms"><strong>2. Three ML Paradigms</strong></h2>

<h3 id="supervised-learning"><strong>2.1. Supervised Learning</strong></h3>

<p>The model learns from <strong>labeled data</strong>: each input comes with the correct output (label/target).</p>

<table>
<thead><tr><th>Task Type</th><th>Output</th><th>Use Case</th><th>Algorithms</th></tr></thead>
<tbody>
<tr><td><strong>Classification</strong></td><td>Discrete category</td><td>Spam vs Not Spam, Fraud detection</td><td>Logistic Regression, Random Forest, SVM</td></tr>
<tr><td><strong>Regression</strong></td><td>Continuous number</td><td>House price prediction, Stock forecast</td><td>Linear Regression, XGBoost</td></tr>
</tbody>
</table>

<blockquote>
<p><strong>Exam tip:</strong> If the question says "predict a category" or "classify" → <strong>Classification</strong>. If it says "predict a number/value" → <strong>Regression</strong>.</p>
</blockquote>

<h3 id="unsupervised-learning"><strong>2.2. Unsupervised Learning</strong></h3>

<p>The model learns from <strong>unlabeled data</strong>: it finds patterns and structure in the data on its own.</p>

<table>
<thead><tr><th>Task Type</th><th>What it does</th><th>Use Case</th></tr></thead>
<tbody>
<tr><td><strong>Clustering</strong></td><td>Group similar data points</td><td>Customer segmentation, Document grouping</td></tr>
<tr><td><strong>Dimensionality Reduction</strong></td><td>Reduce features while preserving info</td><td>Data visualization, noise reduction</td></tr>
<tr><td><strong>Anomaly Detection</strong></td><td>Find unusual data points</td><td>Fraud detection, equipment failure</td></tr>
<tr><td><strong>Association</strong></td><td>Find rules between items</td><td>"Customers who bought X also bought Y"</td></tr>
</tbody>
</table>
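<p>Clustering can be illustrated with a minimal 1-D k-means sketch in plain Python (a toy example for intuition only; real workloads would use a library or a managed service, and the spend numbers are made up):</p>

```python
def kmeans_1d(points, centroids, iters=10):
    """Minimal 1-D k-means: assign each point to its nearest centroid,
    then move each centroid to the mean of its assigned points."""
    for _ in range(iters):
        clusters = {c: [] for c in range(len(centroids))}
        for p in points:
            nearest = min(range(len(centroids)), key=lambda c: abs(p - centroids[c]))
            clusters[nearest].append(p)
        centroids = [sum(ps) / len(ps) if ps else centroids[c]
                     for c, ps in clusters.items()]
    return centroids, clusters

# Segment customers by monthly spend: two natural groups emerge
spend = [10, 12, 11, 95, 100, 98]
centroids, clusters = kmeans_1d(spend, [0.0, 50.0])
```

<p>No labels were provided; the groups fall out of the data itself, which is exactly what "customer segmentation" means in the table above.</p>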

<h3 id="reinforcement-learning"><strong>2.3. Reinforcement Learning (RL)</strong></h3>

<p>An agent learns by <strong>trial and error</strong> in an environment, receiving a <strong>reward</strong> (positive) or a <strong>penalty</strong> (negative) for each action.</p>

<pre><code class="language-text">Agent → Action → Environment → State + Reward → Agent (loop)
</code></pre>

<p><strong>Use cases:</strong></p>
<ul>
<li>Game AI (AlphaGo)</li>
<li>Robotics navigation</li>
<li>Autonomous driving</li>
<li>AWS DeepRacer (self-driving car simulation)</li>
</ul>
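<p>The reward loop above can be sketched in a few lines of Python (a toy environment invented purely for illustration; it is not an AWS API):</p>

```python
import random

def run_episode(steps=10, seed=0):
    """Toy RL loop: the agent acts, the environment responds with a reward."""
    rng = random.Random(seed)
    total_reward = 0
    state = 0
    for _ in range(steps):
        action = rng.choice([-1, 1])      # agent picks an action (random here)
        state += action                   # environment transitions to a new state
        reward = 1 if state == 0 else -1  # reward for returning to the origin
        total_reward += reward            # feedback the agent would learn from
    return total_reward
```

<p>A real RL algorithm would use the accumulated rewards to improve its action choices over many episodes; this sketch only shows the loop structure.</p>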

<h3 id="choosing-paradigm"><strong>2.4. Choosing the Right Paradigm — Exam Decision Tree</strong></h3>

<pre><code class="language-text">Do you have labeled data?
├── YES → Supervised Learning
│   ├── Predicting a category? → Classification
│   └── Predicting a number? → Regression
└── NO →
    ├── Want to find groups/patterns? → Unsupervised (Clustering)
    └── Learning through trial & error? → Reinforcement Learning
</code></pre>
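<p>The same decision tree can be written as a small helper function (illustrative only; the argument names are made up for this sketch):</p>

```python
def choose_paradigm(labeled, target=None, trial_and_error=False):
    """Mirror of the exam decision tree above."""
    if labeled:
        # Labeled data => supervised; the output type picks the task
        return "Classification" if target == "category" else "Regression"
    if trial_and_error:
        return "Reinforcement Learning"
    return "Unsupervised Learning (Clustering)"
```

<p>Walking an exam question through this function is the same exercise as walking it down the tree.</p>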

<h2 id="data-concepts"><strong>3. Data Concepts for ML</strong></h2>

<h3 id="data-types"><strong>3.1. Data Types</strong></h3>

<table>
<thead><tr><th>Type</th><th>Description</th><th>Examples</th></tr></thead>
<tbody>
<tr><td><strong>Structured</strong></td><td>Organized in rows & columns (tabular)</td><td>CSV, database tables, spreadsheets</td></tr>
<tr><td><strong>Semi-structured</strong></td><td>Has some organization but flexible</td><td>JSON, XML, log files</td></tr>
<tr><td><strong>Unstructured</strong></td><td>No predefined format</td><td>Images, videos, audio, free text</td></tr>
<tr><td><strong>Time-series</strong></td><td>Data points indexed by time</td><td>Stock prices, IoT sensor readings</td></tr>
</tbody>
</table>

<h3 id="labeled-unlabeled"><strong>3.2. Labeled vs Unlabeled Data</strong></h3>

<ul>
<li><strong>Labeled data</strong>: Each data point comes with an answer (label). Example: an email plus a "spam"/"not spam" tag. Used for <strong>Supervised Learning</strong>.</li>
<li><strong>Unlabeled data</strong>: Data only, with no labels. Used for <strong>Unsupervised Learning</strong>.</li>
<li><strong>Amazon SageMaker Ground Truth</strong>: An AWS service for labeling data (human + ML-assisted labeling).</li>
</ul>

<h3 id="datasets"><strong>3.3. Training, Validation, Test Sets</strong></h3>

<pre><code class="language-text">┌────────────────────────────────────────────────┐
│               Full Dataset (100%)              │
├──────────────────┬──────────┬──────────────────┤
│ Training (70%)   │ Val(15%) │ Test (15%)       │
│ Model learns     │ Tune     │ Final evaluation │
│ from this data   │ hyper-   │ (never seen      │
│                  │ params   │ during training) │
└──────────────────┴──────────┴──────────────────┘
</code></pre>

<ul>
<li><strong>Training set</strong>: The model learns patterns from this data</li>
<li><strong>Validation set</strong>: Used to tune hyperparameters and guard against overfitting</li>
<li><strong>Test set</strong>: Final evaluation; the model has never seen this data</li>
</ul>
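<p>The 70/15/15 split can be sketched in plain Python (a minimal illustration; ML frameworks provide equivalent utilities):</p>

```python
import random

def split_dataset(rows, train=0.7, val=0.15, seed=42):
    """Shuffle, then cut into train/validation/test (70/15/15 by default)."""
    rows = rows[:]                              # work on a copy
    random.Random(seed).shuffle(rows)           # shuffle to avoid ordering bias
    n_train = int(len(rows) * train)
    n_val = int(len(rows) * val)
    return (rows[:n_train],                     # the model learns from this
            rows[n_train:n_train + n_val],      # hyperparameter tuning
            rows[n_train + n_val:])             # final, never-seen evaluation

train, val, test = split_dataset(list(range(100)))
```

<p>Shuffling before splitting matters: if the rows are sorted (say, by date or by class), an unshuffled split would give the three sets different distributions.</p>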

<h2 id="neural-networks"><strong>4. Neural Networks Basics</strong></h2>

<h3 id="nn-architecture"><strong>4.1. Architecture</strong></h3>

<pre><code class="language-text">Input Layer → Hidden Layer(s) → Output Layer

x₁ ──┐     ┌── h₁ ──┐
x₂ ──┼─────┼── h₂ ──┼──── ŷ (prediction)
x₃ ──┘     └── h₃ ──┘

Each connection has a weight (w)
Each neuron applies an activation function
</code></pre>

<p><strong>Key components:</strong></p>
<ul>
<li><strong>Weights</strong>: Parameters the model learns during training</li>
<li><strong>Bias</strong>: Additional parameter to shift the activation function</li>
<li><strong>Activation Function</strong>: ReLU, Sigmoid, Softmax — introduces non-linearity</li>
<li><strong>Loss Function</strong>: Measures how wrong the model's predictions are</li>
<li><strong>Optimizer</strong>: Updates weights to minimize loss (e.g., SGD, Adam)</li>
</ul>
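<p>A single neuron from the diagram can be computed directly. This pure-Python sketch shows the weighted sum, the bias, and two common activations (the input and weight values are made up):</p>

```python
import math

def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of inputs plus bias, then ReLU."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return max(0.0, z)                 # ReLU: pass positives, zero out negatives

def sigmoid(z):
    """Squashes any value into (0, 1); common for binary outputs."""
    return 1 / (1 + math.exp(-z))

# One hidden unit: z = 1*0.5 + 2*(-0.25) + 3*0.1 + 0.2 = 0.5
h = neuron([1.0, 2.0, 3.0], [0.5, -0.25, 0.1], bias=0.2)
```

<p>Training adjusts the weights and bias so that the network's final output ŷ gets closer to the labels; the per-neuron arithmetic stays exactly this simple.</p>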

<h3 id="nn-types"><strong>4.2. Types of Neural Networks</strong></h3>

<table>
<thead><tr><th>Type</th><th>Best For</th><th>AWS Service</th></tr></thead>
<tbody>
<tr><td><strong>CNN</strong> (Convolutional NN)</td><td>Images, video</td><td>Amazon Rekognition</td></tr>
<tr><td><strong>RNN/LSTM</strong> (Recurrent NN)</td><td>Sequential data, time series</td><td>Amazon Forecast</td></tr>
<tr><td><strong>Transformer</strong></td><td>NLP, text generation</td><td>Amazon Bedrock (LLMs)</td></tr>
<tr><td><strong>GAN</strong> (Generative Adversarial)</td><td>Generate new data (images)</td><td>—</td></tr>
</tbody>
</table>

<h2 id="model-evaluation"><strong>5. Model Evaluation Concepts</strong></h2>

<h3 id="overfitting-underfitting"><strong>5.1. Overfitting vs Underfitting</strong></h3>

<table>
<thead><tr><th>Problem</th><th>Training Accuracy</th><th>Test Accuracy</th><th>Cause</th><th>Solution</th></tr></thead>
<tbody>
<tr><td><strong>Overfitting</strong></td><td>Very High</td><td>Low</td><td>Model memorizes training data</td><td>More data, regularization, dropout, early stopping</td></tr>
<tr><td><strong>Underfitting</strong></td><td>Low</td><td>Low</td><td>Model too simple</td><td>More features, more complex model, longer training</td></tr>
<tr><td><strong>Good Fit</strong></td><td>High</td><td>High</td><td>Balanced complexity</td><td>—</td></tr>
</tbody>
</table>
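<p>The table reduces to a simple rule of thumb. Here it is as code, with illustrative thresholds (the 0.10 gap and 0.75 floor are made-up numbers for the sketch, not exam facts):</p>

```python
def diagnose(train_acc, test_acc, gap=0.10, floor=0.75):
    """Rough fit diagnosis following the table above."""
    if train_acc < floor and test_acc < floor:
        return "underfitting"           # too simple: both scores are low
    if train_acc - test_acc > gap:
        return "overfitting"            # memorized the training data
    return "good fit"
```

<p>The 99%-train / 65%-test scenario from the practice questions lands squarely in the overfitting branch.</p>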

<h3 id="bias-variance"><strong>5.2. Bias-Variance Tradeoff</strong></h3>

<ul>
<li><strong>High Bias</strong> = Underfitting (the model is too simple and misses patterns)</li>
<li><strong>High Variance</strong> = Overfitting (the model is too complex and sensitive to noise)</li>
<li>Goal: find the <strong>sweet spot</strong> between bias and variance</li>
</ul>

<h3 id="metrics"><strong>5.3. Common Metrics</strong></h3>

<p><strong>Classification metrics:</strong></p>
<table>
<thead><tr><th>Metric</th><th>Formula</th><th>When to use</th></tr></thead>
<tbody>
<tr><td><strong>Accuracy</strong></td><td>(TP + TN) / Total</td><td>Balanced classes</td></tr>
<tr><td><strong>Precision</strong></td><td>TP / (TP + FP)</td><td>"Don't flag innocent as spam"</td></tr>
<tr><td><strong>Recall</strong></td><td>TP / (TP + FN)</td><td>"Don't miss any fraud"</td></tr>
<tr><td><strong>F1 Score</strong></td><td>2 × (P × R) / (P + R)</td><td>Imbalanced classes</td></tr>
<tr><td><strong>AUC-ROC</strong></td><td>Area under ROC curve</td><td>Binary classification overall</td></tr>
</tbody>
</table>
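<p>The formulas in the table can be checked by hand with a small helper (pure Python; the confusion-matrix counts are made up for the example):</p>

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute the exam metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)    # of everything flagged, how much was right
    recall = tp / (tp + fn)       # of actual positives, how many were found
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# 80 true positives, 20 false alarms, 10 misses, 90 true negatives
m = classification_metrics(tp=80, fp=20, fn=10, tn=90)
```

<p>Note how precision and recall pull in different directions: lowering false alarms (FP) raises precision, while catching more positives (fewer FN) raises recall; F1 balances the two.</p>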

<p><strong>Regression metrics:</strong></p>
<ul>
<li><strong>RMSE</strong> (Root Mean Square Error): Penalizes large errors</li>
<li><strong>MAE</strong> (Mean Absolute Error): Average error magnitude</li>
<li><strong>R²</strong>: How well model explains variance (1.0 = perfect)</li>
</ul>
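<p>The regression metrics follow the same pattern; here is a pure-Python sketch of RMSE, MAE, and R²:</p>

```python
import math

def regression_metrics(y_true, y_pred):
    """RMSE, MAE, and R² for a regression model."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)  # squares penalize big misses
    mae = sum(abs(e) for e in errors) / n             # plain average miss size
    mean = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot                          # 1.0 = perfect fit
    return rmse, mae, r2

perfect = regression_metrics([1, 2, 3], [1, 2, 3])    # (0.0, 0.0, 1.0)
```

<p>Because RMSE squares each error before averaging, one large miss hurts RMSE far more than it hurts MAE, which is why the two are worth comparing.</p>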

<h2 id="key-terms"><strong>6. Key Terms Cheat Sheet</strong></h2>

<table>
<thead><tr><th>Term</th><th>Definition (for exam)</th></tr></thead>
<tbody>
<tr><td><strong>Feature</strong></td><td>Input variable used for prediction (column in data)</td></tr>
<tr><td><strong>Label / Target</strong></td><td>The answer we want the model to predict</td></tr>
<tr><td><strong>Hyperparameter</strong></td><td>Settings configured BEFORE training (learning rate, epochs)</td></tr>
<tr><td><strong>Parameter</strong></td><td>Values the model learns DURING training (weights, biases)</td></tr>
<tr><td><strong>Epoch</strong></td><td>One complete pass through the entire training dataset</td></tr>
<tr><td><strong>Batch Size</strong></td><td>Number of samples processed before updating weights</td></tr>
<tr><td><strong>Inference</strong></td><td>Using a trained model to make predictions on new data</td></tr>
<tr><td><strong>Transfer Learning</strong></td><td>Using a pre-trained model and adapting it for a new task</td></tr>
</tbody>
</table>

<h2 id="practice-questions"><strong>7. Practice Questions</strong></h2>

<p><strong>Q1:</strong> A company wants to predict whether customers will cancel their subscription (yes/no). Which ML approach is most appropriate?</p>
<ul>
<li>A) Unsupervised Learning — Clustering</li>
<li>B) Supervised Learning — Regression</li>
<li>C) Supervised Learning — Classification ✓</li>
<li>D) Reinforcement Learning</li>
</ul>
<p><em>Explanation: Predicting a binary outcome (yes/no) with labeled historical data = supervised classification.</em></p>

<p><strong>Q2:</strong> A retail company has customer purchase data but NO predefined groups. They want to segment customers into groups for targeted marketing. Which approach should they use?</p>
<ul>
<li>A) Supervised Learning — Classification</li>
<li>B) Unsupervised Learning — Clustering ✓</li>
<li>C) Reinforcement Learning</li>
<li>D) Supervised Learning — Regression</li>
</ul>
<p><em>Explanation: No labels + finding natural groups in data = unsupervised clustering.</em></p>

<p><strong>Q3:</strong> A model performs extremely well on training data (99% accuracy) but poorly on new data (65% accuracy). What is this called?</p>
<ul>
<li>A) Underfitting</li>
<li>B) Overfitting ✓</li>
<li>C) High bias</li>
<li>D) Regularization</li>
</ul>
<p><em>Explanation: High training accuracy + low test accuracy = overfitting (model memorized training data).</em></p>

@@ -0,0 +1,258 @@
---
id: 019c9619-lt01-d1-l02
title: 'Bài 2: ML Development Lifecycle & AWS AI Services Overview'
slug: bai-2-ml-lifecycle-aws-services
description: >-
  ML pipeline: data collection → feature engineering → training → evaluation → deployment.
  AWS AI/ML service stack. SageMaker, Rekognition, Comprehend, Polly,
  Transcribe, Translate, Textract, Lex, Personalize, Forecast, Kendra.
duration_minutes: 60
is_free: true
video_url: null
sort_order: 2
section_title: "Domain 1: Fundamentals of AI and ML (20%)"
course:
  id: 019c9619-lt01-7001-c001-lt0100000001
  title: 'Luyện thi AWS Certified AI Practitioner (AIF-C01)'
  slug: luyen-thi-aws-ai-practitioner
---

<div style="text-align: center; margin: 2rem 0;">
  <img src="/storage/uploads/2026/04/aws-aif-bai2-ml-lifecycle-pipeline.png" alt="ML Development Lifecycle Pipeline on AWS" style="max-width: 800px; width: 100%; border-radius: 12px;" />
  <p><em>The ML development lifecycle pipeline and the AWS AI/ML service stack</em></p>
</div>

<h2 id="ml-lifecycle"><strong>1. ML Development Lifecycle</strong></h2>

<p>The AIF-C01 exam expects you to understand the entire ML development lifecycle, from defining the business problem to deploying and monitoring the model.</p>

<pre><code class="language-text">┌─────────────┐    ┌──────────────┐    ┌──────────────┐
│ 1. Business │───→│ 2. Data      │───→│ 3. Feature   │
│    Problem  │    │ Collection & │    │ Engineering  │
│  Definition │    │ Preparation  │    │              │
└─────────────┘    └──────────────┘    └──────────────┘
                                              │
┌─────────────┐    ┌──────────────┐    ┌──────┴───────┐
│ 6. Monitor  │←───│ 5. Deploy    │←───│ 4. Model     │
│  & Retrain  │    │ & Inference  │    │ Training &   │
│             │    │              │    │ Evaluation   │
└─────────────┘    └──────────────┘    └──────────────┘
</code></pre>

<h3 id="step-1"><strong>Step 1: Business Problem Definition</strong></h3>

<ul>
<li>Determine whether the problem actually needs ML (sometimes rules-based is enough)</li>
<li>Define success metrics (KPIs)</li>
<li>Determine data availability</li>
</ul>

<blockquote>
<p><strong>Exam tip:</strong> "Not every problem needs ML." If the question describes a simple problem, a rule-based system or a lookup table may be enough.</p>
</blockquote>

<h3 id="step-2"><strong>Step 2: Data Collection & Preparation</strong></h3>

<ul>
<li><strong>Data Collection</strong>: Gather data from databases, APIs, IoT devices, logs</li>
<li><strong>Data Cleaning</strong>: Handle missing values, remove duplicates, fix errors</li>
<li><strong>Data Labeling</strong>: Label data for supervised learning → <strong>Amazon SageMaker Ground Truth</strong></li>
<li><strong>Exploratory Data Analysis (EDA)</strong>: Visualize, understand distributions, correlations</li>
</ul>
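<p>A toy cleaning pass over record dicts shows the deduplicate-and-impute idea (illustrative only; in practice tools such as SageMaker Data Wrangler or pandas do this at scale, and the records here are made up):</p>

```python
def clean_rows(rows, default_age=0):
    """Drop exact duplicate records and fill missing values."""
    seen, cleaned = set(), []
    for row in rows:
        key = (row.get("name"), row.get("age"))
        if key in seen:
            continue                      # remove duplicate record
        seen.add(key)
        row = dict(row)                   # copy so we don't mutate the input
        if row.get("age") is None:
            row["age"] = default_age      # impute the missing value
        cleaned.append(row)
    return cleaned

rows = [{"name": "an", "age": 30}, {"name": "an", "age": 30},
        {"name": "bao", "age": None}]
cleaned = clean_rows(rows)
```

<p>Choosing the imputation strategy (a constant, the mean, or dropping the row) is itself a data-preparation decision that affects the trained model.</p>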

<h3 id="step-3"><strong>Step 3: Feature Engineering</strong></h3>

<ul>
<li><strong>Feature selection</strong>: Keep the important features, drop the noise</li>
<li><strong>Feature transformation</strong>: Normalization, scaling, encoding</li>
<li><strong>Feature creation</strong>: Derive new features from raw data</li>
<li>AWS: <strong>SageMaker Data Wrangler</strong>, <strong>SageMaker Feature Store</strong></li>
</ul>
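<p>Two of the transformations above, sketched in plain Python (minimal versions of what feature-engineering tooling provides; the values are made up):</p>

```python
def min_max_scale(values):
    """Rescale a numeric feature into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]   # a constant feature carries no signal
    return [(v - lo) / (hi - lo) for v in values]

def one_hot(categories):
    """Encode a categorical feature as binary columns, one per category."""
    vocab = sorted(set(categories))
    return [[1 if c == v else 0 for v in vocab] for c in categories]
```

<p>Scaling keeps features with big ranges (income) from drowning out small ones (age); one-hot encoding turns text categories into numbers a model can consume.</p>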

<h3 id="step-4"><strong>Step 4: Model Training & Evaluation</strong></h3>

<ul>
<li>Choose algorithm appropriate for the problem</li>
<li>Split data into training/validation/test sets</li>
<li>Train model, tune hyperparameters</li>
<li>Evaluate using appropriate metrics (accuracy, F1, RMSE...)</li>
<li>AWS: <strong>Amazon SageMaker</strong> for full ML workflow</li>
</ul>

<h3 id="step-5"><strong>Step 5: Deployment & Inference</strong></h3>

<ul>
<li><strong>Real-time inference</strong>: An endpoint for instant predictions</li>
<li><strong>Batch inference</strong>: Process large datasets offline</li>
<li><strong>Edge deployment</strong>: Run model on edge devices</li>
<li>AWS: <strong>SageMaker Endpoints</strong>, <strong>Lambda</strong>, <strong>IoT Greengrass</strong></li>
</ul>

<h3 id="step-6"><strong>Step 6: Monitoring & Retraining</strong></h3>

<ul>
<li><strong>Model drift</strong>: Performance degrades over time as data changes</li>
<li><strong>Data drift</strong>: Input data distribution changes</li>
<li><strong>Concept drift</strong>: Relationship between input and output changes</li>
<li>Solution: Monitor → detect drift → retrain with new data</li>
<li>AWS: <strong>SageMaker Model Monitor</strong></li>
</ul>
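<p>A crude drift check compares live statistics against a training-time baseline. This is a sketch of the idea only; SageMaker Model Monitor computes much richer statistics, and the 25% threshold here is made up:</p>

```python
def mean_shift_drift(baseline, live, threshold=0.25):
    """Flag data drift when the live mean strays too far from the baseline."""
    base_mean = sum(baseline) / len(baseline)
    live_mean = sum(live) / len(live)
    # relative shift of the mean; guard against a zero baseline
    shift = abs(live_mean - base_mean) / (abs(base_mean) or 1.0)
    return shift > threshold
```

<p>When a check like this fires, the remedy is the loop named above: collect fresh labeled data and retrain.</p>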
|
|
100
|
+
|
|
101
|
+
<h2 id="aws-ai-stack"><strong>2. AWS AI/ML Service Stack</strong></h2>
|
|
102
|
+
|
|
103
|
+
<p>AWS cung cấp 3 layers of AI/ML services — từ high-level (no ML knowledge needed) đến low-level (full control):</p>
|
|
104
|
+
|
|
105
|
+
<pre><code class="language-text">┌─────────────────────────────────────────────────────┐
|
|
106
|
+
│ Layer 3: AI Services (Pre-trained, API-based) │
|
|
107
|
+
│ → Rekognition, Comprehend, Polly, Transcribe, │
|
|
108
|
+
│ Translate, Textract, Lex, Personalize, Forecast │
|
|
109
|
+
│ → NO ML expertise needed │
|
|
110
|
+
├─────────────────────────────────────────────────────┤
|
|
111
|
+
│ Layer 2: ML Services (Managed platform) │
|
|
112
|
+
│ → Amazon SageMaker, SageMaker JumpStart │
|
|
113
|
+
│ → Amazon Bedrock (GenAI) │
|
|
114
|
+
│ → SOME ML expertise needed │
|
|
115
|
+
├─────────────────────────────────────────────────────┤
|
|
116
|
+
│ Layer 1: ML Frameworks & Infrastructure │
|
|
117
|
+
│ → EC2 with GPU/Inferentia, Deep Learning AMIs, │
|
|
118
|
+
│ Deep Learning Containers │
|
|
119
|
+
│ → FULL ML expertise needed │
|
|
120
|
+
└─────────────────────────────────────────────────────┘
|
|
121
|
+
</code></pre>
|
|
122
|
+
|
|
123
|
+
<h2 id="ai-services"><strong>3. AWS AI Services — Bảng Tổng hợp</strong></h2>
|
|
124
|
+
|
|
125
|
+
<p>Đây là phần <strong>rất quan trọng cho đề thi</strong> — bạn cần biết mỗi service làm gì và khi nào dùng.</p>
|
|
126
|
+
|
|
127
|
+
<h3 id="vision"><strong>3.1. Computer Vision</strong></h3>
|
|
128
|
+
|
|
129
|
+
<table>
|
|
130
|
+
<thead><tr><th>Service</th><th>What it does</th><th>Use Cases</th></tr></thead>
|
|
131
|
+
<tbody>
|
|
132
|
+
<tr><td><strong>Amazon Rekognition</strong></td><td>Image and video analysis</td><td>Face detection, object detection, content moderation, celebrity recognition, text in images (OCR)</td></tr>
|
|
133
|
+
<tr><td><strong>Amazon Textract</strong></td><td>Extract text & data from documents</td><td>Invoice processing, ID document extraction, form data, table extraction</td></tr>
|
|
134
|
+
<tr><td><strong>Amazon Lookout for Vision</strong></td><td>Visual inspection for manufacturing</td><td>Defect detection in products on assembly line</td></tr>
|
|
135
|
+
</tbody>
|
|
136
|
+
</table>
|
|
137
|
+
|
|
138
|
+
<h3 id="nlp"><strong>3.2. Natural Language Processing (NLP)</strong></h3>
|
|
139
|
+
|
|
140
|
+
<table>
|
|
141
|
+
<thead><tr><th>Service</th><th>What it does</th><th>Use Cases</th></tr></thead>
|
|
142
|
+
<tbody>
|
|
143
|
+
<tr><td><strong>Amazon Comprehend</strong></td><td>NLP analysis</td><td>Sentiment analysis, entity extraction, key phrases, language detection, PII detection</td></tr>
|
|
144
|
+
<tr><td><strong>Amazon Translate</strong></td><td>Neural machine translation</td><td>Real-time translation, batch document translation</td></tr>
|
|
145
|
+
<tr><td><strong>Amazon Kendra</strong></td><td>Intelligent enterprise search</td><td>Internal knowledge search, FAQ, document search powered by NLP</td></tr>
|
|
146
|
+
</tbody>
|
|
147
|
+
</table>
|
|
148
|
+
|
|
149
|
+
<h3 id="speech"><strong>3.3. Speech</strong></h3>
|
|
150
|
+
|
|
151
|
+
<table>
|
|
152
|
+
<thead><tr><th>Service</th><th>What it does</th><th>Direction</th></tr></thead>
|
|
153
|
+
<tbody>
|
|
154
|
+
<tr><td><strong>Amazon Polly</strong></td><td>Text-to-Speech (TTS)</td><td>Text → Audio</td></tr>
|
|
155
|
+
<tr><td><strong>Amazon Transcribe</strong></td><td>Speech-to-Text (STT)</td><td>Audio → Text</td></tr>
|
|
156
|
+
<tr><td><strong>Amazon Lex</strong></td><td>Conversational AI (chatbot)</td><td>Build chatbots with voice & text (powers Alexa)</td></tr>
|
|
157
|
+
</tbody>
|
|
158
|
+
</table>
|
|
159
|
+
|
|
160
|
+
<blockquote>
|
|
161
|
+
<p><strong>Exam tip:</strong> Polly = text TO speech (Polly "speaks"). Transcribe = speech TO text (Transcribe "writes down").</p>
|
|
162
|
+
</blockquote>
|
|
163
|
+
|
|
164
|
+
<h3 id="predictions"><strong>3.4. Predictions & Recommendations</strong></h3>
|
|
165
|
+
|
|
166
|
+
<table>
|
|
167
|
+
<thead><tr><th>Service</th><th>What it does</th><th>Use Cases</th></tr></thead>
|
|
168
|
+
<tbody>
|
|
169
|
+
<tr><td><strong>Amazon Personalize</strong></td><td>Real-time personalization & recommendations</td><td>Product recommendations, personalized content</td></tr>
|
|
170
|
+
<tr><td><strong>Amazon Forecast</strong></td><td>Time-series forecasting</td><td>Demand planning, financial forecasting, resource planning</td></tr>
|
|
171
|
+
<tr><td><strong>Amazon Fraud Detector</strong></td><td>Detect online fraud</td><td>Payment fraud, fake accounts, account takeover</td></tr>
|
|
172
|
+
</tbody>
|
|
173
|
+
</table>
|
|
174
|
+
|
|
175
|
+
<h2 id="sagemaker"><strong>4. Amazon SageMaker Overview</strong></h2>
|
|
176
|
+
|
|
177
|
+
<p>SageMaker là <strong>fully managed ML platform</strong> — cung cấp mọi thứ cần thiết cho toàn bộ ML lifecycle.</p>
|
|
178
|
+
|
|
179
|
+
<h3 id="sagemaker-components"><strong>Key Components:</strong></h3>

<table>
<thead><tr><th>Component</th><th>Purpose</th></tr></thead>
<tbody>
<tr><td><strong>SageMaker Studio</strong></td><td>IDE for ML development (Jupyter-based)</td></tr>
<tr><td><strong>SageMaker Ground Truth</strong></td><td>Data labeling service (human + ML-assisted)</td></tr>
<tr><td><strong>SageMaker Data Wrangler</strong></td><td>Data preparation & transformation (no code)</td></tr>
<tr><td><strong>SageMaker Feature Store</strong></td><td>Store & share ML features</td></tr>
<tr><td><strong>SageMaker Training</strong></td><td>Managed training jobs with built-in algorithms</td></tr>
<tr><td><strong>SageMaker Autopilot</strong></td><td>AutoML — automatic model building</td></tr>
<tr><td><strong>SageMaker JumpStart</strong></td><td>Pre-trained models & solutions (model hub)</td></tr>
<tr><td><strong>SageMaker Endpoints</strong></td><td>Deploy models for real-time inference</td></tr>
<tr><td><strong>SageMaker Model Monitor</strong></td><td>Monitor deployed models for drift</td></tr>
<tr><td><strong>SageMaker Clarify</strong></td><td>Bias detection & model explainability</td></tr>
<tr><td><strong>SageMaker Canvas</strong></td><td>No-code ML for business users</td></tr>
</tbody>
</table>

<h3 id="sagemaker-decision"><strong>When to use SageMaker vs AI Services?</strong></h3>

<pre><code class="language-text">Need custom ML model? → SageMaker
Need pre-trained capability? → AI Services (Rekognition, Comprehend, etc.)
Need GenAI/Foundation Models? → Amazon Bedrock
Business user, no code? → SageMaker Canvas
</code></pre>

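<p>For self-study, the decision tree above can be mirrored as a tiny lookup helper. This is a hypothetical study aid, not an AWS API; the function name <code>pick_aws_service</code> and the rule keys are illustrative — only the service names come from the lesson.</p>

```python
# Hypothetical study helper mirroring the lesson's decision tree.
# The service names are from the lesson; the function itself is illustrative.
def pick_aws_service(need: str) -> str:
    """Return the AWS service suggested by the decision tree for a given need."""
    rules = {
        "custom ml model": "Amazon SageMaker",
        "pre-trained capability": "AI Services (Rekognition, Comprehend, etc.)",
        "genai / foundation models": "Amazon Bedrock",
        "no-code ml for business users": "SageMaker Canvas",
    }
    return rules.get(need.lower(), "unknown — revisit the decision tree")

print(pick_aws_service("Custom ML model"))           # Amazon SageMaker
print(pick_aws_service("GenAI / foundation models"))  # Amazon Bedrock
```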
<h2 id="service-mapping"><strong>5. Use Case → AWS Service Mapping</strong></h2>

<p>This use-case-to-service mapping is one of the most common question patterns on the exam:</p>

<table>
<thead><tr><th>Use Case</th><th>AWS Service</th></tr></thead>
<tbody>
<tr><td>Detect faces in photos</td><td>Amazon Rekognition</td></tr>
<tr><td>Extract data from invoices</td><td>Amazon Textract</td></tr>
<tr><td>Analyze customer review sentiment</td><td>Amazon Comprehend</td></tr>
<tr><td>Translate content to multiple languages</td><td>Amazon Translate</td></tr>
<tr><td>Build a customer service chatbot</td><td>Amazon Lex</td></tr>
<tr><td>Convert blog posts to audio</td><td>Amazon Polly</td></tr>
<tr><td>Transcribe meeting recordings</td><td>Amazon Transcribe</td></tr>
<tr><td>Product recommendations</td><td>Amazon Personalize</td></tr>
<tr><td>Demand forecasting</td><td>Amazon Forecast</td></tr>
<tr><td>Search internal documents</td><td>Amazon Kendra</td></tr>
<tr><td>Detect fraudulent transactions</td><td>Amazon Fraud Detector</td></tr>
<tr><td>Label training data</td><td>SageMaker Ground Truth</td></tr>
<tr><td>Build custom ML model</td><td>Amazon SageMaker</td></tr>
<tr><td>No-code ML for business analysts</td><td>SageMaker Canvas</td></tr>
<tr><td>Generate text with LLM</td><td>Amazon Bedrock</td></tr>
</tbody>
</table>

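<p>The mapping above can be drilled as flash cards with a small sketch. The dictionary and <code>quiz</code> helper are illustrative study aids, not AWS APIs; the pairs themselves come straight from the table.</p>

```python
# Flash-card sketch of the lesson's use-case → service table.
# The mapping data is from the lesson; the quiz helper is illustrative.
USE_CASE_TO_SERVICE = {
    "detect faces in photos": "Amazon Rekognition",
    "extract data from invoices": "Amazon Textract",
    "analyze customer review sentiment": "Amazon Comprehend",
    "translate content to multiple languages": "Amazon Translate",
    "build a customer service chatbot": "Amazon Lex",
    "convert blog posts to audio": "Amazon Polly",
    "transcribe meeting recordings": "Amazon Transcribe",
    "product recommendations": "Amazon Personalize",
    "demand forecasting": "Amazon Forecast",
    "search internal documents": "Amazon Kendra",
    "detect fraudulent transactions": "Amazon Fraud Detector",
    "label training data": "SageMaker Ground Truth",
    "build custom ml model": "Amazon SageMaker",
    "no-code ml for business analysts": "SageMaker Canvas",
    "generate text with llm": "Amazon Bedrock",
}

def quiz(use_case: str) -> str:
    """Look up the exam answer for a use case (case-insensitive)."""
    return USE_CASE_TO_SERVICE[use_case.lower()]

print(quiz("Extract data from invoices"))  # Amazon Textract
print(quiz("Demand forecasting"))          # Amazon Forecast
```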
<h2 id="practice-questions"><strong>6. Practice Questions</strong></h2>
<p><strong>Q1:</strong> A company wants to automatically extract text and structured data from scanned invoices. Which AWS service should they use?</p>
<ul>
<li>A) Amazon Comprehend</li>
<li>B) Amazon Rekognition</li>
<li>C) Amazon Textract ✓</li>
<li>D) Amazon Translate</li>
</ul>
<p><em>Explanation: Textract is specifically designed to extract text, forms, and tables from scanned documents. Comprehend analyzes text meaning, not document extraction. Rekognition is for image/video analysis.</em></p>
<p><strong>Q2:</strong> A data scientist notices that their deployed model's prediction accuracy has decreased over the past month. The input data patterns have changed. What is this called?</p>
<ul>
<li>A) Overfitting</li>
<li>B) Underfitting</li>
<li>C) Data drift ✓</li>
<li>D) Feature engineering</li>
</ul>
<p><em>Explanation: When the statistical properties of model input data change over time, causing performance degradation, this is called data drift.</em></p>
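<p>The core idea behind drift monitoring (which SageMaker Model Monitor automates) can be sketched in a few lines: compare a live feature's distribution against the training-time baseline. The threshold, data values, and <code>drift_detected</code> helper below are made-up illustrations, not SageMaker behavior.</p>

```python
# Minimal sketch of data drift detection: flag drift when the live mean
# moves far from the training baseline. Values and threshold are illustrative.
from statistics import mean, stdev

def drift_detected(baseline, live, z_threshold=3.0):
    """Flag drift when the live mean is more than z_threshold
    baseline standard deviations away from the baseline mean."""
    shift = abs(mean(live) - mean(baseline))
    return shift > z_threshold * stdev(baseline)

baseline = [10.1, 9.8, 10.0, 10.2, 9.9, 10.0]    # feature values at training time
stable   = [10.0, 9.9, 10.1, 10.0, 10.2, 9.8]    # similar distribution → no drift
shifted  = [14.9, 15.2, 15.0, 14.8, 15.1, 15.0]  # input pattern changed → drift

print(drift_detected(baseline, stable))   # False
print(drift_detected(baseline, shifted))  # True
```

<p>Real monitoring systems use richer statistics than a mean shift (e.g., full distribution distances), but the principle — compare live inputs against a recorded baseline — is the same.</p>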
<p><strong>Q3:</strong> Which AWS service allows business analysts with no ML experience to build ML models using a visual interface?</p>
<ul>
<li>A) SageMaker Studio</li>
<li>B) SageMaker Autopilot</li>
<li>C) SageMaker Canvas ✓</li>
<li>D) SageMaker JumpStart</li>
</ul>
<p><em>Explanation: SageMaker Canvas provides a no-code, visual point-and-click interface for business analysts. Autopilot automates model building but requires some ML knowledge. JumpStart provides pre-trained models. Studio is the full ML IDE.</em></p>
|