@xdev-asia/xdev-knowledge-mcp 1.0.40 → 1.0.42
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/content/series/luyen-thi/luyen-thi-aws-ai-practitioner/chapters/01-domain-1-fundamentals-ai-ml/lessons/01-bai-1-ai-ml-deep-learning-concepts.md +287 -0
- package/content/series/luyen-thi/luyen-thi-aws-ai-practitioner/chapters/01-domain-1-fundamentals-ai-ml/lessons/02-bai-2-ml-lifecycle-aws-services.md +258 -0
- package/content/series/luyen-thi/luyen-thi-aws-ai-practitioner/chapters/02-domain-2-fundamentals-generative-ai/lessons/03-bai-3-generative-ai-foundation-models.md +218 -0
- package/content/series/luyen-thi/luyen-thi-aws-ai-practitioner/chapters/02-domain-2-fundamentals-generative-ai/lessons/04-bai-4-llm-transformers-multimodal.md +232 -0
- package/content/series/luyen-thi/luyen-thi-aws-ai-practitioner/chapters/03-domain-3-applications-foundation-models/lessons/05-bai-5-prompt-engineering-techniques.md +254 -0
- package/content/series/luyen-thi/luyen-thi-aws-ai-practitioner/chapters/03-domain-3-applications-foundation-models/lessons/06-bai-6-rag-vector-databases-knowledge-bases.md +244 -0
- package/content/series/luyen-thi/luyen-thi-aws-ai-practitioner/chapters/03-domain-3-applications-foundation-models/lessons/07-bai-7-fine-tuning-model-customization.md +247 -0
- package/content/series/luyen-thi/luyen-thi-aws-ai-practitioner/chapters/03-domain-3-applications-foundation-models/lessons/08-bai-8-amazon-bedrock-deep-dive.md +276 -0
- package/content/series/luyen-thi/luyen-thi-aws-ai-practitioner/chapters/04-domain-4-responsible-ai/lessons/09-bai-9-responsible-ai-fairness-bias-transparency.md +224 -0
- package/content/series/luyen-thi/luyen-thi-aws-ai-practitioner/chapters/04-domain-4-responsible-ai/lessons/10-bai-10-aws-responsible-ai-tools.md +252 -0
- package/content/series/luyen-thi/luyen-thi-aws-ai-practitioner/chapters/05-domain-5-security-compliance/lessons/11-bai-11-ai-security-data-privacy-compliance.md +279 -0
- package/content/series/luyen-thi/luyen-thi-aws-ai-practitioner/chapters/05-domain-5-security-compliance/lessons/12-bai-12-exam-strategy-cheat-sheet.md +229 -0
- package/content/series/luyen-thi/luyen-thi-aws-ai-practitioner/index.md +257 -0
- package/content/series/luyen-thi/luyen-thi-aws-ml-specialty/index.md +240 -0
- package/content/series/luyen-thi/luyen-thi-gcp-ml-engineer/index.md +225 -0
- package/data/categories.json +16 -4
- package/data/quizzes.json +764 -0
- package/data/settings.json +2 -1
- package/package.json +1 -1

@@ -0,0 +1,254 @@
---
id: 019c9619-lt01-d3-l05
title: 'Bài 5: Prompt Engineering Techniques'
slug: bai-5-prompt-engineering-techniques
description: >-
  Zero-shot, few-shot, Chain-of-Thought prompting.
  System prompts, prompt templates, negative prompting.
  Prompt engineering best practices for the AWS AI Practitioner exam.
duration_minutes: 55
is_free: true
video_url: null
sort_order: 1
section_title: "Domain 3: Applications of Foundation Models (28%)"
course:
  id: 019c9619-lt01-7001-c001-lt0100000001
  title: 'Luyện thi AWS Certified AI Practitioner (AIF-C01)'
  slug: luyen-thi-aws-ai-practitioner
---

<div style="text-align: center; margin: 2rem 0;">
<img src="/storage/uploads/2026/04/aws-aif-bai5-prompt-engineering.png" alt="Prompt Engineering Techniques" style="max-width: 800px; width: 100%; border-radius: 12px;" />
<p><em>Prompt Engineering Techniques: Zero-shot, Few-shot, and Chain-of-Thought</em></p>
</div>

<h2 id="prompt-engineering"><strong>1. What is Prompt Engineering?</strong></h2>

<p><strong>Prompt engineering</strong> is the craft of designing the input (prompt) so that a Foundation Model produces the desired output. It is the <strong>cheapest and fastest</strong> way to customize FM behavior: no training or fine-tuning required.</p>

<h3 id="prompt-components"><strong>1.1. Components of a Prompt</strong></h3>

<pre><code class="language-text">┌────────────────────────────────────────────┐
│ SYSTEM PROMPT (optional)                   │
│ "You are a helpful AWS solutions           │
│  architect. Answer concisely."             │
├────────────────────────────────────────────┤
│ CONTEXT (optional)                         │
│ Background info, documents, data           │
├────────────────────────────────────────────┤
│ USER PROMPT (required)                     │
│ The actual question or instruction         │
├────────────────────────────────────────────┤
│ EXAMPLES (optional, for few-shot)          │
│ Input → Output pairs                       │
├────────────────────────────────────────────┤
│ OUTPUT FORMAT (optional)                   │
│ "Respond in JSON", "Use bullet points"     │
└────────────────────────────────────────────┘
</code></pre>
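
The component stack above can be assembled mechanically. A minimal sketch, assuming a plain-text concatenation format (the function and argument names are illustrative, not part of any AWS SDK):

```python
# Illustrative only: assemble optional prompt components around the
# required user prompt, in the order shown in the diagram above.

def assemble_prompt(user_prompt, system=None, context=None,
                    examples=None, output_format=None):
    """Join the provided components with blank lines between them."""
    parts = []
    if system:
        parts.append(system)
    if context:
        parts.append(f"Context:\n{context}")
    for example_in, example_out in examples or []:
        parts.append(f"Input: {example_in}\nOutput: {example_out}")
    parts.append(user_prompt)
    if output_format:
        parts.append(output_format)
    return "\n\n".join(parts)

prompt = assemble_prompt(
    "Which AWS service provides managed RAG?",
    system="You are a helpful AWS solutions architect. Answer concisely.",
    output_format="Respond in one sentence.",
)
print(prompt)
```

Only the user prompt is required; every other component is appended when present, mirroring the optional slots in the diagram.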

<h2 id="prompting-techniques"><strong>2. Prompting Techniques</strong></h2>

<h3 id="zero-shot"><strong>2.1. Zero-shot Prompting</strong></h3>

<p>Send a prompt <strong>without any examples</strong>. The model relies entirely on the knowledge it learned during training.</p>

<pre><code class="language-text">Prompt: "Classify the sentiment of this review:
'The product arrived damaged and customer service was unhelpful.'

Sentiment:"

Output: "Negative"
</code></pre>

<p><strong>When to use:</strong> Simple, well-defined tasks the model already understands well.</p>

<h3 id="few-shot"><strong>2.2. Few-shot Prompting</strong></h3>

<p>Provide <strong>a few examples</strong> before the actual task. This helps the model infer the expected format and logic.</p>

<pre><code class="language-text">Prompt: "Classify these reviews:

Review: 'Amazing quality, fast shipping!' → Positive
Review: 'Terrible experience, never again.' → Negative
Review: 'It's okay, nothing special.' → Neutral

Review: 'The product exceeded my expectations!' →"

Output: "Positive"
</code></pre>

<p><strong>When to use:</strong> When the model must follow a specific format or logic pattern and zero-shot output is not good enough.</p>
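
Few-shot prompts like the one above are usually generated from stored example pairs rather than written by hand. A minimal sketch (the function name and layout are illustrative, not an AWS API):

```python
# Illustrative only: build a few-shot classification prompt from
# (text, label) example pairs, ending with the unanswered query.

def build_few_shot_prompt(task, examples, query):
    """Format examples as "Review: '...' → Label" lines, then the query."""
    lines = [task, ""]
    for text, label in examples:
        lines.append(f"Review: '{text}' → {label}")
    lines.append("")
    lines.append(f"Review: '{query}' →")
    return "\n".join(lines)

prompt = build_few_shot_prompt(
    "Classify these reviews:",
    [("Amazing quality, fast shipping!", "Positive"),
     ("Terrible experience, never again.", "Negative"),
     ("It's okay, nothing special.", "Neutral")],
    "The product exceeded my expectations!",
)
print(prompt)
```

Leaving the prompt dangling after the final `→` nudges the model to complete the pattern with just a label.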

<h3 id="one-shot"><strong>2.3. One-shot Prompting</strong></h3>

<p>A variation of few-shot that provides only <strong>one example</strong>. Use it when you want to set a pattern but the context window is limited.</p>

<h3 id="cot"><strong>2.4. Chain-of-Thought (CoT) Prompting</strong></h3>

<p>Ask the model to <strong>reason step by step</strong> before answering. Especially effective for math, logic, and reasoning tasks.</p>

<pre><code class="language-text">WITHOUT CoT:
Q: "If a store has 3 boxes with 12 apples each, and gives
   away 15 apples, how many are left?"
A: "21" (might be wrong without reasoning)

WITH CoT:
Q: "Think step by step: If a store has 3 boxes with 12
   apples each, and gives away 15 apples, how many are left?"
A: "Step 1: Total apples = 3 × 12 = 36
   Step 2: After giving away = 36 - 15 = 21
   Answer: 21 apples"
</code></pre>

<blockquote>
<p><strong>Exam tip:</strong> "Which prompting technique improves reasoning accuracy?" → <strong>Chain-of-Thought</strong>. Key phrases: "think step by step" or "explain your reasoning".</p>
</blockquote>

<h2 id="system-prompts"><strong>3. System Prompts & Personas</strong></h2>

<p>A <strong>system prompt</strong> defines the model's role, behavior, constraints, and output format. It "sets the stage" before any user interaction.</p>

<pre><code class="language-text">System Prompt:
"You are a financial advisor AI for XYZ Bank.
Rules:
- Only answer questions about banking and investments
- Never provide specific stock recommendations
- Always include a disclaimer
- Respond in a professional tone
- If asked about non-financial topics, politely redirect"
</code></pre>

<h3 id="system-prompt-use"><strong>System Prompt Best Practices:</strong></h3>

<table>
<thead><tr><th>Practice</th><th>Why</th></tr></thead>
<tbody>
<tr><td>Define a <strong>clear role</strong></td><td>Constrains model behavior to a domain</td></tr>
<tr><td>Set <strong>boundaries</strong></td><td>Prevents off-topic or harmful responses</td></tr>
<tr><td>Specify <strong>output format</strong></td><td>Ensures consistent, parseable outputs</td></tr>
<tr><td>Include <strong>examples</strong></td><td>Clarifies expected behavior</td></tr>
<tr><td>Add <strong>guardrails</strong></td><td>Prevents misuse (PII, harmful content)</td></tr>
</tbody>
</table>

<h2 id="advanced-techniques"><strong>4. Advanced Prompting Techniques</strong></h2>

<h3 id="negative-prompting"><strong>4.1. Negative Prompting</strong></h3>

<p>Explicitly state what the model should <strong>NOT</strong> do. Especially useful in image generation.</p>

<pre><code class="language-text">Text generation:
"Summarize this article. Do NOT include opinions
or personal commentary. Do NOT exceed 100 words."

Image generation (Stable Diffusion):
Prompt: "Professional headshot, studio lighting"
Negative prompt: "blurry, cartoon, distorted, low quality"
</code></pre>

<h3 id="prompt-templates"><strong>4.2. Prompt Templates</strong></h3>

<p>Reusable prompt structures with <strong>placeholders</strong> for dynamic content:</p>

<pre><code class="language-text">Template:
"Given the following {document_type}:
---
{content}
---
Extract the following information:
- {field_1}
- {field_2}
- {field_3}
Respond in JSON format."
</code></pre>
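
The template above maps directly onto Python's built-in string formatting. A minimal sketch (the invoice values are made up for illustration):

```python
# Illustrative only: fill the placeholder template with str.format.
TEMPLATE = """Given the following {document_type}:
---
{content}
---
Extract the following information:
- {field_1}
- {field_2}
- {field_3}
Respond in JSON format."""

prompt = TEMPLATE.format(
    document_type="invoice",
    content="Invoice #123 from ACME Corp, total $450, due 2026-05-01.",
    field_1="invoice number",
    field_2="vendor name",
    field_3="total amount",
)
print(prompt)
```

Production systems typically store such templates separately from code so prompts can be versioned and iterated without redeployment.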

<h3 id="prompt-chaining"><strong>4.3. Prompt Chaining</strong></h3>

<p>Break a complex task into <strong>multiple sequential prompts</strong>, where the output of one prompt becomes the input of the next.</p>

<pre><code class="language-text">Step 1: "Extract key entities from this document: {doc}"
        → Output: list of entities

Step 2: "For each entity {entities}, find the sentiment
        expressed about it in this text: {doc}"
        → Output: entity-sentiment pairs

Step 3: "Create a summary report of sentiment analysis
        for these entities: {entity_sentiments}"
        → Output: Final report
</code></pre>
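
The three-step chain above can be sketched in code. Here `call_model` is a hypothetical stub that returns canned replies so the chain runs locally; a real application would call an FM endpoint at that point:

```python
# Illustrative only: the model call is stubbed with canned replies
# keyed on the first word of each prompt.

def call_model(prompt):
    """Stand-in for a real foundation-model call."""
    canned = {
        "Extract": "ACME Corp, delivery service",
        "For": "ACME Corp: positive; delivery service: negative",
        "Create": "Report: ACME Corp praised, delivery criticized.",
    }
    return canned[prompt.split()[0]]

doc = "ACME Corp makes great products but their delivery service is slow."

# Each step feeds the previous step's output into the next prompt.
entities = call_model(f"Extract key entities from this document: {doc}")
sentiments = call_model(
    f"For each entity {entities}, find the sentiment expressed "
    f"about it in this text: {doc}"
)
report = call_model(
    f"Create a summary report of sentiment analysis for these entities: {sentiments}"
)
print(report)
```

The design benefit is that each step is small, testable, and prompt-tunable in isolation, at the cost of extra model calls and latency.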

<h2 id="comparison"><strong>5. Comparison Table for Exam</strong></h2>

<table>
<thead><tr><th>Technique</th><th>Examples Given?</th><th>Best For</th><th>Exam Keyword</th></tr></thead>
<tbody>
<tr><td><strong>Zero-shot</strong></td><td>None</td><td>Simple, well-known tasks</td><td>"no examples provided"</td></tr>
<tr><td><strong>One-shot</strong></td><td>1 example</td><td>Setting format with minimal context</td><td>"single example"</td></tr>
<tr><td><strong>Few-shot</strong></td><td>2-5 examples</td><td>Pattern following, classification</td><td>"examples provided", "demonstrations"</td></tr>
<tr><td><strong>Chain-of-Thought</strong></td><td>With reasoning steps</td><td>Math, logic, complex reasoning</td><td>"step by step", "reasoning"</td></tr>
<tr><td><strong>Negative prompting</strong></td><td>N/A</td><td>Avoiding unwanted outputs</td><td>"do not include", "avoid"</td></tr>
<tr><td><strong>Prompt chaining</strong></td><td>N/A</td><td>Complex multi-step tasks</td><td>"break into steps", "sequential"</td></tr>
</tbody>
</table>

<h2 id="inference-params"><strong>6. Inference Parameters Review</strong></h2>

<p>Prompt engineering also includes tuning inference parameters:</p>

<table>
<thead><tr><th>Parameter</th><th>Low Value</th><th>High Value</th></tr></thead>
<tbody>
<tr><td><strong>Temperature</strong></td><td>Deterministic, factual (0.0-0.3)</td><td>Creative, diverse (0.7-1.0)</td></tr>
<tr><td><strong>Top-p</strong></td><td>Focused vocabulary (0.1-0.3)</td><td>Diverse vocabulary (0.9-1.0)</td></tr>
<tr><td><strong>Top-k</strong></td><td>Limited choices (e.g., 10)</td><td>More choices (e.g., 250)</td></tr>
<tr><td><strong>Max tokens</strong></td><td>Short responses</td><td>Long responses</td></tr>
<tr><td><strong>Stop sequences</strong></td><td colspan="2">Define when to stop generating</td></tr>
</tbody>
</table>

<blockquote>
<p><strong>Exam tip:</strong> "A customer support chatbot gives inconsistent answers" → Lower <strong>temperature</strong> (closer to 0). "A creative writing app produces boring text" → Raise <strong>temperature</strong> (closer to 1).</p>
</blockquote>
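
Temperature's effect is easy to see numerically: it divides the model's raw scores (logits) before the softmax, so low values sharpen the distribution toward the top token and high values flatten it. A toy illustration (not an AWS API; the logit values are made up):

```python
# Illustrative only: how temperature reshapes a next-token distribution.
import math

def apply_temperature(logits, temperature):
    """Numerically stable softmax over logits / temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]                 # raw scores for three candidate tokens
cold = apply_temperature(logits, 0.2)    # near-deterministic: top token dominates
hot = apply_temperature(logits, 1.5)     # flatter: more diverse sampling

print([round(p, 3) for p in cold])
print([round(p, 3) for p in hot])
```

With temperature 0.2 the top token gets over 99% of the probability mass; at 1.5 it drops to roughly half, which is why low temperature reads as "consistent" and high as "creative".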

<h2 id="best-practices"><strong>7. Prompt Engineering Best Practices</strong></h2>

<ol>
<li><strong>Be specific</strong>: "Summarize in 3 bullet points" beats "Summarize this"</li>
<li><strong>Provide context</strong>: Include relevant background information</li>
<li><strong>Define output format</strong>: JSON, markdown, table, bullet points</li>
<li><strong>Use delimiters</strong>: Separate sections with --- or ``` to avoid prompt injection</li>
<li><strong>Iterate</strong>: Test and refine prompts based on outputs</li>
<li><strong>Avoid ambiguity</strong>: Don't assume the model knows your intent</li>
<li><strong>Use examples</strong>: When zero-shot doesn't work, add few-shot examples</li>
</ol>

<h2 id="practice-questions"><strong>8. Practice Questions</strong></h2>

<p><strong>Q1:</strong> A developer is working on a classification task, but the model's zero-shot responses are inconsistent. Which prompting technique should the developer try NEXT?</p>
<ul>
<li>A) Reduce the temperature to 0</li>
<li>B) Use few-shot prompting with example inputs and outputs ✓</li>
<li>C) Fine-tune the model on custom data</li>
<li>D) Switch to a different model provider</li>
</ul>
<p><em>Explanation: Few-shot prompting is the logical next step after zero-shot fails: providing examples helps the model understand the expected pattern. Fine-tuning is more expensive and complex. Temperature adjustment alone may not fix classification logic.</em></p>

<p><strong>Q2:</strong> A customer wants their AI application to solve complex mathematical word problems more accurately. Which prompting technique would MOST improve the results?</p>
<ul>
<li>A) Zero-shot prompting</li>
<li>B) Negative prompting</li>
<li>C) Chain-of-Thought prompting ✓</li>
<li>D) Prompt chaining</li>
</ul>
<p><em>Explanation: Chain-of-Thought prompting encourages the model to show its reasoning step by step, which significantly improves accuracy on mathematical and logical reasoning tasks.</em></p>

<p><strong>Q3:</strong> Which of the following is a benefit of using a system prompt in a generative AI application?</p>
<ul>
<li>A) It eliminates the need for user input</li>
<li>B) It reduces the model's inference cost</li>
<li>C) It defines the model's role, behavior, and constraints ✓</li>
<li>D) It replaces the need for fine-tuning</li>
</ul>
<p><em>Explanation: System prompts set the model's role, behavioral constraints, and output format, establishing consistent behavior across all user interactions without any model training.</em></p>

@@ -0,0 +1,244 @@
---
id: 019c9619-lt01-d3-l06
title: 'Bài 6: RAG, Vector Databases & Bedrock Knowledge Bases'
slug: bai-6-rag-vector-databases-knowledge-bases
description: >-
  Retrieval-Augmented Generation (RAG) architecture.
  Vector databases, embeddings, chunking strategies.
  Amazon Bedrock Knowledge Bases. RAG vs fine-tuning comparison.
duration_minutes: 60
is_free: true
video_url: null
sort_order: 2
section_title: "Domain 3: Applications of Foundation Models (28%)"
course:
  id: 019c9619-lt01-7001-c001-lt0100000001
  title: 'Luyện thi AWS Certified AI Practitioner (AIF-C01)'
  slug: luyen-thi-aws-ai-practitioner
---

<div style="text-align: center; margin: 2rem 0;">
<img src="/storage/uploads/2026/04/aws-aif-bai6-rag-architecture.png" alt="RAG Architecture" style="max-width: 800px; width: 100%; border-radius: 12px;" />
<p><em>RAG Architecture: the Indexing Phase and Query Phase with Amazon Bedrock Knowledge Bases</em></p>
</div>

<h2 id="rag-overview"><strong>1. What is RAG?</strong></h2>

<p><strong>Retrieval-Augmented Generation (RAG)</strong> is a technique that combines an FM with <strong>external knowledge sources</strong> to answer more accurately, reduce hallucination, and supply up-to-date information the model has never seen.</p>

<h3 id="why-rag"><strong>1.1. Why RAG?</strong></h3>

<table>
<thead><tr><th>Problem</th><th>RAG Solution</th></tr></thead>
<tbody>
<tr><td>Knowledge cutoff date</td><td>Retrieve latest documents</td></tr>
<tr><td>Hallucination</td><td>Ground responses in real data</td></tr>
<tr><td>No domain knowledge</td><td>Add company-specific documents</td></tr>
<tr><td>Generic answers</td><td>Cite specific sources</td></tr>
<tr><td>Privacy: can't send data to FM training</td><td>Keep data in your own vector DB</td></tr>
</tbody>
</table>

<h3 id="rag-flow"><strong>1.2. RAG Architecture</strong></h3>

<pre><code class="language-text">RAG Pipeline:

┌─────────────────────────────────────────────────────────────┐
│ INDEXING (Done once / periodically)                         │
│                                                             │
│ Documents → Chunking → Embedding Model → Vector Database    │
│ (PDF, web,  (split     (Amazon Titan     (OpenSearch,       │
│  S3, etc.)   text)      Embeddings)       Aurora pgvector)  │
└─────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────┐
│ RETRIEVAL & GENERATION (Per query)                          │
│                                                             │
│ User Query → Embed Query → Search Vector DB → Top-K docs    │
│                                                             │
│ Augmented Prompt = System Prompt + Retrieved Docs + Query   │
│                                                             │
│ Augmented Prompt → Foundation Model → Answer with sources   │
└─────────────────────────────────────────────────────────────┘
</code></pre>
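
The whole pipeline fits in a few lines if we fake the hard parts. A minimal end-to-end sketch using toy bag-of-words "embeddings" and cosine similarity; a real system would use a learned embedding model (e.g., Amazon Titan Embeddings) and a vector database, so everything here is purely illustrative:

```python
# Illustrative only: toy RAG with word-count vectors standing in for
# real embeddings, and a Python list standing in for a vector DB.
import math
from collections import Counter

def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a real model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Indexing phase: chunk → embed → store.
chunks = [
    "Refunds are processed within 5 business days.",
    "Our office is open Monday to Friday.",
    "Shipping to Europe takes 10 days.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

# Query phase: embed query → retrieve best chunk → build augmented prompt.
query = "How long do refunds take?"
q = embed(query)
best_chunk = max(index, key=lambda item: cosine(q, item[1]))[0]

augmented_prompt = f"Answer using this context:\n{best_chunk}\n\nQuestion: {query}"
print(augmented_prompt)
```

The augmented prompt (context plus question) is what actually reaches the foundation model, which is why RAG needs no model training at all.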

<h2 id="chunking"><strong>2. Chunking Strategies</strong></h2>

<p>Before embeddings are created, documents must be <strong>split (chunked)</strong> into appropriately sized pieces.</p>

<table>
<thead><tr><th>Strategy</th><th>Description</th><th>Best For</th></tr></thead>
<tbody>
<tr><td><strong>Fixed-size</strong></td><td>Split every N characters/tokens</td><td>Simple, uniform documents</td></tr>
<tr><td><strong>Sentence-based</strong></td><td>Split at sentence boundaries</td><td>Narrative text</td></tr>
<tr><td><strong>Paragraph-based</strong></td><td>Split at paragraph breaks</td><td>Well-structured documents</td></tr>
<tr><td><strong>Semantic</strong></td><td>Split based on topic changes</td><td>Complex documents</td></tr>
<tr><td><strong>Hierarchical</strong></td><td>Parent-child chunk relationships</td><td>Long documents with sections</td></tr>
</tbody>
</table>

<h3 id="chunk-size"><strong>Chunk Size Trade-offs:</strong></h3>

<pre><code class="language-text">Small chunks (100-200 tokens):
✓ More precise retrieval
✗ May lose context
✗ More chunks to search

Large chunks (500-1000 tokens):
✓ More context preserved
✗ May include irrelevant info
✗ Fewer chunks, less granular

Overlap (e.g., 20% between chunks):
✓ Prevents information loss at boundaries
✗ Increases storage and compute
</code></pre>
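
Fixed-size chunking with overlap is simple enough to sketch directly. This toy version counts words rather than tokens to stay dependency-free, and assumes `overlap` is smaller than the chunk size:

```python
# Illustrative only: fixed-size chunking where consecutive chunks
# share `overlap` words, so information at boundaries is not lost.

def chunk_text(words_per_chunk, overlap, text):
    """Split text into word-count chunks with the given overlap."""
    words = text.split()
    step = words_per_chunk - overlap     # assumes overlap < words_per_chunk
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + words_per_chunk]))
        if start + words_per_chunk >= len(words):
            break
    return chunks

doc = " ".join(f"w{i}" for i in range(10))
chunks = chunk_text(4, 1, doc)   # chunks of 4 words, 1-word overlap
print(chunks)  # ['w0 w1 w2 w3', 'w3 w4 w5 w6', 'w6 w7 w8 w9']
```

Note how `w3` and `w6` appear in two chunks each: that is the overlap doing its job at the boundaries, at the cost of slightly more storage.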

<blockquote>
<p><strong>Exam tip:</strong> "How to improve RAG retrieval accuracy?" → Adjust <strong>chunk size</strong>, add <strong>overlap</strong>, use <strong>semantic chunking</strong>, improve the <strong>embedding model</strong>.</p>
</blockquote>

<h2 id="embeddings"><strong>3. Embeddings for RAG</strong></h2>

<h3 id="embedding-models"><strong>3.1. AWS Embedding Models</strong></h3>

<table>
<thead><tr><th>Model</th><th>Modality</th><th>Dimensions</th><th>Use Case</th></tr></thead>
<tbody>
<tr><td><strong>Amazon Titan Text Embeddings V2</strong></td><td>Text</td><td>256/512/1024</td><td>Semantic search, RAG</td></tr>
<tr><td><strong>Amazon Titan Multimodal Embeddings</strong></td><td>Text + Image</td><td>256/384/1024</td><td>Cross-modal search</td></tr>
<tr><td><strong>Cohere Embed</strong></td><td>Text</td><td>1024</td><td>Multilingual search</td></tr>
</tbody>
</table>

<h3 id="vector-db"><strong>3.2. Vector Databases on AWS</strong></h3>

<table>
<thead><tr><th>Service</th><th>Type</th><th>Key Feature</th></tr></thead>
<tbody>
<tr><td><strong>Amazon OpenSearch Serverless</strong></td><td>Managed</td><td>Vector search collection type, serverless</td></tr>
<tr><td><strong>Amazon Aurora PostgreSQL</strong></td><td>RDB + Vector</td><td>pgvector extension</td></tr>
<tr><td><strong>Amazon Neptune</strong></td><td>Graph + Vector</td><td>Knowledge graphs with vector search</td></tr>
<tr><td><strong>Amazon DocumentDB</strong></td><td>Document + Vector</td><td>MongoDB-compatible with vector search</td></tr>
<tr><td><strong>Amazon MemoryDB</strong></td><td>In-memory + Vector</td><td>Redis-compatible, ultra-low latency</td></tr>
<tr><td><strong>Pinecone (3rd party)</strong></td><td>Dedicated vector DB</td><td>Popular, integrates with Bedrock</td></tr>
</tbody>
</table>

<h2 id="bedrock-kb"><strong>4. Amazon Bedrock Knowledge Bases</strong></h2>

<p><strong>Bedrock Knowledge Bases</strong> is a <strong>fully managed RAG solution</strong>. AWS handles chunking, embedding, indexing, and retrieval; you only point it at your data sources.</p>

<h3 id="kb-architecture"><strong>4.1. How It Works</strong></h3>

<pre><code class="language-text">Setup:
┌───────────┐     ┌───────────────┐     ┌──────────────────┐
│ S3 Bucket │────→│ Bedrock       │────→│ Vector Store     │
│ (docs)    │     │ Knowledge Base│     │ (OpenSearch/     │
│           │     │ (auto-chunk,  │     │  Aurora/Pinecone)│
│           │     │  auto-embed)  │     │                  │
└───────────┘     └───────────────┘     └──────────────────┘

Query:
┌───────────┐     ┌───────────────┐     ┌──────────────────┐
│ User      │────→│ Knowledge Base│────→│ FM (Claude,      │
│ "What is  │     │ retrieves     │     │ Titan, etc.)     │
│ the..."   │     │ relevant docs │     │ generates answer │
└───────────┘     └───────────────┘     └──────────────────┘
</code></pre>

<h3 id="kb-data-sources"><strong>4.2. Supported Data Sources</strong></h3>

<ul>
<li><strong>Amazon S3</strong>: PDF, TXT, MD, HTML, DOC, CSV</li>
<li><strong>Web Crawler</strong>: Crawl websites automatically</li>
<li><strong>Confluence</strong>: Atlassian Confluence pages</li>
<li><strong>SharePoint</strong>: Microsoft SharePoint documents</li>
<li><strong>Salesforce</strong>: Salesforce knowledge articles</li>
</ul>

<h3 id="kb-features"><strong>4.3. Key Features</strong></h3>

<table>
<thead><tr><th>Feature</th><th>Benefit</th></tr></thead>
<tbody>
<tr><td><strong>Managed chunking</strong></td><td>Auto-splits documents (fixed, semantic, hierarchical)</td></tr>
<tr><td><strong>Auto-sync</strong></td><td>Periodically re-indexes when data changes</td></tr>
<tr><td><strong>Source attribution</strong></td><td>Returns source documents with answers</td></tr>
<tr><td><strong>Metadata filtering</strong></td><td>Filter chunks by custom metadata fields</td></tr>
<tr><td><strong>Hybrid search</strong></td><td>Combines semantic + keyword search</td></tr>
<tr><td><strong>Guardrails integration</strong></td><td>Apply safety filters to RAG responses</td></tr>
</tbody>
</table>

<blockquote>
<p><strong>Exam tip:</strong> "A company wants to build a chatbot that answers questions from internal documents stored in S3, with minimal custom code" → <strong>Amazon Bedrock Knowledge Bases</strong>.</p>
</blockquote>

<h2 id="rag-vs-finetuning"><strong>5. RAG vs Fine-tuning</strong></h2>

<table>
<thead><tr><th>Factor</th><th>RAG</th><th>Fine-tuning</th></tr></thead>
<tbody>
<tr><td><strong>Purpose</strong></td><td>Access external/current data</td><td>Teach new skills/domain patterns</td></tr>
<tr><td><strong>Data freshness</strong></td><td>Always up-to-date</td><td>Fixed at training time</td></tr>
<tr><td><strong>Training required?</strong></td><td>No model training</td><td>Yes, needs labeled data + compute</td></tr>
<tr><td><strong>Cost</strong></td><td>Vector DB + retrieval costs</td><td>Training compute + storage</td></tr>
<tr><td><strong>Hallucination</strong></td><td>Reduced (grounded in data)</td><td>May still hallucinate</td></tr>
<tr><td><strong>Latency</strong></td><td>Slightly higher (retrieval step)</td><td>Same as base model</td></tr>
<tr><td><strong>Best for</strong></td><td>Q&A, search, knowledge bases</td><td>Style, tone, domain-specific patterns</td></tr>
<tr><td><strong>Data privacy</strong></td><td>Data stays in your vector DB</td><td>Data used in training process</td></tr>
</tbody>
</table>

<h3 id="when-to-use"><strong>Decision Matrix:</strong></h3>

<pre><code class="language-text">"Need to answer from company docs?" → RAG
"Need real-time/latest information?" → RAG
"Need to change the model's writing style?" → Fine-tuning
"Need the model to follow a specific format?" → Try prompting first → then fine-tuning
"Need domain-specific terminology?" → RAG (if in docs) or Fine-tuning (if patterns)
"Minimum effort/cost?" → Prompt Engineering > RAG > Fine-tuning
</code></pre>
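
The decision matrix above can be encoded as a small lookup for quick revision. Purely illustrative (real projects weigh several of these factors together rather than matching a single keyword):

```python
# Illustrative only: the decision matrix as a rule table.

def choose_customization(need):
    rules = {
        "company_docs": "RAG",
        "latest_info": "RAG",
        "writing_style": "Fine-tuning",
        "specific_format": "Prompting first, then fine-tuning",
        "minimum_cost": "Prompt engineering",
    }
    return rules.get(need, "Start with prompt engineering")

print(choose_customization("company_docs"))   # RAG
print(choose_customization("writing_style"))  # Fine-tuning
```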

<h2 id="rag-evaluation"><strong>6. Evaluating RAG Quality</strong></h2>

<table>
<thead><tr><th>Metric</th><th>What it measures</th></tr></thead>
<tbody>
<tr><td><strong>Faithfulness</strong></td><td>Is the answer grounded in retrieved docs? (no hallucination)</td></tr>
<tr><td><strong>Relevance</strong></td><td>Are retrieved documents relevant to the query?</td></tr>
<tr><td><strong>Answer correctness</strong></td><td>Is the final answer factually correct?</td></tr>
<tr><td><strong>Context precision</strong></td><td>What % of retrieved chunks are actually relevant?</td></tr>
<tr><td><strong>Context recall</strong></td><td>Did we retrieve all relevant chunks?</td></tr>
</tbody>
</table>
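
Context precision and recall are ordinary set precision/recall over chunk IDs. A minimal sketch (the chunk IDs are made up for illustration):

```python
# Illustrative only: context precision = relevant retrieved / retrieved,
# context recall = relevant retrieved / all relevant.

def precision_recall(retrieved, relevant):
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Retrieved 4 chunks, 2 of them relevant; 3 relevant chunks exist overall.
p, r = precision_recall(["c1", "c2", "c3", "c4"], ["c1", "c4", "c7"])
print(p, r)  # 0.5 precision, ~0.667 recall
```

Low precision means the retriever pads the prompt with noise; low recall means it misses evidence the answer needs, so the two are tuned together (chunk size, top-K, hybrid search).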

<h2 id="practice-questions"><strong>7. Practice Questions</strong></h2>

<p><strong>Q1:</strong> A healthcare company wants an AI assistant that answers questions from their latest medical research papers stored in Amazon S3. The information changes weekly. Which approach is MOST suitable?</p>
<ul>
<li>A) Fine-tune a foundation model on the papers</li>
<li>B) Use RAG with Amazon Bedrock Knowledge Bases ✓</li>
<li>C) Use zero-shot prompting with a large context window</li>
<li>D) Pre-train a custom model on medical data</li>
</ul>
<p><em>Explanation: RAG with Bedrock Knowledge Bases is ideal: it automatically indexes S3 documents, retrieves relevant information per query, and keeps responses current without retraining. Weekly updates are handled by auto-sync.</em></p>

<p><strong>Q2:</strong> What is the PRIMARY purpose of chunking documents in a RAG pipeline?</p>
<ul>
<li>A) To reduce storage costs</li>
<li>B) To split documents into manageable pieces for embedding and retrieval ✓</li>
<li>C) To encrypt sensitive data</li>
<li>D) To convert documents to a different file format</li>
</ul>
<p><em>Explanation: Chunking splits large documents into smaller, semantically meaningful pieces that can be individually embedded and retrieved. This enables precise retrieval of relevant information rather than processing entire documents.</em></p>

<p><strong>Q3:</strong> A company built a RAG application, but it sometimes returns answers not supported by the retrieved documents. Which metric should they focus on improving?</p>
<ul>
<li>A) Context recall</li>
<li>B) Answer length</li>
<li>C) Faithfulness ✓</li>
<li>D) Response latency</li>
</ul>
<p><em>Explanation: Faithfulness measures whether the generated answer is grounded in the retrieved documents. Low faithfulness means the model is generating information beyond what the retrieved context supports (hallucination in a RAG context).</em></p>