rag-skills 1.0.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.claude-plugin/marketplace.json +14 -0
- package/.claude-plugin/plugin.json +8 -0
- package/CONTRIBUTING.md +210 -0
- package/LICENSE +21 -0
- package/README.md +148 -0
- package/examples/foundational-rag-pipeline.md +104 -0
- package/examples/multi-agent-rag.md +111 -0
- package/examples/production-rag-setup.md +133 -0
- package/package.json +22 -0
- package/scripts/generate-index.py +276 -0
- package/scripts/validate-skills.py +214 -0
- package/skills/chunking/choosing-a-chunking-framework.md +186 -0
- package/skills/chunking/contextual-chunk-headers.md +106 -0
- package/skills/chunking/hierarchical-chunking.md +77 -0
- package/skills/chunking/semantic-chunking.md +78 -0
- package/skills/chunking/sliding-window-chunking.md +82 -0
- package/skills/data-type-handling/rag-for-code-documentation.md +83 -0
- package/skills/data-type-handling/rag-for-multimodal-content.md +83 -0
- package/skills/performance-optimization/optimize-retrieval-latency.md +88 -0
- package/skills/retrieval-strategies/adaptive-retrieval.md +102 -0
- package/skills/retrieval-strategies/context-enrichment-window.md +99 -0
- package/skills/retrieval-strategies/crag-corrective-rag.md +108 -0
- package/skills/retrieval-strategies/explainable-retrieval.md +106 -0
- package/skills/retrieval-strategies/graph-rag.md +107 -0
- package/skills/retrieval-strategies/hybrid-search-bm25-dense.md +81 -0
- package/skills/retrieval-strategies/hyde-hypothetical-document-embeddings.md +91 -0
- package/skills/retrieval-strategies/hype-hypothetical-prompt-embeddings.md +98 -0
- package/skills/retrieval-strategies/multi-pass-retrieval-with-reranking.md +82 -0
- package/skills/retrieval-strategies/query-transformation-strategies.md +93 -0
- package/skills/retrieval-strategies/raptor-hierarchical-retrieval.md +106 -0
- package/skills/retrieval-strategies/self-rag.md +108 -0
- package/skills/vector-databases/choosing-vector-db-by-datatype.md +112 -0
- package/skills/vector-databases/qdrant-for-production-rag.md +88 -0
- package/skills/vector-databases/qdrant-setup-rag.md +86 -0
- package/templates/skill-template.md +53 -0
- package/templates/workflow-template.md +67 -0
@@ -0,0 +1,108 @@ package/skills/retrieval-strategies/self-rag.md
---
title: "Self-RAG - Self-Reflective Retrieval"
description: "Use self-reflective loops to make dynamic retrieval decisions and assess response quality."
allowed-tools:
- Read
- Grep
- Glob
- Bash
category: "retrieval-strategies"
tags: ["self-rag", "reflection", "retrieval-decision", "relevance-evaluation"]
---

## Overview
Self-RAG is a reflective framework that decides whether to retrieve information, evaluates the relevance of retrieved documents, assesses response support, and rates output utility. This dynamic approach optimizes retrieval decisions based on query characteristics and retrieved content quality.

## Problem Statement
Traditional RAG systems lack introspection:
- **Blind Retrieval**: Always retrieves regardless of whether it's needed
- **Irrelevant Context**: Cannot filter out irrelevant retrieved documents
- **No Grounding Assessment**: Cannot verify if responses are supported by context
- **Fixed Strategy**: Cannot adapt to different query types
- **Poor Quality Control**: No mechanism to assess output utility

## Key Concepts
- **Retrieval Decision**: The LLM determines whether retrieval is necessary for the query
- **Relevance Evaluation**: Each retrieved document is assessed for query relevance
- **Response Generation**: Answers are generated using only relevant context
- **Support Assessment**: Evaluation of how well responses are grounded in context
- **Utility Rating**: Assessment of how well the response addresses the original query

## Implementation Guide

### Step 1: Implement Retrieval Decision
Use an LLM to determine whether the query requires retrieval or can be answered directly.

**Why**: Not all queries need retrieval; some can be answered from general knowledge or are simple enough that retrieval adds unnecessary cost and latency.

Direct the LLM to make a binary decision: retrieve from documents or answer without retrieval.
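
The gate above can be sketched with a placeholder `llm` callable (any chat-completion client that returns a string); the prompt wording is illustrative, not a prescribed format:

```python
# Sketch of a retrieval-decision gate. `llm` is a placeholder for any
# chat-completion callable that takes a prompt and returns a string.
def needs_retrieval(query: str, llm) -> bool:
    prompt = (
        "Decide whether answering the question below requires looking up "
        "external documents. Reply with exactly RETRIEVE or NO_RETRIEVE.\n\n"
        f"Question: {query}"
    )
    decision = llm(prompt).strip().upper()
    # Default to retrieving when the reply is malformed: the safe failure
    # mode is an unnecessary lookup, not an ungrounded answer.
    return decision != "NO_RETRIEVE"
```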

### Step 2: Implement Relevance Evaluation
Assess each retrieved document for its relevance to the original query.

**Why**: Not all retrieved documents are equally relevant. Filtering ensures only pertinent information is used for generation, reducing noise and improving answer quality.

For each retrieved document, have the LLM rate its relevance to the query on a scale or classify it as relevant/irrelevant.

### Step 3: Implement Context Filtering
Build a pipeline that filters documents based on relevance scores.

**Why**: A systematic filtering process ensures consistent quality and prevents irrelevant information from contaminating the generation process.

Select only documents that meet a relevance threshold for use in response generation.

### Step 4: Implement Response Generation
Generate responses using only relevant context, or without context if none is relevant.

**Why**: Using only relevant context improves factual grounding. When no relevant context exists, the system can still provide a response (with appropriate caveats) rather than forcing use of irrelevant information.

Generate responses conditioned on whether relevant context was found: using context if available, answering from general knowledge if not.

### Step 5: Implement Support Assessment
Evaluate how well the generated response is supported by the context.

**Why**: Support assessment provides a confidence metric and helps detect hallucinations: responses not supported by context can be flagged or refined.

Have the LLM classify the level of support: fully supported, partially supported, or no support.

### Step 6: Implement Utility Rating
Rate the usefulness of the generated response for the original query.

**Why**: Utility rating provides feedback on response quality and can be used for continuous improvement, A/B testing, or filtering low-quality responses.

Rate the response on a scale (e.g., 1-5) based on how well it addresses the user's query.

### Step 7: Build Complete Self-RAG Pipeline
Combine all components into a cohesive reflective system.

**Why**: A complete pipeline with clear decision points allows for monitoring, optimization, and understanding of system behavior.

Chain together the retrieval decision, relevance evaluation, response generation, and assessment steps into a unified workflow.
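
A minimal end-to-end sketch of that chain, assuming the caller supplies `llm(prompt) -> str` and `retrieve(query) -> list[str]`; the grading prompts are illustrative stand-ins, not the trained reflection tokens from the Self-RAG paper:

```python
# Minimal Self-RAG loop: decide, retrieve, grade, filter, generate, assess.
def self_rag(query, llm, retrieve):
    # Step 1: retrieval decision
    decision = llm(
        "Does answering this question require looking up documents? "
        f"Reply YES or NO.\nQuestion: {query}"
    ).strip().upper()
    if decision.startswith("NO"):
        return {"answer": llm(query), "context": [], "support": "not-applicable"}

    # Steps 2-3: relevance grading and filtering
    relevant = []
    for doc in retrieve(query):
        grade = llm(
            "Is this document relevant to the question? Reply RELEVANT or "
            f"IRRELEVANT.\nQuestion: {query}\nDocument: {doc}"
        ).strip().upper()
        if grade.startswith("RELEVANT"):
            relevant.append(doc)

    # Step 4: generation, with or without context
    if relevant:
        answer = llm(
            "Answer the question using only the context below.\nContext:\n"
            + "\n".join(relevant) + f"\nQuestion: {query}"
        )
        # Step 5: support assessment
        support = llm(
            "Is the answer fully, partially, or not supported by the context? "
            f"Reply FULL, PARTIAL, or NONE.\nContext: {relevant}\nAnswer: {answer}"
        ).strip().upper()
    else:
        # Fall back to parametric knowledge; callers should surface a caveat.
        answer = llm(query)
        support = "NONE"
    return {"answer": answer, "context": relevant, "support": support}
```

The same skeleton extends naturally with the Step 6 utility-rating call appended after support assessment.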

For implementation patterns, see the [Self-RAG paper](https://arxiv.org/abs/2310.11511), [LangGraph agentic RAG guide](https://docs.langchain.com/oss/python/langgraph/agentic-rag), and [LlamaIndex SelfRAG pack](https://docs.llamaindex.ai/en/stable/api_reference/packs/self_rag/).

## When to Use This Skill
- Building RAG systems where response accuracy is critical
- When retrieval quality varies significantly across queries
- For applications requiring confidence scoring
- When you need to detect and handle hallucinations
- For systems that must adapt to different query types

## When NOT to Use This Skill
- For extremely latency-sensitive applications (multiple LLM calls per query)
- When your retrieval pipeline already returns consistently high-precision results
- For simple FAQ systems where queries are well-structured
- When cost is a major constraint (multiple LLM calls per query)
- For real-time systems with strict latency budgets

## Related Skills
- [CRAG (Corrective RAG)](./crag-corrective-rag.md) - Dynamic correction approach
- [Adaptive Retrieval](./adaptive-retrieval.md) - Query-based strategy selection
- [Explainable Retrieval](./explainable-retrieval.md) - Citation and traceability

## Metrics & Success Criteria
- **Retrieval Efficiency**: Reduced unnecessary retrievals (often 30-50%)
- **Relevance Filtering**: High precision in identifying irrelevant documents
- **Response Accuracy**: Improved factual grounding of responses
- **Support Confidence**: Reliable support assessment metrics
- **Utility Rating**: Utility scores correlate consistently with human evaluation
@@ -0,0 +1,112 @@ package/skills/vector-databases/choosing-vector-db-by-datatype.md
---
title: "Choosing Vector Database by Data Type"
description: "Choose a vector database based on content type, metadata needs, and query patterns."
allowed-tools:
- Read
- Grep
- Glob
- Bash
category: "vector-databases"
tags: ["selection", "text", "multimodal", "code", "comparison"]
---

## Overview
Selecting the right vector database depends heavily on your data type (text, images, code, multimodal) and use case requirements. This skill provides a decision framework for choosing between popular vector databases based on data characteristics and operational constraints.

## Problem Statement
Different vector databases excel at different use cases:
- Text-heavy workloads need efficient semantic search
- Code repositories require specialized embedding handling
- Multimodal data needs multi-vector support
- Production constraints (cost, scaling, latency) vary widely
- Choosing the wrong database leads to poor performance or high costs

## Key Concepts
- **Data Type**: Text, code, images, audio, or multimodal combinations
- **Embedding Model Compatibility**: Support for different embedding dimensions and models
- **Multi-Vector Support**: Ability to store and search multiple embeddings per document
- **Scaling Model**: Vertical vs. horizontal scaling capabilities
- **Deployment Model**: Self-hosted, managed cloud, or hybrid

## Database Comparison Matrix

### Text-Heavy Workloads

| Database | Strengths | Weaknesses | Best For |
|----------|-----------|------------|----------|
| **[Qdrant](https://qdrant.tech/documentation/)** | Open-source, excellent filtering, hybrid search | Smaller community than Pinecone | Production RAG, cost-sensitive |
| **[Pinecone](https://docs.pinecone.io/)** | Managed, excellent scalability, easy setup | Expensive, vendor lock-in | Enterprise, zero-ops requirement |
| **[Weaviate](https://docs.weaviate.io/weaviate/introduction)** | GraphQL API, modular, schema-first | Learning curve, can be complex | Teams wanting flexible schemas |
| **[ChromaDB](https://docs.trychroma.com/)** | Simple API, embeds easily, open-source | Less scalable for production | Prototyping, small-medium scale |
| **[Milvus](https://milvus.io/docs)** | Highly scalable, feature-rich | Complex setup, heavy | Enterprise with data team |

### Code & Documentation

| Database | Strengths | Weaknesses | Best For |
|----------|-----------|------------|----------|
| **[Qdrant](https://qdrant.tech/documentation/)** | Great metadata filtering, payload indexing | No code-specific features | Code search with metadata |
| **[Weaviate](https://docs.weaviate.io/weaviate/introduction)** | Graph traversals, relationship handling | Setup complexity | Code graph/context search |
| **[ChromaDB](https://docs.trychroma.com/)** | Simple integration with code tools | Limited scaling | IDE integrations, code assistants |
| **[Elasticsearch](https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html)** | Text search + vectors, mature ecosystem | Not vector-first | Hybrid code + keyword search |

### Multimodal (Text + Images)

| Database | Strengths | Weaknesses | Best For |
|----------|-----------|------------|----------|
| **[Weaviate](https://docs.weaviate.io/weaviate/introduction)** | Multi-vector, CLIP support built-in | Performance overhead | Image + text search |
| **[Qdrant](https://qdrant.tech/documentation/)** | Multi-vector via named vectors | Manual configuration | Custom multimodal pipelines |
| **[Pinecone](https://docs.pinecone.io/)** | Multiple namespaces, sparse vectors | Limited multi-vector | Limited multimodal needs |
| **[Milvus](https://milvus.io/docs)** | Multi-vector, hybrid search | Complex setup | Large-scale multimodal systems |

## Implementation Guide

### Step 1: Analyze Your Data Type
Identify your primary data characteristics.

**Why**: Data profiling ensures your choice aligns with actual requirements rather than hype or popularity.

### Step 2: Evaluate Database Candidates
Score databases based on your requirements.

**Why**: A scoring framework provides objective criteria rather than subjective opinions.
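
One way to make the scoring concrete is a simple weighted matrix; the criteria, weights, and 1-5 ratings below are invented examples to show the mechanics, not benchmark results:

```python
# Weighted-scoring sketch for comparing candidate databases.
# Replace the example weights and ratings with values from your own
# requirements analysis and proof-of-concept measurements.
WEIGHTS = {"filtering": 0.3, "scalability": 0.25, "ops_simplicity": 0.25, "cost": 0.2}

# Example 1-5 ratings per criterion (illustrative, not measured)
candidates = {
    "qdrant":   {"filtering": 5, "scalability": 4, "ops_simplicity": 3, "cost": 5},
    "pinecone": {"filtering": 3, "scalability": 5, "ops_simplicity": 5, "cost": 2},
    "chromadb": {"filtering": 3, "scalability": 2, "ops_simplicity": 5, "cost": 5},
}

def score(ratings: dict) -> float:
    # Weighted sum across criteria; higher is better.
    return sum(WEIGHTS[criterion] * value for criterion, value in ratings.items())

ranked = sorted(candidates, key=lambda name: score(candidates[name]), reverse=True)
```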

### Step 3: Implement Type-Specific Strategies
Tailor your approach to each data type.

**Why**: Different data types require fundamentally different approaches for optimal performance.

### Step 4: Proof of Concept
Create a PoC to validate your choice.

**Why**: Nothing beats testing with your actual data and queries; PoCs reveal real-world performance characteristics.

### Step 5: Migration Path
Plan for potential future database changes.

**Why**: An export strategy prevents lock-in and enables future migrations if requirements change.

For comparisons and vendor docs, use [ANN Benchmarks](https://ann-benchmarks.com/), [Qdrant documentation](https://qdrant.tech/documentation/), [Pinecone documentation](https://docs.pinecone.io/), [Weaviate documentation](https://docs.weaviate.io/weaviate/introduction), [ChromaDB documentation](https://docs.trychroma.com/), [Milvus documentation](https://milvus.io/docs), and [Elasticsearch documentation](https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html).

## When to Use This Skill
- Starting a new RAG project and need database selection guidance
- Evaluating whether to switch vector databases
- Building a multimodal RAG system
- When cost is a significant constraint
- When scaling requirements are unclear

## When NOT to Use This Skill
- When you already have a working solution that meets requirements
- For trivial prototypes (< 10k documents) where any DB will work
- When using a managed service that handles database choice

## Related Skills
- [Qdrant Setup for RAG](./qdrant-setup-rag.md) - Setting up Qdrant
- [RAG for Code Documentation](../data-type-handling/rag-for-code-documentation.md) - Code-specific RAG
- [RAG for Multimodal Content](../data-type-handling/rag-for-multimodal-content.md) - Multimodal RAG

## Metrics & Success Criteria
- **Selection Accuracy**: Chosen database meets functional requirements
- **Performance**: Meets or exceeds latency/throughput targets
- **Cost**: Within budget constraints
- **Scalability**: Can grow with expected data volume
- **Operational Overhead**: Matches team capabilities
@@ -0,0 +1,88 @@ package/skills/vector-databases/qdrant-for-production-rag.md
---
title: "Qdrant for Production RAG"
description: "Run Qdrant reliably in production with scaling, backups, monitoring, and operational tuning."
allowed-tools:
- Read
- Grep
- Glob
- Bash
category: "vector-databases"
tags: ["production", "scaling", "optimization", "deployment"]
---

## Overview
Productionizing a RAG system with Qdrant requires considerations beyond basic setup: horizontal scaling, high availability, performance optimization, monitoring, and cost management. This skill covers deploying Qdrant at scale with resilience and efficiency.

## Problem Statement
Moving from development to production introduces challenges:
- Single-node deployments can't handle growing query volume and data load
- Indexing parameters need tuning for performance vs. recall trade-offs
- Data durability and backup strategies are critical
- Monitoring and alerting are essential for production reliability
- Cost optimization becomes important as scale increases

## Key Concepts
- **Horizontal Scaling**: Distributing load across multiple Qdrant nodes
- **Replication**: Data redundancy for high availability
- **Sharding**: Partitioning data across nodes for scalability
- **Performance Tuning**: Optimizing index parameters and cache settings
- **Observability**: Monitoring metrics and setting up alerts

## Implementation Guide

### Step 1: Choose Deployment Strategy
Select between self-hosted, managed cloud, or hybrid approaches.

**Why**: Self-hosted gives control but requires ops overhead; managed cloud provides scalability with less operational burden.

### Step 2: Configure Production Collections
Optimize collection parameters for production workloads.

**Why**: Production configurations balance performance, memory usage, and build time based on workload characteristics.

### Step 3: Implement Efficient Batch Operations
Optimize ingestion and query batching for production scale.

**Why**: Asynchronous batch operations maximize throughput and prevent bottlenecks during ingestion.

### Step 4: Configure Caching and Connection Pooling
Optimize connection handling and implement query caching.

**Why**: Caching reduces load on Qdrant for repeated queries and improves response times for common questions.
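
An in-process LRU sketch of the caching pattern; production deployments usually put this in a shared store such as Redis, and `search_fn` stands in for the actual Qdrant query call:

```python
from collections import OrderedDict

class QueryCache:
    """Least-recently-used cache keyed on the raw query string."""

    def __init__(self, search_fn, max_size=1024):
        self.search_fn = search_fn  # the real retrieval call, e.g. a Qdrant search
        self.max_size = max_size
        self._store = OrderedDict()
        self.hits = 0

    def search(self, query: str):
        if query in self._store:
            self._store.move_to_end(query)  # mark as most recently used
            self.hits += 1
            return self._store[query]
        result = self.search_fn(query)
        self._store[query] = result
        if len(self._store) > self.max_size:
            self._store.popitem(last=False)  # evict least recently used
        return result
```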

### Step 5: Set Up Monitoring and Alerting
Implement observability for production RAG systems.

**Why**: Metrics enable proactive issue detection and capacity planning before problems impact users.

### Step 6: Implement Backup and Recovery
Set up automated backups for data durability.

**Why**: Regular backups protect against data loss and enable quick recovery from failures.

For deployment guidance, review the [Qdrant snapshots tutorial](https://qdrant.tech/documentation/tutorials-operations/create-snapshot/), [Qdrant snapshots concept](https://qdrant.tech/documentation/concepts/snapshots/), and [HNSW tuning guide](https://github.com/nmslib/hnswlib/blob/master/ALGO_PARAMS.md).

## When to Use This Skill
- Deploying RAG to production environments
- Handling high query volume (> 100 QPS)
- When data durability and high availability are critical
- When scaling beyond single-node deployments
- For cost optimization at scale

## When NOT to Use This Skill
- Early prototyping or MVP development
- Small datasets (< 100k vectors)
- When using managed services that handle these concerns
- For experimental or temporary deployments

## Related Skills
- [Qdrant Setup for RAG](./qdrant-setup-rag.md) - Basic setup
- [Optimize Retrieval Latency](../performance-optimization/optimize-retrieval-latency.md) - Performance tuning
- [Multi-Agent RAG](../../examples/multi-agent-rag.md) - Complex workflows

## Metrics & Success Criteria
- **Availability**: > 99.9% uptime
- **Query Latency**: P50 < 50ms, P99 < 200ms for typical queries
- **Ingestion Throughput**: > 100,000 vectors/hour
- **Recovery Time**: RTO < 1 hour, RPO < 5 minutes
- **Cost Efficiency**: < $0.01 per 1000 queries at scale
@@ -0,0 +1,86 @@ package/skills/vector-databases/qdrant-setup-rag.md
---
title: "Qdrant Setup for RAG"
description: "Set up Qdrant for RAG with collections, payload filtering, and batch ingestion."
allowed-tools:
- Read
- Grep
- Glob
- Bash
category: "vector-databases"
tags: ["qdrant", "setup", "ingestion", "filtering"]
---

## Overview
Qdrant is an open-source vector similarity search engine designed for high-performance RAG applications. This skill covers setting up Qdrant, creating collections with proper indexing, implementing filtering with metadata, and managing document ingestion.

## Problem Statement
Setting up a vector database for RAG involves multiple considerations:
- Choosing the right distance metric for your embedding model
- Configuring optimal index parameters for performance vs. recall
- Structuring metadata for effective filtering
- Handling batch ingestion efficiently
- Managing collection schema changes

## Key Concepts
- **Distance Metric**: Cosine similarity, dot product, or Euclidean distance for vector comparisons
- **Index Type**: HNSW (Hierarchical Navigable Small World) for approximate nearest neighbors
- **Payload Filtering**: Querying vectors based on associated metadata
- **Collection Schema**: Defining vector dimensions and payload structure
- **Batch Operations**: Efficient bulk insertion of vectors

## Implementation Guide

### Step 1: Install and Initialize Qdrant
Set up Qdrant locally or connect to a cloud instance.

**Why**: Docker provides an isolated environment with persistence, while the cloud option offers managed scalability.
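
For the local Docker route, the quickstart commands look like this (the host storage path is an example; 6333 is the REST port and 6334 the gRPC port):

```shell
# Pull and run Qdrant locally with persistent storage.
docker pull qdrant/qdrant
docker run -p 6333:6333 -p 6334:6334 \
    -v "$(pwd)/qdrant_storage:/qdrant/storage" \
    qdrant/qdrant
```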

### Step 2: Create a Collection
Configure collection parameters based on your embedding model and use case.

**Why**: Proper configuration at creation time avoids expensive re-indexing later. Cosine distance is the standard choice for normalized embeddings.

### Step 3: Design Metadata Schema
Structure your payloads (metadata) for effective filtering.

**Why**: Well-structured metadata enables powerful filtering queries and helps organize retrieval results.

### Step 4: Ingest Documents
Upload documents with their embeddings and metadata.

**Why**: Batch operations significantly improve ingestion performance compared to individual uploads.

### Step 5: Implement Filtering Queries
Query with metadata filters for targeted retrieval.

**Why**: Filtering enables domain-specific retrieval (e.g., "search only in documentation" or "search only code examples").

### Step 6: Monitor and Maintain
Set up monitoring for collection health.

**Why**: Monitoring helps identify performance issues and scaling needs before they impact users.

See the [Qdrant documentation](https://qdrant.tech/documentation/), [Qdrant filtering](https://qdrant.tech/documentation/concepts/filtering/), [Qdrant Python client](https://github.com/qdrant/qdrant-client), and the [HNSW paper](https://arxiv.org/abs/1603.09320) for production-ready setup patterns.

## When to Use This Skill
- Building a new RAG application from scratch
- Setting up a local vector database for development
- Migrating from another vector database to Qdrant
- When you need open-source with no vendor lock-in

## When NOT to Use This Skill
- When you need a managed, zero-ops solution (consider Qdrant Cloud)
- For very small datasets (< 1000 vectors) where a simple in-memory list might suffice
- When you need specialized features like multi-vector indexing (consider other DBs)

## Related Skills
- [Qdrant for Production RAG](./qdrant-for-production-rag.md) - Scaling for production
- [Choosing Vector DB by Datatype](./choosing-vector-db-by-datatype.md) - Database selection guide
- [Optimize Retrieval Latency](../performance-optimization/optimize-retrieval-latency.md) - Performance tuning

## Metrics & Success Criteria
- **Setup Time**: < 10 minutes to get a working collection
- **Ingestion Speed**: > 10,000 vectors/minute on modest hardware
- **Query Latency**: < 50ms for queries on 1M vectors
- **Indexing**: Automatic HNSW indexing within reasonable time
- **Recall**: > 0.95 for 10 nearest neighbors
@@ -0,0 +1,53 @@ package/templates/skill-template.md
---
title: "Skill Name"
description: "Use this template to define an agent-friendly RAG skill with concise guidance, references, and allowed tool access."
allowed-tools:
- Read
- Grep
- Glob
- Bash
category: "chunking|vector-databases|retrieval-strategies|data-type-handling|performance-optimization|evaluation-metrics|rag-agents|deployment"
tags: ["tag1", "tag2", "tag3"]
---

## Overview
[2-3 sentence description of what this skill covers and why it matters for RAG systems]

## Problem Statement
[Describe the specific challenge this skill addresses]

## Key Concepts
- **Concept 1**: Brief explanation
- **Concept 2**: Brief explanation
- **Concept 3**: Brief explanation

## Implementation Guide

### Step 1: [Step Name]
[Detailed explanation with reasoning]

### Step 2: [Step Name]
[Detailed explanation with reasoning]

### Step 3: [Step Name]
[Detailed explanation with reasoning]

## When to Use This Skill
- Use case 1
- Use case 2
- Use case 3

## When NOT to Use This Skill
- Anti-pattern 1
- Anti-pattern 2

## Related Skills
- [Related Skill 1](../category/skill-name.md)
- [Related Skill 2](../category/skill-name.md)

## Implementation Resources
Fold external links into the surrounding text where possible. When a skill needs source material, reference the implementation inline, for example [project name](https://example.com) in the step or sentence that uses it.

## Metrics & Success Criteria
- Success indicator 1
- Success indicator 2
@@ -0,0 +1,67 @@ package/templates/workflow-template.md
---
title: "Workflow Name"
description: "Use this template to define an agent workflow with steps, inputs, outputs, and reference notes."
allowed-tools:
- Read
- Grep
- Glob
- Bash
category: "rag-agents"
tags: ["workflow", "multi-step", "orchestration"]
---

## Overview
[Description of this multi-step workflow and its purpose in RAG systems]

## Prerequisites
- [Required Skill 1](../category/skill-name.md)
- [Required Skill 2](../category/skill-name.md)
- Required dependencies (Python packages, services, etc.)

## Workflow Diagram

## Workflow Steps

### Step 1: [Step Name]
**Purpose**: [What this step accomplishes]

**Input**: [What data enters this step]
**Output**: [What this step produces]
**Notes**: [Short guidance or decision criteria]

### Step 2: [Step Name]
**Purpose**: [What this step accomplishes]

**Input**: [What data enters this step]
**Output**: [What this step produces]
**Notes**: [Short guidance or decision criteria]

### Step 3: [Step Name]
**Purpose**: [What this step accomplishes]

**Input**: [What data enters this step]
**Output**: [What this step produces]
**Notes**: [Short guidance or decision criteria]

## Inline Implementation Notes

Fold implementation links into the step that uses them. If a workflow needs supporting docs or repos, link them in the step purpose or notes rather than collecting them in a separate reference block.

## Configuration Options
| Option | Description | Default | Notes |
|--------|-------------|---------|-------|
| option1 | Description | value1 | Note |
| option2 | Description | value2 | Note |

## Troubleshooting
| Issue | Cause | Solution |
|-------|-------|----------|
| Issue description | Root cause | Fix |

## Performance Considerations
- [Performance tip 1]
- [Performance tip 2]

## Related Workflows
- [Related Workflow 1](workflow-name.md)
- [Related Workflow 2](workflow-name.md)
|