opencode-skills-collection 1.0.185 → 1.0.187

Files changed (71)
  1. package/bundled-skills/.antigravity-install-manifest.json +5 -1
  2. package/bundled-skills/3d-web-experience/SKILL.md +152 -37
  3. package/bundled-skills/agent-evaluation/SKILL.md +1088 -26
  4. package/bundled-skills/agent-memory-systems/SKILL.md +1037 -25
  5. package/bundled-skills/agent-tool-builder/SKILL.md +668 -16
  6. package/bundled-skills/ai-agents-architect/SKILL.md +271 -31
  7. package/bundled-skills/ai-product/SKILL.md +716 -26
  8. package/bundled-skills/ai-wrapper-product/SKILL.md +450 -44
  9. package/bundled-skills/algolia-search/SKILL.md +867 -15
  10. package/bundled-skills/autonomous-agents/SKILL.md +1033 -26
  11. package/bundled-skills/aws-serverless/SKILL.md +1046 -35
  12. package/bundled-skills/azure-functions/SKILL.md +1318 -19
  13. package/bundled-skills/browser-automation/SKILL.md +1065 -28
  14. package/bundled-skills/browser-extension-builder/SKILL.md +159 -32
  15. package/bundled-skills/bullmq-specialist/SKILL.md +347 -16
  16. package/bundled-skills/clerk-auth/SKILL.md +796 -15
  17. package/bundled-skills/computer-use-agents/SKILL.md +1870 -28
  18. package/bundled-skills/context-window-management/SKILL.md +271 -18
  19. package/bundled-skills/conversation-memory/SKILL.md +453 -24
  20. package/bundled-skills/crewai/SKILL.md +252 -46
  21. package/bundled-skills/discord-bot-architect/SKILL.md +1207 -34
  22. package/bundled-skills/docs/integrations/jetski-cortex.md +3 -3
  23. package/bundled-skills/docs/integrations/jetski-gemini-loader/README.md +1 -1
  24. package/bundled-skills/docs/maintainers/repo-growth-seo.md +3 -3
  25. package/bundled-skills/docs/maintainers/skills-update-guide.md +1 -1
  26. package/bundled-skills/docs/users/bundles.md +1 -1
  27. package/bundled-skills/docs/users/claude-code-skills.md +1 -1
  28. package/bundled-skills/docs/users/gemini-cli-skills.md +1 -1
  29. package/bundled-skills/docs/users/getting-started.md +1 -1
  30. package/bundled-skills/docs/users/kiro-integration.md +1 -1
  31. package/bundled-skills/docs/users/usage.md +4 -4
  32. package/bundled-skills/docs/users/visual-guide.md +4 -4
  33. package/bundled-skills/email-systems/SKILL.md +646 -26
  34. package/bundled-skills/faf-expert/SKILL.md +221 -0
  35. package/bundled-skills/faf-wizard/SKILL.md +252 -0
  36. package/bundled-skills/file-uploads/SKILL.md +212 -11
  37. package/bundled-skills/firebase/SKILL.md +646 -16
  38. package/bundled-skills/gcp-cloud-run/SKILL.md +1117 -32
  39. package/bundled-skills/graphql/SKILL.md +1026 -27
  40. package/bundled-skills/hubspot-integration/SKILL.md +804 -19
  41. package/bundled-skills/idea-darwin/SKILL.md +120 -0
  42. package/bundled-skills/inngest/SKILL.md +431 -16
  43. package/bundled-skills/interactive-portfolio/SKILL.md +342 -44
  44. package/bundled-skills/langfuse/SKILL.md +296 -41
  45. package/bundled-skills/langgraph/SKILL.md +259 -50
  46. package/bundled-skills/micro-saas-launcher/SKILL.md +343 -44
  47. package/bundled-skills/neon-postgres/SKILL.md +572 -15
  48. package/bundled-skills/nextjs-supabase-auth/SKILL.md +269 -21
  49. package/bundled-skills/notion-template-business/SKILL.md +371 -44
  50. package/bundled-skills/personal-tool-builder/SKILL.md +537 -44
  51. package/bundled-skills/plaid-fintech/SKILL.md +825 -19
  52. package/bundled-skills/prompt-caching/SKILL.md +438 -25
  53. package/bundled-skills/rag-engineer/SKILL.md +271 -29
  54. package/bundled-skills/salesforce-development/SKILL.md +912 -19
  55. package/bundled-skills/satori/SKILL.md +54 -0
  56. package/bundled-skills/scroll-experience/SKILL.md +381 -44
  57. package/bundled-skills/segment-cdp/SKILL.md +817 -19
  58. package/bundled-skills/shopify-apps/SKILL.md +1475 -19
  59. package/bundled-skills/slack-bot-builder/SKILL.md +1162 -28
  60. package/bundled-skills/telegram-bot-builder/SKILL.md +152 -37
  61. package/bundled-skills/telegram-mini-app/SKILL.md +445 -44
  62. package/bundled-skills/trigger-dev/SKILL.md +916 -27
  63. package/bundled-skills/twilio-communications/SKILL.md +1310 -28
  64. package/bundled-skills/upstash-qstash/SKILL.md +898 -27
  65. package/bundled-skills/vercel-deployment/SKILL.md +637 -39
  66. package/bundled-skills/viral-generator-builder/SKILL.md +132 -37
  67. package/bundled-skills/voice-agents/SKILL.md +937 -27
  68. package/bundled-skills/voice-ai-development/SKILL.md +375 -46
  69. package/bundled-skills/workflow-automation/SKILL.md +982 -29
  70. package/bundled-skills/zapier-make-patterns/SKILL.md +772 -27
  71. package/package.json +1 -1
package/bundled-skills/rag-engineer/SKILL.md
@@ -1,13 +1,18 @@
  ---
  name: rag-engineer
- description: "I bridge the gap between raw documents and LLM understanding. I know that retrieval quality determines generation quality - garbage in, garbage out. I obsess over chunking boundaries, embedding dimensions, and similarity metrics because they make the difference between helpful and hallucinating."
+ description: Expert in building Retrieval-Augmented Generation systems. Masters
+ embedding models, vector databases, chunking strategies, and retrieval
+ optimization for LLM applications.
  risk: unknown
- source: "vibeship-spawner-skills (Apache 2.0)"
- date_added: "2026-02-27"
+ source: vibeship-spawner-skills (Apache 2.0)
+ date_added: 2026-02-27
  ---

  # RAG Engineer

+ Expert in building Retrieval-Augmented Generation systems. Masters embedding models,
+ vector databases, chunking strategies, and retrieval optimization for LLM applications.
+
  **Role**: RAG Systems Architect

  I bridge the gap between raw documents and LLM understanding. I know that
@@ -15,6 +20,25 @@ retrieval quality determines generation quality - garbage in, garbage out.
  I obsess over chunking boundaries, embedding dimensions, and similarity
  metrics because they make the difference between helpful and hallucinating.

+ ### Expertise
+
+ - Embedding model selection and fine-tuning
+ - Vector database architecture and scaling
+ - Chunking strategies for different content types
+ - Retrieval quality optimization
+ - Hybrid search implementation
+ - Re-ranking and filtering strategies
+ - Context window management
+ - Evaluation metrics for retrieval
+
+ ### Principles
+
+ - Retrieval quality > Generation quality - fix retrieval first
+ - Chunk size depends on content type and query patterns
+ - Embeddings are not magic - they have blind spots
+ - Always evaluate retrieval separately from generation
+ - Hybrid search beats pure semantic in most cases
+
  ## Capabilities

  - Vector embeddings and similarity search
@@ -24,11 +48,9 @@ metrics because they make the difference between helpful and hallucinating.
  - Context window optimization
  - Hybrid search (keyword + semantic)

- ## Requirements
+ ## Prerequisites

- - LLM fundamentals
- - Understanding of embeddings
- - Basic NLP concepts
+ - Required skills: LLM fundamentals, Understanding of embeddings, Basic NLP concepts

  ## Patterns

@@ -36,60 +58,280 @@ metrics because they make the difference between helpful and hallucinating.

  Chunk by meaning, not arbitrary token counts

- ```javascript
+ **When to use**: Processing documents with natural sections
+
  - Use sentence boundaries, not token limits
  - Detect topic shifts with embedding similarity
  - Preserve document structure (headers, paragraphs)
  - Include overlap for context continuity
  - Add metadata for filtering
- ```

  ### Hierarchical Retrieval

  Multi-level retrieval for better precision

- ```javascript
+ **When to use**: Large document collections with varied granularity
+
  - Index at multiple chunk sizes (paragraph, section, document)
  - First pass: coarse retrieval for candidates
  - Second pass: fine-grained retrieval for precision
  - Use parent-child relationships for context
- ```

  ### Hybrid Search

  Combine semantic and keyword search

- ```javascript
+ **When to use**: Queries may be keyword-heavy or semantic
+
  - BM25/TF-IDF for keyword matching
  - Vector similarity for semantic matching
  - Reciprocal Rank Fusion for combining scores
  - Weight tuning based on query type
- ```

- ## Anti-Patterns
+ ### Query Expansion
+
+ Expand queries to improve recall
+
+ **When to use**: User queries are short or ambiguous
+
+ - Use LLM to generate query variations
+ - Add synonyms and related terms
+ - Hypothetical Document Embedding (HyDE)
+ - Multi-query retrieval with deduplication
+
+ ### Contextual Compression
+
+ Compress retrieved context to fit window
+
+ **When to use**: Retrieved chunks exceed context limits
+
+ - Extract relevant sentences only
+ - Use LLM to summarize chunks
+ - Remove redundant information
+ - Prioritize by relevance score
+
+ ### Metadata Filtering
+
+ Pre-filter by metadata before semantic search
+
+ **When to use**: Documents have structured metadata
+
+ - Filter by date, source, category first
+ - Reduce search space before vector similarity
+ - Combine metadata filters with semantic scores
+ - Index metadata for fast filtering
+
+ ## Sharp Edges
+
+ ### Fixed-size chunking breaks sentences and context
+
+ Severity: HIGH
+
+ Situation: Using fixed token/character limits for chunking
+
+ Symptoms:
+ - Retrieved chunks feel incomplete or cut off
+ - Answer quality varies wildly
+ - High recall but low precision
+
+ Why this breaks:
+ Fixed-size chunks split mid-sentence, mid-paragraph, or mid-idea.
+ The resulting embeddings represent incomplete thoughts, leading to
+ poor retrieval quality. Users search for concepts but get fragments.
+
+ Recommended fix:
+
+ Use semantic chunking that respects document structure:
+ - Split on sentence/paragraph boundaries
+ - Use embedding similarity to detect topic shifts
+ - Include overlap for context continuity
+ - Preserve headers and document structure as metadata
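
The first and third bullets of this fix (sentence boundaries, overlap) can be illustrated with a minimal sketch. It is not part of the package: `chunk_by_sentences`, `max_chars`, and `overlap_sentences` are hypothetical names, the regex splitter is deliberately naive, and topic-shift detection via embeddings is left out.

```python
import re


def chunk_by_sentences(text, max_chars=200, overlap_sentences=1):
    """Group whole sentences into chunks of roughly max_chars characters.

    The last `overlap_sentences` sentences of a chunk are repeated at the
    start of the next one. A chunk can exceed max_chars when the carried-over
    overlap plus one sentence is already longer than the limit.
    """
    # Naive splitter (assumption): a sentence ends at . ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks, current = [], []
    for sentence in sentences:
        candidate = " ".join(current + [sentence])
        if current and len(candidate) > max_chars:
            chunks.append(" ".join(current))
            # Carry trailing sentences forward for context continuity.
            current = current[-overlap_sentences:] if overlap_sentences else []
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Sentences are never cut in the middle; the size limit only decides where one chunk ends and the next begins.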
+
+ ### Pure semantic search without metadata pre-filtering
+
+ Severity: MEDIUM
+
+ Situation: Only using vector similarity, ignoring metadata
+
+ Symptoms:
+ - Returns outdated information
+ - Mixes content from wrong sources
+ - Users can't scope their searches
+
+ Why this breaks:
+ Semantic search finds semantically similar content, but not necessarily
+ relevant content. Without metadata filtering, you return old docs when
+ user wants recent, wrong categories, or inapplicable content.
+
+ Recommended fix:
+
+ Implement hybrid filtering:
+ - Pre-filter by metadata (date, source, category) before vector search
+ - Post-filter results by relevance criteria
+ - Include metadata in the retrieval API
+ - Allow users to specify filters
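
A rough sketch of pre-filtering before vector similarity, using an in-memory list instead of a real vector database. The document shape (`source`, `date`, `vector` keys) and the function names are assumptions made for illustration only.

```python
import math
from datetime import date


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filtered_search(query_vec, docs, top_k=3, source=None, newer_than=None):
    """Apply optional metadata filters first, then rank survivors by similarity."""
    candidates = [
        d for d in docs
        if (source is None or d["source"] == source)
        and (newer_than is None or d["date"] >= newer_than)
    ]
    ranked = sorted(candidates, key=lambda d: cosine(query_vec, d["vector"]), reverse=True)
    return ranked[:top_k]
```

Leaving `source` and `newer_than` as optional arguments is one way to let callers specify filters without forcing them to.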
+
+ ### Using same embedding model for different content types
+
+ Severity: MEDIUM
+
+ Situation: One embedding model for code, docs, and structured data

- ### ❌ Fixed Chunk Size
+ Symptoms:
+ - Code search returns irrelevant results
+ - Domain terms not matched properly
+ - Similar concepts not clustered

- ### Embedding Everything
+ Why this breaks:
+ Embedding models are trained on specific content types. Using a text
+ embedding model for code, or a general model for domain-specific
+ content, produces poor similarity matches.

- ### ❌ Ignoring Evaluation
+ Recommended fix:

- ## ⚠️ Sharp Edges
+ Evaluate embeddings per content type:
+ - Use code-specific embeddings for code (e.g., CodeBERT)
+ - Consider domain-specific or fine-tuned embeddings
+ - Benchmark retrieval quality before choosing
+ - Separate indices for different content types if needed

- | Issue | Severity | Solution |
- |-------|----------|----------|
- | Fixed-size chunking breaks sentences and context | high | Use semantic chunking that respects document structure: |
- | Pure semantic search without metadata pre-filtering | medium | Implement hybrid filtering: |
- | Using same embedding model for different content types | medium | Evaluate embeddings per content type: |
- | Using first-stage retrieval results directly | medium | Add reranking step: |
- | Cramming maximum context into LLM prompt | medium | Use relevance thresholds: |
- | Not measuring retrieval quality separately from generation | high | Separate retrieval evaluation: |
- | Not updating embeddings when source documents change | medium | Implement embedding refresh: |
- | Same retrieval strategy for all query types | medium | Implement hybrid search: |
+ ### Using first-stage retrieval results directly
+
+ Severity: MEDIUM
+
+ Situation: Taking top-K from vector search without reranking
+
+ Symptoms:
+ - Clearly relevant docs not in top results
+ - Results order seems arbitrary
+ - Adding more results helps quality
+
+ Why this breaks:
+ First-stage retrieval (vector search) optimizes for recall, not precision.
+ The top results by embedding similarity may not be the most relevant
+ for the specific query. Cross-encoder reranking dramatically improves
+ precision for the final results.
+
+ Recommended fix:
+
+ Add reranking step:
+ - Retrieve larger candidate set (e.g., top 20-50)
+ - Rerank with cross-encoder (query-document pairs)
+ - Return reranked top-K (e.g., top 5)
+ - Cache reranker for performance
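
The two-stage shape of this fix might look like the sketch below. The scorer is only a stand-in: `token_overlap_score` uses Jaccard token overlap where a real system would call a cross-encoder model, and reading "cache reranker" as caching per-pair scores is an interpretation, not something the diff specifies.

```python
from functools import lru_cache


def rerank(query, candidates, score_pair, top_k=5):
    """Second stage: rescore first-stage candidates and keep the best top_k.

    `candidates` is the larger first-stage result list (document strings);
    `score_pair(query, doc)` returns a relevance score, higher is better.
    """
    ranked = sorted(candidates, key=lambda doc: score_pair(query, doc), reverse=True)
    return ranked[:top_k]


@lru_cache(maxsize=10_000)
def token_overlap_score(query, doc):
    """Placeholder for a cross-encoder: Jaccard overlap of lowercase tokens.

    Cached so that repeated (query, doc) pairs are not rescored.
    """
    q, d = set(query.lower().split()), set(doc.lower().split())
    union = q | d
    return len(q & d) / len(union) if union else 0.0
```

Passing the scorer in as a function keeps the retrieve-then-rerank flow independent of whichever model ends up doing the scoring.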
+
+ ### Cramming maximum context into LLM prompt
+
+ Severity: MEDIUM
+
+ Situation: Using all retrieved context regardless of relevance
+
+ Symptoms:
+ - Answers drift with more context
+ - LLM ignores key information
+ - High token costs
+
+ Why this breaks:
+ More context isn't always better. Irrelevant context confuses the LLM,
+ increases latency and cost, and can cause the model to ignore the
+ most relevant information. Models have attention limits.
+
+ Recommended fix:
+
+ Use relevance thresholds:
+ - Set minimum similarity score cutoff
+ - Limit context to truly relevant chunks
+ - Summarize or compress if needed
+ - Order context by relevance
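
A small sketch of the cutoff, cap, and ordering steps. The function name and the default values (`min_score=0.75`, `max_chunks=4`) are illustrative assumptions; suitable thresholds depend on the embedding model and would need tuning.

```python
def select_context(scored_chunks, min_score=0.75, max_chunks=4):
    """Pick the chunks that go into the prompt.

    `scored_chunks` is a list of (text, similarity) pairs. Chunks below
    `min_score` are dropped, the rest are ordered best-first and capped
    at `max_chunks`.
    """
    kept = [(text, score) for text, score in scored_chunks if score >= min_score]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [text for text, _ in kept[:max_chunks]]
```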
+
+ ### Not measuring retrieval quality separately from generation
+
+ Severity: HIGH
+
+ Situation: Only evaluating end-to-end RAG quality
+
+ Symptoms:
+ - Can't diagnose poor RAG performance
+ - Prompt changes don't help
+ - Random quality variations
+
+ Why this breaks:
+ If answers are wrong, you can't tell if retrieval failed or generation
+ failed. This makes debugging impossible and leads to wrong fixes
+ (tuning prompts when retrieval is the problem).
+
+ Recommended fix:
+
+ Separate retrieval evaluation:
+ - Create retrieval test set with relevant docs labeled
+ - Measure MRR, NDCG, Recall@K for retrieval
+ - Evaluate generation only on correct retrievals
+ - Track metrics over time
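
The three metrics named in this fix can be computed from a labeled test set with a few lines of standard-library code. This sketch assumes binary relevance (a document is either relevant or not); the function names are illustrative.

```python
import math


def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & set(relevant)) / len(relevant)


def reciprocal_rank(retrieved, relevant):
    """1 / rank of the first relevant result, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs):
    """Average reciprocal rank over (retrieved, relevant) pairs, one per query."""
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)


def ndcg_at_k(retrieved, relevant, k):
    """NDCG@k with binary gains: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc_id in enumerate(retrieved[:k], start=1)
        if doc_id in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0
```

For a query whose results are `["d3", "d1", "d7", "d2"]` with relevant set `{"d1", "d2"}`, Recall@3 is 0.5 and the reciprocal rank is 0.5, since the first relevant document sits at rank 2.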
+
+ ### Not updating embeddings when source documents change
+
+ Severity: MEDIUM
+
+ Situation: Embeddings generated once, never refreshed
+
+ Symptoms:
+ - Returns outdated information
+ - References deleted content
+ - Inconsistent with source
+
+ Why this breaks:
+ Documents change but embeddings don't. Users retrieve outdated content
+ or, worse, content that no longer exists. This erodes trust in the
+ system.
+
+ Recommended fix:
+
+ Implement embedding refresh:
+ - Track document versions/hashes
+ - Re-embed on document change
+ - Handle deleted documents
+ - Consider TTL for embeddings
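
One way to implement the hash-tracking idea is to store a content hash next to each embedding and compare it against the current sources. The sketch below only plans the work; `plan_refresh` and its dictionary inputs are hypothetical, and the TTL bullet is not covered.

```python
import hashlib


def content_hash(text):
    """Stable fingerprint of a document's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_refresh(current_docs, indexed_hashes):
    """Decide which embeddings to rebuild and which to drop.

    `current_docs` maps doc_id -> current text; `indexed_hashes` maps
    doc_id -> hash recorded when the document was last embedded.
    Returns (to_embed, to_delete) as sorted lists of doc ids.
    """
    to_embed = sorted(
        doc_id for doc_id, text in current_docs.items()
        if indexed_hashes.get(doc_id) != content_hash(text)  # new or changed
    )
    to_delete = sorted(doc_id for doc_id in indexed_hashes if doc_id not in current_docs)
    return to_embed, to_delete
```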
+
+ ### Same retrieval strategy for all query types
+
+ Severity: MEDIUM
+
+ Situation: Using pure semantic search for keyword-heavy queries
+
+ Symptoms:
+ - Exact term searches miss results
+ - Concept searches too literal
+ - Users frustrated with both
+
+ Why this breaks:
+ Some queries are keyword-oriented (looking for specific terms) while
+ others are semantic (looking for concepts). Pure semantic search fails
+ on exact matches; pure keyword search fails on paraphrases.
+
+ Recommended fix:
+
+ Implement hybrid search:
+ - BM25/TF-IDF for keyword matching
+ - Vector similarity for semantic matching
+ - Reciprocal Rank Fusion to combine
+ - Tune weights based on query patterns
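
Reciprocal Rank Fusion scores each document as the sum of `weight / (k + rank)` over the rankings it appears in, so only ranks are needed, not comparable raw scores. A minimal sketch follows; `k=60` is the conventional default, while the `weights` parameter and the id-based tie-break are assumptions added for illustration.

```python
def reciprocal_rank_fusion(rankings, k=60, weights=None):
    """Fuse several ranked lists of doc ids into one list, best first.

    `rankings` holds one list per retriever (for example keyword and vector
    search), each ordered best-first. `weights` optionally scales each
    retriever's contribution.
    """
    weights = weights or [1.0] * len(rankings)
    scores = {}
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank)
    # Ties are broken by doc id to keep the output deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
```

With a keyword ranking `["d1", "d2", "d3"]` and a vector ranking `["d3", "d1", "d4"]`, documents found by both retrievers (`d1`, `d3`) rise above those found by only one.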

  ## Related Skills

  Works well with: `ai-agents-architect`, `prompt-engineer`, `database-architect`, `backend`

  ## When to Use
- This skill is applicable to execute the workflow or actions described in the overview.
+
+ - User mentions or implies: building RAG
+ - User mentions or implies: vector search
+ - User mentions or implies: embeddings
+ - User mentions or implies: semantic search
+ - User mentions or implies: document retrieval
+ - User mentions or implies: context retrieval
+ - User mentions or implies: knowledge base
+ - User mentions or implies: LLM with documents
+ - User mentions or implies: chunking strategy
+ - User mentions or implies: pinecone
+ - User mentions or implies: weaviate
+ - User mentions or implies: chromadb
+ - User mentions or implies: pgvector