npm - @sylix/coworker - Versions diffs - 2.0.11 → 2.0.14 - Mend

@sylix/coworker 2.0.11 → 2.0.14


Files changed (169)

package/dist/skills/defaults/llm-application/rag-implementation.md ADDED

@@ -0,0 +1,434 @@
+---
+name: rag-implementation
+description: Build Retrieval-Augmented Generation (RAG) systems for LLM applications with vector databases and semantic search
+---
+# RAG Implementation
+Master Retrieval-Augmented Generation (RAG) to build LLM applications that provide accurate, grounded responses using external knowledge sources.
+## When to Use This Skill
+- Building Q&A systems over proprietary documents
+- Creating chatbots with current, factual information
+- Implementing semantic search with natural language queries
+- Reducing hallucinations with grounded responses
+- Enabling LLMs to access domain-specific knowledge
+- Building documentation assistants
+- Creating research tools with source citation
+## Core Components
+### 1. Vector Databases
+**Options:**
+- **Pinecone**: Managed, scalable, serverless
+- **Weaviate**: Open-source, hybrid search, GraphQL
+- **Milvus**: High performance, on-premise
+- **Chroma**: Lightweight, easy to use, local development
+- **Qdrant**: Fast, filtered search, Rust-based
+- **pgvector**: PostgreSQL extension, SQL integration
+### 2. Embeddings
+| Model | Default Dimensions | Best For |
+|-------|------------|----------|
+| **voyage-3-large** | 1024 | Claude apps (Anthropic recommended) |
+| **voyage-code-3** | 1024 | Code search |
+| **text-embedding-3-large** | 3072 | OpenAI apps, high accuracy |
+| **text-embedding-3-small** | 1536 | OpenAI apps, cost-effective |
+| **bge-large-en-v1.5** | 1024 | Open source, local deployment |
+| **multilingual-e5-large** | 1024 | Multi-language support |
+### 3. Retrieval Strategies
+- **Dense Retrieval**: Semantic similarity via embeddings
+- **Sparse Retrieval**: Keyword matching (BM25, TF-IDF)
+- **Hybrid Search**: Combine dense + sparse with weighted fusion
+- **Multi-Query**: Generate multiple query variations
+- **HyDE**: Generate hypothetical documents for better retrieval
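Dense and sparse retrieval score documents in different ways. Embeddings compare meaning. BM25 rewards documents that contain the query's exact terms, weights each term by how rare it is, and normalizes for document length.

The following minimal pure-Python sketch shows that scoring. It rests on several assumptions:
- tokenization is lowercase and whitespace-only;
- `k1=1.5` and `b=0.75` are the commonly used defaults;
- IDF follows the Lucene style, with `+1` inside the log.

`bm25_scores` is a hypothetical helper. It is not the `rank_bm25` or LangChain API.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of `query` against each (non-empty list of) docs."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: how many docs contain each term at least once.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Lucene-style IDF; the +1 keeps it non-negative for very common terms.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Term-frequency saturation (k1) plus length normalization (b).
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "reset your password from the account page",
    "the billing page lists every invoice",
    "password rules require twelve characters",
]
scores = bm25_scores("reset password", docs)
print(max(range(len(docs)), key=scores.__getitem__))  # → 0
```

The first document wins because it contains both query terms, and "reset" is the rarer, higher-IDF term.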
+### 4. Reranking
+- **Cross-Encoders**: BERT-based reranking (ms-marco-MiniLM)
+- **Cohere Rerank**: API-based reranking
+- **Maximal Marginal Relevance (MMR)**: Diversity + relevance
+- **LLM-based**: Use LLM to score relevance
+## Quick Start with LangGraph
+```python
+from langgraph.graph import StateGraph, START, END
+from langchain_anthropic import ChatAnthropic
+from langchain_voyageai import VoyageAIEmbeddings
+from langchain_pinecone import PineconeVectorStore
+from langchain_core.documents import Document
+from langchain_core.prompts import ChatPromptTemplate
+from typing import TypedDict
+class RAGState(TypedDict):
+    question: str
+    context: list[Document]
+    answer: str
+llm = ChatAnthropic(model="claude-sonnet-4-6")
+embeddings = VoyageAIEmbeddings(model="voyage-3-large")
+vectorstore = PineconeVectorStore(index_name="docs", embedding=embeddings)
+retriever = vectorstore.as_retriever(search_kwargs={"k": 4})
+rag_prompt = ChatPromptTemplate.from_template(
+    """Answer based on the context below. If you cannot answer, say so.
+    Context:
+    {context}
+    Question: {question}
+    Answer:"""
+)
+async def retrieve(state: RAGState) -> RAGState:
+    docs = await retriever.ainvoke(state["question"])
+    return {"context": docs}
+async def generate(state: RAGState) -> RAGState:
+    context_text = "\n\n".join(doc.page_content for doc in state["context"])
+    messages = rag_prompt.format_messages(context=context_text, question=state["question"])
+    response = await llm.ainvoke(messages)
+    return {"answer": response.content}
+builder = StateGraph(RAGState)
+builder.add_node("retrieve", retrieve)
+builder.add_node("generate", generate)
+builder.add_edge(START, "retrieve")
+builder.add_edge("retrieve", "generate")
+builder.add_edge("generate", END)
+rag_chain = builder.compile()
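+import asyncio
+async def main() -> None:
+    result = await rag_chain.ainvoke({"question": "What are the main features?"})
+    print(result["answer"])
+# ainvoke returns a coroutine, so run it in an event loop (in a notebook, use `await main()` instead)
+asyncio.run(main())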
+```
+## Advanced RAG Patterns
+### Pattern 1: Hybrid Search with RRF
+```python
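+from langchain_community.retrievers import BM25Retriever  # requires the rank_bm25 package
+from langchain.retrievers import EnsembleRetriever
+# `documents` is the list[Document] of chunks indexed earlier
+bm25_retriever = BM25Retriever.from_documents(documents)
+bm25_retriever.k = 10
+dense_retriever = vectorstore.as_retriever(search_kwargs={"k": 10})
+# EnsembleRetriever merges the ranked lists with weighted Reciprocal Rank Fusion
+ensemble_retriever = EnsembleRetriever(
+    retrievers=[bm25_retriever, dense_retriever],
+    weights=[0.3, 0.7]  # keyword weight, semantic weight
+)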
+```
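Weighted Reciprocal Rank Fusion gives each document `weight / (k + rank)` from every list it appears in, with ranks starting at 1. The sums determine the final order. The sketch below spells that out in plain Python over document IDs. The helper name `weighted_rrf`, the sample IDs, and the constant `k=60` (a widely used value) are illustrative assumptions, not LangChain internals.

```python
from collections import defaultdict


def weighted_rrf(
    rankings: list[list[str]],
    weights: list[float],
    k: int = 60,
) -> list[tuple[str, float]]:
    """Fuse ranked lists of doc IDs; returns (doc_id, score) by descending score."""
    if len(rankings) != len(weights):
        raise ValueError("need exactly one weight per ranking")
    scores: dict[str, float] = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (k + rank)
    # Sort by score descending; ties broken by doc ID for deterministic output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


bm25_ids = ["d3", "d1", "d7"]   # hypothetical keyword-retriever output
dense_ids = ["d1", "d5", "d3"]  # hypothetical dense-retriever output
fused = weighted_rrf([bm25_ids, dense_ids], weights=[0.3, 0.7])
print([doc_id for doc_id, _ in fused])  # → ['d1', 'd3', 'd5', 'd7']
```

RRF uses only ranks, so it avoids the problem that BM25 scores and cosine similarities live on different scales.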
+### Pattern 2: Multi-Query Retrieval
+```python
+from langchain.retrievers.multi_query import MultiQueryRetriever
+multi_query_retriever = MultiQueryRetriever.from_llm(
+    retriever=vectorstore.as_retriever(search_kwargs={"k": 5}),
+    llm=llm
+)
+results = await multi_query_retriever.ainvoke("What is the main topic?")
+```
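The multi-query approach works in three steps:
1. Ask the LLM for several rephrasings of the question.
2. Run each rephrasing against the retriever.
3. Return the unique union of the results.

The sketch below shows only the fan-out-and-merge part (steps 2 and 3). The rephraser and the search function are hypothetical stubs. Searching with the original question as well is a choice made in this sketch, not a claim about `MultiQueryRetriever`'s defaults.

```python
from typing import Callable


def multi_query_retrieve(
    question: str,
    generate_variants: Callable[[str], list[str]],
    search: Callable[[str], list[str]],
) -> list[str]:
    """Search with the question and each rephrasing; return de-duplicated hits."""
    queries = [question, *generate_variants(question)]
    seen: set[str] = set()
    merged: list[str] = []
    for query in queries:
        for doc_id in search(query):
            if doc_id not in seen:  # keep only the first occurrence
                seen.add(doc_id)
                merged.append(doc_id)
    return merged


# Hypothetical stand-ins for the LLM rephraser and the vector-store search.
def fake_variants(question: str) -> list[str]:
    return [question.lower(), question.replace("main", "primary")]


fake_index = {
    "What is the main topic?": ["d1", "d2"],
    "what is the main topic?": ["d2", "d4"],
    "What is the primary topic?": ["d5"],
}
result = multi_query_retrieve("What is the main topic?", fake_variants, lambda q: fake_index.get(q, []))
print(result)  # → ['d1', 'd2', 'd4', 'd5']
```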
+### Pattern 3: Contextual Compression
+```python
+from langchain.retrievers import ContextualCompressionRetriever
+from langchain.retrievers.document_compressors import LLMChainExtractor
+compressor = LLMChainExtractor.from_llm(llm)
+compression_retriever = ContextualCompressionRetriever(
+    base_compressor=compressor,
+    base_retriever=vectorstore.as_retriever(search_kwargs={"k": 10})
+)
+compressed_docs = await compression_retriever.ainvoke("specific query")
+```
+### Pattern 4: Parent Document Retriever
+```python
+from langchain.retrievers import ParentDocumentRetriever
+from langchain.storage import InMemoryStore
+from langchain_text_splitters import RecursiveCharacterTextSplitter
+child_splitter = RecursiveCharacterTextSplitter(chunk_size=400, chunk_overlap=50)
+parent_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=200)
+docstore = InMemoryStore()
+parent_retriever = ParentDocumentRetriever(
+    vectorstore=vectorstore,
+    docstore=docstore,
+    child_splitter=child_splitter,
+    parent_splitter=parent_splitter
+)
+await parent_retriever.aadd_documents(documents)
+results = await parent_retriever.ainvoke("query")
+```
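The parent-document pattern indexes small child chunks for precise matching, then hands the LLM the larger parent that each matching child came from. Below is a stdlib-only sketch of that bookkeeping. It assumes fixed-size character chunks without overlap, and it uses a substring test in place of vector similarity. `chunk`, `build_index`, and `retrieve_parents` are hypothetical helpers, not `ParentDocumentRetriever` internals.

```python
def chunk(text: str, size: int) -> list[str]:
    """Fixed-size character chunks (no overlap, to keep the sketch short)."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def build_index(doc: str, parent_size: int, child_size: int):
    """Split into parents, then split each parent into children that remember it."""
    parents = chunk(doc, parent_size)
    children: list[tuple[str, int]] = []  # (child text, parent index)
    for parent_idx, parent in enumerate(parents):
        for child in chunk(parent, child_size):
            children.append((child, parent_idx))
    return parents, children


def retrieve_parents(query: str, parents: list[str], children: list[tuple[str, int]]) -> list[str]:
    # Substring match stands in for vector search over the small child chunks...
    hit_ids = [parent_idx for text, parent_idx in children if query in text]
    # ...but each matching parent is returned only once, in first-hit order.
    return [parents[i] for i in dict.fromkeys(hit_ids)]


doc = "aaaa bbbb cccc dddd eeee ffff gggg hhhh "
parents, children = build_index(doc, parent_size=20, child_size=5)
print(retrieve_parents("ffff", parents, children))  # → ['eeee ffff gggg hhhh ']
```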
+### Pattern 5: HyDE (Hypothetical Document Embeddings)
+```python
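+from typing import TypedDict
+from langchain_core.documents import Document
+from langchain_core.prompts import ChatPromptTemplate
+from langgraph.graph import StateGraph, START, END
+# Reuses `llm`, `retriever`, and `generate` from the Quick Start
+class HyDEState(TypedDict):
+    question: str
+    hypothetical_doc: str
+    context: list[Document]
+    answer: str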
+hyde_prompt = ChatPromptTemplate.from_template(
+    """Write a detailed passage that would answer this question:
+    Question: {question}
+    Passage:"""
+)
+async def generate_hypothetical(state: HyDEState) -> HyDEState:
+    messages = hyde_prompt.format_messages(question=state["question"])
+    response = await llm.ainvoke(messages)
+    return {"hypothetical_doc": response.content}
+async def retrieve_with_hyde(state: HyDEState) -> HyDEState:
+    docs = await retriever.ainvoke(state["hypothetical_doc"])
+    return {"context": docs}
+builder = StateGraph(HyDEState)
+builder.add_node("hypothetical", generate_hypothetical)
+builder.add_node("retrieve", retrieve_with_hyde)
+builder.add_node("generate", generate)
+builder.add_edge(START, "hypothetical")
+builder.add_edge("hypothetical", "retrieve")
+builder.add_edge("retrieve", "generate")
+builder.add_edge("generate", END)
+hyde_rag = builder.compile()
+```
+## Document Chunking Strategies
+### Recursive Character Text Splitter
+```python
+from langchain_text_splitters import RecursiveCharacterTextSplitter
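+splitter = RecursiveCharacterTextSplitter(
+    chunk_size=1000,     # measured with length_function, so characters here, not tokens
+    chunk_overlap=200,   # 20% of chunk_size shared between neighboring chunks
+    length_function=len,
+    separators=["\n\n", "\n", ". ", " ", ""]  # tried in order: paragraphs, lines, sentences, words, characters
+)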
+chunks = splitter.split_documents(documents)
+```
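`RecursiveCharacterTextSplitter` works in three steps:
1. Split on the first separator in the list that occurs in the text.
2. Merge the resulting pieces back up to `chunk_size`.
3. Recurse with the finer separators, but only for pieces that are still too long.

The sketch below is a simplified reimplementation of that idea, not LangChain's code. It omits `chunk_overlap` and drops separators at chunk boundaries. `recursive_split` is a hypothetical helper.

```python
def recursive_split(text: str, chunk_size: int, separators: list[str]) -> list[str]:
    """Simplified recursive splitting: no overlap; boundary separators are dropped."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # Nothing left to split on: hard-cut into fixed-size pieces.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    # Pick the first separator present in the text ("" always matches).
    for i, sep in enumerate(separators):
        if sep == "" or sep in text:
            break
    remaining = separators[i + 1:]
    pieces = list(text) if sep == "" else text.split(sep)
    chunks: list[str] = []
    current = ""
    for piece in pieces:
        candidate = f"{current}{sep}{piece}" if current else piece
        if len(candidate) <= chunk_size:
            current = candidate  # greedily merge small pieces
            continue
        if current:
            chunks.append(current)
        if len(piece) <= chunk_size:
            current = piece
        else:
            # Still too long: recurse with the finer separators only.
            chunks.extend(recursive_split(piece, chunk_size, remaining))
            current = ""
    if current:
        chunks.append(current)
    return chunks


text = "Para one.\n\nPara two is longer. It has two sentences."
chunks = recursive_split(text, 25, ["\n\n", "\n", ". ", " ", ""])
print(chunks)  # → ['Para one.', 'Para two is longer', 'It has two sentences.']
```

The missing period after "longer" comes from dropping the `". "` separator at a chunk boundary. LangChain's splitter can keep separators attached to chunks, so its real output would differ.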
+### Semantic Chunking
+```python
+from langchain_experimental.text_splitter import SemanticChunker
+splitter = SemanticChunker(
+    embeddings=embeddings,
+    breakpoint_threshold_type="percentile",
+    breakpoint_threshold_amount=95
+)
+chunks = splitter.split_documents(documents)
+```
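With `breakpoint_threshold_type="percentile"`, the chunker roughly does the following:
1. Embed the sentences.
2. Measure the distance between each pair of neighboring sentences.
3. Start a new chunk wherever that distance lies above the chosen percentile (95 here).

The sketch below reproduces that idea, using hand-made 2-D vectors in place of a real embedding model. The helpers and sample data are hypothetical. The real `SemanticChunker` also handles details that are skipped here, such as sentence splitting and buffering of neighboring sentences.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b)) / (math.hypot(*a) * math.hypot(*b))


def percentile(values: list[float], pct: float) -> float:
    """Linear-interpolation percentile (same convention as NumPy's default)."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * pct / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def semantic_chunks(sentences: list[str], vectors: list[list[float]], pct: float = 95.0) -> list[str]:
    if len(sentences) < 2:
        return [" ".join(sentences)]
    # Cosine distance between each pair of consecutive sentences.
    distances = [1 - cosine(vectors[i], vectors[i + 1]) for i in range(len(vectors) - 1)]
    threshold = percentile(distances, pct)
    chunks, current = [], [sentences[0]]
    for i, distance in enumerate(distances):
        if distance > threshold:  # a topic shift closes the current chunk
            chunks.append(" ".join(current))
            current = []
        current.append(sentences[i + 1])
    chunks.append(" ".join(current))
    return chunks


sentences = ["Cats purr.", "Cats nap often.", "Rates rose.", "Bonds fell."]
vectors = [[1.0, 0.0], [0.9, 0.1], [0.1, 0.9], [0.0, 1.0]]  # toy 2-D "embeddings"
print(semantic_chunks(sentences, vectors))
# → ['Cats purr. Cats nap often.', 'Rates rose. Bonds fell.']
```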
+### Markdown Header Splitter
+```python
+from langchain_text_splitters import MarkdownHeaderTextSplitter
+headers_to_split_on = [
+    ("#", "Header 1"),
+    ("##", "Header 2"),
+    ("###", "Header 3"),
+]
+splitter = MarkdownHeaderTextSplitter(
+    headers_to_split_on=headers_to_split_on,
+    strip_headers=False
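+)
+# `markdown_text` (hypothetical name): the raw Markdown string to split
+md_chunks = splitter.split_text(markdown_text)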
+```
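`MarkdownHeaderTextSplitter` cuts a document at the listed header levels and records the headers in effect as chunk metadata. That metadata is what makes header-based filtering possible later. The sketch below imitates that behavior with a hypothetical `split_by_headers` helper. It differs from the configuration above (`strip_headers=False`) in two ways: it drops the header lines from the chunk text, and it does not special-case code fences.

```python
def split_by_headers(markdown: str, levels: dict[str, str]) -> list[dict]:
    """Split Markdown at the given header markers, recording active headers as metadata."""
    chunks: list[dict] = []
    meta: dict[str, str] = {}
    lines: list[str] = []

    def flush() -> None:
        if any(line.strip() for line in lines):
            chunks.append({"content": "\n".join(lines).strip(), "metadata": dict(meta)})
        lines.clear()

    for line in markdown.splitlines():
        marker, _, title = line.partition(" ")
        if marker in levels and title.strip():
            flush()
            # A new header replaces its own level and clears every deeper level.
            for other_marker, name in levels.items():
                if len(other_marker) >= len(marker):
                    meta.pop(name, None)
            meta[levels[marker]] = title.strip()
        else:
            lines.append(line)
    flush()
    return chunks


md = "# Guide\nIntro.\n## Install\nRun pip.\n## Usage\nCall it."
for piece in split_by_headers(md, {"#": "Header 1", "##": "Header 2"}):
    print(piece)
```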
+## Vector Store Configurations
+### Pinecone (Serverless)
+```python
+import os
+from pinecone import Pinecone, ServerlessSpec
+from langchain_pinecone import PineconeVectorStore
+pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
+if "my-index" not in pc.list_indexes().names():
+    pc.create_index(
+        name="my-index",
+        dimension=1024,  # must equal the embedding size (1024 for voyage-3-large)
+        metric="cosine",
+        spec=ServerlessSpec(cloud="aws", region="us-east-1")
+    )
+index = pc.Index("my-index")
+vectorstore = PineconeVectorStore(index=index, embedding=embeddings)
+```
+### Chroma (Local Development)
+```python
+from langchain_chroma import Chroma
+vectorstore = Chroma(
+    collection_name="my_collection",
+    embedding_function=embeddings,
+    persist_directory="./chroma_db"
+)
+```
+### pgvector (PostgreSQL)
+```python
+from langchain_postgres.vectorstores import PGVector
+connection_string = "postgresql+psycopg://user:pass@localhost:5432/vectordb"
+vectorstore = PGVector(
+    embeddings=embeddings,
+    collection_name="documents",
+    connection=connection_string,
+)
+```
+## Retrieval Optimization
+### Metadata Filtering
+```python
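+# Filter syntax is store-specific; a plain {field: value} dict is an equality match in many stores
+results = await vectorstore.asimilarity_search(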
+    "query",
+    filter={"category": "technical"},
+    k=5
+)
+```
+### Maximal Marginal Relevance (MMR)
+```python
+results = await vectorstore.amax_marginal_relevance_search(
+    "query",
+    k=5,
+    fetch_k=20,
+    lambda_mult=0.5
+)
+```
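In LangChain's MMR search, `fetch_k` candidates are fetched first, and `k` of them are then picked greedily. Each pick maximizes `lambda_mult * relevance - (1 - lambda_mult) * redundancy`, where redundancy is the similarity to the closest document already picked. Below is a minimal sketch of that greedy step over toy vectors. `mmr` and the sample vectors are illustrative, not LangChain's implementation.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b)) / (math.hypot(*a) * math.hypot(*b))


def mmr(query_vec: list[float], doc_vecs: list[list[float]], k: int = 2, lambda_mult: float = 0.5) -> list[int]:
    """Greedy Maximal Marginal Relevance; returns indices into doc_vecs in pick order."""
    relevance = [cosine(query_vec, d) for d in doc_vecs]
    selected: list[int] = []
    candidates = list(range(len(doc_vecs)))

    def score(i: int) -> float:
        # Redundancy = similarity to the closest already-selected doc (0 before the first pick).
        redundancy = max((cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0)
        return lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


query = [1.0, 1.0, 0.0]
docs = [[1.0, 0.0, 0.0], [1.0, 0.05, 0.0], [0.0, 1.0, 0.3]]  # doc 1 nearly duplicates doc 0
print(mmr(query, docs, k=2, lambda_mult=0.5))  # → [1, 2]
```

With `lambda_mult=1.0` (relevance only), the same call returns `[1, 0]` and picks the near-duplicate instead.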
+### Reranking with Cross-Encoder
+```python
+from langchain_core.documents import Document
+from sentence_transformers import CrossEncoder
+reranker = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')
+async def retrieve_and_rerank(query: str, k: int = 5) -> list[Document]:
+    # Over-fetch candidates cheaply with vector search...
+    candidates = await vectorstore.asimilarity_search(query, k=20)
+    # ...then score each (query, passage) pair jointly with the cross-encoder
+    pairs = [[query, doc.page_content] for doc in candidates]
+    scores = reranker.predict(pairs)
+    ranked = sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True)
+    return [doc for doc, _ in ranked[:k]]
+```
+## Prompt Engineering for RAG
+```python
+rag_prompt = ChatPromptTemplate.from_template(
+    """Answer the question based on the context below. Include citations using [1], [2], etc.
+    If you cannot answer based on the context, say "I don't have enough information."
+    Context:
+    {context}
+    Question: {question}
+    Instructions:
+    1. Use only information from the context
+    2. Cite sources with [1], [2] format
+    3. If uncertain, express uncertainty
+    Answer (with citations):"""
+)
+```
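The prompt asks for `[1]`, `[2]` citations, but the Quick Start joins `page_content` without labels, so the model has nothing to cite. One way to close that gap is to number each retrieved chunk and show its source. In the sketch below:
- `format_numbered_context` is a hypothetical helper;
- the `source` metadata key is an assumption about how documents were indexed;
- the small `Doc` dataclass only mimics the `page_content` and `metadata` attributes of LangChain's `Document`.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Stand-in exposing the same two attributes as langchain_core's Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_numbered_context(docs: list[Doc]) -> str:
    """Label each chunk [1], [2], ... so citations in the answer map back to sources."""
    blocks = []
    for number, doc in enumerate(docs, start=1):
        source = doc.metadata.get("source", "unknown")
        blocks.append(f"[{number}] (source: {source})\n{doc.page_content}")
    return "\n\n".join(blocks)


context = format_numbered_context([
    Doc("Refunds take 5 days.", {"source": "billing.md"}),
    Doc("Support is open 24/7."),
])
print(context)
```

The resulting string can be passed as `context=` to `rag_prompt.format_messages`. A cited `[n]` then maps back to the n-th retrieved document.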
+### Structured Output for RAG
+```python
+from pydantic import BaseModel, Field
+class RAGResponse(BaseModel):
+    answer: str = Field(description="The answer based on context")
+    confidence: float = Field(ge=0.0, le=1.0, description="Confidence score 0-1")
+    sources: list[str] = Field(description="Source document IDs used")
+    reasoning: str = Field(description="Brief reasoning for the answer")
+structured_llm = llm.with_structured_output(RAGResponse)
+```
+## Evaluation Metrics
+```python
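+from typing import TypedDict
+# total=False: only the retrieval metrics are computed below; the other three are usually LLM-judged
+class RAGEvalMetrics(TypedDict, total=False):
+    retrieval_precision: float
+    retrieval_recall: float
+    answer_relevance: float
+    faithfulness: float
+    context_relevance: float
+async def evaluate_rag_system(rag_chain, test_cases: list[dict]) -> RAGEvalMetrics:
+    metrics: dict[str, list[float]] = {k: [] for k in RAGEvalMetrics.__annotations__}
+    for test in test_cases:
+        result = await rag_chain.ainvoke({"question": test["question"]})
+        # Assumes the output state includes `context` (as in the Quick Start graph)
+        # and that every indexed chunk carries a unique metadata["id"]
+        retrieved_ids = {doc.metadata["id"] for doc in result["context"]}
+        relevant_ids = set(test["relevant_doc_ids"])
+        hits = len(retrieved_ids & relevant_ids)
+        # Guard against empty sets to avoid ZeroDivisionError
+        metrics["retrieval_precision"].append(hits / len(retrieved_ids) if retrieved_ids else 0.0)
+        metrics["retrieval_recall"].append(hits / len(relevant_ids) if relevant_ids else 0.0)
+    # Average only the metrics that collected at least one score
+    return {k: sum(v) / len(v) for k, v in metrics.items() if v}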
+```
+## Best Practices
+1. **Chunk Size**: Balance context (larger chunks) against specificity (smaller chunks); 500-1000 tokens is a common range. Note that `chunk_size` counts characters when `length_function=len`, as in the splitter examples above
+2. **Overlap**: Use 10-20% overlap to preserve context at chunk boundaries
+3. **Metadata**: Include source, page, and timestamp for filtering and debugging
+4. **Hybrid Search**: Combine semantic and keyword search to improve recall
+5. **Reranking**: Use cross-encoder reranking for precision-critical applications
+6. **Citations**: Always return source documents for transparency
+7. **Evaluation**: Continuously test retrieval quality and answer accuracy
+8. **Monitoring**: Track retrieval metrics and latency in production
+## Common Issues
+- **Poor Retrieval**: Check embedding quality, chunk size, query formulation
+- **Irrelevant Results**: Add metadata filtering, use hybrid search, rerank
+- **Missing Information**: Ensure documents are properly indexed, check chunking
+- **Slow Queries**: Optimize vector store, use caching, reduce k
+- **Hallucinations**: Improve grounding prompt, add verification step
+- **Context Too Long**: Use compression or parent document retriever
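For the **Slow Queries** case, one inexpensive form of caching is to memoize query embeddings, so that a repeated question skips the embedding call. Below is a minimal sketch with `functools.lru_cache`. `embed_remote` is a hypothetical stand-in for a real embedding request, and the call counter exists only to show the cache working.

```python
from functools import lru_cache

remote_calls = 0


def embed_remote(text: str) -> list[float]:
    """Hypothetical stand-in for a network call to an embedding model."""
    global remote_calls
    remote_calls += 1
    return [float(len(text)), float(text.count(" "))]


@lru_cache(maxsize=10_000)
def embed_query_cached(text: str) -> tuple[float, ...]:
    # Return a tuple so callers cannot mutate the cached value in place.
    return tuple(embed_remote(text))


embed_query_cached("What are the main features?")
embed_query_cached("What are the main features?")  # served from the cache
print(remote_calls, embed_query_cached.cache_info().hits)  # → 1 1
```

This cache only helps for exact-string repeats. A key that normalizes case or whitespace would catch more repeats, at the risk of merging queries that should stay distinct.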