npm - sinapse-ai - Versions diffs - 9.3.0 → 9.4.0 - Mend

sinapse-ai 9.3.0 → 9.4.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (192)

package/squads/squad-claude/knowledge-base/retrieval-augmented-generation.md ADDED

@@ -0,0 +1,320 @@
+# Retrieval-Augmented Generation (RAG)
+> BM25+embeddings+graph hybrid retrieval, chunking strategies, and production RAG patterns. Based on MS-009 research (April 2026).
+---
+## RAG Fundamentals
+**RAG (Retrieval-Augmented Generation):** Augmenting LLM responses with retrieved external knowledge, rather than relying solely on model training data.
+**Why RAG matters for agents:**
+- Grounds responses in verified, current information
+- Reduces hallucination by anchoring answers to source documents
+- Enables access to knowledge beyond training cutoff
+- Allows agents to operate over private/proprietary knowledge bases
+- Reduces fine-tuning costs (context engineering instead)
+**Patrick Lewis et al. (Facebook AI Research, 2020):** The foundational paper establishing RAG as a paradigm. "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks."
+---
+## The 2026 Production Standard: Hybrid Search
+Single-modality retrieval is no longer production-grade. The standard is hybrid:
+```
+[User Query]
+  │
+  ├── BM25 (keyword search) ──────────────────┐
+  │                                           │
+  ├── Dense Embeddings (semantic) ────────────┤
+  │                                           │
+  └── Knowledge Graph (structured) ──────────┤
+                                             │
+                                     Reciprocal Rank Fusion (RRF)
+                                             │
+                                     Cross-Encoder Reranking
+                                             │
+                                     Final Top-N to LLM
+```
+**Why each component:**
+| Component | Strengths | Weaknesses |
+|-----------|----------|------------|
+| BM25 | Exact matches, product codes, legal terms, acronyms | Misses semantic similarity |
+| Dense embeddings | Semantic similarity, paraphrase matching | Misses exact terms |
+| Knowledge graph | Structural relationships, entity chains | Misses unstructured content |
+**Hybrid retrieval reportedly reduces retrieval errors by 35-60% versus pure semantic retrieval.**
+---
+## BM25 (Best Match 25)
+The gold standard for keyword-based retrieval, still essential in 2026.
+**Scoring formula:**
+```
+BM25(d, q) = Σ IDF(qi) × [f(qi,d) × (k1+1)] / [f(qi,d) + k1×(1-b+b×|d|/avgdl)]
+```
+Where `|d|` is the length of document d in tokens, and:
+- `IDF(qi)` — inverse document frequency of term qi
+- `f(qi,d)` — frequency of term qi in document d
+- `k1` — term saturation (typically 1.2-2.0)
+- `b` — length normalization (typically 0.75)
+- `avgdl` — average document length
+**Best for:** Product codes, UUIDs, unique identifiers, technical terms, proper nouns, legal/medical terminology.
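The scoring formula above can be sketched in a few lines of Python. This is a minimal illustration, not a production index: the function name `bm25_scores`, the whitespace tokenization, and the smoothed IDF variant (the source does not say which IDF definition it intends) are all assumptions made for the example.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document in `docs` against the tokenized `query`."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Assumed IDF variant: smoothed so the value never goes negative.
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # Length-normalized saturation term from the formula above.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "error code E1234 in payment module".split(),
    "payment failed for the customer".split(),
    "general troubleshooting guide".split(),
]
scores = bm25_scores("E1234 payment".split(), docs)
```

The first document matches both the rare identifier `E1234` and the common term `payment`, so it scores highest. The third matches nothing and scores zero.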
+---
+## Dense Embeddings
+**How they work:** Documents and queries are converted to high-dimensional vectors. Texts with similar meanings land close together in vector space, so a nearest-neighbor search retrieves them together.
+**Embedding models (2026):**
+| Model | Dimensions | Strengths |
+|-------|-----------|----------|
+| text-embedding-3-large (OpenAI) | 3,072 | General purpose, strong MTEB results |
+| voyage-3 (Voyage AI) | 1,024 | Recommended in Anthropic's embedding docs, multilingual |
+| cohere-embed-v3 | 1,024 | Classification, search |
+| e5-large-v2 (free) | 1,024 | Open-source quality |
+| bge-m3 (free) | 1,024 | Multi-lingual, long-context |
+**Selection guidance:**
+- Production with Anthropic stack: Voyage-3
+- Cost-sensitive: bge-m3 (self-hosted)
+- Multilingual: bge-m3 or voyage-3
+---
+## Reciprocal Rank Fusion (RRF)
+Merges rankings from multiple retrieval systems without requiring score normalization.
+**Formula:**
+```
+RRF(d) = Σ 1 / (k + rank_i(d))
+```
+Where `k` = 60 (standard), `rank_i(d)` = rank of document d in retrieval system i.
+**Why RRF works:** A document ranked in the top 5 by several retrievers accumulates a reciprocal-rank term from each one, so it outscores a document ranked first by only a single retriever. This captures "universal relevance."
+**Alternative: Learned merging** — train a small model to weight retrieval sources based on query type. Better for domain-specific applications with training data.
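A minimal sketch of the RRF formula above, assuming each retriever returns an ordered list of document IDs. The function name `reciprocal_rank_fusion` and the sample rankings are illustrative only.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one scored ranking."""
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based, matching rank_i(d) in the formula.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


bm25_ranking = ["doc_a", "doc_b", "doc_c"]
dense_ranking = ["doc_b", "doc_d", "doc_a"]
graph_ranking = ["doc_b", "doc_a"]
fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking, graph_ranking])
fused_ids = [doc_id for doc_id, _ in fused]
```

Here `doc_b` and `doc_a` appear in all three lists and end up ahead of `doc_d` and `doc_c`, which each appear in only one list. Only ranks are used, so the retrievers' raw scores never need to be normalized.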
+---
+## Cross-Encoder Reranking
+Two-stage retrieval architecture:
+**Stage 1 (Bi-encoder):** Fast approximate search — retrieves top-K candidates using embeddings.
+**Stage 2 (Cross-encoder):** Slow, precise scoring — reads query + document together for more accurate relevance.
+```
+[Query] + [Document 1] → Score: 0.92
+[Query] + [Document 2] → Score: 0.87
+[Query] + [Document 3] → Score: 0.71
+```
+**Models:** `cross-encoder/ms-marco-MiniLM-L-6-v2` (open-source), Cohere Rerank (API).
+**When to use:** When top-K from bi-encoder stage contains irrelevant results that hurt LLM response quality.
+---
+## Chunking Strategies
+How documents are split for indexing. Critical for retrieval quality.
+### Chunking Methods
+| Method | Description | Best For |
+|--------|-------------|---------|
+| **Fixed-size** | Split every N characters/tokens | Baseline, simple docs |
+| **Sentence-based** | Split at sentence boundaries | Prose text |
+| **Semantic** | Split at topic changes (detected by LLM/embeddings) | Complex documents |
+| **Recursive** | Try paragraphs → sentences → words | Variable-length content |
+| **Document-aware** | Respect markdown headers, code blocks | Technical docs, KB files |
+| **Parent-child** | Store full section + child chunks | Knowledge retrieval |
+### Parent-Child Chunking (Recommended for KB)
+```
+Parent chunk: Full section (e.g., "## Memory Architecture")
+  → Stored for retrieval (full context)
+Child chunks: Individual paragraphs within section
+  → Used for search (narrow matches)
+  → When matched, return parent chunk (full context)
+```
+**Why:** Narrow chunks retrieve more precisely; full sections provide sufficient context for LLM.
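The parent-child idea can be sketched as follows. This is an illustration under simplifying assumptions: the helper names are hypothetical, sections are passed in as a plain dict, and a toy keyword-overlap match stands in for real embedding or BM25 search over the child chunks.

```python
def build_parent_child_index(sections):
    """sections: dict mapping a section title to its full text."""
    parents = {}
    children = []  # list of (child_text, parent_id)
    for parent_id, (title, text) in enumerate(sections.items()):
        parents[parent_id] = f"{title}\n{text}"
        # Each paragraph becomes a narrow child chunk pointing at its parent.
        for paragraph in text.split("\n\n"):
            if paragraph.strip():
                children.append((paragraph.strip(), parent_id))
    return parents, children


def search_parents(query, parents, children):
    """Match against child chunks, but return the full parent sections."""
    terms = set(query.lower().split())
    best = {}
    for text, parent_id in children:
        overlap = len(terms & set(text.lower().split()))
        if overlap:
            best[parent_id] = max(best.get(parent_id, 0), overlap)
    ranked = sorted(best, key=best.get, reverse=True)
    return [parents[p] for p in ranked]


sections = {
    "## Memory Architecture": (
        "Short-term memory holds the active context.\n\n"
        "Long-term memory persists facts in a vector store."
    ),
    "## Hooks": "Hooks intercept lifecycle events.",
}
parents, children = build_parent_child_index(sections)
results = search_parents("vector store", parents, children)
```

The query only matches the second paragraph of the first section, yet the caller receives the whole "Memory Architecture" section, including the paragraph that did not match.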
+### Chunk Size Guidelines
+| Content Type | Recommended Size | Overlap |
+|-------------|-----------------|---------|
+| Technical documentation | 512-1024 tokens | 10-20% |
+| Conversational notes | 256-512 tokens | 5-10% |
+| Code snippets | 256-512 tokens | 0% (no overlap) |
+| Legal/formal documents | 1024-2048 tokens | 15-20% |
+| Research papers | 512-1024 tokens | 10-15% |
+**Key rule:** Each chunk should be **semantically self-contained** — understandable without surrounding context.
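To make the size and overlap columns concrete, here is a minimal fixed-size chunker. It is a baseline sketch only: the function name is hypothetical, and it works on an already tokenized list rather than on a specific tokenizer's output.

```python
def chunk_with_overlap(tokens, size, overlap_ratio):
    """Split `tokens` into windows of `size` sharing `overlap_ratio` of tokens."""
    overlap = int(size * overlap_ratio)
    step = size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the last window already reaches the end
    return chunks


chunks = chunk_with_overlap(list(range(10)), size=4, overlap_ratio=0.25)
```

With a size of 4 and 25% overlap, each window repeats the last token of the previous one. A fixed-size split like this ignores sentence and section boundaries, which is why the table treats it as a baseline rather than a recommendation.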
+---
+## GraphRAG (Microsoft Research + Production)
+### What GraphRAG Adds
+Standard RAG: "What do these chunks say?"
+GraphRAG: "What do ENTITIES and their RELATIONSHIPS say?"
+**Two query modes:**
+- **Local query:** Specific fact lookup about known entities
+- **Global query:** Theme/pattern questions across entire corpus
+### LazyGraphRAG (Microsoft Research, announced November 2024)
+Full GraphRAG requires expensive upfront indexing (extract all entities and relationships).
+**LazyGraphRAG:** Defers expensive analysis to query time.
+- Index cost: ~0.1% of full GraphRAG
+- Quality: Comparable for global queries
+- Trade-off: Higher latency per query
+**Use when:** Cost of upfront indexing prohibitive; data changes frequently.
+---
+## Agentic RAG
+State of the art — agents that **plan, retrieve, reason, critique, and refine** in loops.
+### Agentic RAG Loop
+```
+[Question]
+  ↓
+[Decompose into sub-questions]
+  ↓
+For each sub-question:
+  [Query formulation] → refined search terms
+  [Retrieval] → top-K results
+  [Relevance check] → is this actually useful?
+  [Gap detection] → what's missing?
+  If insufficient → reformulate query and retry (max N)
+  ↓
+[Cross-reference all findings]
+[Identify convergences and contradictions]
+[Synthesize into coherent answer]
+[Verify citations are accurate]
+  ↓
+[Final response with citations]
+```
+**Survey:** arXiv 2501.09136 — "Agentic RAG" as formal research area.
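The control flow of the loop above can be sketched with injected callables. Everything here is hypothetical scaffolding: `decompose`, `retrieve`, `is_sufficient`, `reformulate`, and `synthesize` stand in for LLM or retriever calls, and the citation-verification step is left out.

```python
def agentic_rag(question, decompose, retrieve, is_sufficient, reformulate,
                synthesize, max_retries=2):
    """Run a decompose / retrieve / check / retry loop, then synthesize."""
    findings = {}
    for sub_question in decompose(question):
        query = sub_question
        results = retrieve(query)
        attempts = 0
        # Retry with a reformulated query until results suffice or budget ends.
        while not is_sufficient(sub_question, results) and attempts < max_retries:
            query = reformulate(sub_question, results)
            results = retrieve(query)
            attempts += 1
        findings[sub_question] = results
    return synthesize(question, findings)


# Toy stand-ins so the sketch runs without any model or index.
corpus = {"bm25": "BM25 is keyword search.", "rrf": "RRF merges rankings."}
answer = agentic_rag(
    "What are BM25 and RRF?",
    decompose=lambda q: ["What is BM25?", "What is RRF?"],
    retrieve=lambda query: [t for key, t in corpus.items() if key in query.lower()],
    is_sufficient=lambda sub_q, results: len(results) > 0,
    reformulate=lambda sub_q, results: sub_q.lower(),
    synthesize=lambda q, findings: " ".join(r for rs in findings.values() for r in rs),
)
```

Passing the steps in as functions keeps the retry budget (`max_retries`, the "max N" in the diagram) in one place and lets each step be swapped or tested on its own.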
+### SINAPSE Research Pipeline (analogous)
+```
+@analyst receives research request
+  → Decompose into sub-questions
+  → For each: search vault + web + papers
+  → Extract claims and assess credibility
+  → Cross-reference sources
+  → Synthesize structured output
+  → Deposit results in vault (audit trail)
+```
+---
+## RAG Evaluation
+### Core Metrics
+| Metric | What It Measures | Target |
+|--------|-----------------|--------|
+| Retrieval Precision | % retrieved chunks actually relevant | > 70% |
+| Retrieval Recall | % relevant chunks retrieved | > 60% |
+| Answer Faithfulness | Answer grounded in retrieved context | > 90% |
+| Answer Relevance | Answer addresses the question | > 85% |
+| Context Relevance | Retrieved context relevant to question | > 70% |
+| Latency P95 | Time to first token | < 3s |
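The two retrieval metrics in the table reduce to set arithmetic once each test query has a labeled set of relevant chunk IDs. A minimal sketch, with a hypothetical function name and made-up chunk IDs:

```python
def retrieval_precision_recall(retrieved_ids, relevant_ids):
    """Return (precision, recall) for a single query."""
    retrieved = set(retrieved_ids)
    relevant = set(relevant_ids)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


precision, recall = retrieval_precision_recall(
    retrieved_ids=["c1", "c2", "c3", "c4"],
    relevant_ids=["c1", "c3", "c7"],
)
```

In this example two of four retrieved chunks are relevant (precision 0.5, below the 70% target) and two of three relevant chunks were found (recall about 0.67, above the 60% target). In practice the values would be averaged over a full test set.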
+### RAGAS Framework
+Open-source RAG evaluation (Exploding Gradients, 2023):
+```python
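+from datasets import Dataset
+from ragas import evaluate
+from ragas.metrics import faithfulness, answer_relevancy, context_precision
+# Illustrative one-row test set; column names vary by ragas version, and evaluate() needs a configured LLM.
+test_cases = Dataset.from_dict({
+    "question": ["What does RRF merge?"],
+    "answer": ["Rankings from multiple retrievers."],
+    "contexts": [["RRF merges rankings from multiple retrieval systems."]],
+    "ground_truth": ["Rankings from multiple retrieval systems."],
+})
+results = evaluate(dataset=test_cases, metrics=[faithfulness, answer_relevancy, context_precision])
+print(results)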
+```
+### Common Failure Modes
+| Failure | Cause | Fix |
+|---------|-------|-----|
+| Hallucination | LLM ignores retrieved context | Better prompting: "Based ONLY on context" |
+| Retrieved wrong content | Poor chunking or embedding quality | Improve chunking, upgrade embeddings |
+| Missing relevant content | Incomplete retrieval | Hybrid search, increase top-K |
+| Context too long | Too many chunks retrieved | Cross-encoder reranking, reduce K |
+| Outdated information | Stale index | Index update schedule |
+---
+## RAG for SINAPSE Knowledge Base
+### Current Knowledge Flow
+```
+User query → Claude Code
+  → (no automated retrieval)
+  → Claude reads KB files manually when needed
+```
+### Recommended Enhancement
+```
+User query → SINAPSE agent
+  → Query formulation
+  → Hybrid search over KB + stories + architecture docs
+  → Top-N relevant chunks
+  → Grounded response with KB citations
+```
+### Implementation Approach
+1. **Index KB files:** Embed all KB files (*.md in knowledge-base/)
+2. **Index stories:** Embed active stories (docs/stories/)
+3. **Index architecture docs:** Embed docs/architecture/
+4. **Search API:** Expose search endpoint via MCP server
+5. **Agent integration:** Add `*search-kb` skill that agents call when needing reference
+### Chunking KB Files
+KB files are structured markdown. Use document-aware chunking:
+```python
+# Split at H2 headers (##)
+# Keep H1 context in each chunk as prefix
+# Minimum chunk: 100 tokens
+# Maximum chunk: 1500 tokens
+# Overlap: 10% between consecutive sections
+```
+Each chunk prefixed with: `{file_name} > {h1_title} > {h2_section}`
+Example: `memory-systems-reference.md > Memory Frameworks Comparison > Letta (MemGPT)`
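The header-splitting and prefix rules above could look roughly like this. It is a sketch under stated assumptions: the function name `chunk_kb_markdown` is hypothetical, the minimum and maximum token limits and the 10% overlap are not implemented, and `## ` lines inside fenced code blocks would be misread as headers.

```python
import re


def chunk_kb_markdown(file_name, text):
    """Split markdown at H2 headers and prefix each chunk with a breadcrumb."""
    h1_match = re.search(r"^# (.+)$", text, flags=re.MULTILINE)
    h1_title = h1_match.group(1).strip() if h1_match else ""
    # Zero-width split before every line that starts an H2 section.
    parts = re.split(r"^(?=## )", text, flags=re.MULTILINE)
    chunks = []
    for part in parts:
        if not part.startswith("## "):
            continue  # skip the preamble before the first H2
        header, _, body = part.partition("\n")
        h2_section = header[3:].strip()
        prefix = f"{file_name} > {h1_title} > {h2_section}"
        chunks.append(f"{prefix}\n{body.strip()}")
    return chunks


sample = (
    "# Memory Systems Reference\n"
    "Intro.\n"
    "## Memory Frameworks Comparison\n"
    "Overview text.\n"
    "### Letta (MemGPT)\n"
    "Details.\n"
    "## Other\n"
    "Text.\n"
)
chunks = chunk_kb_markdown("memory-systems-reference.md", sample)
```

H3 subsections stay inside their H2 chunk, so each chunk carries its file, H1, and H2 context in the first line.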

package/squads/squad-claude/knowledge-base/skill-creation-patterns.md ADDED

@@ -0,0 +1,380 @@
+# Skill Creation Patterns
+> Complete guide to the Agent Skills ecosystem. Based on skills-ecosystem-analysis research (April 2026). 1,060+ skills catalogued, 33 platforms supporting the format.
+---
+## The Agent Skills Standard
+### What Skills Are
+Skills are **portable, on-demand capability packages** that teach agents how to accomplish domain-specific tasks. Not tools (which provide programmatic access) — skills are **knowledge and workflow instructions** that agents load when needed.
+**The key difference:**
+| Concept | What it is | When to use |
+|---------|-----------|-------------|
+| **Skill** | Instructions + resources teaching the agent a workflow | Agent needs domain expertise or methodology |
+| **MCP Tool** | Server exposing programmatic actions via protocol | Agent needs API/DB/external service access |
+| **Hook** | Script intercepting lifecycle events | Enforcement, validation, automation |
+| **CLAUDE.md** | Global project instructions | Conventions, rules, project context |
+### Format Standard (agentskills.io)
+The `SKILL.md` format is supported by **33 platforms** (April 2026), including Claude Code, Codex, Gemini CLI, Cursor, VS Code, GitHub Copilot, and 27 others.
+**Key principle:** Write once, run anywhere.
+---
+## SKILL.md File Format
+### Directory Structure
+```
+skill-name/
+  SKILL.md          # REQUIRED: metadata + instructions
+  scripts/          # Optional: executable scripts
+  references/       # Optional: supplementary documentation
+  assets/           # Optional: templates, static resources
+  LICENSE.txt       # Optional: license
+```
+**Rule:** Folder name MUST match the `name` field in frontmatter.
+### Frontmatter Specification
+| Field | Required | Constraints | Description |
+|-------|----------|-------------|-------------|
+| `name` | Yes | Max 64 chars; lowercase letters, digits, and hyphens; no leading, trailing, or consecutive hyphens | Unique identifier |
+| `description` | Yes | Max 1024 chars | What it does AND when to use it |
+| `license` | No | License name or file reference | Distribution terms |
+| `compatibility` | No | Max 500 chars | Environment requirements |
+| `metadata` | No | `map<string, string>` | Arbitrary key-value pairs |
+| `allowed-tools` | No | Experimental | Pre-approved tools list |
+**Minimal example:**
+```yaml
+---
+name: pdf-processing
+description: Extract PDF text, fill forms, merge files. Use when handling PDFs.
+---
+```
+**Complete example:**
+```yaml
+---
+name: security-audit
+description: |
+  Audit code for security vulnerabilities including injection flaws,
+  auth issues, and secret exposure. Use when reviewing PR diffs,
+  new features, or when user mentions security review.
+license: Apache-2.0
+compatibility: Requires Python 3.12+
+metadata:
+  author: squad-claude
+  version: "1.0"
+  category: security
+allowed-tools: Read Grep Glob Bash(git diff *)
+---
+```
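The frontmatter constraints in the table can be checked mechanically before a skill is published. The sketch below is an assumption-laden illustration: `validate_frontmatter` is a hypothetical helper, it expects the frontmatter already parsed into a dict, and its name pattern also accepts digits, which the table does not mention.

```python
import re

# Lowercase alphanumeric groups joined by single hyphens (assumed reading).
NAME_PATTERN = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")


def validate_frontmatter(fields, folder_name):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    name = fields.get("name", "")
    description = fields.get("description", "")
    if not name:
        problems.append("name is required")
    elif len(name) > 64:
        problems.append("name exceeds 64 characters")
    elif not NAME_PATTERN.match(name):
        problems.append("name must be lowercase with single hyphens")
    elif name != folder_name:
        problems.append("name must match the folder name")
    if not description:
        problems.append("description is required")
    elif len(description) > 1024:
        problems.append("description exceeds 1024 characters")
    if len(fields.get("compatibility", "")) > 500:
        problems.append("compatibility exceeds 500 characters")
    return problems


ok = validate_frontmatter(
    {"name": "pdf-processing", "description": "Extract PDF text. Use when handling PDFs."},
    folder_name="pdf-processing",
)
bad = validate_frontmatter({"name": "pdf--processing"}, folder_name="pdf--processing")
```

A check like this could run in CI for a skills repository, so a folder rename or an overlong description is caught before the skill silently fails to load.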
+### Progressive Disclosure (3 levels)
+The most critical architectural pattern in the skill system:
+| Level | What loads | When | Ideal size |
+|-------|-----------|------|-----------|
+| 1. Metadata | `name` + `description` | Always (startup) | ~100 tokens |
+| 2. Instructions | Full SKILL.md body | When skill is activated | < 5,000 tokens (~500 lines) |
+| 3. Resources | Files in scripts/, references/, assets/ | On-demand | Unlimited |
+**Critical rule: Keep SKILL.md under 500 lines.** Move detailed reference material to separate files.
+---
+## Quality Patterns
+### Writing Effective Descriptions
+The description is NOT just documentation — it's the **primary activation mechanism**. Agents use descriptions to decide when to invoke skills.
+**Anti-pattern (vague):**
+```yaml
+description: "Helps with code review"
+```
+**Best practice (specific + trigger keywords):**
+```yaml
+description: |
+  Review code changes for security vulnerabilities, logic errors, and style violations.
+  Use when: reviewing PRs, auditing new features, after implementing auth/payment code,
+  or when user asks for code review, security check, or audit.
+```
+**Anthropic recommendation:** Make descriptions "a bit pushy" to counter the tendency of skills to under-trigger. Include specific keywords users might say.
+### Instruction Principles
+1. **Imperative, not declarative** — "Do X" not "X should be done"
+2. **Concrete examples** — Every skill should have input/output examples
+3. **Decision trees** — Complex skills need explicit when/when-not-to-use branches
+4. **Reference templates** — Link to templates in sub-folders, don't inline everything
+5. **No inventions** — Skills teach existing patterns, not invented ones
+6. **Scripts as black boxes** — Scripts in `scripts/` should be called with `--help` first, not read inline
+### Decision Tree Pattern
+```markdown
+## When to Use This Skill
+Use this skill when:
+- User mentions "security review", "audit", "vulnerability"
+- PR contains changes to auth, payments, or user data
+- New API endpoints are being added
+Do NOT use when:
+- Simple read-only changes (cosmetic, docs)
+- Test-only changes
+- User explicitly asks for implementation help (not review)
+## Decision Process
+1. Read the diff/changed files
+2. Check for [high-risk patterns]:
+   - SQL string concatenation → BLOCK, flag injection risk
+   - Hardcoded credentials → BLOCK, flag secret exposure
+   - Missing input validation → WARN, suggest Zod schema
+   - RLS disabled → BLOCK, flag security regression
+3. Summarize findings by severity: CRITICAL / HIGH / MEDIUM / LOW
+```
+---
+## Official Anthropic Skills (17 skills)
+### Plugin: document-skills (source-available)
+| Skill | Capability |
+|-------|-----------|
+| `docx` | Word document creation/editing with tracked changes |
+| `pdf` | Full PDF manipulation (extract, merge, fill forms) |
+| `pptx` | PowerPoint creation/editing |
+| `xlsx` | Excel creation/editing with formulas and charts |
+### Plugin: example-skills (Apache 2.0)
+| Skill | Capability |
+|-------|-----------|
+| `algorithmic-art` | Generative art with p5.js and seeded randomness |
+| `brand-guidelines` | Anthropic visual identity application |
+| `canvas-design` | Visual design in PNG/PDF |
+| `doc-coauthoring` | Document co-authoring |
+| `frontend-design` | Production-grade interfaces without "AI slop" |
+| `internal-comms` | Internal communications (status reports, newsletters) |
+| `mcp-builder` | Guide for creating MCP servers |
+| `skill-creator` | Meta-skill for creating new skills |
+| `slack-gif-creator` | Animated GIFs optimized for Slack |
+| `theme-factory` | Visual theme creation |
+| `web-artifacts-builder` | Interactive HTML artifacts |
+| `webapp-testing` | Web app testing with Playwright |
+### Plugin: claude-api
+| Skill | Capability |
+|-------|-----------|
+| `claude-api` | Claude API documentation and SDK (Python, TS, Java, Go, Ruby, C#, PHP) |
+---
+## Ecosystem Numbers (April 2026)
+| Metric | Value |
+|--------|-------|
+| Skills catalogued (VoltAgent/awesome-agent-skills) | 1,060+ |
+| Dev teams contributing | 38+ |
+| Categories | 11+ |
+| Official vendor skills | 307 |
+| Community skills | 144+ |
+| Platforms supporting SKILL.md format | 33 |
+| obra/superpowers stars | 134,347 |
+| anthropics/skills stars | 110,197 |
+### Quality Distribution
+| Tier | Description | Estimated % |
+|------|-------------|-------------|
+| S-tier | Official Anthropic + Trail of Bits + Superpowers | ~5% |
+| A-tier | Official vendors (Vercel, Netlify, Expo, Microsoft) | ~15% |
+| B-tier | Strong community (documented, tested) | ~25% |
+| C-tier | Basic community (functional, no polish) | ~35% |
+| D-tier | Low-effort / AI-generated bulk | ~20% |
+---
+## obra/superpowers Framework
+The dominant community reference with 134K stars — not just skills, but a complete development methodology.
+### 14 Skills Included
+- `brainstorming` — Structured ideation
+- `dispatching-parallel-agents` — Parallel agent orchestration
+- `executing-plans` — Implementation plan execution
+- `finishing-a-development-branch` — Branch finalization
+- `receiving-code-review` — Code review reception
+- `requesting-code-review` — Code review solicitation
+- `subagent-driven-development` — Development with subagents
+- `systematic-debugging` — Systematic debug process
+- `test-driven-development` — TDD workflow
+- `using-git-worktrees` — Worktrees for parallelism
+- `verification-before-completion` — Pre-delivery verification
+- `writing-plans` — Implementation plan writing
+- `writing-skills` — Meta-skill for creating skills
+- `using-superpowers` — Framework usage guide
+### Methodology (5 steps)
+1. Understand what user wants (spec)
+2. Show spec in digestible chunks
+3. Create implementation plan for "enthusiastic junior engineer"
+4. Subagent-driven development (one agent per task)
+5. Two-stage review (spec compliance + code quality)
+---
+## Skills for SINAPSE Squad-Claude
+### Planned Skills to Build
+| Skill | Priority | Description |
+|-------|----------|-------------|
+| `claude-code-audit` | High | Audit CC configuration, hooks, settings |
+| `sinapse-setup` | High | Initialize SINAPSE in a new project |
+| `agent-persona-creation` | High | Create new agent .md files |
+| `hooks-architecture` | Medium | Design and implement hook systems |
+| `mcp-server-setup` | Medium | Configure and test MCP servers |
+| `context-optimization` | Medium | Analyze and reduce context window usage |
+| `skill-creation` | High | Meta-skill — create new SINAPSE skills |
+| `squad-publishing` | Low | Package and publish squads |
+### Skill Template for SINAPSE
+```markdown
+---
+name: {skill-name}
+description: |
+  {What it does in 1-2 sentences}.
+  Use when: {specific trigger conditions — include keywords users say}.
+license: Apache-2.0
+metadata:
+  author: squad-claude
+  version: "1.0"
+  category: {claude-code|configuration|agent|workflow}
+allowed-tools: {space-separated list or omit}
+---
+## Purpose
+{Brief purpose statement}
+## Prerequisites
+- {Requirement 1}
+- {Requirement 2}
+## When to Use
+Use this skill when:
+- {Trigger condition 1}
+- {Trigger condition 2}
+Do NOT use when:
+- {Anti-trigger 1}
+## Process
+1. **{Step 1}**
+   {Instructions}
+2. **{Step 2}**
+   {Instructions}
+## Outputs
+- `{file/artifact}` — {description}
+## Common Issues
+| Issue | Cause | Fix |
+|-------|-------|-----|
+| {issue} | {cause} | {fix} |
+```
+---
+## Plugin Distribution System
+### Plugin Manifest (marketplace.json)
+```json
+{
+  "name": "sinapse-claude-skills",
+  "plugins": [
+    {
+      "name": "development-workflow",
+      "description": "Skills for Claude Code configuration and SINAPSE workflows",
+      "source": "./",
+      "strict": false,
+      "skills": [
+        "./skills/claude-code-audit",
+        "./skills/sinapse-setup",
+        "./skills/agent-persona-creation"
+      ]
+    }
+  ]
+}
+```
+### Installation
+```bash
+/plugin marketplace add sinapse-ai/squad-claude-skills
+/plugin install development-workflow@sinapse-claude-skills
+```
+### Namespacing
+Skills from plugins receive prefix: `plugin-name:skill-name`
+Example: `development-workflow:claude-code-audit`
+---
+## Platform Compatibility Matrix
+| Platform | Support | Notes |
+|----------|---------|-------|
+| Claude Code | Full | Native marketplace |
+| Codex | Full | `.codex/skills/` directory |
+| Gemini CLI | Full | Standard SKILL.md |
+| Cursor | Full | Marketplace |
+| VS Code | Full | Extension integration |
+| GitHub Copilot | Full | Instruction-based |
+| OpenCode | Full | Fetch + install |
+| OpenHands | Full | Cloud platform |
+| Spring AI | Native | Framework integration |
+| 24 others | Full | Via agentskills.io spec |
+---
+## Anti-Patterns
+| Anti-pattern | Problem | Fix |
+|-------------|---------|-----|
+| Vague description | Skill never triggers | Add specific keywords + trigger conditions |
+| SKILL.md > 500 lines | Context pollution | Move details to `references/` subfolder |
+| Scripts inlined in SKILL.md | Unmanageable | Put in `scripts/`, call with `--help` first |
+| Inventing workflows | No grounding | Only document validated, existing patterns |
+| Single monolithic skill | Hard to maintain | Split into focused single-responsibility skills |
+| No examples | Agents misuse skill | Include concrete input/output examples |
+| Hardcoded paths | Portability failures | Use relative paths and env vars |