@rune-kit/rune 2.1.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (155)
  1. package/LICENSE +21 -0
  2. package/README.md +357 -0
  3. package/agents/.gitkeep +0 -0
  4. package/agents/architect.md +29 -0
  5. package/agents/asset-creator.md +11 -0
  6. package/agents/audit.md +11 -0
  7. package/agents/autopsy.md +11 -0
  8. package/agents/brainstorm.md +11 -0
  9. package/agents/browser-pilot.md +11 -0
  10. package/agents/coder.md +29 -0
  11. package/agents/completion-gate.md +11 -0
  12. package/agents/constraint-check.md +11 -0
  13. package/agents/context-engine.md +11 -0
  14. package/agents/cook.md +11 -0
  15. package/agents/db.md +11 -0
  16. package/agents/debug.md +11 -0
  17. package/agents/dependency-doctor.md +11 -0
  18. package/agents/deploy.md +11 -0
  19. package/agents/design.md +11 -0
  20. package/agents/docs-seeker.md +11 -0
  21. package/agents/fix.md +11 -0
  22. package/agents/hallucination-guard.md +11 -0
  23. package/agents/incident.md +11 -0
  24. package/agents/integrity-check.md +11 -0
  25. package/agents/journal.md +11 -0
  26. package/agents/launch.md +11 -0
  27. package/agents/logic-guardian.md +11 -0
  28. package/agents/marketing.md +11 -0
  29. package/agents/onboard.md +11 -0
  30. package/agents/perf.md +11 -0
  31. package/agents/plan.md +11 -0
  32. package/agents/preflight.md +11 -0
  33. package/agents/problem-solver.md +11 -0
  34. package/agents/rescue.md +11 -0
  35. package/agents/research.md +11 -0
  36. package/agents/researcher.md +29 -0
  37. package/agents/review-intake.md +11 -0
  38. package/agents/review.md +11 -0
  39. package/agents/reviewer.md +28 -0
  40. package/agents/safeguard.md +11 -0
  41. package/agents/sast.md +11 -0
  42. package/agents/scanner.md +28 -0
  43. package/agents/scope-guard.md +11 -0
  44. package/agents/scout.md +11 -0
  45. package/agents/sentinel.md +11 -0
  46. package/agents/sequential-thinking.md +11 -0
  47. package/agents/session-bridge.md +11 -0
  48. package/agents/skill-forge.md +11 -0
  49. package/agents/skill-router.md +11 -0
  50. package/agents/surgeon.md +11 -0
  51. package/agents/team.md +11 -0
  52. package/agents/test.md +11 -0
  53. package/agents/trend-scout.md +11 -0
  54. package/agents/verification.md +11 -0
  55. package/agents/video-creator.md +11 -0
  56. package/agents/watchdog.md +11 -0
  57. package/agents/worktree.md +11 -0
  58. package/commands/.gitkeep +0 -0
  59. package/commands/rune.md +168 -0
  60. package/compiler/__tests__/openclaw-adapter.test.js +140 -0
  61. package/compiler/__tests__/parser.test.js +55 -0
  62. package/compiler/adapters/antigravity.js +59 -0
  63. package/compiler/adapters/claude.js +37 -0
  64. package/compiler/adapters/cursor.js +67 -0
  65. package/compiler/adapters/generic.js +60 -0
  66. package/compiler/adapters/index.js +45 -0
  67. package/compiler/adapters/openclaw.js +150 -0
  68. package/compiler/adapters/windsurf.js +60 -0
  69. package/compiler/bin/rune.js +288 -0
  70. package/compiler/doctor.js +153 -0
  71. package/compiler/emitter.js +240 -0
  72. package/compiler/parser.js +208 -0
  73. package/compiler/transformer.js +69 -0
  74. package/compiler/transforms/branding.js +27 -0
  75. package/compiler/transforms/cross-references.js +29 -0
  76. package/compiler/transforms/frontmatter.js +38 -0
  77. package/compiler/transforms/hooks.js +68 -0
  78. package/compiler/transforms/subagents.js +36 -0
  79. package/compiler/transforms/tool-names.js +60 -0
  80. package/contexts/dev.md +34 -0
  81. package/contexts/research.md +43 -0
  82. package/contexts/review.md +55 -0
  83. package/extensions/ai-ml/PACK.md +517 -0
  84. package/extensions/analytics/PACK.md +557 -0
  85. package/extensions/backend/PACK.md +678 -0
  86. package/extensions/chrome-ext/PACK.md +995 -0
  87. package/extensions/content/PACK.md +381 -0
  88. package/extensions/devops/PACK.md +520 -0
  89. package/extensions/ecommerce/PACK.md +280 -0
  90. package/extensions/gamedev/PACK.md +393 -0
  91. package/extensions/mobile/PACK.md +273 -0
  92. package/extensions/saas/PACK.md +805 -0
  93. package/extensions/security/PACK.md +536 -0
  94. package/extensions/trading/PACK.md +597 -0
  95. package/extensions/ui/PACK.md +947 -0
  96. package/package.json +47 -0
  97. package/skills/.gitkeep +0 -0
  98. package/skills/adversary/SKILL.md +271 -0
  99. package/skills/asset-creator/SKILL.md +157 -0
  100. package/skills/audit/SKILL.md +466 -0
  101. package/skills/autopsy/SKILL.md +200 -0
  102. package/skills/ba/SKILL.md +279 -0
  103. package/skills/brainstorm/SKILL.md +266 -0
  104. package/skills/browser-pilot/SKILL.md +168 -0
  105. package/skills/completion-gate/SKILL.md +151 -0
  106. package/skills/constraint-check/SKILL.md +165 -0
  107. package/skills/context-engine/SKILL.md +176 -0
  108. package/skills/cook/SKILL.md +636 -0
  109. package/skills/db/SKILL.md +256 -0
  110. package/skills/debug/SKILL.md +240 -0
  111. package/skills/dependency-doctor/SKILL.md +235 -0
  112. package/skills/deploy/SKILL.md +174 -0
  113. package/skills/design/DESIGN-REFERENCE.md +365 -0
  114. package/skills/design/SKILL.md +462 -0
  115. package/skills/doc-processor/SKILL.md +254 -0
  116. package/skills/docs/SKILL.md +336 -0
  117. package/skills/docs-seeker/SKILL.md +166 -0
  118. package/skills/fix/SKILL.md +192 -0
  119. package/skills/git/SKILL.md +285 -0
  120. package/skills/hallucination-guard/SKILL.md +204 -0
  121. package/skills/incident/SKILL.md +241 -0
  122. package/skills/integrity-check/SKILL.md +169 -0
  123. package/skills/journal/SKILL.md +190 -0
  124. package/skills/launch/SKILL.md +330 -0
  125. package/skills/logic-guardian/SKILL.md +240 -0
  126. package/skills/marketing/SKILL.md +229 -0
  127. package/skills/mcp-builder/SKILL.md +311 -0
  128. package/skills/onboard/SKILL.md +298 -0
  129. package/skills/perf/SKILL.md +297 -0
  130. package/skills/plan/SKILL.md +520 -0
  131. package/skills/preflight/SKILL.md +231 -0
  132. package/skills/problem-solver/SKILL.md +284 -0
  133. package/skills/rescue/SKILL.md +434 -0
  134. package/skills/research/SKILL.md +122 -0
  135. package/skills/review/SKILL.md +354 -0
  136. package/skills/review-intake/SKILL.md +222 -0
  137. package/skills/safeguard/SKILL.md +188 -0
  138. package/skills/sast/SKILL.md +190 -0
  139. package/skills/scaffold/SKILL.md +276 -0
  140. package/skills/scope-guard/SKILL.md +150 -0
  141. package/skills/scout/SKILL.md +232 -0
  142. package/skills/sentinel/SKILL.md +320 -0
  143. package/skills/sentinel-env/SKILL.md +226 -0
  144. package/skills/sequential-thinking/SKILL.md +234 -0
  145. package/skills/session-bridge/SKILL.md +287 -0
  146. package/skills/skill-forge/SKILL.md +317 -0
  147. package/skills/skill-router/SKILL.md +267 -0
  148. package/skills/surgeon/SKILL.md +203 -0
  149. package/skills/team/SKILL.md +397 -0
  150. package/skills/test/SKILL.md +271 -0
  151. package/skills/trend-scout/SKILL.md +145 -0
  152. package/skills/verification/SKILL.md +201 -0
  153. package/skills/video-creator/SKILL.md +201 -0
  154. package/skills/watchdog/SKILL.md +166 -0
  155. package/skills/worktree/SKILL.md +140 -0
@@ -0,0 +1,517 @@
1
+ ---
2
+ name: "@rune/ai-ml"
3
+ description: AI/ML integration patterns — LLM integration, RAG pipelines, embeddings, and fine-tuning workflows.
4
+ metadata:
5
+ author: runedev
6
+ version: "0.2.0"
7
+ layer: L4
8
+ price: "$15"
9
+ target: AI engineers
10
+ ---
11
+
12
+ # @rune/ai-ml
13
+
14
+ ## Purpose
15
+
16
+ AI-powered features fail in predictable ways: LLM calls without retry logic that crash on rate limits, RAG pipelines that retrieve irrelevant chunks because the chunking strategy ignores document structure, embedding search that returns semantic matches with zero keyword overlap, and fine-tuning runs that overfit because the eval set leaked into training data. This pack codifies production patterns for each — from API client resilience to retrieval quality to model evaluation — so AI features ship with the reliability of traditional software.
17
+
18
+ ## Triggers
19
+
20
+ - Auto-trigger: when `openai`, `anthropic`, `@langchain`, `pinecone`, `pgvector`, `embedding`, `llm` detected in dependencies or code
21
+ - `/rune llm-integration` — audit or improve LLM API usage
22
+ - `/rune rag-patterns` — build or audit RAG pipeline
23
+ - `/rune embedding-search` — implement or optimize semantic search
24
+ - `/rune fine-tuning-guide` — prepare and execute fine-tuning workflow
25
+ - Called by `cook` (L1) when AI/ML task detected
26
+ - Called by `plan` (L2) when AI architecture decisions needed
27
+
28
+ ## Skills Included
29
+
30
+ ### llm-integration
31
+
32
+ LLM integration patterns — API client wrappers, streaming responses, structured output, retry with exponential backoff, model fallback chains, prompt versioning.
33
+
34
+ #### Workflow
35
+
36
+ **Step 1 — Detect LLM usage**
37
+ Use Grep to find LLM API calls: `openai.chat`, `anthropic.messages`, `OpenAI(`, `Anthropic(`, `generateText`, `streamText`. Read client initialization and prompt construction to understand: model selection, error handling, output parsing, and token management.
38
+
39
+ **Step 2 — Audit resilience**
40
+ Check for: no retry on rate limit (429), no timeout on API calls, unstructured output parsing (regex on LLM text instead of function calling), hardcoded prompts without versioning, no token counting before request, missing fallback model chain, and streaming without backpressure handling.
41
+
42
+ **Step 3 — Emit robust LLM client**
43
+ Emit: typed client wrapper with exponential backoff retry, structured output via Zod schema + function calling, streaming with proper error boundaries, token budget management, and prompt version registry.
44
+
45
+ #### Example
46
+
47
+ ```typescript
48
+ // Robust LLM client — retry, structured output, fallback chain
49
+ import OpenAI from 'openai';
50
+ import { z } from 'zod';
51
+
52
+ const client = new OpenAI();
53
+
54
+ const SentimentSchema = z.object({
55
+ sentiment: z.enum(['positive', 'negative', 'neutral']),
56
+ confidence: z.number().min(0).max(1),
57
+ reasoning: z.string(),
58
+ });
59
+
60
+ async function analyzeSentiment(text: string, attempt = 0): Promise<z.infer<typeof SentimentSchema>> {
61
+ const models = ['gpt-4o-mini', 'gpt-4o'] as const; // fallback chain
62
+ const model = attempt >= 2 ? models[1] : models[0];
63
+
64
+ try {
65
+ const response = await client.chat.completions.create({
66
+ model,
67
+ messages: [
68
+ { role: 'system', content: 'Analyze sentiment. Return JSON matching the schema.' },
69
+ { role: 'user', content: text },
70
+ ],
71
+ response_format: { type: 'json_object' },
72
+ max_tokens: 200,
73
+ }, { timeout: 10_000 }); // timeout is a per-request option in openai v4, not a body param
75
+
76
+ return SentimentSchema.parse(JSON.parse(response.choices[0].message.content!));
77
+ } catch (err) {
78
+ if (err instanceof OpenAI.RateLimitError && attempt < 3) {
79
+ await new Promise(r => setTimeout(r, Math.pow(2, attempt) * 1000));
80
+ return analyzeSentiment(text, attempt + 1);
81
+ }
82
+ throw err;
83
+ }
84
+ }
85
+ ```
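The retry above uses plain exponential backoff; the Sharp Edges table later recommends adding jitter. A minimal sketch of the jittered variant (the function names and the injectable `rng` parameter are ours, added so the schedule is testable):

```typescript
// Exponential backoff with full jitter: delay = rand() * min(cap, base * 2^attempt).
// `rng` is injectable (defaults to Math.random) so the schedule can be tested.
function backoffDelayMs(
  attempt: number,
  baseMs = 1000,
  capMs = 30_000,
  rng: () => number = Math.random,
): number {
  const exp = Math.min(capMs, baseMs * 2 ** attempt);
  return rng() * exp;
}

// Generic retry wrapper: retries only when `isRetryable` says so (e.g. a 429).
async function withRetry<T>(
  fn: () => Promise<T>,
  isRetryable: (err: unknown) => boolean,
  maxAttempts = 3,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt + 1 >= maxAttempts || !isRetryable(err)) throw err;
      await new Promise(r => setTimeout(r, backoffDelayMs(attempt)));
    }
  }
}
```

Full jitter spreads simultaneous retries across the whole window, which avoids the thundering-herd effect a fixed `2^attempt * 1000` schedule produces.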
86
+
87
+ ---
88
+
89
+ ### rag-patterns
90
+
91
+ RAG pipeline patterns — document chunking, embedding generation, vector store setup, retrieval strategies, reranking.
92
+
93
+ #### Workflow
94
+
95
+ **Step 1 — Detect RAG components**
96
+ Use Grep to find vector store usage: `PineconeClient`, `pgvector`, `Weaviate`, `ChromaClient`, `QdrantClient`. Find embedding calls: `embeddings.create`, `embed()`. Read the ingestion pipeline and retrieval logic to map the full RAG flow.
97
+
98
+ **Step 2 — Audit retrieval quality**
99
+ Check for: fixed-size chunking that splits mid-sentence (context loss), no overlap between chunks (boundary information lost), embeddings generated without metadata (no filtering capability), retrieval without reranking (relevance drops after top-3), no chunk deduplication, and context window overflow (retrieved chunks exceed model limit).
100
+
101
+ **Step 3 — Emit RAG pipeline**
102
+ Emit: recursive text splitter with semantic boundaries, embedding generation with metadata, vector upsert with namespace, retrieval with reranking, and context window budget management.
103
+
104
+ #### Example
105
+
106
+ ```typescript
107
+ // RAG pipeline — recursive chunking + pgvector + reranking
108
+ import { RecursiveCharacterTextSplitter } from 'langchain/text_splitter';
109
+ import { OpenAIEmbeddings } from '@langchain/openai';
110
+ import { PGVectorStore } from '@langchain/community/vectorstores/pgvector';
111
+
112
+ // Ingestion: chunk → embed → store
113
+ async function ingestDocument(doc: { content: string; metadata: Record<string, string> }) {
114
+ const splitter = new RecursiveCharacterTextSplitter({
115
+ chunkSize: 1000,
116
+ chunkOverlap: 200,
117
+ separators: ['\n## ', '\n### ', '\n\n', '\n', '. ', ' '],
118
+ });
119
+ const chunks = await splitter.createDocuments(
120
+ [doc.content],
121
+ [doc.metadata],
122
+ );
123
+
124
+ const embeddings = new OpenAIEmbeddings({ model: 'text-embedding-3-small' });
125
+ await PGVectorStore.fromDocuments(chunks, embeddings, {
126
+ postgresConnectionOptions: { connectionString: process.env.DATABASE_URL },
127
+ tableName: 'documents',
128
+ });
129
+ }
130
+
131
+ // Retrieval: query → vector search → rerank → top-k
132
+ async function retrieve(query: string, topK = 5) {
133
+ // assumes the same `embeddings` instance and a shared `pgConfig` as in ingestion above
+ const store = await PGVectorStore.initialize(embeddings, pgConfig);
134
+ const candidates = await store.similaritySearch(query, topK * 3); // over-retrieve
135
+
136
+ // Rerank with Cohere (`cohere` = a CohereClient instance from the cohere-ai SDK)
137
+ const { results } = await cohere.rerank({
138
+ model: 'rerank-english-v3.0',
139
+ query,
140
+ documents: candidates.map(c => c.pageContent),
141
+ topN: topK,
142
+ });
143
+
144
+ return results.map(r => candidates[r.index]);
145
+ }
146
+ ```
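Step 3 calls for context window budget management, which the example above does not show. A rough sketch, assuming a chars/4 token approximation (a crude heuristic; use a real tokenizer such as tiktoken for production budgets):

```typescript
// Crude token estimate: ~4 characters per token for English text.
// Replace with a real tokenizer for production budgets.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Keep the highest-ranked chunks (input assumed sorted best-first) until the
// budget is exhausted; never split a chunk mid-way.
function fitChunksToBudget(chunks: string[], budgetTokens: number): string[] {
  const kept: string[] = [];
  let used = 0;
  for (const chunk of chunks) {
    const cost = estimateTokens(chunk);
    if (used + cost > budgetTokens) break;
    kept.push(chunk);
    used += cost;
  }
  return kept;
}
```

Because the reranker already returns chunks best-first, stopping at the first chunk that does not fit drops exactly the lowest-ranked tail.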
147
+
148
+ ---
149
+
150
+ ### embedding-search
151
+
152
+ Embedding-based search — semantic search, hybrid search (BM25 + vector), similarity thresholds, index optimization.
153
+
154
+ #### Workflow
155
+
156
+ **Step 1 — Detect search implementation**
157
+ Use Grep to find search code: `similarity_search`, `vector_search`, `fts`, `tsvector`, `BM25`. Read search handlers to understand: query flow, ranking strategy, and result formatting.
158
+
159
+ **Step 2 — Audit search quality**
160
+ Check for: pure vector search without keyword fallback (misses exact matches), no similarity threshold (returns irrelevant results at low scores), missing query embedding cache (repeated queries re-embed), no hybrid scoring (BM25 for exact + vector for semantic), and unoptimized vector index (HNSW parameters not tuned).
161
+
162
+ **Step 3 — Emit hybrid search**
163
+ Emit: combined BM25 + vector search with reciprocal rank fusion, similarity threshold filtering, query embedding cache, and HNSW index tuning.
164
+
165
+ #### Example
166
+
167
+ ```typescript
168
+ // Hybrid search — BM25 + vector with reciprocal rank fusion
169
+ async function hybridSearch(query: string, limit = 10) {
170
+ // Parallel: keyword (BM25) + semantic (vector)
171
+ const [keywordResults, vectorResults] = await Promise.all([
172
+ db.execute(sql`
173
+ SELECT id, content, ts_rank(search_vector, plainto_tsquery(${query})) AS bm25_score
174
+ FROM documents
175
+ WHERE search_vector @@ plainto_tsquery(${query})
176
+ ORDER BY bm25_score DESC LIMIT ${limit * 2}
177
+ `),
178
+ db.execute(sql`
179
+ SELECT id, content, 1 - (embedding <=> ${await getEmbedding(query)}) AS vector_score
180
+ FROM documents
181
+ ORDER BY embedding <=> ${await getEmbedding(query)}
182
+ LIMIT ${limit * 2}
183
+ `),
184
+ ]);
185
+
186
+ // Reciprocal rank fusion (k=60)
187
+ const scores = new Map<string, number>();
188
+ const K = 60;
189
+ keywordResults.forEach((r, i) => scores.set(r.id, (scores.get(r.id) || 0) + 1 / (K + i + 1)));
190
+ vectorResults.forEach((r, i) => scores.set(r.id, (scores.get(r.id) || 0) + 1 / (K + i + 1)));
191
+
192
+ // Apply the threshold before slicing so it does not shrink an already-trimmed list.
+ // Note: these are [id, score] pairs; join back to documents by id for the caller.
+ return [...scores.entries()]
+ .filter(([, score]) => score > 0.01) // threshold
+ .sort((a, b) => b[1] - a[1])
+ .slice(0, limit);
196
+ }
197
+
198
+ // Embedding cache (avoid re-embedding repeated queries)
199
+ const embeddingCache = new Map<string, number[]>();
200
+ async function getEmbedding(text: string): Promise<number[]> {
201
+ const cached = embeddingCache.get(text);
202
+ if (cached) return cached;
203
+ const { data } = await openai.embeddings.create({ model: 'text-embedding-3-small', input: text });
204
+ embeddingCache.set(text, data[0].embedding);
205
+ return data[0].embedding;
206
+ }
207
+ ```
208
+
209
+ ---
210
+
211
+ ### fine-tuning-guide
212
+
213
+ Fine-tuning workflows — dataset preparation, training configuration, evaluation metrics, deployment, A/B testing.
214
+
215
+ #### Workflow
216
+
217
+ **Step 1 — Audit training data**
218
+ Use Read to examine the dataset files. Check for: data format (JSONL with `messages` array), train/eval split (eval must not overlap with train), sufficient examples (minimum 50, recommended 200+), balanced class distribution, and PII in training data.
219
+
220
+ **Step 2 — Prepare and validate dataset**
221
+ Emit: JSONL formatter that validates each example, train/eval splitter with stratification, token count estimator (cost preview), and data quality checks (duplicate detection, format validation).
222
+
223
+ **Step 3 — Execute fine-tuning and evaluate**
224
+ Emit: fine-tune API call with hyperparameters, evaluation script that compares base vs fine-tuned on held-out set, and A/B deployment configuration.
225
+
226
+ #### Example
227
+
228
+ ```python
229
+ # Fine-tuning workflow — prepare, train, evaluate
230
+ import json
231
+ import openai
232
+ from sklearn.model_selection import train_test_split
233
+
234
+ # Step 1: Prepare JSONL dataset
235
+ def prepare_dataset(examples: list[dict], output_prefix: str):
236
+ train, eval_set = train_test_split(examples, test_size=0.2, random_state=42)
237
+
238
+ for split_name, split_data in [("train", train), ("eval", eval_set)]:
239
+ path = f"{output_prefix}_{split_name}.jsonl"
240
+ with open(path, "w") as f:
241
+ for ex in split_data:
242
+ f.write(json.dumps({"messages": [
243
+ {"role": "system", "content": ex["system"]},
244
+ {"role": "user", "content": ex["input"]},
245
+ {"role": "assistant", "content": ex["output"]},
246
+ ]}) + "\n")
247
+ print(f"Wrote {len(split_data)} examples to {path}")
248
+
249
+ # Step 2: Launch fine-tuning
250
+ def start_fine_tune(train_file: str, eval_file: str):
251
+ train_id = openai.files.create(file=open(train_file, "rb"), purpose="fine-tune").id
252
+ eval_id = openai.files.create(file=open(eval_file, "rb"), purpose="fine-tune").id
253
+
254
+ job = openai.fine_tuning.jobs.create(
255
+ training_file=train_id,
256
+ validation_file=eval_id,
257
+ model="gpt-4o-mini-2024-07-18",
258
+ hyperparameters={"n_epochs": 3, "batch_size": "auto", "learning_rate_multiplier": "auto"},
259
+ )
260
+ print(f"Fine-tuning job: {job.id} — status: {job.status}")
261
+ return job
262
+
263
+ # Step 3: Evaluate base vs fine-tuned (exact-match scoring; suits classification-style outputs)
264
+ def evaluate(base_model: str, ft_model: str, eval_set: list[dict]) -> dict:
265
+ results = {"base": {"correct": 0}, "finetuned": {"correct": 0}}
266
+ for ex in eval_set:
267
+ for label, model in [("base", base_model), ("finetuned", ft_model)]:
268
+ response = openai.chat.completions.create(
269
+ model=model, messages=ex["messages"][:2], max_tokens=500,
270
+ )
271
+ if response.choices[0].message.content.strip() == ex["messages"][2]["content"].strip():
272
+ results[label]["correct"] += 1
273
+ for label in results:
274
+ results[label]["accuracy"] = results[label]["correct"] / len(eval_set)
275
+ return results
276
+ ```
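Step 2's duplicate detection and token-count cost preview are not covered by the example above. A sketch in TypeScript to match the rest of the pack (the `Example` shape and the chars/4 estimate are our assumptions, not OpenAI's format validator):

```typescript
interface Example { system: string; input: string; output: string }

// Flag train/eval overlap: an eval example that also appears in train
// invalidates the base-vs-finetuned accuracy comparison.
function findOverlap(train: Example[], evalSet: Example[]): Example[] {
  const key = (ex: Example) => `${ex.system}\u0000${ex.input}\u0000${ex.output}`;
  const seen = new Set(train.map(key));
  return evalSet.filter(ex => seen.has(key(ex)));
}

// Rough fine-tuning cost preview (~4 chars per token heuristic, not a tokenizer).
function estimateTrainingTokens(examples: Example[], epochs = 3): number {
  const perExample = (ex: Example) =>
    Math.ceil((ex.system.length + ex.input.length + ex.output.length) / 4);
  return examples.reduce((sum, ex) => sum + perExample(ex), 0) * epochs;
}
```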
277
+
278
+ ---
279
+
280
+ ### llm-architect
281
+
282
+ LLM system architecture — model selection, prompt engineering patterns, evaluation frameworks, cost optimization, multi-model routing, and guardrail design.
283
+
284
+ #### Workflow
285
+
286
+ **Step 1 — Assess LLM requirements**
287
+ Understand the use case: what does the LLM need to do? Classify into:
288
+ - **Generation**: open-ended text (blog, email, creative writing)
289
+ - **Extraction**: structured data from unstructured input (JSON from text, entities, classification)
290
+ - **Reasoning**: multi-step logic (math, code generation, planning)
291
+ - **Conversation**: multi-turn dialogue with memory
292
+ - **Agentic**: tool use, function calling, autonomous task execution
293
+
294
+ For each class, identify: latency requirements (real-time < 2s, async < 30s, batch), accuracy requirements (critical = needs eval suite, casual = spot check), cost sensitivity (per-call budget), and data sensitivity (PII, HIPAA, can data leave the network?).
295
+
296
+ **Step 2 — Model selection matrix**
297
+ Based on requirements, recommend model tier:
298
+
299
+ | Requirement | Recommended | Fallback |
300
+ |------------|-------------|----------|
301
+ | Fast + cheap (classification, routing) | Haiku / GPT-4o-mini | Local (Llama 3) |
302
+ | Balanced (code, summaries, RAG) | Sonnet / GPT-4o | Haiku with retry |
303
+ | Deep reasoning (architecture, math) | Opus / o1 | Sonnet with chain-of-thought |
304
+ | On-premise required | Llama 3 / Mistral | Ollama local deployment |
305
+ | Multimodal (vision + text) | Sonnet / GPT-4o | Local LLaVA |
306
+
307
+ Emit: primary model, fallback model, estimated cost per 1K calls, and latency p50/p99.
308
+
309
+ **Step 3 — Prompt architecture**
310
+ Design the prompt structure:
311
+ - **System prompt**: Role definition, constraints, output format. Keep under 500 tokens for cost efficiency.
312
+ - **Few-shot examples**: 2-3 examples for extraction/classification tasks. Format matches expected output exactly.
313
+ - **Chain-of-thought**: For reasoning tasks, explicitly request step-by-step thinking before final answer.
314
+ - **Structured output**: JSON mode or tool use for extraction. Define schema with Zod/Pydantic for validation.
315
+
316
+ **Step 4 — Guardrails and evaluation**
317
+ Design safety and quality layers:
318
+ - **Input guardrails**: PII detection, prompt injection detection, topic filtering
319
+ - **Output guardrails**: Schema validation, hallucination checks, toxicity filtering
320
+ - **Evaluation framework**: Define eval dataset (50+ examples), metrics (accuracy, latency, cost), and regression threshold (new prompt must not drop > 2% on any metric)
321
+
322
+ Save architecture doc to `.rune/ai/llm-architecture.md`.
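The input-guardrail layer from Step 4 can be sketched with simple heuristics (the patterns below are illustrative only; production systems layer a trained classifier on top of regex checks):

```typescript
// Naive input guardrail: regex heuristics for prompt injection and PII.
// Illustrative only; not a substitute for a dedicated detection service.
const INJECTION_PATTERNS = [
  /ignore (all )?(previous|prior) instructions/i,
  /you are now/i,
  /reveal (your )?(system|hidden) prompt/i,
];
const PII_PATTERNS = [
  /\b\d{3}-\d{2}-\d{4}\b/,       // US SSN shape
  /\b[\w.+-]+@[\w-]+\.[\w.]+\b/, // email address
];

type GuardResult = { ok: true } | { ok: false; reason: string };

function checkInput(text: string): GuardResult {
  if (INJECTION_PATTERNS.some(p => p.test(text)))
    return { ok: false, reason: 'possible prompt injection' };
  if (PII_PATTERNS.some(p => p.test(text)))
    return { ok: false, reason: 'possible PII' };
  return { ok: true };
}
```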
323
+
324
+ #### Example
325
+
326
+ ```typescript
327
+ // Multi-model router with fallback
328
+ interface ModelConfig {
329
+ id: string;
330
+ provider: 'anthropic' | 'openai' | 'local';
331
+ costPer1kTokens: number;
332
+ maxTokens: number;
333
+ latencyP50Ms: number;
334
+ }
335
+
336
+ const MODELS: Record<string, ModelConfig> = {
337
+ fast: {
338
+ id: 'claude-haiku-4-5-20251001',
339
+ provider: 'anthropic',
340
+ costPer1kTokens: 0.001,
341
+ maxTokens: 4096,
342
+ latencyP50Ms: 200,
343
+ },
344
+ balanced: {
345
+ id: 'claude-sonnet-4-6',
346
+ provider: 'anthropic',
347
+ costPer1kTokens: 0.01,
348
+ maxTokens: 8192,
349
+ latencyP50Ms: 800,
350
+ },
351
+ deep: {
352
+ id: 'claude-opus-4-6',
353
+ provider: 'anthropic',
354
+ costPer1kTokens: 0.05,
355
+ maxTokens: 16384,
356
+ latencyP50Ms: 2000,
357
+ },
358
+ };
359
+
360
+ type TaskComplexity = 'trivial' | 'standard' | 'complex';
361
+
362
+ function selectModel(complexity: TaskComplexity): ModelConfig {
363
+ const map: Record<TaskComplexity, string> = {
364
+ trivial: 'fast',
365
+ standard: 'balanced',
366
+ complex: 'deep',
367
+ };
368
+ return MODELS[map[complexity]];
369
+ }
370
+
371
+ // Prompt architecture template (role, constraints, outputSchema supplied by the caller)
372
+ const systemPrompt = `You are a ${role} assistant.
373
+
374
+ CONSTRAINTS:
375
+ - ${constraints.join('\n- ')}
376
+
377
+ OUTPUT FORMAT:
378
+ Return valid JSON matching this schema:
379
+ ${JSON.stringify(outputSchema, null, 2)}
380
+
381
+ Do not include explanations outside the JSON.`;
382
+
383
+ // Guardrail: validate structured output
384
+ import { z } from 'zod';
385
+
386
+ const OutputSchema = z.object({
387
+ classification: z.enum(['positive', 'negative', 'neutral']),
388
+ confidence: z.number().min(0).max(1),
389
+ reasoning: z.string().max(200),
390
+ });
391
+
392
+ function validateOutput(raw: string): z.infer<typeof OutputSchema> {
393
+ const parsed = JSON.parse(raw);
394
+ return OutputSchema.parse(parsed); // throws if invalid
395
+ }
396
+ ```
397
+
398
+ ---
399
+
400
+ ### prompt-patterns
401
+
402
+ Reusable prompt engineering patterns — structured output, chain-of-thought, self-critique, tool use orchestration, and multi-turn memory management.
403
+
404
+ #### Workflow
405
+
406
+ **Step 1 — Identify the pattern**
407
+ Match the user's task to a proven prompt pattern:
408
+ - **Extraction**: Use JSON mode + schema definition + few-shot examples
409
+ - **Classification**: Use enum output + confidence score + chain-of-thought
410
+ - **Summarization**: Use structured summary template + length constraint + key point extraction
411
+ - **Code generation**: Use system prompt with language constraints + test-driven output format
412
+ - **Agent loop**: Use ReAct pattern (Thought → Action → Observation → repeat)
413
+ - **Self-critique**: Use generate → critique → revise loop for quality-sensitive output
414
+
415
+ **Step 2 — Apply the pattern**
416
+ Generate the prompt following the selected pattern. Include:
417
+ - System prompt (role + constraints + output format)
418
+ - User message template (input variables marked with `{{variable}}`)
419
+ - Few-shot examples (2-3, matching exact output format)
420
+ - Validation schema (Zod/Pydantic for structured output)
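Templates with `{{variable}}` placeholders need a small substitution helper; a minimal sketch (failing loudly on unbound variables is our design choice, not part of any SDK):

```typescript
// Fill {{variable}} placeholders; throw on any placeholder with no binding
// so a typo in a template fails at build time rather than reaching the model.
function fillTemplate(template: string, vars: Record<string, string>): string {
  return template.replace(/\{\{(\w+)\}\}/g, (_, name: string) => {
    if (!(name in vars)) throw new Error(`unbound template variable: ${name}`);
    return vars[name];
  });
}
```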
421
+
422
+ **Step 3 — Test harness**
423
+ Emit a test file with 5+ test cases that validate the prompt produces correct output for known inputs. Include edge cases: empty input, very long input, ambiguous input, adversarial input.
424
+
425
+ #### Example
426
+
427
+ ```typescript
428
+ // Pattern: ReAct Agent Loop
429
+ const REACT_SYSTEM = `You are an agent that solves tasks using available tools.
430
+
431
+ For each step, output EXACTLY this JSON format:
432
+ {"thought": "reasoning about what to do next",
433
+ "action": "tool_name",
434
+ "action_input": "input for the tool"}
435
+
436
+ After receiving an observation, continue with the next thought.
437
+ When you have the final answer, output:
438
+ {"thought": "I have the answer", "final_answer": "the answer"}
439
+
440
+ Available tools:
441
+ {{tools}}`;
442
+
443
+ // Pattern: Self-Critique Loop (llm.generate = any text-in/text-out client wrapper)
444
+ async function generateWithCritique(prompt: string, maxRounds = 2) {
445
+ let output = await llm.generate(prompt);
446
+
447
+ for (let i = 0; i < maxRounds; i++) {
448
+ const critique = await llm.generate(
449
+ `Review this output for errors, omissions, and improvements:\n\n${output}\n\n` +
450
+ `List specific issues. If no issues, respond with "APPROVED".`
451
+ );
452
+
453
+ if (critique.includes('APPROVED')) break;
454
+
455
+ output = await llm.generate(
456
+ `Original output:\n${output}\n\nCritique:\n${critique}\n\n` +
457
+ `Revise the output to address all issues in the critique.`
458
+ );
459
+ }
460
+
461
+ return output;
462
+ }
463
+ ```
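The Step 3 test harness could look like the following, run here against a deterministic stub so the suite is self-contained (swap `classify` for the real LLM call plus schema validation; the case list and names are ours):

```typescript
// Minimal prompt test harness: run known inputs through the pipeline and
// compare against expected labels.
type Case = { input: string; expected: string };

function classify(input: string): string {
  // Stub: keyword heuristic standing in for the LLM classification call.
  if (/great|love|excellent/i.test(input)) return 'positive';
  if (/terrible|hate|awful/i.test(input)) return 'negative';
  return 'neutral';
}

function runSuite(cases: Case[]): { passed: number; failed: Case[] } {
  const failed = cases.filter(c => classify(c.input) !== c.expected);
  return { passed: cases.length - failed.length, failed };
}

const CASES: Case[] = [
  { input: 'I love this product', expected: 'positive' },
  { input: 'This is terrible', expected: 'negative' },
  { input: 'It arrived on Tuesday', expected: 'neutral' },
  { input: '', expected: 'neutral' },                            // edge: empty input
  { input: 'not awful, actually great', expected: 'positive' },  // edge: mixed wording
];
```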
464
+
465
+ ---
466
+
467
+ ## Connections
468
+
469
+ ```
470
+ Calls → research (L3): lookup model documentation and best practices
471
+ Calls → docs-seeker (L3): API reference for LLM providers
472
+ Calls → verification (L3): validate pipeline correctness
473
+ Called By ← cook (L1): when AI/ML task detected
474
+ Called By ← plan (L2): when AI architecture decisions needed
475
+ Called By ← review (L2): when AI code under review
476
+ ```
477
+
478
+ ## Tech Stack Support
479
+
480
+ | Provider | SDK | Vector Store | Notes |
481
+ |----------|-----|-------------|-------|
482
+ | OpenAI | openai v4+ | pgvector | Most common, JSON mode + function calling |
483
+ | Anthropic | @anthropic-ai/sdk | Pinecone | Tool use + long context |
484
+ | Cohere | cohere-ai | Weaviate | Reranking + embed v3 |
485
+ | Local (Ollama) | ollama-js | ChromaDB | Self-hosted, privacy-sensitive |
486
+
487
+ ## Constraints
488
+
489
+ 1. MUST implement retry with exponential backoff on all LLM API calls — rate limits are guaranteed at scale.
490
+ 2. MUST validate LLM output against a schema (Zod/Pydantic) — never trust raw text parsing for structured data.
491
+ 3. MUST separate training and evaluation datasets — eval set leaking into training invalidates all metrics.
492
+ 4. MUST set similarity thresholds on vector search — returning all results regardless of score degrades quality.
493
+ 5. MUST NOT embed sensitive/PII data without explicit consent — embeddings are not easily deletable from vector stores.
494
+
495
+ ## Sharp Edges
496
+
497
+ | Failure Mode | Severity | Mitigation |
498
+ |---|---|---|
499
+ | LLM rate limit (429) crashes entire request pipeline | HIGH | Exponential backoff retry with jitter; fallback model chain for critical paths |
500
+ | RAG retrieves irrelevant chunks due to fixed-size splitting across section boundaries | HIGH | Use recursive splitter with semantic separators (headings, paragraphs); include metadata for filtering |
501
+ | Vector search returns high-similarity results that are factually wrong (semantic ≠ factual) | HIGH | Always rerank with cross-encoder; include source citation for verification |
502
+ | Fine-tuned model overfits to training format, fails on slightly different inputs | HIGH | Include diverse input formats in training data; evaluate on out-of-distribution examples |
503
+ | Embedding dimension mismatch between index and query model (model upgraded) | CRITICAL | Pin embedding model version; store model version in index metadata; re-embed on model change |
504
+ | Token budget overflow when stuffing retrieved chunks into prompt | MEDIUM | Count tokens before assembly; truncate or drop lowest-ranked chunks to fit budget |
505
+
506
+ ## Done When
507
+
508
+ - LLM client has retry, structured output, streaming, and fallback chain
509
+ - RAG pipeline ingests, chunks, embeds, stores, retrieves, and reranks correctly
510
+ - Hybrid search returns relevant results for both keyword and semantic queries
511
+ - Fine-tuning dataset validated, model trained, and eval shows improvement over base
512
+ - All API calls handle rate limits and timeouts gracefully
513
+ - Structured report emitted for each skill invoked
514
+
515
+ ## Cost Profile
516
+
517
+ ~10,000–18,000 tokens per full pack run (all 6 skills). Individual skill: ~2,500–5,000 tokens. Sonnet default. Use Haiku for code detection scans; escalate to Sonnet for pipeline design and evaluation strategy.