agent-scaffold-cli 0.1.1__py3-none-any.whl

Files changed (66)
  1. agent_scaffold/__init__.py +8 -0
  2. agent_scaffold/__main__.py +6 -0
  3. agent_scaffold/_bundled_deployments/__init__.py +15 -0
  4. agent_scaffold/_bundled_deployments/docs/cross-cutting/README.md +15 -0
  5. agent_scaffold/_bundled_deployments/docs/cross-cutting/auth-jwt.md +235 -0
  6. agent_scaffold/_bundled_deployments/docs/cross-cutting/logging-structured.md +196 -0
  7. agent_scaffold/_bundled_deployments/docs/cross-cutting/observability.md +259 -0
  8. agent_scaffold/_bundled_deployments/docs/cross-cutting/rate-limiting.md +171 -0
  9. agent_scaffold/_bundled_deployments/docs/cross-cutting/testing-strategy.md +261 -0
  10. agent_scaffold/_bundled_deployments/docs/frameworks/README.md +22 -0
  11. agent_scaffold/_bundled_deployments/docs/frameworks/crewai.md +91 -0
  12. agent_scaffold/_bundled_deployments/docs/frameworks/langgraph.md +79 -0
  13. agent_scaffold/_bundled_deployments/docs/frameworks/mastra.md +74 -0
  14. agent_scaffold/_bundled_deployments/docs/frameworks/pydantic-ai.md +77 -0
  15. agent_scaffold/_bundled_deployments/docs/frameworks/vercel-ai-sdk.md +83 -0
  16. agent_scaffold/_bundled_deployments/docs/patterns/README.md +26 -0
  17. agent_scaffold/_bundled_deployments/docs/patterns/memory.md +82 -0
  18. agent_scaffold/_bundled_deployments/docs/patterns/multi-agent-flat.md +72 -0
  19. agent_scaffold/_bundled_deployments/docs/patterns/multi-agent-hierarchical.md +83 -0
  20. agent_scaffold/_bundled_deployments/docs/patterns/parallel-calls.md +73 -0
  21. agent_scaffold/_bundled_deployments/docs/patterns/plan-execute-reflect.md +77 -0
  22. agent_scaffold/_bundled_deployments/docs/patterns/prompt-chaining.md +73 -0
  23. agent_scaffold/_bundled_deployments/docs/patterns/rag.md +84 -0
  24. agent_scaffold/_bundled_deployments/docs/patterns/react.md +77 -0
  25. agent_scaffold/_bundled_deployments/docs/patterns/routing-tool-use.md +69 -0
  26. agent_scaffold/_bundled_deployments/docs/recipes/README.md +39 -0
  27. agent_scaffold/_bundled_deployments/docs/recipes/code-review-agent.md +518 -0
  28. agent_scaffold/_bundled_deployments/docs/recipes/content-pipeline.md +525 -0
  29. agent_scaffold/_bundled_deployments/docs/recipes/customer-support-triage.md +1679 -0
  30. agent_scaffold/_bundled_deployments/docs/recipes/docs-rag-qa.md +1254 -0
  31. agent_scaffold/_bundled_deployments/docs/recipes/hierarchical-agent.md +554 -0
  32. agent_scaffold/_bundled_deployments/docs/recipes/memory-assistant.md +499 -0
  33. agent_scaffold/_bundled_deployments/docs/recipes/ops-crew.md +457 -0
  34. agent_scaffold/_bundled_deployments/docs/recipes/parallel-enricher.md +457 -0
  35. agent_scaffold/_bundled_deployments/docs/recipes/research-assistant.md +1096 -0
  36. agent_scaffold/_bundled_deployments/docs/stack/README.md +19 -0
  37. agent_scaffold/_bundled_deployments/docs/stack/api-fastapi.md +112 -0
  38. agent_scaffold/_bundled_deployments/docs/stack/api-hono.md +108 -0
  39. agent_scaffold/_bundled_deployments/docs/stack/cache-redis.md +85 -0
  40. agent_scaffold/_bundled_deployments/docs/stack/eval-deepeval-ragas-promptfoo.md +164 -0
  41. agent_scaffold/_bundled_deployments/docs/stack/llm-claude.md +105 -0
  42. agent_scaffold/_bundled_deployments/docs/stack/relational-postgres.md +122 -0
  43. agent_scaffold/_bundled_deployments/docs/stack/tool-protocol-mcp.md +275 -0
  44. agent_scaffold/_bundled_deployments/docs/stack/tracing-langfuse.md +108 -0
  45. agent_scaffold/_bundled_deployments/docs/stack/vector-qdrant.md +121 -0
  46. agent_scaffold/cache.py +32 -0
  47. agent_scaffold/cli.py +512 -0
  48. agent_scaffold/config.py +117 -0
  49. agent_scaffold/context.py +253 -0
  50. agent_scaffold/contract.py +141 -0
  51. agent_scaffold/discovery.py +112 -0
  52. agent_scaffold/generator.py +213 -0
  53. agent_scaffold/languages/__init__.py +0 -0
  54. agent_scaffold/languages/python.yaml +28 -0
  55. agent_scaffold/languages/typescript.yaml +25 -0
  56. agent_scaffold/prompts/__init__.py +0 -0
  57. agent_scaffold/prompts/repair.md +9 -0
  58. agent_scaffold/prompts/system.md +21 -0
  59. agent_scaffold/prompts/user_template.md +43 -0
  60. agent_scaffold/validator.py +133 -0
  61. agent_scaffold/writer.py +171 -0
  62. agent_scaffold_cli-0.1.1.dist-info/METADATA +147 -0
  63. agent_scaffold_cli-0.1.1.dist-info/RECORD +66 -0
  64. agent_scaffold_cli-0.1.1.dist-info/WHEEL +4 -0
  65. agent_scaffold_cli-0.1.1.dist-info/entry_points.txt +2 -0
  66. agent_scaffold_cli-0.1.1.dist-info/licenses/LICENSE +21 -0
@@ -0,0 +1,83 @@
1
+ # Pattern: Multi-Agent Hierarchical (Supervisor + Workers)
2
+
3
+ **One-liner:** A supervisor agent delegates sub-tasks to specialized worker agents, coordinates their outputs, and decides when the job is done.
4
+
5
+ ## When to use
6
+
7
+ - The task is complex and decomposes into sub-tasks that require different specializations.
8
+ - You need centralized coordination — someone to decide what happens next based on intermediate results.
9
+ - Sub-agents may need to be called multiple times or in different orders depending on results.
10
+ - You want a clear chain of command for audit and debugging purposes.
11
+
12
+ ## When NOT to use
13
+
14
+ - The task doesn't require multiple specializations (use a single agent with tools).
15
+ - All sub-tasks are independent with no coordination needed (use Parallel Calls or Flat Multi-Agent).
16
+ - The number of sub-agents is small (2) and the interaction is simple (just use Routing).
17
+
18
+ ## Core flow
19
+
20
+ ```
21
+ User task
22
+ |
23
+ v
24
+ [Supervisor] ──> "I need Agent A to do X first"
25
+ |
26
+ v
27
+ [Agent A: Researcher] ──> results
28
+ |
29
+ v
30
+ [Supervisor] ──> "Now I need Agent B to do Y with A's results"
31
+ |
32
+ v
33
+ [Agent B: Writer] ──> draft
34
+ |
35
+ v
36
+ [Supervisor] ──> "Agent C should review this"
37
+ |
38
+ v
39
+ [Agent C: Reviewer] ──> feedback
40
+ |
41
+ v
42
+ [Supervisor] ──> "Done" or "Agent B, revise based on feedback"
43
+ |
44
+ v
45
+ Final output
46
+ ```
47
+
48
+ ### Variants
49
+
50
+ - **Single-level hierarchy:** One supervisor, N workers. Most common.
51
+ - **Multi-level hierarchy:** Supervisor → sub-supervisors → workers. For very complex tasks.
52
+ - **Supervisor with self-delegation:** The supervisor can also do work itself, not just delegate.
53
+ - **Dynamic team:** The supervisor can spawn new workers as needed based on the task.
54
+
55
+ ## Key components
56
+
57
+ - **Supervisor agent:** The orchestrator. Decides which worker to call, with what input, and when to stop. Has access to all worker agent descriptions but not their tools directly.
58
+ - **Worker agents:** Specialized agents with their own system prompts and tools. Each handles one domain.
59
+ - **Delegation protocol:** How the supervisor invokes workers — typically as tool calls where each worker is a "tool" the supervisor can call.
60
+ - **State manager:** Tracks what's been done, what's pending, and intermediate results. LangGraph's state graph is ideal.
61
+ - **Termination logic:** The supervisor decides when the task is complete. Can be explicit ("all sub-tasks done") or LLM-judged.
62
+
63
+ ## Common pitfalls
64
+
65
+ - **Supervisor bottleneck:** Every interaction goes through the supervisor, adding latency. For independent sub-tasks, let workers run in parallel.
66
+ - **Over-delegation:** The supervisor breaks the task into too many tiny sub-tasks. Give it examples of good delegation granularity.
67
+ - **Lost context:** Each worker only sees its sub-task, not the full picture. The supervisor must provide enough context in each delegation.
68
+ - **Supervisor hallucination:** The supervisor claims a worker returned results it didn't. Record each worker's actual output in state and check the supervisor's claims against it.
69
+ - **Worker scope creep:** A worker tries to do work outside its specialty. Keep worker system prompts narrowly focused.
70
+
71
+ ## Framework fit
72
+
73
+ | Framework | Native support | Notes |
74
+ |-----------|----------------|-------|
75
+ | LangGraph | `langgraph-supervisor` package — purpose-built supervisor pattern | Best fit — each worker is a compiled sub-graph |
76
+ | CrewAI | Hierarchical process mode with `manager_agent` | Built-in but less flexible than LangGraph |
77
+ | Mastra | Agent workflows with delegation | TS-native, manual but flexible |
78
+ | Pydantic AI | Manual orchestration — supervisor agent with workers as tools | Works but you build the coordination |
79
+ | Vercel AI SDK | Manual orchestration | No built-in hierarchy |
80
+
81
+ ## Reference implementations
82
+
83
+ - [recipes/hierarchical-agent.md](../recipes/hierarchical-agent.md) — Hierarchical multi-agent with LangGraph supervisor (skeleton)
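As a framework-free illustration of the supervisor loop in this pattern (separate from the recipe linked above), here is a minimal sketch. It assumes the supervisor LLM is replaced by a hard-coded policy and that workers are plain functions; `State`, `WORKERS`, `supervisor_policy`, and `run_supervisor` are hypothetical names, not part of LangGraph or any other framework.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class State:
    """State manager: what has been done and what each worker returned."""
    task: str
    results: dict[str, str] = field(default_factory=dict)
    log: list[str] = field(default_factory=list)  # audit trail of delegations


# Hypothetical workers; each one would normally wrap its own agent and tools.
def researcher(state: State) -> str:
    return f"notes on {state.task}"


def writer(state: State) -> str:
    return f"draft based on {state.results['researcher']}"


def reviewer(state: State) -> str:
    return "approved"


WORKERS: dict[str, Callable[[State], str]] = {
    "researcher": researcher,
    "writer": writer,
    "reviewer": reviewer,
}


def supervisor_policy(state: State) -> Optional[str]:
    """Stand-in for the supervisor LLM: name the next worker, or None when done."""
    for name in WORKERS:
        if name not in state.results:
            return name
    return None


def run_supervisor(task: str, max_delegations: int = 10) -> State:
    state = State(task=task)
    for _ in range(max_delegations):  # hard cap so the loop always terminates
        name = supervisor_policy(state)
        if name is None:
            break
        # Store the worker's real output so later steps read it from state,
        # not from the supervisor's recollection.
        state.results[name] = WORKERS[name](state)
        state.log.append(name)
    return state
```

Keeping worker outputs in `State` rather than in the supervisor's prompt history is one way to address the "supervisor hallucination" pitfall, and `log` gives the chain-of-command audit trail the pattern calls for.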
@@ -0,0 +1,73 @@
1
+ # Pattern: Parallel Calls (Fan-out / Fan-in)
2
+
3
+ **One-liner:** Run multiple independent LLM calls or tool invocations concurrently, then aggregate their results.
4
+
5
+ ## When to use
6
+
7
+ - You have N independent sub-tasks that don't depend on each other (e.g., enrich N records, summarize N documents).
8
+ - Latency matters — sequential execution of N tasks takes N× as long.
9
+ - Each sub-task uses the same prompt/tool with different inputs (batch processing pattern).
10
+ - You need to aggregate or merge results after all sub-tasks complete.
11
+
12
+ ## When NOT to use
13
+
14
+ - Sub-tasks depend on each other's outputs (use Prompt Chaining).
15
+ - N is very large (100+) and unbounded fan-out would hit rate limits (use chunked fan-out with a concurrency limit instead).
16
+ - The sub-tasks need to coordinate or share state mid-execution (use Multi-Agent).
17
+
18
+ ## Core flow
19
+
20
+ ```
21
+ Input (list of items)
22
+ |
23
+ v
24
+ [Fan-out] ──> spawn N concurrent tasks
25
+ |
26
+ ├──> [Task 1: process item A] ──┐
27
+ ├──> [Task 2: process item B] ──┤
28
+ ├──> [Task 3: process item C] ──┤
29
+ └──> [Task N: process item N] ──┘
30
+ |
31
+ v
32
+ [Fan-in / Aggregate]
33
+ |
34
+ v
35
+ Combined result
36
+ ```
37
+
38
+ ### Variants
39
+
40
+ - **Homogeneous fan-out:** All tasks use the same prompt/tool, different inputs. Classic batch processing.
41
+ - **Heterogeneous fan-out:** Different tasks run different prompts/tools in parallel (e.g., summarize + extract entities + classify sentiment simultaneously on the same document).
42
+ - **Fan-out with partial failure:** Some tasks may fail. Collect successes, report failures, optionally retry.
43
+ - **Chunked fan-out:** For large N, split into batches with concurrency limits (e.g., 10 at a time).
44
+
45
+ ## Key components
46
+
47
+ - **Splitter:** Divides the input into independent work units.
48
+ - **Worker:** The LLM call or tool invocation applied to each unit. Should be stateless and idempotent.
49
+ - **Concurrency controller:** Limits how many workers run simultaneously. Prevents rate-limit exhaustion. In Python: `asyncio.Semaphore`. In TS: `Promise.all()` with chunking.
50
+ - **Aggregator:** Merges individual results into a combined output. May be a simple list, a summary, or a structured merge.
51
+ - **Error handler:** Decides what to do when individual tasks fail (skip, retry, abort all).
52
+
53
+ ## Common pitfalls
54
+
55
+ - **No concurrency limit:** Firing 100 parallel LLM calls will hit rate limits. Use a semaphore or batch size.
56
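+ - **One failure kills all:** With plain `Promise.all()` or `asyncio.gather()`, the first failure propagates to the caller and the other results are discarded, even though the remaining calls keep running (and keep costing money). Use `Promise.allSettled()` / `return_exceptions=True` to collect every outcome.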
57
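+ - **Results out of order:** Tasks finish in any order. `asyncio.gather()` and `Promise.all()` return results in input order, but completion-order APIs such as `asyncio.as_completed()` do not. Preserve the input-output mapping (index or ID).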
58
+ - **Context window limits:** If the aggregator tries to merge too many results into one prompt, it may exceed the context window. Summarize incrementally.
59
+ - **Cost multiplication:** N parallel calls = N× the cost of one call. Make sure parallelism is worth it vs. a single batched prompt.
60
+
61
+ ## Framework fit
62
+
63
+ | Framework | Native support | Notes |
64
+ |-----------|----------------|-------|
65
+ | Pydantic AI | `asyncio.gather()` with multiple `agent.run()` calls | Natural — async-first, clean ergonomics |
66
+ | Mastra | `Promise.all()` with agent calls, or workflow parallel steps | TS-native async |
67
+ | LangGraph | `Send()` API for map-reduce fan-out | Works but heavier than raw async |
68
+ | Vercel AI SDK | `Promise.all()` with `generateText()` calls | Simple and effective |
69
+ | CrewAI | Parallel task execution in a crew | Built-in but less control over concurrency |
70
+
71
+ ## Reference implementations
72
+
73
+ - [recipes/parallel-enricher.md](../recipes/parallel-enricher.md) — Parallel batch enrichment with Pydantic AI / Mastra (skeleton)
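The concurrency controller and partial-failure handling described in this pattern can be sketched with the standard library alone. This is a minimal sketch, assuming a stand-in `enrich` coroutine in place of a real LLM call; `enrich` and `fan_out` are hypothetical names.

```python
import asyncio


async def enrich(item: str) -> str:
    """Stand-in for one LLM call; fails on empty input to show partial failure."""
    await asyncio.sleep(0)  # yield to the event loop like real I/O would
    if not item:
        raise ValueError("empty item")
    return item.upper()


async def fan_out(items: list[str], limit: int = 10):
    """Run enrich() over items with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(item: str) -> str:
        async with semaphore:
            return await enrich(item)

    # gather() returns results in input order; return_exceptions=True turns
    # failures into values instead of raising on the first one.
    results = await asyncio.gather(
        *(bounded(item) for item in items), return_exceptions=True
    )

    # Key by input index so the input-output mapping survives duplicates.
    successes = {i: r for i, r in enumerate(results) if not isinstance(r, Exception)}
    failures = {i: r for i, r in enumerate(results) if isinstance(r, Exception)}
    return successes, failures
```

A caller would run `asyncio.run(fan_out(records, limit=10))`, then decide per the error-handler policy whether to retry the indices in `failures`, skip them, or abort.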
@@ -0,0 +1,77 @@
1
+ # Pattern: Plan, Execute, Reflect
2
+
3
+ **One-liner:** The agent creates a plan of steps, executes them one by one, then reflects on results to decide if re-planning is needed.
4
+
5
+ ## When to use
6
+
7
+ - The task is complex enough that jumping straight to execution would miss steps or go off track.
8
+ - You want the agent to be self-correcting — detecting when a step failed and adapting.
9
+ - The task has a verifiable goal (code review: "all issues found"; research: "question answered").
10
+ - You need an auditable trace: plan → what was done → what was learned → revised plan.
11
+
12
+ ## When NOT to use
13
+
14
+ - The task is simple enough for a single LLM call or a fixed pipeline (use Prompt Chaining).
15
+ - There's no way to evaluate whether a step succeeded (reflection has nothing to work with).
16
+ - Latency is critical — the plan/reflect overhead adds 2+ extra LLM calls per cycle.
17
+
18
+ ## Core flow
19
+
20
+ ```
21
+ User task
22
+ |
23
+ v
24
+ [Planner] ──> step list (ordered, with dependencies)
25
+ |
26
+ v
27
+ [Executor] ──> execute step 1 ──> result
28
+ | |
29
+ | v
30
+ | [Reflector] ──> "step succeeded" / "step failed, because..."
31
+ | |
32
+ | ┌──────────┴──────────┐
33
+ | v v
34
+ | continue to [Re-planner] ──> revised steps
35
+ | step 2 |
36
+ | | v
37
+ | └─────────────────> next step
38
+ | ...
39
+ v
40
+ Final output (after all steps pass reflection)
41
+ ```
42
+
43
+ ### Variants
44
+
45
+ - **Plan-Execute (no reflection):** Simpler version — plan once, execute all steps, return results. Good when steps are unlikely to fail.
46
+ - **Plan-Execute-Reflect:** Full version with a reflection step after each execution. Re-plans if reflection identifies problems.
47
+ - **Iterative deepening:** Start with a coarse plan, execute, reflect, then create more detailed sub-plans for steps that need it.
48
+
49
+ ## Key components
50
+
51
+ - **Planner:** An LLM call that takes the task description and produces a structured step list. Each step has: description, expected output, success criteria.
52
+ - **Executor:** Runs one step at a time. May use tools (search, code execution, API calls) to complete each step.
53
+ - **Reflector:** Evaluates the executor's output against the step's success criteria. Returns pass/fail + reasoning.
54
+ - **Re-planner:** If reflection fails, takes the original plan + what happened + what went wrong, and produces a revised plan.
55
+ - **State store:** Holds the plan, completed steps, and their results. LangGraph's checkpointer is ideal for this.
56
+
57
+ ## Common pitfalls
58
+
59
+ - **Over-planning:** The planner produces 15 steps for a task that needs 3. Constrain the planner with max steps or examples of good plans.
60
+ - **Reflection hallucination:** The reflector says "looks good" when the step clearly failed. Use concrete success criteria, not vibes.
61
+ - **Infinite re-planning:** Reflection keeps failing, re-planner keeps generating new plans. Set a max replanning budget (2-3 attempts).
62
+ - **Executor can't do the step:** The planner writes steps that require capabilities the executor doesn't have. Ground the planner in available tools.
63
+ - **State explosion:** Each cycle adds to state. Summarize completed steps rather than carrying full outputs.
64
+
65
+ ## Framework fit
66
+
67
+ | Framework | Native support | Notes |
68
+ |-----------|----------------|-------|
69
+ | LangGraph | Planner, Executor, Reflector as graph nodes with conditional edges for re-planning | Best fit — state management handles plan evolution |
70
+ | Pydantic AI | Separate agents for planner/executor/reflector, manual orchestration | Works but you manage state yourself |
71
+ | Mastra | Workflow steps for plan/execute/reflect cycle | TS-native, workflow primitives help |
72
+ | CrewAI | Sequential crew with planner + executor agents | Possible but less control over reflection loop |
73
+ | Vercel AI SDK | Manual orchestration with `generateObject()` / `generateText()` | Lightweight but no built-in state management |
74
+
75
+ ## Reference implementations
76
+
77
+ - [recipes/code-review-agent.md](../recipes/code-review-agent.md) — Plan-and-Execute code reviewer with LangGraph (skeleton)
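The plan, execute, reflect cycle with a bounded re-planning budget can be sketched as a small loop. This is an illustrative sketch under stated assumptions: the planner, executor, and re-planner are passed in as plain callables rather than LLM calls, reflection is reduced to each step's own success criterion, and `Step`, `run_plan`, `execute`, and `replan` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    description: str
    success: Callable[[str], bool]  # concrete success criterion for reflection


def run_plan(plan, execute, replan, max_replans: int = 2):
    """Execute steps in order, reflect on each result, re-plan on failure."""
    completed = []  # (description, result) pairs: the auditable trace
    queue = list(plan)
    replans = 0
    while queue:
        step = queue.pop(0)
        result = execute(step)
        if step.success(result):  # reflect: did the step meet its criterion?
            completed.append((step.description, result))
            continue
        if replans >= max_replans:
            raise RuntimeError(f"replanning budget exhausted at: {step.description}")
        replans += 1
        # The re-planner sees the failed step, its result, and the remaining steps.
        queue = replan(step, result, queue)
    return completed


# Stand-in executor: the primary fetch always returns nothing.
def execute(step: Step) -> str:
    return "" if step.description == "fetch (primary)" else f"done: {step.description}"


# Stand-in re-planner: swap the failed step for a fallback.
def replan(failed: Step, result: str, remaining: list) -> list:
    return [Step("fetch (fallback)", bool)] + remaining
```

Raising once the budget is spent is one way to avoid the "infinite re-planning" pitfall; returning the partial trace instead would be an equally reasonable design.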
@@ -0,0 +1,73 @@
1
+ # Pattern: Prompt Chaining
2
+
3
+ **One-liner:** Break a complex task into a fixed sequence of LLM calls where each step's output feeds the next step's input.
4
+
5
+ ## When to use
6
+
7
+ - The task naturally decomposes into ordered stages (e.g., research → outline → draft → edit).
8
+ - Each stage has a clear input/output contract.
9
+ - You want deterministic flow — every request follows the same pipeline.
10
+ - Quality benefits from specialized prompts per stage rather than one monolithic prompt.
11
+
12
+ ## When NOT to use
13
+
14
+ - The sequence of steps isn't known upfront (use ReAct).
15
+ - Steps need to run in parallel, not sequentially (use Parallel Calls).
16
+ - The agent needs to decide whether to skip or repeat steps (use Plan-Execute-Reflect).
17
+
18
+ ## Core flow
19
+
20
+ ```
21
+ Input
22
+ |
23
+ v
24
+ [Stage 1: Research] ──> research notes
25
+ |
26
+ v
27
+ [Stage 2: Outline] ──> structured outline
28
+ |
29
+ v
30
+ [Stage 3: Draft] ──> full draft
31
+ |
32
+ v
33
+ [Stage 4: Edit] ──> polished output
34
+ |
35
+ v
36
+ Final output
37
+ ```
38
+
39
+ ### Variants
40
+
41
+ - **Linear chain:** A → B → C → D. Simplest form.
42
+ - **Chain with validation gates:** After each stage, validate the output (schema check, quality check). Retry or fail early if validation fails.
43
+ - **Chain with accumulation:** Each stage receives not just the previous output, but all prior outputs (snowball context).
44
+ - **Chain with human-in-the-loop:** Pause between stages for human review/approval before continuing.
45
+
46
+ ## Key components
47
+
48
+ - **Stage:** A single LLM call with a focused system prompt. Each stage is a pure function: input → output.
49
+ - **Schema:** Structured output types (Pydantic models / Zod schemas) that define the contract between stages. Type safety prevents silent failures.
50
+ - **Pipeline orchestrator:** Runs stages in order, passing outputs forward. Can be as simple as sequential `await` calls.
51
+ - **Validation gate (optional):** Checks stage output before passing it to the next stage. Can retry, modify, or abort.
52
+
53
+ ## Common pitfalls
54
+
55
+ - **Context bloat:** If you pass all prior stage outputs to every stage, context grows linearly. Only pass what each stage needs.
56
+ - **Error propagation:** A bad output from stage 1 cascades through all subsequent stages. Add validation gates at critical points.
57
+ - **Prompt coupling:** If stage 2's prompt assumes a specific format from stage 1, changing stage 1 breaks stage 2. Use explicit schemas.
58
+ - **Unnecessary chaining:** If one good prompt can do the job, don't split it into 3 stages. More stages = more latency + cost.
59
+ - **No partial results:** If the pipeline fails at stage 3, you lose stages 1-2 unless you persist intermediate outputs.
60
+
61
+ ## Framework fit
62
+
63
+ | Framework | Native support | Notes |
64
+ |-----------|----------------|-------|
65
+ | Pydantic AI | Sequential `agent.run()` calls with a typed `output_type` (`result_type` in older releases) per stage | Natural fit: type safety between stages |
66
+ | Vercel AI SDK | `generateObject()` per stage, pipe outputs manually | Clean TS implementation |
67
+ | LangGraph | Linear graph with one node per stage | Works but overkill for simple chains |
68
+ | Mastra | Sequential agent calls or workflow steps | Mastra workflows support chaining natively |
69
+ | CrewAI | Sequential task execution in a crew | Works but heavy for a simple pipeline |
70
+
71
+ ## Reference implementations
72
+
73
+ - [recipes/content-pipeline.md](../recipes/content-pipeline.md) — Multi-stage content generation pipeline (Pydantic AI / Vercel AI SDK, skeleton)
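The stage, schema, and validation-gate components of this pattern can be sketched without any framework. This is a minimal sketch, assuming each stage is a deterministic stand-in for an LLM call and using dataclasses where the docs suggest Pydantic models; `Outline`, `Draft`, and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outline:
    """Contract between the outline stage and the draft stage."""
    topic: str
    sections: list[str]


@dataclass(frozen=True)
class Draft:
    text: str


def outline_stage(topic: str) -> Outline:
    # Stand-in for an LLM call that returns structured output.
    return Outline(topic=topic, sections=["Intro", "Details", "Summary"])


def validate_outline(outline: Outline) -> Outline:
    """Validation gate: fail early rather than pass a bad outline downstream."""
    if not outline.sections:
        raise ValueError("outline has no sections")
    return outline


def draft_stage(outline: Outline) -> Draft:
    # Receives only the outline, not every prior output, to limit context bloat.
    body = "\n".join(f"## {section}" for section in outline.sections)
    return Draft(text=f"# {outline.topic}\n{body}")


def run_pipeline(topic: str) -> Draft:
    """Pipeline orchestrator: run stages in order, passing outputs forward."""
    outline = validate_outline(outline_stage(topic))
    return draft_stage(outline)
```

Because `draft_stage` depends only on the `Outline` type, the outline prompt can change freely as long as it still produces that schema, which is the point of the "prompt coupling" pitfall.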
@@ -0,0 +1,84 @@
1
+ # Pattern: RAG (Retrieval-Augmented Generation)
2
+
3
+ **One-liner:** Ground LLM answers in retrieved documents so the model answers from your data, not its training set.
4
+
5
+ ## When to use
6
+
7
+ - User asks questions that should be answered from a specific corpus (docs, KB articles, policies).
8
+ - You need citations or source attribution in answers.
9
+ - The knowledge base changes over time and can't be baked into prompts.
10
+ - You want to reduce hallucination by constraining the model to retrieved context.
11
+
12
+ ## When NOT to use
13
+
14
+ - The answer space is small enough to fit in the system prompt (use direct prompting).
15
+ - You need the agent to take *actions*, not just answer questions (use ReAct or Routing).
16
+ - The corpus is unstructured and doesn't chunk well (consider summarization pipelines first).
17
+
18
+ ## Core flow
19
+
20
+ ```
21
+ User question
22
+ |
23
+ v
24
+ [Embed query] ──> [Vector search] ──> top-K chunks
25
+ | |
26
+ v v
27
+ [Build prompt] <────────────────── [Retrieved context]
28
+ |
29
+ v
30
+ [LLM generates answer grounded in context]
31
+ |
32
+ v
33
+ Answer + citations
34
+ ```
35
+
36
+ ### Variants
37
+
38
+ - **Naive RAG:** Embed → retrieve → generate. Simple, works for most cases.
39
+ - **RAG + reranker:** Add a cross-encoder reranker after retrieval to improve precision.
40
+ - **Agentic RAG:** The LLM decides *when* and *what* to retrieve (retrieval as a tool call). This is what the `docs-rag-qa` recipe implements.
41
+ - **Multi-step RAG:** Decompose complex questions, retrieve per sub-question, synthesize.
42
+
43
+ ## Key components
44
+
45
+ - **Chunker:** Splits documents into retrieval-sized pieces. Sentence-boundary splitting with overlap is the baseline. Typical chunk size: 300-800 characters.
46
+ - **Embedder:** Converts text to vectors. In this repo, Qdrant handles storage; embedding is done by the Qdrant client or a dedicated model.
47
+ - **Retriever:** Searches the vector store for chunks similar to the query. Returns top-K with relevance scores.
48
+ - **Generator:** The LLM that synthesizes an answer from the retrieved chunks. Gets the chunks injected into its prompt or via tool call results.
49
+ - **Citation extractor:** Maps answer spans back to source documents for attribution.
50
+
51
+ ## Common pitfalls
52
+
53
+ - **Chunks too large:** Models tend to under-use material in the middle of long contexts. Keep chunks focused.
54
+ - **Chunks too small:** Loss of surrounding context makes chunks meaningless. Use overlap.
55
+ - **No relevance threshold:** Returning irrelevant chunks harms answer quality. Filter by score.
56
+ - **Stuffing all chunks into one prompt:** With many results, use map-reduce or iterative refinement instead.
57
+ - **Ignoring metadata:** Filtering by document type, date, or source before vector search can substantially improve precision.
58
+ - **Not evaluating retrieval separately:** Measure retrieval recall/precision independently from generation quality. RAGAS is purpose-built for this.
59
+
60
+ ## Framework fit
61
+
62
+ | Framework | Native support | Notes |
63
+ |-----------|----------------|-------|
64
+ | LangGraph | Built-in retriever nodes, checkpointing for multi-step RAG | Best for complex RAG with state management |
65
+ | Pydantic AI | Tool-based retrieval, type-safe citation schemas | Clean DX for simple agentic RAG |
66
+ | Mastra | Built-in RAG primitives, vector store integrations | TS-native, batteries included |
67
+ | Vercel AI SDK | Tool-based retrieval via `tool()` | Lightweight, good for simple RAG |
68
+ | CrewAI | Possible but not idiomatic | Better suited for multi-agent patterns |
69
+
70
+ ## Evaluation metrics
71
+
72
+ RAG has dedicated metrics beyond general agent eval:
73
+
74
+ | Metric | What it measures | Tool |
75
+ |--------|-----------------|------|
76
+ | Context recall | Did retrieval find the right chunks? | RAGAS |
77
+ | Context precision | Are retrieved chunks relevant (not noise)? | RAGAS |
78
+ | Faithfulness | Is the answer grounded in retrieved context (not hallucinated)? | RAGAS |
79
+ | Answer relevancy | Does the answer address the question? | RAGAS / DeepEval |
80
+ | Answer correctness | Is the answer factually correct vs. gold standard? | DeepEval |
81
+
82
+ ## Reference implementations
83
+
84
+ - [recipes/docs-rag-qa.md](../recipes/docs-rag-qa.md) -- Agentic RAG with Pydantic AI (Py) and Vercel AI SDK (TS)
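The retrieve-then-build-prompt flow, including the relevance threshold from the pitfalls list, can be illustrated with a toy retriever. This sketch swaps in a bag-of-words "embedding" and in-memory search purely for illustration; a real system would use an embedding model and a vector store such as Qdrant. `embed`, `cosine`, `retrieve`, and `build_prompt` are hypothetical names.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: word counts. A stand-in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks, top_k: int = 2, min_score: float = 0.1):
    """Return up to top_k (score, source, text) tuples above the threshold."""
    q = embed(query)
    scored = [(cosine(q, embed(text)), source, text) for source, text in chunks]
    relevant = [hit for hit in scored if hit[0] >= min_score]  # drop noise
    return sorted(relevant, reverse=True)[:top_k]


def build_prompt(query: str, hits) -> str:
    """Inject retrieved chunks, tagged by source so the answer can cite them."""
    context = "\n".join(f"[{source}] {text}" for _, source, text in hits)
    return (
        "Answer using only the context below and cite sources.\n\n"
        f"{context}\n\nQuestion: {query}"
    )
```

Tagging each chunk with its source in the prompt is one simple way to make citation extraction possible; the threshold value here is arbitrary and would need tuning against retrieval metrics.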
@@ -0,0 +1,77 @@
1
+ # Pattern: ReAct (Reason + Act)
2
+
3
+ **One-liner:** The agent iterates through a think → act → observe loop, choosing tools at each step until it can answer.
4
+
5
+ ## When to use
6
+
7
+ - The task requires multi-step reasoning with external information (search, APIs, databases).
8
+ - You can't predict upfront which tools the agent will need or in what order.
9
+ - The agent needs to adapt its strategy based on intermediate results.
10
+ - You want an auditable trace of the agent's reasoning at each step.
11
+
12
+ ## When NOT to use
13
+
14
+ - The task is a single classification or generation (no tools needed — just prompt).
15
+ - The workflow is a fixed sequence of steps (use Prompt Chaining instead).
16
+ - You need guaranteed execution order or deterministic outputs (ReAct is inherently non-deterministic).
17
+ - Latency budget is tight — each loop iteration is an LLM call.
18
+
19
+ ## Core flow
20
+
21
+ ```
22
+ User question
23
+ |
24
+ v
25
+ [Reason] ──> "I need to search for X"
26
+ |
27
+ v
28
+ [Act] ──> call tool(search, query="X")
29
+ |
30
+ v
31
+ [Observe] ──> tool returns results
32
+ |
33
+ v
34
+ [Reason] ──> "Now I need to look up Y"
35
+ | ... (loop until done)
36
+ v
37
+ [Final Answer]
38
+ ```
39
+
40
+ ### Loop termination
41
+
42
+ The LLM decides when it has enough information to answer. Most frameworks enforce a `max_steps` limit to prevent runaway loops. In this repo, the default is 5 steps.
43
+
44
+ ### Variants
45
+
46
+ - **Vanilla ReAct:** Single agent, flat tool list. The most common variant.
47
+ - **ReAct + reflection:** After each observation, a separate prompt critiques whether the approach is working.
48
+ - **ReAct + planning:** The agent first writes a brief plan, then executes via ReAct. Hybrid with Plan-Execute-Reflect.
49
+
50
+ ## Key components
51
+
52
+ - **Reasoner:** The LLM generating thoughts about what to do next. The system prompt defines its persona and available tools.
53
+ - **Tool executor:** Runs the chosen tool and returns results. In Pydantic AI, tools are decorated functions. In LangGraph, `ToolNode` handles this.
54
+ - **Observation parser:** Feeds tool results back into the next reasoning step as context.
55
+ - **Step limiter:** Prevents infinite loops. Configurable via `max_steps` or equivalent.
56
+
57
+ ## Common pitfalls
58
+
59
+ - **Tool description quality:** Vague tool descriptions cause the agent to pick wrong tools. Be precise about what each tool does, its inputs, and output format.
60
+ - **Too many tools:** Offering more than ~10 tools tends to degrade selection accuracy. Group related tools or use a routing layer.
61
+ - **No step limit:** Without `max_steps`, the agent can loop indefinitely on hard questions. Always set a cap.
62
+ - **Expensive tool calls in loops:** If a tool hits a paid API, each loop iteration costs money. Consider caching or rate-limiting tool calls.
63
+ - **Hallucinated tool names:** The agent may try to call tools that don't exist. Ensure your framework validates tool names before execution.
64
+
65
+ ## Framework fit
66
+
67
+ | Framework | Native support | Notes |
68
+ |-----------|----------------|-------|
69
+ | LangGraph | `create_react_agent()` — purpose-built | Canonical implementation with full state management |
70
+ | Pydantic AI | `Agent` with `@agent.tool` decorators (built-in loop) | Clean DX, auto-manages the reason/act/observe cycle |
71
+ | Mastra | `Agent` with `tools` array — built-in loop | TS-native, similar ergonomics to Pydantic AI |
72
+ | Vercel AI SDK | `generateText()` with `tools` + `maxSteps` (`stopWhen` in AI SDK 5) | Lightweight, good for simple ReAct |
73
+ | CrewAI | Agent with tools — uses ReAct internally | Works but less control over the loop |
74
+
75
+ ## Reference implementations
76
+
77
+ - [recipes/research-assistant.md](../recipes/research-assistant.md) — ReAct research agent with web search (Pydantic AI / Mastra)
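The reason, act, observe loop with a step limiter and tool-name validation can be sketched in a few lines. This is a minimal sketch, assuming the reasoner is a scripted function instead of an LLM; `TOOLS`, `react`, and `scripted_reasoner` are hypothetical names, and the default of 5 steps mirrors the default mentioned in this doc.

```python
from typing import Callable

# Registry of available tools; the single tool here is a stand-in.
TOOLS: dict[str, Callable[[str], str]] = {
    "search": lambda query: f"results for {query}",
}


def react(question: str, reason, max_steps: int = 5) -> str:
    """Run the loop until the reasoner answers or the step limit is hit.

    reason(question, observations) returns either
    ("act", tool_name, argument) or ("answer", text).
    """
    observations: list[str] = []
    for _ in range(max_steps):
        decision = reason(question, observations)
        if decision[0] == "answer":
            return decision[1]
        _, tool, arg = decision
        if tool not in TOOLS:
            # Validate tool names and feed the error back as an observation.
            observations.append(f"error: unknown tool {tool!r}")
            continue
        observations.append(TOOLS[tool](arg))
    return "stopped: step limit reached"


def scripted_reasoner(question: str, observations: list[str]):
    """Stand-in for the LLM: search once, then answer from the observation."""
    if not observations:
        return ("act", "search", question)
    return ("answer", f"Based on {observations[-1]}")
```

Feeding an "unknown tool" error back as an observation gives the reasoner a chance to correct itself, while the step cap still bounds the cost if it never does.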
@@ -0,0 +1,69 @@
1
+ # Pattern: Routing + Tool Use
2
+
3
+ **One-liner:** Classify the user's intent, then route to a specialized handler (agent or tool) for that intent.
4
+
5
+ ## When to use
6
+
7
+ - Incoming requests fall into distinct categories with different handling logic (e.g., billing vs. technical support).
8
+ - You want to use a cheap/fast model for classification and a capable model for handling.
9
+ - Each category needs different tools, system prompts, or guardrails.
10
+ - You need clear audit trails showing why a request was routed a specific way.
11
+
12
+ ## When NOT to use
13
+
14
+ - There's only one type of request (just use a single agent with tools).
15
+ - The categories overlap heavily and routing would be wrong half the time.
16
+ - The task requires multi-step reasoning across categories (use ReAct or Multi-Agent).
17
+
18
+ ## Core flow
19
+
20
+ ```
21
+ User message
22
+ |
23
+ v
24
+ [Classifier] ──> intent + confidence
25
+ |
26
+ ├── billing ──> [Billing Specialist] ──> (Stripe tool)
27
+ ├── technical ──> [Technical Specialist] ──> (KB search tool)
28
+ ├── account ──> [Account Specialist] ──> (KB search tool)
29
+ └── general ──> [General Specialist] ──> (no tools)
30
+ |
31
+ v
32
+ Response
33
+ ```
34
+
35
+ ### Variants
36
+
37
+ - **Single-hop routing:** Classify once, route once. Simple and fast.
38
+ - **Cascading routers:** First router picks a domain, second router picks a sub-category within that domain.
39
+ - **Confidence-gated routing:** If classifier confidence is below threshold, fall back to a general handler or ask for clarification.
40
+ - **Tool-only routing:** Instead of separate agents, the classifier picks which tool to invoke. Lighter-weight.
41
+
42
+ ## Key components
43
+
44
+ - **Classifier:** An LLM (often smaller/cheaper) that maps input to an intent. Returns structured output: intent enum + confidence score + reasoning.
45
+ - **Specialist agents:** One per intent category, each with its own system prompt and tool set. Specialists are isolated — a billing agent can't accidentally call the KB search tool.
46
+ - **Router:** Dispatches to the correct specialist based on classification. In code, this is typically a match/switch on the intent enum.
47
+ - **Fallback handler:** Catches low-confidence classifications or unknown intents.
48
+
49
+ ## Common pitfalls
50
+
51
+ - **Overlapping intents:** "I can't log in to pay my bill" is both account and billing. Define clear boundaries or allow multi-label classification with priority.
52
+ - **Classifier drift:** As your product changes, new intents emerge. Monitor classification accuracy and retrain/update prompts.
53
+ - **Over-routing:** Too many intent categories make classification unreliable. Start with 3-5 categories.
54
+ - **Missing fallback:** Without a fallback, unclassifiable messages get random routing. Always have a general handler.
55
+ - **Specialist without tools:** If a specialist can't actually do anything (no tools, no data access), it's just a differently-prompted chatbot. Make sure specialists have real capabilities.
56
+
57
+ ## Framework fit
58
+
59
+ | Framework | Native support | Notes |
60
+ |-----------|----------------|-------|
61
+ | Pydantic AI | `output_type=ClassificationResult` (`result_type` in older releases) for structured classification, separate `Agent` per specialist | Natural fit: typed outputs + tool isolation |
62
+ | LangGraph | Conditional edges from classifier node to specialist nodes | Works well but verbose for simple routing |
63
+ | Mastra | Agent with `tools` per specialist, manual routing | Clean, no extra abstraction needed |
64
+ | Vercel AI SDK | `generateObject()` for classification, `generateText()` per specialist | Lightweight TS option |
65
+ | CrewAI | Not idiomatic — CrewAI is designed for collaboration, not routing | Use a different framework |
66
+
67
+ ## Reference implementations
68
+
69
+ - [recipes/customer-support-triage.md](../recipes/customer-support-triage.md) — Intent classification → specialist routing (Pydantic AI / Mastra)
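Confidence-gated routing, as described in the variants and key components of this pattern, can be sketched as a classifier plus a dispatch table. This is an illustrative sketch, assuming a keyword matcher stands in for the classifier LLM and that specialists are plain functions; `Intent`, `Classification`, `classify`, `HANDLERS`, and `route` are hypothetical names, and the confidence values are made up.

```python
from dataclasses import dataclass
from enum import Enum


class Intent(str, Enum):
    BILLING = "billing"
    TECHNICAL = "technical"
    GENERAL = "general"


@dataclass
class Classification:
    intent: Intent
    confidence: float


def classify(message: str) -> Classification:
    """Keyword stand-in for the classifier LLM's structured output."""
    text = message.lower()
    if "invoice" in text or "refund" in text:
        return Classification(Intent.BILLING, 0.9)
    if "error" in text or "crash" in text:
        return Classification(Intent.TECHNICAL, 0.8)
    return Classification(Intent.GENERAL, 0.3)


# One handler per intent; each would normally be an agent with its own tools.
HANDLERS = {
    Intent.BILLING: lambda message: f"billing: {message}",
    Intent.TECHNICAL: lambda message: f"technical: {message}",
    Intent.GENERAL: lambda message: f"general: {message}",
}


def route(message: str, threshold: float = 0.5) -> str:
    """Dispatch to a specialist, falling back to general on low confidence."""
    result = classify(message)
    intent = result.intent if result.confidence >= threshold else Intent.GENERAL
    return HANDLERS[intent](message)
```

Keying `HANDLERS` on the enum means a new intent without a handler fails loudly with a `KeyError` instead of being routed arbitrarily, which complements the explicit fallback.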
@@ -0,0 +1,39 @@
1
+ # Recipes
2
+
3
+ Full-spec agent blueprints showing how patterns, frameworks, and stack compose into real agents. Each file answers: **"Give me everything I need to build this agent."**
4
+
5
+ ## Validated (with reference implementation)
6
+
7
+ | Recipe | Pattern | Framework (Py / TS) | Status |
8
+ |--------|---------|---------------------|--------|
9
+ | [Customer Support Triage](customer-support-triage.md) | Routing + Tool Use | Pydantic AI / Vercel AI SDK | Blueprint (validated) |
10
+ | [Docs RAG QA](docs-rag-qa.md) | RAG | Pydantic AI / Vercel AI SDK | Blueprint (validated) |
11
+ | [Research Assistant](research-assistant.md) | ReAct | Pydantic AI / Vercel AI SDK | Blueprint (validated) |
12
+
13
+ ## Design spec
14
+
15
+ | Recipe | Pattern | Framework (Py / TS) | Status |
16
+ |--------|---------|---------------------|--------|
17
+ | [Content Pipeline](content-pipeline.md) | Prompt Chaining | Pydantic AI / Vercel AI SDK | Blueprint (design spec) |
18
+ | [Code Review Agent](code-review-agent.md) | Plan-Execute-Reflect | LangGraph / Vercel AI SDK | Blueprint (design spec) |
19
+ | [Ops Crew](ops-crew.md) | Multi-Agent Flat | CrewAI / Vercel AI SDK | Blueprint (design spec) |
20
+ | [Parallel Enricher](parallel-enricher.md) | Parallel Calls | Pydantic AI / Vercel AI SDK | Blueprint (design spec) |
21
+ | [Memory Assistant](memory-assistant.md) | Memory | LangGraph / Vercel AI SDK | Blueprint (design spec) |
22
+ | [Hierarchical Agent](hierarchical-agent.md) | Multi-Agent Hierarchical | LangGraph / Vercel AI SDK | Blueprint (design spec) |
23
+
24
+ ## How to read a blueprint
25
+
26
+ Each blueprint includes 13 sections:
27
+ 1. **What it composes** — links to the pattern, framework, stack, and cross-cutting docs
28
+ 2. **Architecture** — diagram of the agent's structure
29
+ 3. **Data Models** — full Pydantic + Zod schemas
30
+ 4. **API Contract** — every endpoint with request/response JSON
31
+ 5. **Tool Specifications** — each tool with parameters and examples
32
+ 6. **Prompt Specifications** — actual system prompts with design rationale
33
+ 7. **Key files** — file-by-file implementation spec (Python + TypeScript)
34
+ 8. **Implementation Roadmap** — ordered build steps
35
+ 9. **Environment & Deployment** — env vars table, Docker reference
36
+ 10. **Test Strategy** — example tests per tier
37
+ 11. **Eval Dataset** — inline golden examples
38
+ 12. **Design Decisions** — trade-offs and rationale
39
+ 13. **Reference Implementation** — full source code (validated blueprints only)