agent-scaffold-cli 0.1.1__py3-none-any.whl
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- agent_scaffold/__init__.py +8 -0
- agent_scaffold/__main__.py +6 -0
- agent_scaffold/_bundled_deployments/__init__.py +15 -0
- agent_scaffold/_bundled_deployments/docs/cross-cutting/README.md +15 -0
- agent_scaffold/_bundled_deployments/docs/cross-cutting/auth-jwt.md +235 -0
- agent_scaffold/_bundled_deployments/docs/cross-cutting/logging-structured.md +196 -0
- agent_scaffold/_bundled_deployments/docs/cross-cutting/observability.md +259 -0
- agent_scaffold/_bundled_deployments/docs/cross-cutting/rate-limiting.md +171 -0
- agent_scaffold/_bundled_deployments/docs/cross-cutting/testing-strategy.md +261 -0
- agent_scaffold/_bundled_deployments/docs/frameworks/README.md +22 -0
- agent_scaffold/_bundled_deployments/docs/frameworks/crewai.md +91 -0
- agent_scaffold/_bundled_deployments/docs/frameworks/langgraph.md +79 -0
- agent_scaffold/_bundled_deployments/docs/frameworks/mastra.md +74 -0
- agent_scaffold/_bundled_deployments/docs/frameworks/pydantic-ai.md +77 -0
- agent_scaffold/_bundled_deployments/docs/frameworks/vercel-ai-sdk.md +83 -0
- agent_scaffold/_bundled_deployments/docs/patterns/README.md +26 -0
- agent_scaffold/_bundled_deployments/docs/patterns/memory.md +82 -0
- agent_scaffold/_bundled_deployments/docs/patterns/multi-agent-flat.md +72 -0
- agent_scaffold/_bundled_deployments/docs/patterns/multi-agent-hierarchical.md +83 -0
- agent_scaffold/_bundled_deployments/docs/patterns/parallel-calls.md +73 -0
- agent_scaffold/_bundled_deployments/docs/patterns/plan-execute-reflect.md +77 -0
- agent_scaffold/_bundled_deployments/docs/patterns/prompt-chaining.md +73 -0
- agent_scaffold/_bundled_deployments/docs/patterns/rag.md +84 -0
- agent_scaffold/_bundled_deployments/docs/patterns/react.md +77 -0
- agent_scaffold/_bundled_deployments/docs/patterns/routing-tool-use.md +69 -0
- agent_scaffold/_bundled_deployments/docs/recipes/README.md +39 -0
- agent_scaffold/_bundled_deployments/docs/recipes/code-review-agent.md +518 -0
- agent_scaffold/_bundled_deployments/docs/recipes/content-pipeline.md +525 -0
- agent_scaffold/_bundled_deployments/docs/recipes/customer-support-triage.md +1679 -0
- agent_scaffold/_bundled_deployments/docs/recipes/docs-rag-qa.md +1254 -0
- agent_scaffold/_bundled_deployments/docs/recipes/hierarchical-agent.md +554 -0
- agent_scaffold/_bundled_deployments/docs/recipes/memory-assistant.md +499 -0
- agent_scaffold/_bundled_deployments/docs/recipes/ops-crew.md +457 -0
- agent_scaffold/_bundled_deployments/docs/recipes/parallel-enricher.md +457 -0
- agent_scaffold/_bundled_deployments/docs/recipes/research-assistant.md +1096 -0
- agent_scaffold/_bundled_deployments/docs/stack/README.md +19 -0
- agent_scaffold/_bundled_deployments/docs/stack/api-fastapi.md +112 -0
- agent_scaffold/_bundled_deployments/docs/stack/api-hono.md +108 -0
- agent_scaffold/_bundled_deployments/docs/stack/cache-redis.md +85 -0
- agent_scaffold/_bundled_deployments/docs/stack/eval-deepeval-ragas-promptfoo.md +164 -0
- agent_scaffold/_bundled_deployments/docs/stack/llm-claude.md +105 -0
- agent_scaffold/_bundled_deployments/docs/stack/relational-postgres.md +122 -0
- agent_scaffold/_bundled_deployments/docs/stack/tool-protocol-mcp.md +275 -0
- agent_scaffold/_bundled_deployments/docs/stack/tracing-langfuse.md +108 -0
- agent_scaffold/_bundled_deployments/docs/stack/vector-qdrant.md +121 -0
- agent_scaffold/cache.py +32 -0
- agent_scaffold/cli.py +512 -0
- agent_scaffold/config.py +117 -0
- agent_scaffold/context.py +253 -0
- agent_scaffold/contract.py +141 -0
- agent_scaffold/discovery.py +112 -0
- agent_scaffold/generator.py +213 -0
- agent_scaffold/languages/__init__.py +0 -0
- agent_scaffold/languages/python.yaml +28 -0
- agent_scaffold/languages/typescript.yaml +25 -0
- agent_scaffold/prompts/__init__.py +0 -0
- agent_scaffold/prompts/repair.md +9 -0
- agent_scaffold/prompts/system.md +21 -0
- agent_scaffold/prompts/user_template.md +43 -0
- agent_scaffold/validator.py +133 -0
- agent_scaffold/writer.py +171 -0
- agent_scaffold_cli-0.1.1.dist-info/METADATA +147 -0
- agent_scaffold_cli-0.1.1.dist-info/RECORD +66 -0
- agent_scaffold_cli-0.1.1.dist-info/WHEEL +4 -0
- agent_scaffold_cli-0.1.1.dist-info/entry_points.txt +2 -0
- agent_scaffold_cli-0.1.1.dist-info/licenses/LICENSE +21 -0
|
@@ -0,0 +1,91 @@
|
|
|
1
|
+
# Framework: CrewAI
|
|
2
|
+
|
|
3
|
+
**Language:** Python
|
|
4
|
+
**Install:** `uv add crewai`
|
|
5
|
+
**Version pinned:** >=0.28.0
|
|
6
|
+
|
|
7
|
+
## Core abstractions
|
|
8
|
+
|
|
9
|
+
- **Agent:** An LLM-powered entity with a role, goal, backstory, and tools. Agents have a persona that shapes their behavior.
|
|
10
|
+
- **Task:** A unit of work with a description, expected output, and optionally an assigned agent. Tasks can depend on other tasks.
|
|
11
|
+
- **Crew:** A team of agents working together on tasks. The crew defines the process (sequential, parallel, or hierarchical).
|
|
12
|
+
- **Process:** How agents execute tasks — `sequential` (one after another), `hierarchical` (manager delegates), or custom.
|
|
13
|
+
- **Tool:** A function the agent can call. CrewAI tools are based on LangChain's tool interface.
|
|
14
|
+
|
|
15
|
+
## Patterns it supports well
|
|
16
|
+
|
|
17
|
+
- **Multi-Agent Flat** — The canonical use case. Define a crew with specialized agents, assign tasks, run. Built-in collaboration.
|
|
18
|
+
- **Multi-Agent Hierarchical** — Set `process=Process.hierarchical` and designate a `manager_agent`. The manager delegates tasks to workers.
|
|
19
|
+
- **Prompt Chaining** — Sequential task execution where each task's output feeds the next.
|
|
20
|
+
- **ReAct** — Individual agents use a ReAct loop internally when they have tools.
|
|
21
|
+
|
|
22
|
+
## Patterns where it's awkward
|
|
23
|
+
|
|
24
|
+
- **Routing / Classification** — CrewAI is designed for collaboration, not intent routing. Use Pydantic AI instead.
|
|
25
|
+
- **RAG** — Possible via tools but there's no built-in retrieval. Better handled by a dedicated RAG framework.
|
|
26
|
+
- **Parallel fan-out on data** — CrewAI's parallelism is at the agent/task level, not data-level. For batch processing N items, use `asyncio.gather()` instead.
|
|
27
|
+
- **Fine-grained state control** — No checkpointing or state graph. The crew runs to completion.
|
|
28
|
+
|
|
29
|
+
## Idiomatic minimal example
|
|
30
|
+
|
|
31
|
+
```python
|
|
32
|
+
from crewai import Agent, Task, Crew, Process
|
|
33
|
+
|
|
34
|
+
researcher = Agent(
|
|
35
|
+
role="Researcher",
|
|
36
|
+
goal="Find accurate information about the topic",
|
|
37
|
+
backstory="You are an expert researcher with attention to detail.",
|
|
38
|
+
tools=[search_tool],
|
|
39
|
+
)
|
|
40
|
+
|
|
41
|
+
writer = Agent(
|
|
42
|
+
role="Writer",
|
|
43
|
+
goal="Write clear, concise summaries",
|
|
44
|
+
backstory="You are a technical writer who excels at making complex topics accessible.",
|
|
45
|
+
)
|
|
46
|
+
|
|
47
|
+
research_task = Task(
|
|
48
|
+
description="Research {topic}",
|
|
49
|
+
expected_output="Detailed research notes with sources",
|
|
50
|
+
agent=researcher,
|
|
51
|
+
)
|
|
52
|
+
|
|
53
|
+
write_task = Task(
|
|
54
|
+
description="Write a summary based on the research",
|
|
55
|
+
expected_output="A clear, concise summary",
|
|
56
|
+
agent=writer,
|
|
57
|
+
)
|
|
58
|
+
|
|
59
|
+
crew = Crew(
|
|
60
|
+
agents=[researcher, writer],
|
|
61
|
+
tasks=[research_task, write_task],
|
|
62
|
+
process=Process.sequential,
|
|
63
|
+
)
|
|
64
|
+
|
|
65
|
+
result = crew.kickoff(inputs={"topic": "AI agents"})
|
|
66
|
+
```
|
|
67
|
+
|
|
68
|
+
## Strengths
|
|
69
|
+
|
|
70
|
+
- **Multi-agent first** — The best framework for teams of agents working together. Crew/Agent/Task is a natural mental model.
|
|
71
|
+
- **Role-based agents** — Backstory + role + goal gives agents strong personas without complex prompt engineering.
|
|
72
|
+
- **Built-in delegation** — Hierarchical process with manager delegation works out of the box.
|
|
73
|
+
- **Low code for simple cases** — A 2-agent crew with sequential tasks is ~20 lines.
|
|
74
|
+
|
|
75
|
+
## Trade-offs
|
|
76
|
+
|
|
77
|
+
- **Opinionated** — The Crew/Agent/Task model doesn't fit every pattern. Single-agent workflows feel over-engineered.
|
|
78
|
+
- **Token-heavy** — Agent backstories, task descriptions, and inter-agent messages consume many tokens. Costs add up.
|
|
79
|
+
- **Less control** — The internal ReAct loop and delegation logic are opaque. Hard to customize behavior mid-execution.
|
|
80
|
+
- **LangChain dependency** — Tools and some internals depend on LangChain, adding to the dependency tree.
|
|
81
|
+
- **Debugging** — Multi-agent interactions are hard to trace. Add verbose logging.
|
|
82
|
+
|
|
83
|
+
## Used in this repo
|
|
84
|
+
|
|
85
|
+
| Prototype | Role |
|
|
86
|
+
|-----------|------|
|
|
87
|
+
| `ops-crew` | Planned for flat multi-agent DevOps/Security/Database crew (skeleton) |
|
|
88
|
+
|
|
89
|
+
## Reference implementations
|
|
90
|
+
|
|
91
|
+
- [recipes/ops-crew.md](../recipes/ops-crew.md) — Multi-agent ops crew (skeleton)
|
|
@@ -0,0 +1,79 @@
|
|
|
1
|
+
# Framework: LangGraph
|
|
2
|
+
|
|
3
|
+
**Language:** Python
|
|
4
|
+
**Install:** `uv add langgraph`
|
|
5
|
+
**Version pinned:** 0.3.21
|
|
6
|
+
|
|
7
|
+
## Core abstractions
|
|
8
|
+
|
|
9
|
+
- **StateGraph:** A directed graph where nodes are functions and edges are transitions. State flows through the graph as a typed dict.
|
|
10
|
+
- **State:** A TypedDict (or Pydantic model) that accumulates data as the graph executes. Each node receives the full state and returns updates.
|
|
11
|
+
- **Nodes:** Python functions (sync or async) that take state, do work (call LLM, run tool, transform data), and return state updates.
|
|
12
|
+
- **Edges:** Transitions between nodes. Can be unconditional (always go A -> B) or conditional (router function picks the next node based on state).
|
|
13
|
+
- **Checkpointer:** Persists state between steps, enabling resume, replay, and human-in-the-loop. Postgres-backed in production.
|
|
14
|
+
- **ToolNode:** Built-in node that executes tool calls from a preceding LLM node. Handles tool schemas, invocation, and result injection.
|
|
15
|
+
|
|
16
|
+
## Patterns it supports well
|
|
17
|
+
|
|
18
|
+
- **RAG** -- Retriever node -> generator node with state carrying retrieved chunks. Add conditional edges for multi-step retrieval.
|
|
19
|
+
- **ReAct** -- `create_react_agent()` gives you a prebuilt reason-act-observe loop with tool execution. The canonical use case.
|
|
20
|
+
- **Plan & Execute** -- Planner node produces a step list in state, executor node works through them, reflector node evaluates and optionally re-plans.
|
|
21
|
+
- **Multi-Agent (hierarchical)** -- `langgraph-supervisor` package provides a supervisor node that delegates to sub-graphs. Each sub-agent is its own compiled graph.
|
|
22
|
+
- **Memory** -- Checkpointer + state persistence means conversation history and memory are first-class.
|
|
23
|
+
|
|
24
|
+
## Patterns where it's awkward
|
|
25
|
+
|
|
26
|
+
- **Simple tool use / routing** -- If your agent is just "classify intent, call one tool," LangGraph's graph abstraction is overkill. Use Pydantic AI instead.
|
|
27
|
+
- **Parallel fan-out** -- LangGraph supports map-reduce via `Send()`, but the ergonomics are heavier than raw `asyncio.gather()` with Pydantic AI.
|
|
28
|
+
|
|
29
|
+
## Idiomatic minimal example
|
|
30
|
+
|
|
31
|
+
```python
|
|
32
|
+
from langgraph.graph import StateGraph, START, END
|
|
33
|
+
from langgraph.prebuilt import create_react_agent
|
|
34
|
+
from langchain_anthropic import ChatAnthropic
|
|
35
|
+
from langchain_core.tools import tool
|
|
36
|
+
|
|
37
|
+
@tool
|
|
38
|
+
def search(query: str) -> str:
|
|
39
|
+
"""Search the web."""
|
|
40
|
+
return f"Results for: {query}"
|
|
41
|
+
|
|
42
|
+
llm = ChatAnthropic(model="claude-sonnet-4-6-20250514")
|
|
43
|
+
agent = create_react_agent(llm, tools=[search])
|
|
44
|
+
|
|
45
|
+
# Run
|
|
46
|
+
result = agent.invoke({"messages": [("user", "What is MCP?")]})
|
|
47
|
+
print(result["messages"][-1].content)
|
|
48
|
+
```
|
|
49
|
+
|
|
50
|
+
## Strengths
|
|
51
|
+
|
|
52
|
+
- **State management** is the best in class. TypedDict state + checkpointing means you can pause, resume, replay, and branch agent execution.
|
|
53
|
+
- **Observability** via LangSmith integration -- every node execution, LLM call, and tool invocation is traced automatically.
|
|
54
|
+
- **Composition** -- sub-graphs can be compiled and used as nodes in parent graphs, enabling hierarchical agent architectures.
|
|
55
|
+
- **Production-proven** -- widely deployed, well-documented, active maintenance.
|
|
56
|
+
|
|
57
|
+
## Trade-offs
|
|
58
|
+
|
|
59
|
+
- **Learning curve** -- the graph mental model takes time. Simple agents feel over-engineered.
|
|
60
|
+
- **LangChain coupling** -- while LangGraph is technically separate, it works best with LangChain's model wrappers (`ChatAnthropic`, `ChatOpenAI`), tool decorators, and message types.
|
|
61
|
+
- **Async complexity** -- async graph execution works but debugging is harder than sync Pydantic AI agents.
|
|
62
|
+
- **Verbose for simple cases** -- a 3-node graph with conditional edges is more code than a Pydantic AI agent with a tool.
|
|
63
|
+
|
|
64
|
+
## Used in this repo
|
|
65
|
+
|
|
66
|
+
| Prototype | Role |
|
|
67
|
+
|-----------|------|
|
|
68
|
+
| `docs-rag-qa` | Listed as LangGraph in README, but implementation uses Pydantic AI for simplicity. LangGraph would be the choice for multi-step RAG with state. |
|
|
69
|
+
| `research-assistant` | Listed for ReAct loop. The `create_react_agent` helper is the natural fit. |
|
|
70
|
+
| `code-review-agent` | Plan & Execute pattern -- planner + executor + reflector as graph nodes. (Skeleton) |
|
|
71
|
+
| `memory-assistant` | Checkpointer-backed memory with LangGraph + mem0. (Skeleton) |
|
|
72
|
+
| `hierarchical-agent` | `langgraph-supervisor` for hierarchical multi-agent. (Skeleton) |
|
|
73
|
+
|
|
74
|
+
## Reference implementations
|
|
75
|
+
|
|
76
|
+
- [recipes/docs-rag-qa.md](../recipes/docs-rag-qa.md) -- RAG pipeline (design-level LangGraph, implemented with Pydantic AI)
|
|
77
|
+
- [recipes/research-assistant.md](../recipes/research-assistant.md) -- ReAct agent
|
|
78
|
+
- [recipes/code-review-agent.md](../recipes/code-review-agent.md) -- Plan & Execute (skeleton)
|
|
79
|
+
- [recipes/hierarchical-agent.md](../recipes/hierarchical-agent.md) -- Hierarchical multi-agent (skeleton)
|
|
@@ -0,0 +1,74 @@
|
|
|
1
|
+
# Framework: Mastra
|
|
2
|
+
|
|
3
|
+
**Language:** TypeScript
|
|
4
|
+
**Install:** `npm install mastra @mastra/core`
|
|
5
|
+
**Version pinned:** >=0.1.0
|
|
6
|
+
|
|
7
|
+
## Core abstractions
|
|
8
|
+
|
|
9
|
+
- **Agent:** An LLM-powered entity with a system prompt, model config, and tools. Similar to Pydantic AI's Agent but in TypeScript.
|
|
10
|
+
- **Tool:** A typed function the agent can call. Defined with Zod schemas for input/output validation.
|
|
11
|
+
- **Workflow:** A directed graph of steps (like LangGraph but TS-native). Supports branching, looping, and parallel execution.
|
|
12
|
+
- **Memory:** Built-in memory primitives for conversation history and semantic memory. Integrates with vector stores.
|
|
13
|
+
- **Integration:** Pre-built connectors for external services (APIs, databases, SaaS tools).
|
|
14
|
+
|
|
15
|
+
## Patterns it supports well
|
|
16
|
+
|
|
17
|
+
- **ReAct** — Agent with tools runs a built-in reason-act-observe loop. Similar ergonomics to Pydantic AI.
|
|
18
|
+
- **Routing + Tool Use** — Agent with structured output for classification, separate agents per handler.
|
|
19
|
+
- **Prompt Chaining** — Workflow steps execute sequentially, passing typed data between stages.
|
|
20
|
+
- **Memory** — First-class memory support with built-in storage backends.
|
|
21
|
+
- **Multi-Agent** — Agent handoffs via workflows. Agents can delegate to other agents.
|
|
22
|
+
|
|
23
|
+
## Patterns where it's awkward
|
|
24
|
+
|
|
25
|
+
- **Plan-and-Execute** — Possible via workflows but no dedicated planner/reflector abstractions.
|
|
26
|
+
- **Complex state management** — Workflows handle state but lack LangGraph's checkpointing depth (no replay, no branching history).
|
|
27
|
+
|
|
28
|
+
## Idiomatic minimal example
|
|
29
|
+
|
|
30
|
+
```typescript
|
|
31
|
+
import { Agent } from "@mastra/core";
|
|
32
|
+
import { anthropic } from "@ai-sdk/anthropic";
|
|
33
|
+
import { z } from "zod";
|
|
34
|
+
|
|
35
|
+
const agent = new Agent({
|
|
36
|
+
name: "assistant",
|
|
37
|
+
model: anthropic("claude-sonnet-4-6-20250514"),
|
|
38
|
+
instructions: "You are a helpful assistant.",
|
|
39
|
+
tools: {
|
|
40
|
+
search: {
|
|
41
|
+
description: "Search for information",
|
|
42
|
+
parameters: z.object({ query: z.string() }),
|
|
43
|
+
execute: async ({ query }) => `Results for: ${query}`,
|
|
44
|
+
},
|
|
45
|
+
},
|
|
46
|
+
});
|
|
47
|
+
|
|
48
|
+
const result = await agent.generate("What is MCP?");
|
|
49
|
+
console.log(result.text);
|
|
50
|
+
```
|
|
51
|
+
|
|
52
|
+
## Strengths
|
|
53
|
+
|
|
54
|
+
- **TS-native** — Built for TypeScript from the ground up. Zod schemas, async/await, full type inference.
|
|
55
|
+
- **Batteries included** — Memory, workflows, integrations, and tools in one framework.
|
|
56
|
+
- **Workflow engine** — Directed graph workflows with branching and parallelism, similar to LangGraph but in TS.
|
|
57
|
+
- **Growing ecosystem** — Active development, increasing community adoption.
|
|
58
|
+
|
|
59
|
+
## Trade-offs
|
|
60
|
+
|
|
61
|
+
- **Newer framework** — Smaller community, fewer production deployments compared to Vercel AI SDK or LangChain.
|
|
62
|
+
- **Heavier than Vercel AI SDK** — More abstractions to learn. For simple agents, Vercel AI SDK is lighter.
|
|
63
|
+
- **Integration lock-in** — Built-in integrations are convenient but may not match your exact needs.
|
|
64
|
+
- **Documentation gaps** — As a newer project, some advanced use cases lack documentation.
|
|
65
|
+
|
|
66
|
+
## Used in this repo
|
|
67
|
+
|
|
68
|
+
| Prototype | Role |
|
|
69
|
+
|-----------|------|
|
|
70
|
+
| Not currently used | The TS track uses Vercel AI SDK. Mastra is documented as a TS framework option for teams that need workflow orchestration or built-in memory. |
|
|
71
|
+
|
|
72
|
+
## Reference implementations
|
|
73
|
+
|
|
74
|
+
- No direct recipes yet. See [frameworks/vercel-ai-sdk.md](vercel-ai-sdk.md) for the TS framework currently used in prototypes.
|
|
@@ -0,0 +1,77 @@
|
|
|
1
|
+
# Framework: Pydantic AI
|
|
2
|
+
|
|
3
|
+
**Language:** Python
|
|
4
|
+
**Install:** `uv add pydantic-ai[anthropic]`
|
|
5
|
+
**Version pinned:** >=0.1.0
|
|
6
|
+
|
|
7
|
+
## Core abstractions
|
|
8
|
+
|
|
9
|
+
- **Agent:** The central class. Wraps a model, system prompt, tools, and result type. Calling `agent.run()` executes a full reason-act-observe loop until the agent produces a result.
|
|
10
|
+
- **Tool:** A decorated Python function (`@agent.tool` or `@agent.tool_plain`) that the agent can call. Tools receive typed arguments and return typed results.
|
|
11
|
+
- **Result type:** A Pydantic model that defines the structured output the agent must produce. The framework validates the LLM's output against this schema automatically.
|
|
12
|
+
- **Dependencies:** Typed context injected into tools via `@agent.tool` (as opposed to `@agent.tool_plain`). Useful for passing DB connections, API clients, or user context.
|
|
13
|
+
- **System prompt:** Static string or dynamic function that sets the agent's behavior. Dynamic prompts can use dependencies.
|
|
14
|
+
|
|
15
|
+
## Patterns it supports well
|
|
16
|
+
|
|
17
|
+
- **Routing + Tool Use** — Structured `result_type` makes intent classification natural. Separate agents per specialist with isolated tool sets. This is the pattern used in `customer-support-triage`.
|
|
18
|
+
- **ReAct** — `agent.run()` is a built-in ReAct loop. The agent reasons, calls tools, observes results, and loops until it produces the result type. Used in `research-assistant`.
|
|
19
|
+
- **RAG** — Retrieval as a tool, generation via the agent. Type-safe citation schemas via `result_type`. Used in `docs-rag-qa`.
|
|
20
|
+
- **Prompt Chaining** — Sequential `agent.run()` calls with different `result_type` per stage. Type safety between stages.
|
|
21
|
+
- **Parallel Calls** — `asyncio.gather()` with multiple `agent.run()` calls. Async-first design makes this natural.
|
|
22
|
+
|
|
23
|
+
## Patterns where it's awkward
|
|
24
|
+
|
|
25
|
+
- **Plan-and-Execute** — No built-in state management or checkpointing. You'd manage the plan/reflect state yourself.
|
|
26
|
+
- **Multi-Agent (hierarchical)** — No supervisor abstraction. You'd orchestrate agent-calling-agent manually.
|
|
27
|
+
- **Memory** — No built-in persistence. You'd integrate with an external memory store via tools.
|
|
28
|
+
|
|
29
|
+
## Idiomatic minimal example
|
|
30
|
+
|
|
31
|
+
```python
|
|
32
|
+
from pydantic_ai import Agent
|
|
33
|
+
|
|
34
|
+
agent = Agent(
|
|
35
|
+
"anthropic:claude-sonnet-4-6-20250514",
|
|
36
|
+
system_prompt="You are a helpful assistant.",
|
|
37
|
+
)
|
|
38
|
+
|
|
39
|
+
@agent.tool_plain
|
|
40
|
+
async def search(query: str) -> str:
|
|
41
|
+
"""Search for information."""
|
|
42
|
+
return f"Results for: {query}"
|
|
43
|
+
|
|
44
|
+
result = await agent.run("What is MCP?")
|
|
45
|
+
print(result.data)
|
|
46
|
+
```
|
|
47
|
+
|
|
48
|
+
## Strengths
|
|
49
|
+
|
|
50
|
+
- **Type safety** — Pydantic models for inputs, outputs, and tool signatures. Validation is automatic.
|
|
51
|
+
- **Minimal boilerplate** — An agent with tools is ~10 lines. No graph to define, no nodes to wire.
|
|
52
|
+
- **Async-first** — Built on asyncio. Parallel tool calls and concurrent agents work naturally.
|
|
53
|
+
- **Framework-agnostic models** — Supports Anthropic, OpenAI, Gemini, Ollama, and more via a clean provider interface.
|
|
54
|
+
- **Testable** — `TestModel` and `FunctionModel` allow deterministic testing without hitting an LLM.
|
|
55
|
+
|
|
56
|
+
## Trade-offs
|
|
57
|
+
|
|
58
|
+
- **No state management** — Unlike LangGraph, there's no checkpointer or state graph. Complex multi-step workflows require manual state handling.
|
|
59
|
+
- **No built-in multi-agent** — Each agent is independent. Orchestrating multiple agents is your responsibility.
|
|
60
|
+
- **Simpler = less control** — The ReAct loop is opaque. You can't easily inject logic between reason and act steps (LangGraph lets you add nodes anywhere).
|
|
61
|
+
- **Younger ecosystem** — Smaller community and fewer integrations compared to LangChain/LangGraph.
|
|
62
|
+
|
|
63
|
+
## Used in this repo
|
|
64
|
+
|
|
65
|
+
| Prototype | Role |
|
|
66
|
+
|-----------|------|
|
|
67
|
+
| `customer-support-triage` | Classifier agent with `result_type=ClassificationResult`, specialist agents per intent |
|
|
68
|
+
| `docs-rag-qa` | RAG agent with Qdrant retrieval as a tool |
|
|
69
|
+
| `research-assistant` | ReAct agent with web search tool |
|
|
70
|
+
| `content-pipeline` | Planned for prompt chaining (skeleton) |
|
|
71
|
+
| `parallel-enricher` | Planned for parallel `asyncio.gather()` pattern (skeleton) |
|
|
72
|
+
|
|
73
|
+
## Reference implementations
|
|
74
|
+
|
|
75
|
+
- [recipes/customer-support-triage.md](../recipes/customer-support-triage.md) — Routing + Tool Use
|
|
76
|
+
- [recipes/docs-rag-qa.md](../recipes/docs-rag-qa.md) — Agentic RAG
|
|
77
|
+
- [recipes/research-assistant.md](../recipes/research-assistant.md) — ReAct research agent
|
|
@@ -0,0 +1,83 @@
|
|
|
1
|
+
# Framework: Vercel AI SDK
|
|
2
|
+
|
|
3
|
+
**Language:** TypeScript
|
|
4
|
+
**Install:** `npm install ai @ai-sdk/anthropic`
|
|
5
|
+
**Version pinned:** ai ^4.0.0, @ai-sdk/anthropic ^1.0.0
|
|
6
|
+
|
|
7
|
+
## Core abstractions
|
|
8
|
+
|
|
9
|
+
- **`generateText()`:** Single LLM call that can use tools. The agent loops internally (up to `maxSteps`) calling tools until it produces a final text response.
|
|
10
|
+
- **`generateObject()`:** LLM call that returns structured output validated against a Zod schema. Ideal for classification, extraction, and structured data.
|
|
11
|
+
- **`streamText()` / `streamObject()`:** Streaming variants for real-time output.
|
|
12
|
+
- **`tool()`:** Defines a tool with a Zod schema and an execute function. Tools are passed to `generateText()` as a record.
|
|
13
|
+
- **Provider:** Model abstraction (`@ai-sdk/anthropic`, `@ai-sdk/openai`, etc.) that handles API communication.
|
|
14
|
+
|
|
15
|
+
## Patterns it supports well
|
|
16
|
+
|
|
17
|
+
- **ReAct** — `generateText()` with `tools` and `maxSteps` runs a built-in reason-act-observe loop. The simplest way to build an agent in TS.
|
|
18
|
+
- **Routing + Tool Use** — `generateObject()` for classification (structured output), `generateText()` per specialist. Clean and lightweight.
|
|
19
|
+
- **RAG** — Retrieval as a tool, generation via `generateText()`. Or inject retrieved context directly into the prompt.
|
|
20
|
+
- **Prompt Chaining** — Sequential `generateObject()` / `generateText()` calls. Each stage is a function call.
|
|
21
|
+
- **Parallel Calls** — `Promise.all()` with multiple `generateText()` calls. Standard TS async.
|
|
22
|
+
|
|
23
|
+
## Patterns where it's awkward
|
|
24
|
+
|
|
25
|
+
- **Plan-and-Execute** — No state management or checkpointing. You'd manage everything manually.
|
|
26
|
+
- **Multi-Agent (hierarchical)** — No supervisor abstraction. You'd orchestrate agent-calling-agent yourself.
|
|
27
|
+
- **Memory** — No built-in persistence. Conversation history must be managed externally.
|
|
28
|
+
- **Complex workflows** — No graph or workflow engine. For complex flows, consider Mastra or a manual state machine.
|
|
29
|
+
|
|
30
|
+
## Idiomatic minimal example
|
|
31
|
+
|
|
32
|
+
```typescript
|
|
33
|
+
import { generateText, tool } from "ai";
|
|
34
|
+
import { anthropic } from "@ai-sdk/anthropic";
|
|
35
|
+
import { z } from "zod";
|
|
36
|
+
|
|
37
|
+
const result = await generateText({
|
|
38
|
+
model: anthropic("claude-sonnet-4-6-20250514"),
|
|
39
|
+
system: "You are a helpful assistant.",
|
|
40
|
+
prompt: "What is MCP?",
|
|
41
|
+
tools: {
|
|
42
|
+
search: tool({
|
|
43
|
+
description: "Search for information",
|
|
44
|
+
parameters: z.object({ query: z.string() }),
|
|
45
|
+
execute: async ({ query }) => `Results for: ${query}`,
|
|
46
|
+
}),
|
|
47
|
+
},
|
|
48
|
+
maxSteps: 5,
|
|
49
|
+
});
|
|
50
|
+
|
|
51
|
+
console.log(result.text);
|
|
52
|
+
```
|
|
53
|
+
|
|
54
|
+
## Strengths
|
|
55
|
+
|
|
56
|
+
- **Minimal API surface** — Three functions (`generateText`, `generateObject`, `streamText`) cover most use cases. Easy to learn.
|
|
57
|
+
- **Type safety** — Zod schemas for tool inputs, structured outputs, and provider configs. Full TypeScript inference.
|
|
58
|
+
- **Streaming-first** — Built for real-time UIs. `streamText()` and `streamObject()` integrate with React Server Components and Next.js.
|
|
59
|
+
- **Provider-agnostic** — Swap models by changing the provider import. Same code works with Anthropic, OpenAI, Google, and more.
|
|
60
|
+
- **Lightweight** — No framework overhead. Functions, not classes. Composes with any TS codebase.
|
|
61
|
+
- **Production-proven** — Widely deployed via Vercel's ecosystem. Well-documented, actively maintained.
|
|
62
|
+
|
|
63
|
+
## Trade-offs
|
|
64
|
+
|
|
65
|
+
- **No orchestration** — No graph, no workflow, no state management. Complex multi-step agents require manual plumbing.
|
|
66
|
+
- **No multi-agent** — Each `generateText()` call is independent. No built-in agent-to-agent communication.
|
|
67
|
+
- **No memory** — Conversation history is your responsibility. Pass `messages` array manually.
|
|
68
|
+
- **UI-focused origin** — Some features (streaming, React hooks) are optimized for web UIs, which may be unnecessary for backend agents.
|
|
69
|
+
|
|
70
|
+
## Used in this repo
|
|
71
|
+
|
|
72
|
+
| Prototype | Role |
|
|
73
|
+
|-----------|------|
|
|
74
|
+
| `customer-support-triage` (TS) | Classifier with `generateObject()`, specialist with `generateText()` + tools |
|
|
75
|
+
| `docs-rag-qa` (TS) | RAG agent with retrieval tool |
|
|
76
|
+
| `research-assistant` (TS) | ReAct agent with web search tool |
|
|
77
|
+
| All TS prototypes | The standard TS framework for all prototypes in this repo |
|
|
78
|
+
|
|
79
|
+
## Reference implementations
|
|
80
|
+
|
|
81
|
+
- [recipes/customer-support-triage.md](../recipes/customer-support-triage.md) — Routing + Tool Use (TS track)
|
|
82
|
+
- [recipes/docs-rag-qa.md](../recipes/docs-rag-qa.md) — Agentic RAG (TS track)
|
|
83
|
+
- [recipes/research-assistant.md](../recipes/research-assistant.md) — ReAct agent (TS track)
|
|
@@ -0,0 +1,26 @@
|
|
|
1
|
+
# Patterns
|
|
2
|
+
|
|
3
|
+
Architecture patterns for AI agents. Each file answers: **"What shape does my agent take?"**
|
|
4
|
+
|
|
5
|
+
| Pattern | One-liner | Best framework fit |
|
|
6
|
+
|---------|-----------|-------------------|
|
|
7
|
+
| [RAG](rag.md) | Ground answers in retrieved documents | LangGraph, Pydantic AI |
|
|
8
|
+
| [ReAct](react.md) | Think → act → observe loop with tools | LangGraph, Pydantic AI |
|
|
9
|
+
| [Routing + Tool Use](routing-tool-use.md) | Classify intent, route to specialist | Pydantic AI |
|
|
10
|
+
| [Prompt Chaining](prompt-chaining.md) | Fixed sequence of LLM calls | Pydantic AI, Vercel AI SDK |
|
|
11
|
+
| [Plan, Execute, Reflect](plan-execute-reflect.md) | Plan steps, execute, self-correct | LangGraph |
|
|
12
|
+
| [Parallel Calls](parallel-calls.md) | Fan-out / fan-in concurrent execution | Pydantic AI, Vercel AI SDK |
|
|
13
|
+
| [Memory](memory.md) | Persist context across conversations | LangGraph |
|
|
14
|
+
| [Multi-Agent Flat](multi-agent-flat.md) | Peer agents collaborating | CrewAI |
|
|
15
|
+
| [Multi-Agent Hierarchical](multi-agent-hierarchical.md) | Supervisor delegates to workers | LangGraph |
|
|
16
|
+
|
|
17
|
+
## How to pick a pattern
|
|
18
|
+
|
|
19
|
+
1. **Single task, no tools needed?** → Just prompt the model directly. No pattern needed.
|
|
20
|
+
2. **Need external data but fixed flow?** → [RAG](rag.md) or [Prompt Chaining](prompt-chaining.md)
|
|
21
|
+
3. **Need tools, unknown sequence?** → [ReAct](react.md)
|
|
22
|
+
4. **Multiple request types?** → [Routing + Tool Use](routing-tool-use.md)
|
|
23
|
+
5. **Complex task needing self-correction?** → [Plan, Execute, Reflect](plan-execute-reflect.md)
|
|
24
|
+
6. **N independent sub-tasks?** → [Parallel Calls](parallel-calls.md)
|
|
25
|
+
7. **Need cross-session context?** → [Memory](memory.md)
|
|
26
|
+
8. **Multiple specialists needed?** → [Multi-Agent Flat](multi-agent-flat.md) or [Hierarchical](multi-agent-hierarchical.md)
|
|
@@ -0,0 +1,82 @@
|
|
|
1
|
+
# Pattern: Memory
|
|
2
|
+
|
|
3
|
+
**One-liner:** Persist information across conversations so the agent can recall context, preferences, and facts from prior interactions.
|
|
4
|
+
|
|
5
|
+
## When to use
|
|
6
|
+
|
|
7
|
+
- The agent has repeat users who expect it to remember past interactions.
|
|
8
|
+
- Context from previous conversations is needed to give good answers (e.g., user preferences, prior decisions).
|
|
9
|
+
- You want the agent to learn and improve over time from interactions.
|
|
10
|
+
- The agent manages ongoing relationships or projects that span multiple sessions.
|
|
11
|
+
|
|
12
|
+
## When NOT to use
|
|
13
|
+
|
|
14
|
+
- Every interaction is independent (stateless Q&A, one-shot tasks).
|
|
15
|
+
- The conversation context window is large enough to hold all relevant history.
|
|
16
|
+
- Privacy/compliance requirements prohibit storing user interaction data.
|
|
17
|
+
|
|
18
|
+
## Core flow
|
|
19
|
+
|
|
20
|
+
```
|
|
21
|
+
User message
|
|
22
|
+
|
|
|
23
|
+
v
|
|
24
|
+
[Retrieve memories] ──> query memory store for relevant past context
|
|
25
|
+
|
|
|
26
|
+
v
|
|
27
|
+
[Augment prompt] ──> inject retrieved memories into system/user prompt
|
|
28
|
+
|
|
|
29
|
+
v
|
|
30
|
+
[LLM generates response] ──> answer informed by past context
|
|
31
|
+
|
|
|
32
|
+
v
|
|
33
|
+
[Extract & store memories] ──> save new facts/preferences from this interaction
|
|
34
|
+
|
|
|
35
|
+
v
|
|
36
|
+
Response
|
|
37
|
+
```
|
|
38
|
+
|
|
39
|
+
### Memory types
|
|
40
|
+
|
|
41
|
+
- **Conversation history:** Raw message log from prior sessions. Simple but grows unbounded.
|
|
42
|
+
- **Semantic memory:** Facts and knowledge extracted from conversations, stored as embeddings. Retrievable by similarity.
|
|
43
|
+
- **Episodic memory:** Summaries of past interactions ("Last week, user asked about X and we resolved it by Y").
|
|
44
|
+
- **User profile memory:** Structured preferences and attributes (name, role, preferred language, past decisions).
|
|
45
|
+
- **Working memory:** Short-term context within a single session. Typically managed by the conversation itself.
|
|
46
|
+
|
|
47
|
+
### Variants
|
|
48
|
+
|
|
49
|
+
- **Explicit memory:** User says "remember that I prefer dark mode." Agent stores it as-is.
|
|
50
|
+
- **Implicit memory:** Agent automatically extracts noteworthy facts from conversations.
|
|
51
|
+
- **Hybrid:** Both explicit and implicit, with the agent deciding what's worth remembering.
|
|
52
|
+
|
|
53
|
+
## Key components
|
|
54
|
+
|
|
55
|
+
- **Memory store:** Where memories live. Options: vector DB (semantic search), relational DB (structured), or specialized services like Mem0.
|
|
56
|
+
- **Memory retriever:** Queries the store for memories relevant to the current conversation. Typically vector similarity search.
|
|
57
|
+
- **Memory extractor:** After each conversation, identifies new facts/preferences worth persisting.
|
|
58
|
+
- **Memory manager:** Handles deduplication, updates, and expiration of memories. Prevents contradictions (old preference vs. new one).
|
|
59
|
+
- **Injection layer:** Formats retrieved memories and adds them to the prompt context.
|
|
60
|
+
|
|
61
|
+
## Common pitfalls
|
|
62
|
+
|
|
63
|
+
- **Unbounded memory:** Storing everything makes retrieval noisy. Be selective about what gets stored.
|
|
64
|
+
- **Stale memories:** User preferences change. Memories need timestamps and a mechanism to update or expire.
|
|
65
|
+
- **Contradictions:** "User likes dark mode" and "User prefers light mode" both stored. Newer should overwrite older.
|
|
66
|
+
- **Privacy leakage:** Memories from one user accessible to another. Always scope memories by user ID.
|
|
67
|
+
- **Retrieval noise:** Irrelevant memories injected into the prompt confuse the model. Use relevance thresholds.
|
|
68
|
+
- **Memory extraction hallucination:** The extractor "remembers" things the user didn't actually say. Validate extracted facts against the conversation.
|
|
69
|
+
|
|
70
|
+
## Framework fit
|
|
71
|
+
|
|
72
|
+
| Framework | Native support | Notes |
|
|
73
|
+
|-----------|----------------|-------|
|
|
74
|
+
| LangGraph | Checkpointer for conversation state, integrates with external memory stores | Best for stateful agents with complex memory needs |
|
|
75
|
+
| Pydantic AI | Manual memory integration via tool calls or prompt augmentation | Flexible but you build the plumbing |
|
|
76
|
+
| Mastra | Built-in memory primitives and storage integrations | TS-native memory support |
|
|
77
|
+
| CrewAI | Agent memory via `memory=True` flag | Simple but limited customization |
|
|
78
|
+
| Vercel AI SDK | Manual integration | No built-in memory |
|
|
79
|
+
|
|
80
|
+
## Reference implementations
|
|
81
|
+
|
|
82
|
+
- [recipes/memory-assistant.md](../recipes/memory-assistant.md) — Memory-enabled assistant with LangGraph + Mem0 (skeleton)
|
|
@@ -0,0 +1,72 @@
|
|
|
1
|
+
# Pattern: Multi-Agent Flat (Peer Collaboration)
|
|
2
|
+
|
|
3
|
+
**One-liner:** Multiple specialized agents collaborate as peers, each handling part of a task, without a central supervisor.
|
|
4
|
+
|
|
5
|
+
## When to use
|
|
6
|
+
|
|
7
|
+
- The task naturally splits into independent specialist domains (e.g., DevOps + Security + Database).
|
|
8
|
+
- Agents need to collaborate but no single agent should be "in charge."
|
|
9
|
+
- You want modular agent design where adding a new specialist doesn't change the others.
|
|
10
|
+
- Each agent has a distinct tool set and expertise.
|
|
11
|
+
|
|
12
|
+
## When NOT to use
|
|
13
|
+
|
|
14
|
+
- One agent can handle the task alone (simpler is better).
|
|
15
|
+
- Agents need strict coordination or sequencing (use Hierarchical or Plan-and-Execute).
|
|
16
|
+
- The number of agents is large (>5) — coordination overhead grows. Use hierarchical instead.
|
|
17
|
+
|
|
18
|
+
## Core flow
|
|
19
|
+
|
|
20
|
+
```
|
|
21
|
+
User task
|
|
22
|
+
|
|
|
23
|
+
v
|
|
24
|
+
[Task distribution] ──> split across agents
|
|
25
|
+
|
|
|
26
|
+
├──> [Agent A: DevOps] ──> findings
|
|
27
|
+
├──> [Agent B: Security] ──> findings
|
|
28
|
+
└──> [Agent C: Database] ──> findings
|
|
29
|
+
|
|
|
30
|
+
v
|
|
31
|
+
[Aggregation]
|
|
32
|
+
|
|
|
33
|
+
v
|
|
34
|
+
Combined report
|
|
35
|
+
```
|
|
36
|
+
|
|
37
|
+
### Variants
|
|
38
|
+
|
|
39
|
+
- **Independent execution:** Each agent works on its piece independently, results merged at the end. No inter-agent communication.
|
|
40
|
+
- **Round-robin discussion:** Agents take turns adding to a shared context. Each sees what prior agents said.
|
|
41
|
+
- **Debate/critique:** Agents review each other's outputs and provide feedback. Converges through iteration.
|
|
42
|
+
- **Handoff:** Agents pass work to each other when they encounter something outside their expertise.
|
|
43
|
+
|
|
44
|
+
## Key components
|
|
45
|
+
|
|
46
|
+
- **Crew/Team:** The container that holds the set of agents and defines how they interact (parallel, sequential, round-robin).
|
|
47
|
+
- **Agent:** An LLM with a specialized system prompt and tool set. Each agent has a distinct role.
|
|
48
|
+
- **Task:** A unit of work assigned to one or more agents. Has a description, expected output, and optionally a designated agent.
|
|
49
|
+
- **Communication protocol:** How agents share information — shared state, message passing, or output chaining.
|
|
50
|
+
- **Aggregator:** Merges individual agent outputs into a cohesive result.
|
|
51
|
+
|
|
52
|
+
## Common pitfalls
|
|
53
|
+
|
|
54
|
+
- **Role overlap:** Two agents covering similar ground produces redundant or contradictory output. Define clear boundaries.
|
|
55
|
+
- **No aggregation strategy:** Individual agent outputs dumped together aren't useful. Design the merge step.
|
|
56
|
+
- **Excessive communication:** Agents chatting with each other burns tokens and adds latency. Minimize inter-agent messages.
|
|
57
|
+
- **Lowest common denominator:** The final output quality is limited by the weakest agent. Each specialist needs to be good at its job.
|
|
58
|
+
- **Coordination failure:** Without a supervisor, no one resolves disagreements between agents. Have a tie-breaking mechanism.
|
|
59
|
+
|
|
60
|
+
## Framework fit
|
|
61
|
+
|
|
62
|
+
| Framework | Native support | Notes |
|
|
63
|
+
|-----------|----------------|-------|
|
|
64
|
+
| CrewAI | Purpose-built — `Crew`, `Agent`, `Task` abstractions | Best fit for flat multi-agent |
|
|
65
|
+
| LangGraph | Multiple sub-graphs composed in a parent graph | More manual but more control |
|
|
66
|
+
| Mastra | Multi-agent workflows with agent handoffs | TS-native option |
|
|
67
|
+
| Pydantic AI | Multiple agents orchestrated manually | Works but no built-in multi-agent |
|
|
68
|
+
| Vercel AI SDK | Manual orchestration | No multi-agent primitives |
|
|
69
|
+
|
|
70
|
+
## Reference implementations
|
|
71
|
+
|
|
72
|
+
- [recipes/ops-crew.md](../recipes/ops-crew.md) — DevOps/Security/Database ops crew with CrewAI (skeleton)
|