agent-starter-pack 0.6.4__py3-none-any.whl → 0.7.1__py3-none-any.whl
- {agent_starter_pack-0.6.4.dist-info → agent_starter_pack-0.7.1.dist-info}/METADATA +15 -12
- {agent_starter_pack-0.6.4.dist-info → agent_starter_pack-0.7.1.dist-info}/RECORD +34 -31
- agents/adk_gemini_fullstack/app/agent.py +62 -8
- llm.txt +296 -0
- src/base_template/.gitignore +1 -1
- src/base_template/GEMINI.md +5 -0
- src/base_template/Makefile +20 -5
- src/base_template/README.md +3 -0
- src/base_template/deployment/cd/deploy-to-prod.yaml +3 -3
- src/base_template/deployment/cd/staging.yaml +4 -4
- src/base_template/deployment/ci/pr_checks.yaml +1 -1
- src/base_template/pyproject.toml +2 -2
- src/cli/utils/cicd.py +4 -1
- src/cli/utils/template.py +22 -18
- src/data_ingestion/uv.lock +97 -0
- src/frontends/adk_gemini_fullstack/frontend/package-lock.json +276 -0
- src/frontends/adk_gemini_fullstack/frontend/package.json +1 -0
- src/frontends/adk_gemini_fullstack/frontend/src/components/ChatMessagesView.tsx +5 -4
- src/frontends/adk_gemini_fullstack/frontend/vite.config.ts +4 -0
- src/resources/docs/adk-cheatsheet.md +1224 -0
- src/resources/locks/uv-adk_base-agent_engine.lock +14 -13
- src/resources/locks/uv-adk_base-cloud_run.lock +14 -13
- src/resources/locks/uv-adk_gemini_fullstack-agent_engine.lock +14 -13
- src/resources/locks/uv-adk_gemini_fullstack-cloud_run.lock +14 -13
- src/resources/locks/uv-agentic_rag-agent_engine.lock +78 -83
- src/resources/locks/uv-agentic_rag-cloud_run.lock +110 -119
- src/resources/locks/uv-crewai_coding_crew-agent_engine.lock +130 -135
- src/resources/locks/uv-crewai_coding_crew-cloud_run.lock +163 -172
- src/resources/locks/uv-langgraph_base_react-agent_engine.lock +87 -92
- src/resources/locks/uv-langgraph_base_react-cloud_run.lock +119 -128
- src/resources/locks/uv-live_api-cloud_run.lock +98 -107
- {agent_starter_pack-0.6.4.dist-info → agent_starter_pack-0.7.1.dist-info}/WHEEL +0 -0
- {agent_starter_pack-0.6.4.dist-info → agent_starter_pack-0.7.1.dist-info}/entry_points.txt +0 -0
- {agent_starter_pack-0.6.4.dist-info → agent_starter_pack-0.7.1.dist-info}/licenses/LICENSE +0 -0
@@ -0,0 +1,1224 @@
# Google Agent Development Kit (ADK) Python Cheatsheet

This document serves as a long-form, comprehensive reference for building, orchestrating, and deploying AI agents using the Python Agent Development Kit (ADK). It aims to cover every significant aspect with greater detail, more code examples, and in-depth best practices.

## Table of Contents

1. [Core Concepts & Project Structure](#1-core-concepts--project-structure)
    * 1.1 ADK's Foundational Principles
    * 1.2 Essential Primitives
    * 1.3 Standard Project Layout
2. [Agent Definitions (`LlmAgent`)](#2-agent-definitions-llmagent)
    * 2.1 Basic `LlmAgent` Setup
    * 2.2 Advanced `LlmAgent` Configuration
    * 2.3 LLM Instruction Crafting
3. [Orchestration with Workflow Agents](#3-orchestration-with-workflow-agents)
    * 3.1 `SequentialAgent`: Linear Execution
    * 3.2 `ParallelAgent`: Concurrent Execution
    * 3.3 `LoopAgent`: Iterative Processes
4. [Multi-Agent Systems & Communication](#4-multi-agent-systems--communication)
    * 4.1 Agent Hierarchy
    * 4.2 Inter-Agent Communication Mechanisms
    * 4.3 Common Multi-Agent Patterns
5. [Building Custom Agents (`BaseAgent`)](#5-building-custom-agents-baseagent)
    * 5.1 When to Use Custom Agents
    * 5.2 Implementing `_run_async_impl`
6. [Models: Gemini, LiteLLM, and Vertex AI](#6-models-gemini-litellm-and-vertex-ai)
    * 6.1 Google Gemini Models (AI Studio & Vertex AI)
    * 6.2 Other Cloud & Proprietary Models via LiteLLM
    * 6.3 Open & Local Models via LiteLLM (Ollama, vLLM)
    * 6.4 Customizing LLM API Clients
7. [Tools: The Agent's Capabilities](#7-tools-the-agents-capabilities)
    * 7.1 Defining Function Tools: Principles & Best Practices
    * 7.2 The `ToolContext` Object: Accessing Runtime Information
    * 7.3 All Tool Types & Their Usage
8. [Context, State, and Memory Management](#8-context-state-and-memory-management)
    * 8.1 The `Session` Object & `SessionService`
    * 8.2 `State`: The Conversational Scratchpad
    * 8.3 `Memory`: Long-Term Knowledge & Retrieval
    * 8.4 `Artifacts`: Binary Data Management
9. [Runtime, Events, and Execution Flow](#9-runtime-events-and-execution-flow)
    * 9.1 The `Runner`: The Orchestrator
    * 9.2 The Event Loop: Core Execution Flow
    * 9.3 `Event` Object: The Communication Backbone
    * 9.4 Asynchronous Programming (Python Specific)
10. [Control Flow with Callbacks](#10-control-flow-with-callbacks)
    * 10.1 Callback Mechanism: Interception & Control
    * 10.2 Types of Callbacks
    * 10.3 Callback Best Practices
11. [Authentication for Tools](#11-authentication-for-tools)
    * 11.1 Core Concepts: `AuthScheme` & `AuthCredential`
    * 11.2 Interactive OAuth/OIDC Flows
    * 11.3 Custom Tool Authentication
12. [Deployment Strategies](#12-deployment-strategies)
    * 12.1 Local Development & Testing (`adk web`, `adk run`, `adk api_server`)
    * 12.2 Vertex AI Agent Engine
    * 12.3 Cloud Run
    * 12.4 Google Kubernetes Engine (GKE)
    * 12.5 CI/CD Integration
13. [Evaluation and Safety](#13-evaluation-and-safety)
    * 13.1 Agent Evaluation (`adk eval`)
    * 13.2 Safety & Guardrails
14. [Debugging, Logging & Observability](#14-debugging-logging--observability)
15. [Advanced I/O Modalities](#15-advanced-io-modalities)
16. [Performance Optimization](#16-performance-optimization)
17. [General Best Practices & Common Pitfalls](#17-general-best-practices--common-pitfalls)

---

## 1. Core Concepts & Project Structure

### 1.1 ADK's Foundational Principles

* **Modularity**: Break down complex problems into smaller, manageable agents and tools.
* **Composability**: Combine simple agents and tools to build sophisticated systems.
* **Observability**: Detailed event logging and tracing capabilities to understand agent behavior.
* **Extensibility**: Easily integrate with external services, models, and frameworks.
* **Deployment-Agnostic**: Design agents once, deploy anywhere.

### 1.2 Essential Primitives

* **`Agent`**: The core intelligent unit. Can be `LlmAgent` (LLM-driven) or `BaseAgent` (custom/workflow).
* **`Tool`**: Callable function/class providing external capabilities (`FunctionTool`, `OpenAPIToolset`, etc.).
* **`Session`**: A unique, stateful conversation thread with history (`events`) and short-term memory (`state`).
* **`State`**: Key-value dictionary within a `Session` for transient conversation data.
* **`Memory`**: Long-term, searchable knowledge base beyond a single session (`MemoryService`).
* **`Artifact`**: Named, versioned binary data (files, images) associated with a session or user.
* **`Runner`**: The execution engine; orchestrates agent activity and event flow.
* **`Event`**: Atomic unit of communication and history; carries content and side-effect `actions`.
* **`InvocationContext`**: The comprehensive root context object holding all runtime information for a single `run_async` call.

### 1.3 Standard Project Layout

A well-structured ADK project is crucial for maintainability and leveraging `adk` CLI tools.

```
your_project_root/
├── my_first_agent/             # Each folder is a distinct agent app
│   ├── __init__.py             # Makes `my_first_agent` a Python package (`from . import agent`)
│   ├── agent.py                # Contains `root_agent` definition and `LlmAgent`/WorkflowAgent instances
│   ├── tools.py                # Custom tool function definitions
│   ├── data/                   # Optional: static data, templates
│   └── .env                    # Environment variables (API keys, project IDs)
├── my_second_agent/
│   ├── __init__.py
│   └── agent.py
├── requirements.txt            # Project's Python dependencies (e.g., google-adk, litellm)
├── tests/                      # Unit and integration tests
│   ├── unit/
│   │   └── test_tools.py
│   └── integration/
│       ├── test_my_first_agent.py
│       └── my_first_agent.evalset.json  # Evaluation dataset for `adk eval`
└── main.py                     # Optional: Entry point for custom FastAPI server deployment
```
* `adk web` and `adk run` automatically discover agents in subdirectories with `__init__.py` and `agent.py`.
* `.env` files are automatically loaded by `adk` tools when run from the root or agent directory.
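
The discovery convention described above can be illustrated with a small standard-library sketch. It is not the `adk` CLI's actual implementation; `discover_agent_dirs` is a hypothetical helper that only applies the stated rule (a subdirectory counts as an agent app when it contains both `__init__.py` and `agent.py`).

```python
import tempfile
from pathlib import Path


def discover_agent_dirs(project_root: Path) -> list[str]:
    """Return names of subdirectories holding both __init__.py and agent.py."""
    return sorted(
        child.name
        for child in project_root.iterdir()
        if child.is_dir()
        and (child / "__init__.py").is_file()
        and (child / "agent.py").is_file()
    )


# Build a throwaway project that mirrors the layout shown above.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("my_first_agent", "my_second_agent"):
        (root / name).mkdir()
        (root / name / "__init__.py").write_text("from . import agent\n")
        (root / name / "agent.py").write_text("root_agent = None\n")
    (root / "tests").mkdir()  # No agent.py here, so it is not an agent app.
    found = discover_agent_dirs(root)

print(found)
```

A check like this can be handy in a test suite to catch an agent folder that is missing its `__init__.py` before running the CLI.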

---

## 2. Agent Definitions (`LlmAgent`)

The `LlmAgent` is the cornerstone of intelligent behavior, leveraging an LLM for reasoning and decision-making.

### 2.1 Basic `LlmAgent` Setup

```python
from google.adk.agents import Agent

def get_current_time(city: str) -> dict:
    """Returns the current time in a specified city."""
    # Mock implementation
    if city.lower() == "new york":
        return {"status": "success", "time": "10:30 AM EST"}
    return {"status": "error", "message": f"Time for {city} not available."}

my_first_llm_agent = Agent(
    name="time_teller_agent",
    model="gemini-2.5-flash",  # Essential: The LLM powering the agent
    instruction="You are a helpful assistant that tells the current time in cities. Use the 'get_current_time' tool for this purpose.",
    description="Tells the current time in a specified city.",  # Crucial for multi-agent delegation
    tools=[get_current_time]  # List of callable functions/tool instances
)
```

### 2.2 Advanced `LlmAgent` Configuration

* **`generate_content_config`**: Controls LLM generation parameters (temperature, token limits, safety).
    ```python
    from google.genai import types as genai_types
    from google.adk.agents import Agent

    gen_config = genai_types.GenerateContentConfig(
        temperature=0.2,            # Controls randomness; lower values are more deterministic.
        top_p=0.9,                  # Nucleus sampling: sample from top_p probability mass.
        top_k=40,                   # Top-k sampling: sample from top_k most likely tokens.
        max_output_tokens=1024,     # Max tokens in LLM's response.
        stop_sequences=["## END"]   # LLM will stop generating if these sequences appear.
    )
    agent = Agent(
        # ... basic config ...
        generate_content_config=gen_config
    )
    ```

* **`output_key`**: Automatically saves the agent's final response to `session.state` under this key, as text or, if `output_schema` is used, as structured data. This facilitates data flow between agents.
    ```python
    agent = Agent(
        # ... basic config ...
        output_key="llm_final_response_text"
    )
    # After agent runs, session.state['llm_final_response_text'] will contain its output.
    ```

* **`input_schema` & `output_schema`**: Define strict JSON input/output formats using Pydantic models.
    > **Warning**: Using `output_schema` forces the LLM to generate JSON and **disables** its ability to use tools or delegate to other agents.

#### **Example: Defining and Using Structured Output**

This is the most reliable way to make an LLM produce predictable, parseable JSON, which is essential for multi-agent workflows.

1. **Define the Schema with Pydantic:**
    ```python
    from pydantic import BaseModel, Field
    from typing import Literal

    class SearchQuery(BaseModel):
        """Model representing a specific search query for web search."""
        search_query: str = Field(
            description="A highly specific and targeted query for web search."
        )

    class Feedback(BaseModel):
        """Model for providing evaluation feedback on research quality."""
        grade: Literal["pass", "fail"] = Field(
            description="Evaluation result. 'pass' if the research is sufficient, 'fail' if it needs revision."
        )
        comment: str = Field(
            description="Detailed explanation of the evaluation, highlighting strengths and/or weaknesses of the research."
        )
        follow_up_queries: list[SearchQuery] | None = Field(
            default=None,
            description="A list of specific, targeted follow-up search queries needed to fix research gaps. This should be null or empty if the grade is 'pass'."
        )
    ```
    * **`BaseModel` & `Field`**: Define data types, defaults, and crucial `description` fields. These descriptions are sent to the LLM to guide its output.
    * **`Literal`**: Enforces strict enum-like values (`"pass"` or `"fail"`), preventing the LLM from hallucinating unexpected values.

2. **Assign the Schema to an `LlmAgent`:**
    ```python
    from google.adk.agents import LlmAgent

    research_evaluator = LlmAgent(
        name="research_evaluator",
        model="gemini-2.5-pro",
        instruction="""You are a meticulous quality assurance analyst. Evaluate the research findings in 'section_research_findings' and be very critical.
        If you find significant gaps, assign a grade of 'fail', write a detailed comment, and generate 5-7 specific follow-up queries.
        If the research is thorough, grade it 'pass'.
        Your response must be a single, raw JSON object validating against the 'Feedback' schema.
        """,
        output_schema=Feedback,  # This forces the LLM to output JSON matching the Feedback model.
        output_key="research_evaluation",  # The resulting JSON object will be saved to state.
        disallow_transfer_to_peers=True,  # Prevents this agent from delegating. Its job is only to evaluate.
    )
    ```

* **`include_contents`**: Controls whether the conversation history is sent to the LLM.
    * `'default'` (default): Sends relevant history.
    * `'none'`: Sends no history; the agent operates purely on the current turn's input and `instruction`. Useful for stateless API wrapper agents.
    ```python
    agent = Agent(..., include_contents='none')
    ```

* **`planner`**: Assign a `BasePlanner` instance (e.g., `PlanReActPlanner`) to enable multi-step reasoning and planning. (Advanced, covered in Multi-Agents).

* **`code_executor`**: Assign a `BaseCodeExecutor` instance (e.g., `BuiltInCodeExecutor()`) to allow the agent to execute code blocks.
    ```python
    from google.adk.agents import Agent
    from google.adk.code_executors import BuiltInCodeExecutor

    agent = Agent(
        name="code_agent",
        model="gemini-2.5-flash",
        instruction="Write and execute Python code to solve math problems.",
        code_executor=BuiltInCodeExecutor()  # Pass an instance; allows agent to run Python code
    )
    ```

* **Callbacks**: Hooks for observing and modifying agent behavior at key lifecycle points (`before_model_callback`, `after_tool_callback`, etc.). (Covered in Callbacks).

### 2.3 LLM Instruction Crafting (`instruction`)

The `instruction` is critical. It guides the LLM's behavior, persona, and tool usage. The following examples demonstrate powerful techniques for creating specialized, reliable agents.

**Best Practices & Examples:**

* **Be Specific & Concise**: Avoid ambiguity.
* **Define Persona & Role**: Give the LLM a clear role.
* **Constrain Behavior & Tool Use**: Explicitly state what the LLM should *and should not* do.
* **Define Output Format**: Tell the LLM *exactly* what its output should look like, especially when not using `output_schema`.
* **Dynamic Injection**: Use `{state_key}` to inject runtime data from `session.state` into the prompt.
* **Iteration**: Test, observe, and refine instructions.

**Example 1: Constraining Tool Use and Output Format**
```python
import datetime
from google.adk.agents import LlmAgent
from google.adk.tools import google_search


plan_generator = LlmAgent(
    model="gemini-2.5-flash",
    name="plan_generator",
    description="Generates a 4-5 line action-oriented research plan.",
    instruction=f"""
    You are a research strategist. Your job is to create a high-level RESEARCH PLAN, not a summary.
    **RULE: Your output MUST be a bulleted list of 4-5 action-oriented research goals or key questions.**
    - A good goal starts with a verb like "Analyze," "Identify," "Investigate."
    - A bad output is a statement of fact like "The event was in April 2024."
    **TOOL USE IS STRICTLY LIMITED:**
    Your goal is to create a generic, high-quality plan *without searching*.
    Only use `google_search` if a topic is ambiguous and you absolutely cannot create a plan without it.
    You are explicitly forbidden from researching the *content* or *themes* of the topic.
    Current date: {datetime.datetime.now().strftime("%Y-%m-%d")}
    """,
    tools=[google_search],
)
```

**Example 2: Injecting Data from State and Specifying Custom Tags**
This agent's `instruction` relies on data placed in `session.state` by previous agents.
```python
from google.adk.agents import LlmAgent

report_composer = LlmAgent(
    model="gemini-2.5-pro",
    name="report_composer_with_citations",
    include_contents="none",  # History not needed; all data is injected.
    description="Transforms research data and a markdown outline into a final, cited report.",
    instruction="""
    Transform the provided data into a polished, professional, and meticulously cited research report.

    ---
    ### INPUT DATA
    * Research Plan: `{research_plan}`
    * Research Findings: `{section_research_findings}`
    * Citation Sources: `{sources}`
    * Report Structure: `{report_sections}`

    ---
    ### CRITICAL: Citation System
    To cite a source, you MUST insert a special citation tag directly after the claim it supports.

    **The only correct format is:** `<cite source="src-ID_NUMBER" />`

    ---
    ### Final Instructions
    Generate a comprehensive report using ONLY the `<cite source="src-ID_NUMBER" />` tag system for all citations.
    The final report must strictly follow the structure provided in the **Report Structure** markdown outline.
    Do not include a "References" or "Sources" section; all citations must be in-line.
    """,
    output_key="final_cited_report",
)
```

---

## 3. Orchestration with Workflow Agents

Workflow agents (`SequentialAgent`, `ParallelAgent`, `LoopAgent`) provide deterministic control flow, combining LLM capabilities with structured execution. They do **not** use an LLM for their own orchestration logic.

### 3.1 `SequentialAgent`: Linear Execution

Executes `sub_agents` one after another in the order defined. The `InvocationContext` is passed along, allowing state changes to be visible to subsequent agents.

```python
from google.adk.agents import SequentialAgent, Agent

# Agent 1: Summarizes a document and saves to state
summarizer = Agent(
    name="DocumentSummarizer",
    model="gemini-2.5-flash",
    instruction="Summarize the provided document in 3 sentences.",
    output_key="document_summary"  # Output saved to session.state['document_summary']
)

# Agent 2: Generates questions based on the summary from state
question_generator = Agent(
    name="QuestionGenerator",
    model="gemini-2.5-flash",
    instruction="Generate 3 comprehension questions based on this summary: {document_summary}",
    # 'document_summary' is dynamically injected from session.state
)

document_pipeline = SequentialAgent(
    name="SummaryQuestionPipeline",
    sub_agents=[summarizer, question_generator],  # Order matters!
    description="Summarizes a document then generates questions."
)
```
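
The hand-off between the two agents above comes down to a write into shared state followed by a placeholder substitution. A minimal standard-library sketch of that idea follows; `inject_state` is a hypothetical helper, not ADK's own resolution logic, and raising on a missing key is an assumption made for this illustration.

```python
import re


def inject_state(template: str, state: dict) -> str:
    """Replace each {key} placeholder with the matching value from state."""

    def replace(match: re.Match) -> str:
        key = match.group(1)
        if key not in state:
            raise KeyError(f"State key '{key}' is not set")
        return str(state[key])

    return re.sub(r"\{(\w+)\}", replace, template)


# Step 1: the first agent's output_key stores its answer in shared state.
state = {}
state["document_summary"] = "ADK composes agents into pipelines."

# Step 2: the second agent's instruction is resolved against that state.
instruction = "Generate 3 comprehension questions based on this summary: {document_summary}"
resolved = inject_state(instruction, state)
print(resolved)
```

The sketch also shows why order matters in a pipeline: resolving the instruction before the summarizer has written `document_summary` would fail.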

### 3.2 `ParallelAgent`: Concurrent Execution

Executes `sub_agents` simultaneously. Useful for independent tasks to reduce overall latency. All sub-agents share the same `session.state`.

```python
from google.adk.agents import ParallelAgent, SequentialAgent, Agent

# Agents to fetch data concurrently
fetch_stock_price = Agent(
    name="StockPriceFetcher",
    # ... model, instruction, tools ...
    output_key="stock_data",
)
fetch_news_headlines = Agent(
    name="NewsFetcher",
    # ... model, instruction, tools ...
    output_key="news_data",
)
fetch_social_sentiment = Agent(
    name="SentimentAnalyzer",
    # ... model, instruction, tools ...
    output_key="sentiment_data",
)

# Agent to merge results (runs after ParallelAgent, usually in a SequentialAgent)
merger_agent = Agent(
    name="ReportGenerator",
    model="gemini-2.5-flash",
    instruction="Combine stock data: {stock_data}, news: {news_data}, and sentiment: {sentiment_data} into a market report."
)

# Pipeline to run parallel fetching then sequential merging
market_analysis_pipeline = SequentialAgent(
    name="MarketAnalyzer",
    sub_agents=[
        ParallelAgent(
            name="ConcurrentFetch",
            sub_agents=[fetch_stock_price, fetch_news_headlines, fetch_social_sentiment]
        ),
        merger_agent  # Runs after all parallel agents complete
    ]
)
```
* **Concurrency Caution**: When parallel agents write to the same `state` key, race conditions can occur. Always use distinct `output_key`s or manage concurrent writes explicitly.
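
The concurrency caution can be made concrete with plain `asyncio`. This is a sketch under stated assumptions, not ADK code: `fake_sub_agent` is a hypothetical stand-in for a sub-agent, a plain dict stands in for `session.state`, and the values and delays are invented for illustration.

```python
import asyncio


async def fake_sub_agent(state: dict, output_key: str, value: str, delay: float) -> None:
    """Hypothetical stand-in for a sub-agent: wait, then write one state key."""
    await asyncio.sleep(delay)
    state[output_key] = value


async def run_with_distinct_keys() -> dict:
    state: dict = {}
    # Each task owns its key, so completion order does not affect the result.
    await asyncio.gather(
        fake_sub_agent(state, "stock_data", "price: 101.5", 0.03),
        fake_sub_agent(state, "news_data", "3 headlines", 0.01),
        fake_sub_agent(state, "sentiment_data", "mostly positive", 0.02),
    )
    return state


async def run_with_shared_key() -> dict:
    state: dict = {}
    # Both tasks target "result"; whichever finishes last overwrites the other.
    await asyncio.gather(
        fake_sub_agent(state, "result", "from slow agent", 0.05),
        fake_sub_agent(state, "result", "from fast agent", 0.01),
    )
    return state


distinct = asyncio.run(run_with_distinct_keys())
shared = asyncio.run(run_with_shared_key())
print(distinct)
print(shared)
```

With a shared key, the surviving value depends only on timing, which is why distinct `output_key`s are the safer default.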

### 3.3 `LoopAgent`: Iterative Processes

Repeatedly executes its `sub_agents` (sequentially within each loop iteration) until a condition is met or `max_iterations` is reached.

#### **Termination of `LoopAgent`**
A `LoopAgent` terminates when:
1. `max_iterations` is reached.
2. Any `Event` yielded by a sub-agent (or a tool within it) sets `actions.escalate = True`. This provides dynamic, content-driven loop termination.
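
The two termination rules can be sketched in plain Python. This illustrates the control flow only and is not ADK's implementation: `SketchEvent`, `run_loop`, `refine`, and `check` are hypothetical names, and sub-agents are modeled as ordinary functions that return one event each.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SketchEvent:
    author: str
    escalate: bool = False


def run_loop(
    sub_agents: list[Callable[[dict], SketchEvent]],
    state: dict,
    max_iterations: int,
) -> int:
    """Run sub-agents in order each iteration; stop on escalate or at the cap."""
    for iteration in range(1, max_iterations + 1):
        for agent in sub_agents:
            event = agent(state)
            if event.escalate:
                # Remaining sub-agents in this iteration are skipped.
                return iteration
    return max_iterations


def refine(state: dict) -> SketchEvent:
    state["quality"] = state.get("quality", 0) + 1
    return SketchEvent(author="refine")


def check(state: dict) -> SketchEvent:
    return SketchEvent(author="check", escalate=state["quality"] >= 3)


state: dict = {}
iterations_used = run_loop([refine, check], state, max_iterations=5)
capped = run_loop([refine], {}, max_iterations=2)  # Never escalates; stops at the cap.
print(iterations_used, state, capped)
```

Here the loop ends in the third iteration because `check` escalates once `quality` reaches 3, while the second call runs to `max_iterations`.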

#### **Example: Iterative Refinement Loop with a Custom `BaseAgent` for Control**
This example shows a loop that continues until a condition, determined by an evaluation agent, is met.

```python
from google.adk.agents import LoopAgent, Agent, BaseAgent
from google.adk.events import Event, EventActions
from google.adk.agents.invocation_context import InvocationContext
from typing import AsyncGenerator

# An LLM Agent that evaluates research and produces structured JSON output
research_evaluator = Agent(
    name="research_evaluator",
    # ... configuration from Section 2.2 ...
    output_schema=Feedback,
    output_key="research_evaluation",
)

# An LLM Agent that performs additional searches based on feedback
enhanced_search_executor = Agent(
    name="enhanced_search_executor",
    instruction="Execute the follow-up queries from 'research_evaluation' and combine with existing findings.",
    # ... other configurations ...
)

# A custom BaseAgent to check the evaluation and stop the loop
class EscalationChecker(BaseAgent):
    """Checks research evaluation and escalates to stop the loop if grade is 'pass'."""
    async def _run_async_impl(self, ctx: InvocationContext) -> AsyncGenerator[Event, None]:
        evaluation = ctx.session.state.get("research_evaluation")
        if evaluation and evaluation.get("grade") == "pass":
            # The key to stopping the loop: yield an Event with escalate=True
            yield Event(author=self.name, actions=EventActions(escalate=True))
        else:
            # Let the loop continue
            yield Event(author=self.name)

# Define the loop
iterative_refinement_loop = LoopAgent(
    name="IterativeRefinementLoop",
    sub_agents=[
        research_evaluator,  # Step 1: Evaluate
        EscalationChecker(name="EscalationChecker"),  # Step 2: Check and maybe stop
        enhanced_search_executor,  # Step 3: Refine (only runs if loop didn't stop)
    ],
    max_iterations=5,  # Fallback to prevent infinite loops
    description="Iteratively evaluates and refines research until it passes quality checks."
)
```

---

## 4. Multi-Agent Systems & Communication

Complex applications are built by composing multiple specialized agents.

### 4.1 Agent Hierarchy

Agents form a hierarchical (tree-like) structure of parent-child relationships, defined by the `sub_agents` parameter during `BaseAgent` initialization. An agent can have only one parent.

```python
# Conceptual Hierarchy
# Root
# ├── Coordinator (LlmAgent)
# │   ├── SalesAgent (LlmAgent)
# │   └── SupportAgent (LlmAgent)
# └── DataPipeline (SequentialAgent)
#     ├── DataFetcher (LlmAgent)
#     └── DataProcessor (LlmAgent)
```
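
The single-parent rule can be illustrated with a small tree sketch. `SketchAgent` and `find_agent` are hypothetical names used only here, not ADK classes or methods, and rejecting reuse with a `ValueError` is an assumption of this sketch rather than a statement about the library's exact behavior.

```python
from typing import Optional


class SketchAgent:
    """Hypothetical agent node used only for this illustration."""

    def __init__(self, name: str, sub_agents: Optional[list["SketchAgent"]] = None):
        self.name = name
        self.parent: Optional["SketchAgent"] = None
        self.sub_agents = list(sub_agents or [])
        for child in self.sub_agents:
            if child.parent is not None:
                raise ValueError(f"{child.name} already has parent {child.parent.name}")
            child.parent = self

    def find_agent(self, name: str) -> Optional["SketchAgent"]:
        """Depth-first search of this agent and its descendants."""
        if self.name == name:
            return self
        for child in self.sub_agents:
            match = child.find_agent(name)
            if match is not None:
                return match
        return None


sales = SketchAgent("SalesAgent")
support = SketchAgent("SupportAgent")
coordinator = SketchAgent("Coordinator", [sales, support])
pipeline = SketchAgent(
    "DataPipeline", [SketchAgent("DataFetcher"), SketchAgent("DataProcessor")]
)
root = SketchAgent("Root", [coordinator, pipeline])

print(root.find_agent("DataFetcher").parent.name)

try:
    SketchAgent("SecondCoordinator", [sales])  # sales already belongs to Coordinator
    reuse_error = None
except ValueError as exc:
    reuse_error = str(exc)
print(reuse_error)
```

Because every node has exactly one parent, a lookup from the root reaches each agent along a single path, which keeps delegation targets unambiguous.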

### 4.2 Inter-Agent Communication Mechanisms

1. **Shared Session State (`session.state`)**: The most common and robust method. Agents read from and write to the same mutable dictionary.
    * **Mechanism**: Agent A sets `ctx.session.state['key'] = value`. Agent B later reads `ctx.session.state.get('key')`. `output_key` on `LlmAgent` is a convenient auto-setter.
    * **Best for**: Passing intermediate results, shared configurations, and flags in pipelines (Sequential, Loop agents).

2. **LLM-Driven Delegation (`transfer_to_agent`)**: An `LlmAgent` can dynamically hand over control to another agent based on its reasoning.
    * **Mechanism**: The LLM generates a special `transfer_to_agent` function call. The ADK framework intercepts this call and routes the next turn to the target agent.
    * **Prerequisites**:
        * The initiating `LlmAgent` needs an `instruction` that guides delegation and the `description` of the target agent(s).
        * Target agents need clear `description`s to help the LLM decide.
        * Target agent must be discoverable within the current agent's hierarchy (direct `sub_agent` or a descendant).
    * **Configuration**: Can be enabled/disabled via `disallow_transfer_to_parent` and `disallow_transfer_to_peers` on `LlmAgent`.

3. **Explicit Invocation (`AgentTool`)**: An `LlmAgent` can treat another `BaseAgent` instance as a callable tool.
    * **Mechanism**: Wrap the target agent (`target_agent`) in `AgentTool(agent=target_agent)` and add it to the calling `LlmAgent`'s `tools` list. The `AgentTool` generates a `FunctionDeclaration` for the LLM. When called, `AgentTool` runs the target agent and returns its final response as the tool result.
    * **Best for**: Hierarchical task decomposition, where a higher-level agent needs a specific output from a lower-level agent.

### 4.3 Common Multi-Agent Patterns

* **Coordinator/Dispatcher**: A central agent routes requests to specialized sub-agents (often via LLM-driven delegation).
* **Sequential Pipeline**: `SequentialAgent` orchestrates a fixed sequence of tasks, passing data via shared state.
* **Parallel Fan-Out/Gather**: `ParallelAgent` runs concurrent tasks, followed by a final agent that synthesizes results from state.
* **Review/Critique (Generator-Critic)**: `SequentialAgent` with a generator followed by a critic, often in a `LoopAgent` for iterative refinement.
* **Hierarchical Task Decomposition (Planner/Executor)**: High-level agents break down complex problems, delegating sub-tasks to lower-level agents (often via `AgentTool` and delegation).

#### **Example: Hierarchical Planner/Executor Pattern**
This pattern combines several mechanisms. A top-level `interactive_planner_agent` uses another agent (`plan_generator`) as a tool to create a plan, then delegates the execution of that plan to a complex `SequentialAgent` (`research_pipeline`).

```python
from google.adk.agents import LlmAgent, SequentialAgent, LoopAgent
from google.adk.tools.agent_tool import AgentTool

# Assume plan_generator, section_planner, research_evaluator, etc. are defined.

# The execution pipeline itself is a complex agent.
research_pipeline = SequentialAgent(
    name="research_pipeline",
    description="Executes a pre-approved research plan. It performs iterative research, evaluation, and composes a final, cited report.",
    sub_agents=[
        section_planner,
        section_researcher,
        LoopAgent(
            name="iterative_refinement_loop",
            max_iterations=3,
            sub_agents=[
                research_evaluator,
                EscalationChecker(name="escalation_checker"),
                enhanced_search_executor,
            ],
        ),
        report_composer,
    ],
)

# The top-level agent that interacts with the user.
interactive_planner_agent = LlmAgent(
    name="interactive_planner_agent",
    model="gemini-2.5-flash",
    description="The primary research assistant. It collaborates with the user to create a research plan, and then executes it upon approval.",
    instruction="""
    You are a research planning assistant. Your workflow is:
    1. **Plan:** Use the `plan_generator` tool to create a draft research plan.
    2. **Refine:** Incorporate user feedback until the plan is approved.
    3. **Execute:** Once the user gives EXPLICIT approval (e.g., "looks good, run it"), you MUST delegate the task to the `research_pipeline` agent.
    Your job is to Plan, Refine, and Delegate. Do not do the research yourself.
    """,
    # The planner delegates to the pipeline.
    sub_agents=[research_pipeline],
    # The planner uses another agent as a tool.
    tools=[AgentTool(plan_generator)],
    output_key="research_plan",
)

# The root agent of the application is the top-level planner.
root_agent = interactive_planner_agent
```
|
543
|
+
|
544
|
+
---
|
545
|
+
|
546
|
+
## 5. Building Custom Agents (`BaseAgent`)
|
547
|
+
|
548
|
+
For unique orchestration logic that doesn't fit standard workflow agents, inherit directly from `BaseAgent`.
|
549
|
+
|
550
|
+
### 5.1 When to Use Custom Agents
|
551
|
+
|
552
|
+
* **Complex Conditional Logic**: `if/else` branching based on multiple state variables.
|
553
|
+
* **Dynamic Agent Selection**: Choosing which sub-agent to run based on runtime evaluation.
|
554
|
+
* **Direct External Integrations**: Calling external APIs or libraries directly within the orchestration flow.
|
555
|
+
* **Custom Loop/Retry Logic**: More sophisticated iteration patterns than `LoopAgent`, such as the `EscalationChecker` example.
|
556
|
+
|
557
|
+
### 5.2 Implementing `_run_async_impl`
|
558
|
+
|
559
|
+
This is the core asynchronous method you must override.
|
560
|
+
|
561
|
+
#### **Example: A Custom Agent for Loop Control**
|
562
|
+
This agent reads state, applies simple Python logic, and yields an `Event` with an `escalate` action to control a `LoopAgent`.
|
563
|
+
|
564
|
+
```python
from google.adk.agents import BaseAgent
from google.adk.agents.invocation_context import InvocationContext
from google.adk.events import Event, EventActions
from typing import AsyncGenerator
import logging

class EscalationChecker(BaseAgent):
    """Checks research evaluation and escalates to stop the loop if grade is 'pass'."""

    def __init__(self, name: str):
        super().__init__(name=name)

    async def _run_async_impl(
        self, ctx: InvocationContext
    ) -> AsyncGenerator[Event, None]:
        # 1. Read from session state.
        evaluation_result = ctx.session.state.get("research_evaluation")

        # 2. Apply custom Python logic.
        if evaluation_result and evaluation_result.get("grade") == "pass":
            logging.info(
                f"[{self.name}] Research passed. Escalating to stop loop."
            )
            # 3. Yield an Event with a control Action.
            yield Event(author=self.name, actions=EventActions(escalate=True))
        else:
            logging.info(
                f"[{self.name}] Research failed or not found. Loop continues."
            )
            # Yielding an event without actions lets the flow continue.
            yield Event(author=self.name)
```
|
597
|
+
* **Asynchronous Generator**: `async def ... yield Event`. This allows pausing and resuming execution.
|
598
|
+
* **`ctx: InvocationContext`**: Provides access to all session state (`ctx.session.state`).
|
599
|
+
* **Calling Sub-Agents**: Use `async for event in self.sub_agent_instance.run_async(ctx): yield event`.
|
600
|
+
* **Control Flow**: Use standard Python `if/else`, `for/while` loops for complex logic.
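The interaction between a looping parent and an escalating child can be tried without ADK installed. The sketch below is a plain `asyncio` simulation under stated assumptions: `ToyEvent`, `evaluator`, `checker`, and `toy_loop` are hypothetical stand-ins, not ADK classes, and the "passes on the second attempt" rule is invented for the demo.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncGenerator, Callable, List


@dataclass
class ToyEvent:
    """Stand-in for an event; only carries an author and an escalate flag."""
    author: str
    escalate: bool = False


async def evaluator(state: dict) -> AsyncGenerator[ToyEvent, None]:
    # Pretend the research passes on the second attempt.
    state["attempts"] = state.get("attempts", 0) + 1
    state["grade"] = "pass" if state["attempts"] >= 2 else "fail"
    yield ToyEvent(author="evaluator")


async def checker(state: dict) -> AsyncGenerator[ToyEvent, None]:
    # Mirrors the escalation check: escalate only when the grade is "pass".
    yield ToyEvent(author="checker", escalate=state.get("grade") == "pass")


async def toy_loop(steps: List[Callable], state: dict, max_iterations: int) -> List[ToyEvent]:
    """Run the steps in order, repeating until an event escalates or the limit is hit."""
    seen: List[ToyEvent] = []
    for _ in range(max_iterations):
        for step in steps:
            async for event in step(state):
                seen.append(event)
                if event.escalate:
                    return seen
    return seen


state: dict = {}
events = asyncio.run(toy_loop([evaluator, checker], state, max_iterations=3))
print(state["attempts"], [e.author for e in events])
```

The loop stops as soon as the checker escalates, so the third iteration never runs.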
|
601
|
+
|
602
|
+
---
|
603
|
+
|
604
|
+
## 6. Models: Gemini, LiteLLM, and Vertex AI
|
605
|
+
|
606
|
+
ADK's model flexibility allows integrating various LLMs for different needs.
|
607
|
+
|
608
|
+
### 6.1 Google Gemini Models (AI Studio & Vertex AI)
|
609
|
+
|
610
|
+
* **Default Integration**: Native support via `google-genai` library.
|
611
|
+
* **AI Studio (Easy Start)**:
|
612
|
+
* Set `GOOGLE_API_KEY="YOUR_API_KEY"` (environment variable).
|
613
|
+
* Set `GOOGLE_GENAI_USE_VERTEXAI="False"`.
|
614
|
+
* Model strings: `"gemini-2.5-flash"`, `"gemini-2.5-pro"`, etc.
|
615
|
+
* **Vertex AI (Production)**:
|
616
|
+
* Authenticate via `gcloud auth application-default login` (recommended).
|
617
|
+
* Set `GOOGLE_CLOUD_PROJECT="YOUR_PROJECT_ID"`, `GOOGLE_CLOUD_LOCATION="your-region"` (environment variables).
|
618
|
+
* Set `GOOGLE_GENAI_USE_VERTEXAI="True"`.
|
619
|
+
* Model strings: `"gemini-2.5-flash"`, `"gemini-2.5-pro"`, or full Vertex AI endpoint resource names for specific deployments.
|
620
|
+
|
621
|
+
### 6.2 Other Cloud & Proprietary Models via LiteLLM
|
622
|
+
|
623
|
+
`LiteLlm` provides a unified interface to 100+ LLMs (OpenAI, Anthropic, Cohere, etc.).
|
624
|
+
|
625
|
+
* **Installation**: `pip install litellm`
|
626
|
+
* **API Keys**: Set environment variables as required by LiteLLM (e.g., `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`).
|
627
|
+
* **Usage**:
|
628
|
+
```python
from google.adk.agents import Agent
from google.adk.models.lite_llm import LiteLlm

agent_openai = Agent(
    model=LiteLlm(model="openai/gpt-4o"),
    # ... other Agent arguments (name is required; instruction, tools, etc.)
)
agent_claude = Agent(
    model=LiteLlm(model="anthropic/claude-3-haiku-20240307"),
    # ... other Agent arguments
)
```
|
633
|
+
|
634
|
+
### 6.3 Open & Local Models via LiteLLM (Ollama, vLLM)
|
635
|
+
|
636
|
+
For self-hosting, cost savings, privacy, or offline use.
|
637
|
+
|
638
|
+
* **Ollama Integration**: Run Ollama locally (`ollama run <model>`).
|
639
|
+
```bash
|
640
|
+
export OLLAMA_API_BASE="http://localhost:11434" # Ensure Ollama server is running
|
641
|
+
```
|
642
|
+
```python
from google.adk.agents import Agent
from google.adk.models.lite_llm import LiteLlm

# Use 'ollama_chat' provider for tool-calling capabilities with Ollama models
agent_ollama = Agent(
    model=LiteLlm(model="ollama_chat/llama3:instruct"),
    # ... other Agent arguments (name is required; instruction, tools, etc.)
)
```
|
647
|
+
|
648
|
+
* **Self-Hosted Endpoint (e.g., vLLM)**:
|
649
|
+
```python
from google.adk.agents import Agent
from google.adk.models.lite_llm import LiteLlm

api_base_url = "https://your-vllm-endpoint.example.com/v1"
agent_vllm = Agent(
    model=LiteLlm(
        model="your-model-name-on-vllm",
        api_base=api_base_url,
        extra_headers={"Authorization": "Bearer YOUR_TOKEN"},
    ),
    # ... other Agent arguments (name is required; instruction, tools, etc.)
)
```
|
661
|
+
|
662
|
+
### 6.4 Customizing LLM API Clients
|
663
|
+
|
664
|
+
For `google-genai` (used by Gemini models), you can configure the underlying client.
|
665
|
+
|
666
|
+
```python
import os
from google import genai

# google-genai is configured per client; HTTP timeouts are given in milliseconds.
client = genai.Client(
    api_key=os.getenv("GOOGLE_API_KEY"),
    http_options={"timeout": 60_000},  # 60 seconds
)
```
|
675
|
+
|
676
|
+
---
|
677
|
+
|
678
|
+
## 7. Tools: The Agent's Capabilities
|
679
|
+
|
680
|
+
Tools extend an agent's abilities beyond text generation.
|
681
|
+
|
682
|
+
### 7.1 Defining Function Tools: Principles & Best Practices
|
683
|
+
|
684
|
+
* **Signature**: `def my_tool(param1: Type, param2: Type, tool_context: ToolContext) -> dict:`
|
685
|
+
* **Function Name**: Descriptive verb-noun (e.g., `schedule_meeting`).
|
686
|
+
* **Parameters**: Clear names, required type hints, **NO DEFAULT VALUES**.
|
687
|
+
* **Return Type**: **Must** be a `dict` (JSON-serializable), preferably with a `'status'` key.
|
688
|
+
* **Docstring**: **CRITICAL**. Explain purpose, when to use, arguments, and return value structure. **AVOID** mentioning `tool_context`.
|
689
|
+
|
690
|
+
```python
from google.adk.tools import ToolContext

def calculate_compound_interest(
    principal: float,
    rate: float,
    years: int,
    compounding_frequency: int,
    tool_context: ToolContext
) -> dict:
    """Calculates the future value of an investment with compound interest.

    Use this tool to calculate the future value of an investment given a
    principal amount, interest rate, number of years, and how often the
    interest is compounded per year.

    Args:
        principal (float): The initial amount of money invested.
        rate (float): The annual interest rate (e.g., 0.05 for 5%).
        years (int): The number of years the money is invested.
        compounding_frequency (int): The number of times interest is compounded
                                     per year (e.g., 1 for annually, 12 for monthly).

    Returns:
        dict: Contains the calculation result.
              - 'status' (str): "success" or "error".
              - 'future_value' (float, optional): The calculated future value.
              - 'error_message' (str, optional): Description of error, if any.
    """
    # ... implementation ...
```
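One way the elided implementation might look is sketched below. This is not the cheatsheet's own code: the validation rules and the rounding to two decimals are assumptions, and `tool_context` is left out so the snippet runs without ADK installed. It uses the standard formula A = P(1 + r/n)^(nt).

```python
def calculate_compound_interest(
    principal: float,
    rate: float,
    years: int,
    compounding_frequency: int,
) -> dict:
    """Future value via A = P * (1 + r/n) ** (n * t), returned as a status dict."""
    # Assumed validation rules; adjust to your own policy.
    if principal < 0 or years < 0 or compounding_frequency <= 0:
        return {
            "status": "error",
            "error_message": "principal and years must be >= 0; compounding_frequency must be > 0.",
        }
    growth = (1 + rate / compounding_frequency) ** (compounding_frequency * years)
    return {"status": "success", "future_value": round(principal * growth, 2)}


result = calculate_compound_interest(1000.0, 0.05, 10, 1)
print(result)
```

Returning an error dict instead of raising keeps the failure visible to the model, which can then correct its arguments.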
|
719
|
+
|
720
|
+
### 7.2 The `ToolContext` Object: Accessing Runtime Information
|
721
|
+
|
722
|
+
`ToolContext` is the gateway for tools to interact with the ADK runtime.
|
723
|
+
|
724
|
+
* `tool_context.state`: Read and write to the current `Session`'s `state` dictionary.
|
725
|
+
* `tool_context.actions`: Modify the `EventActions` object (e.g., `tool_context.actions.escalate = True`).
|
726
|
+
* `tool_context.load_artifact(filename)` / `tool_context.save_artifact(filename, part)`: Manage binary data.
|
727
|
+
* `tool_context.search_memory(query)`: Query the long-term `MemoryService`.
|
728
|
+
|
729
|
+
### 7.3 All Tool Types & Their Usage
|
730
|
+
|
731
|
+
ADK supports a diverse ecosystem of tools.
|
732
|
+
|
733
|
+
1. **`FunctionTool`**: Wraps any Python callable. The most common tool type.
|
734
|
+
2. **`LongRunningFunctionTool`**: For `async` functions that `yield` intermediate results.
|
735
|
+
3. **`AgentTool`**: Wraps another `BaseAgent` instance, allowing it to be called as a tool.
|
736
|
+
4. **`OpenAPIToolset`**: Automatically generates tools from an OpenAPI (Swagger) specification.
|
737
|
+
5. **`MCPToolset`**: Connects to an external Model Context Protocol (MCP) server.
|
738
|
+
6. **Built-in Tools**: `google_search`, `BuiltInCodeExecutor`, `VertexAiSearchTool` (e.g., `from google.adk.tools import google_search`).
|
739
|
+
Note: `google_search` is a special tool that the model invokes automatically. Pass it directly to the agent; it does not need to be wrapped in a custom function.
|
740
|
+
7. **Third-Party Tool Wrappers**: `LangchainTool`, `CrewaiTool`.
|
741
|
+
8. **Google Cloud Tools**: `ApiHubToolset`, `ApplicationIntegrationToolset`.
|
742
|
+
|
743
|
+
---
|
744
|
+
|
745
|
+
## 8. Context, State, and Memory Management
|
746
|
+
|
747
|
+
Effective context management is crucial for coherent, multi-turn conversations.
|
748
|
+
|
749
|
+
### 8.1 The `Session` Object & `SessionService`
|
750
|
+
|
751
|
+
* **`Session`**: The container for a single, ongoing conversation (`id`, `state`, `events`).
|
752
|
+
* **`SessionService`**: Manages the lifecycle of `Session` objects (`create_session`, `get_session`, `append_event`).
|
753
|
+
* **Implementations**: `InMemorySessionService` (dev), `VertexAiSessionService` (prod), `DatabaseSessionService` (self-managed).
|
754
|
+
|
755
|
+
### 8.2 `State`: The Conversational Scratchpad
|
756
|
+
|
757
|
+
A mutable dictionary within `session.state` for short-term, dynamic data.
|
758
|
+
|
759
|
+
* **Update Mechanism**: Always update via `context.state` (in callbacks/tools) or `LlmAgent.output_key`.
|
760
|
+
* **Prefixes for Scope**:
|
761
|
+
* **(No prefix)**: Session-specific (e.g., `session.state['booking_step']`).
|
762
|
+
* `user:`: Persistent for a `user_id` across all their sessions (e.g., `session.state['user:preferred_currency']`).
|
763
|
+
* `app:`: Persistent for `app_name` across all users and sessions.
|
764
|
+
* `temp:`: Volatile, for the current `Invocation` turn only.
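To make the prefix convention concrete, the sketch below classifies state keys by scope using plain string handling. The helper `scope_of` and the sample keys other than those quoted above are hypothetical; ADK applies these rules internally, and this only mirrors the naming convention.

```python
def scope_of(key: str) -> str:
    """Return the scope implied by a state key's prefix (hypothetical helper)."""
    for prefix, scope in (("user:", "user"), ("app:", "app"), ("temp:", "temp")):
        if key.startswith(prefix):
            return scope
    return "session"  # No prefix: session-specific.


state = {
    "booking_step": "payment",
    "user:preferred_currency": "EUR",
    "app:api_version": "v2",
    "temp:raw_api_response": {"ok": True},
}
scopes = {key: scope_of(key) for key in state}
print(scopes)
```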
|
765
|
+
|
766
|
+
### 8.3 `Memory`: Long-Term Knowledge & Retrieval
|
767
|
+
|
768
|
+
For knowledge beyond a single conversation.
|
769
|
+
|
770
|
+
* **`BaseMemoryService`**: Defines the interface (`add_session_to_memory`, `search_memory`).
|
771
|
+
* **Implementations**: `InMemoryMemoryService`, `VertexAiRagMemoryService`.
|
772
|
+
* **Usage**: Agents interact via tools (e.g., the built-in `load_memory` tool).
|
773
|
+
|
774
|
+
### 8.4 `Artifacts`: Binary Data Management
|
775
|
+
|
776
|
+
For named, versioned binary data (files, images).
|
777
|
+
|
778
|
+
* **Representation**: `google.genai.types.Part` (containing a `Blob` with `data: bytes` and `mime_type: str`).
|
779
|
+
* **`BaseArtifactService`**: Manages storage (`save_artifact`, `load_artifact`).
|
780
|
+
* **Implementations**: `InMemoryArtifactService`, `GcsArtifactService`.
|
781
|
+
|
782
|
+
---
|
783
|
+
|
784
|
+
## 9. Runtime, Events, and Execution Flow
|
785
|
+
|
786
|
+
The `Runner` is the central orchestrator of an ADK application.
|
787
|
+
|
788
|
+
### 9.1 The `Runner`: The Orchestrator
|
789
|
+
|
790
|
+
* **Role**: Manages the agent's lifecycle, the event loop, and coordinates with services.
|
791
|
+
* **Entry Point**: `runner.run_async(user_id, session_id, new_message)`.
|
792
|
+
|
793
|
+
### 9.2 The Event Loop: Core Execution Flow
|
794
|
+
|
795
|
+
1. User input becomes a `user` `Event`.
|
796
|
+
2. `Runner` calls `agent.run_async(invocation_context)`.
|
797
|
+
3. Agent `yield`s an `Event` (e.g., tool call, text response). Execution pauses.
|
798
|
+
4. `Runner` processes the `Event` (applies state changes, etc.) and yields it to the client.
|
799
|
+
5. Execution resumes. This cycle repeats until the agent is done.
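The yield/pause/resume cycle above can be imitated with an ordinary async generator. This is a simplified sketch, not ADK's `Runner`: `toy_agent` and `toy_runner` are hypothetical names, and events are plain dictionaries.

```python
import asyncio
from typing import AsyncGenerator


async def toy_agent(state: dict) -> AsyncGenerator[dict, None]:
    """Stand-in agent: it is paused at each yield until the runner resumes it."""
    yield {"author": "agent", "text": "calling tool", "state_delta": {"step": 1}}
    # By the time execution resumes here, the runner has applied the delta above.
    yield {"author": "agent", "text": f"done after step {state['step']}", "state_delta": {}}


async def toy_runner(user_text: str) -> list:
    state: dict = {}
    seen = [{"author": "user", "text": user_text}]  # 1. User input becomes an event.
    async for event in toy_agent(state):            # 2-3. The agent runs and yields.
        state.update(event["state_delta"])          # 4. The runner commits state changes
        seen.append(event)                          #    and forwards the event.
    return seen                                     # 5. The loop ends when the agent is done.


events = asyncio.run(toy_runner("hi"))
print([e["text"] for e in events])
```

The second event can read `state["step"]` only because the runner processed the first event before the generator resumed.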
|
800
|
+
|
801
|
+
### 9.3 `Event` Object: The Communication Backbone
|
802
|
+
|
803
|
+
`Event` objects carry all information and signals.
|
804
|
+
|
805
|
+
* `Event.author`: Source of the event (`'user'`, agent name, `'system'`).
|
806
|
+
* `Event.content`: The primary payload (text, function calls, function responses).
|
807
|
+
* `Event.actions`: Signals side effects (`state_delta`, `transfer_to_agent`, `escalate`).
|
808
|
+
* `Event.is_final_response()`: Helper to identify the complete, displayable message.
|
809
|
+
|
810
|
+
### 9.4 Asynchronous Programming (Python Specific)
|
811
|
+
|
812
|
+
ADK is built on `asyncio`. Use `async def`, `await`, and `async for` for all I/O-bound operations.
|
813
|
+
|
814
|
+
---
|
815
|
+
|
816
|
+
## 10. Control Flow with Callbacks
|
817
|
+
|
818
|
+
Callbacks are functions that intercept and control agent execution at specific points.
|
819
|
+
|
820
|
+
### 10.1 Callback Mechanism: Interception & Control
|
821
|
+
|
822
|
+
* **Definition**: A Python function assigned to an agent's `callback` parameter (e.g., `after_agent_callback=my_func`).
|
823
|
+
* **Context**: Receives a `CallbackContext` (or `ToolContext`) with runtime info.
|
824
|
+
* **Return Value**: **Crucially determines flow.**
|
825
|
+
* `return None`: Allow the default action to proceed.
|
826
|
+
* `return <Specific Object>`: **Override** the default action/result.
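The return-value rule can be illustrated in plain Python. In this sketch, `run_step` is a hypothetical stand-in for the framework's internal dispatch, and the two callbacks are made up for the demo; only the "None proceeds, anything else overrides" behavior is taken from the text above.

```python
from typing import Callable, Optional


def run_step(before_callback: Callable[[str], Optional[str]], user_text: str) -> str:
    """Hypothetical dispatcher: run the callback, then the default action unless overridden."""
    override = before_callback(user_text)
    if override is not None:
        return override  # The callback's return value replaces the default result.
    return f"default action for: {user_text}"  # None means "proceed as usual".


def allow_everything(text: str) -> Optional[str]:
    return None


def block_secrets(text: str) -> Optional[str]:
    return "blocked" if "password" in text.lower() else None


print(run_step(allow_everything, "hello"))
print(run_step(block_secrets, "my password is 1234"))
```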
|
827
|
+
|
828
|
+
### 10.2 Types of Callbacks
|
829
|
+
|
830
|
+
1. **Agent Lifecycle**: `before_agent_callback`, `after_agent_callback`.
|
831
|
+
2. **LLM Interaction**: `before_model_callback`, `after_model_callback`.
|
832
|
+
3. **Tool Execution**: `before_tool_callback`, `after_tool_callback`.
|
833
|
+
|
834
|
+
### 10.3 Callback Best Practices
|
835
|
+
|
836
|
+
* **Keep Focused**: Each callback for a single purpose.
|
837
|
+
* **Performance**: Avoid blocking I/O or heavy computation.
|
838
|
+
* **Error Handling**: Use `try...except` to prevent crashes.
|
839
|
+
|
840
|
+
#### **Example 1: Data Aggregation with `after_agent_callback`**
|
841
|
+
This callback runs after an agent, inspects the `session.events` to find structured data from tool calls (like `google_search` results), and saves it to state for later use.
|
842
|
+
|
843
|
+
```python
from google.adk.agents.callback_context import CallbackContext

def collect_research_sources_callback(callback_context: CallbackContext) -> None:
    """Collects and organizes web research sources from agent events."""
    session = callback_context._invocation_context.session
    # Get existing sources from state to append to them.
    url_to_short_id = callback_context.state.get("url_to_short_id", {})
    sources = callback_context.state.get("sources", {})
    id_counter = len(url_to_short_id) + 1

    # Iterate through all events in the session to find grounding metadata.
    for event in session.events:
        if not (event.grounding_metadata and event.grounding_metadata.grounding_chunks):
            continue
        # ... logic to parse grounding_chunks and grounding_supports ...
        # (See full implementation in the original code snippet)

    # Save the updated source map back to state.
    callback_context.state["url_to_short_id"] = url_to_short_id
    callback_context.state["sources"] = sources

# Used in an agent like this:
# section_researcher = LlmAgent(..., after_agent_callback=collect_research_sources_callback)
```
|
868
|
+
|
869
|
+
#### **Example 2: Output Transformation with `after_agent_callback`**
|
870
|
+
This callback takes an LLM's raw output (containing custom tags), uses Python to format it into markdown, and returns the modified content, overriding the original.
|
871
|
+
|
872
|
+
```python
import re
from google.adk.agents.callback_context import CallbackContext
from google.genai import types as genai_types

def citation_replacement_callback(callback_context: CallbackContext) -> genai_types.Content:
    """Replaces <cite> tags in a report with Markdown-formatted links."""
    # 1. Get raw report and sources from state.
    final_report = callback_context.state.get("final_cited_report", "")
    sources = callback_context.state.get("sources", {})

    # 2. Define a replacer function for regex substitution.
    def tag_replacer(match: re.Match) -> str:
        short_id = match.group(1)
        if not (source_info := sources.get(short_id)):
            return ""  # Remove invalid tags
        title = source_info.get("title", short_id)
        return f" [{title}]({source_info['url']})"

    # 3. Use regex to find all <cite> tags and replace them.
    processed_report = re.sub(
        r'<cite\s+source\s*=\s*["\']?(src-\d+)["\']?\s*/>',
        tag_replacer,
        final_report,
    )
    processed_report = re.sub(r"\s+([.,;:])", r"\1", processed_report)  # Fix spacing

    # 4. Save the new version to state and return it to override the original agent output.
    callback_context.state["final_report_with_citations"] = processed_report
    return genai_types.Content(parts=[genai_types.Part(text=processed_report)])

# Used in an agent like this:
# report_composer = LlmAgent(..., after_agent_callback=citation_replacement_callback)
```
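The regex substitution at the heart of this callback can be tried in isolation, without ADK. In the sketch below, the `sources` dictionary and the sample `report` string are made-up inputs for illustration; the pattern and the replacer mirror the callback.

```python
import re

# Hypothetical inputs for the demo.
sources = {"src-1": {"title": "ADK Docs", "url": "https://example.com/adk"}}
report = 'Agents can call tools<cite source="src-1"/>. Unknown refs vanish <cite source="src-9"/>.'


def tag_replacer(match: re.Match) -> str:
    short_id = match.group(1)
    source_info = sources.get(short_id)
    if not source_info:
        return ""  # Drop tags that point at unknown sources.
    title = source_info.get("title", short_id)
    return f" [{title}]({source_info['url']})"


processed = re.sub(r'<cite\s+source\s*=\s*["\']?(src-\d+)["\']?\s*/>', tag_replacer, report)
processed = re.sub(r"\s+([.,;:])", r"\1", processed)  # Remove spaces left before punctuation.
print(processed)
```

The second substitution matters for the dropped tag: removing `<cite source="src-9"/>` leaves a stray space before the period, which the spacing fix then collapses.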
|
906
|
+
---
|
907
|
+
|
908
|
+
## 11. Authentication for Tools
|
909
|
+
|
910
|
+
Enabling agents to securely access protected external resources.
|
911
|
+
|
912
|
+
### 11.1 Core Concepts: `AuthScheme` & `AuthCredential`
|
913
|
+
|
914
|
+
* **`AuthScheme`**: Defines *how* an API expects authentication (e.g., `APIKey`, `HTTPBearer`, `OAuth2`, `OpenIdConnectWithConfig`).
|
915
|
+
* **`AuthCredential`**: Holds *initial* information to *start* the auth process (e.g., API key value, OAuth client ID/secret).
|
916
|
+
|
917
|
+
### 11.2 Interactive OAuth/OIDC Flows
|
918
|
+
|
919
|
+
When a tool requires user interaction (OAuth consent), ADK pauses and signals your `Agent Client` application.
|
920
|
+
|
921
|
+
1. **Detect Auth Request**: `runner.run_async()` yields an event with a special `adk_request_credential` function call.
|
922
|
+
2. **Redirect User**: Extract `auth_uri` from `auth_config` in the event. Your client app redirects the user's browser to this `auth_uri` (appending `redirect_uri`).
|
923
|
+
3. **Handle Callback**: Your client app has a pre-registered `redirect_uri` to receive the user after authorization. It captures the full callback URL (containing `authorization_code`).
|
924
|
+
4. **Send Auth Result to ADK**: Your client prepares a `FunctionResponse` for `adk_request_credential`, setting `auth_config.exchanged_auth_credential.oauth2.auth_response_uri` to the captured callback URL.
|
925
|
+
5. **Resume Execution**: `runner.run_async()` is called again with this `FunctionResponse`. ADK performs the token exchange, stores the access token, and retries the original tool call.
|
926
|
+
|
927
|
+
### 11.3 Custom Tool Authentication
|
928
|
+
|
929
|
+
If building a `FunctionTool` that needs authentication:
|
930
|
+
|
931
|
+
1. **Check for Cached Creds**: `tool_context.state.get("my_token_cache_key")`.
|
932
|
+
2. **Check for Auth Response**: `tool_context.get_auth_response(my_auth_config)`.
|
933
|
+
3. **Initiate Auth**: If no creds, call `tool_context.request_credential(my_auth_config)` and return a pending status. This triggers the external flow.
|
934
|
+
4. **Cache Credentials**: After obtaining, store in `tool_context.state`.
|
935
|
+
5. **Make API Call**: Use the valid credentials (e.g., `google.oauth2.credentials.Credentials`).
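The five steps can be walked through with a small stand-in for `ToolContext`. Everything here is a sketch under stated assumptions: `FakeToolContext` is a hypothetical test double that only imitates `state`, `get_auth_response`, and `request_credential`; the token value and the `call_protected_api` tool are invented, and no real OAuth exchange takes place.

```python
class FakeToolContext:
    """Minimal stand-in for ToolContext; only the pieces this sketch needs."""

    def __init__(self) -> None:
        self.state: dict = {}
        self.pending_response = None  # Set by the "client" once the user has authorized.
        self.requested = False

    def get_auth_response(self, auth_config):
        return self.pending_response

    def request_credential(self, auth_config) -> None:
        self.requested = True


TOKEN_KEY = "my_token_cache_key"


def call_protected_api(tool_context: FakeToolContext, auth_config: dict) -> dict:
    token = tool_context.state.get(TOKEN_KEY)                 # 1. Cached credentials?
    if token is None:
        token = tool_context.get_auth_response(auth_config)   # 2. Auth response available?
    if token is None:
        tool_context.request_credential(auth_config)          # 3. Start the external flow.
        return {"status": "pending", "message": "Awaiting user authorization."}
    tool_context.state[TOKEN_KEY] = token                     # 4. Cache for later calls.
    return {"status": "success", "data": f"called API with {token}"}  # 5. Use the credentials.


ctx = FakeToolContext()
first = call_protected_api(ctx, auth_config={})
ctx.pending_response = "token-abc"  # Simulate the client completing the OAuth flow.
second = call_protected_api(ctx, auth_config={})
print(first["status"], second["status"])
```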
|
936
|
+
|
937
|
+
---
|
938
|
+
|
939
|
+
## 12. Deployment Strategies
|
940
|
+
|
941
|
+
From local dev to production.
|
942
|
+
|
943
|
+
### 12.1 Local Development & Testing (`adk web`, `adk run`, `adk api_server`)
|
944
|
+
|
945
|
+
* **`adk web`**: Launches a local web UI for interactive chat, session inspection, and visual tracing.
|
946
|
+
```bash
|
947
|
+
adk web /path/to/your/project_root
|
948
|
+
```
|
949
|
+
* **`adk run`**: Command-line interactive chat.
|
950
|
+
```bash
|
951
|
+
adk run /path/to/your/agent_folder
|
952
|
+
```
|
953
|
+
* **`adk api_server`**: Launches a local FastAPI server exposing `/run`, `/run_sse`, `/list-apps`, etc., for API testing with `curl` or client libraries.
|
954
|
+
```bash
|
955
|
+
adk api_server /path/to/your/project_root
|
956
|
+
```
|
957
|
+
|
958
|
+
### 12.2 Vertex AI Agent Engine
|
959
|
+
|
960
|
+
Fully managed, scalable service for ADK agents on Google Cloud.
|
961
|
+
|
962
|
+
* **Features**: Auto-scaling, session management, observability integration.
|
963
|
+
* **Deployment**: Use `vertexai.agent_engines.create()`.
|
964
|
+
```python
from vertexai import agent_engines
from vertexai.preview import reasoning_engines  # or agent_engines directly in later versions

# Wrap your root_agent for deployment
app_for_engine = reasoning_engines.AdkApp(agent=root_agent, enable_tracing=True)

# Deploy
remote_app = agent_engines.create(
    agent_engine=app_for_engine,
    requirements=["google-cloud-aiplatform[adk,agent_engines]"],
    display_name="My Production Agent"
)
print(remote_app.resource_name)  # projects/PROJECT_NUM/locations/REGION/reasoningEngines/ID
```
|
978
|
+
* **Interaction**: Use `remote_app.stream_query()`, `create_session()`, etc.
|
979
|
+
|
980
|
+
### 12.3 Cloud Run
|
981
|
+
|
982
|
+
Serverless container platform for custom web applications.
|
983
|
+
|
984
|
+
* **Deployment**:
|
985
|
+
1. Create a `Dockerfile` for your FastAPI app (using `google.adk.cli.fast_api.get_fast_api_app`).
|
986
|
+
2. Use `gcloud run deploy --source .`.
|
987
|
+
3. Alternatively, `adk deploy cloud_run` (simpler, opinionated).
|
988
|
+
* **Example `main.py`**:
|
989
|
+
```python
import os
from fastapi import FastAPI
from google.adk.cli.fast_api import get_fast_api_app

# Ensure your agent_folder (e.g., 'my_first_agent') is in the same directory as main.py
app: FastAPI = get_fast_api_app(
    agents_dir=os.path.dirname(os.path.abspath(__file__)),
    session_db_url="sqlite:///./sessions.db",  # In-container SQLite, for simple cases
    # For production: use a persistent DB (Cloud SQL) or VertexAiSessionService
    allow_origins=["*"],
    web=True  # Serve ADK UI
)
# uvicorn.run(app, host="0.0.0.0", port=int(os.environ.get("PORT", 8080)))  # If running directly
```
|
1004
|
+
|
1005
|
+
### 12.4 Google Kubernetes Engine (GKE)
|
1006
|
+
|
1007
|
+
For maximum control, run your containerized agent in a Kubernetes cluster.
|
1008
|
+
|
1009
|
+
* **Deployment**:
|
1010
|
+
1. Build Docker image (`gcloud builds submit`).
|
1011
|
+
2. Create Kubernetes Deployment and Service YAMLs.
|
1012
|
+
3. Apply with `kubectl apply -f deployment.yaml`.
|
1013
|
+
4. Configure Workload Identity for GCP permissions.
|
1014
|
+
|
1015
|
+
### 12.5 CI/CD Integration
|
1016
|
+
|
1017
|
+
* Automate testing (`pytest`, `adk eval`) in CI.
|
1018
|
+
* Automate container builds and deployments (e.g., Cloud Build, GitHub Actions).
|
1019
|
+
* Use environment variables for secrets.
|
1020
|
+
|
1021
|
+
---
|
1022
|
+
|
1023
|
+
## 13. Evaluation and Safety
|
1024
|
+
|
1025
|
+
Critical for robust, production-ready agents.
|
1026
|
+
|
1027
|
+
### 13.1 Agent Evaluation (`adk eval`)
|
1028
|
+
|
1029
|
+
Systematically assess agent performance using predefined test cases.
|
1030
|
+
|
1031
|
+
* **Evalset File (`.evalset.json`)**: Contains `eval_cases`, each with a `conversation` (user queries, expected tool calls, expected intermediate/final responses) and `session_input` (initial state).
|
1032
|
+
```json
{
  "eval_set_id": "weather_bot_eval",
  "eval_cases": [
    {
      "eval_id": "london_weather_query",
      "conversation": [
        {
          "user_content": {"parts": [{"text": "What's the weather in London?"}]},
          "final_response": {"parts": [{"text": "The weather in London is cloudy..."}]},
          "intermediate_data": {
            "tool_uses": [{"name": "get_weather", "args": {"city": "London"}}]
          }
        }
      ],
      "session_input": {"app_name": "weather_app", "user_id": "test_user", "state": {}}
    }
  ]
}
```
|
1052
|
+
* **Running Evaluation**:
|
1053
|
+
* `adk web`: Interactive UI for creating/running eval cases.
|
1054
|
+
* `adk eval /path/to/agent_folder /path/to/evalset.json`: CLI execution.
|
1055
|
+
* `pytest`: Integrate `AgentEvaluator.evaluate()` into unit/integration tests.
|
1056
|
+
* **Metrics**: `tool_trajectory_avg_score` (tool calls match expected), `response_match_score` (final response similarity using ROUGE). Configurable via `test_config.json`.
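As a rough intuition for `tool_trajectory_avg_score`, the sketch below scores each turn 1.0 when the actual tool calls equal the expected ones and 0.0 otherwise, then averages. This is a simplified assumption about the metric, not ADK's evaluator code; `trajectory_avg_score` and the sample `get_forecast` tool are hypothetical.

```python
def trajectory_avg_score(expected_turns: list, actual_turns: list) -> float:
    """Average of per-turn exact matches between expected and actual tool calls."""
    if not expected_turns:
        return 0.0
    matches = [1.0 if exp == act else 0.0 for exp, act in zip(expected_turns, actual_turns)]
    return sum(matches) / len(expected_turns)


expected = [
    [{"name": "get_weather", "args": {"city": "London"}}],
    [{"name": "get_weather", "args": {"city": "Paris"}}],
]
actual = [
    [{"name": "get_weather", "args": {"city": "London"}}],
    [{"name": "get_forecast", "args": {"city": "Paris"}}],  # Wrong tool on turn 2.
]
score = trajectory_avg_score(expected, actual)
print(score)
```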
|
1057
|
+
|
1058
|
+
### 13.2 Safety & Guardrails
|
1059
|
+
|
1060
|
+
Multi-layered defense against harmful content, misalignment, and unsafe actions.
|
1061
|
+
|
1062
|
+
1. **Identity and Authorization**:
|
1063
|
+
* **Agent-Auth**: Tool acts with the agent's service account (e.g., `Vertex AI User` role). Simple, but all users share access level. Logs needed for attribution.
|
1064
|
+
* **User-Auth**: Tool acts with the end-user's identity (via OAuth tokens). Reduces risk of abuse.
|
1065
|
+
2. **In-Tool Guardrails**: Design tools defensively. Tools can read policies from `tool_context.state` (set deterministically by developer) and validate model-provided arguments before execution.
|
1066
|
+
```python
from google.adk.tools import ToolContext

def execute_sql(query: str, tool_context: ToolContext) -> dict:
    policy = tool_context.state.get("user:sql_policy", {})
    if not policy.get("allow_writes", False) and ("INSERT" in query.upper() or "DELETE" in query.upper()):
        return {"status": "error", "message": "Policy: Write operations are not allowed."}
    # ... execute query ...
```
|
1073
|
+
3. **Built-in Gemini Safety Features**:
|
1074
|
+
* **Content Safety Filters**: Automatically block harmful content (CSAM, PII, hate speech, etc.). Configurable thresholds.
|
1075
|
+
* **System Instructions**: Guide model behavior, define prohibited topics, brand tone, disclaimers.
|
1076
|
+
4. **Model and Tool Callbacks (LLM as a Guardrail)**: Use callbacks to inspect inputs/outputs.
|
1077
|
+
* `before_model_callback`: Intercept `LlmRequest` before it hits the LLM. Block (return `LlmResponse`) or modify.
|
1078
|
+
* `before_tool_callback`: Intercept tool calls (name, args) before execution. Block (return `dict`) or modify.
|
1079
|
+
* **LLM-based Safety**: Use a cheap/fast LLM (e.g., Gemini Flash) in a callback to classify input/output safety.
|
1080
|
+
```python
from typing import Optional

from google.adk.agents import Agent
from google.adk.agents.callback_context import CallbackContext
from google.adk.models import LlmRequest, LlmResponse
from google.genai import types as genai_types

# The first parameter is named `callback_context` because ADK passes callback arguments by keyword.
def safety_checker_callback(callback_context: CallbackContext, llm_request: LlmRequest) -> Optional[LlmResponse]:
    # Use a separate, small LLM to classify safety
    safety_llm_agent = Agent(name="SafetyChecker", model="gemini-2.5-flash-001", instruction="Classify input as 'safe' or 'unsafe'. Output ONLY the word.")
    # Run the safety agent (might need a new runner instance or direct model call)
    # For simplicity, a mock:
    user_input = llm_request.contents[-1].parts[0].text or ""
    if "dangerous_phrase" in user_input.lower():
        callback_context.state["safety_violation"] = True
        return LlmResponse(content=genai_types.Content(parts=[genai_types.Part(text="I cannot process this request due to safety concerns.")]))
    return None
```
|
1092
|
+
5. **Sandboxed Code Execution**:
|
1093
|
+
* `BuiltInCodeExecutor`: Uses secure, sandboxed execution environments.
|
1094
|
+
* Vertex AI Code Interpreter Extension.
|
1095
|
+
* If custom, ensure hermetic environments (no network, isolated).
|
1096
|
+
6. **Network Controls & VPC-SC**: Confine agent activity within secure perimeters (VPC Service Controls) to prevent data exfiltration.
|
1097
|
+
7. **Output Escaping in UIs**: Always properly escape LLM-generated content in web UIs to prevent XSS attacks and indirect prompt injections.
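For the in-tool guardrail in item 2, a slightly stricter policy check might look like the sketch below. It is a keyword heuristic, not a SQL parser, and rests on assumptions: the `check_sql_policy` helper and the `WRITE_KEYWORDS` set are hypothetical, and the policy is passed in as a plain dict so the snippet runs without ADK. Matching whole tokens avoids rejecting reads that merely mention a column such as `deleted_at`.

```python
import re

# Assumed list of statement keywords treated as writes.
WRITE_KEYWORDS = {"INSERT", "UPDATE", "DELETE", "DROP", "ALTER", "TRUNCATE", "CREATE"}


def check_sql_policy(query: str, policy: dict) -> dict:
    """Reject write statements unless the policy allows them (keyword-based heuristic)."""
    # Tokenize into identifier-like words so "DELETED_AT" does not match "DELETE".
    words = set(re.findall(r"[A-Za-z_]+", query.upper()))
    if not policy.get("allow_writes", False) and words & WRITE_KEYWORDS:
        return {"status": "error", "message": "Policy: Write operations are not allowed."}
    return {"status": "success"}


print(check_sql_policy("SELECT deleted_at FROM users", {}))
print(check_sql_policy("update users set name = 'x'", {}))
```

A production guardrail would more likely rely on database-level permissions or a real parser; the heuristic only shows where such a check sits in the tool.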
|
1098
|
+
|
1099
|
+
---
|
1100
|
+
|
1101
|
+
## 14. Debugging, Logging & Observability
|
1102
|
+
|
1103
|
+
* **`adk web` UI**: Best first step. Provides visual trace, session history, and state inspection.
|
1104
|
+
* **Event Stream Logging**: Iterate `runner.run_async()` events and print relevant fields.
|
1105
|
+
```python
async for event in runner.run_async(...):
    print(f"[{event.author}] Event ID: {event.id}, Invocation: {event.invocation_id}")
    if event.content and event.content.parts:
        if event.content.parts[0].text:
            print(f" Text: {event.content.parts[0].text[:100]}...")
    if event.get_function_calls():
        print(f" Tool Call: {event.get_function_calls()[0].name} with {event.get_function_calls()[0].args}")
    if event.get_function_responses():
        print(f" Tool Response: {event.get_function_responses()[0].response}")
    if event.actions:
        if event.actions.state_delta:
            print(f" State Delta: {event.actions.state_delta}")
        if event.actions.transfer_to_agent:
            print(f" TRANSFER TO: {event.actions.transfer_to_agent}")
    if event.error_message:
        print(f" ERROR: {event.error_message}")
```
|
1123
|
+
* **Tool/Callback `print` statements**: Simple logging directly within your functions.
|
1124
|
+
* **Python `logging` module**: Integrate with standard logging frameworks.
|
1125
|
+
* **Tracing Integrations**: ADK supports OpenTelemetry (e.g., via Comet Opik) for distributed tracing.
|
1126
|
+
```python
|
1127
|
+
# Example using Comet Opik integration (conceptual)
|
1128
|
+
# pip install comet_opik_adk
|
1129
|
+
# from comet_opik_adk import enable_opik_tracing
|
1130
|
+
# enable_opik_tracing() # Call at app startup
|
1131
|
+
# Then run your ADK app, traces appear in Comet workspace.
|
1132
|
+
```
|
1133
|
+
* **Session History (`session.events`)**: Persisted for detailed post-mortem analysis.
|
1134
|
+
|
1135
|
+
---
|
1136
|
+
|
1137
|
+
## 15. Advanced I/O Modalities
|
1138
|
+
|
1139
|
+
ADK (especially with Gemini Live API models) supports richer interactions.
|
1140
|
+
|
1141
|
+
* **Audio**: Input via `Blob(mime_type="audio/pcm", data=bytes)`, Output via `genai_types.SpeechConfig` in `RunConfig`.
|
1142
|
+
* **Vision (Images/Video)**: Input via `Blob(mime_type="image/jpeg", data=bytes)` or `Blob(mime_type="video/mp4", data=bytes)`. Multimodal models such as `gemini-2.5-flash` can process these.
|
1143
|
+
* **Multimodal Input in `Content`**:
|
1144
|
+
```python
multimodal_content = genai_types.Content(
    parts=[
        genai_types.Part(text="Describe this image:"),
        genai_types.Part(inline_data=genai_types.Blob(mime_type="image/jpeg", data=image_bytes)),
    ]
)
```
|
1152
|
+
* **Streaming Modalities**: `RunConfig.response_modalities=['TEXT', 'AUDIO']`.
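The `Blob` inputs above need a MIME type and raw bytes. A small helper, sketched here with the standard library only, can derive both from a file path. The function name `blob_kwargs_from_file` is a hypothetical choice, and restricting input to image, audio, and video types is an assumption made for this example.

```python
import mimetypes
from pathlib import Path

# Top-level media types this sketch accepts (an assumption for illustration).
_ALLOWED_FAMILIES = {"image", "audio", "video"}


def blob_kwargs_from_file(path: str) -> dict:
    """Reads a media file and returns keyword arguments suitable for a Blob."""
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or mime_type.split("/")[0] not in _ALLOWED_FAMILIES:
        raise ValueError(f"Unsupported or unknown media type for {path!r}")
    return {"mime_type": mime_type, "data": Path(path).read_bytes()}
```

The result could then be passed along as `genai_types.Blob(**blob_kwargs_from_file("photo.jpg"))` when building a `Part`.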
|
1153
|
+
|
1154
|
+
---
|
1155
|
+
|
1156
|
+
## 16. Performance Optimization
|
1157
|
+
|
1158
|
+
* **Model Selection**: Choose the smallest model that meets requirements (e.g., `gemini-2.5-flash` for simple tasks).
|
1159
|
+
* **Instruction Prompt Engineering**: Concise, clear instructions reduce tokens and improve accuracy.
|
1160
|
+
* **Tool Use Optimization**:
|
1161
|
+
    * Design efficient tools (fast API calls, optimized database queries).
|
1162
|
+
    * Cache tool results (e.g., using `before_tool_callback` or `tool_context.state`).
|
1163
|
+
* **State Management**: Store only necessary data in state to avoid large context windows.
|
1164
|
+
* **`include_contents='none'`**: For stateless utility agents, omits prior conversation history from the LLM request, which saves context window.
|
1165
|
+
* **Parallelization**: Use `ParallelAgent` for independent tasks.
|
1166
|
+
* **Streaming**: Use `StreamingMode.SSE` or `StreamingMode.BIDI` to reduce perceived latency.
|
1167
|
+
* **`max_llm_calls`**: Set `RunConfig.max_llm_calls` to cap the number of LLM calls per run, which prevents runaway agents and controls costs.
|
1168
|
+
|
1169
|
+
---
|
1170
|
+
|
1171
|
+
## 17. General Best Practices & Common Pitfalls
|
1172
|
+
|
1173
|
+
* **Start Simple**: Begin with `LlmAgent`, mock tools, and `InMemorySessionService`. Gradually add complexity.
|
1174
|
+
* **Iterative Development**: Build small features, test, debug, refine.
|
1175
|
+
* **Modular Design**: Use agents and tools to encapsulate logic.
|
1176
|
+
* **Clear Naming**: Descriptive names for agents, tools, state keys.
|
1177
|
+
* **Error Handling**: Implement robust `try...except` blocks in tools and callbacks. Guide LLMs on how to handle tool errors.
|
1178
|
+
* **Testing**: Write unit tests for tools/callbacks, integration tests for agent flows (`pytest`, `adk eval`).
|
1179
|
+
* **Dependency Management**: Use virtual environments (`venv`) and `requirements.txt`.
|
1180
|
+
* **Secrets Management**: Never hardcode API keys. Use `.env` for local dev, environment variables or secret managers (Google Cloud Secret Manager) for production.
|
1181
|
+
* **Avoid Infinite Loops**: A risk especially with `LoopAgent` or complex LLM tool-calling chains. Use `max_iterations` on `LoopAgent`, `max_llm_calls` in `RunConfig`, and strong instructions.
|
1182
|
+
* **Handle `None` & `Optional`**: Check for `None` before accessing nested or `Optional` properties (e.g., `event.content and event.content.parts and event.content.parts[0].text`).
|
1183
|
+
* **Immutability of Events**: Events are immutable records. If you need to change something *before* it's processed, do so in a `before_*` callback and return a *new* modified object.
|
1184
|
+
* **Understand `output_key` vs. direct `state` writes**: `output_key` is for the agent's *final conversational* output. Direct `tool_context.state['key'] = value` is for *any other* data you want to save.
|
1185
|
+
* **Example Agents**: Find practical examples and reference implementations in the [ADK Samples repository](https://github.com/google/adk-samples).
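To illustrate the error-handling advice above, here is a minimal sketch of a tool that catches its own exceptions and returns a structured error instead of raising. The tool `parse_inventory` and the `status` / `error_message` keys are hypothetical conventions chosen for this example.

```python
import json


def parse_inventory(payload: str) -> dict:
    """Hypothetical tool: parses a JSON list of items and totals the quantities."""
    try:
        items = json.loads(payload)
        total = sum(int(item["quantity"]) for item in items)
    except json.JSONDecodeError as exc:
        # The input was not JSON at all.
        return {
            "status": "error",
            "error_message": f"Payload is not valid JSON: {exc.msg}",
        }
    except (KeyError, TypeError, ValueError) as exc:
        # The JSON parsed, but an item was missing or had a bad "quantity".
        return {
            "status": "error",
            "error_message": f"Malformed inventory item: {exc!r}",
        }
    return {"status": "success", "total_quantity": total}


if __name__ == "__main__":
    print(parse_inventory('[{"quantity": 2}, {"quantity": "3"}]'))
    print(parse_inventory("not json"))
    print(parse_inventory('[{"qty": 1}]'))
```

The agent's instruction can then tell the model what to do when `status` is `"error"`, for example to ask the user for corrected input rather than retrying blindly.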
|
1186
|
+
|
1187
|
+
|
1188
|
+
### Testing the output of an agent
|
1189
|
+
|
1190
|
+
The following script runs an agent programmatically and prints its final response. This is useful when an LLM or coding agent needs to interact with a work-in-progress agent, and also for automated testing, debugging, or integrating agent execution into other workflows:
|
1191
|
+
```python
import asyncio

from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
from app.agent import root_agent
from google.genai import types as genai_types


async def main():
    """Runs the agent with a sample query."""
    session_service = InMemorySessionService()
    await session_service.create_session(
        app_name="app", user_id="test_user", session_id="test_session"
    )
    runner = Runner(
        agent=root_agent, app_name="app", session_service=session_service
    )
    query = "I want a recipe for pancakes"
    async for event in runner.run_async(
        user_id="test_user",
        session_id="test_session",
        new_message=genai_types.Content(
            role="user",
            parts=[genai_types.Part.from_text(text=query)]
        ),
    ):
        # A final response may carry no content or parts, so guard before printing.
        if event.is_final_response() and event.content and event.content.parts:
            print(event.content.parts[0].text)


if __name__ == "__main__":
    asyncio.run(main())
```
|