npm - @bastani/atomic - Versions diffs - 0.5.11 → 0.5.12-0 - Mend

@bastani/atomic 0.5.11 → 0.5.12-0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (506)

package/.agents/skills/tool-design/SKILL.md ADDED

@@ -0,0 +1,271 @@
+---
+name: tool-design
+description: This skill should be used when the user asks to "design agent tools", "create tool descriptions", "reduce tool complexity", "implement MCP tools", or mentions tool consolidation, architectural reduction, tool naming conventions, or agent-tool interfaces. Part of the context engineering skill suite — also activates when the user mentions "context engineering" or "context-engineering" in the context of designing tools that shape how agents receive and process context.
+---
+# Tool Design for Agents
+Design every tool as a contract between a deterministic system and a non-deterministic agent. Unlike human-facing APIs, agent-facing tools must make the contract unambiguous through the description alone -- agents infer intent from descriptions and generate calls that must match expected formats. Every ambiguity becomes a potential failure mode that no amount of prompt engineering can fix.
+## When to Activate
+Activate this skill when:
+- Creating new tools for agent systems
+- Debugging tool-related failures or misuse
+- Optimizing existing tool sets for better agent performance
+- Designing tool APIs from scratch
+- Evaluating third-party tools for agent integration
+- Standardizing tool conventions across a codebase
+## Core Concepts
+Design tools around the consolidation principle: if a human engineer cannot definitively say which tool should be used in a given situation, an agent cannot be expected to do better. Reduce the tool set until each tool has one unambiguous purpose, because agents select tools by comparing descriptions and any overlap introduces selection errors.
+Treat every tool description as prompt engineering that shapes agent behavior. The description is not documentation for humans -- it is injected into the agent's context and directly steers reasoning. Write descriptions that answer what the tool does, when to use it, and what it returns, because these three questions are exactly what agents evaluate during tool selection.
+## Detailed Topics
+### The Tool-Agent Interface
+**Tools as Contracts**
+Design each tool as a self-contained contract. When humans call APIs, they read docs, understand conventions, and make appropriate requests. Agents must infer the entire contract from a single description block. Make the contract unambiguous by including format examples, expected patterns, and explicit constraints. Omit nothing that a caller needs to know, because agents cannot ask clarifying questions before making a call.
+**Tool Description as Prompt**
+Write tool descriptions knowing they load directly into agent context and collectively steer behavior. A vague description like "Search the database" with cryptic parameter names forces the agent to guess -- and guessing produces incorrect calls. Instead, include usage context, parameter format examples, and sensible defaults. Every word in the description either helps or hurts tool selection accuracy.
+**Namespacing and Organization**
+Namespace tools under common prefixes as the collection grows, because agents benefit from hierarchical grouping. When an agent needs database operations, it routes to the `db_*` namespace; when it needs web interactions, it routes to `web_*`. Without namespacing, agents must evaluate every tool in a flat list, which degrades selection accuracy as the count grows.
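+A minimal sketch of prefix-based grouping, assuming tool names follow a `<namespace>_<action>` convention; the tool names below are hypothetical. A router or prompt builder could use the groups to show the agent only the namespace it needs:
+```python
+from collections import defaultdict
+def group_by_namespace(tool_names: list[str]) -> dict[str, list[str]]:
+    """Group tool names by the prefix before the first underscore.
+    Names without an underscore fall into the "misc" group."""
+    groups: dict[str, list[str]] = defaultdict(list)
+    for name in tool_names:
+        prefix, sep, _ = name.partition("_")
+        groups[prefix if sep else "misc"].append(name)
+    return dict(groups)
+tools = ["db_query", "db_schema", "web_fetch", "web_search", "ping"]
+print(group_by_namespace(tools))
+# {'db': ['db_query', 'db_schema'], 'web': ['web_fetch', 'web_search'], 'misc': ['ping']}
+```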
+### The Consolidation Principle
+**Single Comprehensive Tools**
+Build single comprehensive tools instead of multiple narrow tools that overlap. Rather than implementing `list_users`, `list_events`, and `create_event` separately, implement `schedule_event` that finds availability and schedules in one call. The comprehensive tool handles the full workflow internally, removing the agent's burden of chaining calls in the correct order.
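+A toy sketch of the consolidated tool. It assumes a hypothetical in-memory calendar (`CALENDAR`) with hour-granularity busy slots; the point is that availability lookup and booking happen inside one call instead of three calls the agent must chain:
+```python
+# Hypothetical in-memory calendar: user -> set of busy hours (0-23) for one day.
+CALENDAR: dict[str, set[int]] = {
+    "alice": {9, 10, 14},
+    "bob": {9, 11},
+}
+def schedule_event(attendees: list[str], title: str,
+                   earliest: int = 9, latest: int = 17) -> dict:
+    """Find the first hour all attendees are free and book it, in one call.
+    Returns {"status": "scheduled", ...} on success or an actionable error."""
+    unknown = [a for a in attendees if a not in CALENDAR]
+    if unknown:
+        return {"status": "error", "code": "UNKNOWN_ATTENDEE",
+                "message": f"No calendar for {unknown}; check attendee names."}
+    for hour in range(earliest, latest):  # candidate start hours
+        if all(hour not in CALENDAR[a] for a in attendees):
+            for a in attendees:
+                CALENDAR[a].add(hour)  # book the slot for everyone
+            return {"status": "scheduled", "hour": hour, "title": title}
+    return {"status": "error", "code": "NO_COMMON_SLOT",
+            "message": f"No shared free hour between {earliest}:00 and {latest}:00; "
+                       "try a wider window or fewer attendees."}
+print(schedule_event(["alice", "bob"], "Design review"))
+# {'status': 'scheduled', 'hour': 12, 'title': 'Design review'}
+```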
+**Why Consolidation Works**
+Apply consolidation because agents have limited context and attention. Each tool in the collection competes for attention during tool selection, each description consumes tokens from the context budget, and overlapping functionality creates ambiguity. Consolidation eliminates redundant descriptions, removes selection ambiguity, and shrinks the effective tool set. Vercel demonstrated this principle by reducing its text-to-SQL agent from 17 specialized tools to 2 general-purpose tools and achieving better performance -- fewer tools meant less confusion and more reliable tool selection.
+**When Not to Consolidate**
+Keep tools separate when they have fundamentally different behaviors, serve different contexts, or must be callable independently. Over-consolidation creates a different problem: a single tool with too many parameters and modes becomes hard for agents to parameterize correctly.
+### Architectural Reduction
+Push the consolidation principle to its logical extreme by removing most specialized tools in favor of primitive, general-purpose capabilities. Production evidence shows this approach can outperform sophisticated multi-tool architectures.
+**The File System Agent Pattern**
+Provide direct file system access through a single command execution tool instead of building custom tools for data exploration, schema lookup, and query validation. The agent uses standard Unix utilities (grep, cat, find, ls) to explore and operate on the system. This works for four reasons: file systems are a proven abstraction that models understand deeply; standard tools have predictable behavior; agents can chain primitives flexibly instead of being constrained to predefined workflows; and good documentation in files replaces summarization tools.
+**When Reduction Outperforms Complexity**
+Choose reduction when the data layer is well-documented and consistently structured, the model has sufficient reasoning capability, specialized tools were constraining rather than enabling the model, or more time is spent maintaining scaffolding than improving outcomes. Avoid reduction when underlying data is messy or poorly documented, the domain requires specialized knowledge the model lacks, safety constraints must limit agent actions, or operations genuinely benefit from structured workflows.
+**Build for Future Models**
+Design minimal architectures that benefit from model improvements rather than sophisticated architectures that lock in current limitations. Ask whether each tool enables new capabilities or constrains reasoning the model could handle on its own -- tools built as "guardrails" often become liabilities as models improve.
+See [Architectural Reduction Case Study](./references/architectural_reduction.md) for production evidence.
+### Tool Description Engineering
+**Description Structure**
+Structure every tool description to answer four questions:
+1. What does the tool do? State exactly what the tool accomplishes -- avoid vague language like "helps with" or "can be used for."
+2. When should it be used? Specify direct triggers ("User asks about pricing") and indirect signals ("Need current market rates").
+3. What inputs does it accept? Describe each parameter with types, constraints, defaults, and format examples.
+4. What does it return? Document the output format, structure, successful response examples, and error conditions.
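+One way to keep any of the four answers from being silently omitted is to assemble descriptions from named fields. `ToolDescription` below is a hypothetical helper, not part of any agent framework; the example triggers reuse the pricing scenario above:
+```python
+from dataclasses import dataclass, field
+@dataclass
+class ToolDescription:
+    """Hypothetical container for the four questions plus error guidance."""
+    what: str                   # what the tool does, stated exactly
+    when: list[str]             # direct triggers and indirect signals
+    inputs: dict[str, str]      # parameter -> type, constraints, default, example
+    returns: str                # output shape
+    errors: dict[str, str] = field(default_factory=dict)  # code -> recovery hint
+    def render(self) -> str:
+        """Render the description text; refuse if a question is unanswered."""
+        gaps = [name for name, value in (("what", self.what), ("when", self.when),
+                                         ("inputs", self.inputs), ("returns", self.returns))
+                if not value]
+        if gaps:
+            raise ValueError(f"Description is missing: {', '.join(gaps)}")
+        lines = [self.what, "Use when:"]
+        lines += [f"- {trigger}" for trigger in self.when]
+        lines.append("Args:")
+        lines += [f"    {name}: {spec}" for name, spec in self.inputs.items()]
+        lines.append(f"Returns: {self.returns}")
+        if self.errors:
+            lines.append("Errors:")
+            lines += [f"    {code}: {hint}" for code, hint in self.errors.items()]
+        return "\n".join(lines)
+desc = ToolDescription(
+    what="Retrieve current market rates for a currency pair.",
+    when=["User asks about pricing", "Need current market rates"],
+    inputs={"pair": 'str, ISO codes joined by "/" (e.g., "EUR/USD")'},
+    returns="{'pair': str, 'rate': float, 'as_of': ISO-8601 timestamp}",
+    errors={"UNKNOWN_PAIR": "Check both currency codes and retry."},
+)
+print(desc.render())
+```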
+**Default Parameter Selection**
+Set defaults to reflect common use cases. Defaults reduce agent burden by eliminating unnecessary parameter specification and prevent errors from omitted parameters. Choose defaults that produce useful results without requiring the agent to understand every option.
+### Response Format Optimization
+Offer response format options (concise vs. detailed) because tool response size significantly impacts context usage. Concise format returns essential fields only, suitable for confirmations. Detailed format returns complete objects, suitable when full context drives decisions. Document when to use each format in the tool description so agents learn to select appropriately.
+### Error Message Design
+Design error messages for two audiences: developers debugging issues and agents recovering from failures. For agents, every error message must be actionable -- it must state what went wrong and how to correct it. Include retry guidance for retryable errors, corrected format examples for input errors, and specific missing fields for incomplete requests. An error that says only "failed" provides zero recovery signal.
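+A minimal sketch of an agent-facing validation error that carries the invalid value, the expected format, a concrete example, and retry guidance. The helper name and the error fields are illustrative assumptions:
+```python
+import re
+from typing import Optional
+def validate_field(name: str, value: str, pattern: str,
+                   human_pattern: str, example: str) -> Optional[dict]:
+    """Return None if `value` fully matches `pattern`, else an actionable error."""
+    if re.fullmatch(pattern, value):
+        return None
+    return {
+        "error": {
+            "code": f"INVALID_{name.upper()}",
+            "message": f"{name}={value!r} does not match {human_pattern}",
+            "expected_format": human_pattern,
+            "example": example,
+            "resolution": f"Retry with a {name} like {example!r}.",
+            "retryable": True,
+        }
+    }
+print(validate_field("customer_id", "CUST-123", r"CUST-\d{6}", "CUST-######", "CUST-000001"))
+```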
+### Tool Definition Schema
+Establish a consistent schema across all tools. Use verb-noun pattern for tool names (`get_customer`, `create_order`), consistent parameter names across tools (always `customer_id`, never sometimes `id` and sometimes `identifier`), and consistent return field names. Consistency reduces the cognitive load on agents and improves cross-tool generalization.
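+Naming drift is easy to catch with a small lint pass. This sketch assumes a hand-maintained verb list and a table of non-canonical parameter names for a customer-centric tool set (`ALLOWED_VERBS` and `CANONICAL_PARAMS` are hypothetical):
+```python
+import re
+ALLOWED_VERBS = {"get", "list", "create", "update", "delete", "search"}
+# Non-canonical parameter name -> the name every tool should use instead.
+CANONICAL_PARAMS = {"id": "customer_id", "identifier": "customer_id"}
+def lint_tool(name: str, params: list[str]) -> list[str]:
+    """Return human-readable problems with a tool's name and parameters."""
+    problems = []
+    if not re.fullmatch(r"[a-z]+(_[a-z0-9]+)+", name):
+        problems.append(f"{name}: use snake_case verb_noun, e.g. get_customer")
+    elif name.split("_", 1)[0] not in ALLOWED_VERBS:
+        problems.append(f"{name}: start with one of {sorted(ALLOWED_VERBS)}")
+    for p in params:
+        if p in CANONICAL_PARAMS:
+            problems.append(f"{name}: rename parameter {p!r} to {CANONICAL_PARAMS[p]!r}")
+    return problems
+print(lint_tool("update_customer", ["identifier"]))
+# ["update_customer: rename parameter 'identifier' to 'customer_id'"]
+```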
+### Tool Collection Design
+Limit tool collections to 10-20 tools for most applications, because research shows description overlap causes model confusion and more tools do not always lead to better outcomes. When more tools are genuinely needed, use namespacing to create logical groupings. Implement selection mechanisms: tool grouping by domain, example-based selection hints, and umbrella tools that route to specialized sub-tools.
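+An umbrella tool exposes one entry point and dispatches to specialized sub-tools, so the agent reads a single description instead of several overlapping ones. A minimal sketch with hypothetical stub sub-tools; a production router would also validate arguments per action:
+```python
+from typing import Any, Callable
+def _get_invoice(invoice_id: str) -> dict:
+    return {"invoice_id": invoice_id, "status": "paid"}  # stub lookup
+def _list_invoices(customer_id: str) -> list:
+    return [{"invoice_id": "INV-1", "customer_id": customer_id}]  # stub listing
+SUB_TOOLS: dict[str, Callable[..., Any]] = {
+    "get": _get_invoice,
+    "list": _list_invoices,
+}
+def billing(action: str, **kwargs: Any) -> Any:
+    """Umbrella billing tool. action: one of "get", "list"."""
+    handler = SUB_TOOLS.get(action)
+    if handler is None:
+        return {"error": f"Unknown action {action!r}; use one of {sorted(SUB_TOOLS)}."}
+    try:
+        return handler(**kwargs)
+    except TypeError as exc:  # wrong or missing arguments for this action
+        return {"error": f"Bad arguments for {action!r}: {exc}"}
+print(billing("refund"))
+```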
+### MCP Tool Naming Requirements
+Always use fully qualified tool names with MCP (Model Context Protocol) to avoid "tool not found" errors.
+Format: `ServerName:tool_name`
+```python
+# Correct: Fully qualified names
+"Use the BigQuery:bigquery_schema tool to retrieve table schemas."
+"Use the GitHub:create_issue tool to create issues."
+# Incorrect: Unqualified names
+"Use the bigquery_schema tool..."  # May fail with multiple servers
+```
+Without the server prefix, agents may fail to locate tools when multiple MCP servers are available. Establish naming conventions that include server context in all tool references.
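+A sketch of building qualified references and auditing for collisions before adding a new server. The registry shape (server name mapped to the bare tool names it exposes) and the server names are assumptions for illustration, not an MCP SDK structure:
+```python
+from collections import defaultdict
+def qualify(server: str, tool: str) -> str:
+    """Build a fully qualified tool reference in ServerName:tool_name form."""
+    return f"{server}:{tool}"
+def find_collisions(registry: dict[str, list[str]]) -> dict[str, list[str]]:
+    """Return bare tool names exposed by more than one server."""
+    owners: dict[str, list[str]] = defaultdict(list)
+    for server, tools in registry.items():
+        for tool in tools:
+            owners[tool].append(server)
+    return {tool: servers for tool, servers in owners.items() if len(servers) > 1}
+registry = {"GitHub": ["create_issue", "search"], "Jira": ["create_issue", "search"],
+            "BigQuery": ["bigquery_schema"]}
+print(find_collisions(registry))
+# {'create_issue': ['GitHub', 'Jira'], 'search': ['GitHub', 'Jira']}
+print(qualify("BigQuery", "bigquery_schema"))  # BigQuery:bigquery_schema
+```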
+### Using Agents to Optimize Tools
+Feed observed tool failures back to an agent to diagnose issues and improve descriptions. In production testing, this approach achieved a 40% reduction in task completion time by helping future agents avoid mistakes.
+**The Tool-Testing Agent Pattern**:
+```python
+from typing import Callable
+def optimize_tool_description(
+    tool_spec: str,
+    failure_examples: str,
+    get_agent_response: Callable[[str], str],
+) -> str:
+    """
+    Use an agent to analyze tool failures and improve descriptions.
+    Process (this function performs step 3):
+    1. Agent attempts to use tool across diverse tasks
+    2. Collect failure modes and friction points
+    3. Agent analyzes failures and proposes improvements
+    4. Test improved descriptions against same tasks
+    get_agent_response: any callable that sends a prompt to a model and
+    returns its text reply (supplied by the caller).
+    """
+    prompt = (
+        "Analyze this tool specification and the observed failures.\n"
+        f"Tool: {tool_spec}\n"
+        "Failures observed:\n"
+        f"{failure_examples}\n"
+        "Identify:\n"
+        "1. Why agents are failing with this tool\n"
+        "2. What information is missing from the description\n"
+        "3. What ambiguities cause incorrect usage\n"
+        "Propose an improved tool description that addresses these issues."
+    )
+    return get_agent_response(prompt)
+```
+This creates a feedback loop: agents using tools generate failure data, which agents then use to improve tool descriptions, which reduces future failures.
+### Testing Tool Design
+Evaluate tool designs against five criteria: unambiguity, completeness, recoverability, efficiency, and consistency. Test by presenting representative agent requests and evaluating the resulting tool calls against expected behavior.
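+A minimal evaluation harness, assuming the agent under test can be wrapped as a function from a request to a `(tool_name, args)` pair. `keyword_selector` is a stub standing in for a model call, and the cases are hypothetical:
+```python
+from typing import Callable, NamedTuple
+class Case(NamedTuple):
+    request: str
+    expected_tool: str
+    expected_args: dict
+def evaluate(select_tool: Callable[[str], tuple[str, dict]],
+             cases: list[Case]) -> tuple[float, list[str]]:
+    """Run each request through the selector; return pass rate and failure notes."""
+    failures = []
+    for case in cases:
+        tool, args = select_tool(case.request)
+        if tool != case.expected_tool:
+            failures.append(f"{case.request!r}: chose {tool}, expected {case.expected_tool}")
+        elif args != case.expected_args:
+            failures.append(f"{case.request!r}: args {args}, expected {case.expected_args}")
+    passed = len(cases) - len(failures)
+    return (passed / len(cases) if cases else 0.0), failures
+def keyword_selector(request: str) -> tuple[str, dict]:
+    """Stub selector standing in for a model call."""
+    if "customer" in request:
+        return "get_customer", {"customer_id": request.split()[-1]}
+    return "search_orders", {"query": request}
+cases = [
+    Case("show customer CUST-000001", "get_customer", {"customer_id": "CUST-000001"}),
+    Case("find late orders", "search_orders", {"query": "late orders"}),
+]
+rate, notes = evaluate(keyword_selector, cases)
+print(rate, notes)  # rate == 0.5: the second case fails on arguments
+```
+The stub fails the second case on its arguments, which is the kind of signal worth feeding back into the tool description.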
+## Practical Guidance
+### Tool Selection Framework
+When designing tool collections:
+1. Identify distinct workflows agents must accomplish
+2. Group related actions into comprehensive tools
+3. Ensure each tool has a clear, unambiguous purpose
+4. Document error cases and recovery paths
+5. Test with actual agent interactions
+## Examples
+**Example 1: Well-Designed Tool**
+```python
+def get_customer(customer_id: str, format: str = "concise"):
+    """
+    Retrieve customer information by ID.
+    Use when:
+    - User asks about specific customer details
+    - Need customer context for decision-making
+    - Verifying customer identity
+    Args:
+        customer_id: Format "CUST-######" (e.g., "CUST-000001")
+        format: "concise" for key fields, "detailed" for complete record
+    Returns:
+        Customer object with requested fields
+    Errors:
+        NOT_FOUND: Customer ID not found
+        INVALID_FORMAT: ID must match CUST-###### pattern
+    """
+```
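+The contract in that docstring can be enforced in code. This sketch assumes a hypothetical in-memory `CUSTOMERS` store and returns error dicts rather than raising, so the agent receives the `INVALID_FORMAT` and `NOT_FOUND` codes the description promises:
+```python
+import re
+# Hypothetical data store standing in for a real database.
+CUSTOMERS = {
+    "CUST-000001": {"name": "Ada Lovelace", "status": "active",
+                    "email": "ada@example.com", "created_at": "2024-01-15"},
+}
+CONCISE_FIELDS = ("name", "status")
+def get_customer(customer_id: str, format: str = "concise") -> dict:
+    """Retrieve customer information by ID, following the description above."""
+    if not re.fullmatch(r"CUST-\d{6}", customer_id):
+        return {"error": "INVALID_FORMAT",
+                "message": f"{customer_id!r} must match CUST-###### (e.g., 'CUST-000001')"}
+    record = CUSTOMERS.get(customer_id)
+    if record is None:
+        return {"error": "NOT_FOUND",
+                "message": f"No customer {customer_id}; verify the ID and retry."}
+    if format == "detailed":
+        return {"customer_id": customer_id, **record}
+    # Any other format value falls back to the concise field set.
+    return {"customer_id": customer_id, **{k: record[k] for k in CONCISE_FIELDS}}
+print(get_customer("CUST-000001"))
+print(get_customer("CUST-123"))
+```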
+**Example 2: Poor Tool Design**
+This example demonstrates several tool design anti-patterns:
+```python
+def search(query):
+    """Search the database."""
+    pass
+```
+**Problems with this design:**
+1. **Vague name**: "search" is ambiguous - search what, for what purpose?
+2. **Missing parameters**: What database? What format should query take?
+3. **No return description**: What does this function return? A list? A string? Error handling?
+4. **No usage context**: When should an agent use this versus other tools?
+5. **No error handling**: What happens if the database is unavailable?
+**Failure modes:**
+- Agents may call this tool when they should use a more specific tool
+- Agents cannot determine correct query format
+- Agents cannot interpret results
+- Agents cannot recover from failures
+## Guidelines
+1. Write descriptions that answer what, when, and what returns
+2. Use consolidation to reduce ambiguity
+3. Implement response format options for token efficiency
+4. Design error messages for agent recovery
+5. Establish and follow consistent naming conventions
+6. Limit tool count and use namespacing for organization
+7. Test tool designs with actual agent interactions
+8. Iterate based on observed failure modes
+9. Question whether each tool enables or constrains the model
+10. Prefer primitive, general-purpose tools over specialized wrappers
+11. Invest in documentation quality over tooling sophistication
+12. Build minimal architectures that benefit from model improvements
+## Gotchas
+1. **Vague descriptions**: Descriptions like "Search the database for customer information" leave too many questions unanswered. State the exact database, query format, and return shape.
+2. **Cryptic parameter names**: Parameters named `x`, `val`, or `param1` force agents to guess meaning. Use descriptive names that convey purpose without reading further documentation.
+3. **Missing error recovery guidance**: Tools that fail with generic messages like "Error occurred" provide no recovery signal. Every error response must tell the agent what went wrong and what to try next.
+4. **Inconsistent naming across tools**: Using `id` in one tool, `identifier` in another, and `customer_id` in a third creates confusion. Standardize parameter names across the entire tool collection.
+5. **MCP namespace collisions**: When multiple MCP tool providers register tools with similar names (e.g., two servers both exposing `search`), agents cannot disambiguate. Always use fully qualified `ServerName:tool_name` format and audit for collisions when adding new providers.
+6. **Tool description rot**: Descriptions become inaccurate as underlying APIs evolve -- parameters get added, return formats change, error codes shift. Treat descriptions as code: version them, review them during API changes, and test them against current behavior.
+7. **Over-consolidation**: Making a single tool handle too many workflows produces parameter lists so large that agents struggle to select the right combination. If a tool requires more than 8-10 parameters or serves fundamentally different use cases, split it.
+8. **Parameter explosion**: Too many optional parameters overwhelm agent decision-making. Each parameter the agent must evaluate adds cognitive load. Provide sensible defaults, group related options into format presets, and move rarely-used parameters into an `options` object.
+9. **Missing error context**: Error messages that say only "failed" or "invalid input" without specifying which input, why it failed, or what a valid input looks like leave agents unable to self-correct. Include the invalid value, the expected format, and a concrete example in every error response.
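+For the parameter-explosion gotcha, a sketch that keeps the common path to one required argument and moves rarely used knobs into a single optional options object. `SearchOptions` and `search_orders` are hypothetical; in a JSON tool schema the same idea becomes a nested `options` object with defaults:
+```python
+from dataclasses import asdict, dataclass
+from typing import Optional
+@dataclass(frozen=True)
+class SearchOptions:
+    """Rarely used knobs, grouped so the agent can ignore them by default."""
+    max_results: int = 20
+    include_archived: bool = False
+    sort_by: str = "created_at"
+def search_orders(query: str, options: Optional[SearchOptions] = None) -> dict:
+    """Search orders; only `query` is required."""
+    opts = options or SearchOptions()
+    # A real tool would run the search here; this stub echoes the effective settings.
+    return {"query": query, **asdict(opts)}
+print(search_orders("late shipments"))
+# {'query': 'late shipments', 'max_results': 20, 'include_archived': False, 'sort_by': 'created_at'}
+print(search_orders("late shipments", SearchOptions(include_archived=True)))
+```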
+## Integration
+This skill connects to:
+- context-fundamentals - How tools interact with context
+- multi-agent-patterns - Specialized tools per agent
+- evaluation - Evaluating tool effectiveness
+## References
+Internal references:
+- [Best Practices Reference](./references/best_practices.md) - Read when: designing a new tool from scratch or auditing an existing tool collection for quality gaps
+- [Architectural Reduction Case Study](./references/architectural_reduction.md) - Read when: considering removing specialized tools in favor of primitives, or evaluating whether a complex tool architecture is justified
+Related skills in this collection:
+- context-fundamentals - Tool context interactions
+- evaluation - Tool testing patterns
+External resources:
+- MCP (Model Context Protocol) documentation - Read when: implementing tools for multi-server agent environments or debugging tool routing failures
+- Framework tool conventions - Read when: adopting a new agent framework and need to map tool design principles to framework-specific APIs
+- API design best practices for agents - Read when: translating existing human-facing APIs into agent-facing tool interfaces
+- Vercel d0 agent architecture case study - Read when: evaluating whether to consolidate tools or seeking production evidence for architectural reduction
+---
+## Skill Metadata
+**Created**: 2025-12-20
+**Last Updated**: 2026-03-17
+**Author**: Agent Skills for Context Engineering Contributors
+**Version**: 2.0.0

package/.agents/skills/tool-design/references/architectural_reduction.md ADDED

@@ -0,0 +1,210 @@
+# Architectural Reduction: Production Evidence
+This document provides detailed evidence and implementation patterns for the architectural reduction approach to agent tool design.
+## Case Study: Text-to-SQL Agent
+A production text-to-SQL agent was rebuilt using architectural reduction principles. The original architecture used specialized tools with heavy prompt engineering and careful context management. The reduced architecture used a single bash command execution tool.
+### Original Architecture (Many Specialized Tools)
+The original system included:
+- GetEntityJoins: Find relationships between entities
+- LoadCatalog: Load data catalog information
+- RecallContext: Retrieve previous context
+- LoadEntityDetails: Get entity specifications
+- SearchCatalog: Search data catalog
+- ClarifyIntent: Clarify user intent
+- SearchSchema: Search database schema
+- GenerateAnalysisPlan: Create query plan
+- FinalizeQueryPlan: Complete query plan
+- FinalizeNoData: Handle no-data cases
+- JoinPathFinder: Find join paths
+- SyntaxValidator: Validate SQL syntax
+- FinalizeBuild: Complete query build
+- ExecuteSQL: Run SQL queries
+- FormatResults: Format query results
+- VisualizeData: Create visualizations
+- ExplainResults: Explain query results
+Each tool solved a specific problem the team anticipated the model would face. The assumption was that the model would get lost in complex schemas, make bad joins, or hallucinate table names.
+### Reduced Architecture (Two Primitive Tools)
+The reduced system included:
+- ExecuteCommand: Run arbitrary bash commands in a sandbox
+- ExecuteSQL: Run SQL queries against the database
+The agent explores the semantic layer using standard Unix tools:
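+```python
+# Illustrative sketch: `vercel_sandbox` stands in for the Vercel Sandbox SDK;
+# these Python names are assumptions, not a verified API. Top-level `await`
+# also assumes an async context (e.g., a notebook or an async function).
+from vercel_sandbox import Sandbox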
+sandbox = Sandbox.create()
+await sandbox.write_files(semantic_layer_files)
+def execute_command(command: str):
+    """Execute arbitrary bash command in sandbox."""
+    result = sandbox.exec(command)
+    return {
+        "stdout": result.stdout,
+        "stderr": result.stderr,
+        "exit_code": result.exit_code
+    }
+```
+The agent now uses `grep`, `cat`, `find`, and `ls` to navigate YAML, Markdown, and JSON files containing dimension definitions, measure calculations, and join relationships.
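+For local experiments without a hosted sandbox, the same tool shape can be approximated with Python's standard `subprocess` module. This is a stand-in under stated assumptions, not the Vercel implementation, and a plain subprocess is not a security boundary, so real isolation (a container, VM, or sandbox service) is still required before an agent gets access:
+```python
+import subprocess
+def execute_command(command: str, cwd: str = ".", timeout_s: float = 30.0) -> dict:
+    """Run a shell command and return stdout, stderr, and exit code.
+    WARNING: runs on the host with shell=True; this provides no isolation."""
+    try:
+        result = subprocess.run(command, shell=True, cwd=cwd, capture_output=True,
+                                text=True, timeout=timeout_s)
+    except subprocess.TimeoutExpired:
+        return {"stdout": "", "stderr": f"Timed out after {timeout_s}s; narrow the command.",
+                "exit_code": -1}
+    return {"stdout": result.stdout, "stderr": result.stderr,
+            "exit_code": result.returncode}
+print(execute_command("echo hello"))
+# {'stdout': 'hello\n', 'stderr': '', 'exit_code': 0}
+```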
+### Comparative Results
+| Metric | Original (17 tools) | Reduced (2 tools) | Change |
+|--------|---------------------|-------------------|--------|
+| Average execution time | 274.8s | 77.4s | 3.5x faster |
+| Success rate | 80% (4/5) | 100% (5/5) | +20 percentage points |
+| Average token usage | ~102k tokens | ~61k tokens | 37% fewer |
+| Average steps | ~12 steps | ~7 steps | 42% fewer |
+The worst case in the original architecture: 724 seconds, 100 steps, 145,463 tokens, and a failure. The reduced architecture completed the same query in 141 seconds with 19 steps and 67,483 tokens, successfully.
+## Why Reduction Works
+### File Systems Are Powerful Abstractions
+File systems have 50+ years of refinement. Standard Unix tools like `grep` are well-documented, predictable, and understood by models. Building custom tools for what Unix already solves adds complexity without value.
+### Tools Were Constraining Reasoning
+The specialized tools were solving problems the model could handle on its own:
+- Pre-filtering context the model could navigate
+- Constraining options the model could evaluate
+- Wrapping interactions in validation logic the model didn't need
+Each guardrail became a maintenance burden. Each model update required recalibrating constraints. The team spent more time maintaining scaffolding than improving the agent.
+### Good Documentation Replaces Tool Sophistication
+The semantic layer was already well-documented:
+- Dimension definitions in structured YAML
+- Measure calculations with clear naming
+- Join relationships in navigable files
+The custom tools were summarizing what was already legible. The model needed access to read the documentation directly, not abstractions on top of it.
+## Implementation Pattern
+### The File System Agent
+```python
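+# Illustrative sketch: `ai` and `sandbox` stand in for the AI SDK
+# (ToolLoopAgent, tool) and a sandbox client; these Python imports are
+# assumptions, not verified APIs. `sql_tool` is assumed to be defined
+# elsewhere, and top-level `await` assumes an async context.
+from ai import ToolLoopAgent, tool
+from sandbox import Sandbox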
+# Create sandboxed environment with your data layer
+sandbox = Sandbox.create()
+await sandbox.write_files(data_layer_files)
+# Single primitive tool
+def create_execute_tool(sandbox):
+    return tool(
+        name="execute_command",
+        description="""
+        Execute a bash command in the sandbox environment.
+        Use standard Unix tools to explore and understand the data layer:
+        - ls: List directory contents
+        - cat: Read file contents
+        - grep: Search for patterns
+        - find: Locate files
+        The sandbox contains the semantic layer documentation:
+        - /data/entities/*.yaml: Entity definitions
+        - /data/measures/*.yaml: Measure calculations
+        - /data/joins/*.yaml: Join relationships
+        - /docs/*.md: Additional documentation
+        """,
+        execute=lambda command: sandbox.exec(command)
+    )
+# Minimal agent
+agent = ToolLoopAgent(
+    model="claude-opus-4.5",
+    tools={
+        "execute_command": create_execute_tool(sandbox),
+        "execute_sql": sql_tool,
+    }
+)
+```
+### Prerequisites for Success
+This pattern works when:
+1. **Documentation quality is high**: Files are well-structured, consistently named, and contain clear definitions.
+2. **Model capability is sufficient**: The model can reason through complexity without hand-holding.
+3. **Safety constraints permit**: The sandbox limits what the agent can access and modify.
+4. **Domain is navigable**: The problem space can be explored through file inspection.
+### When Not to Use
+Reduction fails when:
+1. **Data layer is messy**: Legacy naming conventions, undocumented joins, inconsistent structure. The model will only produce bad queries faster.
+2. **Specialized knowledge is required**: Domain expertise that can't be documented in files.
+3. **Safety requires restrictions**: Operations that must be constrained for security or compliance.
+4. **Workflows are genuinely complex**: Multi-step processes that benefit from structured orchestration.
+## Design Principles
+### Addition by Subtraction
+The best agents may be the ones with the fewest tools. Every tool is a choice made for the model. Sometimes the model makes better choices when given primitive capabilities rather than constrained workflows.
+### Trust Model Reasoning
+Modern models can handle complexity. Constraining reasoning because you don't trust the model to reason is often counterproductive. Test what the model can actually do before building guardrails.
+### Invest in Context, Not Tooling
+The foundation matters more than clever tooling:
+- Clear file naming conventions
+- Well-structured documentation
+- Consistent data organization
+- Legible relationship definitions
+### Build for Future Models
+Models improve faster than tooling can keep up. An architecture optimized for today's model limitations may be over-constrained for tomorrow's model capabilities. Build minimal architectures that benefit from model improvements.
+## Evaluation Framework
+When considering architectural reduction, evaluate:
+1. **Maintenance overhead**: How much time is spent maintaining tools vs. improving outcomes?
+2. **Failure analysis**: Are failures caused by model limitations or tool constraints?
+3. **Documentation quality**: Could the model navigate your data layer directly if given access?
+4. **Constraint necessity**: Are guardrails protecting against real risks or hypothetical concerns?
+5. **Model capability**: Has the model improved since tools were designed?
+## Conclusion
+Architectural reduction is not universally applicable, but the principle challenges a common assumption: that more sophisticated tooling leads to better outcomes. Sometimes the opposite is true. Start with the simplest possible architecture, add complexity only when proven necessary, and continuously question whether tools are enabling or constraining model capabilities.
+## References
+- Vercel Engineering: "We removed 80% of our agent's tools" (December 2025)
+- AI SDK ToolLoopAgent documentation
+- Vercel Sandbox documentation

package/.agents/skills/tool-design/references/best_practices.md ADDED

@@ -0,0 +1,176 @@
+# Tool Design Best Practices
+This document provides additional best practices and guidelines for designing tools for agent systems.
+## Tool Philosophy
+Tools are the primary interface between agents and the world. Unlike traditional APIs designed for developers who understand underlying systems, tools must be designed for language models that infer intent from descriptions and generate calls from natural language requests. This fundamental difference requires rethinking how we design and document tool interfaces.
+The goal is to create tools that agents can discover, understand, and use correctly without extensive trial and error. Every ambiguity in tool definitions becomes a potential failure mode. Every unclear parameter name forces the agent to guess. Every missing example leaves the agent without guidance for edge cases.
+## Description Engineering Principles
+### Principle 1: Answer the Fundamental Questions
+Every tool description should clearly answer four questions. What does the tool do? State exactly what the tool accomplishes in specific terms, avoiding vague language like "helps with" or "can be used for." When should it be used? Provide specific triggers and contexts, including both direct triggers and indirect signals that indicate the tool's applicability. What inputs does it accept? Document parameters with types, constraints, and defaults, explaining what each parameter controls. What does it return? Describe output format and structure, including examples of successful responses and error conditions.
+### Principle 2: Use Consistent Structure
+Maintain consistent structure across all tool descriptions in your codebase. When agents encounter a new tool, they should be able to predict where to find specific information based on patterns learned from other tools. This reduces cognitive overhead and prevents errors caused by inconsistent formatting.
+A recommended structure includes a brief description in the first sentence, a detailed explanation with usage context, a parameters section with clear type information, a returns section describing output structure, and an errors section listing possible failure modes with recovery guidance.
+### Principle 3: Include Concrete Examples
+Examples bridge the gap between abstract description and actual usage. Include examples of typical calls showing common parameter combinations, examples of edge cases and how to handle them, and examples of error responses and appropriate recovery actions.
+Good examples are specific rather than generic. Instead of "Use an ID like '123'", use "Use format: 'CUST-######' (e.g., 'CUST-000001')". Instead of "Provide a date", use "Format: 'YYYY-MM-DD' (e.g., '2024-01-15')".
+## Naming Conventions
+### Parameter Naming
+Parameter names should be self-documenting. Use names that clearly indicate purpose without requiring additional explanation. Prefer full words over abbreviations except for widely understood acronyms like "id" or "url". Use consistent naming across tools for similar concepts.
+Good parameter names include customer_id, search_query, output_format, max_results, and include_details. Poor parameter names include x, val, param1, and info.
+### Enumeration Values
+When parameters accept enumerated values, use consistent naming across all tools. For boolean-style options, use prefix patterns like "include_" for affirmative options (include_history, include_metadata) and "exclude_" for negative options (exclude_archived, exclude_inactive). For categorical values, use consistent terminology like "format": "concise" | "detailed" rather than mixing "format": "short" | "long" in some tools and "format": "brief" | "complete" in others.
+## Error Message Design
+### The Dual Audience
+Error messages serve two audiences with different needs. Developers debugging issues need detailed technical information including stack traces and internal state. Agents recovering from failures need actionable guidance that tells them what went wrong and how to correct it.
+Design error messages with agent recovery as the primary consideration. Include what specifically went wrong in clear language. Provide resolution guidance describing what the agent should do next. Include corrected format for input errors. Add examples of valid input.
+### Error Message Structure
+```json
+{
+    "error": {
+        "code": "INVALID_CUSTOMER_ID",
+        "category": "validation",
+        "message": "Customer ID 'CUST-123' does not match required format",
+        "expected_format": {
+            "description": "Customer ID must be 'CUST-' followed by 6 digits (11 characters)",
+            "pattern": "CUST-######",
+            "example": "CUST-000001"
+        },
+        "resolution": "Provide a customer ID matching pattern CUST-######",
+        "retryable": true
+    }
+}
+```
+### Common Error Patterns
+Validation errors should specify what was received, what format was expected, and how to correct it. Rate limit errors should specify wait time and retry guidance. Not found errors should suggest alternative approaches or verification steps. System errors should indicate whether retry is appropriate and suggest alternatives.
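+On the consuming side, a harness can act on these fields mechanically. A minimal sketch, assuming error payloads carry `retryable` and, for rate limits, a hypothetical `retry_after_seconds` field that the structure above does not define:
+```python
+import time
+from typing import Callable
+def call_with_retries(tool: Callable[[], dict], max_attempts: int = 3,
+                      sleep: Callable[[float], None] = time.sleep) -> dict:
+    """Call `tool`, retrying only errors marked retryable, up to max_attempts calls."""
+    attempt = 1
+    while True:
+        response = tool()
+        error = response.get("error")
+        if error is None or not error.get("retryable") or attempt >= max_attempts:
+            return response
+        sleep(error.get("retry_after_seconds", 1.0))  # honor the tool's wait hint
+        attempt += 1
+responses = iter([
+    {"error": {"code": "RATE_LIMITED", "retryable": True, "retry_after_seconds": 0.01}},
+    {"result": "ok"},
+])
+print(call_with_retries(lambda: next(responses), sleep=lambda s: None))
+# {'result': 'ok'}
+```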
+## Response Format Optimization
+### The Token-Accuracy Trade-off
+Verbose responses provide comprehensive information but consume significant context tokens. Concise responses minimize token usage but may lack necessary detail. The optimal approach provides format options that allow agents to request appropriate verbosity for their needs.
+### Format Options Pattern
+```python
+from typing import Any
+def get_customer_response(customer: Any, format: str = "concise") -> dict:
+    """
+    Shape an already-retrieved customer record for the agent.
+    Args:
+        customer: Customer record exposing the attributes read below
+        format: Response format - 'concise' for key fields only,
+                'detailed' for complete customer record
+    """
+    if format not in ("concise", "detailed"):
+        raise ValueError("format must be 'concise' or 'detailed'")
+    if format == "concise":
+        return {
+            "id": customer.id,
+            "name": customer.name,
+            "status": customer.status
+        }
+    return {  # detailed
+        "id": customer.id,
+        "name": customer.name,
+        "email": customer.email,
+        "phone": customer.phone,
+        "address": customer.address,
+        "status": customer.status,
+        "created_at": customer.created_at,
+        "history": customer.history,
+        "preferences": customer.preferences
+    }
+```
+### When to Use Each Format
+Use concise format for quick verification or simple lookups, when only confirmation is needed, and in subsequent tool calls after initial retrieval. Use detailed format when making decisions based on customer data, when output becomes input for other processing, and when complete context is necessary for correctness.
+## Tool Collection Design
+### Managing Tool Proliferation
+As agent systems grow, tool collections tend to proliferate. More tools can enable more capabilities, but they also make selection harder, and overlap between tool descriptions is a common cause of model confusion. The key insight is that if a human engineer cannot definitively say which tool should be used in a given situation, an agent cannot be expected to do better.
+### Consolidation Guidelines
+Consolidate tools that represent sequential steps of one workflow into a single tool that handles the entire workflow. For example, instead of exposing list_users, list_events, and create_event separately, implement a schedule_event tool that finds availability and creates the event in one call.
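A consolidated `schedule_event` might look like the sketch below. The in-memory `Calendar` stands in for a real backend, and the signature, the 30-minute search step, and the return fields are assumptions for illustration. The point is that availability lookup and booking happen inside one call, so the agent does not have to chain three tools.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Calendar:
    """In-memory stand-in for a real calendar backend (hypothetical)."""
    # Maps user name -> list of (start, end) busy intervals
    busy: dict[str, list[tuple[datetime, datetime]]] = field(default_factory=dict)


def _is_free(cal: Calendar, user: str, start: datetime, end: datetime) -> bool:
    # Free if the proposed slot overlaps none of the user's busy intervals
    return all(end <= b_start or start >= b_end for b_start, b_end in cal.busy.get(user, []))


def schedule_event(cal: Calendar, attendees: list[str], earliest: datetime,
                   duration_minutes: int, search_hours: int = 8) -> dict:
    """Find the first slot where all attendees are free and book it in one call.

    Replaces the agent-orchestrated sequence list_users -> list_events -> create_event.
    """
    duration = timedelta(minutes=duration_minutes)
    deadline = earliest + timedelta(hours=search_hours)
    slot = earliest
    while slot + duration <= deadline:
        end = slot + duration
        if all(_is_free(cal, user, slot, end) for user in attendees):
            for user in attendees:
                cal.busy.setdefault(user, []).append((slot, end))
            return {"status": "scheduled", "start": slot.isoformat(), "end": end.isoformat()}
        slot += timedelta(minutes=30)  # try candidate slots in 30-minute steps
    return {"status": "no_availability",
            "resolution": "Try a later 'earliest' time or a shorter duration"}
```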
+Keep separate tools that have fundamentally different behaviors even if they share some functionality. Tools used in different contexts should maintain separation to prevent confusion.
+Maintain clear boundaries between tools even when they operate in similar domains. Overlapping functionality should be minimized through careful design.
+### Tool Selection Guidance
+When designing tool collections, consider what information an agent needs to make correct selections. If multiple tools could apply to a situation, clarify the distinction in descriptions. Use namespacing to create logical groupings that help agents navigate the tool space.
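A service prefix is one common way to namespace tool names. The sketch below assumes an underscore-separated prefix convention and uses hypothetical `crm_` and `billing_` tools. It groups a flat tool list by namespace, which is also how an agent can read the names.

```python
from collections import defaultdict

# Hypothetical tools namespaced by service prefix.
TOOLS = {
    "crm_search_customers": "Search CRM customers by name or email.",
    "crm_get_customer": "Retrieve one CRM customer by ID.",
    "billing_get_invoice": "Retrieve one invoice by invoice number.",
    "billing_list_invoices": "List invoices for a customer ID.",
}


def group_by_namespace(tools: dict[str, str]) -> dict[str, list[str]]:
    """Group tool names by their prefix (the text before the first underscore)."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name in sorted(tools):
        namespace = name.split("_", 1)[0]
        groups[namespace].append(name)
    return dict(groups)
```

With this convention, an agent comparing `crm_get_customer` and `billing_get_invoice` sees the service boundary in the name itself before it reads either description.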
+## Testing Tool Design
+### Evaluation Criteria
+Evaluate tool designs against clarity, completeness, recoverability, efficiency, and consistency criteria. Clarity measures whether agents can determine when to use the tool. Completeness measures whether descriptions include all necessary information. Recoverability measures whether agents can recover from errors. Efficiency measures whether tools support appropriate response formats. Consistency measures whether tools follow naming and schema conventions.
+### Agent Testing Pattern
+Test tools by presenting representative agent requests and evaluating the resulting tool calls:
+1. Prepare test cases with diverse agent requests
+2. Have an agent formulate tool calls for each request
+3. Evaluate call correctness against expected patterns
+4. Identify common failure modes
+5. Refine tool definitions based on findings
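The steps above can be sketched as a small evaluation harness. Here `formulate_call` is a placeholder for the agent under test, and the two failure categories (`wrong_tool`, `wrong_arguments`) are a deliberately coarse assumption. A real harness would likely distinguish more failure modes.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToolCall:
    name: str
    arguments: dict


@dataclass
class EvalCase:
    request: str          # Representative agent request (step 1)
    expected: ToolCall    # The call a correct agent should produce


def evaluate(cases: list[EvalCase],
             formulate_call: Callable[[str], ToolCall]) -> tuple[float, Counter]:
    """Run each request through the agent and classify the failures.

    formulate_call stands in for the agent under test (step 2).
    Returns accuracy (step 3) and a tally of failure modes (step 4).
    """
    failures: Counter = Counter()
    for case in cases:
        actual = formulate_call(case.request)
        if actual.name != case.expected.name:
            failures["wrong_tool"] += 1
        elif actual.arguments != case.expected.arguments:
            failures["wrong_arguments"] += 1
    accuracy = 1 - sum(failures.values()) / len(cases)
    return accuracy, failures
```

A high `wrong_tool` count usually points at overlapping descriptions. A high `wrong_arguments` count points at parameter names or formats (step 5).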
+## Anti-Patterns to Avoid
+### Vague Descriptions
+Bad: "Search the database for customer information." This leaves too many questions unanswered. What database? What information is available? What format should queries take?
+Good: "Retrieve customer information by ID or email. Use when user asks about specific customer details, history, or status. Returns customer object with id, name, email, account_status, and optional order history."
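Here is the good description embedded in a full tool definition. This is a sketch in the JSON-Schema style that many tool-calling APIs accept. The top-level field names (for example `input_schema` versus `parameters`) vary by provider, and the parameter names are illustrative.

```python
# Illustrative tool definition; field names vary by provider.
GET_CUSTOMER_TOOL = {
    "name": "get_customer",
    "description": (
        "Retrieve customer information by ID or email. "
        "Use when user asks about specific customer details, history, or status. "
        "Returns customer object with id, name, email, account_status, "
        "and optional order history."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "customer_id": {"type": "string", "description": "Customer ID, e.g. CUST-000001."},
            "email": {"type": "string", "description": "Customer email address."},
            "include_history": {"type": "boolean", "description": "Include order history."},
        },
        # The schema does not force exactly one of customer_id/email;
        # the tool should check that at call time and return a validation error.
    },
}
```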
+### Cryptic Parameter Names
+Bad: Parameters named x, val, or param1 force agents to guess meaning.
+Good: Parameters named customer_id, max_results, or include_history are self-documenting.
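A lightweight lint can catch cryptic names before an agent ever sees them. The pattern list below is a heuristic assumption that each codebase would tune, and the function assumes the `input_schema` layout used in the other sketches.

```python
import re

# Heuristic: names that force the agent to guess (an assumption; tune per codebase)
CRYPTIC_NAME = re.compile(r"^(?:[a-z]|val|value|data|arg\d*|param\d*)$")


def find_cryptic_params(tool: dict) -> list[str]:
    """Return parameter names in a JSON-Schema-style tool definition that look cryptic."""
    properties = tool.get("input_schema", {}).get("properties", {})
    return [name for name in properties if CRYPTIC_NAME.match(name)]
```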
+### Missing Error Handling
+Bad: Tools that fail with generic errors or no error handling.
+Good: Tools that provide specific error types, messages, and resolution guidance.
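One way to guarantee that no tool fails with a bare exception is to wrap every tool in a decorator that converts failures into structured payloads. The following is a minimal sketch. The mapping from Python exception types to error categories is an illustrative assumption, and a real tool would likely raise more specific exceptions.

```python
import functools
from typing import Any, Callable


def structured_errors(tool_fn: Callable[..., Any]) -> Callable[..., Any]:
    """Wrap a tool so failures come back as structured, agent-readable payloads.

    The exception-to-category mapping below is an illustrative assumption.
    """
    @functools.wraps(tool_fn)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        try:
            return tool_fn(*args, **kwargs)
        except KeyError as exc:
            return {"error": {"code": "NOT_FOUND", "category": "not_found",
                              "message": f"No record for key {exc}",
                              "resolution": "Verify the identifier and try again",
                              "retryable": False}}
        except ValueError as exc:
            return {"error": {"code": "INVALID_INPUT", "category": "validation",
                              "message": str(exc),
                              "resolution": "Correct the input as described in the message",
                              "retryable": True}}
        except TimeoutError:
            return {"error": {"code": "UPSTREAM_TIMEOUT", "category": "system",
                              "message": "The backing service did not respond in time",
                              "resolution": "Retry once; if it fails again, try another approach",
                              "retryable": True}}
    return wrapper
```

Exceptions outside the mapped types still propagate. That keeps genuine bugs visible to developers instead of disguising them as agent-facing errors.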
+### Inconsistent Naming
+Bad: Using id in some tools, identifier in others, customer_id in some and user_id in others for similar concepts.
+Good: Maintaining consistent naming patterns across all tools for similar concepts.
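Naming drift across a tool set can be checked mechanically once the team agrees on which names are synonyms. The synonym groups below are hypothetical examples drawn from the anti-pattern above. The check reports any group in which more than one variant is in use.

```python
# Hypothetical synonym groups: each should collapse to one canonical name.
SYNONYM_GROUPS = [
    {"id", "identifier"},
    {"customer_id", "user_id", "client_id"},
]


def find_naming_conflicts(tools: list[dict]) -> list[set[str]]:
    """Report synonym groups where more than one variant appears across the tool set."""
    used: set[str] = set()
    for tool in tools:
        used.update(tool.get("input_schema", {}).get("properties", {}))
    # A conflict means the agent sees two names for what is likely the same concept
    return [group & used for group in SYNONYM_GROUPS if len(group & used) > 1]
```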
+## Checklist for Tool Design
+Before deploying a new tool, verify that:
+- The description clearly states what the tool does and when to use it.
+- All parameters have descriptive names and clear type information.
+- Return values are documented with structure and examples.
+- Error cases are covered with actionable messages.
+- The tool follows naming conventions used elsewhere.
+- Examples demonstrate common usage patterns.
+- Format options are available if response size varies significantly.
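Some checklist items can be automated as a pre-deployment check, while others still need human review. The sketch below covers only the mechanically checkable items and assumes the `input_schema` layout used elsewhere. Treating a "Use when" clause as the signal for "states when to use it" is a heuristic, not a rule.

```python
def checklist_problems(tool: dict) -> list[str]:
    """Check a JSON-Schema-style tool definition against the mechanical checklist items.

    Return-value docs, error messages, and examples still need human review.
    """
    problems = []
    description = tool.get("description", "")
    if not description:
        problems.append("missing description")
    elif "use when" not in description.lower():
        # Heuristic: a 'Use when ...' clause is one way to state when to call the tool
        problems.append("description does not say when to use the tool")
    for name, spec in tool.get("input_schema", {}).get("properties", {}).items():
        if "type" not in spec:
            problems.append(f"parameter '{name}' has no type")
        if not spec.get("description"):
            problems.append(f"parameter '{name}' has no description")
    return problems
```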