aiblueprint-cli 1.4.59 → 1.4.60
- package/README.md +16 -36
- package/agents-config/agents/action.md +1 -1
- package/agents-config/agents/explore-codebase.md +53 -53
- package/agents-config/agents/explore-docs.md +50 -69
- package/agents-config/agents/websearch.md +36 -40
- package/agents-config/claude-config/scripts/.claude/skills/fix-on-my-computer/SKILL.md +81 -0
- package/agents-config/claude-config/scripts/CLAUDE.md +10 -4
- package/agents-config/claude-config/scripts/bun.lockb +0 -0
- package/agents-config/claude-config/scripts/package.json +22 -30
- package/agents-config/claude-config/scripts/statusline/CLAUDE.md +37 -155
- package/agents-config/claude-config/scripts/statusline/README.md +18 -94
- package/agents-config/claude-config/scripts/statusline/defaults.json +13 -10
- package/agents-config/claude-config/scripts/statusline/fixtures/mock-transcript.jsonl +4 -4
- package/agents-config/claude-config/scripts/statusline/fixtures/test-input.json +4 -4
- package/agents-config/claude-config/scripts/statusline/src/commands/interactive-config.ts +403 -0
- package/agents-config/claude-config/scripts/statusline/src/index.ts +33 -82
- package/agents-config/claude-config/scripts/statusline/src/lib/config-types.ts +7 -1
- package/agents-config/claude-config/scripts/statusline/src/lib/formatters.ts +40 -0
- package/agents-config/claude-config/scripts/statusline/src/lib/presets.ts +13 -13
- package/agents-config/claude-config/scripts/statusline/src/lib/render-pure.ts +24 -5
- package/agents-config/claude-config/scripts/statusline/statusline.config.free.json +79 -0
- package/agents-config/claude-config/scripts/statusline/statusline.config.json +77 -77
- package/agents-config/commands/prompts/create-vitejs-app.md +272 -0
- package/agents-config/commands/prompts/nextjs-add-prisma-db.md +136 -0
- package/agents-config/commands/prompts/nextjs-setup-better-auth.md +173 -0
- package/agents-config/commands/prompts/nextjs-setup-project.md +200 -0
- package/agents-config/commands/prompts/prompt.md +55 -0
- package/agents-config/commands/prompts/saas-challenge-idea.md +135 -0
- package/agents-config/commands/prompts/saas-create-architecture.md +242 -0
- package/agents-config/commands/prompts/saas-create-headline.md +132 -0
- package/agents-config/commands/prompts/saas-create-landing-copywritting.md +267 -0
- package/agents-config/commands/prompts/saas-create-legals-docs.md +176 -0
- package/agents-config/commands/prompts/saas-create-logos.md +240 -0
- package/agents-config/commands/prompts/saas-create-prd.md +195 -0
- package/agents-config/commands/prompts/saas-create-tasks.md +240 -0
- package/agents-config/commands/prompts/saas-define-pricing.md +293 -0
- package/agents-config/commands/prompts/saas-find-domain-name.md +190 -0
- package/agents-config/commands/prompts/saas-implement-landing-page.md +257 -0
- package/agents-config/commands/prompts/setup-tmux.md +160 -0
- package/agents-config/commands/prompts/tools.md +148 -0
- package/agents-config/scripts/.claude/skills/fix-on-my-computer/SKILL.md +81 -0
- package/agents-config/scripts/CLAUDE.md +37 -0
- package/agents-config/scripts/biome.json +37 -0
- package/agents-config/scripts/bun.lockb +0 -0
- package/agents-config/scripts/package.json +24 -0
- package/agents-config/scripts/statusline/CLAUDE.md +87 -0
- package/agents-config/scripts/statusline/README.md +117 -0
- package/agents-config/scripts/statusline/__tests__/context.test.ts +229 -0
- package/agents-config/scripts/statusline/__tests__/formatters.test.ts +108 -0
- package/agents-config/scripts/statusline/__tests__/statusline.test.ts +309 -0
- package/agents-config/scripts/statusline/defaults.json +82 -0
- package/agents-config/scripts/statusline/fixtures/mock-transcript.jsonl +4 -0
- package/agents-config/scripts/statusline/fixtures/test-input.json +35 -0
- package/agents-config/scripts/statusline/src/commands/interactive-config.ts +403 -0
- package/agents-config/scripts/statusline/src/index.ts +141 -0
- package/agents-config/scripts/statusline/src/lib/config-types.ts +110 -0
- package/agents-config/scripts/statusline/src/lib/config.ts +21 -0
- package/agents-config/scripts/statusline/src/lib/context.ts +103 -0
- package/agents-config/scripts/statusline/src/lib/formatters.ts +426 -0
- package/agents-config/scripts/statusline/src/lib/git.ts +100 -0
- package/agents-config/scripts/statusline/src/lib/menu-factories.ts +224 -0
- package/agents-config/scripts/statusline/src/lib/presets.ts +177 -0
- package/agents-config/scripts/statusline/src/lib/render-pure.ts +516 -0
- package/agents-config/scripts/statusline/src/lib/types.ts +36 -0
- package/agents-config/scripts/statusline/src/lib/utils.ts +15 -0
- package/agents-config/scripts/statusline/statusline.config.free.json +79 -0
- package/agents-config/scripts/statusline/statusline.config.json +79 -0
- package/agents-config/scripts/statusline/test-with-fixtures.ts +37 -0
- package/agents-config/scripts/statusline/test.ts +20 -0
- package/agents-config/scripts/statusline/tsconfig.json +27 -0
- package/agents-config/scripts/tsconfig.json +27 -0
- package/agents-config/skills/{subagent-creator → agents-managers}/SKILL.md +47 -47
- package/agents-config/skills/{subagent-creator/references/subagents.md → agents-managers/references/agents.md} +45 -45
- package/agents-config/skills/{subagent-creator → agents-managers}/references/context-management.md +20 -20
- package/agents-config/skills/{subagent-creator → agents-managers}/references/debugging-agents.md +27 -27
- package/agents-config/skills/{subagent-creator → agents-managers}/references/error-handling-and-recovery.md +19 -19
- package/agents-config/skills/{subagent-creator → agents-managers}/references/evaluation-and-testing.md +29 -29
- package/agents-config/skills/{subagent-creator → agents-managers}/references/orchestration-patterns.md +5 -5
- package/agents-config/skills/{subagent-creator/references/writing-subagent-prompts.md → agents-managers/references/writing-agent-prompts.md} +23 -23
- package/agents-config/skills/codex-environment/SKILL.md +2 -0
- package/agents-config/skills/commit/SKILL.md +2 -0
- package/agents-config/skills/create-pr/SKILL.md +2 -0
- package/agents-config/skills/environments-manager/SKILL.md +271 -0
- package/agents-config/skills/environments-manager/examples/claude/.worktreeinclude +3 -0
- package/agents-config/skills/environments-manager/examples/claude/commands/dev.md +5 -0
- package/agents-config/skills/environments-manager/examples/claude/commands/lint.md +5 -0
- package/agents-config/skills/environments-manager/examples/claude/commands/test.md +5 -0
- package/agents-config/skills/environments-manager/examples/claude/commands/typecheck.md +5 -0
- package/agents-config/skills/environments-manager/examples/claude/settings.json +24 -0
- package/agents-config/skills/environments-manager/examples/codex/environments/environment.toml +29 -0
- package/agents-config/skills/environments-manager/examples/cursor/worktrees.json +3 -0
- package/agents-config/skills/environments-manager/examples/scripts/claude-worktree-create.sh +96 -0
- package/agents-config/skills/environments-manager/examples/scripts/claude-worktree-remove.sh +66 -0
- package/agents-config/skills/environments-manager/examples/scripts/dev.sh +15 -0
- package/agents-config/skills/environments-manager/examples/scripts/worktree-down.sh +22 -0
- package/agents-config/skills/environments-manager/examples/scripts/worktree-up.sh +50 -0
- package/agents-config/skills/environments-manager/references/claude.md +156 -0
- package/agents-config/skills/environments-manager/references/codex.md +97 -0
- package/agents-config/skills/environments-manager/references/cursor.md +88 -0
- package/agents-config/skills/fix-pr-comments/SKILL.md +2 -0
- package/agents-config/skills/grill-me/SKILL.md +10 -0
- package/agents-config/skills/merge/SKILL.md +2 -0
- package/agents-config/skills/rules-manager/SKILL.md +191 -0
- package/agents-config/skills/rules-manager/references/agents-vs-claude.md +66 -0
- package/agents-config/skills/rules-manager/references/examples.md +117 -0
- package/agents-config/skills/skill-manager/SKILL.md +83 -0
- package/agents-config/skills/skill-manager/references/claude-code.md +81 -0
- package/agents-config/skills/skill-manager/references/codex.md +288 -0
- package/agents-config/skills/skill-manager/references/cursor.md +125 -0
- package/agents-config/skills/ultrathink/SKILL.md +2 -0
- package/package.json +1 -1
- package/agents-config/claude-config/scripts/statusline/data/.gitignore +0 -8
- package/agents-config/claude-config/scripts/statusline/data/.gitkeep +0 -0
- package/agents-config/claude-config/scripts/statusline/docs/ARCHITECTURE.md +0 -166
- package/agents-config/claude-config/scripts/statusline/src/tests/spend-v2.test.ts +0 -306
- package/agents-config/skills/apex/SKILL.md +0 -261
- package/agents-config/skills/apex/scripts/setup-templates.sh +0 -100
- package/agents-config/skills/apex/scripts/update-progress.sh +0 -80
- package/agents-config/skills/apex/steps/step-00-init.md +0 -267
- package/agents-config/skills/apex/steps/step-00b-branch.md +0 -126
- package/agents-config/skills/apex/steps/step-00b-economy.md +0 -244
- package/agents-config/skills/apex/steps/step-00b-interactive.md +0 -153
- package/agents-config/skills/apex/steps/step-01-analyze.md +0 -361
- package/agents-config/skills/apex/steps/step-02-plan.md +0 -264
- package/agents-config/skills/apex/steps/step-03-execute.md +0 -239
- package/agents-config/skills/apex/steps/step-04-validate.md +0 -251
- package/agents-config/skills/apex/templates/00-context.md +0 -43
- package/agents-config/skills/apex/templates/01-analyze.md +0 -10
- package/agents-config/skills/apex/templates/02-plan.md +0 -10
- package/agents-config/skills/apex/templates/03-execute.md +0 -10
- package/agents-config/skills/apex/templates/04-validate.md +0 -10
- package/agents-config/skills/apex/templates/README.md +0 -176
- package/agents-config/skills/apex/templates/step-complete.md +0 -7
- package/agents-config/skills/claude-memory/SKILL.md +0 -293
- package/agents-config/skills/claude-memory/references/comprehensive-example.md +0 -175
- package/agents-config/skills/claude-memory/references/optimize-guide.md +0 -300
- package/agents-config/skills/claude-memory/references/project-patterns.md +0 -334
- package/agents-config/skills/claude-memory/references/prompting-techniques.md +0 -411
- package/agents-config/skills/claude-memory/references/rules-directory-guide.md +0 -298
- package/agents-config/skills/claude-memory/references/section-templates.md +0 -347
- package/agents-config/skills/fix-errors/SKILL.md +0 -61
- package/agents-config/skills/fix-grammar/SKILL.md +0 -59
- package/agents-config/skills/ralph-loop/SKILL.md +0 -117
- package/agents-config/skills/ralph-loop/scripts/setup.sh +0 -278
- package/agents-config/skills/ralph-loop/steps/step-00-init.md +0 -215
- package/agents-config/skills/ralph-loop/steps/step-01-interactive-prd.md +0 -366
- package/agents-config/skills/ralph-loop/steps/step-02-create-stories.md +0 -273
- package/agents-config/skills/ralph-loop/steps/step-03-finish.md +0 -245
- package/agents-config/skills/skill-creator/LICENSE.txt +0 -202
- package/agents-config/skills/skill-creator/SKILL.md +0 -421
- package/agents-config/skills/skill-creator/package.json +0 -5
- package/agents-config/skills/skill-creator/references/output-patterns.md +0 -82
- package/agents-config/skills/skill-creator/references/progressive-disclosure-patterns.md +0 -374
- package/agents-config/skills/skill-creator/references/prompting-integration.md +0 -363
- package/agents-config/skills/skill-creator/references/real-world-examples.md +0 -513
- package/agents-config/skills/skill-creator/references/script-patterns.md +0 -385
- package/agents-config/skills/skill-creator/references/workflows.md +0 -28
- package/agents-config/skills/skill-creator/references/xml-tag-guide.md +0 -606
- package/agents-config/skills/skill-creator/scripts/init-skill.ts +0 -214
- package/agents-config/skills/skill-creator/scripts/package-skill.ts +0 -146
- package/agents-config/skills/skill-creator/scripts/validate.ts +0 -138
- package/agents-config/skills/workflow-apex-free/SKILL.md +0 -261
- package/agents-config/skills/workflow-apex-free/scripts/setup-templates.sh +0 -100
- package/agents-config/skills/workflow-apex-free/scripts/update-progress.sh +0 -80
- package/agents-config/skills/workflow-apex-free/steps/step-00-init.md +0 -267
- package/agents-config/skills/workflow-apex-free/steps/step-00b-branch.md +0 -126
- package/agents-config/skills/workflow-apex-free/steps/step-00b-economy.md +0 -244
- package/agents-config/skills/workflow-apex-free/steps/step-00b-interactive.md +0 -153
- package/agents-config/skills/workflow-apex-free/steps/step-01-analyze.md +0 -361
- package/agents-config/skills/workflow-apex-free/steps/step-02-plan.md +0 -264
- package/agents-config/skills/workflow-apex-free/steps/step-03-execute.md +0 -239
- package/agents-config/skills/workflow-apex-free/steps/step-04-validate.md +0 -251
- package/agents-config/skills/workflow-apex-free/templates/00-context.md +0 -43
- package/agents-config/skills/workflow-apex-free/templates/01-analyze.md +0 -10
- package/agents-config/skills/workflow-apex-free/templates/02-plan.md +0 -10
- package/agents-config/skills/workflow-apex-free/templates/03-execute.md +0 -10
- package/agents-config/skills/workflow-apex-free/templates/04-validate.md +0 -10
- package/agents-config/skills/workflow-apex-free/templates/README.md +0 -176
- package/agents-config/skills/workflow-apex-free/templates/step-complete.md +0 -7
package/agents-config/skills/{subagent-creator → agents-managers}/references/context-management.md
RENAMED
|
@@ -1,4 +1,4 @@
|
|
|
1
|
-
# Context Management for
|
|
1
|
+
# Context Management for Agents
|
|
2
2
|
|
|
3
3
|
<core_problem>
|
|
4
4
|
|
|
@@ -8,7 +8,7 @@
|
|
|
8
8
|
<stateless_nature>
|
|
9
9
|
LLMs are stateless by default. Each invocation starts fresh with no memory of previous interactions.
|
|
10
10
|
|
|
11
|
-
**For
|
|
11
|
+
**For agents, this means**:
|
|
12
12
|
- Long-running tasks lose context between tool calls
|
|
13
13
|
- Repeated information wastes tokens
|
|
14
14
|
- Important decisions from earlier in workflow forgotten
|
|
@@ -192,7 +192,7 @@ Retrieval trigger:
|
|
|
192
192
|
<context_switch_detection>
|
|
193
193
|
Monitor for topic changes:
|
|
194
194
|
- User switches from "fix bug" to "add feature"
|
|
195
|
-
-
|
|
195
|
+
- Agent transitions from "analysis" to "implementation"
|
|
196
196
|
- Task scope changes mid-execution
|
|
197
197
|
|
|
198
198
|
On context switch:
|
|
@@ -308,7 +308,7 @@ Summary format:
|
|
|
308
308
|
- Vector store memory
|
|
309
309
|
- Entity extraction
|
|
310
310
|
|
|
311
|
-
**Use case**: Building
|
|
311
|
+
**Use case**: Building agents that need sophisticated memory without manual implementation.
|
|
312
312
|
</langchain>
|
|
313
313
|
|
|
314
314
|
<llamaindex>
|
|
@@ -319,7 +319,7 @@ Summary format:
|
|
|
319
319
|
- Automatic chunking and indexing
|
|
320
320
|
- Retrieval augmentation
|
|
321
321
|
|
|
322
|
-
**Use case**:
|
|
322
|
+
**Use case**: Agents working with large codebases, documentation, or extensive conversation history.
|
|
323
323
|
</llamaindex>
|
|
324
324
|
|
|
325
325
|
<file_based>
|
|
@@ -331,11 +331,11 @@ Summary format:
|
|
|
331
331
|
core-facts.md # Essential project information
|
|
332
332
|
decisions.md # Key decisions and rationale
|
|
333
333
|
patterns.md # Discovered patterns and conventions
|
|
334
|
-
{
|
|
334
|
+
{agent}-state.json # Agent-specific state
|
|
335
335
|
</memory_structure>
|
|
336
336
|
|
|
337
337
|
<usage>
|
|
338
|
-
|
|
338
|
+
Agent reads relevant files at start, updates during execution, summarizes at end.
|
|
339
339
|
</usage>
|
|
340
340
|
```
|
|
341
341
|
|
|
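The read-at-start, update-during-execution, summarize-at-end cycle from the `<usage>` block can be sketched in TypeScript. This is an illustration only, not code from the package: `MemoryStore`, `InMemoryStore`, `AgentState`, and `runWithMemory` are hypothetical names, and the in-memory object stands in for the files on disk (a real version would presumably read and write `{agent}-state.json` next to `core-facts.md` and the other memory files).

```typescript
// Hypothetical storage interface; a real version would be backed by files on disk.
interface MemoryStore {
  read(name: string): string | undefined;
  write(name: string, content: string): void;
}

// In-memory stand-in so the sketch runs without touching the filesystem.
class InMemoryStore implements MemoryStore {
  private files: Record<string, string> = {};

  read(name: string): string | undefined {
    return Object.prototype.hasOwnProperty.call(this.files, name)
      ? this.files[name]
      : undefined;
  }

  write(name: string, content: string): void {
    this.files[name] = content;
  }
}

// Assumed shape of the agent-specific state file.
interface AgentState {
  completedSteps: string[];
  summary: string;
}

function runWithMemory(store: MemoryStore, agent: string, steps: string[]): AgentState {
  const stateFile = agent + "-state.json";

  // 1. Read relevant state at start (fall back to an empty state).
  const raw = store.read(stateFile);
  const state: AgentState = raw
    ? (JSON.parse(raw) as AgentState)
    : { completedSteps: [], summary: "" };

  // 2. Update during execution, persisting after every new step.
  for (const step of steps) {
    if (state.completedSteps.indexOf(step) !== -1) continue; // already done earlier
    state.completedSteps.push(step);
    store.write(stateFile, JSON.stringify(state));
  }

  // 3. Summarize at end.
  state.summary = state.completedSteps.length + " step(s) completed";
  store.write(stateFile, JSON.stringify(state));
  return state;
}
```

Persisting after every step rather than only at the end means an interrupted run still leaves usable state for the next invocation.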
@@ -343,11 +343,11 @@ Subagent reads relevant files at start, updates during execution, summarizes at
|
|
|
343
343
|
</file_based>
|
|
344
344
|
</framework_support>
|
|
345
345
|
|
|
346
|
-
<
|
|
346
|
+
<agent_patterns>
|
|
347
347
|
|
|
348
348
|
|
|
349
|
-
<
|
|
350
|
-
**For long-running or frequently-invoked
|
|
349
|
+
<stateful_agent>
|
|
350
|
+
**For long-running or frequently-invoked agents**:
|
|
351
351
|
|
|
352
352
|
```markdown
|
|
353
353
|
---
|
|
@@ -375,10 +375,10 @@ State file structure:
|
|
|
375
375
|
- Active concerns (issues to address)
|
|
376
376
|
</memory_management>
|
|
377
377
|
```
|
|
378
|
-
</
|
|
378
|
+
</stateful_agent>
|
|
379
379
|
|
|
380
|
-
<
|
|
381
|
-
**For simple, focused
|
|
380
|
+
<stateless_agent>
|
|
381
|
+
**For simple, focused agents**:
|
|
382
382
|
|
|
383
383
|
```markdown
|
|
384
384
|
---
|
|
@@ -401,12 +401,12 @@ You are a syntax validator. Check code for syntax errors.
|
|
|
401
401
|
```
|
|
402
402
|
|
|
403
403
|
**When to use stateless**: Single-purpose validators, formatters, simple transformations.
|
|
404
|
-
</
|
|
404
|
+
</stateless_agent>
|
|
405
405
|
|
|
406
406
|
<context_inheritance>
|
|
407
407
|
**Inheriting context from main chat**:
|
|
408
408
|
|
|
409
|
-
|
|
409
|
+
Agents automatically have access to:
|
|
410
410
|
- User's original request
|
|
411
411
|
- Any context provided in invocation
|
|
412
412
|
|
|
@@ -414,13 +414,13 @@ Subagents automatically have access to:
|
|
|
414
414
|
Main chat: "Review the authentication changes for security issues.
|
|
415
415
|
Context: We recently switched from JWT to session-based auth."
|
|
416
416
|
|
|
417
|
-
|
|
417
|
+
Agent receives:
|
|
418
418
|
- Task: Review authentication changes
|
|
419
419
|
- Context: Recent switch from JWT to session-based auth
|
|
420
420
|
- This context informs review focus without explicit memory management
|
|
421
421
|
```
|
|
422
422
|
</context_inheritance>
|
|
423
|
-
</
|
|
423
|
+
</agent_patterns>
|
|
424
424
|
|
|
425
425
|
<anti_patterns>
|
|
426
426
|
|
|
@@ -535,13 +535,13 @@ Use context for:
|
|
|
535
535
|
<prompt_caching_interaction>
|
|
536
536
|
|
|
537
537
|
|
|
538
|
-
Prompt caching (see [
|
|
538
|
+
Prompt caching (see [agents.md](agents.md#prompt_caching)) works best with stable context.
|
|
539
539
|
|
|
540
540
|
<cache_friendly_context>
|
|
541
541
|
**Structure context for caching**:
|
|
542
542
|
|
|
543
543
|
```markdown
|
|
544
|
-
[CACHEABLE: Stable
|
|
544
|
+
[CACHEABLE: Stable agent instructions]
|
|
545
545
|
<role>...</role>
|
|
546
546
|
<focus_areas>...</focus_areas>
|
|
547
547
|
<workflow>...</workflow>
|
|
@@ -558,7 +558,7 @@ Recent context: ...
|
|
|
558
558
|
|
|
559
559
|
<cache_invalidation>
|
|
560
560
|
**When context changes invalidate cache**:
|
|
561
|
-
-
|
|
561
|
+
- Agent prompt updated
|
|
562
562
|
- Core memory structure changed
|
|
563
563
|
- Context reorganization
|
|
564
564
|
|
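A small TypeScript sketch of the cache-friendly layout, assuming the cache matches on a shared prompt prefix: stable instructions come first and volatile context last, so two invocations share the longest possible prefix. `PromptParts`, `assemblePrompt`, and `sharedPrefixLength` are hypothetical names used for illustration only.

```typescript
// Hypothetical split of a prompt into a stable and a volatile part.
interface PromptParts {
  stableInstructions: string; // e.g. <role>, <focus_areas>, <workflow>
  dynamicContext: string;     // e.g. current task, recent context
}

// Stable content first, volatile content last.
function assemblePrompt(parts: PromptParts): string {
  return parts.stableInstructions + "\n\n" + parts.dynamicContext;
}

// Number of leading characters two prompts have in common.
function sharedPrefixLength(a: string, b: string): number {
  const limit = Math.min(a.length, b.length);
  let i = 0;
  while (i < limit && a.charAt(i) === b.charAt(i)) {
    i += 1;
  }
  return i;
}
```

If the shared prefix of two consecutive prompts is shorter than the stable instructions, something in the stable part changed (for example, an updated agent prompt), which corresponds to the invalidation cases listed above.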
package/agents-config/skills/{subagent-creator → agents-managers}/references/debugging-agents.md
RENAMED
|
@@ -1,4 +1,4 @@
|
|
|
1
|
-
# Debugging and Troubleshooting
|
|
1
|
+
# Debugging and Troubleshooting Agents
|
|
2
2
|
|
|
3
3
|
<core_challenges>
|
|
4
4
|
|
|
@@ -23,7 +23,7 @@ Impact: Behavior no single agent was designed to exhibit, hard to predict or dia
|
|
|
23
23
|
</emergent_behaviors>
|
|
24
24
|
|
|
25
25
|
<black_box_execution>
|
|
26
|
-
**
|
|
26
|
+
**Agents run in isolated contexts**.
|
|
27
27
|
|
|
28
28
|
User sees final output, not intermediate steps. Makes diagnosis harder.
|
|
29
29
|
|
|
@@ -51,9 +51,9 @@ Common issues:
|
|
|
51
51
|
|
|
52
52
|
<what_to_log>
|
|
53
53
|
Essential logging:
|
|
54
|
-
- **Input prompts**: Full
|
|
54
|
+
- **Input prompts**: Full agent prompt + user request
|
|
55
55
|
- **Tool calls**: Which tools called, parameters, results
|
|
56
|
-
- **Outputs**: Final
|
|
56
|
+
- **Outputs**: Final agent response
|
|
57
57
|
- **Metadata**: Timestamps, model version, token usage, latency
|
|
58
58
|
- **Errors**: Exceptions, tool failures, timeouts
|
|
59
59
|
- **Decisions**: Key choice points in workflow
|
|
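The logging fields listed above could be captured in a typed record, sketched here in TypeScript. The `invocation_id`, `timestamp`, `agent`, `model`, and `input.task` keys follow the sample log format in the same file; every other field name, and the helper names `newInvocationLog` and `toLogLine`, are assumptions made for illustration.

```typescript
// One tool call: which tool, with what parameters, and what came back.
interface ToolCallLog {
  tool: string;
  parameters: Record<string, unknown>;
  result: string;
}

// One log record per agent invocation. Field names past `input` are assumed.
interface InvocationLog {
  invocation_id: string;
  timestamp: string;
  agent: string;
  model: string;
  input: { task: string };
  tool_calls: ToolCallLog[];
  output: string;
  metadata: { tokens_used: number; latency_ms: number };
  errors: string[];
  decisions: string[];
}

// Creates an empty record; the ID scheme here is a placeholder, not the package's.
function newInvocationLog(agent: string, model: string, task: string, now: Date): InvocationLog {
  return {
    invocation_id: "inv_" + now.getTime(),
    timestamp: now.toISOString(),
    agent: agent,
    model: model,
    input: { task: task },
    tool_calls: [],
    output: "",
    metadata: { tokens_used: 0, latency_ms: 0 },
    errors: [],
    decisions: [],
  };
}

// Serializes a record as a single JSON line, suitable for an append-only log.
function toLogLine(entry: InvocationLog): string {
  return JSON.stringify(entry);
}
```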
@@ -63,7 +63,7 @@ Format:
|
|
|
63
63
|
{
|
|
64
64
|
"invocation_id": "inv_20251115_abc123",
|
|
65
65
|
"timestamp": "2025-11-15T14:23:01Z",
|
|
66
|
-
"
|
|
66
|
+
"agent": "security-reviewer",
|
|
67
67
|
"model": "claude-sonnet-4-5",
|
|
68
68
|
"input": {
|
|
69
69
|
"task": "Review auth.ts for security issues",
|
|
@@ -137,8 +137,8 @@ Session: workflow-20251115-abc
|
|
|
137
137
|
<tracing_implementation>
|
|
138
138
|
Generate correlation ID for each workflow:
|
|
139
139
|
- Workflow ID: unique identifier for entire user request
|
|
140
|
-
-
|
|
141
|
-
- Tool ID:
|
|
140
|
+
- Agent ID: workflow_id + agent name + sequence number
|
|
141
|
+
- Tool ID: agent_id + tool name + sequence number
|
|
142
142
|
|
|
143
143
|
Log all events with correlation IDs for end-to-end reconstruction.
|
|
144
144
|
</tracing_implementation>
|
|
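The ID scheme above (agent ID = workflow ID + agent name + sequence number, tool ID = agent ID + tool name + sequence number) can be sketched in TypeScript. The `TraceIds` class and the `/` separator are assumptions for illustration; the source only specifies which parts each ID is built from.

```typescript
// Generates hierarchical correlation IDs for one workflow.
class TraceIds {
  private readonly workflowId: string;
  private agentSeq = 0;
  // Tool sequence numbers are tracked per agent ID.
  private toolSeq: Record<string, number> = {};

  constructor(workflowId: string) {
    this.workflowId = workflowId;
  }

  // workflow_id + agent name + sequence number
  nextAgentId(agentName: string): string {
    this.agentSeq += 1;
    return this.workflowId + "/" + agentName + "/" + this.agentSeq;
  }

  // agent_id + tool name + sequence number
  nextToolId(agentId: string, toolName: string): string {
    const next = (this.toolSeq[agentId] || 0) + 1;
    this.toolSeq[agentId] = next;
    return agentId + "/" + toolName + "/" + next;
  }
}
```

Because every tool ID starts with its agent ID, and every agent ID starts with the workflow ID, filtering log events by prefix is enough to reconstruct one workflow or one agent run.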
@@ -181,17 +181,17 @@ Events:
|
|
|
181
181
|
```markdown
|
|
182
182
|
---
|
|
183
183
|
name: output-validator
|
|
184
|
-
description: Validates
|
|
184
|
+
description: Validates agent outputs for correctness, completeness, and format compliance
|
|
185
185
|
tools: Read
|
|
186
186
|
model: haiku
|
|
187
187
|
---
|
|
188
188
|
|
|
189
189
|
<role>
|
|
190
|
-
You are a validation specialist. Check
|
|
190
|
+
You are a validation specialist. Check agent outputs for quality issues.
|
|
191
191
|
</role>
|
|
192
192
|
|
|
193
193
|
<validation_checks>
|
|
194
|
-
For each
|
|
194
|
+
For each agent output:
|
|
195
195
|
1. **Format compliance**: Matches expected schema
|
|
196
196
|
2. **Completeness**: All required fields present
|
|
197
197
|
3. **Consistency**: No internal contradictions
|
|
@@ -241,7 +241,7 @@ Validation result:
|
|
|
241
241
|
**Mitigation**:
|
|
242
242
|
```markdown
|
|
243
243
|
<anti_hallucination>
|
|
244
|
-
In
|
|
244
|
+
In agent prompt:
|
|
245
245
|
- "Only reference files you've actually read"
|
|
246
246
|
- "If unsure, say so explicitly rather than guessing"
|
|
247
247
|
- "Cite specific line numbers for code references"
|
|
@@ -313,7 +313,7 @@ Before returning output:
|
|
|
313
313
|
</prompt_injection>
|
|
314
314
|
|
|
315
315
|
<workflow_incompleteness>
|
|
316
|
-
**
|
|
316
|
+
**Agent skips steps or produces partial output**.
|
|
317
317
|
|
|
318
318
|
**Symptoms**:
|
|
319
319
|
- Missing expected components
|
|
@@ -383,11 +383,11 @@ Before using a tool, ask:
|
|
|
383
383
|
|
|
384
384
|
|
|
385
385
|
<systematic_diagnosis>
|
|
386
|
-
**When
|
|
386
|
+
**When agent fails or produces unexpected output**:
|
|
387
387
|
|
|
388
388
|
<step_1>
|
|
389
389
|
**1. Reproduce the issue**
|
|
390
|
-
- Invoke
|
|
390
|
+
- Invoke agent with same inputs
|
|
391
391
|
- Document whether failure is consistent or intermittent
|
|
392
392
|
- If intermittent, run 5-10 times to identify frequency
|
|
393
393
|
</step_1>
|
|
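Step 1 asks whether a failure is consistent or intermittent and, if intermittent, how frequent. A minimal TypeScript sketch of that bookkeeping follows; it takes the recorded outcome of each repeated run (`true` meaning the run failed) and does not invoke any agent itself. `summarizeRuns` and its types are hypothetical names.

```typescript
type Reproducibility = "consistent" | "intermittent" | "not-reproduced";

interface ReproductionSummary {
  runs: number;
  failures: number;
  failureRate: number; // failures divided by runs, 0 when there are no runs
  kind: Reproducibility;
}

// `failed[i]` is true when the i-th repeated invocation reproduced the failure.
function summarizeRuns(failed: boolean[]): ReproductionSummary {
  const runs = failed.length;
  let failures = 0;
  for (const didFail of failed) {
    if (didFail) failures += 1;
  }

  let kind: Reproducibility = "intermittent";
  if (failures === 0) {
    kind = "not-reproduced";
  } else if (failures === runs) {
    kind = "consistent";
  }

  return {
    runs: runs,
    failures: failures,
    failureRate: runs === 0 ? 0 : failures / runs,
    kind: kind,
  };
}
```

For example, two failures in ten runs is reported as `intermittent` with a failure rate of 0.2.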
@@ -435,7 +435,7 @@ Before using a tool, ask:
|
|
|
435
435
|
<step_7>
|
|
436
436
|
**7. Test hypothesis**
|
|
437
437
|
- Make targeted change to prompt/input
|
|
438
|
-
- Re-run
|
|
438
|
+
- Re-run agent
|
|
439
439
|
- Observe if behavior changes as predicted
|
|
440
440
|
</step_7>
|
|
441
441
|
|
|
@@ -452,11 +452,11 @@ Before using a tool, ask:
|
|
|
452
452
|
|
|
453
453
|
- [ ] Is the failure consistent or intermittent?
|
|
454
454
|
- [ ] Does the error message indicate the problem clearly?
|
|
455
|
-
- [ ] Was there a recent change to the
|
|
455
|
+
- [ ] Was there a recent change to the agent prompt?
|
|
456
456
|
- [ ] Does the issue occur with all inputs or specific ones?
|
|
457
457
|
- [ ] Are logs available for the failed execution?
|
|
458
|
-
- [ ] Has this
|
|
459
|
-
- [ ] Are other
|
|
458
|
+
- [ ] Has this agent worked correctly in the past?
|
|
459
|
+
- [ ] Are other agents experiencing similar issues?
|
|
460
460
|
</quick_diagnostic_checklist>
|
|
461
461
|
</diagnostic_procedures>
|
|
462
462
|
|
|
@@ -464,7 +464,7 @@ Before using a tool, ask:
|
|
|
464
464
|
|
|
465
465
|
|
|
466
466
|
<issue_specificity>
|
|
467
|
-
**Problem**:
|
|
467
|
+
**Problem**: Agent too generic, produces vague outputs.
|
|
468
468
|
|
|
469
469
|
**Diagnosis**: Role definition lacks specificity, focus areas too broad.
|
|
470
470
|
|
|
@@ -482,7 +482,7 @@ Focus on OWASP Top 10, authentication flaws, and data exposure risks.
|
|
|
482
482
|
</issue_specificity>
|
|
483
483
|
|
|
484
484
|
<issue_context>
|
|
485
|
-
**Problem**:
|
|
485
|
+
**Problem**: Agent makes incorrect assumptions or misses important info.
|
|
486
486
|
|
|
487
487
|
**Diagnosis**: Context failure - relevant information not in prompt or context window.
|
|
488
488
|
|
|
@@ -493,7 +493,7 @@ Focus on OWASP Top 10, authentication flaws, and data exposure risks.
|
|
|
493
493
|
</issue_context>
|
|
494
494
|
|
|
495
495
|
<issue_workflow>
|
|
496
|
-
**Problem**:
|
|
496
|
+
**Problem**: Agent inconsistently follows process or skips steps.
|
|
497
497
|
|
|
498
498
|
**Diagnosis**: Workflow not explicit enough, no verification step.
|
|
499
499
|
|
|
@@ -545,7 +545,7 @@ Validate output matches this structure before returning.
|
|
|
545
545
|
</issue_output>
|
|
546
546
|
|
|
547
547
|
<issue_constraints>
|
|
548
|
-
**Problem**:
|
|
548
|
+
**Problem**: Agent does things it shouldn't (modifies wrong files, runs dangerous commands).
|
|
549
549
|
|
|
550
550
|
**Diagnosis**: Constraints missing or too vague.
|
|
551
551
|
|
|
@@ -564,14 +564,14 @@ Use strong modal verbs (ONLY, NEVER, ALWAYS) for critical constraints.
|
|
|
564
564
|
</issue_constraints>
|
|
565
565
|
|
|
566
566
|
<issue_tools>
|
|
567
|
-
**Problem**:
|
|
567
|
+
**Problem**: Agent uses wrong tools or uses tools inefficiently.
|
|
568
568
|
|
|
569
569
|
**Diagnosis**: Tool access too broad or tool usage guidance missing.
|
|
570
570
|
|
|
571
571
|
**Fix**:
|
|
572
572
|
```markdown
|
|
573
573
|
<tool_access>
|
|
574
|
-
This
|
|
574
|
+
This agent is read-only and should only use:
|
|
575
575
|
- Read: View file contents
|
|
576
576
|
- Grep: Search for patterns
|
|
577
577
|
- Glob: Find files
|
|
@@ -603,7 +603,7 @@ Efficient tool usage:
|
|
|
603
603
|
</anti_pattern>
|
|
604
604
|
|
|
605
605
|
<anti_pattern name="no_logging">
|
|
606
|
-
❌ Running
|
|
606
|
+
❌ Running agents with no logging, then wondering why they failed
|
|
607
607
|
|
|
608
608
|
**Fix**: Comprehensive logging is non-negotiable. Can't debug what you can't observe.
|
|
609
609
|
</anti_pattern>
|
|
@@ -680,7 +680,7 @@ Efficient tool usage:
|
|
|
680
680
|
- Success rate over time (trend line)
|
|
681
681
|
- Error type breakdown (pie chart)
|
|
682
682
|
- Latency distribution (histogram)
|
|
683
|
-
- Token usage by
|
|
683
|
+
- Token usage by agent (bar chart)
|
|
684
684
|
- Top 10 failure causes (ranked list)
|
|
685
685
|
- Invocation volume (time series)
|
|
686
686
|
</dashboards>
|
|
@@ -709,6 +709,6 @@ Efficient tool usage:
|
|
|
709
709
|
- Add common issues to anti-patterns section
|
|
710
710
|
- Update best practices based on real-world usage
|
|
711
711
|
- Create troubleshooting guides for frequent problems
|
|
712
|
-
- Share insights across
|
|
712
|
+
- Share insights across agents (similar fixes often apply)
|
|
713
713
|
</knowledge_capture>
|
|
714
714
|
</continuous_improvement>
|
|
package/agents-config/skills/{subagent-creator → agents-managers}/references/error-handling-and-recovery.md
RENAMED
|
@@ -1,4 +1,4 @@
|
|
|
1
|
-
# Error Handling and Recovery for
|
|
1
|
+
# Error Handling and Recovery for Agents
|
|
2
2
|
|
|
3
3
|
<common_failure_modes>
|
|
4
4
|
|
|
@@ -6,7 +6,7 @@
|
|
|
6
6
|
Industry research identifies these failure patterns:
|
|
7
7
|
|
|
8
8
|
<specification_problems>
|
|
9
|
-
**32% of failures**:
|
|
9
|
+
**32% of failures**: Agents don't know what to do.
|
|
10
10
|
|
|
11
11
|
**Causes**:
|
|
12
12
|
- Vague or incomplete role definition
|
|
@@ -14,7 +14,7 @@ Industry research identifies these failure patterns:
|
|
|
14
14
|
- Unclear success criteria
|
|
15
15
|
- Ambiguous constraints
|
|
16
16
|
|
|
17
|
-
**Symptoms**:
|
|
17
|
+
**Symptoms**: Agent asks clarifying questions (can't if it's a agent), makes incorrect assumptions, produces partial outputs, or fails to complete task.
|
|
18
18
|
|
|
19
19
|
**Prevention**: Explicit `<role>`, `<workflow>`, `<focus_areas>`, and `<output_format>` sections in prompt.
|
|
20
20
|
</specification_problems>
|
|
@@ -23,7 +23,7 @@ Industry research identifies these failure patterns:
|
|
|
23
23
|
**28% of failures**: Coordination breakdowns in multi-agent workflows.
|
|
24
24
|
|
|
25
25
|
**Causes**:
|
|
26
|
-
-
|
|
26
|
+
- Agents have conflicting objectives
|
|
27
27
|
- Handoff points unclear
|
|
28
28
|
- No shared context or state
|
|
29
29
|
- Assumptions about other agents' outputs
|
|
@@ -40,15 +40,15 @@ Industry research identifies these failure patterns:
|
|
|
40
40
|
- No validation step in workflow
|
|
41
41
|
- Missing output format specification
|
|
42
42
|
- No error detection logic
|
|
43
|
-
- Blind trust in
|
|
43
|
+
- Blind trust in agent outputs
|
|
44
44
|
|
|
45
45
|
**Symptoms**: Incorrect results silently propagated, hallucinations undetected, format errors break downstream processes.
|
|
46
46
|
|
|
47
|
-
**Prevention**: Include verification steps in
|
|
47
|
+
**Prevention**: Include verification steps in agent workflows, validate outputs before use, implement evaluator agents.
|
|
48
48
|
</verification_gaps>
|
|
49
49
|
|
|
50
50
|
<error_cascading>
|
|
51
|
-
**Critical pattern**: Failures in one
|
|
51
|
+
**Critical pattern**: Failures in one agent propagate to others.
|
|
52
52
|
|
|
53
53
|
**Causes**:
|
|
54
54
|
- No error handling in downstream agents
|
|
@@ -57,7 +57,7 @@ Industry research identifies these failure patterns:
|
|
|
57
57
|
|
|
58
58
|
**Symptoms**: Single failure causes entire workflow to fail.
|
|
59
59
|
|
|
60
|
-
**Prevention**: Defensive programming in
|
|
60
|
+
**Prevention**: Defensive programming in agent prompts, graceful degradation strategies, validation at boundaries.
|
|
61
61
|
</error_cascading>
|
|
62
62
|
|
|
63
63
|
<non_determinism>
|
|
@@ -103,7 +103,7 @@ Industry research identifies these failure patterns:
|
|
|
103
103
|
</graceful_degradation>
|
|
104
104
|
|
|
105
105
|
<autonomous_retry>
|
|
106
|
-
**Pattern**:
|
|
106
|
+
**Pattern**: Agent retries failed operations with exponential backoff.
|
|
107
107
|
|
|
108
108
|
<example>
|
|
109
109
|
```markdown
|
|
@@ -140,7 +140,7 @@ If API endpoint has failed 5 consecutive times:
|
|
|
140
140
|
</circuit_breaker_logic>
|
|
141
141
|
```
|
|
142
142
|
|
|
143
|
-
**Application to
|
|
143
|
+
**Application to agents**: Include in prompt when agent calls external APIs or services.
|
|
144
144
|
|
|
145
145
|
**Benefit**: Prevents wasting time/tokens on operations known to be failing.
|
|
146
146
|
</conceptual_example>
|
|
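The retry-with-exponential-backoff and circuit-breaker ideas in this file are written as prompt guidance, but the same logic can be sketched in TypeScript for an orchestrating script. The threshold of 5 consecutive failures comes from the circuit-breaker example; the base delay, the cap, and the names `backoffDelayMs` and `CircuitBreaker` are assumptions for illustration.

```typescript
// Delay before retry number `attempt` (1-based): base, 2x base, 4x base, ... capped at maxMs.
// The default base and cap are illustrative values, not taken from the source.
function backoffDelayMs(attempt: number, baseMs: number = 1000, maxMs: number = 30000): number {
  const delay = baseMs * Math.pow(2, attempt - 1);
  return Math.min(delay, maxMs);
}

// Stops calling an endpoint after `threshold` consecutive failures.
class CircuitBreaker {
  private readonly threshold: number;
  private consecutiveFailures = 0;

  constructor(threshold: number = 5) {
    this.threshold = threshold;
  }

  recordSuccess(): void {
    this.consecutiveFailures = 0; // any success closes the circuit again
  }

  recordFailure(): void {
    this.consecutiveFailures += 1;
  }

  // When open, callers should skip the operation instead of retrying it.
  isOpen(): boolean {
    return this.consecutiveFailures >= this.threshold;
  }
}
```

A caller would check `isOpen()` before each attempt, wait `backoffDelayMs(attempt)` between retries, and report the endpoint as unavailable once the circuit opens.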
@@ -162,7 +162,7 @@ For long-running operations:
|
|
|
162
162
|
</timeout_handling>
|
|
163
163
|
```
|
|
164
164
|
|
|
165
|
-
**Note**: Claude Code has built-in timeouts for tool calls.
|
|
165
|
+
**Note**: Claude Code has built-in timeouts for tool calls. Agent prompts should include guidance on what to do when operations approach reasonable time limits.
|
|
166
166
|
</implementation>
|
|
167
167
|
</timeouts>
|
|
168
168
|
|
|
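One way to act on "approaching a reasonable time limit" outside the prompt is a small time budget tracked by the calling script, sketched below in TypeScript. This does not model Claude Code's built-in tool timeouts; `TimeBudget`, the 80% wrap-up fraction, and the injectable clock are all assumptions for illustration.

```typescript
// Tracks elapsed time against a limit and signals when to start wrapping up.
class TimeBudget {
  private readonly limitMs: number;
  private readonly warnFraction: number;
  private readonly now: () => number;
  private readonly startedAt: number;

  // `now` is injectable so the class can be tested without real waiting.
  constructor(limitMs: number, warnFraction: number = 0.8, now: () => number = () => Date.now()) {
    this.limitMs = limitMs;
    this.warnFraction = warnFraction;
    this.now = now;
    this.startedAt = now();
  }

  elapsedMs(): number {
    return this.now() - this.startedAt;
  }

  // True once the assumed warning fraction of the budget is used up:
  // a cue to stop starting new work and return partial results.
  shouldWrapUp(): boolean {
    return this.elapsedMs() >= this.limitMs * this.warnFraction;
  }

  // True once the whole budget is spent.
  isExpired(): boolean {
    return this.elapsedMs() >= this.limitMs;
  }
}
```

Separating `shouldWrapUp()` from `isExpired()` leaves room to summarize partial progress before the hard limit is reached.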
@@ -204,7 +204,7 @@ Know when to escalate rather than thrashing.
|
|
|
204
204
|
</escalation_workflow>
|
|
205
205
|
```
|
|
206
206
|
|
|
207
|
-
**Key insight**:
|
|
207
|
+
**Key insight**: Agents should recognize their limitations and provide useful handoff information.
|
|
208
208
|
</example>
|
|
209
209
|
</reassigning_tasks>
|
|
210
210
|
</recovery_strategies>
|
|
@@ -304,7 +304,7 @@ Before returning output:
 ```markdown
 Invocation ID: abc-123-def
 Timestamp: 2025-11-15T14:23:01Z
-
+Agent: security-reviewer
 Model: sonnet-4.5
 Input: "Review changes in commit a3f2b1c"
 Tool calls:
@@ -365,13 +365,13 @@ Main chat [abc123]:
 ```markdown
 ---
 name: output-validator
-description: Validates
+description: Validates agent outputs against expected schemas and quality criteria. Use after any agent produces structured output.
 tools: Read
 model: haiku
 ---
 
 <role>
-You are an output validation specialist. Check
+You are an output validation specialist. Check agent outputs for:
 - Schema compliance
 - Completeness
 - Internal consistency
@@ -379,7 +379,7 @@ You are an output validation specialist. Check subagent outputs for:
 </role>
 
 <workflow>
-1. Receive
+1. Receive agent output and expected schema
 2. Validate structure matches schema
 3. Check for required fields
 4. Verify value constraints (enums, formats, ranges)
@@ -403,7 +403,7 @@ Partial: Minor issues that don't prevent use - flag warnings
 
 
 <anti_pattern name="silent_failures">
-❌
+❌ Agent fails but doesn't indicate failure in output
 
 **Example**:
 ```markdown
@@ -416,7 +416,7 @@ Output: "No issues found" (incomplete review, but looks successful)
 </anti_pattern>
 
 <anti_pattern name="no_fallback">
-❌ When ideal path fails,
+❌ When ideal path fails, agent gives up entirely
 
 **Example**:
 ```markdown
@@ -474,7 +474,7 @@ Total workflow failure from single upstream error
 <recovery_checklist>
 
 
-Include these patterns in
+Include these patterns in agent prompts:
 
 **Error detection**:
 - [ ] Validate inputs before processing
@@ -1,4 +1,4 @@
-# Evaluation and Testing for
+# Evaluation and Testing for Agents
 
 <evaluation_framework>
 
@@ -7,15 +7,15 @@
 **Primary metric**: Proportion of tasks completed correctly and satisfactorily.
 
 Measure:
-- Did the
+- Did the agent complete the requested task?
 - Did it produce the expected output?
 - Would a human consider the task "done"?
 
-**Testing approach**: Create test cases with known expected outcomes, invoke
+**Testing approach**: Create test cases with known expected outcomes, invoke agent, compare results.
 </task_completion>
 
 <tool_correctness>
-**Secondary metric**: Whether
+**Secondary metric**: Whether agent calls correct tools for given task.
 
 Measure:
 - Are tool selections appropriate for the task?
@@ -26,7 +26,7 @@ Measure:
 </tool_correctness>
 
 <output_quality>
-**Quality metric**: Assess quality of
+**Quality metric**: Assess quality of agent-generated outputs.
 
 Measure:
 - Accuracy of analysis
@@ -38,7 +38,7 @@ Measure:
 </output_quality>
 
 <robustness>
-**Resilience metric**: How well
+**Resilience metric**: How well agent handles failures and edge cases.
 
 Measure:
 - Graceful handling of missing files
@@ -81,7 +81,7 @@ Evaluate the security review output on a 1-5 scale:
 Think step-by-step about which vulnerabilities were checked and which were missed.
 ```
 
-**Implementation**: Pass
+**Implementation**: Pass agent output and criteria to Claude, get structured evaluation.
 </example>
 
 **When to use**: Complex quality metrics that can't be measured programmatically (thoroughness, insight quality, appropriateness of recommendations).
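The "pass agent output and criteria, get structured evaluation" step in the hunk above could look roughly like the following Python sketch. The 1-5 scale is taken from the hunk header. Everything else is hypothetical: `evaluate_output`, `EVAL_TEMPLATE`, the JSON reply shape, and the `judge` callable, which stands in for whatever model call is actually used (no real API is invoked here).

```python
import json
from typing import Callable, Dict, List

# Hypothetical prompt template; the wording is illustrative, not from the package.
EVAL_TEMPLATE = """You are evaluating output produced by an AI agent.

Agent output:
{agent_output}

Criteria:
{criteria}

Respond with JSON only: {{"score": <integer 1-5>, "reasoning": "<short explanation>"}}"""


def evaluate_output(
    agent_output: str,
    criteria: List[str],
    judge: Callable[[str], str],
) -> Dict[str, object]:
    """Build the evaluation prompt, call the judge, and validate its reply."""
    prompt = EVAL_TEMPLATE.format(
        agent_output=agent_output,
        criteria="\n".join(f"- {item}" for item in criteria),
    )
    raw = judge(prompt)
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return {"score": None, "reasoning": "", "error": "judge reply was not valid JSON"}
    if not isinstance(parsed, dict):
        return {"score": None, "reasoning": "", "error": "judge reply was not a JSON object"}
    score = parsed.get("score")
    # Reject booleans (a subclass of int) and anything outside the 1-5 scale.
    if isinstance(score, bool) or not isinstance(score, int) or not 1 <= score <= 5:
        return {"score": None, "reasoning": "", "error": "score missing or outside 1-5"}
    return {"score": score, "reasoning": str(parsed.get("reasoning", "")), "error": None}
```

Returning an explicit `error` field rather than raising keeps a malformed judge reply visible to the caller instead of silently counting as a pass.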
@@ -99,11 +99,11 @@ Think step-by-step about which vulnerabilities were checked and which were misse
 - Edge cases (boundary conditions, unusual inputs)
 - Error conditions (missing data, tool failures)
 - Adversarial inputs (malformed, malicious)
-2. Invoke
+2. Invoke agent with each test case
 3. Compare outputs to expected results
 4. Document failures and iterate on prompt
 
-**Example test suite for code-reviewer
+**Example test suite for code-reviewer agent**:
 ```markdown
 Test 1 (Happy path): Recent commit with SQL injection vulnerability
 Expected: Identifies SQL injection, provides fix, rates as Critical
@@ -120,7 +120,7 @@ Expected: Identifies pattern despite obfuscation
 </offline_testing>
 
 <simulation>
-**Simulation testing**: Run
+**Simulation testing**: Run agent in realistic but controlled environments.
 
 **Use cases**:
 - Testing against historical issues (can it find bugs that were previously fixed?)
@@ -147,7 +147,7 @@ Expected: Identifies pattern despite obfuscation
 <evaluation_driven_development>
 
 
-**Philosophy**: Integrate evaluation throughout
+**Philosophy**: Integrate evaluation throughout agent lifecycle, not just at validation stage.
 
 <workflow>
 1. **Initial creation**: Define success criteria before writing prompt
@@ -158,16 +158,16 @@ Expected: Identifies pattern despite obfuscation
 6. **Continuous**: Ongoing evaluation → feedback → refinement cycles
 </workflow>
 
-**Anti-pattern**: Writing
+**Anti-pattern**: Writing agent, deploying, never measuring effectiveness or iterating.
 
-**Best practice**: Treat
+**Best practice**: Treat agent prompts as living documents that evolve based on real-world performance data.
 </evaluation_driven_development>
 
 <testing_checklist>
 
 
 <before_deployment>
-Before deploying a
+Before deploying a agent, complete this validation:
 
 **Basic functionality**:
 - [ ] Invoke with representative task, verify completion
@@ -219,11 +219,11 @@ Synthetic data generation useful for:
 ```markdown
 Persona: Junior developer
 Task: "Fix the bug where the login page crashes"
-Expected behavior:
+Expected behavior: Agent provides detailed debugging steps
 
 Persona: Senior engineer
 Task: "Investigate authentication flow security"
-Expected behavior:
+Expected behavior: Agent performs deep security analysis
 ```
 
 **Scenario simulation**: Generate variations of common scenarios.
@@ -255,14 +255,14 @@ Maintain a validation set of real usage examples. Synthetic data can miss:
 
 
 <basic_pattern>
-Use LLM to evaluate
+Use LLM to evaluate agent outputs when human review is impractical at scale.
 
 **Example evaluation prompt**:
 ```markdown
-You are evaluating a security code review performed by an AI
+You are evaluating a security code review performed by an AI agent.
 
 Review output:
-{
+{agent_output}
 
 Code that was reviewed:
 {code}
@@ -288,8 +288,8 @@ Expected vulnerabilities in test code:
 2. XSS vulnerability on line 67
 3. Missing authentication check on line 103
 
-
-{
+Agent identified:
+{agent_findings}
 
 Calculate:
 - Precision: % of identified issues that are real
@@ -305,15 +305,15 @@ Calculate:
 Anthropic guidance: "Test-driven development becomes even more powerful with agentic coding."
 
 <approach>
-**Before writing
+**Before writing agent prompt**:
 1. Define expected input/output pairs
-2. Create test cases that
+2. Create test cases that agent must pass
 3. Write initial prompt
 4. Run tests, observe failures
 5. Refine prompt based on failures
 6. Repeat until all tests pass
 
-**Example for test-writer
+**Example for test-writer agent**:
 ```markdown
 Test 1:
 Input: Function that adds two numbers
@@ -331,7 +331,7 @@ Expected output: Test file with:
 - Mocked HTTP calls (no real API calls)
 ```
 
-**Invoke
+**Invoke agent → check if outputs match expectations → iterate on prompt.**
 </approach>
 
 **Benefit**: Clear acceptance criteria before development, objective measure of prompt quality.
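The invoke, check, iterate loop described in the hunk above can be approximated with a small offline harness. This is a minimal sketch under stated assumptions: `AgentTestCase`, `run_suite`, and the phrase-matching check are hypothetical names and choices for illustration, and `agent` is any callable that maps a task string to an output string. Real suites would likely need richer comparisons than substring checks.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class AgentTestCase:
    """One expected input/output pair, defined before the prompt is written."""
    name: str
    task: str
    expected_phrases: List[str]


def run_suite(agent: Callable[[str], str], cases: List[AgentTestCase]) -> List[str]:
    """Invoke the agent on each case and return human-readable failure descriptions."""
    failures: List[str] = []
    for case in cases:
        try:
            output = agent(case.task)
        except Exception as exc:
            # A crash is recorded as a failure instead of aborting the whole run.
            failures.append(f"{case.name}: agent raised {exc!r}")
            continue
        lowered = output.lower()
        missing = [p for p in case.expected_phrases if p.lower() not in lowered]
        if missing:
            failures.append(f"{case.name}: missing {missing}")
    return failures
```

An empty list means every case passed; otherwise the failure descriptions are the input to the next round of prompt refinement.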
@@ -341,9 +341,9 @@ Expected output: Test file with:
 
 
 <anti_pattern name="no_testing">
-❌ Deploying
+❌ Deploying agents without any validation
 
-**Risk**:
+**Risk**: Agent fails on real tasks, wastes user time, damages trust.
 
 **Fix**: Minimum viable testing = invoke with 3 representative tasks before deploying.
 </anti_pattern>
@@ -351,7 +351,7 @@ Expected output: Test file with:
 <anti_pattern name="only_happy_path">
 ❌ Testing only ideal scenarios
 
-**Risk**:
+**Risk**: Agent fails on edge cases, error conditions, or unusual (but valid) inputs.
 
 **Fix**: Test matrix covering happy path, edge cases, and error conditions.
 </anti_pattern>
@@ -367,7 +367,7 @@ Expected output: Test file with:
 <anti_pattern name="test_once_deploy_forever">
 ❌ Testing once at creation, never revisiting
 
-**Risk**:
+**Risk**: Agent degrades over time as usage patterns shift, codebases change, or models update.
 
 **Fix**: Periodic re-evaluation with current usage patterns and edge cases.
 </anti_pattern>