@jaguilar87/gaia-ops 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (91)
  1. package/CHANGELOG.md +315 -0
  2. package/CLAUDE.md +154 -0
  3. package/LICENSE +21 -0
  4. package/README.md +221 -0
  5. package/agents/aws-troubleshooter.md +50 -0
  6. package/agents/claude-architect.md +821 -0
  7. package/agents/devops-developer.md +92 -0
  8. package/agents/gcp-troubleshooter.md +50 -0
  9. package/agents/gitops-operator.md +360 -0
  10. package/agents/terraform-architect.md +289 -0
  11. package/bin/gaia-init.js +620 -0
  12. package/commands/architect.md +97 -0
  13. package/commands/restore-session.md +87 -0
  14. package/commands/save-session.md +88 -0
  15. package/commands/session-status.md +61 -0
  16. package/commands/speckit.add-task.md +144 -0
  17. package/commands/speckit.analyze-task.md +65 -0
  18. package/commands/speckit.implement.md +96 -0
  19. package/commands/speckit.init.md +237 -0
  20. package/commands/speckit.plan.md +88 -0
  21. package/commands/speckit.specify.md +161 -0
  22. package/commands/speckit.tasks.md +188 -0
  23. package/config/AGENTS.md +162 -0
  24. package/config/agent-catalog.md +604 -0
  25. package/config/context-contracts.md +682 -0
  26. package/config/git-standards.md +674 -0
  27. package/config/git_standards.json +69 -0
  28. package/config/orchestration-workflow.md +735 -0
  29. package/hooks/__pycache__/post_tool_use.cpython-312.pyc +0 -0
  30. package/hooks/__pycache__/pre_kubectl_security.cpython-312.pyc +0 -0
  31. package/hooks/__pycache__/pre_tool_use.cpython-312.pyc +0 -0
  32. package/hooks/__pycache__/session_start.cpython-312.pyc +0 -0
  33. package/hooks/__pycache__/subagent_stop.cpython-312.pyc +0 -0
  34. package/hooks/post_tool_use.py +463 -0
  35. package/hooks/pre_kubectl_security.py +205 -0
  36. package/hooks/pre_tool_use.py +530 -0
  37. package/hooks/session_start.py +315 -0
  38. package/hooks/subagent_stop.py +549 -0
  39. package/index.js +92 -0
  40. package/package.json +59 -0
  41. package/speckit/README.en.md +648 -0
  42. package/speckit/README.md +353 -0
  43. package/speckit/governance.md +169 -0
  44. package/speckit/scripts/check-prerequisites.sh +194 -0
  45. package/speckit/scripts/common.sh +126 -0
  46. package/speckit/scripts/create-new-feature.sh +131 -0
  47. package/speckit/scripts/init.sh +42 -0
  48. package/speckit/scripts/setup-plan.sh +95 -0
  49. package/speckit/scripts/update-agent-context.sh +718 -0
  50. package/speckit/templates/adr-template.md +118 -0
  51. package/speckit/templates/agent-file-template.md +23 -0
  52. package/speckit/templates/plan-template.md +233 -0
  53. package/speckit/templates/spec-template.md +116 -0
  54. package/speckit/templates/tasks-template-bkp.md +136 -0
  55. package/speckit/templates/tasks-template.md +345 -0
  56. package/templates/CLAUDE.template.md +170 -0
  57. package/templates/code-examples/approval_gate_workflow.py +141 -0
  58. package/templates/code-examples/clarification_workflow.py +94 -0
  59. package/templates/code-examples/commit_validation.py +86 -0
  60. package/templates/project-context.template.json +126 -0
  61. package/templates/settings.template.json +307 -0
  62. package/tools/__pycache__/agent_router.cpython-312.pyc +0 -0
  63. package/tools/__pycache__/approval_gate.cpython-312.pyc +0 -0
  64. package/tools/__pycache__/clarify_engine.cpython-312.pyc +0 -0
  65. package/tools/__pycache__/clarify_patterns.cpython-312.pyc +0 -0
  66. package/tools/__pycache__/commit_validator.cpython-312.pyc +0 -0
  67. package/tools/__pycache__/context_section_reader.cpython-312.pyc +0 -0
  68. package/tools/__pycache__/routing_dashboard.cpython-312.pyc +0 -0
  69. package/tools/__pycache__/routing_feedback.cpython-312.pyc +0 -0
  70. package/tools/__pycache__/semantic_matcher.cpython-312.pyc +0 -0
  71. package/tools/__pycache__/task_manager.cpython-312.pyc +0 -0
  72. package/tools/agent_capabilities.json +231 -0
  73. package/tools/agent_invoker_helper.py +239 -0
  74. package/tools/agent_router.py +730 -0
  75. package/tools/approval_gate.py +318 -0
  76. package/tools/clarify_engine.py +511 -0
  77. package/tools/clarify_patterns.py +356 -0
  78. package/tools/commit_validator.py +338 -0
  79. package/tools/context_provider.py +181 -0
  80. package/tools/context_section_reader.py +301 -0
  81. package/tools/demo_clarify.py +104 -0
  82. package/tools/generate_embeddings.py +168 -0
  83. package/tools/quicktriage_aws_troubleshooter.sh +45 -0
  84. package/tools/quicktriage_devops_developer.sh +38 -0
  85. package/tools/quicktriage_gcp_troubleshooter.sh +51 -0
  86. package/tools/quicktriage_gitops_operator.sh +47 -0
  87. package/tools/quicktriage_terraform_architect.sh +40 -0
  88. package/tools/semantic_matcher.py +222 -0
  89. package/tools/task_manager.py +547 -0
  90. package/tools/task_manager_README.md +395 -0
  91. package/tools/task_manager_example.py +215 -0
@@ -0,0 +1,604 @@
1
+ # Agent Catalog
2
+
3
+ **Version:** 2.0.0
4
+ **Last Updated:** 2025-11-07
5
+ **Parent:** CLAUDE.md
6
+
7
+ This document provides a comprehensive catalog of all specialist agents in the system, including their capabilities, security tiers, use cases, and invocation patterns.
8
+
9
+ ---
10
+
11
+ ## Agent Classification
12
+
13
+ ### Project Agents (Use context_provider.py)
14
+
15
+ These agents work on **USER PROJECTS** (infrastructure, GitOps, applications). They MUST receive context via `context_provider.py`.
16
+
17
+ | Agent | Primary Role | Security Tier | Model |
18
+ |-------|--------------|---------------|-------|
19
+ | **terraform-architect** | Terraform/Terragrunt validation & realization | T0-T3 | inherit |
20
+ | **gitops-operator** | Kubernetes/Flux operations & realization | T0-T3 | inherit |
21
+ | **gcp-troubleshooter** | GCP diagnostics | T0-T2 | inherit |
22
+ | **aws-troubleshooter** | AWS diagnostics | T0-T2 | inherit |
23
+ | **devops-developer** | Application build/test/debug | T0-T2 | inherit |
24
+
25
+ ---
26
+
27
+ ### Meta-Agents (Direct invocation, NO context_provider.py)
28
+
29
+ These agents work on the **CLAUDE CODE SYSTEM ITSELF** (agent orchestration, tooling, optimization). They receive MANUAL context in the prompt.
30
+
31
+ | Agent | Primary Role | Context Type | Model |
32
+ |-------|--------------|--------------|-------|
33
+ | **claude-architect** | System analysis & optimization | Manual (logs, .claude/, tests) | inherit |
34
+ | **Explore** | Codebase exploration for understanding | Automatic (file patterns) | haiku |
35
+ | **Plan** | Planning mode for implementation design | Automatic (user prompt) | inherit |
36
+
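The split between the two tables above can be expressed as a small lookup. This is only a sketch: the agent names come from the tables, while the helper name `needs_context_provider` is hypothetical and not part of the package.

```python
# Agent names are taken from the two classification tables.
PROJECT_AGENTS = {
    "terraform-architect",
    "gitops-operator",
    "gcp-troubleshooter",
    "aws-troubleshooter",
    "devops-developer",
}
META_AGENTS = {"claude-architect", "Explore", "Plan"}


def needs_context_provider(agent: str) -> bool:
    """Return True for project agents, False for meta-agents.

    Hypothetical helper for illustration; unknown names raise so that a
    typo is not silently treated as a meta-agent.
    """
    if agent in PROJECT_AGENTS:
        return True
    if agent in META_AGENTS:
        return False
    raise ValueError(f"unknown agent: {agent}")
```

For example, `needs_context_provider("gitops-operator")` is true, while `needs_context_provider("Explore")` is false.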
37
+ ---
38
+
39
+ ## Security Tiers
40
+
41
+ ### Tier Definitions
42
+
43
+ | Tier | Operations | Approval Required | Examples |
44
+ |------|-----------|-------------------|----------|
45
+ | **T0** | Read-only queries | No | `kubectl get`, `git status`, `terraform show` |
46
+ | **T1** | Local changes only | No | File edits, local git commits |
47
+ | **T2** | Reversible remote operations | No | `git push` to feature branch, `terraform plan` |
48
+ | **T3** | Irreversible operations | **YES** | `git push` to main, `terraform apply`, `kubectl apply` |
49
+
50
+ **Critical Rule:** T3 operations REQUIRE the Phase 4 Approval Gate (MANDATORY).
51
+
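A minimal sketch of how the tier table might be applied to a command string. The prefix rules below are assumptions drawn from the Examples column, the `git push origin main` form is just one way a push to main could look, and treating unknown commands as T3 is a design choice of this sketch rather than documented behavior.

```python
from enum import IntEnum


class Tier(IntEnum):
    T0 = 0  # read-only queries
    T1 = 1  # local changes only
    T2 = 2  # reversible remote operations
    T3 = 3  # irreversible operations, approval required


# Illustrative prefix rules, most specific first (assumed, not the package's list).
TIER_RULES = [
    ("terraform apply", Tier.T3),
    ("terragrunt apply", Tier.T3),
    ("kubectl apply", Tier.T3),
    ("git push origin main", Tier.T3),
    ("terraform plan", Tier.T2),
    ("git push", Tier.T2),
    ("git commit", Tier.T1),
    ("kubectl get", Tier.T0),
    ("git status", Tier.T0),
    ("terraform show", Tier.T0),
]


def classify(command: str) -> Tier:
    """Map a command to a tier by prefix; unknown commands fail closed to T3."""
    normalized = " ".join(command.split())
    for prefix, tier in TIER_RULES:
        if normalized.startswith(prefix):
            return tier
    return Tier.T3


def requires_approval(command: str) -> bool:
    """Only T3 operations need the approval gate."""
    return classify(command) == Tier.T3
```

Failing closed means a command missing from the rule list is held for approval instead of running unreviewed.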
52
+ ---
53
+
54
+ ## Project Agents: Detailed Catalog
55
+
56
+ ### terraform-architect
57
+
58
+ **Full Name:** Terraform Infrastructure Architect
59
+
60
+ **Purpose:** Manages the complete Terraform/Terragrunt lifecycle for cloud infrastructure.
61
+
62
+ **Capabilities:**
63
+ - **Validation:** Syntax check, `terraform validate`, `terraform fmt`, linting
64
+ - **Planning:** Generate `terraform plan`, analyze resource changes
65
+ - **Code Generation:** Create new terraform modules/resources from specifications
66
+ - **Realization:** Execute `terragrunt apply` with verification
67
+ - **State Management:** Analyze terraform state, detect drift
68
+ - **Rollback:** Revert failed applies using state backups
69
+
70
+ **Tools Available:**
71
+ - Read, Edit, Write, Glob, Grep, Bash
72
+ - `terraform` (validate, plan, apply, destroy, show, state)
73
+ - `terragrunt` (plan, apply, run-all)
74
+ - `tflint` (linting)
75
+ - `gcloud`, `aws` (for provider operations)
76
+ - `kubectl` (for GKE/EKS cluster validation)
77
+ - Task (for sub-agent delegation)
78
+
79
+ **Security Tiers:**
80
+ - T0: `terraform show`, `terraform state list`, `terraform output`
81
+ - T1: File edits, `terraform fmt`, `terraform validate`
82
+ - T2: `terraform plan`
83
+ - T3: `terragrunt apply` (requires approval)
84
+
85
+ **Context Contract:** See `.claude/docs/context-contracts.md#terraform-architect`
86
+
87
+ **Use Cases:**
88
+ - "Create a CloudSQL instance with Terraform"
89
+ - "Update GKE node pool configuration"
90
+ - "Validate terraform code for syntax errors"
91
+ - "Apply terraform changes for VPC networking"
92
+
93
+ **Invocation Pattern:**
94
+ ```python
95
+ # Phase 1: Routing
96
+ agent = agent_router.py("Update GKE node pool configuration")
97
+ # Returns: "terraform-architect"
98
+
99
+ # Phase 2: Context
100
+ context = context_provider.py("terraform-architect", "Update GKE node pool configuration")
101
+
102
+ # Phase 3: Invoke for Planning
103
+ Task(
104
+     subagent_type="terraform-architect",
105
+     description="Generate GKE node pool update",
106
+     prompt=f"{context}\n\nTask: Update GKE node pool configuration"
107
+ )
108
+ # Returns: Realization package with terraform code
109
+
110
+ # Phase 4: Approval Gate (MANDATORY for apply)
111
+
112
+ # Phase 5: Invoke for Realization
113
+ Task(
114
+     subagent_type="terraform-architect",
115
+     description="Apply GKE node pool update",
116
+     prompt=f"Execute realization: {realization_package}"
117
+ )
118
+ ```
119
+
120
+ ---
121
+
122
+ ### gitops-operator
123
+
124
+ **Full Name:** GitOps Kubernetes Operator
125
+
126
+ **Purpose:** Manages Kubernetes applications using GitOps methodology (Flux reconciliation).
127
+
128
+ **Capabilities:**
129
+ - **Manifest Generation:** Create Kubernetes YAML (Deployments, Services, Ingresses, etc.)
130
+ - **HelmRelease Management:** Generate/update HelmRelease resources for Flux
131
+ - **Deployment:** Apply manifests to cluster, wait for readiness
132
+ - **Verification:** Check pod status, logs, events
133
+ - **Rollback:** Revert failed deployments using git revert
134
+ - **Troubleshooting:** Diagnose deployment issues (ImagePullBackOff, CrashLoopBackOff, etc.)
135
+
136
+ **Tools Available:**
137
+ - Read, Edit, Write, Glob, Grep, Bash
138
+ - `kubectl` (apply, get, describe, logs, exec, port-forward)
139
+ - `helm` (template, lint, list, status)
140
+ - `flux` (reconcile, get, logs, check)
141
+ - `kustomize` (build)
142
+ - `gcloud container clusters get-credentials` (for cluster access)
143
+ - Task (for sub-agent delegation)
144
+
145
+ **Security Tiers:**
146
+ - T0: `kubectl get`, `kubectl describe`, `kubectl logs`, `flux get`
147
+ - T1: File edits, `helm template`, `kustomize build`
148
+ - T2: `git push` to feature branch, `flux reconcile`
149
+ - T3: `kubectl apply`, `git push` to main (requires approval)
150
+
151
+ **Context Contract:** See `.claude/docs/context-contracts.md#gitops-operator`
152
+
153
+ **Use Cases:**
154
+ - "Deploy tcm-api service to non-prod cluster"
155
+ - "Update pg-api to version v2.1.0"
156
+ - "Debug why tcm-api pods are crashing"
157
+ - "Create Ingress for new service"
158
+ - "Rollback failed deployment of pg-query-api"
159
+
160
+ **Invocation Pattern:**
161
+ ```python
162
+ # Phase 1: Routing
163
+ agent = agent_router.py("Deploy tcm-api service to non-prod cluster")
164
+ # Returns: "gitops-operator"
165
+
166
+ # Phase 2: Context
167
+ context = context_provider.py("gitops-operator", "Deploy tcm-api service to non-prod cluster")
168
+
169
+ # Phase 3: Invoke for Planning
170
+ Task(
171
+     subagent_type="gitops-operator",
172
+     description="Generate tcm-api deployment",
173
+     prompt=f"{context}\n\nTask: Deploy tcm-api service to non-prod cluster"
174
+ )
175
+ # Returns: Realization package with Kubernetes YAML
176
+
177
+ # Phase 4: Approval Gate (MANDATORY for kubectl apply)
178
+
179
+ # Phase 5: Invoke for Realization
180
+ Task(
181
+     subagent_type="gitops-operator",
182
+     description="Deploy tcm-api to cluster",
183
+     prompt=f"Execute realization: {realization_package}"
184
+ )
185
+ ```
186
+
187
+ ---
188
+
189
+ ### gcp-troubleshooter
190
+
191
+ **Full Name:** GCP Diagnostic Specialist
192
+
193
+ **Purpose:** Diagnoses issues in GCP environments by comparing intended state (IaC/GitOps) with actual state (live resources).
194
+
195
+ **Capabilities:**
196
+ - **State Comparison:** Compare terraform/kubectl desired state with live GCP/GKE state
197
+ - **Log Analysis:** Analyze Cloud Logging, GKE pod logs, Cloud SQL logs
198
+ - **Network Diagnostics:** Test connectivity, DNS resolution, firewall rules
199
+ - **IAM Debugging:** Verify service account permissions, IAM policy bindings
200
+ - **Performance Analysis:** Query Cloud Monitoring metrics, identify bottlenecks
201
+ - **Root Cause Analysis:** Correlate symptoms across multiple resources
202
+
203
+ **Tools Available:**
204
+ - Read, Glob, Grep, Bash
205
+ - `gcloud` (compute, container, sql, iam, logging, monitoring)
206
+ - `kubectl` (get, describe, logs, exec)
207
+ - `gsutil` (for GCS bucket inspection)
208
+ - `terraform` (state inspection)
209
+ - Task (for sub-agent delegation)
210
+
211
+ **Security Tiers:**
212
+ - T0: `gcloud ... describe` subcommands, `kubectl get`, `gcloud logging read`
213
+ - T1: `gcloud sql connect` (read-only queries)
214
+ - T2: `gcloud compute ssh` (for VM diagnostics)
215
+ - T3: None (diagnostic agent, no destructive operations)
216
+
217
+ **Context Contract:** See `.claude/docs/context-contracts.md#gcp-troubleshooter`
218
+
219
+ **Use Cases:**
220
+ - "Why is tcm-api pod crashing with database connection error?"
221
+ - "Diagnose why CloudSQL instance is unreachable from GKE"
222
+ - "Investigate high latency on pg-api service"
223
+ - "Why is IAM permission denied for service account?"
224
+
225
+ **Invocation Pattern:**
226
+ ```python
227
+ # Phase 1: Routing
228
+ agent = agent_router.py("Why is tcm-api pod crashing with database connection error?")
229
+ # Returns: "gcp-troubleshooter"
230
+
231
+ # Phase 2: Context
232
+ context = context_provider.py("gcp-troubleshooter", "Diagnose tcm-api pod crash")
233
+
234
+ # Phase 3: Invoke for Diagnosis
235
+ Task(
236
+     subagent_type="gcp-troubleshooter",
237
+     description="Diagnose tcm-api database error",
238
+     prompt=f"{context}\n\nTask: Why is tcm-api pod crashing with database connection error?"
239
+ )
240
+ # Returns: Diagnostic report with root cause and recommendations
241
+ ```
242
+
243
+ ---
244
+
245
+ ### aws-troubleshooter
246
+
247
+ **Full Name:** AWS Diagnostic Specialist
248
+
249
+ **Purpose:** Diagnoses issues in AWS environments (EKS, RDS, EC2, etc.) by comparing intended state with actual state.
250
+
251
+ **Capabilities:**
252
+ - **State Comparison:** Compare terraform/kubectl desired state with live AWS/EKS state
253
+ - **Log Analysis:** Analyze CloudWatch Logs, EKS pod logs, RDS logs
254
+ - **Network Diagnostics:** Test connectivity, Route 53 DNS, security groups
255
+ - **IAM Debugging:** Verify IAM role permissions, trust policies
256
+ - **Performance Analysis:** Query CloudWatch metrics, identify bottlenecks
257
+ - **Root Cause Analysis:** Correlate symptoms across multiple resources
258
+
259
+ **Tools Available:**
260
+ - Read, Glob, Grep, Bash
261
+ - `aws` (ec2, eks, rds, iam, logs, cloudwatch)
262
+ - `kubectl` (get, describe, logs, exec)
263
+ - `eksctl` (for EKS cluster inspection)
264
+ - `terraform` (state inspection)
265
+ - Task (for sub-agent delegation)
266
+
267
+ **Security Tiers:**
268
+ - T0: `aws <service> describe-*`, `kubectl get`, `aws logs tail`
269
+ - T1: Read-only queries against RDS through a database client such as `psql` or `mysql`
270
+ - T2: `aws ssm start-session` or `aws ec2-instance-connect ssh` (for EC2 diagnostics)
271
+ - T3: None (diagnostic agent, no destructive operations)
272
+
273
+ **Context Contract:** See `.claude/docs/context-contracts.md#aws-troubleshooter`
274
+
275
+ **Use Cases:**
276
+ - "Why is EKS pod failing with IAM permission error?"
277
+ - "Diagnose why RDS instance has high CPU usage"
278
+ - "Investigate network timeout on ALB"
279
+ - "Why is security group blocking traffic?"
280
+
281
+ ---
282
+
283
+ ### devops-developer
284
+
285
+ **Full Name:** DevOps Application Developer
286
+
287
+ **Purpose:** Application-level operations including build, test, debug, and ad-hoc git operations.
288
+
289
+ **Capabilities:**
290
+ - **Code Analysis:** Understand application code, dependencies, configuration
291
+ - **Build & Test:** Run build commands, execute test suites, analyze failures
292
+ - **Debugging:** Add logging, reproduce bugs, analyze stack traces
293
+ - **Dependency Management:** Update packages, resolve version conflicts
294
+ - **Git Operations:** Create commits, branches, pull requests (ad-hoc, NOT part of infrastructure workflow)
295
+ - **CI/CD:** Analyze pipeline failures, update GitHub Actions/GitLab CI configs
296
+
297
+ **Tools Available:**
298
+ - Read, Edit, Write, Glob, Grep, Bash
299
+ - `node`, `npm`, `npx` (for Node.js/TypeScript apps)
300
+ - `python3`, `pip`, `pytest` (for Python apps)
301
+ - `jest`, `eslint`, `prettier` (for JavaScript/TypeScript)
302
+ - `git` (all operations)
303
+ - Task (for sub-agent delegation)
304
+
305
+ **Security Tiers:**
306
+ - T0: `npm test`, `pytest`, `git log`, `git diff`
307
+ - T1: File edits, `npm install`, `pip install`, local git commits
308
+ - T2: `git push` to feature branch, `npm publish` (to test registry)
309
+ - T3: `git push` to main (requires approval)
310
+
311
+ **Context Contract:** See `.claude/docs/context-contracts.md#devops-developer`
312
+
313
+ **Use Cases:**
314
+ - "Run tests and fix failures"
315
+ - "Update package.json dependencies"
316
+ - "Debug why API returns 500 error"
317
+ - "Create a commit with these changes"
318
+ - "Generate pull request for feature branch"
319
+
320
+ **Invocation Pattern:**
321
+ ```python
322
+ # Orchestrator delegates simple operations to devops-developer
323
+ Task(
324
+     subagent_type="devops-developer",
325
+     description="Fix test failures",
326
+     prompt=f"{context}\n\nTask: Run tests and fix any failures found"
327
+ )
328
+ ```
329
+
330
+ ---
331
+
332
+ ## Meta-Agents: Detailed Catalog
333
+
334
+ ### claude-architect
335
+
336
+ **Full Name:** Claude Code System Architect
337
+
338
+ **Purpose:** Analyzes, diagnoses, and optimizes the agent orchestration system itself.
339
+
340
+ **Capabilities:**
341
+ - **System Analysis:** Understand agent system architecture, workflows, data flows
342
+ - **Log Analysis:** Parse and analyze logs (routing, approvals, clarifications, violations)
343
+ - **Performance Optimization:** Identify bottlenecks, propose improvements
344
+ - **Best Practices Research:** Use WebSearch to find industry standards, patterns
345
+ - **Test Suite Analysis:** Review test results, coverage, identify gaps
346
+ - **Documentation Generation:** Create/update system documentation
347
+
348
+ **Tools Available:**
349
+ - Read, Glob, Grep, Bash
350
+ - WebSearch (for best practices research)
351
+ - Python (for data analysis, script generation)
352
+ - Task (for sub-agent delegation)
353
+
354
+ **Context Type:** Manual (NOT context_provider.py)
355
+
356
+ **Use Cases:**
357
+ - "Analyze routing accuracy and propose improvements"
358
+ - "Why is approval gate being bypassed?"
359
+ - "Optimize CLAUDE.md for token efficiency"
360
+ - "Research best practices for agent context provisioning"
361
+
362
+ **Invocation Pattern:**
363
+ ```python
364
+ # Orchestrator provides manual context in prompt
365
+ Task(
366
+     subagent_type="claude-architect",
367
+     description="Analyze routing accuracy",
368
+     prompt="""
369
+ ## System Context
370
+ - Agent system: /home/jaguilar/aaxis/rnd/repositories/.claude/
371
+ - Logs: /home/jaguilar/aaxis/rnd/repositories/.claude/logs/
372
+ - Tools: /home/jaguilar/aaxis/rnd/repositories/.claude/tools/
373
+
374
+ ## Task
375
+ Analyze routing accuracy over last 100 invocations. Propose improvements.
376
+ """
377
+ )
378
+ ```
379
+
380
+ ---
381
+
382
+ ### Explore
383
+
384
+ **Full Name:** Codebase Explorer
385
+
386
+ **Purpose:** Fast, thorough exploration of codebases to answer questions or find patterns.
387
+
388
+ **Capabilities:**
389
+ - **File Pattern Matching:** Find files by glob patterns (e.g., "src/components/**/*.tsx")
390
+ - **Keyword Search:** Search code for specific keywords or patterns
391
+ - **Architecture Understanding:** Analyze codebase structure, identify patterns
392
+ - **Dependency Mapping:** Trace imports, dependencies, data flows
393
+
394
+ **Tools Available:**
395
+ - Read, Glob, Grep, Bash (all tools)
396
+
397
+ **Thoroughness Levels:**
398
+ - `quick`: Basic searches, 1-2 file locations
399
+ - `medium`: Moderate exploration, 3-5 locations
400
+ - `very thorough`: Comprehensive analysis, multiple locations and naming conventions
401
+
402
+ **Context Type:** Automatic (file patterns, keywords)
403
+
404
+ **Use Cases:**
405
+ - "Find all API endpoints in the codebase"
406
+ - "Where is the user authentication logic?"
407
+ - "Show me all components that use the UserContext"
408
+
409
+ **Invocation Pattern:**
410
+ ```python
411
+ Task(
412
+     subagent_type="Explore",
413
+     description="Find API endpoints",
414
+     prompt="Find all API endpoints in the codebase (thoroughness: medium)"
415
+ )
416
+ ```
417
+
418
+ ---
419
+
420
+ ### Plan
421
+
422
+ **Full Name:** Implementation Planner
423
+
424
+ **Purpose:** Breaks down complex implementation tasks into step-by-step plans.
425
+
426
+ **Capabilities:**
427
+ - **Task Decomposition:** Break large tasks into smaller, manageable subtasks
428
+ - **Dependency Analysis:** Identify task dependencies, optimal execution order
429
+ - **Risk Assessment:** Identify high-risk tasks, potential blockers
430
+ - **Resource Estimation:** Estimate time, complexity for each subtask
431
+
432
+ **Tools Available:**
433
+ - Read, Glob, Grep, Bash (all tools)
434
+
435
+ **Context Type:** Automatic (user prompt)
436
+
437
+ **Use Cases:**
438
+ - "Plan implementation of user authentication feature"
439
+ - "How should I refactor this module?"
440
+ - "Break down the task of migrating to Kubernetes"
441
+
442
+ **Invocation Pattern:**
443
+ ```python
444
+ Task(
445
+     subagent_type="Plan",
446
+     description="Plan auth implementation",
447
+     prompt="Plan implementation of user authentication with JWT tokens"
448
+ )
449
+ ```
450
+
451
+ ---
452
+
453
+ ## Agent Selection Guide
454
+
455
+ ### Decision Tree
456
+
457
+ ```
458
+ User Request
459
+
460
+ ├─ Infrastructure change (terraform, GCP, AWS)?
461
+ │ └─ YES → terraform-architect
462
+
463
+ ├─ Kubernetes/deployment/service change?
464
+ │ └─ YES → gitops-operator
465
+
466
+ ├─ Diagnostic/troubleshooting?
467
+ │ ├─ GCP? → gcp-troubleshooter
468
+ │ └─ AWS? → aws-troubleshooter
469
+
470
+ ├─ Application code/build/test?
471
+ │ └─ YES → devops-developer
472
+
473
+ ├─ System/agent analysis?
474
+ │ └─ YES → claude-architect
475
+
476
+ ├─ Codebase exploration/understanding?
477
+ │ └─ YES → Explore
478
+
479
+ └─ Implementation planning?
480
+    └─ YES → Plan
481
+ ```
482
+
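The decision tree above can be read as a lookup keyed on the request category, with one extra branch for the cloud provider. The sketch below assumes the request has already been categorized; the category labels and the function name `select_agent` are hypothetical, and the packaged `agent_router.py` may work quite differently.

```python
from typing import Optional

# Category labels are invented for this sketch; agent names come from the tree.
DIRECT_ROUTES = {
    "infrastructure": "terraform-architect",
    "kubernetes": "gitops-operator",
    "application": "devops-developer",
    "system": "claude-architect",
    "exploration": "Explore",
    "planning": "Plan",
}
DIAGNOSTIC_ROUTES = {"gcp": "gcp-troubleshooter", "aws": "aws-troubleshooter"}


def select_agent(category: str, cloud: Optional[str] = None) -> str:
    """Return the agent for a pre-categorized request, following the tree."""
    if category == "diagnostic":
        if cloud not in DIAGNOSTIC_ROUTES:
            raise ValueError("diagnostic requests need cloud='gcp' or cloud='aws'")
        return DIAGNOSTIC_ROUTES[cloud]
    if category not in DIRECT_ROUTES:
        raise ValueError(f"unknown category: {category}")
    return DIRECT_ROUTES[category]
```

Only the diagnostic branch needs a second input, which mirrors the one nested fork in the tree.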
483
+ ---
484
+
485
+ ## Invocation Anti-Patterns
486
+
487
+ ### ❌ DON'T: Use context_provider.py for Meta-Agents
488
+
489
+ ```python
490
+ # WRONG
491
+ context = context_provider.py("claude-architect", "analyze logs")
492
+ Task(subagent_type="claude-architect", prompt=context)
493
+ ```
494
+
495
+ **Why:** Meta-agents work on the SYSTEM, not projects. They need system paths, not project context.
496
+
497
+ **Correct:**
498
+ ```python
499
+ Task(
500
+     subagent_type="claude-architect",
501
+     prompt=f"System path: {system_path}\n\nTask: analyze logs"
502
+ )
503
+ ```
504
+
505
+ ---
506
+
507
+ ### ❌ DON'T: Skip Approval Gate for T3 Operations
508
+
509
+ ```python
510
+ # WRONG
511
+ Task(subagent_type="gitops-operator", prompt="Deploy to prod")
512
+ # Skips Phase 4 Approval Gate
513
+ ```
514
+
515
+ **Why:** T3 operations are irreversible. Approval is MANDATORY.
516
+
517
+ **Correct:**
518
+ ```python
519
+ # Phase 3: Planning
520
+ plan = Task(subagent_type="gitops-operator", prompt="Generate deployment plan")
521
+
522
+ # Phase 4: Approval Gate (MANDATORY)
523
+ approval = approval_gate.py(plan)
524
+ if not approval["approved"]:
525
+     halt_workflow()
526
+
527
+ # Phase 5: Realization
528
+ Task(subagent_type="gitops-operator", prompt=f"Execute: {plan}")
529
+ ```
530
+
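The plan, approve, realize ordering can be made hard to bypass by passing the three steps into one function that owns the control flow. This is a sketch with stubbed steps: `run_t3_workflow` and its parameters are hypothetical names, not the interface of the packaged `approval_gate.py`.

```python
from typing import Callable


def run_t3_workflow(
    plan_step: Callable[[], str],
    approval_gate: Callable[[str], bool],
    realize_step: Callable[[str], str],
) -> str:
    """Run planning, then realization only if the gate approves the plan."""
    plan = plan_step()
    if not approval_gate(plan):
        # Realization is never reached without approval.
        return "halted: approval denied"
    return realize_step(plan)


# Usage with stubs: a denied approval leaves the realization step untouched.
executed = []
result = run_t3_workflow(
    plan_step=lambda: "deploy tcm-api",
    approval_gate=lambda plan: False,
    realize_step=lambda plan: executed.append(plan) or "applied",
)
```

Because the realization callable is only ever invoked inside the function, a caller cannot skip the gate by reordering steps.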
531
+ ---
532
+
533
+ ### ❌ DON'T: Over-Prompt Agents
534
+
535
+ ```python
536
+ # WRONG
537
+ Task(
538
+     subagent_type="terraform-architect",
539
+     prompt="""
540
+ Context: {...}
541
+
542
+ Task: Create CloudSQL instance
543
+
544
+ Instructions:
545
+ 1. First, read the terraform code
546
+ 2. Then, generate a new module
547
+ 3. Then, run terraform validate
548
+ 4. Then, return a realization package
549
+ ...
550
+ """
551
+ )
552
+ ```
553
+
554
+ **Why:** The agent knows its own protocol. Step-by-step instructions in the prompt can conflict with that protocol and cause the agent to ignore its internal workflow.
555
+
556
+ **Correct:**
557
+ ```python
558
+ Task(
559
+     subagent_type="terraform-architect",
559
+     prompt=f"{context}\n\nTask: Create CloudSQL instance"
561
+ )
562
+ ```
563
+
564
+ ---
565
+
566
+ ## Agent Performance Metrics
567
+
568
+ Track these metrics in `.claude/logs/agent-metrics.jsonl`:
569
+
570
+ ### Per-Agent Metrics
571
+
572
+ | Agent | Invocations | Success Rate | Avg Duration | Approval Rate |
573
+ |-------|-------------|--------------|--------------|---------------|
574
+ | terraform-architect | 234 | 94% | 45s | 87% |
575
+ | gitops-operator | 567 | 96% | 32s | 91% |
576
+ | gcp-troubleshooter | 123 | 98% | 28s | N/A |
577
+ | aws-troubleshooter | 45 | 96% | 31s | N/A |
578
+ | devops-developer | 189 | 92% | 18s | 15% |
579
+ | claude-architect | 34 | 100% | 120s | N/A |
580
+ | Explore | 456 | 99% | 8s | N/A |
581
+ | Plan | 78 | 95% | 22s | N/A |
582
+
583
+ ### Target Thresholds
584
+
585
+ - **Success Rate:** >90% for all agents
586
+ - **Avg Duration:** <60s for project agents, <10s for meta-agents
587
+ - **Approval Rate:** 80-90% (for agents with T3 operations)
588
+
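A minimal sketch of checking one agent's figures against the target thresholds above. It takes the numbers as arguments because the schema of `agent-metrics.jsonl` is not shown here; the function name and parameters are hypothetical, and the approval check runs only when the caller supplies an approval rate.

```python
from typing import Optional

PROJECT_AGENTS = {
    "terraform-architect",
    "gitops-operator",
    "gcp-troubleshooter",
    "aws-troubleshooter",
    "devops-developer",
}


def threshold_violations(
    agent: str,
    success_pct: int,
    avg_duration_s: float,
    approval_pct: Optional[int] = None,
) -> list[str]:
    """Return a list of target thresholds that the given figures miss."""
    violations = []
    if success_pct <= 90:
        violations.append("success rate is not above 90%")
    # Project agents get 60s; every other agent is held to the 10s meta-agent target.
    limit_s = 60 if agent in PROJECT_AGENTS else 10
    if avg_duration_s >= limit_s:
        violations.append(f"average duration is not below {limit_s}s")
    if approval_pct is not None and not 80 <= approval_pct <= 90:
        violations.append("approval rate is outside 80-90%")
    return violations
```

With the figures from the table, `threshold_violations("terraform-architect", 94, 45, 87)` returns an empty list, while `threshold_violations("claude-architect", 100, 120)` reports the duration target.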
589
+ ---
590
+
591
+ ## Version History
592
+
593
+ ### 2.0.0 (2025-11-07)
594
+ - Extracted from CLAUDE.md monolith
595
+ - Added comprehensive capabilities, tools, use cases for each agent
596
+ - Added security tier definitions and enforcement
597
+ - Added invocation patterns with examples
598
+ - Added anti-patterns section
599
+ - Added performance metrics
600
+ - Distinguished project agents from meta-agents
601
+
602
+ ### 1.x (Historical)
603
+ - Embedded in CLAUDE.md
604
+ - Basic agent list with roles