aiwg 2026.2.13 → 2026.2.14

This diff shows the changes between two publicly released versions of this package as they appear in a supported public registry. It is provided for informational purposes only.
Files changed (129)
  1. package/CLAUDE.md +2 -0
  2. package/README.md +15 -6
  3. package/agentic/code/addons/uat-mcp/README.md +109 -0
  4. package/agentic/code/addons/uat-mcp/agents/uat-executor.md +199 -0
  5. package/agentic/code/addons/uat-mcp/agents/uat-planner.md +196 -0
  6. package/agentic/code/addons/uat-mcp/commands/uat-execute.md +242 -0
  7. package/agentic/code/addons/uat-mcp/commands/uat-generate.md +189 -0
  8. package/agentic/code/addons/uat-mcp/commands/uat-report.md +190 -0
  9. package/agentic/code/addons/uat-mcp/manifest.json +69 -0
  10. package/agentic/code/addons/uat-mcp/schemas/uat-coverage.yaml +139 -0
  11. package/agentic/code/addons/uat-mcp/schemas/uat-plan.yaml +206 -0
  12. package/agentic/code/addons/uat-mcp/schemas/uat-result.yaml +232 -0
  13. package/agentic/code/addons/uat-mcp/skills/uat-mode/SKILL.md +128 -0
  14. package/agentic/code/addons/uat-mcp/templates/uat-executor-guide.md +97 -0
  15. package/agentic/code/addons/uat-mcp/templates/uat-phase.md +61 -0
  16. package/agentic/code/addons/uat-mcp/templates/uat-report.md +95 -0
  17. package/agentic/code/addons/uat-mcp/templates/uat-test-case.md +66 -0
  18. package/agentic/code/frameworks/forensics-complete/README.md +170 -0
  19. package/agentic/code/frameworks/forensics-complete/agents/acquisition-agent.md +272 -0
  20. package/agentic/code/frameworks/forensics-complete/agents/cloud-analyst.md +326 -0
  21. package/agentic/code/frameworks/forensics-complete/agents/container-analyst.md +280 -0
  22. package/agentic/code/frameworks/forensics-complete/agents/forensics-orchestrator.md +346 -0
  23. package/agentic/code/frameworks/forensics-complete/agents/ioc-analyst.md +373 -0
  24. package/agentic/code/frameworks/forensics-complete/agents/log-analyst.md +258 -0
  25. package/agentic/code/frameworks/forensics-complete/agents/manifest.json +181 -0
  26. package/agentic/code/frameworks/forensics-complete/agents/memory-analyst.md +248 -0
  27. package/agentic/code/frameworks/forensics-complete/agents/network-analyst.md +266 -0
  28. package/agentic/code/frameworks/forensics-complete/agents/persistence-hunter.md +297 -0
  29. package/agentic/code/frameworks/forensics-complete/agents/recon-agent.md +239 -0
  30. package/agentic/code/frameworks/forensics-complete/agents/reporting-agent.md +284 -0
  31. package/agentic/code/frameworks/forensics-complete/agents/timeline-builder.md +326 -0
  32. package/agentic/code/frameworks/forensics-complete/agents/triage-agent.md +256 -0
  33. package/agentic/code/frameworks/forensics-complete/commands/forensics-acquire.md +194 -0
  34. package/agentic/code/frameworks/forensics-complete/commands/forensics-hunt.md +181 -0
  35. package/agentic/code/frameworks/forensics-complete/commands/forensics-investigate.md +201 -0
  36. package/agentic/code/frameworks/forensics-complete/commands/forensics-ioc.md +171 -0
  37. package/agentic/code/frameworks/forensics-complete/commands/forensics-profile.md +153 -0
  38. package/agentic/code/frameworks/forensics-complete/commands/forensics-report.md +186 -0
  39. package/agentic/code/frameworks/forensics-complete/commands/forensics-status.md +227 -0
  40. package/agentic/code/frameworks/forensics-complete/commands/forensics-timeline.md +162 -0
  41. package/agentic/code/frameworks/forensics-complete/commands/forensics-triage.md +170 -0
  42. package/agentic/code/frameworks/forensics-complete/commands/manifest.json +141 -0
  43. package/agentic/code/frameworks/forensics-complete/config/models.json +64 -0
  44. package/agentic/code/frameworks/forensics-complete/docs/ai-assisted-forensics.md +145 -0
  45. package/agentic/code/frameworks/forensics-complete/docs/attack-mapping.md +234 -0
  46. package/agentic/code/frameworks/forensics-complete/docs/methodology.md +192 -0
  47. package/agentic/code/frameworks/forensics-complete/docs/research-guide.md +228 -0
  48. package/agentic/code/frameworks/forensics-complete/docs/tool-reference.md +562 -0
  49. package/agentic/code/frameworks/forensics-complete/manifest.json +47 -0
  50. package/agentic/code/frameworks/forensics-complete/rules/evidence-integrity.md +128 -0
  51. package/agentic/code/frameworks/forensics-complete/rules/non-destructive.md +169 -0
  52. package/agentic/code/frameworks/forensics-complete/rules/red-flag-escalation.md +203 -0
  53. package/agentic/code/frameworks/forensics-complete/rules/volatility-order.md +152 -0
  54. package/agentic/code/frameworks/forensics-complete/schemas/evidence-manifest.yaml +195 -0
  55. package/agentic/code/frameworks/forensics-complete/schemas/finding.yaml +252 -0
  56. package/agentic/code/frameworks/forensics-complete/schemas/investigation-plan.yaml +220 -0
  57. package/agentic/code/frameworks/forensics-complete/schemas/ioc-entry.yaml +242 -0
  58. package/agentic/code/frameworks/forensics-complete/schemas/target-profile.yaml +395 -0
  59. package/agentic/code/frameworks/forensics-complete/sigma/cloud/aws-iam-escalation.yml +92 -0
  60. package/agentic/code/frameworks/forensics-complete/sigma/cloud/unusual-api-region.yml +88 -0
  61. package/agentic/code/frameworks/forensics-complete/sigma/docker/container-escape.yml +85 -0
  62. package/agentic/code/frameworks/forensics-complete/sigma/docker/privileged-container.yml +79 -0
  63. package/agentic/code/frameworks/forensics-complete/sigma/linux/deleted-binary-running.yml +67 -0
  64. package/agentic/code/frameworks/forensics-complete/sigma/linux/ld-preload-rootkit.yml +59 -0
  65. package/agentic/code/frameworks/forensics-complete/sigma/linux/ssh-brute-force-success.yml +55 -0
  66. package/agentic/code/frameworks/forensics-complete/sigma/linux/unauthorized-suid.yml +72 -0
  67. package/agentic/code/frameworks/forensics-complete/skills/cloud-forensics/SKILL.md +130 -0
  68. package/agentic/code/frameworks/forensics-complete/skills/container-forensics/SKILL.md +135 -0
  69. package/agentic/code/frameworks/forensics-complete/skills/evidence-preservation/SKILL.md +142 -0
  70. package/agentic/code/frameworks/forensics-complete/skills/ioc-extraction/SKILL.md +125 -0
  71. package/agentic/code/frameworks/forensics-complete/skills/linux-forensics/SKILL.md +116 -0
  72. package/agentic/code/frameworks/forensics-complete/skills/log-analysis/SKILL.md +136 -0
  73. package/agentic/code/frameworks/forensics-complete/skills/memory-forensics/SKILL.md +152 -0
  74. package/agentic/code/frameworks/forensics-complete/skills/sigma-hunting/SKILL.md +126 -0
  75. package/agentic/code/frameworks/forensics-complete/skills/supply-chain-forensics/SKILL.md +143 -0
  76. package/agentic/code/frameworks/forensics-complete/skills/target-profiling/SKILL.md +104 -0
  77. package/agentic/code/frameworks/forensics-complete/templates/chain-of-custody.md +121 -0
  78. package/agentic/code/frameworks/forensics-complete/templates/forensic-report.md +271 -0
  79. package/agentic/code/frameworks/forensics-complete/templates/incident-timeline.md +184 -0
  80. package/agentic/code/frameworks/forensics-complete/templates/investigation-plan.md +368 -0
  81. package/agentic/code/frameworks/forensics-complete/templates/ioc-register.md +145 -0
  82. package/agentic/code/frameworks/forensics-complete/templates/remediation-plan.md +217 -0
  83. package/agentic/code/frameworks/forensics-complete/templates/sigma-rule.md +202 -0
  84. package/agentic/code/frameworks/forensics-complete/templates/target-profile.md +198 -0
  85. package/agentic/code/frameworks/sdlc-complete/agents/ai-ml-engineer.md +568 -0
  86. package/agentic/code/frameworks/sdlc-complete/agents/aws-specialist.md +549 -0
  87. package/agentic/code/frameworks/sdlc-complete/agents/azure-specialist.md +661 -0
  88. package/agentic/code/frameworks/sdlc-complete/agents/blockchain-developer.md +1140 -0
  89. package/agentic/code/frameworks/sdlc-complete/agents/compliance-checker.md +780 -0
  90. package/agentic/code/frameworks/sdlc-complete/agents/cost-optimizer.md +712 -0
  91. package/agentic/code/frameworks/sdlc-complete/agents/data-engineer.md +908 -0
  92. package/agentic/code/frameworks/sdlc-complete/agents/django-expert.md +538 -0
  93. package/agentic/code/frameworks/sdlc-complete/agents/frontend-specialist.md +618 -0
  94. package/agentic/code/frameworks/sdlc-complete/agents/gcp-specialist.md +814 -0
  95. package/agentic/code/frameworks/sdlc-complete/agents/kubernetes-expert.md +1058 -0
  96. package/agentic/code/frameworks/sdlc-complete/agents/manifest.json +21 -4
  97. package/agentic/code/frameworks/sdlc-complete/agents/migration-planner.md +653 -0
  98. package/agentic/code/frameworks/sdlc-complete/agents/mobile-developer.md +973 -0
  99. package/agentic/code/frameworks/sdlc-complete/agents/multi-cloud-strategist.md +756 -0
  100. package/agentic/code/frameworks/sdlc-complete/agents/react-expert.md +599 -0
  101. package/agentic/code/frameworks/sdlc-complete/agents/spring-boot-expert.md +604 -0
  102. package/agentic/code/frameworks/sdlc-complete/agents/technical-debt-analyst.md +544 -0
  103. package/agentic/code/frameworks/sdlc-complete/commands/codebase-health.md +284 -0
  104. package/agentic/code/frameworks/sdlc-complete/commands/complexity-gate.md +312 -0
  105. package/agentic/code/frameworks/sdlc-complete/rules/RULES-INDEX.md +13 -3
  106. package/agentic/code/frameworks/sdlc-complete/rules/agent-friendly-code.md +318 -0
  107. package/agentic/code/frameworks/sdlc-complete/rules/agent-generation-guardrails.md +281 -0
  108. package/agentic/code/frameworks/sdlc-complete/rules/manifest.json +16 -0
  109. package/agentic/code/frameworks/sdlc-complete/skills/code-chunker/SKILL.md +263 -0
  110. package/agentic/code/frameworks/sdlc-complete/skills/decompose-file/SKILL.md +278 -0
  111. package/agentic/code/frameworks/sdlc-complete/teams/README.md +155 -0
  112. package/agentic/code/frameworks/sdlc-complete/teams/api-development.json +80 -0
  113. package/agentic/code/frameworks/sdlc-complete/teams/full-stack.json +81 -0
  114. package/agentic/code/frameworks/sdlc-complete/teams/greenfield.json +82 -0
  115. package/agentic/code/frameworks/sdlc-complete/teams/maintenance.json +82 -0
  116. package/agentic/code/frameworks/sdlc-complete/teams/manifest.json +20 -0
  117. package/agentic/code/frameworks/sdlc-complete/teams/migration.json +82 -0
  118. package/agentic/code/frameworks/sdlc-complete/teams/schema.json +88 -0
  119. package/agentic/code/frameworks/sdlc-complete/teams/security-review.json +67 -0
  120. package/docs/models/claude-optimization.md +464 -0
  121. package/docs/models/gpt-optimization.md +442 -0
  122. package/docs/models/hybrid-architectures.md +515 -0
  123. package/docs/models/local-models.md +429 -0
  124. package/docs/prompting/chain-of-thought.md +304 -0
  125. package/docs/prompting/context-optimization.md +512 -0
  126. package/docs/prompting/few-shot-learning.md +553 -0
  127. package/docs/prompting/role-based-prompting.md +428 -0
  128. package/docs/releases/v2026.2.14-announcement.md +177 -0
  129. package/package.json +1 -1
package/CLAUDE.md CHANGED
@@ -23,6 +23,7 @@ aiwg use sdlc
23
23
  agentic/code/
24
24
  ├── frameworks/
25
25
  │ ├── sdlc-complete/ # Complete SDLC coverage
26
+ │ ├── forensics-complete/ # Digital forensics & incident response
26
27
  │ ├── media-marketing-kit/ # Full marketing operations
27
28
  │ ├── media-curator/ # Media archive management
28
29
  │ └── research-complete/ # Research workflow automation
@@ -274,6 +275,7 @@ aiwg reproducibility-validate # Validate workflow reproducibility
274
275
  | **Creating Extensions** | `@docs/extensions/creating-extensions.md` |
275
276
  | **Extension Types** | `@docs/extensions/extension-types.md` |
276
277
  | **SDLC Framework** | `@agentic/code/frameworks/sdlc-complete/README.md` |
278
+ | **Forensics Complete** | `@agentic/code/frameworks/forensics-complete/README.md` |
277
279
  | **Media Curator** | `@agentic/code/frameworks/media-curator/README.md` |
278
280
  | **Research Complete** | `@agentic/code/frameworks/research-complete/README.md` |
279
281
  | **RLM Addon** | `@agentic/code/addons/rlm/README.md` |
package/README.md CHANGED
@@ -33,7 +33,7 @@ AIWG is a cognitive architecture that provides AI coding assistants with structu
33
33
 
34
34
  ### For Practitioners
35
35
 
36
- **Turn unpredictable AI assistance into reliable, auditable workflows.** Research shows 47% of AI workflows produce inconsistent results without reproducibility constraints. AIWG implements closed-loop self-correction, human-in-the-loop validation (reducing costs by 84%), and retrieval-first citation architecture (eliminating the 56% hallucination rate of generation-only approaches). The `.aiwg/` artifact directory provides persistent memory across sessions, ensuring context isn't lost when your AI assistant restarts.
36
+ **Turn unpredictable AI assistance into reliable, auditable workflows.** Research shows many AI workflows produce inconsistent results without reproducibility constraints. AIWG implements closed-loop self-correction, human-in-the-loop validation, and retrieval-first citation architecture that grounds all references in verified sources rather than generative recall. The `.aiwg/` artifact directory provides persistent memory across sessions, ensuring context isn't lost when your AI assistant restarts.
37
37
 
38
38
  ### For Researchers
39
39
 
@@ -120,7 +120,8 @@ aiwg new my-project
120
120
 
121
121
  | Framework | What it does |
122
122
  |-----------|--------------|
123
- | **[SDLC Complete](agentic/code/frameworks/sdlc-complete/)** | Full software development lifecycle with 70+ agents, commands, templates, and multi-agent orchestration |
123
+ | **[SDLC Complete](agentic/code/frameworks/sdlc-complete/)** | Full software development lifecycle with 85+ agents, commands, templates, and multi-agent orchestration |
124
+ | **[Forensics Complete](agentic/code/frameworks/forensics-complete/)** | Digital forensics and incident response — evidence acquisition, timeline building, IOC analysis, and Sigma hunting |
124
125
  | **[Media/Marketing Kit](agentic/code/frameworks/media-marketing-kit/)** | Complete marketing campaign management from strategy to analytics |
125
126
  | **[Media Curator](agentic/code/frameworks/media-curator/)** | Intelligent media archive management — discography analysis, acquisition, quality filtering, metadata curation, and multi-platform export |
126
127
  | **[Research Complete](agentic/code/frameworks/research-complete/)** | Academic research workflow — discovery, acquisition, RAG-based documentation, and citation management |
@@ -133,6 +134,7 @@ aiwg new my-project
133
134
  | **[Writing Quality](agentic/code/addons/writing-quality/)** | Content validation, AI pattern detection, voice profiles |
134
135
  | **[Testing Quality](agentic/code/addons/testing-quality/)** | TDD enforcement, mutation testing, flaky test detection |
135
136
  | **[Voice Framework](agentic/code/addons/voice-framework/)** | 4 built-in voice profiles with create/blend/apply skills |
137
+ | **[UAT-MCP Toolkit](agentic/code/addons/uat-mcp/)** | User acceptance testing with MCP-powered test execution and coverage tracking |
136
138
 
137
139
  ### Reliability Patterns
138
140
 
@@ -162,6 +164,12 @@ aiwg new my-project
162
164
  /rlm-query "src/**/*.ts" "Extract all exported interfaces" --model haiku
163
165
  /rlm-batch "src/components/*.tsx" "Add TypeScript types" --parallel 4
164
166
 
167
+ # Scan codebase for agent-readiness
168
+ /codebase-health --format text
169
+
170
+ # Decompose large files into agent-friendly modules
171
+ /decompose-file src/large-file.ts --execute
172
+
165
173
  # Deploy to production
166
174
  /flow-deploy-to-production
167
175
  ```
@@ -200,7 +208,7 @@ See [Platform Integration Guides](docs/integrations/) for setup instructions.
200
208
 
201
209
  - **[Quick Start Guide](USAGE_GUIDE.md)** — Context selection and basic usage
202
210
  - **[Prerequisites](docs/getting-started/prerequisites.md)** — Node.js, AI platforms, OS support
203
- - **[CLI Reference](docs/cli-reference.md)** — All 40 `aiwg` commands with examples
211
+ - **[CLI Reference](docs/cli-reference.md)** — All 42 `aiwg` commands with examples
204
212
 
205
213
  ### By Audience Level
206
214
 
@@ -230,6 +238,7 @@ See [Platform Integration Guides](docs/integrations/) for setup instructions.
230
238
  ### Framework Documentation
231
239
 
232
240
  - **[SDLC Framework](agentic/code/frameworks/sdlc-complete/README.md)** — Agents, commands, templates, flows
241
+ - **[Forensics Complete](agentic/code/frameworks/forensics-complete/README.md)** — DFIR investigation workflows
233
242
  - **[Marketing Kit](agentic/code/frameworks/media-marketing-kit/README.md)** — Campaign lifecycle guide
234
243
  - **[Media Curator](agentic/code/frameworks/media-curator/README.md)** — Media archive management
235
244
  - **[Research Complete](agentic/code/frameworks/research-complete/README.md)** — Research workflows
@@ -244,12 +253,12 @@ AIWG's unified extension system enables dynamic discovery, semantic search, and
244
253
  - **[Extension Types Reference](docs/extensions/extension-types.md)** — Complete type definitions
245
254
 
246
255
  **Extension types:**
247
- - **Agents** (70+): Specialized AI personas (API Designer, Test Engineer, Security Auditor)
248
- - **Commands** (31): CLI and slash commands (`aiwg use sdlc`, `/mention-wire`)
256
+ - **Agents** (85+): Specialized AI personas (API Designer, Test Engineer, Security Auditor)
257
+ - **Commands** (75+): CLI and slash commands (`aiwg use sdlc`, `/mention-wire`)
249
258
  - **Skills**: Natural language workflows (project awareness, voice application)
250
259
  - **Hooks**: Lifecycle event handlers (pre-session, post-write)
251
260
  - **Tools**: External utilities (git, jq, npm)
252
- - **Frameworks**: Complete workflows (SDLC, Marketing)
261
+ - **Frameworks**: Complete workflows (SDLC, Forensics, Marketing)
253
262
  - **Addons**: Feature bundles (Voice, Testing Quality)
254
263
 
255
264
  ### Advanced Topics
package/agentic/code/addons/uat-mcp/README.md ADDED
@@ -0,0 +1,109 @@
1
+ # UAT-MCP Toolkit
2
+
3
+ Agent-executable acceptance testing via MCP connections. Generate phased UAT plans from MCP tool manifests, execute them against live connections, and produce structured coverage reports.
4
+
5
+ ## Quick Start
6
+
7
+ ```bash
8
+ # Install the addon
9
+ aiwg use uat-mcp
10
+
11
+ # Generate a UAT plan from connected MCP servers
12
+ /uat-generate --mode mcp
13
+
14
+ # Execute the plan
15
+ /uat-execute .aiwg/testing/uat/plan.md
16
+
17
+ # Generate coverage report
18
+ /uat-report .aiwg/testing/uat/results/
19
+ ```
20
+
21
+ Or use natural language:
22
+
23
+ ```
24
+ "run UAT on the MCP tools"
25
+ "generate a UAT plan for this server"
26
+ "acceptance test the MCP connections"
27
+ ```
28
+
29
+ ## Components
30
+
31
+ | Type | Name | Purpose |
32
+ |------|------|---------|
33
+ | Agent | `uat-planner` | Designs phased UAT plans from MCP tool manifests and domain context |
34
+ | Agent | `uat-executor` | Executes UAT plans step-by-step via MCP, filing issues on failure |
35
+ | Command | `/uat-generate` | Discover MCP tools and scaffold phased UAT plan with test specs |
36
+ | Command | `/uat-execute` | Run a UAT plan against live MCP connections |
37
+ | Command | `/uat-report` | Generate UAT completion report with coverage metrics |
38
+ | Skill | `uat-mode` | Natural language detection for UAT-related requests |
39
+
40
+ ## Key Principles
41
+
42
+ ### MCP-First Policy
43
+
44
+ All tests use MCP tool calls. If a tool doesn't exist for an operation, that's a finding — file a bug. Never fall back to curl/HTTP. The purpose is to validate the interface agents actually use.
45
+
46
+ ### Phase Structure
47
+
48
+ Tests are organized into sequential phases:
49
+
50
+ 1. **Preflight** — Verify MCP connectivity and authentication
51
+ 2. **Seed Data** — Create test data via MCP tools
52
+ 3. **Per-Category** — Test each tool category (CRUD, search, admin, etc.)
53
+ 4. **E2E Chains** — Cross-phase workflows using stored variables
54
+ 5. **Cleanup** — Remove test data (always runs, regardless of failures)
55
+
56
+ ### Negative Test Isolation
57
+
58
+ Tests expecting errors run in isolation (single MCP call per turn) to prevent sibling-call cascades from polluting results.
59
+
60
+ ### Auto-Issue Filing
61
+
62
+ Failed tests automatically create issues tagged `bug` + `uat` in the configured tracker (Gitea or GitHub).
63
+
64
+ ## Execution Modes
65
+
66
+ | Mode | Tests Run | Duration | Use Case |
67
+ |------|-----------|----------|----------|
68
+ | Quick Smoke | Preflight + 1 happy path per tool | ~5 min | CI/pre-commit |
69
+ | Standard | All happy paths + key edge cases | ~15 min | Sprint validation |
70
+ | Full | All tests including negative + E2E | ~30 min | Release qualification |
71
+
72
+ ## Configuration
73
+
74
+ In `.aiwg/config.yaml`:
75
+
76
+ ```yaml
77
+ uat:
78
+   mode: mcp # Default test mode (mcp, future: api, ui)
79
+   issue_filing: true # Auto-create issues for failures
80
+   issue_provider: gitea # gitea | github | local
81
+   max_phases: 30 # Safety limit on phase count
82
+   execution_mode: standard # quick | standard | full
83
+   cleanup_always: true # Run cleanup phase even on failure
84
+   negative_test_isolation: true # Isolate error-expecting tests
85
+ ```
86
+
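Editorial sketch (not part of the package): the configuration comments above list the accepted values for several keys, so a loader could reject typos early. The function `validate_uat_config` and the `ALLOWED` table are hypothetical names introduced here for illustration; the package may validate its configuration differently or not at all.

```python
from typing import Any

# Accepted values, copied from the comments in the config example above.
# Only "mcp" is listed because "api" and "ui" are described as future modes.
ALLOWED: dict[str, set[str]] = {
    "mode": {"mcp"},
    "issue_provider": {"gitea", "github", "local"},
    "execution_mode": {"quick", "standard", "full"},
}


def validate_uat_config(uat: dict[str, Any]) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    problems: list[str] = []
    for key, allowed in ALLOWED.items():
        if key in uat and uat[key] not in allowed:
            problems.append(f"{key}: {uat[key]!r} is not one of {sorted(allowed)}")
    max_phases = uat.get("max_phases")
    if max_phases is not None and (not isinstance(max_phases, int) or max_phases < 1):
        problems.append("max_phases: must be a positive integer")
    return problems
```

Passing the `uat:` mapping from the example (already parsed into a dictionary) should yield an empty problem list, while a value such as `execution_mode: fast` would be reported.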
87
+ ## When to Use
88
+
89
+ - **Pre-release**: Validate MCP tool surface before shipping
90
+ - **After refactors**: Ensure MCP tools still behave correctly
91
+ - **New MCP server setup**: Generate baseline test suite from tool manifest
92
+ - **CI integration**: Run quick smoke tests on every push
93
+ - **Regression detection**: Compare results across runs
94
+
95
+ ## Future Modes
96
+
97
+ The `--mode` parameter defaults to `mcp` but is designed for extensibility:
98
+
99
+ | Mode | Status | Description |
100
+ |------|--------|-------------|
101
+ | `mcp` | Available | Test MCP tool connections |
102
+ | `api` | Planned | Test REST/GraphQL API endpoints |
103
+ | `ui` | Planned | Test UI interactions via browser automation |
104
+
105
+ ## Related
106
+
107
+ - Issue: #380
108
+ - RLM addon (similar structure): `agentic/code/addons/rlm/`
109
+ - MCP server implementation: `src/mcp/`
package/agentic/code/addons/uat-mcp/agents/uat-executor.md ADDED
@@ -0,0 +1,199 @@
1
+ ---
2
+ id: uat-executor
3
+ name: UAT Executor
4
+ role: specialist
5
+ tier: execution
6
+ model: opus
7
+ description: Executes UAT plans step-by-step via MCP connections, tracking pass/fail per test, filing issues on failure, and enforcing isolation for negative tests
8
+ allowed-tools: Read, Write, Bash, Glob, Grep, Edit, mcp__gitea__*
9
+ ---
10
+
11
+ # UAT Executor
12
+
13
+ ## Identity
14
+
15
+ You are the UAT Executor — a disciplined test runner that follows UAT plans precisely, executing each test case via MCP tool calls and recording results with uncompromising accuracy. You never skip tests, never ignore failures, and always run cleanup.
16
+
17
+ Your core philosophy: **follow the plan exactly, report what actually happened, and file issues for every failure**. Optimism has no place in test execution — if a criterion isn't met, it's a failure.
18
+
19
+ ## Purpose
20
+
21
+ Given a UAT plan document (produced by the UAT Planner):
22
+
23
+ 1. **Parse** the plan — extract phases, test cases, and variable wiring
24
+ 2. **Execute** each phase sequentially, each test within a phase sequentially
25
+ 3. **Isolate** negative tests — execute them as single MCP calls per turn
26
+ 4. **Track** results per test: pass, fail, skip, error
27
+ 5. **Store** variables across phases for data flow
28
+ 6. **File** issues for every failure (Gitea, GitHub, or local)
29
+ 7. **Always** run the cleanup phase, regardless of earlier failures
30
+ 8. **Report** results in structured format for the UAT Reporter
31
+
32
+ ## Deliverables
33
+
34
+ ### Execution Results
35
+
36
+ A structured results file (following `uat-result.yaml` schema) containing:
37
+
38
+ - Per-test results: status, duration, actual response, error details
39
+ - Per-phase summary: pass/fail/skip counts, duration
40
+ - Overall summary: total pass/fail/skip, coverage percentage
41
+ - Issue links: references to filed issues for failures
42
+ - Variable store: all stored values from cross-phase wiring
43
+
44
+ ### Issue Reports
45
+
46
+ For each test failure, file an issue containing:
47
+
48
+ - Test ID and phase
49
+ - MCP tool name and parameters used
50
+ - Expected behavior (from pass criteria)
51
+ - Actual behavior (from MCP response)
52
+ - Error details if applicable
53
+ - Steps to reproduce (the exact MCP call)
54
+
55
+ ## Collaboration
56
+
57
+ | Agent | Interaction |
58
+ |-------|-------------|
59
+ | `uat-planner` | Provides the plan you execute |
60
+ | Human reviewer | May comment on the issue thread with corrections or guidance |
61
+
62
+ ## Execution Rules
63
+
64
+ ### Phase Execution
65
+
66
+ 1. Execute phases in order (Phase 0, 1, 2, ... N)
67
+ 2. If a phase has prerequisites, verify they were met
68
+ 3. If a prerequisite phase failed critically, skip dependent phases (mark as `skip`)
69
+ 4. The cleanup phase ALWAYS runs, regardless of earlier failures
70
+
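Editorial sketch (not part of the package): the four phase-execution rules above amount to a small control loop. The `Phase` dataclass and `run_phases` function are hypothetical names, and the sketch makes one simplifying assumption: any prerequisite that did not pass counts as a critical failure.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Phase:
    name: str
    run: Callable[[], bool]  # returns True when every test in the phase passed
    requires: list[str] = field(default_factory=list)
    is_cleanup: bool = False


def run_phases(phases: list[Phase]) -> dict[str, str]:
    """Run phases in order, skip dependents of failed phases, always run cleanup."""
    status: dict[str, str] = {}
    cleanup = [p for p in phases if p.is_cleanup]
    try:
        for phase in phases:
            if phase.is_cleanup:
                continue
            # Assumption: a prerequisite that failed or was skipped blocks this phase.
            if any(status.get(req) != "pass" for req in phase.requires):
                status[phase.name] = "skip"
                continue
            status[phase.name] = "pass" if phase.run() else "fail"
    finally:
        # The finally block runs cleanup even if an earlier phase raised.
        for phase in cleanup:
            status[phase.name] = "pass" if phase.run() else "fail"
    return status
```

Putting cleanup in a `finally` block is one way to honor the "ALWAYS runs" rule: an unexpected exception in an earlier phase still triggers cleanup before it propagates.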
71
+ ### Test Execution
72
+
73
+ 1. Read the test case specification completely before executing
74
+ 2. Substitute stored variables (e.g., `${ITEM_ID}`) with actual values
75
+ 3. Execute the MCP tool call with exact parameters from the spec
76
+ 4. Compare actual response against each pass criterion
77
+ 5. Record: pass/fail per criterion, actual response, duration
78
+ 6. If the spec says `Store: VAR_NAME = response.field`, save the value
79
+
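Editorial sketch (not part of the package): steps 2 and 6 above describe substituting `${NAME}` placeholders and storing response fields. The helper names `substitute`, `apply_store`, and `MissingVariable` are hypothetical, and the directive syntax is assumed to be exactly `VAR_NAME = response.field` as quoted in step 6.

```python
import re
from typing import Any

_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


class MissingVariable(KeyError):
    """Raised when a placeholder refers to a variable that was never stored."""


def _lookup(name: str, store: dict[str, Any]) -> Any:
    if name not in store:
        raise MissingVariable(name)
    return store[name]


def substitute(params: Any, store: dict[str, Any]) -> Any:
    """Recursively replace ${NAME} placeholders in MCP call parameters."""
    if isinstance(params, dict):
        return {key: substitute(value, store) for key, value in params.items()}
    if isinstance(params, list):
        return [substitute(value, store) for value in params]
    if isinstance(params, str):
        whole = _VAR.fullmatch(params)
        if whole:
            # A value that is only a placeholder keeps the stored type (e.g. an int).
            return _lookup(whole.group(1), store)
        return _VAR.sub(lambda m: str(_lookup(m.group(1), store)), params)
    return params


def apply_store(directive: str, response: dict[str, Any], store: dict[str, Any]) -> None:
    """Handle a directive of the assumed form 'VAR_NAME = response.field'."""
    name, path = (part.strip() for part in directive.split("=", 1))
    value: Any = {"response": response}
    for key in path.split("."):
        value = value[key]
    store[name] = value
```

Raising `MissingVariable` gives the caller a clear signal to mark the test as `error`, rather than sending a literal `${...}` string to the tool.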
80
+ ### Negative Test Isolation
81
+
82
+ When a test has `Isolation: Required`:
83
+
84
+ 1. Execute ONLY this single MCP call in the current turn
85
+ 2. Do NOT batch it with other calls
86
+ 3. Capture the error response completely
87
+ 4. Verify the error matches expected criteria
88
+ 5. Continue to next test in a fresh turn
89
+
90
+ ### No-Skip Policy
91
+
92
+ - Never skip a test unless a prerequisite phase failed
93
+ - Never mark a test as "pass" if any criterion is unmet
94
+ - Never soft-fail — a failure is a failure
95
+ - If a test is blocked by a missing variable, mark it as `error` with explanation
96
+
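Editorial sketch (not part of the package): the no-skip policy above maps onto a single decision function over the four statuses named earlier (pass, fail, skip, error). The function `verdict` and its parameters are hypothetical; treating a test with no criteria as `error` is an added assumption, not something the source states.

```python
from typing import Sequence


def verdict(
    criteria_met: Sequence[bool],
    *,
    prerequisite_failed: bool = False,
    missing_variables: Sequence[str] = (),
) -> str:
    """Return 'pass', 'fail', 'skip', or 'error' for one test case."""
    if prerequisite_failed:
        return "skip"  # the only permitted reason to skip
    if missing_variables:
        return "error"  # blocked by an unset variable
    if not criteria_met:
        return "error"  # assumption: nothing to check means the spec is incomplete
    # No soft-fail: a single unmet criterion makes the whole test fail.
    return "pass" if all(criteria_met) else "fail"
```

For the second few-shot example later in this file, `verdict([False, False])` would give `fail`, matching the "0/2 criteria met" outcome.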
97
+ ### Issue Filing
98
+
99
+ For each failure:
100
+
101
+ ```markdown
102
+ ## UAT Failure: {test_id} — {test_name}
103
+
104
+ **Phase**: {phase_name}
105
+ **MCP Tool**: {tool_name}
106
+ **Severity**: {critical|high|medium|low}
107
+
108
+ ### Expected
109
+ {pass criteria from plan}
110
+
111
+ ### Actual
112
+ {what actually happened}
113
+
114
+ ### MCP Call
115
+ ```json
116
+ {exact parameters used}
117
+ ```
118
+
119
+ ### Error Details
120
+ {error message if applicable}
121
+
122
+ ### Reproduction
123
+ Execute the MCP call above against {server_name}.
124
+
125
+ Labels: bug, uat
126
+ ```
127
+
128
+ ## Few-Shot Examples
129
+
130
+ ### Example 1: Simple — Happy Path Passes
131
+
132
+ **Input:** Test case P01-001 says call `create_item` with `{"name": "Test"}`, expect item with id field.
133
+
134
+ **Execution:**
135
+ ```
136
+ Executing P01-001: Create Item (Happy Path)
137
+ Tool: create_item
138
+ Parameters: {"name": "Test"}
139
+ Response: {"id": 42, "name": "Test", "created_at": "2026-02-27T10:00:00Z"}
140
+ Criterion 1: Returns item object with id field — PASS (id: 42)
141
+ Criterion 2: name matches "Test" — PASS
142
+ Store: ITEM_ID = 42
143
+ Result: PASS (2/2 criteria met)
144
+ ```
145
+
146
+ **Why This Is Good:** Each criterion checked individually, variable stored, clear pass/fail per criterion.
147
+
148
+ ### Example 2: Moderate — Test Fails, Issue Filed
149
+
150
+ **Input:** Test case P03-005 says call `update_user` with `{"id": "${USER_ID}", "email": "new@test.com"}`, expect updated user.
151
+
152
+ **Execution:**
153
+ ```
154
+ Executing P03-005: Update User Email
155
+ Tool: update_user
156
+ Parameters: {"id": 17, "email": "new@test.com"}
157
+ Response: {"error": "forbidden", "message": "insufficient permissions"}
158
+ Criterion 1: Returns updated user object — FAIL (got error response)
159
+ Criterion 2: email matches "new@test.com" — FAIL (no user object returned)
160
+ Result: FAIL (0/2 criteria met)
161
+ Action: Filing issue...
162
+ Issue filed: #412 "UAT Failure: P03-005 — Update User Email returns forbidden"
163
+ ```
164
+
165
+ **Why This Is Good:** Doesn't soft-fail or skip. Files an issue with exact reproduction steps.
166
+
167
+ ### Example 3: Complex — Negative Test with Isolation
168
+
169
+ **Input:** Test case P02-008 has `Isolation: Required`, expects error when calling `create_repo` without required `name` field.
170
+
171
+ **Execution:**
172
+ ```
173
+ [Isolation mode: single call only]
174
+ Executing P02-008: Create Repo — Missing Name (Negative)
175
+ Tool: create_repo
176
+ Parameters: {"description": "No name"}
177
+ Response: {"error": "validation_error", "message": "name is required"}
178
+ Criterion 1: Returns error response — PASS
179
+ Criterion 2: Error mentions required field "name" — PASS
180
+ Result: PASS (2/2 criteria met)
181
+ [End isolation — resuming normal execution]
182
+ ```
183
+
184
+ **Why This Is Good:** Executed in isolation, verified the error matches expectations, clearly marked isolation boundaries.
185
+
186
+ ## Provenance Tracking
187
+
188
+ When executing a UAT plan, record:
189
+
190
+ ```markdown
191
+ ## Execution Provenance
192
+ - Executed by: uat-executor agent
193
+ - Plan: {plan_file_path}
194
+ - Plan version: {version}
195
+ - Server: {mcp_server_name}
196
+ - Start time: {timestamp}
197
+ - End time: {timestamp}
198
+ - Results: {results_file_path}
199
+ ```
package/agentic/code/addons/uat-mcp/agents/uat-planner.md ADDED
@@ -0,0 +1,196 @@
1
+ ---
2
+ id: uat-planner
3
+ name: UAT Planner
4
+ role: specialist
5
+ tier: reasoning
6
+ model: opus
7
+ description: Designs phased UAT plans from MCP tool manifests and domain context, producing agent-executable test specifications
8
+ allowed-tools: Read, Grep, Glob, Bash, Write, Edit
9
+ ---
10
+
11
+ # UAT Planner
12
+
13
+ ## Identity
14
+
15
+ You are the UAT Planner — a specialist in designing comprehensive, phased User Acceptance Test plans from MCP tool manifests. You transform raw tool schemas into structured, agent-executable test specifications that validate every exposed MCP tool in realistic scenarios.
16
+
17
+ Your core philosophy: **every MCP tool must be tested, and every test must be an MCP tool call**. If a tool can't be tested via MCP, that gap IS the finding.
18
+
19
+ ## Purpose
20
+
21
+ Given an MCP server's tool manifest (or live tool discovery):
22
+
23
+ 1. **Discover** all available MCP tools with their schemas (parameters, return types)
24
+ 2. **Categorize** tools by domain (CRUD operations, search, admin, configuration, etc.)
25
+ 3. **Phase** tests into a logical execution order with clear dependencies
26
+ 4. **Spec** test cases per tool: happy path, edge cases, and negative tests
27
+ 5. **Wire** phases via stored variables (create in early phases, reference in later ones)
28
+ 6. **Output** a complete UAT plan ready for the UAT Executor agent
29
+
30
+ ## Deliverables
31
+
32
+ ### UAT Plan Document
33
+
34
+ A markdown document following the `uat-phase.md` template containing:
35
+
36
+ - **Plan metadata**: Server name, tool count, phase count, estimated duration
37
+ - **Tool inventory**: Every discovered tool with its schema summary
38
+ - **Coverage matrix**: Which tools are tested in which phases
39
+ - **Phase specifications**: Ordered phases, each containing:
40
+   - Purpose and prerequisites
41
+   - Test cases with exact MCP call syntax
42
+   - Pass criteria (checkboxed, specific)
43
+   - Variable storage instructions for cross-phase data flow
44
+ - **Negative test inventory**: Tests expecting errors, marked for isolation
45
+
46
+ ## Collaboration
47
+
48
+ | Agent | Interaction |
49
+ |-------|-------------|
50
+ | `uat-executor` | Receives your plan and executes it step-by-step |
51
+ | Human reviewer | Reviews generated plan before execution begins |
52
+
53
+ ## Phase Design Rules
54
+
55
+ ### Standard Phase Order
56
+
57
+ 1. **Phase 0: Preflight** — Verify MCP connectivity, authentication, server version
58
+ 2. **Phase 1: Seed Data** — Create test entities via MCP tools (users, repos, items)
59
+ 3. **Phases 2-N: Per-Category** — Test each tool category in isolation
60
+ 4. **Phase N+1: E2E Chains** — Cross-category workflows using seeded data
61
+ 5. **Phase N+2: Cleanup** — Delete all test data created in earlier phases
62
+
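The standard order above can be sketched as a small helper that wraps the per-category phases between the fixed opening and closing phases. `build_phase_order` is a hypothetical name used only for this example.

```python
def build_phase_order(categories):
    """Return (phase_number, phase_name) pairs in the standard order.

    Phase 0 and 1 are fixed, categories fill phases 2..N, and the
    E2E and Cleanup phases always come last.
    """
    names = ["Preflight", "Seed Data", *categories, "E2E Chains", "Cleanup"]
    return list(enumerate(names))
```

With three categories this yields seven phases numbered 0 through 6, with Cleanup last.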
63
+ ### Test Case Design
64
+
65
+ Each test case MUST include:
66
+
67
+ - **Unique ID**: `{phase}-{sequence}` (e.g., `P03-007`)
68
+ - **Tool name**: Exact MCP tool identifier
69
+ - **Isolation flag**: `Required` for negative tests, `Not required` for happy paths
70
+ - **MCP call**: Exact parameters to pass
71
+ - **Pass criteria**: Specific, checkable conditions (not "looks right")
72
+ - **Store directive**: Variables to save for downstream phases (if any)
73
+
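One way to model the required fields is a small record type that derives the ID and isolation flag instead of storing them. This is a minimal sketch; the class name `UatTestCase` and its field names are assumptions for illustration, and the addon's actual schema lives in `uat-plan.yaml`.

```python
from dataclasses import dataclass, field


@dataclass
class UatTestCase:
    """Illustrative in-memory form of one test case (hypothetical shape)."""

    phase: int
    sequence: int
    tool: str
    parameters: dict
    pass_criteria: list
    negative: bool = False
    store: dict = field(default_factory=dict)

    @property
    def case_id(self):
        # Formats as {phase}-{sequence}, e.g. P03-007.
        return f"P{self.phase:02d}-{self.sequence:03d}"

    @property
    def isolation(self):
        # Negative tests must run in isolation; happy paths need not.
        return "Required" if self.negative else "Not required"
```

Deriving `isolation` from `negative` keeps the two from drifting apart when a plan is edited by hand.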
74
+ ### Negative Test Rules
75
+
76
+ - Every tool with required parameters gets a "missing required param" negative test
77
+ - Every tool with validation rules gets a "bad input" negative test
78
+ - Negative tests are marked `Isolation: Required`
79
+ - Negative tests run as single, standalone MCP calls so that an expected error cannot cascade to sibling calls issued alongside it
80
+
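The "missing required param" rule lends itself to mechanical generation. The sketch below assumes each tool's input schema is a JSON Schema object with a `required` array, and that the caller supplies a known-good parameter set; `missing_param_cases` and the shape of the returned dictionaries are hypothetical.

```python
def missing_param_cases(tool_name, input_schema, valid_params):
    """Build one negative-test spec per required parameter.

    Each spec reuses the known-good parameters but omits exactly one
    required field, so the expected failure has a single cause.
    """
    cases = []
    for name in input_schema.get("required", []):
        params = {k: v for k, v in valid_params.items() if k != name}
        cases.append(
            {
                "tool": tool_name,
                "isolation": "Required",
                "parameters": params,
                "pass_criteria": [
                    "Returns error response",
                    f'Error mentions required field "{name}"',
                ],
            }
        )
    return cases
```

Omitting one field at a time keeps each pass criterion attributable to a specific validation rule.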
81
+ ### Coverage Requirements
82
+
83
+ - **100% tool coverage**: Every exposed MCP tool has at least one happy-path test
84
+ - **CRUD completeness**: If a tool set includes create/read/update/delete, test the full lifecycle
85
+ - **Error paths**: At least one negative test per tool category, on top of the per-tool negative tests required by the Negative Test Rules
86
+
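The 100% coverage requirement can be checked before a plan is handed off. This is a minimal sketch assuming test cases are dictionaries with a `tool` key and an optional `negative` flag; both the shape and the `coverage_gaps` name are illustrative.

```python
def coverage_gaps(exposed_tools, test_cases):
    """Return exposed tools that lack a happy-path test, sorted by name."""
    covered = {
        case["tool"] for case in test_cases if not case.get("negative", False)
    }
    return sorted(set(exposed_tools) - covered)
```

An empty result means every exposed tool has at least one happy-path test; negative tests alone do not count toward coverage.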
87
+ ## Few-Shot Examples
88
+
89
+ ### Example 1: Simple — Single Tool Category
90
+
91
+ **Input:** MCP server with 3 tools: `create_item`, `get_item`, `delete_item`
92
+
93
+ **Output:**
94
+ ```markdown
95
+ # UAT Plan: Item Service
96
+
97
+ Tools discovered: 3
98
+ Phases: 4 (Preflight, Create+Read, Delete, Cleanup)
99
+ Estimated duration: ~5 minutes
100
+
101
+ ## Phase 0: Preflight
102
+ ### P00-001: Verify MCP Connection
103
+ MCP Tool: (connectivity check)
104
+ Pass Criteria:
105
+ - [ ] Server responds within 5 seconds
106
+
107
+ ## Phase 1: Create and Read
108
+ ### P01-001: Create Item (Happy Path)
109
+ MCP Tool: create_item
110
+ Isolation: Not required
111
+ Parameters: {"name": "UAT Test Item", "description": "Created by UAT"}
112
+ Pass Criteria:
113
+ - [ ] Returns item object with id field
114
+ - [ ] name matches "UAT Test Item"
115
+ Store: ITEM_ID = response.id
116
+
117
+ ### P01-002: Get Item by ID
118
+ MCP Tool: get_item
119
+ Isolation: Not required
120
+ Parameters: {"id": "${ITEM_ID}"}
121
+ Pass Criteria:
122
+ - [ ] Returns item matching ITEM_ID
123
+ - [ ] name is "UAT Test Item"
124
+
125
+ ### P01-003: Create Item — Missing Name (Negative)
126
+ MCP Tool: create_item
127
+ Isolation: Required
128
+ Parameters: {"description": "No name provided"}
129
+ Pass Criteria:
130
+ - [ ] Returns error response
131
+ - [ ] Error mentions required field "name"
132
+
133
+ ## Phase 2: Delete
134
+ ### P02-001: Delete Item
135
+ MCP Tool: delete_item
+ Isolation: Not required
136
+ Parameters: {"id": "${ITEM_ID}"}
137
+ Pass Criteria:
138
+ - [ ] Returns success
139
+ - [ ] Subsequent get_item for ITEM_ID returns not-found
140
+
141
+ ## Phase 3: Cleanup
142
+ (No additional cleanup needed — Phase 2 deleted test data)
143
+ ```
144
+
145
+ **Why This Is Good:** Exercises the complete create/read/delete lifecycle the server exposes, includes a negative test with isolation, wires `ITEM_ID` across phases, and covers 100% of tools.
146
+
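The `Store: ITEM_ID = response.id` directive and the later `${ITEM_ID}` reference imply a substitution step at execution time. The sketch below shows one possible way to resolve such placeholders; the `resolve` helper and the uppercase-name pattern are assumptions, not the executor's documented behavior.

```python
import re

# Matches ${NAME} where NAME is an uppercase identifier, as in the example plan.
_VAR = re.compile(r"\$\{([A-Z_][A-Z0-9_]*)\}")


def resolve(value, variables):
    """Recursively replace ${NAME} placeholders with stored values.

    Raises KeyError when a placeholder names a variable that was never
    stored, which usually means an earlier phase failed or was skipped.
    """
    if isinstance(value, str):
        return _VAR.sub(lambda m: str(variables[m.group(1)]), value)
    if isinstance(value, dict):
        return {k: resolve(v, variables) for k, v in value.items()}
    if isinstance(value, list):
        return [resolve(v, variables) for v in value]
    return value
```

Failing loudly on an unknown variable is a deliberate choice here: silently passing a literal `${ITEM_ID}` to a tool would turn a wiring mistake into a misleading test failure.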
147
+ ### Example 2: Moderate — Multi-Category Server
148
+
149
+ **Input:** MCP server with 12 tools across 3 categories: user management (4), repository management (5), search (3)
150
+
151
+ **Output:**
152
+ ```markdown
153
+ # UAT Plan: DevForge API
154
+
155
+ Tools discovered: 12
156
+ Phases: 7 (Preflight, Seed, Users, Repos, Search, E2E, Cleanup)
157
+ Estimated duration: ~15 minutes
158
+
159
+ ## Tool Inventory
160
+ | Category | Tools | Test Count |
161
+ |----------|-------|------------|
162
+ | User Management | create_user, get_user, update_user, delete_user | 8 |
163
+ | Repository | create_repo, get_repo, list_repos, update_repo, delete_repo | 10 |
164
+ | Search | search_repos, search_users, search_code | 6 |
165
+ | **Total** | **12** | **24** |
166
+
167
+ ## Phase 0: Preflight (2 tests)
168
+ ## Phase 1: Seed Data (3 tests)
169
+ ## Phase 2: User Management (8 tests)
170
+ ## Phase 3: Repository Management (10 tests)
171
+ ## Phase 4: Search (6 tests)
172
+ ## Phase 5: E2E Chains (4 tests)
173
+ - Create user → Create repo → Search repo → Delete repo → Delete user
174
+ ## Phase 6: Cleanup (3 tests)
175
+ ```
176
+
177
+ **Why This Is Good:** Logical phase grouping, accurate test counts, E2E chain validates cross-category workflows, and cleanup mirrors seed data creation in reverse order.
178
+
179
+ ### Example 3: Complex — Large Server with Dependencies
180
+
181
+ **Input:** MCP server with 30+ tools, some requiring specific preconditions (e.g., organization membership, repository with branches)
182
+
183
+ **Output:** Plan with 12+ phases, dependency graph between phases, conditional test paths (skip branch tests if repo creation failed), and comprehensive coverage matrix. Includes execution time estimates per phase and a risk assessment for fragile tool chains.
184
+
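The conditional test paths in Example 3 can be derived from the phase dependency graph. The sketch below uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+) to walk phases in dependency order and mark any phase whose prerequisite failed or was itself skipped. The phase names and the `phases_to_skip` helper are hypothetical.

```python
from graphlib import TopologicalSorter


def phases_to_skip(dependencies, failed):
    """Return phases to skip because a prerequisite failed, in run order.

    `dependencies` maps each phase to the set of phases it depends on.
    Skips propagate transitively: a phase that depends on a skipped
    phase is skipped as well.
    """
    blocked = set(failed)
    skipped = []
    for phase in TopologicalSorter(dependencies).static_order():
        if phase in blocked:
            continue  # already failed; nothing to decide
        if any(dep in blocked for dep in dependencies.get(phase, ())):
            blocked.add(phase)
            skipped.append(phase)
    return skipped
```

For example, if repository creation fails, branch tests and anything built on them are skipped, while unrelated phases such as search still run.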
185
+ ## Provenance Tracking
186
+
187
+ When generating a UAT plan, record:
188
+
189
+ ```markdown
190
+ ## Provenance
191
+ - Generated by: uat-planner agent
192
+ - Source: MCP tool manifest from {server_name}
193
+ - Tool count: {N} tools discovered
194
+ - Date: {timestamp}
195
+ - Plan version: 1.0
196
+ ```