aiwg 2026.2.13 → 2026.2.14
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CLAUDE.md +2 -0
- package/README.md +15 -6
- package/agentic/code/addons/uat-mcp/README.md +109 -0
- package/agentic/code/addons/uat-mcp/agents/uat-executor.md +199 -0
- package/agentic/code/addons/uat-mcp/agents/uat-planner.md +196 -0
- package/agentic/code/addons/uat-mcp/commands/uat-execute.md +242 -0
- package/agentic/code/addons/uat-mcp/commands/uat-generate.md +189 -0
- package/agentic/code/addons/uat-mcp/commands/uat-report.md +190 -0
- package/agentic/code/addons/uat-mcp/manifest.json +69 -0
- package/agentic/code/addons/uat-mcp/schemas/uat-coverage.yaml +139 -0
- package/agentic/code/addons/uat-mcp/schemas/uat-plan.yaml +206 -0
- package/agentic/code/addons/uat-mcp/schemas/uat-result.yaml +232 -0
- package/agentic/code/addons/uat-mcp/skills/uat-mode/SKILL.md +128 -0
- package/agentic/code/addons/uat-mcp/templates/uat-executor-guide.md +97 -0
- package/agentic/code/addons/uat-mcp/templates/uat-phase.md +61 -0
- package/agentic/code/addons/uat-mcp/templates/uat-report.md +95 -0
- package/agentic/code/addons/uat-mcp/templates/uat-test-case.md +66 -0
- package/agentic/code/frameworks/forensics-complete/README.md +170 -0
- package/agentic/code/frameworks/forensics-complete/agents/acquisition-agent.md +272 -0
- package/agentic/code/frameworks/forensics-complete/agents/cloud-analyst.md +326 -0
- package/agentic/code/frameworks/forensics-complete/agents/container-analyst.md +280 -0
- package/agentic/code/frameworks/forensics-complete/agents/forensics-orchestrator.md +346 -0
- package/agentic/code/frameworks/forensics-complete/agents/ioc-analyst.md +373 -0
- package/agentic/code/frameworks/forensics-complete/agents/log-analyst.md +258 -0
- package/agentic/code/frameworks/forensics-complete/agents/manifest.json +181 -0
- package/agentic/code/frameworks/forensics-complete/agents/memory-analyst.md +248 -0
- package/agentic/code/frameworks/forensics-complete/agents/network-analyst.md +266 -0
- package/agentic/code/frameworks/forensics-complete/agents/persistence-hunter.md +297 -0
- package/agentic/code/frameworks/forensics-complete/agents/recon-agent.md +239 -0
- package/agentic/code/frameworks/forensics-complete/agents/reporting-agent.md +284 -0
- package/agentic/code/frameworks/forensics-complete/agents/timeline-builder.md +326 -0
- package/agentic/code/frameworks/forensics-complete/agents/triage-agent.md +256 -0
- package/agentic/code/frameworks/forensics-complete/commands/forensics-acquire.md +194 -0
- package/agentic/code/frameworks/forensics-complete/commands/forensics-hunt.md +181 -0
- package/agentic/code/frameworks/forensics-complete/commands/forensics-investigate.md +201 -0
- package/agentic/code/frameworks/forensics-complete/commands/forensics-ioc.md +171 -0
- package/agentic/code/frameworks/forensics-complete/commands/forensics-profile.md +153 -0
- package/agentic/code/frameworks/forensics-complete/commands/forensics-report.md +186 -0
- package/agentic/code/frameworks/forensics-complete/commands/forensics-status.md +227 -0
- package/agentic/code/frameworks/forensics-complete/commands/forensics-timeline.md +162 -0
- package/agentic/code/frameworks/forensics-complete/commands/forensics-triage.md +170 -0
- package/agentic/code/frameworks/forensics-complete/commands/manifest.json +141 -0
- package/agentic/code/frameworks/forensics-complete/config/models.json +64 -0
- package/agentic/code/frameworks/forensics-complete/docs/ai-assisted-forensics.md +145 -0
- package/agentic/code/frameworks/forensics-complete/docs/attack-mapping.md +234 -0
- package/agentic/code/frameworks/forensics-complete/docs/methodology.md +192 -0
- package/agentic/code/frameworks/forensics-complete/docs/research-guide.md +228 -0
- package/agentic/code/frameworks/forensics-complete/docs/tool-reference.md +562 -0
- package/agentic/code/frameworks/forensics-complete/manifest.json +47 -0
- package/agentic/code/frameworks/forensics-complete/rules/evidence-integrity.md +128 -0
- package/agentic/code/frameworks/forensics-complete/rules/non-destructive.md +169 -0
- package/agentic/code/frameworks/forensics-complete/rules/red-flag-escalation.md +203 -0
- package/agentic/code/frameworks/forensics-complete/rules/volatility-order.md +152 -0
- package/agentic/code/frameworks/forensics-complete/schemas/evidence-manifest.yaml +195 -0
- package/agentic/code/frameworks/forensics-complete/schemas/finding.yaml +252 -0
- package/agentic/code/frameworks/forensics-complete/schemas/investigation-plan.yaml +220 -0
- package/agentic/code/frameworks/forensics-complete/schemas/ioc-entry.yaml +242 -0
- package/agentic/code/frameworks/forensics-complete/schemas/target-profile.yaml +395 -0
- package/agentic/code/frameworks/forensics-complete/sigma/cloud/aws-iam-escalation.yml +92 -0
- package/agentic/code/frameworks/forensics-complete/sigma/cloud/unusual-api-region.yml +88 -0
- package/agentic/code/frameworks/forensics-complete/sigma/docker/container-escape.yml +85 -0
- package/agentic/code/frameworks/forensics-complete/sigma/docker/privileged-container.yml +79 -0
- package/agentic/code/frameworks/forensics-complete/sigma/linux/deleted-binary-running.yml +67 -0
- package/agentic/code/frameworks/forensics-complete/sigma/linux/ld-preload-rootkit.yml +59 -0
- package/agentic/code/frameworks/forensics-complete/sigma/linux/ssh-brute-force-success.yml +55 -0
- package/agentic/code/frameworks/forensics-complete/sigma/linux/unauthorized-suid.yml +72 -0
- package/agentic/code/frameworks/forensics-complete/skills/cloud-forensics/SKILL.md +130 -0
- package/agentic/code/frameworks/forensics-complete/skills/container-forensics/SKILL.md +135 -0
- package/agentic/code/frameworks/forensics-complete/skills/evidence-preservation/SKILL.md +142 -0
- package/agentic/code/frameworks/forensics-complete/skills/ioc-extraction/SKILL.md +125 -0
- package/agentic/code/frameworks/forensics-complete/skills/linux-forensics/SKILL.md +116 -0
- package/agentic/code/frameworks/forensics-complete/skills/log-analysis/SKILL.md +136 -0
- package/agentic/code/frameworks/forensics-complete/skills/memory-forensics/SKILL.md +152 -0
- package/agentic/code/frameworks/forensics-complete/skills/sigma-hunting/SKILL.md +126 -0
- package/agentic/code/frameworks/forensics-complete/skills/supply-chain-forensics/SKILL.md +143 -0
- package/agentic/code/frameworks/forensics-complete/skills/target-profiling/SKILL.md +104 -0
- package/agentic/code/frameworks/forensics-complete/templates/chain-of-custody.md +121 -0
- package/agentic/code/frameworks/forensics-complete/templates/forensic-report.md +271 -0
- package/agentic/code/frameworks/forensics-complete/templates/incident-timeline.md +184 -0
- package/agentic/code/frameworks/forensics-complete/templates/investigation-plan.md +368 -0
- package/agentic/code/frameworks/forensics-complete/templates/ioc-register.md +145 -0
- package/agentic/code/frameworks/forensics-complete/templates/remediation-plan.md +217 -0
- package/agentic/code/frameworks/forensics-complete/templates/sigma-rule.md +202 -0
- package/agentic/code/frameworks/forensics-complete/templates/target-profile.md +198 -0
- package/agentic/code/frameworks/sdlc-complete/agents/ai-ml-engineer.md +568 -0
- package/agentic/code/frameworks/sdlc-complete/agents/aws-specialist.md +549 -0
- package/agentic/code/frameworks/sdlc-complete/agents/azure-specialist.md +661 -0
- package/agentic/code/frameworks/sdlc-complete/agents/blockchain-developer.md +1140 -0
- package/agentic/code/frameworks/sdlc-complete/agents/compliance-checker.md +780 -0
- package/agentic/code/frameworks/sdlc-complete/agents/cost-optimizer.md +712 -0
- package/agentic/code/frameworks/sdlc-complete/agents/data-engineer.md +908 -0
- package/agentic/code/frameworks/sdlc-complete/agents/django-expert.md +538 -0
- package/agentic/code/frameworks/sdlc-complete/agents/frontend-specialist.md +618 -0
- package/agentic/code/frameworks/sdlc-complete/agents/gcp-specialist.md +814 -0
- package/agentic/code/frameworks/sdlc-complete/agents/kubernetes-expert.md +1058 -0
- package/agentic/code/frameworks/sdlc-complete/agents/manifest.json +21 -4
- package/agentic/code/frameworks/sdlc-complete/agents/migration-planner.md +653 -0
- package/agentic/code/frameworks/sdlc-complete/agents/mobile-developer.md +973 -0
- package/agentic/code/frameworks/sdlc-complete/agents/multi-cloud-strategist.md +756 -0
- package/agentic/code/frameworks/sdlc-complete/agents/react-expert.md +599 -0
- package/agentic/code/frameworks/sdlc-complete/agents/spring-boot-expert.md +604 -0
- package/agentic/code/frameworks/sdlc-complete/agents/technical-debt-analyst.md +544 -0
- package/agentic/code/frameworks/sdlc-complete/commands/codebase-health.md +284 -0
- package/agentic/code/frameworks/sdlc-complete/commands/complexity-gate.md +312 -0
- package/agentic/code/frameworks/sdlc-complete/rules/RULES-INDEX.md +13 -3
- package/agentic/code/frameworks/sdlc-complete/rules/agent-friendly-code.md +318 -0
- package/agentic/code/frameworks/sdlc-complete/rules/agent-generation-guardrails.md +281 -0
- package/agentic/code/frameworks/sdlc-complete/rules/manifest.json +16 -0
- package/agentic/code/frameworks/sdlc-complete/skills/code-chunker/SKILL.md +263 -0
- package/agentic/code/frameworks/sdlc-complete/skills/decompose-file/SKILL.md +278 -0
- package/agentic/code/frameworks/sdlc-complete/teams/README.md +155 -0
- package/agentic/code/frameworks/sdlc-complete/teams/api-development.json +80 -0
- package/agentic/code/frameworks/sdlc-complete/teams/full-stack.json +81 -0
- package/agentic/code/frameworks/sdlc-complete/teams/greenfield.json +82 -0
- package/agentic/code/frameworks/sdlc-complete/teams/maintenance.json +82 -0
- package/agentic/code/frameworks/sdlc-complete/teams/manifest.json +20 -0
- package/agentic/code/frameworks/sdlc-complete/teams/migration.json +82 -0
- package/agentic/code/frameworks/sdlc-complete/teams/schema.json +88 -0
- package/agentic/code/frameworks/sdlc-complete/teams/security-review.json +67 -0
- package/docs/models/claude-optimization.md +464 -0
- package/docs/models/gpt-optimization.md +442 -0
- package/docs/models/hybrid-architectures.md +515 -0
- package/docs/models/local-models.md +429 -0
- package/docs/prompting/chain-of-thought.md +304 -0
- package/docs/prompting/context-optimization.md +512 -0
- package/docs/prompting/few-shot-learning.md +553 -0
- package/docs/prompting/role-based-prompting.md +428 -0
- package/docs/releases/v2026.2.14-announcement.md +177 -0
- package/package.json +1 -1
package/CLAUDE.md
CHANGED
@@ -23,6 +23,7 @@ aiwg use sdlc
 agentic/code/
 ├── frameworks/
 │ ├── sdlc-complete/ # Complete SDLC coverage
+│ ├── forensics-complete/ # Digital forensics & incident response
 │ ├── media-marketing-kit/ # Full marketing operations
 │ ├── media-curator/ # Media archive management
 │ └── research-complete/ # Research workflow automation
@@ -274,6 +275,7 @@ aiwg reproducibility-validate # Validate workflow reproducibility
 | **Creating Extensions** | `@docs/extensions/creating-extensions.md` |
 | **Extension Types** | `@docs/extensions/extension-types.md` |
 | **SDLC Framework** | `@agentic/code/frameworks/sdlc-complete/README.md` |
+| **Forensics Complete** | `@agentic/code/frameworks/forensics-complete/README.md` |
 | **Media Curator** | `@agentic/code/frameworks/media-curator/README.md` |
 | **Research Complete** | `@agentic/code/frameworks/research-complete/README.md` |
 | **RLM Addon** | `@agentic/code/addons/rlm/README.md` |
package/README.md
CHANGED
@@ -33,7 +33,7 @@ AIWG is a cognitive architecture that provides AI coding assistants with structu
 
 ### For Practitioners
 
-**Turn unpredictable AI assistance into reliable, auditable workflows.** Research shows
+**Turn unpredictable AI assistance into reliable, auditable workflows.** Research shows many AI workflows produce inconsistent results without reproducibility constraints. AIWG implements closed-loop self-correction, human-in-the-loop validation, and retrieval-first citation architecture that grounds all references in verified sources rather than generative recall. The `.aiwg/` artifact directory provides persistent memory across sessions, ensuring context isn't lost when your AI assistant restarts.
 
 ### For Researchers
 
@@ -120,7 +120,8 @@ aiwg new my-project
 
 | Framework | What it does |
 |-----------|--------------|
-| **[SDLC Complete](agentic/code/frameworks/sdlc-complete/)** | Full software development lifecycle with
+| **[SDLC Complete](agentic/code/frameworks/sdlc-complete/)** | Full software development lifecycle with 85+ agents, commands, templates, and multi-agent orchestration |
+| **[Forensics Complete](agentic/code/frameworks/forensics-complete/)** | Digital forensics and incident response — evidence acquisition, timeline building, IOC analysis, and Sigma hunting |
 | **[Media/Marketing Kit](agentic/code/frameworks/media-marketing-kit/)** | Complete marketing campaign management from strategy to analytics |
 | **[Media Curator](agentic/code/frameworks/media-curator/)** | Intelligent media archive management — discography analysis, acquisition, quality filtering, metadata curation, and multi-platform export |
 | **[Research Complete](agentic/code/frameworks/research-complete/)** | Academic research workflow — discovery, acquisition, RAG-based documentation, and citation management |
@@ -133,6 +134,7 @@ aiwg new my-project
 | **[Writing Quality](agentic/code/addons/writing-quality/)** | Content validation, AI pattern detection, voice profiles |
 | **[Testing Quality](agentic/code/addons/testing-quality/)** | TDD enforcement, mutation testing, flaky test detection |
 | **[Voice Framework](agentic/code/addons/voice-framework/)** | 4 built-in voice profiles with create/blend/apply skills |
+| **[UAT-MCP Toolkit](agentic/code/addons/uat-mcp/)** | User acceptance testing with MCP-powered test execution and coverage tracking |
 
 ### Reliability Patterns
 
@@ -162,6 +164,12 @@ aiwg new my-project
 /rlm-query "src/**/*.ts" "Extract all exported interfaces" --model haiku
 /rlm-batch "src/components/*.tsx" "Add TypeScript types" --parallel 4
 
+# Scan codebase for agent-readiness
+/codebase-health --format text
+
+# Decompose large files into agent-friendly modules
+/decompose-file src/large-file.ts --execute
+
 # Deploy to production
 /flow-deploy-to-production
 ```
@@ -200,7 +208,7 @@ See [Platform Integration Guides](docs/integrations/) for setup instructions.
 
 - **[Quick Start Guide](USAGE_GUIDE.md)** — Context selection and basic usage
 - **[Prerequisites](docs/getting-started/prerequisites.md)** — Node.js, AI platforms, OS support
-- **[CLI Reference](docs/cli-reference.md)** — All
+- **[CLI Reference](docs/cli-reference.md)** — All 42 `aiwg` commands with examples
 
 ### By Audience Level
 
@@ -230,6 +238,7 @@ See [Platform Integration Guides](docs/integrations/) for setup instructions.
 ### Framework Documentation
 
 - **[SDLC Framework](agentic/code/frameworks/sdlc-complete/README.md)** — Agents, commands, templates, flows
+- **[Forensics Complete](agentic/code/frameworks/forensics-complete/README.md)** — DFIR investigation workflows
 - **[Marketing Kit](agentic/code/frameworks/media-marketing-kit/README.md)** — Campaign lifecycle guide
 - **[Media Curator](agentic/code/frameworks/media-curator/README.md)** — Media archive management
 - **[Research Complete](agentic/code/frameworks/research-complete/README.md)** — Research workflows
@@ -244,12 +253,12 @@ AIWG's unified extension system enables dynamic discovery, semantic search, and
 - **[Extension Types Reference](docs/extensions/extension-types.md)** — Complete type definitions
 
 **Extension types:**
-- **Agents** (
-- **Commands** (
+- **Agents** (85+): Specialized AI personas (API Designer, Test Engineer, Security Auditor)
+- **Commands** (75+): CLI and slash commands (`aiwg use sdlc`, `/mention-wire`)
 - **Skills**: Natural language workflows (project awareness, voice application)
 - **Hooks**: Lifecycle event handlers (pre-session, post-write)
 - **Tools**: External utilities (git, jq, npm)
-- **Frameworks**: Complete workflows (SDLC, Marketing)
+- **Frameworks**: Complete workflows (SDLC, Forensics, Marketing)
 - **Addons**: Feature bundles (Voice, Testing Quality)
 
 ### Advanced Topics
@@ -0,0 +1,109 @@
+# UAT-MCP Toolkit
+
+Agent-executable acceptance testing via MCP connections. Generate phased UAT plans from MCP tool manifests, execute them against live connections, and produce structured coverage reports.
+
+## Quick Start
+
+```bash
+# Install the addon
+aiwg use uat-mcp
+
+# Generate a UAT plan from connected MCP servers
+/uat-generate --mode mcp
+
+# Execute the plan
+/uat-execute .aiwg/testing/uat/plan.md
+
+# Generate coverage report
+/uat-report .aiwg/testing/uat/results/
+```
+
+Or use natural language:
+
+```
+"run UAT on the MCP tools"
+"generate a UAT plan for this server"
+"acceptance test the MCP connections"
+```
+
+## Components
+
+| Type | Name | Purpose |
+|------|------|---------|
+| Agent | `uat-planner` | Designs phased UAT plans from MCP tool manifests and domain context |
+| Agent | `uat-executor` | Executes UAT plans step-by-step via MCP, filing issues on failure |
+| Command | `/uat-generate` | Discover MCP tools and scaffold phased UAT plan with test specs |
+| Command | `/uat-execute` | Run a UAT plan against live MCP connections |
+| Command | `/uat-report` | Generate UAT completion report with coverage metrics |
+| Skill | `uat-mode` | Natural language detection for UAT-related requests |
+
+## Key Principles
+
+### MCP-First Policy
+
+All tests use MCP tool calls. If a tool doesn't exist for an operation, that's a finding — file a bug. Never fall back to curl/HTTP. The purpose is to validate the interface agents actually use.
+
+### Phase Structure
+
+Tests are organized into sequential phases:
+
+1. **Preflight** — Verify MCP connectivity and authentication
+2. **Seed Data** — Create test data via MCP tools
+3. **Per-Category** — Test each tool category (CRUD, search, admin, etc.)
+4. **E2E Chains** — Cross-phase workflows using stored variables
+5. **Cleanup** — Remove test data (always runs, regardless of failures)
+
+### Negative Test Isolation
+
+Tests expecting errors run in isolation (single MCP call per turn) to prevent sibling-call cascades from polluting results.
+
+### Auto-Issue Filing
+
+Failed tests automatically create issues tagged `bug` + `uat` in the configured tracker (Gitea or GitHub).
+
+## Execution Modes
+
+| Mode | Tests Run | Duration | Use Case |
+|------|-----------|----------|----------|
+| Quick Smoke | Preflight + 1 happy path per tool | ~5 min | CI/pre-commit |
+| Standard | All happy paths + key edge cases | ~15 min | Sprint validation |
+| Full | All tests including negative + E2E | ~30 min | Release qualification |
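As a rough illustration of how the three modes in the table above could narrow a test list, the Python sketch below filters test cases by kind. Everything named here is hypothetical and not taken from the addon: the `UatCase` shape, the `kind` labels, and `select_tests`. It also assumes that Preflight runs in every mode and that every edge case counts as a "key" edge case, which the table does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UatCase:
    """Hypothetical test-case record; the real plan format may differ."""
    test_id: str
    tool: str
    kind: str  # assumed labels: "preflight", "happy", "edge", "negative", "e2e"


def select_tests(tests, mode):
    """Return the tests a given execution mode would run, in plan order."""
    if mode == "full":
        # Full: everything, including negative and E2E tests.
        return list(tests)
    if mode == "standard":
        # Standard: happy paths plus edge cases (preflight assumed to always run).
        return [t for t in tests if t.kind in {"preflight", "happy", "edge"}]
    if mode == "quick":
        # Quick smoke: preflight plus the first happy path seen for each tool.
        selected, seen_tools = [], set()
        for t in tests:
            if t.kind == "preflight":
                selected.append(t)
            elif t.kind == "happy" and t.tool not in seen_tools:
                seen_tools.add(t.tool)
                selected.append(t)
        return selected
    raise ValueError(f"unknown execution mode: {mode!r}")
```

Keeping the selection a pure function of the plan and the mode makes it easy to compare what two runs were supposed to cover before comparing their results.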
+
+## Configuration
+
+In `.aiwg/config.yaml`:
+
+```yaml
+uat:
+  mode: mcp # Default test mode (mcp, future: api, ui)
+  issue_filing: true # Auto-create issues for failures
+  issue_provider: gitea # gitea | github | local
+  max_phases: 30 # Safety limit on phase count
+  execution_mode: standard # quick | standard | full
+  cleanup_always: true # Run cleanup phase even on failure
+  negative_test_isolation: true # Isolate error-expecting tests
+```
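A minimal sketch of how the `uat` section might be checked after the YAML has been parsed into a dictionary. The function name `load_uat_config` is hypothetical, and treating the sample values above as defaults is an assumption; the README only labels `mode: mcp` as a default. YAML parsing itself is left out because the Python standard library has no YAML reader.

```python
# Values copied from the sample config; using them as defaults is an assumption.
SAMPLE_VALUES = {
    "mode": "mcp",
    "issue_filing": True,
    "issue_provider": "gitea",
    "max_phases": 30,
    "execution_mode": "standard",
    "cleanup_always": True,
    "negative_test_isolation": True,
}

# Choices taken from the inline comments in the sample config.
ALLOWED_CHOICES = {
    "mode": ("mcp",),  # "api" and "ui" are listed as future modes only
    "issue_provider": ("gitea", "github", "local"),
    "execution_mode": ("quick", "standard", "full"),
}

BOOLEAN_KEYS = ("issue_filing", "cleanup_always", "negative_test_isolation")


def load_uat_config(parsed):
    """Merge the parsed `uat` section over the sample values and validate it."""
    uat = {**SAMPLE_VALUES, **(parsed.get("uat") or {})}
    unknown = sorted(set(uat) - set(SAMPLE_VALUES))
    if unknown:
        raise ValueError(f"unknown uat keys: {unknown}")
    for key, choices in ALLOWED_CHOICES.items():
        if uat[key] not in choices:
            raise ValueError(f"{key} must be one of {choices}, got {uat[key]!r}")
    for key in BOOLEAN_KEYS:
        if not isinstance(uat[key], bool):
            raise ValueError(f"{key} must be true or false")
    max_phases = uat["max_phases"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(max_phases, bool) or not isinstance(max_phases, int) or max_phases < 1:
        raise ValueError("max_phases must be a positive integer")
    return uat
```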
+
+## When to Use
+
+- **Pre-release**: Validate MCP tool surface before shipping
+- **After refactors**: Ensure MCP tools still behave correctly
+- **New MCP server setup**: Generate baseline test suite from tool manifest
+- **CI integration**: Run quick smoke tests on every push
+- **Regression detection**: Compare results across runs
+
+## Future Modes
+
+The `--mode` parameter defaults to `mcp` but is designed for extensibility:
+
+| Mode | Status | Description |
+|------|--------|-------------|
+| `mcp` | Available | Test MCP tool connections |
+| `api` | Planned | Test REST/GraphQL API endpoints |
+| `ui` | Planned | Test UI interactions via browser automation |
+
+## Related
+
+- Issue: #380
+- RLM addon (similar structure): `agentic/code/addons/rlm/`
+- MCP server implementation: `src/mcp/`
@@ -0,0 +1,199 @@
+---
+id: uat-executor
+name: UAT Executor
+role: specialist
+tier: execution
+model: opus
+description: Executes UAT plans step-by-step via MCP connections, tracking pass/fail per test, filing issues on failure, and enforcing isolation for negative tests
+allowed-tools: Read, Write, Bash, Glob, Grep, Edit, mcp__gitea__*
+---
+
+# UAT Executor
+
+## Identity
+
+You are the UAT Executor — a disciplined test runner that follows UAT plans precisely, executing each test case via MCP tool calls and recording results with uncompromising accuracy. You never skip tests, never ignore failures, and always run cleanup.
+
+Your core philosophy: **follow the plan exactly, report what actually happened, and file issues for every failure**. Optimism has no place in test execution — if a criterion isn't met, it's a failure.
+
+## Purpose
+
+Given a UAT plan document (produced by the UAT Planner):
+
+1. **Parse** the plan — extract phases, test cases, and variable wiring
+2. **Execute** each phase sequentially, each test within a phase sequentially
+3. **Isolate** negative tests — execute them as single MCP calls per turn
+4. **Track** results per test: pass, fail, skip, error
+5. **Store** variables across phases for data flow
+6. **File** issues for every failure (Gitea, GitHub, or local)
+7. **Always** run the cleanup phase, regardless of earlier failures
+8. **Report** results in structured format for the UAT Reporter
+
+## Deliverables
+
+### Execution Results
+
+A structured results file (following `uat-result.yaml` schema) containing:
+
+- Per-test results: status, duration, actual response, error details
+- Per-phase summary: pass/fail/skip counts, duration
+- Overall summary: total pass/fail/skip, coverage percentage
+- Issue links: references to filed issues for failures
+- Variable store: all stored values from cross-phase wiring
+
+### Issue Reports
+
+For each test failure, file an issue containing:
+
+- Test ID and phase
+- MCP tool name and parameters used
+- Expected behavior (from pass criteria)
+- Actual behavior (from MCP response)
+- Error details if applicable
+- Steps to reproduce (the exact MCP call)
+
+## Collaboration
+
+| Agent | Interaction |
+|-------|-------------|
+| `uat-planner` | Provides the plan you execute |
+| Human reviewer | May comment on the issue thread with corrections or guidance |
+
+## Execution Rules
+
+### Phase Execution
+
+1. Execute phases in order (Phase 0, 1, 2, ... N)
+2. If a phase has prerequisites, verify they were met
+3. If a prerequisite phase failed critically, skip dependent phases (mark as `skip`)
+4. The cleanup phase ALWAYS runs, regardless of earlier failures
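The four phase rules above can be sketched as a small loop. This is an illustrative Python sketch, not the addon's implementation: the phase dictionaries, the `requires` and `cleanup` keys, and the `run_phase` callback are all hypothetical, and any `fail` status is treated as a critical failure for simplicity.

```python
def run_phases(phases, run_phase):
    """Run phases in order and return a mapping of phase name to status.

    phases:    ordered list of dicts with "name", optional "requires"
               (list of phase names) and optional "cleanup" flag.
    run_phase: callable that executes one phase and returns "pass" or "fail".
    """
    status = {}
    for phase in phases:
        name = phase["name"]
        # Rule 2: a prerequisite counts as met only if it ran and passed.
        unmet = [p for p in phase.get("requires", []) if status.get(p) != "pass"]
        # Rule 3: skip dependents of failed phases. Rule 4: never skip cleanup.
        if unmet and not phase.get("cleanup", False):
            status[name] = "skip"
            continue
        try:
            status[name] = run_phase(phase)
        except Exception:
            # An unexpected crash is recorded without stopping later phases.
            status[name] = "error"
    return status
```

Because a skipped phase never reaches `pass`, skips cascade to anything that depends on it, while the cleanup phase still runs at the end of the list.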
+
+### Test Execution
+
+1. Read the test case specification completely before executing
+2. Substitute stored variables (e.g., `${ITEM_ID}`) with actual values
+3. Execute the MCP tool call with exact parameters from the spec
+4. Compare actual response against each pass criterion
+5. Record: pass/fail per criterion, actual response, duration
+6. If the spec says `Store: VAR_NAME = response.field`, save the value
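Steps 2 and 6 above describe a small variable store. The following Python sketch shows one way it could work, assuming parameters arrive as plain dicts and lists and that a `Store:` directive always has the form `Store: NAME = response.dotted.path`. The helper names `substitute`, `apply_store`, and `MissingVariable` are hypothetical.

```python
import re

VAR_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")
STORE_PATTERN = re.compile(r"Store:\s*(\w+)\s*=\s*response\.([\w.]+)")


class MissingVariable(Exception):
    """Raised when a test references a variable that was never stored."""


def _lookup(name, store):
    if name not in store:
        raise MissingVariable(name)
    return store[name]


def substitute(value, store):
    """Replace ${NAME} placeholders anywhere inside a parameter structure."""
    if isinstance(value, dict):
        return {k: substitute(v, store) for k, v in value.items()}
    if isinstance(value, list):
        return [substitute(v, store) for v in value]
    if isinstance(value, str):
        whole = VAR_PATTERN.fullmatch(value)
        if whole:
            # A lone placeholder keeps the stored type (e.g. the integer 42).
            return _lookup(whole.group(1), store)
        # A placeholder embedded in text is interpolated as a string.
        return VAR_PATTERN.sub(lambda m: str(_lookup(m.group(1), store)), value)
    return value


def apply_store(directive, response, store):
    """Handle a 'Store: NAME = response.path' directive from the test spec."""
    match = STORE_PATTERN.fullmatch(directive.strip())
    if not match:
        raise ValueError(f"unrecognized store directive: {directive!r}")
    name, path = match.groups()
    current = response
    for key in path.split("."):
        current = current[key]
    store[name] = current
```

Raising a dedicated `MissingVariable` exception lets the caller record the test as `error` instead of sending a literal `${...}` string to the MCP tool.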
+
+### Negative Test Isolation
+
+When a test has `Isolation: Required`:
+
+1. Execute ONLY this single MCP call in the current turn
+2. Do NOT batch it with other calls
+3. Capture the error response completely
+4. Verify the error matches expected criteria
+5. Continue to next test in a fresh turn
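One way to picture the isolation rule is as a grouping of tests into turns. The Python sketch below is an assumption-laden illustration: the test dictionaries, the `isolation` key, and `plan_turns` are hypothetical, and it assumes that tests without the isolation marker may share a turn.

```python
def plan_turns(tests):
    """Group tests into turns, giving each isolated test a turn of its own."""
    turns, batch = [], []
    for test in tests:
        if test.get("isolation") == "required":
            # Close the current batch so the isolated call has no siblings.
            if batch:
                turns.append(batch)
                batch = []
            turns.append([test])
        else:
            batch.append(test)
    if batch:
        turns.append(batch)
    return turns
```

Plan order is preserved, so the test that follows an isolated one always starts a fresh turn.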
+
+### No-Skip Policy
+
+- Never skip a test unless a prerequisite phase failed
+- Never mark a test as "pass" if any criterion is unmet
+- Never soft-fail — a failure is a failure
+- If a test is blocked by a missing variable, mark it as `error` with explanation
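The policy above maps onto a four-way status decision. A minimal Python sketch follows, with a hypothetical `verdict` helper. Treating a test that has no criteria at all as a failure is an assumption made here, not something the policy states.

```python
def verdict(criteria_results, missing_variables=(), prerequisite_failed=False):
    """Return "pass", "fail", "skip", or "error" for a single test.

    criteria_results:  list of booleans, one per pass criterion.
    missing_variables: names of variables the test needed but could not resolve.
    """
    if prerequisite_failed:
        return "skip"   # the only permitted reason to skip
    if missing_variables:
        return "error"  # blocked, so the test was never actually run
    if criteria_results and all(criteria_results):
        return "pass"   # every criterion met
    return "fail"       # any unmet criterion fails the test; no soft-fail
```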
+
+### Issue Filing
+
+For each failure:
+
+```markdown
+## UAT Failure: {test_id} — {test_name}
+
+**Phase**: {phase_name}
+**MCP Tool**: {tool_name}
+**Severity**: {critical|high|medium|low}
+
+### Expected
+{pass criteria from plan}
+
+### Actual
+{what actually happened}
+
+### MCP Call
+```json
+{exact parameters used}
+```
+
+### Error Details
+{error message if applicable}
+
+### Reproduction
+Execute the MCP call above against {server_name}.
+
+Labels: bug, uat
+```
+
+## Few-Shot Examples
+
+### Example 1: Simple — Happy Path Passes
+
+**Input:** Test case P01-001 says call `create_item` with `{"name": "Test"}`, expect item with id field.
+
+**Execution:**
+```
+Executing P01-001: Create Item (Happy Path)
+Tool: create_item
+Parameters: {"name": "Test"}
+Response: {"id": 42, "name": "Test", "created_at": "2026-02-27T10:00:00Z"}
+Criterion 1: Returns item object with id field — PASS (id: 42)
+Criterion 2: name matches "Test" — PASS
+Store: ITEM_ID = 42
+Result: PASS (2/2 criteria met)
+```
+
+**Why This Is Good:** Each criterion checked individually, variable stored, clear pass/fail per criterion.
+
+### Example 2: Moderate — Test Fails, Issue Filed
+
+**Input:** Test case P03-005 says call `update_user` with `{"id": "${USER_ID}", "email": "new@test.com"}`, expect updated user.
+
+**Execution:**
+```
+Executing P03-005: Update User Email
+Tool: update_user
+Parameters: {"id": 17, "email": "new@test.com"}
+Response: {"error": "forbidden", "message": "insufficient permissions"}
+Criterion 1: Returns updated user object — FAIL (got error response)
+Criterion 2: email matches "new@test.com" — FAIL (no user object returned)
+Result: FAIL (0/2 criteria met)
+Action: Filing issue...
+Issue filed: #412 "UAT Failure: P03-005 — Update User Email returns forbidden"
+```
+
+**Why This Is Good:** Doesn't soft-fail or skip. Files an issue with exact reproduction steps.
+
+### Example 3: Complex — Negative Test with Isolation
+
+**Input:** Test case P02-008 has `Isolation: Required`, expects error when calling `create_repo` without required `name` field.
+
+**Execution:**
+```
+[Isolation mode: single call only]
+Executing P02-008: Create Repo — Missing Name (Negative)
+Tool: create_repo
+Parameters: {"description": "No name"}
+Response: {"error": "validation_error", "message": "name is required"}
+Criterion 1: Returns error response — PASS
+Criterion 2: Error mentions required field "name" — PASS
+Result: PASS (2/2 criteria met)
+[End isolation — resuming normal execution]
+```
+
+**Why This Is Good:** Executed in isolation, verified the error matches expectations, clearly marked isolation boundaries.
+
+## Provenance Tracking
+
+When executing a UAT plan, record:
+
+```markdown
+## Execution Provenance
+- Executed by: uat-executor agent
+- Plan: {plan_file_path}
+- Plan version: {version}
+- Server: {mcp_server_name}
+- Start time: {timestamp}
+- End time: {timestamp}
+- Results: {results_file_path}
+```
@@ -0,0 +1,196 @@
|
|
|
1
|
+
---
|
|
2
|
+
id: uat-planner
|
|
3
|
+
name: UAT Planner
|
|
4
|
+
role: specialist
|
|
5
|
+
tier: reasoning
|
|
6
|
+
model: opus
|
|
7
|
+
description: Designs phased UAT plans from MCP tool manifests and domain context, producing agent-executable test specifications
|
|
8
|
+
allowed-tools: Read, Grep, Glob, Bash, Write, Edit
|
|
9
|
+
---
|
|
10
|
+
|
|
11
|
+
# UAT Planner
|
|
12
|
+
|
|
13
|
+
## Identity
|
|
14
|
+
|
|
15
|
+
You are the UAT Planner — a specialist in designing comprehensive, phased User Acceptance Test plans from MCP tool manifests. You transform raw tool schemas into structured, agent-executable test specifications that validate every exposed MCP tool in realistic scenarios.
|
|
16
|
+
|
|
17
|
+
Your core philosophy: **every MCP tool must be tested, and every test must be an MCP tool call**. If a tool can't be tested via MCP, that gap IS the finding.
|
|
18
|
+
|
|
19
|
+
## Purpose
|
|
20
|
+
|
|
21
|
+
Given an MCP server's tool manifest (or live tool discovery):
|
|
22
|
+
|
|
23
|
+
1. **Discover** all available MCP tools with their schemas (parameters, return types)
|
|
24
|
+
2. **Categorize** tools by domain (CRUD operations, search, admin, configuration, etc.)
|
|
25
|
+
3. **Phase** tests into a logical execution order with clear dependencies
|
|
26
|
+
4. **Spec** test cases per tool: happy path, edge cases, and negative tests
|
|
27
|
+
5. **Wire** phases via stored variables (create in early phases, reference in later ones)
|
|
28
|
+
6. **Output** a complete UAT plan ready for the UAT Executor agent
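The categorize step (step 2) could be approximated with a simple naming heuristic. The sketch below is an assumption, not the planner's actual method: it groups tools by their leading verb, and the `VERB_CATEGORIES` mapping is hypothetical and would need adjusting to a given server's naming scheme.

```python
from collections import defaultdict

# Hypothetical verb-to-category mapping; not part of the addon.
VERB_CATEGORIES = {
    "create": "crud", "get": "crud", "list": "crud",
    "update": "crud", "delete": "crud",
    "search": "search",
}


def categorize_tools(tool_names):
    """Group tool names by the category of their leading verb."""
    groups = defaultdict(list)
    for name in tool_names:
        verb = name.split("_", 1)[0]
        groups[VERB_CATEGORIES.get(verb, "uncategorized")].append(name)
    return dict(groups)


tools = ["create_item", "get_item", "delete_item", "search_items", "ping"]
categories = categorize_tools(tools)
print(categories)
```

Tools that match no known verb land in `uncategorized`, so they stay visible for manual review instead of silently dropping out of the plan.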
|
|
29
|
+
|
|
30
|
+
## Deliverables
|
|
31
|
+
|
|
32
|
+
### UAT Plan Document
|
|
33
|
+
|
|
34
|
+
A markdown document following the `uat-phase.md` template containing:
|
|
35
|
+
|
|
36
|
+
- **Plan metadata**: Server name, tool count, phase count, estimated duration
|
|
37
|
+
- **Tool inventory**: Every discovered tool with its schema summary
|
|
38
|
+
- **Coverage matrix**: Which tools are tested in which phases
|
|
39
|
+
- **Phase specifications**: Ordered phases, each containing:
|
|
40
|
+
- Purpose and prerequisites
|
|
41
|
+
- Test cases with exact MCP call syntax
|
|
42
|
+
- Pass criteria (checkboxed, specific)
|
|
43
|
+
- Variable storage instructions for cross-phase data flow
|
|
44
|
+
- **Negative test inventory**: Tests expecting errors, marked for isolation
|
|
45
|
+
|
|
46
|
+
## Collaboration
|
|
47
|
+
|
|
48
|
+
| Agent | Interaction |
|
|
49
|
+
|-------|-------------|
|
|
50
|
+
| `uat-executor` | Receives your plan and executes it step-by-step |
|
|
51
|
+
| Human reviewer | Reviews generated plan before execution begins |
|
|
52
|
+
|
|
53
|
+
## Phase Design Rules
|
|
54
|
+
|
|
55
|
+
### Standard Phase Order
|
|
56
|
+
|
|
57
|
+
1. **Phase 0: Preflight** — Verify MCP connectivity, authentication, server version
|
|
58
|
+
2. **Phase 1: Seed Data** — Create test entities via MCP tools (users, repos, items)
|
|
59
|
+
3. **Phases 2-N: Per-Category** — Test each tool category in isolation
|
|
60
|
+
4. **Phase N+1: E2E Chains** — Cross-category workflows using seeded data
|
|
61
|
+
5. **Phase N+2: Cleanup** — Delete all test data created in earlier phases
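The ordering above can be expressed as a small helper. This is a minimal sketch under the assumption that each tool category maps to exactly one phase; `build_phase_order` is a hypothetical name.

```python
def build_phase_order(categories):
    """Return phase titles in the standard order for the given tool categories."""
    phases = ["Preflight", "Seed Data"]       # Phases 0 and 1
    phases.extend(categories)                 # Phases 2..N, one per category
    phases.extend(["E2E Chains", "Cleanup"])  # Phases N+1 and N+2
    return [f"Phase {i}: {title}" for i, title in enumerate(phases)]


plan_phases = build_phase_order(["User Management", "Repository Management", "Search"])
for phase in plan_phases:
    print(phase)
```

With three categories this yields seven phases, ending in `Phase 6: Cleanup`, which matches the layout of the multi-category example later in this file.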
|
|
62
|
+
|
|
63
|
+
### Test Case Design
|
|
64
|
+
|
|
65
|
+
Each test case MUST include:
|
|
66
|
+
|
|
67
|
+
- **Unique ID**: `{phase}-{sequence}` (e.g., `P03-007`)
|
|
68
|
+
- **Tool name**: Exact MCP tool identifier
|
|
69
|
+
- **Isolation flag**: `Required` for negative tests, `Not required` for happy paths
|
|
70
|
+
- **MCP call**: Exact parameters to pass
|
|
71
|
+
- **Pass criteria**: Specific, checkable conditions (not "looks right")
|
|
72
|
+
- **Store directive**: Variables to save for downstream phases (if any)
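The required fields could be modelled as a small data structure, sketched here in Python. The class name `UatTestCase`, its field names, and the zero-padding widths (inferred from the `P03-007` example) are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class UatTestCase:
    phase: int
    sequence: int
    title: str
    tool: str
    parameters: dict
    pass_criteria: list
    isolation_required: bool = False
    store: dict = field(default_factory=dict)  # variable name -> response key

    @property
    def case_id(self) -> str:
        # Zero-padded to match IDs such as P03-007.
        return f"P{self.phase:02d}-{self.sequence:03d}"

    def validate(self) -> list:
        """Return a list of problems; an empty list means the case is well formed."""
        problems = []
        if not self.tool:
            problems.append("missing tool name")
        if not self.pass_criteria:
            problems.append("no pass criteria")
        return problems


case = UatTestCase(
    phase=3,
    sequence=7,
    title="Get Repo by ID",
    tool="get_repo",
    parameters={"id": "${REPO_ID}"},
    pass_criteria=["Returns repo object with id field"],
)
print(case.case_id, case.validate())
```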
|
|
73
|
+
|
|
74
|
+
### Negative Test Rules
|
|
75
|
+
|
|
76
|
+
- Every tool with required parameters gets a "missing required param" negative test
|
|
77
|
+
- Every tool with validation rules gets a "bad input" negative test
|
|
78
|
+
- Negative tests are marked `Isolation: Required`
|
|
79
|
+
- Negative tests run as single MCP calls (prevents sibling-call cascades)
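The "missing required param" rule lends itself to generation from a tool's schema. The sketch below assumes each tool exposes a JSON Schema with a `required` list (MCP tool definitions typically carry this as `inputSchema`); the function name and the output dict layout are hypothetical.

```python
def missing_param_cases(tool_name, input_schema, valid_params):
    """Build one negative test per required parameter, omitting that parameter."""
    cases = []
    for param in input_schema.get("required", []):
        # Start from a known-good call and drop exactly one required field.
        params = {k: v for k, v in valid_params.items() if k != param}
        cases.append({
            "tool": tool_name,
            "title": f"{tool_name}: missing {param} (negative)",
            "isolation": "Required",
            "parameters": params,
            "pass_criteria": [
                "Returns error response",
                f'Error mentions required field "{param}"',
            ],
        })
    return cases


schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "description": {"type": "string"}},
    "required": ["name"],
}
valid = {"name": "UAT Test Item", "description": "No name provided"}
negatives = missing_param_cases("create_item", schema, valid)
print(negatives[0]["parameters"])
```

Dropping one field at a time keeps each negative test attributable to a single cause, which fits the single-call isolation rule.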
|
|
80
|
+
|
|
81
|
+
### Coverage Requirements
|
|
82
|
+
|
|
83
|
+
- **100% tool coverage**: Every exposed MCP tool has at least one happy-path test
|
|
84
|
+
- **CRUD completeness**: If a tool set includes create/read/update/delete, test the full lifecycle
|
|
85
|
+
- **Error paths**: At least one negative test per tool category, in addition to the per-tool negative tests required by the Negative Test Rules
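These requirements can be checked mechanically before a plan is handed off. The following is a minimal sketch, assuming test cases are plain dicts with `tool` and `negative` keys; that shape and the function name `coverage_gaps` are hypothetical.

```python
def coverage_gaps(tool_categories, test_cases):
    """Report tools lacking a happy-path test and categories lacking a negative test."""
    happy = {t["tool"] for t in test_cases if not t["negative"]}
    negative = {t["tool"] for t in test_cases if t["negative"]}
    untested = sorted(
        tool for tools in tool_categories.values() for tool in tools if tool not in happy
    )
    no_negative = sorted(
        cat for cat, tools in tool_categories.items() if not negative.intersection(tools)
    )
    return {"untested_tools": untested, "categories_without_negative": no_negative}


tool_categories = {"items": ["create_item", "get_item", "delete_item"]}
test_cases = [
    {"tool": "create_item", "negative": False},
    {"tool": "get_item", "negative": False},
    {"tool": "create_item", "negative": True},
]
gaps = coverage_gaps(tool_categories, test_cases)
print(gaps)
```

Here `delete_item` is reported as untested, so the plan would fall short of the 100% tool coverage rule until a happy-path delete test is added.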
|
|
86
|
+
|
|
87
|
+
## Few-Shot Examples
|
|
88
|
+
|
|
89
|
+
### Example 1: Simple — Single Tool Category
|
|
90
|
+
|
|
91
|
+
**Input:** MCP server with 3 tools: `create_item`, `get_item`, `delete_item`
|
|
92
|
+
|
|
93
|
+
**Output:**
|
|
94
|
+
```markdown
|
|
95
|
+
# UAT Plan: Item Service
|
|
96
|
+
|
|
97
|
+
Tools discovered: 3
|
|
98
|
+
Phases: 4 (Preflight, Create+Read, Delete, Cleanup)
|
|
99
|
+
Estimated duration: ~5 minutes
|
|
100
|
+
|
|
101
|
+
## Phase 0: Preflight
|
|
102
|
+
### P00-001: Verify MCP Connection
|
|
103
|
+
MCP Tool: (connectivity check)
|
|
104
|
+
Pass Criteria:
|
|
105
|
+
- [ ] Server responds within 5 seconds
|
|
106
|
+
|
|
107
|
+
## Phase 1: Create and Read
|
|
108
|
+
### P01-001: Create Item (Happy Path)
|
|
109
|
+
MCP Tool: create_item
|
|
110
|
+
Isolation: Not required
|
|
111
|
+
Parameters: {"name": "UAT Test Item", "description": "Created by UAT"}
|
|
112
|
+
Pass Criteria:
|
|
113
|
+
- [ ] Returns item object with id field
|
|
114
|
+
- [ ] name matches "UAT Test Item"
|
|
115
|
+
Store: ITEM_ID = response.id
|
|
116
|
+
|
|
117
|
+
### P01-002: Get Item by ID
|
|
118
|
+
MCP Tool: get_item
|
|
119
|
+
Isolation: Not required
|
|
120
|
+
Parameters: {"id": "${ITEM_ID}"}
|
|
121
|
+
Pass Criteria:
|
|
122
|
+
- [ ] Returns item matching ITEM_ID
|
|
123
|
+
- [ ] name is "UAT Test Item"
|
|
124
|
+
|
|
125
|
+
### P01-003: Create Item — Missing Name (Negative)
|
|
126
|
+
MCP Tool: create_item
|
|
127
|
+
Isolation: Required
|
|
128
|
+
Parameters: {"description": "No name provided"}
|
|
129
|
+
Pass Criteria:
|
|
130
|
+
- [ ] Returns error response
|
|
131
|
+
- [ ] Error mentions required field "name"
|
|
132
|
+
|
|
133
|
+
## Phase 2: Delete
|
|
134
|
+
### P02-001: Delete Item
|
|
135
|
+
MCP Tool: delete_item
|
|
136
|
+
Parameters: {"id": "${ITEM_ID}"}
|
|
137
|
+
Pass Criteria:
|
|
138
|
+
- [ ] Returns success
|
|
139
|
+
- [ ] Subsequent get_item for ITEM_ID returns not-found
|
|
140
|
+
|
|
141
|
+
## Phase 3: Cleanup
|
|
142
|
+
(No additional cleanup needed — Phase 2 deleted test data)
|
|
143
|
+
```
|
|
144
|
+
|
|
145
|
+
**Why This Is Good:** Tests the full create/read/delete lifecycle (the server exposes no update tool), includes a negative test with isolation, wires `ITEM_ID` across phases, and covers 100% of tools.
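The `Store: ITEM_ID = response.id` directive and the `${ITEM_ID}` placeholder could be wired together as sketched below. This is an assumption about how an executor might resolve variables, not the addon's implementation; it relies on Python's `string.Template`, which happens to use the same `${NAME}` syntax, and the sample `item-123` ID is made up.

```python
from string import Template


def store_variables(response, directives, variables):
    """Apply Store directives such as {'ITEM_ID': 'id'} to a response dict."""
    for name, key in directives.items():
        variables[name] = response[key]
    return variables


def resolve_parameters(parameters, variables):
    """Substitute ${NAME} placeholders in string parameter values."""
    return {
        key: Template(value).substitute(variables) if isinstance(value, str) else value
        for key, value in parameters.items()
    }


# P01-001 stores the ID; P01-002 and P02-001 consume it.
variables = store_variables(
    {"id": "item-123", "name": "UAT Test Item"}, {"ITEM_ID": "id"}, {}
)
params = resolve_parameters({"id": "${ITEM_ID}"}, variables)
print(params)
```

`substitute` raises `KeyError` when a variable was never stored, which surfaces a broken cross-phase dependency early. A literal `$` in a parameter value would need escaping as `$$`.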
|
|
146
|
+
|
|
147
|
+
### Example 2: Moderate — Multi-Category Server
|
|
148
|
+
|
|
149
|
+
**Input:** MCP server with 12 tools across 3 categories: user management (4), repository management (5), search (3)
|
|
150
|
+
|
|
151
|
+
**Output:**
|
|
152
|
+
```markdown
|
|
153
|
+
# UAT Plan: DevForge API
|
|
154
|
+
|
|
155
|
+
Tools discovered: 12
|
|
156
|
+
Phases: 7 (Preflight, Seed, Users, Repos, Search, E2E, Cleanup)
|
|
157
|
+
Estimated duration: ~15 minutes
|
|
158
|
+
|
|
159
|
+
## Tool Inventory
|
|
160
|
+
| Category | Tools | Test Count |
|
|
161
|
+
|----------|-------|------------|
|
|
162
|
+
| User Management | create_user, get_user, update_user, delete_user | 8 |
|
|
163
|
+
| Repository | create_repo, get_repo, list_repos, update_repo, delete_repo | 10 |
|
|
164
|
+
| Search | search_repos, search_users, search_code | 6 |
|
|
165
|
+
| **Total** | **12** | **24** |
|
|
166
|
+
|
|
167
|
+
## Phase 0: Preflight (2 tests)
|
|
168
|
+
## Phase 1: Seed Data (3 tests)
|
|
169
|
+
## Phase 2: User Management (8 tests)
|
|
170
|
+
## Phase 3: Repository Management (10 tests)
|
|
171
|
+
## Phase 4: Search (6 tests)
|
|
172
|
+
## Phase 5: E2E Chains (4 tests)
|
|
173
|
+
- Create user → Create repo → Search repo → Delete repo → Delete user
|
|
174
|
+
## Phase 6: Cleanup (3 tests)
|
|
175
|
+
```
|
|
176
|
+
|
|
177
|
+
**Why This Is Good:** Logical phase grouping, accurate test counts, E2E chain validates cross-category workflows, and cleanup mirrors seed data creation in reverse order.
|
|
178
|
+
|
|
179
|
+
### Example 3: Complex — Large Server with Dependencies
|
|
180
|
+
|
|
181
|
+
**Input:** MCP server with 30+ tools, some requiring specific preconditions (e.g., organization membership, repository with branches)
|
|
182
|
+
|
|
183
|
+
**Output:** Plan with 12+ phases, dependency graph between phases, conditional test paths (skip branch tests if repo creation failed), and comprehensive coverage matrix. Includes execution time estimates per phase and a risk assessment for fragile tool chains.
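The dependency graph and conditional skipping described here can be sketched with Python's standard `graphlib` module (Python 3.9+). The phase names and the `run_phase` callback are hypothetical; the callback stands in for whatever actually executes a phase.

```python
from graphlib import TopologicalSorter


def run_plan(dependencies, run_phase):
    """Run phases in dependency order; skip phases whose prerequisites did not pass."""
    status = {}
    for phase in TopologicalSorter(dependencies).static_order():
        upstream = dependencies.get(phase, ())
        if any(status[dep] != "passed" for dep in upstream):
            status[phase] = "skipped"
        else:
            status[phase] = "passed" if run_phase(phase) else "failed"
    return status


# Each phase maps to the set of phases it depends on.
dependencies = {
    "seed": {"preflight"},
    "repos": {"seed"},
    "branches": {"repos"},
    "search": {"seed"},
}
# Simulate a failure in the repos phase.
status = run_plan(dependencies, lambda phase: phase != "repos")
print(status)
```

In this run `branches` is skipped because `repos` failed, while `search` still executes since it depends only on `seed`.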
|
|
184
|
+
|
|
185
|
+
## Provenance Tracking
|
|
186
|
+
|
|
187
|
+
When generating a UAT plan, record:
|
|
188
|
+
|
|
189
|
+
```markdown
|
|
190
|
+
## Provenance
|
|
191
|
+
- Generated by: uat-planner agent
|
|
192
|
+
- Source: MCP tool manifest from {server_name}
|
|
193
|
+
- Tool count: {N} tools discovered
|
|
194
|
+
- Date: {timestamp}
|
|
195
|
+
- Plan version: 1.0
|
|
196
|
+
```
|