tribunal-kit 1.0.0 → 2.4.2
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.agent/.shared/ui-ux-pro-max/README.md +3 -3
- package/.agent/ARCHITECTURE.md +205 -10
- package/.agent/GEMINI.md +37 -7
- package/.agent/agents/accessibility-reviewer.md +134 -0
- package/.agent/agents/ai-code-reviewer.md +129 -0
- package/.agent/agents/frontend-specialist.md +3 -0
- package/.agent/agents/game-developer.md +21 -21
- package/.agent/agents/logic-reviewer.md +12 -0
- package/.agent/agents/mobile-reviewer.md +79 -0
- package/.agent/agents/orchestrator.md +56 -26
- package/.agent/agents/performance-reviewer.md +36 -0
- package/.agent/agents/supervisor-agent.md +156 -0
- package/.agent/agents/swarm-worker-contracts.md +166 -0
- package/.agent/agents/swarm-worker-registry.md +92 -0
- package/.agent/rules/GEMINI.md +134 -5
- package/.agent/scripts/bundle_analyzer.py +259 -0
- package/.agent/scripts/dependency_analyzer.py +247 -0
- package/.agent/scripts/lint_runner.py +188 -0
- package/.agent/scripts/patch_skills_meta.py +177 -0
- package/.agent/scripts/patch_skills_output.py +285 -0
- package/.agent/scripts/schema_validator.py +279 -0
- package/.agent/scripts/security_scan.py +224 -0
- package/.agent/scripts/session_manager.py +144 -3
- package/.agent/scripts/skill_integrator.py +234 -0
- package/.agent/scripts/strengthen_skills.py +220 -0
- package/.agent/scripts/swarm_dispatcher.py +317 -0
- package/.agent/scripts/test_runner.py +192 -0
- package/.agent/scripts/test_swarm_dispatcher.py +163 -0
- package/.agent/skills/agent-organizer/SKILL.md +132 -0
- package/.agent/skills/agentic-patterns/SKILL.md +335 -0
- package/.agent/skills/api-patterns/SKILL.md +226 -50
- package/.agent/skills/app-builder/SKILL.md +215 -52
- package/.agent/skills/architecture/SKILL.md +176 -31
- package/.agent/skills/bash-linux/SKILL.md +150 -134
- package/.agent/skills/behavioral-modes/SKILL.md +152 -160
- package/.agent/skills/brainstorming/SKILL.md +148 -101
- package/.agent/skills/brainstorming/dynamic-questioning.md +10 -0
- package/.agent/skills/clean-code/SKILL.md +139 -134
- package/.agent/skills/code-review-checklist/SKILL.md +177 -80
- package/.agent/skills/config-validator/SKILL.md +165 -0
- package/.agent/skills/csharp-developer/SKILL.md +107 -0
- package/.agent/skills/database-design/SKILL.md +252 -29
- package/.agent/skills/deployment-procedures/SKILL.md +122 -175
- package/.agent/skills/devops-engineer/SKILL.md +134 -0
- package/.agent/skills/devops-incident-responder/SKILL.md +98 -0
- package/.agent/skills/documentation-templates/SKILL.md +175 -121
- package/.agent/skills/dotnet-core-expert/SKILL.md +103 -0
- package/.agent/skills/edge-computing/SKILL.md +213 -0
- package/.agent/skills/frontend-design/SKILL.md +76 -0
- package/.agent/skills/frontend-design/color-system.md +18 -0
- package/.agent/skills/frontend-design/typography-system.md +18 -0
- package/.agent/skills/game-development/SKILL.md +69 -0
- package/.agent/skills/geo-fundamentals/SKILL.md +158 -99
- package/.agent/skills/github-operations/SKILL.md +354 -0
- package/.agent/skills/i18n-localization/SKILL.md +158 -96
- package/.agent/skills/intelligent-routing/SKILL.md +89 -285
- package/.agent/skills/intelligent-routing/router-manifest.md +65 -0
- package/.agent/skills/lint-and-validate/SKILL.md +229 -27
- package/.agent/skills/llm-engineering/SKILL.md +258 -0
- package/.agent/skills/local-first/SKILL.md +203 -0
- package/.agent/skills/mcp-builder/SKILL.md +159 -111
- package/.agent/skills/mobile-design/SKILL.md +102 -282
- package/.agent/skills/nextjs-react-expert/SKILL.md +143 -227
- package/.agent/skills/nodejs-best-practices/SKILL.md +201 -254
- package/.agent/skills/observability/SKILL.md +285 -0
- package/.agent/skills/parallel-agents/SKILL.md +124 -118
- package/.agent/skills/performance-profiling/SKILL.md +143 -89
- package/.agent/skills/plan-writing/SKILL.md +133 -97
- package/.agent/skills/platform-engineer/SKILL.md +135 -0
- package/.agent/skills/powershell-windows/SKILL.md +167 -104
- package/.agent/skills/python-patterns/SKILL.md +149 -361
- package/.agent/skills/python-pro/SKILL.md +114 -0
- package/.agent/skills/react-specialist/SKILL.md +107 -0
- package/.agent/skills/readme-builder/SKILL.md +270 -0
- package/.agent/skills/realtime-patterns/SKILL.md +296 -0
- package/.agent/skills/red-team-tactics/SKILL.md +136 -134
- package/.agent/skills/rust-pro/SKILL.md +237 -173
- package/.agent/skills/seo-fundamentals/SKILL.md +134 -82
- package/.agent/skills/server-management/SKILL.md +155 -104
- package/.agent/skills/sql-pro/SKILL.md +104 -0
- package/.agent/skills/systematic-debugging/SKILL.md +156 -79
- package/.agent/skills/tailwind-patterns/SKILL.md +163 -205
- package/.agent/skills/tdd-workflow/SKILL.md +148 -88
- package/.agent/skills/test-result-analyzer/SKILL.md +299 -0
- package/.agent/skills/testing-patterns/SKILL.md +141 -114
- package/.agent/skills/trend-researcher/SKILL.md +228 -0
- package/.agent/skills/ui-ux-pro-max/SKILL.md +107 -0
- package/.agent/skills/ui-ux-researcher/SKILL.md +234 -0
- package/.agent/skills/vue-expert/SKILL.md +118 -0
- package/.agent/skills/vulnerability-scanner/SKILL.md +228 -188
- package/.agent/skills/web-design-guidelines/SKILL.md +148 -33
- package/.agent/skills/webapp-testing/SKILL.md +171 -122
- package/.agent/skills/whimsy-injector/SKILL.md +349 -0
- package/.agent/skills/workflow-optimizer/SKILL.md +219 -0
- package/.agent/workflows/api-tester.md +279 -0
- package/.agent/workflows/audit.md +168 -0
- package/.agent/workflows/brainstorm.md +65 -19
- package/.agent/workflows/changelog.md +144 -0
- package/.agent/workflows/create.md +67 -14
- package/.agent/workflows/debug.md +122 -30
- package/.agent/workflows/deploy.md +82 -31
- package/.agent/workflows/enhance.md +59 -27
- package/.agent/workflows/fix.md +143 -0
- package/.agent/workflows/generate.md +84 -20
- package/.agent/workflows/migrate.md +163 -0
- package/.agent/workflows/orchestrate.md +66 -17
- package/.agent/workflows/performance-benchmarker.md +305 -0
- package/.agent/workflows/plan.md +76 -33
- package/.agent/workflows/preview.md +73 -17
- package/.agent/workflows/refactor.md +153 -0
- package/.agent/workflows/review-ai.md +140 -0
- package/.agent/workflows/review.md +83 -16
- package/.agent/workflows/session.md +154 -0
- package/.agent/workflows/status.md +74 -18
- package/.agent/workflows/strengthen-skills.md +99 -0
- package/.agent/workflows/swarm.md +194 -0
- package/.agent/workflows/test.md +80 -31
- package/.agent/workflows/tribunal-backend.md +55 -13
- package/.agent/workflows/tribunal-database.md +62 -18
- package/.agent/workflows/tribunal-frontend.md +58 -12
- package/.agent/workflows/tribunal-full.md +70 -11
- package/.agent/workflows/tribunal-mobile.md +123 -0
- package/.agent/workflows/tribunal-performance.md +152 -0
- package/.agent/workflows/ui-ux-pro-max.md +100 -82
- package/README.md +117 -62
- package/bin/tribunal-kit.js +542 -288
- package/package.json +10 -6
package/.agent/skills/tdd-workflow/SKILL.md
@@ -1,149 +1,209 @@
 ---
 name: tdd-workflow
 description: Test-Driven Development workflow principles. RED-GREEN-REFACTOR cycle.
-allowed-tools: Read, Write, Edit, Glob, Grep
+allowed-tools: Read, Write, Edit, Glob, Grep
+version: 1.0.0
+last-updated: 2026-03-12
+applies-to-model: gemini-2.5-pro, claude-3-7-sonnet
 ---
 
-#
+# Test-Driven Development
 
->
+> TDD is not about testing. It is about design.
+> Writing the test first forces you to design the interface before you know how it will be implemented.
 
 ---
 
-##
+## The RED-GREEN-REFACTOR Cycle
+
+Every change in TDD follows three phases:
 
 ```
-
-
-
-↓
-🔵 REFACTOR → Improve code quality
-↓
-Repeat...
+RED      → Write a test that fails (for code that doesn't exist yet)
+GREEN    → Write the minimum code to make the test pass
+REFACTOR → Clean up the code without changing its behavior
 ```
 
+The constraint is important: in GREEN phase, write only enough code to pass the test. No more.
+
 ---
 
-##
+## RED Phase — Write a Failing Test
 
-
-
-
+Write a test that:
+1. Describes one specific piece of behavior
+2. Uses the API you wish existed (design the interface first)
+3. Fails for the right reason (not a syntax error — a logical failure)
 
-
+```ts
+// RED: This test fails because `validatePassword` doesn't exist yet
+it('should reject passwords shorter than 8 characters', () => {
+  const result = validatePassword('short');
+  expect(result.valid).toBe(false);
+  expect(result.error).toBe('Password must be at least 8 characters');
+});
+```
+
+**The test failing for the right reason is the signal.** If it fails because of a missing import, that's not the RED phase — that's setup.
 
-
+---
 
-
+## GREEN Phase — Minimum Code to Pass
 
-
-|-------|---------|
-| Behavior | "should add two numbers" |
-| Edge cases | "should handle empty input" |
-| Error states | "should throw for invalid data" |
+Write only what is needed for the test to pass. Resist the urge to also handle the "other cases" — those will get their own tests.
 
-
+```ts
+// GREEN: Minimum implementation to pass the one test
+function validatePassword(password: string): { valid: boolean; error?: string } {
+  if (password.length < 8) {
+    return { valid: false, error: 'Password must be at least 8 characters' };
+  }
+  return { valid: true };
+}
+```
 
-
-- Test name describes expected behavior
-- One assertion per test (ideally)
+The code may be ugly. That is fine. GREEN is about passing the test, not about clean code.
 
 ---
 
-##
+## REFACTOR Phase — Clean Without Breaking
 
-
+Now that the test is green, improve the code:
+- Extract duplication
+- Clarify naming
+- Simplify logic
 
-
-|-----------|---------|
-| **YAGNI** | You Aren't Gonna Need It |
-| **Simplest thing** | Write the minimum to pass |
-| **No optimization** | Just make it work |
+The constraint: all tests must stay green during and after refactor.
 
-
+```ts
+// REFACTOR: Same behavior, cleaner structure
+const MIN_PASSWORD_LENGTH = 8;
 
-
-
-
+function validatePassword(password: string): ValidationResult {
+  if (password.length < MIN_PASSWORD_LENGTH) {
+    return failure(`Password must be at least ${MIN_PASSWORD_LENGTH} characters`);
+  }
+  return success();
+}
+```
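The REFACTOR snippet above uses a `ValidationResult` type and `failure`/`success` helpers that the diff never defines. A minimal sketch of what those helpers could look like; the names come from the snippet, but the implementations are assumed rather than taken from the package:

```typescript
// Hypothetical helpers implied by the REFACTOR snippet; names are taken
// from the snippet, implementations are assumed.
type ValidationResult = { valid: boolean; error?: string };

const failure = (error: string): ValidationResult => ({ valid: false, error });
const success = (): ValidationResult => ({ valid: true });

const MIN_PASSWORD_LENGTH = 8;

function validatePassword(password: string): ValidationResult {
  if (password.length < MIN_PASSWORD_LENGTH) {
    return failure(`Password must be at least ${MIN_PASSWORD_LENGTH} characters`);
  }
  return success();
}
```

With helpers like these the refactor keeps the GREEN-phase behavior byte-for-byte, which is the whole point of the phase.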
 
 ---
 
-##
+## Triangulation
 
-
+When a single test could be satisfied by a hardcoded value, write a second test to force a real implementation.
 
-
-
-
-
-
-| Complexity | Simplify logic |
+```ts
+// Test 1: Could be satisfied by always returning 2
+it('should add two numbers', () => {
+  expect(add(1, 1)).toBe(2);
+});
 
-
+// Test 2: Forces a real implementation
+it('should add two different numbers', () => {
+  expect(add(3, 4)).toBe(7);
+});
+```
 
-
-- Small incremental changes
-- Commit after each refactor
+**Rule:** If your implementation could be a constant or a special case, triangulate.
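The triangulation above can be seen end-to-end: after Test 1 alone, a hardcoded fake would pass; Test 2 rules it out. A sketch (the `add` function here is illustrative, not code from the package):

```typescript
// A fake that would satisfy Test 1 alone:
// const add = (_a: number, _b: number): number => 2;

// Test 2 (add(3, 4) === 7) rules the fake out and forces the general case:
function add(a: number, b: number): number {
  return a + b;
}
```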
 
 ---
 
-##
+## When TDD Pays Off
 
-
+TDD's ROI is highest for:
+- Business logic (calculation, validation, state machines)
+- Utility functions used in many places
+- Error handling paths that are hard to trigger manually
+- Refactoring existing code you want to verify still works
 
-
-
-
-
-| **Assert** | Verify expected outcome |
+TDD's ROI is lower for:
+- UI components (Storybook + visual review is often more efficient)
+- Database migrations (integration test after, not TDD)
+- Exploratory/prototype code that will be thrown away
 
 ---
 
-##
+## Common TDD Mistakes
 
-
-
-
-
-
-
-
+| Mistake | Effect |
+|---|---|
+| Writing tests after implementation | Tests confirm the implementation, not the behavior |
+| Testing too much in one cycle | Large RED-GREEN steps hide design problems |
+| Skipping REFACTOR | Code quality degrades with each cycle |
+| Not reaching RED | Writing tests that pass immediately means the implementation already existed |
+| Mocking everything | Tests become coupled to implementation, not behavior |
 
 ---
 
-##
+## 🛑 Verification-Before-Completion (VBC) Protocol
 
-
-
-
-| 2 | Error cases |
-| 3 | Edge cases |
-| 4 | Performance |
+**CRITICAL:** You must follow a strict "evidence-based closeout" state machine.
+- ❌ **Forbidden:** Ending the GREEN or REFACTOR phases based on assumption that the code is correct.
+- ✅ **Required:** You are explicitly forbidden from completing a test cycle or ending your task without providing **concrete terminal evidence** that the test suite actually ran and returned a strictly passing (GREEN) result.
 
 ---
 
-##
+## Output Format
+
+When this skill produces or reviews code, structure your output as follows:
+
+```
+━━━ Tdd Workflow Report ━━━━━━━━━━━━━━━━━━━━━━━━
+Skill: Tdd Workflow
+Language: [detected language / framework]
+Scope: [N files · N functions]
+─────────────────────────────────────────────────
+✅ Passed: [checks that passed, or "All clean"]
+⚠️ Warnings: [non-blocking issues, or "None"]
+❌ Blocked: [blocking issues requiring fix, or "None"]
+─────────────────────────────────────────────────
+VBC status: PENDING → VERIFIED
+Evidence: [test output / lint pass / compile success]
+```
+
+**VBC (Verification-Before-Completion) is mandatory.**
+Do not mark status as VERIFIED until concrete terminal evidence is provided.
+
 
-| ❌ Don't | ✅ Do |
-|----------|-------|
-| Skip the RED phase | Watch test fail first |
-| Write tests after | Write tests before |
-| Over-engineer initial | Keep it simple |
-| Multiple asserts | One behavior per test |
-| Test implementation | Test behavior |
 
 ---
 
-##
+## 🤖 LLM-Specific Traps
 
-
+AI coding assistants often fall into specific bad habits when dealing with this domain. These are strictly forbidden:
 
-
-
-
-
-
+1. **Over-engineering:** Proposing complex abstractions or distributed systems when a simpler approach suffices.
+2. **Hallucinated Libraries/Methods:** Using non-existent methods or packages. Always `// VERIFY` or check `package.json` / `requirements.txt`.
+3. **Skipping Edge Cases:** Writing the "happy path" and ignoring error handling, timeouts, or data validation.
+4. **Context Amnesia:** Forgetting the user's constraints and offering generic advice instead of tailored solutions.
+5. **Silent Degradation:** Catching and suppressing errors without logging or re-raising.
 
 ---
 
-
+## 🏛️ Tribunal Integration (Anti-Hallucination)
+
+**Slash command: `/review` or `/tribunal-full`**
+**Active reviewers: `logic-reviewer` · `security-auditor`**
+
+### ❌ Forbidden AI Tropes
+
+1. **Blind Assumptions:** Never make an assumption without documenting it clearly with `// VERIFY: [reason]`.
+2. **Silent Degradation:** Catching and suppressing errors without logging or handling.
+3. **Context Amnesia:** Forgetting the user's constraints and offering generic advice instead of tailored solutions.
+
+### ✅ Pre-Flight Self-Audit
+
+Review these questions before confirming output:
+```
+✅ Did I rely ONLY on real, verified tools and methods?
+✅ Is this solution appropriately scoped to the user's constraints?
+✅ Did I handle potential failure modes and edge cases?
+✅ Have I avoided generic boilerplate that doesn't add value?
+```
+
+### 🛑 Verification-Before-Completion (VBC) Protocol
+
+**CRITICAL:** You must follow a strict "evidence-based closeout" state machine.
+- ❌ **Forbidden:** Declaring a task complete because the output "looks correct."
+- ✅ **Required:** You are explicitly forbidden from finalizing any task without providing **concrete evidence** (terminal output, passing tests, compile success, or equivalent proof) that your output works as intended.
package/.agent/skills/test-result-analyzer/SKILL.md
@@ -0,0 +1,299 @@
+---
+name: test-result-analyzer
+description: Ingests test logs and identifies root causes across multiple failing test files. Provides actionable fix recommendations.
+skills:
+  - systematic-debugging
+  - testing-patterns
+version: 1.0.0
+last-updated: 2026-03-12
+applies-to-model: gemini-2.5-pro, claude-3-7-sonnet
+---
+
+# Test Result Analyzer Skill
+
+You are a specialist in analyzing test output — not writing tests, but *understanding why tests fail*. You turn walls of red error text into a prioritized action plan.
+
+## When to Activate
+
+- After a test run with multiple failures.
+- When the user says "tests are failing", "analyze test results", "what broke?", or "test failed".
+- During CI/CD pipeline debugging.
+- When `test_runner.py` or any test command exits with failures.
+- When paired with `systematic-debugging` for deep root-cause investigation.
+
+## Analysis Pipeline
+
+```
+Test output (terminal or log file)
+        │
+        ▼
+Runner detection — identify test framework from output format
+        │
+        ▼
+Failure extraction — parse each FAIL block into structured data
+        │
+        ▼
+Clustering — group failures by root module, error type, shared dependency
+        │
+        ▼
+FPF detection — find the First Point of Failure
+        │
+        ▼
+Dependency graph — map cascade relationships
+        │
+        ▼
+Fix recommendations — ordered by impact (most failures resolved first)
+        │
+        ▼
+Report — structured output with confidence levels
+```
+
+## Step 1: Runner Detection
+
+Auto-detect the test framework from output patterns:
+
+| Framework | Detection Pattern | Failure Marker |
+|---|---|---|
+| Jest | `PASS`/`FAIL` with file paths, `●` for test names | `FAIL src/...` |
+| Vitest | `✓`/`×` markers, `FAIL` blocks | `❯ FAIL` or `× test name` |
+| pytest | `PASSED`/`FAILED` with `::` separator | `FAILED tests/...::test_name` |
+| Go test | `ok`/`FAIL` with package paths | `--- FAIL: TestName` |
+| Mocha | `passing`/`failing` counts, indented suites | `N failing` section |
+| JUnit (XML) | `<testsuite>` XML structure | `<failure>` elements |
+| RSpec | `.F` markers, `Failures:` section | `Failure/Error:` |
+| Cargo test | `test result: FAILED` | `---- test_name stdout ----` |
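The detection patterns in the table can be applied as ordered regular-expression probes over the raw output. A sketch; the regexes are simplified transcriptions of the table's markers, not the analyzer's actual implementation:

```typescript
// Ordered probes, most distinctive first; patterns are simplified
// transcriptions of the table above, not the package's real matchers.
type Runner =
  | 'jest' | 'vitest' | 'pytest' | 'go' | 'mocha'
  | 'junit-xml' | 'rspec' | 'cargo' | 'unknown';

const probes: Array<[Runner, RegExp]> = [
  ['junit-xml', /<testsuite[\s>]/],
  ['cargo', /test result: (ok|FAILED)\./],
  ['go', /--- (FAIL|PASS): \w+/],
  ['pytest', /(PASSED|FAILED) \S+::\w+/],
  ['vitest', /❯ FAIL|× \S/],
  ['jest', /(PASS|FAIL) \S+\.(test|spec)\.[jt]sx?/],
  ['rspec', /Failure\/Error:/],
  ['mocha', /\d+ (passing|failing)/],
];

function detectRunner(output: string): Runner {
  for (const [runner, pattern] of probes) {
    if (pattern.test(output)) return runner;
  }
  return 'unknown';
}
```

Probe order matters: formats with globally unique markers (JUnit XML, cargo) go first so that generic words like `FAIL` cannot shadow them.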
+
+## Step 2: Failure Extraction
+
+For each failure, extract a structured record:
+
+```
+{
+  test_name: "should return 401 for unauthenticated requests"
+  test_file: "src/api/auth.test.ts"
+  test_line: 42
+  error_type: "AssertionError"
+  expected: "401"
+  received: "200"
+  stack_trace: ["auth.test.ts:42", "auth.middleware.ts:18", "express/router.ts:..."]
+  source_files: ["auth.middleware.ts:18"] // files from YOUR codebase in the stack
+}
+```
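The record above maps naturally onto a typed structure. A sketch of that shape, plus one way `source_files` might be derived from `stack_trace`; the frame-filtering heuristic is assumed, not specified by the skill:

```typescript
// Shape of the extracted failure record, inferred from the example above.
interface FailureRecord {
  test_name: string;
  test_file: string;
  test_line: number;
  error_type: string;
  expected?: string;
  received?: string;
  stack_trace: string[];
  source_files: string[]; // frames from your own codebase only
}

// One way `source_files` could be derived from `stack_trace`: drop
// dependency frames and the test file itself. The exclusion rules are
// an assumed heuristic, not the skill's specified behavior.
function inRepoFrames(stack: string[]): string[] {
  return stack.filter(frame =>
    !frame.includes('node_modules') &&
    !/^(express|react|jest|vitest)\//.test(frame) && // bare dependency frames (illustrative list)
    !/\.(test|spec)\.[jt]sx?/.test(frame)            // the test file itself
  );
}
```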
+
+## Step 3: Failure Clustering
+
+Group failures into clusters based on shared characteristics:
+
+### Cluster Types
+
+| Cluster Type | How to Detect | Typical Root Cause |
+|---|---|---|
+| **Shared Module** | Multiple tests import from the same file that changed | Missing export, type change, API change |
+| **Same Error Type** | All failures throw `TypeError` or `ConnectionError` | Broken dependency, env issue |
+| **Shared Fixture** | Tests using same `beforeEach`/setup fail together | Fixture setup failure cascading |
+| **Import Chain** | Failures follow the import graph | Dependency that fails to resolve |
+| **Environment** | All tests fail with connection/config errors | Missing env var, DB not running |
+| **Timing** | Tests pass individually, fail together | Race condition, shared state |
+| **Snapshot** | Multiple `toMatchSnapshot` failures | Intentional UI change (update snapshots) |
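One simple way to realize the Shared Module and Same Error Type heuristics above is to bucket failures by error type plus their deepest in-repo stack frame. A sketch with an assumed cluster-key scheme (the skill does not define one):

```typescript
// Bucket failures by error type plus deepest in-repo stack frame,
// a rough stand-in for the Shared Module / Same Error Type heuristics.
interface Failure {
  test_file: string;
  error_type: string;
  source_files: string[]; // in-repo frames, e.g. "auth.middleware.ts:18"
}

function clusterFailures(failures: Failure[]): Map<string, Failure[]> {
  const clusters = new Map<string, Failure[]>();
  for (const f of failures) {
    // Cluster key scheme is assumed; strip the line number so
    // different lines of the same file land in one bucket.
    const root = f.source_files[0] ?? f.test_file;
    const key = `${f.error_type}@${root.split(':')[0]}`;
    const bucket = clusters.get(key) ?? [];
    bucket.push(f);
    clusters.set(key, bucket);
  }
  return clusters;
}
```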
+
+### Cascade Detection Algorithm
+
+```
+1. Sort failures by file path and execution order.
+2. Find the FIRST failure in execution order → candidate FPF.
+3. Check if the FPF's source file appears in other failures' import chains.
+4. If yes → FPF is the root cause, other failures are cascades.
+5. If no → failures are independent (multiple root causes).
+```
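The five steps above can be sketched directly. The record shape is borrowed from Step 2, and the import-chain check is approximated by matching stack-trace files; that approximation is an assumption, since the skill does not pin down the data model:

```typescript
// First failure in execution order is the candidate FPF; later failures
// whose stack touches the FPF's source file are cascades (steps 2-4 above).
interface ParsedFailure {
  test_file: string;
  source_files: string[]; // in-repo frames, e.g. "utils/auth.ts:42"
}

function findFpf(
  failuresInOrder: ParsedFailure[],
): { fpf: ParsedFailure; cascades: ParsedFailure[] } | null {
  if (failuresInOrder.length === 0) return null;
  const [fpf, ...rest] = failuresInOrder;
  const fpfFile = (fpf.source_files[0] ?? fpf.test_file).split(':')[0];
  // Step 5: failures not touching the FPF file are independent root causes.
  const cascades = rest.filter(f =>
    f.source_files.some(s => s.split(':')[0] === fpfFile),
  );
  return { fpf, cascades };
}
```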
+
+## Step 4: First Point of Failure (FPF) Detection
+
+The FPF is the most valuable finding — fix it first, and cascading failures resolve automatically.
+
+```
+Example:
+12 test files fail.
+11 of them import from `utils/auth.ts`.
+The first failure is in `utils/auth.test.ts` at line 42.
+Error: `generateToken is not exported from './auth'`
+
+FPF: utils/auth.ts:42 — missing export
+Cascade: 11 other test files fail because they can't import generateToken
+Fix: Add `export { generateToken }` to utils/auth.ts
+Expected resolution: 12 of 12 failures (100%)
+```
+
+**FPF Confidence Levels:**
+
+| Confidence | Criteria |
+|---|---|
+| **HIGH** | Same source file in >50% of failure stack traces |
+| **MEDIUM** | Same error type across multiple test files |
+| **LOW** | Failures appear independent, multiple root causes likely |
+
+## Step 5: Fix Recommendations
+
+For each cluster, provide actionable fixes:
+
+| Fix Type | Example | How to Verify |
+|---|---|---|
+| **Missing Export** | `export { fn }` added to module | Re-run failing tests |
+| **Type Mismatch** | Function signature changed, callers need update | Check callers with `grep_search` |
+| **Stale Mock** | Mock doesn't match new interface | Compare mock to actual implementation |
+| **Env Variable** | `.env.test` missing `DATABASE_URL` | Check `.env.example` vs `.env.test` |
+| **Snapshot Update** | Intentional UI change | Run with `--updateSnapshot` flag |
+| **Race Condition** | Tests share global state | Add isolation or `beforeEach` reset |
+| **Dependency Update** | Package API changed after upgrade | Check changelog of updated package |
+
+### Fix Priority Formula
+```
+Priority = (Tests_Resolved × 10) + (Confidence_Score × 5) - (Estimated_Fix_Time_Minutes)
+
+Fix in this order:
+1. Highest priority score first
+2. If tied, prefer HIGH confidence
+3. If still tied, prefer fewer files to change
+```
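The formula transcribes directly into code. The numeric scale for `Confidence_Score` is not specified in the skill, so 3/2/1 for HIGH/MEDIUM/LOW is an assumption here (the starred point totals in the report example below may use a different scale):

```typescript
// Direct transcription of the formula above. The HIGH/MEDIUM/LOW numeric
// scale (3/2/1) is assumed; the skill leaves Confidence_Score undefined.
type Confidence = 'HIGH' | 'MEDIUM' | 'LOW';

function fixPriority(
  testsResolved: number,
  confidence: Confidence,
  estimatedFixMinutes: number,
): number {
  const confidenceScore = { HIGH: 3, MEDIUM: 2, LOW: 1 }[confidence];
  return testsResolved * 10 + confidenceScore * 5 - estimatedFixMinutes;
}
```

Note the weighting: each resolved test is worth twice a full confidence step, so a fix that unblocks many tests wins even at LOW confidence.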
+
+## Report Format
+
+```
+━━━ Test Result Analysis ━━━━━━━━━━━━━━━━
+
+Runner: [Jest / Vitest / pytest / Go / auto-detected]
+Total: 48 tests across 12 files
+Result: 36 passed | 12 failed | 0 skipped
+Duration: 4.2s
+Coverage: 78% statements (if available)
+
+━━━ First Point of Failure ━━━━━━━━━━━━━━
+
+📍 utils/auth.test.ts → line 42
+Error: `generateToken` is not exported from `./auth`
+Type: ImportError
+Impact: Cascades to 11 other test files
+
+This is the root cause. Fix this first.
+
+━━━ Failure Clusters ━━━━━━━━━━━━━━━━━━━━
+
+Cluster 1: Missing Export (11 tests, HIGH confidence)
+  Root: utils/auth.ts:42
+  Cascade: auth.test.ts, users.test.ts, sessions.test.ts, ...
+  Fix: Add `export { generateToken }` to auth.ts
+  Resolution: 11 of 12 failures (92%)
+  Priority: ★★★★★ (115 pts)
+
+Cluster 2: Stale Mock (1 test, MEDIUM confidence)
+  Root: api/users.test.ts:98
+  Error: Expected { name, email, role } but received { name, email }
+  Fix: Add `role: "user"` to mock at line 15
+  Resolution: 1 of 12 failures (8%)
+  Priority: ★★☆☆☆ (20 pts)
+
+━━━ Fix Plan ━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+Step 1: Fix utils/auth.ts export
+  → Expected: 11 failures resolved
+  → Time: ~2 minutes
+  → Run: npx jest utils/auth.test.ts (verify FPF fix)
+
+Step 2: Update mock in api/users.test.ts:15
+  → Expected: 1 failure resolved
+  → Time: ~1 minute
+
+Step 3: Re-run full suite
+  → Expected: all 12 failures resolved (0 remaining)
+
+━━━ Warnings ━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+⚠️ No test coverage report detected. Consider adding --coverage flag.
+⚠️ 3 test files have no assertions (test names end in `.todo`).
+```
+
+## Edge Cases
+
+### All Tests Fail
+```
+If 100% of tests fail → likely environment issue, not code:
+1. Check if dev server / database is running
+2. Check .env.test for missing variables
+3. Check node_modules exists (run npm install)
+4. Check for breaking dependency upgrade in recent commits
+```
+
+### Flaky Tests
+```
+If same test passes on retry → flaky:
+1. Check for shared mutable state between tests
+2. Check for time-dependent assertions
+3. Check for unresolved promises / async leaks
+4. Check for network-dependent tests without mocks
+```
+
+### Only Snapshot Tests Fail
+```
+If only snapshot tests fail → likely intentional UI change:
+1. Review snapshot diffs
+2. If changes are expected: run with --updateSnapshot
+3. If changes are unexpected: check for unintended CSS/component changes
+```
+
+## Cross-Skill Integration
+
+| Paired Skill | Integration Point |
+|---|---|
+| `systematic-debugging` | Escalate when FPF is unclear → 4-phase debug methodology |
+| `testing-patterns` | Reference when recommending test structure improvements |
+| `workflow-optimizer` | Flag inefficient test-debug-retest loops |
+
+## Anti-Hallucination Guard
+
+- **Only analyze test output that was actually produced** — never generate fake test results.
+- **Never invent file paths or line numbers** — only reference what appears in the stack trace.
+- **Verify source files exist** before suggesting fixes — use `view_file` or `find_by_name`.
+- **Mark uncertainty**: `// UNCERTAIN: log format not fully recognized, manual review recommended`.
+- **Never guess at assertion values** — quote exactly what "Expected" and "Received" say in the output.
+- **Don't assume test runner** — auto-detect from output format, don't assume Jest.
+
+
+---
+
+## 🤖 LLM-Specific Traps
+
+AI coding assistants often fall into specific bad habits when dealing with this domain. These are strictly forbidden:
+
+1. **Over-engineering:** Proposing complex abstractions or distributed systems when a simpler approach suffices.
+2. **Hallucinated Libraries/Methods:** Using non-existent methods or packages. Always `// VERIFY` or check `package.json` / `requirements.txt`.
+3. **Skipping Edge Cases:** Writing the "happy path" and ignoring error handling, timeouts, or data validation.
+4. **Context Amnesia:** Forgetting the user's constraints and offering generic advice instead of tailored solutions.
+5. **Silent Degradation:** Catching and suppressing errors without logging or re-raising.
+
+---
+
+## 🏛️ Tribunal Integration (Anti-Hallucination)
+
+**Slash command: `/review` or `/tribunal-full`**
+**Active reviewers: `logic-reviewer` · `security-auditor`**
+
+### ❌ Forbidden AI Tropes
+
+1. **Blind Assumptions:** Never make an assumption without documenting it clearly with `// VERIFY: [reason]`.
+2. **Silent Degradation:** Catching and suppressing errors without logging or handling.
+3. **Context Amnesia:** Forgetting the user's constraints and offering generic advice instead of tailored solutions.
+
+### ✅ Pre-Flight Self-Audit
+
+Review these questions before confirming output:
+```
+✅ Did I rely ONLY on real, verified tools and methods?
+✅ Is this solution appropriately scoped to the user's constraints?
+✅ Did I handle potential failure modes and edge cases?
+✅ Have I avoided generic boilerplate that doesn't add value?
+```
+
+### 🛑 Verification-Before-Completion (VBC) Protocol
+
+**CRITICAL:** You must follow a strict "evidence-based closeout" state machine.
+- ❌ **Forbidden:** Declaring a task complete because the output "looks correct."
+- ✅ **Required:** You are explicitly forbidden from finalizing any task without providing **concrete evidence** (terminal output, passing tests, compile success, or equivalent proof) that your output works as intended.