tribunal-kit 1.0.0 → 2.4.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (127)
  1. package/.agent/.shared/ui-ux-pro-max/README.md +3 -3
  2. package/.agent/ARCHITECTURE.md +205 -10
  3. package/.agent/GEMINI.md +37 -7
  4. package/.agent/agents/accessibility-reviewer.md +134 -0
  5. package/.agent/agents/ai-code-reviewer.md +129 -0
  6. package/.agent/agents/frontend-specialist.md +3 -0
  7. package/.agent/agents/game-developer.md +21 -21
  8. package/.agent/agents/logic-reviewer.md +12 -0
  9. package/.agent/agents/mobile-reviewer.md +79 -0
  10. package/.agent/agents/orchestrator.md +56 -26
  11. package/.agent/agents/performance-reviewer.md +36 -0
  12. package/.agent/agents/supervisor-agent.md +156 -0
  13. package/.agent/agents/swarm-worker-contracts.md +166 -0
  14. package/.agent/agents/swarm-worker-registry.md +92 -0
  15. package/.agent/rules/GEMINI.md +134 -5
  16. package/.agent/scripts/bundle_analyzer.py +259 -0
  17. package/.agent/scripts/dependency_analyzer.py +247 -0
  18. package/.agent/scripts/lint_runner.py +188 -0
  19. package/.agent/scripts/patch_skills_meta.py +177 -0
  20. package/.agent/scripts/patch_skills_output.py +285 -0
  21. package/.agent/scripts/schema_validator.py +279 -0
  22. package/.agent/scripts/security_scan.py +224 -0
  23. package/.agent/scripts/session_manager.py +144 -3
  24. package/.agent/scripts/skill_integrator.py +234 -0
  25. package/.agent/scripts/strengthen_skills.py +220 -0
  26. package/.agent/scripts/swarm_dispatcher.py +317 -0
  27. package/.agent/scripts/test_runner.py +192 -0
  28. package/.agent/scripts/test_swarm_dispatcher.py +163 -0
  29. package/.agent/skills/agent-organizer/SKILL.md +132 -0
  30. package/.agent/skills/agentic-patterns/SKILL.md +335 -0
  31. package/.agent/skills/api-patterns/SKILL.md +226 -50
  32. package/.agent/skills/app-builder/SKILL.md +215 -52
  33. package/.agent/skills/architecture/SKILL.md +176 -31
  34. package/.agent/skills/bash-linux/SKILL.md +150 -134
  35. package/.agent/skills/behavioral-modes/SKILL.md +152 -160
  36. package/.agent/skills/brainstorming/SKILL.md +148 -101
  37. package/.agent/skills/brainstorming/dynamic-questioning.md +10 -0
  38. package/.agent/skills/clean-code/SKILL.md +139 -134
  39. package/.agent/skills/code-review-checklist/SKILL.md +177 -80
  40. package/.agent/skills/config-validator/SKILL.md +165 -0
  41. package/.agent/skills/csharp-developer/SKILL.md +107 -0
  42. package/.agent/skills/database-design/SKILL.md +252 -29
  43. package/.agent/skills/deployment-procedures/SKILL.md +122 -175
  44. package/.agent/skills/devops-engineer/SKILL.md +134 -0
  45. package/.agent/skills/devops-incident-responder/SKILL.md +98 -0
  46. package/.agent/skills/documentation-templates/SKILL.md +175 -121
  47. package/.agent/skills/dotnet-core-expert/SKILL.md +103 -0
  48. package/.agent/skills/edge-computing/SKILL.md +213 -0
  49. package/.agent/skills/frontend-design/SKILL.md +76 -0
  50. package/.agent/skills/frontend-design/color-system.md +18 -0
  51. package/.agent/skills/frontend-design/typography-system.md +18 -0
  52. package/.agent/skills/game-development/SKILL.md +69 -0
  53. package/.agent/skills/geo-fundamentals/SKILL.md +158 -99
  54. package/.agent/skills/github-operations/SKILL.md +354 -0
  55. package/.agent/skills/i18n-localization/SKILL.md +158 -96
  56. package/.agent/skills/intelligent-routing/SKILL.md +89 -285
  57. package/.agent/skills/intelligent-routing/router-manifest.md +65 -0
  58. package/.agent/skills/lint-and-validate/SKILL.md +229 -27
  59. package/.agent/skills/llm-engineering/SKILL.md +258 -0
  60. package/.agent/skills/local-first/SKILL.md +203 -0
  61. package/.agent/skills/mcp-builder/SKILL.md +159 -111
  62. package/.agent/skills/mobile-design/SKILL.md +102 -282
  63. package/.agent/skills/nextjs-react-expert/SKILL.md +143 -227
  64. package/.agent/skills/nodejs-best-practices/SKILL.md +201 -254
  65. package/.agent/skills/observability/SKILL.md +285 -0
  66. package/.agent/skills/parallel-agents/SKILL.md +124 -118
  67. package/.agent/skills/performance-profiling/SKILL.md +143 -89
  68. package/.agent/skills/plan-writing/SKILL.md +133 -97
  69. package/.agent/skills/platform-engineer/SKILL.md +135 -0
  70. package/.agent/skills/powershell-windows/SKILL.md +167 -104
  71. package/.agent/skills/python-patterns/SKILL.md +149 -361
  72. package/.agent/skills/python-pro/SKILL.md +114 -0
  73. package/.agent/skills/react-specialist/SKILL.md +107 -0
  74. package/.agent/skills/readme-builder/SKILL.md +270 -0
  75. package/.agent/skills/realtime-patterns/SKILL.md +296 -0
  76. package/.agent/skills/red-team-tactics/SKILL.md +136 -134
  77. package/.agent/skills/rust-pro/SKILL.md +237 -173
  78. package/.agent/skills/seo-fundamentals/SKILL.md +134 -82
  79. package/.agent/skills/server-management/SKILL.md +155 -104
  80. package/.agent/skills/sql-pro/SKILL.md +104 -0
  81. package/.agent/skills/systematic-debugging/SKILL.md +156 -79
  82. package/.agent/skills/tailwind-patterns/SKILL.md +163 -205
  83. package/.agent/skills/tdd-workflow/SKILL.md +148 -88
  84. package/.agent/skills/test-result-analyzer/SKILL.md +299 -0
  85. package/.agent/skills/testing-patterns/SKILL.md +141 -114
  86. package/.agent/skills/trend-researcher/SKILL.md +228 -0
  87. package/.agent/skills/ui-ux-pro-max/SKILL.md +107 -0
  88. package/.agent/skills/ui-ux-researcher/SKILL.md +234 -0
  89. package/.agent/skills/vue-expert/SKILL.md +118 -0
  90. package/.agent/skills/vulnerability-scanner/SKILL.md +228 -188
  91. package/.agent/skills/web-design-guidelines/SKILL.md +148 -33
  92. package/.agent/skills/webapp-testing/SKILL.md +171 -122
  93. package/.agent/skills/whimsy-injector/SKILL.md +349 -0
  94. package/.agent/skills/workflow-optimizer/SKILL.md +219 -0
  95. package/.agent/workflows/api-tester.md +279 -0
  96. package/.agent/workflows/audit.md +168 -0
  97. package/.agent/workflows/brainstorm.md +65 -19
  98. package/.agent/workflows/changelog.md +144 -0
  99. package/.agent/workflows/create.md +67 -14
  100. package/.agent/workflows/debug.md +122 -30
  101. package/.agent/workflows/deploy.md +82 -31
  102. package/.agent/workflows/enhance.md +59 -27
  103. package/.agent/workflows/fix.md +143 -0
  104. package/.agent/workflows/generate.md +84 -20
  105. package/.agent/workflows/migrate.md +163 -0
  106. package/.agent/workflows/orchestrate.md +66 -17
  107. package/.agent/workflows/performance-benchmarker.md +305 -0
  108. package/.agent/workflows/plan.md +76 -33
  109. package/.agent/workflows/preview.md +73 -17
  110. package/.agent/workflows/refactor.md +153 -0
  111. package/.agent/workflows/review-ai.md +140 -0
  112. package/.agent/workflows/review.md +83 -16
  113. package/.agent/workflows/session.md +154 -0
  114. package/.agent/workflows/status.md +74 -18
  115. package/.agent/workflows/strengthen-skills.md +99 -0
  116. package/.agent/workflows/swarm.md +194 -0
  117. package/.agent/workflows/test.md +80 -31
  118. package/.agent/workflows/tribunal-backend.md +55 -13
  119. package/.agent/workflows/tribunal-database.md +62 -18
  120. package/.agent/workflows/tribunal-frontend.md +58 -12
  121. package/.agent/workflows/tribunal-full.md +70 -11
  122. package/.agent/workflows/tribunal-mobile.md +123 -0
  123. package/.agent/workflows/tribunal-performance.md +152 -0
  124. package/.agent/workflows/ui-ux-pro-max.md +100 -82
  125. package/README.md +117 -62
  126. package/bin/tribunal-kit.js +542 -288
  127. package/package.json +10 -6
package/.agent/skills/tdd-workflow/SKILL.md
@@ -1,149 +1,209 @@
  ---
  name: tdd-workflow
  description: Test-Driven Development workflow principles. RED-GREEN-REFACTOR cycle.
- allowed-tools: Read, Write, Edit, Glob, Grep, Bash
+ allowed-tools: Read, Write, Edit, Glob, Grep
+ version: 1.0.0
+ last-updated: 2026-03-12
+ applies-to-model: gemini-2.5-pro, claude-3-7-sonnet
  ---

- # TDD Workflow
+ # Test-Driven Development

- > Write tests first, code second.
+ > TDD is not about testing. It is about design.
+ > Writing the test first forces you to design the interface before you know how it will be implemented.

  ---

- ## 1. The TDD Cycle
+ ## The RED-GREEN-REFACTOR Cycle
+
+ Every change in TDD follows three phases:

  ```
- 🔴 RED → Write failing test
-
- 🟢 GREEN → Write minimal code to pass
-
- 🔵 REFACTOR → Improve code quality
-
- Repeat...
+ RED → Write a test that fails (for code that doesn't exist yet)
+ GREEN → Write the minimum code to make the test pass
+ REFACTOR → Clean up the code without changing its behavior
  ```

+ The constraint is important: in the GREEN phase, write only enough code to pass the test. No more.
+
  ---

- ## 2. The Three Laws of TDD
+ ## RED Phase — Write a Failing Test

- 1. Write production code only to make a failing test pass
- 2. Write only enough test to demonstrate failure
- 3. Write only enough code to make the test pass
+ Write a test that:
+ 1. Describes one specific piece of behavior
+ 2. Uses the API you wish existed (design the interface first)
+ 3. Fails for the right reason (not a syntax error — a logical failure)

- ---
+ ```ts
+ // RED: This test fails because `validatePassword` doesn't exist yet
+ it('should reject passwords shorter than 8 characters', () => {
+   const result = validatePassword('short');
+   expect(result.valid).toBe(false);
+   expect(result.error).toBe('Password must be at least 8 characters');
+ });
+ ```
+
+ **The test failing for the right reason is the signal.** If it fails because of a missing import, that's not the RED phase — that's setup.

- ## 3. RED Phase Principles
+ ---

- ### What to Write
+ ## GREEN Phase — Minimum Code to Pass

- | Focus | Example |
- |-------|---------|
- | Behavior | "should add two numbers" |
- | Edge cases | "should handle empty input" |
- | Error states | "should throw for invalid data" |
+ Write only what is needed for the test to pass. Resist the urge to also handle the "other cases" — those will get their own tests.

- ### RED Phase Rules
+ ```ts
+ // GREEN: Minimum implementation to pass the one test
+ function validatePassword(password: string): { valid: boolean; error?: string } {
+   if (password.length < 8) {
+     return { valid: false, error: 'Password must be at least 8 characters' };
+   }
+   return { valid: true };
+ }
+ ```

- - Test must fail first
- - Test name describes expected behavior
- - One assertion per test (ideally)
+ The code may be ugly. That is fine. GREEN is about passing the test, not about clean code.

  ---

- ## 4. GREEN Phase Principles
+ ## REFACTOR Phase — Clean Without Breaking

- ### Minimum Code
+ Now that the test is green, improve the code:
+ - Extract duplication
+ - Clarify naming
+ - Simplify logic

- | Principle | Meaning |
- |-----------|---------|
- | **YAGNI** | You Aren't Gonna Need It |
- | **Simplest thing** | Write the minimum to pass |
- | **No optimization** | Just make it work |
+ The constraint: all tests must stay green during and after the refactor.

- ### GREEN Phase Rules
+ ```ts
+ // REFACTOR: Same behavior, cleaner structure
+ const MIN_PASSWORD_LENGTH = 8;

- - Don't write unneeded code
- - Don't optimize yet
- - Pass the test, nothing more
+ function validatePassword(password: string): ValidationResult {
+   if (password.length < MIN_PASSWORD_LENGTH) {
+     return failure(`Password must be at least ${MIN_PASSWORD_LENGTH} characters`);
+   }
+   return success();
+ }
+ ```

  ---

- ## 5. REFACTOR Phase Principles
+ ## Triangulation

- ### What to Improve
+ When a single test could be satisfied by a hardcoded value, write a second test to force a real implementation.

- | Area | Action |
- |------|--------|
- | Duplication | Extract common code |
- | Naming | Make intent clear |
- | Structure | Improve organization |
- | Complexity | Simplify logic |
+ ```ts
+ // Test 1: Could be satisfied by always returning 2
+ it('should add two numbers', () => {
+   expect(add(1, 1)).toBe(2);
+ });

- ### REFACTOR Rules
+ // Test 2: Forces a real implementation
+ it('should add two different numbers', () => {
+   expect(add(3, 4)).toBe(7);
+ });
+ ```

- - All tests must stay green
- - Small incremental changes
- - Commit after each refactor
+ **Rule:** If your implementation could be a constant or a special case, triangulate.
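
The same triangulation, sketched in pytest style (Python here purely for illustration; the skill's own examples use TypeScript). The implementation shown is the general one that the second test forces:

```python
# Triangulation in pytest form: test_add_two_numbers alone could be
# satisfied by a hardcoded `return 2`; the second test forces the
# general implementation.
def add(a: int, b: int) -> int:
    return a + b


def test_add_two_numbers():
    assert add(1, 1) == 2


def test_add_two_different_numbers():
    assert add(3, 4) == 7
```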

  ---

- ## 6. AAA Pattern
+ ## When TDD Pays Off

- Every test follows:
+ TDD's ROI is highest for:
+ - Business logic (calculation, validation, state machines)
+ - Utility functions used in many places
+ - Error handling paths that are hard to trigger manually
+ - Refactoring existing code you want to verify still works

- | Step | Purpose |
- |------|---------|
- | **Arrange** | Set up test data |
- | **Act** | Execute code under test |
- | **Assert** | Verify expected outcome |
+ TDD's ROI is lower for:
+ - UI components (Storybook + visual review is often more efficient)
+ - Database migrations (integration test after, not TDD)
+ - Exploratory/prototype code that will be thrown away

  ---

- ## 7. When to Use TDD
+ ## Common TDD Mistakes

- | Scenario | TDD Value |
- |----------|-----------|
- | New feature | High |
- | Bug fix | High (write test first) |
- | Complex logic | High |
- | Exploratory | Low (spike, then TDD) |
- | UI layout | Low |
+ | Mistake | Effect |
+ |---|---|
+ | Writing tests after implementation | Tests confirm the implementation, not the behavior |
+ | Testing too much in one cycle | Large RED-GREEN steps hide design problems |
+ | Skipping REFACTOR | Code quality degrades with each cycle |
+ | Not reaching RED | Writing tests that pass immediately means the implementation already existed |
+ | Mocking everything | Tests become coupled to implementation, not behavior |

  ---

- ## 8. Test Prioritization
+ ## 🛑 Verification-Before-Completion (VBC) Protocol

- | Priority | Test Type |
- |----------|-----------|
- | 1 | Happy path |
- | 2 | Error cases |
- | 3 | Edge cases |
- | 4 | Performance |
+ **CRITICAL:** You must follow a strict "evidence-based closeout" state machine.
+ - ❌ **Forbidden:** Ending the GREEN or REFACTOR phases based on the assumption that the code is correct.
+ - ✅ **Required:** You are explicitly forbidden from completing a test cycle or ending your task without providing **concrete terminal evidence** that the test suite actually ran and returned a strictly passing (GREEN) result.

  ---

- ## 9. Anti-Patterns
+ ## Output Format
+
+ When this skill produces or reviews code, structure your output as follows:
+
+ ```
+ ━━━ TDD Workflow Report ━━━━━━━━━━━━━━━━━━━━━━━━
+ Skill: TDD Workflow
+ Language: [detected language / framework]
+ Scope: [N files · N functions]
+ ─────────────────────────────────────────────────
+ ✅ Passed: [checks that passed, or "All clean"]
+ ⚠️ Warnings: [non-blocking issues, or "None"]
+ ❌ Blocked: [blocking issues requiring fix, or "None"]
+ ─────────────────────────────────────────────────
+ VBC status: PENDING → VERIFIED
+ Evidence: [test output / lint pass / compile success]
+ ```
+
+ **VBC (Verification-Before-Completion) is mandatory.**
+ Do not mark status as VERIFIED until concrete terminal evidence is provided.
+

- | ❌ Don't | ✅ Do |
- |----------|-------|
- | Skip the RED phase | Watch test fail first |
- | Write tests after | Write tests before |
- | Over-engineer initial | Keep it simple |
- | Multiple asserts | One behavior per test |
- | Test implementation | Test behavior |

  ---

- ## 10. AI-Augmented TDD
+ ## 🤖 LLM-Specific Traps

- ### Multi-Agent Pattern
+ AI coding assistants often fall into specific bad habits when dealing with this domain. These are strictly forbidden:

- | Agent | Role |
- |-------|------|
- | Agent A | Write failing tests (RED) |
- | Agent B | Implement to pass (GREEN) |
- | Agent C | Optimize (REFACTOR) |
+ 1. **Over-engineering:** Proposing complex abstractions or distributed systems when a simpler approach suffices.
+ 2. **Hallucinated Libraries/Methods:** Using non-existent methods or packages. Always `// VERIFY` or check `package.json` / `requirements.txt`.
+ 3. **Skipping Edge Cases:** Writing the "happy path" and ignoring error handling, timeouts, or data validation.
+ 4. **Context Amnesia:** Forgetting the user's constraints and offering generic advice instead of tailored solutions.
+ 5. **Silent Degradation:** Catching and suppressing errors without logging or re-raising.

  ---

- > **Remember:** The test is the specification. If you can't write a test, you don't understand the requirement.
+ ## 🏛️ Tribunal Integration (Anti-Hallucination)
+
+ **Slash command: `/review` or `/tribunal-full`**
+ **Active reviewers: `logic-reviewer` · `security-auditor`**
+
+ ### ❌ Forbidden AI Tropes
+
+ 1. **Blind Assumptions:** Never make an assumption without documenting it clearly with `// VERIFY: [reason]`.
+ 2. **Silent Degradation:** Catching and suppressing errors without logging or handling.
+ 3. **Context Amnesia:** Forgetting the user's constraints and offering generic advice instead of tailored solutions.
+
+ ### ✅ Pre-Flight Self-Audit
+
+ Review these questions before confirming output:
+ ```
+ ✅ Did I rely ONLY on real, verified tools and methods?
+ ✅ Is this solution appropriately scoped to the user's constraints?
+ ✅ Did I handle potential failure modes and edge cases?
+ ✅ Have I avoided generic boilerplate that doesn't add value?
+ ```
+
+ ### 🛑 Verification-Before-Completion (VBC) Protocol
+
+ **CRITICAL:** You must follow a strict "evidence-based closeout" state machine.
+ - ❌ **Forbidden:** Declaring a task complete because the output "looks correct."
+ - ✅ **Required:** You are explicitly forbidden from finalizing any task without providing **concrete evidence** (terminal output, passing tests, compile success, or equivalent proof) that your output works as intended.
package/.agent/skills/test-result-analyzer/SKILL.md
@@ -0,0 +1,299 @@
+ ---
+ name: test-result-analyzer
+ description: Ingests test logs and identifies root causes across multiple failing test files. Provides actionable fix recommendations.
+ skills:
+   - systematic-debugging
+   - testing-patterns
+ version: 1.0.0
+ last-updated: 2026-03-12
+ applies-to-model: gemini-2.5-pro, claude-3-7-sonnet
+ ---
+
+ # Test Result Analyzer Skill
+
+ You are a specialist in analyzing test output — not writing tests, but *understanding why tests fail*. You turn walls of red error text into a prioritized action plan.
+
+ ## When to Activate
+
+ - After a test run with multiple failures.
+ - When the user says "tests are failing", "analyze test results", "what broke?", or "test failed".
+ - During CI/CD pipeline debugging.
+ - When `test_runner.py` or any test command exits with failures.
+ - When paired with `systematic-debugging` for deep root-cause investigation.
+
+ ## Analysis Pipeline
+
+ ```
+ Test output (terminal or log file)
+         ↓
+ Runner detection — identify test framework from output format
+         ↓
+ Failure extraction — parse each FAIL block into structured data
+         ↓
+ Clustering — group failures by root module, error type, shared dependency
+         ↓
+ FPF detection — find the First Point of Failure
+         ↓
+ Dependency graph — map cascade relationships
+         ↓
+ Fix recommendations — ordered by impact (most failures resolved first)
+         ↓
+ Report — structured output with confidence levels
+ ```
+
+ ## Step 1: Runner Detection
+
+ Auto-detect the test framework from output patterns:
+
+ | Framework | Detection Pattern | Failure Marker |
+ |---|---|---|
+ | Jest | `PASS`/`FAIL` with file paths, `●` for test names | `FAIL src/...` |
+ | Vitest | `✓`/`×` markers, `FAIL` blocks | `❯ FAIL` or `× test name` |
+ | pytest | `PASSED`/`FAILED` with `::` separator | `FAILED tests/...::test_name` |
+ | Go test | `ok`/`FAIL` with package paths | `--- FAIL: TestName` |
+ | Mocha | `passing`/`failing` counts, indented suites | `N failing` section |
+ | JUnit (XML) | `<testsuite>` XML structure | `<failure>` elements |
+ | RSpec | `.F` markers, `Failures:` section | `Failure/Error:` |
+ | Cargo test | `test result: FAILED` | `---- test_name stdout ----` |
+
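
The detection step can be sketched as a first-match scan over the raw output. This is illustrative only: the regexes below are simplified stand-ins for the markers in the table, not the exact formats every runner version emits.

```python
import re

# Simplified detection patterns, checked in order; the first match wins.
# Patterns are loose approximations of the table above, not exhaustive.
RUNNER_PATTERNS = [
    ("pytest",     re.compile(r"FAILED .+::\w+")),
    ("go test",    re.compile(r"--- FAIL: Test\w+")),
    ("cargo test", re.compile(r"test result: FAILED")),
    ("jest",       re.compile(r"FAIL .+\.(test|spec)\.[jt]sx?")),
    ("rspec",      re.compile(r"Failure/Error:")),
]


def detect_runner(output: str) -> str:
    for name, pattern in RUNNER_PATTERNS:
        if pattern.search(output):
            return name
    return "unknown"
```

Because some markers overlap (pytest's `FAILED` contains Jest's `FAIL`), the order of the list matters: more specific patterns go first.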
+ ## Step 2: Failure Extraction
+
+ For each failure, extract a structured record:
+
+ ```
+ {
+   test_name: "should return 401 for unauthenticated requests"
+   test_file: "src/api/auth.test.ts"
+   test_line: 42
+   error_type: "AssertionError"
+   expected: "401"
+   received: "200"
+   stack_trace: ["auth.test.ts:42", "auth.middleware.ts:18", "express/router.ts:..."]
+   source_files: ["auth.middleware.ts:18"]  // files from YOUR codebase in the stack
+ }
+ ```
+
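
As a minimal sketch of this step, assuming pytest's short-summary format (`FAILED file::test - Error`): the `Failure` record mirrors the fields above, and the regex is a deliberate simplification of real pytest output.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Failure:
    # Subset of the structured record shown above.
    test_file: str
    test_name: str
    error_type: str
    stack_trace: list = field(default_factory=list)


# Matches pytest short-summary lines such as:
#   FAILED tests/test_auth.py::test_login - AssertionError
FAILED_LINE = re.compile(r"FAILED (?P<file>[^:]+)::(?P<name>\w+) - (?P<error>\w+)")


def extract_failures(output: str) -> list:
    failures = []
    for line in output.splitlines():
        m = FAILED_LINE.search(line)
        if m:
            failures.append(Failure(m["file"], m["name"], m["error"]))
    return failures
```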
+ ## Step 3: Failure Clustering
+
+ Group failures into clusters based on shared characteristics:
+
+ ### Cluster Types
+
+ | Cluster Type | How to Detect | Typical Root Cause |
+ |---|---|---|
+ | **Shared Module** | Multiple tests import from the same file that changed | Missing export, type change, API change |
+ | **Same Error Type** | All failures throw `TypeError` or `ConnectionError` | Broken dependency, env issue |
+ | **Shared Fixture** | Tests using same `beforeEach`/setup fail together | Fixture setup failure cascading |
+ | **Import Chain** | Failures follow the import graph | Dependency that fails to resolve |
+ | **Environment** | All tests fail with connection/config errors | Missing env var, DB not running |
+ | **Timing** | Tests pass individually, fail together | Race condition, shared state |
+ | **Snapshot** | Multiple `toMatchSnapshot` failures | Intentional UI change (update snapshots) |
+
+ ### Cascade Detection Algorithm
+
+ ```
+ 1. Sort failures by file path and execution order.
+ 2. Find the FIRST failure in execution order → candidate FPF.
+ 3. Check if the FPF's source file appears in other failures' import chains.
+ 4. If yes → FPF is the root cause, other failures are cascades.
+ 5. If no → failures are independent (multiple root causes).
+ ```
+
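
The clustering and cascade steps can be sketched as follows. The failure dicts reuse the field names from the Step 2 record; the heuristics are illustrative, not the skill's normative algorithm.

```python
from collections import defaultdict


def cluster_by_error_type(failures):
    # One simple clustering axis from the table above: same error type.
    clusters = defaultdict(list)
    for f in failures:
        clusters[f["error_type"]].append(f)
    return dict(clusters)


def find_fpf(failures):
    """First failure in execution order is the FPF candidate; it is
    treated as root cause if its source file recurs in later failures."""
    if not failures:
        return None
    candidate = failures[0]
    root = candidate["source_files"][0] if candidate["source_files"] else None
    cascades = [f for f in failures[1:] if root and root in f["source_files"]]
    return {"fpf": candidate, "cascades": cascades, "is_root_cause": bool(cascades)}
```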
+ ## Step 4: First Point of Failure (FPF) Detection
+
+ The FPF is the most valuable finding — fix it first, and cascading failures resolve automatically.
+
+ ```
+ Example:
+   12 test files fail.
+   11 of them import from `utils/auth.ts`.
+   The first failure is in `utils/auth.test.ts` at line 42.
+   Error: `generateToken is not exported from './auth'`
+
+   FPF: utils/auth.ts:42 — missing export
+   Cascade: 11 other test files fail because they can't import generateToken
+   Fix: Add `export { generateToken }` to utils/auth.ts
+   Expected resolution: 12 of 12 failures (100%)
+ ```
+
+ **FPF Confidence Levels:**
+
+ | Confidence | Criteria |
+ |---|---|
+ | **HIGH** | Same source file in >50% of failure stack traces |
+ | **MEDIUM** | Same error type across multiple test files |
+ | **LOW** | Failures appear independent, multiple root causes likely |
+
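
The confidence table maps naturally onto a small heuristic. The >50% threshold comes from the table; everything else here is an illustrative sketch using the Step 2 field names.

```python
def fpf_confidence(failures):
    # HIGH: one source file appears in more than half of all failures.
    if not failures:
        return "LOW"
    counts = {}
    for f in failures:
        for src in set(f["source_files"]):
            counts[src] = counts.get(src, 0) + 1
    if counts and max(counts.values()) > len(failures) / 2:
        return "HIGH"
    # MEDIUM: a single shared error type across multiple test files.
    if len({f["error_type"] for f in failures}) == 1 and len(failures) > 1:
        return "MEDIUM"
    return "LOW"
```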
+ ## Step 5: Fix Recommendations
+
+ For each cluster, provide actionable fixes:
+
+ | Fix Type | Example | How to Verify |
+ |---|---|---|
+ | **Missing Export** | `export { fn }` added to module | Re-run failing tests |
+ | **Type Mismatch** | Function signature changed, callers need update | Check callers with `grep_search` |
+ | **Stale Mock** | Mock doesn't match new interface | Compare mock to actual implementation |
+ | **Env Variable** | `.env.test` missing `DATABASE_URL` | Check `.env.example` vs `.env.test` |
+ | **Snapshot Update** | Intentional UI change | Run with `--updateSnapshot` flag |
+ | **Race Condition** | Tests share global state | Add isolation or `beforeEach` reset |
+ | **Dependency Update** | Package API changed after upgrade | Check changelog of updated package |
+
+ ### Fix Priority Formula
+ ```
+ Priority = (Tests_Resolved × 10) + (Confidence_Score × 5) - (Estimated_Fix_Time_Minutes)
+
+ Fix in this order:
+ 1. Highest priority score first
+ 2. If tied, prefer HIGH confidence
+ 3. If still tied, prefer fewer files to change
+ ```
+
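
The formula and tie-breakers can be written directly as code. One assumption to flag: the skill never pins down the numeric value of `Confidence_Score`, so the HIGH=3 / MEDIUM=2 / LOW=1 mapping below is hypothetical.

```python
# Hypothetical numeric mapping; the skill leaves Confidence_Score unspecified.
CONFIDENCE_SCORE = {"HIGH": 3, "MEDIUM": 2, "LOW": 1}


def fix_priority(tests_resolved, confidence, fix_time_minutes):
    # Priority = (Tests_Resolved × 10) + (Confidence_Score × 5) - (Fix_Time_Minutes)
    return tests_resolved * 10 + CONFIDENCE_SCORE[confidence] * 5 - fix_time_minutes


def order_fixes(fixes):
    # Sort by priority (desc), then higher confidence, then fewer files changed.
    return sorted(
        fixes,
        key=lambda f: (
            -fix_priority(f["resolved"], f["confidence"], f["minutes"]),
            -CONFIDENCE_SCORE[f["confidence"]],
            f["files_changed"],
        ),
    )
```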
+ ## Report Format
+
+ ```
+ ━━━ Test Result Analysis ━━━━━━━━━━━━━━━━
+
+ Runner: [Jest / Vitest / pytest / Go / auto-detected]
+ Total: 48 tests across 12 files
+ Result: 36 passed | 12 failed | 0 skipped
+ Duration: 4.2s
+ Coverage: 78% statements (if available)
+
+ ━━━ First Point of Failure ━━━━━━━━━━━━━━
+
+ 📍 utils/auth.test.ts → line 42
+ Error: `generateToken` is not exported from `./auth`
+ Type: ImportError
+ Impact: Cascades to 11 other test files
+
+ This is the root cause. Fix this first.
+
+ ━━━ Failure Clusters ━━━━━━━━━━━━━━━━━━━━
+
+ Cluster 1: Missing Export (11 tests, HIGH confidence)
+   Root: utils/auth.ts:42
+   Cascade: auth.test.ts, users.test.ts, sessions.test.ts, ...
+   Fix: Add `export { generateToken }` to auth.ts
+   Resolution: 11 of 12 failures (92%)
+   Priority: ★★★★★ (115 pts)
+
+ Cluster 2: Stale Mock (1 test, MEDIUM confidence)
+   Root: api/users.test.ts:98
+   Error: Expected { name, email, role } but received { name, email }
+   Fix: Add `role: "user"` to mock at line 15
+   Resolution: 1 of 12 failures (8%)
+   Priority: ★★☆☆☆ (20 pts)
+
+ ━━━ Fix Plan ━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+ Step 1: Fix utils/auth.ts export
+   → Expected: 11 failures resolved
+   → Time: ~2 minutes
+   → Run: npx jest utils/auth.test.ts (verify FPF fix)
+
+ Step 2: Update mock in api/users.test.ts:15
+   → Expected: 1 failure resolved
+   → Time: ~1 minute
+
+ Step 3: Re-run full suite
+   → Expected: all 12 failures resolved (0 remaining)
+
+ ━━━ Warnings ━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+ ⚠️ No test coverage report detected. Consider adding the --coverage flag.
+ ⚠️ 3 test files have no assertions (test names end in `.todo`).
+ ```
+
+ ## Edge Cases
+
+ ### All Tests Fail
+ ```
+ If 100% of tests fail → likely environment issue, not code:
+ 1. Check if dev server / database is running
+ 2. Check .env.test for missing variables
+ 3. Check node_modules exists (run npm install)
+ 4. Check for breaking dependency upgrade in recent commits
+ ```
+
+ ### Flaky Tests
+ ```
+ If same test passes on retry → flaky:
+ 1. Check for shared mutable state between tests
+ 2. Check for time-dependent assertions
+ 3. Check for unresolved promises / async leaks
+ 4. Check for network-dependent tests without mocks
+ ```
+
+ ### Only Snapshot Tests Fail
+ ```
+ If only snapshot tests fail → likely intentional UI change:
+ 1. Review snapshot diffs
+ 2. If changes are expected: run with --updateSnapshot
+ 3. If changes are unexpected: check for unintended CSS/component changes
+ ```
+
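
The three edge cases amount to a triage check that can run before any deep analysis. The function below is an illustrative sketch of that decision order, nothing more:

```python
def triage(total, failed, snapshot_failures, passed_on_retry):
    # Classify a run before doing per-failure analysis.
    if total and failed == total:
        return "environment"  # 100% fail → check env vars, services, deps
    if failed and snapshot_failures == failed:
        return "snapshot"     # only snapshots fail → likely intentional UI change
    if passed_on_retry:
        return "flaky"        # passes on retry → shared state / timing
    return "code"             # ordinary failures → run the full pipeline
```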
+ ## Cross-Skill Integration
+
+ | Paired Skill | Integration Point |
+ |---|---|
+ | `systematic-debugging` | Escalate when the FPF is unclear → 4-phase debug methodology |
+ | `testing-patterns` | Reference when recommending test structure improvements |
+ | `workflow-optimizer` | Flag inefficient test-debug-retest loops |
+
+ ## Anti-Hallucination Guard
+
+ - **Only analyze test output that was actually produced** — never generate fake test results.
+ - **Never invent file paths or line numbers** — only reference what appears in the stack trace.
+ - **Verify source files exist** before suggesting fixes — use `view_file` or `find_by_name`.
+ - **Mark uncertainty**: `// UNCERTAIN: log format not fully recognized, manual review recommended`.
+ - **Never guess at assertion values** — quote exactly what "Expected" and "Received" say in the output.
+ - **Don't assume the test runner** — auto-detect it from the output format; don't default to Jest.
+
+ ---
+
+ ## 🤖 LLM-Specific Traps
+
+ AI coding assistants often fall into specific bad habits when dealing with this domain. These are strictly forbidden:
+
+ 1. **Over-engineering:** Proposing complex abstractions or distributed systems when a simpler approach suffices.
+ 2. **Hallucinated Libraries/Methods:** Using non-existent methods or packages. Always `// VERIFY` or check `package.json` / `requirements.txt`.
+ 3. **Skipping Edge Cases:** Writing the "happy path" and ignoring error handling, timeouts, or data validation.
+ 4. **Context Amnesia:** Forgetting the user's constraints and offering generic advice instead of tailored solutions.
+ 5. **Silent Degradation:** Catching and suppressing errors without logging or re-raising.
+
+ ---
+
+ ## 🏛️ Tribunal Integration (Anti-Hallucination)
+
+ **Slash command: `/review` or `/tribunal-full`**
+ **Active reviewers: `logic-reviewer` · `security-auditor`**
+
+ ### ❌ Forbidden AI Tropes
+
+ 1. **Blind Assumptions:** Never make an assumption without documenting it clearly with `// VERIFY: [reason]`.
+ 2. **Silent Degradation:** Catching and suppressing errors without logging or handling.
+ 3. **Context Amnesia:** Forgetting the user's constraints and offering generic advice instead of tailored solutions.
+
+ ### ✅ Pre-Flight Self-Audit
+
+ Review these questions before confirming output:
+ ```
+ ✅ Did I rely ONLY on real, verified tools and methods?
+ ✅ Is this solution appropriately scoped to the user's constraints?
+ ✅ Did I handle potential failure modes and edge cases?
+ ✅ Have I avoided generic boilerplate that doesn't add value?
+ ```
+
+ ### 🛑 Verification-Before-Completion (VBC) Protocol
+
+ **CRITICAL:** You must follow a strict "evidence-based closeout" state machine.
+ - ❌ **Forbidden:** Declaring a task complete because the output "looks correct."
+ - ✅ **Required:** You are explicitly forbidden from finalizing any task without providing **concrete evidence** (terminal output, passing tests, compile success, or equivalent proof) that your output works as intended.