fraim-framework 2.0.26 → 2.0.30

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (104)
  1. package/.github/workflows/deploy-fraim.yml +1 -1
  2. package/dist/registry/scripts/build-scripts-generator.js +205 -0
  3. package/dist/registry/scripts/cleanup-branch.js +258 -0
  4. package/dist/registry/scripts/evaluate-code-quality.js +66 -0
  5. package/dist/registry/scripts/exec-with-timeout.js +142 -0
  6. package/dist/registry/scripts/fraim-config.js +61 -0
  7. package/dist/registry/scripts/generate-engagement-emails.js +630 -0
  8. package/dist/registry/scripts/generic-issues-api.js +100 -0
  9. package/dist/registry/scripts/newsletter-helpers.js +731 -0
  10. package/dist/registry/scripts/openapi-generator.js +664 -0
  11. package/dist/registry/scripts/performance/profile-server.js +390 -0
  12. package/dist/registry/scripts/run-thank-you-workflow.js +92 -0
  13. package/dist/registry/scripts/send-newsletter-simple.js +85 -0
  14. package/dist/registry/scripts/send-thank-you-emails.js +54 -0
  15. package/dist/registry/scripts/validate-openapi-limits.js +311 -0
  16. package/dist/registry/scripts/validate-test-coverage.js +262 -0
  17. package/dist/registry/scripts/verify-test-coverage.js +66 -0
  18. package/dist/src/cli/commands/init.js +14 -12
  19. package/dist/src/cli/commands/sync.js +19 -2
  20. package/dist/src/cli/fraim.js +24 -22
  21. package/dist/src/cli/setup/first-run.js +13 -6
  22. package/dist/src/fraim/config-loader.js +0 -8
  23. package/dist/src/fraim/db-service.js +26 -15
  24. package/dist/src/fraim/issues.js +67 -0
  25. package/dist/src/fraim/setup-wizard.js +1 -69
  26. package/dist/src/fraim/types.js +0 -11
  27. package/dist/src/fraim-mcp-server.js +272 -18
  28. package/dist/src/utils/git-utils.js +1 -1
  29. package/dist/src/utils/version-utils.js +32 -0
  30. package/dist/tests/debug-tools.js +79 -0
  31. package/dist/tests/esm-compat.js +11 -0
  32. package/dist/tests/test-chalk-esm-issue.js +159 -0
  33. package/dist/tests/test-chalk-real-world.js +265 -0
  34. package/dist/tests/test-chalk-regression.js +327 -0
  35. package/dist/tests/test-chalk-resolution-issue.js +304 -0
  36. package/dist/tests/test-cli.js +0 -2
  37. package/dist/tests/test-fraim-install-chalk-issue.js +254 -0
  38. package/dist/tests/test-fraim-issues.js +59 -0
  39. package/dist/tests/test-genericization.js +1 -3
  40. package/dist/tests/test-mcp-connection.js +166 -0
  41. package/dist/tests/test-mcp-issue-integration.js +144 -0
  42. package/dist/tests/test-mcp-lifecycle-methods.js +312 -0
  43. package/dist/tests/test-node-compatibility.js +71 -0
  44. package/dist/tests/test-npm-install.js +66 -0
  45. package/dist/tests/test-npm-resolution-diagnostic.js +140 -0
  46. package/dist/tests/test-session-rehydration.js +145 -0
  47. package/dist/tests/test-standalone.js +2 -8
  48. package/dist/tests/test-sync-version-update.js +93 -0
  49. package/dist/tests/test-telemetry.js +190 -0
  50. package/package.json +10 -8
  51. package/registry/agent-guardrails.md +62 -54
  52. package/registry/rules/agent-success-criteria.md +52 -0
  53. package/registry/rules/agent-testing-guidelines.md +502 -502
  54. package/registry/rules/communication.md +121 -121
  55. package/registry/rules/continuous-learning.md +54 -54
  56. package/registry/rules/ephemeral-execution.md +10 -5
  57. package/registry/rules/hitl-ppe-record-analysis.md +302 -302
  58. package/registry/rules/local-development.md +251 -251
  59. package/registry/rules/software-development-lifecycle.md +104 -104
  60. package/registry/rules/successful-debugging-patterns.md +482 -478
  61. package/registry/rules/telemetry.md +67 -0
  62. package/registry/scripts/build-scripts-generator.ts +216 -215
  63. package/registry/scripts/cleanup-branch.ts +303 -284
  64. package/registry/scripts/code-quality-check.sh +559 -559
  65. package/registry/scripts/detect-tautological-tests.sh +38 -38
  66. package/registry/scripts/evaluate-code-quality.ts +1 -1
  67. package/registry/scripts/generate-engagement-emails.ts +744 -744
  68. package/registry/scripts/generic-issues-api.ts +110 -150
  69. package/registry/scripts/newsletter-helpers.ts +874 -874
  70. package/registry/scripts/openapi-generator.ts +695 -693
  71. package/registry/scripts/performance/profile-server.ts +5 -3
  72. package/registry/scripts/prep-issue.sh +468 -455
  73. package/registry/scripts/validate-openapi-limits.ts +366 -365
  74. package/registry/scripts/validate-test-coverage.ts +280 -280
  75. package/registry/scripts/verify-pr-comments.sh +70 -70
  76. package/registry/scripts/verify-test-coverage.ts +1 -1
  77. package/registry/templates/bootstrap/ARCHITECTURE-TEMPLATE.md +53 -53
  78. package/registry/templates/evidence/Implementation-BugEvidence.md +85 -85
  79. package/registry/templates/evidence/Implementation-FeatureEvidence.md +120 -120
  80. package/registry/templates/marketing/HBR-ARTICLE-TEMPLATE.md +66 -0
  81. package/registry/workflows/bootstrap/create-architecture.md +2 -2
  82. package/registry/workflows/bootstrap/evaluate-code-quality.md +3 -3
  83. package/registry/workflows/bootstrap/verify-test-coverage.md +2 -2
  84. package/registry/workflows/customer-development/insight-analysis.md +156 -156
  85. package/registry/workflows/customer-development/interview-preparation.md +421 -421
  86. package/registry/workflows/customer-development/strategic-brainstorming.md +146 -146
  87. package/registry/workflows/customer-development/thank-customers.md +193 -191
  88. package/registry/workflows/customer-development/weekly-newsletter.md +362 -352
  89. package/registry/workflows/improve-fraim/contribute.md +32 -0
  90. package/registry/workflows/improve-fraim/file-issue.md +32 -0
  91. package/registry/workflows/marketing/hbr-article.md +73 -0
  92. package/registry/workflows/performance/analyze-performance.md +63 -59
  93. package/registry/workflows/product-building/design.md +3 -2
  94. package/registry/workflows/product-building/implement.md +4 -3
  95. package/registry/workflows/product-building/prep-issue.md +28 -17
  96. package/registry/workflows/product-building/resolve.md +3 -2
  97. package/registry/workflows/product-building/retrospect.md +3 -2
  98. package/registry/workflows/product-building/spec.md +5 -4
  99. package/registry/workflows/product-building/test.md +3 -2
  100. package/registry/workflows/quality-assurance/iterative-improvement-cycle.md +562 -562
  101. package/registry/workflows/replicate/website-discovery-analysis.md +3 -3
  102. package/registry/workflows/reviewer/review-implementation-vs-design-spec.md +632 -632
  103. package/registry/workflows/reviewer/review-implementation-vs-feature-spec.md +669 -669
  104. package/tsconfig.json +2 -1
@@ -1,502 +1,502 @@
- # Rule: agent-testing-guidelines
-
- **Path:** `rules/agent-testing-guidelines.md`
-
- ---
-
- # AI Agent Testing & Validation Guidelines
-
- ## INTENT
- To ensure all work is thoroughly validated. To ensure agents provide **real, reproducible, end‑to‑end evidence** that fixes work — never “looks good” claims.
-
- ## PRINCIPLES
- - **Reproduce → Fix → Prove**: Show failing evidence first, then passing evidence after the fix.
- - **Regression Test Verification**: ALWAYS verify regression tests fail with the bug and pass with the fix. Never create tests that pass/fail in both scenarios.
- - **Test what matters**: Follow the test plan. Test the functionality that you have changed. Do not mock core functionality being tested.
- - **Keep it simple**: Tests must be minimal but complete, covering all relevant scenarios. Use boilerplate tests and mocks to reduce duplication. Prefer simple unit tests over complex integration tests.
- - **Be complete**: No placeholder tests: `// TODO` markers and empty or disabled test bodies are forbidden. Do not assume failing test cases are unimportant. Always watch server logs for errors.
- - **Be efficient**: Tests should create the objects they need, test core functionality, then delete any objects they created and close server/database connections.
- - **Be resilient**: When a tool fails, investigate alternative approaches before giving up. Check for existing working examples in the project before claiming a tool is broken.
- - **Be truthful**: NEVER CLAIM SUCCESS WITHOUT RUNNING TESTS. Always run tests and show results before claiming they pass.
- - **Back Up with Evidence, Not Assertions**: Include evidence of passing tests in the PR before submitting for review.
- - **Complete Test Ownership**: When you write tests, you own ALL failures until ALL tests pass. No exceptions, no categorizations, no "not my problem" rationalizations. If you wrote the test, you fix ALL its failures before claiming completion.
- - **Avoid Duplicates; Find Gaps**: Do not add redundant tests that restate existing coverage. Before adding/removing tests, identify what behavior is currently covered and what is missing, then add/merge tests to close gaps (not inflate count).
-
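The Regression Test Verification principle above reduces to a small decision table. A hedged TypeScript sketch (names are illustrative, not from this package): run the candidate test once on the buggy revision and once on the fixed one, then classify.

```typescript
// A regression test is only trustworthy when it fails on the buggy build
// AND passes on the fixed build; everything else needs rework.
type Verdict = 'valid' | 'tautological' | 'broken' | 'inverted-or-flaky';

function classifyRegressionTest(failsWithBug: boolean, passesWithFix: boolean): Verdict {
  if (failsWithBug && passesWithFix) return 'valid';         // real guardrail
  if (!failsWithBug && passesWithFix) return 'tautological'; // passes in both scenarios
  if (failsWithBug && !passesWithFix) return 'broken';       // fails in both scenarios
  return 'inverted-or-flaky';                                // passes with bug, fails with fix
}
```

Only a `'valid'` verdict lets the test ship; the other three verdicts mean the test must be redesigned before it counts as evidence.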
- ## UI E2E (Playwright) — MANDATORY RULES
-
- These rules exist because UI failures often show up as “timeouts” unless you instrument and assert correctly.
-
- - **Headless by default**: `chromium.launch({ headless: true })` unless explicitly debugging.
- - **Prefer DOM truth over network truth**: Assert user-visible state changes (selectors/text) instead of relying on `page.waitForResponse()` predicates.
- - **Instrument failures**: Every Playwright E2E must attach:
-   - `page.on('requestfailed', …)` to print method/url/errorText
-   - `page.on('response', …)` to log HTTP >= 400
- - **Server bootstrap is explicit**: If a test depends on the dev server, it must either:
-   - auto-start it (preferred), or
-   - fail fast with an exact command and URL (`http://localhost:PORT/...`)
- - **Stable selectors only**: Use `data-role`/`data-ashley-component` selectors. Do not use nth-child or layout-dependent selectors.
- - **Optional UI stays optional**: Never re-enable product UI just to satisfy a test. Gate the test behavior or verify via API/DB instead.
- - **Manual UI verification uses Playwright MCP**: When asked to “pop the browser” or confirm a UI bugfix, use Playwright MCP tools (navigate/click/snapshot) and capture the resulting DOM snapshot as evidence.
-
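The two instrumentation bullets above can be sketched as one helper, written so the handlers are unit-testable without launching a browser. `instrumentPage` and `PageLike` are illustrative names (not from this package); with a real Playwright `page` the collected lines would be printed when the test fails.

```typescript
// Minimal structural type covering the only Playwright API surface used here.
interface PageLike {
  on(event: string, handler: (arg: any) => void): void;
}

// Attach the mandatory failure instrumentation; returns the collected
// failure lines so a test can assert on them or dump them on timeout.
function instrumentPage(page: PageLike): string[] {
  const failures: string[] = [];
  page.on('requestfailed', (req: any) => {
    failures.push(`${req.method()} ${req.url()} -> ${req.failure()?.errorText ?? 'unknown'}`);
  });
  page.on('response', (res: any) => {
    if (res.status() >= 400) failures.push(`HTTP ${res.status()} ${res.url()}`);
  });
  return failures;
}
```

Returning the lines instead of only logging them keeps the helper assertable, which is the same "DOM truth over network truth" spirit: failures become data the test can inspect.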
- ## FAILING TEST TRIAGE (DO NOT DELETE FIRST)
-
- **Rule**: A failing test is a signal until proven otherwise. You may only delete/skip a test after completing triage and ensuring coverage remains.
-
- ### Required triage steps (in order)
- 1. **Reproduce twice**: Re-run the failing test at least 2x (same commit) and capture `test.log`/output.
- 2. **Classify the failure**:
-    - **Product bug** (test reveals a real functional gap): fix product code; keep the test.
-    - **Test bug** (wrong selector/assumption/incorrect fixture): fix the test; keep the test.
-    - **Flake** (timing/overlays/network): stabilize the test OR rewrite it to a more deterministic layer (unit/integration). Do not delete until replaced.
- 3. **Prove redundancy before deletion**:
-    - Point to an existing test that asserts the *same* behavior (or strengthen an existing test to do so).
-    - If you delete a test, you MUST add/merge equivalent coverage elsewhere in the same PR.
-
- ### Allowed outcomes
- - **Fix product** (preferred): keep test; it becomes the regression guardrail.
- - **Fix test**: keep test; align with real UX/DOM and intended behavior.
- - **Replace test**: move the assertion to a more reliable layer (e.g., unit test for UI validation logic, API test for backend behavior). Delete only after replacement is merged and passing.
-
- ### Prohibited outcomes
- - ❌ Deleting a failing test to “unblock” without proving redundancy
- - ❌ Deleting a failing test when it is the only check for a behavior
- - ❌ Creating “dupe tests” instead of strengthening the existing one
- - ❌ Converting a failing test into a no-op (skipping, loosening assertions) without replacement
-
- ## 🚨 MANDATORY ENFORCEMENT PROTOCOLS
-
- ### **ANTI-LYING ENFORCEMENT**
- **RULE**: Before claiming ANY test result, you MUST:
- 1. Run the exact test command
- 2. Show the complete output (not summary)
- 3. Verify exit code is 0
- 4. State what was actually tested
-
- **VIOLATION CONSEQUENCES**:
- - Immediate task termination
- - Mandatory apology to user
- - Must re-run tests with evidence before continuing
- - Cannot claim "tests pass" without showing actual output
-
- ### **TEST SCOPE PRECISION ENFORCEMENT**
- **RULE**: When testing specific functionality:
- - Use targeted commands: `npm run test-failing`, `npm run test-smoke`, etc.
- - NEVER run entire test suites unless explicitly requested
- - Always specify exact test file/function being tested
-
- **VIOLATION CONSEQUENCES**:
- - Must re-run with correct scope
- - Cannot proceed until targeted test is run
- - Must show evidence of correct test scope
-
- ### **ANTI-PATTERN DETECTION ENFORCEMENT**
- **RULE**: Before creating any test, you MUST answer these questions:
-
- 1. **Am I testing runtime behavior or code structure?**
-    - Runtime behavior = Executing code, checking outcomes (API calls, service methods, state changes)
-    - Code structure = Reading files, checking if strings exist (`fs.readFileSync()` to verify code)
-    - ❌ Code structure checks are NOT tests - they're static analysis
-
- 2. **Am I testing the real behavior or a mock?**
-    - Real behavior = Actual API calls, actual service methods
-    - Mock = Only if mocking dependencies, NOT the thing being tested
-    - ❌ Mocking what you're testing is an anti-pattern
-
- 3. **Does this test actually validate the fix?**
-    - Would test fail if functionality was broken?
-    - Does test check observable outcomes (database state, API responses, logs)?
-    - ❌ Tests that pass regardless of behavior are invalid
-
- 4. **Am I repeating a documented anti-pattern?**
-    - Checked retrospectives for similar mistakes?
-    - Verified test approach doesn't match anti-patterns?
-    - ❌ Repeating documented mistakes (like Issue #723) is prohibited
-
- **MANDATORY VALIDATION**:
- Before writing test code, you MUST:
- 1. Answer all 4 questions above
- 2. Show answers in your response
- 3. Verify answers don't indicate anti-patterns
- 4. If ANY answer indicates anti-pattern → STOP, cannot proceed
-
- **VIOLATION CONSEQUENCES**:
- - If mocking what you're testing → STOP immediately
- - If testing code structure instead of runtime behavior → STOP immediately
- - If repeating documented anti-pattern → STOP immediately
- - Must redesign test to validate actual behavior
- - Cannot proceed until test validates functionality
-
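The runtime-behavior vs. code-structure distinction in question 1 is easiest to see side by side. A hedged sketch with a made-up `applyDiscount` function (not from this package):

```typescript
// Hypothetical function under test.
function applyDiscount(price: number, percent: number): number {
  return Math.round(price * (1 - percent / 100) * 100) / 100;
}

// ❌ Code-structure "test" (static analysis, not a test): reading the source
// and asserting a string exists passes even if the math is completely wrong.
//   const src = fs.readFileSync('src/pricing.ts', 'utf8');
//   assert(src.includes('applyDiscount'));

// ✅ Runtime-behavior test: execute the code and assert the observable outcome.
const result = applyDiscount(200, 15);
if (result !== 170) throw new Error(`expected 170, got ${result}`);
```

The behavior test fails the moment the discount math breaks; the structure check keeps passing, which is exactly why it is classified as an anti-pattern above.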
- ### **FAILURE CASCADE PREVENTION ENFORCEMENT**
- **RULE**: When one approach fails:
- 1. STOP - don't try another approach immediately
- 2. ANALYZE - why did it fail?
- 3. ADMIT - what went wrong?
- 4. ASK - for guidance if stuck
- 5. FIX - only after understanding the problem
-
- **VIOLATION CONSEQUENCES**:
- - Must pause and analyze failure
- - Cannot try new approach without understanding first failure
- - Must ask user for guidance if stuck
- - Cannot compound errors with more attempts
-
-
- ## CORE TESTING PROCEDURE
-
- ### 1. Analyze Before Implementing
- **CRITICAL**: Always analyze the codebase thoroughly before making changes:
- - Use `grep_search` to find all dependencies and usage patterns
- - Use `find_by_name` to locate related files and patterns
- - Use `Read` to understand existing implementations
- - Document findings with real code examples and line numbers
- - **NEVER** make changes based on assumptions
-
- ### 2. Identify Code Changes
- Before committing, determine the full scope of files that have been modified, created, or deleted.
-
- ### 3. **MANDATORY BUILD VERIFICATION - DO THIS BEFORE TESTS**
-
- **🚨 CRITICAL: Run build BEFORE running any tests**
-
- ```bash
- # ALWAYS run this first, before any test execution:
- npm run build
-
- # Verify exit code is 0:
- echo $? # Must output: 0
- ```
-
- **Why this is mandatory**:
- - Catches TypeScript compilation errors immediately
- - Prevents wasting time running tests that will fail anyway
- - Catches function signature mismatches
- - Verifies all imports resolve correctly
- - **If build fails, tests are meaningless**
-
- **PROHIBITED**:
- - ❌ Running tests before build
- - ❌ Assuming build works without running it
- - ❌ Claiming "tests passing" when build fails
- - ❌ Ignoring build errors as "probably unrelated"
-
- **REQUIRED**:
- - ✅ Run `npm run build` first, every time
- - ✅ Fix ALL build errors before running tests
- - ✅ Verify exit code is 0
- - ✅ Never proceed to tests if build fails
-
- **Special Case - BAML Files**:
- If you edited ANY `.baml` file, see `.ai-agents/rules/baml-workflow.md` for the mandatory 3-step workflow that must be completed BEFORE running build.
-
- ### 3.5. **MANDATORY PRE-TEST CREATION CHECKLIST - BLOCKS TEST WRITING**
-
- **🚨 CRITICAL: Complete this checklist BEFORE writing ANY test code**
-
- **You CANNOT create `test-*.ts` files until ALL items are checked:**
-
- - [ ] **FUNCTIONAL VALIDATION COMPLETE**: Completed Section 4 below with curl/API tests and have evidence
- - [ ] **STRUCTURAL VALIDATION**: Confirmed test file imports `BaseTestCase` and uses `runTests(...)` (verify with `grep`)
- - [ ] **RETROSPECTIVE REVIEW**: Searched `retrospectives/` for testing mistakes (grep -r "anti-pattern\|testing.*fail" retrospectives/)
- - [ ] **ANTI-PATTERN REVIEW**: Read lines 609-780 of this file (anti-pattern section)
- - [ ] **TEST TYPE VERIFIED**: Confirmed I'm testing runtime behavior (executing code), not code structure (reading files)
-
- **VIOLATION CONSEQUENCES**:
- - If ANY item unchecked → STOP immediately, cannot create test files
- - Must show evidence of functional validation (curl outputs, logs) before proceeding
- - User will reject tests that skip this checklist
-
- **VERIFICATION REQUIRED**:
- Before creating any `test-*.ts` file, you MUST display this checklist with all items checked and show functional validation evidence.
-
- ### 4. **MANDATORY FUNCTIONAL VALIDATION - DO THIS BEFORE WRITING TESTS**
-
- **🚨 CRITICAL: Validate the feature actually works before writing tests**
-
- **For ANY feature with API endpoints (especially CRUD operations):**
-
- ```bash
- # Get the dynamic port (based on issue number in branch name)
- PORT=$(node -e "const {getPort} = require('./src/utils/git-utils'); console.log(getPort());")
-
- # 1. Test CREATE operation
- curl -X POST http://localhost:$PORT/endpoint \
-   -H "x-executive-id: exec-ID" \
-   -d '{"field":"value"}' | grep success
-
- # 2. Verify state in database/downstream systems
- curl http://localhost:$PORT/endpoint | grep "expected-value"
- grep "Created" server.log | tail -5
-
- # 3. Test UPDATE operation and TIME IT
- time curl -X POST http://localhost:$PORT/endpoint \
-   -H "x-executive-id: exec-ID" \
-   -d '{"field":"updated-value"}'
- # MUST complete < 5 seconds
-
- # 4. Verify old state REPLACED (not duplicated)
- curl http://localhost:$PORT/endpoint | grep "old-value"
- # Should be empty (old value gone)
-
- # 5. Check server.log for expected operations
- grep "Updated\|Deleted.*old\|Created.*new" server.log | tail -10
-
- # 6. Test DELETE operation
- curl -X DELETE http://localhost:$PORT/endpoint/ID
- curl http://localhost:$PORT/endpoint | grep "ID"
- # Should be empty (deleted)
-
- # 7. Verify downstream cleanup (calendar events, cache, etc.)
- # For calendar features: verify events deleted from calendar
- # For cache features: verify cache invalidated
- ```
-
- **Why this is mandatory:**
- - Tests can pass while feature is broken
- - Must validate ACTUAL behavior, not just test behavior
- - Must verify state changes in ALL systems (DB, calendar, cache)
- - Must catch performance issues (slow operations)
- - Must verify cleanup (no orphaned data)
-
- **PROHIBITED:**
- - ❌ Writing tests without validating feature works first
- - ❌ Assuming UPDATE works because CREATE works
- - ❌ Assuming DELETE cleans up downstream systems
- - ❌ Not timing operations to catch performance bugs
- - ❌ Only checking database, ignoring downstream systems
-
- **REQUIRED**:
- - ✅ Test ALL CRUD operations with curl before writing tests
- - ✅ Time operations to verify performance (< 5s)
- - ✅ Verify state in ALL systems (DB + calendar + logs)
- - ✅ Test with existing data (update scenarios)
- - ✅ Verify cleanup (delete operations)
- - ✅ **SHOW EVIDENCE**: Display curl outputs, server logs, database state BEFORE creating test files
- - ✅ **BLOCK TEST CREATION**: Cannot write `test-*.ts` files until functional validation complete and evidence shown
-
- **Issue #393 Example:**
- What went wrong: Implemented `updateWorkHours()` but didn't test it with curl.
- Result: Updated database but didn't create calendar events.
- Prevention: `curl -X POST /work-hours` twice, check server.log for calendar creation both times.
-
- ### 5. Locate Relevant Tests
- - For each modified source file (e.g., in `src/`), search the codebase for corresponding test files
- - Test files follow the naming convention `test-*.ts` or `*.test.ts`
- - A good method is to search for test files that `import` or `require` the modified source file
-
- ### 5.5. **MANDATORY RETROSPECTIVE REVIEW BEFORE TEST CREATION**
-
- **🚨 CRITICAL: Review retrospectives for testing mistakes before writing tests**
-
- **REQUIRED ACTION** (must complete before writing ANY test code):
- 1. **Search retrospectives**: `grep -r "anti-pattern\|testing.*fail\|test.*wrong" retrospectives/`
- 2. **Read mandatory retrospective**: `retrospectives/integration-testing-anti-pattern.md` (MANDATORY - documents this exact mistake)
- 3. **Verify test approach**: Check that your planned test doesn't match any documented anti-pattern
-
- **PROHIBITED:**
- - ❌ Writing tests without checking retrospectives
- - ❌ Ignoring documented anti-patterns
- - ❌ Repeating documented anti-patterns (like Issue #723)
-
- **ENFORCEMENT:**
- - If you create tests that match a documented anti-pattern → Immediate rejection
- - Must show evidence of retrospective review before test creation
-
- ### 6. Execute Tests
- - **ONLY after functional validation AND build pass**, run the specific test files you have identified
- - Use the `npm test -- <test-file-name.ts>` command to run individual tests
- - If multiple modules are affected, run all relevant test files
-
- ### 6.5. **COMPLETE TEST OWNERSHIP - MANDATORY**
- **🚨 CRITICAL: When you write tests, you own ALL failures until ALL tests pass**
-
- **RULE**: If you wrote the test, you fix ALL its failures before claiming completion. No exceptions.
-
- **PROHIBITED**:
- - ❌ "Fixed X issue, but Y failures are not my problem"
- - ❌ "Y failures are functional issues, not [the issue I was asked to fix]"
- - ❌ Categorizing failures as "my problem" vs "not my problem"
- - ❌ Claiming partial completion while tests are failing
-
- **REQUIRED**:
- - ✅ Run ALL tests you wrote before claiming completion
- - ✅ Verify ALL tests pass: `grep "not ok" test.log` returns empty
- - ✅ Fix ALL failures, regardless of failure type
- - ✅ Report complete status: "Fixed X, but Y still failing - investigating now" (not "Fixed X" while Y fails)
- - ✅ Only claim completion when ALL tests pass
-
- **Enforcement**:
- - Before claiming test work complete, verify: `grep -E "^(ok|not ok)" test.log | grep "not ok"` returns empty
- - If any test fails, it's your problem to fix - no exceptions
- - User should never have to remind you: "fix your own tests"
-
- **Reference**: See `retrospectives/retrospective-test-ownership-and-completion-responsibility.md` for detailed analysis of this failure pattern.
-
- ### 7. Verify and Fix
- Careful…22488 tokens truncated…es Checklist
- - [ ] Verify database connection is working
- - [ ] Check database exists and is accessible
- - [ ] Verify collection/table exists
- - [ ] Test simple query to confirm connectivity
- - [ ] After write operations, query to verify data persisted
- - [ ] Check data matches expected schema
- - [ ] Verify indexes and constraints are correct
-
- ### API Issues Checklist
- - [ ] Identify exact endpoint URL and method
- - [ ] Test with curl before writing code/tests
- - [ ] Verify endpoint returns expected status code
- - [ ] Check response body structure
- - [ ] Test with required headers
- - [ ] Verify authentication/authorization works
- - [ ] Test error cases (invalid data, missing params)
-
- ### UI Issues Checklist
- - [ ] Open browser with Playwright (headless: false)
- - [ ] Navigate to the page manually
- - [ ] Take screenshots at each step
- - [ ] Inspect elements to identify selectors
- - [ ] Try interactions manually (click, type, etc.)
- - [ ] Verify expected behavior occurs
- - [ ] Check for JavaScript errors in console
- - [ ] Only then write automated tests
-
- ### Test Writing Checklist
- - [ ] Mock dependencies, NOT the code being tested
- - [ ] Test actual implementation, not mocks
- - [ ] Validate service interactions occurred
- - [ ] Verify correct parameters were passed
- - [ ] Check actual return values
- - [ ] Confirm expected side effects
- - [ ] Test error cases and edge cases
-
- ## ENFORCEMENT
-
- These rules are **MANDATORY** and enforced through the following implemented mechanisms:
-
- ### 1. ESLint Configuration
- **Status**: ✅ **IMPLEMENTED** in `.eslintrc.js`
-
- The following rules enforce type safety:
- ```javascript
- '@typescript-eslint/no-explicit-any': 'error',
- '@typescript-eslint/consistent-type-assertions': ['error', {
-   assertionStyle: 'as',
-   objectLiteralTypeAssertions: 'never'
- }]
- ```
-
- ### 2. Quality Check Script
- **Status**: ✅ **IMPLEMENTED** in `.ai-agents/scripts/code-quality-check.sh`
-
- Automated script that checks for violations at three points:
- - **pre-commit mode**: Checks staged files before commit
- - **pre-pr mode**: Comprehensive checks before creating PR
- - **ci mode**: Runs in GitHub Actions on every PR
-
- Checks performed:
- - ❌ BLOCKS: `as any` type bypassing
- - ❌ BLOCKS: TypeScript compilation errors
- - ⚠️ WARNS: Linter issues
- - ⚠️ WARNS: Uncommitted changes (pre-pr/ci)
- - ⚠️ WARNS: Unpushed commits (pre-pr)
- - ⚠️ WARNS: Unaddressed PR comments (pre-pr)
-
- ### 3. GitHub Actions Workflow
- **Status**: ✅ **IMPLEMENTED** in `.github/workflows/code-quality-gate.yml`
-
- Automatically runs on every pull request:
- - Executes quality check script in CI mode
- - Posts results as PR comment
- - Blocks merge if critical checks fail
-
- ### PR Review Checklist
-
- When reviewing PRs, verify:
- - [ ] No `as any` type bypassing without documented justification
- - [ ] All commands use timeout scripts (check for bare `npm`, `curl`, etc.)
- - [ ] All previous PR comments have been addressed
- - [ ] Linter shows no errors
- - [ ] Database changes verified with direct queries (evidence in PR)
- - [ ] API changes tested with curl (evidence in PR)
- - [ ] UI changes explored with browser (screenshots in PR)
- - [ ] Tests validate behavior, not just absence of exceptions
- - [ ] Changes made thoughtfully with clear rationale
-
- ## QUALITY GATES
-
- The rules above are enforced through automated quality gates using `.ai-agents/scripts/code-quality-check.sh`.
-
- ### When Quality Gates Run
-
- 1. **Pre-commit** (optional): Checks staged files before committing
- 2. **Pre-PR** (REQUIRED): Validates work before marking ready for review - see `.ai-agents/workflows/implement.md` Step 6
- 3. **CI** (automatic): Runs on every pull request via GitHub Actions
-
- ### What Gets Checked
-
- | Check | Pre-Commit | Pre-PR | CI |
- |-------|------------|--------|-----|
- | `as any` type bypassing | ❌ BLOCKS | ❌ BLOCKS | ❌ BLOCKS |
- | TypeScript compilation | ❌ BLOCKS | ❌ BLOCKS | ❌ BLOCKS |
- | Linter issues | ⚠️ WARNS | ⚠️ WARNS | ⚠️ WARNS |
- | Uncommitted changes | N/A | ⚠️ WARNS | ⚠️ WARNS |
- | Unpushed commits | N/A | ⚠️ WARNS | N/A |
- | PR comments addressed | N/A | ⚠️ WARNS | N/A |
-
- ### For Agents
-
- **REQUIRED**: Before marking work ready for review, follow Step 6 in `.ai-agents/workflows/implement.md`:
- - Run `bash .ai-agents/scripts/code-quality-check.sh pre-pr`
- - Fix any critical failures (❌)
- - Document any warnings (⚠️) in evidence
- - Include quality gate output in implementation evidence
-
- ### Rule 10: OAuth Multi-Provider Debugging
-
- **CRITICAL**: When debugging OAuth flows with multiple providers (Google, Microsoft):
-
- **Logging Requirements:**
- - **MUST** log which provider is being used: `console.log('🔍 Using provider:', provider)`
- - **MUST** log redirect_uri in authorization: `console.log('🔍 Authorization redirect_uri:', redirectUri)`
- - **MUST** log redirect_uri in token exchange: `console.log('🔍 Token exchange redirect_uri:', redirectUri)`
- - **MUST** log provider extraction from state: `console.log('🔍 Provider from state:', provider || 'MISSING')`
-
- **Provider Extraction:**
- - **MUST** extract `provider` from OAuth state in **ALL** parsing paths (JSON, base64)
- - **MUST** fail explicitly if provider missing from state (don't default silently)
- - **MUST** verify provider matches between authorization and token exchange
-
- **Redirect URI Consistency:**
- - **MUST** use helper function for redirect_uri construction (don't construct inline)
- - **MUST** use same helper in both authorization and token exchange endpoints
- - **MUST** verify redirect_uri matches exactly (Microsoft is strict about this)
-
- **Testing:**
- - **MUST** test with both providers (Google and Microsoft)
- - **MUST** test with stricter provider (Microsoft) first
- - **MUST** check logs to confirm correct provider is being used
-
- **Common Pitfall:**
- - Error: "invalid_grant: Malformed auth code" when provider isn't extracted from state
- - Root cause: Microsoft login using Google provider because provider defaulted
- - Fix: Extract `provider = stateData.provider` in ALL state parsing paths
-
- **Reference**: See `retrospectives/oauth-microsoft-login-fix-retrospective.md` for detailed analysis.
-
486
- ## SUMMARY
487
-
488
- **Code Quality**: Use proper types, run linters, use timeouts, respond to feedback
489
-
490
- **Database Debugging**: Always verify connection and query directly
491
-
492
- **API Debugging**: Test with curl before writing code
493
-
494
- **UI Debugging**: Open browser and explore before automating
495
-
496
- **Test Quality**: Mock dependencies, validate behavior, not just exceptions
497
-
498
- **Thoughtful Action**: Think, analyze, plan, then act - don't react impulsively
499
-
500
- **OAuth Multi-Provider**: Always extract provider from OAuth state, use helper functions for redirect_uri, test with Microsoft first
501
-
502
- **Enforcement**: Automated via quality gates (pre-commit, pre-PR, CI) using `.ai-agents/scripts/code-quality-check.sh`
1
+ # Rule: agent-testing-guidelines
2
+
3
+ **Path:** `rules/agent-testing-guidelines.md`
4
+
5
+ ---
6
+
7
+ # AI Agent Testing & Validation Guidelines
8
+
9
+ ## INTENT
10
+ To ensure all work is thoroughly validated, and that agents provide **real, reproducible, end‑to‑end evidence** that fixes work — never “looks good” claims.
11
+
12
+ ## PRINCIPLES
13
+ - **Reproduce → Fix → Prove**: Show failing evidence first, then passing evidence after the fix.
14
+ - **Regression Test Verification**: ALWAYS verify regression tests fail with bug and pass with fix. Never create tests that pass/fail in both scenarios.
15
+ - **Test what matters**: Follow the test plan. Test the functionality that you have changed. Do not mock core functionality being tested.
16
+ - **Keep it simple**: Tests must be minimal but complete, covering all relevant scenarios. Use boilerplate tests and mocks to reduce duplication. Prefer simple unit tests over complex integration tests.
17
+ - **Be complete**: Placeholder tests (`// TODO`) and empty/disabled test bodies are forbidden. Do not assume failing test cases are unimportant. Always watch server logs for errors.
18
+ - **Be efficient**: Tests should create the objects they need, test core functionality, then delete any objects they created and close server/database connections.
19
+ - **Be resilient**: When a tool fails, investigate alternative approaches before giving up. Check for existing working examples in the project before claiming a tool is broken.
20
+ - **Be truthful**: NEVER CLAIM SUCCESS WITHOUT RUNNING TESTS. Always run tests and show results before claiming they pass.
21
+ - **Backup with Evidence, Not Assertions**: Include evidence of passing tests in the PR before submitting for review.
22
+ - **Complete Test Ownership**: When you write tests, you own ALL failures until ALL tests pass. No exceptions, no categorizations, no "not my problem" rationalizations. If you wrote the test, you fix ALL its failures before claiming completion.
23
+ - **Avoid Duplicates; Find Gaps**: Do not add redundant tests that restate existing coverage. Before adding/removing tests, identify what behavior is currently covered and what is missing, then add/merge tests to close gaps (not inflate count).
24
+
25
+ ## UI E2E (Playwright) — MANDATORY RULES
26
+
27
+ These rules exist because UI failures often show up as “timeouts” unless you instrument and assert correctly.
28
+
29
+ - **Headless by default**: `chromium.launch({ headless: true })` unless explicitly debugging.
30
+ - **Prefer DOM truth over network truth**: Assert user-visible state changes (selectors/text) instead of relying on `page.waitForResponse()` predicates.
31
+ - **Instrument failures**: Every Playwright E2E must attach:
32
+ - `page.on('requestfailed', …)` to print method/url/errorText
33
+ - `page.on('response', …)` to log HTTP >= 400
34
+ - **Server bootstrap is explicit**: If a test depends on the dev server, it must either:
35
+ - auto-start it (preferred), or
36
+ - fail fast with an exact command and URL (`http://localhost:PORT/...`)
37
+ - **Stable selectors only**: Use `data-role`/`data-ashley-component` selectors. Do not use nth-child or layout-dependent selectors.
38
+ - **Optional UI stays optional**: Never re-enable product UI just to satisfy a test. Gate the test behavior or verify via API/DB instead.
39
+ - **Manual UI verification uses Playwright MCP**: When asked to “pop the browser” or confirm a UI bugfix, use Playwright MCP tools (navigate/click/snapshot) and capture the resulting DOM snapshot as evidence.
40
+
41
+ ## FAILING TEST TRIAGE (DO NOT DELETE FIRST)
42
+
43
+ **Rule**: A failing test is a signal until proven otherwise. You may only delete/skip a test after completing triage and ensuring coverage remains.
44
+
45
+ ### Required triage steps (in order)
46
+ 1. **Reproduce twice**: Re-run the failing test at least 2x (same commit) and capture `test.log`/output.
47
+ 2. **Classify the failure**:
48
+ - **Product bug** (test reveals a real functional gap): fix product code; keep the test.
49
+ - **Test bug** (wrong selector/assumption/incorrect fixture): fix the test; keep the test.
50
+ - **Flake** (timing/overlays/network): stabilize the test OR rewrite it to a more deterministic layer (unit/integration). Do not delete until replaced.
51
+ 3. **Prove redundancy before deletion**:
52
+ - Point to an existing test that asserts the *same* behavior (or strengthen an existing test to do so).
53
+ - If you delete a test, you MUST add/merge equivalent coverage elsewhere in the same PR.
54
+
55
+ ### Allowed outcomes
56
+ - **Fix product** (preferred): keep test; it becomes the regression guardrail.
57
+ - **Fix test**: keep test; align with real UX/DOM and intended behavior.
58
+ - **Replace test**: move the assertion to a more reliable layer (e.g., unit test for UI validation logic, API test for backend behavior). Delete only after replacement is merged and passing.
59
+
60
+ ### Prohibited outcomes
61
+ - ❌ Deleting a failing test to “unblock” without proving redundancy
62
+ - ❌ Deleting a failing test when it is the only check for a behavior
63
+ - ❌ Creating “dupe tests” instead of strengthening the existing one
64
+ - ❌ Converting a failing test into a no-op (skipping, loosening assertions) without replacement
65
+
66
+ ## 🚨 MANDATORY ENFORCEMENT PROTOCOLS
67
+
68
+ ### **ANTI-LYING ENFORCEMENT**
69
+ **RULE**: Before claiming ANY test result, you MUST:
70
+ 1. Run the exact test command
71
+ 2. Show the complete output (not summary)
72
+ 3. Verify exit code is 0
73
+ 4. State what was actually tested
74
+
75
+ **VIOLATION CONSEQUENCES**:
76
+ - Immediate task termination
77
+ - Mandatory apology to user
78
+ - Must re-run tests with evidence before continuing
79
+ - Cannot claim "tests pass" without showing actual output
80
+
81
+ ### **TEST SCOPE PRECISION ENFORCEMENT**
82
+ **RULE**: When testing specific functionality:
83
+ - Use targeted commands: `npm run test-failing`, `npm run test-smoke`, etc.
84
+ - NEVER run entire test suites unless explicitly requested
85
+ - Always specify exact test file/function being tested
86
+
87
+ **VIOLATION CONSEQUENCES**:
88
+ - Must re-run with correct scope
89
+ - Cannot proceed until targeted test is run
90
+ - Must show evidence of correct test scope
91
+
92
+ ### **ANTI-PATTERN DETECTION ENFORCEMENT**
93
+ **RULE**: Before creating any test, you MUST answer these questions:
94
+
95
+ 1. **Am I testing runtime behavior or code structure?**
96
+ - Runtime behavior = Executing code, checking outcomes (API calls, service methods, state changes)
97
+ - Code structure = Reading files, checking if strings exist (`fs.readFileSync()` to verify code)
98
+ - ❌ Code structure checks are NOT tests - they're static analysis
99
+
100
+ 2. **Am I testing the real behavior or a mock?**
101
+ - Real behavior = Actual API calls, actual service methods
102
+ - Mock = Only if mocking dependencies, NOT the thing being tested
103
+ - ❌ Mocking what you're testing is an anti-pattern
104
+
105
+ 3. **Does this test actually validate the fix?**
106
+ - Would test fail if functionality was broken?
107
+ - Does test check observable outcomes (database state, API responses, logs)?
108
+ - ❌ Tests that pass regardless of behavior are invalid
109
+
110
+ 4. **Am I repeating a documented anti-pattern?**
111
+ - Checked retrospectives for similar mistakes?
112
+ - Verified test approach doesn't match anti-patterns?
113
+ - ❌ Repeating documented mistakes (like Issue #723) is prohibited
114
+
115
+ **MANDATORY VALIDATION**:
116
+ Before writing test code, you MUST:
117
+ 1. Answer all 4 questions above
118
+ 2. Show answers in your response
119
+ 3. Verify answers don't indicate anti-patterns
120
+ 4. If ANY answer indicates anti-pattern → STOP, cannot proceed
121
+
122
+ **VIOLATION CONSEQUENCES**:
123
+ - If mocking what you're testing → STOP immediately
124
+ - If testing code structure instead of runtime behavior → STOP immediately
125
+ - If repeating documented anti-pattern → STOP immediately
126
+ - Must redesign test to validate actual behavior
127
+ - Cannot proceed until test validates functionality
128
+
129
+ ### **FAILURE CASCADE PREVENTION ENFORCEMENT**
130
+ **RULE**: When one approach fails:
131
+ 1. STOP - don't try another approach immediately
132
+ 2. ANALYZE - why did it fail?
133
+ 3. ADMIT - what went wrong?
134
+ 4. ASK - for guidance if stuck
135
+ 5. FIX - only after understanding the problem
136
+
137
+ **VIOLATION CONSEQUENCES**:
138
+ - Must pause and analyze failure
139
+ - Cannot try new approach without understanding first failure
140
+ - Must ask user for guidance if stuck
141
+ - Cannot compound errors with more attempts
142
+
143
+
144
+ ## CORE TESTING PROCEDURE
145
+
146
+ ### 1. Analyze Before Implementing
147
+ **CRITICAL**: Always analyze the codebase thoroughly before making changes:
148
+ - Use `grep_search` to find all dependencies and usage patterns
149
+ - Use `find_by_name` to locate related files and patterns
150
+ - Use `Read` to understand existing implementations
151
+ - Document findings with real code examples and line numbers
152
+ - **NEVER** make changes based on assumptions
153
+
154
+ ### 2. Identify Code Changes
155
+ Before committing, determine the full scope of files that have been modified, created, or deleted.
156
+
157
+ ### 3. **MANDATORY BUILD VERIFICATION - DO THIS BEFORE TESTS**
158
+
159
+ **🚨 CRITICAL: Run build BEFORE running any tests**
160
+
161
+ ```bash
162
+ # ALWAYS run this first, before any test execution:
163
+ npm run build
164
+
165
+ # Verify exit code is 0:
166
+ echo $? # Must output: 0
167
+ ```
168
+
169
+ **Why this is mandatory**:
170
+ - Catches TypeScript compilation errors immediately
171
+ - Prevents wasting time running tests that will fail anyway
172
+ - Catches function signature mismatches
173
+ - Verifies all imports resolve correctly
174
+ - **If build fails, tests are meaningless**
175
+
176
+ **PROHIBITED**:
177
+ - ❌ Running tests before build
178
+ - ❌ Assuming build works without running it
179
+ - ❌ Claiming "tests passing" when build fails
180
+ - ❌ Ignoring build errors as "probably unrelated"
181
+
182
+ **REQUIRED**:
183
+ - ✅ Run `npm run build` first, every time
184
+ - ✅ Fix ALL build errors before running tests
185
+ - ✅ Verify exit code is 0
186
+ - ✅ Never proceed to tests if build fails
187
+
188
+ **Special Case - BAML Files**:
189
+ If you edited ANY `.baml` file, see `get_fraim_file({ path: "rules/baml-workflow.md" })` for the mandatory 3-step workflow that must be completed BEFORE running build.
190
+
191
+ ### 3.5. **MANDATORY PRE-TEST CREATION CHECKLIST - BLOCKS TEST WRITING**
192
+
193
+ **🚨 CRITICAL: Complete this checklist BEFORE writing ANY test code**
194
+
195
+ **You CANNOT create `test-*.ts` files until ALL items are checked:**
196
+
197
+ - [ ] **FUNCTIONAL VALIDATION COMPLETE**: Completed Section 4 below with curl/API tests and have evidence
198
+ - [ ] **STRUCTURAL VALIDATION**: Confirmed test file imports `BaseTestCase` and uses `runTests(...)` (verify with `grep`)
199
+ - [ ] **RETROSPECTIVE REVIEW**: Searched `retrospectives/` for testing mistakes (grep -r "anti-pattern\|testing.*fail" retrospectives/)
200
+ - [ ] **ANTI-PATTERN REVIEW**: Read lines 609-780 of this file (anti-pattern section)
201
+ - [ ] **TEST TYPE VERIFIED**: Confirmed I'm testing runtime behavior (executing code), not code structure (reading files)
202
+
203
+ **VIOLATION CONSEQUENCES**:
204
+ - If ANY item unchecked → STOP immediately, cannot create test files
205
+ - Must show evidence of functional validation (curl outputs, logs) before proceeding
206
+ - User will reject tests that skip this checklist
207
+
208
+ **VERIFICATION REQUIRED**:
209
+ Before creating any `test-*.ts` file, you MUST display this checklist with all items checked and show functional validation evidence.
210
+
211
+ ### 4. **MANDATORY FUNCTIONAL VALIDATION - DO THIS BEFORE WRITING TESTS**
212
+
213
+ **🚨 CRITICAL: Validate the feature actually works before writing tests**
214
+
215
+ **For ANY feature with API endpoints (especially CRUD operations):**
216
+
217
+ ```bash
218
+ # Get the dynamic port (based on issue number in branch name)
219
+ PORT=$(node -e "const {getPort} = require('./src/utils/git-utils'); console.log(getPort());")
220
+
221
+ # 1. Test CREATE operation
222
+ curl -X POST http://localhost:$PORT/endpoint \
223
+ -H "x-executive-id: exec-ID" \
224
+ -d '{"field":"value"}' | grep success
225
+
226
+ # 2. Verify state in database/downstream systems
227
+ curl http://localhost:$PORT/endpoint | grep "expected-value"
228
+ grep "Created" server.log | tail -5
229
+
230
+ # 3. Test UPDATE operation and TIME IT
231
+ time curl -X POST http://localhost:$PORT/endpoint \
232
+ -H "x-executive-id: exec-ID" \
233
+ -d '{"field":"updated-value"}'
234
+ # MUST complete < 5 seconds
235
+
236
+ # 4. Verify old state REPLACED (not duplicated)
237
+ curl http://localhost:$PORT/endpoint | grep "old-value"
238
+ # Should be empty (old value gone)
239
+
240
+ # 5. Check server.log for expected operations
241
+ grep "Updated\|Deleted.*old\|Created.*new" server.log | tail -10
242
+
243
+ # 6. Test DELETE operation
244
+ curl -X DELETE http://localhost:$PORT/endpoint/ID
245
+ curl http://localhost:$PORT/endpoint | grep "ID"
246
+ # Should be empty (deleted)
247
+
248
+ # 7. Verify downstream cleanup (calendar events, cache, etc.)
249
+ # For calendar features: verify events deleted from calendar
250
+ # For cache features: verify cache invalidated
251
+ ```
252
+
253
+ **Why this is mandatory:**
254
+ - Tests can pass while feature is broken
255
+ - Must validate ACTUAL behavior, not just test behavior
256
+ - Must verify state changes in ALL systems (DB, calendar, cache)
257
+ - Must catch performance issues (slow operations)
258
+ - Must verify cleanup (no orphaned data)
259
+
260
+ **PROHIBITED:**
261
+ - ❌ Writing tests without validating feature works first
262
+ - ❌ Assuming UPDATE works because CREATE works
263
+ - ❌ Assuming DELETE cleans up downstream systems
264
+ - ❌ Not timing operations to catch performance bugs
265
+ - ❌ Only checking database, ignoring downstream systems
266
+
267
+ **REQUIRED**:
268
+ - ✅ Test ALL CRUD operations with curl before writing tests
269
+ - ✅ Time operations to verify performance (< 5s)
270
+ - ✅ Verify state in ALL systems (DB + calendar + logs)
271
+ - ✅ Test with existing data (update scenarios)
272
+ - ✅ Verify cleanup (delete operations)
273
+ - ✅ **SHOW EVIDENCE**: Display curl outputs, server logs, database state BEFORE creating test files
274
+ - ✅ **BLOCK TEST CREATION**: Cannot write `test-*.ts` files until functional validation complete and evidence shown
275
+
276
+ **Issue #393 Example:**
277
+ What went wrong: Implemented `updateWorkHours()` but didn't test it with curl.
278
+ Result: Updated database but didn't create calendar events.
279
+ Prevention: `curl -X POST /work-hours` twice, check server.log for calendar creation both times.
280
+
281
+ ### 5. Locate Relevant Tests
282
+ - For each modified source file (e.g., in `src/`), search the codebase for corresponding test files
283
+ - Test files follow the naming convention `test-*.ts` or `*.test.ts`
284
+ - A good method is to search for test files that `import` or `require` the modified source file
285
+
286
+ ### 5.5. **MANDATORY RETROSPECTIVE REVIEW BEFORE TEST CREATION**
287
+
288
+ **🚨 CRITICAL: Review retrospectives for testing mistakes before writing tests**
289
+
290
+ **REQUIRED ACTION** (must complete before writing ANY test code):
291
+ 1. **Search retrospectives**: `grep -r "anti-pattern\|testing.*fail\|test.*wrong" retrospectives/`
292
+ 2. **Read mandatory retrospective**: `retrospectives/integration-testing-anti-pattern.md` (MANDATORY - documents this exact mistake)
293
+ 3. **Verify test approach**: Check that your planned test doesn't match any documented anti-pattern
294
+
295
+ **PROHIBITED:**
296
+ - ❌ Writing tests without checking retrospectives
297
+ - ❌ Ignoring documented anti-patterns
298
+ - ❌ Repeating documented anti-patterns (like Issue #723)
299
+
300
+ **ENFORCEMENT:**
301
+ - If you create tests that match a documented anti-pattern → Immediate rejection
302
+ - Must show evidence of retrospective review before test creation
303
+
304
+ ### 6. Execute Tests
305
+ - **ONLY after functional validation AND build pass**, run the specific test files you have identified
306
+ - Use the `npm test -- <test-file-name.ts>` command to run individual tests
307
+ - If multiple modules are affected, run all relevant test files
308
+
309
+ ### 6.5. **COMPLETE TEST OWNERSHIP - MANDATORY**
310
+ **🚨 CRITICAL: When you write tests, you own ALL failures until ALL tests pass**
311
+
312
+ **RULE**: If you wrote the test, you fix ALL its failures before claiming completion. No exceptions.
313
+
314
+ **PROHIBITED**:
315
+ - ❌ "Fixed X issue, but Y failures are not my problem"
316
+ - ❌ "Y failures are functional issues, not [the issue I was asked to fix]"
317
+ - ❌ Categorizing failures as "my problem" vs "not my problem"
318
+ - ❌ Claiming partial completion while tests are failing
319
+
320
+ **REQUIRED**:
321
+ - ✅ Run ALL tests you wrote before claiming completion
322
+ - ✅ Verify ALL tests pass: `grep "not ok" test.log` returns empty
323
+ - ✅ Fix ALL failures, regardless of failure type
324
+ - ✅ Report complete status: "Fixed X, but Y still failing - investigating now" (not "Fixed X" while Y fails)
325
+ - ✅ Only claim completion when ALL tests pass
326
+
327
+ **Enforcement**:
328
+ - Before claiming test work complete, verify: `grep -E "^(ok|not ok)" test.log | grep "not ok"` returns empty
329
+ - If any test fails, it's your problem to fix - no exceptions
330
+ - User should never have to remind you: "fix your own tests"
331
+
332
+ **Reference**: See `retrospectives/retrospective-test-ownership-and-completion-responsibility.md` for detailed analysis of this failure pattern.
333
+
334
+ ### 7. Verify and Fix
335
+ - Careful…22488 tokens truncated…es Checklist
336
+ - [ ] Verify database connection is working
337
+ - [ ] Check database exists and is accessible
338
+ - [ ] Verify collection/table exists
339
+ - [ ] Test simple query to confirm connectivity
340
+ - [ ] After write operations, query to verify data persisted
341
+ - [ ] Check data matches expected schema
342
+ - [ ] Verify indexes and constraints are correct
343
+
344
+ ### API Issues Checklist
345
+ - [ ] Identify exact endpoint URL and method
346
+ - [ ] Test with curl before writing code/tests
347
+ - [ ] Verify endpoint returns expected status code
348
+ - [ ] Check response body structure
349
+ - [ ] Test with required headers
350
+ - [ ] Verify authentication/authorization works
351
+ - [ ] Test error cases (invalid data, missing params)
352
+
353
+ ### UI Issues Checklist
354
+ - [ ] Open browser with Playwright (headless: false)
355
+ - [ ] Navigate to the page manually
356
+ - [ ] Take screenshots at each step
357
+ - [ ] Inspect elements to identify selectors
358
+ - [ ] Try interactions manually (click, type, etc)
359
+ - [ ] Verify expected behavior occurs
360
+ - [ ] Check for JavaScript errors in console
361
+ - [ ] Only then write automated tests
362
+
363
+ ### Test Writing Checklist
364
+ - [ ] Mock dependencies, NOT the code being tested
365
+ - [ ] Test actual implementation, not mocks
366
+ - [ ] Validate service interactions occurred
367
+ - [ ] Verify correct parameters were passed
368
+ - [ ] Check actual return values
369
+ - [ ] Confirm expected side effects
370
+ - [ ] Test error cases and edge cases
371
+
372
+ ## ENFORCEMENT
373
+
374
+ These rules are **MANDATORY** and enforced through the following implemented mechanisms:
375
+
376
+ ### 1. ESLint Configuration
377
+ **Status**: ✅ **IMPLEMENTED** in `.eslintrc.js`
378
+
379
+ The following rules enforce type safety:
380
+ ```javascript
381
+ '@typescript-eslint/no-explicit-any': 'error',
382
+ '@typescript-eslint/consistent-type-assertions': ['error', {
383
+ assertionStyle: 'as',
384
+ objectLiteralTypeAssertions: 'never'
385
+ }]
386
+ ```
387
+
388
+ ### 2. Quality Check Script
389
+ **Status**: ✅ **IMPLEMENTED** in `get_fraim_file({ path: "scripts/code-quality-check.sh" })`
390
+
391
+ Automated script that checks for violations at three points:
392
+ - **pre-commit mode**: Checks staged files before commit
393
+ - **pre-pr mode**: Comprehensive checks before creating PR
394
+ - **ci mode**: Runs in GitHub Actions on every PR
395
+
396
+ Checks performed:
397
+ - ❌ BLOCKS: `as any` type bypassing
398
+ - ❌ BLOCKS: TypeScript compilation errors
399
+ - ⚠️ WARNS: Linter issues
400
+ - ⚠️ WARNS: Uncommitted changes (pre-pr/ci)
401
+ - ⚠️ WARNS: Unpushed commits (pre-pr)
402
+ - ⚠️ WARNS: Unaddressed PR comments (pre-pr)
403
+
404
+ ### 3. GitHub Actions Workflow
405
+ **Status**: ✅ **IMPLEMENTED** in `.github/workflows/code-quality-gate.yml`
406
+
407
+ Automatically runs on every pull request:
408
+ - Executes quality check script in CI mode
409
+ - Posts results as PR comment
410
+ - Blocks merge if critical checks fail
411
+
412
+ ### PR Review Checklist
413
+
414
+ When reviewing PRs, verify:
415
+ - [ ] No `as any` type bypassing without documented justification
416
+ - [ ] All commands use timeout scripts (check for bare `npm`, `curl`, etc.)
417
+ - [ ] All previous PR comments have been addressed
418
+ - [ ] Linter shows no errors
419
+ - [ ] Database changes verified with direct queries (evidence in PR)
420
+ - [ ] API changes tested with curl (evidence in PR)
421
+ - [ ] UI changes explored with browser (screenshots in PR)
422
+ - [ ] Tests validate behavior, not just absence of exceptions
423
+ - [ ] Changes made thoughtfully with clear rationale
424
+
425
+ ## QUALITY GATES
426
+
427
+ The rules above are enforced through automated quality gates using `get_fraim_file({ path: "scripts/code-quality-check.sh" })`.
428
+
429
+ ### When Quality Gates Run
430
+
431
+ 1. **Pre-commit** (optional): Checks staged files before committing
432
+ 2. **Pre-PR** (REQUIRED): Validates work before marking ready for review - see `get_fraim_file({ path: "workflows/product-building/implement.md" })` Step 6
433
+ 3. **CI** (automatic): Runs on every pull request via GitHub Actions
434
+
435
+ ### What Gets Checked
436
+
437
+ | Check | Pre-Commit | Pre-PR | CI |
438
+ |-------|------------|--------|-----|
439
+ | `as any` type bypassing | ❌ BLOCKS | ❌ BLOCKS | ❌ BLOCKS |
440
+ | TypeScript compilation | ❌ BLOCKS | ❌ BLOCKS | ❌ BLOCKS |
441
+ | Linter issues | ⚠️ WARNS | ⚠️ WARNS | ⚠️ WARNS |
442
+ | Uncommitted changes | N/A | ⚠️ WARNS | ⚠️ WARNS |
443
+ | Unpushed commits | N/A | ⚠️ WARNS | N/A |
444
+ | PR comments addressed | N/A | ⚠️ WARNS | N/A |
445
+
446
+ ### For Agents
447
+
448
+ **REQUIRED**: Before marking work ready for review, follow "Quality Gate Validation" in `get_fraim_file({ path: "workflows/product-building/implement.md" })`:
449
+ - Run `get_fraim_file({ path: "scripts/code-quality-check.sh" })` with argument `pre-pr`
450
+ - Fix any critical failures (❌)
451
+ - Document any warnings (⚠️) in evidence
452
+ - Include quality gate output in implementation evidence
453
+
454
+ ### Rule 10: OAuth Multi-Provider Debugging
455
+
456
+ **CRITICAL**: When debugging OAuth flows with multiple providers (Google, Microsoft):
457
+
458
+ **Logging Requirements:**
459
+ - **MUST** log which provider is being used: `console.log('🔍 Using provider:', provider)`
460
+ - **MUST** log redirect_uri in authorization: `console.log('🔍 Authorization redirect_uri:', redirectUri)`
461
+ - **MUST** log redirect_uri in token exchange: `console.log('🔍 Token exchange redirect_uri:', redirectUri)`
462
+ - **MUST** log provider extraction from state: `console.log('🔍 Provider from state:', provider || 'MISSING')`
463
+
464
+ **Provider Extraction:**
465
+ - **MUST** extract `provider` from OAuth state in **ALL** parsing paths (JSON, base64)
466
+ - **MUST** fail explicitly if provider missing from state (don't default silently)
467
+ - **MUST** verify provider matches between authorization and token exchange
468
+
469
+ **Redirect URI Consistency:**
470
+ - **MUST** use helper function for redirect_uri construction (don't construct inline)
471
+ - **MUST** use same helper in both authorization and token exchange endpoints
472
+ - **MUST** verify redirect_uri matches exactly (Microsoft is strict about this)
473
+
474
+ **Testing:**
475
+ - **MUST** test with both providers (Google and Microsoft)
476
+ - **MUST** test with stricter provider (Microsoft) first
477
+ - **MUST** check logs to confirm correct provider is being used
478
+
479
+ **Common Pitfall:**
480
+ - Error: "invalid_grant: Malformed auth code" when provider isn't extracted from state
481
+ - Root cause: Microsoft login using Google provider because provider defaulted
482
+ - Fix: Extract `provider = stateData.provider` in ALL state parsing paths
483
+
484
+ **Reference**: See `retrospectives/oauth-microsoft-login-fix-retrospective.md` for detailed analysis.
485
+
486
+ ## SUMMARY
487
+
488
+ **Code Quality**: Use proper types, run linters, use timeouts, respond to feedback
489
+
490
+ **Database Debugging**: Always verify connection and query directly
491
+
492
+ **API Debugging**: Test with curl before writing code
493
+
494
+ **UI Debugging**: Open browser and explore before automating
495
+
496
+ **Test Quality**: Mock dependencies, validate behavior, not just exceptions
497
+
498
+ **Thoughtful Action**: Think, analyze, plan, then act - don't react impulsively
499
+
500
+ **OAuth Multi-Provider**: Always extract provider from OAuth state, use helper functions for redirect_uri, test with Microsoft first
501
+
502
+ **Enforcement**: Automated via quality gates (pre-commit, pre-PR, CI) using `get_fraim_file({ path: "scripts/code-quality-check.sh" })`