fraim-framework 1.0.11 → 2.0.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (93)
  1. package/.ai-agents/agent-guardrails.md +58 -0
  2. package/.ai-agents/mcp-template.jsonc +34 -0
  3. package/.ai-agents/rules/agent-testing-guidelines.md +545 -0
  4. package/.ai-agents/rules/architecture.md +52 -0
  5. package/.ai-agents/rules/communication.md +122 -0
  6. package/.ai-agents/rules/continuous-learning.md +55 -0
  7. package/.ai-agents/rules/git-safe-commands.md +34 -0
  8. package/.ai-agents/rules/integrity-and-test-ethics.md +223 -0
  9. package/.ai-agents/rules/local-development.md +252 -0
  10. package/.ai-agents/rules/merge-requirements.md +231 -0
  11. package/.ai-agents/rules/pr-workflow-completeness.md +191 -0
  12. package/.ai-agents/rules/simplicity.md +112 -0
  13. package/.ai-agents/rules/software-development-lifecycle.md +276 -0
  14. package/.ai-agents/rules/spike-first-development.md +199 -0
  15. package/.ai-agents/rules/successful-debugging-patterns.md +313 -0
  16. package/.ai-agents/scripts/cleanup-branch.ts +278 -0
  17. package/.ai-agents/scripts/exec-with-timeout.ts +122 -0
  18. package/.ai-agents/scripts/prep-issue.sh +162 -0
  19. package/.ai-agents/templates/evidence/Design-Evidence.md +30 -0
  20. package/.ai-agents/templates/evidence/Implementation-BugEvidence.md +48 -0
  21. package/.ai-agents/templates/evidence/Implementation-FeatureEvidence.md +54 -0
  22. package/.ai-agents/templates/evidence/Spec-Evidence.md +19 -0
  23. package/.ai-agents/templates/help/HelpNeeded.md +14 -0
  24. package/.ai-agents/templates/retrospective/RETROSPECTIVE-TEMPLATE.md +55 -0
  25. package/.ai-agents/templates/specs/BUGSPEC-TEMPLATE.md +37 -0
  26. package/.ai-agents/templates/specs/FEATURESPEC-TEMPLATE.md +29 -0
  27. package/.ai-agents/templates/specs/TECHSPEC-TEMPLATE.md +39 -0
  28. package/.ai-agents/workflows/design.md +121 -0
  29. package/.ai-agents/workflows/implement.md +170 -0
  30. package/.ai-agents/workflows/resolve.md +152 -0
  31. package/.ai-agents/workflows/retrospect.md +84 -0
  32. package/.ai-agents/workflows/spec.md +103 -0
  33. package/.ai-agents/workflows/test.md +90 -0
  34. package/.cursor/rules/cursor-rules.mdc +8 -0
  35. package/.cursor/rules/design.mdc +4 -0
  36. package/.cursor/rules/implement.mdc +6 -0
  37. package/.cursor/rules/resolve.mdc +5 -0
  38. package/.cursor/rules/retrospect.mdc +4 -0
  39. package/.cursor/rules/spec.mdc +4 -0
  40. package/.cursor/rules/test.mdc +5 -0
  41. package/.windsurf/rules/windsurf-rules.md +7 -0
  42. package/.windsurf/workflows/resolve-issue.md +6 -0
  43. package/.windsurf/workflows/retrospect.md +6 -0
  44. package/.windsurf/workflows/start-design.md +6 -0
  45. package/.windsurf/workflows/start-impl.md +6 -0
  46. package/.windsurf/workflows/start-spec.md +6 -0
  47. package/.windsurf/workflows/start-tests.md +6 -0
  48. package/CHANGELOG.md +66 -0
  49. package/CODEOWNERS +24 -0
  50. package/DISTRIBUTION.md +6 -6
  51. package/PUBLISH_INSTRUCTIONS.md +93 -0
  52. package/README.md +330 -104
  53. package/bin/fraim.js +49 -3
  54. package/index.js +30 -3
  55. package/install.sh +58 -58
  56. package/labels.json +52 -0
  57. package/linkedin-post.md +23 -0
  58. package/package.json +12 -7
  59. package/sample_package.json +18 -0
  60. package/setup.js +712 -389
  61. package/test-utils.ts +118 -0
  62. package/tsconfig.json +22 -0
  63. package/agents/claude/CLAUDE.md +0 -42
  64. package/agents/cursor/rules/architecture.mdc +0 -49
  65. package/agents/cursor/rules/continuous-learning.mdc +0 -48
  66. package/agents/cursor/rules/cursor-workflow.mdc +0 -29
  67. package/agents/cursor/rules/design.mdc +0 -25
  68. package/agents/cursor/rules/implement.mdc +0 -26
  69. package/agents/cursor/rules/local-development.mdc +0 -104
  70. package/agents/cursor/rules/prep.mdc +0 -15
  71. package/agents/cursor/rules/resolve.mdc +0 -46
  72. package/agents/cursor/rules/simplicity.mdc +0 -18
  73. package/agents/cursor/rules/software-development-lifecycle.mdc +0 -41
  74. package/agents/cursor/rules/test.mdc +0 -25
  75. package/agents/windsurf/rules/architecture.md +0 -49
  76. package/agents/windsurf/rules/continuous-learning.md +0 -47
  77. package/agents/windsurf/rules/local-development.md +0 -103
  78. package/agents/windsurf/rules/remote-development.md +0 -22
  79. package/agents/windsurf/rules/simplicity.md +0 -17
  80. package/agents/windsurf/rules/windsurf-workflow.md +0 -28
  81. package/agents/windsurf/workflows/prep.md +0 -20
  82. package/agents/windsurf/workflows/resolve-issue.md +0 -47
  83. package/agents/windsurf/workflows/start-design.md +0 -26
  84. package/agents/windsurf/workflows/start-impl.md +0 -27
  85. package/agents/windsurf/workflows/start-tests.md +0 -26
  86. package/github/phase-change.yml +0 -218
  87. package/github/status-change.yml +0 -68
  88. package/github/sync-on-pr-review.yml +0 -66
  89. package/scripts/__init__.py +0 -10
  90. package/scripts/cli.py +0 -141
  91. package/setup.py +0 -0
  92. package/test-config.json +0 -32
  93. package/workflows/setup-fraim.yml +0 -147
@@ -0,0 +1,58 @@
+ # AI Agent Guardrails
+
+ This file references the centralized rules located in `.ai-agents/rules/` to ensure consistency across all AI platforms.
+
+ ## Referenced Rules
+
+ ### 0. Integrity
+ **Source**: `.ai-agents/rules/integrity-and-test-ethics.md`
+
+ THIS IS THE MOST CRITICAL RULE. Be ethical, truthful, and honest above all.
+
+ ### 1. Simplicity
+ **Source**: `.ai-agents/rules/simplicity.md`
+
+ Keep solutions simple and focused; avoid over-engineering. Focus on the assigned issue only and don't make unrelated changes.
+
+ ### 2. Communication
+ **Source**: `.ai-agents/rules/communication.md`
+
+ Establish clear communication patterns and progress reporting standards for effective coordination between agents and stakeholders.
+
+ ### 3. Architecture
+ **Source**: `.ai-agents/rules/architecture.md`
+
+ Maintain clean architectural boundaries by using BAML (LLM) for natural-language understanding and TypeScript for deterministic work.
+
+ ### 4. Continuous Learning
+ **Source**: `.ai-agents/rules/continuous-learning.md`
+
+ Prevent repeating past mistakes by systematically learning from retrospectives, RFCs, and historical issue patterns.
+
+ ### 5. Agent Testing Guidelines
+ **Source**: `.ai-agents/rules/agent-testing-guidelines.md`
+
+ Comprehensive testing and validation requirements with concrete evidence. Ensures all work is thoroughly validated before completion.
+
+ ### 6. Local Development
+ **Source**: `.ai-agents/rules/local-development.md`
+
+ Local development guidelines and workspace safety. Enables safe parallel development through strict workspace separation.
+
+ ### 7. Software Development Lifecycle
+ **Source**: `.ai-agents/rules/software-development-lifecycle.md`
+
+ ### 8. PR Workflow Completeness
+ **Source**: `.ai-agents/rules/pr-workflow-completeness.md`
+
+ Ensure complete PR lifecycle handling with proper monitoring, feedback handling, and testing. Follow requirements for Git action monitoring, PR feedback polling, comment handling, test documentation, and bug fix workflow.
+
+ ### 9. Merge Requirements
+ **Source**: `.ai-agents/rules/merge-requirements.md`
+
+ Enforces a strict `git rebase` workflow to ensure feature branches are up to date with `master` before merging, maintaining a clean and stable history.
+
+ ### 10. Best Practices While Debugging
+ **Source**: `.ai-agents/rules/successful-debugging-patterns.md`
+
+ Patterns for debugging issues systematically and converting learnings into test cases.
@@ -0,0 +1,34 @@
+ // This is a template file to set up MCP servers for dev purposes. Follow these steps:
+ // 1. Replace <your github pat> with your GitHub personal access token
+ // 2. Replace <your context7 api key> with your Context7 API key
+ // 3. Remove the comments from the file
+ //
+ // Depending on the coding agent you are using, do the following:
+ // Windsurf: Copy this file to .codeium/windsurf/mcp.json (in the user's home directory)
+ // Cursor: Copy this file to .cursor/mcp.json
+ // Claude: Append the mcpServers object to .claude/settings.json
+
+ {
+   "mcpServers": {
+     "git": {
+       "command": "npx",
+       "args": ["-y", "@cyanheads/git-mcp-server"]
+     },
+     "github": {
+       "url": "https://api.githubcopilot.com/mcp/",
+       "headers": {
+         "Authorization": "Bearer <your github pat>"
+       }
+     },
+     "context7": {
+       "url": "https://mcp.context7.com/mcp",
+       "headers": {
+         "CONTEXT7_API_KEY": "<your context7 api key>"
+       }
+     },
+     "playwright": {
+       "command": "npx",
+       "args": ["-y", "@playwright/mcp"]
+     }
+   }
+ }
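As a hedged illustration of the copy step described in the template's comments (the comment-stripping one-liner is not part of the package, and a stand-in template is created inline so the snippet is self-contained):

```shell
# Hypothetical install for Cursor: strip the full-line // comments from the
# template, then write the result to .cursor/mcp.json in the project root.
cat > mcp-template.jsonc <<'EOF'
// comments must be removed before the file is valid JSON
{
  "mcpServers": {
    "git": { "command": "npx", "args": ["-y", "@cyanheads/git-mcp-server"] }
  }
}
EOF
mkdir -p .cursor
grep -v '^[[:space:]]*//' mcp-template.jsonc > .cursor/mcp.json
```

For Windsurf or Claude, substitute the destination paths listed in the comments above.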
@@ -0,0 +1,545 @@
+ # AI Agent Testing & Validation Guidelines
+
+ ## INTENT
+ To ensure all work is thoroughly validated, and that agents provide **real, reproducible, end‑to‑end evidence** that fixes work — never "looks good" claims.
+
+ ## MANDATORY PRINCIPLES
+ - **Reproduce → Fix → Prove**: Show failing evidence first, then passing evidence after the fix.
+ - **Test what matters**: Follow the test plan. Test the functionality that you have changed. Do not mock the core functionality being tested.
+ - **Keep it simple**: Tests must be minimal but complete, covering all relevant scenarios. Use boilerplate tests and mocks to reduce duplication.
+ - **Be complete**: No placeholder tests: `// TODO` markers and empty or disabled test bodies are forbidden. Do not assume failing test cases are unimportant. Always watch the server logs for errors.
+ - **Be efficient**: Tests should create the objects they need, test core functionality, then delete any objects they created and close server/database connections.
+ - **Be resilient**: When a tool fails, investigate alternative approaches before giving up. Check for existing working examples in the project before claiming a tool is broken.
+ - **Be truthful**: NEVER CLAIM SUCCESS WITHOUT RUNNING TESTS. Always run tests and show results before claiming they pass.
+ - **Back up claims with evidence, not assertions**: Include evidence of passing tests in the PR before submitting for review.
+
+
+ ## CORE TESTING PROCEDURE
+
+ ### 1. Analyze Before Implementing
+ **CRITICAL**: Always analyze the codebase thoroughly before making changes:
+ - Use `grep_search` to find all dependencies and usage patterns
+ - Use `find_by_name` to locate related files and patterns
+ - Use `Read` to understand existing implementations
+ - Document findings with real code examples and line numbers
+ - **NEVER** make changes based on assumptions
+
+ ### 2. Identify Code Changes
+ Before committing, determine the full scope of files that have been modified, created, or deleted.
+
+ ### 3. Locate Relevant Tests
+ - For each modified source file (e.g., in `src/`), search the codebase for corresponding test files
+ - Test files follow the naming convention `test-*.ts` or `*.test.ts`
+ - A good method is to search for test files that `import` or `require` the modified source file
+
+ ### 4. Execute Tests
+ - Run the specific test files you have identified
+ - Use the `npm test -- <test-file-name.ts>` command to run individual tests
+ - If multiple modules are affected, run all relevant test files
+
+ ### 5. Verify and Fix
+ - Carefully review the test output in `test.log` to understand and analyze test results. All tests must pass
+ - If any test fails, you MUST fix the underlying issue in the source code
+ - If you believe the failure is due to a test flaw, escalate to the user and get their permission to modify the test. Never modify existing tests without permission.
+ - Do not proceed with a commit until all related tests are passing
+ - Communicate with the user if you are stuck
+
+ ### 6. Test Structure Requirements
+ - **MUST** use `BaseTestCase` from `test-utils.ts` for all test cases
+ - **MUST** include a proper `tags` array (e.g., `['smoke']`). At least 1 smoke test must exist for every fix
+ - **MUST** follow the existing test structure patterns in the codebase
+ - **MUST** use the `runTests()` function for test execution
+ - **MUST** provide clear test descriptions and expected outcomes
+ - **MUST** be minimal but complete, covering all relevant scenarios
+ - **MUST** use boilerplate tests and mocks to reduce duplication. `test-utils.ts` is the location for common test code
+
+ ### 7. Evidence
+ - **MUST** include evidence before submitting for review
+ - **MUST** attach this evidence to the issue as a single comment with screenshots, logs, or any other relevant evidence to demonstrate the fix
+ - **MUST** state in the evidence which test cases were run, which ones passed, and reasons/analysis for any failures
+ - **MUST** have run all relevant test suites as identified in step #3
+ - **MUST** have run `npm run test-smoke test*.ts` to ensure smoke tests are passing
+
+ ## INTEGRATION TESTING ANTI-PATTERNS TO AVOID
+
+ ### ❌ The "No Exceptions" Anti-Pattern
+ **NEVER write tests that only check for the absence of exceptions without validating actual behavior:**
+
+ ```typescript
+ // BAD: Only checking for absence of exceptions
+ await actionOrchestrator.processHITLApproval(testRecord);
+ console.log('✅ Test passed - no exceptions thrown');
+ return true;
+ ```
+
+ **What's Wrong:**
+ - No validation of actual behavior or outcomes
+ - No verification of side effects (database updates, service calls)
+ - No verification of service interactions
+ - False confidence: tests could pass even if functionality is completely broken
+
+ ### ❌ Insufficient Mock Validation
+ **NEVER create mocks that don't capture and validate behavior:**
+
+ ```typescript
+ // BAD: Mock that doesn't track calls
+ const mockDataService = {
+   updateRecord: async () => ({ success: true })
+ };
+
+ // BAD: No validation of what was called
+ await actionOrchestrator.processHITLApproval(testRecord);
+ // No verification of service interactions!
+ ```
+
+ ## INTEGRATION TESTING BEST PRACTICES
+
+ ### ✅ Enhanced Mock Services with Call Tracking
+ **ALWAYS create mocks that track method calls and parameters for validation:**
+
+ ```typescript
+ // GOOD: Mock with call tracking
+ const createMockDataService = () => {
+   const calls: any[] = [];
+   return {
+     calls, // Expose calls for validation
+     updateRecord: async (request: any) => {
+       calls.push({ method: 'updateRecord', request });
+       return { success: true, recordId: request.recordId };
+     }
+   };
+ };
+ ```
+
+ ### ✅ Comprehensive Behavior Validation
+ **ALWAYS validate specific, measurable outcomes:**
+
+ ```typescript
+ // GOOD: Validating actual behavior
+ const dataCalls = mockDataService.calls;
+ const updateCall = dataCalls.find(call => call.method === 'updateRecord');
+
+ if (!updateCall) {
+   console.log('❌ Expected updateRecord to be called');
+   return false;
+ }
+
+ const updateRequest = updateCall.request;
+ if (updateRequest.eventId !== 'expected-event-id') {
+   console.log(`❌ Expected eventId 'expected-event-id', got '${updateRequest.eventId}'`);
+   return false;
+ }
+
+ if (updateRequest.updates.summary !== 'Expected Summary') {
+   console.log(`❌ Expected summary 'Expected Summary', got '${updateRequest.updates.summary}'`);
+   return false;
+ }
+ ```
+
+ ### ✅ Multi-Service Validation Pattern
+ **ALWAYS validate interactions between multiple services:**
+
+ ```typescript
+ // GOOD: Validate multiple service interactions
+ const validateServiceInteractions = (mocks: any) => {
+   // Data Service Validation
+   const dataCalls = mocks.dataService.calls;
+   const updateCall = dataCalls.find(call => call.method === 'updateRecord');
+   if (!updateCall || updateCall.request.recordId !== expectedRecordId) {
+     throw new Error('Data service not called correctly');
+   }
+
+   // Notification Service Validation
+   const notificationCalls = mocks.notificationService.calls;
+   const sendCall = notificationCalls.find(call => call.method === 'sendMessage');
+   if (!sendCall || !sendCall.request.recipients.includes('expected@example.com')) {
+     throw new Error('Notification service not called correctly');
+   }
+
+   // Database Service Validation
+   const dbCalls = mocks.dbService.calls;
+   const dbUpdateCall = dbCalls.find(call => call.method === 'updateReview');
+   if (!dbUpdateCall || dbUpdateCall.request.status !== 'completed') {
+     throw new Error('Database not updated correctly');
+   }
+ };
+ ```
+
+ ### ✅ Behavior-Driven Test Design
+ **ALWAYS start with "What should this code do?" then "How can I verify it did that?":**
+
+ 1. **Define Expected Behavior**: What should the code actually do?
+ 2. **Identify Observable Outcomes**: How can you verify the behavior occurred?
+ 3. **Create Validation Logic**: Write assertions that check the observable outcomes
+ 4. **Test Edge Cases**: What could go wrong?
+
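As a hedged sketch of those four steps (all names here are hypothetical, not from this package):

```typescript
// Hypothetical example of behavior-driven test design: state the expected
// behavior, then assert on observable outcomes instead of "no exception thrown".
type Review = { status: string; reviewer?: string };

// 1. Expected behavior: approving a review marks it completed and records the reviewer.
function approve(review: Review, reviewer: string): Review {
  return { ...review, status: 'completed', reviewer };
}

// 2 + 3. Observable outcomes and validation logic: check the actual fields.
const result = approve({ status: 'pending' }, 'user-001');
if (result.status !== 'completed') {
  throw new Error(`Expected status 'completed', got '${result.status}'`);
}
if (result.reviewer !== 'user-001') {
  throw new Error(`Expected reviewer 'user-001', got '${result.reviewer}'`);
}

// 4. Edge case: re-approval by another reviewer must overwrite the reviewer field.
const again = approve(result, 'user-002');
if (again.reviewer !== 'user-002') {
  throw new Error('Reviewer not updated on re-approval');
}
console.log('behavior validated');
```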
+ ## TOOL TROUBLESHOOTING REQUIREMENTS
+
+ ### When a Tool Fails
+ 1. **Check for existing working examples** in the project before claiming a tool is broken
+ 2. **Try alternative approaches** (e.g., direct library usage vs MCP tools)
+ 3. **Investigate the root cause** rather than immediately giving up
+ 4. **Use project-specific patterns** that are already proven to work
+
+ ### Common Tool Issues and Solutions
+ - **MCP tools failing**: Check if direct library usage is available (e.g., the Playwright library vs the MCP Playwright tools)
+ - **API endpoints not working**: Verify the server is running, check logs, test with curl
+ - **Database operations failing**: Check the connection, verify the schema, test with existing working scripts
+ - **UI automation failing**: Look for existing test files that work and use them as templates
+
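For the "API endpoints not working" case, a minimal, hedged probe (the URL and port are placeholders, not this package's endpoints) might be:

```shell
# Hypothetical health probe: confirm the server itself answers before
# concluding that a tool is broken; otherwise point at the server logs.
check_endpoint() {
  if curl -fsS --max-time 5 "$1" >/dev/null 2>&1; then
    echo "server responding at $1"
  else
    echo "unreachable: verify the server is running and check its logs"
  fi
}
check_endpoint "http://127.0.0.1:9/health"
```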
+ ### Prohibited Tool Behavior
+ - **NEVER** give up on tool usage without investigating alternatives
+ - **NEVER** claim a tool is broken without checking for working examples
+ - **NEVER** suggest manual work when automated approaches are available
+ - **NEVER** abandon testing due to tool issues without exhausting alternatives
+
+ ## CRITICAL FAILURE PATTERNS TO AVOID
+
+ ### ❌ The "Claim Success Without Running Tests" Anti-Pattern
+ **NEVER claim tests are passing without actually running them:**
+ - **What Happened**: Agent claimed "tests are good" without running them
+ - **User Response**: "are you serious? every test except 1 passed and youre saying its good?"
+ - **Impact**: False confidence, ignored real failures, wasted time
+ - **Prevention**: Always run tests and show results before claiming success
+
+ ### ❌ The "Ignore User Feedback" Anti-Pattern
+ **NEVER argue when the user corrects mistakes:**
+ - **What Happened**: The user had to correct the agent multiple times about the same issues
+ - **User Response**: "are you lying to me? see the tooltip test!! again, please review agent instructions"
+ - **Impact**: Lost trust, frustrated user, repeated the same mistakes
+ - **Prevention**: Accept feedback immediately, don't argue, learn from corrections
+
+ ### ❌ The "Mock What You're Testing" Anti-Pattern
+ **NEVER mock the core functionality you're supposed to test:**
+ - **What Happened**: Mocked the `storeForHumanReview` method instead of testing the real implementation
+ - **User Response**: "you have mocked out the real thing being tested which is what is sent to the store function"
+ - **Impact**: The test didn't validate the actual bug fix; it was testing the mock implementation
+ - **Prevention**: Test the real implementation, not mock implementations
+
+ ### ❌ The "Ignore Database Issues" Anti-Pattern
+ **NEVER treat database connection problems as acceptable:**
+ - **What Happened**: Ignored `MongoNotConnectedError` and Cosmos DB throughput limit errors
+ - **User Response**: "and youve confirmed the tests are all passing?"
+ - **Impact**: Claimed tests were working when they were actually failing
+ - **Prevention**: Database issues are test failures, not acceptable noise
+
+ ### ❌ The "Weak Test Validation" Anti-Pattern
+ **NEVER write tests that only check presence/absence:**
+ - **What Happened**: Wrote tests that checked whether fields were not empty instead of validating actual content
+ - **User Response**: "if the rules about this aren't already clear, please help clarify them... what you have done is an antipattern"
+ - **Impact**: Tests didn't validate real behavior and missed actual bugs
+ - **Prevention**: Always validate actual content, not just presence/absence
+
+ ## PROHIBITED
+ - Changing tests/success criteria to create a pass.
+ - Declaring "tested" without including artifacts.
+ - Relying only on mocked tests for e2e behavior.
+ - Leaving TODOs/placeholders in any test file.
+ - Mocking the core functionality that needs to be validated.
+ - Changing existing test cases without permission.
+ - **Writing tests that only check for the absence of exceptions**
+ - **Creating mocks that don't validate actual behavior**
+ - **Not verifying service interactions and side effects**
+ - **Giving up on tool usage without investigating alternatives**
+ - **Claiming tests pass without running them**
+ - **Arguing when the user corrects mistakes**
+ - **Mocking what you're supposed to test**
+ - **Ignoring database connection issues**
+ - **Writing weak tests that only check presence/absence**
+
+ ### Provide Evidence
+ - **MUST** include a table in the PR comments with the following columns: test, pass/fail, and analysis of why it failed (if it failed)
+
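A minimal version of that table (test names and failure analysis are illustrative only) might look like:

```markdown
| Test | Pass/Fail | Analysis (if failed) |
| ---- | --------- | -------------------- |
| test-login.ts | ✅ Pass | n/a |
| test-db-schema-lifecycle.ts | ❌ Fail | MongoNotConnectedError: database unreachable from the test host |
```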
253
+ ## MANDATORY VERIFICATION REQUIREMENTS
254
+
255
+ ### UI Changes
256
+ - **MUST** open browser and test UI functionality
257
+ - **MUST** test actual user interactions (clicks, forms, etc.)
258
+ - **MUST** verify UI loads without errors
259
+ - **MUST** author 1 e2e test case without mocking anything
260
+
261
+ ### Database Changes
262
+ - **MUST** verify database schema compatibility
263
+ - **MUST** test actual database operations
264
+ - **MUST** verify column mapping is correct
265
+ - **MUST** test with real database, not just mocks
266
+
267
+ ### Issue Resolution
268
+ - **MUST** reproduce the original issue first
269
+ - **MUST** demonstrate the issue is actually fixed
270
+ - **MUST** test related scenarios to ensure no regressions
271
+ - **MUST** provide evidence of successful resolution
272
+
273
+ ### Test Execution
274
+ - **MUST** run all relevant tests and verify they pass
275
+ - **MUST** fix any test failures that were not present before
276
+ - **MUST** provide test output as evidence
277
+ - **MUST** verify tests actually test the functionality
278
+
279
+ ### Git Workflows
280
+ - **MUST** verify Git workflows completed successfully (not just started)
281
+ - **MUST** check PR status, branch status, merge status
282
+ - **MUST** confirm expected outcomes occurred
283
+
284
+ ## MANDATORY COMPLETENESS VERIFICATION
285
+
286
+ ### **CRITICAL: Before Declaring Work Complete**
287
+ Every agent **MUST** perform this comprehensive verification checklist:
288
+
289
+ #### **1. Compilation Verification**
290
+ - **MUST** run `npx tsc --noEmit --skipLibCheck` and verify 0 errors
291
+ - **MUST** run `npm run build` (if build script exists) and verify success
292
+ - **MUST** fix any TypeScript compilation errors before proceeding
293
+ - **NEVER** ignore compilation errors or assume they're unrelated
294
+
295
+ #### **2. Comprehensive Search Verification**
296
+ When refactoring or removing dependencies, **MUST** use multiple search strategies:
297
+ ```bash
298
+ # Search for class/interface names
299
+ grep_search --SearchPath . --Query "ClassName" --MatchPerLine true
300
+
301
+ # Search for import statements
302
+ grep_search --SearchPath . --Query "import.*ClassName" --IsRegex true --MatchPerLine true
303
+
304
+ # Search for file references
305
+ grep_search --SearchPath . --Query "filename" --MatchPerLine true
306
+
307
+ # Search for method calls
308
+ grep_search --SearchPath . --Query "methodName" --MatchPerLine true
309
+ ```
310
+
311
+ #### **3. Build System Verification**
312
+ - **MUST** verify the entire system builds without errors
313
+ - **MUST** check that all imports resolve correctly
314
+ - **MUST** verify no broken dependencies exist
315
+ - **MUST** test that the application starts without compilation errors
316
+
317
+ #### **4. End-to-End Functionality Verification**
318
+ - **MUST** verify the main application functionality still works
319
+ - **MUST** test critical user workflows end-to-end
320
+ - **MUST** verify no regressions in existing functionality
321
+ - **MUST** test with real data, not just mocks
322
+
323
+ #### **5. Dependency Impact Analysis**
324
+ When changing core services or interfaces:
325
+ - **MUST** identify ALL files that depend on the changed code
326
+ - **MUST** verify each dependent file still works correctly
327
+ - **MUST** update ALL references to use new patterns/interfaces
328
+ - **MUST** verify no orphaned code or broken imports remain
329
+
330
+ ### **PROHIBITED COMPLETENESS ANTI-PATTERNS**
331
+
332
+ #### **❌ The "Partial Search" Anti-Pattern**
333
+ **NEVER** search for only one pattern when multiple exist:
334
+ ```bash
335
+ # BAD: Only searching for imports
336
+ grep_search "import.*GmailService"
337
+
338
+ # GOOD: Comprehensive search
339
+ grep_search "GmailService" # All references
340
+ grep_search "gmail-service" # File references
341
+ grep_search "import.*GmailService" # Import statements
342
+ npx tsc --noEmit # Compilation check
343
+ ```
344
+
345
+ #### **❌ The "Assume It Works" Anti-Pattern**
346
+ **NEVER** assume changes work without verification:
347
+ - **BAD**: "I updated the imports, should work now"
348
+ - **GOOD**: Run compilation, run tests, verify functionality
349
+
350
+ #### **❌ The "Ignore Compilation Errors" Anti-Pattern**
351
+ **NEVER** ignore TypeScript errors:
352
+ - **BAD**: "There are some TS errors but they're probably unrelated"
353
+ - **GOOD**: Fix ALL compilation errors before declaring work complete
354
+
355
+ #### **❌ The "Skip Build Verification" Anti-Pattern**
356
+ **NEVER** skip verifying the system builds:
357
+ - **BAD**: "Code looks right, tests pass, must be good"
358
+ - **GOOD**: Run `npm run build` and verify success
359
+
360
+ ## TESTING COMMANDS
361
+ - `npm run test <file>.ts` to run all tests
362
+ - `npm run test-smoke <file>.ts` for smoke tests
363
+ - `npm run test-flaky <file>.ts` for flaky tests
364
+ - `npm run test-failing <file>.ts` for failing tests
365
+ - `npm run test-baml <file>.ts` for BAML tests
366
+
367
+ ## EXAMPLES
368
+
369
+ ### Good: Comprehensive Verification
370
+ ```
371
+ Action: Fixed login button
372
+ Verification:
373
+ - Opened browser, clicked button, verified login works
374
+ - Took screenshot showing successful login
375
+ - Ran tests: npm test -- test-login.ts ✅ PASSED
376
+ - Verified no regressions in other login flows
377
+ Evidence: Screenshot + test output provided
378
+ ```
379
+
380
+ ### Bad: Assumption-Based Work
381
+ ```
382
+ Action: Fixed login button
383
+ Verification: "Code looks correct, should work"
384
+ Evidence: None provided
385
+ ```
386
+
387
+ ### Good: Database Verification
388
+ ```
389
+ Action: Added new user field
390
+ Verification:
391
+ - Updated database schema
392
+ - Tested actual database write/read operations
393
+ - Verified column mapping is correct
394
+ - Ran: npm test -- test-db-schema-lifecycle.ts ✅ PASSED
395
+ Evidence: Database operation results + test output
396
+ ```
397
+
398
+ ### Bad: Code-Only Changes
399
+ ```
400
+ Action: Added new user field
401
+ Verification: "Added field to TypeScript interface"
402
+ Evidence: None - didn't test actual database operations
403
+ ```
404
+
405
+ ### Good: Proper Test Structure
406
+ ```typescript
407
+ import { BaseTestCase, runTests } from './test-utils';
408
+
409
+ interface MyTestCase extends BaseTestCase {
410
+ description: string;
411
+ testFunction: () => Promise<boolean>;
412
+ }
413
+
414
+ const MY_TEST_CASES: MyTestCase[] = [
415
+ {
416
+ name: 'test_user_field_creation',
417
+ tags: ['smoke'],
418
+ description: 'Should create user field and verify database operations',
419
+ testFunction: async () => {
420
+ // Test implementation
421
+ return true;
422
+ }
423
+ }
424
+ ];
425
+
426
+ const runMyTest = async (testCase: MyTestCase) => {
427
+ doSetup(); // any mocking, object creation, etc.
428
+ const result = testCoreFunctionality();
429
+ doTeardown(); // remove any created objects, etc.
430
+ return result;
431
+ };
432
+
433
+ runTests(MY_TEST_CASES, runMyTest, 'My Test Suite');
434
+ ```
435
+
436
+ ### Bad: Incorrect Test Structure
437
+ ```typescript
438
+ // DON'T: Direct test execution without proper structure
439
+ describe('My Tests', () => {
440
+ it('should work', () => {
441
+ // Test implementation
442
+ });
443
+ });
444
+ ```
445
+
446
+
447
+ ## VERIFICATION CHECKLIST
448
+ Before starting work:
449
+ - [ ] Understand the issue and its reproduction steps
450
+ - [ ] Able to reproduce the issue
451
+ - [ ] Identify relevant test cases
452
+
453
+ While debugging test failures
454
+ - [ ] Run the single test that repros the issue (set tag failing and run using `npm run test-flaky <tesuite>`)
455
+ - [ ] Read test output to understand WHY it failed
456
+ - [ ] Identify specific error messages or assertions
457
+ - [ ] Determine if it's configuration, data, or functionality issue
458
+ - [ ] Plan specific fix before re-running the single failing test
459
+ - [ ] Once fixed, run the entire suite (remove tag failing and run using `npm run test <testsuite>`)
460
+
461
+ Before marking any work complete:
462
+ - [ ] **COMPILATION**: `npx tsc --noEmit --skipLibCheck` shows 0 errors
463
+ - [ ] **BUILD**: `npm run build` completes successfully (if build script exists)
464
+ - [ ] **COMPREHENSIVE SEARCH**: Used multiple search patterns to find ALL references
465
+ - [ ] **DEPENDENCY ANALYSIS**: Verified ALL dependent files still work
466
+ - [ ] **IMPORTS**: All imports resolve correctly, no broken dependencies
467
+ - [ ] UI changes tested in actual browser
468
+ - [ ] Database changes tested with real operations
469
+ - [ ] Issue reproduction and resolution demonstrated
470
+ - [ ] All relevant tests pass with evidence
471
+ - [ ] Core functionality tested
472
+ - [ ] No regressions introduced
473
+ - [ ] Git workflows verified successful
474
+ - [ ] Evidence provided in GitHub issue
475
+
476
+
477
+
## ITERATIVE TESTING PATTERNS & ANTI-PATTERNS THAT AGENTS HAVE LEARNED

### ✅ **SUCCESSFUL TESTING PATTERNS**

#### **1. Multi-Layer Verification Pattern**
**Pattern**: Test at every layer when fixing issues
- **Database Layer**: Verify data is stored correctly with proper IDs
- **API Layer**: Test that endpoints return the correct data structure
- **UI Layer**: Verify the user interface displays and functions correctly
- **Integration Layer**: Test end-to-end workflows

**Example**: When fixing user preference issues:
1. Check database: `node check-preferences.js` → verify `user_id` field
2. Test API: `curl -H "x-user-id: user-001" /preferences` → verify response
3. Test UI: Use Playwright to navigate and interact → verify functionality
4. Test isolation: Switch users and verify data separation

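The four steps above can be condensed into one runnable harness. A minimal TypeScript sketch, using an in-memory array and plain functions as stand-ins for the real database and API layers (all names here are hypothetical):

```typescript
// Hypothetical stand-ins for the database and API layers.
type Preference = { user_id: string; theme: string };

const db: Preference[] = []; // "database" layer

// "API" layer: in the real service the user id arrives via the x-user-id header.
function savePreference(pref: Preference): void {
  db.push(pref);
}

function getPreferences(userId: string): Preference[] {
  return db.filter((p) => p.user_id === userId);
}

// Integration + isolation check: write as one user, read back as two.
savePreference({ user_id: "user-001", theme: "dark" });

const own = getPreferences("user-001");
const other = getPreferences("user-002");

console.log(own.length, other.length); // 1 0 → data is stored and isolated
```

Each layer check maps to one piece of the sketch: storage (`savePreference`), API shape (`getPreferences`), and isolation (the two reads for different users).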
#### **2. Root Cause Analysis Pattern**
**Pattern**: Don't just fix symptoms; identify and fix root causes
- **Symptom**: UI shows 404 errors
- **Surface Fix**: Update API endpoints
- **Root Cause**: UI still using the old URL structure with `USER_ID` parameters
- **Complete Fix**: Update all UI API calls to use the new authentication pattern

#### **3. Incremental Validation Pattern**
**Pattern**: Validate each fix before moving to the next
- Fix API endpoints → Test with curl → Verify response
- Fix UI calls → Test with Playwright → Verify functionality
- Test multi-tenancy → Verify isolation → Document results

#### **4. Data Consistency Verification Pattern**
**Pattern**: Always verify data consistency across systems
- **Data Design**: Ensure `Record.id` vs `Record.recordId` vs `Token.recordId` consistency
- **User Context**: Verify `user_id` is used consistently across all APIs
- **Database Queries**: Ensure query fields match stored data fields

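A consistency check of this kind can be written down directly. A sketch with hypothetical `Rec` and `Token` shapes mirroring the `Record.id` vs `recordId` mismatch described above:

```typescript
// Hypothetical shapes; the field names mirror the mismatch described above.
type Rec = { id: string; recordId: string };
type Token = { recordId: string };

// Verify every token points at a record, matching on the field the
// database actually stores (recordId), not the one we assume (id).
function tokensConsistent(records: Rec[], tokens: Token[]): boolean {
  const known = new Set(records.map((r) => r.recordId));
  return tokens.every((t) => known.has(t.recordId));
}

const records: Rec[] = [{ id: "a", recordId: "rec-1" }];
console.log(tokensConsistent(records, [{ recordId: "rec-1" }])); // true
console.log(tokensConsistent(records, [{ recordId: "a" }]));     // false: token used Record.id
```

The second call is the bug the pattern guards against: a token built from the wrong field looks plausible but never matches.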
#### **5. User-Centric Testing Pattern**
**Pattern**: Test from the user's perspective, not just the technical perspective
- **User Journey**: Can the user add preferences? Can they see their data?
- **Multi-User**: Can different users see only their own data?
- **Error Handling**: Are error messages user-friendly?

### ❌ **COMMON ANTI-PATTERNS TO AVOID**

#### **1. Premature Victory Declaration**
**Anti-Pattern**: Claiming success after only partial testing
- **Wrong**: "Fixed the API, it's working!" (only tested with curl)
- **Right**: "Fixed API, tested with curl, now testing UI with Playwright"

#### **2. Single-Layer Testing**
**Anti-Pattern**: Only testing one layer (e.g., just the API or just the UI)
- **Wrong**: "API works, UI should work too"
- **Right**: "API works, now let me test the UI to verify"

#### **3. Assumption-Based Fixes**
**Anti-Pattern**: Making changes based on assumptions without verification
- **Wrong**: "The UI probably uses the same API structure"
- **Right**: "Let me check the UI code to see how it calls the API"

#### **4. Incomplete Multi-Tenancy Testing**
**Anti-Pattern**: Only testing one user
- **Wrong**: "User1 can see their preferences, multi-tenancy works"
- **Right**: "User1 sees their preferences, User2 sees an empty list, isolation confirmed"

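The "Right" version is really two assertions, not one. A sketch with a hypothetical in-memory preference store:

```typescript
// Hypothetical store keyed by user id.
const prefsByUser = new Map<string, string[]>([["user-001", ["dark-mode"]]]);

function listPreferences(userId: string): string[] {
  return prefsByUser.get(userId) ?? [];
}

// Isolation needs both checks: the owner sees data AND other users see none.
const isolationConfirmed =
  listPreferences("user-001").length > 0 &&
  listPreferences("user-002").length === 0;

console.log(isolationConfirmed); // true
```

Dropping the second check is exactly the anti-pattern: a store that leaks every user's data to everyone would still pass the first one.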
#### **5. Database Assumption Anti-Pattern**
**Anti-Pattern**: Assuming database structure without verification
- **Wrong**: "The database probably stores it as `userId`"
- **Right**: "Let me check the database to see the actual field names"
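Instead of guessing, inspect a stored document. A sketch that reports which candidate field names a document actually carries (the document shape here is hypothetical):

```typescript
// A stored document as it might come back from the database.
const doc: Record<string, unknown> = { user_id: "user-001", theme: "dark" };

// Report which candidate field names are actually present before writing queries.
function presentFields(d: Record<string, unknown>, candidates: string[]): string[] {
  return candidates.filter((name) => name in d);
}

console.log(presentFields(doc, ["userId", "user_id"])); // ["user_id"]
```

One line of inspection settles the `userId` vs `user_id` question before any query is written against the wrong field.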
@@ -0,0 +1,52 @@
# Clean Architecture Guidelines

## INTENT
Maintain clean architectural boundaries by separating concerns: AI/LLM components handle natural-language understanding, while deterministic code handles side-effectful and rule-based work. This keeps the system performant and maintainable.

## PRINCIPLES
- **Separation of Concerns**: AI for semantic operations, deterministic code for business logic
- **Performance Optimization**: Use the appropriate tool for each type of operation
- **Maintainability**: Clear boundaries between different system components
- **Testability**: Deterministic components are easily testable

## ARCHITECTURAL BOUNDARIES

### AI/LLM Layer (Semantic Operations)
**Purpose**: Natural language understanding, content generation, semantic analysis

**Use Cases**:
- Text classification and sentiment analysis
- Content generation and summarization
- Natural language query processing
- Semantic search and matching
- Intent recognition and extraction

**Examples**:
```typescript
// Good: Use AI for semantic operations
const intent = await classifyUserIntent(userMessage);
const summary = await generateSummary(longText);
const sentiment = await analyzeSentiment(feedback);
```

### Deterministic Layer (Business Logic)
**Purpose**: Interface-driven integrations, rule-based operations, data processing

**Use Cases**:
- Database operations and data persistence
- API integrations and external service calls
- Business rule enforcement
- Data validation and transformation
- System orchestration and workflow management

**Examples**:
```typescript
// Good: Use deterministic code for business logic
const user = await userService.findById(userId);
const isValid = validateInputFormat(input);
const result = await paymentService.processPayment(amount);
```

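The two layers typically meet at a dispatch point: the AI layer classifies, the deterministic layer acts. A sketch with stubbed functions (none of these names come from the framework; a real `classifyUserIntent` would call an LLM):

```typescript
// Stub for the AI layer: a real implementation would call an LLM.
async function classifyUserIntent(message: string): Promise<"refund" | "status" | "unknown"> {
  if (message.includes("refund")) return "refund";
  if (message.includes("where is")) return "status";
  return "unknown";
}

// Deterministic layer: rule-based handlers with easily testable behavior.
const handlers: Record<string, () => string> = {
  refund: () => "refund-ticket-created",
  status: () => "status-lookup-started",
  unknown: () => "escalated-to-human",
};

async function handleMessage(message: string): Promise<string> {
  const intent = await classifyUserIntent(message); // semantic work: AI layer
  return handlers[intent]();                        // side effects: deterministic layer
}

handleMessage("I want a refund").then((r) => console.log(r)); // refund-ticket-created
```

Because the boundary is a plain string intent, the deterministic handlers can be unit-tested without any LLM in the loop, which is the testability payoff the principles above describe.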
### Service Architecture

<add your service architecture details here>