@northbridge-security/secureai 0.1.13

Files changed (50)
  1. package/.claude/README.md +122 -0
  2. package/.claude/commands/architect/clean.md +978 -0
  3. package/.claude/commands/architect/kiss.md +762 -0
  4. package/.claude/commands/architect/review.md +704 -0
  5. package/.claude/commands/catchup.md +90 -0
  6. package/.claude/commands/code.md +115 -0
  7. package/.claude/commands/commit.md +1218 -0
  8. package/.claude/commands/cover.md +1298 -0
  9. package/.claude/commands/fmea.md +275 -0
  10. package/.claude/commands/kaizen.md +312 -0
  11. package/.claude/commands/pr.md +503 -0
  12. package/.claude/commands/todo.md +99 -0
  13. package/.claude/commands/worktree.md +738 -0
  14. package/.claude/commands/wrapup.md +103 -0
  15. package/LICENSE +183 -0
  16. package/README.md +108 -0
  17. package/dist/cli.js +75634 -0
  18. package/docs/agents/devops-reviewer.md +889 -0
  19. package/docs/agents/kiss-simplifier.md +1088 -0
  20. package/docs/agents/typescript.md +8 -0
  21. package/docs/guides/README.md +109 -0
  22. package/docs/guides/agents.clean.arch.md +244 -0
  23. package/docs/guides/agents.clean.arch.ts.md +1314 -0
  24. package/docs/guides/agents.gotask.md +1037 -0
  25. package/docs/guides/agents.markdown.md +1209 -0
  26. package/docs/guides/agents.onepassword.md +285 -0
  27. package/docs/guides/agents.sonar.md +857 -0
  28. package/docs/guides/agents.tdd.md +838 -0
  29. package/docs/guides/agents.tdd.ts.md +1062 -0
  30. package/docs/guides/agents.typesript.md +1389 -0
  31. package/docs/guides/github-mcp.md +1075 -0
  32. package/package.json +130 -0
  33. package/packages/secureai-cli/src/cli.ts +21 -0
  34. package/tasks/README.md +880 -0
  35. package/tasks/aws.yml +64 -0
  36. package/tasks/bash.yml +118 -0
  37. package/tasks/bun.yml +738 -0
  38. package/tasks/claude.yml +183 -0
  39. package/tasks/docker.yml +420 -0
  40. package/tasks/docs.yml +127 -0
  41. package/tasks/git.yml +1336 -0
  42. package/tasks/gotask.yml +132 -0
  43. package/tasks/json.yml +77 -0
  44. package/tasks/markdown.yml +95 -0
  45. package/tasks/onepassword.yml +350 -0
  46. package/tasks/security.yml +102 -0
  47. package/tasks/sonar.yml +437 -0
  48. package/tasks/template.yml +74 -0
  49. package/tasks/vscode.yml +103 -0
  50. package/tasks/yaml.yml +121 -0
@@ -0,0 +1,838 @@
1
+ # Test-Driven Development for AI Agents
2
+
3
+ This guide establishes test-driven development (TDD) principles for AI agents working with codebases. These patterns apply across all languages and frameworks, ensuring testable, maintainable, and reliable code.
4
+
5
+ ## Target Audience
6
+
7
+ AI agents (Claude Code, Cursor, GitHub Copilot, etc.) writing production code that requires automated testing, regression prevention, and quality assurance.
8
+
9
+ ## Core Principles
10
+
11
+ ### The TDD Cycle
12
+
13
+ **Red → Green → Refactor** is the fundamental TDD workflow:
14
+
15
+ ```text
16
+ 1. RED: Write a failing test
17
+
18
+ 2. GREEN: Write minimal code to pass the test
19
+
20
+ 3. REFACTOR: Improve code without changing behavior
21
+
22
+ (Repeat)
23
+ ```
24
+
25
+ **Why this order matters:**
26
+
27
+ - **Red first** - Proves the test can fail (validates test correctness)
28
+ - **Green quickly** - Gets to working state fast (validates implementation)
29
+ - **Refactor safely** - Tests catch regressions (enables improvement)
30
+
31
+ ### Test First, Code Second
32
+
33
+ **Always write tests before implementation:**
34
+
35
+ ```text
36
+ ❌ WRONG:
37
+ 1. Write function implementation
38
+ 2. Write tests to verify it works
39
+ 3. Find bugs, fix, repeat
40
+
41
+ ✓ CORRECT:
42
+ 1. Write test describing expected behavior
43
+ 2. Run test (should fail - RED)
44
+ 3. Write minimal code to pass test (GREEN)
45
+ 4. Refactor for quality (REFACTOR)
46
+ ```
47
+
48
+ **Benefits:**
49
+
50
+ - **Better design** - Writing tests first forces you to think about API design
51
+ - **High coverage** - Code is written only to satisfy a test, so nearly every line is exercised by one
52
+ - **No dead code** - Only write code needed to pass tests
53
+ - **Living documentation** - Tests document how code should be used
54
+
55
+ ### Small Steps
56
+
57
+ **Make incremental progress with small, focused tests:**
58
+
59
+ ```text
60
+ Testing a validator function:
61
+
62
+ Step 1: Test empty input
63
+ Step 2: Test valid input
64
+ Step 3: Test invalid format
65
+ Step 4: Test boundary conditions
66
+ Step 5: Test error messages
67
+ ```
68
+
69
+ **Why small steps:**
70
+
71
+ - Easier to identify what broke when tests fail
72
+ - Faster feedback loop (run tests every few minutes)
73
+ - Reduced cognitive load (focus on one behavior at a time)
74
+ - Natural progression toward complete implementation
75
+
76
+ ## Test Types
77
+
78
+ ### Unit Tests
79
+
80
+ **Test individual functions/classes in isolation:**
81
+
82
+ **Characteristics:**
83
+
84
+ - Fast (< 10ms per test)
85
+ - No external dependencies (filesystem, network, database)
86
+ - Use mocks/stubs for dependencies
87
+ - Test one behavior per test
88
+
89
+ **Example scenarios:**
90
+
91
+ - Pure functions (input → output)
92
+ - Business logic calculations
93
+ - Data transformations
94
+ - Validation rules
95
+ - String parsing
96
+ - Math operations
97
+
98
+ **When to use:**
99
+
100
+ - Testing business logic
101
+ - Validating calculations
102
+ - Checking edge cases
103
+ - Regression prevention
104
+
105
+ ### Integration Tests
106
+
107
+ **Test multiple components working together:**
108
+
109
+ **Characteristics:**
110
+
111
+ - Slower (100ms - 5 seconds per test)
112
+ - May use real dependencies (files, databases, APIs)
113
+ - Test interaction between components
114
+ - Verify end-to-end workflows
115
+
116
+ **Example scenarios:**
117
+
118
+ - Reading/writing files
119
+ - Database queries
120
+ - API calls
121
+ - Command execution
122
+ - Configuration loading
123
+ - Multi-step processes
124
+
125
+ **When to use:**
126
+
127
+ - Testing system boundaries (filesystem, network)
128
+ - Verifying component integration
129
+ - End-to-end workflow validation
130
+ - Infrastructure verification
131
+
132
+ ### Functional/End-to-End Tests
133
+
134
+ **Test complete user workflows:**
135
+
136
+ **Characteristics:**
137
+
138
+ - Slowest (seconds to minutes per test)
139
+ - Full system deployment
140
+ - Real environment (staging/production-like)
141
+ - User-centric scenarios
142
+
143
+ **Example scenarios:**
144
+
145
+ - CLI command workflows
146
+ - Web application user flows
147
+ - API endpoint chains
148
+ - Installation processes
149
+ - Update/migration procedures
150
+
151
+ **When to use:**
152
+
153
+ - Critical user workflows
154
+ - Release verification
155
+ - Smoke testing deployments
156
+ - Regression testing major features
157
+
158
+ ## Test Organization
159
+
160
+ ### Folder Structure
161
+
162
+ **Organize tests parallel to source code:**
163
+
164
+ ```text
165
+ project/
166
+ ├── src/
167
+ │ ├── auth/
168
+ │ │ ├── authenticate.ts
169
+ │ │ ├── session.ts
170
+ │ │ └── tokens.ts
171
+ │ └── users/
172
+ │ ├── repository.ts
173
+ │ └── service.ts
174
+ ├── tests/
175
+ │ ├── unit/
176
+ │ │ ├── auth/
177
+ │ │ │ ├── authenticate.test.ts
178
+ │ │ │ ├── session.test.ts
179
+ │ │ │ └── tokens.test.ts
180
+ │ │ └── users/
181
+ │ │ ├── repository.test.ts
182
+ │ │ └── service.test.ts
183
+ │ ├── integration/
184
+ │ │ ├── auth-flow.test.ts
185
+ │ │ └── user-management.test.ts
186
+ │ └── mocks/
187
+ │ ├── auth/
188
+ │ │ └── mock-session.ts
189
+ │ └── users/
190
+ │ └── mock-repository.ts
191
+ ```
192
+
193
+ **Benefits:**
194
+
195
+ - Easy to find related tests
196
+ - Clear separation of test types
197
+ - Parallel structure to source code
198
+ - Shared mocks in dedicated folder
199
+
200
+ ### Naming Conventions
201
+
202
+ **Test file names:**
203
+
204
+ | Pattern | Example | Purpose |
205
+ | ---------------------- | -------------------------- | ------------------ |
206
+ | `<module>.test.<ext>` | `authenticate.test.ts` | Unit tests |
207
+ | `<feature>.test.<ext>` | `auth-flow.test.ts` | Integration tests |
208
+ | `<workflow>.e2e.<ext>` | `user-registration.e2e.ts` | End-to-end tests |
209
+ | `mock-<module>.<ext>` | `mock-repository.ts` | Test doubles/mocks |
210
+
211
+ **Test case names:**
212
+
213
+ Use descriptive names that explain behavior:
214
+
215
+ ```text
216
+ ✓ GOOD:
217
+ - "should return user when valid credentials provided"
218
+ - "should throw error when password is too short"
219
+ - "should hash password before storing in database"
220
+
221
+ ✗ BAD:
222
+ - "test1"
223
+ - "authentication"
224
+ - "it works"
225
+ ```
226
+
227
+ ## Writing Effective Tests
228
+
229
+ ### Arrange-Act-Assert Pattern
230
+
231
+ **Structure every test with three sections:**
232
+
233
+ ```text
234
+ // Arrange: Set up test data and preconditions
235
+ const input = "test@example.com";
236
+ const expected = { email: "test@example.com", valid: true };
237
+
238
+ // Act: Execute the code being tested
239
+ const result = validateEmail(input);
240
+
241
+ // Assert: Verify the outcome
242
+ expect(result).toEqual(expected);
243
+ ```
244
+
245
+ **Why this structure:**
246
+
247
+ - Clear separation of setup, execution, and verification
248
+ - Easy to understand what's being tested
249
+ - Simple to debug when tests fail
250
+ - Consistent pattern across all tests
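The three sections can be sketched as a runnable TypeScript test. This is a minimal illustration, not the guide's implementation: the `validateEmail` body, the `EmailResult` type, and the test function name are assumptions, and a plain `throw` stands in for a test framework's `expect`.

```typescript
// Hypothetical result shape and validator, used only for illustration.
interface EmailResult {
  email: string;
  valid: boolean;
}

function validateEmail(email: string): EmailResult {
  // Deliberately simple rule: one "@", text on both sides, a dot in the domain.
  const valid = /^[^@\s]+@[^@\s]+\.[^@\s]+$/.test(email);
  return { email, valid };
}

function testAcceptsWellFormedAddress(): void {
  // Arrange: set up test data and preconditions
  const input = "test@example.com";
  const expected: EmailResult = { email: "test@example.com", valid: true };

  // Act: execute the code being tested
  const result = validateEmail(input);

  // Assert: verify the outcome
  if (result.email !== expected.email || result.valid !== expected.valid) {
    throw new Error(
      `expected ${JSON.stringify(expected)}, received ${JSON.stringify(result)}`
    );
  }
}

testAcceptsWellFormedAddress();
```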
251
+
252
+ ### One Assertion Per Concept
253
+
254
+ **Test one behavior at a time:**
255
+
256
+ ```text
257
+ ✓ GOOD - Single concept:
258
+ test("should validate email format") {
259
+ const result = validateEmail("test@example.com");
260
+ expect(result.valid).toBe(true);
261
+ }
262
+
263
+ test("should extract email domain") {
264
+ const result = validateEmail("test@example.com");
265
+ expect(result.domain).toBe("example.com");
266
+ }
267
+
268
+ ✗ BAD - Multiple concepts:
269
+ test("should validate email") {
270
+ const result = validateEmail("test@example.com");
271
+ expect(result.valid).toBe(true);
272
+ expect(result.domain).toBe("example.com");
273
+ expect(result.username).toBe("test");
274
+ }
275
+ ```
276
+
277
+ **Exception:** Multiple assertions are acceptable when testing the same concept:
278
+
279
+ ```text
280
+ ✓ ACCEPTABLE - Same concept (object shape):
281
+ test("should return complete user object") {
282
+ const user = createUser("John", "john@example.com");
283
+
284
+ expect(user.name).toBe("John");
285
+ expect(user.email).toBe("john@example.com");
286
+ expect(user.id).toBeDefined();
287
+ expect(user.createdAt).toBeInstanceOf(Date);
288
+ }
289
+ ```
290
+
291
+ ### Test Edge Cases
292
+
293
+ **Cover boundary conditions and error scenarios:**
294
+
295
+ **Input validation example:**
296
+
297
+ ```text
298
+ Function: validateAge(age: number): boolean
299
+
300
+ Test cases:
301
+ 1. Valid age (18-120): expect true
302
+ 2. Minimum boundary (18): expect true
303
+ 3. Below minimum (17): expect false
304
+ 4. Maximum boundary (120): expect true
305
+ 5. Above maximum (121): expect false
306
+ 6. Zero: expect false
307
+ 7. Negative: expect false
308
+ 8. Decimal: expect false
309
+ 9. NaN: expect false
310
+ 10. Infinity: expect false
311
+ ```
312
+
313
+ **Common edge cases:**
314
+
315
+ - Empty inputs (null, undefined, empty string, empty array)
316
+ - Boundary values (min, max, zero, one)
317
+ - Invalid types (wrong type, NaN, Infinity)
318
+ - Special characters (Unicode, emojis, control characters)
319
+ - Large inputs (performance, memory limits)
320
+ - Concurrent operations (race conditions)
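The `validateAge` cases listed earlier map naturally onto a table-driven test. The sketch below assumes one possible implementation (integers from 18 to 120 inclusive); the concrete inputs chosen for each case are illustrative.

```typescript
function validateAge(age: number): boolean {
  // Number.isInteger is false for NaN, Infinity, and decimals,
  // so those cases need no separate branch.
  return Number.isInteger(age) && age >= 18 && age <= 120;
}

// One row per case: description, input, expected result.
const ageCases: Array<[string, number, boolean]> = [
  ["typical valid age", 35, true],
  ["minimum boundary", 18, true],
  ["below minimum", 17, false],
  ["maximum boundary", 120, true],
  ["above maximum", 121, false],
  ["zero", 0, false],
  ["negative", -5, false],
  ["decimal", 18.5, false],
  ["NaN", NaN, false],
  ["Infinity", Infinity, false],
];

for (const [label, input, expected] of ageCases) {
  if (validateAge(input) !== expected) {
    throw new Error(`${label}: expected ${expected} for input ${input}`);
  }
}
```

Keeping the cases in a table makes it cheap to add a new boundary later without writing another test body.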
321
+
322
+ ### Avoid Test Interdependence
323
+
324
+ **Each test should be independent:**
325
+
326
+ ```text
327
+ ✓ GOOD - Independent tests:
328
+ test("should add user") {
329
+ const db = createTestDatabase();
330
+ db.addUser({ name: "Alice" });
331
+ expect(db.count()).toBe(1);
332
+ }
333
+
334
+ test("should remove user") {
335
+ const db = createTestDatabase();
336
+ db.addUser({ name: "Bob" });
337
+ db.removeUser("Bob");
338
+ expect(db.count()).toBe(0);
339
+ }
340
+
341
+ ✗ BAD - Dependent tests:
342
+ let db;
343
+
344
+ test("should add user") {
345
+ db = createTestDatabase();
346
+ db.addUser({ name: "Alice" });
347
+ expect(db.count()).toBe(1);
348
+ }
349
+
350
+ test("should remove user") {
351
+ // DEPENDS ON PREVIOUS TEST
352
+ db.removeUser("Alice");
353
+ expect(db.count()).toBe(0);
354
+ }
355
+ ```
356
+
357
+ **Why independence matters:**
358
+
359
+ - Tests can run in any order
360
+ - Tests can run in parallel
361
+ - Failures are isolated (one failure doesn't cascade)
362
+ - Tests can be run individually for debugging
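One way to get this independence is to build a fresh fixture inside every test. The sketch below assumes a hypothetical in-memory `TestDatabase` class behind the `createTestDatabase` factory named above; the tests are plain functions so the example runs without a test framework.

```typescript
interface User {
  name: string;
}

// Hypothetical in-memory stand-in for a test database.
class TestDatabase {
  private users: User[] = [];

  addUser(user: User): void {
    this.users.push(user);
  }

  removeUser(name: string): void {
    this.users = this.users.filter((user) => user.name !== name);
  }

  count(): number {
    return this.users.length;
  }
}

// Each call returns a new, empty database, so no state leaks between tests.
function createTestDatabase(): TestDatabase {
  return new TestDatabase();
}

function testAddUser(): void {
  const db = createTestDatabase();
  db.addUser({ name: "Alice" });
  if (db.count() !== 1) throw new Error("expected one user after add");
}

function testRemoveUser(): void {
  const db = createTestDatabase();
  db.addUser({ name: "Bob" });
  db.removeUser("Bob");
  if (db.count() !== 0) throw new Error("expected no users after remove");
}

// Running in reverse order works because neither test relies on the other.
testRemoveUser();
testAddUser();
```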
363
+
364
+ ## Mocking and Test Doubles
365
+
366
+ ### When to Use Mocks
367
+
368
+ **Use mocks for external dependencies:**
369
+
370
+ **Mock these:**
371
+
372
+ - File system operations (read, write, delete)
373
+ - Network requests (HTTP, WebSocket, database)
374
+ - System commands (exec, spawn)
375
+ - Time-dependent code (Date.now(), timers)
376
+ - Random number generation
377
+ - External APIs
378
+
379
+ **Don't mock these:**
380
+
381
+ - Pure functions (no side effects)
382
+ - Data structures (objects, arrays)
383
+ - Simple utilities (string manipulation, math)
384
+ - Code you're testing directly
385
+
386
+ ### Types of Test Doubles
387
+
388
+ **Different patterns for different needs:**
389
+
390
+ **Stub** - Returns canned responses:
391
+
392
+ ```text
393
+ mockDatabase.getUser() → returns { id: 1, name: "Test User" }
394
+ ```
395
+
396
+ **Spy** - Records how it was called:
397
+
398
+ ```text
399
+ mockLogger.log("message")
400
+ → Verify: called once with "message"
401
+ ```
402
+
403
+ **Mock** - Programmable behavior with expectations:
404
+
405
+ ```text
406
+ mockAPI
407
+ .expect("POST", "/users")
408
+ .withBody({ name: "Alice" })
409
+ .respond({ id: 1 })
410
+ ```
411
+
412
+ **Fake** - Working implementation (lightweight):
413
+
414
+ ```text
415
+ InMemoryDatabase - Real database logic, but in-memory storage
416
+ ```
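The stub, spy, and fake can be hand-written in a few lines of TypeScript. The sketch below is illustrative only: `UserStore`, `Logger`, `SpyLogger`, `InMemoryUserStore`, and `greet` are hypothetical names, and a mocking library would normally supply the mock variant with expectations.

```typescript
interface User {
  id: number;
  name: string;
}

interface UserStore {
  getUser(id: number): User | undefined;
  saveUser(user: User): void;
}

interface Logger {
  log(message: string): void;
}

// Stub: always returns the same canned response.
const stubStore: UserStore = {
  getUser: () => ({ id: 1, name: "Test User" }),
  saveUser: () => {},
};

// Spy: records how it was called so a test can verify the calls afterwards.
class SpyLogger implements Logger {
  readonly calls: string[] = [];

  log(message: string): void {
    this.calls.push(message);
  }
}

// Fake: a working but lightweight implementation backed by a Map.
class InMemoryUserStore implements UserStore {
  private readonly users = new Map<number, User>();

  getUser(id: number): User | undefined {
    return this.users.get(id);
  }

  saveUser(user: User): void {
    this.users.set(user.id, user);
  }
}

// Hypothetical code under test that depends on both collaborators.
function greet(store: UserStore, logger: Logger, id: number): string {
  const user = store.getUser(id);
  const message = user ? `Hello, ${user.name}` : "Unknown user";
  logger.log(message);
  return message;
}
```

A stub is enough when the test only needs a return value; the fake earns its extra code when the test needs data written in one call to be readable in the next.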
417
+
418
+ ### Dependency Injection for Testability
419
+
420
+ **Pass dependencies as parameters:**
421
+
422
+ ```text
423
+ ✓ GOOD - Injectable dependency:
424
+ function saveUser(user, database) {
425
+ return database.insert(user);
426
+ }
427
+
428
+ // In tests:
429
+ const mockDB = createMockDatabase();
430
+ saveUser({ name: "Alice" }, mockDB);
431
+
432
+ ✗ BAD - Hard-coded dependency:
433
+ import { realDatabase } from './database';
434
+
435
+ function saveUser(user) {
436
+ return realDatabase.insert(user);
437
+ // Cannot test without real database
438
+ }
439
+ ```
440
+
441
+ **For class-based code, use constructor injection:**
442
+
443
+ ```text
444
+ ✓ GOOD - Constructor injection:
445
+ class UserService {
446
+ constructor(database, emailService) {
447
+ this.database = database;
448
+ this.emailService = emailService;
449
+ }
450
+ }
451
+
452
+ // In tests:
453
+ const service = new UserService(mockDB, mockEmail);
454
+
455
+ ✗ BAD - Hard-coded dependencies:
456
+ class UserService {
457
+ constructor() {
458
+ this.database = new RealDatabase();
459
+ this.emailService = new RealEmailService();
460
+ }
461
+ }
462
+ ```
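In TypeScript, constructor injection is usually paired with interfaces so that tests can pass in lightweight doubles. The sketch below is an assumption-laden illustration: the `Database` and `EmailService` interfaces, the `register` method, and the hand-written doubles are hypothetical and not part of the guide.

```typescript
interface NewUser {
  name: string;
  email: string;
}

// Hypothetical collaborator contracts.
interface Database {
  insert(user: NewUser): number;
}

interface EmailService {
  sendWelcome(address: string): void;
}

class UserService {
  private readonly database: Database;
  private readonly emailService: EmailService;

  constructor(database: Database, emailService: EmailService) {
    this.database = database;
    this.emailService = emailService;
  }

  // Hypothetical method: stores the user, then sends a welcome email.
  register(user: NewUser): number {
    const id = this.database.insert(user);
    this.emailService.sendWelcome(user.email);
    return id;
  }
}

// In tests: hand-written doubles that record what they receive.
const insertedUsers: NewUser[] = [];
const mockDB: Database = {
  insert: (user: NewUser): number => {
    insertedUsers.push(user);
    return insertedUsers.length;
  },
};

const sentAddresses: string[] = [];
const mockEmail: EmailService = {
  sendWelcome: (address: string): void => {
    sentAddresses.push(address);
  },
};

const service = new UserService(mockDB, mockEmail);
const newId = service.register({ name: "Alice", email: "alice@example.com" });
```

Because `UserService` only knows the interfaces, the same class runs unchanged against real implementations in production.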
463
+
464
+ ## Test Coverage
465
+
466
+ ### Coverage Metrics
467
+
468
+ **Understand what coverage measures:**
469
+
470
+ | Metric | Meaning | Target |
471
+ | ------------------ | ---------------------------- | ------ |
472
+ | Line coverage | % of code lines executed | 80%+ |
473
+ | Branch coverage | % of if/else branches tested | 80%+ |
474
+ | Function coverage | % of functions called | 90%+ |
475
+ | Statement coverage | % of statements executed | 80%+ |
476
+
477
+ **Coverage is not quality:**
478
+
479
+ - 100% coverage doesn't mean bug-free code
480
+ - Focus on meaningful tests, not coverage numbers
481
+ - Cover critical paths and edge cases thoroughly
482
+ - Low-value code (getters/setters) can have lower coverage
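A small example shows why full coverage does not imply correctness. The `average` function below is hypothetical; a single check executes every line of it, yet an unhandled edge case remains.

```typescript
// Hypothetical function under test.
function average(values: number[]): number {
  const sum = values.reduce((total, value) => total + value, 0);
  return sum / values.length;
}

// This one check runs every line of average(): 100% line coverage.
if (average([2, 4]) !== 3) {
  throw new Error("average of 2 and 4 should be 3");
}

// The empty-array case was never tested: 0 / 0 evaluates to NaN,
// which callers are unlikely to expect.
const emptyResult = average([]);
const hasUntestedEdgeCase = Number.isNaN(emptyResult);
```

The coverage report for this file would be perfect, while the behavior for empty input is still undefined by any test.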
483
+
484
+ ### What to Prioritize
485
+
486
+ **Test these thoroughly (aim for 100%):**
487
+
488
+ - Business logic and algorithms
489
+ - Security-critical code (authentication, authorization)
490
+ - Data validation and sanitization
491
+ - Error handling and edge cases
492
+ - Public APIs and interfaces
493
+
494
+ **Lower priority (aim for 60-80%):**
495
+
496
+ - Simple getters/setters
497
+ - Configuration loading
498
+ - Logging statements
499
+ - UI layout code
500
+ - Trivial utilities
501
+
502
+ ### Excluding Code from Coverage
503
+
504
+ **Mark code that shouldn't be covered:**
505
+
506
+ ```text
507
+ // Language-specific examples:
508
+
509
+ // TypeScript/JavaScript
510
+ /* istanbul ignore next */
511
+ function developmentOnlyHelper() { ... }
512
+
513
+ // Python
514
+ def debug_helper(): # pragma: no cover
515
+ ...
516
+
517
+ // Go
518
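+ // No per-function ignore comment; a file-level build constraint applies only when tests run with -tags test
+ //go:build !test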
519
+ ```
520
+
521
+ **What to exclude:**
522
+
523
+ - Development/debug utilities
524
+ - Platform-specific code that cannot run on the platform executing the tests
525
+ - Defensive assertions that should never happen
526
+ - Generated code
527
+
528
+ ## Anti-Patterns to Avoid
529
+
530
+ ### Testing Implementation Details
531
+
532
+ **Test behavior, not implementation:**
533
+
534
+ ```text
535
+ ✗ BAD - Tests internal state:
536
+ test("should increment counter") {
537
+ const obj = new Counter();
538
+ obj.increment();
539
+ expect(obj._internalCounter).toBe(1); // Testing private state
540
+ }
541
+
542
+ ✓ GOOD - Tests public behavior:
543
+ test("should return incremented value") {
544
+ const counter = new Counter();
545
+ counter.increment();
546
+ expect(counter.getValue()).toBe(1);
547
+ }
548
+ ```
549
+
550
+ **Why this matters:**
551
+
552
+ - Internal refactoring shouldn't break tests
553
+ - Tests document public contract, not implementation
554
+ - Enables changing internals without test changes
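In TypeScript the `private` modifier helps enforce this: a test that reads the internal field directly is rejected by the compiler. The `Counter` class below is a minimal sketch whose field name is an assumption.

```typescript
class Counter {
  // Internal state: free to be renamed or restructured later.
  private count = 0;

  increment(): void {
    this.count += 1;
  }

  // Public contract: the only thing tests should rely on.
  getValue(): number {
    return this.count;
  }
}

function testReturnsIncrementedValue(): void {
  const counter = new Counter();
  counter.increment();
  if (counter.getValue() !== 1) {
    throw new Error(`expected 1, received ${counter.getValue()}`);
  }
}

testReturnsIncrementedValue();
```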
555
+
556
+ ### Brittle Tests
557
+
558
+ **Avoid tests that break on unrelated changes:**
559
+
560
+ ```text
561
+ ✗ BAD - Hardcoded values:
562
+ test("should format date") {
563
+ const result = formatDate(new Date());
564
+ expect(result).toBe("2024-11-22 14:30:45"); // Breaks constantly
565
+ }
566
+
567
+ ✓ GOOD - Flexible matching:
568
+ test("should format date") {
569
+ const result = formatDate(new Date());
570
+ expect(result).toMatch(/^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}$/);
571
+ }
572
+ ```
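Another option, when the exact output matters, is to pass a fixed date instead of the current time so the expected string never changes. The sketch below assumes a hypothetical `formatDate` that formats in UTC, which also keeps the result independent of the machine's time zone.

```typescript
// Hypothetical formatter: "YYYY-MM-DD HH:MM:SS" in UTC.
function formatDate(date: Date): string {
  // toISOString() yields e.g. "2024-11-22T14:30:45.000Z";
  // keep the first 19 characters and swap the "T" for a space.
  return date.toISOString().slice(0, 19).replace("T", " ");
}

function testFormatsFixedDate(): void {
  // Months are zero-based in Date.UTC, so 10 is November.
  const fixedInstant = new Date(Date.UTC(2024, 10, 22, 14, 30, 45));
  const result = formatDate(fixedInstant);
  if (result !== "2024-11-22 14:30:45") {
    throw new Error(`unexpected format: ${result}`);
  }
}

testFormatsFixedDate();
```

Compared with the regex match, this checks the actual digits as well as the shape, at the cost of requiring the date to be injectable.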
573
+
574
+ ### Testing Multiple Things
575
+
576
+ **One test, one responsibility:**
577
+
578
+ ```text
579
+ ✗ BAD - Tests entire workflow:
580
+ test("user workflow") {
581
+ const user = createUser();
582
+ user.login();
583
+ user.updateProfile();
584
+ user.changePassword();
585
+ user.logout();
586
+ // If any step fails, which one?
587
+ }
588
+
589
+ ✓ GOOD - Separate tests:
590
+ test("should create user")
591
+ test("should login user")
592
+ test("should update profile")
593
+ test("should change password")
594
+ test("should logout user")
595
+ ```
596
+
597
+ ### Slow Tests
598
+
599
+ **Keep tests fast:**
600
+
601
+ **Performance targets:**
602
+
603
+ - Unit test: < 10ms
604
+ - Integration test: < 1 second
605
+ - E2E test: < 30 seconds
606
+
607
+ **Optimization strategies:**
608
+
609
+ - Use in-memory databases instead of real ones
610
+ - Mock slow external dependencies
611
+ - Parallelize test execution
612
+ - Use test data factories (don't recreate fixtures)
613
+ - Share expensive setup across tests (carefully)
614
+
615
+ ## TDD in Practice
616
+
617
+ ### Starting a New Feature
618
+
619
+ **TDD workflow for new features:**
620
+
621
+ ```text
622
+ 1. Write first test for simplest case
623
+ → Test fails (RED)
624
+
625
+ 2. Write minimal code to pass
626
+ → Test passes (GREEN)
627
+
628
+ 3. Write test for next case
629
+ → Test fails (RED)
630
+
631
+ 4. Extend code to pass new test
632
+ → All tests pass (GREEN)
633
+
634
+ 5. Refactor if needed
635
+ → Tests still pass (GREEN)
636
+
637
+ 6. Repeat until feature complete
638
+ ```
639
+
640
+ **Example: Building an email validator**
641
+
642
+ ```text
643
+ Step 1: Test empty string
644
+ Test: validateEmail("") → {valid: false}
645
+ Code: function validateEmail(email) { return {valid: false}; }
646
+
647
+ Step 2: Test simple valid email
648
+ Test: validateEmail("a@b.c") → {valid: true}
649
+ Code: Add check for @ and .
650
+
651
+ Step 3: Test invalid format (no @)
652
+ Test: validateEmail("invalid") → {valid: false}
653
+ Code: Already passes
654
+
655
+ Step 4: Test invalid format (no domain)
656
+ Test: validateEmail("test@") → {valid: false}
657
+ Code: Add domain check
658
+
659
+ Step 5: Test complex valid email
660
+ Test: validateEmail("user.name+tag@example.co.uk") → {valid: true}
661
+ Code: Improve regex pattern
662
+
663
+ (Continue for all edge cases...)
664
+ ```
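After the five steps, the validator might look like the sketch below. It is one possible end state, not the guide's implementation: it uses plain string checks rather than a regex, and the `ValidationResult` type name is an assumption.

```typescript
interface ValidationResult {
  valid: boolean;
}

function validateEmail(email: string): ValidationResult {
  // Steps 1 and 3: empty strings and strings without exactly one "@" are invalid.
  const at = email.indexOf("@");
  if (at === -1 || at !== email.lastIndexOf("@")) {
    return { valid: false };
  }

  // Step 4: both the local part and the domain must be present.
  const local = email.slice(0, at);
  const domain = email.slice(at + 1);
  if (local.length === 0 || domain.length === 0) {
    return { valid: false };
  }

  // Steps 2 and 5: the domain needs at least two non-empty dot-separated labels.
  const labels = domain.split(".");
  const valid = labels.length >= 2 && labels.every((label) => label.length > 0);
  return { valid };
}

// The tests accumulated during the five steps, replayed in order.
const emailCases: Array<[string, boolean]> = [
  ["", false],
  ["a@b.c", true],
  ["invalid", false],
  ["test@", false],
  ["user.name+tag@example.co.uk", true],
];

for (const [input, expected] of emailCases) {
  if (validateEmail(input).valid !== expected) {
    throw new Error(`validateEmail(${JSON.stringify(input)}) should be ${expected}`);
  }
}
```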
665
+
666
+ ### Fixing Bugs
667
+
668
+ **TDD workflow for bug fixes:**
669
+
670
+ ```text
671
+ 1. Write test that reproduces the bug
672
+ → Test fails (confirms bug exists)
673
+
674
+ 2. Fix the bug
675
+ → Test passes (bug is fixed)
676
+
677
+ 3. Ensure all other tests still pass
678
+ → Regression test in place forever
679
+ ```
680
+
681
+ **Example: Bug report: "App crashes on empty input"**
682
+
683
+ ```text
684
+ 1. Write failing test:
685
+ test("should handle empty input") {
686
+ expect(() => processInput("")).not.toThrow();
687
+ }
688
+ → Test fails: TypeError: Cannot read property 'length' of undefined
689
+
690
+ 2. Fix code:
691
+ function processInput(input) {
692
+ if (!input) return null; // Add null check
693
+ return input.length;
694
+ }
695
+ → Test passes
696
+
697
+ 3. Run all tests:
698
+ → All pass, regression prevented
699
+ ```
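A typed version of this fix might look like the sketch below. It assumes the crash came from a missing (undefined) input, as the TypeError suggests, and treats the empty string the same way; the optional parameter and the `number | null` return type are assumptions.

```typescript
// Hypothetical signature: the input may be missing entirely.
function processInput(input?: string): number | null {
  if (!input) return null; // Guards against both undefined and ""
  return input.length;
}

// Regression tests: these stay in the suite after the fix.
function testHandlesEmptyInput(): void {
  if (processInput("") !== null) throw new Error("empty string should return null");
}

function testHandlesMissingInput(): void {
  if (processInput(undefined) !== null) throw new Error("missing input should return null");
}

function testStillCountsCharacters(): void {
  if (processInput("abc") !== 3) throw new Error("length of 'abc' should be 3");
}

testHandlesEmptyInput();
testHandlesMissingInput();
testStillCountsCharacters();
```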
700
+
701
+ ### Refactoring
702
+
703
+ **TDD enables safe refactoring:**
704
+
705
+ ```text
706
+ 1. Ensure comprehensive test coverage
707
+ → All tests pass (GREEN)
708
+
709
+ 2. Refactor code
710
+ → Change structure, not behavior
711
+
712
+ 3. Run tests frequently
713
+ → Tests catch regressions immediately
714
+
715
+ 4. If tests fail:
716
+ → Either fix code or fix test (if test was wrong)
717
+
718
+ 5. Repeat until refactoring complete
719
+ → All tests still pass
720
+ ```
721
+
722
+ **Refactoring example:**
723
+
724
+ ```text
725
+ BEFORE:
726
+ function calculateTotal(items) {
727
+ let total = 0;
728
+ for (let i = 0; i < items.length; i++) {
729
+ total += items[i].price * items[i].quantity;
730
+ }
731
+ return total;
732
+ }
733
+
734
+ Tests: ✓ All passing
735
+
736
+ AFTER (refactored):
737
+ function calculateTotal(items) {
738
+ return items.reduce((sum, item) =>
739
+ sum + (item.price * item.quantity), 0
740
+ );
741
+ }
742
+
743
+ Tests: ✓ All still passing (behavior unchanged)
744
+ ```
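To make "behavior unchanged" concrete, the same fixtures can be run against both versions. In the sketch below the function names, the `LineItem` type, and the fixture values are assumptions chosen for illustration.

```typescript
interface LineItem {
  price: number;
  quantity: number;
}

// Before: loop-based version.
function calculateTotalLoop(items: LineItem[]): number {
  let total = 0;
  for (const item of items) {
    total += item.price * item.quantity;
  }
  return total;
}

// After: refactored to reduce.
function calculateTotalReduce(items: LineItem[]): number {
  return items.reduce((sum, item) => sum + item.price * item.quantity, 0);
}

// Shared fixtures: the refactored version must give the same answers.
const totalFixtures: Array<{ items: LineItem[]; expected: number }> = [
  { items: [], expected: 0 },
  { items: [{ price: 5, quantity: 2 }], expected: 10 },
  {
    items: [
      { price: 3, quantity: 1 },
      { price: 4, quantity: 3 },
    ],
    expected: 15,
  },
];

for (const implementation of [calculateTotalLoop, calculateTotalReduce]) {
  for (const { items, expected } of totalFixtures) {
    if (implementation(items) !== expected) {
      throw new Error(`${implementation.name} returned a total other than ${expected}`);
    }
  }
}
```

In a real refactoring there is only one function at a time; the tests written against the old version are simply rerun against the new one.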
745
+
746
+ ## Best Practices for AI Agents
747
+
748
+ ### Always Run Tests Before Coding
749
+
750
+ **Workflow for AI agents:**
751
+
752
+ ```text
753
+ 1. Read existing tests
754
+ 2. Understand expected behavior
755
+ 3. Write new test for feature/fix
756
+ 4. Run tests (should fail)
757
+ 5. Write code to pass test
758
+ 6. Run tests (should pass)
759
+ 7. Refactor if needed
760
+ 8. Run tests (should still pass)
761
+ ```
762
+
763
+ **Never skip step 4** - Confirming the test fails proves it's valid.
764
+
765
+ ### Communicate Test Results
766
+
767
+ **Report test status to users:**
768
+
769
+ ```text
770
+ ✓ GOOD:
771
+ "I've written a test for email validation. Running tests..."
772
+ [test output]
773
+ "Test failed as expected (RED). Now implementing the validator..."
774
+ [writes code]
775
+ "Running tests again..."
776
+ [test output]
777
+ "Test passes (GREEN). Email validation is working correctly."
778
+
779
+ ✗ BAD:
780
+ "I've implemented email validation."
781
+ [no tests mentioned, no verification shown]
782
+ ```
783
+
784
+ ### Use Test Output for Debugging
785
+
786
+ **When tests fail, analyze output:**
787
+
788
+ ```text
789
+ Test failure output:
790
+ Expected: { valid: true, domain: "example.com" }
791
+ Received: { valid: true, domain: undefined }
792
+
793
+ Analysis:
794
+ - valid flag is correct
795
+ - domain extraction is broken
796
+ - Focus debugging on domain parsing logic
797
+ ```
798
+
799
+ ### Maintain Test Quality
800
+
801
+ **Treat tests as production code:**
802
+
803
+ - Use descriptive names
804
+ - Keep tests simple and readable
805
+ - Refactor duplicate test code
806
+ - Delete obsolete tests
807
+ - Update tests when requirements change
808
+
809
+ ## Language-Specific Guides
810
+
811
+ For implementation details in specific languages:
812
+
813
+ - **TypeScript/JavaScript**: See [agents.tdd.ts.md](./agents.tdd.ts.md)
814
+ - **Python**: See [agents.tdd.py.md](./agents.tdd.py.md) (coming soon)
815
+ - **Go**: See [agents.tdd.go.md](./agents.tdd.go.md) (coming soon)
816
+ - **Ruby**: See [agents.tdd.rb.md](./agents.tdd.rb.md) (coming soon)
817
+
818
+ ## Summary
819
+
820
+ **TDD fundamentals:**
821
+
822
+ - Write tests first (RED → GREEN → REFACTOR)
823
+ - Test behavior, not implementation
824
+ - Keep tests fast and independent
825
+ - Use mocks for external dependencies
826
+ - Aim for 80%+ coverage overall, approaching 100% on critical code
827
+ - One test, one behavior
828
+ - Run tests frequently
829
+
830
+ **For AI agents:**
831
+
832
+ - Always write tests before implementation
833
+ - Verify tests fail before writing code (RED)
834
+ - Show test output to users
835
+ - Use test failures for debugging
836
+ - Maintain test quality like production code
837
+
838
+ **Result**: Reliable, maintainable code with regression protection and living documentation.