dojo.md 0.2.2 → 0.2.4

Files changed (196)
  1. package/courses/GENERATION_LOG.md +29 -0
  2. package/courses/api-documentation-writing/course.yaml +12 -0
  3. package/courses/api-documentation-writing/scenarios/level-1/authentication-basics.yaml +46 -0
  4. package/courses/api-documentation-writing/scenarios/level-1/data-types-formats.yaml +45 -0
  5. package/courses/api-documentation-writing/scenarios/level-1/endpoint-description.yaml +45 -0
  6. package/courses/api-documentation-writing/scenarios/level-1/error-documentation.yaml +45 -0
  7. package/courses/api-documentation-writing/scenarios/level-1/first-documentation-shift.yaml +47 -0
  8. package/courses/api-documentation-writing/scenarios/level-1/getting-started-guide.yaml +42 -0
  9. package/courses/api-documentation-writing/scenarios/level-1/pagination-docs.yaml +51 -0
  10. package/courses/api-documentation-writing/scenarios/level-1/request-parameters.yaml +46 -0
  11. package/courses/api-documentation-writing/scenarios/level-1/request-response-examples.yaml +48 -0
  12. package/courses/api-documentation-writing/scenarios/level-1/status-codes.yaml +45 -0
  13. package/courses/api-documentation-writing/scenarios/level-2/error-patterns.yaml +48 -0
  14. package/courses/api-documentation-writing/scenarios/level-2/intermediate-documentation-shift.yaml +48 -0
  15. package/courses/api-documentation-writing/scenarios/level-2/oauth-documentation.yaml +47 -0
  16. package/courses/api-documentation-writing/scenarios/level-2/openapi-specification.yaml +46 -0
  17. package/courses/api-documentation-writing/scenarios/level-2/rate-limiting-docs.yaml +45 -0
  18. package/courses/api-documentation-writing/scenarios/level-2/request-body-schemas.yaml +46 -0
  19. package/courses/api-documentation-writing/scenarios/level-2/schema-definitions.yaml +41 -0
  20. package/courses/api-documentation-writing/scenarios/level-2/swagger-redoc-rendering.yaml +43 -0
  21. package/courses/api-documentation-writing/scenarios/level-2/validation-documentation.yaml +47 -0
  22. package/courses/api-documentation-writing/scenarios/level-2/versioning-changelog.yaml +42 -0
  23. package/courses/api-documentation-writing/scenarios/level-3/advanced-documentation-shift.yaml +43 -0
  24. package/courses/api-documentation-writing/scenarios/level-3/api-style-guide.yaml +40 -0
  25. package/courses/api-documentation-writing/scenarios/level-3/code-samples-multilang.yaml +40 -0
  26. package/courses/api-documentation-writing/scenarios/level-3/content-architecture.yaml +47 -0
  27. package/courses/api-documentation-writing/scenarios/level-3/deprecation-communication.yaml +44 -0
  28. package/courses/api-documentation-writing/scenarios/level-3/interactive-api-explorer.yaml +42 -0
  29. package/courses/api-documentation-writing/scenarios/level-3/migration-guides.yaml +42 -0
  30. package/courses/api-documentation-writing/scenarios/level-3/sdk-documentation.yaml +40 -0
  31. package/courses/api-documentation-writing/scenarios/level-3/webhook-documentation.yaml +48 -0
  32. package/courses/api-documentation-writing/scenarios/level-3/websocket-sse-docs.yaml +47 -0
  33. package/courses/api-documentation-writing/scenarios/level-4/api-changelog-management.yaml +44 -0
  34. package/courses/api-documentation-writing/scenarios/level-4/api-governance-standards.yaml +41 -0
  35. package/courses/api-documentation-writing/scenarios/level-4/api-product-strategy.yaml +41 -0
  36. package/courses/api-documentation-writing/scenarios/level-4/developer-portal-design.yaml +48 -0
  37. package/courses/api-documentation-writing/scenarios/level-4/docs-as-code.yaml +41 -0
  38. package/courses/api-documentation-writing/scenarios/level-4/documentation-localization.yaml +46 -0
  39. package/courses/api-documentation-writing/scenarios/level-4/documentation-metrics.yaml +45 -0
  40. package/courses/api-documentation-writing/scenarios/level-4/documentation-testing.yaml +41 -0
  41. package/courses/api-documentation-writing/scenarios/level-4/expert-documentation-shift.yaml +45 -0
  42. package/courses/api-documentation-writing/scenarios/level-4/multi-audience-docs.yaml +46 -0
  43. package/courses/api-documentation-writing/scenarios/level-5/ai-powered-documentation.yaml +44 -0
  44. package/courses/api-documentation-writing/scenarios/level-5/api-first-documentation.yaml +45 -0
  45. package/courses/api-documentation-writing/scenarios/level-5/api-marketplace-docs.yaml +42 -0
  46. package/courses/api-documentation-writing/scenarios/level-5/board-api-strategy.yaml +48 -0
  47. package/courses/api-documentation-writing/scenarios/level-5/documentation-program-strategy.yaml +42 -0
  48. package/courses/api-documentation-writing/scenarios/level-5/documentation-team-structure.yaml +47 -0
  49. package/courses/api-documentation-writing/scenarios/level-5/dx-competitive-advantage.yaml +46 -0
  50. package/courses/api-documentation-writing/scenarios/level-5/ecosystem-documentation.yaml +45 -0
  51. package/courses/api-documentation-writing/scenarios/level-5/industry-documentation-patterns.yaml +46 -0
  52. package/courses/api-documentation-writing/scenarios/level-5/master-documentation-shift.yaml +46 -0
  53. package/courses/code-review-feedback-writing/course.yaml +12 -0
  54. package/courses/code-review-feedback-writing/scenarios/level-1/approve-vs-request-changes.yaml +48 -0
  55. package/courses/code-review-feedback-writing/scenarios/level-1/asking-questions.yaml +50 -0
  56. package/courses/code-review-feedback-writing/scenarios/level-1/clear-comment-writing.yaml +45 -0
  57. package/courses/code-review-feedback-writing/scenarios/level-1/constructive-tone.yaml +43 -0
  58. package/courses/code-review-feedback-writing/scenarios/level-1/first-review-shift.yaml +46 -0
  59. package/courses/code-review-feedback-writing/scenarios/level-1/giving-praise.yaml +44 -0
  60. package/courses/code-review-feedback-writing/scenarios/level-1/nitpick-etiquette.yaml +44 -0
  61. package/courses/code-review-feedback-writing/scenarios/level-1/providing-context.yaml +46 -0
  62. package/courses/code-review-feedback-writing/scenarios/level-1/reviewing-small-prs.yaml +43 -0
  63. package/courses/code-review-feedback-writing/scenarios/level-1/style-vs-logic.yaml +48 -0
  64. package/courses/code-review-feedback-writing/scenarios/level-2/architectural-feedback.yaml +52 -0
  65. package/courses/code-review-feedback-writing/scenarios/level-2/intermediate-review-shift.yaml +46 -0
  66. package/courses/code-review-feedback-writing/scenarios/level-2/performance-feedback.yaml +50 -0
  67. package/courses/code-review-feedback-writing/scenarios/level-2/reviewing-breaking-changes.yaml +44 -0
  68. package/courses/code-review-feedback-writing/scenarios/level-2/reviewing-complex-prs.yaml +43 -0
  69. package/courses/code-review-feedback-writing/scenarios/level-2/reviewing-documentation.yaml +47 -0
  70. package/courses/code-review-feedback-writing/scenarios/level-2/reviewing-error-handling.yaml +50 -0
  71. package/courses/code-review-feedback-writing/scenarios/level-2/reviewing-tests.yaml +53 -0
  72. package/courses/code-review-feedback-writing/scenarios/level-2/security-review-comments.yaml +50 -0
  73. package/courses/code-review-feedback-writing/scenarios/level-2/suggesting-alternatives.yaml +42 -0
  74. package/courses/code-review-feedback-writing/scenarios/level-3/advanced-review-shift.yaml +48 -0
  75. package/courses/code-review-feedback-writing/scenarios/level-3/api-design-review.yaml +47 -0
  76. package/courses/code-review-feedback-writing/scenarios/level-3/cross-team-review.yaml +45 -0
  77. package/courses/code-review-feedback-writing/scenarios/level-3/database-migration-review.yaml +48 -0
  78. package/courses/code-review-feedback-writing/scenarios/level-3/design-pattern-feedback.yaml +48 -0
  79. package/courses/code-review-feedback-writing/scenarios/level-3/mentoring-through-review.yaml +46 -0
  80. package/courses/code-review-feedback-writing/scenarios/level-3/production-incident-review.yaml +42 -0
  81. package/courses/code-review-feedback-writing/scenarios/level-3/reviewing-senior-code.yaml +47 -0
  82. package/courses/code-review-feedback-writing/scenarios/level-3/reviewing-unfamiliar-code.yaml +43 -0
  83. package/courses/code-review-feedback-writing/scenarios/level-3/speed-vs-thoroughness.yaml +46 -0
  84. package/courses/code-review-feedback-writing/scenarios/level-4/automated-review-strategy.yaml +44 -0
  85. package/courses/code-review-feedback-writing/scenarios/level-4/expert-review-shift.yaml +46 -0
  86. package/courses/code-review-feedback-writing/scenarios/level-4/review-culture-design.yaml +41 -0
  87. package/courses/code-review-feedback-writing/scenarios/level-4/review-guidelines-standards.yaml +45 -0
  88. package/courses/code-review-feedback-writing/scenarios/level-4/review-load-balancing.yaml +39 -0
  89. package/courses/code-review-feedback-writing/scenarios/level-4/review-metrics.yaml +39 -0
  90. package/courses/code-review-feedback-writing/scenarios/level-4/review-process-optimization.yaml +48 -0
  91. package/courses/code-review-feedback-writing/scenarios/level-4/scaling-review-process.yaml +45 -0
  92. package/courses/code-review-feedback-writing/scenarios/level-4/security-review-standards.yaml +41 -0
  93. package/courses/code-review-feedback-writing/scenarios/level-4/training-reviewers.yaml +42 -0
  94. package/courses/code-review-feedback-writing/scenarios/level-5/board-quality-metrics.yaml +44 -0
  95. package/courses/code-review-feedback-writing/scenarios/level-5/knowledge-transfer-at-scale.yaml +42 -0
  96. package/courses/code-review-feedback-writing/scenarios/level-5/ma-review-alignment.yaml +50 -0
  97. package/courses/code-review-feedback-writing/scenarios/level-5/master-review-shift.yaml +49 -0
  98. package/courses/code-review-feedback-writing/scenarios/level-5/review-competitive-advantage.yaml +48 -0
  99. package/courses/code-review-feedback-writing/scenarios/level-5/review-organizational-learning.yaml +46 -0
  100. package/courses/code-review-feedback-writing/scenarios/level-5/review-roi-analysis.yaml +51 -0
  101. package/courses/code-review-feedback-writing/scenarios/level-5/review-velocity-impact.yaml +44 -0
  102. package/courses/code-review-feedback-writing/scenarios/level-5/scaling-reviews-100-plus.yaml +45 -0
  103. package/courses/code-review-feedback-writing/scenarios/level-5/toxic-culture-transformation.yaml +46 -0
  104. package/courses/technical-rfc-writing/course.yaml +11 -0
  105. package/courses/technical-rfc-writing/scenarios/level-1/first-rfc-shift.yaml +45 -0
  106. package/courses/technical-rfc-writing/scenarios/level-1/implementation-planning.yaml +47 -0
  107. package/courses/technical-rfc-writing/scenarios/level-1/open-questions.yaml +46 -0
  108. package/courses/technical-rfc-writing/scenarios/level-1/problem-statement.yaml +41 -0
  109. package/courses/technical-rfc-writing/scenarios/level-1/proposing-solutions.yaml +49 -0
  110. package/courses/technical-rfc-writing/scenarios/level-1/rfc-structure.yaml +41 -0
  111. package/courses/technical-rfc-writing/scenarios/level-1/risks-and-mitigations.yaml +43 -0
  112. package/courses/technical-rfc-writing/scenarios/level-1/scoping-an-rfc.yaml +49 -0
  113. package/courses/technical-rfc-writing/scenarios/level-1/success-metrics.yaml +43 -0
  114. package/courses/technical-rfc-writing/scenarios/level-1/writing-for-audience.yaml +42 -0
  115. package/courses/technical-rfc-writing/scenarios/level-2/risk-assessment-matrix.yaml +43 -0
  116. package/courses/technical-rfc-writing/scenarios/level-2/technical-design-detail.yaml +42 -0
  117. package/courses/technical-rfc-writing/scenarios/level-2/trade-off-analysis.yaml +43 -0
  118. package/courses/terraform-infrastructure-setup/scenarios/level-1/first-debugging-shift.yaml +66 -0
  119. package/courses/terraform-infrastructure-setup/scenarios/level-1/plan-output-reading.yaml +71 -0
  120. package/courses/terraform-infrastructure-setup/scenarios/level-1/resource-creation-failures.yaml +54 -0
  121. package/courses/terraform-infrastructure-setup/scenarios/level-1/resource-references.yaml +70 -0
  122. package/courses/terraform-infrastructure-setup/scenarios/level-1/state-file-basics.yaml +73 -0
  123. package/courses/terraform-infrastructure-setup/scenarios/level-1/terraform-fmt-validate.yaml +58 -0
  124. package/courses/terraform-infrastructure-setup/scenarios/level-2/count-vs-for-each.yaml +58 -0
  125. package/courses/terraform-infrastructure-setup/scenarios/level-2/dependency-management.yaml +80 -0
  126. package/courses/terraform-infrastructure-setup/scenarios/level-2/intermediate-debugging-shift.yaml +66 -0
  127. package/courses/terraform-infrastructure-setup/scenarios/level-2/lifecycle-rules.yaml +51 -0
  128. package/courses/terraform-infrastructure-setup/scenarios/level-2/locals-and-expressions.yaml +58 -0
  129. package/courses/terraform-infrastructure-setup/scenarios/level-2/module-structure.yaml +75 -0
  130. package/courses/terraform-infrastructure-setup/scenarios/level-2/provisioner-pitfalls.yaml +64 -0
  131. package/courses/terraform-infrastructure-setup/scenarios/level-2/remote-state-backend.yaml +55 -0
  132. package/courses/terraform-infrastructure-setup/scenarios/level-2/terraform-import.yaml +55 -0
  133. package/courses/terraform-infrastructure-setup/scenarios/level-2/workspace-management.yaml +51 -0
  134. package/courses/terraform-infrastructure-setup/scenarios/level-3/advanced-debugging-shift.yaml +63 -0
  135. package/courses/terraform-infrastructure-setup/scenarios/level-3/api-rate-limiting.yaml +50 -0
  136. package/courses/terraform-infrastructure-setup/scenarios/level-3/conditional-resources.yaml +66 -0
  137. package/courses/terraform-infrastructure-setup/scenarios/level-3/drift-detection.yaml +66 -0
  138. package/courses/terraform-infrastructure-setup/scenarios/level-3/dynamic-blocks.yaml +71 -0
  139. package/courses/terraform-infrastructure-setup/scenarios/level-3/large-scale-refactoring.yaml +59 -0
  140. package/courses/terraform-infrastructure-setup/scenarios/level-3/multi-provider-config.yaml +69 -0
  141. package/courses/terraform-infrastructure-setup/scenarios/level-3/state-surgery.yaml +57 -0
  142. package/courses/terraform-infrastructure-setup/scenarios/level-3/terraform-cloud-enterprise.yaml +59 -0
  143. package/courses/terraform-infrastructure-setup/scenarios/level-3/terraform-debugging.yaml +51 -0
  144. package/courses/terraform-infrastructure-setup/scenarios/level-4/blast-radius-management.yaml +51 -0
  145. package/courses/terraform-infrastructure-setup/scenarios/level-4/cicd-pipeline-design.yaml +50 -0
  146. package/courses/terraform-infrastructure-setup/scenarios/level-4/compliance-as-code.yaml +46 -0
  147. package/courses/terraform-infrastructure-setup/scenarios/level-4/cost-estimation-governance.yaml +42 -0
  148. package/courses/terraform-infrastructure-setup/scenarios/level-4/expert-debugging-shift.yaml +51 -0
  149. package/courses/terraform-infrastructure-setup/scenarios/level-4/iac-organization-strategy.yaml +45 -0
  150. package/courses/terraform-infrastructure-setup/scenarios/level-4/incident-response-iac.yaml +47 -0
  151. package/courses/terraform-infrastructure-setup/scenarios/level-4/infrastructure-testing.yaml +41 -0
  152. package/courses/terraform-infrastructure-setup/scenarios/level-4/module-registry-design.yaml +45 -0
  153. package/courses/terraform-infrastructure-setup/scenarios/level-4/multi-account-strategy.yaml +57 -0
  154. package/courses/terraform-infrastructure-setup/scenarios/level-5/board-infrastructure-investment.yaml +53 -0
  155. package/courses/terraform-infrastructure-setup/scenarios/level-5/disaster-recovery-iac.yaml +47 -0
  156. package/courses/terraform-infrastructure-setup/scenarios/level-5/enterprise-iac-transformation.yaml +48 -0
  157. package/courses/terraform-infrastructure-setup/scenarios/level-5/iac-technology-evolution.yaml +49 -0
  158. package/courses/terraform-infrastructure-setup/scenarios/level-5/ma-infrastructure-consolidation.yaml +54 -0
  159. package/courses/terraform-infrastructure-setup/scenarios/level-5/master-debugging-shift.yaml +53 -0
  160. package/courses/terraform-infrastructure-setup/scenarios/level-5/multi-cloud-strategy.yaml +49 -0
  161. package/courses/terraform-infrastructure-setup/scenarios/level-5/platform-engineering.yaml +47 -0
  162. package/courses/terraform-infrastructure-setup/scenarios/level-5/regulatory-compliance-automation.yaml +47 -0
  163. package/courses/terraform-infrastructure-setup/scenarios/level-5/terraform-vs-alternatives.yaml +46 -0
  164. package/dist/cli/commands/generate.d.ts.map +1 -1
  165. package/dist/cli/commands/generate.js +2 -1
  166. package/dist/cli/commands/generate.js.map +1 -1
  167. package/dist/cli/commands/train.d.ts.map +1 -1
  168. package/dist/cli/commands/train.js +6 -3
  169. package/dist/cli/commands/train.js.map +1 -1
  170. package/dist/cli/index.js +9 -6
  171. package/dist/cli/index.js.map +1 -1
  172. package/dist/cli/run-demo.js +3 -2
  173. package/dist/cli/run-demo.js.map +1 -1
  174. package/dist/engine/model-utils.d.ts +6 -0
  175. package/dist/engine/model-utils.d.ts.map +1 -1
  176. package/dist/engine/model-utils.js +28 -1
  177. package/dist/engine/model-utils.js.map +1 -1
  178. package/dist/engine/training.d.ts.map +1 -1
  179. package/dist/engine/training.js +4 -3
  180. package/dist/engine/training.js.map +1 -1
  181. package/dist/evaluator/judge.d.ts +7 -1
  182. package/dist/evaluator/judge.d.ts.map +1 -1
  183. package/dist/evaluator/judge.js +50 -11
  184. package/dist/evaluator/judge.js.map +1 -1
  185. package/dist/generator/course-generator.d.ts.map +1 -1
  186. package/dist/generator/course-generator.js +4 -3
  187. package/dist/generator/course-generator.js.map +1 -1
  188. package/dist/mcp/server.d.ts.map +1 -1
  189. package/dist/mcp/server.js +7 -3
  190. package/dist/mcp/server.js.map +1 -1
  191. package/dist/mcp/session-manager.d.ts.map +1 -1
  192. package/dist/mcp/session-manager.js +3 -2
  193. package/dist/mcp/session-manager.js.map +1 -1
  194. package/dist/types/index.d.ts +1 -1
  195. package/dist/types/index.d.ts.map +1 -1
  196. package/package.json +1 -1
@@ -0,0 +1,48 @@
+ meta:
+   id: design-pattern-feedback
+   level: 3
+   course: code-review-feedback-writing
+   type: output
+   description: "Give design pattern feedback — recognize pattern opportunities and anti-patterns in code reviews and explain when patterns help vs when they over-engineer"
+   tags: [code-review, design-patterns, anti-patterns, refactoring, over-engineering, advanced]
+
+ state: {}
+
+ trigger: |
+   You're reviewing code that could benefit from design patterns, but
+   you need to be careful — not every piece of code needs a pattern.
+   You've found:
+
+   1. A 500-line switch statement handling 20 different notification
+   types — each case has similar but slightly different logic
+
+   2. A class with 15 methods that all start by loading user data,
+   checking permissions, and logging — massive duplication
+
+   3. Code that creates database connections directly in 30 different
+   places instead of using a connection pool
+
+   4. A simple CRUD endpoint where someone suggests "we should add
+   a Strategy pattern for the create operation"
+
+   5. Three layers of abstraction for a function that's called in
+   exactly one place
+
+   Task: For each situation, decide: does it need a pattern, which
+   pattern, or is the current approach fine? Write review comments
+   that teach WHEN patterns are appropriate, not just WHAT patterns
+   exist. Call out over-engineering as firmly as under-engineering.
+
+ assertions:
+   - type: llm_judge
+     criteria: "Pattern recommendations are justified with concrete benefits — (1) Switch statement: Strategy or Command pattern — define a NotificationHandler interface, create EmailHandler, SMSHandler, PushHandler. Benefit: each handler is independently testable, new notification types don't modify existing code. Show the refactored structure. (2) Repeated setup: Template Method or middleware/decorator pattern — extract loadUser(), checkPermission(), logAction() into middleware that runs before each method. Benefit: DRY, single place to modify auth logic. (3) Direct connections: use connection pool (singleton or factory pattern). Benefit: resource management, connection reuse, centralized configuration. Each recommendation includes: what pattern, why it helps HERE, and a code sketch showing the result"
+     weight: 0.35
+     description: "Justified patterns"
+   - type: llm_judge
+     criteria: "Over-engineering is called out with equal conviction — (4) Strategy for simple CRUD: 'This endpoint creates a record in a table — a Strategy pattern adds 3 files and an interface for something that's 10 lines of straightforward code. The Strategy pattern is valuable when you have multiple interchangeable algorithms, not for a single code path that does one thing. YAGNI — You Aren't Gonna Need It. If we need different creation strategies later, we can refactor then.' (5) Three abstraction layers: 'This function is called in one place. The abstraction adds cognitive overhead — a reader must trace through 3 files to understand what happens. Inline the logic into the caller. Abstractions are earned by duplication, not created in anticipation.' Equal firmness: over-engineering wastes time and adds complexity just as under-engineering creates duplication"
+     weight: 0.35
+     description: "Over-engineering callout"
+   - type: llm_judge
+     criteria: "Guidelines for WHEN to apply patterns are practical — rules of thumb: (1) Three strikes rule — the first time, just write the code; the second time, note the duplication; the third time, extract a pattern. (2) Patterns are answers to specific problems — if you can't name the problem, you don't need the pattern. (3) Test: would a new team member understand the code better WITH or WITHOUT the pattern? If the pattern adds confusion, skip it. (4) Patterns should reduce code, not increase it — if Strategy adds 100 lines to save 20 lines of duplication, it's not worth it yet. (5) Favor simple composition over complex inheritance. Anti-patterns to watch for: 'Abstract Factory for one implementation', 'Singleton just because', 'Observer for two components that could just call each other.' Maturity signal: knowing when NOT to use a pattern is more valuable than knowing patterns"
+     weight: 0.30
+     description: "When to apply"
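The first assertion above asks for a NotificationHandler interface with EmailHandler, SMSHandler, and PushHandler replacing the switch statement. A minimal TypeScript sketch of that structure might look like the following. The `Notification` shape, the `handlers` registry, the `dispatch` function, and the message formats are all assumptions made for illustration; none of them appear in the package.

```typescript
// Hypothetical sketch of the Strategy refactor named in the assertion.
// Only the interface and handler class names come from the scenario text;
// every field, function, and return format here is an illustrative assumption.
interface Notification {
  type: string;
  recipient: string;
  body: string;
}

interface NotificationHandler {
  send(notification: Notification): string;
}

class EmailHandler implements NotificationHandler {
  send(notification: Notification): string {
    return `email to ${notification.recipient}: ${notification.body}`;
  }
}

class SMSHandler implements NotificationHandler {
  send(notification: Notification): string {
    // 160 characters is an illustrative limit, not a value from the scenario.
    return `sms to ${notification.recipient}: ${notification.body.slice(0, 160)}`;
  }
}

class PushHandler implements NotificationHandler {
  send(notification: Notification): string {
    return `push to ${notification.recipient}: ${notification.body}`;
  }
}

// The registry replaces the switch: adding a type means adding one entry,
// without modifying the existing handlers.
const handlers = new Map<string, NotificationHandler>([
  ["email", new EmailHandler()],
  ["sms", new SMSHandler()],
  ["push", new PushHandler()],
]);

function dispatch(notification: Notification): string {
  const handler = handlers.get(notification.type);
  if (handler === undefined) {
    throw new Error(`No handler registered for notification type: ${notification.type}`);
  }
  return handler.send(notification);
}
```

Each handler can be unit-tested on its own, which is the benefit the assertion names. For a single code path, such as the CRUD endpoint in situation 4, the same structure would only add indirection.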
@@ -0,0 +1,46 @@
+ meta:
+   id: mentoring-through-review
+   level: 3
+   course: code-review-feedback-writing
+   type: output
+   description: "Mentor through code review — use reviews as teaching opportunities that develop junior developers' skills while maintaining code quality"
+   tags: [code-review, mentoring, teaching, junior-developers, growth, advanced]
+
+ state: {}
+
+ trigger: |
+   You're reviewing code from a junior developer (3 months on the team).
+   Their PR implements a user notification preferences feature. The code
+   works but has several learning opportunities:
+
+   1. They wrote 200 lines of procedural code in one file instead of
+   using the service pattern the team follows
+   2. They used string comparison for notification types instead of
+   the existing NotificationType enum
+   3. They implemented their own email queue instead of using the
+   existing NotificationQueue service
+   4. Error handling catches everything and returns a generic 500
+   5. Tests are tightly coupled to implementation (mock everything)
+   6. Good: the database migration is clean and well-structured
+   7. Good: they wrote thorough documentation for the new endpoints
+
+   They're eager to learn but past harsh reviews have made them
+   defensive. How do you turn this review into a growth opportunity?
+
+   Task: Write the complete mentoring review. For each issue, teach
+   the principle (not just the fix), connect to team patterns, and
+   offer to pair. Make the developer feel supported, not criticized.
+
+ assertions:
+   - type: llm_judge
+     criteria: "Each comment teaches a principle, not just a fix — (1) Service pattern: explain WHY the team uses services (testability, reuse, separation of concerns) — not just 'move this to a service.' Show how the existing OrderService follows this pattern as a reference. (2) Enum: explain WHY enums exist (typo prevention, IDE autocomplete, refactoring safety) — show what happens when someone types 'emal' instead of 'email'. (3) Existing queue: explain HOW to discover existing utilities (search patterns, team wiki, ask in Slack) — the skill is knowing to look, not just what to use. (4) Error handling: explain the difference between expected errors (user not found) and unexpected errors (database crash) — each needs different handling. (5) Test coupling: explain what happens when implementation changes (all tests break even though behavior is correct) — suggest testing behavior: 'when user disables email notifications, emails stop being sent'"
+     weight: 0.35
+     description: "Principle teaching"
+   - type: llm_judge
+     criteria: "Tone supports growth without lowering the bar — start with genuine praise: 'The migration is really clean — especially the default values and constraints. And the endpoint documentation is better than what most senior developers write. These are the things that matter for long-term code quality.' For issues: frame as learning, not mistakes: 'This is a great opportunity to learn about our service pattern — it's not obvious when you're new and it took me a while to internalize it too.' Normalize the learning process: 'When I joined, I wrote a custom email sender too before discovering we had NotificationQueue — it's hard to know what exists in a large codebase.' Offer help: 'Want to pair on extracting this into a service? I can show you the pattern and you can drive.' Encourage: 'Your code logic is solid — the refactoring is mostly about fitting into team patterns, which takes time to learn.'"
+     weight: 0.35
+     description: "Supportive tone"
+   - type: llm_judge
+     criteria: "Review develops the developer's independence — don't just give answers, teach how to find them: 'Next time you need to handle notifications, try searching for NotificationType in the codebase — you'll find the enum and the existing service. Pro tip: `grep -r NotificationType src/` is my go-to for discovering existing patterns.' Suggest a discovery process: 'Before building a new utility, check: (1) search the codebase for similar patterns, (2) check the team wiki under Shared Services, (3) ask in #eng-questions — someone might have built it.' Set expectations for growth: 'For your next PR, try to identify one existing utility you can reuse instead of building from scratch.' Don't over-correct: 'I'm suggesting improvements for several areas, but you don't need to address everything in this PR. Focus on the service extraction (#1) and enum usage (#2) — those are the highest-impact changes. The rest can be follow-up work.'"
+     weight: 0.30
+     description: "Building independence"
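The first assertion above asks the reviewer to show what happens when someone types 'emal' instead of 'email'. One way to illustrate that point is sketched below in TypeScript. The members of `NotificationType` and the helpers `isEmailByString`, `isEmailByEnum`, and `parseNotificationType` are assumptions; the scenario names the enum but does not define it.

```typescript
// Hypothetical sketch: the scenario mentions a NotificationType enum but never
// shows it, so the members and helper functions below are assumptions.
enum NotificationType {
  Email = "email",
  Sms = "sms",
  Push = "push",
}

// String comparison: the typo compiles and the check silently never matches.
function isEmailByString(type: string): boolean {
  return type === "emal";
}

// Enum comparison: writing NotificationType.Emal here would fail to compile,
// so the typo is caught before the code runs.
function isEmailByEnum(type: NotificationType): boolean {
  return type === NotificationType.Email;
}

const KNOWN_TYPES: readonly NotificationType[] = [
  NotificationType.Email,
  NotificationType.Sms,
  NotificationType.Push,
];

// Untrusted input (for example a request body) is validated once at the
// boundary, so a misspelled value fails loudly instead of being ignored.
function parseNotificationType(raw: string): NotificationType {
  const match = KNOWN_TYPES.find((known) => known === raw);
  if (match === undefined) {
    throw new Error(`Unknown notification type: ${raw}`);
  }
  return match;
}
```

In this sketch the string version returns false for a valid "email" value and nothing reports the mistake, which is the failure mode the assertion wants the review comment to demonstrate.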
@@ -0,0 +1,42 @@
+ meta:
+   id: production-incident-review
+   level: 3
+   course: code-review-feedback-writing
+   type: output
+   description: "Review incident fix PRs — evaluate hotfix code for correctness, root cause coverage, regression prevention, and appropriate urgency without cutting corners"
+   tags: [code-review, incident, hotfix, production, root-cause, regression, advanced]
+
+ state: {}
+
+ trigger: |
+   There's an active production incident. A developer submits a PR
+   tagged "HOTFIX" to resolve it. The incident: users intermittently
+   see another user's order history when clicking "My Orders."
+
+   The PR (45 lines):
+   - Adds a user ID filter to the orders query that was missing
+   - The fix is in the caching layer — orders were cached by page
+   URL without including the user ID in the cache key
+   - The author also "cleaned up" some unrelated code in the same file
+   - No tests added ("we need to ship this fast")
+   - The fix handles the cache key but doesn't invalidate existing
+   incorrect cache entries
+
+   Task: Review this hotfix balancing urgency with safety. Write your
+   review addressing: is the fix correct, does it fully resolve the
+   issue, what's the risk of shipping vs not shipping, and what
+   follow-up work is needed. This is a time-sensitive review.
+
+ assertions:
+   - type: llm_judge
+     criteria: "Fix correctness is evaluated against the root cause — 'The cache key fix is correct — cache_key = /orders?page=1 should be cache_key = /orders?page=1&user_id=123. This prevents NEW incorrect entries. However, EXISTING cached entries still contain mixed user data. Risk: users continue seeing wrong data until those cache entries expire.' Critical missing piece: 'We need to either (1) flush the orders cache entirely, or (2) add cache invalidation for affected keys. Without this, the fix only prevents future occurrences — existing cached wrong data persists for [TTL duration].' Assessment: 'The fix is 80% correct — it prevents the vulnerability from recurring but doesn't clean up existing corrupted cache entries.'"
+     weight: 0.35
+     description: "Fix correctness"
+   - type: llm_judge
+     criteria: "Review balances urgency with necessary safety — decision framework: 'Ship immediately with cache flush' not 'Wait for perfect fix.' Specific recommendation: (1) Remove the unrelated code cleanup — hotfix PRs should ONLY contain the fix (reduces risk, simplifies rollback). (2) Add cache flush/invalidation — without this, the data leak continues from cache. (3) Tests can be a fast follow-up PR — but must be done today, not 'later.' (4) Approve once: unrelated changes removed AND cache invalidation added. Urgency acknowledgment: 'This is a data privacy incident — users seeing other users' data. Speed matters. But shipping a partial fix that still leaks cached data isn't actually faster — it's a second incident.'"
+     weight: 0.35
+     description: "Urgency balance"
+   - type: llm_judge
+     criteria: "Follow-up work is clearly scoped — immediate (ship with hotfix): cache key fix + cache flush + remove unrelated changes. Today: add regression test that verifies cache keys include user ID. This week: (1) audit all other cache keys for similar user-scoping issues, (2) add cache key logging/monitoring to detect future issues, (3) consider cache key builder utility that enforces user scoping by default. Post-incident: (1) retrospective to understand why the original code didn't include user ID (was it a refactor regression? was it never there?), (2) review process improvement — how did the original PR pass review without user scoping? (3) postmortem document. Frame follow-up as learning: 'The cache key pattern should be documented as a team standard — this is how bugs become institutional knowledge.'"
+     weight: 0.30
+     description: "Follow-up scoping"
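The last assertion above suggests a cache key builder utility that enforces user scoping by default. A rough TypeScript sketch of such a helper follows. The function name `buildUserScopedCacheKey`, its signature, and the sorted-query key format are assumptions chosen to reproduce the `/orders?page=1&user_id=123` example from the first assertion; the scenario does not prescribe an implementation.

```typescript
// Hypothetical helper: the scenario only suggests "a cache key builder utility
// that enforces user scoping by default". The name, signature, and key format
// are assumptions for illustration.
function buildUserScopedCacheKey(
  path: string,
  userId: string,
  params: Record<string, string> = {},
): string {
  if (userId.trim() === "") {
    throw new Error("userId is required for user-scoped cache keys");
  }
  // user_id is written last so a caller-supplied "user_id" param cannot override it.
  const all: Record<string, string> = { ...params, user_id: userId };
  // Sorting the keys makes the cache key independent of parameter order.
  const query = Object.keys(all)
    .sort()
    .map((key) => `${encodeURIComponent(key)}=${encodeURIComponent(String(all[key]))}`)
    .join("&");
  return `${path}?${query}`;
}
```

Because the user ID is a required argument, a caller cannot build an orders key without it. A regression test of the kind the assertion asks for could then check that two different users never receive the same key for the same page.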
@@ -0,0 +1,47 @@
+ meta:
+ id: reviewing-senior-code
+ level: 3
+ course: code-review-feedback-writing
+ type: output
+ description: "Review senior engineer's code — navigate the dynamics of providing feedback to more experienced developers while maintaining quality standards"
+ tags: [code-review, senior-engineer, dynamics, confidence, speaking-up, advanced]
+
+ state: {}
+
+ trigger: |
+ You're a mid-level engineer asked to review a PR from the team's
+ most senior developer (10 years experience, built the core system).
+ Their PR refactors the authentication middleware — 250 lines.
+
+ You find several concerns:
+ 1. A race condition when refreshing OAuth tokens concurrently
+ 2. The new middleware doesn't handle the edge case of expired
+ refresh tokens (the old code did)
+ 3. Magic numbers: timeout values without named constants
+ 4. No tests for the concurrent token refresh scenario
+ 5. The refactor changes behavior — previously failed silently,
+ now throws (affects all callers)
+
+ You're nervous because:
+ - This person is far more experienced than you
+ - They might know something you don't
+ - You've seen them dismiss junior reviewers before
+ - But you're genuinely concerned about #1 and #2
+
+ Task: Write the review navigating the power dynamic. Show how to
+ give feedback to someone more senior while being confident in your
+ observations. Write a guide for reviewing up.
+
+ assertions:
+ - type: llm_judge
+ criteria: "Technical concerns are communicated with appropriate confidence — for genuine bugs (#1, #2): be direct but frame as observations, not accusations. 'I traced through the concurrent token refresh path and I believe there's a race condition: if two requests trigger a refresh simultaneously, both will attempt to write the new token. The old code used a mutex here (auth.ts:89 in the previous version) — was the removal intentional?' For the behavior change (#5): 'I noticed the error handling shifted from silent failure to throwing. Since this middleware is used by all 40 endpoints, this changes behavior for every request — was this intentional? If so, should we coordinate the rollout?' Confidence where warranted: 'The race condition is a real concern regardless of experience level — concurrent access bugs don't care about seniority.'"
+ weight: 0.35
+ description: "Confident feedback"
+ - type: llm_judge
+ criteria: "Power dynamic is navigated without being deferential or aggressive — don't start with: 'I might be wrong but...' or 'I'm sure you thought of this...' (undermines your review). Don't say: 'This is obviously wrong' (confrontational with senior). Do: state observations factually. 'Lines 45-60: when two concurrent requests both have expired tokens, the first refresh succeeds but the second attempts to refresh with the already-used refresh token, which will fail.' Ask genuine questions for things you're unsure about: 'I see the silent failure was changed to throwing — is there a design document for this change? I want to understand the migration plan for callers.' Nits (#3) are fine to raise normally — everyone gets nits. The key: treat the code the same regardless of who wrote it"
+ weight: 0.35
+ description: "Dynamic navigation"
+ - type: llm_judge
+ criteria: "Guide for reviewing up is practical — principles: (1) Your job is to review the CODE, not the PERSON. Experience doesn't make code bug-free. (2) If you see a potential bug, say so — you were asked to review for a reason. (3) Frame uncertain observations as questions ('Could this cause X?'), but frame clear issues as statements ('This race condition will cause Y'). (4) Don't pre-apologize for having feedback — it undermines the review process. (5) Senior engineers generally WANT honest review — they chose you as a reviewer. (6) If dismissed unfairly, escalate to your manager — review quality is everyone's responsibility. Common mistake: being so deferential that you provide a useless rubber-stamp review. The worst outcome isn't disagreeing with a senior — it's a production bug you saw but didn't mention"
+ weight: 0.30
+ description: "Reviewing up guide"
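The race in concern #1 (two concurrent requests both spending the same refresh token) is easier to discuss in a review with a concrete shape in mind. One common mitigation is to let concurrent callers share a single in-flight refresh. The sketch below assumes a hypothetical `TokenRefresher` class and an injected `refreshFn`; it is not the mutex from the earlier `auth.ts` that the scenario mentions, whose code is not shown.

```typescript
type TokenPair = { accessToken: string; refreshToken: string };
type RefreshFn = (refreshToken: string) => Promise<TokenPair>;

// Hypothetical single-flight wrapper: while one refresh is running, every
// other caller receives the same promise instead of starting a second one.
class TokenRefresher {
  private current: TokenPair;
  private readonly refreshFn: RefreshFn;
  private inFlight: Promise<TokenPair> | null = null;

  constructor(initial: TokenPair, refreshFn: RefreshFn) {
    this.current = initial;
    // Assumed to call the OAuth token endpoint; injected so tests can fake it.
    this.refreshFn = refreshFn;
  }

  refresh(): Promise<TokenPair> {
    if (this.inFlight === null) {
      this.inFlight = this.refreshFn(this.current.refreshToken).then(
        (next) => {
          this.current = next;
          this.inFlight = null; // allow the next refresh cycle
          return next;
        },
        (err) => {
          this.inFlight = null; // allow a retry after a failed refresh
          throw err;
        },
      );
    }
    return this.inFlight;
  }
}
```

This only coordinates callers inside one process. If the middleware runs on several instances, the same race can still occur across them and would need a shared lock or a token endpoint that tolerates reuse.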
@@ -0,0 +1,43 @@
+ meta:
+ id: reviewing-unfamiliar-code
+ level: 3
+ course: code-review-feedback-writing
+ type: output
+ description: "Review unfamiliar code — develop strategies for reviewing code in codebases you don't know well, asking the right questions, and still providing value"
+ tags: [code-review, unfamiliar-codebase, learning, questions, onboarding, advanced]
+
+ state: {}
+
+ trigger: |
+ You just joined a new team and are asked to review PRs on day one.
+ You don't know:
+ - The architecture or how services communicate
+ - Team conventions (naming, patterns, error handling approach)
+ - Business domain (insurance claims processing)
+ - Why certain design decisions were made
+ - What the testing expectations are
+
+ The PR adds a new "claims escalation" feature — 200 lines across
+ 5 files. You can read the code but you're missing context for many
+ decisions.
+
+ Task: Write a review that provides value despite limited context.
+ Show how to: identify universal quality issues (any codebase),
+ ask context-gathering questions that also serve as review, learn
+ the codebase through reviewing, and be honest about what you can
+ and can't evaluate. Write the review comments and a reflection
+ on your review strategy.
+
+ assertions:
+ - type: llm_judge
+ criteria: "Universal quality issues are identified — regardless of domain knowledge, catch: (1) error handling gaps (catch block that swallows errors), (2) missing input validation, (3) potential null/undefined references, (4) SQL injection or security concerns, (5) dead code or unused variables, (6) inconsistent patterns within the PR itself (does the PR's own code follow its own conventions?). (7) Test quality: are tests testing behavior or implementation? Do tests have meaningful names? (8) Code clarity: are variable/function names descriptive? Is the control flow easy to follow? These issues are valid in any codebase and show the reviewer is thorough despite being new"
+ weight: 0.35
+ description: "Universal issues"
+ - type: llm_judge
+ criteria: "Questions serve double duty as review and learning — questions that are both genuine learning AND review prompts: 'I notice ClaimEscalation inherits from BaseProcessor — could you point me to where BaseProcessor is defined? I want to understand the lifecycle hooks.' This helps the reviewer learn AND surfaces whether the inheritance is appropriate. 'What triggers a claim to enter the escalation queue? I want to make sure the conditions in isEligibleForEscalation() cover all cases.' Asks for business context AND validates the logic. 'I see we're using Redis for the escalation queue — is there a reason we're not using the existing RabbitMQ setup I saw in other services?' Learns architecture AND questions the design choice. Each question shows the reviewer is engaged and thinking, not just rubber-stamping"
+ weight: 0.35
+ description: "Dual-purpose questions"
+ - type: llm_judge
+ criteria: "Honest scope disclosure builds trust — opening comment: 'Disclosure: I'm new to this codebase and the insurance domain. I focused my review on code quality, error handling, and testing patterns. I can't yet validate the business logic or architectural fit — I'd recommend a domain expert also reviews the escalation rules.' Specific limitations: 'I'm not sure if the 72-hour escalation window is a business requirement or an arbitrary choice — if it's business-critical, it should be a named constant with a comment explaining the requirement.' Learning-through-review: 'This review helped me understand how claims flow through the system. I documented what I learned in case it's useful: [brief architecture note].' Value: even as a newcomer, catching a null reference exception or a missing error handler is valuable — don't apologize for what you can contribute"
+ weight: 0.30
+ description: "Honest scope"
@@ -0,0 +1,46 @@
+ meta:
+ id: speed-vs-thoroughness
+ level: 3
+ course: code-review-feedback-writing
+ type: output
+ description: "Balance review speed and thoroughness — develop strategies for providing timely reviews without sacrificing quality for different types of PRs"
+ tags: [code-review, speed, thoroughness, triage, time-management, advanced]
+
+ state: {}
+
+ trigger: |
+ You have 6 PRs in your review queue, all assigned to you. Your
+ team's SLA is 24 hours for first review. It's 4pm and these came
+ in today:
+
+ 1. Hotfix: Production bug — users seeing 500 errors on checkout
+ (15 lines, high urgency)
+ 2. Feature: New dashboard analytics (350 lines, normal priority,
+ sprint deadline Friday)
+ 3. Dependency update: Bump React from 18.2 to 18.3 (5 files changed,
+ low priority)
+ 4. Refactor: Extract common validation logic into shared module
+ (200 lines, medium priority)
+ 5. Junior dev's PR: First feature implementation, they're blocked
+ waiting for review (100 lines, medium urgency due to person)
+ 6. Tech debt: Remove deprecated API endpoints (150 lines, low priority)
+
+ You have 2 hours of review time today.
+
+ Task: Create a review strategy for all 6 PRs. Show how to triage,
+ what depth of review each gets, write example reviews at different
+ thoroughness levels, and explain your prioritization reasoning.
+
+ assertions:
+ - type: llm_judge
+ criteria: "Triage ordering is well-reasoned — priority: (1) Hotfix — production impact, review in next 15 minutes. (2) Junior dev's PR — person is blocked, reviewing unblocks them (invest 30 minutes). (3) Feature — sprint deadline pressure (invest 45 minutes). (4) Refactor — medium priority, affects shared code quality (invest 20 minutes). (5) Dependency update — low risk if CI passes, quick scan (10 minutes). (6) Tech debt — no deadline, can slip to tomorrow (skip or quick scan). Reasoning: combine urgency (production > blocked person > deadline) with risk (shared code > isolated change > dependency bump). The junior dev is prioritized higher than strict urgency would suggest because blocking people is more costly than blocking code"
+ weight: 0.35
+ description: "Triage ordering"
+ - type: llm_judge
+ criteria: "Review depth varies appropriately per PR — Hotfix (15 min): verify the fix addresses the error, check for regression risk, don't nit on style — approve fast with 'LGTM, fix looks correct. Let's clean up styling in a follow-up.' Feature (45 min): full review — architecture, security, tests, performance. This is the big one that justifies thorough review. Junior PR (30 min): mentor-focused review — teach patterns, be encouraging, unblock quickly even if follow-up work is needed. Refactor (20 min): focus on interface changes and backward compatibility — internal implementation can be reviewed more lightly. Dependency (10 min): check CI passes, scan changelog for breaking changes, verify no deprecated API usage. Tech debt (5 min or skip): quick scan for accidental behavior changes, or leave comment 'I'll review this first thing tomorrow.' Each depth level is explicitly justified"
+ weight: 0.35
+ description: "Varying depth"
+ - type: llm_judge
+ criteria: "Example reviews demonstrate different thoroughness levels — quick review (hotfix): 'Verified: the fix adds null check for user.cart before accessing .items. This matches the error in Sentry (TypeError: Cannot read property items of null). CI passes. Approved — ship it.' Medium review (refactor): summary of approach, 2-3 specific comments on interface design, one question about edge case, approve with suggestions. Deep review (feature): structured thematic review, 8-10 comments across architecture/security/testing, request changes on 2 blocking items. Honest scoping: when you do a lighter review, say so: 'I did a quick review focused on the API interface — I'd recommend someone else gives the internal logic a deeper look.' Better to do a timeboxed review and disclose the scope than to skim everything and pretend it was thorough"
+ weight: 0.30
+ description: "Depth examples"
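The time allocations in the triage criteria add up to exactly the two available hours (15 + 30 + 45 + 20 + 10 = 120 minutes), which is why the tech-debt PR slips to tomorrow. A minimal sketch of that budgeting follows; the `QueuedPr` type and `planReviews` function are hypothetical, and the priorities and minutes are copied from the criteria above.

```typescript
type QueuedPr = { name: string; priority: number; minutes: number };

// Walk the queue in priority order and schedule each PR that still fits in
// the remaining budget; anything that does not fit is deferred.
function planReviews(queue: QueuedPr[], budgetMinutes: number) {
  const scheduled: string[] = [];
  const deferred: string[] = [];
  let remaining = budgetMinutes;
  const ordered = [...queue].sort((a, b) => a.priority - b.priority);
  for (const pr of ordered) {
    if (pr.minutes <= remaining) {
      scheduled.push(pr.name);
      remaining -= pr.minutes;
    } else {
      deferred.push(pr.name);
    }
  }
  return { scheduled, deferred, remaining };
}

const reviewQueue: QueuedPr[] = [
  { name: "feature", priority: 3, minutes: 45 },
  { name: "hotfix", priority: 1, minutes: 15 },
  { name: "tech-debt", priority: 6, minutes: 5 },
  { name: "junior-pr", priority: 2, minutes: 30 },
  { name: "dependency-update", priority: 5, minutes: 10 },
  { name: "refactor", priority: 4, minutes: 20 },
];

const reviewPlan = planReviews(reviewQueue, 120);
// reviewPlan.scheduled holds the first five PRs in priority order;
// reviewPlan.deferred is ["tech-debt"] and reviewPlan.remaining is 0.
```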
@@ -0,0 +1,44 @@
+ meta:
+ id: automated-review-strategy
+ level: 4
+ course: code-review-feedback-writing
+ type: output
+ description: "Design automated review tooling — determine what to automate vs keep as human review, select tools, and integrate automation into the review workflow"
+ tags: [code-review, automation, linting, static-analysis, CI, tooling, expert]
+
+ state: {}
+
+ trigger: |
+ Your team spends too much time on mechanical review tasks. You want
+ to automate what you can so humans focus on what matters. Current
+ review pain points:
+
+ - 40% of comments are style/formatting (Prettier/ESLint should handle)
+ - Security vulnerabilities sometimes slip through (Snyk/Semgrep could catch)
+ - Type errors found in review (TypeScript strict mode would prevent)
+ - Dead code and unused imports flagged manually
+ - Test coverage gaps discovered during review
+ - API breaking changes not noticed until production
+ - Documentation staleness (code changed, docs not updated)
+
+ Budget: $500/month for tooling. Team: 30 engineers.
+
+ Task: Design the automated review tooling strategy. For each pain
+ point, decide: automate fully, automate as advisory, or keep human.
+ Include: tool selection, CI/CD integration, configuration approach,
+ and how to introduce automation without overwhelming developers
+ with noise.
+
+ assertions:
+ - type: llm_judge
+ criteria: "Each pain point has a clear automation decision — automate fully (block PR if failing): formatting (Prettier — auto-fix on commit), linting (ESLint — errors block, warnings advisory), type checking (TypeScript strict — blocks build), unused imports (ESLint rule — auto-fix). Automate as advisory (comment on PR, don't block): security scanning (Semgrep/CodeQL — flag but human validates severity), test coverage (report delta — don't block on arbitrary threshold), dead code detection (report but human decides if intentional). Keep human: architectural review, business logic validation, API design review, documentation quality (detect CHANGE but human reviews quality). Each decision justified: 'Formatting is objectively correct — no human judgment needed. Security findings require context — is this a real vulnerability or a false positive?'"
+ weight: 0.35
+ description: "Automation decisions"
+ - type: llm_judge
+ criteria: "Tool selection is specific and budget-conscious — within $500/month: (1) Prettier + ESLint: free, handles formatting and code quality. (2) TypeScript strict mode: free, catches type errors. (3) GitHub Actions: included in GitHub, runs CI checks. (4) Semgrep: free tier covers security scanning. (5) Codecov: free for open source, ~$100/month for private — coverage delta reporting. (6) Danger JS or similar: free, custom PR automation (check for docs updates when API files change, check for changelog when version bumps). (7) Spectral: free, OpenAPI linting for API breaking changes. Total: ~$100-200/month, well under budget. Alternative: GitHub Advanced Security for CodeQL + secret scanning + dependency review (priced per active committer, so confirm the cost for 30 engineers fits the $500/month budget). Configuration: use shared config packages so all 30 engineers have consistent setup"
+ weight: 0.35
+ description: "Tool selection"
+ - type: llm_judge
+ criteria: "Rollout strategy prevents automation fatigue — phase 1 (week 1-2): Prettier auto-format only (zero noise — just fixes things). Phase 2 (week 3-4): ESLint errors only (not warnings — configure strictly to avoid noise). Phase 3 (month 2): Security scanning as advisory comments. Phase 4 (month 3): Coverage reporting and API change detection. Key principle: 'Every automated check must be actionable. If developers start ignoring bot comments, the automation is noise, not signal.' Configuration: tune tools before enabling — run against existing codebase, fix existing violations, THEN enable as blocking. False positive budget: 'If a tool produces more than 10% false positives, fix configuration or remove the tool. Developer trust in automation is fragile.' Gradual: introduce one tool at a time, measure impact, adjust before adding the next"
+ weight: 0.30
+ description: "Rollout strategy"
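The split between blocking and advisory checks, and the 10% false positive budget from the rollout criteria, can be made concrete with a small sketch. The types and function names below (`CheckResult`, `evaluateChecks`, `exceedsFalsePositiveBudget`) are hypothetical and do not correspond to any CI product's API.

```typescript
type CheckMode = "blocking" | "advisory";
type CheckResult = { check: string; mode: CheckMode; passed: boolean };

// Only failed blocking checks stop the merge; failed advisory checks are
// surfaced as comments for a human to judge.
function evaluateChecks(results: CheckResult[]) {
  const failed = results.filter((r) => !r.passed);
  const blockers = failed.filter((r) => r.mode === "blocking").map((r) => r.check);
  const advisories = failed.filter((r) => r.mode === "advisory").map((r) => r.check);
  return { mergeable: blockers.length === 0, blockers, advisories };
}

// Flag a tool for retuning or removal once more than `budget` (10% by
// default) of its findings turn out to be false positives.
function exceedsFalsePositiveBudget(findings: number, falsePositives: number, budget = 0.1): boolean {
  if (findings === 0) return false;
  return falsePositives / findings > budget;
}
```

Keeping the mode as data rather than hard-coding it per tool makes the phased rollout simple to express: a check can start as advisory and be switched to blocking once existing violations are fixed.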
@@ -0,0 +1,46 @@
+ meta:
+ id: expert-review-shift
+ level: 4
+ course: code-review-feedback-writing
+ type: output
+ description: "Combined expert shift — redesign an organization's code review program covering culture, process, automation, metrics, and training for a growing engineering team"
+ tags: [code-review, combined, shift-simulation, program-design, expert]
+
+ state: {}
+
+ trigger: |
+ You've been promoted to VP of Engineering at a 100-person startup
+ (60 engineers, 8 teams). The CEO says: "Our code quality is
+ declining, deployments keep breaking, and engineers are unhappy
+ with the review process. Fix it."
+
+ Current state:
+ - No formal review process — each team does their own thing
+ - Average defect escape rate: 12% (industry target: < 5%)
+ - Deployment rollback rate: 8% (should be < 2%)
+ - Engineer satisfaction with reviews: 3.5/10
+ - 3 engineers left citing "toxic review culture" in exit interviews
+ - No automated code quality checks
+ - Average time-to-merge: 8 days (blocking feature delivery)
+
+ Budget: $50K for tooling, ability to hire 1 additional head
+
+ Task: Present your complete code review transformation plan to the
+ CEO. Include: diagnosis of current problems, 90-day action plan
+ with quick wins, 6-month target state, tooling investment, culture
+ change program, metrics dashboard, and how this connects to the
+ broader goals of code quality and engineer retention.
+
+ assertions:
+ - type: llm_judge
+ criteria: "Diagnosis connects symptoms to root causes — 12% defect escape: reviews aren't catching bugs (reviewers focused on style, not logic — or rubber-stamping). 8% rollback rate: inadequate testing requirement in reviews (tests not reviewed or not required). 3.5/10 satisfaction: combination of slow reviews (8 day merge), hostile comments (3 people left), and inconsistent standards. Toxic culture: specific reviewers giving harsh feedback without consequences — management didn't intervene. 8-day merge: no SLA, no reviewer assignment, no load balancing — PRs sit in queues. Root cause: absence of process, not malicious intent. Nobody designed the review system — it evolved haphazardly. Fix the system, not the people (though the toxic reviewer behavior needs direct intervention)"
+ weight: 0.35
+ description: "Root cause diagnosis"
+ - type: llm_judge
+ criteria: "90-day plan delivers quick wins and lasting change — days 1-30 (immediate): (1) address toxic reviewer behavior directly (1:1s, expectations, consequences), (2) implement PR template with minimum requirements, (3) add Prettier + ESLint to CI (takes style and formatting comments out of human review), (4) establish 24-hour review SLA, (5) hire: DevEx engineer to own review tooling. Days 31-60: (1) CODEOWNERS for automatic reviewer assignment, (2) security scanning in CI (Semgrep), (3) reviewer training workshop (2 hours, all engineers), (4) code review guidelines document published. Days 61-90: (1) metrics dashboard live, (2) review quality feedback loop (manager reviews reviews), (3) cross-team reviewer rotation pilot, (4) quarterly calibration exercise. Each action connected to a specific metric improvement"
+ weight: 0.35
+ description: "90-day plan"
+ - type: llm_judge
+ criteria: "6-month targets and CEO communication are compelling — 6-month targets: defect escape < 6% (from 12%), rollback < 4% (from 8%), satisfaction > 6.5/10 (from 3.5), time-to-merge < 3 days (from 8), zero departures citing review culture. Budget allocation: tooling $20K (CI/CD, security scanning, coverage), training $10K (workshops, external facilitation), DevEx hire $80K (6 months salary). ROI for CEO: each production defect costs ~$5K (investigation + fix + customer impact). Reducing defect escape from 12% to 6% on 500 deploys/year = 30 fewer defects = $150K saved. Each departed engineer costs ~$150K to replace — preventing 3 departures = $450K. Total ROI: $600K savings on $110K investment. Culture change message: 'Code review is how we maintain quality at scale. We're investing in making it a positive, effective process — not a bureaucratic checkbox.'"
+ weight: 0.30
+ description: "Targets and communication"
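The ROI figures in the criteria above can be checked step by step. The sketch below only restates the criteria's own numbers; it follows the criteria in treating the defect escape rate as defects per deploy, which is an assumption of the scenario rather than measured data, and the variable names are illustrative.

```typescript
// Inputs copied from the criteria.
const deploysPerYear = 500;
const escapeRatePctBefore = 12;
const escapeRatePctAfter = 6;
const costPerDefect = 5000;
const departuresPrevented = 3;
const costPerDeparture = 150000;

// 500 deploys * (12 - 6) percentage points = 30 fewer escaped defects.
const defectsAvoided = (deploysPerYear * (escapeRatePctBefore - escapeRatePctAfter)) / 100;
const defectSavings = defectsAvoided * costPerDefect; // 30 * 5,000 = 150,000
const retentionSavings = departuresPrevented * costPerDeparture; // 3 * 150,000 = 450,000
const totalSavings = defectSavings + retentionSavings; // 600,000

// Tooling + training + six months of DevEx salary.
const investment = 20000 + 10000 + 80000; // 110,000
const savingsPerDollar = totalSavings / investment; // roughly 5.5
```

Note that the $110K total exceeds the $50K tooling budget only because the DevEx salary comes from the separate headcount allowance; tooling and training together are $30K.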
@@ -0,0 +1,41 @@
+ meta:
+ id: review-culture-design
+ level: 4
+ course: code-review-feedback-writing
+ type: output
+ description: "Design code review culture — establish review norms, expectations, and values that create a healthy, productive review environment for the engineering team"
+ tags: [code-review, culture, norms, psychological-safety, team-health, expert]
+
+ state: {}
+
+ trigger: |
+ You're the new engineering manager for a 25-person team. The code
+ review culture has problems:
+
+ - 3 senior engineers do 80% of reviews (bottleneck + burnout)
+ - Reviews average 5 days turnaround (blocking developers for a week)
+ - Some reviewers leave 30+ comments on every PR (demoralizing)
+ - Junior developers avoid requesting review from specific seniors
+ - Two engineers had a public argument in PR comments last month
+ - No one reviews documentation PRs ("not code, not my job")
+ - Some PRs are merged with "LGTM" and no substantive review
+
+ Task: Design a complete code review culture document. Include:
+ review values and principles, expected behaviors, SLAs, reviewer
+ responsibilities, author responsibilities, conflict resolution,
+ and how to onboard new team members into the culture. This becomes
+ the team's code review charter.
+
+ assertions:
+ - type: llm_judge
+ criteria: "Values and principles address the specific problems — values: (1) Respect: critique code, never people. Public arguments in PRs are unacceptable — disagreements escalate to private discussion. (2) Shared responsibility: reviewing is everyone's job, not just seniors'. Every engineer reviews at least 2 PRs/week. (3) Timeliness: first review within 24 hours (business day). Slow reviews block people and waste context. (4) Proportionality: comment volume should match issue severity. 30 comments on a 50-line PR signals either a problematic PR (should have been discussed before writing) or an overly critical reviewer. (5) Growth orientation: reviews are opportunities for learning, not gatekeeping. (6) Comprehensiveness: documentation, tests, and configuration changes deserve review attention, not just 'code.'"
+ weight: 0.35
+ description: "Values and principles"
+ - type: llm_judge
+ criteria: "Concrete behaviors and SLAs are defined — reviewer SLA: first review pass within one business day. If you can't review, reassign within 4 hours. Author SLA: respond to review comments within 24 hours. Comment labels: require 'blocking:', 'suggestion:', 'nit:', 'question:', 'praise:' prefixes. Maximum review rounds: if a PR requires more than 2 rounds, schedule a synchronous discussion instead. PR size: aim for under 200 lines; PRs over 400 lines can be reviewed in a meeting instead. Approval requirements: 1 approval for standard changes, 2 for security/auth/payments, team lead for architectural changes. Self-review: authors should self-review their own PR before requesting review (catches obvious issues)"
+ weight: 0.35
+ description: "Behaviors and SLAs"
+ - type: llm_judge
+ criteria: "Conflict resolution and onboarding are addressed — conflict resolution: (1) technical disagreements: if 2 rounds of comments don't resolve, take it offline — DM, call, or meeting. (2) Tone issues: if a comment feels personal, assume good intent first. If pattern continues, raise with manager privately. (3) Impasse: engineering manager breaks ties. (4) Never argue in PR comments — the PR author, other reviewers, and future readers all see it. Onboarding new team members: (1) first week: shadow 3 reviews from different reviewers, (2) second week: pair-review with a senior, (3) third week: solo review with senior co-reviewer, (4) ongoing: monthly 'review quality' check-in. Measuring health: quarterly anonymous survey on review experience, track turnaround time, track comment patterns. Review the charter quarterly and update based on team feedback"
+ weight: 0.30
+ description: "Conflict and onboarding"
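The comment-label convention in the criteria above ('blocking:', 'suggestion:', 'nit:', 'question:', 'praise:') is simple enough to check mechanically, for example in a bot that reminds reviewers to label their comments. The sketch below is one possible reading; `parseLabel` is a hypothetical helper, and accepting labels case-insensitively is an assumption the criteria do not state.

```typescript
const LABELS = ["blocking", "suggestion", "nit", "question", "praise"] as const;
type Label = (typeof LABELS)[number];

// Return the label a review comment starts with, or null if the comment is
// unlabeled, uses an unknown label, or has no text after the label.
function parseLabel(comment: string): Label | null {
  const match = /^\s*([A-Za-z]+):\s*\S/.exec(comment);
  if (match === null) return null;
  const candidate = (match[1] ?? "").toLowerCase();
  for (const label of LABELS) {
    if (label === candidate) return label;
  }
  return null;
}
```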
@@ -0,0 +1,45 @@
+ meta:
+ id: review-guidelines-standards
+ level: 4
+ course: code-review-feedback-writing
+ type: output
+ description: "Create review guidelines — write comprehensive code review standards that define expectations for both authors and reviewers across the organization"
+ tags: [code-review, guidelines, standards, expectations, organization, expert]
+
+ state: {}
+
+ trigger: |
+ Your growing engineering organization (60 developers, 8 teams) needs
+ written code review guidelines. Currently, review expectations are
+ tribal knowledge — each team has different unwritten rules.
+
+ Inconsistencies across teams:
+ - Team A requires 2 approvals, Team B requires 1
+ - Team C blocks on any style issue, Team D auto-formats
+ - Some teams review architecture before coding, others review
+ only the final PR
+ - Test requirements: Team E requires 80% coverage, Team F has
+ no coverage requirement
+ - Some teams use PR templates, others have no structure
+
+ You need org-wide guidelines that are flexible enough for team
+ autonomy but consistent enough for cross-team collaboration.
+
+ Task: Write the organization's code review guidelines document.
+ Include: universal requirements (all teams must follow), recommended
+ practices (teams should customize), author responsibilities, reviewer
+ responsibilities, review scope by PR type, and escalation procedures.
+
+ assertions:
+ - type: llm_judge
+ criteria: "Universal requirements establish a quality floor — non-negotiable (all teams): (1) every production code change must be reviewed by at least 1 person who didn't write it, (2) security-sensitive code (auth, payments, PII handling) requires review by security-trained engineer, (3) database migrations require DBA or data team review, (4) review comments must use severity labels (blocking/suggestion/nit), (5) first review within one business day, (6) no self-merging except for emergencies (documented with incident link), (7) reviews must verify: correctness, test coverage for new behavior, no security regressions. Each requirement has a clear rationale explaining WHY it's non-negotiable. Exceptions process: how to request a waiver for legitimate reasons (hotfix, trivial change)"
+ weight: 0.35
+ description: "Universal requirements"
+ - type: llm_judge
+ criteria: "Author and reviewer responsibilities are explicit — author responsibilities: (1) self-review before requesting review (read your own diff), (2) write meaningful PR description (what, why, how to test), (3) keep PRs focused (one concern per PR), (4) respond to comments within 24 hours, (5) don't merge with unresolved blocking comments. Reviewer responsibilities: (1) review within 24 hours, (2) review the entire PR (don't review half and approve), (3) label comment severity, (4) provide actionable feedback (problem + suggestion), (5) approve or request changes (don't leave in liminal state), (6) re-review within 4 hours after author addresses feedback. PR templates: suggested sections — description, motivation, testing instructions, screenshots (for UI), checklist (tests added, docs updated, no secrets). Teams can customize but must include at minimum: description and testing instructions"
+ weight: 0.35
+ description: "Responsibilities"
+ - type: llm_judge
+ criteria: "Flexibility and escalation balance consistency with autonomy — team-customizable: number of required approvals (minimum 1, teams can require 2+), coverage thresholds (each team sets their own, but must have one), style/formatting rules (team choice but must be automated), PR size limits (recommended < 200 lines, teams set their own). Review scope by PR type: hotfix (correctness only — ship fast), feature (full review — architecture, tests, docs), refactor (focus on behavior preservation), dependency update (check changelog + CI). Escalation: (1) reviewer disagrees with author → discussion in comments, (2) can't resolve → bring in a third reviewer, (3) still can't resolve → team lead decides, (4) cross-team disagreement → engineering manager. Document versioning: guidelines reviewed quarterly, teams can propose changes via PR to the guidelines doc itself"
+ weight: 0.30
+ description: "Flexibility and escalation"
@@ -0,0 +1,39 @@
+ meta:
+ id: review-load-balancing
+ level: 4
+ course: code-review-feedback-writing
+ type: output
+ description: "Balance review workload — design assignment systems that distribute review load fairly, develop reviewer expertise, and prevent bottlenecks"
+ tags: [code-review, load-balancing, assignment, bottleneck, fairness, expert]
+
+ state: {}
+
+ trigger: |
+ Your 40-person engineering organization has a review load imbalance:
+
+ - 5 senior engineers review 70% of all PRs (burned out, blocking others)
+ - 15 mid-level engineers review 25% (could do more)
+ - 20 junior engineers review 5% (missing learning opportunity)
+ - Some code areas have only 1 qualified reviewer (bus factor = 1)
+ - Security-sensitive code (auth, payments) requires senior review
+ - Cross-team PRs sit in queues because no one "owns" the review
+ - Some reviewers are faster but lower quality; others are thorough but slow
+
+ Task: Design a review assignment and load balancing system. Include:
+ assignment algorithm, reviewer tiers, escalation paths, cross-
+ training plan to reduce bus factors, and how to handle different
+ review quality levels. Balance fairness, speed, and quality.
+
+ assertions:
+ - type: llm_judge
+ criteria: "Assignment system balances load with expertise — tiered system: (1) Primary reviewer (auto-assigned via round-robin within capable pool) — must be able to review the code area. (2) Secondary reviewer for critical paths (auth, payments, data migrations) — always a senior engineer. Use CODEOWNERS to define code area → reviewer pool mappings. Round-robin within each pool to distribute evenly. Capacity tracking: each reviewer has a weekly capacity (juniors: 2-3 reviews, mids: 4-5, seniors: 3-4 — seniors review fewer but harder PRs). Assignment considers current load: don't assign to someone with 3 pending reviews. Opt-out periods: reviewers can mark 'focus time' blocks where they're not assigned reviews"
+ weight: 0.35
+ description: "Assignment system"
+ - type: llm_judge
+ criteria: "Cross-training plan reduces bus factor — identify single points of failure: code areas with only 1 qualified reviewer. Training approach: (1) pair reviewing — shadow the expert for 2-3 reviews, then co-review, then solo with expert available for questions. (2) Rotate secondary reviewer assignments — mid-level engineers get assigned to unfamiliar areas with a senior co-reviewer. (3) Documentation of domain knowledge — expert creates a 'reviewer guide' for their specialty area (what to look for in payment code, common pitfalls). Target: every code area has at least 3 qualified reviewers within 6 months. Tracking: matrix of code areas × qualified reviewers, updated monthly. Incentive: reviewing unfamiliar code (with appropriate support) counts toward growth goals"
+ weight: 0.35
+ description: "Cross-training"
+ - type: llm_judge
+ criteria: "Quality differences are managed fairly — acknowledge: reviewers have different quality levels. Fast-but-shallow reviewers: good for low-risk code (documentation, minor refactors, dependency updates). Thorough-but-slow reviewers: good for complex logic, security-sensitive code. Match reviewer style to PR risk. Quality improvement: (1) monthly 'review quality' feedback — manager samples 3 reviews per engineer, provides feedback. (2) Example repository of excellent reviews — annotated examples of good comments at each severity level. (3) Pair reviewing for quality improvement — pair a fast reviewer with a thorough one. Avoid: publicly ranking review quality (shame doesn't improve quality). Do: celebrate specific excellent review comments in team channel. Escalation: when a reviewer is consistently rubber-stamping, private conversation about review expectations"
38
+ weight: 0.30
39
+ description: "Quality management"
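The assignment criteria in this scenario (round-robin within a capable pool, weekly capacity, a pending-review cap, and focus-time opt-outs) can be sketched in a few lines. This is a minimal illustration, not part of the package: the `Reviewer` fields, `MAX_PENDING`, and `pick_reviewer` are hypothetical names, and the pool is assumed to have been resolved from CODEOWNERS already.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reviewer:
    name: str
    weekly_capacity: int       # e.g. juniors 2-3, mids 4-5, seniors 3-4
    assigned_this_week: int = 0
    pending: int = 0           # reviews assigned but not yet completed
    focus_time: bool = False   # reviewer has opted out for now


# Assumption: "3 pending reviews" from the criteria is treated as a hard cap.
MAX_PENDING = 3


def pick_reviewer(pool: deque) -> Optional[Reviewer]:
    """Round-robin over the pool, skipping anyone who is unavailable."""
    for _ in range(len(pool)):
        candidate = pool[0]
        pool.rotate(-1)  # move to the back so the next call starts one further on
        available = (
            not candidate.focus_time
            and candidate.pending < MAX_PENDING
            and candidate.assigned_this_week < candidate.weekly_capacity
        )
        if available:
            candidate.pending += 1
            candidate.assigned_this_week += 1
            return candidate
    return None  # nobody in the pool has capacity; escalate or widen the pool
```

Returning `None` rather than overloading someone keeps the "capacity exhausted" case visible, which is itself a signal that the pool for that code area is too small.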
@@ -0,0 +1,39 @@
1
+ meta:
2
+ id: review-metrics
3
+ level: 4
4
+ course: code-review-feedback-writing
5
+ type: output
6
+ description: "Measure review effectiveness — define and track metrics that indicate code review quality, not just velocity"
7
+ tags: [code-review, metrics, effectiveness, quality, measurement, expert]
8
+
9
+ state: {}
10
+
11
+ trigger: |
12
+ Leadership asks: "Are our code reviews actually catching bugs, or
13
+ are we just wasting engineering time?" You need data to answer.
14
+
15
+ Available data:
16
+ - Git/GitHub: PR metadata, review comments, approval times, merge times
17
+ - Bug tracker: production bugs, their root causes, which PRs introduced them
18
+ - CI/CD: build failures, test failures, deployment rollbacks
19
+ - Surveys: developer satisfaction data (quarterly)
20
+ - Code quality tools: SonarQube, CodeClimate scores over time
21
+
22
+ Task: Design a code review metrics program. Define: what to measure,
23
+ how to measure it, what "good" looks like for each metric, how to
24
+ use metrics to improve reviews, and what metrics to NOT track (or
25
+ track privately). Avoid metrics that incentivize bad behavior.
26
+
27
+ assertions:
28
+ - type: llm_judge
29
+ criteria: "Effectiveness metrics measure outcomes, not just activity — primary: (1) Defect escape rate: percentage of production bugs that passed through code review without being caught. Target: < 5%. Measures: is review actually catching things? (2) Review-introduced improvements: how often do review comments lead to code changes that prevent future bugs? Track: comment → code change → no subsequent bug in that area. (3) Knowledge distribution: are review comments teaching? Measure: decrease in similar review comments to same developer over time (they're learning). (4) Developer growth: developers who receive reviews improve their first-submission quality over time (fewer review rounds needed). These measure the PURPOSE of code review, not just the process"
30
+ weight: 0.35
31
+ description: "Effectiveness metrics"
32
+ - type: llm_judge
33
+ criteria: "Process metrics are balanced with quality safeguards — track: time-to-first-review (SLA compliance), review rounds per PR, time-to-merge. But: pair with quality metrics to prevent gaming. If time-to-merge drops but defect escape rate rises, we're approving too fast. Review depth: average meaningful comments per review (exclude auto-generated, style comments). Review distribution: Gini coefficient of review assignments (0 = perfectly equal, 1 = one person does everything). Target: < 0.4. PR rejection rate: if reviews rarely request changes, either code quality is excellent or reviews are rubber stamps — correlate with defect escape rate to determine which. Comment resolution rate: what percentage of review comments result in code changes? If low, comments may not be actionable"
34
+ weight: 0.35
35
+ description: "Process metrics"
36
+ - type: llm_judge
37
+ criteria: "Dangerous metrics and gaming are addressed — DO NOT track (or only track privately): (1) number of comments per reviewer — incentivizes volume over quality (nitpicking to boost numbers). (2) Review speed as individual metric — incentivizes rubber-stamp approvals. (3) Bug count per developer — creates blame culture, discourages risk-taking. Instead track: bug patterns (systemic issues, not individual blame). Private metrics (manager only): individual review quality scores (are comments helpful?), reviewer response time per person (are specific people bottlenecks?). Public metrics (team level only): team defect rate, team review turnaround, team satisfaction. Gaming prevention: any metric that becomes a target will be gamed. Use metrics as signals for investigation, not as goals to optimize. 'If your metrics improve but developer satisfaction drops, the metrics are lying.'"
38
+ weight: 0.30
39
+ description: "Dangerous metrics"
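The process-metrics criteria above name the Gini coefficient of review assignments with a target below 0.4. A minimal sketch of that calculation follows, assuming the input is simply the number of reviews each reviewer handled in the period; the function name `gini` is illustrative and not taken from the package.

```python
def gini(review_counts):
    """Gini coefficient of a list of non-negative per-reviewer review counts.

    0 means reviews are spread perfectly evenly. With n reviewers the
    maximum is (n - 1) / n, which approaches 1 as the team grows.
    """
    values = sorted(review_counts)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0  # no reviewers or no reviews: treat as "no inequality"
    # Standard rank-weighted form: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum(rank * v for rank, v in enumerate(values, start=1))
    return (2 * weighted) / (n * total) - (n + 1) / n
```

For example, four reviewers with counts `[1, 2, 3, 4]` give 0.25, while `[0, 0, 0, 10]` gives 0.75, the maximum possible for a team of four.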
@@ -0,0 +1,48 @@
1
+ meta:
2
+ id: review-process-optimization
3
+ level: 4
4
+ course: code-review-feedback-writing
5
+ type: output
6
+ description: "Optimize review processes — reduce review cycle time, improve throughput, and eliminate bottlenecks while maintaining quality"
7
+ tags: [code-review, process, optimization, cycle-time, bottleneck, throughput, expert]
8
+
9
+ state: {}
10
+
11
+ trigger: |
12
+ Your engineering organization (50 developers, 6 teams) has code
13
+ review metrics that need improvement:
14
+
15
+ Current state:
16
+ - Average time-to-first-review: 3.2 days
17
+ - Average time-to-merge: 7.1 days (after review rounds)
18
+ - Average review rounds: 2.8 per PR
19
+ - 15% of PRs are abandoned after review (never merged)
20
+ - 40% of review comments are about style/formatting
21
+ - Top 5 reviewers handle 65% of all reviews
22
+ - Developer satisfaction with review process: 4.2/10
23
+
24
+ Target:
25
+ - Time-to-first-review: < 24 hours
26
+ - Time-to-merge: < 3 days
27
+ - Review rounds: < 2 per PR
28
+ - Abandoned PRs: < 5%
29
+ - Developer satisfaction: > 7/10
30
+
31
+ Task: Design the process optimization plan. Include: root cause
32
+ analysis for each metric, specific interventions, implementation
33
+ timeline, and how to measure improvement. Show the connection
34
+ between interventions and metric improvements.
35
+
36
+ assertions:
37
+ - type: llm_judge
38
+ criteria: "Root causes are diagnosed from the metrics — 3.2 day first review: top 5 reviewers are bottlenecked (65% of reviews, insufficient capacity). Fix: distribute reviews more evenly, use CODEOWNERS for auto-assignment, rotate reviewers. 7.1 day merge: 2.8 review rounds means 2-3 back-and-forth cycles. Fix: higher quality first submissions (PR templates, self-review checklist) and higher quality first reviews (one thorough pass, not incremental nitpicking). 40% style comments: humans doing work machines should do. Fix: linters + formatters in CI (ESLint, Prettier, auto-format on commit). 15% abandoned: PRs too large or misaligned with team direction. Fix: design review BEFORE coding for large features, PR size limits. Each root cause has a specific, measurable intervention"
39
+ weight: 0.35
40
+ description: "Root cause analysis"
41
+ - type: llm_judge
42
+ criteria: "Interventions are specific and connected to metrics. Intervention 1 (week 1): add ESLint/Prettier to CI pipeline → removes the style/formatting comments that make up 40% of review comments → reduces review rounds by ~1 → reduces time-to-merge by ~2 days. Intervention 2 (week 2): implement CODEOWNERS with round-robin assignment → distributes load from top 5 → reduces time-to-first-review. Intervention 3 (week 3): PR template with self-review checklist → reduces low-quality submissions → reduces first-review comments → reduces review rounds. Intervention 4 (week 4): require design doc/discussion for PRs > 300 lines → reduces abandoned PRs (alignment before coding). Intervention 5 (ongoing): review SLA dashboard visible to all → social pressure for timely reviews. Each intervention: expected metric impact, implementation effort, timeline"
43
+ weight: 0.35
44
+ description: "Specific interventions"
45
+ - type: llm_judge
46
+ criteria: "Measurement and iteration plan is rigorous — dashboard: real-time metrics for all 5 targets, broken down by team and individual. Weekly review: review metrics in team standup (not to blame, to identify bottlenecks). Monthly retrospective: which interventions are working? Which need adjustment? Leading indicators: (1) PR size distribution (smaller is better), (2) first-review quality (fewer follow-up rounds means better first reviews), (3) automated check pass rate (linter/formatter adoption). Lagging indicators: time-to-merge trend, developer satisfaction (quarterly survey). Iteration: 'We won't hit all targets immediately. The plan is: month 1 (automation + assignment), month 2 (process + templates), month 3 (culture + training). Measure after each month, adjust before proceeding.' Risk: 'Optimizing for speed can reduce quality — monitor defect escape rate alongside speed metrics'"
47
+ weight: 0.30
48
+ description: "Measurement plan"
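The risk called out in the measurement plan (speed improving while quality slips) can be turned into a simple check between two measurement periods. The sketch below is an assumption about how such a guard might look; `MonthlySnapshot` and `approving_too_fast` are hypothetical names, and a real program would likely add thresholds to ignore noise.

```python
from dataclasses import dataclass


@dataclass
class MonthlySnapshot:
    time_to_merge_days: float
    defect_escape_rate: float  # fraction between 0 and 1


def approving_too_fast(previous: MonthlySnapshot, current: MonthlySnapshot) -> bool:
    """Flag the pattern the plan warns about: merges got faster AND more defects escaped."""
    got_faster = current.time_to_merge_days < previous.time_to_merge_days
    quality_dropped = current.defect_escape_rate > previous.defect_escape_rate
    return got_faster and quality_dropped
```

A `True` result is a prompt for investigation rather than a verdict, in line with the scenario's advice to treat metrics as signals and not as goals.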