dojo.md 0.2.3 → 0.2.4

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (49)
  1. package/courses/GENERATION_LOG.md +9 -0
  2. package/courses/code-review-feedback-writing/scenarios/level-3/advanced-review-shift.yaml +48 -0
  3. package/courses/code-review-feedback-writing/scenarios/level-3/api-design-review.yaml +47 -0
  4. package/courses/code-review-feedback-writing/scenarios/level-3/database-migration-review.yaml +48 -0
  5. package/courses/code-review-feedback-writing/scenarios/level-3/design-pattern-feedback.yaml +48 -0
  6. package/courses/code-review-feedback-writing/scenarios/level-3/production-incident-review.yaml +42 -0
  7. package/courses/code-review-feedback-writing/scenarios/level-3/reviewing-senior-code.yaml +47 -0
  8. package/courses/code-review-feedback-writing/scenarios/level-3/speed-vs-thoroughness.yaml +46 -0
  9. package/courses/code-review-feedback-writing/scenarios/level-4/automated-review-strategy.yaml +44 -0
  10. package/courses/code-review-feedback-writing/scenarios/level-4/expert-review-shift.yaml +46 -0
  11. package/courses/code-review-feedback-writing/scenarios/level-4/review-culture-design.yaml +41 -0
  12. package/courses/code-review-feedback-writing/scenarios/level-4/review-guidelines-standards.yaml +45 -0
  13. package/courses/code-review-feedback-writing/scenarios/level-4/review-load-balancing.yaml +39 -0
  14. package/courses/code-review-feedback-writing/scenarios/level-4/review-metrics.yaml +39 -0
  15. package/courses/code-review-feedback-writing/scenarios/level-4/review-process-optimization.yaml +48 -0
  16. package/courses/code-review-feedback-writing/scenarios/level-4/scaling-review-process.yaml +45 -0
  17. package/courses/code-review-feedback-writing/scenarios/level-4/security-review-standards.yaml +41 -0
  18. package/courses/code-review-feedback-writing/scenarios/level-4/training-reviewers.yaml +42 -0
  19. package/courses/code-review-feedback-writing/scenarios/level-5/board-quality-metrics.yaml +44 -0
  20. package/courses/code-review-feedback-writing/scenarios/level-5/knowledge-transfer-at-scale.yaml +42 -0
  21. package/courses/code-review-feedback-writing/scenarios/level-5/ma-review-alignment.yaml +50 -0
  22. package/courses/code-review-feedback-writing/scenarios/level-5/master-review-shift.yaml +49 -0
  23. package/courses/code-review-feedback-writing/scenarios/level-5/review-competitive-advantage.yaml +48 -0
  24. package/courses/code-review-feedback-writing/scenarios/level-5/review-organizational-learning.yaml +46 -0
  25. package/courses/code-review-feedback-writing/scenarios/level-5/review-roi-analysis.yaml +51 -0
  26. package/courses/code-review-feedback-writing/scenarios/level-5/review-velocity-impact.yaml +44 -0
  27. package/courses/code-review-feedback-writing/scenarios/level-5/scaling-reviews-100-plus.yaml +45 -0
  28. package/courses/code-review-feedback-writing/scenarios/level-5/toxic-culture-transformation.yaml +46 -0
  29. package/courses/technical-rfc-writing/course.yaml +11 -0
  30. package/courses/technical-rfc-writing/scenarios/level-1/first-rfc-shift.yaml +45 -0
  31. package/courses/technical-rfc-writing/scenarios/level-1/implementation-planning.yaml +47 -0
  32. package/courses/technical-rfc-writing/scenarios/level-1/open-questions.yaml +46 -0
  33. package/courses/technical-rfc-writing/scenarios/level-1/problem-statement.yaml +41 -0
  34. package/courses/technical-rfc-writing/scenarios/level-1/proposing-solutions.yaml +49 -0
  35. package/courses/technical-rfc-writing/scenarios/level-1/rfc-structure.yaml +41 -0
  36. package/courses/technical-rfc-writing/scenarios/level-1/risks-and-mitigations.yaml +43 -0
  37. package/courses/technical-rfc-writing/scenarios/level-1/scoping-an-rfc.yaml +49 -0
  38. package/courses/technical-rfc-writing/scenarios/level-1/success-metrics.yaml +43 -0
  39. package/courses/technical-rfc-writing/scenarios/level-1/writing-for-audience.yaml +42 -0
  40. package/courses/technical-rfc-writing/scenarios/level-2/risk-assessment-matrix.yaml +43 -0
  41. package/courses/technical-rfc-writing/scenarios/level-2/technical-design-detail.yaml +42 -0
  42. package/courses/technical-rfc-writing/scenarios/level-2/trade-off-analysis.yaml +43 -0
  43. package/dist/evaluator/judge.d.ts +7 -1
  44. package/dist/evaluator/judge.d.ts.map +1 -1
  45. package/dist/evaluator/judge.js +50 -11
  46. package/dist/evaluator/judge.js.map +1 -1
  47. package/dist/types/index.d.ts +1 -1
  48. package/dist/types/index.d.ts.map +1 -1
  49. package/package.json +1 -1
@@ -533,3 +533,12 @@ Tracks all auto-generated courses for dojo.md.
  - **Scenarios**: 50 (10 per level × 5 levels)
  - **Type**: output
  - **Sources**: OpenAPI Initiative (OAS 3.1 specification), Swagger UI documentation, Redoc documentation, Stripe API documentation (industry benchmark), Twilio documentation patterns, GitHub API documentation, Diátaxis documentation framework, RFC 8594 (Deprecation header), RFC 7396 (JSON Merge Patch), Spectral linting documentation, Dredd contract testing, Schemathesis API testing, Docusaurus/Nextra static site generators, developer experience research and best practices
+
+ ### Course 55: Code Review Feedback Writing
+ - **Generated**: 2026-02-27
+ - **Category**: Writing & Communication
+ - **Directory**: `courses/code-review-feedback-writing/`
+ - **Topics researched**: clear comment writing (specific, actionable, contextual), constructive tone (collaborative, respectful, critique code not people), style vs logic severity (blocking/suggestion/nit categorization), asking effective questions (genuine vs passive-aggressive, Socratic method), providing context in comments (self-contained, references, evidence), reviewing small PRs (summary comments, inline feedback, approval decisions), nitpick etiquette (when to raise, labeling, automation alternatives), approve vs request changes (decision framework, calibrated decisions), giving meaningful praise (specific, pattern-reinforcing, authentic), reviewing complex PRs (thematic grouping, critical path identification, PR splitting), architectural feedback (design principles, proportionate alternatives, future risk), security review comments (vulnerability explanations, attack scenarios, OWASP), performance feedback (quantified impact, N+1 queries, complexity analysis), suggesting alternatives with code (before/after, trade-off analysis), reviewing tests (coverage, edge cases, naming, independence), reviewing documentation changes (accuracy, completeness, prevention), reviewing error handling (failure scenarios, resilience patterns, production impact), reviewing breaking changes (impact identification, safe migration, deployment coordination), cross-team review (different vs wrong, interface focus, collaborative communication), reviewing unfamiliar code (universal quality issues, dual-purpose questions), mentoring through review (teaching principles, supportive tone, building independence), speed vs thoroughness (triage, varying depth, honest scoping), API design review (REST conventions, response format, future compatibility), database migration review (production safety, rollback, deployment strategy), design pattern feedback (when to apply, over-engineering callout), reviewing senior engineers' code (confident feedback, power dynamics), production incident review (fix correctness, urgency balance, follow-up scoping), review culture design (values, behaviors, SLAs, conflict resolution), review process optimization (root cause analysis, interventions, measurement), review metrics (effectiveness vs activity, dangerous metrics, gaming prevention), automated review strategy (automation decisions, tool selection, rollout), review load balancing (assignment systems, cross-training, quality management), training reviewers (skill levels, curriculum, practice exercises), review guidelines standards (universal requirements, responsibilities, flexibility), security review standards (checklist by code area, training tiers, champions), scaling review process (20→50→100 milestones, transitions), review as organizational learning (knowledge capture, cross-team programs), toxic culture transformation (diagnosis, immediate interventions, 90-day program), scaling to 100+ engineers (automated systems, timezone awareness, culture preservation), review velocity impact (cost analysis, optimization, CEO recommendation), board quality metrics (business translation, financial quantification, competitive strategy), review competitive advantage (strategic narrative, recruiting, maintenance), knowledge transfer at scale (emergency 4-week plan, prevention strategy, distribution measurement), M&A review alignment (culture assessment, unified framework, migration strategy), review ROI analysis (cost model, value quantification, CFO recommendation)
+ - **Scenarios**: 50 (10 per level × 5 levels)
+ - **Type**: output
+ - **Sources**: Google Engineering Practices (code review documentation), Conventional Comments (labeling system), Stack Overflow Blog (code review best practices), Dr. Michaela Greiler (respectful review feedback research), OWASP Secure Code Review Cheat Sheet, Graphite blog (code review scaling), DORA metrics research (deployment frequency, change failure rate), psychological safety research (Amy Edmondson), code review effectiveness studies
@@ -0,0 +1,48 @@
+ meta:
+ id: advanced-review-shift
+ level: 3
+ course: code-review-feedback-writing
+ type: output
+ description: "Combined advanced shift — review a cross-team database migration PR while mentoring a junior developer, balancing speed with thoroughness during a sprint deadline"
+ tags: [code-review, combined, shift-simulation, cross-team, mentoring, advanced]
+
+ state: {}
+
+ trigger: |
+ It's Wednesday afternoon, sprint ends Friday. You have one complex
+ review situation:
+
+ A junior developer on another team wrote a PR that adds multi-tenant
+ support to the shared authentication service. Your team depends on
+ this service heavily. The PR:
+
+ - Database migration: adds tenant_id to 3 tables (users, sessions, api_keys)
+ - Auth middleware changes: adds tenant context to every request
+ - API changes: new endpoints for tenant management
+ - Breaking change: all existing API calls need an X-Tenant-ID header
+ - Tests: basic coverage, missing edge cases
+ - The junior dev is under pressure from their manager to ship by Friday
+
+ You need to:
+ 1. Review from your team's perspective (impact on your services)
+ 2. Mentor the junior developer (this is their biggest PR)
+ 3. Be thorough on security (auth changes are critical)
+ 4. Be fast enough to not block the sprint
+ 5. Navigate the cross-team politics (their manager wants quick approval)
+
+ Task: Write the complete review that balances all five concerns.
+ Include your communication strategy for the sensitive dynamics.
+
+ assertions:
+ - type: llm_judge
+ criteria: "Security concerns are non-negotiable regardless of deadline — auth middleware review: 'Tenant isolation is a security boundary — if tenant_id can be spoofed via header, any user can access any tenant's data. The X-Tenant-ID header must be validated against the authenticated user's allowed tenants, not just trusted.' Specific checks: (1) Can user A set X-Tenant-ID to tenant B's ID? (2) What happens when the header is missing? (3) What happens when an invalid tenant ID is provided? (4) Are all existing queries properly scoped by tenant_id? Migration: 'Adding tenant_id without a foreign key constraint means invalid tenant IDs can exist in the database.' Mark as blocking: 'These security items must be addressed before merge — no exceptions for sprint deadlines on authentication code.'"
+ weight: 0.35
+ description: "Security non-negotiable"
+ - type: llm_judge
+ criteria: "Impact on dependent teams is clearly communicated — breaking change assessment: 'The X-Tenant-ID header requirement is a breaking change for all consuming services. Our payments service makes 200 requests/second to auth — this will fail after deploy unless we're coordinated.' Migration plan needed: 'How is this being deployed? All consuming services need to be updated simultaneously, or: make the header optional first (default to primary tenant), deploy all consumers, then make it required.' Communication: raise in cross-team channel, not just PR comment. Specific impact: 'Our team needs 2-3 days to update our services to include X-Tenant-ID. This can't ship Friday without a backward-compatible migration path.' Propose solution: 'Make tenant_id optional with default for single-tenant mode. Phase 2 makes it required after all services migrate.'"
+ weight: 0.35
+ description: "Impact communication"
+ - type: llm_judge
+ criteria: "Mentoring and politics are handled with care — mentoring: 'This is an ambitious and important feature — the overall approach is solid. I have some security items that need to be addressed and some suggestions for safer deployment.' Don't overwhelm: separate blocking items (3-4) from suggestions (5+) from follow-up items. Offer help: 'Happy to pair on the tenant validation logic if that would help move faster.' Politics: direct message to the junior dev's manager: 'The PR is well-written but has security issues that need to be addressed — I want to help [name] fix them quickly rather than rush a merge. The breaking change also needs a migration plan that involves our team. Can we sync briefly?' Don't: approve under pressure ('it's fine, ship it'), block passive-aggressively ('I have 47 comments'), or escalate unnecessarily. Frame as: 'We're on the same team — let's ship this right.'"
+ weight: 0.30
+ description: "Mentoring and politics"
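The header-validation checks the first assertion demands can be sketched as follows — a minimal sketch with hypothetical names, assuming the authenticated user's tenant memberships are already loaded, and including the backward-compatible "default to primary tenant" phase the second assertion proposes:

```python
# Minimal sketch of server-side X-Tenant-ID validation (hypothetical names).
# The header is treated as a claim and checked against the tenants the
# authenticated user actually belongs to, never trusted directly.

class TenantValidationError(Exception):
    pass

def resolve_tenant(header_tenant_id, user_tenant_ids, default_tenant_id=None):
    """Return the tenant ID to scope this request to, or raise."""
    if header_tenant_id is None:
        # Backward-compatible phase: fall back to the user's primary tenant
        # so existing callers keep working until the header becomes required.
        if default_tenant_id is not None:
            return default_tenant_id
        raise TenantValidationError("missing X-Tenant-ID header")
    if header_tenant_id not in user_tenant_ids:
        # User A setting X-Tenant-ID to tenant B's ID lands here.
        raise TenantValidationError("user is not a member of this tenant")
    return header_tenant_id
```

Spoofed, missing, and invalid tenant IDs all fail closed; only a header matching the user's own memberships (or the explicit single-tenant default) is accepted.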
@@ -0,0 +1,47 @@
+ meta:
+ id: api-design-review
+ level: 3
+ course: code-review-feedback-writing
+ type: output
+ description: "Review API design — evaluate endpoint design, request/response contracts, versioning strategy, and backward compatibility in code reviews"
+ tags: [code-review, API-design, endpoints, contracts, versioning, advanced]
+
+ state: {}
+
+ trigger: |
+ You're reviewing a PR that adds 3 new API endpoints to your
+ public-facing REST API. These endpoints will be used by external
+ developers and can't easily be changed after release.
+
+ Proposed endpoints:
+ 1. POST /api/v1/getOrders — retrieves orders with filters in the
+ request body
+ 2. PUT /api/v1/orders/{id} — updates an order, but some fields
+ (status, total) are calculated and shouldn't be settable
+ 3. DELETE /api/v1/orders/{id}/items/{itemId}/remove — removes an
+ item from an order
+
+ Response formats:
+ - Success: { "success": true, "data": {...} }
+ - Error: { "success": false, "error": "Something went wrong" }
+
+ The endpoints work correctly with comprehensive tests.
+
+ Task: Review the API design for RESTful conventions, future
+ compatibility, and developer experience. Write review comments
+ that explain API design principles and suggest corrections.
+ Remember: these become your public contract once released.
+
+ assertions:
+ - type: llm_judge
+ criteria: "REST convention violations are identified — (1) POST /getOrders: GET is for retrieval, POST is for creation. Filters should be query parameters: GET /api/v1/orders?status=pending&after=2024-01-01. If filter complexity requires a body, use POST /api/v1/orders/search. 'getOrders' as a verb violates REST (nouns for resources). (2) PUT allows setting calculated fields: read-only fields should be rejected or ignored. Document which fields are writable. Consider: does PUT replace the entire order or just modified fields? If partial, use PATCH. (3) DELETE .../remove: redundant — DELETE already means remove. Simplify to DELETE /api/v1/orders/{id}/items/{itemId}. Each violation explained with the REST principle and the practical impact on developers"
+ weight: 0.35
+ description: "REST conventions"
+ - type: llm_judge
+ criteria: "Response format and contract issues are addressed — success/error envelope: 'The { success: true/false } pattern is redundant with HTTP status codes — 200 already means success, 400 means client error. Industry standard (Stripe, GitHub): use status codes for success/failure, and reserve the response body for data. Error responses should include: code (machine-readable), message (human-readable), details (field-level errors).' Current error: 'Something went wrong' gives developers nothing to debug. Suggest: `{ error: { code: 'ORDER_NOT_FOUND', message: 'Order with ID xyz does not exist', param: 'id' } }`. Consistency: 'Match the error format used by your other 20 endpoints — developers shouldn't learn a new error format per endpoint.'"
+ weight: 0.35
+ description: "Response format"
+ - type: llm_judge
+ criteria: "Future compatibility concerns are raised — 'These endpoints become a permanent public contract. Changes after release are breaking changes that affect external developers.' Specific concerns: (1) pagination — GET /orders must return paginated results (what happens when there are 100K orders?), (2) filtering — be intentional about which filters are supported (adding is non-breaking, removing is breaking), (3) field expansion — consider using ?fields=id,status,total for sparse responses, (4) versioning — v1 in the URL means we can make v2 later, but plan for what triggers a v2. API review checklist: naming consistency, HTTP method correctness, pagination, filtering, sorting, error format, rate limiting headers, authentication, idempotency (PUT/DELETE should be idempotent). 'Suggest: run this through our API design review checklist before launch.'"
+ weight: 0.30
+ description: "Future compatibility"
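The structured error body the second assertion points to (Stripe-style code/message/param, with the HTTP status code carrying success/failure) might look like this minimal sketch; the helper name and fields are illustrative:

```python
# Sketch of a machine-readable error payload builder (hypothetical helper).
# Success/failure is signaled by the HTTP status code, not a "success" flag;
# the body carries only what a developer needs to debug the error.

def error_body(code, message, param=None):
    """Build the JSON-serializable error envelope for a failed request."""
    error = {"code": code, "message": message}
    if param is not None:
        # Name of the request field/path parameter the error relates to.
        error["param"] = param
    return {"error": error}

# A missing order would pair HTTP 404 with:
# error_body("ORDER_NOT_FOUND", "Order with ID xyz does not exist", param="id")
```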
@@ -0,0 +1,48 @@
+ meta:
+ id: database-migration-review
+ level: 3
+ course: code-review-feedback-writing
+ type: output
+ description: "Review database migrations — evaluate schema changes for safety, reversibility, data integrity, and production deployment risk"
+ tags: [code-review, database, migration, schema, deployment, safety, advanced]
+
+ state: {}
+
+ trigger: |
+ You're reviewing a database migration PR. The migration:
+
+ 1. Adds a NOT NULL column "subscription_tier" to the users table
+ (2 million rows) without a default value
+
+ 2. Renames column "userName" to "user_name" in the orders table
+
+ 3. Adds a unique constraint on (email, organization_id) to the
+ users table
+
+ 4. Drops the "legacy_tokens" table (mentioned as "unused")
+
+ 5. Creates a new index on orders(customer_id, created_at) — the
+ orders table has 50 million rows
+
+ 6. Changes the "amount" column type from INTEGER to DECIMAL(10,2)
+
+ The migration has no down/rollback migration defined.
+
+ Task: Review each migration step for production safety. For each,
+ explain the risk, describe what could go wrong during deployment,
+ and suggest the safe approach. These are the reviews that prevent
+ 2am deployment incidents.
+
+ assertions:
+ - type: llm_judge
+ criteria: "Production risks are identified for each step — (1) NOT NULL without default: migration fails on 2M existing rows (all have NULL for the new column). Even with a default, adding NOT NULL column locks the table during backfill on some databases. (2) Column rename: every query referencing 'userName' breaks instantly — application code, reports, ETL jobs, third-party integrations. (3) Unique constraint: fails if duplicate (email, org_id) pairs already exist — migration aborts mid-way. Need to check for duplicates BEFORE applying. (4) Drop table: 'unused' — how was this verified? Any scheduled jobs, background workers, or external systems accessing it? Data loss is permanent. (5) New index on 50M rows: index creation locks the table (in some DBs) — no reads or writes during build, could take 30+ minutes. (6) Type change: INTEGER to DECIMAL changes data semantics — existing value 1000 (cents) becomes 1000.00 (dollars?). All application code interpreting this value must be updated simultaneously"
+ weight: 0.35
+ description: "Production risks"
+ - type: llm_judge
+ criteria: "Safe approaches are suggested for each — (1) NOT NULL: three-step migration: add column as NULL with default → backfill existing rows → add NOT NULL constraint. (2) Rename: don't rename — add new column, dual-write, migrate reads, drop old column. Or: use database alias/view. (3) Unique constraint: first run SELECT email, organization_id, COUNT(*) ... HAVING COUNT(*) > 1 to find duplicates, fix them, THEN add constraint. (4) Drop table: add a deprecation period — rename to legacy_tokens_deprecated, wait 30 days, monitor for errors, THEN drop. (5) Index: use CREATE INDEX CONCURRENTLY (PostgreSQL) or equivalent to avoid table lock. (6) Type change: coordinate with application deploy — use a feature flag or blue/green deployment so code and schema change together. Each step: show the safe SQL or migration code"
+ weight: 0.35
+ description: "Safe approaches"
+ - type: llm_judge
+ criteria: "Missing rollback and deployment strategy are flagged — 'No down migration defined — if this migration fails halfway, how do we roll back? Every migration must have a corresponding rollback.' Rollback for each step: (1) DROP COLUMN, (2) rename back, (3) DROP CONSTRAINT, (4) can't rollback data loss (strongest argument against dropping tables), (5) DROP INDEX, (6) ALTER COLUMN back to INTEGER (but data precision is lost). Deployment strategy: 'This migration should be split into 6 separate migrations, deployed independently over multiple deploys. If step 3 fails, we don't want to have to roll back steps 1-2.' Testing: 'Run this migration against a copy of production data first — the unique constraint check (#3) and index creation time (#5) should be validated with real data volumes.' Monitor: 'Add migration timing to deployment metrics — alert if any step takes more than 5 minutes'"
+ weight: 0.30
+ description: "Rollback and deployment"
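The pre-flight duplicate check for step 3 (the `SELECT email, organization_id, COUNT(*) ... HAVING COUNT(*) > 1` idea in the second assertion) can be sketched in-memory like this; the function name and row shape are assumptions for illustration:

```python
# Sketch of the duplicate check to run BEFORE adding the unique constraint
# on (email, organization_id). In production this would be the SQL query;
# here the same logic is shown over an iterable of rows.

from collections import Counter

def find_duplicate_pairs(rows):
    """rows: iterable of (email, organization_id) tuples.

    Returns the pairs that appear more than once, with their counts —
    exactly the rows that would make the unique constraint abort mid-way.
    """
    counts = Counter((email, org_id) for email, org_id in rows)
    return {pair: n for pair, n in counts.items() if n > 1}
```

If the result is non-empty, the duplicates must be resolved first; only then is it safe to apply the constraint.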
@@ -0,0 +1,48 @@
+ meta:
+ id: design-pattern-feedback
+ level: 3
+ course: code-review-feedback-writing
+ type: output
+ description: "Give design pattern feedback — recognize pattern opportunities and anti-patterns in code reviews and explain when patterns help vs when they over-engineer"
+ tags: [code-review, design-patterns, anti-patterns, refactoring, over-engineering, advanced]
+
+ state: {}
+
+ trigger: |
+ You're reviewing code that could benefit from design patterns, but
+ you need to be careful — not every piece of code needs a pattern.
+ You've found:
+
+ 1. A 500-line switch statement handling 20 different notification
+ types — each case has similar but slightly different logic
+
+ 2. A class with 15 methods that all start by loading user data,
+ checking permissions, and logging — massive duplication
+
+ 3. Code that creates database connections directly in 30 different
+ places instead of using a connection pool
+
+ 4. A simple CRUD endpoint where someone suggests "we should add
+ a Strategy pattern for the create operation"
+
+ 5. Three layers of abstraction for a function that's called in
+ exactly one place
+
+ Task: For each situation, decide: does it need a pattern, which
+ pattern, or is the current approach fine? Write review comments
+ that teach WHEN patterns are appropriate, not just WHAT patterns
+ exist. Call out over-engineering as firmly as under-engineering.
+
+ assertions:
+ - type: llm_judge
+ criteria: "Pattern recommendations are justified with concrete benefits — (1) Switch statement: Strategy or Command pattern — define a NotificationHandler interface, create EmailHandler, SMSHandler, PushHandler. Benefit: each handler is independently testable, new notification types don't modify existing code. Show the refactored structure. (2) Repeated setup: Template Method or middleware/decorator pattern — extract loadUser(), checkPermission(), logAction() into middleware that runs before each method. Benefit: DRY, single place to modify auth logic. (3) Direct connections: use connection pool (singleton or factory pattern). Benefit: resource management, connection reuse, centralized configuration. Each recommendation includes: what pattern, why it helps HERE, and a code sketch showing the result"
+ weight: 0.35
+ description: "Justified patterns"
+ - type: llm_judge
+ criteria: "Over-engineering is called out with equal conviction — (4) Strategy for simple CRUD: 'This endpoint creates a record in a table — a Strategy pattern adds 3 files and an interface for something that's 10 lines of straightforward code. The Strategy pattern is valuable when you have multiple interchangeable algorithms, not for a single code path that does one thing. YAGNI — You Aren't Gonna Need It. If we need different creation strategies later, we can refactor then.' (5) Three abstraction layers: 'This function is called in one place. The abstraction adds cognitive overhead — a reader must trace through 3 files to understand what happens. Inline the logic into the caller. Abstractions are earned by duplication, not created in anticipation.' Equal firmness: over-engineering wastes time and adds complexity just as under-engineering creates duplication"
+ weight: 0.35
+ description: "Over-engineering callout"
+ - type: llm_judge
+ criteria: "Guidelines for WHEN to apply patterns are practical — rules of thumb: (1) Three strikes rule — the first time, just write the code; the second time, note the duplication; the third time, extract a pattern. (2) Patterns are answers to specific problems — if you can't name the problem, you don't need the pattern. (3) Test: would a new team member understand the code better WITH or WITHOUT the pattern? If the pattern adds confusion, skip it. (4) Patterns should reduce code, not increase it — if Strategy adds 100 lines to save 20 lines of duplication, it's not worth it yet. (5) Favor simple composition over complex inheritance. Anti-patterns to watch for: 'Abstract Factory for one implementation', 'Singleton just because', 'Observer for two components that could just call each other.' Maturity signal: knowing when NOT to use a pattern is more valuable than knowing patterns"
+ weight: 0.30
+ description: "When to apply"
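The Strategy-style refactor the first assertion recommends for the notification switch might be sketched like this — illustrative handler names, with dispatch by lookup so new notification types never touch existing cases:

```python
# Sketch of replacing a giant switch over notification types with per-type
# handler objects (Strategy pattern). Handler names are illustrative.

class EmailHandler:
    def send(self, message):
        # Real implementation would call the email service.
        return f"email: {message}"

class SMSHandler:
    def send(self, message):
        return f"sms: {message}"

# Registry replaces the switch: adding a type means adding one entry here
# plus one new handler class — existing handlers stay untouched and each
# is independently testable.
HANDLERS = {
    "email": EmailHandler(),
    "sms": SMSHandler(),
}

def notify(kind, message):
    handler = HANDLERS.get(kind)
    if handler is None:
        raise ValueError(f"unknown notification type: {kind}")
    return handler.send(message)
```

By the same assertions' logic this refactor is only justified here because there are 20 divergent cases; the simple CRUD endpoint in item 4 would gain nothing from it.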
@@ -0,0 +1,42 @@
+ meta:
+ id: production-incident-review
+ level: 3
+ course: code-review-feedback-writing
+ type: output
+ description: "Review incident fix PRs — evaluate hotfix code for correctness, root cause coverage, regression prevention, and appropriate urgency without cutting corners"
+ tags: [code-review, incident, hotfix, production, root-cause, regression, advanced]
+
+ state: {}
+
+ trigger: |
+ There's an active production incident. A developer submits a PR
+ tagged "HOTFIX" to resolve it. The incident: users intermittently
+ see another user's order history when clicking "My Orders."
+
+ The PR (45 lines):
+ - Adds a user ID filter to the orders query that was missing
+ - The fix is in the caching layer — orders were cached by page
+ URL without including the user ID in the cache key
+ - The author also "cleaned up" some unrelated code in the same file
+ - No tests added ("we need to ship this fast")
+ - The fix handles the cache key but doesn't invalidate existing
+ incorrect cache entries
+
+ Task: Review this hotfix balancing urgency with safety. Write your
+ review addressing: is the fix correct, does it fully resolve the
+ issue, what's the risk of shipping vs not shipping, and what
+ follow-up work is needed. This is a time-sensitive review.
+
+ assertions:
+ - type: llm_judge
+ criteria: "Fix correctness is evaluated against the root cause — 'The cache key fix is correct — cache_key = /orders?page=1 should be cache_key = /orders?page=1&user_id=123. This prevents NEW incorrect entries. However, EXISTING cached entries still contain mixed user data. Risk: users continue seeing wrong data until those cache entries expire.' Critical missing piece: 'We need to either (1) flush the orders cache entirely, or (2) add cache invalidation for affected keys. Without this, the fix only prevents future occurrences — existing cached wrong data persists for [TTL duration].' Assessment: 'The fix is 80% correct — it prevents the vulnerability from recurring but doesn't clean up existing corrupted cache entries.'"
+ weight: 0.35
+ description: "Fix correctness"
+ - type: llm_judge
+ criteria: "Review balances urgency with necessary safety — decision framework: 'Ship immediately with cache flush' not 'Wait for perfect fix.' Specific recommendation: (1) Remove the unrelated code cleanup — hotfix PRs should ONLY contain the fix (reduces risk, simplifies rollback). (2) Add cache flush/invalidation — without this, the data leak continues from cache. (3) Tests can be a fast follow-up PR — but must be done today, not 'later.' (4) Approve once: unrelated changes removed AND cache invalidation added. Urgency acknowledgment: 'This is a data privacy incident — users seeing other users' data. Speed matters. But shipping a partial fix that still leaks cached data isn't actually faster — it's a second incident.'"
+ weight: 0.35
+ description: "Urgency balance"
+ - type: llm_judge
+ criteria: "Follow-up work is clearly scoped — immediate (ship with hotfix): cache key fix + cache flush + remove unrelated changes. Today: add regression test that verifies cache keys include user ID. This week: (1) audit all other cache keys for similar user-scoping issues, (2) add cache key logging/monitoring to detect future issues, (3) consider cache key builder utility that enforces user scoping by default. Post-incident: (1) retrospective to understand why the original code didn't include user ID (was it a refactor regression? was it never there?), (2) review process improvement — how did the original PR pass review without user scoping? (3) postmortem document. Frame follow-up as learning: 'The cache key pattern should be documented as a team standard — this is how bugs become institutional knowledge.'"
+ weight: 0.30
+ description: "Follow-up scoping"
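The two-part fix these assertions call for — user-scoped cache keys plus cleaning out the legacy unscoped entries — can be sketched as follows; names and the dict-as-cache are illustrative assumptions:

```python
# Sketch of the hotfix the assertions describe (hypothetical names):
# (1) scope cache keys by user ID so one user's cached page can never be
#     served to another, and
# (2) flush the legacy unscoped entries so already-corrupted data doesn't
#     keep leaking until TTL expiry.

def order_cache_key(path, user_id):
    # Before the fix the key was just `path`, shared across all users.
    return f"{path}&user_id={user_id}"

def flush_unscoped_entries(cache):
    """Drop legacy entries whose keys were never user-scoped."""
    for key in list(cache):
        if "user_id=" not in key:
            del cache[key]
```

Without the flush step, the key change alone only prevents new bad entries; the cached mixed-user pages would persist for the remainder of their TTL.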
@@ -0,0 +1,47 @@
+ meta:
+ id: reviewing-senior-code
+ level: 3
+ course: code-review-feedback-writing
+ type: output
+ description: "Review a senior engineer's code — navigate the dynamics of providing feedback to more experienced developers while maintaining quality standards"
+ tags: [code-review, senior-engineer, dynamics, confidence, speaking-up, advanced]
+
+ state: {}
+
+ trigger: |
+ You're a mid-level engineer asked to review a PR from the team's
+ most senior developer (10 years experience, built the core system).
+ Their PR refactors the authentication middleware — 250 lines.
+
+ You find several concerns:
+ 1. A race condition when refreshing OAuth tokens concurrently
+ 2. The new middleware doesn't handle the edge case of expired
+ refresh tokens (the old code did)
+ 3. Magic numbers: timeout values without named constants
+ 4. No tests for the concurrent token refresh scenario
+ 5. The refactor changes behavior — previously failed silently,
+ now throws (affects all callers)
+
+ You're nervous because:
+ - This person is far more experienced than you
+ - They might know something you don't
+ - You've seen them dismiss junior reviewers before
+ - But you're genuinely concerned about #1 and #2
+
+ Task: Write the review navigating the power dynamic. Show how to
+ give feedback to someone more senior while being confident in your
+ observations. Write a guide for reviewing up.
+
+ assertions:
+ - type: llm_judge
+ criteria: "Technical concerns are communicated with appropriate confidence — for genuine bugs (#1, #2): be direct but frame as observations, not accusations. 'I traced through the concurrent token refresh path and I believe there's a race condition: if two requests trigger a refresh simultaneously, both will attempt to write the new token. The old code used a mutex here (auth.ts:89 in the previous version) — was the removal intentional?' For the behavior change (#5): 'I noticed the error handling shifted from silent failure to throwing. Since this middleware is used by all 40 endpoints, this changes behavior for every request — was this intentional? If so, should we coordinate the rollout?' Confidence where warranted: 'The race condition is a real concern regardless of experience level — concurrent access bugs don't care about seniority.'"
+ weight: 0.35
+ description: "Confident feedback"
+ - type: llm_judge
+ criteria: "Power dynamic is navigated without being deferential or aggressive — don't start with: 'I might be wrong but...' or 'I'm sure you thought of this...' (undermines your review). Don't say: 'This is obviously wrong' (confrontational with senior). Do: state observations factually. 'Lines 45-60: when two concurrent requests both have expired tokens, the first refresh succeeds but the second attempts to refresh with the already-used refresh token, which will fail.' Ask genuine questions for things you're unsure about: 'I see the silent failure was changed to throwing — is there a design document for this change? I want to understand the migration plan for callers.' Nits (#3) are fine to raise normally — everyone gets nits. The key: treat the code the same regardless of who wrote it"
+ weight: 0.35
+ description: "Dynamic navigation"
+ - type: llm_judge
+ criteria: "Guide for reviewing up is practical — principles: (1) Your job is to review the CODE, not the PERSON. Experience doesn't make code bug-free. (2) If you see a potential bug, say so — you were asked to review for a reason. (3) Frame uncertain observations as questions ('Could this cause X?'), but frame clear issues as statements ('This race condition will cause Y'). (4) Don't pre-apologize for having feedback — it undermines the review process. (5) Senior engineers generally WANT honest review — they chose you as a reviewer. (6) If dismissed unfairly, escalate to your manager — review quality is everyone's responsibility. Common mistake: being so deferential that you provide a useless rubber-stamp review. The worst outcome isn't disagreeing with a senior — it's a production bug you saw but didn't mention"
+ weight: 0.30
+ description: "Reviewing up guide"
@@ -0,0 +1,46 @@
+ meta:
+ id: speed-vs-thoroughness
+ level: 3
+ course: code-review-feedback-writing
+ type: output
+ description: "Balance review speed and thoroughness — develop strategies for providing timely reviews without sacrificing quality for different types of PRs"
+ tags: [code-review, speed, thoroughness, triage, time-management, advanced]
+
+ state: {}
+
+ trigger: |
+ You have 6 PRs in your review queue, all assigned to you. Your
+ team's SLA is 24 hours for first review. It's 4pm and these came
+ in today:
+
+ 1. Hotfix: Production bug — users seeing 500 errors on checkout
+ (15 lines, high urgency)
+ 2. Feature: New dashboard analytics (350 lines, normal priority,
+ sprint deadline Friday)
+ 3. Dependency update: Bump React from 18.2 to 18.3 (5 files changed,
+ low priority)
+ 4. Refactor: Extract common validation logic into shared module
+ (200 lines, medium priority)
+ 5. Junior dev's PR: First feature implementation, they're blocked
+ waiting for review (100 lines, medium urgency due to person)
+ 6. Tech debt: Remove deprecated API endpoints (150 lines, low priority)
+
+ You have 2 hours of review time today.
+
+ Task: Create a review strategy for all 6 PRs. Show how to triage,
+ what depth of review each gets, write example reviews at different
+ thoroughness levels, and explain your prioritization reasoning.
+
+ assertions:
+ - type: llm_judge
+ criteria: "Triage ordering is well-reasoned — priority: (1) Hotfix — production impact, review in next 15 minutes. (2) Junior dev's PR — person is blocked, reviewing unblocks them (invest 30 minutes). (3) Feature — sprint deadline pressure (invest 45 minutes). (4) Refactor — medium priority, affects shared code quality (invest 20 minutes). (5) Dependency update — low risk if CI passes, quick scan (10 minutes). (6) Tech debt — no deadline, can slip to tomorrow (skip or quick scan). Reasoning: combine urgency (production > blocked person > deadline) with risk (shared code > isolated change > dependency bump). The junior dev is prioritized higher than strict urgency would suggest because blocking people is more costly than blocking code"
+ weight: 0.35
+ description: "Triage ordering"
+ - type: llm_judge
+ criteria: "Review depth varies appropriately per PR — Hotfix (15 min): verify the fix addresses the error, check for regression risk, don't nit on style — approve fast with 'LGTM, fix looks correct. Let's clean up styling in a follow-up.' Feature (45 min): full review — architecture, security, tests, performance. This is the big one that justifies thorough review. Junior PR (30 min): mentor-focused review — teach patterns, be encouraging, unblock quickly even if follow-up work is needed. Refactor (20 min): focus on interface changes and backward compatibility — internal implementation can be reviewed more lightly. Dependency (10 min): check CI passes, scan changelog for breaking changes, verify no deprecated API usage. Tech debt (5 min or skip): quick scan for accidental behavior changes, or leave comment 'I'll review this first thing tomorrow.' Each depth level is explicitly justified"
+ weight: 0.35
+ description: "Varying depth"
+ - type: llm_judge
+ criteria: "Example reviews demonstrate different thoroughness levels — quick review (hotfix): 'Verified: the fix adds null check for user.cart before accessing .items. This matches the error in Sentry (TypeError: Cannot read property items of null). CI passes. Approved — ship it.' Medium review (refactor): summary of approach, 2-3 specific comments on interface design, one question about edge case, approve with suggestions. Deep review (feature): structured thematic review, 8-10 comments across architecture/security/testing, request changes on 2 blocking items. Honest scoping: when you do a lighter review, say so: 'I did a quick review focused on the API interface — I'd recommend someone else gives the internal logic a deeper look.' Better to do a timeboxed review and disclose the scope than to skim everything and pretend it was thorough"
+ weight: 0.30
+ description: "Depth examples"
@@ -0,0 +1,44 @@
+ meta:
+ id: automated-review-strategy
+ level: 4
+ course: code-review-feedback-writing
+ type: output
+ description: "Design automated review tooling — determine what to automate vs keep as human review, select tools, and integrate automation into the review workflow"
+ tags: [code-review, automation, linting, static-analysis, CI, tooling, expert]
+
+ state: {}
+
+ trigger: |
+ Your team spends too much time on mechanical review tasks. You want
+ to automate what you can so humans focus on what matters. Current
+ review pain points:
+
+ - 40% of comments are style/formatting (Prettier/ESLint should handle)
+ - Security vulnerabilities sometimes slip through (Snyk/Semgrep could catch)
+ - Type errors found in review (TypeScript strict mode would prevent)
+ - Dead code and unused imports flagged manually
+ - Test coverage gaps discovered during review
+ - API breaking changes not noticed until production
+ - Documentation staleness (code changed, docs not updated)
+
+ Budget: $500/month for tooling. Team: 30 engineers.
+
+ Task: Design the automated review tooling strategy. For each pain
+ point, decide: automate fully, automate as advisory, or keep human.
+ Include: tool selection, CI/CD integration, configuration approach,
+ and how to introduce automation without overwhelming developers
+ with noise.
+
+ assertions:
+ - type: llm_judge
+ criteria: "Each pain point has a clear automation decision — automate fully (block PR if failing): formatting (Prettier — auto-fix on commit), linting (ESLint — errors block, warnings advisory), type checking (TypeScript strict — blocks build), unused imports (ESLint rule — auto-fix). Automate as advisory (comment on PR, don't block): security scanning (Semgrep/CodeQL — flag but human validates severity), test coverage (report delta — don't block on arbitrary threshold), dead code detection (report but human decides if intentional). Keep human: architectural review, business logic validation, API design review, documentation quality (detect CHANGE but human reviews quality). Each decision justified: 'Formatting is objectively correct — no human judgment needed. Security findings require context — is this a real vulnerability or a false positive?'"
+ weight: 0.35
+ description: "Automation decisions"
+ - type: llm_judge
+ criteria: "Tool selection is specific and budget-conscious — within $500/month: (1) Prettier + ESLint: free, handles formatting and code quality. (2) TypeScript strict mode: free, catches type errors. (3) GitHub Actions: included in GitHub, runs CI checks. (4) Semgrep: free tier covers security scanning. (5) Codecov: free for open source, ~$100/month for private — coverage delta reporting. (6) danger.js or similar: free, custom PR automation (check for docs updates when API files change, check for changelog when version bumps). (7) Spectral: free, OpenAPI linting for API breaking changes. Total: ~$100-200/month, well under budget. Alternative: GitHub Advanced Security (~$400/month) for CodeQL + secret scanning + dependency review. Configuration: use shared config packages so all 30 engineers have consistent setup"
+ weight: 0.35
+ description: "Tool selection"
+ - type: llm_judge
+ criteria: "Rollout strategy prevents automation fatigue — phase 1 (week 1-2): Prettier auto-format only (zero noise — just fixes things). Phase 2 (week 3-4): ESLint errors only (not warnings — configure strictly to avoid noise). Phase 3 (month 2): Security scanning as advisory comments. Phase 4 (month 3): Coverage reporting and API change detection. Key principle: 'Every automated check must be actionable. If developers start ignoring bot comments, the automation is noise, not signal.' Configuration: tune tools before enabling — run against existing codebase, fix existing violations, THEN enable as blocking. False positive budget: 'If a tool produces more than 10% false positives, fix configuration or remove the tool. Developer trust in automation is fragile.' Gradual: introduce one tool at a time, measure impact, adjust before adding the next"
+ weight: 0.30
+ description: "Rollout strategy"
@@ -0,0 +1,46 @@
+ meta:
+ id: expert-review-shift
+ level: 4
+ course: code-review-feedback-writing
+ type: output
+ description: "Combined expert shift — redesign an organization's code review program covering culture, process, automation, metrics, and training for a growing engineering team"
+ tags: [code-review, combined, shift-simulation, program-design, expert]
+
+ state: {}
+
+ trigger: |
+ You've been promoted to VP of Engineering at a 100-person startup
+ (60 engineers, 8 teams). The CEO says: "Our code quality is
+ declining, deployments keep breaking, and engineers are unhappy
+ with the review process. Fix it."
+
+ Current state:
+ - No formal review process — each team does their own thing
+ - Average defect escape rate: 12% (industry target: < 5%)
+ - Deployment rollback rate: 8% (should be < 2%)
+ - Engineer satisfaction with reviews: 3.5/10
+ - 3 engineers left citing "toxic review culture" in exit interviews
+ - No automated code quality checks
+ - Average time-to-merge: 8 days (blocking feature delivery)
+
+ Budget: $50K for tooling, ability to hire 1 additional head
+
+ Task: Present your complete code review transformation plan to the
+ CEO. Include: diagnosis of current problems, 90-day action plan
+ with quick wins, 6-month target state, tooling investment, culture
+ change program, metrics dashboard, and how this connects to the
+ broader goals of code quality and engineer retention.
+
+ assertions:
+ - type: llm_judge
+ criteria: "Diagnosis connects symptoms to root causes — 12% defect escape: reviews aren't catching bugs (reviewers focused on style, not logic — or rubber-stamping). 8% rollback rate: inadequate testing requirement in reviews (tests not reviewed or not required). 3.5/10 satisfaction: combination of slow reviews (8 day merge), hostile comments (3 people left), and inconsistent standards. Toxic culture: specific reviewers giving harsh feedback without consequences — management didn't intervene. 8-day merge: no SLA, no reviewer assignment, no load balancing — PRs sit in queues. Root cause: absence of process, not malicious intent. Nobody designed the review system — it evolved haphazardly. Fix the system, not the people (though the toxic reviewer behavior needs direct intervention)"
+ weight: 0.35
+ description: "Root cause diagnosis"
+ - type: llm_judge
+ criteria: "90-day plan delivers quick wins and lasting change — days 1-30 (immediate): (1) address toxic reviewer behavior directly (1:1s, expectations, consequences), (2) implement PR template with minimum requirements, (3) add Prettier + ESLint to CI (eliminate 40% of review comments), (4) establish 24-hour review SLA, (5) hire: DevEx engineer to own review tooling. Days 31-60: (1) CODEOWNERS for automatic reviewer assignment, (2) security scanning in CI (Semgrep), (3) reviewer training workshop (2 hours, all engineers), (4) code review guidelines document published. Days 61-90: (1) metrics dashboard live, (2) review quality feedback loop (manager reviews reviews), (3) cross-team reviewer rotation pilot, (4) quarterly calibration exercise. Each action connected to a specific metric improvement"
+ weight: 0.35
+ description: "90-day plan"
+ - type: llm_judge
+ criteria: "6-month targets and CEO communication are compelling — 6-month targets: defect escape < 6% (from 12%), rollback < 4% (from 8%), satisfaction > 6.5/10 (from 3.5), time-to-merge < 3 days (from 8), zero departures citing review culture. Budget allocation: tooling $20K (CI/CD, security scanning, coverage), training $10K (workshops, external facilitation), DevEx hire $80K (6 months salary). ROI for CEO: each production defect costs ~$5K (investigation + fix + customer impact). Reducing defect escape from 12% to 6% on 500 deploys/year = 30 fewer defects = $150K saved. Each departed engineer costs ~$150K to replace — preventing 3 departures = $450K. Total ROI: $600K savings on $110K investment. Culture change message: 'Code review is how we maintain quality at scale. We're investing in making it a positive, effective process — not a bureaucratic checkbox.'"
+ weight: 0.30
+ description: "Targets and communication"
@@ -0,0 +1,41 @@
+ meta:
+ id: review-culture-design
+ level: 4
+ course: code-review-feedback-writing
+ type: output
+ description: "Design code review culture — establish review norms, expectations, and values that create a healthy, productive review environment for the engineering team"
+ tags: [code-review, culture, norms, psychological-safety, team-health, expert]
+
+ state: {}
+
+ trigger: |
+ You're the new engineering manager for a 25-person team. The code
+ review culture has problems:
+
+ - 3 senior engineers do 80% of reviews (bottleneck + burnout)
+ - Reviews average 5 days turnaround (blocking developers for a week)
+ - Some reviewers leave 30+ comments on every PR (demoralizing)
+ - Junior developers avoid requesting review from specific seniors
+ - Two engineers had a public argument in PR comments last month
+ - No one reviews documentation PRs ("not code, not my job")
+ - Some PRs are merged with "LGTM" and no substantive review
+
+ Task: Design a complete code review culture document. Include:
+ review values and principles, expected behaviors, SLAs, reviewer
+ responsibilities, author responsibilities, conflict resolution,
+ and how to onboard new team members into the culture. This becomes
+ the team's code review charter.
+
+ assertions:
+ - type: llm_judge
+ criteria: "Values and principles address the specific problems — values: (1) Respect: critique code, never people. Public arguments in PRs are unacceptable — disagreements escalate to private discussion. (2) Shared responsibility: reviewing is everyone's job, not just seniors'. Every engineer reviews at least 2 PRs/week. (3) Timeliness: first review within 24 hours (business day). Slow reviews block people and waste context. (4) Proportionality: comment volume should match issue severity. 30 comments on a 50-line PR signals either a problematic PR (should have been discussed before writing) or an overly critical reviewer. (5) Growth orientation: reviews are opportunities for learning, not gatekeeping. (6) Comprehensiveness: documentation, tests, and configuration changes deserve review attention, not just 'code.'"
+ weight: 0.35
+ description: "Values and principles"
+ - type: llm_judge
+ criteria: "Concrete behaviors and SLAs are defined — reviewer SLA: first review pass within 24 business hours. If you can't review, reassign within 4 hours. Author SLA: respond to review comments within 24 hours. Comment labels: require 'blocking:', 'suggestion:', 'nit:', 'question:', 'praise:' prefixes. Maximum review rounds: if a PR requires more than 2 rounds, schedule a synchronous discussion instead. PR size: aim for under 200 lines; PRs over 400 lines can be reviewed in a meeting instead. Approval requirements: 1 approval for standard changes, 2 for security/auth/payments, team lead for architectural changes. Self-review: authors should self-review their own PR before requesting review (catches obvious issues)"
+ weight: 0.35
+ description: "Behaviors and SLAs"
+ - type: llm_judge
+ criteria: "Conflict resolution and onboarding are addressed — conflict resolution: (1) technical disagreements: if 2 rounds of comments don't resolve, take it offline — DM, call, or meeting. (2) Tone issues: if a comment feels personal, assume good intent first. If pattern continues, raise with manager privately. (3) Impasse: engineering manager breaks ties. (4) Never argue in PR comments — the PR author, other reviewers, and future readers all see it. Onboarding new team members: (1) first week: shadow 3 reviews from different reviewers, (2) second week: pair-review with a senior, (3) third week: solo review with senior co-reviewer, (4) ongoing: monthly 'review quality' check-in. Measuring health: quarterly anonymous survey on review experience, track turnaround time, track comment patterns. Review the charter quarterly and update based on team feedback"
+ weight: 0.30
+ description: "Conflict and onboarding"
@@ -0,0 +1,45 @@
+ meta:
+ id: review-guidelines-standards
+ level: 4
+ course: code-review-feedback-writing
+ type: output
+ description: "Create review guidelines — write comprehensive code review standards that define expectations for both authors and reviewers across the organization"
+ tags: [code-review, guidelines, standards, expectations, organization, expert]
+
+ state: {}
+
+ trigger: |
+ Your growing engineering organization (60 developers, 8 teams) needs
+ written code review guidelines. Currently, review expectations are
+ tribal knowledge — each team has different unwritten rules.
+
+ Inconsistencies across teams:
+ - Team A requires 2 approvals, Team B requires 1
+ - Team C blocks on any style issue, Team D auto-formats
+ - Some teams review architecture before coding, others review
+ only the final PR
+ - Test requirements: Team E requires 80% coverage, Team F has
+ no coverage requirement
+ - Some teams use PR templates, others have no structure
+
+ You need org-wide guidelines that are flexible enough for team
+ autonomy but consistent enough for cross-team collaboration.
+
+ Task: Write the organization's code review guidelines document.
+ Include: universal requirements (all teams must follow), recommended
+ practices (teams should customize), author responsibilities, reviewer
+ responsibilities, review scope by PR type, and escalation procedures.
+
+ assertions:
+ - type: llm_judge
+ criteria: "Universal requirements establish a quality floor — non-negotiable (all teams): (1) every production code change must be reviewed by at least 1 person who didn't write it, (2) security-sensitive code (auth, payments, PII handling) requires review by security-trained engineer, (3) database migrations require DBA or data team review, (4) review comments must use severity labels (blocking/suggestion/nit), (5) first review within 24 business hours, (6) no self-merging except for emergencies (documented with incident link), (7) reviews must verify: correctness, test coverage for new behavior, no security regressions. Each requirement has a clear rationale explaining WHY it's non-negotiable. Exceptions process: how to request a waiver for legitimate reasons (hotfix, trivial change)"
+ weight: 0.35
+ description: "Universal requirements"
+ - type: llm_judge
+ criteria: "Author and reviewer responsibilities are explicit — author responsibilities: (1) self-review before requesting review (read your own diff), (2) write meaningful PR description (what, why, how to test), (3) keep PRs focused (one concern per PR), (4) respond to comments within 24 hours, (5) don't merge with unresolved blocking comments. Reviewer responsibilities: (1) review within 24 hours, (2) review the entire PR (don't review half and approve), (3) label comment severity, (4) provide actionable feedback (problem + suggestion), (5) approve or request changes (don't leave in liminal state), (6) re-review within 4 hours after author addresses feedback. PR templates: suggested sections — description, motivation, testing instructions, screenshots (for UI), checklist (tests added, docs updated, no secrets). Teams can customize but must include at minimum: description and testing instructions"
+ weight: 0.35
+ description: "Responsibilities"
+ - type: llm_judge
+ criteria: "Flexibility and escalation balance consistency with autonomy — team-customizable: number of required approvals (minimum 1, teams can require 2+), coverage thresholds (each team sets their own, but must have one), style/formatting rules (team choice but must be automated), PR size limits (recommended < 200 lines, teams set their own). Review scope by PR type: hotfix (correctness only — ship fast), feature (full review — architecture, tests, docs), refactor (focus on behavior preservation), dependency update (check changelog + CI). Escalation: (1) reviewer disagrees with author → discussion in comments, (2) can't resolve → bring in a third reviewer, (3) still can't resolve → team lead decides, (4) cross-team disagreement → engineering manager. Document versioning: guidelines reviewed quarterly, teams can propose changes via PR to the guidelines doc itself"
+ weight: 0.30
+ description: "Flexibility and escalation"
@@ -0,0 +1,39 @@
+ meta:
+ id: review-load-balancing
+ level: 4
+ course: code-review-feedback-writing
+ type: output
+ description: "Balance review workload — design assignment systems that distribute review load fairly, develop reviewer expertise, and prevent bottlenecks"
+ tags: [code-review, load-balancing, assignment, bottleneck, fairness, expert]
+
+ state: {}
+
+ trigger: |
+ Your 40-person engineering organization has a review load imbalance:
+
+ - 5 senior engineers review 70% of all PRs (burned out, blocking others)
+ - 15 mid-level engineers review 25% (could do more)
+ - 20 junior engineers review 5% (missing learning opportunity)
+ - Some code areas have only 1 qualified reviewer (bus factor = 1)
+ - Security-sensitive code (auth, payments) requires senior review
+ - Cross-team PRs sit in queues because no one "owns" the review
+ - Some reviewers are faster but lower quality; others are thorough but slow
+
+ Task: Design a review assignment and load balancing system. Include:
+ assignment algorithm, reviewer tiers, escalation paths, cross-
+ training plan to reduce bus factors, and how to handle different
+ review quality levels. Balance fairness, speed, and quality.
+
+ assertions:
+ - type: llm_judge
+ criteria: "Assignment system balances load with expertise — tiered system: (1) Primary reviewer (auto-assigned via round-robin within capable pool) — must be able to review the code area. (2) Secondary reviewer for critical paths (auth, payments, data migrations) — always a senior engineer. Use CODEOWNERS to define code area → reviewer pool mappings. Round-robin within each pool to distribute evenly. Capacity tracking: each reviewer has a weekly capacity (juniors: 2-3 reviews, mids: 4-5, seniors: 3-4 — seniors review fewer but harder PRs). Assignment considers current load: don't assign to someone with 3 pending reviews. Opt-out periods: reviewers can mark 'focus time' blocks where they're not assigned reviews"
+ weight: 0.35
+ description: "Assignment system"
+ - type: llm_judge
+ criteria: "Cross-training plan reduces bus factor — identify single points of failure: code areas with only 1 qualified reviewer. Training approach: (1) pair reviewing — shadow the expert for 2-3 reviews, then co-review, then solo with expert available for questions. (2) Rotate secondary reviewer assignments — mid-level engineers get assigned to unfamiliar areas with a senior co-reviewer. (3) Documentation of domain knowledge — expert creates a 'reviewer guide' for their specialty area (what to look for in payment code, common pitfalls). Target: every code area has at least 3 qualified reviewers within 6 months. Tracking: matrix of code areas × qualified reviewers, updated monthly. Incentive: reviewing unfamiliar code (with appropriate support) counts toward growth goals"
+ weight: 0.35
+ description: "Cross-training"
+ - type: llm_judge
+ criteria: "Quality differences are managed fairly — acknowledge: reviewers have different quality levels. Fast-but-shallow reviewers: good for low-risk code (documentation, minor refactors, dependency updates). Thorough-but-slow reviewers: good for complex logic, security-sensitive code. Match reviewer style to PR risk. Quality improvement: (1) monthly 'review quality' feedback — manager samples 3 reviews per engineer, provides feedback. (2) Example repository of excellent reviews — annotated examples of good comments at each severity level. (3) Pair reviewing for quality improvement — pair a fast reviewer with a thorough one. Avoid: publicly ranking review quality (shame doesn't improve quality). Do: celebrate specific excellent review comments in team channel. Escalation: when a reviewer is consistently rubber-stamping, private conversation about review expectations"
+ weight: 0.30
+ description: "Quality management"