dojo.md 0.2.3 → 0.2.4

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (49)
  1. package/courses/GENERATION_LOG.md +9 -0
  2. package/courses/code-review-feedback-writing/scenarios/level-3/advanced-review-shift.yaml +48 -0
  3. package/courses/code-review-feedback-writing/scenarios/level-3/api-design-review.yaml +47 -0
  4. package/courses/code-review-feedback-writing/scenarios/level-3/database-migration-review.yaml +48 -0
  5. package/courses/code-review-feedback-writing/scenarios/level-3/design-pattern-feedback.yaml +48 -0
  6. package/courses/code-review-feedback-writing/scenarios/level-3/production-incident-review.yaml +42 -0
  7. package/courses/code-review-feedback-writing/scenarios/level-3/reviewing-senior-code.yaml +47 -0
  8. package/courses/code-review-feedback-writing/scenarios/level-3/speed-vs-thoroughness.yaml +46 -0
  9. package/courses/code-review-feedback-writing/scenarios/level-4/automated-review-strategy.yaml +44 -0
  10. package/courses/code-review-feedback-writing/scenarios/level-4/expert-review-shift.yaml +46 -0
  11. package/courses/code-review-feedback-writing/scenarios/level-4/review-culture-design.yaml +41 -0
  12. package/courses/code-review-feedback-writing/scenarios/level-4/review-guidelines-standards.yaml +45 -0
  13. package/courses/code-review-feedback-writing/scenarios/level-4/review-load-balancing.yaml +39 -0
  14. package/courses/code-review-feedback-writing/scenarios/level-4/review-metrics.yaml +39 -0
  15. package/courses/code-review-feedback-writing/scenarios/level-4/review-process-optimization.yaml +48 -0
  16. package/courses/code-review-feedback-writing/scenarios/level-4/scaling-review-process.yaml +45 -0
  17. package/courses/code-review-feedback-writing/scenarios/level-4/security-review-standards.yaml +41 -0
  18. package/courses/code-review-feedback-writing/scenarios/level-4/training-reviewers.yaml +42 -0
  19. package/courses/code-review-feedback-writing/scenarios/level-5/board-quality-metrics.yaml +44 -0
  20. package/courses/code-review-feedback-writing/scenarios/level-5/knowledge-transfer-at-scale.yaml +42 -0
  21. package/courses/code-review-feedback-writing/scenarios/level-5/ma-review-alignment.yaml +50 -0
  22. package/courses/code-review-feedback-writing/scenarios/level-5/master-review-shift.yaml +49 -0
  23. package/courses/code-review-feedback-writing/scenarios/level-5/review-competitive-advantage.yaml +48 -0
  24. package/courses/code-review-feedback-writing/scenarios/level-5/review-organizational-learning.yaml +46 -0
  25. package/courses/code-review-feedback-writing/scenarios/level-5/review-roi-analysis.yaml +51 -0
  26. package/courses/code-review-feedback-writing/scenarios/level-5/review-velocity-impact.yaml +44 -0
  27. package/courses/code-review-feedback-writing/scenarios/level-5/scaling-reviews-100-plus.yaml +45 -0
  28. package/courses/code-review-feedback-writing/scenarios/level-5/toxic-culture-transformation.yaml +46 -0
  29. package/courses/technical-rfc-writing/course.yaml +11 -0
  30. package/courses/technical-rfc-writing/scenarios/level-1/first-rfc-shift.yaml +45 -0
  31. package/courses/technical-rfc-writing/scenarios/level-1/implementation-planning.yaml +47 -0
  32. package/courses/technical-rfc-writing/scenarios/level-1/open-questions.yaml +46 -0
  33. package/courses/technical-rfc-writing/scenarios/level-1/problem-statement.yaml +41 -0
  34. package/courses/technical-rfc-writing/scenarios/level-1/proposing-solutions.yaml +49 -0
  35. package/courses/technical-rfc-writing/scenarios/level-1/rfc-structure.yaml +41 -0
  36. package/courses/technical-rfc-writing/scenarios/level-1/risks-and-mitigations.yaml +43 -0
  37. package/courses/technical-rfc-writing/scenarios/level-1/scoping-an-rfc.yaml +49 -0
  38. package/courses/technical-rfc-writing/scenarios/level-1/success-metrics.yaml +43 -0
  39. package/courses/technical-rfc-writing/scenarios/level-1/writing-for-audience.yaml +42 -0
  40. package/courses/technical-rfc-writing/scenarios/level-2/risk-assessment-matrix.yaml +43 -0
  41. package/courses/technical-rfc-writing/scenarios/level-2/technical-design-detail.yaml +42 -0
  42. package/courses/technical-rfc-writing/scenarios/level-2/trade-off-analysis.yaml +43 -0
  43. package/dist/evaluator/judge.d.ts +7 -1
  44. package/dist/evaluator/judge.d.ts.map +1 -1
  45. package/dist/evaluator/judge.js +50 -11
  46. package/dist/evaluator/judge.js.map +1 -1
  47. package/dist/types/index.d.ts +1 -1
  48. package/dist/types/index.d.ts.map +1 -1
  49. package/package.json +1 -1
@@ -533,3 +533,12 @@ Tracks all auto-generated courses for dojo.md.
  - **Scenarios**: 50 (10 per level × 5 levels)
  - **Type**: output
  - **Sources**: OpenAPI Initiative (OAS 3.1 specification), Swagger UI documentation, Redoc documentation, Stripe API documentation (industry benchmark), Twilio documentation patterns, GitHub API documentation, Diátaxis documentation framework, RFC 8594 (Deprecation header), RFC 7396 (JSON Merge Patch), Spectral linting documentation, Dredd contract testing, Schemathesis API testing, Docusaurus/Nextra static site generators, developer experience research and best practices
+
+ ### Course 55: Code Review Feedback Writing
+ - **Generated**: 2026-02-27
+ - **Category**: Writing & Communication
+ - **Directory**: `courses/code-review-feedback-writing/`
+ - **Topics researched**: clear comment writing (specific, actionable, contextual), constructive tone (collaborative, respectful, critique code not people), style vs logic severity (blocking/suggestion/nit categorization), asking effective questions (genuine vs passive-aggressive, Socratic method), providing context in comments (self-contained, references, evidence), reviewing small PRs (summary comments, inline feedback, approval decisions), nitpick etiquette (when to raise, labeling, automation alternatives), approve vs request changes (decision framework, calibrated decisions), giving meaningful praise (specific, pattern-reinforcing, authentic), reviewing complex PRs (thematic grouping, critical path identification, PR splitting), architectural feedback (design principles, proportionate alternatives, future risk), security review comments (vulnerability explanations, attack scenarios, OWASP), performance feedback (quantified impact, N+1 queries, complexity analysis), suggesting alternatives with code (before/after, trade-off analysis), reviewing tests (coverage, edge cases, naming, independence), reviewing documentation changes (accuracy, completeness, prevention), reviewing error handling (failure scenarios, resilience patterns, production impact), reviewing breaking changes (impact identification, safe migration, deployment coordination), cross-team review (different vs wrong, interface focus, collaborative communication), reviewing unfamiliar code (universal quality issues, dual-purpose questions), mentoring through review (teaching principles, supportive tone, building independence), speed vs thoroughness (triage, varying depth, honest scoping), API design review (REST conventions, response format, future compatibility), database migration review (production safety, rollback, deployment strategy), design pattern feedback (when to apply, over-engineering callout), reviewing senior engineers' code (confident feedback, power dynamics), production incident review (fix correctness, urgency balance, follow-up scoping), review culture design (values, behaviors, SLAs, conflict resolution), review process optimization (root cause analysis, interventions, measurement), review metrics (effectiveness vs activity, dangerous metrics, gaming prevention), automated review strategy (automation decisions, tool selection, rollout), review load balancing (assignment systems, cross-training, quality management), training reviewers (skill levels, curriculum, practice exercises), review guidelines standards (universal requirements, responsibilities, flexibility), security review standards (checklist by code area, training tiers, champions), scaling review process (20→50→100 milestones, transitions), review as organizational learning (knowledge capture, cross-team programs), toxic culture transformation (diagnosis, immediate interventions, 90-day program), scaling to 100+ engineers (automated systems, timezone awareness, culture preservation), review velocity impact (cost analysis, optimization, CEO recommendation), board quality metrics (business translation, financial quantification, competitive strategy), review competitive advantage (strategic narrative, recruiting, maintenance), knowledge transfer at scale (emergency 4-week plan, prevention strategy, distribution measurement), M&A review alignment (culture assessment, unified framework, migration strategy), review ROI analysis (cost model, value quantification, CFO recommendation)
+ - **Scenarios**: 50 (10 per level × 5 levels)
+ - **Type**: output
+ - **Sources**: Google Engineering Practices (code review documentation), Conventional Comments (labeling system), Stack Overflow Blog (code review best practices), Dr. Michaela Greiler (respectful review feedback research), OWASP Secure Code Review Cheat Sheet, Graphite blog (code review scaling), DORA metrics research (deployment frequency, change failure rate), psychological safety research (Amy Edmondson), code review effectiveness studies
@@ -0,0 +1,48 @@
+ meta:
+ id: advanced-review-shift
+ level: 3
+ course: code-review-feedback-writing
+ type: output
+ description: "Combined advanced shift — review a cross-team database migration PR while mentoring a junior developer, balancing speed with thoroughness during a sprint deadline"
+ tags: [code-review, combined, shift-simulation, cross-team, mentoring, advanced]
+
+ state: {}
+
+ trigger: |
+ It's Wednesday afternoon, sprint ends Friday. You have one complex
+ review situation:
+
+ A junior developer on another team wrote a PR that adds multi-tenant
+ support to the shared authentication service. Your team depends on
+ this service heavily. The PR:
+
+ - Database migration: adds tenant_id to 3 tables (users, sessions, api_keys)
+ - Auth middleware changes: adds tenant context to every request
+ - API changes: new endpoints for tenant management
+ - Breaking change: all existing API calls need an X-Tenant-ID header
+ - Tests: basic coverage, missing edge cases
+ - The junior dev is under pressure from their manager to ship by Friday
+
+ You need to:
+ 1. Review from your team's perspective (impact on your services)
+ 2. Mentor the junior developer (this is their biggest PR)
+ 3. Be thorough on security (auth changes are critical)
+ 4. Be fast enough to not block the sprint
+ 5. Navigate the cross-team politics (their manager wants quick approval)
+
+ Task: Write the complete review that balances all five concerns.
+ Include your communication strategy for the sensitive dynamics.
+
+ assertions:
+ - type: llm_judge
+ criteria: "Security concerns are non-negotiable regardless of deadline — auth middleware review: 'Tenant isolation is a security boundary — if tenant_id can be spoofed via header, any user can access any tenant's data. The X-Tenant-ID header must be validated against the authenticated user's allowed tenants, not just trusted.' Specific checks: (1) Can user A set X-Tenant-ID to tenant B's ID? (2) What happens when the header is missing? (3) What happens when an invalid tenant ID is provided? (4) Are all existing queries properly scoped by tenant_id? Migration: 'Adding tenant_id without a foreign key constraint means invalid tenant IDs can exist in the database.' Mark as blocking: 'These security items must be addressed before merge — no exceptions for sprint deadlines on authentication code.'"
+ weight: 0.35
+ description: "Security non-negotiable"
+ - type: llm_judge
+ criteria: "Impact on dependent teams is clearly communicated — breaking change assessment: 'The X-Tenant-ID header requirement is a breaking change for all consuming services. Our payments service makes 200 requests/second to auth — this will fail after deploy unless we're coordinated.' Migration plan needed: 'How is this being deployed? All consuming services need to be updated simultaneously, or: make the header optional first (default to primary tenant), deploy all consumers, then make it required.' Communication: raise in cross-team channel, not just PR comment. Specific impact: 'Our team needs 2-3 days to update our services to include X-Tenant-ID. This can't ship Friday without a backward-compatible migration path.' Propose solution: 'Make tenant_id optional with default for single-tenant mode. Phase 2 makes it required after all services migrate.'"
+ weight: 0.35
+ description: "Impact communication"
+ - type: llm_judge
+ criteria: "Mentoring and politics are handled with care — mentoring: 'This is an ambitious and important feature — the overall approach is solid. I have some security items that need to be addressed and some suggestions for safer deployment.' Don't overwhelm: separate blocking items (3-4) from suggestions (5+) from follow-up items. Offer help: 'Happy to pair on the tenant validation logic if that would help move faster.' Politics: direct message to the junior dev's manager: 'The PR is well-written but has security issues that need to be addressed — I want to help [name] fix them quickly rather than rush a merge. The breaking change also needs a migration plan that involves our team. Can we sync briefly?' Don't: approve under pressure ('it's fine, ship it'), block passive-aggressively ('I have 47 comments'), or escalate unnecessarily. Frame as: 'We're on the same team — let's ship this right.'"
+ weight: 0.30
+ description: "Mentoring and politics"
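The header-validation checks the first assertion demands can be sketched as follows — a minimal sketch with hypothetical names, assuming the authenticated user's tenant memberships are already loaded, and including the backward-compatible "default to primary tenant" phase the second assertion proposes:

```python
# Minimal sketch of server-side X-Tenant-ID validation (hypothetical names).
# The header is treated as a claim and checked against the tenants the
# authenticated user actually belongs to, never trusted directly.

class TenantValidationError(Exception):
    pass

def resolve_tenant(header_tenant_id, user_tenant_ids, default_tenant_id=None):
    """Return the tenant ID to scope this request to, or raise."""
    if header_tenant_id is None:
        # Backward-compatible phase: fall back to the user's primary tenant
        # so existing callers keep working until the header becomes required.
        if default_tenant_id is not None:
            return default_tenant_id
        raise TenantValidationError("missing X-Tenant-ID header")
    if header_tenant_id not in user_tenant_ids:
        # User A setting X-Tenant-ID to tenant B's ID lands here.
        raise TenantValidationError("user is not a member of this tenant")
    return header_tenant_id
```

Spoofed, missing, and invalid tenant IDs all fail closed; only a header matching the user's own memberships (or the explicit single-tenant default) is accepted.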
@@ -0,0 +1,47 @@
+ meta:
+ id: api-design-review
+ level: 3
+ course: code-review-feedback-writing
+ type: output
+ description: "Review API design — evaluate endpoint design, request/response contracts, versioning strategy, and backward compatibility in code reviews"
+ tags: [code-review, API-design, endpoints, contracts, versioning, advanced]
+
+ state: {}
+
+ trigger: |
+ You're reviewing a PR that adds 3 new API endpoints to your
+ public-facing REST API. These endpoints will be used by external
+ developers and can't easily be changed after release.
+
+ Proposed endpoints:
+ 1. POST /api/v1/getOrders — retrieves orders with filters in the
+ request body
+ 2. PUT /api/v1/orders/{id} — updates an order, but some fields
+ (status, total) are calculated and shouldn't be settable
+ 3. DELETE /api/v1/orders/{id}/items/{itemId}/remove — removes an
+ item from an order
+
+ Response formats:
+ - Success: { "success": true, "data": {...} }
+ - Error: { "success": false, "error": "Something went wrong" }
+
+ The endpoints work correctly with comprehensive tests.
+
+ Task: Review the API design for RESTful conventions, future
+ compatibility, and developer experience. Write review comments
+ that explain API design principles and suggest corrections.
+ Remember: these become your public contract once released.
+
+ assertions:
+ - type: llm_judge
+ criteria: "REST convention violations are identified — (1) POST /getOrders: GET is for retrieval, POST is for creation. Filters should be query parameters: GET /api/v1/orders?status=pending&after=2024-01-01. If filter complexity requires a body, use POST /api/v1/orders/search. 'getOrders' as a verb violates REST (nouns for resources). (2) PUT allows setting calculated fields: read-only fields should be rejected or ignored. Document which fields are writable. Consider: does PUT replace the entire order or just modified fields? If partial, use PATCH. (3) DELETE .../remove: redundant — DELETE already means remove. Simplify to DELETE /api/v1/orders/{id}/items/{itemId}. Each violation explained with the REST principle and the practical impact on developers"
+ weight: 0.35
+ description: "REST conventions"
+ - type: llm_judge
+ criteria: "Response format and contract issues are addressed — success/error envelope: 'The { success: true/false } pattern is redundant with HTTP status codes — 200 already means success, 400 means client error. Industry standard (Stripe, GitHub): use status codes for success/failure, and reserve the response body for data. Error responses should include: code (machine-readable), message (human-readable), details (field-level errors).' Current error: 'Something went wrong' gives developers nothing to debug. Suggest: `{ error: { code: 'ORDER_NOT_FOUND', message: 'Order with ID xyz does not exist', param: 'id' } }`. Consistency: 'Match the error format used by your other 20 endpoints — developers shouldn't learn a new error format per endpoint.'"
+ weight: 0.35
+ description: "Response format"
+ - type: llm_judge
+ criteria: "Future compatibility concerns are raised — 'These endpoints become a permanent public contract. Changes after release are breaking changes that affect external developers.' Specific concerns: (1) pagination — GET /orders must return paginated results (what happens when there are 100K orders?), (2) filtering — be intentional about which filters are supported (adding is non-breaking, removing is breaking), (3) field expansion — consider using ?fields=id,status,total for sparse responses, (4) versioning — v1 in the URL means we can make v2 later, but plan for what triggers a v2. API review checklist: naming consistency, HTTP method correctness, pagination, filtering, sorting, error format, rate limiting headers, authentication, idempotency (PUT/DELETE should be idempotent). 'Suggest: run this through our API design review checklist before launch.'"
+ weight: 0.30
+ description: "Future compatibility"
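The structured error body the second assertion points to (Stripe-style code/message/param, with the HTTP status code carrying success/failure) might look like this minimal sketch; the helper name and fields are illustrative:

```python
# Sketch of a machine-readable error payload builder (hypothetical helper).
# Success/failure is signaled by the HTTP status code, not a "success" flag;
# the body carries only what a developer needs to debug the error.

def error_body(code, message, param=None):
    """Build the JSON-serializable error envelope for a failed request."""
    error = {"code": code, "message": message}
    if param is not None:
        # Name of the request field/path parameter the error relates to.
        error["param"] = param
    return {"error": error}

# A missing order would pair HTTP 404 with:
# error_body("ORDER_NOT_FOUND", "Order with ID xyz does not exist", param="id")
```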
@@ -0,0 +1,48 @@
+ meta:
+ id: database-migration-review
+ level: 3
+ course: code-review-feedback-writing
+ type: output
+ description: "Review database migrations — evaluate schema changes for safety, reversibility, data integrity, and production deployment risk"
+ tags: [code-review, database, migration, schema, deployment, safety, advanced]
+
+ state: {}
+
+ trigger: |
+ You're reviewing a database migration PR. The migration:
+
+ 1. Adds a NOT NULL column "subscription_tier" to the users table
+ (2 million rows) without a default value
+
+ 2. Renames column "userName" to "user_name" in the orders table
+
+ 3. Adds a unique constraint on (email, organization_id) to the
+ users table
+
+ 4. Drops the "legacy_tokens" table (mentioned as "unused")
+
+ 5. Creates a new index on orders(customer_id, created_at) — the
+ orders table has 50 million rows
+
+ 6. Changes the "amount" column type from INTEGER to DECIMAL(10,2)
+
+ The migration has no down/rollback migration defined.
+
+ Task: Review each migration step for production safety. For each,
+ explain the risk, describe what could go wrong during deployment,
+ and suggest the safe approach. These are the reviews that prevent
+ 2am deployment incidents.
+
+ assertions:
+ - type: llm_judge
+ criteria: "Production risks are identified for each step — (1) NOT NULL without default: migration fails on 2M existing rows (all have NULL for the new column). Even with a default, adding NOT NULL column locks the table during backfill on some databases. (2) Column rename: every query referencing 'userName' breaks instantly — application code, reports, ETL jobs, third-party integrations. (3) Unique constraint: fails if duplicate (email, org_id) pairs already exist — migration aborts mid-way. Need to check for duplicates BEFORE applying. (4) Drop table: 'unused' — how was this verified? Any scheduled jobs, background workers, or external systems accessing it? Data loss is permanent. (5) New index on 50M rows: index creation locks the table (in some DBs) — no reads or writes during build, could take 30+ minutes. (6) Type change: INTEGER to DECIMAL changes data semantics — existing value 1000 (cents) becomes 1000.00 (dollars?). All application code interpreting this value must be updated simultaneously"
+ weight: 0.35
+ description: "Production risks"
+ - type: llm_judge
+ criteria: "Safe approaches are suggested for each — (1) NOT NULL: three-step migration: add column as NULL with default → backfill existing rows → add NOT NULL constraint. (2) Rename: don't rename — add new column, dual-write, migrate reads, drop old column. Or: use database alias/view. (3) Unique constraint: first run SELECT email, organization_id, COUNT(*) ... HAVING COUNT(*) > 1 to find duplicates, fix them, THEN add constraint. (4) Drop table: add a deprecation period — rename to legacy_tokens_deprecated, wait 30 days, monitor for errors, THEN drop. (5) Index: use CREATE INDEX CONCURRENTLY (PostgreSQL) or equivalent to avoid table lock. (6) Type change: coordinate with application deploy — use a feature flag or blue/green deployment so code and schema change together. Each step: show the safe SQL or migration code"
+ weight: 0.35
+ description: "Safe approaches"
+ - type: llm_judge
+ criteria: "Missing rollback and deployment strategy are flagged — 'No down migration defined — if this migration fails halfway, how do we roll back? Every migration must have a corresponding rollback.' Rollback for each step: (1) DROP COLUMN, (2) rename back, (3) DROP CONSTRAINT, (4) can't rollback data loss (strongest argument against dropping tables), (5) DROP INDEX, (6) ALTER COLUMN back to INTEGER (but data precision is lost). Deployment strategy: 'This migration should be split into 6 separate migrations, deployed independently over multiple deploys. If step 3 fails, we don't want to have to roll back steps 1-2.' Testing: 'Run this migration against a copy of production data first — the unique constraint check (#3) and index creation time (#5) should be validated with real data volumes.' Monitor: 'Add migration timing to deployment metrics — alert if any step takes more than 5 minutes'"
+ weight: 0.30
+ description: "Rollback and deployment"
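The pre-flight duplicate check for step 3 (the `SELECT email, organization_id, COUNT(*) ... HAVING COUNT(*) > 1` idea in the second assertion) can be sketched in-memory like this; the function name and row shape are assumptions for illustration:

```python
# Sketch of the duplicate check to run BEFORE adding the unique constraint
# on (email, organization_id). In production this would be the SQL query;
# here the same logic is shown over an iterable of rows.

from collections import Counter

def find_duplicate_pairs(rows):
    """rows: iterable of (email, organization_id) tuples.

    Returns the pairs that appear more than once, with their counts —
    exactly the rows that would make the unique constraint abort mid-way.
    """
    counts = Counter((email, org_id) for email, org_id in rows)
    return {pair: n for pair, n in counts.items() if n > 1}
```

If the result is non-empty, the duplicates must be resolved first; only then is it safe to apply the constraint.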
@@ -0,0 +1,48 @@
+ meta:
+ id: design-pattern-feedback
+ level: 3
+ course: code-review-feedback-writing
+ type: output
+ description: "Give design pattern feedback — recognize pattern opportunities and anti-patterns in code reviews and explain when patterns help vs when they over-engineer"
+ tags: [code-review, design-patterns, anti-patterns, refactoring, over-engineering, advanced]
+
+ state: {}
+
+ trigger: |
+ You're reviewing code that could benefit from design patterns, but
+ you need to be careful — not every piece of code needs a pattern.
+ You've found:
+
+ 1. A 500-line switch statement handling 20 different notification
+ types — each case has similar but slightly different logic
+
+ 2. A class with 15 methods that all start by loading user data,
+ checking permissions, and logging — massive duplication
+
+ 3. Code that creates database connections directly in 30 different
+ places instead of using a connection pool
+
+ 4. A simple CRUD endpoint where someone suggests "we should add
+ a Strategy pattern for the create operation"
+
+ 5. Three layers of abstraction for a function that's called in
+ exactly one place
+
+ Task: For each situation, decide: does it need a pattern, which
+ pattern, or is the current approach fine? Write review comments
+ that teach WHEN patterns are appropriate, not just WHAT patterns
+ exist. Call out over-engineering as firmly as under-engineering.
+
+ assertions:
+ - type: llm_judge
+ criteria: "Pattern recommendations are justified with concrete benefits — (1) Switch statement: Strategy or Command pattern — define a NotificationHandler interface, create EmailHandler, SMSHandler, PushHandler. Benefit: each handler is independently testable, new notification types don't modify existing code. Show the refactored structure. (2) Repeated setup: Template Method or middleware/decorator pattern — extract loadUser(), checkPermission(), logAction() into middleware that runs before each method. Benefit: DRY, single place to modify auth logic. (3) Direct connections: use connection pool (singleton or factory pattern). Benefit: resource management, connection reuse, centralized configuration. Each recommendation includes: what pattern, why it helps HERE, and a code sketch showing the result"
+ weight: 0.35
+ description: "Justified patterns"
+ - type: llm_judge
+ criteria: "Over-engineering is called out with equal conviction — (4) Strategy for simple CRUD: 'This endpoint creates a record in a table — a Strategy pattern adds 3 files and an interface for something that's 10 lines of straightforward code. The Strategy pattern is valuable when you have multiple interchangeable algorithms, not for a single code path that does one thing. YAGNI — You Aren't Gonna Need It. If we need different creation strategies later, we can refactor then.' (5) Three abstraction layers: 'This function is called in one place. The abstraction adds cognitive overhead — a reader must trace through 3 files to understand what happens. Inline the logic into the caller. Abstractions are earned by duplication, not created in anticipation.' Equal firmness: over-engineering wastes time and adds complexity just as under-engineering creates duplication"
+ weight: 0.35
+ description: "Over-engineering callout"
+ - type: llm_judge
+ criteria: "Guidelines for WHEN to apply patterns are practical — rules of thumb: (1) Three strikes rule — the first time, just write the code; the second time, note the duplication; the third time, extract a pattern. (2) Patterns are answers to specific problems — if you can't name the problem, you don't need the pattern. (3) Test: would a new team member understand the code better WITH or WITHOUT the pattern? If the pattern adds confusion, skip it. (4) Patterns should reduce code, not increase it — if Strategy adds 100 lines to save 20 lines of duplication, it's not worth it yet. (5) Favor simple composition over complex inheritance. Anti-patterns to watch for: 'Abstract Factory for one implementation', 'Singleton just because', 'Observer for two components that could just call each other.' Maturity signal: knowing when NOT to use a pattern is more valuable than knowing patterns"
+ weight: 0.30
+ description: "When to apply"
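The Strategy-style refactor the first assertion recommends for the notification switch might be sketched like this — illustrative handler names, with dispatch by lookup so new notification types never touch existing cases:

```python
# Sketch of replacing a giant switch over notification types with per-type
# handler objects (Strategy pattern). Handler names are illustrative.

class EmailHandler:
    def send(self, message):
        # Real implementation would call the email service.
        return f"email: {message}"

class SMSHandler:
    def send(self, message):
        return f"sms: {message}"

# Registry replaces the switch: adding a type means adding one entry here
# plus one new handler class — existing handlers stay untouched and each
# is independently testable.
HANDLERS = {
    "email": EmailHandler(),
    "sms": SMSHandler(),
}

def notify(kind, message):
    handler = HANDLERS.get(kind)
    if handler is None:
        raise ValueError(f"unknown notification type: {kind}")
    return handler.send(message)
```

By the same assertions' logic this refactor is only justified here because there are 20 divergent cases; the simple CRUD endpoint in item 4 would gain nothing from it.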
@@ -0,0 +1,42 @@
+ meta:
+ id: production-incident-review
+ level: 3
+ course: code-review-feedback-writing
+ type: output
+ description: "Review incident fix PRs — evaluate hotfix code for correctness, root cause coverage, regression prevention, and appropriate urgency without cutting corners"
+ tags: [code-review, incident, hotfix, production, root-cause, regression, advanced]
+
+ state: {}
+
+ trigger: |
+ There's an active production incident. A developer submits a PR
+ tagged "HOTFIX" to resolve it. The incident: users intermittently
+ see another user's order history when clicking "My Orders."
+
+ The PR (45 lines):
+ - Adds a user ID filter to the orders query that was missing
+ - The fix is in the caching layer — orders were cached by page
+ URL without including the user ID in the cache key
+ - The author also "cleaned up" some unrelated code in the same file
+ - No tests added ("we need to ship this fast")
+ - The fix handles the cache key but doesn't invalidate existing
+ incorrect cache entries
+
+ Task: Review this hotfix balancing urgency with safety. Write your
+ review addressing: is the fix correct, does it fully resolve the
+ issue, what's the risk of shipping vs not shipping, and what
+ follow-up work is needed. This is a time-sensitive review.
+
+ assertions:
+ - type: llm_judge
+ criteria: "Fix correctness is evaluated against the root cause — 'The cache key fix is correct — cache_key = /orders?page=1 should be cache_key = /orders?page=1&user_id=123. This prevents NEW incorrect entries. However, EXISTING cached entries still contain mixed user data. Risk: users continue seeing wrong data until those cache entries expire.' Critical missing piece: 'We need to either (1) flush the orders cache entirely, or (2) add cache invalidation for affected keys. Without this, the fix only prevents future occurrences — existing cached wrong data persists for [TTL duration].' Assessment: 'The fix is 80% correct — it prevents the vulnerability from recurring but doesn't clean up existing corrupted cache entries.'"
+ weight: 0.35
+ description: "Fix correctness"
+ - type: llm_judge
+ criteria: "Review balances urgency with necessary safety — decision framework: 'Ship immediately with cache flush' not 'Wait for perfect fix.' Specific recommendation: (1) Remove the unrelated code cleanup — hotfix PRs should ONLY contain the fix (reduces risk, simplifies rollback). (2) Add cache flush/invalidation — without this, the data leak continues from cache. (3) Tests can be a fast follow-up PR — but must be done today, not 'later.' (4) Approve once: unrelated changes removed AND cache invalidation added. Urgency acknowledgment: 'This is a data privacy incident — users seeing other users' data. Speed matters. But shipping a partial fix that still leaks cached data isn't actually faster — it's a second incident.'"
+ weight: 0.35
+ description: "Urgency balance"
+ - type: llm_judge
+ criteria: "Follow-up work is clearly scoped — immediate (ship with hotfix): cache key fix + cache flush + remove unrelated changes. Today: add regression test that verifies cache keys include user ID. This week: (1) audit all other cache keys for similar user-scoping issues, (2) add cache key logging/monitoring to detect future issues, (3) consider cache key builder utility that enforces user scoping by default. Post-incident: (1) retrospective to understand why the original code didn't include user ID (was it a refactor regression? was it never there?), (2) review process improvement — how did the original PR pass review without user scoping? (3) postmortem document. Frame follow-up as learning: 'The cache key pattern should be documented as a team standard — this is how bugs become institutional knowledge.'"
+ weight: 0.30
+ description: "Follow-up scoping"
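The two-part fix these assertions call for — user-scoped cache keys plus cleaning out the legacy unscoped entries — can be sketched as follows; names and the dict-as-cache are illustrative assumptions:

```python
# Sketch of the hotfix the assertions describe (hypothetical names):
# (1) scope cache keys by user ID so one user's cached page can never be
#     served to another, and
# (2) flush the legacy unscoped entries so already-corrupted data doesn't
#     keep leaking until TTL expiry.

def order_cache_key(path, user_id):
    # Before the fix the key was just `path`, shared across all users.
    return f"{path}&user_id={user_id}"

def flush_unscoped_entries(cache):
    """Drop legacy entries whose keys were never user-scoped."""
    for key in list(cache):
        if "user_id=" not in key:
            del cache[key]
```

Without the flush step, the key change alone only prevents new bad entries; the cached mixed-user pages would persist for the remainder of their TTL.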
@@ -0,0 +1,47 @@
+ meta:
+ id: reviewing-senior-code
+ level: 3
+ course: code-review-feedback-writing
+ type: output
+ description: "Review a senior engineer's code — navigate the dynamics of providing feedback to more experienced developers while maintaining quality standards"
+ tags: [code-review, senior-engineer, dynamics, confidence, speaking-up, advanced]
+
+ state: {}
+
+ trigger: |
+ You're a mid-level engineer asked to review a PR from the team's
+ most senior developer (10 years experience, built the core system).
+ Their PR refactors the authentication middleware — 250 lines.
+
+ You find several concerns:
+ 1. A race condition when refreshing OAuth tokens concurrently
+ 2. The new middleware doesn't handle the edge case of expired
+ refresh tokens (the old code did)
+ 3. Magic numbers: timeout values without named constants
+ 4. No tests for the concurrent token refresh scenario
+ 5. The refactor changes behavior — previously failed silently,
+ now throws (affects all callers)
+
+ You're nervous because:
+ - This person is far more experienced than you
+ - They might know something you don't
+ - You've seen them dismiss junior reviewers before
+ - But you're genuinely concerned about #1 and #2
+
+ Task: Write the review navigating the power dynamic. Show how to
+ give feedback to someone more senior while being confident in your
+ observations. Write a guide for reviewing up.
+
+ assertions:
+ - type: llm_judge
+ criteria: "Technical concerns are communicated with appropriate confidence — for genuine bugs (#1, #2): be direct but frame as observations, not accusations. 'I traced through the concurrent token refresh path and I believe there's a race condition: if two requests trigger a refresh simultaneously, both will attempt to write the new token. The old code used a mutex here (auth.ts:89 in the previous version) — was the removal intentional?' For the behavior change (#5): 'I noticed the error handling shifted from silent failure to throwing. Since this middleware is used by all 40 endpoints, this changes behavior for every request — was this intentional? If so, should we coordinate the rollout?' Confidence where warranted: 'The race condition is a real concern regardless of experience level — concurrent access bugs don't care about seniority.'"
+ weight: 0.35
+ description: "Confident feedback"
+ - type: llm_judge
+ criteria: "Power dynamic is navigated without being deferential or aggressive — don't start with: 'I might be wrong but...' or 'I'm sure you thought of this...' (undermines your review). Don't say: 'This is obviously wrong' (confrontational with senior). Do: state observations factually. 'Lines 45-60: when two concurrent requests both have expired tokens, the first refresh succeeds but the second attempts to refresh with the already-used refresh token, which will fail.' Ask genuine questions for things you're unsure about: 'I see the silent failure was changed to throwing — is there a design document for this change? I want to understand the migration plan for callers.' Nits (#3) are fine to raise normally — everyone gets nits. The key: treat the code the same regardless of who wrote it"
+ weight: 0.35
+ description: "Dynamic navigation"
+ - type: llm_judge
+ criteria: "Guide for reviewing up is practical — principles: (1) Your job is to review the CODE, not the PERSON. Experience doesn't make code bug-free. (2) If you see a potential bug, say so — you were asked to review for a reason. (3) Frame uncertain observations as questions ('Could this cause X?'), but frame clear issues as statements ('This race condition will cause Y'). (4) Don't pre-apologize for having feedback — it undermines the review process. (5) Senior engineers generally WANT honest review — they chose you as a reviewer. (6) If dismissed unfairly, escalate to your manager — review quality is everyone's responsibility. Common mistake: being so deferential that you provide a useless rubber-stamp review. The worst outcome isn't disagreeing with a senior — it's a production bug you saw but didn't mention"
+ weight: 0.30
+ description: "Reviewing up guide"
@@ -0,0 +1,46 @@
+ meta:
+ id: speed-vs-thoroughness
+ level: 3
+ course: code-review-feedback-writing
+ type: output
+ description: "Balance review speed and thoroughness — develop strategies for providing timely reviews without sacrificing quality for different types of PRs"
+ tags: [code-review, speed, thoroughness, triage, time-management, advanced]
+
+ state: {}
+
+ trigger: |
+ You have 6 PRs in your review queue, all assigned to you. Your
+ team's SLA is 24 hours for first review. It's 4pm and these came
+ in today:
+
+ 1. Hotfix: Production bug — users seeing 500 errors on checkout
+ (15 lines, high urgency)
+ 2. Feature: New dashboard analytics (350 lines, normal priority,
+ sprint deadline Friday)
+ 3. Dependency update: Bump React from 18.2 to 18.3 (5 files changed,
+ low priority)
+ 4. Refactor: Extract common validation logic into shared module
+ (200 lines, medium priority)
+ 5. Junior dev's PR: First feature implementation, they're blocked
+ waiting for review (100 lines, medium urgency due to person)
+ 6. Tech debt: Remove deprecated API endpoints (150 lines, low priority)
+
+ You have 2 hours of review time today.
+
+ Task: Create a review strategy for all 6 PRs. Show how to triage,
+ what depth of review each gets, write example reviews at different
+ thoroughness levels, and explain your prioritization reasoning.
+
+ assertions:
+ - type: llm_judge
+ criteria: "Triage ordering is well-reasoned — priority: (1) Hotfix — production impact, review in next 15 minutes. (2) Junior dev's PR — person is blocked, reviewing unblocks them (invest 30 minutes). (3) Feature — sprint deadline pressure (invest 45 minutes). (4) Refactor — medium priority, affects shared code quality (invest 20 minutes). (5) Dependency update — low risk if CI passes, quick scan (10 minutes). (6) Tech debt — no deadline, can slip to tomorrow (skip or quick scan). Reasoning: combine urgency (production > blocked person > deadline) with risk (shared code > isolated change > dependency bump). The junior dev is prioritized higher than strict urgency would suggest because blocking people is more costly than blocking code"
+ weight: 0.35
+ description: "Triage ordering"
+ - type: llm_judge
+ criteria: "Review depth varies appropriately per PR — Hotfix (15 min): verify the fix addresses the error, check for regression risk, don't nit on style — approve fast with 'LGTM, fix looks correct. Let's clean up styling in a follow-up.' Feature (45 min): full review — architecture, security, tests, performance. This is the big one that justifies thorough review. Junior PR (30 min): mentor-focused review — teach patterns, be encouraging, unblock quickly even if follow-up work is needed. Refactor (20 min): focus on interface changes and backward compatibility — internal implementation can be reviewed more lightly. Dependency (10 min): check CI passes, scan changelog for breaking changes, verify no deprecated API usage. Tech debt (5 min or skip): quick scan for accidental behavior changes, or leave comment 'I'll review this first thing tomorrow.' Each depth level is explicitly justified"
+ weight: 0.35
+ description: "Varying depth"
+ - type: llm_judge
+ criteria: "Example reviews demonstrate different thoroughness levels — quick review (hotfix): 'Verified: the fix adds null check for user.cart before accessing .items. This matches the error in Sentry (TypeError: Cannot read property items of null). CI passes. Approved — ship it.' Medium review (refactor): summary of approach, 2-3 specific comments on interface design, one question about edge case, approve with suggestions. Deep review (feature): structured thematic review, 8-10 comments across architecture/security/testing, request changes on 2 blocking items. Honest scoping: when you do a lighter review, say so: 'I did a quick review focused on the API interface — I'd recommend someone else gives the internal logic a deeper look.' Better to do a timeboxed review and disclose the scope than to skim everything and pretend it was thorough"
+ weight: 0.30
+ description: "Depth examples"
@@ -0,0 +1,44 @@
+ meta:
+ id: automated-review-strategy
+ level: 4
+ course: code-review-feedback-writing
+ type: output
+ description: "Design automated review tooling — determine what to automate vs keep as human review, select tools, and integrate automation into the review workflow"
+ tags: [code-review, automation, linting, static-analysis, CI, tooling, expert]
+
+ state: {}
+
+ trigger: |
+ Your team spends too much time on mechanical review tasks. You want
+ to automate what you can so humans focus on what matters. Current
+ review pain points:
+
+ - 40% of comments are style/formatting (Prettier/ESLint should handle)
+ - Security vulnerabilities sometimes slip through (Snyk/Semgrep could catch)
+ - Type errors found in review (TypeScript strict mode would prevent)
+ - Dead code and unused imports flagged manually
+ - Test coverage gaps discovered during review
+ - API breaking changes not noticed until production
+ - Documentation staleness (code changed, docs not updated)
+
+ Budget: $500/month for tooling. Team: 30 engineers.
+
+ Task: Design the automated review tooling strategy. For each pain
+ point, decide: automate fully, automate as advisory, or keep human.
+ Include: tool selection, CI/CD integration, configuration approach,
+ and how to introduce automation without overwhelming developers
+ with noise.
+
+ assertions:
+ - type: llm_judge
+ criteria: "Each pain point has a clear automation decision — automate fully (block PR if failing): formatting (Prettier — auto-fix on commit), linting (ESLint — errors block, warnings advisory), type checking (TypeScript strict — blocks build), unused imports (ESLint rule — auto-fix). Automate as advisory (comment on PR, don't block): security scanning (Semgrep/CodeQL — flag but human validates severity), test coverage (report delta — don't block on arbitrary threshold), dead code detection (report but human decides if intentional). Keep human: architectural review, business logic validation, API design review, documentation quality (detect CHANGE but human reviews quality). Each decision justified: 'Formatting is objectively correct — no human judgment needed. Security findings require context — is this a real vulnerability or a false positive?'"
+ weight: 0.35
+ description: "Automation decisions"
+ - type: llm_judge
+ criteria: "Tool selection is specific and budget-conscious — within $500/month: (1) Prettier + ESLint: free, handles formatting and code quality. (2) TypeScript strict mode: free, catches type errors. (3) GitHub Actions: included in GitHub, runs CI checks. (4) Semgrep: free tier covers security scanning. (5) Codecov: free for open source, ~$100/month for private — coverage delta reporting. (6) danger.js or similar: free, custom PR automation (check for docs updates when API files change, check for changelog when version bumps). (7) Spectral: free, OpenAPI linting for API breaking changes. Total: ~$100-200/month, well under budget. Alternative: GitHub Advanced Security (~$400/month) for CodeQL + secret scanning + dependency review. Configuration: use shared config packages so all 30 engineers have consistent setup"
+ weight: 0.35
+ description: "Tool selection"
+ - type: llm_judge
+ criteria: "Rollout strategy prevents automation fatigue — phase 1 (week 1-2): Prettier auto-format only (zero noise — just fixes things). Phase 2 (week 3-4): ESLint errors only (not warnings — configure strictly to avoid noise). Phase 3 (month 2): Security scanning as advisory comments. Phase 4 (month 3): Coverage reporting and API change detection. Key principle: 'Every automated check must be actionable. If developers start ignoring bot comments, the automation is noise, not signal.' Configuration: tune tools before enabling — run against existing codebase, fix existing violations, THEN enable as blocking. False positive budget: 'If a tool produces more than 10% false positives, fix configuration or remove the tool. Developer trust in automation is fragile.' Gradual: introduce one tool at a time, measure impact, adjust before adding the next"
+ weight: 0.30
+ description: "Rollout strategy"
@@ -0,0 +1,46 @@
+ meta:
+ id: expert-review-shift
+ level: 4
+ course: code-review-feedback-writing
+ type: output
+ description: "Combined expert shift — redesign an organization's code review program covering culture, process, automation, metrics, and training for a growing engineering team"
+ tags: [code-review, combined, shift-simulation, program-design, expert]
+
+ state: {}
+
+ trigger: |
+ You've been promoted to VP of Engineering at a 100-person startup
+ (60 engineers, 8 teams). The CEO says: "Our code quality is
+ declining, deployments keep breaking, and engineers are unhappy
+ with the review process. Fix it."
+
+ Current state:
+ - No formal review process — each team does their own thing
+ - Average defect escape rate: 12% (industry target: < 5%)
+ - Deployment rollback rate: 8% (should be < 2%)
+ - Engineer satisfaction with reviews: 3.5/10
+ - 3 engineers left citing "toxic review culture" in exit interviews
+ - No automated code quality checks
+ - Average time-to-merge: 8 days (blocking feature delivery)
+
+ Budget: $50K for tooling, ability to hire 1 additional head
+
+ Task: Present your complete code review transformation plan to the
+ CEO. Include: diagnosis of current problems, 90-day action plan
+ with quick wins, 6-month target state, tooling investment, culture
+ change program, metrics dashboard, and how this connects to the
+ broader goals of code quality and engineer retention.
+
+ assertions:
+ - type: llm_judge
+ criteria: "Diagnosis connects symptoms to root causes — 12% defect escape: reviews aren't catching bugs (reviewers focused on style, not logic — or rubber-stamping). 8% rollback rate: inadequate testing requirement in reviews (tests not reviewed or not required). 3.5/10 satisfaction: combination of slow reviews (8 day merge), hostile comments (3 people left), and inconsistent standards. Toxic culture: specific reviewers giving harsh feedback without consequences — management didn't intervene. 8-day merge: no SLA, no reviewer assignment, no load balancing — PRs sit in queues. Root cause: absence of process, not malicious intent. Nobody designed the review system — it evolved haphazardly. Fix the system, not the people (though the toxic reviewer behavior needs direct intervention)"
+ weight: 0.35
+ description: "Root cause diagnosis"
+ - type: llm_judge
+ criteria: "90-day plan delivers quick wins and lasting change — days 1-30 (immediate): (1) address toxic reviewer behavior directly (1:1s, expectations, consequences), (2) implement PR template with minimum requirements, (3) add Prettier + ESLint to CI (eliminate 40% of review comments), (4) establish 24-hour review SLA, (5) hire: DevEx engineer to own review tooling. Days 31-60: (1) CODEOWNERS for automatic reviewer assignment, (2) security scanning in CI (Semgrep), (3) reviewer training workshop (2 hours, all engineers), (4) code review guidelines document published. Days 61-90: (1) metrics dashboard live, (2) review quality feedback loop (manager reviews reviews), (3) cross-team reviewer rotation pilot, (4) quarterly calibration exercise. Each action connected to a specific metric improvement"
+ weight: 0.35
+ description: "90-day plan"
+ - type: llm_judge
+ criteria: "6-month targets and CEO communication are compelling — 6-month targets: defect escape < 6% (from 12%), rollback < 4% (from 8%), satisfaction > 6.5/10 (from 3.5), time-to-merge < 3 days (from 8), zero departures citing review culture. Budget allocation: tooling $20K (CI/CD, security scanning, coverage), training $10K (workshops, external facilitation), DevEx hire $80K (6 months salary). ROI for CEO: each production defect costs ~$5K (investigation + fix + customer impact). Reducing defect escape from 12% to 6% on 500 deploys/year = 30 fewer defects = $150K saved. Each departed engineer costs ~$150K to replace — preventing 3 departures = $450K. Total ROI: $600K savings on $110K investment. Culture change message: 'Code review is how we maintain quality at scale. We're investing in making it a positive, effective process — not a bureaucratic checkbox.'"
+ weight: 0.30
+ description: "Targets and communication"
@@ -0,0 +1,41 @@
+ meta:
+ id: review-culture-design
+ level: 4
+ course: code-review-feedback-writing
+ type: output
+ description: "Design code review culture — establish review norms, expectations, and values that create a healthy, productive review environment for the engineering team"
+ tags: [code-review, culture, norms, psychological-safety, team-health, expert]
+
+ state: {}
+
+ trigger: |
+ You're the new engineering manager for a 25-person team. The code
+ review culture has problems:
+
+ - 3 senior engineers do 80% of reviews (bottleneck + burnout)
+ - Reviews average 5 days turnaround (blocking developers for a week)
+ - Some reviewers leave 30+ comments on every PR (demoralizing)
+ - Junior developers avoid requesting review from specific seniors
+ - Two engineers had a public argument in PR comments last month
+ - No one reviews documentation PRs ("not code, not my job")
+ - Some PRs are merged with "LGTM" and no substantive review
+
+ Task: Design a complete code review culture document. Include:
+ review values and principles, expected behaviors, SLAs, reviewer
+ responsibilities, author responsibilities, conflict resolution,
+ and how to onboard new team members into the culture. This becomes
+ the team's code review charter.
+
+ assertions:
+ - type: llm_judge
+ criteria: "Values and principles address the specific problems — values: (1) Respect: critique code, never people. Public arguments in PRs are unacceptable — disagreements escalate to private discussion. (2) Shared responsibility: reviewing is everyone's job, not just seniors'. Every engineer reviews at least 2 PRs/week. (3) Timeliness: first review within 24 hours (business day). Slow reviews block people and waste context. (4) Proportionality: comment volume should match issue severity. 30 comments on a 50-line PR signals either a problematic PR (should have been discussed before writing) or an overly critical reviewer. (5) Growth orientation: reviews are opportunities for learning, not gatekeeping. (6) Comprehensiveness: documentation, tests, and configuration changes deserve review attention, not just 'code.'"
+ weight: 0.35
+ description: "Values and principles"
+ - type: llm_judge
+ criteria: "Concrete behaviors and SLAs are defined — reviewer SLA: first review pass within 24 business hours. If you can't review, reassign within 4 hours. Author SLA: respond to review comments within 24 hours. Comment labels: require 'blocking:', 'suggestion:', 'nit:', 'question:', 'praise:' prefixes. Maximum review rounds: if a PR requires more than 2 rounds, schedule a synchronous discussion instead. PR size: aim for under 200 lines; PRs over 400 lines can be reviewed in a meeting instead. Approval requirements: 1 approval for standard changes, 2 for security/auth/payments, team lead for architectural changes. Self-review: authors should self-review their own PR before requesting review (catches obvious issues)"
+ weight: 0.35
+ description: "Behaviors and SLAs"
+ - type: llm_judge
+ criteria: "Conflict resolution and onboarding are addressed — conflict resolution: (1) technical disagreements: if 2 rounds of comments don't resolve, take it offline — DM, call, or meeting. (2) Tone issues: if a comment feels personal, assume good intent first. If pattern continues, raise with manager privately. (3) Impasse: engineering manager breaks ties. (4) Never argue in PR comments — the PR author, other reviewers, and future readers all see it. Onboarding new team members: (1) first week: shadow 3 reviews from different reviewers, (2) second week: pair-review with a senior, (3) third week: solo review with senior co-reviewer, (4) ongoing: monthly 'review quality' check-in. Measuring health: quarterly anonymous survey on review experience, track turnaround time, track comment patterns. Review the charter quarterly and update based on team feedback"
+ weight: 0.30
+ description: "Conflict and onboarding"
@@ -0,0 +1,45 @@
+ meta:
+ id: review-guidelines-standards
+ level: 4
+ course: code-review-feedback-writing
+ type: output
+ description: "Create review guidelines — write comprehensive code review standards that define expectations for both authors and reviewers across the organization"
+ tags: [code-review, guidelines, standards, expectations, organization, expert]
+
+ state: {}
+
+ trigger: |
+ Your growing engineering organization (60 developers, 8 teams) needs
+ written code review guidelines. Currently, review expectations are
+ tribal knowledge — each team has different unwritten rules.
+
+ Inconsistencies across teams:
+ - Team A requires 2 approvals, Team B requires 1
+ - Team C blocks on any style issue, Team D auto-formats
+ - Some teams review architecture before coding, others review
+ only the final PR
+ - Test requirements: Team E requires 80% coverage, Team F has
+ no coverage requirement
+ - Some teams use PR templates, others have no structure
+
+ You need org-wide guidelines that are flexible enough for team
+ autonomy but consistent enough for cross-team collaboration.
+
+ Task: Write the organization's code review guidelines document.
+ Include: universal requirements (all teams must follow), recommended
+ practices (teams should customize), author responsibilities, reviewer
+ responsibilities, review scope by PR type, and escalation procedures.
+
+ assertions:
+ - type: llm_judge
+ criteria: "Universal requirements establish a quality floor — non-negotiable (all teams): (1) every production code change must be reviewed by at least 1 person who didn't write it, (2) security-sensitive code (auth, payments, PII handling) requires review by security-trained engineer, (3) database migrations require DBA or data team review, (4) review comments must use severity labels (blocking/suggestion/nit), (5) first review within 24 business hours, (6) no self-merging except for emergencies (documented with incident link), (7) reviews must verify: correctness, test coverage for new behavior, no security regressions. Each requirement has a clear rationale explaining WHY it's non-negotiable. Exceptions process: how to request a waiver for legitimate reasons (hotfix, trivial change)"
+ weight: 0.35
+ description: "Universal requirements"
+ - type: llm_judge
+ criteria: "Author and reviewer responsibilities are explicit — author responsibilities: (1) self-review before requesting review (read your own diff), (2) write meaningful PR description (what, why, how to test), (3) keep PRs focused (one concern per PR), (4) respond to comments within 24 hours, (5) don't merge with unresolved blocking comments. Reviewer responsibilities: (1) review within 24 hours, (2) review the entire PR (don't review half and approve), (3) label comment severity, (4) provide actionable feedback (problem + suggestion), (5) approve or request changes (don't leave in liminal state), (6) re-review within 4 hours after author addresses feedback. PR templates: suggested sections — description, motivation, testing instructions, screenshots (for UI), checklist (tests added, docs updated, no secrets). Teams can customize but must include at minimum: description and testing instructions"
+ weight: 0.35
+ description: "Responsibilities"
+ - type: llm_judge
+ criteria: "Flexibility and escalation balance consistency with autonomy — team-customizable: number of required approvals (minimum 1, teams can require 2+), coverage thresholds (each team sets their own, but must have one), style/formatting rules (team choice but must be automated), PR size limits (recommended < 200 lines, teams set their own). Review scope by PR type: hotfix (correctness only — ship fast), feature (full review — architecture, tests, docs), refactor (focus on behavior preservation), dependency update (check changelog + CI). Escalation: (1) reviewer disagrees with author → discussion in comments, (2) can't resolve → bring in a third reviewer, (3) still can't resolve → team lead decides, (4) cross-team disagreement → engineering manager. Document versioning: guidelines reviewed quarterly, teams can propose changes via PR to the guidelines doc itself"
+ weight: 0.30
+ description: "Flexibility and escalation"
@@ -0,0 +1,39 @@
+ meta:
+ id: review-load-balancing
+ level: 4
+ course: code-review-feedback-writing
+ type: output
+ description: "Balance review workload — design assignment systems that distribute review load fairly, develop reviewer expertise, and prevent bottlenecks"
+ tags: [code-review, load-balancing, assignment, bottleneck, fairness, expert]
+
+ state: {}
+
+ trigger: |
+ Your 40-person engineering organization has a review load imbalance:
+
+ - 5 senior engineers review 70% of all PRs (burned out, blocking others)
+ - 15 mid-level engineers review 25% (could do more)
+ - 20 junior engineers review 5% (missing learning opportunity)
+ - Some code areas have only 1 qualified reviewer (bus factor = 1)
+ - Security-sensitive code (auth, payments) requires senior review
+ - Cross-team PRs sit in queues because no one "owns" the review
+ - Some reviewers are faster but lower quality; others are thorough but slow
+
+ Task: Design a review assignment and load balancing system. Include:
+ assignment algorithm, reviewer tiers, escalation paths, cross-
+ training plan to reduce bus factors, and how to handle different
+ review quality levels. Balance fairness, speed, and quality.
+
+ assertions:
+ - type: llm_judge
+ criteria: "Assignment system balances load with expertise — tiered system: (1) Primary reviewer (auto-assigned via round-robin within capable pool) — must be able to review the code area. (2) Secondary reviewer for critical paths (auth, payments, data migrations) — always a senior engineer. Use CODEOWNERS to define code area → reviewer pool mappings. Round-robin within each pool to distribute evenly. Capacity tracking: each reviewer has a weekly capacity (juniors: 2-3 reviews, mids: 4-5, seniors: 3-4 — seniors review fewer but harder PRs). Assignment considers current load: don't assign to someone with 3 pending reviews. Opt-out periods: reviewers can mark 'focus time' blocks where they're not assigned reviews"
+ weight: 0.35
+ description: "Assignment system"
+ - type: llm_judge
+ criteria: "Cross-training plan reduces bus factor — identify single points of failure: code areas with only 1 qualified reviewer. Training approach: (1) pair reviewing — shadow the expert for 2-3 reviews, then co-review, then solo with expert available for questions. (2) Rotate secondary reviewer assignments — mid-level engineers get assigned to unfamiliar areas with a senior co-reviewer. (3) Documentation of domain knowledge — expert creates a 'reviewer guide' for their specialty area (what to look for in payment code, common pitfalls). Target: every code area has at least 3 qualified reviewers within 6 months. Tracking: matrix of code areas × qualified reviewers, updated monthly. Incentive: reviewing unfamiliar code (with appropriate support) counts toward growth goals"
+ weight: 0.35
+ description: "Cross-training"
+ - type: llm_judge
+ criteria: "Quality differences are managed fairly — acknowledge: reviewers have different quality levels. Fast-but-shallow reviewers: good for low-risk code (documentation, minor refactors, dependency updates). Thorough-but-slow reviewers: good for complex logic, security-sensitive code. Match reviewer style to PR risk. Quality improvement: (1) monthly 'review quality' feedback — manager samples 3 reviews per engineer, provides feedback. (2) Example repository of excellent reviews — annotated examples of good comments at each severity level. (3) Pair reviewing for quality improvement — pair a fast reviewer with a thorough one. Avoid: publicly ranking review quality (shame doesn't improve quality). Do: celebrate specific excellent review comments in team channel. Escalation: when a reviewer is consistently rubber-stamping, private conversation about review expectations"
+ weight: 0.30
+ description: "Quality management"