npm - dojo.md - Versions diffs - 0.2.2 → 0.2.4 - Mend

dojo.md 0.2.2 → 0.2.4

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (196) hide show

package/courses/code-review-feedback-writing/scenarios/level-3/design-pattern-feedback.yaml ADDED Viewed

@@ -0,0 +1,48 @@
+meta:
+  id: design-pattern-feedback
+  level: 3
+  course: code-review-feedback-writing
+  type: output
+  description: "Give design pattern feedback — recognize pattern opportunities and anti-patterns in code reviews and explain when patterns help vs when they over-engineer"
+  tags: [code-review, design-patterns, anti-patterns, refactoring, over-engineering, advanced]
+state: {}
+trigger: |
+  You're reviewing code that could benefit from design patterns, but
+  you need to be careful — not every piece of code needs a pattern.
+  You've found:
+  1. A 500-line switch statement handling 20 different notification
+     types — each case has similar but slightly different logic
+  2. A class with 15 methods that all start by loading user data,
+     checking permissions, and logging — massive duplication
+  3. Code that creates database connections directly in 30 different
+     places instead of using a connection pool
+  4. A simple CRUD endpoint where someone suggests "we should add
+     a Strategy pattern for the create operation"
+  5. Three layers of abstraction for a function that's called in
+     exactly one place
+  Task: For each situation, decide: does it need a pattern, which
+  pattern, or is the current approach fine? Write review comments
+  that teach WHEN patterns are appropriate, not just WHAT patterns
+  exist. Call out over-engineering as firmly as under-engineering.
+assertions:
+  - type: llm_judge
+    criteria: "Pattern recommendations are justified with concrete benefits — (1) Switch statement: Strategy or Command pattern — define a NotificationHandler interface, create EmailHandler, SMSHandler, PushHandler. Benefit: each handler is independently testable, new notification types don't modify existing code. Show the refactored structure. (2) Repeated setup: Template Method or middleware/decorator pattern — extract loadUser(), checkPermission(), logAction() into middleware that runs before each method. Benefit: DRY, single place to modify auth logic. (3) Direct connections: use connection pool (singleton or factory pattern). Benefit: resource management, connection reuse, centralized configuration. Each recommendation includes: what pattern, why it helps HERE, and a code sketch showing the result"
+    weight: 0.35
+    description: "Justified patterns"
+  - type: llm_judge
+    criteria: "Over-engineering is called out with equal conviction — (4) Strategy for simple CRUD: 'This endpoint creates a record in a table — a Strategy pattern adds 3 files and an interface for something that's 10 lines of straightforward code. The Strategy pattern is valuable when you have multiple interchangeable algorithms, not for a single code path that does one thing. YAGNI — You Aren't Gonna Need It. If we need different creation strategies later, we can refactor then.' (5) Three abstraction layers: 'This function is called in one place. The abstraction adds cognitive overhead — a reader must trace through 3 files to understand what happens. Inline the logic into the caller. Abstractions are earned by duplication, not created in anticipation.' Equal firmness: over-engineering wastes time and adds complexity just as under-engineering creates duplication"
+    weight: 0.35
+    description: "Over-engineering callout"
+  - type: llm_judge
+    criteria: "Guidelines for WHEN to apply patterns are practical — rules of thumb: (1) Three strikes rule — the first time, just write the code; the second time, note the duplication; the third time, extract a pattern. (2) Patterns are answers to specific problems — if you can't name the problem, you don't need the pattern. (3) Test: would a new team member understand the code better WITH or WITHOUT the pattern? If the pattern adds confusion, skip it. (4) Patterns should reduce code, not increase it — if Strategy adds 100 lines to save 20 lines of duplication, it's not worth it yet. (5) Favor simple composition over complex inheritance. Anti-patterns to watch for: 'Abstract Factory for one implementation', 'Singleton just because', 'Observer for two components that could just call each other.' Maturity signal: knowing when NOT to use a pattern is more valuable than knowing patterns"
+    weight: 0.30
+    description: "When to apply"

package/courses/code-review-feedback-writing/scenarios/level-3/mentoring-through-review.yaml ADDED Viewed

@@ -0,0 +1,46 @@
+meta:
+  id: mentoring-through-review
+  level: 3
+  course: code-review-feedback-writing
+  type: output
+  description: "Mentor through code review — use reviews as teaching opportunities that develop junior developers' skills while maintaining code quality"
+  tags: [code-review, mentoring, teaching, junior-developers, growth, advanced]
+state: {}
+trigger: |
+  You're reviewing code from a junior developer (3 months on the team).
+  Their PR implements a user notification preferences feature. The code
+  works but has several learning opportunities:
+  1. They wrote 200 lines of procedural code in one file instead of
+     using the service pattern the team follows
+  2. They used string comparison for notification types instead of
+     the existing NotificationType enum
+  3. They implemented their own email queue instead of using the
+     existing NotificationQueue service
+  4. Error handling catches everything and returns a generic 500
+  5. Tests are tightly coupled to implementation (mock everything)
+  6. Good: the database migration is clean and well-structured
+  7. Good: they wrote thorough documentation for the new endpoints
+  They're eager to learn but past harsh reviews have made them
+  defensive. How do you turn this review into a growth opportunity?
+  Task: Write the complete mentoring review. For each issue, teach
+  the principle (not just the fix), connect to team patterns, and
+  offer to pair. Make the developer feel supported, not criticized.
+assertions:
+  - type: llm_judge
+    criteria: "Each comment teaches a principle, not just a fix — (1) Service pattern: explain WHY the team uses services (testability, reuse, separation of concerns) — not just 'move this to a service.' Show how the existing OrderService follows this pattern as a reference. (2) Enum: explain WHY enums exist (typo prevention, IDE autocomplete, refactoring safety) — show what happens when someone types 'emal' instead of 'email'. (3) Existing queue: explain HOW to discover existing utilities (search patterns, team wiki, ask in Slack) — the skill is knowing to look, not just what to use. (4) Error handling: explain the difference between expected errors (user not found) and unexpected errors (database crash) — each needs different handling. (5) Test coupling: explain what happens when implementation changes (all tests break even though behavior is correct) — suggest testing behavior: 'when user disables email notifications, emails stop being sent'"
+    weight: 0.35
+    description: "Principle teaching"
+  - type: llm_judge
+    criteria: "Tone supports growth without lowering the bar — start with genuine praise: 'The migration is really clean — especially the default values and constraints. And the endpoint documentation is better than what most senior developers write. These are the things that matter for long-term code quality.' For issues: frame as learning, not mistakes: 'This is a great opportunity to learn about our service pattern — it's not obvious when you're new and it took me a while to internalize it too.' Normalize the learning process: 'When I joined, I wrote a custom email sender too before discovering we had NotificationQueue — it's hard to know what exists in a large codebase.' Offer help: 'Want to pair on extracting this into a service? I can show you the pattern and you can drive.' Encourage: 'Your code logic is solid — the refactoring is mostly about fitting into team patterns, which takes time to learn.'"
+    weight: 0.35
+    description: "Supportive tone"
+  - type: llm_judge
+    criteria: "Review develops the developer's independence — don't just give answers, teach how to find them: 'Next time you need to handle notifications, try searching for NotificationType in the codebase — you'll find the enum and the existing service. Pro tip: `grep -r NotificationType src/` is my go-to for discovering existing patterns.' Suggest a discovery process: 'Before building a new utility, check: (1) search the codebase for similar patterns, (2) check the team wiki under Shared Services, (3) ask in #eng-questions — someone might have built it.' Set expectations for growth: 'For your next PR, try to identify one existing utility you can reuse instead of building from scratch.' Don't over-correct: 'I'm suggesting improvements for several areas, but you don't need to address everything in this PR. Focus on the service extraction (#1) and enum usage (#2) — those are the highest-impact changes. The rest can be follow-up work.'"
+    weight: 0.30
+    description: "Building independence"

package/courses/code-review-feedback-writing/scenarios/level-3/production-incident-review.yaml ADDED Viewed

@@ -0,0 +1,42 @@
+meta:
+  id: production-incident-review
+  level: 3
+  course: code-review-feedback-writing
+  type: output
+  description: "Review incident fix PRs — evaluate hotfix code for correctness, root cause coverage, regression prevention, and appropriate urgency without cutting corners"
+  tags: [code-review, incident, hotfix, production, root-cause, regression, advanced]
+state: {}
+trigger: |
+  There's an active production incident. A developer submits a PR
+  tagged "HOTFIX" to resolve it. The incident: users intermittently
+  see another user's order history when clicking "My Orders."
+  The PR (45 lines):
+  - Adds a user ID filter to the orders query that was missing
+  - The fix is in the caching layer — orders were cached by page
+    URL without including the user ID in the cache key
+  - The author also "cleaned up" some unrelated code in the same file
+  - No tests added ("we need to ship this fast")
+  - The fix handles the cache key but doesn't invalidate existing
+    incorrect cache entries
+  Task: Review this hotfix balancing urgency with safety. Write your
+  review addressing: is the fix correct, does it fully resolve the
+  issue, what's the risk of shipping vs not shipping, and what
+  follow-up work is needed. This is a time-sensitive review.
+assertions:
+  - type: llm_judge
+    criteria: "Fix correctness is evaluated against the root cause — 'The cache key fix is correct — cache_key = /orders?page=1 should be cache_key = /orders?page=1&user_id=123. This prevents NEW incorrect entries. However, EXISTING cached entries still contain mixed user data. Risk: users continue seeing wrong data until those cache entries expire.' Critical missing piece: 'We need to either (1) flush the orders cache entirely, or (2) add cache invalidation for affected keys. Without this, the fix only prevents future occurrences — existing cached wrong data persists for [TTL duration].' Assessment: 'The fix is 80% correct — it prevents the vulnerability from recurring but doesn't clean up existing corrupted cache entries.'"
+    weight: 0.35
+    description: "Fix correctness"
+  - type: llm_judge
+    criteria: "Review balances urgency with necessary safety — decision framework: 'Ship immediately with cache flush' not 'Wait for perfect fix.' Specific recommendation: (1) Remove the unrelated code cleanup — hotfix PRs should ONLY contain the fix (reduces risk, simplifies rollback). (2) Add cache flush/invalidation — without this, the data leak continues from cache. (3) Tests can be a fast follow-up PR — but must be done today, not 'later.' (4) Approve once: unrelated changes removed AND cache invalidation added. Urgency acknowledgment: 'This is a data privacy incident — users seeing other users' data. Speed matters. But shipping a partial fix that still leaks cached data isn't actually faster — it's a second incident.'"
+    weight: 0.35
+    description: "Urgency balance"
+  - type: llm_judge
+    criteria: "Follow-up work is clearly scoped — immediate (ship with hotfix): cache key fix + cache flush + remove unrelated changes. Today: add regression test that verifies cache keys include user ID. This week: (1) audit all other cache keys for similar user-scoping issues, (2) add cache key logging/monitoring to detect future issues, (3) consider cache key builder utility that enforces user scoping by default. Post-incident: (1) retrospective to understand why the original code didn't include user ID (was it a refactor regression? was it never there?), (2) review process improvement — how did the original PR pass review without user scoping? (3) postmortem document. Frame follow-up as learning: 'The cache key pattern should be documented as a team standard — this is how bugs become institutional knowledge.'"
+    weight: 0.30
+    description: "Follow-up scoping"

package/courses/code-review-feedback-writing/scenarios/level-3/reviewing-senior-code.yaml ADDED Viewed

@@ -0,0 +1,47 @@
+meta:
+  id: reviewing-senior-code
+  level: 3
+  course: code-review-feedback-writing
+  type: output
+  description: "Review senior engineer's code — navigate the dynamics of providing feedback to more experienced developers while maintaining quality standards"
+  tags: [code-review, senior-engineer, dynamics, confidence, speaking-up, advanced]
+state: {}
+trigger: |
+  You're a mid-level engineer asked to review a PR from the team's
+  most senior developer (10 years experience, built the core system).
+  Their PR refactors the authentication middleware — 250 lines.
+  You find several concerns:
+  1. A race condition when refreshing OAuth tokens concurrently
+  2. The new middleware doesn't handle the edge case of expired
+     refresh tokens (the old code did)
+  3. Magic numbers: timeout values without named constants
+  4. No tests for the concurrent token refresh scenario
+  5. The refactor changes behavior — previously failed silently,
+     now throws (affects all callers)
+  You're nervous because:
+  - This person is far more experienced than you
+  - They might know something you don't
+  - You've seen them dismiss junior reviewers before
+  - But you're genuinely concerned about #1 and #2
+  Task: Write the review navigating the power dynamic. Show how to
+  give feedback to someone more senior while being confident in your
+  observations. Write a guide for reviewing up.
+assertions:
+  - type: llm_judge
+    criteria: "Technical concerns are communicated with appropriate confidence — for genuine bugs (#1, #2): be direct but frame as observations, not accusations. 'I traced through the concurrent token refresh path and I believe there's a race condition: if two requests trigger a refresh simultaneously, both will attempt to write the new token. The old code used a mutex here (auth.ts:89 in the previous version) — was the removal intentional?' For the behavior change (#5): 'I noticed the error handling shifted from silent failure to throwing. Since this middleware is used by all 40 endpoints, this changes behavior for every request — was this intentional? If so, should we coordinate the rollout?' Confidence where warranted: 'The race condition is a real concern regardless of experience level — concurrent access bugs don't care about seniority.'"
+    weight: 0.35
+    description: "Confident feedback"
+  - type: llm_judge
+    criteria: "Power dynamic is navigated without being deferential or aggressive — don't start with: 'I might be wrong but...' or 'I'm sure you thought of this...' (undermines your review). Don't say: 'This is obviously wrong' (confrontational with senior). Do: state observations factually. 'Lines 45-60: when two concurrent requests both have expired tokens, the first refresh succeeds but the second attempts to refresh with the already-used refresh token, which will fail.' Ask genuine questions for things you're unsure about: 'I see the silent failure was changed to throwing — is there a design document for this change? I want to understand the migration plan for callers.' Nits (#3) are fine to raise normally — everyone gets nits. The key: treat the code the same regardless of who wrote it"
+    weight: 0.35
+    description: "Dynamic navigation"
+  - type: llm_judge
+    criteria: "Guide for reviewing up is practical — principles: (1) Your job is to review the CODE, not the PERSON. Experience doesn't make code bug-free. (2) If you see a potential bug, say so — you were asked to review for a reason. (3) Frame uncertain observations as questions ('Could this cause X?'), but frame clear issues as statements ('This race condition will cause Y'). (4) Don't pre-apologize for having feedback — it undermines the review process. (5) Senior engineers generally WANT honest review — they chose you as a reviewer. (6) If dismissed unfairly, escalate to your manager — review quality is everyone's responsibility. Common mistake: being so deferential that you provide a useless rubber-stamp review. The worst outcome isn't disagreeing with a senior — it's a production bug you saw but didn't mention"
+    weight: 0.30
+    description: "Reviewing up guide"

package/courses/code-review-feedback-writing/scenarios/level-3/reviewing-unfamiliar-code.yaml ADDED Viewed

@@ -0,0 +1,43 @@
+meta:
+  id: reviewing-unfamiliar-code
+  level: 3
+  course: code-review-feedback-writing
+  type: output
+  description: "Review unfamiliar code — develop strategies for reviewing code in codebases you don't know well, asking the right questions, and still providing value"
+  tags: [code-review, unfamiliar-codebase, learning, questions, onboarding, advanced]
+state: {}
+trigger: |
+  You just joined a new team and are asked to review PRs on day one.
+  You don't know:
+  - The architecture or how services communicate
+  - Team conventions (naming, patterns, error handling approach)
+  - Business domain (insurance claims processing)
+  - Why certain design decisions were made
+  - What the testing expectations are
+  The PR adds a new "claims escalation" feature — 200 lines across
+  5 files. You can read the code but you're missing context for many
+  decisions.
+  Task: Write a review that provides value despite limited context.
+  Show how to: identify universal quality issues (any codebase),
+  ask context-gathering questions that also serve as review, learn
+  the codebase through reviewing, and be honest about what you can
+  and can't evaluate. Write the review comments and a reflection
+  on your review strategy.
+assertions:
+  - type: llm_judge
+    criteria: "Universal quality issues are identified — regardless of domain knowledge, catch: (1) error handling gaps (catch block that swallows errors), (2) missing input validation, (3) potential null/undefined references, (4) SQL injection or security concerns, (5) dead code or unused variables, (6) inconsistent patterns within the PR itself (does the PR's own code follow its own conventions?). (7) Test quality: are tests testing behavior or implementation? Do tests have meaningful names? (8) Code clarity: are variable/function names descriptive? Is the control flow easy to follow? These issues are valid in any codebase and show the reviewer is thorough despite being new"
+    weight: 0.35
+    description: "Universal issues"
+  - type: llm_judge
+    criteria: "Questions serve double duty as review and learning — questions that are both genuine learning AND review prompts: 'I notice ClaimEscalation inherits from BaseProcessor — could you point me to where BaseProcessor is defined? I want to understand the lifecycle hooks.' This helps the reviewer learn AND surfaces whether the inheritance is appropriate. 'What triggers a claim to enter the escalation queue? I want to make sure the conditions in isEligibleForEscalation() cover all cases.' Asks for business context AND validates the logic. 'I see we're using Redis for the escalation queue — is there a reason we're not using the existing RabbitMQ setup I saw in other services?' Learns architecture AND questions the design choice. Each question shows the reviewer is engaged and thinking, not just rubber-stamping"
+    weight: 0.35
+    description: "Dual-purpose questions"
+  - type: llm_judge
+    criteria: "Honest scope disclosure builds trust — opening comment: 'Disclosure: I'm new to this codebase and the insurance domain. I focused my review on code quality, error handling, and testing patterns. I can't yet validate the business logic or architectural fit — I'd recommend a domain expert also reviews the escalation rules.' Specific limitations: 'I'm not sure if the 72-hour escalation window is a business requirement or an arbitrary choice — if it's business-critical, it should be a named constant with a comment explaining the requirement.' Learning-through-review: 'This review helped me understand how claims flow through the system. I documented what I learned in case it's useful: [brief architecture note].' Value: even as a newcomer, catching a null reference exception or a missing error handler is valuable — don't apologize for what you can contribute"
+    weight: 0.30
+    description: "Honest scope"

package/courses/code-review-feedback-writing/scenarios/level-3/speed-vs-thoroughness.yaml ADDED Viewed

@@ -0,0 +1,46 @@
+meta:
+  id: speed-vs-thoroughness
+  level: 3
+  course: code-review-feedback-writing
+  type: output
+  description: "Balance review speed and thoroughness — develop strategies for providing timely reviews without sacrificing quality for different types of PRs"
+  tags: [code-review, speed, thoroughness, triage, time-management, advanced]
+state: {}
+trigger: |
+  You have 6 PRs in your review queue, all assigned to you. Your
+  team's SLA is 24 hours for first review. It's 4pm and these came
+  in today:
+  1. Hotfix: Production bug — users seeing 500 errors on checkout
+     (15 lines, high urgency)
+  2. Feature: New dashboard analytics (350 lines, normal priority,
+     sprint deadline Friday)
+  3. Dependency update: Bump React from 18.2 to 18.3 (5 files changed,
+     low priority)
+  4. Refactor: Extract common validation logic into shared module
+     (200 lines, medium priority)
+  5. Junior dev's PR: First feature implementation, they're blocked
+     waiting for review (100 lines, medium urgency due to person)
+  6. Tech debt: Remove deprecated API endpoints (150 lines, low priority)
+  You have 2 hours of review time today.
+  Task: Create a review strategy for all 6 PRs. Show how to triage,
+  what depth of review each gets, write example reviews at different
+  thoroughness levels, and explain your prioritization reasoning.
+assertions:
+  - type: llm_judge
+    criteria: "Triage ordering is well-reasoned — priority: (1) Hotfix — production impact, review in next 15 minutes. (2) Junior dev's PR — person is blocked, reviewing unblocks them (invest 30 minutes). (3) Feature — sprint deadline pressure (invest 45 minutes). (4) Refactor — medium priority, affects shared code quality (invest 20 minutes). (5) Dependency update — low risk if CI passes, quick scan (10 minutes). (6) Tech debt — no deadline, can slip to tomorrow (skip or quick scan). Reasoning: combine urgency (production > blocked person > deadline) with risk (shared code > isolated change > dependency bump). The junior dev is prioritized higher than strict urgency would suggest because blocking people is more costly than blocking code"
+    weight: 0.35
+    description: "Triage ordering"
+  - type: llm_judge
+    criteria: "Review depth varies appropriately per PR — Hotfix (15 min): verify the fix addresses the error, check for regression risk, don't nit on style — approve fast with 'LGTM, fix looks correct. Let's clean up styling in a follow-up.' Feature (45 min): full review — architecture, security, tests, performance. This is the big one that justifies thorough review. Junior PR (30 min): mentor-focused review — teach patterns, be encouraging, unblock quickly even if follow-up work is needed. Refactor (20 min): focus on interface changes and backward compatibility — internal implementation can be reviewed more lightly. Dependency (10 min): check CI passes, scan changelog for breaking changes, verify no deprecated API usage. Tech debt (5 min or skip): quick scan for accidental behavior changes, or leave comment 'I'll review this first thing tomorrow.' Each depth level is explicitly justified"
+    weight: 0.35
+    description: "Varying depth"
+  - type: llm_judge
+    criteria: "Example reviews demonstrate different thoroughness levels — quick review (hotfix): 'Verified: the fix adds null check for user.cart before accessing .items. This matches the error in Sentry (TypeError: Cannot read property items of null). CI passes. Approved — ship it.' Medium review (refactor): summary of approach, 2-3 specific comments on interface design, one question about edge case, approve with suggestions. Deep review (feature): structured thematic review, 8-10 comments across architecture/security/testing, request changes on 2 blocking items. Honest scoping: when you do a lighter review, say so: 'I did a quick review focused on the API interface — I'd recommend someone else gives the internal logic a deeper look.' Better to do a timeboxed review and disclose the scope than to skim everything and pretend it was thorough"
+    weight: 0.30
+    description: "Depth examples"

package/courses/code-review-feedback-writing/scenarios/level-4/automated-review-strategy.yaml ADDED Viewed

@@ -0,0 +1,44 @@
+meta:
+  id: automated-review-strategy
+  level: 4
+  course: code-review-feedback-writing
+  type: output
+  description: "Design automated review tooling — determine what to automate vs keep as human review, select tools, and integrate automation into the review workflow"
+  tags: [code-review, automation, linting, static-analysis, CI, tooling, expert]
+state: {}
+trigger: |
+  Your team spends too much time on mechanical review tasks. You want
+  to automate what you can so humans focus on what matters. Current
+  review pain points:
+  - 40% of comments are style/formatting (Prettier/ESLint should handle)
+  - Security vulnerabilities sometimes slip through (Snyk/Semgrep could catch)
+  - Type errors found in review (TypeScript strict mode would prevent)
+  - Dead code and unused imports flagged manually
+  - Test coverage gaps discovered during review
+  - API breaking changes not noticed until production
+  - Documentation staleness (code changed, docs not updated)
+  Budget: $500/month for tooling. Team: 30 engineers.
+  Task: Design the automated review tooling strategy. For each pain
+  point, decide: automate fully, automate as advisory, or keep human.
+  Include: tool selection, CI/CD integration, configuration approach,
+  and how to introduce automation without overwhelming developers
+  with noise.
+assertions:
+  - type: llm_judge
+    criteria: "Each pain point has a clear automation decision — automate fully (block PR if failing): formatting (Prettier — auto-fix on commit), linting (ESLint — errors block, warnings advisory), type checking (TypeScript strict — blocks build), unused imports (ESLint rule — auto-fix). Automate as advisory (comment on PR, don't block): security scanning (Semgrep/CodeQL — flag but human validates severity), test coverage (report delta — don't block on arbitrary threshold), dead code detection (report but human decides if intentional). Keep human: architectural review, business logic validation, API design review, documentation quality (detect CHANGE but human reviews quality). Each decision justified: 'Formatting is objectively correct — no human judgment needed. Security findings require context — is this a real vulnerability or a false positive?'"
+    weight: 0.35
+    description: "Automation decisions"
+  - type: llm_judge
+    criteria: "Tool selection is specific and budget-conscious — within $500/month: (1) Prettier + ESLint: free, handles formatting and code quality. (2) TypeScript strict mode: free, catches type errors. (3) GitHub Actions: included in GitHub, runs CI checks. (4) Semgrep: free tier covers security scanning. (5) CodeCov: free for open source, ~$100/month for private — coverage delta reporting. (6) danger.js or similar: free, custom PR automation (check for docs updates when API files change, check for changelog when version bumps). (7) Spectral: free, OpenAPI linting for API breaking changes. Total: ~$100-200/month, well under budget. Alternative: GitHub Advanced Security (~$400/month) for CodeQL + secret scanning + dependency review. Configuration: use shared config packages so all 30 engineers have consistent setup"
+    weight: 0.35
+    description: "Tool selection"
+  - type: llm_judge
+    criteria: "Rollout strategy prevents automation fatigue — phase 1 (week 1-2): Prettier auto-format only (zero noise — just fixes things). Phase 2 (week 3-4): ESLint errors only (not warnings — configure strictly to avoid noise). Phase 3 (month 2): Security scanning as advisory comments. Phase 4 (month 3): Coverage reporting and API change detection. Key principle: 'Every automated check must be actionable. If developers start ignoring bot comments, the automation is noise, not signal.' Configuration: tune tools before enabling — run against existing codebase, fix existing violations, THEN enable as blocking. False positive budget: 'If a tool produces more than 10% false positives, fix configuration or remove the tool. Developer trust in automation is fragile.' Gradual: introduce one tool at a time, measure impact, adjust before adding the next"
+    weight: 0.30
+    description: "Rollout strategy"

package/courses/code-review-feedback-writing/scenarios/level-4/expert-review-shift.yaml ADDED Viewed

@@ -0,0 +1,46 @@
+meta:
+  id: expert-review-shift
+  level: 4
+  course: code-review-feedback-writing
+  type: output
+  description: "Combined expert shift — redesign an organization's code review program covering culture, process, automation, metrics, and training for a growing engineering team"
+  tags: [code-review, combined, shift-simulation, program-design, expert]
+state: {}
+trigger: |
+  You've been promoted to VP of Engineering at a 100-person startup
+  (60 engineers, 8 teams). The CEO says: "Our code quality is
+  declining, deployments keep breaking, and engineers are unhappy
+  with the review process. Fix it."
+  Current state:
+  - No formal review process — each team does their own thing
+  - Average defect escape rate: 12% (industry target: < 5%)
+  - Deployment rollback rate: 8% (should be < 2%)
+  - Engineer satisfaction with reviews: 3.5/10
+  - 3 engineers left citing "toxic review culture" in exit interviews
+  - No automated code quality checks
+  - Average time-to-merge: 8 days (blocking feature delivery)
+  Budget: $50K for tooling, ability to hire 1 additional head
+  Task: Present your complete code review transformation plan to the
+  CEO. Include: diagnosis of current problems, 90-day action plan
+  with quick wins, 6-month target state, tooling investment, culture
+  change program, metrics dashboard, and how this connects to the
+  broader goals of code quality and engineer retention.
+assertions:
+  - type: llm_judge
+    criteria: "Diagnosis connects symptoms to root causes — 12% defect escape: reviews aren't catching bugs (reviewers focused on style, not logic — or rubber-stamping). 8% rollback rate: inadequate testing requirement in reviews (tests not reviewed or not required). 3.5/10 satisfaction: combination of slow reviews (8 day merge), hostile comments (3 people left), and inconsistent standards. Toxic culture: specific reviewers giving harsh feedback without consequences — management didn't intervene. 8-day merge: no SLA, no reviewer assignment, no load balancing — PRs sit in queues. Root cause: absence of process, not malicious intent. Nobody designed the review system — it evolved haphazardly. Fix the system, not the people (though the toxic reviewer behavior needs direct intervention)"
+    weight: 0.35
+    description: "Root cause diagnosis"
+  - type: llm_judge
+    criteria: "90-day plan delivers quick wins and lasting change — days 1-30 (immediate): (1) address toxic reviewer behavior directly (1:1s, expectations, consequences), (2) implement PR template with minimum requirements, (3) add Prettier + ESLint to CI (eliminate 40% of review comments), (4) establish 24-hour review SLA, (5) hire: DevEx engineer to own review tooling. Days 31-60: (1) CODEOWNERS for automatic reviewer assignment, (2) security scanning in CI (Semgrep), (3) reviewer training workshop (2 hours, all engineers), (4) code review guidelines document published. Days 61-90: (1) metrics dashboard live, (2) review quality feedback loop (manager reviews reviews), (3) cross-team reviewer rotation pilot, (4) quarterly calibration exercise. Each action connected to a specific metric improvement"
+    weight: 0.35
+    description: "90-day plan"
+  - type: llm_judge
+    criteria: "6-month targets and CEO communication are compelling — 6-month targets: defect escape < 6% (from 12%), rollback < 4% (from 8%), satisfaction > 6.5/10 (from 3.5), time-to-merge < 3 days (from 8), zero departures citing review culture. Budget allocation: tooling $20K (CI/CD, security scanning, coverage), training $10K (workshops, external facilitation), DevEx hire $80K (6 months salary). ROI for CEO: each production defect costs ~$5K (investigation + fix + customer impact). Reducing defect escape from 12% to 6% on 500 deploys/year = 30 fewer defects = $150K saved. Each departed engineer costs ~$150K to replace — preventing 3 departures = $450K. Total ROI: $600K savings on $110K investment. Culture change message: 'Code review is how we maintain quality at scale. We're investing in making it a positive, effective process — not a bureaucratic checkbox.'"
+    weight: 0.30
+    description: "Targets and communication"

package/courses/code-review-feedback-writing/scenarios/level-4/review-culture-design.yaml ADDED Viewed

@@ -0,0 +1,41 @@
+meta:
+  id: review-culture-design
+  level: 4
+  course: code-review-feedback-writing
+  type: output
+  description: "Design code review culture — establish review norms, expectations, and values that create a healthy, productive review environment for the engineering team"
+  tags: [code-review, culture, norms, psychological-safety, team-health, expert]
+state: {}
+trigger: |
+  You're the new engineering manager for a 25-person team. The code
+  review culture has problems:
+  - 3 senior engineers do 80% of reviews (bottleneck + burnout)
+  - Reviews average 5 days turnaround (blocking developers for a week)
+  - Some reviewers leave 30+ comments on every PR (demoralizing)
+  - Junior developers avoid requesting review from specific seniors
+  - Two engineers had a public argument in PR comments last month
+  - No one reviews documentation PRs ("not code, not my job")
+  - Some PRs are merged with "LGTM" and no substantive review
+  Task: Design a complete code review culture document. Include:
+  review values and principles, expected behaviors, SLAs, reviewer
+  responsibilities, author responsibilities, conflict resolution,
+  and how to onboard new team members into the culture. This becomes
+  the team's code review charter.
+assertions:
+  - type: llm_judge
+    criteria: "Values and principles address the specific problems — values: (1) Respect: critique code, never people. Public arguments in PRs are unacceptable — disagreements escalate to private discussion. (2) Shared responsibility: reviewing is everyone's job, not just seniors'. Every engineer reviews at least 2 PRs/week. (3) Timeliness: first review within 24 hours (business day). Slow reviews block people and waste context. (4) Proportionality: comment volume should match issue severity. 30 comments on a 50-line PR signals either a problematic PR (should have been discussed before writing) or an overly critical reviewer. (5) Growth orientation: reviews are opportunities for learning, not gatekeeping. (6) Comprehensiveness: documentation, tests, and configuration changes deserve review attention, not just 'code.'"
+    weight: 0.35
+    description: "Values and principles"
+  - type: llm_judge
+    criteria: "Concrete behaviors and SLAs are defined — reviewer SLA: first review pass within 24 business hours. If you can't review, reassign within 4 hours. Author SLA: respond to review comments within 24 hours. Comment labels: require 'blocking:', 'suggestion:', 'nit:', 'question:', 'praise:' prefixes. Maximum review rounds: if a PR requires more than 2 rounds, schedule a synchronous discussion instead. PR size: aim for under 200 lines; PRs over 400 lines can be reviewed in a meeting instead. Approval requirements: 1 approval for standard changes, 2 for security/auth/payments, team lead for architectural changes. Self-review: authors should self-review their own PR before requesting review (catches obvious issues)"
+    weight: 0.35
+    description: "Behaviors and SLAs"
+  - type: llm_judge
+    criteria: "Conflict resolution and onboarding are addressed — conflict resolution: (1) technical disagreements: if 2 rounds of comments don't resolve, take it offline — DM, call, or meeting. (2) Tone issues: if a comment feels personal, assume good intent first. If pattern continues, raise with manager privately. (3) Impasse: engineering manager breaks ties. (4) Never argue in PR comments — the PR author, other reviewers, and future readers all see it. Onboarding new team members: (1) first week: shadow 3 reviews from different reviewers, (2) second week: pair-review with a senior, (3) third week: solo review with senior co-reviewer, (4) ongoing: monthly 'review quality' check-in. Measuring health: quarterly anonymous survey on review experience, track turnaround time, track comment patterns. Review the charter quarterly and update based on team feedback"
+    weight: 0.30
+    description: "Conflict and onboarding"

package/courses/code-review-feedback-writing/scenarios/level-4/review-guidelines-standards.yaml ADDED Viewed

@@ -0,0 +1,45 @@
+meta:
+  id: review-guidelines-standards
+  level: 4
+  course: code-review-feedback-writing
+  type: output
+  description: "Create review guidelines — write comprehensive code review standards that define expectations for both authors and reviewers across the organization"
+  tags: [code-review, guidelines, standards, expectations, organization, expert]
+state: {}
+trigger: |
+  Your growing engineering organization (60 developers, 8 teams) needs
+  written code review guidelines. Currently, review expectations are
+  tribal knowledge — each team has different unwritten rules.
+  Inconsistencies across teams:
+  - Team A requires 2 approvals, Team B requires 1
+  - Team C blocks on any style issue, Team D auto-formats
+  - Some teams review architecture before coding, others review
+    only the final PR
+  - Test requirements: Team E requires 80% coverage, Team F has
+    no coverage requirement
+  - Some teams use PR templates, others have no structure
+  You need org-wide guidelines that are flexible enough for team
+  autonomy but consistent enough for cross-team collaboration.
+  Task: Write the organization's code review guidelines document.
+  Include: universal requirements (all teams must follow), recommended
+  practices (teams should customize), author responsibilities, reviewer
+  responsibilities, review scope by PR type, and escalation procedures.
+assertions:
+  - type: llm_judge
+    criteria: "Universal requirements establish a quality floor — non-negotiable (all teams): (1) every production code change must be reviewed by at least 1 person who didn't write it, (2) security-sensitive code (auth, payments, PII handling) requires review by security-trained engineer, (3) database migrations require DBA or data team review, (4) review comments must use severity labels (blocking/suggestion/nit), (5) first review within 24 business hours, (6) no self-merging except for emergencies (documented with incident link), (7) reviews must verify: correctness, test coverage for new behavior, no security regressions. Each requirement has a clear rationale explaining WHY it's non-negotiable. Exceptions process: how to request a waiver for legitimate reasons (hotfix, trivial change)"
+    weight: 0.35
+    description: "Universal requirements"
+  - type: llm_judge
+    criteria: "Author and reviewer responsibilities are explicit — author responsibilities: (1) self-review before requesting review (read your own diff), (2) write meaningful PR description (what, why, how to test), (3) keep PRs focused (one concern per PR), (4) respond to comments within 24 hours, (5) don't merge with unresolved blocking comments. Reviewer responsibilities: (1) review within 24 hours, (2) review the entire PR (don't review half and approve), (3) label comment severity, (4) provide actionable feedback (problem + suggestion), (5) approve or request changes (don't leave in liminal state), (6) re-review within 4 hours after author addresses feedback. PR templates: suggested sections — description, motivation, testing instructions, screenshots (for UI), checklist (tests added, docs updated, no secrets). Teams can customize but must include at minimum: description and testing instructions"
+    weight: 0.35
+    description: "Responsibilities"
+  - type: llm_judge
+    criteria: "Flexibility and escalation balance consistency with autonomy — team-customizable: number of required approvals (minimum 1, teams can require 2+), coverage thresholds (each team sets their own, but must have one), style/formatting rules (team choice but must be automated), PR size limits (recommended < 200 lines, teams set their own). Review scope by PR type: hotfix (correctness only — ship fast), feature (full review — architecture, tests, docs), refactor (focus on behavior preservation), dependency update (check changelog + CI). Escalation: (1) reviewer disagrees with author → discussion in comments, (2) can't resolve → bring in a third reviewer, (3) still can't resolve → team lead decides, (4) cross-team disagreement → engineering manager. Document versioning: guidelines reviewed quarterly, teams can propose changes via PR to the guidelines doc itself"
+    weight: 0.30
+    description: "Flexibility and escalation"

package/courses/code-review-feedback-writing/scenarios/level-4/review-load-balancing.yaml ADDED Viewed

@@ -0,0 +1,39 @@
+meta:
+  id: review-load-balancing
+  level: 4
+  course: code-review-feedback-writing
+  type: output
+  description: "Balance review workload — design assignment systems that distribute review load fairly, develop reviewer expertise, and prevent bottlenecks"
+  tags: [code-review, load-balancing, assignment, bottleneck, fairness, expert]
+state: {}
+trigger: |
+  Your 40-person engineering organization has a review load imbalance:
+  - 5 senior engineers review 70% of all PRs (burned out, blocking others)
+  - 15 mid-level engineers review 25% (could do more)
+  - 20 junior engineers review 5% (missing learning opportunity)
+  - Some code areas have only 1 qualified reviewer (bus factor = 1)
+  - Security-sensitive code (auth, payments) requires senior review
+  - Cross-team PRs sit in queues because no one "owns" the review
+  - Some reviewers are faster but lower quality; others are thorough but slow
+  Task: Design a review assignment and load balancing system. Include:
+  assignment algorithm, reviewer tiers, escalation paths, cross-
+  training plan to reduce bus factors, and how to handle different
+  review quality levels. Balance fairness, speed, and quality.
+assertions:
+  - type: llm_judge
+    criteria: "Assignment system balances load with expertise — tiered system: (1) Primary reviewer (auto-assigned via round-robin within capable pool) — must be able to review the code area. (2) Secondary reviewer for critical paths (auth, payments, data migrations) — always a senior engineer. Use CODEOWNERS to define code area → reviewer pool mappings. Round-robin within each pool to distribute evenly. Capacity tracking: each reviewer has a weekly capacity (juniors: 2-3 reviews, mids: 4-5, seniors: 3-4 — seniors review fewer but harder PRs). Assignment considers current load: don't assign to someone with 3 pending reviews. Opt-out periods: reviewers can mark 'focus time' blocks where they're not assigned reviews"
+    weight: 0.35
+    description: "Assignment system"
+  - type: llm_judge
+    criteria: "Cross-training plan reduces bus factor — identify single points of failure: code areas with only 1 qualified reviewer. Training approach: (1) pair reviewing — shadow the expert for 2-3 reviews, then co-review, then solo with expert available for questions. (2) Rotate secondary reviewer assignments — mid-level engineers get assigned to unfamiliar areas with a senior co-reviewer. (3) Documentation of domain knowledge — expert creates a 'reviewer guide' for their specialty area (what to look for in payment code, common pitfalls). Target: every code area has at least 3 qualified reviewers within 6 months. Tracking: matrix of code areas × qualified reviewers, updated monthly. Incentive: reviewing unfamiliar code (with appropriate support) counts toward growth goals"
+    weight: 0.35
+    description: "Cross-training"
+  - type: llm_judge
+    criteria: "Quality differences are managed fairly — acknowledge: reviewers have different quality levels. Fast-but-shallow reviewers: good for low-risk code (documentation, minor refactors, dependency updates). Thorough-but-slow reviewers: good for complex logic, security-sensitive code. Match reviewer style to PR risk. Quality improvement: (1) monthly 'review quality' feedback — manager samples 3 reviews per engineer, provides feedback. (2) Example repository of excellent reviews — annotated examples of good comments at each severity level. (3) Pair reviewing for quality improvement — pair a fast reviewer with a thorough one. Avoid: publicly ranking review quality (shame doesn't improve quality). Do: celebrate specific excellent review comments in team channel. Escalation: when a reviewer is consistently rubber-stamping, private conversation about review expectations"
+    weight: 0.30
+    description: "Quality management"

package/courses/code-review-feedback-writing/scenarios/level-4/review-metrics.yaml ADDED Viewed

@@ -0,0 +1,39 @@
+meta:
+  id: review-metrics
+  level: 4
+  course: code-review-feedback-writing
+  type: output
+  description: "Measure review effectiveness — define and track metrics that indicate code review quality, not just velocity"
+  tags: [code-review, metrics, effectiveness, quality, measurement, expert]
+state: {}
+trigger: |
+  Leadership asks: "Are our code reviews actually catching bugs, or
+  are we just wasting engineering time?" You need data to answer.
+  Available data:
+  - Git/GitHub: PR metadata, review comments, approval times, merge times
+  - Bug tracker: production bugs, their root causes, which PRs introduced them
+  - CI/CD: build failures, test failures, deployment rollbacks
+  - Surveys: developer satisfaction data (quarterly)
+  - Code quality tools: SonarQube, CodeClimate scores over time
+  Task: Design a code review metrics program. Define: what to measure,
+  how to measure it, what "good" looks like for each metric, how to
+  use metrics to improve reviews, and what metrics to NOT track (or
+  track privately). Avoid metrics that incentivize bad behavior.
+assertions:
+  - type: llm_judge
+    criteria: "Effectiveness metrics measure outcomes, not just activity — primary: (1) Defect escape rate: percentage of production bugs that passed through code review without being caught. Target: < 5%. Measures: is review actually catching things? (2) Review-introduced improvements: how often do review comments lead to code changes that prevent future bugs? Track: comment → code change → no subsequent bug in that area. (3) Knowledge distribution: are review comments teaching? Measure: decrease in similar review comments to same developer over time (they're learning). (4) Developer growth: developers who receive reviews improve their first-submission quality over time (fewer review rounds needed). These measure the PURPOSE of code review, not just the process"
+    weight: 0.35
+    description: "Effectiveness metrics"
+  - type: llm_judge
+    criteria: "Process metrics are balanced with quality safeguards — track: time-to-first-review (SLA compliance), review rounds per PR, time-to-merge. But: pair with quality metrics to prevent gaming. If time-to-merge drops but defect escape rate rises, we're approving too fast. Review depth: average meaningful comments per review (exclude auto-generated, style comments). Review distribution: Gini coefficient of review assignments (0 = perfectly equal, 1 = one person does everything). Target: < 0.4. PR rejection rate: if reviews rarely request changes, either code quality is excellent or reviews are rubber stamps — correlate with defect escape rate to determine which. Comment resolution rate: what percentage of review comments result in code changes? If low, comments may not be actionable"
+    weight: 0.35
+    description: "Process metrics"
+  - type: llm_judge
+    criteria: "Dangerous metrics and gaming are addressed — DO NOT track (or only track privately): (1) number of comments per reviewer — incentivizes volume over quality (nitpicking to boost numbers). (2) Review speed as individual metric — incentivizes rubber-stamp approvals. (3) Bug count per developer — creates blame culture, discourages risk-taking. Instead track: bug patterns (systemic issues, not individual blame). Private metrics (manager only): individual review quality scores (are comments helpful?), reviewer response time per person (are specific people bottlenecks?). Public metrics (team level only): team defect rate, team review turnaround, team satisfaction. Gaming prevention: any metric that becomes a target will be gamed. Use metrics as signals for investigation, not as goals to optimize. 'If your metrics improve but developer satisfaction drops, the metrics are lying.'"
+    weight: 0.30
+    description: "Dangerous metrics"

package/courses/code-review-feedback-writing/scenarios/level-4/review-process-optimization.yaml ADDED Viewed

@@ -0,0 +1,48 @@
+meta:
+  id: review-process-optimization
+  level: 4
+  course: code-review-feedback-writing
+  type: output
+  description: "Optimize review processes — reduce review cycle time, improve throughput, and eliminate bottlenecks while maintaining quality"
+  tags: [code-review, process, optimization, cycle-time, bottleneck, throughput, expert]
+state: {}
+trigger: |
+  Your engineering organization (50 developers, 6 teams) has code
+  review metrics that need improvement:
+  Current state:
+  - Average time-to-first-review: 3.2 days
+  - Average time-to-merge: 7.1 days (after review rounds)
+  - Average review rounds: 2.8 per PR
+  - 15% of PRs are abandoned after review (never merged)
+  - 40% of review comments are about style/formatting
+  - Top 5 reviewers handle 65% of all reviews
+  - Developer satisfaction with review process: 4.2/10
+  Target:
+  - Time-to-first-review: < 24 hours
+  - Time-to-merge: < 3 days
+  - Review rounds: < 2 per PR
+  - Abandoned PRs: < 5%
+  - Developer satisfaction: > 7/10
+  Task: Design the process optimization plan. Include: root cause
+  analysis for each metric, specific interventions, implementation
+  timeline, and how to measure improvement. Show the connection
+  between interventions and metric improvements.
+assertions:
+  - type: llm_judge
+    criteria: "Root causes are diagnosed from the metrics — 3.2 day first review: top 5 reviewers are bottlenecked (65% of reviews, insufficient capacity). Fix: distribute reviews more evenly, use CODEOWNERS for auto-assignment, rotate reviewers. 7.1 day merge: 2.8 review rounds means 2-3 back-and-forth cycles. Fix: higher quality first submissions (PR templates, self-review checklist) and higher quality first reviews (one thorough pass, not incremental nitpicking). 40% style comments: humans doing work machines should do. Fix: linters + formatters in CI (ESLint, Prettier, auto-format on commit). 15% abandoned: PRs too large or misaligned with team direction. Fix: design review BEFORE coding for large features, PR size limits. Each root cause has a specific, measurable intervention"
+    weight: 0.35
+    description: "Root cause analysis"
+  - type: llm_judge
+    criteria: "Interventions are specific and connected to metrics — intervention 1 (week 1): add ESLint/Prettier to CI pipeline → eliminates 40% of review comments → reduces review rounds by ~1 → reduces time-to-merge by ~2 days. Intervention 2 (week 2): implement CODEOWNERS with round-robin assignment → distributes load from top 5 → reduces time-to-first-review. Intervention 3 (week 3): PR template with self-review checklist → reduces low-quality submissions → reduces first-review comments → reduces review rounds. Intervention 4 (week 4): require design doc/discussion for PRs > 300 lines → reduces abandoned PRs (alignment before coding). Intervention 5 (ongoing): review SLA dashboard visible to all → social pressure for timely reviews. Each intervention: expected metric impact, implementation effort, timeline"
+    weight: 0.35
+    description: "Specific interventions"
+  - type: llm_judge
+    criteria: "Measurement and iteration plan is rigorous — dashboard: real-time metrics for all 5 targets, broken down by team and individual. Weekly review: review metrics in team standup (not to blame, to identify bottlenecks). Monthly retrospective: which interventions are working? Which need adjustment? Leading indicators: (1) PR size distribution (smaller is better), (2) first-review quality (fewer follow-up rounds means better first reviews), (3) automated check pass rate (linter/formatter adoption). Lagging indicators: time-to-merge trend, developer satisfaction (quarterly survey). Iteration: 'We won't hit all targets immediately. The plan is: month 1 (automation + assignment), month 2 (process + templates), month 3 (culture + training). Measure after each month, adjust before proceeding.' Risk: 'Optimizing for speed can reduce quality — monitor defect escape rate alongside speed metrics'"
+    weight: 0.30
+    description: "Measurement plan"