dojo.md 0.2.2 → 0.2.4
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/courses/GENERATION_LOG.md +29 -0
- package/courses/api-documentation-writing/course.yaml +12 -0
- package/courses/api-documentation-writing/scenarios/level-1/authentication-basics.yaml +46 -0
- package/courses/api-documentation-writing/scenarios/level-1/data-types-formats.yaml +45 -0
- package/courses/api-documentation-writing/scenarios/level-1/endpoint-description.yaml +45 -0
- package/courses/api-documentation-writing/scenarios/level-1/error-documentation.yaml +45 -0
- package/courses/api-documentation-writing/scenarios/level-1/first-documentation-shift.yaml +47 -0
- package/courses/api-documentation-writing/scenarios/level-1/getting-started-guide.yaml +42 -0
- package/courses/api-documentation-writing/scenarios/level-1/pagination-docs.yaml +51 -0
- package/courses/api-documentation-writing/scenarios/level-1/request-parameters.yaml +46 -0
- package/courses/api-documentation-writing/scenarios/level-1/request-response-examples.yaml +48 -0
- package/courses/api-documentation-writing/scenarios/level-1/status-codes.yaml +45 -0
- package/courses/api-documentation-writing/scenarios/level-2/error-patterns.yaml +48 -0
- package/courses/api-documentation-writing/scenarios/level-2/intermediate-documentation-shift.yaml +48 -0
- package/courses/api-documentation-writing/scenarios/level-2/oauth-documentation.yaml +47 -0
- package/courses/api-documentation-writing/scenarios/level-2/openapi-specification.yaml +46 -0
- package/courses/api-documentation-writing/scenarios/level-2/rate-limiting-docs.yaml +45 -0
- package/courses/api-documentation-writing/scenarios/level-2/request-body-schemas.yaml +46 -0
- package/courses/api-documentation-writing/scenarios/level-2/schema-definitions.yaml +41 -0
- package/courses/api-documentation-writing/scenarios/level-2/swagger-redoc-rendering.yaml +43 -0
- package/courses/api-documentation-writing/scenarios/level-2/validation-documentation.yaml +47 -0
- package/courses/api-documentation-writing/scenarios/level-2/versioning-changelog.yaml +42 -0
- package/courses/api-documentation-writing/scenarios/level-3/advanced-documentation-shift.yaml +43 -0
- package/courses/api-documentation-writing/scenarios/level-3/api-style-guide.yaml +40 -0
- package/courses/api-documentation-writing/scenarios/level-3/code-samples-multilang.yaml +40 -0
- package/courses/api-documentation-writing/scenarios/level-3/content-architecture.yaml +47 -0
- package/courses/api-documentation-writing/scenarios/level-3/deprecation-communication.yaml +44 -0
- package/courses/api-documentation-writing/scenarios/level-3/interactive-api-explorer.yaml +42 -0
- package/courses/api-documentation-writing/scenarios/level-3/migration-guides.yaml +42 -0
- package/courses/api-documentation-writing/scenarios/level-3/sdk-documentation.yaml +40 -0
- package/courses/api-documentation-writing/scenarios/level-3/webhook-documentation.yaml +48 -0
- package/courses/api-documentation-writing/scenarios/level-3/websocket-sse-docs.yaml +47 -0
- package/courses/api-documentation-writing/scenarios/level-4/api-changelog-management.yaml +44 -0
- package/courses/api-documentation-writing/scenarios/level-4/api-governance-standards.yaml +41 -0
- package/courses/api-documentation-writing/scenarios/level-4/api-product-strategy.yaml +41 -0
- package/courses/api-documentation-writing/scenarios/level-4/developer-portal-design.yaml +48 -0
- package/courses/api-documentation-writing/scenarios/level-4/docs-as-code.yaml +41 -0
- package/courses/api-documentation-writing/scenarios/level-4/documentation-localization.yaml +46 -0
- package/courses/api-documentation-writing/scenarios/level-4/documentation-metrics.yaml +45 -0
- package/courses/api-documentation-writing/scenarios/level-4/documentation-testing.yaml +41 -0
- package/courses/api-documentation-writing/scenarios/level-4/expert-documentation-shift.yaml +45 -0
- package/courses/api-documentation-writing/scenarios/level-4/multi-audience-docs.yaml +46 -0
- package/courses/api-documentation-writing/scenarios/level-5/ai-powered-documentation.yaml +44 -0
- package/courses/api-documentation-writing/scenarios/level-5/api-first-documentation.yaml +45 -0
- package/courses/api-documentation-writing/scenarios/level-5/api-marketplace-docs.yaml +42 -0
- package/courses/api-documentation-writing/scenarios/level-5/board-api-strategy.yaml +48 -0
- package/courses/api-documentation-writing/scenarios/level-5/documentation-program-strategy.yaml +42 -0
- package/courses/api-documentation-writing/scenarios/level-5/documentation-team-structure.yaml +47 -0
- package/courses/api-documentation-writing/scenarios/level-5/dx-competitive-advantage.yaml +46 -0
- package/courses/api-documentation-writing/scenarios/level-5/ecosystem-documentation.yaml +45 -0
- package/courses/api-documentation-writing/scenarios/level-5/industry-documentation-patterns.yaml +46 -0
- package/courses/api-documentation-writing/scenarios/level-5/master-documentation-shift.yaml +46 -0
- package/courses/code-review-feedback-writing/course.yaml +12 -0
- package/courses/code-review-feedback-writing/scenarios/level-1/approve-vs-request-changes.yaml +48 -0
- package/courses/code-review-feedback-writing/scenarios/level-1/asking-questions.yaml +50 -0
- package/courses/code-review-feedback-writing/scenarios/level-1/clear-comment-writing.yaml +45 -0
- package/courses/code-review-feedback-writing/scenarios/level-1/constructive-tone.yaml +43 -0
- package/courses/code-review-feedback-writing/scenarios/level-1/first-review-shift.yaml +46 -0
- package/courses/code-review-feedback-writing/scenarios/level-1/giving-praise.yaml +44 -0
- package/courses/code-review-feedback-writing/scenarios/level-1/nitpick-etiquette.yaml +44 -0
- package/courses/code-review-feedback-writing/scenarios/level-1/providing-context.yaml +46 -0
- package/courses/code-review-feedback-writing/scenarios/level-1/reviewing-small-prs.yaml +43 -0
- package/courses/code-review-feedback-writing/scenarios/level-1/style-vs-logic.yaml +48 -0
- package/courses/code-review-feedback-writing/scenarios/level-2/architectural-feedback.yaml +52 -0
- package/courses/code-review-feedback-writing/scenarios/level-2/intermediate-review-shift.yaml +46 -0
- package/courses/code-review-feedback-writing/scenarios/level-2/performance-feedback.yaml +50 -0
- package/courses/code-review-feedback-writing/scenarios/level-2/reviewing-breaking-changes.yaml +44 -0
- package/courses/code-review-feedback-writing/scenarios/level-2/reviewing-complex-prs.yaml +43 -0
- package/courses/code-review-feedback-writing/scenarios/level-2/reviewing-documentation.yaml +47 -0
- package/courses/code-review-feedback-writing/scenarios/level-2/reviewing-error-handling.yaml +50 -0
- package/courses/code-review-feedback-writing/scenarios/level-2/reviewing-tests.yaml +53 -0
- package/courses/code-review-feedback-writing/scenarios/level-2/security-review-comments.yaml +50 -0
- package/courses/code-review-feedback-writing/scenarios/level-2/suggesting-alternatives.yaml +42 -0
- package/courses/code-review-feedback-writing/scenarios/level-3/advanced-review-shift.yaml +48 -0
- package/courses/code-review-feedback-writing/scenarios/level-3/api-design-review.yaml +47 -0
- package/courses/code-review-feedback-writing/scenarios/level-3/cross-team-review.yaml +45 -0
- package/courses/code-review-feedback-writing/scenarios/level-3/database-migration-review.yaml +48 -0
- package/courses/code-review-feedback-writing/scenarios/level-3/design-pattern-feedback.yaml +48 -0
- package/courses/code-review-feedback-writing/scenarios/level-3/mentoring-through-review.yaml +46 -0
- package/courses/code-review-feedback-writing/scenarios/level-3/production-incident-review.yaml +42 -0
- package/courses/code-review-feedback-writing/scenarios/level-3/reviewing-senior-code.yaml +47 -0
- package/courses/code-review-feedback-writing/scenarios/level-3/reviewing-unfamiliar-code.yaml +43 -0
- package/courses/code-review-feedback-writing/scenarios/level-3/speed-vs-thoroughness.yaml +46 -0
- package/courses/code-review-feedback-writing/scenarios/level-4/automated-review-strategy.yaml +44 -0
- package/courses/code-review-feedback-writing/scenarios/level-4/expert-review-shift.yaml +46 -0
- package/courses/code-review-feedback-writing/scenarios/level-4/review-culture-design.yaml +41 -0
- package/courses/code-review-feedback-writing/scenarios/level-4/review-guidelines-standards.yaml +45 -0
- package/courses/code-review-feedback-writing/scenarios/level-4/review-load-balancing.yaml +39 -0
- package/courses/code-review-feedback-writing/scenarios/level-4/review-metrics.yaml +39 -0
- package/courses/code-review-feedback-writing/scenarios/level-4/review-process-optimization.yaml +48 -0
- package/courses/code-review-feedback-writing/scenarios/level-4/scaling-review-process.yaml +45 -0
- package/courses/code-review-feedback-writing/scenarios/level-4/security-review-standards.yaml +41 -0
- package/courses/code-review-feedback-writing/scenarios/level-4/training-reviewers.yaml +42 -0
- package/courses/code-review-feedback-writing/scenarios/level-5/board-quality-metrics.yaml +44 -0
- package/courses/code-review-feedback-writing/scenarios/level-5/knowledge-transfer-at-scale.yaml +42 -0
- package/courses/code-review-feedback-writing/scenarios/level-5/ma-review-alignment.yaml +50 -0
- package/courses/code-review-feedback-writing/scenarios/level-5/master-review-shift.yaml +49 -0
- package/courses/code-review-feedback-writing/scenarios/level-5/review-competitive-advantage.yaml +48 -0
- package/courses/code-review-feedback-writing/scenarios/level-5/review-organizational-learning.yaml +46 -0
- package/courses/code-review-feedback-writing/scenarios/level-5/review-roi-analysis.yaml +51 -0
- package/courses/code-review-feedback-writing/scenarios/level-5/review-velocity-impact.yaml +44 -0
- package/courses/code-review-feedback-writing/scenarios/level-5/scaling-reviews-100-plus.yaml +45 -0
- package/courses/code-review-feedback-writing/scenarios/level-5/toxic-culture-transformation.yaml +46 -0
- package/courses/technical-rfc-writing/course.yaml +11 -0
- package/courses/technical-rfc-writing/scenarios/level-1/first-rfc-shift.yaml +45 -0
- package/courses/technical-rfc-writing/scenarios/level-1/implementation-planning.yaml +47 -0
- package/courses/technical-rfc-writing/scenarios/level-1/open-questions.yaml +46 -0
- package/courses/technical-rfc-writing/scenarios/level-1/problem-statement.yaml +41 -0
- package/courses/technical-rfc-writing/scenarios/level-1/proposing-solutions.yaml +49 -0
- package/courses/technical-rfc-writing/scenarios/level-1/rfc-structure.yaml +41 -0
- package/courses/technical-rfc-writing/scenarios/level-1/risks-and-mitigations.yaml +43 -0
- package/courses/technical-rfc-writing/scenarios/level-1/scoping-an-rfc.yaml +49 -0
- package/courses/technical-rfc-writing/scenarios/level-1/success-metrics.yaml +43 -0
- package/courses/technical-rfc-writing/scenarios/level-1/writing-for-audience.yaml +42 -0
- package/courses/technical-rfc-writing/scenarios/level-2/risk-assessment-matrix.yaml +43 -0
- package/courses/technical-rfc-writing/scenarios/level-2/technical-design-detail.yaml +42 -0
- package/courses/technical-rfc-writing/scenarios/level-2/trade-off-analysis.yaml +43 -0
- package/courses/terraform-infrastructure-setup/scenarios/level-1/first-debugging-shift.yaml +66 -0
- package/courses/terraform-infrastructure-setup/scenarios/level-1/plan-output-reading.yaml +71 -0
- package/courses/terraform-infrastructure-setup/scenarios/level-1/resource-creation-failures.yaml +54 -0
- package/courses/terraform-infrastructure-setup/scenarios/level-1/resource-references.yaml +70 -0
- package/courses/terraform-infrastructure-setup/scenarios/level-1/state-file-basics.yaml +73 -0
- package/courses/terraform-infrastructure-setup/scenarios/level-1/terraform-fmt-validate.yaml +58 -0
- package/courses/terraform-infrastructure-setup/scenarios/level-2/count-vs-for-each.yaml +58 -0
- package/courses/terraform-infrastructure-setup/scenarios/level-2/dependency-management.yaml +80 -0
- package/courses/terraform-infrastructure-setup/scenarios/level-2/intermediate-debugging-shift.yaml +66 -0
- package/courses/terraform-infrastructure-setup/scenarios/level-2/lifecycle-rules.yaml +51 -0
- package/courses/terraform-infrastructure-setup/scenarios/level-2/locals-and-expressions.yaml +58 -0
- package/courses/terraform-infrastructure-setup/scenarios/level-2/module-structure.yaml +75 -0
- package/courses/terraform-infrastructure-setup/scenarios/level-2/provisioner-pitfalls.yaml +64 -0
- package/courses/terraform-infrastructure-setup/scenarios/level-2/remote-state-backend.yaml +55 -0
- package/courses/terraform-infrastructure-setup/scenarios/level-2/terraform-import.yaml +55 -0
- package/courses/terraform-infrastructure-setup/scenarios/level-2/workspace-management.yaml +51 -0
- package/courses/terraform-infrastructure-setup/scenarios/level-3/advanced-debugging-shift.yaml +63 -0
- package/courses/terraform-infrastructure-setup/scenarios/level-3/api-rate-limiting.yaml +50 -0
- package/courses/terraform-infrastructure-setup/scenarios/level-3/conditional-resources.yaml +66 -0
- package/courses/terraform-infrastructure-setup/scenarios/level-3/drift-detection.yaml +66 -0
- package/courses/terraform-infrastructure-setup/scenarios/level-3/dynamic-blocks.yaml +71 -0
- package/courses/terraform-infrastructure-setup/scenarios/level-3/large-scale-refactoring.yaml +59 -0
- package/courses/terraform-infrastructure-setup/scenarios/level-3/multi-provider-config.yaml +69 -0
- package/courses/terraform-infrastructure-setup/scenarios/level-3/state-surgery.yaml +57 -0
- package/courses/terraform-infrastructure-setup/scenarios/level-3/terraform-cloud-enterprise.yaml +59 -0
- package/courses/terraform-infrastructure-setup/scenarios/level-3/terraform-debugging.yaml +51 -0
- package/courses/terraform-infrastructure-setup/scenarios/level-4/blast-radius-management.yaml +51 -0
- package/courses/terraform-infrastructure-setup/scenarios/level-4/cicd-pipeline-design.yaml +50 -0
- package/courses/terraform-infrastructure-setup/scenarios/level-4/compliance-as-code.yaml +46 -0
- package/courses/terraform-infrastructure-setup/scenarios/level-4/cost-estimation-governance.yaml +42 -0
- package/courses/terraform-infrastructure-setup/scenarios/level-4/expert-debugging-shift.yaml +51 -0
- package/courses/terraform-infrastructure-setup/scenarios/level-4/iac-organization-strategy.yaml +45 -0
- package/courses/terraform-infrastructure-setup/scenarios/level-4/incident-response-iac.yaml +47 -0
- package/courses/terraform-infrastructure-setup/scenarios/level-4/infrastructure-testing.yaml +41 -0
- package/courses/terraform-infrastructure-setup/scenarios/level-4/module-registry-design.yaml +45 -0
- package/courses/terraform-infrastructure-setup/scenarios/level-4/multi-account-strategy.yaml +57 -0
- package/courses/terraform-infrastructure-setup/scenarios/level-5/board-infrastructure-investment.yaml +53 -0
- package/courses/terraform-infrastructure-setup/scenarios/level-5/disaster-recovery-iac.yaml +47 -0
- package/courses/terraform-infrastructure-setup/scenarios/level-5/enterprise-iac-transformation.yaml +48 -0
- package/courses/terraform-infrastructure-setup/scenarios/level-5/iac-technology-evolution.yaml +49 -0
- package/courses/terraform-infrastructure-setup/scenarios/level-5/ma-infrastructure-consolidation.yaml +54 -0
- package/courses/terraform-infrastructure-setup/scenarios/level-5/master-debugging-shift.yaml +53 -0
- package/courses/terraform-infrastructure-setup/scenarios/level-5/multi-cloud-strategy.yaml +49 -0
- package/courses/terraform-infrastructure-setup/scenarios/level-5/platform-engineering.yaml +47 -0
- package/courses/terraform-infrastructure-setup/scenarios/level-5/regulatory-compliance-automation.yaml +47 -0
- package/courses/terraform-infrastructure-setup/scenarios/level-5/terraform-vs-alternatives.yaml +46 -0
- package/dist/cli/commands/generate.d.ts.map +1 -1
- package/dist/cli/commands/generate.js +2 -1
- package/dist/cli/commands/generate.js.map +1 -1
- package/dist/cli/commands/train.d.ts.map +1 -1
- package/dist/cli/commands/train.js +6 -3
- package/dist/cli/commands/train.js.map +1 -1
- package/dist/cli/index.js +9 -6
- package/dist/cli/index.js.map +1 -1
- package/dist/cli/run-demo.js +3 -2
- package/dist/cli/run-demo.js.map +1 -1
- package/dist/engine/model-utils.d.ts +6 -0
- package/dist/engine/model-utils.d.ts.map +1 -1
- package/dist/engine/model-utils.js +28 -1
- package/dist/engine/model-utils.js.map +1 -1
- package/dist/engine/training.d.ts.map +1 -1
- package/dist/engine/training.js +4 -3
- package/dist/engine/training.js.map +1 -1
- package/dist/evaluator/judge.d.ts +7 -1
- package/dist/evaluator/judge.d.ts.map +1 -1
- package/dist/evaluator/judge.js +50 -11
- package/dist/evaluator/judge.js.map +1 -1
- package/dist/generator/course-generator.d.ts.map +1 -1
- package/dist/generator/course-generator.js +4 -3
- package/dist/generator/course-generator.js.map +1 -1
- package/dist/mcp/server.d.ts.map +1 -1
- package/dist/mcp/server.js +7 -3
- package/dist/mcp/server.js.map +1 -1
- package/dist/mcp/session-manager.d.ts.map +1 -1
- package/dist/mcp/session-manager.js +3 -2
- package/dist/mcp/session-manager.js.map +1 -1
- package/dist/types/index.d.ts +1 -1
- package/dist/types/index.d.ts.map +1 -1
- package/package.json +1 -1
package/courses/code-review-feedback-writing/scenarios/level-2/performance-feedback.yaml
ADDED

@@ -0,0 +1,50 @@
+meta:
+  id: performance-feedback
+  level: 2
+  course: code-review-feedback-writing
+  type: output
+  description: "Give performance feedback — identify and communicate performance issues in code reviews with data-backed reasoning and optimization suggestions"
+  tags: [code-review, performance, optimization, complexity, N+1, intermediate]
+
+state: {}
+
+trigger: |
+  You're reviewing a PR for a dashboard that displays order analytics.
+  The page loads slowly in staging. You've identified performance
+  issues:
+
+  1. N+1 query: loops through 100 orders and makes a separate DB
+     query for each order's customer data
+
+  2. Loading all 50,000 orders into memory to calculate totals
+     instead of using SQL aggregation
+
+  3. A nested loop (O(n²)) matching orders to products when a
+     Map/dictionary lookup would be O(n)
+
+  4. Fetching the same user profile 3 times on the same page
+     (3 separate API calls in 3 components)
+
+  5. Rendering a 10,000-row table without virtualization
+
+  6. Computing expensive derived values on every render instead
+     of memoizing
+
+  Task: Write performance review comments for each issue. For each,
+  explain the performance impact with approximate numbers, explain
+  why the current approach is slow, and suggest the optimized approach
+  with enough detail to implement.
+
+assertions:
+  - type: llm_judge
+    criteria: "Performance impact is quantified — (1) N+1: 100 orders = 101 queries (1 list + 100 customer lookups). Each query ~5ms = 500ms total. Fix with JOIN or eager loading: 1 query ~10ms. 50x improvement. (2) 50K orders in memory: ~200MB memory, 2-3 seconds to load and iterate. SQL aggregate: `SELECT SUM(total), COUNT(*) FROM orders WHERE ...` returns one row in ~50ms. (3) O(n²): 1000 orders × 500 products = 500,000 comparisons. Map lookup: 1000 + 500 = 1,500 operations. 300x fewer operations. (4) 3 duplicate API calls: 3 × ~200ms = 600ms wasted. Fetch once and share. (5) 10K rows: DOM rendering 10K rows takes 2-3 seconds, causes jank. Virtualization renders ~50 visible rows. (6) Recomputing on every render: if computation takes 100ms and renders happen 10x/second, the page stutters. Numbers don't need to be exact — approximate is fine for review"
+    weight: 0.35
+    description: "Quantified impact"
+  - type: llm_judge
+    criteria: "Optimized approaches include implementation detail — (1) N+1 fix: show the JOIN query or ORM eager loading syntax (e.g., `Order.findAll({ include: Customer })`). (2) SQL aggregation: show the exact query `SELECT status, COUNT(*), SUM(total) FROM orders GROUP BY status`. (3) Map lookup: show building a Map first `const productMap = new Map(products.map(p => [p.id, p]))`, then `productMap.get(order.productId)`. (4) Shared fetch: use React context, SWR/React Query (deduplication built in), or lift the fetch to a parent component. (5) Virtualization: suggest react-window or react-virtualized, show the component change. (6) Memoization: show useMemo with dependency array. Each fix is a concrete code change, not just 'optimize this'"
+    weight: 0.35
+    description: "Implementation detail"
+  - type: llm_judge
+    criteria: "Comments explain WHY the current approach is slow — don't just say 'use a JOIN' — explain: 'Each database query has overhead: connection, parsing, execution, network round-trip. Doing this 100 times multiplies that overhead. A JOIN does one trip and the database optimizes the execution plan.' For O(n²): 'Nested loops compare every order with every product. When data grows from 1K to 10K items, this goes from 500K to 50M comparisons — it doesn't just get 10x slower, it gets 100x slower.' For memory: 'Loading 50K rows into Node.js means serializing from Postgres, deserializing in JS, iterating in a single-threaded event loop. The database can do this math on indexed data in milliseconds.' The reviewer teaches performance thinking, not just performance fixes"
+    weight: 0.30
+    description: "Explains why slow"
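The Map-lookup fix that the scenario's second assertion asks for can be sketched as runnable code. This is a minimal illustration with hypothetical `orders`/`products` shapes, not code from the scenario's actual dashboard:

```javascript
// Replace the O(n²) nested loop with a one-time Map build: for 1,000
// orders and 500 products this is ~1,500 operations instead of 500,000
// pairwise comparisons.
function matchOrdersToProducts(orders, products) {
  // O(products): build the lookup table once
  const productMap = new Map(products.map((p) => [p.id, p]));
  // O(orders): each .get() is a constant-time lookup
  return orders.map((order) => ({
    ...order,
    product: productMap.get(order.productId),
  }));
}
```

The same shape of comment works in a review: show the two-line Map build so the author has a concrete change to make rather than an instruction to "optimize this".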
package/courses/code-review-feedback-writing/scenarios/level-2/reviewing-breaking-changes.yaml
ADDED

@@ -0,0 +1,44 @@
+meta:
+  id: reviewing-breaking-changes
+  level: 2
+  course: code-review-feedback-writing
+  type: output
+  description: "Review breaking changes — identify and communicate backward compatibility issues, migration requirements, and deployment risks in code reviews"
+  tags: [code-review, breaking-changes, backward-compatibility, migration, deployment, intermediate]
+
+state: {}
+
+trigger: |
+  You're reviewing a PR that modifies a shared library used by 6
+  microservices. The changes:
+
+  1. Renamed function: getUserById() → findUser() (affects all callers)
+  2. Changed return type: was { user: User } now returns User directly
+     (all callers destructure the old format)
+  3. Removed deprecated parameter: search(query, options) no longer
+     accepts options.fuzzy (3 services use this)
+  4. Changed default: connection timeout changed from 30s to 5s
+     (services with slow DB queries will start timing out)
+  5. New required parameter: authenticate(token) now requires
+     authenticate(token, scope) — all callers must be updated
+
+  The author says: "I cleaned up the API to be more consistent."
+
+  Task: Write review comments for each breaking change. For each,
+  explain: who is affected, what will break, whether it needs
+  migration, and how to make the change safely. Address the overall
+  approach of making 5 breaking changes in one PR.
+
+assertions:
+  - type: llm_judge
+    criteria: "Each breaking change impact is identified precisely — (1) Function rename: all 6 services that import getUserById will fail to compile/build. (2) Return type change: `const { user } = getUserById(id)` becomes `const user = findUser(id)` — runtime error: destructuring undefined. (3) Removed parameter: 3 services using options.fuzzy silently lose fuzzy search (might not error, just behave differently — WORSE than an error). (4) Changed default: services with queries that take 6-29 seconds worked before, now timeout. Performance regression masquerading as a bug. (5) New required parameter: TypeScript catches this at compile time, JavaScript fails at runtime. Each change: number of affected services, type of failure (compile error, runtime error, silent behavior change), how to detect the break"
+    weight: 0.35
+    description: "Impact identification"
+  - type: llm_judge
+    criteria: "Safe migration paths are proposed — (1) Rename: add findUser as alias first, deprecate getUserById with console.warn, remove after all callers migrate. (2) Return type: version the library, v2 returns new format, v1 maintained for 3 months. Or: support both formats with options parameter. (3) Remove parameter: warn when fuzzy is passed (don't silently ignore), give callers time to remove. (4) Default change: make it configurable, don't change the default — let callers opt-in to the shorter timeout. (5) New required parameter: make scope optional with a default value first, then make it required after all callers update. Overall: each change should be a separate PR that can be deployed independently. Use semantic versioning: these are MAJOR version changes"
+    weight: 0.35
+    description: "Migration paths"
+  - type: llm_judge
+    criteria: "Overall approach is challenged constructively — 'I understand the motivation to clean up the API — consistency is valuable. But 5 breaking changes in one PR affecting 6 services is risky. If something goes wrong, we can't identify which change caused the problem.' Suggest: 'Can we split this into 5 separate PRs, each with its own migration? This way: (1) each change can be reviewed independently, (2) each service team can update at their own pace, (3) if a change causes issues, we know exactly which one. Recommended order: add aliases/new APIs first (non-breaking), then deprecate old APIs, then remove old APIs after all consumers migrate.' Deployment strategy: 'This requires coordinated deployment of the library + all 6 services simultaneously. A versioned approach with backward compatibility lets services migrate independently.'"
+    weight: 0.30
+    description: "Approach challenge"
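The "add an alias first, deprecate, then remove" path described in the scenario's second assertion can be sketched as a small shim. The lookup body here is a hypothetical stand-in; only the deprecation pattern is the point:

```javascript
// New API: returns the User object directly.
function findUser(id) {
  // Hypothetical stand-in for the real lookup
  return { id, name: "user-" + id };
}

// Old API kept as a thin alias during the migration window. It warns
// once (not on every call) and preserves the old { user: ... } return
// shape, so existing destructuring callers keep working until they
// migrate on their own schedule.
let warnedGetUserById = false;
function getUserById(id) {
  if (!warnedGetUserById) {
    console.warn("getUserById() is deprecated; use findUser() instead");
    warnedGetUserById = true;
  }
  return { user: findUser(id) };
}
```

This turns two breaking changes (the rename and the return-type change) into non-breaking additions, deferring the actual removal to a later MAJOR release.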
package/courses/code-review-feedback-writing/scenarios/level-2/reviewing-complex-prs.yaml
ADDED

@@ -0,0 +1,43 @@
+meta:
+  id: reviewing-complex-prs
+  level: 2
+  course: code-review-feedback-writing
+  type: output
+  description: "Review complex PRs — manage large pull requests by batching feedback, identifying critical paths, and providing structured review across many files"
+  tags: [code-review, complex-pr, large-pr, batching, structure, intermediate]
+
+state: {}
+
+trigger: |
+  A teammate submits a 450-line PR touching 12 files. It implements
+  a new order processing pipeline with:
+  - New database migration (adds 3 tables)
+  - New service class with business logic (order validation, pricing,
+    inventory checks)
+  - API controller with 4 new endpoints
+  - 8 test files
+  - Updates to 2 existing services (inventory, notifications)
+
+  You've been reviewing for 30 minutes and have 20+ comments scattered
+  across files. The review is becoming unfocused and you're losing
+  track of what's critical vs minor.
+
+  Task: Organize your review of this complex PR. Write: (1) a
+  structured summary that groups feedback by theme rather than file,
+  (2) identify the critical path (which parts MUST be correct for the
+  PR to be safe), (3) acknowledge what you didn't review thoroughly
+  (and why), and (4) suggest whether this PR should have been split.
+
+assertions:
+  - type: llm_judge
+    criteria: "Feedback is grouped thematically, not file-by-file — categories: (1) Data model concerns (migration issues, schema design), (2) Business logic (validation rules, pricing calculations, inventory checks), (3) API design (endpoint paths, request/response formats, error handling), (4) Testing gaps (missing edge cases, untested paths), (5) Integration concerns (changes to existing services, backward compatibility). Each category: list of specific comments with severity. This structure helps the author: they can address all data model concerns at once rather than jumping between files. The author sees the reviewer understands the overall design, not just individual lines"
+    weight: 0.35
+    description: "Thematic grouping"
+  - type: llm_judge
+    criteria: "Critical path is identified explicitly — 'The most important things to get right: (1) the database migration must be reversible and handle existing data, (2) the pricing calculation must match the business rules exactly (errors here cost real money), (3) the inventory check must be atomic (race condition between check and decrement would oversell).' Distinguish: areas where bugs cause data loss/money loss (critical) vs areas where bugs cause bad UX but are recoverable (important) vs areas where issues are cosmetic (minor). State what you focused review time on: 'I spent most time on the service class and migration. I did a lighter review of the tests and controller.'"
+    weight: 0.35
+    description: "Critical path identified"
+  - type: llm_judge
+    criteria: "PR splitting recommendation is constructive — acknowledge: 'I understand this is one feature, but a 450-line PR across 12 files is hard to review thoroughly. Reviewer fatigue after 200 lines means later files get less attention.' Suggest splitting: 'For future features like this, consider: PR 1: migration + model (reviewable independently), PR 2: service layer + tests, PR 3: API controller + integration. Each PR is ~150 lines and can be reviewed in one focused session.' Don't demand retroactive splitting of this PR (too late). Be honest about review limitations: 'I did a thorough review of the service layer but a lighter review of the tests — you may want a second reviewer for the test coverage.'"
+    weight: 0.30
+    description: "PR splitting advice"
package/courses/code-review-feedback-writing/scenarios/level-2/reviewing-documentation.yaml
ADDED

@@ -0,0 +1,47 @@
+meta:
+  id: reviewing-documentation
+  level: 2
+  course: code-review-feedback-writing
+  type: output
+  description: "Review documentation changes — evaluate documentation PRs for accuracy, completeness, clarity, and alignment with code changes"
+  tags: [code-review, documentation, accuracy, completeness, clarity, intermediate]
+
+state: {}
+
+trigger: |
+  You're reviewing PRs that include documentation changes alongside
+  code changes. Common problems you've seen:
+
+  PR #1: New feature added (webhook retry configuration) but the
+  README wasn't updated. The author says "I'll document it later."
+
+  PR #2: Documentation was updated to describe a new API parameter,
+  but the example code in the docs doesn't include the new parameter.
+
+  PR #3: A refactor changed the method signature from
+  `sendEmail(to, subject, body)` to `sendEmail(options)` but the
+  JSDoc comments still show the old signature.
+
+  PR #4: The CHANGELOG entry says "Fixed bug" with no description
+  of what bug was fixed or what changed.
+
+  PR #5: A new environment variable is required (REDIS_URL) but it's
+  not added to .env.example, the README, or the deployment docs.
+
+  Task: Write review comments for each situation. Explain why
+  documentation accuracy matters, what specifically needs to change,
+  and how to prevent these issues in the future.
+
+assertions:
+  - type: llm_judge
+    criteria: "Each documentation gap is identified with impact — PR #1: 'Documentation debt compounds — webhook retry config will be discoverable only by reading source code. Developers will file support tickets. Could we add a section to docs/webhooks.md before merging?' PR #2: 'The example on line 34 still shows the old API call without the new timeout parameter. A developer copy-pasting this example will get unexpected behavior. Please update the example to include timeout.' PR #3: 'The JSDoc at line 12 shows sendEmail(to, subject, body) but the function now takes an options object — this will confuse anyone relying on IDE tooltips.' PR #4: 'A changelog entry of just Fixed bug makes our changelog useless. What broke? What was the fix? Who does this affect? Suggest: Fixed: Order total calculation failed when discount code exceeded subtotal.' PR #5: 'Missing REDIS_URL from .env.example means every developer and every deployment environment will fail on first try.'"
+    weight: 0.35
+    description: "Gap identification"
+  - type: llm_judge
+    criteria: "Comments connect documentation to developer experience — documentation isn't bureaucracy, it's user experience: 'The .env.example file is the first thing a new developer touches when setting up the project. Missing REDIS_URL means their first experience is a cryptic connection error.' 'JSDoc is how developers discover your API — when it's wrong, they write code based on lies.' 'The changelog is how developers decide whether to upgrade — vague entries mean they can't assess risk.' 'Example code in docs is the most copied content in your entire system — if it's wrong, bugs are multiplied across every integration.' Each comment frames documentation as essential infrastructure, not optional polish"
|
38
|
+
weight: 0.35
|
|
39
|
+
description: "Gap identification"
|
|
40
|
+
- type: llm_judge
|
|
41
|
+
criteria: "Comments connect documentation to developer experience — documentation isn't bureaucracy, it's user experience: 'The .env.example file is the first thing a new developer touches when setting up the project. Missing REDIS_URL means their first experience is a cryptic connection error.' 'JSDoc is how developers discover your API — when it's wrong, they write code based on lies.' 'The changelog is how developers decide whether to upgrade — vague entries mean they can't assess risk.' 'Example code in docs is the most copied content in your entire system — if it's wrong, bugs are multiplied across every integration.' Each comment frames documentation as essential infrastructure, not optional polish"
|
|
42
|
+
weight: 0.35
|
|
43
|
+
description: "DX connection"
|
|
44
|
+
- type: llm_judge
|
|
45
|
+
criteria: "Prevention strategies are practical — (1) PR template: add 'Documentation updated?' checkbox, (2) CI check: detect API signature changes without corresponding JSDoc updates, (3) .env.example: add a CI check that all env vars in code exist in .env.example, (4) Changelog: require structured changelog format in PR template (What changed? Why? Who's affected?), (5) Examples: add documentation example testing to CI (run code examples to verify they work). Team agreement: 'Code without documentation is not done — add documentation to your team's Definition of Done.' Gradual adoption: 'We don't need to fix all existing gaps, but new code should ship with docs. Suggest adding a docs requirement to the PR template this sprint.'"
|
|
46
|
+
weight: 0.30
|
|
47
|
+
description: "Prevention strategies"
|
|
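The .env.example CI check named in the prevention criteria can be sketched in a few lines. This is an illustrative sketch only: the `process.env.FOO` regex and the helper names (`findEnvVars`, `missingFromExample`) are assumptions, not part of the package.

```javascript
// Sketch of a CI check: every env var referenced in source must be
// declared in .env.example. Regex and names are illustrative assumptions.
function findEnvVars(source) {
  // Collect every `process.env.SOME_NAME` reference in the source text.
  return [...source.matchAll(/process\.env\.([A-Z0-9_]+)/g)].map((m) => m[1]);
}

function missingFromExample(source, exampleText) {
  // .env.example lines look like `NAME=value`; keep the NAME part.
  const declared = new Set(
    exampleText
      .split('\n')
      .map((line) => line.split('=')[0].trim())
      .filter(Boolean)
  );
  return [...new Set(findEnvVars(source))].filter((v) => !declared.has(v));
}
```

A CI step would run this over the repo and fail the build when the returned list is non-empty, which catches the PR #5 situation (REDIS_URL used in code, absent from .env.example) before merge.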
@@ -0,0 +1,50 @@
+meta:
+  id: reviewing-error-handling
+  level: 2
+  course: code-review-feedback-writing
+  type: output
+  description: "Review error handling — evaluate how code handles failures, edge cases, and unexpected states with feedback on robustness patterns"
+  tags: [code-review, error-handling, robustness, edge-cases, resilience, intermediate]
+
+state: {}
+
+trigger: |
+  You're reviewing a PR for an e-commerce checkout flow. The happy
+  path works great, but you're concerned about error handling:
+
+  1. `try { await chargeCard(amount) } catch(e) { console.log(e) }`
+     — errors are logged but not propagated or handled
+
+  2. No timeout on the external payment API call — if the payment
+     gateway hangs, the request hangs forever
+
+  3. If inventory check fails after payment succeeds, the payment
+     isn't reversed (inconsistent state)
+
+  4. The function returns null on error instead of throwing — callers
+     don't check for null
+
+  5. Generic catch block: catches everything including programming
+     errors (TypeError, ReferenceError) along with business errors
+
+  6. Error messages shown to users include internal details:
+     "Failed to insert into orders table: constraint violation"
+
+  Task: Write review comments for each error handling issue. For
+  each, explain the failure scenario, the user/system impact, and
+  the correct error handling pattern. These are the bugs that cause
+  3am pages.
+
+assertions:
+  - type: llm_judge
+    criteria: "Failure scenarios are vivid and specific — (1) Swallowed error: 'Card is charged but the order isn't created because the error is only logged, not handled. The customer pays but gets no order confirmation. Support ticket incoming.' (2) Missing timeout: 'If Stripe hangs for 5 minutes, this HTTP request hangs for 5 minutes. If 100 users hit checkout, you have 100 hanging connections and the server becomes unresponsive.' (3) Inconsistent state: 'Customer is charged $50 but has no inventory allocated. They see Charged on their credit card but Your cart is empty on your site. This is the worst kind of bug — money is involved.' (4) Null return: 'Callers assume this function returns an order object. When it returns null, the next line does order.id and crashes with Cannot read property id of null — a confusing error for a payment problem.' Each scenario tells a story a developer can visualize"
+    weight: 0.35
+    description: "Failure scenarios"
+  - type: llm_judge
+    criteria: "Correct patterns are demonstrated — (1) Don't swallow: either re-throw, return an error result, or handle meaningfully (show user an error page, queue a retry). (2) Timeout: `await Promise.race([chargeCard(amount), timeout(30000)])` or use AbortController. (3) Compensation: implement saga pattern or at minimum: try charge → try allocate → if allocate fails, refund charge. Show the code structure. (4) Error types: throw typed errors instead of returning null: `throw new PaymentError('Card declined', { code: 'CARD_DECLINED' })`. Callers use try/catch. (5) Specific catches: catch business errors separately from programming errors: `catch(e) { if (e instanceof PaymentError) { handlePaymentError(e) } else { throw e } }`. (6) User-safe errors: map internal errors to user-friendly messages, log details server-side"
+    weight: 0.35
+    description: "Correct patterns"
+  - type: llm_judge
+    criteria: "Comments connect to real production impact — 'These are the bugs that cause 3am pages' framing throughout. Reference real-world scenarios: 'We had an incident last month where a similar swallowed error caused 200 orders to be charged but not fulfilled.' Testing suggestion: 'Each of these error paths should have a test — simulate payment gateway timeout, simulate inventory service failure, simulate partial failure.' Monitoring: 'Even with proper error handling, add alerting for these failure paths — if chargeCard fails for 5% of requests, we need to know immediately.' Order of importance: 'The payment-without-order issue (#3) is the most critical — it involves customer money. I'd fix that first and create follow-up tickets for the rest.'"
+    weight: 0.30
+    description: "Production impact"
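The timeout and typed-error patterns named in the "Correct patterns" criteria can be combined in one sketch. `chargeCard` is a hypothetical gateway call and `PaymentError` an illustrative error class; this is a sketch of the technique, not the package's implementation.

```javascript
// Typed error so callers can distinguish business failures from bugs.
class PaymentError extends Error {
  constructor(message, { code } = {}) {
    super(message);
    this.name = 'PaymentError';
    this.code = code;
  }
}

// A timeout promise that can be cancelled, so the success path does not
// leave a stray timer (or an unhandled rejection) behind.
function timeout(ms) {
  let id;
  const p = new Promise((_, reject) => {
    id = setTimeout(
      () => reject(new PaymentError('Payment gateway timed out', { code: 'GATEWAY_TIMEOUT' })),
      ms
    );
  });
  p.cancel = () => clearTimeout(id);
  return p;
}

async function chargeWithTimeout(chargeCard, amount, ms = 30000) {
  const t = timeout(ms);
  try {
    // Whichever settles first wins: the gateway call or the timeout.
    return await Promise.race([chargeCard(amount), t]);
  } finally {
    t.cancel(); // success path: stop the timer so it never fires
  }
}
```

Callers wrap the call in try/catch and branch on `e instanceof PaymentError`, re-throwing anything else so programming errors still surface.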
@@ -0,0 +1,53 @@
+meta:
+  id: reviewing-tests
+  level: 2
+  course: code-review-feedback-writing
+  type: output
+  description: "Review test code — evaluate test quality including coverage, edge cases, test naming, and the relationship between tests and implementation"
+  tags: [code-review, testing, test-quality, coverage, edge-cases, intermediate]
+
+state: {}
+
+trigger: |
+  You're reviewing test code for a payment processing module. The
+  author says "I added tests" and the coverage report shows 85%.
+  But coverage doesn't mean quality. You find these issues:
+
+  1. All 8 tests only cover the happy path (valid inputs, successful
+     operations) — no error cases tested
+
+  2. Test names are "test1", "test2", "test3" — no description of
+     what's being tested
+
+  3. Tests share mutable state — test order matters (test3 fails
+     if run alone)
+
+  4. One test is 60 lines long with 12 assertions (testing too many
+     things at once)
+
+  5. Tests mock the database but the mocks always return success —
+     database errors are never simulated
+
+  6. No test for the boundary condition: payment amount of exactly
+     $0.50 (minimum) and $999,999.99 (maximum)
+
+  7. A test that just checks the function doesn't throw — doesn't
+     verify the actual output
+
+  Task: Write review comments for each testing issue. For each,
+  explain why it matters for test quality, and suggest the specific
+  test(s) that should be added or changed.
+
+assertions:
+  - type: llm_judge
+    criteria: "Each test quality issue is explained clearly — (1) Happy path only: '85% coverage looks good on paper, but these tests only verify that correct inputs produce correct outputs. What happens when the payment gateway returns a timeout? When the amount is negative? When the customer doesn't exist? These are the scenarios that cause production incidents.' (2) Test names: 'When a test fails in CI, the name is all you see. test3 failing tells me nothing — but should_reject_payment_when_amount_below_minimum tells me exactly what broke.' (3) Shared state: 'Tests that depend on execution order are fragile — they pass in your suite but fail when run individually or in parallel. Each test should set up its own state.' (4) Long test: '12 assertions means 12 different things can fail — but you only see the first failure. Split into focused tests that each verify one behavior.'"
+    weight: 0.35
+    description: "Test quality explanations"
+  - type: llm_judge
+    criteria: "Specific missing tests are suggested — error cases to add: payment gateway timeout, gateway returns decline, invalid card number, duplicate payment (idempotency), insufficient funds, network error, database write failure. Boundary tests: amount = 49 cents (below minimum, should reject), amount = 50 cents (exact minimum, should accept), amount = $999,999.99 (exact maximum, should accept), amount = $1,000,000.00 (above maximum, should reject), amount = 0 (should reject), amount = -1 (should reject). State tests: payment in each status (pending, processing, succeeded, failed) — can each be refunded? cancelled? retried? Each suggestion is a specific test case with expected behavior, not just 'add more tests'"
+    weight: 0.35
+    description: "Specific missing tests"
+  - type: llm_judge
+    criteria: "Test naming and structure improvements are demonstrated — naming pattern: 'should_[expected behavior]_when_[condition]' or 'it [behavior] when [condition]'. Examples: 'should_create_payment_when_valid_amount_and_card', 'should_return_error_when_amount_below_minimum', 'should_reject_refund_when_payment_already_refunded'. Test structure: Arrange-Act-Assert clearly separated. Each test: set up specific state (arrange), call the function (act), verify one specific outcome (assert). Fixture usage: suggest beforeEach for common setup but ensure each test has its own state. Mock improvement: show how to make the database mock return an error: `mockDb.query.mockRejectedValueOnce(new Error('connection timeout'))`. Independence: show how to reset state between tests"
+    weight: 0.30
+    description: "Structure improvements"
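The failure-path mocking the criteria describe via Jest's `mockRejectedValueOnce` can be shown framework-free with a tiny hand-rolled mock. Everything here (`createMockDb`, `savePayment`) is an illustrative sketch, not the module under review.

```javascript
// Minimal stand-in for Jest's mockRejectedValueOnce: queue one error,
// the next query throws it, subsequent queries succeed again.
function createMockDb() {
  const queuedErrors = [];
  return {
    query: async (sql) => {
      if (queuedErrors.length > 0) {
        throw queuedErrors.shift();
      }
      return { rows: [] }; // default: success with no rows
    },
    failNextQueryWith: (err) => queuedErrors.push(err),
  };
}

// Hypothetical unit under test: it must surface a storage failure as an
// error result instead of silently reporting success.
async function savePayment(db, payment) {
  try {
    await db.query('INSERT INTO payments ...');
    return { ok: true };
  } catch (e) {
    return { ok: false, reason: e.message };
  }
}
```

With a real Jest mock the same test reads `mockDb.query.mockRejectedValueOnce(new Error('connection timeout'))` followed by an assertion on the error path; the point is that the database failure branch gets exercised at all.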
@@ -0,0 +1,50 @@
+meta:
+  id: security-review-comments
+  level: 2
+  course: code-review-feedback-writing
+  type: output
+  description: "Write security-focused review comments — identify and communicate security vulnerabilities clearly with risk assessment and remediation guidance"
+  tags: [code-review, security, vulnerabilities, OWASP, remediation, intermediate]
+
+state: {}
+
+trigger: |
+  You're reviewing a PR for a user management feature. You've spotted
+  several security issues at different severity levels:
+
+  1. SQL query built with string concatenation:
+     `SELECT * FROM users WHERE name = '${userName}'`
+
+  2. Password stored as MD5 hash (no salt)
+
+  3. API endpoint returns full user object including hashed password,
+     SSN, and internal notes in the response
+
+  4. File upload accepts any file type with no size limit
+
+  5. Admin endpoint has no authorization check — any authenticated
+     user can access it
+
+  6. CORS is configured with Access-Control-Allow-Origin: *
+
+  7. Error messages include stack traces and database column names
+
+  Task: Write security review comments for each vulnerability. For
+  each: explain the vulnerability, describe the attack scenario,
+  rate the severity, and provide the specific fix. These comments
+  should educate the author about WHY the fix matters, not just
+  WHAT to fix.
+
+assertions:
+  - type: llm_judge
+    criteria: "Each vulnerability is explained with attack scenarios — (1) SQL injection: attacker inputs `'; DROP TABLE users; --` as userName, executing arbitrary SQL. (2) MD5: rainbow table attack cracks passwords in seconds (no salt means identical passwords have identical hashes). (3) Data exposure: API consumers (or browser dev tools users) can see passwords, SSNs — violates data minimization. (4) File upload: attacker uploads 10GB file (DoS) or .exe/.php file (remote code execution if served). (5) Broken authorization: any logged-in user can delete other users or modify admin settings. (6) Wildcard CORS: any website can make authenticated API requests on behalf of your users (CSRF-like). (7) Information disclosure: stack traces reveal framework version, file paths, database schema — helps attackers map your system. Each explanation is specific and scary enough to motivate the fix without being fear-mongering"
+    weight: 0.35
+    description: "Vulnerability explanations"
+  - type: llm_judge
+    criteria: "Severity ratings and fixes are specific — critical (must fix before merge): SQL injection (#1), broken auth (#5), password storage (#2). High (must fix before production): data exposure (#3), file upload (#4). Medium (fix before public launch): CORS (#6), information disclosure (#7). Fixes: (1) use parameterized queries: `db.query('SELECT * FROM users WHERE name = $1', [userName])`. (2) Use bcrypt with cost factor 12+. (3) Create a UserResponse DTO that excludes sensitive fields. (4) Allowlist file types (jpg, png, pdf), max 10MB, store outside webroot. (5) Add @RequireRole('admin') middleware/decorator. (6) Set specific allowed origins. (7) Use generic error messages in production, log details server-side. Each fix is implementable from the comment alone"
+    weight: 0.35
+    description: "Severity and fixes"
+  - type: llm_judge
+    criteria: "Comments educate about security principles — don't just fix the symptom, explain the principle: SQL injection → 'Never trust user input in queries — parameterized queries separate data from commands.' Password storage → 'MD5 was designed for speed, not security — bcrypt is intentionally slow, making brute force impractical.' Data exposure → 'API responses should follow the principle of least privilege — only return what the consumer needs.' Authorization → 'Authentication (who are you?) is different from authorization (are you allowed?) — this endpoint checks the first but not the second.' Each comment links to relevant OWASP reference or security guide. Overall comment: 'I'd recommend running the OWASP top 10 checklist against any endpoint handling user data — happy to pair on a security review if helpful.'"
+    weight: 0.30
+    description: "Security education"
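The difference between the vulnerable concatenation and the recommended parameterized query can be demonstrated at the string level, without a real database. `buildSafe` mirrors the pg-style `$1` placeholder shape from the fix criteria; both helper names are illustrative.

```javascript
// The PR's pattern: attacker-controlled input is spliced into the SQL text,
// so the input can rewrite the SQL program itself.
function buildUnsafe(userName) {
  return `SELECT * FROM users WHERE name = '${userName}'`;
}

// Parameterized shape (pg-style $1 placeholders): the SQL text is a constant
// and the driver ships values separately, so data can never become commands.
function buildSafe(userName) {
  return { text: 'SELECT * FROM users WHERE name = $1', values: [userName] };
}

const hostile = "'; DROP TABLE users; --";
```

Passing `hostile` through `buildUnsafe` embeds `DROP TABLE users` in the executable query text; through `buildSafe` it stays an inert value in the `values` array.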
@@ -0,0 +1,42 @@
+meta:
+  id: suggesting-alternatives
+  level: 2
+  course: code-review-feedback-writing
+  type: output
+  description: "Suggest alternatives with code — provide concrete code suggestions in reviews showing before/after transformations with trade-off analysis"
+  tags: [code-review, alternatives, code-suggestions, before-after, trade-offs, intermediate]
+
+state: {}
+
+trigger: |
+  You're reviewing code and find several places where a different
+  approach would be better. Rather than vaguely saying "consider
+  another approach," you want to show the alternative with code.
+
+  Current code patterns you've found:
+
+  1. A 15-line if/else chain checking user.role for 6 different values
+  2. A Promise chain 8 levels deep (.then().then().then()...)
+  3. Manual array filtering with a for loop and push
+  4. Try/catch with identical error handling in 5 different functions
+  5. A function that takes 7 positional parameters
+
+  Task: For each pattern, write a review comment that shows the
+  current approach and the suggested alternative as code snippets.
+  Include: what's wrong with the current approach, the alternative
+  code, and an honest assessment of trade-offs (the alternative
+  isn't always strictly better).
+
+assertions:
+  - type: llm_judge
+    criteria: "Code suggestions are concrete and complete — (1) if/else chain → object literal lookup or switch statement or Map: show `const handlers = { admin: handleAdmin, user: handleUser, ... }; handlers[user.role]?.()`. (2) Promise chain → async/await: show the before (nested .then) and after (linear await statements). (3) Manual loop → array methods: show `const active = users.filter(u => u.isActive).map(u => u.name)`. (4) Duplicate error handling → higher-order function or error middleware: show `const withErrorHandling = (fn) => async (...args) => { try { return await fn(...args); } catch(e) { handleError(e); } }`. (5) 7 parameters → options object: show `function createUser({ name, email, role, ...opts })` with destructuring. Each suggestion is copy-pasteable with minimal modification"
+    weight: 0.35
+    description: "Concrete code"
+  - type: llm_judge
+    criteria: "Trade-offs are honestly assessed — (1) Object lookup: cleaner but loses TypeScript exhaustiveness checking (switch with default gives compile-time safety). (2) Async/await: more readable but error handling requires try/catch wrapping (or .catch on the promise). (3) Array methods: more declarative but creates intermediate arrays (for large datasets, a single loop is more memory-efficient). (4) Higher-order function: reduces duplication but adds indirection (debugging is harder when error handling is abstracted). (5) Options object: flexible but loses positional argument auto-complete in some IDEs, and you must destructure carefully. Each trade-off is honest — the reviewer acknowledges when the current approach has legitimate advantages. 'I prefer this approach, but I understand if you want to keep the current one because of [specific advantage]'"
+    weight: 0.35
+    description: "Honest trade-offs"
+  - type: llm_judge
+    criteria: "Comments frame suggestions as proposals, not mandates — language: 'Here's an alternative approach — let me know what you think' not 'Change this to...'. When the alternative is clearly better: 'I think this would improve readability significantly — the nested .then chain is hard to follow at 8 levels.' When it's a preference: 'This is a style preference — the object lookup is more concise but your switch statement works well too. Up to you.' When the suggestion is complex: 'This is a bigger refactor — could be a follow-up PR if you agree with the direction.' GitHub suggestion format used where applicable (```suggestion blocks). Each comment acknowledges the effort in the current code — 'The logic here is correct, I'm just suggesting a structural alternative'"
+    weight: 0.30
+    description: "Proposal framing"
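Two of the alternatives named in the criteria can be sketched runnably: the object-literal lookup replacing a role if/else chain, and the higher-order wrapper replacing five identical try/catch blocks. Handler names and route strings are illustrative assumptions.

```javascript
// Alternative to a role if/else chain: a lookup table with an explicit
// fallback for unknown roles. Handler names are hypothetical.
const handlers = {
  admin: () => 'admin-dashboard',
  user: () => 'user-home',
};
const routeForRole = (role) => (handlers[role] ?? (() => 'login'))();

// Alternative to five copies of the same try/catch: wrap once, reuse.
// The indirection trade-off from the criteria applies: stack traces now
// pass through this wrapper.
const withErrorHandling = (fn, onError) => async (...args) => {
  try {
    return await fn(...args);
  } catch (e) {
    return onError(e);
  }
};
```

Note the honest trade-off from the criteria: a `switch` with a `default` case keeps compile-time exhaustiveness checking in TypeScript, which the lookup table gives up.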
@@ -0,0 +1,48 @@
+meta:
+  id: advanced-review-shift
+  level: 3
+  course: code-review-feedback-writing
+  type: output
+  description: "Combined advanced shift — review a cross-team database migration PR while mentoring a junior developer, balancing speed with thoroughness during a sprint deadline"
+  tags: [code-review, combined, shift-simulation, cross-team, mentoring, advanced]
+
+state: {}
+
+trigger: |
+  It's Wednesday afternoon, sprint ends Friday. You have one complex
+  review situation:
+
+  A junior developer on another team wrote a PR that adds multi-tenant
+  support to the shared authentication service. Your team depends on
+  this service heavily. The PR:
+
+  - Database migration: adds tenant_id to 3 tables (users, sessions, api_keys)
+  - Auth middleware changes: adds tenant context to every request
+  - API changes: new endpoints for tenant management
+  - Breaking change: all existing API calls need an X-Tenant-ID header
+  - Tests: basic coverage, missing edge cases
+  - The junior dev is under pressure from their manager to ship by Friday
+
+  You need to:
+  1. Review from your team's perspective (impact on your services)
+  2. Mentor the junior developer (this is their biggest PR)
+  3. Be thorough on security (auth changes are critical)
+  4. Be fast enough to not block the sprint
+  5. Navigate the cross-team politics (their manager wants quick approval)
+
+  Task: Write the complete review that balances all five concerns.
+  Include your communication strategy for the sensitive dynamics.
+
+assertions:
+  - type: llm_judge
+    criteria: "Security concerns are non-negotiable regardless of deadline — auth middleware review: 'Tenant isolation is a security boundary — if tenant_id can be spoofed via header, any user can access any tenant's data. The X-Tenant-ID header must be validated against the authenticated user's allowed tenants, not just trusted.' Specific checks: (1) Can user A set X-Tenant-ID to tenant B's ID? (2) What happens when the header is missing? (3) What happens when an invalid tenant ID is provided? (4) Are all existing queries properly scoped by tenant_id? Migration: 'Adding tenant_id without a foreign key constraint means invalid tenant IDs can exist in the database.' Mark as blocking: 'These security items must be addressed before merge — no exceptions for sprint deadlines on authentication code.'"
+    weight: 0.35
+    description: "Security non-negotiable"
+  - type: llm_judge
+    criteria: "Impact on dependent teams is clearly communicated — breaking change assessment: 'The X-Tenant-ID header requirement is a breaking change for all consuming services. Our payments service makes 200 requests/second to auth — this will fail after deploy unless we're coordinated.' Migration plan needed: 'How is this being deployed? All consuming services need to be updated simultaneously, or: make the header optional first (default to primary tenant), deploy all consumers, then make it required.' Communication: raise in cross-team channel, not just PR comment. Specific impact: 'Our team needs 2-3 days to update our services to include X-Tenant-ID. This can't ship Friday without a backward-compatible migration path.' Propose solution: 'Make tenant_id optional with default for single-tenant mode. Phase 2 makes it required after all services migrate.'"
+    weight: 0.35
+    description: "Impact communication"
+  - type: llm_judge
+    criteria: "Mentoring and politics are handled with care — mentoring: 'This is an ambitious and important feature — the overall approach is solid. I have some security items that need to be addressed and some suggestions for safer deployment.' Don't overwhelm: separate blocking items (3-4) from suggestions (5+) from follow-up items. Offer help: 'Happy to pair on the tenant validation logic if that would help move faster.' Politics: direct message to the junior dev's manager: 'The PR is well-written but has security issues that need to be addressed — I want to help [name] fix them quickly rather than rush a merge. The breaking change also needs a migration plan that involves our team. Can we sync briefly?' Don't: approve under pressure ('it's fine, ship it'), block passive-aggressively ('I have 47 comments'), or escalate unnecessarily. Frame as: 'We're on the same team — let's ship this right.'"
+    weight: 0.30
+    description: "Mentoring and politics"
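The blocking tenant-validation check from the security criteria can be sketched as Express-style middleware. This is an assumed shape: the authentication service's real framework, the `req.user.allowedTenants` field populated by an upstream auth step, and the error codes are all illustrative.

```javascript
// Sketch: never trust X-Tenant-ID as-is. Validate it against the tenants the
// already-authenticated user is allowed to act in (assumed to be populated
// on req.user by earlier auth middleware).
function requireTenant(req, res, next) {
  const tenantId = req.get('X-Tenant-ID');
  if (!tenantId) {
    return res.status(400).json({ error: { code: 'TENANT_HEADER_MISSING' } });
  }
  if (!req.user.allowedTenants.includes(tenantId)) {
    // User A asking for tenant B's data is a forbidden request, not a 404.
    return res.status(403).json({ error: { code: 'TENANT_FORBIDDEN' } });
  }
  req.tenantId = tenantId; // downstream queries scope by this, never the raw header
  next();
}
```

This also answers two of the review's explicit questions in code: a missing header is a 400, and a tenant the user does not belong to is a 403.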
@@ -0,0 +1,47 @@
+meta:
+  id: api-design-review
+  level: 3
+  course: code-review-feedback-writing
+  type: output
+  description: "Review API design — evaluate endpoint design, request/response contracts, versioning strategy, and backward compatibility in code reviews"
+  tags: [code-review, API-design, endpoints, contracts, versioning, advanced]
+
+state: {}
+
+trigger: |
+  You're reviewing a PR that adds 3 new API endpoints to your
+  public-facing REST API. These endpoints will be used by external
+  developers and can't easily be changed after release.
+
+  Proposed endpoints:
+  1. POST /api/v1/getOrders — retrieves orders with filters in the
+     request body
+  2. PUT /api/v1/orders/{id} — updates an order, but some fields
+     (status, total) are calculated and shouldn't be settable
+  3. DELETE /api/v1/orders/{id}/items/{itemId}/remove — removes an
+     item from an order
+
+  Response formats:
+  - Success: { "success": true, "data": {...} }
+  - Error: { "success": false, "error": "Something went wrong" }
+
+  The endpoints work correctly with comprehensive tests.
+
+  Task: Review the API design for RESTful conventions, future
+  compatibility, and developer experience. Write review comments
+  that explain API design principles and suggest corrections.
+  Remember: these become your public contract once released.
+
+assertions:
+  - type: llm_judge
+    criteria: "REST convention violations are identified — (1) POST /getOrders: GET is for retrieval, POST is for creation. Filters should be query parameters: GET /api/v1/orders?status=pending&after=2024-01-01. If filter complexity requires a body, use POST /api/v1/orders/search. 'getOrders' as a verb violates REST (nouns for resources). (2) PUT allows setting calculated fields: read-only fields should be rejected or ignored. Document which fields are writable. Consider: does PUT replace the entire order or just modified fields? If partial, use PATCH. (3) DELETE .../remove: redundant — DELETE already means remove. Simplify to DELETE /api/v1/orders/{id}/items/{itemId}. Each violation explained with the REST principle and the practical impact on developers"
+    weight: 0.35
+    description: "REST conventions"
+  - type: llm_judge
+    criteria: "Response format and contract issues are addressed — success/error envelope: 'The { success: true/false } pattern is redundant with HTTP status codes — 200 already means success, 400 means client error. Industry standard (Stripe, GitHub): use status codes for success/failure, and reserve the response body for data. Error responses should include: code (machine-readable), message (human-readable), details (field-level errors).' Current error: 'Something went wrong' gives developers nothing to debug. Suggest: `{ error: { code: 'ORDER_NOT_FOUND', message: 'Order with ID xyz does not exist', param: 'id' } }`. Consistency: 'Match the error format used by your other 20 endpoints — developers shouldn't learn a new error format per endpoint.'"
+    weight: 0.35
+    description: "Response format"
+  - type: llm_judge
+    criteria: "Future compatibility concerns are raised — 'These endpoints become a permanent public contract. Changes after release are breaking changes that affect external developers.' Specific concerns: (1) pagination — GET /orders must return paginated results (what happens when there are 100K orders?), (2) filtering — be intentional about which filters are supported (adding is non-breaking, removing is breaking), (3) field expansion — consider using ?fields=id,status,total for sparse responses, (4) versioning — v1 in the URL means we can make v2 later, but plan for what triggers a v2. API review checklist: naming consistency, HTTP method correctness, pagination, filtering, sorting, error format, rate limiting headers, authentication, idempotency (PUT/DELETE should be idempotent). 'Suggest: run this through our API design review checklist before launch.'"
+    weight: 0.30
+    description: "Future compatibility"
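The structured error body recommended in the response-format criteria can be sketched as a small builder: the HTTP status carries success or failure, and the body carries a machine-readable code, a human-readable message, and the offending parameter. The helper name and wrapper shape are illustrative assumptions.

```javascript
// Sketch of the recommended error contract: status code for success/failure,
// structured body for debugging. No { success: false } flag needed.
function orderNotFound(id) {
  return {
    status: 404,
    body: {
      error: {
        code: 'ORDER_NOT_FOUND',                  // machine-readable, stable
        message: `Order with ID ${id} does not exist`, // human-readable
        param: 'id',                               // which input was wrong
      },
    },
  };
}
```

A client can branch on `error.code` programmatically while logging `error.message`, which is exactly what the opaque "Something went wrong" string prevents.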
@@ -0,0 +1,45 @@
|
|
|
1
|
+
meta:
|
|
2
|
+
id: cross-team-review
|
|
3
|
+
level: 3
|
|
4
|
+
course: code-review-feedback-writing
|
|
5
|
+
type: output
|
|
6
|
+
description: "Review cross-team code — navigate reviewing code from teams with different conventions, priorities, and domain knowledge while providing valuable feedback"
|
|
7
|
+
tags: [code-review, cross-team, collaboration, domain-knowledge, conventions, advanced]
state: {}

trigger: |
  You're on the payments team. The platform team asks you to review
  their PR that adds a new event bus system. Challenges:

  - You don't know their codebase well
  - They use a different testing framework (they use Playwright,
    you use Jest)
  - Their coding conventions differ (they use classes, your team
    uses functional)
  - The PR impacts your team (your services will need to emit events)
  - You're not sure which of your concerns are "that's how they do
    things" vs "that's actually wrong"

  The PR adds: EventBus class, EventEmitter, EventSubscriber pattern,
  retry mechanism for failed deliveries, dead letter queue.

  Task: Write a cross-team review that provides value without
  overstepping. Show how to: ask about unfamiliar conventions before
  criticizing, focus on the interface your team will use, flag
  concerns from your domain expertise (payments), and distinguish
  between "different" and "wrong."

assertions:
  - type: llm_judge
    criteria: "Review distinguishes different from wrong — explicitly separate: 'These are questions about conventions I'm not familiar with (may be intentional)' vs 'These are concerns from my payments domain expertise (likely valid regardless of team conventions).' Convention questions: 'I noticed you use class-based patterns throughout — is that your team's standard? Our team uses functions but I want to respect your conventions.' Domain concerns: 'The retry mechanism retries on all errors — for payment events, retrying a charge event could result in double-charging. Payment events should only be retried if they're idempotent.' The reviewer shows humility about what they don't know while being firm about what they do know"
    weight: 0.35
    description: "Different vs wrong"
  - type: llm_judge
    criteria: "Interface concerns are highlighted as the primary review focus — 'Since my team will be the primary consumer of this event bus, I focused my review on the producer API.' Specific interface feedback: event schema documentation (what fields are required?), error handling contract (what happens when the bus is down?), delivery guarantees (at-least-once vs exactly-once — critical for payments), event ordering guarantees (payment.created must arrive before payment.succeeded). API ergonomics: 'The current API requires 5 lines of boilerplate to emit one event — could we add a helper like eventBus.emit(type, payload)?' Backward compatibility: 'If the event schema changes, how will consumers know? Suggest versioning events from the start.'"
    weight: 0.35
    description: "Interface focus"
  - type: llm_judge
    criteria: "Cross-team communication is respectful and collaborative — opening: 'Thanks for the review request — excited to see the event bus taking shape. I focused primarily on the producer API since the payments team will be a heavy consumer.' Questions before assumptions: 'I have some questions about design decisions that may reflect conventions I'm not familiar with — happy to discuss async.' Expertise sharing: 'From our experience with payment events, here are patterns that have burned us...' Collaboration offer: 'Would it be helpful if I put together a list of payment events we'd want to emit? That could help validate the event schema design.' Close: 'Overall this looks solid for the use cases I can evaluate. I'd recommend getting a review from the data team too since they'll be heavy consumers of analytics events.'"
    weight: 0.30
    description: "Collaborative communication"
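The double-charging concern in the first criterion (retrying a non-idempotent charge event) can be made concrete with a small sketch. This is a hypothetical, minimal in-memory stand-in for the event bus under review, not the actual PR code; the `emit(type, payload)` helper shape is the one the criterion suggests, and the idempotency-key mechanism is one common way to make redelivery safe.

```python
import uuid


class EventBus:
    """Hypothetical minimal sketch of the event bus under review."""

    def __init__(self):
        self._delivered = set()  # idempotency keys already processed
        self.log = []            # events that were actually delivered

    def emit(self, event_type, payload, idempotency_key=None):
        # The one-call helper the review asks for: eventBus.emit(type, payload).
        # An idempotency key lets the retry mechanism redeliver safely --
        # a duplicate delivery of a charge event becomes a no-op instead
        # of a double charge.
        key = idempotency_key or str(uuid.uuid4())
        if key in self._delivered:
            return False  # duplicate delivery (e.g. a retry): skip
        self._delivered.add(key)
        self.log.append((event_type, payload))
        return True


bus = EventBus()
first = bus.emit("payment.charged", {"amount": 1000}, idempotency_key="chg_123")
retry = bus.emit("payment.charged", {"amount": 1000}, idempotency_key="chg_123")
print(first, retry, len(bus.log))  # True False 1
```

Without the key check, the retry path would append a second `payment.charged` event, which is exactly the failure mode the payments-domain feedback is meant to catch.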
@@ -0,0 +1,48 @@
meta:
  id: database-migration-review
  level: 3
  course: code-review-feedback-writing
  type: output
  description: "Review database migrations — evaluate schema changes for safety, reversibility, data integrity, and production deployment risk"
  tags: [code-review, database, migration, schema, deployment, safety, advanced]

state: {}

trigger: |
  You're reviewing a database migration PR. The migration:

  1. Adds a NOT NULL column "subscription_tier" to the users table
     (2 million rows) without a default value

  2. Renames column "userName" to "user_name" in the orders table

  3. Adds a unique constraint on (email, organization_id) to the
     users table

  4. Drops the "legacy_tokens" table (mentioned as "unused")

  5. Creates a new index on orders(customer_id, created_at) — the
     orders table has 50 million rows

  6. Changes the "amount" column type from INTEGER to DECIMAL(10,2)

  The migration has no down/rollback migration defined.

  Task: Review each migration step for production safety. For each,
  explain the risk, describe what could go wrong during deployment,
  and suggest the safe approach. These are the reviews that prevent
  2am deployment incidents.

assertions:
  - type: llm_judge
    criteria: "Production risks are identified for each step — (1) NOT NULL without default: migration fails on 2M existing rows (all have NULL for the new column). Even with a default, adding NOT NULL column locks the table during backfill on some databases. (2) Column rename: every query referencing 'userName' breaks instantly — application code, reports, ETL jobs, third-party integrations. (3) Unique constraint: fails if duplicate (email, org_id) pairs already exist — migration aborts mid-way. Need to check for duplicates BEFORE applying. (4) Drop table: 'unused' — how was this verified? Any scheduled jobs, background workers, or external systems accessing it? Data loss is permanent. (5) New index on 50M rows: index creation locks the table (in some DBs) — no reads or writes during build, could take 30+ minutes. (6) Type change: INTEGER to DECIMAL changes data semantics — existing value 1000 (cents) becomes 1000.00 (dollars?). All application code interpreting this value must be updated simultaneously"
    weight: 0.35
    description: "Production risks"
  - type: llm_judge
    criteria: "Safe approaches are suggested for each — (1) NOT NULL: three-step migration: add column as NULL with default → backfill existing rows → add NOT NULL constraint. (2) Rename: don't rename — add new column, dual-write, migrate reads, drop old column. Or: use database alias/view. (3) Unique constraint: first run SELECT email, organization_id, COUNT(*) ... HAVING COUNT(*) > 1 to find duplicates, fix them, THEN add constraint. (4) Drop table: add a deprecation period — rename to legacy_tokens_deprecated, wait 30 days, monitor for errors, THEN drop. (5) Index: use CREATE INDEX CONCURRENTLY (PostgreSQL) or equivalent to avoid table lock. (6) Type change: coordinate with application deploy — use a feature flag or blue/green deployment so code and schema change together. Each step: show the safe SQL or migration code"
    weight: 0.35
    description: "Safe approaches"
  - type: llm_judge
    criteria: "Missing rollback and deployment strategy are flagged — 'No down migration defined — if this migration fails halfway, how do we roll back? Every migration must have a corresponding rollback.' Rollback for each step: (1) DROP COLUMN, (2) rename back, (3) DROP CONSTRAINT, (4) can't rollback data loss (strongest argument against dropping tables), (5) DROP INDEX, (6) ALTER COLUMN back to INTEGER (but data precision is lost). Deployment strategy: 'This migration should be split into 6 separate migrations, deployed independently over multiple deploys. If step 3 fails, we don't want to have to roll back steps 1-2.' Testing: 'Run this migration against a copy of production data first — the unique constraint check (#3) and index creation time (#5) should be validated with real data volumes.' Monitor: 'Add migration timing to deployment metrics — alert if any step takes more than 5 minutes'"
    weight: 0.30
    description: "Rollback and deployment"
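The pre-flight duplicate check recommended for the unique constraint (step 3) can be demonstrated end to end. This is an illustrative sketch using an in-memory SQLite database; the table and column names mirror the scenario, and the rows are made up for the example.

```python
import sqlite3

# Set up a tiny stand-in for the users table, seeded with one
# duplicate (email, organization_id) pair that would make
# ADD CONSTRAINT UNIQUE abort mid-migration in production.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT, organization_id INTEGER)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("a@x.com", 1), ("a@x.com", 1), ("b@x.com", 1), ("a@x.com", 2)],
)

# The check the review calls for: find every pair that would violate
# the constraint BEFORE attempting to add it.
dupes = conn.execute(
    """SELECT email, organization_id, COUNT(*) AS n
       FROM users
       GROUP BY email, organization_id
       HAVING COUNT(*) > 1"""
).fetchall()
print(dupes)  # [('a@x.com', 1, 2)] -- fix these rows, THEN add the constraint
```

Running this against a copy of production data (as the third criterion suggests) turns a mid-migration abort into a known, fixable list of rows.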