npm - dojo.md - Versions diffs - 0.2.1 → 0.2.3 - Mend

dojo.md 0.2.1 → 0.2.3

Files changed (152)

package/courses/code-review-feedback-writing/scenarios/level-2/performance-feedback.yaml ADDED

@@ -0,0 +1,50 @@
+meta:
+  id: performance-feedback
+  level: 2
+  course: code-review-feedback-writing
+  type: output
+  description: "Give performance feedback — identify and communicate performance issues in code reviews with data-backed reasoning and optimization suggestions"
+  tags: [code-review, performance, optimization, complexity, N+1, intermediate]
+state: {}
+trigger: |
+  You're reviewing a PR for a dashboard that displays order analytics.
+  The page loads slowly in staging. You've identified performance
+  issues:
+  1. N+1 query: loops through 100 orders and makes a separate DB
+     query for each order's customer data
+  2. Loading all 50,000 orders into memory to calculate totals
+     instead of using SQL aggregation
+  3. A nested loop (O(n²)) matching orders to products when a
+     Map/dictionary lookup would be O(n)
+  4. Fetching the same user profile 3 times on the same page
+     (3 separate API calls in 3 components)
+  5. Rendering a 10,000-row table without virtualization
+  6. Computing expensive derived values on every render instead
+     of memoizing
+  Task: Write performance review comments for each issue. For each,
+  explain the performance impact with approximate numbers, explain
+  why the current approach is slow, and suggest the optimized approach
+  with enough detail to implement.
+assertions:
+  - type: llm_judge
+    criteria: "Performance impact is quantified — (1) N+1: 100 orders = 101 queries (1 list + 100 customer lookups). Each query ~5ms = 500ms total. Fix with JOIN or eager loading: 1 query ~10ms. 50x improvement. (2) 50K orders in memory: ~200MB memory, 2-3 seconds to load and iterate. SQL aggregate: `SELECT SUM(total), COUNT(*) FROM orders WHERE ...` returns one row in ~50ms. (3) O(n²): 1000 orders × 500 products = 500,000 comparisons. Map lookup: 1000 + 500 = 1,500 operations. 300x fewer operations. (4) 3 duplicate API calls: 3 × ~200ms = 600ms wasted. Fetch once and share. (5) 10K rows: DOM rendering 10K rows takes 2-3 seconds, causes jank. Virtualization renders ~50 visible rows. (6) Recomputing on every render: if computation takes 100ms and renders happen 10x/second, the page stutters. Numbers don't need to be exact — approximate is fine for review"
+    weight: 0.35
+    description: "Quantified impact"
+  - type: llm_judge
+    criteria: "Optimized approaches include implementation detail — (1) N+1 fix: show the JOIN query or ORM eager loading syntax (e.g., `Order.findAll({ include: Customer })`). (2) SQL aggregation: show the exact query `SELECT status, COUNT(*), SUM(total) FROM orders GROUP BY status`. (3) Map lookup: show building a Map first `const productMap = new Map(products.map(p => [p.id, p]))`, then `productMap.get(order.productId)`. (4) Shared fetch: use React context, SWR/React Query (deduplication built in), or lift the fetch to a parent component. (5) Virtualization: suggest react-window or react-virtualized, show the component change. (6) Memoization: show useMemo with dependency array. Each fix is a concrete code change, not just 'optimize this'"
+    weight: 0.35
+    description: "Implementation detail"
+  - type: llm_judge
+    criteria: "Comments explain WHY the current approach is slow — don't just say 'use a JOIN' — explain: 'Each database query has overhead: connection, parsing, execution, network round-trip. Doing this 100 times multiplies that overhead. A JOIN does one trip and the database optimizes the execution plan.' For O(n²): 'Nested loops compare every order with every product. When data grows from 1K to 10K items, this goes from 500K to 50M comparisons — it doesn't just get 10x slower, it gets 100x slower.' For memory: 'Loading 50K rows into Node.js means serializing from Postgres, deserializing in JS, iterating in a single-threaded event loop. The database can do this math on indexed data in milliseconds.' The reviewer teaches performance thinking, not just performance fixes"
+    weight: 0.30
+    description: "Explains why slow"
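The nested-loop versus Map-lookup comparison in the criteria above can be checked with a small sketch. The `Order` and `Product` shapes, the function names, and the synthetic data are assumptions for illustration and are not part of the package; the counters exist only to make the 500,000 versus 1,500 figures visible.

```typescript
interface Product { id: number; name: string }
interface Order { id: number; productId: number }
type Match = [Order, Product];

// Nested scan: every order is compared against every product, O(n * m).
function matchNested(orders: Order[], products: Product[]): { pairs: Match[]; comparisons: number } {
  const pairs: Match[] = [];
  let comparisons = 0;
  for (const order of orders) {
    for (const product of products) {
      comparisons++;
      if (product.id === order.productId) pairs.push([order, product]);
    }
  }
  return { pairs, comparisons };
}

// Map lookup: one pass to index products, then one lookup per order, O(n + m).
function matchWithMap(orders: Order[], products: Product[]): { pairs: Match[]; operations: number } {
  const productMap = new Map<number, Product>();
  let operations = 0;
  for (const product of products) {
    productMap.set(product.id, product);
    operations++;
  }
  const pairs: Match[] = [];
  for (const order of orders) {
    operations++;
    const product = productMap.get(order.productId);
    if (product !== undefined) pairs.push([order, product]);
  }
  return { pairs, operations };
}

// Synthetic data sized like the scenario: 1000 orders, 500 products (hypothetical).
const products: Product[] = Array.from({ length: 500 }, (_, i) => ({ id: i, name: `product-${i}` }));
const orders: Order[] = Array.from({ length: 1000 }, (_, i) => ({ id: i, productId: i % 500 }));

const nested = matchNested(orders, products);
const mapped = matchWithMap(orders, products);
console.log(nested.comparisons, mapped.operations); // 500000 1500
```

Both functions return the same pairs; only the amount of work differs, which is the point a review comment would make.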

package/courses/code-review-feedback-writing/scenarios/level-2/reviewing-breaking-changes.yaml ADDED

@@ -0,0 +1,44 @@
+meta:
+  id: reviewing-breaking-changes
+  level: 2
+  course: code-review-feedback-writing
+  type: output
+  description: "Review breaking changes — identify and communicate backward compatibility issues, migration requirements, and deployment risks in code reviews"
+  tags: [code-review, breaking-changes, backward-compatibility, migration, deployment, intermediate]
+state: {}
+trigger: |
+  You're reviewing a PR that modifies a shared library used by 6
+  microservices. The changes:
+  1. Renamed function: getUserById() → findUser() (affects all callers)
+  2. Changed return type: was { user: User } now returns User directly
+     (all callers destructure the old format)
+  3. Removed deprecated parameter: search(query, options) no longer
+     accepts options.fuzzy (3 services use this)
+  4. Changed default: connection timeout changed from 30s to 5s
+     (services with slow DB queries will start timing out)
+  5. New required parameter: authenticate(token) now requires
+     authenticate(token, scope) — all callers must be updated
+  The author says: "I cleaned up the API to be more consistent."
+  Task: Write review comments for each breaking change. For each,
+  explain: who is affected, what will break, whether it needs
+  migration, and how to make the change safely. Address the overall
+  approach of making 5 breaking changes in one PR.
+assertions:
+  - type: llm_judge
+    criteria: "Each breaking change impact is identified precisely: (1) Function rename: all 6 services that import getUserById will fail to compile/build. (2) Return type change: `const { user } = getUserById(id)` must become `const user = findUser(id)`; a caller that keeps destructuring gets `user` as undefined and fails at runtime on the first property access. (3) Removed parameter: 3 services using options.fuzzy silently lose fuzzy search (might not error, just behave differently, which is WORSE than an error). (4) Changed default: services with queries that take 6-29 seconds worked before and now time out. Performance regression masquerading as a bug. (5) New required parameter: TypeScript catches this at compile time, JavaScript fails at runtime. Each change: number of affected services, type of failure (compile error, runtime error, silent behavior change), how to detect the break"
+    weight: 0.35
+    description: "Impact identification"
+  - type: llm_judge
+    criteria: "Safe migration paths are proposed — (1) Rename: add findUser as alias first, deprecate getUserById with console.warn, remove after all callers migrate. (2) Return type: version the library, v2 returns new format, v1 maintained for 3 months. Or: support both formats with options parameter. (3) Remove parameter: warn when fuzzy is passed (don't silently ignore), give callers time to remove. (4) Default change: make it configurable, don't change the default — let callers opt-in to the shorter timeout. (5) New required parameter: make scope optional with a default value first, then make it required after all callers update. Overall: each change should be a separate PR that can be deployed independently. Use semantic versioning: these are MAJOR version changes"
+    weight: 0.35
+    description: "Migration paths"
+  - type: llm_judge
+    criteria: "Overall approach is challenged constructively — 'I understand the motivation to clean up the API — consistency is valuable. But 5 breaking changes in one PR affecting 6 services is risky. If something goes wrong, we can't identify which change caused the problem.' Suggest: 'Can we split this into 5 separate PRs, each with its own migration? This way: (1) each change can be reviewed independently, (2) each service team can update at their own pace, (3) if a change causes issues, we know exactly which one. Recommended order: add aliases/new APIs first (non-breaking), then deprecate old APIs, then remove old APIs after all consumers migrate.' Deployment strategy: 'This requires coordinated deployment of the library + all 6 services simultaneously. A versioned approach with backward compatibility lets services migrate independently.'"
+    weight: 0.30
+    description: "Approach challenge"
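The alias-then-deprecate path described in the migration criteria above might look like the following sketch. The in-memory `users` map, the `warnOnce` helper, and the `"default"` scope value are assumptions for illustration; only the function names `getUserById`, `findUser`, and `authenticate` come from the scenario.

```typescript
interface User { id: string; name: string }

// Hypothetical in-memory store standing in for the shared library's data source.
const users = new Map<string, User>([["u1", { id: "u1", name: "Ada" }]]);

// New API: returns the User directly.
function findUser(id: string): User | undefined {
  return users.get(id);
}

// Emit each deprecation message once so logs are not flooded.
const warned = new Set<string>();
function warnOnce(message: string): void {
  if (warned.has(message)) return;
  warned.add(message);
  console.warn(message);
}

/** @deprecated Use findUser(id). Kept so existing callers keep working. */
function getUserById(id: string): { user: User | undefined } {
  warnOnce("getUserById() is deprecated; use findUser()");
  return { user: findUser(id) }; // old return shape preserved
}

// scope starts out optional with a default; it can become required
// once every caller passes it explicitly.
function authenticate(token: string, scope: string = "default"): { token: string; scope: string } {
  return { token, scope };
}
```

With this shape, each of the 6 services can switch to `findUser` on its own schedule, and removing `getUserById` becomes a separate, later change.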

package/courses/code-review-feedback-writing/scenarios/level-2/reviewing-complex-prs.yaml ADDED

@@ -0,0 +1,43 @@
+meta:
+  id: reviewing-complex-prs
+  level: 2
+  course: code-review-feedback-writing
+  type: output
+  description: "Review complex PRs — manage large pull requests by batching feedback, identifying critical paths, and providing structured review across many files"
+  tags: [code-review, complex-pr, large-pr, batching, structure, intermediate]
+state: {}
+trigger: |
+  A teammate submits a 450-line PR touching 12 files. It implements
+  a new order processing pipeline with:
+  - New database migration (adds 3 tables)
+  - New service class with business logic (order validation, pricing,
+    inventory checks)
+  - API controller with 4 new endpoints
+  - 8 test files
+  - Updates to 2 existing services (inventory, notifications)
+  You've been reviewing for 30 minutes and have 20+ comments scattered
+  across files. The review is becoming unfocused and you're losing
+  track of what's critical vs minor.
+  Task: Organize your review of this complex PR. Write: (1) a
+  structured summary that groups feedback by theme rather than file,
+  (2) identify the critical path (which parts MUST be correct for the
+  PR to be safe), (3) acknowledge what you didn't review thoroughly
+  (and why), and (4) suggest whether this PR should have been split.
+assertions:
+  - type: llm_judge
+    criteria: "Feedback is grouped thematically, not file-by-file — categories: (1) Data model concerns (migration issues, schema design), (2) Business logic (validation rules, pricing calculations, inventory checks), (3) API design (endpoint paths, request/response formats, error handling), (4) Testing gaps (missing edge cases, untested paths), (5) Integration concerns (changes to existing services, backward compatibility). Each category: list of specific comments with severity. This structure helps the author: they can address all data model concerns at once rather than jumping between files. The author sees the reviewer understands the overall design, not just individual lines"
+    weight: 0.35
+    description: "Thematic grouping"
+  - type: llm_judge
+    criteria: "Critical path is identified explicitly — 'The most important things to get right: (1) the database migration must be reversible and handle existing data, (2) the pricing calculation must match the business rules exactly (errors here cost real money), (3) the inventory check must be atomic (race condition between check and decrement would oversell).' Distinguish: areas where bugs cause data loss/money loss (critical) vs areas where bugs cause bad UX but are recoverable (important) vs areas where issues are cosmetic (minor). State what you focused review time on: 'I spent most time on the service class and migration. I did a lighter review of the tests and controller.'"
+    weight: 0.35
+    description: "Critical path identified"
+  - type: llm_judge
+    criteria: "PR splitting recommendation is constructive — acknowledge: 'I understand this is one feature, but a 450-line PR across 12 files is hard to review thoroughly. Reviewer fatigue after 200 lines means later files get less attention.' Suggest splitting: 'For future features like this, consider: PR 1: migration + model (reviewable independently), PR 2: service layer + tests, PR 3: API controller + integration. Each PR is ~150 lines and can be reviewed in one focused session.' Don't demand retroactive splitting of this PR (too late). Be honest about review limitations: 'I did a thorough review of the service layer but a lighter review of the tests — you may want a second reviewer for the test coverage.'"
+    weight: 0.30
+    description: "PR splitting advice"

package/courses/code-review-feedback-writing/scenarios/level-2/reviewing-documentation.yaml ADDED

@@ -0,0 +1,47 @@
+meta:
+  id: reviewing-documentation
+  level: 2
+  course: code-review-feedback-writing
+  type: output
+  description: "Review documentation changes — evaluate documentation PRs for accuracy, completeness, clarity, and alignment with code changes"
+  tags: [code-review, documentation, accuracy, completeness, clarity, intermediate]
+state: {}
+trigger: |
+  You're reviewing PRs that include documentation changes alongside
+  code changes. Common problems you've seen:
+  PR #1: New feature added (webhook retry configuration) but the
+  README wasn't updated. The author says "I'll document it later."
+  PR #2: Documentation was updated to describe a new API parameter,
+  but the example code in the docs doesn't include the new parameter.
+  PR #3: A refactor changed the method signature from
+  `sendEmail(to, subject, body)` to `sendEmail(options)` but the
+  JSDoc comments still show the old signature.
+  PR #4: The CHANGELOG entry says "Fixed bug" with no description
+  of what bug was fixed or what changed.
+  PR #5: A new environment variable is required (REDIS_URL) but it's
+  not added to .env.example, the README, or the deployment docs.
+  Task: Write review comments for each situation. Explain why
+  documentation accuracy matters, what specifically needs to change,
+  and how to prevent these issues in the future.
+assertions:
+  - type: llm_judge
+    criteria: "Each documentation gap is identified with impact — PR #1: 'Documentation debt compounds — webhook retry config will be discoverable only by reading source code. Developers will file support tickets. Could we add a section to docs/webhooks.md before merging?' PR #2: 'The example on line 34 still shows the old API call without the new timeout parameter. A developer copy-pasting this example will get unexpected behavior. Please update the example to include timeout.' PR #3: 'The JSDoc at line 12 shows sendEmail(to, subject, body) but the function now takes an options object — this will confuse anyone relying on IDE tooltips.' PR #4: 'A changelog entry of just Fixed bug makes our changelog useless. What broke? What was the fix? Who does this affect? Suggest: Fixed: Order total calculation failed when discount code exceeded subtotal.' PR #5: 'Missing REDIS_URL from .env.example means every developer and every deployment environment will fail on first try.'"
+    weight: 0.35
+    description: "Gap identification"
+  - type: llm_judge
+    criteria: "Comments connect documentation to developer experience — documentation isn't bureaucracy, it's user experience: 'The .env.example file is the first thing a new developer touches when setting up the project. Missing REDIS_URL means their first experience is a cryptic connection error.' 'JSDoc is how developers discover your API — when it's wrong, they write code based on lies.' 'The changelog is how developers decide whether to upgrade — vague entries mean they can't assess risk.' 'Example code in docs is the most copied content in your entire system — if it's wrong, bugs are multiplied across every integration.' Each comment frames documentation as essential infrastructure, not optional polish"
+    weight: 0.35
+    description: "DX connection"
+  - type: llm_judge
+    criteria: "Prevention strategies are practical — (1) PR template: add 'Documentation updated?' checkbox, (2) CI check: detect API signature changes without corresponding JSDoc updates, (3) .env.example: add a CI check that all env vars in code exist in .env.example, (4) Changelog: require structured changelog format in PR template (What changed? Why? Who's affected?), (5) Examples: add documentation example testing to CI (run code examples to verify they work). Team agreement: 'Code without documentation is not done — add documentation to your team's Definition of Done.' Gradual adoption: 'We don't need to fix all existing gaps, but new code should ship with docs. Suggest adding a docs requirement to the PR template this sprint.'"
+    weight: 0.30
+    description: "Prevention strategies"
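The CI check suggested above, that every environment variable used in code also appears in .env.example, could be approximated by a sketch like this. It assumes variables are read as `process.env.NAME` and declared as `NAME=value`; the function names are hypothetical, and a real check would read files from disk rather than take strings.

```typescript
// Collect names referenced as process.env.NAME in source text.
function referencedEnvVars(source: string): Set<string> {
  const names = new Set<string>();
  const pattern = /process\.env\.([A-Z][A-Z0-9_]*)/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    names.add(match[1]);
  }
  return names;
}

// Collect names declared as NAME=value in .env.example text,
// skipping blank lines and # comments.
function declaredEnvVars(envExample: string): Set<string> {
  const names = new Set<string>();
  for (const line of envExample.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "" || trimmed.startsWith("#")) continue;
    const eq = trimmed.indexOf("=");
    if (eq > 0) names.add(trimmed.slice(0, eq).trim());
  }
  return names;
}

// Names used in code but absent from .env.example, sorted for stable output.
function missingFromExample(source: string, envExample: string): string[] {
  const declared = declaredEnvVars(envExample);
  return Array.from(referencedEnvVars(source)).filter((n) => !declared.has(n)).sort();
}

// Hypothetical inputs mirroring PR #5: REDIS_URL is used but never documented.
const source = "const redis = connect(process.env.REDIS_URL);\nconst port = process.env.PORT;";
const envExample = "# local defaults\nPORT=3000\n";
const missing = missingFromExample(source, envExample);
console.log(missing); // [ 'REDIS_URL' ]
```

A CI job could fail the build whenever the returned list is non-empty.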

package/courses/code-review-feedback-writing/scenarios/level-2/reviewing-error-handling.yaml ADDED

@@ -0,0 +1,50 @@
+meta:
+  id: reviewing-error-handling
+  level: 2
+  course: code-review-feedback-writing
+  type: output
+  description: "Review error handling — evaluate how code handles failures, edge cases, and unexpected states with feedback on robustness patterns"
+  tags: [code-review, error-handling, robustness, edge-cases, resilience, intermediate]
+state: {}
+trigger: |
+  You're reviewing a PR for an e-commerce checkout flow. The happy
+  path works great, but you're concerned about error handling:
+  1. `try { await chargeCard(amount) } catch(e) { console.log(e) }`
+     — errors are logged but not propagated or handled
+  2. No timeout on the external payment API call — if the payment
+     gateway hangs, the request hangs forever
+  3. If inventory check fails after payment succeeds, the payment
+     isn't reversed (inconsistent state)
+  4. The function returns null on error instead of throwing — callers
+     don't check for null
+  5. Generic catch block: catches everything including programming
+     errors (TypeError, ReferenceError) along with business errors
+  6. Error messages shown to users include internal details:
+     "Failed to insert into orders table: constraint violation"
+  Task: Write review comments for each error handling issue. For
+  each, explain the failure scenario, the user/system impact, and
+  the correct error handling pattern. These are the bugs that cause
+  3am pages.
+assertions:
+  - type: llm_judge
+    criteria: "Failure scenarios are vivid and specific — (1) Swallowed error: 'Card is charged but the order isn't created because the error is only logged, not handled. The customer pays but gets no order confirmation. Support ticket incoming.' (2) Missing timeout: 'If Stripe hangs for 5 minutes, this HTTP request hangs for 5 minutes. If 100 users hit checkout, you have 100 hanging connections and the server becomes unresponsive.' (3) Inconsistent state: 'Customer is charged $50 but has no inventory allocated. They see Charged on their credit card but Your cart is empty on your site. This is the worst kind of bug — money is involved.' (4) Null return: 'Callers assume this function returns an order object. When it returns null, the next line does order.id and crashes with Cannot read property id of null — a confusing error for a payment problem.' Each scenario tells a story a developer can visualize"
+    weight: 0.35
+    description: "Failure scenarios"
+  - type: llm_judge
+    criteria: "Correct patterns are demonstrated — (1) Don't swallow: either re-throw, return an error result, or handle meaningfully (show user an error page, queue a retry). (2) Timeout: `await Promise.race([chargeCard(amount), timeout(30000)])` or use AbortController. (3) Compensation: implement saga pattern or at minimum: try charge → try allocate → if allocate fails, refund charge. Show the code structure. (4) Error types: throw typed errors instead of returning null: `throw new PaymentError('Card declined', { code: 'CARD_DECLINED' })`. Callers use try/catch. (5) Specific catches: catch business errors separately from programming errors: `catch(e) { if (e instanceof PaymentError) { handlePaymentError(e) } else { throw e } }`. (6) User-safe errors: map internal errors to user-friendly messages, log details server-side"
+    weight: 0.35
+    description: "Correct patterns"
+  - type: llm_judge
+    criteria: "Comments connect to real production impact — 'These are the bugs that cause 3am pages' framing throughout. Reference real-world scenarios: 'We had an incident last month where a similar swallowed error caused 200 orders to be charged but not fulfilled.' Testing suggestion: 'Each of these error paths should have a test — simulate payment gateway timeout, simulate inventory service failure, simulate partial failure.' Monitoring: 'Even with proper error handling, add alerting for these failure paths — if chargeCard fails for 5% of requests, we need to know immediately.' Order of importance: 'The payment-without-order issue (#3) is the most critical — it involves customer money. I'd fix that first and create follow-up tickets for the rest.'"
+    weight: 0.30
+    description: "Production impact"
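Several of the patterns named above (typed errors, a timeout via `Promise.race`, compensation after a failed allocation, and rethrowing non-business errors) can be combined in one sketch. The `Gateway` and `Inventory` interfaces, `checkout`, `withTimeout`, and `userMessage` are hypothetical names, not the checkout code under review; `PaymentError` follows the name used in the criteria.

```typescript
class PaymentError extends Error {
  readonly code: string;
  constructor(message: string, code: string) {
    super(message);
    this.name = "PaymentError";
    this.code = code;
    // Keeps instanceof working when compiled to ES5 targets.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Reject if `work` has not settled within `ms` milliseconds.
// Note: this stops waiting; it does not cancel the underlying request.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new PaymentError("Payment gateway timed out", "GATEWAY_TIMEOUT")), ms);
  });
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}

interface Gateway {
  charge(amountCents: number): Promise<string>; // resolves to a charge id
  refund(chargeId: string): Promise<void>;
}
interface Inventory {
  allocate(orderId: string): Promise<void>;
}

// Charge, then allocate; if allocation fails, refund before propagating.
async function checkout(
  orderId: string,
  amountCents: number,
  gateway: Gateway,
  inventory: Inventory,
  timeoutMs: number = 30000,
): Promise<string> {
  const chargeId = await withTimeout(gateway.charge(amountCents), timeoutMs);
  try {
    await inventory.allocate(orderId);
  } catch (err) {
    await gateway.refund(chargeId); // compensate so the customer is not billed for nothing
    throw err; // propagate instead of swallowing
  }
  return chargeId;
}

// Business errors become a user-safe message; programming errors are rethrown.
function userMessage(err: unknown): string {
  if (err instanceof PaymentError) {
    return "We could not process your payment. Please try again.";
  }
  throw err;
}
```

Passing the gateway and inventory in as parameters also makes each failure path easy to simulate in a test, which is what the third assertion asks for.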

package/courses/code-review-feedback-writing/scenarios/level-2/reviewing-tests.yaml ADDED

@@ -0,0 +1,53 @@
+meta:
+  id: reviewing-tests
+  level: 2
+  course: code-review-feedback-writing
+  type: output
+  description: "Review test code — evaluate test quality including coverage, edge cases, test naming, and the relationship between tests and implementation"
+  tags: [code-review, testing, test-quality, coverage, edge-cases, intermediate]
+state: {}
+trigger: |
+  You're reviewing test code for a payment processing module. The
+  author says "I added tests" and the coverage report shows 85%.
+  But coverage doesn't mean quality. You find these issues:
+  1. All 8 tests only cover the happy path (valid inputs, successful
+     operations) — no error cases tested
+  2. Test names are "test1", "test2", "test3" — no description of
+     what's being tested
+  3. Tests share mutable state — test order matters (test3 fails
+     if run alone)
+  4. One test is 60 lines long with 12 assertions (testing too many
+     things at once)
+  5. Tests mock the database but the mocks always return success —
+     database errors are never simulated
+  6. No test for the boundary condition: payment amount of exactly
+     $0.50 (minimum) and $999,999.99 (maximum)
+  7. A test that just checks the function doesn't throw — doesn't
+     verify the actual output
+  Task: Write review comments for each testing issue. For each,
+  explain why it matters for test quality, and suggest the specific
+  test(s) that should be added or changed.
+assertions:
+  - type: llm_judge
+    criteria: "Each test quality issue is explained clearly — (1) Happy path only: '85% coverage looks good on paper, but these tests only verify that correct inputs produce correct outputs. What happens when the payment gateway returns a timeout? When the amount is negative? When the customer doesn't exist? These are the scenarios that cause production incidents.' (2) Test names: 'When a test fails in CI, the name is all you see. test3 failing tells me nothing — but should_reject_payment_when_amount_below_minimum tells me exactly what broke.' (3) Shared state: 'Tests that depend on execution order are fragile — they pass in your suite but fail when run individually or in parallel. Each test should set up its own state.' (4) Long test: '12 assertions means 12 different things can fail — but you only see the first failure. Split into focused tests that each verify one behavior.'"
+    weight: 0.35
+    description: "Test quality explanations"
+  - type: llm_judge
+    criteria: "Specific missing tests are suggested — error cases to add: payment gateway timeout, gateway returns decline, invalid card number, duplicate payment (idempotency), insufficient funds, network error, database write failure. Boundary tests: amount = 49 cents (below minimum, should reject), amount = 50 cents (exact minimum, should accept), amount = $999,999.99 (exact maximum, should accept), amount = $1,000,000.00 (above maximum, should reject), amount = 0 (should reject), amount = -1 (should reject). State tests: payment in each status (pending, processing, succeeded, failed) — can each be refunded? cancelled? retried? Each suggestion is a specific test case with expected behavior, not just 'add more tests'"
+    weight: 0.35
+    description: "Specific missing tests"
+  - type: llm_judge
+    criteria: "Test naming and structure improvements are demonstrated — naming pattern: 'should_[expected behavior]_when_[condition]' or 'it [behavior] when [condition]'. Examples: 'should_create_payment_when_valid_amount_and_card', 'should_return_error_when_amount_below_minimum', 'should_reject_refund_when_payment_already_refunded'. Test structure: Arrange-Act-Assert clearly separated. Each test: set up specific state (arrange), call the function (act), verify one specific outcome (assert). Fixture usage: suggest beforeEach for common setup but ensure each test has its own state. Mock improvement: show how to make the database mock return an error: `mockDb.query.mockRejectedValueOnce(new Error('connection timeout'))`. Independence: show how to reset state between tests"
+    weight: 0.30
+    description: "Structure improvements"
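The boundary cases listed above translate naturally into a table of named cases. This sketch assumes amounts are stored as integer cents and that a hypothetical `validateAmountCents` enforces the $0.50 minimum and $999,999.99 maximum from the scenario; the test names follow the naming pattern the criteria suggest.

```typescript
const MIN_CENTS = 50;        // $0.50
const MAX_CENTS = 99999999;  // $999,999.99

// Hypothetical validator: whole cents only, within the inclusive range.
function validateAmountCents(amountCents: number): boolean {
  return Number.isInteger(amountCents) && amountCents >= MIN_CENTS && amountCents <= MAX_CENTS;
}

interface BoundaryCase { name: string; amountCents: number; accepted: boolean }

// One named case per boundary, so a failure names the behavior that broke.
const boundaryCases: BoundaryCase[] = [
  { name: "should_reject_payment_when_amount_below_minimum", amountCents: 49, accepted: false },
  { name: "should_accept_payment_when_amount_is_exact_minimum", amountCents: 50, accepted: true },
  { name: "should_accept_payment_when_amount_is_exact_maximum", amountCents: 99999999, accepted: true },
  { name: "should_reject_payment_when_amount_above_maximum", amountCents: 100000000, accepted: false },
  { name: "should_reject_payment_when_amount_is_zero", amountCents: 0, accepted: false },
  { name: "should_reject_payment_when_amount_is_negative", amountCents: -1, accepted: false },
];

// Collect the names of any cases whose outcome differs from the expectation.
const failures = boundaryCases
  .filter((c) => validateAmountCents(c.amountCents) !== c.accepted)
  .map((c) => c.name);

console.log(failures.length === 0 ? "all boundary cases pass" : failures);
```

In a Jest suite the same table could feed a parameterized test, with each case still arranging its own input and asserting one outcome.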

package/courses/code-review-feedback-writing/scenarios/level-2/security-review-comments.yaml ADDED

@@ -0,0 +1,50 @@
+meta:
+  id: security-review-comments
+  level: 2
+  course: code-review-feedback-writing
+  type: output
+  description: "Write security-focused review comments — identify and communicate security vulnerabilities clearly with risk assessment and remediation guidance"
+  tags: [code-review, security, vulnerabilities, OWASP, remediation, intermediate]
+state: {}
+trigger: |
+  You're reviewing a PR for a user management feature. You've spotted
+  several security issues at different severity levels:
+  1. SQL query built with string concatenation:
+     `SELECT * FROM users WHERE name = '${userName}'`
+  2. Password stored as MD5 hash (no salt)
+  3. API endpoint returns full user object including hashed password,
+     SSN, and internal notes in the response
+  4. File upload accepts any file type with no size limit
+  5. Admin endpoint has no authorization check — any authenticated
+     user can access it
+  6. CORS is configured with Access-Control-Allow-Origin: *
+  7. Error messages include stack traces and database column names
+  Task: Write security review comments for each vulnerability. For
+  each: explain the vulnerability, describe the attack scenario,
+  rate the severity, and provide the specific fix. These comments
+  should educate the author about WHY the fix matters, not just
+  WHAT to fix.
+assertions:
+  - type: llm_judge
+    criteria: "Each vulnerability is explained with attack scenarios: (1) SQL injection: attacker inputs `'; DROP TABLE users; --` as userName, executing arbitrary SQL. (2) MD5: rainbow table attack cracks passwords in seconds (no salt means identical passwords have identical hashes). (3) Data exposure: API consumers (or browser dev tools users) can see passwords, SSNs, which violates data minimization. (4) File upload: attacker uploads 10GB file (DoS) or .exe/.php file (remote code execution if served). (5) Broken authorization: any logged-in user can delete other users or modify admin settings. (6) Wildcard CORS: any website can call the API from a visitor's browser and read the responses. Browsers refuse credentialed requests when the allowed origin is `*`, so the exposure is to data reachable without cookies (for example, APIs trusted by network location), and it becomes a CSRF-like risk if the server is later changed to reflect the Origin header with credentials enabled. (7) Information disclosure: stack traces reveal framework version, file paths, database schema, which helps attackers map your system. Each explanation is specific and scary enough to motivate the fix without being fear-mongering"
+    weight: 0.35
+    description: "Vulnerability explanations"
+  - type: llm_judge
+    criteria: "Severity ratings and fixes are specific — critical (must fix before merge): SQL injection (#1), broken auth (#5), password storage (#2). High (must fix before production): data exposure (#3), file upload (#4). Medium (fix before public launch): CORS (#6), information disclosure (#7). Fixes: (1) use parameterized queries: `db.query('SELECT * FROM users WHERE name = $1', [userName])`. (2) Use bcrypt with cost factor 12+. (3) Create a UserResponse DTO that excludes sensitive fields. (4) Allowlist file types (jpg, png, pdf), max 10MB, store outside webroot. (5) Add @RequireRole('admin') middleware/decorator. (6) Set specific allowed origins. (7) Use generic error messages in production, log details server-side. Each fix is implementable from the comment alone"
+    weight: 0.35
+    description: "Severity and fixes"
+  - type: llm_judge
+    criteria: "Comments educate about security principles — don't just fix the symptom, explain the principle: SQL injection → 'Never trust user input in queries — parameterized queries separate data from commands.' Password storage → 'MD5 was designed for speed, not security — bcrypt is intentionally slow, making brute force impractical.' Data exposure → 'API responses should follow the principle of least privilege — only return what the consumer needs.' Authorization → 'Authentication (who are you?) is different from authorization (are you allowed?) — this endpoint checks the first but not the second.' Each comment links to relevant OWASP reference or security guide. Overall comment: 'I'd recommend running the OWASP top 10 checklist against any endpoint handling user data — happy to pair on a security review if helpful.'"
+    weight: 0.30
+    description: "Security education"
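Two of the fixes above, the parameterized query and the response DTO, can be sketched without a database. The record fields, `toUserResponse`, and `findUsersByName` are hypothetical; the `$1` placeholder and the separate values array mirror the `db.query('... $1', [userName])` call quoted in the criteria, and the sketch only builds the query rather than executing it.

```typescript
// Full database row, including fields that must never leave the server.
interface UserRecord {
  id: number;
  name: string;
  email: string;
  passwordHash: string;
  ssn: string;
  internalNotes: string;
}

// Allowlisted response shape: only what the API consumer needs.
interface UserResponse {
  id: number;
  name: string;
  email: string;
}

// Copy fields explicitly; a newly added sensitive column stays private by default.
function toUserResponse(user: UserRecord): UserResponse {
  return { id: user.id, name: user.name, email: user.email };
}

interface ParameterizedQuery {
  text: string;      // SQL with placeholders only
  values: unknown[]; // user input travels separately from the SQL text
}

// The user-supplied name never becomes part of the SQL string.
function findUsersByName(userName: string): ParameterizedQuery {
  return {
    text: "SELECT id, name, email FROM users WHERE name = $1",
    values: [userName],
  };
}
```

Selecting only the needed columns in the query and mapping through the DTO gives two independent layers, so a mistake in one does not expose the password hash or SSN.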

package/courses/code-review-feedback-writing/scenarios/level-2/suggesting-alternatives.yaml ADDED

@@ -0,0 +1,42 @@
+meta:
+  id: suggesting-alternatives
+  level: 2
+  course: code-review-feedback-writing
+  type: output
+  description: "Suggest alternatives with code — provide concrete code suggestions in reviews showing before/after transformations with trade-off analysis"
+  tags: [code-review, alternatives, code-suggestions, before-after, trade-offs, intermediate]
+state: {}
+trigger: |
+  You're reviewing code and find several places where a different
+  approach would be better. Rather than vaguely saying "consider
+  another approach," you want to show the alternative with code.
+  Current code patterns you've found:
+  1. A 15-line if/else chain checking user.role for 6 different values
+  2. A Promise chain 8 levels deep (.then().then().then()...)
+  3. Manual array filtering with a for loop and push
+  4. Try/catch with identical error handling in 5 different functions
+  5. A function that takes 7 positional parameters
+  Task: For each pattern, write a review comment that shows the
+  current approach and the suggested alternative as code snippets.
+  Include: what's wrong with the current approach, the alternative
+  code, and an honest assessment of trade-offs (the alternative
+  isn't always strictly better).
+assertions:
+  - type: llm_judge
+    criteria: "Code suggestions are concrete and complete — (1) if/else chain → object literal lookup or switch statement or Map: show `const handlers = { admin: handleAdmin, user: handleUser, ... }; handlers[user.role]?.()`. (2) Promise chain → async/await: show the before (nested .then) and after (linear await statements). (3) Manual loop → array methods: show `const active = users.filter(u => u.isActive).map(u => u.name)`. (4) Duplicate error handling → higher-order function or error middleware: show `const withErrorHandling = (fn) => async (...args) => { try { return await fn(...args); } catch(e) { handleError(e); } }`. (5) 7 parameters → options object: show `function createUser({ name, email, role, ...opts })` with destructuring. Each suggestion is copy-pasteable with minimal modification"
+    weight: 0.35
+    description: "Concrete code"
+  - type: llm_judge
+    criteria: "Trade-offs are honestly assessed — (1) Object lookup: cleaner but loses TypeScript exhaustiveness checking (switch with default gives compile-time safety). (2) Async/await: more readable but error handling requires try/catch wrapping (or .catch on the promise). (3) Array methods: more declarative but creates intermediate arrays (for large datasets, a single loop is more memory-efficient). (4) Higher-order function: reduces duplication but adds indirection (debugging is harder when error handling is abstracted). (5) Options object: flexible but loses positional argument auto-complete in some IDEs, and you must destructure carefully. Each trade-off is honest — the reviewer acknowledges when the current approach has legitimate advantages. 'I prefer this approach, but I understand if you want to keep the current one because of [specific advantage]'"
+    weight: 0.35
+    description: "Honest trade-offs"
+  - type: llm_judge
+    criteria: "Comments frame suggestions as proposals, not mandates — language: 'Here's an alternative approach — let me know what you think' not 'Change this to...'. When the alternative is clearly better: 'I think this would improve readability significantly — the nested .then chain is hard to follow at 8 levels.' When it's a preference: 'This is a style preference — the object lookup is more concise but your switch statement works well too. Up to you.' When the suggestion is complex: 'This is a bigger refactor — could be a follow-up PR if you agree with the direction.' GitHub suggestion format used where applicable (```suggestion blocks). Each comment acknowledges the effort in the current code — 'The logic here is correct, I'm just suggesting a structural alternative'"
+    weight: 0.30
+    description: "Proposal framing"
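
A minimal TypeScript sketch of two refactors named in the criteria above (the handler lookup and the error-handling wrapper). The `Role` union, the handler bodies, `handleError`, and `safeDivide` are hypothetical stand-ins, not code from the reviewed PR. Typing the lookup as `Record<Role, ...>` is one way to keep a compile-time check for missing keys, which bears on the exhaustiveness trade-off the second assertion asks reviewers to weigh.

```typescript
// Hypothetical role set; the scenario does not list the actual roles.
type Role = "admin" | "user" | "guest";

// Record<Role, ...> makes every key required: adding a Role without a
// handler is a compile-time error, similar to an exhaustive switch.
const handlers: Record<Role, () => string> = {
  admin: () => "admin dashboard",
  user: () => "user home",
  guest: () => "login page",
};

function route(role: Role): string {
  return handlers[role]();
}

// Errors collected by the (hypothetical) shared handler, for illustration.
const reportedErrors: string[] = [];

function handleError(e: unknown): void {
  reportedErrors.push(e instanceof Error ? e.message : String(e));
}

// Higher-order wrapper: one try/catch instead of a copy in every function.
// Trade-off: on failure the wrapped function resolves to undefined, so
// callers must handle that case.
function withErrorHandling<A extends unknown[], R>(
  fn: (...args: A) => Promise<R>,
): (...args: A) => Promise<R | undefined> {
  return async (...args: A) => {
    try {
      return await fn(...args);
    } catch (e) {
      handleError(e);
      return undefined;
    }
  };
}

// Example use: the wrapped function never rejects.
const safeDivide = withErrorHandling(async (a: number, b: number) => {
  if (b === 0) throw new Error("division by zero");
  return a / b;
});
```

The wrapper changes the return type to `R | undefined`, which is the kind of indirection cost a review comment could name explicitly when proposing it.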

package/courses/code-review-feedback-writing/scenarios/level-3/cross-team-review.yaml ADDED

@@ -0,0 +1,45 @@
+meta:
+  id: cross-team-review
+  level: 3
+  course: code-review-feedback-writing
+  type: output
+  description: "Review cross-team code — navigate reviewing code from teams with different conventions, priorities, and domain knowledge while providing valuable feedback"
+  tags: [code-review, cross-team, collaboration, domain-knowledge, conventions, advanced]
+state: {}
+trigger: |
+  You're on the payments team. The platform team asks you to review
+  their PR that adds a new event bus system. Challenges:
+  - You don't know their codebase well
+  - They use a different testing framework (Playwright; your team
+    uses Jest)
+  - Their coding conventions differ (they use classes; your team
+    writes in a functional style)
+  - The PR impacts your team (your services will need to emit events)
+  - You're not sure which of your concerns are "that's how they do
+    things" vs "that's actually wrong"
+  The PR adds: EventBus class, EventEmitter, EventSubscriber pattern,
+  retry mechanism for failed deliveries, dead letter queue.
+  Task: Write a cross-team review that provides value without
+  overstepping. Show how to: ask about unfamiliar conventions before
+  criticizing, focus on the interface your team will use, flag
+  concerns from your domain expertise (payments), and distinguish
+  between "different" and "wrong."
+assertions:
+  - type: llm_judge
+    criteria: "Review distinguishes different from wrong — explicitly separate: 'These are questions about conventions I'm not familiar with (may be intentional)' vs 'These are concerns from my payments domain expertise (likely valid regardless of team conventions).' Convention questions: 'I noticed you use class-based patterns throughout — is that your team's standard? Our team uses functions but I want to respect your conventions.' Domain concerns: 'The retry mechanism retries on all errors — for payment events, retrying a charge event could result in double-charging. Payment events should only be retried if they're idempotent.' The reviewer shows humility about what they don't know while being firm about what they do know"
+    weight: 0.35
+    description: "Different vs wrong"
+  - type: llm_judge
+    criteria: "Interface concerns are highlighted as the primary review focus — 'Since my team will be one of the main producers on this event bus, I focused my review on the producer API.' Specific interface feedback: event schema documentation (what fields are required?), error handling contract (what happens when the bus is down?), delivery guarantees (at-least-once vs exactly-once — critical for payments), event ordering guarantees (payment.created must arrive before payment.succeeded). API ergonomics: 'The current API requires 5 lines of boilerplate to emit one event — could we add a helper like eventBus.emit(type, payload)?' Backward compatibility: 'If the event schema changes, how will consumers know? Suggest versioning events from the start.'"
+    weight: 0.35
+    description: "Interface focus"
+  - type: llm_judge
+    criteria: "Cross-team communication is respectful and collaborative — opening: 'Thanks for the review request — excited to see the event bus taking shape. I focused primarily on the producer API since the payments team will be a heavy producer of events.' Questions before assumptions: 'I have some questions about design decisions that may reflect conventions I'm not familiar with — happy to discuss async.' Expertise sharing: 'From our experience with payment events, here are patterns that have burned us...' Collaboration offer: 'Would it be helpful if I put together a list of payment events we'd want to emit? That could help validate the event schema design.' Close: 'Overall this looks solid for the use cases I can evaluate. I'd recommend getting a review from the data team too since they'll be heavy consumers of analytics events.'"
+    weight: 0.30
+    description: "Collaborative communication"
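
The double-charging concern in the first assertion can be illustrated with a small TypeScript sketch. It simulates at-least-once delivery, where an event is redelivered because an acknowledgement was lost, and shows a consumer-side deduplication wrapper. `PaymentEvent`, `deliverWithRetry`, and `makeIdempotent` are hypothetical names for illustration, not the platform team's API, and the in-memory `Set` stands in for whatever durable store a real consumer would need.

```typescript
// Hypothetical event shape; the PR's real schema is not shown in the scenario.
interface PaymentEvent {
  id: string; // unique per logical event, reused on every redelivery
  type: "payment.created" | "payment.succeeded";
  amountCents: number;
}

type Handler = (event: PaymentEvent) => void;

// Simulates at-least-once delivery: the handler runs, but the ack is lost on
// the first `lostAcks` attempts, so the same event is delivered again.
function deliverWithRetry(event: PaymentEvent, handler: Handler, lostAcks: number): number {
  let attempts = 0;
  do {
    handler(event);
    attempts += 1;
  } while (attempts <= lostAcks);
  return attempts;
}

// Wraps a handler so each event id is processed at most once.
// Assumption: an in-memory Set is enough for this illustration; a real
// consumer would need a durable store shared across processes.
function makeIdempotent(handler: Handler): Handler {
  const seen = new Set<string>();
  return (event) => {
    if (seen.has(event.id)) return; // duplicate delivery: skip side effects
    seen.add(event.id);
    handler(event);
  };
}

let naiveChargedCents = 0;
let safeChargedCents = 0;

const charge: PaymentEvent = { id: "evt-1", type: "payment.created", amountCents: 500 };

// Two lost acks means three deliveries of the same event.
deliverWithRetry(charge, (e) => { naiveChargedCents += e.amountCents; }, 2);
deliverWithRetry(charge, makeIdempotent((e) => { safeChargedCents += e.amountCents; }), 2);

// naiveChargedCents is 1500 (charged three times); safeChargedCents is 500.
```

A review comment could use numbers like these to explain why the retry policy and the event id contract belong in the producer-facing documentation.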

package/courses/code-review-feedback-writing/scenarios/level-3/mentoring-through-review.yaml ADDED

@@ -0,0 +1,46 @@
+meta:
+  id: mentoring-through-review
+  level: 3
+  course: code-review-feedback-writing
+  type: output
+  description: "Mentor through code review — use reviews as teaching opportunities that develop junior developers' skills while maintaining code quality"
+  tags: [code-review, mentoring, teaching, junior-developers, growth, advanced]
+state: {}
+trigger: |
+  You're reviewing code from a junior developer (3 months on the team).
+  Their PR implements a user notification preferences feature. The code
+  works but has several learning opportunities:
+  1. They wrote 200 lines of procedural code in one file instead of
+     using the service pattern the team follows
+  2. They used string comparison for notification types instead of
+     the existing NotificationType enum
+  3. They implemented their own email queue instead of using the
+     existing NotificationQueue service
+  4. Error handling catches everything and returns a generic 500
+  5. Tests are tightly coupled to implementation (mock everything)
+  6. Good: the database migration is clean and well-structured
+  7. Good: they wrote thorough documentation for the new endpoints
+  They're eager to learn but past harsh reviews have made them
+  defensive. How do you turn this review into a growth opportunity?
+  Task: Write the complete mentoring review. For each issue, teach
+  the principle (not just the fix), connect to team patterns, and
+  offer to pair. Make the developer feel supported, not criticized.
+assertions:
+  - type: llm_judge
+    criteria: "Each comment teaches a principle, not just a fix — (1) Service pattern: explain WHY the team uses services (testability, reuse, separation of concerns) — not just 'move this to a service.' Show how the existing OrderService follows this pattern as a reference. (2) Enum: explain WHY enums exist (typo prevention, IDE autocomplete, refactoring safety) — show what happens when someone types 'emal' instead of 'email'. (3) Existing queue: explain HOW to discover existing utilities (search patterns, team wiki, ask in Slack) — the skill is knowing to look, not just what to use. (4) Error handling: explain the difference between expected errors (user not found) and unexpected errors (database crash) — each needs different handling. (5) Test coupling: explain what happens when implementation changes (all tests break even though behavior is correct) — suggest testing behavior: 'when user disables email notifications, emails stop being sent'"
+    weight: 0.35
+    description: "Principle teaching"
+  - type: llm_judge
+    criteria: "Tone supports growth without lowering the bar — start with genuine praise: 'The migration is really clean — especially the default values and constraints. And the endpoint documentation is better than what most senior developers write. These are the things that matter for long-term code quality.' For issues: frame as learning, not mistakes: 'This is a great opportunity to learn about our service pattern — it's not obvious when you're new and it took me a while to internalize it too.' Normalize the learning process: 'When I joined, I wrote a custom email sender too before discovering we had NotificationQueue — it's hard to know what exists in a large codebase.' Offer help: 'Want to pair on extracting this into a service? I can show you the pattern and you can drive.' Encourage: 'Your code logic is solid — the refactoring is mostly about fitting into team patterns, which takes time to learn.'"
+    weight: 0.35
+    description: "Supportive tone"
+  - type: llm_judge
+    criteria: "Review develops the developer's independence — don't just give answers, teach how to find them: 'Next time you need to handle notifications, try searching for NotificationType in the codebase — you'll find the enum and the existing service. Pro tip: `grep -r NotificationType src/` is my go-to for discovering existing patterns.' Suggest a discovery process: 'Before building a new utility, check: (1) search the codebase for similar patterns, (2) check the team wiki under Shared Services, (3) ask in #eng-questions — someone might have built it.' Set expectations for growth: 'For your next PR, try to identify one existing utility you can reuse instead of building from scratch.' Don't over-correct: 'I'm suggesting improvements for several areas, but you don't need to address everything in this PR. Focus on the service extraction (#1) and enum usage (#2) — those are the highest-impact changes. The rest can be follow-up work.'"
+    weight: 0.30
+    description: "Building independence"
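
Two of the principles in the first assertion (enum over string comparison, and expected versus unexpected errors) are easier to teach with a short example. The TypeScript sketch below is an assumption-laden illustration: this `NotificationType` enum, `UserNotFoundError`, and `toHttpResult` are hypothetical stand-ins, since the scenario does not show the team's actual definitions.

```typescript
// Hypothetical stand-in for the team's existing NotificationType enum.
enum NotificationType {
  Email = "email",
  Sms = "sms",
  Push = "push",
}

// With raw strings, a typo such as "emal" compiles and silently never matches.
function isEmailByString(type: string): boolean {
  return type === "emal"; // deliberate bug: typo, no compiler error
}

// With the enum, writing NotificationType.Emal would fail to compile.
function isEmailByEnum(type: NotificationType): boolean {
  return type === NotificationType.Email;
}

// An expected error carries its own HTTP status; anything else is unexpected.
class UserNotFoundError extends Error {
  readonly status = 404;
}

interface HttpResult {
  status: number;
  body: string;
}

// Maps expected errors to specific responses and keeps the generic 500
// for genuinely unexpected failures.
function toHttpResult(action: () => string): HttpResult {
  try {
    return { status: 200, body: action() };
  } catch (e) {
    if (e instanceof UserNotFoundError) {
      return { status: e.status, body: e.message };
    }
    return { status: 500, body: "Internal server error" };
  }
}
```

Showing the typo case next to the enum case lets the comment explain why the enum exists rather than only asking for the change.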

package/courses/code-review-feedback-writing/scenarios/level-3/reviewing-unfamiliar-code.yaml ADDED

@@ -0,0 +1,43 @@
+meta:
+  id: reviewing-unfamiliar-code
+  level: 3
+  course: code-review-feedback-writing
+  type: output
+  description: "Review unfamiliar code — develop strategies for reviewing code in codebases you don't know well, asking the right questions, and still providing value"
+  tags: [code-review, unfamiliar-codebase, learning, questions, onboarding, advanced]
+state: {}
+trigger: |
+  You just joined a new team and are asked to review PRs on day one.
+  You don't know:
+  - The architecture or how services communicate
+  - Team conventions (naming, patterns, error handling approach)
+  - Business domain (insurance claims processing)
+  - Why certain design decisions were made
+  - What the testing expectations are
+  The PR adds a new "claims escalation" feature — 200 lines across
+  5 files. You can read the code but you're missing context for many
+  decisions.
+  Task: Write a review that provides value despite limited context.
+  Show how to: identify universal quality issues (any codebase),
+  ask context-gathering questions that also serve as review, learn
+  the codebase through reviewing, and be honest about what you can
+  and can't evaluate. Write the review comments and a reflection
+  on your review strategy.
+assertions:
+  - type: llm_judge
+    criteria: "Universal quality issues are identified — regardless of domain knowledge, catch: (1) error handling gaps (catch block that swallows errors), (2) missing input validation, (3) potential null/undefined references, (4) SQL injection or security concerns, (5) dead code or unused variables, (6) inconsistent patterns within the PR itself (does the PR's own code follow its own conventions?). (7) Test quality: are tests testing behavior or implementation? Do tests have meaningful names? (8) Code clarity: are variable/function names descriptive? Is the control flow easy to follow? These issues are valid in any codebase and show the reviewer is thorough despite being new"
+    weight: 0.35
+    description: "Universal issues"
+  - type: llm_judge
+    criteria: "Questions serve double duty as review and learning — questions that are both genuine learning AND review prompts: 'I notice ClaimEscalation inherits from BaseProcessor — could you point me to where BaseProcessor is defined? I want to understand the lifecycle hooks.' This helps the reviewer learn AND surfaces whether the inheritance is appropriate. 'What triggers a claim to enter the escalation queue? I want to make sure the conditions in isEligibleForEscalation() cover all cases.' Asks for business context AND validates the logic. 'I see we're using Redis for the escalation queue — is there a reason we're not using the existing RabbitMQ setup I saw in other services?' Learns architecture AND questions the design choice. Each question shows the reviewer is engaged and thinking, not just rubber-stamping"
+    weight: 0.35
+    description: "Dual-purpose questions"
+  - type: llm_judge
+    criteria: "Honest scope disclosure builds trust — opening comment: 'Disclosure: I'm new to this codebase and the insurance domain. I focused my review on code quality, error handling, and testing patterns. I can't yet validate the business logic or architectural fit — I'd recommend a domain expert also reviews the escalation rules.' Specific limitations: 'I'm not sure if the 72-hour escalation window is a business requirement or an arbitrary choice — if it's business-critical, it should be a named constant with a comment explaining the requirement.' Learning-through-review: 'This review helped me understand how claims flow through the system. I documented what I learned in case it's useful: [brief architecture note].' Value: even as a newcomer, catching a null reference exception or a missing error handler is valuable — don't apologize for what you can contribute"
+    weight: 0.30
+    description: "Honest scope"

package/courses/terraform-infrastructure-setup/scenarios/level-1/first-debugging-shift.yaml ADDED

@@ -0,0 +1,66 @@
+meta:
+  id: first-debugging-shift
+  level: 1
+  course: terraform-infrastructure-setup
+  type: output
+  description: "Combined beginner shift — handle multiple Terraform issues during your first on-call including init failures, plan errors, and state problems"
+  tags: [Terraform, troubleshooting, combined, shift-simulation, beginner]
+state: {}
+trigger: |
+  It's your first week managing Terraform infrastructure. You face
+  three issues in one day:
+  Issue 1 — New engineer can't get started:
+  ```
+  $ terraform init
+  Error: Failed to get existing workspaces: S3 bucket
+  "company-tf-state" does not exist.
+  ```
+  The engineer cloned the repo but nobody told them about the
+  backend S3 bucket setup.
+  Issue 2 — Staging plan shows unexpected destroy:
+  ```
+  $ terraform plan
+  # aws_instance.app[0] will be destroyed
+  # aws_instance.app[1] will be destroyed
+  # aws_instance.app[2] will be destroyed
+  # aws_instance.app["web-1"] will be created
+  # aws_instance.app["web-2"] will be created
+  # aws_instance.app["web-3"] will be created
+  Plan: 3 to add, 0 to change, 3 to destroy.
+  ```
+  Someone refactored from count to for_each. The 3 instances will be
+  destroyed and recreated, causing downtime.
+  Issue 3 — State lock stuck:
+  ```
+  Error: Error acquiring the state lock
+  Lock Info:
+    ID:        abc-123
+    Who:       jenkins@ci-server
+    Created:   2024-01-15 08:00:00 UTC
+  ```
+  The CI pipeline crashed mid-apply 6 hours ago and the lock is stale.
+  Task: Diagnose and fix all three issues. Explain the root causes
+  and preventive measures for each.
+assertions:
+  - type: llm_judge
+    criteria: "Issue 1 (init failure) is resolved — the S3 backend bucket must exist before terraform init. Fix: create the bucket manually (or with a separate bootstrap Terraform config), or use terraform init -backend-config=path/to/backend.hcl for flexible backend configuration. Document the bootstrap process in README. Consider: use a Makefile or script that checks prerequisites before init. For new engineers: provide onboarding docs with exact setup steps, or use Terraform Cloud to avoid S3 backend setup entirely"
+    weight: 0.35
+    description: "Init failure"
+  - type: llm_judge
+    criteria: "Issue 2 (count to for_each migration) is resolved — switching from count to for_each changes resource addresses (aws_instance.app[0] → aws_instance.app[\"web-1\"]), causing Terraform to see them as different resources. Fix: use terraform state mv to rename resources in state: terraform state mv 'aws_instance.app[0]' 'aws_instance.app[\"web-1\"]' for each instance. This avoids recreation. Alternative: use moved blocks in Terraform 1.1+, one per instance, with from = aws_instance.app[0] and to = aws_instance.app[\"web-1\"] on separate lines inside moved { ... }. Always re-run terraform plan after refactoring and confirm it shows no resources to add or destroy"
+    weight: 0.35
+    description: "Count to for_each"
+  - type: llm_judge
+    criteria: "Issue 3 (stuck lock) is resolved — the CI pipeline crashed and didn't release the state lock (for an S3 backend this is typically an entry in a DynamoDB lock table). Verify the lock is stale (6 hours old, CI pipeline is dead). Fix: terraform force-unlock abc-123. This is safe because the original process is dead. Prevention: CI/CD pipeline should have timeout protection, use terraform apply with -lock-timeout=5m to wait for locks, ensure CI cleanup steps release locks on failure. Warning: never force-unlock if the original process might still be running"
+    weight: 0.30
+    description: "Stuck lock"
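
For Issue 2, the `terraform state mv` commands follow a regular pattern, so they can be generated rather than typed by hand. The TypeScript sketch below is a hypothetical helper, not part of the package. It assumes index 0 maps to "web-1", 1 to "web-2", and 2 to "web-3"; the plan output does not say which instance should receive which key, so that mapping needs to be confirmed before running anything.

```typescript
// Builds one `terraform state mv` command per instance so the existing
// resources are re-addressed in state instead of destroyed and recreated.
// Assumption: keys are listed in the same order as the old count indexes.
function stateMvCommands(resource: string, keys: string[]): string[] {
  return keys.map(
    (key, index) => `terraform state mv '${resource}[${index}]' '${resource}["${key}"]'`,
  );
}

const commands = stateMvCommands("aws_instance.app", ["web-1", "web-2", "web-3"]);

// Print the commands for review; nothing is executed here.
console.log(commands.join("\n"));
```

After the moves, a fresh `terraform plan` should no longer list these three instances under destroy or create; if it does, the index-to-key mapping was likely wrong.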