dojo.md 0.2.3 → 0.2.4

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (49)
  1. package/courses/GENERATION_LOG.md +9 -0
  2. package/courses/code-review-feedback-writing/scenarios/level-3/advanced-review-shift.yaml +48 -0
  3. package/courses/code-review-feedback-writing/scenarios/level-3/api-design-review.yaml +47 -0
  4. package/courses/code-review-feedback-writing/scenarios/level-3/database-migration-review.yaml +48 -0
  5. package/courses/code-review-feedback-writing/scenarios/level-3/design-pattern-feedback.yaml +48 -0
  6. package/courses/code-review-feedback-writing/scenarios/level-3/production-incident-review.yaml +42 -0
  7. package/courses/code-review-feedback-writing/scenarios/level-3/reviewing-senior-code.yaml +47 -0
  8. package/courses/code-review-feedback-writing/scenarios/level-3/speed-vs-thoroughness.yaml +46 -0
  9. package/courses/code-review-feedback-writing/scenarios/level-4/automated-review-strategy.yaml +44 -0
  10. package/courses/code-review-feedback-writing/scenarios/level-4/expert-review-shift.yaml +46 -0
  11. package/courses/code-review-feedback-writing/scenarios/level-4/review-culture-design.yaml +41 -0
  12. package/courses/code-review-feedback-writing/scenarios/level-4/review-guidelines-standards.yaml +45 -0
  13. package/courses/code-review-feedback-writing/scenarios/level-4/review-load-balancing.yaml +39 -0
  14. package/courses/code-review-feedback-writing/scenarios/level-4/review-metrics.yaml +39 -0
  15. package/courses/code-review-feedback-writing/scenarios/level-4/review-process-optimization.yaml +48 -0
  16. package/courses/code-review-feedback-writing/scenarios/level-4/scaling-review-process.yaml +45 -0
  17. package/courses/code-review-feedback-writing/scenarios/level-4/security-review-standards.yaml +41 -0
  18. package/courses/code-review-feedback-writing/scenarios/level-4/training-reviewers.yaml +42 -0
  19. package/courses/code-review-feedback-writing/scenarios/level-5/board-quality-metrics.yaml +44 -0
  20. package/courses/code-review-feedback-writing/scenarios/level-5/knowledge-transfer-at-scale.yaml +42 -0
  21. package/courses/code-review-feedback-writing/scenarios/level-5/ma-review-alignment.yaml +50 -0
  22. package/courses/code-review-feedback-writing/scenarios/level-5/master-review-shift.yaml +49 -0
  23. package/courses/code-review-feedback-writing/scenarios/level-5/review-competitive-advantage.yaml +48 -0
  24. package/courses/code-review-feedback-writing/scenarios/level-5/review-organizational-learning.yaml +46 -0
  25. package/courses/code-review-feedback-writing/scenarios/level-5/review-roi-analysis.yaml +51 -0
  26. package/courses/code-review-feedback-writing/scenarios/level-5/review-velocity-impact.yaml +44 -0
  27. package/courses/code-review-feedback-writing/scenarios/level-5/scaling-reviews-100-plus.yaml +45 -0
  28. package/courses/code-review-feedback-writing/scenarios/level-5/toxic-culture-transformation.yaml +46 -0
  29. package/courses/technical-rfc-writing/course.yaml +11 -0
  30. package/courses/technical-rfc-writing/scenarios/level-1/first-rfc-shift.yaml +45 -0
  31. package/courses/technical-rfc-writing/scenarios/level-1/implementation-planning.yaml +47 -0
  32. package/courses/technical-rfc-writing/scenarios/level-1/open-questions.yaml +46 -0
  33. package/courses/technical-rfc-writing/scenarios/level-1/problem-statement.yaml +41 -0
  34. package/courses/technical-rfc-writing/scenarios/level-1/proposing-solutions.yaml +49 -0
  35. package/courses/technical-rfc-writing/scenarios/level-1/rfc-structure.yaml +41 -0
  36. package/courses/technical-rfc-writing/scenarios/level-1/risks-and-mitigations.yaml +43 -0
  37. package/courses/technical-rfc-writing/scenarios/level-1/scoping-an-rfc.yaml +49 -0
  38. package/courses/technical-rfc-writing/scenarios/level-1/success-metrics.yaml +43 -0
  39. package/courses/technical-rfc-writing/scenarios/level-1/writing-for-audience.yaml +42 -0
  40. package/courses/technical-rfc-writing/scenarios/level-2/risk-assessment-matrix.yaml +43 -0
  41. package/courses/technical-rfc-writing/scenarios/level-2/technical-design-detail.yaml +42 -0
  42. package/courses/technical-rfc-writing/scenarios/level-2/trade-off-analysis.yaml +43 -0
  43. package/dist/evaluator/judge.d.ts +7 -1
  44. package/dist/evaluator/judge.d.ts.map +1 -1
  45. package/dist/evaluator/judge.js +50 -11
  46. package/dist/evaluator/judge.js.map +1 -1
  47. package/dist/types/index.d.ts +1 -1
  48. package/dist/types/index.d.ts.map +1 -1
  49. package/package.json +1 -1
package/courses/code-review-feedback-writing/scenarios/level-5/review-velocity-impact.yaml
@@ -0,0 +1,44 @@
+ meta:
+   id: review-velocity-impact
+   level: 5
+   course: code-review-feedback-writing
+   type: output
+   description: "Analyze review impact on velocity — quantify how code review practices affect engineering team velocity, quality, and delivery predictability"
+   tags: [code-review, velocity, engineering-metrics, delivery, impact-analysis, master]
+
+ state: {}
+
+ trigger: |
+   The CEO asks: "Code review takes 30% of our engineering time. Is it
+   worth it? Should we reduce review requirements to ship faster?"
+
+   Data available:
+   - Engineering time: 30% of developer hours spent on review activities
+   - Cycle time: average 7 days from PR open to production (review is
+     4 of those 7 days)
+   - Defect rate: teams with thorough reviews have 3x fewer production bugs
+   - But: teams with the fastest reviews ship 2x more features per quarter
+   - Developer satisfaction: inversely correlated with review wait time
+   - Knowledge sharing: 60% of cross-team knowledge transfer happens
+     through code reviews
+   - Onboarding: new engineers who receive detailed reviews reach full
+     productivity 40% faster
+
+   Task: Write the analysis for the CEO. Include: the true cost of
+   code review, the true cost of NOT reviewing, optimization
+   opportunities (maintain quality, reduce time), and a recommendation
+   with projected impact. Use data to tell the story.
+
+ assertions:
+   - type: llm_judge
+     criteria: "Cost analysis is honest and complete — cost of review: 30% of engineering time = $X million/year (calculate from team size and average salary). 4 of 7 days in cycle time is a bottleneck. Opportunity cost: features not shipped during review wait time. Cost of NOT reviewing: 3x more production bugs × average bug fix cost ($5K-50K each) = $Y million/year in bug costs. Knowledge silos: without reviews, knowledge stays in individual heads (bus factor risk). Onboarding: 40% slower onboarding = $Z additional cost per new hire. Comparison: review investment should be compared to the alternative (more bugs, slower onboarding, knowledge loss), not to zero. Key insight: 'The question isn't whether to do reviews — it's whether we're doing them efficiently.'"
+     weight: 0.35
+     description: "Cost analysis"
+   - type: llm_judge
+     criteria: "Optimization recommendations maintain quality while reducing time — 'We can reduce review time from 4 days to 1.5 days without reducing quality.' Specific interventions: (1) Automate style/formatting (saves 40% of review time — Prettier, ESLint). (2) PR size limits (smaller PRs reviewed faster — 200 vs 500 lines saves 2-3 hours per review). (3) Reviewer assignment automation (eliminates 1-2 day assignment delay). (4) Review SLA (24-hour first review commitment). (5) High-quality first submissions (PR templates, self-review checklist reduce review rounds). Projected impact: review time from 30% to 20% of engineering hours, cycle time from 7 to 4 days, feature delivery +30%, quality maintained or improved. Each recommendation with expected time savings and implementation cost"
+     weight: 0.35
+     description: "Optimization plan"
+   - type: llm_judge
+     criteria: "Recommendation is data-driven and CEO-friendly — framing: 'Code review is engineering infrastructure — like CI/CD or monitoring. The question isn't whether to invest, but how to invest efficiently.' Recommendation: optimize, don't reduce. Reducing review thoroughness has compounding costs: bugs cost 10x more to fix in production than in review, knowledge loss is irreversible, onboarding slows permanently. Projected outcomes: invest $50K in automation + 90-day process improvement → save $200K/year in review time + $150K/year in prevented bugs + $100K/year in faster onboarding = $450K annual return on $50K investment. Timeline: improvements visible in 30 days (automation), fully realized in 90 days (process changes). Dashboard: CEO-level metrics — cycle time trend, defect rate trend, developer satisfaction trend, features delivered per quarter"
+     weight: 0.30
+     description: "CEO recommendation"
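The ROI framing these criteria ask for is easy to sanity-check with a small calculation. A minimal sketch in TypeScript — team size, salary, and bug counts below are hypothetical placeholders, not figures from the scenario:

```ts
// Worked example of the cost-of-review vs cost-of-not-reviewing framing.
// All inputs are hypothetical; substitute your org's real numbers.
const engineers = 50;
const avgLoadedSalary = 180_000; // USD/year, fully loaded (assumption)
const reviewShare = 0.30;        // 30% of engineering time (from the scenario)

const reviewCostPerYear = engineers * avgLoadedSalary * reviewShare;

// Cost of NOT reviewing: the scenario says thorough reviews mean 3x fewer
// production bugs, so dropping them implies 2x *additional* bugs.
const bugsPerYearWithReview = 120;  // hypothetical baseline
const extraBugsWithoutReview = bugsPerYearWithReview * 2;
const avgBugCost = 20_000;          // midpoint of the $5K-50K range in the criteria

const bugCostWithoutReview = extraBugsWithoutReview * avgBugCost;

console.log(`Review cost: $${(reviewCostPerYear / 1e6).toFixed(1)}M/year`);
console.log(`Extra bug cost without review: $${(bugCostWithoutReview / 1e6).toFixed(1)}M/year`);
```

With these placeholder inputs the "compare to the alternative, not to zero" point falls out directly: the avoided bug cost alone exceeds the review spend before counting onboarding and knowledge-sharing effects.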
package/courses/code-review-feedback-writing/scenarios/level-5/scaling-reviews-100-plus.yaml
@@ -0,0 +1,45 @@
+ meta:
+   id: scaling-reviews-100-plus
+   level: 5
+   course: code-review-feedback-writing
+   type: output
+   description: "Scale reviews to 100+ engineers — design review processes that maintain quality and culture at large scale across multiple teams, timezones, and experience levels"
+   tags: [code-review, scaling, large-team, process-design, multi-timezone, master]
+
+ state: {}
+
+ trigger: |
+   Your company is growing from 50 to 200 engineers over 18 months.
+   The current review process works well at 50 but won't scale:
+
+   - Manual reviewer assignment (team lead picks reviewers)
+   - One review channel in Slack (50 engineers, manageable noise)
+   - Informal mentoring through review (seniors know all juniors)
+   - Single coding standard (everyone agreed in one meeting)
+   - Everyone knows everyone (trust is personal, not institutional)
+
+   At 200 engineers across 4 timezones:
+   - Team lead can't know everyone's expertise
+   - Slack channel becomes unusable noise
+   - Seniors can't personally mentor 150 juniors
+   - Standards need to be written (can't be tribal knowledge)
+   - Trust must be institutional (process-based, not personal)
+
+   Task: Design the review process for 200 engineers. Include:
+   automated assignment, tiered review requirements, timezone-aware
+   scheduling, knowledge preservation systems, and how to maintain
+   the collaborative culture that worked at 50 people.
+
+ assertions:
+   - type: llm_judge
+     criteria: "Automated systems replace manual coordination — assignment: CODEOWNERS + automated round-robin weighted by: expertise (code area familiarity), load (current open reviews), timezone (prefer same or adjacent timezone for sync discussions), cross-pollination (10% of assignments go to engineers outside the team for knowledge sharing). Tiered requirements: Tier 1 (low risk: docs, config, style) — 1 approval, any engineer. Tier 2 (standard: features, refactors) — 1 approval, team member. Tier 3 (high risk: security, data, infrastructure) — 2 approvals, at least 1 senior. Tier auto-detected: CI analyzes changed files against CODEOWNERS patterns. Slack: replace single channel with per-team channels + cross-team review request bot. Dashboard: real-time review queue visibility per team"
+     weight: 0.35
+     description: "Automated systems"
+   - type: llm_judge
+     criteria: "Timezone awareness prevents review latency — async-first design: reviews should be completable without real-time discussion (high-quality PR descriptions, thorough first review pass). Timezone routing: assign reviewers in same or +/- 4 hour timezone when possible. Follow-the-sun for urgent reviews: if no reviewer available in requester's timezone, auto-escalate to next timezone's pool. SLA adjustment by timezone overlap: same timezone = 8 hours, adjacent = 16 hours, opposite = 24 hours. Review handoff: if review discussion is complex, write summary for the next timezone reviewer rather than waiting for sync. Critical path: reviews blocking production deploys get priority routing regardless of timezone. Meeting-light: replace synchronous design reviews with async RFC documents + async review comments"
+     weight: 0.35
+     description: "Timezone design"
+   - type: llm_judge
+     criteria: "Culture preservation strategy is explicit — what to preserve from 50-person culture: (1) constructive tone — encode in written guidelines with examples, enforce via manager review. (2) Mentoring — replace personal mentoring with structured program: each junior paired with a mentor-reviewer for 6 months. (3) Trust — shift from personal trust ('I know Alex does good work') to process trust ('this PR passed our quality gates'). (4) Shared ownership — cross-team review rotation prevents silos. What to let go: (1) everyone knowing everyone (impossible at 200), (2) single review channel (too noisy), (3) ad-hoc standards (must be documented). New at scale: (1) written review culture document (onboarding reading), (2) review quality calibration (quarterly, team-level), (3) review recognition program (celebrate excellent reviews monthly), (4) review retrospectives (what's working, what's not, quarterly). Success metric: 'Would a new hire who joined 3 months ago describe our review culture the same way a 3-year veteran would?'"
+     weight: 0.30
+     description: "Culture preservation"
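The weighted assignment in the first assertion translates naturally into a scoring function. A minimal TypeScript sketch — the types, weights, and the 10% cross-pollination rule are illustrative assumptions drawn from the criteria text, not a shipped API:

```ts
// Sketch of CODEOWNERS-style weighted reviewer assignment.
interface Reviewer {
  id: string;
  timezoneOffset: number; // UTC offset in hours
  openReviews: number;    // current review load
  expertise: Set<string>; // code areas, e.g. derived from CODEOWNERS patterns
}

function score(r: Reviewer, area: string, authorTz: number): number {
  const expertiseScore = r.expertise.has(area) ? 1 : 0;
  const loadScore = 1 / (1 + r.openReviews);          // prefer less-loaded reviewers
  const tzGap = Math.abs(r.timezoneOffset - authorTz);
  const tzScore = tzGap <= 4 ? 1 : 1 / tzGap;         // prefer +/- 4 hour overlap
  return 0.5 * expertiseScore + 0.3 * loadScore + 0.2 * tzScore; // assumed weights
}

function assign(pool: Reviewer[], area: string, authorTz: number): Reviewer {
  // 10% of assignments go outside the expert pool for cross-pollination.
  if (Math.random() < 0.1) {
    const outsiders = pool.filter((r) => !r.expertise.has(area));
    if (outsiders.length > 0) {
      return outsiders[Math.floor(Math.random() * outsiders.length)];
    }
  }
  return pool.reduce((best, r) =>
    score(r, area, authorTz) > score(best, area, authorTz) ? r : best
  );
}
```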
package/courses/code-review-feedback-writing/scenarios/level-5/toxic-culture-transformation.yaml
@@ -0,0 +1,46 @@
+ meta:
+   id: toxic-culture-transformation
+   level: 5
+   course: code-review-feedback-writing
+   type: output
+   description: "Transform toxic review culture — diagnose and systematically fix an adversarial code review environment, rebuilding trust and psychological safety"
+   tags: [code-review, culture-transformation, psychological-safety, trust, toxic, master]
+
+ state: {}
+
+ trigger: |
+   You're brought in as CTO of a 150-person engineering org where
+   code review culture is actively harmful:
+
+   - Exit interviews: "code reviews felt like personal attacks"
+   - Two teams refuse to request review from each other (feud)
+   - A "hall of shame" Slack channel where bad code from reviews is mocked
+   - Senior engineers use reviews to demonstrate superiority
+   - Junior developers submit PRs with apologies: "I know this isn't
+     great, please don't be too harsh"
+   - Code quality is actually declining — harsh reviews make developers
+     hide code rather than seek feedback
+   - 25% attrition rate among engineers with under 2 years' tenure
+
+   The technical bar IS high and needs to stay high. The challenge:
+   maintain quality standards while completely transforming the culture.
+
+   Task: Present the culture transformation plan. Include: diagnosis
+   of how the culture became toxic, immediate interventions (first 2
+   weeks), 90-day transformation program, how to handle the specific
+   bad actors, metrics for measuring cultural change, and how to
+   maintain high quality standards in a psychologically safe environment.
+
+ assertions:
+   - type: llm_judge
+     criteria: "Diagnosis explains how the culture became toxic — root causes: (1) review quality was never defined — 'rigorous' was conflated with 'harsh.' (2) Senior engineers were rewarded for catching bugs, not for mentoring. (3) Management tolerated hostile behavior because the engineers were 'high performers.' (4) No consequences for abusive comments — normalized over time. (5) 'Hall of shame' channel started as humor, became bullying. (6) Junior developers internalized that asking for help = weakness. Systemic pattern: the culture rewards proving you're smart over helping others get smarter. This creates competition, not collaboration. Quality actually suffers: developers hide problems, avoid asking questions, submit smaller PRs to minimize exposure — all of which reduce code quality and knowledge sharing"
+     weight: 0.35
+     description: "Diagnosis"
+   - type: llm_judge
+     criteria: "Immediate interventions stop the bleeding — week 1: (1) shut down the 'hall of shame' channel immediately — public announcement: 'Mocking colleagues' code is incompatible with our values.' (2) 1:1 meetings with the worst offenders — clear expectations and consequences. (3) All-hands meeting: 'Code review is being reformed. High quality standards remain. Hostile behavior stops today.' (4) Anonymous feedback channel for reporting review abuse. Week 2: (1) interim review guidelines published (tone requirements, comment labels), (2) review moderation — manager spot-checks reviews for the next 30 days, (3) skip-level 1:1s with junior developers to hear their experience, (4) inter-team feud: facilitated meeting with both team leads, mediated by engineering manager. Handling bad actors: progressive — warning, coaching, performance plan, termination. 'We can teach kind feedback; we can't teach willingness to be kind.'"
+     weight: 0.35
+     description: "Immediate interventions"
+   - type: llm_judge
+     criteria: "90-day program rebuilds trust with measurable outcomes — month 1 (safety): training on constructive feedback (mandatory, all engineers), reviewer code of conduct, celebrate examples of excellent review comments. Month 2 (skill): pair reviewing program (senior + junior), review quality as performance criteria, cross-team review exchanges to break feuds. Month 3 (sustainability): metrics dashboard (review tone, turnaround, satisfaction), recognition program for mentoring through review, quarterly culture survey. Metrics: (1) survey: 'I feel safe submitting PRs for review' — target 80% agree (from estimated 30%), (2) attrition rate for < 2yr tenure — target 15% (from 25%), (3) review comment sentiment analysis — track positive/negative ratio, (4) PR submission frequency — should increase as safety increases. Quality maintenance: defect escape rate and rollback rate should NOT increase — if they do, adjust. The thesis: psychological safety IMPROVES quality because people seek feedback instead of hiding mistakes"
+     weight: 0.30
+     description: "Transformation program"
package/courses/technical-rfc-writing/course.yaml
@@ -0,0 +1,11 @@
+ id: technical-rfc-writing
+ name: "Technical RFC Writing"
+ description: >
+   Master technical RFC writing from basic problem statements and
+   solution proposals to enterprise technology governance through
+   structured decision-making documents. Progress through RFC structure,
+   trade-off analysis, cross-team coordination, organizational RFC
+   programs, and strategic technology decision frameworks.
+ levels: 5
+ scenarios_per_level: 10
+ tags: [writing, RFC, design-docs, technical-decisions, architecture, proposals]
package/courses/technical-rfc-writing/scenarios/level-1/first-rfc-shift.yaml
@@ -0,0 +1,45 @@
+ meta:
+   id: first-rfc-shift
+   level: 1
+   course: technical-rfc-writing
+   type: output
+   description: "First RFC shift simulation — write a complete RFC from scratch combining all fundamentals: problem statement, solution, alternatives, risks, implementation plan, and success metrics"
+   tags: [RFC, shift-simulation, complete-rfc, synthesis, beginner]
+
+ state: {}
+
+ trigger: |
+   You've been asked to write your first complete RFC. The situation:
+
+   Your team's deployment process is painful:
+   - Deployments happen manually via SSH by a single senior engineer
+   - Only 2 people know the deployment process (bus factor = 2)
+   - Deployments take 45 minutes and require a "deployment window"
+   - Last month, 3 out of 8 deployments caused incidents requiring rollback
+   - Rollbacks are also manual and take 30 minutes
+   - The team deploys only twice a week because deployments are risky
+   - Feature work is piling up — there's a 3-week backlog of merged PRs
+     waiting to be deployed
+
+   You want to propose implementing a CI/CD pipeline with automated
+   testing, staged rollouts, and automated rollback.
+
+   Task: Write a complete RFC including all standard sections: summary,
+   problem statement (with data), proposed solution, alternatives
+   considered, risks and mitigations, implementation plan (phased),
+   success metrics, and open questions. This is your synthesis exercise
+   — demonstrate everything you've learned about RFC writing.
+
+ assertions:
+   - type: llm_judge
+     criteria: "RFC has all standard sections with appropriate depth — Summary: 2-3 sentences capturing the proposal, impact, and timeline. Problem Statement: data-driven (45-min deploys, 37.5% failure rate, 2-person bus factor, 3-week backlog), business impact (delayed features, incident costs, engineer frustration), urgency (getting worse as team grows). Proposed Solution: CI/CD pipeline with specific choices (e.g., GitHub Actions, staged rollouts via feature flags or canary), enough detail to evaluate but not an implementation guide. Alternatives Considered: at least 2 alternatives fairly presented (e.g., documented manual process with checklist, partial automation with deploy scripts), each with honest pros/cons and specific reasons for rejection. Risks: specific risks with mitigations (pipeline reliability, learning curve, migration period where both systems exist). Implementation Plan: phased (automated tests first, then CI builds, then CD to staging, then CD to production), with milestones and timeline. Success Metrics: deployment frequency, failure rate, rollback time, time-to-production. Open Questions: 2-3 genuine unresolved decisions with context and preliminary thinking"
+     weight: 0.35
+     description: "Complete RFC structure"
+   - type: llm_judge
+     criteria: "RFC sections are internally consistent and well-connected — the problem statement's data (37.5% failure rate, 45-minute deploys) is directly referenced in the success metrics ('reduce deployment failure rate to under 5%, reduce deployment time to under 10 minutes'). The proposed solution addresses every problem mentioned (manual process → automated, bus factor → documented pipeline, slow rollbacks → automated rollback). The risks section acknowledges realistic challenges with the proposed solution, not just risks of doing nothing. The implementation plan phases align with the solution description. The alternatives section explains why they don't solve the problems as well. Open questions are genuinely unresolved (not things already decided in the solution section). The reader should feel this is one coherent document, not disconnected sections written independently"
+     weight: 0.35
+     description: "Internal consistency"
+   - type: llm_judge
+     criteria: "RFC demonstrates good writing practices throughout — executive-friendly summary that a VP could read and understand the proposal. Problem statement leads with impact, not technical details. Solution section explains WHY this approach, not just WHAT it is. Alternatives are steel-manned (presented fairly, not as straw men). Risks are honest (doesn't hide difficulties to make the proposal look better). Implementation plan is realistic (includes buffer time, go/no-go criteria between phases). Success metrics have baselines and targets. Open questions show intellectual humility (acknowledges what the author doesn't know). Tone throughout is confident but not arrogant — 'we recommend' not 'we must', acknowledges trade-offs rather than presenting the solution as perfect. The RFC should serve all audiences: an engineer should find enough detail to evaluate feasibility, a manager should find timeline and resources, a VP should find business justification"
+     weight: 0.30
+     description: "Writing quality"
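The "automated rollback" this scenario's RFC proposes is, at its core, a metrics gate that runs after each deploy. A minimal sketch — the thresholds and the `fetchMetrics`/`rollback` pair are hypothetical stand-ins, not part of the course material:

```ts
// Sketch of a post-deploy watcher that replaces the scenario's
// 30-minute manual rollback with an automatic one.
interface DeployMetrics {
  errorRate: number;   // fraction of failed requests since the deploy
  p99LatencyMs: number;
}

const THRESHOLDS = { errorRate: 0.02, p99LatencyMs: 1_000 }; // assumed limits

async function watchDeploy(
  fetchMetrics: () => Promise<DeployMetrics>,
  rollback: () => Promise<void>
): Promise<void> {
  // Poll once a minute for 15 minutes; roll back on any threshold breach.
  for (let i = 0; i < 15; i++) {
    const m = await fetchMetrics();
    if (m.errorRate > THRESHOLDS.errorRate || m.p99LatencyMs > THRESHOLDS.p99LatencyMs) {
      await rollback();
      return;
    }
    await new Promise((r) => setTimeout(r, 60_000));
  }
}
```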
package/courses/technical-rfc-writing/scenarios/level-1/implementation-planning.yaml
@@ -0,0 +1,47 @@
+ meta:
+   id: implementation-planning
+   level: 1
+   course: technical-rfc-writing
+   type: output
+   description: "Write an implementation plan — break a proposed solution into phases with milestones, dependencies, and a realistic timeline"
+   tags: [RFC, implementation, planning, phasing, milestones, beginner]
+
+ state: {}
+
+ trigger: |
+   You're writing the implementation plan section of an RFC. The proposal:
+   migrate your monolithic Node.js application to a containerized
+   deployment using Docker and Kubernetes.
+
+   Current state:
+   - Single Node.js app running on 3 EC2 instances behind a load balancer
+   - Deployments via SSH + PM2 (manual, error-prone)
+   - No health checks beyond basic ping
+   - Environment config stored in .env files on each server
+
+   Target state:
+   - Dockerized application with CI/CD pipeline
+   - Kubernetes cluster with auto-scaling
+   - Health checks and readiness probes
+   - Config managed via Kubernetes secrets/configmaps
+
+   Your first draft says: "Phase 1: Set up Kubernetes. Phase 2: Migrate.
+   Phase 3: Done." This doesn't help anyone plan their work.
+
+   Task: Write a detailed implementation plan that breaks this migration
+   into realistic phases with clear milestones, dependencies, and a
+   timeline. Then write a guide on how to plan implementations in RFCs.
+
+ assertions:
+   - type: llm_judge
+     criteria: "Implementation is broken into clear, sequenced phases — Phase 1 (Weeks 1-2): Containerization — Dockerize the application, create Dockerfile and docker-compose for local development, set up container registry, validate that containerized app passes all existing tests. Milestone: app runs identically in Docker as on bare metal. No infrastructure changes yet. Phase 2 (Weeks 3-4): CI/CD Pipeline — set up automated builds on push, container image scanning, automated testing in containers, deployment to staging Kubernetes cluster. Milestone: every merge to main produces a tested, deployable container image. Phase 3 (Weeks 5-6): Kubernetes Setup — provision production cluster, configure networking and ingress, set up secrets management, implement health checks and readiness probes. Milestone: staging environment runs reliably on Kubernetes for 1 week. Phase 4 (Weeks 7-8): Production Migration — gradual traffic shift (10% → 50% → 100%), monitoring and rollback readiness, decommission old EC2 instances after 2 weeks of stable operation. Milestone: 100% traffic on Kubernetes with no degradation. Each phase is independently valuable and shippable"
+     weight: 0.35
+     description: "Phased implementation"
+   - type: llm_judge
+     criteria: "Dependencies and risks per phase are identified — dependencies between phases are explicit: 'Phase 2 depends on Phase 1 (need Docker images before CI/CD can build them). Phase 4 depends on Phase 3 (need cluster before migration).' External dependencies: 'Kubernetes cluster provisioning requires infrastructure team approval (1 week lead time — start in Phase 1).' Team allocation: 'Phases 1-2 require 2 backend engineers. Phase 3 requires 1 infrastructure engineer + 1 backend engineer. Phase 4 is all-hands for monitoring.' Go/no-go criteria between phases: 'Do not start Phase 4 until Phase 3 staging environment has run without incidents for 5 business days.' Rollback plan per phase: 'At any point before Phase 4 reaches 100%, we can route all traffic back to EC2 instances within 5 minutes.' Parallel work identified: 'CI/CD setup (Phase 2) and Kubernetes provisioning (Phase 3) can overlap in weeks 3-4'"
+     weight: 0.35
+     description: "Dependencies and sequencing"
+   - type: llm_judge
+     criteria: "Planning guide provides reusable principles — principles: (1) Work backwards from the goal: what's the last thing that needs to happen? What must be true before that? Keep going until you reach today. (2) Each phase should be independently valuable: if the project is cancelled after Phase 2, you should still have gained something (containerized app with CI/CD is valuable even without Kubernetes). (3) Front-load risk reduction: tackle the hardest, most uncertain work first — if it fails, you've wasted less time. (4) Include buffer time: estimate honestly then add 20-30% for unknowns. A plan that requires everything to go perfectly is not a plan. (5) Define milestones as verifiable states, not activities: 'app passes load test at 2x current traffic' not 'finish load testing.' (6) Identify the critical path: which tasks, if delayed, delay the entire project? These get the most attention and resources. Anti-patterns: phases that are just calendar months with no milestones, plans with no rollback strategy, big-bang migrations with no gradual rollout, and plans that assume zero dependencies on other teams"
+     weight: 0.30
+     description: "Planning guide"
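The "health checks and readiness probes" in the target state map to two distinct HTTP endpoints with different semantics. A minimal Express sketch for the Node.js app being migrated — the paths and the `dbPing` helper are illustrative assumptions:

```ts
// Liveness vs readiness for a Node.js app behind Kubernetes probes.
import express from "express";

const app = express();

// Stand-in for a real dependency check, e.g. `SELECT 1` against the database.
async function dbPing(): Promise<void> {}

// Liveness: the process is up. Kubernetes restarts the pod if this fails.
app.get("/healthz", (_req, res) => res.status(200).send("ok"));

// Readiness: dependencies are reachable. Kubernetes withholds traffic
// from the pod until this passes — key for the gradual traffic shift in Phase 4.
app.get("/readyz", async (_req, res) => {
  try {
    await dbPing();
    res.status(200).send("ready");
  } catch {
    res.status(503).send("not ready");
  }
});

app.listen(3000);
```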
package/courses/technical-rfc-writing/scenarios/level-1/open-questions.yaml
@@ -0,0 +1,46 @@
+ meta:
+   id: open-questions
+   level: 1
+   course: technical-rfc-writing
+   type: output
+   description: "Write open questions — identify unresolved decisions in an RFC and frame them to get useful input from reviewers"
+   tags: [RFC, open-questions, decisions, feedback, collaboration, beginner]
+
+ state: {}
+
+ trigger: |
+   You're writing the open questions section of an RFC proposing to
+   introduce a message queue (like RabbitMQ, Kafka, or SQS) for
+   asynchronous processing. Currently, your application processes
+   everything synchronously, causing timeouts on heavy operations
+   like report generation, email sending, and image processing.
+
+   You have several unresolved questions:
+   - Which message queue technology to use
+   - Whether to start with one queue or separate queues per task type
+   - How to handle failed messages (retry? dead letter queue?)
+   - Whether to build a custom worker or use an existing framework
+   - What the message format should be (JSON? Protobuf? Avro?)
+   - Who owns the queue infrastructure (your team or the platform team?)
+
+   Your first draft just lists these as bullet points. Reviewers
+   respond with "I don't know, what do you think?" because the
+   questions aren't framed to help people give useful answers.
+
+   Task: Write an open questions section that frames each question
+   with context, constraints, options, and your preliminary thinking.
+   Then write a guide on how to use open questions effectively in RFCs.
+
+ assertions:
+   - type: llm_judge
+     criteria: "Open questions are well-framed with context and options — each question provides: (1) Why it matters: 'Message queue technology selection affects operational complexity, cost, and future scalability.' (2) Constraints: 'We need at-least-once delivery, message persistence, and support for delayed messages. Budget constraint: under $500/month for expected volume.' (3) Options considered: 'RabbitMQ (familiar to team, good for task queues, self-hosted), SQS (managed, lowest ops burden, AWS-native), Kafka (highest throughput, but overkill for our volume of ~1000 messages/hour).' (4) Preliminary recommendation: 'Leaning toward SQS because managed service reduces operational burden and our volume doesn't justify Kafka's complexity. However, if we anticipate event streaming needs in 6 months, Kafka may be worth the upfront investment.' The reviewer now has enough context to give an informed opinion rather than starting from scratch"
+     weight: 0.35
+     description: "Well-framed questions"
+   - type: llm_judge
+     criteria: "Questions are prioritized and categorized — blocking questions (must resolve before implementation): queue technology choice (affects all other decisions), failure handling strategy (affects message format and retry logic). Non-blocking questions (can decide during implementation): single vs multiple queues (can start with one and split later), message format (can evolve with versioning). Ownership questions (need organizational input): infrastructure ownership (requires manager/platform team input, not a technical decision). For each question: deadline for decision ('queue technology must be decided before Phase 1 starts — targeting end of review period'), who should weigh in ('platform team for infrastructure ownership, backend engineers for technology choice'), and what happens if not decided ('if no consensus on queue tech, we default to SQS as the lowest-risk managed option'). Default decisions prevent open questions from blocking progress indefinitely"
+     weight: 0.35
+     description: "Prioritized questions"
+   - type: llm_judge
+     criteria: "Guide explains how to use open questions effectively — principles: (1) Open questions should be genuine unknowns, not things you're too lazy to decide. If you have enough information to make a decision, make it and state your reasoning — don't punt to reviewers. (2) Frame questions to reduce cognitive load: don't ask 'what should we use?' — ask 'should we use A or B, given these trade-offs?' (3) Include your preliminary thinking: reviewers can critique a proposal faster than they can generate one from scratch. 'I'm leaning toward X because Y' invites 'have you considered Z?' which is more useful than starting from a blank page. (4) Set decision deadlines: open questions without timelines stay open forever. 'If no input by Friday, we'll proceed with SQS.' (5) Limit to 3-5 questions: if you have 15 open questions, your RFC isn't ready for review — you need more research. (6) Track resolution: update the RFC as questions are answered. Mark resolved questions so future readers don't re-debate settled decisions. Anti-patterns: rhetorical questions (you already know the answer), questions that should be separate RFCs, questions designed to shift responsibility, and questions so vague that any answer works"
+     weight: 0.30
+     description: "Open questions guide"
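The default decision named in the second assertion ("we default to SQS") implies a worker along these lines. A minimal sketch using @aws-sdk/client-sqs — the queue URL and handler are placeholders, and retry/dead-letter behavior comes from the queue's redrive policy rather than from worker code:

```ts
// Sketch of an SQS consumer for the async tasks in the scenario.
import {
  SQSClient,
  ReceiveMessageCommand,
  DeleteMessageCommand,
} from "@aws-sdk/client-sqs";

const sqs = new SQSClient({});
const QueueUrl = process.env.TASK_QUEUE_URL!; // placeholder

async function pollOnce(handle: (body: string) => Promise<void>): Promise<void> {
  const { Messages = [] } = await sqs.send(
    new ReceiveMessageCommand({ QueueUrl, MaxNumberOfMessages: 10, WaitTimeSeconds: 20 })
  );
  for (const msg of Messages) {
    await handle(msg.Body ?? "");
    // Delete only after success: a failed message becomes visible again
    // and, after maxReceiveCount attempts, lands in the dead-letter queue.
    await sqs.send(new DeleteMessageCommand({ QueueUrl, ReceiptHandle: msg.ReceiptHandle! }));
  }
}
```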
package/courses/technical-rfc-writing/scenarios/level-1/problem-statement.yaml
@@ -0,0 +1,41 @@
+ meta:
+   id: problem-statement
+   level: 1
+   course: technical-rfc-writing
+   type: output
+   description: "Write a clear problem statement — articulate the problem an RFC addresses with context, impact, and urgency that motivates the reader to care"
+   tags: [RFC, problem-statement, context, impact, motivation, beginner]
+
+ state: {}
+
+ trigger: |
+   You need to write the problem statement section of an RFC. The
+   technical problem: your API response times have degraded from
+   100ms to 800ms over the past 6 months as the database grew from
+   1M to 10M rows. Several key customers have complained. The team
+   has discussed solutions informally but nothing is documented.
+
+   Bad problem statements you've seen:
+   - "The API is slow" (too vague)
+   - "We need to add caching" (jumps to solution)
+   - A 3-page history of the entire system (too much context)
+   - "P99 latency is 800ms" (data without context or impact)
+
+   Task: Write a problem statement that clearly articulates: what the
+   problem is (with data), who is affected, what the impact is (business
+   and technical), how urgent it is, and what happens if we don't act.
+   Then write a guide on what makes a good problem statement.
+
+ assertions:
+   - type: llm_judge
+     criteria: "Problem statement includes data, context, and impact — data: 'API response times increased from 100ms (p50) to 800ms (p99) over 6 months as our users table grew from 1M to 10M rows.' Context: 'This degradation corresponds with organic user growth. Query execution plans show full table scans on unindexed columns.' Impact: business ('3 enterprise customers escalated to their account managers, 2 are evaluating competitors'), technical ('downstream services timeout, causing cascading failures during peak hours'). Urgency: 'At current growth rate, we'll reach 20M rows in 4 months. Without intervention, p99 latency will exceed 2 seconds, breaching our SLA commitments.' The statement makes the reader understand WHY this RFC exists and WHY it matters NOW"
+     weight: 0.35
+     description: "Data and impact"
+   - type: llm_judge
+     criteria: "Problem statement avoids common pitfalls — doesn't jump to solutions (no mention of caching, indexing, or any specific fix in the problem section). Doesn't include unnecessary history (only relevant context, not the entire system's evolution). Isn't too vague ('API is slow') or too narrow ('index needed on users.created_at'). Separates symptoms (slow API) from root cause (query patterns on growing tables). Quantifies scope: which endpoints are affected, what percentage of requests, which customers. Distinguishes between the problem (slow queries) and the constraint (can't just throw hardware at it because of cost). States success criteria: 'We need to return to sub-200ms p99 latency while supporting 50M+ rows'"
+     weight: 0.35
+     description: "Avoids pitfalls"
+   - type: llm_judge
+     criteria: "Guide explains what makes problem statements effective — principles: (1) Start with impact, not technical details — the reader should care before they understand. (2) Include data — 'slow' is subjective, '800ms p99' is measurable. (3) Show the trend — not just current state but trajectory ('getting worse at X rate'). (4) Separate problem from solution — the problem section should be agreeable regardless of which solution is chosen. (5) Define success — what does 'solved' look like? (measurable criteria). (6) State urgency — what happens if we wait? (timeline and consequences). Anti-patterns: solution-oriented problem statements ('we need caching'), blame-oriented ('the previous developer didn't add indexes'), scope-creep ('while we're at it, let's also...'). The problem statement is the foundation — if reviewers disagree on the problem, the solution discussion is pointless"
+     weight: 0.30
+     description: "Problem statement guide"
package/courses/technical-rfc-writing/scenarios/level-1/proposing-solutions.yaml
@@ -0,0 +1,49 @@
+ meta:
+   id: proposing-solutions
+   level: 1
+   course: technical-rfc-writing
+   type: output
+   description: "Propose solutions with alternatives — present a recommended approach alongside alternatives considered, with clear reasoning for the recommendation"
+   tags: [RFC, solutions, alternatives, recommendation, reasoning, beginner]
+
+ state: {}
+
+ trigger: |
+   You're writing the solution section of an RFC. The problem: your
+   application needs real-time notifications. You've researched three
+   approaches:
+
+   1. Polling: clients poll API every 5 seconds
+      - Pros: simple, works everywhere, no infrastructure changes
+      - Cons: high server load, 5-second delay, wastes bandwidth
+
+   2. WebSocket: persistent connection for real-time push
+      - Pros: instant delivery, low latency, bidirectional
+      - Cons: connection management complexity, load balancer config,
+        harder to scale horizontally
+
+   3. Server-Sent Events (SSE): server pushes over HTTP
+      - Pros: simpler than WebSocket, automatic reconnection, works
+        through most proxies
+      - Cons: unidirectional only, limited browser connection pool
+
+   You recommend SSE.
+
+   Task: Write the "Proposed Solution" and "Alternatives Considered"
+   sections. Show how to present your recommendation with reasoning
+   while fairly representing the alternatives. The reader should
+   understand WHY SSE was chosen, not just THAT it was chosen.
+
+ assertions:
+   - type: llm_judge
+     criteria: "Recommendation is clearly stated with reasoning — the proposed solution section leads with the recommendation: 'We recommend Server-Sent Events (SSE) for real-time notifications.' Then explains WHY: (1) Our use case is unidirectional (server → client), so WebSocket's bidirectional capability is unnecessary complexity. (2) SSE's automatic reconnection handles network instability (common on mobile) without custom code. (3) SSE works through existing load balancers and proxies without configuration changes. (4) Simpler to implement than WebSocket while meeting all requirements. Decision criteria explicitly stated: we optimized for simplicity and reliability over maximum capability. The reasoning should be convincing — a reader should agree with the choice after reading the justification"
+     weight: 0.35
+     description: "Clear recommendation"
+   - type: llm_judge
+     criteria: "Alternatives are fairly represented — each alternative: (1) described accurately (not straw-manned to make the recommendation look better), (2) pros acknowledged (polling IS simpler, WebSocket IS more capable), (3) specific reasons for rejection tied to THIS project's needs. Polling rejected: 'Polling at 5-second intervals for 10,000 concurrent users = 2,000 requests/second. This triples our API load and still has unacceptable latency for time-sensitive notifications (payment confirmations).' WebSocket rejected: 'WebSocket requires significant infrastructure changes (sticky sessions, WebSocket-aware load balancer) and custom reconnection logic. The capability gain (bidirectional) doesn't justify the complexity for our unidirectional notification use case.' A reader favoring an alternative should feel their option was genuinely considered, not dismissed"
+     weight: 0.35
+     description: "Fair alternatives"
+   - type: llm_judge
+     criteria: "Decision matrix or comparison is provided — structured comparison: table with criteria (latency, complexity, scalability, infrastructure changes, browser support) × options. Each cell has a specific value, not just 'good/bad.' Under what conditions the recommendation changes: 'If we later need bidirectional communication (e.g., real-time collaboration features), we should revisit WebSocket.' 'If we need to support very old browsers, polling becomes the fallback.' This shows the recommendation isn't dogmatic — it's the best choice for current requirements. Reversibility: 'SSE and WebSocket have similar client-side interfaces — migrating from SSE to WebSocket later is straightforward (estimated 2 weeks). This makes SSE a low-risk starting point.'"
+     weight: 0.30
+     description: "Comparison structure"
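For reference, the recommended SSE approach is small enough to sketch in full. A minimal Express endpoint — the EventEmitter notification source is an illustrative stand-in for whatever actually produces notifications:

```ts
// Minimal Server-Sent Events endpoint matching the scenario's recommendation.
import express from "express";
import { EventEmitter } from "node:events";

const notifications = new EventEmitter();
const app = express();

app.get("/notifications", (req, res) => {
  // The three headers that make a response an SSE stream.
  res.set({
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",
    Connection: "keep-alive",
  });
  const send = (payload: unknown) => res.write(`data: ${JSON.stringify(payload)}\n\n`);
  notifications.on("notify", send);
  req.on("close", () => notifications.off("notify", send)); // client disconnected
});

app.listen(3000);

// Browser side is one line — and the browser reconnects automatically,
// which is exactly the property the RFC's reasoning relies on:
//   new EventSource("/notifications").onmessage = (e) => render(e.data);
```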
package/courses/technical-rfc-writing/scenarios/level-1/rfc-structure.yaml
@@ -0,0 +1,41 @@
+ meta:
+   id: rfc-structure
+   level: 1
+   course: technical-rfc-writing
+   type: output
+   description: "Learn RFC structure — understand the standard sections of a technical RFC and what each section should contain"
+   tags: [RFC, structure, sections, template, format, beginner]
+
+ state: {}
+
+ trigger: |
+   Your team wants to start writing RFCs for technical decisions but
+   no one has written one before. You need to explain the standard
+   RFC structure and create a template the team can use.
+
+   The team has questions:
+   - "How long should an RFC be?"
+   - "What sections are required vs optional?"
+   - "Who is the audience for each section?"
+   - "How detailed should the technical design be?"
+   - "What's the difference between an RFC and a design doc?"
+
+   Task: Write a comprehensive guide to RFC structure. Include: each
+   standard section with its purpose and audience, guidelines for
+   length and detail, a reusable template with prompts for each
+   section, and an explanation of when an RFC is appropriate vs
+   other document types.
+
+ assertions:
+   - type: llm_judge
+     criteria: "Standard sections are defined with clear purposes — sections: (1) Title and metadata (author, date, status, reviewers). (2) Summary/TL;DR (1-2 paragraphs for executives and time-pressed readers). (3) Problem Statement (what problem, who's affected, why now). (4) Proposed Solution (the recommended approach, enough detail to evaluate). (5) Alternatives Considered (other approaches and why they were rejected). (6) Technical Design (architecture, data model, API changes, sequence diagrams). (7) Risks and Mitigations (what could go wrong, how to handle it). (8) Implementation Plan (phases, timeline, milestones). (9) Success Metrics (how to measure if the solution works). (10) Open Questions (unresolved decisions needing input). Each section: who reads it (manager reads summary, engineers read design), and what it should NOT include"
+     weight: 0.35
+     description: "Section definitions"
+   - type: llm_judge
+     criteria: "Length and detail guidelines are practical — total length: 2-6 pages for most RFCs (not counting appendices). If longer, the scope is probably too broad — split into multiple RFCs. Summary: 2-3 sentences (the 'elevator pitch'). Problem statement: half a page (focused, data-driven). Solution: 1-2 pages (enough detail to evaluate, not enough to implement — that's the code). Alternatives: half page per alternative (brief, focus on why rejected). Design: as detailed as needed (diagrams encouraged, pseudocode over real code). Risks: bullet points, not paragraphs. Implementation: phased, with milestones. Rule of thumb: if you can't explain the proposal in 5 pages, you need to simplify or split. Appendices for supplementary data (benchmark results, detailed schemas)"
+     weight: 0.35
+     description: "Length guidelines"
+   - type: llm_judge
+     criteria: "RFC vs other document types is clearly explained — RFC: proposes a significant technical decision that needs team/org input. Use when: change affects multiple teams, multiple valid approaches exist, decision is hard to reverse. Design doc: detailed implementation plan for an already-decided approach. ADR (Architecture Decision Record): lightweight record of a decision already made (retrospective). Tech spec: detailed specification for implementers (API contracts, data schemas). One-pager: brief proposal for smaller decisions. When NOT to write an RFC: trivial decisions, already-decided approaches that just need implementation, urgent fixes (fix first, document later). Hierarchy: RFC (proposal, needs consensus) → ADR (decision recorded) → Design doc (implementation plan) → Tech spec (implementation details)"
+     weight: 0.30
+     description: "RFC vs other docs"
package/courses/technical-rfc-writing/scenarios/level-1/risks-and-mitigations.yaml
@@ -0,0 +1,43 @@
+ meta:
+   id: risks-and-mitigations
+   level: 1
+   course: technical-rfc-writing
+   type: output
+   description: "Write risks and mitigations — identify what could go wrong with a proposed solution and present concrete mitigation strategies"
+   tags: [RFC, risks, mitigations, planning, failure-modes, beginner]
+
+ state: {}
+
+ trigger: |
+   You're writing the risks section of an RFC proposing to migrate your
+   application from a self-hosted PostgreSQL database to a managed cloud
+   database service (like AWS RDS or Google Cloud SQL).
+
+   The migration involves:
+   - Moving 500GB of production data
+   - Changing connection strings across 12 microservices
+   - Switching from self-managed backups to provider-managed backups
+   - New network topology (VPC peering instead of local connections)
+
+   Your first draft just lists: "Risk: data loss. Risk: downtime."
+   This isn't helpful — it doesn't tell reviewers HOW LIKELY risks are,
+   HOW SEVERE they'd be, or WHAT YOU'LL DO about them.
+
+   Task: Write a comprehensive risks and mitigations section for this
+   RFC. For each risk, include likelihood, severity, impact, and a
+   concrete mitigation plan. Then write a guide on how to think about
+   risks in technical proposals.
+
+ assertions:
+   - type: llm_judge
+     criteria: "Risks are specific with likelihood and severity — not generic 'data loss' but specific: (1) 'Data corruption during migration: rows with special characters or encoding issues may fail transfer. Likelihood: Medium. Severity: High. Mitigation: run migration on a staging copy first, validate row counts and checksums per table, keep source database read-only during final sync.' (2) 'Extended downtime beyond maintenance window: migration of 500GB may exceed the 4-hour window if network throughput is lower than expected. Likelihood: Low. Severity: High. Mitigation: perform dry runs to measure actual transfer rates, have a go/no-go checkpoint at 2 hours, pre-stage data with continuous replication before cutover.' (3) 'Connection string misconfiguration: 12 services need updated connection strings — missing one causes partial outage. Likelihood: Medium. Severity: Medium. Mitigation: use environment variables managed by config service, deploy connection changes before migration, test each service against new endpoint in staging.' (4) 'Increased latency from network topology change: VPC peering adds ~1-2ms per query vs local connections. Likelihood: High. Severity: Low. Mitigation: benchmark critical query paths, implement connection pooling, consider read replicas for latency-sensitive services.' Each risk should feel researched, not speculative"
+     weight: 0.35
+     description: "Specific risks"
+   - type: llm_judge
+     criteria: "Mitigations are actionable, not hand-wavy — each mitigation answers: what specifically will be done, who is responsible, when it happens (before/during/after migration), and what the fallback is if mitigation fails. Includes a rollback plan: 'If critical issues are discovered within 48 hours of migration, we can fail back to the original database. Continuous replication from new to old database will be maintained during the validation period.' Risk acceptance: explicitly acknowledges which risks are accepted ('We accept 1-2ms latency increase as an acceptable trade-off for managed backups and reduced operational burden'). Residual risks: 'Even with mitigations, there is a non-zero chance of brief inconsistency during cutover. This is bounded to a 30-second window and affects only write operations.' No mitigation should be 'we'll be careful' — every mitigation is a concrete action"
+     weight: 0.35
+     description: "Actionable mitigations"
+   - type: llm_judge
+     criteria: "Risk guide provides a framework for thinking about risks — principles: (1) Categorize risks: technical (will it work?), operational (can we run it?), timeline (will it be late?), organizational (do we have the skills?). (2) Use likelihood × severity to prioritize — high likelihood + high severity = must mitigate, low likelihood + low severity = accept and monitor. (3) Every risk needs an owner — 'the team will handle it' means nobody handles it. (4) Distinguish between risks you can prevent and risks you can only detect and respond to. (5) Include rollback criteria — under what conditions do you abort? Make these specific and pre-agreed. (6) Time-bound your risk window — 'risk of data loss exists only during the 4-hour migration window, not before or after.' Anti-patterns: ignoring risks to make the RFC look better, listing risks without mitigations (anxiety without action), mitigating everything (some risks should be accepted), and confusing risks with issues (a risk MIGHT happen, an issue already exists)"
+     weight: 0.30
+     description: "Risk framework guide"
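The "validate row counts and checksums per table" mitigation in the first assertion is a short script in practice. A minimal row-count comparison using the pg client — connection strings and the table list are placeholders, and a real run would add per-table checksums:

```ts
// Sketch of per-table validation between source and target databases.
import { Client } from "pg";

const TABLES = ["users", "orders", "payments"]; // placeholder list of trusted names

async function countRows(connectionString: string): Promise<Map<string, number>> {
  const client = new Client({ connectionString });
  await client.connect();
  const counts = new Map<string, number>();
  for (const table of TABLES) {
    // ::int cast so pg returns a JS number rather than a string.
    const { rows } = await client.query(`SELECT count(*)::int AS n FROM ${table}`);
    counts.set(table, rows[0].n);
  }
  await client.end();
  return counts;
}

async function validate(sourceUrl: string, targetUrl: string): Promise<void> {
  const [src, dst] = await Promise.all([countRows(sourceUrl), countRows(targetUrl)]);
  for (const [table, n] of src) {
    if (dst.get(table) !== n) {
      throw new Error(`count mismatch in ${table}: source ${n}, target ${dst.get(table)}`);
    }
  }
}
```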
package/courses/technical-rfc-writing/scenarios/level-1/scoping-an-rfc.yaml
@@ -0,0 +1,49 @@
+ meta:
+   id: scoping-an-rfc
+   level: 1
+   course: technical-rfc-writing
+   type: output
+   description: "Scope an RFC appropriately — determine what should be in scope, out of scope, and how to handle scope creep in technical proposals"
+   tags: [RFC, scoping, boundaries, focus, scope-creep, beginner]
+
+ state: {}
+
+ trigger: |
+   You're writing an RFC to add search functionality to your e-commerce
+   platform. As you write, the scope keeps growing:
+
+   Started as: "Add product search with text matching"
+
+   Expanded to include:
+   - Full-text search with relevance ranking
+   - Autocomplete suggestions
+   - Search analytics (what users search for)
+   - Faceted filtering (by category, price, brand)
+   - Personalized search results (based on user history)
+   - Multi-language search support
+   - Image-based search ("search by photo")
+   - Voice search integration
+   - Search result caching strategy
+   - A/B testing framework for search ranking
+
+   Each addition is genuinely useful, but the RFC is now a 6-month
+   project that no one will approve.
+
+   Task: Write the scoping section for this RFC. Decide what's in
+   scope (phase 1), what's explicitly out of scope (with reasoning),
+   and what's future work (acknowledged but deferred). Then write a
+   guide on how to scope technical proposals effectively.
+
+ assertions:
+   - type: llm_judge
+     criteria: "Phase 1 scope is focused and deliverable — in scope: full-text search with relevance ranking (core feature), faceted filtering by category and price (expected UX), search result caching (performance requirement). These form a coherent, shippable feature. Timeline: 4-6 weeks. Explicitly out of scope: autocomplete (separate RFC — different backend architecture), personalized search (requires ML infrastructure not yet built), image search (R&D project, not engineering), voice search (depends on voice input infrastructure), A/B testing framework (separate cross-cutting concern). Future work (acknowledged): search analytics (phase 2, enables data-driven improvements), multi-language support (phase 2, when internationalization begins). Each out-of-scope item has a brief reason WHY it's deferred"
+     weight: 0.35
+     description: "Focused scope"
+   - type: llm_judge
+     criteria: "Out-of-scope decisions are justified, not arbitrary — each deferral: reason tied to constraints or dependencies. Autocomplete: 'Requires a dedicated suggestion index and real-time query infrastructure that's a separate architectural decision.' Personalization: 'Depends on user behavior tracking and ML pipeline that don't exist yet. Building search infrastructure first provides the data foundation for personalization later.' Image search: 'Requires computer vision capabilities we haven't evaluated. This is a research question, not an engineering decision.' Each deferred item includes: when it could be addressed ('after phase 1 data validates the search approach') and what would trigger revisiting ('if user research shows autocomplete is critical for conversion'). The reader should feel that scope was carefully considered, not randomly cut"
+     weight: 0.35
+     description: "Justified boundaries"
+   - type: llm_judge
+     criteria: "Scoping guide provides reusable principles — principles: (1) The MVP test: what's the smallest scope that solves the core problem? Start there. (2) Dependency check: does this feature require infrastructure that doesn't exist? If yes, scope it out. (3) Two-week rule: if a single scope item takes more than 2 weeks, it might be its own RFC. (4) Reversibility: prefer scope that's easy to extend later over scope that's hard to change (build basic search, then add personalization — not the reverse). (5) Stakeholder alignment: get agreement on scope BEFORE writing the detailed design. (6) Explicit out-of-scope: always list what's NOT included — prevents assumptions and 'while you're at it' requests. Anti-patterns: scope by committee (everyone adds their feature), gold plating (adding nice-to-haves before shipping must-haves), scope fear (cutting so much the solution doesn't solve the problem)"
+     weight: 0.30
+     description: "Scoping guide"
package/courses/technical-rfc-writing/scenarios/level-2/success-metrics.yaml
@@ -0,0 +1,43 @@
+ meta:
+   id: success-metrics
+   level: 1
+   course: technical-rfc-writing
+   type: output
+   description: "Define success metrics — specify how to measure whether a proposed solution actually solves the problem it was designed to address"
+   tags: [RFC, metrics, success-criteria, measurement, KPIs, beginner]
+
+ state: {}
+
+ trigger: |
+   You're writing the success metrics section of an RFC. The proposal:
+   introduce a caching layer (Redis) between your API and database to
+   improve response times and reduce database load.
+
+   Current state:
+   - API p50 latency: 200ms, p99: 1.2 seconds
+   - Database CPU: 85% average during peak hours
+   - 50,000 API requests per minute at peak
+   - Database connection pool frequently exhausted (5-10 times/day)
+
+   Your first draft says: "Success: the API is faster and the database
+   is less loaded." This doesn't work — how fast? How much less? By when?
+   How do you know it actually worked?
+
+   Task: Write a success metrics section with specific, measurable
+   criteria. Include leading indicators (early signals), lagging
+   indicators (final outcomes), and how you'll measure them. Then write
+   a guide on defining success metrics for technical proposals.
+
+ assertions:
+   - type: llm_judge
+     criteria: "Success metrics are specific and measurable — primary metrics with targets: (1) API latency: 'p50 latency below 50ms for cached endpoints (75% reduction). p99 latency below 300ms (75% reduction). Measured via existing APM tooling (Datadog/New Relic).' (2) Database load: 'Database CPU below 50% during peak hours (40% reduction). Zero connection pool exhaustion events per day. Measured via CloudWatch/database monitoring.' (3) Cache effectiveness: 'Cache hit rate above 85% within 2 weeks of deployment. Cache miss latency no worse than current latency (200ms p50). Measured via Redis metrics and custom instrumentation.' (4) Reliability: 'No increase in error rate (currently 0.1%). Graceful degradation to direct database queries if Redis is unavailable.' Each metric has a specific number, a measurement method, and a timeline for when the target should be reached"
+     weight: 0.35
+     description: "Specific metrics"
+   - type: llm_judge
+     criteria: "Leading and lagging indicators are distinguished — leading indicators (visible within days): cache hit rate trending upward, database connection pool utilization declining, Redis memory usage stable and within capacity. These tell you early if the solution is working. Lagging indicators (visible within weeks): sustained p99 latency improvement, database CPU reduction during peak hours, reduction in customer-reported latency complaints, cost savings from potential database downscaling. Timeline: 'Week 1: validate cache hit rate above 80% and no error rate increase. Week 2: confirm sustained latency improvement across all endpoints. Month 1: measure database CPU trend and evaluate downscaling opportunity. Month 3: full cost-benefit analysis including Redis infrastructure cost vs database savings.' Failure criteria: 'If cache hit rate is below 60% after 2 weeks, investigate cache key strategy. If p99 latency does not improve by at least 50%, the caching layer may not address the root cause'"
+     weight: 0.35
+     description: "Leading and lagging indicators"
+   - type: llm_judge
+     criteria: "Metrics guide provides reusable principles — principles: (1) Every metric needs a baseline: you can't measure improvement without knowing where you started. Capture current state before any changes. (2) Metrics should be SMART: Specific (not 'faster'), Measurable (has a number), Achievable (realistic target), Relevant (actually measures the problem), Time-bound (by when). (3) Include counter-metrics: if you optimize latency, also track error rate — fast but wrong is worse than slow but correct. (4) Distinguish vanity metrics from actionable metrics: 'cache hit rate' is actionable (you can improve key strategy), 'total requests served' is vanity (doesn't tell you if the solution works). (5) Define failure criteria: what metric values would cause you to reconsider the approach? This makes the RFC intellectually honest. (6) Automate measurement: if you can't measure it automatically, you won't measure it consistently. Set up dashboards before launch, not after. Anti-patterns: no baseline measurements, metrics that only show success (confirmation bias), metrics that require manual collection, and metrics disconnected from the original problem statement"
+     weight: 0.30
+     description: "Metrics guide"
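The reliability criterion here ("graceful degradation to direct database queries if Redis is unavailable") corresponds to a cache-aside helper that treats cache errors as misses. A minimal sketch with ioredis — names, key scheme, and TTL are illustrative:

```ts
// Cache-aside with graceful degradation: any Redis failure falls
// through to the database instead of failing the request.
import Redis from "ioredis";

const redis = new Redis(); // assumes a local Redis on the default port
const TTL_SECONDS = 300;   // assumed TTL

async function getCached<T>(key: string, loadFromDb: () => Promise<T>): Promise<T> {
  try {
    const hit = await redis.get(key);
    if (hit !== null) return JSON.parse(hit) as T; // cache hit — target: >85% of reads
  } catch {
    // Redis unavailable: degrade to the database, per the reliability
    // criterion ("no increase in error rate").
  }
  const value = await loadFromDb();
  try {
    await redis.set(key, JSON.stringify(value), "EX", TTL_SECONDS);
  } catch {
    // Best-effort write-back; a failed cache write must not fail the request.
  }
  return value;
}
```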